In my last post we saw how to quickly compute powers of the form $a^{2^k}$ by repeatedly squaring: $(a^2)^2 = a^4$; then $(a^4)^2 = a^8$; and so on. This is much more efficient than computing powers by repeated multiplication: for example, we need only three multiplications to compute $a^8$ by squaring, but we would need seven multiplications to compute it as $a \cdot a \cdot a \cdot a \cdot a \cdot a \cdot a \cdot a$.
The question I left you with is whether we can use a similar technique to compute other powers which are not themselves powers of two. The idea is that in addition to squaring, we can also multiply by another copy of $a$ at strategic points. For example, suppose we want to compute $a^{26}$. We can do it like this:

$a^2 \cdot a = a^3$ (square and multiply by another $a$)
$(a^3)^2 = a^6$ (square)
$(a^6)^2 \cdot a = a^{13}$ (square and multiply by $a$)
$(a^{13})^2 = a^{26}$ (square)
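As a quick sanity check, here is one way to carry out that exact chain in Python, with $a = 2$ (just an illustration, not part of the algorithm yet):

a = 2
x = a * a * a      # square a and multiply by another a: a^3
x = x * x          # square: a^6
x = x * x * a      # square and multiply by a: a^13
x = x * x          # square: a^26
print(x == 2**26)  # True

That is only six multiplications, compared to the twenty-five we would need to multiply twenty-six copies of $a$ together.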
So how do we decide when to multiply by an extra copy of $a$? And can we get any exponent this way?

It becomes easier to answer these questions if we think recursively. Instead of starting with $a$ and building up to $a^n$, let's start with our goal of computing $a^n$ and see how to break it down into subproblems we can solve.
So suppose we're trying to compute $a^n$. There are two cases, depending on whether $n$ is even or odd. If $n$ is even, all we have to do is compute $a^{n/2}$; then we can square it to get $(a^{n/2})^2 = a^n$. What about if $n$ is odd? Then we can compute $a^{(n-1)/2}$; squaring it gets us $a^{n-1}$; then multiplying by one more copy of $a$ gets us $a^n$.
For example, let's rethink our example with $n = 26$ in this way. $26$ is even, so our goal is to compute $a^{13}$; we can then square that to get $a^{26}$. So how do we compute $a^{13}$? Since $13$ is odd, we can get it by squaring $a^6$, then multiplying by $a$ one more time. To compute $a^6$, we want to square $a^3$; finally, to compute $a^3$ we square $a$ and multiply by another $a$. The base case of the recursion is when $n = 1$, at which point we can stop: $a^1$ is just $a$. In fact, we can use $n = 0$ as an even simpler base case: $a^0 = 1$. (Do you see how we will still get the right answer for $a^1$ even if $n = 1$ is not a base case?)
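To make the recursive description concrete, here is a short Python sketch of it (one possible rendering, using the $n = 0$ base case):

def power(a, n):
    # Compute a**n by recursively squaring and multiplying.
    if n == 0:                   # simplest base case: a^0 = 1
        return 1
    half = power(a, n // 2)      # recursively compute a^(n // 2)
    if n % 2 == 0:               # n even: just square
        return half * half
    else:                        # n odd: square, then one more copy of a
        return half * half * a

Tracing power(a, 26) reproduces exactly the chain of squarings and multiplications we worked out above, and power(a, 1) still returns a even though n = 1 is no longer a base case.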
As you may have already noticed, we can think of this in terms of the binary expansion of $n$. Whether $n$ is even or odd is determined by its final bit: $n$ is even when the final bit is $0$, and odd when the final bit is $1$. Computing $n/2$ (or $(n-1)/2$ when $n$ is odd) corresponds to chopping off the final bit of $n$. So what we're really doing is chopping off one bit of $n$ at a time, squaring at each step and multiplying by an extra copy of $a$ when we see a $1$ bit.

For example, $26$ in binary is $11010$. Since the final bit is $0$, this tells us that we want to compute $a^{13}$ and then square it. Since the final bit is now $1$, this in turn means that we want to compute $a^6$, square it, and multiply by $a$; and so on. So here is an equivalent way to write our algorithm as a loop:
result = 1
for each bit of n from left to right:
    result = result * result
    if the current bit is 1:
        result = result * a
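In case you want to run it, here is a direct Python transcription of that loop (one way to do it, using bin to get at the bits of n):

def power_loop(a, n):
    result = 1
    for bit in bin(n)[2:]:         # bits of n from left (most significant) to right
        result = result * result   # square
        if bit == '1':             # a 1 bit means: also multiply by an extra copy of a
            result = result * a
    return result

print(power_loop(2, 26) == 2**26)  # True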
Here's a visualization of the operations needed to compute $a^{26}$ using the two different methods:
Obviously, using the repeated squaring approach requires a lot fewer multiplications! But how many does it take, exactly? The loop repeats once for every bit of $n$, and each time through the loop, we do either one or two multiplications. The number of bits in $n$ is $\lfloor \log_2 n \rfloor + 1$, which is roughly $\log_2 n$; so in the worst case we do twice that many multiplications, that is, about $2 \log_2 n$. (This worst case happens precisely when $n$ is one less than a power of two, that is, when its binary expansion consists of all $1$'s.)
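If you want to tally the multiplications for a particular $n$, you can read the count straight off the binary expansion, exactly as the loop does: one squaring per bit, plus one extra multiplication per $1$ bit. A small Python sketch:

def multiplication_count(n):
    bits = bin(n)[2:]
    return len(bits) + bits.count('1')   # squarings + extra multiplications

print(multiplication_count(26))   # 8: five bits, three of them 1
print(multiplication_count(31))   # 10: the worst case, since 31 is 11111 in binary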
Remember our example from last time? Suppose $n = 10^{10}$. Assuming that we can do $10^8$ multiplications per second, computing $a^n$ by repeated multiplication would take a minute and a half. So how about by repeated squaring? In that case it will take at worst $2 \cdot 34 = 68$ multiplications, since $10^{10}$ is $34$ bits long. Wow! Computing $a^{10^{10}}$ by repeatedly multiplying by $a$ takes ten billion multiplication operations, but computing it by repeated squaring takes only about thirty-five squarings, plus an extra multiplication for each $1$ bit! (Actually it will take even fewer than the worst case of $68$, since the binary expansion of $10^{10}$ does not consist of all $1$'s—can you figure out exactly how many multiplications it will take?) I need hardly point out that if we can do $10^8$ multiplications per second, doing thirty-five or so multiplications will take hardly any time at all (about 1/3 of a microsecond; in that amount of time, light travels only about $100$ meters).
Obviously this makes a huge difference. And $10^{10}$ is actually rather small—although a minute and a half is a long time compared to a third of a microsecond, waiting a minute and a half for a computation is quite doable.

But what about something like $n = 10^{512}$? Computing $a^n$ by repeated multiplication would require $10^{512}$ multiplications; at $10^8$ per second this would take $10^{504}$ seconds, which is vastly, unimaginably longer than the estimated age of the universe (which is “only” about $4 \times 10^{17}$ seconds). Even if you expanded every microsecond in the history of the universe to take an entire age of the universe—and then repeated this process 20 times—you still wouldn't have enough time! But what about with repeated squaring? Well, $\log_2 10^{512} \approx 1700$. One thousand seven hundred multiplications, at $10^8$ per second, would take… about 20 microseconds! 20 microseconds is, shall we say… a tiny bit faster than ages within ages within ages of the universe. This idea of doing repeated squaring—or, more generally, any algorithm where we get to repeatedly halve the size of the problem—is incredibly powerful!
(One thing I have been sweeping under the rug a bit is that not all multiplications take the same amount of time, so if you just want to compute $a^n$, the multiplications will take longer and longer as the numbers become larger; saying that we can do $10^8$ multiplications per second is only true if the numbers involved are less than, say, $2^{64}$. Not only that, but the results might be gigantic: for example, as far as we know there isn't enough space in the entire universe to even write down the result of $2^{10^{512}}$, even if we could somehow write one digit on each atom! Everything I've said is justified, however, by the fact that we actually want to compute something like $a^n \bmod m$: if we reduce $\bmod m$ at each step, then all the multiplications really do take the same amount of time, and we don't have to worry about the result getting astronomically large.)
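For example, to compute $a^n \bmod m$ we can reduce $\bmod m$ after every single multiplication, so no intermediate result ever grows past about $m^2$. Here is a sketch in Python (the built-in three-argument pow(a, n, m) does the same job):

def power_mod(a, n, m):
    result = 1
    for bit in bin(n)[2:]:
        result = (result * result) % m   # square, then reduce mod m
        if bit == '1':
            result = (result * a) % m    # multiply by a, then reduce mod m
    return result

print(power_mod(7, 10**10, 10**9 + 7) == pow(7, 10**10, 10**9 + 7))  # True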