In my last post we saw how to quickly compute powers of the form by repeatedly squaring: ; then ; and so on. This is much more efficient than computing powers by repeated multiplication: for example, we need only three multiplications to compute by squaring, but we would need seven multiplications to compute it as .

The question I left you with is whether we can use a similar technique to compute other powers which are not themselves powers of two. The idea is that in addition to squaring, we can also multiply by another copy of at strategic points. For example, suppose we want to compute . We can do it like this:

- (square and multiply by another )
- (square)
- (square and multiply by )
- (square)

So how do we decide when to multiply by an extra copy of ? And can we get *any* exponent this way?

It becomes easier to answer these questions if we think recursively. Instead of starting with and building up to , let’s start with our goal of computing and see how to break it down into subproblems we can solve.

So suppose we’re trying to compute . There are two cases, depending on whether is even or odd. If is even, all we have to do is compute ; then we can square it to get . What about if is odd? Then we can compute ; squaring it gets us ; then multiplying by one more copy of gets us .

For example, let’s rethink our example with in this way. is even, so our goal is to compute ; we can then square that to get . So how do we compute ? Since is odd, we can get it by squaring , then multiplying by one more time. To compute , we want to square ; finally, to compute we square and multiply by another . The *base case* of the recursion is when , at which point we can stop: is just . In fact, we can use as an even simpler base case: . (Do you see how we will still get the right answer for even if is not a base case?)

As you may have already noticed, we can think of this in terms of the *binary* expansion of . Whether is even or odd is determined by its final bit: is even when the final bit is , and odd when the final bit is . Computing (or when is odd) corresponds to chopping off the final bit of . So what we’re really doing is chopping off one bit of at a time, squaring at each step and multiplying by an extra copy of when we see a bit. For example, in binary is . Since the final bit is , this tells us that we want to compute and then square it. Since the final bit is now , this in turn means that we want to compute , square it, and multiply by ; and so on. So here is an equivalent way to write our algorithm as a loop:

```
result = 1
for each bit of n from left to right:
result = result * result
if the current bit is 1:
result = result * a
```

Here’s a visualization of the operations needed to compute using the two different methods:

Obviously, using the repeated squaring approach requires a lot fewer multiplications! But *how many* does it take, exactly? The loop repeats once for every bit of , and each time through the loop, we do either one or two multiplications. The number of bits in is ; so in the worst case we do twice that many multiplications, that is, . (This worst case happens precisely when is one less than a power of two, that is, when its binary expansion consists of all ’s.)

Remember our example from last time? Suppose . Assuming that we can do multiplications per second, computing by repeated multiplication would take a minute and a half. So how about by repeated squaring? In that case it will take at worst

multiplications. Wow! Computing by repeatedly multiplying by takes ten billion multiplication operations, but computing it by repeated squaring takes only *thirty-five*! (Actually it will take even less than that, since the binary expansion of does not consist of all ’s—can you figure out exactly how many multiplications it will take?) I need hardly point out that if we can do multiplications per second, doing will take hardly any time at all (about 1/3 of a microsecond; in that amount of time, light travels only about meters).

Obviously this makes a *huge* difference. And is actually rather small—although a minute and a half is a long time compared to a third of a microsecond, waiting a minute and a half for a computation is quite doable. But what about something like ? Computing by repeated multiplication would require multiplications; at per second this would take seconds, which is vastly, unimaginably longer than the estimated age of the universe (which is “only” about seconds). Even if you expanded every microsecond in the history of the universe to take an entire age of the universe—and then repeated this process 20 times—you still wouldn’t have enough time! But what about with repeated squaring? Well,

.

One thousand seven hundred multiplications, at per second, would take… about 20 microseconds! 20 microseconds is, shall we say… a tiny bit faster than ages within ages within ages of the universe. This idea of doing repeated squaring—or, more generally, any algorithm where we get to repeatedly halve the size of the problem—is incredibly powerful!

(One thing I have been sweeping under the rug a bit is that not all multiplications take the same amount of time, so if you just want to compute , the multiplications will take longer and longer as the numbers become larger; saying that we can do multiplications per second is only true if the numbers involved are less than, say, . Not only that, but the results might be gigantic: for example, as far as we know there isn’t enough space in the *entire universe* to even *write down* the result of , even if we could somehow write one digit on each atom! Everything I’ve said is justified, however, by the fact that we actually want to compute something like : if we reduce at each step, then all the multiplications really do take the same amount of time, and we don’t have to worry about the result getting astronomically large.)