Modular exponentiation by repeated squaring

In my last post we saw how to quickly compute powers of the form a^{2^k} by repeatedly squaring: (a^2)^2 = a^4; then (a^4)^2 = a^8; and so on. This is much more efficient than computing powers by repeated multiplication: for example, we need only three multiplications to compute a^8 by squaring, but we would need seven multiplications to compute it as a \cdot a \cdot \dots \cdot a.
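
Just to make that count concrete, here is the three-squaring computation of a^8 written out as a quick Python sketch (the base a = 3 is an arbitrary choice):

a = 3          # any base will do; 3 is just an example

x = a * a      # a^2  (1st multiplication)
x = x * x      # a^4  (2nd multiplication)
x = x * x      # a^8  (3rd multiplication)

assert x == a ** 8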

The question I left you with is whether we can use a similar technique to compute other powers which are not themselves powers of two. The idea is that in addition to squaring, we can also multiply by another copy of a at strategic points. For example, suppose we want to compute a^{26}. We can do it like this:

  • a^2 \cdot a = a^3 (square and multiply by another a)
  • (a^3)^2 = a^6 (square)
  • (a^6)^2 \cdot a = a^{13} (square and multiply by a)
  • (a^{13})^2 = a^{26} (square)

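If you want to check this chain numerically, here is a quick Python sketch (the base a = 2 is an arbitrary choice):

a = 2

x = a * a * a    # a^3: square, then multiply by another a
x = x * x        # a^6: square
x = x * x * a    # a^13: square, then multiply by a
x = x * x        # a^26: square

assert x == a ** 26    # 67108864 when a = 2
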
So how do we decide when to multiply by an extra copy of a? And can we get any exponent this way?

It becomes easier to answer these questions if we think recursively. Instead of starting with a and building up to a^n, let’s start with our goal of computing a^n and see how to break it down into subproblems we can solve.

So suppose we’re trying to compute a^n. There are two cases, depending on whether n is even or odd. If n is even, all we have to do is compute a^{n/2}; then we can square it to get a^n. What about if n is odd? Then we can compute a^{(n-1)/2}; squaring it gets us a^{n-1}; then multiplying by one more copy of a gets us a^n.

For example, let’s rethink our example with a^{26} in this way. 26 is even, so our goal is to compute a^{26/2} = a^{13}; we can then square that to get a^{26}. So how do we compute a^{13}? Since 13 is odd, we can get it by squaring a^6, then multiplying by a one more time. To compute a^6, we want to square a^3; finally, to compute a^3 we square a^1 and multiply by another a. The base case of the recursion is when n=1, at which point we can stop: a^1 is just a. In fact, we can use n=0 as an even simpler base case: a^0 = 1. (Do you see how we will still get the right answer for a^1 even if n=1 is not a base case?)

a^n = \begin{cases} 1 & \text{if } n = 0 \\ (a^{n/2})^2 & \text{if } n \text{ is even} \\ (a^{(n-1)/2})^2 \cdot a & \text{if } n \text{ is odd} \end{cases}
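
Translated directly into code, the recursive definition looks like this. It is just a sketch in Python; the name power is mine, not anything official.

def power(a, n):
    """Compute a**n by the recursive square-and-multiply scheme."""
    if n == 0:
        return 1                   # base case: a^0 = 1
    half = power(a, n // 2)        # a^(n/2), or a^((n-1)/2) when n is odd
    if n % 2 == 0:
        return half * half         # n even: just square
    else:
        return half * half * a     # n odd: square, then one more factor of a

assert power(3, 26) == 3 ** 26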

As you may have already noticed, we can think of this in terms of the binary expansion of n. Whether n is even or odd is determined by its final bit: n is even when the final bit is 0, and odd when the final bit is 1. Computing n/2 (or (n-1)/2 when n is odd) corresponds to chopping off the final bit of n. So what we’re really doing is chopping off one bit of n at a time, squaring at each step and multiplying by an extra copy of a when we see a 1 bit. For example, 26 in binary is 11010_2. Since the final bit is 0, this tells us that we want to compute a^{1101_2} and then square it. Since the final bit is now 1, this in turn means that we want to compute a^{110_2}, square it, and multiply by a; and so on. So here is an equivalent way to write our algorithm as a loop:

result = 1
for each bit of n from left to right:
    result = result * result
    if the current bit is 1:
        result = result * a

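And here is the same loop as actual Python, a minimal sketch in which bin(n)[2:] produces the binary expansion of n as a string of 0s and 1s:

def power(a, n):
    result = 1
    for bit in bin(n)[2:]:          # bits of n, from left to right
        result = result * result    # square at every step
        if bit == '1':
            result = result * a     # extra factor of a for each 1 bit
    return result

assert power(3, 26) == 3 ** 26
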
Here’s a visualization of the operations needed to compute a^{26} using the two different methods:

Obviously, using the repeated squaring approach requires a lot fewer multiplications! But how many does it take, exactly? The loop repeats once for every bit of n, and each time through the loop, we do either one or two multiplications. The number of bits in n is \lfloor \log_2 n \rfloor + 1; so in the worst case we do twice that many multiplications, that is, 2 \lfloor \log_2 n \rfloor + 2. (This worst case happens precisely when n is one less than a power of two, that is, when its binary expansion consists of all 1’s.)
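
We can also tally the two kinds of multiplications directly from the binary expansion; here is a small Python sketch (the function name is mine):

def multiplication_count(n):
    """How many multiplications the loop above performs for exponent n."""
    squarings = n.bit_length()       # one squaring per bit of n
    extras = bin(n).count('1')       # one extra multiplication per 1 bit
    return squarings + extras

assert multiplication_count(2 ** 10 - 1) == 2 * 10   # worst case: all ten bits are 1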

Remember our example from last time? Suppose n = 10^{10}. Assuming that we can do 10^8 multiplications per second, computing a^n by repeated multiplication would take a minute and a half. So how about by repeated squaring? In that case it will take at worst

2 \lfloor \log_2 10^{10} \rfloor + 2 = 2 \lfloor 10 \log_2 10 \rfloor + 2 = 2 \cdot 33 + 2 = 68

multiplications. Wow! Computing a^{10^{10}} by repeatedly multiplying by a takes ten billion multiplication operations, but computing it by repeated squaring takes only sixty-eight! (Actually it will take even fewer than that, since the binary expansion of 10^{10} does not consist of all 1’s; can you figure out exactly how many multiplications it will take?) I need hardly point out that if we can do 10^8 multiplications per second, doing 68 will take hardly any time at all (about 2/3 of a microsecond; in that amount of time, light travels only about 200 meters).
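
If you want to check your answer to that question, the exact count is easy to compute in Python; here is a small sketch (run it only after you have made your guess):

n = 10 ** 10
squarings = n.bit_length()       # one squaring per bit of n
extras = bin(n).count('1')       # one extra multiplication per 1 bit
print(squarings + extras)        # the exact number of multiplications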

Obviously this makes a huge difference. And n = 10^{10} is actually rather small—although a minute and a half is a long time compared to a third of a microsecond, waiting a minute and a half for a computation is quite doable. But what about something like n = 10^{512}? Computing a^n by repeated multiplication would require 10^{512} multiplications; at 10^8 per second this would take 10^{504} seconds, which is vastly, unimaginably longer than the estimated age of the universe (which is “only” about 4 \times 10^{17} seconds). Even if you expanded every microsecond in the history of the universe to take an entire age of the universe—and then repeated this process 20 times—you still wouldn’t have enough time! But what about with repeated squaring? Well,

2 \lfloor \log_2 10^{512} \rfloor + 2 = 2 \lfloor 512 \log_2 10 \rfloor + 2 = 2 \cdot 1700 + 2 = 3402.

Three thousand four hundred multiplications, at 10^8 per second, would take… about 34 microseconds! 34 microseconds is, shall we say… a tiny bit faster than ages within ages within ages of the universe. This idea of doing repeated squaring (or, more generally, any algorithm where we get to repeatedly halve the size of the problem) is incredibly powerful!

(One thing I have been sweeping under the rug a bit is that not all multiplications take the same amount of time, so if you just want to compute a^k, the multiplications will take longer and longer as the numbers become larger; saying that we can do 10^8 multiplications per second is only true if the numbers involved are less than, say, 2^{64}. Not only that, but the results might be gigantic: for example, as far as we know there isn’t enough space in the entire universe to even write down the result of 2^{10^{512}}, even if we could somehow write one digit on each atom! Everything I’ve said is justified, however, by the fact that we actually want to compute something like a^{n-1} \bmod n: if we reduce \pmod n at each step, then all the multiplications really do take the same amount of time, and we don’t have to worry about the result getting astronomically large.)
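
To make that last point concrete, here is the same loop with a reduction mod n after every multiplication, checked against Python’s built-in three-argument pow. This is only a sketch: the function and parameter names are mine, and the specific numbers in the test are arbitrary.

def mod_power(a, e, n):
    """Compute a**e mod n by repeated squaring, reducing mod n at every step."""
    result = 1
    for bit in bin(e)[2:]:
        result = (result * result) % n    # square, then reduce mod n
        if bit == '1':
            result = (result * a) % n     # multiply by a, then reduce mod n
    return result

# Cross-check against Python's built-in modular exponentiation.
assert mod_power(7, 10 ** 10, 10 ** 9 + 7) == pow(7, 10 ** 10, 10 ** 9 + 7)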
