Modular exponentiation

In my previous post I explained the Fermat primality test:

  • Input: n
  • Repeat k times:
    • Randomly choose a with 1 < a < n-1.
    • If a^{n-1} \not \equiv 1 \pmod n, stop and output COMPOSITE.
  • Output PROBABLY PRIME.
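
For concreteness, here is roughly what the test might look like in Python. This is just a sketch: the function name fermat_test is my own, and it leans on Python’s built-in three-argument pow, which performs exactly the efficient modular exponentiation this post is about.

    import random

    def fermat_test(n, k):
        """Fermat primality test (sketch; assumes n > 3).
        Returns False if n is definitely composite,
        True if n is probably prime after k rounds."""
        for _ in range(k):
            a = random.randrange(2, n - 1)   # random a with 1 < a < n-1
            if pow(a, n - 1, n) != 1:        # a^(n-1) mod n, computed efficiently
                return False                 # definitely COMPOSITE
        return True                          # PROBABLY PRIME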

In future posts I’ll discuss how well this works, things to worry about, and so on. But first, I realized there’s one important piece to discuss: how do we compute a^{n-1} \bmod n? This is obviously a key component of the above algorithm—and many related algorithms—and the fact that it can actually be done quite efficiently is what makes these algorithms practically useful. But if you’ve never thought about this before, it’s probably not obvious how to compute a^{n-1} \bmod n efficiently.

The obvious, “naive” way is to just do what it literally says: multiply a by itself n-1 times, and then reduce the result \pmod n. However, this has two very big problems; we’ll talk about each and then see how to fix them.

Problem 1: a^{n-1} might be very big!

For example, suppose we want to test n = 13499 to see whether it is prime. (This is a pretty “small” number as far as testing for primality goes!) Say we choose a = 10006 (I actually generated a at random using my computer). But then a^{n-1} = 10006^{13498} has almost 54 thousand digits! (I computed this as \lfloor \log_{10} 10006^{13498} \rfloor + 1 = \lfloor 13498 \log_{10} 10006 \rfloor + 1; since \log_{10} 10006 is very close to 4, this is very close to 4 \times 13498 \approx 4 \times 13500 = 54000.) It takes about 10 bits to store every three decimal digits (since 2^{10} = 1024 is about the same as 10^3 = 1000), so a 54 thousand-digit number would require about 54000 \text{ digits} \times \frac{10 \text{ bits}}{3 \text{ digits}} \times \frac{1 \text{ byte}}{8 \text{ bits}} = 22500 \text{ bytes} to store, about the size of a small image. Such a number would take a while to compute, probably longer than simply trying all the possible divisors of 13499. And if we want to test numbers n with hundreds of digits, we would be completely out of luck.
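
(If you want to verify that digit count yourself, the same logarithm computation is two lines of Python:)

    import math

    # floor(log10(x)) + 1 gives the number of decimal digits of a positive integer x
    print(math.floor(13498 * math.log10(10006)) + 1)   # 53996, almost 54 thousand digits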

Of course, it’s not really a^{n-1} we want, but a^{n-1} \bmod n. Thankfully, we don’t actually have to compute a^{n-1} and then reduce it \pmod n at the very end. The key is that taking remainders “commutes” with multiplication. That is, it doesn’t matter whether you multiply first or take remainders first:

xy \equiv (x \bmod n)(y \bmod n) \pmod n
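
(Here’s a quick numerical spot check of this identity in Python, with values chosen arbitrarily just for illustration:)

    x, y, n = 1234, 5678, 99
    # Reducing both sides mod n yields the same remainder
    assert (x * y) % n == ((x % n) * (y % n)) % n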

So instead of waiting until the very end to reduce \pmod n, we can reduce \pmod n after each multiplication. For example, we could compute a_2 = a^2 \bmod n, and then a_3 = a_2 \cdot a \bmod n, and so on. Much better! Now instead of ending up with a monstrosity with thousands, millions, etc. of digits, the intermediate numbers we have to deal with never get bigger than about n^2, which is quite reasonable.
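
In code, this strategy might look like the following sketch (the name powmod_naive is mine, and it assumes the exponent e is at least 1):

    def powmod_naive(a, e, n):
        """Compute a^e mod n, reducing mod n after every
        multiplication (sketch; assumes e >= 1)."""
        a = a % n                      # reduce the base up front
        result = a
        for _ in range(e - 1):
            result = (result * a) % n  # intermediate products stay below n^2
        return result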

As a simple example, just to show that this really works, let’s pick a = 7, n = 11. Directly computing 7^{10} yields 282475249, which leaves a remainder of 1 when divided by 11. On the other hand, we could first compute a_2 = 7\cdot 7 \bmod 11 = 49 \bmod 11 = 5, then a_3 = 5 \cdot 7 \bmod 11 = 35 \bmod 11 = 2, and so on: a_4 = 3, a_5 = 10, a_6 = 4, a_7 = 6, a_8 = 9, a_9 = 8, and a_{10} = 1. At each step we multiply by 7 and then reduce \pmod {11}. The numbers we have to deal with are never bigger than 70. And sure enough, we still end up with 1.
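
Here is the same computation traced step by step in code, just to see the intermediate values:

    a, n = 7, 11
    r = a
    for i in range(2, 11):        # compute a_2 through a_10
        r = (r * a) % n
        print(f"a_{i} = {r}")     # prints 5, 2, 3, 10, 4, 6, 9, 8, 1 in turn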

However, there’s still another problem:

Problem 2: computing a^{n-1} naively takes too many steps!

How many multiplication operations does it take to compute a^{n-1}? Well, a^{n-1} = \underbrace{a \cdot a \cdot a \cdot \dots \cdot a}_{n-1}, so this takes n-2 multiplication steps, right? First we compute a^2, then a^3, then a^4 … we multiply by a (and reduce \pmod n) at each step.

However, if n is big this could take quite a while! The number of multiplications grows linearly with n, that is, it grows exponentially with the size (number of bits/digits) of n—which is no better than trial division!

For example, assume we can do 10^8 multiplications per second (this is actually fairly realistic). Testing n = 10^{10} + 19—only an 11-digit number—requires computing a^{n-1}, which would take 10^{10} + 17 multiplications. At only 10^8 multiplications per second, this would take about 100 seconds—and we have to repeat this k times! If n actually had hundreds of digits instead of just 11, then this would take way longer than the estimated age of the universe.

But there is a better way! Let’s start with something simple: how many multiplications does it take to compute a^4? Again, you might think it takes three (a \cdot a = a^2, a^2 \cdot a = a^3, a^3 \cdot a = a^4) but it can be done with only two multiplications; can you see how?

The secret is that we don’t have to multiply by a every time. Once we have computed bigger powers of a, we can use them to build up even bigger powers faster. In the specific example of computing a^4, we first compute a^2 = a \cdot a; but now that we have a^2, we need only one more multiplication: we just square a^2 to compute a^2 \cdot a^2 = a^4. With yet one more multiplication, we could then square a^4, getting a^4 \cdot a^4 = a^8, and so on.
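
Here’s a sketch of that squaring idea in Python (the function name pow_2k is mine): it computes a^{2^k} \bmod n using only k multiplications.

    def pow_2k(a, k, n):
        """Compute a^(2^k) mod n using only k squarings (sketch)."""
        result = a % n
        for _ in range(k):
            result = (result * result) % n   # square, then reduce
        return result

    print(pow_2k(7, 3, 11))   # 7^8 mod 11 = 9, matching a_8 in the example above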

So that quickly gets us exponents that are themselves powers of two. What about other exponents? This post is long enough so I think I’ll leave this for the next post. But if you’ve never seen this before I encourage you to think about how this would work. For example, how would you most efficiently compute a^{11}? Can you come up with an algorithm for computing a^n in general?
