In my previous post I explained the Fermat primality test:

- Input:
- Repeat times:
- Randomly choose .
- If , stop and output COMPOSITE.

- Output PROBABLY PRIME.

In future posts I’ll discuss how well this works, things to worry about, and so on. But first, I realized there’s one important piece to discuss first: *How do we compute *? This is obviously a key component of the above algorithm—and many related algorithms—and the fact that it can actually be done quite efficiently is what makes these algorithms practically useful. But if you’ve never thought about this before, it’s probably not obvious how to compute efficiently.

The obvious, “naive” way is to just do what it literally says: multiply by itself times, and then reduce the result . However, this has two very big problems; we’ll talk about each and then see how to fix them.

**Problem 1**: might be *very big*!

For example, suppose we want to test to see whether it is prime. (This is a pretty “small” number as far as testing for primality goes!) Say we choose (I actually generated at random using my computer). But then has *almost 54 thousand digits*! (I computed this as ; since is very close to , this is very close to .) It takes about 10 bits to store every three decimal digits (since is about the same as ), so a 54 thousand-digit number would require about to store, about the size of a small image. Such a number would take a while to compute, probably longer than simply trying all the possible divisors of . And if we want to test numbers with hundreds of digits, we would be completely out of luck.

Of course, it’s not really we want, but . Thankfully, we don’t actually have compute and then reduce it at the very end. The key is that taking remainders “commutes” with multiplication. That is, it doesn’t matter whether you multiply first or take remainders first:

So instead of waiting until the very end to reduce , we can reduce after each multiplication. For example, we could compute , and then , and so on. Much better! Now instead of ending up with a monstrosity with thousands, millions, *etc.* of digits, the intermediate numbers we have to deal with never get bigger than about , which is quite reasonable.

As a simple example, just to show that this really works, let’s pick , . Directly computing yields , which leaves a remainder of when divided by . On the other hand, we could first compute , then , and so on: , , , , , , and . At each step we multiply by and then reduce . The numbers we have to deal with are never bigger than . And sure enough, we still end up with .

However, there’s still another problem:

**Problem 2**: computing naively takes too many steps!

How many multiplication operations does it take to compute ? Well, , so this takes multiplication steps, right? First we compute , then , then … we multiply by (and reduce ) at each step.

However, if is big this could take quite a while! The number of multiplications grows *linearly* with , that is, it grows *exponentially* with the *size* (number of bits/digits) of —which is no better than trial division!

For example, assume we can do multiplications per second (this is actually fairly realistic). Testing —only an -digit number—requires computing , which would take multiplications. At only multiplications per second, this would take about seconds—and we have to repeat this times! If actually had hundreds of digits instead of just , then this would take way longer than the estimated age of the universe.

But there is a better way! Let’s start with something simple: how many multiplications does it take to compute ? Again, you might think it takes three (, , ) but it can be done with only two multiplications; can you see how?

The secret is that we don’t have to multiply by every time. Once we have computed bigger powers of , we can use them to build up even bigger powers faster. In the specific example of computing , we first compute ; but now that we have , we need only one more multiplication: we just square to compute . With yet one more multiplication, we could then square , getting , and so on.

So that quickly gets us exponents that are themselves powers of two. What about other exponents? This post is long enough so I think I’ll leave this for the next post. But if you’ve never seen this before I encourage you to think about how this would work. For example, how would you most efficiently compute ? Can you come up with an algorithm for computing in general?