Test for divisibility by all primes up to 13, and also 19. (In practice I test for 2 and 5 first, which is pretty much automatic; then for 3 and 11, which both involve adding and subtracting digits; then 7 and 19, which both involve multiplying the final digit by two and either subtracting (7) or adding (19); and last I test for 13.)
Check if the number is one of a memorized list of 18 composites less than 1000 which are not divisible by either 19 or any prime up to 13. (In practice, of course, I do this check first, before doing any divisibility tests.)
Using this method, presented with a random number from 1-1000 not divisible by 2 or 5 (I exclude those because I can pretty much tell they are not prime instantaneously, and I think it makes for a more interesting measurement to exclude them), right now it takes me on average 15 seconds to determine whether such a number is prime or not. Of course the variance is huge: numbers that are divisible by 3 I can identify in a second; for numbers that are actually prime, it can take me something like 40 seconds to run through all the divisibility tests in my head. I expect with practice I’ll get faster.
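The whole procedure can be sketched in code (a sketch only, with names of my own choosing; the 18 exceptional composites are the ones discussed in this post):

```python
# The 18 composites under 1000 with no prime factor <= 13 and not divisible by 19.
EXCEPTIONS = {289, 391, 493, 527, 529, 629, 667, 697, 713, 731,
              799, 841, 851, 899, 901, 943, 961, 989}

def is_prime_under_1000(n):
    """Primality test for 2 <= n < 1000: trial-divide by the small
    primes, then consult the memorized list of exceptions."""
    for p in (2, 3, 5, 7, 11, 13, 19):
        if n == p:
            return True
        if n % p == 0:
            return False
    # Survivors are prime unless they are one of the memorized composites.
    return n not in EXCEPTIONS
```

Anything under 1000 that passes the seven divisibility tests and is not on the memorized list must be prime.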
I’m still interested in exploring other methods as well, but figured this is something relatively simple that can provide a good baseline for comparison!
For the rest of the post I want to talk about how I memorized the list of exceptional composites. Here’s the list again, this time with prime factorizations given: 289 = 17 · 17, 391 = 17 · 23, 493 = 17 · 29, 527 = 17 · 31, 529 = 23 · 23, 629 = 17 · 37, 667 = 23 · 29, 697 = 17 · 41, 713 = 23 · 31, 731 = 17 · 43, 799 = 17 · 47, 841 = 29 · 29, 851 = 23 · 37, 899 = 29 · 31, 901 = 17 · 53, 943 = 23 · 41, 961 = 31 · 31, 989 = 23 · 43.
That looks like a lot of stuff to memorize. And it would be, if you tried to memorize it as a bunch of individual, disconnected facts. But fortunately we can do better! Human brains are good at remembering sequences and stories. So we’re going to look at the numbers in order and tell ourselves stories about them. The better we get to know them and their relationships the easier they are to remember!
There is only one exceptional composite in each of the 200’s, 300’s, and 400’s, namely, 289, 391, and 493. What strikes one immediately about these numbers is that each is exactly 102 more than the previous. Is this a coincidence?
Of course not! It is actually due to the fact that 102 = 6 · 17. Each of these numbers is a product of 17 with some other prime, and the second prime increases by 6 every time. That is, 289 = 17 · 17, then 391 = 17 · 23, and 493 = 17 · 29. Of course adding 6 to a prime doesn’t always get us another prime—but it works surprisingly often for smaller primes. And every prime besides 2 and 3 is either one more than a multiple of 6, or one less. So if we start with 5 and 7 and keep adding 6, we will definitely hit all the primes.
This sequence of multiples of 17 starts from 17 · 17 = 289, and if we continue it further we see that it contains several more numbers from our exceptional set as well: 289, 391, 493, 595, 697, 799, 901.
What about if we start with 17 · 19 = 323 and keep adding 102?
This sequence yields three of our exceptional composites (527, 629, and 731), and quite a few others which in theory we can rule out with our divisibility tests but are probably worth knowing anyway (323 = 17 · 19, 425 = 17 · 25, 833 = 17 · 49, and 935 = 17 · 55).
There are only two exceptional composites in the 500’s, and they are twins: 527 and 529. 527 we have already seen: it shows up in the sequence of multiples of 17 that begins with 323. On the other hand 529 is 23 · 23.
The next exceptional composite is 629, the next multiple of 17 in the sequence after 527. Of course it is also exactly 100 bigger than 529. I personally find the sequence 527, 529, 629 to be quite memorable.
Next is 667 = 23 · 29, which is the closest integer to two-thirds of 1000. If you know that 26^2 = 676, then 667 is also easy to remember for two reasons: it has the same digits as 676 but with the 7 and 6 reversed, and it is exactly 9 less than 676, and hence it is 26^2 − 3^2 = (26 − 3)(26 + 3) = 23 · 29.
The last exceptional composite in the 600’s is 697 = 17 · 41, which is from the sequence of multiples of 17 that started with 289, 391, 493 (595 is skipped because it is obviously a multiple of 5).
Next come a pair of twins, a 99, then another pair of twins, then another 99! 713 and 731 are twins because they have the same digits, with the 1 and 3 reversed. 731 we have already seen: it is 17 · 43, from the same 17-sequence as 527 and 629. 713 is 23 · 31. 799 = 17 · 47 is again from the 17-sequence beginning with 289.
841 and 851 are twins because they have the same digits except the 4 and the 5, which are consecutive. 841 is 29 · 29, and 851 is 23 · 37. Finally we have 899 which is 30^2 − 1 = (30 − 1)(30 + 1) = 29 · 31.
I haven’t thought of a nice story to tell about these—I think of the last three as sort of “sporadic”—but there are only three of them, so it’s not that hard. Someone else could probably come up with a nice mnemonic.
901 in any case is not too hard to remember because it’s a twin with 899, and it’s also the end of the 17-sequence that started with 289.
943 is 23 · 41. It’s also 32^2 − 9^2, but unlike some of the other differences of squares I’ve highlighted, I doubt this will actually help me remember it.
961 is 31 · 31. I think it’s cute that 169, 196, and 961 are all perfect squares.
Last but not least, 989 = 23 · 43. If you happen to know that 33^2 = 1089 (I sure don’t!), then this is easy to remember as 1089 − 100 = 33^2 − 10^2 = (33 − 10)(33 + 10).
Repetition helps too, so let’s recite: it starts with 289 = 17 · 17, then continues by 102s: 391, 493. After that the twins 527, 529, followed by 629; then 667 and 697. Then two sets of twins each with its 99: 713, 731, 799; 841, 851, 899; then 901 to come after 899, and then the three sporadic values: 943, 961, 989!
In any case, today I want to return to the problem of quickly recognizing small primes. In my previous post we considered “small” to mean “less than 100”. Today we’ll kick it up a notch and consider recognizing primes less than 1000. I want to start by considering some simple approaches and see how far we can push them. In future posts we’ll consider some fancier things.
First, some divisibility tests! We already know how to test for divisibility by 2, 3, and 5. Let’s see rules for 7, 11, and 13.
To test for divisibility by 7, take the last digit, chop it off, and subtract double that digit from the rest of the number. Keep doing this until you get something which obviously either is or isn’t divisible by 7. For example, if we take 862, we first chop off the final 2; double it is 4, and subtracting 4 from 86 leaves 82. Subtracting twice 2 from 8 yields 4, which is not divisible by 7; hence neither is 862.
As an optimization, we can always reduce things mod 7. For example, if we see the digit 7, we can just throw it away; if we see an 8 or 9 we can treat it as 1 or 2, respectively. And if we see a 3, the rule would tell us to subtract 6, but if it’s easier we can add 1 instead, since subtracting 6 and adding 1 are the same mod 7. With a bit of practice this can be done quite quickly.
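The chop-and-subtract rule is easy to mechanize; here is a sketch (the function name is mine, and it skips the mod-7 shortcuts, which only matter for mental arithmetic):

```python
def divisible_by_7(n):
    """Chop off the last digit and subtract double it from the rest,
    until the result is small enough to judge directly."""
    n = abs(n)
    while n >= 70:               # keep reducing until the answer is obvious
        n, d = divmod(n, 10)     # n = rest of number, d = last digit
        n = abs(n - 2 * d)       # the intermediate value can dip negative
    return n % 7 == 0
```

Each step preserves divisibility by 7 because multiplying 10n + d by −2 gives −20n − 2d, which is congruent to n − 2d mod 7.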
For an explanation of why this works, and several other fun methods for testing divisibility by 7, see this post by Mark Dominus.
To test a 3-digit number for divisibility by 11, just add the first and last digits and then subtract the middle digit. The original number is divisible by 11 if and only if the result is.
This is especially obvious with numbers like 341, where the sum of the first and last digits is equal to the middle digit. (Subtracting the middle digit would leave 0, which is divisible by 11.) But it also applies in cases like 825: we have 8 + 5 − 2 = 11.
The reason this works is that 10 is equivalent to −1 mod 11, so 100a + 10b + c is equivalent to a − b + c mod 11. This also suggests how to generalize to more than just 3-digit numbers: just alternately add and subtract digits.
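The generalized alternating-sum version can be sketched like so (name mine):

```python
def divisible_by_11(n):
    """Alternately add and subtract the digits, starting from the
    rightmost; 11 divides n iff it divides the alternating sum."""
    total, sign = 0, 1
    while n > 0:
        n, d = divmod(n, 10)
        total += sign * d
        sign = -sign             # flip the sign for the next digit
    return total % 11 == 0
```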
To test for divisibility by 13, chop off the last digit, multiply it by 4, and add it to the remaining number. Keep doing this until you end up with something that you know either is or isn’t divisible by 13.
Here reducing mod 13 can be even more helpful. For example, if the last digit is a 7, the rule says to add 28 to what’s left. But 28 is only 2 more than 26, so adding 28 is equivalent to just adding 2.
Why does this work? Suppose the final digit of our number is d and the rest of the number is n. That is, our number is of the form 10n + d, and we want to know whether this is equivalent to 0 mod 13. But now note that 10n + d ≡ 0 (mod 13) if and only if 40n + 4d ≡ 0 (mod 13). Why? From left to right, we are just multiplying both sides by 4; from right to left, we are allowed to divide by 4 since 4 is relatively prime to 13. So why did we choose to multiply by 4? It’s because it lets us get rid of the 10: 40 is the smallest multiple of 10 which is one away from a multiple of 13. Hence 10n + d ≡ 0 iff 40n + 4d ≡ 0 iff n + 4d ≡ 0 (mod 13).
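In code, the chop-multiply-add rule looks like this (a sketch; the cutoff of 40 just keeps the loop shrinking):

```python
def divisible_by_13(n):
    """Chop off the last digit, multiply it by 4, and add it back:
    10n + d is divisible by 13 iff n + 4d is."""
    while n >= 40:
        n, d = divmod(n, 10)
        n = n + 4 * d
    return n % 13 == 0
```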
(Challenge: can you go back now and prove the test for divisibility by 7?)
At this point we might ask: if we take a number less than 1000 and test it for divisibility by 2, 3, 5, 7, 11, and 13, what’s left? In other words, what are the composite numbers under 1000 that we haven’t found yet? It turns out there are 27 of them: 289, 323, 361, 391, 437, 493, 527, 529, 551, 589, 629, 667, 697, 703, 713, 731, 779, 799, 817, 841, 851, 893, 899, 901, 943, 961, and 989. I’ll let you work out the factorizations; of course each one is a product of two primes which are at least 17.
So we could try to memorize this list and call it a day. Then the procedure becomes: given a number less than 1000, (1) test it for divisibility by all primes up to 13, and (2) check if it is one of the 27 composite numbers we have memorized. If it passes both tests, then it is prime. This sounds doable, though honestly I’m not super excited about memorizing a list of 27 composites.
There are a few more things we could do, though. First of all, notice that the divisibility test for 19 is super easy, since 19 is one less than 2 times 10: chop off the last digit, double it, and add it to the rest. Keep doing this until… you know the drill. This is just like the test for 7, but we add instead of subtract.
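The test for 19 in code, mirroring the test for 7 but with addition (name mine):

```python
def divisible_by_19(n):
    """Chop off the last digit, double it, and add it back:
    since 20 = 19 + 1, 10n + d is divisible by 19 iff n + 2d is."""
    while n >= 100:
        n, d = divmod(n, 10)
        n = n + 2 * d
    return n % 19 == 0
```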
OK, so what if we test for all primes up to 13 and also 19? Then there are only 18 composites left that we have to memorize: 289, 391, 493, 527, 529, 629, 667, 697, 713, 731, 799, 841, 851, 899, 901, 943, 961, and 989. This is looking a bit better, and I am already noticing lots of patterns that would help with memorization: 529 and 629; 713 and 731; 899 and 901… oh, and 289, 391, 493, which are each 102 apart (since 102 is 6 · 17). (…and it turns out that before publishing this post I couldn’t help myself and went ahead and memorized the list. It wasn’t that hard. I’ll say more about it in a future post!)
We could also test for divisibility by 17, of course. Unfortunately it is a bit more annoying: the smallest multiple of 10 which is one away from a multiple of 17 is 50, which is one less than 51 = 3 · 17. So, to test for divisibility by 17, we chop off the last digit, multiply it by 5, and subtract. This seems distinctly harder to do in my head than the other tests, because it seems to actually require dealing with two-digit numbers. If we do this, though, we are down to only 9 composites to memorize, which is not bad at all: 529, 667, 713, 841, 851, 899, 943, 961, 989.
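All three leftover lists can be double-checked with a quick computation (a sketch; helper name mine):

```python
def smallest_factor(n):
    """Smallest prime factor of n (or n itself if n is prime)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n

# Composites under 1000 with no prime factor <= 13: the 27 to memorize.
survivors = [n for n in range(2, 1000)
             if smallest_factor(n) != n
             and smallest_factor(n) not in (2, 3, 5, 7, 11, 13)]

# Also ruling out multiples of 19 leaves 18 of them...
after19 = [n for n in survivors if n % 19 != 0]
# ...and additionally ruling out multiples of 17 leaves just 9.
after17 = [n for n in after19 if n % 17 != 0]
```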
As a warmup, today I’ll write about how I determine whether a number less than 100 is prime: I don’t have them memorized, but can very quickly decide whether such a number is prime or not—and you can, too! This is a situation where doing a little mathematical analysis beforehand goes a long way.
Since 10 · 10 = 100, every composite number less than 100 has at least one prime factor which is less than 10. This means that every composite less than 100 is divisible by 2, 3, 5, or 7. Multiples of 2, 3, and 5 are relatively easy to recognize, and as we’ll see, 7 is not hard to deal with either.
Any even number or multiple of 5 (i.e. numbers ending with an even digit or with a 5) is clearly composite. (Other than 2 and 5 themselves, of course.)
Multiples of 3 are composite. There are many numbers which I think of as “obvious” multiples of three: 9, 21, 27, and 33 because I have had their factorizations memorized since third grade; 81 because it’s a power of three so I have it memorized too; and 39, 63, 69, 93, and 99 because they consist of two digits each of which is a multiple of three. As you probably know, there is also a simple test for determining divisibility by three: just add the digits and see whether the result is divisible by three. This test identifies a few more “nonobvious” multiples of three which you might otherwise think are prime: 51, 57, and 87.
What’s left? There are only three numbers less than 100 which are composite and not divisible by 2, 3, or 5: 49 = 7 · 7, 77 = 7 · 11, and 91 = 7 · 13.
So, to sum up, faced with a number less than 100 that I want to test for primality, I can quickly rule it out if it is divisible by 2, 3, or 5, or if it is a multiple of 7 I recognize (49, 77, or 91). And that’s it! Anything else has to be prime.
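The whole under-100 procedure fits in a few lines (a sketch; the name is mine):

```python
def is_prime_under_100(n):
    """For 2 <= n < 100: rule out multiples of 2, 3, and 5,
    then the three memorized multiples of 7."""
    if n in (2, 3, 5):
        return True
    if n % 2 == 0 or n % 3 == 0 or n % 5 == 0:
        return False
    return n not in (49, 77, 91)
```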
In a future post I plan to write about how feasible it is to come up with a similar procedure to identify all primes under 1000.
We started with Fermat’s Little Theorem and Euler’s Theorem, which form the basis for a lot of primality testing algorithms, with three different proofs of FlT and a proof of Euler’s Theorem.
We then detoured a bit to talk about some hypothetical machines and how fast they are.
Finally I got around to presenting the Fermat primality test, which is directly based on FlT.
For the Fermat test to make sense I realized we had to talk about modular exponentiation and how to do it by repeated squaring, which led to a Post Without Words and a whole tangent on its efficiency.
Next up: first of all, there’s still quite a bit more to say about the Fermat primality test. After that I plan to present some better/more efficient tests as well (at least Miller-Rabin and Baillie-PSW, possibly others). And knowing me there will be at least three more tangents along the way!
Today I want to explain another nice proof, written in a comment by an anonymous commenter. So although this proof is not originally due to me, I thought it deserved to be written up more fully and not squished into a comment, and I’ve changed it in a few minor ways which I think make it easier to understand (although perhaps not everyone will agree!).
Let M(n) denote the minimum number of doubling and incrementing steps needed to generate n, and let B(n) denote the number of steps used by the binary algorithm. Note that M(n) ≤ B(n) for all n: if the binary algorithm uses B(n) steps, then the optimal number of steps can’t be any higher.
Now, suppose that the binary algorithm (call it algorithm B) isn’t the most efficient algorithm (our goal will be to derive a contradiction from this assumption). That means there exist values of n for which M(n) < B(n). Let m be the smallest such n, so we must have M(k) = B(k) for all k < m.
First, note that m > 0: B uses zero steps for 0, which is obviously optimal. Now, let’s think about the parity of m:
If m is odd, then the last step of any algorithm to compute m has to be incrementing, since that’s the only way to get an odd number—if the last step is doubling then the result would be even. Removing this last incrementing step from the sequence generated by algorithm B results in a sequence of B(m) − 1 steps which yields m − 1; note that B(m − 1) = B(m) − 1, since the binary sequence for m is just the binary sequence for m − 1 followed by an increment. Since M(m) < B(m), there must be some other sequence of length M(m) that yields m, but since m is odd it must also end in an increment, so likewise we can delete the final increment step to get a sequence of M(m) − 1 steps which yields m − 1. But M(m) − 1 < B(m) − 1 = B(m − 1), so algorithm B is not optimal for m − 1—contradicting our assumption that m is the smallest number for which B is not optimal.
Put more succinctly: if we can generate an odd number m more efficiently than algorithm B, then we can also generate m − 1 more efficiently, so the smallest non-optimal m can’t be odd.
So suppose m is even, say m = 2k. We know the last step of B is doubling in this case, since the binary representation of m ends in a 0. Let A be a sequence of length M(m) that generates m. If the last step of A is also doubling, then deleting it leaves a sequence of M(m) − 1 steps generating k; since M(m) − 1 < B(m) − 1 = B(k), this would contradict the minimality of m.
So suppose the last step of A is incrementing. Since the binary sequence for m is the same as the sequence for k followed by a doubling step, B(m) = B(k) + 1. This in turn is equal to M(k) + 1, since we assumed that M(j) = B(j) for any j < m. So we have B(m) = M(k) + 1.
On the other hand, since the last step of the optimal sequence A for m is an increment, we have M(m) = M(m − 1) + 1 (since A is an optimal sequence for m if and only if A without the final increment is an optimal sequence for m − 1). This is equal to B(m − 1) + 1, since M and B are equal on everything less than m. Since m − 1 is odd, the binary algorithm sequence for m − 1 ends with a double followed by an increment, hence B(m − 1) = B(k − 1) + 2, and so M(m) = B(k − 1) + 3 = M(k − 1) + 3.
Putting this all together, we have M(k − 1) + 3 = M(m) < B(m) = M(k) + 1, which means M(k) > M(k − 1) + 2. But this is absurd: there’s no way the optimal sequence for k takes more than two extra steps beyond the optimal sequence for k − 1, because we could just add a single increment to an optimal sequence for k − 1.
So we have shown that all these cases lead to absurdity: the conclusion is that there can’t be any n with M(n) < B(n): the binary algorithm is optimal for every n!
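We can also sanity-check the theorem by brute force for small n: compute the true minimum step count with a breadth-first search and compare it to the binary algorithm’s count (a sketch; M and B here mirror the step counts in the proof, and the names are mine):

```python
from collections import deque

def M(n):
    """Minimum number of doubling/incrementing steps to reach n from 0,
    found by breadth-first search over values <= n."""
    dist = {0: 0}
    q = deque([0])
    while n not in dist:
        x = q.popleft()
        for y in (2 * x, x + 1):
            if y <= n and y not in dist:
                dist[y] = dist[x] + 1
                q.append(y)
    return dist[n]

def B(n):
    """Steps used by the binary algorithm: one step per bit,
    plus one extra step for each 1 bit after the first."""
    if n == 0:
        return 0
    bits = bin(n)[2:]
    return len(bits) + bits.count("1") - 1
```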
The binary algorithm is the most efficient way to build n using only doubling and incrementing steps. That is, any other way to build n by doubling and incrementing uses an equal or greater number of steps than the binary algorithm.
Someone posted a very nice, relatively short proof in the comments, which was quite different from the proof I had in mind. Maybe I’ll write about it in another post, but for now you can go read it for yourself.
In this post I’d like to present the proof I had in mind, which has a more constructive/computational flavor. Let’s use the digit 1 to represent an increment step, and the digit 0 to represent a doubling step. We can use a sequence of these digits to compactly represent a sequence of steps. Each sequence of 0’s and 1’s thus corresponds to a natural number, namely, the one we get if we start from zero and execute all the steps from left to right. Let’s denote this number by N(s). So, for example, N(1001101) = 13, since starting from 0, we first increment (yielding 1), then double twice (yielding 4), increment twice (6), double (12), and finally increment (13). Also, denote the length of s by |s|. The length of s is the same as the number of steps it represents. For example, |1001101| = 7.
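A tiny interpreter for these step strings (a sketch; I call it value, playing the role of N(s) in the text):

```python
def value(s):
    """Interpret a string of steps, '1' = increment and '0' = double,
    starting from 0 and reading left to right."""
    n = 0
    for c in s:
        n = n + 1 if c == "1" else 2 * n
    return n
```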
Now, it turns out that 1001101 is not the most efficient way to generate 13. In looking at this and other examples of such non-optimal sequences, what stuck out to me is that they all seemed to contain consecutive increment operations. For example, in the case of 1001101, consecutive 1’s are used in the middle to go from 4 to 6, after doubling to get 4. But if we had just incremented once first (going from 2 to 3), then doubling would get us to 6 right away; the effect of the one increment is “magnified” by the subsequent doubling. Formally, we can say that 011 (a double followed by two increments) can always be replaced by 10 (an increment followed by a double), because 2x + 2 = 2(x + 1).
What if we keep doing this operation—replacing 011 with 10—as much as we can? It turns out that by doing this we can always transform any sequence into an equivalent one the same length or shorter which has no consecutive 1’s. This is the idea behind the first lemma.
Lemma 1. Given any sequence s, there exists a sequence s′ such that N(s′) = N(s), |s′| ≤ |s|, and s′ has no consecutive 1’s.
In other words, for any sequence s we can always find an equivalent, shorter (or at least not longer) sequence with no consecutive 1’s.
Proof. By induction on the number of 1’s in s.
Let k ≥ 0 and suppose we know that for any sequence t with exactly k 1’s there exists an equivalent t′ with the stated properties. Now suppose we have a sequence s with exactly k + 1 1’s. If none are adjacent then we are done, since s itself has the required properties. Otherwise, consider the leftmost pair of consecutive 1’s. There are two cases to consider:
If s begins with two 1’s, this means that the procedure for computing N(s) starts by incrementing twice from 0 to reach 2. Let u be the sequence obtained from s by replacing the initial 11 with 10. An increment followed by a double also yields 2, and the rest of u is identical to s, so N(u) = N(s). But u has one fewer 1 than s, so the induction hypothesis tells us there must be some equivalent u′ no longer than u with no consecutive 1’s. This is the one we are looking for; we need only observe that N(u′) = N(s) and |u′| ≤ |u| = |s|.
Otherwise, the leftmost pair of 1’s must occur immediately following a 0. In that case, as argued before, we can replace 011 by 10, which yields an equivalent but shorter sequence and reduces the number of 1’s by one. Again, the induction hypothesis now implies that we can find an equivalent sequence with no repeated 1’s which is no longer.
Let’s look at an example. Suppose we start with the most inefficient way possible to represent 13, namely, thirteen consecutive increment steps: 1111111111111. Then the lemma above says we can find an equivalent, shorter sequence with no consecutive 1’s. Moreover, the proof is actually constructive, that is, it does not merely assert that such a sequence exists, but gives us a concrete way to compute it. The proof outlines a recursive rewriting process which we can visualize as follows:
The red underline shows the part that is going to be rewritten at each step. Notice how the length either stays the same or decreases at each step, and how the number of 1’s decreases at each step. In fact, either a 1 gets deleted and the number of 0’s stays the same, or a 1 is replaced by a 0 when there are two 1’s at the leftmost position. This is the only way new 0’s are generated. So 0’s are “born” at the left end from a pair of 1’s, and then they spend the rest of their life slowly “migrating” to the right.
Let’s try starting with a different representation of 13, say, 1101101 (which reaches 13 via 1, 2, 4, 5, 6, 12, 13):
Curiously, it seems that the process ends with the same sequence (101001) even though we started with a different sequence that did not occur anywhere during the process generated by starting with thirteen 1’s. Could this just be a coincidence?
Well, the fact that I’m even asking the question kind of gives away the answer: it’s no coincidence. In fact, for a given , if you start with any sequence representing and run this process, you will always end with the same sequence at the end. And not only will this particular process always yield the same sequence, but there is only one possible sequence it could yield:
Lemma 2. For every natural number n, there is a unique sequence s such that s has no consecutive 1’s and N(s) = n.
Proof. By (strong) induction on n.
In the base case (n = 0), there is only one sequence which represents 0 at all, namely, the empty sequence (and it has no consecutive 1’s).
Now pick some n > 0 and assume that consecutive-1-less representations are unique for all m < n. Suppose s and t are two sequences with no consecutive 1’s such that N(s) = N(t) = n. Our goal is to show that in fact s = t.
If s and t both end with the same symbol, then removing it yields two consecutive-1-less sequences representing some m < n (either n − 1 or n/2 depending on the symbol removed); by the induction hypothesis they must be the same and hence s = t as well.
Otherwise, suppose without loss of generality that s ends with 0 and t ends with 1; we will show that this case is actually impossible. N(s) = n must be even, since s ends with a doubling step. If t ends with a doubling step followed by an increment step (or if it consists solely of an increment), then N(t) would be odd, but this is impossible since N(t) = N(s) is even. Hence t must end in consecutive 1’s; but this is also impossible since we assumed t had no consecutive 1’s.
Finally, putting the pieces together: notice that the sequence generated from the binary representation of n has no consecutive 1’s, since each bit always generates a 0, which is optionally followed by a 1. Since by Lemma 2 we now know that such representations are unique, the process explained in the proof of Lemma 1 must actually result in this same unique sequence corresponding to the binary representation of n. Since this process results in a sequence which is no longer than the starting sequence, this means that every sequence of steps representing n must be at least as long as the binary sequence. Hence, the binary sequence is the most efficient!
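The rewriting process from the proof of Lemma 1 can be played out in code (a sketch; normalize is my name, and it applies the two rewrites—an initial 11 becomes 10, and any 011 becomes 10—until no two 1’s are adjacent):

```python
def normalize(s):
    """Apply the Lemma 1 rewrites until no two 1's are adjacent.
    Each rewrite preserves the represented number and removes one 1,
    so the loop terminates."""
    while "11" in s:
        if s.startswith("11"):
            s = "10" + s[2:]            # two increments -> increment, double
        else:
            i = s.index("11") - 1       # the leftmost pair must follow a '0'
            s = s[:i] + "10" + s[i + 3:]  # replace '011' with '10'
    return s
```

By Lemma 2 the result is the unique consecutive-1-free representation, i.e. the binary algorithm’s sequence.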
Each row has two open rectangles exactly the same length as the previous row; this represents squaring, that is, multiplying the exponent by two. Some rows also have an extra dark square at the end, which represents multiplying by a, that is, adding 1 to the exponent. You can read off the binary representation of the final exponent by reading from top to bottom: a filled-in square represents a 1, and no filled-in square represents a 0. In the case of 13 above, we can see that the binary representation of 13 is 1101.
Commenter Steven G made a very interesting guess, that the images represented the most efficient way to form each integer using only doubling and adding 1. This seems plausible, but I was not at all sure. There are lots of ways to build a given integer by doubling and adding 1. For example, we can get 3 by adding 1 three times; or by adding 1, then doubling, then adding 1. We can get 6 by adding 1, doubling, adding 1, and doubling; or by adding 1, doubling twice, and then adding 1 twice. For certain numbers, might there be some weird clever way to build them more efficiently than the algorithm corresponding to their binary representation?
Claim: the binary algorithm is the most efficient way to build n using only doubling and incrementing steps. That is, any other way to build n by doubling and incrementing uses an equal or greater number of steps than the binary algorithm.
Let’s make this precise:
“One step” refers to either a doubling step or an incrementing step.
The binary algorithm is as follows: start with 0; reading the binary representation of n from left to right, double the current number and add 1 (two steps) every time you encounter a 1 bit, and only double (one step) every time you encounter a 0 bit—except that for the initial 1 bit you simply add one, instead of doing a useless doubling first (it’s pointless to double zero).
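The binary algorithm can be sketched directly from that description (name mine; it returns both the number built and the step count):

```python
def binary_build(n):
    """Build n from 0 by doubling/incrementing per its binary digits;
    returns (value_built, number_of_steps)."""
    bits = bin(n)[2:]
    x, steps = 0, 0
    for i, b in enumerate(bits):
        if i > 0:                         # no useless doubling of zero
            x, steps = 2 * x, steps + 1   # double for every bit after the first
        if b == "1":
            x, steps = x + 1, steps + 1   # increment for each 1 bit
    return x, steps
```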
Can you prove the claim? I think I have a nice proof, which I’ll share in a future post, but I’m interested to see what others come up with.
The question I left you with is whether we can use a similar technique to compute other powers a^n which are not themselves powers of two. The idea is that in addition to squaring, we can also multiply by another copy of a at strategic points. For example, suppose we want to compute a^26. We can do it like this: a → a^2 → a^3 → a^6 → a^12 → a^13 → a^26, squaring at each step and throwing in an extra multiplication by a at two strategic points (to get a^3 and a^13).
So how do we decide when to multiply by an extra copy of a? And can we get any exponent this way?
It becomes easier to answer these questions if we think recursively. Instead of starting with a and building up to a^n, let’s start with our goal of computing a^n and see how to break it down into subproblems we can solve.
So suppose we’re trying to compute a^n. There are two cases, depending on whether n is even or odd. If n is even, all we have to do is compute a^(n/2); then we can square it to get a^n. What about if n is odd? Then we can compute a^((n−1)/2); squaring it gets us a^(n−1); then multiplying by one more copy of a gets us a^n.
For example, let’s rethink our example with a^26 in this way. 26 is even, so our goal is to compute a^13; we can then square that to get a^26. So how do we compute a^13? Since 13 is odd, we can get it by squaring a^6, then multiplying by a one more time. To compute a^6, we want to square a^3; finally, to compute a^3 we square a^1 and multiply by another a. The base case of the recursion is when n = 1, at which point we can stop: a^1 is just a. In fact, we can use n = 0 as an even simpler base case: a^0 = 1. (Do you see how we will still get the right answer for a^1 even if n = 1 is not a base case?)
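The recursive breakdown translates almost word for word into code (a sketch; power is my name for it):

```python
def power(a, n):
    """Compute a**n by repeated squaring: halve the exponent,
    square the result, and multiply by a once more when n is odd."""
    if n == 0:
        return 1                      # base case: a^0 = 1
    half = power(a, n // 2)           # a^(n//2), computed recursively
    if n % 2 == 0:
        return half * half            # even: just square
    return half * half * a            # odd: square, then one more a
```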
As you may have already noticed, we can think of this in terms of the binary expansion of n. Whether n is even or odd is determined by its final bit: n is even when the final bit is 0, and odd when the final bit is 1. Computing n/2 (or (n−1)/2 when n is odd) corresponds to chopping off the final bit of n. So what we’re really doing is chopping off one bit of n at a time, squaring at each step and multiplying by an extra copy of a when we see a 1 bit. For example, 26 in binary is 11010. Since the final bit is 0, this tells us that we want to compute a^13 (13 being 1101 in binary) and then square it. Since the final bit is now 1, this in turn means that we want to compute a^6 (110 in binary), square it, and multiply by a; and so on. So here is an equivalent way to write our algorithm as a loop:
result = 1
for each bit of n from left to right:
result = result * result
if the current bit is 1:
result = result * a
Here’s a visualization of the operations needed to compute a^26 using the two different methods:
Obviously, using the repeated squaring approach requires a lot fewer multiplications! But how many does it take, exactly? The loop repeats once for every bit of n, and each time through the loop, we do either one or two multiplications. The number of bits in n is floor(log_2 n) + 1; so in the worst case we do twice that many multiplications, that is, 2(floor(log_2 n) + 1). (This worst case happens precisely when n is one less than a power of two, that is, when its binary expansion consists of all 1’s.)
Remember our example from last time? Suppose n = 10^10. Assuming that we can do 10^8 multiplications per second, computing a^n by repeated multiplication would take a minute and a half. So how about by repeated squaring? In that case it will take at worst
2 · 35 = 70 multiplications, since 10^10 < 2^35 has at most 35 bits: at most 35 squarings, each possibly followed by an extra multiplication by a. Wow! Computing a^(10^10) by repeatedly multiplying by a takes ten billion multiplication operations, but computing it by repeated squaring takes only thirty-five or so! (Actually it will take even less than the worst case, since the binary expansion of 10^10 does not consist of all 1’s—can you figure out exactly how many multiplications it will take?) I need hardly point out that if we can do 10^8 multiplications per second, doing 35 of them will take hardly any time at all (about 1/3 of a microsecond; in that amount of time, light travels only about 100 meters).
Obviously this makes a huge difference. And n = 10^10 is actually rather small—although a minute and a half is a long time compared to a third of a microsecond, waiting a minute and a half for a computation is quite doable. But what about something like n = 10^500? Computing a^n by repeated multiplication would require about 10^500 multiplications; at 10^8 per second this would take 10^492 seconds, which is vastly, unimaginably longer than the estimated age of the universe (which is “only” about 4 · 10^17 seconds). Even if you expanded every microsecond in the history of the universe to take an entire age of the universe—and then repeated this process 20 times—you still wouldn’t have enough time! But what about with repeated squaring? Well,
log_2(10^500) = 500 · log_2(10) ≈ 1700.
One thousand seven hundred multiplications, at 10^8 multiplications per second, would take… about 20 microseconds! 20 microseconds is, shall we say… a tiny bit faster than ages within ages within ages of the universe. This idea of doing repeated squaring—or, more generally, any algorithm where we get to repeatedly halve the size of the problem—is incredibly powerful!
(One thing I have been sweeping under the rug a bit is that not all multiplications take the same amount of time, so if you just want to compute a^n, the multiplications will take longer and longer as the numbers become larger; saying that we can do 10^8 multiplications per second is only true if the numbers involved are less than, say, 2^64. Not only that, but the results might be gigantic: for example, as far as we know there isn’t enough space in the entire universe to even write down the result of a^(10^500), even if we could somehow write one digit on each atom! Everything I’ve said is justified, however, by the fact that we actually want to compute something like a^n mod m: if we reduce mod m at each step, then all the multiplications really do take the same amount of time, and we don’t have to worry about the result getting astronomically large.)
In future posts I’ll discuss how well this works, things to worry about, and so on. But I realized there’s one important piece to discuss first: How do we compute a^(n−1) mod n? This is obviously a key component of the above algorithm—and many related algorithms—and the fact that it can actually be done quite efficiently is what makes these algorithms practically useful. But if you’ve never thought about this before, it’s probably not obvious how to compute a^(n−1) mod n efficiently.
The obvious, “naive” way is to just do what it literally says: multiply a by itself n − 2 times, and then reduce the result mod n. However, this has two very big problems; we’ll talk about each and then see how to fix them.
Problem 1: a^(n−1) might be very big!
For example, suppose we want to test n = 13499 to see whether it is prime. (This is a pretty “small” number as far as testing for primality goes!) Say we choose some four-digit a (I actually generated a at random using my computer). But then a^(n−1) has almost 54 thousand digits! (I computed this as roughly (n − 1) · log_10 a; since a is close to 10^4, this is close to 4 · 13498 ≈ 54000.) It takes about 10 bits to store every three decimal digits (since 10^3 is about the same as 2^10), so a 54 thousand-digit number would require about 22 kilobytes to store, about the size of a small image. Such a number would take a while to compute, probably longer than simply trying all the possible divisors of n. And if we want to test numbers with hundreds of digits, we would be completely out of luck.
Of course, it’s not really a^(n−1) we want, but a^(n−1) mod n. Thankfully, we don’t actually have to compute a^(n−1) and then reduce it at the very end. The key is that taking remainders “commutes” with multiplication. That is, it doesn’t matter whether you multiply first or take remainders first: (xy) mod n = ((x mod n)(y mod n)) mod n.
So instead of waiting until the very end to reduce mod n, we can reduce mod n after each multiplication. For example, we could compute a^2 mod n, and then multiply by a and reduce again to get a^3 mod n, and so on. Much better! Now instead of ending up with a monstrosity with thousands, millions, etc. of digits, the intermediate numbers we have to deal with never get bigger than about n^2, which is quite reasonable.
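A sketch of this reduce-as-you-go strategy (still the slow, linear-in-n loop; the speedup via squaring comes later in the post):

```python
def slow_modpow(a, n, m):
    """Compute a**n mod m by repeated multiplication, reducing mod m
    after every step so intermediates never exceed about m*m."""
    result = 1
    for _ in range(n):
        result = (result * a) % m
    return result
```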
As a simple example, just to show that this really works, let’s pick a = 2, n = 9. Directly computing 2^8 yields 256, which leaves a remainder of 4 when divided by 9. On the other hand, we could first compute 2^2 mod 9, then 2^3 mod 9, and so on: 4, 8, 7, 5, 1, 2, and 4. At each step we multiply by 2 and then reduce mod 9. The numbers we have to deal with are never bigger than 16. And sure enough, we still end up with 4.
However, there’s still another problem:
Problem 2: computing a^(n−1) naively takes too many steps!
How many multiplication operations does it take to compute a^(n−1)? Well, a^(n−1) is a · a · ⋯ · a with n − 1 copies of a, so this takes n − 2 multiplication steps, right? First we compute a^2, then a^3, then a^4… we multiply by a (and reduce mod n) at each step.
However, if n is big this could take quite a while! The number of multiplications grows linearly with n, that is, it grows exponentially with the size (number of bits/digits) of n—which is no better than trial division!
For example, assume we can do 10^8 multiplications per second (this is actually fairly realistic). Testing a number n near 10^10—only an 11-digit number—requires computing a^(n−1) mod n, which would take about 10^10 multiplications. At only 10^8 multiplications per second, this would take about 100 seconds—and we have to repeat this for every a we want to try! If n actually had hundreds of digits instead of just 11, then this would take way longer than the estimated age of the universe.
But there is a better way! Let’s start with something simple: how many multiplications does it take to compute a^4? Again, you might think it takes three (a^2, then a^3, then a^4) but it can be done with only two multiplications; can you see how?
The secret is that we don’t have to multiply by a every time. Once we have computed bigger powers of a, we can use them to build up even bigger powers faster. In the specific example of computing a^4, we first compute a^2 = a · a; but now that we have a^2, we need only one more multiplication: we just square a^2 to compute a^4. With yet one more multiplication, we could then square a^4, getting a^8, and so on.
So that quickly gets us exponents that are themselves powers of two. What about other exponents? This post is long enough so I think I’ll leave this for the next post. But if you’ve never seen this before I encourage you to think about how this would work. For example, how would you most efficiently compute a^26? Can you come up with an algorithm for computing a^n in general?