Quickly recognizing primes less than 100

Recently, Mark Dominus wrote about trying to memorize all the prime numbers under 1000. This is a cool idea, but it made me start thinking about an alternative: instead of memorizing primes, could we memorize a procedure for determining whether a number under 1000 is prime or composite? And can we make it clever enough to be done relatively quickly? This does tie into my other recent posts about primality testing, but to be clear, it’s also quite different: I am not talking about a general method for determining primality, but about the fastest method we can devise for mentally determining, by hook or by crook, whether a given number under 1000 is prime. Hopefully there are rules we can come up with which are valid for numbers less than 1000—and thus make them easier to test—even though they aren’t valid for bigger numbers in general.

As a warmup, today I’ll write about how I determine whether a number less than 100 is prime: I don’t have them memorized, but can very quickly decide whether such a number is prime or not—and you can, too! This is a situation where doing a little mathematical analysis beforehand goes a long way.

Since 10^2 = 100, every composite number less than 100 has at least one prime factor which is less than 10. This means that every composite less than 100 is divisible by 2, 3, 5, or 7. Multiples of 2, 3, and 5 are relatively easy to recognize, and as we’ll see, 7 is not hard to deal with either.

  1. Any even number or multiple of 5 (i.e. any number whose last digit is even or is a 5) is clearly composite. (Other than 2 and 5 themselves, of course.)

  2. Multiples of 3 are composite. There are many numbers which I think of as “obvious” multiples of three: 9, 21, 27, and 33 because I have had their factorizations memorized since third grade; 81 because it’s a power of three so I have it memorized too; and 39, 63, 69, 93, and 99 because they consist of two digits each of which is a multiple of three. As you probably know, there is also a simple test for determining divisibility by three: just add the digits and see whether the result is divisible by three. This test identifies a few more “nonobvious” multiples of three which you might otherwise think are prime: 51 = 3 \cdot 17, 57 = 3 \cdot 19, and 87 = 3 \cdot 29.

  3. What’s left? There are only three numbers less than 100 which are composite and not divisible by 2, 3, or 5:

    • 49 = 7 \cdot 7 and 77 = 7 \cdot 11 are easy to spot.
    • 91 = 7 \cdot 13 is the one I always used to forget about—I call it a “fool’s prime” because it “looks” prime but isn’t. It’s worth just memorizing the fact that it is composite (and its factorization).

So, to sum up, faced with a number less than 100 that I want to test for primality, I can quickly rule it out if it is divisible by 2, 3, or 5, or if it is a multiple of 7 I recognize (49, 77, or 91). And that’s it! Anything else has to be prime.
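
Here is a quick sketch of this procedure in Python (the function name is mine, just for illustration); by the argument above, checking divisibility by 2, 3, 5, and 7 is all we need below 100:

def is_prime_under_100(n):
    """Primality check for 2 <= n < 100: any composite in this range
    must be divisible by 2, 3, 5, or 7."""
    assert 2 <= n < 100
    for p in (2, 3, 5, 7):
        if n == p:
            return True
        if n % p == 0:
            return False
    return True

# Prints the 25 primes below 100.
print([n for n in range(2, 100) if is_prime_under_100(n)])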

In a future post I plan to write about how feasible it is to come up with a similar procedure to identify all primes under 1000.


Primality testing: recap

Whew, this is developing into one of the longest post series I’ve ever written (with quite a few tangents and detours along the way). I thought it would be worth taking a step back for a minute to recap what we’ve covered and where this is going.

Next up: first of all, there’s still quite a bit more to say about the Fermat primality test. After that I plan to present some better/more efficient tests as well (at least Miller-Rabin and Baillie-PSW, possibly others). And knowing me there will be at least three more tangents along the way!


Efficiency of repeated squaring: another proof

In my previous post I proved that the “binary algorithm” (corresponding to the binary expansion of a number n) is the most efficient way to build n using only doubling and incrementing steps.

Today I want to explain another nice proof, written in a comment by an anonymous1 commenter. Although this proof is not originally due to me, I thought it deserved to be written up more fully rather than squished into a comment, and I’ve changed it in a few minor ways which I think make it easier to understand (although perhaps not everyone will agree!).

Let \alpha(n) denote the minimum number of doubling and incrementing steps needed to generate n, and let \beta(n) denote the number of steps used by the binary algorithm. Note that \alpha(n) \leq \beta(n) for all n: if the binary algorithm uses \beta(n) steps, then the optimal number of steps can’t be any higher.

Now, suppose that the binary algorithm (call it algorithm B) isn’t the most efficient algorithm (our goal will be to derive a contradiction from this assumption). That means there exist values of n for which \alpha(n) < \beta(n). Let m be the smallest such n, so we must have \alpha(k) = \beta(k) for all k < m.

First, note that m > 0: B uses zero steps for n = 0 which is obviously optimal. Now, let’s think about the parity of m:

  • If m is odd, then the last step of any algorithm to compute m has to be incrementing, since that’s the only way to get an odd number—if the last step were doubling then the result would be even. Removing this last incrementing step from the sequence generated by algorithm B results in a sequence of \beta(m) - 1 steps which yields m-1; in fact, this is exactly the sequence algorithm B produces for m-1, so \beta(m-1) = \beta(m) - 1. Since \alpha(m) < \beta(m), there must be some other sequence of length \alpha(m) that yields m, but since m is odd it must also end in an increment, so likewise we can delete the final increment step to get a sequence of \alpha(m) - 1 steps which yields m-1. But \alpha(m) - 1 < \beta(m) - 1 = \beta(m-1), so algorithm B is not optimal for m-1—contradicting our assumption that m is the smallest number for which B is not optimal.

    Put more succinctly: if we can generate an odd number m more efficiently than algorithm B, then we can also generate m-1 more efficiently, so the smallest non-optimal m can’t be odd.

  • So suppose m is even, say m = 2k. We know the last step of B is doubling in this case, since the binary representation of m ends in a 0. Let A be a sequence of length \alpha(m) that generates m.

    • If the last step of A is also doubling, then removing the doubling step would give us a way to generate k = m/2 more efficiently than algorithm B (since algorithm B’s sequence for m/2 is also the same as the sequence for m without the final doubling step), again contradicting the minimality of m.
    • So suppose the last step of A is incrementing. Since the binary sequence for 2k is the same as the sequence for k followed by a doubling step, \beta(2k) = 1 + \beta(k). This in turn is equal to 1 + \alpha(k), since \alpha and \beta agree on everything less than m, and k < m. So we have \beta(2k) = 1 + \alpha(k).

      On the other hand, since the last step of the optimal sequence A for m = 2k is an increment, we have \alpha(2k) = 1 + \alpha(2k-1) (since A is an optimal sequence for 2k if and only if A without the final increment is an optimal sequence for 2k-1). This is equal to 1 + \beta(2k-1), since \alpha and \beta are equal on everything less than m, and 2k-1 < m. Since 2k-1 is odd (and at least 3: one can check directly that \alpha(2) = \beta(2) = 2, so m \neq 2), the binary algorithm sequence for 2k-1 ends with a double followed by an increment, hence 1 + \beta(2k-1) = 2 + \beta(2k-2) = 3 + \beta(k-1) = 3 + \alpha(k-1).

      Putting this all together, we have 1 + \alpha(k) = \beta(2k) > \alpha(2k) = 3 + \alpha(k-1), which means \alpha(k) > 2 + \alpha(k-1). But this is absurd: the optimal sequence for k cannot take more than two steps beyond the optimal sequence for k-1, because we could always just add a single increment to an optimal sequence for k-1, giving \alpha(k) \leq \alpha(k-1) + 1.

So we have shown that all these cases lead to absurdity: the conclusion is that there can’t be any such n where \alpha(n) < \beta(n): the binary algorithm is optimal for every n!
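
As a quick sanity check (this is my own Python, not part of the proof), we can compute \alpha(n) by dynamic programming, using the fact that the last step in reaching n is either an increment from n-1 or, when n is even, a doubling from n/2, and compare it against \beta(n) computed from the binary representation:

def beta(n):
    """Steps used by the binary algorithm: one doubling per bit after the
    leading 1, plus one increment per 1 bit."""
    if n == 0:
        return 0
    return n.bit_length() - 1 + bin(n).count("1")

def alpha_table(limit):
    """alpha[n] = minimum number of doubling/incrementing steps to reach n."""
    alpha = [0] * (limit + 1)
    for n in range(1, limit + 1):
        best = alpha[n - 1] + 1                  # last step: increment
        if n % 2 == 0:
            best = min(best, alpha[n // 2] + 1)  # last step: double
        alpha[n] = best
    return alpha

alpha = alpha_table(10000)
assert all(alpha[n] == beta(n) for n in range(10001))   # they agree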


  1. Well, unless their given name is actually John Doe.


Efficiency of repeated squaring: proof

My last post proposed a claim:

The binary algorithm is the most efficient way to build n using only doubling and incrementing steps. That is, any other way to build n by doubling and incrementing uses an equal or greater number of steps than the binary algorithm.

Someone posted a very nice, relatively short proof in the comments, which was quite different from the proof I had in mind. Maybe I’ll write about it in another post, but for now you can go read it for yourself.

In this post I’d like to present the proof I had in mind, which has a more constructive/computational flavor. Let’s use the digit 1 to represent an increment step, and the digit 2 to represent a doubling step. We can use a sequence of these digits to compactly represent a sequence of steps. Each sequence s of 1’s and 2’s thus corresponds to a natural number, namely, the one we get if we start from zero and execute all the steps from left to right. Let’s denote this number by N(s). So, for example, N(1221121) = 13, since starting from 0, we first increment (yielding 1), then double twice (yielding 4), increment twice (6), double (12), and finally increment (13). Also, denote the length of s by \ell(s). The length of s is the same as the number of steps it represents. For example, \ell(1221121) = 7.
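
In Python (a sketch of my own, with ell standing in for \ell), these definitions might look like this:

def N(s):
    """Start from 0 and execute the steps in s from left to right:
    '1' means increment, '2' means double."""
    value = 0
    for step in s:
        value = value + 1 if step == "1" else 2 * value
    return value

def ell(s):
    """The number of steps represented by s."""
    return len(s)

print(N("1221121"), ell("1221121"))   # 13 7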

Now, it turns out that 1221121 is not the most efficient way to generate 13. In looking at this and other examples of such non-optimal sequences, what stuck out to me is that they all seemed to contain consecutive increment operations. For example, in 1221121, consecutive 1’s are used in the middle to go from 4 to 6, after doubling 2 to get 4. But if we had just incremented once first (going from 2 to 3), then doubling would get us to 6 right away; the effect of the one increment is “magnified” by the subsequent doubling. Formally, we can say that 211 (a double followed by two increments) can always be replaced by 12 (an increment followed by a double), because 2x + 1 + 1 = 2x + 2 = 2(x+1).

What if we keep doing this operation—replacing 211 with 12—as much as we can? It turns out that by doing this we can always transform any sequence into an equivalent one the same length or shorter which has no consecutive 1’s. This is the idea behind the first lemma.

Lemma 1. Given any sequence s, there exists a sequence s' such that

  • N(s') = N(s),
  • \ell(s') \leq \ell(s), and
  • s' has no consecutive 1’s.

In other words, for any sequence we can always find an equivalent, shorter (or at least not longer) sequence with no consecutive 1’s.

Proof. By induction on the number of 1 symbols in s.

  • Base case: an s with zero 1’s certainly has no consecutive 1’s, so s' = s satisfies the conditions of the lemma.
  • Let k \geq 0 and suppose we know that for any s with exactly k 1’s there exists an equivalent s' with the stated properties. Now suppose we have a sequence s with exactly (k+1) 1’s. If none are adjacent then we are done, since s itself has the required properties. Otherwise, consider the leftmost pair of consecutive 1’s. There are two cases to consider:

    • If s begins with two 1’s, that is, s = 11\dots, this means that the procedure for computing N(s) starts by incrementing twice from 0 to reach 2. Let s' be the sequence obtained from s by replacing the initial 11 with 12. An increment followed by a double also yields 2, and the rest of s' is identical to s, so N(s) = N(s'). But s' has one fewer 1 than s, so the induction hypothesis tells us there must be some equivalent s'' no longer than s' with no consecutive 1’s. This s'' is the one we are looking for; we need only observe that N(s) = N(s') = N(s'') and \ell(s'') \leq \ell(s') = \ell(s).

    • Otherwise, the leftmost pair of 1’s must occur immediately following a 2. In that case, as argued before, we can replace 211 by 12, which yields an equivalent but shorter sequence and reduces the number of 1’s by one. Again, the induction hypothesis now implies that we can find an equivalent sequence with no consecutive 1’s which is no longer.

Let’s look at an example. Suppose we start with the most inefficient way possible to represent 13, namely, 13 consecutive increment steps. Then the lemma above says we can find an equivalent, shorter sequence with no consecutive 1’s. Moreover, the proof is actually constructive, that is, it does not merely assert that such a sequence exists, but gives us a concrete way to compute it. The proof outlines a recursive rewriting process which we can visualize as follows:

The red underline shows the part that is going to be rewritten at each step. Notice how the length either stays the same or decreases at each step, and how the number of 1’s decreases at each step. In fact, either a 1 gets deleted and the number of 2’s stays the same, or a 1 is replaced by a 2 when there are two 1’s at the leftmost position. This is the only way new 2’s are generated. So 2’s are “born” at the left end from a pair of 1’s, and then they spend the rest of their life slowly “migrating” to the right.
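
Here is a small Python sketch of the rewriting process (the function names are mine); running it on thirteen consecutive increments prints each intermediate sequence:

def rewrite_once(s):
    """One rewriting step: an initial 11 becomes 12; otherwise the
    leftmost 211 becomes 12."""
    if s.startswith("11"):
        return "12" + s[2:]
    i = s.find("211")
    return s[:i] + "12" + s[i + 3:] if i >= 0 else s

def normalize(s, verbose=False):
    """Rewrite until no consecutive 1's remain."""
    while "11" in s:
        if verbose:
            print(s)
        s = rewrite_once(s)
    if verbose:
        print(s)
    return s

normalize("1" * 13, verbose=True)   # ends at 121221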

Let’s try starting with a different representation of 13, say, 121112111:

Curiously, it seems that the process ends with the same sequence (121221) even though we started with a different sequence that did not occur anywhere during the process generated by starting with thirteen 1’s. Could this just be a coincidence?

Well, the fact that I’m even asking the question kind of gives away the answer: it’s no coincidence. In fact, for a given n, if you start with any sequence representing n and run this process, you will always end with the same sequence at the end. And not only will this particular process always yield the same sequence, but there is only one possible sequence it could yield:

Lemma 2. For every natural number n, there is a unique sequence s such that s does not begin with a 2, has no consecutive 1’s, and N(s) = n. (We have to rule out sequences beginning with a 2, since a leading 2 just doubles zero and changes nothing: for example, 1 and 21 both represent 1.)

Proof. By (strong) induction on n.

  • In the base case (n = 0), the empty sequence is the only sequence not beginning with a 2 which represents 0 (any other such sequence starts with an increment, so it represents something positive), and it certainly has no consecutive 1’s.

  • Now pick some n > 0 and assume that such representations are unique for all n' < n. Suppose s_1 and s_2 are two sequences, neither beginning with a 2, with no consecutive 1’s, such that N(s_1) = N(s_2) = n. Our goal is to show that in fact s_1 = s_2.

    • If s_1 and s_2 both end with the same symbol, then removing it yields two sequences representing some n' < n (either n - 1 or n / 2 depending on the symbol removed); these still have no consecutive 1’s and still do not begin with a 2, so by the induction hypothesis they must be the same, and hence s_1 = s_2 as well.

    • Otherwise, suppose without loss of generality that s_1 ends with 1 and s_2 ends with 2; we will show that this case is actually impossible. N(s_2) must be even, since it ends with a doubling step. If s_1 ends with a doubling step followed by an increment step (or if it consists solely of an increment), then N(s_1) would be odd, which is impossible since N(s_1) = N(s_2) and N(s_2) is even. Hence s_1 must end in consecutive 1’s; but this is also impossible since we assumed s_1 had no consecutive 1’s.

Finally, putting the pieces together: notice that the sequence generated from the binary representation of n never begins with a 2 and has no consecutive 1’s, since the leading 1 bit generates just a 1, and each subsequent bit generates a 2, optionally followed by a 1. Now take any sequence of steps representing n. If it begins with some 2’s, those just double zero, so we can drop them and get a sequence for n which is no longer and does not begin with a 2. Applying the process from the proof of Lemma 1 to this sequence never creates a leading 2 (both rewriting rules leave an initial 1 in place), so it results in a sequence for n which does not begin with a 2, has no consecutive 1’s, and is no longer than the sequence we started with. By Lemma 2 such a sequence is unique, so it must be exactly the sequence corresponding to the binary representation of n. Hence every sequence of steps representing n is at least as long as the binary sequence: the binary sequence is the most efficient!
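
Using the N, ell, and normalize functions from my earlier sketches, we can watch this play out for 13: several different representations all normalize to 121221, the binary sequence for 13, and none of them is shorter than it:

reps_of_13 = ["1" * 13, "121112111", "1221121"]
assert all(N(s) == 13 for s in reps_of_13)
assert all(normalize(s) == "121221" for s in reps_of_13)
assert all(ell(s) >= ell("121221") for s in reps_of_13)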


Efficiency of repeated squaring

As you probably realized if you read both, my recent post without words connects directly to my previous post on exponentiation by repeated squaring. Each section shows the sequence of operations used by the repeated squaring algorithm to build up a certain exponent. For example, here’s the diagram for 13:

Each row has two open rectangles, each exactly the same length as the previous row; this represents squaring, that is, multiplying the exponent by two. Some rows also have an extra dark square at the end, which represents multiplying by a, that is, adding 1 to the exponent. You can read off the binary representation of the final exponent by reading from top to bottom: a filled-in square represents a 1, and no filled-in square represents a 0. In the case of 13 above, we can see that the binary representation of 13 is 1101.

Commenter Steven G made a very interesting guess, that the images represented the most efficient way to form each integer using only doubling and adding 1. This seems plausible, but I was not at all sure. There are lots of ways to build a given integer by doubling and adding 1. For example, we can get 3 by adding 1 three times; or by adding 1, then doubling, then adding 1. We can get 6 by adding 1, doubling, adding 1, and doubling; or by adding 1, doubling twice, and then adding 1 twice. For certain numbers, might there be some weird clever way to build them more efficiently than the algorithm corresponding to their binary representation?

Claim: the binary algorithm is the most efficient way to build n using only doubling and incrementing steps. That is, any other way to build n by doubling and incrementing uses an equal or greater number of steps than the binary algorithm.

Let’s make this precise:

  • “One step” refers to either a doubling step or an incrementing step.

  • The binary algorithm is as follows: start with 0; reading the binary representation of n from left to right, double the current number and add 1 (two steps) every time you encounter a 1 bit, and only double (one step) every time you encounter a 0 bit— except that for the initial 1 bit you simply add one, instead of doing a useless doubling first (it’s pointless to double zero).
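
To make that concrete, here is a sketch of the binary algorithm in Python (my own code, writing '1' for an incrementing step and '2' for a doubling step), which also lets us count the steps it uses:

def binary_steps(n):
    """Step sequence produced by the binary algorithm: just increment for
    the leading 1 bit; for each later bit, double, then increment if the
    bit is 1."""
    steps = ""
    for i, bit in enumerate(bin(n)[2:]):
        if i > 0:
            steps += "2"          # double
        if bit == "1":
            steps += "1"          # add 1
    return steps

print(binary_steps(13), len(binary_steps(13)))   # 121221 6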

Can you prove the claim? I think I have a nice proof, which I’ll share in a future post, but I’m interested to see what others come up with.


Post without words #22

[This post consists of a single image.]

Modular exponentiation by repeated squaring

In my last post we saw how to quickly compute powers of the form a^{2^k} by repeatedly squaring: (a^2)^2 = a^4; then (a^4)^2 = a^8; and so on. This is much more efficient than computing powers by repeated multiplication: for example, we need only three multiplications to compute a^8 by squaring, but we would need seven multiplications to compute it as a \cdot a \cdot \dots \cdot a.

The question I left you with is whether we can use a similar technique to compute other powers which are not themselves powers of two. The idea is that in addition to squaring, we can also multiply by another copy of a at strategic points. For example, suppose we want to compute a^{26}. We can do it like this:

  • a^2 \cdot a = a^3 (square and multiply by another a)
  • (a^3)^2 = a^6 (square)
  • (a^6)^2 \cdot a = a^{13} (square and multiply by a)
  • (a^{13})^2 = a^{26} (square)

So how do we decide when to multiply by an extra copy of a? And can we get any exponent this way?

It becomes easier to answer these questions if we think recursively. Instead of starting with a and building up to a^n, let’s start with our goal of computing a^n and see how to break it down into subproblems we can solve.

So suppose we’re trying to compute a^n. There are two cases, depending on whether n is even or odd. If n is even, all we have to do is compute a^{n/2}; then we can square it to get a^n. What about if n is odd? Then we can compute a^{(n-1)/2}; squaring it gets us a^{n-1}; then multiplying by one more copy of a gets us a^n.

For example, let’s rethink our example with a^{26} in this way. 26 is even, so our goal is to compute a^{26/2} = a^{13}; we can then square that to get a^{26}. So how do we compute a^{13}? Since 13 is odd, we can get it by squaring a^6, then multiplying by a one more time. To compute a^6, we want to square a^3; finally, to compute a^3 we square a^1 and multiply by another a. The base case of the recursion is when n=1, at which point we can stop: a^1 is just a. In fact, we can use n=0 as an even simpler base case: a^0 = 1. (Do you see how we will still get the right answer for a^1 even if n=1 is not a base case?)

In other words, we have the following recursive definition:

a^n = \begin{cases} 1 & \text{if } n = 0 \\ (a^{n/2})^2 & \text{if } n \text{ is even} \\ (a^{(n-1)/2})^2 \cdot a & \text{if } n \text{ is odd} \end{cases}
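
Here is a direct Python transcription of this recursion (a sketch of my own, not from the post):

def power(a, n):
    """Compute a^n using the recursive strategy above."""
    if n == 0:
        return 1
    if n % 2 == 0:
        half = power(a, n // 2)
        return half * half                # square
    else:
        half = power(a, (n - 1) // 2)
        return half * half * a            # square, then one more factor of a

print(power(3, 26) == 3**26)   # True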

As you may have already noticed, we can think of this in terms of the binary expansion of n. Whether n is even or odd is determined by its final bit: n is even when the final bit is 0, and odd when the final bit is 1. Computing n/2 (or (n-1)/2 when n is odd) corresponds to chopping off the final bit of n. So what we’re really doing is chopping off one bit of n at a time, squaring at each step and multiplying by an extra copy of a when we see a 1 bit. For example, 26 in binary is 11010_2. Since the final bit is 0, this tells us that we want to compute a^{1101_2} and then square it. Since the final bit is now 1, this in turn means that we want to compute a^{110_2}, square it, and multiply by a; and so on. So here is an equivalent way to write our algorithm as a loop:

result = 1
for bit in bin(n)[2:]:            # the bits of n, from left to right
    result = result * result      # square
    if bit == '1':
        result = result * a       # multiply by an extra copy of a

Here’s a visualization of the operations needed to compute a^{26} using the two different methods:

Obviously, using the repeated squaring approach requires a lot fewer multiplications! But how many does it take, exactly? The loop repeats once for every bit of n, and each time through the loop, we do either one or two multiplications. The number of bits in n is \lfloor \log_2 n \rfloor + 1; so in the worst case we do twice that many multiplications, that is, 2 \lfloor \log_2 n \rfloor + 2. (This worst case happens precisely when n is one less than a power of two, that is, when its binary expansion consists of all 1’s.)

Remember our example from last time? Suppose n = 10^{10}. Assuming that we can do 10^8 multiplications per second, computing a^n by repeated multiplication would take a minute and a half. So how about by repeated squaring? In that case it will take at worst

2 \lfloor \log_2 10^{10} \rfloor + 2 = 2 \lfloor 10 \log_2 10 \rfloor + 2 = 2 \cdot 33 + 2 = 68

multiplications. Wow! Computing a^{10^{10}} by repeatedly multiplying by a takes ten billion multiplication operations, but computing it by repeated squaring takes only sixty-eight! (Actually it will take even less than that, since the binary expansion of 10^{10} does not consist of all 1’s—can you figure out exactly how many multiplications it will take?) I need hardly point out that if we can do 10^8 multiplications per second, doing 68 will take hardly any time at all (less than a microsecond; in that amount of time, light travels only about 200 meters).

Obviously this makes a huge difference. And n = 10^{10} is actually rather small—although a minute and a half is a long time compared to a third of a microsecond, waiting a minute and a half for a computation is quite doable. But what about something like n = 10^{512}? Computing a^n by repeated multiplication would require 10^{512} multiplications; at 10^8 per second this would take 10^{504} seconds, which is vastly, unimaginably longer than the estimated age of the universe (which is “only” about 4 \times 10^{17} seconds). Even if you expanded every microsecond in the history of the universe to take an entire age of the universe—and then repeated this process 20 times—you still wouldn’t have enough time! But what about with repeated squaring? Well,

2 \lfloor \log_2 10^{512} \rfloor + 2 = 2 \lfloor 512 \log_2 10 \rfloor + 2 = 2 \cdot 1700 + 2 = 3402.

Three thousand four hundred multiplications, at 10^8 per second, would take… about 34 microseconds! 34 microseconds is, shall we say… a tiny bit faster than ages within ages within ages of the universe. This idea of doing repeated squaring—or, more generally, any algorithm where we get to repeatedly halve the size of the problem—is incredibly powerful!

(One thing I have been sweeping under the rug a bit is that not all multiplications take the same amount of time, so if you just want to compute a^k, the multiplications will take longer and longer as the numbers become larger; saying that we can do 10^8 multiplications per second is only true if the numbers involved are less than, say, 2^{64}. Not only that, but the results might be gigantic: for example, as far as we know there isn’t enough space in the entire universe to even write down the result of 2^{10^{512}}, even if we could somehow write one digit on each atom! Everything I’ve said is justified, however, by the fact that we actually want to compute something like a^{n-1} \bmod n: if we reduce \pmod n at each step, then all the multiplications really do take the same amount of time, and we don’t have to worry about the result getting astronomically large.)
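
For completeness, here is a sketch (my own Python; I use k for the exponent so it does not clash with the modulus n) of the same loop with a reduction mod n after every multiplication. Python’s built-in three-argument pow computes a^k mod n, so we can check against it:

def mod_power(a, k, n):
    """Compute a^k mod n by repeated squaring, reducing mod n after each
    multiplication so intermediate results stay small."""
    result = 1
    for bit in bin(k)[2:]:                # bits of the exponent, left to right
        result = (result * result) % n    # square, then reduce
        if bit == "1":
            result = (result * a) % n     # multiply by a, then reduce
    return result

assert mod_power(3, 10**10, 12345) == pow(3, 10**10, 12345)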
