But wait a minute, this is silly: if $a$ shares a common factor with $n$, then we don’t need anything as complex as Fermat’s Little Theorem to figure out that $n$ is not prime! All we have to do is compute $\gcd(a,n)$ using the Euclidean Algorithm, and when we find out that the result isn’t $1$, we can immediately conclude that $n$ isn’t prime since it has a nontrivial divisor.
So for comparison, let’s consider this (terrible!) primality test, call it the GCD test: given an $n$ we want to test, pick values of $a$ such that $1 < a < n$ and compute $\gcd(a,n)$ for each one. If we find an $a$ for which $\gcd(a,n) \neq 1$, then $n$ is not prime.
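Here is a sketch of the GCD test in Python (my own illustration, not code from the post):

```python
import random
from math import gcd

def gcd_test(n, k=20):
    """Report False if n is definitely composite (some base shares a
    factor with n), True if n is merely *probably* prime."""
    for _ in range(k):
        a = random.randrange(2, n)    # pick a base 2 <= a < n
        if gcd(a, n) != 1:
            return False              # nontrivial divisor found
    return True

print(gcd_test(97))               # 97 is prime, so this always prints True
print(gcd_test(10007 * 10009))    # big semiprime: almost always (wrongly) True
```

The second call illustrates the problem discussed below: for a product of two large primes, a random base almost never shares a factor with $n$.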
This is terrible for several reasons. First, if we want this test to tell us for certain when $n$ is prime, we essentially have to try all the possible values of $a$. We can optimize a bit by only trying $a \leq \sqrt{n}$—if $n$ has any nontrivial divisors we will be sure to find some that are under $\sqrt{n}$—but this doesn’t help all that much in the grand scheme of things. In fact, this GCD test is actually almost the same thing as trial division, where we try a bunch of different numbers to see whether they evenly divide $n$. Both are essentially trying to find divisors of $n$ by brute-force search.
So suppose instead that we are willing to live with some uncertainty, and we just try some fixed number of values for $a$, and either conclude with certainty that $n$ is composite (if we happen to find an $a$ that shares a nontrivial factor with $n$), or report that it is probably prime. How bad can this be—or put another way, how many values of $a$ do we have to try so we can be “reasonably certain” that $n$ really is prime when the test says it is?
The answer is that it can be very bad indeed. Euler’s totient function $\varphi(n)$ counts the number of integers $1 \leq a \leq n$ which are relatively prime to $n$. So if $n$ is composite and we pick $a$ uniformly at random, there are $\varphi(n)$ choices which won’t reveal the fact that $n$ is composite, and approximately $n - \varphi(n)$ choices which do share a common factor with $n$ and hence do reveal the fact that it is composite.
So the question is, how big can $\varphi(n)$ be, relative to $n$? We would like it to be small—which would leave us with many opportunities to learn that $n$ is composite—but in fact it can be quite big. For example, if $n = pq$ is a product of two distinct primes, then $\varphi(n) = (p-1)(q-1) = n - p - q + 1$, which is not much smaller than $n$ itself. This means that almost all of the numbers less than $n$ share no factors in common with $n$; only about a $(p+q)/n$ fraction of them share a factor. In fact, $\varphi(n)$ is big precisely when $n$ has just a few large prime factors, which intuitively is exactly when $n$ is most difficult to factor. For such $n$ there really is no acceptable number of $a$’s we can test with the GCD test in order to be reasonably sure that $n$ is prime—if $n$ is a product of two 10-digit primes, for example, each $a$ we randomly pick has only about one chance in a few billion of sharing a common factor with $n$; and this can get arbitrarily bad as $n$ gets larger.
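As a concrete illustration (my own, with two arbitrarily chosen small primes), we can compute the totient of a semiprime numerically and see how few bases share a factor with it:

```python
from math import gcd

def phi(n):
    """Euler's totient: how many of 1..n are relatively prime to n."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

p, q = 101, 103                       # two (arbitrarily chosen) primes
n = p * q
assert phi(n) == (p - 1) * (q - 1)    # phi(pq) = (p-1)(q-1) = n - p - q + 1
print((n - phi(n)) / n)               # fraction of bases sharing a factor: ~2%
```

Even for primes this tiny, only about 2% of bases reveal compositeness via a common factor; the fraction shrinks as the primes grow.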
So remember we started down this path by showing that if we use the Fermat primality test, values of $a$ which share a common factor with $n$ will definitely reveal the compositeness of $n$. But now we know that if we rely on only such values of $a$, not only do we have a very small chance of discovering composite $n$—no better than just using trial division—but more than that, actually using the Fermat test itself would be silly, since we should just use the simpler GCD test instead!
So if the Fermat primality test is worthwhile at all, it must be because there are other values of $a$, which don’t share any common factors with $n$, but nonetheless still reveal the fact that $n$ is composite since $a^{n-1} \not\equiv 1 \pmod n$. So how many of those values of $a$ are there? Stay tuned!
If $p$ is prime and $a$ is an integer where $1 \leq a < p$, then $a^{p-1} \equiv 1 \pmod p$.
Recall that we can turn this directly into a test for primality, called the Fermat primality test, as follows: given some number $n$ that we want to test for primality, pick an integer $a$ between $2$ and $n-1$ (say, at random), and compute $a^{n-1} \bmod n$. If the result is not equal to $1$, then Fermat’s Little Theorem tells us that $n$ is definitely not prime. Repeat some fixed number of times $k$. If we get $1$ every time, then report that $n$ is probably prime.
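A sketch of the Fermat test in Python (my own illustration; the three-argument form of `pow` does modular exponentiation efficiently):

```python
import random

def fermat_test(n, k=20):
    """Report False if n is definitely composite, True if probably prime."""
    for _ in range(k):
        a = random.randrange(2, n - 1)
        if pow(a, n - 1, n) != 1:     # a^(n-1) mod n, by repeated squaring
            return False              # Fermat's Little Theorem fails: composite
    return True

print(fermat_test(223))   # 223 is prime: always True
print(fermat_test(221))   # 221 = 13 * 17: False with overwhelming probability
```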
This “probably prime” business is a little annoying. Can we somehow turn this into a deterministic primality test? Well, for one thing, instead of just picking random values, what if we tested all $1 \leq a < n$? (Yes, of course this would take way too long, but bear with me for a bit!) If they all yield $1$ when raised to the $(n-1)$st power mod $n$, could $n$ still possibly be composite, or could we conclude it is definitely prime? Put another way, are there any composite numbers $n$ such that $a^{n-1} \equiv 1 \pmod n$ for all $1 \leq a < n$?
It turns out that this would work: there do not exist composite numbers $n$ such that $a^{n-1} \equiv 1 \pmod n$ for all $1 \leq a < n$, so if we test all possible values of $a$ and we always get $1$, then we can conclude with certainty that $n$ is prime. Let’s prove it!
Suppose $n$ is composite, and let $a$ be some number which shares a nontrivial common divisor $d$ with $n$, that is, $d = \gcd(a,n) > 1$. If $n$ is composite then such an $a$ must exist; for example, we could just take $a$ to be one of $n$’s prime divisors. Now, I claim that $a^{n-1}$ can’t possibly be equivalent to $1 \pmod n$. Let $r$ be the remainder when dividing $a^{n-1}$ by $n$, that is, $a^{n-1} \equiv r \pmod n$. Rearranging gives $a^{n-1} - r \equiv 0 \pmod n$, which means that $a^{n-1} - r$ is divisible by $n$, that is, $a^{n-1} - r = kn$ for some integer $k$. Rearranging this again, we get $r = a^{n-1} - kn$. But by our assumption, $a$ and $n$ are both divisible by $d$, and hence $a^{n-1} - kn$ must be divisible by $d$—but that means $r$ must be divisible by $d$ as well. Since we assumed that $a$ and $n$ have a nontrivial common factor, that is, $d > 1$, we conclude that $r \neq 1$, that is, $a^{n-1} \not\equiv 1 \pmod n$.
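We can spot-check this conclusion numerically (a quick sanity check, not part of the argument):

```python
from math import gcd

# For several composite n, every base a sharing a factor with n fails
# the congruence a^(n-1) ≡ 1 (mod n), just as the proof guarantees.
for n in [15, 21, 91, 341, 561]:      # 561 is even a Carmichael number
    for a in range(2, n):
        if gcd(a, n) > 1:
            assert pow(a, n - 1, n) != 1
print("checked")
```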
So we now know that any number $a$ which shares a common factor with $n$ will definitely reveal the fact that $n$ is composite. How many (or how few) such $a$ could there be? And what about other values of $a$, which don’t share a factor with $n$—how many of them might also reveal the fact that $n$ is composite? We’ll tackle these questions in my next post!
Given the exact sequence of operations you did to reduce the rational to zero, you can easily reverse them to reconstruct the original rational. For example, suppose you had to subtract 1 three times, then reciprocate, then subtract 1 five times, reciprocate, then subtract 1 two more times. This gives us the equation

$\cfrac{1}{\cfrac{1}{x - 3} - 5} - 2 = 0$
or, inverting everything to solve for $x$,

$x = 3 + \cfrac{1}{5 + \cfrac{1}{2}} = \frac{35}{11}$.
This clearly gives the right answer; the only question is whether the process will stop after a finite amount of time. But it does, since every time we subtract 1 we are making the numerator smaller without changing the denominator, and reciprocating just switches the numerator and denominator without making them bigger. Also, if we have a number less than 1 and reciprocate it, the result will definitely be bigger than 1, at which point we can subtract 1 from it at least once, and hence we cannot get stuck reciprocating repeatedly without doing any subtracting. Since we have two positive integers which are either staying the same or getting smaller on each step, the process must eventually stop.
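Here is a sketch of the reversal process using Python’s exact `Fraction` arithmetic (the function name and the encoding of the steps are my own):

```python
from fractions import Fraction

def reconstruct(counts):
    """counts[i] = how many times 1 was subtracted in the i-th group
    (groups separated by reciprocations; the last group reaches 0).
    Run the whole process backwards to recover the original rational."""
    x = Fraction(0)
    for i, c in enumerate(reversed(counts)):
        x += c                        # undo the subtractions
        if i < len(counts) - 1:
            x = 1 / x                 # undo the reciprocation
    return x

x = reconstruct([3, 5, 2])
print(x)                              # 35/11

# Forward check: subtract 3, flip, subtract 5, flip, subtract 2 -> 0
y = 1 / (x - 3) - 5                   # 11/2 - 5 = 1/2
assert 1 / y - 2 == 0
```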
There are several equivalent ways to think about what is going on here:
One way to think about this—made clear by the expression for $x$ in the example above—is that we are building up the continued fraction expansion for $x$.
Another way to think about it is that we are running the Euclidean Algorithm on $x$’s numerator and denominator. Of course the Euclidean Algorithm is typically used to find the greatest common divisor of two numbers, and if we imagine $x = p/q$ to be in lowest terms then of course the gcd of its numerator and denominator will be 1, so finding the gcd doesn’t tell us anything in and of itself. The point is, however, that the exact sequence of steps taken by the Euclidean Algorithm is unique for each relatively prime pair $(p, q)$, and so we can use the sequence of steps to reverse-engineer what $p$ and $q$ must have been in the first place.
Yet another way, closely related to the previous, is to imagine that we are finding our way up the Calkin-Wilf tree step by step. Recall that the Calkin-Wilf tree is the infinite binary tree of positive rationals where the root is $1/1$ and each node $p/q$ has $\frac{p}{p+q}$ as its left child and $\frac{p+q}{q}$ as its right child. Each rational appears at a unique location in the tree, so the sequence of upwards steps taken from $x$ to the root allows us to reconstruct the original $x$.
More specifically, if we have a number bigger than $1$, it must be of the form $\frac{p+q}{q}$ for some $p$ and $q$, i.e. it is a right child in the Calkin-Wilf tree. Subtracting one from $\frac{p+q}{q}$ yields $\frac{p}{q}$, so it corresponds to taking one step up and to the left in the Calkin-Wilf tree. When we reach a number less than $1$, it means that node is a left child, so we cannot take another step up and to the left. Reciprocating corresponds to mirroring the entire tree left-right, and then we can continue taking steps up and to the left.
So how long does this take? Moving up one level in the Calkin-Wilf tree takes at worst four operations: one subtraction, two comparisons, and a reciprocate. (We have to do two comparisons because when we find out that a number is less than one, we have to also check whether it is greater than 0 to see whether we should reciprocate or stop.) On the other hand, the best case takes only two operations (a subtraction and a comparison, if the result is still greater than one). The worst case for a given depth in the C-W tree would be if we start with the ratio of two consecutive Fibonacci numbers, since we would have to reciprocate after doing only one subtraction at every single step (try it!).
There’s also another fun thing we can do to speed things up. Instead of just subtracting repeatedly, we can use a sort of binary search instead. That is, we can first create cubes containing powers of two (in practice we can just create them on the fly as needed). Then given a number $x$, we first find the smallest power of two $2^k$ such that $x - 2^k$ is less than $1$ (by computing $x - 1$, then $x - 2$, then $x - 4$, then $x - 8$, … and checking each time until we find the first $k$ such that $x - 2^k < 1$), and then we do a binary search on the interval $[2^{k-1}, 2^k]$ to find the smallest integer $m$ such that $x - m < 1$ (in practice this just means adding or subtracting the next smaller power of two at each step, depending on whether the previous result was less than or greater than 1). Not counting the operations needed to create the cubes with the powers of 2 in the first place (since we can reuse them many times, and in any case it takes only one operation per power of two), this would take about $2\log_2 x$ addition and subtraction operations. One might worry that this would be slightly slower for small values of $x$; however, I think (but have not fully worked out the details) that this will actually never require more operations than the naive subtraction method; I will leave this as an exercise. Of course, for larger $x$ this is clearly going to be a big win, since $2\log_2 x$ is much smaller than $x$.
Of course, if the wizard had provided a machine that could perform a “floor” operation, we could make this even more efficient: instead of subtracting 1 until finding a result less than $1$, we could just compute $x - \lfloor x \rfloor$. This is like being able to jump as far up and to the left as possible in the Calkin-Wilf tree using only two operations. (Unsurprisingly, the floor function plays a key role in the algorithm for generating the Calkin-Wilf sequence.) I actually had this in the original version of the puzzle, but took it out when I realized that it was not necessary, and slightly more interesting to do without!
Several commenters mentioned using the Stern-Brocot tree to search for the secret rational. It’s probably a topic for another blog post, but briefly, the idea is to keep track of four integers $p$, $q$, $r$, and $s$, representing the rational numbers $p/q$ and $r/s$. We start with $p = 0$ and $q = 1$ (representing $0$) and $r = 1$, $s = 0$ ($1/0$, representing “infinity”). We maintain the invariant that $p/q < x < r/s$, that is, we maintain $p/q$ and $r/s$ as the endpoints of an interval that contains the secret rational $x$. At each step we compute the mediant $\frac{p+r}{q+s}$, which is guaranteed to lie in between $p/q$ and $r/s$ (exercise: prove this!), and check whether it is equal to the secret rational. If not, we either set $p/q = \frac{p+r}{q+s}$ or $r/s = \frac{p+r}{q+s}$, depending on whether the secret rational is greater or less than the mediant, respectively. Unlike a simple binary search (which can only find rationals with a denominator that is a power of two in finite time), this is guaranteed to terminate in a finite amount of time; every rational can be obtained after a finite number of successive steps of taking the mediant.
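A sketch of the mediant search in Python (my own illustration of the scheme just described):

```python
from fractions import Fraction

def stern_brocot_search(x):
    """Find rational x by repeatedly taking mediants; return the
    number of mediant steps needed (x's level in the tree)."""
    p, q, r, s = 0, 1, 1, 0               # interval (0/1, 1/0)
    steps = 0
    while True:
        steps += 1
        med = Fraction(p + r, q + s)       # mediant lies strictly between
        if med == x:
            return steps
        elif x < med:
            r, s = med.numerator, med.denominator   # shrink from above
        else:
            p, q = med.numerator, med.denominator   # shrink from below

print(stern_brocot_search(Fraction(3, 7)))  # path 1, 1/2, 1/3, 2/5, 3/7: 5 steps
```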
So how long does it take? It turns out that the Stern-Brocot tree and the Calkin-Wilf tree have all the same rationals on each level, but in a different order^{1}, so the two methods are easy to compare. The proposed Stern-Brocot method moves down the tree one level each step, and it needs five operations for each step (two additions to form the mediant, one division to turn the pair of integers representing the mediant into a rational number, and then two comparisons to find out whether the mediant is less than, equal to, or greater than the target number). Unless there is a more clever way to do this that I missed, it seems my method is a clear win: for a rational on level $d$ of the Stern-Brocot or Calkin-Wilf trees, the iterated mediant algorithm always needs exactly $5d$ operations, whereas for my algorithm $4d$ is only a worst case (for a ratio of consecutive Fibonacci numbers), but in many cases it does better (much better for rationals with large entries in their continued fraction expansion, which we can skip past in logarithmic time).
Each is a bit-reversal permutation of the other.
I had fun creating an elaborate setup to frame the puzzle, but as you probably figured out, really the puzzle comes down to this: is it possible to figure out the numerator and denominator of an unknown positive rational number, if you are only allowed to take reciprocals, add, subtract, multiply, and test whether one number is less than another?
There are many ways to solve this. First, a few preliminary ideas which many commenters picked up on:
Given these preliminaries, the simplest method, put forth originally by Eric Burgess, is to methodically list all the positive rational numbers (creating each one by building up its numerator and denominator out of copies of the number $1$) and test each one to see whether it is equal to the wizard’s number. Since the rational numbers are countable, we can list them in such a way that every rational number appears at a finite position. This means that no matter what the wizard’s number is, after some finite amount of time we will encounter it in our list. (One very nice such list we might consider is the Calkin-Wilf order $1, \frac12, 2, \frac13, \frac32, \frac23, 3$, …)
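The Calkin-Wilf order can be generated with a well-known one-line recurrence, $q' = 1/(2\lfloor q \rfloor + 1 - q)$; here is a sketch (mine, not from the post):

```python
import itertools
from fractions import Fraction

def calkin_wilf():
    """Yield the Calkin-Wilf sequence 1, 1/2, 2, 1/3, 3/2, 2/3, 3, ..."""
    q = Fraction(1)
    while True:
        yield q
        # Newman's recurrence: next term from the current one
        q = 1 / (2 * (q.numerator // q.denominator) + 1 - q)

print([str(q) for q in itertools.islice(calkin_wilf(), 7)])
# ['1', '1/2', '2', '1/3', '3/2', '2/3', '3']
```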
However, this could of course take a long time. For example, suppose the wizard has chosen a rational which happens to be the 4,285,192nd number in the Calkin-Wilf order. Even if we can construct and test one rational number every 20 minutes (which actually seems overly optimistic, more on this later), it would still take us over 160 years (with no breaks for sleeping or eating, and did you notice the wizard didn’t seem to provide a bathroom for us to use?). So we have to ask: is there a faster way?
Test for divisibility by all primes up to $13$, and also $19$. (In practice I test for 2 and 5 first, which is pretty much automatic; then for 3 and 11, which both involve adding and subtracting digits; then 7 and 19, which both involve multiplying the final digit by two and either subtracting (7) or adding (19); and last I test for 13.)
Check if the number is one of a memorized list of 18 composites less than $1000$ which are not divisible by either $19$ or any prime $\leq 13$. (In practice, of course, I do this check first, before doing any divisibility tests.)
Using this method, presented with a random number from 1-1000 not divisible by 2 or 5 (I exclude those because I can pretty much tell they are not prime instantaneously, and I think it makes for a more interesting measurement to exclude them), right now it takes me on average 15 seconds to determine whether such a number is prime or not. Of course the variance is huge: numbers that are divisible by 3 I can identify in a second; for numbers that are actually prime, it can take me something like 40 seconds to run through all the divisibility tests in my head. I expect with practice I’ll get faster.
I’m still interested in exploring other methods as well, but figured this is something relatively simple that can provide a good baseline for comparison!
For the rest of the post I want to talk about how I memorized the list of exceptional composites. Here’s the list again, this time with prime factorizations given: $289 = 17^2$, $391 = 17 \cdot 23$, $493 = 17 \cdot 29$, $527 = 17 \cdot 31$, $529 = 23^2$, $629 = 17 \cdot 37$, $667 = 23 \cdot 29$, $697 = 17 \cdot 41$, $713 = 23 \cdot 31$, $731 = 17 \cdot 43$, $799 = 17 \cdot 47$, $841 = 29^2$, $851 = 23 \cdot 37$, $899 = 29 \cdot 31$, $901 = 17 \cdot 53$, $943 = 23 \cdot 41$, $961 = 31^2$, $989 = 23 \cdot 43$.
That looks like a lot of stuff to memorize. And it would be, if you tried to memorize it as a bunch of individual, disconnected facts. But fortunately we can do better! Human brains are good at remembering sequences and stories. So we’re going to look at the numbers in order and tell ourselves stories about them. The better we get to know them and their relationships the easier they are to remember!
There is only one exceptional composite in each of the 200’s, 300’s, and 400’s, namely, 289, 391, and 493. What strikes one immediately about these numbers is that each is exactly 102 more than the previous. Is this a coincidence?
Of course not! It is actually due to the fact that $102 = 6 \cdot 17$. Each of these numbers is a product of $17$ with some other prime, and the second prime increases by $6$ every time. That is, $289 = 17 \cdot 17$, then $391 = 17 \cdot 23$, and $493 = 17 \cdot 29$. Of course adding 6 to a prime doesn’t always get us another prime—but it works surprisingly often for smaller primes. And every prime besides 2 and 3 is either one more than a multiple of 6, or one less. So if we start with 5 and 7 and keep adding 6, we will definitely hit all the primes.
This sequence of multiples of 17 starts from $289 = 17 \cdot 17$, and if we continue it further we see that it contains several more numbers from our exceptional set as well: $289, 391, 493, 595, 697, 799, 901$.
What about if we start with $323 = 17 \cdot 19$ and keep adding $102$?
This sequence yields three of our exceptional composites ($527$, $629$, and $731$), and quite a few others which in theory we can rule out with our divisibility tests but are probably worth knowing anyway ($323 = 17 \cdot 19$, $425 = 17 \cdot 25$, $833 = 17 \cdot 49$, $935 = 17 \cdot 55$).
There are only two exceptional composites in the 500’s, and they are twins: 527 and 529. 527 we have already seen: it shows up in the sequence of multiples of 17 that begins with $323$. On the other hand 529 is $23^2$.
The next exceptional composite is 629, the next multiple of 17 in the sequence after 527. Of course it is also exactly 100 bigger than 529. I personally find the sequence 527, 529, 629 to be quite memorable.
Next is $667 = 23 \cdot 29$, which is the closest integer to two-thirds of 1000. If you know that $26^2 = 676$, then 667 is also easy to remember for two reasons: it has the same digits as 676 but with the 7 and 6 reversed, and it is exactly 9 less than 676, and hence it is $26^2 - 3^2 = (26 - 3)(26 + 3) = 23 \cdot 29$.
The last exceptional composite in the 600’s is 697, which is $17 \cdot 41$, from the sequence of multiples of $17$ that started with 289, 391, 493 (595 is skipped because it is obviously a multiple of 5).
Next come a pair of twins, a 99, then another pair of twins, then another 99! 713 and 731 are twins because they have the same digits, with the 1 and 3 reversed. 731 we have already seen: it is $17 \cdot 43$, from the same 17-sequence as 527 and 629. 713 is $23 \cdot 31$. 799 is again from the 17-sequence beginning with 289.
841 and 851 are twins because they have the same digits except the 4 and the 5, which are consecutive. 841 is $29^2$, and 851 is $23 \cdot 37$. Finally we have 899 which is $30^2 - 1 = 29 \cdot 31$.
I haven’t thought of a nice story to tell about these—I think of the last three as sort of “sporadic”, but there’s only three of them so it’s not that hard. Someone else could probably come up with a nice mnemonic.
901 in any case is not too hard to remember because it’s a twin with 899, and it’s also the end of the 17-sequence that started with 289.
943 is $23 \cdot 41$. It’s $32^2 - 9^2$, but unlike some of the other differences of squares I’ve highlighted, I doubt this will actually help me remember it.
961 is $31^2$. I think it’s cute that $169$, $196$, and $961$ are all perfect squares.
Last but not least, $989 = 23 \cdot 43$. If you happen to know that $33^2 = 1089$ (I sure don’t!), then this is easy to remember as $33^2 - 10^2 = (33 - 10)(33 + 10)$.
Repetition helps too, so let’s recite: it starts with $289 = 17^2$, then continues by 102s: 391, 493. After that the twins 527, 529, followed by 629; then 667 and 697. Then two sets of twins each with its 99: 713, 731, 799; 841, 851, 899; then 901 to come after 899, and then the three sporadic values: 943, 961, 989!
In any case, today I want to return to the problem of quickly recognizing small primes. In my previous post we considered “small” to mean “less than 100”. Today we’ll kick it up a notch and consider recognizing primes less than 1000. I want to start by considering some simple approaches and see how far we can push them. In future posts we’ll consider some fancier things.
First, some divisibility tests! We already know how to test for divisibility by $2$, $3$, and $5$. Let’s see rules for $7$, $11$, and $13$.
To test for divisibility by $7$, take the last digit, chop it off, and subtract double that digit from the rest of the number. Keep doing this until you get something which obviously either is or isn’t divisible by $7$. For example, if we take $3592$, we first chop off the final 2; double it is 4, and subtracting 4 from $359$ leaves $355$. Chopping off the 5 and subtracting twice $5$ from $35$ yields $25$, which is not divisible by $7$; hence neither is $3592$.
As an optimization, we can always reduce things mod 7. For example, if we see the digit 7, we can just throw it away; if we see an 8 or 9 we can treat it as 1 or 2, respectively. And if we see a 3, the rule would tell us to subtract 6, but if it’s easier we can add 1 instead, since subtracting 6 and adding 1 are the same mod 7. With a bit of practice this can be done quite quickly.
For an explanation of why this works, and several other fun methods for testing divisibility by 7, see this post by Mark Dominus.
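Here is the test for 7 written out in Python (my own sketch, which checks itself against ordinary division):

```python
def divisible_by_7(n):
    """Chop off the last digit, double it, and subtract it from the
    rest; repeat until the result is small, then check directly."""
    n = abs(n)
    while n >= 70:
        rest, last = divmod(n, 10)
        n = abs(rest - 2 * last)    # 10a + b ≡ 0 (mod 7) iff a - 2b ≡ 0
    return n % 7 == 0

# The rule agrees with ordinary division on a big range:
assert all(divisible_by_7(n) == (n % 7 == 0) for n in range(1, 10000))
print(divisible_by_7(3592))   # False: 3592 = 7 * 513 + 1
```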
To test a $3$-digit number for divisibility by $11$, just add the first and last digits and then subtract the middle digit. The original number is divisible by 11 if and only if the result is.
This is especially obvious with numbers like $253$, where the sum of the first and last digits is equal to the middle digit. (Subtracting the middle digit would leave 0, which is divisible by 11.) But it also applies in cases like $418$: we have $4 - 1 + 8 = 11$, which is divisible by 11, and sure enough, $418 = 11 \cdot 38$.
The reason this works is that $10$ is equivalent to $-1 \pmod{11}$, so $100a + 10b + c \equiv a - b + c \pmod{11}$. This also suggests how to generalize to more than just 3-digit numbers: just alternately add and subtract digits.
To test for divisibility by $13$, chop off the last digit, multiply it by $4$, and add it to the remaining number. Keep doing this until you end up with something that you know either is or isn’t divisible by $13$.
Here reducing mod $13$ can be even more helpful. For example, if the last digit is a $7$, the rule says to add $28$ to what’s left. But $28$ is only 2 more than $26$, so adding $28$ is equivalent to just adding $2$.
Why does this work? Suppose the final digit of our number is $b$ and the rest of the number is $a$. That is, our number is of the form $10a + b$, and we want to know whether this is equivalent to $0 \pmod{13}$. But now note that $10a + b \equiv 0 \pmod{13}$ if and only if $40a + 4b \equiv 0 \pmod{13}$. Why? From left to right, we are just multiplying both sides by $4$; from right to left, we are allowed to divide by $4$ since $4$ is relatively prime to $13$. So why did we choose to multiply by $4$? It’s because it lets us get rid of the $10$: $40$ is the smallest multiple of $10$ which is one away from a multiple of $13$. Hence $10a + b \equiv 0$ iff $40a + 4b \equiv 0$ iff $a + 4b \equiv 0 \pmod{13}$.
(Challenge: can you go back now and prove the test for divisibility by $7$?)
At this point we might ask: if we take a number less than $1000$ and test it for divisibility by $2$, $3$, $5$, $7$, $11$, and $13$, what’s left? In other words, what are the composite numbers under $1000$ that we haven’t found yet? It turns out there are $27$ of them: 289, 323, 361, 391, 437, 493, 527, 529, 551, 589, 629, 667, 697, 703, 713, 731, 779, 799, 817, 841, 851, 893, 899, 901, 943, 961, and 989. I’ll let you work out the factorizations; of course each one is a product of two primes which are at least $17$.
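We can verify this list mechanically (my own sketch, not from the post):

```python
def smallest_prime_factor(n):
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n

# Composites under 1000 whose smallest prime factor is at least 17,
# i.e. the ones not caught by divisibility tests for primes up to 13:
leftovers = [n for n in range(2, 1000)
             if smallest_prime_factor(n) != n      # composite
             and smallest_prime_factor(n) >= 17]
print(len(leftovers), leftovers[:5], leftovers[-1])
# 27 [289, 323, 361, 391, 437] 989
```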
So we could try to memorize this list and call it a day. Then the procedure becomes: given a number less than $1000$, (1) test it for divisibility by all primes up to 13, and (2) check if it is one of the 27 composite numbers we have memorized. If it passes both tests, then it is prime. This sounds doable, though honestly I’m not super excited about memorizing a list of 27 composites.
There are a few more things we could do, though. First of all, notice that the divisibility test for 19 is super easy, since 19 is one less than 2 times 10: chop off the last digit, double it, and add it to the rest. Keep doing this until… you know the drill. This is just like the test for 7, but we add instead of subtract.
OK, so what if we test for all primes up to 13 and also 19? Then there are only 18 composites left that we have to memorize: 289, 391, 493, 527, 529, 629, 667, 697, 713, 731, 799, 841, 851, 899, 901, 943, 961, and 989. This is looking a bit better, and I am already noticing lots of patterns that would help with memorization: 529 and 629; 713 and 731; 899 and 901… oh, and $899 = 30^2 - 1$ (since $899$ is $29 \cdot 31$). (…and it turns out that before publishing this post I couldn’t help myself and went ahead and memorized the list. It wasn’t that hard. I’ll say more about it in a future post!)
We could also test for divisibility by 17, of course. Unfortunately it is a bit more annoying: the smallest multiple of 10 which is one away from a multiple of 17 is 50, which is one less than $51 = 3 \cdot 17$. So, to test for divisibility by 17, we chop off the last digit, multiply it by 5, and subtract. This seems distinctly harder to do in my head than the other tests, because it seems to actually require dealing with two-digit numbers. If we do this, though, we are down to only 9 composites to memorize, which is not bad at all: 529, 667, 713, 841, 851, 899, 943, 961, 989.
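All of these chop-off tests share the same shape: for a prime $p$ other than 2 and 5, pick a multiplier $m$ with $10m \equiv 1 \pmod p$; then $10a + b$ is divisible by $p$ exactly when $a + mb$ is. A generic sketch (the multiplier table and names are mine):

```python
# Multiplier m for each prime p, chosen so that 10*m ≡ 1 (mod p);
# e.g. for 7, 10*(-2) = -20 ≡ 1 (mod 7).
MULT = {7: -2, 11: -1, 13: 4, 17: -5, 19: 2}

def chop_test(n, p):
    """Repeatedly replace 10a + b by a + m*b; divisibility by p is
    preserved at every step."""
    m = MULT[p]
    n = abs(n)
    while n >= 10 * p:                 # reduce until small enough to eyeball
        rest, last = divmod(n, 10)
        n = abs(rest + m * last)
    return n % p == 0

for p in MULT:
    assert all(chop_test(n, p) == (n % p == 0) for n in range(1, 5000))
print("all chop-off tests agree with ordinary division")
```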
As a warmup, today I’ll write about how I determine whether a number less than 100 is prime: I don’t have them memorized, but can very quickly decide whether such a number is prime or not—and you can, too! This is a situation where doing a little mathematical analysis beforehand goes a long way.
Since $10^2 = 100$, every composite number less than $100$ has at least one prime factor which is less than $10$. This means that every composite less than $100$ is divisible by $2$, $3$, $5$, or $7$. Multiples of 2, 3, and 5 are relatively easy to recognize, and as we’ll see, $7$ is not hard to deal with either.
Any even number or multiple of 5 (i.e. numbers ending with an even digit or with a 5) is clearly composite. (Other than 2 and 5 themselves, of course.)
Multiples of $3$ are composite. There are many numbers which I think of as “obvious” multiples of three: 9, 21, 27, and 33 because I have had their factorizations memorized since third grade; 81 because it’s a power of three so I have it memorized too; and 39, 63, 69, 93, and 99 because they consist of two digits each of which is a multiple of three. As you probably know, there is also a simple test for determining divisibility by three: just add the digits and see whether the result is divisible by three. This test identifies a few more “nonobvious” multiples of three which you might otherwise think are prime: $51 = 3 \cdot 17$, $57 = 3 \cdot 19$, and $87 = 3 \cdot 29$.
What’s left? There are only three numbers less than 100 which are composite and not divisible by 2, 3, or 5: $49 = 7 \cdot 7$, $77 = 7 \cdot 11$, and $91 = 7 \cdot 13$.
So, to sum up, faced with a number less than 100 that I want to test for primality, I can quickly rule it out if it is divisible by 2, 3, or 5, or if it is a multiple of 7 I recognize (49, 77, or 91). And that’s it! Anything else has to be prime.
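The analysis above is easy to verify by brute force (my own sketch):

```python
def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

# Composite numbers under 100 that are NOT divisible by 2, 3, or 5:
tricky = [n for n in range(2, 100)
          if not is_prime(n) and all(n % p for p in (2, 3, 5))]
print(tricky)   # [49, 77, 91] -- each is 7 times a prime
```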
In a future post I plan to write about how feasible it is to come up with a similar procedure to identify all primes under 1000.
We started with Fermat’s Little Theorem and Euler’s Theorem, which form the basis for a lot of primality testing algorithms, with three different proofs of FlT and a proof of Euler’s Theorem.
We then detoured a bit to talk about some hypothetical machines and how fast they are.
Finally I got around to presenting the Fermat primality test, which is directly based on FlT.
For the Fermat test to make sense I realized we had to talk about modular exponentiation and how to do it by repeated squaring, which led to a Post Without Words and a whole tangent on its efficiency.
Next up: first of all, there’s still quite a bit more to say about the Fermat primality test. After that I plan to present some better/more efficient tests as well (at least Miller-Rabin and Baillie-PSW, possibly others). And knowing me there will be at least three more tangents along the way!
Today I want to explain another nice proof, written in a comment by an anonymous^{1} commenter. So although this proof is not originally due to me, I thought it deserved to be written up more fully and not squished into a comment, and I’ve changed it in a few minor ways which I think make it easier to understand (although perhaps not everyone will agree!).
Let $C(n)$ denote the minimum number of doubling and incrementing steps needed to generate $n$ (starting from $1$), and let $B(n)$ denote the number of steps used by the binary algorithm. Note that $C(n) \leq B(n)$ for all $n$: if the binary algorithm uses $B(n)$ steps, then the optimal number of steps can’t be any higher.
Now, suppose that the binary algorithm (call it algorithm B) isn’t the most efficient algorithm (our goal will be to derive a contradiction from this assumption). That means there exist values of $n$ for which $C(n) < B(n)$. Let $k$ be the smallest such $n$, so we must have $C(m) = B(m)$ for all $m < k$.
First, note that $k \neq 1$: B uses zero steps for $1$, which is obviously optimal. Now, let’s think about the parity of $k$:
If $k$ is odd, then the last step of any algorithm to compute $k$ has to be incrementing, since that’s the only way to get an odd number—if the last step is doubling then the result would be even. Removing this last incrementing step from the sequence generated by algorithm B results in a sequence of $B(k) - 1$ steps which yields $k - 1$ (and this is exactly the sequence B would use for $k - 1$, so $B(k-1) = B(k) - 1$). Since $C(k) < B(k)$, there must be some other sequence of length $C(k)$ that yields $k$, but since $k$ is odd it must also end in an increment, so likewise we can delete the final increment step to get a sequence of $C(k) - 1$ steps which yields $k - 1$. But $C(k) - 1 < B(k) - 1 = B(k-1)$, so algorithm B is not optimal for $k - 1$—contradicting our assumption that $k$ is the smallest number for which B is not optimal.
Put more succinctly: if we can generate an odd number $k$ more efficiently than algorithm B, then we can also generate $k - 1$ more efficiently, so the smallest non-optimal $k$ can’t be odd.
So suppose $k$ is even, say $k = 2m$. We know the last step of B is doubling in this case, since the binary representation of $k$ ends in a $0$. Let A be a sequence of length $C(k)$ that generates $k$. If the last step of A is also doubling, then removing it gives a sequence of $C(k) - 1$ steps generating $m$; but then $C(m) \leq C(k) - 1 < B(k) - 1 = B(m)$, again contradicting the minimality of $k$.
So suppose the last step of A is incrementing. Since the binary sequence for $k$ is the same as the sequence for $m$ followed by a doubling step, $B(k) = B(m) + 1$. This in turn is equal to $C(m) + 1$, since we assumed that $C(m) = B(m)$ for any $m < k$. So we have $C(k) < B(k) = C(m) + 1$, that is, $C(k) \leq C(m)$.
On the other hand, since the last step of the optimal sequence A for $k$ is an increment, we have $C(k) = C(k-1) + 1$ (since A is an optimal sequence for $k$ if and only if A without the final increment is an optimal sequence for $k - 1$). $C(k-1)$ is equal to $B(k-1)$, since $C$ and $B$ are equal on everything less than $k$. Since $k - 1$ is odd, the binary algorithm sequence for $k - 1$ ends with a double followed by an increment, hence $B(k-1) = B(m-1) + 2 = C(m-1) + 2$.
Putting this all together, we have $C(m) \geq C(k) = C(k-1) + 1 = C(m-1) + 3$, which means the optimal sequence for $m$ takes three more steps than the optimal sequence for $m - 1$. But this is absurd: there’s no way the optimal sequence for $m$ takes even two more steps than the optimal sequence for $m - 1$, because we could just add an increment to the optimal sequence for $m - 1$ and get $m$ in $C(m-1) + 1$ steps.
So we have shown that all these cases lead to absurdity: the conclusion is that there can’t be any such $k$ where $C(k) < B(k)$: the binary algorithm is optimal for every $n$!
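As a sanity check (mine, not part of the proof), we can compare the binary algorithm’s step count $B(n)$ against the true optimum $C(n)$, computed bottom-up by dynamic programming, for all small $n$:

```python
def B(n):
    """Steps the binary algorithm uses to build n from 1: for each bit
    after the leading 1, double, then also increment if the bit is 1."""
    return sum(1 + (bit == '1') for bit in bin(n)[3:])

def min_steps(limit):
    """C[n] = minimum number of doubling/incrementing steps to build n
    from 1. An optimal sequence's last step is an increment (always
    possible) or, for even n, possibly a double."""
    C = [0] * (limit + 1)
    for n in range(2, limit + 1):
        C[n] = C[n - 1] + 1                   # last step: increment
        if n % 2 == 0:
            C[n] = min(C[n], C[n // 2] + 1)   # last step: double
    return C

C = min_steps(1999)
assert all(B(n) == C[n] for n in range(1, 2000))
print("binary algorithm optimal for all n < 2000")
```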