An Illustrated Theory of Numbers
Martin H. Weissman
American Mathematical Society, 2017
“Sorry… I couldn’t tell if you were napping, or praying, or reading…”
Thus apologized the barista at my favorite coffee shop for her hesitancy in telling me that my tea was ready. Or perhaps she was just apologizing for interrupting my nap/prayer/reading. In any case, what was it I was actually doing?
In fact, as you may have guessed from the subject of this post, I was reading Martin Weissman’s new book, An Illustrated Theory of Numbers. The reason the barista was so confused was that I was hunched over the book, deep in fascinated concentration over one of the many rich data visualizations. If I recall correctly it was actually a visualization of the distribution of prime numbers (and while staring at it I had a sudden epiphany that I have been implementing prime sieves inefficiently, but that is a story for another blog post!).
As any reader of this blog knows, I love number theory and I love visualizations, so this book is right up my alley. Weissman is a really great teacher, and has obviously spent a lot of time thinking carefully about the best way to structure and explain various topics—even without the illustrations I think the book would still make a valuable contribution to the state-of-the-art in number theory pedagogy. But with the visualizations and illustrations it is truly wonderful! They are plentiful (almost 500!), beautiful, and pedagogically powerful. I had never thought of number theory as a particularly visual subject before, but Weissman makes an impassioned—and, I think, successful—case that it is, or should be. I’d love to give an example but I’m not sure I could do it justice, and it would make this post too long. I will consider whether there is a cool example I can share in a future blog post.
Although the book makes it to “advanced” topics by the end (quadratic reciprocity and quadratic forms), it is fairly self-contained and doesn’t formally require much background beyond high school or basic undergraduate mathematics. All the necessary results are carefully explained and proved, each new result building upon previous ones. It also has a large collection of exercises after each chapter, making it useful either for self-study or as part of a class. To be honest I kind of wish I could teach a number theory class so I could use this as the textbook. And if I knew any mathematically inclined, self-motivated high school students I would get them a copy of this book in a heartbeat (well, modulo budgetary constraints).
In a previous post I gave rules for when two orthogons will be considered the same:
I just realized that I forgot a rule!
You might not have even noticed this omission, because it would be strange not to have this rule: usually we think of two polygons as being the same if one is just a rotation of the other. And if rotating an orthogon were to result in a distinct orthogon, then there would be way too many distinct orthogons (uncountably infinitely many, in fact).
So, for example, these four orthogons are all the same:
Of course this notion of sameness should also be reflexive (any orthogon is the same as itself) and transitive (if A is the same as B and B is the same as C, then A is the same as C), so that it is an equivalence relation.
Now, in my previous post I finished the proof showing that orthogons are in 1-1 correspondence with sequences of X’s and V’s, containing exactly four more X’s than V’s, when the sequences are also considered the same up to rotation and reversal. Sequences considered the same up to rotation and reversal are sometimes called bracelets. They are very similar to the necklaces we saw in a recent proof of Fermat’s Little Theorem; the only difference is that two sequences which are the reverse of each other are considered distinct as necklaces, but the same as bracelets. Let’s call bracelets of X’s and V’s, with exactly four more X’s than V’s, orthobraces. (Not to be confused with orthodontics, which are also called braces, but are definitely not the same when rotated.)
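To make “the same up to rotation and reversal” concrete, here is a minimal Python sketch that picks one representative string for each bracelet. The function names are chosen purely for illustration, and using the lexicographically smallest string as the representative is just one convenient convention, not the only possible one.

```python
def canonical_bracelet(seq: str) -> str:
    """Return the smallest string among all rotations of seq and of its reversal."""
    if not seq:
        return seq
    candidates = []
    for s in (seq, seq[::-1]):          # the sequence and its mirror image
        for i in range(len(s)):         # every cyclic rotation
            candidates.append(s[i:] + s[:i])
    return min(candidates)


def same_bracelet(a: str, b: str) -> bool:
    """Two sequences are the same bracelet iff they share a canonical form."""
    return len(a) == len(b) and canonical_bracelet(a) == canonical_bracelet(b)


def is_orthobrace_sequence(seq: str) -> bool:
    """Check that seq uses only X and V, with exactly four more X's than V's."""
    return set(seq) <= {"X", "V"} and seq.count("X") - seq.count("V") == 4
```

For instance, `same_bracelet("XXXXXV", "VXXXXX")` holds because one string is a rotation of the other, while `XXXXXXVV` and `XXXXXVXV` are different bracelets: the two V’s are adjacent in the first and never adjacent in the second.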
So we have reduced the problem of listing all the orthogons with a given number of vertices to two subproblems:
I want to start by talking about the first subproblem. To get a sense for why listing bracelets is an interesting problem, here are some questions you can ponder:
Start by considering normal sequences (so, e.g., XXXXXV, VXXXXX, and XXXVXX are all different sequences). How many sequences are there with four X’s and no V’s? How many with five X’s and one V? Six X’s and two V’s? In general, how many sequences are there with $n+4$ X’s and $n$ V’s?
Try listing small orthobraces. How many do you get for each number of V’s? You can try to come up with a formula if you really want, but this actually turns out to be quite difficult; I suggest just trying to systematically list orthobraces and see how many you get. For example, there is only one orthobrace with no V’s, namely XXXX. There is also only one, XXXXXV, with one V. This is the only one since any sequence with five X’s and one V is just a rotation of this one. (The particular sequence we pick to represent the orthobrace doesn’t matter.)
Compare how many bracelets there are of each size with how many sequences there are. What seems to be the general trend?
An obvious algorithm for listing all bracelets of a given size is to list all the sequences (this is very easy), and throw away any sequences which are equivalent to one we’ve already generated when considered as bracelets (i.e. up to rotation and reversal). Based on your observations above, how feasible is this method?
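The obvious algorithm just described can be sketched in a few lines of Python. This is only a brute-force illustration under my own naming (`sequences`, `orthobraces`, and `canonical_bracelet` are hypothetical helpers), and it identifies equivalent sequences by reducing each one to its smallest rotation or reversed rotation.

```python
from itertools import combinations


def canonical_bracelet(seq):
    """Smallest string among all rotations of seq and of its reversal."""
    return min(s[i:] + s[:i] for s in (seq, seq[::-1]) for i in range(len(s)))


def sequences(num_v):
    """Yield every sequence with num_v V's and num_v + 4 X's."""
    length = 2 * num_v + 4
    for v_positions in combinations(range(length), num_v):
        chars = ["X"] * length
        for i in v_positions:
            chars[i] = "V"
        yield "".join(chars)


def orthobraces(num_v):
    """List one representative per bracelet by discarding equivalent sequences."""
    return sorted({canonical_bracelet(s) for s in sequences(num_v)})


# Compare the number of sequences with the number of bracelets.
for num_v in range(5):
    num_sequences = sum(1 for _ in sequences(num_v))
    print(num_v, num_sequences, len(orthobraces(num_v)))
```

Running the loop for a few more values of `num_v` is one way to get a feel for how quickly the work grows compared with the number of bracelets that survive.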
Every sequence of an even number of X’s and V’s, with exactly four more X’s than V’s, corresponds to the vertex sequence of some orthogonal polygon.
From the previous properties we know that any orthogon gives rise to such a vertex sequence; but how do we know the correspondence goes two ways? Could there be some vertex sequence that for some reason is not possible to realize geometrically?
Of course the answer is no, and here’s one proof. (There may well be better ones; I welcome your ideas! Commenter Gesh also gave a similar proof in the comments to my last post.) The proof is by induction on the number of V’s in the sequence.
First, if the sequence contains no occurrences of V, then it must be XXXX (since there are four more X’s than V’s), and we know this is the vertex sequence of a square.
So now suppose there is at least one V (and hence at least five X’s). There must be an occurrence of either VX or XV somewhere in the sequence. If we delete a VX or XV from the sequence, the resulting sequence still has four more X’s than V’s, and has one fewer V, so the induction hypothesis tells us that it is the vertex sequence of some orthogon. We can take this orthogon and turn it into an orthogon corresponding to the original sequence with the VX or XV re-inserted, by inserting a “step” into the appropriate edge, like so:
Since an orthogon is finite and all its edges must have some positive length, it is always possible to make this “step” small enough that it does not cause any problems with the rest of the orthogon. (For example, there might be another parallel edge very near to the edge in which we are inserting the step; we have to make the step small enough so that it does not intersect with this other edge.) Concretely, we could, for example, make each successive step a fixed fraction of the height of the previous one, so that the step heights decrease exponentially.
This proof not only shows that there must exist an orthogon corresponding to each valid vertex sequence, it actually gives an algorithm for constructing one. (There is only a small problem: the resulting drawings look terrible! But one thing at a time…)
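The first half of that algorithm, repeatedly deleting an adjacent XV or VX until only XXXX remains, can be sketched as follows. The function name `reduction_steps` is hypothetical, the choice to always delete the leftmost such pair is an arbitrary assumption of this sketch, and the geometric half (re-inserting the steps into a drawing) is not attempted here.

```python
def reduction_steps(seq: str) -> list[str]:
    """Return the sequences obtained by repeatedly deleting an adjacent XV or VX.

    The first entry is seq itself and the last is always "XXXX".
    """
    if set(seq) - {"X", "V"} or seq.count("X") - seq.count("V") != 4:
        raise ValueError("need only X's and V's, with four more X's than V's")
    steps = [seq]
    while "V" in seq:
        # Some adjacent pair must differ, since seq contains both letters.
        i = next(i for i in range(len(seq) - 1) if seq[i] != seq[i + 1])
        seq = seq[:i] + seq[i + 2:]   # delete the XV or VX at positions i, i+1
        steps.append(seq)
    return steps


print(reduction_steps("XXXXXXVVXV"))
```

Reading the returned list backwards gives the order in which the steps would be re-inserted, starting from the square.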
As a concrete example, let’s consider the string XXXXXXVVXV.
First we remove an XV, resulting in XXXXXVXV. We have to recursively produce a drawing of this orthogon before we can re-insert the XV.
Next we remove another XV, leaving XXXXXV.
Finally we remove the last XV, leaving XXXX. This is the base case, and corresponds to a square:
We re-insert XV to give XXXXXV. This corresponds to inserting a step into one of the edges, like so:
Re-inserting XV again yields XXXXXVXV, which corresponds to adding a smaller step to the edge just before the convex vertex of the previously added step:
Finally, to re-insert the final XV, we add a still smaller step in the middle of the step we just added:
So, in summary, we now know there is a 1-1 correspondence between orthogons and sequences of V’s and X’s with four more X’s than V’s (where we also consider two sequences equivalent if one is a cyclic rotation of the other, or if one is the reverse of the other). If we can find a way to enumerate such sequences, we can use them to generate all possible orthogons. I’ll talk about generating these sequences in an upcoming post. Also, as I mentioned before, although this algorithm proves that a drawing always exists for any suitable sequence of X’s and V’s, drawings actually produced with this algorithm do not look very good! The exponentially decreasing steps guarantee nothing will ever intersect, but after inserting five or six of them they basically get too small to see. Producing good-looking drawings turns out to be a very interesting challenge, which I will explain in another post.
In my previous post, I posed several properties of orthogons and invited you to figure out why they are true. Here are my proofs.
Every orthogon has an even number of vertices.
Since every vertex is a right angle, the edges of an orthogon must alternate between horizontal and vertical. Hence there must be an even number of vertices; otherwise there would be two vertical edges or two horizontal edges meeting at a vertex, which does not make sense.
Orthogons only have two kinds of vertices: “right turns” and “left turns” (let’s suppose we always travel clockwise around the polygon).
Since writing this, I realized that it is probably easier to think of the vertices as either convex (the corner points towards the exterior of the polygon) or concave (the corner points towards the interior). I’ll use the letter X to stand for conveX vertices, and V for concaVe.
This way, we don’t have to worry about whether we traverse the polygon in clockwise or counterclockwise order (and besides, it can be confusing to figure out what “clockwise” vs “counterclockwise” even means when the polygon doubles back on itself a lot, as in the example shown below).
If you switch directions, left turns become right turns and vice versa. But describing vertices as convex or concave is independent of the polygon’s orientation. You can verify for yourself that the orthogon shown above has the vertex sequence $V^9 X^{13}$, that is, nine concave vertices in a row followed by 13 convex vertices.
In any case, the reason there are only two types of vertices is simple: if you are travelling along an edge and come to a right-angle vertex, there are only two choices: turn 90 degrees to the left, or to the right.
Every (closed, non-self-intersecting) orthogon has four more right turns than left turns.
Again, the way I originally stated this, with left and right turns, is annoying: if you travel around an orthogon in the opposite direction, then it instead has four more left turns than right turns. We can state this in a much nicer way which is independent of direction: an orthogon always has four more convex vertices than concave.
Obviously this is true for a square, the one possible orthogon with four vertices: there are four convex vertices and no concave vertices. We can see that it also holds for the example orthogon shown above, with nine concave vertices and thirteen convex. Intuitively, the reason it is true in general is that every concave vertex is a turn in “the wrong direction” which needs to be “cancelled” by a convex vertex; on top of that, there need to be four convex vertices to make the whole thing “close up”.
Let’s see how to prove this more formally. We start from the fact that if you sum the internal angles of any polygon with $n$ vertices, you get $180(n-2)$ degrees (that is, $n-2$ times $180$ degrees). When $n = 3$ this is just the familiar fact that the angles of a triangle sum to $180^\circ$. But then for bigger $n$, we can always cut up an $n$-gon into $n-2$ triangles, and adding up the angles of all the triangles is the same as adding up all the internal angles of the whole polygon.
Now, suppose an orthogon has $x$ convex vertices and $v$ concave vertices, and hence $x + v$ vertices in total. Convex vertices have an internal angle of $90^\circ$, and concave have an internal angle of $270^\circ$. Thus,
$$90x + 270v = 180(x + v - 2).$$
If we divide both sides by $180$ and multiply by $2$, we get
$$x + 3v = 2(x + v - 2),$$
and then a bit of algebraic rearranging yields $x = v + 4$.
Incidentally, this now gives us another way to see that there must be an even number of vertices: $x + v = (v + 4) + v = 2(v + 2)$, which is clearly even.
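The “four more convex than concave” property is easy to check numerically on a specific orthogon. Below is a small sketch, assuming the polygon is given as a list of integer vertex coordinates in traversal order; the function name `vertex_sequence` and the coordinate representation are my own choices for illustration. It uses the sign of a cross product to tell left turns from right turns, and the signed area to decide which of the two counts as convex.

```python
def vertex_sequence(points):
    """Classify each vertex of a simple orthogonal polygon as X (convex) or V (concave).

    points is a list of (x, y) pairs in traversal order, in either direction.
    """
    n = len(points)
    # Twice the signed area (shoelace formula); positive means counterclockwise.
    area2 = sum(points[i][0] * points[(i + 1) % n][1]
                - points[(i + 1) % n][0] * points[i][1] for i in range(n))
    orientation = 1 if area2 > 0 else -1
    result = []
    for i in range(n):
        (ax, ay), (bx, by), (cx, cy) = points[i - 1], points[i], points[(i + 1) % n]
        # Cross product of the incoming and outgoing edges at vertex b.
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        result.append("X" if cross * orientation > 0 else "V")
    return "".join(result)


# An L-shaped orthogon with six vertices.
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
seq = vertex_sequence(l_shape)
print(seq, seq.count("X") - seq.count("V"))
```

Reversing the list of points changes the traversal direction but not the classification, which is exactly the advantage of talking about convex and concave vertices instead of left and right turns.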
The final property I stated—namely, that every viable sequence of V’s and X’s describes some orthogon—will have to wait for another post. I’ll give you a hint: the best way to prove this is to give an algorithm that takes a sequence of V’s and X’s as input and produces an actual drawing of an orthogon as output. (But for now it doesn’t matter if the drawing is horrible, just whether one exists!)
Wiktionary suggests that “orthogon” can refer to a right triangle or to a “rectangular figure” but I have never heard it used to refer to those things. Perhaps it is more common in other English-speaking countries?
Quite a few commenters figured out what was going on, and mentioned several nice (equivalent) ways to think about it. Primarily, the idea is to draw all possible orthogonal polygons, that is, polygons with only right angles, organized by the total number of vertices. (So, for example, the picture above shows all orthogonal polygons with exactly ten vertices.) However, we have to be careful what we mean by the phrase “all possible”: there would be an infinite number of such polygons if we think about things like the precise lengths of edges. So we have to say when two polygons are considered the same, and when they are distinct. My rules are as follows:
So, for example, I will consider these three polygons to all be the same:
(It’s easy to see why the first two are the same. Can you see why the other one is the same too?)
I want to explain some of the mathematics behind generating these. In order to get there, I will start by stating some propositions. Can you see why each of these statements is true?
The fourth statement may seem trivial but it is worth a bit of thought: how do you know that you can always draw a non-self-intersecting orthogonal polygon for any valid sequence of left and right turns?
I mentioned that we know how to make primality machines that are much faster than factorization machines, or even than factor machines. Before I finally get around to explaining how to build such fast primality machines, it’s worth explaining what I mean when I talk about a machine being “fast” or “slow”.
Of course this whole time when I have been talking about machines, what I really have in mind are algorithms, i.e. specific sets of steps to take some given input and produce a desired output. In other words, computer programs. (It’s a fascinating and wonderful fact that computers are universal machines, in the sense that they can simulate any other machine; instead of having one special machine to write email, and another special machine to read Wikipedia, and yet another special machine to find prime numbers, you can just have one single, general-purpose machine that can do all of these things. But this is a subject for another blog post!)
The obvious, naive way to create a factor machine is as follows:
This is called trial division. (Can you see why it will always return a prime divisor of $n$?) There are several ways this can be optimized, but we’ll get back to those later. For now let’s think about how long this takes.
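Here is a minimal Python sketch of naive trial division. It assumes the procedure is simply “try each candidate divisor from 2 upward and return the first one that divides the input”; the function name `factor_machine` is borrowed from the machine metaphor and is otherwise hypothetical.

```python
def factor_machine(n: int) -> int:
    """Return the smallest divisor d >= 2 of n, which is necessarily prime."""
    if n < 2:
        raise ValueError("n must be at least 2")
    for d in range(2, n + 1):
        if n % d == 0:
            # The first divisor found cannot itself have a smaller divisor,
            # since that smaller divisor would also divide n.
            return d
    raise AssertionError("unreachable: n always divides itself")


print(factor_machine(91), factor_machine(97))
```

When the input is prime, the loop runs all the way up to the input itself, which is the worst case analyzed next.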
When measuring how long an algorithm takes, it is useless to measure the actual running time, in seconds, on some particular computer. For one thing, the amount of time the algorithm takes to run is highly dependent on the particular computer executing it, what other programs are running on the same computer, the weather, time of day, etc., so it is quite difficult to compare to other algorithms. For another thing, the algorithm may take different amounts of time for different inputs. When mathematically analyzing how long an algorithm takes, what we really care about is how the running time scales as a function of the size of the input. This tells us something about how big the inputs can get before the algorithm takes an infeasibly long time to run.
So, what about trial division? The input is an integer $n$; the size of the input is the number of digits needed to write $n$. (It is more typical to measure the size in terms of the number of bits needed to write it in binary, but it turns out that the base doesn’t really matter, so we’ll stick to base 10 for now.) Let $k$ be the number of digits needed to write $n$, so $k \approx \log_{10} n$, and $n \approx 10^k$. How many steps does the algorithm take to run? We do one division operation for each $d$ from $2$ up to $n$, so it essentially takes time proportional to $n$. Since $n \approx 10^k$, this means that the time needed to run scales exponentially with the size of the input: every time we add one more digit to the input, we multiply the time needed by a factor of $10$.
At this point you may protest that the version of trial division I gave above is hopelessly naive; and indeed, we can optimize it. For example, we don’t have to try every $d$: if we try dividing by $2$ first, after that point we can just try dividing by all the odd numbers up to $n$, since if $2$ doesn’t divide $n$ then no other even number can either. This essentially halves the necessary running time, but it is still proportional to $n$.
A more important optimization is to only test values of $d$ up to $\sqrt{n}$, instead of going all the way up to $n$. If $n$ is the product of two divisors, $n = ab$, then one of them (say, $a$) must be $\leq \sqrt{n}$, and the other must be $\geq \sqrt{n}$. So if we haven’t found any divisors by the time we get to $\sqrt{n}$ then we certainly aren’t going to find any above $\sqrt{n}$.
So now the algorithm only takes time proportional to $\sqrt{n}$, which seems like it is a big improvement. To be sure, it is an improvement; but if we relate it back to the number of digits $k$, we find $\sqrt{n} \approx \sqrt{10^k} = (\sqrt{10})^k$, so the running time still grows exponentially in the number of digits $k$, just with a smaller base.
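Both optimizations fit into a short Python sketch: divide by 2 first, then try only odd candidates, and stop at the square root. The function name `smallest_prime_factor` is my own, and `math.isqrt` is used to get the integer square root without floating-point rounding.

```python
from math import isqrt


def smallest_prime_factor(n: int) -> int:
    """Trial division, testing 2 and then odd candidates up to sqrt(n)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    if n % 2 == 0:
        return 2
    for d in range(3, isqrt(n) + 1, 2):
        if n % d == 0:
            return d
    # No divisor up to sqrt(n), so n has no nontrivial divisor at all.
    return n


print(smallest_prime_factor(91), smallest_prime_factor(97))
```

The loop now performs roughly half of the square root of the input many divisions in the worst case, which is a large saving in practice but, as noted above, still exponential in the number of digits.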
In general, exponential algorithms are fairly useless: for even modest-sized inputs, the running time can be astronomical. For example, suppose an algorithm takes exactly $10^k$ seconds to run on an input of size $k$. When $k = 0$ it takes $1$ second; when $k = 1$, $10$ seconds. So far, so good. For $k = 2$ it takes a couple minutes ($100$ seconds); for $k = 3$, it takes about 15 minutes; for $k = 4$, a few hours; for $k = 5$, a whole day. By the time we get up to an input of size $10$, the algorithm takes over 300 years. For an input of size $15$, it takes over 30 million years. We only have to get up to size $18$ or so before the algorithm would take longer than the estimated age of the known universe.
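These figures are easy to reproduce. The sketch below assumes a running time of $10^k$ seconds on an input of size $k$, which matches the durations quoted above; the constant for the age of the universe (about 13.8 billion years) is a commonly cited estimate that I am supplying for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60
AGE_OF_UNIVERSE_YEARS = 13.8e9  # assumed estimate, for comparison only


def running_time_years(k: int) -> float:
    """Years needed by an algorithm that takes 10**k seconds on size-k input."""
    return 10**k / SECONDS_PER_YEAR


for k in (2, 3, 4, 5, 10, 15, 18):
    seconds = 10**k
    print(f"k={k:2d}: {seconds:.0e} s = {seconds / 3600:.3g} h "
          f"= {running_time_years(k):.3g} years")
```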
In some sense, we don’t know how to do better than this for factor machines and factorization machines. There are factorization algorithms which are indeed much faster than trial division (with fancy names like the General Number Field Sieve). Using such algorithms it’s feasible to factor numbers with up to hundreds of digits instead of just a few; nevertheless, the running time of these algorithms still scales essentially exponentially with the size of the input. (There is actually quite a lot of technical detail I am sweeping under the rug here, but I don’t really want to get into it so I hope you will forgive me.)
Primality testing, on the other hand, is a completely different story! Trial division is the only algorithm most people know for primality testing, but over the next few posts I will explain a few other algorithms (and prove that they work!) which are much faster.
If you want to understand what computers are actually doing when they check a Mersenne number for primality, I wrote a whole series about it two years ago: visit this list of my post series and search for “Lucas-Lehmer”.
This is the most obvious question we could ask. The Fundamental Theorem tells us such a prime factorization must exist (and must be unique up to the order of the factors), so we can simply ask what it is. For example, if , then the answer would be . What we really want is some sort of machine (let’s call it a factorization machine) where we can put a positive integer in one end, the machine makes whirring, grinding, and beeping sounds for a while, and then a factorization pops out the other end:
Another question we could ask about $n$ would be:
An answer to this question is a similar sort of machine, except instead of getting the entire prime factorization out, we just get one prime factor; let’s call this a factor machine. For example, it might work like this:
These first two machines are very closely related. If we have a factorization machine we can easily use it to build a factor machine: just run $n$ through the factorization machine, pick one of the factors to return, and throw the rest away.
Slightly less obviously, we can also use a factor machine to build a factorization machine: run $n$ through the factor machine to get a prime factor $p_1$. Now we can compute $n_1 = n/p_1$ (by definition, $p_1$ must evenly divide $n$); to get the rest of the factorization of $n$ we just need to factor $n_1$. So we run $n_1$ through the factor machine to get a prime factor $p_2$; we compute $n_2 = n_1/p_2$; and so on. We are done when putting $n_k$ into the factor machine returns $n_k$ itself; that means $n_k$ is prime. Finally, we return the factorization $n = p_1 p_2 \cdots p_k n_k$.
For example, if we run through the factor machine and get , then we compute . Next we run through the factor machine again and get, say, , and then compute the remaining part that needs to be factored, ; and so on.
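The construction of a factorization machine from a factor machine can be sketched in Python as below. The factor machine is passed in as a parameter; `trial_division_factor` is only a stand-in I am assuming for illustration, and any function that returns some prime factor of its input would do.

```python
from math import isqrt
from typing import Callable, List


def trial_division_factor(n: int) -> int:
    """Stand-in factor machine: smallest prime factor of n >= 2."""
    if n < 2:
        raise ValueError("n must be at least 2")
    if n % 2 == 0:
        return 2
    for d in range(3, isqrt(n) + 1, 2):
        if n % d == 0:
            return d
    return n


def factorization_machine(
    n: int, factor_machine: Callable[[int], int] = trial_division_factor
) -> List[int]:
    """Build the full prime factorization by repeatedly asking for one factor."""
    factors = []
    while True:
        p = factor_machine(n)
        factors.append(p)
        if p == n:          # the machine returned its input, so n is prime
            return factors
        n //= p             # continue with the part still to be factored


print(factorization_machine(360))
```

With the stand-in factor machine, 360 is peeled apart as 2, 2, 2, 3, 3 and finally 5, at which point the remaining number is returned unchanged and the loop stops.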
Finally, here’s a third question we could ask:
In answer to this we could imagine another similar machine which outputs a simple yes/no answer. Let’s call this a primality machine:
Given a factor machine, we can easily use it to build a primality machine: take the input $n$ and run it through the factor machine. If the answer is $n$, then $n$ is prime so output “YES”; otherwise, $n$ is not prime (since it is divisible by the output factor), so output “NO”.
But what about the converse? If we have a working primality machine, can we use it to build a factor machine? Given some $n$, if the primality machine says YES then $n$ is prime, so the factor machine should output $n$. But what if the primality machine says NO? On the face of it, the mere fact of knowing that $n$ is composite does not help us find a factor. But what is the primality machine doing on the inside? If it knows that $n$ is not prime, it must have figured out a way to factor $n$, right? That is, somewhere in the internals of the machine, there must have been a factor that it simply isn’t telling us about:
If this is true, it means we could rip open the guts of the primality machine and use it to build a factor machine. Or, put more simply, this would be saying that the only way to make a primality machine is to build it out of a factor machine. This seems intuitively reasonable: how could you know that $n$ is composite without factoring it?
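As a sketch of this construction, the function below wraps any factor machine into a primality machine. Both `make_primality_machine` and the deliberately naive `naive_factor_machine` are hypothetical names used only for illustration.

```python
def make_primality_machine(factor_machine):
    """Wrap a factor machine so that it answers YES/NO instead of giving a factor."""
    def primality_machine(n: int) -> str:
        # n is prime exactly when its only prime factor is n itself.
        return "YES" if factor_machine(n) == n else "NO"
    return primality_machine


def naive_factor_machine(n: int) -> int:
    """Stand-in factor machine for n >= 2: the smallest divisor greater than 1."""
    return next(d for d in range(2, n + 1) if n % d == 0)


is_prime = make_primality_machine(naive_factor_machine)
print(is_prime(97), is_prime(91))
```

A primality machine built this way is of course no faster than the factor machine inside it.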
The punchline is that this is not true!! That is,
It is possible to make working primality machines that truly do not know anything about any factors of $n$.
Not only that, but we know how to make primality machines that run much faster than the fastest known factor machines!
Let that sink in for a minute. It is really quite surprising that we can find out whether $n$ has any prime divisors (other than itself) without actually finding out what they are, and that moreover it is much faster if you don’t care what the prime factors are. Think of it as a sort of “computational loophole” in our universe.
This is no mere curiosity; it turns out that this “loophole” is actually the basis of much of modern public-key cryptography. When your browser makes an encrypted connection to a website in order to securely transmit your credit card information, it is relying on this loophole: setting up the keys requires picking some numbers and making sure that they are prime, which can be done quickly; but for anyone intercepting the messages to steal your credit card, they would have to factor another number, which (as far as we know) cannot be done quickly. (This is really cool and a topic for another post.)
It gets stranger, though: you may have noticed that I keep using phrases such as “fastest known”, “seems”, and “as far as we know”. It turns out that no one has been able to prove that making faster factor machines can’t be done! So it’s possible that the “loophole” isn’t a loophole at all; perhaps with a clever enough idea it will turn out to be possible to factor numbers just as fast as we can test whether they are prime. But most people don’t believe that.
In some upcoming posts I plan to talk a bit more about what we actually mean when we are talking about these machines being “fast” or “slow”, and then finally get around to explaining some different ways of building fast primality machines.
This time we’re going to prove statement 1:
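If $p$ is prime and $a$ is an integer where $0 < a < p$, then $a^{p-1} \equiv 1 \pmod{p}$.
Consider the set $\mathbb{Z}_p^* = \{1, 2, \dots, p-1\}$. I claim that this set forms a group under the operation of multiplication $\pmod{p}$. To verify this we need to check several things:
So $\mathbb{Z}_p^*$ is a group under the operation of multiplication $\pmod{p}$, and it has $p-1$ elements, that is, its order is $p-1$. But by Lagrange’s Theorem, the order of each element evenly divides the order of the group. So let $a \in \mathbb{Z}_p^*$ and suppose $a$ has order $k$, that is, $k$ is the smallest positive integer for which $a^k \equiv 1 \pmod{p}$. Then $k$ divides $p-1$, that is, there is some $j$ such that $jk = p-1$. Then $a^{p-1} = a^{jk} = (a^k)^j \equiv 1^j = 1 \pmod{p}$.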
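A quick numerical check of this argument, for one small prime, might look like the following sketch. The helper name `multiplicative_order` is my own; it assumes its first argument is not divisible by the prime, since otherwise no power is ever congruent to 1 and the loop would not terminate.

```python
def multiplicative_order(a: int, p: int) -> int:
    """Smallest k >= 1 with a**k congruent to 1 mod p (assumes p does not divide a)."""
    k, power = 1, a % p
    while power != 1:
        power = power * a % p
        k += 1
    return k


p = 13
for a in range(1, p):
    k = multiplicative_order(a, p)
    assert (p - 1) % k == 0          # Lagrange: the order divides p - 1
    assert pow(a, p - 1, p) == 1     # Fermat's Little Theorem
    print(a, k)
```

The built-in three-argument `pow(a, e, m)` computes the power modulo `m` without ever forming the huge intermediate number.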
It is also easy to generalize this proof to a proof of Euler’s Theorem; I will leave it to interested readers to fill in the details. Recall that Euler’s Theorem says that if $n \geq 1$ is any integer and $a$ is relatively prime to $n$, then $a^{\varphi(n)} \equiv 1 \pmod{n}$. Consider the set of all positive integers less than $n$ which are relatively prime to $n$. This also forms a group under multiplication $\pmod{n}$, and hence any element raised to the order of the group ($\varphi(n)$) results in the identity element ($1$).
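The same kind of numerical check works for Euler’s Theorem. In this sketch the totient is computed in the most direct way possible, by counting the integers relatively prime to the modulus; the names `units` and `phi` are illustrative, and the modulus is assumed to be at least 2.

```python
from math import gcd


def units(n: int) -> list[int]:
    """Positive integers less than n that are relatively prime to n (n >= 2)."""
    return [a for a in range(1, n) if gcd(a, n) == 1]


def phi(n: int) -> int:
    """Euler's totient function, computed by brute-force counting."""
    return len(units(n))


for n in (10, 12, 15):
    for a in units(n):
        # Every unit raised to the order of the group gives the identity.
        assert pow(a, phi(n), n) == 1
    print(n, phi(n), units(n))
```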
There are many more proofs of Fermat’s Little Theorem, but this has been a representative sampling! In future posts, I’m going to explore the mathematics behind several computational tests for primality, most of which end up relying on Fermat’s Little Theorem in one way or another.