Book review: An Illustrated Theory of Numbers

[Disclosure of Material Connection: The AMS kindly provided me with a free review copy of this book. I was not required to write a positive review. The opinions expressed are my own.]

An Illustrated Theory of Numbers
Martin H. Weissman
American Mathematical Society, 2017

“Sorry… I couldn’t tell if you were napping, or praying, or reading…”

Thus apologized the barista at my favorite coffee shop for her hesitancy in telling me that my tea was ready. Or perhaps she was just apologizing for interrupting my nap/prayer/reading. In any case, what was it I was actually doing?

In fact, as you may have guessed from the subject of this post, I was reading Martin Weissman’s new book, An Illustrated Theory of Numbers. The reason the barista was so confused was that I was hunched over the book, deep in fascinated concentration over one of the many rich data visualizations. If I recall correctly it was actually a visualization of the distribution of prime numbers (and while staring at it I had a sudden epiphany that I have been implementing prime sieves inefficiently, but that is a story for another blog post!).

As any reader of this blog knows, I love number theory and I love visualizations, so this book is right up my alley. Weissman is a really great teacher, and has obviously spent a lot of time thinking carefully about the best way to structure and explain various topics—even without the illustrations I think the book would still make a valuable contribution to the state-of-the-art in number theory pedagogy. But with the visualizations and illustrations it is truly wonderful! They are plentiful (almost 500!), beautiful, and pedagogically powerful. I had never thought of number theory as a particularly visual subject before, but Weissman makes an impassioned—and, I think, successful—case that it is, or should be. I’d love to give an example but I’m not sure I could do it justice, and it would make this post too long. I will consider whether there is a cool example I can share in a future blog post.

Although the book makes it to “advanced” topics by the end (quadratic reciprocity and quadratic forms), it is fairly self-contained and doesn’t formally require much background beyond high school or basic undergraduate mathematics. All the necessary results are carefully explained and proved, each new result building upon previous ones. It also has a large collection of exercises after each chapter, making it useful for either self-study or use as part of a class. To be honest I kind of wish I could teach a number theory class so I could use this as the textbook. And if I knew any mathematically inclined, self-motivated high school students I would get them a copy of this book in a heartbeat (well, modulo budgetary constraints).


Orthogons and orthobraces

One of these days soon I will get back to writing about primality tests, but for now I am having fun getting sidetracked on orthogons!

In a previous post I gave rules for when two orthogons will be considered the same:

  • Two orthogons are the same if one is the mirror image of the other.
  • Two orthogons are the same if one can be turned into the other just by changing the length of some of the edges.

I just realized that I forgot a rule!

  • Two orthogons are the same if one is a rotation of the other.

You might not have even noticed this omission, because it would be strange not to have this rule: usually we think of two polygons as being the same if one is just a rotation of the other. And if rotating an orthogon were to result in a distinct orthogon, then there would be way too many distinct orthogons (uncountably infinitely many, in fact).

So, for example, these four orthogons are all the same:

Of course this notion of sameness should also be reflexive (any orthogon is the same as itself) and transitive (if A is the same as B and B is the same as C, then A is the same as C), so that it is an equivalence relation.

Now, in my previous post I finished the proof showing that orthogons are in 1-1 correspondence with sequences of X’s and V’s, containing exactly 4 more X’s than V’s, when the sequences are also considered the same up to rotation and reversal. Sequences considered the same up to rotation and reversal are sometimes called bracelets. They are very similar to the necklaces we saw in a recent proof of Fermat’s Little Theorem; the only difference is that two sequences which are the reverse of each other are considered distinct as necklaces, but the same as bracelets. Let’s call bracelets of X’s and V’s, with exactly four more X’s than V’s, orthobraces. (Not to be confused with orthodontics, which are also called braces, but are definitely not the same when rotated.)
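
To make “the same up to rotation and reversal” concrete, here is one simple way to compute a canonical representative of a bracelet (a quick Python sketch of my own; the function name is not standard). Two strings represent the same bracelet exactly when their canonical forms are equal:

    def canonical(s):
        """A canonical representative of the bracelet containing s: the
        lexicographically smallest string among all rotations of s and
        of its reverse."""
        rots = [t[i:] + t[:i] for t in (s, s[::-1]) for i in range(len(s))]
        return min(rots)

For example, canonical("XXXXXV") and canonical("VXXXXX") both yield "VXXXXX", confirming that those two sequences represent the same orthobrace.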

So we have reduced the problem of listing all the orthogons with a given number of vertices to two subproblems:

  1. List all orthobraces of a given length.
  2. Turn each orthobrace into a drawing of an orthogon.

I want to start by talking about the first subproblem. To get a sense for why listing bracelets is an interesting problem, here are some questions you can ponder:

  • Start by considering normal sequences (so, e.g., XXXXXV, VXXXXX, and XXXVXX are all different sequences). How many sequences are there with four X’s and no V’s? How many with five X’s and one V? Six X’s and two V’s? In general, how many sequences are there with n+4 X’s and n V’s?

  • Try listing small orthobraces. How many do you get for each number of V’s? You can try to come up with a formula if you really want, but this actually turns out to be quite difficult; I suggest just trying to systematically list orthobraces and see how many you get. For example, there is only one orthobrace when v = 0, namely XXXX. There is also only one, XXXXXV, for v = 1. This is the only one since any sequence with five X’s and one V is just a rotation of this one. (The particular sequence we pick to represent the orthobrace doesn’t matter.)

  • Compare how many bracelets there are of each size with how many sequences there are. What seems to be the general trend?

  • An obvious algorithm for listing all bracelets of a given size is to list all the sequences (this is very easy), and throw away any sequences which, considered as bracelets (i.e. up to rotation and reversal), are equivalent to one we’ve already generated; there is a sketch of this approach just after this list. Based on your observations above, how feasible is this method?
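
Here is a minimal Python sketch of that naive method, reusing the canonical function from the sketch above. Playing with it for small values of v should give you a feel for how the counts grow, and for how much work the method wastes:

    from itertools import combinations

    def orthobraces(v):
        """All orthobraces with v V's (and hence v + 4 X's), by brute
        force: generate every sequence of length 2v + 4 and keep one
        canonical representative per bracelet class."""
        n = 2 * v + 4
        braces = set()
        for positions in combinations(range(n), v):  # where the V's go
            s = "".join("V" if i in positions else "X" for i in range(n))
            braces.add(canonical(s))
        return sorted(braces)

As a sanity check, orthobraces(0) and orthobraces(1) each return a single orthobrace, matching the discussion above.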


Properties of orthogons II

In my previous post I proved three out of the four properties of orthogons I originally stated. Now let’s prove the final property:

Every sequence of an even number of X’s and V’s, with exactly four more X’s than V’s, corresponds to the vertex sequence of some orthogonal polygon.

From the previous properties we know that any orthogon gives rise to such a vertex sequence; but how do we know the correspondence goes two ways? Could there be some vertex sequence that for some reason is not possible to realize geometrically?

Of course the answer is no—and here’s one proof. (There may well be better ones; I welcome your ideas! Commenter Gesh also gave a similar proof in the comments to my last post.) The proof is by induction on the number of V’s in the sequence.

First, if the sequence contains no occurrences of V, then it must be XXXX (since there are four more X’s than V’s), and we know this is the vertex sequence of a square.

So now suppose there is at least one V (and hence at least five X’s). There must be an occurrence of either VX or XV somewhere in the sequence. If we delete a VX or XV from the sequence, the resulting sequence still has four more X’s than V’s, and has one fewer V, so the induction hypothesis tells us that it is the vertex sequence of some orthogon. We can take this orthogon and turn it into an orthogon corresponding to the original sequence with the VX or XV re-inserted, by inserting a “step” into the appropriate edge, like so:

Since an orthogon is finite and all its edges must have some positive length, it is always possible to make this “step” small enough that it does not cause any problems with the rest of the orthogon. (For example, there might be another parallel edge very near to the edge in which we are inserting the step; we have to make the step small enough so that it does not intersect with this other edge.) Concretely, we could, for example, make the first step have height 1/2, the second one 1/4, and so on, with the nth step having height 1/2^n.

This proof not only shows that there must exist an orthogon corresponding to each valid vertex sequence, it actually gives an algorithm for constructing one. (There is only a small problem: the resulting drawings look terrible! But one thing at a time…)

As a concrete example, let’s consider the string XXXXXXVVXV.

  • Let’s remove the first XV, resulting in XXXXXVXV. We have to recursively produce a drawing of this orthogon before we can re-insert the XV.
    • We remove another XV, leaving XXXXXV.
      • Finally we remove the last XV, leaving XXXX. This is the base case, and corresponds to a square:

      • We re-insert XV to give XXXXXV. This corresponds to inserting a step into one of the edges; we’ll make it 1/2 unit tall, like so:

    • Re-inserting XV again yields XXXXXVXV, which corresponds to adding a step (of size 1/4) to the edge just before the convex vertex of the previously added step:

  • Finally, to re-insert the final XV, we add a size-1/8 step in the middle of the step we just added:

So, in summary, we now know there is a 1-1 correspondence between orthogons and sequences of V’s and X’s with four more X’s than V’s (where we also consider two sequences equivalent if one is a cyclic rotation of the other, or if one is the reverse of the other). If we can find a way to enumerate such sequences, we can use them to generate all possible orthogons. I’ll talk about generating these sequences in an upcoming post. Also, as I mentioned before, although this algorithm proves that a drawing always exists for any suitable sequence of X’s and V’s, drawings actually produced with this algorithm do not look very good! The exponentially decreasing steps guarantee nothing will ever intersect, but after inserting five or six of them they basically get too small to see. Producing good-looking drawings turns out to be a very interesting challenge, which I will explain in another post.
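
For concreteness, here is a minimal Python sketch of the construction (my own transcription of the proof, not polished drawing code). It returns vertex coordinates in counterclockwise order, always deleting the first adjacent XV or VX pair it finds, and sizing the kth inserted step at height 1/2^k as described above. It assumes the input really is a valid vertex sequence:

    def draw_orthogon(seq):
        """Vertex coordinates (counterclockwise) of an orthogon with the
        given vertex sequence of X's and V's, built as in the proof:
        delete an adjacent XV or VX pair, recursively draw the smaller
        orthogon, then re-insert the pair as a small step."""
        if seq == "XXXX":
            return [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
        h = 0.5 ** seq.count("V")  # the kth step inserted has height 1/2^k
        i = next(j for j in range(len(seq) - 1) if seq[j] != seq[j + 1])
        pts = draw_orthogon(seq[:i] + seq[i + 2:])
        n = len(pts)
        (ax, ay), (bx, by) = pts[(i - 1) % n], pts[i % n]    # edge A -> B
        length = abs(bx - ax) + abs(by - ay)                 # axis-aligned
        dx, dy = (bx - ax) / length, (by - ay) / length      # edge direction
        # jutting left makes an X-then-V step; jutting right makes V-then-X
        px, py = (-dy, dx) if seq[i] == "X" else (dy, -dx)
        cx, cy = ax + dx * length / 2, ay + dy * length / 2  # split edge at C
        pts[i % n] = (bx + h * px, by + h * py)              # slide B over by h
        return pts[:i] + [(cx, cy), (cx + h * px, cy + h * py)] + pts[i:]

Calling draw_orthogon("XXXXXXVVXV") performs the same sequence of deletions as the worked example above, with steps of heights 1/2, 1/4, and 1/8 (though the steps may land on different edges than in my pictures).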


Properties of orthogons I

First things first: from now on, when talking about polygons with only right angles, instead of calling them “orthogonal polygons” I’m going to start calling them “orthogons”, which sounds cool, is much less clunky than “orthogonal polygons”, and doesn’t seem to already be in use.1 This name was recently suggested to me by Mark Dominus (who was the one who got me thinking about orthogons in the first place).

In my previous post, I stated several properties of orthogons and invited you to figure out why they are true. Here are my proofs.

  1. Every orthogon has an even number of vertices.

    Since every vertex is a right angle, the edges of an orthogon must alternate between horizontal and vertical. Hence there must be an even number of vertices; otherwise there would be two vertical edges or two horizontal edges meeting at a vertex, which does not make sense.

  2. Orthogons only have two kinds of vertices: “right turns” and “left turns” (let’s suppose we always travel clockwise around the polygon).

    Since writing this, I realized that it is probably easier to think of the vertices as either convex (the corner points towards the exterior of the polygon) or concave (the corner points towards the interior). I’ll use the letter X to stand for conveX vertices, and V for concaVe.

    This way, we don’t have to worry about whether we traverse the polygon in clockwise or counterclockwise order (and besides, it can be confusing to figure out what “clockwise” vs “counterclockwise” even means when the polygon doubles back on itself a lot, as in the example shown below).

    If you switch directions, left turns become right turns and vice versa. But describing vertices as convex or concave is independent of the polygon’s orientation. You can verify for yourself that the orthogon shown above has the vertex sequence V^9 X^{13}, that is, nine concave vertices in a row followed by thirteen convex vertices.

    In any case, the reason there are only two types of vertices is simple: if you are travelling along an edge and come to a right-angle vertex, there are only two choices: turn 90 degrees to the left, or to the right.

  3. Every (closed, non-self-intersecting) orthogon has four more right turns than left turns.

    Again, the way I originally stated this, with left and right turns, is annoying: if you travel around an orthogon in the opposite direction, then it instead has four more left turns than right turns. We can state this in a much nicer way which is independent of direction: an orthogon always has four more convex vertices than concave.

    Obviously this is true for a square, the one possible orthogon with four vertices: there are four convex vertices and no concave vertices. We can see that it also holds for the example orthogon shown above, with nine concave vertices and thirteen convex. Intuitively, the reason it is true in general is that every concave vertex is a turn in “the wrong direction” which needs to be “cancelled” by a convex vertex; on top of that, there need to be four convex vertices to make the whole thing “close up”.

    Let’s see how to prove this more formally. We start from the fact that if you sum the internal angles of any polygon with n vertices, you get \pi(n-2) (that is, n-2 times 180 degrees). When n = 3 this is just the familiar fact that the angles of a triangle sum to \pi. But then for bigger n, we can always cut up an n-gon into n-2 triangles, and adding up the angles of all the triangles is the same as adding up all the internal angles of the whole polygon.

    Now, suppose an orthogon has x convex vertices and v concave vertices, and hence x + v vertices in total. Convex vertices have an internal angle of \pi/2, and concave have an internal angle of 3\pi/2. Thus,

    \displaystyle x(\pi/2) + v(3\pi/2) = \pi(x+v-2)

    If we divide both sides by \pi and multiply by 2, we get

    \displaystyle x + 3v = 2x+2v-4

    and then a bit of algebraic rearranging yields x - v = 4.

    Incidentally, this now gives us another way to see that there must be an even number of vertices: x + v = x + (x - 4) = 2x - 4, which is clearly even.
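
These counting facts are also easy to check mechanically. Here is a tiny Python sketch (the function name is mine) that tests whether a proposed vertex sequence of X’s and V’s satisfies properties 1 and 3:

    def plausible_vertex_sequence(seq):
        """Check the necessary conditions proved above: even length, and
        exactly four more X's than V's (equivalently, the +90/-90 degree
        exterior turns must add up to one full 360-degree turn)."""
        x, v = seq.count("X"), seq.count("V")
        return len(seq) % 2 == 0 and x - v == 4

    # the example orthogon above: nine V's followed by thirteen X's
    assert plausible_vertex_sequence("V" * 9 + "X" * 13)

(Of course, as we just saw, the evenness check is redundant: x - v = 4 already forces an even total.)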

The final property I stated—namely, that every viable sequence of V’s and X’s describes some orthogon—will have to wait for another post. I’ll give you a hint: the best way to prove this is to give an algorithm that takes a sequence of V’s and X’s as input and produces an actual drawing of an orthogon as output. (But for now it doesn’t matter if the drawing is horrible, just whether one exists!)

  1. Wiktionary suggests that “orthogon” can refer to a right triangle or to a “rectangular figure” but I have never heard it used to refer to those things. Perhaps it is more common in other English-speaking countries?


Orthogonal polygons

It’s time to say more about PWW #21, in which I exhibited things like this:

Quite a few commenters figured out what was going on, and mentioned several nice (equivalent) ways to think about it. Primarily, the idea is to draw all possible orthogonal polygons, that is, polygons with only right angles, organized by the total number of vertices. (So, for example, the picture above shows all orthogonal polygons with exactly ten vertices.) However, we have to be careful what we mean by the phrase “all possible”: there would be an infinite number of such polygons if we think about things like the precise lengths of edges. So we have to say when two polygons are considered the same, and when they are distinct. My rules are as follows:

  • Two polygons are the same if one is the mirror image of the other.
  • Two polygons are the same if one can be turned into the other just by changing the length of some of the edges.

So, for example, I consider these three polygons to all be the same:

(It’s easy to see why the first two are the same. Can you see why the other one is the same too?)

I want to explain some of the mathematics behind generating these. In order to get there, I will start by stating some propositions. Can you see why each of these statements is true?

  1. Every orthogonal polygon has an even number of vertices.
  2. Orthogonal polygons only have two kinds of vertices: “right turns” and “left turns” (let’s suppose we always travel clockwise around the polygon).
  3. Every (closed, non-self-intersecting) orthogonal polygon has four more right turns than left turns.
  4. Every sequence of an even number of R’s and L’s, with exactly four more R’s than L’s, corresponds to the vertex sequence of some orthogonal polygon.

The fourth statement may seem trivial but it is worth a bit of thought: how do you know that you can always draw a non-self-intersecting orthogonal polygon for any valid sequence of left and right turns?


Fast and slow machines

In my previous post, I presented three hypothetical machines which take a positive integer n as input and give us something else as output:

  1. a factorization machine gives us the complete prime factorization of n;
  2. a factor machine gives us one prime factor of n;
  3. a primality machine just tells us whether or not n is prime (that is, it gives us an answer of either true or false).

I mentioned that we know how to make primality machines that are much faster than factorization machines, or even than factor machines. Before I finally get around to explaining how to build such fast primality machines, it’s worth explaining what I mean when I talk about a machine being “fast” or “slow”.

Of course this whole time when I have been talking about machines, what I really have in mind are algorithms, i.e. specific sets of steps to take some given input and produce a desired output. In other words, computer programs. (It’s a fascinating and wonderful fact that computers are universal machines, in the sense that they can simulate any other machine; instead of having one special machine to write email, and another special machine to read Wikipedia, and yet another special machine to find prime numbers, you can just have one single, general-purpose machine that can do all of these things. But this is a subject for another blog post!)

The obvious, naive way to create a factor machine is as follows:

  • For each positive integer d from 2 up to n-1:
    • Try dividing n by d.
    • If d evenly divides n, that is, the remainder is 0, then stop and output d.
  • If every d from 2 to n-1 is tried without finding one that evenly divides n, then n must be prime, so output n.
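
In Python, this machine might be transcribed as follows (a sketch of my own; it assumes n ≥ 2):

    def factor_machine(n):
        """Naive trial division, as described above; assumes n >= 2."""
        for d in range(2, n):  # each candidate divisor from 2 up to n - 1
            if n % d == 0:     # the remainder is 0: d evenly divides n
                return d
        return n               # no d evenly divides n, so n is prime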

This is called trial division. (Can you see why it will always return a prime divisor of n?) There are several ways this can be optimized, but we’ll get back to those later. For now let’s think about how long this takes.

When measuring how long an algorithm takes, it is useless to measure the actual running time, in seconds, on some particular computer. For one thing, the amount of time the algorithm takes to run is highly dependent on the particular computer executing it, what other programs are running on the same computer, the weather, time of day, etc., so it is quite difficult to compare to other algorithms. For another thing, the algorithm may take different amounts of time for different inputs. When mathematically analyzing how long an algorithm takes, what we really care about is how the running time scales as a function of the size of the input. This tells us something about how big the inputs can get before the algorithm takes an infeasibly long time to run.

So, what about trial division? The input is an integer n; the size of the input is the number of digits needed to write n. (It is more typical to measure the size in terms of the number of bits needed to write it in binary, but it turns out that the base doesn’t really matter, so we’ll stick to base 10 for now.) Let m be the number of digits needed to write n, so m \approx \log_{10} n, and n \approx 10^m. How many steps does the algorithm take to run? We do one division operation for each d from 2 up to n-1, so it essentially takes time proportional to n. Since n \approx 10^m, this means that the time needed to run scales exponentially with the size of the input: every time we add one more digit to the input, we multiply the time needed by a factor of 10.

At this point you may protest that the version of trial division I gave above is hopelessly naive; and indeed, we can optimize it. For example, we don’t have to try every d: if we try dividing by 2 first, after that point we can just try dividing by all the odd numbers up to n, since if 2 doesn’t divide n then no other even number can either. This essentially halves the necessary running time; but 10^m/2 is still proportional to 10^m.

A more important optimization is to only test values of d up to \sqrt{n}, instead of going all the way up to n. If n is the product of two divisors, n = xy, then one of them (say, x) must be \leq \sqrt{n}, and the other must be \geq \sqrt n. So if we haven’t found any divisors by the time we get to \sqrt n then we certainly aren’t going to find any above \sqrt n.
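
Putting both optimizations together gives something like the following sketch (math.isqrt computes the integer square root):

    import math

    def factor_machine_optimized(n):
        """Trial division with the two optimizations above: divide by 2
        once, then try only odd candidates, stopping at sqrt(n).
        Assumes n >= 2."""
        if n % 2 == 0:
            return 2
        for d in range(3, math.isqrt(n) + 1, 2):  # odd d up to sqrt(n)
            if n % d == 0:
                return d
        return n  # no divisor up to sqrt(n), so n is prime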

So now the algorithm only takes time proportional to \sqrt n, which seems like it is a big improvement. To be sure, it is an improvement; but if we relate it back to the number of digits m, we find \sqrt{n} \approx \sqrt{10^m} = (\sqrt{10})^m \approx 3.16^m, so the running time still grows exponentially in the number of digits m, just with a smaller base.

In general, exponential algorithms are fairly useless—for even modest-sized inputs, the running time can be astronomical. For example, suppose an algorithm takes exactly 3.16^m seconds to run on an input of size m. When m = 1 it takes 3.16 seconds; when m = 2, 10 seconds. So far, so good. For m = 4 it takes a couple minutes (100 seconds); for m = 6, it takes 15 minutes; for m = 8, a few hours; for m = 10, a whole day. By the time we get up to an input of size 20, the algorithm takes over 300 years. For an input of size 30, it takes over 30 million years. We only have to get up to size 35 or so before the algorithm would take longer than the estimated age of the known universe.

In some sense, we don’t know how to do better than this for factor machines and factorization machines. There are factorization algorithms which are indeed much faster than trial division (with fancy names like the General Number Field Sieve). Using such algorithms it’s feasible to factor numbers with up to hundreds of digits instead of just a few—but nevertheless, the running time of these algorithms still scales essentially exponentially with the size of the input. (There is actually quite a lot of technical detail I am sweeping under the rug here, but I don’t really want to get into it so I hope you will forgive me.)

Primality testing, on the other hand, is a completely different story! Trial division is the only algorithm most people know for primality testing, but over the next few posts I will explain a few other algorithms (and prove that they work!) which are much faster.


New Mersenne prime

With impeccable timing, just in the middle of my series about primality testing, a new Mersenne prime has been announced, a little under two years after the previous one. In particular, it has been shown that 2^{77,232,917}-1 is prime; this is now the largest known prime number.

If you want to understand what computers are actually doing when they check a Mersenne number for primality, I wrote a whole series about it two years ago: visit this list of my post series and search for “Lucas-Lehmer”.
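
As a teaser, the core of the Lucas-Lehmer test is remarkably short; here is a Python sketch (valid when the exponent p is an odd prime). The real record-setting searches compute this very same recurrence, just with extremely fast multiplication and a lot of distributed hardware:

    def lucas_lehmer(p):
        """Lucas-Lehmer test: for an odd prime p, the Mersenne number
        2**p - 1 is prime if and only if s ends up as 0, where s starts
        at 4 and is squared-minus-2 (mod 2**p - 1) a total of p - 2 times."""
        m = 2 ** p - 1
        s = 4
        for _ in range(p - 2):
            s = (s * s - 2) % m
        return s == 0

For example, lucas_lehmer(5) returns True (since 2^5 - 1 = 31 is prime), while lucas_lehmer(11) returns False (2^{11} - 1 = 2047 = 23 × 89).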
