It turns out that de Grey’s new proof follows exactly this same pattern: he constructs a unit distance graph which needs at least 5 colors. So why hasn’t anyone else done this before now? The Moser spindle (a unit distance graph with chromatic number 4) has been around since 1961; why did it take almost 60 years for someone to come up with a unit distance graph with chromatic number 5?
The reason is that the graph is very big! de Grey’s graph has 1581 vertices. Although it is itself constructed out of copies of smaller pieces, which are themselves constructed out of pieces, etc.—so it’s not quite as hard to analyze as any random 1581-vertex graph—proving that it requires 5 colors still required a computer. Even the process of finding the graph in the first place required a computer search. So that’s why no one had done it in the last 60 years.
A lot of people have been working on improving and extending this result. The current record-holder for smallest unit-distance graph with chromatic number 5 has 610 vertices, and was found by Marijn Heule using a computer search. Here’s a picture of it, provided by Marijn:
It’s unknown whether we will be able to find a proof that can be constructed and understood by humans without the help of a computer. It’s also unknown whether these techniques and the associated technology will be able to extend far enough to find a unit distance graph with chromatic number 6 (if such a thing exists—though it seems many mathematicians believe it should).
In this post I’ll talk about lower bounds. That is, how do we know that the chromatic number of the plane (CNP) is at least 5?
Let’s start with a smaller number. Can we argue that the CNP is at least 2? Sure, this is easy: if the entire plane is colored with only one color, then obviously any two points separated by a distance of 1 unit have the same color. So one color is not enough, and hence CNP ≥ 2.
Are two colors enough? If you play around with it a bit, you will most likely develop an intuition that this won’t work either. Here’s an intuitive argument for why we need at least three colors. Suppose some region of the plane is colored, say, blue.
(It doesn’t have to be a circle, but let’s just think about a circular-ish region for simplicity’s sake.) The diameter of the region has to be smaller than 1, because otherwise there would be two points within the region that are 1 unit apart. But that means the entire area outside the region has to be, say, red.
But if the diameter of the blue region is less than 1 and the area outside the region is all red, then intuitively, we can find two red points which are a distance of 1 apart, as illustrated in the above picture.
This isn’t really a formal argument at all (what about really weirdly-shaped regions?). But we can give a different argument which is completely formal yet still quite simple. Consider the graph consisting of three vertices connected in a triangle.
As mentioned in my previous post, the chromatic number of this graph is 3—each vertex needs its own color, since each vertex is adjacent to both other vertices. But this actually tells us something about the plane, since this graph can be drawn as an equilateral triangle with all its edges having length 1. If we pick any three points in the plane arranged in a unit equilateral triangle, they must all have different colors; hence at least three colors are needed to color the plane.
Generalizing from the previous proof, we can see a general plan of attack for proving that three colors are not enough to color the plane. We want to find some graph such that (1) it is a unit-distance graph, that is, it can be drawn in the plane with every edge having length exactly 1, and (2) its chromatic number is at least 4.
If we can find such a graph, it immediately shows that we need at least four colors for the plane, since we can pick a set of points in the plane in exactly the same configuration as the vertices of the graph; since the vertices of the graph need four colors, so do the chosen points of the plane.
Well, such a graph exists! In fact, there are many such graphs, but the simplest one is called the Moser spindle. (It’s named after William and Leo Moser, who published it in 1961; I have no idea why it is called a “spindle”.) Here it is:
As you can see, it’s made of four equilateral triangles, with two pairs glued together along an edge, a shared vertex at the top, and an extra edge of length exactly 1 connecting the bottom vertices. You can check for yourself that this is a unit-distance graph. But what is its chromatic number? Well, suppose we try coloring it with only three colors. Suppose the top vertex is red. That vertex is connected to two equilateral triangles; of course the other two vertices of each triangle must have the other two colors (say, green and blue). Then the two bottom vertices must again be colored red, since each is part of an equilateral triangle with one green and one blue vertex. But this isn’t allowed: the two bottom vertices are connected by an edge, so they can’t be the same color.
So, three colors aren’t enough to color the Moser spindle, and hence three colors aren’t enough to color the plane either!
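We can double-check this case analysis by brute force. The sketch below encodes the spindle as an edge list (the vertex numbering is my own choice, not from any standard source) and simply tries every assignment of three colors to the seven vertices:

```haskell
-- Vertices are numbered 0..6: vertex 0 is the shared top vertex, and
-- vertices 3 and 6 are the two bottom tips, joined by the extra edge.
spindleEdges :: [(Int, Int)]
spindleEdges =
  [ (0,1), (0,2), (1,2), (1,3), (2,3)   -- first pair of triangles (tips 0 and 3)
  , (0,4), (0,5), (4,5), (4,6), (5,6)   -- second pair of triangles (tips 0 and 6)
  , (3,6) ]                             -- the extra edge between the bottom tips

-- A color assignment is proper if no edge connects two like-colored vertices.
proper :: [Int] -> Bool
proper cols = all (\(u,v) -> cols !! u /= cols !! v) spindleEdges

-- Can the spindle be properly colored with k colors?  Try all k^7 assignments.
colorable :: Int -> Bool
colorable k = any proper (sequence (replicate 7 [1..k]))

main :: IO ()
main = print (colorable 3, colorable 4)  -- prints (False,True)
```

With only 3^7 = 2187 assignments to check, this finishes instantly, and confirms that three colors fail while four suffice.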
So the above proofs are not that hard to grasp, and we have known that CNP > 3 since 1961 or so. So how did Aubrey de Grey prove that CNP > 4, and why did it take almost 60 years to go from 3 to 4? I’ll explain that in my next post!
In this case, we hit a value we have already seen, after which point the values will start to repeat. So in this case, iterating s enters a loop of length 8, where it will get stuck forever.
Let’s try another starting value.
In this case we hit 1, which is a fixed point of s, that is, s(1) = 1, so we simply get stuck on 1 forever.
The amazing thing is that these are the only two things that can happen—for any positive integer, iterating s will eventually either reach 1, or enter the loop of eight numbers 4, 16, 37, 58, 89, 145, 42, 20. Arthur Porges proved this in his article, “A Set of Eight Numbers,” published in 1945 in The American Mathematical Monthly (Porges 1945).
Let’s see a proof, inspired by Porges’s approach. First, suppose n ≥ 1, and let d be the number of digits in n, so n ≥ 10^(d-1). How big could s(n) be? Since each digit contributes to the sum independently, and the digit with the biggest square is 9, s(n) attains its maximum possible value when n consists of all 9’s, in which case s(n) = 81d.
I claim that 81d < 10^(d-1) when d ≥ 4. We can directly check that this is true when d = 4, since 81 · 4 = 324 < 1000. And every time we increase d by 1, the left-hand side increases by 81, but the right-hand side is multiplied by 10. At that rate the left-hand side can never hope to catch up!
Putting this together, we see that when n has four or more digits, s(n) ≤ 81d < 10^(d-1) ≤ n, which by transitivity means s(n) < n. In other words, if n has four or more digits, applying s necessarily makes it smaller. So if we start with some huge 500-digit number and start iterating s, the results will get smaller and smaller until we finally get to a number with fewer than four digits. (A fun exercise for you: if we start with a 500-digit number, what is the maximum number of iterations of s we need to reach a number with fewer than four digits?)
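We can spot-check these bounds with a few lines of Haskell (using the same digit-squaring definition of s that appears in the code below):

```haskell
-- Sum of the squares of the digits of a number.
s :: Integer -> Integer
s = sum . map (^2) . map (read . (:[])) . show

main :: IO ()
main = do
  -- The largest value s takes on four-digit inputs is s 9999 = 81 * 4.
  print (maximum (map s [1000..9999]))       -- prints 324
  -- And s strictly shrinks every four- and five-digit number.
  print (all (\n -> s n < n) [1000..99999])  -- prints True
```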
So what happens then? Actually, the bulk of Porges’s article is taken up with analyzing the situation for numbers of three or fewer digits. He proves various lemmas which cleverly reduce the number of cases that actually need to be checked by hand. But you see, in 1945 he didn’t have any computers to help! 1945 is right around the time when the first electronic, digital computers were being developed; it would be at least another ten or twenty years before any would have been generally available for a mathematician to use in checking a conjecture like this.
In 2018, on the other hand, it’s much faster to just write a program to test all the numbers with three or fewer digits than it is to follow Porges’s arguments. Here’s some Haskell code which does just that:
loop :: [Integer]
loop = [145, 42, 20, 4, 16, 37, 58, 89]
s :: Integer -> Integer
s = sum . map (^2) . map (read . (:[])) . show
ok :: Integer -> Bool
ok = any (\m -> m == 1 || m `elem` loop) . iterate s
The ok function checks whether iterating s ever produces either the number 1 or some number in the loop. Now we just have to run ok on all the numbers we want to check:
>>> all ok [1..1000]
True
And voila!
Porges, Arthur. 1945. “A Set of Eight Numbers.” The American Mathematical Monthly 52 (7). Mathematical Association of America:379–82. http://www.jstor.org/stable/2304639.
Suppose we have a graph G, with vertices connected by (bidirectional) edges. One natural task is to assign a color to each vertex of G in such a way that for every edge, the two vertices at the ends of the edge have different colors. Put another way, no two vertices connected by an edge should have the same color. This is called a (vertex) coloring of G.
For example, here is a graph with a valid coloring (using four colors):
If you look at any edge in this graph, you can see that the two vertices at the ends of the edge have different colors.
As an example of why we might care about this, suppose we have a graph where the vertices represent people, and two people are connected by an edge if they are enemies. Then a vertex coloring corresponds to a way to assign everyone to a team (one team for each color) such that enemies are never on the same team! Here’s another example: suppose vertices represent courses, and there is an edge between two courses if there is at least one student who is signed up for both courses. Then a vertex coloring is a way to schedule courses—each “color” represents a different time slot—such that no student has a conflict. Or if vertices represent countries, and there is an edge between two vertices whenever those countries share a border, a coloring of the graph is a way to literally color a map so bordering countries never have the same color. (Famously, for any (non-weird) map you can do this using only four colors; this is something I’d like to write more about someday.) These examples are just the tip of the iceberg: this is a natural problem that comes up all the time in practice.
Given a graph G, coming up with a vertex coloring is trivial: just make every vertex a different color! But this is silly; typically, it is better to use fewer colors, so although giving every vertex a different color results in a valid vertex coloring, it is not a very good one. The natural question to ask is: given a graph G, what is the smallest possible number of colors needed to construct a vertex coloring? This smallest possible number is called the chromatic number of G (often written χ(G)).
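For small graphs, we can compute the chromatic number by sheer brute force: try k = 1, 2, 3, … colors until some assignment works. Here’s a quick Haskell sketch of the idea (hopelessly slow for large graphs, but fine for the examples in this post):

```haskell
-- Brute-force chromatic number of a graph, given its number of vertices
-- (numbered 0..n-1) and an edge list.
chromaticNumber :: Int -> [(Int, Int)] -> Int
chromaticNumber n edges = head [ k | k <- [1..], colorable k ]
  where
    colorable k = any proper (sequence (replicate n [1..k]))
    proper cols = all (\(u,v) -> cols !! u /= cols !! v) edges

main :: IO ()
main = print ( chromaticNumber 3 [(0,1),(1,2)]           -- a three-vertex path
             , chromaticNumber 3 [(0,1),(1,2),(0,2)] )   -- a triangle
```

This prints (2,3): the path needs only two colors, while the triangle needs three.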
For example, consider the following three graphs:
The graph on the left has chromatic number 2: we can’t use only one color since then the edges would have vertices of the same color on both ends, but we can get away with only two colors since there’s nothing to prevent the vertices on the ends from having the same color as each other, as long as they have a different color than the middle vertex. On the other hand, the graph in the middle has chromatic number 3: since each vertex is adjacent to both other vertices, we actually need to give every vertex its own distinct color.
The graph on the right is the same as the example graph I showed before, with a coloring using four colors. So is the chromatic number four? Not so fast! At this point we only know the chromatic number is at most four. I will leave you with the following exercises:
It’s a bit of a stretch, but we can generalize this to ask about the “chromatic number” not of a graph, but of the plane. In particular, we want to assign a color to every point on the plane such that any two points which are exactly 1 unit apart have different colors. Notice that the plane contains infinitely many points (uncountably many, in fact), so we certainly can’t hope to color them by simply listing all the points and giving a color to each! In essence, we are looking for a function c, assigning a color to each point of the plane, such that for any points p and q, if the distance between p and q (computed using the distance formula, aka the Pythagorean Theorem) is exactly 1, then c(p) ≠ c(q).
You can think of such a function as an (infinite) “picture” or “painting”. Imagine looking at a painting and holding up a 1-inch ruler next to it. No matter where you place the ruler and no matter which way you turn it, the two points at opposite ends of the ruler always have different colors! What could such a painting look like? And how many colors does it need?
The minimum number of colors needed is known as the chromatic number of the plane, abbreviated CNP. The problem of determining CNP (known as the Hadwiger-Nelson problem) was first posed around 1950, and we have known for a long time that the answer is somewhere between 4 and 7, inclusive. (I will explain how we know this in later posts!) De Grey’s contribution was to show that the answer can’t be 4; hence we now know that the answer has to be 5, 6, or 7—but we still don’t know which!
Good to know!
Commenter Anthony asked if the proof worked for numbers in any base, which was an important point I forgot to mention: what the paper actually proved is that every positive integer can be written as the sum of three palindromes in any base b ≥ 5.
The theorem is not true for numbers in base 2. Notice first that any binary number starts with 1, so any binary palindrome must end with 1 as well, that is, it must be odd—except 0, which we define to be a palindrome as well. But a sum of three odd numbers is odd, so for the theorem to be true, every even number would have to be the sum of two palindromes in base 2 (plus zero to make three palindromes). But the even binary number 10110000 (176 in decimal) is a counterexample, since it can be checked (e.g. by writing a computer program to try all possibilities) that it is not the sum of two binary palindromes. The following code does exactly that:
ghci> :set -XBinaryLiterals
ghci> import Text.Printf
ghci> let n = 0b10110000
ghci> let isBinPal x = printf "%b" x == reverse (printf "%b" x :: String)
ghci> [ k | k <- [1..n], isBinPal k, isBinPal (n-k) ]
[]
The [] on the last line is an empty list, indicating that there are no binary palindromes which add up to 10110000.
This leaves some questions: what about bases 3 and 4? And how many palindromes do you need in base 2? At the time the paper was published, the answers to these questions were not known. It was conjectured that four palindromes are enough in base 2. It was also conjectured that three palindromes are enough for bases 3 and 4—just like in every other base b ≥ 5; the proof just didn’t cover those cases.
In his email, Lewis Baxter also pointed me to an even more recent paper by Aayush Rajasekaran, Jeffrey Shallit, and Tim Smith which uses a completely different method to resolve the remaining questions: it turns out that the conjectures were correct; we now know that every positive integer can indeed be written as a sum of four binary palindromes, and only three palindromes in base 3 or base 4! Jeffrey Shallit has a nice blog post explaining their method here. It’s not clear to me whether their method can be used to produce an actual algorithm for finding base-2, -3, or -4 palindromes that sum to a given number, though it certainly should be possible.
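None of their machinery is needed to spot-check the small cases, of course. Here’s a naive Haskell search (definitely not the method from the paper, just brute force) for writing a number as a sum of at most k binary palindromes:

```haskell
-- The binary representation of a number (no leading zeros; 0 gives "").
toBin :: Integer -> String
toBin 0 = ""
toBin x = toBin (x `div` 2) ++ show (x `mod` 2)

-- Is x a binary palindrome?  (0 counts as a palindrome by convention.)
binPal :: Integer -> Bool
binPal 0 = True
binPal x = b == reverse b  where b = toBin x

-- Can n be written as a sum of at most k positive binary palindromes?
sumOfPals :: Int -> Integer -> Bool
sumOfPals _ 0 = True
sumOfPals 0 _ = False
sumOfPals k n = any (sumOfPals (k-1)) [ n - p | p <- [1..n], binPal p ]
```

For example, sumOfPals 3 176 is False, but sumOfPals 4 176 is True: one witness is 176 = 127 + 33 + 15 + 1, i.e. 1111111 + 100001 + 1111 + 1 in binary.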
The really cool thing is that the proof essentially consists of a (very long, very detailed) algorithm which, for any positive integer n, explicitly finds three palindromes that sum to n! For example, when n = 314159265358979323846 (the first 21 decimal digits of pi), their algorithm gives the three palindromes
You can easily verify (try it!) that these are all palindromes, and that they add up to 314159265358979323846.
ghci> p1 = 210100100111001001012
ghci> p2 = 98639929400492993689
ghci> p3 = 5419235847485329145
ghci> isPalindrome x = show x == reverse (show x)
ghci> isPalindrome p1 && isPalindrome p2 && isPalindrome p3
True
ghci> p1 + p2 + p3
314159265358979323846
The reason this is so cool is that in many cases, the proof of a theorem of the form “for every integer n, there exists a thing” might be nonconstructive—that is, it shows that for every n it can’t be the case that there doesn’t exist a thing, which “proves” the theorem but doesn’t actually help us find the things which we supposedly know must exist. But this proof is very much constructive: not only does it show that every positive integer must be expressible as a sum of three palindromes, it actually backs this up with explicit evidence by giving us a way to find the palindromes.
Apparently the authors have written a Python program which implements the algorithm, and I have asked for a copy; I’m excited to play with it! I’d implement it myself, except that the algorithm has so many cases and sub-cases and sub-sub-cases and special adjustments to the sub-sub-cases, etc. that it would probably take me a week.
I described a way to turn orthobraces into drawings in a previous post, but the drawings it produced were terrible—the edges get smaller and smaller as the algorithm proceeds. For example, given the string XXXXXXVVXV, the algorithm produced this drawing:
But we’d really like to be able to produce a drawing like this instead:
What kind of drawings do we want in general? Here are my criteria:
These criteria seem reasonable, to me at least, though one can easily imagine making other choices. For example, some might wish to consider polygons with edges that intersect at a vertex, such as the one shown below.
We could also imagine minimizing the area, rather than the perimeter, of the polygon. In any case, the idea is that a polygon with these properties should be as small as possible while remaining non-intersecting and having all edges of integer length. (The edges all having integer lengths also means that we will be able to create these drawings by gluing together squares.) Let’s call orthogon drawings with these properties good.
Given these criteria, we can ask questions such as the following:
It is easy to show that (1) is true: first, use the naive orthogon drawing algorithm to produce a lopsided drawing with tiny edges. Since that algorithm only ever produces edge lengths that are powers of 1/2, if the smallest edge has length 1/2^k, we can scale the whole polygon by a factor of 2^k. This results in a polygon with no self-intersections and all integer edges. This proves that at least one such polygon exists; among all such polygons there must be at least one with minimal perimeter.
However, this does not actually give us an algorithm for producing good drawings! That last step of the proof is the problem: just because there must exist some drawing with minimal perimeter doesn’t help us actually find one. In the end, coming up with an algorithm to produce good drawings turns out to be surprisingly tricky, and illustrates a very important principle in mathematics and computer science. Faced with some large problem/task/proof/computation, what we would ideally like to be able to do is to split it into subproblems/subtasks/lemmas/helper functions, then solve/complete/prove/compute each of those subthings independently, and somehow combine the results to come up with an overall result. For example, to compute the product of two matrices, we just have to break it down into a bunch of dot products between two vectors, doing that for each combination of a row and a column, and then put all the results together into a big result matrix. Working with smaller independent pieces and then combining results is a very powerful technique, since it lets us solve problems, do computations, prove theorems, etc. which are much larger than what we would be able to accomplish by attacking them directly.
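For instance, here is a quick Haskell rendering of that matrix-multiplication decomposition: each entry of the result is an independent dot product, computed with no knowledge of the other entries.

```haskell
import Data.List (transpose)

-- The dot product of two vectors: a small, self-contained subproblem.
dot :: Num a => [a] -> [a] -> a
dot xs ys = sum (zipWith (*) xs ys)

-- Each entry of the product is the dot product of one row of the first
-- matrix with one column of the second; the results are then assembled
-- into the result matrix.
matMul :: Num a => [[a]] -> [[a]] -> [[a]]
matMul a b = [ [ dot row col | col <- transpose b ] | row <- a ]

main :: IO ()
main = print (matMul [[1,2],[3,4]] [[5,6],[7,8]])  -- prints [[19,22],[43,50]]
```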
However, it seems like this approach fails for drawing orthogons! We might imagine something like this: take the orthobrace and break it up into a bunch of substrings; come up with some kind of drawing for each one; then combine the drawings into a drawing for the whole thing. Unfortunately, this does not work, since the length of an edge can be influenced by other features of the orthogon which are far away. For example, look at this polygon:
The top edge has to be very long in order for the polygon to not intersect itself. How do we know how long it has to be? We cannot infer it just by looking at neighboring edges. It is not even due solely to some particular edges on the other side of the polygon. It is in some sense a “global” property of the whole polygon: the way in which all the edges and vertices interact produces the need for the top edge to be long.
Intuitively, it is this “global” nature of deciding on good edge lengths which makes this problem so difficult. In order to decide how long to make each edge, we have to look at all the vertices and edges… and the length we choose for one edge may influence the length of some other edges, which may in turn force us to choose a different length for the original edge! How can we ever make any progress? And even worse, how can we ever hope to know whether some choices we have made lead to a polygon with minimal perimeter?
In my next (and final!) post on orthogons, I will finally explain how we can leverage some amazingly sophisticated software to produce good orthogon drawings!
In one sense, I don’t know why I’m writing about this: it’s more complicated than I’m willing to explain at the moment, and actually it makes little practical difference anyway, because the naive code from my previous post can generate, say, all orthobraces of length in 1.3 seconds—plenty fast enough for practical purposes, and we can’t reasonably expect to draw or look at more than of these things. I just think it’s cool that a simple question like this turns out to have a very nontrivial answer, and someone has answered it—but only quite recently! (Incidentally, for comparison, the efficient code generates all orthobraces of length basically instantaneously, in 0.03 seconds.)
In an upcoming post or two, I’ll explain the remaining missing piece: how do we go from an orthobrace to a picture of an orthogon?
First, some library imports we’ll need.
> import Data.List
> import qualified Data.Set as S
Here’s a function to generate all sequences of X’s and V’s.
> -- @orthoseqs n@ generates all length 2n+4 *sequences* with n+4 X's and
> -- n V's.
> orthoseqs :: Int -> [[Char]]
> orthoseqs n = go (n+4) n
orthoseqs works by calling a recursive helper function go, which keeps track of how many X’s and how many V’s still need to be generated.
> where
> go 0 v = [replicate v 'V']
> go x 0 = [replicate x 'X']
If the number of X’s or V’s remaining is 0, we just output the right number of copies of the other letter.
> go x v = map ('X':) (go (x-1) v) ++ map ('V':) (go x (v-1))
Otherwise, we recursively generate all strings with one fewer X, and put an X on the front of each; and similarly for strings with a V at the beginning.
Let’s try it:
ghci> orthoseqs 0
["XXXX"]
ghci> orthoseqs 1
["XXXXXV","XXXXVX","XXXVXX","XXVXXX","XVXXXX","VXXXXX"]
ghci> length (orthoseqs 2)
28
ghci> length (orthoseqs 3)
120
This looks right and seems to match up with the numbers we computed last time.
Now, to throw out sequences which are equivalent up to rotation and/or reversal, the easiest way is to write a function canonicalize which computes a canonical representative from each equivalence class: in particular, given a sequence, it computes the lexicographically smallest (that’s a fancy way of saying “comes first in dictionary order”) sequence which is equivalent to it.
> canonicalize :: Ord a => [a] -> [a]
> canonicalize as = minimum (rots as ++ rots (reverse as))
canonicalize just finds the minimum sequence among all rotations of as and rotations of reverse as.
> where
> rots xs = map (take n) . take n . tails $ xs ++ xs
> where
> n = length xs
To generate all rotations, we do a little list manipulation: xs ++ xs puts two copies of xs together end to end; tails produces something like this:
ghci> xs = [1,4,5]
ghci> tails (xs ++ xs)
[[1,4,5,1,4,5],[4,5,1,4,5],[5,1,4,5],[1,4,5],[4,5],[5],[]]
Then we take just the first n results of tails, and take only the first n elements of each, like so:
ghci> map (take 3) . take 3 . tails $ xs ++ xs
[[1,4,5],[4,5,1],[5,1,4]]
Now we can write orthobraces fairly simply: generate all sequences, canonicalize each (resulting in a bunch of duplicates), then put them into a set (which gets rid of duplicates) and list the elements of the set.
> -- @orthobraces n@ generates all length 2n+4 *bracelets* with n+4 X's
> -- and n V's. That is, one representative is generated for each
> -- bracelet, up to reflection and rotation.
> orthobraces :: Int -> [[Char]]
> orthobraces = S.toList . S.fromList . map canonicalize . orthoseqs
Let’s try it:
ghci> orthobraces 0
["XXXX"]
ghci> orthobraces 1
["VXXXXX"]
ghci> orthobraces 2
["VVXXXXXX","VXVXXXXX","VXXVXXXX","VXXXVXXX"]
ghci> orthobraces 3
["VVVXXXXXXX","VVXVXXXXXX","VVXXVXXXXX","VVXXXVXXXX","VXVXVXXXXX","VXVXXVXXXX","VXVXXXVXXX","VXXVXXVXXX"]
ghci> map (length . orthobraces) [0..6]
[1,1,4,8,29,79,280]
It seems to work! As we already discussed last time, this is not a particularly efficient way to generate orthobraces, since we end up spending most of our time throwing away duplicates (and what’s worse, even the proportion of time spent throwing away duplicates increases as n increases). But it’s rather simple to write and test, and can serve as a good reference implementation to compare other implementations against.
I took a bit of a break to finish writing a paper for submission to the International Conference on Functional Programming—which I should write about here! All in good time. (I tend to accumulate things to write about faster than I can write them… a blessing and a curse!) For now, back to your irregularly scheduled orthogons.
Recall from my previous post that a bracelet is a string considered unique only up to rotation and reflection. We also gave the name orthobrace to a bracelet consisting of X’s and V’s, with exactly four more X’s than V’s, and we saw that they are in one-to-one correspondence with orthogons. If we can find a way to list all orthobraces of a certain size (and if we can find a reasonable way to turn them into drawings!) we can make a list of all the orthogons of a certain size.
In my previous post I left you with a few questions.
How many sequences are there with n+4 X’s and n V’s? (Here I really do mean just sequences, not bracelets, so e.g. XXXV and XXVX are the same bracelet but different sequences.)
If we have four X’s and no V’s then there is of course only one. With five X’s and one V, we have a choice of six different places where we could put the V: before the first X, after the first X, after the second X, … and so on. With six X’s and two V’s, we get to choose any two out of the eight possible positions to put the V’s.
In general, the number of sequences with n+4 X’s and n V’s is the number of ways to choose n places out of the 2n+4 total to put the V’s, that is, the binomial coefficient (2n+4 choose n). We can make a table:

n          0  1  2   3    4    5     6
sequences  1  6  28  120  495  2002  8008

As you can see, the number of sequences grows quite quickly; in fact (though I won’t prove this) it grows exponentially relative to n.
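These counts are easy to compute directly from the binomial coefficient formula; here’s a quick Haskell sketch:

```haskell
-- (n choose k) = n! / (k! (n-k)!), computed without large factorials.
choose :: Integer -> Integer -> Integer
choose n k = product [n-k+1 .. n] `div` product [1 .. k]

-- The number of sequences with n+4 X's and n V's is (2n+4 choose n).
numSeqs :: Integer -> Integer
numSeqs n = choose (2*n + 4) n

main :: IO ()
main = print (map numSeqs [0..6])  -- prints [1,6,28,120,495,2002,8008]
```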
What about orthobraces, where we consider sequences equivalent up to rotation and reversal, and just want one representative from each equivalence class? If you try listing them and counting how many there are of each size, you get a sequence of numbers like 1, 1, 4, 8, 29, 79, 280, … which does not grow quite so quickly (though it turns out it still grows exponentially).
Gesh made a nice argument in a comment on my previous post which gives us an idea of how quickly we should expect this to grow relative to how many sequences there are. In particular, we expect that every sequence of length 2n+4 will have roughly 2(2n+4) = 4n+8 different sequences which it is equivalent to: the 2n+4 rotations, plus all their reversals. This is only a rough estimate, since if the sequence has any sort of symmetry, some of the rotations and reflections may be identical. But symmetry is special—an average random string will not be symmetric, and hence most of the time all the rotations and reflections will be distinct. So we would expect there to be about 4n+8 times as many sequences as bracelets—a little less, to account for some of the rotations and reflections not always being distinct. And if we take ratios, here’s what we get:

n    sequences/orthobraces    4n+8
0    1.00                     8
1    6.00                     12
2    7.00                     16
3    15.00                    20
4    17.07                    24
5    25.34                    28
6    28.60                    32
On the left-hand side of the table is the ratio of the number of sequences of size n to the number of orthobraces; on the right-hand side is 4n+8. As you can see, the ratios do indeed seem to be approaching 4n+8 from below.
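You can reproduce these ratios with a couple of lines of Haskell, using the sequence counts (2n+4 choose n) and the bracelet counts 1, 1, 4, 8, 29, 79, 280 found earlier:

```haskell
-- Sequence counts (the binomial coefficients (2n+4 choose n)) and
-- bracelet counts (found by brute-force enumeration), for n = 0..6.
seqCounts, braceCounts :: [Double]
seqCounts   = [1, 6, 28, 120, 495, 2002, 8008]
braceCounts = [1, 1, 4, 8, 29, 79, 280]

-- How many sequences per bracelet, on average, at each size.
ratios :: [Double]
ratios = zipWith (/) seqCounts braceCounts

main :: IO ()
main = print ratios
```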
So, what about generating all orthobraces of a given size by listing all sequences and throwing away any sequences which are equivalent to one we’ve already generated? This can be done, but as we can see from the above analysis, we would spend most of our time throwing out sequences which are equivalent to ones we’ve already generated.
So generating orthobraces this way isn’t actually such a great idea, although it does have the benefit of being quite simple conceptually. In another post I’ll show some simple code to carry out this “naive” algorithm, and then talk about an alternative!