do go no to

dodo
do go on
do no harm
do to others
go do likewise
go-go music
go no further
goto considered harmful
no can do
it’s a no go
a big no-no
say no to drugs
what to do
here or to go
to no avail
I don’t think we’re in Kansas anymore, Toto

English is strange.


Book review: Tales of Impossibility

[Disclosure of Material Connection: Princeton Press kindly provided me with a free review copy of this book. I was not required to write a positive review. The opinions expressed are my own.]

Tales of Impossibility: The 2000-Year Quest to Solve the Mathematical Problems of Antiquity
David S. Richeson
Princeton University Press, 2019

Let me get right to the point: this was hands-down my favorite math book that I read this year. If you don’t already have a copy, you should stop reading this post right now and go buy one! Go on, you’ll thank me. Need more convincing? Read on.

The book centers on the four “problems of antiquity”: squaring the circle (i.e. constructing a square with the same area as a given circle), trisecting an angle, doubling the cube (constructing the side length of a cube with double the volume of a given cube), and constructing regular n-gons. The “problem” in each case is to carry out the required construction using only a compass and straightedge (a set of tools probably familiar to most readers from some point in their mathematical education). As Richeson so ably relates, these problems inspired all sorts of advances in mathematics over thousands of years—even though (because?) all were eventually proved impossible in general. Wantzel (angle trisection, doubling the cube, regular n-gons) and Lindemann (squaring the circle) gave the final, definitive proofs, but both built on a great deal of mathematics that came before them; each new player in the story added another layer of understanding.

First and foremost, I am amazed at the incredible amount of historical and mathematical background research that Richeson obviously did for this book, and the way he intertwines mathematics and history into a compelling story. Stereotypically, a book of mathematical history runs a double risk of being dry: too much unmotivated historical or mathematical detail can put anyone to sleep. Richeson deftly avoids this trap, and his book exudes human warmth. But it doesn’t skimp on details either; I learned a great deal of both history and mathematics. In many cases (such as with many of the purely geometric arguments) proofs are included in full detail. In other cases (such as in the discussion of irreducible polynomials), some mathematical details are omitted. Richeson has a good nose for sniffing out the most elegant way to present a proof, and also for knowing when to omit things that would bog down the story too much.

Alternating with the “regular” chapters, Richeson includes a number of “tangents”, each one a short, fascinating glimpse into some topic which is related to the previous chapter but isn’t strictly necessary for driving the story forward (e.g. toothpick constructions, Crockett Johnson, origami, the Indiana pi bill, computing digits of pi, the tau vs pi debate, etc.). Taken as a whole, these “tangent” chapters do a lot to round out the story and give a fuller sense of the many explorations inspired by the problems of antiquity.

In addition to the many mathematical and historical details I learned from the book, I also took away a more fundamental insight. I had always thought of “compass and straightedge” constructions as being rather arbitrary: these are the tools the Greeks happened to choose, and so now we are stuck in a rut of thinking about geometrical constructions using these tools—or so I thought. However, it turns out that they are not quite so arbitrary after all: there are many different sets of tools that lead to exactly the same set of constructible things (there is even some interesting history here, as mathematicians figured out what it should mean to say that you can “construct the same things” with different tools, leading to definitions of constructible points and constructible numbers). For example, toothpicks, a straightedge and “rusty” compass, a straightedge and a single circle, a compass by itself, or a “thick” straightedge by itself (with two given starting points), all can perform exactly the same set of constructions as a traditional straightedge and compass. And as we learned in later centuries, the constructible numbers have a nice algebraic characterization as well: a point (x,y) is constructible with straightedge and compass if and only if x and y can be described using the four arithmetic operations and square roots. In other words, the set of constructible points seems to be a robust set that can be described in many equivalent ways; it is a more fundamental notion than the arbitrary-sounding description in terms of compass and straightedge would seem to imply. I don’t think I would have been able to understand this without someone like Richeson to do a lot of research and then put all the details together into a coherent story.
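
To give one concrete illustration (an example of mine, not drawn from the book): the regular pentagon is constructible, since \cos 72^\circ = (\sqrt{5}-1)/4 involves nothing beyond arithmetic and a single square root; doubling the cube, on the other hand, is impossible, since it would require constructing \sqrt[3]{2}, which provably cannot be expressed using arithmetic operations and square roots alone.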

[It reminds me of a similar phenomenon with computation: for example, the description of a Turing machine seems rather arbitrary, and in some ways it is, but it turns out that many different models of computation (Turing machines, multi-tape Turing machines, lambda calculus, Post canonical systems, RAM machines…) all yield the same set of computable functions, and so the arbitrary-seeming choice is actually describing something more fundamental.]

In the same way, I used to think the problems of antiquity themselves were somewhat arbitrary; but they became famous because they are hard, and it turns out they are hard precisely because they get at the heart of some fundamentally deep ideas. So the fact that they inspired so much rich mathematics is no mere accident of history. One gets the sense that if we ever encounter intelligent life elsewhere in the universe, we may find that they struggled with the same mathematical problems—in very different forms, to be sure, but recognizably the same on a fundamental level.

Anyway, I’ve written more than enough at this point, and I think you get the idea: I thoroughly enjoyed this book, learned a lot from it, and highly recommend it!


A new counting system

0 = t__ough
1 = t_rough
2 = th_ough
3 = through

So, for example, 458 = trough through tough though though (reading the words as digits in base four: 13022 in base four is 458).
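
In case you feel like playing with it, here is a tiny Python sketch (mine, not part of the original joke) that spells out any nonnegative integer in this system:

    # Each word encodes one digit in base four.
    WORDS = ["tough", "trough", "though", "through"]  # digits 0, 1, 2, 3

    def spell(n):
        """Spell a nonnegative integer in base four, one word per digit."""
        if n == 0:
            return WORDS[0]
        digits = []
        while n > 0:
            digits.append(WORDS[n % 4])
            n //= 4
        return " ".join(reversed(digits))

    print(spell(458))  # trough through tough though though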

English is so strange.


A simple proof of the quadratic formula

If you’re reading this blog, you have probably memorized (or once had memorized) the quadratic formula, which can be used to solve quadratic equations of the form

ax^2 + bx + c = 0.

But do you know how to derive the formula? Usually the derivation is presented via completing the square, and it involves somewhat messy algebra (not to mention the idea of “completing the square” itself).

My colleague Gabe Ferrer recently brought to my attention a remarkable new paper by Po-Shen Loh, A Simple Proof of the Quadratic Formula. This paper is remarkable for several reasons: first of all, it’s remarkable that anyone could discover anything new about the quadratic formula; it’s also remarkable for a research mathematician to publish something about elementary mathematics. (But Po-Shen Loh is not your average research mathematician either; he does lots of really cool work making mathematics more accessible for all kinds of learners.) I’m going to explain the basic idea but I highly recommend actually reading the paper, which not only explains the ideas but also does a great job putting everything in proper historical context. Loh has also made a whole web page dedicated to explaining the ideas, with a video, worked examples, etc.; it’s definitely worth taking a look!

The Setup

Suppose we have a quadratic equation we want to solve,

x^2 + bx + c = 0.

To make things simpler, we’ll assume that x^2 has a coefficient of 1. (If we have a quadratic equation with some other coefficient ax^2, we can always divide everything by a first.)

Now imagine we knew how to factor the quadratic. Then we could rewrite the equation into the form

(x - r)(x - s) = 0

which would imply that x = r and x = s are the two solutions. If we multiply out the above factorization (using, you know, “FOIL”), we get

x^2 - (r+s)x + rs = 0

which means we’re looking for values r and s whose product is c and whose sum is -b.

So far, so good; everyone learns this much in high school algebra. The way one usually goes about factoring quadratic polynomials is to make informed guesses for values of r and s and check whether their sum and product give the right coefficients.

The Insight

The key insight at this point, however, is that we don’t actually have to guess! Starting from r + s = -b, let’s divide both sides by 2:

\displaystyle \frac{r+s}{2} = -\frac{b}{2}

The left-hand side is the average of r and s, which lies halfway between them on the number line. Let’s use z to denote the distance from r to -b/2. Since -b/2 is halfway between r and s, z must also be the distance from -b/2 to s. So we can write r and s in the form

r,s = -b/2 \pm z

Now, we know their product has to be c, and multiplying them is particularly easy because we get a difference of squares:

\displaystyle c = rs = \left(-\frac{b}{2} + z \right) \left(-\frac{b}{2} - z \right) = \left(-\frac{b}{2} \right)^2 - z^2

Now solving for z is easy; just move z^2 to one side of the equation by itself and take the square root:

\displaystyle z = \pm \sqrt{\frac{b^2}{4} - c}

That means the solutions are

\displaystyle r,s = -\frac{b}{2} \pm z = -\frac{b}{2} \pm \sqrt{\frac{b^2}{4} - c}.
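
For example (a quick sanity check, with numbers of my own choosing): to solve x^2 - 2x - 24 = 0, the roots must sum to 2, so they have the form 1 \pm z; their product must be -24, so (1+z)(1-z) = 1 - z^2 = -24, giving z^2 = 25, z = 5, and the roots 6 and -4.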

If you like, you can use the same method starting from ax^2 + bx + c = 0 to derive the usual quadratic formula including an arbitrary value of a, although the required algebra gets a bit messier.

Using it in practice

One particularly nice thing about this derivation is that it corresponds to a simple algorithm for solving an arbitrary quadratic equation x^2 + bx + c = 0, so there’s no need to memorize a formula at all:

  1. Note that the two solutions must add up to -b, so their average is half of -b, and hence they can be written as -b/2 \pm z.
  2. Write down the equation (-b/2 + z)(-b/2 - z) = b^2/4 - z^2 = c, and solve for z.
  3. The solutions are -b/2 + z and -b/2 - z.

Of course if you need to solve something of the form ax^2 + bx + c = 0, you can add an extra step to divide through by a first.
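
If you’d like to see the whole algorithm spelled out in code, here is a minimal Python sketch (my own translation of the steps above, not code from Loh’s paper); it uses cmath so that complex roots come out correctly too:

    import cmath  # complex square root, so negative discriminants work too

    def solve_quadratic(a, b, c):
        """Solve ax^2 + bx + c = 0 using Loh's method; returns both roots."""
        b, c = b / a, c / a        # step 0: divide through by a
        m = -b / 2                 # step 1: the roots average to -b/2
        z = cmath.sqrt(m * m - c)  # step 2: solve m^2 - z^2 = c for z
        return m + z, m - z        # step 3: the roots are -b/2 +/- z

    print(solve_quadratic(1, -2, -24))  # ((6+0j), (-4+0j))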

And that’s it! I really hope this new method will make its way into classrooms around the world; Loh makes the argument (and I agree) that it really is much easier for early algebra learners to grasp. And again, I really encourage you to go look at Loh’s web page to read more, especially about the historical context: at what point in human history could someone have come up with this idea? And why didn’t they? (Or if they did, why did we forget?) All this and more are in the original paper, which is a really fascinating and accessible read.


Post without words #29

(This variant was requested by Mark Dominus.)


Post without words #28


Book review: Opt Art

[Disclosure of Material Connection: Princeton Press kindly provided me with a free review copy of this book. I was not required to write a positive review. The opinions expressed are my own.]

Opt Art: From Mathematical Optimization to Visual Design
Robert Bosch
Princeton University Press, 2019

I recently finished reading Robert Bosch’s new book, Opt Art. It was a quick read, both because it’s not actually that long and because it was fascinating and beautiful and I didn’t want to put it down!

The central theme of the book is using linear optimization (aka “linear programming”) to design and generate art. The resulting art can be simply beautiful for its own sake, or can also give us insight into underlying mathematics.

Linear optimization is something I knew about in a general sense, but after reading Bosch’s book I understand it much better—both the details of how the simplex algorithm works, and especially the various ways linear optimization can be applied. I think Bosch does a fantastic job explaining things in a way that gives real insight but doesn’t get bogged down in too much detail. (In a few places I personally wish there had been a few more details—but it’s quite possible that adding more detail would have made the book better for me but worse for a bunch of other people, i.e. it would not be a global optimum!)

A Celtic knot pattern created out of a single continuous TSP tour

Another thing the book explains really well is how the Travelling Salesman Problem (TSP) can be solved using linear optimization. I had no idea there was a connection between the two topics. I’m sure the connection is explained in great detail in the TSP book by William Cook, which I read seven years ago, but for some reason it didn’t really click when I read it. After reading Bosch’s book, though, I feel like I know enough to put together the details and implement a basic TSP solver myself if I wanted to (maybe I will)!

I’m definitely inspired to use some of Bosch’s techniques to make my own artwork—if I do, I will obviously post about it here!


More on Human Randomness

In a post a few months ago I asked whether there is a way for a human to reliably generate truly random numbers. I got a lot of great responses and I think it’s worth summarizing them here!

Randomness in poker strategies

Robert Anderson noted that poker players sometimes use the second hand of a watch to introduce some randomness into their strategy. I assumed this would be something like getting a random bit based on whether the number of seconds is even or odd, but Pete McAllister chimed in to say that it is usually something more like dividing a minute into chunks, and making a decision based on which chunk the current second is in. For example, if you want to make one choice 20 percent of the time and another choice 80 percent of the time, you could make the first choice if the second hand is between 0 and 12 seconds, and the other choice otherwise.

In game theory this is called a “mixed” strategy, and this kind of strategy can arise naturally as the Nash equilibrium of certain kinds of games, so it’s not surprising to me that it would show up in high-level poker. I found conflicting advice about this online; some people were claiming that you should not use randomness when playing poker, but I did find a website that talked about implementing this kind of mixed strategy using the second hand of a watch, and it seemed to be a website with pretty high-level poker advice.

In any case, if you have a phone or a watch with you, this does suggest some strategies for generating random numbers: for example, look at the last digit of the seconds to get a random number from 0–9, or at whether it is even or odd to get a bit. Or you could just take the number of seconds directly as a random number from 0–59. Of course this only works once, and then you have to wait a while before you can do it again. Also, it turns out that my phone doesn’t show seconds by default. Taking the ones digit of the minutes as a random number from 0–9 should work too, but the tens digit of the minutes seems like it’s “not random enough”, in the sense that it might be correlated with whatever it is that I’m doing.
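
(Just to make the idea concrete, here is what those clock-reading strategies look like in Python; this is a sketch of my own, and of course it is only as unpredictable as the moment you happen to run it:)

    from datetime import datetime

    now = datetime.now()
    print(now.second % 2)   # a bit: is the seconds count even or odd?
    print(now.second % 10)  # a digit 0-9: the ones digit of the seconds
    print(now.minute % 10)  # a digit 0-9: the ones digit of the minutes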

Of course, a phone or watch counts as an “aid”, but most people tend to carry around something like this all the time, so it’s relatively practical. On the other hand, if you’re going to use a phone anyway, you should just use an app for generating random numbers.

Bodies and clothing

  • Naren Sundar commented that hair is pretty random, but admitted that it would be hard to measure.

  • Frederik suggested spitting, or throwing your shoe in the air and seeing which way the toe points when it lands. I like the shoe idea, but on the other hand it’s somewhat obtrusive to take your shoe off and throw it in the air every time you want a random bit! And what if you’re not wearing shoes? I’m also afraid I might throw my shoe in the same way every time; I’m not sure how random it would be in practice.

Minds and memorization

Kaligule suggested taking whatever song is currently running through your head, stopping it at a random point, and getting a random bit by seeing whether the number of consonants in the next word is even or odd.

This is a cool idea, and is the only proposal that really meets my criterion of generating randomness “without aids”. I think for some people it could work quite well. “Stopping at a random point” is somewhat problematic—you might be biased to stop at certain points more than others—but it’s pretty hard to know how many consonants are in a word before you count, so I’m not sure this would really bias the results that much.

Unfortunately, however, it won’t work for me because, although I do always have some kind of music running through my head, it often has no lyrics! Kaligule suggested using whether the melody goes up or down, but this is obvious (unlike number of consonants in a word) and too easy to “cheat”, i.e. pick a stopping point that gives me the bit I “want”.

This suggested another idea to me, however: just pre-generate some random data and put some up-front effort into memorizing it. Whenever you need some randomness, use the next part of the random sequence you memorized. When you use it up, generate another and memorize that instead. This leaves a number of questions:

  • How do you reliably keep track of where you are in the sequence? I don’t actually have a good answer to this. I think in practice I would get confused and forget whether I had already used a certain part or not. Though maybe this doesn’t really matter that much.

  • What format would be most effective, and how do you go about memorizing it? Some ideas:

    • My first idea is to generate a sequence of random bits, and then write a story where sequential words have even or odd numbers of letters corresponding to the bits in your sequence. Unfortunately, this seems like a relatively inefficient way to memorize data, but writing a story that corresponds to a given sequence of bits does sound like a fun exercise in constrained writing.

    • Alternatively, one could simply generate a random sequence of digits (or hexadecimal digits) and memorize them using whatever sort of memorization technique you like (e.g. a memory palace). This is less fun but probably more effective. Memorizing a story sounds like it would be easier, but I don’t think it is, especially since you would have to memorize it word-for-word, and you only get one bit per word memorized, as opposed to four bits per hexadecimal digit.

I have generated some random hexadecimal digits but haven’t gotten around to trying to memorize them yet. If I do I will definitely report on the experience. In the meantime, I’m also open to more ideas!
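
(For what it’s worth, here is one minimal way to generate such digits, a sketch of mine using Python’s standard secrets module:)

    import secrets

    digits = secrets.token_hex(16)  # 16 random bytes = 32 hexadecimal digits
    # Split into groups of four, which are a bit easier to memorize.
    print(" ".join(digits[i:i+4] for i in range(0, len(digits), 4)))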


Book review: The Mathematics of Various Entertaining Subjects, Volume 3

I have a bunch of books in the queue to review—I hope to begin writing these more regularly again!

[Disclosure of Material Connection: Princeton Press kindly provided me with a free review copy of this book. I was not required to write a positive review. The opinions expressed are my own.]

The Mathematics of Various Entertaining Subjects, Volume 3: The Magic of Mathematics
Jennifer Beineke and Jason Rosenhouse, eds.
Princeton University Press, 2019

The MOVES conference takes place every two years in New York. MOVES is an acronym for “The Mathematics of Various Entertaining Subjects”, and the conference is a celebration of math that isn’t necessarily considered an Important Research Topic, and doesn’t necessarily have Important Applications—but simply math that is fun for its own sake. (Although in hindsight, math that starts out as Just For Fun often seems to end up with important applications too—for example, think of graph theory or probability theory.) The most recent conference took place just a few months ago, in August 2019; the next one will be in August 2021 (you can already register if you like to plan that far ahead!).

This book is basically the conference proceedings from 2017—a collection of papers that were presented at the conference, published all together in book form. So it’s important to state at the outset that although the topics are entertaining, this really is a collection of research papers. Overall this is definitely not a book written for a general audience! I had to work hard to understand some of the papers, and some of them lost me completely.

However, there’s some great stuff in here that rewards patient study. Some of my favorites that are more generally accessible include:

  • A chapter on “Wiggly Games and Burnside’s Lemma” that does a great job explaining Burnside’s Lemma—a classic result about counting things with symmetry, at the intersection of combinatorics and group theory—via applications to counting the number of possible tiles in several different games.

  • “Solving Puzzles Backwards” has some nice puzzles and a discussion of elegant ways to approach their solutions.

  • “Should we Call Them Flexa-Bands?” has some interesting reflections on the topology of different types of flexagons.

Some other things I particularly enjoyed but which are not so accessible without some background include a chapter on the computational complexity of losing at checkers, a chapter on “Kings, sages, hats, and codes” that I wish I understood better, and a chapter on the combinatorics of Legos.

There’s so much other stuff in there on such wildly varying topics that it’s impossible to summarize. In any case, definitely recommended if you are a professional mathematician looking for some fun yet still technically meaty reading; definitely not recommended if you’re looking for a casual read of a popular math book. And if you’re somewhere in between—that is, you’re not a professional mathematician but you aspire to read and understand things on that level—this could honestly be a great place to start!


A combinatorial proof: PIE a la mode!

Continuing from my last post in this series, we’re trying to show that S = n!, where S is defined as

\displaystyle S = \sum_{i=0}^n (-1)^i \binom{n}{i} (k+(n-i))^n

which is what we get when we start with a sequence of n+1 consecutive nth powers and repeatedly take successive differences.

Recall that we defined M_a as the set of all functions from a set of size n (visualized as n blue dots) to a set of size k + n (visualized as k yellow dots on top of n blue dots) such that the blue dot numbered a is missing. I also explained in my previous post that the functions with at least one blue dot missing from the output are exactly the “bad” functions, that is, the functions which do not correspond to a one-to-one matching between the blue dots on the left and the blue dots on the right.

As an example, the function pictured above is an element of M_1, as well as an element of M_3. (That means it’s also an element of the intersection M_1 \cap M_3—this will be important later!)

Let F be the set of all functions from n to k+n, and let P be the set of “good” functions, that is, the subset of F consisting of matchings (aka Permutations—I couldn’t use M for Matchings because M is already taken!) between the blue sets. We already know that the number of matchings between two sets of size n, that is, |P|, is equal to n!. However, let’s see if we can count them a different way.
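
(Before we do, here is a quick brute-force check of that claim in Python, a sketch of mine which labels the k yellow dots 0 through k-1 and the blue dots k through k+n-1; it enumerates all of F and counts the good functions for tiny n and k:)

    from itertools import product
    from math import factorial

    def count_matchings(n, k):
        """Count functions from the n blue dots to all k+n dots whose
        image includes every blue dot, i.e. the 'good' functions P."""
        blue = set(range(k, k + n))
        return sum(blue <= set(f) for f in product(range(k + n), repeat=n))

    assert all(count_matchings(n, k) == factorial(n)
               for n in range(5) for k in range(4))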

Every function is either “good” or “bad”, so we can describe the set of good functions as what’s left over when we remove all the bad ones:

\displaystyle P = F - \bigcup_{a=1}^n M_a

(Notice that we can’t just write P = F - M_1 - M_2 - \dots - M_n, because the sets M_a overlap! But if we take the union of all of them, we get each “bad” function exactly once.)

In other words, we want to count the functions that aren’t in any of the M_a. But this is exactly what the Principle of Inclusion-Exclusion (PIE) is for! PIE tells us that the size of this set is

\displaystyle |P| = |F| - \left|\bigcup_{a=1}^n M_a \right| = \sum_{T \subseteq \{1 \dots n\}} (-1)^{|T|}\left| \bigcap_{a \in T} M_a \right|,

that is, we take all possible intersections of some of the M_a, and either add or subtract the size of each intersection depending on whether the number of sets being intersected is even or odd.

We’re getting close! To simplify this more we’ll need to figure out what those intersections \bigcap_{a \in T} M_a look like.

Intersections

What does M_a \cap M_b look like? The members of M_a \cap M_b are exactly those functions which are in both M_a and M_b, so M_a \cap M_b contains all the functions that are missing both a and b (and possibly other elements). Likewise, M_a \cap M_b \cap M_c contains all the functions that are missing (at least) a, b, and c; and so on.

Last time we argued that |M_a| = (k + (n-1))^n, since functions from n to k+n that are missing a can be put into a 1-1 matching with arbitrary functions from n to k + (n-1), just by deleting or inserting element a:

So what about an intersection—how big is M_a \cap M_b (assuming a \neq b)? By a similar argument, it must be (k + (n-2))^n, since we can match up each function in M_a \cap M_b with a function from n to k+(n-2): just delete or insert both elements a and b, like this:

Generalizing, if we have a subset T \subseteq \{1, \dots, n\} and intersect all the M_a for a \in T, we get the set of functions whose output is missing all the elements of T, and we can match them up with functions from n to k + (n-|T|). In formal notation,

\displaystyle \left| \bigcap_{a \in T} M_a \right| = (k + (n-|T|))^n

Substituting this into the previous expression for the number of blue matchings |P|, we get

\displaystyle |P| = \sum_{T \subseteq \{1 \dots n\}} (-1)^{|T|}(k + (n-|T|))^n

Counting subsets

Notice that the value of (-1)^{|T|}(k + (n-|T|))^n depends only on the size of the subset T and not on its specific elements. This makes sense: the number of functions missing some particular number of elements is the same no matter which specific elements we pick to be missing.

So for each particular size |T| = i, we are adding up a bunch of copies of the same value (-1)^i (k + (n-i))^n—as many copies as there are different subsets of size i. The number of subsets T of size i is \binom n i, the number of ways of choosing exactly i things out of n. Therefore, if we add things up size by size instead of subset by subset, we get

\begin{array}{rcl} |P| &=& \displaystyle \sum_{\text{all possible sizes } i} (\text{number of subsets of size } i) \cdot \left[(-1)^{i}(k + (n-i))^n \right] \\[1em] &=& \displaystyle \sum_{i=0}^n \binom n i (-1)^{i}(k + (n-i))^n\end{array}

But this is exactly the expression for S that we came up with earlier! And since we already know |P| = n! this means that S = n! too.
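
(If you’d like to double-check this without redoing the algebra, here is a quick Python sketch of mine that computes S directly and compares it with n! for a range of small values:)

    from math import comb, factorial

    def S(n, k):
        """The alternating sum from the start of the post."""
        return sum((-1)**i * comb(n, i) * (k + (n - i))**n
                   for i in range(n + 1))

    assert all(S(n, k) == factorial(n) for n in range(8) for k in range(8))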

And that’s essentially it for the proof! I think there’s still more to say about the big picture, though. In a future post I’ll wrap things up and offer some reflections on why I find this interesting and where else it might lead.
