I’ve written about this before, but it’s worth spelling it out for completeness’ sake. If you have a sum of something which is itself a sum, like this:

$$\sum_{k=1}^n (a_k + b_k)$$
you can split it up into two separate sums:

$$\sum_{k=1}^n (a_k + b_k) = \sum_{k=1}^n a_k + \sum_{k=1}^n b_k$$
(You can also sort of think of this as the sigma “distributing” over the sum.) For example,

$$\sum_{k=1}^n (k+2) = \sum_{k=1}^n k + \sum_{k=1}^n 2.$$
Why is this? Last time, the fact that we can pull constants in and out of a sigma came down to a property of addition, namely, that multiplication distributes over it. This, too, turns out to come down to some other properties of addition. As before, let’s think about writing out these sums explicitly, without sigma notation.
First, on the left-hand side, we have something like

$$(a_1 + b_1) + (a_2 + b_2) + (a_3 + b_3) + \cdots + (a_n + b_n)$$

And on the right-hand side, we have

$$(a_1 + a_2 + a_3 + \cdots + a_n) + (b_1 + b_2 + b_3 + \cdots + b_n)$$
We can see that we get all the same terms, but in a different order. But as we all know, the order in which you add things up doesn’t matter: addition is both associative and commutative, so we can freely reorder terms and still get the same sum.^{1}
So $\Sigma$ “distributes over” sums! Let’s use the example from above to see how this can be useful. Suppose we want to figure out a closed-form expression for

$$\sum_{k=1}^n (k+2).$$
If we didn’t otherwise know how to proceed we could certainly start by trying some examples and looking for a pattern. Or we could even be a bit more sophisticated and notice that this sum will be $3 + 4 + 5 + \cdots + (n+2)$, so it must be less than the triangular number $1 + 2 + 3 + \cdots + (n+2)$. But we don’t even need to be this clever. If we just distribute the sigma over the addition, we transform the expression into two simpler sums which are easier to deal with on their own:

$$\sum_{k=1}^n (k+2) = \sum_{k=1}^n k + \sum_{k=1}^n 2$$
The first sum is $1 + 2 + \cdots + n$, that is, the $n$th triangular number, which is equal to $\frac{n(n+1)}{2}$. The second sum is just $2 + 2 + \cdots + 2$ ($n$ times), so it is equal to $2n$. Thus, an expression for the entire sum is

$$\sum_{k=1}^n (k+2) = \frac{n(n+1)}{2} + 2n.$$
As a double check, is this indeed three less than the $(n+2)$nd triangular number?

$$\frac{n(n+1)}{2} + 2n = \frac{n^2 + 5n}{2} = \frac{(n+2)(n+3) - 6}{2} = \frac{(n+2)(n+3)}{2} - 3$$
Sure enough! Math works!
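For the skeptical, identities like this are also easy to sanity-check with a few lines of Python. Here’s a quick sketch, checking the sum of $k+2$ for $k$ from $1$ to $n$ against the closed form (the helper name `triangular` is my own):

```python
def triangular(m):
    # the m-th triangular number 1 + 2 + ... + m
    return m * (m + 1) // 2

for n in range(1, 100):
    total = sum(k + 2 for k in range(1, n + 1))
    assert total == n * (n + 1) // 2 + 2 * n   # the closed form
    assert total == triangular(n + 2) - 3      # three less than T(n+2)
print("all checks passed")
```

Checking finitely many values of $n$ is no substitute for the algebra, of course, but it’s a nice way to catch mistakes.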
At least, as long as the sum is finite! This still works for some infinite sums too, but we have to be careful about convergence.↩
[Disclosure of Material Connection: Princeton Press kindly provided me with free review copies of these books. I was not required to write a positive review. The opinions expressed are my own.]
The Joy of SET
Liz McMahon, Gary Gordon, Hannah Gordon, and Rebecca Gordon
Princeton University Press, 2016
Most people are probably familiar with the card game SET: each card has four attributes (number, color, shading, shape), each of which can have one of three values, for a total of $3^4 = 81$ cards. The goal is to find “sets”, which consist of three cards where each attribute is either identical on all three cards, or distinct on all three cards. It’s a fun game, and because it has to do with combinations of things and pattern recognition, many people probably have the intuitive sense that it’s a “mathy” sort of game, or the sort of game that people who enjoy math would also enjoy.
Well, it turns out, as the authors convincingly demonstrate, that the mathematics behind SET actually goes very deep. For example, did you know that there are exactly $\frac{3^{n-1}(3^n - 1)}{2}$ distinct SETs in an $n$-dimensional version of the game? (The normal game that everyone plays has $n = 4$.) How about the fact that the SET deck is a concrete model of the four-dimensional affine geometry $AG(4,3)$? Did you know that the most cards you can have without a SET is 20, and that this is intimately connected to structures called maximal caps in affine geometries—and that no one knows how many cards you could have without a SET in a $7$-dimensional (or higher) version of the game?
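Incidentally, that count of SETs is easy to verify by brute force for small $n$. Here’s a sketch in Python; the encoding of cards as vectors over $\{0,1,2\}$, and the observation that “all same or all different” in each attribute is the same as “sums to $0$ mod $3$”, are standard, but the function name is my own:

```python
from itertools import combinations, product

def count_sets(n):
    # A card is a vector in {0,1,2}^n; three cards form a SET exactly when
    # each attribute is all-same or all-different, i.e. sums to 0 mod 3.
    cards = list(product(range(3), repeat=n))
    return sum(
        1
        for a, b, c in combinations(cards, 3)
        if all((x + y + z) % 3 == 0 for x, y, z in zip(a, b, c))
    )

for n in range(1, 5):
    assert count_sets(n) == 3 ** (n - 1) * (3 ** n - 1) // 2
print(count_sets(4))  # 1080 SETs in the standard n = 4 game
```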
The authors explain all this, and much more (with a lot of humor^{1} along the way!), ranging through probability, modular arithmetic, combinatorics, geometry, linear algebra, and a bunch of other topics. The book begins gently, but by the end it gets into some fairly deep mathematics, and there are lots of exercises and projects at the end of each chapter. This book would make a fantastic resource for a middle school, high school, or undergraduate math club. I could even see using it as the textbook for some sort of extra/special topics class with some motivated students.
Elements of Mathematics: From Euclid to Gödel
John Stillwell
Princeton University Press, 2016
I am a huge fan of Stillwell’s writing (almost six years ago I wrote a short review of another one of his books, Roads to Infinity) and I wasn’t disappointed. This book is definitely aimed at a more sophisticated audience than the SET book, but due to Stillwell’s lucid explanations it still manages to start out rather gently and holds many treasures even for the intrepid high school reader.
The book has two basic goals. The first is to simply lay out an overview of “elementary” mathematics, accessible in theory to anyone with a high school level mathematical background. “Elementary” mathematics refers not just to the sort of mathematics learned in grade school (arithmetic, fractions, and so on) but to the mathematics that would nowadays be viewed as “basic” by professional mathematicians—the sort of stuff that every professional mathematician is familiar with regardless of their specialty. In this respect the book is quite a tour de force, organized by areas of mathematics—arithmetic, computation, algebra, geometry, calculus, and so on—and in each area Stillwell manages to distill down the big ideas and the connections with other areas. He is a master expositor, and the text manages to be engaging and accessible without watering down the mathematics. I definitely learned new things from the book! One thing Stillwell does very well in particular is to explain not just the big ideas but the connections between them.
The other basic goal of the book is to explore the boundary between “elementary” and “advanced” mathematics. This sounds like it would be rather vague and amorphous—after all, aren’t the notions of “elementary” and “advanced” quite relative? Doesn’t it depend on how much background you have? Can’t math that is “elementary” to one person be “advanced” to someone else? This is all true, but Stillwell isn’t really talking about which areas of math are hard and which are easy. Professional mathematicians often talk about certain proofs being “elementary”, and it is often celebrated when someone finds an “elementary” proof of a theorem, even if that theorem had already been proved by “non-elementary” means, and even if the non-elementary proof was shorter. Stillwell is trying to pin down a precise meaning of this sense of “elementary”, and makes a well-reasoned case that it all comes down to infinity: something is non-elementary precisely when infinity enters into its proof in a fundamental way. This may seem rather arbitrary at first blush, but through a number of examples and surprising connections between different areas of mathematics, Stillwell makes it clear that this is an extremely “natural” place to draw a line in the sand. Not that having such a dividing line is in and of itself of any value—it’s simply fascinating to note that there is such a natural line at all, and by exploring it in depth we shed new light on the mathematics to either side of it.
They are extremely fond of footnotes. Reminds me of someone I know.↩
Could you explain how to take a constant outside of a summation and bring it inside the summation?
This made me realize there’s a lot more still to be explained! In particular, understanding what sigma notation means is one thing, but becoming fluent in its use requires learning a number of “tricks”. Of course, as always, they’re not really “tricks” at all: understanding what the notation means is the necessary foundation for understanding why the tricks work!
For today, we’ll start by considering what Kevin asked about. Consider what is meant by this sigma notation:

$$\sum_{k=1}^n c \cdot a_k$$
It doesn’t really matter what the $a_k$’s are; the point is just that each $a_k$ might be different, whereas $c$ is a constant that doesn’t change. So this can be expanded as

$$c\,a_1 + c\,a_2 + c\,a_3 + \cdots + c\,a_n.$$
Since multiplication distributes over addition, we can factor out the $c$:

$$c\,a_1 + c\,a_2 + \cdots + c\,a_n = c\,(a_1 + a_2 + \cdots + a_n).$$
The right-hand side can now be written as

$$c \left(\sum_{k=1}^n a_k\right)$$

so overall we have shown that

$$\sum_{k=1}^n c\,a_k = c \left(\sum_{k=1}^n a_k\right).$$

We usually omit the parentheses and just write

$$\sum_{k=1}^n c\,a_k = c \sum_{k=1}^n a_k.$$
Our argument didn’t really depend on any of the specifics (like the fact that $k$ goes from $1$ to $n$). The general principle is that constants can “jump” back and forth across the sigma, which corresponds to multiplication distributing across addition.
The one remaining question is—what counts as a “constant”? The answer is, anything that doesn’t depend on the index variable. So the “constant” can even involve some variables, as long as they are other variables! For example,

$$\sum_{k=1}^n (m^2 + 1)\,a_k = (m^2 + 1) \sum_{k=1}^n a_k.$$

In the context of this sum, $m^2 + 1$ is a “constant”, because it does not have $k$ in it. Since it doesn’t contain $k$, it is going to be exactly the same for each term of the sum, which means it can be factored out.
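As a quick sanity check, here is the constant-jumping rule verified numerically in Python (the particular values of $c$ and the $a_k$ are arbitrary choices of mine):

```python
import random

random.seed(1)
a = [random.randint(-10, 10) for _ in range(100)]  # arbitrary a_1 .. a_n
c = 7                                              # an arbitrary constant

lhs = sum(c * ak for ak in a)  # the sum of c * a_k
rhs = c * sum(a)               # c times the sum of the a_k
assert lhs == rhs
print("the constant jumped across the sigma successfully")
```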
Today I want to give you a glimpse of what the Riemann zeta function $\zeta(s)$ has to do with prime numbers—which is a big part of why it is so famous.
Consider the infinite product

$$\left(\frac{1}{1 - 2^{-s}}\right)\left(\frac{1}{1 - 3^{-s}}\right)\left(\frac{1}{1 - 5^{-s}}\right)\left(\frac{1}{1 - 7^{-s}}\right)\cdots$$

where each sequential factor has the next prime number raised to the $-s$ power. Using big-Pi notation, we can write this infinite product more concisely as

$$\prod_p \frac{1}{1 - p^{-s}}$$

where the product ranges over all primes $p$.
(The big $\Pi$ means a $\Pi$roduct just like a big $\Sigma$ means a $\Sigma$um.)
Now let’s do a bit of algebra. First, recall that the infinite geometric series $1 + x + x^2 + x^3 + \cdots$ has the sum

$$1 + x + x^2 + x^3 + \cdots = \frac{1}{1 - x}$$

as long as $|x| < 1$. (For some hints on how to derive this formula if you haven’t seen it before, see this blog post or this one.) Of course, $\frac{1}{1 - p^{-s}}$ is of this form, with $x = p^{-s}$. Note that $p^{-s} = \frac{1}{p^s}$, which is less than $1$ as long as $s > 0$, so the geometric series formula applies, and we have

$$\frac{1}{1 - p^{-s}} = 1 + p^{-s} + p^{-2s} + p^{-3s} + \cdots$$
(From now on I’ll just write $\frac{1}{p^s}$ instead of $p^{-s}$.) That is,

$$\prod_p \frac{1}{1 - p^{-s}} = \prod_p \left(1 + \frac{1}{p^s} + \frac{1}{p^{2s}} + \frac{1}{p^{3s}} + \cdots\right)$$
So this is an infinite product of infinite sums! But you’re not scared of a little infinity, are you? Good, I thought not. Now, what would happen if we “multiplied out” this infinite product of infinite sums? Note that every term in the result would come from picking one term of the form $\frac{1}{p^{ks}}$ from each of the factors, one for each prime $p$, and multiplying them. (Though all but finitely many of the choices have to be $1$ if we are to get a finite term as a result.) For example, one way to choose terms would be

$$\frac{1}{2^{2s}} \cdot \frac{1}{3^s} \cdot 1 \cdot 1 \cdot 1 \cdots$$
which would give us $\frac{1}{2^{2s} \cdot 3^s} = \frac{1}{(2^2 \cdot 3)^s} = \frac{1}{12^s}$. In fact, because of the Fundamental Theorem of Arithmetic (every positive integer has a unique prime factorization), each choice gives us the prime factorization of a different positive integer, and conversely, every positive integer shows up exactly once. That is, after multiplying everything out, we get one term of the form $\frac{1}{n^s}$ for each positive integer $n$:

$$\prod_p \left(1 + \frac{1}{p^s} + \frac{1}{p^{2s}} + \cdots\right) = \sum_{n=1}^{\infty} \frac{1}{n^s}$$
But that’s just our old friend $\zeta(s)$! So in fact,

$$\zeta(s) = \prod_p \frac{1}{1 - p^{-s}}$$

turns out to be an equivalent way to write the Riemann zeta function.
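If you’d like to see this equivalence in action, here is a rough numerical check in Python comparing a truncation of the product with a truncation of the sum, at $s = 3$ (the helper name and the truncation points are my own choices):

```python
def primes_up_to(limit):
    # simple sieve of Eratosthenes
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [p for p, is_p in enumerate(sieve) if is_p]

s = 3
product = 1.0
for p in primes_up_to(10000):
    product *= 1 / (1 - p ** -s)

partial_sum = sum(1 / n ** s for n in range(1, 10 ** 6))

assert abs(product - partial_sum) < 1e-6
print(product)  # both truncations are close to zeta(3) = 1.2020569...
```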
We can now use this in a really cute proof that there are infinitely many primes. Consider $\zeta(1)$, where we substitute $1$ for $s$. In our original definition of $\zeta$, we get

$$\zeta(1) = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \cdots$$
This is known as the harmonic series, and it is a well-known fact that it diverges, that is, as you keep adding up more and more terms of the series, the sum keeps getting bigger and bigger without bound. Put another way, pick any number you like—a hundred, a million, a trillion—and eventually, if you keep adding long enough, the sum will become bigger than your chosen number. (Though you may have to wait a very long time—the harmonic series diverges rather slowly indeed!) One way to prove this is to note that the series is greater than

$$1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{4} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{16} + \cdots$$
(the original series is greater than this because we only made some of its terms smaller—I changed the $\frac{1}{3}$ into $\frac{1}{4}$, and then changed $\frac{1}{5}$ through $\frac{1}{7}$ into $\frac{1}{8}$, and then $\frac{1}{9}$ through $\frac{1}{15}$ into $\frac{1}{16}$, and so on). But this new smaller series is equal to

$$1 + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \cdots$$

which will clearly get arbitrarily large. So the harmonic series, which is larger, must diverge as well.
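The grouping argument also makes a nice numerical experiment: it predicts that the first $2^k$ terms of the harmonic series already add up to at least $1 + \frac{k}{2}$. A quick Python sketch (function name mine):

```python
def harmonic(n):
    # the n-th partial sum of the harmonic series
    return sum(1 / i for i in range(1, n + 1))

# The grouping argument predicts H(2^k) >= 1 + k/2.
for k in range(1, 15):
    assert harmonic(2 ** k) >= 1 + k / 2
print("partial sums grow without bound, if slowly:", harmonic(2 ** 14))
```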
So, $\zeta(1)$ diverges. But what happens if we plug $s = 1$ into the other expression for $\zeta$? We get

$$\zeta(1) = \prod_p \frac{1}{1 - \frac{1}{p}} = \frac{2}{1} \cdot \frac{3}{2} \cdot \frac{5}{4} \cdot \frac{7}{6} \cdot \frac{11}{10} \cdots$$
If there were only finitely many primes, this would be a finite product of some fractions and would thus have some definite, finite value—but we know it has to diverge! Thus there must be infinitely many primes.
I will note one other thing—when I was writing up some notes for this post I was initially confused by the fact that if we set, say, $s = 2$, we already know that $\zeta(2) = \frac{\pi^2}{6} \approx 1.6449$; but now we also know that

$$\zeta(2) = \frac{1}{1 - \frac{1}{4}} \cdot \frac{1}{1 - \frac{1}{9}} \cdot \frac{1}{1 - \frac{1}{25}} \cdots = \frac{4}{3} \cdot \frac{9}{8} \cdot \frac{25}{24} \cdot \frac{49}{48} \cdots$$
But this is an infinite product of fractions which are all bigger than $1$! How could it converge? …well, my intuition was just playing tricks with me. Although I have lots of practice thinking about infinite sums that converge, I am just not used to thinking about infinite products that converge. But in the end it is not really any more surprising than the fact that an infinite sum can converge even though all its terms are positive: as long as the fractions are getting smaller quickly enough, such an infinite product certainly can converge, and in fact it does. Using a computer confirms that the more terms of this product we include, the closer the product gets to $\frac{\pi^2}{6} \approx 1.6449$.
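Here is the sort of computation I mean, sketched in Python (the crude primality test and the cutoff are my own choices):

```python
import math

def is_prime(n):
    # crude trial division, fine for small n
    return n > 1 and all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))

target = math.pi ** 2 / 6
product = 1.0
for p in range(2, 20000):
    if is_prime(p):
        product *= 1 / (1 - 1 / p ** 2)

print(product, target)  # both are approximately 1.6449
assert abs(product - target) < 1e-3
```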
If you want to get your own set you can buy one here! Also, if you have any other ideas for games or activities using the cards, please send them my way.
First, you can play a classic game of War. The twist is that while playing you should only look at the diagram side of the cards, not the side with the written-out number. So part of the game is figuring out which factorization diagram represents a bigger number. One could of course just work out what each number is and then compare, but I imagine students may also find tricks they can use to decide which is bigger without fully working out what the numbers are.
Variant 1: primes are wild, that is, primes always beat composite numbers. (If you have two primes or two composite numbers, then the higher one beats the lower one as usual.) This may actually make the game a bit easier, since when a prime is played you don’t actually need to work out the value of any composite number played in opposition to it.
Variant 2: like variant 1, except that primes only beat those composite numbers which don’t have them as a factor. For example, 5 beats 24, but 5 loses to 30: since 30 has 5 as a prime factor it is “immune” to 5.
As a fun follow-on activity to variant 2, try listing the cards in order according to which beats which!^{1}
Alex and his students came up with a fun variant on SET. Start by dealing out twelve factorization cards, diagram-side-up. Like the usual SET game, the aim is to find and claim sets of three cards. The difference is in how sets are defined. A “set” of factorization cards is any set of three cards that either
Here are a few examples of valid sets:
And here are a few invalid sets:
In order to claim a set you have to state the number on each card and explain why they form a set. If you are correct, remove the cards and deal three new cards. If you are incorrect, keep looking!
Alex and his students found that, just as with the classic SET game, it is possible to have a layout of twelve cards containing no set. For example, here’s the layout they found:
Just to double-check, I confirmed with a computer program that the above layout indeed contains no valid sets. As with the usual SET, if you find yourself in a situation where everyone agrees there are no sets, you can just deal out three more cards.
The natural follow-up question is: what’s the largest possible layout with no sets? So far, this is an open question!
Since someone asked in a comment, I thought it was worth mentioning where the MacLaurin series for $\sin x$ comes from. It would typically be covered in a second-semester calculus class, but it’s possible to understand the idea with only a very basic knowledge of derivatives.
First, recall the derivatives $\frac{d}{dx} \sin x = \cos x$ and $\frac{d}{dx} \cos x = -\sin x$. Continuing, this means that the second derivative of $\sin x$ is $-\sin x$, the third derivative of $\sin x$ is $-\cos x$, and the derivative of that is $\sin x$ again. So the derivatives of $\sin x$ repeat in a cycle of length 4.
Now, suppose that an infinite series representation for $\sin x$ exists (it’s not at all clear, a priori, that it should, but we’ll come back to that). That is, something of the form

$$\sin x = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + \cdots$$
What could this possibly look like? We can use what we know about $\sin x$ and its derivatives to figure out that there is only one possible infinite series that could work.
First of all, we know that $\sin 0 = 0$. When we plug $x = 0$ into the above infinite series, all the terms with $x$ in them become zero, leaving only $a_0$: so $a_0$ must be $0$.
Now if we take the first derivative of the supposed infinite series for $\sin x$, we get

$$a_1 + 2 a_2 x + 3 a_3 x^2 + 4 a_4 x^3 + \cdots$$
We know the derivative of $\sin x$ is $\cos x$, and $\cos 0 = 1$: hence, using similar reasoning as before, we must have $a_1 = 1$. So far, we have

$$\sin x = x + a_2 x^2 + a_3 x^3 + a_4 x^4 + \cdots$$
Now, the second derivative of $\sin x$ is $-\sin x$. If we take the second derivative of this supposed series for $\sin x$, we get

$$2 a_2 + 3 \cdot 2\, a_3 x + 4 \cdot 3\, a_4 x^2 + \cdots$$
Again, since this should be $-\sin x$, if we substitute $x = 0$ we ought to get zero, so the constant term $2 a_2$, and hence $a_2$ itself, must be zero.
Taking the derivative a third time yields

$$3 \cdot 2 \cdot 1\, a_3 + 4 \cdot 3 \cdot 2\, a_4 x + \cdots$$
and this is supposed to be $-\cos x$, so substituting $x = 0$ ought to give us $-1$: in order for that to happen we need $3 \cdot 2 \cdot 1\, a_3 = -1$, and hence $a_3 = -\frac{1}{6}$.
To sum up, so far we have discovered that

$$\sin x = x - \frac{x^3}{6} + a_4 x^4 + a_5 x^5 + \cdots$$
Do you see the pattern? When we take the $n$th derivative, the constant term is going to end up being $n!\,a_n$ (because it started out as $a_n x^n$ and then went through $n$ successive derivative operations before the $x$ disappeared: $n \cdot (n-1) \cdot (n-2) \cdots 2 \cdot 1 \cdot a_n = n!\,a_n$). If $n$ is even, the $n$th derivative will be $\pm \sin x$, and so the constant term should be zero; hence all the even coefficients will be zero. If $n$ is odd, the $n$th derivative will be $\pm \cos x$, and so the constant term should be $\pm 1$: hence $n!\,a_n = \pm 1$, so $a_n = \pm \frac{1}{n!}$, with the signs alternating back and forth. And this produces exactly what I claimed to be the expansion for $\sin x$:

$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots$$
Using some other techniques from calculus, we can prove that this infinite series does in fact converge to $\sin x$, so even though we started with the potentially bogus assumption that such a series exists, once we have found it we can prove that it is in fact a valid representation of $\sin x$. It turns out that this same process can be performed to turn almost any function into an infinite series, which is called the Taylor series for the function (a MacLaurin series is a special case of a Taylor series). For example, you might like to try figuring out the Taylor series for $\cos x$, or for $e^x$ (using the fact that $e^x$ is its own derivative).
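If you want to see the series in action, here’s a quick Python sketch comparing a truncated version of it against the built-in sine function (the function name and the number of terms are my own choices):

```python
import math

def sin_series(x, terms=12):
    """Partial sum of x - x^3/3! + x^5/5! - ..., as derived above."""
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
               for k in range(terms))

for x in [0.0, 0.5, 1.0, 2.0, -3.0]:
    assert abs(sin_series(x) - math.sin(x)) < 1e-9
print("the truncated series matches math.sin closely")
```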
At the time I didn’t know how to prove this, but I did some quick research and today I’m going to explain it! It turns out that determining the value of this infinite sum, $\sum_{n \geq 1} \frac{1}{n^2}$, was a famous open question from the mid-1600s until it was solved by Leonhard Euler in 1734. It is now known as the Basel problem (it’s not clear to me whether it was called that when Euler solved it). Since then, there have been many different proofs using all sorts of techniques, but I think Euler’s original proof is still the easiest to follow (though it turns out to implicitly rely on some not-so-obvious assumptions, so a completely formal proof is still quite tricky). I learned about this proof from some slides by Brendan Sullivan and an accompanying document.
First, recall the MacLaurin series for $\sin x$:

$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots$$
This infinite sum continues forever with successive odd powers of $x$, alternating between positive and negative. (If you’ve never seen this before, you can take my word for it I suppose; if anyone asks in a comment I would be happy to write another post explaining where this comes from.)
If we substitute $\pi x$ for $x$ we get

$$\sin \pi x = \pi x - \frac{\pi^3 x^3}{3!} + \frac{\pi^5 x^5}{5!} - \cdots$$

Note that the coefficient of $x^3$ is $-\frac{\pi^3}{3!} = -\frac{\pi^3}{6}$. Remember that—it will return later!
Now, recall that for finite polynomials, the Fundamental Theorem of Algebra tells us that we can always factor them into a product of linear factors, one for each root (technically, this is only true if we allow for complex roots, though we won’t need that fact here). For example, consider the polynomial

$$2x^3 - 12x^2 + 22x - 12$$
It turns out that this has zeros at $1$, $2$, and $3$, as you can verify by plugging in those values for $x$. By the Fundamental Theorem, this means it must be possible to factor this polynomial as

$$2(x-1)(x-2)(x-3)$$
Note how each factor corresponds to one of the roots: when $x = 1$, then $x - 1$ is zero, making the whole product zero; when $x = 2$, the $x - 2$ becomes zero, and so on. We also had to put in a constant multiple of 2, to make sure the coefficient of $x^3$ is correct.
So, we can always factorize finite polynomials in this way. Can we do something similar for infinite polynomials, like the MacLaurin series for $\sin \pi x$? Euler guessed so. It turns out the answer is “yes, under certain conditions”, but this is not at all obvious. This is known as the Weierstrass factorization theorem, but I won’t get into the details. You can just take it on faith that it works in this case, so we can “factorize” the MacLaurin series for $\sin \pi x$, getting one linear factor for each root, that is, for each integer value of $x$:

$$\sin \pi x = \pi x \left(1 - x\right)\left(1 + x\right)\left(1 - \frac{x}{2}\right)\left(1 + \frac{x}{2}\right)\left(1 - \frac{x}{3}\right)\left(1 + \frac{x}{3}\right)\cdots$$
For example, $x = 1$ makes the $(1 - x)$ term zero, and in general $x = \pm n$ will make the $\left(1 \mp \frac{x}{n}\right)$ term zero. Note how we also included a factor of $x$, corresponding to the root at $x = 0$. We also have to include a constant factor of $\pi$: this means that the coefficient of $x$ in the resulting sum (obtained by multiplying the leading $\pi x$ by all the copies of $1$) will be $\pi$, as it should be.
Now, since $\left(1 - \frac{x}{n}\right)\left(1 + \frac{x}{n}\right) = 1 - \frac{x^2}{n^2}$, we can simplify this as

$$\sin \pi x = \pi x \left(1 - x^2\right)\left(1 - \frac{x^2}{4}\right)\left(1 - \frac{x^2}{9}\right)\cdots = \pi x \prod_{n \geq 1} \left(1 - \frac{x^2}{n^2}\right)$$
Let’s think about what the coefficient of $x^3$ will be once this infinite product is completely distributed out and like degrees of $x$ are collected. The only way to get an $x^3$ term is by multiplying the initial $\pi x$ by a single term of the form $-\frac{x^2}{n^2}$, and then a whole bunch of $1$’s. There is one way to do this for each possible $n$. All told, then, we are going to have

$$-\pi \left(1 + \frac{1}{4} + \frac{1}{9} + \frac{1}{16} + \cdots\right) = -\pi \sum_{n \geq 1} \frac{1}{n^2}$$

as the coefficient of $x^3$.
And now we’re almost done: recall that previously, by considering the MacLaurin series, we concluded that the coefficient of $x^3$ in $\sin \pi x$ is $-\frac{\pi^3}{6}$. But looking at it a different way, we have now concluded that the coefficient is $-\pi \sum_{n \geq 1} \frac{1}{n^2}$. Setting these equal to each other, and dividing both sides by $-\pi$, we conclude that

$$\sum_{n \geq 1} \frac{1}{n^2} = \frac{\pi^2}{6}$$
Magic!
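If you find Euler’s factorization step hard to swallow, a numerical experiment may help: truncations of the infinite product really do behave like $\sin \pi x$. A rough Python sketch (the truncation level and function name are mine):

```python
import math

def sin_product(x, terms=100000):
    """Truncation of pi*x times the product of (1 - x^2/n^2) for n = 1..terms."""
    p = math.pi * x
    for n in range(1, terms + 1):
        p *= 1 - x * x / (n * n)
    return p

for x in [0.1, 0.5, 1.5]:
    assert abs(sin_product(x) - math.sin(math.pi * x)) < 1e-3
print("the truncated product matches sin(pi x) closely")
```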
We also proved that $\hat{f}(s)\,\hat{g}(s) = \widehat{f \ast g}(s)$: the product of Dirichlet generating functions is the Dirichlet generating function of the Dirichlet convolution. Now, consider taking $f(n) = \mathbf{1}(n) = 1$ in the above definition. We get

$$\hat{\mathbf{1}}(s) = \sum_{n \geq 1} \frac{1}{n^s} = 1 + \frac{1}{2^s} + \frac{1}{3^s} + \cdots$$

which is also often written as just plain $\zeta(s)$. This function is quite famous: it is the Riemann zeta function. The reason it is so famous is because it is the subject of a famous unproved conjecture, the Riemann hypothesis, which in turn is famous because it has been both difficult to prove—many mathematicians have been attacking it for a long time—and deeply related to many other areas of mathematics. In particular it is deeply related to prime numbers. If you want to understand the Riemann hypothesis better, I highly recommend reading the (truly wonderful) Secrets of Creation Trilogy, especially the first two volumes, which explain it from first principles. (I have previously written about the Secrets of Creation trilogy on this blog: a review of Volume 1 is here, and here is my review of Volume 2). In this post I just want to help you understand a few cool things about the zeta function.
Remember that $\mathbf{1} \ast \mu = \varepsilon$, so we must have $\hat{\mathbf{1}}(s)\,\hat{\mu}(s) = \hat{\varepsilon}(s)$. Also recall that $\varepsilon(n) = 0$ when $n > 1$ but it equals $1$ otherwise, so in fact

$$\hat{\varepsilon}(s) = \sum_{n \geq 1} \frac{\varepsilon(n)}{n^s} = 1,$$

since the only nonzero term is $\frac{\varepsilon(1)}{1^s} = 1$. All together, then, we have

$$\zeta(s) \sum_{n \geq 1} \frac{\mu(n)}{n^s} = 1,$$

and hence

$$\sum_{n \geq 1} \frac{\mu(n)}{n^s} = \frac{1}{\zeta(s)}.$$
For example, consider $s = 2$:

$$\zeta(2) = 1 + \frac{1}{4} + \frac{1}{9} + \frac{1}{16} + \cdots$$

This converges to something, although a priori it is not obvious what. By writing a simple computer program, or by asking Wolfram Alpha, we can add up, say, 1000 terms and find that it is approximately $1.644$. Apparently, the reciprocal of this number is given by

$$\frac{1}{\zeta(2)} = \sum_{n \geq 1} \frac{\mu(n)}{n^2} = 1 - \frac{1}{4} - \frac{1}{9} - \frac{1}{25} + \frac{1}{36} - \frac{1}{49} + \cdots$$

where each numerator is $\mu(n)$ (the terms where $\mu(n) = 0$ simply drop out). Again, we can use a computer to check that this sum is approximately $0.608$: and sure enough, $\frac{1}{1.644}$ is approximately $0.608$!
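Here’s one way to carry out that check in Python (the `mobius` implementation below is a simple trial-division sketch of my own):

```python
def mobius(n):
    # mu(n): 0 if n is divisible by a square, otherwise
    # (-1) raised to the number of distinct prime factors.
    result = 1
    d = 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0          # repeated prime factor
            result = -result
        d += 1
    return -result if n > 1 else result

zeta2 = sum(1 / n ** 2 for n in range(1, 1001))
mobius_sum = sum(mobius(n) / n ** 2 for n in range(1, 1001))

print(zeta2, mobius_sum)  # roughly 1.64 and 0.61
assert abs(zeta2 * mobius_sum - 1) < 0.01
```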
It turns out that $\zeta(2)$ converges to exactly $\frac{\pi^2}{6}$ (!!!)—hopefully I can write another blog post explaining that (to be honest at the moment I don’t know how to prove it). We also know that

$$\zeta(4) = \frac{\pi^4}{90},$$

and there is actually a formula giving $\zeta(n)$ for any even positive integer $n$. $\zeta(3)$ is called Apéry’s constant, since Roger Apéry proved in 1978 that it is irrational; but we don’t know of any nice formula for it.
I will leave you with a few things to prove that you may find amusing:

$$\zeta(s)^2 = \sum_{n \geq 1} \frac{\tau(n)}{n^s} \qquad \text{and} \qquad \zeta(s)\,\zeta(s-1) = \sum_{n \geq 1} \frac{\sigma(n)}{n^s}.$$
Recall that $\tau(n)$ is the number of divisors of $n$, and $\sigma(n)$ is the sum of divisors of $n$; to prove these you will want to reference some facts we proved about $\tau$ and $\sigma$ in terms of Dirichlet convolution.
Next time, we’ll see another way to relate the zeta function to prime numbers.
(This might look a bit strange, but bear with me!) For example, suppose $f(n) = 1$ for all $n$. Then

$$\hat{f}(s) = \sum_{n \geq 1} \frac{f(n)}{n^s} = 1 + \frac{1}{2^s} + \frac{1}{3^s} + \cdots$$
(Note that in this case, with $s > 1$, the infinite sum converges; but often we just use $s$ as a formal placeholder and don’t particularly care about convergence. That is, $\hat{f}(s)$ is perfectly well defined as an infinite series without worrying about the value of $s$.)
In fact,

$$\hat{f}(s) = \sum_{n \geq 1} \frac{f(n)}{n^s}$$

is called the Dirichlet generating function for $f$. Of course, this name should remind you of Dirichlet convolution—and it’s no coincidence; Dirichlet convolution and Dirichlet generating functions are closely related. Let’s see how.
Suppose we have two functions $f$ and $g$, and consider multiplying their Dirichlet generating functions (with plain old, regular multiplication):

$$\hat{f}(s)\,\hat{g}(s) = \left(\sum_{i \geq 1} \frac{f(i)}{i^s}\right)\left(\sum_{j \geq 1} \frac{g(j)}{j^s}\right)$$

We have a big (well, infinite) sum of things multiplied by another big sum. By distributivity, we’re going to get a big sum as a result, where each term of the resulting sum is the product of one thing chosen from the left-hand sum and one thing chosen from the right-hand sum. (This is just like FOIL, only way cooler and more infinite-r.) That is, the resulting sum is going to look something like this:

$$\sum_{i \geq 1} \sum_{j \geq 1} \frac{f(i)\,g(j)}{i^s j^s}$$

with one term for every possible choice of $i$ and $j$. The $i^s j^s$ in the denominator is of course equal to $(ij)^s$, and we can collect up all the fractions with the same denominator: for each $n$, we will get a denominator of $n^s$ in each case where we pick some $i$ and $j$ whose product is $n$. So we can reorganize the terms in the above sum, grouping together all the terms where the product of $i$ and $j$ is the same, and rewrite it like this (is this starting to look familiar…?):

$$\sum_{n \geq 1} \sum_{ij = n} \frac{f(i)\,g(j)}{n^s}$$

Now we can factor out the $\frac{1}{n^s}$ (since it doesn’t depend on $i$ or $j$), like so:

$$\sum_{n \geq 1} \frac{1}{n^s} \sum_{ij = n} f(i)\,g(j)$$

But the inner sum is now just the definition of the Dirichlet convolution of $f$ and $g$! So the whole thing becomes

$$\sum_{n \geq 1} \frac{(f \ast g)(n)}{n^s}$$

And finally, we note that the thing on the right is itself the Dirichlet generating function for $f \ast g$. So in summary, we have shown that

$$\hat{f}(s)\,\hat{g}(s) = \widehat{f \ast g}(s)$$
Neato! So in some sense, Dirichlet generating functions “turn Dirichlet convolution into regular multiplication”.
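The regrouping step in the middle is easy to check with a small computation; here is a Python sketch using two arbitrary example functions of my own choosing:

```python
from collections import defaultdict

# Two arbitrary example arithmetic functions (my choice, just for testing).
def f(n):
    return n

def g(n):
    return n % 3 + 1

N = 50

# Group the numerators f(i)g(j) of the product by the value of n = i*j...
grouped = defaultdict(int)
for i in range(1, N + 1):
    for j in range(1, N + 1):
        if i * j <= N:
            grouped[i * j] += f(i) * g(j)

# ...and check that each matches the Dirichlet convolution
# (f*g)(n) = sum over d | n of f(d) g(n/d).
for n in range(1, N + 1):
    conv = sum(f(d) * g(n // d) for d in range(1, n + 1) if n % d == 0)
    assert grouped[n] == conv
print("coefficients match the Dirichlet convolution")
```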
This is far from earth-shattering, but it’s fun to see how a number-theoretic function arises in a simple formula involving Dirichlet convolution, and how Möbius inversion allows us to quickly derive a related but non-obvious fact. Let’s do a few more!
First, let’s consider $\mathbf{1} \ast \mathbf{1}$. We have

$$(\mathbf{1} \ast \mathbf{1})(n) = \sum_{d \mid n} \mathbf{1}(d)\,\mathbf{1}(n/d) = \sum_{d \mid n} 1,$$

which is just counting the number of divisors of $n$. This function is often denoted $\tau(n)$. For example, $12$ has six divisors (namely, 1, 2, 3, 4, 6, and 12), so $\tau(12) = 6$. Likewise $\tau(7) = 2$ (1 and 7), $\tau(4) = 3$ (1, 2, and 4), and so on.
The above equation can be restated as $\tau = \mathbf{1} \ast \mathbf{1}$, so by Möbius inversion, we immediately conclude that $\mathbf{1} = \mu \ast \tau$, that is,

$$1 = \sum_{d \mid n} \mu(d)\,\tau(n/d).$$

For example, we can check this for $n = 12$:

$$\mu(1)\tau(12) + \mu(2)\tau(6) + \mu(3)\tau(4) + \mu(4)\tau(3) + \mu(6)\tau(2) + \mu(12)\tau(1) = 6 - 4 - 3 + 0 + 2 + 0 = 1,$$

indeed.
As another example, $\mathbf{1} \ast \mathrm{id}$ gives us

$$(\mathbf{1} \ast \mathrm{id})(n) = \sum_{d \mid n} d,$$

the sum of the divisors of $n$. This is usually denoted $\sigma(n)$. For example, $\sigma(12) = 1 + 2 + 3 + 4 + 6 + 12 = 28$, $\sigma(7) = 1 + 7 = 8$, $\sigma(4) = 1 + 2 + 4 = 7$, and so on. Often we also define $s(n) = \sigma(n) - n$ to be the sum of all the divisors of $n$ other than $n$ itself, i.e. the proper divisors. Perfect numbers like 6 and 28 are those for which $s(n) = n$.
Again, since $\sigma = \mathbf{1} \ast \mathrm{id}$, by Möbius inversion we immediately conclude $\mathrm{id} = \mu \ast \sigma$; for example, again when $n = 12$, we have

$$\mu(1)\sigma(12) + \mu(2)\sigma(6) + \mu(3)\sigma(4) + \mu(4)\sigma(3) + \mu(6)\sigma(2) + \mu(12)\sigma(1) = 28 - 12 - 7 + 0 + 3 + 0 = 12.$$
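All of these identities are pleasant to verify by brute force as well; here’s a Python sketch (the helper implementations are my own simple trial-division versions):

```python
def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def mobius(n):
    # mu(n) = 0 if n has a squared prime factor,
    # otherwise (-1) raised to the number of distinct prime factors.
    result = 1
    d = 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0
            result = -result
        d += 1
    return -result if n > 1 else result

def tau(n):
    return len(divisors(n))

def sigma(n):
    return sum(divisors(n))

# Check 1 = sum_{d|n} mu(d) tau(n/d)  and  n = sum_{d|n} mu(d) sigma(n/d).
for n in range(1, 200):
    assert sum(mobius(d) * tau(n // d) for d in divisors(n)) == 1
    assert sum(mobius(d) * sigma(n // d) for d in divisors(n)) == n
print("both Möbius inversions check out")
```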