This is far from earth-shattering, but it’s fun to see how a number-theoretic function like $\varphi$ arises in a simple formula involving Dirichlet convolution, and how Möbius inversion allows us to quickly derive a related but non-obvious fact. Let’s do a few more!
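First, let’s consider $\mathbf{1} * \mathbf{1}$. We have
$$(\mathbf{1} * \mathbf{1})(n) = \sum_{d \mid n} \mathbf{1}(d) = \sum_{d \mid n} 1,$$
which is just counting the number of divisors of $n$. This function is often denoted $\tau$. For example, $12$ has six divisors (namely, 1, 2, 3, 4, 6, and 12), so $\tau(12) = 6$. Likewise $\tau(7) = 2$ (1 and 7), $\tau(4) = 3$ (1, 2, and 4), and so on.
The above equation can be restated as $\mathbf{1} * \mathbf{1} = \tau$, so by Möbius inversion, we immediately conclude that $\mathbf{1} = \mu * \tau$, that is,
$$\sum_{ab = n} \mu(a)\,\tau(b) = 1.$$
For example, we can check this for $n = 12$:
$$\mu(1)\tau(12) + \mu(2)\tau(6) + \mu(3)\tau(4) + \mu(4)\tau(3) + \mu(6)\tau(2) + \mu(12)\tau(1) = 6 - 4 - 3 + 0 + 2 + 0 = 1,$$
indeed.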
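As another example, $\mathbf{1} * \mathrm{id}$ gives us $\sum_{d \mid n} d$, the sum of the divisors of $n$. This is usually denoted $\sigma(n)$. For example, $\sigma(12) = 28$, $\sigma(7) = 8$, $\sigma(4) = 7$, and so on. Often we also define $s(n) = \sigma(n) - n$ to be the sum of all the divisors of $n$ other than $n$ itself, i.e. the proper divisors. Perfect numbers like 6 and 28 are those for which $s(n) = n$.
Again, since $\mathbf{1} * \mathrm{id} = \sigma$, by Möbius inversion we immediately conclude $\mathrm{id} = \mu * \sigma$; for example, again when $n = 12$, we have
$$\mu(1)\sigma(12) + \mu(2)\sigma(6) + \mu(3)\sigma(4) + \mu(4)\sigma(3) + \mu(6)\sigma(2) + \mu(12)\sigma(1) = 28 - 12 - 7 + 0 + 3 + 0 = 12.$$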
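If you would like to check these identities by computer, here is a minimal Python sketch. The helper names (`divisors`, `mobius`, `dirichlet`) are illustrative choices, not anything from a library, and the brute-force divisor search is only meant for small $n$.

```python
def divisors(n):
    """Positive divisors of n, in increasing order (brute force)."""
    return [d for d in range(1, n + 1) if n % d == 0]

def mobius(n):
    """Möbius function, computed by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:      # p divides the original n at least twice
                return 0
            result = -result
        p += 1
    if n > 1:                   # one leftover prime factor
        result = -result
    return result

def dirichlet(f, g):
    """Dirichlet convolution: (f * g)(n) = sum of f(a) g(b) over ab = n."""
    return lambda n: sum(f(a) * g(n // a) for a in divisors(n))

one = lambda n: 1               # the constant function 1
ident = lambda n: n             # id(n) = n

tau = dirichlet(one, one)       # number of divisors
sigma = dirichlet(one, ident)   # sum of divisors

print(tau(12), sigma(12))                                      # 6 28
print(dirichlet(mobius, tau)(12), dirichlet(mobius, sigma)(12))  # 1 12
```

The last line is the two Möbius-inverted identities evaluated at $n = 12$, matching the hand computations.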
This is a torus, made from 24 crescent-shaped pieces of paper with slots cut into them so they interlock with each other. I followed these instructions on cutoutfoldup.com. There is also a template with some ideas for nice variations here.
The idea of this model is to highlight Villarceau circles. Everyone knows how to find circular cross-sections of a torus, right? You can cut it horizontally (like cutting a bagel in half) in which case you get two concentric circles:
Or you can cut it vertically (like cutting a donut in half, if you wanted to share it with someone else), in which case you get two separate circles:
But it turns out there is yet another way to cut a torus to get circular cross-sections, by cutting it with a diagonal plane:
Note that the plane is tangent to the torus at the two places where the circles intersect. These circles are called Villarceau circles, named after French mathematician Yvon Villarceau, who published a paper about them in 1848. The paper model highlights these circles: each circle is composed of edges from two differently-colored crescents; the points of the crescents come together at the tangent points where two Villarceau circles intersect.
If you want to try making this model yourself, be warned: it is quite difficult! The five-star difficulty rating is no joke. The time from when I first printed out the templates for cutting until the time I finally finished it was several months. Partly that was because I only worked off and on and often went weeks without touching it. But partly it was also because I gave up in frustration a few times. The first time I attempted to assemble all the pieces it ended up completely falling apart. But the second time went much better (using what I had learned from my first failed attempt).
Möbius inversion. Suppose $g(n)$ is defined for $n \geq 1$ as the sum of another function $f$ over all the divisors of $n$:
$$g(n) = \sum_{d \mid n} f(d).$$
Then we can “invert” this relationship to express $f$ as a sum over $g$:
$$f(n) = \sum_{d \mid n} \mu(d)\, g(n/d).$$
The above is the “traditional” way of stating the principle, but I like to write it in this equivalent, more symmetric way:
$$f(n) = \sum_{ab = n} \mu(a)\, g(b),$$
where, as usual, the sum is over all factorizations of $n$ into a product $ab$. This is the same as the previous equation, because $ab = n$ if and only if $a \mid n$, in which case $b = n/a$. But now it becomes more clear that we are dealing with a Dirichlet convolution. Can you see how to prove this?
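Proof. If we think in terms of Dirichlet convolution, the proof is almost trivial. Note first that saying $g(n) = \sum_{d \mid n} f(d)$ is the same as saying $g = \mathbf{1} * f$ (recall that taking the Dirichlet convolution with $\mathbf{1}$ is the same as summing over all divisors). We can then take the Dirichlet convolution of both sides with $\mu$:
$$\mu * g = \mu * (\mathbf{1} * f) = (\mu * \mathbf{1}) * f = \varepsilon * f = f.$$
In other words, Möbius inversion is really just saying that if $g = \mathbf{1} * f$, then $\mu * g = f$, because $\mu$ is the inverse of $\mathbf{1}$.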
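Here is a small Python sketch of the inversion in action, under the assumption that we only care about small inputs. The function `f` below is an arbitrary choice for illustration, and the helper names are hypothetical. We build $g$ by summing $f$ over divisors, then recover $f$ from $g$.

```python
def divisors(n):
    """Positive divisors of n (brute force, fine for small n)."""
    return [d for d in range(1, n + 1) if n % d == 0]

def mobius(n):
    """Möbius function, computed by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def divisor_sum(f):
    """g = 1 * f, i.e. g(n) is the sum of f(d) over all d dividing n."""
    return lambda n: sum(f(d) for d in divisors(n))

def mobius_invert(g):
    """f = mu * g, i.e. f(n) is the sum of mu(a) g(b) over ab = n."""
    return lambda n: sum(mobius(a) * g(n // a) for a in divisors(n))

f = lambda n: n * n + 3 * n      # an arbitrary test function
g = divisor_sum(f)
recovered = mobius_invert(g)

print(all(recovered(n) == f(n) for n in range(1, 200)))  # True
```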
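Let’s try an example! Recall that $\varphi(n)$ denotes the Euler totient function, which counts how many positive integers less than or equal to $n$ are relatively prime to $n$. For example, $\varphi(5) = 4$ since all the positive integers less than or equal to $5$ (other than $5$ itself) are relatively prime to it, and more generally $\varphi(p) = p - 1$ for any prime $p$. As another example, $\varphi(12) = 4$ since the only positive integers between $1$ and $12$ which are relatively prime to $12$ are $1$, $5$, $7$, and $11$. In a previous post we considered a visual proof that
$$\sum_{d \mid n} \varphi(d) = n.$$
For example, as seen in the picture above, $\varphi(1) + \varphi(2) + \varphi(3) + \varphi(4) + \varphi(6) + \varphi(12) = 1 + 1 + 2 + 2 + 2 + 4 = 12$.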
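Now, this is in the perfect format to apply Möbius inversion: the sum over all divisors of the function $\varphi$ is equal to the function $\mathrm{id}(n) = n$. So we conclude that
$$\varphi(n) = \sum_{d \mid n} \mu(d)\, \frac{n}{d},$$
or, factoring out $n$,
$$\varphi(n) = n \sum_{d \mid n} \frac{\mu(d)}{d},$$
which is interesting since it expresses $\varphi(n)$ as a fraction of $n$.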
Let’s use this to compute $\varphi(360)$. We have
$$\varphi(360) = 360 \sum_{d \mid 360} \frac{\mu(d)}{d}.$$
The prime factorization of $360$ is $2^3 \cdot 3^2 \cdot 5$. We know that $\mu(d)$ will be $0$ for any divisor $d$ with any repeated factors, so those all cancel out; we only need to consider divisors which correspond to subsets of $\{2, 3, 5\}$. ($n = 360$ is the same example we used in our proof of the key property of the Möbius function.) So
$$\varphi(360) = 360 \left( \frac{\mu(1)}{1} + \frac{\mu(2)}{2} + \frac{\mu(3)}{3} + \frac{\mu(5)}{5} + \frac{\mu(6)}{6} + \frac{\mu(10)}{10} + \frac{\mu(15)}{15} + \frac{\mu(30)}{30} \right).$$
Remembering that $\mu$ yields $1$ for an even number of prime factors and $-1$ for an odd number, we can evaluate this as
$$\varphi(360) = 360 - 180 - 120 - 72 + 60 + 36 + 24 - 12 = 96$$
(and we can use e.g. a computer to verify that this is correct). Note how this works: we start with $360$ and then subtract out all the numbers divisible by $2$ (all $180$ of them), as well as those divisible by $3$ ($120$) and $5$ ($72$). But now we have subtracted some numbers multiple times, e.g. those which are divisible by both $2$ and $3$. So we add back in the numbers which are divisible by $6$ ($60$ of them), by $10$ ($36$), and by $15$ ($24$). Finally, we’ve now overcounted numbers which are divisible by all three, so we subtract off the $12$ numbers which are divisible by $30$. Egad! This seems really similar to PIE! Indeed, this is no accident: it turns out that the Principle of Inclusion-Exclusion is really just another kind of Möbius inversion. But that’s a subject for another post!
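As one way to do the computer check, here is a short Python sketch that computes $\varphi(360)$ twice: once by directly counting integers relatively prime to $360$, and once by the add-and-subtract process over subsets of the primes $\{2, 3, 5\}$. The function names are illustrative, and the list of primes is passed in by hand rather than computed.

```python
from itertools import combinations
from math import gcd, prod

def phi_by_counting(n):
    """Count the integers 1..n that are relatively prime to n."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def phi_by_inclusion_exclusion(n, primes):
    """Add n/d for products d of an even number of the given primes,
    subtract n/d for products of an odd number of them."""
    total = 0
    for k in range(len(primes) + 1):
        for subset in combinations(primes, k):
            d = prod(subset)             # the empty product is 1
            total += (-1) ** k * (n // d)
    return total

print(phi_by_counting(360))                        # 96
print(phi_by_inclusion_exclusion(360, [2, 3, 5]))  # 96
```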
Recall the definition of the Dirichlet convolution of two functions $f$ and $g$:
$$(f * g)(n) = \sum_{ab = n} f(a)\, g(b),$$
where the sum is taken over all possible factorizations of $n$ into a product $ab$ of positive integers. Last time we saw that $*$ is commutative and associative, and has $\varepsilon$ as an identity, where $\varepsilon$ is the function which produces $1$ for an input of $1$ and $0$ for all other inputs.
So what does the Möbius function have to do with this? Let’s start by considering a different function:
$$\mathbf{1}(n) = 1.$$
$\mathbf{1}$ is the function which ignores its input and always returns $1$. Of course this is not the same thing as $\varepsilon$, and despite its name $\mathbf{1}$ is indeed not an identity for Dirichlet convolution. That is, if we take some function $f$ and find its Dirichlet convolution with $\mathbf{1}$, we don’t get $f$ again, but we do get something interesting:
$$(\mathbf{1} * f)(n) = \sum_{ab = n} \mathbf{1}(a)\, f(b) = \sum_{ab = n} f(b) = \sum_{d \mid n} f(d).$$
The first step is the definition of $*$; the second step is the definition of $\mathbf{1}$; and the last step just notes that the sum of all $f(b)$ where $ab = n$ is the same as taking the sum of $f(d)$ for all divisors $d$ of $n$.
That is, $\mathbf{1} * f$ is the function which for a given $n$ outputs not just $f(n)$ but the sum of $f$ over all divisors of $n$.
We’re now ready to see how $\mu$ enters the picture!
Claim: $\mathbf{1} * \mu = \varepsilon$.
That is, $\mu$ is the inverse of $\mathbf{1}$ with respect to Dirichlet convolution. In my next post we’ll see some cool implications of this; for now, let’s just prove it.
Proof. Here’s the proof. Ready?
$$(\mathbf{1} * \mu)(n) = \sum_{d \mid n} \mu(d) = \varepsilon(n).$$
Actually, there was hardly anything left to prove! The first equality is because of what we just showed about taking the Dirichlet convolution of $\mathbf{1}$ with something else. And the second equality is a property of $\mu$ we just finished proving in some previous posts.
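For a quick numerical sanity check of the claim, here is a minimal Python sketch. It assumes a straightforward trial-division implementation of the Möbius function, and `one_conv` is a hypothetical helper name for "convolve with the constant function".

```python
def mobius(n):
    """Möbius function, computed by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def epsilon(n):
    """Identity for Dirichlet convolution: 1 at n = 1, 0 elsewhere."""
    return 1 if n == 1 else 0

def one_conv(f):
    """Convolving with the constant function 1 sums f over the divisors of n."""
    return lambda n: sum(f(d) for d in range(1, n + 1) if n % d == 0)

print([one_conv(mobius)(n) for n in range(1, 13)])
# [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```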
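Given two functions $f$ and $g$ defined on the positive integers, their Dirichlet convolution $f * g$ is defined by
$$(f * g)(n) = \sum_{ab = n} f(a)\, g(b).$$
The sum is taken over all possible factorizations of $n$ into a product $ab$ of positive integers. For example, suppose $f(n) = n$ and $g(n) = n^2$. Then
$$(f * g)(6) = f(1)g(6) + f(2)g(3) + f(3)g(2) + f(6)g(1) = 36 + 18 + 12 + 6 = 72.$$
At this point this may seem somewhat arbitrary, but over the next few posts we’ll see that it’s not arbitrary at all: this operation is deeply connected with the Möbius function and hence with primes and factorization.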
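So what properties does $*$ have?
It’s commutative, that is, $f * g = g * f$. This follows directly from the commutativity of multiplication:
$$(f * g)(n) = \sum_{ab = n} f(a)\, g(b) = \sum_{ba = n} g(b)\, f(a) = (g * f)(n).$$
It’s also associative, that is, $f * (g * h) = (f * g) * h$. This also follows from the associativity of multiplication, and distributivity of multiplication over addition. The proof is kind of fiddly but not really difficult:
$$\begin{aligned} (f * (g * h))(n) &= \sum_{ab = n} f(a)\,(g * h)(b) \\ &= \sum_{ab = n} f(a) \sum_{cd = b} g(c)\, h(d) \\ &= \sum_{acd = n} f(a)\, g(c)\, h(d) \\ &= \sum_{ed = n} \Big( \sum_{ac = e} f(a)\, g(c) \Big) h(d) \\ &= \sum_{ed = n} (f * g)(e)\, h(d) \\ &= ((f * g) * h)(n). \end{aligned}$$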
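Define the function $\varepsilon$ as follows:
$$\varepsilon(n) = \begin{cases} 1 & n = 1 \\ 0 & \text{otherwise.} \end{cases}$$
Then I claim that $\varepsilon$ is an identity element for $*$, that is, for any function $f$ we have $f * \varepsilon = \varepsilon * f = f$. By definition,
$$(\varepsilon * f)(n) = \sum_{ab = n} \varepsilon(a)\, f(b).$$
But $\varepsilon(a) = 0$ for every $a$ other than $1$, which cancels out most of the terms of the sum. The only term that does not become zero is $\varepsilon(1) f(n) = f(n)$. So the right-hand sum reduces to just $f(n)$, that is, $\varepsilon * f = f$. Since we already know $*$ is commutative this proves that $f * \varepsilon = f$ too.
So, to sum up^{1}, we have defined the Dirichlet convolution $*$, and shown that it is associative, commutative, and has $\varepsilon$ as an identity element. That is, to use some fancy math words, we have shown that $(*, \varepsilon)$ is a commutative monoid over the set of functions defined on the positive integers.^{2} Next time we will see what this has to do with the Möbius function $\mu$.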
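These three laws are easy to spot-check numerically. The sketch below is only a finite check on a few arbitrarily chosen functions (`f`, `g`, `h` and the cutoff of 60 are assumptions for illustration), not a substitute for the proofs.

```python
def dirichlet(f, g):
    """Dirichlet convolution: (f * g)(n) = sum of f(a) g(b) over ab = n."""
    return lambda n: sum(f(a) * g(n // a) for a in range(1, n + 1) if n % a == 0)

def epsilon(n):
    """The identity element: 1 at n = 1, 0 elsewhere."""
    return 1 if n == 1 else 0

def agree(p, q, limit=60):
    """Do p and q agree on 1..limit?"""
    return all(p(n) == q(n) for n in range(1, limit + 1))

f = lambda n: n
g = lambda n: n * n
h = lambda n: 2 * n + 1

print(dirichlet(f, g)(6))                                   # 72
print(agree(dirichlet(f, g), dirichlet(g, f)))              # commutative: True
print(agree(dirichlet(f, dirichlet(g, h)),
            dirichlet(dirichlet(f, g), h)))                 # associative: True
print(agree(dirichlet(epsilon, f), f))                      # identity: True
```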
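In my last post, we began proving that $\sum_{d \mid n} \mu(d)$ equals $1$ when $n = 1$ and $0$ when $n > 1$, and that therefore $\mu(n) = S(n)$ for all $n \geq 1$, that is, $\mu(n)$ is the sum of all the $n$th primitive roots of unity.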
We had gotten as far as the following lemma:
Subset parity lemma: any nonempty finite set has an equal number of even-sized and odd-sized subsets.
In fact, this lemma is the only remaining piece of the proof: if we can prove this then we will have a complete proof that $\sum_{d \mid n} \mu(d) = 0$ for all $n > 1$. Just for fun, we’re actually going to consider two different proofs of this lemma today.
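The first proof will be by a combinatorial or counting argument, and was suggested in a comment by Andrey Mokhov. The idea is to match up the subsets in such a way that every subset is paired with exactly one other subset, and each pair consists of one even-sized subset and one odd-sized subset.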
Since the subsets are in pairs and each pair has one even and one odd subset, the inescapable conclusion will be that there are the same number of even and odd subsets. For example, suppose there are a bunch of people in a room and you tell them to get into pairs where each pair has one person with a hat and one person without a hat. If they are able to do this exactly, with no unpaired people left over, then you know (without actually having to count people or hats!) that there must be an equal number of hatted and hatless people.
Call the entire set $T$. Let’s first consider the case where $T$ has an odd number of elements. How will we match up its subsets? That is, given some subset $U \subseteq T$, which other subset should $U$ be matched with? A natural idea, which turns out to be the right one in this case, is to match $U$ with its complement $T \setminus U$, that is, the subset of elements from $T$ which are not in $U$. Since the complement of $U$’s complement is again $U$ (and also because $U$ cannot be its own complement), this does indeed put the subsets of $T$ into pairs. Also, if $|U|$ is even, then $|T \setminus U| = |T| - |U|$ is odd (and vice versa if $|U|$ is odd). So each pair has one even and one odd subset.
A picture will probably help. Here’s an example where $|T|$ is odd. As you can check, each row in the picture is a pair of subsets which are each other’s complement, with one even subset and one odd subset:
Now, what about if $|T|$ is even? Matching $U$ with $T \setminus U$ doesn’t work anymore, since they have the same parity: both are even or both are odd. But we can use a cute trick to fix things. Pick some particular element $t \in T$ (note, this is the place in the proof where we depend on the assumption that $T$ is nonempty!). For example, we might pick the smallest element if they can be ordered, but it really doesn’t matter which one we pick. Previously, when complementing a subset $U$, we “flipped” the membership of each element of $T$: if $x$ is an element of $U$, then $x$ is not an element of $T \setminus U$, and vice versa. Now we do the same, except that we don’t flip $t$: if $t \in U$ we keep it; if $t \notin U$ we leave it out. But we still flip all the other elements. Here’s an example when $|T|$ is even:
Note how we match all the sets that don’t have the smallest (blue) element with each other, and likewise all the sets that do have the smallest element are matched with each other. But within each pair, all the elements other than the smallest one are flipped.
You can convince yourself that this does indeed match up all the subsets into pairs. Moreover, each pair has one even and one odd subset, because the sizes of the paired sets add up to an odd number: $|T| - 1$ if they don’t contain $t$, or $|T| + 1$ if they do. QED.
And that’s it for the first proof! We’ve shown that any nonempty finite set has the same number of even and odd subsets, because we can always match them up perfectly.
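The matching described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the elements of the set can be ordered so that we can pick the smallest one as the fixed element; the names `subsets`, `partner`, and `pairing_is_valid` are hypothetical.

```python
from itertools import combinations

def subsets(T):
    """All subsets of T, as frozensets."""
    items = sorted(T)
    return [frozenset(c) for k in range(len(items) + 1)
            for c in combinations(items, k)]

def partner(T, U):
    """The subset matched with U in the pairing argument (T must be nonempty)."""
    T, U = frozenset(T), frozenset(U)
    if len(T) % 2 == 1:
        return T - U                      # odd case: plain complement
    t = min(T)                            # even case: fix one element t
    return ((T - {t}) - U) | (U & {t})    # flip everything except t

def pairing_is_valid(T):
    """Check that the matching really pairs even subsets with odd ones."""
    return all(
        partner(T, partner(T, U)) == U                  # matching is symmetric
        and partner(T, U) != U                          # no subset is its own partner
        and (len(U) + len(partner(T, U))) % 2 == 1      # one even, one odd
        for U in subsets(T)
    )

print(all(pairing_is_valid(range(1, n + 1)) for n in range(1, 7)))  # True
```

For instance, with the four-element set `{1, 2, 3, 4}` and fixed element `1`, the subset `{1, 3}` is matched with `{1, 2, 4}`.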
And now for a small tangent. Remember that a set of size $n$ has $\binom{n}{k}$ subsets of size $k$, where $\binom{n}{k}$ is a binomial coefficient, or an entry in Pascal’s Triangle. Since we now know there are the same number of even and odd subsets, if we add $\binom{n}{k}$ when $k$ is even and subtract $\binom{n}{k}$ when $k$ is odd, we must get zero:
$$\sum_{k=0}^{n} (-1)^k \binom{n}{k} = 0 \quad \text{for } n \geq 1.$$
That is, the alternating sum of each row of Pascal’s Triangle (besides the zeroth row) is 0:
$$1 - 1 = 0, \qquad 1 - 2 + 1 = 0, \qquad 1 - 3 + 3 - 1 = 0, \qquad 1 - 4 + 6 - 4 + 1 = 0,$$
and so on. As you can see, this is already really obvious for odd rows, for example, $1 - 5 + 10 - 10 + 5 - 1$, since equal numbers cancel out. This is why in some sense the proof was so easy for odd $n$: we just took the complement of each $U$. However, for even rows, such as $1 - 4 + 6 - 4 + 1$, it is not as obvious: why should we have $1 + 6 + 1 = 4 + 4$? So the proof for even $n$ required a bit more cleverness.
In any case, this discussion of binomial coefficients brings us to:
The second proof was hinted at in a comment by decourse. Recall the Binomial Theorem:
$$(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^{n-k} y^k.$$
Set $x = 1$ and $y = -1$. Then the Binomial Theorem says that, for $n \geq 1$,
$$0 = (1 - 1)^n = \sum_{k=0}^{n} (-1)^k \binom{n}{k}.$$
And voila! The fact that this sum is zero is the same as saying that there are the same number of even and odd subsets. So we can see that the subset parity lemma is really just a special case of the Binomial Theorem. QED.
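A quick way to see the alternating row sums in action is the following Python sketch, which uses the standard library’s `math.comb` for binomial coefficients. The function name `alternating_row_sum` is just an illustrative choice.

```python
from math import comb

def alternating_row_sum(n):
    """Alternating sum of row n of Pascal's Triangle."""
    return sum((-1) ** k * comb(n, k) for k in range(n + 1))

print([alternating_row_sum(n) for n in range(6)])  # [1, 0, 0, 0, 0, 0]
```

Note the lone $1$ for row zero: the empty set has exactly one subset (itself), which is even-sized, so the lemma really does need the word “nonempty”.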
Which proof is better? I like them both. The second proof is really short, and would be your best bet to quickly communicate the idea of the proof to someone who already knows the Binomial Theorem. On the other hand, although it’s longer, the first proof is really cute and feels simpler somehow—it doesn’t depend on the Binomial Theorem and doesn’t even use any actual numbers or equations.
So, where are we? We’ve proved the subset parity lemma. But this means that
$$\sum_{d \mid n} \mu(d) = \begin{cases} 1 & n = 1 \\ 0 & n > 1, \end{cases}$$
since we saw that this amounts to looking at subsets of the set of unique prime factors of $n$, and adding $1$ for even subsets and $-1$ for odd subsets.
This proves that $\mu(n) = S(n)$ (the sum of primitive $n$th roots of unity) since $S(n)$ satisfies the same equation. But it’s also a useful fact in its own right, and will play an important role in our continuing exploration of the Möbius function $\mu$.
Previously, we also considered the sum $S(n)$ of all the primitive $n$th roots of unity. Today, we will begin proving they are the same: $S(n) = \mu(n)$ for all $n \geq 1$! This is surprising since it is not immediately obvious what the two functions have to do with each other: one is about sums of complex numbers, and the other is about prime factorization. On the other hand, perhaps it is not all that surprising after all: we’ve already seen that primitive roots of unity have a lot to do with divisibility and prime factorization. In any case, this connection is the key that cracks open a lot of really cool mathematics.
So, how will we do it? Recall that we proved $S(n)$ satisfies this equation:
$$\sum_{d \mid n} S(d) = \begin{cases} 1 & n = 1 \\ 0 & n > 1. \end{cases}$$
In the case of $S(n)$ this was fairly “obvious”: we proved it essentially by staring at some pictures. But we will show that $\mu(n)$ also satisfies this equation, that is,
$$\sum_{d \mid n} \mu(d) = \begin{cases} 1 & n = 1 \\ 0 & n > 1. \end{cases}$$
(In contrast to $S(n)$, this is not so obvious! But it’s true, and we will prove it.) Given a starting value for $n = 1$, we have seen that this equation uniquely determines all values of a function for $n > 1$: for example, given $S(1)$, we can use the equation to uniquely determine $S(2)$; given $S(1)$ and $S(2)$, we can then uniquely determine $S(4)$; and so on. So, if $S(n)$ and $\mu(n)$ both satisfy this equation, they must be the same for all $n$ as long as they have the same starting value, and they do, since $S(1) = \mu(1) = 1$.
I have to say that this is one of my favorite proofs—it rests on a beautiful little combinatorial argument, and, as we’ll see, it gives some insight into the Principle of Inclusion-Exclusion as well. We’ll start the proof today and then see the really beautiful part in my next post. So, without further ado:
Theorem.
$$\sum_{d \mid n} \mu(d) = \begin{cases} 1 & n = 1 \\ 0 & n > 1. \end{cases}$$
Corollary. $\mu(n) = S(n)$ for all $n \geq 1$.
Proof. Since $\mu(1) = 1$, we can already see that $\sum_{d \mid 1} \mu(d) = \mu(1) = 1$ as desired. So now consider $\sum_{d \mid n} \mu(d)$ for some $n > 1$; we wish to show this sum is zero.
Since $\mu(d) = 0$ for values of $d$ which are divisible by a perfect square (greater than $1$), those values of $d$ contribute nothing to the sum. We can therefore concentrate on only those $d$ which are not divisible by a perfect square (we call such numbers squarefree). Each squarefree $d$ must be a product of distinct prime factors, since if any prime factor was repeated, then $d$ would be divisible by a perfect square. Since $d \mid n$, the prime factorization of each squarefree $d$ is therefore a subset of the set of distinct prime factors of $n$, and conversely, each subset of the distinct prime factors of $n$ yields a distinct squarefree divisor $d$.
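For example, consider $n = 360 = 2^3 \cdot 3^2 \cdot 5$. Each squarefree divisor of $n$ can have at most one copy of each of $2$, $3$, and $5$. There are therefore $2^3 = 8$ squarefree divisors of $n$, one corresponding to each subset of $\{2, 3, 5\}$:
$$\begin{array}{lrr} \text{subset} & d & \mu(d) \\ \hline \varnothing & 1 & 1 \\ \{2\} & 2 & -1 \\ \{3\} & 3 & -1 \\ \{5\} & 5 & -1 \\ \{2, 3\} & 6 & 1 \\ \{2, 5\} & 10 & 1 \\ \{3, 5\} & 15 & 1 \\ \{2, 3, 5\} & 30 & -1 \end{array}$$
Adding up all the values of $\mu(d)$ listed in the right column, we can see that there are four $1$’s and four $-1$’s, which cancel to yield $0$ as claimed.
As another example, when $n$ is itself squarefree, all of the divisors of $n$ are squarefree. For instance, for $n = 6$ we have
$$\mu(1) + \mu(2) + \mu(3) + \mu(6) = 1 - 1 - 1 + 1 = 0.$$
Our goal is to show that this sum is always zero, and the only way for it to be zero is if there are always an equal number of $1$’s and $-1$’s. Given the definition of $\mu$, we know that the $1$’s come from even-sized subsets of the distinct prime factors of $n$, and the $-1$’s come from odd-sized subsets. So in other words, we need to show that there are the same number of even-sized subsets as there are odd-sized subsets.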
The fact that these are sets of distinct prime factors really makes no difference, since we are just counting them. What we really need, in essence, is the following lemma:
Subset parity lemma: any nonempty finite set has an equal number of even-sized and odd-sized subsets.
Can you see how to prove this? (And as an aside, do you see why it is false if we exclude the word “nonempty”? Do you see what this has to do with the $n = 1$ versus $n > 1$ cases?) My next post will explain a nice proof of the lemma and then conclude the proof that $\mu(n) = S(n)$!
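Last time, we defined $S(n)$ as the sum of all the primitive $n$th roots of unity. We noted that
$$\sum_{d \mid n} S(d) = \begin{cases} 1 & n = 1 \\ 0 & n > 1 \end{cases}$$
and used this to deduce values of $S(n)$. In principle we could carry this as far as we want to compute $S(n)$ for any $n$, but this doesn’t necessarily give us any insight into the nature of $S(n)$. For example, is $S(n)$ always either $-1$, $0$, or $1$? Is there a nicer way to characterize $S(n)$, and/or a quicker way to compute it?
It turns out that $S(n)$ is more than just an idle curiosity. In fact, it is famous enough that it has a name: it is known as the Möbius function, and it is usually written $\mu(n)$. There are indeed nicer ways to characterize and compute $\mu(n)$. Here’s how it is most commonly defined:
$$\mu(n) = \begin{cases} 0 & \text{if } n \text{ is divisible by the square of a prime,} \\ (-1)^k & \text{if } n \text{ is a product of } k \text{ distinct primes.} \end{cases}$$
For example:
$$\mu(1) = 1, \quad \mu(2) = -1, \quad \mu(4) = 0, \quad \mu(6) = 1, \quad \mu(12) = 0, \quad \mu(30) = -1.$$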
So far, this seems to match up with what we had already deduced about $S(n)$, but it is not at all obvious whether $S(n)$ and $\mu(n)$ are the same. Why should adding up a particular set of complex numbers have anything to do with the number of prime factors of $n$? In my next post, we’ll prove that they are in fact the same. The proof is a nice exercise in combinatorics and relates to Pascal’s Triangle. (Michael Paul Goldenberg hinted at such a proof in a comment on my previous post.)
As you might guess, $\mu(n)$ is deeply related to prime numbers and the Riemann zeta function. It turns out it is also related to the Principle of Inclusion-Exclusion. All this and more to come!
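The usual definition of the Möbius function translates almost word for word into code. Here is a minimal Python sketch; the helper `prime_factorization` is a hypothetical name and uses plain trial division, which is fine for small $n$ but not meant to be efficient.

```python
def prime_factorization(n):
    """Map each prime factor of n to its exponent (trial division)."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:                               # whatever is left is prime
        factors[n] = factors.get(n, 0) + 1
    return factors

def mobius(n):
    """0 if n has a repeated prime factor, else (-1)^(number of primes)."""
    factors = prime_factorization(n)
    if any(exponent > 1 for exponent in factors.values()):
        return 0
    return (-1) ** len(factors)

print([mobius(n) for n in range(1, 13)])
# [1, -1, -1, 0, -1, 1, -1, 0, 0, 1, -1, 0]
```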
It, and other pictures like it, express the fact that for a given $n$, if we take the primitive roots for each of the divisors of $n$, together they make up exactly the set of all $n$th roots of unity. The above picture is for the specific case of $n = 12$: the $12$th roots of unity (the dots on the bottom circle) are composed of the primitive roots for $1$, $2$, $3$, $4$, $6$, and $12$ (the dots on each of the top circles). I proved this in another post.
Of course, if two sets of complex numbers are the same, then their sums must also be the same. Let’s write $S(n)$ for the sum of all the primitive $n$th roots: in my previous post we worked out $S(n)$ for certain $n$ but weren’t sure how to compute it in other cases. Well, today that’s going to change!
We already know by symmetry that the sum of all the $n$th roots of unity is zero, except when $n = 1$ in which case the sum is $1$. Putting all of this together,
$$\sum_{d \mid n} S(d) = \begin{cases} 1 & n = 1 \\ 0 & n > 1. \end{cases}$$
That is, the sum of all $n$th roots of unity is the same as summing the primitive roots, $S(d)$, for each divisor $d$ of $n$. (The notation $d \mid n$ means $d$ evenly divides $n$, so the summation symbol with $d \mid n$ underneath means we are summing over all divisors $d$ of $n$.)
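So what have we gained? Well, we can use this equation “backwards” to compute values for $S(n)$! We know $S(1) = 1$. Since $S(1) + S(2) = 0$, we get $S(2) = -1$. Likewise $S(1) + S(3) = 0$ gives $S(3) = -1$, and $S(1) + S(2) + S(4) = 0$ gives $S(4) = 0$.
And so on. Since each value of $S(n)$ can be computed in this way once we know $S(d)$ for all the divisors $d$ of $n$ (which are smaller than $n$), we can continue filling in a table of values of $S(n)$ like this forever.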
For example, to fill in $S(12)$ we compute $S(1) + S(2) + S(3) + S(4) + S(6) = 1 - 1 - 1 + 0 + 1 = 0$ and hence $S(12) = 0$.
Do you notice any patterns? It is not too hard to see that $S(n)$ must always be an integer (why?), but so far it has always been either $-1$, $0$, or $1$; will it always be one of those three values? Before my next post you might like to try extending the table of values for $S(n)$ further and exploring these questions yourself!
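If you want to extend the table without doing the arithmetic by hand, the “backwards” use of the equation can be sketched in Python as follows. The function name `S_table` is an illustrative choice; it simply solves for each $S(n)$ using the already-computed values at the smaller divisors.

```python
def S_table(limit):
    """Fill in S(1..limit) using: the sum of S(d) over d | n is 1 if n = 1, else 0."""
    S = {}
    for n in range(1, limit + 1):
        target = 1 if n == 1 else 0
        # Subtract the contributions of the proper divisors, which are already known.
        S[n] = target - sum(S[d] for d in range(1, n) if n % d == 0)
    return S

table = S_table(12)
print([table[n] for n in range(1, 13)])
# [1, -1, -1, 0, -1, 1, -1, 0, 0, 1, -1, 0]
```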
Today I want to consider the question: what happens when we add up only the primitive $n$th roots of unity? Recall that a primitive $n$th root is one which is not also an $m$th root for some smaller $m$. As points on the unit circle, the primitive roots correspond to the spokes $k$ which are relatively prime to $n$.
In some cases we clearly still get zero, again because of symmetry. Shown below are four such values of $n$:
For $n = 1$ we of course get $1$ again:
For $n = 2$ the only primitive root is $-1$, so the sum is clearly $-1$. It’s not too hard to see that the sum of primitive roots for $n = 5$ will also be $-1$:
Why is that? Well, when $n = 5$, the only root that is not a primitive root is $1$, so if we subtract out $1$ from the sum of all roots, we are left with the sum of the remaining, primitive roots. But we know the sum of all roots is zero, and hence the sum of the remaining primitive roots must be $-1$. In fact, this is not specific to $5$; the same argument applies to any prime number. When $n$ is prime, every root is primitive except for $1$ itself, so in order for the sum of all the roots to be zero, the sum of all the primitive roots must be $-1$.
However, this still leaves other values of $n$ where it is not obvious what we will get as the sum of primitive roots. For example, here are three such values of $n$:
Let’s look at the first of these. The primitive roots have reflection symmetry across the $x$-axis, so their sum must lie somewhere on the $x$-axis (since their “up-down pulls” cancel out). Since they are leaning more in the positive $x$ direction, the sum will be bigger than zero. But how much bigger? Just thinking about the $x$ components, we are adding together irrational numbers involving cosines of fractions of a full turn, so maybe we just get some weird irrational sum? It seems plausible, but actually, we don’t: we instead get a sum of… exactly $1$! And the sums for the other two… are also exactly $1$. Something strange is going on.
But you don’t have to take my word for it! Next time we’ll put together what we know about the sum of all roots and how the set of $n$th roots is made up of primitive roots for the divisors of $n$ to compute sums of primitive roots.
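In the meantime, one way to check such sums numerically is the following Python sketch, which adds up the primitive $n$th roots directly as floating-point complex numbers. The particular values $n = 10$, $14$, and $15$ printed at the end are chosen here purely for illustration; the results are only accurate up to rounding error, so we round before printing.

```python
import cmath
from math import gcd

def primitive_root_sum(n):
    """Sum of exp(2*pi*i*k/n) over the k in 1..n that are relatively prime to n."""
    return sum(cmath.exp(2j * cmath.pi * k / n)
               for k in range(1, n + 1) if gcd(k, n) == 1)

for n in (10, 14, 15):
    total = primitive_root_sum(n)
    # The imaginary part should vanish (up to rounding) by reflection symmetry.
    print(n, round(total.real, 9), abs(total.imag) < 1e-9)
```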