The Riemann zeta function

Recall from my previous post that given a function f(n), we define \zeta_f, the Dirichlet generating function of f, by

\displaystyle \displaystyle \zeta_f(s) = \sum_{n \geq 1} \frac{f(n)}{n^s}.

We also proved that \zeta_f \zeta_g = \zeta_{f \ast g}: the product of Dirichlet generating functions is the Dirichlet generating function of the Dirichlet convolution. Now, consider taking f(n) = 1 in the above definition. We get

\displaystyle \displaystyle \zeta_{\mathbf{1}}(s) = \sum_{n \geq 1} \frac{1}{n^s}

which is also often written as just plain \zeta(s). This function is quite famous: it is the Riemann zeta function. The reason it is so famous is because it is the subject of a famous unproved conjecture, the Riemann hypothesis, which in turn is famous because it has been both difficult to prove—many mathematicians have been attacking it for a long time—and deeply related to many other areas of mathmatics. In particular it is deeply related to prime numbers. If you want to understand the Riemann hypothesis better, I highly recommend reading the (truly wonderful) Secrets of Creation Trilogy, especially the first two volumes, which explain it from first principles. (I have previously written about the Secrets of Creation trilogy on this blog: a review of Volume 1 is here, and here is my review of Volume 2). In this post I just want to help you understand a few cool things about the zeta function.

Remember that \mu \ast \mathbf{1} = \varepsilon, so we must have \zeta_\mu \zeta_{\mathbf{1}} = \zeta_\varepsilon. Also recall that \varepsilon(n) = 1 when n = 1 but it equals 0 otherwise, so in fact

\displaystyle \displaystyle \zeta_\varepsilon(s) = \sum_{n \geq 1} \frac{\varepsilon(n)}{n^s} = 1

since the only nonzero term is \varepsilon(1)/1^s = 1/1 = 1. All together, then, we have

\displaystyle \zeta_\mu(s) \zeta(s) = 1

and hence

\displaystyle \displaystyle \frac{1}{\zeta(s)} = \zeta_\mu(s) = \sum_{n \geq 1} \frac{\mu(n)}{n^s}

For example, consider \zeta(2):

\displaystyle \zeta(2) = \frac{1}{1^2} + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2} + \dots

This converges to something, although a priori it is not obvious what. By writing a simple computer program, or by asking Wolfram Alpha, we can add up, say, 1000 terms and find that it is approximately 1.6439\dots. Apparently, the reciprocal of this number is given by

\displaystyle \frac{1}{\zeta(2)} = \frac{1}{1^2} + \frac{-1}{2^2} + \frac{-1}{3^2} + \frac{0}{4^2} + \dots

where each numerator is \mu(n). Again, we can use a computer to check that this sum is approximately 0.6079\dots: and sure enough, this is approximately 1/1.6439\dots!

It turns out that \zeta(2) = \sum_{n \geq 1} 1/n^2 converges to exactly \pi^2/6 (!!!)—hopefully I can write another blog post explaining that (to be honest at the moment I don’t know how to prove it). We also know that

\displaystyle \displaystyle \zeta(4) = \sum_{n \geq 1} \frac{1}{n^4} = \frac{\pi^4}{90}

and there is actually a formula giving \zeta(2n) for any even positive integer. \zeta(3) \approx 1.2020\dots is called Apéry’s constant, since Roger Apéry proved in 1978 that it is irrational; but we don’t know of any nice formula for it.

I will leave you with a few things to prove that you may find amusing:

  • \zeta(s)^2 = \zeta_\tau(s)
  • \zeta(s)\zeta(s-1) = \zeta_\sigma(s)

Recall that \tau(n) is the number of divisors of n, and \sigma(n) is the sum of divisors of n; to prove these you will want to reference some facts we proved about \tau and \sigma in terms of Dirichlet convolution.

Next time, we’ll see another way to relate the zeta function to prime numbers.

Posted in number theory | Tagged , , , , , , , | 2 Comments

Dirichlet generating functions

Suppose f(n) is a function defined for positive integers n \geq 1. Then we can define an infinite series \zeta_f(s) as follows:

\displaystyle \begin{array}{rcl}\zeta_f(s) &=& \displaystyle \frac{f(1)}{1^s} + \frac{f(2)}{2^s} + \frac{f(3)}{3^s} + \dots + \frac{f(n)}{n^s} + \dots \\[1em] &=& \displaystyle \sum_{n \geq 1} \frac{f(n)}{n^s} \end{array}

(This might look a bit strange, but bear with me!) For example, suppose f(n) = n+2 for all n. Then

\displaystyle \zeta_f(3) = \displaystyle \frac{1+2}{1^3} + \frac{2+2}{2^3} + \frac{3+2}{3^3} + \frac{4+2}{4^3} + \dots \approx 4.0490\dots

(Note that in this case, with s = 3, the infinite sum converges; but often we just use s as a formal placeholder and don’t particularly care about convergence. That is, \zeta_f is perfectly well defined as an infinite series without worrying about the value of s.)

In fact \zeta_f is called the Dirichlet generating function for f. Of course, this name should remind you of Dirichlet convolution—and it’s no coincidence; Dirichlet convolution and Dirichlet generating functions are closely related. Let’s see how.

Suppose we have two functions f(n) and g(n), and consider multiplying their Dirichlet generating functions (with plain old, regular multiplication):

\displaystyle \zeta_f(s) \zeta_g(s) = \displaystyle \left( \sum_{n \geq 1} \frac{f(n)}{n^s} \right) \left( \sum_{n \geq 1} \frac{g(n)}{n^s} \right)

We have a big (well, infinite) sum of things multiplied by another big sum. By distributivity, we’re going to get a big sum as a result, where each term of the resulting sum is the product of one thing chosen from the left-hand sum and one thing chosen from the right-hand sum. (This is just like FOIL, only way cooler and more infinite-r.) That is, the resulting sum is going to look something like this:

\displaystyle \displaystyle \zeta_f(s) \zeta_g(s) = \dots + \frac{f(a)}{a^s} \frac{g(b)}{b^s} + \dots = \sum_{a,b \geq 1} \frac{f(a)}{a^s} \frac{g(b)}{b^s}

with one term for every possible choice of a and b. The a^s b^s in the denominator is of course equal to (ab)^s, and we can collect up all the fractions with the same denominator: for each n, we will get a denominator of n^s in each case where we pick some a and b whose product is n. So we can reorganize the terms in the above sum, grouping together all the terms where the product of a and b is the same, and rewrite it like this (is this starting to look familiar…?):

\displaystyle \displaystyle \zeta_f(s) \zeta_g(s) = \sum_{n \geq 1} \sum_{ab = n} \frac{f(a) g(b)}{n^s}

Now we can factor out the 1/n^s (since it doesn’t depend on a or b), like so:

\displaystyle \displaystyle \zeta_f(s) \zeta_g(s) = \sum_{n \geq 1} \frac{1}{n^s} \sum_{ab = n} f(a) g(b)

But the inner sum is now just the definition of the Dirichlet convolution of f and g! So the whole thing becomes

\displaystyle \displaystyle \zeta_f(s) \zeta_g(s) = \sum_{n \geq 1} \frac{1}{n^s} \sum_{ab = n} f(a) g(b) = \sum_{n \geq 1} \frac{(f \ast g)(n)}{n^s}

And finally, we note that the thing on the right is itself the Dirichlet generating function for f \ast g. So in summary, we have shown that

\displaystyle \displaystyle \zeta_f(s) \zeta_g(s) = \zeta_{f \ast g}(s)

Neato! So in some sense, Dirichlet generating functions “turn Dirichlet convolution into regular multiplication”.

Posted in number theory | Tagged , , , , , , , | 3 Comments

More fun with Dirichlet convolution

I’m back after a bit of a hiatus for the holidays! Last time we saw how the principle of Möbius inversion arises from considering the \mu function from the point of view of Dirichlet convolution. Put simply, the Möbius function \mu is the inverse of \mathbf{1} with respect to Dirichlet convolution. As an example, we noted that \mathbf{1} \ast \varphi = \mathit{id} (that is, the sum of \varphi(d) over all divisors d of n is equal to n), and hence by Möbius inversion \varphi = \mu \ast \mathit{id}.

This is far from earth-shattering, but it’s fun to see how a number-theoretic function like \varphi arises in a simple formula involving Dirichlet convolution, and how Möbius inversion allows us to quickly derive a related but non-obvious fact. Let’s do a few more!

First, let’s consider \mathbf{1} \ast \mathbf{1}. We have

\displaystyle (\mathbf{1} \ast \mathbf{1})(n) = \sum_{d \mid n} \mathbf{1}(d) \mathbf{1}(n/d) = \sum_{d \mid n} 1,

which is just counting the number of divisors of n. This function is often denoted \tau. For example, 12 has six divisors (namely, 1, 2, 3, 4, 6, and 12), so \tau(12) = 6. Likewise \tau(7) = 2 (1 and 7), \tau(4) = 3 (1, 2, and 4), and so on.

The above equation can be restated as \mathbf{1} \ast \mathbf{1} = \tau, so by Möbius inversion, we immediately conclude that \mathbf{1} = \mu \ast \tau, that is,

\displaystyle 1 = \sum_{ab = n} \mu(a) \tau(b).

For example, we can check this for n = 12:

\displaystyle \begin{array}{l} \mu(1) \tau(12) + \mu(2) \tau(6) + \mu(3) \tau(4)\\[1em] + \mu(4) \tau(3) + \mu(6) \tau(2) + \mu(12) \tau(1) \\[1em] = 6 - 4 - 3 + 0 + 2 + 0 \\[1em] = 1 \end{array}


As another example, \mathbf{1} \ast \mathit{id} gives us \sum_{d \mid n} d, the sum of the divisors of n. This is usually denoted \sigma. For example, \sigma(12) = 1+2+3+4+6+12 = 28, \sigma(6) = 1+2+3+6 = 12, \sigma(7) = 1+7 = 8, and so on. Often we also define s(n) = \sigma(n) - n to be the sum of all the divisors of n other than n itself, i.e. the proper divisors. Perfect numbers like 6 and 28 are those for which n = s(n).

Again, since \mathbf{1} \ast \mathit{id} = \sigma, by Möbius inversion we immediately conclude \mathit{id} = \mu \ast \sigma; for example, again when n = 12, we have

\displaystyle \begin{array}{l} \mu(1) \sigma(12) + \mu(2) \sigma(6) + \mu(3) \sigma(4)\\[1em] + \mu(4) \sigma(3) + \mu(6) \sigma(2) + \mu(12) \sigma(1) \\[1em] = 28 - 12 - 7 + 0 + 3 + 0 \\[1em] = 12. \end{array}

Posted in number theory | Tagged , , , , , , | 1 Comment

Paper torus with Villarceau circles

I made another thing!

This is a torus, made from 24 crescent-shaped pieces of paper with slots cut into them so they interlock with each other. I followed these instructions on There is also a template with some ideas for nice variations here.

The idea of this model is to highlight Villarceau circles. Everyone knows how to find circular cross-sections of a torus, right? You can cut it horizontally (like cutting a bagel in half) in which case you get two concentric circles:

Or you can cut it vertically (like cutting a donut in half, if you wanted to share it with someone else), in which case you get two separate circles:

But it turns out there is yet another way to cut a torus to get circular cross-sections, by cutting it with a diagonal plane:

Note that the plane is tangent to the torus at the two places where the circles intersect. These circles are called Villarceau circles, named after French mathematician Yvon Villarceau, who published a paper about them in 1848. The paper model highlights these circles: each circle is composed of edges from two differently-colored crescents; the points of the crescents come together at the tangent points where two Villarceau circles intersect.

If you want to try making this model yourself, be warned: it is quite difficult! The five-star difficulty rating is no joke. The time from when I first printed out the templates for cutting until the time I finally finished it was several months. Partly that was because I only worked off and on and often went weeks without touching it. But partly it was also because I gave up in frustration a few times. The first time I attempted to assemble all the pieces it ended up completely falling apart. But the second time went much better (using what I had learned from my first failed attempt).

Posted in geometry | Tagged , , , , , | Leave a comment

Möbius inversion

In my last post we saw that \mu \ast \mathbf{1} = \varepsilon, that is, the Möbius function \mu is the inverse of \mathbf{1} with respect to Dirichlet convolution. This directly leads to an interesting principle called Möbius inversion.

Möbius inversion. Suppose g(n) is defined for n \geq 1 as the sum of another function f over all the divisors of n:

\displaystyle g(n) = \sum_{d \mid n} f(d).

Then we can “invert” this relationship to express f as a sum over g:

\displaystyle f(n) = \sum_{d \mid n} \mu(d) g(n/d).

The above is the “traditional” way of stating the principle, but I like to write it in this equivalent, more symmetric way:

\displaystyle f(n) = \sum_{ab = n} \mu(a) g(b)

where, as usual, the sum is over all factorizations of n into a product ab. This is the same as the previous equation, because ab = n if and only if a \mid n, in which case b = n/a. But now it becomes more clear that we are dealing with a Dirichlet convolution. Can you see how to prove this?

Proof. If we think in terms of Dirichlet convolution, the proof is almost trivial. Note first that saying g(n) = \sum_{d \mid n} f(n) is the same as saying g = \mathbf{1} \ast f (recall that taking the Dirichlet convolution with \mathbf{1} is the same as summing over all divisors). We can then take the Dirichlet convolution of both sides with \mu:

\displaystyle \begin{array}{rcl} \mu \ast g &=& \mu \ast (\mathbf{1} \ast f) \\ &=& (\mu \ast \mathbf{1}) \ast f \\ &=& \varepsilon \ast f \\ &=& f.\end{array}

In other words, Möbius inversion is really just saying that if f = \mathbf{1} \ast g, then \mu \ast f = g, because \mu is the inverse of \mathbf{1}.

Let’s try an example! Recall that \varphi denotes the Euler totient function, which counts how many positive integers less than or equal to n are relatively prime to n. For example, \varphi(5) = 4 since all the positive integers less than or equal to 5 (other than 5 itself) are relatively prime to it, and more generally \varphi(p) = p-1 for any prime p. As another example, \varphi(12) = 4 since the only positive integers between 1 and 11 which are relative prime to 12 are 1, 5, 7, and 11. In a previous post we considered a visual proof that

\displaystyle \sum_{d \mid n} \varphi(d) = n:

For example, as seen in the picture above, 12 = \varphi(1) + \varphi(2) + \varphi(3) + \varphi(4) + \varphi(6) + \varphi(12) = 1 + 1 + 2 + 2 + 2 + 4.

Now, this is in the perfect format to apply Möbius inversion: the sum over all divisors of the function g(n) = \varphi(n) is equal to the function f(n) = n. So we conclude that

\displaystyle \varphi(n) = \sum_{d \mid n} \mu(d) (n/d)

or, factoring out n,

\displaystyle \varphi(n) = n \sum_{d \mid n} \frac{\mu(d)}{d}

which is interesting since it expresses \varphi(n) as a fraction of n.

Let’s use this to compute \varphi(360). We have

\displaystyle \varphi(360) = \sum_{d \mid 360} \mu(d) (360/d).

The prime factorization of 360 is 2^3 \cdot 3^2 \cdot 5. We know that \mu(d) will be 0 for any divisor with any repeated factors, so those all cancel out; we only need to consider divisors which correspond to subsets of \{2,3,5\}. (n = 360 is the same example we used in our proof of the key property of the Möbius function.) So

\displaystyle \begin{array}{rcl} \varphi(360) &=& \mu(1) \cdot 360 \\ &+& \mu(2) \cdot 180 \\ &+& \mu(3) \cdot 120 \\ &+& \mu(5) \cdot 72 \\ &+& \mu(2\cdot 3) \cdot 60 \\ &+& \mu(2 \cdot 5) \cdot 36 \\ &+& \mu(3 \cdot 5) \cdot 24 \\ &+& \mu(2 \cdot 3 \cdot 5) \cdot 12 \end{array}

Remembering that \mu yields 1 for an even number of prime factors and -1 for an odd number, we can evaluate this as

\displaystyle \varphi(360) = 360 - 180 - 120 - 72 + 60 + 36 + 24 - 12 = 96.

(and we can use e.g. a computer to verify that this is correct). Note how this works: we start with 360 and then subtract out all the numbers divisible by 2 (all 180 of them), as well as those divisible by 3 (120) and 5 (72). But now we have subtracted some numbers multiple times, e.g. those which are divisible by both 2 and 3. So we add back in the numbers which are divisible by 2 \cdot 3 (60 of them), by 2 \cdot 5 (36), and by 3 \cdot 5 (24). Finally, we’ve now overcounted numbers which are divisible by all three, so we subtract off the 12 numbers which are divisible by 2 \cdot 3 \cdot 5. Egad! This seems really similar to PIE! Indeed, this is no accident: it turns out that the Principle of Inclusion-Exclusion is really just another kind of Möbius inversion. But that’s a subject for another post!

Posted in combinatorics, proof | Tagged , , , , | 3 Comments

Dirichlet convolution and the Möbius function

Recall from last time that the Dirichlet convolution of two functions f and g is written f \ast g and defined by:

\displaystyle (f \ast g)(n) = \sum_{ab = n} f(a) g(b)

where the sum is taken over all possible factorizations of n into a product ab of positive integers. Last time we saw that \ast is commutative and associative, and has \varepsilon(n) as an identity, where \varepsilon is the function which produces 1 for an input of 1 and 0 for all other inputs.

So what does the Möbius function have to do with this? Let’s start by considering a different function:

\displaystyle \mathbf{1}(n) = 1

is the function which ignores its input and always returns 1. Of course this is not the same thing as \varepsilon, and despite its name \mathbf{1} is indeed not an identity for Dirichlet convolution. That is, if we take some function f and find its Dirichlet convolution with \mathbf{1}, we don’t get f again—but we do get something interesting:

\displaystyle (\mathbf{1} \ast f)(n) = \sum_{ab=n} \mathbf{1}(a) f(b) = \sum_{ab=n} f(b) = \sum_{d \mid n} f(d)

The first step is the definition of \ast; the second step is the definition of \mathbf{1}; and the last step just notes that the sum of all f(b) where ab = n is the same as taking the sum of f(d) for all divisors of n.

That is, \mathbf{1} \ast f is the function which for a given n outputs not just f(n) but the sum of f over all divisors of n.

We’re now ready to see how \mu enters the picture!

Claim: \mathbf{1} \ast \mu = \varepsilon.

That is, \mu is the inverse of \mathbf{1} with respect to Dirichlet convolution. In my next post we’ll see some cool implications of this; for now, let’s just prove it.

Proof. Here’s the proof. Ready?

\displaystyle (\mathbf{1} \ast \mu)(n) = \sum_{d \mid n} \mu(d) = \varepsilon(n).

Actually, there was hardly anything left to prove! The first equality is because of what we just showed about taking the Dirichlet convolution of \mathbf{1} with something else. And the second equality is a property of \mu we just finished proving in some previous posts.

Posted in combinatorics, proof | Tagged , , , | 1 Comment

Dirichlet convolution

Let f and g be two functions defined on the positive integers. Then the Dirichlet convolution of f and g, written f \ast g, is another function on the positive integers, defined as follows:

\displaystyle (f \ast g)(n) = \sum_{ab = n} f(a) g(b)

The sum is taken over all possible factorizations of n into a product ab of positive integers. For example, suppose f(n) = n and g(n) = n^2. Then

\displaystyle \begin{array}{rcl}(f \ast g)(6) &=& \displaystyle \sum_{ab = 6} f(a) g(b) \\[1.5em] &=& f(1)g(6) + f(2)g(3) + f(3)g(2) + f(6)g(1) \\[1em] &=& 1 \cdot 6^2 + 2 \cdot 3^2 + 3 \cdot 2^2 + 6 \cdot 1^2 \\[1em] &=& 36 + 18 + 12 + 6 = 72.\end{array}

At this point this may seem somewhat arbitrary, but over the next few posts we’ll see that it’s not arbitrary at all—this operation is deeply connected with the Möbius function \mu and hence with primes and factorization.

So what properties does \ast have?

  • It’s commutative, that is, f \ast g = g \ast f. This follows directly from the commutativity of multiplication:

    \displaystyle (f \ast g)(n) = \sum_{ab = n} f(a) g(b) = \sum_{ba = n} g(b) f(a) = (g \ast f)(n).

  • It’s also associative, that is, f \ast (g \ast h) = (f \ast g) \ast  h. This also follows from the associativity of multiplication, and distributivity of multiplication over addition. The proof is kind of fiddly but not really difficult:

    \displaystyle \begin{array}{rcl} (f \ast (g \ast h))(n) &=& \displaystyle \sum_{ab = n} f(a) (g \ast h)(b) \\[1.5em] &=& \displaystyle \sum_{ab = n} f(a) \left( \sum_{cd = b} g(c) h(d) \right) \\[1.5em] &=& \displaystyle \sum_{ab = n} \sum_{cd = b} f(a) \left( g(c) h(d) \right) \\[1.5em] &=& \displaystyle \sum_{a(cd) = n} f(a) (g(c) h(d)) \\[1.5em] &=& \displaystyle \sum_{(ac)d = n} (f(a) g(c)) h(d) \\[1.5em] &=& \displaystyle \sum_{ed = n} \sum_{ac=e} (f(a) g(c)) h(d) \\[1.5em] &=& \displaystyle \sum_{ed = n} \left(\sum_{ac=e} f(a)g(c)\right) h(d) \\[1.5em] &=& \displaystyle \sum_{ed = n} (f \ast g)(e) h(d) \\[1.5em] &=& ((f \ast g) \ast h)(n) \end{array}

  • Define the function \varepsilon as follows:

    \displaystyle \varepsilon(n) = \begin{cases} 1 & \text{if } n=1 \\ 0 & \text{otherwise.} \end{cases}

    Then I claim that \varepsilon is an identity element for \ast, that is, for any function f we have f \ast \varepsilon = \varepsilon \ast f = f. By definition,

    \displaystyle \displaystyle (f \ast \varepsilon)(n) = \sum_{ab = n}f(a) \varepsilon(b)

    But \varepsilon(b) = 0 for every b other than 1, which cancels out most of the terms of the sum. The only term that does not become zero is f(n) \varepsilon(1) = f(n). So the right-hand sum reduces to just f(n), that is, f \ast \varepsilon = f. Since we already know \ast is commutative this proves that \varepsilon \ast f = f too.

So, to sum up1, we have defined the Dirichlet convolution \ast, and shown that it is associative, commutative, and has \varepsilon as an identity element. That is, to use some fancy math words, we have shown that (\ast, \varepsilon) is a commutative monoid over the set of functions defined on the positive integers.2 Next time we will see what this has to do with the Möbius function \mu.

  1. Haha.

  2. Technically we should also specify the codomain of the functions, but it doesn’t matter much: typically it is taken to be the complex numbers, but really anything with a commutative, associative multiplication that distributes over addition will do.

Posted in combinatorics, proof | Tagged , , , | 8 Comments