PIE: proof by counting

Recall the setup: we have a universal set U and a collection of subsets A_1, A_2, A_3, and so on, up to A_n. PIE claims that we can compute the number of elements of U that are in none of the A_i (that is, |U| - |A_1 \cup A_2 \cup \dots \cup A_n|) if we take the sizes of every possible intersection of some of the sets (including none of them, which yields U itself), and alternately add (if we intersected an even number of sets) or subtract (if odd). That is, using formal notation,

\displaystyle \left| |U| - \bigcup_{i = 1}^n A_i \right| = \sum_{S \subseteq \{1, \dots, n\}} (-1)^{|S|} \left|\bigcap_{i \in S} A_i\right|

Last time, we proved this using some algebra. This time, we’re going to prove it by counting. Consider a particular (arbitrary) element x \in U. We want to see how many times x gets counted: we count 1 for each intersection containing x that gets added, and -1 for each intersection containing x that gets subtracted. In the end, x should end up being counted exactly once if it’s not in any of the A_i, and exactly zero times otherwise.

So, suppose x is in exactly k of the sets A_i. Let’s consider intersections of different numbers of sets and see how many times x gets counted in each.

  • There is only one way to intersect 0 of the sets, which gives U itself. x is definitely in there, so including |U| means x gets counted once.

  • “Intersecting one set” sounds nonsensical, but really it just means we are considering the sets A_1, A_2, etc. by themselves; these all get subtracted. By assumption, x is in exactly k of them, so it gets counted -k times.

  • There are a bunch of different ways to intersect two sets, A_i \cap A_j. How many of them contain x? Of course x is in A_i \cap A_j exactly when it is in both A_i and A_j, so the number of two-set intersections containing x is the number of different ways to pick two out of the k sets containing x, that is, \binom{k}{2}. These all get added, so x is counted \binom{k}{2} times.

  • Likewise with intersections of three sets: the only way for x to end up in an intersection of three sets is if x is contained in all of them, and there are \binom{k}{3} ways to choose three out of the k sets containing x. These get subtracted.

In general, there are \binom{k}{i} different intersections of exactly i sets which contain x, up through i = k. So, all in all, x gets counted a grand total of

\displaystyle\binom{k}{0} - \binom{k}{1} + \binom{k}{2} - \binom{k}{3} + \dots \pm \binom{k}{k}

times. Now, when k = 0, this is just \binom{0}{0} = 1, which makes sense: if x is in none of the A_i, then it gets counted exactly once (as a part of U itself), just as it should be. Otherwise, we have an alternating sum of successive binomial coefficients—that is, we add up a row of Pascal’s triangle with alternating signs. Such an alternating sum of binomial coefficients is always zero, which I proved in another blog post. And that’s also as it should be: if x is in at least one of the A_i then we ultimately want it to be counted zero times.

Posted in combinatorics, pattern, proof | Tagged , , , , | Leave a comment

PIE: proof by algebra

In my previous post I stated a very formal, general form of the Principle of Inclusion-Exclusion, or PIE.1 In this post I am going to outline one proof of PIE. I’m not going to give a completely formal proof, because the notation gets quite hairy (imagine doing algebraic manipulations on the formulas in the previous post)! But I hope to give enough of an intuition that when I say “and so on”, intrepid readers with good algebra skills should be able to complete a formal algebraic proof.

Set algebra

For the proof, we will first need a few algebraic facts about set operations:

  1. Set intersection distributes over union, that is,

    (A \cup B) \cap C = (A \cap C) \cup (B \cap C).

    You should be able to convince yourself of this by staring at a picture:

    Also, if you think of union as being like addition and intersection as being like multiplication, then this says (A +  B)C = AC + BC, which makes it easy to remember. (Although unlike the case of addition and multiplication, union also distributes over intersection, that is, (A \cap B) \cup C = (A \cup C) \cap  (B \cup C)—you might like to try convincing yourself that this works. We won’t need this fact, though.)

  2. Set intersection has some nice properties:

    • It is idempotent, that is, A \cap A = A: if you intersect a set with itself you get the same set back.

    • It is commutative, that is, A \cap B = B \cap A: the order of sets being intersected doesn’t matter.

    • Finally, it is associative, that is, (A \cap B) \cap C = A \cap (B \cap C): if we have a sequence of intersections to do it doesn’t matter which one we start with. Because of this we can just write things like A \cap B \cap C \cap D without worrying about parentheses.

    Later in the proof we will use these properties together to argue, for example, that A \cap C \cap B \cap C = A \cap B \cap C: associativity means it doesn’t matter where we put parentheses; commutativity means we can swap the C and B, and then idempotence says that the two adjacent copies of C can be replaced by one copy.

There are lots of other nice properties, too—for example, union has similar properties to intersection—but these are the only ones we will need for the proof below.

Finally, to save space and make things a bit easier to read, in the rest of the post I will sometimes omit the \cap symbol and write just AB instead of A \cap B. For example, using this notation the fact that intersection distributes over union can be written

(A \cup B)C = AC \cup BC.

The proof

Onward to the proof! We’ll prove the alternative presentation of PIE I introduced in my previous post, which tells us how to compute the size of a set like A \cup B \cup C \cup \dots. The proof is by induction on the number of sets being unioned.

First, for n = 1, PIE just says that |A| = |A|. Done!

Well, actually, to get things off the ground we really need to treat n=2 as a base case as well. In this case, PIE says that

|A \cup B| = |A| + |B| - |AB|

(remember that AB means A \cap B). We’ll call this equation PIE2. We can prove it by staring at another picture:

This is pretty clear: adding the sizes of A and B double-counts their overlap, so we have to subtract it once.

Now is where it gets more interesting. Consider the case with three sets, |A \cup B \cup C|. The trick is to start by writing A \cup B \cup C as (A \cup B) \cup C and then to apply PIE2, using A \cup B as the first set and C as the second. In the proof below I’ve written the justification for each step in curly brackets.

\begin{array}{rcl}|A \cup B \cup C| &=& |(A \cup B) \cup C| \\[1em] &=& \quad\text{\{ PIE2 \}} \\[1em] && |A \cup B| + |C| - |(A \cup B)C| \\[1em] &=& \quad\text{\{ intersection distributes over union \}} \\[1em] && |A \cup B| + |C| - |AC \cup BC| \\[1em] &=& \quad\text{\{ PIE2, twice \}} \\[1em] && (|A| + |B| - |AB|) + |C| - (|AC| + |BC| - |ACBC|) \\[1em] &=& \quad\text{\{ intersection properties, rearrange \}} \\[1em] && |A| + |B| + |C| - |AB| - |AC| - |BC| + |ABC| \end{array}

Notice in the last step how we turned ACBC into ABC—recall that the properties of intersection let us reorder the sets in an intersection like this and get rid of duplicates.

And there we go, we have derived PIE for n = 3 just using PIE2 and the laws of set algebra! We’ll call this result PIE3. Now, what about the case for n = 4?

\begin{array}{rcl}|A \cup B \cup C \cup D| &=& |(A \cup B \cup C) \cup D| \\[1em] &=& \quad\text{\{ PIE2 \}} \\[1em] && |A \cup B \cup C| + |D| - |(A \cup B \cup C)D| \\[1em] &=& \quad\text{\{ intersection distributes over union \}} \\[1em] && |A \cup B \cup C| + |D| - |AD \cup BD \cup CD| \\[1em] &=& \quad\text{\{ PIE3, twice \}} \\[1em] \dots \end{array}

There is a lot of algebra to do after applying PIE3 twice, but I think it’s actually better to not do it and instead focus on the big picture. We already know what |A \cup B \cup C| is. We get every combination of intersections among A, B, and C, added or subtracted according to whether the number is odd or even. But what about |AD \cup BD \cup CD|? Notice it’s the same as |A \cup B \cup C| but where each set has been intersected with D. We’re going to end up with every possible intersection between AD, BD, and CD, but because we can cancel out duplicate copies of D, this is the same as taking every possible intersection between A, B, and C, and then intersecting each with one copy of D. Also note that because we’re subtracting |AD \cup BD \cup CD|, the signs will be flipped relative to what we got with |A \cup B \cup C|—which makes sense since each term will have one extra set in it (namely, D) which will turn odd intersections into even and vice versa. In other words, at a very high level we will end up with

(all combinations of A,B,C) + D – (all combinations of A,B,C with a D added)

Notice how every possible combination of A, B, C, and D other than the empty intersection shows up exactly once, with the correct sign: every combination that doesn’t include D shows up in the first part (with the correct sign); then |D| is added by itself; then every combination that does include D and at least one other set shows up in the last part (with sign flipped to reflect the addition of one extra set).

In general, you can see how this is going to work. For each n we can prove PIE(n+1), that is, the version of PIE for a union of n+1 sets, by splitting off a single set X, using PIE2 like so:

|(A_1 \cup \dots \cup A_n) \cup X| = |A_1 \cup \dots \cup A_n| + |X| - |(A_1 \cup \dots \cup A_n)X|

and then using PIEn twice, once yielding all combinations without X, and once yielding all combinations with X. As I said, the notation to write that out formally (without any ellipses!) gets a bit hairy, but hopefully the idea makes sense!

Next time we’ll see a completely different proof of PIE, that relies on counting rather than algebra.

  1. It turns out there are even more general forms—for example see Callan McGill’s comment on my previous post—perhaps a later post!

Posted in combinatorics, induction, pattern, proof | Tagged , , , , , , | 1 Comment

Formal PIE

I’ve been talking informally about the Principle of Inclusion-Exclusion but I realized it would be useful to state it more formally before proceeding to some proofs. The only problem is that a fully formal statement of PIE has a lot of details and uses a lot of notation—so I’ll state it formally and then spend some time going through and explaining the parts carefully.

Suppose we have a collection of n finite sets A_1, A_2, A_3, and so on up to A_n, which are all subsets of a finite set U. We think of U as the “universe”, i.e. the entire population of things we are considering, and all the A_i are sets which “live in” the universe. Each set A_i typically represents a subpopulation with a particular property. For example, U might be the set of all people; then A_1 might be the set of people named Brent, A_2 might be the set of people who own a dog, and so on. Or as another example, U might be the set of natural numbers up to one million, and then A_1 might be the even numbers up to one million, A_2 might be the prime numbers less than one million, A_3 might be the set of all the ages I’ve ever been in my life, and so on.

The diagram below illustrates the general situation for n=3. The big green rectangle is the universe set U; the blue circles represent the three subsets A_1, A_2, and A_3, which in general may overlap, as in the diagram (though they don’t have to).

As I’ve presented it so far, PIE then tells us how to compute the number of elements of the universe which are not in any of the A_i, represented by the green shaded area in the diagram above. Put another way, this is the number of elements of the universe which have none of the properties (e.g. people not named Brent who don’t own a dog; natural numbers up to one million which aren’t even or prime or an age I’ve ever been; etc.).

PIE, formally

Without further ado, here is PIE:

\displaystyle |U| - \left| \bigcup_{i = 1}^n A_i \right| = \sum_{S \subseteq \{1 \dots n\}} (-1)^{|S|} \left|\bigcap_{i \in S} A_i\right|

Let’s go through this piece by piece. On the left-hand side, the big union symbol works just like a summation symbol, i.e. big-sigma notation: so \displaystyle \bigcup_{i=1}^n A_i means A_1 \cup A_2 \cup \dots \cup A_n, the union of all the A_i. Unioning the sets throws out duplicates, so the union represents all the elements of the universe which are in one or more of the A_i (i.e. the blue area in the diagram above). Subtracting this number from the size of U then gives the number of elements which aren’t in any of the A_i.

The right-hand side is a bit more complicated. First, \displaystyle \sum_{S \subseteq \{1 \dots n\}} means we take a sum over every subset of \{1 \dots n\}. That is, we will get one number for every possible subset of \{1 \dots n\} (all 2^n of them), and we add all these numbers up. Now, for each subset S, the big intersection symbol \displaystyle \bigcap_{i \in S} A_i means to take the intersection of all the A_i represented by S. For example, if S = \{2, 3, 7, 9\}, then we get the intersection A_2 \cap A_3 \cap A_7 \cap A_9. We find the size of this intersection and make it either positive or negative: raising -1 to a power keeps flipping back and forth between 1 (if the power is even) and -1 (if the power is odd). So (-1)^{|S|} means we should add the term if the number of A_i in the intersection is even, and subtract if the number is odd.

There is one subtle point to make, which I also mentioned in my previous post. All subsets of \{1 \dots n\} includes the empty subset, S = \varnothing, and it’s not necessarily obvious what the intersection \displaystyle \bigcap_{i \in S} A_i means when S is the empty set. How are we supposed to take the intersection of zero sets? To make everything work, we have to interpret it as being equal to the entire universe U. However, this is actually sensible, since U is like the “identity” for intersection: U \cap A_i = A_i for each A_i. Put another way, intersecting always results in a smaller set than you started with, so when intersecting you always start with the biggest possible set (namely, the universe U) and then the result gets smaller and smaller as you intersect more sets. That is, we can interpret A_i \cap A_j \cap A_k \cap \dots as (((U \cap A_i) \cap A_j) \cap A_k) \cap \dots. So if you don’t intersect any A_i you’re just left with the starting set U. (This is exactly the same argument for why a union of zero sets should be the empty set, but that certainly feels more intuitive! Pop quiz: what do we get if we’re taking the product of a set of numbers and the set is empty?)

For some examples, see the previous post. You might like to try writing out PIE for n=4.

An alternative presentation

Since we just argued why the empty set results in a term of |U| on the right-hand side (note it is positive since (-1)^{|\varnothing|} = (-1)^0 = 1), and there is also a positive |U| term on the left-hand side, we can subtract |U| from both sides to cancel them out, leaving this:

\displaystyle - \left| \bigcup_{i = 1}^n A_i \right| = \sum_{\stackrel{S \subseteq \{1 \dots n\}}{S \neq \varnothing}} (-1)^{|S|} \left|\bigcap_{i \in S} A_i\right|

(Notice how we got rid of the |U| term on the right-hand side just by specifying that the summation should not use S = \varnothing.)

If we finally negate both sides, we get an alternative form of PIE which is actually the way it is typically presented. We can negate every term on the right-hand side by simply adding 1 to the power of (-1), so any terms which used to be positive are now negative and vice versa.

\displaystyle \left| \bigcup_{i = 1}^n A_i \right| = \sum_{\stackrel{S \subseteq \{1 \dots n\}}{S \neq \varnothing}} (-1)^{|S|+1} \left|\bigcap_{i \in S} A_i\right|

For example, when n = 3 this says

|A_1 \cup A_2 \cup A_3| = |A_1| + |A_2| + |A_3| - |A_1 \cap A_2| - |A_1 \cap A_3| - |A_2 \cap A_3| + |A_1 \cap A_2 \cap A_3|.

This alternative presentation is nice because it tells us directly about the size of the union of all the A_i, i.e. the number of things in at least one of the sets (the blue region in the diagram), rather than the number of things in none of the sets (the green region). It seems like we would typically care more about the blue region; talking about the green region seems roundabout. However, as you can see, the right-hand side becomes a bit more awkward in this alternative presentation, because of the need to specify the extra condition S \neq \varnothing, and the +1 in the exponent of the (-1). Strange as it might seem at first, PIE is really more natural when phrased in terms of the number of things in none of the sets A_i.

Posted in combinatorics, pattern | Tagged , , , , , | 1 Comment

Have a piece of PIE

Commenter Stuart LoPresti gave a very nice analysis answering the questions at the end of my last post. And indeed, the punchline is that the answers are very similar to the answers to the post before that.


If we want to calculate the number of people who like neither anchovies nor books, we can begin by subtracting the people who like anchovies and the people who like books from the total population: |P| - |A| - |B|. However, anyone who likes both anchovies and books got subtracted twice, once because they like anchovies and once because they like books, so this number is too low. We have to add back in the people who like both so they only get subtracted once overall:

|P| - |A| - |B| + |A \cap B|

Likewise, to count the number of people who like neither anchovies nor books nor carpets, we start by subtracting people who like each, then adding back the people who like combinations of two things, and finally subtracting again the people who like all three:

|P| - |A| - |B| - |C| + |A \cap B| + |A \cap C| + |B \cap C| - |A \cap B \cap C|

(You might like to convince yourself that this counts everyone properly—these pictures might help.) In general, the claim is that if we have some collection of sets, to compute the number of elements that are not in any of the sets, we start by taking every possible subset of sets from the collection; find the size of the intersection of each such subset; and then either add or subtract depending on whether the number of sets in the intersection is respectively even or odd. (Notice that even taking the subset containing zero of the sets is included—the intersection of zero sets can reasonably be taken to mean the entire population P, and we add it because zero is even.) This is called the Principle of Inclusion-Exclusion, or PIE for short.

But why? Even for the case of three sets it is a bit tricky to keep track of how many times each possible group of people has been added or subtracted. It is clear that we need to do some kind of adding and subtracting—for example, |P| - |A| - |B| - |C| clearly subtracts some people too many times. But it seems rather magical that we need to add or subtract every possible intersection of some of the sets exactly once.

Proof by probability?

In my previous post we saw that a very similar pattern emerges if we compute the probabilities of being in various sets. For example, the probability of liking none of the three things is

1 - a - b - c + ab + ac + bc - abc

which is exactly analogous to the expression above for the number of people in the set, where multiplication has been replaced by set intersection. Can we turn this into a proof of PIE? Perhaps we can just multiply through by |P|? After all, the probability of being in set A is really just the ratio p_a = |A|/|P|, the size of A divided by the total size of the population. So if the probability of being in set A is p, then the size of A should be |A| = p_a \cdot |P|, right?

Well, this is all true, but it’s still not a proof of PIE. The problem is that when talking about probabilities we made a big assumption, namely, that the probabilities were all independent. This is what allowed us to conclude that the probability of being in the intersection of two sets is the product of their probabilities. In general there is no reason this should be true. For example, suppose exactly half the population likes anchovies and exactly half likes books. If these were independent, then you would expect one quarter of the population to like both. But it’s possible that due to complex socio-political-economic factors, people who like books are actually more likely to also like anchovies than people who don’t like books (insert your favorite armchair-sociological explanation here). In this scenario, the product ab is not the same as the probability of being in both sets, and so the fact that (1-a)(1-b) = 1 - a - b + ab doesn’t really tell us anything about the size of the sets involved.

So why does PIE work out so nicely? Over the next two posts I’ll present two different proofs!

Posted in pattern, probability | Tagged , | 5 Comments

Probabilistic PIE

Commenters Sylvain B., xpil, and Christian Luca all gave correct answers to the challenge from my previous post. If the probability that someone likes X is p, then the probability they don’t like X is (1-p). Therefore the probability that someone doesn’t like anchovies and doesn’t like books is (1-a)(1-b). (If liking anchovies and books are independent, then not liking those things is also independent.) Likewise, the probability that someone doesn’t like all three is (1-a)(1-b)(1-c).

The reason I asked the question, however, is to see the pattern that emerges if we multiply out these expressions, which commenter Buddha Buck explained.

1 - a - b + ab

1 - a - b - c + ab + ac + bc - abc

1 - a - b - c - d + ab + ac + ad + bc + bd + cd - abc - abd - acd - bcd + abcd

The first line is the result of multiplying out (1-a)(1-b); the second is (1-a)(1-b)(1-c); and the last is of course (1-a)(1-b)(1-c)(1-d). After expanding everything as much as possible, we get one term for the product of every possible combination of the probabilities, with a positive sign for an even number of probabilities, and negative for an odd number.

Do you see why this happens? Each term in the result arises from choosing one of the two possible terms from each parenthesized expression. Taking (1-a)(1-b)(1-c) as a specific example, we get bc if we choose 1 from (1-a), and -b from (1-b), and -c from (1-c), yielding 1 \cdot (-b) \cdot (-c) = bc.

And now for a set of follow-up questions. Suppose that P is the set of all people in our population, A is the set of people who like anchovies, and B is the set of people who like books. Suppose we also know how many people like both anchovies and books, that is, we know |A \cap B|. (The notation |S| denotes the size of a set S, and \cap denotes the intersection of two sets, that is, the set of elements the two sets have in common.)

  • How many people like neither anchovies nor books?

  • Now suppose C is the set of people who like carpets. Suppose we also know the sizes of the sets A \cap C, B \cap C, and A \cap B \cap C. How many people like none of the things?

  • And so on?

Posted in pattern, probability, solutions | Tagged , | 6 Comments

A probabilty puzzle

Suppose that we have surveyed a certain population of people, and have determined the following probabilities:

  • The probability that a person likes anchovies is a.
  • The probability that a person likes to read books is b.
  • The probability that a person likes carpets is c.

Suppose we also know that these probabilities are all independent, that is, they do not influence each other at all. Wether a person likes anchovies has nothing to do with whether they like to read or whether they like carpets, and so on. When probabilities are independent, it makes sense to multiply them: for example, the probability that someone likes both anchovies and carpets is the product ac.

  • What is the probability that a randomly chosen person likes neither anchovies nor books?

  • What is the probability that a randomly chosen person likes none of these three things?

  • Can you generalize? What if we had four, five, or more independent probabilities for things people might like; what would be the probability that a random person didn’t like any of the things?

Posted in challenges, probability | Tagged , | 8 Comments

Computing the Euler totient function, part 4: totient of prime powers

I’ve been on a bit of a hiatus as I’ve been travelling with my family for the past month. So here’s a recap.

Our story so far

  • Recall that the Euler totient function, \varphi(n), counts how many numbers from 1 to n are relatively prime to (that is, share no prime factors in common with) n.

  • My current goal is to show that if we know the prime factorization of n, we can calculate \varphi(n) much more quickly than just by counting.

  • This rests on the fact that \varphi is multiplicative, that is, \varphi(mn) = \varphi(m)\varphi(n) whenever m and n are relatively prime. In particular this means we can break down \varphi based on a number’s prime factorization: \varphi(p_1^{k_1} p_2^{k_2} \dots p_n^{k_n}) = \varphi(p_1^{k_1})  \varphi(p_2^{k_2}) \dots \varphi(p_n^{k_n}).

  • First I displayed some visualizations that give us some intuition for why this is true. Then in another post I presented a formal proof based on that intuition.

  • So the only thing left is to figure out how to compute \varphi(p^k). That’s what we’ll do today! Then we’ll see the whole method of computing \varphi in action.

Totient of prime powers

As a warmup, what is \varphi(p) when p is prime? Since p is prime, by definition it has no prime factors other than itself, and no smaller number can share a prime factor with it. Hence every number from 1 to p-1 is relatively prime to p, and \varphi(p) = p-1.

How about \varphi(p^2)? Here’s an example with p=5. The blue squares are the numbers relatively prime to 5^2 = 25—the ones we want to count—and the white squares are the ones which share a prime factor with 25.

In general, the only numbers that share a prime factor with p^2 are numbers that are divisible by p. In the example above, there are five white squares; in general, counting p^2 itself, there are exactly p of these numbers: p, 2p, 3p, 4p, \dots, (p-1)p, p^2. So all of the numbers from 1 to p^2 are relatively prime to p^2 except for those p multiples of p, for a total of p^2 - p. Put another way, exactly one out of every p numbers is divisible by p, so each consecutive group of p numbers yields p-1 which are relatively prime to p. There are p such groups up to p^2, so the total is p(p-1) = p^2 - p. In the example with p=5, you can verify that there are 5^2 - 5 = 5 \cdot (5-1) = 20 blue squares.

Finally, what about bigger powers, \varphi(p^k)? Well, once again, the only numbers which share any common factor with p^k are exactly the multiples of p. There’s one multiple of p (and p-1 non-multiples of p) in every group of p numbers. How many such groups are there up to p^k? There are p^{k-1} of them. Hence \varphi(p^k) = p^{k-1}(p-1).

Putting it all together

A previous post contained an example that required computing \varphi(151389); now we can see how to do it. First, we factor the number as 151389 = 3^5 \cdot 7 \cdot 89. Then, using the fact that \varphi is multiplicative, we can break it down into three separate applications of \varphi:

\varphi(3^5 \cdot 7 \cdot 89) = \varphi(3^5) \cdot \varphi(7) \cdot  \varphi(89).

Finally, we can evaluate \varphi for each of these prime powers:

  • \varphi(3^5) = 3^4 \cdot (3-1) = 81 \cdot 2 = 162
  • \varphi(7) = 6
  • \varphi(89) = 88

Hence \varphi(151389) = 162 \cdot 6 \cdot 88 = 85536. For this relatively small number we can actually double-check this result using a computer to count the answer by brute force:

>>> length [n | n <- [1 .. 151389], gcd n 151389 == 1]

Math works!

Posted in arithmetic, computation, counting | Tagged , , | Leave a comment