Commenter Stuart LoPresti gave a very nice analysis answering the questions at the end of my last post. And indeed, the punchline is that the answers are very similar to the answers to the post before that.

## PIE

If we want to calculate the number of people who like neither anchovies nor books, we can begin by subtracting the people who like anchovies and the people who like books from the total population: . However, anyone who likes *both* anchovies and books got subtracted twice, once because they like anchovies and once because they like books, so this number is too low. We have to add back in the people who like both so they only get subtracted once overall:

Likewise, to count the number of people who like neither anchovies nor books nor carpets, we start by subtracting people who like each, then adding back the people who like combinations of two things, and finally subtracting again the people who like all three:

(You might like to convince yourself that this counts everyone properly—these pictures might help.) In general, the claim is that if we have some collection of sets, to compute the number of elements that are *not* in any of the sets, we start by taking *every possible subset* of sets from the collection; find the size of the *intersection* of each such subset; and then either add or subtract depending on whether the number of sets in the intersection is respectively even or odd. (Notice that even taking the subset containing *zero* of the sets is included—the intersection of zero sets can reasonably be taken to mean the entire population , and we add it because zero is even.) This is called the **Principle of Inclusion-Exclusion**, or PIE for short.

But *why*? Even for the case of three sets it is a bit tricky to keep track of how many times each possible group of people has been added or subtracted. It is clear that we need to do some kind of adding and subtracting—for example, clearly subtracts some people too many times. But it seems rather magical that we need to add or subtract every possible intersection of some of the sets *exactly once*.

## Proof by probability?

In my previous post we saw that a very similar pattern emerges if we compute the *probabilities* of being in various sets. For example, the probability of liking none of the three things is

which is exactly analogous to the expression above for the number of people in the set, where multiplication has been replaced by set intersection. Can we turn this into a proof of PIE? Perhaps we can just multiply through by ? After all, the probability of being in set is really just the ratio , the size of divided by the total size of the population. So if the probability of being in set is , then the size of should be , right?

Well, this is all true, but it’s still not a proof of PIE. The problem is that when talking about probabilities we made a big assumption, namely, that the probabilities were all *independent*. This is what allowed us to conclude that the probability of being in the intersection of two sets is the product of their probabilities. In general there is no reason this should be true. For example, suppose exactly half the population likes anchovies and exactly half likes books. If these were independent, then you would expect one quarter of the population to like both. But it’s possible that due to complex socio-political-economic factors, people who like books are actually *more likely* to also like anchovies than people who don’t like books (insert your favorite armchair-sociological explanation here). In this scenario, the product is *not* the same as the probability of being in both sets, and so the fact that doesn’t really tell us anything about the size of the sets involved.

So why does PIE work out so nicely? Over the next two posts I’ll present two different proofs!

For a short moment I thought this was about the dependently typed programming language Pie! https://github.com/david-christiansen/pie-hs

One thing I have always liked about the principle of inclusion–exclusion is it’s generalization to an arbitrary distributive lattice. Namely if L is a distributive lattice and if c : L -> G, where G is a commutative group (or some similar structure such as ℕ), is a function such that L (A ⊔ B) = L(A) + L(B) – L(A ⊓ B) then the general principle of inclusion–exclusion holds. This is a kind of “minimal context” in which one can do the induction proof.

This also, to some extent, goes towards explaining some quirky facts where this principle fails. For instance the following notes the common false belief that the dimensions of subspaces of a vector space satisfy an inclusion–exclusion principle:

https://mathoverflow.net/a/23501/1199

In particular the collection of subspaces of a vector space do not form a distributive lattice and so the above fact does not apply. This is actually an example for why we cannot weaken the condition.

Cool, thanks, I didn’t know about this! This is a different way to generalize PIE than I knew about, namely, general Möbius inversion over incidence algebras (PIE arises from considering the poset of subsets of some set ordered by inclusion). At least it seems like a different way to me, perhaps there is something that generalizes both.

I forgot to mention that I got the idea for this intuitive presentation of PIE from commenter Steve when I asked (seven years ago!) how to explain PIE: https://mathlesstraveled.com/2012/06/11/how-to-explain-the-principle-of-inclusion-exclusion/#comment-11211

Pingback: Formal PIE | The Math Less Traveled

Pingback: PIE: proof by algebra | The Math Less Traveled

Pingback: A combinatorial proof: reboot! | The Math Less Traveled