The hyperbinary sequence and the Calkin-Wilf tree

And now, the amazing conclusion to this series of posts on Neil Calkin and Herbert Wilf’s paper, Recounting the Rationals, and the answers to all the questions about the hyperbinary sequence. Hold on to your hats!

The Calkin-Wilf Tree

First, recall the Calkin-Wilf tree, defined as follows: the “root” of the tree is the fraction 1/1, and every node labeled i/j in the tree has two “children”, with the left child labeled i/(i+j), and the right labeled (i+j)/j. Here’s a picture of the first four levels of the tree (of course, the tree is infinite):

Now wait just one minute… those numbers look familiar! Let’s list those fractions in order, one level at a time, from left to right:

\displaystyle \frac 1 1, \frac 1 2, \frac 2 1, \frac 1 3, \frac 3 2, \frac 2 3, \frac 3 1, \frac 1 4, \frac 4 3, \frac 3 5, \frac 5 2, \frac 2 5, \frac 5 3, \frac 3 4, \frac 4 1 \dots

That’s right, it’s the hyperbinary sequence! In particular, the numerators are the hyperbinary sequence starting at index 0, and the denominators follow the same sequence, offset by one.

…but why?! The Calkin-Wilf tree is defined by a recursive rule for manipulating fractions, and the hyperbinary sequence is defined in terms of ways to write numbers as sums of powers of two… it certainly isn’t a priori obvious why these should have anything to do with each other!

Labelling the CW-tree

Let’s label the nodes in the Calkin-Wilf tree with positive integers, starting with 1 at the root and proceeding in order by levels, from left to right. This means that the fraction at node n is h(n-1)/h(n)—at least, that’s what we’re trying to prove! Here’s an illustration of our numbering scheme:

Now, what do you notice about the relationship between the label on a node and the labels on its children? It’s not too hard to guess the pattern: the left child of the node labelled k is labelled 2k, and the right child is labelled 2k+1. How can we prove this?

Well, it’s certainly true for the root note: it’s labelled 1, and its children are labelled 2 and 3. Now we note that if the pattern holds for the node immediately preceding node k—that is, the children of node k-1 are 2k-2 and 2k-1—then the children of node k come immediately after the children of k-1, so they are numbered 2k and 2k+1. If you think about it, this is true regardless of whether k-1 and k are on the same level, or k-1 is the end of one level and k is the beginning of the next.

We can think of labelling each edge of the Calkin-Wilf tree with either a 0 or a 1: 0 for left edges, and 1 for right edges. Then taking all the zeros and ones along the path from the root to any node and sticking an extra 1 at the beginning gives us the label of that node in binary! For example, the path that goes left, left, right corresponds to 0, 0, 1, and adding an extra 1 to the front gives us 1001–which is indeed the binary representation of 9!

So, is it really true that the fraction at node n is always h(n-1)/h(n)? It is indeed, and we now have the tools we need to see why. First of all, the root node of the tree—that is, node 1—is (by definition) 1/1, which is indeed h(0)/h(1). Next, suppose that the fraction at node k is h(k-1)/h(k); we need to show that the fraction at the left child (which is node 2k) is h(2k-1)/h(2k), and the fraction at the right child is h(2k)/h(2k+1). But this is now easy: by the way the Calkin-Wilf tree is defined, the left child is

\displaystyle\frac{h(k-1)}{h(k-1) + h(k)} = \frac{h(2k-1)}{h(2k)}.

The equality, of course, follows directly from the recurrence relation for the hyperbinary sequence! Likewise, the right child is

\displaystyle \frac{h(k-1) + h(k)}{h(k)} = \frac{h(2k)}{h(2k+1)}.

Solving the \phi(n) mystery

As pointed out by JM, this connection with the Calkin-Wilf tree is exactly what we need to prove Fergal Daly’s conjecture about the number of primary occurrences of any given number.

We defined a primary occurrence simply as a number at an even position in the hyperbinary sequence. Of course, even-labelled nodes in the Calkin-Wilf tree are exactly the ones which are left children, and so the primary occurrences are the denominators of left children (equivalently, the numerators of right children). Also, since left children are always of the form i/(i+j), left children always have denominators bigger than their numerators. Finally, as you may recall, we showed earlier that every positive rational number occurs somewhere in the Calkin-Wilf tree, in lowest terms, exactly once.

Putting this all together: each primary occurrence of n corresponds to a fraction m/n in the Calkin-Wilf tree, where m < n. And since every fraction occurs exactly once in lowest terms, there is exactly one such fraction m/n for each positive integer m < n which is relatively prime to n. Of course, by definition, there are \phi(n) such numbers, so this is the proof of Fergal’s conjecture! There are exactly \phi(n) primary occurrences of n, since that’s how many reduced fractions there are between 0 and 1 which have a denominator of n.

Computing primary occurrences

The connection with the Calkin-Wilf tree also allows us to compute h^{-1}—that is, given any n, we can efficiently (in O(n \log n) time) compute all the primary occurrences k such that h(k) = n!

How does this work? Well, if you remember, the Euclidan Algorithm gives us a way to find our way up the Calkin-Wilf tree from any starting fraction. Combining this with the relationship we discovered earlier between the label on a node and the labels on its children allows us to compute the label on the node where we started.

As an example, let’s compute one of the primary occurrences of 5. In particular, let’s do the one that corresponds to the fraction 3/5. Since 3 is smaller than 5, 3/5 must be the left child of 3/(5-3) = 3/2. 3 is bigger than 2, so 3/2 is the right child of (3-2)/2 = 1/2. Finally, 1/2 is the left child of 1/1. Now we can work backwards: 1/1 is node 1; 1/2 is to the left, so it’s node 2*1 = 2; 3/2 is to the right of that, so it’s node 2*2 + 1 = 5; and 3/5 is to the left again, so it’s node 2*5 = 10. Therefore h(10) = 5 is a primary occurrence of 5! We can also think of this more succinctly by recalling that paths in the Calkin-Wilf tree correspond to binary representations of the labels. So we can run the Euclidean Algorithm, getting a zero or one at each step, and then reverse them and put an extra 1 at the beginning to get the binary representation of the primary occurrence. To be more precise:

  1. Start with the pair (m,n) where m < n is relatively prime to n. Call the current pair of numbers (a,b).
  2. Given (a,b), if a < b then write (a,b) \stackrel{0}{\to} (a,b-a). Otherwise write (a,b) \stackrel{1}{\to} (a-b,b).
  3. Repeat step 2 until reaching (1,1).
  4. Read off the primary occurrence of n by prefixing 1 to the reverse of the binary digits written over the arrows.

All of this leads to the following (somewhat mind-blowing) observation: the position of each primary occurrence of n in the hyperbinary sequence is a proof (encoded in binary) that n is relatively prime to the number that comes immediately before it.

To tie this all together, here’s a little Haskell program I wrote to compute primary occurrences:

import System.Environment (getArgs)
import Control.Monad (forM_)
import Data.Maybe (catMaybes)
import Data.List (sort)

-- Find a fraction in the Calkin-Wilf tree.  findCW a b computes the
-- location of a/b in the Calkin-Wilf tree, or returns Nothing if a
-- and b are not relatively prime.
findCW :: Integer -> Integer -> Maybe Integer
findCW 1 1 = Just 1
findCW a b | a < b  = left `fmap` findCW a (b-a)
           | a > b  = right `fmap` findCW (a-b) b
           | a == b = Nothing

left  x = 2*x
right x = 2*x + 1

-- Find the primary occurrences of n.
primaryOccs :: Integer -> [Integer]
primaryOccs n = sort . catMaybes $ [ findCW m n | m <- [1..n] ]

main :: IO ()
main = do (from:to:_) <- getArgs
          forM_ [read from..read to] $ \i -> do
            putStr $ show i ++ ": "
            putStrLn . unwords . map show . primaryOccs $ i

If there’s interest, I could write another post explaining how this program works in a bit more detail, but hopefully you can mostly understand what it’s doing from the preceding discussion. Let’s check that it works:

[brent@archimedes:~/teaching/mlt/hyperbin]$ ./prim-occs 2 16
2: 2
3: 4 6
4: 8 14
5: 10 12 16 30
6: 32 62
7: 18 22 24 28 64 126
8: 20 26 128 254
9: 34 46 48 60 256 510
10: 38 56 512 1022
11: 36 40 54 58 66 94 96 124 1024 2046
12: 44 50 2048 4094
13: 42 52 70 78 112 120 130 190 192 252 4096 8190
14: 68 80 110 122 8192 16382
15: 72 118 258 382 384 508 16384 32766
16: 92 98 134 158 224 248 32768 65534

Yup, looks good! And to show that it really is efficient, let’s use it to compute the primary occurrences of 1000 (yes, all 400 of them!):

[brent@archimedes:~/teaching/mlt/hyperbin]$ ./prim-occs 1000 1000
1000: 39770 42278 56024 58532 68396 70568 71444 75674 76688 85046 86480 92576 104030
... 386 more primary occurrences...
1071508607186267320948425049060001810561404811705533607443750388370351051124936
1224931983788156958581275946729175531468251871452856923140435984577574698574803
9345677748242309854210746050623711418779541821530464749835819412673987675591655
43946077062914571196477686542167660429831652624386837205668069374

On my computer this runs in about 0.03 seconds. Sweet! Note that the position of the final primary occurrence of 1000 has 303 digits–that’s approximately one centillion. I certainly wouldn’t want to go through the first one centillion numbers in the hyperbinary sequence looking for 1000 by hand, would you?

Further Reading

I hope you’ve enjoyed this little (1.5-year-long!) excursion into the fascinating world opened up by Calkin and Wilf’s paper. (I’ve sure enjoyed writing it!) If you’re interested in exploring more, I suggest looking at Roland Backhouse and João Ferreira’s paper, Recounting the Rationals, Twice! which also explains the connection to another famous tree of fractions, the Stern-Brocot tree. If you’re interested in the computational aspect of the hyperbinary numbers, have a look at Enumerating the Rationals, by Jeremy Gibbons, David Lester, and Richard Bird.

About Brent

Associate Professor of Computer Science at Hendrix College. Functional programmer, mathematician, teacher, pianist, follower of Jesus.
This entry was posted in arithmetic, computation, induction, iteration, number theory, pattern, proof, recursion, sequences, solutions and tagged , , , , , , . Bookmark the permalink.

6 Responses to The hyperbinary sequence and the Calkin-Wilf tree

  1. Pingback: Carnival of Mathematics #59 « The Number Warrior

  2. Jonathan says:

    By the time you reached the end, I was forced to go back and reread from the beginning!

    Very nice job, and nice connections, throughout.

    Thank you.

  3. Brent says:

    Yes, in order to reach the end, I was forced to go back and reread from the beginning, too! =) Glad you enjoyed it.

  4. Ashwin Jain says:

    Very brilliant work. Really liked it.
    Awesome.
    Really Great.

  5. Pingback: Learn You a Haskell! | The Math Less Traveled

  6. Joe ST says:

    Epic stuff, damn number theory is awesome 😀

Comments are closed.