The Recamán sequence

I recently learned about a really interesting sequence of integers, called the Recamán sequence (it’s sequence A005132 in the Online Encyclopedia of Integer Sequences). It is very simple to define, but the resulting complexity shows how powerful self-reference is (for both good and evil). Here’s the definition. The first term of the sequence is $a_0 = 0$ , and each term $a_n$ differs from $a_{n-1}$ by $n$ . Now, if $a_n$ were always just $n$ more than $a_{n-1}$ , we would have the triangular numbers:

$\displaystyle 0, 1, 3, 6, 10, 15, 21, \dots$

Note how $a_1 = 1$ is one more than $a_0 = 0$ ; $a_2 = 3$ is $2$ more than $a_1$ ; $a_3 = 6$ is $3$ more than $a_2$ ; and so on.

But that’s not quite how the Recamán sequence is defined. In the Recamán sequence, $a_n$ “wants” to be $n$ less than $a_{n-1}$ : it will be so if $a_{n-1} - n$ is nonnegative and has not already appeared in the sequence. Otherwise, it “settles” for being $n$ more than $a_{n-1}$ (as with the triangular numbers).

So $a_0 = 0$ . Then:

$a_1$ has to differ from $a_0$ by $1$ —that is, it must be $-1$ or $1$ . But it can’t be negative, so $1$ it is.
How about $a_2$ ? It must be $2$ away from $a_1 = 1$ , so either $-1$ or $3$ —but again, it can’t be negative, so it is $3$ .
Now, $a_3$ must be $3$ away from $a_2$ , so it must be $0$ or $6$ . $0$ is nonnegative, but it has already appeared in the sequence as $a_0$ , so we choose $6$ .
So far, this is looking a lot like the triangular numbers! But let’s see what happens with $a_4$ . It must be $6 \pm 4$ , that is, either $2$ or $10$ . And here something different happens: $2$ is positive and has not appeared yet in the sequence, so $a_4 = 2$ .

Continuing this pattern, we get

$\displaystyle 0, 1, 3, 6, 2, 7, 13, 20, 12, 21, 11, 22, 10, 23, 9, 24, 8, 25, 43, \dots$

From the definition, you might initially think that no number will ever be repeated: we explicitly avoid picking numbers that have already occurred in the sequence, right? Well, we don’t pick $a_{n-1} - n$ if it has already occurred, but in that case we definitely pick $a_{n-1} + n$ whether it has already occurred or not—and in fact sometimes it has. You don’t have to continue the sequence very much farther before you find, for example, $a_{20} = a_{24} = 42$ .

Here’s a scatterplot of the first $100$ terms, with the $y$ -axis scaled by 1/2 to make it easier to see:

From the graph we can see that the numbers tend to form long parallel alternating runs where the top numbers are increasing by $1$ and the bottom decreasing. For example, we can see the first of these starting at $13$ :

$\displaystyle 13, 20, 12, 21, 11, 22, 10, 23, 9, 24, 8, 25, \dots$

This makes sense: we are alternately adding $n$ , then subtracting $n+1$ , then adding $n+2$ , then subtracting $n+3$ , and so on. The run will be broken when we hit a number that has already occurred: in this case, the number after $25$ is not $7$ , because that has already occurred, so instead it jumps up again to $43$ . After $43$ , it jumps up one more time (since $24$ was already in the sequence) and we get a very short parallel sequence before it falls back down and starts another alternating sequence $41, 18, 42, 17, 43, 16, \dots$ .

So far, this looks regular-ish—one might hope to discover some regular patterns that could, for example, lead to a closed formula. But our hopes are dashed when we look further out in the sequence:

It doesn’t look very regular-ish anymore! And this is what I find fascinating about the Recamán sequence: the self-reference in its definition (we choose $a_{n-1} - n$ only when it has not already appeared) throws a giant monkey wrench of chaos into the works, so that it is very difficult to find any patterns or prove anything definite about it, even though it is still highly structured. To see what I mean, here’s a graph of the first 5000 terms:

There is most definitely structure—for example, all the terms seem to fall along these curving “bands” radiating out from the origin. I imagine one might even be able to say something approximate about the shape of those curves. But there is still obviously lots of chaos—it’s hard to discern any regular patterns.

So we know that the Recamán sequence is not 1-1: some numbers can appear multiple times. But is it onto? That is, does every number appear at least once, somewhere in the sequence? It is conjectured that this is true, but no one knows—it is an open question! According to the OEIS entry, after looking at the first $10^{15}$ terms, the smallest number that is still missing is $852655$ ; every number smaller than that has appeared somewhere in the first $10^{15}$ terms of the sequence. Crazily though, $852655$ is still missing after looking at the first $10^{230}$ terms!!! (That’s a staggeringly large number of terms; it is far more than the estimated number of atoms in the universe.) So perhaps the Recamán sequence is not onto after all—is there something special about $852655$ so that it never apears? Or does it eventually appear very very far into the sequence? No one knows, and to be honest, I think the latter actually seems more likely. But it just underscores how difficult it would be to prove this.

About Brent

Associate Professor of Computer Science at Hendrix College. Functional programmer, mathematician, teacher, pianist, follower of Jesus.

View all posts by Brent →

5 Responses to The Recamán sequence

Julian R. says:

June 13, 2016 at 2:02 am

Wow!

How is it possible to calculate 10^230 terms?
I assume the self reference makes it so, that time taken to calculate more terms is at least linear in the number of terms. And 10^230 seems just way too large.

(I know nothing about Numerical methods or scientific computing)
Brent says:

June 13, 2016 at 4:14 pm

No, there is no possible way to actually calculate $10^{230}$ terms! Even if you could calculate $10^{15}$ terms per second (which is definitely an upper bound given today’s fastest supercomputers), it would still take over $10^{200}$ years. I do not actually know how the computation was done; I got this information from a comment on the OEIS entry. But I assume that it was making use of the structure of the sequence to skip a lot of work. For example, consider the sequence starting at $a_6 = 13$ . At this point we know the sequence will bounce up and down in two parallel trajectories until the bottom trajectory reaches 8, since 7 is the largest number we have seen so far. So we can jump straight ahead to $a_{18} = 43$ without explicitly computing all the numbers in between—we know exactly what is going to happen. You can probably do much more sophisticated things to skip even more work.
Laura Spencer says:

June 14, 2016 at 4:41 am

It looks like an isosceles right triangle, until I remembered you changed the scale.
Jan Van lent says:

June 16, 2016 at 7:26 pm

It is a fun exercise to implement this using a data structure that stores ranges of consecutive integers using just the start and end values. This is like run-length compression on the differences. It seems that that for n terms of the sequence, the number of ranges, i.e., the number of “gaps”, behaves as const*sqrt(n).
ignotus2013 says:

June 17, 2016 at 7:04 am

Thank you Brent for your blog about my sequence. It is my tiny contribution to the formidable building of mathematics. Your exposition of the sequence is clearer than many I have seen. I too believe (and hope) 852655 is somewhere in the sequence, and so every other number.