## AlphaGo

Many of my readers may have already heard about AlphaGo, a computer program developed by Google DeepMind which plays the ancient board game of Go. Developing a program to play Go is not that big a deal—the first such program was written in 1968—but what *is* a big deal is how well it plays. In fact, it has just beaten Ke Jie, currently regarded as the best human Go player in the world, three games to none.

### What is Go?

Go is a very old board game, invented in ancient China over 2500 years ago. Players take turns playing white and black stones on the intersections of a grid, with the goal being to surround more territory than the opponent. The rules themselves are actually quite simple—it takes just a few minutes to learn the basics—but the simple rules have complex emergent properties that make the strategy incredibly deep. Top human players spend their whole lives devoted to studying and improving at the game.

I enjoy playing Go for many of the same reasons that I love math—it is beautiful, deep, often surprising, and rewards patient study. If you want to learn how to play—and I highly recommend that you do—try starting here!

### Why is this a big deal?

Ever since IBM’s Deep Blue beat world champion Garry Kasparov in a chess match in 1997, almost 20 years ago, the best chess-playing computer programs have been able to defeat even top human players. Go, on the other hand, is much more difficult for computers to play. There are several reasons for this:

- The number of possible moves is *much* higher in Go than in chess, and games tend to last much longer (a typical game of Go takes hundreds of moves as opposed to around 40 moves for chess). So it is completely infeasible to just try all possible moves by brute force.
- With chess, it is not too hard to evaluate who is winning in a particular position, by looking at which pieces each player has left and where the pieces are on the board; with Go, on the other hand, evaluating who is winning can be extremely difficult.

Up until just a few years ago, the best Go-playing programs could play at the level of a decent amateur player but could not come anywhere close to beating professional-level players. Most people thought that further improvements would be very difficult to achieve and that it would be another decade or two before computer programs could beat top human players. So AlphaGo came as quite a surprise, and is based on recent fundamental advances in machine learning techniques (which are already having lots of other cool applications).

It is particularly interesting that AlphaGo works in a much more “human-like” way than Deep Blue did. Deep Blue was able to win at chess essentially by being really fast at evaluating lots of potential moves—it could analyze hundreds of millions of positions per second—and by consulting giant tables of memorized positions. Chess-playing programs are better than us at chess, yes, but as far as I know we haven’t particularly learned anything from them. AlphaGo, on the other hand, “learned” how to play Go by studying thousands of human games and then improving by playing itself many, many times. It uses several “neural networks”—a machine learning technique which is ultimately modeled on the structure of the human brain—both to predict promising moves to study and to evaluate board positions. Of course it also plays out many potential sequences of moves to evaluate them. So it plays using a combination of pattern recognition and speculatively playing out potential sequences of moves—which is exactly how humans play. The amazing thing is that AlphaGo has actually taught us new things about Go—on many occasions it has played moves that humans describe as surprising and beautiful. It has also played moves that the accepted wisdom said were bad moves, but AlphaGo showed how to make them work. One might expect that people in the Go world would feel a sense of loss upon being beaten by a computer program—but the feeling is actually quite the opposite, because of the beautiful way AlphaGo plays and how much it has taught us about the game.

## Computing optimal play for the greedy coins game, part 4

Last time I explained a method for computing best play for instances of the greedy coins game, which is feasible even for large games. This general approach is known as *dynamic programming* and is applicable whenever we have some recursively defined thing where the recursion generates a lot of overlap/duplication: in our case, the game tree and the values for Alice are defined recursively, but we saw that there is a lot of duplication of subgames. The solution is to organize things in such a way that we never have to look at a particular subthing more than once.

In practice, instead of thinking about trees, we can just keep an upper-triangular matrix $a$, where $a_{ij}$ will denote Alice’s best score from the point when only coins $i$ through $j$ are left. (The matrix is upper-triangular since this only makes sense when $i \leq j$.) We can also keep a separate upper-triangular matrix $m$, where $m_{ij}$ is the best move when coins $i$ through $j$ are left (with a special value meaning that both moves are equally good).

When coins $i$ through $j$ are left, either coin $i$ or coin $j$ will be taken, leaving coins $(i+1)$ through $j$ or $i$ through $(j-1)$. So, if we already know the values of $a_{(i+1)j}$ and $a_{i(j-1)}$, we can use them to compute the optimal value for $a_{ij}$ (and to decide which move is better). This corresponds to the observation that we can compute the value at a node in the game tree as long as we already know the values at both of its children.
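As a concrete sketch, here is one way the two tables could be filled in Python. (This is my own hypothetical rendering of the scheme above, not code from the original posts; the names `a` and `m` follow the matrices just described, and whose turn it is can be recovered from how many coins have already been taken, since Alice moves first.)

```python
def best_play(coins):
    """Fill the tables a and m for the greedy coins game.

    a[i][j] = Alice's best score when coins i..j remain.
    m[i][j] = best move from that state: 'L', 'R', or 'LR' (tie).
    Whose turn it is follows from parity: len(coins) - (j - i + 1)
    coins have already been taken, and Alice moves first.
    """
    n = len(coins)
    a = [[0] * n for _ in range(n)]
    m = [[''] * n for _ in range(n)]

    for length in range(1, n + 1):          # size of the subgame
        for i in range(n - length + 1):
            j = i + length - 1
            alices_turn = (n - length) % 2 == 0
            if length == 1:
                # Only one coin left: the player to move must take it.
                a[i][j] = coins[i] if alices_turn else 0
                m[i][j] = 'L'
                continue
            if alices_turn:
                # Alice keeps the coin she takes, plus the subgame value.
                left = coins[i] + a[i + 1][j]
                right = coins[j] + a[i][j - 1]
            else:
                # Bob keeps his coin, so only the subgame value counts
                # for Alice; Bob plays to minimize it.
                left = a[i + 1][j]
                right = a[i][j - 1]
            a[i][j] = (max if alices_turn else min)(left, right)
            m[i][j] = ('L' if left == a[i][j] else '') + \
                      ('R' if right == a[i][j] else '')
    return a, m
```

For instance, in the three-coin game 1, 2, 3 this computes that Alice should take the 3 first (`m[0][2] == 'R'`) and ends up with 4 of the 6 points.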

Here is one way to visualize these tables, turned diagonally so it ends up looking very similar to the compact trees from my previous post; each cell corresponds to the coins along the base of the triangle which has that cell as its apex. The light blue square in each cell shows the value of $a_{ij}$; the arrows indicate the best move(s) $m_{ij}$, with blue arrows for Alice’s moves and green for Bob’s.

For example, the top cell says that from this state (when all four coins remain) Alice will get 5 points with best play, and the two blue arrows mean that it does not matter which coin Alice takes. Suppose she takes one of the two end coins, so three coins are left. The corresponding cell is the one at the apex of the triangle whose base is those three coins:

So now Bob can look at this cell to see what his optimal play is. He can see that from this position Alice will get 2 more points if they both play their best. He can also see from the green arrow that his best move is to move into the cell below and to the left, that is, to leave Alice with the two leftmost coins—which means he should take the coin on the *right*. Finally, Alice’s best move in this situation is to take the coin on the right, with the blue arrow pointing to what Bob is left with.

Using this visualization we can easily look at bigger games. For example, in my first post I left readers with the challenge of analyzing this game:

From the table we can now see Alice’s score with best play, and that her best move is to start by taking the coin on the left (the blue arrow points to the right, so she should take the coin on the left in order to send Bob to the game on the right). It doesn’t matter which move Bob plays next, and then Alice will take one of the two end coins, depending on Bob’s move, and so on.

One nice thing to note is that these tables don’t just tell us what should happen when *both* players play optimally. They tell us the optimal play in *any* subgame. In other words, one could say that they even show us how to best capitalize on mistakes made by our opponent. To play the greedy coins game perfectly, first just compute the tables $a$ and $m$ (actually, this is not too hard to learn how to do by hand, especially if you use the above format). Then when it is your turn, if coins $i$ through $j$ remain, just look up $m_{ij}$ to see what your best move is. If you have used the above format you don’t even need to bother with keeping track of the indices $i$ and $j$; just find the remaining coins along the bottom and find the apex of their triangle. (In addition to finding your best move you can also confidently, and annoyingly, announce to your opponent that you will get at least $a_{ij}$ points no matter what they do; for extra annoyingness, you can let your opponent choose your move whenever the move table tells you that both moves are optimal.)

Just for fun, here’s an analysis of a slightly larger game, which is a counterexample to one of my original conjectures about tied games:

One final thing I will mention is that it’s hard to tell from looking at Alice’s *total* score whether the game is tied, or how much Alice will win by. Of course we can compute it if we know the total value of all the coins: Alice will win by the difference between her total and half the total coin value. But it might be nicer to directly visualize not Alice’s *total* score but the *margin* of her victory over Bob. This is related to a function defined by Eric Burgess in a previous comment; more on this in a future post.
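The margin-based view can be sketched directly with a small recursive function (my own illustration, assuming the usual “negamax” formulation in which a state’s value is the current player’s lead over their opponent with best play; the name `margin` is mine, not from the comment thread):

```python
from functools import lru_cache

def margin(coins):
    """Return Alice's winning margin (her score minus Bob's) with best play."""
    coins = tuple(coins)

    @lru_cache(maxsize=None)
    def lead(i, j):
        # Current player's best lead over the opponent when coins i..j remain.
        if i == j:
            return coins[i]
        # Taking a coin gains its value, but then the opponent's best lead
        # on the remaining subgame counts against us.
        return max(coins[i] - lead(i + 1, j), coins[j] - lead(i, j - 1))

    return lead(0, len(coins) - 1)
```

A positive result means Alice wins by that many points, zero means a tie, and the memoization (`lru_cache`) gives the same never-revisit-a-subgame behavior as the tables.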

Posted in computation, games, recursion
Tagged coins, dynamic, game, optimal, play, programming, proof, recurrence, strategy, tree, two-player, zero-sum