Many of my readers may have already heard about AlphaGo, a computer program developed by Google DeepMind which plays the ancient board game of Go. Developing a program to play Go is not that big of a deal—the first such program was written in 1968—but what is a big deal is how well it plays. In fact, it has just beaten Ke Jie, widely regarded at the moment to be the best human Go player in the world, three games to none in a three-game series.
Go is a very old board game, invented in ancient China over 2500 years ago. Players take turns playing white and black stones on the intersections of a grid, with the goal being to surround more territory than the opponent. The rules themselves are actually quite simple—it takes just a few minutes to learn the basics—but the simple rules have complex emergent properties that make the strategy incredibly deep. Top human players spend their whole lives devoted to studying and improving at the game.
I enjoy playing Go for many of the same reasons that I love math—it is beautiful, deep, often surprising, and rewards patient study. If you want to learn how to play—and I highly recommend that you do—try starting here!
Ever since IBM’s Deep Blue beat world champion Garry Kasparov in a chess match in 1997, almost 20 years ago, the best chess-playing computer programs have been able to defeat even top human players. Go, on the other hand, is much more difficult for computers to play. There are several reasons for this:
Up until just a few years ago, the best Go-playing programs could play at the level of a decent amateur player but could not come anywhere close to beating professional-level players. Most people thought that further improvements would be very difficult to achieve and that it would be another decade or two before computer programs could beat top human players. So AlphaGo came as quite a surprise, and is based on recent fundamental advances in machine learning techniques (which are already having lots of other cool applications).
It is particularly interesting that AlphaGo works in a much more “human-like” way than Deep Blue did. Deep Blue was able to win at chess essentially by being really fast at evaluating lots of potential moves—it could analyze hundreds of millions of positions per second—and by consulting giant tables of memorized positions. Chess playing programs are better than us at chess, yes, but as far as I know we haven’t particularly learned anything from them. AlphaGo, on the other hand, “learned” how to play go by studying thousands of human games and then improving by playing itself many, many times. It uses several “neural networks”—a machine learning technique which is ultimately modeled on the structure of the human brain—both to predict promoising moves to study and to evaluate board positions. Of course it also plays out many potential sequences of moves to evaluate them. So it plays using a combination of pattern recognition and speculatively playing out potential sequences of moves—which is exactly how humans play. The amazing thing is that AlphaGo has actually taught us new things about Go—on many occasions it has played moves that humans describe as surprising and beautiful. It has also played moves that the accepted wisdom said were bad moves, but AlphaGo showed how to make them work. One might expect that people in the Go world might feel a sense of loss upon being beaten by a computer program—but the feeling is actually quite the opposite, because of the beatiful way AlphaGo plays and how much it has taught us about the game.
Last time I explained a method for computing best play for instances of the greedy coins game, which is feasible even for large games. This general approach is known as dynamic programming and is applicable whenever we have some recursively defined thing where the recursion generates a lot of overlap/duplication: in our case, the game tree and the values for Alice are defined recursively, but we saw that there is a lot of duplication of subgames. The solution is to organize things in such a way that we never have to look at a particular subthing more than once.
In practice, instead of thinking about trees, we can just keep an upper-triangular matrix , where will denote Alice’s best score from the point when only coins are left. (The matrix is upper-triangular since this only makes sense when .) We can also keep a separate upper-triangular matrix where is the best move when coins are left ( means that both moves are equally good).
When coins are left, either coin or coin will be taken, leaving coins or . So, if we already know the values of and , we can use them to compute the optimal value for (and to decide which move is better). This corresponds to the observation that we can compute the value at a node in the game tree as long as we already know the values at both of its children.
Here is one way to visualize these tables, turned diagonally so it ends up looking very similar to the compact trees from my previous post; each cell corresponds to the coins along the base of the triangle which has that cell as its apex. The light blue square in each cell shows the value of ; the arrows indicate the best move(s) , with blue arrows for Alice’s moves and green for Bob’s.
For example, the top cell says that from this state (when all four coins remain) Alice will get 5 points with best play, and the two blue arrows mean that it does not matter which coin Alice takes. Suppose she takes the , so the are left. The corresponding cell is the cell at the apex of the triangle whose base is :
So now Bob can look at this cell to see what his optimal play is. He can see that from this position Alice will get 2 more points if they both play their best. He can also see from the green arrow that his best move is to move into the cell below and to the left, that is, to leave Alice with the coins —which means he should take the coin on the right, namely, the . Finally, Alice’s best move in this situation is to take the on the right, with the blue arrow pointing to what Bob is left with.
Using this visualization we can easily look at bigger games. For example, in my first post I left readers with the challenge of analyzing this game:
From the table we can now see that Alice will score points with best play, and that her best move is to start by taking the (the blue arrow points to the right, so she should take the on the left in order to send Bob to the game on the right). It doesn’t matter which move Bob plays next, and then Alice will take either the or the , depending on Bob’s move, and so on.
One nice thing to note is that these tables don’t just tell us what should happen when both players play optimally. They tell us the optimal play in any subgame. In other words, one could say that they even show us how to best capitalize on mistakes made by our opponent. To play the greedy coins game perfectly, first just compute the tables and (actually, this is not too hard to learn how to do by hand, especially if you use the above format). Then when it is your turn, if coins remain just look up to see what your best move is. If you have used the above format you don’t even need to bother with keeping track of the indices and ; just find the remaining coins along the bottom and find the apex of their triangle. (In addition to finding your best move you can also confidently, and annoyingly, announce to your opponent that you will get at least points no matter what they do; for extra annoyingness, you can let your opponent choose your move whenever the move table tells you that both moves are optimal.)
Just for fun, here’s an analysis of the slightly larger game , which is a counterexample to one of my original conjectures about tied games:
One final thing I will mention is that it’s hard to tell from looking at Alice’s total score whether the game is tied, or how much Alice will win by. Of course we can compute it if we know the total value of all the coins: Alice will win by the difference between her total and half the total coin value. But it might be nicer to directly visualize not Alice’s total score but the margin of her victory over Bob. This is related to the function defined by Eric Burgess in a previous comment; more on this in a future post.