the neural net backgammon programs

The best backgammon programs use temporal difference algorithms to train a backpropagation neural network by self-play. The top programs are world-class in playing strength.
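
To give a flavor of the mechanics, here is a sketch in Python of the kind of TD(lambda) update involved. The network size, parameters, and single win-probability output are illustrative choices of mine, not the details of any particular program; real programs also estimate gammon chances and alternate viewpoints between plies. After each move of a self-play game, the weights are nudged so that the estimate for the previous position moves toward the estimate for the current one, with the final result supplying the last target.

    import numpy as np

    rng = np.random.default_rng(0)
    N_IN, N_HID = 198, 40        # 198 inputs, as in Tesauro's papers
    ALPHA, LAM = 0.1, 0.7        # learning rate and trace decay
    W1 = rng.normal(0, 0.1, (N_HID, N_IN))
    W2 = rng.normal(0, 0.1, N_HID)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def evaluate(x):
        """Estimated probability that the evaluated side wins."""
        h = sigmoid(W1 @ x)
        return sigmoid(W2 @ h), h

    def update(prev, target, e1, e2):
        """One TD step: move the earlier estimate toward `target`."""
        global W1, W2
        v, h, x = prev
        e2[:] = LAM * e2 + v * (1 - v) * h                 # eligibility
        e1[:] = LAM * e1 + np.outer(
            v * (1 - v) * W2 * h * (1 - h), x)             # traces
        delta = target - v                                 # TD error
        W2 += ALPHA * delta * e2
        W1 += ALPHA * delta * e1

    def train_on_game(positions, result):
        """positions: encoded positions of one self-play game, in
        order, all from one side's viewpoint (a simplification);
        result: 1.0 if that side won, else 0.0."""
        e1, e2 = np.zeros_like(W1), np.zeros_like(W2)
        prev = None
        for x in positions:
            v, h = evaluate(x)
            if prev is not None:
                update(prev, v, e1, e2)
            prev = (v, h, x)
        if prev is not None:
            update(prev, result, e1, e2)   # final target: the outcome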

These programs excel at strategic and positional judgment, using their knowledge to make fine distinctions between plays. They are less skilled in "technical" positions, such as bearing in against an anchor, which humans solve by calculation of the probabilities.

This is the opposite of the situation in many other games, in which computers calculate tactics well but fall short in strategic understanding. As a result, there's been a lot of interest in applying temporal difference learning with neural nets to other games, and to mundane tasks too.

Special thanks to backgammon expert Kit Woolsey <kwoolsey@netcom.com>. He has investigated TD-Gammon and JellyFish in depth and offered me his findings freely.

- Gerry Tesauro's TD-Gammon
The first, with many references.

- programs inspired by TD-Gammon
JellyFish, mloner, and Snowie.

- programs using other techniques
Motif, MVP Backgammon, and Silicon Highlands Backgammon.

- Greg Galperin's rollout project
Rolling out positions in real time with a parallel computer.

- the hillclimbing HC-Gammon
A research program that explores a different learning algorithm.

Also see TD learning as a mutation for another training idea.

FIBS

The First Internet Backgammon Server provides interactive play by telnet.
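
For anyone who wants to try it, a plain telnet client has historically sufficed. The address below is the long-standing one for FIBS, though server addresses change over time, so verify it before relying on it:

    telnet fibs.com 4321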

Many programs play automatically on FIBS. The FIBS Bots page keeps track of them.

comparing the programs

Kit Woolsey wrote to me that, although stylistic differences become minor at such a high level of play, TD-Gammon and JellyFish do have distinct styles. TD-Gammon prefers positional play, and is stronger than JellyFish at priming battles and backgames. JellyFish likes to attack, and plays blitzes better than TD-Gammon.

It's interesting that the programs' strengths correspond to their styles. That surprised me, because they play the moves that they evaluate as strongest, not the moves that they understand the best. It's possible that this happens as a side effect of training: if a program understands some type of position better, then it wins that type more often in self-play, so the backed-up evaluations are higher and it tends to prefer those positions. But I'm speculating.

weaknesses

According to Kit Woolsey, the types of technical positions in which the programs show some weakness include races, coming in against an anchor, and board building while playing a holding game. These are positions in which the probabilities of important events--such as being forced to leave a shot for your opponent--can be calculated.
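
To make "calculated" concrete: any single-roll probability in backgammon is an exact count over the 36 equally likely dice rolls. The Python sketch below shows the shape of such a calculation; the leaves_shot predicate is a hypothetical stand-in for real move-generation logic.

    from itertools import product

    def shot_probability(position, leaves_shot):
        """Exact probability of being forced to leave a shot,
        found by enumerating all 36 dice rolls. `leaves_shot` is
        a placeholder predicate: does this roll force a shot?"""
        bad = sum(1 for d1, d2 in product(range(1, 7), repeat=2)
                  if leaves_shot(position, d1, d2))
        return bad / 36.0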

The neural networks merely estimate the probabilities rather than calculate them. In addition, I suspect that technical positions may be more difficult for the net to learn about because their values are less smooth with respect to the net's input features. Fortunately, the differences between plausible technical moves are small, so the errors are usually slight.

TD-Gammon's equity estimates are often systematically biased. This doesn't much affect checker play, because plays are usually ranked in the right order, but it can hurt doubling cube decisions. Kit Woolsey confirms that TD-Gammon accepts too many doubles in certain types of positions, because it underestimates its probability of being gammoned. I believe that the other programs have a similar weakness.
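
The link between probability estimates and cube decisions is direct. In the simplest model (cube dead, backgammons ignored, no recube value) a double is a take when the equity of playing on at the doubled stake beats the -1 of passing. The sketch below uses that simplified model, which is my assumption rather than how any of these programs actually decide, to show how an underestimated gammon risk can make a clear pass look nearly takable.

    def take_equity(p_win, p_gwin, p_gloss, cube=2):
        """Equity of taking, in a dead-cube model that ignores
        backgammons: plain wins and losses count 1 cube, gammons 2."""
        p_plain_win = p_win - p_gwin
        p_plain_loss = (1.0 - p_win) - p_gloss
        return cube * (p_plain_win + 2 * p_gwin
                       - p_plain_loss - 2 * p_gloss)

    # True position: 25% wins, 30% chance of being gammoned.
    print(take_equity(0.25, 0.0, 0.30))  # -1.6: a clear pass
    # Same position with the gammon risk underestimated at 5%.
    print(take_equity(0.25, 0.0, 0.05))  # -1.1: looks nearly takable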

how can the backgammon programs be improved?

How can these weaknesses be addressed? From what I've found out so far, I see no large obstacles in the path to eventually creating a backgammon program unquestionably stronger than any human.

Better features. The most obvious approach is to give the neural network a set of backgammon features that allow it to learn a smooth value function; learning will then be more accurate. A difficulty is that comparison experiments are expensive; each trial set of features may have to play a million games before it can be compared to others.
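
As an example of what an input feature looks like at this level, here is a simplified version of the graded point encoding Tesauro described for TD-Gammon's raw inputs: each point's units change value gradually as checkers are added, which keeps the target function smoother than a single raw count would. The details below are illustrative, not the exact published scheme.

    def encode_point(n):
        """Truncated unary encoding of n checkers on one point:
        units for 1, 2, and 3 checkers, plus a graded unit for
        any extras (similar in spirit to Tesauro's encoding)."""
        return [float(n >= 1), float(n >= 2), float(n >= 3),
                max(0.0, (n - 3) / 2.0)]

    def encode_board(white, black, bar_w, bar_b, off_w, off_b):
        """white, black: checker counts on the 24 points for each
        side. Omits the turn indicator for brevity."""
        x = []
        for n in white + black:
            x += encode_point(n)
        x += [bar_w / 2.0, bar_b / 2.0, off_w / 15.0, off_b / 15.0]
        return x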

Brute force. Someday, computers will be so fast that they can do rollout analyses (playing a position out with many different dice rolls) in real time. For now, that still takes hours on fast personal computers; in the meantime, we could run backgammon programs on supercomputers, or on special-purpose neural net hardware, to allow deeper searches. Or we could seek software improvements to speed up evaluation.
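
The rollout procedure itself is simple; the cost is all in the number of games. The sketch below assumes hypothetical engine helpers (legal_moves, apply_move, is_over, result) and glosses over side alternation; the structure, not the details, is the point.

    import random

    def rollout(position, evaluate, trials=1296):
        """Estimate equity by playing the position out `trials`
        times, picking each move with the evaluation function.
        Engine helpers are assumed; side alternation is omitted."""
        total = 0.0
        for _ in range(trials):
            pos = position
            while not is_over(pos):
                dice = (random.randint(1, 6), random.randint(1, 6))
                # Greedy play; assumes at least one legal move.
                pos = max((apply_move(pos, m)
                           for m in legal_moves(pos, dice)),
                          key=evaluate)
            total += result(pos)   # e.g. +/-1, +/-2 for a gammon
        return total / trials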

Smarter search. As far as I have been able to figure out, backgammon programs are doing full-width search. Because searches are so expensive (since they have to cover the possible dice rolls), and because it seems relatively easy to estimate the uncertainties in evaluations, backgammon should be an ideal domain for rational search, in which branches of the tree that are more likely to affect the final decision are searched more deeply in a principled way.
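
To see where the expense comes from, note that each extra ply multiplies the work by the 21 distinct dice rolls times the legal moves for each roll. Below is a plain full-width expectimax sketch using the same hypothetical engine helpers as above; a rational search would spend less effort on rolls and replies that are unlikely to change the final ranking.

    # The 21 distinct rolls: doubles weight 1, non-doubles weight 2.
    ROLLS = [((d1, d2), 1 if d1 == d2 else 2)
             for d1 in range(1, 7) for d2 in range(d1, 7)]

    def expectimax(pos, depth, evaluate, our_turn=False):
        """Full-width search: average over dice rolls, letting the
        side to move pick its best reply. `evaluate` is assumed to
        score every position from our viewpoint."""
        if depth == 0 or is_over(pos):
            return evaluate(pos)
        pick = max if our_turn else min
        total = 0.0
        for dice, w in ROLLS:
            replies = [expectimax(apply_move(pos, m), depth - 1,
                                  evaluate, not our_turn)
                       for m in legal_moves(pos, dice)]
            total += w * pick(replies)   # assumes a legal move exists
        return total / 36.0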

can the backgammon success be duplicated in other games?

Attempts so far have been discouraging, at least by comparison. See Thrun's NeuroChess and Schraudolph's go networks. The KnightCap learning chess program is relatively successful compared to these, but hardly as impressive as the backgammon programs. Apparently backgammon is particularly easy for the temporal difference plus neural net technique.

Also, in a game with a large tactical component, like chess, most positions have the quality of technical positions in backgammon--of needing precise calculation to find the truth--so we shouldn't expect a network to magically learn good evaluations.

Nevertheless, I'm confident that with more effort the tree will bear fruit outside the garden. The two basic approaches are to extend the method so that it works in more situations, and to decompose the task, manually or automatically, into subtasks so that one subtask is suitable--in other words, to use a hybrid method, as suggested by Nici Schraudolph in the conclusion of his go networks paper. For example, a net might be assigned the task of comparing grand strategy options at the highest level of abstraction: its job would be to turn the knobs and throw the switches to control a performance element that worked by conventional lookahead and evaluation.
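
To make the hybrid idea concrete, here is one crude way the division of labor might look. Everything in this sketch is my own invention for illustration: a small net reads the position and outputs mixing weights, and a conventional lookahead uses a hand-built evaluation parameterized by those weights.

    def hybrid_move(pos, dice, strategy_net, terms, search):
        """Hypothetical hybrid player. `strategy_net` maps a
        position to high-level mixing weights (the knobs);
        `terms(p)` computes hand-built evaluation terms;
        `search` is a conventional lookahead taking a position
        and an evaluation function."""
        knobs = strategy_net(pos)

        def evaluate(p):
            # Hand-crafted terms, mixed per the net's instructions.
            return sum(w * t for w, t in zip(knobs, terms(p)))

        return max(legal_moves(pos, dice),
                   key=lambda m: search(apply_move(pos, m), evaluate))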


updated 4 January 2001 (link updated)