playout analysis

Here is my early proposal of repeated playouts as a general form of search. I first wrote up this idea in the late 1990s, when playouts were known and used in some games (as mentioned below), but years before the invention of Monte Carlo tree search (MCTS). MCTS is a family of algorithms that concretely implement playout analysis as a general form of search. I don’t know whether my writeup had any influence on the invention of MCTS, but perhaps it helped prepare the atmosphere.

Playout analysis is a more general idea than MCTS, or in other words, MCTS is only one form of playout analysis. There is still a lot of room for new ideas, and below I point out a couple of places to look for them.


Rollout analysis is a popular way to analyze a backgammon position. A computer program generates sequences of dice rolls and plays them through to the end of the game. Repeat this many times and, if the program plays strongly enough, you get an accurate evaluation of the original game position. The analogous idea in Scrabble is called “simulation”: a program shuffles the letter tiles and repeatedly plays out the game for both sides.

The idea is general. A card-playing program might play out the game assuming various distributions of the unseen cards in its opponents’ hands, as Matt Ginsberg’s bridge program GIB does (though GIB assumes it knows the unseen cards in each deal, and solves the hand given that distribution). Even a chess program might play out a position, trying different moves rather than different random events.

I call the general form of the idea playout analysis.
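To make the scheme concrete, here is a minimal Python sketch of playout analysis in the abstract. The callbacks `pick_successor`, `is_over`, and `outcome` are hypothetical stand-ins for whatever game-specific machinery a real program would supply:

```python
import random

def playout(position, pick_successor, is_over, outcome, rng):
    """Play one line from `position` to the end of the game,
    choosing each successor position with `pick_successor`,
    and return the final outcome."""
    pos = position
    while not is_over(pos):
        pos = pick_successor(pos, rng)
    return outcome(pos)

def playout_analysis(position, pick_successor, is_over, outcome,
                     n_playouts=1000, seed=0):
    """Repeat many playouts and average the results to estimate
    the value of the original position."""
    rng = random.Random(seed)
    results = [playout(position, pick_successor, is_over, outcome, rng)
               for _ in range(n_playouts)]
    return sum(results) / len(results)
```

The strength of the estimate depends entirely on `pick_successor`: a backgammon rollout rolls the dice and lets the program pick its best move; a Scrabble simulation shuffles the unseen tiles.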

playout analysis can work for perfect-information games

Playout analysis makes sense for imperfect-information games like backgammon and Scrabble because it can capture the effects of the random events in these games. It turns out that playout analysis can also be useful in perfect-information games like chess, at least in some situations.

Here’s an example. In chess, there are many “fortress draw” endgames, where one side has extra material but the opponent’s position has no weaknesses--it is a “fortress” that can be defended indefinitely. Chess programs often play these endings well, attacking persistently on the strong side and defending accurately on the weak side. But a chess program that uses depth-limited search will often believe that the strong side can win. It cannot search deeply enough to realize that there’s a defense to every attack. Therefore if it’s on the strong side it may enter a drawn endgame in the belief that it can win; or if it’s on the weak side, it may fail to build a fortress when it has an opportunity.

Since it can play both sides well once the fortress is built, a program that used playout analysis would understand that the position was drawn. It would analyze various lines of play to the end of the game, using a shallow search to choose moves along each line. If all or most of the lines end in draws, the program can reasonably suspect that the position is a draw.

Of course, this isn’t guaranteed to work in every position. If the program plays good moves then the playout will return good results, but errors may muck it up. The example shows that there’s at least one kind of position where it does work, however.
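As a sketch, the draw-detection idea amounts to a simple decision rule over playout results. Here `play_line_to_end` is a hypothetical routine that plays one line to the end of the game, choosing moves with a shallow search, and reports the result:

```python
def looks_drawn(position, play_line_to_end, n_lines=50, threshold=0.9):
    """Play many lines to the end of the game and suspect a fortress
    draw if nearly all of them end in draws."""
    draws = sum(1 for _ in range(n_lines)
                if play_line_to_end(position) == "draw")
    return draws / n_lines >= threshold
```

The threshold is deliberately set below 1.0: an occasional playout error on either side shouldn’t overturn the verdict.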

playout analysis is a kind of search

A depth-limited search looks at moves near the root of the search tree, which is to say, near the original position. Playout is a search which spends more effort looking at moves far from the root. Playout is a kind of selective search. Like other selective searches, it will work best if it looks at good moves, and it will tend to fail if its “move generator” turns up a lot of lemons.

[Note added 2012: This turned out to be false! The success of MCTS showed that there are other ways of reducing errors. Perhaps you could even say: For every way of turning greater search effort into greater accuracy, you get a different family of search algorithms.]

variations

There are infinite variations. Backgammon programs often offer truncated rollouts, where the analysis stops after a given number of steps, to save time while hopefully losing little information. More generally, a playout line can be cut off as soon as the game outcome becomes clear. A game in which many interactions cover only a short distance on the board, like go or a wargame, might play out one sector only, ignoring the rest of the board.
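A truncated playout might be sketched like this, again with hypothetical game callbacks; `evaluate` stands in for whatever static evaluation the program trusts at the cutoff point:

```python
import random

def truncated_playout(position, pick_successor, is_over, outcome,
                      evaluate, max_steps=20, seed=0):
    """Play out one line, but cut it off after `max_steps` moves and
    fall back on a static evaluation if the game hasn't ended."""
    rng = random.Random(seed)
    pos = position
    for _ in range(max_steps):
        if is_over(pos):
            return outcome(pos)
        pos = pick_successor(pos, rng)
    return outcome(pos) if is_over(pos) else evaluate(pos)
```

Cutting a line off as soon as the outcome is clear is the same idea with an adaptive rather than fixed step limit.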

Consider a chess program, or a wargame AI, which is trying to decide whether to launch a risky attack. The program can’t see to the end of the attack and doesn’t know whether it’ll work, so it plays out sample sequences to estimate the chances. Each line can stop as soon as the attack can be declared successful or unsuccessful.

Playout analysis isn’t restricted to searching future moves. It can search over plans. Suppose a program has preset “aggressive” and “defensive” styles of play. It can play out a position with each style to decide which is more suitable in this case.
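Choosing between preset styles by playout might look like the sketch below, where `styles` is a hypothetical mapping from a style name ("aggressive", "defensive", ...) to a move-choosing policy, and `playout_value` runs one playout under that policy and scores it for our side:

```python
def choose_style(position, styles, playout_value, n_playouts=200):
    """Play out the position many times under each style and pick
    the style whose playouts score best."""
    best_name, best_score = None, float("-inf")
    for name, policy in styles.items():
        score = sum(playout_value(position, policy)
                    for _ in range(n_playouts)) / n_playouts
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```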

playout analysis is embarrassingly parallel

So what’s better, depth-limited alpha-beta or playout? For most games, most of the time, I would guess that moves near the root are more informative than moves far away, so on a uniprocessor I’d usually pick alpha-beta (if I had to choose only one).

On a multiprocessor, alpha-beta suffers parallelization losses which increase with the number of processors. A massively parallel machine can only work at a fraction of its capability if it’s doing alpha-beta.

Playout analysis has virtually no parallelization loss. It’s “embarrassingly parallel”. You set a bunch of processors running, and their only communication need is to return their results, which are then averaged together. So playouts may have an advantage on a massively parallel machine.
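The communication pattern can be sketched in a few lines of Python, with a toy random-walk "game" standing in for a real playout. A real engine would use separate processes or machines rather than threads, but the shape is the same: each worker needs only a seed, and the only message back is its result.

```python
from concurrent.futures import ThreadPoolExecutor
import random

def one_playout(seed):
    """One independent playout: a toy random walk between absorbing
    barriers stands in for a real game.  No shared state is needed."""
    rng = random.Random(seed)
    n = 2
    while 0 < n < 4:
        n += rng.choice([-1, 1])
    return 1.0 if n >= 4 else 0.0

def parallel_playout_analysis(n_playouts=1000, n_workers=8):
    """Farm independent playouts out to a pool of workers and average
    the returned results -- which is the entire communication cost."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(one_playout, range(n_playouts)))
    return sum(results) / len(results)
```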

A hybrid scheme, with some processors crunching nearby nodes in a parallel alpha-beta search while others explore far away, might be best of all. Of course, everything depends on the game.

the bottom line

It makes sense to see as far into the future as you can. Playout is a way of bringing distant information into view, information that many current game-playing programs overlook. It may or may not be a good way, in any given case, but we should stay on the lookout for any way that works.

Playout analysis has obvious value for games with random events, because it can capture the effects of these events over time, even distant effects. I think it may also be especially helpful for wargames and similar games in which exhaustive search is infeasible and the value of long-term strategies must be estimated. [Note added 2012: The game of go falls into this category; exhaustive search is infeasible and the value of long-term strategies must be estimated. MCTS is the best known search for go, so I call this prediction a success.]


updated 27 May 2012