SAL

SAL is a general game-learning system written by Michael Gherrity and described in his 1993 PhD thesis at the University of California, San Diego.


about SAL

SAL can apparently learn any two-player board game of perfect information, provided the game can be played on a rectangular board and uses fixed types of pieces. (How well it learns to play is another matter.) Here’s how it works.

- The user supplies a move generator, written in C, and some constants that declare the size of the board and the number and type of pieces used in the game. This is SAL’s only knowledge of the rules of the game. (A hypothetical sketch of this kind of interface appears after this list.)

- SAL chooses its moves with a search, like any chess program. It uses a two-ply, full-width alpha-beta search, plus a consistency search (explained below) for quiescence.

- SAL’s evaluation function is a backpropagation neural network (actually, it uses separate evaluators for the two sides of the game). The inputs to the network are features representing the board position, the number of pieces of each type on the board, the type of piece just moved, the type of piece just captured, and a bunch of ad hoc features having to do with pieces and squares under attack and threats to win the game.

- The neural network evaluator is trained by the method of temporal differences to estimate the outcome of the game, given the current position. (A minimal sketch of the update also appears after this list.)
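
To make the division of labor concrete, here is a hypothetical sketch in C of the kind of game definition the user might supply, using tic-tac-toe as the example. The names, types, and signatures are my invention for illustration; SAL’s actual interface is in the source code linked below.

    /* Hypothetical sketch of a user-supplied game definition for
       tic-tac-toe.  These names are mine, not SAL's. */

    #define BOARD_SIZE 3
    #define EMPTY 0                    /* square contents: 0, 1 (X), 2 (O) */

    typedef struct {
        int row, col;                  /* a tic-tac-toe move drops a piece */
    } Move;

    /* Fill moves[] with every legal move and return the count.  This,
       plus the constants above, is all the program knows of the rules. */
    int generate_moves(const int board[BOARD_SIZE][BOARD_SIZE], Move moves[])
    {
        int r, c, n = 0;
        for (r = 0; r < BOARD_SIZE; r++)
            for (c = 0; c < BOARD_SIZE; c++)
                if (board[r][c] == EMPTY) {
                    moves[n].row = r;
                    moves[n].col = c;
                    n++;
                }
        return n;
    }

And here is a minimal sketch of the temporal difference idea, written as TD(0) on a win/draw/loss outcome for simplicity, with the network treated as a black box; SAL’s exact training procedure is in the thesis.

    /* Minimal TD(0) sketch.  evaluate() and backprop() stand in for the
       neural network's forward pass and weight update; they are assumed,
       not shown. */

    double evaluate(const double features[]);              /* forward pass */
    void   backprop(const double features[], double err);  /* adjust weights */

    /* After each move, nudge the previous position's evaluation toward
       the next position's evaluation; at game end, nudge it toward the
       actual outcome (say 1 = win, 0.5 = draw, 0 = loss). */
    void td_update(const double prev_features[],
                   const double next_features[],
                   int game_over, double outcome)
    {
        double target = game_over ? outcome : evaluate(next_features);
        double error  = target - evaluate(prev_features);
        backprop(prev_features, error);
    }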

SAL is written in ANSI C, and page 66 of the thesis mentions that it runs on both Macintosh and Cray computers. Hmm!

consistency search

I hadn’t heard of consistency search before, and I thought that the analysis of this search algorithm was the most interesting part of the thesis.

Gherrity modifies the original consistency search algorithm to assume that only one child of an inconsistent node has an incorrect evaluation. The modified algorithm searches by expanding children until it identifies and corrects the incorrect evaluation.
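
To show the flavor of the modified algorithm, here is a speculative sketch in C, reconstructed only from the description above. It is my guess at the shape of the search, not Gherrity’s code; the real algorithm’s ordering and termination details are in the thesis.

    /* Speculative sketch of the modified consistency search,
       reconstructed from the prose description; not SAL's actual code. */

    #include <math.h>

    #define EPSILON 0.01               /* assumed tolerance for "consistent" */

    typedef struct Node {
        double value;                  /* static evaluation, mover's view */
        int nchildren;
        struct Node **children;
    } Node;

    /* Negamax over the children's current values. */
    static double backed_up(const Node *n)
    {
        int i;
        double best = -1e9;
        for (i = 0; i < n->nchildren; i++)
            if (-n->children[i]->value > best)
                best = -n->children[i]->value;
        return best;
    }

    /* Return a corrected value for n.  If n's static value disagrees
       with the value backed up from its children, assume exactly one
       child is misevaluated, and search the children one by one until
       correcting one of them restores consistency. */
    double consistency_search(Node *n, int depth_left)
    {
        int i;
        if (n->nchildren == 0 || depth_left == 0)
            return n->value;
        for (i = 0; i < n->nchildren; i++) {
            if (fabs(n->value - backed_up(n)) < EPSILON)
                return n->value;       /* consistent: accept it */
            n->children[i]->value =
                consistency_search(n->children[i], depth_left - 1);
        }
        return backed_up(n);           /* fall back on corrected children */
    }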

A long mathematical analysis shows that the modified consistency search works better than a simple one-ply search if the evaluation function is accurate enough. Mike Gherrity reports (in private communication) that in practice the evaluation function is rarely accurate enough.

results

SAL was tested on the games of tic-tac-toe, connect-four, and chess.

- At tic-tac-toe, SAL learned to play perfectly after 20,000 games, which means it’s a bit slow to learn.

- At connect-four, a more complex game, after 100,000 games SAL had learned to defeat an opponent program about 80% of the time. I’m not sure how good that is, because I can’t tell how strong the opponent program was.

- At chess, SAL played 4200 games against gnuchess, with gnuchess set to play at a rate of one second per move. SAL drew eight games and lost the others. Graphs show that SAL was continuing to improve; the experiment had to be cut short for lack of CPU time. The first game shows random play by SAL, since it hadn’t learned anything yet. A drawn game shows how much SAL improved, though it still plays very weakly. (It drew because one second was not enough time for gnuchess to foresee a draw by repetition.) A graph indicates that SAL did not usually survive into the endgame against gnuchess.

my opinion

SAL uses powerful learning methods, and some of the neural network evaluation features seem chosen to help it learn chess-like games. Nevertheless, it plays chess poorly after thousands of training games. The difficulty is not only SAL’s shallow search: playing over the sample chess games convinced me that SAL did not learn enough chess knowledge to play competently. In the drawn game, SAL repeatedly sacrificed material for little compensation, and reached a draw only when gnuchess played the ridiculous move 25. Qxc6??

I think the lesson is that a fixed set of evaluation features is not enough to play a difficult game well, at least not when the learning method is as simpleminded as backpropagation. General game-learning systems need to discover or generate their own features, ones that work well for the specific game at hand, perhaps with genetic algorithms or by reasoning about the rules of the game. Programs need a deeper understanding of the structure of the game than they can get from backprop’s generalist function-approximation view of the world.

references

- A Game-Learning Machine (1993, about 100 pages)
Compressed PostScript.

- SAL source code (over 1MB)
ANSI C, compressed Unix tar format.


updated 6 May 2012 (one overlooked bad link fixed)