SAL is a general game-learning system written by Michael Gherrity and described in his 1993 PhD thesis at the University of California, San Diego.
SAL can apparently learn any two-player board game of perfect information, provided the game can be played on a rectangular board and uses fixed types of pieces. (How well it learns to play is another matter.) Here’s how it works.
The user supplies a move generator, written in C, and some constants
that declare the size of the board and the number and type of pieces
used in the game. This is SAL’s only knowledge of the rules of the game.
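Roughly, such a plug-in might look like the sketch below. The names, signatures, and board encoding here are my own invention for illustration, not SAL's actual source.

    /* Hypothetical sketch of a SAL-style game plug-in; the names and
       signatures are illustrative, not taken from the real code. */

    #define ROWS        8     /* board height */
    #define COLS        8     /* board width */
    #define NUM_PIECES  6     /* distinct piece types per side */
    #define MAX_MOVES   256   /* generous bound on legal moves */

    typedef struct {
        int from_row, from_col;
        int to_row, to_col;
    } Move;

    /* Fill moves[] with every legal move for `side` in `board` and
       return how many were generated. board[r][c] holds a piece code:
       positive for one side, negative for the other, 0 for empty. */
    int generate_moves(const int board[ROWS][COLS], int side, Move moves[]);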
SAL chooses its moves with a search, like any chess program.
It uses a two-ply, full-width alpha-beta search, plus a
consistency search (explained below) for
quiescence.
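For illustration, here is the rough shape of a fixed-depth alpha-beta (negamax) search with the evaluator called at the leaves. This is a generic sketch, not SAL's code: Position, eval(), make_move(), and unmake_move() are hypothetical, and a single eval() glosses over SAL's separate per-side evaluators.

    /* Generic fixed-depth negamax with alpha-beta pruning. */
    double alphabeta(Position *pos, int depth, double alpha, double beta)
    {
        Move moves[MAX_MOVES];
        int i, n;
        double score;

        if (depth == 0)
            return eval(pos);   /* SAL would run its consistency search
                                   here rather than trust the raw value */

        n = generate_moves(pos->board, pos->side, moves);
        /* terminal positions (n == 0) omitted for brevity */
        for (i = 0; i < n; i++) {
            make_move(pos, &moves[i]);
            score = -alphabeta(pos, depth - 1, -beta, -alpha);
            unmake_move(pos, &moves[i]);
            if (score >= beta)
                return beta;    /* cutoff: opponent avoids this line */
            if (score > alpha)
                alpha = score;
        }
        return alpha;
    }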
SAL’s evaluation function is a backpropagation neural network
(actually, it uses separate evaluators for the two sides of
the game). The inputs to the network are features representing
the board position, the number of pieces of each type on the
board, the type of piece just moved, the type of piece
just captured, and a bunch of ad hoc features having to do with
pieces and squares under attack and threats to win the game.
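To give a concrete feel for this style of input encoding, here is a sketch using the constants from the plug-in above. The exact layout and scaling are my guesses, not the encoding from the thesis.

    /* Rough illustration of turning a position into network inputs;
       the layout is a guess at the flavor, not SAL's actual encoding. */
    void encode_position(const int board[ROWS][COLS],
                         int moved_type, int captured_type,
                         double inputs[])
    {
        int r, c, t, k = 0;
        int counts[2 * NUM_PIECES + 1] = {0};

        /* one input per square: scaled piece code */
        for (r = 0; r < ROWS; r++)
            for (c = 0; c < COLS; c++) {
                inputs[k++] = board[r][c] / (double)NUM_PIECES;
                counts[board[r][c] + NUM_PIECES]++;
            }

        /* piece counts for each type, both sides */
        for (t = 0; t < 2 * NUM_PIECES + 1; t++)
            inputs[k++] = counts[t] / (double)(ROWS * COLS);

        /* type of piece just moved and just captured (0 if none) */
        inputs[k++] = moved_type / (double)NUM_PIECES;
        inputs[k++] = captured_type / (double)NUM_PIECES;

        /* ...attack and threat features would follow here... */
    }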
The neural network evaluator is trained by the method of
temporal differences to estimate the outcome of the game,
given the current position.
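In outline, temporal-difference training nudges each position's prediction toward the following prediction, and the last prediction toward the actual game result. Here is a minimal TD(0)-flavored sketch with hypothetical net_forward() and net_backprop() helpers; SAL's actual scheme, with separate networks per side, differs in detail.

    /* Minimal TD(0)-style pass over one game's positions.
       net_forward() returns the net's prediction for an input vector;
       net_backprop() adjusts the weights by the given scaled error. */
    void td_train(Net *net, double *inputs[], int num_positions,
                  double outcome, double rate)
    {
        int t;
        double target, v;

        for (t = 0; t < num_positions; t++) {
            /* target: the next prediction, or the result at game's end */
            target = (t + 1 < num_positions)
                       ? net_forward(net, inputs[t + 1])
                       : outcome;
            v = net_forward(net, inputs[t]);
            /* backpropagate the scaled TD error */
            net_backprop(net, inputs[t], rate * (target - v));
        }
    }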
SAL is written in ANSI C, and page 66 of the thesis mentions that it runs on both Macintosh and Cray computers. Hmm!
I hadn’t heard of consistency search before, and I thought that the analysis of this search algorithm was the most interesting part of the thesis.
Gherrity modifies the original consistency search algorithm to assume that only one child of an inconsistent node has an incorrect evaluation. The modified algorithm searches by expanding children until it identifies and corrects the incorrect evaluation.
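As best I can tell, the search has roughly the following flavor; the tolerance test, move ordering, and termination details below are my guesses, not the thesis's exact algorithm. Helpers are as in the sketches above.

    #include <math.h>

    #define TOLERANCE 0.01   /* hypothetical consistency threshold */

    double consistency_search(Position *pos, int depth_left)
    {
        Move moves[MAX_MOVES];
        int i, n;
        double stat, backed, v;

        stat = eval(pos);
        n = generate_moves(pos->board, pos->side, moves);
        if (depth_left == 0 || n == 0)
            return stat;

        /* back up the children's static evaluations one ply */
        backed = -1e30;
        for (i = 0; i < n; i++) {
            make_move(pos, &moves[i]);
            v = -eval(pos);
            unmake_move(pos, &moves[i]);
            if (v > backed)
                backed = v;
        }

        /* consistent node: static and backed-up values agree */
        if (fabs(backed - stat) < TOLERANCE)
            return backed;

        /* inconsistent: assume exactly one child's evaluation is bad;
           expand children in turn, replacing static values with deeper
           consistency-searched ones, until the disagreement resolves */
        backed = -1e30;
        for (i = 0; i < n; i++) {
            make_move(pos, &moves[i]);
            v = -consistency_search(pos, depth_left - 1);
            unmake_move(pos, &moves[i]);
            if (v > backed)
                backed = v;
            if (fabs(backed - stat) < TOLERANCE)
                break;   /* the bad evaluation has been corrected */
        }
        return backed;
    }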
A long mathematical analysis shows that the modified consistency search works better than a simple one-ply search if the evaluation function is accurate enough. Mike Gherrity reports (in private communication) that in practice the evaluation function is rarely accurate enough.
SAL was tested on the games of tic-tac-toe, connect-four, and chess.
At tic-tac-toe, SAL learned to play perfectly after 20,000 games,
which means it’s a bit slow to learn.
At
connect-four,
a more complex game, after 100,000 games SAL had
learned to defeat an opponent program about 80% of the time.
I’m not sure how good that is, because I can’t tell how strong
the opponent program was.
At chess, SAL played 4200 games against gnuchess, with gnuchess
set to play at a rate of one second per move. SAL drew eight games
and lost the others. Graphs show that SAL was continuing to improve;
the experiment had to be cut short for lack of CPU time.
The first game shows
random play by SAL, since it hadn’t learned anything yet.
A drawn game
shows how much SAL improved, though it still plays very weakly.
(It drew because one second was not enough time for
gnuchess to foresee a draw by repetition.)
A graph indicates that SAL did not usually survive into the endgame
against gnuchess.
SAL uses powerful learning methods, and some of the neural network evaluation features seem to be chosen to help it learn chess-like games. Nevertheless, it plays chess poorly after thousands of training games. The difficulty is not only SAL's shallow search: playing over the sample drawn game convinced me that SAL did not learn enough chess knowledge to play competently. It repeatedly sacrificed material for little compensation, and drew only because gnuchess played the ridiculous move 25. Qxc6??
I think the lesson is that a fixed set of evaluation features is not enough to play a difficult game well, at least not when the learning method is as simpleminded as backpropagation. General game-learning systems need to discover or generate their own features, ones that work well for specific games, perhaps by using genetic algorithms or by reasoning about the rules of the game. Programs need a deeper understanding of the structure of the game than can be achieved with backprop's generalist function-approximation view of the world.
A Game-Learning Machine (1993, about 100 pages): compressed PostScript.
SAL source code (over 1MB): ANSI C, compressed Unix tar format.