NeuroChess is a chess learning program written by Sebastian Thrun while he was at Carnegie Mellon.
NeuroChess is a chess program which learns from a combination of self-play and observing expert play. Its search algorithm is taken from gnuchess, but its evaluator is a neural network.
The information here is from one short paper, listed below, and from a sneak preview of half a chapter of his thesis that Thrun was kind enough to send me.
NeuroChess is an application of the “explanation-based neural network” (EBNN) method. EBNN adapts explanation-based learning to use neural network domain theories.
NeuroChess represents a chess position as a vector of 175 hand-written features. It has two neural networks, V and M. V is the evaluation function, which takes the feature vector as input and produces an evaluation used to play chess. M is a chess model which takes the feature vector as input and predicts the feature vector two half-moves later. M's output is used in training V.
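The two-network setup can be sketched as follows. This is only an illustration of the shapes involved, not Thrun's implementation: the names, the use of single linear layers as stand-ins for the real networks, and the random initialization are all my assumptions.

```python
# Sketch of the NeuroChess two-network setup (hypothetical stand-ins,
# not the actual architecture from the paper).
import random

N_FEATURES = 175  # hand-written board features, per the paper

def make_linear(n_in, n_out):
    """A single linear layer as a stand-in for each neural network."""
    return [[random.uniform(-0.01, 0.01) for _ in range(n_in)]
            for _ in range(n_out)]

def apply_linear(weights, x):
    """Matrix-vector product: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

random.seed(0)
V = make_linear(N_FEATURES, 1)           # evaluation: features -> scalar value
M = make_linear(N_FEATURES, N_FEATURES)  # chess model: features(t) -> features(t+2)

position = [random.random() for _ in range(N_FEATURES)]
value = apply_linear(V, position)[0]      # evaluation of the position
predicted_next = apply_linear(M, position)  # predicted features two half-moves on
```

The point of the split is visible even in this toy version: V is the thing the search actually calls, while M exists only to generate training information for V.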
The chess model M is a neural network “theory” of good chess play. It is trained before V, on a large sample of grandmaster games.
The evaluation function V is trained on a mixture of grandmaster games and self-play. It learns from each position in each game, using an algorithm that takes into account a target evaluation function value and a target slope of the evaluation function for the training vector of 175 features. The target values are computed by a temporal difference method. The target slopes are computed using the chess model M applied to the feature vector of the chess position. In effect, the chess model estimates the importance of each feature to the evaluation of the position, and the training of V takes that information into account.
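One EBNN-style training step might look like the sketch below. To be clear about what is assumed: the linear stand-ins for V and M, the TD(0)-style bootstrapped target, the finite-difference slopes, the squared-error update, and all constants are mine, chosen to make the idea concrete; the actual NeuroChess training details differ.

```python
# Sketch of one EBNN-style update of V: fit a target value (from temporal
# differences) and a target slope (from differentiating through the model M).
import random

N = 6        # tiny feature vector instead of 175, for illustration
GAMMA = 1.0  # discount over two half-moves (assumed)
ALPHA = 0.05 # learning rate (assumed)
EPS = 1e-5   # finite-difference step

random.seed(1)
v = [random.uniform(-0.1, 0.1) for _ in range(N)]  # V's weights (linear stand-in)
M = [[random.uniform(-0.1, 0.1) for _ in range(N)] for _ in range(N)]

def V(x):
    """Evaluation function: dot(v, x)."""
    return sum(wi * xi for wi, xi in zip(v, x))

def predict(x):
    """Chess model M: predicted features two half-moves ahead."""
    return [sum(M[i][j] * x[j] for j in range(N)) for i in range(N)]

def model_value(x):
    """Evaluation of the position M predicts will follow x."""
    return V(predict(x))

s_t = [random.random() for _ in range(N)]   # position at time t
s_t2 = [random.random() for _ in range(N)]  # actual position two half-moves later

# Target value: TD-style bootstrap from the later position.
target_value = GAMMA * V(s_t2)

# Target slope: d V(M(s)) / d s_j, i.e. how much each feature matters
# according to the chess model, here by finite differences.
target_slope = []
for j in range(N):
    bumped = list(s_t)
    bumped[j] += EPS
    target_slope.append((model_value(bumped) - model_value(s_t)) / EPS)

# One gradient step: reduce the value error, and (since the slope of a
# linear V is just v) nudge v toward the model-derived target slope.
value_err = V(s_t) - target_value
v = [v[j] - ALPHA * (value_err * s_t[j] + (v[j] - target_slope[j]))
     for j in range(N)]
```

The slope term is what makes this explanation-based rather than plain TD learning: M tells V which features of the position its evaluation should be sensitive to, so each training position carries more information than a single scalar target.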
After training M with 120,000 grandmaster games, and training V with a further 2400 games, NeuroChess defeats gnuchess about 13% of the time. With the same training, a version of NeuroChess which does not use the chess model M wins about 10% of the time against gnuchess. EBNN is doing some good.
Learning from self-play alone failed. However, learning primarily from expert play introduced artifacts into the program’s evaluations. The paper gives an example: the program observes that grandmasters are more likely to win when they move their queens to the center of the board. It doesn’t realize that grandmasters only do such a thing when the queen is safe from harassment, and ends up moving its queen to the center at the earliest opportunity, safe or not. The best results came from a mixture of self-play and expert play. Training against other opponents was not tried (gnuchess was used only as a test opponent).
This sample win shows NeuroChess’s play against gnuchess after about 1000 training games. This version was trained mainly on grandmaster games and shows the resulting artifacts, especially in the opening. This sample loss was played by a version of NeuroChess with 382 features after about 3000 training games. In both games, both programs are searching three ply deep.
Thrun e-mailed me saying that he believes that with enough work, tuning and training, it’s likely that NeuroChess’s performance could be significantly improved. “The progress was rather steep” when the experiments were stopped, he wrote.
The sample win shows (as the paper agrees) both strengths and weaknesses in NeuroChess’s play. At various points in the game it maneuvers skillfully to reach important goals, such as winning the b-pawn and preventing black’s king from reaching safety.
At other times, NeuroChess plays absurd moves. Its opening play is pathetically weak, and the paper mentions that it often plays poorly in the endgame too. At one point in the game it moved its king unnecessarily into a position of greater danger. The sample loss is similar in this respect; NeuroChess won a pawn in the opening and played well for a while, holding its advantage, then made a series of terrible blunders to lose.
As a system, I think NeuroChess is a mixed success. It plays more strongly than SAL or Morph (but then, it has the advantage of 175 hand-coded features and gnuchess’s quiescence search). It shows that EBNN has some value. But the violent contrast between its strong moves and its absurd ones suggests that it is missing something. Perhaps it only needs longer training, but its uneven play makes me think that its learning architecture doesn’t have a wide enough scope for it to learn everything it needs to know to play well.
I think the important point here is nothing specific that NeuroChess does or does not do, but rather the general idea of having a game model that’s independent of the game evaluation. Learning to play a game is difficult, and any good way of decomposing the problem into subtasks is worth considering. I think that breaking the task down into a chess model and an evaluation model makes good sense, no matter what the models look like.
Learning to play the game of chess (1995, 8 pages)
Web page, with abstract and pointer to the full paper.