Professor Risto Miikkulainen’s group at the University of Texas at Austin, the Neural Networks Research Group, has written three papers on a technique for designing neural networks using a genetic algorithm, one describing the idea and two applying it to the game of othello.
The first authors of the papers are Brad Fullmer and David Moriarty. The papers can be found, along with others, on the group’s publications page (you may have to click the weird “also show archived” checkbox). The group is friendly to game research, so there’s a bunch of cool stuff there.
Using marker-based genetic encoding of neural networks
to evolve finite-state behavior (1991, 8 pages)
Brad Fullmer and Risto Miikkulainen
Abstract and paper.
The genetic algorithm is a standard one; what’s interesting is the network encoding. The encoding allows the network to have any architecture: a few units which are densely interconnected, or many units which are sparsely connected, and the network can have internal feedback.
The idea was inspired by markers in biological DNA. Each network is represented as a list of small integers, say, in the range -100 to 100. Some of the integers, such as those which leave remainder one when divided by 15, are considered start markers, and others (such as those leaving remainder two) are end markers. To make a network, scan the list from beginning to end. When you encounter a start marker, the numbers up to the next end marker define one node of the network. There will usually be unused space between node definitions, like the (more complicated) gaps between genes in DNA.
The numbers in the node definition are interpreted as a key identifying the node, an initial value for the node’s state, and a list of (key, weight) pairs which are the node’s inputs from other nodes of the network.
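Here is a minimal sketch of what that decoding pass might look like. The specific marker rule (remainder one and remainder two mod 15), the value range, and the field layout are assumptions taken from the description above, not the paper’s exact scheme.

```python
def decode(chromosome):
    """Scan a list of small integers and pull out node definitions."""
    nodes = []
    i = 0
    while i < len(chromosome):
        if chromosome[i] % 15 == 1:                  # start marker
            j = i + 1
            body = []
            while j < len(chromosome) and chromosome[j] % 15 != 2:
                body.append(chromosome[j])           # collect until the end marker
                j += 1
            if len(body) >= 2:
                nodes.append({
                    "key": body[0],                  # identifies this node
                    "initial": body[1],              # node's starting state
                    "inputs": list(zip(body[2::2], body[3::2])),  # (key, weight) pairs
                })
            i = j                                    # skip past the definition
        i += 1                                       # material between nodes is ignored
    return nodes
```

For example, `decode([3, 16, 5, -3, 40, 7, 17, 9])` would yield one node with key 5, initial state -3, and a single input connection (key 40, weight 7); the 3 and 9 are the unused filler between definitions.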
Discovering complex othello strategies through
evolutionary neural networks (1995, 14 pages)
David Moriarty and Risto Miikkulainen
Abstract and paper.
The system evolves a network to play othello without search. The network inputs are the current board position, and its outputs correspond to potential moves. It plays the legal move with the highest activation. Networks which win games are considered more fit and get to reproduce.
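In code, the move-selection rule is just an argmax over the legal moves. This is an illustrative sketch, assuming the network maps a board to one activation per square and that moves are indices into that output:

```python
def choose_move(network, board, legal_moves):
    """Play the legal move whose output unit is most active."""
    activations = network(board)     # assumed: one activation per board square
    return max(legal_moves, key=lambda move: activations[move])
```

Note that fitness never looks at individual moves; it comes only from whether the network wins the game.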
Against an opponent that plays random moves, the system rapidly learns a simple positional strategy that almost always wins.
Against a smarter opponent that uses a reasonably sophisticated positional strategy and a three-ply search, but which does not understand the importance of mobility, the networks eventually discover othello’s mobility strategy, winning 70% of the time. That’s impressive, both because a non-searching, automatically-learned program defeats a reasonably sophisticated hand-made program that searches, and because humans took many years to discover the counterintuitive mobility strategy.
Against Bill, the first of the strong learning othello programs, the networks make no headway.
The system is learning to choose a move in a single position. This is an easier task than learning an evaluation function, which must produce values that can be compared across positions. The paper below, the one about pruning search trees, mentions that attempts to get the system to learn evaluation functions have been disappointing.
Evolving neural networks to focus minimax search
(1995, 7 pages)
David Moriarty and Risto Miikkulainen
Abstract and paper.
The system is given a fixed evaluation function, and performs a search to a predetermined depth to decide on its moves in an othello game. It evolves a “focus network” to choose which moves should be included in the search and which should be pruned away. Again, networks which win games are more fit.
The network’s inputs are the current game position at a node in the search tree, and its outputs correspond to moves on the board. After running the network, moves with positive output values are included in the search and the other moves are pruned away.
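A sketch of how the focus network might sit inside a fixed-depth minimax. The position API (`legal_moves`, `play`) and the one-output-per-square indexing are assumptions, and the fallback to the full move list when nothing scores positive is my own safeguard, not necessarily the paper’s:

```python
def focused_minimax(position, depth, evaluate, focus_net, maximizing=True):
    """Fixed-depth minimax; the focus network decides which moves to search."""
    moves = position.legal_moves()
    if depth == 0 or not moves:
        return evaluate(position)
    outputs = focus_net(position)                 # one value per board square
    focused = [m for m in moves if outputs[m] > 0]
    if not focused:                               # safeguard: never prune everything
        focused = moves
    best = float("-inf") if maximizing else float("inf")
    for move in focused:
        score = focused_minimax(position.play(move), depth - 1,
                                evaluate, focus_net, not maximizing)
        best = max(best, score) if maximizing else min(best, score)
    return best
```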
In tests with a simple evaluation function, the networks evolved to play far more strongly than a full-width search of the same depth, winning 76% of the time. The networks were effectively hiding errors made by the evaluation function. In tests with Bill’s sophisticated evaluation function, which didn’t have much error to conceal, the networks won 51% of the time, but did so while examining fewer nodes.
The SANE neurogenetic algorithm performed better, producing a slight improvement in strength even with Bill’s evaluator.
Neuro-evolutionary methods are promising, and this looks like a good one. The trouble with learning evaluation functions, and the so-so results using Bill, show the limitations of the method. (As I mentioned a paragraph back, the SANE neurogenetic algorithm by the same authors performed better on the focus network task.)
You could use methods like this to try to learn an evaluation function and a search control regime in tandem. If such a system were given a time limit rather than a depth limit, it might discover interesting ways to trade off thoroughness of search versus thoroughness of evaluation.
In these papers, the networks are evolved with no input other than the board position. If you’re writing a performance program, you could add extra inputs for hand-written evaluation function components, or other information, and let the system decide for itself whether to use the extras. You can even add inputs that are expensive to compute, and only compute them for networks that use the value, so that the system itself decides whether the information is worth its cost.
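One way to get that lazy evaluation: since the decoded genome lists which input keys each node connects to, an expensive feature can be skipped whenever no node refers to its reserved key. This is a hypothetical sketch; the reserved-key convention and helper names are mine, not the papers’.

```python
def uses_input(nodes, key):
    """True if any decoded node reads the given input key."""
    return any(k == key for node in nodes for k, _ in node["inputs"])

def gather_inputs(nodes, board, cheap_inputs, expensive_features):
    """`expensive_features` maps reserved input keys to functions of the board.

    Each feature is computed only if the evolved network actually uses it,
    so the system itself decides whether the information is worth its cost.
    """
    inputs = dict(cheap_inputs)
    for key, compute in expensive_features.items():
        if uses_input(nodes, key):
            inputs[key] = compute(board)
    return inputs
```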