temporal difference learning with genetic algorithms


Most of the work I’ve seen with temporal difference learning uses neural networks to minimize the temporal difference errors. That’s a good method because the network can learn from each move of a game, resulting in relatively fast learning.

It doesn’t seem appropriate for a genetic algorithm to learn from each move of a game, so you might overlook genetic algorithms altogether. But consider it from the opposite point of view: most of the work with genetic algorithms that I’ve seen relies solely on the game outcome for learning.

A summary measure like the RMS temporal difference error over a game contains more information, and is less noisy, than the game outcome alone. Members of the GA population should get reliable fitness measures after fewer games, so learning should be faster. GAs are CPU-intensive, so the speedup could be invaluable.
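Here is a minimal sketch of what such a fitness measure might look like. The `rms_td_error` and `fitness` names, and the `player.predict` interface, are my own illustrative assumptions, not from any particular implementation; predictions are taken to be each player's estimated probability of winning after each move, with the actual outcome (1 for a win, 0 for a loss) as the final target.

```python
import math

def rms_td_error(predictions, outcome):
    """RMS temporal difference error over one game.

    predictions: the player's estimated probability of winning
    after each move. outcome: 1.0 for a win, 0.0 for a loss.
    Each prediction's target is the next prediction; the last
    prediction's target is the actual outcome.
    """
    targets = predictions[1:] + [outcome]
    errors = [t - p for p, t in zip(predictions, targets)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def fitness(player, games):
    """Hypothetical GA fitness: average RMS TD error, negated so
    that higher fitness means more self-consistent predictions.

    games: a list of (states, outcome) pairs, where states are the
    positions the player evaluated during the game.
    """
    total = 0.0
    for states, outcome in games:
        predictions = [player.predict(s) for s in states]
        total += rms_td_error(predictions, outcome)
    return -total / len(games)
```

Because every move of every game contributes to this score, it should stabilize after far fewer games than a win/loss tally would.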

There’s one degenerate solution that you might have to watch out for: a player that predicted a zero probability of winning for each move could be right every time. As long as it actually loses every game, its temporal difference errors are all zero.
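The degenerate case is easy to see numerically. A sketch (with the RMS computation inlined, and the 10-move game length chosen arbitrarily for illustration):

```python
import math

# A constant "we will lose" predictor over a 10-move game
# that does in fact end in a loss.
predictions = [0.0] * 10
outcome = 0.0  # loss

# Each prediction's target is the next prediction; the last
# target is the outcome. Every error is 0.0 - 0.0 = 0.0.
targets = predictions[1:] + [outcome]
errors = [t - p for p, t in zip(predictions, targets)]
rms = math.sqrt(sum(e * e for e in errors) / len(errors))
print(rms)  # 0.0: perfect TD consistency, achieved by never trying to win
```

So a fitness measure based purely on TD error could reward a player that reliably loses; mixing in the game outcome, or penalizing uniformly pessimistic predictions, would guard against this.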


updated 8 August 2000