Here's every online paper I know of which isn't mentioned elsewhere on the web site. If yours should be here, let me know about it.
Evolutionary induction of subroutines (1992, 6 pages)
Peter J. Angeline and
Jordan Pollack
Compressed postscript.
A genetic programming method which includes learning of subroutines
is tested on a tic-tac-toe task.
Coevolving high-level representations (1993, 15 pages)
Peter J. Angeline and
Jordan Pollack
Compressed postscript.
An expanded version of the above paper, "Evolutionary induction of
subroutines," with new results.
Here, the subroutines are described as "modules".
Competitive environments evolve better solutions for complex tasks
(1993, 8 pages)
Peter J. Angeline and
Jordan Pollack
Compressed postscript.
Advocates competitive learning for genetic algorithms.
In an experiment, tic-tac-toe players evolved by genetic programming
were more robust (though not stronger) when a fitness function was
defined by tournament play rather than by play against a fixed opponent.
Learning logical exceptions in chess (1994, close to 200 pages)
Michael Bain
Compressed postscript. A PhD thesis.
Earlier work on inductive learning applied to the King and Rook versus King
chess endgame. There's even a psychology experiment on human learning.
See the paper below.
Generalizing closed world specialization:
a chess end game application (1995, 48 pages)
Michael Bain,
Stephen Muggleton and
Ashwin Srinivasan
Compressed postscript.
The paper is about inductive logic programming, that is,
learning logic programs from data.
The authors develop an algorithm which repeatedly applies generalization and
closed-world specialization to produce domain theories
in the form of logic programs.
They test it on the King and Rook versus King chess endgame.
From simple features to sophisticated evaluation functions
(1998, 20 pages)
Michael Buro
Compressed postscript.
Describes a framework for creating evaluation functions with
semi-automatic construction of new evaluation features, generalized
from Logistello's
evaluator.
It's worth a look because the approach differs from the usual
hand-construction of features. The results would be more impressive
if they came from more domains than othello alone.
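For flavor, here's a generic sketch of the kind of framework the paper describes: a linear evaluation over boolean features, fitted by least squares, with new features built from old ones. This is my own illustration, not Buro's actual construction; the function names are hypothetical.

```python
# Generic sketch of fitting a linear evaluation over boolean features
# (the flavor of the framework, not Buro's actual construction). Rows
# of X are feature vectors extracted from labeled positions; y holds
# target scores, for example final disc differentials.

import numpy as np

def fit_evaluation(X, y):
    # Least-squares weights: evaluation(pos) = features(pos) @ w
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def conjoin(X, i, j):
    """Add a new feature as the conjunction of two existing ones --
    one simple way new features might be generated from old."""
    return np.column_stack([X, X[:, i] * X[:, j]])
```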
Opponent modeling in poker (1998)
Darse Billings,
Denis Papp,
Jonathan Schaeffer,
Duane Szafron
Web page. In poker, knowing your opponents' weaknesses is vital.
Experiments with the program Loki show that an opponent modeling method
improves play against computer opposition.
KnightCap: A chess program that learns
by combining TD(lambda) with minimax search (1997, 16 pages)
Jonathan Baxter,
Andrew Tridgell and
Lex Weaver
Compressed postscript.
Describes KnightCap and its learning algorithm, TDLeaf(lambda).
TDLeaf(lambda) is a TD(lambda) variant for programs which
search a tree and return a principal variation (the line of play
that the program expects). It amounts to using TD(lambda) on
the end positions of principal variations rather than on
the actual game positions.
The algorithm is described mathematically, so the presentation
is rather dense at points.
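To give the flavor of the algorithm, here's a minimal sketch of one TDLeaf(lambda) update over a game. It's an illustration, not KnightCap's code; the search, evaluator, and gradient are assumed to be supplied by the caller.

```python
# Minimal sketch of a TDLeaf(lambda) update (not KnightCap's code).
# The caller supplies:
#   search(pos, w)    -> leaf position of the principal variation
#   evaluate(leaf, w) -> evaluation score at that leaf
#   grad(leaf, w)     -> gradient of that score with respect to w

def tdleaf_update(positions, w, search, evaluate, grad, alpha=0.01, lam=0.7):
    leaves = [search(p, w) for p in positions]
    values = [evaluate(l, w) for l in leaves]
    # Temporal differences between successive principal-variation leaves.
    deltas = [values[t + 1] - values[t] for t in range(len(values) - 1)]
    for t in range(len(deltas)):
        # lambda-discounted sum of the remaining temporal differences.
        future = sum(lam ** (j - t) * deltas[j] for j in range(t, len(deltas)))
        w = w + alpha * grad(leaves[t], w) * future
    return w
```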
Experiments in parameter learning using temporal differences
(1998, 25 pages)
Jonathan Baxter,
Andrew Tridgell and
Lex Weaver
Postscript. This paper gives more details of TDLeaf(lambda) as
implemented in KnightCap, of the learning experiments, and of
the opening book learning.
TDLeaf(lambda): Combining temporal difference learning
with game tree search (1998, 5 pages)
Jonathan Baxter,
Andrew Tridgell and
Lex Weaver
Compressed postscript.
Restates material from the above paper more briefly, and
describes a couple of backgammon experiments using the same
algorithm.
Temporal-differences based policy iteration
and applications in neuro-dynamic programming
(1997, 19 pages)
Dimitri P. Bertsekas and
Sergey Ioffe
Postscript. There is an online abstract buried
on this page.
This largely theoretical paper describes the lambda-policy iteration
algorithm, which is based on temporal difference ideas.
It can be used for the same kinds of jobs as TD(lambda).
They implement it in a program which learns to play tetris, showing
that it succeeds where a TD(lambda) method fails.
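In outline, each iteration improves the policy greedily and then applies a lambda-weighted evaluation step. Here's a loose sketch on a tabular Markov decision process; this is my own toy formulation, not the paper's presentation.

```python
# Loose sketch of lambda-policy iteration on a tabular MDP (my own toy
# illustration, not the paper's formulation). P[a] is the transition
# matrix under action a, R[a] the expected-reward vector.

import numpy as np

def lambda_policy_iteration(P, R, gamma=0.9, lam=0.5, iters=50):
    n_actions, n_states = len(P), P[0].shape[0]
    J = np.zeros(n_states)
    for _ in range(iters):
        # Greedy policy improvement with respect to the current J.
        Q = np.array([R[a] + gamma * P[a] @ J for a in range(n_actions)])
        policy = Q.argmax(axis=0)
        P_mu = np.array([P[policy[s]][s] for s in range(n_states)])
        R_mu = np.array([R[policy[s]][s] for s in range(n_states)])
        # lambda-weighted evaluation: lam = 0 reduces to a value iteration
        # backup, lam = 1 to full evaluation of the current policy.
        td = R_mu + gamma * P_mu @ J - J
        J = J + np.linalg.solve(np.eye(n_states) - gamma * lam * P_mu, td)
    return J, policy
```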
A novel strategy acquisition mechanism for evaluation function learning
(1995, 21 pages)
Graham Bevan,
Edmund Furse
and Derek Smith
This paper disappeared, but there is still an
online abstract.
Proposes a four-layer model of position evaluation knowledge
consisting of meta strategies, strategies, concepts, and primitives.
Also see the
Learning Board Games web page for an overview of the research.
Monte Carlo go (1993)
Bernd Brügmann
Compressed TeX paper about the
Gobble go program, which uses simulated annealing.
Check go archive mirrors if you have trouble retrieving this.
Learning models of the opponent strategy in game playing
(1993, 12 pages)
David Carmel and
Shaul Markovitch
Compressed postscript. An online abstract is buried on
this page.
M* is a search algorithm which can make use of
an opponent model. Experiments using a perfect opponent model in
tic-tac-toe and checkers show
that the benefit is greatest if the learning program is stronger
than its opponent, and that it hurts to overestimate the opponent strength.
A learning program which knows the features used in its non-learning,
fixed-depth opponent's evaluation function was able to infer an accurate
model which improved its results against that opponent.
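A crude sketch of the idea of searching with an opponent model, as I'd simplify it (this is not the authors' M* algorithm, which also handles recursive models):

```python
# Crude sketch of search with a one-level opponent model (my own
# simplification, not the authors' M* algorithm). `moves` and
# `apply_move` define the game, `my_eval` scores positions for the
# searcher, and `opp_eval` is the learned model of the opponent's
# evaluation, scored from the opponent's point of view.

def model_search(pos, depth, my_turn, moves, apply_move, my_eval, opp_eval):
    succ = moves(pos)
    if depth == 0 or not succ:
        return my_eval(pos), None
    if my_turn:
        # Our move: back up our own values through the opponent model.
        scored = [(model_search(apply_move(pos, m), depth - 1, False,
                                moves, apply_move, my_eval, opp_eval)[0], m)
                  for m in succ]
        return max(scored, key=lambda vm: vm[0])
    # Opponent's move: predict it with the model, then continue below
    # that single predicted reply instead of assuming perfect play.
    predicted = max(succ, key=lambda m: opp_eval(apply_move(pos, m)))
    value, _ = model_search(apply_move(pos, predicted), depth - 1, True,
                            moves, apply_move, my_eval, opp_eval)
    return value, predicted
```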
The M* algorithm:
Incorporating opponent models into adversary search (1994, 29 pages)
David Carmel and
Shaul Markovitch
Compressed postscript. An online abstract is buried on
this page.
An expanded version of the previous paper.
Opponent modeling in a multi-agent system (1995, 6 pages)
David Carmel and
Shaul Markovitch
Compressed postscript. An online abstract is buried on
this page.
Learning a deterministic finite automaton opponent model for
a class of abstract two-player games. This works if the opponent
actually can be represented by a reasonably small DFA.
Kwizatz Haderach: The Universal Supreme Being (1996)
Bjorn Chambless,
Charlie Mauck,
Ian Romanick and
Ken Smith
Web page.
Compressed source code is also available from
this page.
A student project to build a Quixo player using genetic algorithms.
They didn't finish.
Co-evolving checkers players using only win, lose, or draw (1999)
Kumar Chellapilla and
David B. Fogel
Zipped postscript. The authors evolved neural networks to evaluate
checkers positions, testing each network in ten games per generation.
The only hand-picked network input was a piece count differential, the key
information for evaluating a checkers position. The neural players reached
intermediate playing strength.
Go and genetic programming (playing go with filter functions) (1996, 41 pages)
Stephan da Silva
Postscript, with
online abstract (this page also points to a compressed copy of the paper).
A masters thesis about learning go evaluators using genetic programming
over a class of blurring functions. I found it unimpressive.
Using experience-based learning in game playing (1988, 8 pages)
Kenneth A. De Jong and
Alan C. Schultz
Compressed postscript.
Othello program GINA builds an "experience base"
(an opening book) which enables it to quickly learn to defeat
non-learning opponents.
Evolving go playing strategy in neural networks (9 pages)
Paul Donnelly,
Patrick Corr and
Danny Crookes
Compressed postscript. Reports on "preliminary experiments".
More ideas than results.
Check go archive mirrors if you have trouble retrieving this.
The Golem go program (1991)
Herbert D. Enderton
<hde@cs.cmu.edu>
This unix shell archive unpacks into two postscript files,
a title page and a paper about a go program which uses neural
networks.
The
title page and
paper
are also available as straight postscript from this
CMU site.
Check go archive mirrors if you have trouble retrieving this.
The integration of a priori knowledge
into a go playing neural network (1996)
Markus Enzenberger
<markus.enzenberger@physik.uni-muenchen.de>
Web page. Also available in various postscript forms from
this page (8 pages).
The postscript is formatted for European A4-size paper; I had
to edit it to print it on my U.S. printer.
The NeuroGo program integrates hard-coded game knowledge with
a neural network that learns by temporal differences.
Setting parameters by example (1999, 13 pages)
David Eppstein
Abstract with the paper available in various formats.
Includes evaluation function tuning as one application of
"inverse parametric optimization," an abstract class of problems.
Collaboration and interdependence among
limitedly rational agents (1995, 6 pages)
Susan Epstein
Postscript.
Discusses the FORR "learning and problem-solving architecture" in
general terms. Hoyle and the non-game program Ariadne are instances.
For the right reasons: The FORR architecture
for learning in a skill domain (1994, 34 pages)
Susan Epstein
Postscript.
Using Hoyle as an example, argues that learning to improve at a skill can
be done by starting with a weak theory (as embodied by Hoyle's Advisors)
and gradually filling it out with domain-specific knowledge. The paper
emphasizes that Hoyle's Advisors can reach a good consensus decision even
though they are individually unreliable and may conflict with each other.
Identifying the right reasons:
Learning to filter decision makers (1994)
Susan Epstein
The paper claims to be in postscript format, but it's not. I don't know
what the format really is, but the text part can be read with an editor.
The Hoyle game-learning system learns how good the advice of each of
its Advisors (knowledge sources) is, and eliminates irrelevant
Advisors.
Prior knowledge strengthens learning to control search
in weak theory domains (1992, 53 pages)
Susan Epstein
Postscript. An early paper about Hoyle, interesting for its
descriptions of the overall goals and ideas behind Hoyle.
The role of memory and concepts in learning (1992, 42 pages)
Susan Epstein
Postscript.
The paper argues that explicit representation of concepts is necessary
for memory-efficient learning in reactive systems. Hoyle is an
example.
Toward an ideal trainer (1994, 23 pages)
Susan Epstein
Postscript.
Argues that a game learner needs "a broad variety of training experience
with play at many levels." Experiments with Hoyle support the value of
"lesson and practice training" which combines expert "lessons" (play against
an expert opponent) with self-play "practice".
Learning new spatially-oriented game-playing agents through experience
(1995, 6 pages)
Susan Epstein and
Jack Gelfand
Postscript.
An earlier paper about the same work as the below paper.
Pattern-based learning and spatially-oriented concept formation
in a multi-agent, decision-making expert (1996, 30 pages)
Susan Epstein,
Jack Gelfand and
Joanna Lesniak
Postscript.
A visual pattern recognizer is added as an advisor to the Hoyle
game-learning system. It associates small geometric patterns with
the game result.
Feature discovery for inductive concept learning (1990, 13 pages)
Tom Fawcett
<fawcett@nynexst.com>
Postscript, with an
online abstract.
The Zenith system learns an evaluation function for othello,
features and weights, given the rules of othello and an opponent
to practice against. This is an early description, see below for
later details.
A hybrid theory of feature generation (1991, 48 pages)
Tom Fawcett
<fawcett@nynexst.com>
Postscript, with an
online abstract.
More about Zenith, as in the paper above, with tons of background info.
It was also applied to a telecom network task.
Automatic feature generation for problem solving systems
(1992, 12 pages)
Tom Fawcett
<fawcett@nynexst.com> and
Paul Utgoff
Postscript, with an
online abstract.
Concentrates on Zenith's feature-discovery system.
It discovered a variety of useful features, including one novel one.
No mention is made of its playing ability.
Feature discovery for problem solving systems
(1993, more than 150 pages)
Tom Fawcett
<fawcett@nynexst.com>
Postscript, with an
online abstract.
Fawcett's PhD dissertation about Zenith (see the last few items, above).
Genetic Othello v1.02 (1999)
Itamar Faybish
Web page. A master's thesis on the straightforward application of
a genetic algorithm to optimizing a fixed set of evaluation parameters
in the game of othello. Source and Windows executable are available
for download.
Using genetic programming to evolve board evaluation functions
(1995, 6 pages)
Gabriel Ferrer and
Worthy Martin
Postscript. An abstract can be found deeply buried in the
Genetic Programming Bibliography.
A genetic programming system learns evaluation functions which are
used with one-ply lookahead to select moves in a reconstructed version
of the ancient Egyptian game of
senet,
a game remotely similar to backgammon.
The players were evaluated in each generation using a knockout tournament.
Since there are no strong modern
players, it's unknown how strong the learned evaluators are.
Using genetic programming to evolve board evaluation functions
(1996, over 65 pages)
Gabriel Ferrer
Postscript. A master's thesis.
Considers the games of
senet
and othello, as in the above paper. The othello players use alpha-beta.
Includes some experiments on population seeding with handwritten
evaluators.
Learning approximately-admissible heuristics in two-player games
(1994)
Evan Fletcher and
Armand Prieditis
Compressed postscript.
Theoretically-motivated work on evaluation function learning,
from the
UC Davis AI Lab.
On the automatic generation of case libraries by chunking chess games
(1995?, 10 pages)
Stephen Flinter and Mark T. Keane
Postscript.
The unfinished TAL chess program will consist of a Library of chess
knowledge, constructed automatically from games
and used for case-based reasoning,
and a Game-Playing component.
This paper is about the Library; the Game-Playing component
isn't written yet.
Efficient algorithms for learning to play repeated games
against computationally bounded adversaries (1995, 10 pages)
Yoav Freund,
Michael Kearns,
Yishay Mansour,
Dana Ron,
Ronitt Rubinfeld and
Robert Schapire
Compressed postscript, with
online abstract.
Theoretical results on learning to play against certain classes
of simple, imperfect opponents.
Machine learning in computer chess:
The next generation (1996, 13 pages)
Johannes Fürnkranz
Compressed postscript, with
online abstract.
A wide-ranging review of machine learning applied to computer chess.
The methods used are categorized as induction of endgame classifiers,
explanation-based learning, case-based reasoning, and evaluation
function learning.
Knowledge discovery in chess databases: A research proposal
(1997, 8 pages)
Johannes Fürnkranz
Compressed postscript.
Proposes that data mining in chess databases
would be a fruitful line of inquiry.
Learning and improving backgammon strategy
Greg Galperin
<grg@ai.mit.edu>
Web page, from
Center for Biological and Computational Learning.
A parallel computer plays backgammon by rolling out positions in parallel.
Pragmatic reasoning in bridge (1993, 23 pages)
Björn Gambäck,
Manny Rayner and
Barney Pell
Compressed postscript, with an
online abstract.
Includes information about a learning bridge program.
Also available from
this SRI International Cambridge location.
Chunking for experience (1990, 13 pages)
Michael George and
Jonathan Schaeffer
Postscript, from
this page of Schaeffer's.
MACII is a case-based reasoning program for the game of chess.
It uses chunking and a
similarity method to find positions in its case-base similar
to the current position. It can provide advice to a performance
chess program. Informal tests with the chess program Phoenix
suggested that MACII helped Phoenix avoid making errors immediately
after leaving its opening book, a common computer weakness.
A machine learning approach to the game of go (1999, 7 pages)
Thore Graepel
Online abstract (the postscript link was broken when I tried it).
Suggests that machine learning is a good way to create a strong go program.
Proposes a board representation, the Common Fate Graph (CFG), which
captures what points on the board will share a common fate at the end
of the game and can be used with temporal difference learning.
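To illustrate the representation: stones connected in a chain live or die together, so each chain can collapse to a single graph node. Here's a toy construction; this is my reading of the idea, and the paper's exact construction may differ.

```python
# Toy common-fate-style graph for go: stones in the same chain share
# their fate, so each chain collapses into one node, while empty points
# stay separate. (An illustration only; the paper's exact construction
# may differ.) `board` maps (x, y) -> 'b', 'w' or '.'.

def common_fate_graph(board):
    parent = {p: p for p in board}

    def find(p):                       # union-find with path halving
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    def neighbors(p):
        x, y = p
        return [q for q in ((x+1, y), (x-1, y), (x, y+1), (x, y-1))
                if q in board]

    # Merge adjacent stones of the same color into one chain node.
    for p, color in board.items():
        if color == '.':
            continue
        for q in neighbors(p):
            if board[q] == color:
                parent[find(p)] = find(q)

    # Edges of the collapsed graph.
    edges = set()
    for p in board:
        for q in neighbors(p):
            a, b = find(p), find(q)
            if a != b:
                edges.add(frozenset((a, b)))
    return edges
```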
Go, SVM, go (2000, 7 pages)
Thore Graepel,
Mike Goutrie, Marco Krüger, and Ralf Herbrich
Online abstract with compressed postscript.
They trained support vector machines (SVM) and kernel perceptrons
to solve go problems (tsume go) and play 9x9 go.
For 9x9 go playing, the training data consisted of games by strong humans
played on the Internet Go Server. In two sample games against gnugo,
the SVM program lost both.
NeuroDraughts: the role of representation, search, training regime,
and architecture in a TD draughts player (1997, 8 pages)
Niall Griffith and
Mark Lynch <mark@did.com>
Compressed postscript.
Discusses the effects of various factors on the strength of the
learned player, with experimental data. There's a longer report
on NeuroDraughts by Mark Lynch alone, listed below.
An adaptive video game using the method of temporal differences
(1996)
Tim Groth
Text file. A student project, Neuro-Pong.
The main goal was to make the game player an interesting
opponent for humans.
A simple adaptive procedure leading to correlated equilibrium
(1997)
Sergiu Hart and
Andreu Mas-Colell
Web page, with online abstract and links to postscript.
Also available from
EconWPA.
A game theoretic analysis showing that a simple learning algorithm
converges to correlated equilibrium.
Unfortunately, real learning programs don't have enough information
to use this algorithm, so it's difficult to apply in practice.
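The procedure is usually summarized as regret matching: play each action with probability proportional to how much you regret not having played it all along. A minimal sketch follows; the payoff interface is an assumption of mine, and it's exactly the counterfactual information that real game programs lack.

```python
# Minimal sketch of regret matching, the usual summary of the paper's
# adaptive procedure. payoff(a) is assumed to return this round's
# payoff had we played a, with the other players' moves held fixed --
# needing that counterfactual information is the practical obstacle
# noted above.

import random

def regret_matching(actions, payoff_per_round, rounds=10000):
    regret = {a: 0.0 for a in actions}
    for _ in range(rounds):
        payoff = payoff_per_round()      # fresh round: payoff(a) for any a
        positive = [max(regret[a], 0.0) for a in actions]
        if sum(positive) > 0:
            played = random.choices(actions, weights=positive)[0]
        else:
            played = random.choice(actions)  # no regret yet: play anything
        got = payoff(played)
        for a in actions:                # regret of not always playing a
            regret[a] += payoff(a) - got
    return regret
```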
Bootstrap learning of alpha-beta-evaluation functions
(1993, 5 pages)
Alois Heinz and Christoph Hense
Postscript.
Proposes learning evaluation functions in the form of
decision trees which
can use internal alpha-beta cutoffs to speed calculation
(taking the alpha and beta values from the search tree).
The authors trained progressively stronger evaluators
for the game of malawi, using each player to produce
training data for its successor, with good results.
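Loosely, the mechanism might look like this (my reconstruction of the general idea, not the authors' code): each tree node stores the least and greatest leaf value beneath it, so evaluation can stop as soon as the result is known to fall outside the search window.

```python
# Loose sketch of an evaluator with internal alpha-beta cutoffs (my
# reconstruction of the mechanism, not the authors' code).

class Node:
    def __init__(self, test=None, low=None, high=None, value=None,
                 lo_bound=None, hi_bound=None):
        self.test = test            # predicate on positions; None at leaves
        self.low, self.high = low, high   # subtrees for test False / True
        self.value = value          # leaf value; None at internal nodes
        # Smallest / largest leaf value under this node; internal nodes
        # must supply them, leaves default to their own value.
        self.lo_bound = value if lo_bound is None else lo_bound
        self.hi_bound = value if hi_bound is None else hi_bound

def evaluate(node, pos, alpha, beta):
    if node.hi_bound <= alpha:      # can't rise above alpha: fail low
        return alpha
    if node.lo_bound >= beta:       # can't fall below beta: fail high
        return beta
    if node.value is not None:      # reached a leaf
        return node.value
    branch = node.high if node.test(pos) else node.low
    return evaluate(branch, pos, alpha, beta)
```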
Efficient neural net alpha-beta-evaluators
(1994, 4 pages)
Alois Heinz
Postscript. Converts the decision tree evaluators of the previous
paper into tree-shaped networks, which can be further trained to
increase accuracy. The networks retain the tree property that not
all nodes need be evaluated to find the output value, so they're
efficient.
Lernen von Klassifikationsbäumen für Spielsituationen
(1992, over 75 pages)
Christoph Hense
Compressed postscript. Written in German.
A
graduate thesis from the
University of Freiburg.
The title means "learning of classification trees for game
positions". The thesis includes work on the game of malawi.
Introduction to learning in games
Geoff Howland <ghowland@lupinegames.com>
A short web page about simple forms of opponent modeling for
commercial computer games. You can get a quick view of the
general state of the art.
A grandmaster chess machine (1990)
Feng-hsiung Hsu, Thomas Anantharaman, Murray Campbell, and
Andreas Nowatzyk
Web page. The famous Scientific American article about
the Deep Thought chess program.
This contains the only online discussion I know about of Deep Thought's
evaluation parameter learning algorithm.
Unsupervised learning of go patterns (1999)
Antti Huima
Web page. Vector Quantization (VQ) for learning patterns with
suggested moves in the game of go.
Explorations of the practical issues of learning prediction-control tasks
using temporal difference learning methods (1992, over 70 pages)
Charles L. Isbell
Compressed postscript. A master's thesis.
Explores the temporal difference algorithm TD(lambda) with radial basis
function neural networks on two tasks, one of which is tic-tac-toe.
Bottom-Up Recognition Learning:
A compilation-based model of limited-lookahead learning
(1994)
Todd Johnson,
Jiajie Zhang and
Hongbin Wang
Compressed postscript, from this
ftp site.
BURL, a psychological model of game learning.
It was implemented in SOAR and applied to tic-tac-toe.
Genetische Algorithmen zur Bestimmung heuristischer Bewertungsfunktionen
bei Spielen (1992, over 100 pages)
Dorothee Kümmerlin
Compressed postscript. Written in German.
A
graduate thesis from the
University of Freiburg.
The title means "genetic algorithms for the determination
of heuristic evaluation functions in games".
The thesis includes work on the game of reversi, of which
othello is a recent variant.
Lernen von Bewertungsfunktionen mittels Adaptiver Logischer Netzwerke
(1994, over 60 pages)
Marc Leineweber
Compressed postscript. Written in German. There is an
online abstract (also in German). It is a
graduate thesis from the
University of Freiburg.
The title means "learning evaluation functions using
Adaptive Logical Networks".
The game is nine men's morris.
Learning of position evaluation in the game of othello
(1995)
Anton Leouski
Postscript, with an
online abstract.
I have not been able to view or print this paper; I suspect that the
postscript may be faulty.
The abstract reports that a neural network reached intermediate
strength at othello by self-play using temporal differences.
What a neural network can learn about othello (1996)
Anton Leouski and
Paul Utgoff
Web page, with
online abstract.
Also available
as postscript.
Compares three neural network architectures in the task of
learning othello evaluation functions by temporal differences,
and analyzes what the networks learned.
Markov games as a framework for multi-agent reinforcement learning
(1994, 7 pages)
Michael L. Littman
Postscript, with
online abstract.
Extends reinforcement learning theory with game theory.
Demonstrates learning of a simplified soccer-like game
played on a 4x5 grid.
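The paper's algorithm is known as minimax-Q. Here's a simplified sketch restricted to pure strategies; the real algorithm finds the best mixed strategy with a linear program.

```python
# Simplified sketch of the minimax-Q idea from the paper: Q-learning
# in which a state's value is the worst case over the opponent's
# reply. The real algorithm optimizes over mixed strategies with a
# linear program; pure strategies keep this sketch short.

from collections import defaultdict

Q = defaultdict(lambda: defaultdict(float))  # Q[state][(my_move, opp_move)]

def minimax_q_update(Q, s, a, o, reward, s2, actions, opp_actions,
                     alpha=0.1, gamma=0.9):
    """Update after we played a and the opponent played o, reaching s2."""
    # Pure-strategy maximin value of the successor state.
    v2 = max(min(Q[s2][(a2, o2)] for o2 in opp_actions) for a2 in actions)
    Q[s][(a, o)] += alpha * (reward + gamma * v2 - Q[s][(a, o)])
```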
A generalized reinforcement-learning model: Convergence and applications
(1996, 9 pages)
Michael L. Littman and
Csaba Szepesvári
Postscript, with
online abstract.
Presents a generalization of the Markov decision process, with
special reference to Q-learning and two-player games.
Derivative evaluation function learning using genetic operators
(1993, 16 pages)
David H. Lorenz and
Shaul Markovitch
Compressed postscript.
An online abstract is buried on
this page.
Learning the derivative of the evaluation function.
Tested with the game of checkers.
NeuroDraughts: an application of temporal difference learning to draughts
(1997, 41 pages)
Mark Lynch <mark@did.com>
Compressed postscript.
A checkers program whose evaluator is a neural network trained
by temporal differences, using self-play.
The paper also serves as a manual for users of the software; see the
online software
page. There's a shorter paper about NeuroDraughts with first author
Niall Griffith, listed above.
Incorporating advice into agents that learn from reinforcements
(1994, 29 pages)
Richard Maclin and
Jude W. Shavlik
Postscript. Also available as
compressed postscript.
The RATLE system is a neural network which learns on its own using
Q-learning. It translates user advice into extra network nodes,
which then participate in learning like the other nodes.
It was tested on a video game-like task.
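The flavor of the advice-to-nodes translation, sketched KBANN-style (my illustration of the general technique this line of work builds on, not RATLE's implementation):

```python
# Sketch of translating one piece of IF-THEN advice into a network
# unit, KBANN-style (my illustration, not RATLE's implementation).
# Advice like "IF f1 AND f3 THEN ..." becomes a sigmoid unit wired to
# fire only when both features are on; after insertion its weights
# keep training like any other unit's.

import numpy as np

def advice_unit(n_features, condition_indices, w_big=4.0):
    """Initial weights and bias for a unit computing an AND of features."""
    w = np.zeros(n_features)
    w[list(condition_indices)] = w_big
    # Bias places the threshold between "all conditions true" and
    # "one condition false", for 0/1 feature values.
    bias = -w_big * (len(condition_indices) - 0.5)
    return w, bias

def activation(features, w, bias):
    return 1.0 / (1.0 + np.exp(-(w @ features + bias)))
```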
Incorporating advice into agents that learn from reinforcements
(1994, 6 pages)
Richard Maclin and
Jude W. Shavlik
Postscript. A shorter version of the preceding paper with the same title.
Learning of resource allocation strategies for game playing
(1993, 16 pages)
Shaul Markovitch and Yaron Sella
Compressed postscript.
An abstract is buried on
this page.
Learning how much time to spend on each move.
Tested with the game of checkers.
The learning method is less sophisticated than
the non-learning methods currently employed by chess programs.
Tuning evaluation functions for search (6 pages)
Chris McConnell
Postscript. This appears to be an extended abstract rather than a paper.
Mentions a new technique for estimating evaluation function parameters
from a database of human games with some extra annotation.
Kinglet (1998)
Chris Moreton
Web page. A genetic algorithm plays the game of Kinglet, a chess variant.
Learning to play games from experience:
An application of artificial neural
networks and temporal difference learning (1993, over 70 pages)
Daniel Kenneth Olson
Compressed postscript. A master's thesis.
Learning blackjack and tic-tac-toe with temporal differences,
keeping the goal of a more general learning system in sight.
Exploratory learning in the game of go (1992)
Barney Pell
Compressed postscript, with an
online abstract.
Trading off strong moves against exploratory moves.
Skilled like a human:
A comparison of human and computer game playing (1995, 6 pages)
Mary Jo Rattermann and
Susan Epstein
Postscript.
A psychological experiment comparing Hoyle with human game players.
"The results of the experiment support our conjecture that Hoyle
is a plausible model of human game playing."
Evolving neural networks to play go (1996, 18 pages)
Norman Richards
Postscript. Applying the
SANE neurogenetic algorithm
to 9x9 go, using knowledge-free networks.
The player learned to defeat Wally, even with a two-stone handicap.
I think similar approaches using knowledge are promising.
Evolving neural networks to play go (1997, 8 pages)
Norman Richards,
David Moriarty,
Paul McQuesten and
Risto Miikkulainen
Postscript.
A shorter, more polished presentation of the above work.
Competitive learning (1994, 2 pages)
Christopher D. Rosin and
Richard K. Belew
Postscript. Defines the framework of competitive learning,
in which the only knowledge you have of the quality of solutions
to a problem comes from competitions which show one solution
to be better than another. This is a natural framework
for game learning.
Methods for competitive co-evolution:
Finding opponents worth beating (1995, 11 pages)
Christopher D. Rosin and
Richard K. Belew
Postscript. Proposes the techniques of
competitive fitness sharing and shared opponent sampling
to maintain population diversity in a coevolutionary
genetic algorithm. The methods are tested on tic-tac-toe
and nim, and then tried out with cellular automata
playing 7x7 go, which became strong enough to defeat Wally,
a weak public domain program.
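A minimal reading of competitive fitness sharing (the details here are my interpretation): an individual earns more credit for beating opponents that few others in the population can beat, which rewards rare skills and preserves diversity.

```python
# Sketch of competitive fitness sharing (a minimal reading of the
# technique; the paper's definition may differ in detail).

def shared_fitness(population, opponents, beats):
    """beats(i, j) -> True if individual i defeats opponent j."""
    # How many population members defeat each opponent.
    defeats = {j: sum(beats(i, j) for i in population) for j in opponents}
    # Credit for each win is split among everyone who achieves it.
    return {i: sum(1.0 / defeats[j] for j in opponents if beats(i, j))
            for i in population}
```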
A competitive approach to game learning (1996, 10 pages)
Christopher D. Rosin and
Richard K. Belew
Postscript. A theoretical analysis of game learning, with the goal
of identifying polynomial-time algorithms guaranteed to find perfect
game strategies.
The analytic framework separates the problem into strategy learning, which
comes up with new game strategies, and competitive learning
(see "Competitive Learning", above),
which compares strategies.
New methods for competitive co-evolution (1996, 25 pages)
Christopher D. Rosin and
Richard K. Belew
Postscript.
Improves competitive coevolution with
fitness sharing (to promote diversity),
sample sharing (to maintain diversity while reducing evaluation costs),
and a "hall of fame" that retains a sample of old champions
to ensure continuing improvement.
The experiments are informative, with lots of data.
Coevolutionary search among adversaries (1997, 214 pages)
Christopher D. Rosin
Compressed postscript.
A PhD thesis on the same topics as the papers above,
with additional background, details, and results.
Includes experiments with the games of tic-tac-toe,
a 3D tic-tac-toe variant, nim, and go.
The go experiment was far more sophisticated than the
one reported in "Finding opponents worth beating", above,
and I found it very interesting.
GPRobots and GPTeams -
competition, co-evolution and co-operation in genetic programming
(1995, 8 pages)
Conor Ryan
Compressed postscript.
Proposes a robot programming game as a benchmark for
genetic programming. The results are not too impressive.
Neural networks and chess (1993, 41 pages)
Martin Schmidt
Compressed postscript, from
this ftp directory.
Neural nets learned absolute or relative evaluations of chess positions
using a cascade correlation-type algorithm. The networks played
using a selective search which is not fully described here.
The presentation is poor.
Temporal difference learning and chess (1994, over 100 pages)
Martin Schmidt
Compressed archive of twelve postscript files, from
this ftp directory. There is an
online abstract and a
guide to the contents of the archive.
Systematic, though sparse, tests of variations of TD(lambda) with
knowledge-free backprop neural nets,
starting with simple chess endgames
and working up to full chess games.
Extensive charts of raw results.
Markybot (1997)
Jay Scott
Web page. The program plays a financial trading game, relying
on a learned statistical model of price fluctuations.
An engine for a Go playing computer program (1999)
Peter Smith <peter.smith@smallworld.co.uk>
Web page with annoying frames. You can download a Windows 95 executable
of a simple program, tarkus.
A go program with evaluation components optimized by simulated annealing.
An analysis of forward pruning (1994, 6 pages)
Stephen J.J. Smith and
Dana S. Nau
Postscript. Models a weak form of selective search.
Strategic planning for imperfect-information games (1993, 8 pages)
Stephen J.J. Smith and
Dana S. Nau
Postscript.
A planning method used in Tignum, a prototype bridge program.
Artificial neural nets applied to strategic games (1996, 20 pages)
Peer Sommerlund
Compressed postscript, with
online abstract.
A brief explanation of neural nets, a few examples of how they've been
applied to evaluation function learning, and a set of final experiments
with the game connect-four. The outlook is more practical than academic.
The conclusion: "neural nets do not belong in a program that plays
connect-four."
Machine learning, game play, and go (1991, 90 pages)
David Stoutamire
Compressed postscript. An extract from a master's thesis.
Check go archive mirrors if you have trouble retrieving this.
Multi-criteria reinforcement learning (1997, 13 pages)
Csaba Szepesvári
Compressed postscript. A theoretical paper exploring a class
of reinforcement learning algorithms, with small experiments
using the game of tic-tac-toe.
A game player with two ordered goals, (1) play winning moves and
(2) win quickly rather than slowly, learned to play a higher
proportion of winning moves than a player with only goal (1).
Automated feature selection to maximize learning
in artificial intelligence (1996)
Joseph Turian
Web page. There is an
online abstract with pointers to other versions of the paper.
Using the game of backgammon, tests two methods of adding
new, randomly-generated features to neural networks.
The goal is to raise the level at which learning plateaus.
Evolution of neural networks to play the game of dots-and-boxes
(1996, 8 pages)
Lex Weaver and
Terry Bossomaier
Postscript. Here is
another copy of the paper with online abstract.
Tries both direct evolution and co-evolution schemes,
and tests the players against simple benchmark strategies and
backprop-trained networks.
How hard is the correct coding of an easy endgame
(1994, 11 pages)
Jean-Christophe Weill
Compressed postscript.
The author, a non-chessplayer, coded predicates which
allowed the decision tree induction algorithm ID3 to classify
queen-and-king versus queen-and-king chess endgame positions
as won or drawn.
Programmes d'échecs de championnat:
Architecture logicielle,
synthèse de fonctions d'évaluation,
parallélisme de recherche
(1995, close to 200 pages)
Jean-Christophe Weill
Compressed postscript.
A long PhD thesis, written in French.
The title means "Championship chess programs: software
architecture, synthesis of evaluation functions,
search parallelism".
Describes the chess programs Cumulus and Ecume. There is a
short discussion of machine learning of evaluation functions.
TD learning of game evaluation functions
with hierarchical neural architectures
(1995, about 85 pages)
Marco A. Wiering
Compressed postscript, from
this publications page. A master's thesis that can be seen as
an extension of Justin Boyan's work (see MVP Backgammon on
other backgammon programs).
The author used a temporal difference method to train modular neural networks
to play tic-tac-toe and backgammon endgames.
Adaptive critic design in learning to play game of go
(1997)
Raonak Zaman,
Danil Prokhorov,
and
Donald C. Wunsch II
Web page.
The authors trained a neural network, using what they call a
Heuristic Dynamic Programming (HDP) adaptive critic.
After 6000 training games, the network had learned to defeat
Wally, a weak free program, 1% of the time.