Here's every online paper I know of which isn't mentioned elsewhere on the web site. If yours should be here, let me know about it.
Evolutionary induction of subroutines (1992, 6 pages)
Peter J. Angeline and
Jordan Pollack
Compressed postscript.
A genetic programming method which includes learning of subroutines
is tested on a tic-tac-toe task.
Coevolving high-level representations (1993, 15 pages)
Peter J. Angeline and
Jordan Pollack
Compressed postscript.
An expanded version of the above paper, "Evolutionary induction of
subroutines," with new results.
Here, the subroutines are described as "modules".
Competitive environments evolve better solutions for complex tasks
(1993, 8 pages)
Peter J. Angeline and
Jordan Pollack
Compressed postscript.
Advocates competitive learning for genetic algorithms.
In an experiment, tic-tac-toe players evolved by genetic programming
were more robust (though not stronger) when a fitness function was
defined by tournament play rather than by play against a fixed opponent.
Learning logical exceptions in chess (1994, close to 200 pages)
Michael Bain
Compressed postscript. A PhD thesis.
Earlier work on inductive learning applied to the King and Rook versus King
chess endgame. There's even a psychology experiment on human learning.
See the paper below.
Generalizing closed world specialization:
a chess end game application (1995, 48 pages)
Michael Bain,
Stephen Muggleton and
Ashwin Srinivasan
Compressed postscript.
The paper is about inductive logic programming, that is,
learning logic programs from data.
The authors develop an algorithm which repeatedly applies generalization and
closed-world specialization to produce domain theories
in the form of logic programs.
They test it on the King and Rook versus King chess endgame.
From simple features to sophisticated evaluation functions
(1998, 20 pages)
Michael Buro
Compressed postscript.
Describes a framework for creating evaluation functions with
semi-automatic construction of new evaluation features, generalized
from Logistello's
evaluator.
It's worth a look because the approach differs from the usual
hand-construction of features. The results would be more impressive
if they came from more domains than othello alone.
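For flavor, here's a generic sketch of the kind of framework the paper describes: a linear evaluation over boolean features, fitted by least squares, with new features built from old ones. This is my own illustration, not Buro's actual construction; the function names are hypothetical.

```python
# Generic sketch of fitting a linear evaluation over boolean features
# (the flavor of the framework, not Buro's actual construction). Rows
# of X are feature vectors extracted from labeled positions; y holds
# target scores, for example final disc differentials.

import numpy as np

def fit_evaluation(X, y):
    # Least-squares weights: evaluation(pos) = features(pos) @ w
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def conjoin(X, i, j):
    """Add a new feature as the conjunction of two existing ones --
    one simple way new features might be generated from old."""
    return np.column_stack([X, X[:, i] * X[:, j]])
```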
Opponent modeling in poker (1998)
Darse Billings,
Denis Papp,
Jonathan Schaeffer,
Duane Szafron
Web page. In poker, knowing your opponents' weaknesses is vital.
Experiments with the program Loki show that an opponent modeling method
improves play against computer opposition.
KnightCap: A chess program that learns
by combining TD(lambda) with minimax search (1997, 16 pages)
Jonathan Baxter,
Andrew Tridgell and
Lex Weaver
Compressed postscript.
Describes KnightCap and its learning algorithm, TDLeaf(lambda).
TDLeaf(lambda) is a TD(lambda) variant for programs which
search a tree and return a principal variation (the line of play
that the program expects). It amounts to using TD(lambda) on
the end positions of principal variations rather than on
the actual game positions.
The algorithm is described mathematically, so the presentation
is rather dense at points.
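To give the flavor of the algorithm, here's a minimal sketch of one TDLeaf(lambda) update over a game. It's an illustration, not KnightCap's code; the search, evaluator, and gradient are assumed to be supplied by the caller.

```python
# Minimal sketch of a TDLeaf(lambda) update (not KnightCap's code).
# The caller supplies:
#   search(pos, w)    -> leaf position of the principal variation
#   evaluate(leaf, w) -> evaluation score at that leaf
#   grad(leaf, w)     -> gradient of that score with respect to w

def tdleaf_update(positions, w, search, evaluate, grad, alpha=0.01, lam=0.7):
    leaves = [search(p, w) for p in positions]
    values = [evaluate(l, w) for l in leaves]
    # Temporal differences between successive principal-variation leaves.
    deltas = [values[t + 1] - values[t] for t in range(len(values) - 1)]
    for t in range(len(deltas)):
        # lambda-discounted sum of the remaining temporal differences.
        future = sum(lam ** (j - t) * deltas[j] for j in range(t, len(deltas)))
        w = w + alpha * grad(leaves[t], w) * future
    return w
```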
Experiments in parameter learning using temporal differences
(1998, 25 pages)
Jonathan Baxter,
Andrew Tridgell and
Lex Weaver
Postscript. This paper gives more details of TDLeaf(lambda) as
implemented in KnightCap, of the learning experiments, and of
the opening book learning.
TDLeaf(lambda): Combining temporal difference learning
with game tree search (1998, 5 pages)
Jonathan Baxter,
Andrew Tridgell and
Lex Weaver
Compressed postscript.
Restates material from the above paper more briefly, and
describes a couple of backgammon experiments using the same
algorithm.
Temporal-differences based policy iteration
and applications in neuro-dynamic programming
(1997, 19 pages)
Dimitri P. Bertsekas and
Sergey Ioffe
Postscript. There is an online abstract buried
on this page.
This largely theoretical paper describes the lambda-policy iteration
algorithm, which is based on temporal difference ideas.
It can be used for the same kinds of jobs as TD(lambda).
They implement it in a program which learns to play tetris, showing
that it succeeds where a TD(lambda) method fails.
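In outline, each iteration improves the policy greedily and then applies a lambda-weighted evaluation step. Here's a loose sketch on a tabular Markov decision process; this is my own toy formulation, not the paper's presentation.

```python
# Loose sketch of lambda-policy iteration on a tabular MDP (my own toy
# illustration, not the paper's formulation). P[a] is the transition
# matrix under action a, R[a] the expected-reward vector.

import numpy as np

def lambda_policy_iteration(P, R, gamma=0.9, lam=0.5, iters=50):
    n_actions, n_states = len(P), P[0].shape[0]
    J = np.zeros(n_states)
    for _ in range(iters):
        # Greedy policy improvement with respect to the current J.
        Q = np.array([R[a] + gamma * P[a] @ J for a in range(n_actions)])
        policy = Q.argmax(axis=0)
        P_mu = np.array([P[policy[s]][s] for s in range(n_states)])
        R_mu = np.array([R[policy[s]][s] for s in range(n_states)])
        # lambda-weighted evaluation: lam = 0 reduces to a value iteration
        # backup, lam = 1 to full evaluation of the current policy.
        td = R_mu + gamma * P_mu @ J - J
        J = J + np.linalg.solve(np.eye(n_states) - gamma * lam * P_mu, td)
    return J, policy
```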
A novel strategy acquisition mechanism for evaluation function learning
(1995, 21 pages)
Graham Bevan,
Edmund Furse
and Derek Smith
This paper disappeared, but there is still an
online abstract.
Proposes a four-layer model of position evaluation knowledge
consisting of meta strategies, strategies, concepts, and primitives.
Also see the
Learning Board Games web page for an overview of the research.
Monte Carlo go (1993)
Bernd Brügmann
Compressed TeX paper about the
Gobble go program, which uses simulated annealing.
Check go archive mirrors if you have trouble retrieving this.
Learning models of the opponent strategy in game playing
(1993, 12 pages)
David Carmel and
Shaul Markovitch
Compressed postscript. An online abstract is buried on
this page.
M* is a search algorithm which can make use of
an opponent model. Experiments using a perfect opponent model in
tic-tac-toe and checkers show
that the benefit is greatest if the learning program is stronger
than its opponent, and that it hurts to overestimate the opponent strength.
A learning program which knows the features used in its non-learning,
fixed-depth opponent's evaluation function was able to infer an accurate
model which improved its results against that opponent.
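A crude sketch of the idea of searching with an opponent model, as I'd simplify it (this is not the authors' M* algorithm, which also handles recursive models):

```python
# Crude sketch of search with a one-level opponent model (my own
# simplification, not the authors' M* algorithm). `moves` and
# `apply_move` define the game, `my_eval` scores positions for the
# searcher, and `opp_eval` is the learned model of the opponent's
# evaluation, scored from the opponent's point of view.

def model_search(pos, depth, my_turn, moves, apply_move, my_eval, opp_eval):
    succ = moves(pos)
    if depth == 0 or not succ:
        return my_eval(pos), None
    if my_turn:
        # Our move: back up our own values through the opponent model.
        scored = [(model_search(apply_move(pos, m), depth - 1, False,
                                moves, apply_move, my_eval, opp_eval)[0], m)
                  for m in succ]
        return max(scored, key=lambda vm: vm[0])
    # Opponent's move: predict it with the model, then continue below
    # that single predicted reply instead of assuming perfect play.
    predicted = max(succ, key=lambda m: opp_eval(apply_move(pos, m)))
    value, _ = model_search(apply_move(pos, predicted), depth - 1, True,
                            moves, apply_move, my_eval, opp_eval)
    return value, predicted
```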
The M* algorithm:
Incorporating opponent models into adversary search (1994, 29 pages)
David Carmel and
Shaul Markovitch
Compressed postscript. An online abstract is buried on
this page.
An expanded version of the previous paper.
Opponent modeling in a multi-agent system (1995, 6 pages)
David Carmel and
Shaul Markovitch
Compressed postscript. An online abstract is buried on
this page.
Learning a deterministic finite automaton opponent model for
a class of abstract two-player games. This works if the opponent
actually can be represented by a reasonably small DFA.
Kwizatz Haderach: The Universal Supreme Being (1996)
Bjorn Chambless,
Charlie Mauck,
Ian Romanick and
Ken Smith
Web page.
Compressed source code is also available from
this page.
A student project to build a Quixo player using genetic algorithms.
They didn't finish.
Co-evolving checkers players using only win, lose, or draw (1999)
Kumar Chellapilla and
David B. Fogel
Zipped postscript. The authors evolved neural networks to evaluate
checkers positions, testing each network in ten games per generation.
The only hand-picked network input was a piece count differential, the key
information for evaluating a checkers position. The neural players reached
intermediate playing strength.
Go and genetic programming (playing go with filter functions) (1996, 41 pages)
Stephan da Silva
Postscript, with
online abstract (this page also points to a compressed copy of the paper).
A masters thesis about learning go evaluators using genetic programming
over a class of blurring functions. I found it unimpressive.
Using experience-based learning in game playing (1988, 8 pages)
Kenneth A. De Jong and
Alan C. Schultz
Compressed postscript.
Othello program GINA builds an "experience base"
(an opening book) which enables it to quickly learn to defeat
non-learning opponents.
Evolving go playing strategy in neural networks (9 pages)
Paul Donnelly,
Patrick Corr and
Danny Crookes
Compressed postscript. Reports on "preliminary experiments".
More ideas than results.
Check go archive mirrors if you have trouble retrieving this.
The Golem go program (1991)
Herbert D. Enderton
<hde@cs.cmu.edu>
This unix shell archive unpacks into two postscript files,
a title page and a paper about a go program which uses neural
networks.
The
title page and
paper
are also available as straight postscript from this
CMU site.
Check go archive mirrors if you have trouble retrieving this.
The integration of a priori knowledge
into a go playing neural network (1996)
Markus Enzenberger
<markus.enzenberger@physik.uni-muenchen.de>
Web page. Also available in various postscript forms from
this page (8 pages).
The postscript is formatted for European A4-size paper; I had
to edit it to print it on my U.S. printer.
The NeuroGo program integrates hard-coded game knowledge with
a neural network that learns by temporal differences.
Setting parameters by example (1999, 13 pages)
David Eppstein
Abstract with the paper available in various formats.
Includes evaluation function tuning as one application of
"inverse parametric optimization," an abstract class of problems.
Collaboration and interdependence among
limitedly rational agents (1995, 6 pages)
Susan Epstein
Postscript.
Discusses the FORR "learning and problem-solving architecture" in
general terms. Hoyle and the non-game program Ariadne are instances.
For the right reasons: The FORR architecture
for learning in a skill domain (1994, 34 pages)
Susan Epstein
Postscript.
Using Hoyle as an example, argues that learning to improve at a skill can
be done by starting with a weak theory (as embodied by Hoyle's Advisors)
and gradually filling it out with domain-specific knowledge. The paper
emphasizes that Hoyle's Advisors can reach a good consensus decision even
though they are individually unreliable and may conflict with each other.
Identifying the right reasons:
Learning to filter decision makers (1994)
Susan Epstein
The paper claims to be in postscript format, but it's not. I don't know
what the format really is, but the text part can be read with an editor.
The Hoyle game-learning system learns how good the advice of each of
its Advisors (knowledge sources) is, and eliminates irrelevant
Advisors.
Prior knowledge strengthens learning to control search
in weak theory domains (1992, 53 pages)
Susan Epstein
Postscript. An early paper about Hoyle, interesting for its
descriptions of the overall goals and ideas behind Hoyle.
The role of memory and concepts in learning (1992, 42 pages)
Susan Epstein
Postscript.
The paper argues that explicit representation of concepts is necessary
for memory-efficient learning in reactive systems. Hoyle is an
example.
Toward an ideal trainer (1994, 23 pages)
Susan Epstein
Postscript.
Argues that a game learner needs "a broad variety of training experience
with play at many levels." Experiments with Hoyle support the value of
"lesson and practice training" which combines expert "lessons" (play against
an expert opponent) with self-play "practice".
Learning new spatially-oriented game-playing agents through experience
(1995, 6 pages)
Susan Epstein and
Jack Gelfand
Postscript.
An earlier paper about the same work as the below paper.
Pattern-based learning and spatially-oriented concept formation
in a multi-agent, decision-making expert (1996, 30 pages)
Susan Epstein,
Jack Gelfand and
Joanna Lesniak
Postscript.
A visual pattern recognizer is added as an advisor to the Hoyle
game-learning system. It associates small geometric patterns with
the game result.
Feature discovery for inductive concept learning (1990, 13 pages)
Tom Fawcett
<fawcett@nynexst.com>
Postscript, with an
online abstract.
The Zenith system learns an evaluation function for othello,
features and weights, given the rules of othello and an opponent
to practice against. This is an early description, see below for
later details.
A hybrid theory of feature generation (1991, 48 pages)
Tom Fawcett
<fawcett@nynexst.com>
Postscript, with an
online abstract.
More about Zenith, as in the paper above, with tons of background info.
It was also applied to a telecom network task.
Automatic feature generation for problem solving systems
(1992, 12 pages)
Tom Fawcett
<fawcett@nynexst.com> and
Paul Utgoff
Postscript, with an
online abstract.
Concentrates on Zenith's feature-discovery system.
It discovered a variety of useful features, including one novel one.
No mention is made of its playing ability.
Feature discovery for problem solving systems
(1993, more than 150 pages)
Tom Fawcett
<fawcett@nynexst.com>
Postscript, with an
online abstract.
Fawcett's PhD dissertation about Zenith (see the last few items, above).
Genetic Othello v1.02 (1999)
Itamar Faybish
Web page. A master's thesis on the straightforward application of
a genetic algorithm to optimizing a fixed set of evaluation parameters
in the game of othello. Source and Windows executable are available
for download.
Using genetic programming to evolve board evaluation functions
(1995, 6 pages)
Gabriel Ferrer and
Worthy Martin
Postscript. An abstract can be found deeply buried in the
Genetic Programming Bibliography.
A genetic programming system learns evaluation functions which are
used with one-ply lookahead to select moves in a reconstructed version
of the ancient Egyptian game of
senet,
a game remotely similar to backgammon.
The players were evaluated in each generation using a knockout tournament.
Since there are no strong modern
players, it's unknown how strong the learned evaluators are.
Using genetic programming to evolve board evaluation functions
(1996, over 65 pages)
Gabriel Ferrer
Postscript. A master's thesis.
Considers the games of
senet
and othello, as in the above paper. The othello players use alpha-beta.
Includes some experiments on population seeding with handwritten
evaluators.
Learning approximately-admissible heuristics in two-player games
(1994)
Evan Fletcher and
Armand Prieditis
Compressed postscript.
Theoretically-motivated work on evaluation function learning,
from the
UC Davis AI Lab.
On the automatic generation of case libraries by chunking chess games
(1995?, 10 pages)
Stephen Flinter and Mark T. Keane
Postscript.
The unfinished TAL chess program will consist of a Library of chess
knowledge, constructed automatically from games
and used for case-based reasoning,
and a Game-Playing component.
This paper is about the Library; the Game-Playing component
isn't written yet.
Efficient algorithms for learning to play repeated games
against computationally bounded adversaries (1995, 10 pages)
Yoav Freund,
Michael Kearns,
Yishay Mansour,
Dana Ron,
Ronitt Rubinfeld and
Robert Schapire
Compressed postscript, with
online abstract.
Theoretical results on learning to play against certain classes
of simple, imperfect opponents.
Machine learning in computer chess:
The next generation (1996, 13 pages)
Johannes Fürnkranz
Compressed postscript, with
online abstract.
A wide-ranging review of machine learning applied to computer chess.
The methods used are categorized as induction of endgame classifiers,
explanation-based learning, case-based reasoning, and evaluation
function learning.
Knowledge discovery in chess databases: A research proposal
(1997, 8 pages)
Johannes Fürnkranz
Compressed postscript.
Proposes that data mining in chess databases
would be a fruitful line of inquiry.
Learning and improving backgammon strategy
Greg Galperin
<grg@ai.mit.edu>
Web page, from
Center for Biological and Computational Learning.
A parallel computer plays backgammon by rolling out positions in parallel.
Pragmatic reasoning in bridge (1993, 23 pages)
Björn Gambäck,
Manny Rayner and
Barney Pell
Compressed postscript, with an
online abstract.
Includes information about a learning bridge program.
Also available from
this SRI International Cambridge location.
Chunking for experience (1990, 13 pages)
Michael George and
Jonathan Schaeffer
Postscript, from
this page of Schaeffer's.
MACII is a case-based reasoning program for the game of chess.
It uses chunking and a
similarity method to find positions in its case-base similar
to the current position. It can provide advice to a performance
chess program. Informal tests with the chess program Phoenix
suggested that MACII helped Phoenix avoid making errors immediately
after leaving its opening book, a common computer weakness.
A machine learning approach to the game of go (1999, 7 pages)
Thore Graepel
Online abstract (the postscript link was broken when I tried it).
Suggests that machine learning is a good way to create a strong go program.
Proposes a board representation, the Common Fate Graph (CFG), which
captures what points on the board will share a common fate at the end
of the game and can be used with temporal difference learning.
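To illustrate the representation: stones connected in a chain live or die together, so each chain can collapse to a single graph node. Here's a toy construction; this is my reading of the idea, and the paper's exact construction may differ.

```python
# Toy common-fate-style graph for go: stones in the same chain share
# their fate, so each chain collapses into one node, while empty points
# stay separate. (An illustration only; the paper's exact construction
# may differ.) `board` maps (x, y) -> 'b', 'w' or '.'.

def common_fate_graph(board):
    parent = {p: p for p in board}

    def find(p):                       # union-find with path halving
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    def neighbors(p):
        x, y = p
        return [q for q in ((x+1, y), (x-1, y), (x, y+1), (x, y-1))
                if q in board]

    # Merge adjacent stones of the same color into one chain node.
    for p, color in board.items():
        if color == '.':
            continue
        for q in neighbors(p):
            if board[q] == color:
                parent[find(p)] = find(q)

    # Edges of the collapsed graph.
    edges = set()
    for p in board:
        for q in neighbors(p):
            a, b = find(p), find(q)
            if a != b:
                edges.add(frozenset((a, b)))
    return edges
```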
Go, SVM, go (2000, 7 pages)
Thore Graepel,
Mike Goutrie, Marco Krüger, and Ralf Herbrich
Online abstract with compressed postscript.
They trained support vector machines (SVM) and kernel perceptrons
to solve go problems (tsume go) and play 9x9 go.
For 9x9 go playing, the training data consisted of games by strong humans
played on the Internet Go Server. In two sample games against gnugo,
the SVM program lost both.
NeuroDraughts: the role of representation, search, training regime,
and architecture in a TD draughts player (1997, 8 pages)
Niall Griffith and
Mark Lynch <mark@did.com>
Compressed postscript.
Discusses the effects of various factors on the strength of the
learned player, with experimental data. There's a longer report
on NeuroDraughts by Mark Lynch alone, listed below.
An adaptive video game using the method of temporal differences
(1996)
Tim Groth
Text file. A student project, Neuro-Pong.
The main goal was to make the game player an interesting
opponent for humans.
A simple adaptive procedure leading to correlated equilibrium
(1997)
Sergiu Hart and
Andreu Mas-Colell
Web page, with online abstract and links to postscript.
Also available from
EconWPA.
A game theoretic analysis showing that a simple learning algorithm
converges to correlated equilibrium.
Unfortunately, real learning programs don't have enough information
to use this algorithm, so it's difficult to apply in practice.
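The procedure is usually summarized as regret matching: play each action with probability proportional to how much you regret not having played it all along. A minimal sketch follows; the payoff interface is an assumption of mine, and it's exactly the counterfactual information that real game programs lack.

```python
# Minimal sketch of regret matching, the usual summary of the paper's
# adaptive procedure. payoff(a) is assumed to return this round's
# payoff had we played a, with the other players' moves held fixed --
# needing that counterfactual information is the practical obstacle
# noted above.

import random

def regret_matching(actions, payoff_per_round, rounds=10000):
    regret = {a: 0.0 for a in actions}
    for _ in range(rounds):
        payoff = payoff_per_round()      # fresh round: payoff(a) for any a
        positive = [max(regret[a], 0.0) for a in actions]
        if sum(positive) > 0:
            played = random.choices(actions, weights=positive)[0]
        else:
            played = random.choice(actions)  # no regret yet: play anything
        got = payoff(played)
        for a in actions:                # regret of not always playing a
            regret[a] += payoff(a) - got
    return regret
```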
Bootstrap learning of alpha-beta-evaluation functions
(1993, 5 pages)
Alois Heinz and Christoph Hense
Postscript.
Proposes learning evaluation functions in the form of
decision trees which
can use internal alpha-beta cutoffs to speed calculation
(taking the alpha and beta values from the search tree).
The authors trained progressively stronger evaluators
for the game of malawi, using each player to produce
training data for its successor, with good results.
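Loosely, the mechanism might look like this (my reconstruction of the general idea, not the authors' code): each tree node stores the least and greatest leaf value beneath it, so evaluation can stop as soon as the result is known to fall outside the search window.

```python
# Loose sketch of an evaluator with internal alpha-beta cutoffs (my
# reconstruction of the mechanism, not the authors' code).

class Node:
    def __init__(self, test=None, low=None, high=None, value=None,
                 lo_bound=None, hi_bound=None):
        self.test = test            # predicate on positions; None at leaves
        self.low, self.high = low, high   # subtrees for test False / True
        self.value = value          # leaf value; None at internal nodes
        # Smallest / largest leaf value under this node; internal nodes
        # must supply them, leaves default to their own value.
        self.lo_bound = value if lo_bound is None else lo_bound
        self.hi_bound = value if hi_bound is None else hi_bound

def evaluate(node, pos, alpha, beta):
    if node.hi_bound <= alpha:      # can't rise above alpha: fail low
        return alpha
    if node.lo_bound >= beta:       # can't fall below beta: fail high
        return beta
    if node.value is not None:      # reached a leaf
        return node.value
    branch = node.high if node.test(pos) else node.low
    return evaluate(branch, pos, alpha, beta)
```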
Efficient neural net alpha-beta-evaluators
(1994, 4 pages)
Alois Heinz
Postscript. Converts the decision tree evaluators of the previous
paper into tree-shaped networks, which can be further trained to
increase accuracy. The networks retain the tree property that not
all nodes need be evaluated to find the output value, so they're
efficient.
Lernen von Klassifikationsbäumen für Spielsituationen
(1992, over 75 pages)
Christoph Hense
Compressed postscript. Written in German.
A
graduate thesis from the
University of Freiburg.
The title means "learning of classification trees for game
positions". The thesis includes work on the game of malawi.
Introduction to learning in games
Geoff Howland <ghowland@lupinegames.com>
A short web page about simple forms of opponent modeling for
commercial computer games. You can get a quick view of the
general state of the art.
A grandmaster chess machine (1990)
Feng-hsiung Hsu, Thomas Anantharaman, Murray Campbell, and
Andreas Nowatzyk
Web page. The famous Scientific American article about
the Deep Thought chess program.
This contains the only online discussion I know about of Deep Thought's
evaluation parameter learning algorithm.
Unsupervised learning of go patterns (1999)
Antti Huima
Web page. Vector Quantization (VQ) for learning patterns with
suggested moves in the game of go.
Explorations of the practical issues of learning prediction-control tasks
using temporal difference learning methods (1992, over 70 pages)
Charles L. Isbell
Compressed postscript. A master's thesis.
Explores the temporal difference algorithm TD(lambda) with radial basis
function neural networks on two tasks, one of which is tic-tac-toe.
Bottom-Up Recognition Learning:
A compilation-based model of limited-lookahead learning
(1994)
Todd Johnson,
Jiajie Zhang and
Hongbin Wang
Compressed postscript, from this
ftp site.
BURL, a psychological model of game learning.
It was implemented in SOAR and applied to tic-tac-toe.
Genetische Algorithmen zur Bestimmung heuristischer Bewertungsfunktionen
bei Spielen (1992, over 100 pages)
Dorothee Kümmerlin
Compressed postscript. Written in German.
A
graduate thesis from the
University of Freiburg.
The title means "genetic algorithms for the determination
of heuristic evaluation functions in games".
The thesis includes work on the game of reversi, of which
othello is a recent variant.
Lernen von Bewertungsfunktionen mittels Adaptiver Logischer Netzwerke
(1994, over 60 pages)
Marc Leineweber
Compressed postscript. Written in German. There is an
online abstract (also in German). It is a
graduate thesis from the
University of Freiburg.
The title means "learning evaluation functions using
Adaptive Logical Networks".
The game is nine men's morris.
Learning of position evaluation in the game of othello
(1995)
Anton Leouski
Postscript, with an
online abstract.
I have not been able to view or print this paper; I suspect that the
postscript may be faulty.
The abstract reports that a neural network reached intermediate
strength at othello by self-play using temporal differences.
What a neural network can learn about othello (1996)
Anton Leouski and
Paul Utgoff
Web page, with
online abstract.
Also available
as postscript.
Compares three neural network architectures in the task of
learning othello evaluation functions by temporal differences,
and analyzes what the networks learned.
Markov games as a framework for multi-agent reinforcement learning
(1994, 7 pages)
Michael L. Littman
Postscript, with
online abstract.
Extends reinforcement learning theory with game theory.
Demonstrates learning of a simplified soccer-like game
played on a 4x5 grid.
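The paper's algorithm is known as minimax-Q. Here's a simplified sketch restricted to pure strategies; the real algorithm finds the best mixed strategy with a linear program.

```python
# Simplified sketch of the minimax-Q idea from the paper: Q-learning
# in which a state's value is the worst case over the opponent's
# reply. The real algorithm optimizes over mixed strategies with a
# linear program; pure strategies keep this sketch short.

from collections import defaultdict

Q = defaultdict(lambda: defaultdict(float))  # Q[state][(my_move, opp_move)]

def minimax_q_update(Q, s, a, o, reward, s2, actions, opp_actions,
                     alpha=0.1, gamma=0.9):
    """Update after we played a and the opponent played o, reaching s2."""
    # Pure-strategy maximin value of the successor state.
    v2 = max(min(Q[s2][(a2, o2)] for o2 in opp_actions) for a2 in actions)
    Q[s][(a, o)] += alpha * (reward + gamma * v2 - Q[s][(a, o)])
```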
A generalized reinforcement-learning model: Convergence and applications
(1996, 9 pages)
Michael L. Littman and
Csaba Szepesvári
Postscript, with
online abstract.
Presents a generalization of the Markov decision process, with
special reference to Q-learning and two-player games.
Derivative evaluation function learning using genetic operators
(1993, 16 pages)
David H. Lorenz and
Shaul Markovitch
Compressed postscript.
An online abstract is buried on
this page.
Learning the derivative of the evaluation function.
Tested with the game of checkers.
NeuroDraughts: an application of temporal difference learning to draughts
(1997, 41 pages)
Mark Lynch <mark@did.com>
Compressed postscript.
A checkers program whose evaluator is a neural network trained
by temporal differences, using self-play.
The paper also serves as a manual for users of the software; see the
online software
page. There's a shorter paper about NeuroDraughts with first author
Niall Griffith, listed above.
Incorporating advice into agents that learn from reinforcements
(1994, 29 pages)
Richard Maclin and
Jude W. Shavlik
Postscript. Also available as
compressed postscript.
The RATLE system is a neural network which learns on its own using
Q-learning. It translates user advice into extra network nodes,
which then participate in learning like the other nodes.
It was tested on a video game-like task.
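The flavor of the advice-to-nodes translation, sketched KBANN-style (my illustration of the general technique this line of work builds on, not RATLE's implementation):

```python
# Sketch of translating one piece of IF-THEN advice into a network
# unit, KBANN-style (my illustration, not RATLE's implementation).
# Advice like "IF f1 AND f3 THEN ..." becomes a sigmoid unit wired to
# fire only when both features are on; after insertion its weights
# keep training like any other unit's.

import numpy as np

def advice_unit(n_features, condition_indices, w_big=4.0):
    """Initial weights and bias for a unit computing an AND of features."""
    w = np.zeros(n_features)
    w[list(condition_indices)] = w_big
    # Bias places the threshold between "all conditions true" and
    # "one condition false", for 0/1 feature values.
    bias = -w_big * (len(condition_indices) - 0.5)
    return w, bias

def activation(features, w, bias):
    return 1.0 / (1.0 + np.exp(-(w @ features + bias)))
```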
Incorporating advice into agents that learn from reinforcements
(1994, 6 pages)
Richard Maclin and
Jude W. Shavlik
Postscript. A shorter version of the preceding paper with the same title.
Learning of resource allocation strategies for game playing
(1993, 16 pages)
Shaul Markovitch and Yaron Sella
Compressed postscript.
An abstract is buried on
this page.
Learning how much time to spend on each move.
Tested with the game of checkers.
The learning method is less sophisticated than
the non-learning methods currently employed by chess programs.
Tuning evaluation functions for search (6 pages)
Chris McConnell
Postscript. This appears to be an extended abstract rather than a paper.
Mentions a new technique for estimating evaluation function parameters
from a database of human games with some extra annotation.
Kinglet (1998)
Chris Moreton
Web page. A genetic algorithm plays the game of Kinglet, a chess variant.
Learning to play games from experience:
An application of artificial neural
networks and temporal difference learning (1993, over 70 pages)
Daniel Kenneth Olson
Compressed postscript. A master's thesis.
Learning blackjack and tic-tac-toe with temporal differences,
keeping the goal of a more general learning system in sight.
Exploratory learning in the game of go (1992)
Barney Pell
Compressed postscript, with an
online abstract.
Trading off strong moves against exploratory moves.
Skilled like a human:
A comparison of human and computer game playing (1995, 6 pages)
Mary Jo Rattermann and
Susan Epstein
Postscript.
A psychological experiment comparing Hoyle with human game players.
"The results of the experiment support our conjecture that Hoyle
is a plausible model of human game playing."
Evolving neural networks to play go (1996, 18 pages)
Norman Richards
Postscript. Applying the
SANE neurogenetic algorithm
to 9x9 go, using knowledge-free networks.
The player learned to defeat Wally, even with a two-stone handicap.
I think similar approaches using knowledge are promising.
Evolving neural networks to play go (1997, 8 pages)
Norman Richards,
David Moriarty,
Paul McQuesten and
Risto Miikkulainen
Postscript.
A shorter, more polished presentation of the above work.
Competitive learning (1994, 2 pages)
Christopher D. Rosin and
Richard K. Belew
Postscript. Defines the framework of competitive learning,
in which the only knowledge you have of the quality of solutions
to a problem comes from competitions which show one solution
to be better than another. This is a natural framework
for game learning.
Methods for competitive co-evolution:
Finding opponents worth beating (1995, 11 pages)
Christopher D. Rosin and
Richard K. Belew
Postscript. Proposes the techniques of
competitive fitness sharing and shared opponent sampling
to maintain population diversity in a coevolutionary
genetic algorithm. The methods are tested on tic-tac-toe
and nim, and then tried out with cellular automata
playing 7x7 go, which became strong enough to defeat Wally,
a weak public domain program.
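A minimal reading of competitive fitness sharing (the details here are my interpretation): an individual earns more credit for beating opponents that few others in the population can beat, which rewards rare skills and preserves diversity.

```python
# Sketch of competitive fitness sharing (a minimal reading of the
# technique; the paper's definition may differ in detail).

def shared_fitness(population, opponents, beats):
    """beats(i, j) -> True if individual i defeats opponent j."""
    # How many population members defeat each opponent.
    defeats = {j: sum(beats(i, j) for i in population) for j in opponents}
    # Credit for each win is split among everyone who achieves it.
    return {i: sum(1.0 / defeats[j] for j in opponents if beats(i, j))
            for i in population}
```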
A competitive approach to game learning (1996, 10 pages)
Christopher D. Rosin and
Richard K. Belew
Postscript. A theoretical analysis of game learning, with the goal
of identifying polynomial-time algorithms guaranteed to find perfect
game strategies.
The analytic framework separates the problem into strategy learning, which
comes up with new game strategies, and competitive learning
(see "Competitive Learning", above),
which compares strategies.
New methods for competitive co-evolution (1996, 25 pages)
Christopher D. Rosin and
Richard K. Belew
Postscript.
Improves competitive coevolution with
fitness sharing (to promote diversity),
sample sharing (to maintain diversity while reducing evaluation costs),
and a "hall of fame" that retains a sample of old champions
to ensure continuing improvement.
The experiments are informative, with lots of data.
Coevolutionary search among adversaries (1997, 214 pages)
Christopher D. Rosin
Compressed postscript.
A PhD thesis on the same topics as the papers above,
with additional background, details, and results.
Includes experiments with the games of tic-tac-toe,
a 3D tic-tac-toe variant, nim, and go.
The go experiment was far more sophisticated than the
one reported in "Finding opponents worth beating", above,
and I found it very interesting.
GPRobots and GPTeams -
competition, co-evolution and co-operation in genetic programming
(1995, 8 pages)
Conor Ryan
Compressed postscript.
Proposes a robot programming game as a benchmark for
genetic programming. The results are not too impressive.
Neural networks and chess (1993, 41 pages)
Martin Schmidt
Compressed postscript, from
this ftp directory.
Neural nets learned absolute or relative evaluations of chess positions
using a cascade correlation-type algorithm. The networks played
using a selective search which is not fully described here.
The presentation is poor.
Temporal difference learning and chess (1994, over 100 pages)
Martin Schmidt
Compressed archive of twelve postscript files, from
this ftp directory. There is an
online abstract and a
guide to the contents of the archive.
Systematic, though sparse, tests of variations of TD(lambda) with
knowledge-free backprop neural nets,
starting with simple chess endgames
and working up to full chess games.
Extensive charts of raw results.
Markybot (1997)
Jay Scott
Web page. The program plays a financial trading game, relying
on a learned statistical model of price fluctuations.
An engine for a Go playing computer program (1999)
Peter Smith <peter.smith@smallworld.co.uk>
Web page with annoying frames. You can download a Windows 95 executable
of a simple program, tarkus.
A go program with evaluation components optimized by simulated annealing.
An analysis of forward pruning (1994, 6 pages)
Stephen J.J. Smith and
Dana S. Nau
Postscript. Models a weak form of selective search.
Strategic planning for imperfect-information games (1993, 8 pages)
Stephen J.J. Smith and
Dana S. Nau
Postscript.
A planning method used in Tignum, a prototype bridge program.
Artificial neural nets applied to strategic games (1996, 20 pages)
Peer Sommerlund
Compressed postscript, with
online abstract.
A brief explanation of neural nets, a few examples of how they've been
applied to evaluation function learning, and a set of final experiments
with the game connect-four. The outlook is more practical than academic.
The conclusion: "neural nets do not belong in a program that plays
connect-four."
Machine learning, game play, and go (1991, 90 pages)
David Stoutamire
Compressed postscript. An extract from a master's thesis.
Check go archive mirrors if you have trouble retrieving this.
Multi-criteria reinforcement learning (1997, 13 pages)
Csaba Szepesvári
Compressed postscript. A theoretical paper exploring a class
of reinforcement learning algorithms, with small experiments
using the game of tic-tac-toe.
A game player with two ordered goals, (1) play winning moves and
(2) win quickly rather than slowly, learned to play a higher
proportion of winning moves than a player with only goal (1).
Automated feature selection to maximize learning
in artificial intelligence (1996)
Joseph Turian
Web page. There is an
online abstract with pointers to other versions of the paper.
Using the game of backgammon, tests two methods of adding
new, randomly-generated features to neural networks.
The goal is to raise the level at which learning plateaus.
Evolution of neural networks to play the game of dots-and-boxes
(1996, 8 pages)
Lex Weaver and
Terry Bossomaier
Postscript. Here is
another copy of the paper with online abstract.
Tries both direct evolution and co-evolution schemes,
and tests the players against simple benchmark strategies and
backprop-trained networks.
How hard is the correct coding of an easy endgame
(1994, 11 pages)
Jean-Christophe Weill
Compressed postscript.
The author, a non-chessplayer, coded predicates which
allowed the decision tree induction algorithm ID3 to classify
queen-and-king versus queen-and-king chess endgame positions
as won or drawn.
Programmes d'échecs de championnat:
Architecture logicielle,
synthèse de fonctions d'évaluation,
parallélisme de recherche
(1995, close to 200 pages)
Jean-Christophe Weill
Compressed postscript.
A long PhD thesis, written in French.
The title means "Championship chess programs: software
architecture, synthesis of evaluation functions,
search parallelism".
Describes the chess programs Cumulus and Ecume. There is a
short discussion of machine learning of evaluation functions.
TD learning of game evaluation functions
with hierarchical neural architectures
(1995, about 85 pages)
Marco A. Wiering
Compressed postscript, from
this publications page. A master's thesis that can be seen as
an extension of Justin Boyan's work (see MVP Backgammon on
other backgammon programs).
The author used a temporal difference method to train modular neural networks
to play tic-tac-toe and backgammon endgames.
Adaptive critic design in learning to play game of go
(1997)
Raonak Zaman,
Danil Prokhorov,
and
Donald C. Wunsch II
Web page.
The authors trained a neural network, using what they call a
Heuristic Dynamic Programming (HDP) adaptive critic.
After 6000 training games, the network had learned to defeat
Wally, a weak free program, 1% of the time.