breakthrough game programs
In my observation, the progress of game-playing programs in a given game shows occasional sudden breakthroughs to higher performance and long periods of slow improvement while programmers learn to better exploit the previous breakthrough. Historically, most games have 1 breakthrough idea whose refinement, as computers grow faster, eventually leads to superhuman skill.
A breakthrough is always based on a simple idea. Maybe it’s my power of oversimplification, but that’s how it seems to me. A breakthrough program is always complex, though: A new idea is not accepted as a breakthrough unless it performs, and if you come up with a potential breakthrough idea then you have to do the detail work to bring it to the point where it can actually break through. The detail work can include inventing more new ideas to support the breakthrough.
chess: Chess 4.0, Slate and Atkin, 1973
the simple idea: full-width search
By 1973, the alpha-beta algorithm was universal. Chess programmers generally believed that, to search deeply enough to play well, programs had to be selective about which moves they searched. They wrote complicated code to select good candidate moves and prune bad ones. Slate and Atkin observed that the pruning code was slow and introduced a lot of play errors when good moves were mistakenly pruned—how are you supposed to know whether a move is good before you search it? Their new version, Chess 4.0, searched all moves as deeply as their previous version had searched selected moves, and played much better. To exploit the breakthrough, they used efficient bit-board data structures and added complex new ideas of iterative deepening search, a transposition table to avoid re-searching known positions and to re-use move ordering information from past search iterations, and a quiescence search of forcing moves such as captures. In the 1990s the null move heuristic became popular and chess programs started to become more selective again—they try to search bad moves less rather than not at all. Today’s superhuman chess programs are highly selective in their searches. I don’t see it as a repudiation of the original breakthrough idea, but as a refinement: Here is how you tune it; you still search every move, some deeply and some shallowly.
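To make the contrast concrete, here is a minimal sketch of full-width alpha-beta with iterative deepening. The position interface (legal_moves, make, unmake, evaluate, is_terminal) is hypothetical, and a real program adds the transposition table, move ordering, and quiescence search described above.

```python
# Minimal sketch of full-width alpha-beta search (negamax form).
# No forward pruning: every legal move is searched to the same depth.

INFINITY = 10**9

def alpha_beta(pos, depth, alpha, beta):
    if depth == 0 or pos.is_terminal():
        return pos.evaluate()              # static evaluation at the horizon
    best = -INFINITY
    for move in pos.legal_moves():         # full width: try ALL moves
        pos.make(move)
        score = -alpha_beta(pos, depth - 1, -beta, -alpha)
        pos.unmake(move)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:                  # alpha-beta cutoff: sound pruning,
            break                          # not the risky selective kind
    return best

def iterative_deepening(pos, max_depth):
    # Search depth 1, then 2, then 3, ... so a best move is always ready;
    # real programs also re-use earlier iterations to order moves.
    best_move = None
    for depth in range(1, max_depth + 1):
        best_score = -INFINITY
        for move in pos.legal_moves():
            pos.make(move)
            score = -alpha_beta(pos, depth - 1, -INFINITY, INFINITY)
            pos.unmake(move)
            if score > best_score:
                best_score, best_move = score, move
    return best_move
```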
backgammon: TD-Gammon, Gerald Tesauro, 1992
the simple idea: neural network evaluator trained by self-play
Previous backgammon programs had used hand-coded evaluators. Gerald Tesauro figured out how to train a neural network as the evaluator by playing the program against itself while using temporal difference learning to reduce errors, producing a far more accurate evaluator. Today’s near-perfect backgammon programs use the same idea, with the addition of various kinds of search. AlphaGo also famously adopted this style of evaluator—now you know where the idea came from.
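As a rough illustration of the idea, here is a toy TD(0) training loop. The 198 inputs match TD-Gammon's original encoding, but the one-hidden-layer network and the game interface (features, play, over, result, best_move_by) are simplified stand-ins, not Tesauro's actual code; the real TD-Gammon used TD(λ) with eligibility traces.

```python
# Toy version of the TD-Gammon idea: a small neural network evaluates
# positions, the program plays itself, and each position's value is nudged
# toward the value of the position that followed it (TD(0) update).
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.1, (40, 198))   # hidden layer weights
W2 = rng.normal(0.0, 0.1, (1, 40))     # output layer weights

def value(x):
    """Estimated probability of winning for feature vector x."""
    h = np.tanh(W1 @ x)
    return 1.0 / (1.0 + np.exp(-(W2 @ h)))     # sigmoid output

def td_update(x, target, lr=0.1):
    """One gradient step moving value(x) toward the TD target."""
    global W1, W2
    h = np.tanh(W1 @ x)
    v = 1.0 / (1.0 + np.exp(-(W2 @ h)))
    delta = (target - v) * v * (1.0 - v)        # error times output gradient
    W2 += lr * delta[:, None] * h[None, :]
    W1 += lr * (W2.T @ delta[:, None]) * (1.0 - h**2)[:, None] * x[None, :]

def self_play_game(game):
    """Play one game against itself, learning as it goes."""
    x = game.features()
    while not game.over():
        game.play(game.best_move_by(value))     # greedy move by current net
        x_next = game.features()
        # terminal positions use the actual result; others use the net's
        # own estimate of the next position as the training target
        target = game.result() if game.over() else value(x_next)
        td_update(x, target)
        x = x_next
```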
othello: Logistello, Michael Buro, 1993
the simple idea: statistical evaluator with a huge number of binary features
An othello program with a fast search and a simple evaluation function can play well. Michael Buro added a highly knowledgeable evaluation function for overwhelming strength, crushing the human champion 6-0 in 1997. The idea behind the evaluator is: Score a large number of positions by search; these scores have to be reasonably accurate. Find a huge number of binary features that you can calculate fast (in othello, millions of possible disk patterns that may appear on the board). Then treat the evaluation features as independent and use straightforward statistical regression to find the value of each feature. It’s important to get the math details right; you can read some of the papers at the link. Suddenly you have an evaluator with a hell of a lot of detailed knowledge! The same idea has been used to produce strong evaluators in many other games, and I don’t know any reason it wouldn’t work in Starcraft.
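A bare-bones sketch of the scheme might look like this. The feature extraction and training data are placeholders for Buro's real disk patterns and search-scored positions, and I use a plain least-squares SGD fit where his papers give the exact regression details.

```python
# Sketch of a Logistello-style evaluator: one weight per binary pattern
# feature, fitted by regression against scores from deep searches.
import numpy as np

NUM_FEATURES = 1_000_000        # real programs index millions of patterns
weights = np.zeros(NUM_FEATURES)

def evaluate(active_features):
    """Sparse evaluation: sum the weights of the features present
    in the position. active_features is a list of feature indices."""
    return weights[active_features].sum()

def fit(training_set, lr=0.01, epochs=10):
    """Least-squares fit by stochastic gradient descent.
    training_set: list of (active_feature_indices, search_score) pairs,
    where search_score came from a reasonably deep, accurate search."""
    for _ in range(epochs):
        for features, score in training_set:
            error = score - evaluate(features)
            weights[features] += lr * error   # only present features move
```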
contract bridge: GIB, Matthew Ginsberg, 1998
the simple idea: draw a statistical sample of the possible game situations
Bridge is a game of partial information: After the deal, you only know your own cards. As play goes on, you gain more information. GIB randomly generates game situations compatible with what it knows and evaluates its choices in the sample situations; whatever is best on average is best in reality, within the statistical margin of error. GIB and other current programs still use the same idea. See computer bridge for more. Starcraft is also a game of partial information, so the idea is directly relevant.
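In pseudocode the idea is only a few lines. Here `deal_consistent_with` and `score_action` are hypothetical stand-ins for GIB's deal generator and its double-dummy solver; the structure is the point.

```python
# Sketch of GIB-style sampling: generate full deals consistent with what we
# know, evaluate each candidate action with full information in every sample,
# and choose the action that is best on average.

def choose_action(known_info, candidate_actions, num_samples=100):
    totals = {action: 0.0 for action in candidate_actions}
    for _ in range(num_samples):
        deal = deal_consistent_with(known_info)   # fill in the hidden cards
        for action in candidate_actions:
            totals[action] += score_action(deal, action)
    # best on average over the samples is best in reality,
    # within the statistical margin of error
    return max(candidate_actions, key=lambda action: totals[action])
```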
go: Crazy Stone, Rémi Coulom, 2006
the simple idea: draw a statistical sample of possible lines of play (Monte Carlo tree search)
(This is the same Rémi Coulom who wrote bayeselo, by the way.) GIB knows its possible lines of play and has to sample from the unknown game situations. Go programs know the game situation but, because there are a huge number of possible moves at any time, can only sample from the possible lines of play. By now, this family of search algorithms is fairly well developed; there have been repeated rounds of improvement as new refinements were invented. The Starcraft bot MaasCraft uses a variation, and the latest versions of LetaBot are said to as well.
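For the record, here is the basic UCT form of Monte Carlo tree search, which later refinements build on. The game-state interface (legal_moves, play, is_terminal, random_rollout) is hypothetical.

```python
# Basic Monte Carlo tree search (UCT variant). Each iteration: descend the
# tree by UCB1, expand a leaf, play a random rollout, back the result up.
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.wins = [], 0, 0.0

def ucb1(child, parent_visits, c=1.4):
    if child.visits == 0:
        return float("inf")            # every move gets sampled at least once
    return (child.wins / child.visits +
            c * math.sqrt(math.log(parent_visits) / child.visits))

def mcts(root, iterations=10_000):
    for _ in range(iterations):
        node = root
        while node.children:           # selection: descend by UCB1
            node = max(node.children, key=lambda ch: ucb1(ch, node.visits))
        if not node.state.is_terminal():   # expansion: grow the tree one layer
            node.children = [Node(node.state.play(m), node)
                             for m in node.state.legal_moves()]
            node = random.choice(node.children)
        # simulation: random playout; 1.0 means the player who moved
        # into node.state went on to win
        result = node.state.random_rollout()
        while node is not None:        # backpropagation: flip result per ply
            node.visits += 1
            node.wins += result
            result = 1.0 - result
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits)  # most-visited move
```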
go: AlphaGo, DeepMind, 2015
the simple idea: use deep learning for search control
Go is such a difficult game that it took 2 breakthroughs to catch up to human performance. AlphaGo keeps the idea of Monte Carlo tree search from Crazy Stone and other programs and adds search control by deep learning; that is its main new idea, in my view. AlphaGo also includes the learning by self-play idea of TD-Gammon and, with deep learning, makes it more successful than past attempts in go. I think everybody has noticed by now that deep learning is highly promising for Starcraft, too.
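As a sketch of the search-control idea only: the policy network supplies a prior probability for each move, and a PUCT-style selection rule trusts the priors until visit counts accumulate. The names here (policy_net, add_child, value_sum) are simplified stand-ins, not DeepMind's published interfaces.

```python
# Sketch of deep learning as search control: the network's move probabilities
# become priors that bias which branches MCTS explores first.
import math

def puct_score(child, parent_visits, c_puct=1.0):
    """Selection rule: average value plus prior-weighted exploration bonus."""
    q = child.value_sum / child.visits if child.visits else 0.0
    u = c_puct * child.prior * math.sqrt(parent_visits) / (1 + child.visits)
    return q + u

def expand(node, policy_net):
    """Ask the policy network for move probabilities, store them as priors."""
    for move, prob in policy_net(node.state).items():   # move -> probability
        child = node.add_child(move)
        child.prior = prob

def select_child(node):
    """Descend toward the move the search currently likes best."""
    return max(node.children, key=lambda ch: puct_score(ch, node.visits))
```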
Search and knowledge are the two ways to get stuff done, and here I strip new ideas down to their essence, so it should be no surprise that the breakthroughs fall more or less neatly into two categories. The search breakthroughs are in Chess 4.0, GIB, and Crazy Stone. The knowledge breakthroughs are in TD-Gammon, Logistello, and AlphaGo. AlphaGo does not categorize so neatly, though; it is both a knowledge and a search idea. And that is as it should be, if you ask me. Search and knowledge belong together.
Right now, Starcraft programs are at the stage where the greater part of most programs is a mass of ad hoc hand-crafted heuristic kludges (it’s redundant; the 4 terms all mean the same). We’re still at the stage where chess programs were before Chess 4.0 and backgammon programs before TD-Gammon. We are still looking for our breakthrough. People are casting around for the weak point in the wall of difficulty. Now is the time to learn from past breakthroughs!
Personally, I’m expecting a knowledge breakthrough more than a search breakthrough. But it’s only my guess, and I could easily be wrong. The skills we want are creativity and close analysis; whoever has enough of both will produce the Starcraft breakthrough.
Comments
krasi0 on :
Could you expand on the following statement: "I think everybody has noticed by now that deep learning is highly promising for Starcraft, too."? I must have missed something, but has there been any evidence to support that particular claim? I am not saying that DL wouldn't work here. It's an honest question.
I believe that the time for a breakthrough will come once we hit a *hard* wall/ceiling on the way up. Still, many bot authors are able to register gradual improvements in the strength of play of their bots (just look at Iron, tscmoo or Bereaver, for example), so no one really feels hard pressed to use their creativity and think outside the box for the sake of long-term success.
Just my 2 cents