Overkill’s new learning 3 - one model or many?
My first question about Overkill’s model is: One model for all opponents, or one model for each opponent? It turns out that the answer is: It depends. The code checks a global flag, curMode.
enum developMode { Develop, Release };
extern developMode curMode;
Here is an example. Bits of code like this show up for each of the 3 data files used to store the model and learning information.
if (curMode == Develop)
{
    filePath = "./bwapi-data/write/RL_data";
}
else
{
    string enemyName = BWAPI::Broodwar->enemy()->getName();
    filePath = "./bwapi-data/write/RL_data";
    filePath += enemyName;
}
In the AIIDE 2016 competition, curMode is set to Release. It looks as though each opponent gets its own model, learned independently. But not learned from scratch!
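To make the branching easier to see, here is a minimal sketch of the same path choice as a standalone function. The enum and the base path come from the code above; the function name modelFilePath and its signature are my own invention for illustration, and the opponent name is passed in as a plain string so the sketch runs without BWAPI.

```cpp
#include <string>

// Same enum as in Overkill's source.
enum developMode { Develop, Release };

// Hypothetical helper (not in Overkill): choose the data file for a mode.
// Develop shares one file across all opponents; Release appends the
// opponent's name, so each opponent gets its own file and its own model.
std::string modelFilePath(developMode mode, const std::string& enemyName)
{
    std::string filePath = "./bwapi-data/write/RL_data";
    if (mode == Release)
    {
        filePath += enemyName;
    }
    return filePath;
}
```

In Release mode the name is appended directly to the base path with no separator, as in the original snippet.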
My idea that Overkill has a general model turned out to be true. (I may have read it somewhere and forgotten where.) When it plays an opponent for the first time, it uses a model defined in file ModelWeightInit.h as the initial model, and learns starting from there. I don’t see any information about how the initial model was created. It may have been trained by playing against a variety of opponents, in Develop mode.
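The "start from the initial model, then refine per opponent" idea can be sketched roughly as follows. This is not Overkill's code: the Weights type, the loadOrInit function, and the one-pair-per-line "name value" file format are all assumptions made up for the example. The only part taken from the source is the behavior, which is to fall back to the built-in initial weights when no file exists yet for this opponent.

```cpp
#include <fstream>
#include <map>
#include <string>

// Assumed representation: feature name -> weight.
using Weights = std::map<std::string, double>;

// Hypothetical loader. If the opponent's file cannot be opened (for
// example, on the first game against that opponent), return the initial
// model. Otherwise start from the initial model and overwrite every
// weight that the file provides.
Weights loadOrInit(const std::string& path, const Weights& initial)
{
    std::ifstream in(path);
    if (!in)
    {
        return initial;
    }
    Weights weights = initial;
    std::string name;
    double value;
    while (in >> name >> value)  // assumed format: "name value" pairs
    {
        weights[name] = value;
    }
    return weights;
}
```

Starting from a copy of the initial weights means a feature missing from the opponent's file still has a sensible value, which is one plausible way to keep an older data file usable after new features are added.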
You could say that the initial model is “how to play Starcraft” and the refined model made for each opponent is “how to beat this opponent.” The same learning system can be used both offline to learn the game and online to model the opponent.
How well did opponent modeling work? We can look at Overkill’s graph of win rate over time in the AIIDE 2016 results. Its win rate after 20 rounds was 0.59 and after 90 rounds was 0.62. The curve shows a fairly steady rise, and visually it’s convincing that Overkill learned something about its opponents, but it improved its win rate only a little. The unsteady learning curve we saw in the description suggests that Overkill might have eventually learned more if the tournament had gone on long enough. Maybe.
Next: The features of Overkill’s model.