
Overkill’s new learning 3 - one model or many?

My first question about Overkill’s model is: One model for all opponents, or one model for each opponent? The answer turns out to be: It depends. The code checks curMode.

enum developMode { Develop, Release };	// Develop: offline training; Release: tournament play
extern developMode	curMode;

Here is an example. Bits of code like this show up for each of the 3 data files used to store the model and learning information.

	if (curMode == Develop)
	{
		// Offline training: one shared data file for all opponents.
		filePath = "./bwapi-data/write/RL_data";
	}
	else
	{
		// Tournament play: a separate data file per opponent.
		string enemyName = BWAPI::Broodwar->enemy()->getName();
		filePath = "./bwapi-data/write/RL_data";
		filePath += enemyName;
	}

In the AIIDE 2016 competition, curMode is set to Release. It looks as though each opponent gets its own model, learned independently. But not learned from scratch!

My idea that Overkill has a general model turned out to be true. (I may have read it somewhere and forgotten where.) When it plays an opponent for the first time, it uses a model defined in the file ModelWeightInit.h as the initial model and learns starting from there. I don’t see any information about how the initial model was created; it may have been trained by playing against a variety of opponents in Develop mode.
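Putting the pieces together, the loading side presumably mirrors the path logic above. Here is a minimal sketch of the idea, not Overkill’s actual code: loadModel and initialWeights are names I made up, and I’m assuming the usual tournament convention that files written to bwapi-data/write/ reappear in bwapi-data/read/ for later games against the same opponent.

	#include <fstream>
	#include <iterator>
	#include <string>
	#include <vector>

	enum developMode { Develop, Release };
	developMode curMode = Release;

	// Hypothetical stand-in for whatever ModelWeightInit.h actually defines.
	const std::vector<double> initialWeights = { /* pretrained values */ };

	// Try the per-opponent file first; if it does not exist (first game
	// against this opponent), fall back to the pretrained general model.
	std::vector<double> loadModel(const std::string& enemyName)
	{
		std::string filePath = "./bwapi-data/read/RL_data";
		if (curMode != Develop)
		{
			filePath += enemyName;	// per-opponent file name
		}

		std::ifstream in(filePath);
		if (!in)
		{
			return initialWeights;	// no saved model yet: start general
		}
		return std::vector<double>(std::istream_iterator<double>(in),
		                           std::istream_iterator<double>());
	}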

You could say that the initial model is “how to play Starcraft” and the refined model made for each opponent is “how to beat this opponent.” The same learning system can be used both offline to learn the game and online to model the opponent.
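To make that concrete, here is a rough sketch of the kind of linear Q-value update such a system runs. It is the same update whether the weights started from ModelWeightInit.h or from a per-opponent file; only the file paths differ by mode. The learning rate and discount are placeholder assumptions, not Overkill’s actual values, and the real features are the subject of the next post.

	#include <vector>

	// Placeholder hyperparameters, not Overkill's actual values.
	const double learningRate = 0.0001;
	const double discount = 0.9;

	// Linear Q-function: Q(s,a) is the dot product of the weight vector
	// with the feature vector for that state-action pair.
	double qValue(const std::vector<double>& weights,
	              const std::vector<double>& features)
	{
		double q = 0.0;
		for (size_t i = 0; i < weights.size(); ++i)
			q += weights[i] * features[i];
		return q;
	}

	// One temporal-difference update. Offline (Develop) and online
	// (Release) differ only in where the weights are loaded from and
	// saved to; the learning step itself is identical.
	void qUpdate(std::vector<double>& weights,
	             const std::vector<double>& features,
	             double reward, double bestNextQ)
	{
		double tdError = reward + discount * bestNextQ - qValue(weights, features);
		for (size_t i = 0; i < weights.size(); ++i)
			weights[i] += learningRate * tdError * features[i];
	}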

How well did opponent modeling work? We can look at Overkill’s graph of win rate over time in the AIIDE 2016 results. Its winning rate after 20 rounds was 0.59 and after 90 rounds was 0.62. The curve shows a fairly steady rise, and visually it’s convincing that Overkill learned something about its opponents, but it improved its win rate only a little. The unsteady learning curve we saw in the description suggests that Overkill might have eventually learned more if the tournament had gone on long enough—maybe.

Next: The features of Overkill’s model.


Comments

Jay Scott:

I should add that there’s a complicating factor in interpreting the graph of winning rate over time. The Q-learning is not the only learning Overkill was doing—its opening strategy learning was operating too! It could be that opening learning was responsible for the nice-looking curve and Q-learning did no good at all. We have reason to suspect that 90 games are not enough.

krasi0:

Yeah, I, too, believe that learning a proper Q function of so many input parameters (4000) in just 90 rounds is not viable. Hopefully, the pretrained model (weight initialization) was good enough to begin with.
