Overkill’s new learning 1 - the description
Overkill has new strategy learning unlike what we’ve seen before. I’m excited to look into it.
Author Sijia Xu updated Overkill for AIIDE 2016 with a machine learning method to learn what units to build after the opening, depending on the game situation. As far as I know, no other bot has anything similar. So far, this version of Overkill has played only in AIIDE 2016, where it finished 7th; it’s not playing on SSCAIT or anywhere else that I know of. But the code is in Sijia Xu's GitHub repository along with a description.
overview
I’ll start with the description. To quote:
Overkill currently contain two main AI system:
- building unit selection. Using reinforcement learning model to choose the most proper unit to build(currently only three types of unit to choose: zerglings,mutalisk,hydralisk) according to current game status, and saving the match info to continually improve the RL model itself.
- opening strategy selection. Using UCB1 to select the best opening strategy at the the beginning of the game according to the previous match result against this opponent.
In other words, the opening strategy learning that I went over before remains the same, though according to the readme the openings have been trimmed back to let the unit selection model make more decisions.
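For reference, UCB1 is a bandit algorithm: it picks the opening whose past win rate plus an exploration bonus is highest, so every opening keeps getting an occasional try. Here is a minimal sketch of the selection rule; the struct and function names are mine, not Overkill’s.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One record per opening strategy: how often we tried it and how often we won.
struct OpeningStats {
    int tries = 0;
    int wins  = 0;
};

// UCB1: choose the opening with the best win rate plus an exploration bonus.
// Untried openings are chosen first; the bonus shrinks as an opening gets more tries.
std::size_t chooseOpening(const std::vector<OpeningStats>& stats, int totalGames)
{
    std::size_t best = 0;
    double bestScore = -1.0;
    for (std::size_t i = 0; i < stats.size(); ++i) {
        if (stats[i].tries == 0) return i;               // try everything at least once
        double mean  = double(stats[i].wins) / stats[i].tries;
        double bonus = std::sqrt(2.0 * std::log(double(totalGames)) / stats[i].tries);
        double score = mean + bonus;
        if (score > bestScore) { bestScore = score; best = i; }
    }
    return best;
}
```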
It sounds as though Overkill’s tactical abilities are not changed. Overkill 2015 plays with zerglings, mutalisks, and hydralisks and nothing else, switching from one to the next in a fixed order when simple heuristics are triggered. Overkill 2016 plays with the same units but can change between them however it likes depending on the way the game develops. Its play should be more flexible and fun.
My thought: Unit selection is an important skill, but it is only one skill. Unit selection among only 3 units is an interesting experiment, not a way in itself to score a lot of wins. The most basic zerg skill isn’t knowing when to make which fighting unit, it’s knowing when to make drones, and that is not learned here. With no ability to use scourge, lurkers, or any hive tech, and hand-coded tactics that maneuver each type of unit independently of the other types, Overkill does not have a high ceiling on its skill level, no matter how smart the unit selection model grows. That’s why it only came in 7th, I guess. It’s the academic attitude at work, tackling one problem at a time and not worrying about full generality until later. And I’m intensely interested to see what lessons we can draw from the experiment.
the method
Now I’ll cover the “model”, “feature” and “learning methods” sections from the description. They are written to be understandable to AI people, and in fact I can understand the outline clearly enough. But they’re probably incomprehensible unless you know the jargon, so I’ll try to explain step by step in follow-up posts.
The basic algorithm is Q-learning, a fundamental reinforcement learning algorithm. Q-learning is well worth knowing if you care about AI at all, so I’ll write a post to explain it. Q-learning is a plug-in framework: you can plug in different kinds of models, which are learned by another algorithm.
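To give a taste before that post: the heart of Q-learning is a one-line update that moves the estimated value of a state-action pair toward the reward received plus the discounted value of the best next action. Here is a tabular sketch in my own toy notation, not Overkill’s code (Overkill plugs in a function approximator instead of a table):

```cpp
#include <algorithm>
#include <vector>

// Tabular Q-learning update for one observed step:
// after taking `action` in `state`, we got `reward` and landed in `nextState`.
// alpha is the learning rate, gamma the discount factor.
void qUpdate(std::vector<std::vector<double>>& Q,
             int state, int action, double reward, int nextState,
             double alpha, double gamma)
{
    double bestNext = *std::max_element(Q[nextState].begin(), Q[nextState].end());
    double target   = reward + gamma * bestNext;              // what we now think the value is
    Q[state][action] += alpha * (target - Q[state][action]);  // move the estimate toward the target
}
```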
Overkill plugs in a model which is a linear fit to a large number of binary game features, exactly the same kind of model that I briefly mentioned in breakthrough game programs as used in the evaluation function of the Othello program Logistello. The description says that Overkill has about 4000 features, which (believe it or not) is a small number for this kind of model! Having more features might make it too slow. The features are things like which units and tech each side has, and so on. I’ll read the code and find out the details.
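With binary features, the model is just one weight per feature, and evaluating it means summing the weights of the features that are switched on. A sketch of the idea; the feature encoding here is made up, not Overkill’s:

```cpp
#include <vector>

// The value of a (state, action) pair is a weighted sum over binary features.
// With 0/1 features, that is simply the sum of the weights of the active features.
double linearValue(const std::vector<double>& weights,
                   const std::vector<int>& activeFeatures)   // indices of features that are 1
{
    double value = 0.0;
    for (int f : activeFeatures) value += weights[f];
    return value;
}
```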
Overkill does its linear fit to the features not with regression, as Logistello did, but with stochastic gradient descent, or SGD. Versions of SGD are state of the art and heavily used in deep learning (and more). SGD is important, but it’s not fundamental, so I don’t intend to look at it.
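In this setting the SGD step is especially simple: compare the model’s prediction to the Q-learning target and nudge each active feature’s weight by the same small amount in the direction that shrinks the error. A sketch under the same assumptions as the snippet above:

```cpp
#include <vector>

// One stochastic gradient step for the linear model above.
// `target` is the Q-learning target (reward + gamma * best next value).
// For squared error, the gradient with respect to an active feature's weight
// is -(target - prediction), so every active weight moves by the same amount.
void sgdStep(std::vector<double>& weights,
             const std::vector<int>& activeFeatures,
             double prediction, double target, double learningRate)
{
    double error = target - prediction;
    for (int f : activeFeatures) weights[f] += learningRate * error;
}
```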
The description does not tell us whether Overkill includes “who is my opponent?” in its model. Looking at the I/O files from AIIDE 2016, I see files like “feature_gradientAiur”, “feature_valueAiur”, and “RL_dataAiur” for each opponent. It looks as though Overkill learns a different model for each opponent, although it could combine the opponent data into one model. I’ll read the code and find out.
the results
The description shows us three graphs from one experimental run of Overkill playing 600 games versus IceBot. That’s a lot of games. The graphs show Overkill’s model appearing to converge, slowly and unsteadily, and to improve its already-high winning rate.
My thoughts:

1. One experiment does not tell us whether this works. It seemed to work against IceBot.
2. The learning rate was slow. It took hundreds of games. As I pointed out in the generalization for strategy learning post, reinforcement learning doesn’t stand out as a good choice when you have limited data.
3. Therefore Overkill’s learning seems suitable only for a very long tournament. If Overkill learns a model for each opponent, then the 90 games against each opponent in AIIDE 2016 were not enough. If it learns one overall model, then there may be enough games, but an experiment against one opponent doesn’t tell us.
I don’t see where this is written, but somewhere I got the idea that for AIIDE 2016, Overkill started with a general model trained against some range of opponents, its prior knowledge. Then it refined the prior knowledge with experience during the tournament. It seems like a good idea. On the other hand, Overkill’s AIIDE 2016 winning rate of 69/87 (79%) doesn’t match the expected >90% win rate from the graph if Overkill had been pre-trained against IceBot. So I’m not sure. A general model plus 90 games for each opponent could produce the results we see—it’s plausible.
Next: Q-learning.