AIUR learns more
The protoss bot AIUR by Florian Richoux has a set of hand-coded strategies and learns over time which strategies win against which opponents. That’s a popular religion; other bots like Overkill (see my post on it) and Tscmoo worship at the same altar. But a funny thing happened on the way through the tournament. In the AIIDE 2015 competition report, look at the graph of winning rate over time for the different bots. Let me steal the image showing the top half of participants:
AIUR’s line is the one in the middle that keeps rising and rising. Look carefully and you can see it leveling off, but it hasn’t reached its asymptote by the end of the very long tournament. AIUR seems to learn more, and to keep on learning, even though its learning method is about the same as other bots’. Howzat happen?
Of course AIUR doesn’t do exactly the same thing as other bots. After all, it calls its strategies “moods,” which sounds entirely different. It doesn’t learn an opponent -> strategy mapping; it learns opponent + map size -> strategy, where map size means the number of starting positions, usually 2, 3, or 4. It can figure out that its cannon rush works better on 2-player maps, for example. I imagine that that’s part of the answer, but could it be the whole story?
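To make the mapping concrete, here is a minimal sketch of the kind of table that implies. The strategy names, the epsilon-greedy selection rule, and the 0.5 default for untried strategies are my assumptions for illustration, not AIUR’s actual moods or learning rule.

```python
import random
from collections import defaultdict

# Hypothetical sketch of an (opponent, map size) -> strategy table.
# Strategy names and the learning rule are invented for illustration.
STRATEGIES = ["cannon_rush", "dark_templar_rush", "zealot_drop",
              "fast_expand", "defensive", "macro"]

class StrategyPicker:
    def __init__(self, epsilon=0.1):
        self.epsilon = epsilon
        self.wins = defaultdict(int)    # keyed by (opponent, map_size, strategy)
        self.games = defaultdict(int)

    def choose(self, opponent, map_size):
        # Occasionally explore at random; otherwise pick the best observed win rate.
        if random.random() < self.epsilon:
            return random.choice(STRATEGIES)
        def win_rate(strategy):
            key = (opponent, map_size, strategy)
            g = self.games[key]
            return self.wins[key] / g if g else 0.5   # default for untried strategies
        return max(STRATEGIES, key=win_rate)

    def record(self, opponent, map_size, strategy, won):
        key = (opponent, map_size, strategy)
        self.games[key] += 1
        self.wins[key] += int(won)
```

The map-size key is the point: results on 2-player maps never pollute the estimate for 4-player maps, which also means the table takes longer to fill in.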
I have a theory: AIUR’s extreme strategies make good probes for weakness. Its strategies range from the absolutely reckless cannon rush, dark templar rush, and 4-zealot drop cheese to defensive and macro-oriented game plans; they stake out the corners of strategy space. Compare Overkill’s middle-of-the-road zergling, mutalisk, and hydralisk strats: no fast rushes or slow macro plays, nothing highly aggressive and nothing highly cautious. If an enemy makes systematic mistakes, then one of AIUR’s extreme strategies is likely to exploit those mistakes, and AIUR will eventually learn which one.
If true, that could explain why AIUR learns more effectively in the long run. Presumably the reason it takes so long to reach its asymptote is that it has to learn the effect of the map size. The tournament had 27 games per opponent on 2-player maps, 18 on 3-player, and 45 on 4-player, not enough to test each of its 6 strategies more than a handful of times per map size (see the quick calculation below). It could learn faster by doing a touch of generalization; I’ll post on that some other day.
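To put rough numbers on that, divide the per-opponent game counts by the six strategies. The counts are the tournament figures above; the arithmetic is mine.

```python
# Games per opponent in AIIDE 2015, by map size (the numbers quoted above),
# spread across AIUR's 6 strategies.
games_per_map_size = {2: 27, 3: 18, 4: 45}
num_strategies = 6

for starts, games in games_per_map_size.items():
    print(f"{starts}-player maps: {games} games -> ~{games / num_strategies:.1f} per strategy")

# 2-player maps: 27 games -> ~4.5 per strategy
# 3-player maps: 18 games -> ~3.0 per strategy
# 4-player maps: 45 games -> ~7.5 per strategy
```

Three to eight samples per cell is not nearly enough to reliably tell a 40% strategy from a 60% one, which fits the slow climb in the graph.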
AIUR also claims to implement its strategies with a further dose of randomness. Intentional unpredictability could confuse the learning algorithms of its enemies. I approve.
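As a toy illustration of that kind of in-strategy randomness, here is what jittering a strategy’s execution details might look like. The parameter names and ranges are invented, not AIUR’s; the idea is only that the same “mood” never plays out identically, so an opponent has a harder time modeling it.

```python
import random

# Invented example: vary a strategy's execution parameters each game so the
# same plan never presents a fixed pattern. Names and ranges are made up.
def randomized_zealot_drop_plan():
    return {
        "zealots_in_drop": random.choice([3, 4]),
        "launch_frame": random.randint(7000, 8500),   # roughly five to six minutes in
        "drop_target": random.choice(["main", "natural"]),
    }
```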