AIUR learns more
The protoss bot AIUR by Florian Richoux has a set of hand-coded strategies and learns over time which strategies win against which opponents. That’s a popular religion; other bots like Overkill (see my post on it) and Tscmoo worship at the same altar. But a funny thing happened on the way through the tournament. In the AIIDE 2015 competition report, look at the graph of winning rate over time for the different bots. Let me steal the image showing the top half of participants:
AIUR’s line is the one in the middle that keeps rising and rising. Look carefully and you can see it leveling off, but it hasn’t reached its asymptote by the end of the very long tournament. AIUR seems to learn more, and to keep on learning, even though its learning method is about the same as other bots’. Howzat happen?
Of course AIUR doesn’t do exactly the same thing as other bots. After all, it calls its strategies “moods,” which sounds entirely different. It doesn’t learn an opponent -> strategy mapping; it learns opponent + map size -> strategy, where map size means the number of starting positions, usually 2, 3, or 4. It can figure out that its cannon rush works better on 2-player maps, for example. I imagine that that’s part of the answer, but could it be the whole story?
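To make the mapping concrete, here is a minimal sketch of the kind of table that implies. The strategy names, the epsilon-greedy selection rule, and the 0.5 default for untried strategies are my assumptions for illustration, not AIUR’s actual moods or learning rule.

```python
import random
from collections import defaultdict

# Hypothetical sketch of an (opponent, map size) -> strategy table.
# Strategy names and the learning rule are invented for illustration.
STRATEGIES = ["cannon_rush", "dark_templar_rush", "zealot_drop",
              "fast_expand", "defensive", "macro"]

class StrategyPicker:
    def __init__(self, epsilon=0.1):
        self.epsilon = epsilon
        self.wins = defaultdict(int)    # keyed by (opponent, map_size, strategy)
        self.games = defaultdict(int)

    def choose(self, opponent, map_size):
        # Occasionally explore at random; otherwise pick the best observed win rate.
        if random.random() < self.epsilon:
            return random.choice(STRATEGIES)
        def win_rate(strategy):
            key = (opponent, map_size, strategy)
            g = self.games[key]
            return self.wins[key] / g if g else 0.5   # default for untried strategies
        return max(STRATEGIES, key=win_rate)

    def record(self, opponent, map_size, strategy, won):
        key = (opponent, map_size, strategy)
        self.games[key] += 1
        self.wins[key] += int(won)
```

The map-size key is the point: results on 2-player maps never pollute the estimate for 4-player maps, which also means the table takes longer to fill in.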
I have a theory: AIUR’s extreme strategies make good probes for weakness. Its strategies range from the absolutely reckless cannon rush, dark templar rush, and 4-zealot drop cheese to defensive and macro-oriented game plans; they stake out the corners of strategy space. Compare Overkill’s middle-of-the-road zergling, mutalisk, and hydralisk strats: no fast rushes or slow macro plays, nothing highly aggressive and nothing highly cautious. If an enemy makes systematic mistakes, then one of AIUR’s extreme strategies is likely to exploit those mistakes, and AIUR will eventually learn which one.
If true, that could explain why AIUR learns more effectively in the long run. Presumably the reason it takes so long to reach its asymptote is that it has to learn the effect of the map size. The tournament had 27 games per opponent on 2-player maps, 18 on 3-player, and 45 on 4-player, not enough to test each of its 6 strategies more than a handful of times per map size (see the quick calculation below). It could learn faster by doing a touch of generalization; I’ll post on that some other day.
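To put rough numbers on that, divide the per-opponent game counts by the six strategies. The counts are the tournament figures above; the arithmetic is mine.

```python
# Games per opponent in AIIDE 2015, by map size (the numbers quoted above),
# spread across AIUR's 6 strategies.
games_per_map_size = {2: 27, 3: 18, 4: 45}
num_strategies = 6

for starts, games in games_per_map_size.items():
    print(f"{starts}-player maps: {games} games -> ~{games / num_strategies:.1f} per strategy")

# 2-player maps: 27 games -> ~4.5 per strategy
# 3-player maps: 18 games -> ~3.0 per strategy
# 4-player maps: 45 games -> ~7.5 per strategy
```

Three to eight samples per cell is not nearly enough to reliably tell a 40% strategy from a 60% one, which fits the slow climb in the graph.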
AIUR also claims to implement its strategies with a further dose of randomness. Intentional unpredictability could confuse the learning algorithms of its enemies. I approve.
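As a toy illustration of that kind of in-strategy randomness, here is what jittering a strategy’s execution details might look like. The parameter names and ranges are invented, not AIUR’s; the idea is only that the same “mood” never plays out identically, so an opponent has a harder time modeling it.

```python
import random

# Invented example: vary a strategy's execution parameters each game so the
# same plan never presents a fixed pattern. Names and ranges are made up.
def randomized_zealot_drop_plan():
    return {
        "zealots_in_drop": random.choice([3, 4]),
        "launch_frame": random.randint(7000, 8500),   # roughly five to six minutes in
        "drop_target": random.choice(["main", "natural"]),
    }
```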