what AIUR learned
After Overkill yesterday, I wrote a not-quite-as-little Perl script to read AIUR’s learning files. AIUR learns more data than Overkill does: Overkill learns a table indexed by (opponent, strategy), while AIUR learns a table indexed by (opponent, strategy, map size), where map size is the number of starting positions, which is 2, 3, or 4 in AIIDE 2015.
Unlike Overkill, AIUR recorded every game exactly once, missing none and adding none, so its data should be easier to interpret.
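For illustration, here is a minimal Perl sketch of the kind of tally behind a table like that. Everything in it is invented for the example: the `record_game` helper, the nested hash layout, and the sample records are assumptions, not AIUR’s actual file format or my script’s real parsing code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Tally (opponent, strategy, map size) -> games played and games won.
my %table;    # $table{$opponent}{$strategy}{$starts} = [games, wins]

sub record_game {
    my ($opponent, $strategy, $starts, $won) = @_;
    my $cell = $table{$opponent}{$strategy}{$starts} ||= [0, 0];
    $cell->[0]++;            # one more game in this cell
    $cell->[1]++ if $won;    # one more win in this cell
}

# A few made-up example records (not real tournament data).
record_game('overkill', 'cheese',    2, 1);
record_game('overkill', 'cheese',    2, 0);
record_game('overkill', 'defensive', 4, 1);

for my $strategy (sort keys %{ $table{'overkill'} }) {
    for my $starts (sort keys %{ $table{'overkill'}{$strategy} }) {
        my ($n, $wins) = @{ $table{'overkill'}{$strategy}{$starts} };
        printf "%-10s %d starts: %d games, %.0f%% wins\n",
            $strategy, $starts, $n, 100 * $wins / $n;
    }
}
```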
Here’s a sample table for one opponent. Compare it against AIUR’s row in Overkill’s table from yesterday. See the full AIUR learning results.
| overkill | 2 starts: n | wins | 3 starts: n | wins | 4 starts: n | wins | total: n | wins |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| cheese | 18 | 67% | 3 | 33% | 1 | 0% | 22 | 59% |
| rush | 1 | 0% | 1 | 0% | 1 | 0% | 3 | 0% |
| aggressive | 1 | 0% | 1 | 0% | 1 | 0% | 3 | 0% |
| fast expo | 1 | 0% | 1 | 0% | 2 | 0% | 4 | 0% |
| macro | 1 | 0% | 3 | 33% | 25 | 12% | 29 | 14% |
| defensive | 5 | 40% | 9 | 33% | 15 | 40% | 29 | 38% |
| total | 27 | 52% | 18 | 28% | 45 | 20% | 90 | 31% |
For reference, here are AIUR’s “moods,” aka strategies.
- cheese - cannon rush
- rush - dark templar rush
- aggressive - fast 4-zealot drop
- fast expo - nexus first
- macro - aim for a strong middle game army
- defensive - be safe against rushes
We see that against Overkill, the cannon rush was relatively successful on 2-player maps, 3-player maps were a struggle, and on 4-player maps AIUR discovered a little late that the defensive mood was better than the macro mood. We also see that AIUR barely explored further once it found a reasonably successful strategy. If the best strategy happened to lose its first game and was never tried again, AIUR would never find out. With so many table cells to fill in, the tremendously long tournament was not long enough for AIUR to explore every possibility thoroughly.
AIUR selected strategies with an initial phase of try-everything-approximately-once followed by an epsilon-greedy algorithm, with epsilon set at 6%. Epsilon-greedy means that 6% of the time it chose a strategy at random, and otherwise it made the greedy choice, the strategy with the best record so far. With 90 games against each opponent to fill in 18 table cells, most cells never came up in the 6% random sample.
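To make the selection rule concrete, here is a minimal sketch in Perl under stated assumptions: the caller passes in the cells for the current opponent and map size, each cell holds invented `games` and `wins` fields, and the initial phase here is exactly-once rather than AIUR’s approximately-once. It illustrates epsilon-greedy as described above; it is not AIUR’s actual code.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(shuffle);

my $EPSILON = 0.06;    # 6% random exploration

sub choose_strategy {
    my ($cells) = @_;    # $cells->{$strategy} = { games => n, wins => w }
    my @strategies = shuffle keys %$cells;

    # Initial phase: anything untried gets tried first.
    for my $s (@strategies) {
        return $s if $cells->{$s}{games} == 0;
    }

    # Exploration: with probability epsilon, pick at random.
    return $strategies[0] if rand() < $EPSILON;

    # Exploitation: otherwise pick the best win rate so far.
    my ($best, $best_rate) = (undef, -1);
    for my $s (@strategies) {
        my $rate = $cells->{$s}{wins} / $cells->{$s}{games};
        ($best, $best_rate) = ($s, $rate) if $rate > $best_rate;
    }
    return $best;
}
```

Shuffling the strategy list first means the random pick and any tie-breaks among equal records come out random rather than favoring whatever order the keys happen to be in.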
It should be clear why AIUR was still improving steadily at the end of the tournament! I offered a theory that AIUR learned so much because of its extreme strategies. If you read through the full set of tables, you’ll see that a strategy which works on one map size sometimes, but not always, works on the other sizes too. Learning on the combination of opponent and map size sometimes paid off in ways that learning on either alone could not.
Overkill and AIUR fought a learning duel during the tournament. Both ran learning algorithms that assume the opponent does not change (or at least settles down in the long run), and both bots violated that assumption; AIUR violated it more strongly. Was that an advantage? Could there be a connection with AIUR’s late discovery of the defensive strategy on 4-player maps?
I updated the zip archive of the Perl scripts and related files to add AIUR’s script alongside Overkill’s. By the way, I haven’t tested it on Windows, so it may need a small tweak or two there.