what AIUR learned
After Overkill yesterday, I wrote a not-quite-as-little Perl script to read AIUR’s learning files. AIUR learns more data than Overkill does: Overkill learns a table indexed by (opponent, strategy), while AIUR learns a table indexed by (opponent, strategy, map size), where map size is the number of starting positions, which is 2, 3, or 4 in AIIDE 2015.
Unlike Overkill, AIUR recorded every game exactly once, missing none and adding none, so its data should be easier to interpret.
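For illustration, here is a minimal Perl sketch of the kind of tally behind a table like that. Everything in it is invented for the example: the `record_game` helper, the nested hash layout, and the sample records are assumptions, not AIUR’s actual file format or my script’s real parsing code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Tally (opponent, strategy, map size) -> games played and games won.
my %table;    # $table{$opponent}{$strategy}{$starts} = [games, wins]

sub record_game {
    my ($opponent, $strategy, $starts, $won) = @_;
    my $cell = $table{$opponent}{$strategy}{$starts} ||= [0, 0];
    $cell->[0]++;            # one more game in this cell
    $cell->[1]++ if $won;    # one more win in this cell
}

# A few made-up example records (not real tournament data).
record_game('overkill', 'cheese',    2, 1);
record_game('overkill', 'cheese',    2, 0);
record_game('overkill', 'defensive', 4, 1);

for my $strategy (sort keys %{ $table{'overkill'} }) {
    for my $starts (sort keys %{ $table{'overkill'}{$strategy} }) {
        my ($n, $wins) = @{ $table{'overkill'}{$strategy}{$starts} };
        printf "%-10s %d starts: %d games, %.0f%% wins\n",
            $strategy, $starts, $n, 100 * $wins / $n;
    }
}
```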
Here’s a sample table for one opponent. Compare it against AIUR’s row in Overkill’s table from yesterday. See the full AIUR learning results.
| overkill | 2 starts: n | wins | 3 starts: n | wins | 4 starts: n | wins | total: n | wins |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| cheese | 18 | 67% | 3 | 33% | 1 | 0% | 22 | 59% |
| rush | 1 | 0% | 1 | 0% | 1 | 0% | 3 | 0% |
| aggressive | 1 | 0% | 1 | 0% | 1 | 0% | 3 | 0% |
| fast expo | 1 | 0% | 1 | 0% | 2 | 0% | 4 | 0% |
| macro | 1 | 0% | 3 | 33% | 25 | 12% | 29 | 14% |
| defensive | 5 | 40% | 9 | 33% | 15 | 40% | 29 | 38% |
| total | 27 | 52% | 18 | 28% | 45 | 20% | 90 | 31% |
For reference, here are AIUR’s “moods,” aka strategies.
- cheese - cannon rush
- rush - dark templar rush
- aggressive - fast 4-zealot drop
- fast expo - nexus first
- macro - aim for a strong middle game army
- defensive - be safe against rushes
We see that against Overkill, the cannon rush was relatively successful on 2-player maps, 3-player maps were a struggle, and on 4-player maps AIUR discovered a little late that the defensive mood was better than the macro mood. We also see that AIUR barely explored further once it found a reasonably successful strategy. If the best strategy happened to lose its first game and was never tried again, AIUR would never find out. With so many table cells to fill in, the tremendously long tournament was not long enough for AIUR to explore every possibility thoroughly.
AIUR selected strategies with an initial phase of try-everything-approximately-once followed by an epsilon-greedy algorithm, with epsilon set at 6%. Epsilon-greedy means that 6% of the time it chose a strategy at random, and otherwise it made the greedy choice, the strategy with the best record so far. With 90 games against each opponent to fill in 18 table cells, most cells never came up in the 6% random sample.
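To make the selection rule concrete, here is a minimal sketch in Perl under stated assumptions: the caller passes in the cells for the current opponent and map size, each cell holds invented `games` and `wins` fields, and the initial phase here is exactly-once rather than AIUR’s approximately-once. It illustrates epsilon-greedy as described above; it is not AIUR’s actual code.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(shuffle);

my $EPSILON = 0.06;    # 6% random exploration

sub choose_strategy {
    my ($cells) = @_;    # $cells->{$strategy} = { games => n, wins => w }
    my @strategies = shuffle keys %$cells;

    # Initial phase: anything untried gets tried first.
    for my $s (@strategies) {
        return $s if $cells->{$s}{games} == 0;
    }

    # Exploration: with probability epsilon, pick at random.
    return $strategies[0] if rand() < $EPSILON;

    # Exploitation: otherwise pick the best win rate so far.
    my ($best, $best_rate) = (undef, -1);
    for my $s (@strategies) {
        my $rate = $cells->{$s}{wins} / $cells->{$s}{games};
        ($best, $best_rate) = ($s, $rate) if $rate > $best_rate;
    }
    return $best;
}
```

Shuffling the strategy list first means the random pick and any tie-breaks among equal records come out random rather than favoring whatever order the keys happen to be in.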
It should be clear why AIUR was still improving steadily at the end of the tournament! I offered a theory that AIUR learned so much because of its extreme strategies. If you read through the full set of tables, you’ll see that a strategy which works on one map size sometimes, but not always, works on the other sizes too. Learning on the combination of opponent and map size sometimes paid off in ways that learning on either alone could not.
Overkill and AIUR fought a learning duel during the tournament. Both ran learning algorithms that assume the opponent does not change (or at least settles down in the long run), and both bots violated that assumption; AIUR violated it more strongly. Was that an advantage? Could there be a connection with AIUR’s late discovery of the defensive strategy on 4-player maps?
I updated the zip archive of the Perl scripts and related files to add AIUR’s script alongside Overkill’s. By the way, I haven’t tested it on Windows, so it may need a small tweak or two there.