AIIDE 2022 - map tables by bot

For each bot, its win rate by map and opponent. You can abbreviate it as bot x (map x opponent) if you like. Yesterday’s tables showed that maps make little difference when averaged across opponents. Today’s show that (as usual) maps do make a difference for specific opponents.

Each cell represents 22 or 23 games, sometimes fewer when games did not complete. No cell has fewer than 20 games. The same tables last year had 15 games per cell. The numbers are a trifle more reliable this year, but there is still a lot of statistical noise.
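To put a rough number on that noise, here is a minimal sketch using the standard binomial approximation. It assumes the games in a cell are independent with a fixed win probability, which is not strictly true for bots that learn between games. The function names are mine, not from any tournament tooling.

```python
import math


def standard_error(win_rate: float, games: int) -> float:
    """Binomial standard error of an observed win rate (assumes independent games)."""
    return math.sqrt(win_rate * (1.0 - win_rate) / games)


def wilson_interval(wins: int, games: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a win rate; z = 1.96 gives roughly 95% coverage."""
    p = wins / games
    denom = 1.0 + z * z / games
    center = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1.0 - p) / games + z * z / (4 * games * games)) / denom
    return center - half, center + half


if __name__ == "__main__":
    for games in (15, 22):
        print(games, "games per cell: standard error at 50% =",
              round(standard_error(0.5, games), 3))
    print("11 wins in 22 games:", wilson_interval(11, 22))
```

At a 50% win rate the standard error is about 13 percentage points with 15 games and about 11 with 22, so a single cell can still be off by a wide margin.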

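| # | bananabrain | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | stardust | 52% | 52% | 65% | 32% | 32% | 77% | 45% | 50% | 50% | 50% | 64% |
| 3 | dragon | 78% | 78% | 78% | 77% | 77% | 86% | 86% | 77% | 64% | 82% | 73% |
| 4 | steamhammer | 89% | 87% | 96% | 82% | 86% | 91% | 86% | 77% | 100% | 86% | 95% |
| 5 | purplewave | 69% | 87% | 65% | 73% | 64% | 86% | 50% | 55% | 68% | 68% | 73% |
| 6 | mcrave | 94% | 91% | 83% | 95% | 95% | 91% | 95% | 91% | 95% | 100% | 100% |
| 7 | microwave | 91% | 91% | 91% | 91% | 100% | 86% | 95% | 82% | 95% | 91% | 91% |
| 8 | ualbertabot | 97% | 91% | 100% | 95% | 100% | 95% | 100% | 95% | 95% | 95% | 100% |
| 9 | pylonpuller | 92% | 91% | 87% | 95% | 100% | 100% | 91% | 91% | 86% | 100% | 77% |
| 10 | styx | 94% | 100% | 100% | 100% | 100% | 86% | 100% | 95% | 95% | 73% | 91% |
| 11 | cunybot | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
|  | overall | 85.53% | 87% | 87% | 84% | 85% | 90% | 85% | 81% | 85% | 85% | 86% |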

#1 BananaBrain was solid against most opponents, but inconsistent across maps versus its top protoss competition, #2 Stardust and #5 PurpleWave.

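| # | stardust | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | bananabrain | 48% | 48% | 35% | 68% | 68% | 23% | 55% | 50% | 50% | 50% | 36% |
| 3 | dragon | 92% | 100% | 95% | 77% | 100% | 100% | 95% | 64% | 100% | 91% | 95% |
| 4 | steamhammer | 83% | 100% | 82% | 73% | 86% | 91% | 77% | 50% | 95% | 95% | 82% |
| 5 | purplewave | 52% | 57% | 50% | 73% | 68% | 27% | 64% | 18% | 50% | 59% | 50% |
| 6 | mcrave | 89% | 83% | 83% | 86% | 95% | 95% | 100% | 95% | 91% | 77% | 82% |
| 7 | microwave | 93% | 96% | 100% | 95% | 100% | 100% | 95% | 95% | 77% | 82% | 86% |
| 8 | ualbertabot | 83% | 83% | 86% | 91% | 100% | 77% | 100% | 55% | 77% | 82% | 82% |
| 9 | pylonpuller | 95% | 96% | 95% | 100% | 100% | 100% | 100% | 86% | 77% | 91% | 100% |
| 10 | styx | 84% | 100% | 95% | 100% | 95% | 77% | 91% | 32% | 86% | 86% | 73% |
| 11 | cunybot | 97% | 96% | 100% | 100% | 100% | 100% | 100% | 95% | 95% | 86% | 95% |
|  | overall | 81.48% | 86% | 82% | 86% | 91% | 79% | 88% | 64% | 80% | 80% | 78% |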

Here is the source of #2 Stardust’s relative weakness on Empire of the Sun: #5 PurpleWave and #10 Styx found holes in its play on the map. The upset by Styx on that map only is particularly extreme. Heartbreak Ridge, Longinus, and Empire of the Sun are the maps where the main bases are on the same level as the naturals, with no ramp, and all of them had at least one opponent that could exploit Stardust. But if that’s the cause, then why is Aztec fine for Stardust? On Aztec, the naturals are uphill from the mains.

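| # | dragon | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | bananabrain | 22% | 22% | 22% | 23% | 23% | 14% | 14% | 23% | 36% | 18% | 27% |
| 2 | stardust | 8% | 0% | 5% | 23% | 0% | 0% | 5% | 36% | 0% | 9% | 5% |
| 4 | steamhammer | 21% | 26% | 32% | 14% | 18% | 32% | 32% | 14% | 5% | 36% | 5% |
| 5 | purplewave | 97% | 91% | 95% | 91% | 100% | 100% | 100% | 100% | 100% | 95% | 100% |
| 6 | mcrave | 95% | 96% | 87% | 95% | 100% | 100% | 95% | 91% | 95% | 100% | 95% |
| 7 | microwave | 56% | 65% | 55% | 36% | 45% | 64% | 73% | 59% | 55% | 59% | 50% |
| 8 | ualbertabot | 77% | 82% | 64% | 95% | 80% | 73% | 77% | 77% | 68% | 77% | 73% |
| 9 | pylonpuller | 98% | 100% | 100% | 95% | 100% | 91% | 95% | 100% | 100% | 100% | 100% |
| 10 | styx | 94% | 96% | 100% | 100% | 91% | 91% | 95% | 100% | 100% | 82% | 86% |
| 11 | cunybot | 95% | 96% | 96% | 95% | 91% | 95% | 100% | 91% | 100% | 95% | 95% |
|  | overall | 66.46% | 67% | 65% | 67% | 65% | 66% | 69% | 69% | 66% | 67% | 64% |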

Last year and the year before I thought that #3 Dragon was inconsistent across maps. This year it doesn’t look that way. It’s the same bot carried over. The difference seems to be that this year Dragon either smashed its opponents or got smashed by them. It remains inconsistent against #7 Microwave and #8 UAlbertaBot, the opponents scoring closest to 50%.

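| # | steamhammer | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | bananabrain | 11% | 13% | 4% | 18% | 14% | 9% | 14% | 23% | 0% | 14% | 5% |
| 2 | stardust | 17% | 0% | 18% | 27% | 14% | 9% | 23% | 50% | 5% | 5% | 18% |
| 3 | dragon | 79% | 74% | 68% | 86% | 82% | 68% | 68% | 86% | 95% | 64% | 95% |
| 5 | purplewave | 43% | 57% | 55% | 50% | 41% | 27% | 41% | 27% | 36% | 32% | 59% |
| 6 | mcrave | 43% | 48% | 74% | 45% | 45% | 45% | 45% | 23% | 32% | 45% | 27% |
| 7 | microwave | 73% | 70% | 57% | 50% | 82% | 68% | 91% | 77% | 77% | 91% | 68% |
| 8 | ualbertabot | 95% | 91% | 100% | 100% | 86% | 100% | 95% | 95% | 100% | 91% | 95% |
| 9 | pylonpuller | 80% | 70% | 64% | 82% | 86% | 86% | 82% | 91% | 82% | 77% | 82% |
| 10 | styx | 90% | 91% | 86% | 100% | 95% | 86% | 100% | 95% | 95% | 77% | 73% |
| 11 | cunybot | 97% | 100% | 96% | 100% | 100% | 91% | 95% | 95% | 100% | 100% | 95% |
|  | overall | 62.71% | 61% | 62% | 66% | 65% | 59% | 65% | 66% | 62% | 60% | 61% |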

Someday I will get Steamhammer to adapt properly to the map it is playing on.

#4 Steamhammer owes its ranking in large part to its strong performance against the carryover bots that it specifically prepared for. Versus #3 Dragon: Last year 63%, this year 79%. Versus #8 UAlbertaBot: Last year 92%, this year 95%. I knew that both would be up. I’m surprised that other bots seem to have been unprepared for Dragon in particular.

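| # | purplewave | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | bananabrain | 31% | 13% | 35% | 27% | 36% | 14% | 50% | 45% | 32% | 32% | 27% |
| 2 | stardust | 48% | 43% | 50% | 27% | 32% | 73% | 36% | 82% | 50% | 41% | 50% |
| 3 | dragon | 3% | 9% | 5% | 9% | 0% | 0% | 0% | 0% | 0% | 5% | 0% |
| 4 | steamhammer | 57% | 43% | 45% | 50% | 59% | 73% | 59% | 73% | 64% | 68% | 41% |
| 6 | mcrave | 84% | 26% | 57% | 100% | 91% | 86% | 91% | 100% | 100% | 95% | 100% |
| 7 | microwave | 50% | 35% | 17% | 55% | 45% | 55% | 36% | 82% | 77% | 45% | 50% |
| 8 | ualbertabot | 66% | 78% | 55% | 100% | 68% | 68% | 55% | 64% | 55% | 64% | 50% |
| 9 | pylonpuller | 87% | 91% | 74% | 100% | 95% | 77% | 91% | 77% | 95% | 82% | 86% |
| 10 | styx | 96% | 100% | 100% | 73% | 100% | 100% | 100% | 95% | 100% | 95% | 100% |
| 11 | cunybot | 89% | 83% | 96% | 82% | 86% | 86% | 100% | 77% | 95% | 100% | 86% |
|  | overall | 61.17% | 52% | 53% | 62% | 61% | 63% | 62% | 70% | 67% | 63% | 59% |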

#5 PurpleWave struggled versus #6 McRave on the 2-player maps Destination and Heartbreak Ridge, but scored 100% on the other 2-player map Polaris Rhapsody. It smells like a bug—but see the next table.

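| # | mcrave | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | bananabrain | 6% | 9% | 17% | 5% | 5% | 9% | 5% | 9% | 5% | 0% | 0% |
| 2 | stardust | 11% | 17% | 17% | 14% | 5% | 5% | 0% | 5% | 9% | 23% | 18% |
| 3 | dragon | 5% | 4% | 13% | 5% | 0% | 0% | 5% | 9% | 5% | 0% | 5% |
| 4 | steamhammer | 57% | 52% | 26% | 55% | 55% | 55% | 55% | 77% | 68% | 55% | 73% |
| 5 | purplewave | 16% | 74% | 43% | 0% | 9% | 14% | 9% | 0% | 0% | 5% | 0% |
| 7 | microwave | 92% | 100% | 91% | 68% | 100% | 100% | 77% | 91% | 95% | 100% | 100% |
| 8 | ualbertabot | 29% | 74% | 41% | 43% | 14% | 36% | 5% | 14% | 9% | 36% | 10% |
| 9 | pylonpuller | 62% | 65% | 70% | 82% | 73% | 50% | 68% | 36% | 68% | 59% | 50% |
| 10 | styx | 91% | 100% | 70% | 82% | 91% | 100% | 95% | 100% | 95% | 82% | 91% |
| 11 | cunybot | 100% | 100% | 100% | 100% | 100% | 95% | 100% | 100% | 100% | 100% | 100% |
|  | overall | 46.79% | 60% | 49% | 45% | 45% | 46% | 42% | 44% | 45% | 46% | 45% |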

Why does #6 McRave like Destination? Mainly because of upsets against #5 PurpleWave and #8 UAlbertaBot that otherwise defeat it. If the win over PurpleWave is due to PurpleWave’s putative bug, then what explains the win over UAlbertaBot?

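| # | microwave | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | bananabrain | 9% | 9% | 9% | 9% | 0% | 14% | 5% | 18% | 5% | 9% | 9% |
| 2 | stardust | 7% | 4% | 0% | 5% | 0% | 0% | 5% | 5% | 23% | 18% | 14% |
| 3 | dragon | 44% | 35% | 45% | 64% | 55% | 36% | 27% | 41% | 45% | 41% | 50% |
| 4 | steamhammer | 27% | 30% | 43% | 50% | 18% | 32% | 9% | 23% | 23% | 9% | 32% |
| 5 | purplewave | 50% | 65% | 83% | 45% | 55% | 45% | 64% | 18% | 23% | 55% | 50% |
| 6 | mcrave | 8% | 0% | 9% | 32% | 0% | 0% | 23% | 9% | 5% | 0% | 0% |
| 8 | ualbertabot | 57% | 70% | 65% | 55% | 43% | 36% | 77% | 73% | 59% | 55% | 32% |
| 9 | pylonpuller | 67% | 57% | 61% | 91% | 68% | 50% | 55% | 77% | 77% | 64% | 68% |
| 10 | styx | 99% | 100% | 100% | 100% | 95% | 100% | 100% | 100% | 95% | 100% | 100% |
| 11 | cunybot | 99% | 100% | 100% | 100% | 100% | 95% | 100% | 95% | 100% | 100% | 100% |
|  | overall | 46.62% | 47% | 52% | 55% | 43% | 41% | 46% | 46% | 45% | 45% | 45% |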

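| # | ualbertabot | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | bananabrain | 3% | 9% | 0% | 5% | 0% | 5% | 0% | 5% | 5% | 5% | 0% |
| 2 | stardust | 17% | 17% | 14% | 9% | 0% | 23% | 0% | 45% | 23% | 18% | 18% |
| 3 | dragon | 23% | 18% | 36% | 5% | 20% | 27% | 23% | 23% | 32% | 23% | 27% |
| 4 | steamhammer | 5% | 9% | 0% | 0% | 14% | 0% | 5% | 5% | 0% | 9% | 5% |
| 5 | purplewave | 34% | 22% | 45% | 0% | 32% | 32% | 45% | 36% | 45% | 36% | 50% |
| 6 | mcrave | 71% | 26% | 59% | 57% | 86% | 64% | 95% | 86% | 91% | 64% | 90% |
| 7 | microwave | 43% | 30% | 35% | 45% | 57% | 64% | 23% | 27% | 41% | 45% | 68% |
| 9 | pylonpuller | 66% | 74% | 64% | 91% | 82% | 59% | 64% | 43% | 45% | 64% | 77% |
| 10 | styx | 95% | 86% | 100% | 100% | 95% | 95% | 86% | 82% | 100% | 100% | 100% |
| 11 | cunybot | 98% | 100% | 100% | 95% | 100% | 100% | 100% | 95% | 100% | 91% | 100% |
|  | overall | 45.74% | 39% | 45% | 41% | 49% | 47% | 44% | 45% | 48% | 45% | 54% |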

It’s interesting that #8 UAlbertaBot does better against #6 McRave on the 4-player maps. You might think that UAlbertaBot’s rushes would work better on 2-player maps with a short rush distance, but it’s the opposite. I imagine it is because McRave takes longer to scout, so it can’t adapt as quickly.

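| # | pylonpuller | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | bananabrain | 8% | 9% | 13% | 5% | 0% | 0% | 9% | 9% | 14% | 0% | 23% |
| 2 | stardust | 5% | 4% | 5% | 0% | 0% | 0% | 0% | 14% | 23% | 9% | 0% |
| 3 | dragon | 2% | 0% | 0% | 5% | 0% | 9% | 5% | 0% | 0% | 0% | 0% |
| 4 | steamhammer | 20% | 30% | 36% | 18% | 14% | 14% | 18% | 9% | 18% | 23% | 18% |
| 5 | purplewave | 13% | 9% | 26% | 0% | 5% | 23% | 9% | 23% | 5% | 18% | 14% |
| 6 | mcrave | 38% | 35% | 30% | 18% | 27% | 50% | 32% | 64% | 32% | 41% | 50% |
| 7 | microwave | 33% | 43% | 39% | 9% | 32% | 50% | 45% | 23% | 23% | 36% | 32% |
| 8 | ualbertabot | 34% | 26% | 36% | 9% | 18% | 41% | 36% | 57% | 55% | 36% | 23% |
| 10 | styx | 62% | 87% | 64% | 18% | 73% | 82% | 45% | 64% | 73% | 59% | 55% |
| 11 | cunybot | 74% | 83% | 83% | 36% | 86% | 55% | 77% | 82% | 68% | 91% | 77% |
|  | overall | 28.91% | 33% | 33% | 12% | 25% | 32% | 28% | 34% | 31% | 31% | 29% |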

Wow, look at results versus #10 Styx. Polaris Rhapsody does seem to be an outlier among the 2-player maps.

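| # | styx | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | bananabrain | 6% | 0% | 0% | 0% | 0% | 14% | 0% | 5% | 5% | 27% | 9% |
| 2 | stardust | 16% | 0% | 5% | 0% | 5% | 23% | 9% | 68% | 14% | 14% | 27% |
| 3 | dragon | 6% | 4% | 0% | 0% | 9% | 9% | 5% | 0% | 0% | 18% | 14% |
| 4 | steamhammer | 10% | 9% | 14% | 0% | 5% | 14% | 0% | 5% | 5% | 23% | 27% |
| 5 | purplewave | 4% | 0% | 0% | 27% | 0% | 0% | 0% | 5% | 0% | 5% | 0% |
| 6 | mcrave | 9% | 0% | 30% | 18% | 9% | 0% | 5% | 0% | 5% | 18% | 9% |
| 7 | microwave | 1% | 0% | 0% | 0% | 5% | 0% | 0% | 0% | 5% | 0% | 0% |
| 8 | ualbertabot | 5% | 14% | 0% | 0% | 5% | 5% | 14% | 18% | 0% | 0% | 0% |
| 9 | pylonpuller | 38% | 13% | 36% | 82% | 27% | 18% | 55% | 36% | 27% | 41% | 45% |
| 11 | cunybot | 44% | 48% | 17% | 59% | 68% | 50% | 23% | 32% | 45% | 55% | 41% |
|  | overall | 13.92% | 9% | 10% | 19% | 13% | 13% | 11% | 17% | 10% | 20% | 17% |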

Only a few pinprick upsets, but one of them is extreme.

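| # | cunybot | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | bananabrain | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| 2 | stardust | 3% | 4% | 0% | 0% | 0% | 0% | 0% | 5% | 5% | 14% | 5% |
| 3 | dragon | 5% | 4% | 4% | 5% | 9% | 5% | 0% | 9% | 0% | 5% | 5% |
| 4 | steamhammer | 3% | 0% | 4% | 0% | 0% | 9% | 5% | 5% | 0% | 0% | 5% |
| 5 | purplewave | 11% | 17% | 4% | 18% | 14% | 14% | 0% | 23% | 5% | 0% | 14% |
| 6 | mcrave | 0% | 0% | 0% | 0% | 0% | 5% | 0% | 0% | 0% | 0% | 0% |
| 7 | microwave | 1% | 0% | 0% | 0% | 0% | 5% | 0% | 5% | 0% | 0% | 0% |
| 8 | ualbertabot | 2% | 0% | 0% | 5% | 0% | 0% | 0% | 5% | 0% | 9% | 0% |
| 9 | pylonpuller | 26% | 17% | 17% | 64% | 14% | 45% | 23% | 18% | 32% | 9% | 23% |
| 10 | styx | 56% | 52% | 83% | 41% | 32% | 50% | 77% | 68% | 55% | 45% | 59% |
|  | overall | 10.69% | 10% | 11% | 13% | 7% | 13% | 10% | 14% | 10% | 8% | 11% |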

AIIDE 2022 - maps and game durations

First, win rates for bots x maps. This is identical to the third table in the official results, except for the presentation.

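| # | bot | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | bananabrain | 85.53% | 87% | 87% | 84% | 85% | 90% | 85% | 81% | 85% | 85% | 86% |
| 2 | stardust | 81.48% | 86% | 82% | 86% | 91% | 79% | 88% | 64% | 80% | 80% | 78% |
| 3 | dragon | 66.46% | 67% | 65% | 67% | 65% | 66% | 69% | 69% | 66% | 67% | 64% |
| 4 | steamhammer | 62.71% | 61% | 62% | 66% | 65% | 59% | 65% | 66% | 62% | 60% | 61% |
| 5 | purplewave | 61.17% | 52% | 53% | 62% | 61% | 63% | 62% | 70% | 67% | 63% | 59% |
| 6 | mcrave | 46.79% | 60% | 49% | 45% | 45% | 46% | 42% | 44% | 45% | 46% | 45% |
| 7 | microwave | 46.62% | 47% | 52% | 55% | 43% | 41% | 46% | 46% | 45% | 45% | 45% |
| 8 | ualbertabot | 45.74% | 39% | 45% | 41% | 49% | 47% | 44% | 45% | 48% | 45% | 54% |
| 9 | pylonpuller | 28.91% | 33% | 33% | 12% | 25% | 32% | 28% | 34% | 31% | 31% | 29% |
| 10 | styx | 13.92% | 9% | 10% | 19% | 13% | 13% | 11% | 17% | 10% | 20% | 17% |
| 11 | cunybot | 10.69% | 10% | 11% | 13% | 7% | 13% | 10% | 14% | 10% | 8% | 11% |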

Stardust had some trouble on Empire of the Sun, and McRave liked Destination. For the most part, maps did not make a big difference when averaged out over opponents.
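The difference between averaging over opponents and breaking results down by opponent is just a change of grouping key. Here is a minimal sketch, assuming game records are dicts with `bot`, `opponent`, `map`, and `won` fields. That record layout and the sample data are invented for illustration and are not the format of the official results file.

```python
from collections import defaultdict


def win_rates(games, key):
    """Group game records with `key` and return {group: wins / games played}."""
    tally = defaultdict(lambda: [0, 0])  # group -> [wins, games]
    for game in games:
        cell = tally[key(game)]
        cell[0] += 1 if game["won"] else 0
        cell[1] += 1
    return {group: wins / played for group, (wins, played) in tally.items()}


# Made-up records for illustration only; not tournament data.
SAMPLE = (
    [{"bot": "botA", "opponent": "botB", "map": "Empire", "won": w}
     for w in (False, False, True)]
    + [{"bot": "botA", "opponent": "botC", "map": "Empire", "won": w}
       for w in (True, True, True)]
)

# Bot x map: averaged over opponents.
by_map = win_rates(SAMPLE, lambda g: (g["bot"], g["map"]))
# Bot x (map x opponent): one cell per opponent on each map.
by_map_and_opponent = win_rates(SAMPLE, lambda g: (g["bot"], g["map"], g["opponent"]))
```

In the sample, the per-map average looks respectable while one opponent wins two games out of three on that map, which is the kind of effect that only the finer grouping can show.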

game durations

Game durations for bots x maps. The top number in each cell is the median duration of winning games, and the bottom number is for losing games. The overall numbers in the bottom row are the median duration of all games played on each map. The cell coloring is the same as in the table above—it reflects the winning rate, so you can judge by eye the balance of games in the top and bottom numbers.

As a general guideline, if winning games are shorter than losing games then the bot likes to win by early pressure and loses by getting outplayed later. Early pressure costs economy and tech. In the opposite case, the bot defends any early pressure and has stronger play in the long run (it shows any or all of macro, micro, and tech advantage). #8 UAlbertaBot is the most determined rushbot. #3 Dragon is the most prominent defensive bot. #7 Microwave is well-balanced. Note: Adding up the overall median winning times across opponents does not give the same result as adding up the losing times. The median is insensitive to outliers.
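The guideline can be sketched in a few lines. This is only an illustration: the 60-second tolerance for calling a bot balanced is my own arbitrary assumption, and the function names are hypothetical.

```python
from statistics import median


def clock_to_seconds(clock: str) -> int:
    """Convert a 'minutes:seconds' string such as '10:33' to seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)


def median_durations(games):
    """games: iterable of (won, seconds). Returns (median win, median loss)."""
    wins = [s for won, s in games if won]
    losses = [s for won, s in games if not won]
    return (median(wins) if wins else None, median(losses) if losses else None)


def style(win_median, loss_median, tolerance=60):
    """Classify a bot by comparing median winning and losing game lengths.

    tolerance (seconds) is an assumed cutoff, not something from the results.
    """
    if win_median is None or loss_median is None:
        return "unknown"
    if win_median + tolerance < loss_median:
        return "early pressure"   # wins fast, loses late
    if loss_median + tolerance < win_median:
        return "defensive"        # loses fast, wins late
    return "balanced"


if __name__ == "__main__":
    # Overall medians (win, loss) taken from the duration table.
    for bot, win, loss in [("ualbertabot", "6:28", "10:33"),
                           ("dragon", "13:57", "11:35"),
                           ("microwave", "9:36", "9:24")]:
        print(bot, style(clock_to_seconds(win), clock_to_seconds(loss)))
```

With the overall medians from the table, this labels UAlbertaBot as early pressure, Dragon as defensive, and Microwave as balanced, matching the reading above.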

Each cell shows the median winning time / median losing time.

| # | bot | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | bananabrain | 11:25 / 15:22 | 12:11 / 15:55 | 11:27 / 16:21 | 11:21 / 15:24 | 11:46 / 19:56 | 11:22 / 12:48 | 11:20 / 16:57 | 11:13 / 13:49 | 11:16 / 14:38 | 11:11 / 13:41 | 11:35 / 14:42 |
| 2 | stardust | 10:34 / 15:16 | 10:56 / 16:23 | 10:07 / 17:05 | 10:54 / 17:55 | 10:42 / 18:14 | 10:34 / 15:13 | 10:51 / 18:56 | 10:51 / 0:01 | 10:47 / 12:51 | 10:03 / 0:01 | 10:14 / 14:59 |
| 3 | dragon | 13:57 / 11:35 | 14:53 / 12:53 | 14:28 / 10:22 | 12:52 / 13:32 | 14:10 / 12:04 | 13:39 / 10:13 | 13:57 / 12:00 | 13:41 / 12:09 | 13:46 / 12:37 | 13:23 / 9:59 | 14:26 / 10:47 |
| 4 | steamhammer | 8:18 / 11:23 | 8:47 / 10:54 | 7:53 / 11:55 | 7:28 / 11:45 | 10:12 / 10:47 | 7:47 / 11:31 | 9:34 / 11:43 | 7:50 / 11:48 | 7:23 / 11:25 | 8:00 / 10:49 | 9:17 / 10:12 |
| 5 | purplewave | 12:09 / 17:50 | 13:20 / 17:45 | 13:07 / 16:45 | 12:31 / 19:14 | 12:08 / 20:51 | 12:34 / 15:48 | 12:09 / 18:54 | 12:09 / 16:38 | 11:47 / 18:24 | 11:53 / 18:18 | 11:38 / 17:08 |
| 6 | mcrave | 8:34 / 12:02 | 9:51 / 13:06 | 10:30 / 12:40 | 8:07 / 13:01 | 7:58 / 12:11 | 7:39 / 12:30 | 9:17 / 11:32 | 8:26 / 11:40 | 8:53 / 11:50 | 7:52 / 11:35 | 8:04 / 11:41 |
| 7 | microwave | 9:36 / 9:24 | 10:12 / 10:02 | 8:51 / 10:22 | 9:52 / 10:03 | 10:48 / 9:18 | 7:15 / 8:12 | 11:41 / 11:40 | 10:54 / 10:22 | 8:53 / 10:06 | 10:04 / 8:27 | 9:43 / 8:27 |
| 8 | ualbertabot | 6:28 / 10:33 | 8:00 / 11:18 | 6:39 / 10:09 | 5:24 / 10:30 | 6:22 / 10:48 | 6:18 / 10:21 | 6:55 / 10:00 | 6:38 / 10:26 | 6:28 / 10:34 | 6:19 / 10:52 | 6:50 / 10:41 |
| 9 | pylonpuller | 10:31 / 11:38 | 10:04 / 12:13 | 10:51 / 12:29 | 10:38 / 6:30 | 10:34 / 12:03 | 9:47 / 11:49 | 10:17 / 11:56 | 10:54 / 11:55 | 10:30 / 11:20 | 10:52 / 11:51 | 10:33 / 11:21 |
| 10 | styx | 8:28 / 8:14 | 11:32 / 8:39 | 9:07 / 7:38 | 9:10 / 8:16 | 8:29 / 8:38 | 8:32 / 8:03 | 9:21 / 8:45 | 8:54 / 8:25 | 7:16 / 8:09 | 6:44 / 8:04 | 8:38 / 8:08 |
| 11 | cunybot | 8:16 / 9:21 | 9:16 / 9:33 | 8:05 / 9:15 | 7:51 / 8:35 | 7:10 / 9:56 | 8:13 / 8:53 | 9:33 / 9:46 | 8:34 / 10:05 | 9:46 / 9:15 | 6:03 / 10:01 | 8:48 / 9:00 |
|  | overall | 10:47 | 11:21 | 10:49 | 10:30 | 10:58 | 10:28 | 11:06 | 10:45 | 10:38 | 10:34 | 10:30 |

The top three bots have consistent winning times across maps. BananaBrain in particular is highly consistent. It seems to indicate a strong and well-executed strategy that wins on schedule. Losing times vary because they depend on what the opponent does after surviving.

The map with the longest game times is Destination. That probably reflects the difficulty of attacking across the twin bridges into the natural. The losing side can often defend until it runs out of resources.

Stardust

The most striking cells in the table are Stardust’s losing times on Empire of the Sun and Python. The time rendered as 0:01 is 33 frames, which is always the point when Stardust crashes, when it does (I checked). Over half the losses on those maps were crashes, so that the median loss was a crash. There were still plenty of wins. Is it due to Stardust crashing on those maps, or to winning so often that the median losing game was a crash? I made a little table of Stardust games which are exactly 33 frames long.

| # | bot | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | stardust | losses 33, crashes 4 | losses 40, crashes 0 | losses 30, crashes 0 | losses 19, crashes 1 | losses 46, crashes 1 | losses 27, crashes 0 | losses 79, crashes 46 | losses 44, crashes 17 | losses 44, crashes 23 | losses 48, crashes 11 |

Answer: It’s due to Stardust crashing on those maps. The rate of 33-frame games varies widely by map, though if you ran enough games I imagine it would be non-zero for every map. Four-player maps other than Circuit Breakers have a high crash rate.
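Here is a minimal sketch of the reasoning, using the loss and crash counts from the little table. The 42 ms frame length is an assumption on my part (game time counted at Fastest speed), and the helper names are hypothetical.

```python
FRAME_MS = 42  # assumption: one frame is 42 ms of game time (Fastest speed)


def frames_to_clock(frames: int) -> str:
    """Render a frame count as minutes:seconds, truncating to whole seconds."""
    seconds = frames * FRAME_MS // 1000
    return f"{seconds // 60}:{seconds % 60:02d}"


# map -> (losses, 33-frame games), copied from the Stardust table
STARDUST = {
    "Destin": (33, 4), "Heartb": (40, 0), "Polari": (30, 0), "Aztec": (19, 1),
    "Longin": (46, 1), "Circui": (27, 0), "Empire": (79, 46), "Fighti": (44, 17),
    "Python": (44, 23), "Roadki": (48, 11),
}


def median_loss_is_crash(losses: int, crashes: int) -> bool:
    """Crashes are the shortest losses, so the median loss is a crash
    exactly when crashes make up more than half of all losses."""
    return 2 * crashes > losses


crash_median_maps = [name for name, (losses, crashes) in STARDUST.items()
                     if median_loss_is_crash(losses, crashes)]
total_crashes = sum(crashes for _, crashes in STARDUST.values())
```

Under that assumption a 33-frame game lasts a little over one second, which is why it renders as 0:01, and only Empire of the Sun and Python cross the halfway mark.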

Next: Breaking down results by map and opponent.

AIIDE 2022 - first look at results

AIIDE 2022 results are out today, complete with the detailed results file. The carryovers from last year are #3 Dragon and #8 UAlbertaBot. The others are updated for this year.

My version of the crosstable. It’s identical to the official crosstable except for the presentation.

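| # | bot | overall | bana | star | drag | stea | purp | mcra | micr | ualb | pylo | styx | cuny |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | bananabrain | 85.53% |  | 52% | 78% | 89% | 69% | 94% | 91% | 97% | 92% | 94% | 100% |
| 2 | stardust | 81.48% | 48% |  | 92% | 83% | 52% | 89% | 93% | 83% | 95% | 84% | 97% |
| 3 | dragon | 66.46% | 22% | 8% |  | 21% | 97% | 95% | 56% | 77% | 98% | 94% | 95% |
| 4 | steamhammer | 62.71% | 11% | 17% | 79% |  | 43% | 43% | 73% | 95% | 80% | 90% | 97% |
| 5 | purplewave | 61.17% | 31% | 48% | 3% | 57% |  | 84% | 50% | 66% | 87% | 96% | 89% |
| 6 | mcrave | 46.79% | 6% | 11% | 5% | 57% | 16% |  | 92% | 29% | 62% | 91% | 100% |
| 7 | microwave | 46.62% | 9% | 7% | 44% | 27% | 50% | 8% |  | 57% | 67% | 99% | 99% |
| 8 | ualbertabot | 45.74% | 3% | 17% | 23% | 5% | 34% | 71% | 43% |  | 66% | 95% | 98% |
| 9 | pylonpuller | 28.91% | 8% | 5% | 2% | 20% | 13% | 38% | 33% | 34% |  | 62% | 74% |
| 10 | styx | 13.92% | 6% | 16% | 6% | 10% | 4% | 9% | 1% | 5% | 38% |  | 44% |
| 11 | cunybot | 10.69% | 0% | 3% | 5% | 3% | 11% | 0% | 1% | 2% | 26% | 56% |  |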

The top four finishers are the same as last year, except that #1 BananaBrain and #2 Stardust are reversed. In CoG this year #5 PurpleWave made it to second, but not in AIIDE. #2 Stardust did not overtake BananaBrain, but came closer. The top two were not far apart from each other and dominated the rest.

I’m pleased that Steamhammer was able to hold its rank, because it is only slightly improved over last year’s version. I expected to be behind #5 PurpleWave and hoped to pass #3 Dragon, since I knew Steamhammer would score well head-to-head.

Stardust had around a hundred crashes. PurpleWave, McRave, and CUNYBot had hundreds of frame timeouts each. All these bots had a chance to move up in the rankings if they hadn’t lost so often for non-play-related reasons. Bots seem to be having increasing trouble with the time limits.

Thanks to an influx of weaker opponents, #8 UAlbertaBot finished above the bottom of the table, unlike last year. I wasn’t afraid of Styx finishing high, but I’m surprised it did so poorly. In the BASIL rankings, new bot #9 PylonPuller (which has been improving fast) and #10 Styx have almost the same Elo. In fact, all the tail enders have curiously low win rates. Last year UAlbertaBot scored 27%, much worse than this year, and the fairly weak FreshMeat one rank up scored 34%. This year, the weak tail enders pushed everybody else’s wins up and made the tournament seem easy for them.

by race

The table of how each bot did by opponent race. Since there is only one terran and only one random bot, it’s less informative than we might like.

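| # | bot | overall | vT | vP | vZ | vR |
|---|---|---|---|---|---|---|
| 1 | bananabrain | 85.53% | 78% | 71% | 94% | 97% |
| 2 | stardust | 81.48% | 92% | 65% | 89% | 83% |
| 3 | dragon | 66.46% | - | 56% | 73% | 77% |
| 4 | steamhammer | 62.71% | 79% | 38% | 76% | 95% |
| 5 | purplewave | 61.17% | 3% | 55% | 75% | 66% |
| 6 | mcrave | 46.79% | 5% | 24% | 85% | 29% |
| 7 | microwave | 46.62% | 44% | 33% | 58% | 57% |
| 8 | ualbertabot | 45.74% | 23% | 30% | 63% | - |
| 9 | pylonpuller | 28.91% | 2% | 9% | 45% | 34% |
| 10 | styx | 13.92% | 6% | 16% | 16% | 5% |
| 11 | cunybot | 10.69% | 5% | 10% | 15% | 2% |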

Every bot scored better versus zerg than versus protoss, except for Styx which was about the same. That’s the important message in the table.

Next: Map tables.

CoG 2022 results first look

As Dan Gant let me know, CoG 2022 results are out today, complete with the detailed results file. The participants are the same as last year, except that MetaBot was dropped for unreliability that affected the running of the tournament. The carryovers from last year are #6 XiaoYi, #7 CUNYbot, and #8 BetaStar. The others are updated for this year.

My version of the crosstable.

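| bot | overall | Bana | Purp | Star | McRa | Micr | XIAO | CUNY | Beta |
|---|---|---|---|---|---|---|---|---|---|
| #1 BananaBrain | 85.40% |  | 79% | 60% | 69% | 90% | 100% | 100% | 100% |
| #2 PurpleWave | 75.24% | 21% |  | 84% | 55% | 74% | 97% | 97% | 100% |
| #3 Stardust | 73.02% | 40% | 16% |  | 69% | 90% | 98% | 100% | 98% |
| #4 McRave | 68.60% | 31% | 45% | 31% |  | 93% | 81% | 100% | 100% |
| #5 Microwave | 46.54% | 10% | 26% | 10% | 7% |  | 74% | 98% | 100% |
| #6 XIAOYI | 35.40% | 0% | 3% | 2% | 19% | 26% |  | 98% | 100% |
| #7 CUNYBot | 15.49% | 0% | 3% | 0% | 0% | 2% | 2% |  | 100% |
| #8 BetaStar | 0.32% | 0% | 0% | 2% | 0% | 0% | 0% | 0% |  |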

There are surprises throughout, from top to bottom.

Stardust’s reign is over for the moment. Last year, Stardust scored over 90% in CoG and over 95% in AIIDE, crushing the competition. This time, #1 BananaBrain dominated with 85%, and #2 PurpleWave edged out #3 Stardust. The official results show that Stardust had 67 crashes and 7 frame timeouts in 3150 games. If Stardust had the same number of crashes (zero) and frame timeouts (1) as the two bots above it, it would have finished second by a razor-thin margin.
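The arithmetic behind that claim can be checked roughly. This sketch makes two loud assumptions of mine: every game Stardust lost to a crash or frame timeout would otherwise have been a win, and PurpleWave's win count would not change (in reality some of those crashes were PurpleWave wins). So it is a best case for Stardust, not a reconstruction of the tournament.

```python
GAMES_PER_BOT = 3150  # from the official results: Stardust played 3150 games


def wins_from_rate(rate_percent: float) -> int:
    """Recover a win count from a published win rate, rounded to whole games."""
    return round(rate_percent / 100 * GAMES_PER_BOT)


stardust_wins = wins_from_rate(73.02)
purplewave_wins = wins_from_rate(75.24)

# Stardust had 67 crashes and 7 frame timeouts; the bots above it had 0 and 1.
recovered_games = (67 - 0) + (7 - 1)

# Assumption: every recovered game becomes a Stardust win.
best_case_wins = stardust_wins + recovered_games
best_case_rate = 100 * best_case_wins / GAMES_PER_BOT

if __name__ == "__main__":
    print("Stardust wins:", stardust_wins, "PurpleWave wins:", purplewave_wins)
    print("Best case for Stardust:", best_case_wins,
          f"({best_case_rate:.2f}%)")
```

Even in this best case the gap is only a handful of games, which fits the razor-thin margin.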

There is not a single upset, where a lower-ranked bot defeated a higher-ranked bot. The crosstable is very orderly. The lowest winning rate of a higher-ranked bot is 55% for #2 PurpleWave over #4 McRave.

Something went wrong with BetaStar. It is a strong bot and finished well ahead of CUNYbot last year. Head to head versus CUNYBot, it scored 40 wins out of 50 games. This year it scored 10 wins total against all opposition, and all wins were against Stardust and likely due to crashes. What went wrong? Did the new and improved map pool break it? Was there a rule change that it could not cope with?

race results

I made two versions of each table. The left one includes all results, the right one excludes BetaStar.

| race | score (all results) | score (excluding BetaStar) |
|---|---|---|
| terran | 35% | 25% |
| protoss | 58% | 74% |
| zerg | 44% | 34% |

It’s not very informative, but I like to include it anyway. There was only one terran; we need more. Protoss dominated, as usual in recent years, even when including BetaStar’s debacle.

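All results:

| bot | race | overall | vT | vP | vZ |
|---|---|---|---|---|---|
| BananaBrain | protoss | 85.40% | 100% | 80% | 86% |
| PurpleWave | protoss | 75.24% | 97% | 68% | 75% |
| Stardust | protoss | 73.02% | 98% | 51% | 86% |
| McRave | zerg | 68.60% | 81% | 52% | 96% |
| Microwave | zerg | 46.54% | 74% | 37% | 53% |
| XIAOYI | terran | 35.40% | - | 26% | 48% |
| CUNYBot | zerg | 15.49% | 2% | 26% | 1% |
| BetaStar | protoss | 0.32% | 0% | 1% | 0% |

Excluding BetaStar:

| bot | race | overall | vT | vP | vZ |
|---|---|---|---|---|---|
| BananaBrain | protoss | 82.96% | 100% | 70% | 86% |
| PurpleWave | protoss | 71.11% | 97% | 52% | 75% |
| Stardust | protoss | 68.89% | 98% | 28% | 86% |
| McRave | zerg | 63.37% | 81% | 36% | 96% |
| Microwave | zerg | 37.63% | 74% | 16% | 53% |
| XIAOYI | terran | 24.63% | - | 1% | 48% |
| CUNYBot | zerg | 1.41% | 2% | 1% | 1% |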

Again, not very informative with so few participants. Excluding BetaStar clarifies that CUNYbot was outclassed. XiaoYi was also outclassed by the remaining protoss, and was only able to fight against the zergs.

the surprising poor results

Stardust’s crash rate surprises me. It does not have a crashing problem on BASIL. There was something in the tournament environment that it was not ready for. I can’t guess whether that’s more due to Stardust, or more due to the tournament.

BetaStar essentially scored zero and added no information to the tournament results. To me it suggests that the tournament environment changed somehow (we know that at least the map pool changed), and the organizers did not test the carryover bots to make sure they still worked.

CoG 2022 prospects

CoG this year is a small, elite tournament, virtually the same as last year. The entrants, with their win rates in last year’s CoG:

| bot | wins | author |
|---|---|---|
| Stardust | 90.25% | Bruce Nielsen |
| BananaBrain | 74.69% | Johan de Jong |
| McRave | 68.17% | Christian McCrave |
| Microwave | 54.14% | Micky Holdorf |
| PurpleWave | 52.14% | Dan Gant |

It’s a fair guess at the likely finishing order. Today’s BASIL ranks and 2021 CoG ranks are the same with one exception: McRave and Microwave are reversed. But PurpleWave has been playing for a long time with one bug that prevents it from doing any upgrades, including basics like dragoon range and zealot speed, and another bug that causes it to construct duplicate buildings. I think it’s a safe guess that the bugs will be fixed for the tournament, and PurpleWave may finish higher.

No terrans. That’s unfortunate.

The carryovers, also with their win rates from last year:

| bot | wins | author |
|---|---|---|
| XiaoYi | 40.10% | Benchang Zheng |
| BetaStar | 39.29% | Ruo-Ze Luo |
| MetaBot | 23.08% | Anderson Tavares |
| CUNYBot | 7.5% | Bryan Weber |

There’s a note that MetaBot may be dropped due to a tournament stability issue. The decision has not been announced yet.

All updated entrants are likely to outscore all carryovers. XiaoYi is the sole terran, and should not be much of a challenge for current bots. On BASIL, terran krasi0 has just in the last two days retaken its top spot. It would have been good to at least have Hao Pan or Dragon in the tournament.

But there is a silver lining. These are exactly the same participants as last year, assuming that MetaBot is kept. The biggest difference is that CUNYBot is carried over from last year rather than updated. It will be fun to compare relative progress.

CoG 2022 entry deadline

The entry deadline for CoG 2022 is this Sunday.

CoG has an exciting new map pool. In past years they selected at tournament time a small number of maps from a large pool that included some clunkers. This year they gave that up and chose 9 maps ahead of time (see the tournament rules). The exciting part is that they include BWAPI 1.16.1 versions of the newer maps Eclipse, Neo Sylphid, and Polypoid. Neo Sylphid is available on SCHNAIL, but I think the interesting and popular maps Eclipse and Polypoid are new to the major competitions.

They also included the map Outsider, which is difficult for bots. Most bots should be able to play games, but skills to cope with the blocked-off side bases will be valuable.

Finally, at least a few maps that are modern and familiar to current human players! CoG is always interesting for its map variety, and this year it is better than ever.

Update: I see that the entry deadline has been extended to 19 June. It might be a sign that they’re not getting many entrants. I hope it’s only a typical delay; delays can come from anywhere.

Steamhammer in SSCAIT 2021

I predicted Steamhammer to finish at #11 in SSCAIT this year, and hoped it would do a little better. It finished tied for #12-13. On the one hand, it’s only a little lower than I expected. On the other, the difference in games from what I expected is glaring, to my eyes. When I made the prediction, I didn’t realize that Steamhammer’s saved learning data had been reset at some recent time. In the games I saw, Steamhammer had about 8 past games of data on each opponent. I did not imagine that Steamhammer might lose 2 games in a row to XIMP by Tomas Vajda, and 2 games in a row to WuliBot, and other losses to fixed-strategy opponents—it simply doesn’t happen when Steamhammer is trained up.

I estimate that if Steamhammer had won its “easy” games at the rate it does on BASIL, it would have finished at #10, with a chance of reaching #9. It would have been as I hoped.

Today’s finals round 1 match against Halo by Hao Pan was awful. Steamhammer scores over 60% versus Halo on BASIL. In the SSCAIT round robin it scored 2-0 using a ling flood strategy, which won when Halo opened its wall prematurely. In today’s match the ling flood failed, though it was close. Steamhammer didn’t have much experience to back up its next choices, and made poor ones.

Steamhammer’s next match is in the losers’ bracket against #13 McRaveZ. I think its odds are under 50%.

SSCAIT 2021 nears its end

The round robin phase of SSCAIT 2021 is nearly over. The current ranking is close to the final one.

Places #15 and #16 are not quite sealed up. #15 Microwave and #16 WillyT are at risk of slipping.

For Steamhammer, it’s touch-and-go whether it will hold its position at #10 ahead of McRave, or will fall back to #11 behind McRave. Steamhammer has one fewer loss than McRave, and one more game left to play.

Drama is good, that’s what SSCAIT is for.

SSCAIT 3-second game

What happened in game 947, Steamhammer-Florian Richoux? It wasn’t the failure to connect that has disturbed other games. It looks like a related but different server failure.

Both bots recorded replays. Both replays are 3 seconds long. Florian Richoux (aka AIUR) recorded a replay where both bots sent workers to mine, end of game. Steamhammer recorded a replay where it sent drones to mine while Florian Richoux was idle as if it had not connected. The official result has Florian Richoux winning, and the game is not considered a crash.

I guess Steamhammer connected and then somehow lost its connection after it issued its mining orders and before Florian Richoux’s mining orders reached it? Or something?

So far Steamhammer has 4 games out of 34 played which were disturbed by apparent server failures. 2 are wins and 2 are losses. That’s about 12%, consistent with the estimated 14% overall rate from earlier on. The failures are adding noise and on average causing scores to shift toward 50%.
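The pull toward 50% is easy to model. This is a minimal sketch, assuming a disturbed game is effectively a coin flip regardless of which bot is stronger. That is my simplification, loosely suggested by the 2 wins and 2 losses, and the function name is hypothetical.

```python
def observed_score(true_score: float, failure_rate: float) -> float:
    """Expected score when a fraction of games is decided at random.

    Assumption: a game hit by a server failure is a 50/50 coin flip.
    """
    return (1.0 - failure_rate) * true_score + failure_rate * 0.5


if __name__ == "__main__":
    print("observed failure rate:", round(4 / 34, 3))
    for true_score in (0.9, 0.5, 0.1):
        print(true_score, "->", round(observed_score(true_score, 0.14), 3))
```

Under that assumption, a bot that should score 90% would show about 84% at a 14% failure rate, a bot at 10% would show about 16%, and a 50% bot is unaffected.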

SSCAIT early returns

SSCAIT has only been underway for a short time. Results so far are very rough and will change. Even so, Steamhammer is scoring about as expected, currently 10-4 for #10. It has played more games than most bots. A good sign is that it has played more games against the top 16 than any other bot in the top 16, and still held its expected position.

A bad sign is that Steamhammer has two wins over opponents that did not start up: Halo by Hao Pan and Stardust. Stardust has 3 losses, all against opponents it should beat easily. None of the 3 has a replay recorded on Stardust’s side, so it must have failed to start all 3 games. If it’s the server’s fault, either the server bug has a bias or else Stardust is extremely unlucky. In a real game, Steamhammer has good odds against Hao Pan (better than 2:1), but virtually no chance against Stardust.

The ranking will change a lot before the end. So far, BetaStar and PurpleWave have perfect records with 7 and 6 games played respectively. BananaBrain, Monster, and Krasi0P follow with around 90%.

Steamhammer in SSCAIT 2021

Games for SSCAIT 2021 will be starting any time now. Meanwhile, I have been working on an unrelated project which is well over half complete.

Steamhammer has participated in SSCAIT every year since 2016. This year makes six. Steamhammer finished at #11 in 2018, #11 in 2019, #11 in 2020. This year will be the first time Steamhammer has played without any special preparation or last-minute fixes. I expect it to finish at... #11, maybe a little better. If I had worked on it in the runup, it would have had a good chance to finish in the top half, because I’m at a point where big improvements are possible. I didn’t, but Steamhammer is still in good shape to finish as well as it has in past years.

Anyway, the proof is in the pudding. Let’s go!

AIIDE 2021 - one hour games

The second game I found where both bots believed they had lost was a game that went to the full 60 minutes. The cause is not the same as the frame timeout issue.

The official results have 22 games that went the full hour and had to be adjudicated on points. (Click the “duration” column twice to sort the longest games first.) In most cases, I’m not able to check whether both bots recorded the result correctly, because I can only check bots that have history files and the files are complete. All but a few of the hour-long games have a participant whose recorded value I can’t check.

But I did find several games where the official winner recorded that it had lost. Initial indications are that if the game runs the full hour, both bots are told that they lost, at least sometimes. I only watched one of the long replays, and in that game the official results were correct and winner WillyT believed it had lost to Stardust. At the end of the hour, WillyT’s tanks were clearing protoss bases against no resistance, but had not quite finished the job.
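The cross-check amounts to comparing the official winner of each game with what that bot wrote down. Here is a minimal sketch with invented data structures: the dict layouts, the game ids, and the "win"/"loss" strings are all assumptions of mine, not the format of the results file or of any bot's history file.

```python
def winners_recording_loss(official, recorded):
    """Return ids of games whose official winner recorded a loss.

    official: {game_id: winner_name}
    recorded: {(game_id, bot_name): "win" or "loss"}
    Games where the winner has no usable record are skipped, since they
    cannot be checked either way.
    """
    suspicious = []
    for game_id, winner in official.items():
        result = recorded.get((game_id, winner))
        if result == "loss":
            suspicious.append(game_id)
    return suspicious


# Hypothetical example data, not taken from the tournament.
OFFICIAL = {101: "botA", 102: "botB", 103: "botA"}
RECORDED = {
    (101, "botA"): "loss",  # official winner believes it lost
    (102, "botB"): "win",
    # game 103: botA has no complete history file, so it cannot be checked
}
```

With this example data the function flags only game 101.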

I’ll check a little more tomorrow and inform Dave Churchill.

AIIDE 2021 - what UAlbertaBot learned

I haven’t found time to investigate the second instance of “we both lost”. After this post, I’m nearly done with summarizing and aligning the bot learning files. The only bot I haven’t gotten to is FreshMeat, which has a unique learning system, not similar to any other bot’s. FreshMeat’s code is remarkably low-level, and deciphering the learning algorithm and the meaning of the learning files will take time.

In any case, here is UAlbertaBot’s learned data. UAlbertaBot keeps counts of wins and losses per strategy, not full history files, so its data can be laid out in a single table.

| opening | total | #1 stardus | #2 bananab | #3 dragon | #4 steamha | #5 mcrave | #6 willyt | #7 microwa | #8 daqin | #9 freshme |
|---|---|---|---|---|---|---|---|---|---|---|
| total | - 26% | 2-155 1% | 8-147 5% | 27-130 17% | 13-139 9% | 98-57 63% | 48-105 31% | 67-88 43% | 32-124 21% | 68-81 46% |
| 4RaxMarines | 58-93 38% | 0-15 0% | 0-11 0% | 3-15 17% | 1-17 6% | 40-2 95% | 2-9 18% | 0-5 0% | 0-10 0% | 12-9 57% |
| MarineRush | 18-97 16% | 0-15 0% | 1-15 6% | 0-6 0% | 0-11 0% | 2-3 40% | 0-8 0% | 13-25 34% | 0-10 0% | 2-4 33% |
| TankPush | 12-102 11% | 0-15 0% | 0-11 0% | 0-6 0% | 0-11 0% | 1-2 33% | 5-25 17% | 0-5 0% | 3-23 12% | 3-4 43% |
| VultureRush | 15-90 14% | 0-14 0% | 0-10 0% | 5-19 21% | 0-11 0% | 0-1 0% | 0-8 0% | 1-9 10% | 0-10 0% | 9-8 53% |
| DTRush | 41-85 33% | 2-18 10% | 0-11 0% | 10-26 28% | 0-8 0% | 0-2 0% | 0-3 0% | - | 0-4 0% | 29-13 69% |
| DragoonRush | 10-62 14% | 0-10 0% | 0-11 0% | 0-6 0% | 1-12 8% | 0-2 0% | 0-3 0% | - | 7-14 33% | 2-4 33% |
| ZealotRush | 104-150 41% | 0-10 0% | 4-28 12% | 0-6 0% | 10-16 38% | 24-24 50% | 19-25 43% | 35-15 70% | 12-21 36% | 0-5 0% |
| 2HatchHydra | 6-72 8% | 0-15 0% | 0-10 0% | 6-20 23% | 0-12 0% | - | 0-2 0% | 0-3 0% | 0-4 0% | 0-6 0% |
| 3HatchMuta | 1-61 2% | 0-15 0% | 0-10 0% | 0-6 0% | 0-12 0% | - | 0-2 0% | 0-3 0% | 0-4 0% | 1-9 10% |
| 3HatchScourge | 0-56 0% | 0-14 0% | 0-9 0% | 0-6 0% | 0-12 0% | - | 0-2 0% | 0-3 0% | 0-4 0% | 0-6 0% |
| ZerglingRush | 98-158 38% | 0-14 0% | 3-21 12% | 3-14 18% | 1-17 6% | 31-21 60% | 22-18 55% | 18-20 47% | 10-20 33% | 10-13 43% |

Looking down the total column on the left, there is one big surprise. UAlbertaBot has a primary strategy for each race it may roll, and switches away only if the primary strategy turns out poorly. In past years when I analyzed UAlbertaBot’s data (2018, 2019, and 2020), UAlbertaBot’s primary strategy with every race was also its best strategy overall when it rolled that race. This year, the primary terran strategy MarineRush was no longer best; it was far exceeded by 4RaxMarines, which had better results against 5 opponents and tied at 0% against 2 more. 4RaxMarines does not mean build four barracks to train marines; it means build a barracks at supply 4. It is a fast rush. Here is the build order from the config file.

"Terran_4RaxMarines" : { "Race" : "Terran", "OpeningBuildOrder" : ["Barracks", "SCV", "SCV", "Marine", "Supply Depot", "Marine", "SCV", "Marine", "SCV", "Marine", "SCV", "Marine", "Barracks", "Marine", "Marine", "Marine"]}

I guess opponents were less prepared for the fast marine rush. McRave in particular was unable to cope. I looked through BASIL’s build order page and did not see it; I guess no bot plays 4 rax regularly. The version of UAlbertaBot on BASIL is different from the one in the tournament. The BASIL UAlbertaBot does play the slower marine rush, so its opponents have gotten used to it.
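The comparison between the two terran rushes can be reproduced from the win-loss tallies in the table. A minimal sketch follows; the tallies are copied from the 4RaxMarines and MarineRush rows, and the helper names are mine.

```python
OPPONENTS = ["stardus", "bananab", "dragon", "steamha", "mcrave",
             "willyt", "microwa", "daqin", "freshme"]

# (wins, losses) per opponent, in the order above, copied from the table.
FOUR_RAX = [(0, 15), (0, 11), (3, 15), (1, 17), (40, 2),
            (2, 9), (0, 5), (0, 10), (12, 9)]
MARINE_RUSH = [(0, 15), (1, 15), (0, 6), (0, 11), (2, 3),
               (0, 8), (13, 25), (0, 10), (2, 4)]


def rate(wins: int, losses: int) -> float:
    """Win rate for a tally; every tally here has at least one game."""
    return wins / (wins + losses)


def compare(first, second):
    """Count opponents where `first` is better, tied, or worse than `second`."""
    better = tied = worse = 0
    for a, b in zip(first, second):
        ra, rb = rate(*a), rate(*b)
        if ra > rb:
            better += 1
        elif ra == rb:
            tied += 1
        else:
            worse += 1
    return better, tied, worse


if __name__ == "__main__":
    print("4RaxMarines vs MarineRush (better, tied, worse):",
          compare(FOUR_RAX, MARINE_RUSH))
```

The counts come out as 5 better, 2 tied (both at zero wins), and 2 worse, with MarineRush ahead only against BananaBrain and Microwave.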

The 3HatchScourge build was useless. The build was specially designed to give UAlbertaBot a chance against XIMP, and apparently has no other value. Curiously, 3HatchMuta was nearly as helpless, with only 1 win, against FreshMeat. That win was the only win as zerg against FreshMeat, though, so chalk up one advantage.

AIIDE 2021 - Microwave versus DaQin

Two posts again today. Blue is good for Microwave, red is good for DaQin.

microwave strategies versus daqin strategies

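|  | overall | 4GateGoon | ForgeExpand5GateGoon | ForgeExpandSpeedlots |
|---|---|---|---|---|
| overall | 128/157 82% | 0/1 0% | 3/3 100% | 125/153 82% |
| 1HatchMuta_Sparkle | 27/33 82% | - | - | 27/33 82% |
| 3HatchHydra | 0/1 0% | - | - | 0/1 0% |
| 3HatchLurker | 0/1 0% | - | - | 0/1 0% |
| 3HatchMuta | 95/106 90% | - | 2/2 100% | 93/104 89% |
| 3HatchMutaExpo | 0/1 0% | - | - | 0/1 0% |
| 4HatchPoolHydra | 1/1 100% | - | - | 1/1 100% |
| 5HatchPoolHydra | 1/2 50% | 0/1 0% | - | 1/1 100% |
| 6Pool | 0/1 0% | - | - | 0/1 0% |
| 6PoolSpeed | 0/1 0% | - | - | 0/1 0% |
| 9PoolHatchGasSpeed7D | 1/3 33% | - | - | 1/3 33% |
| 9PoolHatchGasSpeed8D | 3/6 50% | - | 1/1 100% | 2/5 40% |
| 9PoolSpeedLing | 0/1 0% | - | - | 0/1 0% |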

DaQin barely varied its play, so again, nothing to see here.

microwave as seen by daqin

| microwave played | # | daqin recognized |
|---|---|---|
| 1HatchMuta_Sparkle | 33 | 22 Not fast rush; 7 Heavy rush; 4 Proxy |
| 3HatchHydra | 1 | 1 Not fast rush |
| 3HatchLurker | 1 | 1 Heavy rush |
| 3HatchMuta | 106 | 84 Not fast rush; 16 Heavy rush; 4 Proxy; 2 Unknown |
| 3HatchMutaExpo | 1 | 1 Not fast rush |
| 4HatchPoolHydra | 1 | 1 Hydra bust |
| 5HatchPoolHydra | 2 | 2 Not fast rush |
| 6Pool | 1 | 1 Fast rush |
| 6PoolSpeed | 1 | 1 Fast rush |
| 9PoolHatchGasSpeed7D | 3 | 2 Heavy rush; 1 Not fast rush |
| 9PoolHatchGasSpeed8D | 6 | 3 Fast rush; 2 Heavy rush; 1 Unknown |
| 9PoolSpeedLing | 1 | 1 Heavy rush |

9 pool is again sometimes a fast rush and sometimes something incompatible. And there are some stray proxies again. That is probably a bug inherited from Steamhammer (and long since fixed there).
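A recognition table like this is a tally of (played, recognized) pairs. Here is a minimal sketch of how one might be built, assuming each game has been reduced to such a pair. The input format is my assumption, not the layout of either bot's learning files.

```python
from collections import Counter, defaultdict


def recognition_table(pairs):
    """Tally what the opponent recognized for each strategy actually played.

    pairs: iterable of (played, recognized) strings, one per game.
    Returns {played: Counter({recognized: games})}.
    """
    table = defaultdict(Counter)
    for played, recognized in pairs:
        table[played][recognized] += 1
    return table


# The three 9PoolHatchGasSpeed7D games from the table, as pairs.
PAIRS = [
    ("9PoolHatchGasSpeed7D", "Heavy rush"),
    ("9PoolHatchGasSpeed7D", "Not fast rush"),
    ("9PoolHatchGasSpeed7D", "Heavy rush"),
]

TABLE = recognition_table(PAIRS)

if __name__ == "__main__":
    for played, counts in TABLE.items():
        print(played, sum(counts.values()), counts.most_common())
```

Sorting each row with `most_common()` gives the same largest-first ordering as the cells in the table.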

daqin as seen by microwave

| daqin played | # | microwave recognized |
|---|---|---|
| 4GateGoon | 1 | 1 Unknown |
| ForgeExpand5GateGoon | 3 | 2 Turtle; 1 Unknown |
| ForgeExpandSpeedlots | 153 | 87 Turtle; 43 SafeExpand; 16 Unknown; 4 NakedExpand; 3 HeavyRush |

AIIDE 2021 - McRave versus DaQin

Blue is good for McRave, red is good for DaQin.

mcrave strategies versus daqin strategies

|  | overall | ForgeExpand5GateGoon | ForgeExpandSpeedlots |
|---|---|---|---|
| overall | 122/157 78% | 3/3 100% | 119/154 77% |
| HatchPool,12Hatch,2HatchMuta | 102/123 83% | 3/3 100% | 99/120 82% |
| PoolHatch,9Pool,2HatchMuta | 1/3 33% | - | 1/3 33% |
| PoolHatch,9Pool,3HatchMuta | 1/2 50% | - | 1/2 50% |
| PoolHatch,9Pool,6HatchHydra | 0/2 0% | - | 0/2 0% |
| PoolHatch,Overpool,2HatchMuta | 18/23 78% | - | 18/23 78% |
| PoolHatch,Overpool,3HatchMuta | 0/3 0% | - | 0/3 0% |
| PoolHatch,Overpool,6HatchHydra | 0/1 0% | - | 0/1 0% |

Move along, nothing to see here folks.

mcrave as seen by daqin

| mcrave played | # | daqin recognized |
|---|---|---|
| HatchPool,12Hatch,2HatchMuta | 123 | 93 Not fast rush; 18 Unknown; 12 Heavy rush |
| PoolHatch,9Pool,2HatchMuta | 3 | 1 Not fast rush; 1 Unknown; 1 Fast rush |
| PoolHatch,9Pool,3HatchMuta | 2 | 2 Fast rush |
| PoolHatch,9Pool,6HatchHydra | 2 | 1 Unknown; 1 Not fast rush |
| PoolHatch,Overpool,2HatchMuta | 23 | 22 Not fast rush; 1 Heavy rush |
| PoolHatch,Overpool,3HatchMuta | 3 | 2 Not fast rush; 1 Heavy rush |
| PoolHatch,Overpool,6HatchHydra | 1 | 1 Not fast rush |

Apparently 9 pool is sometimes a fast rush and sometimes a not fast rush.

daqin as seen by mcrave

| daqin played | # | mcrave recognized |
|---|---|---|
| ForgeExpand5GateGoon | 3 | 3 FFE,Forge,5GateGoon |
| ForgeExpandSpeedlots | 154 | 88 FFE,Forge,Speedlot; 24 FFE,Forge,5GateGoon; 23 FFE,Gateway,Speedlot; 7 FFE,Forge,ZealotArchon; 6 FFE,Nexus,Speedlot; 2 FFE,Nexus,5GateGoon; 2 FFE,Forge,Unknown; 2 FFE,Gateway,5GateGoon |

There are those dragoons again, even when DaQin believes it is making zealots. I imagine that something in McRave’s recognizer is approximate. It only matters if McRave reacts to its own wrong recognition, though.