tournaments - 3 | Starcraft AI blog

AIIDE 2022 - map tables by bot

For each bot, its win rate by map and opponent. You can abbreviate it as bot x (map x opponent) if you like. Yesterday’s tables showed that maps make little difference when averaged across opponents. Today’s show that (as usual) maps do make a difference for specific opponents.

Each cell represents 22 or 23 games, sometimes fewer when games did not complete. No cell has fewer than 20 games. The same tables last year had 15 games per cell. The numbers are a trifle more reliable this year, but there is still a lot of statistical noise.
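Here is a rough measure of that noise. Assume, as a simplification, that games are independent trials with a fixed win probability; learning bots actually change over a tournament. Under that assumption, the binomial standard error of a cell's win rate is sqrt(p(1-p)/n). A minimal Python sketch for the worst case, an even matchup:

```python
import math


def win_rate_standard_error(p: float, n: int) -> float:
    """Binomial standard error of an observed win rate p over n games.

    Assumes independent games with a fixed true win probability, which is
    only an approximation for bots that learn during the tournament.
    """
    return math.sqrt(p * (1.0 - p) / n)


# Noise is largest for an even matchup, p = 0.5.
se_this_year = win_rate_standard_error(0.5, 22)
se_last_year = win_rate_standard_error(0.5, 15)

# One game out of 22 moves a cell by this many percentage points.
step = 100 / 22

print(f"22 games per cell: +/- {100 * se_this_year:.1f} points")
print(f"15 games per cell: +/- {100 * se_last_year:.1f} points")
print(f"one game = {step:.1f} points")
```

The result is about 10.7 points of standard error at 22 games versus 12.9 at 15. A single game is worth about 4.5 points, so small differences between neighboring cells mean little.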

#	bananabrain	overall	Destin	Heartb	Polari	Aztec	Longin	Circui	Empire	Fighti	Python	Roadki
2	stardust	52%	52%	65%	32%	32%	77%	45%	50%	50%	50%	64%
3	dragon	78%	78%	78%	77%	77%	86%	86%	77%	64%	82%	73%
4	steamhammer	89%	87%	96%	82%	86%	91%	86%	77%	100%	86%	95%
5	purplewave	69%	87%	65%	73%	64%	86%	50%	55%	68%	68%	73%
6	mcrave	94%	91%	83%	95%	95%	91%	95%	91%	95%	100%	100%
7	microwave	91%	91%	91%	91%	100%	86%	95%	82%	95%	91%	91%
8	ualbertabot	97%	91%	100%	95%	100%	95%	100%	95%	95%	95%	100%
9	pylonpuller	92%	91%	87%	95%	100%	100%	91%	91%	86%	100%	77%
10	styx	94%	100%	100%	100%	100%	86%	100%	95%	95%	73%	91%
11	cunybot	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
	overall	85.53%	87%	87%	84%	85%	90%	85%	81%	85%	85%	86%

#1 BananaBrain was solid against most opponents, but inconsistent across maps versus its top protoss competition, #2 Stardust and #5 PurpleWave.

#	stardust	overall	Destin	Heartb	Polari	Aztec	Longin	Circui	Empire	Fighti	Python	Roadki
1	bananabrain	48%	48%	35%	68%	68%	23%	55%	50%	50%	50%	36%
3	dragon	92%	100%	95%	77%	100%	100%	95%	64%	100%	91%	95%
4	steamhammer	83%	100%	82%	73%	86%	91%	77%	50%	95%	95%	82%
5	purplewave	52%	57%	50%	73%	68%	27%	64%	18%	50%	59%	50%
6	mcrave	89%	83%	83%	86%	95%	95%	100%	95%	91%	77%	82%
7	microwave	93%	96%	100%	95%	100%	100%	95%	95%	77%	82%	86%
8	ualbertabot	83%	83%	86%	91%	100%	77%	100%	55%	77%	82%	82%
9	pylonpuller	95%	96%	95%	100%	100%	100%	100%	86%	77%	91%	100%
10	styx	84%	100%	95%	100%	95%	77%	91%	32%	86%	86%	73%
11	cunybot	97%	96%	100%	100%	100%	100%	100%	95%	95%	86%	95%
	overall	81.48%	86%	82%	86%	91%	79%	88%	64%	80%	80%	78%

Here is the source of #2 Stardust’s relative weakness on Empire of the Sun: #5 PurpleWave and #10 Styx found holes in its play on the map. The upset by Styx on that map only is particularly extreme. Heartbreak Ridge, Longinus, and Empire of the Sun are the maps where the main bases are on the same level as the naturals, with no ramp, and all of them had at least one opponent that could exploit Stardust. But if that’s the cause, then why is Aztec fine for Stardust? On Aztec, the naturals are uphill from the mains.

#	dragon	overall	Destin	Heartb	Polari	Aztec	Longin	Circui	Empire	Fighti	Python	Roadki
1	bananabrain	22%	22%	22%	23%	23%	14%	14%	23%	36%	18%	27%
2	stardust	8%	0%	5%	23%	0%	0%	5%	36%	0%	9%	5%
4	steamhammer	21%	26%	32%	14%	18%	32%	32%	14%	5%	36%	5%
5	purplewave	97%	91%	95%	91%	100%	100%	100%	100%	100%	95%	100%
6	mcrave	95%	96%	87%	95%	100%	100%	95%	91%	95%	100%	95%
7	microwave	56%	65%	55%	36%	45%	64%	73%	59%	55%	59%	50%
8	ualbertabot	77%	82%	64%	95%	80%	73%	77%	77%	68%	77%	73%
9	pylonpuller	98%	100%	100%	95%	100%	91%	95%	100%	100%	100%	100%
10	styx	94%	96%	100%	100%	91%	91%	95%	100%	100%	82%	86%
11	cunybot	95%	96%	96%	95%	91%	95%	100%	91%	100%	95%	95%
	overall	66.46%	67%	65%	67%	65%	66%	69%	69%	66%	67%	64%

Last year and the year before I thought that #3 Dragon was inconsistent across maps. This year it doesn’t look that way. It’s the same bot carried over. The difference seems to be that this year Dragon either smashed its opponents or got smashed by them. It remains inconsistent against #7 Microwave and #8 UAlbertaBot, the opponents scoring closest to 50%.

#	steamhammer	overall	Destin	Heartb	Polari	Aztec	Longin	Circui	Empire	Fighti	Python	Roadki
1	bananabrain	11%	13%	4%	18%	14%	9%	14%	23%	0%	14%	5%
2	stardust	17%	0%	18%	27%	14%	9%	23%	50%	5%	5%	18%
3	dragon	79%	74%	68%	86%	82%	68%	68%	86%	95%	64%	95%
5	purplewave	43%	57%	55%	50%	41%	27%	41%	27%	36%	32%	59%
6	mcrave	43%	48%	74%	45%	45%	45%	45%	23%	32%	45%	27%
7	microwave	73%	70%	57%	50%	82%	68%	91%	77%	77%	91%	68%
8	ualbertabot	95%	91%	100%	100%	86%	100%	95%	95%	100%	91%	95%
9	pylonpuller	80%	70%	64%	82%	86%	86%	82%	91%	82%	77%	82%
10	styx	90%	91%	86%	100%	95%	86%	100%	95%	95%	77%	73%
11	cunybot	97%	100%	96%	100%	100%	91%	95%	95%	100%	100%	95%
	overall	62.71%	61%	62%	66%	65%	59%	65%	66%	62%	60%	61%

Someday I will get Steamhammer to adapt properly to the map it is playing on.

#4 Steamhammer owes its ranking in large part to its strong performance against the carryover bots that it specifically prepared for. Versus #3 Dragon: Last year 63%, this year 79%. Versus #8 UAlbertaBot: Last year 92%, this year 95%. I knew that both would be up. I’m surprised that other bots seem to have been unprepared for Dragon in particular.

#	purplewave	overall	Destin	Heartb	Polari	Aztec	Longin	Circui	Empire	Fighti	Python	Roadki
1	bananabrain	31%	13%	35%	27%	36%	14%	50%	45%	32%	32%	27%
2	stardust	48%	43%	50%	27%	32%	73%	36%	82%	50%	41%	50%
3	dragon	3%	9%	5%	9%	0%	0%	0%	0%	0%	5%	0%
4	steamhammer	57%	43%	45%	50%	59%	73%	59%	73%	64%	68%	41%
6	mcrave	84%	26%	57%	100%	91%	86%	91%	100%	100%	95%	100%
7	microwave	50%	35%	17%	55%	45%	55%	36%	82%	77%	45%	50%
8	ualbertabot	66%	78%	55%	100%	68%	68%	55%	64%	55%	64%	50%
9	pylonpuller	87%	91%	74%	100%	95%	77%	91%	77%	95%	82%	86%
10	styx	96%	100%	100%	73%	100%	100%	100%	95%	100%	95%	100%
11	cunybot	89%	83%	96%	82%	86%	86%	100%	77%	95%	100%	86%
	overall	61.17%	52%	53%	62%	61%	63%	62%	70%	67%	63%	59%

#5 PurpleWave struggled versus #6 McRave on the 2-player maps Destination and Heartbreak Ridge, but scored 100% on the other 2-player map Polaris Rhapsody. It smells like a bug—but see the next table.

#	mcrave	overall	Destin	Heartb	Polari	Aztec	Longin	Circui	Empire	Fighti	Python	Roadki
1	bananabrain	6%	9%	17%	5%	5%	9%	5%	9%	5%	0%	0%
2	stardust	11%	17%	17%	14%	5%	5%	0%	5%	9%	23%	18%
3	dragon	5%	4%	13%	5%	0%	0%	5%	9%	5%	0%	5%
4	steamhammer	57%	52%	26%	55%	55%	55%	55%	77%	68%	55%	73%
5	purplewave	16%	74%	43%	0%	9%	14%	9%	0%	0%	5%	0%
7	microwave	92%	100%	91%	68%	100%	100%	77%	91%	95%	100%	100%
8	ualbertabot	29%	74%	41%	43%	14%	36%	5%	14%	9%	36%	10%
9	pylonpuller	62%	65%	70%	82%	73%	50%	68%	36%	68%	59%	50%
10	styx	91%	100%	70%	82%	91%	100%	95%	100%	95%	82%	91%
11	cunybot	100%	100%	100%	100%	100%	95%	100%	100%	100%	100%	100%
	overall	46.79%	60%	49%	45%	45%	46%	42%	44%	45%	46%	45%

Why does #6 McRave like Destination? Mainly because of upsets against #5 PurpleWave and #8 UAlbertaBot, which usually defeat it. If the win over PurpleWave is due to PurpleWave’s putative bug, then what explains the win over UAlbertaBot?

#	microwave	overall	Destin	Heartb	Polari	Aztec	Longin	Circui	Empire	Fighti	Python	Roadki
1	bananabrain	9%	9%	9%	9%	0%	14%	5%	18%	5%	9%	9%
2	stardust	7%	4%	0%	5%	0%	0%	5%	5%	23%	18%	14%
3	dragon	44%	35%	45%	64%	55%	36%	27%	41%	45%	41%	50%
4	steamhammer	27%	30%	43%	50%	18%	32%	9%	23%	23%	9%	32%
5	purplewave	50%	65%	83%	45%	55%	45%	64%	18%	23%	55%	50%
6	mcrave	8%	0%	9%	32%	0%	0%	23%	9%	5%	0%	0%
8	ualbertabot	57%	70%	65%	55%	43%	36%	77%	73%	59%	55%	32%
9	pylonpuller	67%	57%	61%	91%	68%	50%	55%	77%	77%	64%	68%
10	styx	99%	100%	100%	100%	95%	100%	100%	100%	95%	100%	100%
11	cunybot	99%	100%	100%	100%	100%	95%	100%	95%	100%	100%	100%
	overall	46.62%	47%	52%	55%	43%	41%	46%	46%	45%	45%	45%

#	ualbertabot	overall	Destin	Heartb	Polari	Aztec	Longin	Circui	Empire	Fighti	Python	Roadki
1	bananabrain	3%	9%	0%	5%	0%	5%	0%	5%	5%	5%	0%
2	stardust	17%	17%	14%	9%	0%	23%	0%	45%	23%	18%	18%
3	dragon	23%	18%	36%	5%	20%	27%	23%	23%	32%	23%	27%
4	steamhammer	5%	9%	0%	0%	14%	0%	5%	5%	0%	9%	5%
5	purplewave	34%	22%	45%	0%	32%	32%	45%	36%	45%	36%	50%
6	mcrave	71%	26%	59%	57%	86%	64%	95%	86%	91%	64%	90%
7	microwave	43%	30%	35%	45%	57%	64%	23%	27%	41%	45%	68%
9	pylonpuller	66%	74%	64%	91%	82%	59%	64%	43%	45%	64%	77%
10	styx	95%	86%	100%	100%	95%	95%	86%	82%	100%	100%	100%
11	cunybot	98%	100%	100%	95%	100%	100%	100%	95%	100%	91%	100%
	overall	45.74%	39%	45%	41%	49%	47%	44%	45%	48%	45%	54%

It’s interesting that #8 UAlbertaBot does better against #6 McRave on the 4-player maps. You might think that UAlbertaBot’s rushes would work better on 2-player maps with a short rush distance, but it’s the opposite. I imagine it is because McRave takes longer to scout, so it can’t adapt as quickly.

#	pylonpuller	overall	Destin	Heartb	Polari	Aztec	Longin	Circui	Empire	Fighti	Python	Roadki
1	bananabrain	8%	9%	13%	5%	0%	0%	9%	9%	14%	0%	23%
2	stardust	5%	4%	5%	0%	0%	0%	0%	14%	23%	9%	0%
3	dragon	2%	0%	0%	5%	0%	9%	5%	0%	0%	0%	0%
4	steamhammer	20%	30%	36%	18%	14%	14%	18%	9%	18%	23%	18%
5	purplewave	13%	9%	26%	0%	5%	23%	9%	23%	5%	18%	14%
6	mcrave	38%	35%	30%	18%	27%	50%	32%	64%	32%	41%	50%
7	microwave	33%	43%	39%	9%	32%	50%	45%	23%	23%	36%	32%
8	ualbertabot	34%	26%	36%	9%	18%	41%	36%	57%	55%	36%	23%
10	styx	62%	87%	64%	18%	73%	82%	45%	64%	73%	59%	55%
11	cunybot	74%	83%	83%	36%	86%	55%	77%	82%	68%	91%	77%
	overall	28.91%	33%	33%	12%	25%	32%	28%	34%	31%	31%	29%

Wow, look at results versus #10 Styx. Polaris Rhapsody does seem to be an outlier among the 2-player maps.

#	styx	overall	Destin	Heartb	Polari	Aztec	Longin	Circui	Empire	Fighti	Python	Roadki
1	bananabrain	6%	0%	0%	0%	0%	14%	0%	5%	5%	27%	9%
2	stardust	16%	0%	5%	0%	5%	23%	9%	68%	14%	14%	27%
3	dragon	6%	4%	0%	0%	9%	9%	5%	0%	0%	18%	14%
4	steamhammer	10%	9%	14%	0%	5%	14%	0%	5%	5%	23%	27%
5	purplewave	4%	0%	0%	27%	0%	0%	0%	5%	0%	5%	0%
6	mcrave	9%	0%	30%	18%	9%	0%	5%	0%	5%	18%	9%
7	microwave	1%	0%	0%	0%	5%	0%	0%	0%	5%	0%	0%
8	ualbertabot	5%	14%	0%	0%	5%	5%	14%	18%	0%	0%	0%
9	pylonpuller	38%	13%	36%	82%	27%	18%	55%	36%	27%	41%	45%
11	cunybot	44%	48%	17%	59%	68%	50%	23%	32%	45%	55%	41%
	overall	13.92%	9%	10%	19%	13%	13%	11%	17%	10%	20%	17%

Only a few pinprick upsets, but one of them is extreme.

#	cunybot	overall	Destin	Heartb	Polari	Aztec	Longin	Circui	Empire	Fighti	Python	Roadki
1	bananabrain	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%
2	stardust	3%	4%	0%	0%	0%	0%	0%	5%	5%	14%	5%
3	dragon	5%	4%	4%	5%	9%	5%	0%	9%	0%	5%	5%
4	steamhammer	3%	0%	4%	0%	0%	9%	5%	5%	0%	0%	5%
5	purplewave	11%	17%	4%	18%	14%	14%	0%	23%	5%	0%	14%
6	mcrave	0%	0%	0%	0%	0%	5%	0%	0%	0%	0%	0%
7	microwave	1%	0%	0%	0%	0%	5%	0%	5%	0%	0%	0%
8	ualbertabot	2%	0%	0%	5%	0%	0%	0%	5%	0%	9%	0%
9	pylonpuller	26%	17%	17%	64%	14%	45%	23%	18%	32%	9%	23%
10	styx	56%	52%	83%	41%	32%	50%	77%	68%	55%	45%	59%
	overall	10.69%	10%	11%	13%	7%	13%	10%	14%	10%	8%	11%

AIIDE 2022 - maps and game durations

First, win rates for bots x maps. This is identical to the third table in the official results, except for the presentation.

#	bot	overall	Destin	Heartb	Polari	Aztec	Longin	Circui	Empire	Fighti	Python	Roadki
1	bananabrain	85.53%	87%	87%	84%	85%	90%	85%	81%	85%	85%	86%
2	stardust	81.48%	86%	82%	86%	91%	79%	88%	64%	80%	80%	78%
3	dragon	66.46%	67%	65%	67%	65%	66%	69%	69%	66%	67%	64%
4	steamhammer	62.71%	61%	62%	66%	65%	59%	65%	66%	62%	60%	61%
5	purplewave	61.17%	52%	53%	62%	61%	63%	62%	70%	67%	63%	59%
6	mcrave	46.79%	60%	49%	45%	45%	46%	42%	44%	45%	46%	45%
7	microwave	46.62%	47%	52%	55%	43%	41%	46%	46%	45%	45%	45%
8	ualbertabot	45.74%	39%	45%	41%	49%	47%	44%	45%	48%	45%	54%
9	pylonpuller	28.91%	33%	33%	12%	25%	32%	28%	34%	31%	31%	29%
10	styx	13.92%	9%	10%	19%	13%	13%	11%	17%	10%	20%	17%
11	cunybot	10.69%	10%	11%	13%	7%	13%	10%	14%	10%	8%	11%

Stardust had some trouble on Empire of the Sun, and McRave liked Destination. For the most part, maps did not make a big difference when averaged out over opponents.
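One crude way to check that claim is the spread between each bot's best map and its worst map, using the rounded percentages from the table above. The spread is my own choice of measure, not an official statistic. Each cell still rests on only about 220 games (10 opponents times 22 or 23), so a few points of difference is noise.

```python
MAPS = ["Destin", "Heartb", "Polari", "Aztec", "Longin",
        "Circui", "Empire", "Fighti", "Python", "Roadki"]

# Rounded per-map win rates (percent), copied from the bots x maps table.
MAP_WIN_RATES = {
    "bananabrain": [87, 87, 84, 85, 90, 85, 81, 85, 85, 86],
    "stardust":    [86, 82, 86, 91, 79, 88, 64, 80, 80, 78],
    "dragon":      [67, 65, 67, 65, 66, 69, 69, 66, 67, 64],
    "steamhammer": [61, 62, 66, 65, 59, 65, 66, 62, 60, 61],
    "purplewave":  [52, 53, 62, 61, 63, 62, 70, 67, 63, 59],
    "mcrave":      [60, 49, 45, 45, 46, 42, 44, 45, 46, 45],
    "microwave":   [47, 52, 55, 43, 41, 46, 46, 45, 45, 45],
    "ualbertabot": [39, 45, 41, 49, 47, 44, 45, 48, 45, 54],
    "pylonpuller": [33, 33, 12, 25, 32, 28, 34, 31, 31, 29],
    "styx":        [9, 10, 19, 13, 13, 11, 17, 10, 20, 17],
    "cunybot":     [10, 11, 13, 7, 13, 10, 14, 10, 8, 11],
}

# Spread = best map minus worst map, in percentage points.
spreads = {bot: max(r) - min(r) for bot, r in MAP_WIN_RATES.items()}

for bot in sorted(spreads, key=spreads.get, reverse=True):
    rates = MAP_WIN_RATES[bot]
    best = MAPS[rates.index(max(rates))]
    worst = MAPS[rates.index(min(rates))]
    print(f"{bot:12} spread {spreads[bot]:2d}  best {best:6}  worst {worst}")
```

Stardust's 27-point spread, with Aztec best and Empire of the Sun worst, is the largest by this measure.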

game durations

Game durations for bots x maps. In each cell, the first number is the median duration of winning games and the second is the median duration of losing games. The overall numbers in the bottom row are the median duration of all games played on each map. The cell coloring is the same as in the table above. It reflects the winning rate, so you can judge by eye the balance of games behind the first and second numbers.

As a general guideline, if winning games are shorter than losing games, the bot likes to win by early pressure and loses by getting outplayed later, because early pressure costs economy and tech. In the opposite case, the bot defends any early pressure and has stronger play in the long run (it shows any or all of macro, micro, and tech advantage). #8 UAlbertaBot is the most determined rushbot. #3 Dragon is the most prominent defensive bot. #7 Microwave is well-balanced. Note: every game has exactly one winner and one loser. Even so, adding up the overall median winning times across bots does not give the same result as adding up the median losing times. Medians don't balance out that way, because the median is insensitive to outliers.
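The guideline can be written as a tiny classifier. This sketch rests on my own assumptions. It compares only the two overall medians. The 60-second tolerance for calling a bot balanced is a hypothetical threshold that does not come from the post. The inputs are the overall medians from the durations table.

```python
def to_seconds(clock: str) -> int:
    """Convert an 'm:ss' game clock string to seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)


def play_style(median_win: str, median_loss: str, tolerance_s: int = 60) -> str:
    """Label a bot from its median winning and losing game durations.

    tolerance_s is a hypothetical threshold, not taken from the post.
    """
    gap = to_seconds(median_win) - to_seconds(median_loss)
    if gap < -tolerance_s:
        return "early pressure"  # wins quickly, loses the long games
    if gap > tolerance_s:
        return "defensive"       # survives pressure, wins later
    return "balanced"


# Overall (median win, median loss) pairs from the durations table.
MEDIANS = {
    "ualbertabot": ("6:28", "10:33"),
    "dragon": ("13:57", "11:35"),
    "microwave": ("9:36", "9:24"),
}

for bot, (win, loss) in MEDIANS.items():
    print(f"{bot:12} {play_style(win, loss)}")
```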

#	bot	overall	Destin	Heartb	Polari	Aztec	Longin	Circui	Empire	Fighti	Python	Roadki
1	bananabrain	11:25 15:22	12:11 15:55	11:27 16:21	11:21 15:24	11:46 19:56	11:22 12:48	11:20 16:57	11:13 13:49	11:16 14:38	11:11 13:41	11:35 14:42
2	stardust	10:34 15:16	10:56 16:23	10:07 17:05	10:54 17:55	10:42 18:14	10:34 15:13	10:51 18:56	10:51 0:01	10:47 12:51	10:03 0:01	10:14 14:59
3	dragon	13:57 11:35	14:53 12:53	14:28 10:22	12:52 13:32	14:10 12:04	13:39 10:13	13:57 12:00	13:41 12:09	13:46 12:37	13:23 9:59	14:26 10:47
4	steamhammer	8:18 11:23	8:47 10:54	7:53 11:55	7:28 11:45	10:12 10:47	7:47 11:31	9:34 11:43	7:50 11:48	7:23 11:25	8:00 10:49	9:17 10:12
5	purplewave	12:09 17:50	13:20 17:45	13:07 16:45	12:31 19:14	12:08 20:51	12:34 15:48	12:09 18:54	12:09 16:38	11:47 18:24	11:53 18:18	11:38 17:08
6	mcrave	8:34 12:02	9:51 13:06	10:30 12:40	8:07 13:01	7:58 12:11	7:39 12:30	9:17 11:32	8:26 11:40	8:53 11:50	7:52 11:35	8:04 11:41
7	microwave	9:36 9:24	10:12 10:02	8:51 10:22	9:52 10:03	10:48 9:18	7:15 8:12	11:41 11:40	10:54 10:22	8:53 10:06	10:04 8:27	9:43 8:27
8	ualbertabot	6:28 10:33	8:00 11:18	6:39 10:09	5:24 10:30	6:22 10:48	6:18 10:21	6:55 10:00	6:38 10:26	6:28 10:34	6:19 10:52	6:50 10:41
9	pylonpuller	10:31 11:38	10:04 12:13	10:51 12:29	10:38 6:30	10:34 12:03	9:47 11:49	10:17 11:56	10:54 11:55	10:30 11:20	10:52 11:51	10:33 11:21
10	styx	8:28 8:14	11:32 8:39	9:07 7:38	9:10 8:16	8:29 8:38	8:32 8:03	9:21 8:45	8:54 8:25	7:16 8:09	6:44 8:04	8:38 8:08
11	cunybot	8:16 9:21	9:16 9:33	8:05 9:15	7:51 8:35	7:10 9:56	8:13 8:53	9:33 9:46	8:34 10:05	9:46 9:15	6:03 10:01	8:48 9:00
	overall	10:47	11:21	10:49	10:30	10:58	10:28	11:06	10:45	10:38	10:34	10:30

The top three bots have consistent winning times across maps. BananaBrain in particular is highly consistent. It seems to indicate a strong and well-executed strategy that wins on schedule. Losing times vary because they depend on what the opponent does after surviving.

The map with the longest game times is Destination. That probably reflects the difficulty of attacking across the twin bridges into the natural. The losing side can often defend until it runs out of resources.

Stardust

The most striking cells in the table are Stardust’s losing times on Empire of the Sun and Python. The time rendered as 0:01 is 33 frames, which is always the point when Stardust crashes, when it does (I checked). Over half the losses on those maps were crashes, so that the median loss was a crash. There were still plenty of wins. Is it due to Stardust crashing on those maps, or to winning so often that the median losing game was a crash? I made a little table of Stardust games which are exactly 33 frames long.

#2 stardust	Destin	Heartb	Polari	Aztec	Longin	Circui	Empire	Fighti	Python	Roadki
losses	33	40	30	19	46	27	79	44	44	48
crashes	4	0	0	1	1	0	46	17	23	11

Answer: It’s due to Stardust crashing on those maps. The rate of 33-frame games varies enormously by map, though if you ran enough games I imagine it would be non-zero for every map. Four-player maps other than Circuit Breakers have a high crash rate.
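The crash table's arithmetic can be checked directly. A crash at 33 frames is presumably shorter than any real loss, so crashes occupy the bottom of each map's sorted loss durations. The median loss is then itself a crash exactly when crashes make up strictly more than half of the losses. The frame-to-clock conversion assumes the standard 42 ms per frame at Fastest game speed.

```python
MS_PER_FRAME = 42  # assumption: Brood War "Fastest" speed, about 23.8 frames/s

# Stardust losses and 33-frame games per map, from the table above.
LOSSES = {"Destin": 33, "Heartb": 40, "Polari": 30, "Aztec": 19, "Longin": 46,
          "Circui": 27, "Empire": 79, "Fighti": 44, "Python": 44, "Roadki": 48}
CRASHES = {"Destin": 4, "Heartb": 0, "Polari": 0, "Aztec": 1, "Longin": 1,
           "Circui": 0, "Empire": 46, "Fighti": 17, "Python": 23, "Roadki": 11}


def frames_to_clock(frames: int) -> str:
    """Render a frame count as m:ss game time, dropping partial seconds."""
    seconds = frames * MS_PER_FRAME // 1000
    return f"{seconds // 60}:{seconds % 60:02d}"


# Crashes fill the bottom of the sorted loss durations, so the median
# loss is a crash when crashes are strictly more than half of the losses.
crash_median_maps = [m for m in LOSSES if 2 * CRASHES[m] > LOSSES[m]]
total_crashes = sum(CRASHES.values())

print("33 frames =", frames_to_clock(33))
for m in LOSSES:
    share = CRASHES[m] / LOSSES[m]
    print(f"{m:7} crashes {CRASHES[m]:2d}/{LOSSES[m]:2d} = {share:.0%} of losses")
print("median loss is a crash on:", crash_median_maps)
print("total 33-frame games:", total_crashes)
```

Only Empire of the Sun (46 of 79) and Python (23 of 44) pass the test, which matches the two 0:01 cells. The total of 103 is consistent with Stardust having around a hundred crashes.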

Next: Breaking down results by map and opponent.

AIIDE 2022 - first look at results

AIIDE 2022 results are out today, complete with the detailed results file. The carryovers from last year are #3 Dragon and #8 UAlbertaBot. The others are updated for this year.

My version of the crosstable. It’s identical to the official crosstable except for the presentation.

#	bot	overall	bana	star	drag	stea	purp	mcra	micr	ualb	pylo	styx	cuny
1	bananabrain	85.53%		52%	78%	89%	69%	94%	91%	97%	92%	94%	100%
2	stardust	81.48%	48%		92%	83%	52%	89%	93%	83%	95%	84%	97%
3	dragon	66.46%	22%	8%		21%	97%	95%	56%	77%	98%	94%	95%
4	steamhammer	62.71%	11%	17%	79%		43%	43%	73%	95%	80%	90%	97%
5	purplewave	61.17%	31%	48%	3%	57%		84%	50%	66%	87%	96%	89%
6	mcrave	46.79%	6%	11%	5%	57%	16%		92%	29%	62%	91%	100%
7	microwave	46.62%	9%	7%	44%	27%	50%	8%		57%	67%	99%	99%
8	ualbertabot	45.74%	3%	17%	23%	5%	34%	71%	43%		66%	95%	98%
9	pylonpuller	28.91%	8%	5%	2%	20%	13%	38%	33%	34%		62%	74%
10	styx	13.92%	6%	16%	6%	10%	4%	9%	1%	5%	38%		44%
11	cunybot	10.69%	0%	3%	5%	3%	11%	0%	1%	2%	26%	56%

The top four finishers are the same as last year, except that #1 BananaBrain and #2 Stardust have swapped places. In CoG this year #5 PurpleWave made it to second, but not in AIIDE. Compared with CoG, #2 Stardust came closer to BananaBrain, though it still did not overtake it. The top two were not far apart from each other and dominated the rest.

I’m pleased that Steamhammer was able to hold its rank, because it is only slightly improved over last year’s version. I expected it to finish behind #5 PurpleWave and hoped it would pass #3 Dragon, since I knew Steamhammer would score well head-to-head.

Stardust had around a hundred crashes. PurpleWave, McRave, and CUNYBot had hundreds of frame timeouts each. All these bots could have moved up in the rankings if they hadn’t lost so often for reasons unrelated to play. Bots seem to be having increasing trouble with the time limits.

Thanks to an influx of weaker opponents, #8 UAlbertaBot finished above the bottom of the table, unlike last year. I wasn’t afraid of Styx finishing high, but I’m surprised it did so poorly. In the BASIL rankings, new bot #9 PylonPuller (which has been improving fast) and #10 Styx have almost the same Elo. In fact, all the tail-enders have curiously low win rates. Last year UAlbertaBot scored 27%, much worse than this year, and the fairly weak FreshMeat one rank up scored 34%. This year, the weak tail-enders pushed everybody else’s wins up and made the tournament seem easy for them.

by race

The table of how each bot did by opponent race. Since there is only one terran and only one random bot, it’s less informative than we might like.

#	bot	overall	vT	vP	vZ	vR
1	bananabrain	85.53%	78%	71%	94%	97%
2	stardust	81.48%	92%	65%	89%	83%
3	dragon	66.46%	-	56%	73%	77%
4	steamhammer	62.71%	79%	38%	76%	95%
5	purplewave	61.17%	3%	55%	75%	66%
6	mcrave	46.79%	5%	24%	85%	29%
7	microwave	46.62%	44%	33%	58%	57%
8	ualbertabot	45.74%	23%	30%	63%	-
9	pylonpuller	28.91%	2%	9%	45%	34%
10	styx	13.92%	6%	16%	16%	5%
11	cunybot	10.69%	5%	10%	15%	2%

Every bot scored better versus zerg than versus protoss, except for Styx which was about the same. That’s the important message in the table.
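Each vX column appears to be the plain average of a bot's head-to-head rates against the bots of that race. The sketch below recomputes two rows. The race labels are my assumption, chosen because they reproduce the published columns; the '-' cells mark Dragon as the terran and UAlbertaBot as the random bot. An unweighted average is reasonable here because every pairing played about the same number of games.

```python
from statistics import mean

# Assumed race of each bot (P, T, Z, or R for random).
RACE = {"bananabrain": "P", "stardust": "P", "dragon": "T", "steamhammer": "Z",
        "purplewave": "P", "mcrave": "Z", "microwave": "Z", "ualbertabot": "R",
        "pylonpuller": "P", "styx": "Z", "cunybot": "Z"}

# Head-to-head win rates (percent) for two rows of the crosstable.
CROSSTABLE = {
    "bananabrain": {"stardust": 52, "dragon": 78, "steamhammer": 89,
                    "purplewave": 69, "mcrave": 94, "microwave": 91,
                    "ualbertabot": 97, "pylonpuller": 92, "styx": 94,
                    "cunybot": 100},
    "stardust": {"bananabrain": 48, "dragon": 92, "steamhammer": 83,
                 "purplewave": 52, "mcrave": 89, "microwave": 93,
                 "ualbertabot": 83, "pylonpuller": 95, "styx": 84,
                 "cunybot": 97},
}


def versus_race(bot, race):
    """Unweighted mean of a bot's head-to-head rates against one race."""
    rates = [r for opp, r in CROSSTABLE[bot].items() if RACE[opp] == race]
    return round(mean(rates)) if rates else None


for bot in CROSSTABLE:
    print(bot, {race: versus_race(bot, race) for race in "TPZR"})
```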

Next: Map tables.

CoG 2022 results first look

As Dan Gant let me know, CoG 2022 results are out today, complete with the detailed results file. The participants are the same as last year, except that MetaBot was dropped for unreliability that affected the running of the tournament. The carryovers from last year are #6 XiaoYi, #7 CUNYBot, and #8 BetaStar. The others are updated for this year.

My version of the crosstable.

	overall	Bana	Purp	Star	McRa	Micr	XIAO	CUNY	Beta
#1 BananaBrain	85.40%		79%	60%	69%	90%	100%	100%	100%
#2 PurpleWave	75.24%	21%		84%	55%	74%	97%	97%	100%
#3 Stardust	73.02%	40%	16%		69%	90%	98%	100%	98%
#4 McRave	68.60%	31%	45%	31%		93%	81%	100%	100%
#5 Microwave	46.54%	10%	26%	10%	7%		74%	98%	100%
#6 XIAOYI	35.40%	0%	3%	2%	19%	26%		98%	100%
#7 CUNYBot	15.49%	0%	3%	0%	0%	2%	2%		100%
#8 BetaStar	0.32%	0%	0%	2%	0%	0%	0%	0%

There are surprises throughout, from top to bottom.

Stardust’s reign is over for the moment. Last year, Stardust scored over 90% in CoG and over 95% in AIIDE, crushing the competition. This time, #1 BananaBrain dominated with 85%, and #2 PurpleWave edged out #3 Stardust. The official results show that Stardust had 67 crashes and 7 frame timeouts in 3150 games. If Stardust had the same number of crashes (zero) and frame timeouts (1) as the two bots above it, it would have finished second by a razor-thin margin.
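Here is the arithmetic behind that claim, as a sketch. I treat 3150 as the number of games each bot played. That fits BetaStar's 0.32%, which works out to about 10 wins. I also make the optimistic assumption that every avoided crash or timeout would have been a win. The calculation ignores that some of those games were against PurpleWave, so avoiding them would have lowered PurpleWave's total as well.

```python
GAMES_PER_BOT = 3150  # assumption: games each bot played in CoG 2022


def wins_from_rate(rate_percent: float, games: int = GAMES_PER_BOT) -> int:
    """Recover a whole number of wins from a published percentage."""
    return round(rate_percent / 100 * games)


stardust_wins = wins_from_rate(73.02)
purplewave_wins = wins_from_rate(75.24)
betastar_wins = wins_from_rate(0.32)

# Give Stardust the failure counts of the two bots above it (0 crashes,
# 1 frame timeout) and optimistically count every avoided failure as a win.
avoided_failures = (67 - 0) + (7 - 1)
adjusted_wins = stardust_wins + avoided_failures

print("BetaStar wins:", betastar_wins)
print(f"Stardust adjusted: {adjusted_wins} ({adjusted_wins / GAMES_PER_BOT:.2%})")
print(f"PurpleWave: {purplewave_wins} ({purplewave_wins / GAMES_PER_BOT:.2%})")
```

Stardust lands at 2373 wins and PurpleWave at 2370, a margin of three games.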

There is not a single upset, where a lower-ranked bot defeated a higher-ranked bot. The crosstable is very orderly. The lowest winning rate of a higher-ranked bot is 55% for #2 PurpleWave over #4 McRave.
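Checking for upsets is mechanical. Walk the upper triangle of the crosstable, where each entry is a higher-ranked bot's win rate against a lower-ranked one, and look for anything under 50%. A sketch with the numbers transcribed from the table:

```python
RANKED = ["BananaBrain", "PurpleWave", "Stardust", "McRave",
          "Microwave", "XiaoYi", "CUNYBot", "BetaStar"]

# Row bot's win rate (percent) against each lower-ranked bot, in rank order.
UPPER = {
    "BananaBrain": [79, 60, 69, 90, 100, 100, 100],
    "PurpleWave": [84, 55, 74, 97, 97, 100],
    "Stardust": [69, 90, 98, 100, 98],
    "McRave": [93, 81, 100, 100],
    "Microwave": [74, 98, 100],
    "XiaoYi": [98, 100],
    "CUNYBot": [100],
}

pairs = []  # (higher-ranked bot's rate, higher-ranked bot, lower-ranked bot)
for i, bot in enumerate(RANKED[:-1]):
    for offset, rate in enumerate(UPPER[bot], start=1):
        pairs.append((rate, bot, RANKED[i + offset]))

upsets = [p for p in pairs if p[0] < 50]
closest = min(pairs)

print("upsets:", upsets)
print("closest call:", closest)
```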

Something went wrong with BetaStar. It is a strong bot and finished well ahead of CUNYbot last year. Head to head versus CUNYBot, it scored 40 wins out of 50 games. This year it scored 10 wins total against all opposition, and all wins were against Stardust and likely due to crashes. What went wrong? Did the new and improved map pool break it? Was there a rule change that it could not cope with?

race results

I made two versions of each table. The first version includes all results; the second excludes BetaStar.

race	score
terran	35%
protoss	58%
zerg	44%

race	score
terran	25%
protoss	74%
zerg	34%

It’s not very informative, but I like to include it anyway. There was only one terran; we need more. Protoss dominated, as usual in recent years, even when including BetaStar’s debacle.
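The race scores look like plain averages of the overall win rates of each race's bots. That is my reading, but it reproduces every number in both tables. The second table uses overall rates recomputed without BetaStar's games. It is not the first table's rates with BetaStar simply dropped from the average. A sketch:

```python
from statistics import mean

# (race, overall win %) for each bot, from the two bot tables below.
ALL_BOTS = {
    "BananaBrain": ("protoss", 85.40), "PurpleWave": ("protoss", 75.24),
    "Stardust": ("protoss", 73.02), "McRave": ("zerg", 68.60),
    "Microwave": ("zerg", 46.54), "XiaoYi": ("terran", 35.40),
    "CUNYBot": ("zerg", 15.49), "BetaStar": ("protoss", 0.32),
}
WITHOUT_BETASTAR = {
    "BananaBrain": ("protoss", 82.96), "PurpleWave": ("protoss", 71.11),
    "Stardust": ("protoss", 68.89), "McRave": ("zerg", 63.37),
    "Microwave": ("zerg", 37.63), "XiaoYi": ("terran", 24.63),
    "CUNYBot": ("zerg", 1.41),
}


def race_means(overall):
    """Average the overall win rates of the bots of each race."""
    by_race = {}
    for race, rate in overall.values():
        by_race.setdefault(race, []).append(rate)
    return {race: mean(rates) for race, rates in by_race.items()}


with_betastar = race_means(ALL_BOTS)
without_betastar = race_means(WITHOUT_BETASTAR)
for label, scores in [("all bots", with_betastar), ("no BetaStar", without_betastar)]:
    print(label, {race: f"{score:.0f}%" for race, score in scores.items()})
```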

bot	race	overall	vT	vP	vZ
BananaBrain	protoss	85.40%	100%	80%	86%
PurpleWave	protoss	75.24%	97%	68%	75%
Stardust	protoss	73.02%	98%	51%	86%
McRave	zerg	68.60%	81%	52%	96%
Microwave	zerg	46.54%	74%	37%	53%
XIAOYI	terran	35.40%	-	26%	48%
CUNYBot	zerg	15.49%	2%	26%	1%
BetaStar	protoss	0.32%	0%	1%	0%

bot	race	overall	vT	vP	vZ
BananaBrain	protoss	82.96%	100%	70%	86%
PurpleWave	protoss	71.11%	97%	52%	75%
Stardust	protoss	68.89%	98%	28%	86%
McRave	zerg	63.37%	81%	36%	96%
Microwave	zerg	37.63%	74%	16%	53%
XIAOYI	terran	24.63%	-	1%	48%
CUNYBot	zerg	1.41%	2%	1%	1%

Again, not very informative with so few participants. Excluding BetaStar clarifies that CUNYbot was outclassed. XiaoYi was also outclassed by the remaining protoss, and was only able to fight against the zergs.

the surprising poor results

Stardust’s crash rate surprises me. It does not have a crashing problem on BASIL. There was something in the tournament environment that it was not ready for. I can’t guess whether that’s more due to Stardust, or more due to the tournament.

BetaStar essentially scored zero and added no information to the tournament results. To me it suggests that the tournament environment changed somehow (we know that at least the map pool changed), and the organizers did not test the carryover bots to make sure they still worked.

AIIDE 2022 registration

AIIDE 2022 registration is open.

Steamhammer will be competing.

CoG 2022 prospects

CoG this year is a small, elite tournament, virtually the same as last year. The entrants, with their win rates in last year’s CoG:

bot	wins	author
Stardust	90.25%	Bruce Nielsen
BananaBrain	74.69%	Johan de Jong
McRave	68.17%	Christian McCrave
Microwave	54.14%	Micky Holdorf
PurpleWave	52.14%	Dan Gant

It’s a fair guess at the likely finishing order. Today’s BASIL ranks and 2021 CoG ranks are the same with one exception: McRave and Microwave are reversed. But PurpleWave has been playing for a long time with one bug that prevents it from doing any upgrades, including basics like dragoon range and zealot speed, and another bug that causes it to construct duplicate buildings. I think it’s a safe guess that the bugs will be fixed for the tournament, and PurpleWave may finish higher.

No terrans. That’s unfortunate.

The carryovers, also with their win rates from last year:

bot	wins	author
XiaoYi	40.10%	Benchang Zheng
BetaStar	39.29%	Ruo-Ze Luo
MetaBot	23.08%	Anderson Tavares
CUNYBot	7.5%	Bryan Weber

There’s a note that MetaBot may be dropped due to a tournament stability issue. The decision has not been announced yet.

All updated entrants are likely to outscore all carryovers. XiaoYi is the sole terran, and should not be much of a challenge for current bots. On BASIL, terran krasi0 has just in the last two days retaken its top spot. It would have been good to at least have Hao Pan or Dragon in the tournament.

But there is a silver lining. These are exactly the same participants as last year, assuming that MetaBot is kept. The biggest difference is that CUNYBot is carried over from last year rather than updated. It will be fun to compare relative progress.

CoG 2022 entry deadline

The entry deadline for CoG 2022 is this Sunday.

CoG has an exciting new map pool. In past years they selected at tournament time a small number of maps from a large pool that included some clunkers. This year they gave that up and chose 9 maps ahead of time (see the tournament rules). The exciting part is that they include BWAPI 1.16.1 versions of the newer maps Eclipse, Neo Sylphid, and Polypoid. Neo Sylphid is available on SCHNAIL, but I think the interesting and popular maps Eclipse and Polypoid are new to the major competitions.

They also included the map Outsider, which is difficult for bots. Most bots should be able to play games, but skills to cope with the blocked-off side bases will be valuable.

Finally, at least a few maps that are modern and familiar to current human players! CoG is always interesting for its map variety, and this year it is better than ever.

Update: I see that the entry deadline has been extended to 19 June. It might be a sign that they’re not getting many entrants. I hope it’s only a typical delay; delays can come from anywhere.

Steamhammer in SSCAIT 2021

I predicted Steamhammer to finish at #11 in SSCAIT this year, and hoped it would do a little better. It finished tied for #12-13. On the one hand, it’s only a little lower than I expected. On the other, the difference in games from what I expected is glaring, to my eyes. When I made the prediction, I didn’t realize that Steamhammer’s saved learning data had been reset at some recent time. In the games I saw, Steamhammer had about 8 past games of data on each opponent. I did not imagine that Steamhammer might lose 2 games in a row to XIMP by Tomas Vajda, and 2 games in a row to WuliBot, and other losses to fixed-strategy opponents—it simply doesn’t happen when Steamhammer is trained up.

I estimate that if Steamhammer had won its “easy” games at the rate it does on BASIL, it would have finished at #10, with a chance of reaching #9. It would have been as I hoped.

Today’s finals round 1 match against Halo by Hao Pan was awful. Steamhammer scores over 60% versus Halo on BASIL. In the SSCAIT round robin it scored 2-0 using a ling flood strategy, which won when Halo opened its wall prematurely. In today’s match the ling flood failed, though it was close. Steamhammer didn’t have much experience to back up its next choices, and made poor ones.

Steamhammer’s next match is in the losers’ bracket against #13 McRaveZ. I think its odds are under 50%.

SSCAIT 2021 nears its end

The round robin phase of SSCAIT 2021 is nearly over. The current ranking is close to the final one.

Places #15 and #16 are not quite sealed up. #15 Microwave and #16 WillyT are at risk of slipping.

For Steamhammer, it’s touch-and-go whether it will hold its position at #10 ahead of McRave, or will fall back to #11 behind McRave. Steamhammer has one fewer loss, but also one more game left to play.

Drama is good, that’s what SSCAIT is for.

SSCAIT 3-second game

What happened in game 947, Steamhammer-Florian Richoux? It wasn’t the failure to connect that has disturbed other games. It looks like a related but different server failure.

Both bots recorded replays. Both replays are 3 seconds long. Florian Richoux (aka AIUR) recorded a replay where both bots sent workers to mine, end of game. Steamhammer recorded a replay where it sent drones to mine while Florian Richoux was idle as if it had not connected. The official result has Florian Richoux winning, and the game is not considered a crash.

I guess Steamhammer connected and then somehow lost its connection after it issued its mining orders and before Florian Richoux’s mining orders reached it? Or something?

So far Steamhammer has 4 games out of 34 played which were disturbed by apparent server failures. 2 are wins and 2 are losses. That’s about 12%, consistent with the estimated 14% overall rate from earlier on. The failures are adding noise and on average causing scores to shift toward 50%.
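The shift toward 50% follows from a simple mixture. Suppose a fraction f of games is decided by server failures and those games go either way like a coin flip. That is my assumption, though the failure outcomes so far are 2 wins and 2 losses. Then a bot whose true win rate is p has an expected score of (1 - f) × p + f × 0.5. A sketch with hypothetical true rates of 80% and 20%:

```python
def observed_win_rate(true_rate: float, failure_rate: float,
                      failure_win_rate: float = 0.5) -> float:
    """Expected score when some games are decided by server failures.

    Assumes a failed game is won with probability failure_win_rate
    (a coin flip by default), independent of the bots' real strength.
    """
    return (1 - failure_rate) * true_rate + failure_rate * failure_win_rate


disturbed_share = 4 / 34  # Steamhammer's disturbed games so far
strong = observed_win_rate(0.80, 0.14)  # hypothetical 80% bot
weak = observed_win_rate(0.20, 0.14)    # hypothetical 20% bot

print(f"disturbed so far: {disturbed_share:.0%}")
print(f"80% bot scores about {strong:.1%}, 20% bot about {weak:.1%}")
```

With the estimated 14% failure rate, an 80% bot would be expected to score about 75.8% and a 20% bot about 24.2%.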

SSCAIT early returns

SSCAIT has only been underway for a short time. Results so far are very rough and will change. Even so, Steamhammer is scoring about as expected, currently 10-4 for #10. It has played more games than most bots. A good sign is that it has played more games against the top 16 than any other bot in the top 16, and still held its expected position.

A bad sign is that Steamhammer has two wins over opponents that did not start up: Halo by Hao Pan and Stardust. Stardust has 3 losses, all against opponents it should beat easily. None of the 3 has a replay recorded on Stardust’s side, so it must have failed to start all 3 games. If it’s the server’s fault, either the server bug has a bias or else Stardust is extremely unlucky. In a real game, Steamhammer has good odds against Hao Pan (better than 2:1), but virtually no chance against Stardust.

The ranking will change a lot before the end. So far, BetaStar and PurpleWave have perfect records with 7 and 6 games played respectively. BananaBrain, Monster, and Krasi0P follow with around 90%.

Steamhammer in SSCAIT 2021

Games for SSCAIT 2021 will be starting any time now. Meanwhile, I have been working on an unrelated project which is well over half complete.

Steamhammer has participated in SSCAIT every year since 2016. This year makes six. Steamhammer finished at #11 in 2018, #11 in 2019, #11 in 2020. This year will be the first time Steamhammer has played without any special preparation or last-minute fixes. I expect it to finish at... #11, maybe a little better. If I had worked on it in the runup, it would have had a good chance to finish in the top half, because I’m at a point where big improvements are possible. I didn’t, but Steamhammer is still in good shape to finish as well as it has in past years.

Anyway, the proof is in the pudding. Let’s go!

AIIDE 2021 - one hour games

The second game I found where both bots believed they had lost was a game that went to the full 60 minutes. The cause is not the same as the frame timeout issue.

The official results have 22 games that went the full hour and had to be adjudicated on points. (Click the “duration” column twice to sort the longest games first.) In most cases, I’m not able to check whether both bots recorded the result correctly, because I can only check bots that keep history files, and only when those files are complete. All but a few of the hour-long games have a participant whose recorded value I can’t check.

But I did find several games where the official winner recorded that it had lost. Initial indications are that if the game runs the full hour, both bots are told that they lost, at least sometimes. I only watched one of the long replays, and in that game the official results were correct and winner WillyT believed it had lost to Stardust. At the end of the hour, WillyT’s tanks were clearing protoss bases against no resistance, but had not quite finished the job.

I’ll check a little more tomorrow and inform Dave Churchill.

AIIDE 2021 - what UAlbertaBot learned

I haven’t found time to investigate the second instance of “we both lost”. After this post, I’m nearly done with summarizing and aligning the bot learning files. The only bot I haven’t gotten to is FreshMeat, which has a unique learning system, not similar to any other bot’s. FreshMeat’s code is remarkably low-level, and deciphering the learning algorithm and the meaning of the learning files will take time.

In any case, here is UAlbertaBot’s learned data. UAlbertaBot keeps counts of wins and losses per strategy, not full history files, so its data can be laid out in a single table.

opening	total	#1 stardus	#2 bananab	#3 dragon	#4 steamha	#5 mcrave	#6 willyt	#7 microwa	#8 daqin	#9 freshme
total	363-1026 26%	2-155 1%	8-147 5%	27-130 17%	13-139 9%	98-57 63%	48-105 31%	67-88 43%	32-124 21%	68-81 46%
4RaxMarines	58-93 38%	0-15 0%	0-11 0%	3-15 17%	1-17 6%	40-2 95%	2-9 18%	0-5 0%	0-10 0%	12-9 57%
MarineRush	18-97 16%	0-15 0%	1-15 6%	0-6 0%	0-11 0%	2-3 40%	0-8 0%	13-25 34%	0-10 0%	2-4 33%
TankPush	12-102 11%	0-15 0%	0-11 0%	0-6 0%	0-11 0%	1-2 33%	5-25 17%	0-5 0%	3-23 12%	3-4 43%
VultureRush	15-90 14%	0-14 0%	0-10 0%	5-19 21%	0-11 0%	0-1 0%	0-8 0%	1-9 10%	0-10 0%	9-8 53%
DTRush	41-85 33%	2-18 10%	0-11 0%	10-26 28%	0-8 0%	0-2 0%	0-3 0%	-	0-4 0%	29-13 69%
DragoonRush	10-62 14%	0-10 0%	0-11 0%	0-6 0%	1-12 8%	0-2 0%	0-3 0%	-	7-14 33%	2-4 33%
ZealotRush	104-150 41%	0-10 0%	4-28 12%	0-6 0%	10-16 38%	24-24 50%	19-25 43%	35-15 70%	12-21 36%	0-5 0%
2HatchHydra	6-72 8%	0-15 0%	0-10 0%	6-20 23%	0-12 0%	-	0-2 0%	0-3 0%	0-4 0%	0-6 0%
3HatchMuta	1-61 2%	0-15 0%	0-10 0%	0-6 0%	0-12 0%	-	0-2 0%	0-3 0%	0-4 0%	1-9 10%
3HatchScourge	0-56 0%	0-14 0%	0-9 0%	0-6 0%	0-12 0%	-	0-2 0%	0-3 0%	0-4 0%	0-6 0%
ZerglingRush	98-158 38%	0-14 0%	3-21 12%	3-14 18%	1-17 6%	31-21 60%	22-18 55%	18-20 47%	10-20 33%	10-13 43%

Looking down the total column on the left, there is one big surprise. UAlbertaBot has a primary strategy for each race it may roll, and switches away only if the primary strategy turns out poorly. In past years when I analyzed UAlbertaBot's data (2018, 2019, and 2020), UAlbertaBot's primary strategy with every race was also its best strategy overall when it rolled that race. This year, the primary terran strategy MarineRush was no longer best; it was far exceeded by 4RaxMarines, which did better against 5 opponents and tied at 0% against 2 more. 4RaxMarines does not mean building four barracks to train marines; it means building a barracks at supply 4. It is a fast rush. Here is the build order from the config file.
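The comparison with MarineRush can be rechecked straight from the table. This Python sketch copies the two rows into lists (the list and function names are mine, for illustration only), recomputes each win rate from the raw win-loss counts rather than the rounded percentages, and lists the opponents where 4RaxMarines did better and where both openings scored zero.

```python
# Cells copied from the 4RaxMarines and MarineRush rows of the table above,
# in column order. Each cell is "wins-losses pct".
opponents = ["stardust", "bananabrain", "dragon", "steamhammer", "mcrave",
             "willyt", "microwave", "daqin", "freshmeat"]
rax4 = ["0-15 0%", "0-11 0%", "3-15 17%", "1-17 6%", "40-2 95%",
        "2-9 18%", "0-5 0%", "0-10 0%", "12-9 57%"]
marine_rush = ["0-15 0%", "1-15 6%", "0-6 0%", "0-11 0%", "2-3 40%",
               "0-8 0%", "13-25 34%", "0-10 0%", "2-4 33%"]


def parse_cell(cell):
    """Turn a cell like '40-2 95%' into (40, 2), ignoring the rounded percentage."""
    wins, losses = cell.split()[0].split("-")
    return int(wins), int(losses)


def win_rate(cell):
    wins, losses = parse_cell(cell)
    return wins / (wins + losses) if wins + losses else 0.0


better = [opp for opp, a, b in zip(opponents, rax4, marine_rush)
          if win_rate(a) > win_rate(b)]
both_zero = [opp for opp, a, b in zip(opponents, rax4, marine_rush)
             if win_rate(a) == 0 and win_rate(b) == 0]
# Sum wins and losses across all opponents for the 4RaxMarines row.
rax4_total = tuple(map(sum, zip(*map(parse_cell, rax4))))

print(better)      # ['dragon', 'steamhammer', 'mcrave', 'willyt', 'freshmeat']
print(both_zero)   # ['stardust', 'daqin']
print(rax4_total)  # (58, 93)
```

The recomputed row total matches the 58-93 in the table, so the two rows were copied consistently.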

"Terran_4RaxMarines" : { "Race" : "Terran", "OpeningBuildOrder" : ["Barracks", "SCV", "SCV", "Marine", "Supply Depot", "Marine", "SCV", "Marine", "SCV", "Marine", "SCV", "Marine", "Barracks", "Marine", "Marine", "Marine"]}
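To see why the name refers to supply rather than to a barracks count, here is a small Python sketch that walks the build order above and tags each item with the supply at the moment it is queued. It is a simplification under stated assumptions: terran starts with 4 SCVs, each SCV or marine adds 1 supply in the usual human notation, buildings add none, and the supply cap, build times, and resources are ignored. The `annotate_supply` helper is hypothetical, not part of UAlbertaBot.

```python
# Build order copied from the Terran_4RaxMarines config entry above.
build = ["Barracks", "SCV", "SCV", "Marine", "Supply Depot", "Marine", "SCV",
         "Marine", "SCV", "Marine", "SCV", "Marine", "Barracks", "Marine",
         "Marine", "Marine"]

# Assumed supply costs in conventional notation; buildings cost no supply.
UNIT_SUPPLY = {"SCV": 1, "Marine": 1}


def annotate_supply(order, start_supply=4):
    """Pair each build item with the supply count when it is queued.

    Simplified: ignores the supply cap, build times, and resources.
    """
    supply = start_supply
    annotated = []
    for item in order:
        annotated.append((supply, item))
        supply += UNIT_SUPPLY.get(item, 0)
    return annotated


for supply, item in annotate_supply(build):
    print(f"{supply:2d} {item}")
```

Under these assumptions the first barracks goes down at supply 4, the depot at 7, and the second barracks at 14.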

I guess opponents were less prepared for the fast marine rush. McRave in particular was unable to cope. I looked through BASIL’s build order page and did not see it; I guess no bot plays 4 rax regularly. The version of UAlbertaBot on BASIL is different from the one in the tournament. The BASIL UAlbertaBot does play the slower marine rush, so its opponents have gotten used to it.

The 3HatchScourge build was useless. The build was specially designed to give UAlbertaBot a chance against XIMP, and apparently has no other value. Curiously, 3HatchMuta was nearly as helpless, with only 1 win, which came against FreshMeat. Apart from ZerglingRush, that was the only zerg win against FreshMeat, though, so chalk up one advantage.

AIIDE 2021 - Microwave versus DaQin

Two posts again today. Blue is good for Microwave, red is good for DaQin.

microwave strategies versus daqin strategies

	overall	4GateGoon	ForgeExpand5GateGoon	ForgeExpandSpeedlots
overall	128/157 82%	0/1 0%	3/3 100%	125/153 82%
1HatchMuta_Sparkle	27/33 82%	-	-	27/33 82%
3HatchHydra	0/1 0%	-	-	0/1 0%
3HatchLurker	0/1 0%	-	-	0/1 0%
3HatchMuta	95/106 90%	-	2/2 100%	93/104 89%
3HatchMutaExpo	0/1 0%	-	-	0/1 0%
4HatchPoolHydra	1/1 100%	-	-	1/1 100%
5HatchPoolHydra	1/2 50%	0/1 0%	-	1/1 100%
6Pool	0/1 0%	-	-	0/1 0%
6PoolSpeed	0/1 0%	-	-	0/1 0%
9PoolHatchGasSpeed7D	1/3 33%	-	-	1/3 33%
9PoolHatchGasSpeed8D	3/6 50%	-	1/1 100%	2/5 40%
9PoolSpeedLing	0/1 0%	-	-	0/1 0%

DaQin barely varied its play, so again, nothing to see here.

microwave as seen by daqin

microwave played	#	daqin recognized
1HatchMuta_Sparkle	33	22 Not fast rush | 7 Heavy rush | 4 Proxy
3HatchHydra	1	1 Not fast rush
3HatchLurker	1	1 Heavy rush
3HatchMuta	106	84 Not fast rush | 16 Heavy rush | 4 Proxy | 2 Unknown
3HatchMutaExpo	1	1 Not fast rush
4HatchPoolHydra	1	1 Hydra bust
5HatchPoolHydra	2	2 Not fast rush
6Pool	1	1 Fast rush
6PoolSpeed	1	1 Fast rush
9PoolHatchGasSpeed7D	3	2 Heavy rush | 1 Not fast rush
9PoolHatchGasSpeed8D	6	3 Fast rush | 2 Heavy rush | 1 Unknown
9PoolSpeedLing	1	1 Heavy rush

Once again, 9 pool is sometimes recognized as a fast rush and sometimes as something that contradicts it. And there are some stray proxies again. That is probably a bug inherited from Steamhammer (and long since fixed there).

daqin as seen by microwave

daqin played	#	microwave recognized
4GateGoon	1	1 Unknown
ForgeExpand5GateGoon	3	2 Turtle | 1 Unknown
ForgeExpandSpeedlots	153	87 Turtle | 43 SafeExpand | 16 Unknown | 4 NakedExpand | 3 HeavyRush

AIIDE 2021 - McRave versus DaQin

Blue is good for McRave, red is good for DaQin.

mcrave strategies versus daqin strategies

	overall	ForgeExpand5GateGoon	ForgeExpandSpeedlots
overall	122/157 78%	3/3 100%	119/154 77%
HatchPool,12Hatch,2HatchMuta	102/123 83%	3/3 100%	99/120 82%
PoolHatch,9Pool,2HatchMuta	1/3 33%	-	1/3 33%
PoolHatch,9Pool,3HatchMuta	1/2 50%	-	1/2 50%
PoolHatch,9Pool,6HatchHydra	0/2 0%	-	0/2 0%
PoolHatch,Overpool,2HatchMuta	18/23 78%	-	18/23 78%
PoolHatch,Overpool,3HatchMuta	0/3 0%	-	0/3 0%
PoolHatch,Overpool,6HatchHydra	0/1 0%	-	0/1 0%

Move along, nothing to see here, folks.

mcrave as seen by daqin

mcrave played	#	daqin recognized
HatchPool,12Hatch,2HatchMuta	123	93 Not fast rush | 18 Unknown | 12 Heavy rush
PoolHatch,9Pool,2HatchMuta	3	1 Not fast rush | 1 Unknown | 1 Fast rush
PoolHatch,9Pool,3HatchMuta	2	2 Fast rush
PoolHatch,9Pool,6HatchHydra	2	1 Unknown | 1 Not fast rush
PoolHatch,Overpool,2HatchMuta	23	22 Not fast rush | 1 Heavy rush
PoolHatch,Overpool,3HatchMuta	3	2 Not fast rush | 1 Heavy rush
PoolHatch,Overpool,6HatchHydra	1	1 Not fast rush

Apparently 9 pool is sometimes a fast rush and sometimes a not fast rush.

daqin as seen by mcrave

daqin played	#	mcrave recognized
ForgeExpand5GateGoon	3	3 FFE,Forge,5GateGoon
ForgeExpandSpeedlots	154	88 FFE,Forge,Speedlot | 24 FFE,Forge,5GateGoon | 23 FFE,Gateway,Speedlot | 7 FFE,Forge,ZealotArchon | 6 FFE,Nexus,Speedlot | 2 FFE,Nexus,5GateGoon | 2 FFE,Forge,Unknown | 2 FFE,Gateway,5GateGoon

There are those dragoons again, even when DaQin believes it is making zealots. I imagine that something in McRave’s recognizer is approximate. It only matters if McRave reacts to its own wrong recognition, though.
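To put a number on those dragoons, here is a quick Python tally of McRave's labels for DaQin's ForgeExpandSpeedlots games, copied from the table above. Treating any label ending in 5GateGoon as a dragoon read is my own assumption about what the labels mean, and the variable names are only for illustration.

```python
from collections import Counter

# McRave's recognized labels for DaQin's ForgeExpandSpeedlots games (table above).
recognized = Counter({
    "FFE,Forge,Speedlot": 88,
    "FFE,Forge,5GateGoon": 24,
    "FFE,Gateway,Speedlot": 23,
    "FFE,Forge,ZealotArchon": 7,
    "FFE,Nexus,Speedlot": 6,
    "FFE,Nexus,5GateGoon": 2,
    "FFE,Forge,Unknown": 2,
    "FFE,Gateway,5GateGoon": 2,
})

total = sum(recognized.values())
# Count games where McRave read a dragoon build while DaQin was playing speedlots.
goon_reads = sum(n for label, n in recognized.items() if label.endswith("5GateGoon"))
print(f"{goon_reads}/{total} = {goon_reads / total:.0%}")  # 28/154 = 18%
```

So under that reading, roughly 1 game in 5 was labeled as a dragoon build when DaQin intended zealots.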