AIIDE 2021 - results by map | Starcraft AI blog

AIIDE 2021 - results by map

This post covers the details of how each bot performed on each map. I wrote up the map pool last year. In order across the top of each table, there are 3 maps with 2 starting positions, 2 with 3, and 5 with 4. The tables are full of information, but I’ve learned that it is hard to extract insights from them; to find out what strengths and weaknesses the data points to, you usually have to watch the games. The value of the tables lies in telling authors which games to watch to identify weaknesses.

For reference, here’s a copy of the map table from yesterday, the summary of how well bots did overall on each map.

#	bot	overall	Destin	Heartb	Polari	Aztec	Longin	Circui	Empire	Fighti	Python	Roadki
1	stardust	95.63%	96%	97%	97%	98%	90%	94%	98%	97%	94%	96%
2	bananabrain	79.70%	79%	81%	81%	80%	83%	79%	74%	81%	80%	79%
3	dragon	51.19%	50%	47%	52%	50%	50%	55%	56%	50%	50%	51%
4	steamhammer	49.78%	51%	56%	49%	50%	44%	51%	48%	49%	50%	49%
5	mcrave	41.70%	45%	47%	41%	41%	38%	35%	42%	41%	44%	41%
6	willyt	41.05%	38%	39%	42%	36%	36%	51%	38%	43%	49%	40%
7	microwave	40.70%	46%	41%	41%	36%	40%	41%	38%	39%	40%	45%
8	daqin	39.63%	41%	36%	42%	42%	45%	44%	39%	41%	31%	35%
9	freshmeat	33.61%	31%	36%	33%	34%	37%	32%	37%	31%	35%	31%
10	ualbertabot	26.70%	22%	19%	22%	31%	38%	17%	31%	27%	28%	32%

Each bot gets its own table, showing how well it performed against each opponent on each map. Each cell represents 15 games, occasionally 14 if not all games completed, so expect noise in the numbers.

#	stardust	overall	Destin	Heartb	Polari	Aztec	Longin	Circui	Empire	Fighti	Python	Roadki
2	bananabrain	84%	93%	93%	80%	80%	67%	60%	100%	87%	93%	87%
3	dragon	98%	87%	100%	100%	100%	100%	93%	100%	100%	100%	100%
4	steamhammer	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
5	mcrave	95%	100%	93%	93%	100%	100%	100%	87%	93%	100%	80%
6	willyt	95%	100%	100%	100%	100%	93%	100%	100%	100%	60%	100%
7	microwave	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
8	daqin	91%	87%	87%	100%	100%	53%	93%	100%	93%	93%	100%
9	freshmeat	99%	100%	100%	100%	100%	93%	100%	100%	100%	100%	100%
10	ualbertabot	99%	93%	100%	100%	100%	100%	100%	93%	100%	100%	100%
	overall	95.63%	96%	97%	97%	98%	90%	94%	98%	97%	94%	96%

A solid wall of blue, but with a few gouges. The lower results versus WillyT on Python and DaQin on Longinus probably represent weaknesses exposed by specific game events that these players tend to bring about on these maps. The weaknesses are not visible in the overall chart, only here where the results are broken down by opponent. They show up in only a few cells, but the underlying flaws might occur in many games; maybe these opponents just happened to exploit them on those maps.
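One rough way to pick out such gouges mechanically is to rank each cell by how far it falls below the same opponent's overall rate. The sketch below does that for Stardust's table, with the numbers copied from above. The function name `weakest_cells` and the plain percentage-point difference are my own illustrative choices, not how the tables were produced, and with 15 games per cell the ranking is only a hint about which games to watch.

```python
# Map columns in the order used by the tables in this post.
MAPS = ["Destin", "Heartb", "Polari", "Aztec", "Longin",
        "Circui", "Empire", "Fighti", "Python", "Roadki"]

# Stardust's win rates in percent: opponent -> (overall, per-map rates).
STARDUST = {
    "bananabrain": (84, [93, 93, 80, 80, 67, 60, 100, 87, 93, 87]),
    "dragon":      (98, [87, 100, 100, 100, 100, 93, 100, 100, 100, 100]),
    "steamhammer": (100, [100] * 10),
    "mcrave":      (95, [100, 93, 93, 100, 100, 100, 87, 93, 100, 80]),
    "willyt":      (95, [100, 100, 100, 100, 93, 100, 100, 100, 60, 100]),
    "microwave":   (100, [100] * 10),
    "daqin":       (91, [87, 87, 100, 100, 53, 93, 100, 93, 93, 100]),
    "freshmeat":   (99, [100, 100, 100, 100, 93, 100, 100, 100, 100, 100]),
    "ualbertabot": (99, [93, 100, 100, 100, 100, 100, 93, 100, 100, 100]),
}

def weakest_cells(table, maps, top=3):
    """Return the `top` cells that fall furthest below the row's overall rate.

    Each result is (difference in percentage points, opponent, map, rate).
    This is a hypothetical measure chosen for illustration only.
    """
    cells = []
    for opponent, (overall, rates) in table.items():
        for map_name, rate in zip(maps, rates):
            cells.append((rate - overall, opponent, map_name, rate))
    cells.sort()  # most negative difference first
    return cells[:top]

for diff, opponent, map_name, rate in weakest_cells(STARDUST, MAPS):
    print(f"{opponent:12s} {map_name:7s} {rate:3d}%  ({diff:+d} points vs overall)")
```

On this table the two largest drops are DaQin on Longinus and WillyT on Python, the same cells singled out above. Other measures, such as a ratio or a score that accounts for sample size, could order the cells differently.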

#	bananabrain	overall	Destin	Heartb	Polari	Aztec	Longin	Circui	Empire	Fighti	Python	Roadki
1	stardust	16%	7%	7%	20%	20%	33%	40%	0%	13%	7%	13%
3	dragon	76%	93%	73%	67%	80%	87%	73%	60%	73%	80%	73%
4	steamhammer	83%	80%	87%	80%	100%	80%	80%	80%	87%	80%	80%
5	mcrave	83%	67%	80%	93%	80%	100%	80%	73%	80%	93%	80%
6	willyt	93%	93%	93%	93%	100%	93%	87%	93%	87%	100%	87%
7	microwave	86%	87%	100%	80%	87%	87%	73%	93%	100%	67%	87%
8	daqin	90%	87%	100%	93%	80%	93%	80%	73%	93%	100%	100%
9	freshmeat	96%	100%	100%	100%	87%	87%	93%	100%	100%	93%	100%
10	ualbertabot	95%	93%	93%	100%	87%	87%	100%	93%	100%	100%	93%
	overall	79.70%	79%	81%	81%	80%	83%	79%	74%	81%	80%	79%

And this is a blue wall with sharp stuff on top, staining the top course of bricks with blood.

#	dragon	overall	Destin	Heartb	Polari	Aztec	Longin	Circui	Empire	Fighti	Python	Roadki
1	stardust	2%	13%	0%	0%	0%	0%	7%	0%	0%	0%	0%
2	bananabrain	24%	7%	27%	33%	20%	13%	27%	40%	27%	20%	27%
4	steamhammer	37%	53%	47%	27%	40%	47%	13%	53%	33%	27%	33%
5	mcrave	67%	53%	27%	53%	80%	73%	87%	73%	80%	80%	67%
6	willyt	96%	93%	93%	100%	87%	100%	93%	93%	100%	100%	100%
7	microwave	66%	47%	80%	60%	93%	60%	80%	67%	73%	53%	47%
8	daqin	47%	40%	40%	40%	27%	40%	47%	67%	33%	73%	60%
9	freshmeat	39%	47%	40%	60%	33%	40%	47%	27%	20%	27%	47%
10	ualbertabot	83%	100%	73%	93%	67%	80%	93%	87%	87%	67%	80%
	overall	51.19%	50%	47%	52%	50%	50%	55%	56%	50%	50%	51%

Dragon’s results, as last year, are inconsistent across maps. Again, it doesn’t show in the averages across the bottom. Actually, compared with other bots, Dragon doesn’t seem much different. Most had extra good and extra bad maps against some opponents.

#	steamhammer	overall	Destin	Heartb	Polari	Aztec	Longin	Circui	Empire	Fighti	Python	Roadki
1	stardust	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%
2	bananabrain	17%	20%	13%	20%	0%	20%	20%	20%	13%	20%	20%
3	dragon	63%	47%	53%	73%	60%	53%	87%	47%	67%	73%	67%
5	mcrave	54%	73%	60%	53%	47%	47%	73%	27%	60%	40%	60%
6	willyt	56%	80%	67%	60%	73%	40%	40%	60%	53%	27%	60%
7	microwave	73%	80%	87%	67%	73%	73%	53%	67%	67%	93%	67%
8	daqin	27%	13%	53%	13%	20%	7%	27%	40%	20%	47%	27%
9	freshmeat	68%	60%	73%	67%	80%	67%	60%	93%	73%	60%	47%
10	ualbertabot	92%	93%	100%	93%	100%	87%	100%	80%	93%	87%	93%
	overall	49.78%	51%	56%	49%	50%	44%	51%	48%	49%	50%	49%

The inconsistent results across maps may mean that bots are weak at adjusting their strategies to fit the maps. Steamhammer makes an attempt, but with 10 maps, it would take a very long tournament to gather the data to decide well. This is one of the issues that the opening timing data—the project I chose to delay—would address. It would at least help on BASIL maps, where there are enough games.

#	mcrave	overall	Destin	Heartb	Polari	Aztec	Longin	Circui	Empire	Fighti	Python	Roadki
1	stardust	5%	0%	7%	7%	0%	0%	0%	13%	7%	0%	20%
2	bananabrain	17%	33%	20%	7%	20%	0%	20%	27%	20%	7%	20%
3	dragon	33%	47%	73%	47%	20%	27%	13%	27%	20%	20%	33%
4	steamhammer	46%	27%	40%	47%	53%	53%	27%	73%	40%	60%	40%
6	willyt	32%	47%	40%	33%	20%	27%	13%	27%	33%	60%	20%
7	microwave	60%	40%	47%	40%	67%	67%	67%	73%	73%	67%	60%
8	daqin	79%	87%	87%	73%	87%	100%	80%	60%	73%	67%	80%
9	freshmeat	65%	53%	47%	80%	60%	60%	60%	67%	60%	73%	93%
10	ualbertabot	37%	73%	67%	40%	47%	7%	33%	13%	47%	40%	7%
	overall	41.70%	45%	47%	41%	41%	38%	35%	42%	41%	44%	41%

As an example of the uninterpretability of the data, why did McRave do especially well against Dragon on Heartbreak Ridge? Is it because it was a 2-player map? No, the other 2-player maps Destination and Polaris Rhapsody do not agree. Was it because the map is flat, without a ramp? No, Dragon crushed it on Longinus and Empire of the Sun. Was it because of the short rush distance? I don’t think that matches McRave’s play style. It might be because Dragon makes specific mistakes in building placement or tactics, which McRave’s play is lucky enough to exploit on Heartbreak Ridge. The multiple paths through the center of the map might confuse Dragon into splitting its forces. To know for sure, we have to examine the games.

#	willyt	overall	Destin	Heartb	Polari	Aztec	Longin	Circui	Empire	Fighti	Python	Roadki
1	stardust	5%	0%	0%	0%	0%	7%	0%	0%	0%	40%	0%
2	bananabrain	7%	7%	7%	7%	0%	7%	13%	7%	13%	0%	13%
3	dragon	4%	7%	7%	0%	13%	0%	7%	7%	0%	0%	0%
4	steamhammer	44%	20%	33%	40%	27%	60%	60%	40%	47%	73%	40%
5	mcrave	68%	53%	60%	67%	80%	73%	87%	73%	67%	40%	80%
7	microwave	67%	60%	67%	87%	67%	53%	73%	73%	67%	73%	53%
8	daqin	38%	40%	33%	40%	20%	33%	47%	27%	47%	53%	40%
9	freshmeat	68%	80%	67%	73%	60%	40%	93%	60%	73%	73%	60%
10	ualbertabot	69%	79%	73%	67%	60%	47%	80%	53%	71%	86%	73%
	overall	41.05%	38%	39%	42%	36%	36%	51%	38%	43%	49%	40%

For bot authors, I think it’s likely to be more useful to look at weaknesses than strengths. The weaknesses with the greatest contrast with the bot’s other results against the same opponent may be worth figuring out. For WillyT, that is the 20% score versus Steamhammer on Destination, a map where the natural should be easy to defend thanks to the double bridges. The weak result might represent a systematic mistake, though of course it could also be something very specific to the map and opponent.

#	microwave	overall	Destin	Heartb	Polari	Aztec	Longin	Circui	Empire	Fighti	Python	Roadki
1	stardust	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%
2	bananabrain	14%	13%	0%	20%	13%	13%	27%	7%	0%	33%	13%
3	dragon	34%	53%	20%	40%	7%	40%	20%	33%	27%	47%	53%
4	steamhammer	27%	20%	13%	33%	27%	27%	47%	33%	33%	7%	33%
5	mcrave	40%	60%	53%	60%	33%	33%	33%	27%	27%	33%	40%
6	willyt	33%	40%	33%	13%	33%	47%	27%	27%	33%	27%	47%
8	daqin	81%	87%	100%	67%	93%	80%	60%	67%	87%	67%	100%
9	freshmeat	83%	73%	73%	73%	80%	87%	87%	80%	100%	93%	80%
10	ualbertabot	55%	67%	73%	60%	40%	33%	73%	67%	40%	57%	40%
	overall	40.70%	46%	41%	41%	36%	40%	41%	38%	39%	40%	45%

Strong and weak results could also be just luck, statistical fluctuations. It’s safe to promise that some seemingly meaningful numbers... aren’t, because they’re based on only 15 games.
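To get a feel for how much noise 15 games leaves, here is a small sketch that computes an approximate 95% Wilson score interval for a win rate. The Wilson interval is a standard statistical tool, but using it here is my own illustration, not part of how the tables were made, and the function name `wilson_interval` is made up for this example. It also treats the games in a cell as independent, which learning bots may violate.

```python
import math

def wilson_interval(wins, games, z=1.96):
    """Approximate Wilson score interval for a win rate.

    z=1.96 corresponds to roughly 95% confidence. Assumes the games are
    independent trials with a fixed win probability.
    """
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    denom = 1 + z * z / games
    center = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games
                         + z * z / (4 * games * games)) / denom
    # Clamp to [0, 1] to guard against floating-point spill.
    return max(0.0, center - half), min(1.0, center + half)

for wins in (0, 3, 9, 15):
    lo, hi = wilson_interval(wins, 15)
    print(f"{wins}/15 = {wins / 15:.0%}: plausibly {lo:.0%} to {hi:.0%}")
```

For example, 9 wins in 15 games shows up as 60% in a table, but under these assumptions it is compatible with a true win rate anywhere from about 36% to 80%. That is wide enough to cover many of the apparent differences between maps.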

#	daqin	overall	Destin	Heartb	Polari	Aztec	Longin	Circui	Empire	Fighti	Python	Roadki
1	stardust	9%	13%	13%	0%	0%	47%	7%	0%	7%	7%	0%
2	bananabrain	10%	13%	0%	7%	20%	7%	20%	27%	7%	0%	0%
3	dragon	53%	60%	60%	60%	73%	60%	53%	33%	67%	27%	40%
4	steamhammer	73%	87%	47%	87%	80%	93%	73%	60%	80%	53%	73%
5	mcrave	21%	13%	13%	27%	13%	0%	20%	40%	27%	33%	20%
6	willyt	62%	60%	67%	60%	80%	67%	53%	73%	53%	47%	60%
7	microwave	19%	13%	0%	33%	7%	20%	40%	33%	13%	33%	0%
9	freshmeat	31%	27%	47%	33%	40%	40%	33%	0%	27%	13%	47%
10	ualbertabot	78%	80%	80%	73%	67%	73%	100%	80%	87%	67%	73%
	overall	39.63%	41%	36%	42%	42%	45%	44%	39%	41%	31%	35%

#	freshmeat	overall	Destin	Heartb	Polari	Aztec	Longin	Circui	Empire	Fighti	Python	Roadki
1	stardust	1%	0%	0%	0%	0%	7%	0%	0%	0%	0%	0%
2	bananabrain	4%	0%	0%	0%	13%	13%	7%	0%	0%	7%	0%
3	dragon	61%	53%	60%	40%	67%	60%	53%	73%	80%	73%	53%
4	steamhammer	32%	40%	27%	33%	20%	33%	40%	7%	27%	40%	53%
5	mcrave	35%	47%	53%	20%	40%	40%	40%	33%	40%	27%	7%
6	willyt	32%	20%	33%	27%	40%	60%	7%	40%	27%	27%	40%
7	microwave	17%	27%	27%	27%	20%	13%	13%	20%	0%	7%	20%
8	daqin	69%	73%	53%	67%	60%	60%	67%	100%	73%	87%	53%
10	ualbertabot	52%	21%	67%	80%	50%	43%	64%	57%	33%	47%	53%
	overall	33.61%	31%	36%	33%	34%	37%	32%	37%	31%	35%	31%

#	ualbertabot	overall	Destin	Heartb	Polari	Aztec	Longin	Circui	Empire	Fighti	Python	Roadki
1	stardust	1%	7%	0%	0%	0%	0%	0%	7%	0%	0%	0%
2	bananabrain	5%	7%	7%	0%	13%	13%	0%	7%	0%	0%	7%
3	dragon	17%	0%	27%	7%	33%	20%	7%	13%	13%	33%	20%
4	steamhammer	8%	7%	0%	7%	0%	13%	0%	20%	7%	13%	7%
5	mcrave	63%	27%	33%	60%	53%	93%	67%	87%	53%	60%	93%
6	willyt	31%	21%	27%	33%	40%	53%	20%	47%	29%	14%	27%
7	microwave	45%	33%	27%	40%	60%	67%	27%	33%	60%	43%	60%
8	daqin	22%	20%	20%	27%	33%	27%	0%	20%	13%	33%	27%
9	freshmeat	48%	79%	33%	20%	50%	57%	36%	43%	67%	53%	47%
	overall	26.70%	22%	19%	22%	31%	38%	17%	31%	27%	28%	32%

Next: I want to take a day to show off Steamhammer skills before I get back to AIIDE analysis.

Comments

Dave Churchill on Thursday, October 14. 2021:

Great work as always Jay. If you want to submit a pull request with modifications to the results parser I can include these in the official tournament results next year. But then I guess you may not have this content for your blog :) Up to you!

Jay Scott on Friday, October 15. 2021:

Hmm. My analysis ware with parser is an independent development, written in perl. It requires a bit of hand configuration every time, because the detailed results file doesn’t include bot race. I think you probably don’t want to run it as part of your pipeline.

Jay Scott on Friday, October 15. 2021:

And the work is not that great. I got the css for the # column wrong.... Let me fix that.

Jay Scott on Friday, October 15. 2021:

There, fixed.
