AIIDE 2021 - results by map

This post is about the details of how bots performed on maps. I wrote up the map pool last year. In order across the top of each table, there are 3 maps with 2 starting positions, 2 with 3, and 5 with 4. The tables are full of information, but I’ve learned that it is hard to extract insights from the information; to find out what strengths and weaknesses the data points out, you usually have to watch the games. The value of the tables lies in telling authors what games to watch to identify weaknesses.

For reference, here’s a copy of the map table from yesterday, the summary of how well bots did overall on each map.

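| # | bot | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | stardust | 95.63% | 96% | 97% | 97% | 98% | 90% | 94% | 98% | 97% | 94% | 96% |
| 2 | bananabrain | 79.70% | 79% | 81% | 81% | 80% | 83% | 79% | 74% | 81% | 80% | 79% |
| 3 | dragon | 51.19% | 50% | 47% | 52% | 50% | 50% | 55% | 56% | 50% | 50% | 51% |
| 4 | steamhammer | 49.78% | 51% | 56% | 49% | 50% | 44% | 51% | 48% | 49% | 50% | 49% |
| 5 | mcrave | 41.70% | 45% | 47% | 41% | 41% | 38% | 35% | 42% | 41% | 44% | 41% |
| 6 | willyt | 41.05% | 38% | 39% | 42% | 36% | 36% | 51% | 38% | 43% | 49% | 40% |
| 7 | microwave | 40.70% | 46% | 41% | 41% | 36% | 40% | 41% | 38% | 39% | 40% | 45% |
| 8 | daqin | 39.63% | 41% | 36% | 42% | 42% | 45% | 44% | 39% | 41% | 31% | 35% |
| 9 | freshmeat | 33.61% | 31% | 36% | 33% | 34% | 37% | 32% | 37% | 31% | 35% | 31% |
| 10 | ualbertabot | 26.70% | 22% | 19% | 22% | 31% | 38% | 17% | 31% | 27% | 28% | 32% |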

Each bot gets its own table, showing how well it performed against each opponent on each map. Each cell represents 15 games, occasionally 14 if not all games completed, so expect noise in the numbers.
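To put a rough number on that noise, here is a minimal sketch that computes an approximate 95% Wilson score interval for a single cell. It is not part of the tournament tooling. It assumes each game is an independent trial with a fixed win probability, which is a simplification (bots that learn between games break it), and the function name `wilson_interval` is made up for illustration.

```python
from math import sqrt

def wilson_interval(wins, games, z=1.96):
    """Approximate 95% Wilson score interval for a win rate.

    Assumes independent games with a fixed win probability (a simplification).
    """
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    denom = 1 + z * z / games
    center = (p + z * z / (2 * games)) / denom
    half = z * sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return center - half, center + half

# A cell showing 60% corresponds to 9 wins in 15 games.
low, high = wilson_interval(9, 15)
print(f"60% over 15 games is compatible with {low:.0%} to {high:.0%}")
```

Under these assumptions, a 60% cell is compatible with a true win rate anywhere from the mid 30s to about 80%, so two neighboring cells have to differ by a lot before the difference means much.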

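| # | stardust | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2 | bananabrain | 84% | 93% | 93% | 80% | 80% | 67% | 60% | 100% | 87% | 93% | 87% |
| 3 | dragon | 98% | 87% | 100% | 100% | 100% | 100% | 93% | 100% | 100% | 100% | 100% |
| 4 | steamhammer | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| 5 | mcrave | 95% | 100% | 93% | 93% | 100% | 100% | 100% | 87% | 93% | 100% | 80% |
| 6 | willyt | 95% | 100% | 100% | 100% | 100% | 93% | 100% | 100% | 100% | 60% | 100% |
| 7 | microwave | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| 8 | daqin | 91% | 87% | 87% | 100% | 100% | 53% | 93% | 100% | 93% | 93% | 100% |
| 9 | freshmeat | 99% | 100% | 100% | 100% | 100% | 93% | 100% | 100% | 100% | 100% | 100% |
| 10 | ualbertabot | 99% | 93% | 100% | 100% | 100% | 100% | 100% | 93% | 100% | 100% | 100% |
| | overall | 95.63% | 96% | 97% | 97% | 98% | 90% | 94% | 98% | 97% | 94% | 96% |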

A solid wall of blue, but with a few gouges. The lower results versus WillyT on Python and DaQin on Longinus probably represent weaknesses exposed by specific game events that these players tend to bring about on these maps. The weaknesses are not visible in the overall chart, only here where broken down by opponent. The weaknesses show up in only a few cells, but the underlying mistakes might occur in many games. Maybe these opponents only happened to exploit them on these maps.

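| # | bananabrain | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | stardust | 16% | 7% | 7% | 20% | 20% | 33% | 40% | 0% | 13% | 7% | 13% |
| 3 | dragon | 76% | 93% | 73% | 67% | 80% | 87% | 73% | 60% | 73% | 80% | 73% |
| 4 | steamhammer | 83% | 80% | 87% | 80% | 100% | 80% | 80% | 80% | 87% | 80% | 80% |
| 5 | mcrave | 83% | 67% | 80% | 93% | 80% | 100% | 80% | 73% | 80% | 93% | 80% |
| 6 | willyt | 93% | 93% | 93% | 93% | 100% | 93% | 87% | 93% | 87% | 100% | 87% |
| 7 | microwave | 86% | 87% | 100% | 80% | 87% | 87% | 73% | 93% | 100% | 67% | 87% |
| 8 | daqin | 90% | 87% | 100% | 93% | 80% | 93% | 80% | 73% | 93% | 100% | 100% |
| 9 | freshmeat | 96% | 100% | 100% | 100% | 87% | 87% | 93% | 100% | 100% | 93% | 100% |
| 10 | ualbertabot | 95% | 93% | 93% | 100% | 87% | 87% | 100% | 93% | 100% | 100% | 93% |
| | overall | 79.70% | 79% | 81% | 81% | 80% | 83% | 79% | 74% | 81% | 80% | 79% |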

And this is a blue wall with sharp stuff on top, staining the top course of bricks with blood.

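| # | dragon | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | stardust | 2% | 13% | 0% | 0% | 0% | 0% | 7% | 0% | 0% | 0% | 0% |
| 2 | bananabrain | 24% | 7% | 27% | 33% | 20% | 13% | 27% | 40% | 27% | 20% | 27% |
| 4 | steamhammer | 37% | 53% | 47% | 27% | 40% | 47% | 13% | 53% | 33% | 27% | 33% |
| 5 | mcrave | 67% | 53% | 27% | 53% | 80% | 73% | 87% | 73% | 80% | 80% | 67% |
| 6 | willyt | 96% | 93% | 93% | 100% | 87% | 100% | 93% | 93% | 100% | 100% | 100% |
| 7 | microwave | 66% | 47% | 80% | 60% | 93% | 60% | 80% | 67% | 73% | 53% | 47% |
| 8 | daqin | 47% | 40% | 40% | 40% | 27% | 40% | 47% | 67% | 33% | 73% | 60% |
| 9 | freshmeat | 39% | 47% | 40% | 60% | 33% | 40% | 47% | 27% | 20% | 27% | 47% |
| 10 | ualbertabot | 83% | 100% | 73% | 93% | 67% | 80% | 93% | 87% | 87% | 67% | 80% |
| | overall | 51.19% | 50% | 47% | 52% | 50% | 50% | 55% | 56% | 50% | 50% | 51% |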

Dragon’s results, as last year, are inconsistent across maps. Again, it doesn’t show in the averages across the bottom. Actually, comparing with other bots, it doesn’t seem much different. Most had extra good and extra bad maps against some opponents.

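| # | steamhammer | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | stardust | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| 2 | bananabrain | 17% | 20% | 13% | 20% | 0% | 20% | 20% | 20% | 13% | 20% | 20% |
| 3 | dragon | 63% | 47% | 53% | 73% | 60% | 53% | 87% | 47% | 67% | 73% | 67% |
| 5 | mcrave | 54% | 73% | 60% | 53% | 47% | 47% | 73% | 27% | 60% | 40% | 60% |
| 6 | willyt | 56% | 80% | 67% | 60% | 73% | 40% | 40% | 60% | 53% | 27% | 60% |
| 7 | microwave | 73% | 80% | 87% | 67% | 73% | 73% | 53% | 67% | 67% | 93% | 67% |
| 8 | daqin | 27% | 13% | 53% | 13% | 20% | 7% | 27% | 40% | 20% | 47% | 27% |
| 9 | freshmeat | 68% | 60% | 73% | 67% | 80% | 67% | 60% | 93% | 73% | 60% | 47% |
| 10 | ualbertabot | 92% | 93% | 100% | 93% | 100% | 87% | 100% | 80% | 93% | 87% | 93% |
| | overall | 49.78% | 51% | 56% | 49% | 50% | 44% | 51% | 48% | 49% | 50% | 49% |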

The inconsistent results across maps may mean that bots are weak at adjusting their strategies to fit the maps. Steamhammer makes an attempt, but with 10 maps, it would take a very long tournament to gather the data to decide well. This is one of the issues that the opening timing data—the project I chose to delay—would address. It would at least help on BASIL maps, where there are enough games.

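| # | mcrave | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | stardust | 5% | 0% | 7% | 7% | 0% | 0% | 0% | 13% | 7% | 0% | 20% |
| 2 | bananabrain | 17% | 33% | 20% | 7% | 20% | 0% | 20% | 27% | 20% | 7% | 20% |
| 3 | dragon | 33% | 47% | 73% | 47% | 20% | 27% | 13% | 27% | 20% | 20% | 33% |
| 4 | steamhammer | 46% | 27% | 40% | 47% | 53% | 53% | 27% | 73% | 40% | 60% | 40% |
| 6 | willyt | 32% | 47% | 40% | 33% | 20% | 27% | 13% | 27% | 33% | 60% | 20% |
| 7 | microwave | 60% | 40% | 47% | 40% | 67% | 67% | 67% | 73% | 73% | 67% | 60% |
| 8 | daqin | 79% | 87% | 87% | 73% | 87% | 100% | 80% | 60% | 73% | 67% | 80% |
| 9 | freshmeat | 65% | 53% | 47% | 80% | 60% | 60% | 60% | 67% | 60% | 73% | 93% |
| 10 | ualbertabot | 37% | 73% | 67% | 40% | 47% | 7% | 33% | 13% | 47% | 40% | 7% |
| | overall | 41.70% | 45% | 47% | 41% | 41% | 38% | 35% | 42% | 41% | 44% | 41% |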

As an example of the uninterpretability of the data, why did McRave do especially well against Dragon on Heartbreak Ridge? Is it because it was a 2-player map? No, the other 2-player maps Destination and Polaris Rhapsody do not agree. Was it because the map is flat, without a ramp? No, Dragon crushed it on Longinus and Empire of the Sun. Was it because of the short rush distance? I don’t think that matches McRave’s play style. It might be because Dragon makes specific mistakes in building placement or tactics, which McRave’s play is lucky enough to exploit on Heartbreak Ridge. The multiple paths through the center of the map might confuse Dragon into splitting its forces. To know for sure, we have to examine the games.

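| # | willyt | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | stardust | 5% | 0% | 0% | 0% | 0% | 7% | 0% | 0% | 0% | 40% | 0% |
| 2 | bananabrain | 7% | 7% | 7% | 7% | 0% | 7% | 13% | 7% | 13% | 0% | 13% |
| 3 | dragon | 4% | 7% | 7% | 0% | 13% | 0% | 7% | 7% | 0% | 0% | 0% |
| 4 | steamhammer | 44% | 20% | 33% | 40% | 27% | 60% | 60% | 40% | 47% | 73% | 40% |
| 5 | mcrave | 68% | 53% | 60% | 67% | 80% | 73% | 87% | 73% | 67% | 40% | 80% |
| 7 | microwave | 67% | 60% | 67% | 87% | 67% | 53% | 73% | 73% | 67% | 73% | 53% |
| 8 | daqin | 38% | 40% | 33% | 40% | 20% | 33% | 47% | 27% | 47% | 53% | 40% |
| 9 | freshmeat | 68% | 80% | 67% | 73% | 60% | 40% | 93% | 60% | 73% | 73% | 60% |
| 10 | ualbertabot | 69% | 79% | 73% | 67% | 60% | 47% | 80% | 53% | 71% | 86% | 73% |
| | overall | 41.05% | 38% | 39% | 42% | 36% | 36% | 51% | 38% | 43% | 49% | 40% |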

For bot authors, I think it’s likely to be more useful to look at weaknesses than strengths. The weaknesses with the greatest contrast with the bot’s other results against the same opponent may be worth figuring out. For WillyT, that is the 20% score versus Steamhammer on Destination, a map where the natural should be easy to defend thanks to the double bridges. The weak result might represent a systematic mistake, though of course it could also be something very specific to the map and opponent.
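One simple way to read "contrast" is the gap between a map's win rate and the overall rate against the same opponent. The sketch below applies that to WillyT's row versus Steamhammer, with the numbers copied from the WillyT table. The gap measure and the function name `gaps_below_overall` are assumptions for illustration. Other measures, such as a ratio, could rank the cells differently.

```python
# WillyT's win rates (percent) versus Steamhammer, copied from the WillyT table.
MAPS = ["Destin", "Heartb", "Polari", "Aztec", "Longin",
        "Circui", "Empire", "Fighti", "Python", "Roadki"]
OVERALL = 44
BY_MAP = [20, 33, 40, 27, 60, 60, 40, 47, 73, 40]

def gaps_below_overall(overall, by_map, maps):
    """Return (map, gap) pairs, most negative gap (weakest map) first."""
    gaps = [(name, rate - overall) for name, rate in zip(maps, by_map)]
    return sorted(gaps, key=lambda pair: pair[1])

ranked = gaps_below_overall(OVERALL, BY_MAP, MAPS)
for name, gap in ranked[:3]:
    print(f"{name}: {gap:+d} points versus overall")
```

On this row the Destination cell comes out weakest, 24 points below the 44% overall. With 15 games per cell, a gap like this only says which games to watch first. It doesn't prove a weakness.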

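| # | microwave | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | stardust | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| 2 | bananabrain | 14% | 13% | 0% | 20% | 13% | 13% | 27% | 7% | 0% | 33% | 13% |
| 3 | dragon | 34% | 53% | 20% | 40% | 7% | 40% | 20% | 33% | 27% | 47% | 53% |
| 4 | steamhammer | 27% | 20% | 13% | 33% | 27% | 27% | 47% | 33% | 33% | 7% | 33% |
| 5 | mcrave | 40% | 60% | 53% | 60% | 33% | 33% | 33% | 27% | 27% | 33% | 40% |
| 6 | willyt | 33% | 40% | 33% | 13% | 33% | 47% | 27% | 27% | 33% | 27% | 47% |
| 8 | daqin | 81% | 87% | 100% | 67% | 93% | 80% | 60% | 67% | 87% | 67% | 100% |
| 9 | freshmeat | 83% | 73% | 73% | 73% | 80% | 87% | 87% | 80% | 100% | 93% | 80% |
| 10 | ualbertabot | 55% | 67% | 73% | 60% | 40% | 33% | 73% | 67% | 40% | 57% | 40% |
| | overall | 40.70% | 46% | 41% | 41% | 36% | 40% | 41% | 38% | 39% | 40% | 45% |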

Strong and weak results could also be just luck, statistical fluctuations. It’s safe to promise that some seemingly meaningful numbers... aren’t, because they’re based on only 15 games.
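A minimal sketch of how often luck alone produces a striking cell. The scenario is hypothetical: a bot whose true win rate is exactly 50% against every opponent on every map, with independent games. The code asks how often a 15-game cell would still read 20% or lower, or 80% or higher. The thresholds and the helper name `prob_at_most` are assumptions chosen for illustration.

```python
from math import comb

def prob_at_most(k, n, p):
    """Binomial probability of at most k wins in n independent games."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

GAMES = 15        # games per cell
TRUE_RATE = 0.5   # hypothetical: the bot is truly even everywhere
CELLS = 90        # 9 opponents x 10 maps in one bot's table

# 3 wins or fewer reads as 20% or lower; 12 wins or more reads as 80% or higher.
low_tail = prob_at_most(3, GAMES, TRUE_RATE)
high_tail = 1 - prob_at_most(11, GAMES, TRUE_RATE)
extreme = low_tail + high_tail

print(f"chance that one cell looks extreme: {extreme:.1%}")
print(f"expected extreme cells out of {CELLS}: {extreme * CELLS:.1f}")
```

Under these assumptions about 3.5% of cells look extreme by chance, which is roughly 3 cells in a 90-cell table. Real matchups are rarely even, so the true figure for any given table will differ.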

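| # | daqin | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | stardust | 9% | 13% | 13% | 0% | 0% | 47% | 7% | 0% | 7% | 7% | 0% |
| 2 | bananabrain | 10% | 13% | 0% | 7% | 20% | 7% | 20% | 27% | 7% | 0% | 0% |
| 3 | dragon | 53% | 60% | 60% | 60% | 73% | 60% | 53% | 33% | 67% | 27% | 40% |
| 4 | steamhammer | 73% | 87% | 47% | 87% | 80% | 93% | 73% | 60% | 80% | 53% | 73% |
| 5 | mcrave | 21% | 13% | 13% | 27% | 13% | 0% | 20% | 40% | 27% | 33% | 20% |
| 6 | willyt | 62% | 60% | 67% | 60% | 80% | 67% | 53% | 73% | 53% | 47% | 60% |
| 7 | microwave | 19% | 13% | 0% | 33% | 7% | 20% | 40% | 33% | 13% | 33% | 0% |
| 9 | freshmeat | 31% | 27% | 47% | 33% | 40% | 40% | 33% | 0% | 27% | 13% | 47% |
| 10 | ualbertabot | 78% | 80% | 80% | 73% | 67% | 73% | 100% | 80% | 87% | 67% | 73% |
| | overall | 39.63% | 41% | 36% | 42% | 42% | 45% | 44% | 39% | 41% | 31% | 35% |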

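| # | freshmeat | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | stardust | 1% | 0% | 0% | 0% | 0% | 7% | 0% | 0% | 0% | 0% | 0% |
| 2 | bananabrain | 4% | 0% | 0% | 0% | 13% | 13% | 7% | 0% | 0% | 7% | 0% |
| 3 | dragon | 61% | 53% | 60% | 40% | 67% | 60% | 53% | 73% | 80% | 73% | 53% |
| 4 | steamhammer | 32% | 40% | 27% | 33% | 20% | 33% | 40% | 7% | 27% | 40% | 53% |
| 5 | mcrave | 35% | 47% | 53% | 20% | 40% | 40% | 40% | 33% | 40% | 27% | 7% |
| 6 | willyt | 32% | 20% | 33% | 27% | 40% | 60% | 7% | 40% | 27% | 27% | 40% |
| 7 | microwave | 17% | 27% | 27% | 27% | 20% | 13% | 13% | 20% | 0% | 7% | 20% |
| 8 | daqin | 69% | 73% | 53% | 67% | 60% | 60% | 67% | 100% | 73% | 87% | 53% |
| 10 | ualbertabot | 52% | 21% | 67% | 80% | 50% | 43% | 64% | 57% | 33% | 47% | 53% |
| | overall | 33.61% | 31% | 36% | 33% | 34% | 37% | 32% | 37% | 31% | 35% | 31% |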

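| # | ualbertabot | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | stardust | 1% | 7% | 0% | 0% | 0% | 0% | 0% | 7% | 0% | 0% | 0% |
| 2 | bananabrain | 5% | 7% | 7% | 0% | 13% | 13% | 0% | 7% | 0% | 0% | 7% |
| 3 | dragon | 17% | 0% | 27% | 7% | 33% | 20% | 7% | 13% | 13% | 33% | 20% |
| 4 | steamhammer | 8% | 7% | 0% | 7% | 0% | 13% | 0% | 20% | 7% | 13% | 7% |
| 5 | mcrave | 63% | 27% | 33% | 60% | 53% | 93% | 67% | 87% | 53% | 60% | 93% |
| 6 | willyt | 31% | 21% | 27% | 33% | 40% | 53% | 20% | 47% | 29% | 14% | 27% |
| 7 | microwave | 45% | 33% | 27% | 40% | 60% | 67% | 27% | 33% | 60% | 43% | 60% |
| 8 | daqin | 22% | 20% | 20% | 27% | 33% | 27% | 0% | 20% | 13% | 33% | 27% |
| 9 | freshmeat | 48% | 79% | 33% | 20% | 50% | 57% | 36% | 43% | 67% | 53% | 47% |
| | overall | 26.70% | 22% | 19% | 22% | 31% | 38% | 17% | 31% | 27% | 28% | 32% |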

Next: I want to take a day to show off Steamhammer skills before I get back to AIIDE analysis.

Comments

Dave Churchill on :

Great work as always Jay. If you want to submit a pull request with modifications to the results parser I can include these in the official tournament results next year. But then I guess you may not have this content for your blog :) Up to you!

Jay Scott on :

Hmm. My analysis ware with parser is an independent development, written in perl. It requires a bit of hand configuration every time, because the detailed results file doesn’t include bot race. I think you probably don’t want to run it as part of your pipeline.

Jay Scott on :

And the work is not that great. I got the css for the # column wrong.... Let me fix that.
