AIIDE 2021 - results by map
This post covers how each bot performed on each map. I wrote up the map pool last year. In order across the top of each table, there are 3 maps with 2 starting positions, 2 with 3, and 5 with 4. The tables are full of information, but I’ve learned that it is hard to extract insights from them; to find out what strengths and weaknesses the data points to, you usually have to watch the games. The value of the tables lies in telling authors which games to watch to identify weaknesses.
For reference, here’s a copy of the map table from yesterday, the summary of how well bots did overall on each map.
# | bot | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | stardust | 95.63% | 96% | 97% | 97% | 98% | 90% | 94% | 98% | 97% | 94% | 96% |
2 | bananabrain | 79.70% | 79% | 81% | 81% | 80% | 83% | 79% | 74% | 81% | 80% | 79% |
3 | dragon | 51.19% | 50% | 47% | 52% | 50% | 50% | 55% | 56% | 50% | 50% | 51% |
4 | steamhammer | 49.78% | 51% | 56% | 49% | 50% | 44% | 51% | 48% | 49% | 50% | 49% |
5 | mcrave | 41.70% | 45% | 47% | 41% | 41% | 38% | 35% | 42% | 41% | 44% | 41% |
6 | willyt | 41.05% | 38% | 39% | 42% | 36% | 36% | 51% | 38% | 43% | 49% | 40% |
7 | microwave | 40.70% | 46% | 41% | 41% | 36% | 40% | 41% | 38% | 39% | 40% | 45% |
8 | daqin | 39.63% | 41% | 36% | 42% | 42% | 45% | 44% | 39% | 41% | 31% | 35% |
9 | freshmeat | 33.61% | 31% | 36% | 33% | 34% | 37% | 32% | 37% | 31% | 35% | 31% |
10 | ualbertabot | 26.70% | 22% | 19% | 22% | 31% | 38% | 17% | 31% | 27% | 28% | 32% |
Each bot gets its own table showing how well it performed against each opponent on each map. Each cell represents 15 games, occasionally 14 if not all games completed, so expect noise in the numbers.
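Because each cell covers only 14 or 15 games, a percentage can usually be turned back into a win count. Here is a minimal sketch of that arithmetic, assuming the table values are win rates rounded to the nearest whole percent (the rounding rule is my assumption, and `candidate_records` is a hypothetical helper name).

```python
def candidate_records(pct, game_counts=(15, 14)):
    """Return every (wins, games) pair whose rounded win rate equals pct.

    Assumes the tables round 100 * wins / games to the nearest integer.
    """
    return [
        (wins, games)
        for games in game_counts
        for wins in range(games + 1)
        if round(100 * wins / games) == pct
    ]


if __name__ == "__main__":
    # A few percentages that appear in the tables.
    for pct in (87, 79, 93):
        print(pct, candidate_records(pct))
```

Under that assumption, a value like 79% can only come from a 14-game cell, while 93% is ambiguous between the two game counts.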
# | stardust | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | bananabrain | 84% | 93% | 93% | 80% | 80% | 67% | 60% | 100% | 87% | 93% | 87% |
3 | dragon | 98% | 87% | 100% | 100% | 100% | 100% | 93% | 100% | 100% | 100% | 100% |
4 | steamhammer | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
5 | mcrave | 95% | 100% | 93% | 93% | 100% | 100% | 100% | 87% | 93% | 100% | 80% |
6 | willyt | 95% | 100% | 100% | 100% | 100% | 93% | 100% | 100% | 100% | 60% | 100% |
7 | microwave | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
8 | daqin | 91% | 87% | 87% | 100% | 100% | 53% | 93% | 100% | 93% | 93% | 100% |
9 | freshmeat | 99% | 100% | 100% | 100% | 100% | 93% | 100% | 100% | 100% | 100% | 100% |
10 | ualbertabot | 99% | 93% | 100% | 100% | 100% | 100% | 100% | 93% | 100% | 100% | 100% |
| | overall | 95.63% | 96% | 97% | 97% | 98% | 90% | 94% | 98% | 97% | 94% | 96% |
A solid wall of blue, but with a few gouges. The lower results versus WillyT on Python and DaQin on Longinus probably represent weaknesses exposed by specific game events that these players tend to bring about on these maps. The weaknesses are not visible in the overall chart, only here where the results are broken down by opponent. They show up in only a few cells, but the underlying flaws might occur in many games; it may be that only these opponents happened to exploit them.
# | bananabrain | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | stardust | 16% | 7% | 7% | 20% | 20% | 33% | 40% | 0% | 13% | 7% | 13% |
3 | dragon | 76% | 93% | 73% | 67% | 80% | 87% | 73% | 60% | 73% | 80% | 73% |
4 | steamhammer | 83% | 80% | 87% | 80% | 100% | 80% | 80% | 80% | 87% | 80% | 80% |
5 | mcrave | 83% | 67% | 80% | 93% | 80% | 100% | 80% | 73% | 80% | 93% | 80% |
6 | willyt | 93% | 93% | 93% | 93% | 100% | 93% | 87% | 93% | 87% | 100% | 87% |
7 | microwave | 86% | 87% | 100% | 80% | 87% | 87% | 73% | 93% | 100% | 67% | 87% |
8 | daqin | 90% | 87% | 100% | 93% | 80% | 93% | 80% | 73% | 93% | 100% | 100% |
9 | freshmeat | 96% | 100% | 100% | 100% | 87% | 87% | 93% | 100% | 100% | 93% | 100% |
10 | ualbertabot | 95% | 93% | 93% | 100% | 87% | 87% | 100% | 93% | 100% | 100% | 93% |
| | overall | 79.70% | 79% | 81% | 81% | 80% | 83% | 79% | 74% | 81% | 80% | 79% |
And this is a blue wall with sharp stuff on top, staining the top course of bricks with blood.
# | dragon | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | stardust | 2% | 13% | 0% | 0% | 0% | 0% | 7% | 0% | 0% | 0% | 0% |
2 | bananabrain | 24% | 7% | 27% | 33% | 20% | 13% | 27% | 40% | 27% | 20% | 27% |
4 | steamhammer | 37% | 53% | 47% | 27% | 40% | 47% | 13% | 53% | 33% | 27% | 33% |
5 | mcrave | 67% | 53% | 27% | 53% | 80% | 73% | 87% | 73% | 80% | 80% | 67% |
6 | willyt | 96% | 93% | 93% | 100% | 87% | 100% | 93% | 93% | 100% | 100% | 100% |
7 | microwave | 66% | 47% | 80% | 60% | 93% | 60% | 80% | 67% | 73% | 53% | 47% |
8 | daqin | 47% | 40% | 40% | 40% | 27% | 40% | 47% | 67% | 33% | 73% | 60% |
9 | freshmeat | 39% | 47% | 40% | 60% | 33% | 40% | 47% | 27% | 20% | 27% | 47% |
10 | ualbertabot | 83% | 100% | 73% | 93% | 67% | 80% | 93% | 87% | 87% | 67% | 80% |
| | overall | 51.19% | 50% | 47% | 52% | 50% | 50% | 55% | 56% | 50% | 50% | 51% |
Dragon’s results, as last year, are inconsistent across maps. Again, it doesn’t show in the averages across the bottom. Actually, comparing with other bots, it doesn’t seem much different. Most had extra good and extra bad maps against some opponents.
# | steamhammer | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | stardust | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
2 | bananabrain | 17% | 20% | 13% | 20% | 0% | 20% | 20% | 20% | 13% | 20% | 20% |
3 | dragon | 63% | 47% | 53% | 73% | 60% | 53% | 87% | 47% | 67% | 73% | 67% |
5 | mcrave | 54% | 73% | 60% | 53% | 47% | 47% | 73% | 27% | 60% | 40% | 60% |
6 | willyt | 56% | 80% | 67% | 60% | 73% | 40% | 40% | 60% | 53% | 27% | 60% |
7 | microwave | 73% | 80% | 87% | 67% | 73% | 73% | 53% | 67% | 67% | 93% | 67% |
8 | daqin | 27% | 13% | 53% | 13% | 20% | 7% | 27% | 40% | 20% | 47% | 27% |
9 | freshmeat | 68% | 60% | 73% | 67% | 80% | 67% | 60% | 93% | 73% | 60% | 47% |
10 | ualbertabot | 92% | 93% | 100% | 93% | 100% | 87% | 100% | 80% | 93% | 87% | 93% |
| | overall | 49.78% | 51% | 56% | 49% | 50% | 44% | 51% | 48% | 49% | 50% | 49% |
The inconsistent results across maps may mean that bots are weak at adjusting their strategies to fit the maps. Steamhammer makes an attempt, but with 10 maps, it would take a very long tournament to gather the data to decide well. This is one of the issues that the opening timing data—the project I chose to delay—would address. It would at least help on BASIL maps, where there are enough games.
# | mcrave | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | stardust | 5% | 0% | 7% | 7% | 0% | 0% | 0% | 13% | 7% | 0% | 20% |
2 | bananabrain | 17% | 33% | 20% | 7% | 20% | 0% | 20% | 27% | 20% | 7% | 20% |
3 | dragon | 33% | 47% | 73% | 47% | 20% | 27% | 13% | 27% | 20% | 20% | 33% |
4 | steamhammer | 46% | 27% | 40% | 47% | 53% | 53% | 27% | 73% | 40% | 60% | 40% |
6 | willyt | 32% | 47% | 40% | 33% | 20% | 27% | 13% | 27% | 33% | 60% | 20% |
7 | microwave | 60% | 40% | 47% | 40% | 67% | 67% | 67% | 73% | 73% | 67% | 60% |
8 | daqin | 79% | 87% | 87% | 73% | 87% | 100% | 80% | 60% | 73% | 67% | 80% |
9 | freshmeat | 65% | 53% | 47% | 80% | 60% | 60% | 60% | 67% | 60% | 73% | 93% |
10 | ualbertabot | 37% | 73% | 67% | 40% | 47% | 7% | 33% | 13% | 47% | 40% | 7% |
| | overall | 41.70% | 45% | 47% | 41% | 41% | 38% | 35% | 42% | 41% | 44% | 41% |
As an example of the uninterpretability of the data, why did McRave do especially well against Dragon on Heartbreak Ridge? Is it because it was a 2-player map? No, the other 2-player maps Destination and Polaris Rhapsody do not agree. Was it because the map is flat, without a ramp? No, Dragon crushed it on Longinus and Empire of the Sun. Was it because of the short rush distance? I don’t think that matches McRave’s play style. It might be because Dragon makes specific mistakes in building placement or tactics, which McRave’s play is lucky enough to exploit on Heartbreak Ridge. The multiple paths through the center of the map might confuse Dragon into splitting its forces. To know for sure, we have to examine the games.
# | willyt | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | stardust | 5% | 0% | 0% | 0% | 0% | 7% | 0% | 0% | 0% | 40% | 0% |
2 | bananabrain | 7% | 7% | 7% | 7% | 0% | 7% | 13% | 7% | 13% | 0% | 13% |
3 | dragon | 4% | 7% | 7% | 0% | 13% | 0% | 7% | 7% | 0% | 0% | 0% |
4 | steamhammer | 44% | 20% | 33% | 40% | 27% | 60% | 60% | 40% | 47% | 73% | 40% |
5 | mcrave | 68% | 53% | 60% | 67% | 80% | 73% | 87% | 73% | 67% | 40% | 80% |
7 | microwave | 67% | 60% | 67% | 87% | 67% | 53% | 73% | 73% | 67% | 73% | 53% |
8 | daqin | 38% | 40% | 33% | 40% | 20% | 33% | 47% | 27% | 47% | 53% | 40% |
9 | freshmeat | 68% | 80% | 67% | 73% | 60% | 40% | 93% | 60% | 73% | 73% | 60% |
10 | ualbertabot | 69% | 79% | 73% | 67% | 60% | 47% | 80% | 53% | 71% | 86% | 73% |
| | overall | 41.05% | 38% | 39% | 42% | 36% | 36% | 51% | 38% | 43% | 49% | 40% |
For bot authors, I think it’s likely to be more useful to look at weaknesses than strengths. The weaknesses with the greatest contrast with the bot’s other results against the same opponent may be worth figuring out. For WillyT, that is the 20% score versus Steamhammer on Destination, a map where the natural should be easy to defend thanks to the double bridges. The weak result might represent a systematic mistake, though of course it could also be something very specific to the map and opponent.
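One simple way to put a number on "contrast" is the gap between each map's result and the average of the same row. The sketch below does that for the WillyT versus Steamhammer row from the table above. The gap measure and the helper name `weakest_maps` are my own choices for illustration; other measures, such as ratios, could rank cells differently.

```python
MAPS = ["Destin", "Heartb", "Polari", "Aztec", "Longin",
        "Circui", "Empire", "Fighti", "Python", "Roadki"]

# WillyT versus Steamhammer, win percentages copied from WillyT's table.
WILLYT_VS_STEAMHAMMER = [20, 33, 40, 27, 60, 60, 40, 47, 73, 40]


def weakest_maps(maps, percentages, count=3):
    """Return the `count` maps furthest below the row average.

    Each entry is (gap in percentage points, map name); negative gaps
    mean the bot did worse on that map than its average for this opponent.
    """
    mean = sum(percentages) / len(percentages)
    gaps = [(pct - mean, name) for name, pct in zip(maps, percentages)]
    return sorted(gaps)[:count]


if __name__ == "__main__":
    for gap, name in weakest_maps(MAPS, WILLYT_VS_STEAMHAMMER):
        print(f"{name}: {gap:+.0f} points versus the row average")
```

For this row, Destination comes out lowest, 24 points under the row average, with Aztec next.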
# | microwave | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | stardust | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
2 | bananabrain | 14% | 13% | 0% | 20% | 13% | 13% | 27% | 7% | 0% | 33% | 13% |
3 | dragon | 34% | 53% | 20% | 40% | 7% | 40% | 20% | 33% | 27% | 47% | 53% |
4 | steamhammer | 27% | 20% | 13% | 33% | 27% | 27% | 47% | 33% | 33% | 7% | 33% |
5 | mcrave | 40% | 60% | 53% | 60% | 33% | 33% | 33% | 27% | 27% | 33% | 40% |
6 | willyt | 33% | 40% | 33% | 13% | 33% | 47% | 27% | 27% | 33% | 27% | 47% |
8 | daqin | 81% | 87% | 100% | 67% | 93% | 80% | 60% | 67% | 87% | 67% | 100% |
9 | freshmeat | 83% | 73% | 73% | 73% | 80% | 87% | 87% | 80% | 100% | 93% | 80% |
10 | ualbertabot | 55% | 67% | 73% | 60% | 40% | 33% | 73% | 67% | 40% | 57% | 40% |
| | overall | 40.70% | 46% | 41% | 41% | 36% | 40% | 41% | 38% | 39% | 40% | 45% |
Strong and weak results could also be just luck, statistical fluctuations. It’s safe to promise that some seemingly meaningful numbers... aren’t, because they’re based on only 15 games.
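To get a feel for how much noise 15 games leaves, here is a rough sketch that computes a Wilson score interval for a single cell. The Wilson interval is a standard choice for small samples, but it is my choice here, not something the tournament reports, and it treats games as independent trials, which learning bots do not strictly satisfy.

```python
import math


def wilson_interval(wins, games, z=1.96):
    """Wilson score interval for a win rate; z=1.96 gives roughly 95% coverage."""
    if games <= 0:
        raise ValueError("games must be positive")
    p_hat = wins / games
    denom = 1 + z * z / games
    center = (p_hat + z * z / (2 * games)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / games + z * z / (4 * games * games)
    ) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # A 20% cell corresponds to 3 wins in 15 games.
    low, high = wilson_interval(3, 15)
    print(f"3 wins in 15 games: {low:.0%} to {high:.0%}")
```

For a 20% cell (3 wins in 15 games), the interval runs from roughly 7% to 45%, wide enough that many of the apparent map-to-map differences in a row could be chance.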
# | daqin | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | stardust | 9% | 13% | 13% | 0% | 0% | 47% | 7% | 0% | 7% | 7% | 0% |
2 | bananabrain | 10% | 13% | 0% | 7% | 20% | 7% | 20% | 27% | 7% | 0% | 0% |
3 | dragon | 53% | 60% | 60% | 60% | 73% | 60% | 53% | 33% | 67% | 27% | 40% |
4 | steamhammer | 73% | 87% | 47% | 87% | 80% | 93% | 73% | 60% | 80% | 53% | 73% |
5 | mcrave | 21% | 13% | 13% | 27% | 13% | 0% | 20% | 40% | 27% | 33% | 20% |
6 | willyt | 62% | 60% | 67% | 60% | 80% | 67% | 53% | 73% | 53% | 47% | 60% |
7 | microwave | 19% | 13% | 0% | 33% | 7% | 20% | 40% | 33% | 13% | 33% | 0% |
9 | freshmeat | 31% | 27% | 47% | 33% | 40% | 40% | 33% | 0% | 27% | 13% | 47% |
10 | ualbertabot | 78% | 80% | 80% | 73% | 67% | 73% | 100% | 80% | 87% | 67% | 73% |
| | overall | 39.63% | 41% | 36% | 42% | 42% | 45% | 44% | 39% | 41% | 31% | 35% |
# | freshmeat | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | stardust | 1% | 0% | 0% | 0% | 0% | 7% | 0% | 0% | 0% | 0% | 0% |
2 | bananabrain | 4% | 0% | 0% | 0% | 13% | 13% | 7% | 0% | 0% | 7% | 0% |
3 | dragon | 61% | 53% | 60% | 40% | 67% | 60% | 53% | 73% | 80% | 73% | 53% |
4 | steamhammer | 32% | 40% | 27% | 33% | 20% | 33% | 40% | 7% | 27% | 40% | 53% |
5 | mcrave | 35% | 47% | 53% | 20% | 40% | 40% | 40% | 33% | 40% | 27% | 7% |
6 | willyt | 32% | 20% | 33% | 27% | 40% | 60% | 7% | 40% | 27% | 27% | 40% |
7 | microwave | 17% | 27% | 27% | 27% | 20% | 13% | 13% | 20% | 0% | 7% | 20% |
8 | daqin | 69% | 73% | 53% | 67% | 60% | 60% | 67% | 100% | 73% | 87% | 53% |
10 | ualbertabot | 52% | 21% | 67% | 80% | 50% | 43% | 64% | 57% | 33% | 47% | 53% |
| | overall | 33.61% | 31% | 36% | 33% | 34% | 37% | 32% | 37% | 31% | 35% | 31% |
# | ualbertabot | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | stardust | 1% | 7% | 0% | 0% | 0% | 0% | 0% | 7% | 0% | 0% | 0% |
2 | bananabrain | 5% | 7% | 7% | 0% | 13% | 13% | 0% | 7% | 0% | 0% | 7% |
3 | dragon | 17% | 0% | 27% | 7% | 33% | 20% | 7% | 13% | 13% | 33% | 20% |
4 | steamhammer | 8% | 7% | 0% | 7% | 0% | 13% | 0% | 20% | 7% | 13% | 7% |
5 | mcrave | 63% | 27% | 33% | 60% | 53% | 93% | 67% | 87% | 53% | 60% | 93% |
6 | willyt | 31% | 21% | 27% | 33% | 40% | 53% | 20% | 47% | 29% | 14% | 27% |
7 | microwave | 45% | 33% | 27% | 40% | 60% | 67% | 27% | 33% | 60% | 43% | 60% |
8 | daqin | 22% | 20% | 20% | 27% | 33% | 27% | 0% | 20% | 13% | 33% | 27% |
9 | freshmeat | 48% | 79% | 33% | 20% | 50% | 57% | 36% | 43% | 67% | 53% | 47% |
| | overall | 26.70% | 22% | 19% | 22% | 31% | 38% | 17% | 31% | 27% | 28% | 32% |
Next: I want to take a day to show off Steamhammer skills before I get back to AIIDE analysis.