
AIIDE 2021 - BananaBrain versus Dragon

BananaBrain and Dragon both recorded their own opening builds for all 157 games played, so I can align their learning files and see how their strategies matched up against each other. BananaBrain also recorded its representation of what the opponent played, so I can compare its idea of Dragon’s build with Dragon’s own idea. I first did this last year. Dragon is carried over from last year unchanged, while BananaBrain is much stronger now.

The win rates and coloring are from the point of view of BananaBrain. Blue is good for BananaBrain and red is good for Dragon.

bananabrain strategies versus dragon strategies

BananaBrain opening | overall | 1rax fe | 2rax bio | 2rax mech | bio | dirty worker rush | mass vulture | siege expand
overall | 117/157 75% | 8/8 100% | 19/30 63% | 8/8 100% | 12/13 92% | 8/8 100% | 27/36 75% | 35/54 65%
PvT_10/12gate | 34/48 71% | 1/1 100% | 9/17 53% | 1/1 100% | 2/2 100% | 2/2 100% | 3/5 60% | 16/20 80%
PvT_1gatedtexpo | 0/1 0% | - | 0/1 0% | - | - | - | - | -
PvT_28nexus | 3/6 50% | - | 1/2 50% | - | - | - | 1/1 100% | 1/3 33%
PvT_2gaterngexpo | 2/4 50% | - | 0/1 0% | - | - | - | 1/1 100% | 1/2 50%
PvT_32nexus | 0/1 0% | - | - | - | - | - | - | 0/1 0%
PvT_9/9gate | 78/96 81% | 7/7 100% | 9/9 100% | 7/7 100% | 10/11 91% | 6/6 100% | 22/29 76% | 17/27 63%
PvT_9/9proxygate | 0/1 0% | - | - | - | - | - | - | 0/1 0%

dragon as seen by bananabrain

dragon played | # | bananabrain recognized
1rax fe | 8 | 7 T_unknown, 1 T_fastexpand
2rax bio | 30 | 30 T_unknown
2rax mech | 8 | 8 T_unknown
bio | 13 | 13 T_unknown
dirty worker rush | 8 | 8 T_unknown
mass vulture | 36 | 21 T_1fac, 14 T_unknown, 1 T_2fac
siege expand | 54 | 38 T_1fac, 16 T_unknown

Last year this table showed that BananaBrain was weak at recognizing Dragon’s builds, with a lot of unknowns. There are more recognized builds this year, but BananaBrain plays differently so I’m not sure whether BananaBrain has improved at recognition. What is clear is that everything is blue. Recognizing some builds does not seem to have helped BananaBrain; it did well no matter what.

AIIDE 2021 - what Dragon learned

Dragon records for each game only its own build and win/loss, so the information is sparse. It has a total of 7 builds. Dragon is a carryover bot, and I analyzed its game records from AIIDE 2020 last year. Dragon considers that “the opening” is a very brief phase of the game: It quickly adapts to what it sees of the opponent’s play, and the opening build fades out of view. Last year I found that, against many opponents, the choice of opening build made little difference; the game was decided later.


#1 stardust

opening | games | wins | first | last
1rax fe | 24 | 0% | 5 | 156
2rax bio | 35 | 6% | 1 | 153
2rax mech | 18 | 0% | 2 | 146
bio | 20 | 5% | 3 | 144
dirty worker rush | 19 | 0% | 0 | 133
mass vulture | 20 | 0% | 4 | 152
siege expand | 21 | 0% | 6 | 150
7 openings | 157 | 2%


It’s interesting that the only openings to make a dent were “2rax bio” and “bio”. Was Stardust surprised by marines? If Stardust made zealots to get units faster, that may have backfired.


#2 bananabrain

opening | games | wins | first | last
1rax fe | 8 | 0% | 6 | 82
2rax bio | 30 | 37% | 1 | 156
2rax mech | 8 | 0% | 7 | 109
bio | 13 | 8% | 8 | 140
dirty worker rush | 8 | 0% | 5 | 148
mass vulture | 36 | 25% | 2 | 151
siege expand | 54 | 35% | 0 | 145
7 openings | 157 | 25%


Again, the marines were a relatively successful choice against protoss. It’s a surprise. #8 DaQin below is different. Yesterday we saw that BananaBrain liked zealot openings against Dragon, and it’s true that marines with good micro can hold their own against zealots. Maybe BananaBrain liked zealots because they upset Dragon’s tech builds, and Dragon found that marines answered best, so that the two settled into this equilibrium with neither bot able to 100% counter the other. It’s a nice story, at least.


#4 steamhammer

opening | games | wins | first | last
1rax fe | 53 | 45% | 8 | 124
2rax bio | 7 | 14% | 0 | 153
2rax mech | 12 | 25% | 4 | 121
bio | 17 | 29% | 6 | 135
dirty worker rush | 47 | 40% | 2 | 156
mass vulture | 15 | 27% | 10 | 117
siege expand | 6 | 0% | 5 | 122
7 openings | 157 | 36%


Switching between opposite builds like fast expand (“1rax fe”) and worker rush (“dirty worker rush”) is not a bad plan for defeating Steamhammer. I don’t think Dragon did it on purpose, though. Most openings scored about equal.


#5 mcrave

opening | games | wins | first | last
1rax fe | 20 | 80% | 34 | 153
2rax bio | 40 | 65% | 0 | 109
2rax mech | 17 | 65% | 23 | 90
bio | 24 | 62% | 64 | 100
dirty worker rush | 2 | 0% | 20 | 98
mass vulture | 6 | 33% | 63 | 137
siege expand | 46 | 72% | 11 | 154
7 openings | 155 | 66%


Again, most were about equal, with only a couple of exceptions. “1rax fe” was also best against last year’s McRave, even though it played rather differently.


#6 willyt

opening | games | wins | first | last
2rax bio | 1 | 0% | 53 | 53
mass vulture | 154 | 98% | 0 | 154
2 openings | 155 | 97%


Last year Dragon chose “2rax mech” as its build to trample on WillyT, and won a little less with it, 94%, even though this year’s WillyT is substantially stronger. I think Dragon found something that worked and felt no need to experiment any further.


#7 microwave

opening | games | wins | first | last
1rax fe | 24 | 67% | 1 | 120
2rax bio | 28 | 61% | 31 | 152
2rax mech | 66 | 74% | 0 | 149
bio | 26 | 62% | 46 | 147
dirty worker rush | 2 | 0% | 8 | 124
mass vulture | 5 | 40% | 12 | 156
siege expand | 6 | 33% | 43 | 153
7 openings | 157 | 65%



#8 daqin

opening | games | wins | first | last
1rax fe | 98 | 56% | 8 | 156
2rax bio | 6 | 17% | 14 | 104
2rax mech | 11 | 36% | 1 | 105
bio | 13 | 31% | 7 | 136
dirty worker rush | 5 | 0% | 5 | 153
mass vulture | 20 | 50% | 9 | 103
siege expand | 4 | 0% | 0 | 132
7 openings | 157 | 47%


Best was the fast expansion. (“1rax fe” is faster than “siege expand”.) That makes sense against DaQin’s style of play.


#9 freshmeat

opening | games | wins | first | last
1rax fe | 12 | 25% | 4 | 128
2rax bio | 12 | 33% | 18 | 143
2rax mech | 25 | 36% | 0 | 141
bio | 10 | 30% | 9 | 139
dirty worker rush | 34 | 35% | 7 | 155
mass vulture | 49 | 53% | 10 | 156
siege expand | 15 | 27% | 15 | 142
7 openings | 157 | 39%


Mostly about equal again. At some point I’ll look at the games and see how FreshMeat upset Dragon.


#10 ualbertabot

opening | games | wins | first | last
1rax fe | 40 | 92% | 116 | 155
2rax bio | 6 | 67% | 54 | 73
2rax mech | 2 | 50% | 114 | 115
bio | 13 | 77% | 36 | 49
dirty worker rush | 17 | 76% | 50 | 74
mass vulture | 76 | 84% | 0 | 113
siege expand | 2 | 50% | 85 | 110
7 openings | 156 | 83%

AIIDE 2021 - what BananaBrain learned

Here’s my summary of BananaBrain’s learning files. BananaBrain records both its own strategy and the recognized enemy strategy for every game.

#1 stardust

opening | games | wins | first | last
PvP_10/12gate50%9121
PvP_12nexus50%10119
PvP_2gatedt50%3120
PvP_2gatedtexpo1619%0125
PvP_2gatereaver50%1118
PvP_3gaterobo50%13123
PvP_3gatespeedzeal50%5116
PvP_4gategoon2425%12126
PvP_9/9gate50%6122
PvP_9/9proxygate3829%8156
PvP_nzcore1315%4149
PvP_zcore50%7117
PvP_zcorez911%11144
PvP_zzcore1724%2127
14 openings15717%
enemy | games | wins
P_1gatecore650%
P_2gate2619%
P_2gatefast1331%
P_4gategoon10714%
P_cannonturtle10%
P_unknown40%
6 openings15717%


The most successful: Double proxy gates. Stardust plays the same every game, except for reactions to its opponent, so it’s interesting that BananaBrain diagnosed so many different openings. I suspect that they were all, or nearly all, 4 gate goon, and BananaBrain was not always able to scout long enough to see it. I think the variety is what you get when BananaBrain sees only part of the build.


#3 dragon

opening | games | wins | first | last
PvT_10/12gate4871%0156
PvT_1gatedtexpo10%1616
PvT_28nexus650%13119
PvT_2gaterngexpo450%1091
PvT_32nexus10%8989
PvT_9/9gate9681%2118
PvT_9/9proxygate10%9292
7 openings15775%
enemy | games | wins
T_1fac5966%
T_2fac1100%
T_fastexpand1100%
T_unknown9679%
4 openings15775%


The best builds were zealot builds. BananaBrain seems to be especially successful with early zealot pressure.


#4 steamhammer

opening | games | wins | first | last
PvZ_10/12gate367%57
PvZ_1basespeedzeal2186%37157
PvZ_2basespeedzeal978%21149
PvZ_4gate2archon10%3131
PvZ_5gategoon250%2930
PvZ_9/9gate9288%61156
PvZ_9/9proxygate10%5757
PvZ_bisu580%3236
PvZ_neobisu1283%819
PvZ_sairdt367%146148
PvZ_sairgoon10%2020
PvZ_sairreaver367%5860
PvZ_stove580%04
13 openings15883%
enemy | games | wins
Z_10hatch3284%
Z_12hatch5775%
Z_12hatchmain1100%
Z_12pool2100%
Z_4/5pool1100%
Z_9pool2396%
Z_9poolspeed888%
Z_overpool1984%
Z_unknown1580%
9 openings15883%


Again, zealot builds. Steamhammer tried a wide variety of counters, of which 12 hatch worked best. BananaBrain records only the earliest steps of zerg openings, so what BananaBrain calls Z_12hatch could have a range of followups.


#5 mcrave

opening | games | wins | first | last
PvZ_10/12gate5485%17119
PvZ_1basespeedzeal367%5860
PvZ_2basespeedzeal580%15
PvZ_4gate2archon10%6161
PvZ_5gategoon10%6666
PvZ_9/9gate10%66
PvZ_9/9proxygate875%24100
PvZ_bisu580%5357
PvZ_neobisu475%6265
PvZ_sairdt367%1416
PvZ_sairgoon1283%7105
PvZ_sairreaver10%00
PvZ_stove5988%31156
13 openings15782%
enemy | games | wins
Z_12hatch8481%
Z_12pool20%
Z_9pool2378%
Z_overpool4589%
Z_unknown3100%
5 openings15782%


Most things worked against McRave, but especially tech openings. The earliest steps of McRave’s openings are stereotyped, so BananaBrain recognized few choices.


#6 willyt

opening | games | wins | first | last
PvT_10/12gate4493%750
PvT_12nexus683%05
PvT_2gatedt10%66
PvT_32nexus2488%5174
PvT_9/9proxygate7799%80156
PvT_dtdrop250%7879
PvT_stove367%7577
7 openings15793%
enemy | games | wins
T_1fac12100%
T_2rax5595%
T_fastexpand5288%
T_unknown3895%
4 openings15793%


The proxy gates won 76 times out of 77. Ouch.


#7 microwave

opening | games | wins | first | last
PvZ_10/12gate8697%31156
PvZ_1basespeedzeal10%2323
PvZ_2basespeedzeal250%1920
PvZ_4gate2archon250%2425
PvZ_5gategoon3083%3979
PvZ_9/9gate1580%982
PvZ_9/9proxygate250%6364
PvZ_bisu250%78
PvZ_neobisu250%2122
PvZ_sairdt580%04
PvZ_sairgoon580%2630
PvZ_sairreaver250%56
PvZ_stove367%1618
13 openings15787%
enemy | games | wins
Z_10hatch8100%
Z_12hatch3197%
Z_12pool1385%
Z_4/5pool13100%
Z_9pool5879%
Z_9poolspeed6100%
Z_overpool2075%
Z_unknown888%
8 openings15787%


Zealots were best again, though dragoons were good too. I wonder why the economic 10/12 gates were more successful than the fast 9/9 gates? It suggests that Microwave may overdefend, fearing fast zealots, and then not have a strong enough economy to hold off the more efficient zealots. Or the difference may lie in the followup after the zealots; BananaBrain likes to expand quickly.


#8 daqin

opening | games | wins | first | last
PvP_2gatedt1080%037
PvP_2gatedtexpo10%66
PvP_2gatereaver14292%7156
PvP_9/9gate367%3133
PvP_zcore10%2626
5 openings15790%
enemy | games | wins
P_1gatecore6988%
P_4gategoon6891%
P_ffe1100%
P_unknown1989%
4 openings15790%


DaQin was apparently not ready for reavers. Otherwise it did not do badly against a powerful opponent.


#9 freshmeat

opening | games | wins | first | last
PvZ_4gate2archon888%2633
PvZ_9/9gate122100%35156
PvZ_neobisu1486%013
PvZ_sairgoon10%3434
PvZ_stove1283%1425
5 openings15796%
enemy | games | wins
Z_12hatch2785%
Z_12hatchmain2291%
Z_12pool1100%
Z_4/5pool27100%
Z_9pool11100%
Z_overpool3100%
Z_unknown66100%
7 openings15796%



#10 ualbertabot

opening | games | wins | first | last
PvU_10/12gate475%03
PvU_9/9gate10%44
PvU_9/9proxygate580%1014
PvU_nzcore580%59
PvU_zcore14297%15156
5 openings15795%
enemy | games | wins
P_1gatecore19100%
P_2gate1100%
P_2gatefast2584%
P_4gategoon3100%
P_unknown6100%
T_1fac1100%
T_2fac22100%
T_2rax1694%
T_unknown11100%
Z_12hatch26100%
Z_4/5pool2387%
Z_overpool3100%
Z_unknown1100%
13 openings15795%

AIIDE 2021 - Stardust table in minutes and seconds

It occurred to me a little late that many people would find the Stardust data table easier to understand if the frame times were converted to minutes and seconds. So here’s that version. See the previous post from today.
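For the record, the conversion is nothing deep. Brood War on the fastest speed runs at roughly 23.81 frames per second, and dividing by a round 24 reproduces the times below (7579 frames comes out as 5:15, 23319 frames as 16:11). Something like this small helper, my own throwaway sketch rather than code from any bot, is all it takes:

    #include <cstdio>
    #include <string>

    // Convert a Brood War frame count to a minutes:seconds string, using the
    // round figure of 24 frames per game second at the fastest game speed.
    std::string framesToMinSec(int frames)
    {
        const int seconds = frames / 24;
        char buf[16];
        std::snprintf(buf, sizeof buf, "%d:%02d", seconds / 60, seconds % 60);
        return buf;
    }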

firstDarkTemplarCompleted pylonInOurMain firstMutaliskCompleted
opponent games n min median max n min median max n min median max
bananabrain 155 20 5:15 5:29 16:11 0 - - - 0 - - -
dragon 156 0 - - - 0 - - - 0 - - -
steamhammer 158 0 - - - 0 - - - 17 4:59 5:43 7:11
mcrave 157 0 - - - 0 - - - 124 6:17 7:35 11:12
willyt 157 0 - - - 0 - - - 0 - - -
microwave 157 0 - - - 0 - - - 17 5:07 5:55 7:54
daqin 156 126 5:13 5:29 12:36 2 1:53 1:54 1:55 0 - - -
freshmeat 157 0 - - - 0 - - - 1 11:40 11:40 11:40
ualbertabot 157 17 4:19 4:29 4:36 0 - - - 0 - - -

AIIDE 2021 - Stardust’s learning

I investigated how Stardust’s learning works, and what it learned. It’s unusual, so it was worth a close look.

In its learning file of game records for each opponent, Stardust records values for 3 keys for each game, firstDarkTemplarCompleted, pylonInOurMain, and firstMutaliskCompleted. If the event occurs in the game, the value is the frame time of the event; otherwise the value is 2147483647 (INT_MAX, the largest int value, in this C++ implementation). It also records whether the game was a win or a loss. It records the hash of the map, too, but that doesn’t seem to be used again.

summarizing the data

The class Opponent is responsible for providing the learned information to the rest of the bot. It summarizes the game records via two routines.

  int minValueInPreviousGames(const std::string &key, int defaultNoData, int maxCount = INT_MAX, int minCount = 0);

If there are at least minCount games, then look through the game records, most recent first, for up to maxCount games. Look up the key for each game and return its minimum value, or the default value if there are none. This amounts to finding the earliest frame at which the event happened, or the default if it did not happen in the specified number of games.

   double winLossRatio(double defaultValue, int maxCount = INT_MAX);

Look through the game records, most recent first, for up to maxCount games and return the winning ratio, or the default value if there are no games yet.
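Here is roughly how I picture the two routines, reconstructed from the behavior described above. This is a sketch, not Stardust’s actual code; the gameRecords container with its values map and won flag is my stand-in for whatever Stardust really stores.

    int Opponent::minValueInPreviousGames(const std::string &key, int defaultNoData, int maxCount, int minCount)
    {
        if ((int)gameRecords.size() < minCount) return defaultNoData;    // not enough games yet

        int earliest = INT_MAX;
        int examined = 0;
        // Walk the game records most recent first, looking at no more than maxCount of them.
        for (auto it = gameRecords.rbegin(); it != gameRecords.rend() && examined < maxCount; ++it, ++examined)
        {
            auto found = it->values.find(key);
            if (found != it->values.end() && found->second != INT_MAX)   // INT_MAX means the event didn't happen
            {
                earliest = std::min(earliest, found->second);
            }
        }
        return earliest == INT_MAX ? defaultNoData : earliest;
    }

    double Opponent::winLossRatio(double defaultValue, int maxCount)
    {
        int wins = 0;
        int games = 0;
        for (auto it = gameRecords.rbegin(); it != gameRecords.rend() && games < maxCount; ++it, ++games)
        {
            if (it->won) ++wins;
        }
        return games == 0 ? defaultValue : double(wins) / games;
    }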

using the summarized data

Each of the 3 keys is used in exactly one place in the code. Here is where firstDarkTemplarCompleted is looked up in the PvP strategy code:

    if (Opponent::winLossRatio(0.0, 200) < 0.99)
    {
        expectedCompletionFrame = Opponent::minValueInPreviousGames("firstDarkTemplarCompleted", 7300, 15, 10);
    }

This means “If we’re rolling you absolutely flat (at least 99% wins in the last 200 games), then it doesn’t matter. Otherwise there’s some risk. In the most recent 15 games, find the earliest frame that the first enemy dark templar was (estimated to be) completed, or return frame 7300 if none.” The default frame 7300 is not the earliest a DT can emerge; they can be on the map over a thousand frames earlier. So it is not a worst-case assumption. Further code overrides the frame number if there is scouting information related to dark templar production. It attempts to build a defensive photon cannon just in time for the enemy DT’s arrival, and sometimes to get an observer.

The key pylonInOurMain is part of cannon rush defense. Stardust again checks the win ratio and again looks back 15 games with a minimum game count of 10, this time with a default of 0 if there are not enough games. It starts scouting its base 500 frames (about 21 seconds) ahead of the earliest seen enemy pylon appearing in its base, which may be never. The idea is that Stardust doesn’t waste time scouting its own base if it hasn’t seen you proxy a pylon in the last 15 games, and delays the scout if the pylon is proxied late.
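In outline, the timing check could look something like this. It is my own hedged sketch, not Stardust’s code, with the variable names invented for illustration:

    // pylonInOurMain: 0 here means no proxied pylon was seen in the last 15 games
    // (or there are not enough games yet), so there is no reason to scout our base.
    int earliestEnemyPylon = Opponent::minValueInPreviousGames("pylonInOurMain", 0, 15, 10);
    bool scoutOwnBase =
        earliestEnemyPylon > 0 &&
        BWAPI::Broodwar->getFrameCount() >= earliestEnemyPylon - 500;    // start about 21 seconds early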

The key firstMutaliskCompleted is used very similarly, to decide whether and when to defend each nexus with cannons. The goal is to get cannons in time in case mutalisks arrive without being scouted. There are simple rules to decide how many cannons at each nexus:

    // Main and natural are special cases, we only get cannons there to defend against air threats
    if (base == Map::getMyMain() || base == Map::getMyNatural())
    {
        if (enemyAirUnits > 6) return 4;
        if (enemyAirThreat) return 3;
        if (enemyDropThreat && BWAPI::Broodwar->getFrameCount() > 8000) return 1;
        return 0;
    }

    // At expansions we get cannons if the enemy is not contained or has an air threat
    if (!Strategist::isEnemyContained() || enemyAirUnits > 0) return 2;
    if (enemyAirThreat || enemyDropThreat) return 1;
    return 0;

If the firstMutaliskCompleted check says that it’s time, it sets enemyAirThreat to true and makes 3 cannons each at main and natural, and at least 1 at each other base.
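The shape of that check is presumably similar. A hedged sketch, again not Stardust’s code, with cannonLeadTime standing in for whatever lead time it allows to finish the cannons:

    int earliestMutalisk = Opponent::minValueInPreviousGames("firstMutaliskCompleted", INT_MAX, 15, 10);
    if (earliestMutalisk != INT_MAX &&
        BWAPI::Broodwar->getFrameCount() >= earliestMutalisk - cannonLeadTime)
    {
        enemyAirThreat = true;    // the rules above then give 3 cannons at main and natural, 1 or more elsewhere
    }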

the data itself

Here’s my summary of the data in Stardust’s files. The files include prepared data. I left the prepared data out; this covers only what was recorded during the tournament. The tournament was run for 157 rounds, although the official results are given after round 150. The table here is data for all 157 rounds. I don’t have a way to tell which unrecorded games were from rounds 1-150 and which were from 151-157... though I think I could guess.

n is the number of games for which a value (other than 2147483647) was recorded for the key. The values are frame numbers.
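For concreteness, this is roughly how the summary columns can be computed. It is a sketch of my own analysis code, not anything from Stardust:

    #include <algorithm>
    #include <climits>
    #include <vector>

    struct Summary { int n; int min; double median; int max; };

    // Drop the 2147483647 sentinel values, then report count, min, median, max.
    Summary summarize(std::vector<int> values)
    {
        values.erase(std::remove(values.begin(), values.end(), INT_MAX), values.end());
        std::sort(values.begin(), values.end());
        if (values.empty()) return { 0, 0, 0.0, 0 };
        const size_t n = values.size();
        const double median = (n % 2 == 1)
            ? double(values[n / 2])
            : (values[n / 2 - 1] + values[n / 2]) / 2.0;
        return { int(n), values.front(), median, values.back() };
    }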

firstDarkTemplarCompleted pylonInOurMain firstMutaliskCompleted
opponent games n min median max n min median max n min median max
bananabrain 155 20 7579 7897.5 23319 0 - - - 0 - - -
dragon 156 0 - - - 0 - - - 0 - - -
steamhammer 158 0 - - - 0 - - - 17 7188 8241 10355
mcrave 157 0 - - - 0 - - - 124 9070 10939 16146
willyt 157 0 - - - 0 - - - 0 - - -
microwave 157 0 - - - 0 - - - 17 7371 8534 11397
daqin 156 126 7533 7912.5 18154 2 2721 2743.5 2766 0 - - -
freshmeat 157 0 - - - 0 - - - 1 16801 16801 16801
ualbertabot 157 17 6230 6477 6627 0 - - - 0 - - -

As you might expect after deep contemplation of the nature of reality, only protoss makes dark templar or proxy pylons, and only zerg makes mutalisks. Nothing interesting was recorded for the terran opponents.

Notice that UAlbertaBot sometimes makes dark templar much earlier than the no-data 7300 frame default time; the others do not. DaQin is recorded as twice placing a proxy pylon in Stardust’s main. I didn’t think it ever did that. I guess it’s a holdover from the Locutus proxy pylon play, to trick opponents into overreacting? DaQin made DTs in most games, and McRave went mutalisks in most games. FreshMeat is recorded as having made a mutalisk (or more than one) in exactly one game, which seems unusual.

AIIDE 2021 - the learning curves

Before I dig into what each bot learned, I thought I’d look at the win percentage over time graph. Every bot wrote data, and it is likely that every bot attempted to learn and improve over time. Only some succeeded in improving their results, though.

Every bot shows a startup transient on the graph. The early swings up and down are controlled by some combination of luck and learning; luck because there are few games so statistical variation is high, and learning if and when the learning algorithms make fast adjustments (I think they usually do). To disentangle luck from learning, I think I want both statistical tests and a look into the algorithms to see what the learning rates could be. It would be too much for one post. In this post, I’m looking at the curves after 20 or 30 rounds, when the swings have mostly leveled off. I’m answering the question: Is the bot able to keep learning throughout a long tournament, outlearning its competition in the long run?

Four bots more or less held even. There are wobbles or slight trends, but not large ones. It’s what you expect if most bots are about equally good at lifetime learning. The learning systems are more or less saturated, and when one discovers an exploit, its counterpart figures out soon enough how to neuter the exploit, or so I imagine it. The learning competition is near an equilibrium.

graph of level-ish lines

Stardust doesn’t learn much, and apparently doesn’t have to. Steamhammer and McRave have messy early curves, perhaps reflecting complicated learning systems. FreshMeat has a beautiful clean early curve, unlike any other bot’s, suggesting that it knows what it is doing and straightforwardly does it. All 3 of the lower bots show low humps followed by slight regressions. I provisionally interpret that as the bot’s learning system saturating, then its opponents adjusting to that over time.

Four bots were able to improve. BananaBrain was in a class by itself, improving far more than any other bot. WillyT, Microwave, and UAlbertaBot had slight upward trends. None of them looks as impressive as AIUR did in 2015.

graph of rising lines

What gives BananaBrain a steeper curve? Is it good at learning in the long term, or bad at learning in the short term? (See that down-hook at the beginning.) I’ll look into it later on.

Dragon and DaQin fell behind. If somebody’s going up, somebody else must be going down. It may not be a coincidence that both are carryover bots from last year. Dragon’s learning files have a simple structure, the strategy name and win/loss. DaQin plays few strategies and has few ways to escape from exploits that other bots may find.

graph of falling lines

Next: Looking at Stardust’s learning.

AIIDE 2021 - what bots wrote data?

I looked in each bot’s final write directory to see what files it wrote, if any, and in its AI directory to see if it had prepared data for any opponents. Be sure to note: A bot does not necessarily use the data it writes. Preparation for specific opponents is not necessarily in the form of data in the AI directory, it might be in code.

#1 Stardust: Unlike last year, this year Stardust wrote data. It’s in JSON format, and records the map by hash, win or loss, and the timings of up to 3 game events, named firstDarkTemplarCompleted, firstMutaliskCompleted, and pylonInOurMain. The times look like frame numbers, and the great majority are 2147483647 (INT_MAX, the largest int value), which must mean “didn’t happen”. There is prepared data for 7 opponents (including PurpleWave which did not compete), so I assume that Stardust uses the data. I’ll find out for sure when I look at the source.
#2 BananaBrain: The learning files look unchanged from last year and the year before: One file for each opponent in the form of brief records of results. Each record consists of date+time, map, BananaBrain’s strategy (“PvZ_9/9proxygate”), the opponent’s recognized strategy (“Z_9pool”), a floating point number which we were told last year is the game duration in minutes, and the game result. Pre-learned data for DaQin and Dragon, the two stronger carryover bots. Last year there was pre-learned data for more opponents; maybe prep for opponents that might change turned out risky.
#3 Dragon: Simple game records, one per line, with strategy and game result, like "siege expand" won.
#4 Steamhammer: Steamhammer’s learning file format is documented here.
#5 McRave: The files look to have the same information as last year, but the format is slightly different. Two files for each opponent, named like ZvU UAlbertaBot.txt and ZvU UAlbertaBot Info.txt. The first file is short and counts wins and losses overall and for each of McRave’s strategies. The info file has detailed game records with aspects of the opponent’s strategy (2Gate,Main,ZealotRush), McRave’s strategy at 3 levels of abstraction (PoolHatch,Overpool,2HatchMuta), timings, and unit counts. No prepared files.
#6 WillyT: The files seem to have been corrected since last year. There is one file per opponent, one line per game, with lines that look like 20211005,Z,03,0. The items look like date, opponent race, a number 01 02 or 03, and win/loss. No prepared files.
#7 Microwave: Result and history files for each opponent. They look identical to last year’s, except that Microwave now lists a much larger number of strategies for itself. The result files count wins and losses for each Microwave strategy. The history files have a one-line record of data about each game. Also pre-learned history files for all opponents, each with over 100 game records.
#8 DaQin: Carried over from last year. Learning files straight from its parent Locutus (very similar to the old format Steamhammer files). No prepared files (and they’d be out of date if they existed).
#9 FreshMeat: Three files for each opponent, except 6 files for UAlbertaBot, presumably because it plays random. The contents of the files are opaque: Two are bare lists of numbers, one is a list of incomprehensible 14-character strings. I’ll have to read the code. No prepared files.
#10 UAlbertaBot: Carried over from past years. For each opponent, a file listing strategies with win and loss counts for each.

The only real surprise is Stardust’s minimalist and rather weird-seeming data. FreshMeat is new, of course, so anything it did would be unsurprising! It’s notable that every single participant wrote learning data, but that’s not a surprise either because this was an elite tournament. Except for Stardust, all the elite bots have used learning for years.

In unrelated news, I expected that CoG would post replays and learning files shortly after the AIIDE submission deadline. But no, they haven’t done it yet.

AIIDE 2021 - results by map

This post is about the details of how bots performed on maps. I wrote up the map pool last year. In order across the top of each table, there are 3 maps with 2 starting positions, 2 with 3, and 5 with 4. The tables are full of information, but I’ve learned that it is hard to extract insights from the information; to find out what strengths and weaknesses the data points out, you usually have to watch the games. The value of the tables lies in telling authors what games to watch to identify weaknesses.

For reference, here’s a copy of the map table from yesterday, the summary of how well bots did overall on each map.

# | bot | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki
1 | stardust | 95.63% | 96% | 97% | 97% | 98% | 90% | 94% | 98% | 97% | 94% | 96%
2 | bananabrain | 79.70% | 79% | 81% | 81% | 80% | 83% | 79% | 74% | 81% | 80% | 79%
3 | dragon | 51.19% | 50% | 47% | 52% | 50% | 50% | 55% | 56% | 50% | 50% | 51%
4 | steamhammer | 49.78% | 51% | 56% | 49% | 50% | 44% | 51% | 48% | 49% | 50% | 49%
5 | mcrave | 41.70% | 45% | 47% | 41% | 41% | 38% | 35% | 42% | 41% | 44% | 41%
6 | willyt | 41.05% | 38% | 39% | 42% | 36% | 36% | 51% | 38% | 43% | 49% | 40%
7 | microwave | 40.70% | 46% | 41% | 41% | 36% | 40% | 41% | 38% | 39% | 40% | 45%
8 | daqin | 39.63% | 41% | 36% | 42% | 42% | 45% | 44% | 39% | 41% | 31% | 35%
9 | freshmeat | 33.61% | 31% | 36% | 33% | 34% | 37% | 32% | 37% | 31% | 35% | 31%
10 | ualbertabot | 26.70% | 22% | 19% | 22% | 31% | 38% | 17% | 31% | 27% | 28% | 32%

Each bot gets its own table, how well it performed against each opponent on each map. Each cell represents 15 games, occasionally 14 if not all games completed, so expect noise in the numbers.

# | stardust | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki
2 | bananabrain | 84% | 93% | 93% | 80% | 80% | 67% | 60% | 100% | 87% | 93% | 87%
3 | dragon | 98% | 87% | 100% | 100% | 100% | 100% | 93% | 100% | 100% | 100% | 100%
4 | steamhammer | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
5 | mcrave | 95% | 100% | 93% | 93% | 100% | 100% | 100% | 87% | 93% | 100% | 80%
6 | willyt | 95% | 100% | 100% | 100% | 100% | 93% | 100% | 100% | 100% | 60% | 100%
7 | microwave | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
8 | daqin | 91% | 87% | 87% | 100% | 100% | 53% | 93% | 100% | 93% | 93% | 100%
9 | freshmeat | 99% | 100% | 100% | 100% | 100% | 93% | 100% | 100% | 100% | 100% | 100%
10 | ualbertabot | 99% | 93% | 100% | 100% | 100% | 100% | 100% | 93% | 100% | 100% | 100%
overall | 95.63% | 96% | 97% | 97% | 98% | 90% | 94% | 98% | 97% | 94% | 96%

A solid wall of blue, but with a few gouges. The lower results versus WillyT on Python and DaQin on Longinus probably represent weaknesses exposed by specific game events that these players tend to bring about on these maps. The weaknesses are not visible in the overall chart, only here, where results are broken down by opponent. They show up in only a few cells, but they may well be present in many games; maybe the opponent only happened to exploit them there.

# | bananabrain | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki
1 | stardust | 16% | 7% | 7% | 20% | 20% | 33% | 40% | 0% | 13% | 7% | 13%
3 | dragon | 76% | 93% | 73% | 67% | 80% | 87% | 73% | 60% | 73% | 80% | 73%
4 | steamhammer | 83% | 80% | 87% | 80% | 100% | 80% | 80% | 80% | 87% | 80% | 80%
5 | mcrave | 83% | 67% | 80% | 93% | 80% | 100% | 80% | 73% | 80% | 93% | 80%
6 | willyt | 93% | 93% | 93% | 93% | 100% | 93% | 87% | 93% | 87% | 100% | 87%
7 | microwave | 86% | 87% | 100% | 80% | 87% | 87% | 73% | 93% | 100% | 67% | 87%
8 | daqin | 90% | 87% | 100% | 93% | 80% | 93% | 80% | 73% | 93% | 100% | 100%
9 | freshmeat | 96% | 100% | 100% | 100% | 87% | 87% | 93% | 100% | 100% | 93% | 100%
10 | ualbertabot | 95% | 93% | 93% | 100% | 87% | 87% | 100% | 93% | 100% | 100% | 93%
overall | 79.70% | 79% | 81% | 81% | 80% | 83% | 79% | 74% | 81% | 80% | 79%

And this is a blue wall with sharp stuff on top, staining the top course of bricks with blood.

# | dragon | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki
1 | stardust | 2% | 13% | 0% | 0% | 0% | 0% | 7% | 0% | 0% | 0% | 0%
2 | bananabrain | 24% | 7% | 27% | 33% | 20% | 13% | 27% | 40% | 27% | 20% | 27%
4 | steamhammer | 37% | 53% | 47% | 27% | 40% | 47% | 13% | 53% | 33% | 27% | 33%
5 | mcrave | 67% | 53% | 27% | 53% | 80% | 73% | 87% | 73% | 80% | 80% | 67%
6 | willyt | 96% | 93% | 93% | 100% | 87% | 100% | 93% | 93% | 100% | 100% | 100%
7 | microwave | 66% | 47% | 80% | 60% | 93% | 60% | 80% | 67% | 73% | 53% | 47%
8 | daqin | 47% | 40% | 40% | 40% | 27% | 40% | 47% | 67% | 33% | 73% | 60%
9 | freshmeat | 39% | 47% | 40% | 60% | 33% | 40% | 47% | 27% | 20% | 27% | 47%
10 | ualbertabot | 83% | 100% | 73% | 93% | 67% | 80% | 93% | 87% | 87% | 67% | 80%
overall | 51.19% | 50% | 47% | 52% | 50% | 50% | 55% | 56% | 50% | 50% | 51%

Dragon’s results, as last year, are inconsistent across maps. Again, it doesn’t show in the averages across the bottom. Actually, comparing with other bots, it doesn’t seem much different. Most had extra good and extra bad maps against some opponents.

# | steamhammer | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki
1 | stardust | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0%
2 | bananabrain | 17% | 20% | 13% | 20% | 0% | 20% | 20% | 20% | 13% | 20% | 20%
3 | dragon | 63% | 47% | 53% | 73% | 60% | 53% | 87% | 47% | 67% | 73% | 67%
5 | mcrave | 54% | 73% | 60% | 53% | 47% | 47% | 73% | 27% | 60% | 40% | 60%
6 | willyt | 56% | 80% | 67% | 60% | 73% | 40% | 40% | 60% | 53% | 27% | 60%
7 | microwave | 73% | 80% | 87% | 67% | 73% | 73% | 53% | 67% | 67% | 93% | 67%
8 | daqin | 27% | 13% | 53% | 13% | 20% | 7% | 27% | 40% | 20% | 47% | 27%
9 | freshmeat | 68% | 60% | 73% | 67% | 80% | 67% | 60% | 93% | 73% | 60% | 47%
10 | ualbertabot | 92% | 93% | 100% | 93% | 100% | 87% | 100% | 80% | 93% | 87% | 93%
overall | 49.78% | 51% | 56% | 49% | 50% | 44% | 51% | 48% | 49% | 50% | 49%

The inconsistent results across maps may mean that bots are weak at adjusting their strategies to fit the maps. Steamhammer makes an attempt, but with 10 maps, it would take a very long tournament to gather the data to decide well. This is one of the issues that the opening timing data—the project I chose to delay—would address. It would at least help on BASIL maps, where there are enough games.

# | mcrave | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki
1 | stardust | 5% | 0% | 7% | 7% | 0% | 0% | 0% | 13% | 7% | 0% | 20%
2 | bananabrain | 17% | 33% | 20% | 7% | 20% | 0% | 20% | 27% | 20% | 7% | 20%
3 | dragon | 33% | 47% | 73% | 47% | 20% | 27% | 13% | 27% | 20% | 20% | 33%
4 | steamhammer | 46% | 27% | 40% | 47% | 53% | 53% | 27% | 73% | 40% | 60% | 40%
6 | willyt | 32% | 47% | 40% | 33% | 20% | 27% | 13% | 27% | 33% | 60% | 20%
7 | microwave | 60% | 40% | 47% | 40% | 67% | 67% | 67% | 73% | 73% | 67% | 60%
8 | daqin | 79% | 87% | 87% | 73% | 87% | 100% | 80% | 60% | 73% | 67% | 80%
9 | freshmeat | 65% | 53% | 47% | 80% | 60% | 60% | 60% | 67% | 60% | 73% | 93%
10 | ualbertabot | 37% | 73% | 67% | 40% | 47% | 7% | 33% | 13% | 47% | 40% | 7%
overall | 41.70% | 45% | 47% | 41% | 41% | 38% | 35% | 42% | 41% | 44% | 41%

As an example of the uninterpretability of the data, why did McRave do especially well against Dragon on Heartbreak Ridge? Is it because it was a 2-player map? No, the other 2-player maps Destination and Polaris Rhapsody do not agree. Was it because the map is flat, without a ramp? No, Dragon crushed it on Longinus and Empire of the Sun. Was it because of the short rush distance? I don’t think that matches McRave’s play style. It might be because Dragon makes specific mistakes in building placement or tactics, which McRave’s play is lucky enough to exploit on Heartbreak Ridge. The multiple paths through the center of the map might confuse Dragon into splitting its forces. To know for sure, we have to examine the games.

# | willyt | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki
1 | stardust | 5% | 0% | 0% | 0% | 0% | 7% | 0% | 0% | 0% | 40% | 0%
2 | bananabrain | 7% | 7% | 7% | 7% | 0% | 7% | 13% | 7% | 13% | 0% | 13%
3 | dragon | 4% | 7% | 7% | 0% | 13% | 0% | 7% | 7% | 0% | 0% | 0%
4 | steamhammer | 44% | 20% | 33% | 40% | 27% | 60% | 60% | 40% | 47% | 73% | 40%
5 | mcrave | 68% | 53% | 60% | 67% | 80% | 73% | 87% | 73% | 67% | 40% | 80%
7 | microwave | 67% | 60% | 67% | 87% | 67% | 53% | 73% | 73% | 67% | 73% | 53%
8 | daqin | 38% | 40% | 33% | 40% | 20% | 33% | 47% | 27% | 47% | 53% | 40%
9 | freshmeat | 68% | 80% | 67% | 73% | 60% | 40% | 93% | 60% | 73% | 73% | 60%
10 | ualbertabot | 69% | 79% | 73% | 67% | 60% | 47% | 80% | 53% | 71% | 86% | 73%
overall | 41.05% | 38% | 39% | 42% | 36% | 36% | 51% | 38% | 43% | 49% | 40%

For bot authors, I think it’s likely to be more useful to look at weaknesses than strengths. The weaknesses with the greatest contrast with the bot’s other results against the same opponent may be worth figuring out. For WillyT, that is the 20% score versus Steamhammer on Destination, a map where the natural should be easy to defend thanks to the double bridges. The weak result might represent a systematic mistake, though of course it could also be something very specific to the map and opponent.

# | microwave | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki
1 | stardust | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0%
2 | bananabrain | 14% | 13% | 0% | 20% | 13% | 13% | 27% | 7% | 0% | 33% | 13%
3 | dragon | 34% | 53% | 20% | 40% | 7% | 40% | 20% | 33% | 27% | 47% | 53%
4 | steamhammer | 27% | 20% | 13% | 33% | 27% | 27% | 47% | 33% | 33% | 7% | 33%
5 | mcrave | 40% | 60% | 53% | 60% | 33% | 33% | 33% | 27% | 27% | 33% | 40%
6 | willyt | 33% | 40% | 33% | 13% | 33% | 47% | 27% | 27% | 33% | 27% | 47%
8 | daqin | 81% | 87% | 100% | 67% | 93% | 80% | 60% | 67% | 87% | 67% | 100%
9 | freshmeat | 83% | 73% | 73% | 73% | 80% | 87% | 87% | 80% | 100% | 93% | 80%
10 | ualbertabot | 55% | 67% | 73% | 60% | 40% | 33% | 73% | 67% | 40% | 57% | 40%
overall | 40.70% | 46% | 41% | 41% | 36% | 40% | 41% | 38% | 39% | 40% | 45%

Strong and weak results could also be just luck, statistical fluctuations. It’s safe to promise that some seemingly meaningful numbers... aren’t, because they’re based on only 15 games.

# | daqin | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki
1 | stardust | 9% | 13% | 13% | 0% | 0% | 47% | 7% | 0% | 7% | 7% | 0%
2 | bananabrain | 10% | 13% | 0% | 7% | 20% | 7% | 20% | 27% | 7% | 0% | 0%
3 | dragon | 53% | 60% | 60% | 60% | 73% | 60% | 53% | 33% | 67% | 27% | 40%
4 | steamhammer | 73% | 87% | 47% | 87% | 80% | 93% | 73% | 60% | 80% | 53% | 73%
5 | mcrave | 21% | 13% | 13% | 27% | 13% | 0% | 20% | 40% | 27% | 33% | 20%
6 | willyt | 62% | 60% | 67% | 60% | 80% | 67% | 53% | 73% | 53% | 47% | 60%
7 | microwave | 19% | 13% | 0% | 33% | 7% | 20% | 40% | 33% | 13% | 33% | 0%
9 | freshmeat | 31% | 27% | 47% | 33% | 40% | 40% | 33% | 0% | 27% | 13% | 47%
10 | ualbertabot | 78% | 80% | 80% | 73% | 67% | 73% | 100% | 80% | 87% | 67% | 73%
overall | 39.63% | 41% | 36% | 42% | 42% | 45% | 44% | 39% | 41% | 31% | 35%

# | freshmeat | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki
1 | stardust | 1% | 0% | 0% | 0% | 0% | 7% | 0% | 0% | 0% | 0% | 0%
2 | bananabrain | 4% | 0% | 0% | 0% | 13% | 13% | 7% | 0% | 0% | 7% | 0%
3 | dragon | 61% | 53% | 60% | 40% | 67% | 60% | 53% | 73% | 80% | 73% | 53%
4 | steamhammer | 32% | 40% | 27% | 33% | 20% | 33% | 40% | 7% | 27% | 40% | 53%
5 | mcrave | 35% | 47% | 53% | 20% | 40% | 40% | 40% | 33% | 40% | 27% | 7%
6 | willyt | 32% | 20% | 33% | 27% | 40% | 60% | 7% | 40% | 27% | 27% | 40%
7 | microwave | 17% | 27% | 27% | 27% | 20% | 13% | 13% | 20% | 0% | 7% | 20%
8 | daqin | 69% | 73% | 53% | 67% | 60% | 60% | 67% | 100% | 73% | 87% | 53%
10 | ualbertabot | 52% | 21% | 67% | 80% | 50% | 43% | 64% | 57% | 33% | 47% | 53%
overall | 33.61% | 31% | 36% | 33% | 34% | 37% | 32% | 37% | 31% | 35% | 31%

# | ualbertabot | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki
1 | stardust | 1% | 7% | 0% | 0% | 0% | 0% | 0% | 7% | 0% | 0% | 0%
2 | bananabrain | 5% | 7% | 7% | 0% | 13% | 13% | 0% | 7% | 0% | 0% | 7%
3 | dragon | 17% | 0% | 27% | 7% | 33% | 20% | 7% | 13% | 13% | 33% | 20%
4 | steamhammer | 8% | 7% | 0% | 7% | 0% | 13% | 0% | 20% | 7% | 13% | 7%
5 | mcrave | 63% | 27% | 33% | 60% | 53% | 93% | 67% | 87% | 53% | 60% | 93%
6 | willyt | 31% | 21% | 27% | 33% | 40% | 53% | 20% | 47% | 29% | 14% | 27%
7 | microwave | 45% | 33% | 27% | 40% | 60% | 67% | 27% | 33% | 60% | 43% | 60%
8 | daqin | 22% | 20% | 20% | 27% | 33% | 27% | 0% | 20% | 13% | 33% | 27%
9 | freshmeat | 48% | 79% | 33% | 20% | 50% | 57% | 36% | 43% | 67% | 53% | 47%
overall | 26.70% | 22% | 19% | 22% | 31% | 38% | 17% | 31% | 27% | 28% | 32%

Next: I want to take a day to show off Steamhammer skills before I get back to AIIDE analysis.

AIIDE 2021 - summary tables

This year, for the first time ever, I did not have to update my parser to get results that exactly match the official results. Go stable tooling!

Here’s my version of the crosstable, identical to the official one except for the presentation. I have to produce the table to verify that I got it right, so I might as well show it. Also, for some people and some purposes, it’s easier to read than the original. For official results, it’s correct to use exact numbers, as is done. For general use, percentages are easier to interpret.

# | bot | overall | star | bana | drag | stea | mcra | will | micr | daqi | fres | ualb
1 | stardust | 95.63% | - | 84% | 98% | 100% | 95% | 95% | 100% | 91% | 99% | 99%
2 | bananabrain | 79.70% | 16% | - | 76% | 83% | 83% | 93% | 86% | 90% | 96% | 95%
3 | dragon | 51.19% | 2% | 24% | - | 37% | 67% | 96% | 66% | 47% | 39% | 83%
4 | steamhammer | 49.78% | 0% | 17% | 63% | - | 54% | 56% | 73% | 27% | 68% | 92%
5 | mcrave | 41.70% | 5% | 17% | 33% | 46% | - | 32% | 60% | 79% | 65% | 37%
6 | willyt | 41.05% | 5% | 7% | 4% | 44% | 68% | - | 67% | 38% | 68% | 69%
7 | microwave | 40.70% | 0% | 14% | 34% | 27% | 40% | 33% | - | 81% | 83% | 55%
8 | daqin | 39.63% | 9% | 10% | 53% | 73% | 21% | 62% | 19% | - | 31% | 78%
9 | freshmeat | 33.61% | 1% | 4% | 61% | 32% | 35% | 32% | 17% | 69% | - | 52%
10 | ualbertabot | 26.70% | 1% | 5% | 17% | 8% | 63% | 31% | 45% | 22% | 48% | -

And here’s my version of the bot performance per map table. I use red and blue colors, which means less trouble for people who are red-green colorblind (supposed to be 8% of men plus a few women). The official tables have a sharp color shift between red at 49% and green at 51%, which is good if you want to distinguish ahead from behind. I didn’t go to any special trouble to make perceptually accurate colors, but my color shift is pretty smooth anyway, good if you want to accentuate big differences. 49% is very pale red and 51% is very pale blue; they look nearly the same because the numbers are nearly the same. If you’re interested, compare Steamhammer’s rows in the two tables, all close to 50%.
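The mapping from win rate to color can be as simple as a linear fade to white from either end. Here is a sketch of that idea; it is my own illustration, not the exact colors used for these tables.

    #include <algorithm>

    struct RGB { int r, g, b; };

    // 0% maps to full red, 50% to white, 100% to full blue, so near-even
    // results like 49% and 51% come out as barely-tinted near-white cells.
    RGB winRateColor(double winRate)    // winRate in [0, 1]
    {
        const double t = std::clamp(winRate, 0.0, 1.0);
        if (t < 0.5)
        {
            const int fade = int(255 * t / 0.5);      // 0 at 0%, 255 at 50%
            return { 255, fade, fade };               // red fading toward white
        }
        const int fade = int(255 * (1.0 - t) / 0.5);  // 255 at 50%, 0 at 100%
        return { fade, fade, 255 };                   // white fading toward blue
    }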

# | bot | overall | Destin | Heartb | Polari | Aztec | Longin | Circui | Empire | Fighti | Python | Roadki
1 | stardust | 95.63% | 96% | 97% | 97% | 98% | 90% | 94% | 98% | 97% | 94% | 96%
2 | bananabrain | 79.70% | 79% | 81% | 81% | 80% | 83% | 79% | 74% | 81% | 80% | 79%
3 | dragon | 51.19% | 50% | 47% | 52% | 50% | 50% | 55% | 56% | 50% | 50% | 51%
4 | steamhammer | 49.78% | 51% | 56% | 49% | 50% | 44% | 51% | 48% | 49% | 50% | 49%
5 | mcrave | 41.70% | 45% | 47% | 41% | 41% | 38% | 35% | 42% | 41% | 44% | 41%
6 | willyt | 41.05% | 38% | 39% | 42% | 36% | 36% | 51% | 38% | 43% | 49% | 40%
7 | microwave | 40.70% | 46% | 41% | 41% | 36% | 40% | 41% | 38% | 39% | 40% | 45%
8 | daqin | 39.63% | 41% | 36% | 42% | 42% | 45% | 44% | 39% | 41% | 31% | 35%
9 | freshmeat | 33.61% | 31% | 36% | 33% | 34% | 37% | 32% | 37% | 31% | 35% | 31%
10 | ualbertabot | 26.70% | 22% | 19% | 22% | 31% | 38% | 17% | 31% | 27% | 28% | 32%

Least but not last, the overall race balance. There is only one random bot, UAlbertaBot, and two terran bots, so the data is more sparse than usual. This table mainly tells us that the protoss participants were strong.

race | overall | vT | vP | vZ | vR
terran | 46% | - | 20% | 57% | 76%
protoss | 72% | 80% | - | 74% | 90%
zerg | 41% | 43% | 26% | - | 59%
random | 27% | 24% | 10% | 41% | -

Finally, how each bot did against each race.

# | bot | overall | vT | vP | vZ | vR
1 | stardust | 95.63% | 97% | 87% | 98% | 99%
2 | bananabrain | 79.70% | 84% | 53% | 87% | 95%
3 | dragon | 51.19% | 96% | 24% | 52% | 83%
4 | steamhammer | 49.78% | 59% | 14% | 65% | 92%
5 | mcrave | 41.70% | 32% | 34% | 57% | 37%
6 | willyt | 41.05% | 4% | 17% | 62% | 69%
7 | microwave | 40.70% | 33% | 32% | 50% | 55%
8 | daqin | 39.63% | 58% | 10% | 36% | 78%
9 | freshmeat | 33.61% | 47% | 25% | 28% | 52%
10 | ualbertabot | 26.70% | 24% | 10% | 41% | -

Next: Map tables for each bot.

a first look at the AIIDE 2021 results

AIIDE 2021 results are out.

Ever since the unfortunate withdrawal of PurpleWave due to frame time issues, it was a sure thing that protoss Stardust and BananaBrain would finish first and second; it seemed likely from the start, but without the other strong protoss it was inescapable. As it turned out, #1 Stardust was in a class by itself, scoring 96%, and #2 BananaBrain was in the following class by itself at 80%. #3 Dragon was only the best of the rest, the leader of the trailers, barely above breakeven with 51%. All others scored below even. I didn’t expect Dragon to place so high, because it was a holdover from last year and bots should have been prepared for it. I knew that #4 Steamhammer would outscore it head-to-head.

#4 Steamhammer did great at 49%. I met my goals of finishing above the middle and of murderfying #7 Microwave (73% score head-to-head). I had hoped to make third, but missed by about 1.4%. I expected to and did beat #3 Dragon, #5 McRave, and #7 Microwave, so the hope had some basis. I knew that Steamhammer risked a zero score against #1 Stardust (and it did happen), but the win count was going to be tiny no matter what so it wasn’t a big concern. I was worried about #6 WillyT because its big tank-infantry attacks are effective, but Steamhammer scored OK there too with 56%. Like last year, the trouble was a huge upset by carryover #8 DaQin: 2020 score 22%, 2021 score 27%, an improvement but not by much. I had expected better.

#5 McRave scored better than the other zergs versus #1 Stardust and #2 BananaBrain, but it was not enough to move the needle. It was upset by #6 WillyT and, strangely, by #10 UAlbertaBot (last year it scored 89% against UAlbertaBot). #6 WillyT could not cope at all with #3 Dragon, and was upset by #8 DaQin too. #7 Microwave was little updated, according to the author. #9 FreshMeat, the new zerg by Hao Pan, scored 34% and was the tail ender of the submitted bots (those other than the holdovers). #10 UAlbertaBot’s upset of #5 McRave and stubborn ability to score some wins against every opponent kept it up at 27%, higher than I had anticipated. I guess UAlbertaBot will remain a usable benchmark for at least one more year.

The tournament ranks are similar to the BASIL ranks. BASIL has Stardust as the top among the AIIDE participants and BananaBrain as next. Microwave’s higher placement on BASIL is the biggest discrepancy. FreshMeat may be class B on BASIL and ranked 18 out of 86, but its BASIL rank still predicts its second-to-last finish.

This highlights that AIIDE 2021 was an elite tournament. There were few participants, and every submitted bot was already known to be highly ranked. 3 newcomer bots registered, and none was submitted. To me, it smells as though authors only want to submit if they believe they can do well. I see that as a mistake. From the author’s point of view, a tournament is a chance to gain experience, to learn about your own bot and others, and to show off your good ideas. From the community’s point of view, a tournament is an opportunity to invite new members in and to trade insights. In my experience, virtually every bot has good ideas that we can learn from. Many bots that perform poorly in games still have impressive skills in specific circumstances, not to mention other clever ideas. See for example my analysis of AITP, which scored 12% in AIIDE 2019.

Next: New bot Broken Horn. After that, stand by for more analysis of AIIDE.

AIIDE 2021 dropouts

The AIIDE 2021 list of entrants says that all 3 of the new names did not submit: Taiji, real5drone, and BlueSoup. That leaves 8 familiar names and 3 bots carried over from last year, 11 total. See AIIDE 2021 prospects.

Unfortunate but unsurprising. :-( Lately new bots have been dropping out of tournaments at a high rate. I will keep advising authors to participate if they can. Even if you think you’re not ready, it’s worth it. If your bot plays games without crashing more than occasionally, you have nothing to lose and experience to gain.

Steamhammer is ready for AIIDE 2021

Steamhammer is all set for AIIDE. I’m still making checks and running tests to be extra-duper-sure, but I’m convinced that this is the strongest and least-buggy Steamhammer ever. It hasn’t been uploaded anywhere, so nobody will be 100% ready for it... though I guess Stardust will be 99% ready. I plan to submit it today, a day ahead of time.

I’ll post the change list in a day or so, and the code once the submission deadline is safely past.

Steamhammer is frozen for AIIDE

Steamhammer is feature-frozen for AIIDE 2021, so that I don’t risk breaking my good version. Well, maybe not entirely frozen, but reduced to a low temperature (keyword simulated annealing). I will fix bugs and prepare for specific opponents, and I’ll probably also make small feature tweaks if they are safe.

My change list right now has 54 items on it, 14 of them marked as important changes that significantly improve play. With that many, obviously none of them is a big project—there hasn’t been time! But major weaknesses are fixed or reduced, affecting the whole range of strategy, tactics, and micro; all levels have important improvements. I’m pleased and optimistic. (Of course I was optimistic before AIST S4, and then Steamhammer lost 0-4, so....)

For one new feature, I ran a test that involved adding a spore colony at every base. At supply 7 (very early, before the spawning pool), Steamhammer started an evolution chamber and made a spore in the main, and then added a spore at every new base for the rest of the game. The opponent didn’t matter for the test, so I ran it against the protoss built-in AI. I was tickled that, even with the giant handicap, Steamhammer won several games in a row with apparent ease.

the sunken range bug and AIIDE 2021

In Steamhammer 3.5.1 (see the “zerg” section) I added a defense against cannon rushes which exploits the sunken range bug. The bug makes it possible, under specific conditions, for a sunken colony to target an enemy which is outside the sunken’s range. Exploiting the bug is allowed in human tournaments. In fact, it’s a standard defense against cannon rushes, one that players know and use. An example is ASL 11 Semifinal A, Mini vs Queen, game 1—see about 32 minutes into the vod for a complicated sequence where Mini eventually abandons the cannon rush knowing that it has been countered, and notice that casters Nyoken and Scan have little trouble understanding what happened and why.

At the time I wrote “Use of this bug seems to be universally legal,” but today I checked the AIIDE rules more closely. The rules include a list of allowed bugs to exploit, and add “All other bugs/exploits are forbidden.” The sunken range bug is not on the list.

I sent e-mail to Dave Churchill explaining the situation and its complexities. He’s busy and I don’t know if he’ll have time to look into it. Basically, I’m expecting to disable the behavior in Steamhammer for the tournament. I’m adding a configuration setting Config::Skills::UseSunkenRangeBug so I can turn it on and off.
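Gating it is trivial. A minimal sketch of the switch, not Steamhammer’s actual code, where exploitSunkenRangeBug() is a hypothetical stand-in for the cannon rush defense that relies on the bug:

    // Only attempt the bug-exploiting sunken placement when the flag allows it.
    if (Config::Skills::UseSunkenRangeBug)
    {
        exploitSunkenRangeBug();    // hypothetical helper for illustration
    }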

Most likely no AIIDE 2021 protoss will cannon rush at all, so in a way the point is academic. But who knows?

what should the rules say?

It’s complicated!

The range bug is a game behavior, and it can happen unintentionally in real games, just because events happen to trigger its conditions. It’s fairly rare, but I expect all who play regularly have seen it (whether they recognized it or not). Bots should not be penalized for game behavior that they did not intend, and have no reason to even notice.

Steamhammer deliberately attempts to exploit the bug to beat cannon rushes. I have to interpret that as a violation of the AIIDE rules as they are written.

If you’re actively trying to enforce the rule, how would you do it? First, you’d have to examine the games, presumably with replay analysis software since there are too many to watch in person. Then you’d have to decide whether at least one instance of the bug was a deliberate exploit. That likely involves reading the code to be sure. Tournament organizers are not going to go to so much trouble, so probably the only practical enforcement would be for other authors or observers to point out possible infractions after the fact.

Then there’s the point that exploiting the bug is legal in human play, so presumably it should be legal in bot play. But that has a hidden assumption behind it: Humans can’t or don’t exploit the bug in any way that seems unfair, therefore bots won’t either. It might be true, but how sure are you? Bots with perfect timing and simultaneous view of all information might be able to exploit the bug in a way that feels unfair. Then the rules would be unfair.

Maybe it’s right to allow exploiting the range bug unless and until some bot implements an unfair exploitation.

Even if it may be a good idea to change the rules, it’s no good to change them close to the submission deadline. The rules for this year should stay put. Next year’s rules may be open to debate.

Update: I have mail from Dave Churchill. After some flip-flopping, the final ruling is “INTENTIONAL use of this bug via any specific code that invokes it is not allowed.” That follows the original rules.

AIIDE 2021 prospects

The AIIDE registration list is available today. There are 14 bots on the list, compared to 15 last year. I count 7 protoss, 5 zerg, 1 terran, and 1 random, though among the updated returning bots the count is 3 protoss to 4 zerg. Zerg seems to be gaining in popularity. (As it should, ahem. I have been ahead of the curve the whole time!)

The familiar bots, in order of their BASIL ranks today:

bot | author
Stardust | Bruce Nielsen
BananaBrain | Johan de Jong
PurpleWave | Dan Gant
Microwave | Micky Holdorf
McRave | Christian McCrave
Fresh Meat | Hao Pan
WillyT | Nico Klausner
Steamhammer | Jay Scott

Protoss dominance is showing, but it already cracked in CoG without the help of Monster. Terran shyness is also showing, but I notice that WillyT has improved a lot in the last year. I counted new zerg Fresh Meat as a familiar bot even though neither Fresh Meat nor its terran counterpart Halo by Hao Pan has participated in AIIDE before—Hao Pan is an old stalwart. It will be interesting to see how Fresh Meat performs in the different world of a long tournament. Steamhammer is last on this list, but I have already fixed key weaknesses. It will perform better in the tournament.

We have 3 newcomers, a fair number. Last year there were 4 newcomers, and unfortunately only EggBot ended up playing.

bot | author
Blue Soup | Eujain Ting
real5drone | Kim TaeYoung
Taiji | Wang Bin

Eujain Ting seems to have some experience. I found old repos related to Broodwar and BWAPI on Bitbucket: Eujain Ting repositories. And I see Eujain Ting registered for the 2011(!) AIIDE tournament without playing. The name “Blue Soup” (after the debris of a destroyed dragoon) suggests either low expectations or a sense of humor! Kim TaeYoung last year registered protoss DanDanBot (which did not play). This year, the zerg name “real5drone” suggests a 5 pool strategy, whether honestly or otherwise. Wang Bin last year registered Taij, which I’m guessing was a typo for this year’s allusive name Taiji (and which also did not play). From past experience I do not have high hopes for the newcomers, but occasionally a great one appears. And I think it’s valuable experience to participate no matter how good or bad your bot is.

Plus 3 holdovers from previous tournaments, with their win rates from last year.

bot | author
Dragon 62.38% | Vegard Mella
DaQin 50.14% | Lion GIS
UAlbertaBot 31.14% | Dave Churchill

It’s too bad that we don’t get an updated Dragon. It’s a complicated and interesting opponent. UAlbertaBot is here for yet another year, I gather, mainly as a long-term baseline to measure progress. I forecast that it will score less than 15% against the updated bots, and this year or next it is likely to lose its value as a baseline. 2017 champion ZZZKBot scored 39.89% last year and is not being carried over—now that’s a sign of progress.