
AIIDE 2021 - Stardust’s learning

I investigated how Stardust’s learning works, and what it learned. It’s unusual, so it was worth a close look.

In its learning file of game records for each opponent, Stardust records values for 3 keys for each game: firstDarkTemplarCompleted, pylonInOurMain, and firstMutaliskCompleted. If the event occurs in the game, the value is the frame time of the event; otherwise the value is 2147483647 (INT_MAX, the largest int value, in this C++ implementation). It also records whether the game was a win or a loss. It records the hash of the map, too, but that doesn’t seem to be used again.

summarizing the data

The class Opponent is responsible for providing the learned information to the rest of the bot. It summarizes the game records via two routines.

  int minValueInPreviousGames(const std::string &key, int defaultNoData, int maxCount = INT_MAX, int minCount = 0);

If there are at least minCount games, then look through the game records, most recent first, for up to maxCount games. Look up the key for each game and return its minimum value, or the default value if there are none. This amounts to finding the earliest frame at which the event happened, or the default if it did not happen in the specified number of games.

   double winLossRatio(double defaultValue, int maxCount = INT_MAX);

Look through the game records, most recent first, for up to maxCount games and return the winning ratio, or the default value if there are no games yet.
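
Stardust’s source is the authority, but as a rough sketch of my reading of those two routines (the GameRecord type and the handling of the no-data cases are my own guesses, not Stardust’s code), they amount to something like this:

    // Rough sketch of my reading of the two summaries. The GameRecord type and
    // the handling of the "no data" cases are guesses; see Stardust's source
    // for the real thing.
    #include <climits>
    #include <map>
    #include <string>
    #include <vector>

    struct GameRecord
    {
        std::map<std::string, int> values;   // e.g. "firstDarkTemplarCompleted" -> frame, INT_MAX if it never happened
        bool won;
    };

    // records is ordered oldest first; scan most recent first.
    int minValueInPreviousGames(const std::vector<GameRecord> & records, const std::string & key,
                                int defaultNoData, int maxCount = INT_MAX, int minCount = 0)
    {
        if (records.empty() || int(records.size()) < minCount) return defaultNoData;

        int best = INT_MAX;
        int count = 0;
        for (auto it = records.rbegin(); it != records.rend() && count < maxCount; ++it, ++count)
        {
            auto found = it->values.find(key);
            if (found != it->values.end() && found->second < best) best = found->second;
        }
        return best;   // in this reading, stays INT_MAX if the event never happened in those games
    }

    double winLossRatio(const std::vector<GameRecord> & records, double defaultValue, int maxCount = INT_MAX)
    {
        int wins = 0;
        int games = 0;
        for (auto it = records.rbegin(); it != records.rend() && games < maxCount; ++it, ++games)
        {
            if (it->won) ++wins;
        }
        return games == 0 ? defaultValue : double(wins) / games;
    }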

using the summarized data

Each of the 3 keys is used in exactly one place in the code. Here is where firstDarkTemplarCompleted is looked up in the PvP strategy code:

    if (Opponent::winLossRatio(0.0, 200) < 0.99)
    {
        expectedCompletionFrame = Opponent::minValueInPreviousGames("firstDarkTemplarCompleted", 7300, 15, 10);
    }

This means “If we’re rolling you absolutely flat (at least 99% wins in the last 200 games), then it doesn’t matter. Otherwise there’s some risk. In the most recent 15 games, find the earliest frame that the first enemy dark templar was (estimated to be) completed, or return frame 7300 if none.” The default frame 7300 is not the earliest a DT can emerge; they can be on the map over a thousand frames earlier. So it is not a worst-case assumption. Further code overrides the frame number if there is scouting information related to dark templar production. It attempts to build a defensive photon cannon just in time for the enemy DT’s arrival, and sometimes to get an observer.

The key pylonInOurMain is part of cannon rush defense. Stardust again checks the win ratio and again looks back 15 games with a minimum game count of 10, this time with a default of 0 if there are not enough games. It starts scouting its base 500 frames (about 21 seconds) ahead of the earliest seen enemy pylon appearing in its base, which may be never. The idea is that Stardust doesn’t waste time scouting its own base if it hasn’t seen you proxy a pylon in the last 15 games, and delays the scout if the pylon is proxied late.
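
Putting the pieces of that description together, the decision looks roughly like the sketch below. Only the Opponent calls, the 15/10 window, the default of 0, and the 500-frame lead come from what I described above; the variable names and the surrounding logic are my invention.

    // Rough sketch of the pylon-scouting decision as described above. Only the
    // Opponent:: calls, the 15/10 window, the default of 0, and the 500-frame
    // lead come from the description; the rest is invented for illustration.
    int earliestEnemyPylonFrame = INT_MAX;     // guess: if we're crushing them, never bother
    if (Opponent::winLossRatio(0.0, 200) < 0.99)
    {
        earliestEnemyPylonFrame = Opponent::minValueInPreviousGames("pylonInOurMain", 0, 15, 10);
    }

    // Scout our own main starting 500 frames (about 21 seconds) before the
    // earliest recorded proxy pylon. With fewer than 10 games the default 0
    // means "scout from the start"; with 10+ games and no pylon ever seen, the
    // value stays INT_MAX (in my reading of the sketch above) and the scan
    // never triggers for this reason.
    bool scoutOwnMain = BWAPI::Broodwar->getFrameCount() >= earliestEnemyPylonFrame - 500;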

The key firstMutaliskCompleted is used very similarly, to decide whether and when to defend each nexus with cannons. The goal is to get cannons in time in case mutalisks arrive without being scouted. There are simple rules to decide how many cannons at each nexus:

    // Main and natural are special cases, we only get cannons there to defend against air threats
    if (base == Map::getMyMain() || base == Map::getMyNatural())
    {
        if (enemyAirUnits > 6) return 4;
        if (enemyAirThreat) return 3;
        if (enemyDropThreat && BWAPI::Broodwar->getFrameCount() > 8000) return 1;
        return 0;
    }

    // At expansions we get cannons if the enemy is not contained or has an air threat
    if (!Strategist::isEnemyContained() || enemyAirUnits > 0) return 2;
    if (enemyAirThreat || enemyDropThreat) return 1;
    return 0;

If the firstMutaliskCompleted check says that it’s time, it sets enemyAirThreat to true and makes 3 cannons each at main and natural, and at least 1 at each other base.

the data itself

Here’s my summary of the data in Stardust’s files. The files include prepared data. I left the prepared data out; this covers only what was recorded during the tournament. The tournament was run for 157 rounds, although the official results are given after round 150. The table here is data for all 157 rounds. I don’t have a way to tell which unrecorded games were from rounds 1-150 and which were from 151-157... though I think I could guess.

n is the number of games for which a value (other than 2147483647) was recorded for the key. The values are frame numbers.

                            firstDarkTemplarCompleted       pylonInOurMain                 firstMutaliskCompleted
opponent       games    n     min     median    max      n    min     median    max      n     min     median    max
bananabrain    155      20    7579    7897.5    23319    0    -       -         -        0     -       -         -
dragon         156      0     -       -         -        0    -       -         -        0     -       -         -
steamhammer    158      0     -       -         -        0    -       -         -        17    7188    8241      10355
mcrave         157      0     -       -         -        0    -       -         -        124   9070    10939     16146
willyt         157      0     -       -         -        0    -       -         -        0     -       -         -
microwave      157      0     -       -         -        0    -       -         -        17    7371    8534      11397
daqin          156      126   7533    7912.5    18154    2    2721    2743.5    2766     0     -       -         -
freshmeat      157      0     -       -         -        0    -       -         -        1     16801   16801     16801
ualbertabot    157      17    6230    6477      6627     0    -       -         -        0     -       -         -

As you might expect after deep contemplation of the nature of reality, only protoss makes dark templar or proxy pylons, and only zerg makes mutalisks. Nothing interesting was recorded for the terran opponents.

Notice that UAlbertaBot sometimes makes dark templar much earlier than the no-data 7300 frame default time; the others do not. DaQin is recorded as twice placing a proxy pylon in Stardust’s main. I didn’t think it ever did that. I guess it’s a holdover from the Locutus proxy pylon play, to trick opponents into overreacting? DaQin made DTs in most games, and McRave went mutalisks in most games. FreshMeat is recorded as having made a mutalisk (or more than one) in exactly one game, which seems unusual.

AIIDE 2021 - the learning curves

Before I dig into what each bot learned, I thought I’d look at the win percentage over time graph. Every bot wrote data, and it is likely that every bot attempted to learn and improve over time. Only some succeeded in improving their results, though.

Every bot shows a startup transient on the graph. The early swings up and down are controlled by some combination of luck and learning; luck because there are few games so statistical variation is high, and learning if and when the learning algorithms make fast adjustments (I think they usually do). To disentangle luck from learning, I think I want both statistical tests and a look into the algorithms to see what the learning rates could be. It would be too much for one post. In this post, I’m looking at the curves after 20 or 30 rounds, when the swings have mostly leveled off. I’m answering the question: Is the bot able to keep learning throughout a long tournament, outlearning its competition in the long run?

Four bots more or less held even. There are wobbles or slight trends, but not large ones. It’s what you expect if most bots are about equally good at lifetime learning. The learning systems are more or less saturated, and when one discovers an exploit, its counterpart figures out soon enough how to neuter the exploit, or so I imagine. The learning competition is near an equilibrium.

graph of level-ish lines

Stardust doesn’t learn much, and apparently doesn’t have to. Steamhammer and McRave have messy early curves, perhaps reflecting complicated learning systems. FreshMeat has a beautiful clean early curve, unlike any other bot’s, suggesting that it knows what it is doing and straightforwardly does it. All 3 of the lower bots show low humps followed by slight regressions. I provisionally interpret that as the bot’s learning system saturating, then its opponents adjusting to that over time.

Four bots were able to improve. BananaBrain was in a class by itself, improving far more than any other bot. WillyT, Microwave, and UAlbertaBot had slight upward trends. None of them looks as impressive as AIUR did in 2015.

graph of rising lines

What gives BananaBrain a steeper curve? Is it good at learning in the long term, or bad at learning in the short term? (See that down-hook at the beginning.) I’ll look into it later on.

Dragon and DaQin fell behind. If somebody’s going up, somebody else must be going down. It may not be a coincidence that both are carryover bots from last year. Dragon’s learning files have a simple structure, the strategy name and win/loss. DaQin plays few strategies and has few ways to escape from exploits that other bots may find.

graph of falling lines

Next: Looking at Stardust’s learning.

AIIDE 2021 - what bots wrote data?

I looked in each bot’s final write directory to see what files it wrote, if any, and in its AI directory to see if it had prepared data for any opponents. Be sure to note: A bot does not necessarily use the data it writes. Preparation for specific opponents is not necessarily in the form of data in the AI directory, it might be in code.

1. Stardust: Unlike last year, this year Stardust wrote data. It’s in JSON format, and records the map by hash, win or loss, and the timings of up to 3 game events, named firstDarkTemplarCompleted, firstMutaliskCompleted, and pylonInOurMain. The times look like frame numbers, and the great majority are 2147483647 (INT_MAX), which must mean “didn’t happen”. There is prepared data for 7 opponents (including PurpleWave which did not compete), so I assume that Stardust uses the data. I’ll find out for sure when I look at the source.
2. BananaBrain: The learning files look unchanged from last year and the year before: One file for each opponent in the form of brief records of results. Each record consists of date+time, map, BananaBrain’s strategy (“PvZ_9/9proxygate”), the opponent’s recognized strategy (“Z_9pool”), a floating point number which we were told last year is the game duration in minutes, and the game result. Pre-learned data for DaQin and Dragon, the two stronger carryover bots. Last year there was pre-learned data for more opponents; maybe prep for opponents that might change turned out risky.
3. Dragon: Simple game records, one per line, with strategy and game result, like "siege expand" won.
4. Steamhammer: Steamhammer’s learning file format is documented here.
5. McRave: The files look to have the same information as last year, but the format is slightly different. Two files for each opponent, named like ZvU UAlbertaBot.txt and ZvU UAlbertaBot Info.txt. The first file is short and counts wins and losses overall and for each of McRave’s strategies. The info file has detailed game records with aspects of the opponent’s strategy (2Gate,Main,ZealotRush), McRave’s strategy at 3 levels of abstraction (PoolHatch,Overpool,2HatchMuta), timings, and unit counts. No prepared files.
6. WillyT: The files seem to have been corrected since last year. There is one file per opponent, one line per game, with lines that look like 20211005,Z,03,0. The items look like date, opponent race, a number 01 02 or 03, and win/loss. No prepared files.
7. Microwave: Result and history files for each opponent. They look identical to last year’s, except that Microwave now lists a much larger number of strategies for itself. The result files count wins and losses for each Microwave strategy. The history files have a one-line record of data about each game. Also pre-learned history files for all opponents, each with over 100 game records.
8. DaQin: Carried over from last year. Learning files straight from its parent Locutus (very similar to the old format Steamhammer files). No prepared files (and they’d be out of date if they existed).
9. FreshMeat: Three files for each opponent, except 6 files for UAlbertaBot, presumably because it plays random. The contents of the files are opaque: Two are bare lists of numbers, one is a list of incomprehensible 14-character strings. I’ll have to read the code. No prepared files.
10. UAlbertaBot: Carried over from past years. For each opponent, a file listing strategies with win and loss counts for each.

The only real surprise is Stardust’s minimalist and rather weird-seeming data. FreshMeat is new, of course, so anything it did would be unsurprising! It’s notable that every single participant wrote learning data, but that’s not a surprise either because this was an elite tournament. Except for Stardust, all the elite bots have used learning for years.

In unrelated news, I expected that CoG would post replays and learning files shortly after the AIIDE submission deadline. But no, they haven’t done it yet.

Steamhammer showoff games

I picked five winning games to show off Steamhammer’s fearsome might, such as it is. I’m happy with the improvements in the tournament version, and if there’s more to do, then when it’s done I can be happy about that too.

Steamhammer used to defeat Locutus only when Locutus messed up severely, such as by trapping its own dragoons in its natural so that zerg didn’t have to face the whole army. And it still can’t touch Stardust. I was surprised to see a couple games where Steamhammer beat Locutus by straight up outfighting the dragoons. See Steamhammer-Locutus on Fighting Spirit where zerg won impressively with All The Macro, and Steamhammer-Locutus on Jade where zerg was unable to keep a third base up for long, but still wrested a win. Some of the credit is due to the smarter upgrade choices versus protoss, though the burrowed zergling preventing expansion was key, and Locutus did supply block itself. Here’s a picture from the second game. It may look as though zerg has 3 bases beyond its main and natural, but 1 is already destroyed (you’re seeing burrowed drones on the minimap) and the other 2 will be.

confused combat

Steamhammer has also been taking games from Halo by Hao Pan. That’s not new, but I like the promise it shows. Some wins are with a one-base mutalisk build: Steamhammer-Hao Pan on Fighting Spirit. I was intrigued by Steamhammer-Hao Pan on Roadrunner where Halo was winning after a vulture-wraith build and putting on continual pressure, while Steamhammer struggled with awesome determination for longer than seemed possible. The new static defense code provides for stubborn defense. Instead of winning, Halo suffered some kind of production bug, fell behind on macro, and slowly lost. I suppose it is the result of Hao Pan concentrating on Fresh Meat, but Halo is still higher ranked than Steamhammer. This is why you don’t resign too early in bot versus bot!

Steamhammer-MadMixP on Medusa shows off cannon-related skills of both bots. MadMix cannoned behind the zerg minerals, a great skill which I haven’t seen from any other bot. Steamhammer could not fight so many cannons, but it showed its own rare skill: It mined only the mineral patches that were outside cannon range. It’s not a new skill, but I’m proud of it. Unfortunately the drones that were not allowed to mine dangerous minerals idled around the base “waiting for them to open up” instead of being transferred elsewhere, but one step at a time! Steamhammer knocked down the undefended protoss main, expanded there itself, and clumsily but inevitably defeated the cannons for the win.

cannons and response

I recommend making no more than about 4 cannons, then adding gateways at the proxy instead. Zealots have the power to, you know, move around and hit stuff that’s outside immediate reach. The only extra smarts the zealots need is the ability to retreat toward the cannons when outmatched and fight within cannon range.

AIIDE 2021 - results by map

This post is about the details of how bots performed on maps. I wrote up the map pool last year. In order across the top of each table, there are 3 maps with 2 starting positions, 2 with 3, and 5 with 4. The tables are full of information, but I’ve learned that it is hard to extract insights from the information; to find out what strengths and weaknesses the data points out, you usually have to watch the games. The value of the tables lies in telling authors what games to watch to identify weaknesses.

For reference, here’s a copy of the map table from yesterday, the summary of how well bots did overall on each map.

#    bot            overall   Destin  Heartb  Polari  Aztec   Longin  Circui  Empire  Fighti  Python  Roadki
1    stardust       95.63%    96%     97%     97%     98%     90%     94%     98%     97%     94%     96%
2    bananabrain    79.70%    79%     81%     81%     80%     83%     79%     74%     81%     80%     79%
3    dragon         51.19%    50%     47%     52%     50%     50%     55%     56%     50%     50%     51%
4    steamhammer    49.78%    51%     56%     49%     50%     44%     51%     48%     49%     50%     49%
5    mcrave         41.70%    45%     47%     41%     41%     38%     35%     42%     41%     44%     41%
6    willyt         41.05%    38%     39%     42%     36%     36%     51%     38%     43%     49%     40%
7    microwave      40.70%    46%     41%     41%     36%     40%     41%     38%     39%     40%     45%
8    daqin          39.63%    41%     36%     42%     42%     45%     44%     39%     41%     31%     35%
9    freshmeat      33.61%    31%     36%     33%     34%     37%     32%     37%     31%     35%     31%
10   ualbertabot    26.70%    22%     19%     22%     31%     38%     17%     31%     27%     28%     32%

Each bot gets its own table, showing how well it performed against each opponent on each map. Each cell represents 15 games, occasionally 14 if not all games completed, so expect noise in the numbers.

#    stardust       overall   Destin  Heartb  Polari  Aztec   Longin  Circui  Empire  Fighti  Python  Roadki
2    bananabrain    84%       93%     93%     80%     80%     67%     60%     100%    87%     93%     87%
3    dragon         98%       87%     100%    100%    100%    100%    93%     100%    100%    100%    100%
4    steamhammer    100%      100%    100%    100%    100%    100%    100%    100%    100%    100%    100%
5    mcrave         95%       100%    93%     93%     100%    100%    100%    87%     93%     100%    80%
6    willyt         95%       100%    100%    100%    100%    93%     100%    100%    100%    60%     100%
7    microwave      100%      100%    100%    100%    100%    100%    100%    100%    100%    100%    100%
8    daqin          91%       87%     87%     100%    100%    53%     93%     100%    93%     93%     100%
9    freshmeat      99%       100%    100%    100%    100%    93%     100%    100%    100%    100%    100%
10   ualbertabot    99%       93%     100%    100%    100%    100%    100%    93%     100%    100%    100%
     overall        95.63%    96%     97%     97%     98%     90%     94%     98%     97%     94%     96%

A solid wall of blue, but with a few gouges. The lower results versus WillyT on Python and DaQin on Longinus probably represent weaknesses exposed by specific game events that these players tend to bring about on these maps. The weaknesses are not visible in the overall chart, only here, where the results are broken down by opponent. The weaknesses show up in only a few cells, but they might be present in many games; maybe the opponent only happened to exploit them in those few.

#    bananabrain    overall   Destin  Heartb  Polari  Aztec   Longin  Circui  Empire  Fighti  Python  Roadki
1    stardust       16%       7%      7%      20%     20%     33%     40%     0%      13%     7%      13%
3    dragon         76%       93%     73%     67%     80%     87%     73%     60%     73%     80%     73%
4    steamhammer    83%       80%     87%     80%     100%    80%     80%     80%     87%     80%     80%
5    mcrave         83%       67%     80%     93%     80%     100%    80%     73%     80%     93%     80%
6    willyt         93%       93%     93%     93%     100%    93%     87%     93%     87%     100%    87%
7    microwave      86%       87%     100%    80%     87%     87%     73%     93%     100%    67%     87%
8    daqin          90%       87%     100%    93%     80%     93%     80%     73%     93%     100%    100%
9    freshmeat      96%       100%    100%    100%    87%     87%     93%     100%    100%    93%     100%
10   ualbertabot    95%       93%     93%     100%    87%     87%     100%    93%     100%    100%    93%
     overall        79.70%    79%     81%     81%     80%     83%     79%     74%     81%     80%     79%

And this is a blue wall with sharp stuff on top, staining the top course of bricks with blood.

#    dragon         overall   Destin  Heartb  Polari  Aztec   Longin  Circui  Empire  Fighti  Python  Roadki
1    stardust       2%        13%     0%      0%      0%      0%      7%      0%      0%      0%      0%
2    bananabrain    24%       7%      27%     33%     20%     13%     27%     40%     27%     20%     27%
4    steamhammer    37%       53%     47%     27%     40%     47%     13%     53%     33%     27%     33%
5    mcrave         67%       53%     27%     53%     80%     73%     87%     73%     80%     80%     67%
6    willyt         96%       93%     93%     100%    87%     100%    93%     93%     100%    100%    100%
7    microwave      66%       47%     80%     60%     93%     60%     80%     67%     73%     53%     47%
8    daqin          47%       40%     40%     40%     27%     40%     47%     67%     33%     73%     60%
9    freshmeat      39%       47%     40%     60%     33%     40%     47%     27%     20%     27%     47%
10   ualbertabot    83%       100%    73%     93%     67%     80%     93%     87%     87%     67%     80%
     overall        51.19%    50%     47%     52%     50%     50%     55%     56%     50%     50%     51%

Dragon’s results, as last year, are inconsistent across maps. Again, it doesn’t show in the averages across the bottom. Actually, comparing with other bots, it doesn’t seem much different. Most had extra good and extra bad maps against some opponents.

#    steamhammer    overall   Destin  Heartb  Polari  Aztec   Longin  Circui  Empire  Fighti  Python  Roadki
1    stardust       0%        0%      0%      0%      0%      0%      0%      0%      0%      0%      0%
2    bananabrain    17%       20%     13%     20%     0%      20%     20%     20%     13%     20%     20%
3    dragon         63%       47%     53%     73%     60%     53%     87%     47%     67%     73%     67%
5    mcrave         54%       73%     60%     53%     47%     47%     73%     27%     60%     40%     60%
6    willyt         56%       80%     67%     60%     73%     40%     40%     60%     53%     27%     60%
7    microwave      73%       80%     87%     67%     73%     73%     53%     67%     67%     93%     67%
8    daqin          27%       13%     53%     13%     20%     7%      27%     40%     20%     47%     27%
9    freshmeat      68%       60%     73%     67%     80%     67%     60%     93%     73%     60%     47%
10   ualbertabot    92%       93%     100%    93%     100%    87%     100%    80%     93%     87%     93%
     overall        49.78%    51%     56%     49%     50%     44%     51%     48%     49%     50%     49%

The inconsistent results across maps may mean that bots are weak at adjusting their strategies to fit the maps. Steamhammer makes an attempt, but with 10 maps, it would take a very long tournament to gather the data to decide well. This is one of the issues that the opening timing data—the project I chose to delay—would address. It would at least help on BASIL maps, where there are enough games.

#    mcrave         overall   Destin  Heartb  Polari  Aztec   Longin  Circui  Empire  Fighti  Python  Roadki
1    stardust       5%        0%      7%      7%      0%      0%      0%      13%     7%      0%      20%
2    bananabrain    17%       33%     20%     7%      20%     0%      20%     27%     20%     7%      20%
3    dragon         33%       47%     73%     47%     20%     27%     13%     27%     20%     20%     33%
4    steamhammer    46%       27%     40%     47%     53%     53%     27%     73%     40%     60%     40%
6    willyt         32%       47%     40%     33%     20%     27%     13%     27%     33%     60%     20%
7    microwave      60%       40%     47%     40%     67%     67%     67%     73%     73%     67%     60%
8    daqin          79%       87%     87%     73%     87%     100%    80%     60%     73%     67%     80%
9    freshmeat      65%       53%     47%     80%     60%     60%     60%     67%     60%     73%     93%
10   ualbertabot    37%       73%     67%     40%     47%     7%      33%     13%     47%     40%     7%
     overall        41.70%    45%     47%     41%     41%     38%     35%     42%     41%     44%     41%

As an example of the uninterpretability of the data, why did McRave do especially well against Dragon on Heartbreak Ridge? Is it because it was a 2-player map? No, the other 2-player maps Destination and Polaris Rhapsody do not agree. Was it because the map is flat, without a ramp? No, Dragon crushed it on Longinus and Empire of the Sun. Was it because of the short rush distance? I don’t think that matches McRave’s play style. It might be because Dragon makes specific mistakes in building placement or tactics, which McRave’s play is lucky enough to exploit on Heartbreak Ridge. The multiple paths through the center of the map might confuse Dragon into splitting its forces. To know for sure, we have to examine the games.

#    willyt         overall   Destin  Heartb  Polari  Aztec   Longin  Circui  Empire  Fighti  Python  Roadki
1    stardust       5%        0%      0%      0%      0%      7%      0%      0%      0%      40%     0%
2    bananabrain    7%        7%      7%      7%      0%      7%      13%     7%      13%     0%      13%
3    dragon         4%        7%      7%      0%      13%     0%      7%      7%      0%      0%      0%
4    steamhammer    44%       20%     33%     40%     27%     60%     60%     40%     47%     73%     40%
5    mcrave         68%       53%     60%     67%     80%     73%     87%     73%     67%     40%     80%
7    microwave      67%       60%     67%     87%     67%     53%     73%     73%     67%     73%     53%
8    daqin          38%       40%     33%     40%     20%     33%     47%     27%     47%     53%     40%
9    freshmeat      68%       80%     67%     73%     60%     40%     93%     60%     73%     73%     60%
10   ualbertabot    69%       79%     73%     67%     60%     47%     80%     53%     71%     86%     73%
     overall        41.05%    38%     39%     42%     36%     36%     51%     38%     43%     49%     40%

For bot authors, I think it’s likely to be more useful to look at weaknesses than strengths. The weaknesses with the greatest contrast with the bot’s other results against the same opponent may be worth figuring out. For WillyT, that is the 20% score versus Steamhammer on Destination, a map where the natural should be easy to defend thanks to the double bridges. The weak result might represent a systematic mistake, though of course it could also be something very specific to the map and opponent.

#    microwave      overall   Destin  Heartb  Polari  Aztec   Longin  Circui  Empire  Fighti  Python  Roadki
1    stardust       0%        0%      0%      0%      0%      0%      0%      0%      0%      0%      0%
2    bananabrain    14%       13%     0%      20%     13%     13%     27%     7%      0%      33%     13%
3    dragon         34%       53%     20%     40%     7%      40%     20%     33%     27%     47%     53%
4    steamhammer    27%       20%     13%     33%     27%     27%     47%     33%     33%     7%      33%
5    mcrave         40%       60%     53%     60%     33%     33%     33%     27%     27%     33%     40%
6    willyt         33%       40%     33%     13%     33%     47%     27%     27%     33%     27%     47%
8    daqin          81%       87%     100%    67%     93%     80%     60%     67%     87%     67%     100%
9    freshmeat      83%       73%     73%     73%     80%     87%     87%     80%     100%    93%     80%
10   ualbertabot    55%       67%     73%     60%     40%     33%     73%     67%     40%     57%     40%
     overall        40.70%    46%     41%     41%     36%     40%     41%     38%     39%     40%     45%

Strong and weak results could also be just luck, statistical fluctuations. It’s safe to promise that some seemingly meaningful numbers... aren’t, because they’re based on only 15 games.

#    daqin          overall   Destin  Heartb  Polari  Aztec   Longin  Circui  Empire  Fighti  Python  Roadki
1    stardust       9%        13%     13%     0%      0%      47%     7%      0%      7%      7%      0%
2    bananabrain    10%       13%     0%      7%      20%     7%      20%     27%     7%      0%      0%
3    dragon         53%       60%     60%     60%     73%     60%     53%     33%     67%     27%     40%
4    steamhammer    73%       87%     47%     87%     80%     93%     73%     60%     80%     53%     73%
5    mcrave         21%       13%     13%     27%     13%     0%      20%     40%     27%     33%     20%
6    willyt         62%       60%     67%     60%     80%     67%     53%     73%     53%     47%     60%
7    microwave      19%       13%     0%      33%     7%      20%     40%     33%     13%     33%     0%
9    freshmeat      31%       27%     47%     33%     40%     40%     33%     0%      27%     13%     47%
10   ualbertabot    78%       80%     80%     73%     67%     73%     100%    80%     87%     67%     73%
     overall        39.63%    41%     36%     42%     42%     45%     44%     39%     41%     31%     35%

#    freshmeat      overall   Destin  Heartb  Polari  Aztec   Longin  Circui  Empire  Fighti  Python  Roadki
1    stardust       1%        0%      0%      0%      0%      7%      0%      0%      0%      0%      0%
2    bananabrain    4%        0%      0%      0%      13%     13%     7%      0%      0%      7%      0%
3    dragon         61%       53%     60%     40%     67%     60%     53%     73%     80%     73%     53%
4    steamhammer    32%       40%     27%     33%     20%     33%     40%     7%      27%     40%     53%
5    mcrave         35%       47%     53%     20%     40%     40%     40%     33%     40%     27%     7%
6    willyt         32%       20%     33%     27%     40%     60%     7%      40%     27%     27%     40%
7    microwave      17%       27%     27%     27%     20%     13%     13%     20%     0%      7%      20%
8    daqin          69%       73%     53%     67%     60%     60%     67%     100%    73%     87%     53%
10   ualbertabot    52%       21%     67%     80%     50%     43%     64%     57%     33%     47%     53%
     overall        33.61%    31%     36%     33%     34%     37%     32%     37%     31%     35%     31%

#    ualbertabot    overall   Destin  Heartb  Polari  Aztec   Longin  Circui  Empire  Fighti  Python  Roadki
1    stardust       1%        7%      0%      0%      0%      0%      0%      7%      0%      0%      0%
2    bananabrain    5%        7%      7%      0%      13%     13%     0%      7%      0%      0%      7%
3    dragon         17%       0%      27%     7%      33%     20%     7%      13%     13%     33%     20%
4    steamhammer    8%        7%      0%      7%      0%      13%     0%      20%     7%      13%     7%
5    mcrave         63%       27%     33%     60%     53%     93%     67%     87%     53%     60%     93%
6    willyt         31%       21%     27%     33%     40%     53%     20%     47%     29%     14%     27%
7    microwave      45%       33%     27%     40%     60%     67%     27%     33%     60%     43%     60%
8    daqin          22%       20%     20%     27%     33%     27%     0%      20%     13%     33%     27%
9    freshmeat      48%       79%     33%     20%     50%     57%     36%     43%     67%     53%     47%
     overall        26.70%    22%     19%     22%     31%     38%     17%     31%     27%     28%     32%

Next: I want to take a day to show off Steamhammer skills before I get back to AIIDE analysis.

AIIDE 2021 - summary tables

This year, for the first time ever, I did not have to update my parser to get results that exactly match the official results. Go stable tooling!

Here’s my version of the crosstable, identical to the official one except for the presentation. I have to produce the table to verify that I got it right, so I might as well show it. Also, for some people and some purposes, it’s easier to read than the original. For official results, it’s correct to use exact numbers, as is done. For general use, percentages are easier to interpret.

#    bot            overall   star   bana   drag   stea   mcra   will   micr   daqi   fres   ualb
1    stardust       95.63%    -      84%    98%    100%   95%    95%    100%   91%    99%    99%
2    bananabrain    79.70%    16%    -      76%    83%    83%    93%    86%    90%    96%    95%
3    dragon         51.19%    2%     24%    -      37%    67%    96%    66%    47%    39%    83%
4    steamhammer    49.78%    0%     17%    63%    -      54%    56%    73%    27%    68%    92%
5    mcrave         41.70%    5%     17%    33%    46%    -      32%    60%    79%    65%    37%
6    willyt         41.05%    5%     7%     4%     44%    68%    -      67%    38%    68%    69%
7    microwave      40.70%    0%     14%    34%    27%    40%    33%    -      81%    83%    55%
8    daqin          39.63%    9%     10%    53%    73%    21%    62%    19%    -      31%    78%
9    freshmeat      33.61%    1%     4%     61%    32%    35%    32%    17%    69%    -      52%
10   ualbertabot    26.70%    1%     5%     17%    8%     63%    31%    45%    22%    48%    -

And here’s my version of the bot performance per map table. I use red and blue colors, which means less trouble for people who are red-green colorblind (supposed to be 8% of men plus a few women). The official tables have a sharp color shift between red at 49% and green at 51%, which is good if you want to distinguish ahead from behind. I didn’t go to any special trouble to make perceptually accurate colors, but my color shift is pretty smooth anyway, good if you want to accentuate big differences. 49% is very pale red and 51% is very pale blue; they look nearly the same because the numbers are nearly the same. If you’re interested, compare Steamhammer’s rows in the two tables, all close to 50%.
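
If you are curious what I mean by a smooth shift, here is a minimal sketch of the kind of red-white-blue ramp involved. It illustrates the idea only; it is not the code that colored the tables.

    // Map a win rate in [0,1] to an RGB color: saturated red at 0%, white at 50%,
    // saturated blue at 100%, fading smoothly so that 49% and 51% look nearly alike.
    // A sketch of the idea only, not the actual table-coloring code.
    struct RGB { int r, g, b; };

    RGB winRateColor(double winRate)
    {
        if (winRate <= 0.5)
        {
            // Red side: interpolate from pure red toward white as we approach 50%.
            double t = winRate / 0.5;                 // 0 at 0%, 1 at 50%
            return { 255, int(255 * t), int(255 * t) };
        }
        // Blue side: interpolate from white toward pure blue as we approach 100%.
        double t = (winRate - 0.5) / 0.5;             // 0 at 50%, 1 at 100%
        return { int(255 * (1.0 - t)), int(255 * (1.0 - t)), 255 };
    }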

#    bot            overall   Destin  Heartb  Polari  Aztec   Longin  Circui  Empire  Fighti  Python  Roadki
1    stardust       95.63%    96%     97%     97%     98%     90%     94%     98%     97%     94%     96%
2    bananabrain    79.70%    79%     81%     81%     80%     83%     79%     74%     81%     80%     79%
3    dragon         51.19%    50%     47%     52%     50%     50%     55%     56%     50%     50%     51%
4    steamhammer    49.78%    51%     56%     49%     50%     44%     51%     48%     49%     50%     49%
5    mcrave         41.70%    45%     47%     41%     41%     38%     35%     42%     41%     44%     41%
6    willyt         41.05%    38%     39%     42%     36%     36%     51%     38%     43%     49%     40%
7    microwave      40.70%    46%     41%     41%     36%     40%     41%     38%     39%     40%     45%
8    daqin          39.63%    41%     36%     42%     42%     45%     44%     39%     41%     31%     35%
9    freshmeat      33.61%    31%     36%     33%     34%     37%     32%     37%     31%     35%     31%
10   ualbertabot    26.70%    22%     19%     22%     31%     38%     17%     31%     27%     28%     32%

Least but not last, the overall race balance. There is only one random bot, UAlbertaBot, and two terran bots, so the data is more sparse than usual. This table mainly tells us that the protoss participants were strong.

           overall   vT     vP     vZ     vR
terran     46%       -      20%    57%    76%
protoss    72%       80%    -      74%    90%
zerg       41%       43%    26%    -      59%
random     27%       24%    10%    41%    -

Finally, how each bot did against each race.

#    bot            overall   vT     vP     vZ     vR
1    stardust       95.63%    97%    87%    98%    99%
2    bananabrain    79.70%    84%    53%    87%    95%
3    dragon         51.19%    96%    24%    52%    83%
4    steamhammer    49.78%    59%    14%    65%    92%
5    mcrave         41.70%    32%    34%    57%    37%
6    willyt         41.05%    4%     17%    62%    69%
7    microwave      40.70%    33%    32%    50%    55%
8    daqin          39.63%    58%    10%    36%    78%
9    freshmeat      33.61%    47%    25%    28%    52%
10   ualbertabot    26.70%    24%    10%    41%    -

Next: Map tables for each bot.

new bot Broken Horn

New zerg bot Broken Horn is uploaded to BASIL only; it does not appear on SSCAIT. “Broken Horn” is the name of a zergling in Starcraft 2 lore. The name reflects the bot’s strategy. The bot was updated once after its first upload. I did not find signs of the bot on the wider internet, and I know nothing about the author (though I can make guesses). Without access to the binary, I can’t do my usual peek into it.

I watched a bunch of games. I can tell from details of its play that Broken Horn is a fork of a recent Steamhammer version—and there is only one recent release, the AIIDE tournament version 3.5.10. The long game Simon Prins - Broken Horn on Heartbreak Ridge is particularly clear in showing Broken Horn using Steamhammer skills later in the game.

Broken Horn always plays 3 hatch before pool. It varies the timing from 10 hatch 9 hatch 9 pool (fast for 3 hatch before pool) to 12 hatch 14 hatch 15 pool (extremely greedy; it immediately adds a 4th hatchery). I haven’t seen enough games to judge whether it varies the timing randomly, or based on learning, but I suspect it is random. In play, it uses the larva advantage it gained from the fast hatcheries to flood the opponent with masses of zerglings. It sticks with the ling flood for a long time, upgrading zergling speed but nothing else, then eventually (if the game continues) the opening line runs out and it falls back on Steamhammer’s strategy boss.

Is Broken Horn perhaps a Newbie Zerg production? See Newbie Zerg and links from there. It has that kind of feel. Matching characteristics are 1. Fork of a recent Steamhammer version. 2. A high-pressure cheese build. 3. It has only one basic build, but plays variations on it. 4. Steps to obfuscate the bot’s play and origin, in this case uploading to BASIL only.

The ling flood strategy often wins quickly. But an opponent that rushes will catch it unready, so it also often loses quickly. This is easy to see in the BASIL game length graph (bottom of the page), with a razor-sharp blue peak at 7 minutes and red in outlying areas. Terran BBS, or protoss 9-9 gates, or zerg 9 pool or faster should win with aggressive play. The opponent can also defend safely and tech to air units or anything else that a low-tech zerg is unready for, but that is trickier. The ling flood is powerful and easy to underestimate, and its size and timing depend on Broken Horn’s variable build timing.

In any case, welcome! New bots bringing new challenges are always good.

Next: AIIDE 2021 summary tables.

a first look at the AIIDE 2021 results

AIIDE 2021 results are out.

Ever since the unfortunate withdrawal of PurpleWave due to frame time issues, it was certain that protoss Stardust and BananaBrain would finish first and second—it seemed likely from the start, but without the other strong protoss it was inescapable. As it turned out, #1 Stardust was in a class by itself, scoring 96%, and #2 BananaBrain was in the following class by itself at 80%. #3 Dragon was only the best of the rest, the leader of the trailers, barely above breakeven with 51%. All others scored below even. I didn’t expect Dragon to place so high, because it was a holdover from last year and bots should have been prepared for it. I knew that #4 Steamhammer would outscore it head-to-head.

#4 Steamhammer did great at 49%. I met my goals of finishing above the middle and of murderfying #7 Microwave (73% score head-to-head). I had hoped to make third, but missed by about 1.4%. I expected to and did beat #3 Dragon, #5 McRave, and #7 Microwave, so I had some reason. I knew that Steamhammer risked a zero score against #1 Stardust—and it did happen—but the win count was going to be tiny no matter what so it wasn’t a big concern. I was worried about #6 WillyT because its big tank-infantry attacks are effective, but Steamhammer scored OK there too with 56%. Like last year, the trouble was a huge upset by carryover #8 DaQin. 2020 score 22%, 2021 score 27%—an improvement, but not by much. I had expected better.

#5 McRave scored better than the other zergs versus #1 Stardust and #2 BananaBrain, but it was not enough to move the needle. It was upset by #6 WillyT and, strangely, by #10 UAlbertaBot (last year it scored 89% against UAlbertaBot). #6 WillyT could not cope at all with #3 Dragon, and was upset by #8 DaQin too. #7 Microwave was little updated, according to the author. #9 FreshMeat, the new zerg by Hao Pan, scored 34% and was the tail ender of the submitted bots (those other than the holdovers). #10 UAlbertaBot’s upset of #5 McRave and stubborn ability to score some wins against every opponent kept it up at 27%, higher than I had anticipated. I guess UAlbertaBot will remain a usable benchmark for at least one more year.

The tournament ranks are similar to the BASIL ranks. BASIL has Stardust as the top among the AIIDE participants and BananaBrain as next. Microwave’s higher placement on BASIL is the biggest discrepancy. FreshMeat may be class B on BASIL and ranked 18 out of 86, but its BASIL rank still predicts its second-to-last finish.

This highlights that AIIDE 2021 was an elite tournament. There were few participants, and every submitted bot was already known to be highly ranked. 3 newcomer bots registered, and none was submitted. To me, it smells as though authors only want to submit if they believe they can do well. I see that as a mistake. From the author’s point of view, a tournament is a chance to gain experience, to learn about your own bot and others, and to show off your good ideas. From the community’s point of view, a tournament is an opportunity to invite new members in and to trade insights. In my experience, virtually every bot has good ideas that we can learn from. Many bots that perform poorly in games still have impressive skills in specific circumstances, not to mention other clever ideas. See for example my analysis of AITP, which scored 12% in AIIDE 2019.

Next: New bot Broken Horn. After that, stand by for more analysis of AIIDE.

Steamhammer’s timers

UAlbertaBot comes with a system of timers: It divides the bot’s work into aspects, and for each frame times how much time was spent on each aspect. Steamhammer inherited it. Overkill also inherited it, and if you’ve seen Overkill play on the SSCAIT stream then you’ve seen its timer display in a big black box smack in the middle of the screen. On a standard Broodwar screen, much smaller, the box is in the lower right.

The display only shows the times for one frame. I found it less useful than it could be; you have to be watching closely to see any spikes in specific aspects. So yesterday I extended it to remember the high water mark for each aspect, the longest time it has taken during the game. If there are time usage spikes, I can quickly get an initial idea of where they are.

black box with bar graph and two columns of numbers

At the time of the picture, Steamhammer’s supply was 187. Some of the aspects of play named down the left are the same as UAlbertaBot’s; others are new or changed. The bars represent time in milliseconds for the previous frame, the same as the first column of numbers. The second column of numbers is the peak time in milliseconds. (I decided that drawing the peaks on the bar graph would compress the real-time display too much.) Worker management, production, building construction, and combat are the most expensive aspects. Search means BOSS search, which does not happen for zerg, so the 0.4ms peak probably means that the OS dropped in that much delay at some point. The Tasks are a couple of jobs that the bot already did that I converted into tasks, nothing new yet.
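
The mechanism is as simple as it sounds. Here is a minimal sketch of the high water mark idea; it is not Steamhammer’s actual TimerManager, which keeps more bookkeeping than this.

    // Sketch of a per-aspect timer that remembers its peak, using std::chrono.
    // Illustrative only; Steamhammer's TimerManager keeps more bookkeeping than this.
    #include <chrono>

    class AspectTimer
    {
        std::chrono::steady_clock::time_point _start;
        double _lastMs = 0.0;      // time spent this frame
        double _peakMs = 0.0;      // high water mark over the whole game

    public:
        void start() { _start = std::chrono::steady_clock::now(); }

        void stop()
        {
            auto end = std::chrono::steady_clock::now();
            _lastMs = std::chrono::duration<double, std::milli>(end - _start).count();
            if (_lastMs > _peakMs) _peakMs = _lastMs;
        }

        double lastMs() const { return _lastMs; }
        double peakMs() const { return _peakMs; }
    };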

If I run into slowdowns in the future, maybe I’ll extend it more and keep a histogram for each aspect to see how often it is slow.

The code that does the timing is straightforward and exactly like UAlbertaBot’s. Here’s the code to time the Info line of the display.

    _timerManager.startTimer(TimerManager::InformationManager);
    Bases::Instance().update();
    InformationManager::Instance().update();
    _timerManager.stopTimer(TimerManager::InformationManager);

timeout issues

Bruce @ Stardust commented on yesterday’s post. The indented paragraphs are quoted from the comment.

There is some AIIDE news from Discord that relates a bit to your last point about timeouts: PurpleWave has withdrawn from the tournament as it somewhat mysteriously was timing out in most of its games. I say mysteriously since Dan has invested a huge amount of effort into making PurpleWave run asynchronously, which should make timeouts impossible.

Executive summary: That sucks. Purple Dan Gant dug deep into low-level particulars that bot authors really should not have to know, and yet it wasn’t enough.

the problem

Last year when the BWAPI client timing bug was being investigated, some other issues were discovered, like problems relating to the timer not being high-enough resolution and problems with the bot process being pre-empted by the OS and therefore appearing to have spikes that were actually nothing to do with the bot.

BWAPI’s timer. I read the source and saw that BWAPI 4.4.0 times bots using the real-time counter GetTickCount(). The documentation that I just linked says that the timer’s resolution is “typically in the range of 10 milliseconds to 16 milliseconds.” That’s very crude for measuring that an interval does not exceed 55 milliseconds. A measurement “it took 55ms” means “it probably sort of maybe took between 40ms and 70ms, though it depends on your system.” One solution would be to use a high resolution timer in a new BWAPI version. That’s how Steamhammer times itself, with code inherited from UAlbertaBot. Another solution might be to find a way to time accurately from outside BWAPI, somehow accounting for overheads.

BWAPI reports the time through a call BWAPI::Broodwar->getLastEventTime(). A comment in ExampleTournamentModule.cpp explains a workaround in the code to cope with peculiarities that are hard to understand. It’s a code smell, as the authors are well aware, or the comment would not be there. I don’t want to try to figure out if and when the code works as intended.

Both these points appear in the BWAPI issue getLastEventTime() has different behavior for client/module bots, linked by Dan in another comment. Confusion among BWAPI developers in the comment thread shows how hard it is to understand!

Being pre-empted. As I understand it, in the tournament the timer is provided by a multitasking virtual machine which itself runs under a multitasking operating system. Looks like ample opportunity for slippage in every aspect of timing. I don’t know the solution for that. Is it possible to measure something like cpu time + I/O time instead of real time? Surely every operating system keeps track. Would it work better, even when running under a vm that might itself be pre-empted by the host OS? I can think of other potential problems, but that’s a good start!

Experiments with timers in one environment might not tell us about timers in another environment. And yet if we want to hold bots to time limits, then bots need a reliable way to measure their time usage.

a proposed solution

All of this has got me wondering if we should change the approach to timeouts. I think the 1- and 10-second limits are fine, but perhaps the 55ms rule should be an average over the entire game instead of a frame count limit. I’m a bit worried that the current rules will result in more unfair disqualifications or force more bot authors to spend a lot of time working around single-frame spikes, both of which are bad for our already-quite-small community.

That worries me too, and I like your suggestion, especially for tournaments, because tournaments care more about total time needed than time per frame. (For playing against humans, or for streaming, consistent speed counts.) My first thought is that if there is a mean frame time limit, then the limit should be lower than 55ms, perhaps 42ms. Averages are easier to keep low than occasional peaks are. Maybe histograms of frame time for a bunch of bots would help us understand what is best. I’m imagining that the tournament would allow a startup transient, then keep track of the frame count and total frame time, and verify the average periodically, perhaps once per second. Fail and earn an immediate loss.

Dan suggested a rolling average (aka moving average) as a possible alternative. That’s more complicated to implement, but not by much.

There are other averages than the mean. The mean has the advantage of simplicity, and the advantage that the total time allowed is proportional to the length of the game. I think the mean is the right choice. But if the goal is to limit spikes above 55ms (or whatever), then we could choose an averaging function that penalizes those more. Choose appropriately, and the 1-second and 10-second rules could be eliminated because the averaging function takes over their roles.
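
To make the proposal concrete, here is a sketch of what the check might look like on the tournament side. The class name, the grace period, and the thresholds are illustrative, not proposed rule text; it also shows a rolling average next to the whole-game mean for comparison.

    // Sketch of a mean-frame-time check as proposed above: allow a startup
    // transient, then periodically verify that the average frame time stays
    // under a limit. A rolling average over the last N frames is shown as an
    // alternative. All names and numbers are illustrative, not an actual rule.
    #include <deque>

    class FrameTimeJudge
    {
        static constexpr int    StartupFrames = 240;     // ~10 seconds of grace
        static constexpr double MeanLimitMs   = 42.0;
        static constexpr int    CheckEvery    = 24;      // about once per second
        static constexpr size_t Window        = 240;     // rolling window size

        int    _frames = 0;
        double _totalMs = 0.0;
        std::deque<double> _recent;

    public:
        // Feed one frame's measured time; returns false if the bot just failed the limit.
        bool recordFrame(double frameMs)
        {
            ++_frames;
            _totalMs += frameMs;
            _recent.push_back(frameMs);
            if (_recent.size() > Window) _recent.pop_front();

            if (_frames <= StartupFrames || _frames % CheckEvery != 0) return true;

            double meanOverGame = _totalMs / _frames;

            double rollingSum = 0.0;
            for (double ms : _recent) rollingSum += ms;
            double rollingMean = rollingSum / _recent.size();

            // Either test could be the rule; both are shown for comparison.
            return meanOverGame <= MeanLimitMs && rollingMean <= MeanLimitMs;
        }
    };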

real-time systems

I favor making life easy for bot authors, but there’s only so easy it can be made.

Stepping back for a bigger picture, a BWAPI bot is a complex real-time system. If the bot does little work per frame, it is easy to hold it to its real-time promises, no matter the details of the promises. Don’t worry, just run it, it’ll be fine (Steamhammer’s approach so far). If it does a lot of work per frame and risks breaking its promises, then in general it has to decide what work to skip to save time. It needs some way, preferably a smart way, to divide its work into pieces and to choose which pieces to drop or delay (PurpleWave’s approach). It’s much harder. The difficulty is intrinsic to real-time systems: If you want to play as well as possible, and playing your best may take more time than you have, then the bot needs a baked-in system to cut corners.

I can imagine that somebody might provide a real-time framework for bots, but even then not everybody would or should use it. With more to learn, starting a bot would be harder. Maybe it would be good to have a framework with optional real-time features.

I remember BeeBot, interesting but eventually disabled for being too slow. I can at least offer advice for authors whose bots are slow, or in danger of becoming slow. Many of these bots, I think, are by less-experienced programmers who haven’t yet mastered the art of efficient algorithms and structuring their code to avoid unnecessary work. Over-optimization that obfuscates code is an anti-skill for long-term development, but clear and efficient structure is good. Skip computations that you don’t need, calculate on demand data that you may or may not use, cache data that you may reuse, tolerate some out-of-date data if it doesn’t need to be the latest—all easy ideas, but not so easy to become expert at. And that means that the expertise is valuable.

A little more for those who do have the experience. If you’re not familiar with real-time systems, you may not realize: Code with a predictable runtime is often better than fast but unpredictable code. If you know how long it will take, then you can schedule it to safely meet your real-time promises. If it’s faster on average but occasionally takes longer, you may risk breaking your promises. Better yet is code where you decide how long it takes: See anytime algorithms, which offer some answer whenever you stop them, and a better answer if you let them run longer. Many search algorithms have the anytime property.
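
To show the anytime idea in its smallest form, here is a generic sketch; the helper is hypothetical and not from any bot.

    // Sketch of the anytime idea: keep a best-so-far answer and refine it until
    // the time budget runs out. Purely illustrative; "refine" stands in for
    // whatever search or evaluation the bot actually does.
    #include <chrono>

    template <class Answer, class Refine>
    Answer anytime(Answer bestSoFar, Refine refine, double budgetMs)
    {
        auto start = std::chrono::steady_clock::now();
        for (;;)
        {
            double elapsed = std::chrono::duration<double, std::milli>(
                std::chrono::steady_clock::now() - start).count();
            if (elapsed >= budgetMs) break;

            bool improved = refine(bestSoFar);   // one bounded step of work
            if (!improved) break;                // nothing left to improve
        }
        return bestSoFar;                        // always have some answer to return
    }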

what’s next for Steamhammer: the decision

I have decided what tactical skills to work on. My list included skills for specific units: Mutalisks, the most important; lurkers, which I’m most interested in for now; scourge, which Steamhammer spends heavy gas on and doesn’t always use well; and defiler skills, because Steamhammer often reaches late game. But those are only single unit types. And unit coordination skills, like storm dodging, scarab dodging, mine clearing and mine dragging, making the best use of the dark swarm that is on the map—all needed, all narrow and specific. And tactical analysis, my initial favorite. I have an algorithm in mind, which calls for a fast combat evaluator. MaasCraft’s tactical search also uses a fast combat evaluator. My idea is different, and I’m not satisfied with MaasCraft’s evaluator. Thinking through what’s needed, I concluded that the first draft would be easy to write, but would produce poor results. I think it’s likely that it needs a sophisticated combat evaluator to work well—I have an AI algorithm in mind for that too, but I fear I can’t finish it in time for SSCAIT in December.

To make the most progress before SSCAIT, I decided to work on the next level of pathfinding skills. Steamhammer currently calculates terrain paths without regard to where the enemy may be. On an empty map, ground units reach their destinations without getting stuck on terrain. When a unit is trying to reach its destination safely despite the enemy, say a scouting unit or a drone transferring to another base, it reacts to dangers by leaving its path and running away from the enemy. It is not able to figure out a way around (though it may blunder into one), and it is not able to tell when its path is completely blocked and it should give up. So overlords scout less safely and less efficiently than they could, and worse, drones trying to transfer may end up burrowed all over the map, wasting supply and risking their lives to achieve nothing.

Steamhammer needs true safe pathfinding. It has to recalculate safe paths when the enemy is sighted. That opens the door to a lot of more specific skills.

• Don’t send drones to a place you know they can’t reach. This alone would save many games.
• Don’t even spawn extra drones inside a tight contain. They won’t get out.
• Better scouting, from maneuvering the early scouting worker to moving overlords and the Recon squad.
• Calculate least-danger paths for harassment. You can take hits as long as you escape.
• Similarly for drops.
• Reach safe spots to attack walls or other stuff from outside enemy range.
• Enemy vision is a kind of danger too. Find sneaky paths.
• Path through nydus canals. Nydus canals are part of my plan to support islands.

I don’t know how many of these I’ll get to by SSCAIT. There is a lot to it: Ground units and air units have different needs, safe paths and least-danger paths are different, sneaky paths are different. Safe drone transfers are the biggest weakness and have top priority. Part of the solution is to spread hatcheries out more, rather than putting all macro hatcheries in the main.
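
To make “least-danger paths” concrete, here is a sketch of the kind of calculation involved: an ordinary shortest-path search where each step also pays for the danger of the tile it enters. It illustrates the idea; it is not the code I plan to write, and the grid representation and weighting are placeholders.

    // Sketch of least-danger pathfinding on a tile grid: ordinary shortest path,
    // but each step also pays a penalty for how dangerous the destination tile is.
    // With dangerWeight = 0 this is plain distance; raising it gives paths that
    // detour around enemy fire. Illustrative only.
    #include <queue>
    #include <vector>

    struct Node { double cost; int x, y; };
    struct NodeGreater { bool operator()(const Node & a, const Node & b) const { return a.cost > b.cost; } };

    // walkable[y][x] says whether ground units can enter the tile;
    // danger[y][x] is an estimate of damage taken while crossing it.
    std::vector<std::vector<double>>
    leastDangerCosts(const std::vector<std::vector<bool>> & walkable,
                     const std::vector<std::vector<double>> & danger,
                     int startX, int startY, double dangerWeight)
    {
        const int h = int(walkable.size());
        const int w = int(walkable[0].size());
        std::vector<std::vector<double>> best(h, std::vector<double>(w, 1.0e18));

        std::priority_queue<Node, std::vector<Node>, NodeGreater> frontier;
        best[startY][startX] = 0.0;
        frontier.push({ 0.0, startX, startY });

        const int dx[4] = { 1, -1, 0, 0 };
        const int dy[4] = { 0, 0, 1, -1 };

        while (!frontier.empty())
        {
            Node n = frontier.top(); frontier.pop();
            if (n.cost > best[n.y][n.x]) continue;          // stale entry

            for (int d = 0; d < 4; ++d)
            {
                int nx = n.x + dx[d], ny = n.y + dy[d];
                if (nx < 0 || ny < 0 || nx >= w || ny >= h || !walkable[ny][nx]) continue;
                double step = 1.0 + dangerWeight * danger[ny][nx];
                if (n.cost + step < best[ny][nx])
                {
                    best[ny][nx] = n.cost + step;
                    frontier.push({ n.cost + step, nx, ny });
                }
            }
        }
        return best;        // best[y][x] == 1e18 means the tile is unreachable: give up
    }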

The first part of the job was to create a task manager to run background tasks. It’s simple, I wrote it yesterday. The idea is that pathfinding tasks will update safe pathfinding data structures behind the scenes, so that the calculation load is spread out and the data is reasonably up-to-date. Over time, I expect to add a lot of other kinds of tasks. Steamhammer runs fast, and for now there is little risk of overstepping the frame time limit. (Even in the late game when maxed, most frames take a handful of milliseconds, and spikes above 20ms are rare.) But I have thought up plenty of complicated tasks, and it seems likely to become an issue someday. I want the infrastructure to be ready, so that I can implement a principled solution instead of refactoring a lot of code when the day arrives.
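
The skeleton is about this simple. The sketch below shows the shape of the idea with invented names and an arbitrary per-frame budget; it is not the actual implementation.

    // Sketch of a background task manager: each frame, give queued tasks a small
    // time budget so long-running work (like refreshing safe-path data) is spread
    // over many frames. Names and the budget are illustrative.
    #include <chrono>
    #include <deque>
    #include <memory>

    class Task
    {
    public:
        virtual ~Task() = default;
        // Do a bounded chunk of work; return true when the task is finished.
        virtual bool step() = 0;
    };

    class TaskManager
    {
        std::deque<std::unique_ptr<Task>> _tasks;

    public:
        void add(std::unique_ptr<Task> task) { _tasks.push_back(std::move(task)); }

        // Called once per frame. Run tasks round-robin until the budget is used up.
        void update(double budgetMs = 1.0)
        {
            auto start = std::chrono::steady_clock::now();
            while (!_tasks.empty())
            {
                double elapsed = std::chrono::duration<double, std::milli>(
                    std::chrono::steady_clock::now() - start).count();
                if (elapsed >= budgetMs) break;

                std::unique_ptr<Task> task = std::move(_tasks.front());
                _tasks.pop_front();
                if (!task->step())
                {
                    _tasks.push_back(std::move(task));   // not done; run again later
                }
            }
        }
    };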

Steamhammer 3.5.11 change list

According to tradition, a new Steamhammer version drops in elo on BASIL at first. It takes around 2 months for changes to settle into the learning data before the elo reaches a new equilibrium. The new AIIDE tournament version has broken tradition and started out with an elo rise instead. It’s an early sign that I may have a successful version.

Last night I uploaded the “bug fix” version 3.5.11 to SSCAIT and SCHNAIL. It has 9 small changes over the tournament version of Steamhammer, a lot more than I planned. Only 3 changes are proper bug fixes. All of them are meant to prevent bad behavior or use resources more efficiently, so they fix play bugs if not code bugs. For debug flags, this time I turned on drawing of not only the clusters but also the combat sim info (drawn alongside the cluster info and in combat areas) and the static defense plan. It makes for a busy display.

operations

Estimate when one of our bases is doomed to be destroyed, so that we can stop spending resources on it. Any code that wants to know can call base->isDoomed(). For this first pass, I made it conservative; it checks a few conditions and does a quick comparison of defenders to attackers to see if the fight is very lopsided. If it says the base is doomed, then it is under attack and there truly is little chance that it can be saved (though you never know, maybe the opponent will do something else). The feature has many uses that I’m sure I’ll get to, but for now only one is implemented (keep reading).
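
As a sketch of the flavor of the check (the names and the 3x threshold are invented; the real conditions are more careful):

    // Sketch of the "is this base doomed?" idea: the base must actually be under
    // attack, and the defenders must be badly outgunned. Illustrative numbers and
    // names; the real check looks at more conditions than this.
    bool baseLooksDoomed(bool underAttack,
                         double defenderCombatValue,
                         double attackerCombatValue,
                         bool helpOnTheWay)
    {
        if (!underAttack) return false;      // nothing to be doomed by
        if (helpOnTheWay) return false;      // the fight isn't settled yet
        // Very lopsided fight: call it lost and stop sinking resources into it.
        return attackerCombatValue > 3.0 * defenderCombatValue;
    }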

static defense

Don’t add more sunkens or spores at a doomed base. They’ll die too and accomplish nothing. The weakness was glaring; now it should be only staring.

• Limit front-line sunkens versus terran to 6 at most, 5 at other bases. (Against bots, rather than humans, Steamhammer makes at most 1 sunken at other bases, to prevent casual raids. Almost all bots concentrate on attacking the main with its front line.)

• The plan/execute loop runs more often, to reduce the delay in adding defenses when in a hurry.

• The controller could mistakenly order multiple copies of a prerequisite building, like a forge for cannons or an evolution chamber for spore colonies. Fixed.

• There was one last place where a building was posted directly to the building manager instead of queued for production: The prerequisite building. Fixed. It caused no known bug, but queueing the building likely avoids rare problems.

zerg

Fixed a production freeze that was possible when the enemy went mass air. This was an interesting one, because it was a completely different mechanism than any other production freeze I’ve seen. In the unit mix calculation, if the best unit for the mix was devourers and we already had as many devourers as we should, then the code rejected the choice it had committed to. The unit mix fell back on the default, drones as the only unit to make. By the time this happens, it is late in the game and Steamhammer already has as many drones as it wants. So it replaces lost or used drones, makes urgent units like scourge, keeps up with its upgrades... and produces no other units. The fix was to reject devourers up front in that case, so that the calculation finds a different best unit.

• If we have excess minerals and gas, make a lair and/or research burrow solely to use up some of the excess. It happens occasionally, and if the game continues we’ll want both eventually.

• If air carapace has reached +3 and we still have many mutalisks and/or guardians, start getting air attack upgrades too. Might as well, I figured. I uncommented a snippet of code that I wrote years ago, back when Steamhammer’s air upgrades never went beyond +2.

AIIDE 2021 dropouts

The AIIDE 2021 list of entrants says that all 3 of the new names did not submit: Taiji, real5drone, and BlueSoup. That leaves 8 familiar names and 3 bots carried over from last year, 11 total. See AIIDE 2021 prospects.

Unfortunate but unsurprising. :-( Lately new bots have been dropping out of tournaments at a high rate. I will keep advising authors to participate if they can. Even if you think you’re not ready, it’s worth it. If your bot plays games without crashing more than occasionally, you have nothing to lose and experience to gain.

the prototypical series on SCHNAIL

SCHNAIL players who try out Steamhammer often play a series of games one after another, and if they liked it they come back another day for more. I’ve watched enough of these series that I have a sense of the patterns they follow. Everybody’s different, of course, and Steamhammer’s play has random elements too. But often enough, a series with a terran or protoss opponent who is well-matched with Steamhammer more or less follows a prototypical sequence of four steps.

1. Get busted. Steamhammer is tuned against bots, where early aggression is successful, so it often starts out with a bust. Apparently many humans at this level are not quite prepared. Against terran it breaks in with zerglings or lurkers, against protoss with lings, hydras, or mutas. (Terrans are ready for mutalisks.)

2. Tighten up defenses. Players at this level figure out how to stop an incoming Steamhammer rush within a few games. That’s typically good for two or three wins before Steamhammer tries something else.

3. Get overrun by macro. Players at this level also tend to be too passive. Maybe macro and scouting and whatnot uses up their bandwidth, or maybe they’re used to being fine if they stay at home for a while. If the player goes active and begins attacks too late into the middle game, Steamhammer has already started to outmacro them and, even if it loses bases along the way, will finally win with hive tech.

4. Learn to attack actively. And players at this level don’t take long to understand how to react to zerg macro: Don’t let the zerg macro, but attack expansions aggressively. The new static defense code makes more sunkens at exposed outer bases against humans (not against bots), which helps them survive. But Steamhammer is not strong at defense, and players are fairly successful at taking the bases down anyway. After figuring this out, the human player will win games indefinitely, sometimes all games. If the two are closely matched, the games may be long and difficult.

Alternately, a terran may make one big timing attack into the zerg natural, and break through. Steamhammer can usually deter this plan versus protoss.

It seems to me that if you start out struggling to beat Steamhammer, and without using any special anti-bot tricks learn to defeat it, then you must have improved your play. Tight defense and active play are good. The same skills you polished to beat the bot will help you against other opponents.

Of course, many series don’t go this way at all. A player of different strength may get all losses or all wins. Several days ago one player played a long series of cannon rushes on the 2-player map Destination, first trying to push cannons from the side of the zerg base, then in later games switching to cannon the natural. The rush, after adding proxy gateways, often eventually destroyed the zerg main, despite being slowed by defenses, and units from the proxy gates were then able to move out and destroy more bases. But protoss was never able to stop zerg from expanding, and ended up losing every game, usually after defensive cannons in the protoss base suddenly fell. Steamhammer made many missteps, but I was pleased with the defense against cannon pushes. This was likely a player trying out the strategy for fun.

plan then execute

I notice that in coding Steamhammer features, I increasingly employ a pattern of separating a planning phase and an execution phase. In the Steamhammer change list a couple days ago, I described new code for ground upgrades. It’s only 78 lines, including comments and blank lines. The planning phase looks at the game and decides on a priority order for melee attack, missile attack, and carapace upgrades. The execution phase carries out the top-priority upgrades when everything needed is available. By separating the concerns, each phase has a smaller job that is easier to understand. The only cost is that you need a data structure to carry the plan between phases.
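
In outline, the pattern looks something like the sketch below. The condition and helper names are made up; only the shape reflects the real code: a planning function that returns an ordered list, and an execution function that consumes it.

    // A stripped-down illustration of the plan-then-execute pattern for upgrades.
    // The condition and helper names are made up; only the shape of the code is the point.
    #include <vector>

    enum class UpgradeKind { MeleeAttack, MissileAttack, Carapace };

    // Planning phase: look at the game and decide a priority order.
    std::vector<UpgradeKind> planGroundUpgrades(bool mostlyMeleeUnits, bool enemyMostlyRanged)
    {
        std::vector<UpgradeKind> plan;
        if (mostlyMeleeUnits) plan.push_back(UpgradeKind::MeleeAttack);
        else                  plan.push_back(UpgradeKind::MissileAttack);
        if (enemyMostlyRanged) plan.push_back(UpgradeKind::Carapace);
        // ... and so on; the plan is just an ordered list carried to the next phase.
        return plan;
    }

    // Execution phase: start the highest-priority upgrade whose prerequisites
    // and resources are available, and nothing else.
    template <class CanStart, class Start>
    void executeGroundUpgrades(const std::vector<UpgradeKind> & plan, CanStart canStart, Start start)
    {
        for (UpgradeKind u : plan)
        {
            if (canStart(u))
            {
                start(u);
                return;
            }
        }
    }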

On a larger scale, the static defense controller works the same. The planning phase does not go into detail about each base, but figures out how much defense is needed for each category of base: The front line needs this many sunkens, exposed outer bases need that many, and so on. The execution phase runs the following frame, and works out the details of which specific bases need more defensive buildings, and where exactly they should be placed, and how fast they should be made. Compared to most of Steamhammer, the code is straightforward and easy to understand, and I give part of the credit to the separation of planning and execution.

On a larger scale yet is the Micro module. It accepts orders for individual units, remembers the orders, and carries them out over however many frames they take. It figures out how to kite hydras and tries to solve problems like stuck units. Micro constitutes the execution phase for individual unit micro; its job is to make life easier for the rest of the bot. It is incomplete and not as pretty as the static defense controller, but I see it as benefiting from the same general idea.

as an architectural principle

It seems to me that completely separating planning from execution at the top level of the frame loop could be a good architectural choice. onFrame() might look like this:

void PerfectBot::onFrame()
{
  GameState state = collectGameState();
  GamePlan plan = analyzeAndPlan(state);
  execute(state, plan);
}

The planner would presumably be made up of many different modules, each planning a different aspect of play: Production, scouting, army movement, and so on. A minor point of doing all planning up front, before any execution, is that the execution phase then always sees a consistent view of the game; nothing is out of date because that module hasn’t run yet this frame. The major point is that each aspect of the planner has access to the others, so that (at least in principle) resources can be allocated well, conflicting goals can be reconciled, and tradeoffs can be resolved using all information. All this happens before the bot takes any action, so it should be easier to arrange for it to take good actions. For example, if the planner assigns each unit one job, then the bot should never have bugs where two modules both think they control the same unit (which has happened to Steamhammer).

The execution phase would presumably have many modules too, one for each executable aspect of the plan. They might be parallel to the analysis modules, but I don’t see that they have to be.

Compare CherryPi’s blackboard architecture. The blackboard is a global data structure which lets program modules communicate with each other. A blackboard is a good foundation for separating planning from execution, whether at the frame loop level or otherwise, and CherryPi uses it that way.
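
For readers who haven’t met the idea, a toy blackboard can be as small as the sketch below; CherryPi’s is of course far more elaborate.

    // Toy blackboard: a shared key-value store that planning modules write and
    // execution modules read. A bare illustration of the idea only.
    #include <map>
    #include <string>
    #include <variant>

    class Blackboard
    {
        using Value = std::variant<int, double, bool, std::string>;
        std::map<std::string, Value> _entries;

    public:
        template <class T>
        void post(const std::string & key, const T & value) { _entries[key] = Value(value); }

        template <class T>
        T get(const std::string & key, const T & fallback) const
        {
            auto it = _entries.find(key);
            if (it == _entries.end()) return fallback;
            if (const T * p = std::get_if<T>(&it->second)) return *p;
            return fallback;
        }
    };

    // A planner module might post("armyStance", std::string("defend")), and the
    // squad execution module later reads it with get<std::string>("armyStance", "idle").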