
CoG 2021 compared to 2020

CoG 2020 last year had 8 participants and played 5600 games, which works out to 200 for each pair of opponents. CoG 2021 this year had 9 participants and played 1800 games, far fewer. That made 50 games per pairing, 1/4 as many for each pair of opponents. The bot winning rates look mostly well-separated, so probably the smaller game count did not much affect the finishing order. Only #4 Microwave/#5 PurpleWave and #6 XiaoYi/#7 BetaStar are closely ranked and might have swapped places. Surely the top 3 would have finished in the same order. The shorter tournament does theoretically give an edge to bots which don’t learn, or don’t learn much, or otherwise reach their learning asymptote quickly, as compared to bots which can keep improving their decisions over a long period.

Last year, 23 of the 5600 games failed to complete with a result, and were not counted—about 4 per thousand. This year, it was 10 of 1800 games—over 5 per thousand. Reliability is about the same, probably because the same bot was responsible for most failures in both years, the technically tricky MetaBot.

And of course the maps are different, with Great Barrier Reef alone played in both years. With only 5 maps total, the different map choices may not average out nicely; there are likely to be accidental differences in how well each bot likes the maps on average. The difference should be pretty small, though I would still prefer 10 maps over 5.

holdover results

#6 XiaoYi, #7 BetaStar, and #8 MetaBot are carried over from last year—and also from before. We have a brief history of how well they’ve done against the field.

bot        2019     2020     2021
BetaStar   67.41%   51.73%   39.29%
XiaoYi     72.21%   36.57%   40.10%
MetaBot    59.04%   11.02%   23.08%

BetaStar performed worse than last year, but XiaoYi and MetaBot both performed better. That is because the added participant this year was bottom-ranked CUNYbot, which all opponents defeated with overwhelming scores. The field is weaker, so the holdovers look stronger.

holdovers versus updated bots

CUNYbot did not play last year (it last competed in CoG 2019), but all other participants are the same. We can make a closer comparison of 2020 and 2021 by leaving out the newcomer CUNYbot and considering only the holdovers and the updated returning bots. (It’s possible only because the tournament has lost popularity and, in the last 2 years, has retained only a hard core.)

bot        2020     2021
BetaStar   51.73%   33.43%
XiaoYi     36.57%   31.81%
MetaBot    11.02%   13.82%

In this virtual tournament, MetaBot was able to hold its low position without slipping further, but the other 2 clearly fell since last year. The field ex CUNYbot seems to have become tougher.

holdover tournament

I noticed something funny, though, that makes me question repeatability. Here are the 2020 and 2021 results of the subtournament played among the holdover bots only. They look different, and have a different finishing order.

2020          overall           Beta          XIAO          Meta
#1 BetaStar   262/388 67.53%    -             93/200 46%    169/188 90%
#2 XIAOYI     260/400 65.00%    107/200 54%   -             153/200 76%
#3 MetaBot    66/388 17.01%     19/188 10%    47/200 24%    -

2021          overall          XIAO         Beta         Meta
#1 XIAOYI     64/99 64.65%     -            32/50 64%    32/49 65%
#2 BetaStar   51/97 52.58%     18/50 36%    -            33/47 70%
#3 MetaBot    31/96 32.29%     17/49 35%    14/47 30%    -

The bots are the same. Do the different maps make that much difference? Let’s compare the numbers by map (I put these two tables into the same sort order, not following the finishing order).

2020          overall   BlueSt   Alchem   GreatB   Androm   LunaTh
#2 XIAOYI     65.00%    85%      81%      59%      62%      38%
#1 BetaStar   67.53%    59%      65%      82%      58%      73%
#3 MetaBot    17.01%    6%       4%       9%       28%      40%

2021          overall   Rideof   GreatB   NeoAzt   NeoSni   Python
#1 XIAOYI     64.65%    75%      55%      55%      74%      65%
#2 BetaStar   52.58%    55%      58%      53%      35%      63%
#3 MetaBot    32.29%    20%      37%      42%      42%      21%

Compare the columns for Great Barrier Reef, the map that was the same both years. BetaStar and MetaBot both performed very differently on Great Barrier Reef. The same bots, with the same opponents, on the same map, produced different results. The smaller number of games played this year could add random variation, but this seems like a lot to me.

I assume the difference is due to the randomness of learning. A bot trying to adapt to an opponent may randomly hit on a good idea soon and learn quickly, or may hit on the good idea later and learn slowly. I’ve seen that make a huge difference. I can’t rule out that the difference is due to a change in tournament conditions; in fact, the much stronger performance of MetaBot this year and weaker performance of BetaStar suggests that there may be a change in tournament conditions. No matter the cause, I have to think that year-over-year comparisons are questionable. In other words, this whole post was a waste, and I might as well have forgotten history!

CoG 2021 breakdown by map

For each CoG 2021 bot, how it performed against each opponent on each map. Where all games were played, each cell includes only 10 games, so the numbers are noisy. You can go back to compare with last year if you like.

Stardust      overall   Rideof   GreatB   NeoAzt   NeoSni   Python
BananaBrain   92%       100%     90%      70%      100%     100%
McRave        42%       40%      60%      40%      20%      50%
Microwave     100%      100%     100%     100%     100%     100%
PurpleWave    92%       70%      100%     90%      100%     100%
XIAOYI        100%      100%     100%     100%     100%     100%
BetaStar      96%       90%      100%     90%      100%     100%
MetaBot       100%      100%     100%     100%     100%     100%
CUNYbot       100%      100%     100%     100%     100%     100%
overall       90.25%    88%      94%      86%      90%      94%

Maybe PurpleWave has a cheese build that it preferred against Stardust on the 2-player map Ride of Valkyries? It might be only random variation.

BananaBrain   overall   Rideof   GreatB   NeoAzt   NeoSni   Python
Stardust      8%        0%       10%      30%      0%       0%
McRave        74%       80%      80%      70%      80%      60%
Microwave     64%       20%      90%      70%      90%      50%
PurpleWave    70%       70%      60%      70%      70%      80%
XIAOYI        98%       100%     100%     100%     100%     90%
BetaStar      88%       90%      100%     80%      80%      90%
MetaBot       98%       90%      100%     100%     100%     100%
CUNYbot       98%       100%     100%     100%     100%     90%
overall       74.69%    69%      80%      78%      78%      70%

Compared to last year, BananaBrain looks more consistent across maps. But look at that severe loss versus Microwave on Ride of Valkyries! Is that a BananaBrain weakness, or a Microwave strength?

McRave        overall   Rideof   GreatB   NeoAzt   NeoSni   Python
Stardust      58%       60%      40%      60%      80%      50%
BananaBrain   26%       20%      20%      30%      20%      40%
Microwave     78%       70%      80%      90%      90%      60%
PurpleWave    70%       60%      70%      80%      70%      70%
XIAOYI        38%       20%      90%      50%      10%      20%
BetaStar      84%       100%     90%      70%      80%      80%
MetaBot       92%       89%      90%      90%      100%     90%
CUNYbot       100%      100%     100%     100%     100%     100%
overall       68.17%    65%      72%      71%      69%      64%

McRave is also consistent across maps—except against Stardust and XiaoYi, where it wobbled widely.

Microwave     overall   Rideof   GreatB   NeoAzt   NeoSni   Python
Stardust      0%        0%       0%       0%       0%       0%
BananaBrain   36%       80%      10%      30%      10%      50%
McRave        22%       30%      20%      10%      10%      40%
PurpleWave    78%       90%      80%      90%      40%      90%
XIAOYI        80%       100%     100%     70%      40%      90%
BetaStar      44%       70%      60%      30%      50%      10%
MetaBot       82%       80%      100%     78%      90%      60%
CUNYbot       92%       100%     80%      100%     80%      100%
overall       54.14%    69%      56%      51%      40%      55%

Microwave does not look consistent, though. Differing results on different maps give a clue about strengths and weaknesses, or in other words, they point out stuff that you can fix... if you can decipher the clue. Microwave’s trouble on Neo Sniper Ridge seems to be due mainly to PurpleWave and XiaoYi.

PurpleWave    overall   Rideof   GreatB   NeoAzt   NeoSni   Python
Stardust      8%        30%      0%       10%      0%       0%
BananaBrain   30%       30%      40%      30%      30%      20%
McRave        30%       40%      30%      20%      30%      30%
Microwave     22%       10%      20%      10%      60%      10%
XIAOYI        90%       100%     90%      100%     70%      90%
BetaStar      58%       70%      50%      70%      60%      40%
MetaBot       96%       90%      100%     100%     89%      100%
CUNYbot       86%       90%      100%     80%      70%      90%
overall       52.14%    57%      53%      52%      51%      48%

XIAOYI        overall   Rideof   GreatB   NeoAzt   NeoSni   Python
Stardust      0%        0%       0%       0%       0%       0%
BananaBrain   2%        0%       0%       0%       0%       10%
McRave        62%       80%      10%      50%      90%      80%
Microwave     20%       0%       0%       30%      60%      10%
PurpleWave    10%       0%       10%      0%       30%      10%
BetaStar      64%       70%      50%      70%      70%      60%
MetaBot       65%       80%      60%      40%      78%      70%
CUNYbot       98%       90%      100%     100%     100%     100%
overall       40.10%    40%      29%      36%      53%      42%

BetaStar      overall   Rideof   GreatB   NeoAzt   NeoSni   Python
Stardust      4%        10%      0%       10%      0%       0%
BananaBrain   12%       10%      0%       20%      20%      10%
McRave        16%       0%       10%      30%      20%      20%
Microwave     56%       30%      40%      70%      50%      90%
PurpleWave    42%       30%      50%      30%      40%      60%
XIAOYI        36%       30%      50%      30%      30%      40%
MetaBot       70%       80%      67%      78%      40%      89%
CUNYbot       80%       90%      100%     70%      60%      80%
overall       39.29%    35%      39%      42%      32%      48%

Like last year, BetaStar shows inconsistency, coming across as strong but brittle. Against the strongest opponents, it simply broke.

MetaBot       overall   Rideof   GreatB   NeoAzt   NeoSni   Python
Stardust      0%        0%       0%       0%       0%       0%
BananaBrain   2%        10%      0%       0%       0%       0%
McRave        8%        11%      10%      10%      0%       10%
Microwave     18%       20%      0%       22%      10%      40%
PurpleWave    4%        10%      0%       0%       11%      0%
XIAOYI        35%       20%      40%      60%      22%      30%
BetaStar      30%       20%      33%      22%      60%      11%
CUNYbot       86%       90%      100%     80%      90%      70%
overall       23.08%    23%      23%      25%      24%      21%

CUNYbot       overall   Rideof   GreatB   NeoAzt   NeoSni   Python
Stardust      0%        0%       0%       0%       0%       0%
BananaBrain   2%        0%       0%       0%       0%       10%
McRave        0%        0%       0%       0%       0%       0%
Microwave     8%        0%       20%      0%       20%      0%
PurpleWave    14%       10%      0%       20%      30%      10%
XIAOYI        2%        10%      0%       0%       0%       0%
BetaStar      20%       10%      0%       30%      40%      20%
MetaBot       14%       10%      0%       20%      10%      30%
overall       7.50%     5%       2%       9%       12%      9%

Next: Comparing with last year.

CoG 2021 tables

The CoG 2021 result tables. First, the crosstable. This is identical to the official crosstable, only presented differently. If every game in the tournament had completed successfully, there would be 1800 games. In fact 10 games were counted as uncompleted, so there are 1790 games in the table. An edge case was game 377, PurpleWave versus McRave on Ride of Valkyries, which was recorded as GAME_STATE_NEVER_DETECTED but also said that McRave crashed, a curious result. The official table counted it as a successfully completed game, a win for PurpleWave, despite the fact that it apparently never started. So I followed suit.

If all games had completed, each bot would have played 400 games total, 50 games against each opponent. Old MetaBot was the cause of the shortfall; it accounted for all 10 uncompleted games, distributed more or less randomly among its opponents. Of the 9 * 8 / 2 = 36 pairings, only 3 were upsets: #3 McRave > #1 Stardust, #6 XiaoYi > #3 McRave, and #7 BetaStar > #4 Microwave.

                 overall          Star        Bana        McRa        Micr        Purp        XIAO        Beta        Meta        CUNY
#1 Stardust      361/400 90.25%   -           46/50 92%   21/50 42%   50/50 100%  46/50 92%   50/50 100%  48/50 96%   50/50 100%  50/50 100%
#2 BananaBrain   298/399 74.69%   4/50 8%     -           37/50 74%   32/50 64%   35/50 70%   49/50 98%   44/50 88%   48/49 98%   49/50 98%
#3 McRave        272/399 68.17%   29/50 58%   13/50 26%   -           39/50 78%   35/50 70%   19/50 38%   42/50 84%   45/49 92%   50/50 100%
#4 Microwave     216/399 54.14%   0/50 0%     18/50 36%   11/50 22%   -           39/50 78%   40/50 80%   22/50 44%   40/49 82%   46/50 92%
#5 PurpleWave    207/397 52.14%   4/50 8%     15/50 30%   15/50 30%   11/50 22%   -           45/50 90%   29/50 58%   45/47 96%   43/50 86%
#6 XIAOYI        160/399 40.10%   0/50 0%     1/50 2%     31/50 62%   10/50 20%   5/50 10%    -           32/50 64%   32/49 65%   49/50 98%
#7 BetaStar      156/397 39.29%   2/50 4%     6/50 12%    8/50 16%    28/50 56%   21/50 42%   18/50 36%   -           33/47 70%   40/50 80%
#8 MetaBot       90/390 23.08%    0/50 0%     1/49 2%     4/49 8%     9/49 18%    2/47 4%     17/49 35%   14/47 30%   -           43/50 86%
#9 CUNYbot       30/400 7.50%     0/50 0%     1/50 2%     0/50 0%     4/50 8%     7/50 14%    1/50 2%     10/50 20%   7/50 14%    -

The table of bot performance on each map is also identical to the official table, except that it shows win rates instead of win counts. When no games were missed, each bot played 80 games on each map.

                 overall   Rideof   GreatB   NeoAzt   NeoSni   Python
#1 Stardust      90.25%    88%      94%      86%      90%      94%
#2 BananaBrain   74.69%    69%      80%      78%      78%      70%
#3 McRave        68.17%    65%      72%      71%      69%      64%
#4 Microwave     54.14%    69%      56%      51%      40%      55%
#5 PurpleWave    52.14%    57%      53%      52%      51%      48%
#6 XIAOYI        40.10%    40%      29%      36%      53%      42%
#7 BetaStar      39.29%    35%      39%      42%      32%      48%
#8 MetaBot       23.08%    23%      23%      25%      24%      21%
#9 CUNYbot       7.50%     5%       2%       9%       12%      9%

And one table that is not in the official results, bot performance against each opponent race. There is only one terran bot, #6 XiaoYi, so the vT column is not very interesting. #5 PurpleWave was the only bot to perform equally against protoss and zerg. Since there are so few opponents, it’s not clear whether that’s meaningful.

bot           race      overall   vT     vP    vZ
Stardust      protoss   90.25%    100%   95%   81%
BananaBrain   protoss   74.69%    98%    66%   79%
McRave        zerg      68.17%    38%    66%   89%
Microwave     zerg      54.14%    80%    48%   57%
PurpleWave    protoss   52.14%    90%    47%   46%
XIAOYI        terran    40.10%    -      28%   60%
BetaStar      protoss   39.29%    36%    31%   51%
MetaBot       protoss   23.08%    35%    9%    38%
CUNYbot       zerg      7.50%     2%     10%   4%

Next: The per-bot map tables.

CoG 2021 results first look

CoG 2021 results have been posted to their website, with the detailed results file. You can look back at my view of the prospects from June. Of the 10 entrants, one dropped out, the unknown newcomer Granite. 9 participants is one more than in 2020; in 2019 there were 27. I’ll be doing my usual analyses.

#   bot           %        updated?
1   Stardust      90.25%   new
2   BananaBrain   74.69%   new
3   McRave        68.17%   new
4   Microwave     54.14%   new
5   PurpleWave    52.14%   new
6   XiaoYi        40.10%   holdover
7   BetaStar      39.29%   holdover
8   MetaBot       23.08%   holdover
9   CUNYBot       7.50%    new

The big news is that we see the first signs of zerg resurgence in the face of protoss dominance. #3 McRave and #4 Microwave pulled in front of #5 PurpleWave and #7 BetaStar, both tough opponents. BetaStar may be a holdover, but it ran ahead of the zergs last year (and this year it narrowly upset Microwave head-to-head). With holdover #6 XiaoYi as the only terran, we did not get a read on terran progress.

#1 Stardust annihilated every opponent except #3 McRave, which scored an upset. If you’ve watched games recently, that should not be a surprise. #2 BananaBrain defeated every opponent except Stardust, but not as overwhelmingly. #3 McRave, closer to #2 BananaBrain above than to #4 Microwave below, lost only to BananaBrain and to #6 XiaoYi. Tournament replays are not released, but lately McRave has been going mutalisks in every game that I’ve seen. I take it as confirmation that McRave’s steady improvements in mutalisk control have been due to effort well spent—it truly is the most important zerg skill to master at this level of play.

The randomly-chosen maps are not clearly spelled out on the website. They were (2) Ride of Valkyries, (3) Great Barrier Reef (renamed El Niño in a later version), (3) Neo Aztec, (4) Neo Sniper Ridge, and (4) Python. As far as I can judge from the map results table, most bots performed about equally on all maps. The major exception is that #4 Microwave disliked Neo Sniper Ridge while #6 XiaoYi liked it.

CoG 2021 prospects

The registered competitors for CoG 2021 were announced a few days ago on the results page. The submission deadline is not until 25 July.

Here are the new entrants. I sorted them in order of their current BASIL rank, which is a fair guess at how the tournament may work out.

  • BananaBrain
  • Stardust
  • Microwave
  • McRave
  • CUNYbot (aka Bryan Weber)
  • Granite

As I write, BananaBrain and Stardust are rated identically on BASIL, at 3068 elo. BananaBrain has fallen a bit; recently it was higher. I think it is likely that Stardust will get a tournament update, since it hasn’t had a public update in half a year. BananaBrain has had frequent updates and is closer to current. If so, then Stardust remains the favorite to hold its #1 finish. But BananaBrain has improved a lot, and tournaments are decided by results, not predictions! CUNYbot is the baby here, rated about 200 elo lower than McRave zerg. But it also has gone a long time without updates, and I don’t expect it to play without a tournament update—plus it showed strong improvement before updates paused. It may be able to hold itself off the floor.

Granite is a new name. CoG does not list authors or any other info, so I know nothing more about it. Historically, new names are often disappointing, but occasionally one is great. I have hope.

Here are the carryovers. PurpleWave is last year’s version (I expect that Purple Dan is not satisfied with progress). The others are being carried over for the second year. I sorted these by their finish in CoG 2020.

  • PurpleWave
  • BetaStar
  • XiaoYi
  • MetaBot

That makes 6 protoss, 3 zerg, and 1 meager terran (the same lonely terran as last year), and the terran scored only 37% in last year’s tournament. The race balance is lopsided. XiaoYi and MetaBot are likely to end up as punching bags that stronger bots try to score almost perfectly against.

Last year, 8 bots competed. This year, 10 bots are expected. But one constant has been that not all registrants end up submitting an acceptable bot—last year also had 10 bots expected, and 2 dropped out. Maybe this year will be better?

AIST S4 results

AIST S4 results are published. Despite my optimism, Steamhammer scored 0-2 in its first match then 0-2 in the loser’s bracket to be the first knocked out, as in the past. In fact, it is the worst result ever; in its other two tries, Steamhammer scored 1-4 rather than 0-4 as here.

bracket with results

Here are the results in crosstable form, counting games, players in rank order. The tournament counted matches, not games, so you can’t directly read off the tournament results from here. But it may give a different perspective.

#   bot           overall   star   purp   will   drag   bana   stea
1   stardust      7-2       -      5-1    *      *      2-1    *
2   purplewave    7-6       1-5    -      4-0    2-1    *      *
3   willyt        4-5       *      0-4    -      2-1    *      2-0
4   dragon        4-5       *      1-2    1-2    -      2-1    *
5   bananabrain   4-4       1-2    *      *      1-2    -      2-0
6   steamhammer   0-4       *      *      0-2    *      0-2    -

I watched the replays. PurpleWave 2-1 Dragon after Dragon crashed twice; the first game was a convincing win by Dragon. Again versus WillyT, Dragon won once and crashed twice. PurpleWave tried to counter Stardust with reavers, but suffered when bottlenecked at ramps. If you want the best games, I recommend the ones named in the replay pack Series 2, G1, and Series 10, G1.

Steamhammer’s special preparation for AIST S4

AIST S4 starts tomorrow. AIST is a different style of tournament from AIIDE, and I prepared differently for Steamhammer’s specific opponents. From the competitor’s point of view, AIST is a sequence of best-of-3 matches until you lose 2 matches—or win in the best-of-5 final. Steamhammer is unlikely to face the same opponent twice. With only a few games per opponent, Steamhammer’s deep library of builds is irrelevant.

I could have tuned the learning system to work well in short matches, as PurpleWave has done. Instead, I decided it was simpler to disable learning and use Steamhammer’s classic enemy-specific opening selection, which has gathered dust ever since learning was implemented in Steamhammer 1.4. Once I decided, I completed all the prep in a matter of hours. For each of the 5 opponents, I selected between 3 and 5 openings that were different and had high winning rates (except versus Stardust, which Steamhammer almost never defeats, where I picked what I hoped might have a chance). To choose, I looked at saved learning data for Steamhammer and Randomhammer zerg on BASIL, and for Steamhammer on SSCAIT, because each instance came up with different winners thanks to random sampling. Steamhammer will choose randomly from the openings for each opponent, according to probabilities I set, which sometimes vary depending on the map size.
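
For the curious, here is a minimal sketch of what a per-opponent weighted random opening choice can look like. The structure and names are invented for illustration; Steamhammer’s actual configuration format and selection code are different.

    #include <random>
    #include <string>
    #include <vector>

    // Hypothetical example: a hand-set weight per opening for one opponent.
    struct WeightedOpening {
        std::string name;     // e.g. "9PoolHatchSpeedAllIn"
        double weight;        // relative probability, set by hand
    };

    // Draw one opening at random according to the weights. A real bot would
    // look up the opening list by opponent name, and possibly by map size.
    std::string chooseOpening(const std::vector<WeightedOpening> & openings, std::mt19937 & rng)
    {
        std::vector<double> weights;
        for (const WeightedOpening & o : openings) {
            weights.push_back(o.weight);
        }
        std::discrete_distribution<std::size_t> pick(weights.begin(), weights.end());
        return openings[pick(rng)].name;
    }

The point is only that the draw is weighted rather than uniform; the actual openings and probabilities come from the per-opponent preparation described above.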

For those who examine the tournament games, these lists may make the builds easier to interpret.

Stardust

  • 9PoolHatchSpeedAllIn (Styx build)
  • 10Hatch
  • 3HatchLingExpo
  • 3HatchHydra
  • 2x10HatchAllIn

BananaBrain

  • 2x10HatchSlow
  • 10Hatch
  • Over10HatchBust (zergling bust)
  • 3HatchMutaPure
  • OverhatchExpoLing

PurpleWave

  • ZvZ_Overgas9Pool (one hatch muta)
  • 11HatchTurtleHydra
  • 9PoolHatchSpeedAllIn (Styx build)

Dragon

  • AntiFact_2Hatch (2 hatch muta specialized to beat factory builds)
  • AntiFactory2 (3 hatch hydra into muta)
  • 2HatchLurker
  • 3HatchLurker

WillyT

  • 9PoolLurker
  • OverpoolLurker
  • 11Gas10PoolLurker
  • 11HatchTurtleLurker
  • ZvT_3HatchMutaExpo

Steamhammer has virtually no chance to beat Stardust. I’m guessing that Steamhammer has maybe a 20%-30% chance against PurpleWave or BananaBrain, and a better than even chance against each of the terrans. Notice the lurker builds, making use of Steamhammer’s stepped-up lurker skills.

Steamhammer 3.4.8 for AIST S4

I have submitted Steamhammer 3.4.8 to AIST S4. Here is the change list, nice and long considering how long there was to work on it. The deadline isn’t for hours yet, so revealing my Dark Secrets early might theoretically come back to bite me, but... eh, it won’t bite hard. I’m still concealing my specific preparation for these opponents.

I tried for important advances that wouldn’t take long. I think I met my promise that Steamhammer will play a more “lucid” game, but let’s see how many fresh bugs I didn’t notice! The biggest improvements are pathfinding for ground units, smoothing of combat sim results, and more supple lurker behavior. My intention has always, ever since I started Steamhammer in December 2016, been to work on mutalisk micro before lurker micro, because it is more important. But I let it pass for too long, and now other bots have strong muta micro. I’ll get back to mutas (they are still more important), but I always want to do something different from everybody else, and now I want to work on lurkers first.

The irradiated squad is a nice improvement, too. Anyway, read on.

map analysis

• Reject starting bases (as identified by BWAPI::Broodwar->getStartLocations()) which have no minerals; they are no longer considered bases, much less possible starting bases. I think these are always observer slots. The AIST S4 version of Aztec has observer slots. Formerly, Steamhammer accepted whatever BWAPI told it and created a Base object for each one, marked as a starting base. The error had surprisingly little effect on play (Steamhammer is not tempted to expand to valueless “bases”), but it did cause sneaky mistakes. One is that a starting base is assigned a natural if conditions are met, with the result that the center bases on Aztec were considered naturals of some of the observer slots, and that information could be used for certain decisions, like where to make a hidden base. I’m glad I found this by looking at the maps, because the mistaken decisions might have been a nightmare to diagnose!
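
As a rough illustration of the check (my own sketch against the BWAPI interface, not Steamhammer’s actual code), a start location can be accepted only if static mineral patches sit within some radius of it:

    #include <BWAPI.h>

    // Accept a BWAPI start location as a real starting base only if mineral
    // patches exist nearby. The 10-tile radius is an arbitrary example value.
    bool looksLikeRealStart(BWAPI::TilePosition startTile)
    {
        const BWAPI::Position start(startTile);
        for (BWAPI::Unit mineral : BWAPI::Broodwar->getStaticMinerals())
        {
            if (mineral->getInitialPosition().getDistance(start) < 10 * 32)
            {
                return true;    // minerals close enough to mine; a real base
            }
        }
        return false;           // no minerals in range; probably an observer slot
    }

A bot would then create a starting Base object only for the locations from BWAPI::Broodwar->getStartLocations() that pass such a test.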

scouting

• The scout worker is released whenever friendly combat units are near. This is the final relaxation of the originally tight conditions for releasing the scout worker early, which I’ve been progressively relaxing.

• If the enemy has no known anti-air units, send an overlord to each base on the map to keep watch. Formerly, the rule was “if the enemy is not known to have tech to make anti-air units”, which was rarely satisfied outside the early game when there weren’t overlords available to distribute.

squads

An irradiated squad centralizes the code for managing irradiated units and implements fancier behavior. Every irradiated organic unit (the ones harmed by the irradiate spell) goes into the squad, except for queens which have enough energy to cast and defilers after consume is researched—they will try to cast something before dying. Formerly, an irradiated unit that could burrow would, and that was the extent of the reaction (the successful reaction, anyway; mutas had code to separate out an irradiated muta, but it mysteriously broke). Now, an irradiated unit that is near friendly units will burrow if it can, or try to flee from its friends if it cannot. If it is near enemy units, it will approach to expose them to the radiation, and also attack if it can. If neither, it will run and do what scouting it can. Some of the behavior looks questionable, and I suspect bugs, but it’s good enough for now. The reaction of unirradiated units to nearby irradiated units is unchanged—only workers try to protect themselves, others carry on as usual.
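
To make the decision order concrete, here is a toy sketch of the priorities described above. The inputs are hypothetical stand-ins, not the real squad interfaces.

    // Toy sketch of the irradiated-unit priorities: protect friends first,
    // then spread the damage to enemies, otherwise go scout.
    enum class IrradiatedAction { Burrow, FleeFromFriends, ApproachEnemies, Scout };

    IrradiatedAction chooseIrradiatedAction(bool nearFriends, bool nearEnemies, bool canBurrow)
    {
        if (nearFriends)
        {
            return canBurrow ? IrradiatedAction::Burrow : IrradiatedAction::FleeFromFriends;
        }
        if (nearEnemies)
        {
            return IrradiatedAction::ApproachEnemies;   // expose them to the radiation, attack if able
        }
        return IrradiatedAction::Scout;                 // nothing nearby; do what scouting it can
    }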

• A newly-created squad did not pass along its order to its own micromanagers until it was updated. It was an oversight inherited from UAlbertaBot. The main effect is that a defense squad would not begin to act until exactly 8 frames after it was created, so defense was always a little bit late.

• The debug display for squads now shows two new pieces of information: A # symbol if the squad is using pathfinding, and the squad’s status string. The status string is sometimes unchanging, sometimes informative. For example, the Scourge squad has status Attack or Stand By, depending on whether enemy flyers are targetable.

• There was a minor bug in deciding which enemy unit last-seen location to visit next after all known enemy buildings are destroyed. The fix has little effect.

other ops and tactics

Smooth attack/retreat combat sim decisions over time. I think this is the single most important change. Micro is less jittery and more decisive. I’m leaving out details for today; it’s worth a separate post.
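
Purely as a guess at the general shape of such a thing (the real method is promised for a later post and may be different), smoothing usually means something like averaging the sim score over time and requiring a margin before the decision flips:

    // Invented illustration, not Steamhammer's actual method: an exponential
    // moving average of the raw sim score plus hysteresis thresholds.
    class SmoothedSimDecision
    {
        double smoothed = 0.0;
        bool   attack   = false;
    public:
        bool update(double rawScore)
        {
            const double alpha = 0.2;                        // example smoothing factor
            smoothed = alpha * rawScore + (1.0 - alpha) * smoothed;
            if (attack && smoothed < -0.1)  attack = false;  // clearly losing: retreat
            if (!attack && smoothed > 0.1)  attack = true;   // clearly winning: attack
            return attack;
        }
    };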

• I found and fixed several more bugs related to spell units. The most important misbehavior was that Steamhammer treated an enemy comsat scan of a base as an attack that needed to be fended off. Every time terran scanned a base, a couple of mutalisks might peel off the flock and head there to defeat the scan. They never failed!

• I added time hysteresis to the defense squads, on top of their existing range hysteresis. After the enemies have been seen off, the squad waits out a time limit before it is disbanded. I was not convinced that the feature helps play, so I set the time limit to only 1 second.

• When a cluster of units is ordered to retreat, it may decide—depending on a simplified geometry calculation involving a risk radius—to “retreat forward” to join with friends. This helps a small cluster join up with a big cluster that is already in a fight. I decreased the risk radius, except in the case where the enemy is terran and has sieged tanks. It should help small clusters retreat forward more often. A rough sketch of the idea follows this list.

• Earlier, I introduced a bug into FAP by adding an incorrect MAX_DISTANCE constant. I had forgotten that the distances are squared. Fixed.
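
Here is the rough sketch promised in the “retreat forward” item above. The geometry (a midpoint test and example radii) is invented to show the flavor of a risk-radius decision, not Steamhammer’s actual calculation.

    #include <cmath>

    struct Point2D { double x, y; };

    static double distanceBetween(Point2D a, Point2D b)
    {
        return std::hypot(a.x - b.x, a.y - b.y);
    }

    // Retreat forward (toward a friendly cluster) only if the move does not
    // pass too close to the nearest enemy. Radii are example values in pixels.
    bool shouldRetreatForward(Point2D myCluster, Point2D friendlyCluster,
                              Point2D nearestEnemy, bool enemyHasSiegedTanks)
    {
        const double riskRadius = enemyHasSiegedTanks ? 12 * 32.0 : 8 * 32.0;
        const Point2D midpoint { (myCluster.x + friendlyCluster.x) / 2.0,
                                 (myCluster.y + friendlyCluster.y) / 2.0 };
        return distanceBetween(midpoint, nearestEnemy) > riskRadius;
    }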

micro

Pathfinding for ground units for calls to the.micro.Move(), the.micro.MoveNear(), and the.micro.MoveSafely() (but not the.micro.AttackMove()), at the option of the caller. I’ll post details tomorrow or so. It was easy to guess I would do this now, because Medusa is in the AIST S4 map pool, where Starcraft’s built-in pathfinder likes to pile up units at the blocked back doors of the bases.

• A unit trying to move safely (with the.micro.MoveSafely()) does not try to avoid interceptors, only carriers. Trying to avoid swooping interceptors causes erratic movements, not escape movements.

zerg

Lurker behavior is smarter. When I first implemented lurkers in Steamhammer 1.3 in 2017, I found that if they obeyed the combat sim like other squad units, they were nearly useless; they did not understand when to burrow, or where to burrow, or when to unburrow. The SparCraft combat simulator I had at the time did not support lurkers. Steamhammer could not tell when the enemy had detected them. So I gave lurkers spartan hyper-aggressive tactics: Always attack, burrow at max range versus a target that can shoot back and directly next to a target that cannot, and unburrow when no target is in range. I’ve made only minor refinements since, because it worked surprisingly well, especially against terran bots. But the crudeness shows, and hyper-aggressive is often hyper-stupid. A lone lurker would boldly attack a line of cannons. Steamhammer has lost a lot of lurkers for free.

Today Steamhammer is a capable squad commander, and it can judge pretty faithfully when a lurker remains undetected by the enemy and should remain safely burrowed despite (meaning because of) the large enemy force on hand. Lurkers in a defense squad remain hyper-aggressive, so that they do not hesitate to eliminate enemies from the zerg base. Other lurkers now advance or retreat together with the units in their cluster, with some exceptions for retreat (see below) so that lurkers don’t unburrow too often. Lurker play remains clumsy, but it is far more flexible than before, and better overall.

• A lurker that Steamhammer believes is undetected will not retreat as long as it is in range of a target. It will remain burrowed, or it will burrow then and there instead of retreating. I made this change in an earlier version, but it had no effect until now because lurkers did not retreat.

• Failing the above check, if a burrowed lurker is asked to retreat, then there is one more check: Will it survive long enough to unburrow? It calculates an expected survival time in frames and compares it to the time to unburrow plus a safety margin. If it won’t live that long, it doesn’t bother unburrowing; maybe it can get a last shot in. I do the same calculation for sieged tanks, with the unsiege time, so that a tank beset by zealots still has a chance for a last shot at some distant dragoon. A sketch of the calculation follows this list.

• I fixed a bug in the hidden enemies check; it now handles an invalid or missing ui.unit correctly. I don’t think it makes a practical difference with the existing codebase. The hidden enemies check prevents unburrowing when the lurker is undetected and, if it unburrows, at risk of dying to a cloaked unit, or a known unit out of sight on high ground. This check often (not always) prevents the cycle: Burrow on low ground just in range of a bunker up the ramp -> bunker stops shooting because the lurker is no longer visible -> the bunker is no longer visible to the lurker -> the lurker unburrows -> the bunker starts shooting again and becomes visible -> the lurker burrows....

• When possible, a lurker, guardian, or devourer will morph out of enemy view. When within enemy static defense range, it will not morph at all—at least theoretically; I’ve seen it happen so I think the check is not accurate. Lurkers should more often surprise the enemy, and cocoons should be shot down less often.
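
Here is a sketch of the survival check mentioned two items above, the one that decides whether a retreating burrowed lurker should bother to unburrow. The survival estimate and the margin are placeholders of mine, not Steamhammer’s real values.

    // Decide whether a burrowed unit that was told to retreat should unburrow.
    // expectedSurvivalFrames would come from incoming enemy damage versus the
    // unit's remaining hit points; the safety margin is an example value.
    bool worthUnburrowing(int expectedSurvivalFrames, int unburrowFrames)
    {
        const int safetyMargin = 24;    // roughly one second at tournament game speed
        return expectedSurvivalFrames > unburrowFrames + safetyMargin;
    }

The same comparison with the unsiege time covers the sieged tank case.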

An urgent sunken to stop vultures, or an urgent spore to stop wraiths or corsairs, is made quickly, rather than, oh, whenever the bot happens to get around to it, might be any minute now if we’re still alive then. The slowness was due to a bad design decision I made in reworking the code in the SSCAIT tournament version.

Insert a fresh overlord after a spore colony, not before, when making supply and a spore is next in the queue, and similarly for the evolution chamber prerequisite. Formerly, when corsairs ravaged the overlords, Steamhammer gave priority to getting its supply back into the green. The corsairs had only to camp the zerg base for protoss to win.

• When scourge finds that its target is very close, it attacks even if the combat sim says to retreat. Scourge uses a combat simulation, excluding air units, to avoid ground fire. Often it would be on the verge of shooting down a corsair, but a dragoon was a little too close.... Steamhammer has missed a lot of kills that way.

• The defensive sunken versus cannons is tuned slightly.

openings

• There are no new or changed openings, only a change to how one opening is configured to be used. I saw the OverpoolTurtle build, a bot exploit opening that is objectively horrible, played in one game against a human on SCHNAIL, and that was one time too many. I removed it from the matchup and counter configurations, so it should be played very rarely unless it is found to succeed, and almost never against a human.

configuration

• New flag for Crazyhammer. Setting Config::Strategy::Crazyhammer to true causes Steamhammer to choose its openings purely at random from its large library, ignoring anything it may have learned. It’s used for Crazyhammer on SCHNAIL.

• New debug flag Debug::DrawLurkerTactics draws the name of the current default lurker behavior, either “Aggressive” or “With squad”. It doesn’t convey any information at the moment, since whenever there are lurkers they are “With squad”. I plan for lurker behavior to become more varied and complex, and then the debug flag will be useful.

unused stuff

• The movement sim is included and potentially valuable, but unused. There is code in FAP and in CombatSimulation.

• Fixed a primordial crashing bug in enemy unit clustering, making it also potentially valuable. But it’s turned off for now.

Next: Pathfinding.

AIST S4 prospects

Here are the participants in AIST S4 sorted by BASIL elo, with the latest BASIL update as a guide to gauge how much the bot may yet be updated by the submission deadline of 15 March. If it has been a long time since the last update, the author may have taken the opportunity to make big changes, so the elo is a less reliable guide.

bot           elo    update
Stardust      3072   29 Jan
BananaBrain   2940   1 Mar
PurpleWave    2877   14 Jan
Steamhammer   2728   1 Feb
Dragon        2687   1 Oct
WillyT        2565   26 Feb

Elo is pretty good, but even the best forecast can’t predict the results of a knockout tournament with random seeding. If we take elo as perfect, then any bot in the bottom half might get a tough pairing in the first round and fall to the loser’s bracket, then face another tough pairing. In reality elo is not perfect and even a top player might be knocked out early with bad luck in the BO3 matches. Also, 6 players do not make an exact power of 2 bracket, so I expect that some bots will get a bye the first round and the remainder will have to play an extra match, facing a longer road. The bracket works out if 2 bots get a bye the first round, so that 4 play in the first round and 2 drop to the loser’s bracket. Then round 1 and round 2 of the winner’s bracket both have 4 players.

Nevertheless, Stardust is clearly the favorite to win. We probably have a good handle on BananaBrain’s strength, since it was updated only a few days ago. PurpleWave I hope will have fixed the issues that harmed its performance in SSCAIT—unfinished improvements, we’re told—so it may be able to pass BananaBrain and meet Stardust in the final. Dragon may or may not get a big update, since it has not been reuploaded since October. If it does, it might suddenly become a contender.

Steamhammer, I don’t mind saying, has important updates and should play a more lucid game than the current BASIL version. On the other hand, I have stayed true to my intention of treating the tournament season as over—although it’s not—meaning that I disfavor safe improvements and favor work on basic infrastructure and new features that may need tuning. I don’t have time or resources to test intensively, so I accept risks of bugs and unexpected consequences and poor tuning. Either way, success or failure, Steamhammer may cross the plans of any bots that try to prepare specifically against it. I think my odds are good of, at last, avoiding the bottom spot in AIST. And I hope to beat one of the protosses.

AIST S4 prep

Today I received my acknowledgement that Steamhammer is registered for AIST S4. It will play, if the creek don’t rise.

I haven’t been posting about Steamhammer progress because I’m preparing (ssh! it’s a secret!) Secret Improvements. When the list of participants is announced, I expect I’ll make special arrangements for some of them. Steamhammer should be a moving target so that the same doesn’t happen to it.

The improvements I’ve made already are significant, and I’m finishing up another good one today. I’m hoping that, unlike the other two times Steamhammer participated (S1 and S2), it won’t be knocked out at the first opportunity and end up in last place, or tied for it, with 2 match losses. (In S3, the rival Microwave played instead and got the same result.) I don’t want to get used to being pushed over like a cardboard cutout.

I’m also hoping that by submission time at the ominous Ides of March, Steamhammer will have collected enough opening data that I can decide what to do about it next. I’m still forecasting that I’ll need a second collection phase to fill in gaps. Plus I’ll need a long-term plan to keep the data updated as Steamhammer’s skills progress.

the Steamhammer-Krasi0P match

Yesterday’s loser’s round match Krasi0P-Steamhammer ended up with the result I expected, but not in the way I expected. Krasi0P has two builds, cannons in base and proxy cannons. Steamhammer can easily beat either one if it correctly predicts it, but it is poor at predicting. The wins tend to go in streaks, Steamhammer loses a bunch in a row then wins a bunch in a row, because both bots are somewhat slow at reacting when the opponent’s build switches. And Steamhammer’s recent games against Krasi0P, the 2 games in the SSCAIT round robin, were losses. So most likely the losing streak would continue.

1-1 after the first 2 games was good. In the last game, my initial diagnosis was the same as the commentator’s: Steamhammer was going to play 1 hatch muta, an excellent choice that should win. I was suddenly looking forward to a surprise match victory. But when I saw the second hatchery start, I knew something had gone wrong. Steamhammer ended up playing a nonsense build that countered nothing, a sure loss. After seeing the cannons it should have turned a few planned zergling pairs into drones; that part looked OK. I’m not sure what failed, but my best guess is that something caused Steamhammer to break out of its opening build and fall back on the strategy boss, which is buried up to its neck in ZvP weaknesses (it’s Steamhammer’s weakest matchup). Those mistakes look like strategy boss mistakes. But nothing I saw in the game should have caused it to break out of the opening. It may be a bug I haven’t seen before.

In any case, the fixes in version 3.3.6 made Steamhammer substantially stronger against Krasi0P (and many other opponents). Steamhammer will look impressive if it ever gets into a tournament with all the major bugs fixed before instead of after....

Really next, no really, this time I mean it: Steamhammer’s experience versus cannon rushes.

SSCAIT Steamhammer-PurpleWave games

Those who saw today’s stream of the second half of the SSCAIT round of 16 in the elimination bracket may be wondering why Steamhammer played the same losing opening three times in a row against PurpleWave. It’s easy to explain.

Before the tournament, PurpleWave was playing forge expand game after game against Steamhammer. Steamhammer experimented and found that it could win with 9 gas 9 pool, which is a strong ZvZ one-hatch mutalisk build. It is not a strong ZvP build, but before the tournament, the fast mutas scored 13-0 against the fast expands, not a single loss. PurpleWave was not ready to defend against air.

In the round robin phase, PurpleWave again opened the first game with forge expand, and lost to the one-hatch mutas. The score was 14-0. But PurpleWave had been updated for the tournament, and forgot whatever bug or other fixation had caused it to stick to the same strategy. In the second game, protoss varied, and the score went to 14-1. PurpleWave is ranked higher than Steamhammer and usually wins. 14-1 was statistically far and away the best available opening, so Steamhammer stuck with it. It saw during play that it had gotten into trouble and tried desperately to save itself, but the mischief came in the early opening and there was nothing for it; at best it could have lost more slowly.

An RPS analyzer that better predicts the enemy’s opening plan could have helped. Steamhammer would also have to pay more attention to the enemy’s predicted strategy; it deliberately ignores the prediction against long-familiar opponents. The only general way that I know for a bot to be sure that it’s time to switch plans is for it to model the game events and understand why it lost (“oh yeah, you can beat that every time”), which is much more powerful than comparing statistics. I hope to do that eventually, but it’s beyond the state of the art for now. Anyway, opening timing is first, and that feeds in too.

SSCAIT 2020 round robin is over

My first try at the solidity metric did not work well enough. The flaw is easy to fix; expect numbers tomorrow if there are no more flaws. For today, a few notes instead.

The SSCAIT 2020 round robin phase has just finished. The top ranks are no surprise. #1 is Stardust with 104-6 for 94.5%, a dominating performance after a slow start when most of the losses were front-loaded. Tied at #2-#3 are BananaBrain and BetaStar with 99-11, and they scored 1-1 against each other. There is a gap below #4 Monster with 98-12. Tied at #5-#6 are Halo by Hao Pan and PurpleWave with 91-19. This time the head-to-head score is Halo-PurpleWave 2-0. Compared to past expectations, that’s a good result for Halo and a poor result for PurpleWave. #7 Iron landed higher than I predicted.

#16 TyrProtoss, the bottom of the top, is notable for losing every game against others of the top 16, except for one game versus #15 MadMixP. It made up for it with solidity against the rest. The next bot to do at least as poorly against the top 16 is #36 Flash. #17 McRaveZ, the top of the bottom, narrowly missed the top 16, and is notable for a string of 1-1 scores against higher-ranked opponents. McRaveZ also lost 0-2 to #54 Marine Hell, the biggest 0-2 upset. Those 2 points left it 2 games behind #16 TyrProtoss, so it hurt. #18 Skynet by Andrew Smith had even more 1-1 scores against higher opponents. Skynet is old but still tough.

The biggest upsets are cannonbot #50 Jakub Trancik > #2-#3 BetaStar, #52 GarmBot by Aurelien Lermant > #8 Dragon, #54 Marine Hell > #11 Steamhammer, and #55 JumpyDoggoBot > #15 MadMixP. That’s not many extreme upsets; the top 16 are fairly solid. I think another notable upset is the old school champion #29 ICEbot > #1 Stardust.

#11 Steamhammer scored 78-32 for 70.9%. I had forecast rank #9 or #10, and I was optimistic. In the past I’ve predicted more accurately. Looking at every Steamhammer game, I learned about a half dozen bugs and weaknesses that I hadn’t seen before, some severe (I’ll post more later). I guess my prediction was off because of the unexpected weaknesses that I didn’t take into account. But in any case, #11 is the same rank that Steamhammer earned last year, and the year before too, and its win percentage varied neatly with the number of bots in the tournament. In the big picture, Steamhammer had a startup transient (weak the first year because it was barely started, extra strong the next year because other bots had not yet adapted to its new skills), and since then has been holding its level, not surpassing its neighbors but not falling behind either. That’s not bad. But this year I’m putting effort into skills no other bot has, so stand back!

Next I expect a wait while the elimination phase is run behind the scenes, then they’ll turn on bot submission. I will prioritize fixing some of the surprise bugs ahead of my bigger project of opening timing, so I’m thinking I’ll upload Steamhammer 3.3.6 sooner (the tournament version is 3.3.5) and hold 3.4 for later. Then I expect the elimination phase results will come out slowly, week by week. Steamhammer is likely to fall to the losers’ bracket in the first round of the elimination phase, and may visit E-Lemon Nation early on.

Steamhammer-Microwave rivalry

The round robin phase of the annual SSCAIT tournament is nearly over.

Steamhammer was ahead of Xiao Yi by one loss when Steamhammer’s final game came up—versus Microwave. Even with bugs that keep it out of contention, Microwave is still Steamhammer’s rival. Xiao Yi had unplayed games left, and in any case Steamhammer defeated Xiao Yi 2-0 so in the worst case it would place ahead on tiebreak. Nevertheless it was a tense pairing. The game was a long and difficult hive tech ZvZ that neither bot could play particularly well. Notice Microwave’s use of overlords to discover and eliminate burrowed drones. The mutas and devourers were all plagued, but too late....

Another difficult Steamhammer game was versus Ecgberht.

Addendum: BetaStar and BananaBrain will likely end up tied for #2-#3. They scored 1-1 against each other. Will the seeding order for the elimination phase be decided arbitrarily, or what?

solid versus daring play styles

A win against a stronger player is an upset. A loss against a weaker player is an upset in the opposite direction; the weaker player upset you. Two players of the same strength may have different rates of upsets in their games: Maybe one often beats stronger players but loses to weaker ones, and the other has more consistent results and does not. It’s a difference in play style. I call the inconsistent, risk-taking player daring and the consistent, risk-avoiding player solid.

It should be possible to measure solidity from a tournament crosstable. But what is a mathematically correct way to do it? You don’t want to simply count upsets, because you expect to win a lot of games versus a player that is slightly better than you. You want to somehow take the severity of the upset into account. For example, a win rate that stands far from its expected value should count more. But what is the expected win rate if, say, one player scored 80% in the tournament and the other scored 50%?

Here’s one way. Finding expected win rates is what elo is for, so compute an elo rating for each player in the tournament. You could use a program like bayeselo to take all information into account, or you could simply use the tournament win rates to impute elo values, essentially running the elo function in reverse. The two methods will give slightly different answers, but not very different. Then you can use the elo function in the forward direction on the differences between elo values to find expected win rates for each pairing.

Then for each pairing you have an actual win rate from the tournament results, and a calculated expected win rate. By construction, the two are the same in an average sense—but not individually. All that is left is to turn these numbers into a metric of upset-proneness, or daring risk-seeking. I haven’t tried to work out the math of what the metric should be, but the outline is obvious. For each opponent, pick out the pairings that are upsets: Either higher-than-expected win rates against a stronger opponent, or lower-than-expected against a weaker opponent. You might ignore other pairings, on the theory that they are symmetrical anti-upsets, or you might try to refine your metric by assigning them upset values too (I think the results would be a little different but probably close). You want some difference function f(actualRate, expectedRate) that says how big the difference is; you might choose linear distance (subtract then take the absolute value). Then you want a combining function g() that accumulates the difference values into a final metric; if f is distance, then you might choose the arithmetic mean.
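
Here is a small sketch of the whole pipeline under those choices: impute elo from tournament win rates, compute expected win rates from elo differences, then take the mean absolute difference over the upset pairings only. It is one possible instantiation of the outline, with f as linear distance and g as the arithmetic mean, not a finished metric.

    #include <algorithm>
    #include <cmath>
    #include <vector>

    // Standard elo expectation: win probability for a player rated d points higher.
    double eloExpected(double d)
    {
        return 1.0 / (1.0 + std::pow(10.0, -d / 400.0));
    }

    // Run the elo function in reverse: impute a rating offset from a win rate.
    double eloFromWinRate(double winRate)
    {
        winRate = std::clamp(winRate, 0.01, 0.99);    // avoid infinities at 0% and 100%
        return 400.0 * std::log10(winRate / (1.0 - winRate));
    }

    // One pairing from the crosstable, seen from the player being measured.
    struct Pairing {
        double actualRate;              // win rate against this opponent
        double myTournamentRate;        // my overall tournament win rate
        double opponentTournamentRate;  // the opponent's overall tournament win rate
    };

    // Mean absolute difference between actual and expected win rates,
    // counted only over the upset pairings. Higher means more daring.
    double daringMetric(const std::vector<Pairing> & pairings)
    {
        double total = 0.0;
        int count = 0;
        for (const Pairing & p : pairings)
        {
            const double eloDiff  = eloFromWinRate(p.myTournamentRate) - eloFromWinRate(p.opponentTournamentRate);
            const double expected = eloExpected(eloDiff);
            const bool upset =
                (expected < 0.5 && p.actualRate > expected) ||   // overperformed against a stronger opponent
                (expected > 0.5 && p.actualRate < expected);     // underperformed against a weaker opponent
            if (upset)
            {
                total += std::abs(p.actualRate - expected);
                ++count;
            }
        }
        return count > 0 ? total / count : 0.0;
    }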

I’ve never seen a metric like this, but it seems like an easy idea. Has anyone seen it? Can you point it out?

Next: I want to try this for AIIDE 2020. If it works smoothly I may extend it to other past tournaments, to see whether bots retain a measurably consistent daring/solidity style over time.