Last year's CoG 2020 had 8 participants and played 5600 games, which works out to 200 for each pair of opponents. This year's CoG 2021 had 9 participants and played far fewer games, 1800, or 50 per pairing, 1/4 as many. The bot winning rates look mostly well-separated, so the smaller game count probably did not much affect the finishing order. Only #4 Microwave/#5 PurpleWave and #6 XiaoYi/#7 BetaStar are closely ranked and might have swapped places; surely the top 3 would have finished in the same order. The shorter tournament does theoretically give an edge to bots which don’t learn, or don’t learn much, or otherwise reach their learning asymptote quickly, as compared to bots which can keep improving their decisions over a long period.
Last year, 23 of the 5600 games failed to complete with a result and were not counted, about 4 per thousand. This year it was 10 of 1800 games, over 5 per thousand. Reliability is about the same, probably because the same bot, the technically tricky MetaBot, was responsible for most failures in both years.
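As a quick check of the arithmetic, here is a minimal Python sketch; the participant, game, and failure counts are the totals stated above, and nothing else is assumed.

```python
from math import comb

def games_per_pairing(participants, total_games):
    # Round robin: the total games are split evenly among all pairs of participants.
    return total_games / comb(participants, 2)

print(games_per_pairing(8, 5600))  # CoG 2020: 200.0 games per pairing
print(games_per_pairing(9, 1800))  # CoG 2021: 50.0 games per pairing

# Failed games per thousand.
print(1000 * 23 / 5600)  # 2020: about 4.1
print(1000 * 10 / 1800)  # 2021: about 5.6
```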
And of course the maps are different; only Great Barrier Reef was played in both years. With only 5 maps total, the different map choices may not average out nicely; there are likely to be accidental differences in how well each bot likes the maps on average. The difference should be pretty small, though I would still prefer 10 maps over 5.
holdover results
#6 XiaoYi, #7 BetaStar, and #8 MetaBot are carried over from last year—and also from before. We have a brief history of how well they’ve done against the field.
bot | 2019 | 2020 | 2021 |
BetaStar | 67.41% | 51.73% | 39.29% |
XiaoYi | 72.21% | 36.57% | 40.10% |
MetaBot | 59.04% | 11.02% | 23.08% |
BetaStar performed worse than last year, but XiaoYi and MetaBot both performed better. That is because the added participant this year was bottom-ranked CUNYbot, which all opponents defeated with overwhelming scores. The field is weaker, so the holdovers look stronger.
holdovers versus updated bots
CUNYbot did not play last year (it last competed in CoG 2019), but all other participants are the same. We can make a closer comparison of 2020 and 2021 by leaving out CUNYbot, new to this year’s field, and considering only the holdovers and the updated returning bots. (The comparison is possible only because the tournament has lost popularity and, in the last 2 years, has retained only a hard core.)
bot | 2020 | 2021 |
BetaStar | 51.73% | 33.43% |
XiaoYi | 36.57% | 31.81% |
MetaBot | 11.02% | 13.82% |
In this virtual tournament, MetaBot held its low position and even gained a little, but the other 2 clearly fell since last year. The field ex CUNYbot seems to have become tougher.
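For anyone who wants to redo this kind of adjustment, the idea is just to drop each bot’s games against the excluded opponent before dividing. A minimal sketch, with entirely made-up per-opponent records; the names and numbers below are placeholders, not the real data.

```python
def winrate_excluding(records, excluded):
    # records maps opponent name -> (wins, games played against that opponent).
    wins = sum(w for opp, (w, g) in records.items() if opp != excluded)
    games = sum(g for opp, (w, g) in records.items() if opp != excluded)
    return 100.0 * wins / games

# Hypothetical record, only to show the shape of the computation.
example = {
    "OpponentA": (30, 50),
    "OpponentB": (20, 50),
    "CUNYbot":   (48, 50),
}
print(winrate_excluding(example, "CUNYbot"))  # 50.0, the easy wins no longer inflate the rate
```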
holdover tournament
I noticed something funny, though, that makes me question repeatability. Here are the 2020 and 2021 results of the subtournament played among the holdover bots only. They look different, and have a different finishing order.
2020 | overall | Beta | XIAO | Meta |
#1 BetaStar | 262/388 67.53% | | 93/200 46% | 169/188 90% |
#2 XIAOYI | 260/400 65.00% | 107/200 54% | | 153/200 76% |
#3 MetaBot | 66/388 17.01% | 19/188 10% | 47/200 24% | |
2021 | overall | XIAO | Beta | Meta |
#1 XIAOYI | 64/99 64.65% | | 32/50 64% | 32/49 65% |
#2 BetaStar | 51/97 52.58% | 18/50 36% | | 33/47 70% |
#3 MetaBot | 31/96 32.29% | 17/49 35% | 14/47 30% | |
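One thing that does check out is the internal consistency of these crosstabs: each bot’s overall record is the sum of its per-opponent cells, and the two sides of a pairing account for all its games (XiaoYi’s 32/50 against BetaStar mirrors BetaStar’s 18/50 against XiaoYi). A tiny Python check over the 2021 numbers copied from the table above:

```python
# 2021 holdover crosstab as (wins, games) per opponent, copied from the table above.
results_2021 = {
    "XIAOYI":   {"BetaStar": (32, 50), "MetaBot": (32, 49)},
    "BetaStar": {"XIAOYI":   (18, 50), "MetaBot": (33, 47)},
    "MetaBot":  {"XIAOYI":   (17, 49), "BetaStar": (14, 47)},
}

for bot, row in results_2021.items():
    wins = sum(w for w, g in row.values())
    games = sum(g for w, g in row.values())
    print(f"{bot}: {wins}/{games} = {100 * wins / games:.2f}%")  # matches the overall column

# Mirror check: the two sides of each pairing cover all the games in it.
for a in results_2021:
    for b, (wins_a, games_ab) in results_2021[a].items():
        wins_b = results_2021[b][a][0]
        assert wins_a + wins_b == games_ab
```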
The bots are the same. Do the different maps make that much difference? Let’s compare the numbers by map (I put these two tables into the same sort order, not following the finishing order).
2020 | overall | BlueSt | Alchem | GreatB | Androm | LunaTh |
#2 XIAOYI | 65.00% | 85% | 81% | 59% | 62% | 38% |
#1 BetaStar | 67.53% | 59% | 65% | 82% | 58% | 73% |
#3 MetaBot | 17.01% | 6% | 4% | 9% | 28% | 40% |
2021 | overall | Rideof | GreatB | NeoAzt | NeoSni | Python |
#1 XIAOYI | 64.65% | 75% | 55% | 55% | 74% | 65% |
#2 BetaStar | 52.58% | 55% | 58% | 53% | 35% | 63% |
#3 MetaBot | 32.29% | 20% | 37% | 42% | 42% | 21% |
Compare the columns for Great Barrier Reef, the one map that was the same both years: BetaStar fell from 82% to 58%, and MetaBot rose from 9% to 37%. The same bots, with the same opponents, on the same map, produced quite different results. The smaller number of games played this year could add random variation, but this seems like a lot to me.
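To put a rough number on “seems like a lot,” here is a two-proportion z-test on BetaStar’s Great Barrier Reef win rates. The exact per-map game counts are not in the tables above, so the sample sizes below are assumptions, roughly 388/5 and 97/5 games rounded; treat the result as a ballpark, not a verdict.

```python
from math import sqrt

def two_proportion_z(p1, n1, p2, n2):
    # z statistic for the difference between two win rates (normal approximation).
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# BetaStar on Great Barrier Reef: 82% in 2020, 58% in 2021 (from the tables above).
# Per-map game counts are ASSUMED: roughly 78 games in 2020 and 19 in 2021.
print(two_proportion_z(0.82, 78, 0.58, 19))  # about 2.2 with these assumed counts
```

With these assumptions the gap is a bit over two standard errors, so the small 2021 sample alone does not comfortably explain it, which fits the guesses below.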
I assume the difference is due to the randomness of learning. A bot trying to adapt to an opponent may randomly hit on a good idea soon and learn quickly, or may hit on the good idea later and learn slowly. I’ve seen that make a huge difference. I can’t rule out that the difference is due to a change in tournament conditions; in fact, MetaBot’s much stronger performance this year and BetaStar’s weaker one suggest that conditions may indeed have changed. No matter the cause, I have to think that year-over-year comparisons are questionable. In other words, this whole post was a waste, and I might as well have forgotten history!