tournaments - 9 | Starcraft AI blog

learning for tournaments

Most learning bots use a multi-armed bandit algorithm to choose opening builds, such as a UCB variant or an epsilon-greedy algorithm. The algorithms trade off exploration—trying stuff out to find what’s good—versus exploitation—getting wins using the good stuff that it found. If you’re playing on a ladder that always runs, you want an algorithm tuned to keep on exploring. Unless the algorithm hits on a build that wins every game, it will always find some value in occasionally trying another idea that might turn out to be better than what you’ve found so far.
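
As a rough illustration of the kind of bandit rule meant here, this is a minimal UCB1-style chooser. The function name and the per-opening `wins` and `plays` lists are made up for the example; no particular bot is claimed to work exactly this way.

```python
import math

def ucb1_choose(wins, plays):
    """Return the index of the opening to play next, by the UCB1 rule.

    wins[i] and plays[i] are the recorded totals for opening i
    (hypothetical bookkeeping, not any bot's real file format).
    """
    # Exploration first: try every opening once before trusting statistics.
    for i, n in enumerate(plays):
        if n == 0:
            return i

    total = sum(plays)

    def score(i):
        mean = wins[i] / plays[i]                               # exploitation term
        bonus = math.sqrt(2.0 * math.log(total) / plays[i])     # exploration term
        return mean + bonus

    return max(range(len(plays)), key=score)
```

The bonus term shrinks as an opening is played more, so rarely tried openings keep getting an occasional look, which is the behavior you want on a ladder that never ends.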

If you’re playing a tournament, it’s different. The tournament will end, and if you’re near the end of the tournament then it won’t help to explore, because if you do find a better strategy you won’t have time to play it and earn wins. The cost of exploring starts to exceed the benefit. In the very last game of the tournament, if you know that it’s the last, the benefit of exploring is exactly zero and you should always play your best idea so far.

It’s an idea I thought of for AIIDE this year, but ended up not implementing in Steamhammer. I don’t know how long the tournament will be, but I can guess that it will probably not be much longer or shorter than last year, which was 100 rounds—100 games against each opponent. I could add a configuration parameter for the expected number of games, and reduce the rate of exploration slowly so that it reaches zero around then.
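
A sketch of how that could look with epsilon-greedy, assuming a linear schedule. The names `expected_games` and `base_rate` and the linear decay are assumptions for illustration; this is not code from Steamhammer, which does not implement the idea.

```python
import random

def exploration_rate(games_played, expected_games, base_rate=0.1):
    """Exploration probability that falls linearly to zero at the expected
    end of the tournament and stays at zero afterward."""
    if expected_games <= 0:
        return 0.0
    remaining = max(0.0, 1.0 - games_played / expected_games)
    return base_rate * remaining

def choose_opening(wins, plays, games_played, expected_games, rng=random):
    """Epsilon-greedy choice with the decaying rate above."""
    eps = exploration_rate(games_played, expected_games)
    if rng.random() < eps:
        return rng.randrange(len(plays))   # explore: any opening at random
    # Exploit: best observed win rate; unplayed openings count as 0.
    return max(range(len(plays)),
               key=lambda i: wins[i] / plays[i] if plays[i] else 0.0)
```

If the tournament runs longer than guessed, the rate simply stays at zero, so a wrong guess costs some late exploration but nothing worse.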

I may yet do it; it is likely a good idea. Bandit algorithms generally assume that the environment (here, the opponent) is steady over time, or becomes steady given enough time. Assuming an unchanging opponent, reducing exploration when exploration cannot help is always a gain. In reality the opponent may itself learn, so that bot + opponent form a hard-to-predict dynamic system. Reducing exploration against a learning opponent might make you more predictable so that the opponent’s exploitation works better. But the opponent will have to explore a little to do that exploiting, so... my guess is that the idea is still likely to help.

Does any bot use this technique of reducing exploration as the tournament continues? Do you have data showing how well it works?

Steamhammer’s prepared learning data for AIIDE 2020

What is the best way to prepare initial learning files for a tournament when you have partial knowledge of how your opponents will play? How should you seed your opponent model? Certainly it depends on your learning algorithm. I did not find the answer, but I poked at it and took my best guess for Steamhammer.

I tried an informal experiment on the Starcraft AI Ladder. The ladder is reset to zero once a week—it erases the game records, everybody’s learned data, everything, and makes a fresh start. One week I collected recent full-length data files and set those as Steamhammer’s prepared learning data for use after the weekly reset. (Because of how Steamhammer uses its prepared learning data, the files were not read at all until the reset.) The opponent is reset too, and may not play the same way that the old learning files expect, so it’s not guaranteed that the old learning files are the best seed for new learning. Still, I expected them to provide an advantage over starting from scratch. It makes intuitive sense that if the opponent is relatively constant, such as an opponent carried over in the tournament from a previous year, then keeping your learning files is good. But there may be cases where it’s not true, because both sides learn.

I was interested in whether the carried-over learning data would be helpful from the start, or would have trouble until it adjusted to the opponent’s reset. I let the ladder run overnight and collected data then. The carried-over data did seem to work well, certainly better than the missing or unmaintained initial data I had been using up to then.

The next week I tried an alternative, preparing minimal initial data. Steamhammer’s learning varies its approach depending on how much data is available, so I expect a difference between full-size and minimal prepared data. I looked through my records of previous weeks—not only one previous week—and selected by hand a small number of sample game records using varied openings that had scored well, never more than 4 games. This is the kind of preparation that makes intuitive sense if you have data on an opponent but you expect that the bot has enjoyed a major update—you want to be ready to exploit known weaknesses, but also to be ready to switch in case the weaknesses are gone. Again I set things up and grabbed the data the next morning. And the minimal data performed better. No opponent had worse numbers.

It was not in any sense a well-controlled experiment. Two weeks of data with the ladder’s small number of opponents is not enough to draw a statistically valid conclusion. Both Steamhammer and other bots were updated during the week between experiments, so the result is more than questionable, not solid but vapor. It’s entirely possible that Steamhammer performed better because I had made an important improvement during the week, and I know that I made improvements. Nevertheless this was the data I had, and I decided that it was more likely to be right than wrong. To my eye, Steamhammer’s performance curve over time looked more convincing with the minimal prepared data—not a scientific conclusion.

So in my AIIDE 2020 submission, I went with minimal prepared learning data. I selected the sample games with more care than in my experiment, trying to take everything into account. I could not prepare for the unknown bots, but I did invent one fictional game for EggBot so that Steamhammer will know it is a cannon bot. I did not prepare for Stardust because nothing has yet worked twice against Stardust. I also didn’t prepare against DaQin because I didn’t have recent data handy; I could have tried harder, but time was short.

We’ll see how it goes!

Surely the best preparation can’t be found by a fixed rule, but depends on the opponent and on what you know about it. And by nature it depends on your bot’s learning algorithm. It’s a question worth thought.

a few items

I am working furiously on Steamhammer, not leaving myself much time for posting. The version I submit for AIIDE should be substantially stronger than the current release version, with visible changes that close observers will notice.

I doubt it will have much chance versus Stardust, though. After watching games where Stardust lost, I wrote 3 new openings to try to exploit its weaknesses, to at least make a dent. One was the hydra opening that Jealous suggested in a cast. But no, a rubber ball does not make a dent in a concrete wall.

A new bot EggBot appeared on the AIIDE 2020 participant list at some point after I wrote up the participants. I guess it must have been omitted by mistake. It is by Nathan MacNeil of the Memorial University of Newfoundland (Dave Churchill’s institution), who is apparently a graduate student. A grad student should have an interesting project in mind, so I have hopes for EggBot—weak or strong, I hope it will be interesting.

I still want to post about encouraging new bot authors, but you can tell it’s not my actual priority because I’m spending my time on coding and testing instead. Still, it’s an important discussion. The addition of S A B C D E F ranks on the BASIL rankings could be a good step if it encourages new authors to make progress: “I want to move up to D!”

AIIDE 2020 participants

The AIIDE 2020 registration deadline was yesterday, and today the list of participants is out (though as I write I don’t see an update to the web site yet). I wrote up the new map pool earlier.

First, the familiar names.

bot	author
BananaBrain	Johan de Jong
Dragon	Vegard Mella
Ecgberht	Francisco Javier Sacido
McRave	Christian McCrave
Microwave	Micky Holdorf
PurpleWave	Dan Gant
Stardust	Bruce Nielsen
Steamhammer	Jay Scott
WillyT	Nico Klausner
ZZZKBot	Chris Coxe

Stardust crushed CoG and is at the head of the BASIL ladder; it is of course the favorite to win #1. McRave is playing zerg again, as in CoG. I’m hoping that Dragon will show us something new. For the other bots, I feel that I know more or less what to expect.

Then the new entrants that have not competed in AIIDE before.

bot	author
DanDanBot	Kim TaeYoung
Randofoo	Edgar Yajure
Taij	Wang Bin

I think “Kim Tae-Young” is a more standard way to anglicize the Korean name. DanDanBot registered last year too, but ended up not competing. The name “Wang Bin” also looks somehow familiar, though I don’t see a past mention related to Starcraft. It is possible that the second author of the recent paper “Triple-GAIL: A Multi-Modal Imitation Learning Framework with Generative Adversarial Nets” is the same person, but the name is common (at least as anglicized), so I can’t be sure.

Unknown bots are unknown. Let’s hope some of them are fun!

Plus 2 holdovers from last year.

bot	author
DaQin	Lion GIS
UAlbertaBot	Dave Churchill

DaQin first competed in AIIDE in 2018. UAlbertaBot is of course the perennial benchmark, though it risks landing in last place (last year it finished third to last, ahead of newcomers AITP and BunkerBoxeR).

I count 15 participants in total, 4 terran, 6 protoss, 4 zerg, 1 random. That’s a relatively even balance; protoss domination is not showing much. It is “traditional” for some participants to drop out before the tournament gets under way, so we’ll see how that goes. Last year half of AIIDE dropped out. Let’s hope there is no such trouble this year.

CoG 2020 - breakdown by map

These tables show, for each bot, its win rate against each opponent, broken down by map. For example, the first table is for Stardust, and shows Stardust’s win percentages by opponent and map. The tournament played 200 round robins rotating between 5 maps, so each table cell shows the result of 200/5 = 40 games between the two opponents on that map, or slightly less than 40 if some games were not counted due to problems.

See “looking forward to CoG 2020” for a general discussion of the maps. As a reminder, the maps are:

(2)Blue Storm
(3)Alchemist
(3)Great Barrier Reef
(4)Andromeda
(4)Luna the Final

Stardust	overall	BlueSt	Alchem	GreatB	Androm	LunaTh
PurpleWave	86%	62%	92%	88%	92%	95%
BananaBrain	63%	55%	62%	72%	65%	62%
BetaStar	76%	100%	55%	80%	72%	75%
Microwave	96%	95%	100%	95%	95%	98%
XIAOYI	95%	98%	100%	98%	90%	90%
McRave	96%	100%	95%	92%	95%	100%
MetaBot	99%	100%	100%	100%	100%	98%
overall	87.61%	87%	86%	89%	87%	88%

Aha, details count! Stardust’s overall results look even across maps, but there are differences for the top protoss opponents. It had some trouble on Blue Storm against PurpleWave and BananaBrain, and on Alchemist versus BetaStar.

PurpleWave	overall	BlueSt	Alchem	GreatB	Androm	LunaTh
Stardust	14%	38%	8%	12%	8%	5%
BananaBrain	42%	40%	38%	45%	38%	48%
BetaStar	70%	85%	68%	75%	70%	52%
Microwave	84%	82%	80%	90%	85%	82%
XIAOYI	100%	100%	98%	100%	100%	100%
McRave	88%	92%	88%	85%	90%	88%
MetaBot	98%	100%	100%	98%	92%	100%
overall	70.82%	77%	68%	72%	69%	68%

Again, differences show mainly against top opponents, where PurpleWave favored Blue Storm and struggled on Luna against Stardust and BetaStar (though it liked the map against BananaBrain). Luna is a classic macro map. Maybe PurpleWave is not as skilled at the brute force just-make-more-units-and-win style.

BananaBrain	overall	BlueSt	Alchem	GreatB	Androm	LunaTh
Stardust	37%	45%	38%	28%	35%	38%
PurpleWave	58%	60%	62%	55%	62%	52%
BetaStar	67%	82%	68%	60%	62%	62%
Microwave	74%	72%	57%	80%	85%	75%
XIAOYI	98%	100%	95%	100%	98%	95%
McRave	64%	48%	90%	65%	68%	50%
MetaBot	91%	100%	88%	89%	89%	87%
overall	69.71%	72%	71%	68%	71%	66%

BananaBrain by contrast shows differences versus many opponents, most notably McRave where it dominated on Alchemist and suffered on Blue Storm and Luna. A big disparity like that must mean something.

BetaStar	overall	BlueSt	Alchem	GreatB	Androm	LunaTh
Stardust	24%	0%	45%	20%	28%	25%
PurpleWave	30%	15%	32%	25%	30%	48%
BananaBrain	33%	18%	32%	40%	38%	38%
Microwave	71%	92%	25%	65%	98%	75%
XIAOYI	46%	25%	30%	80%	32%	65%
McRave	70%	100%	20%	80%	90%	62%
MetaBot	90%	92%	100%	85%	91%	81%
overall	51.73%	49%	41%	56%	57%	56%

A checkerboard table. The middling overall percentages are averages of big wins and big losses. BetaStar comes across as strong but brittle.

Microwave	overall	BlueSt	Alchem	GreatB	Androm	LunaTh
Stardust	4%	5%	0%	5%	5%	2%
PurpleWave	16%	18%	20%	10%	15%	18%
BananaBrain	26%	28%	42%	20%	15%	25%
BetaStar	29%	8%	75%	35%	2%	25%
XIAOYI	82%	88%	80%	90%	78%	72%
McRave	42%	28%	60%	32%	48%	40%
MetaBot	86%	95%	82%	98%	75%	82%
overall	40.57%	38%	51%	41%	34%	38%

Also with dramatic variations, especially against BetaStar. Was there map-specific preparation, and if so, why did it not work on Andromeda, which is traditionally considered a zerg-favored map? There was an incorrect announcement of maps followed by a correction, but Andromeda appears on both lists, and in any case the map announcements are dated before the submission deadline.

XIAOYI	overall	BlueSt	Alchem	GreatB	Androm	LunaTh
Stardust	5%	2%	0%	2%	10%	10%
PurpleWave	0%	0%	2%	0%	0%	0%
BananaBrain	2%	0%	5%	0%	2%	5%
BetaStar	54%	75%	70%	20%	68%	35%
Microwave	18%	12%	20%	10%	22%	28%
McRave	100%	100%	100%	98%	100%	100%
MetaBot	76%	95%	92%	98%	57%	40%
overall	36.57%	41%	41%	32%	37%	31%

McRave	overall	BlueSt	Alchem	GreatB	Androm	LunaTh
Stardust	4%	0%	5%	8%	5%	0%
PurpleWave	12%	8%	12%	15%	10%	12%
BananaBrain	36%	52%	10%	35%	32%	50%
BetaStar	30%	0%	80%	20%	10%	38%
Microwave	58%	72%	40%	68%	52%	60%
XIAOYI	0%	0%	0%	2%	0%	0%
MetaBot	82%	98%	85%	78%	78%	72%
overall	31.64%	33%	33%	32%	27%	33%

MetaBot	overall	BlueSt	Alchem	GreatB	Androm	LunaTh
Stardust	1%	0%	0%	0%	0%	2%
PurpleWave	2%	0%	0%	2%	8%	0%
BananaBrain	9%	0%	12%	11%	11%	13%
BetaStar	10%	8%	0%	15%	9%	19%
Microwave	14%	5%	18%	2%	25%	18%
XIAOYI	24%	5%	8%	2%	42%	60%
McRave	18%	2%	15%	22%	22%	28%
overall	11.02%	3%	8%	8%	17%	20%

Upsetting XiaoYi on even one map is not bad.

CoG 2020 - race balance

I parsed the tournament manager’s detailed results log (text file of 5601 lines for 5600 games plus one header line) from CoG 2020 with my own software. Here is a crosstable which exactly matches the official results.

	overall	Star	Purp	Bana	Beta	Micr	XIAO	McRa	Meta
Stardust	1223/1396 87.61%		171/199 86%	126/199 63%	153/200 76%	193/200 96%	190/200 95%	193/200 96%	197/198 99%
PurpleWave	990/1398 70.82%	28/199 14%		83/200 42%	140/200 70%	168/200 84%	199/200 100%	177/200 88%	195/199 98%
BananaBrain	971/1393 69.71%	73/199 37%	117/200 58%		134/200 67%	148/200 74%	195/200 98%	128/200 64%	176/194 91%
BetaStar	718/1388 51.73%	47/200 24%	60/200 30%	66/200 33%		142/200 71%	93/200 46%	141/200 70%	169/188 90%
Microwave	568/1400 40.57%	7/200 4%	32/200 16%	52/200 26%	58/200 29%		163/200 82%	83/200 42%	173/200 86%
XIAOYI	512/1400 36.57%	10/200 5%	1/200 0%	5/200 2%	107/200 54%	37/200 18%		199/200 100%	153/200 76%
McRave	443/1400 31.64%	7/200 4%	23/200 12%	72/200 36%	59/200 30%	117/200 58%	1/200 0%		164/200 82%
MetaBot	152/1379 11.02%	1/198 1%	4/199 2%	18/194 9%	19/188 10%	27/200 14%	47/200 24%	36/200 18%

As usual, the results file is somehow different in every tournament. I got this table by excluding the 23 games in which the loser’s score is given as -1. 22 of these games are reported in the detailed results text file as NORMAL game end type, and in the HTML detailed results as NO_REPORT. It bothers me that the two sources do not match. The Starcraft AI Ladder documentation on game end states says “this could be caused by the Tournament Manager client crashing, or a network error, etc.” so it’s correct to exclude these games. The remaining game shows GAME_STATE_NEVER_DETECTED, where apparently neither side was able to start. The NO_REPORT results are not necessarily the fault of the bot, but in fact 2 of the games had Stardust as loser and the other 21 had MetaBot, so there is a strong correlation.
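
The counting rule can be sketched like this. The `(winner, loser, loser_score)` tuple layout is an assumption for the example; the real results log has its own columns, and this is not the script used for the table above.

```python
from collections import defaultdict

def crosstable(games):
    """Tally wins and games played for each ordered pair of bots.

    games: iterable of (winner, loser, loser_score) tuples, a hypothetical
    layout standing in for the tournament manager's log.
    """
    wins = defaultdict(int)
    played = defaultdict(int)
    for winner, loser, loser_score in games:
        if loser_score == -1:
            continue                      # no usable report: exclude the game
        wins[(winner, loser)] += 1
        played[(winner, loser)] += 1
        played[(loser, winner)] += 1
    return wins, played

sample = [
    ("Stardust", "MetaBot", 1200),
    ("Stardust", "MetaBot", -1),          # excluded
    ("MetaBot", "Stardust", 900),
]
wins, played = crosstable(sample)
print(wins[("Stardust", "MetaBot")], played[("Stardust", "MetaBot")])
```

A cell of the crosstable is then `wins[(row, column)]` over `played[(row, column)]`, which is why some cells above show fewer than 200 games.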

Here are the versus-race results. They are strongly skewed by protoss dominance, and of course there is only one terran participant. So these numbers are not much use.

bot	race	overall	vT	vP	vZ
Stardust	protoss	87.61%	95%	81%	96%
PurpleWave	protoss	70.82%	100%	56%	86%
BananaBrain	protoss	69.71%	98%	63%	69%
BetaStar	protoss	51.73%	46%	43%	71%
Microwave	zerg	40.57%	82%	32%	42%
XIAOYI	terran	36.57%	-	28%	59%
McRave	zerg	31.64%	0%	32%	58%
MetaBot	protoss	11.02%	24%	5%	16%
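
For anyone who wants to check a row, here is a small sketch that pools Stardust's crosstable cells by opponent race. The win and game counts are copied from the crosstable above and the races from this table; the function and variable names are made up for the example.

```python
# Opponent races, from the table above.
RACE = {
    "PurpleWave": "protoss", "BananaBrain": "protoss", "BetaStar": "protoss",
    "MetaBot": "protoss", "Microwave": "zerg", "McRave": "zerg",
    "XIAOYI": "terran",
}

# Stardust's row of the crosstable: opponent -> (wins, games counted).
STARDUST = {
    "PurpleWave": (171, 199), "BananaBrain": (126, 199), "BetaStar": (153, 200),
    "Microwave": (193, 200), "XIAOYI": (190, 200), "McRave": (193, 200),
    "MetaBot": (197, 198),
}

def versus_race(row, race_of):
    """Pool (wins, games) over all opponents of each race."""
    totals = {}
    for opponent, (w, n) in row.items():
        race = race_of[opponent]
        tw, tn = totals.get(race, (0, 0))
        totals[race] = (tw + w, tn + n)
    return totals

for race, (w, n) in versus_race(STARDUST, RACE).items():
    print(f"v{race[0].upper()}: {w}/{n} = {w / n:.2%}")
```

Pooling games this way weights each opponent by the number of games counted, which is slightly different from averaging the per-opponent percentages.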

Next: Maps per player, in the same format as this post on AIIDE 2019. I need some updates to my software first.

CoG 2020 results are out

The CoG 2020 results are out today. Overall results, a crosstable, a win rate per map table, and zip files of replays for each bot were released. Source code will be delayed until after AIIDE 2020. It’s not mentioned, but I expect that the tournament manager’s log of detailed results and the bot write directories will appear eventually, and then I’ll put up my colorful crosstables and other analyses. There is also an unspecific apology “We conflicted with several problems while running the competition. Sorry that I didn’t smoothly handle them. However, I hope to meet again with better competition in the future.” Something must have gone wrong.

The results give only 8 participants of the expected 10. Of the announced entrants, newcomer random Mikhail Golovach and old-timer zerg ZZZKBot do not appear in the results. Are they related to what went wrong? The effect is to tilt the tournament further toward protoss, 5 of the 8 bots.

As I predicted (probably along with most), the top winners were #1 Stardust 88%, #2 PurpleWave 71%, #3 BananaBrain 70%. PurpleWave was last year’s version (see the 89% frame timeout rate of a recent version on the Starcraft AI Ladder), which explains why it barely nosed out BananaBrain.

The three holdover bots scored much worse this year than last, and ended up in a different order. Progress has been strong, though it’s hard to compare because the field was much smaller and less varied this year, and had no low-end bots. Tail-ender MetaBot scored over 50% last year, so the field was stronger from the get-go. (The year-old version of PurpleWave scored 88.56% last year to 70.82% this year, but I didn’t include it in the table because I don’t know whether it’s the same year-old version.)

bot	last year	this year
#4 BetaStar	67.41%	51.73%
#6 XiaoYi	72.21%	36.57%
#8 MetaBot	59.04%	11.02%

There were exactly 3 upsets: #3 BananaBrain upset #2 PurpleWave, #6 XiaoYi beat #4 BetaStar, and #7 McRave zerg overcame #5 Microwave. Another surprising result is that #7 McRave zerg scored only 1 win out of 200 versus #6 XiaoYi, even though XiaoYi is a holdover that McRave could have tuned against. By comparison, #5 Microwave scored 163/200 versus XiaoYi.

The per-map table shows that #1 Stardust performed about equally well on all maps. I guess its fast-mass strategy is not sensitive to map layout. All other bots were more sensitive to the map. Most strikingly, Microwave scored well on the tricky two-entrance map Alchemist, while #8 MetaBot utterly collapsed on Blue Storm.

tournaments and tournament preparation

The CoG conference is underway from today through Thursday. It is of course entirely online in this plague year (did somebody let a defiler loose?). I expect the CoG 2020 tournament results to appear during the conference or not long after. The conference program does not announce a time, as far as I can see. You can look back at my expectations if you like.

I am working hard to prepare Steamhammer for AIIDE 2020. I have already uploaded 3 different test versions one after another to the Starcraft AI Ladder, and I have made good progress on the next test version to go up in a day or two. Each version shows some kind of progress in its play, though the win rate does not always go up. The AIIDE update will be much bigger than the update to the current 3.1 release.

I have been watching a huge number of replays and analyzing results with my software. The purple Dan Gant talked in an Undermind podcast about his tournament preparation process: He examines losing games only, identifies the most frequent game-losing weaknesses, and works to fix those. It’s a great method, a direct way to improve tournament results in the short term, especially for a bot which can expect to finish near the top, meaning that it shows relatively rare game-losing weaknesses at its level of play. My goal is different, and I follow a different process. I also want to finish well, and make low-risk short-term improvements, but my top goal is to improve play in the long run rather than the short run. I identify ahead of time aspects of play that I think I can and must improve, and I examine both winning and losing games with an eye on those aspects, often looking at specific opponents that bring out those aspects. The weaknesses in winning games must be fixed too; they are by definition not game-losing, but fixing weaknesses improves play and that’s that, and to my eye winning games often show glaring blunders by Steamhammer. Anyway, at Steamhammer’s level of play, fixing a game-losing weakness often leads it to go wrong in a different way a little later and lose anyhow, so I may have to fix a string of weaknesses before I see results. Those are compound weaknesses, as it were, and I run into a lot of them.

I have learned some things by analyzing the ladder results with my software. One of the things I learned is that I could figure out even more if I wrote more software, and if I had Steamhammer record more information that I could correlate with the game results. I may do that soon, and post data.

The Ladder has new participants since I last wrote about it. Besides Stardust, MadMixP is new and StyxZ is reactivated. Just today, CherryPi joined too. That makes 11 active participants, enough for a small but lively scene. I still recommend joining the ladder if you intend to compete in AIIDE 2020, because it is the best way to test that your bot runs correctly in the AIIDE environment, adhering to the strict time limits and so on. Just letting it run for a day is enough to test for basic correctness (the ladder runs games much faster than BASIL), and if you have a problem you want to find out about it early so there’s time to fix it.

AIIDE 2020 has a new map pool

Registration for the AIIDE 2020 tournament opened today. The registration deadline is 31 August, and the submission deadline is 30 September. Steamhammer will be competing.

Most details seem to have no change from last year, but after using the same 10 maps from 2011 to 2019, this year there is a new pool of 10 maps. In the 2 player maps, Benzene is dropped in favor of Polaris Rhapsody. In 3 player maps, Tau Cross is out and Longinus is in. In 4 player maps, Andromeda and Fortress give way to Fighting Spirit and Roadkill. Other maps remain the same.

That’s 4 new maps. All 4 appeared in the “unknown map” secondary tournament last year, so they are tested in bot play. (That tournament ran with 5 maps; only Arcadia was not moved into the new map pool.) All the choices seem standard and conservative, unknown only in the sense that they were unannounced beforehand. By the way, the “unknown map” tournament will be repeated this year. Maybe the map choice will be a little more daring this time?

(2)Destination: Back door mineral block to each main base, and other features favoring cheese play. The most logical third base location is immediately above a wide ramp and hard to defend; bots have even more trouble defending the thirds.
(2)Heartbreak Ridge: Back door mineral block to high ground over each side’s natural base, on the path to the third. The many ridges benefit bots which understand high ground. The center base breaks the middle of the map into two paths; you can go above or below the center.
(2)Polaris Rhapsody: Follows the three-paths map pattern; you can move your army down the center, or by a longer path down either side.
(3)Aztec: Low-ground main bases. This causes problems for bots which wall off inside their main.
(3)Longinus: Level-ground main bases, like Tau Cross which it replaces.
(4)Circuit Breaker is an old standard.
(4)Empire of the Sun is another map with level-ground mains.
(4)Fighting Spirit is another old standard, likely the most played map of all time.
(4)Python is an old map which I take to be an attempt to rework the aboriginal Lost Temple into a balanced map. Compared to most maps, there is more contrast between close and distant main bases; the enemy may be near or far depending on starting locations.
(4)Roadkill is the most recent of the maps, not to be confused with Roadrunner. It has low-ground mains and the famously thorough Freakling technical design.

Maps not in the SSCAIT pool are Polaris Rhapsody, Aztec, Longinus, and Roadkill. The maps are also outside of BASIL’s extended map pool (“2019Season1”), so some bots may not have played them.

The changes seem designed to maintain the variety and balance of the map pool. I don’t expect any big shift in results; a tournament on the previous map pool would likely have a similar outcome. The new pool is slightly less tricky due to dropping Benzene and Fortress, on which a bot with specialized map knowledge has a chance to gain an advantage. That may mean that old bots and new bots with more specific skills can be compared a little more fairly, but I expect that the difference is small.

Overall the map pool change seems fine, unlikely to cause difficulties either in the tournament or in comparing with past tournaments. Given that, I wonder why the change was made. For my part, I’m pleased to see Longinus. I will miss Fortress, though. I like that map.

Update: I asked Dave Churchill why the change after so many years of the same maps. He said he just hadn’t gotten around to it before—the maps were originally meant to be changed every year.

looking forward to CoG 2020

Results of this year’s CoG tournament will be announced later this month. I am later than usual with my take. The new participants of CoG 2020 will be these 7:

Stardust (see yesterday)
PurpleWave
Microwave
Mikhail Golovach (random bot apparently named after its author)
BananaBrain
McRave
ZZZKBot

The only unknown newcomer is random Mikhail Golovach, who is listed as a hobbyist. Most brand new bots from individual programmers turn out to be weak, but there are occasional startling exceptions like Bereaver. At a minimum, going random shows ambition, so we can hope it’s strong and interesting!

I predict #1 Stardust, #2 PurpleWave, #3 BananaBrain as the most likely top winners, based on past records plus my look at Stardust yesterday. But there are surprises every tournament. I think McRave cannot be counted out for the #3 place because it is strong at PvP, and of course Mikhail Golovach is an unknown.

There will also be 3 carryover bots from last year:

MetaBot
BetaStar
XiaoYi

These 3 bots are pretty good, especially BetaStar which is a Locutus fork, but opponents will be prepared. I expect most of the 7 new entrants to perform well against them.

That makes 6 protoss, 2 zerg, 1 terran, 1 random. Protoss outnumbers all others together. Protoss dominance is becoming entrenched. I don’t like that; I should work harder on Steamhammer.

I wrote earlier about the CoG 2020 map pool. Now the specific 5 maps to be played have been selected from the pool.

• (2)Blue Storm has 2 exits toward the center of the map from each base. One is direct but narrow, only passing small units; dragoons and lurkers don’t fit, for example. The other is wide but the path is longer. Only bots with size-sensitive pathfinding will maneuver their armies correctly in long games; others risk getting units stuck. Do any bots have size-sensitive pathfinding? I don’t know of any, but I haven’t looked.

• (3)Alchemist has a circular layout with 2 entrances to each base, so you can go around the map in either direction to reach the enemy. Since it’s a 3 player map, one direction is short and the other is long. This map has appeared in past CIG tournaments, where the 2 entrances confused many bots and led to bad games. On the other hand, I predict that the map will cause little trouble in PvP games, which will be the majority this tournament.

• (3)Great Barrier Reef has mineral lines around the edge of the map that can be mined out to open new paths. Any bot that knows how to take advantage of this map feature may gain an advantage. Bots that can’t take advantage will probably be OK in most games, though; the edge paths are longer, so no matter whether you rely on native pathfinding or roll your own, you should normally find perfectly adequate paths through the center.

• (4)Andromeda is familiar from SSCAIT.

• (4)Luna the Final is a classic macro map, as standard as they come, sometimes criticized for leading to boring standard games. Bots should be fine on it.

Update: An anonymous commenter points out that McRave is playing zerg, not protoss. Oops, I didn’t pay enough attention! I changed the coloring in the list of entrants to reflect that, but did not update the commentary. To correct the counting, protoss is half of the total pool of participants and equal with zerg among the new entrants.

the CoG maps

The CoG tournament expects participating bots to support all maps in a pool of 19, and randomly selects 5 maps from the pool to play in the tournament. The registration deadline was yesterday, so I guess everybody has already decided whether to participate. But there was mention of the CoG maps in the Undermind podcast #44, so I thought it was worth a post.

I described the maps in a 2018 post about the then-CIG maps. This year’s CoG map pool is the same, except that they removed the extremely difficult map Plasma, “because this map may too tricky to play by agents,” an understatement. Plasma has egg blocks that must be destroyed before ground units can walk to the enemy base, and also small mains where terran and protoss cannot fit all the buildings they need, and narrow ramps from the mains that only allow small units to pass. The combination of special features is more than current bots can be expected to support. The last time Steamhammer participated, I tested that it could play games on Plasma without crashing and made no other preparations. If the map had been selected, games on it would have looked ridiculously bad and distorted the tournament results.

With Plasma removed, I expect only a few of the other maps to pose any difficulty to bots. In order of difficulty, on Alchemist the 2 entrances to each base may cause misplays, but I think games should look normal except for misplaced buildings and overlooked opportunities. On Blue Storm, the narrow entrance to the center near each side’s natural will cause some bots to pile up units, trying to send them through where they do not fit. The most difficult will be Hitchhiker, where many bots will try to route through the destructible buildings and leave units trapped. From what I’ve seen, even bots which know how to destroy the buildings will be unable to plan a route to the enemy base, and are likely to blunder in confusion through the game. With 5 maps chosen out of 19, odds are good that at least one of the 3 more difficult maps will be included.
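
To put a number on “odds are good”: assuming the 5 maps are drawn uniformly at random without replacement (an assumption; the organizers do not say how the draw is made), the chance that at least one of the 3 harder maps is picked works out to about 62%. The function name is made up for the example.

```python
from math import comb

def chance_of_any(pool_size, picks, special):
    """Probability that a uniform draw of `picks` maps from `pool_size`
    includes at least one of the `special` maps."""
    none = comb(pool_size - special, picks) / comb(pool_size, picks)
    return 1.0 - none

p = chance_of_any(19, 5, 3)
print(f"{p:.1%}")
```

The calculation takes the complement: the draw misses all 3 harder maps in C(16,5) = 4368 of the C(19,5) = 11628 possible selections.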

That is what I expect based on past experience, but I could be wrong. Compared to 2018, bots today rely on more and different libraries. Does BWEM have trouble with any of the maps? Does BWEB misplace walls on some of them? I don’t know.

Starcraft AI Ladder crosstables

The Starcraft AI Ladder does not display crosstables or per-map results. I wanted to see the charts to know Steamhammer’s strengths and weaknesses, so I calculated them myself. I modified the script I use to analyze the CoG and AIIDE tournament results every year. The tournament manager’s results file is now in CSV format, a change, but of course it was no trouble to parse. The pop-up table legend explains how to interpret the results to know whether to count each game, only referring to a “Duration” column which the file itself names “Game Time” (to distinguish it from “Wall Time”), and which has value “0:00” on an unstarted game rather than “00:00:00”. My script skipped a total of 3 games out of 2793, all of them with PurpleWave as one player and all due to GAME_STATE_NOT_UPDATED_60S_BOTH_BOTS.
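
A minimal sketch of that kind of filter, accepting either spelling of a zero game time. Only the “Game Time” column name comes from the file; the other columns, the sample rows, and the rule “skip a game exactly when its game time is zero” are assumptions for illustration, and the real counting rule in the table legend may involve more than this.

```python
import csv
import io

def is_unstarted(game_time):
    """True when every field of the time is zero, e.g. "0:00" or "00:00:00"."""
    parts = game_time.strip().split(":")
    return all(p.isdigit() and int(p) == 0 for p in parts)

def counted_games(csv_text):
    """Return the rows whose game actually started."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if not is_unstarted(row["Game Time"])]

# Made-up sample; the column names other than "Game Time" are placeholders.
SAMPLE = """Round,Winner,Loser,Game Time
0,BananaBrain,Steamhammer,12:34
0,Microwave,PurpleWave,0:00
1,Steamhammer,Microwave,08:15
"""

print(len(counted_games(SAMPLE)))
```

Comparing the numeric fields rather than the raw string avoids depending on which zero format the file happens to use.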

I found that the “Download Search Results” button did not behave quite as its name suggests. It seemed to remember a previous search rather than the current setting, or at any rate it did something unexpected. But after a couple of tries I was able to get the complete record of games played since the last reset on 17 April (just a couple days ago). I trimmed off the incomplete round 133, so the file I analyzed includes all games of rounds 0 through 132. 2793 games in 2 days is a great number, far more than BASIL plays.

The ladder would be more valuable if it had more participants. As it is, I am learning from it, because nowhere else runs so many games so quickly.

crosstable

#	bot	overall	Bana	Stea	Micr	Halo	Ecgb	Purp	ZZZK
1	BananaBrain	86.22%		69%	51%	100%	98%	100%	99%
2	Steamhammer	71.68%	31%		80%	46%	79%	95%	98%
3	Microwave	70.05%	49%	20%		56%	97%	99%	100%
4	Halo	61.86%	0%	54%	44%		94%	99%	80%
5	Ecgberht	27.35%	2%	21%	3%	6%		36%	95%
6	PurpleWave	20.88%	0%	5%	1%	1%	64%		56%
7	ZZZKBot	11.79%	1%	2%	0%	20%	5%	44%

BananaBrain is on top in this small field. Steamhammer is doing well; it scores nearly a third versus BananaBrain, wins most games versus Microwave, and is about equal with Halo by Hao Pan. Thanks to the huge number of games, Steamhammer’s learning is saturated so this should be its peak performance. Steamhammer eked out a slight overall lead over Microwave only due to its dominating head-to-head results; against every other bot, Microwave scored better.

The version of PurpleWave is broken; it crashes or oversteps a frame time limit most games. I have to imagine that fixes are progressing in the workshop. I suspect that this version of ZZZKBot may not be working perfectly either, but I didn’t look into it.

each bot’s results per map

BananaBrain	overall	Benzen	Destin	Heartb	Aztec	TauCro	Androm	Circui	Empire	Fortre	Python
Steamhammer	69%	79%	57%	64%	77%	31%	77%	77%	77%	69%	85%
Microwave	51%	21%	57%	50%	62%	46%	46%	62%	46%	46%	77%
Halo	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
Ecgberht	98%	86%	100%	93%	100%	100%	100%	100%	100%	100%	100%
PurpleWave	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
ZZZKBot	99%	100%	100%	100%	100%	100%	100%	92%	100%	100%	100%
overall	86.22%	81%	86%	85%	90%	79%	87%	88%	87%	86%	94%

BananaBrain barely noticed opponents other than Steamhammer and Microwave. Against Steamhammer it had trouble on the map Tau Cross, and against Microwave on Benzene. Checking the mix of strategies played on those maps would probably explain the cause.

Steamhammer	overall	Benzen	Destin	Heartb	Aztec	TauCro	Androm	Circui	Empire	Fortre	Python
BananaBrain	31%	21%	43%	36%	23%	69%	23%	23%	23%	31%	15%
Microwave	80%	93%	100%	100%	100%	69%	62%	77%	77%	54%	69%
Halo	46%	14%	36%	29%	69%	54%	38%	69%	46%	77%	31%
Ecgberht	79%	93%	79%	71%	92%	85%	77%	85%	85%	62%	62%
PurpleWave	95%	100%	100%	100%	100%	100%	92%	92%	92%	100%	77%
ZZZKBot	98%	100%	100%	100%	100%	100%	100%	92%	92%	100%	100%
overall	71.68%	70%	76%	73%	81%	79%	65%	73%	69%	71%	59%

Steamhammer’s results vary strongly from map to map. I think it is a sign that the opening selection is not paying enough attention to the map. I should have gone with a proper Bayesian calculation rather than an ad hoc algorithm.

Microwave	overall	Benzen	Destin	Heartb	Aztec	TauCro	Androm	Circui	Empire	Fortre	Python
BananaBrain	49%	79%	43%	50%	38%	54%	54%	38%	54%	54%	23%
Steamhammer	20%	7%	0%	0%	0%	31%	38%	23%	23%	46%	31%
Halo	56%	43%	57%	21%	77%	23%	92%	85%	62%	62%	38%
Ecgberht	97%	79%	100%	93%	100%	100%	100%	100%	100%	100%	100%
PurpleWave	99%	100%	100%	100%	100%	100%	100%	100%	100%	100%	92%
ZZZKBot	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
overall	70.05%	68%	67%	61%	69%	68%	81%	74%	73%	77%	64%

Microwave also shows a lot of variation from map to map. That’s harder for me to interpret, even though it is the same evidence: Microwave has fewer openings overall than Steamhammer, so it is possible that poor results on some maps are due to not having an appropriate strategy available. Of course it could also be that the opponent’s play is much stronger on some maps.

Halo	overall	Benzen	Destin	Heartb	Aztec	TauCro	Androm	Circui	Empire	Fortre	Python
BananaBrain	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%
Steamhammer	54%	86%	64%	71%	31%	46%	62%	31%	54%	23%	69%
Microwave	44%	57%	43%	79%	23%	77%	8%	15%	38%	38%	62%
Ecgberht	94%	100%	93%	100%	92%	100%	100%	92%	85%	92%	85%
PurpleWave	99%	100%	100%	100%	100%	100%	100%	100%	100%	100%	92%
ZZZKBot	80%	93%	79%	79%	38%	100%	69%	77%	85%	92%	85%
overall	61.86%	73%	63%	71%	47%	71%	56%	53%	60%	58%	65%

Halo by Hao Pan seems to have consistent trouble on Aztec, the only map in the pool with a low-ground main and a ramp up to the natural. That could be the cause. Most bots underestimate the difficulty of defending the main from enemies on high ground.

Ecgberht	overall	Benzen	Destin	Heartb	Aztec	TauCro	Androm	Circui	Empire	Fortre	Python
BananaBrain	2%	14%	0%	7%	0%	0%	0%	0%	0%	0%	0%
Steamhammer	21%	7%	21%	29%	8%	15%	23%	15%	15%	38%	38%
Microwave	3%	21%	0%	7%	0%	0%	0%	0%	0%	0%	0%
Halo	6%	0%	7%	0%	8%	0%	0%	8%	15%	8%	15%
PurpleWave	36%	7%	7%	14%	100%	100%	8%	15%	92%	23%	8%
ZZZKBot	95%	93%	100%	100%	85%	100%	92%	100%	92%	100%	92%
overall	27.35%	24%	23%	26%	32%	36%	21%	23%	36%	28%	26%

For Steamhammer, Ecgberht is a tricky opponent that can sometimes pull surprise wins. Other bots don’t seem to have the same experience.

PurpleWave	overall	Benzen	Destin	Heartb	Aztec	TauCro	Androm	Circui	Empire	Fortre	Python
BananaBrain	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%
Steamhammer	5%	0%	0%	0%	0%	0%	8%	8%	8%	0%	23%
Microwave	1%	0%	0%	0%	0%	0%	0%	0%	0%	0%	8%
Halo	1%	0%	0%	0%	0%	0%	0%	0%	0%	0%	8%
Ecgberht	64%	93%	93%	86%	0%	0%	92%	85%	8%	77%	92%
ZZZKBot	56%	93%	43%	57%	0%	77%	62%	62%	77%	46%	38%
overall	20.88%	31%	23%	24%	0%	13%	27%	26%	15%	21%	28%

PurpleWave crashes every game on Aztec, and frequently on other maps. :-(

ZZZKBot	overall	Benzen	Destin	Heartb	Aztec	TauCro	Androm	Circui	Empire	Fortre	Python
BananaBrain	1%	0%	0%	0%	0%	0%	0%	8%	0%	0%	0%
Steamhammer	2%	0%	0%	0%	0%	0%	0%	8%	8%	0%	0%
Microwave	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%
Halo	20%	7%	21%	21%	62%	0%	31%	23%	15%	8%	15%
Ecgberht	5%	7%	0%	0%	15%	0%	8%	0%	8%	0%	8%
PurpleWave	44%	7%	57%	43%	100%	23%	38%	38%	23%	54%	62%
overall	11.79%	4%	13%	11%	29%	4%	13%	13%	9%	10%	14%

Starcraft AI Ladder

I signed up Steamhammer for the Starcraft AI Ladder run by Dave Churchill and his group. After Rick Kelly fixed a bug in the ladderware, I got the bot signed up and activated.

From my point of view, the main advantage of this third ongoing ladder (after SSCAIT and BASIL) is that it runs with the same tournament manager software that is used for the AIIDE tournaments. So for those who plan to participate in AIIDE 2020, it should be a good test to make sure your bot is compatible. In particular, it checks the frame time limits strictly. For now, the ladder has few participants and most of them are above average strength, which could be an advantage or a disadvantage depending on what you’re looking for. Games run at a high rate and there are fewer participants, so each participant gets more games than on BASIL.

It’s a bit barebones. It doesn’t even have a name; “Starcraft AI Ladder” is more a description, and has been used before. In particular, it does not come with any documentation, so the signup process and the features you get for it may come as a surprise. I thought it was worth writing up.

Signing up has a few steps. First, fill out the form. You have to authenticate your e-mail address. Then, according to the authentication success e-mail, you have to wait for an administrator to approve your account. For me, approval was almost instant, which made me wonder whether there was an administrator at all—but then, Steamhammer and I should be familiar. The message also says “You may upload your bot files to your profile before being approved, but you won’t be able to activate your bot to play in the ladder.” What you upload is a zip file that will be unzipped straight into the AI/ directory. You specified your BWAPI version at signup, so unlike SSCAIT you don’t include a copy of bwapi.dll.

Finally, go to your profile page, change “ladder participation” from “inactive” to “active”, and save changes. You can also fill in extra info if you like, like your URL. After the current queue of games runs out, your bot will get games queued up.

And that’s about it. The Detailed Results page is not as polished as BASIL, but it does offer the same replay download and replay viewing features. At the far right of the page (I had to scroll right) is a “Download Search Results” button that gives you a CSV file with the same information. At the bottom of your profile page, you can download your bot’s game history in the same format, or your bot’s read/ directory. For other bots, there is no access to anything but the games and results.

Since it’s running the same software, games are played in the same order as in the AIIDE tournaments. There are 10 maps—the AIIDE maps, which include 2 maps that are not among the 14 SSCAIT maps. The tournament manager takes each map in turn and plays a round-robin among the participants on that map; that is called one “round”. Then it moves on to the next map. Every so often the “tournament” is declared over and game records are reset; as I write, that happened last on 1 April.
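
The game order described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the tournament manager's code: the function name is hypothetical, the maps are assumed to cycle in a fixed order, the order of pairs within a round is arbitrary here, and the map names are placeholders.

```python
from itertools import combinations


def schedule(bots, maps, rounds):
    """Yield (round_number, map_name, bot_a, bot_b).

    Each round is one full round-robin on a single map, and the maps
    are taken in turn (assumed to wrap around after the last one).
    """
    for r in range(rounds):
        map_name = maps[r % len(maps)]
        for a, b in combinations(bots, 2):
            yield r, map_name, a, b


bots = ["BananaBrain", "Steamhammer", "Microwave", "Halo",
        "Ecgberht", "PurpleWave", "ZZZKBot"]
maps = [f"map{i}" for i in range(10)]  # placeholders for the 10 AIIDE maps
games = list(schedule(bots, maps, rounds=133))
print(len(games))  # prints 2793
```

With the 7 bots of the crosstables above, a round is 21 games, so rounds 0 through 132 come to 2793 games, the same total as in that analysis.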

What do I think so far? The tournament manager part of it is of course industrial strength. The UI is not great, but it’s good by the standard of academic projects, which are often done under severe time constraints and sometimes by students who are still learning. It could use an About page, or a paragraph of description somewhere. I hit one bug, so there are sure to be more that I did not see. Attention to detail is a little lacking; for example, the Home page aka Server Status lists the game number followed by the round number, while the Detailed Results page lists round number followed by game number.

LetaBot in AIST S3

AIST S3 completed. Round up the usual suspects, you can see the results for yourself. There are replays for all the games, but no video so far. The replays are labeled “Round 1” through “Round 14”; you can read the round numbers (I would have called them match numbers) from the bracket diagram. I like seeing all the results at once, but if you prefer to follow the tournament without spoilers, you could download the replays at the top of the page, scroll down no farther than the picture of the initial bracket to match up replays with the tournament structure, and watch the replays by round number.

I was especially interested to see LetaBot by Martin Rooijackers. LetaBot, it seems to me, is strategically strong but suffers from many bugs. Its standard game plan is a classic terran strategy, defend while building up a large army, then bury the opponent in an avalanche of units. LetaBot knows a variety of specific defenses against different cheeses, and has reactions to enemy moves that threaten its game plan, so I think its potential is high. This version is updated to BWAPI 4.4.0, so possibly it has more updates besides. Martin Rooijackers long ago promised a stronger version with bug fixes.

Versus Dragon: I especially wanted to see this match because of the players’ opposite game plans. Dragon plays an early attack and harassment style. In game 1 of the best of 3, at about 2:35 into the game, Dragon sent its first 2 marines and 5 SCVs to breathe some fire. LetaBot luckily scouted in the right direction and timing to spot the force as it left Dragon’s base, and LetaBot had a bunker on its ramp in plenty of time. Fireproof. Dragon sent the SCVs home right away, but inexplicably continued to produce marines that it posted outside LetaBot’s natural. LetaBot appeared to read the situation and react; it made vultures and goliaths—a strong force when the enemy has fallen behind early—batted aside the marines, and continued straight on to Dragon’s base. Dragon had been preparing medics and tanks behind the scene for its next wave of aggression, leaving itself open to LetaBot’s timing, though LetaBot was well ahead even without that opening. Luck plus skill, 1-0.

In the second game, LetaBot crashed on startup. 1-1.

In the third game, Dragon tried the same plan again. LetaBot did not get the lucky scout, but cross positions favored defense. Dragon seemed indecisive, sending SCVs forward and back before attacking with 7. LetaBot floated its barracks after 1 marine; Dragon’s SCVs defeated the marine and moved into LetaBot’s base, some blocking while one built a bunker for the following marines. LetaBot pulled SCVs to defend, and with more it easily won the worker fight, but the bunker completed. LetaBot pulled SCVs out as far as the center of the map to keep marines away from the bunker, but eventually retreated and let 2 marines enter. But the timing was OK: another SCV pull past the loaded bunker kept further marines away while a tank came out and kicked over the bunker. Not perfect defense, but more than good enough. Fire resistant. Dragon’s SCV attack left it far behind LetaBot’s cheaper SCV defense; LetaBot seemed to understand and quickly won with a tank-goliath attack. LetaBot advances 2-1 on its strong defensive and counter-attacking reactions.

Versus Locutus in the next round: LetaBot crashed 2 games in a row.

Versus BananaBrain in the loser’s bracket: LetaBot crashed 2 games in a row.

So... there might be bug fixes, but we didn’t get to see because there were bugs. :-(

what about the Torch Up tournament?

What ever happened with the Torch Up: FOSDEM’20 Brood War AI Tournament? All the pages saying to be ready by 25 January are still up, and I found a list of 11 participants, but I can’t find any sign of results, including at the FOSDEM 2020 conference which happened at the start of February.

It seemed like an interesting competition with some different decisions than other tournaments. The map pool, in a post that is mysteriously dated 9 March, includes Gold Rush, which has difficult features (Steamhammer struggles on that map, though it can still beat the built-in AI). Some of the optional challenge maps are extremely demanding—are these for a future iteration?

Did the tournament happen? Did it complete? Were the results perhaps announced to raised beers and general carousing so that nobody remembers them any more?

AIST S3 pairings

The AIST S3 pairings are out. Normally that would not be worth a post; it’s only first-round pairings. But it immediately struck me that there are 3 mirror matchups out of 4 pairings. It’s the maximum possible; there are 3 terrans and 3 protoss, so not all can be paired with the same race. If that happened to me I would have to step away from the computer; I would be tempted to rerun the pairings to get something “more random”. There’s an irrational reaction!

PurpleWave	BananaBrain
McRave zerg	Microwave
WillyT	Locutus
Dragon	LetaBot

How unlikely is it? Doing the combinatorics, I get 8! / (2⁴ * 4!) = 105 possible pairings, of which 9 have the maximum-mirror property: there are 3 ways to choose the protoss and 3 ways to choose the terran who meet in the one mixed pairing, and after that choice everything else is fixed. That’s 9/105, about 8.6%, not all that rare. It seemed more surprising than it should have.
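
The count is small enough to check by brute force. The Python sketch below enumerates every way to split the 8 players into 4 pairs and counts the mirrors in each. It assumes, as the calculation does, that every pairing is equally likely; the race labels are filled in from the pairings listed above, and the function names are hypothetical.

```python
from collections import Counter

RACE = {
    "PurpleWave": "P", "BananaBrain": "P", "Locutus": "P",
    "WillyT": "T", "Dragon": "T", "LetaBot": "T",
    "McRave": "Z", "Microwave": "Z",
}


def pairings(players):
    """Yield every way to split the players into unordered pairs."""
    if not players:
        yield []
        return
    first, rest = players[0], players[1:]
    # Pair the first player with each possible partner, then recurse.
    for i, partner in enumerate(rest):
        for others in pairings(rest[:i] + rest[i + 1:]):
            yield [(first, partner)] + others


def mirrors(pairing):
    """Number of same-race matchups in one pairing."""
    return sum(RACE[a] == RACE[b] for a, b in pairing)


counts = Counter(mirrors(p) for p in pairings(list(RACE)))
total = sum(counts.values())
print(total, counts[3], f"{counts[3] / total:.1%}")  # prints 105 9 8.6%
```

The same `counts` table also gives the chance of 0, 1, or 2 mirrors, for comparison.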

Starting with mostly mirrors doesn’t affect the tournament much. We should have slightly fewer than average mirrors in the next round in both the winner’s and loser’s bracket; it will even out over the tournament.

I judge that BananaBrain, McRave, and WillyT have little chance to upset their stronger opponents. I’m excited to see skilled defender LetaBot play against the dangerous harasser Dragon. Is this LetaBot version much improved?