
Steamhammer 2.1.2 test version

I’ve uploaded the next test version to SSCAIT, Steamhammer 2.1.2. The biggest change is that stuck units are less common. There is still an issue where units can occasionally freeze en masse in the late game, but it’s rare in my tests and so far I’ve only seen it when Steamhammer was winning and it didn’t matter. To fix sticking, I finished up and enabled part of Steamhammer’s new unit control infrastructure, work that has been inching forward for months. I didn’t change anything else or take any unsticking actions; the new structure usually works better by nature than the classic structure inherited from UAlbertaBot.

There’s also a fix for a building manager bug that incorrectly turned expansion hatcheries into macro hatcheries. The bug sometimes made play much worse, but also sometimes made it better, so testing is called for.

AIIDE 2018 - what CherryPi learned

Here is a table of how each CherryPi opening fared against each opponent, like the tables I made for other bots. Reading the code confirmed my inference that the learning files recorded opening build orders, not build orders switched to later in the game; see how CherryPi played.

| bot | total | 10hatchling | 2hatchmuta | 3basepoollings | 9poolspeedlingmuta | hydracheese | zve9poolspeed | zvp10hatch | zvp3hatchhydra | zvp6hatchhydra | zvpohydras | zvpomutas | zvt2baseguardian | zvt2baseultra | zvt3hatchlurker | zvtmacro | zvz12poolhydras | zvz9gas10pool | zvz9poolspeed | zvzoverpool |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| #1 saida | 13-90 13% | - | - | - | - | - | 1-19 5% | - | - | - | - | - | - | 1-15 6% | 9-37 20% | 2-19 10% | - | - | - | - |
| #3 cse | 73-30 71% | - | - | - | - | - | 0-2 0% | 24-5 83% | - | - | 16-8 67% | - | - | - | - | 33-15 69% | - | - | - | - |
| #4 bluebluesky | 89-14 86% | - | - | - | - | - | 0-1 0% | 29-8 78% | - | - | - | - | - | - | - | 60-5 92% | - | - | - | - |
| #5 locutus | 84-19 82% | - | - | 63-11 85% | - | - | - | - | - | 14-3 82% | - | 2-2 50% | - | - | - | 5-3 62% | - | - | - | - |
| #6 isamind | 99-4 96% | - | - | 1-0 100% | - | - | - | - | - | 98-4 96% | - | - | - | - | - | - | - | - | - | - |
| #7 daqin | 103-0 100% | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 103-0 100% | - | - | - | - |
| #8 mcrave | 87-16 84% | - | - | 9-2 82% | - | - | - | - | - | 31-4 89% | - | 14-4 78% | - | - | - | 33-6 85% | - | - | - | - |
| #9 iron | 97-6 94% | - | - | - | - | 97-6 94% | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| #10 zzzkbot | 93-10 90% | 58-4 94% | - | - | 0-1 0% | - | - | - | - | - | - | - | - | - | - | - | - | - | 35-4 90% | 0-1 0% |
| #11 steamhammer | 81-21 79% | 22-7 76% | - | - | - | - | 16-5 76% | - | - | - | - | - | - | - | - | - | 0-1 0% | - | 43-8 84% | - |
| #12 microwave | 94-9 91% | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 0-1 0% | 4-2 67% | 90-6 94% |
| #13 lastorder | 85-18 83% | 45-7 87% | - | - | - | - | 0-1 0% | - | - | - | - | - | - | - | - | - | - | - | - | 40-10 80% |
| #14 tyr | 98-5 95% | - | - | - | - | - | - | 98-5 95% | - | - | - | - | - | - | - | - | - | - | - | - |
| #15 metabot | 94-2 98% | - | - | - | - | - | - | - | - | - | 94-2 98% | - | - | - | - | - | - | - | - | - |
| #16 letabot | 101-2 98% | 0-1 0% | - | 97-0 100% | - | - | 1-1 50% | - | - | - | - | - | 3-0 100% | - | - | - | - | - | - | - |
| #17 arrakhammer | 92-11 89% | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 92-11 89% | - |
| #18 ecgberht | 102-1 99% | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 102-1 99% | - | - | - | - |
| #19 ualbertabot | 99-4 96% | - | - | - | 96-2 98% | - | 3-2 60% | - | - | - | - | - | - | - | - | - | - | - | - | - |
| #20 ximp | 98-5 95% | - | - | - | - | - | - | - | 1-0 100% | - | 97-5 95% | - | - | - | - | - | - | - | - | - |
| #21 cdbot | 103-0 100% | - | - | - | - | - | 96-0 100% | - | - | - | - | - | - | - | - | - | - | - | 7-0 100% | - |
| #22 aiur | 100-3 97% | - | - | - | - | - | - | - | - | - | 100-3 97% | - | - | - | - | - | - | - | - | - |
| #23 killall | 103-0 100% | 102-0 100% | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 1-0 100% |
| #24 willyt | 103-0 100% | - | 103-0 100% | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| #25 ailien | 103-0 100% | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 103-0 100% | - |
| #26 cunybot | 100-3 97% | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 100-3 97% | - |
| #27 hellbot | 103-0 100% | - | - | - | - | - | - | 31-0 100% | - | - | 72-0 100% | - | - | - | - | - | - | - | - | - |
| overall | 90% | 227-19 92% | 103-0 100% | 170-13 93% | 96-3 97% | 97-6 94% | 117-31 79% | 182-18 91% | 1-0 100% | 143-11 93% | 379-18 95% | 16-6 73% | 3-0 100% | 1-15 6% | 9-37 20% | 338-49 87% | 0-1 0% | 0-1 0% | 384-28 93% | 131-17 89% |

Look how sparse the chart is—CherryPi was highly selective about its choices. It did not try more than 4 different builds against any opponent. It makes sense to minimize the number of choices so that you don’t lose games exploring bad ones, but you have to be pretty sure that one of the choices you do try is good. Where did the selectivity come from?

The opening “hydracheese” was played only against Iron, and was the only opening played against Iron. It smelled like a hand-coded choice. Sure enough, the file source/src/models/banditconfigurations.cpp configures builds by name for 18 of the 27 entrants. A comment says that the build order switcher is turned off for the hydracheese opening only: “BOS disabled for this specific build because the model hasn’t seen it.” Here is the full set of builds configured, including defaults for those that were not hand-configured. CherryPi played only builds that were configured, but did not play all the builds that were configured; presumably it stopped when it hit a good one.
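The file name suggests the opening chooser is a multi-armed bandit, though I don't know the exact formula it uses. As a sketch of the general idea, here is a UCB1-style chooser over a configured build list; a Python stand-in with hypothetical names, not CherryPi's actual C++ code:

```python
import math

def choose_opening(stats, total_games, explore=2.0):
    """Pick an opening UCB1-style from per-build (wins, losses) records.

    `stats` maps build name -> (wins, losses). Untried builds are chosen
    first; otherwise the build with the best upper confidence bound on
    its win rate wins. A build with a strong record keeps getting picked,
    which matches "presumably it stopped when it hit a good one."
    """
    for build, (w, l) in stats.items():
        if w + l == 0:
            return build  # explore every configured build at least once
    def ucb(record):
        w, l = record
        n = w + l
        return w / n + math.sqrt(explore * math.log(total_games) / n)
    return max(stats, key=lambda b: ucb(stats[b]))

# An untried build is explored first; after that, the winner dominates.
print(choose_opening({"hydracheese": (9, 1), "zve9poolspeed": (1, 9)},
                     total_games=20))  # hydracheese
```

The exploration term shrinks as a build accumulates games, so a configured build that loses its first few tries is quickly abandoned, which would explain the many builds configured but never recorded in the table.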

| bots | builds | note |
| --- | --- | --- |
| AILien | zve9poolspeed zvz9poolspeed | returning opponents from last year |
| AIUR | zvtmacro zvpohydras zvp10hatch | |
| Arrakhammer | 10hatchling zvz9poolspeed | |
| Iron | hydracheese | |
| UAlbertaBot | zve9poolspeed 9poolspeedlingmuta | |
| Ximp | zvpohydras zvtmacro zvp3hatchhydra | |
| Microwave | zvzoverpool zvz9poolspeed zvz9gas10pool | “we have some expectations” |
| Steamhammer | zve9poolspeed zvz9poolspeed zvz12poolhydras 10hatchling | |
| ZZZKBot | 9poolspeedlingmuta 10hatchling zvz9poolspeed zvzoverpool | |
| ISAMind, Locutus, McRave, DaQin | zvtmacro zvp6hatchhydra 3basepoollings zvpomutas | |
| CUNYBot | zvzoverpoolplus1 zvz9gas10pool zvz9poolspeed | |
| HannesBredberg | zvtp1hatchlurker zvt2baseultra zvt3hatchlurker zvp10hatch | |
| LetaBot | zvtmacro 3basepoollings zvt2baseguardian zve9poolspeed 10hatchling | |
| MetaBot | zvtmacro zvpohydras zvpomutas zve9poolspeed | |
| WillyT | zvt2baseultra 12poolmuta 2hatchmuta | |
| ZvT | zvt2baseultra zvtmacro zvt3hatchlurker zve9poolspeed | defaults |
| ZvP | zve9poolspeed zvtmacro zvp10hatch zvpohydras | |
| ZvZ | 10hatchling zve9poolspeed zvz9poolspeed zvzoverpool | |
| ZvR | 10hatchling zve9poolspeed 9poolspeedlingmuta | |

I read this as pulling out all the stops to reach #1. They would have succeeded if not for SAIDA.

banditconfigurations.cpp continues and declares some properties for builds including non-opening builds. It looks like .validOpening() tells whether it can be played as an opening build, .validSwitch() tells whether the build order switcher is allowed to switch to it during the game, and .switchEnabled() tells whether the build order switcher is enabled at all.
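As a sketch of how such a chained property interface works, here is a Python stand-in with hypothetical fields (CherryPi's real code is C++, and I'm inferring the semantics from the method names):

```python
class BuildConfig:
    """Sketch of a fluent build-property record, modeled on the
    .validOpening()/.validSwitch()/.switchEnabled() calls seen in
    banditconfigurations.cpp. Field names are my invention."""

    def __init__(self, name):
        self.name = name
        self.opening = False     # may be played from the start of the game
        self.switchable = False  # the build order switcher may switch to it
        self.bos = True          # build order switcher enabled at all

    def validOpening(self):
        self.opening = True
        return self  # return self so calls can be chained

    def validSwitch(self):
        self.switchable = True
        return self

    def switchEnabled(self, on):
        self.bos = on
        return self

# "BOS disabled for this specific build because the model hasn't seen it."
hydracheese = BuildConfig("hydracheese").validOpening().switchEnabled(False)
```

Returning `self` from each setter is what lets one build be declared in a single readable line, which is presumably why the CherryPi authors chose this style for a long list of configurations.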

The build orders themselves are defined in source/src/buildorders/. I found them a little hard to read, partly because they are written in reverse order: Actions to happen first are posted last to the blackboard.
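A minimal sketch of the reversed convention, assuming the blackboard behaves like a stack (the real blackboard holds far more than strings):

```python
# Actions that should happen first are posted last, so the consumer
# pops from the end of the list. Illustrative only.
blackboard = []

def post(action):
    blackboard.append(action)

# Written in reverse, as in the CherryPi build order files:
post("zergling")       # happens last
post("spawning_pool")
post("drone")          # happens first

execution_order = [blackboard.pop() for _ in range(len(blackboard))]
print(execution_order)  # ['drone', 'spawning_pool', 'zergling']
```

Reading such a file top to bottom therefore gives you the build order backwards, which is exactly what makes the sources hard to skim.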

The opening zve9poolspeed (I read “zve” as zerg versus everything) has the most red boxes in the chart—it did poorly against more opponents than any other. It may have been a poor choice to configure for use in so many cases. In contrast, zvz9poolspeed specialized for ZvZ was successful. It gets fast mutalisks and in general has a lot more strategic detail coded into the build.

They seem to have had expectations of the zvt2baseultra build against terran. It is configured for HannesBredberg, WillyT, and the default ZvT. It was in fact only tried against SAIDA. I didn’t notice anything that tells CherryPi what order to try opening builds in. Maybe the build order switcher itself contributes, helping to choose the more likely openings first?

Steamhammer 2.1.1 test version

I’ve uploaded Steamhammer 2.1.1 to SSCAIT. It’s a test version that I don’t expect to release source for. I’ll follow a plan similar to last year’s: Upload a series of test versions as the tournament approaches, in hope of catching any new bugs and weaknesses before it is too late. The version that goes into the tournament will be whatever is ready by then, and that is the version that I’ll do the full release job for. Last year I used “1.4a1” and so on as test version numbers. This year it will be “2.1.x”.

There are quite a few changes, and they are probably not what you are expecting. I count 5 fixes for serious bugs that could easily lose games, plus mitigations to reduce 2 important weaknesses, plus a bunch of quick corrections and tweaks for less important issues that came up. There is still 1 major bug at the top of my list to fix before the tournament, and we’ll see whether I’ve introduced any awful new weaknesses—some of the changes are risky, it would not be a surprise.

I disabled Randomhammer until after the tournament. It’s not broken or anything, I just felt like turning it off in plenty of time this year. Everybody else will get in a few more last-minute games.

I changed the debug drawing options to show off a new option that I added, DrawHiddenEnemies. It draws 4 symbols to represent known enemy units and buildings that cannot be seen at the moment: An enemy that is out of sight gets a small green circle at its last seen location, or a yellow circle if it was burrowed when last seen (especially useful for spider mines and lurkers). An enemy that is out of sight and is known to be no longer at its last seen location gets a red X instead. And an enemy which is in sight and is not detected gets a larger violet circle (a faint color that I think of as representing the cloaking field) and is labeled in white with its unit type. The picture is from a game versus Iron. The yellow circles are known spider mines outside detection range, and the green circle and red X mean that at least 2 more enemies are out of sight, likely nearby.

hidden enemies symbols
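The symbol choice reduces to a small decision over a few flags per known enemy. Here it is restated as a pure Python function (illustrative only; Steamhammer's actual drawing code is C++ against BWAPI):

```python
# Decide which DrawHiddenEnemies symbol a known enemy unit gets,
# following the rules described above. Flag names are my invention.
def hidden_enemy_symbol(visible, detected, gone, was_burrowed):
    if visible:
        # In sight: draw something only if the unit is undetected
        # (cloaked or burrowed), labeled with its unit type.
        return None if detected else "violet circle + type label"
    if gone:
        return "red X"          # known to have left its last seen spot
    if was_burrowed:
        return "yellow circle"  # e.g. spider mines, lurkers
    return "green circle"       # last seen location, may still be there

print(hidden_enemy_symbol(visible=False, detected=False,
                          gone=False, was_burrowed=True))  # yellow circle
```

Note the precedence: "known to be gone" overrides the burrowed marker, since a red X replaces the circle rather than supplementing it.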

I’m not finished with AIIDE 2018. I want to analyze aspects of CherryPi and SAIDA. I’ll squeeze that in too, sooner or later.

LastOrder and its macro model - technical info

Time to dig into the details! I read the paper and some of the code to find stuff out.

LastOrder’s “macro” decisions are made by a neural network whose data size is close to 8MB—much larger than LastOrder’s code (but much smaller than CherryPi’s model data). There is room for a lot of smarts in that many bytes. The network takes in a summary description of the game state as a vector of feature values, and returns a macro action, what to build or upgrade or whatever next. The code to marshal data to and from the network is in StrategyManager.cpp.

network input

The list of network input features is initialized in the StrategyManager constructor and filled in in StrategyManager::triggerModel(). There are a lot of features. I didn’t dig into the details, but it looks as though some of the features are binary, some are counts, some are X or Y values that together give a position on the map, and a few are other numbers. They fall into these groups:

• State features. Basic information about the map and the opponent, our upgrades and economy, our own and enemy tech buildings.

• Waiting to build features. I’m not sure what these mean, but it’s something to do with production.

• “Our battle basic features” and “enemy battle basic features.” Combat units.

• “Our defend building features” and “enemy defend building features.” Static defense.

• “Killed” and “kill” features, what units of ours or the enemy’s are destroyed.

• A mass of features related to our current attack action and what the enemy has available to defend against it.

• “Our defend info” looks like places we are defending and what the enemy is attacking with.

• “Enemy defend info” looks like it locates the enemy’s static defense relative to the places we are interested in attacking.

• “Visible” gives the locations of the currently visible enemy unit types. I’m not quite sure what this means. A unit type doesn’t have an (x,y) position, and it seems as though LastOrder is making one up. It could be the location of the largest group of each unit type, or the closest unit of each type, or something. Have to read more code.
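Whatever the individual features mean, the marshaling step amounts to flattening the groups into one numeric vector for the network. A hedged sketch with made-up feature values and group sizes (the real list in triggerModel() is far longer):

```python
# Flatten grouped game-state features into a single input vector,
# in the spirit of StrategyManager::triggerModel(). Group names follow
# the post; the particular features and counts are hypothetical.
def build_feature_vector(state):
    groups = [
        state["state"],            # map, opponent, upgrades, economy
        state["waiting_to_build"],
        state["our_battle"], state["enemy_battle"],
        state["our_defend_buildings"], state["enemy_defend_buildings"],
        state["killed"], state["kill"],
    ]
    vec = []
    for g in groups:
        vec.extend(float(x) for x in g)  # binary flags, counts, x/y coords
    return vec

state = {
    "state": [1, 0, 12],             # e.g. race flags, drone count
    "waiting_to_build": [0, 2],
    "our_battle": [6], "enemy_battle": [4],
    "our_defend_buildings": [1], "enemy_defend_buildings": [3],
    "killed": [0], "kill": [2],
}
print(len(build_feature_vector(state)))  # 11
```

The important property is that the vector layout is fixed: every game state, however different, is squeezed into the same slots, which is what lets a single network consume it.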

With this much information available, sophisticated strategies are possible in principle. It’s not clear how much of this the network successfully understands and makes use of. The games I watched did not give the impression of deep understanding, but then again, we have to remember that LastOrder learned to play against 20 specific opponents. Its results against those opponents suggest that it does understand them deeply.

network output

It looks like the network output is a single macro action. Code checks whether the action is valid in the current situation and, if so, calls on the appropriate manager to carry it out. The code is full of I/O details and validation and error handling, so I might have missed something in the clutter. Also the code shows signs of having been modified over time without tying up loose ends. I imagine they experimented actively.

By the way, the 9 pool/10 hatch muta/12 hatch muta opening choices and learning code from Overkill are still there, though Overkill’s opening learning is not used.

learning setup

The learning setup uses Ape-X DQN. The term is as dense as a neutron star! Ape-X is a way to organize deep reinforcement learning; see the paper Distributed Prioritized Experience Replay by Horgan et al of Google’s DeepMind. In “DQN”, D stands for deep and as far as I’m concerned is a term of hype and means “we’re doing the cool stuff.” Q is for Q-learning, the form of reinforcement learning you use when you know what’s good (winning the game) and you have to figure out from experience a policy (that’s a technical term) to achieve it in a series of steps over time. The policy is in effect a box where you feed in the situation and it tells you what to do in that situation. What’s good is represented by a reward (given as a number) that you may receive long after the actions that earned it; that can make it hard to figure out a good policy, which is why you end up training on a cluster of 1000 machines. Finally, “N” is for the neural network that acts as the box that knows the policy.
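For concreteness, here is the tabular update that the "Q" refers to. DQN replaces the table with a network, but the target being chased is the same:

```python
# One tabular Q-learning update:
#   Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
# The bracketed term is the surprise (TD error): how much better or
# worse things went than the current policy expected.
def q_update(Q, s, a, reward, s_next, actions, alpha=0.1, gamma=0.99):
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (reward + gamma * best_next - old)

Q = {}
q_update(Q, s="behind", a="expand", reward=1.0, s_next="even",
         actions=["expand", "attack"])
print(Q[("behind", "expand")])  # 0.1
```

With alpha less than 1, a reward only nudges the estimate, so good estimates emerge from many repetitions; the discount gamma is what lets reward propagate backward to the earlier actions that earned it.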

In Ape-X, the learning system consists of a set of Actors that (in the case of LastOrder) play Brood War and record the input features and reward for each time step, plus a Learner (the paper suggests that 1 learner is enough, though you could have more) that feeds the data to a reinforcement learning algorithm. The Actors are responsible for exploring, that is, trying out variations from the current best known policy to see if any of them are improvements. The Ape-X paper suggests having different Actors explore differently so you don’t get stuck in a rut. In the case of LastOrder, the Actors play against a range of opponents. The Learner keeps track of which data points are more important to learn and feeds those in more often to speed up learning. If you hit a surprise, meaning the reward is much different than you expected (“I thought I was winning, then a nuke hit”), that’s something important to learn.
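A toy sketch of the prioritized sampling idea: transitions are drawn with probability proportional to their priority, which is set from how surprising they were. (The real Ape-X implementation uses a sum tree and importance-sampling corrections, both omitted here.)

```python
import random

# Sample a transition with probability proportional to its priority.
def sample(buffer, rng):
    total = sum(priority for priority, _ in buffer)
    r = rng.uniform(0, total)
    acc = 0.0
    for priority, transition in buffer:
        acc += priority
        if r <= acc:
            return transition
    return buffer[-1][1]  # guard against floating-point rounding

# A surprising step ("nuke hit") gets a large priority and dominates.
buffer = [(0.1, "routine step"), (5.0, "nuke hit"), (0.1, "routine step")]
rng = random.Random(0)
counts = {}
for _ in range(1000):
    t = sample(buffer, rng)
    counts[t] = counts.get(t, 0) + 1
# counts["nuke hit"] is far larger than counts for the routine steps.
```

Priorities are then refreshed after each training pass: once the network has absorbed the surprise, its TD error shrinks and the transition is replayed less often.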

LastOrder seems to have closely followed the Ape-X DQN formula from the Ape-X paper. They name the exact same set of techniques, although many other choices are possible. Presumably DeepMind knows what they’re doing.

LastOrder does not train with a reward “I won/I lost.” That’s very little information and appears long after the actions that cause it, and it would leave learning glacially slow. They use reward shaping, which means giving a more informative reward number that offers the learning system more clues about whether it is going in the right direction. They use a reward based on the current game score.
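A sketch of what score-based shaping might look like. The scale constant and the exact score terms here are my guesses, not LastOrder's:

```python
# Shaped reward: instead of a single win/loss signal at the end of the
# game, each step is rewarded with the change in game score since the
# previous step, scaled down so it doesn't swamp the terminal reward.
def shaped_reward(prev_score, score, scale=1e-4):
    return (score - prev_score) * scale

def terminal_reward(won):
    return 1.0 if won else -1.0

# Gaining score gives a positive nudge, losing score a negative one.
rewards = [shaped_reward(a, b) for a, b in [(0, 500), (500, 450)]]
```

The effect is that the agent gets feedback within seconds of an action instead of many minutes later, at the cost of a bias: it is now partly optimizing score rather than winning, which the designers have to keep small enough not to matter.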

the network itself

Following an idea from the 2015 paper Deep Recurrent Q-Learning for Partially Observable MDPs by Hausknecht and Stone, the LastOrder team layered a Long Short-Term Memory network in front of the DQN. We’ve seen LSTM before in Tscmoo (at least at one point; is it still there?). The point of the LSTM network is to remember what’s going on and more fully represent the game state, because in Brood War there is fog of war. So inputs go through the LSTM to expand the currently observed game state into some encoded approximation of all the game state that has been seen so far, then through the DQN to turn that into an action.

The LastOrder paper does not go into detail. There is not enough information in it to reproduce their network design. The Actor and Learner code is in the repo. I haven’t read it to see if it tells us everything.

Taken together it’s a little complicated, isn’t it? Not something for one hobbyist to try on their own. I think you need a team and a budget to put together something like this.

LastOrder and its macro model - general info

LastOrder (github) now has a 15-page academic paper out, Macro action selection with deep reinforcement learning in StarCraft by 6 authors including Sijia Xu as lead author. The paper does not go into great detail, but it reveals new information. It also uses a lot of technical terms without explanation, so it may be hard to follow if you don’t have the background. Also see my recent post how LastOrder played for a concrete look at its games.

I want to break my discussion into 2 parts. Today I’ll go over general information, tomorrow I’ll work through technical stuff, the network input and output and training and so on.

The name LastOrder turns out to be an ingenious reference to the character Last Order from the A Certain Magical Index fictional universe, the final clone sister. The machine learning process produces a long string of similar models which go into combat for experimental purposes, and you keep the last one. Good name!

LastOrder divides its work into 2 halves, “macro” handled by the machine learning model and “micro” handled by the rule-based code derived from Overkill. It’s a broad distinction; in Steamhammer’s 4-level abstraction terms, I would say that “macro” more or less covers strategy and operations, while “micro” covers tactics and micro. The macro model has a set of actions to build stuff, research tech, and expand to a new base, and a set of 18 attack actions which call for 3 different types of attack in each of 5 different places plus 3 “add army” actions which apparently assign units to the 3 types of attack. (The paper says 17 though it lists 18. It looks as though the mutalisk add army action is unused, maybe because mutas are added automatically.) There is also an action to do nothing.

The paper includes a table on the last page, results of a test tournament where each of the 28 AIIDE 2017 participants played 303 games against LastOrder. We get to see how LastOrder scored its advertised 83% win rate: #2 PurpleWave and #3 Iron (rankings from AIIDE 2017) won nearly all games, no doubt overwhelming the rule-based part of LastOrder so that the macro model could not help. Next Microwave scored just under 50%, XIMP scored about 32%, and all others performed worse, including #1 ZZZKBot at 1.64% win rate—9 bots scored under 2%. When LastOrder’s micro part is good enough, the macro part is highly effective.

In AIIDE 2018, #13 LastOrder scored 49%, ranking in the top half. The paper has a brief discussion on page 10. LastOrder was rolled by top finishers because the micro part could not keep up with #9 Iron and above (according to me) or #8 McRave and above (according to the authors, who know things I don’t). Learning can’t help if you’re too burned to learn. LastOrder was also put down by terrans Ecgberht and WillyT, whose play styles are not represented in the 2017 training group, which has only 4 terrans (one of which is Iron that LastOrder cannot touch).

In the discussion of future work (a mandatory part of an academic paper; the work is required to be unending), they talk briefly about how to fix the weaknesses that showed in AIIDE 2018. They mention improving the rule-based part and learning unit-level micro to address the too-burned-to-learn problem, and self-play training to address the limitations of the training opponents. Self-play is the right idea in principle, but it’s not easy. You have to play all 3 races and support all the behaviors you might face, and that’s only the starting point before making it work.

I’d like to suggest another simple idea for future work: Train each matchup separately. You lose generalization, but how much do production and attack decisions generalize between matchups? I could be wrong, but I think not much. Instead, a zerg player could train 3 models, ZvT ZvP and ZvZ, each of which takes fewer inputs and is solving an easier problem. A disadvantage is that protoss becomes relatively more difficult if you allow for mind control.

LastOrder has skills that I did not see in the games I watched. There is code for them, at least; whether it can carry out the skills successfully is a separate question. It can use hydralisks and lurkers. Most interestingly, it knows how to drop. The macro model includes an action to research drop (UpgradeOverlordLoad), an action to assign units and presumably load up for a drop (AirDropAddArmy), and actions to carry out drops in different places (AttackAirDropStart for the enemy starting base, AttackAirDropNatural, AttackAirDropOther1, AttackAirDropOther2, AttackAirDropOther3). The code to carry out drops is AirdropTactic.cpp; it seems to expect to drop either all zerglings, all hydralisks, or all lurkers, no mixed unit types. Does LastOrder use these skills at all? If anybody can point out a game, I’m interested.

Learning when to make hydras and lurkers should not be too hard. If LastOrder rarely or never uses hydras, it must be because it found another plan more effective—in games you make hydras first and then get the upgrades, so it’s easy to figure out. If it doesn’t use lurkers, maybe they didn’t help, or maybe it didn’t have any hydras around to morph after it tried researching the upgrade, because hydras were seen as useless. But still, it’s only 2 steps, it should be doable. Learning to drop is not as easy, though. To earn a reward, the agent has to select the upgrade action, the load action, and the drop action in order, each at a time when it makes sense. Doing only part of the sequence sets you back, and so does doing the whole sequence if you leave too much time between the steps, or drop straight into the enemy army, or make any severe mistake. You have to carry through accurately to get the payoff. It should be learnable, but it may take a long time and trainloads of data. I would be tempted to explicitly represent dependencies like this in some way or another, to tell the model up front the required order of events.
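One way to represent such dependencies explicitly, sketched with the drop action names from the post (the gating scheme itself is my suggestion, not anything LastOrder does):

```python
# An action is only offered to the model once its prerequisites have
# been taken, so the sequence upgrade -> load -> drop cannot be
# started in the middle or out of order.
PREREQS = {
    "AirDropAddArmy": {"UpgradeOverlordLoad"},
    "AttackAirDropStart": {"AirDropAddArmy"},
}

def valid_actions(all_actions, done):
    """Return the actions whose prerequisites are a subset of `done`."""
    return [a for a in all_actions if PREREQS.get(a, set()) <= done]

actions = ["UpgradeOverlordLoad", "AirDropAddArmy", "AttackAirDropStart"]
print(valid_actions(actions, done=set()))
# Only the upgrade is offered until it has been taken.
```

The model then never wastes exploration on impossible orderings; it only has to learn the timing of each step, which is a much smaller problem than discovering the ordering and the timing at once.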

AIIDE 2018 - how CherryPi played

Overall, the play of the AIIDE 2018 CherryPi version looks similar to last year’s CherryPi which is still playing on SSCAIT. It still has the devastating ling micro, and it still prefers to win games with a flood of low-level units. It still gets melee +1 attack even when +1 carapace seems better. (Do CherryPi’s micro skills make +1 attack better, and if so, how?) Mutalisk micro looks very similar to Tscmoo’s, with mutas individually cautious and clever and collectively lazy and uncoordinated. It can use lurkers, guardians, and ultralisks. I didn’t see defilers, even when they would have been useful.

CherryPi scouts extremely aggressively with its first 2 overlords. They stick near the enemy base and try to poke into every corner, even if the enemy is terran and can shoot them down early. It gets a clear view, which must be useful for its build order switcher. The drawback is that the overlords often die young.

I think this CherryPi looks beatable. It doesn’t have SAIDA’s wide knowledge of action and reaction. It doesn’t have Steamhammer’s knowledge of how to react to LastOrder’s excessive static defense (but usually wins anyway with a zergling flood). It sometimes ignores undefended enemy bases, preferring to attack into the enemy’s strength—or even to wait idly. Game 31245 versus Iron shows it sticking with gas units and failing at macro; it forgot its love of zerglings. It doesn’t know whether it is ahead or behind, and it doesn’t realize that when it is maxed and owns the map, it ought to attack regardless of losses. It’s strong and tricky, but it also makes mistakes. I think next year’s version had better be improved if they don’t want to be overtaken.

Here are the names of the build orders that CherryPi recorded itself as playing in its opponent learning files. One of CherryPi’s major advertised features is a learned build order switcher that can switch to a new build order on the fly. It recorded 103 build order wins/losses for each opponent (except a couple with fewer), and 103 rounds were played, so these appear to be opening build orders only rather than all build orders tried throughout each game. Presumably the openings reflect CherryPi’s intentions when it started the game. It may not have followed the initial build order to its end.

  • 10hatchling
  • 2hatchmuta
  • 3basepoollings
  • 9poolspeedlingmuta
  • hydracheese
  • zve9poolspeed
  • zvp10hatch
  • zvp3hatchhydra
  • zvp6hatchhydra
  • zvpohydras
  • zvpomutas
  • zvt2baseguardian
  • zvt2baseultra
  • zvt3hatchlurker
  • zvtmacro
  • zvz12poolhydras
  • zvz9gas10pool
  • zvz9poolspeed
  • zvzoverpool

CherryPi tried between 1 and 4 openings against each opponent. CherryPi sometimes switched away from its initial try even if it won all games (for example, against CDBot and Hellbot), so I’m not sure what the switching criterion is. But opponents that it tried 4 openings against are all ones that gave it a touch of trouble.

grep -c key *.json

AILien.json:1
Aiur.json:1
Arrakhammer.json:1
BlueBlueSky.json:3
CDBot.json:2
CSE.json:4
CUNYBot.json:1
DaQin.json:1
Ecgberht.json:1
Hellbot.json:2
ISAMind.json:2
Iron.json:1
KillAll.json:2
LastOrder.json:3
LetaBot.json:4
Locutus.json:4
McRave.json:4
MetaBot.json:1
Microwave.json:3
SAIDA.json:4
Steamhammer.json:4
Tyr.json:1
UAlbertaBot.json:2
WillyT.json:1
Ximp.json:2
ZZZKBot.json:4

The other machine learning feature advertised for CherryPi is a building placer. It was trained against human building placements and apparently takes into account some of the bot’s intentions. I recommend against training on human play (or at least exclusively on human play), because machines play differently. Teaching a bot to blindly imitate human decisions that it doesn’t understand will lead to mistakes. It’s worse than teaching a human to imitate without understanding, because the bot won’t figure things out on its own. Nevertheless, CherryPi’s building placement does seem cleaner than other bots. To me the building placement looks simple and logical, but not sophisticated like a strong human player’s. Here’s an example from a ZvZ game, game 1755. The sunken colony does not interfere with gas mining, and it is somewhat protected from zergling surrounds by the geyser, the spawning pool, and the lair itself, while remaining open for drone drills on the drone side. The spire is curiously far away; I would have fit it into the gap next to the sunken. It looks OK but a little loose, not quite optimized. (By the way, game 14742 against the same opponent has the same building layout, except that the spire is placed close.)

ZvZ building placement in game 1755

CherryPi has gained new tactical tricks. I mentioned the burrow trick where it burrows zerglings at expansion locations. So far, I haven’t seen a game where the opponent was ready for the trick; I imagine it contributed to a lot of wins, even though CherryPi sometimes researches burrow and then never uses it. (And I’m disappointed. I thought of using this trick in Steamhammer, and didn’t because I expected that bots which knew how to clear spider mines would also know how to clear burrowed zerglings. I think I was wrong!) As far as I’ve seen, CherryPi doesn’t use burrow for any other purpose (though I wouldn’t be surprised, since there are so many). CherryPi also does zergling runbys; an example is game 1406 versus SAIDA where CherryPi played an unusual and not entirely efficient gas-first 3 hatch zergling build.

CherryPi doesn’t have as many complex skills as SAIDA, but it has a good number. I doubt I saw everything it can do.

Steamhammer 2.1.1 status

Posts take time, but I am also making progress on Steamhammer. The next version will have no big features, so I’ll call it version 2.1.1. Big stuff has to wait until next year, after the tournament season. I found a second serious bug in defiler control, and now defilers consistently move to where they are wanted. It helps them swarm and plague more actively, though they still don’t do as much work as I would like. So far, 2.1.1 development version keeps track of burrowed units accurately, recognizes enemy proxies better, wards off the enemy scout worker more reliably, and has some improved macro decisions and emergency reactions, a new opening (of course), and a variety of other fixes.

There are a bunch of debilitating bugs in squad control and micro. For upcoming work, I have my eyes on 2 bugs in particular that I think cause the most frequent setbacks (rather than the most glaring blunders), the suicide pokes and the stuck units. If I get those fixed in time, I have a priority list of more stuff. If I succeed, Steamhammer will play better in nearly every game, which should show in the SSCAIT round robin phase.

note on CherryPi

I’ve been watching CherryPi’s AIIDE games. No conclusions yet, but I noticed that CherryPi likes to research burrow (not the first bot to do so) and to burrow scouting lings at expansions to watch for the enemy (I think it’s the first to do that). SAIDA appeared unready for the trick. When an SCV showed up, CherryPi did not unburrow the ling, but sent another to prevent the expansion.

AIIDE 2018 - how LastOrder played

The new bot #13 LastOrder is related to the classic Overkill by Sijia Xu, but uses a machine learning model to make certain decisions: According to the description, “all the production of unit (excluding overlord), building, upgrade, tech and trigger of attack.” The learning is entirely offline; LastOrder does not store information about its opponents between games. Tactical and micro decisions, and I think building placement, are decided by rule-based code. One survey answer says,

we train the proposed macro model against about 20 AIIDE 2017 bots on a 1000 machines cluster scheduled by Tournament manager. the final model achieve about 83% win rate on all AIIDE 2017 bots

Against the stronger AIIDE 2018 bots, LastOrder scored about 49%, good enough to land in the top half of the ranking. I think the 83% win rate is too high for LastOrder’s underlying strength; I suspect that it overfitted to its 20 opponents. I think it learned to recognize some of its training opponents by their play style, and when it sees similar signs from different bots that play differently, it reacts incorrectly to a game plan that the different bot does not follow.

I watched a bunch of games to see what kind of play LastOrder figured out for itself. LastOrder’s units are mutalisks and zerglings, sometimes with scourge; I did not see it make other units (though Overkill has hydralisk skills that it might have chosen). LastOrder’s game plan is to open safely with 9 pool, sit back for a while, watch the opponent, lay down massive static defenses when danger seems to loom, macro up lots of drones, zerglings, and mutalisks behind its ramparts, and eventually burst into action and overwhelm the opponent. Details vary, but the overall game plan seemed consistent in all the games I watched.

It’s not an objectively strong game plan, but it seems effective against many bots. LastOrder had trouble touching stronger bots, upsetting only Steamhammer, and was itself upset by Ecgberht and WillyT, which as terrans had no difficulty steamrolling static defenses. But it scored highly against most lower-ranked opponents, including LetaBot (which may have been on its panel of 20 with little change).

Game 39, LastOrder-Steamhammer (replay file), was a good example of the game plan. LastOrder countered zergling with zergling for a while, then seemed to grow bored and made 4 sunkens to hide behind—far more than necessary or useful. A little later, it seemed to predict Steamhammer’s spire timing, adding excessive spores too. Steamhammer understands in principle how to react: It makes extra drones and gets ahead in both army and economy. Steamhammer could not safely attack the heavy defenses, but it could prevent LastOrder from expanding beyond its natural and win slowly. Sure enough, LastOrder tried to expand to a third, Steamhammer caught it and sent the entire army to erase the attempt—and LastOrder exploited the play, which was strategically correct but tactically wrong, hitting Steamhammer’s natural while its forces were out of position. Steamhammer’s tactical analysis is minimal; it doesn’t realize that it should destroy the expansion attempt with a small detachment.

Game 33041, LastOrder-Tyr (replay file), is one of the games that makes me suspect that LastOrder overfitted. Watch what happens after 7:00. LastOrder scouts Tyr’s turtle cannons with a zergling. LastOrder immediately reacts by building... many spore colonies, a nonsensical action. I think LastOrder saw the cannons and concluded, “I’ve seen this play before, and I know what is coming: Carriers!” It believes it is playing against XIMP. It plays similarly in games against XIMP.

LastOrder is a super interesting experiment. It did not score high like CherryPi, but it applied reinforcement learning to a more difficult problem, and it is far more successful than Sijia Xu’s past experiments with machine learning in Overkill. Its middling result is worth something, and yet its play remains somewhat disappointing. LastOrder’s play is highly reactive, but the reactions are often poor and the bot’s range of play is narrow (a wider pool of training opponents should help). I didn’t give examples, but many games show dishearteningly weak macro and mistaken tech decisions (possibly a better training methodology is needed). The problem is not solved yet!

AIIDE 2018 - what McRave learned

McRave, like Microwave and no doubt most bots that follow more than one plan, plays different openings against different races. In each opponent’s learning file, it writes win/loss numbers for 15 strategies. Their names all start with “P” for protoss, but I have stripped out the P to make the table more readable. 4 of the strategies are unused: ZealotDrop, NZCore (sounds like “no zealot core”), Proxy99, and Proxy6. That leaves 11 active openings. The races they were used against are shown in the table. ZZCore (2 zealots before core) was played only against random.

#bottotal12Nexus1GateCorsair1GateRobo21Nexus2GateDragoon2GateExpand4GateDTExpandFFEZCoreZZCore
#1saida16-55  23%1-12 8%--7-17 29%1-12 8%--7-14 33%---
#2cherrypi15-88  15%-6-25 19%---6-25 19%2-20 9%-1-18 5%--
#3cse27-75  26%--7-19 27%--5-17 23%2-15 12%--13-24 35%-
#4bluebluesky29-74  28%--1-14 7%--2-15 12%7-18 28%--19-27 41%-
#5locutus46-56  45%--5-12 29%--15-15 50%14-15 48%--12-14 46%-
#6isamind54-49  52%--7-11 39%--4-10 29%15-14 52%--28-14 67%-
#7daqin60-43  58%--13-11 54%--4-9 31%8-10 44%--35-13 73%-
#9iron56-32  64%27-8 77%--2-7 22%18-9 67%--9-8 53%---
#10zzzkbot75-28  73%-8-7 53%---17-7 71%21-7 75%-29-7 81%--
#11steamhammer64-38  63%-9-9 50%---27-10 73%15-10 60%-13-9 59%--
#12microwave82-21  80%-0-5 0%---39-4 91%30-5 86%-13-7 65%--
#13lastorder97-6  94%-10-2 83%---17-1 94%10-2 83%-60-1 98%--
#14tyr91-10  90%--23-3 88%--7-5 58%31-1 97%--30-1 97%-
#15metabot49-46  52%--8-11 42%--16-12 57%23-14 62%--2-9 18%-
#16letabot77-15  84%12-5 71%--5-5 50%20-4 83%--40-1 98%---
#17arrakhammer102-1  99%------94-1 99%-8-0 100%--
#18ecgberht99-2  98%95-0 100%---3-1 75%--1-1 50%---
#19ualbertabot73-29  72%-----12-8 60%38-6 86%-7-7 50%-16-8 67%
#20ximp41-59  41%--8-14 36%--15-17 47%18-18 50%--0-10 0%-
#21cdbot103-0  100%------103-0 100%----
#22aiur80-21  79%--11-6 65%--13-6 68%41-3 93%--15-6 71%-
#23killall60-43  58%-3-9 25%---6-9 40%19-12 61%-32-13 71%--
#24willyt77-17  82%37-2 95%--3-6 33%23-4 85%--14-5 74%---
#25ailien86-17  83%-31-3 91%---20-5 80%5-6 45%-30-3 91%--
#26cunybot91-8  92%-26-1 96%---36-1 97%14-3 82%-15-3 83%--
#27hellbot103-0  100%---------103-0 100%-
overall-  68%172-27 86%93-61 60%83-101 45%17-35 33%65-30 68%261-176 60%510-180 74%71-29 71%208-68 75%257-118 69%16-8 67%

Unlike other bots that scored comparatively well against SAIDA—meaning they weren’t always wiped summarily from the map—McRave did not rely solely on cloaked units. The DTExpand opening scored best, but 21Nexus was nearly as successful. (McRave scored inconsistently against lower-ranked bots, though, as its author has commented.)

Every strategy came out with some good scores. But here is another analysis: Suppose the goal of the learning algorithm is to find the single most successful strategy (which is not always true—you might want to find the best mix to confuse the opponent’s learning). Leaving aside CDBot and HellBot, which McRave scored 100% against, against how many opponents was each opening the best choice? I made this table by hand, so there might be mistakes. I counted equal best as also best. The “versus” column tells which races the opening was used against.

opening        best  versus
12Nexus          3   T
1GateCorsair     2   Z
1GateRobo        0   P
21Nexus          0   T
2GateDragoon     0   T
2GateExpand      6   P, Z, R
4Gate            5   P, Z, R
DTExpand         2   T
FFE              5   Z, R
ZCore            4   P
ZZCore           0   R

The counts do not match up well with the overall winning rates. There were 4 never-best openings. This analysis does not say that they are bad openings that dragged down the score. Consider what would have happened if they had not been enabled: Their games would have been distributed among the other openings; there would have been some extra wins and some extra losses, and the ratio would depend on the distribution. 21Nexus was never best, but scored second best against SAIDA and contributed as many wins. On the other hand, the openings which were often best were definitely worth having; they were well-chosen for McRave versus this set of opponents. It could make sense to try those openings first, or more often. On the third hand, notice that the openings with the highest counts were played against the largest number of opponents. There were more bests to count! Openings versus terran scored 5 bests because there were 5 terran opponents.
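The by-hand tally is easy to automate. Here is a minimal sketch, assuming per-opponent win/loss records in a plain dict (the names and numbers are hypothetical, not McRave’s actual file format):

```python
from collections import defaultdict

def best_opening_counts(records):
    """Count, for each opening, the number of opponents against which it
    had the best win rate; equal best counts as best."""
    best = defaultdict(int)
    for opponent, openings in records.items():
        rates = {name: w / (w + l)
                 for name, (w, l) in openings.items() if w + l > 0}
        if not rates:
            continue
        top = max(rates.values())
        for name, rate in rates.items():
            if rate == top:  # ties all count as best
                best[name] += 1
    return dict(best)
```

For example, `best_opening_counts({'saida': {'DTExpand': (7, 14), '21Nexus': (7, 17)}})` credits only DTExpand, whose win rate is higher despite the equal win count.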

Plenty of similar analyses could be done. For example, you could count how often or how widely an opening scored above/below the average for each opponent: Did it make a net contribution, or the opposite? It would be another way of seeing whether the openings were well chosen for the opponents they faced.

Next I want to start watching some replays. I think I will start with LastOrder, which did all its learning offline yet held its win rate steady against the onslaught of learning bots. I’m expecting it to be interesting and sophisticated in some way.

AIIDE 2018 - what UAlbertaBot learned

UAlbertaBot played random, and its openings are chosen, not according to the opponent’s race, but according to its own race once the game starts. It has 3 protoss, 4 terran, and 4 zerg openings. Playing random gives the disadvantage of having about 1/3 as many games to figure out how to counter the opponent with each race. The countervailing advantage, of course, is that the opponent can’t predict what is coming its way.

103 rounds were played and UAlbertaBot does not deliberately drop data, so some of the totals add up to more than the 100 official rounds. UAlbertaBot also had 46 crashes, so some totals add up to less. For example, it recorded 96 games against LastOrder.

The official site doesn’t offer binaries for the bots which were carried over from last year, but this should be the 2017 version of UAlbertaBot. It has enemy-specific strategies configured for 13 opponents, of which 5 are also in this tournament: #9 Iron, #10 ZZZKBot, #16 LetaBot, #20 Ximp, and #22 Aiur. For ZZZKBot, only the protoss opening is set; for the others, all 3 races have openings set. Looking at the table, we see that UAlbertaBot did not always try all of its openings, and the blanks in the table do not always correspond to enemy-specific openings. Apparently in this UAlbertaBot version, the enemy-specific strategies act as hints rather than requirements: When available they are tried first, and when not, the default opening is tried first (ZealotRush, MarineRush, or ZerglingRush). If the first opening tried performs well enough, UAlbertaBot sticks with it.
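The hint-first behavior can be sketched roughly like this. It is a guess at the logic from the observed data, with made-up names, not UAlbertaBot’s actual code:

```python
def pick_first_opening(enemy_hint, race_default, results, threshold=0.5):
    """Try the enemy-specific opening if one is configured, else the
    per-race default, and stick with it while it performs well enough."""
    choice = enemy_hint if enemy_hint is not None else race_default
    wins, losses = results.get(choice, (0, 0))
    if wins + losses == 0 or wins / (wins + losses) >= threshold:
        return choice  # untried, or still performing well enough
    return None        # fall back to normal learning (not sketched here)
```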

#bottotalProtossTerranZerg
DTRushDragoonRushZealotRush4RaxMarinesMarineRushTankPushVultureRush2HatchHydra3HatchMuta3HatchScourgeZerglingRush
#1saida13-88  13%12-7 63%0-2 0%0-5 0%0-9 0%0-9 0%1-13 7%0-9 0%0-9 0%0-9 0%0-8 0%0-8 0%
#2cherrypi1-99  1%0-8 0%0-7 0%0-7 0%0-8 0%1-11 8%0-8 0%0-8 0%0-11 0%0-11 0%0-10 0%0-10 0%
#3cse2-99  2%0-7 0%2-14 12%0-7 0%0-11 0%0-10 0%0-10 0%0-10 0%0-8 0%0-8 0%0-7 0%0-7 0%
#4bluebluesky11-92  11%0-4 0%3-10 23%4-11 27%0-5 0%0-5 0%2-11 15%0-5 0%0-9 0%0-8 0%0-8 0%2-16 11%
#5locutus6-97  6%0-7 0%4-17 19%0-7 0%0-8 0%0-8 0%1-11 8%0-8 0%1-10 9%0-7 0%0-7 0%0-7 0%
#6isamind5-96  5%0-7 0%4-17 19%0-7 0%0-9 0%0-8 0%0-8 0%0-8 0%0-7 0%0-7 0%0-7 0%1-11 8%
#7daqin12-90  12%4-12 25%0-4 0%2-9 18%0-6 0%0-6 0%1-6 14%0-5 0%2-13 13%0-7 0%0-7 0%3-15 17%
#8mcrave29-71  29%5-12 29%1-6 14%0-5 0%0-3 0%10-13 43%1-5 17%0-3 0%2-6 25%0-3 0%0-3 0%10-12 45%
#9iron9-94  9%0-10 0%1-14 7%0-9 0%0-8 0%0-8 0%0-8 0%1-12 8%1-6 14%1-6 14%0-4 0%5-9 36%
#10zzzkbot13-87  13%0-3 0%0-3 0%13-20 39%0-9 0%0-9 0%0-9 0%0-9 0%0-7 0%0-6 0%0-6 0%0-6 0%
#11steamhammer11-92  11%0-5 0%0-5 0%8-19 30%1-10 9%0-6 0%0-6 0%0-6 0%0-7 0%0-7 0%0-7 0%2-14 12%
#12microwave20-81  20%--18-7 72%0-7 0%2-14 12%0-7 0%0-7 0%0-10 0%0-10 0%0-10 0%0-9 0%
#13lastorder4-92  4%0-6 0%0-6 0%2-12 14%2-10 17%0-5 0%0-5 0%0-5 0%0-11 0%0-11 0%0-11 0%0-10 0%
#14tyr36-61  37%5-12 29%0-4 0%0-5 0%0-2 0%3-4 43%13-7 65%1-2 33%13-15 46%0-3 0%0-3 0%1-4 20%
#15metabot35-56  38%4-5 44%6-5 55%2-4 33%1-6 14%3-9 25%1-6 14%0-3 0%0-2 0%6-3 67%3-3 50%9-10 47%
#16letabot48-44  52%11-14 44%0-3 0%2-6 25%0-2 0%1-4 20%0-2 0%4-7 36%30-6 83%---
#17arrakhammer56-41  58%--23-6 79%0-6 0%0-6 0%0-6 0%0-6 0%---33-11 75%
#18ecgberht40-56  42%9-7 56%9-8 53%1-4 20%0-2 0%0-5 0%0-2 0%6-7 46%0-3 0%0-3 0%0-3 0%15-12 56%
#20ximp38-56  40%0-2 0%7-7 50%4-5 44%0-4 0%0-4 0%9-19 32%1-6 14%--17-9 65%-
#21cdbot44-54  45%--23-4 85%0-2 0%19-15 56%0-2 0%0-2 0%0-6 0%1-9 10%0-5 0%1-9 10%
#22aiur57-45  56%35-1 97%--0-2 0%0-2 0%0-2 0%11-10 52%1-5 17%9-15 38%0-3 0%1-5 17%
#23killall73-27  73%--30-8 79%0-2 0%12-6 67%0-2 0%0-2 0%---31-7 82%
#24willyt36-55  40%3-12 20%1-8 11%0-5 0%0-4 0%0-5 0%0-4 0%10-11 48%---22-6 79%
#25ailien71-30  70%--18-11 62%16-10 62%2-4 33%0-2 0%0-2 0%---35-1 97%
#26cunybot75-15  83%--23-1 96%-30-7 81%-----22-7 76%
#27hellbot100-2  98%--33-0 100%-41-2 95%-----26-0 100%
overall-  33%88-141 38%38-140 21%206-184 53%20-145 12%124-185 40%29-161 15%34-153 18%50-151 25%17-133 11%20-121 14%219-206 52%

The DT rush caused surprising problems for SAIDA, but terran and zerg had nothing. Did playing random contribute? Does the updated current SAIDA, flame-hardened on SSCAIT, react better? The hand-chosen 2 hatch hydra also did strikingly well against LetaBot, not an obvious choice. Every opening had a plus score against some opponent, though VultureRush barely made it over. Looking across the bottom row, the default openings had the best overall results for each race—they were chosen correctly. Also, we can see that protoss was UAlbertaBot’s best race, and terran the worst; we already knew that, but here we see it in the numbers.

AIIDE 2018 - what Microwave learned

Microwave uses UCB and keeps its learning data in the same file format as UAlbertaBot, one file per opponent listing on each line an opening, a count of wins, and a count of losses. It’s a simple format that is also used outside the UAlbertaBot family. Microwave adds a twist: It does not allow the count of wins or the count of losses to exceed 10. I’m not sure what the exact update rule is without reading the code, but the effect is that only the more recent game results are remembered. It’s appropriate if the enemy is expected to be learning too, and to change its strategy rapidly so that Microwave has to keep adapting.
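The effect can be modeled with a capped update rule. This is only a guess at the mechanism, since I haven’t read the code; one rule that produces both the observed ceiling of 10 and the forgetting behavior is:

```python
CAP = 10  # observed ceiling on stored win and loss counts

def update_counts(wins, losses, won):
    """Hypothetical update: bump the counter for the new result; once it
    is at the cap, shed one of the opposite result instead, so that the
    stored ratio tracks recent games and old results fade away."""
    if won:
        if wins < CAP:
            wins += 1
        else:
            losses = max(0, losses - 1)
    else:
        if losses < CAP:
            losses += 1
        else:
            wins = max(0, wins - 1)
    return wins, losses
```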

Microwave plays different strategies against each race. Against terran it has 7, against protoss and zerg 8 each, and against random 6. UAlbertaBot was the only random opponent. The strategies partly overlap. For example, 10Hatch9Pool9gas is played against both terran and protoss, while 9HatchMain8Pool8Gas is played only against zerg. The table has big blank spaces full of unplayed strategies. Maybe I should have sorted it by race, instead of by rank?

#bottotal10Hatch9Pool9gas12Pool3HatchPoolHydra5HatchGasHydra5Pool9HatchMain8Pool8Gas9Pool9PoolExpo9PoolHatch9PoolLurker9PoolSpeed9PoolSpeedLing9PoolSunkenOverpoolOverpoolSpeedZvT_12HatchHydraZvT_12HatchLurkerZvT_12HatchMutaZvZ_Overpool11Gas
#1saida0-70  0%0-10 0%---0-10 0%-0-10 0%--0-10 0%-----0-10 0%0-10 0%0-10 0%-
#2cherrypi0-80  0%-0-10 0%--0-10 0%0-10 0%--0-10 0%-0-10 0%-0-10 0%-0-10 0%---0-10 0%
#3cse0-80  0%0-10 0%-0-10 0%0-10 0%0-10 0%------0-10 0%---0-10 0%0-10 0%0-10 0%-
#4bluebluesky0-80  0%0-10 0%-0-10 0%0-10 0%0-10 0%------0-10 0%---0-10 0%0-10 0%0-10 0%-
#5locutus1-80  1%1-10 9%-0-10 0%0-10 0%0-10 0%------0-10 0%---0-10 0%0-10 0%0-10 0%-
#6isamind0-80  0%0-10 0%-0-10 0%0-10 0%0-10 0%------0-10 0%---0-10 0%0-10 0%0-10 0%-
#7daqin0-80  0%0-10 0%-0-10 0%0-10 0%0-10 0%------0-10 0%---0-10 0%0-10 0%0-10 0%-
#8mcrave7-68  9%1-10 9%-1-10 9%0-5 0%1-8 11%------1-10 9%---1-10 9%0-5 0%2-10 17%-
#9iron0-70  0%0-10 0%---0-10 0%-0-10 0%--0-10 0%-----0-10 0%0-10 0%0-10 0%-
#10zzzkbot24-37  39%-5-8 38%--0-2 0%9-10 47%--9-10 47%-0-1 0%-0-1 0%-0-1 0%---1-4 20%
#11steamhammer57-15  79%-10-2 83%--6-7 46%1-2 33%--10-2 83%-10-0 100%-0-1 0%-10-1 91%---10-0 100%
#13lastorder24-21  53%-0-1 0%--10-2 83%0-1 0%--2-4 33%-0-1 0%-10-6 62%-1-3 25%---1-3 25%
#14tyr15-13  54%2-3 40%-0-1 0%3-4 43%10-1 91%------0-1 0%---0-1 0%0-1 0%0-1 0%-
#15metabot41-13  76%10-2 83%-8-3 73%0-1 0%0-1 0%------10-1 91%---1-2 33%2-3 40%10-0 100%-
#16letabot26-18  59%4-5 44%---1-2 33%-10-0 100%--8-5 62%-----0-1 0%0-1 0%3-4 43%-
#17arrakhammer27-22  55%-7-8 47%--10-0 100%0-1 0%--0-1 0%-3-4 43%-5-4 56%-2-3 40%---0-1 0%
#18ecgberht38-18  68%0-1 0%---10-0 100%-0-1 0%--1-2 33%-----10-7 59%10-0 100%7-7 50%-
#19ualbertabot50-10  83%----10-1 91%-0-1 0%10-0 100%10-4 71%---10-4 71%10-0 100%-----
#20ximp27-15  64%2-3 40%-0-1 0%0-1 0%0-1 0%------10-0 100%---5-6 45%0-1 0%10-2 83%-
#21cdbot46-13  78%-10-0 100%--0-1 0%1-2 33%--4-5 44%-10-3 77%-1-2 33%-10-0 100%---10-0 100%
#22aiur48-15  76%1-2 33%-10-1 91%7-5 58%0-1 0%------9-3 75%---1-2 33%10-1 91%10-0 100%-
#23killall40-5  89%-10-0 100%--0-1 0%10-0 100%--0-1 0%-10-0 100%-10-1 91%-0-1 0%---0-1 0%
#24willyt34-10  77%4-5 44%---0-1 0%-0-1 0%--0-1 0%-----10-2 83%10-0 100%10-0 100%-
#25ailien28-32  47%-9-10 47%--1-4 20%0-1 0%--3-6 33%-0-1 0%-10-2 83%-5-7 42%---0-1 0%
#26cunybot67-1  99%-10-0 100%--10-0 100%0-1 0%--10-0 100%-10-0 100%-7-0 100%-10-0 100%---10-0 100%
#27hellbot74-0  100%10-0 100%-10-0 100%6-0 100%10-0 100%------8-0 100%---10-0 100%10-0 100%10-0 100%-
overall-  42%35-101 26%61-39 61%29-66 31%16-66 20%79-113 41%21-28 43%10-23 30%10-0 100%48-43 53%9-28 24%43-20 68%38-65 37%53-31 63%10-0 100%38-26 59%38-101 27%42-82 34%62-94 40%32-20 62%

The total column tells how successful Microwave was in recent games against each opponent. You might want to compare the percentages against the overall win rates from the official crosstable; they sometimes vary curiously. When the recorded results were less successful than the total results, it suggests that Microwave may have forgotten too much (though it could be random fluctuation). For example, Microwave scored 80% against LetaBot overall, but 59% in the recent games in this table.

The overall row tells how successful each opening was in recent games. Every opening was successful against some opponents, so there were no useless strategies. The body of the table, from #10 ZZZKBot and down, is full of strong contrasts, meaning that there was a big difference between the successful and unsuccessful openings against each opponent. That suggests that learning must have been useful.

SAIDA again under threat

Another brief note: On SSCAIT, SAIDA is again threatening to topple from its #1 position. I expect it would hold #1 easily if there were no voting, but voters distort the pairings, and its top opposition has been chipping away at its dominance. SAIDA’s win rate has fallen to about 3/4, from a high over 9/10. Will SAIDA get another update soon and recover?

AIIDE 2018 - what Locutus learned

The Locutusoids have learning data only slightly different from Steamhammer’s. I have run my summarizer code for CSE, BlueBlueSky, Locutus, and ISAMind, skipping DaQin because it recorded only 1 game per opponent (which tickles a bug in my code). I am thinking of posting only the Locutus results, because the others don’t hold much extra interest. Locutus plays a wider range of openings than the others (perhaps because newer bots have to restrict their scope). CSE in particular is more in the do-one-thing-well camp. Besides, all of them had high win rates against lower-ranked opponents; they did not have much to learn. I don’t see a point in piling up data about similar players.

But if people want, I can post them all. Any requests?

Locutus is the only Locutusoid to use pre-learned data. Some of the others had their own ways of preparing for known opponents. For example, CSE is configured with several enemy-specific strategies, such as DT drop against #9 Iron.

Here is a summary of the pre-learned data used by Locutus. Locutus is configured to retain at most 200 game records per opponent, so that’s as much pre-learned data as it makes sense to give it. When you give it that much, each tournament game record added at the end causes one pre-learned record to scroll off the beginning. At the end of a 100 round tournament, half the game records are retained from the pre-learned data and half are tournament games—the pre-learned data more or less dominated tournament data for decisions during the tournament.
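The retention behavior is just a bounded queue. A sketch with Python’s `deque` (the 200-record cap is from Locutus’s configuration; the record contents here are placeholders):

```python
from collections import deque

MAX_RECORDS = 200  # Locutus's per-opponent record limit

# Start with a learning file full of pre-learned records.
records = deque(['prelearned'] * MAX_RECORDS, maxlen=MAX_RECORDS)

# Each of the ~100 tournament games appended at the end pushes one
# pre-learned record off the front.
for game_number in range(100):
    records.append(('tournament', game_number))

prelearned_left = sum(1 for r in records if r == 'prelearned')
# half the records remain pre-learned at the end of the tournament
```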

#   opponent     games  wins
7   DaQin          35   91%
9   Iron          200   93%
10  ZZZKBot       200   76%
14  Tyr           200   96%
17  Arrakhammer   200   88%
19  UAlbertaBot    71   100%
22  AIUR           51   96%
25  AILien        200   96%


Here is the final data. For every opponent that has pre-learned data, much or all of the pre-learned data is retained until the end.

#1 saida

openinggameswins
10-15GateGoon220%
10Gate25NexusFE297%
DTDrop326%
Proxy4GateGoon70%
Proxy4GateGoon2p30%
Proxy9-9Gate100%
6 openings1034%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Not fast rush10299%4%10299%3%99%0%
Proxy0%0%11%100%0%0%
Unknown11%0%0%0%0%0%


Locutus and the Locutusoids use “Not fast rush” as a catch-all: The enemy’s opening is not a fast rush, and it is not more precisely recognized than that.

#2 cherrypi

openinggameswins
ForgeExpand4Gate2Archon1916%
ForgeExpand5GateGoon555%
ForgeExpandSpeedlots166%
ProxyHeavyZealotRush617%
ProxyHeavyZealotRush2p757%
5 openings10312%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Heavy rush1313%23%3534%20%23%0%
Not fast rush8986%10%6866%7%64%0%
Unknown11%0%0%0%0%0%


Why are the successful proxy openings so little played? The “2p” version is played only on 2-player maps; the other version only on 3- and 4-player maps. Looking into the file by hand, I see that they were both successful from early in the tournament, so it’s not a matter of discovering them late. Perhaps the map size specialization interferes with the learning process? Perhaps they are deliberately little played to prevent the opponent from adapting? Have to read the code for this one. The proxy openings show similar numbers across other opponents, so it’s not a one-off. Locutus’s learning in general does not look like it concentrates hard on playing the best-performing openings.

#3 cse

openinggameswins
2GateDTExpo30%
2GateDTRush2438%
4GateGoon4630%
Proxy4GateGoon450%
Proxy4GateGoon2p862%
Proxy9-9Gate60%
ProxyHeavyZealotRush40%
ProxyHeavyZealotRush2p250%
Turtle650%
9 openings10333%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Dark templar1010%40%2827%43%10%0%
Fast rush0%0%66%0%0%0%
Heavy rush0%0%33%100%0%0%
Not fast rush9289%33%6664%29%63%0%
Unknown11%0%0%0%0%0%

#4 bluebluesky

openinggameswins
2GateDTExpo1331%
2GateDTRush743%
4GateGoon5843%
9-9GateDefensive30%
Proxy4GateGoon1100%
Proxy4GateGoon2p2100%
Proxy9-9Gate20%
ProxyHeavyZealotRush20%
ProxyHeavyZealotRush2p10%
Turtle1429%
10 openings10338%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Dark templar6058%32%5553%31%77%0%
Not fast rush3938%51%4544%49%82%0%
Proxy33%0%33%0%67%0%
Unknown11%0%0%0%0%0%

#6 isamind

openinggameswins
2GateDTRush1771%
4GateGoon6058%
9-9GateDefensive633%
Proxy4GateGoon2100%
Proxy4GateGoon2p367%
Proxy9-9Gate10%
ProxyHeavyZealotRush20%
ProxyHeavyZealotRush2p10%
Turtle1155%
9 openings10357%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Dark templar0%0%11%100%0%0%
Fast rush55%60%77%100%20%0%
Heavy rush1313%54%77%71%15%0%
Not fast rush7876%59%8583%51%85%0%
Proxy66%33%33%100%0%0%
Unknown11%100%0%0%0%0%

#7 daqin

openinggameswins
2GateDTExpo4100%
2GateDTRush25100%
4GateGoon4498%
9-9GateDefensive1968%
Proxy4GateGoon683%
Proxy4GateGoon2p1100%
Proxy9-9Gate475%
ProxyHeavyZealotRush2100%
ProxyHeavyZealotRush2p1100%
Turtle3238%
10 openings13879%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Fast rush5137%49%4130%78%31%0%
Not fast rush8662%97%9770%79%71%0%
Unknown11%100%0%0%0%0%


Locutus scored lower versus DaQin in the tournament than in the pre-learning data. It may mean that DaQin was updated in private before the tournament. You have to expect that; I assume it is why there were only 35 games in the pre-learning data.

#8 mcrave

openinggameswins
2GateDTExpo10%
2GateDTRush2767%
4GateGoon4955%
9-9GateDefensive633%
Proxy4GateGoon367%
Proxy4GateGoon2p367%
Proxy9-9Gate10%
ProxyHeavyZealotRush450%
ProxyHeavyZealotRush2p10%
Turtle825%
10 openings10353%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Dark templar22%50%22%0%0%0%
Fast rush1313%31%1212%25%8%0%
Heavy rush1515%40%66%83%7%0%
Not fast rush7270%61%8381%57%81%0%
Unknown11%0%0%0%0%0%

#9 iron

openinggameswins
10-15GateGoon580%
10Gate25NexusFE10591%
DTDrop8991%
Proxy4GateGoon1100%
4 openings20091%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Not fast rush15276%91%7437%97%39%14%
Unknown10%100%2211%91%0%0%
Wall-in4724%91%10452%87%70%0%

#10 zzzkbot

openinggameswins
ForgeExpand4Gate2Archon786%
ForgeExpand5GateGoon9794%
ForgeExpandSpeedlots8695%
ProxyHeavyZealotRush580%
ProxyHeavyZealotRush2p540%
5 openings20092%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Fast rush6332%95%10754%91%54%0%
Heavy rush8140%90%7437%93%40%0%
Not fast rush5628%93%1910%100%9%0%

#11 steamhammer

openinggameswins
ForgeExpand4Gate2Archon1100%
ForgeExpand5GateGoon10296%
2 openings10396%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Fast rush22%100%77%100%0%0%
Heavy rush3736%100%2221%100%19%0%
Hydra bust66%67%1414%93%17%0%
Not fast rush5755%96%6058%95%61%0%
Unknown11%100%0%0%0%0%

#12 microwave

openinggameswins
ForgeExpand4Gate2Archon5100%
ForgeExpand5GateGoon8394%
ForgeExpandSpeedlots1593%
3 openings10394%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Fast rush22%100%1212%100%0%0%
Heavy rush3837%95%2322%100%21%0%
Hydra bust1817%94%1616%81%11%0%
Not fast rush4443%93%5250%94%43%0%
Unknown11%100%0%0%0%0%

#13 lastorder

openinggameswins
ForgeExpand5GateGoon10398%
1 openings10398%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Heavy rush4948%100%5856%97%55%0%
Not fast rush5351%96%4544%100%43%0%
Unknown11%100%0%0%0%0%

#14 tyr

openinggameswins
12Nexus5ZealotFECannons57100%
2GateDTExpo250%
4GateGoon103100%
9-9GateDefensive667%
Proxy9-9Gate333%
ProxyHeavyZealotRush10%
Turtle2889%
7 openings20096%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Fast rush2110%86%10%100%0%0%
Heavy rush8944%100%189%89%10%0%
Not fast rush8040%95%15075%97%54%38%
Proxy63%67%10%100%0%0%
Unknown42%100%3015%90%0%0%

#15 metabot

openinggameswins
2GateDTRush35100%
4GateGoon4789%
ProxyHeavyZealotRush2100%
Turtle14100%
4 openings9895%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Dark templar1717%88%5051%90%71%0%
Fast rush1010%100%11%100%0%0%
Heavy rush22%100%77%100%50%0%
Not fast rush6869%96%4041%100%49%0%
Unknown11%100%0%0%0%0%

#16 letabot

openinggameswins
10-15GateGoon10%
10Gate25NexusFE250%
4GateGoon475%
DTDrop9696%
4 openings10393%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Heavy rush44%75%11%100%0%0%
Not fast rush4039%98%1010%90%10%0%
Unknown22%50%0%0%0%0%
Wall-in5755%93%9289%93%89%0%

#17 arrakhammer

openinggameswins
ForgeExpand4Gate2Archon1369%
ForgeExpand5GateGoon14698%
ForgeExpandSpeedlots2580%
ProxyHeavyZealotRush1155%
ProxyHeavyZealotRush2p560%
5 openings20090%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Fast rush3718%92%2814%100%3%0%
Heavy rush8241%88%9648%89%46%0%
Naked expand126%92%63%83%25%8%
Not fast rush6934%93%6934%90%38%0%
Unknown0%0%10%100%0%0%

#18 ecgberht

openinggameswins
4GateGoon53100%
DTDrop50100%
2 openings103100%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Heavy rush5351%100%8885%100%81%0%
Not fast rush4342%100%1515%100%9%0%
Unknown77%100%0%0%0%0%

#19 ualbertabot

openinggameswins
4GateGoon63100%
9-9GateDefensive5100%
ForgeExpand5GateGoon9493%
Proxy9-9Gate12100%
4 openings17496%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Dark templar63%100%63%100%17%0%
Fast rush3420%88%2011%100%18%0%
Heavy rush5532%96%3721%100%31%9%
Hydra bust106%100%95%89%30%0%
Not fast rush6839%99%9253%93%46%6%
Proxy0%0%11%100%0%0%
Unknown11%100%95%100%0%0%

#20 ximp

openinggameswins
2GateDTRush250%
4GateGoon10195%
2 openings10394%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Not fast rush5351%96%103100%94%100%0%
Unknown5049%92%0%0%0%0%

#21 cdbot

openinggameswins
9-9GateDefensive1100%
ForgeExpand5GateGoon102100%
2 openings103100%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Fast rush55%100%1010%100%0%0%
Heavy rush4342%100%3635%100%40%5%
Hydra bust0%0%22%100%0%0%
Not fast rush5351%100%4645%100%43%8%
Proxy11%100%33%100%0%0%
Unknown11%100%66%100%0%0%

#22 aiur

openinggameswins
10-15GateGoon367%
12Nexus5ZealotFE5100%
2GateDTExpo1100%
2GateDTRush4100%
4GateGoon11496%
Proxy4GateGoon3100%
Proxy9-9Gate683%
ProxyHeavyZealotRush3100%
ProxyHeavyZealotRush2p1100%
Turtle1493%
10 openings15495%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Dark templar3019%97%3120%94%33%0%
Heavy rush3925%92%5334%98%28%0%
Naked expand138%85%32%67%23%38%
Not fast rush7247%97%5536%93%44%1%
Proxy0%0%64%100%0%0%
Unknown0%0%64%100%0%0%

#23 killall

openinggameswins
ForgeExpand5GateGoon10398%
1 openings10398%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Fast rush33%100%88%100%0%0%
Heavy rush4544%98%3837%97%22%0%
Hydra bust0%0%11%100%0%0%
Not fast rush5452%98%5654%98%41%0%
Unknown11%100%0%0%0%0%

#24 willyt

openinggameswins
10-15GateGoon8100%
10Gate25NexusFE7100%
4GateGoon64100%
DTDrop21100%
Turtle3100%
5 openings103100%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Heavy rush6765%100%6462%100%69%0%
Not fast rush3534%100%3635%100%46%0%
Proxy0%0%33%100%0%0%
Unknown11%100%0%0%0%0%

#25 ailien

openinggameswins
ForgeExpand4Gate2Archon2496%
ForgeExpand5GateGoon3397%
ForgeExpandSpeedlots12898%
ProxyHeavyZealotRush1283%
ProxyHeavyZealotRush2p3100%
5 openings20097%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Heavy rush13266%98%10150%96%57%2%
Naked expand0%0%21%100%0%0%
Not fast rush6834%96%9548%98%62%0%
Unknown0%0%21%100%0%0%

#26 cunybot

openinggameswins
ForgeExpand5GateGoon93100%
1 openings93100%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Fast rush11%100%22%100%0%0%
Heavy rush4447%100%2325%100%25%2%
Not fast rush4751%100%6570%100%72%4%
Unknown11%100%33%100%0%0%

#27 hellbot

openinggameswins
2GateDTRush20100%
4GateGoon83100%
2 openings103100%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Not fast rush4948%100%103100%100%100%0%
Unknown5452%100%0%0%0%0%

overall

totalPvTPvPPvZPvR
openinggameswinsgameswinsgameswinsgameswinsgameswins
10-15GateGoon3936% 3633% 367%
10Gate25NexusFE14374% 14374%
12Nexus5ZealotFE5100% 5100%
12Nexus5ZealotFECannons57100% 57100%
2GateDTExpo2442% 2442%
2GateDTRush16179% 16179%
4GateGoon88985% 12199% 70582% 63100%
9-9GateDefensive4659% 4052% 1100% 5100%
DTDrop28885% 28885%
ForgeExpand4Gate2Archon6968% 6968%
ForgeExpand5GateGoon101192% 91792% 9493%
ForgeExpandSpeedlots27090% 27090%
Proxy4GateGoon2759% 812% 1979%
Proxy4GateGoon2p2060% 30% 1771%
Proxy9-9Gate4547% 100% 2339% 12100%
ProxyHeavyZealotRush5456% 2045% 3462%
ProxyHeavyZealotRush2p2756% 743% 2060%
Turtle13063% 3100% 12762%
total330583%61280%120877%131189%17496%
openings played1881364

AIIDE 2018 - what Steamhammer learned

In CIG, Steamhammer was broken. My findings on what Steamhammer learned in CIG 2018 are not valid, because Steamhammer rarely played the opening it thought it was playing; it played a broken version of the opening that left out drones and buildings. That is likely why the zergling rushes were successful in CIG: There was little in the build to leave out, so the build played more nearly as written. In this tournament, Steamhammer seems to have been working fine (though we’ll see when the replays come out)—well, working fine except for the usual bugs, some of which are fixed in version 2.1. Also, Steamhammer’s learning was revamped to better bamboozle opponents that tried to learn its patterns; the result is that its learning behavior is richer. I think these tables are full of interesting data.

103 rounds were played, of which 100 were official. Steamhammer is set to record at most 100 game records per opponent, so games from the first 3 rounds may have been dropped. That’s why the numbers don’t exactly match the official crosstable, even though the game totals look correct.

Steamhammer’s game records contain much more information than I can summarize in tidy little tables. This time I captured a little more of it, adding a table about the plan recognizer. For each plan that was recognized during a game, the table shows how often the plan was predicted before the game, how often it was recognized during the game, and the win rate in each of those cases. It also tries to measure the accuracy of the prediction. The plan recognizer itself is not very accurate; it often fails to recognize what is in front of it, calling the plan Unknown. The “?” column shows how often the plan was predicted and then no plan was recognized. The plan recognizer can also blow it completely and recognize the wrong plan. When the opponent plays predictably, the plan predictor is generally more accurate than the plan recognizer. When the opponent plays unpredictably, I don’t know which is more accurate! Either way, the plan prediction is more important early in the tournament; once Steamhammer has accumulated enough experience, it pays more attention to its learning data, and it doesn’t matter whether the predicted plan is good.
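The bookkeeping behind the accuracy columns can be sketched like this, using hypothetical per-game pairs of predicted and recognized plan (the real game records hold much more information):

```python
def plan_accuracy(games):
    """Tally, per predicted plan: how many games it was predicted in,
    how often the recognizer later confirmed it (the accuracy), and how
    often no plan was recognized at all (the '?' column)."""
    stats = {}
    for predicted, recognized in games:
        s = stats.setdefault(predicted,
                             {'predicted': 0, 'confirmed': 0, 'unknown': 0})
        s['predicted'] += 1
        if recognized == predicted:
            s['confirmed'] += 1     # prediction confirmed in-game
        elif recognized == 'Unknown':
            s['unknown'] += 1       # nothing recognized: the "?" case
    return stats
```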

#1 saida

openinggameswins
11Gas10PoolLurker30%
11Gas10PoolMuta10%
11HatchTurtleHydra10%
2HatchHydraBust10%
3HatchHydraExpo10%
3HatchLurker10%
4HatchBeforeGas20%
4PoolHard30%
5PoolHard10%
5PoolSoft10%
6Pool10%
7PoolSoft20%
9Hatch8Pool20%
9HatchExpo9Pool9Gas10%
9Pool10%
9PoolExpo10%
9PoolLurker812%
9PoolSpeedAllIn10%
9PoolSunkSpeed10%
AntiFact_13Pool80%
AntiFact_2Hatch120%
AntiFactory160%
AntiZeal_12Hatch20%
Over10Hatch2SunkHard10%
OverhatchLateGas10%
Overpool+110%
OverpoolHatch10%
PurpleSwarmBuild10%
Sparkle 2HatchMuta20%
ZvP_3HatchPoolHydra10%
ZvT_12PoolMuta20%
ZvT_2HatchMuta10%
ZvT_3HatchMuta10%
ZvZ_12PoolLing10%
ZvZ_Overgas9Pool10%
ZvZ_Overpool11Gas1315%
ZvZ_Overpool9Gas10%
ZvZ_OverpoolTurtle10%
38 openings1003%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Factory100100%3%9191%3%91%2%
Naked expand0%0%77%0%0%0%
Unknown0%0%22%0%0%0%


SAIDA is a good example of how Steamhammer reacts to a predictable opponent. First, it repeatedly tried its counters to the opponent’s Factory plan, the 3 “AntiFact” openings (you may call them fake news openings if you like). In this case the counters did not work; SAIDA is too strong. Then it explored more widely. Steamhammer scored 1 win with a fast lurker opening, and repeated the opening to no avail (maybe Steamhammer got lucky once, or maybe SAIDA learned the timing). It also scored a win with a ZvZ fast mutalisk opening, and repeating that did bring a second win for a total of 3 in 100 rounds. The smaller second table shows that the plan predictor was 100% accurate over the last 100 rounds in predicting SAIDA’s factory-first play, while the plan recognizer was 91% accurate and actually saw a command center first in 7 games.

#2 cherrypi

opening | games | wins
2.5HatchMuta | 1 | 0%
3HatchPoolMuta | 1 | 0%
4HatchBeforeGas | 1 | 0%
4PoolSoft | 1 | 0%
6PoolSpeed | 2 | 0%
7PoolHard | 1 | 0%
8Hatch7Pool | 1 | 0%
9Hatch8Pool | 1 | 0%
9PoolSunkSpeed | 1 | 0%
OverhatchLing | 1 | 0%
OverhatchMuta | 1 | 0%
OverpoolSpeed | 1 | 0%
OverpoolSunk | 1 | 0%
ZvP_2HatchMuta | 1 | 0%
ZvP_3BaseSpire+Den | 1 | 0%
ZvT_12PoolMuta | 1 | 0%
ZvT_3HatchMuta | 1 | 0%
ZvT_3HatchMutaExpo | 1 | 0%
ZvZ_12HatchMain | 21 | 14%
ZvZ_12PoolLing | 1 | 0%
ZvZ_12PoolMain | 3 | 0%
ZvZ_Overgas9Pool | 1 | 0%
ZvZ_Overpool9Gas | 30 | 30%
ZvZ_OverpoolTurtle | 25 | 32%
24 openings | 100 | 20%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Fast rush | 22, 22%, 14% | 1, 1%, 0% | 0%, 100%
Heavy rush | 77, 77%, 22% | 28, 28%, 25% | 35%, 61%
Naked expand | 1, 1%, 0% | 2, 2%, 0% | 0%, 0%
Unknown | 0, 0%, 0% | 69, 69%, 19% | 0%, 0%


Steamhammer sees CherryPi as a strategy switcher. I suspect that CherryPi did not actually play any fast zergling rushes, because they said they avoided risky openings, but I can’t be sure without a closer look. In any case, Steamhammer found answers and scored a respectable 20% against a much higher ranked opponent.

#3 cse

opening | games | wins
11Gas10PoolLurker | 1 | 0%
11Gas10PoolMuta | 10 | 20%
11HatchTurtleHydra | 2 | 0%
11HatchTurtleLurker | 1 | 0%
12HatchTurtle | 1 | 0%
2.5HatchMuta | 1 | 0%
2HatchHydra | 1 | 0%
2HatchHydraBust | 5 | 0%
2HatchLurkerAllIn | 1 | 0%
3HatchHydraBust | 9 | 0%
3HatchHydraExpo | 1 | 0%
3HatchLingBust | 3 | 0%
3HatchLingExpo | 1 | 0%
3HatchLurker | 2 | 0%
3HatchPoolMuta | 1 | 0%
4HatchBeforeGas | 6 | 0%
4PoolHard | 2 | 0%
5PoolHard2Player | 2 | 0%
5PoolSoft | 1 | 0%
7PoolHard | 2 | 0%
7PoolSoft | 1 | 0%
8Pool | 3 | 0%
9HatchExpo9Pool9Gas | 1 | 0%
9PoolExpo | 1 | 0%
9PoolHatch | 1 | 0%
9PoolSpeedAllIn | 2 | 0%
9PoolSpire | 2 | 0%
AntiFact_2Hatch | 1 | 0%
AntiZeal_12Hatch | 1 | 0%
Over10Hatch2SunkHard | 1 | 0%
Over10HatchBust | 2 | 0%
Over10HatchSlowLings | 2 | 0%
OverhatchExpoLing | 3 | 0%
OverhatchExpoMuta | 1 | 0%
OverhatchMuta | 1 | 0%
Overpool+1 | 1 | 0%
OverpoolHydra | 1 | 0%
OverpoolLurker | 1 | 0%
OverpoolSpeed | 2 | 0%
PurpleSwarmBuild | 1 | 0%
Sparkle 1HatchMuta | 1 | 0%
ZvP_2HatchMuta | 5 | 0%
ZvP_3BaseSpire+Den | 3 | 0%
ZvP_3HatchPoolHydra | 4 | 0%
ZvP_4HatchPoolHydra | 1 | 0%
ZvZ_12Pool | 2 | 0%
ZvZ_Overpool11Gas | 1 | 0%
ZvZ_Overpool9Gas | 1 | 0%
48 openings | 100 | 2%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Heavy rush | 0, 0%, 0% | 4, 4%, 0% | 0%, 0%
Safe expand | 19, 19%, 0% | 33, 33%, 0% | 32%, 5%
Turtle | 81, 81%, 2% | 60, 60%, 3% | 60%, 2%
Unknown | 0, 0%, 0% | 3, 3%, 0% | 0%, 0%


Steamhammer has trouble telling the difference between Safe Expand (in the protoss case, forge expand with cannons) and Turtle (hide behind cannons), because it does not scout well enough to see the natural nexus reliably. It compensates by reacting similarly in both cases. But the opponent is still seen as an unpredictable strategy switcher, so Steamhammer switches up its openings too. In this case it has more counter openings and tries each fewer times, so they are not as obvious in the table, but they do have higher counts: See 2HatchHydraBust, 3HatchHydraBust, 3HatchLingBust, 4HatchBeforeGas, ZvP_2HatchMuta, and ZvP_3BaseSpire+Den. As against SAIDA, Steamhammer scored 2 wins with a ZvZ fast mutalisk opening. I have an idea to add another exploration phase which experiments with all-in attacks like the fast mutas.

#4 bluebluesky

opening | games | wins
11Gas10PoolLurker | 2 | 0%
11Gas10PoolMuta | 1 | 0%
11HatchTurtleHydra | 2 | 0%
2.5HatchMuta | 1 | 0%
2HatchHydraBust | 5 | 0%
2HatchLurker | 1 | 0%
2HatchLurkerAllIn | 1 | 0%
3HatchHydraBust | 1 | 0%
3HatchLingBust | 1 | 0%
3HatchLingExpo | 1 | 0%
4HatchBeforeGas | 3 | 0%
4PoolSoft | 1 | 0%
5PoolHard | 1 | 0%
7PoolHard | 10 | 10%
8Pool | 1 | 0%
9HatchExpo9Pool9Gas | 18 | 11%
9HatchMain9Pool9Gas | 1 | 0%
9PoolSpeed | 3 | 0%
9PoolSpeedAllIn | 3 | 0%
AntiFact_2Hatch | 1 | 0%
Over10Hatch | 2 | 0%
Over10Hatch1Sunk | 1 | 0%
Over10Hatch2Sunk | 2 | 0%
Over10Hatch2SunkHard | 1 | 0%
OverhatchExpoLing | 2 | 0%
Overpool+1 | 1 | 0%
OverpoolHatch | 1 | 0%
OverpoolHydra | 1 | 0%
OverpoolSpeed | 1 | 0%
OverpoolTurtle | 1 | 0%
PurpleSwarmBuild | 1 | 0%
Sparkle 1HatchMuta | 1 | 0%
Sparkle 2HatchMuta | 1 | 0%
Sparkle 3HatchMuta | 1 | 0%
ZvP_2HatchMuta | 4 | 0%
ZvP_3BaseSpire+Den | 7 | 0%
ZvP_3HatchPoolHydra | 6 | 0%
ZvT_13Pool | 1 | 0%
ZvZ_Overgas11Pool | 1 | 0%
ZvZ_Overgas9Pool | 3 | 0%
ZvZ_Overpool11Gas | 2 | 0%
ZvZ_Overpool9Gas | 1 | 0%
42 openings | 100 | 3%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Heavy rush | 7, 7%, 0% | 20, 20%, 5% | 29%, 0%
Naked expand | 0, 0%, 0% | 1, 1%, 100% | 0%, 0%
Safe expand | 53, 53%, 2% | 45, 45%, 0% | 58%, 2%
Turtle | 40, 40%, 5% | 33, 33%, 3% | 45%, 0%
Unknown | 0, 0%, 0% | 1, 1%, 0% | 0%, 0%


Different all-ins took a few wins from BlueBlueSky.

#5 locutus

opening | games | wins
11Gas10PoolLurker | 2 | 0%
11HatchTurtleLurker | 1 | 0%
12HatchTurtle | 1 | 0%
2HatchHydra | 1 | 0%
2HatchHydraBust | 5 | 0%
2HatchLurker | 2 | 0%
2HatchLurkerAllIn | 2 | 0%
3HatchHydra | 1 | 0%
3HatchHydraBust | 3 | 0%
3HatchHydraExpo | 1 | 0%
3HatchLingBust | 25 | 12%
3HatchLingExpo | 2 | 0%
4PoolSoft | 1 | 0%
5PoolHard | 2 | 0%
6PoolSpeed | 1 | 0%
8Hatch7Pool | 1 | 0%
8Pool | 1 | 0%
9HatchExpo9Pool9Gas | 1 | 0%
9HatchMain9Pool9Gas | 1 | 0%
9PoolSpeed | 1 | 0%
9PoolSpeedAllIn | 1 | 0%
AntiFact_13Pool | 1 | 0%
AntiFact_2Hatch | 1 | 0%
AntiFactory | 1 | 0%
AntiZeal_12Hatch | 1 | 0%
Over10Hatch | 1 | 0%
Over10Hatch2SunkHard | 1 | 0%
OverhatchExpoMuta | 2 | 0%
OverhatchLateGas | 1 | 0%
OverpoolHydra | 1 | 0%
OverpoolSpeed | 1 | 0%
OverpoolSunk | 1 | 0%
OverpoolTurtle | 1 | 0%
PurpleSwarmBuild | 2 | 0%
Sparkle 2HatchMuta | 1 | 0%
Sparkle 3HatchMuta | 1 | 0%
ZvP_2HatchMuta | 5 | 0%
ZvP_3BaseSpire+Den | 4 | 0%
ZvP_3HatchPoolHydra | 5 | 0%
ZvP_Overpool3Hatch | 1 | 0%
ZvT_12PoolMuta | 4 | 0%
ZvT_13Pool | 1 | 0%
ZvT_2HatchMuta | 1 | 0%
ZvT_3HatchMuta | 1 | 0%
ZvZ_12Pool | 1 | 0%
ZvZ_12PoolLing | 1 | 0%
ZvZ_12PoolMain | 1 | 0%
ZvZ_Overgas9Pool | 1 | 0%
ZvZ_Overpool9Gas | 1 | 0%
49 openings | 100 | 3%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Heavy rush | 0, 0%, 0% | 4, 4%, 25% | 0%, 0%
Safe expand | 62, 62%, 3% | 55, 55%, 0% | 60%, 0%
Turtle | 38, 38%, 3% | 41, 41%, 5% | 50%, 0%

#6 isamind

opening | games | wins
11Gas10PoolLurker | 1 | 0%
11Gas10PoolMuta | 1 | 0%
2.5HatchMuta | 1 | 0%
2HatchHydra | 1 | 0%
2HatchHydraBust | 6 | 0%
2HatchLurker | 1 | 0%
3HatchHydra | 1 | 0%
3HatchHydraBust | 5 | 0%
3HatchLingBust | 5 | 0%
4HatchBeforeGas | 3 | 0%
4PoolHard | 1 | 0%
4PoolSoft | 2 | 0%
5PoolHard2Player | 1 | 0%
5PoolSoft | 1 | 0%
7PoolHard | 11 | 18%
7PoolMid | 1 | 0%
7PoolSoft | 1 | 0%
8Hatch7Pool | 1 | 0%
8Pool | 1 | 0%
9HatchExpo9Pool9Gas | 3 | 0%
9HatchMain9Pool9Gas | 1 | 0%
9PoolSpeed | 1 | 0%
9PoolSunkHatch | 1 | 0%
AntiFact_13Pool | 1 | 0%
AntiZeal_12Hatch | 1 | 0%
Over10Hatch | 1 | 0%
Over10Hatch1Sunk | 2 | 0%
Over10Hatch2Sunk | 1 | 0%
Over10Hatch2SunkHard | 1 | 0%
Over10HatchSlowLings | 1 | 0%
OverhatchExpoLing | 3 | 0%
OverpoolHatch | 8 | 12%
OverpoolHydra | 1 | 0%
OverpoolLurker | 2 | 0%
OverpoolSpeed | 2 | 0%
PurpleSwarmBuild | 1 | 0%
ZvP_2HatchMuta | 2 | 0%
ZvP_3BaseSpire+Den | 4 | 0%
ZvP_3HatchPoolHydra | 6 | 17%
ZvP_Overpool3Hatch | 3 | 0%
ZvT_2HatchMuta | 4 | 0%
ZvT_3HatchMutaExpo | 1 | 0%
ZvZ_12HatchMain | 1 | 0%
ZvZ_12PoolMain | 1 | 0%
ZvZ_Overpool11Gas | 1 | 0%
ZvZ_OverpoolTurtle | 1 | 0%
46 openings | 100 | 4%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Heavy rush | 17, 17%, 12% | 14, 14%, 14% | 65%, 6%
Proxy | 2, 2%, 0% | 2, 2%, 0% | 0%, 0%
Safe expand | 62, 62%, 3% | 47, 47%, 2% | 47%, 5%
Turtle | 19, 19%, 0% | 33, 33%, 3% | 26%, 0%
Unknown | 0, 0%, 0% | 4, 4%, 0% | 0%, 0%

#7 daqin

opening | games | wins
11Gas10PoolMuta | 8 | 12%
2HatchHydra | 2 | 0%
2HatchHydraBust | 5 | 0%
2HatchLurkerAllIn | 5 | 0%
3HatchHydra | 2 | 0%
3HatchHydraBust | 3 | 0%
3HatchHydraExpo | 2 | 0%
3HatchLing | 1 | 0%
3HatchLingBust | 4 | 0%
3HatchLingExpo | 1 | 0%
4HatchBeforeGas | 4 | 0%
4PoolSoft | 1 | 0%
5PoolHard2Player | 2 | 0%
6PoolSpeed | 3 | 0%
8Hatch7Pool | 1 | 0%
9HatchExpo9Pool9Gas | 1 | 0%
9PoolHatch | 2 | 0%
9PoolSpeedAllIn | 3 | 0%
9PoolSpire | 1 | 0%
9PoolSunkHatch | 3 | 0%
9PoolSunkSpeed | 2 | 0%
AntiFact_13Pool | 1 | 0%
AntiFact_2Hatch | 2 | 0%
AntiZeal_12Hatch | 1 | 0%
Over10Hatch1Sunk | 2 | 0%
Over10Hatch2Sunk | 3 | 0%
OverhatchExpoLing | 1 | 0%
OverhatchExpoMuta | 4 | 0%
OverhatchLateGas | 1 | 0%
OverhatchLing | 1 | 0%
OverpoolHatch | 1 | 0%
OverpoolHydra | 2 | 0%
OverpoolLurker | 1 | 0%
OverpoolSpeed | 4 | 0%
OverpoolSunk | 1 | 0%
OverpoolTurtle | 1 | 0%
Sparkle 1HatchMuta | 2 | 0%
ZvP_2HatchMuta | 2 | 0%
ZvP_3BaseSpire+Den | 3 | 0%
ZvP_3HatchPoolHydra | 2 | 0%
ZvP_4HatchPoolHydra | 1 | 0%
ZvT_12PoolMuta | 1 | 0%
ZvT_3HatchMutaExpo | 1 | 0%
ZvZ_12HatchExpo | 1 | 0%
ZvZ_12HatchMain | 1 | 0%
ZvZ_12PoolLing | 1 | 0%
ZvZ_Overgas11Pool | 1 | 0%
ZvZ_OverpoolTurtle | 2 | 0%
48 openings | 100 | 1%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Heavy rush | 0, 0%, 0% | 3, 3%, 0% | 0%, 0%
Proxy | 10, 10%, 0% | 16, 16%, 0% | 0%, 0%
Safe expand | 35, 35%, 0% | 34, 34%, 0% | 29%, 6%
Turtle | 55, 55%, 2% | 41, 41%, 2% | 40%, 7%
Unknown | 0, 0%, 0% | 6, 6%, 0% | 0%, 0%

#8 mcrave

opening | games | wins
11HatchTurtleHydra | 12 | 50%
2HatchHydra | 11 | 36%
2HatchLurker | 2 | 50%
2HatchLurkerAllIn | 1 | 0%
3HatchHydraBust | 7 | 43%
3HatchLing | 2 | 0%
3HatchLingBust | 1 | 0%
AntiZeal_12Hatch | 2 | 0%
Over10Hatch2Hard | 1 | 0%
Over10HatchBust | 1 | 0%
OverhatchLateGas | 23 | 30%
ZvP_3HatchPoolHydra | 13 | 23%
ZvP_Overpool3Hatch | 1 | 0%
ZvT_12PoolMuta | 1 | 0%
ZvZ_OverpoolTurtle | 22 | 64%
15 openings | 100 | 38%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Heavy rush | 91, 91%, 37% | 51, 51%, 25% | 54%, 31%
Safe expand | 8, 8%, 38% | 11, 11%, 45% | 0%, 62%
Turtle | 1, 1%, 100% | 5, 5%, 20% | 0%, 0%
Unknown | 0, 0%, 0% | 33, 33%, 58% | 0%, 0%


The plan predictor struggled to predict what McRave was going to do next, but learning worked well anyway—eventually. The ZvZ_OverpoolTurtle choice is a big surprise, an opening that builds 3 sunkens and gets fast mutalisks on one base. The opening is sound only against certain all-in zerg strategies; protoss really ought to smash it. I’m guessing it worked against a zealot rush where McRave was slow to switch tech when the mutas showed up.

#9 iron

opening | games | wins
12HatchTurtle | 1 | 0%
2.5HatchMuta | 1 | 0%
3HatchPoolMuta | 9 | 11%
9PoolExpo | 8 | 25%
9PoolSunkHatch | 1 | 0%
AntiFact_13Pool | 35 | 23%
AntiFact_2Hatch | 2 | 0%
AntiFactory | 1 | 0%
AntiZeal_12Hatch | 1 | 0%
OverpoolLurker | 1 | 0%
OverpoolSpeed | 1 | 0%
OverpoolSunk | 1 | 0%
ZvP_4HatchPoolHydra | 1 | 0%
ZvZ_12PoolMain | 1 | 0%
ZvZ_Overgas11Pool | 14 | 50%
ZvZ_Overpool9Gas | 22 | 45%
16 openings | 100 | 28%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Factory | 100, 100%, 28% | 91, 91%, 29% | 91%, 7%
Turtle | 0, 0%, 0% | 2, 2%, 0% | 0%, 0%
Unknown | 0, 0%, 0% | 7, 7%, 29% | 0%, 0%


When I run matches locally against Iron, Steamhammer soon settles on AntiFactory as the most reliable answer, and that does seem best. For some reason, Steamhammer behaved differently in both CIG and AIIDE. It is astonishing that ZvZ fast mutalisk openings came out on top again. Exactly as against SAIDA, the plan predictor was 100% accurate while the plan recognizer was 91% accurate.

#10 zzzkbot

opening | games | wins
3HatchHydraBust | 1 | 0%
4PoolHard | 1 | 0%
9PoolSpeedAllIn | 14 | 79%
9PoolSunkHatch | 22 | 32%
OverhatchExpoLing | 1 | 0%
OverhatchLing | 1 | 0%
OverpoolSunk | 21 | 38%
ZvP_3HatchPoolHydra | 1 | 0%
ZvP_4HatchPoolHydra | 1 | 0%
ZvZ_Overgas9Pool | 25 | 44%
ZvZ_Overpool11Gas | 5 | 20%
ZvZ_Overpool9Gas | 1 | 0%
ZvZ_OverpoolTurtle | 6 | 17%
13 openings | 100 | 39%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Fast rush | 77, 77%, 42% | 21, 21%, 57% | 22%, 75%
Heavy rush | 14, 14%, 21% | 2, 2%, 0% | 0%, 86%
Turtle | 9, 9%, 44% | 2, 2%, 100% | 22%, 56%
Unknown | 0, 0%, 0% | 75, 75%, 33% | 0%, 0%


9PoolSunkHatch and OverpoolSunk are anti-rush openings, and 9PoolSpeedAllIn is general-purpose but good against rushes. In contrast, ZvZ_Overgas9Pool is a fast mutalisk opening and can be overrun by too many zerglings. I don’t know how accurate the plan predictions are, but they agree fairly well with the selected openings.

#12 microwave

opening | games | wins
11Gas10PoolMuta | 28 | 32%
3HatchHydraBust | 1 | 0%
3HatchLing | 1 | 0%
3HatchLingExpo | 1 | 0%
3HatchLurker | 1 | 0%
4PoolSoft | 12 | 17%
5PoolHard2Player | 1 | 0%
9HatchMain9Pool9Gas | 2 | 0%
9PoolSpeed | 1 | 0%
9PoolSpeedAllIn | 1 | 0%
9PoolSunkSpeed | 2 | 0%
AntiFact_2Hatch | 1 | 0%
OverhatchLing | 2 | 0%
OverpoolSunk | 4 | 25%
ZvZ_12HatchMain | 2 | 0%
ZvZ_12PoolLing | 1 | 0%
ZvZ_12PoolMain | 2 | 0%
ZvZ_Overgas9Pool | 2 | 0%
ZvZ_Overpool11Gas | 10 | 20%
ZvZ_Overpool9Gas | 23 | 39%
ZvZ_OverpoolTurtle | 2 | 0%
21 openings | 100 | 23%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Fast rush | 15, 15%, 27% | 10, 10%, 50% | 13%, 53%
Heavy rush | 42, 42%, 17% | 20, 20%, 40% | 14%, 45%
Naked expand | 43, 43%, 28% | 21, 21%, 5% | 21%, 49%
Turtle | 0, 0%, 0% | 1, 1%, 0% | 0%, 0%
Unknown | 0, 0%, 0% | 48, 48%, 19% | 0%, 0%


Microwave really mixed things up, and it was successful! Steamhammer could not predict the opening switches. It’s interesting that when Steamhammer predicted a fast rush, it won a quarter of the time, and when it actually recognized a fast rush, it won half the time. That doesn’t tell us what actually happened in the games. When Steamhammer recognizes a fast rush, it can react no matter what opening it is playing, and often save itself. When it is rushed and doesn’t recognize it, it will lose unless it is playing a safe opening.

#13 lastorder

opening | games | wins
3HatchLingBust | 12 | 33%
4PoolHard | 1 | 0%
4PoolSoft | 21 | 29%
6PoolSpeed | 1 | 0%
AntiFactory | 1 | 0%
Over10Hatch | 1 | 0%
Over10Hatch1Sunk | 4 | 25%
OverhatchLing | 2 | 0%
OverhatchMuta | 7 | 29%
PurpleSwarmBuild | 1 | 0%
ZvP_3HatchPoolHydra | 1 | 0%
ZvT_3HatchMutaExpo | 6 | 33%
ZvZ_12HatchMain | 13 | 31%
ZvZ_12PoolLing | 5 | 20%
ZvZ_12PoolMain | 5 | 0%
ZvZ_Overpool11Gas | 17 | 35%
ZvZ_OverpoolTurtle | 2 | 0%
17 openings | 100 | 26%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Heavy rush | 100, 100%, 26% | 77, 77%, 25% | 77%, 14%
Naked expand | 0, 0%, 0% | 3, 3%, 0% | 0%, 0%
Turtle | 0, 0%, 0% | 6, 6%, 17% | 0%, 0%
Unknown | 0, 0%, 0% | 14, 14%, 43% | 0%, 0%


LastOrder did not learn during the tournament and played predictably, yet Steamhammer struggled to find an answer. We also know that LastOrder learned extensively offline before the tournament. Knowing that, and looking at these tables (check out the variety of recognized plans and the variety of Steamhammer’s more successful openings), I get the impression that LastOrder is highly adaptive and knows how to react in a wide variety of situations. I guess we’ll see when the replays come out.

#14 tyr

opening | games | wins
2HatchHydraBust | 13 | 38%
2HatchLurkerAllIn | 14 | 43%
3HatchHydraExpo | 38 | 76%
4HatchBeforeGas | 2 | 0%
4PoolHard | 4 | 25%
9PoolSunkSpeed | 1 | 0%
Over10Hatch2Hard | 1 | 0%
Over10HatchBust | 1 | 0%
OverpoolLurker | 7 | 29%
OverpoolSpeed | 5 | 100%
ZvP_3BaseSpire+Den | 13 | 62%
ZvP_3HatchPoolHydra | 1 | 0%
12 openings | 100 | 56%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Heavy rush | 39, 39%, 56% | 45, 45%, 78% | 41%, 3%
Naked expand | 0, 0%, 0% | 1, 1%, 100% | 0%, 0%
Turtle | 61, 61%, 56% | 50, 50%, 32% | 48%, 5%
Unknown | 0, 0%, 0% | 4, 4%, 100% | 0%, 0%


These numbers say that anything which helps Steamhammer find the right answers early, without having to do so much random exploration, would be a big win in a long tournament. The plan recognizer is not good enough.

#15 metabot

opening | games | wins
11Gas10PoolLurker | 2 | 50%
11HatchTurtleHydra | 6 | 83%
12HatchTurtle | 3 | 67%
2HatchLurkerAllIn | 3 | 67%
3HatchHydraExpo | 1 | 0%
3HatchLing | 11 | 82%
3HatchLingExpo | 10 | 60%
4PoolHard | 1 | 0%
6PoolSpeed | 2 | 100%
9HatchExpo9Pool9Gas | 8 | 50%
9PoolHatch | 3 | 67%
9PoolSpeedAllIn | 2 | 50%
AntiZeal_12Hatch | 1 | 0%
Over10Hatch | 2 | 50%
Over10Hatch2Hard | 1 | 100%
Over10Hatch2Sunk | 3 | 0%
OverhatchExpoLing | 8 | 62%
OverhatchExpoMuta | 14 | 43%
OverhatchLateGas | 4 | 25%
OverpoolSpeed | 4 | 75%
ZvP_2HatchMuta | 2 | 50%
21 openings | 91 | 57%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Heavy rush | 34, 37%, 65% | 19, 21%, 68% | 21%, 41%
Naked expand | 3, 3%, 33% | 3, 3%, 100% | 0%, 33%
Safe expand | 34, 37%, 56% | 20, 22%, 45% | 21%, 38%
Turtle | 19, 21%, 47% | 13, 14%, 46% | 11%, 42%
Unknown | 1, 1%, 100% | 36, 40%, 58% | 0%, 0%


It must have been a crazy learning duel! Later I’ll try to figure out what MetaBot learned, and we can check the two sides’ learning records against each other.

#16 letabot

opening | games | wins
12HatchTurtle | 2 | 0%
3HatchLing | 1 | 0%
6PoolSpeed | 11 | 64%
9HatchExpo9Pool9Gas | 6 | 33%
9PoolLurker | 45 | 82%
OverpoolHatch | 7 | 71%
OverpoolLurker | 28 | 82%
7 openings | 100 | 74%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Heavy rush | 99, 99%, 74% | 59, 59%, 78% | 59%, 20%
Safe expand | 0, 0%, 0% | 4, 4%, 50% | 0%, 0%
Turtle | 1, 1%, 100% | 17, 17%, 76% | 0%, 0%
Unknown | 0, 0%, 0% | 20, 20%, 65% | 0%, 0%

#17 arrakhammer

opening | games | wins
2HatchLurkerAllIn | 1 | 0%
4PoolHard | 22 | 68%
6PoolSpeed | 52 | 75%
7Pool12Hatch | 1 | 0%
9HatchMain9Pool9Gas | 1 | 0%
9PoolSpeedAllIn | 1 | 0%
AntiFactory | 1 | 0%
Over10Hatch2SunkHard | 1 | 0%
Over10HatchBust | 1 | 0%
Over10HatchSlowLings | 1 | 0%
OverhatchExpoMuta | 1 | 0%
OverhatchLing | 1 | 0%
OverpoolHydra | 1 | 0%
ZvZ_12HatchMain | 1 | 0%
ZvZ_12PoolLing | 1 | 0%
ZvZ_12PoolMain | 2 | 0%
ZvZ_Overpool11Gas | 11 | 36%
17 openings | 100 | 58%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Heavy rush | 99, 99%, 58% | 78, 78%, 65% | 78%, 1%
Naked expand | 1, 1%, 100% | 21, 21%, 29% | 0%, 0%
Unknown | 0, 0%, 0% | 1, 1%, 100% | 0%, 0%


This old version of Arrakhammer has a fixed anti-Steamhammer opening configured. It was written before Steamhammer had learning. Modern Steamhammer can exploit the fixed opening. You can’t get away with that any more.

#18 ecgberht

opening | games | wins
11Gas10PoolLurker | 11 | 91%
11HatchTurtleLurker | 51 | 100%
9PoolLurker | 37 | 97%
OverpoolLurker | 1 | 0%
4 openings | 100 | 97%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Heavy rush | 100, 100%, 97% | 67, 67%, 96% | 67%, 33%
Unknown | 0, 0%, 0% | 33, 33%, 100% | 0%, 0%

#19 ualbertabot

opening | games | wins
3HatchLurker | 1 | 0%
7PoolHard | 11 | 82%
AntiZeal_12Hatch | 7 | 57%
OverhatchExpoMuta | 1 | 0%
OverpoolTurtle | 80 | 98%
5 openings | 100 | 91%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Factory | 2, 2%, 100% | 11, 11%, 100% | 0%, 0%
Fast rush | 12, 12%, 92% | 15, 15%, 80% | 33%, 25%
Heavy rush | 85, 85%, 91% | 45, 45%, 89% | 45%, 22%
Naked expand | 1, 1%, 100% | 7, 7%, 100% | 0%, 0%
Unknown | 0, 0%, 0% | 22, 22%, 95% | 0%, 0%


Getting that 98% win rate is one of the reasons I added the seemingly nonsensical overpool turtle opening, which makes an absurd 6 sunkens on one base. It works against all kinds of rushes, fast or slow, when the rusher does not know how to adapt.

#20 ximp

opening | games | wins
3HatchHydraExpo | 17 | 82%
4HatchBeforeGas | 36 | 83%
9Hatch8Pool | 1 | 0%
AntiFactory | 1 | 0%
ZvP_2HatchMuta | 9 | 78%
ZvP_3BaseSpire+Den | 36 | 78%
6 openings | 100 | 79%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Safe expand | 3, 3%, 100% | 18, 18%, 94% | 0%, 0%
Turtle | 97, 97%, 78% | 78, 78%, 76% | 77%, 4%
Unknown | 0, 0%, 0% | 4, 4%, 75% | 0%, 0%


Why didn’t Steamhammer try the 3 hatch before pool opening even once in 100 rounds? I expect it would have scored higher. Well, I know why; when the win rate is so convincing, Steamhammer doesn’t explore much.
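
The effect is easy to see in a toy selection rule. This epsilon-greedy sketch is not Steamhammer’s real algorithm (its exploration scheme is more involved), but it shows the behavior: the better the best known opening is doing, the less often anything else gets a turn, so a promising-but-untried alternative never comes up.

```python
import random

def choose_opening(openings, rng=random):
    """Pick an opening: mostly exploit the best win rate, explore more
    when even the best opening is losing. Illustrative only, not
    Steamhammer's actual selection rule.

    openings: {name: (games, wins)}
    """
    def rate(games, wins):
        return wins / games if games else 0.5   # optimistic prior for untried
    best = max(openings, key=lambda n: rate(*openings[n]))
    # Exploration probability shrinks as the best win rate grows:
    # at an 83% win rate it is under 0.09.
    epsilon = 0.5 * (1.0 - rate(*openings[best]))
    if rng.random() < epsilon:
        others = [n for n in openings if n != best]
        return rng.choice(others)
    return best
```

Against a win rate as convincing as the one over XIMP, a rule like this repeats the proven openings more than 90% of the time, which matches the table above: 3 openings soak up 89 of the 100 games.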

#21 cdbot

opening | games | wins
11HatchTurtleHydra | 1 | 0%
9PoolSunkSpeed | 15 | 47%
OverpoolSunk | 82 | 96%
ZvP_Overpool3Hatch | 1 | 0%
ZvZ_12PoolLing | 1 | 0%
5 openings | 100 | 86%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Fast rush | 96, 96%, 85% | 31, 31%, 71% | 29%, 57%
Heavy rush | 4, 4%, 100% | 13, 13%, 100% | 0%, 25%
Unknown | 0, 0%, 0% | 56, 56%, 91% | 0%, 0%

#22 aiur

opening | games | wins
11Gas10PoolLurker | 1 | 0%
3HatchHydraExpo | 28 | 89%
5PoolHard2Player | 1 | 0%
AntiZeal_12Hatch | 46 | 91%
Over10Hatch | 24 | 92%
5 openings | 100 | 89%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Heavy rush | 95, 95%, 89% | 65, 65%, 91% | 64%, 18%
Naked expand | 4, 4%, 75% | 15, 15%, 73% | 0%, 25%
Proxy | 0, 0%, 0% | 2, 2%, 50% | 0%, 0%
Turtle | 1, 1%, 100% | 0, 0%, 0% | 0%, 0%
Unknown | 0, 0%, 0% | 18, 18%, 100% | 0%, 0%


Turtle was predicted once but never recognized in the last 100 games. That implies that Steamhammer recognized a turtle opening in the first 3 rounds—and it was wrong, since AIUR doesn’t do that; it must have been a misrecognized cannon rush, a bug that has crept in. Comparing against what AIUR learned, I see that AIUR cannon rushed Steamhammer 3 times total, all failures, and favored its defensive strategy.

#23 killall

opening | games | wins
6PoolSpeed | 1 | 0%
9PoolSpeed | 37 | 100%
ZvZ_12PoolMain | 1 | 0%
ZvZ_OverpoolTurtle | 61 | 93%
4 openings | 100 | 94%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Heavy rush | 75, 75%, 93% | 43, 43%, 91% | 49%, 36%
Naked expand | 5, 5%, 80% | 12, 12%, 100% | 20%, 20%
Turtle | 20, 20%, 100% | 10, 10%, 100% | 45%, 35%
Unknown | 0, 0%, 0% | 35, 35%, 94% | 0%, 0%

#24 willyt

opening | games | wins
11Gas10PoolLurker | 30 | 97%
11HatchTurtleLurker | 7 | 86%
12HatchTurtle | 2 | 0%
2HatchLurkerAllIn | 24 | 96%
6PoolSpeed | 1 | 0%
9PoolLurker | 1 | 0%
OverpoolLurker | 35 | 100%
7 openings | 100 | 93%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Heavy rush | 100, 100%, 93% | 85, 85%, 96% | 85%, 15%
Unknown | 0, 0%, 0% | 15, 15%, 73% | 0%, 0%

#25 ailien

opening | games | wins
3HatchLurker | 1 | 0%
6PoolSpeed | 1 | 0%
9PoolSpeedAllIn | 1 | 0%
OverhatchLing | 1 | 0%
ZvT_3HatchMuta | 1 | 0%
ZvZ_Overgas9Pool | 7 | 43%
ZvZ_Overpool9Gas | 20 | 85%
ZvZ_OverpoolTurtle | 68 | 93%
8 openings | 100 | 83%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Naked expand | 98, 98%, 85% | 3, 3%, 0% | 2%, 98%
Unknown | 2, 2%, 0% | 97, 97%, 86% | 0%, 50%

#26 cunybot

opening | games | wins
11Gas10PoolMuta | 1 | 0%
5PoolHard2Player | 3 | 67%
OverhatchLing | 15 | 93%
OverpoolSpeed | 1 | 0%
ZvZ_12HatchExpo | 2 | 50%
ZvZ_OverpoolTurtle | 77 | 100%
6 openings | 99 | 95%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Fast rush | 4, 4%, 100% | 3, 3%, 100% | 0%, 75%
Heavy rush | 13, 13%, 100% | 6, 6%, 83% | 0%, 62%
Naked expand | 62, 63%, 94% | 20, 20%, 90% | 19%, 61%
Turtle | 19, 19%, 100% | 10, 10%, 100% | 11%, 58%
Unknown | 1, 1%, 0% | 60, 61%, 97% | 0%, 0%

#27 hellbot

opening | games | wins
2HatchHydraBust | 5 | 80%
3HatchHydra | 7 | 100%
3HatchHydraBust | 12 | 100%
3HatchHydraExpo | 14 | 100%
3HatchLingBust | 8 | 100%
4HatchBeforeGas | 16 | 100%
Over10Hatch1Sunk | 3 | 100%
ZvP_2HatchMuta | 11 | 100%
ZvP_3BaseSpire+Den | 15 | 100%
ZvP_3HatchPoolHydra | 9 | 100%
10 openings | 100 | 99%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Turtle | 100, 100%, 99% | 76, 76%, 99% | 76%, 24%
Unknown | 0, 0%, 0% | 24, 24%, 100% | 0%, 0%

overall

opening | total | ZvT | ZvP | ZvZ | ZvR (each cell: games, wins)
11Gas10PoolLurker | 53, 75% | 44, 89% | 9, 11% | - | -
11Gas10PoolMuta | 50, 24% | 1, 0% | 20, 15% | 29, 31% | -
11HatchTurtleHydra | 24, 46% | 1, 0% | 22, 50% | 1, 0% | -
11HatchTurtleLurker | 60, 95% | 58, 98% | 2, 0% | - | -
12HatchTurtle | 10, 20% | 5, 0% | 5, 40% | - | -
2.5HatchMuta | 5, 0% | 1, 0% | 3, 0% | 1, 0% | -
2HatchHydra | 16, 25% | - | 16, 25% | - | -
2HatchHydraBust | 45, 20% | 1, 0% | 44, 20% | - | -
2HatchLurker | 6, 17% | - | 6, 17% | - | -
2HatchLurkerAllIn | 52, 60% | 24, 96% | 27, 30% | 1, 0% | -
3HatchHydra | 11, 64% | - | 11, 64% | - | -
3HatchHydraBust | 42, 36% | - | 40, 38% | 2, 0% | -
3HatchHydraExpo | 103, 80% | 1, 0% | 102, 80% | - | -
3HatchLing | 16, 56% | 1, 0% | 14, 64% | 1, 0% | -
3HatchLingBust | 59, 25% | - | 47, 23% | 12, 33% | -
3HatchLingExpo | 16, 38% | - | 15, 40% | 1, 0% | -
3HatchLurker | 6, 0% | 1, 0% | 2, 0% | 2, 0% | 1, 0%
3HatchPoolMuta | 11, 9% | 9, 11% | 1, 0% | 1, 0% | -
4HatchBeforeGas | 73, 63% | 2, 0% | 70, 66% | 1, 0% | -
4PoolHard | 35, 46% | 3, 0% | 8, 12% | 24, 62% | -
4PoolSoft | 39, 21% | - | 5, 0% | 34, 24% | -
5PoolHard | 4, 0% | 1, 0% | 3, 0% | - | -
5PoolHard2Player | 10, 20% | - | 6, 0% | 4, 50% | -
5PoolSoft | 3, 0% | 1, 0% | 2, 0% | - | -
6Pool | 1, 0% | 1, 0% | - | - | -
6PoolSpeed | 75, 64% | 12, 58% | 6, 33% | 57, 68% | -
7Pool12Hatch | 1, 0% | - | - | 1, 0% | -
7PoolHard | 35, 34% | - | 23, 13% | 1, 0% | 11, 82%
7PoolMid | 1, 0% | - | 1, 0% | - | -
7PoolSoft | 4, 0% | 2, 0% | 2, 0% | - | -
8Hatch7Pool | 4, 0% | - | 3, 0% | 1, 0% | -
8Pool | 6, 0% | - | 6, 0% | - | -
9Hatch8Pool | 4, 0% | 2, 0% | 1, 0% | 1, 0% | -
9HatchExpo9Pool9Gas | 39, 21% | 7, 29% | 32, 19% | - | -
9HatchMain9Pool9Gas | 6, 0% | - | 3, 0% | 3, 0% | -
9Pool | 1, 0% | 1, 0% | - | - | -
9PoolExpo | 10, 20% | 9, 22% | 1, 0% | - | -
9PoolHatch | 6, 33% | - | 6, 33% | - | -
9PoolLurker | 91, 81% | 91, 81% | - | - | -
9PoolSpeed | 43, 86% | - | 5, 0% | 38, 97% | -
9PoolSpeedAllIn | 29, 41% | 1, 0% | 11, 9% | 17, 65% | -
9PoolSpire | 3, 0% | - | 3, 0% | - | -
9PoolSunkHatch | 27, 26% | 1, 0% | 4, 0% | 22, 32% | -
9PoolSunkSpeed | 22, 32% | 1, 0% | 3, 0% | 18, 39% | -
AntiFact_13Pool | 46, 17% | 43, 19% | 3, 0% | - | -
AntiFact_2Hatch | 20, 0% | 14, 0% | 5, 0% | 1, 0% | -
AntiFactory | 21, 0% | 17, 0% | 2, 0% | 2, 0% | -
AntiZeal_12Hatch | 63, 73% | 3, 0% | 53, 79% | - | 7, 57%
Over10Hatch | 31, 74% | - | 30, 77% | 1, 0% | -
Over10Hatch1Sunk | 12, 33% | - | 8, 38% | 4, 25% | -
Over10Hatch2Hard | 3, 33% | - | 3, 33% | - | -
Over10Hatch2Sunk | 9, 0% | - | 9, 0% | - | -
Over10Hatch2SunkHard | 6, 0% | 1, 0% | 4, 0% | 1, 0% | -
Over10HatchBust | 5, 0% | - | 4, 0% | 1, 0% | -
Over10HatchSlowLings | 4, 0% | - | 3, 0% | 1, 0% | -
OverhatchExpoLing | 18, 28% | - | 17, 29% | 1, 0% | -
OverhatchExpoMuta | 23, 26% | - | 21, 29% | 1, 0% | 1, 0%
OverhatchLateGas | 30, 27% | 1, 0% | 29, 28% | - | -
OverhatchLing | 24, 58% | - | 1, 0% | 23, 61% | -
OverhatchMuta | 9, 22% | - | 1, 0% | 8, 25% | -
Overpool+1 | 3, 0% | 1, 0% | 2, 0% | - | -
OverpoolHatch | 18, 33% | 8, 62% | 10, 10% | - | -
OverpoolHydra | 7, 0% | - | 6, 0% | 1, 0% | -
OverpoolLurker | 76, 79% | 65, 89% | 11, 18% | - | -
OverpoolSpeed | 22, 36% | 1, 0% | 19, 42% | 2, 0% | -
OverpoolSunk | 111, 79% | 1, 0% | 2, 0% | 108, 81% | -
OverpoolTurtle | 83, 94% | - | 3, 0% | - | 80, 98%
PurpleSwarmBuild | 7, 0% | 1, 0% | 5, 0% | 1, 0% | -
Sparkle 1HatchMuta | 4, 0% | - | 4, 0% | - | -
Sparkle 2HatchMuta | 4, 0% | 2, 0% | 2, 0% | - | -
Sparkle 3HatchMuta | 2, 0% | - | 2, 0% | - | -
ZvP_2HatchMuta | 41, 46% | - | 40, 48% | 1, 0% | -
ZvP_3BaseSpire+Den | 86, 59% | - | 85, 60% | 1, 0% | -
ZvP_3HatchPoolHydra | 49, 27% | 1, 0% | 46, 28% | 2, 0% | -
ZvP_4HatchPoolHydra | 4, 0% | 1, 0% | 2, 0% | 1, 0% | -
ZvP_Overpool3Hatch | 6, 0% | - | 5, 0% | 1, 0% | -
ZvT_12PoolMuta | 9, 0% | 2, 0% | 6, 0% | 1, 0% | -
ZvT_13Pool | 2, 0% | - | 2, 0% | - | -
ZvT_2HatchMuta | 6, 0% | 1, 0% | 5, 0% | - | -
ZvT_3HatchMuta | 4, 0% | 1, 0% | 1, 0% | 2, 0% | -
ZvT_3HatchMutaExpo | 9, 22% | - | 2, 0% | 7, 29% | -
ZvZ_12HatchExpo | 3, 33% | - | 1, 0% | 2, 50% | -
ZvZ_12HatchMain | 39, 18% | - | 2, 0% | 37, 19% | -
ZvZ_12Pool | 3, 0% | - | 3, 0% | - | -
ZvZ_12PoolLing | 12, 8% | 1, 0% | 2, 0% | 9, 11% | -
ZvZ_12PoolMain | 16, 0% | 1, 0% | 2, 0% | 13, 0% | -
ZvZ_Overgas11Pool | 16, 44% | 14, 50% | 2, 0% | - | -
ZvZ_Overgas9Pool | 40, 35% | 1, 0% | 4, 0% | 35, 40% | -
ZvZ_Overpool11Gas | 60, 25% | 13, 15% | 4, 0% | 43, 30% | -
ZvZ_Overpool9Gas | 100, 45% | 23, 43% | 3, 0% | 74, 47% | -
ZvZ_OverpoolTurtle | 267, 82% | 1, 0% | 25, 56% | 241, 85% | -
total | 2590, 52% | 500, 59% | 1091, 39% | 899, 58% | 100, 91%
openings played | 91 | 52 | 87 | 55 | 5

Steamhammer played all of its openings during the tournament, almost all of them multiple times. It even tried the 3 specialized openings for the island map Sparkle. Nearly as many were played in ZvP alone, since it spent so much time desperately seeking an answer to the Locutusoids (or possibly Susan). Some openings were highly successful in given matchups, which generally means that the opening defeated one opponent reliably and so was played many times. For example, OverpoolSunk wiped out CDBot, which makes it look in this table as though it wiped out all zergs. If only it were so simple! The opening with the best success across matchups is 6PoolSpeed, an opening that I have never seen in human play.
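
For the record, the roll-up that produces a table like this from the per-opponent results is a small aggregation. The flat record format here is hypothetical (opening, opponent race, games, wins), and the sample numbers are the OverpoolSunk lines from the CDBot, ZZZKBot, and Iron tables above:

```python
from collections import defaultdict

def roll_up(records):
    """Aggregate per-opponent results into per-matchup totals.

    records: iterable of (opening, race, games, wins) with race in 'TPZR'.
    Returns {(opening, 'ZvX'): [games, wins]}.
    """
    table = defaultdict(lambda: [0, 0])
    for opening, race, games, wins in records:
        cell = table[(opening, 'Zv' + race)]
        cell[0] += games
        cell[1] += wins
    return table

records = [
    ('OverpoolSunk', 'Z', 82, 79),  # vs CDBot: 82 games at 96%
    ('OverpoolSunk', 'Z', 21, 8),   # vs ZZZKBot: 21 games at 38%
    ('OverpoolSunk', 'T', 1, 0),    # vs Iron: 1 game, lost
]
games, wins = roll_up(records)[('OverpoolSunk', 'ZvZ')]
print(games, wins)  # 103 87
```

The caution from the text applies to the real table too: a high matchup-wide win rate often just means one opponent was beaten reliably many times.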