SSCAIT round of 8; AlphaStar

In between ASL 7, the SSCAIT round of 8, about 4 hours of video on DeepMind’s AlphaStar, and keeping up with Steamhammer’s games, I was watching Starcraft today nearly from dawn to dusk. Coding progress: Zero. Time to do something else for a while and write about it!

In SSCAIT, the hard-to-predict matches were PurpleWave-BananaBrain and Iron-Steamhammer. PurpleWave 3-2 BananaBrain was the expected close match. In the next round we can expect PurpleWave > SAIDA and Locutus > Iron, so PurpleWave and Locutus will fight it out in the semifinal. I think Locutus has an edge, but PurpleWave retains chances.

I was not surprised by Iron > Steamhammer, but I was surprised by Iron 3-0 Steamhammer. It was unlucky that it was so one-sided. When Steamhammer has ample learning data, it has a small advantage over Iron. On a 2 player map, the 2 hatch muta variant usually wins, but here Steamhammer played it on a 4 player map where it usually loses due to poor execution—Steamhammer didn’t have enough data to connect its past wins with 2 hatch muta to the map size. On other maps, the AntiFactory build wins 50% or a little more, and in this match Steamhammer was still casting about for ways to win and didn’t try AntiFactory. It’s because I knew that Steamhammer didn’t have enough data that I gave the edge to Iron. Steamhammer is likely to win in the next round and, as others have also predicted, drop out in loser’s round 4 to SAIDA.

Interestingly, commenters who predicted Proxy > Hao Pan were wrong. All the information I can find seems to indicate that Proxy has the upper hand over Hao Pan. Did somebody hit a learning transient?

AlphaStar in Starcraft 2 gives us a foretaste of what to expect from advanced neural network learning. On the one hand, they spent huge computing resources—weeks at a time of “many thousands” of simultaneous games with 16 of Google’s TPUs per player—to learn to play protoss versus protoss on a single map. On the other hand, AlphaStar came out of that work with exceptional micro and strong judgment, areas in which all Brood War bots are currently weak. Machine learning is the way to get strong judgment. But it’s not easy.

They say that AlphaStar plays with average APM around 280 and latency around 350 ms, both somewhat slower than human. That makes its strength more impressive. They didn’t say so clearly, but I got the idea that the 350 ms latency is for free: It takes that long to evaluate their deep and complex network, so they can’t react faster! They did not talk as much about how AlphaStar’s real advantage is not in speed, but in precision: It does not misclick (at least not harmfully). Humans have a tradeoff of speed versus precision; if you do something faster, you do it with more slop. AlphaStar is a little slower, but far more precise than a human, so in fact it stands higher on the speed-precision tradeoff. It should play better, given equal knowledge. Still, it certainly takes fewer liberties than a BWAPI bot.

AIIDE 2018 - what CherryPi learned

Here is a table of how each CherryPi opening fared against each opponent, like the tables I made for other bots. Reading the code confirmed my inference that the learning files recorded opening build orders, not build orders switched to later in the game; see how CherryPi played.

#bottotal10hatchling2hatchmuta3basepoollings9poolspeedlingmutahydracheesezve9poolspeedzvp10hatchzvp3hatchhydrazvp6hatchhydrazvpohydraszvpomutaszvt2baseguardianzvt2baseultrazvt3hatchlurkerzvtmacrozvz12poolhydraszvz9gas10poolzvz9poolspeedzvzoverpool
#1saida13-90  13%-----1-19 5%------1-15 6%9-37 20%2-19 10%----
#3cse73-30  71%-----0-2 0%24-5 83%--16-8 67%----33-15 69%----
#4bluebluesky89-14  86%-----0-1 0%29-8 78%-------60-5 92%----
#5locutus84-19  82%--63-11 85%-----14-3 82%-2-2 50%---5-3 62%----
#6isamind99-4  96%--1-0 100%-----98-4 96%----------
#7daqin103-0  100%--------------103-0 100%----
#8mcrave87-16  84%--9-2 82%-----31-4 89%-14-4 78%---33-6 85%----
#9iron97-6  94%----97-6 94%--------------
#10zzzkbot93-10  90%58-4 94%--0-1 0%-------------35-4 90%0-1 0%
#11steamhammer81-21  79%22-7 76%----16-5 76%---------0-1 0%-43-8 84%-
#12microwave94-9  91%----------------0-1 0%4-2 67%90-6 94%
#13lastorder85-18  83%45-7 87%----0-1 0%------------40-10 80%
#14tyr98-5  95%------98-5 95%------------
#15metabot94-2  98%---------94-2 98%---------
#16letabot101-2  98%0-1 0%-97-0 100%--1-1 50%-----3-0 100%-------
#17arrakhammer92-11  89%-----------------92-11 89%-
#18ecgberht102-1  99%--------------102-1 99%----
#19ualbertabot99-4  96%---96-2 98%-3-2 60%-------------
#20ximp98-5  95%-------1-0 100%-97-5 95%---------
#21cdbot103-0  100%-----96-0 100%-----------7-0 100%-
#22aiur100-3  97%---------100-3 97%---------
#23killall103-0  100%102-0 100%-----------------1-0 100%
#24willyt103-0  100%-103-0 100%-----------------
#25ailien103-0  100%-----------------103-0 100%-
#26cunybot100-3  97%-----------------100-3 97%-
#27hellbot103-0  100%------31-0 100%--72-0 100%---------
overall-  90%227-19 92%103-0 100%170-13 93%96-3 97%97-6 94%117-31 79%182-18 91%1-0 100%143-11 93%379-18 95%16-6 73%3-0 100%1-15 6%9-37 20%338-49 87%0-1 0%0-1 0%384-28 93%131-17 89%

Look how sparse the chart is—CherryPi was highly selective about its choices. It did not try more than 4 different builds against any opponent. It makes sense to minimize the number of choices so that you don’t lose games exploring bad ones, but you have to be pretty sure that one of the choices you do try is good. Where did the selectivity come from?

The opening “hydracheese” was played only against Iron, and was the only opening played against Iron. It smelled like a hand-coded choice. Sure enough, the file source/src/models/banditconfigurations.cpp configures builds by name for 18 of the 27 entrants. A comment says that the build order switcher is turned off for the hydracheese opening only: “BOS disabled for this specific build because the model hasn’t seen it.” Here is the full set of builds configured, including defaults for those that were not hand-configured. CherryPi played only builds that were configured, but did not play all the builds that were configured; presumably it stopped when it hit a good one.

bot | builds | note
AILien | zve9poolspeed zvz9poolspeed | returning opponents from last year
AIUR | zvtmacro zvpohydras zvp10hatch
Arrakhammer | 10hatchling zvz9poolspeed
Iron | hydracheese
UAlbertaBot | zve9poolspeed 9poolspeedlingmuta
Ximp | zvpohydras zvtmacro zvp3hatchhydra
Microwave | zvzoverpool zvz9poolspeed zvz9gas10pool | “we have some expectations”
Steamhammer | zve9poolspeed zvz9poolspeed zvz12poolhydras 10hatchling
ZZZKBot | 9poolspeedlingmuta 10hatchling zvz9poolspeed zvzoverpool
ISAMind, Locutus, McRave, DaQin | zvtmacro zvp6hatchhydra 3basepoollings zvpomutas
CUNYBot | zvzoverpoolplus1 zvz9gas10pool zvz9poolspeed
HannesBredberg | zvtp1hatchlurker zvt2baseultra zvt3hatchlurker zvp10hatch
LetaBot | zvtmacro 3basepoollings zvt2baseguardian zve9poolspeed 10hatchling
MetaBot | zvtmacro zvpohydras zvpomutas zve9poolspeed
WillyT | zvt2baseultra 12poolmuta 2hatchmuta
ZvT | zvt2baseultra zvtmacro zvt3hatchlurker zve9poolspeed | defaults
ZvP | zve9poolspeed zvtmacro zvp10hatch zvpohydras
ZvZ | 10hatchling zve9poolspeed zvz9poolspeed zvzoverpool
ZvR | 10hatchling zve9poolspeed 9poolspeedlingmuta

I read this as pulling out all the stops to reach #1. They would have succeeded if not for SAIDA.

banditconfigurations.cpp continues and declares some properties for builds including non-opening builds. It looks like .validOpening() tells whether it can be played as an opening build, .validSwitch() tells whether the build order switcher is allowed to switch to it during the game, and .switchEnabled() tells whether the build order switcher is enabled at all.

The build orders themselves are defined in source/src/buildorders/. I found them a little hard to read, partly because they are written in reverse order: Actions to happen first are posted last to the blackboard.

The opening zve9poolspeed (I read “zve” as zerg versus everything) has the most red boxes in the chart—it did poorly against more opponents than any other. It may have been a poor choice to configure for use in so many cases. In contrast, zvz9poolspeed specialized for ZvZ was successful. It gets fast mutalisks and in general has a lot more strategic detail coded into the build.

They seem to have had expectations of the zvt2baseultra build against terran. It is configured for HannesBredberg, WillyT, and the default ZvT. It was in fact only tried against SAIDA. I didn’t notice anything that tells CherryPi what order to try opening builds in. Maybe the build order switcher itself contributes, helping to choose the more likely openings first?

LastOrder and its macro model - technical info

Time to dig into the details! I read the paper and some of the code to find stuff out.

LastOrder’s “macro” decisions are made by a neural network whose data size is close to 8MB—much larger than LastOrder’s code (but much smaller than CherryPi’s model data). There is room for a lot of smarts in that many bytes. The network takes in a summary description of the game state as a vector of feature values, and returns a macro action, what to build or upgrade or whatever next. The code to marshal data to and from the network is in StrategyManager.cpp.

network input

The list of network input features is initialized in the StrategyManager constructor and populated in StrategyManager::triggerModel(). There are a lot of features. I didn’t dig into the details, but it looks as though some of the features are binary, some are counts, some are X or Y values that together give a position on the map, and a few are other numbers. They fall into these groups:

• State features. Basic information about the map and the opponent, our upgrades and economy, our own and enemy tech buildings.

• Waiting to build features. I’m not sure what these mean, but it’s something to do with production.

• “Our battle basic features” and “enemy battle basic features.” Combat units.

• “Our defend building features” and “enemy defend building features.” Static defense.

• “Killed” and “kill” features, what units of ours or the enemy’s are destroyed.

• A mass of features related to our current attack action and what the enemy has available to defend against it.

• “Our defend info” looks like places we are defending and what the enemy is attacking with.

• “Enemy defend info” looks like it locates the enemy’s static defense relative to the places we are interested in attacking.

• “Visible” gives the locations of the currently visible enemy unit types. I’m not quite sure what this means. A unit type doesn’t have an (x,y) position, and it seems as though LastOrder is making one up. It could be the location of the largest group of each unit type, or the closest unit of each type, or something. Have to read more code.

With this much information available, sophisticated strategies are possible in principle. It’s not clear how much of this the network successfully understands and makes use of. The games I watched did not give the impression of deep understanding, but then again, we have to remember that LastOrder learned to play against 20 specific opponents. Its results against those opponents suggest that it does understand them deeply.

network output

It looks like the network output is a single macro action. Code checks whether the action is valid in the current situation and, if so, calls on the appropriate manager to carry it out. The code is full of I/O details and validation and error handling, so I might have missed something in the clutter. Also the code shows signs of having been modified over time without tying up loose ends. I imagine they experimented actively.
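
For illustration only (this is not LastOrder’s code; the manager names and legality check are made up), the shape of the output handling is roughly this:

```python
# A minimal sketch, not LastOrder's actual code: take the macro action the
# network chose, check that it is legal right now, and hand it to the right
# manager. Action names and manager methods here are hypothetical.

def is_valid(action, state):
    """Placeholder legality check; the real bot verifies tech, resources, etc."""
    return action in state.get("legal_actions", [])

def execute_macro_action(action, state, production, upgrades, tactics):
    if not is_valid(action, state):
        return False                              # invalid: do nothing this step
    if action.startswith("Build") or action.startswith("Expand"):
        production.queue(action)                  # units, buildings, new bases
    elif action.startswith("Upgrade"):
        upgrades.start(action)                    # tech and upgrades
    elif action.startswith("Attack") or action.endswith("AddArmy"):
        tactics.order(action)                     # attack and add-army actions
    return True
```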

By the way, the 9 pool/10 hatch muta/12 hatch muta opening choices and learning code from Overkill are still there, though Overkill’s opening learning is not used.

learning setup

The learning setup uses Ape-X DQN. The term is as dense as a neutron star! Ape-X is a way to organize deep reinforcement learning; see the paper Distributed Prioritized Experience Replay by Horgan et al of Google’s DeepMind. In “DQN”, D stands for deep and as far as I’m concerned is a term of hype and means “we’re doing the cool stuff.” Q is for Q-learning, the form of reinforcement learning you use when you know what’s good (winning the game) and you have to figure out from experience a policy (that’s a technical term) to achieve it in a series of steps over time. The policy is in effect a box where you feed in the situation and it tells you what to do in that situation. What’s good is represented by a reward (given as a number) that you may receive long after the actions that earned it; that can make it hard to figure out a good policy, which is why you end up training on a cluster of 1000 machines. Finally, “N” is for the neural network that acts as the box that knows the policy.
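
If you haven’t seen Q-learning before, here is the whole idea in a few lines of toy Python. The table maps a (state, action) pair to an estimate of long-term reward, and each observed step nudges the estimate toward “reward now plus the best we expect afterward.” DQN replaces the table with a neural network but keeps the same target.

```python
# Tabular Q-learning in miniature. In DQN the table is replaced by a neural
# network, but the update target is the same idea.
from collections import defaultdict

Q = defaultdict(float)          # Q[(state, action)] -> estimated long-term reward
ALPHA, GAMMA = 0.1, 0.99        # learning rate, discount factor

def q_update(state, action, reward, next_state, next_actions):
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + GAMMA * best_next          # reward now plus best future
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])
```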

In Ape-X, the learning system consists of a set of Actors that (in the case of LastOrder) play Brood War and record the input features and reward for each time step, plus a Learner (the paper suggests that 1 learner is enough, though you could have more) that feeds the data to a reinforcement learning algorithm. The Actors are responsible for exploring, that is, trying out variations from the current best known policy to see if any of them are improvements. The Ape-X paper suggests having different Actors explore differently so you don’t get stuck in a rut. In the case of LastOrder, the Actors play against a range of opponents. The Learner keeps track of which data points are more important to learn and feeds those in more often to speed up learning. If you hit a surprise, meaning the reward is very different from what you expected (“I thought I was winning, then a nuke hit”), that’s something important to learn.
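
Here is a toy sketch of the prioritized replay idea, leaving out the priority exponents and importance weights that the Ape-X paper uses:

```python
import random

class PrioritizedReplay:
    """Toy prioritized replay: sample transitions in proportion to their
    'surprise' (absolute TD error), so surprising experiences are replayed
    more often. Real Ape-X adds priority exponents and importance weights."""
    def __init__(self):
        self.items = []          # list of (priority, transition)

    def add(self, transition, td_error):
        self.items.append((abs(td_error) + 1e-3, transition))

    def sample(self):
        total = sum(p for p, _ in self.items)
        r = random.uniform(0, total)
        for p, transition in self.items:
            r -= p
            if r <= 0:
                return transition
        return self.items[-1][1]
```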

LastOrder seems to have closely followed the Ape-X DQN formula from the Ape-X paper. They name the exact same set of techniques, although many other choices are possible. Presumably DeepMind knows what they’re doing.

LastOrder does not train with a reward “I won/I lost.” That’s very little information and appears long after the actions that cause it, and it would leave learning glacially slow. They use reward shaping, which means giving a more informative reward number that offers the learning system more clues about whether it is going in the right direction. They use a reward based on the current game score.
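
The paper doesn’t spell out the exact formula, but the shape of score-based reward shaping is simple. A hedged sketch, with made-up scaling:

```python
# Sketch of reward shaping: reward each step by the change in the score
# differential (our score minus the enemy's) rather than only win/loss at the
# end. The exact formula LastOrder uses is not given in the paper.

def shaped_reward(prev_scores, cur_scores, game_over=False, won=False):
    prev_diff = prev_scores["ours"] - prev_scores["theirs"]
    cur_diff = cur_scores["ours"] - cur_scores["theirs"]
    reward = (cur_diff - prev_diff) * 1e-4      # small per-step shaping signal
    if game_over:
        reward += 1.0 if won else -1.0          # the final outcome still dominates
    return reward
```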

the network itself

Following an idea from the 2015 paper Deep Recurrent Q-Learning for Partially Observable MDPs by Hausknecht and Stone, the LastOrder team layered a Long Short-Term Memory network in front of the DQN. We’ve seen LSTM before in Tscmoo (at least at one point; is it still there?). The point of the LSTM network is to remember what’s going on and more fully represent the game state, because in Brood War there is fog of war. So inputs go through the LSTM to expand the currently observed game state into some encoded approximation of all the game state that has been seen so far, then through the DQN to turn that into an action.
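
To make the shape concrete, here is a minimal PyTorch sketch of an LSTM feeding a Q-network. The sizes and layer structure are my guesses, not LastOrder’s actual design:

```python
import torch.nn as nn

class RecurrentQNet(nn.Module):
    """Sketch of an LSTM feeding a Q-network: the LSTM summarizes the history
    of partial observations, the head turns that summary into one Q-value per
    macro action. Sizes are arbitrary, not LastOrder's."""
    def __init__(self, n_features, n_actions, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs_seq, hidden_state=None):
        # obs_seq: (batch, time, n_features)
        out, hidden_state = self.lstm(obs_seq, hidden_state)
        q_values = self.head(out[:, -1])        # Q-values for the latest step
        return q_values, hidden_state
```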

The LastOrder paper does not go into detail. There is not enough information in it to reproduce their network design. The Actor and Learner code is in the repo. I haven’t read it to see if it tells us everything.

Taken together it’s a little complicated, isn’t it? Not something for one hobbyist to try on their own. I think you need a team and a budget to put together something like this.

LastOrder and its macro model - general info

LastOrder (github) now has a 15-page academic paper out, Macro action selection with deep reinforcement learning in StarCraft by 6 authors including Sijia Xu as lead author. The paper does not go into great detail, but it reveals new information. It also uses a lot of technical terms without explanation, so it may be hard to follow if you don’t have the background. Also see my recent post how LastOrder played for a concrete look at its games.

I want to break my discussion into 2 parts. Today I’ll go over general information, tomorrow I’ll work through technical stuff, the network input and output and training and so on.

The name LastOrder turns out to be an ingenious reference to the character Last Order from the A Certain Magical Index fictional universe, the final clone sister. The machine learning process produces a long string of similar models which go into combat for experimental purposes, and you keep the last one. Good name!

LastOrder divides its work into 2 halves, “macro” handled by the machine learning model and “micro” handled by the rule-based code derived from Overkill. It’s a broad distinction; in Steamhammer’s 4-level abstraction terms, I would say that “macro” more or less covers strategy and operations, while “micro” covers tactics and micro. The macro model has a set of actions to build stuff, research tech, and expand to a new base, plus a set of 18 attack actions: 3 different types of attack in each of 5 different places, and 3 “add army” actions which apparently assign units to the 3 types of attack. (The paper says 17 though it lists 18. It looks as though the mutalisk add army action is unused, maybe because mutas are added automatically.) There is also an action to do nothing.

The paper includes a table on the last page, results of a test tournament where each of the 28 AIIDE 2017 participants played 303 games against LastOrder. We get to see how LastOrder scored its advertised 83% win rate: #2 PurpleWave and #3 Iron (rankings from AIIDE 2017) won nearly all games, no doubt overwhelming the rule-based part of LastOrder so that the macro model could not help. Next Microwave scored just under 50%, XIMP scored about 32%, and all others performed worse, including #1 ZZZKBot at 1.64% win rate—9 bots scored under 2%. When LastOrder’s micro part is good enough, the macro part is highly effective.

In AIIDE 2018, #13 LastOrder scored 49%, ranking in the top half. The paper has a brief discussion on page 10. LastOrder was rolled by top finishers because the micro part could not keep up with #9 Iron and above (according to me) or #8 McRave and above (according to the authors, who know things I don’t). Learning can’t help if you’re too burned to learn. LastOrder was also put down by terrans Ecgberht and WillyT, whose play styles are not represented in the 2017 training group, which has only 4 terrans (one of which is Iron that LastOrder cannot touch).

In the discussion of future work (a mandatory part of an academic paper; the work is required to be unending), they talk briefly about how to fix the weaknesses that showed in AIIDE 2018. They mention improving the rule-based part and learning unit-level micro to address the too-burned-to-learn problem, and self-play training to address the limitations of the training opponents. Self-play is the right idea in principle, but it’s not easy. You have to play all 3 races and support all the behaviors you might face, and that’s only the starting point before making it work.

I’d like to suggest another simple idea for future work: Train each matchup separately. You lose generalization, but how much do production and attack decisions generalize between matchups? I could be wrong, but I think not much. Instead, a zerg player could train 3 models, ZvT ZvP and ZvZ, each of which takes fewer inputs and is solving an easier problem. A disadvantage is that protoss becomes relatively more difficult if you allow for mind control.

LastOrder has skills that I did not see in the games I watched. There is code for them, at least; whether it can carry out the skills successfully is a separate question. It can use hydralisks and lurkers. Most interestingly, it knows how to drop. The macro model includes an action to research drop (UpgradeOverlordLoad), an action to assign units and presumably load up for a drop (AirDropAddArmy), and actions to carry out drops in different places (AttackAirDropStart for the enemy starting base, AttackAirDropNatural, AttackAirDropOther1, AttackAirDropOther2, AttackAirDropOther3). The code to carry out drops is AirdropTactic.cpp; it seems to expect to drop either all zerglings, all hydralisks, or all lurkers, no mixed unit types. Does LastOrder use these skills at all? If anybody can point out a game, I’m interested.

Learning when to make hydras and lurkers should not be too hard. If LastOrder rarely or never uses hydras, it must be because it found another plan more effective—in games you make hydras first and then get the upgrades, so it’s easy to figure out. If it doesn’t use lurkers, maybe they didn’t help, or maybe it didn’t have any hydras around to morph after it tried researching the upgrade, because hydras were seen as useless. But still, it’s only 2 steps, it should be doable. Learning to drop is not as easy, though. To earn a reward, the agent has to select the upgrade action, the load action, and the drop action in order, each at a time when it makes sense. Doing only part of the sequence sets you back, and so does doing the whole sequence if you leave too much time between the steps, or drop straight into the enemy army, or make any severe mistake. You have to carry through accurately to get the payoff. It should be learnable, but it may take a long time and trainloads of data. I would be tempted to explicitly represent dependencies like this in some way or another, to tell the model up front the required order of events.
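
For instance (my own idea, not anything in LastOrder), the prerequisites could be written down explicitly and used to mask the actions offered to the model, so it never has to discover the ordering by trial and error:

```python
# My own sketch of making prerequisites explicit: an action is only offered to
# the model once its prerequisites have been completed. The action names follow
# LastOrder's, but the mechanism is hypothetical.

PREREQS = {
    "AirDropAddArmy": {"UpgradeOverlordLoad"},
    "AttackAirDropStart": {"AirDropAddArmy"},
    "AttackAirDropNatural": {"AirDropAddArmy"},
}

def legal_actions(all_actions, completed):
    """Return only the actions whose prerequisites are all completed."""
    return [a for a in all_actions if PREREQS.get(a, set()) <= completed]
```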

AIIDE 2018 - what McRave learned

McRave, like Microwave and no doubt most bots that follow more than one plan, plays different openings against different races. In each opponent’s learning file, it writes win/loss numbers for 15 strategies. Their names all start with “P” for protoss, but I have stripped out the P to make the table more readable. 4 of the strategies are unused: ZealotDrop, NZCore (sounds like no zealot core), Proxy99, and Proxy6. That leaves 11 active openings. The races they were used against are shown in the table. ZZCore (2 zealots before core) was played only against random.

#bottotal12Nexus1GateCorsair1GateRobo21Nexus2GateDragoon2GateExpand4GateDTExpandFFEZCoreZZCore
#1saida16-55  23%1-12 8%--7-17 29%1-12 8%--7-14 33%---
#2cherrypi15-88  15%-6-25 19%---6-25 19%2-20 9%-1-18 5%--
#3cse27-75  26%--7-19 27%--5-17 23%2-15 12%--13-24 35%-
#4bluebluesky29-74  28%--1-14 7%--2-15 12%7-18 28%--19-27 41%-
#5locutus46-56  45%--5-12 29%--15-15 50%14-15 48%--12-14 46%-
#6isamind54-49  52%--7-11 39%--4-10 29%15-14 52%--28-14 67%-
#7daqin60-43  58%--13-11 54%--4-9 31%8-10 44%--35-13 73%-
#9iron56-32  64%27-8 77%--2-7 22%18-9 67%--9-8 53%---
#10zzzkbot75-28  73%-8-7 53%---17-7 71%21-7 75%-29-7 81%--
#11steamhammer64-38  63%-9-9 50%---27-10 73%15-10 60%-13-9 59%--
#12microwave82-21  80%-0-5 0%---39-4 91%30-5 86%-13-7 65%--
#13lastorder97-6  94%-10-2 83%---17-1 94%10-2 83%-60-1 98%--
#14tyr91-10  90%--23-3 88%--7-5 58%31-1 97%--30-1 97%-
#15metabot49-46  52%--8-11 42%--16-12 57%23-14 62%--2-9 18%-
#16letabot77-15  84%12-5 71%--5-5 50%20-4 83%--40-1 98%---
#17arrakhammer102-1  99%------94-1 99%-8-0 100%--
#18ecgberht99-2  98%95-0 100%---3-1 75%--1-1 50%---
#19ualbertabot73-29  72%-----12-8 60%38-6 86%-7-7 50%-16-8 67%
#20ximp41-59  41%--8-14 36%--15-17 47%18-18 50%--0-10 0%-
#21cdbot103-0  100%------103-0 100%----
#22aiur80-21  79%--11-6 65%--13-6 68%41-3 93%--15-6 71%-
#23killall60-43  58%-3-9 25%---6-9 40%19-12 61%-32-13 71%--
#24willyt77-17  82%37-2 95%--3-6 33%23-4 85%--14-5 74%---
#25ailien86-17  83%-31-3 91%---20-5 80%5-6 45%-30-3 91%--
#26cunybot91-8  92%-26-1 96%---36-1 97%14-3 82%-15-3 83%--
#27hellbot103-0  100%---------103-0 100%-
overall-  68%172-27 86%93-61 60%83-101 45%17-35 33%65-30 68%261-176 60%510-180 74%71-29 71%208-68 75%257-118 69%16-8 67%

Unlike other bots that scored comparatively well against SAIDA—meaning they weren’t always wiped summarily from the map—McRave did not rely solely on cloaked units. The DTExpand opening scored best, but 21Nexus was nearly as successful. (McRave scored inconsistently against lower-ranked bots, though, as its author has commented.)

Every strategy came out with some good scores. But here is another analysis: Suppose the goal of the learning algorithm is to find the single most successful strategy (which is not always true—you might want to find the best mix to confuse the opponent’s learning). Leaving aside CDBot and HellBot, which McRave scored 100% against, against how many opponents was each opening the best choice? I made this table by hand, so there might be mistakes. I counted equal best as also best. The “versus” column tells which races the opening was used against.

opening | best | versus
12Nexus | 3 | T
1GateCorsair | 2 | Z
1GateRobo | 0 | P
21Nexus | 0 | T
2GateDragoon | 0 | T
2GateExpand | 6 | P, Z, R
4Gate | 5 | P, Z, R
DTExpand | 2 | T
FFE | 5 | Z, R
ZCore | 4 | P
ZZCore | 0 | R

The counts do not match up well with the overall winning rates. There were 4 never-best openings. This analysis does not say that they are bad openings that dragged down the score. Consider what would have happened if they had not been enabled: Their games would have been distributed among the other openings; there would have been some extra wins and some extra losses, and the ratio would depend on the distribution. 21Nexus was never best, but scored second best against SAIDA and contributed as many wins. On the other hand, the openings which were often best were definitely worth having; they were well-chosen for McRave versus this set of opponents. It could make sense to try those openings first, or more often. On the third hand, notice that the openings with the highest counts were played against the largest number of opponents. There were more bests to count! Openings versus terran scored 5 bests because there were 5 terran opponents.

Plenty of similar analyses could be done. For example, you could count how often or how widely an opening scored above/below the average for each opponent: Did it make a net contribution, or the opposite? It would be another way of seeing whether the openings were well chosen for the opponents they faced.
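
As a sketch of how the above/below-average analysis might go, assuming the learning files have been parsed into per-opponent win/loss counts per opening (that data layout is my assumption, not McRave’s format):

```python
# Sketch of the "above or below average" analysis: for each opponent, compare
# each opening's win rate against that opponent's overall win rate. Assumes
# results[opponent][opening] = (wins, losses) has already been parsed.

def contribution_report(results):
    report = {}
    for opponent, openings in results.items():
        games = sum(w + l for w, l in openings.values())
        wins = sum(w for w, _ in openings.values())
        overall = wins / games if games else 0.0
        report[opponent] = {
            opening: (w / (w + l)) - overall      # positive = pulled the score up
            for opening, (w, l) in openings.items() if w + l > 0
        }
    return report
```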

Next I want to start watching some replays. I think I will start with LastOrder, which did all its learning offline yet held its win rate steady against the onslaught of learning bots. I’m expecting it to be interesting and sophisticated in some way.

AIIDE 2018 - what UAlbertaBot learned

UAlbertaBot played random, and its openings are chosen, not according to the opponent’s race, but according to its own once the game starts. It has 3 protoss, 4 terran, and 4 zerg openings. Playing random gives the disadvantage of having about 1/3 as many games to figure out how to counter the opponent with each race. The countervailing advantage, of course, is that the opponent can’t predict what is coming its way.

103 rounds were played and UAlbertaBot does not deliberately drop data, so some of the totals add up to more than the 100 official rounds. UAlbertaBot also had 46 crashes, so some totals add up to less. For example, it recorded 96 games against LastOrder.

The official site doesn’t offer binaries for the bots which were carried over from last year, but this should be the 2017 version of UAlbertaBot. It has enemy-specific strategies configured for 13 opponents, of which 5 are also in this tournament: #9 Iron, #10 ZZZKBot, #16 LetaBot, #20 Ximp, and #22 Aiur. For ZZZKBot, only the protoss opening is set; for the others, all 3 races have openings set. Looking at the table, we see that UAlbertaBot did not always try all of its openings, and the blanks in the table do not always correspond to enemy-specific openings. Apparently in this UAlbertaBot version, the enemy-specific strategies act as hints rather than requirements: When available they are tried first, and when not, the default opening is tried first (ZealotRush, MarineRush, or ZerglingRush). If the first opening tried performs well enough, UAlbertaBot sticks with it.

#bottotalProtossTerranZerg
DTRushDragoonRushZealotRush4RaxMarinesMarineRushTankPushVultureRush2HatchHydra3HatchMuta3HatchScourgeZerglingRush
#1saida13-88  13%12-7 63%0-2 0%0-5 0%0-9 0%0-9 0%1-13 7%0-9 0%0-9 0%0-9 0%0-8 0%0-8 0%
#2cherrypi1-99  1%0-8 0%0-7 0%0-7 0%0-8 0%1-11 8%0-8 0%0-8 0%0-11 0%0-11 0%0-10 0%0-10 0%
#3cse2-99  2%0-7 0%2-14 12%0-7 0%0-11 0%0-10 0%0-10 0%0-10 0%0-8 0%0-8 0%0-7 0%0-7 0%
#4bluebluesky11-92  11%0-4 0%3-10 23%4-11 27%0-5 0%0-5 0%2-11 15%0-5 0%0-9 0%0-8 0%0-8 0%2-16 11%
#5locutus6-97  6%0-7 0%4-17 19%0-7 0%0-8 0%0-8 0%1-11 8%0-8 0%1-10 9%0-7 0%0-7 0%0-7 0%
#6isamind5-96  5%0-7 0%4-17 19%0-7 0%0-9 0%0-8 0%0-8 0%0-8 0%0-7 0%0-7 0%0-7 0%1-11 8%
#7daqin12-90  12%4-12 25%0-4 0%2-9 18%0-6 0%0-6 0%1-6 14%0-5 0%2-13 13%0-7 0%0-7 0%3-15 17%
#8mcrave29-71  29%5-12 29%1-6 14%0-5 0%0-3 0%10-13 43%1-5 17%0-3 0%2-6 25%0-3 0%0-3 0%10-12 45%
#9iron9-94  9%0-10 0%1-14 7%0-9 0%0-8 0%0-8 0%0-8 0%1-12 8%1-6 14%1-6 14%0-4 0%5-9 36%
#10zzzkbot13-87  13%0-3 0%0-3 0%13-20 39%0-9 0%0-9 0%0-9 0%0-9 0%0-7 0%0-6 0%0-6 0%0-6 0%
#11steamhammer11-92  11%0-5 0%0-5 0%8-19 30%1-10 9%0-6 0%0-6 0%0-6 0%0-7 0%0-7 0%0-7 0%2-14 12%
#12microwave20-81  20%--18-7 72%0-7 0%2-14 12%0-7 0%0-7 0%0-10 0%0-10 0%0-10 0%0-9 0%
#13lastorder4-92  4%0-6 0%0-6 0%2-12 14%2-10 17%0-5 0%0-5 0%0-5 0%0-11 0%0-11 0%0-11 0%0-10 0%
#14tyr36-61  37%5-12 29%0-4 0%0-5 0%0-2 0%3-4 43%13-7 65%1-2 33%13-15 46%0-3 0%0-3 0%1-4 20%
#15metabot35-56  38%4-5 44%6-5 55%2-4 33%1-6 14%3-9 25%1-6 14%0-3 0%0-2 0%6-3 67%3-3 50%9-10 47%
#16letabot48-44  52%11-14 44%0-3 0%2-6 25%0-2 0%1-4 20%0-2 0%4-7 36%30-6 83%---
#17arrakhammer56-41  58%--23-6 79%0-6 0%0-6 0%0-6 0%0-6 0%---33-11 75%
#18ecgberht40-56  42%9-7 56%9-8 53%1-4 20%0-2 0%0-5 0%0-2 0%6-7 46%0-3 0%0-3 0%0-3 0%15-12 56%
#20ximp38-56  40%0-2 0%7-7 50%4-5 44%0-4 0%0-4 0%9-19 32%1-6 14%--17-9 65%-
#21cdbot44-54  45%--23-4 85%0-2 0%19-15 56%0-2 0%0-2 0%0-6 0%1-9 10%0-5 0%1-9 10%
#22aiur57-45  56%35-1 97%--0-2 0%0-2 0%0-2 0%11-10 52%1-5 17%9-15 38%0-3 0%1-5 17%
#23killall73-27  73%--30-8 79%0-2 0%12-6 67%0-2 0%0-2 0%---31-7 82%
#24willyt36-55  40%3-12 20%1-8 11%0-5 0%0-4 0%0-5 0%0-4 0%10-11 48%---22-6 79%
#25ailien71-30  70%--18-11 62%16-10 62%2-4 33%0-2 0%0-2 0%---35-1 97%
#26cunybot75-15  83%--23-1 96%-30-7 81%-----22-7 76%
#27hellbot100-2  98%--33-0 100%-41-2 95%-----26-0 100%
overall-  33%88-141 38%38-140 21%206-184 53%20-145 12%124-185 40%29-161 15%34-153 18%50-151 25%17-133 11%20-121 14%219-206 52%

The DT rush caused surprising problems for SAIDA, but terran and zerg had nothing. Did playing random contribute? Does the updated current SAIDA, flame-hardened on SSCAIT, react better? The hand-chosen 2 hatch hydra also did strikingly well against LetaBot, not an obvious choice. Every opening had a plus score against some opponent, though VultureRush barely made it over. Looking across the bottom row, the default openings had the best overall results for each race—they were chosen correctly. Also, we can see that protoss was UAlbertaBot’s best race, and terran the worst; we already knew that, but here we see it in the numbers.

AIIDE 2018 - what Microwave learned

Microwave uses UCB and keeps its learning data in the same file format as UAlbertaBot, one file per opponent listing on each line an opening, a count of wins, and a count of losses. It’s a simple format that is also used outside the UAlbertaBot family. Microwave adds a twist: It does not allow the count of wins or the count of losses to exceed 10. I’m not sure what the exact update rule is without reading the code, but the effect is that only the more recent game results are remembered. It’s appropriate if the enemy is expected to be learning too, and to change its strategy rapidly so that Microwave has to keep adapting.
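
For the curious, here is roughly what UCB selection with capped counts looks like. The exact update rule in Microwave may differ, so treat this as a sketch of the idea rather than its code:

```python
import math

# Sketch of UCB1 opening selection with Microwave-style capped counts.
# stats maps opening -> [wins, losses]; the real update rule may differ.

CAP = 10

def record_result(stats, opening, won):
    wins, losses = stats[opening]
    if won:
        wins = min(wins + 1, CAP)
    else:
        losses = min(losses + 1, CAP)
    stats[opening] = [wins, losses]

def pick_opening(stats, c=1.4):
    total = sum(w + l for w, l in stats.values())
    def ucb(opening):
        w, l = stats[opening]
        n = w + l
        if n == 0:
            return float("inf")                  # try untested openings first
        return w / n + c * math.sqrt(math.log(total + 1) / n)
    return max(stats, key=ucb)
```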

Microwave plays different strategies against each race. Against Terran it has 7, against Protoss and Zerg 8, and against random 6. UAlbertaBot was the only random opponent. The strategies partly overlap. For example, 10Hatch9Pool9gas is played against both terran and protoss, while 9HatchMain8Pool8Gas is played only against zerg. The table has big blank spaces full of unplayed strategies. Maybe I should have sorted it by race, instead of by rank?

#bottotal10Hatch9Pool9gas12Pool3HatchPoolHydra5HatchGasHydra5Pool9HatchMain8Pool8Gas9Pool9PoolExpo9PoolHatch9PoolLurker9PoolSpeed9PoolSpeedLing9PoolSunkenOverpoolOverpoolSpeedZvT_12HatchHydraZvT_12HatchLurkerZvT_12HatchMutaZvZ_Overpool11Gas
#1saida0-70  0%0-10 0%---0-10 0%-0-10 0%--0-10 0%-----0-10 0%0-10 0%0-10 0%-
#2cherrypi0-80  0%-0-10 0%--0-10 0%0-10 0%--0-10 0%-0-10 0%-0-10 0%-0-10 0%---0-10 0%
#3cse0-80  0%0-10 0%-0-10 0%0-10 0%0-10 0%------0-10 0%---0-10 0%0-10 0%0-10 0%-
#4bluebluesky0-80  0%0-10 0%-0-10 0%0-10 0%0-10 0%------0-10 0%---0-10 0%0-10 0%0-10 0%-
#5locutus1-80  1%1-10 9%-0-10 0%0-10 0%0-10 0%------0-10 0%---0-10 0%0-10 0%0-10 0%-
#6isamind0-80  0%0-10 0%-0-10 0%0-10 0%0-10 0%------0-10 0%---0-10 0%0-10 0%0-10 0%-
#7daqin0-80  0%0-10 0%-0-10 0%0-10 0%0-10 0%------0-10 0%---0-10 0%0-10 0%0-10 0%-
#8mcrave7-68  9%1-10 9%-1-10 9%0-5 0%1-8 11%------1-10 9%---1-10 9%0-5 0%2-10 17%-
#9iron0-70  0%0-10 0%---0-10 0%-0-10 0%--0-10 0%-----0-10 0%0-10 0%0-10 0%-
#10zzzkbot24-37  39%-5-8 38%--0-2 0%9-10 47%--9-10 47%-0-1 0%-0-1 0%-0-1 0%---1-4 20%
#11steamhammer57-15  79%-10-2 83%--6-7 46%1-2 33%--10-2 83%-10-0 100%-0-1 0%-10-1 91%---10-0 100%
#13lastorder24-21  53%-0-1 0%--10-2 83%0-1 0%--2-4 33%-0-1 0%-10-6 62%-1-3 25%---1-3 25%
#14tyr15-13  54%2-3 40%-0-1 0%3-4 43%10-1 91%------0-1 0%---0-1 0%0-1 0%0-1 0%-
#15metabot41-13  76%10-2 83%-8-3 73%0-1 0%0-1 0%------10-1 91%---1-2 33%2-3 40%10-0 100%-
#16letabot26-18  59%4-5 44%---1-2 33%-10-0 100%--8-5 62%-----0-1 0%0-1 0%3-4 43%-
#17arrakhammer27-22  55%-7-8 47%--10-0 100%0-1 0%--0-1 0%-3-4 43%-5-4 56%-2-3 40%---0-1 0%
#18ecgberht38-18  68%0-1 0%---10-0 100%-0-1 0%--1-2 33%-----10-7 59%10-0 100%7-7 50%-
#19ualbertabot50-10  83%----10-1 91%-0-1 0%10-0 100%10-4 71%---10-4 71%10-0 100%-----
#20ximp27-15  64%2-3 40%-0-1 0%0-1 0%0-1 0%------10-0 100%---5-6 45%0-1 0%10-2 83%-
#21cdbot46-13  78%-10-0 100%--0-1 0%1-2 33%--4-5 44%-10-3 77%-1-2 33%-10-0 100%---10-0 100%
#22aiur48-15  76%1-2 33%-10-1 91%7-5 58%0-1 0%------9-3 75%---1-2 33%10-1 91%10-0 100%-
#23killall40-5  89%-10-0 100%--0-1 0%10-0 100%--0-1 0%-10-0 100%-10-1 91%-0-1 0%---0-1 0%
#24willyt34-10  77%4-5 44%---0-1 0%-0-1 0%--0-1 0%-----10-2 83%10-0 100%10-0 100%-
#25ailien28-32  47%-9-10 47%--1-4 20%0-1 0%--3-6 33%-0-1 0%-10-2 83%-5-7 42%---0-1 0%
#26cunybot67-1  99%-10-0 100%--10-0 100%0-1 0%--10-0 100%-10-0 100%-7-0 100%-10-0 100%---10-0 100%
#27hellbot74-0  100%10-0 100%-10-0 100%6-0 100%10-0 100%------8-0 100%---10-0 100%10-0 100%10-0 100%-
overall-  42%35-101 26%61-39 61%29-66 31%16-66 20%79-113 41%21-28 43%10-23 30%10-0 100%48-43 53%9-28 24%43-20 68%38-65 37%53-31 63%10-0 100%38-26 59%38-101 27%42-82 34%62-94 40%32-20 62%

The total column tells how successful Microwave was in recent games against each opponent. You might want to compare the percentages against the overall win rates from the official crosstable; they sometimes vary curiously. When the recorded results were less successful than the total results, it suggests that Microwave may have forgotten too much (though it could be random fluctuation). For example, Microwave scored 80% against LetaBot overall, but 59% in the recent games in this table.

The overall row tells how successful each opening was in recent games. Every opening was successful against some opponents, so there were no useless strategies. The body of the table, from #10 ZZZKBot and down, is full of strong contrasts, meaning that there was a big difference between the successful and unsuccessful openings against each opponent. That suggests that learning must have been useful.

AIIDE 2018 - what Locutus learned

The Locutusoids have learning data only slightly different from Steamhammer’s. I have run my summarizer code for CSE, BlueBlueSky, Locutus, and ISAMind, skipping DaQin because it recorded only 1 game per opponent (which tickles a bug in my code). I am thinking of posting only the Locutus results, because the others don’t hold much extra interest. Locutus plays a wider range of openings than the others (perhaps because newer bots have to restrict their scope). CSE in particular is more in the do-one-thing-well camp. Besides, all of them had high win rates against lower-ranked opponents; they did not have much to learn. I don’t see a point in piling up data about similar players.

But if people want, I can post them all. Any requests?

Locutus is the only Locutusoid to use pre-learned data. Some of the others had their own ways of preparing for known opponents. For example, CSE is configured with several enemy-specific strategies, such as DT drop against #9 Iron.

Here is a summary of the pre-learned data used by Locutus. Locutus is configured to retain at most 200 game records per opponent, so that’s as much pre-learned data as it makes sense to give it. When you give it that much, each tournament game record added at the end causes one pre-learned record to scroll off the beginning. At the end of a 100 round tournament, half the game records are retained from the pre-learned data and half are tournament games—the pre-learned data more or less dominated tournament data for decisions during the tournament.

# | opponent | games | wins
7 | DaQin | 35 | 91%
9 | Iron | 200 | 93%
10 | ZZZKBot | 200 | 76%
14 | Tyr | 200 | 96%
17 | Arrakhammer | 200 | 88%
19 | UAlbertaBot | 71 | 100%
22 | AIUR | 51 | 96%
25 | AILien | 200 | 96%
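
Mechanically, the 200-record cap is just a sliding window. A minimal sketch of the effect (file format details aside):

```python
from collections import deque

# Sketch of the sliding-window effect: seed the per-opponent history with
# pre-learned game records, cap it at 200, and each new tournament game pushes
# the oldest record out. Details of the real file format are omitted.

def make_history(prelearned_records, max_records=200):
    """Seed the history with pre-learned games, capped at max_records."""
    return deque(prelearned_records, maxlen=max_records)

def record_game(history, record):
    history.append(record)       # when full, the oldest record scrolls off
```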


Here is the final data. For every opponent that has pre-learned data, much or all of the pre-learned data is retained until the end.

#1 saida

openinggameswins
10-15GateGoon220%
10Gate25NexusFE297%
DTDrop326%
Proxy4GateGoon70%
Proxy4GateGoon2p30%
Proxy9-9Gate100%
6 openings1034%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Not fast rush10299%4%10299%3%99%0%
Proxy0%0%11%100%0%0%
Unknown11%0%0%0%0%0%


Locutus and the Locutusoids use “Not fast rush” as a catch-all: The enemy’s opening is not a fast rush, and it is not more precisely recognized than that.

#2 cherrypi

openinggameswins
ForgeExpand4Gate2Archon1916%
ForgeExpand5GateGoon555%
ForgeExpandSpeedlots166%
ProxyHeavyZealotRush617%
ProxyHeavyZealotRush2p757%
5 openings10312%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Heavy rush1313%23%3534%20%23%0%
Not fast rush8986%10%6866%7%64%0%
Unknown11%0%0%0%0%0%


Why are the successful proxy openings so little played? The “2p” version is played only on 2-player maps; the other version only on 3- and 4-player maps. Looking into the file by hand, I see that they were both successful from early in the tournament, so it’s not a matter of discovering them late. Perhaps the map size specialization interferes with the learning process? Perhaps they are deliberately little played to prevent the opponent from adapting? Have to read the code for this one. The proxy openings show similar numbers across other opponents, so it's not a one-off. Locutus’s learning in general does not look like it concentrates hard on playing the best-performing openings.

#3 cse

openinggameswins
2GateDTExpo30%
2GateDTRush2438%
4GateGoon4630%
Proxy4GateGoon450%
Proxy4GateGoon2p862%
Proxy9-9Gate60%
ProxyHeavyZealotRush40%
ProxyHeavyZealotRush2p250%
Turtle650%
9 openings10333%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Dark templar1010%40%2827%43%10%0%
Fast rush0%0%66%0%0%0%
Heavy rush0%0%33%100%0%0%
Not fast rush9289%33%6664%29%63%0%
Unknown11%0%0%0%0%0%

#4 bluebluesky

openinggameswins
2GateDTExpo1331%
2GateDTRush743%
4GateGoon5843%
9-9GateDefensive30%
Proxy4GateGoon1100%
Proxy4GateGoon2p2100%
Proxy9-9Gate20%
ProxyHeavyZealotRush20%
ProxyHeavyZealotRush2p10%
Turtle1429%
10 openings10338%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Dark templar6058%32%5553%31%77%0%
Not fast rush3938%51%4544%49%82%0%
Proxy33%0%33%0%67%0%
Unknown11%0%0%0%0%0%

#6 isamind

openinggameswins
2GateDTRush1771%
4GateGoon6058%
9-9GateDefensive633%
Proxy4GateGoon2100%
Proxy4GateGoon2p367%
Proxy9-9Gate10%
ProxyHeavyZealotRush20%
ProxyHeavyZealotRush2p10%
Turtle1155%
9 openings10357%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Dark templar0%0%11%100%0%0%
Fast rush55%60%77%100%20%0%
Heavy rush1313%54%77%71%15%0%
Not fast rush7876%59%8583%51%85%0%
Proxy66%33%33%100%0%0%
Unknown11%100%0%0%0%0%

#7 daqin

openinggameswins
2GateDTExpo4100%
2GateDTRush25100%
4GateGoon4498%
9-9GateDefensive1968%
Proxy4GateGoon683%
Proxy4GateGoon2p1100%
Proxy9-9Gate475%
ProxyHeavyZealotRush2100%
ProxyHeavyZealotRush2p1100%
Turtle3238%
10 openings13879%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Fast rush5137%49%4130%78%31%0%
Not fast rush8662%97%9770%79%71%0%
Unknown11%100%0%0%0%0%


Locutus scored lower versus DaQin in the tournament than in the pre-learning data. It may mean that DaQin was updated in private before the tournament. You have to expect that; I assume it is why there were only 35 games in the pre-learning data.

#8 mcrave

openinggameswins
2GateDTExpo10%
2GateDTRush2767%
4GateGoon4955%
9-9GateDefensive633%
Proxy4GateGoon367%
Proxy4GateGoon2p367%
Proxy9-9Gate10%
ProxyHeavyZealotRush450%
ProxyHeavyZealotRush2p10%
Turtle825%
10 openings10353%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Dark templar22%50%22%0%0%0%
Fast rush1313%31%1212%25%8%0%
Heavy rush1515%40%66%83%7%0%
Not fast rush7270%61%8381%57%81%0%
Unknown11%0%0%0%0%0%

#9 iron

openinggameswins
10-15GateGoon580%
10Gate25NexusFE10591%
DTDrop8991%
Proxy4GateGoon1100%
4 openings20091%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Not fast rush15276%91%7437%97%39%14%
Unknown10%100%2211%91%0%0%
Wall-in4724%91%10452%87%70%0%

#10 zzzkbot

openinggameswins
ForgeExpand4Gate2Archon786%
ForgeExpand5GateGoon9794%
ForgeExpandSpeedlots8695%
ProxyHeavyZealotRush580%
ProxyHeavyZealotRush2p540%
5 openings20092%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Fast rush6332%95%10754%91%54%0%
Heavy rush8140%90%7437%93%40%0%
Not fast rush5628%93%1910%100%9%0%

#11 steamhammer

openinggameswins
ForgeExpand4Gate2Archon1100%
ForgeExpand5GateGoon10296%
2 openings10396%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Fast rush22%100%77%100%0%0%
Heavy rush3736%100%2221%100%19%0%
Hydra bust66%67%1414%93%17%0%
Not fast rush5755%96%6058%95%61%0%
Unknown11%100%0%0%0%0%

#12 microwave

openinggameswins
ForgeExpand4Gate2Archon5100%
ForgeExpand5GateGoon8394%
ForgeExpandSpeedlots1593%
3 openings10394%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Fast rush22%100%1212%100%0%0%
Heavy rush3837%95%2322%100%21%0%
Hydra bust1817%94%1616%81%11%0%
Not fast rush4443%93%5250%94%43%0%
Unknown11%100%0%0%0%0%

#13 lastorder

openinggameswins
ForgeExpand5GateGoon10398%
1 openings10398%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Heavy rush4948%100%5856%97%55%0%
Not fast rush5351%96%4544%100%43%0%
Unknown11%100%0%0%0%0%

#14 tyr

openinggameswins
12Nexus5ZealotFECannons57100%
2GateDTExpo250%
4GateGoon103100%
9-9GateDefensive667%
Proxy9-9Gate333%
ProxyHeavyZealotRush10%
Turtle2889%
7 openings20096%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Fast rush2110%86%10%100%0%0%
Heavy rush8944%100%189%89%10%0%
Not fast rush8040%95%15075%97%54%38%
Proxy63%67%10%100%0%0%
Unknown42%100%3015%90%0%0%

#15 metabot

openinggameswins
2GateDTRush35100%
4GateGoon4789%
ProxyHeavyZealotRush2100%
Turtle14100%
4 openings9895%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Dark templar1717%88%5051%90%71%0%
Fast rush1010%100%11%100%0%0%
Heavy rush22%100%77%100%50%0%
Not fast rush6869%96%4041%100%49%0%
Unknown11%100%0%0%0%0%

#16 letabot

openinggameswins
10-15GateGoon10%
10Gate25NexusFE250%
4GateGoon475%
DTDrop9696%
4 openings10393%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Heavy rush44%75%11%100%0%0%
Not fast rush4039%98%1010%90%10%0%
Unknown22%50%0%0%0%0%
Wall-in5755%93%9289%93%89%0%

#17 arrakhammer

openinggameswins
ForgeExpand4Gate2Archon1369%
ForgeExpand5GateGoon14698%
ForgeExpandSpeedlots2580%
ProxyHeavyZealotRush1155%
ProxyHeavyZealotRush2p560%
5 openings20090%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Fast rush3718%92%2814%100%3%0%
Heavy rush8241%88%9648%89%46%0%
Naked expand126%92%63%83%25%8%
Not fast rush6934%93%6934%90%38%0%
Unknown0%0%10%100%0%0%

#18 ecgberht

openinggameswins
4GateGoon53100%
DTDrop50100%
2 openings103100%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Heavy rush5351%100%8885%100%81%0%
Not fast rush4342%100%1515%100%9%0%
Unknown77%100%0%0%0%0%

#19 ualbertabot

openinggameswins
4GateGoon63100%
9-9GateDefensive5100%
ForgeExpand5GateGoon9493%
Proxy9-9Gate12100%
4 openings17496%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Dark templar63%100%63%100%17%0%
Fast rush3420%88%2011%100%18%0%
Heavy rush5532%96%3721%100%31%9%
Hydra bust106%100%95%89%30%0%
Not fast rush6839%99%9253%93%46%6%
Proxy0%0%11%100%0%0%
Unknown11%100%95%100%0%0%

#20 ximp

openinggameswins
2GateDTRush250%
4GateGoon10195%
2 openings10394%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Not fast rush5351%96%103100%94%100%0%
Unknown5049%92%0%0%0%0%

#21 cdbot

openinggameswins
9-9GateDefensive1100%
ForgeExpand5GateGoon102100%
2 openings103100%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Fast rush55%100%1010%100%0%0%
Heavy rush4342%100%3635%100%40%5%
Hydra bust0%0%22%100%0%0%
Not fast rush5351%100%4645%100%43%8%
Proxy11%100%33%100%0%0%
Unknown11%100%66%100%0%0%

#22 aiur

openinggameswins
10-15GateGoon367%
12Nexus5ZealotFE5100%
2GateDTExpo1100%
2GateDTRush4100%
4GateGoon11496%
Proxy4GateGoon3100%
Proxy9-9Gate683%
ProxyHeavyZealotRush3100%
ProxyHeavyZealotRush2p1100%
Turtle1493%
10 openings15495%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Dark templar3019%97%3120%94%33%0%
Heavy rush3925%92%5334%98%28%0%
Naked expand138%85%32%67%23%38%
Not fast rush7247%97%5536%93%44%1%
Proxy0%0%64%100%0%0%
Unknown0%0%64%100%0%0%

#23 killall

openinggameswins
ForgeExpand5GateGoon10398%
1 openings10398%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Fast rush33%100%88%100%0%0%
Heavy rush4544%98%3837%97%22%0%
Hydra bust0%0%11%100%0%0%
Not fast rush5452%98%5654%98%41%0%
Unknown11%100%0%0%0%0%

#24 willyt

openinggameswins
10-15GateGoon8100%
10Gate25NexusFE7100%
4GateGoon64100%
DTDrop21100%
Turtle3100%
5 openings103100%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Heavy rush6765%100%6462%100%69%0%
Not fast rush3534%100%3635%100%46%0%
Proxy0%0%33%100%0%0%
Unknown11%100%0%0%0%0%

#25 ailien

openinggameswins
ForgeExpand4Gate2Archon2496%
ForgeExpand5GateGoon3397%
ForgeExpandSpeedlots12898%
ProxyHeavyZealotRush1283%
ProxyHeavyZealotRush2p3100%
5 openings20097%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Heavy rush13266%98%10150%96%57%2%
Naked expand0%0%21%100%0%0%
Not fast rush6834%96%9548%98%62%0%
Unknown0%0%21%100%0%0%

#26 cunybot

openinggameswins
ForgeExpand5GateGoon93100%
1 openings93100%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Fast rush11%100%22%100%0%0%
Heavy rush4447%100%2325%100%25%2%
Not fast rush4751%100%6570%100%72%4%
Unknown11%100%33%100%0%0%

#27 hellbot

openinggameswins
2GateDTRush20100%
4GateGoon83100%
2 openings103100%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Not fast rush4948%100%103100%100%100%0%
Unknown5452%100%0%0%0%0%

overall

totalPvTPvPPvZPvR
openinggameswinsgameswinsgameswinsgameswinsgameswins
10-15GateGoon3936% 3633% 367%
10Gate25NexusFE14374% 14374%
12Nexus5ZealotFE5100% 5100%
12Nexus5ZealotFECannons57100% 57100%
2GateDTExpo2442% 2442%
2GateDTRush16179% 16179%
4GateGoon88985% 12199% 70582% 63100%
9-9GateDefensive4659% 4052% 1100% 5100%
DTDrop28885% 28885%
ForgeExpand4Gate2Archon6968% 6968%
ForgeExpand5GateGoon101192% 91792% 9493%
ForgeExpandSpeedlots27090% 27090%
Proxy4GateGoon2759% 812% 1979%
Proxy4GateGoon2p2060% 30% 1771%
Proxy9-9Gate4547% 100% 2339% 12100%
ProxyHeavyZealotRush5456% 2045% 3462%
ProxyHeavyZealotRush2p2756% 743% 2060%
Turtle13063% 3100% 12762%
total330583%61280%120877%131189%17496%
openings played1881364

AIIDE 2018 - what Steamhammer learned

In CIG, Steamhammer was broken. My findings on what Steamhammer learned in CIG 2018 are not valid, because Steamhammer rarely played the opening it thought it was playing; it played a broken version of the opening that left out drones and buildings. That is likely why the zergling rushes were successful in CIG: There was little in the build to leave out, so the build played more nearly as written. In this tournament, Steamhammer seems to have been working fine (though we’ll see when the replays come out)—well, working fine except for the usual bugs, some of which are fixed in version 2.1. Also, Steamhammer’s learning was revamped to better bamboozle opponents that tried to learn its patterns; the result is that its learning behavior is richer. I think these tables are full of interesting data.

103 rounds were played, of which 100 were official. Steamhammer is set to record at most 100 game records per opponent, so games from the first 3 rounds may have been dropped. That’s why the numbers don’t exactly match the official crosstable, even though the game totals look correct.

Steamhammer’s game records contain much more information than I can summarize in tidy little tables. This time I captured a little more of it, adding a table about the plan recognizer. For each plan that was recognized during a game, the table shows how often the plan was predicted before the game, how often it was recognized during the game, and the win rate in each of those cases. It also tries to measure the accuracy of the prediction. The plan recognizer itself is not very accurate; it often fails to recognize what is in front of it, calling the plan Unknown. The “?” column shows how often the plan was predicted and then no plan was recognized. The plan recognizer can also blow it completely and recognize the wrong plan. When the opponent plays predictably, the plan predictor is generally more accurate than the plan recognizer. When the opponent plays unpredictably, I don’t know which is more accurate! Either way, the plan prediction is more important early in the tournament; once Steamhammer has accumulated enough experience, it pays more attention to its learning data, and it doesn’t matter whether the predicted plan is good.
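
The tallying behind these tables amounts to roughly this (a simplified sketch, assuming each game record carries the predicted plan, the plan recognized during the game, and the result):

```python
from collections import Counter, defaultdict

# Simplified sketch of the tallying behind the plan tables. Assumes each game
# record carries the predicted plan, the recognized plan, and whether the game
# was won; the real game records hold much more than this.

def plan_summary(records):
    predicted = defaultdict(lambda: [0, 0])     # plan -> [games predicted, wins]
    recognized = defaultdict(lambda: [0, 0])    # plan -> [games recognized, wins]
    good = Counter()                            # prediction matched recognition
    unrecognized = Counter()                    # predicted, but nothing recognized
    for rec in records:
        predicted[rec["predicted"]][0] += 1
        predicted[rec["predicted"]][1] += rec["won"]
        recognized[rec["recognized"]][0] += 1
        recognized[rec["recognized"]][1] += rec["won"]
        if rec["recognized"] == rec["predicted"]:
            good[rec["predicted"]] += 1
        elif rec["recognized"] == "Unknown":
            unrecognized[rec["predicted"]] += 1
    return predicted, recognized, good, unrecognized
```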

#1 saida

openinggameswins
11Gas10PoolLurker30%
11Gas10PoolMuta10%
11HatchTurtleHydra10%
2HatchHydraBust10%
3HatchHydraExpo10%
3HatchLurker10%
4HatchBeforeGas20%
4PoolHard30%
5PoolHard10%
5PoolSoft10%
6Pool10%
7PoolSoft20%
9Hatch8Pool20%
9HatchExpo9Pool9Gas10%
9Pool10%
9PoolExpo10%
9PoolLurker812%
9PoolSpeedAllIn10%
9PoolSunkSpeed10%
AntiFact_13Pool80%
AntiFact_2Hatch120%
AntiFactory160%
AntiZeal_12Hatch20%
Over10Hatch2SunkHard10%
OverhatchLateGas10%
Overpool+110%
OverpoolHatch10%
PurpleSwarmBuild10%
Sparkle 2HatchMuta20%
ZvP_3HatchPoolHydra10%
ZvT_12PoolMuta20%
ZvT_2HatchMuta10%
ZvT_3HatchMuta10%
ZvZ_12PoolLing10%
ZvZ_Overgas9Pool10%
ZvZ_Overpool11Gas1315%
ZvZ_Overpool9Gas10%
ZvZ_OverpoolTurtle10%
38 openings1003%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Factory100100%3%9191%3%91%2%
Naked expand0%0%77%0%0%0%
Unknown0%0%22%0%0%0%


SAIDA is a good example of how Steamhammer reacts to a predictable opponent. First, it repeatedly tried its counters to the opponent’s Factory plan, the 3 “AntiFact” openings (you may call them fake news openings if you like). In this case the counters did not work; SAIDA is too strong. Then it explored more widely. Steamhammer scored 1 win with a fast lurker opening, and repeated the opening to no avail (maybe Steamhammer got lucky once, or maybe SAIDA learned the timing). It also scored a win with a ZvZ fast mutalisk opening, and repeating that did bring a second win for a total of 3 in 100 rounds. The smaller second table shows that the plan predictor was 100% accurate over the last 100 rounds in predicting SAIDA’s factory-first play, while the plan recognizer was 91% accurate and actually saw a command center first in 7 games.

#2 cherrypi

openinggameswins
2.5HatchMuta10%
3HatchPoolMuta10%
4HatchBeforeGas10%
4PoolSoft10%
6PoolSpeed20%
7PoolHard10%
8Hatch7Pool10%
9Hatch8Pool10%
9PoolSunkSpeed10%
OverhatchLing10%
OverhatchMuta10%
OverpoolSpeed10%
OverpoolSunk10%
ZvP_2HatchMuta10%
ZvP_3BaseSpire+Den10%
ZvT_12PoolMuta10%
ZvT_3HatchMuta10%
ZvT_3HatchMutaExpo10%
ZvZ_12HatchMain2114%
ZvZ_12PoolLing10%
ZvZ_12PoolMain30%
ZvZ_Overgas9Pool10%
ZvZ_Overpool9Gas3030%
ZvZ_OverpoolTurtle2532%
24 openings10020%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Fast rush2222%14%11%0%0%100%
Heavy rush7777%22%2828%25%35%61%
Naked expand11%0%22%0%0%0%
Unknown0%0%6969%19%0%0%


Steamhammer sees CherryPi as a strategy switcher. I suspect that CherryPi did not actually play any fast zergling rushes, because they said they avoided risky openings, but I can’t be sure without a closer look. In any case, Steamhammer found answers and scored a respectable 20% against a much higher ranked opponent.

#3 cse

openinggameswins
11Gas10PoolLurker10%
11Gas10PoolMuta1020%
11HatchTurtleHydra20%
11HatchTurtleLurker10%
12HatchTurtle10%
2.5HatchMuta10%
2HatchHydra10%
2HatchHydraBust50%
2HatchLurkerAllIn10%
3HatchHydraBust90%
3HatchHydraExpo10%
3HatchLingBust30%
3HatchLingExpo10%
3HatchLurker20%
3HatchPoolMuta10%
4HatchBeforeGas60%
4PoolHard20%
5PoolHard2Player20%
5PoolSoft10%
7PoolHard20%
7PoolSoft10%
8Pool30%
9HatchExpo9Pool9Gas10%
9PoolExpo10%
9PoolHatch10%
9PoolSpeedAllIn20%
9PoolSpire20%
AntiFact_2Hatch10%
AntiZeal_12Hatch10%
Over10Hatch2SunkHard10%
Over10HatchBust20%
Over10HatchSlowLings20%
OverhatchExpoLing30%
OverhatchExpoMuta10%
OverhatchMuta10%
Overpool+110%
OverpoolHydra10%
OverpoolLurker10%
OverpoolSpeed20%
PurpleSwarmBuild10%
Sparkle 1HatchMuta10%
ZvP_2HatchMuta50%
ZvP_3BaseSpire+Den30%
ZvP_3HatchPoolHydra40%
ZvP_4HatchPoolHydra10%
ZvZ_12Pool20%
ZvZ_Overpool11Gas10%
ZvZ_Overpool9Gas10%
48 openings1002%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Heavy rush0%0%44%0%0%0%
Safe expand1919%0%3333%0%32%5%
Turtle8181%2%6060%3%60%2%
Unknown0%0%33%0%0%0%


Steamhammer has trouble telling the difference between Safe Expand (in the protoss case, forge expand with cannons) and Turtle (hide behind cannons), because it does not scout well enough to see the natural nexus reliably. It compensates by reacting similarly in both cases. But the opponent is still seen as an unpredictable strategy switcher, so Steamhammer switches up its openings too. In this case it has more counter openings and tries each fewer times, so they are not as obvious in the table, but they do have higher counts: See 2HatchHydraBust, 3HatchHydraBust, 3HatchLingBust, 4HatchBeforeGas, ZvP_2HatchMuta, and ZvP_3BaseSpire+Den. As against SAIDA, Steamhammer scored 2 wins with a ZvZ fast mutalisk opening. I have an idea to add another exploration phase which experiments with all-in attacks like the fast mutas.

#4 bluebluesky

openinggameswins
11Gas10PoolLurker20%
11Gas10PoolMuta10%
11HatchTurtleHydra20%
2.5HatchMuta10%
2HatchHydraBust50%
2HatchLurker10%
2HatchLurkerAllIn10%
3HatchHydraBust10%
3HatchLingBust10%
3HatchLingExpo10%
4HatchBeforeGas30%
4PoolSoft10%
5PoolHard10%
7PoolHard1010%
8Pool10%
9HatchExpo9Pool9Gas1811%
9HatchMain9Pool9Gas10%
9PoolSpeed30%
9PoolSpeedAllIn30%
AntiFact_2Hatch10%
Over10Hatch20%
Over10Hatch1Sunk10%
Over10Hatch2Sunk20%
Over10Hatch2SunkHard10%
OverhatchExpoLing20%
Overpool+110%
OverpoolHatch10%
OverpoolHydra10%
OverpoolSpeed10%
OverpoolTurtle10%
PurpleSwarmBuild10%
Sparkle 1HatchMuta10%
Sparkle 2HatchMuta10%
Sparkle 3HatchMuta10%
ZvP_2HatchMuta40%
ZvP_3BaseSpire+Den70%
ZvP_3HatchPoolHydra60%
ZvT_13Pool10%
ZvZ_Overgas11Pool10%
ZvZ_Overgas9Pool30%
ZvZ_Overpool11Gas20%
ZvZ_Overpool9Gas10%
42 openings1003%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Heavy rush77%0%2020%5%29%0%
Naked expand0%0%11%100%0%0%
Safe expand5353%2%4545%0%58%2%
Turtle4040%5%3333%3%45%0%
Unknown0%0%11%0%0%0%


Different all-ins took a few wins from BlueBlueSky.

#5 locutus

openinggameswins
11Gas10PoolLurker20%
11HatchTurtleLurker10%
12HatchTurtle10%
2HatchHydra10%
2HatchHydraBust50%
2HatchLurker20%
2HatchLurkerAllIn20%
3HatchHydra10%
3HatchHydraBust30%
3HatchHydraExpo10%
3HatchLingBust2512%
3HatchLingExpo20%
4PoolSoft10%
5PoolHard20%
6PoolSpeed10%
8Hatch7Pool10%
8Pool10%
9HatchExpo9Pool9Gas10%
9HatchMain9Pool9Gas10%
9PoolSpeed10%
9PoolSpeedAllIn10%
AntiFact_13Pool10%
AntiFact_2Hatch10%
AntiFactory10%
AntiZeal_12Hatch10%
Over10Hatch10%
Over10Hatch2SunkHard10%
OverhatchExpoMuta20%
OverhatchLateGas10%
OverpoolHydra10%
OverpoolSpeed10%
OverpoolSunk10%
OverpoolTurtle10%
PurpleSwarmBuild20%
Sparkle 2HatchMuta10%
Sparkle 3HatchMuta10%
ZvP_2HatchMuta50%
ZvP_3BaseSpire+Den40%
ZvP_3HatchPoolHydra50%
ZvP_Overpool3Hatch10%
ZvT_12PoolMuta40%
ZvT_13Pool10%
ZvT_2HatchMuta10%
ZvT_3HatchMuta10%
ZvZ_12Pool10%
ZvZ_12PoolLing10%
ZvZ_12PoolMain10%
ZvZ_Overgas9Pool10%
ZvZ_Overpool9Gas10%
49 openings1003%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Heavy rush0%0%44%25%0%0%
Safe expand6262%3%5555%0%60%0%
Turtle3838%3%4141%5%50%0%

#6 isamind

openinggameswins
11Gas10PoolLurker10%
11Gas10PoolMuta10%
2.5HatchMuta10%
2HatchHydra10%
2HatchHydraBust60%
2HatchLurker10%
3HatchHydra10%
3HatchHydraBust50%
3HatchLingBust50%
4HatchBeforeGas30%
4PoolHard10%
4PoolSoft20%
5PoolHard2Player10%
5PoolSoft10%
7PoolHard1118%
7PoolMid10%
7PoolSoft10%
8Hatch7Pool10%
8Pool10%
9HatchExpo9Pool9Gas30%
9HatchMain9Pool9Gas10%
9PoolSpeed10%
9PoolSunkHatch10%
AntiFact_13Pool10%
AntiZeal_12Hatch10%
Over10Hatch10%
Over10Hatch1Sunk20%
Over10Hatch2Sunk10%
Over10Hatch2SunkHard10%
Over10HatchSlowLings10%
OverhatchExpoLing30%
OverpoolHatch812%
OverpoolHydra10%
OverpoolLurker20%
OverpoolSpeed20%
PurpleSwarmBuild10%
ZvP_2HatchMuta20%
ZvP_3BaseSpire+Den40%
ZvP_3HatchPoolHydra617%
ZvP_Overpool3Hatch30%
ZvT_2HatchMuta40%
ZvT_3HatchMutaExpo10%
ZvZ_12HatchMain10%
ZvZ_12PoolMain10%
ZvZ_Overpool11Gas10%
ZvZ_OverpoolTurtle10%
46 openings1004%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Heavy rush1717%12%1414%14%65%6%
Proxy22%0%22%0%0%0%
Safe expand6262%3%4747%2%47%5%
Turtle1919%0%3333%3%26%0%
Unknown0%0%44%0%0%0%

#7 daqin

openinggameswins
11Gas10PoolMuta812%
2HatchHydra20%
2HatchHydraBust50%
2HatchLurkerAllIn50%
3HatchHydra20%
3HatchHydraBust30%
3HatchHydraExpo20%
3HatchLing10%
3HatchLingBust40%
3HatchLingExpo10%
4HatchBeforeGas40%
4PoolSoft10%
5PoolHard2Player20%
6PoolSpeed30%
8Hatch7Pool10%
9HatchExpo9Pool9Gas10%
9PoolHatch20%
9PoolSpeedAllIn30%
9PoolSpire10%
9PoolSunkHatch30%
9PoolSunkSpeed20%
AntiFact_13Pool10%
AntiFact_2Hatch20%
AntiZeal_12Hatch10%
Over10Hatch1Sunk20%
Over10Hatch2Sunk30%
OverhatchExpoLing10%
OverhatchExpoMuta40%
OverhatchLateGas10%
OverhatchLing10%
OverpoolHatch10%
OverpoolHydra20%
OverpoolLurker10%
OverpoolSpeed40%
OverpoolSunk10%
OverpoolTurtle10%
Sparkle 1HatchMuta20%
ZvP_2HatchMuta20%
ZvP_3BaseSpire+Den30%
ZvP_3HatchPoolHydra20%
ZvP_4HatchPoolHydra10%
ZvT_12PoolMuta10%
ZvT_3HatchMutaExpo10%
ZvZ_12HatchExpo10%
ZvZ_12HatchMain10%
ZvZ_12PoolLing10%
ZvZ_Overgas11Pool10%
ZvZ_OverpoolTurtle20%
48 openings1001%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Heavy rush0%0%33%0%0%0%
Proxy1010%0%1616%0%0%0%
Safe expand3535%0%3434%0%29%6%
Turtle5555%2%4141%2%40%7%
Unknown0%0%66%0%0%0%

#8 mcrave

openinggameswins
11HatchTurtleHydra1250%
2HatchHydra1136%
2HatchLurker250%
2HatchLurkerAllIn10%
3HatchHydraBust743%
3HatchLing20%
3HatchLingBust10%
AntiZeal_12Hatch20%
Over10Hatch2Hard10%
Over10HatchBust10%
OverhatchLateGas2330%
ZvP_3HatchPoolHydra1323%
ZvP_Overpool3Hatch10%
ZvT_12PoolMuta10%
ZvZ_OverpoolTurtle2264%
15 openings10038%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Heavy rush9191%37%5151%25%54%31%
Safe expand88%38%1111%45%0%62%
Turtle11%100%55%20%0%0%
Unknown0%0%3333%58%0%0%


The plan predictor struggled to predict what McRave was going to do next, but learning worked well anyway—eventually. The ZvZ_OverpoolTurtle choice is a big surprise, an opening that builds 3 sunkens and gets fast mutalisks on one base. The opening is sound only against certain all-in zerg strategies; protoss really ought to smash it. I’m guessing it worked against a zealot rush where McRave was slow to switch tech when the mutas showed up.

#9 iron

openinggameswins
12HatchTurtle10%
2.5HatchMuta10%
3HatchPoolMuta911%
9PoolExpo825%
9PoolSunkHatch10%
AntiFact_13Pool3523%
AntiFact_2Hatch20%
AntiFactory10%
AntiZeal_12Hatch10%
OverpoolLurker10%
OverpoolSpeed10%
OverpoolSunk10%
ZvP_4HatchPoolHydra10%
ZvZ_12PoolMain10%
ZvZ_Overgas11Pool1450%
ZvZ_Overpool9Gas2245%
16 openings10028%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Factory100100%28%9191%29%91%7%
Turtle0%0%22%0%0%0%
Unknown0%0%77%29%0%0%


When I run matches locally against Iron, Steamhammer soon settles on AntiFactory as the most reliable answer, and that does seem best. For some reason, Steamhammer behaved differently in both CIG and AIIDE than it does at home. It is astonishing that ZvZ fast mutalisk openings came out on top again. Exactly as against SAIDA, the plan predictor was 100% accurate while the plan recognizer was 91% accurate.

#10 zzzkbot

openinggameswins
3HatchHydraBust10%
4PoolHard10%
9PoolSpeedAllIn1479%
9PoolSunkHatch2232%
OverhatchExpoLing10%
OverhatchLing10%
OverpoolSunk2138%
ZvP_3HatchPoolHydra10%
ZvP_4HatchPoolHydra10%
ZvZ_Overgas9Pool2544%
ZvZ_Overpool11Gas520%
ZvZ_Overpool9Gas10%
ZvZ_OverpoolTurtle617%
13 openings10039%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Fast rush7777%42%2121%57%22%75%
Heavy rush1414%21%22%0%0%86%
Turtle99%44%22%100%22%56%
Unknown0%0%7575%33%0%0%


9PoolSunkHatch and OverpoolSunk are anti-rush openings, and 9PoolSpeedAllIn is general-purpose but good against rushes. In contrast, ZvZ_Overgas9Pool is a fast mutalisk opening and can be overrun by too many zerglings. I don’t know how accurate the plan predictions are, but they agree fairly well with the selected openings.

#12 microwave

openinggameswins
11Gas10PoolMuta2832%
3HatchHydraBust10%
3HatchLing10%
3HatchLingExpo10%
3HatchLurker10%
4PoolSoft1217%
5PoolHard2Player10%
9HatchMain9Pool9Gas20%
9PoolSpeed10%
9PoolSpeedAllIn10%
9PoolSunkSpeed20%
AntiFact_2Hatch10%
OverhatchLing20%
OverpoolSunk425%
ZvZ_12HatchMain20%
ZvZ_12PoolLing10%
ZvZ_12PoolMain20%
ZvZ_Overgas9Pool20%
ZvZ_Overpool11Gas1020%
ZvZ_Overpool9Gas2339%
ZvZ_OverpoolTurtle20%
21 openings10023%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Fast rush1515%27%1010%50%13%53%
Heavy rush4242%17%2020%40%14%45%
Naked expand4343%28%2121%5%21%49%
Turtle0%0%11%0%0%0%
Unknown0%0%4848%19%0%0%


Microwave really mixed things up, and it was successful! Steamhammer could not predict the opening switches. It’s interesting that when Steamhammer predicted a fast rush, it won a quarter of the time, and when it actually recognized a fast rush, it won half the time. That doesn’t tell us what actually happened in the games. When Steamhammer recognizes a fast rush, it can react no matter what opening it is playing, and often save itself. When it is rushed and doesn’t recognize it, it will lose unless it is playing a safe opening.

#13 lastorder

openinggameswins
3HatchLingBust1233%
4PoolHard10%
4PoolSoft2129%
6PoolSpeed10%
AntiFactory10%
Over10Hatch10%
Over10Hatch1Sunk425%
OverhatchLing20%
OverhatchMuta729%
PurpleSwarmBuild10%
ZvP_3HatchPoolHydra10%
ZvT_3HatchMutaExpo633%
ZvZ_12HatchMain1331%
ZvZ_12PoolLing520%
ZvZ_12PoolMain50%
ZvZ_Overpool11Gas1735%
ZvZ_OverpoolTurtle20%
17 openings10026%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Heavy rush100100%26%7777%25%77%14%
Naked expand0%0%33%0%0%0%
Turtle0%0%66%17%0%0%
Unknown0%0%1414%43%0%0%


LastOrder did not learn during the tournament and played predictably, yet Steamhammer struggled to find an answer. We also know that LastOrder learned extensively offline before the tournament. Knowing that, and looking at these tables (check out the variety of recognized plans and the variety of Steamhammer’s more successful openings), I get the impression that LastOrder is highly adaptive and knows how to react in a wide variety of situations. I guess we’ll see when the replays come out.

#14 tyr

openinggameswins
2HatchHydraBust1338%
2HatchLurkerAllIn1443%
3HatchHydraExpo3876%
4HatchBeforeGas20%
4PoolHard425%
9PoolSunkSpeed10%
Over10Hatch2Hard10%
Over10HatchBust10%
OverpoolLurker729%
OverpoolSpeed5100%
ZvP_3BaseSpire+Den1362%
ZvP_3HatchPoolHydra10%
12 openings10056%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Heavy rush3939%56%4545%78%41%3%
Naked expand0%0%11%100%0%0%
Turtle6161%56%5050%32%48%5%
Unknown0%0%44%100%0%0%


These numbers say that anything which helps Steamhammer find the right answers early, without having to do so much random exploration, would be a big win in a long tournament. The plan recognizer is not good enough.

#15 metabot

openinggameswins
11Gas10PoolLurker250%
11HatchTurtleHydra683%
12HatchTurtle367%
2HatchLurkerAllIn367%
3HatchHydraExpo10%
3HatchLing1182%
3HatchLingExpo1060%
4PoolHard10%
6PoolSpeed2100%
9HatchExpo9Pool9Gas850%
9PoolHatch367%
9PoolSpeedAllIn250%
AntiZeal_12Hatch10%
Over10Hatch250%
Over10Hatch2Hard1100%
Over10Hatch2Sunk30%
OverhatchExpoLing862%
OverhatchExpoMuta1443%
OverhatchLateGas425%
OverpoolSpeed475%
ZvP_2HatchMuta250%
21 openings9157%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Heavy rush3437%65%1921%68%21%41%
Naked expand33%33%33%100%0%33%
Safe expand3437%56%2022%45%21%38%
Turtle1921%47%1314%46%11%42%
Unknown11%100%3640%58%0%0%


It must have been a crazy learning duel! Later I’ll try to figure out what MetaBot learned, and we can check the two bots’ learned data against each other.

#16 letabot

openinggameswins
12HatchTurtle20%
3HatchLing10%
6PoolSpeed1164%
9HatchExpo9Pool9Gas633%
9PoolLurker4582%
OverpoolHatch771%
OverpoolLurker2882%
7 openings10074%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Heavy rush9999%74%5959%78%59%20%
Safe expand0%0%44%50%0%0%
Turtle11%100%1717%76%0%0%
Unknown0%0%2020%65%0%0%

#17 arrakhammer

openinggameswins
2HatchLurkerAllIn10%
4PoolHard2268%
6PoolSpeed5275%
7Pool12Hatch10%
9HatchMain9Pool9Gas10%
9PoolSpeedAllIn10%
AntiFactory10%
Over10Hatch2SunkHard10%
Over10HatchBust10%
Over10HatchSlowLings10%
OverhatchExpoMuta10%
OverhatchLing10%
OverpoolHydra10%
ZvZ_12HatchMain10%
ZvZ_12PoolLing10%
ZvZ_12PoolMain20%
ZvZ_Overpool11Gas1136%
17 openings10058%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Heavy rush9999%58%7878%65%78%1%
Naked expand11%100%2121%29%0%0%
Unknown0%0%11%100%0%0%


This old version of Arrakhammer has a fixed anti-Steamhammer opening configured. It was written before Steamhammer had learning. Modern Steamhammer can exploit the fixed opening. You can’t get away with that any more.

#18 ecgberht

openinggameswins
11Gas10PoolLurker1191%
11HatchTurtleLurker51100%
9PoolLurker3797%
OverpoolLurker10%
4 openings10097%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Heavy rush100100%97%6767%96%67%33%
Unknown0%0%3333%100%0%0%

#19 ualbertabot

openinggameswins
3HatchLurker10%
7PoolHard1182%
AntiZeal_12Hatch757%
OverhatchExpoMuta10%
OverpoolTurtle8098%
5 openings10091%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Factory22%100%1111%100%0%0%
Fast rush1212%92%1515%80%33%25%
Heavy rush8585%91%4545%89%45%22%
Naked expand11%100%77%100%0%0%
Unknown0%0%2222%95%0%0%


Getting that 98% win rate is one of the reasons I added the seemingly nonsensical overpool turtle opening, which makes an absurd 6 sunkens on one base. It works against all kinds of rushes, fast or slow, when the rusher does not know how to adapt.

#20 ximp

openinggameswins
3HatchHydraExpo1782%
4HatchBeforeGas3683%
9Hatch8Pool10%
AntiFactory10%
ZvP_2HatchMuta978%
ZvP_3BaseSpire+Den3678%
6 openings10079%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Safe expand33%100%1818%94%0%0%
Turtle9797%78%7878%76%77%4%
Unknown0%0%44%75%0%0%


Why didn’t Steamhammer try the 3 hatch before pool opening even once in 100 rounds? I expect it would have scored higher. Well, I know why; when the win rate is so convincing, Steamhammer doesn’t explore much.

#21 cdbot

openinggameswins
11HatchTurtleHydra10%
9PoolSunkSpeed1547%
OverpoolSunk8296%
ZvP_Overpool3Hatch10%
ZvZ_12PoolLing10%
5 openings10086%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Fast rush9696%85%3131%71%29%57%
Heavy rush44%100%1313%100%0%25%
Unknown0%0%5656%91%0%0%

#22 aiur

openinggameswins
11Gas10PoolLurker10%
3HatchHydraExpo2889%
5PoolHard2Player10%
AntiZeal_12Hatch4691%
Over10Hatch2492%
5 openings10089%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Heavy rush9595%89%6565%91%64%18%
Naked expand44%75%1515%73%0%25%
Proxy0%0%22%50%0%0%
Turtle11%100%0%0%0%0%
Unknown0%0%1818%100%0%0%


Turtle was predicted once but never recognized in the last 100 games. That implies that Steamhammer recognized a turtle opening in the first 3 rounds—and it was wrong, since AIUR doesn’t do that; it must have been a misrecognized cannon rush, a bug that has crept in. Comparing against what AIUR learned, I see that AIUR cannon rushed Steamhammer 3 times total, all failures, and favored its defensive strategy.

#23 killall

openinggameswins
6PoolSpeed10%
9PoolSpeed37100%
ZvZ_12PoolMain10%
ZvZ_OverpoolTurtle6193%
4 openings10094%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Heavy rush7575%93%4343%91%49%36%
Naked expand55%80%1212%100%20%20%
Turtle2020%100%1010%100%45%35%
Unknown0%0%3535%94%0%0%

#24 willyt

openinggameswins
11Gas10PoolLurker3097%
11HatchTurtleLurker786%
12HatchTurtle20%
2HatchLurkerAllIn2496%
6PoolSpeed10%
9PoolLurker10%
OverpoolLurker35100%
7 openings10093%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Heavy rush100100%93%8585%96%85%15%
Unknown0%0%1515%73%0%0%

#25 ailien

openinggameswins
3HatchLurker10%
6PoolSpeed10%
9PoolSpeedAllIn10%
OverhatchLing10%
ZvT_3HatchMuta10%
ZvZ_Overgas9Pool743%
ZvZ_Overpool9Gas2085%
ZvZ_OverpoolTurtle6893%
8 openings10083%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Naked expand9898%85%33%0%2%98%
Unknown22%0%9797%86%0%50%

#26 cunybot

openinggameswins
11Gas10PoolMuta10%
5PoolHard2Player367%
OverhatchLing1593%
OverpoolSpeed10%
ZvZ_12HatchExpo250%
ZvZ_OverpoolTurtle77100%
6 openings9995%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Fast rush44%100%33%100%0%75%
Heavy rush1313%100%66%83%0%62%
Naked expand6263%94%2020%90%19%61%
Turtle1919%100%1010%100%11%58%
Unknown11%0%6061%97%0%0%

#27 hellbot

openinggameswins
2HatchHydraBust580%
3HatchHydra7100%
3HatchHydraBust12100%
3HatchHydraExpo14100%
3HatchLingBust8100%
4HatchBeforeGas16100%
Over10Hatch1Sunk3100%
ZvP_2HatchMuta11100%
ZvP_3BaseSpire+Den15100%
ZvP_3HatchPoolHydra9100%
10 openings10099%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Turtle100100%99%7676%99%76%24%
Unknown0%0%2424%100%0%0%

overall

totalZvTZvPZvZZvR
openinggameswinsgameswinsgameswinsgameswinsgameswins
11Gas10PoolLurker5375% 4489% 911%
11Gas10PoolMuta5024% 10% 2015% 2931%
11HatchTurtleHydra2446% 10% 2250% 10%
11HatchTurtleLurker6095% 5898% 20%
12HatchTurtle1020% 50% 540%
2.5HatchMuta50% 10% 30% 10%
2HatchHydra1625% 1625%
2HatchHydraBust4520% 10% 4420%
2HatchLurker617% 617%
2HatchLurkerAllIn5260% 2496% 2730% 10%
3HatchHydra1164% 1164%
3HatchHydraBust4236% 4038% 20%
3HatchHydraExpo10380% 10% 10280%
3HatchLing1656% 10% 1464% 10%
3HatchLingBust5925% 4723% 1233%
3HatchLingExpo1638% 1540% 10%
3HatchLurker60% 10% 20% 20% 10%
3HatchPoolMuta119% 911% 10% 10%
4HatchBeforeGas7363% 20% 7066% 10%
4PoolHard3546% 30% 812% 2462%
4PoolSoft3921% 50% 3424%
5PoolHard40% 10% 30%
5PoolHard2Player1020% 60% 450%
5PoolSoft30% 10% 20%
6Pool10% 10%
6PoolSpeed7564% 1258% 633% 5768%
7Pool12Hatch10% 10%
7PoolHard3534% 2313% 10% 1182%
7PoolMid10% 10%
7PoolSoft40% 20% 20%
8Hatch7Pool40% 30% 10%
8Pool60% 60%
9Hatch8Pool40% 20% 10% 10%
9HatchExpo9Pool9Gas3921% 729% 3219%
9HatchMain9Pool9Gas60% 30% 30%
9Pool10% 10%
9PoolExpo1020% 922% 10%
9PoolHatch633% 633%
9PoolLurker9181% 9181%
9PoolSpeed4386% 50% 3897%
9PoolSpeedAllIn2941% 10% 119% 1765%
9PoolSpire30% 30%
9PoolSunkHatch2726% 10% 40% 2232%
9PoolSunkSpeed2232% 10% 30% 1839%
AntiFact_13Pool4617% 4319% 30%
AntiFact_2Hatch200% 140% 50% 10%
AntiFactory210% 170% 20% 20%
AntiZeal_12Hatch6373% 30% 5379% 757%
Over10Hatch3174% 3077% 10%
Over10Hatch1Sunk1233% 838% 425%
Over10Hatch2Hard333% 333%
Over10Hatch2Sunk90% 90%
Over10Hatch2SunkHard60% 10% 40% 10%
Over10HatchBust50% 40% 10%
Over10HatchSlowLings40% 30% 10%
OverhatchExpoLing1828% 1729% 10%
OverhatchExpoMuta2326% 2129% 10% 10%
OverhatchLateGas3027% 10% 2928%
OverhatchLing2458% 10% 2361%
OverhatchMuta922% 10% 825%
Overpool+130% 10% 20%
OverpoolHatch1833% 862% 1010%
OverpoolHydra70% 60% 10%
OverpoolLurker7679% 6589% 1118%
OverpoolSpeed2236% 10% 1942% 20%
OverpoolSunk11179% 10% 20% 10881%
OverpoolTurtle8394% 30% 8098%
PurpleSwarmBuild70% 10% 50% 10%
Sparkle 1HatchMuta40% 40%
Sparkle 2HatchMuta40% 20% 20%
Sparkle 3HatchMuta20% 20%
ZvP_2HatchMuta4146% 4048% 10%
ZvP_3BaseSpire+Den8659% 8560% 10%
ZvP_3HatchPoolHydra4927% 10% 4628% 20%
ZvP_4HatchPoolHydra40% 10% 20% 10%
ZvP_Overpool3Hatch60% 50% 10%
ZvT_12PoolMuta90% 20% 60% 10%
ZvT_13Pool20% 20%
ZvT_2HatchMuta60% 10% 50%
ZvT_3HatchMuta40% 10% 10% 20%
ZvT_3HatchMutaExpo922% 20% 729%
ZvZ_12HatchExpo333% 10% 250%
ZvZ_12HatchMain3918% 20% 3719%
ZvZ_12Pool30% 30%
ZvZ_12PoolLing128% 10% 20% 911%
ZvZ_12PoolMain160% 10% 20% 130%
ZvZ_Overgas11Pool1644% 1450% 20%
ZvZ_Overgas9Pool4035% 10% 40% 3540%
ZvZ_Overpool11Gas6025% 1315% 40% 4330%
ZvZ_Overpool9Gas10045% 2343% 30% 7447%
ZvZ_OverpoolTurtle26782% 10% 2556% 24185%
total259052%50059%109139%89958%10091%
openings played915287555

Steamhammer played all of its openings during the tournament, almost all of them multiple times. It even tried the 3 specialized openings for the island map Sparkle. Nearly as many were played in ZvP alone, since it spent so much time desperately seeking an answer to the Locutusoids (or possibly Susan). Some openings were highly successful in given matchups, which generally means that the opening defeated one opponent reliably and so was played many times. For example, OverpoolSunk wiped out CDBot, which makes it look in this table as though it wiped out all zergs. If only it were so simple! The opening with the best success across matchups is 6PoolSpeed, an opening that I have never seen in human play.

AIIDE 2018 - what AIUR learned

Here is what the protoss AIUR learned about each opponent over the course of AIIDE 2018. Seeing AIUR’s counters for each opponent tells us something about how the opponent played. For the recent CIG edition, see CIG 2018 - what AIUR learned.

This is generated from data in AIUR’s final write directory. There were 103 rounds played (100 of which were official) and 10 maps, three 2-player, two 3-player, and five 4-player maps. For some opponents, all games were recorded; for the supernumerary 3 rounds at the end, the extra games were on the 2-player maps (they’re taken in rotation). For many opponents, fewer than 103 games were recorded. AIUR recorded 2606 games in 103 rounds, and officially played 2570 in 100 rounds. 2570 plus the 3 extra rounds times 26 opponents per round gives a total of 2648, which is 42 more than AIUR recorded. There were 37 official crashes in 100 rounds, leaving 5 games unaccounted for. They might be crashes in the extra 3 rounds. It’s also possible that the last round was not finished.
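To make the accounting explicit, here is the arithmetic as a tiny Python sketch; all the numbers come from the paragraph above.

recorded_games   = 2606   # games in AIUR's write directory, 103 rounds
official_games   = 2570   # games officially played in 100 rounds
extra_rounds     = 3
opponents        = 26     # opponents per round
official_crashes = 37

expected    = official_games + extra_rounds * opponents  # 2648
missing     = expected - recorded_games                   # 42
unaccounted = missing - official_crashes                  # 5 games unexplained
print(expected, missing, unaccounted)                     # 2648 42 5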

It would be nice if we had the data after round 100 instead of round 103; then we could do the accounting and get exact answers.

First, the totals across all opponents.

overall234total
 nwinsnwinsnwinsnwins
cheese22127%7125%13718%42924%
rush10146%8325%21032%39434%
aggressive577%8216%17317%31215%
fast expo8641%7934%23331%39834%
macro8026%6733%13632%28331%
defensive26134%13441%39537%79037%
total80632%51630%128430%260631%
  • 2, 3, 4 - map size, the number of starting positions
  • n - games recorded
  • wins - winning percentage over those games
  • cheese - cannon rush
  • rush - dark templar rush
  • aggressive - fast 4 zealot drop
  • fast expo - nexus first
  • macro - aim for a strong middle game army
  • defensive - try to be safe against rushes

AIUR struggled in this tournament; it has not been updated since 2014. As in CIG, AIUR did about equally well on the different map sizes, but relied on a different mix of strategies on each. On all map sizes, the defensive strategy was most often used. On 2-player maps, the cannon rush was also a popular solution, and on 4-player maps (where cannon rush is harder to pull off), the dark templar rush and the nexus first fast expansion were popular.

#1 saida234total
 nwinsnwinsnwinsnwins
cheese10%30%70%110%
rush10%20%60%90%
aggressive186%70%2711%528%
fast expo10%30%50%90%
macro10%40%20%70%
defensive20%10%20%50%
total244%200%496%934%

As in CIG, AIUR’s learning is able to squeeze a little extra from the toughest opponents. Against #1 SAIDA, it found that the fast 4 zealot drop occasionally worked, and was able to get a couple extra wins on 4-player maps. The same plan scored a single win on a 2-player map, but repeating the strategy did not help. Nothing else it tried made a dent.

#2 cherrypi234total
 nwinsnwinsnwinsnwins
cheese119%20%60%195%
rush50%30%50%130%
aggressive20%40%60%120%
fast expo10%60%70%140%
macro50%20%205%274%
defensive60%30%60%150%
total303%200%502%1002%

Oops, I lied already. AIUR was not able to squeeze an extra win against CherryPi. It won a total of 2 times with different strategies, and repeating the strategies did not win again. This is the first time I have seen AIUR’s diverse strategies unable to make any impression.

#3 cse234total
 nwinsnwinsnwinsnwins
cheese2612%1414%50%4511%
rush10%10%90%110%
aggressive20%10%80%110%
fast expo10%10%100%120%
macro10%10%80%100%
defensive10%20%100%130%
total329%2010%500%1025%

CSE was apparently not fully prepared for cannon rushes. AIUR plays the best cannon rush of all bots, in my opinion. But even the best is harder to pull off on a 4-player map.

#4 bluebluesky234total
 nwinsnwinsnwinsnwins
cheese1060%1164%3030%5143%
rush50%20%30%100%
aggressive30%20%70%120%
fast expo30%20%40%90%
macro50%20%50%120%
defensive50%10%10%70%
total3119%2035%5018%10122%

The Locutusoids showed somewhat similar patterns. BlueBlueSky was surprisingly weak against the cannon rush.

#5 locutus234total
 nwinsnwinsnwinsnwins
cheese1817%20%110%3110%
rush812%50%119%248%
aggressive10%20%70%100%
fast expo10%20%50%80%
macro20%30%80%130%
defensive10%50%80%140%
total3113%190%502%1005%

The other part of the pattern is some weakness against dark templar rush. Interestingly, the earlier version of Locutus survived AIUR’s DTs perfectly in CIG, despite a fair number of tries.

#6 isamind234total
 nwinsnwinsnwinsnwins
cheese2711%10%10%2910%
rush10%30%20%60%
aggressive10%40%40%90%
fast expo20%60%408%486%
macro10%20%10%40%
defensive10%425%20%714%
total339%205%506%1037%

#7 daqin234total
 nwinsnwinsnwinsnwins
cheese2015%10%20%2313%
rush540%1520%4319%6321%
aggressive10%10%20%40%
fast expo20%10%10%40%
macro20%10%10%40%
defensive10%10%10%30%
total3116%2015%5016%10116%

#8 mcrave234total
 nwinsnwinsnwinsnwins
cheese1735%1127%425%3231%
rush40%10%30%80%
aggressive10%425%40%911%
fast expo425%20%3330%3928%
macro30%10%10%50%
defensive20%10%50%80%
total3123%2020%5022%10122%

McRave shows a different pattern. Its weaknesses were against the cannon rush on smaller maps and nexus first on 4-player maps—a fast rush versus a macro opening. The tournament manager cycles through the maps in order, which makes a difference for bots which are sensitive to which map is being played. It’s possible that the sequence of strategies that AIUR played as the maps cycled through helped confuse McRave’s learning.

#9 iron234total
 nwinsnwinsnwinsnwins
cheese40%10%50%100%
rush50%10%20%80%
aggressive50%60%40%150%
fast expo70%20%284%373%
macro50%729%40%1612%
defensive60%30%50%140%
total320%2010%482%1003%

#10 zzzkbot234total
 nwinsnwinsnwinsnwins
cheese10%20%20%50%
rush10%10%60%80%
aggressive20%30%40%90%
fast expo10%10%20%40%
macro10%10%20%40%
defensive2715%1225%3412%7315%
total3312%2015%508%10311%

AIUR of course settled on the defensive opening against ZZZKBot, which prefers 4 pool.

#11 steamhammer234total
 nwinsnwinsnwinsnwins
cheese10%10%10%30%
rush10%60%40%110%
aggressive10%30%30%70%
fast expo520%20%1612%2313%
macro00%40%30%70%
defensive2520%40%2313%5215%
total3318%200%5010%10311%

The fast expo (“big army later”) and the defensive opening (“some army fast”) play out similarly when Steamhammer does not go with an early pressure opening. Maybe that’s why they both found some success.

#12 microwave234total
 nwinsnwinsnwinsnwins
cheese10%20%1118%1414%
rush20%10%617%911%
aggressive10%333%1118%1520%
fast expo10%10%20%40%
macro10%540%10%729%
defensive2711%838%1926%5420%
total339%2030%5020%10318%

That is quite a variety of tries against Microwave!

#13 lastorder234total
 nwinsnwinsnwinsnwins
cheese50%10%239%297%
rush40%40%10%90%
aggressive50%40%10%100%
fast expo50%30%10%90%
macro70%10%00%80%
defensive60%714%248%378%
total320%205%508%1025%

LastOrder may have been trained offline against AIUR (that would fit with how LastOrder is supposed to work).

#14 tyr234total
 nwinsnwinsnwinsnwins
cheese875%333%10%1258%
rush1989%1040%4353%7261%
aggressive10%20%250%520%
fast expo250%10%10%425%
macro20%30%20%70%
defensive10%10%10%30%
total3373%2025%5048%10351%

#15 metabot234total
 nwinsnwinsnwinsnwins
cheese2148%425%20%2741%
rush10%10%1520%1718%
aggressive10%944%2227%3231%
fast expo10%10%10%30%
macro10%20%10%40%
defensive10%10%10%30%
total2638%1828%4221%8628%

MetaBot includes AIUR as one of its heads. AIUR also struggles against both of the other heads, Skynet and XIMP. Still, aggressive tries had some success.

#16 letabot234total
 nwinsnwinsnwinsnwins
cheese20%10%10%40%
rush333%20%10%617%
aggressive10%10%10%30%
fast expo2167%425%4477%6971%
macro250%10%10%425%
defensive450%1127%10%1631%
total3355%2020%4969%10255%

#17 arrakhammer234total
 nwinsnwinsnwinsnwins
cheese10%10%333%520%
rush10%10%10%30%
aggressive00%10%10%20%
fast expo20%922%10%1217%
macro00%10%10%20%
defensive2972%743%4347%7956%
total3364%2025%5042%10346%

#18 ecgberht234total
 nwinsnwinsnwinsnwins
cheese10%10%20%40%
rush10%450%10%633%
aggressive30%20%10%60%
fast expo667%250%10%956%
macro10%1060%4360%5459%
defensive2124%10%20%2421%
total3327%2045%5052%10343%

#19 ualbertabot234total
 nwinsnwinsnwinsnwins
cheese00%00%00%00%
rush00%00%2100%2100%
aggressive00%1100%10%250%
fast expo1100%00%00%1100%
macro00%00%00%00%
defensive3135%1942%4723%9731%
total3238%2045%5026%10233%

UAlbertaBot is one of the opponents that AIUR has pre-learned data against. The pre-learned data is not included in this table. That’s why so many cells are 0.

#20 ximp234total
 nwinsnwinsnwinsnwins
cheese3333%00%10%3432%
rush00%00%250%250%
aggressive00%1225%4120%5321%
fast expo00%850%3100%1164%
macro00%00%00%00%
defensive00%00%2100%2100%
total3333%2035%4929%10231%

XIMP is the other competitor that AIUR has pre-learned data about.

#21 cdbot234total
 nwinsnwinsnwinsnwins
cheese10%10%10%30%
rush10%10%10%30%
aggressive10%10%10%30%
fast expo20%10%10%40%
macro10%10%10%30%
defensive2796%15100%4587%8792%
total3379%2075%5078%10378%

It smells like CDBot played a rush every game, and not a strong one.

#23 killall234total
 nwinsnwinsnwinsnwins
cheese10%10%20%40%
rush10%10%10%30%
aggressive10%30%10%50%
fast expo10%10%10%30%
macro10%10%10%30%
defensive2818%1346%4436%8532%
total3315%2030%5032%10326%

#24 willyt234total
 nwinsnwinsnwinsnwins
cheese10%10%956%1145%
rush2685%1369%3067%6974%
aggressive10%10%10%30%
fast expo250%250%650%1050%
macro10%10%20%40%
defensive10%250%250%540%
total3272%2055%5058%10262%

#25 ailien234total
 nwinsnwinsnwinsnwins
cheese10%10%10%30%
rush10%10%20%40%
aggressive20%10%20%50%
fast expo10%10100%10%1283%
macro2741%683%1323%4641%
defensive10%10%3037%3234%
total3333%2075%4929%10239%

#26 cunybot234total
 nwinsnwinsnwinsnwins
cheese250%10%10%425%
rush10%250%250%540%
aggressive2100%10%367%667%
fast expo475%2100%875%1479%
macro250%4100%475%1080%
defensive5100%9100%3087%4491%
total1675%1984%4879%8380%

#27 hellbot234total
 nwinsnwinsnwinsnwins
cheese7100%4100%580%1694%
rush3100%2100%8100%13100%
aggressive1100%3100%8100%12100%
fast expo9100%6100%11100%26100%
macro8100%3100%11100%22100%
defensive2100%2100%7100%11100%
total30100%20100%5098%10099%

Looking across all the tables, each of AIUR’s 6 strategies was sometimes found to be the best. Even today, the variety remains valuable.

AIIDE 2018 - the performance curves

I decided to look more closely at the Win Percentage Over Time curves. For this post, “learning” means online learning during the tournament; bots which only learned offline at home are “non-learning” bots for the moment.

To start off, here are the bots whose curves are more or less flat over time. Of these, #1 SAIDA is the only learning bot. Its learning apparently enabled it to hold its ground at a high level, but not to rise further. The other 3 are #13 LastOrder, #26 CUNYBot, and tail-ender #27 Hellbot. Hellbot gradually lost win rate over time despite its low starting point. The other 2 are very nearly level over time, despite being non-learners in a field of enemies eagerly seeking weaknesses to exploit. I suppose that their play is in some way difficult to exploit by learning, whether highly adaptive, or random and unpredictable, or simply not exposing weaknesses that other bots were able to catch.

pretty much flat and level

Here are all the non-learning bots, as best I could identify from yesterday’s findings. I also included #1 SAIDA to maintain the scale, which usefully goes to 1.1 to accommodate any bots which won more games than they played.

all bots that didn’t learn

Most of the curves trend down over time. The exceptions are #13 LastOrder and #26 CUNYBot from the first graph. Here’s a rescaled graph to tease apart the dense clump from #16 LetaBot to #24 WillyT. It’s easier to see the downward trend. Of these, #17 Arrakhammer, which has sophisticated play, and #20 XIMP, whose weaknesses may be difficult for many opponents to exploit, leveled out after the early losses. (So did #9 Iron with its numerous adaptive reactions, from the chart above.) The others continued downward for the entire tournament. Apparently if your play is in some way good enough, you can avoid exploitation by other bots to an extent. But most non-learning bots seem doomed to keep losing win rate even over a long tournament.

a clump of closely-spaced bots that didn’t learn

Here are the learning bots which fell at first, then leveled out. The early fall is due to some combination of statistical fluctuation, learning by their opponents, and no doubt bugs and whatever other random stuff. There are only 3 of them.

bots that fell then leveled out

#15 MetaBot might belong in the graph above, but I gave it its own picture because it is in a class by itself when it comes to struggling at the start then recovering strongly. It fell hard (on the left its curve drops below #21 CDBot) and came back, but it did not level off! MetaBot rivals Steamhammer and AIUR for performance gains over time. I imagine it’s due to MetaBot’s 2-level learning ability, where it learns which of 3 heads is best against each opponent, and then 2 of the heads (AIUR and Skynet), when chosen, learn how best to play against that opponent. Like Steamhammer and AIUR, it has more scope to learn, and it learns more. The graph shows how many rivals MetaBot left in the dust—it came within an ace of surpassing #14 Tyr, and likely would have, given 10 more rounds.

metabot fell hard then rose strongly

Here are the bots which gained win rate early, then largely leveled out—most continued to gain or lose a little for the duration. This is partly because the curves are cumulative. Only the left part of the curve can change quickly; each data point is the average of all the per round win rates to the left. The non-learning #13 LastOrder is included; the others are learning bots. #14 Tyr, which learns less because it only remembers one previous game, had the biggest decline from its peak. That’s interesting: The extremely simple method of learning from only one game is already a powerful form of learning, but it is not as powerful as, say, the UCB learning of #12 Microwave, which remembers summary statistics from many games. All these bots arguably could have done better if they had scope to learn more; their learning ceilings may not be high enough for a long tournament. Perhaps some are tuned for SSCAIT, where fast learning with limited scope helps performance.
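As a reminder of why the curves behave this way, here is a minimal sketch of how a cumulative win-percentage curve is computed. This is my own illustration, not the tournament software.

def cumulative_curve(per_round_win_rates):
    # Each point is the mean of all per-round win rates so far, which is why
    # only the left end of the curve can move quickly.
    curve, total = [], 0.0
    for i, rate in enumerate(per_round_win_rates, start=1):
        total += rate
        curve.append(total / i)
    return curve

# Example: a bot that starts at 20% and learns its way up to 80%.
rates = [0.2] * 10 + [0.8] * 90
curve = cumulative_curve(rates)
print(curve[9], curve[99])  # 0.2 after 10 rounds, 0.74 after 100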

bots that rose then leveled out

Finally, here are the learning bots which kept learning for a long time. (#15 MetaBot has its own graph above and is left out.) #2 CherryPi started strongly and reduced its loss rate by 1/3 over the course of the tournament, which is impressive. #10 ZZZKBot started poorly, then produced a clean, smooth curve which approaches an asymptote after about 10 rounds. #11 Steamhammer also started poorly, and its slower improvement seems to approach an asymptote after around 30 games, but in fact Steamhammer kept on learning throughout, left Tyr, LastOrder, and Microwave behind, and came close to surpassing ZZZKBot. In a longer tournament, it likely would have; Steamhammer’s big repertoire of openings means it still has fresh ideas to try after 100 rounds. #22 AIUR struggled at first, then recovered and showed its usual strong learning gains.

bots that kept on learning

I find that these performance curves are rich with insight. The top finishers have strong basics, and use learning to avoid being exploited (that seems to be the only purpose of learning in SAIDA), or to exploit the weaknesses of other bots. Most bots that did not learn suffered for it, but some were difficult to exploit and could hold their ground—LastOrder was chief among these. Bots that did learn sometimes learned too little and could not keep up with their rivals. Steamhammer and MetaBot were remarkable for their comparatively weak foundations and slow but strong learning skills.

Next I’ll look into what specific bots learned about their opponents. Following tradition, I’ll start with AIUR.

AIIDE 2018 - what bots wrote data

As usual, here is my examination of what each bot kept in its AI directory to read at startup, and what it wrote into its write directory for learning and/or debugging. The AI directory is not the only place a bot might keep prepared data; some bots have configuration files, and the binary might contain anything. This time I left out the up/down arrows. The performance curves seem more complicated than in CIG, and I want to look at them separately. Having files doesn’t mean that the files are used; they might be sitting there unread.

#botinfo
1SAIDASAIDA stored three classes of files, 131 DefeatResult files (though officially it lost 106 games and timed out 8 times), 18 Error files, and 229 Timeout files. The DefeatResult files are 33 to 80 lines long and have nicely-formatted readable information including the enemy’s build order history with timings, and unit counts and unit loss counts for both sides. I expect that the enemy build timings are key information for the learning mechanism. The error files range from 2 to 2500 lines long and report internal errors that the bot presumably was able to ignore or recover from. The timeout files report when specific managers ran over.
2CherryPiCherryPi has a couple of larger files in AI, 77MB and 3MB, which are likely offline machine learning data. CherryPi’s survey answers mention offline learning. In the write directory it wrote a JSON file for each opponent. The JSON file gives a list of the build orders CherryPi played, and for each build order, a list of booleans under the name “wins_” that look like the win/loss history. It’s interesting that they give the sequence of wins and losses, not simply the counts. It suggests that their learning method is watching for when the opponent figures something out and starts to perform better. It’s also interesting that the build given as having been played most often versus SAIDA is “zvt3hatchlurker”, which does not seem appropriate versus SAIDA’s mech play—but does claim more wins than the alternatives tried. In the files I checked, the total number of win/loss booleans is slightly over 100, the official number of games played. It looks like the tournament manager played 103 rounds before time ran out, then its results were pruned back to 100 rounds so the maps were equally used.
3CSELog file and learning data that looks like that of Locutus.
4BlueBlueSkyLog file and learning data that looks like that of Locutus.
5LocutusLog file and learning data that... is that of Locutus, not very different from Steamhammer data. Locutus also has pre-learned data for 11 opponents, 2 of which have 2 names.
6ISAMindLog file and learning data that looks like that of Locutus. Also ISAMind’s machine learning data.
7DaQinLog file and learning data that looks like that of Locutus, except that DaQin stores data about only one game per opponent, although the survey answers say differently. Was something broken for this tournament? If so, it doesn’t show in DaQin’s win rate, which is about as expected.
8McRaveFor each opponent, a file listing the 15 protoss strategies that McRave could play, with 2 numbers that look like wins/losses. The numbers sometimes add up to 100 or so, but some are lower. McRave is listed with 83 crashes and 120 frame timeouts, which is likely why.
9IronNothing. #9 Iron is the highest-ranked bot which wrote no learning data.
10ZZZKBotLooks about the same as last year’s format. Even the timestamps say 2017.
11SteamhammerSteamhammer’s familiar data, game records with obscure timing numbers.
12MicrowaveAs before, a file listing 7 or 8 strategies and win/loss counts for each, limited to a max count of 10.
13LastOrderMachine learning data in AI, but no online learning data, only a 2 byte file log_detail_file.
14TyrFor each opponent, a 1 to 4 line file apparently telling whether the previous game was a win or a loss, a small integer, and the strategy Tyr followed, possibly with a few following items named “flags”.
15MetaBotIn AI/learning, a file for each of Skynet, UAlbertaBot, and XIMP, with 91 numbers in each file. 91 is the count of parameters that AIUR learns, and AIUR itself has the same 3 files, so this is AIUR's old pre-learned data about these 3 opponents. In write, a mess of mostly log files, but also with apparent learning data per opponent. states_* files list which head was played for some games against each opponent; this is probably log data, but could also be used for learning. skynet_* files per opponent look like Skynet learning data, no doubt for games where Skynet played. [opponent].txt files are the 91 numbers, likely learning data from when AIUR played. So there are 2 levels of learning here: Learning which head should play, and learning inside that head.
16LetaBotA 619-line file battlescore.txt with 103 game records of 6 lines each, which I think is one record for each round played (though only 100 rounds were official). It could be a log file or learning data.
17ArrakhammerNothing.
18EcgberhtNothing. The author has explained that learning did not work due to an incorrect run_proxy.bat file.
19UAlbertaBotThe familiar UAlbertaBot format. For each opponent, a file listing 11 opening strategies with a win/loss count for each.
20XIMPNothing.
21CDBotNothing.
22AIURA carryover from past years. Pre-learned data against 3 old opponents (as already mentioned under MetaBot), plus for each opponent, the familiar 91 lines of numbers.
23KillAllKillAll is a Steamhammer fork, but it uses a different learning file format. There is a file for each opponent+map combination. It looks like each file gives a game count (usually 10), a chosen opening or “None”, and a list of 8 openings with 3 numbers for each; the last number is floating point. I guess I have to read the code to find out what the numbers mean.
24WillyTA log file with 103 lines, presumably 1 per round played.
25AILienAILien's idiosyncratic learning file format. One file per opponent, with numbers saying what units are preferred and a few odds and ends. It looks as though AILien saved data for only 1 game per opponent. If this is the same version of AILien that I looked at earlier, then I expect learning was turned off and the recorded data was not used.
26CUNYBotIn AI, a file output.txt with a list of build orders and some data on each one. In write, 487 files in these groups: output.txt an apparent log file with 103 lines, [map]_v_[opponent]_status.txt which looks like detailed information per game with a variety of hard-to-understand values, 226 files [map]Veins([x],[y]) with mostly over 200K lines per file where the (x,y) values are too large to be tile positions and too small to be pixel positions (so I guess they are "Veins"). It looks complex.
27HellbotNothing.

Lesson: Learn about your opponent! All the winning kids are doing it!

Some interesting and some complicated stuff here. As for CIG, I’ll be looking at what different bots learned. This time it should be more informative.

SAIDA’s learning and SAIDA’s weaknesses

SAIDA is holding its position as #1 on SSCAIT, but it is under constant attack from other bots and loses some games. On the one hand, SAIDA has weaknesses against early harassment and timing attacks, especially if the opponent denies scouting. On the other hand, SAIDA appears to have a learning mechanism that recognizes rush timing and figures out a defense. The SAIDA page describes it as “He also catches perfect rush timing by using information he collected.” That’s a vague description, but the behavior does appear to involve learning from experience. MicroDK noted that SAIDA writes data only after it loses; this must be why. For example, BananaBrain tried a dark templar rush and won a series of games, but finally the learning kicked in and SAIDA figured out how to get turrets in time to stop it (SAIDA’s code was not updated). Since then, BananaBrain has mostly lost games, defeating SAIDA only once, in this game where the turret was seconds late.

Other examples include PurpleSpirit winning one game with BBS and then being unable to win with it again, and Krasi0’s fast barracks marine cheese meeting the same fate.

In the latest attacks, Locutus won with center gates, making only 2 zealots before switching into dragoons, and Krasi0 added a bunker to its marine cheese to overcome SAIDA’s vulture counter to the marines (SAIDA crashed this game). Will SAIDA learn to defeat these tricks too? I don’t know, let’s find out!

How powerful is this learning mechanism? Surely there must be attacks that it cannot figure out how to forestall—or can’t figure out in reasonable time. If you find 2 winning tricks and switch between them, can it learn to defend against both? If you DT rush once so that it learns to get early turrets, does it get early turrets for the rest of time after you switch back to regular play? The unnecessary turrets give you a small advantage, and at a high level of play, small advantages are big.
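Here is a toy sketch of what such a loss-driven timing defense might look like. This is purely my guess at the shape of the mechanism, written in Python; nothing here is SAIDA’s actual code or file format.

FRAMES_PER_SECOND = 24
SAFETY_MARGIN = 15 * FRAMES_PER_SECOND   # aim to be ready ~15 seconds early

def on_defeat(opponent_model, attacker_type, first_attack_frame):
    # Matches the observation that data is written only after a loss.
    opponent_model.setdefault(attacker_type, []).append(first_attack_frame)

def defense_deadline(opponent_model, attacker_type):
    """Frame by which the counter (e.g. turrets vs. DTs) should be ready
    next game, or None if we have never lost to this kind of attack."""
    timings = opponent_model.get(attacker_type)
    if not timings:
        return None
    return min(timings) - SAFETY_MARGIN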

Here are some of the weaknesses I see in SAIDA’s play.

  • Poor defense against unscouted early attacks, mitigated by the learning mechanism. SAIDA loses more SCVs than it should.
  • SAIDA recovers poorly from economic setbacks. It does not replenish lost SCVs as well as it should, and stops expanding after a while. If you gain an early lead, you can win by holding on and waiting for SAIDA to mine out.
  • SAIDA is vulnerable to mine drags. It sees no danger in having its spider mines and its forces next to each other. It will even place mines in its mineral line, begging you to blow up its SCVs.
  • SAIDA does not know how to build in safe locations. On some maps, like Moon Glaive, parts of the main base are easily sieged from outside. Krasi0 has won games by blasting down factories that are in range, and SAIDA keeps trying to rebuild in places that are also in range.
  • SAIDA is consistent and predictable. It varies to counter the opponent, but at heart always plays the same strategy and the same tactics. The dropships always fly along the edge.

SAIDA also has great strengths. The greatest may be the big red animated arrow that points out the main attack position. As long as SAIDA has a monopoly on big animated arrows, I think it will remain #1.

CIG 2018 - what Locutus learned

Locutus only recorded 8 games. It is configured to retain 200 game records, and I read the source code and verified that Locutus does not intentionally drop game records before the limit of 200. Recording exactly 8 games is the same problem that McRave suffered, and must be due to CIG problems. I don't know what the underlying problem was. My suspicion is that CIG organizers or tournament software may have accidentally or mistakenly cleared learning data for some bots. If that is what happened, and it happened once 8 games before the end of the tournament, it seems likely that it happened more than once. Who knows, though? The error might be somewhere else. Maybe they mistakenly shipped us data from after round 8 instead of round 125—in that case the tournament may have run normally, and only the data about it is wrong.

Locutus has prepared data for some opponents, stored in the AI directory. When Locutus finds it has no game records for a given opponent, it looks in AI to see if it has prepared data, and if so, it reads in those game records. At the end of the game, it writes out the prepared game records along with the record for the newly played game, and from then on the prepared records are treated like any others and retained unless and until the 200 record limit is passed.
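In rough Python, the seeding behavior amounts to the sketch below. The directory names follow the standard bwapi-data layout, but the file name and record format here are made-up stand-ins, not Locutus’s real format.

import os

RECORD_LIMIT = 200

def load_records(opponent, write_dir="bwapi-data/write", ai_dir="bwapi-data/AI"):
    # Prefer learned records; fall back to prepared records shipped in AI.
    for directory in (write_dir, ai_dir):
        path = os.path.join(directory, opponent + ".txt")
        if os.path.exists(path):
            with open(path) as f:
                return [line.rstrip("\n") for line in f]
    return []

def save_records(opponent, records, new_record, write_dir="bwapi-data/write"):
    # The prepared records are written back out together with the new game,
    # so from then on they are treated like any other records.
    records = (records + [new_record])[-RECORD_LIMIT:]
    with open(os.path.join(write_dir, opponent + ".txt"), "w") as f:
        f.write("\n".join(records) + "\n")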

How many other bots were affected by the 8 game problem?


Here is Locutus’s prepared data. Against some opponents, like McRave, Locutus picks out openings to avoid at first. If other openings don’t win either, I’m sure Locutus will come back and try these anyway. Against others, it picks out winners to try first. For some, it simply provides data. Most but not all of the prepared data is for opponents which were carried over from last year, for which pre-learning is sure to be helpful... if it is done on the same maps.

#3 mcrave

openinggameswins
12Nexus5ZealotFECannons10%
Turtle10%
2 openings20%

#6 iron

openinggameswins
DTDrop14100%
1 openings14100%

#7 zzzkbot

openinggameswins
ForgeExpand5GateGoon2100%
1 openings2100%

#11 ualbertabot

openinggameswins
4GateGoon10%
9-9GateDefensive250%
ForgeExpand5GateGoon1593%
3 openings1883%

#14 aiur

openinggameswins
4GateGoon3100%
9-9GateDefensive1100%
2 openings4100%

#16 ziabot

openinggameswins
9-9GateDefensive10%
ForgeExpand5GateGoon1100%
2 openings250%

#19 terranuab

openinggameswins
DTDrop10100%
1 openings10100%

#21 opprimobot

openinggameswins
DTDrop11100%
1 openings11100%

#22 sling

openinggameswins
ForgeExpand5GateGoon2100%
1 openings2100%

#23 srbotone

openinggameswins
DTDrop7100%
PlasmaProxy2Gate1100%
2 openings8100%

#24 bonjwa

openinggameswins
DTDrop6100%
PlasmaProxy2Gate1100%
2 openings7100%

overall

totalPvTPvPPvZPvR
openinggameswinsgameswinsgameswinsgameswinsgameswins
12Nexus5ZealotFECannons10% 10%
4GateGoon475% 3100% 10%
9-9GateDefensive450% 1100% 10% 250%
DTDrop48100% 48100%
ForgeExpand5GateGoon2095% 5100% 1593%
PlasmaProxy2Gate2100% 2100%
Turtle10% 10%
total8092%50100%667%683%1883%
openings played72423

Here is Locutus’s learned data. In every case, the number of games recorded is 8 plus the number of games in the prepared data. With only 8 games there is not much to go on, but the prepared data does seem to have helped Locutus choose successful openings.

#2 purplewave

openinggameswins
12Nexus5ZealotFECannons10%
4GateGoon10%
9-9GateDefensive580%
Proxy9-9Gate10%
4 openings850%

#3 mcrave

openinggameswins
12Nexus5ZealotFECannons10%
4GateGoon367%
Proxy9-9Gate5100%
Turtle10%
4 openings1070%

#4 tscmoo

openinggameswins
4GateGoon10%
9-9GateDefensive10%
ForgeExpand5GateGoon425%
Proxy9-9Gate250%
4 openings825%

#5 isamind

openinggameswins
4GateGoon683%
9-9GateDefensive1100%
Proxy9-9Gate1100%
3 openings888%

#6 iron

openinggameswins
DTDrop2295%
1 openings2295%

#7 zzzkbot

openinggameswins
ForgeExpand5GateGoon786%
ForgeExpandSpeedlots250%
Proxy9-9Gate10%
3 openings1070%

#8 microwave

openinggameswins
ForgeExpand5GateGoon8100%
1 openings8100%

#9 letabot

openinggameswins
DTDrop888%
1 openings888%

#10 megabot

openinggameswins
4GateGoon8100%
1 openings8100%

#11 ualbertabot

openinggameswins
4GateGoon10%
9-9GateDefensive250%
ForgeExpand5GateGoon2391%
3 openings2685%

#12 tyr

openinggameswins
4GateGoon8100%
1 openings8100%

#13 ecgberht

openinggameswins
DTDrop888%
1 openings888%

#14 aiur

openinggameswins
12Nexus5ZealotFECannons10%
2GateDTExpo1100%
4GateGoon580%
9-9GateDefensive1100%
Proxy9-9Gate475%
5 openings1275%

#15 titaniron

openinggameswins
DTDrop8100%
1 openings8100%

#16 ziabot

openinggameswins
9-9GateDefensive10%
ForgeExpand5GateGoon683%
ForgeExpandSpeedlots250%
Proxy9-9Gate1100%
4 openings1070%

#17 steamhammer

openinggameswins
ForgeExpand5GateGoon8100%
1 openings8100%

#18 overkill

openinggameswins
ForgeExpand5GateGoon8100%
1 openings8100%

#19 terranuab

openinggameswins
DTDrop18100%
1 openings18100%

#20 cunybot

openinggameswins
ForgeExpand5GateGoon8100%
1 openings8100%

#21 opprimobot

openinggameswins
DTDrop19100%
1 openings19100%

#22 sling

openinggameswins
ForgeExpand5GateGoon10100%
1 openings10100%

#23 srbotone

openinggameswins
DTDrop15100%
PlasmaProxy2Gate1100%
2 openings16100%

#24 bonjwa

openinggameswins
DTDrop14100%
PlasmaProxy2Gate1100%
2 openings15100%

#25 stormbreaker

openinggameswins
ForgeExpand5GateGoon8100%
1 openings8100%

#26 korean

openinggameswins
ForgeExpand5GateGoon8100%
1 openings8100%

#27 salsa

openinggameswins
ForgeExpand5GateGoon8100%
1 openings8100%

overall

totalPvTPvPPvZPvR
openinggameswinsgameswinsgameswinsgameswinsgameswins
12Nexus5ZealotFECannons30% 30%
2GateDTExpo1100% 1100%
4GateGoon3382% 3187% 20%
9-9GateDefensive1164% 786% 10% 333%
DTDrop11297% 11297%
ForgeExpand5GateGoon10693% 7997% 2781%
ForgeExpandSpeedlots450% 450%
PlasmaProxy2Gate2100% 2100%
Proxy9-9Gate1573% 1182% 250% 250%
Turtle10% 10%
total28890%11497%5480%8693%3471%
openings played102644

CIG 2018 - what Steamhammer learned

I wrote a new script to analyze Steamhammer’s learning data. A couple points:

  • Steamhammer crashed in nearly half of its games in CIG 2018. It can’t save learning data after a crash, so against some opponents Steamhammer had few opportunities to experiment. The number of crashes varied strongly depending on the opponent.
  • Steamhammer was set to remember the previous 100 games, since I figure there’s no play advantage to remembering more. The tournament was 125 rounds long. So in the tables below, “100 games” means that Steamhammer played at least 100 games without crashing, and up to 25 games may have been dropped, the early games. Against some weak opponents, Steamhammer learned, within 25 games, how to win 100% of the remaining games, and those tables give a 100% win rate for remembered games. Steamhammer did not score 100% against any opponent overall; it always had some losses in early games.

I should be able to run the same analysis for Steamhammer forks which retain Steamhammer’s opponent model file format.
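The script itself boils down to a tally like the following sketch, assuming the game records have already been reduced to (opening, won) pairs; the real records hold more detail.

from collections import defaultdict

def opening_table(game_records, window=100):
    # Only the most recent `window` games are remembered; earlier games drop off.
    recent = game_records[-window:]
    stats = defaultdict(lambda: [0, 0])   # opening -> [games, wins]
    for opening, won in recent:
        stats[opening][0] += 1
        stats[opening][1] += int(won)
    for opening in sorted(stats):
        games, wins = stats[opening]
        print(f"{opening:24} {games:4} {100 * wins // games:4}%")
    total_wins = sum(wins for _, wins in stats.values())
    print(f"{len(stats)} openings  {len(recent)} games  "
          f"{100 * total_wins // max(len(recent), 1)}%")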

#1 Locutus

openinggameswins
2HatchHydraBust10%
3HatchHydraExpo20%
3HatchLingBust10%
3HatchLingExpo10%
4HatchBeforeGas10%
OverpoolSpeed956%
6 openings1533%

A mystery is solved. Why was Steamhammer’s crash rate higher than I expected? Because many opponents learned to make Steamhammer crash. A crash for the opponent is a win, and the bot doesn’t care how it wins, so if it can learn a plan that makes the opponent crash reliably, it will. The stronger opponents tend to be learning bots, so Steamhammer crashed more often on average against strong opponents. This also means that my glib conclusion that “Steamhammer won 66% of non-crash games, so it seems to have kept up with general progress” is not sound. The non-crash games were mostly against weak opponents.

Locutus was lucky that it could figure out how to break Steamhammer. As Bruce mentioned in a comment, this Locutus version had a bug when facing certain zergling timings, and Steamhammer quickly figured out how to exploit the bug. It’s possible that Steamhammer minus the crash would have upset Locutus.

#2 PurpleWave

openinggameswins
11Gas10PoolMuta10%
3HatchHydra30%
3HatchLurker10%
4PoolSoft10%
7Pool12Hatch10%
7PoolSoft10%
9Hatch8Pool10%
9HatchExpo9Pool9Gas10%
9PoolSpeed10%
AntiFactory10%
Over10Hatch60%
Over10Hatch1Sunk70%
Over10Hatch2Sunk180%
Over10HatchBust10%
Over10HatchSlowLings40%
OverhatchMuta10%
OverpoolHatch10%
OverpoolTurtle30%
ZvP_3HatchPoolHydra20%
ZvP_4HatchPoolHydra10%
ZvT_12PoolMuta10%
ZvZ_Overpool11Gas10%
22 openings580%

PurpleWave shut out Steamhammer. It didn’t learn to make Steamhammer crash because every game was a win for it anyway. Steamhammer desperately tried alternatives all over the map, including crazy all-ins and openings intended for ZvT and ZvZ, and nothing worked.

#3 McRave

openinggameswins
11Gas10PoolLurker10%
4HatchBeforeGas10%
9HatchExpo9Pool9Gas10%
9PoolSpeed5100%
ZvP_3HatchPoolHydra20%
5 openings1050%

#4 tscmoo

openinggameswins
9PoolExpo10%
9PoolHatch10%
9PoolSunkHatch10%
AntiFact_2Hatch10%
Over10Hatch2Sunk10%
OverhatchExpoLing1315%
OverpoolSpeed2223%
7 openings4018%

#5 ISAMind

openinggameswins
3HatchHydraExpo10%
4HatchBeforeGas10%
OverpoolSpeed4100%
ZvP_2HatchMuta70%
ZvP_3HatchPoolHydra60%
5 openings1921%

#6 Iron

openinggameswins
2HatchHydra10%
3HatchLingExpo20%
4PoolHard10%
6PoolSpeed10%
9Hatch8Pool10%
9HatchMain9Pool9Gas10%
9PoolSunkSpeed10%
AntiFact_13Pool40%
AntiFact_2Hatch8312%
AntiFactory10%
Over10Hatch10%
PurpleSwarmBuild10%
ZvP_2HatchMuta10%
ZvT_12PoolMuta10%
14 openings10010%

Iron is not a learning bot, so it did not learn to crash Steamhammer. Still, these results show a weakness in Steamhammer: Its best opening against Iron is AntiFactory, which it tried only once in these 100 games. Steamhammer did not explore enough. I tried to fix the weakness in Steamhammer 2.0.

#7 ZZZKBot

openinggameswins
11Gas10PoolMuta10%
8Pool729%
9HatchMain9Pool9Gas10%
9PoolSpeed10%
OverhatchMuta10%
Overpool+110%
OverpoolSpeed10%
ZvZ_12HatchMain20%
ZvZ_12Pool10%
ZvZ_12PoolLing4858%
ZvZ_Overgas9Pool20%
ZvZ_Overpool9Gas20%
12 openings6844%

#8 Microwave

openinggameswins
9PoolSunkHatch580%
9PoolSunkSpeed2767%
OverpoolSunk10%
OverpoolTurtle333%
ZvZ_12PoolLing10%
5 openings3762%

This looks like successful learning. Too bad Steamhammer only successfully played 37 of the 125 games.

#9 LetaBot

openinggameswins
11Gas10PoolLurker10%
2HatchLurkerAllIn40%
3HatchHydraExpo10%
3HatchLurker1338%
9HatchExpo9Pool9Gas4536%
OverpoolLurker1331%
ZvP_2HatchMuta10%
ZvT_12PoolMuta10%
ZvT_13Pool10%
ZvT_3HatchMuta10%
10 openings8131%

#10 MegaBot

openinggameswins
11Gas10PoolLurker10%
3HatchHydra10%
3HatchHydraExpo10%
3HatchLingExpo2143%
Over10Hatch10%
OverhatchExpoLing1100%
ZvP_3HatchPoolHydra20%
7 openings2836%

#11 UAlbertaBot

openinggameswins
3HatchLingExpo10%
5PoolHard2Player10%
9PoolExpo10%
9PoolSpeed10%
9PoolSunkHatch4633%
9PoolSunkSpeed2948%
Over10Hatch1Sunk20%
OverpoolSpeed10%
ZvZ_Overpool9Gas10%
9 openings8335%

#12 Tyr

openinggameswins
9PoolHatch5100%
ZvP_3HatchPoolHydra50%
2 openings1050%

#13 Ecgberht

openinggameswins
11Gas10PoolLurker1050%
2HatchLurker2361%
2HatchLurkerAllIn4475%
Over10HatchBust333%
OverpoolLurker875%
OverpoolSpeed333%
ZvT_13Pool10%
7 openings9265%

#14 Aiur

openinggameswins
11Gas10PoolLurker1100%
5PoolHard2Player1100%
9PoolSunkHatch1100%
9PoolSunkSpeed2100%
Over10Hatch10%
Over10Hatch1Sunk250%
Over10Hatch2Hard1100%
Over10HatchSlowLings1100%
OverpoolSpeed2100%
OverpoolTurtle367%
10 openings1580%

#15 TitanIron

openinggameswins
3HatchLingBust10%
AntiFact_13Pool650%
AntiFact_2Hatch10%
AntiFactory7442%
Over10Hatch2Sunk10%
OverhatchExpoMuta10%
OverpoolLurker10%
ZvZ_Overgas9Pool1421%
ZvZ_Overpool9Gas10%
9 openings10037%

This selection of openings implies that TitanIron plays a factory-first build against zerg, like Iron, and is a non-learning bot, like Iron. Later I’ll look into the source and find out for sure.

#16 Ziabot

openinggameswins
11Gas10PoolMuta425%
2.5HatchMuta10%
3HatchHydraBust10%
6PoolSpeed10%
8Pool771%
9Hatch8Pool10%
9PoolHatch450%
ZvP_2HatchTurtle10%
ZvZ_12Pool10%
ZvZ_12PoolMain1625%
ZvZ_Overpool11Gas1050%
ZvZ_Overpool9Gas5374%
12 openings10056%

Low win rates against Zia and some other opponents suggest to me that Steamhammer had other new weaknesses besides crashing. I think Steamhammer should score over 80% against Zia.

#18 Overkill

openinggameswins
11Gas10PoolMuta1090%
4PoolHard2396%
6PoolSpeed28100%
9Hatch8Pool10%
OverhatchLing250%
OverpoolSpeed1392%
ZvZ_12HatchExpo250%
ZvZ_12PoolMain10%
8 openings8091%

#19 TerranUAB

openinggameswins
2HatchLurker5290%
AntiFact_13Pool888%
AntiFact_2Hatch978%
AntiFactory3190%
4 openings10089%

#20 CUNYbot

openinggameswins
11Gas10PoolMuta978%
OverhatchLing3497%
ZvZ_12PoolLing2796%
ZvZ_Overgas9Pool10%
ZvZ_Overpool9Gas1989%
5 openings9092%

#21 OpprimoBot

opening                 games   wins
11Gas10PoolLurker           3    67%
2HatchLurker                2    50%
2HatchLurkerAllIn           6    83%
6PoolSpeed                 19   100%
OverpoolLurker              1     0%
OverpoolSpeed               5    80%
ZvT_12PoolMuta             20    95%
ZvT_3HatchMuta             20   100%
ZvT_3HatchMutaExpo         24   100%
9 openings                100    94%

#22 Sling

opening                 games   wins
4PoolHard                   4    75%
4PoolSoft                   6   100%
5PoolHard2Player            3   100%
ZvZ_12HatchMain             1     0%
ZvZ_Overgas9Pool            1     0%
5 openings                 15    80%

The selection of fast rush openings suggests that Sling played a macro strategy which was countered by fast rushes. But I don’t want to draw strong conclusions based on 15 non-crash games out of 125.

#23 SRbotOne

opening                 games   wins
11Gas10PoolLurker          14    93%
2HatchLurker               10    90%
2HatchLurkerAllIn          10    90%
3HatchLurker               17   100%
4PoolSoft                  17   100%
5PoolHard                   7   100%
9HatchExpo9Pool9Gas         4    75%
9PoolLurker                 3   100%
OverpoolLurker              5   100%
9 openings                 87    95%

The wide range of lurker openings means that SRbotOne by Johan Kayser fought with mostly barracks units. Well, we already knew that.

#24 Bonjwa

opening                 games   wins
9PoolExpo                   6   100%
9PoolSunkHatch              5   100%
9PoolSunkSpeed              5   100%
AntiFact_2Hatch             3   100%
AntiFactory                 5   100%
ZvT_2HatchMuta              1   100%
6 openings                 25   100%

#25 Stormbreaker

opening                 games   wins
11Gas10PoolMuta             1   100%
4PoolHard                   1   100%
9PoolSunkHatch              8   100%
9PoolSunkSpeed              8   100%
OverhatchLing               1   100%
OverhatchMuta               7   100%
OverpoolSpeed               1   100%
OverpoolSunk                7   100%
ZvZ_12HatchExpo             2   100%
ZvZ_12HatchMain             3   100%
ZvZ_12PoolLing              1   100%
ZvZ_12PoolMain              3   100%
12 openings                43   100%

#26 Korean

opening                 games   wins
4PoolHard                   1   100%
4PoolSoft                   3   100%
5PoolHard                   5   100%
5PoolHard2Player            3   100%
5PoolSoft                   1   100%
6PoolSpeed                  6   100%
OverhatchLing               9   100%
OverhatchMuta              12   100%
ZvZ_12HatchExpo            13   100%
ZvZ_12HatchMain            16   100%
ZvZ_12PoolLing             14   100%
ZvZ_12PoolMain             17   100%
12 openings               100   100%

#27 Salsa

opening                 games   wins
4PoolHard                   2   100%
4PoolSoft                   4   100%
5PoolHard                   7   100%
5PoolHard2Player            1   100%
5PoolSoft                   1   100%
6PoolSpeed                  8   100%
OverhatchLing              11   100%
OverhatchMuta               8   100%
ZvZ_12HatchExpo            12   100%
ZvZ_12HatchMain            20   100%
ZvZ_12PoolLing             13   100%
ZvZ_12PoolMain             12   100%
ZvZ_Overgas9Pool            1   100%
13 openings               100   100%

overall

totalZvTZvPZvZZvR
openinggameswinsgameswinsgameswinsgameswinsgameswins
11Gas10PoolLurker3168% 2871% 333%
11Gas10PoolMuta2669% 10% 2572%
2.5HatchMuta10% 10%
2HatchHydra10% 10%
2HatchHydraBust10% 10%
2HatchLurker8782% 8782%
2HatchLurkerAllIn6473% 6473%
3HatchHydra40% 40%
3HatchHydraBust10% 10%
3HatchHydraExpo50% 10% 40%
3HatchLingBust20% 10% 10%
3HatchLingExpo2536% 20% 2241% 10%
3HatchLurker3171% 3073% 10%
4HatchBeforeGas30% 30%
4PoolHard3291% 10% 3194%
4PoolSoft3197% 17100% 10% 13100%
5PoolHard19100% 7100% 12100%
5PoolHard2Player989% 1100% 7100% 10%
5PoolSoft2100% 2100%
6PoolSpeed6397% 2095% 4398%
7Pool12Hatch10% 10%
7PoolSoft10% 10%
8Pool1450% 1450%
9Hatch8Pool40% 10% 10% 20%
9HatchExpo9Pool9Gas5137% 4939% 20%
9HatchMain9Pool9Gas20% 10% 10%
9PoolExpo875% 6100% 20%
9PoolHatch1070% 5100% 450% 10%
9PoolLurker3100% 3100%
9PoolSpeed862% 683% 10% 10%
9PoolSunkHatch6650% 5100% 1100% 1392% 4732%
9PoolSunkSpeed7265% 683% 2100% 3574% 2948%
AntiFact_13Pool1856% 1856%
AntiFact_2Hatch9721% 9621% 10%
AntiFactory11257% 11158% 10%
Over10Hatch90% 10% 80%
Over10Hatch1Sunk119% 911% 20%
Over10Hatch2Hard1100% 1100%
Over10Hatch2Sunk200% 10% 180% 10%
Over10HatchBust425% 333% 10%
Over10HatchSlowLings520% 520%
OverhatchExpoLing1421% 1100% 1315%
OverhatchExpoMuta10% 10%
OverhatchLing5796% 5796%
OverhatchMuta2993% 10% 2896%
Overpool+110% 10%
OverpoolHatch10% 10%
OverpoolLurker2854% 2854%
OverpoolSpeed6156% 862% 1573% 1587% 2322%
OverpoolSunk888% 888%
OverpoolTurtle933% 633% 333%
PurpleSwarmBuild10% 10%
ZvP_2HatchMuta90% 20% 70%
ZvP_2HatchTurtle10% 10%
ZvP_3HatchPoolHydra170% 170%
ZvP_4HatchPoolHydra10% 10%
ZvT_12PoolMuta2383% 2286% 10%
ZvT_13Pool20% 20%
ZvT_2HatchMuta1100% 1100%
ZvT_3HatchMuta2195% 2195%
ZvT_3HatchMutaExpo24100% 24100%
ZvZ_12HatchExpo2997% 2997%
ZvZ_12HatchMain4293% 4293%
ZvZ_12Pool20% 20%
ZvZ_12PoolLing10479% 10479%
ZvZ_12PoolMain4973% 4973%
ZvZ_Overgas9Pool1921% 1421% 520%
ZvZ_Overpool11Gas1145% 10% 1050%
ZvZ_Overpool9Gas7674% 10% 7476% 10%
total159664%68562%15526%63382%12329%
openings played6937363113

This summary table took me hours to get right, so I hope it's useful.
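For anyone who wants to build the same kind of summary from their own learning data, here is a rough aggregation sketch in Python. It assumes the games have already been parsed into (opponent race, opening, won) records; that record format and the field layout are my invention for the example, not Steamhammer's file format.

    from collections import defaultdict

    # Hypothetical parsed records: (opponent race, opening, won), one per non-crash game.
    games = [
        ("T", "2HatchLurker", True),
        ("Z", "ZvZ_12PoolLing", True),
        ("P", "Over10Hatch", False),
        # ... and so on for every game
    ]

    # counts[opening][race] = [games, wins]; race "ALL" holds the total column.
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for race, opening, won in games:
        for key in (race, "ALL"):
            counts[opening][key][0] += 1
            counts[opening][key][1] += int(won)

    # Print one row per opening: total, then ZvT, ZvP, ZvZ, ZvR.
    for opening in sorted(counts):
        row = ["%-22s" % opening]
        for key in ("ALL", "T", "P", "Z", "R"):
            n, w = counts[opening][key]
            row.append("%5d %4.0f%%" % (n, 100.0 * w / n) if n else "           ")
        print("".join(row))

The fiddly part is exactly what makes the table hard to do by hand: keeping the per-matchup and total columns in sync while cells stay blank for matchups where an opening was never played.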

Steamhammer played 69 openings in 1596 non-crash games, which is around two thirds of the openings it knows. No single matchup had more than 37 different openings. There were far more games against terran and zerg than against protoss and random, partly due to the crashing pattern. Against the random opponents (Tscmoo and UAlbertaBot), it settled on mostly general-purpose openings, as you might expect. Its best matchup was ZvZ, with a Jaedong-like 82% win rate (and lately, Jaedong crashes half the time too, so they’re just alike).

Openings that were both popular and successful include 2HatchLurker and 2HatchLurkerAllIn versus terran, 6PoolSpeed with a 97% win rate against mostly weak opponents, 9PoolSunkSpeed used across all matchups, and ZvZ specialties OverhatchLing, ZvZ_12PoolLing, and ZvZ_Overpool9Gas. None of the opening choices surprises me, though some of the win rates do.

CIG 2018 - what Overkill learned

After analyzing AIUR yesterday, I ran a similar (but much simpler) analysis for the classic zerg #18 Overkill. The version in CIG 2018 has not been updated since 2015 and is the same version that still plays on SSCAIT. In 2015 it was a sensation, placing 3rd in both CIG and AIIDE. Its 18th place in this tournament, with about a 35% win rate, shows how far the rest of the field has advanced in the past 3 years. But keep reading; Overkill appears to have been broken in this tournament. I did this analysis once before: See what Overkill learned in AIIDE 2015.

Classic Overkill knows 3 openings: a 9 pool opening which stays on one base for a long time, and 10 hatch and 12 hatch openings which aim for mutalisks first. When it chooses 9 pool, that means that the opponent is either rushing (so the 9 pool is necessary to defend) or being too greedy (which the 9 pool can exploit). Overkill counts some games twice in an attempt to learn faster, so its total game count is sometimes larger than the number of rounds in the tournament (125).
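As a rough illustration of how double-counting speeds up learning (this is a sketch of the idea, not Overkill's actual learning code, and the condition for emphasizing a game is invented), recording selected games with weight 2 moves the win-rate estimate faster and also inflates the game count past the number of rounds played:

    # opening -> [games, wins]
    record = {"NinePoolling": [0, 0]}

    def record_game(opening, won, emphasize=False):
        """Record a result; an 'emphasized' game is written twice, so the
        win-rate estimate reacts faster but the stored game count can
        exceed the number of games actually played."""
        weight = 2 if emphasize else 1
        record[opening][0] += weight
        record[opening][1] += weight * int(won)

    record_game("NinePoolling", won=True)
    record_game("NinePoolling", won=False, emphasize=True)
    games, wins = record["NinePoolling"]
    print(games, wins / games)   # 3 games on record after only 2 real games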

opponent            NinePoolling     TenHatchMuta     TwelveHatchMuta    total
                    games  wins      games  wins      games  wins        games  wins
#1 Locutus             42    0%         42    0%         41    0%          125    0%
#2 PurpleWave          43    0%         43    0%         42    0%          128    0%
#3 McRave              44    0%         44    0%         43    0%          131    0%
#4 tscmoo              40    0%         40    0%         47    2%          127    1%
#5 ISAMind             42    0%         42    0%         41    0%          125    0%
#6 Iron                54    7%         32    0%         39    3%          125    4%
#7 ZZZKBot             47    2%         39    0%         47    2%          133    2%
#8 Microwave           54    6%         35    0%         42    2%          131    3%
#9 LetaBot             52    6%         33    0%         40    2%          125    3%
#10 MegaBot            60   12%         24    0%         41    7%          125    8%
#11 UAlbertaBot        41    0%         41    0%         48    2%          130    1%
#12 Tyr                40    0%         39    0%         47    2%          126    1%
#13 Ecgberht           57   16%         24    4%         42   12%          123   12%
#14 Aiur               94   34%         14    7%         17   12%          125   28%
#15 TitanIron          36   11%         20    0%         69   16%          125   12%
#16 Ziabot             16    0%         16    0%         93   23%          125   17%
#17 Steamhammer       107   48%          7    0%         10   10%          124   42%
#19 TerranUAB          24   67%          3    0%         98   83%          125   78%
#20 CUNYbot            18   44%          6   17%        101   66%          125   61%
#21 OpprimoBot         36   67%          3    0%         86   76%          125   71%
#22 Sling              67   46%          6    0%         52   42%          125   42%
#23 SRbotOne           23   74%          4   25%         95   89%          122   84%
#24 Bonjwa             75   92%          4   25%         46   87%          125   88%
#25 Stormbreaker       70   91%          2    0%         53   87%          125   88%
#26 Korean             77   99%          2    0%         46   93%          125   95%
#27 Salsa              46  100%         32   94%         46  100%          124   98%
total                1305   36%        597    6%       1372   40%         3274   32%

The 10 hatch opening was useless in this tournament. Against every opponent it was the worst of the three choices, at best tying for worst at 0%. In 2015, 10 hatch was about as successful as the other openings.

Signs are that something was wrong with Overkill in this tournament. In AIIDE 2015, then #3 Overkill scored 23% against then #4 UAlbertaBot, 68% against #5 AIUR, and 99% against #17 OpprimoBot. In CIG 2018, it scored 1.6% against UAlbertaBot, 28% against AIUR, and 71% against OpprimoBot. The version appears to be the same in both tournaments. I didn’t look closely, but I did unpack the sources and check dates (in particular, Overkill has file change dates up to 8 October 2015 in both tournaments). Overkill had 14 crash games in CIG 2018, not enough to account for the difference. It’s hard to believe that the maps could have shifted results that much.

Tomorrow: What went wrong with Overkill?