
Steamhammer 2.1.2 test version

I’ve uploaded the next test version to SSCAIT, Steamhammer 2.1.2. The biggest change is that stuck units are less common. There is still an issue where units can occasionally freeze en masse in the late game, but it’s rare in my tests and so far I’ve only seen it when Steamhammer was winning and it didn’t matter. To fix sticking, I finished up and enabled part of Steamhammer’s new unit control infrastructure, work that has been inching forward for months. I didn’t change anything else or take any unsticking actions; the new structure usually works better by nature than the classic structure inherited from UAlbertaBot.

There’s also a fix for a building manager bug that incorrectly turned expansion hatcheries into macro hatcheries. The bug sometimes made play much worse, but also sometimes made it better, so testing is called for.

AIIDE 2018 - what CherryPi learned

Here is a table of how each CherryPi opening fared against each opponent, like the tables I made for other bots. Reading the code confirmed my inference that the learning files recorded opening build orders, not build orders switched to later in the game; see how CherryPi played.

| bot | total | 10hatchling | 2hatchmuta | 3basepoollings | 9poolspeedlingmuta | hydracheese | zve9poolspeed | zvp10hatch | zvp3hatchhydra | zvp6hatchhydra | zvpohydras | zvpomutas | zvt2baseguardian | zvt2baseultra | zvt3hatchlurker | zvtmacro | zvz12poolhydras | zvz9gas10pool | zvz9poolspeed | zvzoverpool |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| #1 saida | 13-90 13% | - | - | - | - | - | 1-19 5% | - | - | - | - | - | - | 1-15 6% | 9-37 20% | 2-19 10% | - | - | - | - |
| #3 cse | 73-30 71% | - | - | - | - | - | 0-2 0% | 24-5 83% | - | - | 16-8 67% | - | - | - | - | 33-15 69% | - | - | - | - |
| #4 bluebluesky | 89-14 86% | - | - | - | - | - | 0-1 0% | 29-8 78% | - | - | - | - | - | - | - | 60-5 92% | - | - | - | - |
| #5 locutus | 84-19 82% | - | - | 63-11 85% | - | - | - | - | - | 14-3 82% | - | 2-2 50% | - | - | - | 5-3 62% | - | - | - | - |
| #6 isamind | 99-4 96% | - | - | 1-0 100% | - | - | - | - | - | 98-4 96% | - | - | - | - | - | - | - | - | - | - |
| #7 daqin | 103-0 100% | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 103-0 100% | - | - | - | - |
| #8 mcrave | 87-16 84% | - | - | 9-2 82% | - | - | - | - | - | 31-4 89% | - | 14-4 78% | - | - | - | 33-6 85% | - | - | - | - |
| #9 iron | 97-6 94% | - | - | - | - | 97-6 94% | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| #10 zzzkbot | 93-10 90% | 58-4 94% | - | - | 0-1 0% | - | - | - | - | - | - | - | - | - | - | - | - | - | 35-4 90% | 0-1 0% |
| #11 steamhammer | 81-21 79% | 22-7 76% | - | - | - | - | 16-5 76% | - | - | - | - | - | - | - | - | - | 0-1 0% | - | 43-8 84% | - |
| #12 microwave | 94-9 91% | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 0-1 0% | 4-2 67% | 90-6 94% |
| #13 lastorder | 85-18 83% | 45-7 87% | - | - | - | - | 0-1 0% | - | - | - | - | - | - | - | - | - | - | - | - | 40-10 80% |
| #14 tyr | 98-5 95% | - | - | - | - | - | - | 98-5 95% | - | - | - | - | - | - | - | - | - | - | - | - |
| #15 metabot | 94-2 98% | - | - | - | - | - | - | - | - | - | 94-2 98% | - | - | - | - | - | - | - | - | - |
| #16 letabot | 101-2 98% | 0-1 0% | - | 97-0 100% | - | - | 1-1 50% | - | - | - | - | - | 3-0 100% | - | - | - | - | - | - | - |
| #17 arrakhammer | 92-11 89% | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 92-11 89% | - |
| #18 ecgberht | 102-1 99% | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 102-1 99% | - | - | - | - |
| #19 ualbertabot | 99-4 96% | - | - | - | 96-2 98% | - | 3-2 60% | - | - | - | - | - | - | - | - | - | - | - | - | - |
| #20 ximp | 98-5 95% | - | - | - | - | - | - | - | 1-0 100% | - | 97-5 95% | - | - | - | - | - | - | - | - | - |
| #21 cdbot | 103-0 100% | - | - | - | - | - | 96-0 100% | - | - | - | - | - | - | - | - | - | - | - | 7-0 100% | - |
| #22 aiur | 100-3 97% | - | - | - | - | - | - | - | - | - | 100-3 97% | - | - | - | - | - | - | - | - | - |
| #23 killall | 103-0 100% | 102-0 100% | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 1-0 100% |
| #24 willyt | 103-0 100% | - | 103-0 100% | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| #25 ailien | 103-0 100% | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 103-0 100% | - |
| #26 cunybot | 100-3 97% | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 100-3 97% | - |
| #27 hellbot | 103-0 100% | - | - | - | - | - | - | 31-0 100% | - | - | 72-0 100% | - | - | - | - | - | - | - | - | - |
| overall | 90% | 227-19 92% | 103-0 100% | 170-13 93% | 96-3 97% | 97-6 94% | 117-31 79% | 182-18 91% | 1-0 100% | 143-11 93% | 379-18 95% | 16-6 73% | 3-0 100% | 1-15 6% | 9-37 20% | 338-49 87% | 0-1 0% | 0-1 0% | 384-28 93% | 131-17 89% |

Look how sparse the chart is—CherryPi was highly selective about its choices. It did not try more than 4 different builds against any opponent. It makes sense to minimize the number of choices so that you don’t lose games exploring bad ones, but you have to be pretty sure that one of the choices you do try is good. Where did the selectivity come from?

The opening “hydracheese” was played only against Iron, and was the only opening played against Iron. It smelled like a hand-coded choice. Sure enough, the file source/src/models/banditconfigurations.cpp configures builds by name for 18 of the 27 entrants. A comment says that the build order switcher is turned off for the hydracheese opening only: “BOS disabled for this specific build because the model hasn’t seen it.” Here is the full set of builds configured, including defaults for those that were not hand-configured. CherryPi played only builds that were configured, but did not play all the builds that were configured; presumably it stopped when it hit a good one.
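The file name suggests the opening chooser is a multi-armed bandit, though I don't know the exact formula it uses. As a sketch of the general idea, here is a UCB1-style chooser over a configured build list; a Python stand-in with hypothetical names, not CherryPi's actual C++ code:

```python
import math

def choose_opening(stats, total_games, explore=2.0):
    """Pick an opening UCB1-style from per-build (wins, losses) records.

    `stats` maps build name -> (wins, losses). Untried builds are chosen
    first; otherwise the build with the best upper confidence bound on
    its win rate wins. A build with a strong record keeps getting picked,
    which matches "presumably it stopped when it hit a good one."
    """
    for build, (w, l) in stats.items():
        if w + l == 0:
            return build  # explore every configured build at least once
    def ucb(record):
        w, l = record
        n = w + l
        return w / n + math.sqrt(explore * math.log(total_games) / n)
    return max(stats, key=lambda b: ucb(stats[b]))

# An untried build is explored first; after that, the winner dominates.
print(choose_opening({"hydracheese": (9, 1), "zve9poolspeed": (1, 9)},
                     total_games=20))  # hydracheese
```

The exploration term shrinks as a build accumulates games, so a configured build that loses its first few tries is quickly abandoned, which would explain the many builds configured but never recorded in the table.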

| bots | builds | note |
| --- | --- | --- |
| AILien | zve9poolspeed zvz9poolspeed | returning opponents from last year |
| AIUR | zvtmacro zvpohydras zvp10hatch | |
| Arrakhammer | 10hatchling zvz9poolspeed | |
| Iron | hydracheese | |
| UAlbertaBot | zve9poolspeed 9poolspeedlingmuta | |
| Ximp | zvpohydras zvtmacro zvp3hatchhydra | |
| Microwave | zvzoverpool zvz9poolspeed zvz9gas10pool | “we have some expectations” |
| Steamhammer | zve9poolspeed zvz9poolspeed zvz12poolhydras 10hatchling | |
| ZZZKBot | 9poolspeedlingmuta 10hatchling zvz9poolspeed zvzoverpool | |
| ISAMind, Locutus, McRave, DaQin | zvtmacro zvp6hatchhydra 3basepoollings zvpomutas | |
| CUNYBot | zvzoverpoolplus1 zvz9gas10pool zvz9poolspeed | |
| HannesBredberg | zvtp1hatchlurker zvt2baseultra zvt3hatchlurker zvp10hatch | |
| LetaBot | zvtmacro 3basepoollings zvt2baseguardian zve9poolspeed 10hatchling | |
| MetaBot | zvtmacro zvpohydras zvpomutas zve9poolspeed | |
| WillyT | zvt2baseultra 12poolmuta 2hatchmuta | |
| ZvT | zvt2baseultra zvtmacro zvt3hatchlurker zve9poolspeed | defaults |
| ZvP | zve9poolspeed zvtmacro zvp10hatch zvpohydras | |
| ZvZ | 10hatchling zve9poolspeed zvz9poolspeed zvzoverpool | |
| ZvR | 10hatchling zve9poolspeed 9poolspeedlingmuta | |

I read this as pulling out all the stops to reach #1. They would have succeeded if not for SAIDA.

banditconfigurations.cpp continues and declares some properties for builds including non-opening builds. It looks like .validOpening() tells whether it can be played as an opening build, .validSwitch() tells whether the build order switcher is allowed to switch to it during the game, and .switchEnabled() tells whether the build order switcher is enabled at all.
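As a sketch of how such a chained property interface works, here is a Python stand-in with hypothetical fields (CherryPi's real code is C++, and I'm inferring the semantics from the method names):

```python
class BuildConfig:
    """Sketch of a fluent build-property record, modeled on the
    .validOpening()/.validSwitch()/.switchEnabled() calls seen in
    banditconfigurations.cpp. Field names are my invention."""

    def __init__(self, name):
        self.name = name
        self.opening = False     # may be played from the start of the game
        self.switchable = False  # the build order switcher may switch to it
        self.bos = True          # build order switcher enabled at all

    def validOpening(self):
        self.opening = True
        return self  # return self so calls can be chained

    def validSwitch(self):
        self.switchable = True
        return self

    def switchEnabled(self, on):
        self.bos = on
        return self

# "BOS disabled for this specific build because the model hasn't seen it."
hydracheese = BuildConfig("hydracheese").validOpening().switchEnabled(False)
```

Returning `self` from each setter is what lets one build be declared in a single readable line, which is presumably why the CherryPi authors chose this style for a long list of configurations.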

The build orders themselves are defined in source/src/buildorders/. I found them a little hard to read, partly because they are written in reverse order: Actions to happen first are posted last to the blackboard.
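A minimal sketch of the reversed convention, assuming the blackboard behaves like a stack (the real blackboard holds far more than strings):

```python
# Actions that should happen first are posted last, so the consumer
# pops from the end of the list. Illustrative only.
blackboard = []

def post(action):
    blackboard.append(action)

# Written in reverse, as in the CherryPi build order files:
post("zergling")       # happens last
post("spawning_pool")
post("drone")          # happens first

execution_order = [blackboard.pop() for _ in range(len(blackboard))]
print(execution_order)  # ['drone', 'spawning_pool', 'zergling']
```

Reading such a file top to bottom therefore gives you the build order backwards, which is exactly what makes the sources hard to skim.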

The opening zve9poolspeed (I read “zve” as zerg versus everything) has the most red boxes in the chart—it did poorly against more opponents than any other. It may have been a poor choice to configure for use in so many cases. In contrast, zvz9poolspeed specialized for ZvZ was successful. It gets fast mutalisks and in general has a lot more strategic detail coded into the build.

They seem to have had expectations of the zvt2baseultra build against terran. It is configured for HannesBredberg, WillyT, and the default ZvT. It was in fact only tried against SAIDA. I didn’t notice anything that tells CherryPi what order to try opening builds in. Maybe the build order switcher itself contributes, helping to choose the more likely openings first?

Steamhammer 2.1.1 test version

I’ve uploaded Steamhammer 2.1.1 to SSCAIT. It’s a test version that I don’t expect to release source for. I’ll follow a plan similar to last year’s: Upload a series of test versions as the tournament approaches, in hope of catching any new bugs and weaknesses before it is too late. The version that goes into the tournament will be whatever is ready by then, and that is the version that I’ll do the full release job for. Last year I used “1.4a1” and so on as test version numbers. This year it will be “2.1.x”.

There are quite a few changes, and they are probably not what you are expecting. I count 5 fixes for serious bugs that could easily lose games, plus mitigations to reduce 2 important weaknesses, plus a bunch of quick corrections and tweaks for less important issues that came up. There is still 1 major bug at the top of my list to fix before the tournament, and we’ll see whether I’ve introduced any awful new weaknesses—some of the changes are risky, it would not be a surprise.

I disabled Randomhammer until after the tournament. It’s not broken or anything, I just felt like turning it off in plenty of time this year. Everybody else will get in a few more last-minute games.

I changed the debug drawing options to show off a new option that I added, DrawHiddenEnemies. It draws 4 symbols to represent known enemy units and buildings that cannot be seen at the moment: An enemy that is out of sight gets a small green circle at its last seen location, or a yellow circle if it was burrowed when last seen (especially useful for spider mines and lurkers). An enemy that is out of sight and is known to be no longer at its last seen location gets a red X instead. And an enemy which is in sight and is not detected gets a larger violet circle (a faint color that I think of as representing the cloaking field) and is labeled in white with its unit type. The picture is from a game versus Iron. The yellow circles are known spider mines outside detection range, and the green circle and red X mean that at least 2 more enemies are out of sight, likely nearby.

hidden enemies symbols
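The symbol choice reduces to a small decision over a few flags per known enemy. Here it is restated as a pure Python function (illustrative only; Steamhammer's actual drawing code is C++ against BWAPI):

```python
# Decide which DrawHiddenEnemies symbol a known enemy unit gets,
# following the rules described above. Flag names are my invention.
def hidden_enemy_symbol(visible, detected, gone, was_burrowed):
    if visible:
        # In sight: draw something only if the unit is undetected
        # (cloaked or burrowed), labeled with its unit type.
        return None if detected else "violet circle + type label"
    if gone:
        return "red X"          # known to have left its last seen spot
    if was_burrowed:
        return "yellow circle"  # e.g. spider mines, lurkers
    return "green circle"       # last seen location, may still be there

print(hidden_enemy_symbol(visible=False, detected=False,
                          gone=False, was_burrowed=True))  # yellow circle
```

Note the precedence: "known to be gone" overrides the burrowed marker, since a red X replaces the circle rather than supplementing it.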

I’m not finished with AIIDE 2018. I want to analyze aspects of CherryPi and SAIDA. I’ll squeeze that in too, sooner or later.

LastOrder and its macro model - technical info

Time to dig into the details! I read the paper and some of the code to find stuff out.

LastOrder’s “macro” decisions are made by a neural network whose data size is close to 8MB—much larger than LastOrder’s code (but much smaller than CherryPi’s model data). There is room for a lot of smarts in that many bytes. The network takes in a summary description of the game state as a vector of feature values, and returns a macro action, what to build or upgrade or whatever next. The code to marshal data to and from the network is in StrategyManager.cpp.

network input

The list of network input features is initialized in the StrategyManager constructor and filled in in StrategyManager::triggerModel(). There are a lot of features. I didn’t dig into the details, but it looks as though some of the features are binary, some are counts, some are X or Y values that together give a position on the map, and a few are other numbers. They fall into these groups:

• State features. Basic information about the map and the opponent, our upgrades and economy, our own and enemy tech buildings.

• Waiting to build features. I’m not sure what these mean, but it’s something to do with production.

• “Our battle basic features” and “enemy battle basic features.” Combat units.

• “Our defend building features” and “enemy defend building features.” Static defense.

• “Killed” and “kill” features, what units of ours or the enemy’s are destroyed.

• A mass of features related to our current attack action and what the enemy has available to defend against it.

• “Our defend info” looks like places we are defending and what the enemy is attacking with.

• “Enemy defend info” looks like it locates the enemy’s static defense relative to the places we are interested in attacking.

• “Visible” gives the locations of the currently visible enemy unit types. I’m not quite sure what this means. A unit type doesn’t have an (x,y) position, and it seems as though LastOrder is making one up. It could be the location of the largest group of each unit type, or the closest unit of each type, or something. Have to read more code.
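Whatever the individual features mean, the marshaling step amounts to flattening the groups into one numeric vector for the network. A hedged sketch with made-up feature values and group sizes (the real list in triggerModel() is far longer):

```python
# Flatten grouped game-state features into a single input vector,
# in the spirit of StrategyManager::triggerModel(). Group names follow
# the post; the particular features and counts are hypothetical.
def build_feature_vector(state):
    groups = [
        state["state"],            # map, opponent, upgrades, economy
        state["waiting_to_build"],
        state["our_battle"], state["enemy_battle"],
        state["our_defend_buildings"], state["enemy_defend_buildings"],
        state["killed"], state["kill"],
    ]
    vec = []
    for g in groups:
        vec.extend(float(x) for x in g)  # binary flags, counts, x/y coords
    return vec

state = {
    "state": [1, 0, 12],             # e.g. race flags, drone count
    "waiting_to_build": [0, 2],
    "our_battle": [6], "enemy_battle": [4],
    "our_defend_buildings": [1], "enemy_defend_buildings": [3],
    "killed": [0], "kill": [2],
}
print(len(build_feature_vector(state)))  # 11
```

The important property is that the vector layout is fixed: every game state, however different, is squeezed into the same slots, which is what lets a single network consume it.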

With this much information available, sophisticated strategies are possible in principle. It’s not clear how much of this the network successfully understands and makes use of. The games I watched did not give the impression of deep understanding, but then again, we have to remember that LastOrder learned to play against 20 specific opponents. Its results against those opponents suggest that it does understand them deeply.

network output

It looks like the network output is a single macro action. Code checks whether the action is valid in the current situation and, if so, calls on the appropriate manager to carry it out. The code is full of I/O details and validation and error handling, so I might have missed something in the clutter. Also the code shows signs of having been modified over time without tying up loose ends. I imagine they experimented actively.

By the way, the 9 pool/10 hatch muta/12 hatch muta opening choices and learning code from Overkill are still there, though Overkill’s opening learning is not used.

learning setup

The learning setup uses Ape-X DQN. The term is as dense as a neutron star! Ape-X is a way to organize deep reinforcement learning; see the paper Distributed Prioritized Experience Replay by Horgan et al of Google’s DeepMind. In “DQN”, D stands for deep and as far as I’m concerned is a term of hype and means “we’re doing the cool stuff.” Q is for Q-learning, the form of reinforcement learning you use when you know what’s good (winning the game) and you have to figure out from experience a policy (that’s a technical term) to achieve it in a series of steps over time. The policy is in effect a box where you feed in the situation and it tells you what to do in that situation. What’s good is represented by a reward (given as a number) that you may receive long after the actions that earned it; that can make it hard to figure out a good policy, which is why you end up training on a cluster of 1000 machines. Finally, “N” is for the neural network that acts as the box that knows the policy.
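For concreteness, here is the tabular update that the "Q" refers to. DQN replaces the table with a network, but the target being chased is the same:

```python
# One tabular Q-learning update:
#   Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
# The bracketed term is the surprise (TD error): how much better or
# worse things went than the current policy expected.
def q_update(Q, s, a, reward, s_next, actions, alpha=0.1, gamma=0.99):
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (reward + gamma * best_next - old)

Q = {}
q_update(Q, s="behind", a="expand", reward=1.0, s_next="even",
         actions=["expand", "attack"])
print(Q[("behind", "expand")])  # 0.1
```

With alpha less than 1, a reward only nudges the estimate, so good estimates emerge from many repetitions; the discount gamma is what lets reward propagate backward to the earlier actions that earned it.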

In Ape-X, the learning system consists of a set of Actors that (in the case of LastOrder) play Brood War and record the input features and reward for each time step, plus a Learner (the paper suggests that 1 learner is enough, though you could have more) that feeds the data to a reinforcement learning algorithm. The Actors are responsible for exploring, that is, trying out variations from the current best known policy to see if any of them are improvements. The Ape-X paper suggests having different Actors explore differently so you don’t get stuck in a rut. In the case of LastOrder, the Actors play against a range of opponents. The Learner keeps track of which data points are more important to learn and feeds those in more often to speed up learning. If you hit a surprise, meaning the reward is much different than you expected (“I thought I was winning, then a nuke hit”), that’s something important to learn.
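A toy sketch of the prioritized sampling idea: transitions are drawn with probability proportional to their priority, which is set from how surprising they were. (The real Ape-X implementation uses a sum tree and importance-sampling corrections, both omitted here.)

```python
import random

# Sample a transition with probability proportional to its priority.
def sample(buffer, rng):
    total = sum(priority for priority, _ in buffer)
    r = rng.uniform(0, total)
    acc = 0.0
    for priority, transition in buffer:
        acc += priority
        if r <= acc:
            return transition
    return buffer[-1][1]  # guard against floating-point rounding

# A surprising step ("nuke hit") gets a large priority and dominates.
buffer = [(0.1, "routine step"), (5.0, "nuke hit"), (0.1, "routine step")]
rng = random.Random(0)
counts = {}
for _ in range(1000):
    t = sample(buffer, rng)
    counts[t] = counts.get(t, 0) + 1
# counts["nuke hit"] is far larger than counts for the routine steps.
```

Priorities are then refreshed after each training pass: once the network has absorbed the surprise, its TD error shrinks and the transition is replayed less often.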

LastOrder seems to have closely followed the Ape-X DQN formula from the Ape-X paper. They name the exact same set of techniques, although many other choices are possible. Presumably DeepMind knows what they’re doing.

LastOrder does not train with a reward “I won/I lost.” That’s very little information and appears long after the actions that cause it, and it would leave learning glacially slow. They use reward shaping, which means giving a more informative reward number that offers the learning system more clues about whether it is going in the right direction. They use a reward based on the current game score.
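A sketch of what score-based shaping might look like. The scale constant and the exact score terms here are my guesses, not LastOrder's:

```python
# Shaped reward: instead of a single win/loss signal at the end of the
# game, each step is rewarded with the change in game score since the
# previous step, scaled down so it doesn't swamp the terminal reward.
def shaped_reward(prev_score, score, scale=1e-4):
    return (score - prev_score) * scale

def terminal_reward(won):
    return 1.0 if won else -1.0

# Gaining score gives a positive nudge, losing score a negative one.
rewards = [shaped_reward(a, b) for a, b in [(0, 500), (500, 450)]]
```

The effect is that the agent gets feedback within seconds of an action instead of many minutes later, at the cost of a bias: it is now partly optimizing score rather than winning, which the designers have to keep small enough not to matter.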

the network itself

Following an idea from the 2015 paper Deep Recurrent Q-Learning for Partially Observable MDPs by Hausknecht and Stone, the LastOrder team layered a Long Short-Term Memory network in front of the DQN. We’ve seen LSTM before in Tscmoo (at least at one point; is it still there?). The point of the LSTM network is to remember what’s going on and more fully represent the game state, because in Brood War there is fog of war. So inputs go through the LSTM to expand the currently observed game state into some encoded approximation of all the game state that has been seen so far, then through the DQN to turn that into an action.

The LastOrder paper does not go into detail. There is not enough information in it to reproduce their network design. The Actor and Learner code is in the repo. I haven’t read it to see if it tells us everything.

Taken together it’s a little complicated, isn’t it? Not something for one hobbyist to try on their own. I think you need a team and a budget to put together something like this.

LastOrder and its macro model - general info

LastOrder (github) now has a 15-page academic paper out, Macro action selection with deep reinforcement learning in StarCraft by 6 authors including Sijia Xu as lead author. The paper does not go into great detail, but it reveals new information. It also uses a lot of technical terms without explanation, so it may be hard to follow if you don’t have the background. Also see my recent post how LastOrder played for a concrete look at its games.

I want to break my discussion into 2 parts. Today I’ll go over general information, tomorrow I’ll work through technical stuff, the network input and output and training and so on.

The name LastOrder turns out to be an ingenious reference to the character Last Order from the A Certain Magical Index fictional universe, the final clone sister. The machine learning process produces a long string of similar models which go into combat for experimental purposes, and you keep the last one. Good name!

LastOrder divides its work into 2 halves, “macro” handled by the machine learning model and “micro” handled by the rule-based code derived from Overkill. It’s a broad distinction; in Steamhammer’s 4-level abstraction terms, I would say that “macro” more or less covers strategy and operations, while “micro” covers tactics and micro. The macro model has a set of actions to build stuff, research tech, and expand to a new base, and a set of 18 attack actions which call for 3 different types of attack in each of 5 different places plus 3 “add army” actions which apparently assign units to the 3 types of attack. (The paper says 17 though it lists 18. It looks as though the mutalisk add army action is unused, maybe because mutas are added automatically.) There is also an action to do nothing.

The paper includes a table on the last page, results of a test tournament where each of the 28 AIIDE 2017 participants played 303 games against LastOrder. We get to see how LastOrder scored its advertised 83% win rate: #2 PurpleWave and #3 Iron (rankings from AIIDE 2017) won nearly all games, no doubt overwhelming the rule-based part of LastOrder so that the macro model could not help. Next Microwave scored just under 50%, XIMP scored about 32%, and all others performed worse, including #1 ZZZKBot at 1.64% win rate—9 bots scored under 2%. When LastOrder’s micro part is good enough, the macro part is highly effective.

In AIIDE 2018, #13 LastOrder scored 49%, ranking in the top half. The paper has a brief discussion on page 10. LastOrder was rolled by top finishers because the micro part could not keep up with #9 Iron and above (according to me) or #8 McRave and above (according to the authors, who know things I don’t). Learning can’t help if you’re too burned to learn. LastOrder was also put down by terrans Ecgberht and WillyT, whose play styles are not represented in the 2017 training group, which has only 4 terrans (one of which is Iron that LastOrder cannot touch).

In the discussion of future work (a mandatory part of an academic paper; the work is required to be unending), they talk briefly about how to fix the weaknesses that showed in AIIDE 2018. They mention improving the rule-based part and learning unit-level micro to address the too-burned-to-learn problem, and self-play training to address the limitations of the training opponents. Self-play is the right idea in principle, but it’s not easy. You have to play all 3 races and support all the behaviors you might face, and that’s only the starting point before making it work.

I’d like to suggest another simple idea for future work: Train each matchup separately. You lose generalization, but how much do production and attack decisions generalize between matchups? I could be wrong, but I think not much. Instead, a zerg player could train 3 models, ZvT ZvP and ZvZ, each of which takes fewer inputs and is solving an easier problem. A disadvantage is that protoss becomes relatively more difficult if you allow for mind control.

LastOrder has skills that I did not see in the games I watched. There is code for them, at least; whether it can carry out the skills successfully is a separate question. It can use hydralisks and lurkers. Most interestingly, it knows how to drop. The macro model includes an action to research drop (UpgradeOverlordLoad), an action to assign units and presumably load up for a drop (AirDropAddArmy), and actions to carry out drops in different places (AttackAirDropStart for the enemy starting base, AttackAirDropNatural, AttackAirDropOther1, AttackAirDropOther2, AttackAirDropOther3). The code to carry out drops is AirdropTactic.cpp; it seems to expect to drop either all zerglings, all hydralisks, or all lurkers, no mixed unit types. Does LastOrder use these skills at all? If anybody can point out a game, I’m interested.

Learning when to make hydras and lurkers should not be too hard. If LastOrder rarely or never uses hydras, it must be because it found another plan more effective—in games you make hydras first and then get the upgrades, so it’s easy to figure out. If it doesn’t use lurkers, maybe they didn’t help, or maybe it didn’t have any hydras around to morph after it tried researching the upgrade, because hydras were seen as useless. But still, it’s only 2 steps, it should be doable. Learning to drop is not as easy, though. To earn a reward, the agent has to select the upgrade action, the load action, and the drop action in order, each at a time when it makes sense. Doing only part of the sequence sets you back, and so does doing the whole sequence if you leave too much time between the steps, or drop straight into the enemy army, or make any severe mistake. You have to carry through accurately to get the payoff. It should be learnable, but it may take a long time and trainloads of data. I would be tempted to explicitly represent dependencies like this in some way or another, to tell the model up front the required order of events.
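One way to represent such dependencies explicitly, sketched with the drop action names from the post (the gating scheme itself is my suggestion, not anything LastOrder does):

```python
# An action is only offered to the model once its prerequisites have
# been taken, so the sequence upgrade -> load -> drop cannot be
# started in the middle or out of order.
PREREQS = {
    "AirDropAddArmy": {"UpgradeOverlordLoad"},
    "AttackAirDropStart": {"AirDropAddArmy"},
}

def valid_actions(all_actions, done):
    """Return the actions whose prerequisites are a subset of `done`."""
    return [a for a in all_actions if PREREQS.get(a, set()) <= done]

actions = ["UpgradeOverlordLoad", "AirDropAddArmy", "AttackAirDropStart"]
print(valid_actions(actions, done=set()))
# Only the upgrade is offered until it has been taken.
```

The model then never wastes exploration on impossible orderings; it only has to learn the timing of each step, which is a much smaller problem than discovering the ordering and the timing at once.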

AIIDE 2018 - how CherryPi played

Overall, the play of the AIIDE 2018 CherryPi version looks similar to last year’s CherryPi which is still playing on SSCAIT. It still has the devastating ling micro, and it still prefers to win games with a flood of low-level units. It still gets melee +1 attack even when +1 carapace seems better. (Do CherryPi’s micro skills make +1 attack better, and if so, how?) Mutalisk micro looks very similar to Tscmoo’s, with mutas individually cautious and clever and collectively lazy and uncoordinated. It can use lurkers, guardians, and ultralisks. I didn’t see defilers, even when they would have been useful.

CherryPi scouts extremely aggressively with its first 2 overlords. They stick near the enemy base and try to poke into every corner, even if the enemy is terran and can shoot them down early. It gets a clear view, which must be useful for its build order switcher. The drawback is that the overlords often die young.

I think this CherryPi looks beatable. It doesn’t have SAIDA’s wide knowledge of action and reaction. It doesn’t have Steamhammer’s knowledge of how to react to LastOrder’s excessive static defense (but usually wins anyway with a zergling flood). It sometimes ignores undefended enemy bases, preferring to attack into the enemy’s strength—or even to wait idly. Game 31245 versus Iron shows it sticking with gas units and failing at macro; it forgot its love of zerglings. It doesn’t know whether it is ahead or behind, and it doesn’t realize that when it is maxed and owns the map, it ought to attack regardless of losses. It’s strong and tricky, but it also makes mistakes. I think next year’s version had better be improved if they don’t want to be overtaken.

Here are the names of the build orders that CherryPi recorded itself as playing in its opponent learning files. One of CherryPi’s major advertised features is a learned build order switcher that can switch to a new build order on the fly. It recorded 103 build order wins/losses for each opponent (except a couple with fewer), and 103 rounds were played, so these appear to be opening build orders only rather than all build orders tried throughout each game. Presumably the openings reflect CherryPi’s intentions when it started the game. It may not have followed the initial build order to its end.

  • 10hatchling
  • 2hatchmuta
  • 3basepoollings
  • 9poolspeedlingmuta
  • hydracheese
  • zve9poolspeed
  • zvp10hatch
  • zvp3hatchhydra
  • zvp6hatchhydra
  • zvpohydras
  • zvpomutas
  • zvt2baseguardian
  • zvt2baseultra
  • zvt3hatchlurker
  • zvtmacro
  • zvz12poolhydras
  • zvz9gas10pool
  • zvz9poolspeed
  • zvzoverpool

CherryPi tried between 1 and 4 openings against each opponent. CherryPi sometimes switched away from its initial try even if it won all games (for example, against CDBot and Hellbot), so I’m not sure what the switching criterion is. But opponents that it tried 4 openings against are all ones that gave it a touch of trouble.

grep -c key *.json

AILien.json:1
Aiur.json:1
Arrakhammer.json:1
BlueBlueSky.json:3
CDBot.json:2
CSE.json:4
CUNYBot.json:1
DaQin.json:1
Ecgberht.json:1
Hellbot.json:2
ISAMind.json:2
Iron.json:1
KillAll.json:2
LastOrder.json:3
LetaBot.json:4
Locutus.json:4
McRave.json:4
MetaBot.json:1
Microwave.json:3
SAIDA.json:4
Steamhammer.json:4
Tyr.json:1
UAlbertaBot.json:2
WillyT.json:1
Ximp.json:2
ZZZKBot.json:4

The other machine learning feature advertised for CherryPi is a building placer. It was trained against human building placements and apparently takes into account some of the bot’s intentions. I recommend against training on human play (or at least exclusively on human play), because machines play differently. Teaching a bot to blindly imitate human decisions that it doesn’t understand will lead to mistakes. It’s worse than teaching a human to imitate without understanding, because the bot won’t figure things out on its own. Nevertheless, CherryPi’s building placement does seem cleaner than other bots. To me the building placement looks simple and logical, but not sophisticated like a strong human player’s. Here’s an example from a ZvZ game, game 1755. The sunken colony does not interfere with gas mining, and it is somewhat protected from zergling surrounds by the geyser, the spawning pool, and the lair itself, while remaining open for drone drills on the drone side. The spire is curiously far away; I would have fit it into the gap next to the sunken. It looks OK but a little loose, not quite optimized. (By the way, game 14742 against the same opponent has the same building layout, except that the spire is placed close.)

ZvZ building placement in game 1755

CherryPi has gained new tactical tricks. I mentioned the burrow trick where it burrows zerglings at expansion locations. So far, I haven’t seen a game where the opponent was ready for the trick; I imagine it contributed to a lot of wins, even though CherryPi sometimes researches burrow and then never uses it. (And I’m disappointed. I thought of using this trick in Steamhammer, and didn’t because I expected that bots which knew how to clear spider mines would also know how to clear burrowed zerglings. I think I was wrong!) As far as I’ve seen, CherryPi doesn’t use burrow for any other purpose (though I wouldn’t be surprised, since there are so many). CherryPi also does zergling runbys; an example is game 1406 versus SAIDA where CherryPi played an unusual and not entirely efficient gas-first 3 hatch zergling build.

CherryPi doesn’t have as many complex skills as SAIDA, but it has a good number. I doubt I saw everything it can do.

Steamhammer 2.1.1 status

Posts take time, but I am also making progress on Steamhammer. The next version will have no big features, so I’ll call it version 2.1.1. Big stuff has to wait until next year, after the tournament season. I found a second serious bug in defiler control, and now defilers consistently move to where they are wanted. It helps them swarm and plague more actively, though they still don’t do as much work as I would like. So far, 2.1.1 development version keeps track of burrowed units accurately, recognizes enemy proxies better, wards off the enemy scout worker more reliably, and has some improved macro decisions and emergency reactions, a new opening (of course), and a variety of other fixes.

There are a bunch of debilitating bugs in squad control and micro. For upcoming work, I have my eyes on 2 bugs in particular that I think cause the most frequent setbacks (rather than the most glaring blunders), the suicide pokes and the stuck units. If I get those fixed in time, I have a priority list of more stuff. If I succeed, Steamhammer will play better in nearly every game, which should show in the SSCAIT round robin phase.

note on CherryPi

I’ve been watching CherryPi’s AIIDE games. No conclusions yet, but I noticed that CherryPi likes to research burrow (not the first bot to do so) and to burrow scouting lings at expansions to watch for the enemy (I think it’s the first to do that). SAIDA appeared unready for the trick. When an SCV showed up, CherryPi did not unburrow the ling, but sent another to prevent the expansion.

AIIDE 2018 - how LastOrder played

The new bot #13 LastOrder is related to the classic Overkill by Sijia Xu, but uses a machine learning model to make certain decisions: According to the description, “all the production of unit (excluding overlord), building, upgrade, tech and trigger of attack.” The learning is entirely offline; LastOrder does not store information about its opponents between games. Tactical and micro decisions, and I think building placement, are decided by rule-based code. One survey answer says,

we train the proposed macro model against about 20 AIIDE 2017 bots on a 1000 machines cluster scheduled by Tournament manager. the final model achieve about 83% win rate on all AIIDE 2017 bots

Against the stronger AIIDE 2018 bots, LastOrder scored about 49%, good enough to land in the top half of the ranking. I think the 83% win rate is too high for LastOrder’s underlying strength; I suspect that it overfitted to its 20 opponents. I think it learned to recognize some of its training opponents by their play style, and when it sees similar signs from different bots that play differently, it reacts incorrectly to a game plan that the different bot does not follow.

I watched a bunch of games to see what kind of play LastOrder figured out for itself. LastOrder’s units are mutalisks and zerglings, sometimes with scourge; I did not see it make other units (though Overkill has hydralisk skills that it might have chosen). LastOrder’s game plan is to open safely with 9 pool, sit back for a while, watch the opponent, lay down massive static defenses when danger seems to loom, macro up lots of drones, zerglings, and mutalisks behind its ramparts, and eventually burst into action and overwhelm the opponent. Details vary, but the overall game plan seemed consistent in all the games I watched.

It’s not an objectively strong game plan, but it seems effective against many bots. LastOrder had trouble touching stronger bots, upsetting only Steamhammer, and was itself upset by Ecgberht and WillyT, which as terrans had no difficulty steamrolling static defenses. But it scored highly against most lower-ranked opponents, including LetaBot (which may have been on its panel of 20 with little change).

Game 39, LastOrder-Steamhammer (replay file), was a good example of the game plan. LastOrder countered zergling with zergling for a while, then seemed to grow bored and made 4 sunkens to hide behind—far more than necessary or useful. A little later, it seemed to predict Steamhammer’s spire timing, adding excessive spores too. Steamhammer understands in principle how to react: It makes extra drones and gets ahead in both army and economy. Steamhammer could not safely attack the heavy defenses, but it could prevent LastOrder from expanding beyond its natural and win slowly. Sure enough, LastOrder tried to expand to a third, Steamhammer caught it and sent the entire army to erase the attempt—and LastOrder exploited the play, which was strategically correct but tactically wrong, hitting Steamhammer’s natural while its forces were out of position. Steamhammer’s tactical analysis is minimal; it doesn’t realize that it should destroy the expansion attempt with a small detachment.

Game 33041, LastOrder-Tyr (replay file), is one of the games that makes me suspect that LastOrder overfitted. Watch what happens after 7:00. LastOrder scouts Tyr’s turtle cannons with a zergling. LastOrder immediately reacts by building... many spore colonies, a nonsensical action. I think LastOrder saw the cannons and concluded, “I’ve seen this play before, and I know what is coming: Carriers!” It believes it is playing against XIMP. It plays similarly in games against XIMP.

LastOrder is a super interesting experiment. It did not score high like CherryPi, but it applied reinforcement learning to a more difficult problem, and it is far more successful than Sijia Xu’s past experiments with machine learning in Overkill. Its middling result is worth something, and yet its play remains somewhat disappointing. LastOrder’s play is highly reactive, but the reactions are often poor and the bot’s range of play is narrow (a wider pool of training opponents should help). I didn’t give examples, but many games show dishearteningly weak macro and mistaken tech decisions (possibly a better training methodology is needed). The problem is not solved yet!

AIIDE 2018 - what McRave learned

McRave, like Microwave and no doubt most bots that follow more than one plan, plays different openings against different races. In each opponent’s learning file, it writes win/loss numbers for 15 strategies. Their names all start with “P” for protoss, but I have stripped out the P to make the table more readable. 4 of the strategies are unused: ZealotDrop, NZCore (sounds like “no zealot core”), Proxy99, and Proxy6. That leaves 11 active openings. The races they were used against are shown in the table. ZZCore (2 zealots before core) was played only against random.

#bottotal12Nexus1GateCorsair1GateRobo21Nexus2GateDragoon2GateExpand4GateDTExpandFFEZCoreZZCore
#1saida16-55  23%1-12 8%--7-17 29%1-12 8%--7-14 33%---
#2cherrypi15-88  15%-6-25 19%---6-25 19%2-20 9%-1-18 5%--
#3cse27-75  26%--7-19 27%--5-17 23%2-15 12%--13-24 35%-
#4bluebluesky29-74  28%--1-14 7%--2-15 12%7-18 28%--19-27 41%-
#5locutus46-56  45%--5-12 29%--15-15 50%14-15 48%--12-14 46%-
#6isamind54-49  52%--7-11 39%--4-10 29%15-14 52%--28-14 67%-
#7daqin60-43  58%--13-11 54%--4-9 31%8-10 44%--35-13 73%-
#9iron56-32  64%27-8 77%--2-7 22%18-9 67%--9-8 53%---
#10zzzkbot75-28  73%-8-7 53%---17-7 71%21-7 75%-29-7 81%--
#11steamhammer64-38  63%-9-9 50%---27-10 73%15-10 60%-13-9 59%--
#12microwave82-21  80%-0-5 0%---39-4 91%30-5 86%-13-7 65%--
#13lastorder97-6  94%-10-2 83%---17-1 94%10-2 83%-60-1 98%--
#14tyr91-10  90%--23-3 88%--7-5 58%31-1 97%--30-1 97%-
#15metabot49-46  52%--8-11 42%--16-12 57%23-14 62%--2-9 18%-
#16letabot77-15  84%12-5 71%--5-5 50%20-4 83%--40-1 98%---
#17arrakhammer102-1  99%------94-1 99%-8-0 100%--
#18ecgberht99-2  98%95-0 100%---3-1 75%--1-1 50%---
#19ualbertabot73-29  72%-----12-8 60%38-6 86%-7-7 50%-16-8 67%
#20ximp41-59  41%--8-14 36%--15-17 47%18-18 50%--0-10 0%-
#21cdbot103-0  100%------103-0 100%----
#22aiur80-21  79%--11-6 65%--13-6 68%41-3 93%--15-6 71%-
#23killall60-43  58%-3-9 25%---6-9 40%19-12 61%-32-13 71%--
#24willyt77-17  82%37-2 95%--3-6 33%23-4 85%--14-5 74%---
#25ailien86-17  83%-31-3 91%---20-5 80%5-6 45%-30-3 91%--
#26cunybot91-8  92%-26-1 96%---36-1 97%14-3 82%-15-3 83%--
#27hellbot103-0  100%---------103-0 100%-
overall-  68%172-27 86%93-61 60%83-101 45%17-35 33%65-30 68%261-176 60%510-180 74%71-29 71%208-68 75%257-118 69%16-8 67%

Unlike other bots that scored comparatively well against SAIDA—meaning they weren’t always wiped summarily from the map—McRave did not rely solely on cloaked units. The DTExpand opening scored best, but 21Nexus was nearly as successful. (McRave scored inconsistently against lower-ranked bots, though, as its author has commented.)

Every strategy came out with some good scores. But here is another analysis: Suppose the goal of the learning algorithm is to find the single most successful strategy (which is not always true—you might want to find the best mix to confuse the opponent’s learning). Leaving aside CDBot and HellBot, which McRave scored 100% against, against how many opponents was each opening the best choice? I made this table by hand, so there might be mistakes. I counted equal best as also best. The “versus” column tells which races the opening was used against.

opening        best  versus
12Nexus          3   T
1GateCorsair     2   Z
1GateRobo        0   P
21Nexus          0   T
2GateDragoon     0   T
2GateExpand      6   P, Z, R
4Gate            5   P, Z, R
DTExpand         2   T
FFE              5   Z, R
ZCore            4   P
ZZCore           0   R

The counts do not match up well with the overall winning rates. There were 4 never-best openings. This analysis does not say that they are bad openings that dragged down the score. Consider what would have happened if they had not been enabled: Their games would have been distributed among the other openings; there would have been some extra wins and some extra losses, and the ratio would depend on the distribution. 21Nexus was never best, but scored second best against SAIDA and contributed as many wins. On the other hand, the openings which were often best were definitely worth having; they were well-chosen for McRave versus this set of opponents. It could make sense to try those openings first, or more often. On the third hand, notice that the openings with the highest counts were played against the largest number of opponents. There were more bests to count! Openings versus terran scored 5 bests because there were 5 terran opponents.
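The by-hand tally is easy to automate. Here is a minimal sketch, assuming per-opponent win/loss records in a plain dict (the names and numbers are hypothetical, not McRave’s actual file format):

```python
from collections import defaultdict

def best_opening_counts(records):
    """Count, for each opening, the number of opponents against which it
    had the best win rate; equal best counts as best."""
    best = defaultdict(int)
    for opponent, openings in records.items():
        rates = {name: w / (w + l)
                 for name, (w, l) in openings.items() if w + l > 0}
        if not rates:
            continue
        top = max(rates.values())
        for name, rate in rates.items():
            if rate == top:  # ties all count as best
                best[name] += 1
    return dict(best)
```

For example, `best_opening_counts({'saida': {'DTExpand': (7, 14), '21Nexus': (7, 17)}})` credits only DTExpand, whose win rate is higher despite the equal win count.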

Plenty of similar analyses could be done. For example, you could count how often or how widely an opening scored above/below the average for each opponent: Did it make a net contribution, or the opposite? It would be another way of seeing whether the openings were well chosen for the opponents they faced.

Next I want to start watching some replays. I think I will start with LastOrder, which did all its learning offline yet held its win rate steady against the onslaught of learning bots. I’m expecting it to be interesting and sophisticated in some way.

AIIDE 2018 - what UAlbertaBot learned

UAlbertaBot played random, and its openings are chosen, not according to the opponent’s race, but according to its own race once the game starts. It has 3 protoss, 4 terran, and 4 zerg openings. Playing random gives the disadvantage of having about 1/3 as many games to figure out how to counter the opponent with each race. The countervailing advantage, of course, is that the opponent can’t predict what is coming its way.

103 rounds were played and UAlbertaBot does not deliberately drop data, so some of the totals add up to more than the 100 official rounds. UAlbertaBot also had 46 crashes, so some totals add up to less. For example, it recorded 96 games against LastOrder.

The official site doesn’t offer binaries for the bots which were carried over from last year, but this should be the 2017 version of UAlbertaBot. It has enemy-specific strategies configured for 13 opponents, of which 5 are also in this tournament: #9 Iron, #10 ZZZKBot, #16 LetaBot, #20 Ximp, and #22 Aiur. For ZZZKBot, only the protoss opening is set; for the others, all 3 races have openings set. Looking at the table, we see that UAlbertaBot did not always try all of its openings, and the blanks in the table do not always correspond to enemy-specific openings. Apparently in this UAlbertaBot version, the enemy-specific strategies act as hints rather than requirements: When available they are tried first, and when not, the default opening is tried first (ZealotRush, MarineRush, or ZerglingRush). If the first opening tried performs well enough, UAlbertaBot sticks with it.
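The hint-first behavior can be sketched roughly like this. It is a guess at the logic from the observed data, with made-up names, not UAlbertaBot’s actual code:

```python
def pick_first_opening(enemy_hint, race_default, results, threshold=0.5):
    """Try the enemy-specific opening if one is configured, else the
    per-race default, and stick with it while it performs well enough."""
    choice = enemy_hint if enemy_hint is not None else race_default
    wins, losses = results.get(choice, (0, 0))
    if wins + losses == 0 or wins / (wins + losses) >= threshold:
        return choice  # untried, or still performing well enough
    return None        # fall back to normal learning (not sketched here)
```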

#bottotalProtossTerranZerg
DTRushDragoonRushZealotRush4RaxMarinesMarineRushTankPushVultureRush2HatchHydra3HatchMuta3HatchScourgeZerglingRush
#1saida13-88  13%12-7 63%0-2 0%0-5 0%0-9 0%0-9 0%1-13 7%0-9 0%0-9 0%0-9 0%0-8 0%0-8 0%
#2cherrypi1-99  1%0-8 0%0-7 0%0-7 0%0-8 0%1-11 8%0-8 0%0-8 0%0-11 0%0-11 0%0-10 0%0-10 0%
#3cse2-99  2%0-7 0%2-14 12%0-7 0%0-11 0%0-10 0%0-10 0%0-10 0%0-8 0%0-8 0%0-7 0%0-7 0%
#4bluebluesky11-92  11%0-4 0%3-10 23%4-11 27%0-5 0%0-5 0%2-11 15%0-5 0%0-9 0%0-8 0%0-8 0%2-16 11%
#5locutus6-97  6%0-7 0%4-17 19%0-7 0%0-8 0%0-8 0%1-11 8%0-8 0%1-10 9%0-7 0%0-7 0%0-7 0%
#6isamind5-96  5%0-7 0%4-17 19%0-7 0%0-9 0%0-8 0%0-8 0%0-8 0%0-7 0%0-7 0%0-7 0%1-11 8%
#7daqin12-90  12%4-12 25%0-4 0%2-9 18%0-6 0%0-6 0%1-6 14%0-5 0%2-13 13%0-7 0%0-7 0%3-15 17%
#8mcrave29-71  29%5-12 29%1-6 14%0-5 0%0-3 0%10-13 43%1-5 17%0-3 0%2-6 25%0-3 0%0-3 0%10-12 45%
#9iron9-94  9%0-10 0%1-14 7%0-9 0%0-8 0%0-8 0%0-8 0%1-12 8%1-6 14%1-6 14%0-4 0%5-9 36%
#10zzzkbot13-87  13%0-3 0%0-3 0%13-20 39%0-9 0%0-9 0%0-9 0%0-9 0%0-7 0%0-6 0%0-6 0%0-6 0%
#11steamhammer11-92  11%0-5 0%0-5 0%8-19 30%1-10 9%0-6 0%0-6 0%0-6 0%0-7 0%0-7 0%0-7 0%2-14 12%
#12microwave20-81  20%--18-7 72%0-7 0%2-14 12%0-7 0%0-7 0%0-10 0%0-10 0%0-10 0%0-9 0%
#13lastorder4-92  4%0-6 0%0-6 0%2-12 14%2-10 17%0-5 0%0-5 0%0-5 0%0-11 0%0-11 0%0-11 0%0-10 0%
#14tyr36-61  37%5-12 29%0-4 0%0-5 0%0-2 0%3-4 43%13-7 65%1-2 33%13-15 46%0-3 0%0-3 0%1-4 20%
#15metabot35-56  38%4-5 44%6-5 55%2-4 33%1-6 14%3-9 25%1-6 14%0-3 0%0-2 0%6-3 67%3-3 50%9-10 47%
#16letabot48-44  52%11-14 44%0-3 0%2-6 25%0-2 0%1-4 20%0-2 0%4-7 36%30-6 83%---
#17arrakhammer56-41  58%--23-6 79%0-6 0%0-6 0%0-6 0%0-6 0%---33-11 75%
#18ecgberht40-56  42%9-7 56%9-8 53%1-4 20%0-2 0%0-5 0%0-2 0%6-7 46%0-3 0%0-3 0%0-3 0%15-12 56%
#20ximp38-56  40%0-2 0%7-7 50%4-5 44%0-4 0%0-4 0%9-19 32%1-6 14%--17-9 65%-
#21cdbot44-54  45%--23-4 85%0-2 0%19-15 56%0-2 0%0-2 0%0-6 0%1-9 10%0-5 0%1-9 10%
#22aiur57-45  56%35-1 97%--0-2 0%0-2 0%0-2 0%11-10 52%1-5 17%9-15 38%0-3 0%1-5 17%
#23killall73-27  73%--30-8 79%0-2 0%12-6 67%0-2 0%0-2 0%---31-7 82%
#24willyt36-55  40%3-12 20%1-8 11%0-5 0%0-4 0%0-5 0%0-4 0%10-11 48%---22-6 79%
#25ailien71-30  70%--18-11 62%16-10 62%2-4 33%0-2 0%0-2 0%---35-1 97%
#26cunybot75-15  83%--23-1 96%-30-7 81%-----22-7 76%
#27hellbot100-2  98%--33-0 100%-41-2 95%-----26-0 100%
overall-  33%88-141 38%38-140 21%206-184 53%20-145 12%124-185 40%29-161 15%34-153 18%50-151 25%17-133 11%20-121 14%219-206 52%

The DT rush caused surprising problems for SAIDA, but terran and zerg had nothing. Did playing random contribute? Does the updated current SAIDA, flame-hardened on SSCAIT, react better? The hand-chosen 2 hatch hydra also did strikingly well against LetaBot, not an obvious choice. Every opening had a plus score against some opponent, though VultureRush barely made it over. Looking across the bottom row, the default openings had the best overall results for each race—they were chosen correctly. Also, we can see that protoss was UAlbertaBot’s best race, and terran the worst; we already knew that, but here we see it in the numbers.

AIIDE 2018 - what Microwave learned

Microwave uses UCB and keeps its learning data in the same file format as UAlbertaBot, one file per opponent listing on each line an opening, a count of wins, and a count of losses. It’s a simple format that is also used outside the UAlbertaBot family. Microwave adds a twist: It does not allow the count of wins or the count of losses to exceed 10. I’m not sure what the exact update rule is without reading the code, but the effect is that only the more recent game results are remembered. It’s appropriate if the enemy is expected to be learning too, and to change its strategy rapidly so that Microwave has to keep adapting.
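The effect can be modeled with a capped update rule. This is only a guess at the mechanism, since I haven’t read the code; one rule that produces both the observed ceiling of 10 and the forgetting behavior is:

```python
CAP = 10  # observed ceiling on stored win and loss counts

def update_counts(wins, losses, won):
    """Hypothetical update: bump the counter for the new result; once it
    is at the cap, shed one of the opposite result instead, so that the
    stored ratio tracks recent games and old results fade away."""
    if won:
        if wins < CAP:
            wins += 1
        else:
            losses = max(0, losses - 1)
    else:
        if losses < CAP:
            losses += 1
        else:
            wins = max(0, wins - 1)
    return wins, losses
```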

Microwave plays different strategies against each race. Against terran it has 7, against protoss and zerg 8 each, and against random 6. UAlbertaBot was the only random opponent. The strategies partly overlap. For example, 10Hatch9Pool9gas is played against both terran and protoss, while 9HatchMain8Pool8Gas is played only against zerg. The table has big blank spaces full of unplayed strategies. Maybe I should have sorted it by race, instead of by rank?

#bottotal10Hatch9Pool9gas12Pool3HatchPoolHydra5HatchGasHydra5Pool9HatchMain8Pool8Gas9Pool9PoolExpo9PoolHatch9PoolLurker9PoolSpeed9PoolSpeedLing9PoolSunkenOverpoolOverpoolSpeedZvT_12HatchHydraZvT_12HatchLurkerZvT_12HatchMutaZvZ_Overpool11Gas
#1saida0-70  0%0-10 0%---0-10 0%-0-10 0%--0-10 0%-----0-10 0%0-10 0%0-10 0%-
#2cherrypi0-80  0%-0-10 0%--0-10 0%0-10 0%--0-10 0%-0-10 0%-0-10 0%-0-10 0%---0-10 0%
#3cse0-80  0%0-10 0%-0-10 0%0-10 0%0-10 0%------0-10 0%---0-10 0%0-10 0%0-10 0%-
#4bluebluesky0-80  0%0-10 0%-0-10 0%0-10 0%0-10 0%------0-10 0%---0-10 0%0-10 0%0-10 0%-
#5locutus1-80  1%1-10 9%-0-10 0%0-10 0%0-10 0%------0-10 0%---0-10 0%0-10 0%0-10 0%-
#6isamind0-80  0%0-10 0%-0-10 0%0-10 0%0-10 0%------0-10 0%---0-10 0%0-10 0%0-10 0%-
#7daqin0-80  0%0-10 0%-0-10 0%0-10 0%0-10 0%------0-10 0%---0-10 0%0-10 0%0-10 0%-
#8mcrave7-68  9%1-10 9%-1-10 9%0-5 0%1-8 11%------1-10 9%---1-10 9%0-5 0%2-10 17%-
#9iron0-70  0%0-10 0%---0-10 0%-0-10 0%--0-10 0%-----0-10 0%0-10 0%0-10 0%-
#10zzzkbot24-37  39%-5-8 38%--0-2 0%9-10 47%--9-10 47%-0-1 0%-0-1 0%-0-1 0%---1-4 20%
#11steamhammer57-15  79%-10-2 83%--6-7 46%1-2 33%--10-2 83%-10-0 100%-0-1 0%-10-1 91%---10-0 100%
#13lastorder24-21  53%-0-1 0%--10-2 83%0-1 0%--2-4 33%-0-1 0%-10-6 62%-1-3 25%---1-3 25%
#14tyr15-13  54%2-3 40%-0-1 0%3-4 43%10-1 91%------0-1 0%---0-1 0%0-1 0%0-1 0%-
#15metabot41-13  76%10-2 83%-8-3 73%0-1 0%0-1 0%------10-1 91%---1-2 33%2-3 40%10-0 100%-
#16letabot26-18  59%4-5 44%---1-2 33%-10-0 100%--8-5 62%-----0-1 0%0-1 0%3-4 43%-
#17arrakhammer27-22  55%-7-8 47%--10-0 100%0-1 0%--0-1 0%-3-4 43%-5-4 56%-2-3 40%---0-1 0%
#18ecgberht38-18  68%0-1 0%---10-0 100%-0-1 0%--1-2 33%-----10-7 59%10-0 100%7-7 50%-
#19ualbertabot50-10  83%----10-1 91%-0-1 0%10-0 100%10-4 71%---10-4 71%10-0 100%-----
#20ximp27-15  64%2-3 40%-0-1 0%0-1 0%0-1 0%------10-0 100%---5-6 45%0-1 0%10-2 83%-
#21cdbot46-13  78%-10-0 100%--0-1 0%1-2 33%--4-5 44%-10-3 77%-1-2 33%-10-0 100%---10-0 100%
#22aiur48-15  76%1-2 33%-10-1 91%7-5 58%0-1 0%------9-3 75%---1-2 33%10-1 91%10-0 100%-
#23killall40-5  89%-10-0 100%--0-1 0%10-0 100%--0-1 0%-10-0 100%-10-1 91%-0-1 0%---0-1 0%
#24willyt34-10  77%4-5 44%---0-1 0%-0-1 0%--0-1 0%-----10-2 83%10-0 100%10-0 100%-
#25ailien28-32  47%-9-10 47%--1-4 20%0-1 0%--3-6 33%-0-1 0%-10-2 83%-5-7 42%---0-1 0%
#26cunybot67-1  99%-10-0 100%--10-0 100%0-1 0%--10-0 100%-10-0 100%-7-0 100%-10-0 100%---10-0 100%
#27hellbot74-0  100%10-0 100%-10-0 100%6-0 100%10-0 100%------8-0 100%---10-0 100%10-0 100%10-0 100%-
overall-  42%35-101 26%61-39 61%29-66 31%16-66 20%79-113 41%21-28 43%10-23 30%10-0 100%48-43 53%9-28 24%43-20 68%38-65 37%53-31 63%10-0 100%38-26 59%38-101 27%42-82 34%62-94 40%32-20 62%

The total column tells how successful Microwave was in recent games against each opponent. You might want to compare the percentages against the overall win rates from the official crosstable; they sometimes vary curiously. When the recorded results were less successful than the total results, it suggests that Microwave may have forgotten too much (though it could be random fluctuation). For example, Microwave scored 80% against LetaBot overall, but 59% in the recent games in this table.

The overall row tells how successful each opening was in recent games. Every opening was successful against some opponents, so there were no useless strategies. The body of the table, from #10 ZZZKBot and down, is full of strong contrasts, meaning that there was a big difference between the successful and unsuccessful openings against each opponent. That suggests that learning must have been useful.

SAIDA again under threat

Another brief note: On SSCAIT, SAIDA is again threatening to topple from its #1 position. I expect it would hold #1 easily if there were no voting, but voters distort the pairings, and its top opposition has been chipping away at its dominance. SAIDA’s win rate has fallen to about 3/4, from a high over 9/10. Will SAIDA get another update soon and recover?

AIIDE 2018 - what Locutus learned

The Locutusoids have learning data only slightly different from Steamhammer’s. I have run my summarizer code for CSE, BlueBlueSky, Locutus, and ISAMind, skipping DaQin because it recorded only 1 game per opponent (which tickles a bug in my code). I am thinking of posting only the Locutus results, because the others don’t hold much extra interest. Locutus plays a wider range of openings than the others (perhaps because newer bots have to restrict their scope). CSE in particular is more in the do-one-thing-well camp. Besides, all of them had high win rates against lower-ranked opponents; they did not have much to learn. I don’t see a point in piling up data about similar players.

But if people want, I can post them all. Any requests?

Locutus is the only Locutusoid to use pre-learned data. Some of the others had their own ways of preparing for known opponents. For example, CSE is configured with several enemy-specific strategies, such as DT drop against #9 Iron.

Here is a summary of the pre-learned data used by Locutus. Locutus is configured to retain at most 200 game records per opponent, so that’s as much pre-learned data as it makes sense to give it. When you give it that much, each tournament game record added at the end causes one pre-learned record to scroll off the beginning. At the end of a 100 round tournament, half the game records are retained from the pre-learned data and half are tournament games—the pre-learned data more or less dominated tournament data for decisions during the tournament.
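The retention behavior is just a bounded queue. A sketch with Python’s `deque` (the 200-record cap is from Locutus’s configuration; the record contents here are placeholders):

```python
from collections import deque

MAX_RECORDS = 200  # Locutus's per-opponent record limit

# Start with a learning file full of pre-learned records.
records = deque(['prelearned'] * MAX_RECORDS, maxlen=MAX_RECORDS)

# Each of the ~100 tournament games appended at the end pushes one
# pre-learned record off the front.
for game_number in range(100):
    records.append(('tournament', game_number))

prelearned_left = sum(1 for r in records if r == 'prelearned')
# half the records remain pre-learned at the end of the tournament
```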

#   opponent     games  wins
7   DaQin          35   91%
9   Iron          200   93%
10  ZZZKBot       200   76%
14  Tyr           200   96%
17  Arrakhammer   200   88%
19  UAlbertaBot    71   100%
22  AIUR           51   96%
25  AILien        200   96%


Here is the final data. For every opponent that has pre-learned data, much or all of the pre-learned data is retained until the end.

#1 saida

openinggameswins
10-15GateGoon220%
10Gate25NexusFE297%
DTDrop326%
Proxy4GateGoon70%
Proxy4GateGoon2p30%
Proxy9-9Gate100%
6 openings1034%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Not fast rush10299%4%10299%3%99%0%
Proxy0%0%11%100%0%0%
Unknown11%0%0%0%0%0%


Locutus and the Locutusoids use “Not fast rush” as a catch-all: The enemy’s opening is not a fast rush, and it is not more precisely recognized than that.

#2 cherrypi

openinggameswins
ForgeExpand4Gate2Archon1916%
ForgeExpand5GateGoon555%
ForgeExpandSpeedlots166%
ProxyHeavyZealotRush617%
ProxyHeavyZealotRush2p757%
5 openings10312%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Heavy rush1313%23%3534%20%23%0%
Not fast rush8986%10%6866%7%64%0%
Unknown11%0%0%0%0%0%


Why are the successful proxy openings so little played? The “2p” version is played only on 2-player maps; the other version only on 3- and 4-player maps. Looking into the file by hand, I see that they were both successful from early in the tournament, so it’s not a matter of discovering them late. Perhaps the map size specialization interferes with the learning process? Perhaps they are deliberately little played to prevent the opponent from adapting? Have to read the code for this one. The proxy openings show similar numbers across other opponents, so it’s not a one-off. Locutus’s learning in general does not look like it concentrates hard on playing the best-performing openings.

#3 cse

openinggameswins
2GateDTExpo30%
2GateDTRush2438%
4GateGoon4630%
Proxy4GateGoon450%
Proxy4GateGoon2p862%
Proxy9-9Gate60%
ProxyHeavyZealotRush40%
ProxyHeavyZealotRush2p250%
Turtle650%
9 openings10333%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Dark templar1010%40%2827%43%10%0%
Fast rush0%0%66%0%0%0%
Heavy rush0%0%33%100%0%0%
Not fast rush9289%33%6664%29%63%0%
Unknown11%0%0%0%0%0%

#4 bluebluesky

openinggameswins
2GateDTExpo1331%
2GateDTRush743%
4GateGoon5843%
9-9GateDefensive30%
Proxy4GateGoon1100%
Proxy4GateGoon2p2100%
Proxy9-9Gate20%
ProxyHeavyZealotRush20%
ProxyHeavyZealotRush2p10%
Turtle1429%
10 openings10338%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Dark templar6058%32%5553%31%77%0%
Not fast rush3938%51%4544%49%82%0%
Proxy33%0%33%0%67%0%
Unknown11%0%0%0%0%0%

#6 isamind

openinggameswins
2GateDTRush1771%
4GateGoon6058%
9-9GateDefensive633%
Proxy4GateGoon2100%
Proxy4GateGoon2p367%
Proxy9-9Gate10%
ProxyHeavyZealotRush20%
ProxyHeavyZealotRush2p10%
Turtle1155%
9 openings10357%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Dark templar0%0%11%100%0%0%
Fast rush55%60%77%100%20%0%
Heavy rush1313%54%77%71%15%0%
Not fast rush7876%59%8583%51%85%0%
Proxy66%33%33%100%0%0%
Unknown11%100%0%0%0%0%

#7 daqin

openinggameswins
2GateDTExpo4100%
2GateDTRush25100%
4GateGoon4498%
9-9GateDefensive1968%
Proxy4GateGoon683%
Proxy4GateGoon2p1100%
Proxy9-9Gate475%
ProxyHeavyZealotRush2100%
ProxyHeavyZealotRush2p1100%
Turtle3238%
10 openings13879%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Fast rush5137%49%4130%78%31%0%
Not fast rush8662%97%9770%79%71%0%
Unknown11%100%0%0%0%0%


Locutus scored lower versus DaQin in the tournament than in the pre-learning data. It may mean that DaQin was updated in private before the tournament. You have to expect that; I assume it is why there were only 35 games in the pre-learning data.

#8 mcrave

openinggameswins
2GateDTExpo10%
2GateDTRush2767%
4GateGoon4955%
9-9GateDefensive633%
Proxy4GateGoon367%
Proxy4GateGoon2p367%
Proxy9-9Gate10%
ProxyHeavyZealotRush450%
ProxyHeavyZealotRush2p10%
Turtle825%
10 openings10353%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Dark templar22%50%22%0%0%0%
Fast rush1313%31%1212%25%8%0%
Heavy rush1515%40%66%83%7%0%
Not fast rush7270%61%8381%57%81%0%
Unknown11%0%0%0%0%0%

#9 iron

openinggameswins
10-15GateGoon580%
10Gate25NexusFE10591%
DTDrop8991%
Proxy4GateGoon1100%
4 openings20091%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Not fast rush15276%91%7437%97%39%14%
Unknown10%100%2211%91%0%0%
Wall-in4724%91%10452%87%70%0%

#10 zzzkbot

openinggameswins
ForgeExpand4Gate2Archon786%
ForgeExpand5GateGoon9794%
ForgeExpandSpeedlots8695%
ProxyHeavyZealotRush580%
ProxyHeavyZealotRush2p540%
5 openings20092%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Fast rush6332%95%10754%91%54%0%
Heavy rush8140%90%7437%93%40%0%
Not fast rush5628%93%1910%100%9%0%

#11 steamhammer

openinggameswins
ForgeExpand4Gate2Archon1100%
ForgeExpand5GateGoon10296%
2 openings10396%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Fast rush22%100%77%100%0%0%
Heavy rush3736%100%2221%100%19%0%
Hydra bust66%67%1414%93%17%0%
Not fast rush5755%96%6058%95%61%0%
Unknown11%100%0%0%0%0%

#12 microwave

openinggameswins
ForgeExpand4Gate2Archon5100%
ForgeExpand5GateGoon8394%
ForgeExpandSpeedlots1593%
3 openings10394%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Fast rush22%100%1212%100%0%0%
Heavy rush3837%95%2322%100%21%0%
Hydra bust1817%94%1616%81%11%0%
Not fast rush4443%93%5250%94%43%0%
Unknown11%100%0%0%0%0%

#13 lastorder

openinggameswins
ForgeExpand5GateGoon10398%
1 openings10398%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Heavy rush4948%100%5856%97%55%0%
Not fast rush5351%96%4544%100%43%0%
Unknown11%100%0%0%0%0%

#14 tyr

openinggameswins
12Nexus5ZealotFECannons57100%
2GateDTExpo250%
4GateGoon103100%
9-9GateDefensive667%
Proxy9-9Gate333%
ProxyHeavyZealotRush10%
Turtle2889%
7 openings20096%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Fast rush2110%86%10%100%0%0%
Heavy rush8944%100%189%89%10%0%
Not fast rush8040%95%15075%97%54%38%
Proxy63%67%10%100%0%0%
Unknown42%100%3015%90%0%0%

#15 metabot

openinggameswins
2GateDTRush35100%
4GateGoon4789%
ProxyHeavyZealotRush2100%
Turtle14100%
4 openings9895%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Dark templar1717%88%5051%90%71%0%
Fast rush1010%100%11%100%0%0%
Heavy rush22%100%77%100%50%0%
Not fast rush6869%96%4041%100%49%0%
Unknown11%100%0%0%0%0%

#16 letabot

openinggameswins
10-15GateGoon10%
10Gate25NexusFE250%
4GateGoon475%
DTDrop9696%
4 openings10393%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Heavy rush44%75%11%100%0%0%
Not fast rush4039%98%1010%90%10%0%
Unknown22%50%0%0%0%0%
Wall-in5755%93%9289%93%89%0%

#17 arrakhammer

openinggameswins
ForgeExpand4Gate2Archon1369%
ForgeExpand5GateGoon14698%
ForgeExpandSpeedlots2580%
ProxyHeavyZealotRush1155%
ProxyHeavyZealotRush2p560%
5 openings20090%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Fast rush3718%92%2814%100%3%0%
Heavy rush8241%88%9648%89%46%0%
Naked expand126%92%63%83%25%8%
Not fast rush6934%93%6934%90%38%0%
Unknown0%0%10%100%0%0%

#18 ecgberht

openinggameswins
4GateGoon53100%
DTDrop50100%
2 openings103100%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Heavy rush5351%100%8885%100%81%0%
Not fast rush4342%100%1515%100%9%0%
Unknown77%100%0%0%0%0%

#19 ualbertabot

openinggameswins
4GateGoon63100%
9-9GateDefensive5100%
ForgeExpand5GateGoon9493%
Proxy9-9Gate12100%
4 openings17496%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Dark templar63%100%63%100%17%0%
Fast rush3420%88%2011%100%18%0%
Heavy rush5532%96%3721%100%31%9%
Hydra bust106%100%95%89%30%0%
Not fast rush6839%99%9253%93%46%6%
Proxy0%0%11%100%0%0%
Unknown11%100%95%100%0%0%

#20 ximp

openinggameswins
2GateDTRush250%
4GateGoon10195%
2 openings10394%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Not fast rush5351%96%103100%94%100%0%
Unknown5049%92%0%0%0%0%

#21 cdbot

openinggameswins
9-9GateDefensive1100%
ForgeExpand5GateGoon102100%
2 openings103100%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Fast rush55%100%1010%100%0%0%
Heavy rush4342%100%3635%100%40%5%
Hydra bust0%0%22%100%0%0%
Not fast rush5351%100%4645%100%43%8%
Proxy11%100%33%100%0%0%
Unknown11%100%66%100%0%0%

#22 aiur

openinggameswins
10-15GateGoon367%
12Nexus5ZealotFE5100%
2GateDTExpo1100%
2GateDTRush4100%
4GateGoon11496%
Proxy4GateGoon3100%
Proxy9-9Gate683%
ProxyHeavyZealotRush3100%
ProxyHeavyZealotRush2p1100%
Turtle1493%
10 openings15495%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Dark templar3019%97%3120%94%33%0%
Heavy rush3925%92%5334%98%28%0%
Naked expand138%85%32%67%23%38%
Not fast rush7247%97%5536%93%44%1%
Proxy0%0%64%100%0%0%
Unknown0%0%64%100%0%0%

#23 killall

openinggameswins
ForgeExpand5GateGoon10398%
1 openings10398%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Fast rush33%100%88%100%0%0%
Heavy rush4544%98%3837%97%22%0%
Hydra bust0%0%11%100%0%0%
Not fast rush5452%98%5654%98%41%0%
Unknown11%100%0%0%0%0%

#24 willyt

openinggameswins
10-15GateGoon8100%
10Gate25NexusFE7100%
4GateGoon64100%
DTDrop21100%
Turtle3100%
5 openings103100%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Heavy rush6765%100%6462%100%69%0%
Not fast rush3534%100%3635%100%46%0%
Proxy0%0%33%100%0%0%
Unknown11%100%0%0%0%0%

#25 ailien

openinggameswins
ForgeExpand4Gate2Archon2496%
ForgeExpand5GateGoon3397%
ForgeExpandSpeedlots12898%
ProxyHeavyZealotRush1283%
ProxyHeavyZealotRush2p3100%
5 openings20097%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Heavy rush13266%98%10150%96%57%2%
Naked expand0%0%21%100%0%0%
Not fast rush6834%96%9548%98%62%0%
Unknown0%0%21%100%0%0%

#26 cunybot

openinggameswins
ForgeExpand5GateGoon93100%
1 openings93100%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Fast rush11%100%22%100%0%0%
Heavy rush4447%100%2325%100%25%2%
Not fast rush4751%100%6570%100%72%4%
Unknown11%100%33%100%0%0%

#27 hellbot

openinggameswins
2GateDTRush20100%
4GateGoon83100%
2 openings103100%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Not fast rush4948%100%103100%100%100%0%
Unknown5452%100%0%0%0%0%

overall

totalPvTPvPPvZPvR
openinggameswinsgameswinsgameswinsgameswinsgameswins
10-15GateGoon3936% 3633% 367%
10Gate25NexusFE14374% 14374%
12Nexus5ZealotFE5100% 5100%
12Nexus5ZealotFECannons57100% 57100%
2GateDTExpo2442% 2442%
2GateDTRush16179% 16179%
4GateGoon88985% 12199% 70582% 63100%
9-9GateDefensive4659% 4052% 1100% 5100%
DTDrop28885% 28885%
ForgeExpand4Gate2Archon6968% 6968%
ForgeExpand5GateGoon101192% 91792% 9493%
ForgeExpandSpeedlots27090% 27090%
Proxy4GateGoon2759% 812% 1979%
Proxy4GateGoon2p2060% 30% 1771%
Proxy9-9Gate4547% 100% 2339% 12100%
ProxyHeavyZealotRush5456% 2045% 3462%
ProxyHeavyZealotRush2p2756% 743% 2060%
Turtle13063% 3100% 12762%
total330583%61280%120877%131189%17496%
openings played1881364

AIIDE 2018 - what Steamhammer learned

In CIG, Steamhammer was broken. My findings on what Steamhammer learned in CIG 2018 are not valid, because Steamhammer rarely played the opening it thought it was playing; it played a broken version of the opening that left out drones and buildings. That is likely why the zergling rushes were successful in CIG: There was little in the build to leave out, so the build played more nearly as written. In this tournament, Steamhammer seems to have been working fine (though we’ll see when the replays come out)—well, working fine except for the usual bugs, some of which are fixed in version 2.1. Also, Steamhammer’s learning was revamped to better bamboozle opponents that tried to learn its patterns; the result is that its learning behavior is richer. I think these tables are full of interesting data.

103 rounds were played, of which 100 were official. Steamhammer is set to record at most 100 game records per opponent, so games from the first 3 rounds may have been dropped. That’s why the numbers don’t exactly match the official crosstable, even though the game totals look correct.

Steamhammer’s game records contain much more information than I can summarize in tidy little tables. This time I captured a little more of it, adding a table about the plan recognizer. For each plan that was recognized during a game, the table shows how often the plan was predicted before the game, how often it was recognized during the game, and the win rate in each of those cases. It also tries to measure the accuracy of the prediction. The plan recognizer itself is not very accurate; it often fails to recognize what is in front of it, calling the plan Unknown. The “?” column shows how often the plan was predicted and then no plan was recognized. The plan recognizer can also blow it completely and recognize the wrong plan. When the opponent plays predictably, the plan predictor is generally more accurate than the plan recognizer. When the opponent plays unpredictably, I don’t know which is more accurate! Either way, the plan prediction is more important early in the tournament; once Steamhammer has accumulated enough experience, it pays more attention to its learning data, and it doesn’t matter whether the predicted plan is good.
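The bookkeeping behind the accuracy columns can be sketched like this, using hypothetical per-game pairs of predicted and recognized plan (the real game records hold much more information):

```python
def plan_accuracy(games):
    """Tally, per predicted plan: how many games it was predicted in,
    how often the recognizer later confirmed it (the accuracy), and how
    often no plan was recognized at all (the '?' column)."""
    stats = {}
    for predicted, recognized in games:
        s = stats.setdefault(predicted,
                             {'predicted': 0, 'confirmed': 0, 'unknown': 0})
        s['predicted'] += 1
        if recognized == predicted:
            s['confirmed'] += 1     # prediction confirmed in-game
        elif recognized == 'Unknown':
            s['unknown'] += 1       # nothing recognized: the "?" case
    return stats
```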

#1 saida

openinggameswins
11Gas10PoolLurker30%
11Gas10PoolMuta10%
11HatchTurtleHydra10%
2HatchHydraBust10%
3HatchHydraExpo10%
3HatchLurker10%
4HatchBeforeGas20%
4PoolHard30%
5PoolHard10%
5PoolSoft10%
6Pool10%
7PoolSoft20%
9Hatch8Pool20%
9HatchExpo9Pool9Gas10%
9Pool10%
9PoolExpo10%
9PoolLurker812%
9PoolSpeedAllIn10%
9PoolSunkSpeed10%
AntiFact_13Pool80%
AntiFact_2Hatch120%
AntiFactory160%
AntiZeal_12Hatch20%
Over10Hatch2SunkHard10%
OverhatchLateGas10%
Overpool+110%
OverpoolHatch10%
PurpleSwarmBuild10%
Sparkle 2HatchMuta20%
ZvP_3HatchPoolHydra10%
ZvT_12PoolMuta20%
ZvT_2HatchMuta10%
ZvT_3HatchMuta10%
ZvZ_12PoolLing10%
ZvZ_Overgas9Pool10%
ZvZ_Overpool11Gas1315%
ZvZ_Overpool9Gas10%
ZvZ_OverpoolTurtle10%
38 openings1003%
planpredictedrecognizedaccuracy
countgameswinscountgameswinsgood?
Factory100100%3%9191%3%91%2%
Naked expand0%0%77%0%0%0%
Unknown0%0%22%0%0%0%


SAIDA is a good example of how Steamhammer reacts to a predictable opponent. First, it repeatedly tried its counters to the opponent’s Factory plan, the 3 “AntiFact” openings (you may call them fake news openings if you like). In this case the counters did not work; SAIDA is too strong. Then it explored more widely. Steamhammer scored 1 win with a fast lurker opening, and repeated the opening to no avail (maybe Steamhammer got lucky once, or maybe SAIDA learned the timing). It also scored a win with a ZvZ fast mutalisk opening, and repeating that did bring a second win for a total of 3 in 100 rounds. The smaller second table shows that the plan predictor was 100% accurate over the last 100 rounds in predicting SAIDA’s factory-first play, while the plan recognizer was 91% accurate and actually saw a command center first in 7 games.

#2 cherrypi

opening | games | wins
2.5HatchMuta | 1 | 0%
3HatchPoolMuta | 1 | 0%
4HatchBeforeGas | 1 | 0%
4PoolSoft | 1 | 0%
6PoolSpeed | 2 | 0%
7PoolHard | 1 | 0%
8Hatch7Pool | 1 | 0%
9Hatch8Pool | 1 | 0%
9PoolSunkSpeed | 1 | 0%
OverhatchLing | 1 | 0%
OverhatchMuta | 1 | 0%
OverpoolSpeed | 1 | 0%
OverpoolSunk | 1 | 0%
ZvP_2HatchMuta | 1 | 0%
ZvP_3BaseSpire+Den | 1 | 0%
ZvT_12PoolMuta | 1 | 0%
ZvT_3HatchMuta | 1 | 0%
ZvT_3HatchMutaExpo | 1 | 0%
ZvZ_12HatchMain | 21 | 14%
ZvZ_12PoolLing | 1 | 0%
ZvZ_12PoolMain | 3 | 0%
ZvZ_Overgas9Pool | 1 | 0%
ZvZ_Overpool9Gas | 30 | 30%
ZvZ_OverpoolTurtle | 25 | 32%
24 openings | 100 | 20%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Fast rush | 22, 22%, 14% | 1, 1%, 0% | 0%, 100%
Heavy rush | 77, 77%, 22% | 28, 28%, 25% | 35%, 61%
Naked expand | 1, 1%, 0% | 2, 2%, 0% | 0%, 0%
Unknown | 0, 0%, 0% | 69, 69%, 19% | 0%, 0%


Steamhammer sees CherryPi as a strategy switcher. I suspect that CherryPi did not actually play any fast zergling rushes, because they said they avoided risky openings, but I can’t be sure without a closer look. In any case, Steamhammer found answers and scored a respectable 20% against a much higher ranked opponent.

#3 cse

opening | games | wins
11Gas10PoolLurker | 1 | 0%
11Gas10PoolMuta | 10 | 20%
11HatchTurtleHydra | 2 | 0%
11HatchTurtleLurker | 1 | 0%
12HatchTurtle | 1 | 0%
2.5HatchMuta | 1 | 0%
2HatchHydra | 1 | 0%
2HatchHydraBust | 5 | 0%
2HatchLurkerAllIn | 1 | 0%
3HatchHydraBust | 9 | 0%
3HatchHydraExpo | 1 | 0%
3HatchLingBust | 3 | 0%
3HatchLingExpo | 1 | 0%
3HatchLurker | 2 | 0%
3HatchPoolMuta | 1 | 0%
4HatchBeforeGas | 6 | 0%
4PoolHard | 2 | 0%
5PoolHard2Player | 2 | 0%
5PoolSoft | 1 | 0%
7PoolHard | 2 | 0%
7PoolSoft | 1 | 0%
8Pool | 3 | 0%
9HatchExpo9Pool9Gas | 1 | 0%
9PoolExpo | 1 | 0%
9PoolHatch | 1 | 0%
9PoolSpeedAllIn | 2 | 0%
9PoolSpire | 2 | 0%
AntiFact_2Hatch | 1 | 0%
AntiZeal_12Hatch | 1 | 0%
Over10Hatch2SunkHard | 1 | 0%
Over10HatchBust | 2 | 0%
Over10HatchSlowLings | 2 | 0%
OverhatchExpoLing | 3 | 0%
OverhatchExpoMuta | 1 | 0%
OverhatchMuta | 1 | 0%
Overpool+1 | 1 | 0%
OverpoolHydra | 1 | 0%
OverpoolLurker | 1 | 0%
OverpoolSpeed | 2 | 0%
PurpleSwarmBuild | 1 | 0%
Sparkle 1HatchMuta | 1 | 0%
ZvP_2HatchMuta | 5 | 0%
ZvP_3BaseSpire+Den | 3 | 0%
ZvP_3HatchPoolHydra | 4 | 0%
ZvP_4HatchPoolHydra | 1 | 0%
ZvZ_12Pool | 2 | 0%
ZvZ_Overpool11Gas | 1 | 0%
ZvZ_Overpool9Gas | 1 | 0%
48 openings | 100 | 2%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Heavy rush | 0, 0%, 0% | 4, 4%, 0% | 0%, 0%
Safe expand | 19, 19%, 0% | 33, 33%, 0% | 32%, 5%
Turtle | 81, 81%, 2% | 60, 60%, 3% | 60%, 2%
Unknown | 0, 0%, 0% | 3, 3%, 0% | 0%, 0%


Steamhammer has trouble telling the difference between Safe Expand (in the protoss case, forge expand with cannons) and Turtle (hide behind cannons), because it does not scout well enough to see the natural nexus reliably. It compensates by reacting similarly in both cases. But the opponent is still seen as an unpredictable strategy switcher, so Steamhammer switches up its openings too. In this case it has more counter openings and tries each fewer times, so they are not as obvious in the table, but they do have higher counts: See 2HatchHydraBust, 3HatchHydraBust, 3HatchLingBust, 4HatchBeforeGas, ZvP_2HatchMuta, and ZvP_3BaseSpire+Den. As against SAIDA, Steamhammer scored 2 wins with a ZvZ fast mutalisk opening. I have an idea to add another exploration phase which experiments with all-in attacks like the fast mutas.

#4 bluebluesky

opening | games | wins
11Gas10PoolLurker | 2 | 0%
11Gas10PoolMuta | 1 | 0%
11HatchTurtleHydra | 2 | 0%
2.5HatchMuta | 1 | 0%
2HatchHydraBust | 5 | 0%
2HatchLurker | 1 | 0%
2HatchLurkerAllIn | 1 | 0%
3HatchHydraBust | 1 | 0%
3HatchLingBust | 1 | 0%
3HatchLingExpo | 1 | 0%
4HatchBeforeGas | 3 | 0%
4PoolSoft | 1 | 0%
5PoolHard | 1 | 0%
7PoolHard | 10 | 10%
8Pool | 1 | 0%
9HatchExpo9Pool9Gas | 18 | 11%
9HatchMain9Pool9Gas | 1 | 0%
9PoolSpeed | 3 | 0%
9PoolSpeedAllIn | 3 | 0%
AntiFact_2Hatch | 1 | 0%
Over10Hatch | 2 | 0%
Over10Hatch1Sunk | 1 | 0%
Over10Hatch2Sunk | 2 | 0%
Over10Hatch2SunkHard | 1 | 0%
OverhatchExpoLing | 2 | 0%
Overpool+1 | 1 | 0%
OverpoolHatch | 1 | 0%
OverpoolHydra | 1 | 0%
OverpoolSpeed | 1 | 0%
OverpoolTurtle | 1 | 0%
PurpleSwarmBuild | 1 | 0%
Sparkle 1HatchMuta | 1 | 0%
Sparkle 2HatchMuta | 1 | 0%
Sparkle 3HatchMuta | 1 | 0%
ZvP_2HatchMuta | 4 | 0%
ZvP_3BaseSpire+Den | 7 | 0%
ZvP_3HatchPoolHydra | 6 | 0%
ZvT_13Pool | 1 | 0%
ZvZ_Overgas11Pool | 1 | 0%
ZvZ_Overgas9Pool | 3 | 0%
ZvZ_Overpool11Gas | 2 | 0%
ZvZ_Overpool9Gas | 1 | 0%
42 openings | 100 | 3%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Heavy rush | 7, 7%, 0% | 20, 20%, 5% | 29%, 0%
Naked expand | 0, 0%, 0% | 1, 1%, 100% | 0%, 0%
Safe expand | 53, 53%, 2% | 45, 45%, 0% | 58%, 2%
Turtle | 40, 40%, 5% | 33, 33%, 3% | 45%, 0%
Unknown | 0, 0%, 0% | 1, 1%, 0% | 0%, 0%


Different all-ins took a few wins from BlueBlueSky.

#5 locutus

opening | games | wins
11Gas10PoolLurker | 2 | 0%
11HatchTurtleLurker | 1 | 0%
12HatchTurtle | 1 | 0%
2HatchHydra | 1 | 0%
2HatchHydraBust | 5 | 0%
2HatchLurker | 2 | 0%
2HatchLurkerAllIn | 2 | 0%
3HatchHydra | 1 | 0%
3HatchHydraBust | 3 | 0%
3HatchHydraExpo | 1 | 0%
3HatchLingBust | 25 | 12%
3HatchLingExpo | 2 | 0%
4PoolSoft | 1 | 0%
5PoolHard | 2 | 0%
6PoolSpeed | 1 | 0%
8Hatch7Pool | 1 | 0%
8Pool | 1 | 0%
9HatchExpo9Pool9Gas | 1 | 0%
9HatchMain9Pool9Gas | 1 | 0%
9PoolSpeed | 1 | 0%
9PoolSpeedAllIn | 1 | 0%
AntiFact_13Pool | 1 | 0%
AntiFact_2Hatch | 1 | 0%
AntiFactory | 1 | 0%
AntiZeal_12Hatch | 1 | 0%
Over10Hatch | 1 | 0%
Over10Hatch2SunkHard | 1 | 0%
OverhatchExpoMuta | 2 | 0%
OverhatchLateGas | 1 | 0%
OverpoolHydra | 1 | 0%
OverpoolSpeed | 1 | 0%
OverpoolSunk | 1 | 0%
OverpoolTurtle | 1 | 0%
PurpleSwarmBuild | 2 | 0%
Sparkle 2HatchMuta | 1 | 0%
Sparkle 3HatchMuta | 1 | 0%
ZvP_2HatchMuta | 5 | 0%
ZvP_3BaseSpire+Den | 4 | 0%
ZvP_3HatchPoolHydra | 5 | 0%
ZvP_Overpool3Hatch | 1 | 0%
ZvT_12PoolMuta | 4 | 0%
ZvT_13Pool | 1 | 0%
ZvT_2HatchMuta | 1 | 0%
ZvT_3HatchMuta | 1 | 0%
ZvZ_12Pool | 1 | 0%
ZvZ_12PoolLing | 1 | 0%
ZvZ_12PoolMain | 1 | 0%
ZvZ_Overgas9Pool | 1 | 0%
ZvZ_Overpool9Gas | 1 | 0%
49 openings | 100 | 3%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Heavy rush | 0, 0%, 0% | 4, 4%, 25% | 0%, 0%
Safe expand | 62, 62%, 3% | 55, 55%, 0% | 60%, 0%
Turtle | 38, 38%, 3% | 41, 41%, 5% | 50%, 0%

#6 isamind

opening | games | wins
11Gas10PoolLurker | 1 | 0%
11Gas10PoolMuta | 1 | 0%
2.5HatchMuta | 1 | 0%
2HatchHydra | 1 | 0%
2HatchHydraBust | 6 | 0%
2HatchLurker | 1 | 0%
3HatchHydra | 1 | 0%
3HatchHydraBust | 5 | 0%
3HatchLingBust | 5 | 0%
4HatchBeforeGas | 3 | 0%
4PoolHard | 1 | 0%
4PoolSoft | 2 | 0%
5PoolHard2Player | 1 | 0%
5PoolSoft | 1 | 0%
7PoolHard | 11 | 18%
7PoolMid | 1 | 0%
7PoolSoft | 1 | 0%
8Hatch7Pool | 1 | 0%
8Pool | 1 | 0%
9HatchExpo9Pool9Gas | 3 | 0%
9HatchMain9Pool9Gas | 1 | 0%
9PoolSpeed | 1 | 0%
9PoolSunkHatch | 1 | 0%
AntiFact_13Pool | 1 | 0%
AntiZeal_12Hatch | 1 | 0%
Over10Hatch | 1 | 0%
Over10Hatch1Sunk | 2 | 0%
Over10Hatch2Sunk | 1 | 0%
Over10Hatch2SunkHard | 1 | 0%
Over10HatchSlowLings | 1 | 0%
OverhatchExpoLing | 3 | 0%
OverpoolHatch | 8 | 12%
OverpoolHydra | 1 | 0%
OverpoolLurker | 2 | 0%
OverpoolSpeed | 2 | 0%
PurpleSwarmBuild | 1 | 0%
ZvP_2HatchMuta | 2 | 0%
ZvP_3BaseSpire+Den | 4 | 0%
ZvP_3HatchPoolHydra | 6 | 17%
ZvP_Overpool3Hatch | 3 | 0%
ZvT_2HatchMuta | 4 | 0%
ZvT_3HatchMutaExpo | 1 | 0%
ZvZ_12HatchMain | 1 | 0%
ZvZ_12PoolMain | 1 | 0%
ZvZ_Overpool11Gas | 1 | 0%
ZvZ_OverpoolTurtle | 1 | 0%
46 openings | 100 | 4%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Heavy rush | 17, 17%, 12% | 14, 14%, 14% | 65%, 6%
Proxy | 2, 2%, 0% | 2, 2%, 0% | 0%, 0%
Safe expand | 62, 62%, 3% | 47, 47%, 2% | 47%, 5%
Turtle | 19, 19%, 0% | 33, 33%, 3% | 26%, 0%
Unknown | 0, 0%, 0% | 4, 4%, 0% | 0%, 0%

#7 daqin

opening | games | wins
11Gas10PoolMuta | 8 | 12%
2HatchHydra | 2 | 0%
2HatchHydraBust | 5 | 0%
2HatchLurkerAllIn | 5 | 0%
3HatchHydra | 2 | 0%
3HatchHydraBust | 3 | 0%
3HatchHydraExpo | 2 | 0%
3HatchLing | 1 | 0%
3HatchLingBust | 4 | 0%
3HatchLingExpo | 1 | 0%
4HatchBeforeGas | 4 | 0%
4PoolSoft | 1 | 0%
5PoolHard2Player | 2 | 0%
6PoolSpeed | 3 | 0%
8Hatch7Pool | 1 | 0%
9HatchExpo9Pool9Gas | 1 | 0%
9PoolHatch | 2 | 0%
9PoolSpeedAllIn | 3 | 0%
9PoolSpire | 1 | 0%
9PoolSunkHatch | 3 | 0%
9PoolSunkSpeed | 2 | 0%
AntiFact_13Pool | 1 | 0%
AntiFact_2Hatch | 2 | 0%
AntiZeal_12Hatch | 1 | 0%
Over10Hatch1Sunk | 2 | 0%
Over10Hatch2Sunk | 3 | 0%
OverhatchExpoLing | 1 | 0%
OverhatchExpoMuta | 4 | 0%
OverhatchLateGas | 1 | 0%
OverhatchLing | 1 | 0%
OverpoolHatch | 1 | 0%
OverpoolHydra | 2 | 0%
OverpoolLurker | 1 | 0%
OverpoolSpeed | 4 | 0%
OverpoolSunk | 1 | 0%
OverpoolTurtle | 1 | 0%
Sparkle 1HatchMuta | 2 | 0%
ZvP_2HatchMuta | 2 | 0%
ZvP_3BaseSpire+Den | 3 | 0%
ZvP_3HatchPoolHydra | 2 | 0%
ZvP_4HatchPoolHydra | 1 | 0%
ZvT_12PoolMuta | 1 | 0%
ZvT_3HatchMutaExpo | 1 | 0%
ZvZ_12HatchExpo | 1 | 0%
ZvZ_12HatchMain | 1 | 0%
ZvZ_12PoolLing | 1 | 0%
ZvZ_Overgas11Pool | 1 | 0%
ZvZ_OverpoolTurtle | 2 | 0%
48 openings | 100 | 1%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Heavy rush | 0, 0%, 0% | 3, 3%, 0% | 0%, 0%
Proxy | 10, 10%, 0% | 16, 16%, 0% | 0%, 0%
Safe expand | 35, 35%, 0% | 34, 34%, 0% | 29%, 6%
Turtle | 55, 55%, 2% | 41, 41%, 2% | 40%, 7%
Unknown | 0, 0%, 0% | 6, 6%, 0% | 0%, 0%

#8 mcrave

opening | games | wins
11HatchTurtleHydra | 12 | 50%
2HatchHydra | 11 | 36%
2HatchLurker | 2 | 50%
2HatchLurkerAllIn | 1 | 0%
3HatchHydraBust | 7 | 43%
3HatchLing | 2 | 0%
3HatchLingBust | 1 | 0%
AntiZeal_12Hatch | 2 | 0%
Over10Hatch2Hard | 1 | 0%
Over10HatchBust | 1 | 0%
OverhatchLateGas | 23 | 30%
ZvP_3HatchPoolHydra | 13 | 23%
ZvP_Overpool3Hatch | 1 | 0%
ZvT_12PoolMuta | 1 | 0%
ZvZ_OverpoolTurtle | 22 | 64%
15 openings | 100 | 38%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Heavy rush | 91, 91%, 37% | 51, 51%, 25% | 54%, 31%
Safe expand | 8, 8%, 38% | 11, 11%, 45% | 0%, 62%
Turtle | 1, 1%, 100% | 5, 5%, 20% | 0%, 0%
Unknown | 0, 0%, 0% | 33, 33%, 58% | 0%, 0%


The plan predictor struggled to predict what McRave was going to do next, but learning worked well anyway—eventually. The ZvZ_OverpoolTurtle choice is a big surprise, an opening that builds 3 sunkens and gets fast mutalisks on one base. The opening is sound only against certain all-in zerg strategies; protoss really ought to smash it. I’m guessing it worked against a zealot rush where McRave was slow to switch tech when the mutas showed up.

#9 iron

opening | games | wins
12HatchTurtle | 1 | 0%
2.5HatchMuta | 1 | 0%
3HatchPoolMuta | 9 | 11%
9PoolExpo | 8 | 25%
9PoolSunkHatch | 1 | 0%
AntiFact_13Pool | 35 | 23%
AntiFact_2Hatch | 2 | 0%
AntiFactory | 1 | 0%
AntiZeal_12Hatch | 1 | 0%
OverpoolLurker | 1 | 0%
OverpoolSpeed | 1 | 0%
OverpoolSunk | 1 | 0%
ZvP_4HatchPoolHydra | 1 | 0%
ZvZ_12PoolMain | 1 | 0%
ZvZ_Overgas11Pool | 14 | 50%
ZvZ_Overpool9Gas | 22 | 45%
16 openings | 100 | 28%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Factory | 100, 100%, 28% | 91, 91%, 29% | 91%, 7%
Turtle | 0, 0%, 0% | 2, 2%, 0% | 0%, 0%
Unknown | 0, 0%, 0% | 7, 7%, 29% | 0%, 0%


When I run matches locally against Iron, Steamhammer soon settles on AntiFactory as the most reliable answer, and that does seem best. For some reason, Steamhammer behaved differently in both CIG and AIIDE. It is astonishing that ZvZ fast mutalisk openings came out on top again. Exactly as against SAIDA, the plan predictor was 100% accurate while the plan recognizer was 91% accurate.

#10 zzzkbot

opening | games | wins
3HatchHydraBust | 1 | 0%
4PoolHard | 1 | 0%
9PoolSpeedAllIn | 14 | 79%
9PoolSunkHatch | 22 | 32%
OverhatchExpoLing | 1 | 0%
OverhatchLing | 1 | 0%
OverpoolSunk | 21 | 38%
ZvP_3HatchPoolHydra | 1 | 0%
ZvP_4HatchPoolHydra | 1 | 0%
ZvZ_Overgas9Pool | 25 | 44%
ZvZ_Overpool11Gas | 5 | 20%
ZvZ_Overpool9Gas | 1 | 0%
ZvZ_OverpoolTurtle | 6 | 17%
13 openings | 100 | 39%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Fast rush | 77, 77%, 42% | 21, 21%, 57% | 22%, 75%
Heavy rush | 14, 14%, 21% | 2, 2%, 0% | 0%, 86%
Turtle | 9, 9%, 44% | 2, 2%, 100% | 22%, 56%
Unknown | 0, 0%, 0% | 75, 75%, 33% | 0%, 0%


9PoolSunkHatch and OverpoolSunk are anti-rush openings, and 9PoolSpeedAllIn is general-purpose but good against rushes. In contrast, ZvZ_Overgas9Pool is a fast mutalisk opening and can be overrun by too many zerglings. I don’t know how accurate the plan predictions are, but they agree fairly well with the selected openings.

#12 microwave

opening | games | wins
11Gas10PoolMuta | 28 | 32%
3HatchHydraBust | 1 | 0%
3HatchLing | 1 | 0%
3HatchLingExpo | 1 | 0%
3HatchLurker | 1 | 0%
4PoolSoft | 12 | 17%
5PoolHard2Player | 1 | 0%
9HatchMain9Pool9Gas | 2 | 0%
9PoolSpeed | 1 | 0%
9PoolSpeedAllIn | 1 | 0%
9PoolSunkSpeed | 2 | 0%
AntiFact_2Hatch | 1 | 0%
OverhatchLing | 2 | 0%
OverpoolSunk | 4 | 25%
ZvZ_12HatchMain | 2 | 0%
ZvZ_12PoolLing | 1 | 0%
ZvZ_12PoolMain | 2 | 0%
ZvZ_Overgas9Pool | 2 | 0%
ZvZ_Overpool11Gas | 10 | 20%
ZvZ_Overpool9Gas | 23 | 39%
ZvZ_OverpoolTurtle | 2 | 0%
21 openings | 100 | 23%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Fast rush | 15, 15%, 27% | 10, 10%, 50% | 13%, 53%
Heavy rush | 42, 42%, 17% | 20, 20%, 40% | 14%, 45%
Naked expand | 43, 43%, 28% | 21, 21%, 5% | 21%, 49%
Turtle | 0, 0%, 0% | 1, 1%, 0% | 0%, 0%
Unknown | 0, 0%, 0% | 48, 48%, 19% | 0%, 0%


Microwave really mixed things up, and it was successful! Steamhammer could not predict the opening switches. It’s interesting that when Steamhammer predicted a fast rush, it won a quarter of the time, and when it actually recognized a fast rush, it won half the time. That doesn’t tell us what actually happened in the games. When Steamhammer recognizes a fast rush, it can react no matter what opening it is playing, and often save itself. When it is rushed and doesn’t recognize it, it will lose unless it is playing a safe opening.

#13 lastorder

opening | games | wins
3HatchLingBust | 12 | 33%
4PoolHard | 1 | 0%
4PoolSoft | 21 | 29%
6PoolSpeed | 1 | 0%
AntiFactory | 1 | 0%
Over10Hatch | 1 | 0%
Over10Hatch1Sunk | 4 | 25%
OverhatchLing | 2 | 0%
OverhatchMuta | 7 | 29%
PurpleSwarmBuild | 1 | 0%
ZvP_3HatchPoolHydra | 1 | 0%
ZvT_3HatchMutaExpo | 6 | 33%
ZvZ_12HatchMain | 13 | 31%
ZvZ_12PoolLing | 5 | 20%
ZvZ_12PoolMain | 5 | 0%
ZvZ_Overpool11Gas | 17 | 35%
ZvZ_OverpoolTurtle | 2 | 0%
17 openings | 100 | 26%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Heavy rush | 100, 100%, 26% | 77, 77%, 25% | 77%, 14%
Naked expand | 0, 0%, 0% | 3, 3%, 0% | 0%, 0%
Turtle | 0, 0%, 0% | 6, 6%, 17% | 0%, 0%
Unknown | 0, 0%, 0% | 14, 14%, 43% | 0%, 0%


LastOrder did not learn during the tournament and played predictably, yet Steamhammer struggled to find an answer. We also know that LastOrder learned extensively offline before the tournament. Knowing that, and looking at these tables (check out the variety of recognized plans and the variety of Steamhammer’s more successful openings), I get the impression that LastOrder is highly adaptive and knows how to react in a wide variety of situations. I guess we’ll see when the replays come out.

#14 tyr

opening | games | wins
2HatchHydraBust | 13 | 38%
2HatchLurkerAllIn | 14 | 43%
3HatchHydraExpo | 38 | 76%
4HatchBeforeGas | 2 | 0%
4PoolHard | 4 | 25%
9PoolSunkSpeed | 1 | 0%
Over10Hatch2Hard | 1 | 0%
Over10HatchBust | 1 | 0%
OverpoolLurker | 7 | 29%
OverpoolSpeed | 5 | 100%
ZvP_3BaseSpire+Den | 13 | 62%
ZvP_3HatchPoolHydra | 1 | 0%
12 openings | 100 | 56%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Heavy rush | 39, 39%, 56% | 45, 45%, 78% | 41%, 3%
Naked expand | 0, 0%, 0% | 1, 1%, 100% | 0%, 0%
Turtle | 61, 61%, 56% | 50, 50%, 32% | 48%, 5%
Unknown | 0, 0%, 0% | 4, 4%, 100% | 0%, 0%


These numbers say that anything which helps Steamhammer find the right answers early, without having to do so much random exploration, would be a big win in a long tournament. The plan recognizer is not good enough.

#15 metabot

opening | games | wins
11Gas10PoolLurker | 2 | 50%
11HatchTurtleHydra | 6 | 83%
12HatchTurtle | 3 | 67%
2HatchLurkerAllIn | 3 | 67%
3HatchHydraExpo | 1 | 0%
3HatchLing | 11 | 82%
3HatchLingExpo | 10 | 60%
4PoolHard | 1 | 0%
6PoolSpeed | 2 | 100%
9HatchExpo9Pool9Gas | 8 | 50%
9PoolHatch | 3 | 67%
9PoolSpeedAllIn | 2 | 50%
AntiZeal_12Hatch | 1 | 0%
Over10Hatch | 2 | 50%
Over10Hatch2Hard | 1 | 100%
Over10Hatch2Sunk | 3 | 0%
OverhatchExpoLing | 8 | 62%
OverhatchExpoMuta | 14 | 43%
OverhatchLateGas | 4 | 25%
OverpoolSpeed | 4 | 75%
ZvP_2HatchMuta | 2 | 50%
21 openings | 91 | 57%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Heavy rush | 34, 37%, 65% | 19, 21%, 68% | 21%, 41%
Naked expand | 3, 3%, 33% | 3, 3%, 100% | 0%, 33%
Safe expand | 34, 37%, 56% | 20, 22%, 45% | 21%, 38%
Turtle | 19, 21%, 47% | 13, 14%, 46% | 11%, 42%
Unknown | 1, 1%, 100% | 36, 40%, 58% | 0%, 0%


It must have been a crazy learning duel! Later I’ll try to figure out what MetaBot learned, and we can check the two sides’ learning records against each other.

#16 letabot

opening | games | wins
12HatchTurtle | 2 | 0%
3HatchLing | 1 | 0%
6PoolSpeed | 11 | 64%
9HatchExpo9Pool9Gas | 6 | 33%
9PoolLurker | 45 | 82%
OverpoolHatch | 7 | 71%
OverpoolLurker | 28 | 82%
7 openings | 100 | 74%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Heavy rush | 99, 99%, 74% | 59, 59%, 78% | 59%, 20%
Safe expand | 0, 0%, 0% | 4, 4%, 50% | 0%, 0%
Turtle | 1, 1%, 100% | 17, 17%, 76% | 0%, 0%
Unknown | 0, 0%, 0% | 20, 20%, 65% | 0%, 0%

#17 arrakhammer

opening | games | wins
2HatchLurkerAllIn | 1 | 0%
4PoolHard | 22 | 68%
6PoolSpeed | 52 | 75%
7Pool12Hatch | 1 | 0%
9HatchMain9Pool9Gas | 1 | 0%
9PoolSpeedAllIn | 1 | 0%
AntiFactory | 1 | 0%
Over10Hatch2SunkHard | 1 | 0%
Over10HatchBust | 1 | 0%
Over10HatchSlowLings | 1 | 0%
OverhatchExpoMuta | 1 | 0%
OverhatchLing | 1 | 0%
OverpoolHydra | 1 | 0%
ZvZ_12HatchMain | 1 | 0%
ZvZ_12PoolLing | 1 | 0%
ZvZ_12PoolMain | 2 | 0%
ZvZ_Overpool11Gas | 11 | 36%
17 openings | 100 | 58%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Heavy rush | 99, 99%, 58% | 78, 78%, 65% | 78%, 1%
Naked expand | 1, 1%, 100% | 21, 21%, 29% | 0%, 0%
Unknown | 0, 0%, 0% | 1, 1%, 100% | 0%, 0%


This old version of Arrakhammer has a fixed anti-Steamhammer opening configured. It was written before Steamhammer had learning. Modern Steamhammer can exploit the fixed opening. You can’t get away with that any more.

#18 ecgberht

opening | games | wins
11Gas10PoolLurker | 11 | 91%
11HatchTurtleLurker | 51 | 100%
9PoolLurker | 37 | 97%
OverpoolLurker | 1 | 0%
4 openings | 100 | 97%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Heavy rush | 100, 100%, 97% | 67, 67%, 96% | 67%, 33%
Unknown | 0, 0%, 0% | 33, 33%, 100% | 0%, 0%

#19 ualbertabot

opening | games | wins
3HatchLurker | 1 | 0%
7PoolHard | 11 | 82%
AntiZeal_12Hatch | 7 | 57%
OverhatchExpoMuta | 1 | 0%
OverpoolTurtle | 80 | 98%
5 openings | 100 | 91%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Factory | 2, 2%, 100% | 11, 11%, 100% | 0%, 0%
Fast rush | 12, 12%, 92% | 15, 15%, 80% | 33%, 25%
Heavy rush | 85, 85%, 91% | 45, 45%, 89% | 45%, 22%
Naked expand | 1, 1%, 100% | 7, 7%, 100% | 0%, 0%
Unknown | 0, 0%, 0% | 22, 22%, 95% | 0%, 0%


Getting that 98% win rate is one of the reasons I added the seemingly nonsensical overpool turtle opening, which makes an absurd 6 sunkens on one base. It works against all kinds of rushes, fast or slow, when the rusher does not know how to adapt.

#20 ximp

opening | games | wins
3HatchHydraExpo | 17 | 82%
4HatchBeforeGas | 36 | 83%
9Hatch8Pool | 1 | 0%
AntiFactory | 1 | 0%
ZvP_2HatchMuta | 9 | 78%
ZvP_3BaseSpire+Den | 36 | 78%
6 openings | 100 | 79%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Safe expand | 3, 3%, 100% | 18, 18%, 94% | 0%, 0%
Turtle | 97, 97%, 78% | 78, 78%, 76% | 77%, 4%
Unknown | 0, 0%, 0% | 4, 4%, 75% | 0%, 0%


Why didn’t Steamhammer try the 3 hatch before pool opening even once in 100 rounds? I expect it would have scored higher. Well, I know why; when the win rate is so convincing, Steamhammer doesn’t explore much.
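
The effect is easy to see in a toy selection rule. This epsilon-greedy sketch is not Steamhammer’s real algorithm (its exploration scheme is more involved), but it shows the behavior: the better the best known opening is doing, the less often anything else gets a turn, so a promising-but-untried alternative never comes up.

```python
import random

def choose_opening(openings, rng=random):
    """Pick an opening: mostly exploit the best win rate, explore more
    when even the best opening is losing. Illustrative only, not
    Steamhammer's actual selection rule.

    openings: {name: (games, wins)}
    """
    def rate(games, wins):
        return wins / games if games else 0.5   # optimistic prior for untried
    best = max(openings, key=lambda n: rate(*openings[n]))
    # Exploration probability shrinks as the best win rate grows:
    # at an 83% win rate it is under 0.09.
    epsilon = 0.5 * (1.0 - rate(*openings[best]))
    if rng.random() < epsilon:
        others = [n for n in openings if n != best]
        return rng.choice(others)
    return best
```

Against a win rate as convincing as the one over XIMP, a rule like this repeats the proven openings more than 90% of the time, which matches the table above: 3 openings soak up 89 of the 100 games.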

#21 cdbot

opening | games | wins
11HatchTurtleHydra | 1 | 0%
9PoolSunkSpeed | 15 | 47%
OverpoolSunk | 82 | 96%
ZvP_Overpool3Hatch | 1 | 0%
ZvZ_12PoolLing | 1 | 0%
5 openings | 100 | 86%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Fast rush | 96, 96%, 85% | 31, 31%, 71% | 29%, 57%
Heavy rush | 4, 4%, 100% | 13, 13%, 100% | 0%, 25%
Unknown | 0, 0%, 0% | 56, 56%, 91% | 0%, 0%

#22 aiur

opening | games | wins
11Gas10PoolLurker | 1 | 0%
3HatchHydraExpo | 28 | 89%
5PoolHard2Player | 1 | 0%
AntiZeal_12Hatch | 46 | 91%
Over10Hatch | 24 | 92%
5 openings | 100 | 89%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Heavy rush | 95, 95%, 89% | 65, 65%, 91% | 64%, 18%
Naked expand | 4, 4%, 75% | 15, 15%, 73% | 0%, 25%
Proxy | 0, 0%, 0% | 2, 2%, 50% | 0%, 0%
Turtle | 1, 1%, 100% | 0, 0%, 0% | 0%, 0%
Unknown | 0, 0%, 0% | 18, 18%, 100% | 0%, 0%


Turtle was predicted once but never recognized in the last 100 games. That implies that Steamhammer recognized a turtle opening in the first 3 rounds—and it was wrong, since AIUR doesn’t do that; it must have been a misrecognized cannon rush, a bug that has crept in. Comparing against what AIUR learned, I see that AIUR cannon rushed Steamhammer 3 times total, all failures, and favored its defensive strategy.

#23 killall

opening | games | wins
6PoolSpeed | 1 | 0%
9PoolSpeed | 37 | 100%
ZvZ_12PoolMain | 1 | 0%
ZvZ_OverpoolTurtle | 61 | 93%
4 openings | 100 | 94%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Heavy rush | 75, 75%, 93% | 43, 43%, 91% | 49%, 36%
Naked expand | 5, 5%, 80% | 12, 12%, 100% | 20%, 20%
Turtle | 20, 20%, 100% | 10, 10%, 100% | 45%, 35%
Unknown | 0, 0%, 0% | 35, 35%, 94% | 0%, 0%

#24 willyt

opening | games | wins
11Gas10PoolLurker | 30 | 97%
11HatchTurtleLurker | 7 | 86%
12HatchTurtle | 2 | 0%
2HatchLurkerAllIn | 24 | 96%
6PoolSpeed | 1 | 0%
9PoolLurker | 1 | 0%
OverpoolLurker | 35 | 100%
7 openings | 100 | 93%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Heavy rush | 100, 100%, 93% | 85, 85%, 96% | 85%, 15%
Unknown | 0, 0%, 0% | 15, 15%, 73% | 0%, 0%

#25 ailien

opening | games | wins
3HatchLurker | 1 | 0%
6PoolSpeed | 1 | 0%
9PoolSpeedAllIn | 1 | 0%
OverhatchLing | 1 | 0%
ZvT_3HatchMuta | 1 | 0%
ZvZ_Overgas9Pool | 7 | 43%
ZvZ_Overpool9Gas | 20 | 85%
ZvZ_OverpoolTurtle | 68 | 93%
8 openings | 100 | 83%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Naked expand | 98, 98%, 85% | 3, 3%, 0% | 2%, 98%
Unknown | 2, 2%, 0% | 97, 97%, 86% | 0%, 50%

#26 cunybot

opening | games | wins
11Gas10PoolMuta | 1 | 0%
5PoolHard2Player | 3 | 67%
OverhatchLing | 15 | 93%
OverpoolSpeed | 1 | 0%
ZvZ_12HatchExpo | 2 | 50%
ZvZ_OverpoolTurtle | 77 | 100%
6 openings | 99 | 95%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Fast rush | 4, 4%, 100% | 3, 3%, 100% | 0%, 75%
Heavy rush | 13, 13%, 100% | 6, 6%, 83% | 0%, 62%
Naked expand | 62, 63%, 94% | 20, 20%, 90% | 19%, 61%
Turtle | 19, 19%, 100% | 10, 10%, 100% | 11%, 58%
Unknown | 1, 1%, 0% | 60, 61%, 97% | 0%, 0%

#27 hellbot

opening | games | wins
2HatchHydraBust | 5 | 80%
3HatchHydra | 7 | 100%
3HatchHydraBust | 12 | 100%
3HatchHydraExpo | 14 | 100%
3HatchLingBust | 8 | 100%
4HatchBeforeGas | 16 | 100%
Over10Hatch1Sunk | 3 | 100%
ZvP_2HatchMuta | 11 | 100%
ZvP_3BaseSpire+Den | 15 | 100%
ZvP_3HatchPoolHydra | 9 | 100%
10 openings | 100 | 99%
plan | predicted (count, games, wins) | recognized (count, games, wins) | accuracy (good, ?)
Turtle | 100, 100%, 99% | 76, 76%, 99% | 76%, 24%
Unknown | 0, 0%, 0% | 24, 24%, 100% | 0%, 0%

overall

opening | total | ZvT | ZvP | ZvZ | ZvR (each cell: games, wins)
11Gas10PoolLurker | 53, 75% | 44, 89% | 9, 11% | - | -
11Gas10PoolMuta | 50, 24% | 1, 0% | 20, 15% | 29, 31% | -
11HatchTurtleHydra | 24, 46% | 1, 0% | 22, 50% | 1, 0% | -
11HatchTurtleLurker | 60, 95% | 58, 98% | 2, 0% | - | -
12HatchTurtle | 10, 20% | 5, 0% | 5, 40% | - | -
2.5HatchMuta | 5, 0% | 1, 0% | 3, 0% | 1, 0% | -
2HatchHydra | 16, 25% | - | 16, 25% | - | -
2HatchHydraBust | 45, 20% | 1, 0% | 44, 20% | - | -
2HatchLurker | 6, 17% | - | 6, 17% | - | -
2HatchLurkerAllIn | 52, 60% | 24, 96% | 27, 30% | 1, 0% | -
3HatchHydra | 11, 64% | - | 11, 64% | - | -
3HatchHydraBust | 42, 36% | - | 40, 38% | 2, 0% | -
3HatchHydraExpo | 103, 80% | 1, 0% | 102, 80% | - | -
3HatchLing | 16, 56% | 1, 0% | 14, 64% | 1, 0% | -
3HatchLingBust | 59, 25% | - | 47, 23% | 12, 33% | -
3HatchLingExpo | 16, 38% | - | 15, 40% | 1, 0% | -
3HatchLurker | 6, 0% | 1, 0% | 2, 0% | 2, 0% | 1, 0%
3HatchPoolMuta | 11, 9% | 9, 11% | 1, 0% | 1, 0% | -
4HatchBeforeGas | 73, 63% | 2, 0% | 70, 66% | 1, 0% | -
4PoolHard | 35, 46% | 3, 0% | 8, 12% | 24, 62% | -
4PoolSoft | 39, 21% | - | 5, 0% | 34, 24% | -
5PoolHard | 4, 0% | 1, 0% | 3, 0% | - | -
5PoolHard2Player | 10, 20% | - | 6, 0% | 4, 50% | -
5PoolSoft | 3, 0% | 1, 0% | 2, 0% | - | -
6Pool | 1, 0% | 1, 0% | - | - | -
6PoolSpeed | 75, 64% | 12, 58% | 6, 33% | 57, 68% | -
7Pool12Hatch | 1, 0% | - | - | 1, 0% | -
7PoolHard | 35, 34% | - | 23, 13% | 1, 0% | 11, 82%
7PoolMid | 1, 0% | - | 1, 0% | - | -
7PoolSoft | 4, 0% | 2, 0% | 2, 0% | - | -
8Hatch7Pool | 4, 0% | - | 3, 0% | 1, 0% | -
8Pool | 6, 0% | - | 6, 0% | - | -
9Hatch8Pool | 4, 0% | 2, 0% | 1, 0% | 1, 0% | -
9HatchExpo9Pool9Gas | 39, 21% | 7, 29% | 32, 19% | - | -
9HatchMain9Pool9Gas | 6, 0% | - | 3, 0% | 3, 0% | -
9Pool | 1, 0% | 1, 0% | - | - | -
9PoolExpo | 10, 20% | 9, 22% | 1, 0% | - | -
9PoolHatch | 6, 33% | - | 6, 33% | - | -
9PoolLurker | 91, 81% | 91, 81% | - | - | -
9PoolSpeed | 43, 86% | - | 5, 0% | 38, 97% | -
9PoolSpeedAllIn | 29, 41% | 1, 0% | 11, 9% | 17, 65% | -
9PoolSpire | 3, 0% | - | 3, 0% | - | -
9PoolSunkHatch | 27, 26% | 1, 0% | 4, 0% | 22, 32% | -
9PoolSunkSpeed | 22, 32% | 1, 0% | 3, 0% | 18, 39% | -
AntiFact_13Pool | 46, 17% | 43, 19% | 3, 0% | - | -
AntiFact_2Hatch | 20, 0% | 14, 0% | 5, 0% | 1, 0% | -
AntiFactory | 21, 0% | 17, 0% | 2, 0% | 2, 0% | -
AntiZeal_12Hatch | 63, 73% | 3, 0% | 53, 79% | - | 7, 57%
Over10Hatch | 31, 74% | - | 30, 77% | 1, 0% | -
Over10Hatch1Sunk | 12, 33% | - | 8, 38% | 4, 25% | -
Over10Hatch2Hard | 3, 33% | - | 3, 33% | - | -
Over10Hatch2Sunk | 9, 0% | - | 9, 0% | - | -
Over10Hatch2SunkHard | 6, 0% | 1, 0% | 4, 0% | 1, 0% | -
Over10HatchBust | 5, 0% | - | 4, 0% | 1, 0% | -
Over10HatchSlowLings | 4, 0% | - | 3, 0% | 1, 0% | -
OverhatchExpoLing | 18, 28% | - | 17, 29% | 1, 0% | -
OverhatchExpoMuta | 23, 26% | - | 21, 29% | 1, 0% | 1, 0%
OverhatchLateGas | 30, 27% | 1, 0% | 29, 28% | - | -
OverhatchLing | 24, 58% | - | 1, 0% | 23, 61% | -
OverhatchMuta | 9, 22% | - | 1, 0% | 8, 25% | -
Overpool+1 | 3, 0% | 1, 0% | 2, 0% | - | -
OverpoolHatch | 18, 33% | 8, 62% | 10, 10% | - | -
OverpoolHydra | 7, 0% | - | 6, 0% | 1, 0% | -
OverpoolLurker | 76, 79% | 65, 89% | 11, 18% | - | -
OverpoolSpeed | 22, 36% | 1, 0% | 19, 42% | 2, 0% | -
OverpoolSunk | 111, 79% | 1, 0% | 2, 0% | 108, 81% | -
OverpoolTurtle | 83, 94% | - | 3, 0% | - | 80, 98%
PurpleSwarmBuild | 7, 0% | 1, 0% | 5, 0% | 1, 0% | -
Sparkle 1HatchMuta | 4, 0% | - | 4, 0% | - | -
Sparkle 2HatchMuta | 4, 0% | 2, 0% | 2, 0% | - | -
Sparkle 3HatchMuta | 2, 0% | - | 2, 0% | - | -
ZvP_2HatchMuta | 41, 46% | - | 40, 48% | 1, 0% | -
ZvP_3BaseSpire+Den | 86, 59% | - | 85, 60% | 1, 0% | -
ZvP_3HatchPoolHydra | 49, 27% | 1, 0% | 46, 28% | 2, 0% | -
ZvP_4HatchPoolHydra | 4, 0% | 1, 0% | 2, 0% | 1, 0% | -
ZvP_Overpool3Hatch | 6, 0% | - | 5, 0% | 1, 0% | -
ZvT_12PoolMuta | 9, 0% | 2, 0% | 6, 0% | 1, 0% | -
ZvT_13Pool | 2, 0% | - | 2, 0% | - | -
ZvT_2HatchMuta | 6, 0% | 1, 0% | 5, 0% | - | -
ZvT_3HatchMuta | 4, 0% | 1, 0% | 1, 0% | 2, 0% | -
ZvT_3HatchMutaExpo | 9, 22% | - | 2, 0% | 7, 29% | -
ZvZ_12HatchExpo | 3, 33% | - | 1, 0% | 2, 50% | -
ZvZ_12HatchMain | 39, 18% | - | 2, 0% | 37, 19% | -
ZvZ_12Pool | 3, 0% | - | 3, 0% | - | -
ZvZ_12PoolLing | 12, 8% | 1, 0% | 2, 0% | 9, 11% | -
ZvZ_12PoolMain | 16, 0% | 1, 0% | 2, 0% | 13, 0% | -
ZvZ_Overgas11Pool | 16, 44% | 14, 50% | 2, 0% | - | -
ZvZ_Overgas9Pool | 40, 35% | 1, 0% | 4, 0% | 35, 40% | -
ZvZ_Overpool11Gas | 60, 25% | 13, 15% | 4, 0% | 43, 30% | -
ZvZ_Overpool9Gas | 100, 45% | 23, 43% | 3, 0% | 74, 47% | -
ZvZ_OverpoolTurtle | 267, 82% | 1, 0% | 25, 56% | 241, 85% | -
total | 2590, 52% | 500, 59% | 1091, 39% | 899, 58% | 100, 91%
openings played | 91 | 52 | 87 | 55 | 5

Steamhammer played all of its openings during the tournament, almost all of them multiple times. It even tried the 3 specialized openings for the island map Sparkle. Nearly as many were played in ZvP alone, since it spent so much time desperately seeking an answer to the Locutusoids (or possibly Susan). Some openings were highly successful in given matchups, which generally means that the opening defeated one opponent reliably and so was played many times. For example, OverpoolSunk wiped out CDBot, which makes it look in this table as though it wiped out all zergs. If only it were so simple! The opening with the best success across matchups is 6PoolSpeed, an opening that I have never seen in human play.
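
For the record, the roll-up that produces a table like this from the per-opponent results is a small aggregation. The flat record format here is hypothetical (opening, opponent race, games, wins), and the sample numbers are the OverpoolSunk lines from the CDBot, ZZZKBot, and Iron tables above:

```python
from collections import defaultdict

def roll_up(records):
    """Aggregate per-opponent results into per-matchup totals.

    records: iterable of (opening, race, games, wins) with race in 'TPZR'.
    Returns {(opening, 'ZvX'): [games, wins]}.
    """
    table = defaultdict(lambda: [0, 0])
    for opening, race, games, wins in records:
        cell = table[(opening, 'Zv' + race)]
        cell[0] += games
        cell[1] += wins
    return table

records = [
    ('OverpoolSunk', 'Z', 82, 79),  # vs CDBot: 82 games at 96%
    ('OverpoolSunk', 'Z', 21, 8),   # vs ZZZKBot: 21 games at 38%
    ('OverpoolSunk', 'T', 1, 0),    # vs Iron: 1 game, lost
]
games, wins = roll_up(records)[('OverpoolSunk', 'ZvZ')]
print(games, wins)  # 103 87
```

The caution from the text applies to the real table too: a high matchup-wide win rate often just means one opponent was beaten reliably many times.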