
academy and cybernetics core are special

A thought while I work on more bug fixes, like the 4 bugs responsible for this absurdly bad game. (1. The enemy refinery was considered inaccessible by ground and not attacked. 2. The squad tried to take a path down the ramp which is blocked by minerals. 3. A zergling was stuck in a key position at the top of the ramp. 4. Production froze due to an undiagnosed bug which I have never seen before.)

Although an academy has zero gas cost, terran has no reason to build an academy without taking gas first, or at latest around the same time the academy is built. All uses of the academy require gas. (A refinery has half the build time of an academy, but you also have to gather some gas. The rule relies on the fact that you never want to spend that much to get 1 medic or 1 firebat.) Protoss has no reason to build a cybernetics core without taking gas first, for the same reason. These are the only 2 buildings with that property. All other buildings either require gas to be built at all, or if they have zero gas cost, they enable production that also has zero gas cost. For example, an engineering bay, forge, or evolution chamber might be used to get static defense, which costs only minerals. Zerg has no building with this property. (Imbalance!)

If you generate your own build orders automatically, you can use this as a constraint on which build orders are acceptable.
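As a sketch of such a constraint, here is a minimal Python check, assuming a build order is just a list of item names (the names and the representation are invented for illustration; a real bot like Steamhammer is C++ and uses richer build order items):

```python
# Zero-gas buildings whose every use requires gas.
GAS_ONLY_TECH = {"Academy", "Cybernetics Core"}
GAS_BUILDINGS = {"Refinery", "Assimilator", "Extractor"}

def gas_timing_ok(build_order):
    """Reject build orders that start an Academy or Cybernetics Core
    before taking gas. This is the strict form of the rule; 'at latest
    around the same time' could be allowed with a small tolerance."""
    gas_taken = False
    for item in build_order:
        if item in GAS_BUILDINGS:
            gas_taken = True
        elif item in GAS_ONLY_TECH and not gas_taken:
            return False
    return True
```

For example, `gas_timing_ok(["Barracks", "Academy", "Refinery"])` comes out False, while the same items with the refinery before the academy pass.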

Steamhammer 2.1.3 test version

Yesterday’s version 2.1.2 turned out to have a serious bug that occurs on SSCAIT and not in my test environment. The problem is that a drone on the way to start a building may be reassigned in the middle of its journey, and another drone sent to build instead. If it happens once, it causes a delay in construction—and it often happens repeatedly. Today’s version 2.1.3 tries to fix the bug. I can’t reproduce the bug, even though it happens in every game on SSCAIT, but I know what code is responsible for reassigning drones. I made a couple of improvements that are valuable in their own right, in the hope that one of them will also fix the bug. If not, I don’t know what I’ll do. I don’t want to roll back because my changes are improvements otherwise, and the deadline looms.

Update: Whew, my fixes worked.

Steamhammer 2.1.2 test version

I’ve uploaded the next test version to SSCAIT, Steamhammer 2.1.2. The biggest change is that stuck units are less common. There is still an issue where units can occasionally freeze en masse in the late game, but it’s rare in my tests and so far I’ve only seen it when Steamhammer was winning and it didn’t matter. To fix sticking, I finished up and enabled part of Steamhammer’s new unit control infrastructure, work that has been inching forward for months. I didn’t change anything else or take any unsticking actions; the new structure usually works better by nature than the classic structure inherited from UAlbertaBot.

There’s also a fix for a building manager bug that incorrectly turned expansion hatcheries into macro hatcheries. The bug sometimes made play much worse, but also sometimes made it better, so testing is called for.

AIIDE 2018 - what CherryPi learned

Here is a table of how each CherryPi opening fared against each opponent, like the tables I made for other bots. Reading the code confirmed my inference that the learning files recorded opening build orders, not build orders switched to later in the game; see how CherryPi played.

Each row gives wins-losses and win rate: the bot's total first, then each opening CherryPi tried against it. Openings never tried against that opponent are omitted.

#1 SAIDA: total 13-90 (13%). zve9poolspeed 1-19 (5%), zvt2baseultra 1-15 (6%), zvt3hatchlurker 9-37 (20%), zvtmacro 2-19 (10%)
#3 CSE: total 73-30 (71%). zve9poolspeed 0-2 (0%), zvp10hatch 24-5 (83%), zvpohydras 16-8 (67%), zvtmacro 33-15 (69%)
#4 BlueBlueSky: total 89-14 (86%). zve9poolspeed 0-1 (0%), zvp10hatch 29-8 (78%), zvtmacro 60-5 (92%)
#5 Locutus: total 84-19 (82%). 3basepoollings 63-11 (85%), zvp6hatchhydra 14-3 (82%), zvpomutas 2-2 (50%), zvtmacro 5-3 (62%)
#6 ISAMind: total 99-4 (96%). 3basepoollings 1-0 (100%), zvp6hatchhydra 98-4 (96%)
#7 DaQin: total 103-0 (100%). zvtmacro 103-0 (100%)
#8 McRave: total 87-16 (84%). 3basepoollings 9-2 (82%), zvp6hatchhydra 31-4 (89%), zvpomutas 14-4 (78%), zvtmacro 33-6 (85%)
#9 Iron: total 97-6 (94%). hydracheese 97-6 (94%)
#10 ZZZKBot: total 93-10 (90%). 10hatchling 58-4 (94%), 9poolspeedlingmuta 0-1 (0%), zvz9poolspeed 35-4 (90%), zvzoverpool 0-1 (0%)
#11 Steamhammer: total 81-21 (79%). 10hatchling 22-7 (76%), zve9poolspeed 16-5 (76%), zvz12poolhydras 0-1 (0%), zvz9poolspeed 43-8 (84%)
#12 Microwave: total 94-9 (91%). zvz9gas10pool 0-1 (0%), zvz9poolspeed 4-2 (67%), zvzoverpool 90-6 (94%)
#13 LastOrder: total 85-18 (83%). 10hatchling 45-7 (87%), zve9poolspeed 0-1 (0%), zvzoverpool 40-10 (80%)
#14 Tyr: total 98-5 (95%). zvp10hatch 98-5 (95%)
#15 MetaBot: total 94-2 (98%). zvpohydras 94-2 (98%)
#16 LetaBot: total 101-2 (98%). 10hatchling 0-1 (0%), 3basepoollings 97-0 (100%), zve9poolspeed 1-1 (50%), zvt2baseguardian 3-0 (100%)
#17 Arrakhammer: total 92-11 (89%). zvz9poolspeed 92-11 (89%)
#18 Ecgberht: total 102-1 (99%). zvtmacro 102-1 (99%)
#19 UAlbertaBot: total 99-4 (96%). 9poolspeedlingmuta 96-2 (98%), zve9poolspeed 3-2 (60%)
#20 XIMP: total 98-5 (95%). zvp3hatchhydra 1-0 (100%), zvpohydras 97-5 (95%)
#21 CDBot: total 103-0 (100%). zve9poolspeed 96-0 (100%), zvz9poolspeed 7-0 (100%)
#22 AIUR: total 100-3 (97%). zvpohydras 100-3 (97%)
#23 KillAll: total 103-0 (100%). 10hatchling 102-0 (100%), zvzoverpool 1-0 (100%)
#24 WillyT: total 103-0 (100%). 2hatchmuta 103-0 (100%)
#25 AILien: total 103-0 (100%). zvz9poolspeed 103-0 (100%)
#26 CUNYBot: total 100-3 (97%). zvz9poolspeed 100-3 (97%)
#27 Hellbot: total 103-0 (100%). zvp10hatch 31-0 (100%), zvpohydras 72-0 (100%)

Overall: 90%. 10hatchling 227-19 (92%), 2hatchmuta 103-0 (100%), 3basepoollings 170-13 (93%), 9poolspeedlingmuta 96-3 (97%), hydracheese 97-6 (94%), zve9poolspeed 117-31 (79%), zvp10hatch 182-18 (91%), zvp3hatchhydra 1-0 (100%), zvp6hatchhydra 143-11 (93%), zvpohydras 379-18 (95%), zvpomutas 16-6 (73%), zvt2baseguardian 3-0 (100%), zvt2baseultra 1-15 (6%), zvt3hatchlurker 9-37 (20%), zvtmacro 338-49 (87%), zvz12poolhydras 0-1 (0%), zvz9gas10pool 0-1 (0%), zvz9poolspeed 384-28 (93%), zvzoverpool 131-17 (89%)

Look how sparse the chart is—CherryPi was highly selective about its choices. It did not try more than 4 different builds against any opponent. It makes sense to minimize the number of choices so that you don’t lose games exploring bad ones, but you have to be pretty sure that one of the choices you do try is good. Where did the selectivity come from?

The opening “hydracheese” was played only against Iron, and was the only opening played against Iron. It smelled like a hand-coded choice. Sure enough, the file source/src/models/banditconfigurations.cpp configures builds by name for 18 of the 27 entrants. A comment says that the build order switcher is turned off for the hydracheese opening only: “BOS disabled for this specific build because the model hasn’t seen it.” Here is the full set of builds configured, including defaults for those that were not hand-configured. CherryPi played only builds that were configured, but did not play all the builds that were configured; presumably it stopped when it hit a good one.

AILien: zve9poolspeed, zvz9poolspeed (note: “returning opponents from last year”)
AIUR: zvtmacro, zvpohydras, zvp10hatch
Arrakhammer: 10hatchling, zvz9poolspeed
Iron: hydracheese
UAlbertaBot: zve9poolspeed, 9poolspeedlingmuta
Ximp: zvpohydras, zvtmacro, zvp3hatchhydra
Microwave: zvzoverpool, zvz9poolspeed, zvz9gas10pool (note: “we have some expectations”)
Steamhammer: zve9poolspeed, zvz9poolspeed, zvz12poolhydras, 10hatchling
ZZZKBot: 9poolspeedlingmuta, 10hatchling, zvz9poolspeed, zvzoverpool
ISAMind, Locutus, McRave, DaQin: zvtmacro, zvp6hatchhydra, 3basepoollings, zvpomutas
CUNYBot: zvzoverpoolplus1, zvz9gas10pool, zvz9poolspeed
HannesBredberg: zvtp1hatchlurker, zvt2baseultra, zvt3hatchlurker, zvp10hatch
LetaBot: zvtmacro, 3basepoollings, zvt2baseguardian, zve9poolspeed, 10hatchling
MetaBot: zvtmacro, zvpohydras, zvpomutas, zve9poolspeed
WillyT: zvt2baseultra, 12poolmuta, 2hatchmuta

The defaults, by matchup, for opponents that were not hand-configured:

ZvT: zvt2baseultra, zvtmacro, zvt3hatchlurker, zve9poolspeed
ZvP: zve9poolspeed, zvtmacro, zvp10hatch, zvpohydras
ZvZ: 10hatchling, zve9poolspeed, zvz9poolspeed, zvzoverpool
ZvR: 10hatchling, zve9poolspeed, 9poolspeedlingmuta

I read this as pulling out all the stops to reach #1. They would have succeeded if not for SAIDA.

banditconfigurations.cpp continues and declares some properties for builds including non-opening builds. It looks like .validOpening() tells whether it can be played as an opening build, .validSwitch() tells whether the build order switcher is allowed to switch to it during the game, and .switchEnabled() tells whether the build order switcher is enabled at all.
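To make the three flags concrete, here is a toy Python model of a per-build configuration entry (the real code is C++ with a fluent interface; the field names here are my paraphrase, not the actual API):

```python
from dataclasses import dataclass

@dataclass
class BuildConfig:
    """One configured build and its flags, loosely mirroring what
    banditconfigurations.cpp appears to declare."""
    name: str
    valid_opening: bool = False   # may be chosen as the game-start build
    valid_switch: bool = False    # the switcher may change to it mid-game
    switch_enabled: bool = True   # whether the switcher runs at all with this build

def opening_candidates(configs):
    """The builds the bandit may pick from at game start."""
    return [c.name for c in configs if c.valid_opening]
```

For example, hydracheese would be an opening with the switcher turned off (`switch_enabled=False`), matching the comment quoted above.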

The build orders themselves are defined in source/src/buildorders/. I found them a little hard to read, partly because they are written in reverse order: Actions to happen first are posted last to the blackboard.
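If that sounds backwards, a stack picture may help. This is only my mental model of the reversed posting, not CherryPi's actual data structure; the action names are invented:

```python
# Actions are posted in reverse; the consumer takes the most recent post,
# so the last-posted action is the first to happen.
blackboard = []

def post(action):
    blackboard.append(action)

def next_action():
    return blackboard.pop()

# Written back to front: "pool" should happen first, so it is posted last.
post("extractor")
post("drone")
post("pool")

assert next_action() == "pool"   # the first action to execute
```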

The opening zve9poolspeed (I read “zve” as zerg versus everything) has the worst spread of results in the table: it did poorly against more opponents than any other opening. It may have been a poor choice to configure for use in so many cases. In contrast, zvz9poolspeed, specialized for ZvZ, was successful. It gets fast mutalisks and in general has a lot more strategic detail coded into the build.

They seem to have had expectations of the zvt2baseultra build against terran. It is configured for HannesBredberg, WillyT, and the default ZvT. It was in fact only tried against SAIDA. I didn’t notice anything that tells CherryPi what order to try opening builds in. Maybe the build order switcher itself contributes, helping to choose the more likely openings first?

Steamhammer 2.1.1 test version

I’ve uploaded Steamhammer 2.1.1 to SSCAIT. It’s a test version that I don’t expect to release source for. I’ll follow a plan similar to last year’s: Upload a series of test versions as the tournament approaches, in hope of catching any new bugs and weaknesses before it is too late. The version that goes into the tournament will be whatever is ready by then, and that is the version that I’ll do the full release job for. Last year I used “1.4a1” and so on as test version numbers. This year it will be “2.1.x”.

There are quite a few changes, and they are probably not what you are expecting. I count 5 fixes for serious bugs that could easily lose games, plus mitigations to reduce 2 important weaknesses, plus a bunch of quick corrections and tweaks for less important issues that came up. There is still 1 major bug at the top of my list to fix before the tournament, and we’ll see whether I’ve introduced any awful new weaknesses—some of the changes are risky, so it would not be a surprise.

I disabled Randomhammer until after the tournament. It’s not broken or anything, I just felt like turning it off in plenty of time this year. Everybody else will get in a few more last-minute games.

I changed the debug drawing options to show off a new option that I added, DrawHiddenEnemies. It draws 4 symbols to represent known enemy units and buildings that cannot be seen at the moment: An enemy that is out of sight gets a small green circle at its last seen location, or a yellow circle if it was burrowed when last seen (especially useful for spider mines and lurkers). An enemy that is out of sight and is known to be no longer at its last seen location gets a red X instead. And an enemy which is in sight and is not detected gets a larger violet circle (a faint color that I think of as representing the cloaking field) and is labeled in white with its unit type. The picture is from a game versus Iron. The yellow circles are known spider mines outside detection range, and the green circle and red X mean that at least 2 more enemies are out of sight, likely nearby.

hidden enemies symbols

I’m not finished with AIIDE 2018. I want to analyze aspects of CherryPi and SAIDA. I’ll squeeze that in too, sooner or later.

LastOrder and its macro model - technical info

Time to dig into the details! I read the paper and some of the code to find stuff out.

LastOrder’s “macro” decisions are made by a neural network whose data size is close to 8MB—much larger than LastOrder’s code (but much smaller than CherryPi’s model data). There is room for a lot of smarts in that many bytes. The network takes in a summary description of the game state as a vector of feature values, and returns a macro action, what to build or upgrade or whatever next. The code to marshal data to and from the network is in StrategyManager.cpp.

network input

The list of network input features is initialized in the StrategyManager constructor and filled in in StrategyManager::triggerModel(). There are a lot of features. I didn’t dig into the details, but it looks as though some of the features are binary, some are counts, some are X or Y values that together give a position on the map, and a few are other numbers. They fall into these groups:

• State features. Basic information about the map and the opponent, our upgrades and economy, our own and enemy tech buildings.

• Waiting to build features. I’m not sure what these mean, but it’s something to do with production.

• “Our battle basic features” and “enemy battle basic features.” Combat units.

• “Our defend building features” and “enemy defend building features.” Static defense.

• “Killed” and “kill” features, what units of ours or the enemy’s are destroyed.

• A mass of features related to our current attack action and what the enemy has available to defend against it.

• “Our defend info” looks like places we are defending and what the enemy is attacking with.

• “Enemy defend info” looks like it locates the enemy’s static defense relative to the places we are interested in attacking.

• “Visible” gives the locations of the currently visible enemy unit types. I’m not quite sure what this means. A unit type doesn’t have an (x,y) position, and it seems as though LastOrder is making one up. It could be the location of the largest group of each unit type, or the closest unit of each type, or something. Have to read more code.

With this much information available, sophisticated strategies are possible in principle. It’s not clear how much of this the network successfully understands and makes use of. The games I watched did not give the impression of deep understanding, but then again, we have to remember that LastOrder learned to play against 20 specific opponents. Its results against those opponents suggest that it does understand them deeply.
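As a schematic example, flattening a game state into such a vector might look like this. Every field name here is invented; the real feature list in StrategyManager.cpp is far longer:

```python
def make_feature_vector(state):
    """Flatten a game-state dict into a list of numbers for the network.
    Mixes the kinds of features described above: scaled counts, binary
    flags, unit counts, and an (x, y) position."""
    v = []
    # state features: economy and a binary flag
    v.append(state["minerals"] / 1000.0)
    v.append(state["gas"] / 1000.0)
    v.append(1.0 if state["enemy_has_air"] else 0.0)
    # battle features: our and enemy combat unit counts, fixed order
    for unit_type in ("zergling", "mutalisk", "hydralisk"):
        v.append(state["our_units"].get(unit_type, 0))
        v.append(state["enemy_units"].get(unit_type, 0))
    # position feature: an (x, y) pair, e.g. the current attack target
    v.extend(state["attack_target"])
    return v
```

The point is that the network always sees the same slots in the same order, so every game state, however complicated, becomes one fixed-length vector.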

network output

It looks like the network output is a single macro action. Code checks whether the action is valid in the current situation and, if so, calls on the appropriate manager to carry it out. The code is full of I/O details and validation and error handling, so I might have missed something in the clutter. Also the code shows signs of having been modified over time without tying up loose ends. I imagine they experimented actively.
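In outline, the dispatch step might look like this. The names are invented for illustration; the real code is C++ and, as noted, much more cluttered:

```python
def dispatch(action, is_valid, handlers):
    """Route the network's single macro action to its manager, but only
    if the action is legal in the current situation."""
    if action not in handlers or not is_valid(action):
        return None          # invalid network output: do nothing this step
    return handlers[action](action)
```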

By the way, the 9 pool/10 hatch muta/12 hatch muta opening choices and learning code from Overkill are still there, though Overkill’s opening learning is not used.

learning setup

The learning setup uses Ape-X DQN. The term is as dense as a neutron star! Ape-X is a way to organize deep reinforcement learning; see the paper Distributed Prioritized Experience Replay by Horgan et al of Google’s DeepMind. In “DQN”, D stands for deep and as far as I’m concerned is a term of hype and means “we’re doing the cool stuff.” Q is for Q-learning, the form of reinforcement learning you use when you know what’s good (winning the game) and you have to figure out from experience a policy (that’s a technical term) to achieve it in a series of steps over time. The policy is in effect a box where you feed in the situation and it tells you what to do in that situation. What’s good is represented by a reward (given as a number) that you may receive long after the actions that earned it; that can make it hard to figure out a good policy, which is why you end up training on a cluster of 1000 machines. Finally, “N” is for the neural network that acts as the box that knows the policy.
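To unpack the “Q” a little: in its simplest tabular form, Q-learning maintains a table of action values Q(s, a) and nudges each entry toward the reward plus the discounted value of the best next action. In DQN the table is replaced by the neural network, but the update rule is the same idea. A toy version:

```python
from collections import defaultdict

Q = defaultdict(float)       # Q[(state, action)] -> estimated value
ALPHA, GAMMA = 0.1, 0.99     # learning rate, discount factor
ACTIONS = ("attack", "expand", "tech")   # stand-ins for macro actions

def update(state, action, reward, next_state):
    """One Q-learning step on the transition (s, a, r, s')."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])
```

The policy then falls out of the table: in a given state, play the action with the highest Q value (plus some exploration).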

In Ape-X, the learning system consists of a set of Actors that (in the case of LastOrder) play Brood War and record the input features and reward for each time step, plus a Learner (the paper suggests that 1 learner is enough, though you could have more) that feeds the data to a reinforcement learning algorithm. The Actors are responsible for exploring, that is, trying out variations from the current best known policy to see if any of them are improvements. The Ape-X paper suggests having different Actors explore differently so you don’t get stuck in a rut. In the case of LastOrder, the Actors play against a range of opponents. The Learner keeps track of which data points are more important to learn from and feeds those in more often to speed up learning. If you hit a surprise, meaning the reward is much different than you expected (“I thought I was winning, then a nuke hit”), that’s something important to learn.
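The “prioritized” part in miniature: transitions are sampled in proportion to how surprising they were, so the Learner sees the important data more often. This toy buffer captures the idea (the real thing uses an efficient sum-tree and importance-sampling corrections):

```python
import random

class PrioritizedReplay:
    """Toy prioritized replay: sample transitions weighted by surprise."""
    def __init__(self):
        self.data = []                        # (priority, transition) pairs

    def add(self, transition, td_error):
        # priority ~ |TD error|; the epsilon keeps zero-error data samplable
        self.data.append((abs(td_error) + 1e-3, transition))

    def sample(self):
        weights = [p for p, _ in self.data]
        return random.choices(self.data, weights=weights, k=1)[0][1]
```

A routine step with tiny TD error is almost never replayed; the “a nuke hit” step dominates the sampling.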

LastOrder seems to have closely followed the Ape-X DQN formula from the Ape-X paper. They name the exact same set of techniques, although many other choices are possible. Presumably DeepMind knows what they’re doing.

LastOrder does not train with a reward “I won/I lost.” That’s very little information and appears long after the actions that cause it, and it would leave learning glacially slow. They use reward shaping, which means giving a more informative reward number that offers the learning system more clues about whether it is going in the right direction. They use a reward based on the current game score.
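The simplest form of a score-based shaped reward is the change in game score from one step to the next, so the learner gets feedback at every step instead of only at the end. A sketch, with the score sequence standing in for whatever score function they actually use:

```python
def shaped_rewards(scores):
    """Per-step rewards as the deltas of a per-step game score."""
    return [b - a for a, b in zip(scores, scores[1:])]
```

A nice property of deltas is that they telescope: the rewards sum to the final score minus the initial score, so shaping this way does not change what the total reward measures, only when it arrives.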

the network itself

Following an idea from the 2015 paper Deep Recurrent Q-Learning for Partially Observable MDPs by Hausknecht and Stone, the LastOrder team layered a Long Short-Term Memory network in front of the DQN. We’ve seen LSTM before in Tscmoo (at least at one point; is it still there?). The point of the LSTM network is to remember what’s going on and more fully represent the game state, because in Brood War there is fog of war. So inputs go through the LSTM to expand the currently observed game state into some encoded approximation of all the game state that has been seen so far, then through the DQN to turn that into an action.
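To show why recurrence helps under fog of war, here is a crude stand-in for the LSTM's job: carry a memory of past observations forward so the policy sees more than the current frame. A real LSTM learns what to keep and forget; this toy just decays old observations:

```python
class RecurrentMemory:
    """Toy recurrent encoder: hidden state = decayed past + current input."""
    def __init__(self, size, decay=0.9):
        self.state = [0.0] * size
        self.decay = decay

    def step(self, observation):
        self.state = [self.decay * h + x
                      for h, x in zip(self.state, observation)]
        return self.state   # this, not the raw observation, feeds the DQN
```

An enemy unit that slips back into the fog leaves a fading trace in the state instead of vanishing outright, which is the approximation of “all the game state seen so far.”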

The LastOrder paper does not go into detail. There is not enough information in it to reproduce their network design. The Actor and Learner code is in the repo. I haven’t read it to see if it tells us everything.

Taken together it’s a little complicated, isn’t it? Not something for one hobbyist to try on their own. I think you need a team and a budget to put together something like this.

LastOrder and its macro model - general info

LastOrder (github) now has a 15-page academic paper out, Macro action selection with deep reinforcement learning in StarCraft by 6 authors including Sijia Xu as lead author. The paper does not go into great detail, but it reveals new information. It also uses a lot of technical terms without explanation, so it may be hard to follow if you don’t have the background. Also see my recent post how LastOrder played for a concrete look at its games.

I want to break my discussion into 2 parts. Today I’ll go over general information, tomorrow I’ll work through technical stuff, the network input and output and training and so on.

The name LastOrder turns out to be an ingenious reference to the character Last Order from the A Certain Magical Index fictional universe, the final clone sister. The machine learning process produces a long string of similar models which go into combat for experimental purposes, and you keep the last one. Good name!

LastOrder divides its work into 2 halves, “macro” handled by the machine learning model and “micro” handled by the rule-based code derived from Overkill. It’s a broad distinction; in Steamhammer’s 4-level abstraction terms, I would say that “macro” more or less covers strategy and operations, while “micro” covers tactics and micro. The macro model has a set of actions to build stuff, research tech, and expand to a new base, plus a set of 18 attack actions: 3 different types of attack in each of 5 different places, and 3 “add army” actions which apparently assign units to the 3 types of attack. (The paper says 17 though it lists 18. It looks as though the mutalisk add army action is unused, maybe because mutas are added automatically.) There is also an action to do nothing.

The paper includes a table on the last page, results of a test tournament where each of the 28 AIIDE 2017 participants played 303 games against LastOrder. We get to see how LastOrder scored its advertised 83% win rate: #2 PurpleWave and #3 Iron (rankings from AIIDE 2017) won nearly all games, no doubt overwhelming the rule-based part of LastOrder so that the macro model could not help. Next Microwave scored just under 50%, XIMP scored about 32%, and all others performed worse, including #1 ZZZKBot at 1.64% win rate—9 bots scored under 2%. When LastOrder’s micro part is good enough, the macro part is highly effective.

In AIIDE 2018, #13 LastOrder scored 49%, ranking in the top half. The paper has a brief discussion on page 10. LastOrder was rolled by top finishers because the micro part could not keep up with #9 Iron and above (according to me) or #8 McRave and above (according to the authors, who know things I don’t). Learning can’t help if you’re too burned to learn. LastOrder was also put down by terrans Ecgberht and WillyT, whose play styles are not represented in the 2017 training group, which has only 4 terrans (one of which is Iron that LastOrder cannot touch).

In the discussion of future work (a mandatory part of an academic paper; the work is required to be unending), they talk briefly about how to fix the weaknesses that showed in AIIDE 2018. They mention improving the rule-based part and learning unit-level micro to address the too-burned-to-learn problem, and self-play training to address the limitations of the training opponents. Self-play is the right idea in principle, but it’s not easy. You have to play all 3 races and support all the behaviors you might face, and that’s only the starting point before making it work.

I’d like to suggest another simple idea for future work: Train each matchup separately. You lose generalization, but how much do production and attack decisions generalize between matchups? I could be wrong, but I think not much. Instead, a zerg player could train 3 models, ZvT ZvP and ZvZ, each of which takes fewer inputs and is solving an easier problem. A disadvantage is that protoss becomes relatively more difficult if you allow for mind control.

LastOrder has skills that I did not see in the games I watched. There is code for them, at least; whether it can carry out the skills successfully is a separate question. It can use hydralisks and lurkers. Most interestingly, it knows how to drop. The macro model includes an action to research drop (UpgradeOverlordLoad), an action to assign units and presumably load up for a drop (AirDropAddArmy), and actions to carry out drops in different places (AttackAirDropStart for the enemy starting base, AttackAirDropNatural, AttackAirDropOther1, AttackAirDropOther2, AttackAirDropOther3). The code to carry out drops is AirdropTactic.cpp; it seems to expect to drop either all zerglings, all hydralisks, or all lurkers, no mixed unit types. Does LastOrder use these skills at all? If anybody can point out a game, I’m interested.

Learning when to make hydras and lurkers should not be too hard. If LastOrder rarely or never uses hydras, it must be because it found another plan more effective—in games you make hydras first and then get the upgrades, so it’s easy to figure out. If it doesn’t use lurkers, maybe they didn’t help, or maybe it didn’t have any hydras around to morph after it tried researching the upgrade, because hydras were seen as useless. But still, it’s only 2 steps, so it should be doable. Learning to drop is not as easy, though. To earn a reward, the agent has to select the upgrade action, the load action, and the drop action in order, each at a time when it makes sense. Doing only part of the sequence sets you back, and so does doing the whole sequence if you leave too much time between the steps, or drop straight into the enemy army, or make any severe mistake. You have to carry through accurately to get the payoff. It should be learnable, but it may take a long time and trainloads of data. I would be tempted to explicitly represent dependencies like this in some way or another, to tell the model up front the required order of events.

AIIDE 2018 - how CherryPi played

Overall, the play of the AIIDE 2018 CherryPi version looks similar to last year’s CherryPi which is still playing on SSCAIT. It still has the devastating ling micro, and it still prefers to win games with a flood of low-level units. It still gets melee +1 attack even when +1 carapace seems better. (Do CherryPi’s micro skills make +1 attack better, and if so, how?) Mutalisk micro looks very similar to Tscmoo’s, with mutas individually cautious and clever and collectively lazy and uncoordinated. It can use lurkers, guardians, and ultralisks. I didn’t see defilers, even when they would have been useful.

CherryPi scouts extremely aggressively with its first 2 overlords. They stick near the enemy base and try to poke into every corner, even if the enemy is terran and can shoot them down early. It gets a clear view, which must be useful for its build order switcher. The drawback is that the overlords often die young.

I think this CherryPi looks beatable. It doesn’t have SAIDA’s wide knowledge of action and reaction. It doesn’t have Steamhammer’s knowledge of how to react to LastOrder’s excessive static defense (but usually wins anyway with a zergling flood). It sometimes ignores undefended enemy bases, preferring to attack into the enemy’s strength—or even to wait idly. Game 31245 versus Iron shows it sticking with gas units and failing at macro; it forgot its love of zerglings. It doesn’t know whether it is ahead or behind, and it doesn’t realize that when it is maxed and owns the map, it ought to attack regardless of losses. It’s strong and tricky, but it also makes mistakes. I think next year’s version had better be improved if they don’t want to be overtaken.

Here are the names of the build orders that CherryPi recorded itself as playing in its opponent learning files. One of CherryPi’s major advertised features is a learned build order switcher that can switch to a new build order on the fly. It recorded 103 build order wins/losses for each opponent (except a couple with fewer), and 103 rounds were played, so these appear to be opening build orders only rather than all build orders tried throughout each game. Presumably the openings reflect CherryPi’s intentions when it started the game. It may not have followed the initial build order to its end.

  • 10hatchling
  • 2hatchmuta
  • 3basepoollings
  • 9poolspeedlingmuta
  • hydracheese
  • zve9poolspeed
  • zvp10hatch
  • zvp3hatchhydra
  • zvp6hatchhydra
  • zvpohydras
  • zvpomutas
  • zvt2baseguardian
  • zvt2baseultra
  • zvt3hatchlurker
  • zvtmacro
  • zvz12poolhydras
  • zvz9gas10pool
  • zvz9poolspeed
  • zvzoverpool

CherryPi tried between 1 and 4 openings against each opponent. CherryPi sometimes switched away from its initial try even if it won all games (for example, against CDBot and Hellbot), so I’m not sure what the switching criterion is. But opponents that it tried 4 openings against are all ones that gave it a touch of trouble.
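The file name banditconfigurations.cpp says opening selection is treated as a multi-armed bandit. We don’t know CherryPi’s exact criterion, but the classic algorithm of this family is UCB1, sketched here: prefer openings with good win rates, but give rarely tried openings an exploration bonus, which would explain switching away even from a build that has only won:

```python
import math

def ucb1_choose(stats, total_games):
    """Pick the next opening by UCB1.
    stats: {opening: (wins, games)}; total_games: games played so far."""
    def score(opening):
        wins, games = stats[opening]
        if games == 0:
            return float("inf")    # always try an untried opening first
        exploit = wins / games
        explore = math.sqrt(2 * math.log(total_games) / games)
        return exploit + explore
    return max(stats, key=score)
```

This is a guess at the family of criterion, not CherryPi's actual formula.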

grep -c key *.json

AILien.json:1
Aiur.json:1
Arrakhammer.json:1
BlueBlueSky.json:3
CDBot.json:2
CSE.json:4
CUNYBot.json:1
DaQin.json:1
Ecgberht.json:1
Hellbot.json:2
ISAMind.json:2
Iron.json:1
KillAll.json:2
LastOrder.json:3
LetaBot.json:4
Locutus.json:4
McRave.json:4
MetaBot.json:1
Microwave.json:3
SAIDA.json:4
Steamhammer.json:4
Tyr.json:1
UAlbertaBot.json:2
WillyT.json:1
Ximp.json:2
ZZZKBot.json:4

The other machine learning feature advertised for CherryPi is a building placer. It was trained against human building placements and apparently takes into account some of the bot’s intentions. I recommend against training on human play (or at least exclusively on human play), because machines play differently. Teaching a bot to blindly imitate human decisions that it doesn’t understand will lead to mistakes. It’s worse than teaching a human to imitate without understanding, because the bot won’t figure things out on its own. Nevertheless, CherryPi’s building placement does seem cleaner than other bots’. To me the building placement looks simple and logical, but not sophisticated like a strong human player’s. Here’s an example from a ZvZ game, game 1755. The sunken colony does not interfere with gas mining, and it is somewhat protected from zergling surrounds by the geyser, the spawning pool, and the lair itself, while remaining open for drone drills on the drone side. The spire is curiously far away; I would have fit it into the gap next to the sunken. It looks OK but a little loose, not quite optimized. (By the way, game 14742 against the same opponent has the same building layout, except that the spire is placed close.)

ZvZ building placement in game 1755

CherryPi has gained new tactical tricks. I mentioned the burrow trick where it burrows zerglings at expansion locations. So far, I haven’t seen a game where the opponent was ready for the trick; I imagine it contributed to a lot of wins, even though CherryPi sometimes researches burrow and then never uses it. (And I’m disappointed. I thought of using this trick in Steamhammer, and didn’t because I expected that bots which knew how to clear spider mines would also know how to clear burrowed zerglings. I think I was wrong!) As far as I’ve seen, CherryPi doesn’t use burrow for any other purpose (though I wouldn’t be surprised, since there are so many). CherryPi also does zergling runbys; an example is game 1406 versus SAIDA where CherryPi played an unusual and not entirely efficient gas-first 3 hatch zergling build.

CherryPi doesn’t have as many complex skills as SAIDA, but it has a good number. I doubt I saw everything it can do.

Steamhammer 2.1.1 status

Posts take time, but I am also making progress on Steamhammer. The next version will have no big features, so I’ll call it version 2.1.1. Big stuff has to wait until next year, after the tournament season. I found a second serious bug in defiler control, and now defilers consistently move to where they are wanted. It helps them swarm and plague more actively, though they still don’t do as much work as I would like. So far, 2.1.1 development version keeps track of burrowed units accurately, recognizes enemy proxies better, wards off the enemy scout worker more reliably, and has some improved macro decisions and emergency reactions, a new opening (of course), and a variety of other fixes.

There are a bunch of debilitating bugs in squad control and micro. For upcoming work, I have my eyes on 2 bugs in particular that I think cause the most frequent setbacks (rather than the most glaring blunders), the suicide pokes and the stuck units. If I get those fixed in time, I have a priority list of more stuff. If I succeed, Steamhammer will play better in nearly every game, which should show in the SSCAIT round robin phase.

note on CherryPi

I’ve been watching CherryPi’s AIIDE games. No conclusions yet, but I noticed that CherryPi likes to research burrow (not the first bot to do so) and burrow scouting lings at expansions to watch for the enemy (I think it’s the first to do that). SAIDA appeared unready for the trick. When an SCV showed up, CherryPi did not unburrow the ling, but sent another to prevent the expansion.

AIIDE 2018 - how LastOrder played

The new bot #13 LastOrder is related to the classic Overkill by Sijia Xu, but uses a machine learning model to make certain decisions: According to the description, “all the production of unit (excluding overlord), building, upgrade, tech and trigger of attack.” The learning is entirely offline; LastOrder does not store information about its opponents between games. Tactical and micro decisions, and I think building placement, are decided by rule-based code. One survey answer says,

we train the proposed macro model against about 20 AIIDE 2017 bots on a 1000 machines cluster scheduled by Tournament manager. the final model achieve about 83% win rate on all AIIDE 2017 bots

Against the stronger AIIDE 2018 bots, LastOrder scored about 49%, good enough to land in the top half of the ranking. I think the 83% win rate is too high for LastOrder’s underlying strength; I suspect that it overfitted to its 20 opponents. I think it learned to recognize some of its training opponents by their play style, and when it sees similar signs from different bots that play differently, it reacts incorrectly to a game plan that the different bot does not follow.

I watched a bunch of games to see what kind of play LastOrder figured out for itself. LastOrder’s units are mutalisks and zerglings, sometimes with scourge; I did not see it make other units (though Overkill has hydralisk skills that it might have chosen). LastOrder’s game plan is to open safely with 9 pool, sit back for a while, watch the opponent, lay down massive static defenses when danger seems to loom, macro up lots of drones, zerglings, and mutalisks behind its ramparts, and eventually burst into action and overwhelm the opponent. Details vary, but the overall game plan seemed consistent in all the games I watched.

It’s not an objectively strong game plan, but it seems effective against many bots. LastOrder had trouble touching stronger bots, upsetting only Steamhammer, and was itself upset by Ecgberht and WillyT, which as terrans had no difficulty steamrolling static defenses. But it scored highly against most lower-ranked opponents, including LetaBot (which may have been on its panel of 20 with little change).

Game 39, LastOrder-Steamhammer (replay file), was a good example of the game plan. LastOrder countered zergling with zergling for a while, then seemed to grow bored and made 4 sunkens to hide behind—far more than necessary or useful. A little later, it seemed to predict Steamhammer’s spire timing, adding excessive spores too. Steamhammer understands in principle how to react: It makes extra drones and gets ahead in both army and economy. Steamhammer could not safely attack the heavy defenses, but it could prevent LastOrder from expanding beyond its natural and win slowly. Sure enough, LastOrder tried to expand to a third, Steamhammer caught it and sent the entire army to erase the attempt—and LastOrder exploited the play, which was strategically correct but tactically wrong, hitting Steamhammer’s natural while its forces were out of position. Steamhammer’s tactical analysis is minimal; it doesn’t realize that it should destroy the expansion attempt with a small detachment.

Game 33041, LastOrder-Tyr (replay file), is one of the games that makes me suspect that LastOrder overfitted. Watch what happens after 7:00. LastOrder scouts Tyr’s turtle cannons with a zergling. LastOrder immediately reacts by building... many spore colonies, a nonsensical action. I think LastOrder saw the cannons and concluded, “I’ve seen this play before, and I know what is coming: Carriers!” It believes it is playing against XIMP. It plays similarly in games against XIMP.

LastOrder is a super interesting experiment. It did not score high like CherryPi, but it applied reinforcement learning to a more difficult problem, and it is far more successful than Sijia Xu’s past experiments with machine learning in Overkill. Its middling result is worth something, and yet its play remains somewhat disappointing. LastOrder’s play is highly reactive, but the reactions are often poor and the bot’s range of play is narrow (a wider pool of training opponents should help). I didn’t give examples, but many games show dishearteningly weak macro and mistaken tech decisions (possibly a better training methodology is needed). The problem is not solved yet!

AIIDE 2018 - what McRave learned

McRave, like Microwave and no doubt most bots that follow more than one plan, plays different openings against different races. In each opponent’s learning file, it writes win/loss numbers for 15 strategies. Their names all start with “P” for protoss, but I have stripped out the P to make the table more readable. 4 of the strategies are unused: ZealotDrop, NZCore (sounds like no zealot core), Proxy99, and Proxy6. That leaves 11 active openings. The races they were used against are shown in the table. ZZCore (2 zealots before core) was played only against random.

| bot | total | 12Nexus | 1GateCorsair | 1GateRobo | 21Nexus | 2GateDragoon | 2GateExpand | 4Gate | DTExpand | FFE | ZCore | ZZCore |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| #1 saida | 16-55 23% | 1-12 8% | - | - | 7-17 29% | 1-12 8% | - | - | 7-14 33% | - | - | - |
| #2 cherrypi | 15-88 15% | - | 6-25 19% | - | - | - | 6-25 19% | 2-20 9% | - | 1-18 5% | - | - |
| #3 cse | 27-75 26% | - | - | 7-19 27% | - | - | 5-17 23% | 2-15 12% | - | - | 13-24 35% | - |
| #4 bluebluesky | 29-74 28% | - | - | 1-14 7% | - | - | 2-15 12% | 7-18 28% | - | - | 19-27 41% | - |
| #5 locutus | 46-56 45% | - | - | 5-12 29% | - | - | 15-15 50% | 14-15 48% | - | - | 12-14 46% | - |
| #6 isamind | 54-49 52% | - | - | 7-11 39% | - | - | 4-10 29% | 15-14 52% | - | - | 28-14 67% | - |
| #7 daqin | 60-43 58% | - | - | 13-11 54% | - | - | 4-9 31% | 8-10 44% | - | - | 35-13 73% | - |
| #9 iron | 56-32 64% | 27-8 77% | - | - | 2-7 22% | 18-9 67% | - | - | 9-8 53% | - | - | - |
| #10 zzzkbot | 75-28 73% | - | 8-7 53% | - | - | - | 17-7 71% | 21-7 75% | - | 29-7 81% | - | - |
| #11 steamhammer | 64-38 63% | - | 9-9 50% | - | - | - | 27-10 73% | 15-10 60% | - | 13-9 59% | - | - |
| #12 microwave | 82-21 80% | - | 0-5 0% | - | - | - | 39-4 91% | 30-5 86% | - | 13-7 65% | - | - |
| #13 lastorder | 97-6 94% | - | 10-2 83% | - | - | - | 17-1 94% | 10-2 83% | - | 60-1 98% | - | - |
| #14 tyr | 91-10 90% | - | - | 23-3 88% | - | - | 7-5 58% | 31-1 97% | - | - | 30-1 97% | - |
| #15 metabot | 49-46 52% | - | - | 8-11 42% | - | - | 16-12 57% | 23-14 62% | - | - | 2-9 18% | - |
| #16 letabot | 77-15 84% | 12-5 71% | - | - | 5-5 50% | 20-4 83% | - | - | 40-1 98% | - | - | - |
| #17 arrakhammer | 102-1 99% | - | - | - | - | - | - | 94-1 99% | - | 8-0 100% | - | - |
| #18 ecgberht | 99-2 98% | 95-0 100% | - | - | - | 3-1 75% | - | - | 1-1 50% | - | - | - |
| #19 ualbertabot | 73-29 72% | - | - | - | - | - | 12-8 60% | 38-6 86% | - | 7-7 50% | - | 16-8 67% |
| #20 ximp | 41-59 41% | - | - | 8-14 36% | - | - | 15-17 47% | 18-18 50% | - | - | 0-10 0% | - |
| #21 cdbot | 103-0 100% | - | - | - | - | - | - | 103-0 100% | - | - | - | - |
| #22 aiur | 80-21 79% | - | - | 11-6 65% | - | - | 13-6 68% | 41-3 93% | - | - | 15-6 71% | - |
| #23 killall | 60-43 58% | - | 3-9 25% | - | - | - | 6-9 40% | 19-12 61% | - | 32-13 71% | - | - |
| #24 willyt | 77-17 82% | 37-2 95% | - | - | 3-6 33% | 23-4 85% | - | - | 14-5 74% | - | - | - |
| #25 ailien | 86-17 83% | - | 31-3 91% | - | - | - | 20-5 80% | 5-6 45% | - | 30-3 91% | - | - |
| #26 cunybot | 91-8 92% | - | 26-1 96% | - | - | - | 36-1 97% | 14-3 82% | - | 15-3 83% | - | - |
| #27 hellbot | 103-0 100% | - | - | - | - | - | - | - | - | - | 103-0 100% | - |
| overall | - 68% | 172-27 86% | 93-61 60% | 83-101 45% | 17-35 33% | 65-30 68% | 261-176 60% | 510-180 74% | 71-29 71% | 208-68 75% | 257-118 69% | 16-8 67% |

Unlike other bots that scored comparatively well against SAIDA—meaning they weren’t always wiped summarily from the map—McRave did not rely solely on cloaked units. The DTExpand opening scored best, but 21Nexus was nearly as successful. (McRave scored inconsistently against lower-ranked bots, though, as its author has commented.)

Every strategy came out with some good scores. But here is another analysis: Suppose the goal of the learning algorithm is to find the single most successful strategy (which is not always true—you might want to find the best mix to confuse the opponent’s learning). Leaving aside CDBot and HellBot, which McRave scored 100% against, against how many opponents was each opening the best choice? I made this table by hand, so there might be mistakes. I counted equal best as also best. The “versus” column tells which races the opening was used against.

| opening | best | versus |
|---|---|---|
| 12Nexus | 3 | T |
| 1GateCorsair | 2 | Z |
| 1GateRobo | 0 | P |
| 21Nexus | 0 | T |
| 2GateDragoon | 0 | T |
| 2GateExpand | 6 | P, Z, R |
| 4Gate | 5 | P, Z, R |
| DTExpand | 2 | T |
| FFE | 5 | Z, R |
| ZCore | 4 | P |
| ZZCore | 0 | R |

The counts do not match up well with the overall winning rates. There were 4 never-best openings. This analysis does not say that they are bad openings that dragged down the score. Consider what would have happened if they had not been enabled: Their games would have been distributed among the other openings; there would have been some extra wins and some extra losses, and the ratio would depend on the distribution. 21Nexus was never best, but scored second best against SAIDA and contributed as many wins. On the other hand, the openings which were often best were definitely worth having; they were well-chosen for McRave versus this set of opponents. It could make sense to try those openings first, or more often. On the third hand, notice that the openings with the highest counts were played against the largest number of opponents. There were more bests to count! Openings versus terran scored 5 bests because there were 5 terran opponents.
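The best-opening count can be reproduced mechanically from per-opponent win/loss records. Here is a minimal sketch in Python; the `records` data structure and function name are illustrative, not McRave’s actual learning-file format:

```python
# Count, for each opening, against how many opponents it had the best
# win rate. Ties count for every opening that shares the top rate,
# matching the "equal best counts as best" rule used above.
def count_best_openings(records):
    """records: {opponent: {opening: (wins, losses)}} -> {opening: count}"""
    best_counts = {}
    for opponent, openings in records.items():
        rates = {name: w / (w + l)
                 for name, (w, l) in openings.items() if w + l > 0}
        if not rates:
            continue
        top = max(rates.values())
        for name, rate in rates.items():
            if rate == top:
                best_counts[name] = best_counts.get(name, 0) + 1
    return best_counts

# Tiny example using two rows of the table above.
records = {
    "saida": {"12Nexus": (1, 12), "21Nexus": (7, 17), "DTExpand": (7, 14)},
    "ximp":  {"1GateRobo": (8, 14), "2GateExpand": (15, 17), "4Gate": (18, 18)},
}
print(count_best_openings(records))  # {'DTExpand': 1, '4Gate': 1}
```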

Plenty of similar analyses could be done. For example, you could count how often or how widely an opening scored above/below the average for each opponent: Did it make a net contribution, or the opposite? It would be another way of seeing whether the openings were well chosen for the opponents they faced.
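The above/below-average analysis could be sketched the same way. This is a guess at one reasonable formulation, not an established metric: for each opening, sum its deviation from the opponent-wide average win rate over the opponents it faced.

```python
# Net contribution of each opening: positive means it tended to score
# above the average for the opponents it was played against.
# The records format is the same illustrative one as before.
def net_contribution(records):
    """records: {opponent: {opening: (wins, losses)}} -> {opening: net}"""
    net = {}
    for openings in records.values():
        played = {n: (w, l) for n, (w, l) in openings.items() if w + l > 0}
        if not played:
            continue
        total_wins = sum(w for w, l in played.values())
        total_games = sum(w + l for w, l in played.values())
        avg = total_wins / total_games  # this opponent's average win rate
        for name, (w, l) in played.items():
            net[name] = net.get(name, 0.0) + (w / (w + l) - avg)
    return net

# A: 75% vs a 50% average (+0.25); B: 25% vs the same average (-0.25).
print(net_contribution({"some_opponent": {"A": (3, 1), "B": (1, 3)}}))
```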

Next I want to start watching some replays. I think I will start with LastOrder, which did all its learning offline yet held its win rate steady against the onslaught of learning bots. I’m expecting it to be interesting and sophisticated in some way.

AIIDE 2018 - what UAlbertaBot learned

UAlbertaBot played random, and its openings are chosen, not according to the opponent’s race, but according to its own race once the game starts. It has 3 protoss, 4 terran, and 4 zerg openings. Playing random gives the disadvantage of having about 1/3 as many games to figure out how to counter the opponent with each race. The countervailing advantage, of course, is that the opponent can’t predict what is coming its way.

103 rounds were played and UAlbertaBot does not deliberately drop data, so some of the totals add up to more than the 100 official rounds. UAlbertaBot also had 46 crashes, so some totals add up to less. For example, it recorded 96 games against LastOrder.

The official site doesn’t offer binaries for the bots which were carried over from last year, but this should be the 2017 version of UAlbertaBot. It has enemy-specific strategies configured for 13 opponents, of which 5 are also in this tournament: #9 Iron, #10 ZZZKBot, #16 LetaBot, #20 XIMP, and #22 Aiur. For ZZZKBot, only the protoss opening is set; for the others, all 3 races have openings set. Looking at the table, we see that UAlbertaBot did not always try all of its openings, and the blanks in the table do not always correspond to enemy-specific openings. Apparently in this UAlbertaBot version, the enemy-specific strategies act as hints rather than requirements: When available they are tried first, and when not, the default opening is tried first (ZealotRush, MarineRush, or ZerglingRush). If the first opening tried performs well enough, UAlbertaBot sticks with it.
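That hint-then-default behavior might look roughly like the sketch below. This is my reading of the behavior, not UAlbertaBot’s actual code; the function name, thresholds, and data shapes are all made up for illustration.

```python
# Hypothetical sketch of hint-first opening selection with a
# "stick with it while it performs well enough" rule.
def choose_opening(enemy, results, enemy_specific, default_opening,
                   good_enough=0.5, min_games=5):
    """results: {opening: (wins, losses)} from a per-opponent learning file.
    enemy_specific: {enemy: opening} configured hints."""
    first_choice = enemy_specific.get(enemy, default_opening)
    w, l = results.get(first_choice, (0, 0))
    if w + l < min_games or w / (w + l) >= good_enough:
        return first_choice  # still testing it, or it keeps winning
    # First choice is failing: fall back to the best-scoring opening so far.
    return max(results, key=lambda o: results[o][0] / max(1, sum(results[o])))

# With no results yet, the configured hint is tried first.
print(choose_opening("ximp", {}, {"ximp": "DragoonRush"}, "ZealotRush"))
# -> DragoonRush
```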

| bot | total | DTRush (P) | DragoonRush (P) | ZealotRush (P) | 4RaxMarines (T) | MarineRush (T) | TankPush (T) | VultureRush (T) | 2HatchHydra (Z) | 3HatchMuta (Z) | 3HatchScourge (Z) | ZerglingRush (Z) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| #1 saida | 13-88 13% | 12-7 63% | 0-2 0% | 0-5 0% | 0-9 0% | 0-9 0% | 1-13 7% | 0-9 0% | 0-9 0% | 0-9 0% | 0-8 0% | 0-8 0% |
| #2 cherrypi | 1-99 1% | 0-8 0% | 0-7 0% | 0-7 0% | 0-8 0% | 1-11 8% | 0-8 0% | 0-8 0% | 0-11 0% | 0-11 0% | 0-10 0% | 0-10 0% |
| #3 cse | 2-99 2% | 0-7 0% | 2-14 12% | 0-7 0% | 0-11 0% | 0-10 0% | 0-10 0% | 0-10 0% | 0-8 0% | 0-8 0% | 0-7 0% | 0-7 0% |
| #4 bluebluesky | 11-92 11% | 0-4 0% | 3-10 23% | 4-11 27% | 0-5 0% | 0-5 0% | 2-11 15% | 0-5 0% | 0-9 0% | 0-8 0% | 0-8 0% | 2-16 11% |
| #5 locutus | 6-97 6% | 0-7 0% | 4-17 19% | 0-7 0% | 0-8 0% | 0-8 0% | 1-11 8% | 0-8 0% | 1-10 9% | 0-7 0% | 0-7 0% | 0-7 0% |
| #6 isamind | 5-96 5% | 0-7 0% | 4-17 19% | 0-7 0% | 0-9 0% | 0-8 0% | 0-8 0% | 0-8 0% | 0-7 0% | 0-7 0% | 0-7 0% | 1-11 8% |
| #7 daqin | 12-90 12% | 4-12 25% | 0-4 0% | 2-9 18% | 0-6 0% | 0-6 0% | 1-6 14% | 0-5 0% | 2-13 13% | 0-7 0% | 0-7 0% | 3-15 17% |
| #8 mcrave | 29-71 29% | 5-12 29% | 1-6 14% | 0-5 0% | 0-3 0% | 10-13 43% | 1-5 17% | 0-3 0% | 2-6 25% | 0-3 0% | 0-3 0% | 10-12 45% |
| #9 iron | 9-94 9% | 0-10 0% | 1-14 7% | 0-9 0% | 0-8 0% | 0-8 0% | 0-8 0% | 1-12 8% | 1-6 14% | 1-6 14% | 0-4 0% | 5-9 36% |
| #10 zzzkbot | 13-87 13% | 0-3 0% | 0-3 0% | 13-20 39% | 0-9 0% | 0-9 0% | 0-9 0% | 0-9 0% | 0-7 0% | 0-6 0% | 0-6 0% | 0-6 0% |
| #11 steamhammer | 11-92 11% | 0-5 0% | 0-5 0% | 8-19 30% | 1-10 9% | 0-6 0% | 0-6 0% | 0-6 0% | 0-7 0% | 0-7 0% | 0-7 0% | 2-14 12% |
| #12 microwave | 20-81 20% | - | - | 18-7 72% | 0-7 0% | 2-14 12% | 0-7 0% | 0-7 0% | 0-10 0% | 0-10 0% | 0-10 0% | 0-9 0% |
| #13 lastorder | 4-92 4% | 0-6 0% | 0-6 0% | 2-12 14% | 2-10 17% | 0-5 0% | 0-5 0% | 0-5 0% | 0-11 0% | 0-11 0% | 0-11 0% | 0-10 0% |
| #14 tyr | 36-61 37% | 5-12 29% | 0-4 0% | 0-5 0% | 0-2 0% | 3-4 43% | 13-7 65% | 1-2 33% | 13-15 46% | 0-3 0% | 0-3 0% | 1-4 20% |
| #15 metabot | 35-56 38% | 4-5 44% | 6-5 55% | 2-4 33% | 1-6 14% | 3-9 25% | 1-6 14% | 0-3 0% | 0-2 0% | 6-3 67% | 3-3 50% | 9-10 47% |
| #16 letabot | 48-44 52% | 11-14 44% | 0-3 0% | 2-6 25% | 0-2 0% | 1-4 20% | 0-2 0% | 4-7 36% | 30-6 83% | - | - | - |
| #17 arrakhammer | 56-41 58% | - | - | 23-6 79% | 0-6 0% | 0-6 0% | 0-6 0% | 0-6 0% | - | - | - | 33-11 75% |
| #18 ecgberht | 40-56 42% | 9-7 56% | 9-8 53% | 1-4 20% | 0-2 0% | 0-5 0% | 0-2 0% | 6-7 46% | 0-3 0% | 0-3 0% | 0-3 0% | 15-12 56% |
| #20 ximp | 38-56 40% | 0-2 0% | 7-7 50% | 4-5 44% | 0-4 0% | 0-4 0% | 9-19 32% | 1-6 14% | - | - | 17-9 65% | - |
| #21 cdbot | 44-54 45% | - | - | 23-4 85% | 0-2 0% | 19-15 56% | 0-2 0% | 0-2 0% | 0-6 0% | 1-9 10% | 0-5 0% | 1-9 10% |
| #22 aiur | 57-45 56% | 35-1 97% | - | - | 0-2 0% | 0-2 0% | 0-2 0% | 11-10 52% | 1-5 17% | 9-15 38% | 0-3 0% | 1-5 17% |
| #23 killall | 73-27 73% | - | - | 30-8 79% | 0-2 0% | 12-6 67% | 0-2 0% | 0-2 0% | - | - | - | 31-7 82% |
| #24 willyt | 36-55 40% | 3-12 20% | 1-8 11% | 0-5 0% | 0-4 0% | 0-5 0% | 0-4 0% | 10-11 48% | - | - | - | 22-6 79% |
| #25 ailien | 71-30 70% | - | - | 18-11 62% | 16-10 62% | 2-4 33% | 0-2 0% | 0-2 0% | - | - | - | 35-1 97% |
| #26 cunybot | 75-15 83% | - | - | 23-1 96% | - | 30-7 81% | - | - | - | - | - | 22-7 76% |
| #27 hellbot | 100-2 98% | - | - | 33-0 100% | - | 41-2 95% | - | - | - | - | - | 26-0 100% |
| overall | - 33% | 88-141 38% | 38-140 21% | 206-184 53% | 20-145 12% | 124-185 40% | 29-161 15% | 34-153 18% | 50-151 25% | 17-133 11% | 20-121 14% | 219-206 52% |

The DT rush caused surprising problems for SAIDA, but terran and zerg had nothing. Did playing random contribute? Does the updated current SAIDA, flame-hardened on SSCAIT, react better? The hand-chosen 2 hatch hydra also did strikingly well against LetaBot, not an obvious choice. Every opening had a plus score against some opponent, though VultureRush barely made it over. Looking across the bottom row, the default openings had the best overall results for each race—they were chosen correctly. Also, we can see that protoss was UAlbertaBot’s best race, and terran the worst; we already knew that, but here we see it in the numbers.