
AIIDE 2021 - Stardust table in minutes and seconds

It occurred to me a little late that many people would find the Stardust data table easier to understand if the frame times were converted to minutes and seconds. So here’s that version. See the previous post from today.

opponent      games |  firstDarkTemplarCompleted  |       pylonInOurMain        |   firstMutaliskCompleted
                    |   n    min  median     max  |   n    min  median     max  |    n    min  median     max
bananabrain     155 |  20   5:15    5:29   16:11  |   0      -       -       -  |    0      -       -       -
dragon          156 |   0      -       -       -  |   0      -       -       -  |    0      -       -       -
steamhammer     158 |   0      -       -       -  |   0      -       -       -  |   17   4:59    5:43    7:11
mcrave          157 |   0      -       -       -  |   0      -       -       -  |  124   6:17    7:35   11:12
willyt          157 |   0      -       -       -  |   0      -       -       -  |    0      -       -       -
microwave       157 |   0      -       -       -  |   0      -       -       -  |   17   5:07    5:55    7:54
daqin           156 | 126   5:13    5:29   12:36  |   2   1:53    1:54    1:55  |    0      -       -       -
freshmeat       157 |   0      -       -       -  |   0      -       -       -  |    1  11:40   11:40   11:40
ualbertabot     157 |  17   4:19    4:29    4:36  |   0      -       -       -  |    0      -       -       -

AIIDE 2021 - Stardust’s learning

I investigated how Stardust’s learning works, and what it learned. It’s unusual, so it was worth a close look.

In its learning file of game records for each opponent, Stardust records values for 3 keys for each game, firstDarkTemplarCompleted, pylonInOurMain, and firstMutaliskCompleted. If the event occurs in the game, the value is the frame time of the event; otherwise the value is 2147483647 (INT_MAX, the largest int value, in this C++ implementation). It also records whether the game was a win or a loss. It records the hash of the map, too, but that doesn’t seem to be used again.

summarizing the data

The class Opponent is responsible for providing the learned information to the rest of the bot. It summarizes the game records via two routines.

  int minValueInPreviousGames(const std::string &key, int defaultNoData, int maxCount = INT_MAX, int minCount = 0);

If there are at least minCount games, then look through the game records, most recent first, for up to maxCount games. Look up the key for each game and return its minimum value, or the default value if there are none. This amounts to finding the earliest frame at which the event happened, or the default if it did not happen in the specified number of games.

   double winLossRatio(double defaultValue, int maxCount = INT_MAX);

Look through the game records, most recent first, for up to maxCount games and return the winning ratio, or the default value if there are no games yet.
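
Put concretely, the two summaries behave like this sketch. It is my reconstruction from the descriptions above, not Stardust's code; GameRecord and its fields are names I made up:

    // Minimal sketch of the two summaries as described above.
    // GameRecord and its fields are my invented names, not Stardust's.
    #include <climits>
    #include <map>
    #include <string>
    #include <vector>

    struct GameRecord
    {
        std::map<std::string, int> values;   // key -> frame, or INT_MAX if the event never happened
        bool won;
    };

    // games are ordered most recent first
    int minValueInPreviousGames(const std::vector<GameRecord> &games, const std::string &key,
                                int defaultNoData, int maxCount = INT_MAX, int minCount = 0)
    {
        if (int(games.size()) < minCount) return defaultNoData;
        int earliest = INT_MAX;
        int looked = 0;
        for (const GameRecord &game : games)
        {
            if (looked++ >= maxCount) break;
            auto it = game.values.find(key);
            if (it != game.values.end() && it->second < earliest) earliest = it->second;
        }
        return earliest == INT_MAX ? defaultNoData : earliest;
    }

    double winLossRatio(const std::vector<GameRecord> &games, double defaultValue, int maxCount = INT_MAX)
    {
        int wins = 0, total = 0;
        for (const GameRecord &game : games)
        {
            if (total >= maxCount) break;
            ++total;
            if (game.won) ++wins;
        }
        return total == 0 ? defaultValue : double(wins) / total;
    }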

using the summarized data

Each of the 3 keys is used in exactly one place in the code. Here is where firstDarkTemplarCompleted is looked up in the PvP strategy code:

    if (Opponent::winLossRatio(0.0, 200) < 0.99)
    {
        expectedCompletionFrame = Opponent::minValueInPreviousGames("firstDarkTemplarCompleted", 7300, 15, 10);
    }

This means “If we’re rolling you absolutely flat (at least 99% wins in the last 200 games), then it doesn’t matter. Otherwise there’s some risk. In the most recent 15 games, find the earliest frame that the first enemy dark templar was (estimated to be) completed, or return frame 7300 if none.” The default frame 7300 is not the earliest a DT can emerge; they can be on the map over a thousand frames earlier. So it is not a worst-case assumption. Further code overrides the frame number if there is scouting information related to dark templar production. It attempts to build a defensive photon cannon just in time for the enemy DT’s arrival, and sometimes to get an observer.

The key pylonInOurMain is part of cannon rush defense. Stardust again checks the win ratio and again looks back 15 games with a minimum game count of 10, this time with a default of 0 if there are not enough games. It starts scouting its base 500 frames (about 21 seconds) ahead of the earliest seen enemy pylon appearing in its base, which may be never. The idea is that Stardust doesn’t waste time scouting its own base if it hasn’t seen you proxy a pylon in the last 15 games, and delays the scout if the pylon is proxied late.
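
Pieced together, the timing logic amounts to something like this sketch (my paraphrase and my variable names, not Stardust's code):

    // My paraphrase of the logic, not Stardust's code.
    // Default 0 when there are too few games: no proxy pylon on record, so no early self-scout.
    int earliestPylonFrame = Opponent::minValueInPreviousGames("pylonInOurMain", 0, 15, 10);
    if (earliestPylonFrame > 0 && earliestPylonFrame < INT_MAX)
    {
        // Begin scouting our own main about 500 frames (~21 seconds) before
        // the earliest frame an enemy pylon has ever appeared there.
        int startSelfScoutFrame = std::max(0, earliestPylonFrame - 500);
    }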

The key firstMutaliskCompleted is used very similarly, to decide whether and when to defend each nexus with cannons. The goal is to get cannons in time in case mutalisks arrive without being scouted. There are simple rules to decide how many cannons at each nexus:

    // Main and natural are special cases, we only get cannons there to defend against air threats
    if (base == Map::getMyMain() || base == Map::getMyNatural())
    {
        if (enemyAirUnits > 6) return 4;
        if (enemyAirThreat) return 3;
        if (enemyDropThreat && BWAPI::Broodwar->getFrameCount() > 8000) return 1;
        return 0;
    }

    // At expansions we get cannons if the enemy is not contained or has an air threat
    if (!Strategist::isEnemyContained() || enemyAirUnits > 0) return 2;
    if (enemyAirThreat || enemyDropThreat) return 1;
    return 0;

If the firstMutaliskCompleted check says that it’s time, it sets enemyAirThreat to true and makes 3 cannons each at main and natural, and at least 1 at each other base.
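
So the check presumably boils down to something along these lines (my sketch, with my names and a guessed default and lead time, not the bot's code):

    // My sketch of the mutalisk timing check, not Stardust's code.
    // The default and the lead time here are my guesses.
    const int cannonLeadFrames = 750;   // roughly a cannon's build time
    int earliestMutaFrame = Opponent::minValueInPreviousGames("firstMutaliskCompleted", INT_MAX, 15, 10);
    if (earliestMutaFrame < INT_MAX &&
        BWAPI::Broodwar->getFrameCount() >= earliestMutaFrame - cannonLeadFrames)
    {
        enemyAirThreat = true;   // 3 cannons at main and natural, at least 1 everywhere else
    }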

the data itself

Here’s my summary of the data in Stardust’s files. The files include prepared data. I left the prepared data out; this covers only what was recorded during the tournament. The tournament was run for 157 rounds, although the official results are given after round 150. The table here is data for all 157 rounds. I don’t have a way to tell which unrecorded games were from rounds 1-150 and which were from 151-157... though I think I could guess.

n is the number of games for which a value (other than 2147483647) was recorded for the key. The values are frame numbers.

opponent      games |  firstDarkTemplarCompleted  |       pylonInOurMain        |   firstMutaliskCompleted
                    |   n    min  median     max  |   n    min  median     max  |    n    min  median     max
bananabrain     155 |  20   7579  7897.5   23319  |   0      -       -       -  |    0      -       -       -
dragon          156 |   0      -       -       -  |   0      -       -       -  |    0      -       -       -
steamhammer     158 |   0      -       -       -  |   0      -       -       -  |   17   7188    8241   10355
mcrave          157 |   0      -       -       -  |   0      -       -       -  |  124   9070   10939   16146
willyt          157 |   0      -       -       -  |   0      -       -       -  |    0      -       -       -
microwave       157 |   0      -       -       -  |   0      -       -       -  |   17   7371    8534   11397
daqin           156 | 126   7533  7912.5   18154  |   2   2721  2743.5    2766  |    0      -       -       -
freshmeat       157 |   0      -       -       -  |   0      -       -       -  |    1  16801   16801   16801
ualbertabot     157 |  17   6230    6477    6627  |   0      -       -       -  |    0      -       -       -

As you might expect after deep contemplation of the nature of reality, only protoss makes dark templar or proxy pylons, and only zerg makes mutalisks. Nothing interesting was recorded for the terran opponents.

Notice that UAlbertaBot sometimes makes dark templar much earlier than the no-data 7300 frame default time; the others do not. DaQin is recorded as twice placing a proxy pylon in Stardust’s main. I didn’t think it ever did that. I guess it’s a holdover from the Locutus proxy pylon play, to trick opponents into overreacting? DaQin made DTs in most games, and McRave went mutalisks in most games. FreshMeat is recorded as having made a mutalisk (or more than one) in exactly one game, which seems unusual.

AIIDE 2020 - McRave’s learning algorithm

I meant to summarize McRave’s learning data today, but to know what to put in the tables I had to understand how the numbers are used. Yesterday I examined McRave’s strategy representation with three elements, like “PoolHatch,Overpool,2HatchMuta”. In the code, the elements are named “build” (like PoolHatch), “opener” (like Overpool) and “transition” (like 2HatchMuta). Today I read the code to see what the numbers in the learning files are and how they are used.

Here’s a sample data file, showing McRave doing well versus Steamhammer. The first two numbers are the overall wins and losses. After that, delimited by dashes, is a section for the first build, followed by a section for the openers of that build and a section for the transitions of the build. Then more sections for the other two builds and their appendages. Each element has an independent count of wins and losses.

86 64
-
HatchPool 0 0
-
12Hatch 0 0
-
2HatchMuta 0 0
2HatchSpeedling 0 0
-
PoolHatch 28 27
-
4Pool 0 0
9Pool 0 0
Overpool 0 0
12Pool 28 27
-
2HatchMuta 16 17
2HatchSpeedling 12 10
3HatchSpeedling 0 0
-
PoolLair 58 37
-
9Pool 58 37
-
1HatchMuta 58 37

The code calls a function to check which triples are allowed and deals with other minor details, but even with the fiddly bits it’s simple: It picks the build with the highest UCB value, then given that build the corresponding opener with the highest UCB value, then given that build and opener the transition with the highest UCB value. Because of how the data file is organized, this can be done in one pass. The code is in the file LearningManager.cpp in the nested function parseLearningFile().
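
For reference, one level of the selection looks roughly like this. It's a generic UCB1 sketch, not McRave's exact formula; the same step is repeated for the build, then the opener, then the transition:

    // Generic UCB1 selection over one level of the hierarchy (not McRave's exact code).
    #include <cmath>
    #include <vector>

    struct Record { int wins = 0; int losses = 0; };

    int pickByUCB(const std::vector<Record> &choices)
    {
        int totalGames = 0;
        for (const Record &r : choices) totalGames += r.wins + r.losses;

        int best = 0;
        double bestValue = -1.0;
        for (size_t i = 0; i < choices.size(); ++i)
        {
            int n = choices[i].wins + choices[i].losses;
            // Untried choices get priority; otherwise win rate plus an exploration bonus
            // that shrinks as the choice accumulates games.
            double value = (n == 0)
                ? 1e9
                : double(choices[i].wins) / n + std::sqrt(2.0 * std::log(double(totalGames)) / n);
            if (value > bestValue) { bestValue = value; best = int(i); }
        }
        return best;
    }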

In theory, this three-level hierarchy could speed up learning. For example, you might be able to conclude that PoolHatch is better than PoolLair against some opponent, even if you don’t have enough data to know which PoolHatch opener or transition is best. My intuition is that the hierarchical scheme should on average work better than a flat scheme, but that there will be perverse situations where it does worse. Many of the triples are not allowed, which limits the value of the hierarchy. There should be enough data from this tournament to judge whether the hierarchy brought an advantage; it would be interesting to do the analysis.

Next: OK, now I know what tables to generate. I have to add some features to my script, but soon I should be able to post the summary tables.

AIIDE 2019 - looking into AITP

AITP follows an “aggressive defense” game plan, similar to SAIDA, where it builds up a strong ground army, then sets up tank lines in forward positions to constrict the enemy like a boa (preferably a snake, sometimes a feather boa). Its overall skill is far less, though; it finished second to last in AIIDE 2019. Overall, after reading some of the key code, I am not impressed (maybe you shouldn’t expect me to have been). The plans are ambitious but the work looks hasty, as if the authors underestimated it and had to rush to make the tournament. Still, the plans are ambitious, and that makes it somewhat interesting.

tactics

One of AITP’s first steps is to calculate map positions for a wall (supplyPoint barracksPoint bunkerPoint), defensive locations (chokePoint1 through chokePoint4) and offensive tank lines (frontLine1 through frontLine3). How does it calculate them? They are hardcoded for every starting base on every map in InformationManager::initializePosition(). That must be part of why AITP did not want to participate in the unknown map tournament. These positions seem to be the foundation on which all the tactical decisions are laid.

Spider mines are placed in neat diagonal double lines at locations offset from the defense line calculated by CombatCommander::updateDefenseLine(), which looks mostly at the supply and the previous defense line and sets the current defense line to chokePoint2 (the natural choke) or to one of the precalculated frontLine values. It’s simple and primitive, and there is not much examination of the game situation, a good starting try but nothing you expect to be strong. It’s silly sometimes; the defense line may be in front of an empty base that is away from the fight. CombatCommander::updateSpiderSquads() sets up spiderSquad1 and spiderSquad2—and also lays mines itself, calculating the offset steps and checking every vulture itself rather than leaving it to the squads. The intended separation of functions is not observed. Micro::LaySpiderMine() is unused.

StrategyManager::shouldBuildTurret() decides how many turrets should go where. Some turrets go to occupied choke points 1 and 2, some next to command centers, and some at the front lines computed above depending on certain “squad positions.” These so-called squad positions come from CombatCommander::getSquadPosition(std::string squadID), which takes a string that is called squad ID but is actually a 2-digit code that identifies a location rather than a squad. The location is usually an offset from the current defense line, but there are exceptions. The ID seems to choose one of a hardcoded set of offsets for the final position.

Units, of course, also go to the current defense line: That is the tank line, soon furnished with mines and turrets, that is meant to restrict the enemy. At some point CombatCommander::updateAttackEnemyBase() becomes true and sets _aimEnemyBase to the location of an enemy base to destroy. When this happens depends on what the current defense line is, but it’s another set of simple calculations using the closest friendly unit to various enemy bases.

what to build

I outlined the build order and unit mix decisions in the post how AITP played. There are game stages A, B and C and “modules” A1, A2 and so on for each game stage.

A1 is the antirush module that starts a barracks at 5, whose play was described in that post. It switches to the next module when there is a bunker and 4 marines and the engineering bay is started. One of AITP’s strategies was A1-B1-B2-C2. B1 makes SCVs up to 20, barracks up to 5, academy and medics, moves out at 10 marines, and switches to the next module after 10 minutes game time. B2 makes a bunker under certain conditions—but only if the previous module was A4. The modules are not in fact modular but know about each other. Then it makes SCVs and marines up to a smaller limit than B1, makes a factory and gets the upgrades and expansion. It switches when there are 2 command centers or the supply reaches 100 (really 100, not 50: int supplyUsed = BWAPI::Broodwar->self()->supplyUsed() / 2;). Module C2 is the middlegame: It makes SCVs to the limit, adds factories, throws bunkers into the middle of the map, gets armory upgrades, and never switches.
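
As a concrete illustration, the B2 switch test reads roughly like this (my paraphrase of the condition above, not AITP's exact code):

    // Paraphrase of the B2 switch condition, not AITP's exact code.
    // BWAPI counts supply in half-units, so divide by 2 to get the displayed supply.
    int supplyUsed = BWAPI::Broodwar->self()->supplyUsed() / 2;
    int commandCenters = BWAPI::Broodwar->self()->completedUnitCount(BWAPI::UnitTypes::Terran_Command_Center);
    if (commandCenters >= 2 || supplyUsed >= 100)
    {
        _currentModule = "C2";
    }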

Here’s an extract showing the strange over-specificity of AITP’s code, from StrategyManager::doSwitchModule() which switches from the current module to the next one. The code does not simply parse out the strategy module names from the StrategyName string, it handles each as a special case, sometimes taking extra actions that I would say should be factored out.

	if (_currentModule == "A1")
	{
		if (Config::Strategy::StrategyName == "A1-B1-B2-C2")
		{
			_currentModule = "B1";
			CombatCommander::Instance().clearSquad("bunkerSquad2");
		}
		else if (Config::Strategy::StrategyName == "A1-B3-C2")
		{
			_currentModule = "B3";
			CombatCommander::Instance().clearSquad("bunkerSquad2");
		}
	}

AITP’s modules are not easy to read. I think Steamhammer’s explicit build orders plus zerg strategy rules are more perspicuous and no less expressive—but then I would say that, wouldn’t I?

AIIDE 2018 - what CherryPi learned

Here is a table of how each CherryPi opening fared against each opponent, like the tables I made for other bots. Reading the code confirmed my inference that the learning files recorded opening build orders, not build orders switched to later in the game; see how CherryPi played.

rank  bot           total          openings played, wins-losses (win%)
#1    saida         13-90  (13%)   zve9poolspeed 1-19 (5%), zvt2baseultra 1-15 (6%), zvt3hatchlurker 9-37 (20%), zvtmacro 2-19 (10%)
#3    cse           73-30  (71%)   zve9poolspeed 0-2 (0%), zvp10hatch 24-5 (83%), zvpohydras 16-8 (67%), zvtmacro 33-15 (69%)
#4    bluebluesky   89-14  (86%)   zve9poolspeed 0-1 (0%), zvp10hatch 29-8 (78%), zvtmacro 60-5 (92%)
#5    locutus       84-19  (82%)   3basepoollings 63-11 (85%), zvp6hatchhydra 14-3 (82%), zvpomutas 2-2 (50%), zvtmacro 5-3 (62%)
#6    isamind       99-4   (96%)   3basepoollings 1-0 (100%), zvp6hatchhydra 98-4 (96%)
#7    daqin         103-0  (100%)  zvtmacro 103-0 (100%)
#8    mcrave        87-16  (84%)   3basepoollings 9-2 (82%), zvp6hatchhydra 31-4 (89%), zvpomutas 14-4 (78%), zvtmacro 33-6 (85%)
#9    iron          97-6   (94%)   hydracheese 97-6 (94%)
#10   zzzkbot       93-10  (90%)   10hatchling 58-4 (94%), 9poolspeedlingmuta 0-1 (0%), zvz9poolspeed 35-4 (90%), zvzoverpool 0-1 (0%)
#11   steamhammer   81-21  (79%)   10hatchling 22-7 (76%), zve9poolspeed 16-5 (76%), zvz12poolhydras 0-1 (0%), zvz9poolspeed 43-8 (84%)
#12   microwave     94-9   (91%)   zvz9gas10pool 0-1 (0%), zvz9poolspeed 4-2 (67%), zvzoverpool 90-6 (94%)
#13   lastorder     85-18  (83%)   10hatchling 45-7 (87%), zve9poolspeed 0-1 (0%), zvzoverpool 40-10 (80%)
#14   tyr           98-5   (95%)   zvp10hatch 98-5 (95%)
#15   metabot       94-2   (98%)   zvpohydras 94-2 (98%)
#16   letabot       101-2  (98%)   10hatchling 0-1 (0%), 3basepoollings 97-0 (100%), zve9poolspeed 1-1 (50%), zvt2baseguardian 3-0 (100%)
#17   arrakhammer   92-11  (89%)   zvz9poolspeed 92-11 (89%)
#18   ecgberht      102-1  (99%)   zvtmacro 102-1 (99%)
#19   ualbertabot   99-4   (96%)   9poolspeedlingmuta 96-2 (98%), zve9poolspeed 3-2 (60%)
#20   ximp          98-5   (95%)   zvp3hatchhydra 1-0 (100%), zvpohydras 97-5 (95%)
#21   cdbot         103-0  (100%)  zve9poolspeed 96-0 (100%), zvz9poolspeed 7-0 (100%)
#22   aiur          100-3  (97%)   zvpohydras 100-3 (97%)
#23   killall       103-0  (100%)  10hatchling 102-0 (100%), zvzoverpool 1-0 (100%)
#24   willyt        103-0  (100%)  2hatchmuta 103-0 (100%)
#25   ailien        103-0  (100%)  zvz9poolspeed 103-0 (100%)
#26   cunybot       100-3  (97%)   zvz9poolspeed 100-3 (97%)
#27   hellbot       103-0  (100%)  zvp10hatch 31-0 (100%), zvpohydras 72-0 (100%)
overall              (90%)         10hatchling 227-19 (92%), 2hatchmuta 103-0 (100%), 3basepoollings 170-13 (93%), 9poolspeedlingmuta 96-3 (97%), hydracheese 97-6 (94%), zve9poolspeed 117-31 (79%), zvp10hatch 182-18 (91%), zvp3hatchhydra 1-0 (100%), zvp6hatchhydra 143-11 (93%), zvpohydras 379-18 (95%), zvpomutas 16-6 (73%), zvt2baseguardian 3-0 (100%), zvt2baseultra 1-15 (6%), zvt3hatchlurker 9-37 (20%), zvtmacro 338-49 (87%), zvz12poolhydras 0-1 (0%), zvz9gas10pool 0-1 (0%), zvz9poolspeed 384-28 (93%), zvzoverpool 131-17 (89%)

Look how sparse the chart is—CherryPi was highly selective about its choices. It did not try more than 4 different builds against any opponent. It makes sense to minimize the number of choices so that you don’t lose games exploring bad ones, but you have to be pretty sure that one of the choices you do try is good. Where did the selectivity come from?

The opening “hydracheese” was played only against Iron, and was the only opening played against Iron. It smelled like a hand-coded choice. Sure enough, the file source/src/models/banditconfigurations.cpp configures builds by name for 18 of the 27 entrants. A comment says that the build order switcher is turned off for the hydracheese opening only: “BOS disabled for this specific build because the model hasn’t seen it.” Here is the full set of builds configured, including defaults for those that were not hand-configured. CherryPi played only builds that were configured, but did not play all the builds that were configured; presumably it stopped when it hit a good one.

bots              builds                                                                note
AILien            zve9poolspeed zvz9poolspeed                                           returning opponents from last year
AIUR              zvtmacro zvpohydras zvp10hatch
Arrakhammer       10hatchling zvz9poolspeed
Iron              hydracheese
UAlbertaBot       zve9poolspeed 9poolspeedlingmuta
Ximp              zvpohydras zvtmacro zvp3hatchhydra
Microwave         zvzoverpool zvz9poolspeed zvz9gas10pool                               "we have some expectations"
Steamhammer       zve9poolspeed zvz9poolspeed zvz12poolhydras 10hatchling
ZZZKBot           9poolspeedlingmuta 10hatchling zvz9poolspeed zvzoverpool
ISAMind, Locutus,
McRave, DaQin     zvtmacro zvp6hatchhydra 3basepoollings zvpomutas
CUNYBot           zvzoverpoolplus1 zvz9gas10pool zvz9poolspeed
HannesBredberg    zvtp1hatchlurker zvt2baseultra zvt3hatchlurker zvp10hatch
LetaBot           zvtmacro 3basepoollings zvt2baseguardian zve9poolspeed 10hatchling
MetaBot           zvtmacro zvpohydras zvpomutas zve9poolspeed
WillyT            zvt2baseultra 12poolmuta 2hatchmuta
ZvT               zvt2baseultra zvtmacro zvt3hatchlurker zve9poolspeed                  defaults
ZvP               zve9poolspeed zvtmacro zvp10hatch zvpohydras
ZvZ               10hatchling zve9poolspeed zvz9poolspeed zvzoverpool
ZvR               10hatchling zve9poolspeed 9poolspeedlingmuta

I read this as pulling out all the stops to reach #1. They would have succeeded if not for SAIDA.

banditconfigurations.cpp continues and declares some properties for builds including non-opening builds. It looks like .validOpening() tells whether it can be played as an opening build, .validSwitch() tells whether the build order switcher is allowed to switch to it during the game, and .switchEnabled() tells whether the build order switcher is enabled at all.
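
I didn't copy out the declarations, but I imagine the configuration calls read something like this sketch—the three method names are from the file, while BuildConfig, addBuild, and the fluent style are my invention:

    // Sketch only. The three property names are from banditconfigurations.cpp;
    // the BuildConfig type, addBuild, and the fluent style are my invention.
    addBuild(BuildConfig("zvz9poolspeed")
        .validOpening(true)      // may be picked as the opening build
        .validSwitch(true)       // the build order switcher may switch into it mid-game
        .switchEnabled(true));   // the build order switcher runs while playing it

    addBuild(BuildConfig("hydracheese")
        .validOpening(true)
        .switchEnabled(false));  // "BOS disabled for this specific build because the model hasn't seen it"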

The build orders themselves are defined in source/src/buildorders/. I found them a little hard to read, partly because they are written in reverse order: Actions to happen first are posted last to the blackboard.

The opening zve9poolspeed (I read “zve” as zerg versus everything) has the most red boxes in the chart—it did poorly against more opponents than any other. It may have been a poor choice to configure for use in so many cases. In contrast, zvz9poolspeed specialized for ZvZ was successful. It gets fast mutalisks and in general has a lot more strategic detail coded into the build.

They seem to have had expectations of the zvt2baseultra build against terran. It is configured for HannesBredberg, WillyT, and the default ZvT. It was in fact only tried against SAIDA. I didn’t notice anything that tells CherryPi what order to try opening builds in. Maybe the build order switcher itself contributes, helping to choose the more likely openings first?

LastOrder and its macro model - technical info

Time to dig into the details! I read the paper and some of the code to find stuff out.

LastOrder’s “macro” decisions are made by a neural network whose data size is close to 8MB—much larger than LastOrder’s code (but much smaller than CherryPi’s model data). There is room for a lot of smarts in that many bytes. The network takes in a summary description of the game state as a vector of feature values, and returns a macro action, what to build or upgrade or whatever next. The code to marshal data to and from the network is in StrategyManager.cpp.

network input

The list of network input features is initialized in the StrategyManager constructor and filled in in StrategyManager::triggerModel(). There are a lot of features. I didn’t dig into the details, but it looks as though some of the features are binary, some are counts, some are X or Y values that together give a position on the map, and a few are other numbers. They fall into these groups:

• State features. Basic information about the map and the opponent, our upgrades and economy, our own and enemy tech buildings.

• Waiting to build features. I’m not sure what these mean, but it’s something to do with production.

• “Our battle basic features” and “enemy battle basic features.” Combat units.

• “Our defend building features” and “enemy defend building features.” Static defense.

• “Killed” and “kill” features, what units of ours or the enemy’s are destroyed.

• A mass of features related to our current attack action and what the enemy has available to defend against it.

• “Our defend info” looks like places we are defending and what the enemy is attacking with.

• “Enemy defend info” looks like it locates the enemy’s static defense relative to the places we are interested in attacking.

• “Visible” gives the locations of the currently visible enemy unit types. I’m not quite sure what this means. A unit type doesn’t have an (x,y) position, and it seems as though LastOrder is making one up. It could be the location of the largest group of each unit type, or the closest unit of each type, or something. Have to read more code.

With this much information available, sophisticated strategies are possible in principle. It’s not clear how much of this the network successfully understands and makes use of. The games I watched did not give the impression of deep understanding, but then again, we have to remember that LastOrder learned to play against 20 specific opponents. Its results against those opponents suggest that it does understand them deeply.

network output

It looks like the network output is a single macro action. Code checks whether the action is valid in the current situation and, if so, calls on the appropriate manager to carry it out. The code is full of I/O details and validation and error handling, so I might have missed something in the clutter. Also the code shows signs of having been modified over time without tying up loose ends. I imagine they experimented actively.

By the way, the 9 pool/10 hatch muta/12 hatch muta opening choices and learning code from Overkill are still there, though Overkill’s opening learning is not used.

learning setup

The learning setup uses Ape-X DQN. The term is as dense as a neutron star! Ape-X is a way to organize deep reinforcement learning; see the paper Distributed Prioritized Experience Replay by Horgan et al of Google’s DeepMind. In “DQN”, D stands for deep and as far as I’m concerned is a term of hype and means “we’re doing the cool stuff.” Q is for Q-learning, the form of reinforcement learning you use when you know what’s good (winning the game) and you have to figure out from experience a policy (that’s a technical term) to achieve it in a series of steps over time. The policy is in effect a box where you feed in the situation and it tells you what to do in that situation. What’s good is represented by a reward (given as a number) that you may receive long after the actions that earned it; that can make it hard to figure out a good policy, which is why you end up training on a cluster of 1000 machines. Finally, “N” is for the neural network that acts as the box that knows the policy.
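
For orientation, the heart of Q-learning is a one-line update; the network, the replay buffer, and the cluster of machines are machinery to make that one line scale. A generic sketch, nothing LastOrder-specific:

    // Generic Q-learning update, for orientation only (not LastOrder's code).
    // q is the current estimate of the value of taking some action in some state.
    // maxNextQ is the value of the best action in the state we ended up in.
    void qUpdate(double &q, double reward, double maxNextQ, double alpha, double gamma)
    {
        // Nudge the estimate toward (immediate reward + discounted best future value).
        q += alpha * (reward + gamma * maxNextQ - q);
    }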

In Ape-X, the learning system consists of a set of Actors that (in the case of LastOrder) play Brood War and record the input features and reward for each time step, plus a Learner (the paper suggests that 1 learner is enough, though you could have more) that feeds the data to a reinforcement learning algorithm. The Actors are responsible for exploring, that is, trying out variations from the current best known policy to see if any of them are improvements. The Ape-X paper suggests having different Actors explore differently so you don’t get stuck in a rut. In the case of LastOrder, the Actors play against a range of opponents. The Learner keeps track of which data points are more important to learn and feeds those in more often to speed up learning. If you hit a surprise, meaning the reward is much different than you expected (“I thought I was winning, then a nuke hit”), that’s something important to learn.

LastOrder seems to have closely followed the Ape-X DQN formula from the Ape-X paper. They name the exact same set of techniques, although many other choices are possible. Presumably DeepMind knows what they’re doing.

LastOrder does not train with a reward “I won/I lost.” That’s very little information and appears long after the actions that cause it, and it would leave learning glacially slow. They use reward shaping, which means giving a more informative reward number that offers the learning system more clues about whether it is going in the right direction. They use a reward based on the current game score.
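
The paper doesn't spell out the formula, but a score-based shaped reward usually has this shape—the change in the score margin since the last decision, so the learner gets feedback long before the win or loss arrives:

    // The usual shape of a score-based shaped reward; not LastOrder's exact formula.
    double shapedReward(double ourScore, double enemyScore,
                        double prevOurScore, double prevEnemyScore)
    {
        // Reward the change in our score margin since the previous macro decision.
        return (ourScore - enemyScore) - (prevOurScore - prevEnemyScore);
    }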

the network itself

Following an idea from the 2015 paper Deep Recurrent Q-Learning for Partially Observable MDPs by Hausknecht and Stone, the LastOrder team layered a Long Short-Term Memory network in front of the DQN. We’ve seen LSTM before in Tscmoo (at least at one point; is it still there?). The point of the LSTM network is to remember what’s going on and more fully represent the game state, because in Brood War there is fog of war. So inputs go through the LSTM to expand the currently observed game state into some encoded approximation of all the game state that has been seen so far, then through the DQN to turn that into an action.

The LastOrder paper does not go into detail. There is not enough information in it to reproduce their network design. The Actor and Learner code is in the repo. I haven’t read it to see if it tells us everything.

Taken together it’s a little complicated, isn’t it? Not something for one hobbyist to try on their own. I think you need a team and a budget to put together something like this.

LastOrder and its macro model - general info

LastOrder (github) now has a 15-page academic paper out, Macro action selection with deep reinforcement learning in StarCraft by 6 authors including Sijia Xu as lead author. The paper does not go into great detail, but it reveals new information. It also uses a lot of technical terms without explanation, so it may be hard to follow if you don’t have the background. Also see my recent post how LastOrder played for a concrete look at its games.

I want to break my discussion into 2 parts. Today I’ll go over general information, tomorrow I’ll work through technical stuff, the network input and output and training and so on.

The name LastOrder turns out to be an ingenious reference to the character Last Order from the A Certain Magical Index fictional universe, the final clone sister. The machine learning process produces a long string of similar models which go into combat for experimental purposes, and you keep the last one. Good name!

LastOrder divides its work into 2 halves, “macro” handled by the machine learning model and “micro” handled by the rule-based code derived from Overkill. It’s a broad distinction; in Steamhammer’s 4-level abstraction terms, I would say that “macro” more or less covers strategy and operations, while “micro” covers tactics and micro. The macro model has a set of actions to build stuff, research tech, and expand to a new base, and a set of 18 attack actions which call for 3 different types of attack in each of 5 different places plus 3 “add army” actions which apparently assign units to the 3 types of attack. (The paper says 17 though it lists 18. It looks as though the mutalisk add army action is unused, maybe because mutas are added automatically.) There is also an action to do nothing.

The paper includes a table on the last page, results of a test tournament where each of the 28 AIIDE 2017 participants played 303 games against LastOrder. We get to see how LastOrder scored its advertised 83% win rate: #2 PurpleWave and #3 Iron (rankings from AIIDE 2017) won nearly all games, no doubt overwhelming the rule-based part of LastOrder so that the macro model could not help. Next Microwave scored just under 50%, XIMP scored about 32%, and all others performed worse, including #1 ZZZKBot at 1.64% win rate—9 bots scored under 2%. When LastOrder’s micro part is good enough, the macro part is highly effective.

In AIIDE 2018, #13 LastOrder scored 49%, ranking in the top half. The paper has a brief discussion on page 10. LastOrder was rolled by top finishers because the micro part could not keep up with #9 Iron and above (according to me) or #8 McRave and above (according to the authors, who know things I don’t). Learning can’t help if you’re too burned to learn. LastOrder was also put down by terrans Ecgberht and WillyT, whose play styles are not represented in the 2017 training group, which has only 4 terrans (one of which is Iron that LastOrder cannot touch).

In the discussion of future work (a mandatory part of an academic paper; the work is required to be unending), they talk briefly about how to fix the weaknesses that showed in AIIDE 2018. They mention improving the rule-based part and learning unit-level micro to address the too-burned-to-learn problem, and self-play training to address the limitations of the training opponents. Self-play is the right idea in principle, but it’s not easy. You have to play all 3 races and support all the behaviors you might face, and that’s only the starting point before making it work.

I’d like to suggest another simple idea for future work: Train each matchup separately. You lose generalization, but how much do production and attack decisions generalize between matchups? I could be wrong, but I think not much. Instead, a zerg player could train 3 models, ZvT ZvP and ZvZ, each of which takes fewer inputs and is solving an easier problem. A disadvantage is that protoss becomes relatively more difficult if you allow for mind control.

LastOrder has skills that I did not see in the games I watched. There is code for them, at least; whether it can carry out the skills successfully is a separate question. It can use hydralisks and lurkers. Most interestingly, it knows how to drop. The macro model includes an action to research drop (UpgradeOverlordLoad), an action to assign units and presumably load up for a drop (AirDropAddArmy), and actions to carry out drops in different places (AttackAirDropStart for the enemy starting base, AttackAirDropNatural, AttackAirDropOther1, AttackAirDropOther2, AttackAirDropOther3). The code to carry out drops is AirdropTactic.cpp; it seems to expect to drop either all zerglings, all hydralisks, or all lurkers, no mixed unit types. Does LastOrder use these skills at all? If anybody can point out a game, I’m interested.

Learning when to make hydras and lurkers should not be too hard. If LastOrder rarely or never uses hydras, it must be because it found another plan more effective—in games you make hydras first and then get the upgrades, so it’s easy to figure out. If it doesn’t use lurkers, maybe they didn’t help, or maybe it didn’t have any hydras around to morph after it tried researching the upgrade, because hydras were seen as useless. But still, it’s only 2 steps, it should be doable. Learning to drop is not as easy, though. To earn a reward, the agent has to select the upgrade action, the load action, and the drop action in order, each at a time when it makes sense. Doing only part of the sequence sets you back, and so does doing the whole sequence if you leave too much time between the steps, or drop straight into the enemy army, or make any severe mistake. You have to carry through accurately to get the payoff. It should be learnable, but it may take a long time and trainloads of data. I would be tempted to explicitly represent dependencies like this in some way or another, to tell the model up front the required order of events.

looking at ISAMind

As reported, ISAMind is a fork of Locutus, and the only important difference is that ISAMind can predict the opponent’s opening plan using a trained neural network instead of the rule-based method that Locutus inherited from Steamhammer (and modified). The author of Locutus has conveniently made a branch for ISAMind, so that we can compare and verify the differences.

The neural network is a standard feedforward network trained by backpropagation (we can see this in the network data in the file ISAMind.xml). The computation is implemented using OpenCV, a computer vision library that includes some general-purpose machine learning tools. I’ve looked at OpenCV in the past, and I think it is a good choice.

The input to the network is the frame count and the counts of any early game units seen, plus any opening plan recognized already so the network can decide whether to stick with it, plus the high-level features of the rule-based recognizer for proxy, worker rush, factory tech, and number of known enemy bases. With those high-level features, the network doesn’t have much thinking to do. The output is one of 10 possible opening plans, and if none is recognized it falls back on the rule-based recognizer again. In the best case, this neural network can’t provide much of a boost.
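
The game-time side of that is little work with OpenCV: load the trained network from ISAMind.xml and call predict() on the feature vector. A sketch under my assumptions; the feature layout and the wrapper function are mine, not the bot's:

    // Sketch: load the trained MLP once and classify one feature vector with OpenCV.
    // The wrapper function and the feature layout are my assumptions, not ISAMind's code.
    #include <opencv2/ml.hpp>

    int predictOpeningPlan(const cv::Mat &features)   // features: 1 x N, CV_32F
    {
        static cv::Ptr<cv::ml::ANN_MLP> net = cv::ml::ANN_MLP::load("ISAMind.xml");

        cv::Mat output;
        net->predict(features, output);               // one score per opening plan
        cv::Point maxLoc;
        cv::minMaxLoc(output, nullptr, nullptr, nullptr, &maxLoc);
        return maxLoc.x;                              // index of the highest-scoring plan
    }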

The key question is: How was the network trained? I looked all around and found no sign of an explanation. It could be trained to reproduce the values returned by the original rule-based method, but what would be the point? It could be trained to recognize the plan that leads to the highest win rate, but that would be expensive and might learn to deliberately misrecognize plans, so that’s less likely. It could have been trained on data from games with opponents that were coded to play specific plans, but that would risk lack of variety in the training data. It could have been trained on hand-labeled data, if they took the time to label that much data. My best guess is that they wrote a replay analysis tool that labels the training data (“here is what I saw”) with the plan that would have been chosen if the scout had seen everything (“here is what I should recognize”). The scout is early and normally does see everything that is in the enemy base (not proxied in some hidden location or built later outside the main), so if my guess is right, the trained network should normally be accurate.

I ran ISAMind locally to see what the predictions looked like. My impression is that the predictions were generally accurate, but sometimes sluggish. Steamhammer recognizes a zealot rush if it sees 2 gateways and no gas. ISAMind doesn’t recognize a zealot rush until it sees the zealots. Whether the sluggishness is harmful depends on whether the bot needs to react quickly. ISAMind recognizes a fast rush at the first sign, and that is the most important enemy plan to recognize immediately. Most input information has to come from the scout probe, and the scout probe circles the base without ever looking outside, so the network rarely sees enemy expansions unless it passes them on the way in; the input data is not complete enough to recognize all the possible plans reliably, because scouting is not thorough enough. (Locutus has a commit “Scout the enemy natural” only later, on 19 September.)

Overall, ISAMind seems like a cute little job that doesn’t try to accomplish much. It is like a class project or an early experiment to start getting tools and ideas into place.

looking at TitanIron

TitanIron is, as all signs indicated, a fork of Iron. It forks from the latest Iron, the AIIDE 2017 Iron. The Iron that played in CIG 2018 was carried over from the previous CIG 2017 tournament, and is an earlier version.

#15 TitanIron crashed in 30% of its games. Its win rate was 51.46% overall, or 73.59% in non-crash games. #6 Iron itself (an earlier version) finished with 74.31% win rate, so TitanIron does not seem to be an improvement, even discounting poor code quality. Curious point: #9 LetaBot upset Iron, because LetaBot copes well with vulture and wraith harassment. But TitanIron upset LetaBot. Another curious point: TitanIron performed poorly on the map Andromeda and strongly on Destination, and about equally well on the other 3 maps. Andromeda seems a surprising map to have trouble with.

I watched some replays. In Iron-TitanIron games, the two played identical build orders until the first factory finished, when Iron made 1 vulture first while TitanIron immediately added a machine shop to get the vulture upgrades faster. The bigger difference came later, when Iron built a starport and made wraiths while TitanIron did not. I got the impression that TitanIron rarely or never goes air. The expense of going air puts TitanIron ahead in vultures for a while, so that it won some games, but it seemed that if the vulture pressure did not push Iron over the edge, then Iron would strike back and take the advantage. I watched only 1 game Locutus-TitanIron, because Locutus’s proxy pylon trick misled TitanIron just as it does Iron, and Locutus won easily. I watched a strange game against AIUR where TitanIron built a second command center far from its natural, slowly floated it over, left it in the air, and built a new command center underneath. Not all the bugs are crashing bugs. In the picture, TitanIron is losing to AIUR. Notice the nicely spaced tanks, the spider mines directly next to one tank, the barracks floating in an unhelpful position, and the spare command center in the air.

[image: extra command center]

Overall, my impression is that TitanIron’s play is often similar to Iron’s. Unlike Iron, it does not make air units (it seems to have drop skills, but I didn’t run into any games with drop). Against protoss, TitanIron makes more tanks and uses them more cautiously and often clumsily. TitanIron also seems a bit fonder of expanding and growing its economy.

TitanIron adds over 4,000 lines of code to Iron. It was made by a team of 10, so that’s not an excessive amount of new code. The crash rate and the score suggest that the team was not disciplined enough in code quality and testing (of course Steamhammer crashed even more, so I don’t get to brag). Read on and you’ll see what most of the new lines of code do. I question the choices of where to spend effort. I’m not sure what the plan behind TitanIron was supposed to be.

openings

Iron does not play different openings as such. Conceptually, I see Iron as playing one opening which it varies reactively. TitanIron adds a directory opening with code which allows it to define specific build orders. The build order system is loosely modeled on Steamhammer’s, using similar names (which are not the same as UAlbertaBot’s names)—some members of the team have worked on Steamhammer forks.

TitanIron knows 3 specific build orders, named 8BB CC (1 barracks expand), SKT (tanks first), and 5BB (marines). Based on watching replays, TitanIron retains and uses Iron’s reactive opening, with modifications.

opponent-specific strategies

Iron does not recognize opponents by name. TitanIron recognizes 2 specific opponents: Locutus and PurpleSwarm. The zerg PurpleSwarm is a curious choice, since it did not play in CIG. Maybe they found it an interesting test opponent? In any case, Locutus is the main focus. It is recognized in 4 strategy classes, Locutus, SKT, TankAdvance, and Walling. In Iron’s codebase, any number of strategies can be active at the same time, and other parts of the code check by name which strategies are active to suit their actions to the situation.

	Locutus::Locutus()
	{
		std::string enemyName = him().Player()->getName();
		if (enemyName == "Locutus" || enemyName == "locutus")
		{
			me().SetOpening("SKT");
			m_detected = true;
		}
	}

SKT (defined in opening/opening.cpp) builds a barracks and refinery on 11, then adds 2 factories and gets tanks before vultures. It sounds as though it should refer to the “SK terran” unit mix of marines and medics with science vessels and no tanks, but it doesn’t. The Locutus strategy turns itself off (if I understand the code’s intent correctly) after all 4 dark templar of Locutus’s DT drop are dead, or after frame 13,000. Various buildings (barracks, factory, e-bay, turret) recognize when the Locutus strategy is active and carry out scripted actions. The name “Locutus” also activates the TankAdvance strategy which seems to first guard the natural and then perform a tank push, and deactivates the Walling strategy after frame 11,000 or when above 12 marines, causing the barracks to lift and open the wall.

TitanIron scored a total of 1 win out of 125 games against Locutus, so the special attention does not seem to have paid off.

PurpleSwarm gets less attention. (The question is why it got any.)

	Purpleswarm::Purpleswarm()
	{
		std::string enemyName = him().Player()->getName();
		if (him().Race() == BWAPI::Races::Zerg &&
			(enemyName == "Purpleswarm" || enemyName == "purpleswarm" || enemyName == "PurpleSwarm"))
		{
			me().SetOpening("5BB");
			m_detected = true;
		}
	}

5BB (also defined in opening/opening.cpp) builds barracks on 10 and 12, later adding a third barracks and training marines up to 30. I don’t see any other cases where TitanIron uses this opening. The rest of the code has no special instructions for PurpleSwarm or 5BB.

other new files

Besides the opening directory, TitanIron adds 16 files in the strategy and behavior directories, defining 8 strategies and behaviors. The added strategies are:

  • GuardNatural
  • Locutus
  • PurpleSwarm
  • SKT
  • TankAdvance

These are remarkable for being all and only the classes used when Locutus or PurpleSwarm is recognized. Do they have any other purpose? I didn’t dig into it, but I suspect that GuardNatural and TankAdvance may be used more widely against protoss.

The added unit behaviors are:

  • GuardLoc - guard a location
  • HangingBase - carry out drops
  • SKTAttack - related to SKT

GuardLoc has some connection with GuardNatural, but seems to be a general-purpose behavior, as far as I can tell. I’m not sure how HangingBase got its name.

The new opening directory and the newly added strategy and behavior files account for about 2/3rds of the lines of code added to Iron. The rest is scattered through the code and not as easy to inventory, but surely much of it must be uses of the new openings, strategies, and behaviors. I do see a lot of changes related to expanding.

AIIDE 2017 the learning bots

In March 2016 I analyzed which bots learned during the AIIDE 2015 tournament by looking at the data files. Here’s a similar analysis for AIIDE 2017.

I looked at the “write” directory for each bot, to see if it wrote files there, and if so, what the files looked like. Writing data doesn’t mean that the bot actually learned anything—it may not have used the data. Bots not listed in the table did not write anything interesting (maybe log files, nothing more). The table includes 15 bots of the 28 entrants, over half of them.

#    bot              info
1    ZZZKBot          varied info about each game, including tidbits like the time zone and the processor it was played on
2    PurpleWave       for each opponent, a log of games with info including the sequence of strategies followed
5    Microwave        same format as UAlbertaBot (Microwave has more strategies)
6    CherryPi         opening data for each opponent
9    Tyr              for each opponent, seems to save info only about the previous game: win or loss, and a flag or two like "FLYERS" or "CANNONS"
11   AILien           10 values per opponent: scores for zerg unit types, a few odds and ends like macroHeavyness and supplyISawAir
12   LetaBot          one file which records a log of games, with opponent, race, map, and 3 numbers per game
14   UAlbertaBot      a file for each opponent, giving for each strategy the count of wins and losses; learning was turned on this year
15   Aiur             91 values per opponent: strategy x map size
17   Skynet           a file for each opponent, with lines like "build_0_2 14 12"
19   MegaBot          many files; the important ones seem to be "MegaBot-vs-[bot name].xml" which give scores for each bot MegaBot can imitate: Skynet, NUSBot, Xelnaga
20   Xelnaga          a file for each opponent with a single number: -1, 0, 2, or 3
21   Overkill         many files with neural network data, reinforcement learning data, and opening learning data for each opponent (more than I thought!)
24   Myscbot          same format as UAlbertaBot, but only 1 strategy was played for each opponent; nothing was learned
25   HannesBredberg   2 numbers per opponent, approximately (not exactly) the win and loss counts

LetaBot seems worth looking into, to see whether its log is learning data and, if so, how it is used. PurpleWave also recorded data essentially as a log, which could be used for a wide range of purposes. And AILien has a unique learning method that I should spend some time on.

UAlbertaBot had learning turned on this year. It has sometimes left learning off because its default strategies were dominant. It’s also notable that Ziabot skipped learning this year. In the past it has learned. Ziabot also finished last.

Next: What AIUR learned.

looking at cpac

Cpac is a Steamhammer fork, so it’s easier for me to evaluate. On the other hand, the code has been reformatted, so a diff only says “everything’s different.”

Overall, cpac is very similar to Steamhammer, but corrects critical weaknesses. I think the author chose accurately which weaknesses were important to correct, and cpac should play clearly better than Steamhammer in many situations. It implements a number of ideas that I thought of or planned, always in a completely different way than I would have. It also removes some, but not all, code to support terran and protoss.

The most interesting changes are to the strategy.

strategy changes

The headline change is that, when cpac is out of book, it chooses the next unit to build using a neural network which was learned offline. It replaces Steamhammer’s long list of strategy rules with 2 calls, to extract the features of the game situation and feed them through the network to choose the next unit to make. Well, it makes an exception if the opponent is rushing; then it chooses an emergency unit by a set of emergency rules.

BuildOrder & StrategyBossZerg::freshProductionPlan()
{
	_latestBuildOrder.clearAll();
	updateGameState();
	// We always want at least 9 drones and a spawning pool.
	if (rebuildCriticalLosses())
	{
		return _latestBuildOrder;
	}
	ExtractFeature();
	predictUnit();

	return _latestBuildOrder;
}

According to the readme file, it was trained against the dataset of the paper A Dataset for StarCraft AI & an Example of Armies Clustering. The data comes from replays of expert human games. The network can choose a unit of one of 8 types, and the drone is not included. (Whether to make a drone or something else is the most important zerg macro decision, if you ask me.)

I suspect that the readme may not give the full picture. In games where cpac gets zealot rushed, I notice that it plays sunken defense into lurkers, which works against bots but does not happen in expert human play. I think either the database includes some extra data to cover that case, or else there is an exception that I didn’t notice which applies different rules.

Cpac is able to recover from some setbacks after the opening by applying emergency rules. Other mishaps leave it helpless. For example, in this horrible game versus Overkill, cpac lost most of its drones and never rebuilt beyond 9, because if you have 9 drones it’s “not an emergency.” In general, cpac’s play becomes blander and more predictable as the game continues.

Here’s a funny game against HannesBredberg showing good and bad unit mix decisions.

tactical and micro changes

I don’t think I found all the tactical and micro changes. Here are a few of the things I did find.

Cpac adds a new file MicroFlyers which is responsible for mutalisks and scourge. Among other things, it greatly improves on Steamhammer’s scourge micro and helps mutalisks fly around cannons.

Cpac hardcodes mutalisk targets for specific maps in StateManager. The idea seems to be that if the opponent is protoss and has cannoned up one base, the mutas should fly specific paths to find other bases that may be undefended. It fixes a critical Steamhammer weakness (“Cannons! My way is blocked, I can’t get past them!”) in a quick and dirty way.

Cpac can position sunkens toward the enemy, a longstanding Steamhammer weakness that most forks fix on their own. (I’m waiting for the future building placement planner.)

the openings

Cpac plays fixed openings against 9 opponents. Some of the openings are general purpose, and some (you can tell by the names alone) are custom tailored.

  • UAlbertaBot - OverpoolSpeedDave
  • Steamhammer - 5PoolHard
  • Aiur - 5PoolHard
  • Ximp - 2HatchMutaXimp
  • Skynet - 5PoolHard
  • MegaBot - 5PoolHard
  • Microwave - ZvZ_Overpool9Gas
  • ZZZKBot - 9PoolSpeedExpo
  • McRave - 2HatchMutaMcRave

The ZZZKBot setting backfired when ZZZKBot did not play as expected. The other settings successfully exploited weaknesses. The prepared openings made a major contribution to cpac’s #4 finish. Opponent modeling, I can’t possibly forget, is critical for long tournaments.

Against other opponents, cpac played randomly chosen openings in the Steamhammer style. Cpac trimmed out weaker openings, but otherwise made not too many changes to the random weights and the build orders—except that against terran it chose a 1 hatch lurker opening 90% of the time, instead of occasionally.

Cpac also includes special case strategy code that specifically recognizes UAlbertaBot’s marine rush and zealot rush.

bool StateManager::beingZealotRushed()
{
	auto enemyName = BWAPI::Broodwar->enemy()->getName();
	if (enemyName == "UAlbertaBot")
	{
...

conclusion

Cpac is the highest-placing bot with a fancy machine learning feature. ZZZKBot and PurpleWave have, as far as I know, only simple learning. And... I think it’s a reminder of how hard it is to get machine learning to work well in Starcraft. I’m not convinced that the network adds much to cpac’s skill.

If you ask me, a network trained on human games is not appropriate for a bot. When you train a system to copy human play without understanding it, it doesn’t matter what the game is—the system will learn a shallow imitation of human play, not the real thing. It will make a lot of mistakes. AlphaGo trained on human games only as a speedup trick; the real smarts came from the followup training by self-play, which forces the system to understand its own moves in relation to its abilities.

If you also train the network to play only part of the game, only choosing from among 8 combat units and skipping over drones and buildings and research, then its understanding of what it is doing must be paper thin.

Most of cpac’s skill, I think, came from fixing important Steamhammer weaknesses and from manual special case coding and configuration for maps and opponents.

By the way, cpac comes with an MIT license, so its code is legally available to copy. To comply with the license is simple.

Next: Overkill.

looking at CherryPi

There’s a ton of code in CherryPi, more than I can read in a day. I tried to pick out key parts to look at.

blackboard architecture

This comment from the file CherryPi/src/upc.h explains an important part of CherryPi’s high-level architecture.

/**
 * Who, where, what action tuple (unit, position, command).
 *
 * UPCTuples are used for module-to-module communication (via the Blackboard).
 * Posting a UPCTuple to the Blackboard is akin to requesting a specific action
 * from another module, and consuming a UPCTuple from the Blackboard implies
 * that the consumer implements this action. The implementation can also consist
 * of refining the UPCTuple so that a more lower-level module can actually
 * execute it.
 *
 * For example, a build order module might post a UPCTuple to create a certain
 * unit type, a builder module might wait until their are sufficient resources
 * before consuming it and then select a worker and a location. The
 * UPCToCommandModule takes care of translating executable UPCTuples (with sharp
 * unit, position and command entries) to actual game commands.
 */

I think it’s an excellent architectural choice, especially for a project carried out by a team rather than an individual. Communication between modules is managed in large part automatically by software rather than manually through the calling conventions of each module. It’s flexible and easy to modify and extend.

relation to Tscmoo

People have speculated that CherryPi may borrow a lot from Tscmoo the bot, since Tscmoo the person is on the team. The speculation even made it into ZZZKBot’s source code, as we saw a couple days ago. I compared 2 slices of code that do similar jobs in CherryPi and the CIG 2017 version of Tscmoo. I looked at the combat simulator in both, and code implementing the 1 hatch lurker opening in both.

Note well: If I had looked at different code, I might have drawn different conclusions. I deliberately selected code with related purposes that might be connected. In some places, CherryPi uses ideas from the old BroodwarBotQ that was written up in Gabriel Synnaeve’s PhD thesis.

1. I think CherryPi directly copied nothing from Tscmoo. I didn’t expect it to. The overall architecture was likely decided before Tscmoo the person joined the team. Besides, an academic usually wants credit to be clear, and a corporation usually wants ownership to be clear. The code in detail looks quite different.

2. In the parts I looked at for this comparison, some structure and ideas in Tscmoo were carried over and seemingly reimplemented in CherryPi, with (I should repeat) great differences in detail. It’s clear that somebody familiar with Tscmoo wrote this CherryPi code. For example, in the combat simulator one has addUnit() and run() in that order, and the other add_unit() and run() in that order. They both refer to “teams”, both count up frames from 0 (I would have counted up from the current frame, some would have counted down to 0), and share other shallow similarities.

3. CherryPi, in the parts I compared, seems to be simpler and more cleanly written. In the lurker opening in particular, I think CherryPi encodes the opening a little more abstractly. Sometimes Tscmoo has more features. Tscmoo’s combat simulator simulates splash damage, and CherryPi’s does not.

4. OpenBW is another source of ideas, and it is of course also connected with Tscmoo the person. For example, the FogOfWar class says it is based on OpenBW. It calculates visibility depending on ground height and so on.

the openings

I always want to know, “what openings does it play?” In the directory CherryPi/src/buildorders I see 16 classes that look like they could be build orders. The opening learning files include 15 build orders. The in_use.txt file lists these 8 build orders as active or possibly active:

  • 12hatchhydras
  • zvp10hatch
  • 5pool
  • 2basemutas
  • 3basepoollings
  • 1hatchlurker
  • meanlingrush (9 pool speed)
  • ximptest (it says this one is “unknown status”)

I will watch games and find out what openings it plays in practice. Come back tomorrow!

As a sample of how openings are defined, here is a snip from the file CherryPi/src/buildorders/meanlingrush.cpp showing the basic definition of 9 pool speed:

    buildN(Zerg_Drone, 9);
    buildN(Zerg_Extractor, 1);
    buildN(Zerg_Spawning_Pool, 1);
    if (countPlusProduction(st, Zerg_Hatchery) == 1) {
      build(Zerg_Hatchery, nextBase);
      buildN(Zerg_Drone, 9);
    }

It writes on the blackboard: Make drones until you have 9, extractor and spawning pool, then add a second hatchery at an expansion and rebuild the drones to 9. Simple and concise. Details like spawning the overlord and figuring out exactly when to start the second hatchery are left for other code to fill in (in Steamhammer, you have to specify it explicitly). On the other hand, here is how it says to collect only 100 gas to research zergling speed:

    if (hasOrInProduction(st, Metabolic_Boost) || st.gas >= 100.0) {
      state->board()->post("GathererMinGasGatherers", 0);
      state->board()->post("GathererMaxGasGatherers", 0);
    } else {
      state->board()->post("GathererMinGasGatherers", 3);
      state->board()->post("GathererMaxGasGatherers", 3);
    }

More writing on the blackboard. That’s a complicated test, where in Steamhammer you’d simply specify "go gas until 100". It’s fixable. They could, for example, write goals to the blackboard like “collect 100 gas for zergling speed” and have another module collect only enough gas to meet the goals.
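
A minimal sketch of that idea, reusing only the post() call from the snippet above; the “CollectGasFor_ZerglingSpeed” key and the gatherer module that would read it are my inventions, not part of CherryPi:

    // Sketch only. "CollectGasFor_ZerglingSpeed" is an invented blackboard key.
    // The opening states its goal once and never has to toggle gatherers itself.
    state->board()->post("CollectGasFor_ZerglingSpeed", 100);

    // A hypothetical gatherer module would sum every outstanding "CollectGasFor"
    // goal, compare the total against st.gas, and set GathererMinGasGatherers /
    // GathererMaxGasGatherers accordingly, so no opening repeats the test above.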

machine learning

I’ll take two cases: online learning during the tournament, and offline learning done before the tournament starts, which produces data that can be fed to or compiled into the bot.

For online learning, the win rate over time graph for CherryPi shows a rapid increase in win rate from .4 to .7 within the first 10 rounds, then a gradual slight decline to the end of the tournament. It looks as though CherryPi rapidly learned how to play against each opponent, then more or less froze its decisions and allowed slower-learning bots to catch up a tiny bit. (Though swings in score in early rounds can also be due to statistical noise.) The readme file says:

CherryPi is a TorchCraft Zerg bot developed by Facebook AI Research.
It uses bandits to select learn strategies that work against a given
opponent.

“Bandits” refers to the multi-armed bandit problem, which underlies opening learning in most bots. Looking at the file CherryPi/src/models/bandit.cpp, I see that CherryPi does exactly that too: it uses the classic UCB1 algorithm to learn which opening to play against each opponent, just like many other bots.
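
For reference, UCB1 itself is tiny. Here is a minimal sketch of the selection rule in the abstract, not CherryPi’s code, with a game counting as reward 1 for a win and 0 for a loss:

    #include <cmath>
    #include <vector>

    struct Arm { int plays = 0; double totalReward = 0.0; };   // one arm per opening

    // Pick the next opening: try each once, then maximize
    // mean reward + sqrt(2 * ln(total plays) / plays).
    int chooseOpening(const std::vector<Arm> & arms, int totalPlays)
    {
        int best = 0;
        double bestScore = -1.0;
        for (int i = 0; i < int(arms.size()); ++i)
        {
            if (arms[i].plays == 0) return i;       // unplayed opening: explore it
            double mean  = arms[i].totalReward / arms[i].plays;
            double bonus = std::sqrt(2.0 * std::log(double(totalPlays)) / arms[i].plays);
            if (mean + bonus > bestScore) { bestScore = mean + bonus; best = i; }
        }
        return best;
    }

The square-root term is the exploration bonus: rarely tried openings get a boost, so a few early losses don’t banish an opening forever.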

I looked at the opening learning files, one for each opponent. They are in JSON format and are written by a general-purpose serializer that leaves the data a little hard to interpret by eye. It looks like value2 maps between the 15 opening names and 15 opening index numbers. value3 is 15 zeroes, and value4 and value5 are the learned data for the 15 indexes 0 through 14.

The only offline learning that I found is the same opening learning, performed for certain opponents ahead of time.

  • Iron
  • LetaBot
  • Skynet
  • Xelnaga
  • ZZZKBot

I can’t guess how they came up with that set of 5 opponents to pre-learn openings against. Against these opponents, CherryPi relied exclusively on its offline learning and did not write new learning data. It’s such a strange decision that I have to wonder whether it’s a bug. In any case, we saw yesterday that it backfired against ZZZKBot, which did not play as expected: Unable to learn, CherryPi played the same unsuccessful opening every time, and lost over and over. Both ZZZKBot and CherryPi had incorrect prior knowledge about each other, and only ZZZKBot adapted.

conclusion

It is clear to me that CherryPi the project is not far along compared to where they are aiming. There are plenty of TODOs in the code. The big machine learning ideas that (if successful) could make CherryPi superhuman are not there yet; only some foundations are laid. CherryPi is still a scripted bot like others, not a machine learning bot. Even so, with (as I understand it) 8 people on the team, they have done a tremendous amount of work. They implemented ideas—most of which I didn’t write about—that I wish I had time to do myself. If they can maintain the rate of progress, then within a few years individual coders won’t be able to keep up. On the other hand, progress may slow when they get to the hard part. We’ll have to stay tuned!

Next: Games by CherryPi.

looking at ZZZKBot

I read the source of the AIIDE 2017 version of ZZZKBot. It comes with a github repository but at the moment the repo is 2 years out of date.

As a reminder, #1 ZZZKBot was upset by McRave and Tyr, which open with cannons to stop most cheeses, and by LetaBot, which has extensive work to recognize and respond to different cheeses. ZZZKBot had a plus score against every other opponent, including #2 PurpleWave and #3 Iron. Interestingly, ZZZKBot scored only a little better than even against #26 Slin.

coding style

ZZZKBot is written in a straight-line style, as if subroutines had not been invented. Most of the code is in the onFrame() method. It’s kind of hard to read.

At one point, ZZZKBot checks the name that it is running under using Broodwar->self()->getName(), and if it is not exactly “ZZZKBot” it sets transitionOutOf4PoolFrameCountThresh to 0. The effect depends on the strategy, but it looks as though it usually switches out of its cheese build into late-game play. I have to suppose it is an obfuscation. [Update: Comments suggest that the reason is to avoid lazy copying.] Anyway, be aware that the bot may behave differently depending on the name it is playing under.
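
In code terms, the check described above amounts to something like this (a paraphrase, not ZZZKBot’s exact lines):

    // A paraphrase of the check described above, not ZZZKBot's exact code.
    // Setting the threshold to 0 effectively abandons the cheese build at once.
    if (Broodwar->self()->getName() != "ZZZKBot")
    {
        transitionOutOf4PoolFrameCountThresh = 0;
    }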

strategies and learning

This version of ZZZKBot has 4 strategies.

  • its old standard 4 pool
  • 9 pool speedlings
  • a 1-hatchery hydralisk rush with 11 drones
  • sunken defense into mutalisks

Besides the basic strategy choice, ZZZKBot keeps a few variables that control details of the strategy’s execution.

        bool isSpeedlingPushDeferred;
        bool isEnemyWorkerRusher;
        bool isNumSunkensDecidedAfterScoutEnemyRace;
        int numSunkens;

ZZZKBot also has hardcoded default strategy data for specific opponents, so that against a known bot it can choose a presumed good strategy in the first game. It has values for 17 of the 28 bots in AIIDE 2017, including itself (so 16 of the 27 possible opponents). Considering how thorough the list is, I suspect that Chris Coxe tested against every available bot and manually chose strategies against the ones that ZZZKBot did not defeat on the first attempt. (Against itself it sets all strategy variables to false—I didn’t check what that does.) A sketch of what such a table might look like follows the list of opponents.

  • UAlbertaBot
  • ZZZKBot itself
  • XIMP
  • GarmBot
  • Overkill
  • CherryPi (comment “Guessing it is like tscmooz...”)
  • Ziabot
  • Skynet
  • MegaBot (considered the same as Skynet)
  • Steamhammer
  • Arrakhammer
  • Microwave
  • KillAll
  • ForceBot
  • Juno
  • Iron
  • HannesBredberg
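
For illustration only, a table of that kind might look something like the sketch below. The structure borrows ZZZKBot’s strategy variables from above, but the values are invented, not ZZZKBot’s actual data.

    // Illustration only: invented values, not ZZZKBot's real per-opponent data.
    #include <map>
    #include <string>

    struct OpponentDefaults
    {
        bool isSpeedlingPushDeferred;
        bool isEnemyWorkerRusher;
        bool isNumSunkensDecidedAfterScoutEnemyRace;
        int  numSunkens;
    };

    static const std::map<std::string, OpponentDefaults> strategyDefaults =
    {
        { "UAlbertaBot", { false, true,  false, 0 } },   // invented values
        { "XIMP",        { false, false, true,  3 } },   // invented values
    };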

After each game, ZZZKBot may update its strategy data and save the result to a file. The algorithm is long and includes random choices, and I didn’t try to puzzle it out. There are special provisions for playing against random; it checks when the opponent’s race was discovered to see whether the stored info will be useful for the current game. It also, oddly, saves information about the processor it is running on.

scouting a zerg opponent

When your opponent is zerg, a couple tricks can make scouting easier. One is that if you spot the enemy’s first overlord, you may be able to infer the location of the enemy base. I added this knowledge to Steamhammer for the next version, and it frequently helps. Another is that you do not have to scout the enemy hatchery itself. The base is found when you see the creep, and a scouting unit can turn aside as soon as it sees bare ground where creep would be if the base were there. I didn’t add this knowledge to Steamhammer (yet), because it’s tricky. For one thing, I’m not sure of the exact rules for creep spreading; the way unbuildable ground blocks creep is not obvious (commonly creep extends beyond the geyser on both sides but not around it, leaving a notch). For another, a map could have neutral zerg buildings that spread static creep, and BWAPI won’t tell you where the static creep is; an example is Colosseum (though on that map the static creep doesn’t affect scouting). The complications were enough to keep the idea off the top of my priority list.

ZZZKBot implements both of these tricks. How does ZZZKBot know where the creep spreads to for each base location? Simple: It hardcodes the data for each starting position of all 10 maps!
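
The creep trick reduces to a cheap per-tile test once you know which tiles should have creep. Here is a minimal sketch of my own, not ZZZKBot’s code:

    // Minimal sketch, not ZZZKBot's code. checkTile is a tile that would be
    // covered by creep if a hatchery stood at the candidate base; which tiles
    // qualify has to come from map knowledge (ZZZKBot hardcodes it per start
    // position). If the tile is visible and bare, the base isn't there.
    bool canRuleOutBase(BWAPI::TilePosition checkTile)
    {
        return BWAPI::Broodwar->isVisible(checkTile) &&
               !BWAPI::Broodwar->hasCreep(checkTile);
    }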

ZZZKBot also tries to guess the location of the enemy base from other enemy units it sees. As a cheeser, ZZZKBot scouts early, so it has a good chance of guessing right.

a comment from the code

            // TODO: support making more than one building of a particular type.

Waste no time on the unimportant!

conclusion

ZZZKBot is very specific and narrowly coded. Its sole purpose is to defeat other bots, especially opponent bots that it knows, playing on maps that it knows. Everything possible is hardcoded, above all prior knowledge about opponents and maps. ZZZKBot teaches strict lessons to its opponents, but there are not many lessons in the bot itself. It’s about the cheesiest, most exploitative bot possible.

If ZZZKBot had stuck with its classic 4 pool, I’m convinced that it would have finished in 4th place or worse rather than first. Too many of its opponents in this tournament were ready for the 4 pool (I know Steamhammer was). Next year, I expect that top bots will be prepared for all 4 of ZZZKBot’s strategies this year (it won’t be difficult), and ZZZKBot will have to go to even greater lengths if it is to finish strongly. I see it as a sign of progress: Even the most exploitative bot is forced to play multiple strategies and to rely on machine learning—it is forced to use smart methods to play better. The space for cheap exploits is gradually narrowing.

Next: Looking at how ZZZKBot played against different opponents.

Overkill AIIDE 2017 updates

PurpleWaveJadian pointed out in a comment that Sijia Xu had updated Overkill’s repository, bringing it up from last year’s version to the AIIDE 2017 version of Overkill. There are extensive changes. The bot looks substantially more capable.

Here’s what caught my eye. This is based on a quick read through a long list of diffs, so I may have missed or misunderstood a lot.

• New machine learning library with a couple different learning methods.

• Support for lurkers, scourge, guardians, devourers, ultralisks—all the regular zerg combat units. Only queens and defilers are omitted, the spell units. Last year it supported only zerglings, hydralisks, and mutalisks.

• Major changes in tactical calculations. Well, there would have to be to support new units.

• Changes to building construction. I’m not sure what the idea is here.

• It detects production deadlocks, to avoid freezes. The method is written to be simple and general-purpose rather than extensive and tweakable like Steamhammer’s, so it may be worth a look; see ProductionManager::checkProductionDeadLock() and the generic sketch after this list.

• Scouting seems to be completely rewritten. Overlords scout around using an influence map. It looks like 1 zergling can be singled out to be a scout zergling.

• Overkill 2016 learned a model of when to make zerglings, hydralisks, or mutalisks. The actions in this Overkill version include the combat units except for guardians and devourers, sunken colonies, tech buildings to construct, upgrades and tech to research, expanding, attacking versus harassing with mutalisks, and doing nothing. So it not only has a wider scope for strategy with more unit types, it looks as though it learns more of its strategy. I see declarations for 4 neural networks, which apparently cooperate to do different parts of the job.

• It chooses from among the same 3 openings as before: 9 pool, 10 hatch, 12 hatch. The details of the build orders have been revised.
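
I haven’t traced checkProductionDeadLock() line by line, but a generic test in the same spirit is easy to sketch. This is my own illustration, with made-up thresholds, not Overkill’s code:

    // A generic deadlock test in the same spirit, not Overkill's actual method.
    // If nothing has started building for a long stretch while minerals pile up,
    // assume the production queue is stuck and let the caller clear or reorder it.
    bool looksDeadlocked(int currentFrame, int lastProductionFrame, int minerals)
    {
        const int stallFrames = 24 * 60;   // about a minute at 24 frames per second
        return currentFrame - lastProductionFrame > stallFrames && minerals > 400;
    }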

How does it play? Well, I only looked at the source, and not too closely. Overkill first got a learning model in the CIG 2016 tournament. Here are its historical tournament results:

tournament          place  win rate
CIG 2015            3      81%
AIIDE 2015          3      81%
CIG 2016 qualifier  4      71%
CIG 2016 final      5      51%
AIIDE 2016          7      62%
CIG 2017            7      60%

The CIG 2016 qualifier is a better comparison to the other tournaments than the final, since it includes all entrants. The CIG 2017 entry is probably not much different from the AIIDE 2017 one. [Update: Not so. CIG says that the CIG 2017 version was the same as the CIG 2016 version. See comments.] It seems that Overkill’s win rate has been falling as it learns more on its own. Is that a sign of how hard it is to do strategy learning? I would like to see it play before I draw conclusions!

Next: I’ll look at Overkill’s CIG 2017 replays and see what I can see.

the big BOSS bug with zerg

UAlbertaBot includes BOSS, the Build Order Search System, which accepts goals like “I want 12 zerglings and the speed upgrade” and calculates how to achieve them as quickly as possible, including creating the spawning pool and extractor if necessary, adding supply, building the ideal number of workers, and the exact spending sequence for everything. It does make approximations, but the build orders it produces should be close to optimal for the goal you give. UAlbertaBot by default uses BOSS to oversee its production as soon as it gets out of its opening build order.

Unfortunately BOSS also has limitations and bugs. If you give it a complex goal, it can’t run in real time (though you can still use it to create opening builds offline). And for zerg, BOSS has the bug that it will sometimes throw in an extra building that is not implied by your goal, usually an extractor, hydralisk den, or lair. I only wanted zerglings, why are you giving me a hydra den?

Blog reader Micky Holdorf looked into that bug and found the cause. Thanks for the report!

BOSS wants to search efficiently, so given your goal, it calculates the prerequisites so that it doesn’t waste time searching actions that don’t contribute to the goal. It calls this the goalMax, the most of each thing that you might want in order to achieve the goal. For terran and protoss, BOSS recursively calculates all the prerequisites and carefully counts each. Here is the code for zerg, from DFBB_BuildOrderSmartSearch::setPrerequisiteGoalMax():

    else if (getRace() == Races::Zerg)
    {
        _goal.setGoalMax(ActionTypes::GetActionType("Zerg_Spawning_Pool"), 1);
        _goal.setGoalMax(ActionTypes::GetActionType("Zerg_Extractor"), 1);
        _goal.setGoalMax(ActionTypes::GetActionType("Zerg_Lair"), 1);
        _goal.setGoalMax(ActionTypes::GetActionType("Zerg_Spire"), 1);
        _goal.setGoalMax(ActionTypes::GetActionType("Zerg_Hydralisk_Den"), 1);
    }

That’s not a calculation, that’s a stand-in! This part of the code was never written in the first place. I wonder if there was a bug in the recursive calculation for zerg, and Dave Churchill put in a stub until it could be dealt with?
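
As a rough sketch, not a tested fix, the missing calculation could recursively mark every prerequisite of each action in the goal instead of hardcoding five buildings. The prerequisitesOf() helper below is hypothetical; I haven’t checked what BOSS’s real prerequisite lookup is called.

    // Rough sketch only, not a tested fix. Call this once for each action in the
    // goal. prerequisitesOf() is a hypothetical helper standing in for whatever
    // BOSS's real prerequisite lookup is.
    void markPrerequisites(const ActionType & type)
    {
        for (const ActionType & pre : prerequisitesOf(type))   // hypothetical
        {
            if (_goal.getGoalMax(pre) == 0)
            {
                _goal.setGoalMax(pre, 1);
                markPrerequisites(pre);
            }
        }
    }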

Anyway, the correct fix is to write that code for real. One workaround in the meantime is to delete the setGoalMax calls for zerg and always pass in goals that include their own prerequisites; you have to remember that you might need a spawning pool, or notice that you’ve lost yours and explicitly put it in. Micky Holdorf found another workaround with the same effect, which tells BOSS not to worry about prerequisites at all. In DFBB_BuildOrderStackSearch::generateLegalActions(), change the line

            if (!goal.getGoal(actionType) && !goal.getGoalMax(actionType))

to this, which tells it that only goals and not prerequisites are legal actions:

            if (!goal.getGoal(actionType))

I don’t have any plans to write a fix for Steamhammer. If somebody sends me one I’ll certainly consider it! But Steamhammer is moving away from BOSS, because BOSS answers the opening question “how do I get this stuff as fast as possible?” and not the middlegame question “how do I spend my resources efficiently?”

other changes to Overkill

Overkill AIIDE 2016 is not Overkill 2015 plus Q-learning. Overkill is extensively updated. Scanning quickly through the diffs, I spotted changes affecting:

  • pathfinding
  • influence mapping
  • handling of enemy cloaked units
  • reactions to enemy flying units
  • choke finding
  • scouting
  • assignment of squads to targets (I think)
  • behavior of irradiated units
  • behavior of overlords
  • prioritization of targets to attack
  • static defense, both ground and air
  • reactions to enemy bunkers and cannons

My list is not complete. I really want to see some games!

Next: Overkill wrapup.