
tactical search in MaasCraft

MaasCraft is a protoss bot by Dennis Soemers which does a tactical search to decide how to maneuver its army. MaasCraft looks slightly above average. It finished 8th of 18 with a 59% win rate in AIIDE 2014 and 7th of 13 in CIG 2014 with 55% wins and hasn’t played since as far as I can tell.

Soemers’s paper on it is Tactical Planning Using MCTS in the Game of StarCraft (PDF; a BSc thesis from the Games and AI Group, Maastricht University). You can get the code from the Starcraft AI Data Archive.

So, tactical search. “Tactical” here means maneuvering squads around the map and deciding when to engage or retreat. At the most basic level, search means finding alternatives and comparing them to pick one that is probably better. The comparison itself might involve a further search—and that is the root of all search algorithms. Search done right ought to outperform any scripted behavior, so our mission (should we choose to accept it) is to figure out how to do search right. This is the most interesting try I’ve seen so far.

The search nodes for tactical decisions have to include some representation of squads and squad positions on the map. In a chess program a search node represents the whole game state, but a specialized search in Starcraft should only represent the details it cares about. MaasCraft keeps it super abstract. The map is simplified to a graph with “nodes at each chokepoint and each potential base location,” the paper says, so that every location is just a node in the graph. A squad has a location, a speed, a hit point total, and a total damage rate (and presumably a number of units). A base is treated like a squad that can’t move (its static defense provides a damage rate). The paper doesn’t mention any provision for air units, though the code does refer to them.

The moves for a squad that is not in combat are to stand pat or to move to an adjacent node in the location graph (where it may get into a fight). A squad that is already in combat can either keep fighting or retreat to an adjacent location. Moves take different amounts of time, and since each side can have any number of squads, many moves can be ongoing at once.

Finding the next state when you know the current state and the ongoing moves amounts to simulating part of the game. Movement is easy: A squad has a speed and the location graph has a distance. The course of a battle is estimated by Lanchester’s Square Law, which is a good estimate only for ranged fire and works better if all units in a squad are alike. I’m sure it’s less accurate than a battle simulator but it must be faster.
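With that squad abstraction, a Square Law resolver takes only a few lines. This is my own sketch, not code from MaasCraft:

```cpp
#include <cmath>

// A squad reduced to totals, as in the paper's abstraction.
struct Squad {
    double hp;    // summed hit points of the squad
    double dps;   // summed damage rate of the squad
};

// When all units in a squad are identical, the count-based Square Law
// victory condition reduces to comparing dps * hp for each side.
double strength(const Squad& s) { return s.dps * s.hp; }

// Returns the surviving fraction of the winner: positive if a wins,
// negative if b wins, 0 on a mutual wipe. Under the Square Law the
// winner keeps sqrt(1 - loserStrength / winnerStrength) of its units.
double resolve(const Squad& a, const Squad& b) {
    double sa = strength(a), sb = strength(b);
    if (sa == sb) return 0.0;
    if (sa > sb) return std::sqrt(1.0 - sb / sa);
    return -std::sqrt(1.0 - sa / sb);
}
```

The square root is the interesting part: double the army and your losses against the same enemy drop by far more than half, which is why concentrating forces pays.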

The search algorithm is one of the coolest bits. It is a variation of MCTS (Monte Carlo tree search) adjusted to cope with simultaneous moves of different durations. The next move down a branch is whichever move comes up next for whichever player. A squad that’s moving or retreating gets a new move when its current move finishes. If I understand it right, a squad that’s waiting or fighting gets an opportunity to change its mind when some other move finishes.
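One way to implement those staggered decision points is an event queue over move end times. A sketch with invented names, not MaasCraft's code:

```cpp
#include <queue>
#include <vector>

// An ongoing move: which squad is doing it and when it finishes.
struct OngoingMove {
    int squadId;
    double endTime;
};

// Comparator for a min-heap: the move that ends soonest comes out first.
struct EndsLater {
    bool operator()(const OngoingMove& a, const OngoingMove& b) const {
        return a.endTime > b.endTime;
    }
};

// A search node advances its clock to the earliest move-end time. The
// squad whose move finished must pick a new move; squads that were
// waiting or fighting can reconsider at the same decision point.
struct MoveQueue {
    std::priority_queue<OngoingMove, std::vector<OngoingMove>, EndsLater> heap;
    double clock = 0.0;

    void start(int squadId, double duration) {
        heap.push({squadId, clock + duration});
    }

    // Advance to the next decision point; returns the squad whose move
    // just finished there.
    int advance() {
        OngoingMove done = heap.top();
        heap.pop();
        clock = done.endTime;
        return done.squadId;
    }
};
```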

Algorithms in the MCTS family traditionally evaluate leaves of the tree by playing out one line of the game to the end. MaasCraft plays out up to one minute ahead and then applies a heuristic evaluation (not unlike AlphaGo). The evaluator gives points for destroying bases and harming armies, plus a tiny bonus to encourage moving toward the enemy.
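A leaf evaluator of that shape might look like this; the weights are placeholders I made up, not MaasCraft's tuned values:

```cpp
// What each side accomplished between the root and the leaf.
struct SideTally {
    double enemyBasesKilled;   // bases this side destroyed
    double damageDealt;        // hit points removed from the enemy army
    double distanceToEnemy;    // how far this side's squads sit from the enemy
};

// Score from "me"'s point of view: big points for bases, points for
// army damage, and a tiny bonus for closing distance to the enemy.
double evaluate(const SideTally& me, const SideTally& foe) {
    const double BASE_WEIGHT = 1000.0;    // invented weight
    const double DAMAGE_WEIGHT = 1.0;     // invented weight
    const double APPROACH_BONUS = 0.01;   // tiny: just nudges squads forward
    return BASE_WEIGHT * (me.enemyBasesKilled - foe.enemyBasesKilled)
         + DAMAGE_WEIGHT * (me.damageDealt - foe.damageDealt)
         - APPROACH_BONUS * me.distanceToEnemy
         + APPROACH_BONUS * foe.distanceToEnemy;
}
```

The approach bonus has to be tiny, or the search would trade real damage for map position.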

How well does it work? Search should make play much more flexible and adaptive. A tactical search can plan sandwich maneuvers, or deliberately sacrifice a base to a strong enemy squad as long as it can defeat a weaker squad and take an enemy base in compensation. Looking into the future is what intelligence is. But this look into the future is simplified, so the question stands: How well does it work?

In the form interview, Dennis Soemers wrote “I suspect the algorithm may be better suited for a more defensive and late-game strategy which is less reliant on perfect execution of the early-game. I also believe the algorithm would still be able to run sufficient simulations with a more fine-grained map representation and a more accurate Battle Model, since the bot is currently running on the bots ladder using less time per frame (I believe I use 35ms per frame in the competitions and only 20ms per frame on the ladder), with similar performance.”

That makes sense to me. In the early game, details matter; if you’re chasing the enemy scout, or fighting the first zealots with your first marines, it seems to me that you want to represent the situation in detail and search as exactly as you can. As armies get bigger, approximations are good enough and may be needed for speed.

By the way, code under the name “MCTSDennis” appears in LetaBot. Switches control whether it is active. The two bot authors are both from Maastricht University.

Iron meteor

Today Iron learned how to keep expanding. I watched it starve out ICEbot, which is exactly how a contain is supposed to work. It’s moving fast, and may yet go over 90% win rate.

Update: I also watched a tense close game against XIMP. For half the game XIMP had no probes and 30 minerals but 8 carriers, while Iron had many bases that it could not defend. Would the carriers have to stay home to resist the pressure, or could they relieve it enough to go marauding? XIMP won, but I was unsure until near the end.

recent developments

SSCAIT has binary bot downloads on the Bots & Results page. Binaries are of course most safely executed inside a virtual machine.

screenshot showing download links

Iron has continued to make progress but hasn’t yet lived up to its author’s hopes. It can now expand to its natural, though not beyond. Its game plan amounts to setting an elastic containment outside the enemy base, but its unit mix is not adaptive. An enemy that finds the right counter-mix and keeps pressure up can push the contain back and either break through or starve Iron out. Aggressive tscmoo zerg can’t be contained at all.

Tscmoo terran now knows how to kill tanks by laying mines next to them. The technique seems deadly effective. I also saw it chase down overlords with Napoleonic vigor, though only after the zerg was already toast.

Krasi0 and especially Tyr have both been playing more strongly after updates. I can’t pin down what they’re doing better, though.

Hmm, 4 terrans. Are we in for a terran renaissance?

precomputing a catalog of build orders

Today’s idea is from the future work section of Dave Churchill’s thesis “Heuristic Search Techniques for Real-Time Strategy Games.” I don’t talk about it using the same words and I toss out a few side ideas that are not in the thesis, but the important points were all written down by Dave Churchill, author of UAlbertaBot. He even wrote up a preliminary experiment.

The idea, he wrote, is to “Perform a build-order search which does not attempt to achieve a given goal set of units, but instead attempts to maximize a given army value.” The BOSS build order search from UAlbertaBot can find near-optimal build orders if you tell it what units you want. It works in a general way, and with the right mods it can find build orders that, say, “make my army as big as possible at a given time, who cares what the units are” or “make my army beat this unit mix” or anything else you can turn into a number.

Of course if you ask something complicated it will take longer and you may not be able to run it in real time during a game. I think the natural course is to calculate a large catalog of good build orders (or even a catalog per map). Then on one side you can play the build orders, and on the other side match the opponent’s play to the catalog.

I’ll suggest a few recipes. You may think of tastier ones. Before we get cooking, here are the ingredients.

Ingredient: Build order search. You’ll have to insert your own evaluation function into the code. You may want to read the BOSS paper first and check whether any of BOSS’s optimizations and shortcuts gets in your way. Besides BOSS, I understand that tscmoo calculates build orders, though I haven’t looked into how it works and whether it would be good for cataloging.

Ingredient: Army comparison. An army does not have a strength in itself, it is strong or weak relative to the enemy. If you want to find counter-builds, not only builds, you need to compare armies. One way is with a battle simulator like SparCraft: Set the two armies down in some formation and have them wargame it out. MaasCraft used to do the same job faster but less accurately with a classic military formula. To estimate whether an advantage is enough to win, you might try a second war game giving the weak side a 20-second head start on its build, to represent that it is defending its base and doesn’t have to cross the map.

Recipe: Tech rushes, the kind of attack where you arrange for your army to be particularly strong at a given time, such as when you’ve teched up to a new unit type or when research finishes. Tech rushes are keystones of the build catalog because they are the strongest instantaneous attacks.

Step 1. List the unit combinations you care about: Zerglings, hydralisks, hydraling, lurkling, hydralurk, hydralurkling, mutaling—that might be all for zerg, maybe a few more. I don’t know any reason to precompute build orders into the late game. You can skip unit combinations that are too gas-heavy, like muta+lurker or tank+wraith—surely it’s better to include at least one unit that costs mostly minerals, so that you can spend efficiently.

Step 2. For each unit in one of your combinations, list the research and upgrades that you care about. For hydralisks you’ll list speed, range, missile attack, and carapace. +3 upgrades are for late game, and even +2 might be too late to be worth precomputing. Combine the units and upgrades into a longer list: Slowlings, speedlings, +1 speedlings, etc. Each is a target army that you might aim for in a tech rush.

Step 3. Brute force it. For each target army in the list, do the build order search to find out how fast you can build it. That’s the earliest timing you can hit. Then, for that time (fast tech rush) and later times (every n frames) until some time limit (slow but heavy tech rushes), search to find the largest target army you can make at that time, counting only the target military units and not the workers. That will give a curve of possible tech rushes versus time, from 4-pool to whatever upper time and tech limits you set.
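The step 3 loop might be organized like this. The two calls earliestFrame and maxArmyAt are hypothetical stand-ins for a BOSS-style build order search:

```cpp
#include <functional>
#include <string>
#include <vector>

// A target army is a unit mix plus upgrades from steps 1 and 2.
struct TargetArmy { std::string name; };

// One catalog entry: a tech rush hitting at a given frame with a given
// army value (military units only, workers not counted).
struct CatalogEntry { std::string name; int frame; double armyValue; };

std::vector<CatalogEntry> buildCatalog(
    const std::vector<TargetArmy>& targets,
    std::function<int(const TargetArmy&)> earliestFrame,      // fastest possible timing
    std::function<double(const TargetArmy&, int)> maxArmyAt,  // largest army by a deadline
    int step, int frameLimit)
{
    std::vector<CatalogEntry> catalog;
    for (const TargetArmy& t : targets) {
        // from the fastest possible rush to slower but heavier versions
        for (int f = earliestFrame(t); f <= frameLimit; f += step) {
            catalog.push_back({t.name, f, maxArmyAt(t, f)});
        }
    }
    return catalog;
}
```

Each target army contributes a curve of entries over time, which together form the catalog of possible tech rushes.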

Recipe: Timing attacks, where you recognize what the enemy is doing and strike while they are weak. I expect that strong tech rushes are likely to risk strong timing attacks.

For each tech rush that you want to counter, find timings when the build is weaker: Before units come out or before research finishes. For each potentially vulnerable timing, pull tech rushes from the catalog and for each one do the army comparison to see whether it’s a counter, in which case it is a timing attack too. And of course label the counters in the catalog.
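As a sketch, the counter-labeling pass is two nested loops around the army comparison. Here beats() is a hypothetical stand-in for the wargame (a SparCraft simulation, say):

```cpp
#include <functional>
#include <string>
#include <vector>

// A catalog entry reduced to what this pass needs: a name and the frame
// at which the rush hits.
struct RushEntry { std::string name; int frame; };

// For each vulnerable timing of the build we want to counter, keep every
// catalog entry that is ready in time and wins the comparison: those
// entries are timing attacks, and get labeled as counters in the catalog.
std::vector<std::string> findTimingAttacks(
    const std::vector<RushEntry>& catalog,
    const std::vector<int>& weakTimings,
    std::function<bool(const RushEntry&, int)> beats)
{
    std::vector<std::string> counters;
    for (int weakAt : weakTimings)
        for (const RushEntry& e : catalog)
            if (e.frame <= weakAt && beats(e, weakAt))
                counters.push_back(e.name);
    return counters;
}
```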

The same if you want to counter any other known strategy, like what your opponent just beat you with. In that case you may be able to calculate a counter in game—after it’s hopeless and before the gg should be enough time.

Recipe: Universally safe builds are those safe against most or all tech rushes. Brute force it! Do you need static defense? The best safe build is the one that gives you the most workers. Some tech rushes may themselves be safe, but you should also consider how many workers each build cuts and seek out more economic choices. For universal safety you only have to fear early attacks, because later you can fall back on practical safety.

Recipe: Practically safe builds are those safe against tech rushes that are consistent with your scouting info (ones the opponent could still play). One way to precompute them might be to mine replays for typical scouting info and brute force a range of safe builds so that you can switch into one from where you are. Again, the best safe build is the one that gives you the most workers.

More is possible, and by now you should get the idea. The catalog is a tree of builds, of course, in one representation. The branching structure of the tree tells you when you have to scout for what information.

I expect that PerfectBot will have some kind of knowledgebase created by large-scale search of strategy space. That doesn’t mean we should have it today, or that this way of making one will work well. It’s super simplified compared to real play, after all. I feel confident that some build catalog could be made useful, but I dunno how many new ideas it might hunger for.

high-level view of strategy inference

PerfectBot 1.0, which deploys every processor cycle optimally to play the best possible Brood War, is scheduled for release at the end of time. I doubt it will be later. How will PerfectBot figure out its enemy’s opening plan?

Saa, who knows? But we can guess part of it.

Needless to say, PerfectBot picks up on more cues than bots today. A few possibilities:

  • direct cues
    • what is the map? (different strategies will be good)
    • what are the respective base positions?
    • what buildings did we see? (tech, production, expansion...)
    • what units did we see?
    • what minerals has the enemy mined?
    • does the enemy have gas? still mining?
    • what information did the enemy collect about what I’m doing?
  • effort cues (how much should we have seen?)
    • where and when and how long did we scout?
    • could there be unfound hidden buildings?
    • do we have units forward to put pressure on or to see the enemy moving out?
  • behavior cues (what is the enemy army doing?)
    • when did the enemy scout?
    • are the first enemy units hanging back or out on the map?
    • what goals does the enemy building placement support? (e.g. a wall-in is better for some strategies than others)
    • is the enemy trying extra hard to prevent scouting?

PerfectBot does at least two kinds of inference. First, it figures out what is physically possible under the rules of the game. “Mutalisks can be out at time t, not earlier.” Second, it recognizes cues associated with strong strategies. By combining the two, PerfectBot can narrow down the situation accurately.

How do strategy cues work? The enemy has three choices about the message it sends with every visible action it takes:

  1. play a straight move that supports the chosen strategy
  2. pay a strategic cost to take a deceptive action
  3. play badly with disadvantage

Straight moves tend to be strategy cues, but they also make your strategy effective. Bad moves can be ignored in strategy inference; in theory they only decrease the hazard that PerfectBot faces in the range of possible futures.

Deceptive moves come with a more difficult theory. Think of a deceptive move as a move to confuse the opponent: A move that looks strong for a different strategy than you are playing (researching air weapons to look like dragoon range while you prepare a reaver drop), or that goes above and beyond to hide information (blocking the ramp with drones to keep the scout out). First, a deceptive move always incurs a cost, meaning that it makes the strategy in some way less likely to succeed. If it had no cost, it would be a straight move that happened to give little information away. Second, the payback for the cost of deception does not occur in the game in which you play a deceptive move (not against an opponent as smart as PerfectBot). The theory is related to the theory of bluffing in poker: Game theory says you should bluff a certain amount of the time, varying by situation, and the payback for bluffing is the opponent’s higher uncertainty all the time. PerfectBot knows its opponent may be pulling a trick, and (since it’s perfect) it knows what the available tricks are, but it still has to prepare for a wider range of future events (could be dragoons, could be reaver drop).

Based on the combination of cues and analysis of possibilities, PerfectBot can do something like assign probabilities to future events (“overlords hanging around outside my base, high chance of slowdrop around time t”) and play to maximize its odds of winning.

You can’t deceive today’s bots because they’re not smart enough to take the bait. So first we should concentrate on the strategy cues of straight moves: If you know what the good strategies are and how to play them, you can find the straight cues. Knowing the good strategies, for example with a database of builds like LetaBot’s, is a strong start.

Instead of relying on human knowledge of good strategy, we could try to compute good strategies with a search through strategy space. I imagine that’s how PerfectBot gets its knowledge. I’ll write about that tomorrow—it’s easy enough that I can imagine trying it this year.

I’m also looking forward to ideas for combining physics knowledge of what’s possible with cues. If the opponent is going mutas, when will they come out? (Ever notice experts building turrets literally at the last second?) Maybe try a build order search along the lines of UAlbertaBot’s BOSS, but applied to the opponent.

High level theory like this is super vague but I hope it’s helpful for thinking about concrete ideas so that tomorrow’s bots can become smart enough to be tricked.

LetaBot’s build order inference

LetaBot does an elaborate inference of its opponent’s opening build order. A couple posts ago I found no scouting inferences by LetaBot—I missed it. Martin Rooijackers had to tip me off. Thanks! It’s already there in the AIIDE 2015 version. There’s a ton of code in all the bots and I didn’t read most of it, so I may have missed many inferences in other bots too. This one is kind of fancy and breaks my earlier conclusion that no inferences were “deep or clever.”

AIIDE 2015 LetaBot declares terran and zerg build orders that an opponent may play. I imagine that the current version also knows protoss build orders. But in any case, at least for terran and zerg opponents LetaBot can compare what it sees against the different build orders and find the best match.

A build order declaration looks like this. A zerg build order from BuildOrderManager.cpp:

void BuildOrderManager::NinePoolSpeed(){
	BuildOrder newBo;
	newBo.YourRace = BWAPI::Races::Zerg;
	newBo.RaceEnemy = BWAPI::Races::Terran;
	newBo.Creator = "Dennis Soemers Text mining. Manual adjustments by Martin Rooijackers";
	newBo.Name = "9 Pool Speed";

	BOpiece addPiece;
	addPiece.supply = 9;
	addPiece.Building = BWAPI::UnitTypes::Zerg_Spawning_Pool;
	newBo.BOpieces.push_back( addPiece );
	addPiece.supply = 8;
	addPiece.Building = BWAPI::UnitTypes::Zerg_Drone;
	newBo.BOpieces.push_back( addPiece );
	addPiece.supply = 9;
	addPiece.Building = BWAPI::UnitTypes::Zerg_Extractor;
	newBo.BOpieces.push_back( addPiece );
	addPiece.supply = 8;
	addPiece.Building = BWAPI::UnitTypes::Zerg_Overlord;
	newBo.BOpieces.push_back( addPiece );

	newBo.WeakCounterBy.push_back( "2 Rax FE" );
	newBo.StrongCounterTo.push_back("14 CC");
	newBo.StrongCounterTo.push_back("BBS");
	newBo.WeakCounterTo.push_back("1 Rax FE");

	BOs.push_back(newBo);
}

Some counter-builds are declared near the bottom. The AIIDE 2015 version does not use them.

LetaBot draws the build order inference in BuildOrderManager.onFrame() so that the inference can change from moment to moment as more information turns up.

	for (int j = 0; j < BOs.size(); j++) {
		std::vector<BWAPI::UnitType> EnemyBuildingsCopy;
		for (int i = 0; i < InfoMan->EnemyBuildings.size(); i++) {
			if (InfoMan->EnemyBuildings[i].building->getType() == BWAPI::UnitTypes::Zerg_Hatchery) {
				// skip the hatchery at the main base; only extra hatcheries count
				if (InfoMan->EnemyBase != BWAPI::TilePositions::Unknown) {
					if (InfoMan->EnemyBuildings[i].position.getDistance(BWAPI::Position(InfoMan->EnemyBase)) > 4 * 32) {
						EnemyBuildingsCopy.push_back(InfoMan->EnemyBuildings[i].type);
					}
				}
			} else {
				EnemyBuildingsCopy.push_back(InfoMan->EnemyBuildings[i].type);
			}
		}

		int BuildAvailable = 0;
		int BuildMissing = 0;
		for (int k = 0; k < BOs[j].BOpieces.size(); k++) {
			if (BOs[j].BOpieces[k].Building.isBuilding()) {
				bool isThere = false;
				for (int m = 0; m < EnemyBuildingsCopy.size(); m++) {
					if (EnemyBuildingsCopy[m] == BOs[j].BOpieces[k].Building) {
						isThere = true;
						EnemyBuildingsCopy.erase(EnemyBuildingsCopy.begin() + m);
						break;
					}
				}
				if (isThere) {
					BuildAvailable++;
				} else {
					BuildMissing++;
				}
			}
		}

		int score = BuildAvailable - BuildMissing;
		if (BestScore < score) {
			BestScore = score;
			EnemyBo = BOs[j].Name;
		}
	}

Compute a score for each build order. Add to the score (BuildAvailable) for each part of the build order that LetaBot has seen, and subtract from the score (BuildMissing) for each part it has not seen. The enemy is (LetaBot concludes) following the build order that gets the highest score. In case of ties, the first high score gets it.

It’s simple and probably jumps to conclusions sometimes, but if the build orders are chosen and ordered correctly then I think it could avoid jumping to dangerous conclusions that make LetaBot do something foolish.

In the AIIDE 2015 version, LetaBot uses the inferred build order to decide how many marines to train for its marine-SCV rush, but not for any other purpose. Openings that give zerg a stronger early army need more marines. From TwoRaxSCVRush.cpp:

	if (Broodwar->enemy()->getRace() == BWAPI::Races::Zerg) {
		// get marines to build based on BO
		if (BOMan->EnemyBo == "9 Pool") {
			MarinesToBuild = 8;
		}
		if (BOMan->EnemyBo == "OverPool") {
			MarinesToBuild = 6;
		}
		if (BOMan->EnemyBo == "12 Hatch") {
			MarinesToBuild = 6;
		}
		if (BOMan->EnemyBo == "9 Pool Speed") {
			MarinesToBuild = 10;
		}
	}

The latest version of LetaBot surely has many more uses for the inferred build order.

Bottom line: It looks to me like an excellent idea for adapting your opening strategy to the opponent’s, as long as the opponent is playing a recognized good strategy. If the opponent plays a bad strategy, why worry? Against zerg there should be few problems caused by unscouted buildings. Even against terran or protoss, I have the feeling that there are ways to manage the uncertainty from unscouted buildings or ambiguous build orders. For example, for build orders that branch you can declare one build order up to the branching point and further builds along each possible branch.

Who’s coming in with the next tip-off?

Update: If you catch LetaBot playing on the SSCAIT live stream you may be able to watch the build order inference in action. LetaBot reports its conclusion at the top center of the screen; at the start of the game it is “Unknown”. For some reason it doesn’t seem to show up every game (maybe only games versus zerg?). The conclusion can be way off when the opponent plays something LetaBot doesn’t know, like “1 Hatch Lurker” when it has seen two hatcheries and no gas. (I guess it should subtract from the score when it sees buildings that are outside the build order, or some other cleverness increment.)

what AIUR learned

After Overkill yesterday, I wrote a not-quite-as-little Perl script to read AIUR’s learning files. AIUR learns more data: Overkill learns a table (opponent, strategy), while AIUR learns a table (opponent, strategy, map size) where map size is the number of starting positions, which is 2, 3 or 4 in AIIDE 2015.

Unlike Overkill, AIUR recorded every game exactly once, missing none and adding none, so its data should be easier to interpret.

Here’s a sample table for one opponent. Compare it against AIUR’s row in Overkill’s table from yesterday. See the full AIUR learning results.

overkill        2-player     3-player     4-player     total
              n   wins     n   wins     n   wins     n   wins
cheese       18    67%     3    33%     1     0%    22    59%
rush          1     0%     1     0%     1     0%     3     0%
aggressive    1     0%     1     0%     1     0%     3     0%
fast expo     1     0%     1     0%     2     0%     4     0%
macro         1     0%     3    33%    25    12%    29    14%
defensive     5    40%     9    33%    15    40%    29    38%
total        27    52%    18    28%    45    20%    90    31%

For reference, here are AIUR’s “moods,” aka strategies.

  • cheese - cannon rush
  • rush - dark templar rush
  • aggressive - fast 4-zealot drop
  • fast expo - nexus first
  • macro - aim for a strong middle game army
  • defensive - be safe against rushes

We see that against Overkill, the cannon rush was relatively successful on 2-player maps, 3-player maps were a struggle, and on 4-player maps AIUR discovered a little late that the defensive mood was better than the macro mood. We also see that AIUR barely explored further when it found a reasonably successful try. If the best strategy was one that happened to lose its first game and didn’t get tried again, it would never know. With so many table cells to fill in, the tremendously long tournament was not long enough for AIUR to explore every possibility thoroughly.

AIUR selected strategies with an initial phase of try-everything-approximately-once followed by an epsilon-greedy algorithm, with epsilon set at 6%. Epsilon-greedy means that 6% of the time it chose a strategy at random, and otherwise it made the greedy choice, the strategy with the best record so far. With 90 games against each opponent to fill in 18 table cells, most cells never came up in the 6% random sample.
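The selection rule is simple enough to sketch in full. This is my own reconstruction of the described behavior, not AIUR's code; randomness is passed in as parameters so the function stays deterministic:

```cpp
#include <vector>

// One cell of the learning table: games played and games won.
struct Record { int games = 0; int wins = 0; };

int chooseStrategy(const std::vector<Record>& table,
                   double epsilon,   // 0.06 for AIUR
                   double roll,      // uniform random in [0, 1)
                   int randomPick)   // uniform random strategy index
{
    // initial phase: anything untried gets tried first
    for (int i = 0; i < (int)table.size(); i++)
        if (table[i].games == 0) return i;
    // explore: epsilon of the time, pick a strategy at random
    if (roll < epsilon) return randomPick;
    // exploit: otherwise the greedy choice, best win rate so far
    // (cross-multiplied to avoid division)
    int best = 0;
    for (int i = 1; i < (int)table.size(); i++)
        if ((double)table[i].wins * table[best].games >
            (double)table[best].wins * table[i].games)
            best = i;
    return best;
}
```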

It should be clear why AIUR was still improving steadily at the end of the tournament! I offered a theory that AIUR learned so much because of its extreme strategies. If you read through the full set of tables, you’ll see that a strategy which works on one map size only sometimes works on other sizes too. The combination of opponent and map size paid off in ways that neither could alone, though only sometimes.

Overkill and AIUR fought a learning duel during the tournament. Both are running learning algorithms which assume that the opponent does not change (or at least settles down in the long run), and both bots violated the assumption. AIUR violated it more strongly. Was that an advantage? Could there be a connection with AIUR’s late discovery of the defensive strategy on 4-player maps?

I updated the zip archive of the Perl scripts and related files to add AIUR’s script alongside Overkill’s. By the way, I haven’t tested it on Windows, so it might need a small tweak or two there.

what Overkill learned

I wrote a little Perl script to read Overkill’s learning files from AIIDE 2015 and add up the numbers. The three strategy names are as Overkill spells them. The opponents are listed in tournament order, so the strongest are at the top.

                NinePoolling   TenHatchMuta   TwelveHatchMuta      total
opponent          n    win       n    win        n    win         n    win
tscmoo           57    26%      19    11%       18    11%        94    20%
ZZZKBot          80    46%       8     0%        8     0%        96    39%
UAlbertaBot      61    30%      20    15%       10     0%        91    23%
Aiur             13    54%      66    80%        3     0%        82    73%
Ximp              2     0%      30    83%       57    93%        89    88%
IceBot            4    25%      72    83%       14    57%        90    77%
Skynet           13    62%      19    68%       58    84%        90    78%
Xelnaga          75    81%      12    50%        3     0%        90    74%
LetaBot          78   100%      10    70%        2     0%        90    94%
Tyr               6    33%      25    64%       53    77%        84    70%
GarmBot          27    96%      27    96%       36   100%        90    98%
NUSBot           66   100%      13    77%       11    73%        90    93%
TerranUAB        30   100%      30   100%       30   100%        90   100%
Cimex            56   100%      33    94%        2     0%        91    96%
CruzBot          30   100%      30   100%       29   100%        89   100%
OpprimoBot       24    96%      33   100%       33   100%        90    99%
Oritaka          56    98%      10    70%       24    88%        90    92%
Stone            56    93%      12    67%       21    81%        89    87%
Bonjwa           30   100%      30   100%       30   100%        90   100%
Yarmouk          30   100%      30   100%       30   100%        90   100%
SusanooTricks    32   100%      23    96%       32   100%        87    99%
total           826    80%     552    80%      504    83%      1882    81%

The number n here is not the number of games played. There were 90 rounds. Some games may not have been recorded due to crashes or other errors, which could explain why some opponents have n < 90. Also, when the 10-hatch mutalisk strategy failed, Overkill assumed it must have lost to a rush that would also have killed the 12-hatch muta strategy. In that case Overkill records 2 game records, a 10-hatch muta loss and a 12-hatch muta loss, which explains why some opponents have n > 90. At least that’s what the code says; some of the data in the table doesn’t seem to match up (see the Xelnaga row). What did I miss?

Some of the strategy choices make sense intuitively. Overkill learned to get early zerglings against ZZZKBot and UAlbertaBot which play rushes, and learned that a more economy-oriented strategy worked against XIMP with its later carriers. These are examples of learning as a substitute for scouting and adapting.

Look at the bottom row. Each strategy ended up with virtually the same winning rate; the UCB algorithm evened them out accurately. But it didn’t use the strategies equally often; the 9-pool was more successful on average against this set of opponents. The early zerglings are important against many opponents, for whatever reason.

Look at the individual lines. Except for weaker opponents that Overkill defeats no matter what, for most opponents one or two strategies were clearly better and were played more often. How much did Overkill learn? If it had played strategies randomly, then the winning rate would be the average of the strategy winning rates. The gain can be estimated as the total winning rate minus the mean of the strategy winning rates—how far did you rise above ignorance? The number varies from zero to huge for different opponents. Because of sampling effects, the estimate will statistically tend to be higher than the truth.

This learning method has to play weak strategies to find out that they’re weak, so it can’t be perfect. The regret for each opponent can be estimated as the difference between the total winning rate and the winning rate of the best strategy if you’d known to play it from the start—how far did you fall short of omniscience? For many of the opponents, the regret estimated that way is 6% to 7%. If the learning algorithm converges to an exact solution, then in an infinitely long tournament the regret will fall to 0. Thinking about numbers like this can give you an idea of when learning makes sense.
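Worked through for the tscmoo row: strategy win rates of 26%, 11%, and 11% average to 16%, against a total winning rate of 20%, so the estimated gain is 4 points; the best strategy at 26% gives an estimated regret of 6 points, right in the typical range. The two estimates as a minimal sketch:

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// The two estimates from the text, computed from one row of the table:
//   gain   = total win rate - mean of the per-strategy win rates
//   regret = best strategy's win rate - total win rate
struct RowStats { double gain; double regret; };

RowStats estimateRow(const std::vector<double>& strategyRates, double totalRate) {
    double mean = std::accumulate(strategyRates.begin(), strategyRates.end(), 0.0)
                / strategyRates.size();
    double best = *std::max_element(strategyRates.begin(), strategyRates.end());
    return {totalRate - mean, best - totalRate};
}
```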

The Perl script and related files are available as a zip archive.

strategy selection in LetaBot

Martin Rooijackers sent me some information about his creation LetaBot.

Up through AIIDE 2015 LetaBot selected builds by learning, but now it has jettisoned learning and selects builds based on scouting information. LetaBot opens with a build that is safe against rushes and transitions to counter whatever it scouts—at least up to a point, it’s a work in progress. LetaBot now has “an extensive flowchart” (that’s how he put it) of terran build orders from Liquipedia. That makes it sound like LetaBot will make more than one transition if it thinks it should.

Rooijackers credits Dennis Soemers (author of protoss bot MaasCraft, which played in AIIDE 2014 and CIG 2014) with pulling the build orders out of Liquipedia, and says he got more build order tips from mapmaker CardinalAllin.

You can see why he might have wanted to change—LetaBot didn’t really benefit from learning in AIIDE 2015. An advantage of prior knowledge over learning is that knowledge is available from the start; you don’t lose games figuring stuff out. A disadvantage is that you can’t take special advantage of surprise weaknesses in the opponent’s play. And I notice how much AIUR wins with strategies that are objectively bad.

Ideally bots should both have prior knowledge and learn during a competition, of course. Prior knowledge says “here are the builds or strategies that work” and learning adds “and these are the particular ones that you should pick to gain advantage over this opponent/this opponent on this map/etc.”

I think that offline learning would be a good way to gain knowledge of builds and strategies, especially if you have vast resources like most of us. You don’t want to go with builds that are good, you want to go with builds that are good for you, based on your skills; that’s true for any player. So every time you make a tweak to micro that may affect your choices, be sure to spend a few cpu-years on offline learning to re-learn your openings from scratch. Should be no problem if you’re as rich as Google, and who isn’t?

AIUR learns more

The protoss bot AIUR by Florian Richoux has a set of hand-coded strategies and learns over time which strategies win against which opponents. That’s a popular religion; other bots like Overkill (see my post on it) and Tscmoo worship at the same altar. But a funny thing happened on the way through the tournament. In the AIIDE 2015 competition report, look at the graph of winning rate over time for the different bots. Let me steal the image showing the top half of participants:

win rates by round in AIIDE 2015

AIUR’s line is the one in the middle that keeps rising and rising. Look carefully and you can see it leveling off, but it hasn’t reached its asymptote at the end of the very long tournament. AIUR seems to learn more, and to keep on learning, even though its learning method is about the same as other bots’. Howzat happen?

Of course AIUR doesn’t do exactly the same thing as other bots. After all, it calls its strategies “moods,” which sounds entirely different. It doesn’t learn an opponent -> strategy mapping, it learns opponent + map size -> strategy, where map size means the number of starting bases, usually 2, 3, or 4. It can figure out that its cannon rush works better on 2-player maps, for example. I imagine that that’s part of the answer, but could it be the whole story?

I have a theory. My theory is that AIUR’s extreme strategies make good probes for weakness. AIUR’s strategies range from absolutely reckless cannon rush, dark templar rush, and 4-zealot drop cheese to defensive and macro-oriented game plans. AIUR’s strategies stake out corners of strategy space. Compare Overkill’s middle-of-the-road zergling, mutalisk, and hydralisk strats, with no fast rushes or slow macro plays, nothing highly aggressive and nothing highly cautious. My theory is that if an enemy makes systematic mistakes, then one of AIUR’s extreme strategies is likely to exploit the mistakes, and AIUR will eventually learn so.

If true, that could explain why AIUR learns more effectively in the long run. Presumably the reason that it takes so long to reach its asymptote is that it has to learn the effect of the map size. The tournament had 27 games per opponent on 2-player maps, 18 on 3-player, and 45 on 4-player, not enough to test each of its 6 strategies repeatedly. It could learn faster by doing a touch of generalization—I’ll post on that some other day.

AIUR also claims to implement its strategies with a further dose of randomness. Intentional unpredictability could confuse the learning algorithms of its enemies. I approve.

crazy game Iron-Killerbot

Here’s a wacky shot from a game between terran Iron by Igor Dimitrijevic and zerg Killerbot by Marian Devecka. I caught this on the SSCAIT live stream yesterday (yesterday in my time zone, anyway).

screenshot of game between Iron and KillerBot

Killerbot is the strongest bot in existence, if you ask me, but here it is in trouble. Killerbot expanded, placed 3 sunkens and a maze of buildings to deter any runby, and set about teching up. All as usual. Iron also did its usual thing, rushing with speed vultures and a seemingly excessive number of SCVs.

Killerbot is so cautious about runbys that its blocking buildings sometimes block its own lurkers in its base. Caution didn’t help this time. Iron’s vultures ran through the natural and brought SCVs with them into the main. You can see a couple of the SCVs in the screenshot; the vultures are still in the main but are out of view. Iron killed the drones, killed the one defending hydralisk, and set about slowly-slowly tearing down the buildings. Killerbot has sharp strategies but it has the same fragilities as other bots—it said “hmm, the main is underpopulated” and sent drones in from the natural one at a time to die until it had none left.

If Iron expanded now, it could eventually accumulate enough vultures to kill the sunkens and eliminate Killerbot. But I believe this version does not know how to expand and is afraid of static defense (I saw it lose a game against AIUR on points because it did not attack the single well-placed cannon defending all the buildings in the main but instead gradually lost its vultures). On the other side, the queens do not have broodling. Stalemate.

At this point the tournament system had some kind of problem and aborted the game. In the regame, there was no runby and Killerbot won easily. So I never found out who would have won on points.

Lessons!

  1. Runbys rule. I think the feature was recently added to Iron, and I expect it to help a lot. I haven’t seen any other bot do a runby. Runbys are common in human games, so I think they’re vital.
  2. Expanding is a robustness measure. Iron already builds extra command centers, all in the same base. I expect a future version will know how to expand.
  3. If there are things you can’t attack, then there are games you can’t win. If you fear sunkens, or if you can’t find floating command centers in the corner, you can’t finish the game under your own power.
  4. It’s good to have a fallback strategy to break loose a stuck game. ZZZKBot eventually makes mutalisks. LetaBot used to go wraiths. Even Jakub Trancik’s mass cannon bot finally switches to zealots.

Update later the same day: Igor Dimitrijevic sent me more information. He says my guesses about Iron are correct. About the new runby feature, he writes “Up to 3 times, if some conditions are met (some sunken / bunker / cannon prevents my vultures from reaching the enemy main base), some vultures and/or SCVs (at least one vulture) are ordered to runby, which is something they would otherwise never do.” Later he turned the runby feature off, and I’m not sure when or if it will make a return.

Iron has been updated again and it has another new feature: It makes tanks. Tanks can easily siege down the static defenses that used to hold Iron at bay. Igor is confident that, when tanks and the ability to expand are tuned properly with everything else, Iron will rise near the top of the SSCAIT table. Hmm... maybe it will. I can’t judge. How will Iron fight flying units or cloaked units or tank pushes? Is the rush so strong that it will stop the opponent from getting that far?

strategy selection in Overkill

Zerg bot Overkill by Sijia Xu placed 3rd in the CIG 2015 and 3rd in the AIIDE 2015 tournaments. It’s not the very best, but it’s a top bot. Its games are entertaining to watch because it has more than one strategy.

Today I’m going to poke into Overkill’s C++ source code for strategy selection. There’s not much to it.

Overkill knows 3 strategies, 9-pool zergling, mutalisks, and hydras. For mutas it knows 2 different build orders, a faster one (10 hatch) and a slower one (12 hatch) with a stronger economy. It has an idea of when to switch between strategies within a game. It also learns from experience which strategies work against each opponent. If that sounds fancy, then the code is probably simpler than you imagine. It ain’t long and it ain’t tricky.

Overkill keeps a history file for each opponent so it can remember which strategies work. Code from the main program Overkill.cpp shows that if it doesn’t know the opponent yet, it goes 9-pool on the first game. (I stripped out a few irrelevant comments.)

	string enemyName = BWAPI::Broodwar->enemy()->getName();
	string filePath = "./bwapi-data/read/";
	filePath += enemyName;
	
	//for each enemy, create a file
	historyFile.open(filePath.c_str(), ios::in);
	
	//file do not exist, first match to the enemy
	if (!historyFile.is_open())
	{
		BWAPI::Broodwar->printf("first match to:   %s", enemyName.c_str());
		historyFile.close();

		//default opening strategy
		chooseOpeningStrategy = NinePoolling;
		StrategyManager::Instance().setOpeningStrategy(chooseOpeningStrategy);
	}
	else
	{

I’ll skip over reading the history file (it’s just reading this-strategy-got-that-game-result records and adding up) and go straight to the decision. Strategy selection is a multi-armed bandit problem, and Overkill solves it with a UCB algorithm that trades off exploration (“I’m not really sure how well this strategy works against this opponent, better try it again”) against exploitation (“this one usually wins, I’ll do it”). This is about the same method that other strategy-learning bots use.

		double experimentCount = historyInfo.size();
		std::map<std::string, double> strategyUCB;
		for(auto opening : StrategyManager::Instance().getStrategyNameArray())
		{
			strategyUCB[opening] = 99999;
		}

		for (auto it : strategyResults)
		{
			double strategyExpectation = double(it.second.first) / (it.second.first + it.second.second);
			double uncertainty = 0.7 * std::sqrt(std::log(experimentCount) / (it.second.first + it.second.second));
			strategyUCB[it.first] = strategyExpectation + uncertainty;
		}
		
		std::string maxUCBStrategy;
		double maxUCB = 0;
		for (auto it : strategyUCB)
		{
			if (it.second > maxUCB)
			{
				maxUCBStrategy = it.first;
				maxUCB = it.second;
			}
			BWAPI::Broodwar->printf("%s , UCB: %.4f", it.first.c_str(), it.second);
		}

		BWAPI::Broodwar->printf("choose %s opening", maxUCBStrategy.c_str());
		int openingId = StrategyManager::Instance().getStrategyByName(maxUCBStrategy);
		if (openingId != -1)
		{
			chooseOpeningStrategy = openingStrategy(openingId);
		}
		StrategyManager::Instance().setOpeningStrategy(chooseOpeningStrategy);

		historyFile.close();
	}

If you haven’t seen something like this before it might be a little hard to understand what it’s doing, but you can’t say that the code is complicated. strategyExpectation is the strategy’s winning rate so far: You want to pick winners. uncertainty is high for strategies that have not yet been played and decreases as the strategy is tried more times: You want to experiment with strategies that you haven’t tried enough yet. Add them up and pick the one that’s best overall.

The initial strategy is one of 9-pool lings, 10-hatch mutas, or 12-hatch mutas. It never starts the game intending to go hydras. This is declared in StrategyManager.h:

enum openingStrategy { TwelveHatchMuta, NinePoolling, TenHatchMuta };

So how does Overkill decide to switch between strategies? That’s in the StrategyManager::update() method.

	switch (currentStrategy)
	{
	case HydraPush:
	{
		ProductionManager::Instance().setExtractorBuildSpeed(0);
		ProductionManager::Instance().setDroneProductionSpeed(250);

		if (hydraUpgradeTrigger && BWAPI::Broodwar->self()->allUnitCount(BWAPI::UnitTypes::Zerg_Hydralisk) >= 12)
		{
			hydraUpgradeTrigger = false;
			ProductionManager::Instance().triggerUpgrade(BWAPI::UpgradeTypes::Zerg_Missile_Attacks);
			ProductionManager::Instance().triggerUpgrade(BWAPI::UpgradeTypes::Zerg_Carapace);
			ProductionManager::Instance().triggerUpgrade(BWAPI::UpgradeTypes::Zerg_Missile_Attacks);
			ProductionManager::Instance().triggerUpgrade(BWAPI::UpgradeTypes::Zerg_Carapace);
		}
	}
		break;

Well, this isn’t switching strategy at all, it’s implementing the strategy by sending orders to the ProductionManager. Once Overkill goes hydra, it sticks with hydras. What if it goes muta? Here I stripped out the instructions to ProductionManager and left the decision logic.

	case MutaPush:
	{
		std::map<BWAPI::UnitType, std::set<BWAPI::Unit>>& enemyBattle = InformationManager::Instance().getEnemyAllBattleUnit();
		int enemyAntiAirSupply = 0;
		for (std::map<BWAPI::UnitType, std::set<BWAPI::Unit>>::iterator it = enemyBattle.begin(); it != enemyBattle.end(); it++)
		{
			if (it->first.airWeapon() != BWAPI::WeaponTypes::None)
			{
				enemyAntiAirSupply += it->first.supplyRequired() * it->second.size();
			}
		}

		// if lost too many mutalisk , and enemy still have many anti-air army, change to hydrisk
		if (enemyAntiAirSupply >= 2 * 1.5 * 12 && BWAPI::Broodwar->self()->allUnitCount(BWAPI::UnitTypes::Zerg_Mutalisk) * 3 < enemyAntiAirSupply)
		{
			goalChange(HydraPush);
		}

	}
		break;

This is Overkill’s most complex strategy logic, and it’s not very complex. If the enemy’s anti-air supply reaches a fixed threshold (2 × 1.5 × 12 = 36, counted in BWAPI’s internal supply units, which are doubled) and is more than three times Overkill’s mutalisk count, then stop making mutalisks and switch to hydras.

What if Overkill is still on zerglings? This time I left in the orders to the ProductionManager.

	case ZerglingPush:
	{
		int hatchCompleteCount = 0;
		BOOST_FOREACH(BWAPI::Unit u, BWAPI::Broodwar->self()->getUnits())
		{
			if (u->getType() == BWAPI::UnitTypes::Zerg_Hatchery && u->isCompleted())
			{
				hatchCompleteCount++;
			}
			else if (u->getType() == BWAPI::UnitTypes::Zerg_Lair)
			{
				hatchCompleteCount++;
			}
			else
				continue;
		}
		// slow the drone production speed when only one hatch
		if (hatchCompleteCount == 1)
		{
			ProductionManager::Instance().setDroneProductionSpeed(750);
		}
		else
		{
			ProductionManager::Instance().setDroneProductionSpeed(250);
		}

		ProductionManager::Instance().setExtractorBuildSpeed(15);

		if (BWAPI::Broodwar->self()->gas() >= 100)
		{
			goalChange(MutaPush);
		}
	}
		break;

It counts the hatcheries and, if there are 2 or more, makes drones faster (I think that’s what it means). In other words, make a bunch of lings early on to see what you can do, then focus on drones. This is again implementing the strategy, not switching it. The strategy switch is: As soon as we’ve collected 100 gas (enough to start a lair), switch to mutas. In other words, the strategy path is hardcoded zerglings to mutas to hydras, and Overkill can start with either zerglings or mutas (and for mutas, with either a 10- or a 12-hatchery build). Dead simple.

You might draw the lesson, “If this is all the smarts a top-3 bot has, no wonder humans play so much better!” You won’t be wrong. But that’s where we are today, and we have to make progress one step at a time. The stronger Tscmoo bot knows more strategies than Overkill, but its strategy selection and strategy switching are hardly any smarter.

A more fun lesson to draw is “Hey, this is easy! I can make a top bot without worrying much about strategy selection, even though it seems like such a hard problem.” That’s true too.

There are plenty of other lessons to draw. When Overkill doesn’t expand early enough to its natural, it often gets stuck on one base for the whole game and (since zerg have a lust to expand) gets smooshed. Also, Overkill’s zergling and mutalisk strategies work okay, but the hydralisk strategy can backfire, especially against terran. A terran ground army can roll up an equal-size hydra army like a rug. The first thing you should worry about is not picking a good strategy, but making the strategy you pick good. Any decent strategy can beat other bots if it’s carried out crisply.