
you need more than 1 strategy

Martin Rooijackers aka LetaBot read my posts about Zia and wrote to point out that a zerg bot facing terran wants both mutalisk and lurker options. The reason is that terran may counter the mutas. He mentioned 5 barracks with +1, which should hard counter mutas. He also called out valkyrie and goliath possibilities, specifically pointing out that valkyries force mutas to spread out, which reduces their potential. Zerg needs to scout the build and react before overcommitting to mutalisks—at the latest when the first fliers arrive at the terran base and see what’s up.

Zerg can’t stick with tier 1 units (zerglings and hydralisks) because any likely terran midgame army will walk over them. And hive tech takes time. Lair units are key to the middlegame.

If zerg always goes mutas, any terran with strategy learning will find a way to counter the mutas and gain an advantage every game. I think this has already happened with Zia and Tscmoo terran. If zerg sometimes opens mutas and sometimes lurkers, then terran faces a risk trying to counter mutas with marines—the lurkers counter marines. Terran’s best play becomes less committal and more cautious, and that favors zerg.

Mainline pro play has the zerg starting with a limited number of mutas and using the time they buy with cautious harassment to get lurkers and rapidly tech to hive. But pros of course are totally comfortable with adaptation and tech switches. Not all games follow the main line. Today’s game of Flash (T) vs. Zero (Z) was a great example: Flash opened 14 CC, Zero responded logically with 3 hatcheries before pool and went lurkers while Flash prepared for mutalisks.

Any bot with only one strategy stands at a disadvantage against bots with opponent modeling. It’s true for all matchups. Today’s simple strategy learning will find a counter-strategy within a dozen games, usually less. Humans, and tomorrow’s sophisticated opponent modeling bots, may counter the strategy of the first game in the second, and should quickly find strong counters to most fixed strategies.

To beat humans, or to beat opponent modeling bots, you’ll need strategy flexibility plus either learning or a dose of randomness, ideally both. I promise. If sophisticated opponent modeling doesn’t arrive fast enough for me, I’ll provide it myself. It will make bots much more interesting to watch and to play against.

what AIUR learned

After Overkill yesterday, I wrote a not-quite-as-little Perl script to read AIUR’s learning files. AIUR learns more data: Overkill learns a table (opponent, strategy), while AIUR learns a table (opponent, strategy, map size) where map size is the number of starting positions, which is 2, 3 or 4 in AIIDE 2015.

Unlike Overkill, AIUR recorded every game exactly once, missing none and adding none, so its data should be easier to interpret.

Here’s a sample table for one opponent. Compare it against AIUR’s row in Overkill’s table from yesterday. See the full AIUR learning results.

overkill        2-player       3-player       4-player       total
                n    wins      n    wins      n    wins      n    wins
cheese          18   67%       3    33%       1    0%        22   59%
rush            1    0%        1    0%        1    0%        3    0%
aggressive      1    0%        1    0%        1    0%        3    0%
fast expo       1    0%        1    0%        2    0%        4    0%
macro           1    0%        3    33%       25   12%       29   14%
defensive       5    40%       9    33%       15   40%       29   38%
total           27   52%       18   28%       45   20%       90   31%

For reference, here are AIUR’s “moods,” aka strategies.

  • cheese - cannon rush
  • rush - dark templar rush
  • aggressive - fast 4-zealot drop
  • fast expo - nexus first
  • macro - aim for a strong middle game army
  • defensive - be safe against rushes

We see that against Overkill, the cannon rush was relatively successful on 2-player maps, struggled on 3-player maps, and on 4-player maps AIUR discovered a little late that the defensive mood was better than the macro mood. We also see that AIUR barely explored further once it found a reasonably successful try. If the best strategy happened to lose its first game and never got tried again, AIUR would never find out. With so many table cells to fill in, the tremendously long tournament was not long enough for AIUR to explore every possibility thoroughly.

AIUR selected strategies with an initial phase of try-everything-approximately-once followed by an epsilon-greedy algorithm, with epsilon set at 6%. Epsilon-greedy means that 6% of the time it chose a strategy at random, and otherwise it made the greedy choice, the strategy with the best record so far. With 90 games against each opponent to fill in 18 table cells, most cells never came up in the 6% random sample.
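Here is a minimal sketch of that selection scheme (not AIUR's actual code; the record struct and function name are made up):

	#include <cstdlib>
	#include <string>
	#include <vector>

	struct StrategyRecord
	{
		std::string name;
		int wins = 0;
		int games = 0;
	};

	// Minimal sketch of AIUR-style selection: try everything approximately
	// once, then epsilon-greedy. Assumes records is non-empty.
	std::string chooseStrategy(const std::vector<StrategyRecord>& records,
	                           double epsilon = 0.06)
	{
		// Initial phase: play any strategy that hasn't been tried yet.
		for (const auto& r : records)
			if (r.games == 0)
				return r.name;

		// Exploration: epsilon of the time, choose uniformly at random.
		if (std::rand() / double(RAND_MAX) < epsilon)
			return records[std::rand() % records.size()].name;

		// Exploitation: the greedy choice, best win rate so far.
		const StrategyRecord* best = &records[0];
		for (const auto& r : records)
			if (double(r.wins) / r.games > double(best->wins) / best->games)
				best = &r;
		return best->name;
	}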

It should be clear why AIUR was still improving steadily at the end of the tournament! I offered a theory that AIUR learned so much because of its extreme strategies. If you read through the full set of tables, you'll see that a strategy which works on one map size only sometimes carries over to the other sizes. Learning over the combination of opponent and map size paid off, at least some of the time, in ways that learning over either alone could not.

Overkill and AIUR fought a learning duel during the tournament. Both are running learning algorithms which assume that the opponent does not change (or at least settles down in the long run), and both bots violated the assumption. AIUR violated it more strongly. Was that an advantage? Could there be a connection with AIUR’s late discovery of the defensive strategy on 4-player maps?

I updated the zip archive of the Perl scripts and related files to add AIUR's script alongside Overkill's. By the way, I haven't tested it on Windows, so it might need a tweak or two there.

what Overkill learned

I wrote a little Perl script to read Overkill’s learning files from AIIDE 2015 and add up the numbers. The three strategy names are as Overkill spells them. The opponents are listed in tournament order, so the strongest are at the top.

opponent        NinePoolling    TenHatchMuta    TwelveHatchMuta    total
                n     win       n     win       n     win          n      win
tscmoo          57    26%       19    11%       18    11%          94     20%
ZZZKBot         80    46%       8     0%        8     0%           96     39%
UAlbertaBot     61    30%       20    15%       10    0%           91     23%
Aiur            13    54%       66    80%       3     0%           82     73%
Ximp            2     0%        30    83%       57    93%          89     88%
IceBot          4     25%       72    83%       14    57%          90     77%
Skynet          13    62%       19    68%       58    84%          90     78%
Xelnaga         75    81%       12    50%       3     0%           90     74%
LetaBot         78    100%      10    70%       2     0%           90     94%
Tyr             6     33%       25    64%       53    77%          84     70%
GarmBot         27    96%       27    96%       36    100%         90     98%
NUSBot          66    100%      13    77%       11    73%          90     93%
TerranUAB       30    100%      30    100%      30    100%         90     100%
Cimex           56    100%      33    94%       2     0%           91     96%
CruzBot         30    100%      30    100%      29    100%         89     100%
OpprimoBot      24    96%       33    100%      33    100%         90     99%
Oritaka         56    98%       10    70%       24    88%          90     92%
Stone           56    93%       12    67%       21    81%          89     87%
Bonjwa          30    100%      30    100%      30    100%         90     100%
Yarmouk         30    100%      30    100%      30    100%         90     100%
SusanooTricks   32    100%      23    96%       32    100%         87     99%
total           826   80%       552   80%       504   83%          1882   81%

The number n here is not the number of games played. There were 90 rounds. Some games were perhaps not recorded due to crashes or other errors, which could explain why some opponents have n < 90. Also, when the 10-hatch mutalisk strategy failed, Overkill assumed it must have lost to a rush that would also have killed the 12-hatch muta strategy. In that case Overkill recorded 2 game records, a 10-hatch muta loss and a 12-hatch muta loss, explaining why some opponents have n > 90. At least that’s what the code says; some of the data in the table doesn’t seem to match up (see the Xelnaga row). What did I miss?
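If I read the table right, the ZZZKBot row fits that story: it adds up to 96 records over 90 rounds, and the 10-hatch strategy lost all 8 of its games there, which is in the right range for double-recorded losses with a couple of crashes or unrecorded games in the mix. The Xelnaga row is harder to square: its 10-hatch strategy lost 6 games, which should also have been double-recorded, yet the row adds up to exactly 90 and the 12-hatch column shows only 3 records.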

Some of the strategy choices make sense intuitively. Overkill learned to get early zerglings against ZZZKBot and UAlbertaBot which play rushes, and learned that a more economy-oriented strategy worked against XIMP with its later carriers. These are examples of learning as a substitute for scouting and adapting.

Look at the bottom row. Each strategy ended up with virtually the same winning rate; the UCB algorithm evened them out accurately. But it didn’t use the strategies equally often; the 9-pool was more successful on average against this set of opponents. The early zerglings are important against many opponents, for whatever reason.

Look at the individual lines. Except for weaker opponents that Overkill defeats no matter what, for most opponents one or two strategies were clearly better and were played more often. How much did Overkill learn? If it had played strategies randomly, then the winning rate would be the average of the strategy winning rates. The gain can be estimated as the total winning rate minus the mean of the strategy winning rates—how far did you rise above ignorance? The number varies from zero to huge for different opponents. Because of sampling effects, the estimate will statistically tend to be higher than the truth.

This learning method has to play weak strategies to find out that they’re weak, so it can’t be perfect. The regret for each opponent can be estimated as the difference between the total winning rate and the winning rate of the best strategy if you’d known to play it from the start—how far did you fall short of omniscience? For many of the opponents, the regret estimated that way is 6% to 7%. If the learning algorithm converges to an exact solution, then in an infinitely long tournament the regret will fall to 0. Thinking about numbers like this can give you an idea of when learning makes sense.
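A worked example from the table: against Aiur the three strategies won 54%, 80%, and 0%, a mean of about 45%, while the realized total was 73%, so learning gained Overkill roughly 28 points of winning rate over playing randomly. The best strategy won 80%, so the estimated regret is about 7 points, right in that band.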

The Perl script and related files are available as a zip archive.

panic button and fish story

Yesterday’s post was about prior knowledge. The posts before were about learning. Today’s is about prior knowledge for learning.

I was inspired by a remark from Dave Churchill, author of UAlbertaBot, in his new A History of Starcraft AI Competitions: In AIIDE 2015 “UAlbertaBot had [only] a 2/3 winning percentage against some of the lower ranking bots due to the fact that one of the 3 races did not win against those bots.” UAlbertaBot, playing random, had its learning turned off, presumably because the selected strategy for each race was dominant. With learning turned on, it would have lost games trying weaker strategies before settling on the dominant strategy, ending up behind overall. Or so the thinking goes, if my guess is right.

Well, that’s like Bisu defeating Savior. When somebody comes up with a counter for the game plan you thought was dominant, don’t you think you should try something different?

You can have it both ways. You can restrict yourself to playing your dominant strategy unless and until it turns out to lose repeatedly. You don’t have to lose games exploring your options; you can take losing to mean that you should start exploring your options.

The panic button implementation is simple. Start out recording the game results as usual, as if learning were turned on, but ignore them and always pick your dominant strategy. But when you get to (say) >10 games with <10% win rate, hit the panic button and let your algorithm try alternatives. It’s unlikely to make things worse!
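A minimal sketch, assuming the same kind of per-strategy win/game records as the epsilon-greedy sketch earlier on this page; the 10-game and 10% thresholds are the placeholders from above:

	#include <functional>
	#include <string>
	#include <vector>

	struct StrategyRecord
	{
		std::string name;
		int wins = 0;
		int games = 0;
	};

	// Play the dominant strategy until the panic condition fires, then
	// hand control to the normal learning algorithm. Results are recorded
	// as usual the whole time, so the learner starts with real data.
	std::string panicButton(const std::vector<StrategyRecord>& records,
	                        const std::string& dominant,
	                        const std::function<std::string(
	                            const std::vector<StrategyRecord>&)>& learner)
	{
		int games = 0, wins = 0;
		for (const auto& r : records)
		{
			games += r.games;
			wins += r.wins;
		}

		// Panic: more than 10 games recorded and under a 10% win rate.
		bool panic = games > 10 && wins * 10 < games;
		return panic ? learner(records) : dominant;
	}

The learner argument could be the epsilon-greedy sketch above, a UCB rule like Overkill’s, or whatever you normally run.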

The fish story implementation is also simple. Pretend, before the first game with a new opponent, that you actually have a history with this opponent. Tell yourself a fish story: “Oh, strategy A, I tried that a few times and always won. And strategy B sucked, I tried that a time or two and lost.” It’s literally a few lines of code to slide fictitious history into your learning data, and you’re done. Your strategy selection algorithm will look at it and say “Strategy A, duh,” and as long as A keeps winning it will explore others at a low rate.

The simpleminded learning algorithms that bots use today assume that you start out knowing nothing about which choices are better. And that’s just false. You always know that some strategies are stronger than others, that some are safe and work against many opponents while others are risky and only exploit certain weaknesses. With the fish story, your bot can start out knowing that A is reliable (“it won repeatedly”), B is a fallback (“it lost once”), and C can be tried if all else fails (“it lost a few times”) in a last ditch attempt to trick a few points out of a killer opponent. Or any combination you want.
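A sketch of the fish story itself, with made-up records rather than any real bot’s file format:

	#include <string>
	#include <vector>

	struct StrategyRecord
	{
		std::string name;
		int wins = 0;
		int games = 0;
	};

	// Fictitious history for a brand-new opponent: A "won repeatedly",
	// B "lost once", C "lost a few times". The selection algorithm will
	// lead with A and only drift to B and C if A starts losing for real.
	std::vector<StrategyRecord> fishStory()
	{
		return {
			{ "A", 3, 3 },   // 3 wins in 3 games: the reliable opener
			{ "B", 0, 1 },   // 0 for 1: the fallback
			{ "C", 0, 3 },   // 0 for 3: the last-ditch trick
		};
	}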

If you have prior knowledge about your opponents but you’re not sure whether they’ll have updates for the tournament, you can go Baron Munchausen and tell yourself a different fish story about each opponent.

Many variations and other ideas work too. Think about your strategy choices and your selection algorithm and how you would like it to behave. You can probably find your own ways.

Update: Dave Churchill told me the real reason behind UAlbertaBot’s decision: He ran out of time! He wrote that he actually implemented a “panic button” method, but did not have time before the tournament to test it and make sure it was solid. I think it’s enough that UAlbertaBot can play random; progress comes one step at a time.

AIUR learns more

The protoss bot AIUR by Florian Richoux has a set of hand-coded strategies and learns over time which strategies win against which opponents. That’s a popular religion; other bots like Overkill (see my post on it) and Tscmoo worship at the same altar. But a funny thing happened on the way through the tournament. In the AIIDE 2015 competition report, look at the graph of winning rate over time for the different bots. Let me steal the image showing the top half of participants:

win rates by round in AIIDE 2015

AIUR’s line is the one in the middle that keeps rising and rising. Look carefully and you can see it leveling off, but it hasn’t reached its asymptote at the end of the very long tournament. AIUR seems to learn more, and to keep on learning, even though its learning method is about the same as other bots’. Howzat happen?

Of course AIUR doesn’t do exactly the same thing as other bots. After all, it calls its strategies “moods,” which sounds entirely different. It doesn’t learn an opponent -> strategy mapping, it learns opponent + map size -> strategy, where map size means the number of starting bases, usually 2, 3, or 4. It can figure out that its cannon rush works better on 2-player maps, for example. I imagine that that’s part of the answer, but could it be the whole story?
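In data-structure terms the difference is one extra key. A sketch, with made-up names rather than anything from either bot’s source:

	#include <map>
	#include <string>
	#include <tuple>
	#include <utility>

	struct Record { int wins = 0; int games = 0; };

	// Overkill-style learning: one record per (opponent, strategy).
	std::map<std::pair<std::string, std::string>, Record> overkillTable;

	// AIUR-style learning: one record per (opponent, strategy, map size),
	// where map size is the number of starting positions (2, 3, or 4).
	std::map<std::tuple<std::string, std::string, int>, Record> aiurTable;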

I have a theory. My theory is that AIUR’s extreme strategies make good probes for weakness. AIUR’s strategies range from absolutely reckless cannon rush, dark templar rush, and 4-zealot drop cheese to defensive and macro-oriented game plans. AIUR’s strategies stake out corners of strategy space. Compare Overkill’s middle-of-the-road zergling, mutalisk, and hydralisk strats, with no fast rushes or slow macro plays, nothing highly aggressive and nothing highly cautious. My theory is that if an enemy makes systematic mistakes, then one of AIUR’s extreme strategies is likely to exploit the mistakes, and AIUR will eventually learn so.

If true, that could explain why AIUR learns more effectively in the long run. Presumably the reason that it takes so long to reach its asymptote is that it has to learn the effect of the map size. The tournament had 27 games per opponent on 2-player maps, 18 on 3-player, and 45 on 4-player, not enough to test each of its 6 strategies repeatedly. It could learn faster by doing a touch of generalization—I’ll post on that some other day.

AIUR also claims to implement its strategies with a further dose of randomness. Intentional unpredictability could confuse the learning algorithms of its enemies. I approve.

strategy selection in Overkill

Zerg bot Overkill by Sijia Xu placed 3rd in both the CIG 2015 and AIIDE 2015 tournaments. It’s not the very best, but it’s a top bot. Its games are entertaining to watch because it has more than one strategy.

Today I’m going to poke into Overkill’s C++ source code for strategy selection. There’s not much to it.

Overkill knows 3 strategies, 9-pool zergling, mutalisks, and hydras. For mutas it knows 2 different build orders, a faster one (10 hatch) and a slower one (12 hatch) with a stronger economy. It has an idea of when to switch between strategies within a game. It also learns from experience which strategies work against each opponent. If that sounds fancy, then the code is probably simpler than you imagine. It ain’t long and it ain’t tricky.

Overkill keeps a history file for each opponent so it can remember which strategies work. Code from the main program Overkill.cpp shows that if it doesn’t know the opponent yet, it goes 9-pool on the first game. (I stripped out a few irrelevant comments.)

	string enemyName = BWAPI::Broodwar->enemy()->getName();
	string filePath = "./bwapi-data/read/";
	filePath += enemyName;
	
	//for each enemy, create a file
	historyFile.open(filePath.c_str(), ios::in);
	
	//file do not exist, first match to the enemy
	if (!historyFile.is_open())
	{
		BWAPI::Broodwar->printf("first match to:   %s", enemyName.c_str());
		historyFile.close();

		//default opening strategy
		chooseOpeningStrategy = NinePoolling;
		StrategyManager::Instance().setOpeningStrategy(chooseOpeningStrategy);
	}
	else
	{

I’ll skip over reading the history file (it’s just reading this-strategy-got-that-game-result records and adding up) and go straight to the decision. Strategy selection is a multi-armed bandit problem, and Overkill solves it with a UCB algorithm that makes a tradeoff between exploration (“I’m not really sure how well this strategy works against this opponent, better try it again”) and exploitation (“this one usually wins, I’ll do it”). This is about the same method as other strategy-learning bots.

		double experimentCount = historyInfo.size();
		std::map<std::string, double> strategyUCB;
		for(auto opening : StrategyManager::Instance().getStrategyNameArray())
		{
			strategyUCB[opening] = 99999;
		}

		for (auto it : strategyResults)
		{
			double strategyExpectation = double(it.second.first) / (it.second.first + it.second.second);
			double uncertainty = 0.7 * std::sqrt(std::log(experimentCount) / (it.second.first + it.second.second));
			strategyUCB[it.first] = strategyExpectation + uncertainty;
		}
		
		std::string maxUCBStrategy;
		double maxUCB = 0;
		for (auto it : strategyUCB)
		{
			if (it.second > maxUCB)
			{
				maxUCBStrategy = it.first;
				maxUCB = it.second;
			}
			BWAPI::Broodwar->printf("%s , UCB: %.4f", it.first.c_str(), it.second);
		}

		BWAPI::Broodwar->printf("choose %s opening", maxUCBStrategy.c_str());
		int openingId = StrategyManager::Instance().getStrategyByName(maxUCBStrategy);
		if (openingId != -1)
		{
			chooseOpeningStrategy = openingStrategy(openingId);
		}
		StrategyManager::Instance().setOpeningStrategy(chooseOpeningStrategy);

		historyFile.close();
	}

If you haven’t seen something like this before it might be a little hard to understand what it’s doing, but you can’t say that the code is complicated. strategyExpectation is the strategy’s winning rate so far: You want to pick winners. uncertainty is high for strategies that have not yet been played and decreases as the strategy is tried more times: You want to experiment with strategies that you haven’t tried enough yet. Add them up and pick the one that’s best overall.
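To put numbers on it (assuming it.second.first counts wins and it.second.second counts losses, which matches the expectation formula): a strategy that has gone 5-3 after 20 recorded games scores 5/8 + 0.7 * sqrt(ln 20 / 8) ≈ 0.625 + 0.428 ≈ 1.05, while a strategy that has never been tried keeps its initial UCB of 99999 and gets picked immediately.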

The initial strategy is one of 9-pool lings, 10-hatch mutas, or 12-hatch mutas. It never starts the game intending to go hydras. This is declared in StrategyManager.h:

enum openingStrategy { TwelveHatchMuta, NinePoolling, TenHatchMuta };

So how does Overkill decide to switch between strategies? That’s in the StrategyManager::update() method.

	switch (currentStrategy)
	{
	case HydraPush:
	{
		ProductionManager::Instance().setExtractorBuildSpeed(0);
		ProductionManager::Instance().setDroneProductionSpeed(250);

		if (hydraUpgradeTrigger && BWAPI::Broodwar->self()->allUnitCount(BWAPI::UnitTypes::Zerg_Hydralisk) >= 12)
		{
			hydraUpgradeTrigger = false;
			ProductionManager::Instance().triggerUpgrade(BWAPI::UpgradeTypes::Zerg_Missile_Attacks);
			ProductionManager::Instance().triggerUpgrade(BWAPI::UpgradeTypes::Zerg_Carapace);
			ProductionManager::Instance().triggerUpgrade(BWAPI::UpgradeTypes::Zerg_Missile_Attacks);
			ProductionManager::Instance().triggerUpgrade(BWAPI::UpgradeTypes::Zerg_Carapace);
		}
	}
		break;

Well, this isn’t switching strategy at all, it’s implementing the strategy by sending orders to the ProductionManager. Once Overkill goes hydra, it sticks with hydras. What if it goes muta? Here I stripped out the instructions to ProductionManager and left the decision logic.

	case MutaPush:
	{
		std::map<BWAPI::UnitType, std::set<BWAPI::Unit>>& enemyBattle = InformationManager::Instance().getEnemyAllBattleUnit();
		int enemyAntiAirSupply = 0;
		for (std::map<BWAPI::UnitType, std::set<BWAPI::Unit>>::iterator it = enemyBattle.begin(); it != enemyBattle.end(); it++)
		{
			if (it->first.airWeapon() != BWAPI::WeaponTypes::None)
			{
				enemyAntiAirSupply += it->first.supplyRequired() * it->second.size();
			}
		}

		// if lost too many mutalisk , and enemy still have many anti-air army, change to hydrisk
		if (enemyAntiAirSupply >= 2 * 1.5 * 12 && BWAPI::Broodwar->self()->allUnitCount(BWAPI::UnitTypes::Zerg_Mutalisk) * 3 < enemyAntiAirSupply)
		{
			goalChange(HydraPush);
		}

	}
		break;

This is Overkill’s most complex strategy logic, and it’s not very complex. If the enemy has too many units that can shoot down mutalisks (in the code: the enemy’s anti-air supply has reached the constant 2 * 1.5 * 12 = 36 and exceeds three times Overkill’s mutalisk count), then stop making mutalisks and switch to hydras.

What if Overkill is still on zerglings? This time I left in the orders to the ProductionManager.

	case ZerglingPush:
	{
		int hatchCompleteCount = 0;
		BOOST_FOREACH(BWAPI::Unit u, BWAPI::Broodwar->self()->getUnits())
		{
			if (u->getType() == BWAPI::UnitTypes::Zerg_Hatchery && u->isCompleted())
			{
				hatchCompleteCount++;
			}
			else if (u->getType() == BWAPI::UnitTypes::Zerg_Lair)
			{
				hatchCompleteCount++;
			}
			else
				continue;
		}
		// slow the drone production speed when only one hatch
		if (hatchCompleteCount == 1)
		{
			ProductionManager::Instance().setDroneProductionSpeed(750);
		}
		else
		{
			ProductionManager::Instance().setDroneProductionSpeed(250);
		}

		ProductionManager::Instance().setExtractorBuildSpeed(15);

		if (BWAPI::Broodwar->self()->gas() >= 100)
		{
			goalChange(MutaPush);
		}
	}
		break;

It counts the hatcheries and, if there are 2 or more, makes drones faster (I think that’s what it means). In other words, make a bunch of lings early on to see what you can do, then focus on drones. This is again implementing the strategy, not switching it. The strategy switch is: As soon as we’ve collected 100 gas (enough to start a lair), switch to mutas. In other words, the strategy path is hardcoded zerglings to mutas to hydras, and Overkill can start with either zerglings or mutas (and for mutas, with either a 10- or a 12-hatchery build). Dead simple.

You might draw the lesson, “If this is all the smarts a top-3 bot has, no wonder humans play so much better!” You won’t be wrong. But that’s where we are today, and we have to make progress one step at a time. The stronger Tscmoo bot knows more strategies than Overkill, but its strategy selection and strategy switching are hardly any smarter.

A more fun lesson to draw is “Hey, this is easy! I can make a top bot without worrying much about strategy selection, even though it seems like such a hard problem.” That’s true too.

There are plenty of other lessons to draw. When Overkill doesn’t expand early enough to its natural, it often gets stuck on one base for the whole game and (since zerg have a lust to expand) gets smooshed. Also, Overkill’s zergling and mutalisk strategies work okay, but the hydralisk strategy can backfire, especially against terran. A terran ground army can roll up an equal-size hydra army like a rug. The first thing you should worry about is not picking a good strategy, but making the strategy you pick good. Any decent strategy can beat other bots if it’s carried out crisply.