
Steamhammer wants opening learning 1: McRave

I still have some essential bugs to fix. Also I keep literally forgetting that I promised a public repository next, and having to remind myself; it’s not appealing work. But beyond all that, I’m thinking about the next step.

The next major feature will be a start on opponent modeling and opening learning. As I think about it more, I’m seeing how important a feature it is to add soon, so I don’t regret setting out down the path. I’m going to write a series of posts giving examples to show how key a feature opponent modeling is.

McRave

The current version of protoss McRave doesn’t play that strongly. Its new forge-expand strategy is not polished yet, and I think new weaknesses have been introduced. But even so, it beats Steamhammer. The live Steamhammer always plays low-econ pressure builds versus protoss, and (unlike most protoss bots, even Bereaver) McRave is a sturdy enough defender to hold off the pressure. While trying to pile on pressure, Steamhammer doesn’t make enough drones, and as the game wears on it can’t keep up.

It’s easy enough to change Steamhammer’s behavior. There is a standard opening which is slower than 12 hatch 11 pool and faster than 3 hatch before pool: It is 12 hatch 13 pool, squeezing in 2 extra drones before the spawning pool. Those 2 drones make a substantial difference in the economy, because the earlier you spawn a drone, the more it pays off. The trade is that zerglings come later and the opponent gets a window for early aggression. Anyway, this McRave version doesn’t go for early aggression, so I coded up a quick 12 hatch 13 pool into 3 hatch hydra build. Sure enough, my first draft won most test games against McRave.

I don’t want to make 12 hatch 13 pool a standard build versus protoss, because too many bots go for early attacks. Steamhammer’s low-econ openings are effective against most opponents. But I need economic openings to win against defensive opponents like McRave.

I could specify an enemy-specific opening mix versus McRave and immediately turn a bunch of losses into wins, at least for a time. Steamhammer does that versus rushbots and a few others. Arrakhammer does it too; it has hand-made builds to beat Wuli and McRave, and one to give it a chance versus Iron. But it’s not satisfying. It’s not sustainable, because I have to keep updating the hand-made builds by hand as opponents appear and change. And it’s likely to fail in tournaments, because many opponents show up with surprise updates that are specifically intended to throw off opponents which tune against them.

The answer is to learn each opponent’s habits from experience.

recommended deep learning textbook

The recent textbook Deep Learning looks excellent to me. The authors are Ian Goodfellow, Yoshua Bengio, and Aaron Courville. There is an expensive print edition, but the book is also readable for free at its website. It’s a much-recommended book, and I recommend it too.

It is a theory book more than a practical book. I would say it is for people who have a computer background and perhaps don’t have deep math experience yet but aren’t afraid of math and are willing to dig in. I think it should be a good book for an early grad student, or an undergrad with strong interest, or a bright high schooler. The first part of the book presents the math knowledge you’ll need, like linear algebra and probability theory, so it is possible to start if you don’t know much. As always, the more background you have, the easier it gets.

To become expert, you have to know the theory and have experience applying it. If you want practical exercises to work your way into the technology, I think your approach should be to pick a software framework first (for example TensorFlow, Torch, Caffe) and then seek out tutorials or sample projects specific to the framework.

Everyone has their own learning style. If I were getting into deep learning from scratch, my approach would be: Read the whole book once through quickly to get an idea of the shape of things. With an overview in my mind, I could pick out parts that I needed to step through slowly and carefully. It takes time and practice to make unfamiliar concepts familiar, so if I hit topics where I felt weak or awkward I might seek out other resources.

Steamhammer versus Tyr

Here’s a curious point about Steamhammer’s ZvT against Tyr by Simon Prins. Tyr has opening learning, and a center-map BBS is its answer to bots which don’t defend themselves early. The BBS has always beaten Steamhammer’s mutalisk builds and lost to its zergling builds.

In the past, Tyr has tried BBS against Steamhammer and given it up after a string of losses; it concluded that a regular opening was better. Past Steamhammer versions played zergling openings 25% of the time, which was apparently enough to deter BBS, even though Tyr’s slower play didn’t consistently win 75% of the time (it varied by version).

This Steamhammer version plays zergling openings in ZvT 20% of the time, because the mutalisk openings have improved more (that was my thinking, at least). Tyr apparently detected the shift in game results, and now it plays BBS every game and wins 4 out of 5, a huge upswing. Can improved play lead to worse results when it highlights remaining weaknesses for the opponent’s learning to exploit? It could also be because Tyr was updated recently. Or it could be a chance change due to the interaction of opening learning, random choice of builds by Steamhammer, and historical changes in Steamhammer’s performance.

To fix the weakness I thought of a simple adaptation, and I hope to try it out in an upcoming version. Steamhammer is prepared for early pressure versus zerg or protoss opponents, but against terran it tries to exploit the tendency of most terran bots to sit back and macro for a while. So it only has to adapt in the case of early marine pressure, as played by Tyr (with BBS) and the latest tscmoo (with an academy rush) and a number of weaker marine bots like Kruecke and KaonBot. Steamhammer should have better chances to survive if it breaks out of its prepared build when it recognizes the early pressure, and lets the strategy boss do its default thing. I doubt my simple idea is good enough by itself for all cases, but I’ll try it.

Overkill’s new learning 3 - one model or many?

My first question about Overkill’s model is: One model for all opponents, or one model for each opponent? It turns out that the answer is: It depends. It checks curMode.

enum developMode { Develop, Release };
extern developMode	curMode;

Here is an example. Bits of code like this show up for each of the 3 data files used to store the model and learning information.

    if (curMode == Develop)
    {
        filePath = "./bwapi-data/write/RL_data";
    }
    else
    {
        string enemyName = BWAPI::Broodwar->enemy()->getName();
        filePath = "./bwapi-data/write/RL_data";
        filePath += enemyName;
    }

In the AIIDE 2016 competition, curMode is set to Release. It looks as though each opponent gets its own model, learned independently. But not learned from scratch!

My idea that Overkill has a general model turned out to be true. (I may have read it somewhere and forgotten where.) When it plays an opponent for the first time, it uses a model defined in file ModelWeightInit.h as the initial model, and learns starting from there. I don’t see any information about how the initial model was created. It may have been trained by playing against a variety of opponents, in Develop mode.

You could say that the initial model is “how to play Starcraft” and the refined model made for each opponent is “how to beat this opponent.” The same learning system can be used both offline to learn the game and online to model the opponent.

How well did opponent modeling work? We can look at Overkill’s graph of win rate over time in the AIIDE 2016 results. Its winning rate after 20 rounds was 0.59 and after 90 rounds was 0.62. The curve shows a fairly steady rise, and visually it’s convincing that Overkill learned something about its opponents, but it improved its win rate only a little. The unsteady learning curve we saw in the description suggests that Overkill might have eventually learned more if the tournament had gone on long enough—maybe.

Next: The features of Overkill’s model.

Overkill’s new learning 1 - the description

Overkill has new strategy learning unlike what we’ve seen before. I’m excited to look into it.

Author Sijia Xu updated Overkill for AIIDE 2016 with a machine learning method to learn what units to build after the opening, depending on the game situation. As far as I know, no other bot has anything similar. So far, this version of Overkill played only in AIIDE 2016, where it finished 7th; it’s not playing on SSCAIT or anywhere else that I know of. But the code is in Sijia Xu's github repository along with a description.

overview

I’ll start with the description. To quote:

Overkill currently contain two main AI system:

  • building unit selection. Using reinforcement learning model to choose the most proper unit to build(currently only three types of unit to choose: zerglings,mutalisk,hydralisk) according to current game status, and saving the match info to continually improve the RL model itself.
  • opening strategy selection. Using UCB1 to select the best opening strategy at the the beginning of the game according to the previous match result against this opponent.

In other words, the opening strategy learning that I went over before remains the same, though according to the readme the openings have been trimmed back to let the unit selection model make more decisions.

It sounds as though Overkill’s tactical abilities are not changed. Overkill 2015 plays with zerglings, mutalisks, and hydralisks and nothing else, switching from one to the next in a fixed order when simple heuristics are triggered. Overkill 2016 plays with the same units but can change between them however it likes depending on the way the game develops. Its play should be more flexible and fun.

My thought: Unit selection is an important skill, but it is only one skill. Unit selection among only 3 units is an interesting experiment, not a way in itself to score a lot of wins. The most basic zerg skill isn’t knowing when to make which fighting unit, it’s knowing when to make drones, and that is not learned here. With no ability to use scourge, lurkers, or any hive tech, and hand-coded tactics that maneuver each type of unit independently of the other types, Overkill does not have a high ceiling on its skill level, no matter how smart the unit selection model grows. That’s why it only came in 7th, I guess. It’s the academic attitude at work, tackling one problem at a time and not worrying about full generality until later. And I’m intensely interested to see what lessons we can draw from the experiment.

the method

Now I’ll cover the “model”, “feature” and “learning methods” sections from the description. They are written to be understandable to AI people, and in fact I can understand the outline clearly enough. But it’s probably incomprehensible unless you know the jargon, so I’ll try to explain step by step in followup posts.

The basic algorithm is Q-learning, a fundamental reinforcement learning algorithm. Q-learning is well worth knowing if you care about AI at all, so I’ll write a post to explain it. Q-learning is a plug-in algorithm where you can plug in different kinds of models which are learned by another algorithm.
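
To make the algorithm concrete before the followup posts, here is a minimal sketch of the tabular Q-learning update rule. It is a generic illustration, not Overkill’s code; the State and Action types are placeholders.

    // Tabular Q-learning update (generic sketch, not Overkill's code).
    // Q[s][a] estimates the long-run value of taking action a in state s.
    #include <algorithm>
    #include <map>

    using State = int;     // placeholder for a real game-state description
    using Action = int;    // placeholder for a real action (e.g. which unit to build)

    std::map<State, std::map<Action, double>> Q;

    const double alpha = 0.1;      // learning rate
    const double discount = 0.9;   // how much future reward counts

    // After taking action a in state s, observing reward r and next state s2:
    void qUpdate(State s, Action a, double r, State s2)
    {
        double best = 0.0;   // value of the best known action from s2
        for (const auto & entry : Q[s2])
            best = std::max(best, entry.second);

        Q[s][a] += alpha * (r + discount * best - Q[s][a]);
    }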

Overkill plugs in a model of a linear fit to a large number of binary game features, which is exactly the same kind of model that I briefly mentioned in breakthrough game programs as used in the evaluation function of the Othello program Logistello. The description says that Overkill has about 4000 features, which (believe it or not) is a small number for this kind of model! Having more features might make it too slow. The features are things like which units and tech each side has, and so on. I’ll read the code and find out details.

Overkill does its linear fit to the features not using regression like Logistello but using stochastic gradient descent or SGD. Versions of SGD are state of the art and heavily used in deep learning (and more). SGD is important, but it’s not fundamental so I don’t intend to look at it.
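
Still, the shape of the computation is simple enough to sketch: a linear estimate over binary features, and a gradient step that nudges the weights after each observation. This is an illustration of the general technique only; the names are mine, not Overkill’s.

    // Linear model over binary features, fit by SGD (generic sketch).
    #include <vector>

    // Value estimate: dot product of learned weights with 0/1 features.
    double linearValue(const std::vector<double> & w, const std::vector<int> & features)
    {
        double sum = 0.0;
        for (size_t i = 0; i < w.size(); ++i)
            sum += w[i] * features[i];
        return sum;
    }

    // One SGD step, nudging the weights toward a target value.
    // In Q-learning the target would be r + discount * (best next value).
    void sgdStep(std::vector<double> & w, const std::vector<int> & features,
                 double target, double learningRate)
    {
        const double error = target - linearValue(w, features);
        for (size_t i = 0; i < w.size(); ++i)
            if (features[i])                 // only active features are adjusted
                w[i] += learningRate * error;
    }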

The description does not tell us whether Overkill includes “who is my opponent?” in its model. Looking at the I/O files from AIIDE 2016, I see files like “feature_gradientAiur”, “feature_valueAiur”, and “RL_dataAiur” for each opponent. It looks as though Overkill learns a different model for each opponent, although it could combine the opponent data into one model. I’ll read the code and find out.

the results

The description shows us 3 graphs from one experimental run of Overkill playing 600 games versus IceBot. That’s a lot of games. The graphs show Overkill’s model appearing to converge, slowly and unsteadily, and improve its already-high winning rate.

My thoughts: 1. One experiment does not tell us whether this works. It seemed to work against IceBot. 2. The learning rate was slow. It took hundreds of games. As I pointed out in the generalization for strategy learning post, reinforcement learning doesn’t stand out as a good choice when you have limited data. 3. Therefore Overkill’s learning seems suitable only for a very long tournament. If Overkill learns a model for each opponent, then the 90 games against each opponent in AIIDE 2016 were not enough. If it learns one overall model, then there may be enough games but an experiment against one opponent doesn’t tell us.

I don’t see where this is written, but somewhere I got the idea that for AIIDE 2016, Overkill started with a general model trained against some range of opponents, its prior knowledge. Then it refined the prior knowledge with experience during the tournament. It seems like a good idea. On the other hand, Overkill’s AIIDE 2016 winning rate of 69/87 (79%) doesn’t match the expected >90% win rate from the graph if Overkill had been pre-trained against IceBot. So I’m not sure. A general model plus 90 games for each opponent could produce the results we see—it’s plausible.

Next: Q-learning.

learning signals

For strategy learning, current bots as far as I’ve seen learn based on the game result and nothing else. It’s also the only learning I’ve written up so far.

I’ll tell you one of the deep secrets of the dark magic of machine learning: If you want to learn better, don’t grub around for a better algorithm like I did yesterday. I mean, you can and it will probably help. But first, dig for better information to learn from. The big gains most often come from finding better learning signals. Yesterday’s suggestion about generalizing across opponents and across maps was an example.

When Bisu loses, does he adjust his probability of playing that game’s opening downward? Not like that, no. He thinks through the game events and finds the cause of the loss. If Bisu came out of the opening in a sound position, then it would be silly to blame the opening for the loss, no matter what happened later in the game. (By the way, this is an example of the classic credit assignment problem, one of the oldest named problems in AI: I got this result. What features of the situation deserve credit or blame for the result?)

I expect that it will be a long time before bots can reason about cause and effect. But they should be able to figure out “am I more likely ahead or behind?” In fact, that can be a learning target itself. The input data is scouting info seen at a point during the game, and the goal might be (for example) to estimate the actual supply difference as seen in a replay (if you do learning from replays) or to estimate the probability of winning the game (which works for learning during games by temporal differences—worth reading up on if you don’t know it). Once you have the ability to estimate whether you’re winning, you can learn to choose the opening that leaves you in the strongest position, not the opening that is seen to win most often. If your estimate is good then it provides more and better information (a score at the correct point in the game) than whether you won or lost the game (1 bit after the game), so you’ll learn faster and better.
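
Here’s a tiny sketch of the temporal difference idea for win-probability estimation, using a crude tabular model keyed by a coarse game-situation bucket. A real bot would use richer features and a stronger model; the details here are only for illustration.

    // TD(0)-style sketch: estimate the probability of winning from game situations.
    // The situation is reduced to an integer key (say, a bucketed supply difference).
    #include <map>
    #include <vector>

    std::map<int, double> V;   // situation key -> estimated probability of winning

    double estimate(int key)
    {
        auto it = V.find(key);
        return it == V.end() ? 0.5 : it->second;   // unknown situations start at 50%
    }

    // After the game, pull each recorded situation's estimate toward the estimate
    // of the situation that followed it; the final target is the actual result.
    void tdUpdate(const std::vector<int> & situations, bool won, double alpha = 0.1)
    {
        double nextTarget = won ? 1.0 : 0.0;
        for (int i = int(situations.size()) - 1; i >= 0; --i)
        {
            const double v = estimate(situations[i]);
            V[situations[i]] = v + alpha * (nextTarget - v);
            nextTarget = estimate(situations[i]);
        }
    }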

As a rough cut you could say: A quick win or loss is definitely related to the opening. If the bot adapts during the game, then the longer the game, or the more adaptation done after the opening, the less credit or blame the opening is likely to deserve. In fact, if you lost a long game then the opening might deserve credit for putting you in a position to survive that long!

The same general idea, look for good learning signals, goes for all kinds of learning. You already knew that if you want your bot to learn micro, don’t count won or lost battles, count units lost and damage done. It’s obvious, right? And so on.

generalization for strategy learning

This post is for people who want to do strategy learning better than we have seen so far, but who haven’t married AI and are looking for a few hints on what’s good to try. I assume the simplest case: The bot has a fixed set of strategies and wants to choose one based on experience (but possibly influenced by scouting info). Similar ideas work in more complicated cases, too.

In past posts I looked at the strategy learning done by Overkill and AIUR. Overkill learns (strategy, opponent) and AIUR learns (strategy, opponent, map size). I found out that, on the one hand, AIUR learned more by including the map size, but on the other hand, AIUR learned more slowly and didn’t have time to explore the possibilities thoroughly and find the best. It would be nice to learn (strategy, opponent, opponent race, map, player positions, any other results of early scouting), but how could a bot possibly learn so much?

Overkill and AIUR learn tables of outcomes. Tabular learning is slow learning because it does not generalize. AIUR may win with its cannon rush on a 2-player map against opponent A and opponent B, but when it faces opponent C on a 2-player map it starts with a blank slate. It doesn’t try cannon rush first because that worked against other opponents, it says “well gosh darn I don’t know a thing yet, I’ll pick randomly.” And again, when nexus-first wins against opponent D on 2-player and 3-player maps and AIUR faces opponent D on a 4-player map for the first time, it’s “well gosh darn.”

Tabular learning is, well, it’s the only kind of learning which does not generalize. Tabular learning is a form of rote memorization, and all the countless other learning algorithms try to generalize in one way or another. That doesn’t mean you should learn strategies using any random algorithm you have lying around, though. You can, but it’s best to look for one that suits the problem.

The problem requirements are not too complicated.

1. Our algorithm’s input will be a set of past observations like (strategy, opponent, any other data you want to include, game result). The output will be the strategy to play this game, where you don’t know the game result yet. Or at least the output will have enough information to let you decide on a strategy. Estimated-probability-to-win for each strategy choice is one idea.

2. Some of the inputs, like the opponent, are categorical (as opposed to numerical). We need an algorithm that likes categorical inputs. Some work best with numerical inputs. One way to look at it is: Fitting a curve from opponent A to opponent C doesn’t tell you anything about opponent B, so you don’t want an algorithm that’s always trying that.

3. The algorithm should work well with small to moderate amounts of data. In the first game of the tournament, with no observations made yet, you’ll pick a strategy from prior knowledge (pick randomly, or pick one that did well in testing, or a combination). In the second game, you want to consider your prior knowledge plus 1 data point. The prior knowledge stops some algorithms from saying “we lost the first game, by generalization all strategies always lose.” You want the 1 data point to be important enough to make some difference, and not so important that it immediately overrides prior knowledge. And so on to thousands or tens of thousands of data points if the tournament is that long (it’s hardly likely to be longer); by then, prior knowledge should not make much difference.

4. You also want to consider exploration. If you always play the strategy that looks best (a “greedy algorithm”), then you may be overlooking a strategy that plays better but happened to lose its first game, or that never got tried. You have to explore to learn well.

My suggestions. First, exploration is not hard. Epsilon-greedy (see multi-armed bandit) should always work for exploration. There may be better choices in particular cases, but you have a fallback. You can do better if the algorithm outputs not only an estimated win rate but also its confidence in the estimate: Preferentially explore options which have low confidence.
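
A minimal epsilon-greedy chooser, as a sketch with invented record keeping (not any particular bot’s code), might look like this:

    // Epsilon-greedy strategy selection (illustrative sketch).
    #include <cstdlib>
    #include <vector>

    struct StrategyRecord
    {
        int games = 0;
        int wins = 0;
        double winRate() const { return games ? double(wins) / games : 0.5; }
    };

    // With probability epsilon pick at random (explore);
    // otherwise pick the strategy with the best record so far (exploit).
    size_t chooseStrategy(const std::vector<StrategyRecord> & records, double epsilon = 0.1)
    {
        if (double(std::rand()) / RAND_MAX < epsilon)
            return std::rand() % records.size();

        size_t best = 0;
        for (size_t i = 1; i < records.size(); ++i)
            if (records[i].winRate() > records[best].winRate())
                best = i;
        return best;
    }

Preferring low-confidence options just means replacing the uniform random pick with one that weights rarely-tried strategies more heavily.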

Second, prior knowledge is not too hard either. You can always encode your prior knowledge as a set of fictional data points, fish story style. Again, there may be better ways, especially if you go with a Bayesian algorithm which by definition includes priors.

The requirement to work with varying but mostly modest amounts of data means that batch algorithms that analyze the dataset as a whole are preferred. Incremental algorithms that analyze one data point at a time, like the huge family of reinforcement learning algorithms and most neural network training, are by and large less suitable; they have a harder time controlling the level of generalization as the amount of data increases, to learn fast enough without overfitting. It’s not that reinforcement learning won’t work, or even that it can’t be made to work just as well, but without extra knowledge and care you can expect it to be less effective or less efficient. I was surprised to see the new version of Overkill use reinforcement learning for unit production decisions; it may be a good choice, but if so it’s not obvious why.

I suggest boosted decision trees. Decision trees have good generalization properties with small and modest amounts of data, and adding a boosting algorithm increases their accuracy. Since there’s not too much data and strategy learning happens once per game, speed should not be a problem. (If it does get too slow, then discard the oldest data points.) Go look up code to implement it and check the license, you know the drill.

It’s just a suggestion. Other choices may be better.

In a little more detail, at the end of each game the bot records the result with whatever other information it wants to learn from: Opponent, race, map, etc. At the start of each game it reads the records and runs its learning algorithm from scratch (it doesn’t have to or want to remember what it thought it knew last game). You may want to vary this depending on tournament rules about when learning data becomes available.
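
For the record keeping, something as simple as one line per game works. This sketch is one arbitrary way to do it; the fields and format are my own choices, though they are similar in spirit to Overkill’s opponent|strategy|result lines.

    // One game record per line, appended after each game and re-read at startup
    // (illustrative sketch; the fields and separator are arbitrary choices).
    #include <fstream>
    #include <sstream>
    #include <string>
    #include <vector>

    struct GameRecord
    {
        std::string opponent;
        std::string map;
        std::string strategy;
        bool won = false;
    };

    void appendRecord(const std::string & path, const GameRecord & r)
    {
        std::ofstream out(path, std::ios::app);
        out << r.opponent << '|' << r.map << '|' << r.strategy << '|' << (r.won ? 1 : 0) << '\n';
    }

    std::vector<GameRecord> readRecords(const std::string & path)
    {
        std::vector<GameRecord> records;
        std::ifstream in(path);
        std::string line;
        while (std::getline(in, line))
        {
            std::istringstream fields(line);
            GameRecord r;
            std::string won;
            std::getline(fields, r.opponent, '|');
            std::getline(fields, r.map, '|');
            std::getline(fields, r.strategy, '|');
            std::getline(fields, won, '|');
            r.won = (won == "1");
            records.push_back(r);
        }
        return records;
    }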

With the learned model in hand, the bot can look at the game situation, run it through the model to find out which strategies seem best, and combine that with the exploration policy to decide what strategy to play.

What if some inputs are not known yet? Say the opponent is random and your scout didn’t find out the enemy race before it’s time to decide on the initial strategy. If the learning algorithm estimates win rates, here’s one way: Run the game situation through three times, once with each race, and combine the results. There are different ways to combine the results, but averaging works. The same for other information that you don’t know yet; run through each possibility that hasn’t been excluded (“I know they’re not at that base, but then my scout died”). If there’s too much unknown info to test all possibilities against your learned model, then limit it to a statistical sample.
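
The combining step can be as simple as an average over the possibilities that haven’t been excluded. In this sketch, estimateWinRate is a stand-in for whatever your learned model computes:

    // Handling an unknown input by averaging over its remaining possibilities
    // (illustrative sketch; estimateWinRate stands for your learned model).
    #include <functional>
    #include <string>
    #include <vector>

    double expectedWinRate(
        const std::function<double(const std::string &, const std::string &)> & estimateWinRate,
        const std::string & strategy,
        const std::vector<std::string> & possibleRaces)   // races not yet excluded
    {
        double sum = 0.0;
        for (const std::string & race : possibleRaces)
            sum += estimateWinRate(strategy, race);
        return sum / possibleRaces.size();
    }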

Generalizing across opponents. If you have an opponent model, you can do better. If you’re able to recognize characteristics of your opponents, then you can remember the information in an opponent model and use the models to generalize across opponents. It’s a way of learning counter-strategies alongside counter-this-opponent strategies. I think opponent modeling should make strategy learning more effective. “Oh, opponent X went dark templar and I won with strategy A. Now I’m fighting opponent Y, which has been known to go dark templar too.” Here are some characteristics such a model might record:

  • opponent random?
  • opponent race
  • how rushy/all-in? (consider the earliest attack, or the early economy)
  • when (if ever) did opponent make unit X (for each X)?
  • when did opponent get upgrade Y (for each Y)?
  • when did opponent first use spell Z (for each Z)?
  • or in less detail: when did opponent get air units/detection/etc.?
  • how soon/often did opponent expand?
  • did the opponent scout my whole base?
  • was the opponent seen to take island bases?
  • was the opponent seen to attack island bases?

Or whatever you think might help. Since there’s never a ton of data, the huge number of inputs in the list might be too much.

Generalizing across maps can follow the same kind of idea: Number of players on the map, air distance and ground distance between bases, and so on. Adapting your strategy to the map is basic to Starcraft strategy, and bots are weak at it.

Zia and its coat of many strategies

I had been hoping that Zia would start to choose between its openings, and now that it has I want to see how it’s doing. So I watched a bunch of replays. It’s using strategy learning, though I can’t say in what form. I predicted that choosing between its strategies would be advantageous, and it’s true to an extent.

With more variety, Zia has become more entertaining to watch. I like it. Zia plays these openings that I’ve seen:

  • 5 pool
  • 9 pool
  • 9 pool speed
  • 12 hatchery

I didn’t catch it playing overpool or 12 pool, which you might expect to be common.

Zia’s opening chat message gives a hint about its opponent model. It says “Nice to meet you!” for new opponents and “Hi again!” for opponents it has met before. And it either predicts a “harsh game” or claims “I may overwhelm you.” I think it picks the second message when it believes it has found a strong counter strategy.

Against opponents with a single strategy which is directly countered by one of these openings, like ZZZKbot’s 4 pool (hard countered by 9 pool plus a sunken so that the trickle of attacking lings has no chance whatever), Zia seems to learn the counter and should then win every game. Zia even managed to find a strategy that gives it a chance against IceBot—Zia won a game which brought out weaknesses in both bots, weaknesses I didn’t realize IceBot suffered from.

And I see signs that Zia adapts after the opening. For example, I saw it add a spire when it needed scourge for air defense. I get the impression that it decides flexibly between hydralisks, mutalisks, and lurkers for the middle game—at least it’s not hardwired, maybe it’s random, I hope it’s learned. I have seen it play 12 hatch, 11 pool, 10 gas and also 12 hatch, 11 gas, 10 pool; I hope it’s foreseeing how much gas it will want to boot up its future unit mix.

Playing many openings does have a disadvantage: It’s harder to play all of them well. It’s not enough to know the build, you have to know how to play it, and it adds up to a lot of knowledge. The worst is Zia’s 9 pool speed opening, which it plays in a strange way as a late zergling all-in: It makes 100% zerglings until it attacks around supply 20-24; if it fails, Zia doesn’t have enough economy for the middle game. (I expect a 9 hatch build would strike harder if you want to play that way.)

Zia still plays poorly overall, if you ask me. It needs to brush up on skills like not sending drones through the enemy army. It needs better scouting (it doesn’t send out its initial overlord), better tactics (no, don’t run up the ramp to fight the bunker! Hit the SCVs in the expansion first!), better engagement skills (big groups of zerglings should surround before attacking), and better micro (in lings versus zealots, retreat a ling that will die in 1 more hit). And stuff. It’s a hard game.

Zia’s current description is “Implementing more strats . . .” I guess the author has the most fun with that, which is all that really matters, but it’s not the way to a winning bot. Breadth of skills, not depth of skills: You gain more by reducing your weaknesses than by increasing your strengths. Zia already has relative strength in strategy, and will improve most with other skills.

Hmm, I should write a post about The Winning Attitude for authors of game programs. Only for those who seek the winning attitude, of course; it’s optional.

Tomorrow: Novelty maps.

Tscmoo terran apparent neural network output

I was watching the new Tscmoo terran with its reputed neural networks.

screenshot showing what looks like neural network output

Hmm, what are those red and blue dots?

detail of apparent neural network output

I read that as the output of the neural network. The dot diagram is incomprehensible unless we know about the network layout. The text is the interpretation; it looks like strategy instructions or hints to the rest of the program. I timed a couple of updates and found them 15 seconds apart, which fits with strategy information.

I can’t tell what the details mean. How can the army composition be tank-vulture if you open with two starports (see those wraiths on the screen)? Is that a prediction for the opponent, maybe? What does “support_wraiths” mean, since I didn’t notice the wraiths seeming to support or be supported by anything?

you need more than 1 strategy

Martin Rooijackers aka LetaBot read my posts about Zia and wrote to point out that a zerg bot facing terran wants both mutalisk and lurker options. The reason is that terran may counter the mutas. He mentioned 5 barracks with +1, which should hard counter mutas. He also called out valkyrie and goliath possibilities, specifically pointing out that valkyries force mutas to spread out, which reduces their potential. Zerg needs to scout the build and react before overcommitting to mutalisks—at the latest when the first fliers arrive at the terran base and see what’s up.

Zerg can’t stick with tier 1 units (zerglings and hydralisks) because any likely terran midgame army will walk over them. And hive tech takes time. Lair units are key to the middlegame.

If zerg always goes mutas, any terran with strategy learning will find a way to counter the mutas and gain an advantage every game. I think this has already happened with Zia and Tscmoo terran. If zerg sometimes opens mutas and sometimes lurkers, then terran faces a risk trying to counter mutas with marines—the lurkers counter marines. Terran’s best play becomes less committal and more cautious, and that favors zerg.

Mainline pro play has the zerg starting with a limited number of mutas and using the time they buy with cautious harassment to get lurkers and rapidly tech to hive. But pros of course are totally comfortable with adaptation and tech switches. Not all games follow the main line. Today’s game of Flash (T) vs. Zero (Z) was a great example: Flash opened 14 CC, Zero responded logically with 3 hatcheries before pool and went lurkers while Flash prepared for mutalisks.

Any bot with only one strategy stands at a disadvantage against bots with opponent modeling. It’s true for all matchups. Today’s simple strategy learning will find a counter-strategy within a dozen games, usually less. Humans, and tomorrow’s sophisticated opponent modeling bots, may counter the strategy of the first game in the second, and should quickly find strong counters to most fixed strategies.

To beat humans, or to beat opponent modeling bots, you’ll need strategy flexibility plus either learning or a dose of randomness, ideally both. I promise. If sophisticated opponent modeling doesn’t arrive fast enough for me, I’ll provide it myself. It will make bots much more interesting to watch and to play against.

what AIUR learned

After Overkill yesterday, I wrote a not-quite-as-little Perl script to read AIUR’s learning files. AIUR learns more data: Overkill learns a table (opponent, strategy), while AIUR learns a table (opponent, strategy, map size) where map size is the number of starting positions, which is 2, 3 or 4 in AIIDE 2015.

Unlike Overkill, AIUR recorded every game exactly once, missing none and adding none, so its data should be easier to interpret.

Here’s a sample table for one opponent. Compare it against AIUR’s row in Overkill’s table from yesterday. See the full AIUR learning results.

vs Overkill      2-player maps     3-player maps     4-player maps     total
                 n    wins         n    wins         n    wins         n    wins
cheese           18   67%          3    33%          1    0%           22   59%
rush             1    0%           1    0%           1    0%           3    0%
aggressive       1    0%           1    0%           1    0%           3    0%
fast expo        1    0%           1    0%           2    0%           4    0%
macro            1    0%           3    33%          25   12%          29   14%
defensive        5    40%          9    33%          15   40%          29   38%
total            27   52%          18   28%          45   20%          90   31%

For reference, here are AIUR’s “moods,” aka strategies.

  • cheese - cannon rush
  • rush - dark templar rush
  • aggressive - fast 4-zealot drop
  • fast expo - nexus first
  • macro - aim for a strong middle game army
  • defensive - be safe against rushes

We see that against Overkill, the cannon rush was relatively successful on 2-player maps, 3-player maps were a struggle, and on 4-player maps AIUR discovered a little late that the defensive mood was better than the macro mood. We also see that AIUR barely explored further when it found a reasonably successful try. If the best strategy was one that happened to lose its first game and didn’t get tried again, it would never know. With so many table cells to fill in, the tremendously long tournament was not long enough for AIUR to explore every possibility thoroughly.

AIUR selected strategies with an initial phase of try-everything-approximately-once followed by an epsilon-greedy algorithm, with epsilon set at 6%. Epsilon-greedy means that 6% of the time it chose a strategy at random, and otherwise it made the greedy choice, the strategy with the best record so far. With 90 games against each opponent to fill in 18 table cells, most cells never came up in the 6% random sample.
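
To put rough numbers on it: trying everything approximately once costs about 18 of the 90 games (one per table cell), and 6% of the remaining 72 or so games comes to only 4 or 5 random exploration picks, spread across 18 cells.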

It should be clear why AIUR was still improving steadily at the end of the tournament! I offered a theory that AIUR learned so much because of its extreme strategies. If you read through the full set of tables, you’ll see that a strategy which works on one map size only sometimes works on other sizes too. The combination of opponent and map size paid off in ways that neither could alone, though only sometimes.

Overkill and AIUR fought a learning duel during the tournament. Both are running learning algorithms which assume that the opponent does not change (or at least settles down in the long run), and both bots violated the assumption. AIUR violated it more strongly. Was that an advantage? Could there be a connection with AIUR’s late discovery of the defensive strategy on 4-player maps?

I updated the zip archive of the Perl scripts and related files to add AIUR’s script alongside Overkill’s. By the way, I haven’t tested it on Windows, so it might need a tweak or two for that (nothing more than one or two very small changes).

what Overkill learned

I wrote a little Perl script to read Overkill’s learning files from AIIDE 2015 and add up the numbers. The three strategy names are as Overkill spells them. The opponents are listed in tournament order, so the strongest are at the top.

opponent         NinePoolling      TenHatchMuta      TwelveHatchMuta   total
                 n     win         n     win         n     win         n     win
tscmoo           57    26%         19    11%         18    11%         94    20%
ZZZKBot          80    46%         8     0%          8     0%          96    39%
UAlbertaBot      61    30%         20    15%         10    0%          91    23%
Aiur             13    54%         66    80%         3     0%          82    73%
Ximp             2     0%          30    83%         57    93%         89    88%
IceBot           4     25%         72    83%         14    57%         90    77%
Skynet           13    62%         19    68%         58    84%         90    78%
Xelnaga          75    81%         12    50%         3     0%          90    74%
LetaBot          78    100%        10    70%         2     0%          90    94%
Tyr              6     33%         25    64%         53    77%         84    70%
GarmBot          27    96%         27    96%         36    100%        90    98%
NUSBot           66    100%        13    77%         11    73%         90    93%
TerranUAB        30    100%        30    100%        30    100%        90    100%
Cimex            56    100%        33    94%         2     0%          91    96%
CruzBot          30    100%        30    100%        29    100%        89    100%
OpprimoBot       24    96%         33    100%        33    100%        90    99%
Oritaka          56    98%         10    70%         24    88%         90    92%
Stone            56    93%         12    67%         21    81%         89    87%
Bonjwa           30    100%        30    100%        30    100%        90    100%
Yarmouk          30    100%        30    100%        30    100%        90    100%
SusanooTricks    32    100%        23    96%         32    100%        87    99%
total            826   80%         552   80%         504   83%         1882  81%

The number n here is not the number of games played. There were 90 rounds. Some games were perhaps not recorded due to crashes or other errors, which could explain why some opponents have n < 90. Also, when the 10-hatch mutalisk strategy failed, Overkill assumed it must have lost to a rush that would also have killed the 12-hatch muta strategy. In that case Overkill recorded 2 game records, a 10-hatch muta loss and a 12-hatch muta loss, explaining why some opponents have n > 90. At least that’s what the code says; some of the data in the table doesn’t seem to match up (see the Xelnaga row). What did I miss?

Some of the strategy choices make sense intuitively. Overkill learned to get early zerglings against ZZZKBot and UAlbertaBot which play rushes, and learned that a more economy-oriented strategy worked against XIMP with its later carriers. These are examples of learning as a substitute for scouting and adapting.

Look at the bottom row. Each strategy ended up with virtually the same winning rate; the UCB algorithm evened them out accurately. But it didn’t use the strategies equally often; the 9-pool, played the most, was the most successful choice on average against this set of opponents. The early zerglings are important against many opponents, for whatever reason.

Look at the individual lines. Except for weaker opponents that Overkill defeats no matter what, for most opponents one or two strategies were clearly better and were played more often. How much did Overkill learn? If it had played strategies randomly, then the winning rate would be the average of the strategy winning rates. The gain can be estimated as the total winning rate minus the mean of the strategy winning rates—how far did you rise above ignorance? The number varies from zero to huge for different opponents. Because of sampling effects, the estimate will statistically tend to be higher than the truth.

This learning method has to play weak strategies to find out that they’re weak, so it can’t be perfect. The regret for each opponent can be estimated as the difference between the total winning rate and the winning rate of the best strategy if you’d known to play it from the start—how far did you fall short of omniscience? For many of the opponents, the regret estimated that way is 6% to 7%. If the learning algorithm converges to an exact solution, then in an infinitely long tournament the regret will fall to 0. Thinking about numbers like this can give you an idea of when learning makes sense.
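
A worked example from the Skynet row of the table: the three strategy winning rates are 62%, 68%, and 84%, with a mean of about 71%, and the total winning rate is 78%. The estimated gain is therefore about 7 points, and the estimated regret, 84% minus 78%, is about 6 points, right in the typical range.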

The Perl script and related files are available as a zip archive.

panic button and fish story

Yesterday’s post was about prior knowledge. The posts before were about learning. Today’s is about prior knowledge for learning.

I was inspired by a remark from Dave Churchill, author of UAlbertaBot, in his new A History of Starcraft AI Competitions: In AIIDE 2015 “UAlbertaBot had [only] a 2/3 winning percentage against some of the lower ranking bots due to the fact that one of the 3 races did not win against those bots.” UAlbertaBot, playing random, had its learning turned off, presumably because the selected strategy for each race was dominant. With learning turned on, it would have lost games trying weaker strategies before settling on the dominant strategy, ending up behind overall; that’s the thinking, if my guess is right.

Well, that’s like Bisu defeating Savior. When somebody comes up with a counter for the game plan you thought was dominant, don’t you think you should try something different?

You can have it both ways. You can restrict yourself to playing your dominant strategy unless and until it turns out to lose repeatedly. You don’t have to lose games exploring your options; you can take losing to mean that you should start exploring your options.

The panic button implementation is simple. Start out recording the game results as usual, as if learning were turned on, but ignore them and always pick your dominant strategy. But when you get to (say) >10 games with <10% win rate, hit the panic button and let your algorithm try alternatives. It’s unlikely to make things worse!
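
A sketch of the test, using the thresholds above (the record keeping is invented for the example):

    // Panic button (illustrative sketch): play the dominant strategy until the
    // record against this opponent looks bad enough, then fall back to learning.
    struct OpponentRecord
    {
        int games = 0;
        int wins = 0;
    };

    bool shouldPanic(const OpponentRecord & r)
    {
        // More than 10 games recorded and under a 10% win rate.
        return r.games > 10 && r.wins * 10 < r.games;
    }

    int chooseOpening(const OpponentRecord & r, int dominantOpening, int learnedOpening)
    {
        return shouldPanic(r) ? learnedOpening : dominantOpening;
    }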

The fish story implementation is also simple. Pretend, before the first game with a new opponent, that you actually have a history with this opponent. Tell yourself a fish story: “Oh, strategy A, I tried that a few times and always won. And strategy B sucked, I tried that a time or two and lost.” It’s literally a few lines of code to slide fictitious history into your learning data, and you’re done. Your strategy selection algorithm will look at it and say “Strategy A, duh,” and as long as A keeps winning it will explore others at a low rate.
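
The seeding itself might look like this (a sketch; the record structure is invented, and the counts match the strategies A, B, and C described below):

    // Fish story (illustrative sketch): before the first real game against a new
    // opponent, seed the learning data with invented results.
    #include <vector>

    struct StrategyHistory
    {
        int games;
        int wins;
    };

    // Called only when no learning data exists for this opponent yet.
    void tellFishStory(std::vector<StrategyHistory> & strategies)
    {
        strategies[0] = { 3, 3 };   // strategy A: tried a few times, always won
        strategies[1] = { 1, 0 };   // strategy B: lost once
        strategies[2] = { 2, 0 };   // strategy C: lost a couple of times
    }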

The simpleminded learning algorithms that bots use today assume that you start out knowing nothing about which choices are better. And that’s just false. You always know that some strategies are stronger than others, that some are safe and work against many opponents while others are risky and only exploit certain weaknesses. With the fish story, your bot can start out knowing that A is reliable (“it won repeatedly”), B is a fallback (“it lost once”), and C can be tried if all else fails (“it lost some times”) in a last ditch attempt to trick a few points out of a killer opponent. Or any combination you want.

If you have prior knowledge about your opponents but you’re not sure whether they’ll have updates for the tournament, you can go Baron Munchausen and tell yourself a different fish story about each opponent.

Many variations and other ideas work too. Think about your strategy choices and your selected algorithm and how you would like it to behave. You can probably find your own ways.

Update: Dave Churchill told me the real reason behind UAlbertaBot’s decision: He ran out of time! He wrote that he actually implemented a “panic button” method, but did not have time before the tournament to test it and make sure it was solid. I think it’s enough that UAlbertaBot can play random; progress comes one step at a time.

how much does learning help?

Here’s the cumulative win rate graph for the bots that looked like they might be learning. I count UAlbertaBot as not learning, since the author said so.

winning rates for the learning bots

The gyrations on the far left are mostly statistical noise. AIUR learns well, as we know. Tscmoo and Overkill also improve noticeably, each gaining about 3% in win rate between round 20 and the end (enough to move up 1 place in the tournament). LetaBot has a slight upward trend. The others look flat or even trend downward; either they are mislearning, or they are losing more games to the smarter learning bots, or they are drifting due to statistical noise. Statistical noise is usually bigger than your intuition says.

Among the learning bots, the three bots which learned best also finished best.

The non-learning bots:

winning rates for non-learning bots

Most look flat; all trends are slight, except that XIMP gains over 2% from round 20 to the end. Are the weak learning bots mislearning against it? It would be interesting to compare the non-learning bots that better withstood the increasing pressure of the successful learners to see if some common factor in their play made them harder to exploit, but that would be a tough analysis.

Bottom line: Tscmoo and Overkill each learned enough to overtake their nearest opponents, which was possible only because their nearest opponents were so near. AIUR increased its win rate by a giant 10% and overtook a few opponents early, but after round 25 no opponents were in reach. No other bot improved enough to make a difference. Learning, as implemented so far, can give a small edge to a few bots that do it well enough.

With more smarts, bots can learn more and faster. I’ll be suggesting ideas later on (I don’t run Machine Learning in Games for nothing). I hope to see bolder learning curves in this year’s competitions!

which bots learn?

The AIIDE 2015 tournament results include an archive of the directories the bots were allowed to read and write. The tournament was divided into round robins, and after each bot had played every other on the current map, the accumulated files in the bot’s “write” directory were copied to its “read” directory, where they could be read back in during the following round robin. Bots with nothing in their write directories did not learn. Bots with files there at least recorded some information.

Here are the bots, sorted by final standing, with a note on whether each looks like it tried to learn. An entry of learning? yes means only that the bot wrote files there, not that the bot read them back in or used the information (that’s harder to figure out).

Bottom line: Tscmoo’s files have a curious variety of information. It may be doing something interesting. Nobody else tried anything beyond the straightforward. All bots that stored data wrote one text file per opponent, possibly because the contest rules suggested it; more sophisticated schemes risk slowness or loss of data.

     bot            learning?   comments
 1   tscmoo         yes         one file per opponent, human readable-ish
 2   ZZZKBot        no
 3   Overkill       yes         one file per opponent, with lines opponent|strategy|game result
 4   UAlbertaBot    yes         though learning was said to be turned off for this tournament
 5   AIUR           yes         one file per opponent, 91 numbers each
 6   XIMP           no
 7   ICEbot         no
 8   Skynet         yes         one file per opponent, 7 to 13 lines each in the form "build_2_3 2 0"
 9   Xelnaga        yes         one file per opponent, each a single integer in the range [-1,3]
10   LetaBot        yes         one file per opponent, much repetitive information
11   Tyr            yes         one file per opponent, each "win <number>" or "loss <number>"
12   GarmBot        no
13   NUSBot         no
14   TerranUAB      no
15   Cimex          yes         one file per opponent, each empty or with only two numbers
16   CruzBot        yes         one file per opponent, six flags 0 or 1 for each
17   OpprimoBot     no
18   Oritaka        no
19   Stone          no
20   Bonjwa         no
21   Yarmouk        no
22   Susanootricks  no

There’s a folder for Nova, a bot which did not participate. I suppose it intended to.

AIUR learns more

The protoss bot AIUR by Florian Richoux has a set of hand-coded strategies and learns over time which strategies win against which opponents. That’s a popular religion; other bots like Overkill (see my post on it) and Tscmoo worship at the same altar. But a funny thing happened on the way through the tournament. In the AIIDE 2015 competition report, look at the graph of winning rate over time for the different bots. Let me steal the image showing the top half of participants:

win rates by round in AIIDE 2015

AIUR’s line is the one in the middle that keeps rising and rising. Look carefully and you can see it leveling off, but it hasn’t reached its asymptote by the end of the very long tournament. AIUR seems to learn more, and to keep on learning, even though its learning method is about the same as other bots’. Howzat happen?

Of course AIUR doesn’t do exactly the same thing as other bots. After all, it calls its strategies “moods,” which sounds entirely different. It doesn’t learn an opponent -> strategy mapping, it learns opponent + map size -> strategy, where map size means the number of starting bases, usually 2, 3, or 4. It can figure out that its cannon rush works better on 2-player maps, for example. I imagine that that’s part of the answer, but could it be the whole story?

I have a theory. My theory is that AIUR’s extreme strategies make good probes for weakness. AIUR’s strategies range from absolutely reckless cannon rush, dark templar rush, and 4-zealot drop cheese to defensive and macro-oriented game plans. AIUR’s strategies stake out corners of strategy space. Compare Overkill’s middle-of-the-road zergling, mutalisk, and hydralisk strats, with no fast rushes or slow macro plays, nothing highly aggressive and nothing highly cautious. My theory is that if an enemy makes systematic mistakes, then one of AIUR’s extreme strategies is likely to exploit the mistakes, and AIUR will eventually learn so.

If true, that could explain why AIUR learns more effectively in the long run. Presumably the reason that it takes so long to reach its asymptote is that it has to learn the effect of the map size. The tournament had 27 games per opponent on 2-player maps, 18 on 3-player, and 45 on 4-player, not enough to test each of its 6 strategies repeatedly. It could learn faster by doing a touch of generalization—I’ll post on that some other day.

AIUR also claims to implement its strategies with a further dose of randomness. Intentional unpredictability could confuse the learning algorithms of its enemies. I approve.