Fresh Meat game comparison

Fresh Meat was reactivated only a few days ago (see my brief writeup). Its play shows sudden switches: Compare Fresh Meat’s first game versus Prism Cactus with its second game against the same opponent. The first game looks aimless, the second shows a clear plan more-or-less cleanly executed.

Fresh Meat has been updated repeatedly in a short time, so any given play difference might be caused by a code change. But Hao Pan hinted that it records its opponent’s unit mix and plays to counter it. From his comment: “it is built on the in-game AI provided there’s no past games. This time I rewrote the unit composition record system, and am looking forward to the build orders COEP comes up with.”

The two games and the statement point to the same conclusion: Here we have an example of recording the opponent’s habits and figuring out how to counter them. It’s similar in concept to what I’m doing with opening data. This kind of system can learn very fast, because it reacts immediately to new data without any statistical averaging or other slow stuff. And it can make drastic adaptations, because the adaptation is done by reasoning—in the case of COEP, the reasoning takes the form of an evolutionary search. Its limits are set by the data it records and the capability of its reasoning system.

Fresh Meat is back

Hao Pan’s zerg variant of his terran bot Halo, Fresh Meat, is also back. Fresh Meat’s big claim to fame is that it uses COEP for production decisions. I wrote up COEP in 2019, where Hao Pan left a comment about the Fresh Meat version of that time.

COEP is a search method. Like any search method, it relies on you to provide a game model (the “forward model”) to answer the question “what happens if I do this?” and an evaluation function (a “fitness function”) to answer “and is it any good?” Results are shaped by the method, but depend mostly on the model and the evaluator you provide: A search method amplifies the smarts of its model and evaluator by using them to look into the future, but you need to provide those smarts in the first place, as a seed for the search to grow. (In COEP the seed grows into a lawn rather than a tree, but whatever!)

I was not impressed with Fresh Meat’s production decisions. It likes an early spawning pool—sometimes 4 pool, 5 pool, 7 pool, sometimes later—and makes early zerglings when it feels like it, or not when it doesn’t. It likes drones and sunkens. I haven’t seen it visibly react to enemy air units. It often lets its minerals run up to a high level, a very bad sign. I’m speculating, but the play is similar to what I’d expect if the model and evaluator were trained by play against the built-in AI. Could that be it? If so, perhaps it is learning on the ladders and will improve over time.

movement simulation

Steamhammer is nearly prepared for AIST S4. I made 3 major improvements, plus smaller ones. Not everything I tried succeeded, though. Today I want to write about a new feature I implemented that is definitely valuable, and that Steamhammer will find uses for in the future even though my first use case didn’t work out as I hoped.

Steamhammer inherited from UAlbertaBot a weakness in retreating. It is Steamhammer’s longest-standing serious weakness. The point of retreating is to get away from a fight, so the units get move commands and ignore enemies they meet on the path to their retreat point. Nearly all cases where Steamhammer’s units run into or through or past enemies without fighting are because it is trying to retreat them. There are spectacular failure cases, like running ultralisks through a field of cannons and losing them for nothing. My past attempts to fix it have been half-hearted, since I want to delay coding a full tactical analysis until later. I thought I’d try something more courageous.

So I implemented retreat simulation in FAP. Moving is a lot easier than fighting, and it didn’t take much code. Pass in a target position, and our simulated units move to the position while the enemy’s keep chasing and shooting as usual, with no change to their side of the simulation. We’ll say the retreat “succeeded” if the fleeing units’ losses come out below some threshold percentage.

My idea was: If the regular combat sim says to retreat, run a retreat sim in the same frame. If the retreat fails, then don’t run away; keep fighting regardless. The retreat sim runs fast because it is simpler, and I also cut its duration to only 2 seconds, so I ran into no problems with the frame time limit. In principle, the retreat sim can conclude “the cannon will kill us even if we run away, so keep hitting it,” notice when the retreat path is blocked, and distinguish a faster force that can escape from a slower force that the enemy will chase down (yes, the speedlings will kill your marines for free if they run; better to make a last stand).
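
Here is roughly how that decision rule fits together. The names, the signatures, and the 40% loss threshold are invented for illustration; they are not the real FAP interfaces.

    #include <vector>

    // Invented stand-ins for the real FAP calls; names, signatures, and the
    // loss threshold are illustrative assumptions only.
    struct SimUnit  { int id; double value; };
    struct Position { int x, y; };

    // Regular combat sim: a positive score means we expect to come out ahead.
    double runCombatSim(const std::vector<SimUnit>& mine,
                        const std::vector<SimUnit>& theirs);

    // Retreat sim: our units only move toward retreatTo for the given number of
    // frames while the enemy chases and shoots as usual; returns the fraction
    // of our value lost on the way.
    double runRetreatSim(const std::vector<SimUnit>& mine,
                         const std::vector<SimUnit>& theirs,
                         const Position& retreatTo,
                         int frames);

    enum class Order { Attack, Retreat, StandAndFight };

    Order decideEngagement(const std::vector<SimUnit>& mine,
                           const std::vector<SimUnit>& theirs,
                           const Position& retreatTo)
    {
        if (runCombatSim(mine, theirs) >= 0.0) {
            return Order::Attack;              // the fight looks fine, take it
        }
        // The combat sim says to retreat. Check whether retreating actually works.
        const int twoSeconds = 2 * 24;         // about 24 frames per second
        double lost = runRetreatSim(mine, theirs, retreatTo, twoSeconds);
        if (lost > 0.4) {
            return Order::StandAndFight;       // we die running anyway; do damage instead
        }
        return Order::Retreat;                 // the escape looks clean enough
    }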

In practice, the retreat sim might be worth it on average, but I was not convinced. The sim results were as noisy as ever, and there were times when it hurt. Also, in some situations it’s conceptually the wrong thing to do. Having the “retreating” ultralisks fight cannons is better than running past the cannons, but better yet is to understand the situation as an attack rather than a retreat, and to stand aside until you can win. And that seems to call for the tactical analysis that I was saving for later.

I called the internal routine in FAP not retreatsim() but rather movesim(), because it can answer many questions about when movement is safe.

  • Will this runby succeed in running by?
  • How far can the transport go before it must unload?
  • How many scourge will survive to hit their target?
  • Will this plague/broodling/etc. succeed, or will the caster be killed first?
  • If I mind control that unit, will it live to join my army?

And with a little more elaboration, questions like

  • If it lives to cast, will the defiler/queen/etc. also escape safely?
  • Is there time to unburrow/unsiege, or should the immobile units remain as they are?
  • Can the mutalisks pick off another marine and still get away without losses?
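
As a sketch of the shape such a query might take, here is the runby question. The types, the movesim() signature, and the thresholds are all guesses for the sake of the example, not the actual interface.

    #include <vector>

    // Illustrative types and a movesim()-style entry point; not the real code.
    struct SimUnit  { int type; int hp; double value; };
    struct Position { int x, y; };

    struct MoveSimResult {
        int    unitsArrived;   // units that reached the target position alive
        double valueLost;      // fraction of the moving group's value destroyed
    };

    MoveSimResult movesim(const std::vector<SimUnit>& movers,
                          const std::vector<SimUnit>& enemies,
                          const Position& target,
                          int maxFrames);

    // "Will this runby succeed in running by?" One plausible reading: most of
    // the force gets past the defense, and we don't bleed too much on the way.
    bool runbyLooksGood(const std::vector<SimUnit>& movers,
                        const std::vector<SimUnit>& defenders,
                        const Position& insideTheBase)
    {
        MoveSimResult r = movesim(movers, defenders, insideTheBase, 24 * 8);
        return r.unitsArrived * 2 >= static_cast<int>(movers.size())
            && r.valueLost < 0.35;
    }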

I think it’s a safe bet that Steamhammer will be answering some of these questions before AIIDE this year.

lazy learning

Today’s post is AI background info. To use a neural network, or a decision tree, or most machine learning methods, you train it first and query it afterward. The system includes a model, represented as a data structure with code to interpret its meaning, and the learning algorithm adjusts the model to fit the training data. Training may be slow, but the model is usually designed so that answering queries is fast.

Some bots, including Steamhammer, do their opening learning differently, without persistent model data: They store only the training data, records of what happened in past games. When querying the system—deciding what opening to play based on experience—they look through the game records to find any relevant information and decide on the fly, not keeping any other permanent representation. That is called lazy learning, as opposed to eager learning, because it computes its model lazily on demand rather than eagerly up front. It’s a special case of lazy evaluation. There is also the closely related instance-based learning, which often means the same thing.

As an aside, this is analogous to the difference between an interpreter and a compiler. Partial evaluation of a lazy learning algorithm produces an equivalent eager learning algorithm, in exactly the same way that the appropriate Futamura projection converts an interpreter into a compiler. Not that that’s important, I just think it’s fun.

The prototypical lazy learning algorithm is k-nearest neighbors: Represent your training data as points in some space, each associated with a value. To query for some point in the space, you find the k closest points whose values you know and fit some function to them (say, take their average, which amounts to fitting a constant function). Choose k > 1 to average out any noise in individual values. I like to think of it in more general terms: Find whatever instances you can that are similar to your query, and generalize or synthesize the information in the instances into an answer to your query. That is a description of case-based reasoning. You can see k-nearest neighbors as a lightweight example and case-based reasoning as a heavyweight example of the same underlying idea.
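
To make the “store the data, compute at query time” point concrete, here is a minimal k-nearest neighbors sketch with toy data, Euclidean distance, and plain averaging:

    #include <algorithm>
    #include <cmath>
    #include <iostream>
    #include <utility>
    #include <vector>

    // Lazy learning in miniature: training is nothing but storing instances,
    // and each query averages the values of the k closest stored instances.
    struct Instance {
        std::vector<double> x;   // features
        double y;                // known value
    };

    double distance(const std::vector<double>& a, const std::vector<double>& b) {
        double d2 = 0.0;
        for (size_t i = 0; i < a.size(); ++i) {
            d2 += (a[i] - b[i]) * (a[i] - b[i]);
        }
        return std::sqrt(d2);
    }

    double knnPredict(const std::vector<Instance>& data,
                      const std::vector<double>& query, size_t k) {
        std::vector<std::pair<double, double>> byDistance;   // (distance, value)
        for (const Instance& inst : data) {
            byDistance.push_back({distance(inst.x, query), inst.y});
        }
        std::sort(byDistance.begin(), byDistance.end());
        k = std::min(k, byDistance.size());
        if (k == 0) return 0.0;                               // no data yet
        double sum = 0.0;
        for (size_t i = 0; i < k; ++i) sum += byDistance[i].second;
        return sum / k;
    }

    int main() {
        // "Learning" is appending records, nothing more.
        std::vector<Instance> data = {
            {{1.0, 1.0}, 0.0}, {{1.2, 0.9}, 0.2}, {{5.0, 5.0}, 1.0}, {{4.8, 5.2}, 0.9}
        };
        std::cout << knnPredict(data, {1.1, 1.0}, 2) << "\n";   // near the first cluster
        std::cout << knnPredict(data, {5.0, 5.1}, 2) << "\n";   // near the second
    }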

Lazy learning does not mean that the learning system does not have a model, it only means that it does not have an explicit model that it stores as a data structure. There is still code to answer queries, and the code embodies an implicit model of how to turn training data into answers.

Lazy learning has advantages and disadvantages. Learning is trivial, since it amounts to storing records of your experience, so it is easy to keep up to date. With a deep learning model it is not very useful to update online in regular games, because training needs so much data. On the other hand, if your learning data is large, then with lazy learning you may not be able to use all of it to answer any given query. You’ll need a way to pick out the relevant records. It’s still effective if the relevant records are few enough, but by restricting the data you use you might be dropping useful information.

Lazy learning methods are natural for opponent modeling. I wouldn’t count them out for strategic uses, though. I can imagine a case base of strategic situations with reactions and evaluations of how the reactions worked out. That could produce a bot which keeps up with the meta by learning it from experience.

risky openings from data

I want to take a day off from analyzing AIIDE data to join in a conversation. From comments to Dragon vs Ecgberht:

Tully Elliston: Using learning to track win:loss %, and having a risk rating for each build (if win rate is above this level, don’t select this build unless it has won at least 1 game against this opponent) could actually be a very useful tool.

You can throw in lots of ridiculous polarised builds, and still ensure they won’t get accidentally selected when beating down a 4pool bot 100 times in a row.

MarcaDBAA: Yes, or give these builds, you don’t want to use at first, some default pseudo-losses, so that they will only be selected after other builds fail to win.

In BananaBrain versus Ecgberht I had mentioned that BananaBrain’s few losses to Ecgberht were due to unnecessarily playing a risky build. The commenters suggest two ways of marking builds as risky.

It can be done automatically from data, in the same style as opening timing data. In fact, it’s on my to-do list. The first step is to keep track of how good each opening is on average: For each matchup, store each opening’s average win rate across all opponents. It can be done offline by adding up the numbers from the individual learning files of each opponent, or you could keep separate records. That already gives you an automatic way to select openings against opponents you have not yet learned about; there’s no more need for hand configuration.

The next step is to compare how well each opening does against strong opponents versus weak opponents. If it reaches its average by beating expectations against strong opponents and falling below them against weak opponents, taking the opponent’s strength into account, then it is a risky opening. If the reverse, it is a solid opening and is to be preferred against weak opponents (and if you’re ranked high, also against unknown opponents). One natural way to determine riskiness is to fit a line to the data points of this opening’s win rate versus opponent strength, where strength is measured by your win rate against that opponent. The slope of the line tells how risky or solid the opening is. (If you have a lot of data you could fit a more complicated curve. Just make sure it strongly smooths the data.)
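
The slope is a few lines of least squares. This sketch assumes both opponent strength and win rate are on a 0 to 1 scale, and the numbers in the example are invented:

    #include <cstdio>
    #include <vector>

    // Least-squares slope of one opening's win rate against opponent strength.
    // A flatter (less negative) slope than your other openings marks a riskier
    // opening: relatively better against the strong, worse against the weak.
    struct Sample { double opponentStrength; double winRate; };

    double fittedSlope(const std::vector<Sample>& samples) {
        double n = static_cast<double>(samples.size());
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (const Sample& s : samples) {
            sx  += s.opponentStrength;
            sy  += s.winRate;
            sxx += s.opponentStrength * s.opponentStrength;
            sxy += s.opponentStrength * s.winRate;
        }
        double denom = n * sxx - sx * sx;
        if (denom == 0.0) return 0.0;          // all opponents equally strong
        return (n * sxy - sx * sy) / denom;
    }

    int main() {
        // Invented numbers: the risky build steals some games from strong opponents
        // and drops some against weak ones, so its line is flatter.
        std::vector<Sample> solid = {{0.2, 0.9}, {0.5, 0.6}, {0.8, 0.2}};
        std::vector<Sample> risky = {{0.2, 0.7}, {0.5, 0.55}, {0.8, 0.4}};
        std::printf("solid slope %.2f, risky slope %.2f\n",
                    fittedSlope(solid), fittedSlope(risky));
    }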

The same goes for other data about openings. For example, you can track how well each opening does on each map, and at given starting positions, and against different opponent strategies that you recognize. All the data can fold into your opening selection, without any hand configuration.

BASIL was formerly an excellent forum to collect this kind of data. But now the BASIL pairings are strongly biased toward opponents close in elo, so it is no longer a good option. Look at the crosstable for the last 30 days and notice how the white cells are laid out; unless you rank right in the middle, you can’t get a full cross-section of opponents without a long run.

opening timing data for Steamhammer

Here are my implementation thoughts about opening timing data, as mentioned yesterday. I haven’t decided whether this is what I’ll do next, still thinking. I will at least do something similar eventually.

the data

1. Record timings for all of Steamhammer’s openings, in a static data file to be read at startup. The timings should include the time when each tech or production building finishes, plus the number of workers and the army size and composition at the end of the book line (meaning the units produced; some may have been lost), and maybe a few other things. Time resolution of one second or a few seconds is probably fine. Even so, timings will vary from game to game, so maybe the timings should give low and high values, or mean and variance, or something.

Another idea is to accept that new openings will be added over time and skip the up-front work of timing them before they can be played. Keep a database of opening timings and update it after each game. That’s only safe on a server which plays one game at a time; for tournaments like AIIDE the database would have to be frozen and read-only.

2. For each game against a given opponent, record the earliest time that each enemy unit type is scouted, including buildings. Steamhammer already does this, with its “unit timing” skill (implemented using the skill kit). Also record the timing and army size and composition of the enemy’s first attack, or maybe its first few attacks, or maybe all of its major attacks. I’ll see what helps.
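
To make it concrete, here is a sketch of what the two kinds of records might look like. The names and fields are guesses at what I’d want to store, not a final format:

    #include <map>
    #include <string>

    // Times are in seconds from the start of the game.

    // 1. Static timing data for one of our openings, read at startup.
    struct OpeningTiming {
        std::string name;                           // e.g. "9PoolSpeed" (made-up name)
        std::map<std::string, int> buildingDone;    // tech/production building -> finish time
        int bookEndTime = 0;                        // when the book line runs out
        int workersAtBookEnd = 0;
        std::map<std::string, int> armyAtBookEnd;   // unit type -> count produced
    };

    // 2. Per-opponent record of one game, appended after every game.
    struct EnemyGameRecord {
        std::string map;
        std::string ourOpening;
        bool won = false;
        std::map<std::string, int> firstSeen;       // enemy unit type -> earliest time scouted
        int firstAttackTime = 0;                    // when the first attack arrived
        std::map<std::string, int> firstAttackArmy; // composition of that attack
    };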

using the data

The data about the enemy can be used to recognize enemy builds earlier and more consistently. Most bots have a small repertoire of openings. As Steamhammer plays, it can check its scouting data for the current game for the closest matches among recorded games. If the records say that the enemy followed up the same way in all of the close matches, then the enemy strategy is predicted to that extent. You can see it as a kind of case-based reasoning: Find approximately matching cases and generalize from them.

We don’t have to fully predict the enemy strategy to react to it, we only need to know constraints on it. For example, if we’re going to add static defense (whether written into the opening or in reaction to the enemy army size), then we can check records of when the enemy first attacked: Don’t build sunks too early. If a clever enemy notices the vulnerability and attacks early, too bad, but then we have a new game record and will know for next time.

The main purpose of the opening timings is to choose openings. The records of enemy games tell us the range of enemy play. When choosing an opening at the start of the game, before any scouting information is available, we can try to pick one with unit mix and timings that counter the range of enemy play. One basic adaptation is to try to always be a little greedier than the enemy, to get ahead in economy (except when the enemy is too greedy, then we can rush). That’s a principled way to choose 5 hatch before pool versus Dragon. Another is, if the enemy prefers certain units, we can pick openings that produce counter units at the right time.

Of course the data can also be used to adapt openings when our first choice was not right for the enemy’s actual play. It will be a while before Steamhammer can do that with generality.

the Robust Continuous Build-Order Optimization paper

The paper “Robust Continuous Build-Order Optimization in StarCraft” by Dave Churchill, Michael Buro, and Richard Kelly drew some attention. The paper looks to have been written in late 2017 or early 2018, based on its contents and references. It was not published until September 2019, and some of its claims fell out of date in the meantime. I found it underwhelming, which is why I did not write it up earlier, but properly speaking it is at least a small academic advance.

They take UAlbertaBot and BOSS as their experimental platform. Classical BOSS is given a goal to make a set of units (like “make this many zealots plus an observer”) and figures out how to make the units in the shortest time (or tries to; it has bugs and limitations). They take that as a starting place to move onward from. As I see it, the paper has 3 points:

1. Classical BOSS produces a timing attack: If you maximize your forces at the end of the build plan and otherwise don’t care, you may be vulnerable before you reach your strong timing. They compare the goal of producing a fixed set of units as fast as possible, or of producing as many units as possible by a given time (making a peak in the army size curve), with the goal of maximizing the area under the army size curve so that the army size is never too far from its feasible maximum. Maximizing the area means minimizing the windows of time in which you are vulnerable to an enemy timing attack, aiming for “I’ll always have a big army” rather than “I’ll have the biggest army possible, but only at one given point in time.” That is an important goal, so it’s good to have it available, though in a performance program it should not be the only available goal—to never make a timing attack is to leave money on the table.

2. To define “army size” they introduce an abstract evaluation function that measures the “army value.” The underlying goal here is for the system to make more decisions itself: Instead of the programmer specifying “make n zealots” the evaluation function decides whether one potential army is better than a different one. (Whether that actually offloads decisions from the programmer to the software depends on how you write the evaluator.) BOSS will be recast to maximize the integral of the army value over a given time span, rather than its classical target. In principle, you could use any measure of army value. In their experiments, they stick with a simple total of resources spent to produce the army. It’s fast and adequate to the immediate goals of the paper, but in a performance program you’d want something more capable. An important point is that it is what they call a “one-sided” evaluation: It does not take into account possible enemy reactions during the span of future time that the BOSS search looks into. Apparently the army value must be monotonic in the sense that adding to your army should never make it decrease.

3. A key point is that this altered BOSS search is no more difficult than the classical search, in any sense. It’s easy to understand how and why it works. It can be run in the same amount of time.
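
To make points 1 and 2 concrete, here is a toy comparison of the classical objective (look only at the end) and the robust objective (integrate the army value over time). The army value samples stand in for what a forward model would produce; the numbers are invented.

    #include <cstdio>
    #include <vector>

    // Each vector holds the one-sided army value (here: resources spent on army)
    // sampled at fixed intervals along a candidate build order.
    double peakObjective(const std::vector<double>& armyValue) {
        return armyValue.empty() ? 0.0 : armyValue.back();
    }

    double integralObjective(const std::vector<double>& armyValue, double secondsPerSample) {
        double area = 0.0;
        for (double v : armyValue) area += v * secondsPerSample;
        return area;
    }

    int main() {
        std::vector<double> steady = {100, 200, 300, 400, 500};   // never far behind
        std::vector<double> spike  = {100, 100, 100, 100, 500};   // same peak, weak middle
        std::printf("peak: %.0f vs %.0f\n", peakObjective(steady), peakObjective(spike));
        std::printf("area: %.0f vs %.0f\n",
                    integralObjective(steady, 30.0), integralObjective(spike, 30.0));
    }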

The experimental tests look weak from the point of view of a performance program. They are only trying to show that the idea is valid, not trying to make it work well enough to be used seriously. They use protoss only, make only zealots and dragoons, and compare classic UAlbertaBot versus a modified UAlbertaBot. They test against the top 10 finishers from AIIDE 2017, which apparently were the best bots when the paper was written. The results are not consistent, but do show large gains against some opponents and no large losses against any. On the other hand, UAlbertaBot’s macro can deteriorate after the opening phase (though less so with protoss), so it’s not clear how much the gains mean.

It’s unclear whether the idea could succeed at all for zerg. Terran and protoss can make workers constantly, so that the army value can be at all times close to the maximum feasible army value. Zerg has to decide between workers and army, and the army you get later depends on the army you forgo now. The army value cannot stay close to the maximum feasible, and the tradeoffs are different. I expect you would get milquetoast macro plans rather than strong sharp ones.

To use this seriously, you’d need to write an army value evaluator which takes into account the opponent’s units. Making more zealots will not stop those lurkers. You’d want to rerun BOSS whenever the opponent’s unit mix threw up a surprise, like cloaked units; that would detract from the academic goal of offloading decisions from the programmer. BOSS can take up to a few seconds to run, spread over that many frames. Even if you overlap build order plans, so that you’re still executing the old one while computing the new one, the new plan will be out of date by that much time. That is an intrinsic disadvantage. Nevertheless, I think it’s entirely possible that with enough effort the paper’s idea could be made to work well in a tournament program, though the paper doesn’t prove it (or try to).

Bottom line: “Robust” in the title refers to the robustness against timing attack that the idea aims for. “Continuous” is false; the BOSS search starts and finishes at discrete points in time and takes many frames, and it’s expensive so you probably don’t want to rerun it too often.

I don’t recommend the method of the paper for a performance Starcraft program. It is part of an academic research program, chipping away at the edges of the problem in hope of eventually sculpting it into something manageable. It’s interesting if you have an experimental attitude and you want to try to improve it further; the first necessary improvements are in the army value evaluation function.

RPS analyzer and game solver

Regardless of any other adaptation skills you may have, if you can predict your enemy’s opening build, you can better counter it. But it’s not simple. Steamhammer tries to distinguish between opponents that play a fixed build, and those that vary their openings. For those that vary their openings, there is much more: Some choose randomly. Some like to repeat the opening that won the last game, switching only on loss. Some stick with an opening that wins more than a given percentage. Some try openings systematically, “A didn’t work, B is next.” Some choose randomly with some probability distribution over the more successful choices. Sometimes openings are treated as black boxes distinguished only by their names, sometimes as strategies which are understood to counter other strategies (Steamhammer does both at different times).

I am wondering whether it makes sense to write a rock-paper-scissors analyzer that tries to tease out exploitable patterns in the opponent’s behavior (there are established techniques), and combine it with a game solver to make better initial opening choices. On the one hand, many bots have exploitable patterns that I know about. If an RPS analyzer can find the patterns too, Steamhammer might seem to gain the mysterious “star sense” to always play the right thing for no visible reason. On the other hand, it’s relatively easy to reduce your exploitability to a low level—use randomness wisely. Also, as Steamhammer gains skills to adapt its strategy during the game, the initial opening choices make less difference. The gain might be little. And by trying to exploit patterns, Steamhammer could itself become more exploitable; it might backfire.

The parts of the system would be:

1. Classify the enemy build. Steamhammer already does this, though it needs improvement.

2. Statistically analyze the sequence of (win/loss, my opening, enemy opening) under the assumption that the opponent is trying to counter what we’re doing. Knowing what-counters-what may factor in. The output should be a probability distribution over opening classes, “what are they likely to do?”

3. Knowing what-counters-what definitely factors in here: Solve the game. We start with a prior probability of winning for each of our openings against each opening class the opponent might play, and thanks to Bayes we can update it as we learn about the opponent. That gives us a game matrix with uncertainties in the payoffs. (Since Steamhammer knows a huge number of opening builds, making the game matrix too big to fill in, I would classify Steamhammer’s openings too so that the output only decides which class of opening to play.) Without an RPS analyzer, we can solve the game (I expect I would use a Monte Carlo method to handle the uncertainties) and play not far from a Nash equilibrium (i.e., almost perfectly assuming unexploitable play from the opponent). If an RPS analyzer can make a good estimate of the opponent’s plans, in the best case we can do better: We can exploit the opponent’s deviation from the Nash equilibrium to win more than if we played to a Nash equilibrium ourselves.

It’s unclear to me whether either the RPS analyzer or the game solver is worth the effort. Does anybody have an opinion? Perhaps some bot I haven’t looked at has similar ideas?

Continual Online Evolutionary Planning paper

I noticed that the description of Hao Pan’s bot says it uses COEP, or Continual Online Evolutionary Planning, to adapt its build order during the game. And it is a successful bot. So I thought I’d write it up.

The original paper is from 2017, Continual Online Evolutionary Planning for In-Game Build Order Adaptation in StarCraft by Niels Justesen and Sebastian Risi. There is a corresponding github repo with source. Long-time followers may remember Niels Justesen the bot. Niels Justesen has other publications; if you like this paper, I suggest Playing Multi-Action Adversarial Games: Online Evolution vs. Tree Search as a followup.

A good implementation is not simple, but the basic idea is easy. You have a set of build order plans, which you create and update in the evolutionary algorithm style. And you have a way to measure how good a plan is; in evolutionary algorithm terminology, its fitness. When you need to make a build order decision, you select the current best plan and follow it.

OK, details. In this paper, a build order plan is simply a build order, a list of stuff to build (or research) in order. In principle, it can be any length. You start with a population of different build orders, which you can generate randomly if you don’t have any better ideas. You evaluate the fitness of each one. Probably on the first try none of them will be particularly fit, so you need to make more plans. You create new builds by treating each one as if it were a stretch of DNA, and introducing different kinds of recombination and mutation (traditionally, a lot of recombination and a little mutation). The paper explains the authors’ choices. And of course evaluate the fitness of the new child builds. To keep the population size down, you delete less fit builds.

In concept, the evolution process runs continuously, and you query it for the best known choice when you need to make a decision. The authors actually implemented a client-server architecture, with a build order server. You could run it in a different thread, or do evolution on demand, or whatever; it should all work. When you start to carry out a build, you have to update the population to reflect that: This build says make 4 probes, I made 1 probe, there are 3 left. And the fitness measure needs to know about the game state so it can tell how good a build is.

So how does that fitness measure work? There are 2 parts, a forward model that predicts what happens when you follow a build order, and the fitness itself which measures how good the result is. You feed the forward model with information about the game state; the authors use mainly unit counts for both sides. In the paper, the forward model ignores builds that are illegal (make a corsair with no stargate), because the builds will have all kinds of random stuff in them. The forward model simply simulates what units you end up with when.

The fitness itself is based on a model of what counters what. You know what enemy units you’ve seen, and the forward model tells you what units you will have yourself. The fitness measure knows that vultures beat zealots and lose to dragoons, so in the face of vultures it sees lower fitness in a build that keeps making zealots and higher fitness in a build with more dragoons.

The build order being evaluated for fitness may be long, and you don’t want to measure fitness only at the end. You might lose before then! So they measure momentary fitness repeatedly throughout the build, and combine the local fitness values into a total fitness value, giving more weight to local fitness values that are early in the build. You’re pretty sure what’s next, not so sure about what happens later.
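
Here is a toy skeleton of the loop, just to show the shape of the algorithm. The forward model and the momentary scoring are stand-in stubs, the recombination and mutation are simplified, and nothing here is the authors’ implementation:

    #include <algorithm>
    #include <random>
    #include <vector>

    using Build = std::vector<int>;   // item ids: units, buildings, research

    // Stand-in forward model: momentary "my army versus what I've scouted" scores
    // at a few checkpoints while following the build. A real one simulates
    // production and applies a what-counters-what evaluation.
    std::vector<double> forwardModel(const Build& build) {
        std::vector<double> scores;
        for (int checkpoint = 1; checkpoint <= 5; ++checkpoint) {
            scores.push_back(std::min(static_cast<double>(build.size()), 4.0 * checkpoint));
        }
        return scores;
    }

    // Combine momentary scores, weighting the near future more than the far future.
    double fitness(const Build& build) {
        double total = 0.0, weight = 1.0;
        for (double s : forwardModel(build)) {
            total += weight * s;
            weight *= 0.8;            // later checkpoints count for less
        }
        return total;
    }

    // One generation: breed children by crossover and mutation, keep the fittest,
    // and report the current champion. The population must be non-empty.
    Build evolveOneGeneration(std::vector<Build>& population, std::mt19937& rng) {
        std::uniform_int_distribution<size_t> pick(0, population.size() - 1);
        std::uniform_int_distribution<int> anyItem(0, 63);          // pretend 64 item ids
        const size_t parents = population.size();
        population.reserve(2 * parents);
        for (size_t i = 0; i < parents; ++i) {
            const Build& a = population[pick(rng)];
            const Build& b = population[pick(rng)];
            // Crossover: front half of one parent, back half of the other.
            Build child(a.begin(), a.begin() + a.size() / 2);
            child.insert(child.end(), b.begin() + b.size() / 2, b.end());
            // Mutation: overwrite one random slot.
            if (!child.empty()) {
                child[pick(rng) % child.size()] = anyItem(rng);
            }
            population.push_back(child);
        }
        std::sort(population.begin(), population.end(),
                  [](const Build& x, const Build& y) { return fitness(x) > fitness(y); });
        population.resize(parents);   // survival of the fittest
        return population.front();    // best known build order right now
    }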

There is a video showing a game as protoss versus the built-in AI. Production decisions are made by COEP, the micro and other behavior is UAlbertaBot. Beating the built-in AI is not impressive, but what’s worse is that COEP showed some strange behavior. For example, COEP chose to research air attack for no reason; it never made a stargate. Even if it was planning at one time to go carriers, it made no sense to research air attack that early. Possibly there was not enough computation time to weed out all bad decisions.

A weakness of the described version of COEP is that it only counts the enemies you’ve seen. It doesn’t try to predict the future enemy unit mix, or to weigh the risk to you of different enemy choices. But that’s a weakness of the implementation, not of the basic algorithm. Features like that can be implemented.

COEP is extremely general. You need a way to represent plans, an evolutionary way to modify them, and a way to evaluate them for fitness. I think that can describe any problem (some more naturally than others). You could use COEP for tactical decisions and micro decisions too, or for deciding what to have for lunch.

You pay for the generality with more computation. You end up inventing and evaluating many plans that you will never follow, including plans that are unrealistic or outright invalid. You can try directed evolution ideas to reduce the extra CPU time, but generality will come with a cost.

I expect that deep learning, if you can apply it, will work better than COEP. The disadvantage of deep learning is that it burns data by the supertanker load. With COEP you have to implement a bunch of stuff yourself instead of letting the computer figure it out, but you don’t need the mass of data, so it may be more accessible for a hobbyist. Of course, you can combine the methods: Follow the COEP architecture and use deep learning to create and modify the alternative plans, or to evaluate them, or both. Since deep learning is then solving an easier problem, training should be easier.

goal monitoring as a structural principle

Humans naturally keep track of progress toward their goals. It doesn’t matter what you’re trying to do: If you reach out for something but don’t get a good grasp, you notice. If you’re making a point in conversation, you try to tell whether the other person understood. In Starcraft, people notice (unless they’re swamped with multitasking) when a unit gets stuck, or an army takes a bad path, or really when anything interferes with a plan. And having noticed a problem, they can try to solve it.

Isn’t it obviously a good idea? And yet it seems rare in bots. I think nearly all bots only notice problems that they are explicitly coded to look for. They don’t notice when their units run into Iron’s wall and start to move aimlessly, seeking a firing position that they can’t reach. Even a novice human will realize that something is wrong, but bots don’t register that progress is stalled and keep trying to execute the failing plan.

I’ve been thinking about adding goal progress monitoring throughout Steamhammer, at every level: Strategy, operations, tactics, micro. First, I want to rewrite everything with explicit goals anyway, because I think it is clearer and more flexible. Carrying out a goal consists of choosing a plan (either ahead of time or piece by piece on the fly) and executing the plan. Then, goal monitoring means being able to tell whether the plan is executing as intended. Firing at Iron’s marine behind the wall is a 2-step plan, get into range and fire a shot. Getting into range is a subgoal to move to a position that is in range. And we can tell whether a movement plan is executing as intended: Is the range closing with time? Does the movement take about as long as predicted? If not, then the plan is going wrong and we may want to patch the plan, or try a different plan, or back up further and try a different goal.

It seems like a lot of detail work if done by hand, and I’m sure I will do part of it by hand. But it means that the bot will always react to problems. If Steamhammer is beating its head against Iron’s wall, it will notice. Even if it doesn’t have a wall recognizer and doesn’t know how to react, it will know that its plan is failing and that it should try something different—choose another target to shoot at, maybe that will work. After several tries, it will be sure to find that shooting the wall itself succeeds. Even without specific knowledge, having general adaptivity seems valuable.

It also provides a clear task structure. Today, Steamhammer’s structure is ad hoc—the underlying principle might as well be “let’s code up some behavior!” With a structure of goals and plans, the amorphous behavior becomes a programming pattern to be reused over and over in the code. Each behavior is made up of a fixed set of parts: Choose goals, plan how to meet each goal, monitor the progress of each plan, back up and try something else if the plan is failing.
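
As a sketch of the pattern, with invented names and none of it actual Steamhammer code, the reusable skeleton might look like this:

    #include <memory>

    enum class Progress { OnTrack, Stalled, Done };

    class Plan {
    public:
        virtual ~Plan() = default;
        virtual void execute() = 0;               // issue this frame's orders
        virtual Progress monitor() const = 0;     // is execution going as predicted?
    };

    class Goal {
    public:
        virtual ~Goal() = default;
        virtual std::unique_ptr<Plan> makePlan() = 0;   // next way to try to meet the goal
        virtual bool exhausted() const = 0;             // nothing left to try
    };

    // The part that repeats at every level: run the current plan, and when
    // monitoring says it is failing, drop it and ask the goal for another plan.
    void pursue(Goal& goal, std::unique_ptr<Plan>& current) {
        if (!current && !goal.exhausted()) {
            current = goal.makePlan();
        }
        if (!current) {
            return;                               // nothing more to try for this goal
        }
        switch (current->monitor()) {
        case Progress::OnTrack:
            current->execute();
            break;
        case Progress::Stalled:
            current.reset();                      // this plan is failing; try something else
            break;
        case Progress::Done:
            current.reset();                      // goal met by this plan
            break;
        }
    }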

The clear structure also helps with machine learning. “OK, system, now learn to behave!” is a hard problem. “Subsystems, go learn to choose goals, plan, monitor, and retry” is an easier problem, because the relationships between the parts are fixed. There is less to learn, and better information about which parts are working well.

Well, that’s my long-term vision. I expect I will get there in a slow and messy way, as always.

Legionnaire’s analysis of Sparkle and Transistor

The planned post about strategy abstraction is delayed by a power outage at my house. Here’s a brief filler.

TeamLiquid has a post with analysis of new ASL maps Sparkle and Transistor by Australian protoss Legionnaire. Without drawing any strong conclusions about overall balance, Legionnaire points out how map features will affect play.

Current bots are poor at adapting to map features. More than that, it is beyond the state of the art for any AI system to adapt to maps with as few games as humans need. Humans reason out how map features affect play, and with experience they sharpen their reasoning. Machines, so far, mainly collect statistics about the course of events, and they need a vastly larger number of games to zero in on good strategy. Of course they may be able to play those many games faster, but we don’t know how to make a system that can combine reasoning with empirical learning like a human. I’m interested in Legionnaire’s expert analysis as an example that may offer clues.

limitations of combat simulation

Combat simulation answers one question, and answers it pretty well. It adds a lot of strength to a bot’s play. But there are many related questions that the simulation does not answer, at least not simulation as we currently implement and use it.

Taking free shots. Combat simulation is all about armies fighting, but the enemy army is not the only thing you want to shoot at. When the combat sim says to retreat from the enemy, Steamhammer retreats, even if it is already out of range and could safely take shots at exposed supply depots or other valuable stuff that can’t shoot back. It’s not a problem with the combat sim in itself, but it is a side effect of relying on the combat sim as the big decision maker. I’ve been thinking I’d solve it by calculating safe places and advantageous places, and holding position in advantageous places that are safe.

Can you run away? You’re in a hot fight, and the sim says you’re going to lose. Steamhammer runs away, and sometimes it is a mistake. If you’re fighting ranged units, you may die before you’re out of range; better to stay and do some damage first. If you’re fighting faster units, they can catch you and get free kills. Unupgraded marines should not flee from speed zerglings; if they are going to die no matter what, at least take some of the enemy along. It’s complicated, though. If you have friends nearby, maybe you can run and live. Or the dragoons can run and leave the reaver behind. The reaver is too slow to escape, so it should stop and fire when it can. In general, you want to simulate different cases and see which one comes out best. It amounts to performing a search over different tactical actions.

Harass versus big battle. FAP author Hannes Bredberg suggested (see comments to separate, independent unit control for micro) running the simulation for a very short period such as 0.5 second, a poke at the enemy. If you gain from the poke, do it. It’s a smart idea; a bot might do well without once looking ahead beyond the next moment. But instant gratification is not the only gratification. What if the first zerglings die to firebats, but once those are gone you mop up survivors and come out ahead? What if you lose more than you gain in the battle itself, but the enemy is wiped out and afterwards you can go to town on the enemy base? To recognize chances for delayed gratification, you want to look further ahead.

Maybe the right idea is to sample the outcome-so-far periodically as the simulation runs. If you’re ahead after 0.5 second, stop there and attack. If you’re behind then but ahead after 1.0 second, that’s a higher risk so maybe you should attack only if you’re a little further ahead. And so on. You can also cut the sim short if you’re too far behind. Maybe you can estimate whether losses are accelerating, or decelerating and possibly turning around. It’s both cheaper than running a simulation of the entire battle, and potentially more valuable. It’s an idea, anyway.
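
In sketch form it might look like the following. scoreAfter() is an assumed helper meaning “advance the simulation to this many frames and return my net score so far”; the checkpoints and margins are invented.

    #include <utility>
    #include <vector>

    // Assumed helper, not a real FAP call: continue the simulation up to the
    // given frame and return (my remaining value) minus (enemy remaining value),
    // relative to the start of the fight.
    double scoreAfter(int frames);

    bool shouldAttack() {
        const int fps = 24;
        // (frames to check, margin required): demand a bigger edge at longer
        // horizons, since predictions further out are less reliable.
        const std::vector<std::pair<int, double>> checkpoints = {
            {fps / 2, 0.0}, {fps, 50.0}, {2 * fps, 100.0}, {4 * fps, 200.0}
        };
        for (const auto& cp : checkpoints) {
            double score = scoreAfter(cp.first);
            if (score >= cp.second) {
                return true;                     // ahead by enough at this horizon
            }
            if (score < -300.0) {
                return false;                    // far behind; cut the sim short
            }
        }
        return false;
    }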

Tactical coordination. Here’s a common blunder made by Steamhammer, and also by related bots like Microwave: Zerglings and mutalisks are assigned to different squads, and act independently. There is no explicit coordination. The combat simulator acts as an implicit coordinator. It runs separately for each squad, and if the squads are in the same place then in each run it will pit the same friendlies against the same enemies and give the same answer. The mistake comes when both squads are deciding whether to attack (say) an enemy base strongly defended by sunken colonies and zerglings. Combat sim says, “Well, the zerglings are toast, but the mutalisks eventually win, so yeah! Go attack!” The right decision is to keep the zerglings back to stave off any attempted breakout, and go win with the mutalisks alone—losing your own zerglings lets the enemy’s run free, forcing the mutalisks to give up the attack and defend. The combat sim itself is unable to understand. In this case we could work around it by handling each squad separately, but that is a special case; in general, arbitrary units might be unhelpful in a given battle. I’ve seen Tscmoo do it right, at least in some cases.

Screening. Steamhammer likes to keep its army forward, just out of contact with the enemy. The forward position is aggressive and also risky. One advantage is that Steamhammer gets to see what the enemy has and what the enemy is doing; the corresponding disadvantage is that the enemy gets to see Steamhammer too. One of the skills I want for Steamhammer is screening: Maintain a small screening force forward to keep tabs on the enemy, and hold the bulk of the army farther back, in a more flexible position where it is also out of enemy sight. The screening tactic is common in pro zerg games, especially earlier in the game.

How does that affect combat simulation? It’s perfectly possible to simulate a fight between distant enemies, but the result will be less accurate. Imagine a zerg army gathering far outside the range of photon cannons and defending zealots. Should the zerg attack? SparCraft and FAP will assume that the armies approach each other and fight in the middle, far from the cannons, and will keep saying “zerg wins!” until the zerg army actually approaches the cannons and finds that the zealots are still in defensive positions. The combat sim needs more smarts to get the right answer.

strategy learning by solving the game matrix

Being unpredictable to your enemies has value. How can you do strategy learning and still remain unpredictable when you should? You can’t simply require randomness, because if one strategy dominates, then you should play it every time. At other times, you may benefit from playing 2 strategies equally, or by playing a normal strategy 95% of the time and otherwise rushing. It depends on what opponent strategies counter each of yours, and the n-armed bandit methods that bots use now don’t understand that. Here’s one way to do it. It’s a step up in complexity from UCB, but not a tall step.

You can record the results of your strategies and the enemy strategies in a zero-sum game matrix, and solve the strategy game (which is the subgame of Starcraft that involves choosing your strategy). In the first cut version, each cell of the matrix is “n times it happened that I played this and the enemy played that, and k of those were wins.” Take the observed probability of win for each cell of the game matrix as the payoff for that cell, and solve the game. The solution tells you how often you should play each of your strategies, assuming that the opponent chooses optimally.

There are a couple different algorithms to solve zero-sum game matrixes fast. I personally prefer the iterative approximate algorithm (here is a simple python implementation), but it doesn’t make much difference.
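
For reference, here is a self-contained version of fictitious play, which is the kind of iterative approximate solver I mean; it may differ in detail from the linked implementation. Payoffs are win probabilities, and the rock-paper-scissors matrix at the end is only a sanity check:

    #include <cstdio>
    #include <vector>

    // Fictitious play for a zero-sum game matrix. payoff[i][j] is my expected
    // payoff (say, win probability) when I play strategy i and the enemy plays
    // strategy j. The return value is approximately how often to play each of
    // my strategies at equilibrium.
    std::vector<double> solveGame(const std::vector<std::vector<double>>& payoff,
                                  int iterations)
    {
        const size_t rows = payoff.size();
        const size_t cols = payoff[0].size();
        std::vector<double> columnTotals(cols, 0.0);  // my payoff per column vs my history
        std::vector<double> rowTotals(rows, 0.0);     // my payoff per row vs the enemy's history
        std::vector<double> timesPlayed(rows, 0.0);

        size_t myMove = 0;                            // arbitrary first move
        for (int t = 0; t < iterations; ++t) {
            timesPlayed[myMove] += 1.0;
            for (size_t j = 0; j < cols; ++j) columnTotals[j] += payoff[myMove][j];
            size_t enemyMove = 0;                     // enemy picks the worst column for me
            for (size_t j = 1; j < cols; ++j)
                if (columnTotals[j] < columnTotals[enemyMove]) enemyMove = j;
            for (size_t i = 0; i < rows; ++i) rowTotals[i] += payoff[i][enemyMove];
            myMove = 0;                               // I pick my best row in reply
            for (size_t i = 1; i < rows; ++i)
                if (rowTotals[i] > rowTotals[myMove]) myMove = i;
        }
        std::vector<double> mix(rows);
        for (size_t i = 0; i < rows; ++i) mix[i] = timesPlayed[i] / iterations;
        return mix;
    }

    int main() {
        // Rock-paper-scissors style payoffs: the mix should come out near 1/3 each.
        std::vector<std::vector<double>> rps = {
            {0.5, 0.0, 1.0},
            {1.0, 0.5, 0.0},
            {0.0, 1.0, 0.5},
        };
        std::vector<double> mix = solveGame(rps, 20000);
        for (double p : mix) std::printf("%.3f ", p);
        std::printf("\n");
    }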

If you recognize a lot of strategies on both sides, you’ll have many matrix cells to fill in, each of which requires some number of game results to produce a useful probability. 10 strategies for each side already means that a big AIIDE length tournament won’t produce enough data. For a first cut, I recommend recognizing only 2 or 3 categories of enemy strategies, such as (example 1) rush, 1 base play, 2 base play, or (example 2 for zerg) lings and/or hydras, mutalisks, lurkers. Since you’re grouping enemy strategies into broad categories, you don’t need much smarts to recognize them.

You can group your own strategies in a completely different way, if you like. There’s no reason to stick to the same categories. Also, your bot presumably knows what it is doing and doesn’t need to recognize game events as signifying that it is following a given class of strategy.

In this method, you are assumed to choose your strategy before you scout, or at least without using scouting information. You can take your time to recognize the enemy strategy, and base the recognition decision on anything you see during the entire game.

How do you get started learning? You might want to start with a matrix of all zeroes and only use the game matrix for decisions after you’ve gathered enough data. Instead, I suggest keeping a global matrix alongside the ones for each opponent, with floating point game counts and win counts in each cell. The global matrix has the totals for all opponents. (Or maybe there’s a global matrix for each opponent race.) When you face a new opponent, initialize the new opponent’s matrix with scaled down game counts and win counts from the global matrix, as if only a small number of games had been played in total (I suggest 1 to 3 times the number of cells in the matrix as a try). You’ll start out playing a strategy mix that is good against the average opponent, and as you accumulate data the mix will shift to specifically counter this opponent.

There are tons of ways to fancy it up if you want to try harder. You could try a variant where you estimate the enemy’s choice probabilities instead of assuming the enemy plays optimally (you’ll need a different solution algorithm). You can keep a larger game matrix in parallel with the small one, and switch to it when you’ve accumulated enough data. Or use a hierarchical method that unfolds categories when there is enough data to distinguish them. You can try a more complicated Bayesian game solution algorithm, which realizes that the numbers in each cell are empirical approximations and takes that into account (“oh, this cell doesn’t have many games, better not rely too strongly on its value”). You can include scouting information in the strategy decision (“well, I can see it’s not a rush, so strike out that option for the opponent”). You can divide your notion of strategy into any fixed number of aspects, and keep independent matrixes for each aspect, so that your strategy choices are potentially random in many different dimensions. The sky is the limit.

Steamhammer’s opponent model

At its most general, a learning system consists of a data structure, a learning algorithm to update the data structure with new data, and a retrieval algorithm to answer questions from the learned data structure. You could say it is a specialized kind of memory, with a data store and ways to get information in and out. The information you get out is not the same as what you put in, and that is part of what makes it useful. Or you could say that ordinary computer memory is a learning system with no loss and no generalization.

One of my goals for Steamhammer’s opponent model is to learn from a single game. There is a well-known class of learning systems which makes that easy, because their data store is nothing more than a record of the original input data; it is called instance-based learning or memory-based learning. The simplest example is the k-nearest neighbors algorithm family: Your data is a set of points (x, y), input x gives output y (where x and y might be, say, vectors). To learn from a new data point, simply store it along with the others. To answer questions of the form “what’s the output for input x?” you find a given number, k, of x’s nearest neighbors according to some distance measure, and average their outputs (using whatever kind of average may be appropriate for your problem). A fancier class of systems that can draw complex conclusions from a single example goes under the name case-based reasoning, where the data store is a database of structured “cases”, or examples.

Anyway, I thought I should use a method in the nearest neighbor family. It’s the simplest way to meet my goals.

What should my data points look like? Well, what information does Steamhammer use to make strategy decisions? It looks at the enemy’s current unit mix. I want to be able to predict the enemy’s unit mix at a given time: “Oh no, those zerglings are early, it’s a rush!” or “this enemy switches to goliaths and hardly any tanks, I should build up hydralisks next.” Both are nothing more than unit mix @ time.

My data points are games, boiled down to sequences of unit mixes. In the first implementation, Steamhammer takes a snapshot of the unit mixes of both sides every 30 seconds, “this many drones, that many zerglings, ....” I also threw in some supplementary information: The map, the opening chosen, and as shortcuts for opening selection the times at which the enemy got various things, such as the first combat units, the first flyers, the first detection, and so on. And it simply appends all the data to a file named after that opponent.

To answer the question “what will the enemy’s unit mix be at time t?” the first implementation finds the nearest neighbor. It looks through the game records to find the best match game, the past game against the same opponent which is most like the current game, according to a similarity measure which adds up differences in unit mixes over time, up to the current time in the current game. (So the best match will change at most once every 30 seconds.) Having found the best match, it looks up the recorded enemy unit mix in the best match game record which is closest to time t and calls that the prediction. It’s dead simple.
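
In sketch form, the lookup might go like this. The types, the similarity measure, and everything else here are simplified stand-ins, not Steamhammer’s actual code or file format:

    #include <algorithm>
    #include <cstdlib>
    #include <map>
    #include <string>
    #include <vector>

    using UnitMix = std::map<std::string, int>;    // unit type -> count

    struct GameRecord {
        std::vector<UnitMix> enemySnapshots;       // one snapshot every 30 seconds
    };

    // Distance between two unit mixes: sum of count differences over all types.
    int mixDistance(const UnitMix& a, const UnitMix& b) {
        int d = 0;
        for (const auto& [type, count] : a) {
            auto it = b.find(type);
            d += std::abs(count - (it == b.end() ? 0 : it->second));
        }
        for (const auto& [type, count] : b) {
            if (a.find(type) == a.end()) d += count;   // types that a lacks entirely
        }
        return d;
    }

    // Find the past game most like the current one, comparing snapshots up to
    // the current time, and return its recorded enemy mix nearest to time t.
    UnitMix predictEnemyMix(const std::vector<GameRecord>& pastGames,
                            const std::vector<UnitMix>& currentGameSoFar,
                            size_t snapshotIndexForT)
    {
        const GameRecord* best = nullptr;
        int bestDistance = 0;
        for (const GameRecord& game : pastGames) {
            int d = 0;
            size_t n = std::min(currentGameSoFar.size(), game.enemySnapshots.size());
            for (size_t i = 0; i < n; ++i) {
                d += mixDistance(currentGameSoFar[i], game.enemySnapshots[i]);
            }
            if (best == nullptr || d < bestDistance) {
                best = &game;
                bestDistance = d;
            }
        }
        if (best == nullptr || best->enemySnapshots.empty()) {
            return UnitMix();                       // no usable history: predict nothing
        }
        size_t i = std::min(snapshotIndexForT, best->enemySnapshots.size() - 1);
        return best->enemySnapshots[i];
    }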

That was my motivation. In fact, the game records have endless uses beyond predicting the enemy unit mix. For example, to figure out whether an opening is safe to play against this opponent, run the timings of the opening against the timings of the game records. If the opening always gets defenders in time, then the enemy will not smash you with a rush (or at least it will only be a surprise once). Or if you notice that the enemy never gets detection, then go lurkers and get an easy win. And so on.

Einstein, hand me the simplicity!

You can see why I thought the method was obvious. With clear goals and the right background knowledge, it is obvious. And you can see why I thought I could get it working within a few weeks; there is nothing complicated here. If I were a better coder, I would have succeeded.

Of course, it may turn out that the simplest option is not good enough. For the first cut I wanted to take the easiest way. If some part turns out to work poorly, I have improvements up my sleeve. The possible improvements are as endless as the possible uses.

  • The recorded unit mixes include buildings. Buildings are especially important for predicting what the opponent is up to, but my first cut similarity measure does not understand that. It treats the difference between 1 barracks and 2 the same as the difference between 1 marine and 2, and that is obviously not ideal.
  • For some purposes, it may be better to record the total units ever made (or ever seen, if the enemy’s) instead of the current unit mix, because the current mix depends on the outcome of battles as well as the strategy followed.
  • If the best match is not close at all, maybe it should be ignored.
  • If there are a number of good matches, maybe they should be averaged together.
  • Surely the current unit mix should have a role in predicting the next unit mix. In the first cut, it is ignored.

The bottom line is that my first implementation may or may not work adequately. But I’m confident it can be improved until it does work.

recommended deep learning textbook

The recent textbook Deep Learning looks excellent to me. The authors are Ian Goodfellow, Yoshua Bengio, and Aaron Courville. There is an expensive edition, but it is also readable for free at the website. It’s a much-recommended book, and I recommend it too.

It is a theory book more than a practical book. I would say it is for people who have a computer background and perhaps don’t have deep math experience yet but aren’t afraid of math and are willing to dig in. I think it should be a good book for an early grad student, or an undergrad with strong interest, or a bright high schooler. The first part of the book presents the math knowledge you’ll need, like linear algebra and probability theory, so it is possible to start if you don’t know much. As always, the more background you have, the easier it gets.

To become expert, you have to know the theory and have experience applying it. If you want practical exercises to work your way into the technology, I think your approach should be to pick a software framework first (for example TensorFlow, Torch, Caffe) and then seek out tutorials or sample projects specific to the framework.

Everyone has their own learning style. If I were getting into deep learning from scratch, my approach would be: Read the whole book once through quickly to get an idea of the shape of things. With an overview in my mind, I could pick out parts that I needed to step through slowly and carefully. It takes time and practice to make unfamiliar concepts familiar, so if I hit topics where I felt weak or awkward I might seek out other resources.

GHOST solver

I want to draw attention to GHOST, a constrained optimization solver from 2014 that claims to be good for a wide variety of RTS problems. The lead author is Florian Richoux, author of the protoss bot AIUR, which was a strong contender in its day.

According to the description, GHOST should be good for Starcraft problems like designing a wall, placing static defense, choosing where to siege your tanks, planning build orders, optimizing your army composition, prioritizing targets to shoot at—a huge variety of things, actually. It claims to give answers not too far from optimal and to be fast enough to run within a single frame (though it will depend on what problem you’re solving).

I haven’t tried GHOST and I don’t intend to, because it is under a GPL license, which rubs me the wrong way. But if it is as good as the paper says, it could be a superb way to solve a lot of different Starcraft problems.