Fresh Meat game comparison

Fresh Meat was reactivated only a few days ago (see my brief writeup). Its play shows sudden switches: Compare Fresh Meat’s first game versus Prism Cactus with its second game against the same opponent. The first game looks aimless, the second shows a clear plan more-or-less cleanly executed.

Fresh Meat has been updated repeatedly in a short time, so any given play difference might be caused by a code change. But Hao Pan hinted that it records its opponent’s unit mix and plays to counter it. From his comment: “it is built on the in-game AI provided there’s no past games. This time I rewrote the unit composition record system, and am looking forward to the build orders COEP comes up with.”

The two games and Hao Pan’s statement point to the same conclusion: Here is an example of recording the opponent’s habits and figuring out how to counter them. It’s similar in concept to what I’m doing with opening data. This kind of system can learn very fast because it reacts immediately to new data without having to do any statistical averaging or other slow stuff. And it can make drastic adaptations, because the adaptation is done by reasoning—in the case of COEP, the reasoning takes the form of an evolutionary search. Its limits are set by the data it records and the capability of its reasoning system.

Fresh Meat is back

Hao Pan’s zerg variant of his terran bot Halo, Fresh Meat, is also back. Fresh Meat’s big claim to fame is that it uses COEP for production decisions. I wrote up COEP in 2019, where Hao Pan left a comment about the Fresh Meat version of that time.

COEP is a search method. Like any search method, it relies on you to provide a game model (the “forward model”) to answer the question “what happens if I do this?” and an evaluation function (a “fitness function”), to answer “and is it any good?” Results are shaped by the method, but depend mostly on the model and the evaluator you provide: A search method amplifies the smarts of its model and evaluator by using them to look into the future, but you need to provide those smarts in the first place, as a seed for the search to plant. (In COEP the seed grows into a lawn rather than a tree, but whatever!)

I was not impressed with Fresh Meat’s production decisions. It likes an early spawning pool—sometimes 4 pool, 5 pool, 7 pool, sometimes later—and makes early zerglings when it feels like it, or not when it doesn’t. It likes drones and sunkens. I haven’t seen it visibly react to enemy air units. It often lets its minerals run up to a high level, a very bad sign. I’m speculating, but the play is similar to what I’d expect if the model and evaluator were trained by play against the built-in AI. Could that be it? If so, perhaps it is learning on the ladders and will improve over time.

Steamhammer and machine learning

I’ve mentioned it briefly before, but here’s a longer post. It is past time to start writing serious machine learning methods into Steamhammer. I’ve been writing code for it behind the scenes, though none is in Steamhammer yet, even in the development version. I selected a combination of techniques that will learn fast, will run fast, and I hope will be accurate enough. Right now I need to fix a numerical precision problem (I’ve always hated numerical analysis), but soon it should be ready to start testing on non-toy problems. It won’t be in the next Steamhammer 3.1, but perhaps a version or two after that, if all goes well.

The first application will be a “will I win this game?” evaluation function. The idea of evaluation functions is very general: You can evaluate anything, “how good is this build order?” “am I likely to win this fight?” “which tactical maneuver is better, A or B?”—anything you want to measure or compare, really. The use of evaluators is also very general. Whenever you want to make a choice, if you have the right evaluator and you can provide it the right inputs, you can compare the choices and pick the one that looks best. That is what search is, and search is one of the most basic ideas in AI.

The “am I winning?” evaluator will take several hundred numbers as inputs, unit counts and things like that. You can see my 2018 analysis of LastOrder for some of the possibilities. The output will be an evaluation of how likely Steamhammer is to win from the game position, I think a probability or something that can be converted to a probability. My initial estimate is that it should run in under 5 milliseconds. It doesn’t need to be run often, so even if that’s optimistic it will be fast enough. If it works as well as I hope, it will be possible to specialize the evaluator for each opponent. If that succeeds, there will be a pre-learned evaluator for unfamiliar opponents, and learning data in the opponent model will update it to understand that player. I’m seriously expecting the learning to be fast enough for that to help, though we’ll see.
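To make the shape of the thing concrete, here is a minimal sketch of the kind of evaluator I mean: a feature vector of a few hundred numbers squashed through a logistic function to give a win probability. The feature list, the weights, and the learning method are placeholders for illustration, not Steamhammer’s actual design.

// Minimal sketch of a "will I win this game?" evaluator: a weighted sum of
// features squashed into a probability. Everything here is illustrative.
#include <cmath>
#include <vector>

struct GameFeatures {
    std::vector<double> values;   // several hundred numbers: unit counts and the like
};

class WinEvaluator {
    std::vector<double> weights;  // one weight per feature, learned offline or per opponent
    double bias = 0.0;
public:
    WinEvaluator(std::vector<double> w, double b) : weights(std::move(w)), bias(b) {}

    // Estimated probability that we win from this position.
    double winProbability(const GameFeatures & f) const {
        double z = bias;
        for (size_t i = 0; i < weights.size() && i < f.values.size(); ++i) {
            z += weights[i] * f.values[i];
        }
        return 1.0 / (1.0 + std::exp(-z));
    }
};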

The first use of the evaluator will be to select openings. Right now Steamhammer keeps tabs on whether a given opening won or lost. The bot does not know, at least until it plays a lot of games, whether it won because the opening gave it a huge strategic advantage, or whether it was behind after the opening but managed to scrape a win anyway. The evaluator will tell it, and it will select better openings. For example, against a much stronger opponent Steamhammer rarely wins and falls back on trying builds at random, hoping to hit one that works. Most of the random choices are poor, but it is losing every game anyway so it can’t tell. The evaluator will tell it which tries are more nearly successful; it will try those more often and have better chances. That is only an example; I expect the evaluator to help against most opponents.
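As a sketch of how that could change opening selection, assuming an evaluator like the one above exists: score each opening by the average evaluation of the position when the opening ended, not only by whether the game was eventually won. The record format and names here are made up for illustration.

// Hypothetical sketch: choose the opening whose post-opening positions have
// evaluated best so far, instead of ranking purely by wins and losses.
#include <map>
#include <numeric>
#include <string>
#include <vector>

struct OpeningRecord {
    std::vector<double> postOpeningEvals;   // winProbability() sampled when the opening ended
};

std::string chooseOpening(const std::map<std::string, OpeningRecord> & records)
{
    std::string best;
    double bestScore = -1.0;
    for (const auto & entry : records) {
        const OpeningRecord & rec = entry.second;
        if (rec.postOpeningEvals.empty()) continue;
        double avg = std::accumulate(rec.postOpeningEvals.begin(), rec.postOpeningEvals.end(), 0.0)
            / rec.postOpeningEvals.size();
        if (avg > bestScore) { bestScore = avg; best = entry.first; }
    }
    return best;   // caller falls back to a random or default opening if this is empty
}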

A later use of the evaluator will be to construct new builds. I have plans in mind. There is already code in Steamhammer—it’s not finished or working, but the nub of the idea is there—to simulate build orders. When that is in place, Steamhammer will be able to evaluate builds that it has never played in a real game and get an idea of whether they will work. “I got run over fast. If I substitute 12 pool for 12 hatch, am I ready in time?” If that succeeds, Steamhammer will be able to customize builds to counter specific opponents. The potential is great, and this evaluator is a key step on the way.

Starcraft gives the players many many choices. It’s not possible to search any large proportion of them. In the search/knowledge tradeoff, I think that means that knowledge is preferred: You want to search few choices (at least compared to how many there are), but select and evaluate the choices with a lot of knowledge. That’s why I think that knowledge-rich machine learning methods are the right way.

the Robust Continuous Build-Order Optimization paper

The paper “Robust Continuous Build-Order Optimization in StarCraft” by Dave Churchill, Michael Buro, and Richard Kelly drew some attention. The paper looks to have been written in late 2017 or early 2018, based on its contents and references. It was not published until September 2019, and some of its claims fell out of date in the meantime. I found it underwhelming, which is why I did not write it up earlier, but properly speaking it is at least a small academic advance.

They take UAlbertaBot and BOSS as their experimental platform. Classical BOSS is given a goal to make a set of units (like “make this many zealots plus an observer”) and figures out how to make the units in the shortest time (or tries to; it has bugs and limitations). They take that as a starting point to build on. As I see it, the paper has 3 points:

1. Classical BOSS produces a timing attack: If you maximize your forces at the end of the build plan and otherwise don’t care, you may be vulnerable before you reach your strong timing. They compare the goals of producing a fixed set of units as fast as possible, or producing as many units as possible by a given time (making a peak in the army size curve), with the goal of maximizing the area under the army size curve so that the army size is never too far from its feasible maximum. Maximizing the area means minimizing the windows in which you are vulnerable to an enemy timing attack, aiming for “I’ll always have a big army” rather than “I’ll have the biggest army possible, but only at one given point in time.” That is an important goal, so it’s good to have it available, though in a performance program it should not be the only available goal—to never make a timing attack is to leave money on the table.

2. To define “army size” they introduce an abstract evaluation function that measures the “army value.” The underlying goal here is for the system to make more decisions itself: Instead of the programmer specifying “make n zealots” the evaluation function decides whether one potential army is better than a different one. (Whether that actually offloads decisions from the programmer to the software depends on how you write the evaluator.) BOSS will be recast to maximize the integral of the army value over a given time span, rather than its classical target. In principle, you could use any measure of army value. In their experiments, they stick with a simple total of resources spent to produce the army. It’s fast and adequate to the immediate goals of the paper, but in a performance program you’d want something more capable. An important point is that it is what they call a “one-sided” evaluation: It does not take into account possible enemy reactions during the span of future time that the BOSS search looks into. Apparently the army value must be monotonic in the sense that adding to your army should never make it decrease.
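The changed objective is easy to state in code. Here is a sketch of the discrete version, assuming a forward simulation that can report the army value at sample points along the plan; the simulation itself is the part BOSS provides, and the struct names are mine.

// Sketch of the paper's objective as I read it: score a build plan by the
// (discrete) integral of army value over the planning horizon, rather than
// by the final army value alone. The sample points come from a forward
// simulation of the plan, which stands in for BOSS here.
#include <vector>

struct SimPoint {
    int frame;
    double armyValue;   // e.g. resources spent on army units alive at this frame
};

// Trapezoidal sum of army value over time. Plans that stay close to the
// feasible maximum score higher than plans with one big but late spike.
double areaUnderArmyValue(const std::vector<SimPoint> & curve)
{
    double area = 0.0;
    for (size_t i = 1; i < curve.size(); ++i) {
        double dt = curve[i].frame - curve[i-1].frame;
        area += 0.5 * (curve[i].armyValue + curve[i-1].armyValue) * dt;
    }
    return area;
}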

3. A key point is that this altered BOSS search is no more difficult than the classical search, in any sense. It’s easy to understand how and why it works. It can be run in the same amount of time.

The experimental tests look weak from the point of view of a performance program. They are only trying to show that the idea is valid, not trying to make it work well enough to be used seriously. They use protoss only, make only zealots and dragoons, and compare classic UAlbertaBot versus a modified UAlbertaBot. They test against the top 10 finishers of AIIDE 2017, which apparently were the best bots when the paper was written. The results are not consistent, but do show large gains against some opponents and no large losses against any. On the other hand, UAlbertaBot’s macro can deteriorate after the opening phase (though less so with protoss), so it’s not clear how much the gains mean.

It’s unclear whether the idea could succeed at all for zerg. Terran and protoss can make workers constantly, so that the army value can be at all times close to the maximum feasible army value. Zerg has to decide between workers and army, and the army you get later depends on the army you forgo now. The army value cannot stay close to the maximum feasible, and the tradeoffs are different. I expect you would get milquetoast macro plans rather than strong sharp ones.

To use this seriously, you’d need to write an army value evaluator which takes into account the opponent’s units. Making more zealots will not stop those lurkers. You’d want to rerun BOSS whenever the opponent’s unit mix threw up a surprise, like cloaked units; that would detract from the academic goal of offloading decisions from the programmer. BOSS can take up to a few seconds to run, spread over that many frames. Even if you overlap build order plans, so that you’re still executing the old one while computing the new one, the new plan will be out of date by that much time. That is an intrinsic disadvantage. Nevertheless, I think it’s entirely possible that with enough effort the paper’s idea could be made to work well in a tournament program, though the paper doesn’t prove it (or try to).

Bottom line: “Robust” in the title refers to the robustness against timing attack that the idea aims for. “Continuous” is false; the BOSS search starts and finishes at discrete points in time and takes many frames, and it’s expensive so you probably don’t want to rerun it too often.

I don’t recommend the method of the paper for a performance Starcraft program. It is part of an academic research program, chipping away at the edges of the problem in hope of eventually sculpting it into something manageable. It’s interesting if you have an experimental attitude and you want to try to improve it further; the first necessary improvements are in the army value evaluation function.

Continual Online Evolutionary Planning paper

I noticed that the description of Hao Pan’s bot says it uses COEP, or Continual Online Evolutionary Planning, to adapt its build order during the game. And Hao Pan’s bot is a successful one. So I thought I’d write it up.

The original paper is from 2017, Continual Online Evolutionary Planning for In-Game Build Order Adaptation in StarCraft by Niels Justesen and Sebastian Risi. There is a corresponding github repo with source. Long-time followers may remember Niels Justesen the bot. Niels Justesen has other publications; if you like this paper, I suggest Playing Multi-Action Adversarial Games: Online Evolution vs. Tree Search as a followup.

A good implementation is not simple, but the basic idea is easy. You have a set of build order plans, which you create and update in the evolutionary algorithm style. And you have a way to measure how good a plan is; in evolutionary algorithm terminology, its fitness. When you need to make a build order decision, you select the current best plan and follow it.

OK, details. In this paper, a build order plan is simply a build order, a list of stuff to build (or research) in order. In principle, it can be any length. You start with a population of different build orders, which you can generate randomly if you don’t have any better ideas. You evaluate the fitness of each one. Probably on the first try none of them will be particularly fit, so you need to make more plans. You create new builds by treating each one as if it were a stretch of DNA, and introducing different kinds of recombination and mutation (traditionally, a lot of recombination and a little mutation). The paper explains the authors’ choices. And of course evaluate the fitness of the new child builds. To keep the population size down, you delete less fit builds.
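Here is a minimal sketch of one generation of that process, with one-point crossover and point mutation over build orders stored as vectors of item ids. The operators, rates, and population size are my illustration; the paper’s actual choices differ in detail, and the fitness function is assumed to exist elsewhere.

// One generation of the evolutionary step, sketched: crossover, mutation,
// then culling. BuildOrder items are integer ids of things to build.
#include <algorithm>
#include <random>
#include <utility>
#include <vector>

typedef std::vector<int> BuildOrder;
double fitness(const BuildOrder & b);    // forward model + evaluation, defined elsewhere

void evolveOneGeneration(std::vector<BuildOrder> & population, int numItems, std::mt19937 & rng)
{
    std::uniform_int_distribution<size_t> pick(0, population.size() - 1);
    std::uniform_real_distribution<double> coin(0.0, 1.0);

    std::vector<BuildOrder> children;
    for (int i = 0; i < 10; ++i) {
        const BuildOrder & a = population[pick(rng)];
        const BuildOrder & b = population[pick(rng)];
        size_t cut = std::uniform_int_distribution<size_t>(0, std::min(a.size(), b.size()))(rng);
        BuildOrder child(a.begin(), a.begin() + cut);           // crossover: head of a...
        child.insert(child.end(), b.begin() + cut, b.end());    // ...tail of b
        for (size_t j = 0; j < child.size(); ++j) {             // a little mutation
            if (coin(rng) < 0.05) child[j] = std::uniform_int_distribution<int>(0, numItems - 1)(rng);
        }
        children.push_back(child);
    }
    population.insert(population.end(), children.begin(), children.end());

    // Score everything once, keep only the fittest to limit population size.
    std::vector< std::pair<double, BuildOrder> > scored;
    for (size_t i = 0; i < population.size(); ++i) {
        scored.push_back(std::make_pair(fitness(population[i]), population[i]));
    }
    std::sort(scored.begin(), scored.end(),
        [](const std::pair<double, BuildOrder> & x, const std::pair<double, BuildOrder> & y)
        { return x.first > y.first; });
    population.clear();
    for (size_t i = 0; i < scored.size() && i < 30; ++i) {
        population.push_back(scored[i].second);
    }
}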

In concept, the evolution process runs continuously, and you query it for the best known choice when you need to make a decision. The authors actually implemented a client-server architecture, with a build order server. You could run it in a different thread, or do evolution on demand, or whatever; it should all work. When you start to carry out a build, you have to update the population to reflect that: This build says make 4 probes, I made 1 probe, there are 3 left. And the fitness measure needs to know about the game state so it can tell how good a build is.

So how does that fitness measure work? There are 2 parts, a forward model that predicts what happens when you follow a build order, and the fitness itself which measures how good the result is. You feed the forward model with information about the game state; the authors use mainly unit counts for both sides. In the paper, the forward model ignores builds that are illegal (make a corsair with no stargate), because the builds will have all kinds of random stuff in them. The forward model simply simulates what units you end up with when.

The fitness itself is based on a model of what counters what. You know what enemy units you’ve seen, and the forward model tells you what units you will have yourself. The fitness measure knows that vultures beat zealots and lose to dragoons, so in the face of vultures it sees lower fitness in a build that keeps making zealots and higher fitness in a build with more dragoons.
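A toy version of that idea: a small counter matrix and a score that rewards unit mixes which do well against the enemy units seen so far. The numbers are made up to match the example in the text; a real table would cover every unit type and be much more careful.

// Toy counter-based fitness term. counterTable[mine][theirs] > 0 means my
// unit type beats theirs, < 0 means it loses. The values are illustrative.
#include <array>

const int kNumTypes = 3;   // 0 = zealot, 1 = dragoon, 2 = vulture (toy example)
typedef std::array<double, kNumTypes> Counts;

const double counterTable[kNumTypes][kNumTypes] = {
    {  0.0, -0.5, -1.0 },  // zealots: lose to dragoons, lose badly to vultures
    {  0.5,  0.0,  1.0 },  // dragoons: beat zealots, beat vultures
    {  1.0, -1.0,  0.0 },  // vultures: beat zealots, lose to dragoons
};

double counterFitness(const Counts & mine, const Counts & enemySeen)
{
    double score = 0.0;
    for (int m = 0; m < kNumTypes; ++m) {
        for (int e = 0; e < kNumTypes; ++e) {
            score += mine[m] * enemySeen[e] * counterTable[m][e];
        }
    }
    return score;
}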

The build order being evaluated for fitness may be long, and you don’t want to measure fitness only at the end. You might lose before then! So they measure momentary fitness repeatedly throughout the build, and combine the local fitness values into a total fitness value, giving more weight to local fitness values that are early in the build. You’re pretty sure what’s next, not so sure about what happens later.
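A sketch of that combination, assuming the momentary fitness values have already been computed along the build; the exponential decay here is my illustration, and the paper’s exact weighting differs.

// Combine momentary fitness values sampled along the build into one total,
// weighting earlier samples more heavily (they are more certain).
#include <vector>

double totalFitness(const std::vector<double> & momentary, double decay = 0.8)
{
    double total = 0.0;
    double weight = 1.0;
    for (size_t i = 0; i < momentary.size(); ++i) {
        total += weight * momentary[i];
        weight *= decay;
    }
    return total;
}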

There is a video showing a game as protoss versus the built-in AI. Production decisions are made by COEP, the micro and other behavior is UAlbertaBot. Beating the built-in AI is not impressive, but what’s worse is that COEP showed some strange behavior. For example, COEP chose to research air attack for no reason; it never made a stargate. Even if it was planning at one time to go carriers, it made no sense to research air attack that early. Possibly there was not enough computation time to weed out all bad decisions.

A weakness of the described version of COEP is that it only counts the enemies you’ve seen. It doesn’t try to predict the future enemy unit mix, or to weigh the risk to you of different enemy choices. But that’s a weakness of the implementation, not of the basic algorithm. Features like that can be implemented.

COEP is extremely general. You need a way to represent plans, an evolutionary way to modify them, and a way to evaluate them for fitness. I think that can describe any problem (some more naturally than others). You could use COEP for tactical decisions and micro decisions too, or for deciding what to have for lunch.

You pay for the generality with more computation. You end up inventing and evaluating many plans that you will never follow, including plans that are unrealistic or outright invalid. You can try directed evolution ideas to reduce the extra CPU time, but generality will come with a cost.

I expect that deep learning, if you can apply it, will work better than COEP. The disadvantage of deep learning is that it burns data by the supertanker load. With COEP you have to implement a bunch of stuff yourself instead of letting the computer figure it out, but you don’t need the mass of data, so it may be more accessible for a hobbyist. Of course, you can combine the methods: Follow the COEP architecture and use deep learning to create and modify the alternative plans, or to evaluate them, or both. Since deep learning is then solving an easier problem, training should be easier.

tactical analysis

Today there was a short string of comments about retreating. Steamhammer makes mistakes in retreating: It retreats to poor positions, and since the purpose of retreating is to disengage, it issues a move command without checking whether it is moving to a safe place. It often loses by retreating into or through an enemy force without fighting. It’s such a severe weakness that I think almost every Steamhammer fork has implemented some improvement to the behavior.

I am planning a simple improvement for CIG. For the following AIIDE tournament, I should have time to implement actual factual honest-to-goodness tactical analysis and make maneuver decisions properly. So here are my thoughts on what has to go into tactical analysis.

1. Distinguish different kinds of tactics. Maneuvering is one kind: Deciding when and where to move, when and where to attack, what locations need static defense or mobile defenders or scouting units, all that stuff. In maneuvering tactics, you do planning to decide on goals, and then carrying out the goals requires reacting to the situation but doesn’t require planning as such. The planning is all up front; if the bot hits a problem, the solution is to replan and set new goals. There are also what I think of as sequencing tactics, like clearing a map block or carrying out a drop or performing a coordinated attack from different directions. They require a sequence of steps. The bot has to recognize a need (clear the neutral buildings to open the path to expand to that base), select and collect resources (use lurkers because the buildings are stacked, otherwise it will take too long), plan paths, and execute the steps in order (first clear the neutral buildings, then send the drone to expand). The response to problems seems more complicated than “just replan from scratch;” minor problems can be solved by assigning more resources. I don’t think there’s necessarily a sharp line between maneuvering tactics and sequencing tactics, but they do seem to call for different planning skills.

2. I am increasingly thinking that the 3 levels of analysis (strategy, tactics, micro) are not enough. I think I want 4 levels, which I will call strategy, operations, tactics, and micro. To a certain extent, this already shows in Steamhammer, where decisions made at the squad level sometimes look like tactics and sometimes look like micro. Retreating is a perfect example: It is a tactical decision which is made by the squad based on local tactical analysis of the situation of that squad only. Roughly speaking, high level tactics are in the combat commander, low level tactics in the squad, and micro in the squad’s micro managers. I want to reorganize it somewhat so that it is clear what level each decision is made at.

Strategy is about resources. It decides on the drone count, how much gas to collect, when to expand, and what units to make. The unit count it achieves and the unit mix it chooses have big effects on the lower levels. Strategy also includes the highest-level understanding of the game situation, like “I need to stay defensive until I’ve built up a big enough force” or “I’m ahead in economy, so an even trade is good.”

Operations is about choosing high-level tactical goals and assigning units to achieve them. Raiding squad, go hit that enemy base; blocking squad, delay any enemies from arriving there. I think that attack and retreat should be decided at this level. Sometimes it is right to retreat the raiding squad when strong enemies arrive; sometimes it is right to sacrifice the raiding squad to do as much damage as possible. The decision is best made based on the whole situation. You want to be able to reason, “sacrificing the raiders to delay the enemy a little longer means the upcoming drop is more likely to work.”

Tactics tries to achieve the operational goals. It chooses roles for each unit in a squad, or perhaps subdivides the squad into subsquads with different subgoals: Hydras engage the cannons, lurkers run past and reach the mineral line as fast as possible.

Micro is individual unit control. Where should the lurkers aim to kill as much stuff as possible?

As always, the different levels of analysis should ideally feed information to each other. See levels of abstraction.

3. How to analyze maneuver tactics efficiently? MaasCraft showed the way: Cluster units together and analyze at the level of clusters. At the operational level, it doesn’t matter exactly where individual units are. The analysis has to remain valid for at least a few seconds, and units will move around in that time. Draw a bounding box or a circle around each cluster and represent the contents of the cluster as summary measures of unit count, total hit points, speed or range of speeds, or whatever other values are useful. I believe many bots do this kind of analysis, but Steamhammer doesn’t.

With that information in hand, you can make quick estimates of when an army can arrive, how much force can be brought to bear on a given point, who wins the fight, and so on. The estimates may not be accurate, but combat simulators aren’t accurate either. You just have to be good enough to improve your decisions.
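Here is the kind of cluster summary I have in mind, sketched with standard BWAPI calls; the struct and the choice of summary values are mine, and it assumes the units have already been grouped into clusters somehow.

// Sketch of a cluster summary: boil a group of units down to a few numbers
// that stay roughly valid for a few seconds. Struct and fields are mine;
// the BWAPI calls are standard.
#include <BWAPI.h>
#include <algorithm>
#include <vector>

struct ClusterSummary {
    BWAPI::Position center = BWAPI::Positions::Origin;
    int unitCount = 0;
    int totalHitPoints = 0;
    double slowestSpeed = 1000.0;   // the cluster moves at the speed of its slowest unit
};

ClusterSummary summarize(const std::vector<BWAPI::Unit> & cluster)
{
    ClusterSummary s;
    int sumX = 0, sumY = 0;
    for (BWAPI::Unit u : cluster) {
        sumX += u->getPosition().x;
        sumY += u->getPosition().y;
        s.totalHitPoints += u->getHitPoints();
        s.slowestSpeed = std::min(s.slowestSpeed, u->getType().topSpeed());
        ++s.unitCount;
    }
    if (s.unitCount > 0) {
        s.center = BWAPI::Position(sumX / s.unitCount, sumY / s.unitCount);
    }
    return s;
}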

4. What should tactical analysis look like? MaasCraft uses a form of MCTS, a logical choice of search algorithm. A simpler search would already improve Steamhammer’s tactics. I expect that, with a smart enough evaluation function, a brief search that looks ahead a short time would produce strong tactical play. Surely machine learning should be able to produce an evaluator that smart. But I have ideas about how to write one by hand, and I may try that first.

Still next: Oops, somehow Proxy slipped down a slot.

LetaBot’s wall 4 - placing buildings to form the wall

Continuing from yesterday, here is how RecursiveWall places buildings to try to create a wall. It is of course recursive. It wants a vector of building types and a current search depth.

void BuildingManager::RecursiveWall(std::vector<int> Buildings, int depth){

The call is like this. (The three bits of code here are in different places in the source.) The numbers representing buildings are used as indexes into the SpaceArray mentioned yesterday, which has the information needed to compute the gaps between buildings. I’m not sure why the two depots get different indexes into SpaceArray (which has the same data for both). Is it so they can be told apart in maps, so that building adjacency can be found? Anyway, depth is initially 0.

	  WallSound = false;
	  
	  std::vector<int> Buildings;
	  Buildings.push_back(2);//Barracks
	  Buildings.push_back(3);//Supply depot
	  Buildings.push_back(4);//Supply depot

	  RecursiveWall(Buildings,0);

The code is full of implementation details that don’t matter at this level of discussion (and includes some redundant code), so here is pseudocode. CanWall() is a utility function to figure out whether a building can be placed at a given location. mapWall() maintains a 2D map of building locations. I should also mention, though I didn’t realize it until today, that WalledOffAStar() returns false if a wall exists and true if you can walk through, despite its name and some comments. It’s confusing.

RecursiveWall (vector Buildings, int depth) :
  if (WallSound) {                    // we found a wall
    return;                           // so stop the search
  }
  if (depth == Buildings.size()) {    // all buildings are placed
    walkable = WalledOffAStar (...)   // can we walk through?
    if (not walkable) {               // no, try lifting the barracks
      gate = WalledOffAStar (..., ignoring barracks)
      if (gate) {                     // can we walk through now?
        WallSound = true;             // yes, we found a wall
        BarracksWall = ...;           // record the building placements
      }
    }  
    return;                           // either way, we're done
  } else {
    int BuildType = Buildings[depth]; // next building to place
    for (int x, y inside the bounding box) {
      if (CanWall (BuildType, x, y)) {
        mapWall (x, y, BuildType, BuildType); // update data structures
        BWAPI::TilePosition thisTile = BWAPI::TilePosition (x, y);
        BuildingPlace.push_back (thisTile);
        RecursiveWall (Buildings, depth+1);   // recursive call
        BuildingPlace.pop_back();             // roll back updates
        mapWall (x, y, BuildType, BUILDABLE);
      }
    }
  }

Wow, it’s a little hard to follow even boiled down this much. WallSound is initialized to false before the call. So WallSound ends up true only if both the wall is sound and lifting the barracks opens it (so that the barracks is a functional part of the wall and not an appendix to a wall of supply depots). Older versions of LetaBot sometimes got that wrong and walled themselves permanently into their bases.

RecursiveWall does not know which of the buildings in its vector are the same, so if one building fails to place at any location it simply continues on to the next. As far as it knows, a good wall might still be possible with the remaining buildings. An optimization might be to prune the search appropriately when building placements fail. Is it too rare to matter?

Here’s what CanWall does, again in pseudocode. It is given the building’s type and proposed location and returns whether the location is good to build on. Buildings are required to be adjacent to an already placed building. It knows that the barracks is first and lets it be “adjacent to itself.”

CanWall (buildingType, buildingX, buildingY) :
  nextToB = false;       // adjacent to an existing building?
  for (int x, y in the building's footprint from SpaceArray) {
    if (outside the map) return false;
    if (not buildable) return false;
    if (covers Closest or Furthest) return false;
    if (useHighGround and this is not high ground) return false;
    if (buildingType == barracks) nextToB = true;
    if (adjacent to an already placed building) nextToB = true;
  }
  return nextToB;

The code runs the “adjacent to an existing building” loop even for a barracks. For efficiency, it could skip adjacency checks (the inner loop and thus the most expensive part of the code) once nextToB turns true, since it never goes false again. The compiler may be smart enough to lift the “is this building a barracks?” test above the outer loop, since it doesn’t depend on x and y, but it would be tidier to code it that way from the start.
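Here is the same routine with those changes, in the same pseudocode style as above; this is my restructuring, not LetaBot’s code.

CanWall (buildingType, buildingX, buildingY) :
  nextToB = (buildingType == barracks);  // the first building counts as "adjacent to itself"
  for (int x, y in the building's footprint from SpaceArray) {
    if (outside the map) return false;
    if (not buildable) return false;
    if (covers Closest or Furthest) return false;
    if (useHighGround and this is not high ground) return false;
    if (not nextToB and adjacent to an already placed building) nextToB = true;  // skipped once true
  }
  return nextToB;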

Next: Discussion of walling code.

search versus knowledge

I want to elaborate on something I passed over lightly yesterday. Making decisions in a complicated situation needs both search and knowledge, and you can trade off how much you rely on each. You can search few options with knowledgeable selection and evaluation of the options, or you can search many options with less understanding of each. The left column opposes the right column:

search           knowledge
planning         reaction
processor time   data space

Humans are slow (neural operations in milliseconds) and have to act in real time, so humans rely heavily on knowledge for fast reactions. In a fast-moving Starcraft game, humans are primarily reactive and knowledge-driven (do what you practiced), and don’t have time to think through plans in detail.

Computers are fast (processor operations in nanoseconds), so computers should use more search than humans. (Also our learning algorithms are slower than human learning, at least in terms of data volume, so gathering knowledge is harder.) AI has a history of discovering how well search works. It makes sense.

Starcraft is real-time, so Starcraft bots should use more knowledge and less search than (say) chess or go programs that have time to think. On general principles, Starcraft bots should be more search-oriented than human Starcraft players, but more knowledge-oriented than chess programs.

Now consider the abstraction hierarchy, strategy, tactics, unit control. Unit control decisions happen the most often (since there are many units and each may need new orders frequently), so unit control has the strongest real-time constraint. Unit control, by the same general principles, should be as reactive and knowledge-based as possible, to free up time (allowing faster reactions and more time to think about tactics and strategy). Tactical and strategy decisions don’t have to be made as often and are based on a smaller amount of more abstract information, which favors searches that compare alternatives.

That doesn’t mean you have to do it that way, it’s just what the high-level view of the situation suggests. A strategy catalog (a knowledge base of strategies) makes perfect sense to me. Depending on what you count as “strategy” it may be that you can encode all your openings and counters and deceptions as knowledge and make strategy decisions mostly by lookup (where the lookup index includes an opponent model and dash of randomness). Or maybe not, I don’t know! Tactics seem more varied and call more loudly for comparison of alternatives by search and evaluation.

MaasCraft’s play

Does MaasCraft’s search make its play more interesting? I grabbed its replays from Starcraft AI Data Archive and watched some.

Short answer: No. It mostly moves its army forward and back along a path between its base and the enemy’s (exactly what its author Dennis Soemers hoped search would avoid) and rarely pulls any interesting maneuvers. It bears out what the author said about the appropriateness of the algorithm to the strategy.

I did notice one advantage MaasCraft has over many bots: It doesn’t chase after fast units that it can’t catch. I suppose the search sees that they get away.

tactical search in MaasCraft

MaasCraft is a protoss bot by Dennis Soemers which does a tactical search to decide how to maneuver its army. MaasCraft looks slightly above average. It finished 8th of 18 with a 59% win rate in AIIDE 2014 and 7th of 13 in CIG 2014 with 55% wins and hasn’t played since as far as I can tell.

Soemers’s paper on it is Tactical Planning Using MCTS in the Game of StarCraft (pdf BSc thesis from Games and AI Group, Maastricht University). You can get the code from the Starcraft AI Data Archive.

So, tactical search. “Tactical” here means maneuvering squads around the map and deciding when to engage or retreat. At the most basic level, search means finding alternatives and comparing them to pick one that is probably better. The comparison itself might involve a further search—and that is the root of all search algorithms. Search done right ought to outperform any scripted behavior, so our mission (should we choose to accept it) is to figure out how to do search right. This is the most interesting try I’ve seen so far.

The search nodes for tactical decisions have to include some representation of squads and squad positions on the map. In a chess program a search node represents the whole game state, but a specialized search in Starcraft should only represent the details it cares about. MaasCraft keeps it super abstract. The map is simplified to a graph with “nodes at each chokepoint and each potential base location,” the paper says, so that every location is just a node in the graph. A squad has a location, a speed, a hit point total, and a total damage rate (and presumably a number of units). A base is treated like a squad that can’t move (its static defense provides a damage rate). The paper doesn’t mention any provision for air units, but MaasCraft’s code does refer to them.
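As a sketch, the abstract state for one squad might be no more than this; the field names are mine, and MaasCraft’s actual representation differs in detail.

// Roughly what a search node needs to know about one squad in this kind of
// abstract tactical search. Names are illustrative.
struct AbstractSquad {
    int locationNode;        // index of a node in the choke/base-location graph
    int unitCount;
    double totalHitPoints;
    double damagePerFrame;   // summed damage rate of the squad's units
    double speed;            // how fast the squad moves along graph edges
    bool isBase;             // a base is an immobile "squad"; static defense gives its damage rate
};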

The moves for a squad that is not in combat are to stand pat or to move to an adjacent node in the location graph (where it may get into a fight). A squad that is already in combat can either keep fighting or retreat to an adjacent location. Moves take different amounts of time, and since each side can have any number of squads, many moves can be ongoing at once.

Finding the next state when you know the current state and the ongoing moves amounts to simulating part of the game. Movement is easy: A squad has a speed and the location graph has a distance. The course of a battle is estimated by Lanchester’s Square Law, which is a good estimate only for ranged fire and works better if all units in a squad are alike. I’m sure it’s less accurate than a battle simulator but it must be faster.
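For reference, here is a minimal square-law fight estimate between two homogeneous groups; the effectiveness numbers would come from unit stats, and real code would need to handle mixed squads less crudely.

// Lanchester square law, sketched: with per-unit effectiveness values, the
// side with the larger (effectiveness x count^2) wins, and the survivor
// count follows from the square law. Effectiveness values are parameters here.
#include <cmath>

struct FightEstimate {
    bool attackerWins;
    double survivors;    // estimated units left on the winning side
};

FightEstimate squareLaw(double attackers, double attackerEff, double defenders, double defenderEff)
{
    double a = attackerEff * attackers * attackers;
    double d = defenderEff * defenders * defenders;
    if (a >= d) {
        double left = attackers * attackers - (defenderEff / attackerEff) * defenders * defenders;
        return { true, std::sqrt(left) };
    }
    double left = defenders * defenders - (attackerEff / defenderEff) * attackers * attackers;
    return { false, std::sqrt(left) };
}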

The search algorithm is one of the coolest bits. It is a variation of MCTS (Monte Carlo tree search) adjusted to cope with simultaneous moves of different durations. The next move down a branch is whichever move comes up next for whichever player. A squad that’s moving or retreating gets a new move when its current move finishes. If I understand it right, a squad that’s waiting or fighting gets an opportunity to change its mind when some other move finishes.

Algorithms in the MCTS family traditionally evaluate leaves of the tree by playing out one line of the game to the end. MaasCraft plays out up to one minute ahead and then applies a heuristic evaluation (not unlike AlphaGo). The evaluator gives points for destroying bases and harming armies, plus a tiny bonus to encourage moving toward the enemy.

How well does it work? Search should make play much more flexible and adaptive. A tactical search can plan sandwich maneuvers, or deliberately sacrifice a base to a strong enemy squad as long as it can defeat a weaker squad and take an enemy base in compensation. Looking into the future is what intelligence is. But this look into the future is simplified, so the question stands: How well does it work?

In the form interview, Dennis Soemers wrote “I suspect the algorithm may be better suited for a more defensive and late-game strategy which is less reliant on perfect execution of the early-game. I also believe the algorithm would still be able to run sufficient simulations with a more fine-grained map representation and a more accurate Battle Model, since the bot is currently running on the bots ladder using less time per frame (I believe I use 35ms per frame in the competitions and only 20ms per frame on the ladder), with similar performance.”

That makes sense to me. In the early game, details matter; if you’re chasing the enemy scout, or fighting the first zealots with your first marines, it seems to me that you want to represent the situation in detail and search as exactly as you can. As armies get bigger, approximations are good enough and may be needed for speed.

By the way, code under the name “MCTSDennis” appears in LetaBot. Switches control whether it is active. The two bot authors are both from Maastricht University.