Starcraft AI blog | Entries from January 2020

the strange question of reaching a refinery

You’ve located an enemy refinery building and you want to know whether your ground units can walk there to attack it. Is it on ground that they can reach, or on an island? For other buildings, Steamhammer solves the reachability problem with a partition map: Areas reachable by ground from each other are in the same partition, and 2 simple lookups tell it whether one point is reachable from another.

Refineries are trickier. First of all, a geyser is not walkable ground, so properly speaking it doesn’t belong to any partition. The same for a refinery built on it; every other building is built on ground that can be walked over after the building is destroyed. This actually causes a rare bug in Steamhammer, where the ground squad understands that it cannot attack an island—except for any refinery that may be on the island, it may attempt to attack that! It knows that a refinery building is a special case, but it doesn’t understand the special case correctly.

Of course, the problem can be solved by filling in the partition map for the refinery with the partition that the geyser is in. “Of course”? No, actually it can’t: The geyser itself might be part of the unwalkable terrain between partitions. Imagine a wall on the left and a wall on the right, and a geyser plugging the gap between them. North of the geyser is one partition, south might be a different partition, and there is no way to walk from one to the other. The geyser, and any refinery you build on it, is adjacent to 2 partitions but doesn’t belong to either.

This actually happens in an even more complicated way on maps like Gold Rush and a number of others, where two assimilators form a gate. An assimilator is smaller than a geyser (protoss space warp tech, no doubt), so if you destroy the eggs between the assimilators, units can pass through. But if you destroy the assimilators the gap disappears and the bare geysers block passage. A bot needs sophisticated knowledge of the game to understand the effects.

I haven’t chosen a way to fix it yet. But it’s clear that geysers and refineries have to be treated with special care.

resource tracking code for everybody

Steamhammer’s new resource tracking code is short and largely independent of the rest of the program, so I decided to release it for anybody to borrow. If your bot is C++, you should be able to drop this in with little effort (using the results is up to you). If your bot is written in NeverHeardOfItScript, you can at least see how to do it. If you want, you may be able to get resource tracking into your bot before the next Steamhammer release.

ResourceInfo.zip includes ResourceInfo.cpp and its header ResourceInfo.h. The 2 files add up to 163 lines and have no dependencies beyond BWAPI (well, they are in namespace UAlbertaBot, but you can strip that out). One instance of ResourceInfo tracks the last known resource amount of one mineral patch or one geyser (for whatever reason, I chose to implement them both in the same class). I believe it handles all cases correctly. When a mineral patch mines out, its associated mineral patch unit disappears, and the code recognizes the missing patch and sets the amount to 0. A mineral patch that starts with 0 minerals causes no confusion. A geyser changes its unit type when a refinery is built on it, and then the associated unit changes entirely if the refinery is destroyed. The code understands all that. It correctly considers the gas amount inaccessible if there is an enemy refinery (I found a cute way to code it so that undoing the workaround is not a trivial one-line change).

There are different ways to integrate ResourceInfo into a program. In Steamhammer, I put it into the information manager. Here’s the declaration of the data structure to hold all the ResourceInfo instances:

// Track a resource container (mineral patch or geyser) by its initial static unit.
std::map<BWAPI::Unit, ResourceInfo> _resources;

It is a map from the static unit of each mineral patch or geyser to the corresponding ResourceInfo instance. It is initialized once on startup like this:

void InformationManager::initializeResources()
{
    for (BWAPI::Unit patch : BWAPI::Broodwar->getStaticMinerals())
    {
        _resources.insert(std::pair<BWAPI::Unit, ResourceInfo>(patch, ResourceInfo(patch)));
    }
    for (BWAPI::Unit geyser : BWAPI::Broodwar->getStaticGeysers())
    {
        _resources.insert(std::pair<BWAPI::Unit, ResourceInfo>(geyser, ResourceInfo(geyser)));
    }
}

Keeping separate data structures for minerals and gas would make as much sense, maybe more. Or you could associate each resource container with the base it belongs to, or however you want to organize it. For example, if all you want to know is how many minerals and gas were last known to exist at a given base, then you could give each base a vector of mineral ResourceInfo instances and another vector for gas, with no need for a map to look up individual patches and geysers. Steamhammer’s data structure is essentially global to allow central updating and general-purpose lookup.

Every frame, you have to update the ResourceInfo instances. It’s fast. With separate mineral and gas data structures, this step would be simpler.

// Update any visible mineral patches or vespene geysers with their remaining amounts.
void InformationManager::updateResources()
{
    for (BWAPI::Unit patch : BWAPI::Broodwar->getStaticMinerals())
    {
        auto it = _resources.find(patch);
        it->second.updateMinerals();
    }
    for (BWAPI::Unit geyser : BWAPI::Broodwar->getStaticGeysers())
    {
        auto it = _resources.find(geyser);
        it->second.updateGas();
    }
}

With Steamhammer’s _resources map, you look up a resource amount by its initial static unit. If you want error checking, throw instead of returning 0 on error.

// Return the last seen resource amount of a mineral patch or vespene geyser.
// NOTE Pass in the static unit of the resource container, or it won't work.
int InformationManager::getResourceAmount(BWAPI::Unit resource) const
{
    auto it = _resources.find(resource);
    if (it == _resources.end())
    {
        return 0;
    }
    return it->second.getAmount();
}

Each Base remembers its own mineral patches and geysers, and it can add up the values for you. No need to repeat that code. The only other piece is the debug drawing, so you can see what the subsystem knows. Might as well throw that in so you don’t have to write your own. It draws a currently visible resource’s amount in white, and an out-of-view resource’s last known amount in blue (mineral) or green (gas) with the last frame that the resource amount was updated.

void InformationManager::drawResourceAmounts() const
{
    const BWAPI::Position offset(-20, -16);
    const BWAPI::Position nextLineOffset(-24, -6);

    for (const std::pair<BWAPI::Unit, ResourceInfo> & r : _resources)
    {
        BWAPI::Position xy = r.first->getInitialPosition();
        if (r.second.isAccessible())
        {
            BWAPI::Broodwar->drawTextMap(xy + offset, "%c%d", white, r.second.getAmount());
        }
        else
        {
            char color = r.second.isMineralPatch() ? cyan : green;
            BWAPI::Broodwar->drawTextMap(xy + offset, "%c%d", color, r.second.getAmount());
            BWAPI::Broodwar->drawTextMap(xy + nextLineOffset, "%c@ %d", color, r.second.getFrame());
        }
    }
}

the Robust Continuous Build-Order Optimization paper

The paper “Robust Continuous Build-Order Optimization in StarCraft” by Dave Churchill, Michael Buro, and Richard Kelly drew some attention. The paper looks to have been written in late 2017 or early 2018, based on its contents and references. It was not published until September 2019, and some of its claims fell out of date in the meantime. I found it underwhelming, which is why I did not write it up earlier, but properly speaking it is at least a small academic advance.

They take UAlbertaBot and BOSS as their experimental platform. Classical BOSS is given a goal to make a set of units (like “make this many zealots plus an observer”) and figures out how to make the units in the shortest time (or tries to; it has bugs and limitations).They take that as a starting place to move onward from. As I see it, the paper has 3 points:

1. Classical BOSS produces a timing attack: If you maximize your forces at the end of the build plan and otherwise don’t care, you may be vulnerable before you reach your strong timing. They compare the goal of producing a fixed set of units as fast as possible, or of producing as many units as possible by a given time (making a peak in the army size curve), with the goal of maximizing the area under the army size curve so that the army size is never too far from its feasible maximum. Maximizing the area means minimizing the timings at which you are vulnerable to an enemy timing attack, aiming for “I’ll always have a big army” rather than “I’ll have the biggest army possible, but only at one given point in time.” That is an important goal, so it’s good to have it available, though in a performance program it should not be the only available goal—to never make a timing attack is to leave money on the table.

2. To define “army size” they introduce an abstract evaluation function that measures the “army value.” The underlying goal here is for the system to make more decisions itself: Instead of the programmer specifying “make n zealots” the evaluation function decides whether one potential army is better than a different one. (Whether that actually offloads decisions from the programmer to the software depends on how you write the evaluator.) BOSS will be recast to maximize the integral of the army value over a given time span, rather than its classical target. In principle, you could use any measure of army value. In their experiments, they stick with a simple total of resources spent to produce the army. It’s fast and adequate to the immediate goals of the paper, but in a performance program you’d want something more capable. An important point is that it is what they call a “one-sided” evaluation: It does not take into account possible enemy reactions during the span of future time that the BOSS search looks into. Apparently the army value must be monotonic in the sense that adding to your army should never make it decrease.

3. A key point is that this altered BOSS search is no more difficult than the classical search, in any sense. It’s easy to understand how and why it works. It can be run in the same amount of time.

The experimental tests look weak from the point of view of a performance program. They are only trying to show that the idea is valid, not trying to make it work well enough to be used seriously. They use protoss only, make only zealots and dragoons, and compare classic UAlbertaBot versus a modified UAlbertaBot. They test against the top 10 AIIDE 2017 winners, which apparently were the best bots when the paper was written. The results are not consistent, but do show large gains against some opponents and no large losses against any. On the other hand, UAlbertaBot’s macro can deteriorate after the opening phase (though less so with protoss), so it’s not clear how much the gains mean.

It’s unclear whether the idea could succeed at all for zerg. Terran and protoss can make workers constantly, so that the army value can be at all times close to the maximum feasible army value. Zerg has to decide between workers and army, and the army you get later depends on the army you forgo now. The army value cannot stay close to the maximum feasible, and the tradeoffs are different. I expect you would get milquetoast macro plans rather than strong sharp ones.

To use this seriously, you’d need to write an army value evaluator which takes into account the opponent’s units. Making more zealots will not stop those lurkers. You’d want to rerun BOSS whenever the opponent’s unit mix threw up a surprise, like cloaked units; that would detract from the academic goal of offloading decisions from the programmer. BOSS can take up to a few seconds to run, spread over that many frames. Even if you overlap build order plans, so that you’re still executing the old one while computing the new one, the new plan will be out of date by that much time. That is an intrinsic disadvantage. Nevertheless, I think it’s entirely possible that with enough effort the paper’s idea could be made to work well in a tournament program, though the paper doesn’t prove it (or try to).

Bottom line: “Robust” in the title refers to the robustness against timing attack that the idea aims for. “Continuous” is false; the BOSS search starts and finishes at discrete points in time and takes many frames, and it’s expensive so you probably don’t want to rerun it too often.

I don’t recommend the method of the paper for a performance Starcraft program. It is part of an academic research program, chipping away at the edges of the problem in hope of eventually sculpting it into something manageable. It’s interesting if you have an experimental attitude and you want to try to improve it further; the first necessary improvements are in the army value evaluation function.

resource tracking progress and thoughts

Today I put in the workaround for yesterday’s visible vespene BWAPI bug. Steamhammer now considers the gas reserves of an enemy refinery to be inaccessible. At the same time I fixed a rare bug that could briefly give a gas supply of 0, handled mineral and gas amounts of 0 in a more informative way, added features to improve the debug display, and made unnecessary efficiency improvements (“the cost is negligible—but—but—look how much I can reduce it!”). It added up to refactoring most of the code, but there’s not much of it so it wasn’t long.

Some later day I’ll add tracking of the bot’s own mining. Any minerals or gas missing from the map that Steamhammer did not mine were mined by the enemy, putting one lower limit on the enemy’s total resource gathering. Another lower limit comes from counting enemy units. The first use of enemy mining information will be for scouting: Overlord thinks, “these minerals have been mined, I don’t need to fly any farther to know that the base has been taken.” Look at the map Python. An overlord moving from the base at 3 to the base at 12 will encounter the mineral patches first, and can see that they have been mined before it comes into the sight range of any worker doing the mining. Identifying the enemy base even seconds earlier is an advantage worth taking.

The enemy’s resources will be valuable for strategy calculations. The same overlord on Python can judge whether the enemy is playing a low-econ strategy even before it gets close enough to count the enemy workers. The total resources mined by the enemy, as long as we have it, tells us about the enemy plan. It’s possible to find a hard upper limit too, but it seems complicated to get an upper limit that’s tight enough to be useful. Workers can be made this fast, bases can be taken this fast, a mineral patch can be mined this fast, it’s not as simple as counting stuff on the map.

In the long run, I envision an enemy resource usage estimator that takes visible resource amounts as only one source of information. Saved game records in the opponent model would be another, and a library of known strategies might be another. It feeds to an opponent plan recognizer, which feeds to a strategy planner. In my mind, resource tracking is a step along the path to smart strategy adaptation.

Stepping back to take a wider view, there are two ways to be smart: You can learn stuff like CherryPi with its neural networks, or you can calculate stuff like FAP—you can use knowledge or you can use search. I think you should ideally use both, like AlphaZero, because they support each other. Calculated limits on the unknown value of the game state are good inputs to help a machine learning algorithm.

visible enemy vespene

BWAPI will tell you how much gas remains in a visible geyser, even if the enemy owns a refinery over it. Human players get to see how much gas is in a bare geyser that they can see, or in one of their own refineries, but they can’t see enemy gas. It’s a BWAPI bug that gives away strategically valuable information. I noticed immediately when I wrote resource tracking for Steamhammer, but only filed github issue 847 today because I was too lazy to check past issues and change logs to find out if it was already known.

Has this never been noticed before? That’s hard for me to believe. Did I miss something among the BWAPI issues? I suspect that relatively few bots do resource tracking (even a bot as strong as BananaBrain will expand to an empty base). But some do, and surely somebody must have realized that BWAPI’s behavior doesn’t match Starcraft’s behavior. Or do bot authors really lack the game knowledge?

The bug doesn’t affect Steamhammer’s behavior much in practice—yet. Steamhammer only uses the resource amounts in deciding what base to expand to, and if it’s taking a destroyed enemy base then the enemy refinery was normally also destroyed, and Steamhammer gets to see the remaining gas amount then. But before long I want Steamhammer to start using resource amounts for strategy calculations, and then it will make a big difference if it uses its illicit access to the true enemy gas reserves. Knowing how much gas was mined is a big clue to what the enemy has done and is doing.

I’m thinking I’ll add a workaround to the bug, so that Steamhammer doesn’t collect information that it should not have.

the joy of inferring tech buildings

The picture is from a test game against the built-in AI. The columns on the right are a new debug display. The game was played to test an unrelated new debug display, which is not showing. Because of a newly-introduced bug (success! I’m introducing new bugs as planned!) Steamhammer had trouble against the built-in AI and did not win until late.

The green unit types on the right are Steamhammer’s units. White is the count of completed units, yellow is the count of uncompleted units. Hmm, 19 uncompleted zerglings, an odd number? That is because 2 zerglings in an egg may not hatch at exactly the same time.

The orange unit types are the enemy units. The white numbers count enemies that have been seen, and the red numbers are inferred enemy buildings. For example, Steamhammer has seen observers, so it knows that there must be a robo fac and an observatory. But wait, why are there a hatchery, a hydra den, and a spawning pool among the inferred enemy buildings?

Ha ha, protoss mind controlled a lurker! It’s there on the list of enemy units. Bug 1: The building inference code doesn’t know about mind control. Bug 2: The code also does not know that lurker research implies a lair.

Bug 1 takes effort to fix in full generality, because you have to remember what units the enemy has mind controlled, and Steamhammer doesn’t track that (or need to for now, it’s rarely important). If enemy protoss mind controlled your carrier, you can’t infer from that carrier that the enemy has a stargate and fleet beacon. If enemy protoss mind controlled an SCV and never mind controlled a marine, then you would like to infer a barracks from seeing a marine. For now I put in a simple workaround: Require an inferred building to belong to the same race as the enemy. That will reduce the errors, at least.

Mind control is a very sharp corner case, it can cut.

Bug 2 I decided also to leave unfixed for now, because there is another bug hiding behind it, related to zerg. Inferring buildings from tech adds complexity. If you have seen 1 enemy hatchery and later see a lurker, implying a lair, does that imply that the enemy now has 1 hatchery and 1 lair? No, the hatchery might have morphed into a lair. In effect, a lair is a hatchery and a hive is a lair, but not the reverse. Similarly if you see a guardian, implying a greater spire. Most tricky.

choosing among tournament designs

I wrote in 2016 about tournament design in general. Today I’ll talk about specific designs and consider Dan Gant’s proposed tournament design.

I use words precisely. A “game” is a single game. A “match” is a sequence of games between the same 2 players. A “pairing” is the choice of which 2 players play a given game or match against each other. (I don’t say “matchup” here, but I reserve the word for race matchups, terran versus zerg and so on.)

With n players there are n * (n - 1) / 2 possible pairings. Each pairing is potentially independent; A might beat all other comers but lose to X, Y, or Z, and you don’t know until they play. One of the goals of a tournament is usually to declare “a winner” even though there may be no unambiguously strongest player—there may be cycles of A beats B beats C beats A, as in the Condorcet paradox. That’s only one reason that tournament design is hard, but it’s enough!

tournament designs

Here are some popular tournament designs. Many more exist. There are also many ways to combine them; these are simple tournament designs that can be plugged together as modules to create complex tournament designs.

Round robin means each player plays each other. The SSCAIT round robin phase is a double round robin where each player plays each other twice, and the CIG and AIIDE academic competitions are similar but with many more rounds. A round robin tournament collects evidence evenly across all pairings, and traditionally it also weights the evidence evenly: Winning a game against a weak competitor is 1 point, winning a game against a strong competitor is 1 point.

Knockout means that losing players are eliminated. In single elimination, whoever loses one game is out; in double elimination, two losses and out; there are higher variants. Knockout tournaments usually feature fixed brackets, where the pairings of winners and losers are laid out beforehand. If the players are seeded 1 to 10, then pairing 1 with 10, 2 with 9 and so on gives an advantage to the higher seeds, so they are more likely to meet in final rounds. A knockout samples few pairings and gives less evidence than other designs that the winner deserves it, or in other words, the luck of the pairing is a big factor.

Grouping the competitors into sub-tournaments can be one stage of a tournament. Pro Starcraft tournaments commonly have a “group stage” where the players are divided into groups of 4 who play among themselves. The top 2 finishers of the 4 move on to the next stage, which has a knockout format. Grouping selects some pairings to gather evidence about, and ignores others. Luck again; did you get the group of death?

Progressive elimination designs come in many shapes, but in general, the tournament runs through steps or stages, and after each step the weakest competitors are dropped. For example, the players play a round robin, then 1 or more of the competitors with the lowest scores are eliminated; slice off those rows and columns from the crosstable. The remaining players keep their surviving results in the crosstable and add to them by playing another round robin, and the process repeats. This kind of design collects some evidence about all pairings, and deliberately collects more about the top competitors; it cares more about the strongest competitors, which have the best chances to win.

The Swiss system attempts to identify a winner with fewer total games. A Swiss tournament has a fixed number of rounds, much less than the number of players, where each player plays one game per round. In the first round, players are normally paired by seeding order (if players are 1 to 10, pair 1 with 6, 2 with 7, on to 5 with 10). In later rounds, players are paired against others with the same score: Winners against winners, losers against losers. If the number of players with a given score is odd, one player has to be paired up or down; there are rules about how to do this fairly given the seeding order. Compared to designs that collect more evidence, a Swiss tournament has a higher chance of players ending up tied at the top. Swiss makes sense when the number of players is large and the number of games per player must be kept small; I would recommend it over 4-player groups in pro Starcraft qualification tournaments (the ones to choose players before the main tournament starts).

matches

Anywhere that I wrote “game” above, the players could play a match of multiple games instead. That’s common in Starcraft knockout tournaments: You are knocked out, or at least knocked down to the loser’s bracket, if you lose a best-of-k match for some odd number k. A k-round round robin can be seen as a single round robin of k-game matches. A match can be taken as a tournament between 2 competitors, and like a bigger tournament, there are different way to design it.

Best of k means that the first player to win over half of k games wins the match. The remaining games may not be played—in Starcraft, they aren’t; in other sports, they sometimes are.

Ahead by k means that the players play games until one of them pulls ahead of the other with a lead of k games. If the players are evenly matched, then the match might continue for a long time, especially if k is big. You collect more evidence to decide a close call, but you don’t know how long it will take.

Statistical decision procedures are also possible, though I’ve never seen them used in a public match—they’re hard to explain to viewers. The match continues until one player is significantly ahead by some chosen statistical criterion, functionally equivalent to, say, “A is at least 67% likely to be better than B given these results.”

the Dan Gant proposal

Dan Gant of purple fame proposes using match scoring rather than game scoring for a round robin tournament. My 2016 tournament design post points out that this would have changed the results of AIIDE 2015. The proposed details, to quote Dan:

To create a better event that’s pure round robin, I think the formula is this:
1. Bots are ranked on how many opponents they had winning matchups against
2. First tiebreaker: Head-to-head record against all tied opponents
3. Second tiebreaker: Overall win%

Dan says “matchup” where I would use “pairing”. I foreshadowed this above when I pointed out that a k-round round robin can be seen as a single round robin of k-game matches. Dan says, instead of adding up the game results to get a competitor’s score, add up the match results. In the SSCAIT round robin phase, k=2, so a bot scores 1 if it defeated an opponent 2-0 and otherwise scores 0. The idea is that we should only care about how many opponents a bot can defeat—to beat more enemies is better, even if you lose occasional games, and he believes that leads to better incentives for bot authors. It’s a reasonable argument. To be a top bot currently, you must score nearly perfectly against most weaker bots, and the effort to get that near-perfect score is not as interesting as the effort to beat your next competitor up. That’s the claim, and it makes sense to me. I can also see that the claim may be more appealing to top bot authors than to others. Steamhammer’s tricks like burrowed zerglings will be countered by top bots soon if not already, and may never be understood by lower-ranked bots, so I expect that the long-run effect will be to score higher against lower-ranked opponents; that doesn’t make the feature less fun or interesting to me.

With a score range of only 0 to 1 per opponent, rather than 0 to k, there will be more tied rankings, so he proposes tiebreakers. With k=2 in SSCAIT, the first tiebreaker amounts to “I had 4 ties and 3 losses, you had 3 ties and 4 losses, so I’m ahead.” With k=100 or more in AIIDE, the first tiebreaker will hardly matter, and only the second tiebreaker will count.

It’s simple to modify the scheme to avoid tied matches, if that’s something you care about. For SSCAIT, instead of a k-game match with k=2, go with a best-of-k match with k=3. Based on the rate of 1-1 matches in this year’s crosstable, I estimate this would require about 10% more games. There would still be ties in the rankings, and breaking the ties might be more complicated, so I don’t consider best-of-k a better choice overall for SSCAIT.

Scoring matches instead of games means throwing away information, at least until tiebreakers are applied. How much information you are throwing away depends on k, so the argument for this design also depends on k. If k is large, you’ll often know which player is better in a given pairing long before all k games are played. Given that you go with match scoring, that is an argument for choosing something other than a fixed k-game match for each pairing.

Different tournaments have different goals. Dan’s proposal is in part calling for the goal of different bot author incentives to be incorporated in tournament goals. It’s perfectly reasonable, and may help the community maintain interest and activity over the long haul. If some tournament organizers agree, then there should be more discussion about what kinds of design best meet the goal. I also think it’s good that we have different tournaments with different goals, so it’s good if not all organizers agree.

If a large academic tournament like CIG or AIIDE accepts the argument, then I have a different proposal. I think a progressive elimination tournament might perhaps meet both academic “let’s measure everything” goals and Dan’s “give authors good incentives” goals. These tournaments have n players with n in the tens and k rounds for k near 100. They are round robin. A progressive elimination design might go like this: Play 50 rounds with all players. Then drop the lower half and play 50 more rounds (so far this is like CIG 2016 except that first stage results should carry over). Repeat once or twice more. The result is fewer total games, all pairings have at least half as much evidence collected, and pairings between top competitors are much more deeply investigated. Lower-ranked competitors were eliminated so detailed results against them do not affect the ranking of the final winners.

Maybe a more gentle form of progressive elimination would be better. I feel it does make sense to sample more data from higher-ranked bots than from the tail, though perhaps not so much more. As far as I can see, some design in a progressive elimination style could meet more goals.

Steamhammer skill kit

As a teaser, here is a code snippet from the Steamhammer 3.0 development version.

void SkillKit::update()
{
    for (Skill * skill : skills)
    {
        if (skill->nextUpdate() <= the.now())
        {
            skill->update();
            if (skill->feasible() && skill->good())
            {
                skill->execute();
            }
        }
    }
}

(the.now() is short for BWAPI::Broodwar->getFrameCount(), the current frame number.)

The skill kit doesn’t actually add skills as such, it adds control of skills; to “execute” a skill means to set flags or post goals or otherwise send orders to the rest of the program to carry out the skill as directed. The skill controller can run every frame if it likes and change its orders at any time. There are two main features. 1. The skill kit is extensible: It is easy to write your own skill controllers as subclasses of Skill and drop them in. This is good for modularity and especially good for people who fork Steamhammer. 2. The skill kit provides an easy interface to the persistent opponent model files. A Skill can include one method to write its data to a string, and one method to parse a string that it wrote back into data, and the skill kit takes care of the I/O plumbing.

As I use the skill kit for more things, Steamhammer will become much more flexible in adapting to its opponents.

Soon: Tournament design is hard.

Steamhammer and bugs

I’m pleased with Steamhammer’s reliability this tournament. I watched all 88 games, and for the first time ever I saw no losses caused by crippling bugs. There were 2 games with terrible play due to bugs, but in both cases Steamhammer recovered and won anyway. There was also a bug that occasionally caused multiple drones to be sent to fail to scout the enemy: Lose one, send another, repeat. That would lose games for sure against an even opponent, but in practice it occurs in games that Steamhammer will lose no matter what. I think I know the cause, and it will be fixed soon.

I surmise that Steamhammer is especially prone to reliability problems because it aspires to do everything. It plays over 150 opening builds of all kinds; failing to adapt to all the misadventures of the opening causes a lot of snags. I count only 3 zerg abilities yet to be implemented: Nydus canals, infested terrans, and lifting off an infested command center, all on my list for coming months. (Drop is technically implemented even though not yet used by zerg. I guess overlord sight range is not implemented either, but it’s trivial if I ever want it.) More features means more bugs; fixing bugs in defiler play was a big time sink last year.

Another source of reliability hitches is my habit of swapping in new plans as fast as I can make them. A number of Steamhammer’s internal modules are half-rewritten, stuck in the middle of a transition from an old design that is not flexible enough to a new design that is not finished enough. It would be more efficient to complete one task before moving on to the next. But then I would be making progress on only one front at a time, and it wouldn’t be as much fun. I like the variety.

Anyway, after burnishing the 2.x versions for over a year, I’ve remedied almost all of the worst bugs. It is past time to get back to major features and structural work, so that I can add a new round of bugs. To symbolize that, I’ll be calling the next version 3.0 even though it doesn’t in reality have any new feature worthy of a major version number. It will in time.

SSCAIT round robin is over

And that’s it, the SSCAIT 2019 round robin phase is complete. The last game was #40 CUNYBot by Bryan Weber > #43 Marine Hell.

We have #1 Locutus, #2 PurpleWave, #3 BetaStar. Oldtimers with little or no recent development that qualified for the round of 16 are #5-6 Iron (tied with BananaBrain), #12 Killerbot by Marian Devecka, and #15-16 Bereaver. The last bots to qualify were #15-16 TyrProtoss and Bereaver, and the first to miss out was #17 StyxZ with 2 wins fewer. Arrakhammer that I thought would be a close call fell several places and tied with Skynet by Andrew Smith and XIMP by Tomas Vajda for places #19-21 (not bad company). I think this is the first time XIMP has ever failed to reach the finals; the field has finally overtopped it.

Top terran is #4 Halo by Hao Pan. Top zerg is #9 Microwave. I want to call out #10 Proxy and #13 MadMixP as doing particularly well.

Good work, all! As far as I am concerned, the round robin is the main tournament and the finals are lagniappe. Still, there’s more to look forward to.

Steamhammer in the SSCAIT elimination phase

As I write, the SSCAIT round robin phase is close to its end, and though the top 2 places are undecided, most of the top 16 ranks are either mathematically fixed or highly unlikely to change. I am able to forecast part of Steamhammer’s path through the final elimination phase with only moderate risk of error.

Steamhammer will finish #11, the same as last year. It’s the middle of the range #10-#12 that I predicted on 29 December.

BananaBrain and Iron are tied for #5-#6. I don’t know how the tie will be broken for pairing purposes, because they scored 1-1 against each other. Perhaps it will be broken randomly. It matters for Steamhammer, because if the pairings work the same as last year, #11 Steamhammer will be paired against #6.

suppose #6 Iron

A stroke of luck. #11 Steamhammer 3-0 #6 Iron in the first round, or at worst 3-1, the same round 1 result as last year. In the second round, Steamhammer should face #3 BetaStar, which will win crushingly, still the same result, pushing Steamhammer into loser’s bracket round 2. There, Steamhammer is likely to face #16 TyrProtoss, which will be a close match that I can’t call. In the last two weeks on BASIL, the score is Steamhammer 13-11 TyrProtoss. In the SSCAIT round robin, the score is Steamhammer 1-1 TyrProtoss. Last year, this is that match that eliminated Steamhammer. If Steamhammer does win, surviving longer than last year, its next opponent in the loser’s round 3 is harder to forecast, and the match could again go either way—though I think Steamhammer is more likely to drop out here than to go on. It’s funny that this path is so nearly the same as last year, and maybe longer even though Steamhammer is probably relatively weaker.

suppose #6 BananaBrain

Though Steamhammer 2-0 BananaBrain in the round robin thanks to my preparation and/or luck, BananaBrain is likely to win this match. BASIL shows Steamhammer 2-14 BananaBrain. In this case, Steamhammer is likely kicked out at once.

Yeah, elimination tournaments are very sensitive to pairings and lucky results. Which means that something entirely different could yet happen.

experience with Steamhammer’s burrowed zerglings

Steamhammer’s ostensible Killer Feature or tournament surprise was burrowed zerglings to block the opponent from taking nearby bases. The lings are assigned to Watch squads to watch the bases, and burrow there as soon as burrow is researched. How did it work out? Do the Watch squads contribute to Steamhammer’s strength?

My conclusion: Yes. The cost is the expense of researching Burrowing (100/100) plus the assignment of up to 4 zerglings to watch enemy bases, taking them from other duties. It’s reasonably low. To pay that back, burrowed zerglings have to gain intelligence or delay enemy expansions. A few opponents, like Tscmoo, know what to do about burrowed zerglings and send detection and force. Even so, the intended expansion is delayed, and Tscmoo then starts the base immediately so that the zergling’s residual vision sees it and Steamhammer can react instantly. It’s already worth the cost. Other opponents treat a burrowed zergling the same as a spider mine, which is not always astute. XIMP sends a zealot to “trigger the mine” with no effect; the burrowed zergling continues to block the base until carriers fly over with an observer. It’s no better when terran scans and tries to kill the “mine” with an SCV. Some opponents don’t react and can’t deliberately expand at all; they are stuck until the zergling is removed by mischance. In many cases, I think Steamhammer would benefit by researching burrow earlier. There is virtually no effect against zerg, because by default the strategy boss researches burrow when Steamhammer has 3 bases, and that is late game in a ZvZ.

A couple issues limit effectiveness. First, the zergling unburrows when it knows that it is detected—and then it forgets and reburrows at once, going into a loop. The intention was to stand up and fight or flee when necessary, but there was a bug. Second, in an emergency when all other zerglings are killed, the last watching zergling unburrows and races home to join in the defense. But one zergling running in from afar barely affects the fight, and the enemy becomes free to expand without hindrance. After seeing games, I concluded that it will be more effective to leave the last watcher in place. Both issues are fixed in the development version. Overall, this is a low rate of errors compared to other new features I’ve made; care in testing paid off.

Against some opponents, the Watch squads do little good and represent mostly cost. The best example is Tyr protoss, which against Steamhammer plays one-base cannon defense into timing attack. When the attack comes, it includes an observer and the oncoming steamroller incidentally rolls up the zergling burrowed in the natural. Steamhammer pays the cost, and Tyr bypasses it with no effort. Blocking later bases sometimes helps, though. Another example is Iron, which likes to build turrets before it starts the command center. The burrowed zergling at most gets to see a turret or two. At some point I will put the Watch squads into the upcoming opponent model skill system and have Steamhammer collect data on how well they work. That will help Steamhammer research burrow at the right timing, as late as possible while effective against this opponent, and never if it is never effective.

I have more uses in mind for burrow. In particular, I want to use it to protect drones from some attacks. Reaver drops usually don’t include an observer, so when the reaver lands, the drones should poof out of sight before they can be targeted. When the reaver is picked up, the drones casually reappear and continue work. That will be funny to watch.

RPS analyzer and game solver

Regardless of any other adaptation skills you may have, if you can predict your enemy’s opening build, you can better counter it. But it’s not simple. Steamhammer tries to distinguish between opponents that play a fixed build, and those that vary their openings. For those that vary their openings, there is much more: Some choose randomly. Some like to repeat the opening that won the last game, switching only on loss. Some stick with an opening that wins more than a given percentage. Some try openings systematically, “A didn’t work, B is next.” Some choose randomly with some probability distribution over the more successful choices. Sometimes openings are treated as black boxes distinguished only by their names, sometimes as strategies which are understood to counter other strategies (Steamhammer does both at different times).

I am wondering whether it makes sense to write a rock-paper-scissors analyzer that tries to tease out exploitable patterns in the opponent’s behavior (there are established techniques), and combine it with a game solver to make better initial opening choices. On the one hand, many bots have exploitable patterns that I know about. If an RPS analyzer can find the patterns too, Steamhammer might seem to gain the mysterious “star sense” to always play the right thing for no visible reason. On the other hand, it’s relatively easy to reduce your exploitability to a low level—use randomness wisely. Also, as Steamhammer gains skills to adapt its strategy during the game, the initial opening choices make less difference. The gain might be little. And by trying to exploit patterns, Steamhammer could itself become more exploitable; it might backfire.

The parts of the system would be:

1. Classify the enemy build. Steamhammer already does this, though it needs improvement.

2. Statistically analyze the sequence of (win/loss, my opening, your opening) under the assumption that the opponent is trying to counter what we’re doing. Knowing what-counters-what may factor in. The output should be a probability distribution over opening classes, “what are they likely to do?”

3. Knowing what-counters-what definitely factors in here: Solve the game. We start with a prior probability of winning for each of our openings against each opening class the opponent might play, and thanks to Bayes we can update it as we learn about the opponent. That gives us a game matrix with uncertainties in the payoffs. (Since Steamhammer knows a huge number of opening builds, making the game matrix too big to fill in, I would classify Steamhammer’s openings too so that the output only decides which class of opening to play.) Without an RPS analyzer, we can solve the game (I expect I would use a Monte Carlo method to handle the uncertainties) and play not far from a Nash equilibrium (i.e., almost perfectly assuming unexploitable play from the opponent). If an RPS analyzer can make a good estimate of the opponent’s plans, in the best case we can do better: We can exploit the opponent’s deviation from the Nash equilibrium to win more than if we played to a Nash equilibrium ourselves.

It’s unclear to me whether either the RPS analyzer or the game solver is worth the effort. Does anybody have an opinion? Perhaps some bot I haven’t looked at has similar ideas?

inferring enemy tech buildings

You see an enemy unit. Sometimes it tells you what else the enemy has; for example, if you see a marine, you know there is a barracks even if you haven’t seen it. Drawing the inferences makes it easier to figure out the opponent’s plan. I thought it would be trivial to calculate the inferences, since BWAPI provides UnitType::requiredUnits() that tells what you need to create a unit of a given type. But it took me some head scratching; inference does not equal creation in reverse. Here’s what I ended up with. I edited the code slightly for readability; the real version is a little more obscure.

The code should be close to correct, and it works in tests so far, but I won’t be surprised if I’ve missed cases. There are a lot of cases.

// Trace back the tech tree to see what tech buildings the enemy required in order to have
// produced a unit of type t. Record any requirements that we haven't yet seen.
// Don't record required units. An SCV is required for a barracks, but don't count the SCV.
// A hydralisk is required for a lurker, but it is used up in the making.
void PlayerSnapshot::inferUnseenRequirements(const PlayerSnapshot & ever, BWAPI::UnitType t)
{
    if (t == BWAPI::UnitTypes::Zerg_Larva)
    {
        // Required because BWAPI believes that a larva costs 1 gas.
        return;
    }
    std::map<BWAPI::UnitType, int> requirements = t.requiredUnits();
    if (t == BWAPI::UnitTypes::Terran_Vulture_Spider_Mine)
    {
        requirements[BWAPI::UnitTypes::Terran_Factory] = 1;
        requirements[BWAPI::UnitTypes::Terran_Machine_Shop] = 1;
    }
    if (t.gasPrice() > 0)
    {
        requirements[the.enemyRace.getRefinery()] = 1;
    }
    for (std::pair<BWAPI::UnitType, int> requirement : requirements)
    {
        BWAPI::UnitType requiredType = requirement.first;
        if (!ever.contains(requiredType))
        {
            if (requiredType.isBuilding() && !UnitUtil::BuildingIsMorphedFrom(requiredType, t))
            {
                unitExists[requiredType] = true;
            }
            inferUnseenRequirements(ever, requiredType);
        }
    }
}

The code recursively traces back all the requirements to the root of the tech tree, if necessary. If you see an arbiter, you may learn about the stargate, the arbiter tribunal, the templar archives, the citadel, the core, and the gateway. In practice, an early scout which is turned back by a dragoon and sees nothing else concludes that protoss has a gateway, assimilator, and cyber core. It should infer a reaver from a scarab and a carrier from an interceptor. An opponent plan recognition rule does not have to say “if they have a cyber core, or some unit that requires a cyber core...” it can say “if they have a seen or inferred cyber core....”

The parameter ever remembers what enemy unit types we have ever directly seen, at any time in the game, even if all the examples have been destroyed since. That is because, if you see a vulture, all you know is that at some point in the game a factory existed to make the vulture. You have to remember. ever is filled in at the beginning of the game (or at the latest when we see the first enemy, if they went random) with the units that exist at the beginning of the game: The resource depot and worker types, plus for zerg, the larva and overlord types. That prevents loops: According to UnitType::requiredUnits(), it takes a nexus to create a probe and it takes a probe to create a nexus. True, but you have to ground the recursion!

Notice the bit !UnitUtil::BuildingIsMorphedFrom(requiredType, t): A lair requires a hatchery, but that doesn’t mean that when you see a lair you can infer a hatchery somewhere on the map. You can only infer the spawning pool. I think this serves essentially the same purpose as UnitType::isSuccessorOf() in BWAPI 4.2.0, but I haven’t switched yet. Non-building units are tricky to infer without isSuccessorOf(), so I didn’t. I don’t try to infer the vulture from the spider mine. A sunken colony requires a creep colony, but the creep colony disappears. An archon requires 2 high templar, ditto.

I filled in a couple of special cases. Spider mines: As far as UnitType::requiredUnits() is concerned, even though spider mines are units, they do not require units to create but only tech. Gas: If we see anything that requires gas, infer a refinery building. It may help early in the game. I haven’t tried to draw inferences from tech. You should be able to infer a control tower from seeing a cloaked wraith, an academy from seeing a scan occur, a lair from seeing a lurker, and so on. But one step at a time!

SSCAIT halfway point

The SSCAIT 2019 round robin stage is half finished, so it is time to take stock.

There remain few surprises. Among the top 16, Microwave has been doing well to hold #8 so far, putting it in the top half of the finals bracket, an advantage. TyrProtoss, Killerbot by Marian Devecka, Xiao Yi, and Arrakhammer are at places 14-17 with nearly equal winning percentages. Xiao Yi and Arrakhammer in particular are virtually tied. It’s likely that one of the 4 will draw the short straw and miss the finals.

For most improved, I vote MadMixP (because StyxZ made its improvement earlier). MadMix introduced a new cannon contain opening which has been tripping up opponents, including Steamhammer. I am pleased with the progress of Simplicity, which is growing stronger and more well-rounded. Ecgberht is tricky, not strong on fighting but still dangerous.

Former champion and benchmark player IceBot is below the 50% mark. Until February 2015, it was the strongest bot on SSCAIT. Other old school champions like XIMP by Tomas Vajda and UAlbertaBot by Dave Churchill are less robust and are ranked lower yet. Onward!