tournaments - 10 | Starcraft AI blog

looking ahead to AIST S3

AIST S3 is coming up fast, with play to start on 1 March. As I expected, I did not register Steamhammer (the proxy skills are looking fun, though). Events coincided to leave me extremely busy over the last ten days or so.

Here are the registered players. I sorted them by BASIL elo, so we can take the table as a first guess at the likely winners. Of course participants are likely to have special tournament updates, so it’s only a guess.

race	bot	elo
protoss	PurpleWave	2878
protoss	Locutus	2805
zerg	Microwave	2624
protoss	BananaBrain	2610
terran	Dragon	2483
terran	WillyT	2358
terran	LetaBot	2255
zerg	McRave	2153

McRave chooses to participate as zerg, which is interesting: first, that he chose to play, and second, that he chose to play offrace. The sensational news is that LetaBot signed up! That implies that LetaBot is updated to BWAPI 4.4.0, and suggests that the long-awaited bug fixing work may be complete. If so, this version of LetaBot may be much stronger than the BASIL elo of the old version indicates. I’m looking forward to it!

As usual in recent years, protoss is on top, terran struggles, and zerg is scattered around. We’ll see whether the tournament results agree with that.

8 is a power of 2, so 8 players are a good number for an elimination tournament. AIST S1 had 5 players and S2 had 10, so byes had to be inserted into the pairings. None of that should be needed this time.

The maps for S3 are (2)Overwatch, (2)Tres Pass, (3)Power Bond, (4)Circuit Breaker, (4)Fighting Spirit, and (4)Gladiator. I linked the unfamiliar maps; bots have played on the others before and should be ready. None of the maps has difficult features that call for special-case coding, which must be a relief to participants. The strangest feature is that Tres Pass has a short air distance between the 2 bases and a long ground distance, which is not so strange at all. Power Bond does have a neutral command center on the center platform, which zerg could purloin with a queen (and land elsewhere to make infested terrans), but I doubt either zerg has the skills.

Since Steamhammer is not playing, may the second best bot win!

how to defile

We have the next round of SSCAIT results. I was hoping for Killerbot by Marian Devecka over ZNZZBot, because Steamhammer is a favorite over Killerbot. But ZNZZBot squeaked a win by crash. Steamhammer will face ZNZZ in loser’s round 3 and almost certainly be eliminated. Oh well.

one little plague

A highlight from Steamhammer-Icebot on Fighting Spirit. In the old days, Steamhammer would have finished a winning game like this by brute force, with mass ultralisks or mass guardians. The actual ending did feature brute force, but more elegantly applied.

Dark swarm ensured that zerglings would break the front. Rubble of bunkers and turrets lies everywhere. Tanks rain splash damage from above, but as zerg reduced the natural, the defiler (selected) threw a plague over every tank. When the mutalisks visited them, the tanks popped like so many bubbles.

Very satisfying. All that work on defilers was worth it.

choosing among tournament designs

I wrote in 2016 about tournament design in general. Today I’ll talk about specific designs and consider Dan Gant’s proposed tournament design.

I use words precisely. A “game” is a single game. A “match” is a sequence of games between the same 2 players. A “pairing” is the choice of which 2 players play a given game or match against each other. (I don’t say “matchup” here, but I reserve the word for race matchups, terran versus zerg and so on.)

With n players there are n * (n - 1) / 2 possible pairings. Each pairing is potentially independent; A might beat all other comers but lose to X, Y, or Z, and you don’t know until they play. One of the goals of a tournament is usually to declare “a winner” even though there may be no unambiguously strongest player—there may be cycles of A beats B beats C beats A, as in the Condorcet paradox. That’s only one reason that tournament design is hard, but it’s enough!
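As a small worked example of the pairing count, here is a minimal C++ sketch. The function names `pairingCount` and `allPairings` are made up for illustration.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Number of distinct pairings among n players: n * (n - 1) / 2.
long long pairingCount(long long n)
{
    assert(n >= 0);
    return n * (n - 1) / 2;
}

// List every pairing (i, j) with i < j, numbering the players from 0.
std::vector<std::pair<int, int>> allPairings(int n)
{
    std::vector<std::pair<int, int>> pairings;
    for (int i = 0; i < n; ++i)
        for (int j = i + 1; j < n; ++j)
            pairings.emplace_back(i, j);
    return pairings;
}
```

With 8 players, `pairingCount(8)` is 28, and `allPairings(8)` lists the same 28 pairs. A single round robin plays each of them once.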

tournament designs

Here are some popular tournament designs. Many more exist. There are also many ways to combine them; these are simple tournament designs that can be plugged together as modules to create complex tournament designs.

Round robin means each player plays each other. The SSCAIT round robin phase is a double round robin where each player plays each other twice, and the CIG and AIIDE academic competitions are similar but with many more rounds. A round robin tournament collects evidence evenly across all pairings, and traditionally it also weights the evidence evenly: Winning a game against a weak competitor is 1 point, winning a game against a strong competitor is 1 point.

Knockout means that losing players are eliminated. In single elimination, whoever loses one game is out; in double elimination, two losses and out; there are higher variants. Knockout tournaments usually feature fixed brackets, where the pairings of winners and losers are laid out beforehand. If the players are seeded 1 to 10, then pairing 1 with 10, 2 with 9 and so on gives an advantage to the higher seeds, so they are more likely to meet in final rounds. A knockout samples few pairings and gives less evidence than other designs that the winner deserves it, or in other words, the luck of the pairing is a big factor.

Grouping the competitors into sub-tournaments can be one stage of a tournament. Pro Starcraft tournaments commonly have a “group stage” where the players are divided into groups of 4 who play among themselves. The top 2 finishers of the 4 move on to the next stage, which has a knockout format. Grouping selects some pairings to gather evidence about, and ignores others. Luck again; did you get the group of death?

Progressive elimination designs come in many shapes, but in general, the tournament runs through steps or stages, and after each step the weakest competitors are dropped. For example, the players play a round robin, then 1 or more of the competitors with the lowest scores are eliminated; slice off those rows and columns from the crosstable. The remaining players keep their surviving results in the crosstable and add to them by playing another round robin, and the process repeats. This kind of design collects some evidence about all pairings, and deliberately collects more about the top competitors; it cares more about the strongest competitors, which have the best chances to win.

The Swiss system attempts to identify a winner with fewer total games. A Swiss tournament has a fixed number of rounds, far fewer than the number of players, where each player plays one game per round. In the first round, players are normally paired by seeding order (if players are 1 to 10, pair 1 with 6, 2 with 7, on to 5 with 10). In later rounds, players are paired against others with the same score: Winners against winners, losers against losers. If the number of players with a given score is odd, one player has to be paired up or down; there are rules about how to do this fairly given the seeding order. Compared to designs that collect more evidence, a Swiss tournament has a higher chance of players ending up tied at the top. Swiss makes sense when the number of players is large and the number of games per player must be kept small; I would recommend it over 4-player groups in pro Starcraft qualification tournaments (the ones to choose players before the main tournament starts).
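The Swiss pairing rules can be sketched in C++ roughly as follows. This is a simplified illustration, not any federation's official procedure. The names `Player`, `swissFirstRound`, and `swissLaterRound` are invented here. The later-round function simply sorts by score and pairs neighbors, which pairs equal scores where it can and pairs an odd player down. It assumes an even number of players and ignores real-world rules such as avoiding rematches.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// First round: pair the top half of the seeds against the bottom half in
// order (1 with 6, 2 with 7, ... for 10 players). Seeds are numbered from 1.
std::vector<std::pair<int, int>> swissFirstRound(int nPlayers)
{
    assert(nPlayers >= 0 && nPlayers % 2 == 0);  // byes are left out of this sketch
    const int half = nPlayers / 2;
    std::vector<std::pair<int, int>> pairings;
    for (int seed = 1; seed <= half; ++seed)
        pairings.emplace_back(seed, seed + half);
    return pairings;
}

struct Player
{
    int seed;   // 1 is the top seed
    int score;  // wins so far
};

// Later rounds (simplified): sort by score, break ties by seed, then pair
// neighbors. Players with equal scores meet where possible; when a score
// group has an odd size, its last player is paired down into the next group.
std::vector<std::pair<int, int>> swissLaterRound(std::vector<Player> players)
{
    assert(players.size() % 2 == 0);
    std::sort(players.begin(), players.end(),
              [](const Player& a, const Player& b) {
                  if (a.score != b.score) return a.score > b.score;
                  return a.seed < b.seed;
              });
    std::vector<std::pair<int, int>> pairings;
    for (std::size_t i = 0; i + 1 < players.size(); i += 2)
        pairings.emplace_back(players[i].seed, players[i + 1].seed);
    return pairings;
}
```

For 4 players where seeds 1 and 3 won their first games, `swissLaterRound` pairs 1 with 3 and 2 with 4: winners against winners, losers against losers.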

matches

Anywhere that I wrote “game” above, the players could play a match of multiple games instead. That’s common in Starcraft knockout tournaments: You are knocked out, or at least knocked down to the loser’s bracket, if you lose a best-of-k match for some odd number k. A k-round round robin can be seen as a single round robin of k-game matches. A match can be taken as a tournament between 2 competitors, and like a bigger tournament, there are different ways to design it.

Best of k means that the first player to win over half of k games wins the match. The remaining games may not be played—in Starcraft, they aren’t; in other sports, they sometimes are.

Ahead by k means that the players play games until one of them pulls ahead of the other with a lead of k games. If the players are evenly matched, then the match might continue for a long time, especially if k is big. You collect more evidence to decide a close call, but you don’t know how long it will take.
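The best-of-k and ahead-by-k rules are easy to state in code. Here is a minimal C++ sketch. The names `MatchResult`, `bestOfK`, and `aheadByK` are illustrative, and a game sequence is represented as a list of booleans where `true` means player A won that game.

```cpp
#include <cassert>
#include <vector>

enum class MatchResult { AWins, BWins, Undecided };

// Best of k: the first player to win more than half of k games wins the
// match. Games after the deciding one are ignored (not played).
MatchResult bestOfK(const std::vector<bool>& games, int k)
{
    assert(k > 0 && k % 2 == 1);  // k is odd so the match cannot end tied
    const int needed = k / 2 + 1;
    int winsA = 0;
    int winsB = 0;
    for (bool aWon : games) {
        if (aWon) ++winsA; else ++winsB;
        if (winsA == needed) return MatchResult::AWins;
        if (winsB == needed) return MatchResult::BWins;
    }
    return MatchResult::Undecided;  // not enough games yet
}

// Ahead by k: play until one player leads by k games. The match length is
// unbounded, so the result stays Undecided until a lead of k appears.
MatchResult aheadByK(const std::vector<bool>& games, int k)
{
    assert(k > 0);
    int lead = 0;  // A's wins minus B's wins
    for (bool aWon : games) {
        lead += aWon ? 1 : -1;
        if (lead == k) return MatchResult::AWins;
        if (lead == -k) return MatchResult::BWins;
    }
    return MatchResult::Undecided;
}
```

The sequence win, loss, win, loss leaves an ahead-by-2 match undecided after 4 games, while a best-of-3 was already over after the third game.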

Statistical decision procedures are also possible, though I’ve never seen them used in a public match—they’re hard to explain to viewers. The match continues until one player is significantly ahead by some chosen statistical criterion, functionally equivalent to, say, “A is at least 67% likely to be better than B given these results.”
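One way to make such a criterion concrete is a Bayesian stopping rule. The sketch below is only one possible choice, under assumptions of my own: games are independent, A has a fixed unknown win rate against B, and the prior on that rate is uniform, so the posterior is a Beta distribution. The function names are hypothetical, and the integral is done by a plain midpoint rule rather than a library call.

```cpp
#include <cassert>
#include <cmath>

// Log of the unnormalized posterior density p^winsA * (1 - p)^winsB.
// Terms with a zero exponent are skipped so that log(0) never appears.
double logDensity(double p, int winsA, int winsB)
{
    double result = 0.0;
    if (winsA > 0) result += winsA * std::log(p);
    if (winsB > 0) result += winsB * std::log(1.0 - p);
    return result;
}

// Posterior probability that A's win rate against B is above 50%.
// ASSUMPTION: uniform prior and independent games (see the lead-in).
double probabilityABetter(int winsA, int winsB)
{
    assert(winsA >= 0 && winsB >= 0);
    const int games = winsA + winsB;
    const double mode = games > 0 ? static_cast<double>(winsA) / games : 0.5;
    const double logPeak = logDensity(mode, winsA, winsB);  // rescale to avoid underflow
    const int steps = 200000;
    double above = 0.0;
    double total = 0.0;
    for (int i = 0; i < steps; ++i) {
        const double p = (i + 0.5) / steps;  // midpoint of the i-th slice of [0, 1]
        const double weight = std::exp(logDensity(p, winsA, winsB) - logPeak);
        total += weight;
        if (p > 0.5) above += weight;
    }
    return above / total;
}

// The match stops once either player is at least `threshold` likely to be better.
bool matchDecided(int winsA, int winsB, double threshold)
{
    const double pA = probabilityABetter(winsA, winsB);
    return pA >= threshold || 1.0 - pA >= threshold;
}
```

Under these assumptions a single win already gives a probability of 75%, so a 67% criterion on its own is a loose one. A practical rule would likely use a higher threshold or a minimum number of games.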

the Dan Gant proposal

Dan Gant of purple fame proposes using match scoring rather than game scoring for a round robin tournament. My 2016 tournament design post points out that this would have changed the results of AIIDE 2015. The proposed details, to quote Dan:

To create a better event that’s pure round robin, I think the formula is this:
1. Bots are ranked on how many opponents they had winning matchups against
2. First tiebreaker: Head-to-head record against all tied opponents
3. Second tiebreaker: Overall win%
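To make the three steps concrete, here is a minimal C++ sketch of a ranking along these lines. It is my reading of the quoted formula, not Dan's code. In particular, I assume "head-to-head record against all tied opponents" means net game wins against the bots that have the same number of match wins, and that a pairing with equal game wins counts for neither side. The names `Crosstable` and `rankByMatches` are invented.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// wins[i][j] is the number of games bot i won against bot j.
using Crosstable = std::vector<std::vector<int>>;

// Returns bot indexes ordered from first place to last place.
std::vector<int> rankByMatches(const Crosstable& wins)
{
    const std::size_t n = wins.size();
    std::vector<int> matchWins(n, 0);
    std::vector<double> winRate(n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        assert(wins[i].size() == n);  // the crosstable must be square
        int won = 0;
        int played = 0;
        for (std::size_t j = 0; j < n; ++j) {
            if (i == j) continue;
            if (wins[i][j] > wins[j][i]) ++matchWins[i];  // 1. opponents beaten
            won += wins[i][j];
            played += wins[i][j] + wins[j][i];
        }
        winRate[i] = played > 0 ? static_cast<double>(won) / played : 0.0;
    }

    // 2. Head-to-head: net game wins against bots tied on match wins (assumed reading).
    std::vector<int> headToHead(n, 0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            if (i != j && matchWins[i] == matchWins[j])
                headToHead[i] += wins[i][j] - wins[j][i];

    std::vector<int> order(n);
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(), [&](int a, int b) {
        if (matchWins[a] != matchWins[b]) return matchWins[a] > matchWins[b];
        if (headToHead[a] != headToHead[b]) return headToHead[a] > headToHead[b];
        return winRate[a] > winRate[b];  // 3. overall win%
    });
    return order;
}
```

Take 3 bots with 10 games per pairing: bot 0 beats both others 6-4, and bot 1 beats bot 2 by 10-0. Game scoring puts bot 1 first with 14 wins to bot 0's 12. Match scoring puts bot 0 first because it beat 2 opponents to bot 1's 1.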

Dan says “matchup” where I would use “pairing”. I foreshadowed this above when I pointed out that a k-round round robin can be seen as a single round robin of k-game matches. Dan says, instead of adding up the game results to get a competitor’s score, add up the match results. In the SSCAIT round robin phase, k=2, so a bot scores 1 if it defeated an opponent 2-0 and otherwise scores 0. The idea is that we should only care about how many opponents a bot can defeat—to beat more enemies is better, even if you lose occasional games, and he believes that leads to better incentives for bot authors. It’s a reasonable argument. To be a top bot currently, you must score nearly perfectly against most weaker bots, and the effort to get that near-perfect score is not as interesting as the effort to beat your next competitor up. That’s the claim, and it makes sense to me. I can also see that the claim may be more appealing to top bot authors than to others. Steamhammer’s tricks like burrowed zerglings will be countered by top bots soon if not already, and may never be understood by lower-ranked bots, so I expect that the long-run effect will be to score higher against lower-ranked opponents; that doesn’t make the feature less fun or interesting to me.

With a score range of only 0 to 1 per opponent, rather than 0 to k, there will be more tied rankings, so he proposes tiebreakers. With k=2 in SSCAIT, the first tiebreaker amounts to “I had 4 ties and 3 losses, you had 3 ties and 4 losses, so I’m ahead.” With k=100 or more in AIIDE, the first tiebreaker will hardly matter, and only the second tiebreaker will count.

It’s simple to modify the scheme to avoid tied matches, if that’s something you care about. For SSCAIT, instead of a k-game match with k=2, go with a best-of-k match with k=3. Based on the rate of 1-1 matches in this year’s crosstable, I estimate this would require about 10% more games. There would still be ties in the rankings, and breaking the ties might be more complicated, so I don’t consider best-of-k a better choice overall for SSCAIT.

Scoring matches instead of games means throwing away information, at least until tiebreakers are applied. How much information you are throwing away depends on k, so the argument for this design also depends on k. If k is large, you’ll often know which player is better in a given pairing long before all k games are played. Given that you go with match scoring, that is an argument for choosing something other than a fixed k-game match for each pairing.

Different tournaments have different goals. Dan’s proposal is in part calling for the goal of different bot author incentives to be incorporated in tournament goals. It’s perfectly reasonable, and may help the community maintain interest and activity over the long haul. If some tournament organizers agree, then there should be more discussion about what kinds of design best meet the goal. I also think it’s good that we have different tournaments with different goals, so it’s good if not all organizers agree.

If a large academic tournament like CIG or AIIDE accepts the argument, then I have a different proposal. I think a progressive elimination tournament might perhaps meet both academic “let’s measure everything” goals and Dan’s “give authors good incentives” goals. These tournaments have n players with n in the tens and k rounds for k near 100. They are round robin. A progressive elimination design might go like this: Play 50 rounds with all players. Then drop the lower half and play 50 more rounds (so far this is like CIG 2016 except that first stage results should carry over). Repeat once or twice more. The result is fewer total games, all pairings have at least half as much evidence collected, and pairings between top competitors are much more deeply investigated. Lower-ranked competitors were eliminated so detailed results against them do not affect the ranking of the final winners.
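To check the "fewer total games" claim, here is a short C++ sketch that counts games for the plain and progressive designs. The function names are hypothetical. I assume the field is halved after each stage, rounding up when the count is odd, and the field size of 20 in the example is made up.

```cpp
#include <cassert>

// Games in a round robin: one game per pairing per round.
long long roundRobinGames(long long players, long long rounds)
{
    assert(players >= 0 && rounds >= 0);
    return rounds * players * (players - 1) / 2;
}

// Games in a progressive elimination tournament that plays a fixed number of
// rounds per stage and drops the lower half of the field after each stage.
long long progressiveGames(long long players, long long roundsPerStage, int stages)
{
    assert(players >= 2 && stages >= 1);
    long long total = 0;
    for (int stage = 0; stage < stages && players >= 2; ++stage) {
        total += roundRobinGames(players, roundsPerStage);
        players = (players + 1) / 2;  // keep the upper half, rounding up if odd
    }
    return total;
}
```

With 20 players, a plain 100-round round robin takes 19000 games. Three stages of 50 rounds with fields of 20, 10, and 5 take 9500 + 2250 + 500 = 12250 games, and each pairing among the final 5 gets 150 games instead of 100.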

Maybe a more gentle form of progressive elimination would be better. I feel it does make sense to sample more data from higher-ranked bots than from the tail, though perhaps not so much more. As far as I can see, some design in a progressive elimination style could meet more goals.

SSCAIT round robin is over

And that’s it, the SSCAIT 2019 round robin phase is complete. The last game was #40 CUNYBot by Bryan Weber > #43 Marine Hell.

We have #1 Locutus, #2 PurpleWave, #3 BetaStar. Oldtimers with little or no recent development that qualified for the round of 16 are #5-6 Iron (tied with BananaBrain), #12 Killerbot by Marian Devecka, and #15-16 Bereaver. The last bots to qualify were #15-16 TyrProtoss and Bereaver, and the first to miss out was #17 StyxZ with 2 wins fewer. Arrakhammer, which I thought would be a close call, fell several places and tied with Skynet by Andrew Smith and XIMP by Tomas Vajda for places #19-21 (not bad company). I think this is the first time XIMP has ever failed to reach the finals; the field has finally overtopped it.

Top terran is #4 Halo by Hao Pan. Top zerg is #9 Microwave. I want to call out #10 Proxy and #13 MadMixP as doing particularly well.

Good work, all! As far as I am concerned, the round robin is the main tournament and the finals are lagniappe. Still, there’s more to look forward to.

Steamhammer in the SSCAIT elimination phase

As I write, the SSCAIT round robin phase is close to its end, and though the top 2 places are undecided, most of the top 16 ranks are either mathematically fixed or highly unlikely to change. I am able to forecast part of Steamhammer’s path through the final elimination phase with only moderate risk of error.

Steamhammer will finish #11, the same as last year. It’s the middle of the range #10-#12 that I predicted on 29 December.

BananaBrain and Iron are tied for #5-#6. I don’t know how the tie will be broken for pairing purposes, because they scored 1-1 against each other. Perhaps it will be broken randomly. It matters for Steamhammer, because if the pairings work the same as last year, #11 Steamhammer will be paired against #6.

suppose #6 Iron

A stroke of luck. #11 Steamhammer 3-0 #6 Iron in the first round, or at worst 3-1, the same round 1 result as last year. In the second round, Steamhammer should face #3 BetaStar, which will win crushingly, still the same result, pushing Steamhammer into loser’s bracket round 2. There, Steamhammer is likely to face #16 TyrProtoss, which will be a close match that I can’t call. In the last two weeks on BASIL, the score is Steamhammer 13-11 TyrProtoss. In the SSCAIT round robin, the score is Steamhammer 1-1 TyrProtoss. Last year, this was the match that eliminated Steamhammer. If Steamhammer does win, surviving longer than last year, its next opponent in the loser’s round 3 is harder to forecast, and the match could again go either way, though I think Steamhammer is more likely to drop out here than to go on. It’s funny that this path is so nearly the same as last year, and maybe longer even though Steamhammer is probably relatively weaker.

suppose #6 BananaBrain

Though Steamhammer 2-0 BananaBrain in the round robin thanks to my preparation and/or luck, BananaBrain is likely to win this match. BASIL shows Steamhammer 2-14 BananaBrain. In this case, Steamhammer is likely kicked out at once.

Yeah, elimination tournaments are very sensitive to pairings and lucky results. Which means that something entirely different could yet happen.

SSCAIT halfway point

The SSCAIT 2019 round robin stage is half finished, so it is time to take stock.

There remain few surprises. Among the top 16, Microwave has been doing well to hold #8 so far, putting it in the top half of the finals bracket, an advantage. TyrProtoss, Killerbot by Marian Devecka, Xiao Yi, and Arrakhammer are at places 14-17 with nearly equal winning percentages. Xiao Yi and Arrakhammer in particular are virtually tied. It’s likely that one of the 4 will draw the short straw and miss the finals.

For most improved, I vote MadMixP (because StyxZ made its improvement earlier). MadMix introduced a new cannon contain opening which has been tripping up opponents, including Steamhammer. I am pleased with the progress of Simplicity, which is growing stronger and more well-rounded. Ecgberht is tricky, not strong on fighting but still dangerous.

Former champion and benchmark player IceBot is below the 50% mark. Until February 2015, it was the strongest bot on SSCAIT. Other old school champions like XIMP by Tomas Vajda and UAlbertaBot by Dave Churchill are less robust and are ranked lower yet. Onward!

Steamhammer’s opponent model

In the tournament, the ranks are starting to resolve. Outside the collapse of McRave, I don’t see any big surprises. Locutus has played fewer games than other bots, so the top position is less clear than others. Steamhammer is likely to finish around rank #10-#12, in the range of past performances, so it is safely in despite my worries.

In development, resource tracking was soon working. When destroying an enemy base you normally get to see the resource counts, so used bases should be evaluated accurately.

I started writing a scout boss, then I got distracted by another project. I am refactoring gas steal into one skill in a skill system that retains data in the game records of the opponent model. You subclass a Skill object, fill in around 8 virtual methods of which most are simple, and you get opponent modeling that estimates when the skill is useful against whoever you’re playing now. Since it’s implemented in code, it’s highly flexible. You can choose what data to record, including measurements of how successful a skill was in the current game, and how to interpret the data.
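To give a feel for the shape of such a skill system, here is a heavily simplified C++ sketch. It is hypothetical throughout: the class names, the `SkillRecord` fields, and the decision rule are invented for illustration and are not Steamhammer's actual code, which has around 8 virtual methods rather than the 1 shown.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// HYPOTHETICAL: what one past game against this opponent recorded for a skill.
struct SkillRecord
{
    bool used;  // was the skill used in that game?
    bool won;   // did we win that game?
};

// HYPOTHETICAL base class. A subclass decides how to interpret the records.
class Skill
{
public:
    Skill(std::string name, std::vector<SkillRecord> history)
        : _name(std::move(name)), _history(std::move(history)) {}
    virtual ~Skill() = default;

    const std::string& name() const { return _name; }

    // Should the skill be used in the current game against this opponent?
    virtual bool shouldUse() const = 0;

protected:
    const std::vector<SkillRecord>& history() const { return _history; }

private:
    std::string _name;
    std::vector<SkillRecord> _history;
};

// Example subclass: use gas steal if games with it went at least as well as games without.
class GasStealSkill : public Skill
{
public:
    explicit GasStealSkill(std::vector<SkillRecord> history)
        : Skill("gas steal", std::move(history)) {}

    bool shouldUse() const override
    {
        int usedGames = 0, usedWins = 0, otherGames = 0, otherWins = 0;
        for (const SkillRecord& record : history()) {
            if (record.used) { ++usedGames; usedWins += record.won ? 1 : 0; }
            else             { ++otherGames; otherWins += record.won ? 1 : 0; }
        }
        if (usedGames == 0) return true;                          // untried: explore
        if (otherGames == 0) return usedWins * 2 >= usedGames;    // no baseline: need 50% wins
        return usedWins * otherGames >= otherWins * usedGames;    // compare the two win rates
    }
};
```

Because each subclass owns its interpretation, a queen skill could record something richer than win or loss, such as how many units its broodlings killed.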

The immediate effect I hope for is better gas steal decisions. Steamhammer’s new queen skills are also good candidates, because queens vary from useful to wasteful. Will a queen be good? How many queens? Will ensnare be good? Even tactical decisions like “play defensively until time t” should be possible, with t adjusted in real time as the opponent model watches the game unfold. I like the idea of learning to adapt tactical play to the opponent.

The game record file format will change. The format has to change soon anyway, because it doesn’t record all the information needed for some important strategy decisions. I designed it so that I can add new format game records without having to discard old ones.

SSCAIT 2019 underway

For anyone who hasn’t noticed yet, SSCAIT is underway and games are being played. 45 bots made it in after screening, which means 45 * 44 = 1980 games in the double round robin phase. So far, some bots have played up to 7 of their games, while some have yet to play any.

• results so far
• the stream
• ranks and crosstable by MicroDK

Since comments are coming in about it anyway, here’s a post to attach them to. MarcoDBAA suggests Simplicity-ZNZZBot as a fun game (and I agree).

prep for the SSCAIT annual tournament

I’ve disabled Randomhammer on SSCAIT in preparation for the tournament. It’s less than a week away.

For Steamhammer, I am trying to overcome my usual deadline intolerance. I fixed one critical bug, and now I am concentrating all my effort on my Killer Feature. Whether it will kill anything is an open question, but it will be fun, and above all it should be finished and tested and tuned by the deadline. I’ll be satisfied if it can catch out a few strong opponents.

As usual, the big improvements promise to arrive after the tournament. I am adding, or on the verge of adding, basic skills that promise stronger play, and it is impossible to exploit new skills fully in a short time. I expect to soon have the infrastructure needed to add nydus canal support, though I don’t know whether I’ll actually add it soon. Nydus canals make island bases much more useful. I am very tempted to delay strategy adaptation work to add a bag-of-tricks meta-skill that knows how to select from a bag of tactical ideas to pose problems to different opponents. I know a lot of tricks that promise to be effective in specific situations.

In the arena of unimportant abilities, it’s tempting to fill out the queen skills. Steamhammer is capable of controlling a fleet of queens simultaneously without blowing out the per-frame time limit, but they tend to all cast broodling on the same target at once. Queens tend to carelessly fly into danger and die, and at the same time do not know how to fly around freely and seek their own targets; they wait for targets to wander into range. A bug in the production system prevents Steamhammer from producing infested terrans, but I’m not fixing the bug yet because it doesn’t have the skill to control them properly. Infested terrans are powerful but must be used carefully, for example dropped from overlords or coordinated with defilers. That’s something I may work on gradually over the next year.

Upcoming: I want to time a few other openings the same way I timed the Styx opening, to compare them. I think it will be enlightening. And I still hope to get back to the AIIDE tournaments and do more analysis; we’ll see if I can pull it off.

AIIDE 2019 - looking into AITP

AITP follows an “aggressive defense” game plan, similar to SAIDA, where it builds up a strong ground army, then sets up tank lines in forward positions to constrict the enemy like a boa (preferably a snake, sometimes a feather boa). Its overall skill is far less, though; it finished second to last in AIIDE 2019. Overall, after reading some of the key code, I am not impressed (maybe you shouldn’t expect me to have been). The plans are ambitious but the work looks hasty, as if the authors underestimated it and had to rush to make the tournament. Still, the plans are ambitious, and that makes it somewhat interesting.

tactics

One of AITP’s first steps is to calculate map positions for a wall (supplyPoint, barracksPoint, bunkerPoint), defensive locations (chokePoint1 through chokePoint4) and offensive tank lines (frontLine1 through frontLine3). How does it calculate them? They are hardcoded for every starting base on every map in InformationManager::initializePosition(). That must be part of why AITP did not want to participate in the unknown map tournament. These positions seem to be the foundation on which all the tactical decisions are laid.

Spider mines are placed in neat diagonal double lines at locations offset from the defense line calculated by CombatCommander::updateDefenseLine(), which looks mostly at the supply and the previous defense line and sets the current defense line to chokePoint2 (the natural choke) or to one of the precalculated frontLine values. It’s simple and primitive, and there is not much examination of the game situation, a good starting try but nothing you expect to be strong. It’s silly sometimes; the defense line may be in front of an empty base that is away from the fight. CombatCommander::updateSpiderSquads() sets up spiderSquad1 and spiderSquad2—and also lays mines itself, calculating the offset steps and checking every vulture itself rather than leaving it to the squads. The intended separation of functions is not observed. Micro::LaySpiderMine() is unused.

StrategyManager::shouldBuildTurret() decides how many turrets should go where. Some turrets go to occupied choke points 1 and 2, some next to command centers, and some at the front lines computed above depending on certain “squad positions.” These so-called squad positions come from CombatCommander::getSquadPosition(std::string squadID), which takes a string that is called squad ID but is actually a 2-digit code that identifies a location rather than a squad. The location is usually an offset from the current defense line, but there are exceptions. The ID seems to choose one of a hardcoded set of offsets for the final position.

Units, of course, also go to the current defense line: That is the tank line, soon furnished with mines and turrets, that is meant to restrict the enemy. At some point CombatCommander::updateAttackEnemyBase() becomes true and sets _aimEnemyBase to the location of an enemy base to destroy. When this happens depends on what the current defense line is, but it’s another set of simple calculations using the closest friendly unit to various enemy bases.

what to build

I outlined the build order and unit mix decisions in the post how AITP played. There are game stages A, B and C and “modules” A1, A2 and so on for each game stage.

A1 is the antirush module that starts a barracks at 5, whose play was described in that post. It switches to the next module when there is a bunker and 4 marines and the engineering bay is started. One of AITP’s strategies was A1-B1-B2-C2. B1 makes SCVs up to 20, barracks up to 5, academy and medics, moves out at 10 marines, and switches to the next module after 10 minutes game time. B2 makes a bunker under certain conditions—but only if the previous module was A4. The modules are not in fact modular but know about each other. Then it makes SCVs and marines up to a smaller limit than B1, makes a factory and gets the upgrades and expansion. It switches when there are 2 command centers or the supply reaches 100 (really 100, not 50: int supplyUsed = BWAPI::Broodwar->self()->supplyUsed() / 2;). Module C2 is the middlegame: It makes SCVs to the limit, adds factories, throws bunkers into the middle of the map, gets armory upgrades, and never switches.

Here’s an extract showing the strange over-specificity of AITP’s code, from StrategyManager::doSwitchModule() which switches from the current module to the next one. The code does not simply parse out the strategy module names from the StrategyName string, it handles each as a special case, sometimes taking extra actions that I would say should be factored out.

	if (_currentModule == "A1")
	{
		if (Config::Strategy::StrategyName == "A1-B1-B2-C2")
		{
			_currentModule = "B1";
			CombatCommander::Instance().clearSquad("bunkerSquad2");
		}
		else if (Config::Strategy::StrategyName == "A1-B3-C2")
		{
			_currentModule = "B3";
			CombatCommander::Instance().clearSquad("bunkerSquad2");
		}
	}
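For contrast, here is a minimal C++ sketch of what parsing the module names out of the strategy string could look like. It is not AITP's code. The helper names `parseModules` and `nextModule` are invented, and per-transition side effects such as clearing a squad would still need a home elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split a strategy name like "A1-B1-B2-C2" into its module names.
std::vector<std::string> parseModules(const std::string& strategyName)
{
    std::vector<std::string> modules;
    std::size_t start = 0;
    while (start <= strategyName.size()) {
        const std::size_t dash = strategyName.find('-', start);
        const std::size_t end = (dash == std::string::npos) ? strategyName.size() : dash;
        if (end > start)  // skip empty pieces, e.g. from a doubled dash
            modules.push_back(strategyName.substr(start, end - start));
        if (dash == std::string::npos) break;
        start = dash + 1;
    }
    return modules;
}

// Return the module that follows `current` in the strategy. If `current` is
// the last module or is not found, stay in `current`. A strategy that names
// the same module twice would always match its first occurrence.
std::string nextModule(const std::string& strategyName, const std::string& current)
{
    const std::vector<std::string> modules = parseModules(strategyName);
    for (std::size_t i = 0; i + 1 < modules.size(); ++i)
        if (modules[i] == current) return modules[i + 1];
    return current;
}
```

With this, `nextModule("A1-B3-C2", "A1")` gives `"B3"` with no special case per strategy name, and a final module like C2 maps to itself.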

AITP’s modules are not easy to read. I think Steamhammer’s explicit build orders plus zerg strategy rules are more perspicuous and no less expressive—but then I would say that, wouldn’t I?

AIIDE 2019 - how AITP played

I think AITP is the only AIIDE 2019 bot whose game play deserves its own post. Most others can be watched at SSCAIT and BASIL, and BunkerBoxeR is severely buggy. AITP finished second to last, but it is complex and interesting. In this post, the game links are links to replays; you’ll have to drop them into the OpenBW player yourself.

AITP is derived from Steamhammer and shares some of the same habits. For example, it places supply depots and other buildings of the same size preferentially at the edge of the map, like Steamhammer terran. As explained in what AITP learned, the bot has an interesting abstract strategy system that is different from any other Steamhammer derivative.

AITP does not have strategy work only. It knows how to make a wall at the ramp, how to lift the barracks to open the wall, and how to land the barracks nearby to leave the wall open. It also knows how to lift the barracks back and restore the wall in case of danger. It researches spider mines and lays mines in neat diagonal lines in places like the approach to its natural. Later in the game, when it has a dangerous tank ball, it moves out and sieges to cover movement routes, like SAIDA but with less understanding of where the important places are, and builds turrets there. It has an excessive desire to build bunkers in the middle of the map, which it then doesn’t use properly.

It has also lost important skills that it inherited. It seems to have lost the emergency reaction of taking SCVs off gas when mineral mining is more important because of a large gas excess. Marines may stand away from a bunker that they should jump into, a new and critical misplay. AITP often moves its army away from the action toward a distant empty base, as in this game versus XiaoYi on Fortress (possibly a bug in deciding on the squad target).

Overall, I judge AITP as promising but not mature. I imagine that the tournament came up too early in its development. Some of its skills are impressive, others look incomplete or broken. It needs work to fix bugs (223 crashes during the tournament), polish skills, and add strategy modules to cover more cases. If it gets that work, I think it could become very strong.

Here are my observations of how AITP played when following each of its 5 declared openings. I correlated its learning files with the detailed tournament results to find games with each strategy. Later I’ll read the code and decipher the strategy modules more fully. All strategies involve initial buildings placed at the base entrance. On the map Aztec, where the main base is on low ground below a ramp, it smartly builds at the entrance to the natural instead of the entrance to the main. Few bots know to do that; even Locutus got in some trouble by not doing that.

Every one of these strategies is defensive, as you might expect with buildings at the entrance. I think AITP has good vulture micro and would benefit from having the option of more aggressive vulture play. Also none of the openings is honed to a fine edge; they are all a little rough.

A1-B1-B2-C2 Make a very fast barracks at 5 supply, placed at the base entrance, then finish the wall with a bunker and supply depot. This is the build labeled AntiRush. Later add more barracks and get medics (though no marine upgrades) and move out. I initially guessed that AITP played factory unit mixes every game, but this is a barracks unit mix. This plan scored wins only against random UAlbertaBot, which of course favors rushes. Here is a win on Heartbreak Ridge versus protoss zealots.

A1-B3-C2 Make a very fast barracks at 5 supply, placed at the base entrance, then finish the wall with a bunker and supply depot (this part must be A1). Follow up with factory and ebay and turret up the base. Get vultures with speed first, then expand. Eventually switch into a tank-goliath unit mix (I’m guessing this is C2, which if true means that I didn’t see a game with the anti-rush A1-B1-B2-C2 strategy which got as far as stage C). This was AITP’s most successful strategy by far; in fact, I would call it the only successful strategy. It was a top choice against ZZZKBot, Microwave, McRave, UAlbertaBot, and BunkerBoxeR. Here is one of AITP’s 7 wins against Microwave on Python where Microwave played a 9 pool and followed up unambitiously.

A3-B5-C1 Narrow the base entrance with supply depot then barracks, and start a factory before the first marine. Get vultures with mines first and remain defensive. Start a command center in the base, and keep making vultures. When the command center is done, float it into the natural, keep defending with the vultures, and finally add tanks before moving out. This strategy scored zero against every opponent except BunkerBoxeR, which played broken builds. In the first game versus Iron on Benzene, AITP put up a creditable fight.

A3-B7-C1 Narrow the base entrance with supply depot then barracks, and start a factory before the first marine. Get vultures with mines first and remain defensive (up to here must be A3). Then get tanks and start a command center in the base. Not a suitable opening against aggressive play. When there are 2 tanks (no siege mode), lift the finished command center toward the natural. This plan did not show good success against any opponent, but worked better than A3-B5-C1 above. Here is a game versus McRave on Destination.

A4-B2-C1 Wall the base entrance with barracks, depot, bunker, in that order. Make marines and vultures and a fast ebay for turrets. When there are 5 marines, push out, bunker the natural, and expand. The strategy does not impress me; it is not greedy enough to gain an advantage against a cautious opponent (DaQin), and as implemented not safe enough to survive an aggressive opponent (Microwave). Watch how effortlessly PurpleWave wins with 2 dragoons and straightforward play when the 5 marines move out; AITP is missing the defensive skills to make it work.

CoG 2019 downloads fixed

For those like me who have not been paying attention, the CoG 2019 results page has had its broken downloads fixed. The SOURCE_CODE link now gives you source, and REPLAY_04 is a valid zip file full of replays.

AIIDE 2019 - what Microwave did

Here’s data from Microwave’s history files, using the same script as for BananaBrain with a little customization. Unlike Microwave’s learning files, which deliberately omit data and include information from pre-learning, the history files tell what Microwave actually did during the games. Microwave didn’t record information about the opponent’s strategy, so that table is left out. That made it look a little sparse, so I added columns giving the first and last games when the opening was tried, where the first game in the history file is game 0. We can see things like when a winning opening was found, and whether it kept winning. If there are fewer than 100 games recorded for an opponent because Microwave crashed, then the game numbers generally do not align with the tournament round numbers.
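The columns of these tables can be reproduced with a few lines of code. This is a minimal sketch, not the script used for the post: it assumes each game has already been reduced to an (opening, won) pair in file order, and the function name and the sample data are made up for illustration. Parsing Microwave's actual history file format is left out.

```python
def summarize_openings(games):
    """games: list of (opening_name, won) pairs in the order they were played.

    Returns rows of (opening, games, win_percent, first, last), sorted by
    opening name. first and last are 0-based indexes into the game list.
    """
    stats = {}
    for index, (opening, won) in enumerate(games):
        row = stats.setdefault(
            opening, {"games": 0, "wins": 0, "first": index, "last": index}
        )
        row["games"] += 1
        row["wins"] += 1 if won else 0
        row["last"] = index
    return [
        (name, s["games"], round(100 * s["wins"] / s["games"]), s["first"], s["last"])
        for name, s in sorted(stats.items())
    ]


# Made-up example data, not taken from any real history file.
example = [("5Pool", True), ("4PoolSoft", False), ("4PoolSoft", True), ("5Pool", False)]
for row in summarize_openings(example):
    print("\t".join(str(field) for field in row))
```

Because the indexes count recorded games only, a crashed game that leaves no record shifts every later index, which is why the game numbers can drift away from the tournament round numbers.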

Against difficult opponents, Microwave experimented widely. Against some opponents that Microwave pre-trained against, it played whatever came out of pre-training. So I don’t have much to say about opponents in the top half of the post. But toward the bottom I’ve made some comments. Especially see the note to AITP.

#1 locutus

opening	games	wins	first	last
10Hatch9Pool9gas	8	12%	1	52
2HatchHydra	7	0%	0	53
2HatchLurker	7	29%	83	89
2HatchLurkerAllIn	2	0%	63	90
2HatchMuta	12	25%	3	56
3HatchHydraBust	3	0%	10	57
3HatchLingBust	3	0%	38	91
3HatchPoolHydra	5	0%	16	92
4HatchBeforeGas	4	0%	27	93
4PoolHard	3	0%	15	58
4PoolSoft	4	0%	21	59
5Pool	2	0%	36	60
5PoolSpeed	3	0%	41	94
6Pool	3	0%	42	95
6PoolSpeed	3	0%	43	96
7Pool	2	0%	37	61
8Pool	3	0%	44	97
9Pool	9	22%	45	78
9PoolLurker	2	0%	46	79
9PoolSpeed	3	0%	11	62
9PoolSpeedLing	2	0%	47	80
ZvP_10Hatch9Pool	4	0%	17	81
ZvZ_Overpool11Gas	4	0%	18	82
23 openings	98	8%

#2 purplewave

opening	games	wins	first	last
10Hatch9Pool9gas	11	9%	20	93
2HatchHydra	6	0%	14	87
2HatchMuta	5	0%	35	94
3HatchHydraBust	9	0%	3	95
3HatchLingBust	14	7%	0	74
4PoolHard	1	0%	80	80
4PoolSoft	7	14%	30	75
5Pool	8	12%	15	90
5PoolSpeed	1	0%	81	81
6Pool	1	0%	82	82
6PoolSpeed	1	0%	83	83
7Pool	8	0%	17	76
8Pool	4	0%	42	91
9Pool	1	0%	84	84
9PoolSpeed	3	0%	52	92
9PoolSpeedLing	14	21%	4	77
ZvP_10Hatch9Pool	1	0%	85	85
ZvZ_Overpool11Gas	1	0%	86	86
18 openings	96	7%

#3 bananabrain

opening	games	wins	first	last
10Hatch9Pool9gas	1	0%	54	54
2HatchHydra	1	0%	51	51
2HatchMuta	1	0%	52	52
3HatchLingBust	37	49%	0	92
4PoolHard	3	0%	29	63
4PoolSoft	4	0%	28	67
5Pool	11	45%	22	76
5PoolSpeed	7	29%	19	78
6Pool	1	0%	62	62
6PoolSpeed	5	20%	20	68
7Pool	1	0%	55	55
8Pool	3	0%	24	69
9Pool	7	43%	56	70
9PoolSpeed	1	0%	53	53
9PoolSpeedLing	3	0%	25	71
ZvZ_Overgas9Pool	4	0%	26	77
ZvZ_Overpool11Gas	3	0%	35	79
17 openings	93	31%

#4 daqin

opening	games	wins	first	last
10Hatch9Pool9gas	11	18%	2	77
2HatchHydra	4	0%	18	78
2HatchLurker	4	0%	23	79
2HatchMuta	13	23%	17	89
3HatchHydraBust	3	0%	20	51
3HatchLingBust	31	39%	16	76
3HatchPoolHydra	3	0%	25	52
4PoolSoft	3	0%	6	53
5Pool	3	0%	7	54
7Pool	3	0%	11	55
9Pool	3	0%	1	56
9PoolSpeed	3	0%	10	57
9PoolSpeedLing	3	0%	0	58
ZvP_10Hatch9Pool	3	0%	5	59
14 openings	90	19%

#5 steamhammer

opening	games	wins	first	last
9PoolSpeed	100	75%	0	99
1 openings	100	75%

#6 zzzkbot

opening	games	wins	first	last
9PoolHatch	1	0%	0	0
ZvZ_Overgas11Pool	70	80%	1	70
2 openings	71	79%

Why are only 71 games recorded? According to the official results, Microwave crashed in 56 games throughout the tournament, and 29 of those crashes happened against ZZZKBot. Microwave recorded every game in which it did not crash. It’s a debugging opportunity. :-/

#8 iron

opening	games	wins	first	last
10Hatch9Pool9gas	2	0%	53	82
2HatchHydra	1	0%	83	83
2HatchLurkerAllIn	2	0%	63	88
2HatchMuta	11	9%	0	72
3HatchHydraBust	15	33%	5	77
3HatchHydraExpo	1	0%	84	84
3HatchPoolHydra	1	0%	85	85
4HatchBeforeGas	4	0%	18	89
4PoolHard	6	0%	13	78
4PoolSoft	7	14%	11	71
5Pool	1	0%	86	86
5PoolSpeed	6	0%	14	79
6Pool	2	0%	54	87
6PoolSpeed	5	20%	35	92
7Pool	10	30%	19	68
8Pool	7	14%	17	80
9Pool	8	12%	1	95
9PoolSpeedLing	4	0%	21	96
OverpoolTurtle	4	0%	22	81
19 openings	97	13%

#9 xiaoyi

opening	games	wins	first	last
10Hatch9Pool9gas	2	0%	42	47
2HatchLurker	1	0%	48	48
2HatchMuta	2	0%	45	46
4PoolSoft	38	63%	1	38
5Pool	2	50%	0	39
7Pool	51	76%	49	99
9Pool	2	50%	40	41
9PoolSpeedLing	2	0%	43	44
8 openings	100	65%

As soon as Microwave found that 7 pool worked, it played 7 pool exclusively.

#10 mcrave

opening	games	wins	first	last
2HatchMuta	40	62%	0	79
3HatchHydraBust	13	92%	86	98
4PoolHard	1	0%	80	80
4PoolSoft	40	62%	1	40
9Pool	1	0%	85	85
ZvZ_Overgas11Pool	4	50%	81	84
6 openings	99	65%

Microwave was late to discover the success of the hydra bust opening. That’s why it was played so little. The example shows the importance of finding good ideas as early as possible. I am adding smarts to Steamhammer to make it better at finding the good tries fast.

It’s interesting that 2HatchMuta and 4PoolSoft have the same numbers, but were given up on at different times.

#11 ualbertabot

opening	games	wins	first	last
4PoolSoft	100	82%	0	99
1 openings	100	82%

The choice against UAlbertaBot was determined by pre-training. From scratch, I expect Microwave would have tried a wider variety.

#12 aitp

opening	games	wins	first	last
9PoolSpeedLing	100	93%	0	99
1 openings	100	93%

If the first try wins, keep it up. What if Microwave had an opening that would have won more than 93%? The theory is that, above some winning rate, the risk of losing by trying alternatives is higher than the risk of losing by sticking with a known good opening. But what winning rate is high enough to stick with? It depends on how much you respect your opponents. If you expect to win nearly every game, like Locutus, maybe you should switch to an alternative as soon as you lose a single game. If you expect to finish near the bottom, maybe you should stick with a strategy that wins 50%.

But more: How much do you respect each opponent? Maybe bots should have a “contempt factor” like chess programs may use to decide whether to aim for a draw: Accept a low winning rate strategy against Locutus, but demand 95% wins against the unknown who you’ve decided is a weak newbie. I would rather call it a respect factor! In a UCB algorithm, a level of respect is implicitly encoded in the exploration rate constant. Does any bot already have a respect factor for specific opponents?
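To make the idea concrete, here is a sketch of UCB1 opening selection with the exploration constant exposed as a per-opponent knob. It is an illustration only: the function name, the record layout, and the sample numbers are assumptions, and no bot is claimed to work this way.

```python
import math


def ucb_choice(records, exploration=1.0):
    """records: dict mapping opening name -> (games, wins).

    Returns the opening with the highest UCB1 score. Untried openings are
    chosen first. A larger exploration constant keeps the bot trying
    alternatives longer, even when one opening already wins often.
    """
    total = sum(games for games, _ in records.values())
    best_name, best_score = None, -math.inf
    for name, (games, wins) in sorted(records.items()):
        if games == 0:
            return name  # try everything at least once
        bonus = exploration * math.sqrt(2 * math.log(total) / games)
        score = wins / games + bonus
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical record: one opening is winning 90%, another is barely tried.
records = {"9PoolSpeedLing": (20, 18), "5Pool": (2, 1)}
print(ucb_choice(records, exploration=0.1))  # small constant: stick with the winner
print(ucb_choice(records, exploration=1.0))  # large constant: go back and explore
```

With the small constant the 90% opening is kept; with the large one the barely tried opening gets another look. A respect factor could be one way of choosing that constant separately for each opponent.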

#13 bunkerboxer

opening	games	wins	first	last
5Pool	100	99%	0	99
1 openings	100	99%

Apparently the initial choice against an unknown is random.

AIIDE 2019 - what BananaBrain learned

I wrote a script to analyze BananaBrain’s game history files, which record its experience with each opponent. For now, I had the script summarize the strategies played and the enemy strategies recognized. The history files also record the map and a value that represents the game duration. History files are rich with information, and there are many ways to summarize it. It would be interesting to see how strategy usage and win rate vary by map, among other possibilities.

The same script should work with minor changes to summarize Microwave’s history files.

BananaBrain had prepared history files for the opponents #1 Locutus, #2 PurpleWave, #5 Steamhammer, #6 ZZZKBot, #7 Microwave, and #8 Iron. Data from the prepared history files was not copied into the write directory. That is different from how Steamhammer and Locutus keep their game records, and it has the nice effect that the tables show exactly what happened in the tournament, from BananaBrain’s point of view.

For each opponent, the first table is BananaBrain’s choice of opening. The second table is BananaBrain’s idea of what the opponent did. All the win rates are from BananaBrain’s point of view, so that, for example, when Locutus played P_1gatecore, BananaBrain won 5% of the time. Of course, the opponent’s view of its own strategy is likely to be more fine-grained than BananaBrain’s. To take the extreme case, Steamhammer played 30 different openings against BananaBrain, and BananaBrain recognized them in 8 categories.

#1 locutus

opening	games	wins
PvP_10/12gate	6	17%
PvP_12nexus	11	36%
PvP_2gatedt	10	0%
PvP_2gatedtexpo	9	0%
PvP_3gaterobo	5	0%
PvP_3gatespeedzeal	8	25%
PvP_4gategoon	6	0%
PvP_9/9gate	12	8%
PvP_9/9proxygate	9	0%
PvP_nzcore	8	12%
PvP_zcore	4	0%
PvP_zcorez	6	0%
PvP_zzcore	6	17%
13 openings	100	10%

enemy	games	wins
P_1gatecore	20	5%
P_cannonrush	29	7%
P_fastexpand	1	0%
P_ffe	19	21%
P_unknown	31	10%
5 openings	100	10%

As you might expect against Locutus, the best choice was a fast expansion.

Is the single game of enemy P_fastexpand a misrecognition? I suspect that Locutus played otherwise, and BananaBrain didn’t see everything and wasn’t able to draw the right conclusion. Or maybe it’s a bug somewhere. PurpleWave and McRave also show a single P_fastexpand game.

#2 purplewave

opening	games	wins
PvP_10/12gate	23	70%
PvP_12nexus	2	0%
PvP_2gatedt	6	17%
PvP_2gatedtexpo	3	33%
PvP_3gaterobo	2	0%
PvP_3gatespeedzeal	1	0%
PvP_4gategoon	8	38%
PvP_9/9gate	26	88%
PvP_9/9proxygate	13	62%
PvP_nzcore	3	0%
PvP_zcore	4	25%
PvP_zcorez	5	40%
PvP_zzcore	4	25%
13 openings	100	56%

enemy	games	wins
P_1gatecore	54	56%
P_2gate	25	60%
P_2gatefast	6	33%
P_fastexpand	1	0%
P_ffe	2	50%
P_unknown	12	67%
6 openings	100	56%

Against PurpleWave, different zealot rushes worked best. Maybe it is because zealot rushes depend for their success more on execution than on the enemy’s strategic reaction. PurpleWave is particularly good at reacting to the enemy strategy, and BananaBrain is good at execution.

#4 daqin

opening	games	wins
PvP_10/12gate	8	62%
PvP_12nexus	6	33%
PvP_2gatedt	6	17%
PvP_2gatedtexpo	12	83%
PvP_3gaterobo	7	14%
PvP_3gatespeedzeal	6	33%
PvP_4gategoon	5	0%
PvP_9/9gate	14	93%
PvP_9/9proxygate	9	67%
PvP_nzcore	7	43%
PvP_zcore	6	33%
PvP_zcorez	7	43%
PvP_zzcore	7	43%
13 openings	100	51%

enemy	games	wins
P_1gatecore	82	50%
P_unknown	18	56%
2 openings	100	51%

BananaBrain made quite a variety of tries, and was most successful with... zealot rush and dark templars, which are kind of different. BananaBrain’s varied opening choice is a strength.

#5 steamhammer

opening	games	wins
PvZ_10/12gate	15	100%
PvZ_1basespeedzeal	8	88%
PvZ_2basespeedzeal	11	82%
PvZ_4gate2archon	7	57%
PvZ_5gategoon	7	86%
PvZ_9/9gate	12	92%
PvZ_9/9proxygate	15	100%
PvZ_bisu	4	75%
PvZ_neobisu	2	50%
PvZ_sairdt	7	100%
PvZ_sairgoon	2	0%
PvZ_stove	10	70%
12 openings	100	85%

enemy	games	wins
Z_10hatch	38	76%
Z_12hatch	31	84%
Z_12pool	11	91%
Z_4/5pool	3	100%
Z_9pool	1	100%
Z_9poolspeed	4	100%
Z_overpool	2	100%
Z_unknown	10	100%
8 openings	100	85%

2 gate zealot openings work well against Steamhammer—but only when played by PurpleWave or BananaBrain. Steamhammer can usually defend versus a lesser protoss.

#6 zzzkbot

opening	games	wins
PvZ_10/12gate	17	100%
PvZ_1basespeedzeal	11	91%
PvZ_2basespeedzeal	4	25%
PvZ_4gate2archon	4	50%
PvZ_5gategoon	6	67%
PvZ_9/9gate	15	100%
PvZ_9/9proxygate	3	67%
PvZ_bisu	5	60%
PvZ_neobisu	4	25%
PvZ_sairdt	12	100%
PvZ_sairgoon	6	50%
PvZ_stove	13	100%
12 openings	100	83%

enemy	games	wins
Z_4/5pool	33	85%
Z_9pool	17	100%
Z_9poolspeed	2	100%
Z_overpool	23	65%
Z_unknown	25	84%
5 openings	100	83%

I like that BananaBrain varies its opening choice even when several openings win 100%. (Steamhammer does too; if more than one opening has scored 100% so far, Steamhammer chooses randomly among them.) Playing a strong opening gives the opponent one problem to solve (“how do I survive this?”). Unpredictably playing one of several strong openings sets the opponent two problems (“what is this fiend doing, and then how do I live through it?”) which must both be solved, more than twice as difficult.
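The rule for perfect openings is easy to sketch. Only the "choose randomly among openings that have scored 100%" part comes from the description above; the fallback to the best observed win rate, the function names, and the record layout are assumptions made for this example, not Steamhammer's actual code.

```python
import random


def win_rate(record):
    games, wins = record
    return wins / games if games else 0.0


def pick_opening(records, rng=random):
    """records: dict mapping opening name -> (games, wins)."""
    perfect = sorted(
        name for name, (games, wins) in records.items()
        if games > 0 and wins == games
    )
    if len(perfect) > 1:
        # Several openings have never lost: stay unpredictable.
        return rng.choice(perfect)
    # Assumed fallback for this sketch: the best win rate seen so far.
    return max(sorted(records), key=lambda name: win_rate(records[name]))


# Three rows from the ZZZKBot table above, as (games, wins).
records = {"PvZ_9/9gate": (15, 15), "PvZ_sairdt": (12, 12), "PvZ_bisu": (5, 3)}
print(pick_opening(records))
```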

#7 microwave

opening	games	wins
PvZ_10/12gate	20	90%
PvZ_1basespeedzeal	11	73%
PvZ_2basespeedzeal	3	33%
PvZ_4gate2archon	6	50%
PvZ_5gategoon	8	75%
PvZ_9/9gate	17	88%
PvZ_9/9proxygate	8	75%
PvZ_bisu	10	60%
PvZ_neobisu	3	33%
PvZ_sairdt	4	50%
PvZ_sairgoon	2	0%
PvZ_stove	8	62%
12 openings	100	71%

enemy	games	wins
Z_10hatch	8	88%
Z_12hatch	38	55%
Z_12pool	2	100%
Z_4/5pool	28	71%
Z_9pool	9	67%
Z_9poolspeed	7	100%
Z_overpool	3	100%
Z_unknown	5	100%
8 openings	100	71%

#8 iron

opening	games	wins
PvT_10/12gate	6	67%
PvT_10/15gate	3	0%
PvT_12nexus	4	25%
PvT_1gatedtexpo	25	84%
PvT_2gatedt	10	60%
PvT_9/9gate	10	60%
PvT_9/9proxygate	4	75%
PvT_bulldog	1	0%
PvT_dtdrop	14	64%
PvT_nzcore	5	40%
PvT_proxydt	2	0%
PvT_stove	4	25%
PvT_zcore	5	40%
PvT_zzcore	7	43%
14 openings	100	58%

enemy	games	wins
T_1fac	30	63%
T_2fac	1	0%
T_fastexpand	29	48%
T_unknown	40	62%
4 openings	100	58%

Bulldog! That involves protoss dropping zealots, typically on cliff tanks, with a simultaneous attack by ground. When successful, a bulldog can abruptly break a terran defense that is sound against any purely ground attack. I don’t think I’ve seen BananaBrain play that; I should watch more games versus terran. Can anybody point out an example?

#9 xiaoyi

opening	games	wins
PvT_10/12gate	10	90%
PvT_10/15gate	7	43%
PvT_12nexus	5	20%
PvT_1gatedtexpo	11	100%
PvT_2gatedt	7	57%
PvT_9/9gate	6	33%
PvT_9/9proxygate	6	17%
PvT_bulldog	5	0%
PvT_dtdrop	9	89%
PvT_nzcore	6	17%
PvT_proxydt	7	71%
PvT_stove	8	75%
PvT_zcore	6	33%
PvT_zzcore	7	57%
14 openings	100	57%

enemy	games	wins
T_1fac	37	57%
T_fastexpand	20	65%
T_unknown	43	53%
3 openings	100	57%

The Stove worked against XiaoYi? Again, XiaoYi shows weakness against tricks. The Stove involves making scouts to harass while teching to dark templar. It should not be hard for a good terran to defend against; notice that Iron dealt with it well enough.

#10 mcrave

opening	games	wins
PvP_10/12gate	7	71%
PvP_12nexus	6	50%
PvP_2gatedt	6	67%
PvP_2gatedtexpo	8	50%
PvP_3gaterobo	9	78%
PvP_3gatespeedzeal	8	62%
PvP_4gategoon	7	57%
PvP_9/9gate	8	75%
PvP_9/9proxygate	6	33%
PvP_nzcore	10	90%
PvP_zcore	7	57%
PvP_zcorez	10	90%
PvP_zzcore	8	88%
13 openings	100	69%

enemy	games	wins
P_1gatecore	34	74%
P_2gate	26	65%
P_2gatefast	29	69%
P_fastexpand	1	0%
P_proxygate	4	100%
P_unknown	6	50%
6 openings	100	69%

It looks like most openings performed similarly against McRave, and BananaBrain struggled to identify what worked. I imagine a fierce learning battle, both trying to keep one step ahead.

#11 ualbertabot

opening	games	wins
PvU_10/12gate	17	94%
PvU_9/9gate	17	100%
PvU_9/9proxygate	13	85%
PvU_flex	12	67%
PvU_nzcore	11	64%
PvU_zcore	16	88%
PvU_zzcore	13	77%
7 openings	99	84%

enemy	games	wins
P_1gatecore	8	100%
P_2gate	6	83%
P_2gatefast	21	71%
P_unknown	3	33%
T_1fac	5	100%
T_2fac	7	100%
T_2rax	10	90%
T_fastexpand	3	100%
T_unknown	5	100%
Z_10hatch	2	100%
Z_12hatch	8	100%
Z_4/5pool	17	71%
Z_unknown	4	75%
13 openings	99	84%

#12 aitp

opening	games	wins
PvT_10/12gate	7	100%
PvT_10/15gate	8	100%
PvT_12nexus	6	100%
PvT_1gatedtexpo	8	100%
PvT_2gatedt	7	100%
PvT_9/9gate	6	100%
PvT_9/9proxygate	7	100%
PvT_bulldog	9	100%
PvT_dtdrop	7	100%
PvT_nzcore	7	100%
PvT_proxydt	7	100%
PvT_stove	9	100%
PvT_zcore	6	100%
PvT_zzcore	6	100%
14 openings	100	100%

enemy	games	wins
T_1fac	4	100%
T_2fac	12	100%
T_fastexpand	24	100%
T_unknown	60	100%
4 openings	100	100%

#13 bunkerboxer

opening	games	wins
PvT_10/12gate	7	100%
PvT_10/15gate	7	100%
PvT_12nexus	7	100%
PvT_1gatedtexpo	7	100%
PvT_2gatedt	7	100%
PvT_9/9gate	6	100%
PvT_9/9proxygate	7	100%
PvT_bulldog	8	100%
PvT_dtdrop	7	100%
PvT_nzcore	6	100%
PvT_proxydt	8	100%
PvT_stove	8	100%
PvT_zcore	7	100%
PvT_zzcore	8	100%
14 openings	100	100%

enemy	games	wins
T_unknown	100	100%
1 openings	100	100%

BananaBrain apparently does not have a bunker rush recognizer.

AIIDE 2019 - what AITP learned

AITP scored zero against half of its opponents, so its learning results are not deeply interesting. Also, its strategies are labeled with opaque sequences of letters and numbers. But it was easy to generate the tables, and they offer a little insight into AITP’s interesting design, so here they are.

Unlike other Steamhammer forks, AITP does not spell out concrete opening builds in the configuration file, at least not beyond 4 x SCV—start by making workers. The strategy names themselves are code sequences that tell what to do throughout the game. The letters A, B, C are stages of the game, and the combinations A1, A2 etc. are “modules” that may be active during the matching stage. Each module has its own update method to decide what to build, and the StrategyManager sometimes checks the current module for other decisions. There is module switching code in case of surprises (StrategyManager::shouldSwitchModule()); it also sets flags and updates other information.
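As a rough illustration of how such a name decomposes, here is a sketch that splits a strategy string into its modules, grouped by stage. AITP itself is C++ and its real parsing code is not shown here; the function name and the validation are hypothetical.

```python
def parse_strategy(name):
    """Split a strategy name like 'A1-B1-B2-C2' into modules grouped by stage.

    Returns a dict mapping the stage letter to the list of module names,
    in the order they appear in the strategy name.
    """
    stages = {}
    for module in name.split("-"):
        stage, number = module[0], module[1:]
        if not (stage.isalpha() and number.isdigit()):
            raise ValueError(f"unexpected module name: {module!r}")
        stages.setdefault(stage, []).append(module)
    return stages


print(parse_strategy("A1-B1-B2-C2"))
```

A stage can hold more than one module, as B does in A1-B1-B2-C2.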

I like it: It’s a flexible way to specify a plan for the whole game, and it allows for changing plans on the fly. It’s an abstract strategy system, similar in principle to what I plan for Steamhammer. My implementation will look entirely different, though.

AITP has only 5 strategies configured. I gather that it can switch to other sequences on the fly if circumstances warrant. 5 is not many, though; I think they have only completed the basics. Here is the Steamhammer opening group it assigns to each strategy. It does not use the opening group strings, but they may have some heuristic value:

• A1-B3-C2 AntiRush
• A1-B1-B2-C2 Rush
• A3-B5-C1 NoneBunker
• A3-B7-C1 NoneBunker
• A4-B2-C1 8BB (does that mean BBS?)

#1 locutus

opening	games	wins
A1-B1-B2-C2	7	0%
A1-B3-C2	10	0%
A3-B5-C1	16	0%
A3-B7-C1	27	0%
A4-B2-C1	40	0%
5 openings	100	0%

plan	predicted			recognized			accuracy
plan	count	games	wins	count	games	wins	good	?
Naked expand	59	59%	0%	11	11%	0%	7%	83%
Proxy	27	27%	0%	9	9%	0%	11%	67%
Turtle	9	9%	0%	3	3%	0%	0%	67%
Unknown	5	5%	0%	77	77%	0%	0%	80%

timing	#	median	early	late
gas steal attempt	0	-	-	-
gas steal success	0	-	-	-
enemy scout	100	2:01	1:17	8:58
enemy combat units	100	3:49	2:42	8:06
enemy air units	35	7:18	6:14	7:57
enemy cloaked units	61	7:34	6:14	11:26

AITP lost every game, but did not explore its possible strategies equally. It seems to have priorities. Maybe later I will look into how that works. AutoGasSteal is set true in the configuration file, but AITP did not record itself as stealing gas against any opponent. Presumably it is turned off in the code.

#2 purplewave

opening	games	wins
A1-B1-B2-C2	7	0%
A1-B3-C2	15	0%
A3-B5-C1	17	0%
A3-B7-C1	20	0%
A4-B2-C1	30	0%
5 openings	89	0%

plan	predicted			recognized			accuracy
plan	count	games	wins	count	games	wins	good	?
Naked expand	80	90%	0%	3	3%	0%	2%	98%
Unknown	9	10%	0%	86	97%	0%	0%	89%

timing	#	median	early	late
gas steal attempt	0	-	-	-
gas steal success	0	-	-	-
enemy scout	89	2:03	1:19	6:53
enemy combat units	89	3:39	3:11	6:22
enemy air units	83	6:53	6:03	11:38
enemy cloaked units	60	6:53	5:25	14:01

#3 bananabrain

opening	games	wins
A1-B1-B2-C2	11	0%
A1-B3-C2	5	0%
A3-B5-C1	11	0%
A3-B7-C1	37	0%
A4-B2-C1	35	0%
5 openings	99	0%

plan	predicted			recognized			accuracy
plan	count	games	wins	count	games	wins	good	?
Fast rush	9	9%	0%	3	3%	0%	0%	78%
Heavy rush	20	20%	0%	5	5%	0%	10%	70%
Naked expand	30	30%	0%	5	5%	0%	3%	83%
Proxy	31	31%	0%	4	4%	0%	0%	90%
Unknown	9	9%	0%	82	83%	0%	0%	89%

timing	#	median	early	late
gas steal attempt	0	-	-	-
gas steal success	0	-	-	-
enemy scout	99	2:01	0:45	6:38
enemy combat units	99	3:39	2:25	8:07
enemy air units	69	6:31	3:39	11:25
enemy cloaked units	78	6:34	3:47	9:46

#4 daqin

opening	games	wins
A1-B1-B2-C2	5	0%
A1-B3-C2	7	0%
A3-B5-C1	17	0%
A3-B7-C1	16	0%
A4-B2-C1	28	0%
5 openings	73	0%

plan	predicted			recognized			accuracy
plan	count	games	wins	count	games	wins	good	?
Unknown	73	100%	0%	73	100%	0%	0%	100%

timing	#	median	early	late
gas steal attempt	0	-	-	-
gas steal success	0	-	-	-
enemy scout	73	4:03	2:43	10:39
enemy combat units	73	3:42	2:34	7:59
enemy air units	73	7:01	5:55	8:43
enemy cloaked units	33	11:01	10:01	15:01

#5 steamhammer

opening	games	wins
A1-B1-B2-C2	5	0%
A1-B3-C2	17	6%
A3-B5-C1	4	0%
A3-B7-C1	22	5%
A4-B2-C1	21	5%
5 openings	69	4%

plan	predicted			recognized			accuracy
plan	count	games	wins	count	games	wins	good	?
Naked expand	67	97%	3%	13	19%	0%	18%	82%
Unknown	2	3%	50%	56	81%	5%	0%	50%

timing	#	median	early	late
gas steal attempt	0	-	-	-
gas steal success	0	-	-	-
enemy scout	69	2:27	0:51	5:46
enemy combat units	69	3:31	2:53	7:15
enemy air units	48	7:24	5:22	14:13
enemy cloaked units	7	8:22	7:34	11:21

Steamhammer is the highest-ranked opponent that AITP scored wins against. It looks like a few scattered games, though.

#6 zzzkbot

opening	games	wins
A1-B3-C2	51	47%
A3-B7-C1	3	0%
A4-B2-C1	4	25%
3 openings	58	43%

plan	predicted			recognized			accuracy
plan	count	games	wins	count	games	wins	good	?
Fast rush	57	98%	44%	37	64%	51%	63%	37%
Unknown	1	2%	0%	21	36%	29%	0%	0%

timing	#	median	early	late
gas steal attempt	0	-	-	-
gas steal success	0	-	-	-
enemy scout	58	2:37	0:58	8:29
enemy combat units	58	2:56	2:18	5:11
enemy air units	43	8:13	6:38	11:51
enemy cloaked units	0	-	-	-

It looks like ZZZKBot played its 4 pool in over half the games, and perhaps its guardian rush in the remainder. A1-B3-C2 is the strategy labeled AntiRush. AITP recorded more wins for itself than it actually scored, despite recording fewer games than it played. I suspect that AITP has changed the meaning of the numbers.

#7 microwave

opening	games	wins
A1-B1-B2-C2	5	0%
A1-B3-C2	32	22%
A3-B5-C1	14	0%
A3-B7-C1	17	0%
A4-B2-C1	7	0%
5 openings	75	9%

plan	predicted			recognized			accuracy
plan	count	games	wins	count	games	wins	good	?
Fast rush	59	79%	8%	5	7%	0%	3%	93%
Naked expand	15	20%	13%	3	4%	0%	7%	80%
Unknown	1	1%	0%	67	89%	10%	0%	0%

timing	#	median	early	late
gas steal attempt	0	-	-	-
gas steal success	0	-	-	-
enemy scout	75	2:38	1:43	5:53
enemy combat units	75	3:29	2:46	4:29
enemy air units	13	13:01	11:26	15:46
enemy cloaked units	0	-	-	-

#8 iron

opening	games	wins
A1-B1-B2-C2	19	0%
A1-B3-C2	5	0%
A3-B5-C1	27	0%
A3-B7-C1	12	0%
A4-B2-C1	36	0%
5 openings	99	0%

plan	predicted			recognized			accuracy
plan	count	games	wins	count	games	wins	good	?
Factory	98	99%	0%	99	100%	0%	100%	0%
Unknown	1	1%	0%		-	-	0%	0%

timing	#	median	early	late
gas steal attempt	0	-	-	-
gas steal success	0	-	-	-
enemy scout	99	2:18	1:55	7:23
enemy combat units	99	4:14	3:33	4:55
enemy air units	95	6:05	5:37	6:39
enemy cloaked units	95	5:55	5:26	6:38

#9 xiaoyi

opening	games	wins
A1-B1-B2-C2	10	0%
A1-B3-C2	13	0%
A3-B5-C1	25	0%
A3-B7-C1	11	0%
A4-B2-C1	36	0%
5 openings	95	0%

plan	predicted			recognized			accuracy
plan	count	games	wins	count	games	wins	good	?
Factory	94	99%	0%	95	100%	0%	100%	0%
Unknown	1	1%	0%		-	-	0%	0%

timing	#	median	early	late
gas steal attempt	0	-	-	-
gas steal success	0	-	-	-
enemy scout	95	1:47	1:22	8:27
enemy combat units	95	3:13	2:39	4:46
enemy air units	92	9:13	7:22	11:14
enemy cloaked units	89	6:26	5:41	11:14

#10 mcrave

opening	games	wins
A1-B1-B2-C2	3	0%
A1-B3-C2	15	27%
A3-B5-C1	14	0%
A3-B7-C1	26	8%
A4-B2-C1	40	30%
5 openings	98	18%

plan	predicted			recognized			accuracy
plan	count	games	wins	count	games	wins	good	?
Fast rush	6	6%	17%	2	2%	0%	17%	67%
Heavy rush	37	38%	19%	4	4%	0%	5%	89%
Naked expand	42	43%	21%	7	7%	14%	10%	86%
Unknown	13	13%	8%	85	87%	20%	0%	92%

timing	#	median	early	late
gas steal attempt	0	-	-	-
gas steal success	0	-	-	-
enemy scout	98	1:59	1:14	12:41
enemy combat units	98	4:13	2:55	7:13
enemy air units	36	7:39	6:17	10:39
enemy cloaked units	72	6:34	5:29	11:01

#11 ualbertabot

opening	games	wins
A1-B1-B2-C2	14	21%
A1-B3-C2	30	57%
A3-B5-C1	1	0%
A4-B2-C1	14	21%
4 openings	59	39%

plan	predicted			recognized			accuracy
plan	count	games	wins	count	games	wins	good	?
Fast rush	50	85%	40%	13	22%	54%	20%	74%
Naked expand	7	12%	14%	3	5%	33%	0%	71%
Unknown	2	3%	100%	43	73%	35%	0%	50%

timing	#	median	early	late
gas steal attempt	0	-	-	-
gas steal success	0	-	-	-
enemy scout	58	2:41	1:17	12:26
enemy combat units	58	3:29	2:18	7:27
enemy air units	8	7:05	6:46	15:01
enemy cloaked units	0	-	-	-

Again, AITP recorded fewer games and more wins than happened. Is it a bug, or is it intentionally over-recording wins for certain strategies to focus its search? Or what? AITP is interesting; it deserves a closer look into the code.

#13 bunkerboxer

opening	games	wins
A1-B3-C2	56	98%
A3-B5-C1	3	100%
2 openings	59	98%

plan	predicted			recognized			accuracy
plan	count	games	wins	count	games	wins	good	?
Unknown	2	3%	100%	58	98%	98%	0%	50%
Worker rush	57	97%	98%	1	2%	100%	0%	100%

timing	#	median	early	late
gas steal attempt	0	-	-	-
gas steal success	0	-	-	-
enemy scout	56	2:07	1:50	3:25
enemy combat units	36	7:44	2:58	9:22
enemy air units	0	-	-	-
enemy cloaked units	0	-	-	-

overall

	total		TvT		TvP		TvZ		TvR
opening	games	wins	games	wins	games	wins	games	wins	games	wins
A1-B1-B2-C2	86	3%	29	0%	33	0%	10	0%	14	21%
A1-B3-C2	256	42%	74	74%	52	8%	100	32%	30	57%
A3-B5-C1	149	2%	55	5%	75	0%	18	0%	1	0%
A3-B7-C1	191	2%	23	0%	126	2%	42	2%	0	-
A4-B2-C1	291	6%	72	0%	173	7%	32	6%	14	21%
total	973	14%	253	23%	459	4%	202	17%	59	39%
openings played	5		5		5		5		4
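The matchup columns are sums over the per-opponent tables. As a worked check, here is a small sketch that combines per-opponent rows into one cell of the overall table. The function name is made up, and the win counts are recovered from the rounded percentages, so the result is only approximate in general.

```python
def combine(rows):
    """rows: list of (games, win_percent) pairs, one per opponent.

    Win counts are recovered by rounding games * percent, which is only
    approximate because the published percentages are already rounded.
    Returns (total_games, total_wins, overall_win_percent).
    """
    games = sum(g for g, _ in rows)
    wins = sum(round(g * pct / 100) for g, pct in rows)
    percent = round(100 * wins / games) if games else 0
    return games, wins, percent


# A1-B3-C2 against the three terran opponents (Iron, XiaoYi, BunkerBoxeR),
# copied from the per-opponent tables above.
print(combine([(5, 0), (13, 0), (56, 98)]))

# A1-B3-C2 against the three zerg opponents (Steamhammer, ZZZKBot, Microwave).
print(combine([(17, 6), (51, 47), (32, 22)]))
```

Both results agree with the A1-B3-C2 row: 74 games at 74% for TvT and 100 games at 32% for TvZ.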

AIIDE 2019 - what DaQin learned

DaQin is derived from Locutus and also keeps 200 game records. But DaQin did not have pre-learned data. No games were left uncompleted; there are 100 against each opponent.

DaQin plays fewer builds than the other bots I’ve looked at so far.

#1 locutus

opening	games	wins
3GateDT	100	17%
1 openings	100	17%

plan	predicted			recognized			accuracy
plan	count	games	wins	count	games	wins	good	?
DarkTemplar rush	89	89%	16%	96	96%	17%	97%	2%
Proxy	6	6%	17%	2	2%	0%	0%	0%
Unknown	5	5%	40%	2	2%	50%	0%	0%

timing	#	median	early	late
gas steal attempt	47	1:43	1:39	2:06
gas steal success	0	-	-	-
enemy scout	99	6:07	1:21	9:07
enemy combat units	100	4:34	2:22	6:47
enemy air units	96	6:30	4:02	18:41
enemy cloaked units	0	-	-	-

DaQin had an enemy-specific strategy configured for Locutus, so it didn’t try anything else. Locutus is the only opponent that DaQin tried to prepare for, as far as I can see.

DaQin incorrectly recognized dark templar rush as Locutus’s strategy in most games, then correctly recorded that no cloaked units were seen during the game. See yesterday for Locutus’s play against DaQin, which did not include any DT build. I assume that the dark templar recognition is deliberately over-cautious, because DTs are dangerous. Locutus does have a fake dark templar build, where it adds a citadel of Adun to fool opponents into expecting dark templar (it works against most UAlbertaBot-derived bots).

#2 purplewave

opening	games	wins
2GateDT	23	22%
3GateDT	3	0%
4GateGoon	74	14%
3 openings	100	15%

plan	predicted			recognized			accuracy
plan	count	games	wins	count	games	wins	good	?
DarkTemplar rush	32	32%	16%	35	35%	23%	69%	0%
Fast rush	66	66%	14%	64	64%	11%	80%	0%
Proxy	1	1%	100%	1	1%	0%	0%	0%
Unknown	1	1%	0%		-	-	0%	0%

timing	#	median	early	late
gas steal attempt	29	0:46	0:46	0:50
gas steal success	8	-	-	-
enemy scout	99	2:17	1:18	4:41
enemy combat units	99	2:47	2:21	5:13
enemy air units	41	8:42	4:05	18:10
enemy cloaked units	85	6:07	5:06	15:41

Against PurpleWave, in contrast, DaQin less often foresaw dark templar, but apparently often faced them. (Arbiters can’t get out that fast.)

#3 bananabrain

opening	games	wins
2GateDT	4	25%
3GateDT	68	56%
4GateGoon	28	36%
3 openings	100	49%

plan	predicted			recognized			accuracy
plan	count	games	wins	count	games	wins	good	?
DarkTemplar rush	47	47%	53%	55	55%	62%	51%	0%
Fast rush	48	48%	44%	39	39%	33%	35%	0%
Heavy rush	1	1%	0%	2	2%	50%	0%	0%
Not fast rush	1	1%	100%	2	2%	0%	0%	0%
Proxy	1	1%	100%	2	2%	50%	0%	0%
Unknown	2	2%	50%		-	-	0%	0%

timing	#	median	early	late
gas steal attempt	43	1:42	0:46	1:48
gas steal success	9	-	-	-
enemy scout	100	1:59	1:21	3:09
enemy combat units	100	2:57	2:19	5:43
enemy air units	67	8:14	3:58	12:42
enemy cloaked units	28	5:47	4:57	19:38

#5 steamhammer

opening	games	wins
ForgeExpand5GateGoon	100	94%
1 openings	100	94%

plan	predicted			recognized			accuracy
plan	count	games	wins	count	games	wins	good	?
Fast rush		-	-	1	1%	100%	0%	0%
Heavy rush	29	29%	97%	18	18%	100%	14%	3%
Hydra bust	1	1%	100%	2	2%	100%	0%	0%
Not fast rush	64	64%	92%	72	72%	93%	69%	8%
Proxy		-	-	1	1%	100%	0%	0%
Unknown	6	6%	100%	6	6%	83%	0%	0%

timing	#	median	early	late
gas steal attempt	0	-	-	-
gas steal success	0	-	-	-
enemy scout	97	2:25	0:51	6:03
enemy combat units	100	3:17	1:57	7:03
enemy air units	18	9:23	5:30	16:18
enemy cloaked units	16	5:51	4:57	13:43

#6 zzzkbot

opening	games	wins
ForgeExpand5GateGoon	97	10%
ForgeExpandSpeedlots	3	0%
2 openings	100	10%

plan	predicted			recognized			accuracy
plan	count	games	wins	count	games	wins	good	?
Fast rush	3	3%	33%	5	5%	100%	0%	33%
Heavy rush	90	90%	3%	93	93%	4%	100%	0%
Not fast rush		-	-	1	1%	100%	0%	0%
Unknown	7	7%	86%	1	1%	0%	0%	0%

timing	#	median	early	late
gas steal attempt	0	-	-	-
gas steal success	0	-	-	-
enemy scout	97	2:57	0:59	7:30
enemy combat units	100	2:39	1:47	4:31
enemy air units	7	7:58	7:46	8:25
enemy cloaked units	0	-	-	-

How did ZZZKBot upset DaQin? These numbers suggest zergling bust (it could be hydras, but DaQin does have a hydra bust recognizer which did not fire): Mostly “heavy rush,” few mutalisks, no lurkers. Steamhammer also settled on zergling bust as the best bet, but was much less successful. Microwave tried its zergling bust build versus DaQin without success. Maybe ZZZKBot’s extreme aggression is the key.

#7 microwave

opening	games	wins
ForgeExpand5GateGoon	84	85%
ForgeExpandSpeedlots	16	75%
2 openings	100	83%

plan	predicted			recognized			accuracy
plan	count	games	wins	count	games	wins	good	?
Fast rush	15	15%	93%	15	15%	100%	33%	0%
Heavy rush	32	32%	81%	20	20%	85%	16%	9%
Not fast rush	50	50%	80%	59	59%	76%	66%	4%
Proxy		-	-	1	1%	100%	0%	0%
Unknown	3	3%	100%	5	5%	100%	0%	0%

timing	#	median	early	late
gas steal attempt	0	-	-	-
gas steal success	0	-	-	-
enemy scout	97	2:33	1:10	6:10
enemy combat units	90	3:29	1:50	6:37
enemy air units	41	10:37	5:15	14:07
enemy cloaked units	5	6:31	6:23	10:23

#8 iron

opening	games	wins
12NexusCarriers	92	96%
4GateGoon	8	50%
2 openings	100	92%

plan	predicted			recognized			accuracy
plan	count	games	wins	count	games	wins	good	?
Factory	55	55%	96%	95	95%	94%	95%	2%
Proxy	8	8%	50%	3	3%	33%	0%	12%
Unknown	37	37%	95%	2	2%	100%	0%	0%

timing	#	median	early	late
gas steal attempt	92	2:19	2:15	2:35
gas steal success	0	-	-	-
enemy scout	87	2:58	1:41	12:09
enemy combat units	100	4:18	2:50	5:49
enemy air units	36	8:23	6:29	15:43
enemy cloaked units	30	8:25	7:54	15:43

12NexusCarriers seems to be the default build versus terran. Apparently terrans, even Iron, were not able to punish the fast expand. Well, they’re not supposed to be able to without taking a risk themselves; that’s the point of cutting probes for nexus on 12. But it does require good play from protoss to make sure of it.

#9 xiaoyi

opening	games	wins
12NexusCarriers	93	84%
3GateDT	1	0%
4GateGoon	1	0%
DTDrop	5	80%
4 openings	100	82%

plan	predicted			recognized			accuracy
plan	count	games	wins	count	games	wins	good	?
Factory	60	60%	83%	47	47%	94%	48%	42%
Not fast rush	29	29%	76%	10	10%	80%	14%	48%
Proxy	1	1%	0%	2	2%	100%	0%	0%
Safe expand	4	4%	100%	1	1%	0%	0%	0%
Unknown	6	6%	100%	40	40%	70%	0%	17%

timing	#	median	early	late
gas steal attempt	99	2:19	0:46	2:25
gas steal success	3	-	-	-
enemy scout	93	2:23	2:10	19:03
enemy combat units	100	3:24	2:33	7:06
enemy air units	80	8:23	7:09	17:30
enemy cloaked units	11	8:15	7:57	8:27

XiaoYi usually got air tech pretty fast (air units seen in 80 of 100 games, median 8:23), which is unusual and interesting. I’m guessing it scouted the carriers coming and prepared wraiths.

#10 mcrave

opening	games	wins
2GateDT	1	0%
3GateDT	62	52%
4GateGoon	37	24%
3 openings	100	41%

plan	predicted			recognized			accuracy
plan	count	games	wins	count	games	wins	good	?
DarkTemplar rush	19	19%	53%	16	16%	62%	63%	0%
Fast rush	79	79%	39%	83	83%	36%	96%	0%
Naked expand		-	-	1	1%	100%	0%	0%
Unknown	2	2%	0%		-	-	0%	0%

timing	#	median	early	late
gas steal attempt	13	1:41	0:46	1:46
gas steal success	2	-	-	-
enemy scout	100	2:22	1:25	6:11
enemy combat units	100	3:03	2:21	5:29
enemy air units	21	6:11	3:38	15:57
enemy cloaked units	76	6:23	5:17	8:33

McRave upset DaQin. Dark templar showed up in about 3 out of 4 games (cloaked units in 76 of 100), and they came out pretty early. PurpleWave showed a similar pattern, but it wasn’t as salient because it wasn’t an upset. The dark templar rush recognizer did not seem to be fully effective, possibly because it was overridden by the fast rush recognizer. DaQin’s best counter was DT-back-atcha.

#11 ualbertabot

opening	games	wins
12NexusCarriers	2	50%
3GateDT	25	88%
4GateGoon	4	50%
DTDrop	2	50%
ForgeExpand5GateGoon	67	78%
5 openings	100	78%

plan	predicted			recognized			accuracy
plan	count	games	wins	count	games	wins	good	?
DarkTemplar rush	12	12%	75%	11	11%	91%	17%	8%
Factory	3	3%	67%	11	11%	100%	0%	0%
Fast rush	67	67%	78%	47	47%	57%	51%	7%
Heavy rush	1	1%	100%	5	5%	100%	0%	0%
Hydra bust		-	-	1	1%	100%	0%	0%
Not fast rush	13	13%	92%	15	15%	100%	8%	15%
Proxy	1	1%	0%	2	2%	50%	0%	0%
Unknown	3	3%	67%	8	8%	100%	0%	0%

timing	#	median	early	late
gas steal attempt	24	1:43	0:46	2:17
gas steal success	2	-	-	-
enemy scout	87	1:47	1:14	9:30
enemy combat units	98	3:01	1:38	6:58
enemy air units	9	7:37	6:07	15:47
enemy cloaked units	4	5:09	4:33	5:19

DaQin had some trouble adapting to random UAlbertaBot. This is a point where preparation for the opponent would have been valuable: Make a build that UAlbertaBot can’t beat and ensure that it is played. It can be a general-purpose build; PurpleWave included a cannon turtle build that is safe against all sorts of rushes.
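To make "a build that UAlbertaBot can’t beat" a little more concrete, here is a minimal Python sketch that ranks the openings in the table above by a cautious estimate of their win rate. It is not how DaQin chooses openings. The Wilson lower bound is my own choice of a score that discounts small samples, and the function and variable names are made up for illustration. Keep in mind that UAlbertaBot plays random, so each opening’s record mixes results against different races.

```python
import math

# UAlbertaBot rows from the table above: opening -> (games, win % as rounded there)
results = {
    "12NexusCarriers": (2, 50),
    "3GateDT": (25, 88),
    "4GateGoon": (4, 50),
    "DTDrop": (2, 50),
    "ForgeExpand5GateGoon": (67, 78),
}

def wilson_lower_bound(games, win_pct, z=1.96):
    """Lower end of the Wilson score interval for a win rate (z=1.96 is about 95%)."""
    if games == 0:
        return 0.0
    p = win_pct / 100
    centre = p + z * z / (2 * games)
    spread = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games))
    return (centre - spread) / (1 + z * z / games)

# Rank openings from most to least trustworthy win rate.
ranked = sorted(results, key=lambda name: wilson_lower_bound(*results[name]), reverse=True)
best = ranked[0]

for name in ranked:
    games, pct = results[name]
    print(f"{name}\t{games}\t{pct}%\t{wilson_lower_bound(games, pct):.2f}")
```

The openings with only 2 or 4 games drop to the bottom of the ranking no matter what their raw percentage says, which is the point of scoring by a lower bound rather than by the raw win rate.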

#12 aitp

opening	games	wins
12NexusCarriers	100	100%
1 opening	100	100%

plan	predicted			recognized			accuracy
plan	count	games	wins	count	games	wins	good	?
Factory	79	79%	100%	5	5%	100%	4%	96%
Unknown	21	21%	100%	95	95%	100%	0%	90%

timing	#	median	early	late
gas steal attempt	100	2:19	2:16	2:25
gas steal success	0	-	-	-
enemy scout	11	7:53	2:38	11:45
enemy combat units	100	5:55	2:43	7:29
enemy air units	67	10:07	8:50	14:01
enemy cloaked units	0	-	-	-

#13 bunkerboxer

opening	games	wins
12NexusCarriers	95	98%
4GateGoon	5	100%
2 openings	100	98%

plan	predicted			recognized			accuracy
plan	count	games	wins	count	games	wins	good	?
Fast rush		-	-	1	1%	100%	0%	0%
Not fast rush	78	78%	97%	35	35%	100%	35%	60%
Proxy	5	5%	100%	5	5%	100%	0%	0%
Unknown	17	17%	100%	59	59%	97%	0%	71%

timing	#	median	early	late
gas steal attempt	93	2:20	2:15	7:19
gas steal success	28	-	-	-
enemy scout	62	2:07	1:47	7:18
enemy combat units	59	2:59	2:09	7:51
enemy air units	0	-	-	-
enemy cloaked units	0	-	-	-

Beating BunkerBoxeR with a build of fast expansion into carriers is... not the intuitive choice. But I guess it worked.

overall

	total		PvT		PvP		PvZ		PvR
opening	games	wins	games	wins	games	wins	games	wins	games	wins
12NexusCarriers	382	94%	380	94%					2	50%
2GateDT	28	21%			28	21%
3GateDT	259	42%	1	0%	233	37%			25	88%
4GateGoon	157	25%	14	64%	139	21%			4	50%
DTDrop	7	71%	5	80%					2	50%
ForgeExpand5GateGoon	348	65%					281	62%	67	78%
ForgeExpandSpeedlots	19	63%					19	63%
total	1200	63%	400	93%	400	30%	300	62%	100	78%
openings played	7		4		3		2		5
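
As a sanity check, the total row can be recomputed from the per-opening rows as a games-weighted average. This is a minimal Python sketch using only the “total” columns of the table above. The table gives rounded percentages, so the win counts derived here are approximate, and the variable names are mine.

```python
# Per-opening totals copied from the "total" columns of the table above:
# opening -> (games, win rate in percent, as rounded in the table)
totals = {
    "12NexusCarriers": (382, 94),
    "2GateDT": (28, 21),
    "3GateDT": (259, 42),
    "4GateGoon": (157, 25),
    "DTDrop": (7, 71),
    "ForgeExpand5GateGoon": (348, 65),
    "ForgeExpandSpeedlots": (19, 63),
}

total_games = sum(games for games, _ in totals.values())

# Estimated wins; approximate because the percentages are rounded.
estimated_wins = sum(games * pct / 100 for games, pct in totals.values())

# Games-weighted overall win rate, rounded the same way the table rounds.
overall_pct = round(100 * estimated_wins / total_games)

print(total_games, f"{overall_pct}%")
```

The same weighting explains why the overall figure sits so far below the 94% of 12NexusCarriers: the 3GateDT and 4GateGoon rows carry 416 games between them at much lower win rates.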