
CIG 2017 crosstable

My version of the CIG 2017 crosstable. I have small differences from the official results—see the explanation below. My results reverse the finishing places of #8 CasiaBot and #9 Ziabot as well as #17 Bigeyes and #18 OpprimoBot, because even small differences affect ranking.

The format of the results file has changed since last year. There is no documentation, so I don’t know what all the columns mean, but I only needed a few of them and I was able to pick them out. It turns out that column 6 is true if the first player won, otherwise false. It looks like each game is recorded twice, with the same winner, loser, and map but some differences in other data. I imagine that each player is running in its own instance, and each instance records its own data. The games are numbered so the duplicates can be recognized, and sometimes the games are recorded out of order. I had to rewrite parsing code, but only a handful of lines.
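
To show the kind of parsing involved, here is a minimal sketch. It is not the actual script: the delimiter and the column positions of the game number and the player names are assumptions made for illustration; only the column 6 win flag comes from the format described above.

    // Minimal sketch of the kind of parsing involved, not the actual script.
    // Assumptions: comma-separated fields, game number in column 0, player
    // names in columns 1 and 2 (those positions are made up); only
    // "column 6 is the win flag" comes from the format described above.
    #include <cstdlib>
    #include <fstream>
    #include <iostream>
    #include <map>
    #include <set>
    #include <sstream>
    #include <string>
    #include <utility>
    #include <vector>

    int main()
    {
        std::ifstream in("results.csv");                  // hypothetical file name
        std::set<int> seenGames;                          // each game is recorded twice; keep one copy
        // (smaller name, larger name) -> (wins by the smaller-named bot, games)
        std::map<std::pair<std::string, std::string>, std::pair<int, int>> score;

        std::string line;
        while (std::getline(in, line))
        {
            std::vector<std::string> col;
            std::stringstream ss(line);
            std::string field;
            while (std::getline(ss, field, ',')) col.push_back(field);
            if (col.size() < 7) continue;                 // skip short or corrupted lines

            int gameNumber = std::atoi(col[0].c_str());   // assumed column
            if (!seenGames.insert(gameNumber).second) continue;   // duplicate record of this game

            std::string p1 = col[1], p2 = col[2];         // assumed columns
            bool p1Won = (col[6] == "true");              // column 6: did the first player win?
            if (p2 < p1) { std::swap(p1, p2); p1Won = !p1Won; }   // normalize the pairing

            score[{p1, p2}].second += 1;
            if (p1Won) score[{p1, p2}].first += 1;
        }

        for (const auto & kv : score)
            std::cout << kv.first.first << " vs " << kv.first.second << ": "
                      << kv.second.first << " wins in " << kv.second.second << " games\n";
    }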

The results file turned out to have a section of corrupted data in the middle. Information about a small number of games is missing or corrupted, and I had to delete it from my input. The tournament format of 125 rounds with 20 participants called for 125 * 20 * 19 / 2 = 23750 games. Each game was recorded twice, so there should be 47500 lines in the results file. One missing game was expected, because the tournament manager software has an off-by-one bug and skips one game. The last and highest-numbered game recorded in the results file is 23721, and numbering starts from 0, so it looks as though 3 games in fact went unplayed, or at least uncounted. There are 47487 lines remaining in the input file, accounting for 23722 games, or 99.88% of the ideal 23750, or 99.89% of the expected and claimed 23749 games.

Anyway, my winning percentages differ from the official numbers mostly starting in the third decimal place, which is what you expect when the game count itself is off by about one part in a thousand. Apparently the official numbers don’t suffer from corrupted data. I have written to the organizers to see if they can provide a clean results file.

            overall ZZZK tscm Purp Leta UAlb Mega Over Casi Ziab Iron AIUR McRa  Tyr SRbo Terr Bonj Bige Oppr Slin Sals
ZZZKBot      75.43%    -  52%  37%  53%  82%  77%  90%  31%  67%  66%  93%  83%  84%  82%  88%  89%  86%  94%  90%  90%
tscmoo       73.50%  48%    -  90%  82%  54%  67%  65%  46%  66%  58%  69%  74%  69%  75%  92%  89%  79%  94%  85%  94%
PurpleWave   66.51%  63%  10%    -  42%  62%  74%  89%  54%  41%  55%  82%  51%  89%  74%  86%  61%  90%  74%  73%  94%
LetaBot      62.75%  47%  18%  58%    -  62%  49%  76%  82%  70%  75%  51%  66%  54%  42%  70%  72%  78%  41%  87%  95%
UAlbertaBot  61.67%  18%  46%  38%  38%    -  30%  62%  60%  70%  42%  60%  79%  46%  67%  88%  90%  73%  87%  86%  92%
MegaBot      61.06%  23%  33%  26%  51%  70%    -  55%  37%  61%  44%  44%  60%  50%  81%  90%  82%  77%  97%  82%  96%
Overkill     59.65%  10%  35%  11%  24%  38%  45%    -  55%  76%  43%  65%  62%  42%  84%  94%  91%  93%  91%  83%  92%
CasiaBot     58.32%  69%  54%  46%  18%  40%  63%  45%    -  34%  46%  62%  59%  70%  29%  47%  82%  93%  75%  84%  94%
Ziabot       58.49%  33%  34%  59%  30%  30%  39%  24%  66%    -  47%  58%  39%  60%  73%  79%  89%  82%  90%  81%  96%
Iron         58.11%  34%  42%  45%  25%  58%  56%  57%  54%  53%    -  69%  56%  45%  69%  74%  70%  76%  76%  73%  74%
AIUR         56.73%   7%  31%  18%  49%  40%  56%  35%  38%  42%  31%    -  64%  74%  81%  82%  92%  72%  90%  83%  93%
McRave       47.20%  17%  26%  49%  34%  21%  40%  38%  41%  61%  44%  36%    -  44%  65%  61%  59%  52%  56%  77%  76%
Tyr          45.32%  16%  31%  11%  46%  54%  50%  58%  30%  40%  55%  26%  56%    -  24%  45%  58%  49%  63%  58%  91%
SRbotOne     45.24%  18%  25%  26%  58%  33%  19%  16%  71%  27%  31%  19%  35%  76%    -  35%  61%  97%  43%  75%  94%
TerranUAB    38.58%  12%   8%  14%  30%  12%  10%   6%  53%  21%  26%  18%  39%  55%  65%    -  77%  63%  67%  65%  93%
Bonjwa       33.04%  11%  11%  39%  28%  10%  18%   9%  18%  11%  30%   8%  41%  42%  39%  23%    -  61%  67%  65%  95%
Bigeyes      30.90%  14%  21%  10%  22%  27%  23%   7%   7%  18%  24%  28%  48%  51%   3%  37%  39%    -  60%  58%  91%
OpprimoBot   31.90%   6%   6%  26%  59%  13%   3%   9%  25%  10%  24%  10%  44%  37%  57%  33%  33%  40%    -  79%  93%
Sling        26.07%  10%  15%  27%  13%  14%  18%  17%  16%  19%  27%  17%  23%  42%  25%  35%  35%  42%  21%    -  77%
Salsa         9.52%  10%   6%   6%   5%   8%   4%   8%   6%   4%  26%   7%  24%   9%   6%   7%   5%   9%   7%  23%    -

observations

Newcomers #3 PurpleWave and #8 CasiaBot were the only players with positive scores versus #1 ZZZKBot. When I have time I’ll look into the replays and see if ZZZKBot was ready for the old-timers with special builds, or if it was the old-timers who weren’t ready for ZZZKBot. Perhaps ZZZKBot now has a learning feature and switches to a backup build if its 4 pool doesn’t work? I’m interested to find out.

#2 Tscmoo was edged out by #1 ZZZKBot and #8 CasiaBot, but otherwise had plus scores across the board. It showed stable performance across opponents—except for its crushing 90% win over #3 PurpleWave.

#3 PurpleWave did reasonably well against all except #2 tscmoo (10% wins) and #9 Zia. (I count 42% against #4 LetaBot as reasonably good, though it’s technically an upset.) But there were several weaker opponents that it edged out more narrowly than you might expect. My conclusion: Strong, but a little lacking in the solidity needed to defeat weaker opponents consistently. With more maturity it will likely become even stronger.

#8 CasiaBot seems to have the most uneven results, with severe upsets in both directions—69% versus #1 ZZZKBot and 29% against #14 SRbotOne.

The biggest upset is #18 OpprimoBot at 59% against #4 LetaBot.

The winning rates of #10 Iron and #12 McRave versus tail-ender #20 Salsa, which lost nearly every game against its other opponents, are only 74% and 76%. That backs up the claim that the two bots were able to play only about 75% of their games due to map problems with BWEM.

What do you notice in the crosstable?

Next: Per-map crosstables. We can expect dramatic numbers in some table cells thanks to Iron and McRave.

Update: I heard back from the organizers. They say that they had to alter the results file to compensate for errors in the tournament manager. And they think that all the problems are slight given the large number of games played. I think that’s true as far as it goes, but it leaves me feeling a little uneasy about the official results.

Steamhammer 1.3.3 is uploaded

Steamhammer 1.3.3 is uploaded. Fixing the disastrous “I seem to be making a command center, I’d better cancel it” bug was trickier than I expected. After I did fix it, I saw the game Steamhammer vs AyyyLmao, which was just as disastrous. I had to watch closely to see what happened: At about 1:15 into the game, Steamhammer moved a drone to start its spawning pool, and simultaneously sent out its scouting drone. The scouting drone momentarily blocked the spawning pool from starting, which happens from time to time and causes a slight delay. Not this time, though. The spawning pool was canceled on the spot due to another bug, and Steamhammer’s build order was disrupted. Steamhammer recovered poorly and lost the game against an opponent it should have beaten.

Anyway, I made several tweaks to the rules for giving up on buildings, and I tested more thoroughly with games against a variety of opponents. I think it’s mostly OK now. But the change is more fragile than I realized, and I won’t be surprised if more bugs are lying in wait.

Next: The full CIG 2017 results are posted, but they don’t come with the colorful crosstable (at least not yet). I’ll supply my usual red-and-blue crosstable.

Steamhammer 1.3.2 added a severe terran bug

Ack! I fixed one bug that happens when terran gives up on constructing a building, but I missed an even more serious terran bug in the same code. In Steamhammer 1.3.2, terran is unable to build a command center! The bot gives up on the building partway through and cancels it.

See Randomhammer vs ICEbot for an example. Randomhammer expanded late, and ICE had already placed a spider mine at its expansion spot, so Randomhammer gave up on its first attempt as intended. After its vessel came out, Randomhammer eventually cleared the mine and started the command center again—and again—and again, canceling it over and over. Mega ouch! I coded up a quick 14CC opening and verified that the bot can’t finish a command center at all.

The bug is so severe that it deserves a quick 1.3.3 release with a fix. Stand by, I’ll try to get it in today.

Steamhammer 1.3.2 is uploaded

Steamhammer 1.3.2 change list. It amounts to the donated openings, 3 fixes for terran and protoss because I only tested zerg in the AIIDE version, and 2 minuscule tweaks to zerg—and of course configuration to play on SSCAIT instead of in the AIIDE tournament. Otherwise it is the same as the AIIDE version.

  • New protoss openings from Antiga, donated by Iruian. Protoss play will be more varied and hopefully stronger.
  • Version 1.3 had a configuration mistake that affected games against the opponents Dave Churchill (UAlbertaBot) and Andrey Kurdiumov (a UAlbertaBot fork with software engineering changes rather than play changes) when Randomhammer rolled protoss or terran. Against them it was configured to always play a zerg opening, so when the bot was not zerg, the opening group was not set. Randomhammer played normally... except that it never built a combat unit, because combat units are made depending on the opening group. I corrected the configuration. Also see the next item.
  • Added a workaround in case the opening group is not set to a correct value when it needs to be: Set a default opening group when the error is noticed. As it always has, the bot also puts a message on the screen, so you’ll know what’s wrong.
  • The new “cancel a building if it takes too long or if too many workers are lost” feature included a bug for terran. Partly constructed buildings could be left around instead of being canceled. Fixed.
  • I changed the formula for making lurkers as aux units, so that Steamhammer makes more aux lurkers. Aux units are extra units added to the regular unit mix. This should improve play when Steamhammer has lurker tech but isn’t using lurkers as a primary unit.
  • If there are mutalisks, then devourers go into the flying squad with the mutalisks. They were mistakenly always put in the ground squad. It should improve devourer play, at least sometimes and a little.

Tomorrow I’ll update Steamhammer’s web page with the source. Then I will return to the opponent model.

Steamhammer 1.3.2 and Antiga’s openings

The biggest new feature in Steamhammer 1.3.2 will be the new protoss openings from Antiga (thanks to Iruian). I sorted the openings into a different order and renamed a few to keep to a more consistent scheme. I’m leaving out the 9-10 gate opening for now; it is only slightly different from 9-9 gate and plays worse because of BOSS, and Antiga leaves it out too.

I’m not straying too far from Antiga’s weights, either. I question some of them, but they are tested and known good and provide variety. I’m making only relatively small adjustments. And that’s the story. It’s taking time to configure and test everything, but I’ll push out the new version before long.

If anybody else would like to donate openings, I’ll include them in the Steamhammer distribution as extras if they are remotely useful. If they’re good, I’ll configure them to play in the live Steamhammer. Since protoss has just been fed, the terran openings are now begging for their handouts.

Someday, when Steamhammer is smarter than now, I’ll build the opening tree and have Steamhammer choose opening lines on the fly. Possibly I’ll have both a concrete tree of macro actions and an abstract tree of strategic plans. At that point, the bot will be able to decide for itself under what circumstances an opening is good, and it will make sense to feed it as many variant openings as possible regardless of quality. I don’t mind accumulating data now for that future day.

the donated openings from Antiga

Here’s what I found out about Antiga’s donated openings. (Thanks again to Iruian, by the way.) This post is only about the openings themselves. Also donated are weights, how often to play each opening in each matchup, which I’ll consider separately. My plan is to distribute the donated file along with Steamhammer’s source, so that whoever wants to can easily borrow the openings and weights. The openings are good, so I’ll also put most of them directly into Steamhammer’s regular configuration; Randomhammer will play them when it rolls protoss.

Going by names, Antiga’s openings are a superset of Steamhammer’s. But 3 of the openings with the same names are different, so it’s a little confusing.

identical openings

  • 1ZealotCore
  • DTRush
  • DTDrop
  • CorsairDT (weak because Steamhammer sucks with corsairs)
  • 12Nexus
  • 13Nexus

openings new in Antiga

These are good openings, so at the moment I’m thinking I’ll throw them into Steamhammer’s config for active use. The weights are a separate question.

  • 9-10Gate
  • NoZealotCore
  • 10-15GateGoon
  • 2ZealotCore
  • 2GatewayGoonExpo
  • Nexusfirst5zealotExpo

openings that are shared but different

I tested the shared openings by playing them against each other head-to-head. I’ll call it Steamhammer versus Antiga, though both sides were running identical Steamhammer code. I didn’t pay attention to the results of the games, which varied depending on how battles happened to come out. I paid attention to macro: Who had more income, who produced units more efficiently, who was able to expand sooner.

9-9 gate: The 2 openings proceeded in lockstep until Antiga went out of book. Then Steamhammer’s beautifully optimized UAlbertaBot zealot rush quickly pulled ahead. It evened out somewhat after Steamhammer also went out of book, but not entirely.

10-12 gate: Steamhammer’s 10-12 gate opening is not as highly optimized. Again the openings were identical until Antiga went out of book. Steamhammer pulled ahead, but not dangerously. I don’t see any big difference in strength between the openings, only a small advantage to Steamhammer’s existing opening.

ForgeExpand: The openings are the same, except that Antiga adds a third photon cannon at the end of the opening. It barely delays the early zealots and dragoons, so I judge it an improvement: Safer and practically as aggressive. I will switch Steamhammer’s opening to this version.

optimizing Steamhammer’s openings

Iruian donated some protoss openings with weights as used in the bot Antiga. I will distribute them with Steamhammer, but I’m still pondering exactly how. The openings are supposed to be straight from Liquipedia, and the ensemble is claimed to be 50 to 100 Elo stronger than Steamhammer’s default protoss openings. I don’t have any reason to doubt it, but I’m still in the process of checking the openings myself.

Unfortunately, there are reasons not to take openings straight out of Liquipedia without testing. Most of Steamhammer’s provided openings are modified from Liquipedia versions, and sometimes the changes are major. (Some were taken from other sources, and a few were developed from scratch, starting with no more than the opening stem and a plan.)

Sometimes Liquipedia is unhelpful, if not outright wrong. I think the 12 pool lurker build on this page is a prime example. It says you should adhere to the build strictly. But if you build everything that the build order calls for, then lurkers are severely delayed, and the build order makes no sense (at least to me); use a 12 hatch build instead. If you minimize the build to get lurkers as fast as possible, then you end up with 4 or more larvas for a long stretch after the second hatchery finishes, meaning that there was no reason to get the second hatchery so soon; you should have stuck with a single hatchery build. Presumably a pro can weigh the situation and decide what extra stuff is good to make, but Steamhammer doesn’t have that skill. So I found the Liquipedia build order unhelpful.

Often Steamhammer plays poorly in the middle game, and the weak play can be worked around or delayed by extending the opening. This is the main reason to deviate from standard builds: Alter the opening so that Steamhammer plays it better.

BOSS is weak, which causes terran and protoss to play poorly after the opening. BOSS tends to build too many production buildings—I often see 6 or 7 gateways on a one-base income that can support 4 gateways, which is wasteful. BOSS also likes to bunch the production of similar things, not keeping the nexus and gateways busy at the same time, but rather “probe probe probe probe zealot zealot zealot zealot”, where first the nexus is busy, then the nexus goes idle and the gateways become busy. It’s crazy inefficient.

That is why Steamhammer’s 9-9Gate opening is so long and detailed. If it ended early like the Liquipedia build, then BOSS would take over sooner and build inefficiently, and the bot would fall behind in probe count and zealot count for the rest of the game. I know this for sure, because I’ve run the different variants head-to-head. Dave Churchill optimized the 9-9 gate opening already for UAlbertaBot, and he did an excellent job. I optimized the 10-12 gate opening myself in a similar way and did not do it as thoroughly; it has room for improvement.

Someday I’ll write a new macro system and drop BOSS, but not yet. I might be able to get to it this year....

Sometimes the middle game strategy decisions are silly. Terran and protoss are especially short on strategy smarts, but the zerg strategy boss also has many weaknesses. In one frustrating example, if the opening build makes a spire but doesn’t make the mutalisks, Steamhammer may decide that lurkers were a better idea after all, get no lair units for a long time, and lose. There are many ways for the strategy boss to misunderstand the strategy behind an opening, so that you have to write a long explicit build order. Some openings I have shortened after improving the strategy boss. Some openings have legacy endings that it might be good to remove.

Anyway, the bottom line is that Steamhammer is not strong enough to play Liquipedia opening builds uncritically. Some are fine, some will confuse the poor bot.

On the other hand, Steamhammer’s existing protoss openings leave a lot to be desired. They’re more a demonstration of what’s possible than a sound selection. It’s no surprise if Antiga’s openings are stronger.

Next: Looking at the donated openings.

NLPRbot

NLPRbot is cpac by Qiyue Yin. I don’t know, but I have to suspect that it is the same version of cpac that is playing in AIIDE 2017.

It is a fork of Steamhammer. The configuration file has been incorporated into the .dll, but it still tells what is going on. Versus terran, it plays Steamhammer’s 11Gas10PoolLurker opening 90% of the time, and a couple of other openings the rest of the time (a sensible choice). I couldn’t see any difference between its lurker micro and Steamhammer’s. Versus zerg it plays a half dozen openings, the ones that Steamhammer plays most often. Versus protoss and random it plays even more similarly to Steamhammer.

It has fixed opponent-specific openings, listed below in the order they appear. The opponent names are the same as in AIIDE; some differ from the names used on SSCAIT.

opponent      opening
UAlbertaBot   OverpoolSpeedDave
Steamhammer   5PoolHard
Aiur          5PoolHard
Ximp          2HatchMutaXimp
Xelnaga       5PoolHard
Skynet        5PoolHardSkynet
MegaBot       5PoolHard
Microwave     ZvZ_Overpool9Gas
ZZZKBot       9PoolSpeedExpo
McRave        2HatchMutaMcRave

I’m not sure why Aiur and Xelnaga rated special counters. The ordinary opening mix should beat them reliably.

I didn’t dig into the .dll in detail, but I do see additions that look like tracking unit types and keeping feature vectors. It looks like there are extensive changes to the zerg strategy boss; possibly a learning algorithm has been plugged in. In one game I noticed different scouting behavior. Most of the time it plays like Steamhammer.

As far as the configuration itself goes, though, NLPRbot strikes me as a mildly obfuscated fork of Steamhammer 1.3, configured for maximum wins with minimum effort from the author. I don’t know any reason other than obfuscation to rename it from cpac to NLPRbot. Maybe the author wants it to look like an unrelated bot? Maybe somebody other than the author posted it? On the AIIDE roster, Qiyue Yin is listed as “Independent”, so it’s apparently not an institutional thing.

In any case, NLPRbot aka cpac seems successful so far, even scoring a win over Krasi0. With Steamhammer skills plus hand configuration plus strategy improvements, it should be a dangerous opponent.

Steamhammer’s opponent model

At its most general, a learning system consists of a data structure, a learning algorithm to update the data structure with new data, and a retrieval algorithm to answer questions from the learned data structure. You could say it is a specialized kind of memory, with a data store and ways to get information in and out. The information you get out is not the same as what you put in, and that is part of what makes it useful. Or you could say that ordinary computer memory is a learning system with no loss and no generalization.

One of my goals for Steamhammer’s opponent model is to learn from a single game. There is a well-known class of learning systems, called instance-based learning or memory-based learning, which makes that easy because the data store is nothing more than a record of the original input data. The simplest example is the k-nearest neighbors family of algorithms: Your data is a set of points (x, y), where input x gives output y (and x and y might be, say, vectors). To learn from a new data point, simply store it along with the others. To answer a question of the form “what’s the output for input x?” you find a given number, k, of x’s nearest neighbors according to some distance measure, and average their outputs (using whatever kind of average is appropriate for your problem). A fancier class of systems that can draw complex conclusions from a single example goes under the name case-based reasoning, where the data store is a database of structured “cases”, or examples.
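
To make the idea concrete, here is a toy k-nearest-neighbors sketch. It is generic code, nothing Steamhammer-specific.

    // Toy sketch of k-nearest-neighbors prediction, just to make the idea
    // concrete; generic code, nothing Steamhammer-specific.
    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <vector>

    struct DataPoint
    {
        std::vector<double> x;   // input features
        std::vector<double> y;   // output
    };

    // Euclidean distance between two feature vectors of the same length.
    double distance(const std::vector<double> & a, const std::vector<double> & b)
    {
        double d = 0.0;
        for (std::size_t i = 0; i < a.size(); ++i) d += (a[i] - b[i]) * (a[i] - b[i]);
        return std::sqrt(d);
    }

    // "Learning" is just remembering the data. To answer a query, find the k
    // stored points nearest to the query input and average their outputs.
    std::vector<double> predict(const std::vector<DataPoint> & data,
                                const std::vector<double> & query,
                                std::size_t k)
    {
        std::vector<const DataPoint *> nearest;
        for (const DataPoint & p : data) nearest.push_back(&p);
        std::sort(nearest.begin(), nearest.end(),
            [&query](const DataPoint * a, const DataPoint * b)
            { return distance(a->x, query) < distance(b->x, query); });

        k = std::min(k, nearest.size());
        if (k == 0) return {};
        std::vector<double> avg(nearest[0]->y.size(), 0.0);
        for (std::size_t i = 0; i < k; ++i)
            for (std::size_t j = 0; j < avg.size(); ++j)
                avg[j] += nearest[i]->y[j] / double(k);
        return avg;
    }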

Anyway, I thought I should use a method in the nearest neighbor family. It’s the simplest way to meet my goals.

What should my data points look like? Well, what information does Steamhammer use to make strategy decisions? It looks at the enemy’s current unit mix. I want to be able to predict the enemy’s unit mix at a given time: “Oh no, those zerglings are early, it’s a rush!” or “this enemy switches to goliaths and hardly any tanks, I should build up hydralisks next.” Both are nothing more than unit mix @ time.

My data points are games, boiled down to sequences of unit mixes. In the first implementation, Steamhammer takes a snapshot of the unit mixes of both sides every 30 seconds: “this many drones, that many zerglings, ....” I also threw in some supplementary information: the map, the opening chosen, and, as shortcuts for opening selection, the times at which the enemy got various things, such as the first combat units, the first flyers, the first detection, and so on. It simply appends all the data to a file named after that opponent.
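
For concreteness, a game record along these lines would do. The names and fields are illustrative guesses, not the actual data structures in the code.

    // Illustrative shape of a game record; the names and fields are guesses
    // at what such a record needs, not the actual classes in the code.
    #include <map>
    #include <string>
    #include <vector>

    using UnitMix = std::map<std::string, int>;   // unit type name -> count

    struct Snapshot
    {
        int     time;    // game time in seconds: 0, 30, 60, ...
        UnitMix ours;
        UnitMix enemy;
    };

    struct GameRecord
    {
        std::string map;
        std::string opening;                // the opening we chose
        int firstEnemyCombatUnit = -1;      // times of key events; -1 = never seen
        int firstEnemyFlyer      = -1;
        int firstEnemyDetection  = -1;
        std::vector<Snapshot> snapshots;    // appended every 30 seconds
    };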

To answer the question “what will the enemy’s unit mix be at time t?” the first implementation finds the nearest neighbor. It looks through the game records to find the best match game, the past game against the same opponent which is most like the current game, according to a similarity measure which adds up differences in unit mixes over time, up to the current time in the current game. (So the best match will change at most once every 30 seconds.) Having found the best match, it looks up the recorded enemy unit mix in the best match game record which is closest to time t and calls that the prediction. It’s dead simple.
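
Here is a sketch of that lookup, continuing the illustrative GameRecord above. The exact distance formula and the names are assumptions, not the real code.

    // Sketch of the lookup, continuing the illustrative GameRecord above.
    // The field names and the exact distance formula are assumptions.
    #include <algorithm>
    #include <climits>
    #include <cstddef>
    #include <cstdlib>

    // Distance between two unit mixes: sum of absolute count differences.
    int mixDistance(const UnitMix & a, const UnitMix & b)
    {
        int d = 0;
        for (const auto & kv : a)
        {
            auto it = b.find(kv.first);
            d += std::abs(kv.second - (it == b.end() ? 0 : it->second));
        }
        for (const auto & kv : b)
            if (a.find(kv.first) == a.end()) d += kv.second;
        return d;
    }

    // Similarity between the current game and a past game: add up the enemy
    // unit mix differences over the snapshots up to the current time.
    int gameDistance(const GameRecord & past, const GameRecord & current, int now)
    {
        int d = 0;
        std::size_t n = std::min(past.snapshots.size(), current.snapshots.size());
        for (std::size_t i = 0; i < n && current.snapshots[i].time <= now; ++i)
            d += mixDistance(past.snapshots[i].enemy, current.snapshots[i].enemy);
        return d;
    }

    // Prediction: find the best matching past game, then read off the enemy
    // unit mix recorded closest to time t in that game.
    UnitMix predictEnemyMix(const std::vector<GameRecord> & pastGames,
                            const GameRecord & current, int now, int t)
    {
        const GameRecord * best = nullptr;
        int bestD = INT_MAX;
        for (const GameRecord & g : pastGames)
        {
            int d = gameDistance(g, current, now);
            if (d < bestD) { bestD = d; best = &g; }
        }
        if (!best || best->snapshots.empty()) return UnitMix();

        const Snapshot * closest = &best->snapshots.front();
        for (const Snapshot & s : best->snapshots)
            if (std::abs(s.time - t) < std::abs(closest->time - t)) closest = &s;
        return closest->enemy;
    }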

That was my motivation. In fact, the game records have endless uses beyond predicting the enemy unit mix. For example, to figure out whether an opening is safe to play against this opponent, run the timings of the opening against the timings of the game records. If the opening always gets defenders in time, then the enemy will not smash you with a rush (or at least it will only be a surprise once). Or if you notice that the enemy never gets detection, then go lurkers and get an easy win. And so on.
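
For instance, a check for an opponent that never gets detection in time could be as simple as this, again using the illustrative record fields. The 8 minute threshold is arbitrary.

    // Again using the illustrative record fields above: lurkers look attractive
    // if no recorded game against this opponent shows detection before, say,
    // 8 minutes. The threshold is arbitrary, purely for illustration.
    bool enemySlowToDetect(const std::vector<GameRecord> & pastGames)
    {
        const int threshold = 8 * 60;      // seconds
        for (const GameRecord & g : pastGames)
            if (g.firstEnemyDetection >= 0 && g.firstEnemyDetection < threshold)
                return false;
        return !pastGames.empty();         // with no data, don't assume anything
    }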

Einstein, hand me the simplicity!

You can see why I thought the method was obvious. With clear goals and the right background knowledge, it is obvious. And you can see why I thought I could get it working within a few weeks; there is nothing complicated here. If I were a better coder, I would have succeeded.

Of course, it may turn out that the simplest option is not good enough. For the first cut I wanted to take the easiest way. If some part turns out to work poorly, I have improvements up my sleeve. The possible improvements are as endless as the possible uses.

  • The recorded unit mixes include buildings. Buildings are especially important for predicting what the opponent is up to, but my first cut similarity measure does not understand that. It treats the difference between 1 barracks and 2 the same as the difference between 1 marine and 2, which is obviously not ideal (see the sketch after this list).
  • For some purposes, it may be better to record the total units ever made (or ever seen, if the enemy’s) instead of the current unit mix, because the current mix depends on the outcome of battles as well as the strategy followed.
  • If the best match is not close at all, maybe it should be ignored.
  • If there are a number of good matches, maybe they should be averaged together.
  • Surely the current unit mix should have a role in predicting the next unit mix. In the first cut, it is ignored.
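
As an example of the first point, the similarity measure could weight building differences more heavily than unit differences. A sketch, reusing the illustrative UnitMix above; the weight and the name-based building test are placeholders.

    // Sketch of a weighted mix distance that counts building differences more
    // heavily than unit differences, reusing the illustrative UnitMix above.
    // The weight and the name-based building test are placeholders; real code
    // would ask BWAPI whether the unit type is a building.
    #include <cstdlib>
    #include <string>

    int weightedMixDistance(const UnitMix & a, const UnitMix & b)
    {
        auto weight = [](const std::string & typeName)
        {
            bool looksLikeBuilding =
                typeName.find("Hatchery")  != std::string::npos ||
                typeName.find("Barracks")  != std::string::npos ||
                typeName.find("Gateway")   != std::string::npos;
            return looksLikeBuilding ? 4 : 1;    // arbitrary weight for illustration
        };

        int d = 0;
        for (const auto & kv : a)
        {
            auto it = b.find(kv.first);
            d += weight(kv.first) * std::abs(kv.second - (it == b.end() ? 0 : it->second));
        }
        for (const auto & kv : b)
            if (a.find(kv.first) == a.end()) d += weight(kv.first) * kv.second;
        return d;
    }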

The bottom line is that my first implementation may or may not work adequately. But I’m confident it can be improved until it does work.

Steamhammer AIIDE 2017 version

You can download the archive I submitted to AIIDE 2017, which is Steamhammer 1.3.1. It follows the AIIDE rules: It includes a compiled binary, the configuration file, and source. Unlike an SSCAIT submission, it does not include the BWAPI 4.1.2 library.

The configuration file is specialized for the tournament. The terran and protoss opening lines are removed, and it is intended to play zerg only. I did all my testing with zerg, and mostly on the AIIDE maps. Everything is optimized for the best AIIDE performance, at the expense of anything else.

Next is Steamhammer 1.3.2 for SSCAIT. It will restore the terran and protoss configuration. I also made a minor change that should improve devourer play (at least a little), and may make a few other small fixes. I need to test that protoss and terran are working well. It shouldn’t take more than a day or two, and then I’ll be right back on opponent modeling.

AIIDE 2017 outlook

AIIDE submission closes after today. Some bots may yet withdraw at the last moment or otherwise disappear, but we pretty much know the roster. I want to look ahead with the information we have.

race distribution

This year we have a zerg plurality. That happens to be good for Steamhammer, because ZvZ is Steamhammer’s best matchup.

race      #    %
terran    7   21%
protoss  10   31%
zerg     14   43%
random    1    3%

The percentages add up to 98% instead of 100% due to rounding.

BWAPI versions

I was surprised how many bots were already on BWAPI 4.2.0.

BWAPI     #    %
3.7.4     9   28%
4.1.2    17   53%
4.2.0     6   18%

The 2 bots listed as withdrawn were also on BWAPI 4.2.0. Does anybody know if or when SSCAIT will support the newer version? I know resources are limited, so I won’t be surprised if it takes time.

6 of the 9 bots still on BWAPI 3.7.4 are the legacy bots carried over from last year. Only 3 are active entrants. Surprisingly, one of the 3 is ForceBot, which seems to be a recent bot since it only recently appeared on SSCAIT and suffered some teething troubles there.

the newcomers

There are unusually many absolute newcomers, bots whose names or authors I don’t recognize. History says that most newcomers will do poorly; it takes a long time to turn an empty repository into a contender. But there are exceptions. This year PurpleWave reached #3 in CIG 2017. Last year Bereaver was a sensation. It will be a surprise if an unknown wins—but surprises happen.

  • bonjwAI
  • CherryPi
  • cpac
  • DeepTerran
  • HOLD
  • Inspir
  • Myscbot
  • Sling

I think many are especially looking forward to the Facebook entry CherryPi by old hand Gabriel Synnaeve. I’ve read the papers, and my hopes are not high. My impression is that the project is not far along. I expect CherryPi to finish in the lower half, and I mainly hope that it will show some interesting behaviors along the way.

the unpredictable

A few bots are both potential top finishers and potential also-rans. Arrakhammer and McRave have shown inconsistent results, strong play when in their best form but many losses when problems creep into the codebase. AILien was often impressive on SSCAIT, but stopped playing there after updating to BWAPI 4.2.0. I don’t know what updates it may have seen since.

the known contenders

Iron is of course the favorite. In CIG, Iron suffered because it was unready for the new and troublesome maps. In AIIDE, the maps are standard and are the same as last year. I see a strong chance that Iron will pull in well ahead.

Bots that are likely to finish high are LetaBot, Microwave, PurpleWave, Steamhammer, UAlbertaBot, and ZZZKBot. ZZZKBot won CIG 2017 and is a perpetual contender. This tournament has more new bots than CIG, which favors ZZZKBot’s exceptionally well-implemented 4 pool—it smashes opponents which are not perfectly prepared. ZZZKBot is my pick as the most likely #2 finisher. Beyond that, it gets harder for me to forecast.

Steamhammer I think is likely to finish fairly high, but outside the top 3. It will lose almost all games versus Iron and PurpleWave, and some versus UAlbertaBot (which I expect to be stronger with the new SparCraft). Without opponent modeling and unable to adjust its strategy mixes, Steamhammer is also at risk of losing many games to newcomer bots. I see the opponent model as a necessary feature for a tournament, because unknown opponents might do anything, and known opponents might have surprises in hand.

But really, we don’t know. The tournament exists to find out.

Update: The roster page now says that 4 of the newcomer bots, plus AILien, either withdrew or did not submit. 27 contenders remain.

Update 2: Now AILien is listed as submitted after all. Whew.

Steamhammer 1.3.1 change list

Steamhammer 1.3.1 is the AIIDE 2017 tournament version. I made the submission.

Here’s the change list. This version is configured specially for the tournament. On Saturday I’ll release the link to the tournament submission, and not long after I’ll upload a version to SSCAIT with a configuration tailored for SSCAIT.

I list a lot of changes, but they are small. I allowed only changes that were quick and had a low risk of introducing bad side effects, since I was concentrating on opponent modeling. I expect the net effect will be that Steamhammer plays a slightly but distinctly cleaner game. I think that people watching the games won’t be able to tell the difference unless they know what to look for, but if they watch long enough they’ll realize that the new version makes fewer strange blunders.

configuration

• Stuff related to I/O moved to a new IO configuration section. The ReadDirectory and WriteDirectory used to be under Strategy. This is where you turn the opponent model on or off. It is currently off, because it isn’t working yet.

• Added one new command, "go scout once around", which tries to send the scout on one circuit of the enemy base and then return it home. The waypoints can get messed up, so in some positions on some maps the scout follows a strange path.

• I added a half-dozen zerg openings. They were intended as options for the opponent modeler to choose among. They are 2 hatch and 3 hatch lurker openings to fill out the lurker selections, turtle openings to hold against zealot rushes, and a couple additional anti-4 pool openings to give more choices against hard zergling rushes.

buildings

• Steamhammer has a current “main base” where new buildings go. It frequently chooses a random new main base to spread out its buildings. This especially helps protoss avoid filling up its starting base and dying. The change is to not shift to a new main base while in the opening book. The effect is that the opening line is executed more efficiently, with less drone movement. See InformationManager::maybeChooseNewMainBase().

• Release the worker when a building is canceled or fails to construct. BuildingManager::undoBuildings(). There seems to be at least one more case where the building manager doesn’t release a worker when it should.

• Keep track of how long it has been since a building was ordered, and how many workers have been assigned to build it (a new worker gets assigned when the previous one is lost). If the building has waited too long, or too many workers have died trying to make it, then cancel the building. This prevents Steamhammer from sending all its workers through the enemy army to start a building (it still sends a few). It also stops the building manager from accumulating buildings that can’t be started, the cause of more than one serious bug.
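
The rough shape of the rule looks like this; the names and thresholds are made up for illustration, not copied from the code.

    // Rough shape of the give-up rule, with made-up names and thresholds.
    struct PendingBuilding
    {
        int framesSinceOrdered = 0;   // how long the building has waited
        int buildersLost       = 0;   // workers that died trying to start it
    };

    bool shouldGiveUp(const PendingBuilding & b)
    {
        const int maxWaitFrames   = 24 * 60 * 2;   // roughly 2 game minutes
        const int maxBuildersLost = 3;
        return b.framesSinceOrdered > maxWaitFrames || b.buildersLost >= maxBuildersLost;
    }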

tactics

• Bftjoe suggested dropping the “is any enemy unit in range?” check from Squad::needsToRegroup(). Good idea. It was originally an optimization in UAlbertaBot, and started to grow complicated as I tried to fix bugs. Let the combat simulation handle it; that way is simpler and better.

• I dropped the “we just retreated, don’t attack yet” timer from 3 seconds to 2. Also part of Squad::needsToRegroup().

FAP combat simulation

• The author N00byEdge fixed a bug in FAP in calculating concussive damage. This especially helps it understand how badly zerglings lose to vultures.

• Don’t pass carrier interceptors to FAP. As bftjoe pointed out, FAP is designed to treat the carrier as if it were doing the damage directly.

• Steamhammer 1.3 did not add workers to the combat sim. I changed it to add workers, whether ours or the enemy’s, which have attack orders. A worker which is actively engaged in combat will be simulated, at least while it is attacking rather than blocking or fleeing, and one which is busy working will be ignored. It should be more accurate (see the sketch after this list).

• Other minor changes to which units are passed in to the combat simulator. For example, SparCraft understands detectors while FAP does not, so Steamhammer no longer passes mobile detectors to the combat sim—it won’t do anything with them. Overall, the unit selection code is simpler and cleaner. As part of this, I deleted the now unused InformationManager::isValidUnit(); the job goes to UnitUtil, which is responsible for classifying units.
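
Putting the last three items together, the selection filter amounts to something like this sketch; it is illustrative, not the actual code.

    // Rough sketch of the kind of selection filter involved, not the actual code.
    #include <BWAPI.h>

    bool includeInSim(BWAPI::Unit u)
    {
        BWAPI::UnitType type = u->getType();

        // FAP models a carrier as doing its interceptors' damage directly,
        // so the interceptors themselves are left out.
        if (type == BWAPI::UnitTypes::Protoss_Interceptor) return false;

        // FAP doesn't understand detection, so mobile detectors add nothing.
        if (type.isDetector() && !type.isBuilding()) return false;

        // Workers count only if they are actually fighting.
        if (type.isWorker()) return u->isAttacking();

        return true;
    }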

other stuff

• Steamhammer had several routines which selected a mineral worker to complete some job. It failed if there were no mineral workers, even if there were idle workers that could be assigned. These routines can now choose idle workers too, fixing several bugs.

• I fixed the drone dance bug in WorkerManager::handleGasWorkers(). The bug was located by Arrak.

• Tracking of bases is not always accurate—even our own bases. I added InformationManager::updateTheBases() to periodically double-check and correct errors. So far, only the case of “we thought this was our base, but it isn’t really” is written; the other cases are unfinished. The fix corrects some serious misbehaviors.

• UnitUtil::IsValidUnit() now considers a unit “valid” when it is loaded into a bunker or transport. A loaded unit does not have a valid position, so it used to say that the unit was invalid.

• UnitUtil::GetAllUnitCount() has the job of counting all units of a given type, whether complete or not. It now counts uncompleted morphed units, meaning lurkers in the egg and guardians and devourers in the cocoon.
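
A rough sketch of that kind of counting, using plain BWAPI calls; it is not the actual implementation, which has more detail.

    // Rough sketch of counting a unit type with morphing units included, using
    // plain BWAPI calls; not the actual implementation (which has more detail,
    // for example zergling and scourge eggs that produce two units).
    #include <BWAPI.h>

    int countAllUnits(BWAPI::UnitType type)
    {
        int count = 0;
        for (BWAPI::Unit u : BWAPI::Broodwar->self()->getUnits())
        {
            if (u->getType() == type)
            {
                ++count;                 // complete, or incomplete but already the right type
            }
            else if ((u->getType() == BWAPI::UnitTypes::Zerg_Egg ||
                      u->getType() == BWAPI::UnitTypes::Zerg_Lurker_Egg ||
                      u->getType() == BWAPI::UnitTypes::Zerg_Cocoon) &&
                     u->getBuildType() == type)
            {
                ++count;                 // morphing into the requested type
            }
        }
        return count;
    }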

project settings

• Don’t try to link SparCraft, which is long gone. SparCraft was still included in a Visual Studio linker setting, and I didn’t notice because I had an old copy still lying around. Sorry about that. :-(

zerg strategy boss

• If our tech target is a lair tech, make sure we have at least 2 extractors. If it is a hive tech, make sure we have 3. This corrects some strategy freezes where Steamhammer would stay at a low tech level for a long time, not getting more gas because the low tech level didn’t need it, and not teching up because the higher tech level wanted more gas. It was a gap in the rules.

• Don’t try to research hydra upgrades and lurker aspect simultaneously. Oops. This caused a production freeze until the research finished.

• Steamhammer 1.3 considered that a ground emergency lasted 5 seconds after the last enemy unit was seen off. Then it said, “whew, time to make drones.” But if it was a close defense, the bot was left with few combat units and often lost to the next attack. I bumped the timer up to 15 seconds, enough time to rebuild combat squads before switching to drones. The change makes the biggest difference in games versus protoss.

• Fixed a minor bug in deciding whether to make an aux unit (an extra unit type added in small numbers to the regular unit mix).

• Also relaxed the condition for creating an aux unit, so Steamhammer is quicker to make them. The biggest effect is that if it is going hydra-ling and also has lurker tech, it is more likely to add the 1 aux lurker (or so) to its unit mix, increasing its fighting effectiveness.

• Other adjustments to the unit mix, to what-counters-what, and the like. The code needed a little refactoring to allow for the (removed) hookup to the opponent model, but changes to the logic are minor.

• Fixed a typo that caused Steamhammer to try to make ultralisks when it wanted guardians as a main unit. Oops. The effect was usually to get no hive units, since there was usually no ultra cavern.

• When Steamhammer has a tech target, it sometimes techs too fast. I tuned it down in certain cases. It used to often research lurkers and then immediately start a spire instead of using the gas to make its first lurkers, a harmful delay. Now it holds off on the spire and queen’s nest until it has “enough” combat units (according to an arbitrary low limit).

no opponent modeling in AIIDE :-(

I am bitterly disappointed. I wrote my opponent modeling code. I kept it as simple as I could, since it’s the first draft of a major new feature. When it seemed to work I hooked it into the strategy boss, which then made decisions based on the enemy’s predicted future unit mix, not based on the current observed unit mix. I ran a few test games without visible problems. Then I tested more thoroughly, and turned up severe bugs. It is not close to solid enough to enter into a major tournament. And as far as I am concerned, frantic last minute debugging is not a plan.

I got what I deserved for hurrying to meet the deadline. I should have run careful tests earlier. I still think I had enough time to polish up the basic features, if I had only worked wisely. It’s not that complicated. It’s a simple idea that nobody happens to have implemented before (that I know of).

Anyway, for the AIIDE version I’ll turn off opponent modeling and remove the hookup into the strategy boss. I’ll release it as version 1.3.1, with a configuration file specially tuned for the tournament. It does have bug fixes and other improvements over the current version 1.3, so it should play a little better. (I count 19 bug fixes and small improvements plus 1 minor new feature, as well as new 2 hatch and 3 hatch lurker openings.) Then I’ll take my time with opponent modeling and release version 1.4 when it is good and ready.

The opponent model has strong capabilities in principle. It can meet most of the goals I laid out. If there is data—enough data from past games and enough from the current game—it can estimate the enemy unit mix at any future time during the game, and can predict the times of important events like “enemy gets air units” and “enemy gets detection.” An opponent that plays a fixed strategy should turn transparent by the second game. An opponent with a modest number of fixed strategies should become predictable as soon as Steamhammer has seen all the strategies—and when the current game has progressed far enough to distinguish them. I do expect trouble predicting opponents which successfully hide their tech, or which make important decisions in the middle game randomly, or which (like Steamhammer itself) play a wide range of strategies.

The downside is that it is only a data structure. Each use of the prediction capability has to be coded into the bot separately. I coded the prediction routine to estimate the enemy’s future unit mix. I coded 2 uses of the prediction in the strategy boss, one to set the tech target and one to set the unit mix (using different time horizons). When I want to use predictions to make spore colonies in time, or to get the right number of scourge at the right time to counter XIMP’s carriers as they show themselves, I’ll have to separately code more uses into the strategy boss, because of the scattered way the strategy boss works. When I want to use it for opening selection, I’ll have to code a different prediction routine and use it. And so on. The ability to predict is powerful, and you have to pay for the power.

Since I kept things simple for the first cut, predictions may be a little unsteady. It should be good enough to be useful, but whether it is or not, there is room for improvement.

I’m bitterly disappointed not to get opponent modeling into AIIDE. It’s an easy idea—to me it’s obvious, and I have to imagine that nobody else has done it only because they’re concentrating on other aspects. But it’s also new and I expect it to be successful. If I’m right, then other authors will soon be borrowing my opponent modeling code or writing their own along similar lines. The actively developed bots may take a step up in ability.

choosing tournament maps

I doubt anyone will take my advice, and I certainly won’t take it myself, so I can offer it with complete freedom. Here’s what I would take into account in choosing maps if I were running a tournament.

Balance. You want the maps to be fair across races. We can’t use statistics from pro games to judge balance, because bot balance and pro balance are unrelated. Also bots are improving rapidly and the participant pools are small, plus we may choose some new maps in each tournament, so past tournament statistics are not too helpful for balance either. But the same graph I linked above, showing that bot balance and pro balance are different, also shows that bots have narrower imbalances. Or to say it differently: The maps may have imbalances, but bots suck at exploiting the imbalances. Choose enough maps, and the balance differences will average out statistically; it’s the same principle as balancing a portfolio of stocks. The 5 maps of CIG 2017 are not enough to convince me, but the 10 maps of AIIDE are probably enough.

Number of starting positions. For this year, maps were chosen like this:

tournament   2 player   3 player   4 player
CIG          1 (20%)    2 (40%)    2 (40%)
AIIDE        3 (30%)    2 (20%)    5 (50%)
SSCAIT       3 (21%)    2 (14%)    9 (64%)

Those all seem reasonable to me. I like SSCAIT’s ratios best. 2 player maps favor rush strategies and 4 player maps favor macro strategies, and you don’t want to emphasize either too much. One issue is that there aren’t many good 3 player maps (though the best ones are quite good).

Novelty versus consistency. If you carry some maps over from year to year, we can use them to (at least try to) measure balance changes and progress. If you introduce new maps, you pose a stronger test of adaptability. If I were choosing, I would pick some old standbys and a few unusual maps that the bots might not have played on before (or else run specialized tournaments and do both separately). CIG has done a good job of this, though I think it’s only a side effect of their process and not a deliberate decision.

Prodding bots to improve. Since bots are poor at exploiting map features, I want to include some maps with exploitable features to encourage authors to step up. Think of Iron failing on the map Hitchhiker at CIG 2017 because (as the author explained) BWEM did not grasp all the map features; do you think Iron will fail the same way next year? I proposed the map Namja Iyagi, which has 6 islands, as a map with exploitable features which is still playable by bots that do not understand islands. PurpleFistJadian suggested Outsider, which has the exploitable feature of pushing units through mineral lines and remains playable without. There are a lot of choices; the universe of pro maps is large.

No appearance of cheating. Sometimes the tournament organizers participate in the tournament. When not, the organizers may have a real or apparent interest in some participants: “Bot X uses method Y, which I’ve been pushing. So if X does well....” To avoid controversy, we may want map selection to make favoritism visibly difficult. So divide your universe of maps into classes depending on the other goals, and choose randomly from each class. We’ve seen the procedure of accepting a number from each participant, XORing the numbers together, and using that as the random number seed for a known generator, so that the process is transparent and tamper-resistant. It has never been clear to me whether the organizers actually follow the elaborate process. We often see map pools reused from year to year, probably to save time. Well, if there are no suspicions, then there is no reason to allay them. I have no reason to suspect that the tournaments are unfair, even unintentionally.
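
Here is a sketch of that seeding procedure; the map classes and their contents are placeholders, not a real tournament pool.

    // Sketch of the transparent seeding procedure: XOR the submitted numbers
    // into a seed for a known generator, then draw from each map class. The
    // classes and their contents here are placeholders, not a real pool.
    #include <algorithm>
    #include <cstdint>
    #include <iostream>
    #include <random>
    #include <string>
    #include <vector>

    int main()
    {
        // One number submitted by each participant (illustrative values).
        std::vector<std::uint32_t> submissions = { 12345, 67890, 424242 };

        std::uint32_t seed = 0;
        for (std::uint32_t n : submissions) seed ^= n;   // tamper-resistant combined seed

        std::mt19937 rng(seed);                          // known, reproducible generator

        // Divide the universe of maps into classes and draw from each class.
        std::vector<std::vector<std::string>> classes = {
            { "Destination", "Heartbreak Ridge", "Hitchhiker" },      // 2 player
            { "Tau Cross", "Alchemist", "Aztec" },                    // 3 player
            { "Python", "Andromeda", "Circuit Breaker", "Fortress" }  // 4 player
        };

        for (auto & mapClass : classes)
        {
            std::shuffle(mapClass.begin(), mapClass.end(), rng);
            std::cout << "chosen: " << mapClass.front() << "\n";      // take one per class here
        }
    }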

bftjoe hits a winning streak

These 2 games are instructive. They were played almost back to back. bftjoe beats Iron with a 1 base lurker opening. The strategy is close to Steamhammer’s overpool lurker build, but there are some wrinkles. The lesson: Wrought Iron is brittle. Iron runs on specific rules, and when a situation comes up that the rules did not anticipate, it can break down. Casiabot has also defeated Iron with 1 base lurkers.

lurkers break in

bftjoe beats Krasi0 with 2 hatch mutalisk play. The lesson: Krasi0 has different builds for different purposes, and it is subject to strategic surprise if you figure out how to counter one of its builds.

mutalisks wreak havoc

The top bots have been staying ahead of the pack, but they are not pulling further ahead. They remain vulnerable.

CIG 2017 results first look

The CIG 2017 tournament results are out. So far we have a results table and a slideshow of the conference presentation with a few interesting details. I expect the full results will be out before long. Here I summarize the results table in a way that emphasizes the tight grouping of finishers 4 through 11; they all scored nearly the same. There is a wide gap between the bots with plus scores and the lower half.

place   win rate   bots
1       75%        ZZZKBot
2       74%        tscmoo
3       67%        PurpleWave
4-11    63-57%     LetaBot, UAlbertaBot, MegaBot, Overkill, CasiaBot, Ziabot, Iron, Aiur
12-20   46-9%      McRave, Tyr, SRBotOne, TerranUAB, Bonjwa, Bigeyes, OpprimoBot, Sling, Salsa

The format this year was straight round robin with 20 entrants, 5 maps, and 125 rounds. With 125 games for each pair of opponents, 25 games on each map, slow machine learning algorithms had some data to bite into.

3 of the 5 maps are also used on SSCAIT: Tau Cross, Andromeda, and Python. The 2 player map Hitchhiker has a short rush distance and favors cheese. But if the game does not end early, then the narrow ravine between bases and the arrangement of map blocks calls for sophisticated play. I think Hitchhiker must be a difficult map for bots. The remaining map is 3 player Alchemist, which was also used last year. Each base has 2 ramp entrances, so a bot which wants to defend at “the” ramp may go wrong.

Discussion. The sophisticated 4-pooler ZZZKBot by Chris Coxe was the top winner. I imagine that having Hitchhiker in the map pool helped it. I’m curious to see its games and find out whether it had special-case strategies for specific opponents, as it has had in the past. Tscmoo played random for the first time and placed second. These 2 usually place high. PurpleWave came in third, an outstanding performance for a new entrant. Congratulations!

The pre-tournament favorite Iron did surprisingly poorly. It must have had a bug. I don’t know the cause, but my first guess is that it suffered on one of the maps. McRave is another bot which did not perform at its peak.

The most interesting detail in the slide show is a pair of graphs showing the effect of machine learning for opponent modeling. The first graph shows the win rates of the winners ZZZKBot and Tscmoo sagging toward the end of the tournament. The second shows MegaBot and SRBotOne soaring toward the end and says that they were the top scorers in the final rounds 120-125. In other words, if the tournament had continued long enough, the winners would have been completely different. On the one hand, this shows the power of machine learning; on the other hand, it shows its slowness, because even this long tournament was not long enough. In Steamhammer, I would like both fast adaptation and slow adaptation. The middlegame use of fast adaptation is almost working now, and I intend to use the same mechanism in a different way for opening selection. But there will be no time to add slow adaptation to fine-tune as more games accumulate.

I think that UAlbertaBot and Overkill were the 2015 versions. UAlbertaBot presumably had learning turned off. Those 2 and OpprimoBot have been constant for a while and can serve as benchmarks to judge progress. (AIUR is not as constant a benchmark because it has learning turned on.) In AIIDE 2015, the top finishers in order were Tscmoo, ZZZKBot, Overkill, UAlbertaBot. So in 2 years, former top bots have receded into the pack of above-average scorers.