
the winning attitude

I compare 3 idealized attitudes a game programmer might take. Real people aren’t so simple, of course, but it’s a way to think about attitude. In reality, you can pick your own goals, or you can not even think about it and just do what comes naturally, whatever. The only prescription here is if your goal is this, then these points may help you approach it.

the winning attitude

Trying to come out on top.

I think that working on a game program is a lot like working on an industrial process or business procedure: The principles of continuous improvement apply.

  • The goal is to reduce mistakes and inefficiencies.
    • To win, reduce the biggest play errors.
    • Thou shalt not suffer a bug to live. Even a minor bug detracts from your ability to understand what’s happening.
    • Code for clarity and reliability.
    • Test carefully to find bugs and understand what works and how.
    • When you have a choice, go after the big win first. Don’t go astray solving little problems when you can solve a big one.
  • Constantly try new ideas.
  • Objective data tells you which ideas live or die. Your intuition may often be wrong.
  • Never stop! There is no “good enough,” at most “good enough for today.”
  • Dig in and analyze details. The deeper and wider your understanding, the better your new ideas will be.

See how chess programmer Ed Schröder does it. Chess programs are mature, of course, and improve in much smaller steps than Starcraft bots, but there are similarities.

“There will be errors in the program—hundreds of them—and it is only through extensive testing that the errors can be found and corrected.” — Jonathan Schaeffer, One Jump Ahead

the academic attitude

Trying to understand.

Typical long-term goals are to find a powerful technique that works to solve many different problems, or to understand theoretically when and why a given technique works. In practice, academics usually have to settle for step-by-step progress like everybody else. Typical methods include simplifying the problem or tackling one aspect at a time, to make it more tractable. An academic will not consider the work done when the program is proven to work, but only when the theory behind the program is proven to work—I think that that theoretical inclination is the most important difference between the academic attitude and the winning attitude: Theoretical versus practical.

The academic attitude is responsible for a lot of great AI ideas, from alpha-beta to deep learning. I think that the winning attitude is responsible for the lion’s share of the refinements and practical tricks in putting the great ideas to use. The two attitudes don’t exclude each other, after all.

The paper “Evaluating Real-Time Strategy Game States Using Convolutional Neural Networks” by Marius Stanescu, Nicolas A. Barriga, Andy Hess and Michael Buro in the CIG 2016 proceedings is a good example of the academic attitude. The authors want to apply deep learning with convolutional neural networks, as in AlphaGo, to RTS games. It’s a big problem that can’t be solved in one short paper, only started. And they don’t start with a full-size game like Starcraft, they start with a greatly simplified game as a testbed to begin to get some insight. The key conclusion: “several orders of magnitude slower than the other evaluation functions, but the accuracy gain far outweighs the speed disadvantage.”

the hobbyist attitude

Having fun.

This is more freeform. No bullet list! Do whatever’s interesting, try crazy experiments because they might turn out entertaining. Or repeat other people’s successes, if you think that’s fun. Some people do.

Tscmoo’s terran nuke and protoss dark archon strategies are the hobbyist attitude at work.

SSCAIT scores - raw data for yesterday’s graph

Here is a csv file of rating differences and score ratios. It contains all the data that went into yesterday’s graph, plus the names of the opponents so we can see which bots are behind each dot. It is over 4MB for 80,798 games, hardly Big Data territory but not something you want to look through by eye either.

Krasi0 asked about the outliers in the upper left, where the weaker bot upset the stronger with a huge score ratio. I pulled out the 44 games where the weaker bot was at least 50 Elo behind and won with a score ratio of 100 or more. There’s quite a variety of bots and it looks like most of the losers are genuine strong opponents, but we can’t tell from this why they lost so badly.
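
If you want to slice the data yourself, the filter is a few lines of Python. A minimal sketch, assuming a file name and column names (winner, loser, rating_diff, score_ratio) that you should match to the actual csv:

    import csv

    # Pull out upsets where the weaker bot won by a huge score margin.
    with open("score_ratios.csv") as f:
        rows = list(csv.DictReader(f))

    outliers = [r for r in rows
                if float(r["rating_diff"]) <= -50
                and float(r["score_ratio"]) >= 100]
    for r in sorted(outliers, key=lambda r: float(r["rating_diff"])):
        print(r["winner"], r["loser"], r["rating_diff"], r["score_ratio"])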

winner | loser | rating_diff | score_ratio
Odin2014 | Soeren Klett | -121.241105546074 | 102
Maja Nemsilajova | Gaoyuan Chen | -87.5441567081255 | 133.75
EradicatumXVR | Martin Rooijackers | -176.159988871814 | 134.75
Florian Richoux | Andrew Smith | -75.4167275722925 | 139.75
EradicatumXVR | Martin Rooijackers | -160.787872111647 | 123.6
Jakub Trancik | tscmooz | -234.397880730261 | 107.85
EradicatumXVR | Tomas Vajda | -310.393156864715 | 238.75
Odin2014 | tscmoo | -341.595996071025 | 111.2
Jakub Trancik | Krasimir Krystev | -206.879443535575 | 118.5
Martin Rooijackers | Tomas Vajda | -185.169798041552 | 178
Martin Rooijackers | Tomas Vajda | -196.771893565481 | 196.5
Jakub Trancik | Florian Richoux | -201.533152887389 | 146
Martin Rooijackers | Tomas Vajda | -214.947508799781 | 194.5
EradicatumXVR | Soeren Klett | -59.609552771464 | 121.625
Martin Rooijackers | Tomas Vajda | -197.317290128813 | 173.75
EradicatumXVR | Martin Rooijackers | -153.322771550638 | 134
NUS Bot | Tomas Cere | -217.342932814072 | 102.25
EradicatumXVR | Martin Rooijackers | -299.534513717939 | 140
Martin Rooijackers | Tomas Vajda | -177.764917764824 | 171.5
Jakub Trancik | tscmoo | -114.300240687596 | 113.25
Martin Rooijackers | Tomas Vajda | -261.726998366137 | 185.5
Tomas Vajda | ICELab | -61.7649472971907 | 110.3
Maja Nemsilajova | Gaoyuan Chen | -444.702632047678 | 148.5
Jakub Trancik | Aurelien Lermant | -96.3561952340121 | 144
Adrian Sternmuller | Serega | -165.832601805773 | 127.6
Ian Nicholas DaCosta | Tomas Cere | -180.057678684451 | 109.25
OpprimoBot | Radim Bobek | -107.746261964435 | 187.75
Martin Rooijackers | Tomas Vajda | -276.603234707109 | 108.2
Florian Richoux | Martin Rooijackers | -117.736046527607 | 126
Igor Lacik | Gaoyuan Chen | -94.2338975266132 | 121.75
Florian Richoux | Andrew Smith | -186.917595581173 | 123.25
Radim Bobek | UPStarcraftAI | -62.9693177806437 | 136.5
Tomas Vajda | Andrew Smith | -79.8133863519054 | 152.771739130435
Jakub Trancik | Florian Richoux | -84.9235344515139 | 128
Martin Rooijackers | Tomas Vajda | -78.9815583248117 | 207
Jakub Trancik | PeregrineBot | -69.7933587952098 | 121.25
Dave Churchill | Andrew Smith | -94.5789654176522 | 180.75
Tomas Vajda | WuliBot | -52.5760915924302 | 182.5
Soeren Klett | Dave Churchill | -155.311160590808 | 361.25
Odin2014 | PeregrineBot | -94.3129452431417 | 124.55
DAIDOES | MegaBot | -178.609296376345 | 115.875
OpprimoBot | Martin Rooijackers | -786.274344230815 | 307.75
Soeren Klett | Dave Churchill | -68.2015666358432 | 109

SSCAIT scores - compared to rating differences

This scatter chart shows rating differences versus score ratios. For each SSCAIT game in the dataset which ended with both scores above zero (about 80,000 games), the x-axis has the rating difference, winner’s rating minus loser’s rating. The ratings are calculated as of that game, so changing versions don’t mess things up (much). Check the Elo table to see what the rating differences mean. The y-axis has the score ratio, winner’s score divided by loser’s score. The y-axis is on a logarithmic scale and shows ratios from 0.2 to 200. A small number of points are off the top or bottom; no points are off the left or right. You can click through the graph to get the same image on its own, which may make it easier to zoom in and out.

Since it’s from the winner’s point of view, most games have rating difference > 0 (the higher rated bot won) and score ratio > 1 (the bot with the higher score won).
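
The chart is easy to recreate from the csv of rating differences and score ratios in the post above. A minimal sketch, assuming columns named rating_diff and score_ratio:

    import csv
    import matplotlib.pyplot as plt

    with open("score_ratios.csv") as f:
        rows = list(csv.DictReader(f))

    x = [float(r["rating_diff"]) for r in rows]
    y = [float(r["score_ratio"]) for r in rows]

    plt.scatter(x, y, s=2, alpha=0.3)
    plt.yscale("log")   # logarithmic y-axis, as in the chart
    plt.ylim(0.2, 200)  # points outside this range fall off the chart
    plt.xlabel("rating difference (winner minus loser)")
    plt.ylabel("score ratio (winner / loser)")
    plt.show()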

scatter chart of rating difference versus score ratio

There is complex structure here, but I’m at a loss to interpret much of it. Horizontal lines show that some score ratios are popular, which seems like a quirk of the game. Beyond the sharp lines, some fuzzier stratification in the score ratios is visible. When the stronger bot wins, it is usually by a score ratio of at least 2, increasing slowly as the strength difference goes up. The slowly rising “soft floor” in the score ratio is interesting and surprising. There are other clear structures in the chart, but I don’t know what they mean. It is mildly interesting that the left side, when the stronger bot lost, looks less structured.

Games where the score ratio is less than 1 are games where stopping early and adjudicating by score would give the wrong answer. It’s rare... but not rare enough to give me confidence in the timeout adjudication procedure. Some points are off the bottom, so some bots won despite having less than 1/5 the score by SSCAIT rules. The adjudication procedure will make occasional extreme mistakes.

Next: The winning attitude.

SSCAIT scores - crashes

In the SSCAIT data, some scores are recorded as 0 or -100. I think 0 means that the bot scored no kills and no razings, so the game finished very early. Bots might legitimately lose with no kills against a rush, but I have to guess that in many cases it means the bot failed to start up or failed to do anything. I think -100 means the bot crashed. I also think that exceeding the time limit per frame ends the game without changing the scores, so -100 only means crashes, not time infractions. I don’t promise that my interpretation of the numbers is correct! I could be wrong!

Anyway, assuming that my interpretation is right, this table of -100 scores should tell us about reliability. Like yesterday, I include only games after 17 August, so that distant versions of the same bot are not lumped together. The game counts are different from yesterday, because yesterday’s table excludes games in which either side has a score of 0 or -100.

“Crashes” are games in which the bot had -100 score. “Double crashes” are games in which both sides had -100 score. “Fast crashes” are games in which the opponent had 0 score, so the crash must have happened very early or when the opponent failed to achieve anything. “Max opp score” is the largest score that the opponent had in any game where the bot crashed. “Other crashes” are crashes where the opponent has a positive score (all crashes except double crashes and fast crashes). “Mean pos opp score” is the average of the opponent’s score in these other crashes. Large opponent scores mean that the game went on for a long time before the crash. Since a crash scores as -100, there is no way to tell whether the bot was ahead or behind when it crashed.
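
Under that interpretation the tallies are simple to compute. A sketch with made-up field names (bot, bot_score, opp_score per game record):

    from collections import defaultdict

    def crash_stats(games):
        """Count crashes, double crashes, and fast crashes per bot."""
        stats = defaultdict(lambda: {"games": 0, "crashes": 0,
                                     "double": 0, "fast": 0})
        for g in games:
            s = stats[g["bot"]]
            s["games"] += 1
            if g["bot_score"] == -100:       # -100: the bot crashed
                s["crashes"] += 1
                if g["opp_score"] == -100:   # both sides crashed
                    s["double"] += 1
                elif g["opp_score"] == 0:    # opponent achieved nothing yet
                    s["fast"] += 1
        return stats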

bot | games | crashes | crash % | double crash % | fast crash % | max opp score | other crash % | mean pos opp score
krasi0 | 394 | 1 | 0.25% | 0.25% | 0.00% | -100 | 0.00% | -
Iron bot | 331 | 8 | 2.42% | 1.51% | 0.00% | 7225 | 0.91% | 4367
Marian Devecka | 302 | 30 | 9.93% | 0.33% | 0.00% | 129110 | 9.60% | 52320
Martin Rooijackers | 336 | 2 | 0.60% | 0.60% | 0.00% | -100 | 0.00% | -
tscmooz | 330 | 3 | 0.91% | 0.91% | 0.00% | -100 | 0.00% | -
tscmoo | 332 | 1 | 0.30% | 0.30% | 0.00% | -100 | 0.00% | -
LetaBot CIG 2016 | 318 | 1 | 0.31% | 0.31% | 0.00% | -100 | 0.00% | -
WuliBot | 304 | 3 | 0.99% | 0.99% | 0.00% | -100 | 0.00% | -
Simon Prins | 318 | 19 | 5.97% | 2.52% | 3.46% | 0 | 0.00% | -
ICELab | 334 | 12 | 3.59% | 0.30% | 0.30% | 3500 | 2.99% | 2960
Sijia Xu | 285 | 1 | 0.35% | 0.35% | 0.00% | -100 | 0.00% | -
LetaBot SSCAI 2015 Final | 338 | 28 | 8.28% | 0.89% | 0.59% | 60850 | 6.80% | 16196
Dave Churchill | 323 | 0 | 0.00% | 0.00% | 0.00% | - | - | -
Chris Coxe | 223 | 1 | 0.45% | 0.45% | 0.00% | -100 | 0.00% | -
Tomas Vajda | 317 | 16 | 5.05% | 0.63% | 0.32% | 118435 | 4.10% | 44220
Flash | 306 | 2 | 0.65% | 0.65% | 0.00% | -100 | 0.00% | -
Zia bot | 327 | 33 | 10.09% | 0.92% | 1.22% | 45095 | 7.95% | 23120
PeregrineBot | 157 | 0 | 0.00% | 0.00% | 0.00% | - | - | -
tscmoop | 328 | 2 | 0.61% | 0.61% | 0.00% | -100 | 0.00% | -
Andrew Smith | 334 | 5 | 1.50% | 1.50% | 0.00% | -100 | 0.00% | -
Florian Richoux | 308 | 31 | 10.06% | 0.97% | 0.32% | 57405 | 8.77% | 14962
Carsten Nielsen | 347 | 4 | 1.15% | 1.15% | 0.00% | -100 | 0.00% | -
Soeren Klett | 284 | 2 | 0.70% | 0.35% | 0.35% | -100 | 0.00% | -
Jakub Trancik | 318 | 0 | 0.00% | 0.00% | 0.00% | - | - | -
Tomas Cere | 345 | 3 | 0.87% | 0.00% | 0.58% | 81825 | 0.29% | 81825
MegaBot | 305 | 54 | 17.70% | 1.31% | 2.95% | 92450 | 13.44% | 16394
Aurelien Lermant | 349 | 4 | 1.15% | 1.15% | 0.00% | -100 | 0.00% | -
Odin2014 | 183 | 17 | 9.29% | 0.55% | 1.64% | 38600 | 7.10% | 22077
Gaoyuan Chen | 329 | 0 | 0.00% | 0.00% | 0.00% | - | - | -
DAIDOES | 136 | 8 | 5.88% | 0.74% | 0.74% | 1750 | 4.41% | 4742
Igor Lacik | 144 | 13 | 9.03% | 0.69% | 0.00% | 54100 | 8.33% | 13421
Matej Istenik | 323 | 1 | 0.31% | 0.31% | 0.00% | -100 | 0.00% | -
NUS Bot | 137 | 3 | 2.19% | 0.00% | 0.00% | 34175 | 2.19% | 22400
Roman Danielis | 304 | 3 | 0.99% | 0.99% | 0.00% | -100 | 0.00% | -
ZerGreenBot | 36 | 25 | 69.44% | 2.78% | 8.33% | 37025 | 58.33% | 22087
Ian Nicholas DaCosta | 150 | 1 | 0.67% | 0.67% | 0.00% | -100 | 0.00% | -
AwesomeBot | 164 | 16 | 9.76% | 1.22% | 1.83% | 27850 | 6.71% | 7968
Johan Kayser | 306 | 1 | 0.33% | 0.00% | 0.33% | 0 | 0.00% | -
Martin Vlcak | 151 | 19 | 12.58% | 0.66% | 11.92% | 0 | 0.00% | -
Rob Bogie | 135 | 0 | 0.00% | 0.00% | 0.00% | - | - | -
Christoffer Artmann | 343 | 52 | 15.16% | 0.58% | 11.95% | 700 | 2.62% | 300
Marek Gajdos | 175 | 68 | 38.86% | 26.29% | 12.00% | 100 | 0.57% | 100
Travis Shelton | 155 | 0 | 0.00% | 0.00% | 0.00% | - | - | -
Bjorn P Mattsson | 288 | 36 | 12.50% | 0.00% | 12.50% | 0 | 0.00% | -
Vladimir Jurenka | 349 | 37 | 10.60% | 1.43% | 2.87% | 400 | 6.30% | 264
neverdieTRX | 170 | 1 | 0.59% | 0.59% | 0.00% | -100 | 0.00% | -
OpprimoBot | 351 | 51 | 14.53% | 0.57% | 1.42% | 38825 | 12.54% | 17424
Sungguk Cha | 320 | 168 | 52.50% | 1.25% | 4.69% | 60675 | 46.56% | 17463
Jacob Knudsen | 177 | 56 | 31.64% | 0.56% | 3.39% | 17750 | 27.68% | 5957
HoangPhuc | 151 | 115 | 76.16% | 0.66% | 5.96% | 38050 | 69.54% | 10765
ButcherBoy | 142 | 35 | 24.65% | 0.70% | 7.04% | 10000 | 16.90% | 1364

The top bots are mostly reliable—except Marian Devecka’s Killerbot. My impression from watching games is that Killerbot may crash when losing. It makes sense that bots should crash less often when winning, because bot authors have more reason to fix crashes that lose winning games. The bottommost bots are all crash-prone. Not a coincidence, is it? Step 1 to a winning bot: Fix your crashing bugs!

We might imagine that double crashes are caused primarily by the game or some other part of the runtime system, but the high rate of double crashes by Marek Gajdos makes me question that. What is that bot breaking?

Tomorrow: I’ll try to correlate scores with ratings, one way or another.

SSCAIT scores - summary by bot

I’m looking into the scores recorded in the SSCAIT game data, which I have up to 27 September. So far I haven’t found anything too interesting, but it’s not entirely useless either.

According to the SSCAIT rules, a player’s score is the sum of units killed plus buildings razed: BWAPI::Player::getKillScore() + BWAPI::Player::getRazingScore().

Here’s basic score information for the dates between 17 August 2016 and 27 September 2016. It’s the same date range I used in the SSCAIT crosstables, chosen so as not to smear too many different bot versions into one table. The difference and ratio columns are all arithmetic means. They give the difference or the ratio between the winner’s and loser’s scores. (Well, the loss score columns are from the loser’s point of view: the loss score diff is the loser’s score minus the winner’s, and the loss score ratio is the loser’s score divided by the winner’s, to make them easier to compare by eye.) Games in which either side had a score of 0 (no kills) or -100 (crash) are left out.
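
In code form, the columns look like this. A sketch with made-up field names (won, bot_score, opp_score), computed from the bot’s own point of view, which is why a positive loss score diff means the loser was ahead:

    from statistics import mean

    def summarize(games):
        """Summary columns for one bot's games (needs >= 1 win and 1 loss;
        games with a 0 or -100 score are assumed already dropped)."""
        wins = [g for g in games if g["won"]]
        losses = [g for g in games if not g["won"]]
        return {
            "games": len(games),
            "win %": 100.0 * len(wins) / len(games),
            "mean score": mean(g["bot_score"] for g in games),
            "mean win score": mean(g["bot_score"] for g in wins),
            "mean loss score": mean(g["bot_score"] for g in losses),
            "win score diff": mean(g["bot_score"] - g["opp_score"] for g in wins),
            "loss score diff": mean(g["bot_score"] - g["opp_score"] for g in losses),
            "win score ratio": mean(g["bot_score"] / g["opp_score"] for g in wins),
            "loss score ratio": mean(g["bot_score"] / g["opp_score"] for g in losses),
        }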

bot | games | win % | mean score | mean win score | mean loss score | win score diff | loss score diff | win score ratio | loss score ratio
krasi0 | 298 | 86.91% | 54850 | 55676 | 49368 | 45776 | 9306 | 15.58 | 3.06
Iron bot | 242 | 80.17% | 23967 | 24835 | 20459 | 18644 | -7699 | 16.77 | 4.06
Marian Devecka | 211 | 92.42% | 21120 | 22213 | 7798 | 9442 | -29578 | 8.97 | 8.94
Martin Rooijackers | 260 | 79.23% | 25077 | 28722 | 11170 | 22046 | -27686 | 9.14 | 13.59
tscmooz | 247 | 73.68% | 20322 | 25134 | 6847 | 12118 | -15203 | 9.84 | 10.71
tscmoo | 272 | 71.69% | 35035 | 41591 | 18435 | 23035 | -24015 | 5.49 | 6.53
LetaBot CIG 2016 | 256 | 73.05% | 26635 | 30229 | 16896 | 22995 | -21810 | 9.03 | 4.24
WuliBot | 234 | 65.81% | 8441 | 8627 | 8084 | 6814 | -18057 | 17.22 | 12.69
Simon Prins | 224 | 65.62% | 24256 | 28058 | 16998 | 21971 | -12391 | 17.68 | 12.18
ICELab | 242 | 66.53% | 36286 | 43099 | 22745 | 28818 | -26350 | 6.47 | 10.22
Sijia Xu | 236 | 63.98% | 12119 | 14598 | 7715 | 9913 | -23460 | 15.79 | 8.28
LetaBot SSCAI 2015 Final | 236 | 64.41% | 19400 | 24725 | 9764 | 18854 | -17427 | 11.04 | 5.24
Dave Churchill | 239 | 56.49% | 6656 | 8412 | 4377 | 6394 | -16370 | 17.41 | 11.89
Chris Coxe | 175 | 57.71% | 2989 | 3886 | 1763 | 3468 | -3798 | 23.59 | 6.49
Tomas Vajda | 230 | 64.78% | 34027 | 39627 | 23727 | 34048 | -19234 | 36.52 | 5.11
Flash | 251 | 64.14% | 11435 | 13872 | 7075 | 8335 | -25305 | 7.78 | 9.89
Zia bot | 236 | 51.69% | 13082 | 17640 | 8205 | 8544 | -15518 | 12.12 | 7.28
PeregrineBot | 131 | 51.91% | 3560 | 5415 | 1558 | 4651 | -6587 | 22.77 | 12.65
tscmoop | 258 | 52.33% | 15653 | 23128 | 7448 | 8055 | -30718 | 9.95 | 14.85
Andrew Smith | 257 | 56.03% | 15896 | 18391 | 12717 | 12180 | -26508 | 8.84 | 7.42
Florian Richoux | 223 | 53.36% | 14775 | 22804 | 5588 | 8518 | -22805 | 6.30 | 11.30
Carsten Nielsen | 275 | 50.91% | 10147 | 12121 | 8100 | 8140 | -20057 | 11.68 | 6.89
Soeren Klett | 228 | 45.18% | 39869 | 57319 | 25490 | 43732 | -12973 | 9.32 | 11.69
Jakub Trancik | 204 | 45.10% | 15552 | 15965 | 15212 | 11133 | 2909 | 14.55 | 6.41
Tomas Cere | 281 | 44.13% | 18317 | 32027 | 7488 | 18054 | -28958 | 7.13 | 16.25
MegaBot | 185 | 55.68% | 16661 | 22464 | 9372 | 16042 | -17473 | 9.21 | 16.45
Aurelien Lermant | 288 | 37.15% | 18580 | 36669 | 7886 | -12822 | -2264 | 11.25 | 12.13
Odin2014 | 131 | 46.56% | 12552 | 16757 | 8887 | 8998 | -16757 | 16.23 | 11.67
Gaoyuan Chen | 258 | 39.53% | 11417 | 16855 | 7861 | 8691 | -26698 | 7.13 | 13.62
DAIDOES | 101 | 27.72% | 10794 | 23254 | 6015 | 17054 | -10093 | 12.43 | 13.22
Igor Lacik | 105 | 35.24% | 13186 | 28378 | 4920 | 14545 | -14483 | 7.66 | 7.40
Matej Istenik | 260 | 29.62% | 17509 | 32652 | 11137 | 19086 | -11509 | 7.54 | 7.64
NUS Bot | 96 | 32.29% | 7069 | 12885 | 4295 | 8953 | -14754 | 8.73 | 13.61
Roman Danielis | 244 | 22.95% | 19086 | 43268 | 11883 | 20858 | -27151 | 3.83 | 10.30
ZerGreenBot | 6 | 66.67% | 9762 | 13554 | 2180 | 2291 | -8045 | 3.44 | 10.87
Ian Nicholas DaCosta | 116 | 16.38% | 4048 | 10358 | 2812 | 6223 | -8795 | 7.56 | 12.45
AwesomeBot | 115 | 26.96% | 8100 | 18712 | 4184 | 9280 | -16898 | 3.11 | 20.40
Johan Kayser | 249 | 17.67% | 12965 | 35732 | 8078 | 23303 | -10207 | 10.87 | 11.14
Martin Vlcak | 102 | 31.37% | 11386 | 20558 | 7194 | 10658 | -15459 | 12.34 | 14.06
Rob Bogie | 59 | 50.85% | 12777 | 16248 | 9186 | 9555 | -22662 | 13.03 | 4.92
Christoffer Artmann | 197 | 17.26% | 7712 | 24928 | 4121 | 15025 | -15194 | 4.94 | 16.67
Marek Gajdos | 72 | 11.11% | 5187 | 16426 | 3782 | 12820 | -9671 | 5.80 | 16.60
Travis Shelton | 106 | 16.98% | 7942 | 17816 | 5922 | 9374 | -9455 | 3.51 | 17.10
Bjorn P Mattsson | 190 | 18.95% | 4995 | 14730 | 2719 | 7300 | -15237 | 3.99 | 28.59
Vladimir Jurenka | 142 | 28.17% | 9561 | 15333 | 7297 | 8970 | -11691 | 5.60 | 7.47
neverdieTRX | 137 | 14.60% | 8493 | 21777 | 6222 | 13883 | -10783 | 4.55 | 11.22
OpprimoBot | 218 | 12.39% | 15672 | 25258 | 14317 | 17671 | -5654 | 20.18 | 13.84
Sungguk Cha | 122 | 25.41% | 15263 | 34072 | 8855 | 21740 | -11468 | 6.50 | 8.50
Jacob Knudsen | 92 | 19.57% | 7285 | 19250 | 4375 | 11994 | -12193 | 5.08 | 18.43
HoangPhuc | 12 | 91.67% | 15148 | 16316 | 2300 | 9550 | -22345 | 3.14 | 10.72
ButcherBoy | 15 | 6.67% | 1777 | 5655 | 1500 | 3105 | -6543 | 2.22 | 18.09

The most striking point is that Krasi0 was ahead in points, on average, in the games that it lost. So was Jakub Trancik’s cannon bot. The data that I have does not record the cause of losses. It’s perfectly possible to lose while ahead on points, when you fight efficiently and destroy masses of enemy stuff before dying. But you may also be ahead on points when you crash or overstep the time limit.

Score increases as the game goes on. I think that the score diff columns mostly tell us how long the games were. So, for example, Marian Devecka’s Killerbot often won short games and hung on through a long fight in lost games. The score ratio columns seem more informative about how far ahead the winner was at the end of the game. Killerbot tended to win with about the same point ratio that it lost with. Krasi0 won with a huge point ratio and lost with a small ratio, which might reflect its defensive style. Iron, which is super-aggressive, also won with a huge ratio and lost with a small ratio, which in its case might mean that it won after a long series of pinpricks or lost in a sudden collapse.

The numbers are not easy to interpret! But they must mean something.

Tomorrow: I’ll try to dig out something about the rate of failing to start up.

learning signals

For strategy learning, current bots, as far as I’ve seen, learn based on the game result and nothing else. It’s also the only learning I’ve written up so far.

I’ll tell you one of the deep secrets of the dark magic of machine learning: If you want to learn better, don’t grub around for a better algorithm like I did yesterday. I mean, you can and it will probably help. But first, dig for better information to learn from. The big gains most often come from finding better learning signals. Yesterday’s suggestion about generalizing across opponents and across maps was an example.

When Bisu loses, does he adjust his probability of playing that game’s opening downward? Not like that, no. He thinks through the game events and finds the cause of the loss. If Bisu came out of the opening in a sound position, then it would be silly to blame the opening for the loss, no matter what happened later in the game. (By the way, this is an example of the classic credit assignment problem, one of the oldest named problems in AI: I got this result. What features of the situation deserve credit or blame for the result?)

I expect that it will be a long time before bots can reason about cause and effect. But they should be able to figure out “am I more likely ahead or behind?” In fact, that can be a learning target itself. The input data is scouting info seen at a point during the game, and the goal might be (for example) to estimate the actual supply difference as seen in a replay (if you do learning from replays) or to estimate the probability of winning the game (which works for learning during games by temporal differences—worth reading up on if you don’t know it). Once you have the ability to estimate whether you’re winning, you can learn to choose the opening that leaves you in the strongest position, not the opening that is seen to win most often. If your estimate is good then it provides more and better information (a score at the correct point in the game) than whether you won or lost the game (1 bit after the game), so you’ll learn faster and better.
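
To make the temporal difference idea concrete, here is a minimal TD(0) update for a win probability estimator. A sketch only: the linear-plus-sigmoid model is a placeholder, and any function approximator would do:

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def td_update(weights, features, next_features, reward,
                  alpha=0.01, terminal=False):
        """Move this state's win estimate toward the next state's estimate,
        or toward the final result (reward 1 for a win, 0 for a loss)."""
        v = sigmoid(sum(w * f for w, f in zip(weights, features)))
        target = reward if terminal else sigmoid(
            sum(w * f for w, f in zip(weights, next_features)))
        grad = v * (1.0 - v)  # derivative of the sigmoid
        for i, f in enumerate(features):
            weights[i] += alpha * (target - v) * grad * f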

As a rough cut you could say: A quick win or loss is definitely related to the opening. If the bot adapts during the game, then the longer the game, or the more adaptation done after the opening, the less credit or blame the opening is likely to deserve. In fact, if you lost a long game then the opening might deserve credit for putting you in a position to survive that long!
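
A tiny sketch of the rough cut, with made-up constants:

    def opening_credit(game_frames, adaptations_after_opening):
        """Weight for how much of the result the opening should learn from."""
        quick = 10 * 60 * 24  # ~10 minutes at 24 frames per second (made up)
        credit = min(1.0, quick / max(1, game_frames))
        return credit / (1 + adaptations_after_opening)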

The same general idea, look for good learning signals, goes for all kinds of learning. You already knew that if you want your bot to learn micro, don’t count won or lost battles, count units lost and damage done. It’s obvious, right? And so on.

generalization for strategy learning

This post is for people who want to do strategy learning better than we have seen so far, but who haven’t married AI and are looking for a few hints on what’s good to try. I assume the simplest case: The bot has a fixed set of strategies and wants to choose one based on experience (but possibly influenced by scouting info). Similar ideas work in more complicated cases, too.

In past posts I looked at the strategy learning done by Overkill and AIUR. Overkill learns (strategy, opponent) and AIUR learns (strategy, opponent, map size). I found out that, on the one hand, AIUR learned more by including the map size, but on the other hand, AIUR learned more slowly and didn’t have time to explore the possibilities thoroughly and find the best. It would be nice to learn (strategy, opponent, opponent race, map, player positions, any other results of early scouting), but how could a bot possibly learn so much?

Overkill and AIUR learn tables of outcomes. Tabular learning is slow learning because it does not generalize. AIUR may win with its cannon rush on a 2-player map against opponent A and opponent B, but when it faces opponent C on a 2-player map it starts with a blank slate. It doesn’t try cannon rush first because that worked against other opponents, it says “well gosh darn I don’t know a thing yet, I’ll pick randomly.” And again, when nexus-first wins against opponent D on 2-player and 3-player maps and AIUR faces opponent D on a 4-player map for the first time, it’s “well gosh darn.”

Tabular learning is, well, it’s the only kind of learning which does not generalize. Tabular learning is a form of rote memorization, and all the countless other learning algorithms try to generalize in one way or another. That doesn’t mean you should learn strategies using any random algorithm you have lying around, though. You can, but it’s best to look for one that suits the problem.

The problem requirements are not too complicated.

1. Our algorithm’s input will be a set of past observations like (strategy, opponent, any other data you want to include, game result). The output will be the strategy to play this game, where you don’t know the game result yet. Or at least the output will have enough information to let you decide on a strategy. Estimated-probability-to-win for each strategy choice is one idea.

2. Some of the inputs, like the opponent, are categorical (as opposed to numerical). We need an algorithm that likes categorical inputs. Some work best with numerical inputs. One way to look at it is: Fitting a curve from opponent A to opponent C doesn’t tell you anything about opponent B, so you don’t want an algorithm that’s always trying that.

3. The algorithm should work well with small to moderate amounts of data. In the first game of the tournament, with no observations made yet, you’ll pick a strategy from prior knowledge (pick randomly, or pick one that did well in testing, or a combination). In the second game, you want to consider your prior knowledge plus 1 data point. The prior knowledge stops some algorithms from saying “we lost the first game, by generalization all strategies always lose.” You want the 1 data point to be important enough to make some difference, and not so important that it immediately overrides prior knowledge. And so on to thousands or tens of thousands of data points if the tournament is that long (it’s hardly likely to be longer); by then, prior knowledge should not make much difference.

4. You also want to consider exploration. If you always play the strategy that looks best (a “greedy algorithm”), then you may be overlooking a strategy that plays better but happened to lose its first game, or that never got tried. You have to explore to learn well.

My suggestions. First, exploration is not hard. Epsilon-greedy (see multi-armed bandit) should always work for exploration. There may be better choices in particular cases, but you have a fallback. You can do better if the algorithm outputs not only an estimated win rate but also its confidence in the estimate: Preferentially explore options which have low confidence.
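
A minimal epsilon-greedy chooser with the confidence twist. The results bookkeeping is hypothetical, and assumes every strategy was seeded with prior pseudo-results (see the next sketch):

    import random

    def choose_strategy(results, epsilon=0.1):
        """results maps strategy -> list of 0/1 game outcomes."""
        if random.random() < epsilon:
            # Explore: the least-tried strategy has the least confident estimate.
            return min(results, key=lambda s: len(results[s]))
        # Exploit: best observed win rate.
        return max(results, key=lambda s: sum(results[s]) / len(results[s]))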

Second, prior knowledge is not too hard either. You can always encode your prior knowledge as a set of fictional data points, fish story style. Again, there may be better ways, especially if you go with a Bayesian algorithm which by definition includes priors.
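
A sketch of the fish story; the counts and win rate are made up, and stronger priors resist being overridden by a single real result:

    def seeded_results(strategies, prior_games=4, prior_win_rate=0.5):
        """Pretend each strategy was already tried prior_games times."""
        wins = int(round(prior_games * prior_win_rate))
        return {s: [1] * wins + [0] * (prior_games - wins) for s in strategies}

    results = seeded_results(["4 pool", "9 pool", "mutalisks"])
    results["9 pool"].append(1)  # record a real win as it happens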

The requirement to work with varying but mostly modest amounts of data means that batch algorithms that analyze the dataset as a whole are preferred. Incremental algorithms that analyze one data point at a time, like the huge family of reinforcement learning algorithms that includes most neural networks, are by and large less suitable; they have a harder time controlling the level of generalization as the amount of data increases, to learn fast enough without overfitting. It’s not that reinforcement learning won’t work, or even that it can’t be made to work just as well, but without extra knowledge and care you can expect it to be less effective or less efficient. I was surprised to see the new version of Overkill use reinforcement learning for unit production decisions—it may be a good choice, but if so it’s not obvious why.

I suggest boosted decision trees. Decision trees have good generalization properties with small and modest amounts of data, and adding a boosting algorithm increases their accuracy. Since there’s not too much data and strategy learning happens once per game, speed should not be a problem. (If it does get too slow, then discard the oldest data points.) Go look up code to implement it and check the license, you know the drill.
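
Here is roughly how it might look with scikit-learn’s gradient boosted trees. A sketch only: the columns and the tiny dataset are filler, and one-hot encoding turns the categorical inputs into the 0/1 columns that trees handle well:

    import pandas as pd
    from sklearn.ensemble import GradientBoostingClassifier

    past = pd.DataFrame([   # past observations (made-up filler data)
        {"strategy": "cannon rush", "opponent": "A", "map_starts": 2, "win": 1},
        {"strategy": "cannon rush", "opponent": "B", "map_starts": 4, "win": 0},
        {"strategy": "nexus first", "opponent": "A", "map_starts": 2, "win": 0},
        {"strategy": "nexus first", "opponent": "B", "map_starts": 4, "win": 1},
    ])
    X = pd.get_dummies(past[["strategy", "opponent", "map_starts"]])
    model = GradientBoostingClassifier(n_estimators=50).fit(X, past["win"])

    # Pick a strategy by scoring each candidate in today's situation.
    today = pd.DataFrame([{"strategy": s, "opponent": "A", "map_starts": 4}
                          for s in ["cannon rush", "nexus first"]])
    X_today = pd.get_dummies(today).reindex(columns=X.columns, fill_value=0)
    probs = model.predict_proba(X_today)[:, 1]
    best = today.loc[probs.argmax(), "strategy"]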

It’s just a suggestion. Other choices may be better.

In a little more detail, at the end of each game the bot records the result with whatever other information it wants to learn from: Opponent, race, map, etc. At the start of each game it reads the records and runs its learning algorithm from scratch (it doesn’t have to or want to remember what it thought it knew last game). You may want to vary this depending on tournament rules about when learning data becomes available.

With the learned model in hand, the bot can take the game situation, run it through the model to find out which strategies seem best, and combine that with the exploration policy to decide which strategy to play.

What if some inputs are not known yet? Say the opponent is random and your scout didn’t find out the enemy race before it’s time to decide on the initial strategy. If the learning algorithm estimates win rates, here’s one way: Run the game situation through three times, once with each race, and combine the results. There are different ways to combine the results, but averaging works. The same for other information that you don’t know yet; run through each possibility that hasn’t been excluded (“I know they’re not at that base, but then my scout died”). If there’s too much unknown info to test all possibilities against your learned model, then limit it to a statistical sample.
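
In code, the run-it-through-three-times idea is just an average. A sketch, with predict_win standing in for whatever your learned model exposes:

    def estimate_with_unknown_race(predict_win, strategy, situation, races):
        """Average the win estimate over every race not yet excluded."""
        estimates = [predict_win(strategy, dict(situation, enemy_race=r))
                     for r in races]
        return sum(estimates) / len(estimates)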

Generalizing across opponents. If you have an opponent model, you can do better. If you’re able to recognize characteristics of your opponents, then you can remember the information in an opponent model and use the models to generalize across opponents. It’s a way of learning counter-strategies alongside counter-this-opponent strategies. I think opponent modeling should make strategy learning more effective. “Oh, opponent X went dark templar and I won with strategy A. Now I’m fighting opponent Y, which has been known to go dark templar too.”

  • opponent random?
  • opponent race
  • how rushy/all-in? (consider the earliest attack, or the early economy)
  • when (if ever) did opponent make unit X (for each X)?
  • when did opponent get upgrade Y (for each Y)?
  • when did opponent first use spell Z (for each Z)?
  • or in less detail: when did opponent get air units/detection/etc.?
  • how soon/often did opponent expand?
  • did the opponent scout my whole base?
  • was the opponent seen to take island bases?
  • was the opponent seen to attack island bases?

Or whatever you think might help. Since there’s never a ton of data, the huge number of inputs in the list might be too much.

Generalizing across maps can follow the same kind of idea: Number of players on the map, air distance and ground distance between bases, and so on. Adapting your strategy to the map is basic to Starcraft strategy, and bots are weak at it.

breakthrough game programs

In a given game, the progress of game playing programs shows occasional sudden breakthroughs to higher performance and long periods of slow improvement while programmers learn how to better exploit the previous breakthrough. It’s my observation. Historically, most games have 1 breakthrough idea whose refinement, as computers grow faster, eventually leads to superhuman skill.

A breakthrough is always based on a simple idea. Maybe it’s my power of oversimplification, but that’s how it seems to me. A breakthrough program is always complex, though: A new idea is not accepted as a breakthrough unless it performs, and if you come up with a potential breakthrough idea then you have to do the detail work to bring it to the point where it can actually break through. The detail work can include inventing more new ideas to support the breakthrough.

chess: Chess 4.0, Slate and Atkin, 1973
the simple idea: full-width search
By 1973, the alpha-beta algorithm was universal. Chess programmers generally believed that, to search deeply enough to play well, programs had to be selective about which moves they searched. They wrote complicated code to select good candidate moves and prune bad ones. Slate and Atkin observed that the pruning code was slow and introduced a lot of play errors when good moves were mistakenly pruned—how are you supposed to know whether a move is good before you search it? Their new version, Chess 4.0, searched all moves as deeply as their previous version had searched selected moves, and played much better. To exploit the breakthrough, they used efficient bit-board data structures and added complex new ideas of iterative deepening search, a transposition table to avoid re-searching known positions and to re-use move ordering information from past search iterations, and a quiescence search of forcing moves such as captures. In the 1990s the null move heuristic became popular and chess programs started to become more selective again—they try to search bad moves less rather than not at all. Today’s superhuman chess programs are highly selective in their searches. I don’t see it as a repudiation of the original breakthrough idea, but as a refinement: Here is how you tune it; you still search every move, some deeply and some shallowly.
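
For reference, the full-width idea fits in a few lines of negamax alpha-beta. A sketch with a hypothetical game interface:

    def alphabeta(game, depth, alpha=-float("inf"), beta=float("inf")):
        if depth == 0 or game.over():
            return game.evaluate()  # score for the side to move;
                                    # a quiescence search would go here
        best = -float("inf")
        for move in game.moves():   # every move: that was the breakthrough
            game.make(move)
            best = max(best, -alphabeta(game, depth - 1, -beta, -alpha))
            game.undo(move)
            alpha = max(alpha, best)
            if alpha >= beta:
                break               # a safe cutoff, not a guess
        return best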

backgammon: TD-Gammon, Gerald Tesauro, 1992
the simple idea: neural network evaluator trained by self-play
Previous backgammon programs had used hand-coded evaluators. Gerald Tesauro figured out how to train a neural network as the evaluator by playing the program against itself while using temporal difference learning to reduce errors, producing a far more accurate evaluator. Today’s near-perfect backgammon programs use the same idea, with the addition of various kinds of search. AlphaGo also famously adopted this style of evaluator—now you know where the idea came from.

othello: Logistello, Michael Buro, 1993
the simple idea: statistical evaluator with a huge number of binary features
An othello program with a fast search and a simple evaluation function can play well. Michael Buro added a highly knowledgeable evaluation function for overwhelming strength, crushing the human champion 6-0 in 1997. The idea behind the evaluator is: Score a large number of positions by search; these scores have to be reasonably accurate. Find a huge number of binary features that you can calculate fast (in othello, millions of possible disk patterns that may appear on the board). Then treat the evaluation features as independent and use straightforward statistical regression to find the value of each feature. It’s important to get the math details right; you can read some of the papers at the link. Suddenly you have an evaluator with a hell of a lot of detailed knowledge! The same idea has been used to produce strong evaluators in many other games, and I don’t know any reason it wouldn’t work in Starcraft.
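
The regression step itself is nothing exotic. In miniature, with random filler standing in for real scored positions and only 8 binary features instead of millions:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    features = rng.integers(0, 2, size=(1000, 8))  # 1000 positions, 0/1 features
    true_values = np.array([9, -4, 2, 7, -1, 3, -6, 5])
    scores = features @ true_values + rng.normal(0, 1, 1000)  # search scores

    model = LinearRegression().fit(features, scores)
    # model.coef_ recovers a value per feature; evaluating a new position is
    # just summing the values of the features present in it.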

contract bridge: GIB, Matthew Ginsberg, 1998
the simple idea: draw a statistical sample of the possible game situations
Bridge is a game of partial information: After the deal, you only know your own cards. As play goes on, you gain more information. GIB randomly generates game situations compatible with what it knows and evaluates its choices in the sample situations; whatever is best on average is best in reality, within the statistical margin of error. GIB and other current programs still use the same idea. See computer bridge for more. Starcraft is also a game of partial information, so the idea is directly relevant.
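
The sampling idea in miniature. A sketch, with sample_consistent_state and score standing in for the game-specific parts:

    def best_choice(choices, known_info, sample_consistent_state, score, n=100):
        """Evaluate each choice across n sampled hidden states; best on
        average in the sample is best in reality, within the margin of error."""
        totals = {c: 0.0 for c in choices}
        for _ in range(n):
            state = sample_consistent_state(known_info)
            for c in choices:
                totals[c] += score(c, state)
        return max(choices, key=lambda c: totals[c] / n)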

go: Crazy Stone, Rémi Coulom, 2006
the simple idea: draw a statistical sample of possible lines of play (Monte Carlo tree search)
(This is the same Rémi Coulom who wrote bayeselo, by the way.) GIB knows its possible lines of play and has to sample from the unknown game situations. Go programs know the game situation but, because there are a huge number of possible moves at any time, can only sample from possible lines of play. By now, this family of search algorithms is somewhat well developed; there have been rounds of improvement as new refinements are invented. The Starcraft bot MaasCraft uses a variation, and the latest versions of LetaBot are supposed to as well.

go: AlphaGo, DeepMind, 2015
the simple idea: use deep learning for search control
Go is such a difficult game that it took 2 breakthroughs to catch up to human performance. AlphaGo keeps the idea of Monte Carlo tree search from Crazy Stone and other programs and adds search control by deep learning; that is its main new idea, in my view. AlphaGo also includes the learning by self-play idea of TD-Gammon and, with deep learning, makes it more successful than past attempts in go. I think everybody has noticed by now that deep learning is highly promising for Starcraft, too.

Search and knowledge are the two ways to get stuff done, and here I strip new ideas down to their essence, so it should be no surprise that the breakthroughs fall more or less neatly into two categories. The search breakthroughs are in Chess 4.0, GIB, and Crazy Stone. The knowledge breakthroughs are in TD-Gammon, Logistello, and AlphaGo. AlphaGo does not categorize so neatly, though; it is both a knowledge and a search idea. And that is as it should be, if you ask me. Search and knowledge belong together.

Right now, Starcraft programs are at the stage where the greater part of most programs is a mass of ad hoc hand-crafted heuristic kludges (it’s redundant; the 4 terms all mean the same). We’re still at the stage where chess programs were before Chess 4.0 and backgammon programs before TD-Gammon. We are still looking for our breakthrough. People are casting around for the weak point in the wall of difficulty. Now is the time to learn from past breakthroughs!

Personally, I’m expecting a knowledge breakthrough more than a search breakthrough. But it’s only my guess, and I could easily be wrong. The skills we want are creativity and close analysis; whoever has enough of both will produce the Starcraft breakthrough.

Bereaver status

Bereaver continues to score wins against its top rivals, but I’m not sure whether it has reached #1 on SSCAIT. Krasi0 and Iron dominate the weaker bots, winning nearly all games, while Bereaver still drops some games to them. It takes time and effort to make a bot solid against all the different strategies!

Bereaver loses some games to protoss bots with strong macro, such as Skynet, when Bereaver takes its natural too late. It can also lose to unusual strategies like XIMP’s carriers. Here Bereaver storms the carriers and takes their shields off—it wasn’t enough, they still had all their hit points. After a long fight, XIMP recorded the win.

Bereaver storms XIMP’s carriers

Bereaver’s zergling rush defense seems to work well against ZZZKBot, but shows weaknesses against variations. Here Zia opened 5 pool and, unlike ZZZKBot, which suffered some pathing errors, immediately broke Bereaver’s ramp. The cannon-probe defense was firm and Bereaver held easily. So far so sound.

Zia breaks Bereaver’s ramp

But while ZZZKBot does the fastest possible 4 pool and can only follow up by sending more lings, Zia switched to drone production and teched to mutalisks off of 1 hatchery. There is no strong followup to a failed 5 pool and Bereaver could have shrugged off the weak air attack... if it hadn’t restricted its production to zealots. Even so, the zealots could have won the game if they had attacked instead of holding their ground.

Bereaver’s zealots stand around under air attack

Bereaver still put up a fight, but Zia added to its mutalisk numbers and finally won.

Against Overkill’s 9 pool, Bereaver still canceled its gateway at the first sight of the spawning pool, before the scouting probe had traveled far enough to see the drones. It was not a sound reaction to 9 pool. After losing the blocking probes, protoss was already behind in economy. Overkill aggressively went after more probes, and despite poor play later was ahead the whole game and won as it should.

Overkill goes after Bereaver’s probes

Notice Bereaver’s supply. With only 6 probes, it was impossible to catch up.

the elimination race

That’s enough data analysis for now. Too much of the same is growing stale. I still want to dig deeper into the SSCAIT data set, but first let’s take a break for a few of my usual ill-founded suggestions. Today a rant: Bots are inefficient at destroying bases.

Here’s a hidden principle that is implicit in Starcraft strategy and only occasionally mentioned: Who wins the base race? If the two armies bypass each other and each races to finish off the enemy base, who wins the game?

I think human players over some low level can answer this question off the cuff in common situations, because it’s basic to decision making. If you can win the base race, then you are free to leave the path to your base open. You have freedom of maneuver and can engage or not as you choose. You can consider whether it’s smart to work your way behind the enemy army. If you are fated to lose the base race, then you have no choice but to engage the enemy to avoid losing. When you are under threat, all you can do is seek favorable ground to make your stand. You are on the strategic defensive, in a sense. You’re not necessarily on the tactical defensive—for example, active forward play like Iron’s blocks the path to Iron’s bases and prevents any base race as long as it stands, and tactically it is aggressive not defensive.

Whether you can win a pure elimination race is only one consideration, of course. Nothing is ever that simple! The more mobile army has more choices. Mutalisks may be fast enough to ravage the enemy workers and still return in time to help defend, especially if the air distance is less than the ground distance. Even in that case, estimating the base race is how you tell when it’s time to return.

For an experienced human it’s easy to know who has the advantage in the base race, but it seems tricky for bots. Zerg has fewer buildings but they’re likely to be split between more bases. Killing a base includes defeating reinforcements produced during the process. The stronger army doesn’t necessarily win the base race; it depends on the situation. A hydra army that loses to the terran infantry army might win the base race because of higher damage rate since medics don’t shoot, but a lurker-ling army that wallops the marines could lose the base race because terran will float buildings while trashing the zerg bases. If it’s close, both sides may try to live longer with hidden buildings, a distant extractor for zerg or a sneak pylon for protoss.
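
Even a crude estimate would be a start. A back-of-the-envelope sketch that ignores reinforcements, travel time, repairs, and floating buildings:

    def wins_base_race(my_building_hp, my_dps_vs_buildings,
                       enemy_building_hp, enemy_dps_vs_buildings):
        """Who runs out of buildings first if both armies just race?"""
        my_kill_time = enemy_building_hp / max(1e-9, my_dps_vs_buildings)
        enemy_kill_time = my_building_hp / max(1e-9, enemy_dps_vs_buildings)
        return my_kill_time < enemy_kill_time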

And, of course, bots are not too efficient at killing bases and not too clever with survival tricks.

MaasCraft’s tactical search understands base races in principle. I’m not sure whether it’s accurate enough to understand them in practice.

A bot that—one way or another—understood base races and could instigate them at the right times would probably score occasional easy wins. But I expect that the main advantage would be as for humans, knowledge of when you are free to maneuver and when you are forced to engage.

Lesson: If you take away nothing else, take away the idea that rapidly finishing off an undefended base has value. A lot of bots are slow and lazy about it, I guess because it’s “not important.” In reality, even if you’re attacking an undefended expansion and not caught in an elimination race, it’s important to target the key buildings first and destroy everything efficiently. First, you don’t know when enemy reinforcements might show up to interrupt. And second, the sooner the better when it comes to freeing your forces for the rest of the fight.

Bereaver may be the new #1

Bereaver is mounting a strong challenge for #1 on SSCAIT, and may have already reached it. It has been scoring wins against most of its rivals, including Krasi0, Iron, LetaBot and Tscmoo versions, and ZZZKBot. I haven’t seen a win against KillerBot, though.

Bereaver’s favored game plan is to outmacro its opponent and win with repeated direct frontal attacks, “damn the torpedoes” style. Against strong defenders like Krasi0 and IceBot that can lead to long games, but they aren’t strategically interesting.

Here Bereaver has defeated Krasi0’s slightly sloppy early push in the center and broken into the terran natural. The zealots ignored the defending vultures and killed all the SCVs, bringing a quick win.

Bereaver damns the torpedoes, or at least the vultures

Bereaver has adequate skills with storm and reavers. Both could stand to be improved (but what couldn’t?). It knows how to cannon a ramp for defense (the early error where the cannons blocked the path is fixed). I think its most impressive special skill is entrance blocking, as here. It forms the zealot block with smooth ease, and dissolves it just as easily when it wants to let units through. It’s nothing compared to ZerGreenBot’s special skills of reaver drop, zealot bombing, and overlord hunting, but Bereaver is much stronger overall.

Bereaver blocks its ramp with zealots

Bereaver plays a different strategy against ZZZKBot’s 4 pool. As soon as it scouts the rush, it blocks its entrance with probes. Here it blocked the entrance much too early, while the zerglings were still in their eggs, and lost mining time needlessly.

Bereaver blocks its entrance with probes

Then it cancels its gateway, starts a forge instead, and cannons its main. The plan seems successful. It’s slow, but who cares about that when you win? The probe block is tight. The lings tend to suffer pathfinding failure and wander, but may soon kill a probe to break through, as here. The probes fought back, as you can see by the blood.

ZZZKBot kills a probe to open the entrance

Then Bereaver pulls probes to defend its cannons and easily holds. With its probe skills, it is not in any danger whatever. After this it can build up and win at its leisure.

Bereaver defends its cannons with probes

AIIDE 2016 - upsets by player

Here is a list of the AIIDE 2016 players with upsets, cases in which a lower-ranked bot defeated a higher-ranked opponent. 15 of the 21 participants scored a total of 27 upsets out of 210 pairings. In most cases the upset was of an opponent only slightly ahead—the table doesn’t worry about the size of the upset. Xelnaga, well-positioned near the end of a run of bots with close scores, is the runaway upset champion; it also upset LetaBot, 8 places ahead. To earn an upset you have to score worse overall, so another way to say it is that Xelnaga is the most inconsistent performer, with strengths and weaknesses that don’t always offset each other. The deepest upset is JiaBot’s upset of ZZZKBot, ranked 9 places ahead.

bot | upsets
2 ZZZKBot | 1 Iron
4 LetaBot | 3 tscmoo
5 UAlbertaBot | 3 tscmoo
7 Overkill | 2 ZZZKBot, 5 UAlbertaBot
8 Aiur | 4 LetaBot, 7 Overkill
9 MegaBot | 5 UAlbertaBot, 7 Overkill
10 IceBot | 6 Ximp, 9 MegaBot
11 JiaBot | 2 ZZZKBot, 9 MegaBot
12 Xelnaga | 4 LetaBot, 7 Overkill, 9 MegaBot, 10 IceBot, 11 JiaBot
13 Skynet | 11 JiaBot, 12 Xelnaga
14 GarmBot | 9 MegaBot, 12 Xelnaga
15 NUSBot | 8 Aiur, 13 Skynet
17 SRbotOne | 15 NUSBot
19 Oritaka | 14 GarmBot
20 CruzBot | 15 NUSBot

Upsets are interesting because they suggest weaknesses in the stronger bot. They’re clues that may point toward what’s important to fix. For example, the upsets of #2 ZZZKBot by much less successful zergs remind us that 4 pool is easy for zerg to counter; protoss and terran need specific knowledge to counter the rush, but zerg can play a standard 9 pool (which in ZvZ is also fine against other openings) as a hard counter.

AIIDE 2016 - upsets by map

I counted the number of upsets in the crosstables for each map. An upset is when a lower-ranked player defeats one who ends up at a higher rank. On any given map you should expect more upsets than in the tournament overall, which sums over the maps and smooths out differences.
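
The count itself is simple. A sketch, assuming the ranks and pairing results were already pulled out of the crosstables:

    def count_upsets(ranks, wins):
        """ranks: bot -> final rank (1 is best).
        wins: (bot, opponent) -> games the bot won against that opponent."""
        upsets = 0
        for (bot, opp), won in wins.items():
            if ranks[bot] > ranks[opp] and won > wins[(opp, bot)]:
                upsets += 1
        return upsets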

map | upsets
(2)Benzene | 32
(2)Destination | 16
(2)Heartbreak Ridge | 36
(3)Aztec | 29
(3)Tau Cross | 33
(4)Andromeda | 32
(4)Circuit Breaker | 37
(4)Empire of the Sun | 33
(4)Fortress | 36
(4)Python | 30

I expected that more standard maps would have fewer upsets, but it didn’t turn out that way. Less standard Heartbreak Ridge is the only map to tease me by acting as if I understood. I’m a little surprised by the high upset counts on standard Circuit Breaker and Fortress and the lower counts on less standard Aztec. I am very surprised that Destination has half the upsets of other maps! I can’t explain it.

Another theory you might try is: Many bots are tuned on SSCAIT, so SSCAIT maps might show more solid play and fewer upsets. The non-SSCAIT maps here are only Aztec and Fortress, which don’t support the hypothesis (though there’s little evidence either way).

I thought that per-map upset rate might be a way to measure the strategic fragility of bots on maps that they weren’t tuned for, but it may also measure code fragility. Look into yesterday’s crosstables to see where today’s totals came from. Heartbreak Ridge has many upsets not so much because bots weren’t ready for it, but because NUSBot wasn’t ready for it. LetaBot could not defend Fortress. Only Circuit Breaker’s high rate of upsets was not helped along by any single bot—and Circuit Breaker is as standard as they come. So we’re not learning as much about map characteristics as about bot characteristics.

Can anybody explain why Destination was more stable and less upset-prone than other maps? That’s the big mystery here.

AIIDE 2016 - crosstables per map

AIIDE 2016 crosstables for each map in the tournament. First, the whole tournament crosstable, for reference. It should match the one on the official results page, though for some reason some of the overall win percentages differ in the second decimal (by a trivial amount). The official results are correct, so I may have some floating point error accumulation (an off-by-one error would cause a bigger discrepancy). In each crosstable, “-” marks the cell for a bot against itself.

AIIDE 2016 | overall | Iron | ZZZK | tscm | Leta | UAlb | Ximp | Over | Aiur | Mega | IceB | JiaB | Xeln | Skyn | Garm | NUSB | Terr | SRbo | Cime | Orit | Cruz | Tyr
Iron | 87.44% | - | 23% | 84% | 87% | 86% | 99% | 88% | 96% | 68% | 98% | 81% | 91% | 57% | 100% | 96% | 100% | 99% | 98% | 100% | 100% | 100%
ZZZKBot | 85.05% | 77% | - | 50% | 51% | 77% | 98% | 39% | 87% | 92% | 93% | 46% | 100% | 94% | 100% | 100% | 100% | 98% | 100% | 100% | 100% | 100%
tscmoo | 82.71% | 16% | 50% | - | 43% | 42% | 91% | 97% | 80% | 90% | 99% | 91% | 87% | 83% | 100% | 98% | 100% | 100% | 99% | 89% | 100% | 100%
LetaBot | 74.00% | 13% | 49% | 57% | - | 66% | 68% | 94% | 34% | 52% | 84% | 88% | 39% | 62% | 93% | 93% | 92% | 99% | 99% | 97% | 100% | 100%
UAlbertaBot | 70.37% | 14% | 23% | 58% | 34% | - | 76% | 42% | 80% | 47% | 77% | 71% | 100% | 79% | 81% | 83% | 99% | 86% | 59% | 99% | 100% | 100%
Ximp | 64.54% | 1% | 2% | 9% | 32% | 24% | - | 52% | 81% | 67% | 7% | 96% | 62% | 98% | 94% | 91% | 98% | 92% | 96% | 92% | 97% | 100%
Overkill | 61.93% | 12% | 61% | 3% | 6% | 58% | 48% | - | 30% | 34% | 79% | 52% | 24% | 62% | 94% | 97% | 96% | 98% | 94% | 92% | 100% | 100%
Aiur | 61.26% | 4% | 13% | 20% | 66% | 20% | 19% | 70% | - | 53% | 79% | 53% | 56% | 86% | 87% | 44% | 88% | 96% | 92% | 88% | 91% | 100%
MegaBot | 58.42% | 32% | 8% | 10% | 48% | 53% | 33% | 66% | 47% | - | 49% | 42% | 32% | 78% | 47% | 91% | 94% | 83% | 89% | 84% | 91% | 90%
IceBot | 57.43% | 2% | 7% | 1% | 16% | 23% | 93% | 21% | 21% | 51% | - | 56% | 36% | 61% | 61% | 100% | 100% | 100% | 99% | 100% | 100% | 100%
JiaBot | 57.25% | 19% | 54% | 9% | 12% | 29% | 4% | 48% | 47% | 58% | 44% | - | 49% | 37% | 98% | 89% | 90% | 82% | 99% | 86% | 91% | 100%
Xelnaga | 56.98% | 9% | 0% | 13% | 61% | 0% | 38% | 76% | 44% | 68% | 64% | 51% | - | 42% | 2% | 99% | 100% | 92% | 94% | 89% | 96% | 100%
Skynet | 55.03% | 43% | 6% | 17% | 38% | 21% | 2% | 38% | 14% | 22% | 39% | 63% | 58% | - | 100% | 39% | 100% | 100% | 100% | 100% | 100% | 100%
GarmBot | 42.52% | 0% | 0% | 0% | 7% | 19% | 6% | 6% | 13% | 53% | 39% | 2% | 98% | 0% | - | 84% | 84% | 100% | 93% | 47% | 99% | 100%
NUSBot | 27.46% | 4% | 0% | 2% | 7% | 17% | 9% | 3% | 56% | 9% | 0% | 11% | 1% | 61% | 16% | - | 57% | 34% | 61% | 81% | 30% | 90%
TerranUAB | 27.39% | 0% | 0% | 0% | 8% | 1% | 2% | 4% | 12% | 6% | 0% | 10% | 0% | 0% | 16% | 43% | - | 90% | 73% | 88% | 96% | 99%
SRbotOne | 22.46% | 1% | 2% | 0% | 1% | 14% | 8% | 2% | 4% | 17% | 0% | 18% | 8% | 0% | 0% | 66% | 10% | - | 53% | 88% | 57% | 100%
Cimex | 20.50% | 2% | 0% | 1% | 1% | 41% | 4% | 6% | 8% | 11% | 1% | 1% | 6% | 0% | 7% | 39% | 27% | 47% | - | 59% | 51% | 99%
Oritaka | 19.51% | 0% | 0% | 11% | 3% | 1% | 8% | 8% | 12% | 16% | 0% | 14% | 11% | 0% | 53% | 19% | 12% | 12% | 41% | - | 68% | 100%
CruzBot | 16.75% | 0% | 0% | 0% | 0% | 0% | 3% | 0% | 9% | 9% | 0% | 9% | 4% | 0% | 1% | 70% | 4% | 43% | 49% | 32% | - | 100%
Tyr | 1.11% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 10% | 0% | 0% | 0% | 0% | 0% | 10% | 1% | 0% | 1% | 0% | 0% | -

Now each of the 10 maps. Each cell represents only 9 games, occasionally fewer when games are missing.

(2)Benzene | overall | Iron | ZZZK | tscm | Leta | UAlb | Ximp | Over | Aiur | Mega | IceB | JiaB | Xeln | Skyn | Garm | NUSB | Terr | SRbo | Cime | Orit | Cruz | Tyr
Iron | 95.00% | - | 89% | 100% | 78% | 100% | 100% | 78% | 100% | 89% | 100% | 89% | 100% | 78% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
ZZZKBot | 73.89% | 11% | - | 22% | 11% | 78% | 100% | 11% | 11% | 89% | 100% | 67% | 100% | 78% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
tscmoo | 83.33% | 0% | 78% | - | 22% | 44% | 100% | 100% | 78% | 89% | 100% | 100% | 78% | 78% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
LetaBot | 81.67% | 22% | 89% | 78% | - | 67% | 67% | 100% | 44% | 33% | 100% | 100% | 44% | 89% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
UAlbertaBot | 64.80% | 0% | 22% | 56% | 33% | - | 44% | 22% | 88% | 44% | 78% | 67% | 100% | 78% | 78% | 89% | 89% | 67% | 56% | 89% | 100% | 100%
Ximp | 59.22% | 0% | 0% | 0% | 33% | 56% | - | 25% | 11% | 78% | 11% | 100% | 44% | 100% | 100% | 33% | 100% | 100% | 100% | 89% | 100% | 100%
Overkill | 68.93% | 22% | 89% | 0% | 0% | 78% | 75% | - | 22% | 44% | 89% | 78% | 22% | 100% | 89% | 100% | 89% | 100% | 100% | 89% | 100% | 100%
Aiur | 72.07% | 0% | 89% | 22% | 56% | 12% | 89% | 78% | - | 78% | 67% | 78% | 78% | 89% | 100% | 33% | 100% | 89% | 89% | 100% | 89% | 100%
MegaBot | 56.11% | 11% | 11% | 11% | 67% | 56% | 22% | 56% | 22% | - | 22% | 44% | 22% | 89% | 56% | 100% | 89% | 78% | 100% | 100% | 67% | 100%
IceBot | 60.00% | 0% | 0% | 0% | 0% | 22% | 89% | 11% | 33% | 78% | - | 67% | 33% | 100% | 67% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
JiaBot | 52.78% | 11% | 33% | 0% | 0% | 33% | 0% | 22% | 22% | 56% | 33% | - | 67% | 56% | 100% | 67% | 100% | 89% | 100% | 89% | 78% | 100%
Xelnaga | 56.67% | 0% | 0% | 22% | 56% | 0% | 56% | 78% | 22% | 78% | 67% | 33% | - | 67% | 0% | 100% | 100% | 100% | 89% | 89% | 78% | 100%
Skynet | 45.81% | 22% | 22% | 22% | 11% | 22% | 0% | 0% | 11% | 11% | 0% | 44% | 33% | - | 100% | 11% | 100% | 100% | 100% | 100% | 100% | 100%
GarmBot | 42.78% | 0% | 0% | 0% | 0% | 22% | 0% | 11% | 0% | 44% | 33% | 0% | 100% | 0% | - | 78% | 100% | 100% | 100% | 67% | 100% | 100%
NUSBot | 31.11% | 0% | 0% | 0% | 0% | 11% | 67% | 0% | 67% | 0% | 0% | 33% | 0% | 89% | 22% | - | 22% | 33% | 67% | 100% | 11% | 100%
TerranUAB | 27.22% | 0% | 0% | 0% | 0% | 11% | 0% | 11% | 0% | 11% | 0% | 0% | 0% | 0% | 0% | 78% | - | 100% | 56% | 89% | 89% | 100%
SRbotOne | 20.56% | 0% | 0% | 0% | 0% | 33% | 0% | 0% | 11% | 22% | 0% | 11% | 0% | 0% | 0% | 67% | 0% | - | 44% | 100% | 22% | 100%
Cimex | 22.78% | 0% | 0% | 0% | 0% | 44% | 0% | 0% | 11% | 0% | 0% | 0% | 11% | 0% | 0% | 33% | 44% | 56% | - | 67% | 89% | 100%
Oritaka | 14.44% | 0% | 0% | 0% | 0% | 11% | 11% | 11% | 0% | 0% | 0% | 11% | 11% | 0% | 33% | 0% | 11% | 0% | 33% | - | 56% | 100%
CruzBot | 21.23% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 11% | 33% | 0% | 22% | 22% | 0% | 0% | 89% | 11% | 78% | 11% | 44% | - | 100%
Tyr | 0.00% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | -

(2)Destination | overall | Iron | ZZZK | tscm | Leta | UAlb | Ximp | Over | Aiur | Mega | IceB | JiaB | Xeln | Skyn | Garm | NUSB | Terr | SRbo | Cime | Orit | Cruz | Tyr
Iron | 91.67% | - | 22% | 89% | 67% | 89% | 100% | 100% | 100% | 78% | 100% | 89% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
ZZZKBot | 89.44% | 78% | - | 56% | 56% | 100% | 100% | 44% | 100% | 100% | 100% | 67% | 100% | 89% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
tscmoo | 85.47% | 11% | 44% | - | 44% | 67% | 100% | 89% | 67% | 100% | 100% | 89% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
LetaBot | 84.44% | 33% | 44% | 56% | - | 67% | 89% | 100% | 56% | 89% | 100% | 78% | 78% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
UAlbertaBot | 68.33% | 11% | 0% | 33% | 33% | - | 56% | 11% | 78% | 78% | 78% | 67% | 100% | 100% | 89% | 89% | 100% | 78% | 67% | 100% | 100% | 100%
Ximp | 70.00% | 0% | 0% | 0% | 11% | 44% | - | 78% | 100% | 78% | 11% | 89% | 89% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
Overkill | 63.89% | 0% | 56% | 11% | 0% | 89% | 22% | - | 44% | 67% | 78% | 67% | 33% | 56% | 78% | 89% | 100% | 100% | 89% | 100% | 100% | 100%
Aiur | 61.67% | 0% | 0% | 33% | 44% | 22% | 0% | 56% | - | 67% | 67% | 33% | 100% | 100% | 89% | 56% | 100% | 100% | 78% | 89% | 100% | 100%
MegaBot | 50.56% | 22% | 0% | 0% | 11% | 22% | 22% | 33% | 33% | - | 11% | 67% | 22% | 89% | 11% | 89% | 89% | 100% | 89% | 100% | 100% | 100%
IceBot | 60.56% | 0% | 0% | 0% | 0% | 22% | 89% | 22% | 33% | 89% | - | 56% | 33% | 89% | 89% | 100% | 100% | 100% | 89% | 100% | 100% | 100%
JiaBot | 56.67% | 11% | 33% | 11% | 22% | 33% | 11% | 33% | 67% | 33% | 44% | - | 67% | 22% | 100% | 78% | 89% | 78% | 100% | 100% | 100% | 100%
Xelnaga | 51.11% | 0% | 0% | 0% | 22% | 0% | 11% | 67% | 0% | 78% | 67% | 33% | - | 67% | 11% | 100% | 100% | 78% | 100% | 89% | 100% | 100%
Skynet | 46.67% | 0% | 11% | 0% | 0% | 0% | 0% | 44% | 0% | 11% | 11% | 78% | 33% | - | 100% | 44% | 100% | 100% | 100% | 100% | 100% | 100%
GarmBot | 43.89% | 0% | 0% | 0% | 0% | 11% | 0% | 22% | 11% | 89% | 11% | 0% | 89% | 0% | - | 67% | 78% | 100% | 100% | 100% | 100% | 100%
NUSBot | 35.00% | 0% | 0% | 0% | 0% | 11% | 0% | 11% | 44% | 11% | 0% | 22% | 0% | 56% | 33% | - | 89% | 100% | 78% | 78% | 67% | 100%
TerranUAB | 24.44% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 11% | 0% | 11% | 0% | 0% | 22% | 11% | - | 89% | 78% | 78% | 89% | 100%
SRbotOne | 21.79% | 0% | 0% | 0% | 0% | 22% | 0% | 0% | 0% | 0% | 0% | 22% | 22% | 0% | 0% | 0% | 11% | - | 78% | 100% | 78% | 100%
Cimex | 17.78% | 0% | 0% | 0% | 0% | 33% | 0% | 11% | 22% | 11% | 11% | 0% | 0% | 0% | 0% | 22% | 22% | 22% | - | 67% | 33% | 100%
Oritaka | 13.89% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 11% | 0% | 0% | 0% | 11% | 0% | 0% | 22% | 22% | 0% | 33% | - | 78% | 100%
CruzBot | 12.78% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 33% | 11% | 22% | 67% | 22% | - | 100%
Tyr | 0.00% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | -

(2)Heartbreak Ridge | overall | Iron | ZZZK | tscm | Leta | UAlb | Ximp | Over | Aiur | Mega | IceB | JiaB | Xeln | Skyn | Garm | NUSB | Terr | SRbo | Cime | Orit | Cruz | Tyr
Iron | 88.89% | - | 22% | 89% | 89% | 89% | 100% | 100% | 100% | 33% | 100% | 89% | 78% | 89% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
ZZZKBot | 87.78% | 78% | - | 78% | 67% | 78% | 100% | 44% | 100% | 100% | 100% | 11% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
tscmoo | 80.56% | 11% | 22% | - | 44% | 56% | 78% | 100% | 67% | 89% | 100% | 89% | 89% | 89% | 100% | 100% | 100% | 100% | 100% | 78% | 100% | 100%
LetaBot | 72.22% | 11% | 33% | 56% | - | 78% | 44% | 100% | 22% | 11% | 89% | 100% | 0% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
UAlbertaBot | 70.00% | 11% | 22% | 44% | 22% | - | 56% | 44% | 89% | 56% | 78% | 56% | 100% | 89% | 100% | 100% | 100% | 67% | 67% | 100% | 100% | 100%
Ximp | 73.33% | 0% | 0% | 22% | 56% | 44% | - | 67% | 100% | 78% | 11% | 100% | 89% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
Overkill | 51.18% | 0% | 56% | 0% | 0% | 56% | 33% | - | 0% | 12% | 83% | 22% | 12% | 22% | 100% | 100% | 89% | 89% | 89% | 88% | 100% | 100%
Aiur | 68.89% | 0% | 0% | 33% | 78% | 11% | 0% | 100% | - | 67% | 100% | 78% | 78% | 100% | 78% | 100% | 78% | 100% | 100% | 78% | 100% | 100%
MegaBot | 54.19% | 67% | 0% | 11% | 89% | 44% | 22% | 88% | 33% | - | 100% | 44% | 22% | 11% | 44% | 100% | 100% | 100% | 100% | 22% | 89% | 0%
IceBot | 54.80% | 0% | 0% | 0% | 11% | 22% | 89% | 17% | 0% | 0% | - | 56% | 11% | 89% | 89% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
JiaBot | 58.33% | 11% | 89% | 11% | 0% | 44% | 0% | 78% | 22% | 56% | 44% | - | 44% | 22% | 100% | 100% | 100% | 67% | 100% | 89% | 89% | 100%
Xelnaga | 60.89% | 22% | 0% | 11% | 100% | 0% | 11% | 88% | 22% | 78% | 89% | 56% | - | 67% | 0% | 100% | 100% | 100% | 100% | 89% | 89% | 100%
Skynet | 56.11% | 11% | 0% | 11% | 0% | 11% | 0% | 78% | 0% | 89% | 11% | 78% | 33% | - | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
GarmBot | 38.33% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 22% | 56% | 11% | 0% | 100% | 0% | - | 100% | 78% | 100% | 100% | 11% | 89% | 100%
NUSBot | 0.00% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | - | 0% | 0% | 0% | 0% | 0% | 0%
TerranUAB | 30.56% | 0% | 0% | 0% | 0% | 0% | 0% | 11% | 22% | 0% | 0% | 0% | 0% | 0% | 22% | 100% | - | 89% | 78% | 100% | 89% | 100%
SRbotOne | 22.22% | 0% | 0% | 0% | 0% | 33% | 0% | 11% | 0% | 0% | 0% | 33% | 0% | 0% | 0% | 100% | 11% | - | 44% | 67% | 44% | 100%
Cimex | 21.67% | 0% | 0% | 0% | 0% | 33% | 0% | 11% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 100% | 22% | 56% | - | 56% | 56% | 100%
Oritaka | 28.49% | 0% | 0% | 22% | 0% | 0% | 0% | 12% | 22% | 78% | 0% | 11% | 11% | 0% | 89% | 100% | 0% | 33% | 44% | - | 44% | 100%
CruzBot | 20.79% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 11% | 0% | 11% | 11% | 0% | 11% | 100% | 11% | 56% | 44% | 56% | - | 100%
Tyr | 10.06% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 100% | 0% | 0% | 0% | 0% | 0% | 100% | 0% | 0% | 0% | 0% | 0% | -

(3)Aztec | overall | Iron | ZZZK | tscm | Leta | UAlb | Ximp | Over | Aiur | Mega | IceB | JiaB | Xeln | Skyn | Garm | NUSB | Terr | SRbo | Cime | Orit | Cruz | Tyr
Iron | 91.11% | - | 11% | 78% | 89% | 100% | 100% | 89% | 100% | 89% | 100% | 89% | 89% | 89% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
ZZZKBot | 85.56% | 89% | - | 56% | 44% | 89% | 100% | 44% | 78% | 89% | 100% | 22% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
tscmoo | 81.67% | 22% | 44% | - | 22% | 56% | 78% | 100% | 67% | 89% | 100% | 100% | 100% | 67% | 100% | 89% | 100% | 100% | 100% | 100% | 100% | 100%
LetaBot | 78.33% | 11% | 56% | 78% | - | 44% | 89% | 100% | 33% | 56% | 100% | 100% | 11% | 100% | 100% | 89% | 100% | 100% | 100% | 100% | 100% | 100%
UAlbertaBot | 65.56% | 0% | 11% | 44% | 56% | - | 100% | 11% | 44% | 56% | 56% | 89% | 100% | 78% | 56% | 89% | 100% | 89% | 33% | 100% | 100% | 100%
Ximp | 62.78% | 0% | 0% | 22% | 11% | 0% | - | 67% | 100% | 67% | 0% | 78% | 89% | 100% | 89% | 100% | 89% | 78% | 78% | 100% | 89% | 100%
Overkill | 60.00% | 11% | 56% | 0% | 0% | 89% | 33% | - | 22% | 11% | 100% | 56% | 11% | 67% | 89% | 100% | 100% | 100% | 100% | 56% | 100% | 100%
Aiur | 63.89% | 0% | 22% | 33% | 67% | 56% | 0% | 78% | - | 67% | 44% | 44% | 67% | 89% | 89% | 67% | 89% | 89% | 100% | 89% | 89% | 100%
MegaBot | 60.00% | 11% | 11% | 11% | 44% | 44% | 33% | 89% | 33% | - | 67% | 22% | 11% | 89% | 56% | 100% | 100% | 100% | 89% | 100% | 89% | 100%
IceBot | 59.44% | 0% | 0% | 0% | 0% | 44% | 100% | 0% | 56% | 33% | - | 44% | 33% | 100% | 78% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
JiaBot | 60.00% | 11% | 78% | 0% | 0% | 11% | 22% | 44% | 56% | 78% | 56% | - | 56% | 56% | 89% | 78% | 89% | 100% | 100% | 100% | 78% | 100%
Xelnaga | 57.78% | 11% | 0% | 0% | 89% | 0% | 11% | 89% | 33% | 89% | 67% | 44% | - | 33% | 0% | 100% | 100% | 100% | 89% | 100% | 100% | 100%
Skynet | 48.89% | 11% | 0% | 33% | 0% | 22% | 0% | 33% | 11% | 11% | 0% | 44% | 67% | - | 100% | 44% | 100% | 100% | 100% | 100% | 100% | 100%
GarmBot | 45.00% | 0% | 0% | 0% | 0% | 44% | 11% | 11% | 11% | 44% | 22% | 11% | 100% | 0% | - | 100% | 100% | 100% | 100% | 44% | 100% | 100%
NUSBot | 22.22% | 0% | 0% | 11% | 11% | 11% | 0% | 0% | 33% | 0% | 0% | 22% | 0% | 56% | 0% | - | 11% | 22% | 44% | 89% | 33% | 100%
TerranUAB | 27.78% | 0% | 0% | 0% | 0% | 0% | 11% | 0% | 11% | 0% | 0% | 11% | 0% | 0% | 0% | 89% | - | 89% | 67% | 89% | 89% | 100%
SRbotOne | 23.89% | 0% | 0% | 0% | 0% | 11% | 22% | 0% | 11% | 0% | 0% | 0% | 0% | 0% | 0% | 78% | 11% | - | 67% | 100% | 78% | 100%
Cimex | 24.44% | 0% | 0% | 0% | 0% | 67% | 22% | 0% | 0% | 11% | 0% | 0% | 11% | 0% | 0% | 56% | 33% | 33% | - | 78% | 78% | 100%
Oritaka | 16.67% | 0% | 0% | 0% | 0% | 0% | 0% | 44% | 11% | 0% | 0% | 0% | 0% | 0% | 56% | 11% | 11% | 0% | 22% | - | 78% | 100%
CruzBot | 15.00% | 0% | 0% | 0% | 0% | 0% | 11% | 0% | 11% | 11% | 0% | 22% | 0% | 0% | 0% | 67% | 11% | 22% | 22% | 22% | - | 100%
Tyr | 0.00% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | -

(3)Tau Cross | overall | Iron ZZZK tscm Leta UAlb Ximp Over Aiur Mega IceB JiaB Xeln Skyn Garm NUSB Terr SRbo Cime Orit Cruz Tyr
Iron | 86.67% | - 22 100 89 67 100 89 89 44 100 78 67 89 100 100 100 100 100 100 100 100
ZZZKBot | 83.80% | 78 - 44 12 89 100 22 89 100 100 44 100 100 100 100 100 89 100 100 100 100
tscmoo | 84.44% | 0 56 - 44 78 89 89 100 100 100 100 78 89 100 100 100 100 100 67 100 100
LetaBot | 76.70% | 11 88 56 - 78 44 100 44 33 89 100 22 89 88 100 100 100 100 100 100 100
UAlbertaBot | 67.22% | 33 11 22 22 - 44 56 78 22 67 67 100 78 100 56 100 100 89 100 100 100
Ximp | 70.56% | 0 0 11 56 56 - 56 78 67 11 100 78 100 100 100 100 100 100 100 100 100
Overkill | 59.78% | 11 78 11 0 44 44 - 22 33 56 33 44 44 100 100 89 100 89 89 100 100
Aiur | 61.67% | 11 11 0 56 22 22 78 - 67 89 56 67 100 78 44 67 100 100 78 89 100
MegaBot | 63.89% | 56 0 0 67 78 33 67 33 - 56 44 33 89 56 89 100 100 89 89 100 100
IceBot | 58.89% | 0 0 0 11 33 89 44 11 44 - 67 44 89 44 100 100 100 100 100 100 100
JiaBot | 53.63% | 22 56 0 0 33 0 67 44 56 33 - 33 11 100 100 67 78 100 78 89 100
Xelnaga | 56.11% | 33 0 22 78 0 22 56 33 67 56 67 - 22 0 100 100 100 100 67 100 100
Skynet | 50.56% | 11 0 11 11 22 0 56 0 11 11 89 78 - 100 11 100 100 100 100 100 100
GarmBot | 40.78% | 0 0 0 12 0 0 0 22 44 56 0 100 0 - 67 78 100 100 33 100 100
NUSBot | 34.44% | 0 0 0 0 44 0 0 56 11 0 0 0 89 33 - 100 33 100 89 33 100
TerranUAB | 26.67% | 0 0 0 0 0 0 11 33 0 0 33 0 0 22 0 - 100 67 67 100 100
SRbotOne | 17.78% | 0 11 0 0 0 0 0 0 0 0 22 0 0 0 67 0 - 67 78 11 100
Cimex | 15.56% | 0 0 0 0 11 0 11 0 11 0 0 0 0 0 0 33 33 - 56 56 100
Oritaka | 25.56% | 0 0 33 0 0 0 11 22 11 0 22 33 0 67 11 33 22 44 - 100 100
CruzBot | 16.11% | 0 0 0 0 0 0 0 11 0 0 11 0 0 0 67 0 89 44 0 - 100
Tyr | 0.00% | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -
(4)Andromeda | overall | Iron ZZZK tscm Leta UAlb Ximp Over Aiur Mega IceB JiaB Xeln Skyn Garm NUSB Terr SRbo Cime Orit Cruz Tyr
Iron | 82.78% | - 22 56 78 78 89 100 100 56 100 67 100 11 100 100 100 100 100 100 100 100
ZZZKBot | 82.78% | 78 - 33 22 67 100 67 100 78 67 56 100 89 100 100 100 100 100 100 100 100
tscmoo | 85.56% | 44 67 - 44 33 89 100 100 89 89 100 78 78 100 100 100 100 100 100 100 100
LetaBot | 81.11% | 22 78 56 - 89 78 100 33 78 89 100 89 33 89 100 100 100 100 89 100 100
UAlbertaBot | 73.33% | 22 33 67 11 - 100 56 89 33 89 78 100 78 78 89 100 89 56 100 100 100
Ximp | 60.56% | 11 0 11 22 0 - 56 89 89 0 100 67 89 78 100 100 78 100 44 78 100
Overkill | 62.78% | 0 33 0 0 44 44 - 44 44 78 78 67 44 100 100 100 89 89 100 100 100
Aiur | 55.56% | 0 0 0 67 11 11 56 - 33 89 56 44 78 78 11 89 100 100 89 100 100
MegaBot | 57.22% | 44 22 11 22 67 11 56 67 - 22 33 56 78 33 78 100 78 89 89 89 100
IceBot | 57.22% | 0 33 11 11 11 100 22 11 78 - 44 67 22 33 100 100 100 100 100 100 100
JiaBot | 55.00% | 33 44 0 0 22 0 22 44 67 56 - 56 33 100 89 78 89 100 67 100 100
Xelnaga | 45.00% | 0 0 22 11 0 33 33 56 44 33 44 - 44 0 100 100 67 67 56 89 100
Skynet | 62.78% | 89 11 22 67 22 11 56 22 22 78 67 56 - 100 33 100 100 100 100 100 100
GarmBot | 46.11% | 0 0 0 11 22 22 0 22 67 67 0 100 0 - 89 78 100 100 44 100 100
NUSBot | 29.44% | 0 0 0 0 11 0 0 89 22 0 11 0 67 11 - 78 11 100 78 11 100
TerranUAB | 28.89% | 0 0 0 0 0 0 0 11 0 0 22 0 0 22 22 - 100 100 100 100 100
SRbotOne | 25.56% | 0 0 0 0 11 22 11 0 22 0 11 33 0 0 89 0 - 33 89 89 100
Cimex | 17.78% | 0 0 0 0 44 0 11 0 11 0 0 33 0 0 0 0 67 - 33 56 100
Oritaka | 25.56% | 0 0 0 11 0 56 0 11 11 0 33 44 0 56 22 0 11 67 - 89 100
CruzBot | 15.00% | 0 0 0 0 0 22 0 0 11 0 0 11 0 0 89 0 11 44 11 - 100
Tyr | 0.00% | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -
(4)Circuit Breaker | overall | Iron ZZZK tscm Leta UAlb Ximp Over Aiur Mega IceB JiaB Xeln Skyn Garm NUSB Terr SRbo Cime Orit Cruz Tyr
Iron | 85.00% | - 44 100 89 89 100 67 100 67 89 78 100 11 100 78 100 89 100 100 100 100
ZZZKBot | 83.33% | 56 - 67 44 56 89 44 100 89 78 44 100 100 100 100 100 100 100 100 100 100
tscmoo | 79.44% | 0 33 - 22 33 100 100 89 78 100 67 89 78 100 100 100 100 100 100 100 100
LetaBot | 81.67% | 11 56 78 - 67 89 100 33 89 100 100 78 33 100 100 100 100 100 100 100 100
UAlbertaBot | 70.56% | 11 44 67 33 - 89 22 89 22 78 78 100 78 67 89 100 100 44 100 100 100
Ximp | 56.67% | 0 11 0 11 11 - 33 56 44 0 100 44 100 89 100 89 67 78 100 100 100
Overkill | 62.78% | 33 56 0 0 78 67 - 0 11 78 67 11 56 100 100 100 100 100 100 100 100
Aiur | 60.56% | 0 0 11 67 11 44 100 - 44 89 22 33 89 100 44 89 89 100 100 78 100
MegaBot | 64.44% | 33 11 22 11 78 56 89 56 - 67 56 33 67 67 89 100 67 89 100 100 100
IceBot | 53.33% | 11 22 0 0 22 100 22 11 33 - 44 33 33 33 100 100 100 100 100 100 100
JiaBot | 56.67% | 22 56 33 0 22 0 33 78 44 56 - 22 22 89 100 89 100 100 78 89 100
Xelnaga | 58.33% | 0 0 11 22 0 56 89 67 67 67 78 - 22 0 100 100 89 100 100 100 100
Skynet | 62.22% | 89 0 22 67 22 0 44 11 33 67 78 78 - 100 33 100 100 100 100 100 100
GarmBot | 42.78% | 0 0 0 0 33 11 0 0 33 67 11 100 0 - 100 89 100 100 11 100 100
NUSBot | 31.67% | 22 0 0 0 11 0 0 56 11 0 0 0 67 0 - 67 67 56 100 78 100
TerranUAB | 27.22% | 0 0 0 0 0 11 0 11 0 0 11 0 0 11 33 - 89 78 100 100 100
SRbotOne | 21.67% | 11 0 0 0 0 33 0 11 33 0 0 11 0 0 33 11 - 33 89 67 100
Cimex | 19.44% | 0 0 0 0 56 22 0 0 11 0 0 0 0 0 44 22 67 - 22 44 100
Oritaka | 17.78% | 0 0 0 0 0 0 0 0 0 0 22 0 0 89 0 0 11 78 - 56 100
CruzBot | 14.44% | 0 0 0 0 0 0 0 22 0 0 11 0 0 0 22 0 33 56 44 - 100
Tyr | 0.00% | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -
(4)Empire of the Sun | overall | Iron ZZZK tscm Leta UAlb Ximp Over Aiur Mega IceB JiaB Xeln Skyn Garm NUSB Terr SRbo Cime Orit Cruz Tyr
Iron | 86.11% | - 0 100 89 67 100 89 100 100 100 89 89 11 100 89 100 100 100 100 100 100
ZZZKBot | 85.00% | 100 - 22 67 67 89 33 89 89 89 56 100 100 100 100 100 100 100 100 100 100
tscmoo | 81.67% | 0 78 - 44 0 100 100 78 100 100 89 67 89 100 100 100 100 100 89 100 100
LetaBot | 72.78% | 11 33 56 - 44 78 100 56 67 89 100 22 22 100 78 100 100 100 100 100 100
UAlbertaBot | 76.11% | 33 33 100 56 - 100 78 100 44 78 67 100 78 67 56 100 100 33 100 100 100
Ximp | 63.89% | 0 11 0 22 0 - 56 89 56 0 89 56 100 100 100 100 100 100 100 100 100
Overkill | 58.33% | 11 67 0 0 22 44 - 11 33 56 56 11 78 89 100 89 100 100 100 100 100
Aiur | 56.67% | 0 11 22 44 0 11 89 - 56 78 56 22 89 89 22 89 100 67 89 100 100
MegaBot | 51.67% | 0 11 0 33 56 44 67 44 - 56 22 22 78 11 100 89 78 67 67 89 100
IceBot | 53.33% | 0 11 0 11 22 100 44 22 44 - 33 11 22 44 100 100 100 100 100 100 100
JiaBot | 60.56% | 11 44 11 0 33 11 44 44 78 67 - 67 67 100 100 100 78 100 67 89 100
Xelnaga | 63.33% | 11 0 33 78 0 44 89 78 78 89 33 - 22 11 100 100 100 100 100 100 100
Skynet | 59.44% | 89 0 11 78 22 0 22 11 22 78 33 78 - 100 44 100 100 100 100 100 100
GarmBot | 43.89% | 0 0 0 0 33 0 11 11 89 56 0 89 0 - 89 89 100 89 22 100 100
NUSBot | 27.78% | 11 0 0 22 44 0 0 78 0 0 0 0 56 11 - 44 56 44 78 11 100
TerranUAB | 28.33% | 0 0 0 0 0 0 11 11 11 0 0 0 0 11 56 - 67 100 100 100 100
SRbotOne | 21.11% | 0 0 0 0 0 0 0 0 22 0 22 0 0 0 44 33 - 67 89 44 100
Cimex | 21.67% | 0 0 0 0 67 0 0 33 33 0 0 0 0 11 56 0 33 - 56 44 100
Oritaka | 21.11% | 0 0 11 0 0 0 0 11 33 0 33 0 0 78 22 0 11 44 - 78 100
CruzBot | 17.22% | 0 0 0 0 0 0 0 0 11 0 11 0 0 0 89 0 56 56 22 - 100
Tyr | 0.00% | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -
(4)Fortress | overall | Iron ZZZK tscm Leta UAlb Ximp Over Aiur Mega IceB JiaB Xeln Skyn Garm NUSB Terr SRbo Cime Orit Cruz Tyr
Iron | 81.11% | - 0 33 100 78 100 78 89 56 89 67 100 67 100 89 100 100 78 100 100 100
ZZZKBot | 88.33% | 100 - 56 100 67 100 22 100 89 100 44 100 89 100 100 100 100 100 100 100 100
tscmoo | 86.11% | 67 44 - 89 33 89 100 78 78 100 89 89 78 100 89 100 100 100 100 100 100
LetaBot | 39.44% | 0 0 11 - 78 0 44 0 22 0 0 22 0 56 78 22 89 89 78 100 100
UAlbertaBot | 75.00% | 22 33 67 22 - 100 78 89 33 78 89 100 78 100 78 100 78 56 100 100 100
Ximp | 63.89% | 0 0 11 100 0 - 22 89 44 11 100 56 89 89 78 100 100 100 89 100 100
Overkill | 68.89% | 22 78 0 56 22 78 - 78 22 89 22 22 89 100 100 100 100 100 100 100 100
Aiur | 55.56% | 11 0 22 100 11 11 22 - 22 78 56 33 67 78 44 89 89 100 100 78 100
MegaBot | 70.00% | 44 11 22 78 67 56 78 78 - 56 33 67 89 78 89 89 89 89 89 100 100
IceBot | 61.11% | 11 0 0 100 22 89 11 22 44 - 56 33 44 89 100 100 100 100 100 100 100
JiaBot | 62.78% | 33 56 11 100 11 0 78 44 67 44 - 44 33 100 78 89 67 100 100 100 100
Xelnaga | 56.67% | 0 0 11 78 0 44 78 67 33 67 56 - 22 0 89 100 89 100 100 100 100
Skynet | 59.44% | 33 11 22 100 22 11 11 33 11 56 67 78 - 100 33 100 100 100 100 100 100
GarmBot | 39.44% | 0 0 0 44 0 11 0 22 22 11 0 100 0 - 56 67 100 56 100 100 100
NUSBot | 33.89% | 11 0 11 22 22 22 0 56 11 0 22 11 67 44 - 67 22 56 100 33 100
TerranUAB | 29.44% | 0 0 0 78 0 0 0 11 11 0 11 0 0 33 33 - 89 56 67 100 100
SRbotOne | 23.89% | 0 0 0 11 22 0 0 11 11 0 33 11 0 0 78 11 - 56 67 67 100
Cimex | 23.89% | 22 0 0 11 44 0 0 0 11 0 0 0 0 44 44 44 44 - 100 22 89
Oritaka | 11.67% | 0 0 0 22 0 11 0 0 11 0 0 0 0 0 0 33 33 0 - 22 100
CruzBot | 18.89% | 0 0 0 0 0 0 0 22 0 0 0 0 0 0 67 0 33 78 78 - 100
Tyr | 0.56% | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11 0 0 -
(4)Python | overall | Iron ZZZK tscm Leta UAlb Ximp Over Aiur Mega IceB JiaB Xeln Skyn Garm NUSB Terr SRbo Cime Orit Cruz Tyr
Iron | 86.11% | - 0 100 100 100 100 89 78 67 100 78 89 22 100 100 100 100 100 100 100 100
ZZZKBot | 90.56% | 100 - 67 78 78 100 56 100 100 100 44 100 100 100 100 100 89 100 100 100 100
tscmoo | 78.89% | 0 33 - 56 22 89 89 78 89 100 89 100 89 100 100 100 100 100 89 56 100
LetaBot | 71.67% | 0 22 44 - 44 100 100 22 44 89 100 22 56 100 89 100 100 100 100 100 100
UAlbertaBot | 72.78% | 0 22 78 56 - 67 44 56 78 89 56 100 56 78 100 100 89 89 100 100 100
Ximp | 64.44% | 0 0 11 0 33 - 56 100 67 11 100 11 100 100 100 100 100 100 100 100 100
Overkill | 62.22% | 11 44 11 0 56 44 - 56 56 89 44 0 67 100 78 100 100 89 100 100 100
Aiur | 56.11% | 22 0 22 78 44 0 44 - 33 89 56 33 56 89 22 89 100 89 67 89 100
MegaBot | 56.11% | 33 0 11 56 22 33 44 67 - 33 56 33 100 56 78 89 44 89 89 89 100
IceBot | 55.56% | 0 0 0 11 11 89 11 11 67 - 89 56 22 44 100 100 100 100 100 100 100
JiaBot | 56.11% | 22 56 11 0 44 0 56 44 44 11 - 33 44 100 100 100 78 89 89 100 100
Xelnaga | 63.89% | 11 0 0 78 0 89 100 67 67 44 67 - 56 0 100 100 100 100 100 100 100
Skynet | 58.33% | 78 0 11 44 44 0 33 44 0 78 56 44 - 100 33 100 100 100 100 100 100
GarmBot | 42.22% | 0 0 0 0 22 0 0 11 44 56 0 100 0 - 100 89 100 89 33 100 100
NUSBot | 28.89% | 0 0 0 11 0 0 22 78 22 0 0 0 67 0 - 89 0 67 100 22 100
TerranUAB | 23.33% | 0 0 0 0 0 0 0 11 11 0 0 0 0 11 11 - 89 56 89 100 89
SRbotOne | 26.11% | 0 11 0 0 11 0 0 0 56 0 22 0 0 0 100 11 - 44 100 67 100
Cimex | 20.00% | 0 0 11 0 11 0 11 11 11 0 11 0 0 11 33 44 56 - 56 33 100
Oritaka | 20.00% | 0 0 44 0 0 0 0 33 11 0 11 0 0 67 0 11 0 44 - 78 100
CruzBot | 16.11% | 0 0 0 0 0 0 0 11 11 0 0 0 0 0 78 0 33 67 22 - 100
Tyr | 0.56% | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11 0 0 0 0 -

I could look at these charts all day and keep finding stuff to think about. Winner Iron was upset by ZZZKBot and scored overwhelmingly against every other bot except the two dragoon-heavy protosses MegaBot and Skynet, which earned upsets on some maps. ZZZKBot's results fall with rush distance, closely in line with Martin Rooijackers' explanation. LetaBot did well on most maps, but struggled on Fortress. NUSBot apparently crashed on Heartbreak Ridge, giving Tyr nearly half of its wins. Weak bot Cimex did surprisingly well against strong UAlbertaBot on every map except Tau Cross and Python. Why is that?
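One way to dig into tables like these without making transcription mistakes is to check them programmatically. Below is a minimal Python sketch, not a tournament tool; the data layout is my own, and the sample data is a hand-copied top-4 corner of the Heartbreak Ridge table above. The idea: every matchup appears twice in a crosstable, and the two percentages should sum to roughly 100, with some slack where crashes cost games.

    # Minimal consistency check for a crosstable: A's score vs B plus
    # B's score vs A should be close to 100. Sample data is the top-4
    # corner of the Heartbreak Ridge table above; None marks a bot's
    # own column.
    BOTS = ["Iron", "ZZZKBot", "tscmoo", "LetaBot"]
    TABLE = {
        "Iron":    [None, 22, 89, 89],
        "ZZZKBot": [78, None, 78, 67],
        "tscmoo":  [11, 22, None, 44],
        "LetaBot": [11, 33, 56, None],
    }

    def check_mirror(table, bots, tolerance=12):
        """Print any cell pair that does not sum to roughly 100."""
        for i, a in enumerate(bots):
            for j in range(i + 1, len(bots)):
                b = bots[j]
                total = table[a][j] + table[b][i]
                if abs(total - 100) > tolerance:
                    print(f"suspicious: {a} vs {b} sums to {total}")

    check_mirror(TABLE, BOTS)  # prints nothing: this corner is consistent

The overall column, though, is not always the exact mean of a row's cells; rows with odd values like Overkill's 12 and 83 on Heartbreak Ridge suggest that a few matchups counted fewer games.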

AIIDE 2016 Bayesian Elo ratings

Again I have Elo as calculated by Rémi Coulom's bayeselo program. The # column gives the official ranking, so you can see how it differs from the ranking by Elo (the bayeselo ranking is slightly more accurate because it takes into account all the information in the tournament results, not only the raw winning rate). I left out the 95% confidence interval column as relatively uninteresting, since the "better?" column tells us how likely each bot is to be superior to the one below it.

# | bot | score | Elo | better?
1 | Iron | 87% | 2016 | 99.4%
2 | ZZZKBot | 85% | 1974 | 99.6%
3 | tscmoo zerg | 83% | 1932 | 99.9%
4 | LetaBot | 74% | 1815 | 99.8%
5 | UAlbertaBot | 70% | 1774 | 99.9%
6 | Ximp | 65% | 1699 | 99.6%
8 | Aiur | 61% | 1663 | 51.6%
7 | Overkill | 62% | 1663 | 99.6%
9 | MegaBot | 58% | 1627 | 88.5%
10 | IceBot | 57% | 1611 | 57.5%
12 | Xelnaga | 57% | 1608 | 50.0%
11 | JiaBot | 57% | 1608 | 98.1%
13 | Skynet | 55% | 1581 | 100%
14 | GarmBot | 43% | 1441 | 100%
16 | TerranUAB | 27% | 1250 | 74.7%
15 | NUSBot | 27% | 1240 | 99.9%
17 | SRbotOne | 22% | 1167 | 99.0%
18 | Cimex | 21% | 1130 | 92.6%
19 | Oritaka | 20% | 1106 | 99.3%
20 | CruzBot | 17% | 1064 | 100%
21 | Tyr | 1% | 533 | -

There are some switches from the official ranking, due to bots being statistically indistinguishable. Overkill and Aiur are in a dead heat. IceBot (terran), Xelnaga (protoss) and JiaBot (zerg) are also virtually even. bayeselo gives IceBot a 57.6% chance of being better than JiaBot two ranks down, essentially the same as its 57.5% chance of being better than Xelnaga one rank down.
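For intuition about what these rating gaps mean, the standard logistic Elo formula converts a rating difference into an expected score. bayeselo fits a model in this family (with extras such as draw handling), so treat the numbers as approximate; the sketch below is mine, not part of bayeselo.

    # Standard logistic Elo expectation: the long-run score A is
    # predicted to take from B, given their ratings.
    def expected_score(elo_a, elo_b):
        return 1.0 / (1.0 + 10.0 ** ((elo_b - elo_a) / 400.0))

    print(round(expected_score(2016, 1974), 3))  # Iron vs ZZZKBot: 42 points -> 0.56
    print(round(expected_score(1663, 1663), 3))  # Overkill vs Aiur -> 0.5, a dead heat
    print(round(expected_score(2016, 533), 4))   # Iron vs Tyr -> 0.9998

Note the distinction: the "better?" column is the confidence that a bot's rating is genuinely higher than the next bot's, not the expected score of a game between them.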

Tomorrow: The per-map crosstables.

AIIDE 2016 results discussion

The AIIDE 2016 results are out. The top finishers, in order, are Iron, ZZZKBot, tscmoo zerg, and LetaBot: half terran and half zerg. As usual, Martin Rooijackers sent me a few observations. Here are my thoughts so far.

• The tournament was played on the same maps as last year, though nothing in the rules calls for that. At least we know that they're OK maps.

• Martin Rooijackers says that all four top finishers were updated in between the CIG 2016 entry deadline and the AIIDE deadline a month later. [This turned out to be wrong. See the comments. I misread Martin Rooijackers' message—he wrote that #1 Iron and #2 ZZZKBot had been updated.] You need continual work to stay at the top. It's fierce up there.

• In particular, ZZZKBot moved up to #2, with minus scores only against fellow zergs Overkill and JiaBot. I’m curious to know what the improvements are.

• Tscmoo zerg scored even against ZZZKBot (a poor performance; zerg should easily beat the 4 pool), and worse than even against the rest of the top 5. Tscmoo owed its #3 finish to terrorizing its inferiors. Martin Rooijackers took this to mean that terran dominated. But with 2 of the top 4, I have to say that zerg is hanging in there.

• In last year’s AIIDE, UAlbertaBot and Overkill were neck-and-neck. The Bayesian Elo calculation said that UAlbertaBot was about 60% likely to be stronger, a narrow margin. Neither has been updated since. [This was also wrong! See comments.] In CIG 2016, Overkill was ahead. In this tournament, UAlbertaBot finished well ahead. These two can’t make up their minds! Apparently this time UAlbertaBot with its rush builds was better able to cope with the changing field.

• XIMP made it to #6. On SSCAIT its last update was February 2015, and I believe it was a minor adjustment to timings. The carrier strategy seems resilient. By contrast IceBot, AIIDE and CIG champion in 2014 ahead of XIMP, was in the middle of the pack. All its smarts did not make it resilient against improvements in its enemies.

• “JiaBot” in AIIDE is probably the same as “Ziabot” in CIG and “Zia bot” in SSCAIT. In some languages, the same sound can transliterate into English as either J or Z. Looking at the Ziabot source from CIG 2016, I think that the author is Korean based on variable names.

• Xelnaga finished ahead of Skynet, which surprised me. In CIG, XelnagaII was near the bottom. It's a surprisingly sudden change.

• Tyr finished dead last by a mile, with a 1% win rate which seems so low that it must be due to bugs. Tyr had major updates this year which left it fairly strong. It finished in the middle of the pack in CIG 2016. I imagine it was updated at the last minute with a grievous error.

Tomorrow: The Elo table. I’ve done the calculations, but I have to draw up the table.