tournaments - 16 | Starcraft AI blog

experience on the BWAPI bots ladder

I like the BWAPI ladder. It doesn’t seem to have an official name; I’ll just call it “the ladder”.

I’ve enjoyed following the games. I play over most of Steamhammer’s games on the ladder every day. The ladder makes random pairings, so it feeds me a wider variety of opponents and I see more strengths and weaknesses. Also the ladder plays more games in total, because it plays games at full speed, not slowed down for streaming.

Providing accurate rankings and elo is the primary purpose of the ladder, at least as I see it. With a diet of randomly chosen opponents, Steamhammer’s rank stabilized at #8, behind CherryPi and ahead of TyrProtoss, with elo steady in the 2230s. On SSCAIT, the same version’s ranking is not stable; it has varied by over a factor of 2, with the elo tending to rise into the low 2200s, then fall rapidly, then slowly rise again, depending on the whims of the voters. When the voters see Steamhammer with a high rank, they tend to pair it against opponents that it will lose rating points to; after it has lost the rating points, they tend to pay less attention. Steamhammer ends up below its equilibrium elo, and the popular opponents that defeat it end up overrated. The ladder pairs bots fairly, so it better predicts tournament performance.

Randomhammer’s rank can’t be compared across competitions so neatly, though, because the competitions treat random players differently. The difference in rules makes it less useful for predicting the tournament performance of a random player.

File I/O seems to work a little differently in each competition. They are all based on Dave Churchill’s tournament manager software, but each competition uses a different version or tweaks it differently, and the behavior is not exactly the same. They all share in common a read directory and a write directory, with read-only access to read and write-only to write, and copy the contents of write to read. They differ in whether and/or when they clear directories. AIIDE proceeds in all-play-all rounds, and clears write at the end of each round of many games. SSCAIT and the ladder proceed by single games, and can’t do exactly the same thing. I believe that SSCAIT never clears write. I don’t know what the ladder does, but it has different behavior, and Steamhammer’s code doesn’t work correctly.

Steamhammer’s problem, I saw immediately when I requested and received the stored data, is that its record of games against each opponent only extends back 1 game. Instead of the whole history, the opponent model has to draw conclusions based on the one previous game. Data is being cleared at some point; perhaps write is cleared before each game. Steamhammer appends data to the opponent’s file after each game, which works on SSCAIT. I think if I change it to rewrite the entire data file (originally read from read), instead of only appending the new game record, it will work everywhere, including the ladder and the AIIDE tournament. I won’t know for sure until it happens, though, because the details are not documented. The change will be in the next version, 1.4.2.
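The rewrite-instead-of-append idea can be sketched as follows. This is a minimal illustration in Python, not Steamhammer’s actual code; the function name, the one-file-per-opponent naming scheme, and the one-line-per-game record format are all assumptions made for the example.

```python
from pathlib import Path


def save_opponent_history(read_dir, write_dir, opponent, new_record):
    """Write the full game history for one opponent, not just the newest game.

    Hypothetical layout: one text file per opponent, one line per game.
    """
    file_name = f"{opponent}.txt"          # assumed naming scheme
    old_file = Path(read_dir) / file_name
    new_file = Path(write_dir) / file_name

    # Start from whatever earlier history the competition copied into read/.
    history = old_file.read_text().splitlines() if old_file.exists() else []
    history.append(new_record)

    # Rewrite the whole file, so the history survives even if write/ was
    # cleared before this game.
    new_file.write_text("\n".join(history) + "\n")
    return history
```

With this approach the file in write always holds everything known so far, so it should not matter when or whether a competition clears the write directory, as long as read is preserved between games.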

Call it a bug in Steamhammer. The bug means that Steamhammer’s rank and elo can’t be compared between SSCAIT and the ladder, even though the opponents are mostly the same in both. It’s possible that Steamhammer plays better with the bug, so its higher rank on the ladder is justified. The point about the stability of the rank stands, though.

BWAPI bots ladder

The annual AIIDE, CIG, and SSCAIT tournaments give us snapshots of bot strength at particular moments in time. The SSCAIT ladder used to let us gauge strength at other times as well, but then voting was added, and voting distorts the ratings so that we only get a general idea. Adding voting made sense, because it helps keep people engaged, and that (as I see it) is SSCAIT’s main goal. Still, we lost a useful ability.

To me, following Steamhammer’s results, the elo distortion caused by voting is unmissable. Since November or earlier, and except during the tournament when voting wasn’t allowed, every time Steamhammer’s rating rose to around 2200, the voters soon took interest and fed it a sequence of opponents that it would lose to, hammering the rating back down. Then voters lost interest and its rating gradually rose again. Watching this process helped me make my correct prediction that Steamhammer would finish in the SSCAIT tournament between places #4 and #8, even though the bot could not maintain such a high rank outside the tournament.

BWAPI bots ladder fills the gap. (It doesn’t seem to have an official name.) There is no voting or streaming; it only plays games. Since it doesn’t slow down games to make them watchable, it also plays more games. The larger amount of data should also improve the ratings (though it depends on the K factor in the elo calculation).

The UI is sparse. Presumably the project is in an early stage of development, just finished enough to make public. The web page doesn’t provide information about plans. I’ve sent a list of questions to the contact address.

A few points:

• The ladder seems to have been pre-populated with SSCAIT bots. It is a decision that can be questioned: The bot authors did not give permission for this public use. An author who doesn’t like it can write to the contact address to have their bot removed.

• The maps are the SSCAIT maps.

• Minimal public information. The bot names and game results and replays are made public; nothing else is revealed. It is an intentional choice. I don’t even see a way to find out the upload time of a bot, so that you can try to distinguish versions.

• Random bots are allowed, but the tournament manager chooses the race so that the opponent knows it when the game starts. This is different from playing random. It reflects the opinion that playing random gives an unfair advantage.

I personally disagree with the treatment of random players. I think that decisions about fair balance should be made on the basis of data, not argument, and that we don’t have the data. One issue is that a random bot is more difficult to create (no matter whether it is taught its knowledge by hand or by machine learning), which you could take as counterbalancing any advantage it might gain by playing random. Another issue is that a bot which wants to play against random opponents is not able to on this ladder. Of course, in the end the organizers are doing the work, and they get to make the decisions.

Overall, the appearance of a new ladder is a good sign of the health of the community. It fills a gap: It provides a better continuous measure of how well different bots are doing than we have had before. Those who like the design decisions may prefer it over other competitions, and those who don’t also gain by living in a richer world.

island map Sparkle

One of the maps proposed for the Afreeca Star League season 5 (ASL5) is an island map: Sparkle. In my 2016 post on novelty maps I wondered whether it was possible to balance an island map using modern map-balancing tricks. Sparkle is a serious attempt to do that, and I’m interested. At the moment, the public is voting to help decide which maps will be included in the tournament.

SSCAIT 2017 round of 8 remarks

There’s a striking pattern in the SSCAIT round of 8 that I want to point out. First, the video and the Liquipedia page. I’ll discuss the results. The round was made up of 6 newer and frequently updated bots, which were all paired against each other, and 2 old hands which were paired together. Each pair played a best of 5 match.

The old hands were Killerbot and XIMP, not updated for this tournament. They played a balanced match which was decided by strategy and tactics and fine details of play which the bots didn’t take into account. There weren’t any obvious or decisive bugs; it was about good everyday play.

Taking the rest of the matches from the top, Iron-Microwave went to Microwave because the zerg bot was able to break the terran wall. Iron knew how to repair its wall and made marines to defend it. But the marines were afraid of the zerglings which could not attack them through the wall, and did not shoot. I suppose that the combat simulator doesn’t understand the wall and says “uh oh, we’ll lose if the melee units get close, keep a safe distance!” Microwave won by exploiting a bug in Iron.

Steamhammer swept Arrakhammer 3-0 with 3 zergling builds in a row. Nepeta didn’t point it out in the video, but in each game Arrakhammer’s own zerglings fought piecemeal, inefficiently engaging with a partial force and then retreating, suffering more damage than they dealt. Steamhammer won by exploiting a bug in Arrakhammer.

CherryPi beat McRave 3-2 in a close match. McRave showed strategy weaknesses which deserve some of the blame. Nepeta’s points in the video are valid: Expand earlier, get +1 attack for the zealots to counter zerglings (since with +1 a zealot kills an unupgraded zergling in 2 hits instead of 3, a giant difference), and don’t cower in your own base when you play an aggressive 2 gate opening (you should never be behind in units for long). But McRave’s biggest weakness was that its high templar usually did not cast psionic storm, and seemed happy to suicide themselves. CherryPi won by exploiting the bug. In one game that CherryPi lost, zerg did not build the macro hatcheries it needed to keep up its zergling-heavy unit mix, and CherryPi’s mineral bank grew into the thousands. McRave won that game by exploiting a bug in CherryPi.

Two conclusions are loud and clear. 1. 6 of the 8 participants that made it this far are frequently updated and fast evolving. Only 2 old timers could keep up. The hard work to make many improvements pays off. 2. The same frequent updates leave bots vulnerable to bugs. The old hands were solid (at least they looked solid this time), and the fast movers had fragile spots.

I’m not sure there are any lessons for bot authors, other than “hard work pays off” and “fix the worst problems first,” both of which we already knew. The pattern in the results was so striking that I couldn’t ignore it.

Next: Steamhammer 1.4 change list.

SSCAIT 2017 is complete, submissions are open again

SSCAIT has reopened for submissions. I uploaded Steamhammer 1.4 (including a copy to play as Randomhammer). Its web page should be updated with the source tomorrow, and the day after I’ll post the change list here.

why is SSCAIT replaying tournament matches?

Why is the SSCAIT tournament re-running so many games in the elimination phase? There may be discussion of this on Facebook or somewhere, but I haven’t seen it. (I don’t use Facebook at all, because I don’t want to support their world domination plans. It would interfere with my world domination plans.)

I keep seeing games come up that are clearly tournament matches—then, later, the same matchups appear again. It looks exactly as though tournament matches are being replayed. Which games will be declared official?

They did the same thing on a smaller scale last year, and it caused some controversy. See the comments to the post Steamhammer vs LetaBot, SSCAIT round of 16 from last January. Last year it affected Steamhammer, and yet it didn’t bother me at all, partly because I saw the single elimination bracket more as entertainment than as a test of strength. There are legitimate administrative reasons to replay games. This year it doesn’t affect Steamhammer (games have been replayed, but with the same predictable results), but it bothers me more. There is an effort to make this phase of the tournament more rigorous, and replaying games undercuts that. When a learning bot is paired against a non-learning bot, such as CherryPi versus Iron, having more games against the opponent gives an advantage to the learning bot.

I doubt there’s favoritism behind the scenes, but how can I know? This comes up in politics all the time: It is not enough to avoid impropriety. If you want to be trusted then you also have to avoid the appearance of impropriety.

solid versus daring

A game player of a given strength is solid if it wins reliably against weaker opponents, and daring if it loses more games to weaker opponents and makes up for it by winning some against the stronger. I think the term solid is common. I decided for myself that its opposite should be daring.

The idea applies to all games of skill with winners and losers. You can always find more solid and more daring players, unless the game is so constraining that it leaves no room for stylistic differences. From the point of view of a player with a fixed level of skill, you could say that being solid means that your style of play aims to reduce your risk of losing, while playing daringly means you try to increase your chance of winning. From the point of view of an author, you could say that trying to make your bot more solid means working to reduce exploitable weaknesses that cause losses, while trying to be more daring means creating strengths that will catch out some opponents (like timing attacks or unusual rushes or tech switches). It makes sense for authors of weak bots to focus on daringly beating the stronger, and authors of strong bots to solidly beat the weaker. (Of course it also makes sense to do whatever is more fun.)

I’ve never seen a statistical measure of solidness, in the same way the elo is a statistical measure of strength. It seems widely useful, so I hope somebody has worked one out, or will work one out now that they know about it. A good one seems complicated, though. You could do something like estimate the winning chances each player has against each opponent with a method like that of bayeselo, then try to fit a measure of deviation from flatness over the range for each player. Does the difference between predicted and measured winning chance vary systematically depending on the predicted winning chance?

Here’s one simple measure for the top finishers in the SSCAIT round robin: What proportion of a bot’s losses came against the top 16? If most losses are against strong opponents, the bot is solid. The measure is approximately statistically fair only for the top few bots. We can see that Iron is solid and Tscmoo and McRave much more daring, while Killerbot and Bereaver are more solid than Tscmoo and McRave. I don’t think this number gives us much insight into whether Iron is more solid than Bereaver.

#	bot	losses to top 16 / all losses	%
1	Iron	7/10	70%
2	Tscmoo	4/14	29%
3	McRave	5/15	33%
4	Killerbot	9/19	47%
5	Bereaver	11/22	50%
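The measure in the table is easy to compute from a list of losses. Here is a minimal sketch; the function name and the input format (a list of opponent names, one per lost game) are assumptions for illustration, and the example names are made up rather than taken from the crosstable.

```python
def top16_loss_fraction(losses, top16):
    """Fraction of a bot's losses that came against top-16 opponents.

    losses: list of opponent names, one entry per lost game.
    top16:  set of names of the top 16 finishers.
    Returns None when the bot has no losses (the measure is undefined).
    """
    if not losses:
        return None
    against_top = sum(1 for opponent in losses if opponent in top16)
    return against_top / len(losses)


# Made-up example: 4 losses, 2 of them to bots in the top 16.
example_top16 = {"BotA", "BotB"}
example_losses = ["BotA", "BotZ", "BotB", "BotY"]
print(top16_loss_fraction(example_losses, example_top16))
```

A higher fraction means more of the losses came against strong opponents, which is the sense of solid used here.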

Another simple measure for the stronger bots is: What’s the weakest opponent that you lost to in the SSCAIT round robin? The measure will be noisy, and comparisons only work for players that are close in strength. Also extremely daring lower-rank players like Oleg Ostroumov can distort it. But it’s quick to figure out and that counts for a blog post. I read the results from the unofficial crosstable.

#	bot	worst loss
1	Iron	#31 PurpleCheese
2	Tscmoo	#56 NUS Bot
3	McRave	#69 FTTankTER
4	Killerbot	#60 Oleg Ostroumov
5	Bereaver	#35 Dawid Loranc
6	Steamhammer	#44 Lukas Moravec
7	Wuli	#61 Marine Hell
8	CherryPi	#60 Oleg Ostroumov

My feeling is that Killerbot and Wuli are more solid than this noisy measure gives them credit for, and otherwise the numbers give a rough but fair idea. Iron is more solid than Tscmoo or McRave. Bereaver and Steamhammer are more solid than, say, McRave and CherryPi. In Steamhammer I’ve worked toward solidness, so I’m pleased to have it.

SSCAIT 2017 round robin results

The SSCAIT 2017 round robin phase has finished. See the official results and the unofficial crosstable. The unofficial crosstable seems to include a few extra games; I guess there’s a small leak in the pipeline. I have a few thoughts about the results.

Of the top 5, McRave is a newcomer this year and the rest are the old guard: #1 Iron, #2 Tscmoo random, #3 McRave, #4 Killerbot by Marian Devecka, #5 Bereaver. Killerbot and Bereaver weren’t updated this year and couldn’t quite keep up with the best, but remain tough opponents. It still takes a long time to produce a strong bot.

The results were influenced by the long tail of weaker bots which brought the tournament up to 78 participants. With many weaker opponents, the top players benefit from solid play, avoiding the risk of losing. Bots with daring play, which score well against strong opponents but lose to some that are weaker, were at a disadvantage. #1 Iron is the most solid bot: Look at the crosstable and see its row of 1-1 results against its strongest opposition; it more than made up for those losses with extreme consistency in defeating the lower ranks (the weakest bot it lost a game to was #31 PurpleCheese). Tscmoo in contrast scored well against top opposition, but had more losses to the long tail. I will try a little more analysis of the solid/daring tradeoff in another post.

#7 Wuli is hanging in there. The hard zealot rush is still a successful strategy, and it executes well.

#8 CherryPi remains an interesting case. It also suffered from its daring play. To my eye, it seemed to be learning something about each opponent from the first game, and applying it in the second. As the tournament continued, it surged higher in the ranking. How high might it have finished in a very long tournament? It would be interesting to count how many times it scored a loss then a win versus win then loss in the 2 games against each opponent: A high ratio of loss-win over win-loss indicates the ability to learn from a single game. But it might not be so clear; against an opponent that also learns like McRave, or that changes its play up like Steamhammer, what CherryPi figures out from its first game might lead it astray in the second (I think that happened in the second McRave-CherryPi game).
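The loss-win versus win-loss count could be done along these lines. It is only a sketch: the function name and the input format (a mapping from opponent to the results of the first and second game) are assumptions, and the sample data is invented.

```python
def count_learning_pairs(results):
    """Count loss-then-win and win-then-loss sequences.

    results: dict mapping opponent name -> (first_game_won, second_game_won),
             each a bool. Opponents with two wins or two losses are ignored.
    Returns (loss_win, win_loss).
    """
    loss_win = sum(1 for first, second in results.values() if not first and second)
    win_loss = sum(1 for first, second in results.values() if first and not second)
    return loss_win, win_loss


# Invented sample: two opponents beaten only in the second game,
# one beaten only in the first, one beaten twice.
sample = {
    "OpponentA": (False, True),
    "OpponentB": (False, True),
    "OpponentC": (True, False),
    "OpponentD": (True, True),
}
print(count_learning_pairs(sample))  # (2, 1)
```

With only 2 games per opponent the counts will be small, so a lopsided ratio is suggestive rather than conclusive.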

Microwave, Neo Edmund Zerg, and TyrProtoss tied for places #9-#11, each with 31 losses. I had expected Microwave to do a little better, but I think it relies on its opening learning, and it hadn’t played all the opponents before so it didn’t know enough. I had expected the rushbot Neo Edmund Zerg to do a little worse, but the many newcomers of course all fell to its rush.

My predictions for the tournament are reasonably good (except for the glaring mistake that the tournament was actually a double round robin). I did not expect Tscmoo to finish so high. Steamhammer I boldly forecast to finish in the narrow range from #4 to #8, and it ended up squarely in the middle of that range at #6. I’m pleased that I understand the performance of my own bot.

the elimination phase

According to the rules, random bots will not play in the elimination phase. So Tscmoo random and Andrey Kurdiumov are excluded, and the 16 continuing to the elimination phase should be:

Iron
McRave
Killerbot by Marian Devecka
Bereaver
Steamhammer
Wuli
CherryPi
Microwave
Neo Edmund Zerg
TyrProtoss
XIMP by Tomas Vajda
Arrakhammer
Skynet by Andrew Smith
LetaBot by Martin Rooijackers
AILien
ZurZurZur or Black Crow

Last year, Steamhammer and Zia tied for places #16-#17, and played a best-of-3 tiebreaker to decide who continued. This year ZurZurZur and Black Crow are tied for #16-#17 (excluding random bots) with 108 wins and 46 losses. I hope for another tiebreaker!

Last year the pairings were #1-#16, #2-#15, and so on. It gives the top finishers an advantage over middle finishers; #8 is paired with #9 and must play a close rival. The official pairings were tweeted while I was in the process of writing the post; here they are:

This is close to what I expected, but not quite the same. The tied bots Arrakhammer and Skynet were taken in reverse order from the order listed in the official results, so Steamhammer is paired against Skynet and Bereaver against Arrakhammer. Maybe the idea is to avoid Steamhammer playing against its fork Arrakhammer? Or maybe the idea is to avoid 2 mirror matchups, ZvZ and PvP? Anyway, these are acceptable pairings by the same rules followed last year, except for the unannounced tiebreaker. Maybe ZurZurZur’s 2-0 win over Black Crow in the round robin is taken to break the tie.
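For reference, last year’s #1-#16, #2-#15 scheme is simple to write down. A minimal sketch, with a function name of my own choosing; it says nothing about how ties or swaps like the Arrakhammer/Skynet one are handled.

```python
def seeded_pairings(seeds):
    """Pair the best seed with the worst, the second best with the second worst, and so on.

    seeds: list of entrants ordered from best to worst; must have even length.
    """
    n = len(seeds)
    if n % 2 != 0:
        raise ValueError("need an even number of entrants")
    return [(seeds[i], seeds[n - 1 - i]) for i in range(n // 2)]


pairs = seeded_pairings(list(range(1, 17)))
print(pairs[0], pairs[-1])  # (1, 16) (8, 9)
```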

New this year is a loser’s bracket. This is now a double elimination design, where you have to lose twice to be out, and no longer single elimination. If you lose 1 match, you fall to the loser’s bracket, where you remain until you either lose a second match or win every match and win the loser’s bracket. The final is between the winner of the winner’s bracket, which lost 0 times, and the winner of the loser’s bracket, which lost 1 time. Every other bot lost twice and is out. Giving participants a second chance makes the tournament a little more fair. On the other hand, last year the elimination phase included best-of matches for the round of 4 and later, whereas this year I’m guessing that they may be single games.

I think the rules should explain the format of the tournament. There is no clear explanation that I know of.

luck

In AIIDE this year, Steamhammer scored much lower in the first 5 round robins (55%) than over the entire tournament (64% over 110 rounds). The difference amounts to finishing #10 instead of #13. It could be due to other bots getting confused by Steamhammer’s random openings and mislearning, but I think it is more likely to be statistical noise. Steamhammer happened to be unlucky early on, and the bad luck washed away over the long tournament. Data is cleansing.

SSCAIT has only 2 round robins. The 5 rounds of Steamhammer’s bad luck comprised 135 games compared to the 154 games each bot plays in the 2 rounds of SSCAIT, similar numbers. Some bots will be lucky and place a little higher than they would have in a very long tournament, and some bots will be unlucky. SSCAIT has a different purpose than AIIDE, so I think that’s OK. But it is a point to remember.
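To put a rough number on how much noise to expect over that many games, here is a small sketch. It treats every game as an independent coin flip with the same win probability, which is a simplification (real opponents differ in strength and some bots learn), so take the result as an order of magnitude only.

```python
import math


def win_rate_standard_error(win_rate, games):
    """Standard error of an observed win rate, treating games as independent coin flips."""
    return math.sqrt(win_rate * (1.0 - win_rate) / games)


# Figures from the post: 64% long-run win rate, 135 games in the first 5 rounds.
se = win_rate_standard_error(0.64, 135)
print(f"standard error: {se:.3f}")  # about 0.041, roughly 4 percentage points
```

Under that simplification, the gap between 55% and 64% is a bit over two standard errors: unusual for one bot picked in advance, but not startling for some bot among a field of entrants.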

It’s difficult to judge by intuition whether a bot is getting lucky or unlucky. The majority of Steamhammer’s losses so far are “unlucky” losses against opponents that Steamhammer usually defeats. That is exactly what we should expect. The bots in the highest places (Steamhammer is currently #6 out of 78) don’t have many opportunities to lose to stronger opponents. Look at the crosstable and you’ll see that all the top bots have the majority of their losses on the right-hand side, against weaker opponents. No player is perfectly solid, we all lose occasionally due to our own mistakes; it’s a hard game.

That said, I have a clear idea of which games I see as unlucky results. For example, Steamhammer lost 0-2 to Flash this tournament, while in my test at home, Steamhammer beat Flash at a ratio of 4:1. In its losses, Steamhammer happens to randomly choose openings that don’t work against this opponent, or gets into less common situations where weaknesses pop up. Steamhammer’s first game against Flash is its worst game of the tournament so far; Steamhammer barely seems to be in the game at all, but simply falls down when poked. I should repeat that an unlucky result against one opponent doesn’t mean that Steamhammer’s overall result is unlucky. With 2 games against each opponent, lucky and unlucky results against given opponents are virtually inevitable. And it’s hard to judge by intuition whether the good and bad luck balance out.

Steamhammer does have one clear lucky win, Steamhammer > Microwave. Microwave learned that 5 pool on average beats Steamhammer’s ZvZ opening mix, and played it this game too. Steamhammer got lucky and randomly chose 9 pool speed, which counters 5 pool, and won after a long game. Steamhammer maintained its lead the whole game, but Microwave defended stubbornly and had to be ground down (it’s a good game if you like that kind of thing). How will the second Steamhammer-Microwave game go? I can’t predict! Microwave will have an edge if it keeps its opening, but after losing it may switch.

For the rest of the tournament, I predict 2 losses to Iron and 1 more loss to McRave. TyrProtoss is also likely to take its game, and if CherryPi adapts against Steamhammer in the same way it has adapted against other zergs that defeated it, then CherryPi will have an edge in its remaining game—and those are all the likely losses. If Steamhammer wins any of those 5 games, they will be lucky wins. I can’t predict the Microwave or Tscmoo games. Any other losses will be unlucky losses. It seems plain that the majority of Steamhammer’s losses for the rest of the tournament will be unlucky losses, losses against opponents that Steamhammer usually beats, and that is how it should be. Frequent unlikely chances outweigh scarce likely chances.

Next: An epic game.

SSCAIT check-in

All players have now reached at least 30 games, so the rating list and crosstable have data for all entrants. That makes today a good time for a check on prospects.

By the way, the 2016 crosstable is missing from its old URL. Did it move, or is it lost? [Update: Last year’s crosstable was and remains here.]

The elo rating is a better estimate of the eventual ranking than the win rate, because it takes into account whether the bot has played stronger or weaker opponents. On the other hand, the win rate is up-to-date with all games included, while the elo is calculated from older cached data so it may exclude recent games (like CherryPi > McRave, which was just played). Anyway, here are the top 10 by elo.

bot	elo	win%
McRave	2245	92.31%
Marian Devecka (Killerbot)	2240	90.57%
tscmoor	2223	94.59%
TyrProtoss	2216	85.71%
Steamhammer	2214	89.13%
Martin Rooijackers (LetaBot)	2210	84.09%
Iron	2207	88.57%
Neo Edmund Zerg	2206	82.46%
Tomas Vajda (XIMP)	2204	77.78%
Bereaver	2182	91.18%

The first point to strike me is that the rankings are extremely close. There is barely a difference between the ratings of #9 at 2204 and #3 at 2223; 20 elo points is a 53% chance to win for the higher rated opponent. The difference from the top to the bottom of this list gives #1 McRave about 3:2 odds over #10 Bereaver, far from an overwhelming advantage. With little difference measurable between players, the ranking is not stable and the final results could look entirely different in detail.
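The conversion from a rating gap to a winning chance can be checked with the standard elo expected-score formula. The sketch below assumes SSCAIT’s ratings use the usual 400-point logistic scale, which I have not verified; the function name is my own.

```python
def elo_expected_score(rating_diff):
    """Expected score of the higher-rated player under the standard elo logistic formula."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


print(round(elo_expected_score(20), 2))           # 0.53
print(round(elo_expected_score(2245 - 2182), 2))  # 0.59, close to 3:2 odds
```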

Bereaver is the bot with the biggest difference between its rank by win rate and its rank by elo. It has played weaker opponents. That probably reduces the accuracy of its ranking.

Iron surprisingly lost 3 games early and fell far back. Since then it has lost only 1 more and climbed back upward.

Steamhammer has been pretty stable over time at rank #4 plus or minus 1 (currently at #5). I predicted that it would finish in places 4 to 8, and so far I haven’t seen a reason to doubt my prediction.

The CherryPi-McRave game was sad to watch. CherryPi did its usual thing, with tons of zerglings plus a few mutalisks. McRave made many high templar and idly walked them into the middle of the map to die. Not once in the game did protoss cast a storm or merge an archon. Could this be related to the last-minute binary hot-fix? Against this unit mix, protoss could have gotten away with skipping storm research and merged archons right away. Other protoss mistakes in the game gave me a feeling that McRave is easy to beat if you can just get your basics up to a good enough level.

a paradox of motivation

It occurs to me that tournaments cause a paradox of motivation. Once the tournament is underway, nothing I do affects it. I want Steamhammer to defeat its strong opponents, giving them losses so Steamhammer can pull in front. If I have to lose some games, I would prefer to lose to the weaker opponents which pose no threat of placing higher.

In development, I don’t ignore the strong opponents by any means, but I have been concentrating on building a firm foundation, getting the basics down solid. Compare: Last year, Steamhammer scored many upsets against stronger players and many losses against the lower ranks. Today it is the other way around; Steamhammer can barely touch Iron or McRave (in last year’s edition it scored an upset over Iron), and it only now and then drops a game to the bottom half. Steamhammer has gone from upset-prone to consistent. I haven’t been trying to beat the top bots, I’ve been trying to play better. I put extra effort into analyzing Steamhammer’s losses against the weakest opponents in AIIDE, for example. It’s the opposite plan from trying to beat the current #1.

Well, I don’t plan to change course. I just thought the paradox of motivation was interesting. As soon as there is nothing I can do, my goals change.

SSCAIT 2017 early results

How is the tournament going so far? Many games remain to be played, and bots have played different numbers, so all conclusions are tentative. But we can see some trends.

The top leaders at the moment by win rate are Bereaver 15-0, McRave 21-1, Tscmoo Random 17-1, Steamhammer 20-2, and KillAll at 25-3. Bereaver is a genuine candidate for a high finish, but most likely not #1. It has played relatively few games. McRave’s one loss is to Wuli. I think McRave is now the top candidate for #1 finisher. KillAll is the biggest surprise to me.

ICEbot at 13-2 is above expectations, but has also played few games. Iron has unexpectedly lost 3 games already and stands at 16-3. I won’t be surprised if its standing rises after more games. Killerbot by Marian Devecka is at 18-3 and can be expected to finish high. Microwave at 11-3 has a higher rate of losses than I expected. I may have overestimated it, as its author MicroDK said, but it hasn’t played many games yet. Microwave does have a win over Iron thanks to wall-busting skills.

CherryPi at 17-5 may be the bot in the tournament with the most smarts. Sometimes its play looks crisp and precise, with accurate reactions. But CherryPi is also immature, as you can guess from its slightly higher rate of losses. With less experience on SSCAIT, it is not ready for everything. It collapsed in the face of the cannon contain of Juno by Yuanheng Zhu, which the strongest bots all defeat easily. (Juno, I learned, builds a cannon contain not only at its enemy’s natural, but at every enemy expansion it finds. I hadn’t seen that before, though I’ve watched many Juno games from AIIDE.) Other CherryPi losses are to AILien and to Steamhammer in strategically similar ZvZ games: CherryPi undertook a strategy of a rush-safe opening followed by attempted zergling domination (similar in idea to the ZvZ plan of Killerbot by Marian Devecka but with a less thoroughgoing execution), while its opponents chose less-safe hatchery first openings that in fact let them win the zergling war. Trying to win without risk is risky.

CherryPi showed up with a new version not long before the tournament. It reportedly did not show its strongest builds at first: If true, I take that to be a sign of lacking confidence. If you believe that your team, with its funding and top experts, has produced a better bot than all others, then you should also believe that others will be unable to catch up or to exploit its weaknesses, certainly not in a short time. I get the impression that the CherryPi team understands that their bot is not yet solid and mature.

Hannes Bredberg at 14-5 is impressing me as much improved. It used to scatter its marines to scout then gather them to attack, which was cool but not effective. Now it keeps its marines in formation and punches much harder.

Steamhammer is scoring at the level I predicted; it is now at #4 when I gave it places 4 through 8. The win over XIMP reassures me that I probably fixed the weakness that caused the recent test version to lose to XIMP. One point that worries me is that its losses are to MegaBot2017 which is much lower ranked, and to Neo Edmund Zerg which it should beat nearly 100% of the time. Except for CherryPi and XIMP, Steamhammer has not yet played most of its strongest opposition.

As always, the results will tell us. Now I have to hurry up and post this before the results change on me!

SSCAIT 2017 tournament expectations

How will the tournament go? I can’t predict the final results because they come out of an elimination phase, which is unpredictable by nature. It depends on pairings and luck. But I have some insight into the round robin phase. There are 78 participants, many more than the 45 last year, so each pair will play only once, giving the announced 3003 = 78 * 77 / 2 games. I haven’t seen it explained, but I suppose that the top 16 will go to the elimination phase, like last year.
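As a quick check on the game count, here is a minimal Python sketch of the round robin arithmetic. The function name and the games_per_pair parameter are my own, for illustration; the only assumption is that a single round robin plays each pair of bots exactly once.

```python
def round_robin_games(participants: int, games_per_pair: int = 1) -> int:
    """Number of games when every pair of participants meets games_per_pair times."""
    pairs = participants * (participants - 1) // 2
    return pairs * games_per_pair


single = round_robin_games(78)      # each pair meets once
double = round_robin_games(78, 2)   # each pair meets twice
print(single, double)               # 3003 6006
```

With 78 bots, a single round robin is 3003 games, and playing every pairing twice would double that to 6006.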

How bots do in regular ladder play can be very different from how they will do in the tournament. The voters have a big influence. Steamhammer has been underrated lately on the ladder and will finish high in the round robin.

With Krasi0 choosing to stand out of the picture, the top favorites, of course, are Iron and McRave. Microwave has become steady and reliable and should also finish near the top; it even has a chance to finish #1, because it is not as big a target as Iron and McRave. And there are old standbys like Bereaver and like Killerbot by Marian Devecka.

Tscmoo random is being allowed to play, which I did not expect. As a random bot, it won’t be allowed into the elimination phase, but it will compete in the round robin. It was last updated in October and will probably finish out of the top ranks. But it would have had a good chance to make it into the elimination bracket if that were allowed by the rules.

CherryPi is a hard case to figure out. This version of CherryPi follows what is at heart a simple game plan: It wants to win with masses of zerglings (sometimes it falls back on winning with masses of hydralisks). It seems to have some ability to vary its plan by massing early, or saving up and massing more later. A casual watch of its games gives me the impression that it may be learning how to beat its opponents by varying its timing, but without a close study I’m not sure; maybe it’s something else. In any case, CherryPi’s initial results are mixed, with losses to weaker bots and wins against stronger ones. And in a single round robin, CherryPi won’t have any additional games to learn about opponents; it will have to rely on what it already knows.

Steamhammer I expect to finish out of the top 3 but likely between ranks 4 and 8. It should beat lower-ranked bots consistently and its only sure losses are against Iron and McRave, though of course it will lose some other games too. Microwave will probably win, but not definitely. I give Steamhammer a 75% chance against former nemesis CasiaBot because I hand-coded in a counter to CasiaBot’s hand-coded Steamhammer-countering opening. Steamhammer’s other nemesis TyrProtoss is not participating. The last couple games against Bereaver give the impression that Steamhammer can now go toe-to-toe against the protoss bot in the middle game, which it could not do before (Steamhammer won early or not at all). So the game against Bereaver is likely also a win.

AIIDE 2017 unattributed crashes

In AIIDE 2017, the tournament manager launched some games that did not start. These games were recorded with duration 0 and score 0 for both sides, and were ignored in the official tally. In the detailed results HTML page, the games are listed as crashes with the crashed player being “unknown”. I think of these games as unattributed crashes: If one bot identifiably crashed, then that bot lost the game. But some games failed without either bot crashing in a way that the tournament manager recognized and attributed to the bot, and those games had to be skipped.

And yet, looking at how often bots appeared in “unknown” crash games, there is one obvious conclusion. The % column here is the percentage of unattributed crash games that the bot participated in. Each unattributed crash game has 2 participants, so the percentages add up to 200% before rounding (even though the column total says 100%).

bot	crashes	%
ZZZKBot	4	2.20%
PurpleWave	7	3.85%
Iron	5	2.75%
cpac	7	3.85%
Microwave	8	4.40%
CherryPi	4	2.20%
McRave	6	3.30%
Arrakhammer	7	3.85%
Tyr	4	2.20%
Steamhammer	6	3.30%
AILien	4	2.20%
LetaBot	15	8.24%
Ximp	8	4.40%
UAlbertaBot	2	1.10%
Aiur	5	2.75%
IceBot	15	8.24%
Skynet	12	6.59%
KillAll	5	2.75%
MegaBot	168	92.31%
Xelnaga	8	4.40%
Overkill	12	6.59%
Juno	8	4.40%
GarmBot	9	4.95%
Myscbot	6	3.30%
HannesBredberg	6	3.30%
Sling	7	3.85%
ForceBot	10	5.49%
Ziabot	6	3.30%
total	182	100%

With these numbers in hand, the great majority of unattributed crashes can be attributed after the fact to MegaBot. MegaBot may have a bug that sometimes breaks the tournament infrastructure. Or the bug may be in the infrastructure itself, and MegaBot happens to tickle it, as other bots do too, though less often.
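The arithmetic behind the table can be reproduced with a short Python sketch. The counts are copied from the table; the variable and function names are mine. It relies on the point made above that every unattributed crash game is counted twice, once for each participant.

```python
# Appearances in unattributed crash games, copied from the table.
crashes = {
    "ZZZKBot": 4, "PurpleWave": 7, "Iron": 5, "cpac": 7, "Microwave": 8,
    "CherryPi": 4, "McRave": 6, "Arrakhammer": 7, "Tyr": 4, "Steamhammer": 6,
    "AILien": 4, "LetaBot": 15, "Ximp": 8, "UAlbertaBot": 2, "Aiur": 5,
    "IceBot": 15, "Skynet": 12, "KillAll": 5, "MegaBot": 168, "Xelnaga": 8,
    "Overkill": 12, "Juno": 8, "GarmBot": 9, "Myscbot": 6,
    "HannesBredberg": 6, "Sling": 7, "ForceBot": 10, "Ziabot": 6,
}

appearances = sum(crashes.values())  # each game appears twice, once per bot
games = appearances // 2             # number of unattributed crash games


def share(bot: str) -> float:
    """Percentage of unattributed crash games that the bot took part in."""
    return 100.0 * crashes[bot] / games


total_share = sum(share(bot) for bot in crashes)
games_without_megabot = games - crashes["MegaBot"]
print(games, round(total_share), round(share("MegaBot"), 2), games_without_megabot)
```

By this count there are 182 games, the shares sum to 200%, and only 14 of the games did not involve MegaBot.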

As a side effect, MegaBot’s official score could be considered too high. If we see the unattributed crashes with MegaBot as “MegaBot’s fault,” then the games should not be skipped in the results, but counted as wins for the opponent and losses for MegaBot. The change would be unfair, though: Even if the bug is in MegaBot, which we do not know, surely not all of the unattributed crashes are due to MegaBot. Other bots or the infrastructure must be responsible for some.

Running a big tournament is hard....

AIIDE 2017 AILien does not learn

AILien’s learning turned out to be non-existent in the AIIDE 2017 version. It writes files but does not read them back in.

Here is AILien’s recorded data from AIIDE 2017 for the opponent XIMP.

lingScore=24.4783
hydraScore=-4762.48
lurkerScore=0
mutaScore=-904.381
ultraScore=-1757.08
guardScore=0
macroHeavyness=0
supplyISawAir=149
strategy=17
victory=false

The first six lines are information about what units to make, as updated during the game. macroHeavyness and supplyISawAir are updated but I didn’t find any sign that they are used for anything; I think they were used in a former version. strategy likely used to be an index into the unused GameCommander::strategyMap, but no matter what it originally was, now it is a constant. And victory is the result of the most recent game. As I said at the start, the information is written out but is never read back in.
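For illustration, reading such a file back in takes only a few lines. This is a hedged Python sketch, not AILien’s code (AILien does not read the file at all); the function name parse_record is hypothetical, and the only thing assumed about the format is the key=value layout shown above.

```python
def parse_record(text: str) -> dict:
    """Parse 'key=value' lines into a dict, converting numbers and booleans."""
    record = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        if value in ("true", "false"):
            record[key] = (value == "true")
        else:
            try:
                record[key] = float(value)
            except ValueError:
                record[key] = value  # keep anything unrecognized as text
    return record


ximp_data = """lingScore=24.4783
hydraScore=-4762.48
lurkerScore=0
mutaScore=-904.381
ultraScore=-1757.08
guardScore=0
macroHeavyness=0
supplyISawAir=149
strategy=17
victory=false"""

record = parse_record(ximp_data)
unit_scores = {k: v for k, v in record.items() if k.endswith("Score")}
highest = max(unit_scores, key=unit_scores.get)
print(len(unit_scores), highest, record["victory"])
```

The sketch only recovers the numbers. What a bot should do with them, for example whether a higher score should mean making more of that unit, is a separate design decision.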

Next: Those tscmoo games I promised last month.

AIIDE 2017 what AIUR learned

Here is what AIUR learned about each opponent over the course of the tournament. I did this mostly because it’s easy; I already had the script from last year. But it’s also informative—AIUR’s reactions tell us how each bot played, and may tell bot authors what they need to work on.

The data is generated from files in AIUR’s final read directory. AIUR recorded 111 games against some opponents even though the tournament officially ran for 110 rounds; that is presumably because the tournament did run longer but was cut back to a multiple of 10 rounds for fairness (since there are 10 maps). On the other hand, AIUR’s total game count according to itself is 2938 and according to the tournament results is 2965, so it may have been unable to record some games (it is listed with 53 crashes, so that’s not a surprise).

First, an overall view, totalling the data for all opponents. We can see that all 6 of AIUR’s strategies (“moods” it calls them) were widely valuable: Every strategy has a win rate over 50% on some map size. AIUR’s overall win rate in the tournament was 50.46%.

overall	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	159	55%	59	37%	161	44%	379	47%
rush	134	66%	87	55%	185	50%	406	56%
aggressive	107	56%	108	43%	155	30%	370	41%
fast expo	69	45%	84	33%	197	51%	350	46%
macro	46	28%	69	52%	211	37%	326	39%
defensive	352	60%	185	58%	570	55%	1107	57%
total	867	57%	592	49%	1479	48%	2938	50%

2, 3, 4 - map size, the number of starting positions
n - games recorded
wins - winning percentage over those games
cheese - cannon rush
rush - dark templar rush
aggressive - fast 4 zealot drop
fast expo - nexus first
macro - aim for a strong middle game army
defensive - be safe against rushes (not entirely successful)
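A note on reading the “total” column: it is weighted by the number of games, not a plain average of the three map-size percentages. Here is a minimal Python sketch using the defensive row of the overall table. The function name is mine, and because the table’s percentages are rounded, the result is only approximate.

```python
# (games, win percentage) per map size for the "defensive" row of the overall table.
defensive = {2: (352, 60), 3: (185, 58), 4: (570, 55)}


def pooled_win_rate(row: dict) -> float:
    """Game-weighted win percentage across map sizes."""
    games = sum(n for n, _ in row.values())
    if games == 0:
        return 0.0  # strategy never played
    weighted = sum(n * pct for n, pct in row.values())
    return weighted / games


plain_average = sum(pct for _, pct in defensive.values()) / len(defensive)
print(round(pooled_win_rate(defensive)), round(plain_average))  # 57 58
```

The weighted figure matches the 57% in the table, while the unweighted average comes out a point higher because the 4 player maps, where the strategy did worst, have the most games.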

#1 zzzkbot	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	16	12%	1	0%	4	0%	21	10%
rush	5	0%	1	0%	1	0%	7	0%
aggressive	3	0%	1	0%	5	0%	9	0%
fast expo	4	0%	1	0%	5	0%	10	0%
macro	3	0%	2	0%	3	0%	8	0%
defensive	3	0%	16	31%	37	24%	56	25%
total	34	6%	22	23%	55	16%	111	14%

AIUR struggled against the tournament leader but was not entirely helpless. Its cannon rush had a chance on 2 player maps and its anti-rush strategy on the others. We see how AIUR gains by taking the map size into account.

#2 purplewave	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	1	0%	1	0%	2	0%	4	0%
rush	28	79%	3	33%	40	55%	71	63%
aggressive	1	0%	3	33%	1	0%	5	20%
fast expo	1	0%	11	36%	10	60%	22	45%
macro	1	0%	2	0%	1	0%	4	0%
defensive	1	0%	1	0%	1	0%	3	0%
total	33	67%	21	29%	55	51%	109	51%

AIUR upset #2 PurpleWave, a surprising outcome. The DT rush and the fast expand were both somewhat successful—rather unrelated strategies.

#3 iron	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	5	0%	1	0%	7	0%	13	0%
rush	5	0%	2	0%	7	0%	14	0%
aggressive	3	0%	2	0%	12	0%	17	0%
fast expo	8	0%	14	7%	9	0%	31	3%
macro	6	0%	1	0%	10	0%	17	0%
defensive	5	0%	2	0%	10	0%	17	0%
total	32	0%	22	5%	55	0%	109	1%

Learning can’t help if nothing you try wins....

#4 cpac	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	1	0%	1	0%	1	0%	3	0%
rush	4	0%	0	0%	2	0%	6	0%
aggressive	2	0%	1	0%	1	0%	4	0%
fast expo	1	0%	1	0%	1	0%	3	0%
macro	2	0%	3	33%	2	0%	7	14%
defensive	24	38%	16	69%	48	50%	88	50%
total	34	26%	22	55%	55	44%	111	41%

Cpac was configured to play 5 pool against AIUR. It worked, but AIUR was able to compensate to an extent by playing its anti-rush build.

#5 microwave	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	2	0%	2	0%	4	0%	8	0%
rush	1	0%	1	0%	4	0%	6	0%
aggressive	20	20%	15	13%	11	0%	46	13%
fast expo	1	0%	2	0%	6	0%	9	0%
macro	1	0%	1	0%	4	0%	6	0%
defensive	1	0%	1	0%	26	12%	28	11%
total	26	15%	22	9%	55	5%	103	9%

Microwave was successful but showed a little vulnerability to surprise zealots dropped in its main. I suspect it’s a tactical reaction issue.

#6 cherrypi	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	1	0%	1	0%	1	0%	3	0%
rush	1	0%	1	0%	1	0%	3	0%
aggressive	2	0%	2	0%	1	0%	5	0%
fast expo	2	0%	1	0%	1	0%	4	0%
macro	2	0%	1	0%	9	11%	12	8%
defensive	26	4%	16	12%	42	12%	84	10%
total	34	3%	22	9%	55	11%	111	8%

#7 mcrave	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	26	100%	5	60%	45	62%	76	75%
rush	3	67%	9	67%	4	50%	16	62%
aggressive	1	0%	4	50%	1	0%	6	33%
fast expo	1	0%	2	50%	2	50%	5	40%
macro	1	0%	1	0%	1	0%	3	0%
defensive	1	0%	1	0%	2	0%	4	0%
total	33	85%	22	55%	55	56%	110	65%

AIUR upset McRave with its cannon rush, and the dark templar rush did well too. AIUR executes the best cannon rush of any bot, in my opinion. It is a sign that McRave’s play was not robust enough against tricks.

#8 arrakhammer	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	2	0%	2	0%	3	0%	7	0%
rush	1	0%	1	0%	4	0%	6	0%
aggressive	1	0%	5	60%	3	0%	9	33%
fast expo	1	0%	1	0%	2	0%	4	0%
macro	0	0%	12	50%	38	37%	50	40%
defensive	29	66%	1	0%	4	25%	34	59%
total	34	56%	22	41%	54	28%	110	39%

#9 tyr	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	6	67%	1	0%	1	0%	8	50%
rush	20	100%	1	0%	2	0%	23	87%
aggressive	3	33%	10	20%	1	0%	14	21%
fast expo	1	0%	7	29%	49	35%	57	33%
macro	1	0%	1	0%	1	0%	3	0%
defensive	2	50%	2	0%	1	0%	5	20%
total	33	79%	22	18%	55	31%	110	43%

The DT rush won 100% of the time on 2 player maps and was tried only a few times on larger maps, losing. Was it only unlucky on the 3 and 4 player maps, or is there a real difference? With only 3 games total, we can’t tell from the numbers. It is a weakness of AIUR’s learning: It’s slow because there is so much to learn. The flip side of the slowness is that, over a long tournament, it learns a lot.

#10 steamhammer	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	2	0%	1	0%	1	0%	4	0%
rush	2	50%	1	0%	2	0%	5	20%
aggressive	1	0%	1	0%	1	0%	3	0%
fast expo	1	0%	1	0%	1	0%	3	0%
macro	0	0%	1	0%	1	0%	2	0%
defensive	27	81%	17	88%	49	67%	93	75%
total	33	70%	22	68%	55	60%	110	65%

I was surprised to see Steamhammer upset by AIUR. I had thought that AIUR was a solved problem. On SSCAIT too, Steamhammer started to show losses against AIUR in September for the first time in months. I may have introduced a weakness in some recent version and AIUR’s learning took that long to find it on SSCAIT. In AIIDE, the tournament was easily long enough.

#11 ailien	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	1	0%	1	0%	1	0%	3	0%
rush	3	0%	1	0%	2	0%	6	0%
aggressive	1	0%	2	0%	1	0%	4	0%
fast expo	1	0%	2	50%	0	0%	3	33%
macro	4	50%	8	75%	1	0%	13	62%
defensive	24	58%	8	88%	49	37%	81	48%
total	34	47%	22	64%	54	33%	110	44%

#12 letabot	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	7	43%	1	0%	2	0%	10	30%
rush	3	33%	13	54%	43	40%	59	42%
aggressive	5	40%	1	0%	1	0%	7	29%
fast expo	13	46%	3	33%	1	0%	17	41%
macro	1	0%	1	0%	6	33%	8	25%
defensive	1	0%	3	33%	1	0%	5	20%
total	30	40%	22	41%	54	35%	106	38%

I suspect that fast expo was the best strategy on 4 player maps, but how was AIUR to know? A weakness of AIUR’s epsilon-greedy learning, compared to UCB, is that it doesn’t realize that a less-explored option is more likely to be misevaluated.
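To make the comparison concrete, here is a small Python sketch that applies the standard UCB1 formula to the 4 player map numbers above. It is not AIUR’s code. The win counts are reconstructed from the rounded percentages in the table, so they are approximate, and the function names are my own.

```python
import math

# AIUR versus LetaBot on 4 player maps: strategy -> (games, wins).
# Wins are reconstructed from rounded percentages, so treat them as approximate.
record = {
    "cheese": (2, 0),
    "rush": (43, 17),
    "aggressive": (1, 0),
    "fast expo": (1, 0),
    "macro": (6, 2),
    "defensive": (1, 0),
}


def greedy_choice(record):
    """The exploit step of epsilon-greedy: pick the best observed win rate."""
    return max(record, key=lambda s: record[s][1] / record[s][0])


def ucb1_scores(record):
    """UCB1: observed win rate plus a bonus that shrinks as games accumulate.

    Assumes every strategy has been played at least once.
    """
    total = sum(games for games, _ in record.values())
    return {
        s: wins / games + math.sqrt(2.0 * math.log(total) / games)
        for s, (games, wins) in record.items()
    }


scores = ucb1_scores(record)
ucb_choice = max(scores, key=scores.get)
print(greedy_choice(record), ucb_choice)
```

The greedy step keeps choosing the DT rush on the strength of its 40%, while UCB1 gives the largest score to the strategies with a single recorded game, fast expo among them, because one loss is weak evidence. Epsilon-greedy does revisit those strategies, but only at random, without regard to how little it knows about them.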

#13 ximp	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	34	35%	0	0%	1	0%	35	34%
rush	0	0%	0	0%	1	0%	1	0%
aggressive	0	0%	13	8%	52	2%	65	3%
fast expo	0	0%	9	0%	0	0%	9	0%
macro	0	0%	0	0%	1	0%	1	0%
defensive	0	0%	0	0%	0	0%	0	0%
total	34	35%	22	5%	55	2%	111	13%

#14 ualbertabot	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	0	0%	0	0%	1	100%	1	100%
rush	0	0%	0	0%	0	0%	0	0%
aggressive	0	0%	0	0%	1	100%	1	100%
fast expo	0	0%	0	0%	0	0%	0	0%
macro	0	0%	0	0%	0	0%	0	0%
defensive	34	32%	21	5%	52	27%	107	24%
total	34	32%	21	5%	54	30%	109	26%

What’s up with all those zeroes? AIUR is coded to try each strategy once before it starts making decisions, and that did not happen here. It turns out that AIUR has pre-learned data for Skynet, XIMP, and UAlbertaBot, so its learning in those cases looks different.

#16 icebot	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	1	0%	2	0%	1	0%	4	0%
rush	1	0%	2	50%	3	33%	6	33%
aggressive	3	100%	3	67%	4	50%	10	70%
fast expo	14	100%	3	67%	44	93%	61	93%
macro	4	75%	2	50%	1	0%	7	57%
defensive	9	89%	10	80%	2	50%	21	81%
total	32	88%	22	64%	55	82%	109	80%

#17 skynet	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	13	92%	0	0%	0	0%	13	92%
rush	21	95%	21	90%	51	88%	93	90%
aggressive	0	0%	0	0%	0	0%	0	0%
fast expo	0	0%	1	100%	0	0%	1	100%
macro	0	0%	0	0%	0	0%	0	0%
defensive	0	0%	0	0%	4	50%	4	50%
total	34	94%	22	91%	55	85%	111	89%

#18 killall	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	1	0%	3	0%	1	0%	5	0%
rush	1	0%	2	0%	1	0%	4	0%
aggressive	1	0%	2	0%	1	0%	4	0%
fast expo	1	0%	3	0%	1	0%	5	0%
macro	0	0%	2	0%	2	50%	4	25%
defensive	30	80%	10	70%	49	76%	89	76%
total	34	71%	22	32%	55	69%	111	62%

#19 megabot	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	3	67%	1	0%	2	0%	6	33%
rush	2	0%	14	36%	5	0%	21	24%
aggressive	6	67%	4	25%	4	0%	14	36%
fast expo	2	50%	1	0%	4	0%	7	14%
macro	1	0%	1	0%	36	25%	38	24%
defensive	17	76%	1	0%	2	0%	20	65%
total	31	65%	22	27%	53	17%	106	33%

#20 xelnaga	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	9	100%	6	83%	1	0%	16	88%
rush	19	100%	4	75%	1	0%	24	92%
aggressive	1	0%	3	33%	1	0%	5	20%
fast expo	1	0%	4	75%	1	0%	6	50%
macro	2	0%	2	50%	50	36%	54	35%
defensive	2	50%	3	67%	1	0%	6	50%
total	34	85%	22	68%	55	33%	111	56%

Against Xelnaga, AIUR found solutions on 2 and 3 player maps but not on 4 player maps. Is it another case of underexploration?

#21 overkill	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	1	0%	1	0%	3	67%	5	40%
rush	2	50%	0	0%	0	0%	2	50%
aggressive	8	100%	4	100%	7	86%	19	95%
fast expo	3	67%	3	100%	7	100%	13	92%
macro	4	75%	3	67%	12	92%	19	84%
defensive	14	93%	11	100%	26	96%	51	96%
total	32	84%	22	91%	55	93%	109	90%

#22 juno	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	5	0%	14	36%	33	15%	52	19%
rush	3	0%	1	0%	1	0%	5	0%
aggressive	2	0%	1	0%	2	0%	5	0%
fast expo	2	0%	1	0%	16	12%	19	11%
macro	1	0%	1	0%	1	0%	3	0%
defensive	19	21%	4	25%	2	0%	25	20%
total	32	12%	22	27%	55	13%	109	16%

Juno’s cannon contain upset AIUR. Learning didn’t help much, because the problem wasn’t in any of the strategies, it was in AIUR’s poor reactions to cannons appearing in front of its base. It is amusing to watch 2 bots cannon each other; sometimes both get cannons up.

#23 garmbot	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	1	0%	1	0%	1	0%	3	0%
rush	2	50%	1	0%	0	0%	3	33%
aggressive	17	94%	17	100%	3	67%	37	95%
fast expo	0	0%	1	0%	23	83%	24	79%
macro	0	0%	1	0%	1	0%	2	0%
defensive	5	80%	1	0%	27	81%	33	79%
total	25	84%	22	77%	55	78%	102	79%

#24 myscbot	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	1	0%	1	0%	2	50%	4	25%
rush	2	0%	3	67%	2	50%	7	43%
aggressive	3	33%	2	100%	9	78%	14	71%
fast expo	1	0%	2	50%	1	0%	4	25%
macro	4	50%	4	100%	3	67%	11	73%
defensive	23	61%	10	100%	38	79%	71	76%
total	34	50%	22	86%	55	75%	111	69%

#25 hannesbredberg	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	5	80%	3	100%	3	67%	11	82%
rush	2	50%	3	100%	2	50%	7	71%
aggressive	2	50%	2	50%	2	0%	6	33%
fast expo	8	100%	3	100%	9	89%	20	95%
macro	2	50%	4	100%	11	91%	17	88%
defensive	15	100%	7	100%	28	100%	50	100%
total	34	88%	22	95%	55	89%	111	90%

#26 sling	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	2	50%	1	0%	3	33%	6	33%
rush	2	50%	0	0%	1	0%	3	33%
aggressive	12	100%	0	0%	23	96%	35	97%
fast expo	1	0%	5	100%	1	0%	7	71%
macro	3	67%	5	80%	12	75%	20	75%
defensive	5	80%	11	100%	15	80%	31	87%
total	25	80%	22	91%	55	80%	102	82%

Here is another possible case of insufficient exploration. The 4 zealot drop won 100% of the time on 2 player maps and 96% of the time on 4 player maps, but was never tried on 3 player maps (I guess due to a crash, since AIUR tries to play each strategy once). It’s not a severe problem, though, because 3 player maps did have 2 strategies that scored 100%.
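Gaps like this one can be found mechanically. The following Python sketch scans the Sling table for strategies that were never tried on one map size but scored well on another. The threshold of 90% and the minimum of 5 games are arbitrary values I picked for illustration, and the function name is hypothetical.

```python
# AIUR versus Sling: strategy -> {map size: (games, win percentage)}.
sling = {
    "cheese": {2: (2, 50), 3: (1, 0), 4: (3, 33)},
    "rush": {2: (2, 50), 3: (0, 0), 4: (1, 0)},
    "aggressive": {2: (12, 100), 3: (0, 0), 4: (23, 96)},
    "fast expo": {2: (1, 0), 3: (5, 100), 4: (1, 0)},
    "macro": {2: (3, 67), 3: (5, 80), 4: (12, 75)},
    "defensive": {2: (5, 80), 3: (11, 100), 4: (15, 80)},
}


def untried_but_promising(table, threshold=90, min_games=5):
    """Return (strategy, map size) cells with 0 games where the same strategy
    won at least `threshold` percent over `min_games` or more games elsewhere."""
    found = []
    for strategy, cells in table.items():
        strong = any(n >= min_games and pct >= threshold for n, pct in cells.values())
        for size, (n, _) in cells.items():
            if n == 0 and strong:
                found.append((strategy, size))
    return found


print(untried_but_promising(sling))  # [('aggressive', 3)]
```

The DT rush was also untried on 3 player maps, but it is not flagged because it has no strong record on the other map sizes.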

#27 forcebot	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	1	0%	1	0%	1	0%	3	0%
rush	0	0%	1	0%	1	0%	2	0%
aggressive	3	67%	2	0%	1	0%	6	33%
fast expo	0	0%	1	0%	1	0%	2	0%
macro	0	0%	9	78%	3	67%	12	75%
defensive	29	100%	8	75%	48	94%	85	94%
total	33	94%	22	59%	55	85%	110	83%

#28 ziabot	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	12	100%	7	86%	36	86%	55	89%
rush	1	0%	1	100%	4	75%	6	67%
aggressive	6	100%	8	88%	6	83%	20	90%
fast expo	1	0%	1	0%	2	0%	4	0%
macro	3	0%	1	0%	1	0%	5	0%
defensive	6	67%	4	75%	6	83%	16	75%
total	29	76%	22	77%	55	80%	106	78%

Next: AILien’s learning.