
a few preliminary Elo charts

The SSCAIT data includes 103 bots, and 3 of them have 10 or fewer games, leaving exactly 100 with useful rating curves. I’ve crunched and formatted the data, and now all I have to do is draw it. I hope to create a humongalicious zoomable graph of daily rating data for all 100 bots, if I can find a way to draw that many lines on a graph in a way that’s usable. Well, I’ll think of something. I chose powerful graphing software that’s fully capable of doing the job, but it’s complicated, and my skill and patience may be less than fully capable…

Anyway, another appetizer. Here are static rating graphs for 2016 for the top 3 CIG finishers, all of which had many updates this year. The graphs run from 1 January 2016 to 27 September 2016. The authors may be interested in comparing their updates with movements in their graph. Krasi0 shows steady improvement since April, while the other two look more irregular.

graph of Krasi0’s rating in 2016

graph of Iron’s rating in 2016

graph of Tscmoo terran’s rating in 2016

ZerGreenBot does zealot bombs

A quick note: The current version of ZerGreenBot has given up on reaver drops (at least for now), but it does know how to bomb tanks with zealots. I think it’s the first bot with that skill.

Bereaver

Wow, did you see the new protoss bot Bereaver? It was uploaded today at SSCAIT. In its first game it failed to start (oops). But in its second game it played strikingly well against Krasi0, breaking Krasi0’s bunker in the early game even while expanding and teching; Krasi0 defended well and held, of course, because that’s what Krasi0 does. Then Bereaver put up a fierce fight in the middle game with reavers and high templar, repeatedly wearing down and breaking Krasi0’s pushes until Bereaver ran out of resources, unable to expand beyond its third in the face of terran map control.

Somebody with game-scheduling power must have seen the game too, because games against other top bots came up right away. Bereaver lost more than it won, but it did defeat IceBot despite misplacing its natural nexus and being unable to take a third, since it didn’t know how to clear the spider mine blocking the expansion spot.

I was also impressed by the game against Andrew Smith’s Skynet. Bereaver apparently diagnosed Skynet’s dark templar rush and prepared for it, cannoning its ramp and holding easily. The cannons were misplaced, though, and penned Bereaver’s own dragoons and reavers inside, a fatal blunder; still, the basic skill is there.

It’s a great start for a new bot. Many rough edges are in plain sight, which means that improvements should come easily!

Elo rating table

Here’s a table that explains what Elo ratings mean. To find out the chance that one bot will beat another, subtract their Elo ratings and look up the difference in the table. Iron is rated 2081 and Wulibot is rated 1871. The difference is 210—look it up in the table!

The probability estimate is not perfect, but it is good on average.
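
The table appears to follow the standard logistic Elo expectation, so you can also compute the probability directly. A minimal sketch (my reconstruction, not the author’s code):

```python
def win_probability(rating_diff):
    """Expected score for the higher-rated player under the standard
    Elo logistic model: 1 / (1 + 10^(-diff/400))."""
    return 1 / (1 + 10 ** (-rating_diff / 400))

# Iron (2081) vs Wulibot (1871): a difference of 210
print(round(100 * win_probability(2081 - 1871)))  # 77, matching the table
```

The formula reproduces the table throughout: a difference of 0 gives 50%, and 400 gives 91%.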

rating diff  win %    rating diff  win %    rating diff  win %    rating diff  win %
      0       50%         200       76%         400       91%         600       97%
     10       51%         210       77%         410       91%         610       97%
     20       53%         220       78%         420       92%         620       97%
     30       54%         230       79%         430       92%         630       97%
     40       56%         240       80%         440       93%         640       98%
     50       57%         250       81%         450       93%         650       98%
     60       59%         260       82%         460       93%         660       98%
     70       60%         270       83%         470       94%         670       98%
     80       61%         280       83%         480       94%         680       98%
     90       63%         290       84%         490       94%         690       98%
    100       64%         300       85%         500       95%         700       98%
    110       65%         310       86%         510       95%         710       98%
    120       67%         320       86%         520       95%         720       98%
    130       68%         330       87%         530       95%         730       99%
    140       69%         340       88%         540       96%         740       99%
    150       70%         350       88%         550       96%         750       99%
    160       72%         360       89%         560       96%         760       99%
    170       73%         370       89%         570       96%         770       99%
    180       74%         380       90%         580       97%         780       99%
    190       75%         390       90%         590       97%         790       99%
    200       76%         400       91%         600       97%         800       99%

SSCAIT initial and current Elo ratings

I’m still working on Elo curves over time, but today I have Elo ratings for each bot in the SSCAIT data at the beginning and end of its career. Here is yesterday’s table plus the new info, now sorted by decreasing current rating—the bot’s real strength yesterday as best we can measure. The topmost ratings are, to my surprise, exactly in the order I expected!

To make the ratings easier to interpret, I added two columns labeled “expect”. Each gives the bot’s expected winning rate against an average opponent. The rating system is designed so that the average Elo rating holds constant at 1500, and it’s easy to compute the expected winning rate against an opponent rated 1500. The constant average rating, by the way, means that a bot which stays the same can see its rating decline over time if its opponents improve.

Ratings are not accurate for bots with a very small number of games. I plan to exclude those bots from the curves over time.
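
The “expect” columns look to be the standard Elo expectation against a 1500-rated opponent. A quick sketch (my reconstruction, not the author’s code) reproduces the table’s numbers:

```python
def expect_vs_average(rating, average=1500):
    """Expected winning rate (in %) against an opponent at the
    league-average Elo rating, under the standard logistic model."""
    return 100 / (1 + 10 ** ((average - rating) / 400))

# krasi0: current Elo 2163, initial Elo 1593
print(round(expect_vs_average(2163), 2))  # 97.85, matching "current expect"
print(round(expect_vs_average(1593), 2))  # 63.07, matching "initial expect"
```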

bot | win % | initial Elo | initial expect | current Elo | current expect | games | earliest | latest
krasi068.77%159363.07%216397.85%21422015 Nov 302016 Sep 27
Iron bot77.74%158061.31%208196.59%19992015 Nov 272016 Sep 26
Marian Devecka58.66%179084.15%206596.28%62892013 Dec 252016 Sep 27
Martin Rooijackers68.50%184087.62%201194.99%72902014 Jul 282016 Sep 27
tscmooz79.80%182386.52%199194.41%50062015 Feb 272016 Sep 27
tscmoo72.06%183887.50%197894.00%57192015 Jan 222016 Sep 27
LetaBot CIG 201675.68%174880.65%193292.32%4442016 Aug 012016 Sep 27
WuliBot72.76%177382.80%187189.43%9842016 Apr 192016 Sep 26
Simon Prins55.48%151351.87%186789.21%54312015 Jan 252016 Sep 27
ICELab81.12%218998.14%186589.10%83442013 Dec 252016 Sep 27
FlashTest69.44%174480.29%186388.99%2162016 Mar 222016 Jul 27
Sijia Xu71.65%185088.23%184988.17%23282015 Oct 102016 Sep 27
LetaBot SSCAI 2015 Final65.87%171077.01%181385.84%4162016 Aug 042016 Sep 27
Dave Churchill75.48%198594.22%180485.19%82752013 Dec 252016 Sep 27
Chris Coxe73.10%175481.19%180084.90%22012015 Sep 032016 Sep 27
Tomas Vajda79.37%216997.92%179084.15%83722013 Dec 252016 Sep 27
Flash65.69%145843.98%177783.13%9912016 Apr 182016 Sep 27
LetaBot IM noMCTS60.93%164569.73%176682.22%12262016 May 182016 Aug 01
Zia bot52.24%156859.66%175781.45%5362016 Jul 072016 Sep 27
A Jarocki62.77%171177.11%174180.02%9322015 Oct 042016 Jan 26
PeregrineBot57.29%169275.12%172878.79%12762016 Feb 092016 Sep 10
tscmoop78.16%189590.67%172178.11%19922015 Nov 112016 Sep 26
Andrew Smith65.00%170576.50%171877.81%83912013 Dec 252016 Sep 27
Florian Richoux62.11%177082.55%171677.62%82032013 Dec 252016 Sep 27
Carsten Nielsen66.08%170876.81%169575.45%47112015 Mar 172016 Sep 27
Soeren Klett63.62%206896.34%168774.58%82772013 Dec 252016 Sep 27
Vaclav Horazny37.35%10667.60%168674.47%64552013 Dec 252015 Nov 18
La Nuee51.61%149949.86%166271.76%5582015 Dec 132016 Mar 18
Jakub Trancik45.08%175581.27%165771.17%84162013 Dec 252016 Sep 27
Marek Suppa51.85%174680.47%165570.94%44132015 Jan 052016 Mar 18
Krasimir Krystev70.52%203395.56%165370.70%65102013 Dec 252016 Mar 10
ASPbot201149.78%167172.80%165270.58%2272015 Jan 292016 Feb 25
Marcin Bartnicki60.42%185588.53%163368.26%14352014 Nov 282016 Mar 18
Tomas Cere61.11%188890.32%163168.01%83732013 Dec 252016 Sep 27
MegaBot49.40%157660.77%163067.88%4192016 Aug 012016 Sep 27
Aurelien Lermant58.26%168874.69%162266.87%36872015 Jun 222016 Sep 27
Matej Kravjar49.57%172378.31%161966.49%32342013 Dec 252015 Feb 18
Daniel Blackburn43.79%165170.46%160564.67%68832013 Dec 252016 Jan 26
Gabriel Synnaeve45.96%173779.65%158461.86%16582013 Dec 252015 Nov 24
David Milec49.09%155257.43%156659.39%552015 Jan 132015 Jan 20
Odin201455.65%165971.41%156559.25%56482014 Dec 212016 Sep 11
Gaoyuan Chen48.05%158261.59%155958.41%51182015 Feb 102016 Sep 27
Henri Kumpulainen38.81%144742.43%155357.57%8942016 Jan 132016 May 31
Martin Dekar33.14%142939.92%153354.73%49102013 Dec 252016 Jan 25
Serega48.20%177182.64%150550.72%38032015 Jan 312016 Jan 26
Chris Ayers35.53%161065.32%148147.27%15202015 Aug 102016 Jan 26
Nathan a David39.34%144642.29%148147.27%10042016 Feb 232016 Aug 08
DAIDOES34.02%137032.12%147145.84%4852016 Jun 132016 Sep 08
FlashZerg0.00%147446.27%145944.13%72016 Apr 242016 May 12
Igor Lacik39.32%160865.06%145443.42%80732013 Dec 252016 Sep 08
Matej Istenik44.74%170976.91%144942.71%82972013 Dec 252016 Sep 27
EradicatumXVR40.88%153755.30%144341.87%46872013 Dec 252016 Jan 23
Ibrahim Awwal30.57%151051.44%143741.03%5302013 Dec 252014 Mar 24
Tomasz Michalski27.02%131425.53%143240.34%4332015 Dec 222016 Mar 18
Oleg Ostroumov48.75%171477.41%143140.20%36412013 Dec 252016 Jan 26
NUS Bot35.72%148247.41%142639.51%33372015 May 192016 Sep 06
Martin Pinter28.98%140937.20%142539.37%37402013 Dec 252015 Dec 11
Roman Danielis45.63%168874.69%141738.28%51552013 Dec 252016 Sep 26
ZerGreenBot22.22%140436.53%141638.14%362016 Sep 222016 Sep 27
Rafael Bocquet0.00%145042.85%141538.01%102015 Jun 232015 Jun 26
Flashrelease0.00%144942.71%141337.73%82016 Apr 242016 Apr 24
Marek Kadek37.29%155758.13%141337.73%76412013 Dec 252016 May 22
Ian Nicholas DaCosta37.12%139435.20%140436.53%29282015 Apr 272016 Sep 08
AwesomeBot29.81%132626.86%140336.39%4732016 Jun 162016 Sep 08
Radim Bobek23.37%131525.64%139034.68%11512015 Oct 012016 Mar 06
Adrian Sternmuller26.89%143640.89%137532.75%45292013 Dec 252016 Jul 22
Martin Strapko19.76%138834.42%136631.62%33862013 Dec 252016 Jan 26
Maja Nemsilajova23.81%136531.49%136331.25%42462013 Dec 252015 Nov 29
Johan Kayser24.46%129423.40%136131.00%4132016 Jul 292016 Sep 27
UPStarcraftAI24.75%134629.18%136030.88%6102015 Dec 242016 Apr 13
Martin Vlcak28.92%137032.12%135330.02%12242016 Feb 162016 Sep 07
Johannes Holzfuss35.04%153154.45%135129.78%6852016 Mar 052016 Jun 15
Vojtech Jirsa14.14%118614.09%135029.66%27862015 Jan 122015 Sep 05
JompaBot21.99%131625.75%134929.54%10552016 Feb 042016 Aug 13
Rob Bogie31.34%133527.89%134629.18%6512016 May 142016 Sep 06
Christoffer Artmann20.51%128922.89%134428.95%3952016 Aug 072016 Sep 27
Marek Gajdos22.69%125119.26%133127.43%13842016 Jan 302016 Sep 11
Travis Shelton23.59%139034.68%131425.53%12212016 Feb 282016 Sep 06
Peter Dobsa13.25%122717.20%130724.77%30272015 Jan 112015 Oct 02
VeRLab17.06%124118.38%130424.45%8972016 Feb 282016 Aug 01
Andrej Sekac11.76%135930.75%129623.61%682013 Dec 252014 Jan 04
Bjorn P Mattsson22.22%135129.78%129523.50%44422015 Apr 052016 Sep 27
Lukas Sedlacek22.86%134428.95%129323.30%702015 Jan 122015 Jan 20
Sergei Lebedinskij13.30%117813.55%129323.30%10832015 May 282015 Sep 03
Vladimir Jurenka38.45%163568.51%127821.79%61672013 Dec 252016 Sep 27
neverdieTRX20.66%126520.54%127221.21%3342016 Jul 192016 Sep 10
OpprimoBot21.85%132126.30%125619.71%20092015 Nov 182016 Sep 27
Marek Kruzliak14.45%115111.83%125519.62%9342013 Dec 252015 Jan 20
Sungguk Cha18.65%120715.62%125019.17%6972016 Jun 052016 Sep 27
Jacob Knudsen20.53%10838.31%124718.90%12572016 Feb 232016 Sep 10
Ludmila Nemsilajova16.04%113310.79%122817.28%5052013 Dec 252015 Jan 21
Karin Valisova17.68%123818.12%122617.12%11712013 Dec 252016 Jan 26
HoangPhuc15.67%113210.73%120915.77%3002016 Jul 182016 Sep 07
Sebastian Mahr15.06%120515.47%118213.82%12022016 Jan 132016 Aug 08
Jan Pajan14.48%121015.85%117913.61%11192013 Dec 252016 Jan 05
Pablo Garcia Sanchez12.20%112310.25%117413.28%5902015 Dec 242016 Apr 13
Ivana Kellyerova11.47%112910.57%113110.68%16302013 Dec 252015 Apr 01
Lucia Pivackova13.29%11119.63%10908.63%8352013 Dec 252015 Jan 20
Tae Jun Oh4.55%10697.72%10366.47%1542016 Mar 222016 Apr 11
Denis Ivancik10.76%11029.19%10226.00%5022013 Dec 252015 Jan 20
ButcherBoy4.74%9213.45%9704.52%4222016 Jun 212016 Sep 06
Jon W5.06%9203.43%9644.37%7902015 Apr 302015 Jul 09
Matyas Novy6.32%113010.62%8852.82%16932015 Feb 042015 Jul 09

How did I get the initial ratings? I had a cute idea. One of the issues with computing Elo ratings over time is: How do you initialize the ratings? Most systems either start everybody with the same rating, which makes an ugly graph, or use a different and less accurate method to estimate the rating in early games. But in this case I have the whole data set in hand. I set the final rating of every bot to the same rating and computed ratings backwards in time to find an initial rating. Then I threw away everything except the initial rating, and calculated the real ratings forward in time to find the ratings over time and the final ratings. That way every data point is equally good, from beginning to end. I doubt I’m the first to think of it, but it’s a cute idea and I’m pleased.
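
One plausible reading of the backward-then-forward scheme, sketched in Python. The K-factor of 32 and the 1500 baseline are my assumptions, not taken from the post:

```python
from collections import defaultdict

K, START = 32, 1500  # assumed K-factor and baseline, not stated in the post

def elo_update(ra, rb, score_a):
    """One standard Elo update; score_a is 1, 0.5, or 0 for player A."""
    expected_a = 1 / (1 + 10 ** ((rb - ra) / 400))
    delta = K * (score_a - expected_a)
    return ra + delta, rb - delta

def two_pass_ratings(games):
    """games: chronological (bot_a, bot_b, score_a) tuples.
    Backward pass: start everyone equal and run the updates over the
    reversed game list to estimate each bot's initial rating.
    Forward pass: keep only those initial ratings and compute the real
    ratings forward in time."""
    ratings = defaultdict(lambda: START)
    for a, b, s in reversed(games):
        ratings[a], ratings[b] = elo_update(ratings[a], ratings[b], s)
    initial = dict(ratings)

    ratings = defaultdict(lambda: START, initial)
    for a, b, s in games:
        ratings[a], ratings[b] = elo_update(ratings[a], ratings[b], s)
    return initial, dict(ratings)
```

Since each update is zero-sum, the average rating stays pinned at the baseline, which is the property the post relies on.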

Next: I’ll find some sensible way to plot the curves. Stand by!

SSCAIT career records

Krasimir Krastev aka Krasi0 sent me a file of game results from SSCAIT, including 141,163 games recorded between 25 December 2013 and today. (Obviously it doesn’t include all games played today.) He’s particularly interested in the evolution of Elo ratings over time and my colorful crosstables per map.

It may take me a while to get to that stuff. Here’s a down payment. First, the career record of the 103 bots in the data, with win rates and dates. The top career win rate is IceBot from ICELab, followed by Tscmoo zerg and Tomas Vajda’s XIMP. Of course career win rate is not a fair comparison for bots which improved greatly over their careers, or which have shorter careers.

bot | win % | games | earliest | latest
A Jarocki62.77%9322015 Oct 042016 Jan 26
Adrian Sternmuller26.89%45292013 Dec 252016 Jul 22
Andrej Sekac11.76%682013 Dec 252014 Jan 04
Andrew Smith65.00%83912013 Dec 252016 Sep 27
ASPbot201149.78%2272015 Jan 292016 Feb 25
Aurelien Lermant58.26%36872015 Jun 222016 Sep 27
AwesomeBot29.81%4732016 Jun 162016 Sep 08
Bjorn P Mattsson22.22%44422015 Apr 052016 Sep 27
ButcherBoy4.74%4222016 Jun 212016 Sep 06
Carsten Nielsen66.08%47112015 Mar 172016 Sep 27
Chris Ayers35.53%15202015 Aug 102016 Jan 26
Chris Coxe73.10%22012015 Sep 032016 Sep 27
Christoffer Artmann20.51%3952016 Aug 072016 Sep 27
DAIDOES34.02%4852016 Jun 132016 Sep 08
Daniel Blackburn43.79%68832013 Dec 252016 Jan 26
Dave Churchill75.48%82752013 Dec 252016 Sep 27
David Milec49.09%552015 Jan 132015 Jan 20
Denis Ivancik10.76%5022013 Dec 252015 Jan 20
EradicatumXVR40.88%46872013 Dec 252016 Jan 23
Flash65.69%9912016 Apr 182016 Sep 27
Flashrelease0.00%82016 Apr 242016 Apr 24
FlashTest69.44%2162016 Mar 222016 Jul 27
FlashZerg0.00%72016 Apr 242016 May 12
Florian Richoux62.11%82032013 Dec 252016 Sep 27
Gabriel Synnaeve45.96%16582013 Dec 252015 Nov 24
Gaoyuan Chen48.05%51182015 Feb 102016 Sep 27
Henri Kumpulainen38.81%8942016 Jan 132016 May 31
HoangPhuc15.67%3002016 Jul 182016 Sep 07
Ian Nicholas DaCosta37.12%29282015 Apr 272016 Sep 08
Ibrahim Awwal30.57%5302013 Dec 252014 Mar 24
ICELab81.12%83442013 Dec 252016 Sep 27
Igor Lacik39.32%80732013 Dec 252016 Sep 08
Iron bot77.74%19992015 Nov 272016 Sep 26
Ivana Kellyerova11.47%16302013 Dec 252015 Apr 01
Jacob Knudsen20.53%12572016 Feb 232016 Sep 10
Jakub Trancik45.08%84162013 Dec 252016 Sep 27
Jan Pajan14.48%11192013 Dec 252016 Jan 05
Johan Kayser24.46%4132016 Jul 292016 Sep 27
Johannes Holzfuss35.04%6852016 Mar 052016 Jun 15
JompaBot21.99%10552016 Feb 042016 Aug 13
Jon W5.06%7902015 Apr 302015 Jul 09
Karin Valisova17.68%11712013 Dec 252016 Jan 26
krasi068.77%21422015 Nov 302016 Sep 27
Krasimir Krystev70.52%65102013 Dec 252016 Mar 10
La Nuee51.61%5582015 Dec 132016 Mar 18
LetaBot CIG 201675.68%4442016 Aug 012016 Sep 27
LetaBot IM noMCTS60.93%12262016 May 182016 Aug 01
LetaBot SSCAI 2015 Final65.87%4162016 Aug 042016 Sep 27
Lucia Pivackova13.29%8352013 Dec 252015 Jan 20
Ludmila Nemsilajova16.04%5052013 Dec 252015 Jan 21
Lukas Sedlacek22.86%702015 Jan 122015 Jan 20
Maja Nemsilajova23.81%42462013 Dec 252015 Nov 29
Marcin Bartnicki60.42%14352014 Nov 282016 Mar 18
Marek Gajdos22.69%13842016 Jan 302016 Sep 11
Marek Kadek37.29%76412013 Dec 252016 May 22
Marek Kruzliak14.45%9342013 Dec 252015 Jan 20
Marek Suppa51.85%44132015 Jan 052016 Mar 18
Marian Devecka58.66%62892013 Dec 252016 Sep 27
Martin Dekar33.14%49102013 Dec 252016 Jan 25
Martin Pinter28.98%37402013 Dec 252015 Dec 11
Martin Rooijackers68.50%72902014 Jul 282016 Sep 27
Martin Strapko19.76%33862013 Dec 252016 Jan 26
Martin Vlcak28.92%12242016 Feb 162016 Sep 07
Matej Istenik44.74%82972013 Dec 252016 Sep 27
Matej Kravjar49.57%32342013 Dec 252015 Feb 18
Matyas Novy6.32%16932015 Feb 042015 Jul 09
MegaBot49.40%4192016 Aug 012016 Sep 27
Nathan a David39.34%10042016 Feb 232016 Aug 08
neverdieTRX20.66%3342016 Jul 192016 Sep 10
NUS Bot35.72%33372015 May 192016 Sep 06
Odin201455.65%56482014 Dec 212016 Sep 11
Oleg Ostroumov48.75%36412013 Dec 252016 Jan 26
OpprimoBot21.85%20092015 Nov 182016 Sep 27
Pablo Garcia Sanchez12.20%5902015 Dec 242016 Apr 13
PeregrineBot57.29%12762016 Feb 092016 Sep 10
Peter Dobsa13.25%30272015 Jan 112015 Oct 02
Radim Bobek23.37%11512015 Oct 012016 Mar 06
Rafael Bocquet0.00%102015 Jun 232015 Jun 26
Rob Bogie31.34%6512016 May 142016 Sep 06
Roman Danielis45.63%51552013 Dec 252016 Sep 26
Sebastian Mahr15.06%12022016 Jan 132016 Aug 08
Serega48.20%38032015 Jan 312016 Jan 26
Sergei Lebedinskij13.30%10832015 May 282015 Sep 03
Sijia Xu71.65%23282015 Oct 102016 Sep 27
Simon Prins55.48%54312015 Jan 252016 Sep 27
Soeren Klett63.62%82772013 Dec 252016 Sep 27
Sungguk Cha18.65%6972016 Jun 052016 Sep 27
Tae Jun Oh4.55%1542016 Mar 222016 Apr 11
Tomas Cere61.11%83732013 Dec 252016 Sep 27
Tomas Vajda79.37%83722013 Dec 252016 Sep 27
Tomasz Michalski27.02%4332015 Dec 222016 Mar 18
Travis Shelton23.59%12212016 Feb 282016 Sep 06
tscmoo72.06%57192015 Jan 222016 Sep 27
tscmoop78.16%19922015 Nov 112016 Sep 26
tscmooz79.80%50062015 Feb 272016 Sep 27
UPStarcraftAI24.75%6102015 Dec 242016 Apr 13
Vaclav Horazny37.35%64552013 Dec 252015 Nov 18
VeRLab17.06%8972016 Feb 282016 Aug 01
Vladimir Jurenka38.45%61672013 Dec 252016 Sep 27
Vojtech Jirsa14.14%27862015 Jan 122015 Sep 05
WuliBot72.76%9842016 Apr 192016 Sep 26
ZerGreenBot22.22%362016 Sep 222016 Sep 27
Zia bot52.24%5362016 Jul 072016 Sep 27

Also the maps. Games on 2014 October 24 and earlier did not specify the map; it is blank in the file. The first game with a map specified was 2014 October 29, so there’s a gap in the records (maybe downtime, or tournament stuff). Anyway, we can see that the maps are the usual SSCAIT map pack plus BGH for a small number of games on April Fools.

It is Most Curious that Electric Circuit has fewer games. It was last played on 2015 Feb 3, though I still see it in the map pack that they distribute.

map                       games  earliest     latest
(2)Benzene.scx             8257  2014 Oct 29  2016 Sep 27
(2)Destination.scx         8137  2014 Oct 29  2016 Sep 27
(2)HeartbreakRidge.scx     8249  2014 Oct 29  2016 Sep 27
(3)NeoMoonGlaive.scx       8157  2014 Oct 29  2016 Sep 27
(3)TauCross.scx            8182  2014 Oct 29  2016 Sep 27
(4)Andromeda.scx           8233  2014 Oct 29  2016 Sep 27
(4)CircuitBreaker.scx      8083  2014 Oct 29  2016 Sep 27
(4)ElectricCircuit.scx      975  2014 Oct 29  2015 Feb 03
(4)EmpireoftheSun.scm      8318  2014 Oct 29  2016 Sep 26
(4)FightingSpirit.scx      8288  2014 Oct 29  2016 Sep 27
(4)Icarus.scm              8237  2014 Oct 29  2016 Sep 27
(4)Jade.scx                8154  2014 Oct 29  2016 Sep 27
(4)LaMancha1.1.scx         8130  2014 Oct 29  2016 Sep 27
(4)Python.scx              8175  2014 Oct 29  2016 Sep 27
(4)Roadrunner.scx          8172  2014 Oct 29  2016 Sep 27
(8)BGH.scm                  463  2015 Apr 01  2016 Apr 02
[none specified]          24953  2013 Dec 25  2014 Oct 24

CIG 2016 - crosstables per map

Today I crush you under a mass of charts: crosstables for each of the 5 maps in CIG 2016. This is 4-dimensional data (bot 1, bot 2, map, winning rate), and I imagine there’s a clearer way to present it, but I don’t know what it is, so you get it in the first form I thought of. At the end is a link to the software.

As a reminder, here are the maps.

  • (2)RideofValkyries1.0
  • (3)Alchemist1.0
  • (3)TauCross1.1
  • (4)LunaTheFinal2.3
  • (4)Python1.3

With 100 rounds and 5 maps, for each pairing 20 games were played on each map (minus a few games missing due to errors). So the percentages vary in steps of 5% (or more if games are missing), and the error bars are wide.
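
To put a number on “the error bars are wide”, here is a 95% Wilson score interval for a 20-game sample. This is my calculation, not from the post:

```python
from math import sqrt

def wilson_interval(wins, n, z=1.96):
    """95% Wilson score confidence interval for a binomial proportion."""
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    halfwidth = (z / denom) * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return center - halfwidth, center + halfwidth

lo, hi = wilson_interval(10, 20)  # 10 wins in 20 games
print(f"{lo:.0%} to {hi:.0%}")   # 30% to 70%
```

A 10-of-20 result is consistent with a true winning rate anywhere from about 30% to 70%, so a single map’s per-pairing percentage says little on its own.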

The qualifier tables are big. The first is the full tournament for comparison, the rest are the subtournaments played on each map.

overallIrontscmLetaOverMegaUAlbZZZKAiurTyrZiabTerrSRboOpprXelnBonjSals
Iron79.20%56%39%56%38%63%53%91%99%95%100%100%98%100%100%100%
tscmoo76.97%44%48%53%75%87%82%43%45%81%100%100%100%97%100%100%
LetaBot74.07%61%52%81%28%60%51%31%72%78%100%100%99%99%99%100%
Overkill70.98%44%47%19%84%32%56%79%31%88%98%95%93%99%100%100%
MegaBot70.11%62%25%72%16%52%7%66%85%99%96%95%87%91%99%100%
UAlbertaBot69.25%37%13%40%68%48%55%77%42%88%100%89%85%97%100%100%
ZZZKBot69.18%47%18%49%44%93%45%69%29%49%100%100%100%94%100%100%
Aiur63.15%9%57%69%21%34%23%31%44%89%86%96%99%94%96%100%
Tyr61.64%1%55%28%69%15%58%71%56%24%74%98%94%88%95%99%
Ziabot46.43%5%19%22%12%1%12%51%11%76%59%98%100%32%100%100%
TerranUAB33.51%0%0%0%2%4%0%0%14%26%41%81%76%90%70%98%
SRbotOne22.15%0%0%0%5%5%11%0%4%2%2%19%19%92%74%99%
OpprimoBot22.10%2%0%1%7%13%15%0%1%6%0%24%81%56%27%99%
XelnagaII20.71%0%3%1%1%9%3%6%6%12%68%10%8%44%56%83%
Bonjwa18.95%0%0%1%0%1%0%0%4%5%0%30%26%73%44%100%
Salsa1.47%0%0%0%0%0%0%0%0%1%0%2%1%1%17%0%
ValkyriesoverallIrontscmLetaOverMegaUAlbZZZKAiurTyrZiabTerrSRboOpprXelnBonjSals
Iron72.67%35%10%40%25%75%40%75%95%100%100%100%95%100%100%100%
tscmoo80.67%65%55%60%80%90%80%30%55%95%100%100%100%100%100%100%
LetaBot72.00%90%45%95%30%60%10%10%85%55%100%100%100%100%100%100%
Overkill71.33%60%40%5%85%25%50%95%45%70%95%100%100%100%100%100%
MegaBot70.33%75%20%70%15%30%0%65%90%100%95%95%100%100%100%100%
UAlbertaBot69.67%25%10%40%75%70%50%85%35%90%100%75%90%100%100%100%
ZZZKBot77.67%60%20%90%50%100%50%100%55%40%100%100%100%100%100%100%
Aiur63.00%25%70%90%5%35%15%0%70%70%80%95%100%90%100%100%
Tyr59.67%5%45%15%55%10%65%45%30%65%70%100%100%95%95%100%
Ziabot51.33%0%5%45%30%0%10%60%30%35%65%100%100%90%100%100%
TerranUAB34.67%0%0%0%5%5%0%0%20%30%35%80%80%90%80%95%
SRbotOne24.33%0%0%0%0%5%25%0%5%0%0%20%25%85%100%100%
OpprimoBot20.00%5%0%0%0%0%10%0%0%0%0%20%75%75%20%95%
XelnagaII12.00%0%0%0%0%0%0%0%10%5%10%10%15%25%40%65%
Bonjwa17.67%0%0%0%0%0%0%0%0%5%0%20%0%80%60%100%
Salsa3.00%0%0%0%0%0%0%0%0%0%0%5%0%5%35%0%
AlchemistoverallIrontscmLetaOverMegaUAlbZZZKAiurTyrZiabTerrSRboOpprXelnBonjSals
Iron77.00%80%45%80%40%40%0%95%100%75%100%100%100%100%100%100%
tscmoo69.90%20%35%75%50%80%75%15%35%75%100%100%100%89%100%100%
LetaBot72.33%55%65%55%15%40%80%15%75%90%100%100%100%100%95%100%
Overkill66.67%20%25%45%90%10%55%60%45%85%100%95%75%95%100%100%
MegaBot70.67%60%50%85%10%45%15%60%90%100%85%90%95%80%95%100%
UAlbertaBot77.00%60%20%60%90%55%40%85%80%95%100%85%85%100%100%100%
ZZZKBot72.67%100%25%20%45%85%60%65%50%45%100%100%100%95%100%100%
Aiur68.00%5%85%85%40%40%15%35%60%90%80%90%100%100%95%100%
Tyr51.33%0%65%25%55%10%20%50%40%10%60%90%80%80%85%100%
Ziabot47.33%25%25%10%15%0%5%55%10%90%50%100%100%25%100%100%
TerranUAB36.67%0%0%0%0%15%0%0%20%40%50%80%80%90%80%95%
SRbotOne23.67%0%0%0%5%10%15%0%10%10%0%20%5%100%80%100%
OpprimoBot22.33%0%0%0%25%5%15%0%0%20%0%20%95%35%20%100%
XelnagaII25.08%0%11%0%5%20%0%5%0%20%75%10%0%65%75%90%
Bonjwa18.33%0%0%5%0%5%0%0%5%15%0%20%20%80%25%100%
Salsa1.00%0%0%0%0%0%0%0%0%0%0%5%0%0%10%0%
Tau CrossoverallIrontscmLetaOverMegaUAlbZZZKAiurTyrZiabTerrSRboOpprXelnBonjSals
Iron82.33%65%40%75%15%55%95%90%100%100%100%100%100%100%100%100%
tscmoo77.33%35%35%40%100%90%95%45%35%85%100%100%100%100%100%100%
LetaBot80.67%60%65%90%25%80%80%60%50%100%100%100%100%100%100%100%
Overkill73.33%25%60%10%100%40%55%95%20%95%100%100%100%100%100%100%
MegaBot72.33%85%0%75%0%55%15%85%95%100%100%100%80%95%100%100%
UAlbertaBot67.33%45%10%20%60%45%65%75%35%85%100%95%75%100%100%100%
ZZZKBot57.67%5%5%20%45%85%35%25%5%55%100%100%100%85%100%100%
Aiur61.20%10%55%40%5%15%25%75%15%95%95%100%100%95%95%100%
Tyr68.33%0%65%50%80%5%65%95%85%10%80%100%100%90%100%100%
Ziabot43.00%0%15%0%5%0%15%45%5%90%65%95%100%10%100%100%
TerranUAB33.44%0%0%0%0%0%0%0%5%20%35%80%90%100%70%100%
SRbotOne21.00%0%0%0%0%0%5%0%0%0%5%20%20%95%70%100%
OpprimoBot20.67%0%0%0%0%20%25%0%0%0%0%10%80%55%20%100%
XelnagaII22.00%0%0%0%0%5%0%15%5%10%90%0%5%45%70%85%
Bonjwa18.33%0%0%0%0%0%0%0%5%0%0%30%30%80%30%100%
Salsa1.00%0%0%0%0%0%0%0%0%0%0%0%0%0%15%0%
LunaoverallIrontscmLetaOverMegaUAlbZZZKAiurTyrZiabTerrSRboOpprXelnBonjSals
Iron81.33%35%55%50%50%65%70%95%100%100%100%100%100%100%100%100%
tscmoo81.00%65%40%35%70%95%90%65%75%85%100%100%100%95%100%100%
LetaBot74.00%45%60%70%25%55%55%30%85%85%100%100%100%100%100%100%
Overkill68.90%50%65%30%70%35%55%65%0%95%100%80%89%100%100%100%
MegaBot71.24%50%30%75%30%70%0%65%75%95%100%95%90%95%100%100%
UAlbertaBot66.22%35%5%45%65%30%60%75%35%75%100%95%80%95%100%100%
ZZZKBot68.33%30%10%45%45%100%40%80%30%55%100%100%100%90%100%100%
Aiur60.67%5%35%70%35%35%25%20%35%90%85%95%95%90%95%100%
Tyr63.21%0%25%15%100%25%65%70%65%30%75%100%100%85%100%95%
Ziabot45.33%0%15%15%5%5%25%45%10%70%60%95%100%35%100%100%
TerranUAB31.67%0%0%0%0%0%0%0%15%25%40%85%65%80%65%100%
SRbotOne20.67%0%0%0%20%5%5%0%5%0%5%15%10%80%65%100%
OpprimoBot24.16%0%0%0%11%10%20%0%5%0%0%35%90%60%35%100%
XelnagaII22.07%0%5%0%0%5%5%10%10%15%65%20%20%40%45%90%
Bonjwa19.67%0%0%0%0%0%0%0%5%0%0%35%35%65%55%100%
Salsa1.01%0%0%0%0%0%0%0%0%5%0%0%0%0%10%0%
PythonoverallIrontscmLetaOverMegaUAlbZZZKAiurTyrZiabTerrSRboOpprXelnBonjSals
Iron82.67%65%45%35%60%80%60%100%100%100%100%100%95%100%100%100%
tscmoo75.92%35%75%55%75%80%70%60%25%65%100%100%100%100%100%100%
LetaBot71.33%55%25%95%45%65%30%40%65%60%100%100%95%95%100%100%
Overkill74.67%65%45%5%75%50%65%80%45%95%95%100%100%100%100%100%
MegaBot66.00%40%25%55%25%60%5%55%75%100%100%95%70%85%100%100%
UAlbertaBot66.00%20%20%35%50%40%60%65%25%95%100%95%95%90%100%100%
ZZZKBot69.57%40%30%70%35%95%40%75%5%53%100%100%100%100%100%100%
Aiur62.88%0%40%60%20%45%35%25%40%100%90%100%100%95%95%100%
Tyr65.67%0%75%35%55%25%75%95%60%5%85%100%90%90%95%100%
Ziabot45.12%0%35%40%5%0%5%47%0%95%53%100%100%0%100%100%
TerranUAB31.10%0%0%0%5%0%0%0%10%15%47%80%65%90%55%100%
SRbotOne21.07%0%0%0%0%5%5%0%0%0%0%20%35%100%55%95%
OpprimoBot23.33%5%0%5%0%30%5%0%0%10%0%35%65%55%40%100%
XelnagaII22.41%0%0%5%0%15%10%0%5%10%100%10%0%45%50%85%
Bonjwa20.74%0%0%0%0%0%0%0%5%5%0%45%45%60%50%100%
Salsa1.33%0%0%0%0%0%0%0%0%0%0%0%5%0%15%0%

The final tables are small. Again, the first is the full tournament, the rest are the maps.

overalltscmIronLetaZZZKOverUAlbMegaAiur
tscmoo65.14%52%44%79%71%77%83%50%
Iron54.43%48%38%49%49%74%30%93%
LetaBot53.71%56%62%49%81%69%30%29%
ZZZKBot53.08%21%51%51%42%35%93%78%
Overkill51.43%29%51%19%58%43%81%79%
UAlbertaBot49.07%23%26%31%65%57%76%66%
MegaBot38.00%17%70%70%7%19%24%59%
Aiur35.14%50%7%71%22%21%34%41%
ValkyriesoveralltscmIronLetaZZZKOverUAlbMegaAiur
tscmoo68.57%60%30%75%75%85%90%65%
Iron47.86%40%25%40%50%60%25%95%
LetaBot50.71%70%75%0%85%70%30%25%
ZZZKBot67.86%25%60%100%40%55%95%100%
Overkill52.14%25%50%15%60%40%100%75%
UAlbertaBot50.00%15%40%30%45%60%90%70%
MegaBot35.71%10%75%70%5%0%10%80%
Aiur27.14%35%5%75%0%25%30%20%
AlchemistoveralltscmIronLetaZZZKOverUAlbMegaAiur
tscmoo49.29%45%35%65%80%55%50%15%
Iron44.29%55%50%0%35%60%30%80%
LetaBot51.43%65%50%75%60%65%20%25%
ZZZKBot61.43%35%100%25%45%45%95%85%
Overkill51.43%20%65%40%55%45%75%60%
UAlbertaBot52.14%45%40%35%55%55%75%60%
MegaBot42.14%50%70%80%5%25%25%40%
Aiur47.86%85%20%75%15%40%40%60%
Tau CrossoveralltscmIronLetaZZZKOverUAlbMegaAiur
tscmoo61.43%25%45%90%45%100%95%30%
Iron66.43%75%55%95%50%85%15%90%
LetaBot61.43%55%45%85%90%80%35%40%
ZZZKBot29.29%10%5%15%50%25%75%25%
Overkill52.14%55%50%10%50%40%70%90%
UAlbertaBot45.00%0%15%20%75%60%75%70%
MegaBot37.86%5%85%65%25%30%25%30%
Aiur46.43%70%10%60%75%10%30%70%
LunaoveralltscmIronLetaZZZKOverUAlbMegaAiur
tscmoo71.43%70%60%80%75%75%90%50%
Iron55.00%30%30%55%50%80%40%100%
LetaBot51.43%40%70%65%75%65%20%25%
ZZZKBot52.14%20%45%35%60%25%100%80%
Overkill48.57%25%50%25%40%45%85%70%
UAlbertaBot50.00%25%20%35%75%55%70%70%
MegaBot40.00%10%60%80%0%15%30%85%
Aiur31.43%50%0%75%20%30%30%15%
PythonoveralltscmIronLetaZZZKOverUAlbMegaAiur
tscmoo75.00%60%50%85%80%70%90%90%
Iron58.57%40%30%55%60%85%40%100%
LetaBot53.57%50%70%20%95%65%45%30%
ZZZKBot54.68%15%45%80%15%26%100%100%
Overkill52.86%20%40%5%85%45%75%100%
UAlbertaBot48.20%30%15%35%74%55%70%60%
MegaBot34.29%10%60%55%0%25%30%60%
Aiur22.86%10%0%70%0%0%40%40%

The charts are full of small insights, more than I have time to examine. See for example how XelnagaII’s upset of Ziabot occurred on all maps except Ride of Valkyries; I’m sure that says something about at least one of those bots. We can tease out which pairings the map imbalances spring from. ZZZKBot did poorly on Tau Cross, which Martin Rooijackers attributes to the map’s long rush distance. And so on.

My strongest impression is how much results vary from map to map. I still think 5 maps are not enough to judge strength fairly. To my eye, the datapoint that stands out most is that ZZZKBot defeated the powerful Iron 100% of the time on Alchemist, in both the qualifier and the final, although otherwise Alchemist was a mediocre map for ZZZKBot. It looks as though Iron has a strategy bug on that map which ZZZKBot exploits. All bot authors who competed may want to eye the charts for hints about weaknesses to fix.

Download a zip file of the perl scripts with documentation.

CIG 2016 - the final hidden in the qualifier

Yesterday I claimed that the final stage of CIG 2016 produced little new information, because it was equivalent to drawing a subset from the qualifiers. Is it true? I wrote a script to render crosstables from subsets of game results.
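
The script isn’t shown here, but the core operation is simple. A hedged sketch of how one might compute a crosstable from a subset of game results (the names and data structure are my invention, not the script’s):

```python
from collections import defaultdict

def crosstable(results, bots):
    """Pairwise win rates among the bots in `bots`, from (winner, loser)
    game records. Games involving any other bot are ignored, which is
    how the qualifier games between finalists can be pulled out."""
    wins = defaultdict(int)
    games = defaultdict(int)
    for winner, loser in results:
        if winner in bots and loser in bots:
            wins[(winner, loser)] += 1
            games[(winner, loser)] += 1
            games[(loser, winner)] += 1
    return {pair: wins[pair] / games[pair] for pair in games}
```

Restricting the qualifier results to the 8 finalists is exactly the “final hidden in the qualifier”.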

Here’s my rendition of the real finals. I liked the red and green color coding of win rates in the original, but some people are red-green colorblind so my version has red and blue instead. I also went with a more contrasty color curve.

overalltscmIronLetaZZZKOverUAlbMegaAiur
tscmoo65.14%52%44%79%71%77%83%50%
Iron54.43%48%38%49%49%74%30%93%
LetaBot53.71%56%62%49%81%69%30%29%
ZZZKBot53.08%21%51%51%42%35%93%78%
Overkill51.43%29%51%19%58%43%81%79%
UAlbertaBot49.07%23%26%31%65%57%76%66%
MegaBot38.00%17%70%70%7%19%24%59%
Aiur35.14%50%7%71%22%21%34%41%

Here is the crosstable of the final hidden in the qualifier, which is to say the qualifier games played between finalists.

overalltscmIronLetaZZZKOverUAlbMegaAiur
tscmoo61.71%44%48%82%53%87%75%43%
Iron56.57%56%39%53%56%63%38%91%
LetaBot52.00%52%61%51%81%60%28%31%
ZZZKBot52.14%18%47%49%44%45%93%69%
Overkill51.57%47%44%19%56%32%84%79%
UAlbertaBot48.29%13%37%40%55%68%48%77%
MegaBot42.86%25%62%72%7%16%52%66%
Aiur34.86%57%9%69%31%21%23%34%

Overall results match closely. LetaBot and ZZZKBot have switched ranks, but that’s not a surprise because their scores were extremely close.

The 2 table cells with the largest differences are Tscmoo vs Overkill and MegaBot vs UAlbertaBot. The Tscmoo-Overkill numbers are within the expected range of statistical variation, according to spot checks with Fisher’s exact test, but the MegaBot-UAlbertaBot numbers are highly surprising, far outside the expected range. (The right way to do this would be to test both whole tables as a sample of samples of samples. :-) So there’s an indication that something may be afoot.
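
For reference, this is the kind of spot check I mean, with a two-sided Fisher’s exact test written out from scratch. The qualifier played 100 games per pairing; the finals game count is assumed to be 100 as well, purely for illustration:

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]:
    sum the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    denom = comb(n, row1)

    def p_table(x):  # probability of a table with x in the top-left cell
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    p_obs = p_table(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    return sum(p for p in (p_table(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))

# MegaBot vs UAlbertaBot: 52 wins of 100 in the qualifier vs an assumed
# 24 wins of 100 in the final (illustrative counts only)
print(fisher_exact_two_sided(52, 48, 24, 76))  # far below 0.05
```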

I had a new thought. It’s theoretically possible that differences are caused by learning bots which generalize across opponents. Tscmoo and MegaBot are both learning bots (I verified it: they both wrote stuff to their learning files) and both seem as though they might be able to generalize across opponents. (Overkill is a learning bot but does not generalize.) So my original claim is not 100% true: The qualifiers don’t entirely duplicate the final in the presence of learning bots which generalize across opponents. Alternately, there could have been a problem with a big effect on that pairing (such as a bug in MegaBot related to its learning, an example which is equivalent to mis-generalizing across opponents). We have the source and the replays, so a sufficiently deep dig should turn up the issue if it is in the bots. There’s a chance that the issue is with the tournament operations, or with my script.

Here I combine the qualifier results with the final results to get the best numbers available. The organizers for whatever reason explicitly decided not to do this. Luckily, it doesn’t change the ranking of the bots.

overall
tscmoo63.43%
Iron55.50%
LetaBot52.86%
ZZZKBot52.61%
Overkill51.50%
UAlbertaBot48.68%
MegaBot40.43%
Aiur35.00%

Tomorrow: More map analysis. Also I’ll release the script for others to play with.

CIG 2016 results discussion

I got ahead of myself yesterday—I should step back and talk about the CIG 2016 results more generally! Martin Rooijackers aka LetaBot sent me a few observations by e-mail. They mostly match up with my observations, and I’ll add a few of my own.

• Terran Renaissance confirmed, as predicted (probably by everybody who cared to predict).

• The top 3 winners, besides being terran, are all bots with many updates over the last several months.

• 3 bots of the final 8 are carryovers from past years (#5 Overkill, #6 UAlbertaBot, and #8 AIUR). They scored in the lower half. #4 ZZZKBot seems to have been only slightly updated. The long work put into the top 3 paid off in playing strength.

• Martin Rooijackers observes that #7 MegaBot is the highest-scoring brand new bot. It’s true if you count Iron as a continuation of Stone. And given MegaBot’s self-description as a meta-bot that uses the strategies of others, MegaBot is arguably not brand new either. In any case, the point is that it seems to take a long period of work to get to the top. The competition is fierce.

• None of the final 8 bots dominated the others. Even tail-ender AIUR had an equal record against winner Tscmoo and a winning record against LetaBot. The CIG 2016 finals crosstable has upsets throughout. Comparing to the AIIDE 2015 crosstable with 22 participants, the rate of upsets of bots near each other in rank seems visually similar, so with only 8 final bots the upsets run all the way through. Generally, bot #n is not clearly better than bot #n+1; the ranking is not stable at that level. In the qualifying stage, the rate of upsets visually looks steady down to #9 Tyr and then falls. AIIDE 2015 did not have that pattern.

• I predicted that ZZZKBot still had a chance to make it into the top 3. It didn’t, but it scored 53.08% to make #4 in the finals versus #3 LetaBot’s 53.71%. I think the prediction was justified. This was its last chance, though, without big updates.

• The qualifier results and finals results look different. Iron was narrowly on top in the qualifiers, but Tscmoo pulled well ahead in the finals (a surprise to me). Apparently Tscmoo is better tuned to defeat strong opponents.

• The slides on the result page include a chart of win rates over time which shows that learning helps some, but (as in the past) not as much as you’d hope. To learn more we need smarter learning. I’ll drop a few suggestions in a future post.

The bottom line is that we’re making good progress, though we’re still not far along the path. Tscmoo’s long short-term memory is a pioneering idea and Tscmoo finished #1, but we don’t know much about it. Did the memory help results? Meanwhile, LetaBot finished #3 here, and is in a strong position as Martin Rooijackers tries to pioneer a next step in another direction, a tactical search derived from MaasCraft. Will the search lead to the hoped-for jump in strength? Tune in next time!

I question the tournament design. They ran a 100-round round robin with 16 bots and used the results to accept half of the entrants into the final—a staged design with qualifier and finals. That’s perfectly reasonable; it says that they’re more interested in who beats the strong than who consistently beats the weak. Having selected the finalists, they discarded the qualifier results and ran an independent final with 100 more rounds on the same maps for the 8 finalists. They even discarded bot learning files from the qualifier, so that nothing carried over. The final duplicated the qualifiers, only with fewer bots, and produced little new information. They could have saved the time and extracted the final results from the qualifier stage. It would have been equivalent.

In a staged tournament, each stage should produce new information. It could add to the qualifier results. It could have more rounds. It could include seeded opponents that skipped the qualifiers (though I wouldn’t recommend that for an academic tournament). It could include different maps. It could follow harsher rules. But something!

I can understand why they didn’t pass the qualifier results through to the final stage. They had the software they had, and an organizer’s time is always short. But this final had no point. I hope future tournaments will remember the lesson.

map balance - bot balance in CIG 2016

CIG 2016 reported its results in the same format as AIIDE 2015 (I’m sure they used the same software), so I was able to compute the map balance with a few adjustments to my script. The tournament was run in two halves, qualifiers and finals, each with 100 rounds. With 5 maps, that makes 20 times through the map pool. They could have used twice as many maps without any disadvantage that I see.

The qualifiers, with 16 bots playing 12,000 games total (minus a few lost to errors):

map                       TvZ           ZvP           PvT
                          wins    n     wins    n     wins    n
(2)RideofValkyries.scx    49%     640   61%     240   57%     480
(3)Alchemist.scm          50%     640   45%     240   60%     479
(3)TauCross.scx           56%     640   43%     240   53%     479
(4)LunaTheFinal.scx       53%     637   47%     240   53%     480
(4)Python.scx             49%     638   45%     240   50%     478
overall                   51%     3195  48%     1200  55%     2396

The 3 races came out remarkably even! We already know that’s more due to the strength distribution of bots in the tournament than to the fairness of the game. The low-high spread in TvZ was 56%-49% = 7%; in ZvP it was 18%, and in PvT 10%. Ride of Valkyries had strikingly different ZvP results than the other maps. I don’t know why. Can anybody guess? The human balance also showed one map standing out in ZvP, but it was Alchemist.

The final, with 8 bots playing 2800 games, looks considerably different:

map                       TvZ           ZvP           PvT
                          wins    n     wins    n     wins    n
(2)RideofValkyries.scx    54%     120   92%     80    45%     120
(3)Alchemist.scm          52%     120   79%     80    63%     120
(3)TauCross.scx           76%     120   65%     80    49%     120
(4)LunaTheFinal.scx       67%     120   84%     80    46%     120
(4)Python.scx             66%     120   94%     80    34%     120
overall                   63%     600   83%     400   48%     600

Here, protoss did poorly because the protoss bots came out on the bottom this time. It’s interesting that the middle-of-the-table zergs did more to hold down the protoss than the winning terrans (but it fits with the game storyline :-). Beyond that, I’m reluctant to draw conclusions from this smaller number of games with fewer players.

I feel vindicated: Map balance can make a difference, even though we don’t understand what the difference is!

ZerGreenBot

The new protoss bot ZerGreenBot was uploaded at SSCAIT today. It describes itself as “terribad” and... I can’t disagree, but it’s fun. To defend its base it builds zealots and dragoons. These units only leave the base if they are lured out. To attack it sends a shuttle with 2 reavers. It seems to keep building shuttles and reavers from one robo, so whether the first lives or dies, more will fly out later to attack independently.

It never expands. It doesn’t scout until the shuttle flies around the map. If it happens to see the enemy natural first, it never seems to realize that the enemy must have a main too. The shuttle disregards danger. Sometimes it drops the reavers near their target, but on the wrong side of a cliff. In one game ZerGreenBot took a couple potshots at an unfinished spire, a good first target, but then moved on—and the attack was later cleared by the first mutalisk. And yet its manic shuttle-reaver micro is fun! The basic procedure seems to be drop, fire at whatever’s near, pick up, move a little around the outside of the base, drop, etc. It’s as if it were trying to duplicate the Berkeley Overmind’s “dismantle the enemy base from the outside in” tactics. When there are two shuttles, they do a wacky dance.

One thing the bot does right is that it keeps the shuttle always moving, so that it never has to accelerate from a stop. I take that as a sign that the author understands shuttle-reaver micro and merely hasn’t implemented much of it yet (because it is crazy hard).

In the games I’ve seen so far, opponents react poorly to the reaver drop. They don’t understand that the shuttle is a high priority target, and they don’t know how to escape or how to attack. Even with no other improvements, better shuttle-reaver control by itself might make ZerGreenBot a dangerous opponent for many bots, though probably not for the top tier. If the goal is to play strongly, I suggest this order of improvements: 1. Better choice of targets and drop locations. 2. Attention to avoiding danger. 3. Smarter scouting, so that the better choice of targets bites harder. And only then work on expanding and being more aggressive with the other units. Well, it’s only my first thought; I’m sure the author knows better than I do.

By the way, I think the name is a joke. “Zerg-reen” sounds like a zerg marine, everything that is not protoss.

map balance - comparing pro and bot balance

I started to think about fancy ways to normalize map balance data so that the numbers could be compared—and then I realized, who the hell cares? The data’s not good enough in the first place, at least the bot data, which is based on only 21 bots with idiosyncratic play styles and big race imbalances regardless of the maps. We can only get a general idea of the comparison anyway.

So I decided on a simple subtraction of the average from each map balance number, so that a map with average balance has normalized balance 0%. Then we can compare maps to see if they have similar relative balance for pros and bots. After normalization, TvZ > 0 means that terran did better than average on that map, and TvZ < 0 means that terran did worse.
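The normalization step can be sketched in a few lines. This example uses the pro TvZ numbers from the AIIDE 2015 map pool (given in an earlier post below) and subtracts the column average, reproducing the TvZ pro column of the table:

```python
# Pro TvZ winning rates per map (AIIDE 2015 map pool, from TLPD).
tvz_pro = {
    "Benzene": 64.1, "Destination": 52.3, "Heartbreak Ridge": 48.6,
    "Aztec": 39.0, "Tau Cross": 50.0, "Andromeda": 42.7,
    "Circuit Breaker": 52.9, "Empire of the Sun": 64.2,
    "Fortress": 64.3, "Python": 55.2,
}

# Subtract the column average so a map with average balance reads 0%.
mean = sum(tvz_pro.values()) / len(tvz_pro)
normalized = {m: round(v - mean, 1) for m, v in tvz_pro.items()}
```

The normalized values sum to (approximately) zero by construction, which is why the table below has no “overall” row.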

map                  TvZ              ZvP              PvT
                     pro      bot     pro      bot     pro      bot
Benzene              10.8%    -2.9%   -5.3%    1.3%    -5.1%    -0.4%
Destination          -1.0%    -1.9%   2.6%     2.3%    0.7%     -0.4%
Heartbreak Ridge     -4.7%    3.1%    2.2%     -0.7%   5.3%     -3.4%
Aztec                -14.3%   -1.9%   -4.4%    1.3%    11.6%    -0.4%
Tau Cross            -3.3%    -0.9%   -4.4%    -0.7%   -1.8%    -0.4%
Andromeda            -10.6%   1.1%    4.4%     -1.7%   3.8%     -4.4%
Circuit Breaker      -0.4%    -0.9%   -2.6%    -1.7%   -0.8%    2.6%
Empire of the Sun    10.9%    -4.9%   -4.4%    -1.7%   -2.7%    0.6%
Fortress             11.0%    8.1%    12.3%    -0.7%   -2.6%    4.6%
Python               1.9%     1.1%    -0.5%    2.3%    -8.0%    1.6%

There’s no “overall” row because, after normalization, it’s just a row of zeroes. Also, as I mentioned, the sizes of the imbalances can’t be compared directly. A relative balance of -5% in the bot ZvP column (average balance 71%) doesn’t mean the same thing as -5% in the pro ZvP column (average balance 54.4%).

No convincing pattern is visible. The pro and bot columns have the same sign in 12 of the 30 cases, which is not distinguishable from the 15 expected by chance. Sometimes a pro map with a large imbalance has a large imbalance for bots too; sometimes not. Here’s a scatter chart with relative pro balance on the x-axis and relative bot balance on the y. Remember that the signs are arbitrary: we could just as well have compared ZvT rather than TvZ, so + and - carry no intrinsic meaning. If your eyes think they see a pattern, flip one or two of the symbol sets around one axis or the other before you decide it’s real.

scatter chart showing the lack of relationship between map balance for pros and for bots
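The sign-agreement count and its significance can be checked directly. A sketch in plain Python, using the (pro, bot) pairs from the table above and an exact binomial test against chance agreement:

```python
import math

# Normalized (pro, bot) balance pairs from the table: 10 maps x 3 matchups.
pairs = [
    # TvZ
    (10.8, -2.9), (-1.0, -1.9), (-4.7, 3.1), (-14.3, -1.9), (-3.3, -0.9),
    (-10.6, 1.1), (-0.4, -0.9), (10.9, -4.9), (11.0, 8.1), (1.9, 1.1),
    # ZvP
    (-5.3, 1.3), (2.6, 2.3), (2.2, -0.7), (-4.4, 1.3), (-4.4, -0.7),
    (4.4, -1.7), (-2.6, -1.7), (-4.4, -1.7), (12.3, -0.7), (-0.5, 2.3),
    # PvT
    (-5.1, -0.4), (0.7, -0.4), (5.3, -3.4), (11.6, -0.4), (-1.8, -0.4),
    (3.8, -4.4), (-0.8, 2.6), (-2.7, 0.6), (-2.6, 4.6), (-8.0, 1.6),
]

# Count the cells where pro and bot imbalances point the same way.
agree = sum(1 for pro, bot in pairs if (pro > 0) == (bot > 0))

def binom_two_sided(k, n):
    """Exact two-sided binomial p-value against p = 0.5."""
    tail = min(k, n - k)
    return min(1.0, 2 * sum(math.comb(n, i) for i in range(tail + 1)) / 2**n)

pval = binom_two_sided(agree, len(pairs))
```

The p-value comes out far above 0.05, confirming that 12 agreements out of 30 is indistinguishable from coin flips.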

What does it all mean in practice? There are some maps with apparent imbalances, which means we should have map pools large enough that imbalances tend to average out. Most maps are not far from balanced, so 10 maps should be enough; the 5 maps of CIG 2016 do not seem enough. Other than that, there’s no reason to change how we select maps. We don’t know whether last year’s relative map balances will carry over to this year, when the skill of the top bots is greater and they are terran rather than zerg. The main conclusion is the same as the conclusion of all studies since the invention of science: More research is needed!

map balance - bot balance in AIIDE 2015

I wrote Ye Usualle Little Perl Script to calculate map balance in AIIDE 2015, based on the detailed game results (the “plaintext” link on that page). The results do not tell us what race random UAlbertaBot got each game, so its games don’t count in the analysis. UAlbertaBot was the only random bot.

map                       TvZ           ZvP           PvT
                          wins    n     wins    n     wins    n
(2)Benzene.scx            18%     405   72%     315   64%     567
(2)Destination.scx        19%     405   73%     315   64%     567
(2)HeartbreakRidge.scx    24%     405   70%     315   61%     567
(3)Aztec.scx              19%     405   72%     315   64%     567
(3)TauCross.scx           20%     405   70%     315   64%     567
(4)Andromeda.scx          22%     405   69%     315   60%     567
(4)CircuitBreaker.scx     20%     405   69%     315   67%     567
(4)EmpireoftheSun.scm     16%     405   69%     315   65%     567
(4)Fortress.scx           19%     405   70%     315   69%     567
(4)Python.scx             22%     405   73%     315   66%     567
overall                   20%     4050  71%     3150  64%     5670

In the table, n is the total number of games in the matchup, one of several crosschecks to make sure the analysis is right. The tournament had 5 zerg, 7 protoss, and 9 terran bots (plus random UAlbertaBot, which was not counted, making 22 participants). There were 90 rounds, each on one map (which over 10 maps means 9 times through the map pool). So for TvZ there should be 5*9*90 = 4050 games; for ZvP 5*7*90 = 3150 games; for PvT 7*9*90 = 5670 games. Good.
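The crosscheck arithmetic above can be sketched as a tiny script:

```python
# Crosscheck of the per-matchup totals: AIIDE 2015 had 5 zerg, 7 protoss,
# and 9 terran bots (random UAlbertaBot excluded) playing 90 rounds.
races = {"Z": 5, "P": 7, "T": 9}
rounds = 90

def expected_games(r1, r2):
    """Total games expected in the r1-vs-r2 matchup over the round robin."""
    return races[r1] * races[r2] * rounds
```

The three expected totals match the n column of the table, minus nothing at all in this case.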

OK, from this exercise I learned more about race balance in this tournament than about map balance. Zerg came out on top because zerg bots won. Meanwhile terran bots were concentrated toward the bottom of the crosstable, while protoss were scattered throughout. Zerg crushed protoss 2:1 but annihilated terran 5:1. I had not realized that it was so extreme. The maps made small differences, the bots made big differences.

Bots analyze maps shallowly and try to play about the same on different maps. I had expected that that lack of adaptivity would cause maps to affect results strongly: Adapting means that the bot matters more; failing to adapt means that the map matters more. But if so, it’s not visible in this table. Maybe the maps are standardized enough that adaptation doesn’t matter at this level of play. Or maybe my original thinking is wrong, and adaptation is what allows the map to matter—Heartbreak Ridge has a narrow base entrance, so that you can easily block your enemy in or out, and high ground over the natural to proxy on, and I haven’t seen any bot take advantage of those features.

You can download the AIIDE 2015 map balance analysis script in a .zip file. I ran it on a *nix but it can probably be adapted to run under Windows with no more than a tweak or two.

Next: I’ll try to normalize the results and compare human map balance to bot map balance in relative terms. Though you can get an idea already by eyeballing the tables.

map balance - AIIDE 2015

Here’s the map balance table for AIIDE 2015. As yesterday, these per-matchup statistics are for pro players and are copied from TLPD.

map                  TvZ      ZvP      PvT
Benzene              64.1%    49.1%    48.7%
Destination          52.3%    57%      54.5%
Heartbreak Ridge     48.6%    56.6%    59.1%
Aztec                39%      50%      65.4%
Tau Cross            50%      50%      52%
Andromeda            42.7%    58.8%    57.6%
Circuit Breaker      52.9%    51.8%    53%
Empire of the Sun    64.2%    50%      51.1%
Fortress             64.3%    66.7%    51.2%
Python               55.2%    53.9%    45.8%
overall              53.3%    54.4%    53.8%

With 10 maps to average over, the balance looks close enough to be fair. Some individual maps have large imbalances, but they mostly even out over the map pool. They don’t completely even out, though, because imbalances are too consistent across maps; there aren’t enough counterbalancing maps.

Of these 10 maps, only 2 (Tau Cross and Python) overlap with the 5 CIG 2016 maps.

Human balance and bot balance should be different. Next: I’ll try to investigate the bot balance in practice, using the AIIDE 2015 game results. Per-matchup numbers can’t be deduced from any of the summary tables, so I’ll have to go back to the raw game results. Will human and bot balance be somewhat similar, or all different?

map balance - CIG 2016

Map balance is hard.

Only about 5 competition maps have stats showing balance within a few percent of equal for all matchups. Seriously! That’s less than 2% of maps ever used in pro play! (Though to be fair, the total includes maps without enough games for us to know the balance.) The closest are the popular Fighting Spirit, Circuit Breaker, and Tau Cross, and the less-popular Arcadia 2 and Neo Aztec. If you want a balanced map pool beyond these 5 maps, you have to balance the maps against each other: “This one is T>P by 10%, so the rest should add up to P>T by 10%.” Of course those are human stats, and bot balance should be different, so you might want to balance using bot data.
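The balancing arithmetic can be sketched simply: treat each map’s matchup stat as an offset from 50% and ask the whole pool to sum near zero per matchup. A toy example; all the map names and numbers here are made up for illustration:

```python
# Hypothetical pool: per-matchup offsets from 50%, in percentage points.
pool = {
    "MapA": {"TvP": +10.0},   # T>P by 10%
    "MapB": {"TvP": -4.0},    # the rest add up to P>T by 10%
    "MapC": {"TvP": -6.0},
}

def pool_offset(pool, matchup):
    """Net imbalance of the whole pool for one matchup."""
    return sum(offsets.get(matchup, 0.0) for offsets in pool.values())
```

A net offset of zero per matchup is the goal; in practice you’d check TvZ, ZvP, and PvT all at once and accept small residuals.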

The AIIDE and CIG rules both say that maps will be chosen at random from a larger pool. SSCAIT says its maps are selected from popular recent pro maps, and doesn’t mention balance. So I decided to look into it.

For today I calculated the balance of the CIG 2016 map pool, 5 maps randomly selected from a larger collection. Think of this as a first check to see how balance may come out when you’re not paying attention.

  • (2)Ride of Valkyries 1.0
  • (3)Alchemist 1.0
  • (3)Tau Cross 1.1
  • (4)Luna the Final 2.3
  • (4)Python 1.3

I used balance numbers from the TLPD map database, which gives statistics for pro games played from 1999 to 2012. It’s not a definitive current pro balance, but it should be pretty good and it was complete and easy to use. Alchemist is not often played (presumably because it is grossly Z>P; also, according to Liquipedia “Alchemist is mostly noted for being a poor attempt at an asymmetrical three-player map”) and its stats are based on only 53 games. The percentage in each cell is the winning rate of the first-named race in that column’s matchup.

map                  TvZ      ZvP      PvT
Ride of Valkyries    48.5%    67.1%    54.4%
Alchemist            55.6%    80%      62.5%
Tau Cross            50%      50%      52%
Luna the Final       53.2%    60.2%    60%
Python               55.2%    53.9%    45.8%
overall              52.5%    62.2%    54.9%

I’d say that’s a substantial Z>P imbalance.

The numbers from TLPD are raw outcomes, with no attempt to adjust for the strength of the players. That’s likely good enough; it should average out over the large number of games played on most of these maps. But if we want to compare the pro balance with the bot balance after the tournament is over, we may want to do some normalization of both data sets. I’m predicting that this tournament will be dominated by terran bots. A comparison might give the impression that the maps are T>P and T>Z for bots, when in fact the terran bots were playing better.

Tomorrow: AIIDE 2015 map balance.

tournament map selection as a prod

I will never run a tournament. I don’t have the stomach for that much administrative work (and hats off to those who do!). So it’s perfectly safe for me to offer advice—I know I’ll never have to listen to it myself.

The way I see it, one goal of tournaments is to prod bots to improve; tournaments motivate. Another goal is to measure progress; tournament organizers are happy to include older bots that have competed in past tournaments, to see how they do against newer competition. There’s some tension between the two goals, but you don’t want to compromise either of them too much.

Earlier I suggested changing timeout rules to prod the winner to finish the game. Another way to prod bots is to make them play on new maps that present different challenges. Unfortunately, most of the concept maps that I talked about seem too hard for current bots (and the novelty maps are not suitable for competitions). Exception: The map Fantasy is not too hard, but it’s too subtle. Stepping down a level, I don’t know any current bot that can play on an island map. Even ignoring balance issues, a tournament would not want to include an island map like Charity, or even a semi-island map like Indian Lament, because it would break the goal of measuring progress. Bots that were made able to play the maps would likely score 100% against bots that could not.

There is a compromise. I suggest the map Namja Iyagi, a land map with 4 mineral-only islands (one in the corner behind each main base) and 2 mineral-and-gas islands. A bot with island skills would have a large advantage over a bot without island skills (the prod)—but not necessarily a decisive advantage. Two bots with no island skills could still play sound games against each other. If Namja Iyagi is only one map out of several, the tournament results remain a fair measure of progress.

The map Return of the King has 4 islands, so it might be a gentler prod.

Another prod that would be good is a map that promotes (but does not require) pushing through minerals or mineral-walking through obstacles, as in some of the concept maps. I’m not sure what a good choice would be, though.

A Team Liquid thread, “RFC: BW AI Bot Ladder”, proposes a much fancier attempt to encourage progress.

Tomorrow: Map balance.