AIIDE 2017 race balance

The race representation in AIIDE 2017 was very unbalanced, with 13 zergs and only 4 terrans. But the results were closely balanced by race. If the colors in the table look white, well, one of them is and the others nearly are.

| race | score |
|---|---|
| terran | 51% |
| protoss | 50% |
| zerg | 49% |
| random | 53% |

All races did equally well overall. At the top of the rankings too, the 3 winners represent each race and their scores are virtually equal. I take it to mean that there is no good reason for the preponderance of zergs. In a way, the balance is a coincidence; if one race had stronger entrants, maybe for reasons unrelated to Starcraft, there would be an imbalance. And yet the point is made: It doesn’t matter what race you choose for your bot.

Only 1 bot played random, UAlbertaBot. That leaves the vRandom statistics not very interesting, so I left them out of the other tables.

Since the overall balance is virtually level, I added a matchup table.

| | vT | vP | vZ |
|---|---|---|---|
| terran | | 51% | 52% |
| protoss | 49% | | 51% |
| zerg | 48% | 49% | |

Again, the balance is virtually level. Terran's even score did not come from doing well against zerg and poorly against protoss, or vice versa; every matchup was close to even. Well, protoss did a smidge better versus zerg and a smidge worse versus terran, but it’s hardly noticeable.
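As a quick sanity check on the matchup table, each cell and its mirror cell should add up to roughly 100%, since one race's win is the other's loss. The minimal Python sketch below assumes every game had a winner and a loser and that the percentages are rounded to whole numbers (so a sum of 99 or 101 would also be unsurprising). The function name `mirror_sums` is made up for illustration.

```python
# Race-vs-race win rates in percent, copied from the matchup table.
# Mirror matchups (TvT, PvP, ZvZ) are blank in the table and omitted here.
matchups = {
    ("terran", "protoss"): 51,
    ("terran", "zerg"): 52,
    ("protoss", "terran"): 49,
    ("protoss", "zerg"): 51,
    ("zerg", "terran"): 48,
    ("zerg", "protoss"): 49,
}


def mirror_sums(table):
    """Add each matchup's rate to its reverse, keyed by the sorted race pair."""
    sums = {}
    for (a, b), rate in table.items():
        pair = tuple(sorted((a, b)))
        sums[pair] = rate + table[(b, a)]
    return sums


sums = mirror_sums(matchups)
for pair, total in sorted(sums.items()):
    print(pair, total)
```

If a pair summed to something far from 100, that would point to a transcription slip rather than to anything about race balance.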

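| # | bot | race | overall | vT | vP | vZ |
|---|---|---|---|---|---|---|
| 1 | ZZZKBot | zerg | 83.11% | 75% | 79% | 88% |
| 2 | PurpleWave | protoss | 82.35% | 79% | 82% | 83% |
| 3 | Iron | terran | 81.52% | 88% | 85% | 78% |
| 4 | cpac | zerg | 71.01% | 73% | 63% | 75% |
| 5 | Microwave | zerg | 70.86% | 77% | 67% | 71% |
| 6 | CherryPi | zerg | 69.08% | 92% | 70% | 62% |
| 7 | McRave | protoss | 67.07% | 70% | 65% | 68% |
| 8 | Arrakhammer | zerg | 65.95% | 65% | 59% | 72% |
| 9 | Tyr | protoss | 65.91% | 52% | 70% | 68% |
| 10 | Steamhammer | zerg | 64.14% | 57% | 54% | 74% |
| 11 | AILien | zerg | 58.29% | 48% | 61% | 62% |
| 12 | LetaBot | terran | 56.92% | 30% | 61% | 61% |
| 13 | Ximp | protoss | 54.19% | 34% | 63% | 55% |
| 14 | UAlbertaBot | random | 53.40% | 58% | 60% | 47% |
| 15 | Aiur | protoss | 50.46% | 54% | 49% | 52% |
| 16 | IceBot | terran | 45.62% | 64% | 50% | 40% |
| 17 | Skynet | protoss | 43.78% | 40% | 32% | 54% |
| 18 | KillAll | zerg | 43.04% | 39% | 55% | 34% |
| 19 | MegaBot | protoss | 42.83% | 43% | 41% | 45% |
| 20 | Xelnaga | protoss | 37.10% | 54% | 38% | 34% |
| 21 | Overkill | zerg | 32.69% | 25% | 30% | 37% |
| 22 | Juno | protoss | 29.57% | 39% | 35% | 24% |
| 23 | GarmBot | zerg | 27.09% | 15% | 34% | 24% |
| 24 | Myscbot | protoss | 25.94% | 19% | 25% | 27% |
| 25 | HannesBredberg | terran | 21.26% | 18% | 11% | 31% |
| 26 | Sling | zerg | 21.09% | 8% | 28% | 19% |
| 27 | ForceBot | zerg | 17.97% | 21% | 15% | 20% |
| 28 | Ziabot | zerg | 17.21% | 26% | 21% | 13% |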

Individual bots, of course, are not as balanced. Some of the table cells have striking numbers. First of all, with many zergs and few terrans, the vZ column carries the most weight. Sure enough, #3 Iron’s relative weakness versus zerg (“only” winning 3:1) allowed competitors to squeeze in front.

The largest number in any cell is #6 CherryPi’s 92% versus terran. CherryPi crushed the 4 terrans: #3 Iron, #12 LetaBot, #16 ICEbot, #25 HannesBredberg. #26 Sling, in contrast, rolled over and died for terrans but had some chance against other races. It makes sense that the matchup with the fewest participants, terran, would give us the most extreme numbers.

#12 LetaBot and #13 XIMP struggled against terran, while #16 ICEbot and #20 Xelnaga were happy to accept terran customers. #10 Steamhammer and #17 Skynet only played well against zerg, while #18 KillAll liked protoss victims.
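One rough way to put a number on how lopsided each bot is: take the gap between its best and worst per-race win rate. The Python sketch below does that with the vT, vP and vZ figures copied from the per-bot table. The "spread" measure and the helper name are illustrative choices, not something the tournament reports, and the vT column rests on only 4 opponents, so it is the noisiest input.

```python
# (vT, vP, vZ) win rates in percent, copied from the per-bot table.
rates = {
    "ZZZKBot": (75, 79, 88), "PurpleWave": (79, 82, 83),
    "Iron": (88, 85, 78), "cpac": (73, 63, 75),
    "Microwave": (77, 67, 71), "CherryPi": (92, 70, 62),
    "McRave": (70, 65, 68), "Arrakhammer": (65, 59, 72),
    "Tyr": (52, 70, 68), "Steamhammer": (57, 54, 74),
    "AILien": (48, 61, 62), "LetaBot": (30, 61, 61),
    "Ximp": (34, 63, 55), "UAlbertaBot": (58, 60, 47),
    "Aiur": (54, 49, 52), "IceBot": (64, 50, 40),
    "Skynet": (40, 32, 54), "KillAll": (39, 55, 34),
    "MegaBot": (43, 41, 45), "Xelnaga": (54, 38, 34),
    "Overkill": (25, 30, 37), "Juno": (39, 35, 24),
    "GarmBot": (15, 34, 24), "Myscbot": (19, 25, 27),
    "HannesBredberg": (18, 11, 31), "Sling": (8, 28, 19),
    "ForceBot": (21, 15, 20), "Ziabot": (26, 21, 13),
}


def spread(triple):
    """Gap in percentage points between a bot's best and worst matchup."""
    return max(triple) - min(triple)


spreads = {bot: spread(r) for bot, r in rates.items()}
# Most lopsided bots first.
ranked = sorted(spreads, key=spreads.get, reverse=True)

for bot in ranked[:5]:
    print(f"{bot:15s} {spreads[bot]:3d}")
```

On the table's figures, LetaBot (31 points) and CherryPi (30) come out as the most lopsided, while PurpleWave and MegaBot are the most even at 4 points each.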

Next: The per-map crosstables. Prepare for data overload.

Comments

Jay Scott on :

I want to emphasize that the equal balance is a coincidence of the participants. If HannesBredberg had not played, terran would have looked strong. If Iron had been missing, terran would have looked weak. Balance was even because the participants of each race were evenly matched, on average—which may have roots in the game, but if so, the causes are indirect. One bot more or less would have shifted the balance. And yet it still demonstrates that effort and skill make results, and race is not important at the current level of play.
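The "one bot more or less" point can be illustrated with a small leave-one-out calculation over the 4 terran entrants. This Python sketch uses the unweighted mean of their overall win rates as a stand-in for the terran score. That is an assumption: the real figure would also change because every other bot's games against the removed entrant would disappear, so treat the output as a rough indication only. The helper `mean_without` is hypothetical.

```python
# Overall win rates (percent) of the 4 terran bots, copied from the per-bot table.
terran_overall = {
    "Iron": 81.52,
    "LetaBot": 56.92,
    "IceBot": 45.62,
    "HannesBredberg": 21.26,
}


def mean_without(scores, excluded=None):
    """Unweighted mean of the scores, optionally leaving one bot out."""
    kept = [value for name, value in scores.items() if name != excluded]
    return sum(kept) / len(kept)


print(f"all four terrans:       {mean_without(terran_overall):.1f}%")
print(f"without HannesBredberg: {mean_without(terran_overall, 'HannesBredberg'):.1f}%")
print(f"without Iron:           {mean_without(terran_overall, 'Iron'):.1f}%")
```

Under this proxy the four terrans average about 51%, rising to roughly 61% without HannesBredberg and falling to roughly 41% without Iron.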

krasi0 on :

Amazing! Broodwar is such a beautiful and balanced game, even in AI land!
BTW, you should do the same breakdowns and crosstables for the last 4k games on SSCAIT (4k is the number that MicroDK's ELO / ICCUP calculations are based on)

MicroDK on :

Even with 4000 games, each bot only plays around 95 games on average, and only 2 games versus each other bot on average. Some bots have an extreme number of games versus certain bots and no games at all versus others. I don't think we will get any useful information out of that.

krasi0 on :

Well, I guess, it's the same at the human pro level scene. If Flash did not play at all, the balance would have been broken, etc.

MicroDK on :

Nice breakdown! A welcome surprise that Microwave did well in ZvZ. I already knew that ZvP would be weaker, but I had no time to test more openings before submission. Also, I did not have time to include an anti-zerg-rush opening. That would have had a big impact on performance versus ZZZKBot and cpac.

skar1ath on :

Just to counteract the people lamenting the rise of hardcoded cheese in ZZZKBot, I'd like to point out how this breakdown highlights the incredible success of PurpleWave. PW might have won 3 fewer games total, but its results in the crosstable actually look better (2 matchups with a losing record, vs. 3 for ZZZK). Here, we can see that in terms of per-race matchups, PW was the most consistent of any bot, with only a 4 percentage-point difference between its worst-case winrate vs. Terran and its best-case winrate vs. Zerg. It also has the best worst-case, per-race winrate, at 79%, just beating out Iron's 78%.

Personally, I think that cheese is healthy for the BWAI scene. While it leads to some low-effort bots that clean up weaker opponents for large numbers of wins, it also forces developers to depend less on a single, easy-to-cheese strategy, and to write more flexible, well-rounded bots. Going forward, I think we'll see more success from bots like PurpleWave, which can play cheese when needed but also do so much more.

krasi0 on :

I completely agree with your analysis. Congratulations to PurpleWave's author - Dan! PW is probably the bot that has gone the longest path uphill in the shortest time.
I, too, think that cheese has its role. That's why we've been allowing multiple entries (non-competitive) per bot author on SSCAIT, where the additional entries are one-cheese-trick ponies, e.g. 5 pool, purple tickles, Stone, etc. Those are good benchmarks and serve as baselines for other bots to compete against.
