AIIDE 2017 race balance

The race representation in AIIDE 2017 was very unbalanced, with 13 zergs and only 4 terrans. But the results were closely balanced by race. If the colors in the table look white, well, one of them is and the others nearly are.

race	score
terran	51%
protoss	50%
zerg	49%
random	53%

All races did equally well overall. At the top of the rankings, too, the top 3 finishers represent one race each, and their scores are virtually equal. I take this to mean that there is no good reason for the preponderance of zergs. In a way, the balance is a coincidence; if one race had stronger entrants, maybe for reasons unrelated to Starcraft, there would be an imbalance. And yet the point is made: It doesn’t matter what race you choose for your bot.
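
One way to see how much the balance depends on who entered is to approximate each race score by averaging the overall scores of that race's entrants from the full results table, then drop one entrant at a time. This is a minimal sketch that assumes every bot played the same number of games, so a plain mean of member scores stands in for the race score. The functions `race_average` and `leave_one_out` are hypothetical helpers, not part of any tournament tooling.

```python
from statistics import mean

# Overall scores (%) of each race's entrants, copied from the results table.
OVERALL = {
    "terran": {"Iron": 81.52, "LetaBot": 56.92, "IceBot": 45.62,
               "HannesBredberg": 21.26},
    "protoss": {"PurpleWave": 82.35, "McRave": 67.07, "Tyr": 65.91,
                "Ximp": 54.19, "Aiur": 50.46, "Skynet": 43.78,
                "MegaBot": 42.83, "Xelnaga": 37.10, "Juno": 29.57,
                "Myscbot": 25.94},
    "zerg": {"ZZZKBot": 83.11, "cpac": 71.01, "Microwave": 70.86,
             "CherryPi": 69.08, "Arrakhammer": 65.95, "Steamhammer": 64.14,
             "AILien": 58.29, "KillAll": 43.04, "Overkill": 32.69,
             "GarmBot": 27.09, "Sling": 21.09, "ForceBot": 17.97,
             "Ziabot": 17.21},
}


def race_average(scores):
    """Approximate a race score as the plain mean of its entrants' scores.

    Assumes every bot played the same number of games, so each entrant
    carries equal weight.
    """
    return mean(scores.values())


def leave_one_out(scores):
    """Race average recomputed with each entrant removed in turn."""
    return {
        name: mean(v for other, v in scores.items() if other != name)
        for name in scores
    }


if __name__ == "__main__":
    for race, scores in OVERALL.items():
        print(f"{race}: {race_average(scores):.1f}%")
    for name, avg in leave_one_out(OVERALL["terran"]).items():
        print(f"terran without {name}: {avg:.1f}%")
```

Under that assumption the plain means come out at about 51.3%, 49.9%, and 49.3%, which round to the race table's 51%, 50%, and 49%. Removing HannesBredberg lifts the terran average to about 61.4%, and removing Iron drops it to about 41.3%, so a single entrant swings the terran number by 10 points either way.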

Only 1 bot played random, UAlbertaBot. With a single random opponent, the vRandom statistics say little, so I left them out of the other tables.

Since the overall balance is virtually level, I added a matchup table to check whether the even totals hide lopsided matchups.

	vT	vP	vZ
terran		51%	52%
protoss	49%		51%
zerg	48%	49%

Again, the balance is virtually level. Terran’s even score did not come from doing well against zerg and poorly against protoss, or vice versa; every matchup was close to even. Well, protoss did a smidge better versus zerg and a smidge worse versus terran, but it’s hardly noticeable.

#	bot	race	overall	vT	vP	vZ
1	ZZZKBot	zerg	83.11%	75%	79%	88%
2	PurpleWave	protoss	82.35%	79%	82%	83%
3	Iron	terran	81.52%	88%	85%	78%
4	cpac	zerg	71.01%	73%	63%	75%
5	Microwave	zerg	70.86%	77%	67%	71%
6	CherryPi	zerg	69.08%	92%	70%	62%
7	McRave	protoss	67.07%	70%	65%	68%
8	Arrakhammer	zerg	65.95%	65%	59%	72%
9	Tyr	protoss	65.91%	52%	70%	68%
10	Steamhammer	zerg	64.14%	57%	54%	74%
11	AILien	zerg	58.29%	48%	61%	62%
12	LetaBot	terran	56.92%	30%	61%	61%
13	Ximp	protoss	54.19%	34%	63%	55%
14	UAlbertaBot	random	53.40%	58%	60%	47%
15	Aiur	protoss	50.46%	54%	49%	52%
16	IceBot	terran	45.62%	64%	50%	40%
17	Skynet	protoss	43.78%	40%	32%	54%
18	KillAll	zerg	43.04%	39%	55%	34%
19	MegaBot	protoss	42.83%	43%	41%	45%
20	Xelnaga	protoss	37.10%	54%	38%	34%
21	Overkill	zerg	32.69%	25%	30%	37%
22	Juno	protoss	29.57%	39%	35%	24%
23	GarmBot	zerg	27.09%	15%	34%	24%
24	Myscbot	protoss	25.94%	19%	25%	27%
25	HannesBredberg	terran	21.26%	18%	11%	31%
26	Sling	zerg	21.09%	8%	28%	19%
27	ForceBot	zerg	17.97%	21%	15%	20%
28	Ziabot	zerg	17.21%	26%	21%	13%

Individual bots, of course, are not as balanced. Some of the table cells have striking numbers. First of all, with many zergs and few terrans, the vZ column carries the most weight in each bot’s overall score. Sure enough, #3 Iron’s relative weakness versus zerg (“only” 78%, winning about 3.5:1) allowed #1 ZZZKBot and #2 PurpleWave to squeeze in front.
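
The weighting can be made concrete for the top 3. A bot's overall score is roughly its per-race win rates weighted by how many opponents of each race it faced. This minimal sketch assumes equal games against every opponent and ignores the single random opponent (UAlbertaBot), so its numbers land near, not exactly on, the overall column. `weighted_score` and `equal_weight_score` are hypothetical helpers that compare the real, zerg-heavy field with an imaginary field where every race counts equally.

```python
# Per-race win rates (%) for the top 3, from the results table.
RATES = {
    "ZZZKBot": ("zerg", {"T": 75, "P": 79, "Z": 88}),
    "PurpleWave": ("protoss", {"T": 79, "P": 82, "Z": 83}),
    "Iron": ("terran", {"T": 88, "P": 85, "Z": 78}),
}

# Entrants per race, excluding the one random bot.
FIELD = {"T": 4, "P": 10, "Z": 13}
RACE_KEY = {"terran": "T", "protoss": "P", "zerg": "Z"}


def opponent_counts(race):
    """Number of opponents of each race, not counting the bot itself."""
    counts = dict(FIELD)
    counts[RACE_KEY[race]] -= 1
    return counts


def weighted_score(race, rates):
    """Win rate weighted by the number of opponents of each race."""
    counts = opponent_counts(race)
    total = sum(counts.values())
    return sum(rates[r] * counts[r] for r in counts) / total


def equal_weight_score(rates):
    """Win rate if every race had been equally represented."""
    return sum(rates.values()) / len(rates)


if __name__ == "__main__":
    for bot, (race, rates) in RATES.items():
        print(f"{bot}: weighted {weighted_score(race, rates):.1f}%, "
              f"equal-weight {equal_weight_score(rates):.1f}%")
```

The field-weighted averages (about 82.5%, 82.0%, and 81.8%) keep the table's order of ZZZKBot, PurpleWave, Iron. The equal-weight averages (about 80.7%, 81.3%, and 83.7%) put Iron first, which is the sense in which the zerg-heavy field let the others squeeze in front.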

The largest number in any cell is #6 CherryPi’s 92% versus terran. CherryPi crushed the 4 terrans: #3 Iron, #12 LetaBot, #16 ICEbot, and #25 HannesBredberg. #26 Sling, in contrast, rolled over and died against terrans but had some chance against other races. It makes sense that terran, the race with the fewest entrants, gives us the most extreme numbers: each vT cell averages over only 4 opponents (3 for a terran bot), so a single opponent moves it a lot.
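
A back-of-the-envelope check on why small matchups give extreme numbers: a per-race cell is an average over that race's opponents, so each opponent carries a weight of 1/n. This minimal sketch assumes equal games against every opponent; `max_swing` is a hypothetical helper that gives the largest shift one opponent can cause in a zerg bot's columns by going from losing every game to winning every game.

```python
from fractions import Fraction

# Opponents a zerg bot faces in each column (the bot itself excluded).
ZERG_OPPONENTS = {"vT": 4, "vP": 10, "vZ": 12}


def max_swing(n_opponents):
    """Percentage points one opponent can move a column averaged over n.

    Flipping one opponent from 0% to 100% changes the average by 100/n.
    """
    return Fraction(100, n_opponents)


if __name__ == "__main__":
    for column, n in ZERG_OPPONENTS.items():
        swing = float(max_swing(n))
        print(f"{column}: one opponent is worth up to {swing:.1f} points")
```

For a zerg bot, one terran opponent is worth up to 25 points in vT, three times the weight of one zerg opponent (about 8.3 points) in vZ.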

#12 LetaBot and #13 XIMP struggled against terran, while #16 ICEbot and #20 Xelnaga were happy to accept terran customers. #10 Steamhammer and #17 Skynet only played well against zerg, while #18 KillAll liked protoss victims.

Next: The per-map crosstables. Prepare for data overload.

Comments

Jay Scott on Wednesday, October 11. 2017:

I want to emphasize that the equal balance is a coincidence of the participants. If HannesBredberg had not played, terran would have looked strong. If Iron had been missing, terran would have looked weak. Balance was even because the participants of each race were evenly matched, on average—which may have roots in the game, but if so, the causes are indirect. One bot more or less would have shifted the balance. And yet it still demonstrates that effort and skill make results, and race is not important at the current level of play.

krasi0 on Wednesday, October 11. 2017:

Amazing! Broodwar is such a beautiful and balanced game, even in AI land!
BTW, you should do the same breakdowns and crosstables for the last 4k games on SSCAIT (4k is the number that MicroDK's ELO / ICCUP calculations are based on).

MicroDK on Wednesday, October 11. 2017:

Even with 4000 games, each bot plays only around 95 games on average, and only 2 games vs each other bot on average. Some bots have an extreme number of games vs other bots, and some bots have no games vs other bots. I don't think we will get any useful information out of that.

krasi0 on Wednesday, October 11. 2017:

Well, I guess, it's the same at the human pro level scene. If Flash did not play at all, the balance would have been broken, etc.

MicroDK on Wednesday, October 11. 2017:

Nice breakdown! A welcome surprise that Microwave did well in ZvZ. I already knew that ZvP would be weaker, but I had no time to test more openings before submission. Also, I did not have time to include an anti-zerg-rush opening. That would have had a big impact on performance vs ZZZKBot and cpac.

skar1ath on Wednesday, October 11. 2017:

Just to counteract the people lamenting the rise of hardcoded cheese in ZZZKBot, I'd like to point out how this breakdown highlights the incredible success of PurpleWave. PW might have won 3 fewer games total, but its results in the crosstable actually look better (2 matchups with a losing record, vs. 3 for ZZZK). Here, we can see that in terms of per-race matchups, PW was the most consistent of any bot, with only a 4 percentage-point difference between its worst-case winrate vs. Terran and its best-case winrate vs. Zerg. It also has the best worst-case, per-race winrate, at 79%, just beating out Iron's 78%.

Personally, I think that cheese is healthy for the BWAI scene. While it leads to some low-effort bots that clean up weaker opponents for large numbers of wins, it also forces developers to depend less on a single, easy-to-cheese strategy and to write more flexible, well-rounded bots. Going forward, I think we'll see more success from bots like PurpleWave that can play cheese when needed but also do so much more.

krasi0 on Thursday, October 12. 2017:

I completely agree with your analysis. Congratulations to PurpleWave's author - Dan! PW is probably the bot that has gone the longest path uphill in the shortest time.
I, too, think that cheese has its role. That's why we've been allowing multiple entries (non-competitive) per bot author on SSCAIT, where the additional entries are one-cheese-trick ponies, e.g. 5 pool, purple tickles, Stone, etc. Those are good benchmarks and serve as baselines for other bots to compete against.
