Starcraft AI Ladder crosstables

The Starcraft AI Ladder does not display crosstables or per-map results. I wanted to see the charts to know Steamhammer’s strengths and weaknesses, so I calculated them myself. I modified the script I use to analyze the CoG and AIIDE tournament results every year. The tournament manager’s results file is now in CSV format, which is a change, but of course it was no trouble to parse. The pop-up table legend explains how to tell whether each game should be counted, but it refers to a “Duration” column that the file itself names “Game Time” (to distinguish it from “Wall Time”), and the value for an unstarted game is “0:00” rather than “00:00:00”. My script skipped a total of 3 games out of 2793, all of them with PurpleWave as one player and all due to GAME_STATE_NOT_UPDATED_60S_BOTH_BOTS.
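The counting step can be sketched roughly as below. This is not the actual script, only a minimal illustration. Of the column names, only “Game Time” comes from the results file as described above; “Winner”, “Loser”, and “Map” are hypothetical stand-ins, and the rule of skipping a game whose Game Time is “0:00” is my reading of the legend.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names; only "Game Time" is named in the post.
WINNER, LOSER, MAP, GAME_TIME = "Winner", "Loser", "Map", "Game Time"

def tally(csv_text):
    """Count wins and games per (bot, opponent, map).

    Returns (wins, games, skipped).
    """
    wins, games, skipped = defaultdict(int), defaultdict(int), 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: an unstarted game has Game Time "0:00" and is not counted.
        if row[GAME_TIME].strip() == "0:00":
            skipped += 1
            continue
        w, l, m = row[WINNER], row[LOSER], row[MAP]
        wins[(w, l, m)] += 1
        games[(w, l, m)] += 1
        games[(l, w, m)] += 1
    return wins, games, skipped

def win_rate(wins, games, bot, opponent=None, game_map=None):
    """Win rate of bot, optionally restricted to one opponent and/or one map."""
    def match(key):
        b, o, m = key
        return b == bot and opponent in (None, o) and game_map in (None, m)
    n = sum(v for k, v in games.items() if match(k))
    w = sum(v for k, v in wins.items() if match(k))
    return w / n if n else None

# Made-up sample rows, only to show the shape of the input.
sample = """Winner,Loser,Map,Game Time
BananaBrain,Steamhammer,Python,12:34
Steamhammer,BananaBrain,TauCross,09:10
BananaBrain,Steamhammer,Python,0:00
"""
wins, games, skipped = tally(sample)
print(skipped)
print(win_rate(wins, games, "Steamhammer", "BananaBrain"))
```

Calling win_rate with only a bot gives the “overall” column, with a bot and an opponent gives a crosstable cell, and with all three arguments gives a per-map cell.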

I found that “Download Search Results” did not behave quite as its name suggests. It seemed to remember a previous search rather than the current settings, or at any rate it did something unexpected. After a couple of tries I was able to get the complete record of games played since the last reset on 17 April (just a couple of days ago). I trimmed off the incomplete round 133, so the file I analyzed includes all games of rounds 0 through 132. 2793 games in 2 days is a great number, far more than BASIL plays.

The ladder would be more valuable if it had more participants. As it is, I am learning from it, because nowhere else runs so many games so quickly.

crosstable

#	bot	overall	Bana	Stea	Micr	Halo	Ecgb	Purp	ZZZK
1	BananaBrain	86.22%		69%	51%	100%	98%	100%	99%
2	Steamhammer	71.68%	31%		80%	46%	79%	95%	98%
3	Microwave	70.05%	49%	20%		56%	97%	99%	100%
4	Halo	61.86%	0%	54%	44%		94%	99%	80%
5	Ecgberht	27.35%	2%	21%	3%	6%		36%	95%
6	PurpleWave	20.88%	0%	5%	1%	1%	64%		56%
7	ZZZKBot	11.79%	1%	2%	0%	20%	5%	44%

BananaBrain is on top in this small field. Steamhammer is doing well; it scores nearly a third versus BananaBrain, wins most games versus Microwave, and is about equal with Halo by Hao Pan. Thanks to the huge number of games, Steamhammer’s learning is saturated so this should be its peak performance. Steamhammer eked out a slight overall lead over Microwave only due to its dominating head-to-head results; against every other bot, Microwave scored better.

The version of PurpleWave on the ladder is broken; it crashes or oversteps a frame time limit in most games. I have to imagine that fixes are progressing in the workshop. I suspect that this version of ZZZKBot may not be working perfectly either, but I didn’t look into it.

each bot’s results per map

BananaBrain	overall	Benzen	Destin	Heartb	Aztec	TauCro	Androm	Circui	Empire	Fortre	Python
Steamhammer	69%	79%	57%	64%	77%	31%	77%	77%	77%	69%	85%
Microwave	51%	21%	57%	50%	62%	46%	46%	62%	46%	46%	77%
Halo	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
Ecgberht	98%	86%	100%	93%	100%	100%	100%	100%	100%	100%	100%
PurpleWave	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
ZZZKBot	99%	100%	100%	100%	100%	100%	100%	92%	100%	100%	100%
overall	86.22%	81%	86%	85%	90%	79%	87%	88%	87%	86%	94%

BananaBrain barely noticed opponents other than Steamhammer and Microwave. Against Steamhammer it had trouble on the map Tau Cross, and against Microwave on Benzene. Checking the mix of strategies played on those maps would probably explain the cause.

Steamhammer	overall	Benzen	Destin	Heartb	Aztec	TauCro	Androm	Circui	Empire	Fortre	Python
BananaBrain	31%	21%	43%	36%	23%	69%	23%	23%	23%	31%	15%
Microwave	80%	93%	100%	100%	100%	69%	62%	77%	77%	54%	69%
Halo	46%	14%	36%	29%	69%	54%	38%	69%	46%	77%	31%
Ecgberht	79%	93%	79%	71%	92%	85%	77%	85%	85%	62%	62%
PurpleWave	95%	100%	100%	100%	100%	100%	92%	92%	92%	100%	77%
ZZZKBot	98%	100%	100%	100%	100%	100%	100%	92%	92%	100%	100%
overall	71.68%	70%	76%	73%	81%	79%	65%	73%	69%	71%	59%

Steamhammer’s results vary strongly from map to map. I think it is a sign that the opening selection is not paying enough attention to the map. I should have gone with a proper Bayesian calculation rather than an ad hoc algorithm.
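To make “a proper Bayesian calculation” concrete, here is one possible reading, sketched as a beta-binomial estimate of an opening’s win rate on a given map. It is not Steamhammer’s code. The prior is centered on the opening’s record across all maps, and the map’s own results pull the estimate away from it as games accumulate. The function names, the opening names, the records, and the prior_strength value are all assumptions for illustration.

```python
def posterior_mean(map_wins, map_games, all_wins, all_games, prior_strength=4.0):
    """Beta-binomial posterior mean of an opening's win rate on one map.

    The Beta prior is centered on the opening's smoothed win rate over all
    maps and counts as prior_strength pseudo-games (an illustrative value).
    """
    overall = (all_wins + 1) / (all_games + 2)  # smoothed with Beta(1, 1)
    alpha = prior_strength * overall + map_wins
    beta = prior_strength * (1 - overall) + (map_games - map_wins)
    return alpha / (alpha + beta)

def best_opening(records, game_map, prior_strength=4.0):
    """Pick the opening with the highest posterior mean on game_map.

    records maps opening name -> {map name: (wins, games)}.
    """
    def score(opening):
        per_map = records[opening]
        all_wins = sum(w for w, _ in per_map.values())
        all_games = sum(g for _, g in per_map.values())
        map_wins, map_games = per_map.get(game_map, (0, 0))
        return posterior_mean(map_wins, map_games, all_wins, all_games,
                              prior_strength)
    return max(records, key=score)

# Hypothetical records against one opponent.
records = {
    "opening_a": {"Python": (1, 6), "Aztec": (9, 10)},
    "opening_b": {"Python": (4, 6), "Aztec": (5, 10)},
}
print(best_opening(records, "Python"))
print(best_opening(records, "Aztec"))
```

With no games yet on a map, the estimate falls back to the opening’s overall record; with many games on the map, the map-specific record dominates. A fuller version might sample from the posterior instead of taking its mean, so that under-tried openings still get explored.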

Microwave	overall	Benzen	Destin	Heartb	Aztec	TauCro	Androm	Circui	Empire	Fortre	Python
BananaBrain	49%	79%	43%	50%	38%	54%	54%	38%	54%	54%	23%
Steamhammer	20%	7%	0%	0%	0%	31%	38%	23%	23%	46%	31%
Halo	56%	43%	57%	21%	77%	23%	92%	85%	62%	62%	38%
Ecgberht	97%	79%	100%	93%	100%	100%	100%	100%	100%	100%	100%
PurpleWave	99%	100%	100%	100%	100%	100%	100%	100%	100%	100%	92%
ZZZKBot	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
overall	70.05%	68%	67%	61%	69%	68%	81%	74%	73%	77%	64%

Microwave also shows a lot of variation from map to map. That’s harder for me to interpret, even though it is the same evidence: Microwave has fewer openings overall than Steamhammer, so it is possible that poor results on some maps are due to not having an appropriate strategy available. Of course it could also be that the opponent’s play is much stronger on some maps.

Halo	overall	Benzen	Destin	Heartb	Aztec	TauCro	Androm	Circui	Empire	Fortre	Python
BananaBrain	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%
Steamhammer	54%	86%	64%	71%	31%	46%	62%	31%	54%	23%	69%
Microwave	44%	57%	43%	79%	23%	77%	8%	15%	38%	38%	62%
Ecgberht	94%	100%	93%	100%	92%	100%	100%	92%	85%	92%	85%
PurpleWave	99%	100%	100%	100%	100%	100%	100%	100%	100%	100%	92%
ZZZKBot	80%	93%	79%	79%	38%	100%	69%	77%	85%	92%	85%
overall	61.86%	73%	63%	71%	47%	71%	56%	53%	60%	58%	65%

Halo by Hao Pan seems to have consistent trouble on Aztec, the only map in the pool with a low-ground main and a ramp up to the natural. That could be the cause. Most bots underestimate the difficulty of defending the main from enemies on high ground.

Ecgberht	overall	Benzen	Destin	Heartb	Aztec	TauCro	Androm	Circui	Empire	Fortre	Python
BananaBrain	2%	14%	0%	7%	0%	0%	0%	0%	0%	0%	0%
Steamhammer	21%	7%	21%	29%	8%	15%	23%	15%	15%	38%	38%
Microwave	3%	21%	0%	7%	0%	0%	0%	0%	0%	0%	0%
Halo	6%	0%	7%	0%	8%	0%	0%	8%	15%	8%	15%
PurpleWave	36%	7%	7%	14%	100%	100%	8%	15%	92%	23%	8%
ZZZKBot	95%	93%	100%	100%	85%	100%	92%	100%	92%	100%	92%
overall	27.35%	24%	23%	26%	32%	36%	21%	23%	36%	28%	26%

For Steamhammer, Ecgberht is a tricky opponent that can sometimes pull surprise wins. Other bots don’t seem to have the same experience.

PurpleWave	overall	Benzen	Destin	Heartb	Aztec	TauCro	Androm	Circui	Empire	Fortre	Python
BananaBrain	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%
Steamhammer	5%	0%	0%	0%	0%	0%	8%	8%	8%	0%	23%
Microwave	1%	0%	0%	0%	0%	0%	0%	0%	0%	0%	8%
Halo	1%	0%	0%	0%	0%	0%	0%	0%	0%	0%	8%
Ecgberht	64%	93%	93%	86%	0%	0%	92%	85%	8%	77%	92%
ZZZKBot	56%	93%	43%	57%	0%	77%	62%	62%	77%	46%	38%
overall	20.88%	31%	23%	24%	0%	13%	27%	26%	15%	21%	28%

PurpleWave crashes every game on Aztec, and frequently on other maps. :-(

ZZZKBot	overall	Benzen	Destin	Heartb	Aztec	TauCro	Androm	Circui	Empire	Fortre	Python
BananaBrain	1%	0%	0%	0%	0%	0%	0%	8%	0%	0%	0%
Steamhammer	2%	0%	0%	0%	0%	0%	0%	8%	8%	0%	0%
Microwave	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%	0%
Halo	20%	7%	21%	21%	62%	0%	31%	23%	15%	8%	15%
Ecgberht	5%	7%	0%	0%	15%	0%	8%	0%	8%	0%	8%
PurpleWave	44%	7%	57%	43%	100%	23%	38%	38%	23%	54%	62%
overall	11.79%	4%	13%	11%	29%	4%	13%	13%	9%	10%	14%

Comments

Dan on Sunday, April 19. 2020:

The timeouts are almost certainly due to the Tournament Manager bug* where client bots are launched in a manner that causes writing to stdout to block forever. I've fixed it since then (with separate configurations such that I allow stdout writes in some environments but not others) but haven't yet updated the AIIDE ladder version.

*I call it a Tournament Manager bug, but really I blame the design of Java's API for launching processes which I expect the TM is using; infinite blocking on writing to stdout is the *default behavior* which is not very clever. SCHNAIL ran into the same issue.

I'll upload a fixed version when I have a chance.

MicroDK on Monday, April 20. 2020:

The AIIDE ladder has been reset and will be reset every Sunday. The read folders will now be reset also. This was a bug before. ;)

Jay Scott on Monday, April 20. 2020:

Will there be a way to get data from the previous week, so you can have the most complete possible data set?

MicroDK on Monday, April 20. 2020:

I don't know. The ladder will reset at 11:59 pm GMT Sundays. So download the data before the reset.
