
Starcraft AI Ladder crosstables

The Starcraft AI Ladder does not display crosstables or per-map results. I wanted to see the charts to know Steamhammer’s strengths and weaknesses, so I calculated them myself. I modified the script I use to analyze the CoG and AIIDE tournament results every year. The tournament manager’s results file is now in CSV format, which is a change, but of course it was no trouble to parse. The pop-up table legend explains how to interpret the results to decide whether to count each game, except that it refers to a “Duration” column which the file itself names “Game Time” (to distinguish it from “Wall Time”), and an unstarted game has the value “0:00” rather than “00:00:00”. My script skipped a total of 3 games out of 2793, all of them with PurpleWave as one player and all due to GAME_STATE_NOT_UPDATED_60S_BOTH_BOTS.
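Here is a minimal sketch of that kind of calculation, not the actual script. It reads a results CSV, skips games that never started, and tallies a win-rate crosstable. Only the “Game Time” and “Wall Time” column names and the “0:00” value come from the file as described above; the `Winner` and `Loser` column names are hypothetical, the sample rows are made up, and the real counting rules in the legend may be more involved.

```python
import csv
import io
from collections import defaultdict

# "0:00" is what the file uses; "00:00:00" is the other form mentioned above.
UNSTARTED = {"0:00", "00:00:00"}

def should_count(row):
    """Assumed rule: count a game only if it actually started."""
    return row["Game Time"].strip() not in UNSTARTED

def crosstable(csv_file):
    """Return ({bot: {opponent: win rate}}, number of skipped games)."""
    wins = defaultdict(int)    # (winner, loser) -> games won
    games = defaultdict(int)   # (bot, opponent) -> games counted
    skipped = 0
    for row in csv.DictReader(csv_file):
        if not should_count(row):
            skipped += 1
            continue
        winner, loser = row["Winner"], row["Loser"]  # hypothetical columns
        wins[(winner, loser)] += 1
        games[(winner, loser)] += 1
        games[(loser, winner)] += 1
    table = defaultdict(dict)
    for (bot, opp), n in games.items():
        table[bot][opp] = wins[(bot, opp)] / n
    return table, skipped

# Tiny made-up sample, for illustration only.
sample = io.StringIO(
    "Winner,Loser,Game Time,Wall Time\n"
    "A,B,12:34,13:00\n"
    "B,A,10:00,10:30\n"
    "A,B,0:00,1:05\n"
)
table, skipped = crosstable(sample)
print(table["A"]["B"], skipped)
```

Per-map tables follow the same pattern with the map name added to the dictionary keys.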

I found that “Download Search Results” did not behave quite as its name suggests. It seemed to remember a previous search rather than the current setting, or at any rate did something unexpected. But after a couple of tries I was able to get the complete record of games played since the last reset on 17 April (just a couple of days ago). I trimmed off the incomplete round 133, so the file I analyzed includes all games of rounds 0 through 132. 2793 games in 2 days is a great number, far more than BASIL plays.
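The game count is consistent with the trimming if each round is a single round robin among the 7 bots in the crosstable. That is an assumption, since the ladder's scheduling is not described here. Under it, a round has 21 games, and rounds 0 through 132 make 133 × 21 = 2793. The sketch below trims incomplete rounds on that basis; the function name and input layout are hypothetical.

```python
from collections import Counter
from math import comb

def trim_incomplete_rounds(rounds, bots):
    """Return the round numbers that contain a full round robin of games.

    rounds: iterable with one round number per game record.
    bots: number of participants (each pair assumed to meet once per round).
    """
    per_round = comb(bots, 2)
    counts = Counter(rounds)
    return sorted(r for r, n in counts.items() if n == per_round)

# 7 bots -> 21 pairings per round; rounds 0..132 are 133 full rounds.
assert comb(7, 2) == 21
assert 133 * comb(7, 2) == 2793

# Made-up example: rounds 0 and 1 are complete, round 2 is cut off after 5 games.
records = [0] * 21 + [1] * 21 + [2] * 5
print(trim_incomplete_rounds(records, bots=7))  # [0, 1]
```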

The ladder would be more valuable if it had more participants. As it is, I am learning from it, because nowhere else runs so many games so quickly.

crosstable

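| # | bot | overall | Bana | Stea | Micr | Halo | Ecgb | Purp | ZZZK |
|---|---|---|---|---|---|---|---|---|---|
| 1 | BananaBrain | 86.22% | - | 69% | 51% | 100% | 98% | 100% | 99% |
| 2 | Steamhammer | 71.68% | 31% | - | 80% | 46% | 79% | 95% | 98% |
| 3 | Microwave | 70.05% | 49% | 20% | - | 56% | 97% | 99% | 100% |
| 4 | Halo | 61.86% | 0% | 54% | 44% | - | 94% | 99% | 80% |
| 5 | Ecgberht | 27.35% | 2% | 21% | 3% | 6% | - | 36% | 95% |
| 6 | PurpleWave | 20.88% | 0% | 5% | 1% | 1% | 64% | - | 56% |
| 7 | ZZZKBot | 11.79% | 1% | 2% | 0% | 20% | 5% | 44% | - |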

BananaBrain is on top in this small field. Steamhammer is doing well; it scores nearly a third versus BananaBrain, wins most games versus Microwave, and is about equal with Halo by Hao Pan. Thanks to the huge number of games, Steamhammer’s learning is saturated so this should be its peak performance. Steamhammer eked out a slight overall lead over Microwave only due to its dominating head-to-head results; against every other bot, Microwave scored better.
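The comparison between Steamhammer and Microwave can be checked directly from the crosstable. This sketch copies the two rows and averages the rounded percentages, which assumes every pairing played about the same number of games, so the averages are approximate.

```python
# Win rates (%) copied from the Steamhammer and Microwave rows of the crosstable.
steamhammer = {"BananaBrain": 31, "Microwave": 80, "Halo": 46,
               "Ecgberht": 79, "PurpleWave": 95, "ZZZKBot": 98}
microwave = {"BananaBrain": 49, "Steamhammer": 20, "Halo": 56,
             "Ecgberht": 97, "PurpleWave": 99, "ZZZKBot": 100}

# Opponents both bots faced, which leaves out their head-to-head games.
common = sorted(set(steamhammer) & set(microwave))

# Microwave is ahead against every shared opponent.
assert all(microwave[opp] > steamhammer[opp] for opp in common)

def mean(rates, opponents):
    """Unweighted mean of the win rates against the given opponents."""
    return sum(rates[o] for o in opponents) / len(opponents)

print(mean(steamhammer, common), mean(microwave, common))  # 69.8 80.2
```

Leaving out the head-to-head games, Microwave is roughly ten points ahead; the 80% to 20% head-to-head split is what reverses the order in the overall column.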

The version of PurpleWave is broken; it crashes or oversteps a frame time limit in most games. I have to imagine that fixes are progressing in the workshop. I suspect that this version of ZZZKBot may not be working perfectly either, but I didn’t look into it.

each bot’s results per map

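| BananaBrain | overall | Benzen | Destin | Heartb | Aztec | TauCro | Androm | Circui | Empire | Fortre | Python |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Steamhammer | 69% | 79% | 57% | 64% | 77% | 31% | 77% | 77% | 77% | 69% | 85% |
| Microwave | 51% | 21% | 57% | 50% | 62% | 46% | 46% | 62% | 46% | 46% | 77% |
| Halo | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| Ecgberht | 98% | 86% | 100% | 93% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| PurpleWave | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| ZZZKBot | 99% | 100% | 100% | 100% | 100% | 100% | 100% | 92% | 100% | 100% | 100% |
| overall | 86.22% | 81% | 86% | 85% | 90% | 79% | 87% | 88% | 87% | 86% | 94% |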

BananaBrain barely noticed opponents other than Steamhammer and Microwave. Against Steamhammer it had trouble on the map Tau Cross, and against Microwave on Benzene. Checking the mix of strategies played on those maps would probably explain the cause.

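| Steamhammer | overall | Benzen | Destin | Heartb | Aztec | TauCro | Androm | Circui | Empire | Fortre | Python |
|---|---|---|---|---|---|---|---|---|---|---|---|
| BananaBrain | 31% | 21% | 43% | 36% | 23% | 69% | 23% | 23% | 23% | 31% | 15% |
| Microwave | 80% | 93% | 100% | 100% | 100% | 69% | 62% | 77% | 77% | 54% | 69% |
| Halo | 46% | 14% | 36% | 29% | 69% | 54% | 38% | 69% | 46% | 77% | 31% |
| Ecgberht | 79% | 93% | 79% | 71% | 92% | 85% | 77% | 85% | 85% | 62% | 62% |
| PurpleWave | 95% | 100% | 100% | 100% | 100% | 100% | 92% | 92% | 92% | 100% | 77% |
| ZZZKBot | 98% | 100% | 100% | 100% | 100% | 100% | 100% | 92% | 92% | 100% | 100% |
| overall | 71.68% | 70% | 76% | 73% | 81% | 79% | 65% | 73% | 69% | 71% | 59% |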

Steamhammer’s results vary strongly from map to map. I think it is a sign that the opening selection is not paying enough attention to the map. I should have gone with a proper Bayesian calculation rather than an ad hoc algorithm.
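One way such a Bayesian calculation could look is sketched below. It is purely an illustration, not Steamhammer's code. Each opening's win rate on a map gets a Beta distribution, seeded with a discounted version of the opening's record on other maps, and openings are picked by Thompson sampling. The function name, the record layout, the `prior_weight` parameter, and the sample records are all hypothetical.

```python
import random

def choose_opening(records, map_name, prior_weight=0.25, rng=random):
    """Thompson sampling over openings with a per-map Beta posterior.

    records: {opening: {map_name: (wins, losses)}} (hypothetical layout).
    prior_weight: how much a game on another map counts toward this map.
    """
    best, best_draw = None, -1.0
    for opening, by_map in records.items():
        wins, losses = by_map.get(map_name, (0, 0))
        other_w = sum(w for m, (w, _) in by_map.items() if m != map_name)
        other_l = sum(l for m, (_, l) in by_map.items() if m != map_name)
        # Beta(1, 1) is a uniform prior; other maps add discounted pseudo-counts.
        alpha = 1 + wins + prior_weight * other_w
        beta = 1 + losses + prior_weight * other_l
        draw = rng.betavariate(alpha, beta)
        if draw > best_draw:
            best, best_draw = opening, draw
    return best

# Made-up records: "rush" does well on map X but poorly on map Y.
records = {
    "rush": {"X": (12, 1), "Y": (1, 12)},
    "macro": {"X": (6, 7), "Y": (7, 6)},
}
rng = random.Random(0)
picks = [choose_opening(records, "Y", rng=rng) for _ in range(1000)]
print(picks.count("macro") / len(picks))
```

With these numbers the sampler should choose "macro" on map Y most of the time while still trying "rush" occasionally. Results on the current map carry full weight and results elsewhere only a fraction, so the choice follows the map once enough games have been played there.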

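| Microwave | overall | Benzen | Destin | Heartb | Aztec | TauCro | Androm | Circui | Empire | Fortre | Python |
|---|---|---|---|---|---|---|---|---|---|---|---|
| BananaBrain | 49% | 79% | 43% | 50% | 38% | 54% | 54% | 38% | 54% | 54% | 23% |
| Steamhammer | 20% | 7% | 0% | 0% | 0% | 31% | 38% | 23% | 23% | 46% | 31% |
| Halo | 56% | 43% | 57% | 21% | 77% | 23% | 92% | 85% | 62% | 62% | 38% |
| Ecgberht | 97% | 79% | 100% | 93% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| PurpleWave | 99% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 92% |
| ZZZKBot | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| overall | 70.05% | 68% | 67% | 61% | 69% | 68% | 81% | 74% | 73% | 77% | 64% |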

Microwave also shows a lot of variation from map to map. That’s harder for me to interpret, even though it is the same evidence: Microwave has fewer openings overall than Steamhammer, so it is possible that poor results on some maps are due to not having an appropriate strategy available. Of course it could also be that the opponent’s play is much stronger on some maps.

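| Halo | overall | Benzen | Destin | Heartb | Aztec | TauCro | Androm | Circui | Empire | Fortre | Python |
|---|---|---|---|---|---|---|---|---|---|---|---|
| BananaBrain | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Steamhammer | 54% | 86% | 64% | 71% | 31% | 46% | 62% | 31% | 54% | 23% | 69% |
| Microwave | 44% | 57% | 43% | 79% | 23% | 77% | 8% | 15% | 38% | 38% | 62% |
| Ecgberht | 94% | 100% | 93% | 100% | 92% | 100% | 100% | 92% | 85% | 92% | 85% |
| PurpleWave | 99% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 92% |
| ZZZKBot | 80% | 93% | 79% | 79% | 38% | 100% | 69% | 77% | 85% | 92% | 85% |
| overall | 61.86% | 73% | 63% | 71% | 47% | 71% | 56% | 53% | 60% | 58% | 65% |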

Halo by Hao Pan seems to have consistent trouble on Aztec, the only map in the pool with a low-ground main and a ramp up to the natural. That could be the cause. Most bots underestimate the difficulty of defending the main from enemies on high ground.

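| Ecgberht | overall | Benzen | Destin | Heartb | Aztec | TauCro | Androm | Circui | Empire | Fortre | Python |
|---|---|---|---|---|---|---|---|---|---|---|---|
| BananaBrain | 2% | 14% | 0% | 7% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Steamhammer | 21% | 7% | 21% | 29% | 8% | 15% | 23% | 15% | 15% | 38% | 38% |
| Microwave | 3% | 21% | 0% | 7% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Halo | 6% | 0% | 7% | 0% | 8% | 0% | 0% | 8% | 15% | 8% | 15% |
| PurpleWave | 36% | 7% | 7% | 14% | 100% | 100% | 8% | 15% | 92% | 23% | 8% |
| ZZZKBot | 95% | 93% | 100% | 100% | 85% | 100% | 92% | 100% | 92% | 100% | 92% |
| overall | 27.35% | 24% | 23% | 26% | 32% | 36% | 21% | 23% | 36% | 28% | 26% |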

For Steamhammer, Ecgberht is a tricky opponent that can sometimes pull surprise wins. Other bots don’t seem to have the same experience.

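| PurpleWave | overall | Benzen | Destin | Heartb | Aztec | TauCro | Androm | Circui | Empire | Fortre | Python |
|---|---|---|---|---|---|---|---|---|---|---|---|
| BananaBrain | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Steamhammer | 5% | 0% | 0% | 0% | 0% | 0% | 8% | 8% | 8% | 0% | 23% |
| Microwave | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 8% |
| Halo | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 8% |
| Ecgberht | 64% | 93% | 93% | 86% | 0% | 0% | 92% | 85% | 8% | 77% | 92% |
| ZZZKBot | 56% | 93% | 43% | 57% | 0% | 77% | 62% | 62% | 77% | 46% | 38% |
| overall | 20.88% | 31% | 23% | 24% | 0% | 13% | 27% | 26% | 15% | 21% | 28% |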

PurpleWave crashes every game on Aztec, and frequently on other maps. :-(

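| ZZZKBot | overall | Benzen | Destin | Heartb | Aztec | TauCro | Androm | Circui | Empire | Fortre | Python |
|---|---|---|---|---|---|---|---|---|---|---|---|
| BananaBrain | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 8% | 0% | 0% | 0% |
| Steamhammer | 2% | 0% | 0% | 0% | 0% | 0% | 0% | 8% | 8% | 0% | 0% |
| Microwave | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Halo | 20% | 7% | 21% | 21% | 62% | 0% | 31% | 23% | 15% | 8% | 15% |
| Ecgberht | 5% | 7% | 0% | 0% | 15% | 0% | 8% | 0% | 8% | 0% | 8% |
| PurpleWave | 44% | 7% | 57% | 43% | 100% | 23% | 38% | 38% | 23% | 54% | 62% |
| overall | 11.79% | 4% | 13% | 11% | 29% | 4% | 13% | 13% | 9% | 10% | 14% |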

Comments

Dan on :

The timeouts are almost certainly due to the Tournament Manager bug* where client bots are launched in a manner that causes writing to stdout to block forever. I've fixed it since then (with separate configurations such that I allow stdout writes in some environments but not others) but haven't yet updated the AIIDE ladder version.

*I call it a Tournament Manager bug, but really I blame the design of Java's API for launching processes which I expect the TM is using; infinite blocking on writing to stdout is the *default behavior* which is not very clever. SCHNAIL ran into the same issue.

I'll upload a fixed version when I have a chance.

MicroDK on :

The AIIDE ladder has been reset and will be reset every Sunday. The read folders will now be reset as well; not resetting them was a bug before. ;)

Jay Scott on :

Will there be a way to get data from the previous week, so you can have the most complete possible data set?

MicroDK on :

I don't know. The ladder will reset at 11:59 pm GMT Sundays. So download the data before the reset.
