Starcraft AI Ladder crosstables
The Starcraft AI Ladder does not display crosstables or per-map results. I wanted to see the charts to know Steamhammer’s strengths and weaknesses, so I calculated them myself. I modified the script I use to analyze the CoG and AIIDE tournament results every year. The tournament manager’s results file is now in CSV format, which is a change, but of course it was no trouble to parse. The pop-up table legend explains how to tell whether each game should be counted, but it refers only to a “Duration” column, which the file itself names “Game Time” (to distinguish it from “Wall Time”), and an unstarted game has the value “0:00” rather than “00:00:00”. My script skipped a total of 3 games out of 2793, all of them with PurpleWave as one player and all due to GAME_STATE_NOT_UPDATED_60S_BOTH_BOTS.
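As a rough illustration of that filtering step, here is a minimal Python sketch, not the actual script. Only the “Game Time” and “Wall Time” column names come from the results file as described above; the “Winner” and “Loser” columns, the sample rows, and the function names are assumptions made up for the example.

```python
import csv
import io

# Values of "Game Time" that mark a game as never started. "0:00" is what
# the file is described as using; "00:00:00" is accepted too, as a precaution.
UNSTARTED = {"0:00", "00:00:00"}

def split_games(csv_text):
    """Return (counted, skipped) lists of rows from a results CSV string."""
    counted, skipped = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Game Time"].strip() in UNSTARTED:
            skipped.append(row)
        else:
            counted.append(row)
    return counted, skipped

# Hypothetical sample data; the real file has more columns than this.
sample = """Winner,Loser,Game Time,Wall Time
Steamhammer,Microwave,12:34,13:01
BananaBrain,PurpleWave,0:00,1:02
"""

counted, skipped = split_games(sample)
print(len(counted), "counted,", len(skipped), "skipped")
```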
I found that “Download Search Results” did not behave quite as its name suggests. It seemed to remember a previous search rather than the current settings, or at any rate it did something unexpected. But after a couple of tries I was able to get the complete record of games played since the last reset on 17 April (just a couple of days ago). I trimmed off the incomplete round 133, so the file I analyzed includes all games of rounds 0 through 132. 2793 games in 2 days is a great number, far more than BASIL plays.
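The trimming can also be done mechanically. With 7 bots, a full round robin has 21 pairings, and 133 rounds times 21 games is 2793, which matches the game count. The sketch below assumes that each round is exactly one full round robin, and that each game record carries a round number under a hypothetical "Round" key. It keeps only rounds that have the expected number of games.

```python
from collections import Counter
from math import comb

def drop_incomplete_rounds(games, num_bots):
    """Keep only games from rounds that contain a full round robin.

    Assumes one game per pairing per round, so a complete round has
    C(num_bots, 2) games. "Round" is a hypothetical field name.
    """
    expected = comb(num_bots, 2)
    games_per_round = Counter(g["Round"] for g in games)
    complete = {r for r, n in games_per_round.items() if n == expected}
    return [g for g in games if g["Round"] in complete]

# Tiny made-up example with 3 bots: round 0 is complete (3 games),
# round 1 is missing a game and is dropped.
example = [
    {"Round": 0, "Winner": "A", "Loser": "B"},
    {"Round": 0, "Winner": "A", "Loser": "C"},
    {"Round": 0, "Winner": "B", "Loser": "C"},
    {"Round": 1, "Winner": "A", "Loser": "B"},
    {"Round": 1, "Winner": "C", "Loser": "A"},
]
trimmed = drop_incomplete_rounds(example, num_bots=3)
print(len(trimmed))
```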
The ladder would be more valuable if it had more participants. As it is, I am learning from it, because nowhere else runs so many games so quickly.
crosstable
| # | bot | overall | Bana | Stea | Micr | Halo | Ecgb | Purp | ZZZK |
|---|---|---|---|---|---|---|---|---|---|
| 1 | BananaBrain | 86.22% | | 69% | 51% | 100% | 98% | 100% | 99% |
| 2 | Steamhammer | 71.68% | 31% | | 80% | 46% | 79% | 95% | 98% |
| 3 | Microwave | 70.05% | 49% | 20% | | 56% | 97% | 99% | 100% |
| 4 | Halo | 61.86% | 0% | 54% | 44% | | 94% | 99% | 80% |
| 5 | Ecgberht | 27.35% | 2% | 21% | 3% | 6% | | 36% | 95% |
| 6 | PurpleWave | 20.88% | 0% | 5% | 1% | 1% | 64% | | 56% |
| 7 | ZZZKBot | 11.79% | 1% | 2% | 0% | 20% | 5% | 44% | |
BananaBrain is on top in this small field. Steamhammer is doing well; it scores nearly a third versus BananaBrain, wins most games versus Microwave, and is about equal with Halo by Hao Pan. Thanks to the huge number of games, Steamhammer’s learning is saturated so this should be its peak performance. Steamhammer eked out a slight overall lead over Microwave only due to its dominating head-to-head results; against every other bot, Microwave scored better.
The version of PurpleWave is broken; it crashes or oversteps a frame time limit most games. I have to imagine that fixes are progressing in the workshop. I suspect that this version of ZZZKBot may not be working perfectly either, but I didn’t look into it.
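For readers who want to build a crosstable like the one above from their own game list, here is a minimal sketch. It assumes each counted game has been reduced to a `(winner, loser)` pair, which is my own simplification and not the format of the results file. The "overall" figure here pools all of a bot's games, which may or may not be how the numbers above were computed.

```python
from collections import defaultdict

def crosstable(games):
    """Map (bot, opponent) -> fraction of their games that bot won."""
    wins = defaultdict(int)
    played = defaultdict(int)
    for winner, loser in games:
        wins[(winner, loser)] += 1
        played[(winner, loser)] += 1
        played[(loser, winner)] += 1
    return {pair: wins[pair] / count for pair, count in played.items()}

def overall(games):
    """Map bot -> fraction of all its games that it won."""
    wins = defaultdict(int)
    played = defaultdict(int)
    for winner, loser in games:
        wins[winner] += 1
        played[winner] += 1
        played[loser] += 1
    return {bot: wins[bot] / count for bot, count in played.items()}

# Made-up games between placeholder bots A, B and C.
games = [("A", "B"), ("A", "B"), ("B", "A"), ("A", "C")]
table = crosstable(games)
totals = overall(games)
print(f"A vs B: {table[('A', 'B')]:.0%}, A overall: {totals['A']:.2%}")
```

Each cell and its mirror image sum to 100%, which is a handy sanity check on any crosstable.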
each bot’s results per map
| BananaBrain | overall | Benzen | Destin | Heartb | Aztec | TauCro | Androm | Circui | Empire | Fortre | Python |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Steamhammer | 69% | 79% | 57% | 64% | 77% | 31% | 77% | 77% | 77% | 69% | 85% |
| Microwave | 51% | 21% | 57% | 50% | 62% | 46% | 46% | 62% | 46% | 46% | 77% |
| Halo | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| Ecgberht | 98% | 86% | 100% | 93% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| PurpleWave | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| ZZZKBot | 99% | 100% | 100% | 100% | 100% | 100% | 100% | 92% | 100% | 100% | 100% |
| overall | 86.22% | 81% | 86% | 85% | 90% | 79% | 87% | 88% | 87% | 86% | 94% |
BananaBrain barely noticed opponents other than Steamhammer and Microwave. Against Steamhammer it had trouble on the map Tau Cross, and against Microwave on Benzene. Checking the mix of strategies played on those maps would probably explain the cause.
| Steamhammer | overall | Benzen | Destin | Heartb | Aztec | TauCro | Androm | Circui | Empire | Fortre | Python |
|---|---|---|---|---|---|---|---|---|---|---|---|
| BananaBrain | 31% | 21% | 43% | 36% | 23% | 69% | 23% | 23% | 23% | 31% | 15% |
| Microwave | 80% | 93% | 100% | 100% | 100% | 69% | 62% | 77% | 77% | 54% | 69% |
| Halo | 46% | 14% | 36% | 29% | 69% | 54% | 38% | 69% | 46% | 77% | 31% |
| Ecgberht | 79% | 93% | 79% | 71% | 92% | 85% | 77% | 85% | 85% | 62% | 62% |
| PurpleWave | 95% | 100% | 100% | 100% | 100% | 100% | 92% | 92% | 92% | 100% | 77% |
| ZZZKBot | 98% | 100% | 100% | 100% | 100% | 100% | 100% | 92% | 92% | 100% | 100% |
| overall | 71.68% | 70% | 76% | 73% | 81% | 79% | 65% | 73% | 69% | 71% | 59% |
Steamhammer’s results vary strongly from map to map. I think it is a sign that the opening selection is not paying enough attention to the map. I should have gone with a proper Bayesian calculation rather than an ad hoc algorithm.
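To make the idea concrete, one simple Bayesian treatment is a Beta-Binomial estimate: take an opening's overall win rate as a prior and update it with the games played on a particular map. The sketch below is only an illustration of that general technique, not Steamhammer's algorithm. The function name, the `prior_strength` parameter, and the numbers are all assumptions.

```python
def posterior_win_rate(map_wins, map_games, prior_mean, prior_strength):
    """Posterior mean win rate under a Beta prior and binomial results.

    prior_mean:     win rate expected before seeing any games on this map,
                    e.g. the opening's win rate across all maps.
    prior_strength: how many "virtual games" the prior is worth; larger
                    values make the estimate slower to follow map results.
    """
    alpha = prior_mean * prior_strength + map_wins
    beta = (1.0 - prior_mean) * prior_strength + (map_games - map_wins)
    return alpha / (alpha + beta)

# Hypothetical numbers: an opening wins 40% overall but has gone 3-1
# on one map. With a prior worth 10 games, the estimate moves only
# part of the way from 40% toward the raw 75%.
estimate = posterior_win_rate(map_wins=3, map_games=4,
                              prior_mean=0.40, prior_strength=10)
print(round(estimate, 3))
```

The appeal of this kind of estimate is that a handful of games on one map shifts the estimate without overriding the larger body of evidence, and the map-specific results take over gradually as games accumulate.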
| Microwave | overall | Benzen | Destin | Heartb | Aztec | TauCro | Androm | Circui | Empire | Fortre | Python |
|---|---|---|---|---|---|---|---|---|---|---|---|
| BananaBrain | 49% | 79% | 43% | 50% | 38% | 54% | 54% | 38% | 54% | 54% | 23% |
| Steamhammer | 20% | 7% | 0% | 0% | 0% | 31% | 38% | 23% | 23% | 46% | 31% |
| Halo | 56% | 43% | 57% | 21% | 77% | 23% | 92% | 85% | 62% | 62% | 38% |
| Ecgberht | 97% | 79% | 100% | 93% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| PurpleWave | 99% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 92% |
| ZZZKBot | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| overall | 70.05% | 68% | 67% | 61% | 69% | 68% | 81% | 74% | 73% | 77% | 64% |
Microwave also shows a lot of variation from map to map. That’s harder for me to interpret, even though it is the same evidence: Microwave has fewer openings overall than Steamhammer, so it is possible that poor results on some maps are due to not having an appropriate strategy available. Of course it could also be that the opponent’s play is much stronger on some maps.
| Halo | overall | Benzen | Destin | Heartb | Aztec | TauCro | Androm | Circui | Empire | Fortre | Python |
|---|---|---|---|---|---|---|---|---|---|---|---|
| BananaBrain | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Steamhammer | 54% | 86% | 64% | 71% | 31% | 46% | 62% | 31% | 54% | 23% | 69% |
| Microwave | 44% | 57% | 43% | 79% | 23% | 77% | 8% | 15% | 38% | 38% | 62% |
| Ecgberht | 94% | 100% | 93% | 100% | 92% | 100% | 100% | 92% | 85% | 92% | 85% |
| PurpleWave | 99% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 92% |
| ZZZKBot | 80% | 93% | 79% | 79% | 38% | 100% | 69% | 77% | 85% | 92% | 85% |
| overall | 61.86% | 73% | 63% | 71% | 47% | 71% | 56% | 53% | 60% | 58% | 65% |
Halo by Hao Pan seems to have consistent trouble on Aztec, the only map in the pool with a low-ground main and a ramp up to the natural. That could be the cause. Most bots underestimate the difficulty of defending the main from enemies on high ground.
| Ecgberht | overall | Benzen | Destin | Heartb | Aztec | TauCro | Androm | Circui | Empire | Fortre | Python |
|---|---|---|---|---|---|---|---|---|---|---|---|
| BananaBrain | 2% | 14% | 0% | 7% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Steamhammer | 21% | 7% | 21% | 29% | 8% | 15% | 23% | 15% | 15% | 38% | 38% |
| Microwave | 3% | 21% | 0% | 7% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Halo | 6% | 0% | 7% | 0% | 8% | 0% | 0% | 8% | 15% | 8% | 15% |
| PurpleWave | 36% | 7% | 7% | 14% | 100% | 100% | 8% | 15% | 92% | 23% | 8% |
| ZZZKBot | 95% | 93% | 100% | 100% | 85% | 100% | 92% | 100% | 92% | 100% | 92% |
| overall | 27.35% | 24% | 23% | 26% | 32% | 36% | 21% | 23% | 36% | 28% | 26% |
For Steamhammer, Ecgberht is a tricky opponent that can sometimes pull surprise wins. Other bots don’t seem to have the same experience.
| PurpleWave | overall | Benzen | Destin | Heartb | Aztec | TauCro | Androm | Circui | Empire | Fortre | Python |
|---|---|---|---|---|---|---|---|---|---|---|---|
| BananaBrain | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Steamhammer | 5% | 0% | 0% | 0% | 0% | 0% | 8% | 8% | 8% | 0% | 23% |
| Microwave | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 8% |
| Halo | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 8% |
| Ecgberht | 64% | 93% | 93% | 86% | 0% | 0% | 92% | 85% | 8% | 77% | 92% |
| ZZZKBot | 56% | 93% | 43% | 57% | 0% | 77% | 62% | 62% | 77% | 46% | 38% |
| overall | 20.88% | 31% | 23% | 24% | 0% | 13% | 27% | 26% | 15% | 21% | 28% |
PurpleWave crashes every game on Aztec, and frequently on other maps. :-(
| ZZZKBot | overall | Benzen | Destin | Heartb | Aztec | TauCro | Androm | Circui | Empire | Fortre | Python |
|---|---|---|---|---|---|---|---|---|---|---|---|
| BananaBrain | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 8% | 0% | 0% | 0% |
| Steamhammer | 2% | 0% | 0% | 0% | 0% | 0% | 0% | 8% | 8% | 0% | 0% |
| Microwave | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Halo | 20% | 7% | 21% | 21% | 62% | 0% | 31% | 23% | 15% | 8% | 15% |
| Ecgberht | 5% | 7% | 0% | 0% | 15% | 0% | 8% | 0% | 8% | 0% | 8% |
| PurpleWave | 44% | 7% | 57% | 43% | 100% | 23% | 38% | 38% | 23% | 54% | 62% |
| overall | 11.79% | 4% | 13% | 11% | 29% | 4% | 13% | 13% | 9% | 10% | 14% |
Comments
Dan on :
I call it a Tournament Manager bug, but really I blame the design of Java's API for launching processes, which I expect the TM is using; infinite blocking on writing to stdout is the *default behavior*, which is not very clever. SCHNAIL ran into the same issue.
I'll upload a fixed version when I have a chance.