
CIG 2018 - what Locutus learned

Locutus only recorded 8 games. It is configured to retain 200 game records, and I read the source code and verified that Locutus does not intentionally drop game records before the limit of 200. Recording exactly 8 games is the same problem that McRave suffered, and must be due to CIG problems. I don't know what the underlying problem was. My suspicion is that CIG organizers or tournament software may have accidentally or mistakenly cleared learning data for some bots. If that is what happened, and it happened once 8 games before the end of the tournament, it seems likely that it happened more than once. Who knows, though? The error might be somewhere else. Maybe they mistakenly shipped us data from after round 8 instead of round 125—in that case the tournament may have run normally, and only the data about it is wrong.

Locutus has prepared data for some opponents, stored in the AI directory. When Locutus finds it has no game records for a given opponent, it looks in AI to see if it has prepared data, and if so, it reads in those game records. At the end of the game, it writes out the prepared game records along with the record for the newly played game, and from then on the prepared records are treated like any others and retained unless and until the 200 record limit is passed.
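The mechanism can be sketched in a few lines. This is only my reconstruction of the behavior described above; the file names, directory layout, and one-record-per-line format are my assumptions, not Locutus's actual code:

```python
import os

RECORD_LIMIT = 200  # Locutus retains at most this many game records

def load_game_records(opponent, write_dir="write", prepared_dir="AI"):
    """Load past game records for an opponent; if none have been
    written yet, fall back to the prepared data shipped in AI."""
    learned = os.path.join(write_dir, opponent + ".txt")
    prepared = os.path.join(prepared_dir, opponent + ".txt")
    path = learned if os.path.exists(learned) else prepared
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return [line.rstrip("\n") for line in f if line.strip()]

def save_game_records(opponent, records, new_record, write_dir="write"):
    """Write out the old records plus the newly played game,
    dropping the oldest records past the limit."""
    records = (records + [new_record])[-RECORD_LIMIT:]
    with open(os.path.join(write_dir, opponent + ".txt"), "w") as f:
        f.write("\n".join(records) + "\n")
```

After the first game, the prepared records live in the write directory alongside the new record, so from then on they are treated like any others.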

How many other bots were affected by the 8 game problem?


Here is Locutus’s prepared data. Against some opponents, like McRave, Locutus picks out openings to avoid at first. If other openings don’t win either, I’m sure Locutus will come back and try these anyway. Against others, it picks out winners to try first. For some, it simply provides data. Most but not all of the prepared data is for opponents which were carried over from last year, for which pre-learning is sure to be helpful... if it is done on the same maps.

#3 mcrave

opening  games  wins
12Nexus5ZealotFECannons  1  0%
Turtle  1  0%
2 openings  2  0%

#6 iron

opening  games  wins
DTDrop  14  100%
1 openings  14  100%

#7 zzzkbot

opening  games  wins
ForgeExpand5GateGoon  2  100%
1 openings  2  100%

#11 ualbertabot

opening  games  wins
4GateGoon  1  0%
9-9GateDefensive  2  50%
ForgeExpand5GateGoon  15  93%
3 openings  18  83%

#14 aiur

opening  games  wins
4GateGoon  3  100%
9-9GateDefensive  1  100%
2 openings  4  100%

#16 ziabot

opening  games  wins
9-9GateDefensive  1  0%
ForgeExpand5GateGoon  1  100%
2 openings  2  50%

#19 terranuab

opening  games  wins
DTDrop  10  100%
1 openings  10  100%

#21 opprimobot

opening  games  wins
DTDrop  11  100%
1 openings  11  100%

#22 sling

opening  games  wins
ForgeExpand5GateGoon  2  100%
1 openings  2  100%

#23 srbotone

opening  games  wins
DTDrop  7  100%
PlasmaProxy2Gate  1  100%
2 openings  8  100%

#24 bonjwa

opening  games  wins
DTDrop  6  100%
PlasmaProxy2Gate  1  100%
2 openings  7  100%

overall

Each cell is games followed by win rate.

opening | total | PvT | PvP | PvZ | PvR
12Nexus5ZealotFECannons | 1 0% | - | 1 0% | - | -
4GateGoon | 4 75% | - | 3 100% | - | 1 0%
9-9GateDefensive | 4 50% | - | 1 100% | 1 0% | 2 50%
DTDrop | 48 100% | 48 100% | - | - | -
ForgeExpand5GateGoon | 20 95% | - | - | 5 100% | 15 93%
PlasmaProxy2Gate | 2 100% | 2 100% | - | - | -
Turtle | 1 0% | - | 1 0% | - | -
total | 80 92% | 50 100% | 6 67% | 6 83% | 18 83%
openings played | 7 | 2 | 4 | 2 | 3

Here is Locutus’s learned data. In every case, the number of games recorded is 8 plus the number of games in the prepared data. With only 8 games there is not much to go on, but the prepared data does seem to have helped Locutus choose successful openings.

#2 purplewave

opening  games  wins
12Nexus5ZealotFECannons  1  0%
4GateGoon  1  0%
9-9GateDefensive  5  80%
Proxy9-9Gate  1  0%
4 openings  8  50%

#3 mcrave

opening  games  wins
12Nexus5ZealotFECannons  1  0%
4GateGoon  3  67%
Proxy9-9Gate  5  100%
Turtle  1  0%
4 openings  10  70%

#4 tscmoo

opening  games  wins
4GateGoon  1  0%
9-9GateDefensive  1  0%
ForgeExpand5GateGoon  4  25%
Proxy9-9Gate  2  50%
4 openings  8  25%

#5 isamind

opening  games  wins
4GateGoon  6  83%
9-9GateDefensive  1  100%
Proxy9-9Gate  1  100%
3 openings  8  88%

#6 iron

opening  games  wins
DTDrop  22  95%
1 openings  22  95%

#7 zzzkbot

opening  games  wins
ForgeExpand5GateGoon  7  86%
ForgeExpandSpeedlots  2  50%
Proxy9-9Gate  1  0%
3 openings  10  70%

#8 microwave

opening  games  wins
ForgeExpand5GateGoon  8  100%
1 openings  8  100%

#9 letabot

opening  games  wins
DTDrop  8  88%
1 openings  8  88%

#10 megabot

opening  games  wins
4GateGoon  8  100%
1 openings  8  100%

#11 ualbertabot

opening  games  wins
4GateGoon  1  0%
9-9GateDefensive  2  50%
ForgeExpand5GateGoon  23  91%
3 openings  26  85%

#12 tyr

opening  games  wins
4GateGoon  8  100%
1 openings  8  100%

#13 ecgberht

opening  games  wins
DTDrop  8  88%
1 openings  8  88%

#14 aiur

opening  games  wins
12Nexus5ZealotFECannons  1  0%
2GateDTExpo  1  100%
4GateGoon  5  80%
9-9GateDefensive  1  100%
Proxy9-9Gate  4  75%
5 openings  12  75%

#15 titaniron

opening  games  wins
DTDrop  8  100%
1 openings  8  100%

#16 ziabot

opening  games  wins
9-9GateDefensive  1  0%
ForgeExpand5GateGoon  6  83%
ForgeExpandSpeedlots  2  50%
Proxy9-9Gate  1  100%
4 openings  10  70%

#17 steamhammer

opening  games  wins
ForgeExpand5GateGoon  8  100%
1 openings  8  100%

#18 overkill

opening  games  wins
ForgeExpand5GateGoon  8  100%
1 openings  8  100%

#19 terranuab

opening  games  wins
DTDrop  18  100%
1 openings  18  100%

#20 cunybot

opening  games  wins
ForgeExpand5GateGoon  8  100%
1 openings  8  100%

#21 opprimobot

opening  games  wins
DTDrop  19  100%
1 openings  19  100%

#22 sling

opening  games  wins
ForgeExpand5GateGoon  10  100%
1 openings  10  100%

#23 srbotone

opening  games  wins
DTDrop  15  100%
PlasmaProxy2Gate  1  100%
2 openings  16  100%

#24 bonjwa

opening  games  wins
DTDrop  14  100%
PlasmaProxy2Gate  1  100%
2 openings  15  100%

#25 stormbreaker

opening  games  wins
ForgeExpand5GateGoon  8  100%
1 openings  8  100%

#26 korean

opening  games  wins
ForgeExpand5GateGoon  8  100%
1 openings  8  100%

#27 salsa

opening  games  wins
ForgeExpand5GateGoon  8  100%
1 openings  8  100%

overall

Each cell is games followed by win rate.

opening | total | PvT | PvP | PvZ | PvR
12Nexus5ZealotFECannons | 3 0% | - | 3 0% | - | -
2GateDTExpo | 1 100% | - | 1 100% | - | -
4GateGoon | 33 82% | - | 31 87% | - | 2 0%
9-9GateDefensive | 11 64% | - | 7 86% | 1 0% | 3 33%
DTDrop | 112 97% | 112 97% | - | - | -
ForgeExpand5GateGoon | 106 93% | - | - | 79 97% | 27 81%
ForgeExpandSpeedlots | 4 50% | - | - | 4 50% | -
PlasmaProxy2Gate | 2 100% | 2 100% | - | - | -
Proxy9-9Gate | 15 73% | - | 11 82% | 2 50% | 2 50%
Turtle | 1 0% | - | 1 0% | - | -
total | 288 90% | 114 97% | 54 80% | 86 93% | 34 71%
openings played | 10 | 2 | 6 | 4 | 4

CIG 2018 - what Steamhammer learned

I wrote a new script to analyze Steamhammer’s learning data. A couple of points:

1. Steamhammer crashed in nearly half of its games in CIG 2018. It can’t save learning data after a crash, so against some opponents Steamhammer had few opportunities to experiment. The number of crashes varied strongly depending on the opponent.

2. Steamhammer was set to remember the previous 100 games, since I figure there’s no play advantage to remembering more. The tournament was 125 rounds long. So in the tables below, “100 games” means that Steamhammer played at least 100 games without crashing, and up to 25 of the early games may have been dropped from the record. Against some weak opponents, Steamhammer learned within 25 games how to win 100% of the remaining games, so those tables show a 100% win rate over the remembered games. Steamhammer did not score 100% against any opponent overall; it always had some losses in the early games.

I should be able to run the same analysis for Steamhammer forks which retain Steamhammer’s opponent model file format.
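The analysis script itself is simple at heart. Here is a sketch of the core tally; the one-line-per-game "opening result" record format is a stand-in of mine, not Steamhammer's actual opponent model format:

```python
from collections import defaultdict

def tally_openings(lines):
    """Count games and wins per opening from 'opening win|loss' records.

    Returns {opening: (games, win_percent)}, like the tables below.
    """
    games = defaultdict(int)
    wins = defaultdict(int)
    for line in lines:
        opening, result = line.split()
        games[opening] += 1
        if result == "win":
            wins[opening] += 1
    return {op: (games[op], round(100.0 * wins[op] / games[op]))
            for op in games}
```

Crashed games never reach the learning file at all, which is why the per-opponent totals below often fall well short of 125.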

#1 Locutus

opening  games  wins
2HatchHydraBust  1  0%
3HatchHydraExpo  2  0%
3HatchLingBust  1  0%
3HatchLingExpo  1  0%
4HatchBeforeGas  1  0%
OverpoolSpeed  9  56%
6 openings  15  33%

A mystery is solved. Why was Steamhammer’s crash rate higher than I expected? Because many opponents learned to make Steamhammer crash. A crash by the opponent counts as a win, and a bot doesn’t care how it wins, so if it can learn a plan that makes the opponent crash reliably, it will. The stronger opponents tend to be learning bots, so Steamhammer crashed more often on average against strong opponents. This also means that my glib conclusion that, since Steamhammer won 66% of its non-crash games, it seems to have kept up with general progress, is not sound. The non-crash games were mostly against weak opponents.

Locutus was lucky that it could figure out how to break Steamhammer. As Bruce mentioned in a comment, this Locutus version had a bug when facing certain zergling timings, and Steamhammer quickly figured out how to exploit the bug. It’s possible that Steamhammer minus the crash would have upset Locutus.

#2 PurpleWave

opening  games  wins
11Gas10PoolMuta  1  0%
3HatchHydra  3  0%
3HatchLurker  1  0%
4PoolSoft  1  0%
7Pool12Hatch  1  0%
7PoolSoft  1  0%
9Hatch8Pool  1  0%
9HatchExpo9Pool9Gas  1  0%
9PoolSpeed  1  0%
AntiFactory  1  0%
Over10Hatch  6  0%
Over10Hatch1Sunk  7  0%
Over10Hatch2Sunk  18  0%
Over10HatchBust  1  0%
Over10HatchSlowLings  4  0%
OverhatchMuta  1  0%
OverpoolHatch  1  0%
OverpoolTurtle  3  0%
ZvP_3HatchPoolHydra  2  0%
ZvP_4HatchPoolHydra  1  0%
ZvT_12PoolMuta  1  0%
ZvZ_Overpool11Gas  1  0%
22 openings  58  0%

PurpleWave shut out Steamhammer. It didn’t learn to make Steamhammer crash because every game was a win for it anyway. Steamhammer desperately tried alternatives all over the map, including crazy all-ins and openings intended for ZvT and ZvZ, and nothing worked.

#3 McRave

opening  games  wins
11Gas10PoolLurker  1  0%
4HatchBeforeGas  1  0%
9HatchExpo9Pool9Gas  1  0%
9PoolSpeed  5  100%
ZvP_3HatchPoolHydra  2  0%
5 openings  10  50%

#4 tscmoo

opening  games  wins
9PoolExpo  1  0%
9PoolHatch  1  0%
9PoolSunkHatch  1  0%
AntiFact_2Hatch  1  0%
Over10Hatch2Sunk  1  0%
OverhatchExpoLing  13  15%
OverpoolSpeed  22  23%
7 openings  40  18%

#5 ISAMind

opening  games  wins
3HatchHydraExpo  1  0%
4HatchBeforeGas  1  0%
OverpoolSpeed  4  100%
ZvP_2HatchMuta  7  0%
ZvP_3HatchPoolHydra  6  0%
5 openings  19  21%

#6 Iron

opening  games  wins
2HatchHydra  1  0%
3HatchLingExpo  2  0%
4PoolHard  1  0%
6PoolSpeed  1  0%
9Hatch8Pool  1  0%
9HatchMain9Pool9Gas  1  0%
9PoolSunkSpeed  1  0%
AntiFact_13Pool  4  0%
AntiFact_2Hatch  83  12%
AntiFactory  1  0%
Over10Hatch  1  0%
PurpleSwarmBuild  1  0%
ZvP_2HatchMuta  1  0%
ZvT_12PoolMuta  1  0%
14 openings  100  10%

Iron is not a learning bot, so it did not learn to crash Steamhammer. Still, these results show a weakness in Steamhammer: Its best opening against Iron is AntiFactory, which it tried only once in these 100 games. Steamhammer did not explore enough. I tried to fix the weakness in Steamhammer 2.0.

#7 ZZZKBot

opening  games  wins
11Gas10PoolMuta  1  0%
8Pool  7  29%
9HatchMain9Pool9Gas  1  0%
9PoolSpeed  1  0%
OverhatchMuta  1  0%
Overpool+1  1  0%
OverpoolSpeed  1  0%
ZvZ_12HatchMain  2  0%
ZvZ_12Pool  1  0%
ZvZ_12PoolLing  48  58%
ZvZ_Overgas9Pool  2  0%
ZvZ_Overpool9Gas  2  0%
12 openings  68  44%

#8 Microwave

opening  games  wins
9PoolSunkHatch  5  80%
9PoolSunkSpeed  27  67%
OverpoolSunk  1  0%
OverpoolTurtle  3  33%
ZvZ_12PoolLing  1  0%
5 openings  37  62%

This looks like successful learning. Too bad Steamhammer only successfully played 37 of the 125 games.

#9 LetaBot

opening  games  wins
11Gas10PoolLurker  1  0%
2HatchLurkerAllIn  4  0%
3HatchHydraExpo  1  0%
3HatchLurker  13  38%
9HatchExpo9Pool9Gas  45  36%
OverpoolLurker  13  31%
ZvP_2HatchMuta  1  0%
ZvT_12PoolMuta  1  0%
ZvT_13Pool  1  0%
ZvT_3HatchMuta  1  0%
10 openings  81  31%

#10 MegaBot

opening  games  wins
11Gas10PoolLurker  1  0%
3HatchHydra  1  0%
3HatchHydraExpo  1  0%
3HatchLingExpo  21  43%
Over10Hatch  1  0%
OverhatchExpoLing  1  100%
ZvP_3HatchPoolHydra  2  0%
7 openings  28  36%

#11 UAlbertaBot

opening  games  wins
3HatchLingExpo  1  0%
5PoolHard2Player  1  0%
9PoolExpo  1  0%
9PoolSpeed  1  0%
9PoolSunkHatch  46  33%
9PoolSunkSpeed  29  48%
Over10Hatch1Sunk  2  0%
OverpoolSpeed  1  0%
ZvZ_Overpool9Gas  1  0%
9 openings  83  35%

#12 Tyr

opening  games  wins
9PoolHatch  5  100%
ZvP_3HatchPoolHydra  5  0%
2 openings  10  50%

#13 Ecgberht

opening  games  wins
11Gas10PoolLurker  10  50%
2HatchLurker  23  61%
2HatchLurkerAllIn  44  75%
Over10HatchBust  3  33%
OverpoolLurker  8  75%
OverpoolSpeed  3  33%
ZvT_13Pool  1  0%
7 openings  92  65%

#14 Aiur

opening  games  wins
11Gas10PoolLurker  1  100%
5PoolHard2Player  1  100%
9PoolSunkHatch  1  100%
9PoolSunkSpeed  2  100%
Over10Hatch  1  0%
Over10Hatch1Sunk  2  50%
Over10Hatch2Hard  1  100%
Over10HatchSlowLings  1  100%
OverpoolSpeed  2  100%
OverpoolTurtle  3  67%
10 openings  15  80%

#15 TitanIron

opening  games  wins
3HatchLingBust  1  0%
AntiFact_13Pool  6  50%
AntiFact_2Hatch  1  0%
AntiFactory  74  42%
Over10Hatch2Sunk  1  0%
OverhatchExpoMuta  1  0%
OverpoolLurker  1  0%
ZvZ_Overgas9Pool  14  21%
ZvZ_Overpool9Gas  1  0%
9 openings  100  37%

This selection of openings implies that TitanIron plays a factory-first build against zerg, like Iron, and is a non-learning bot, like Iron. Later I’ll look into the source and find out for sure.

#16 Ziabot

opening  games  wins
11Gas10PoolMuta  4  25%
2.5HatchMuta  1  0%
3HatchHydraBust  1  0%
6PoolSpeed  1  0%
8Pool  7  71%
9Hatch8Pool  1  0%
9PoolHatch  4  50%
ZvP_2HatchTurtle  1  0%
ZvZ_12Pool  1  0%
ZvZ_12PoolMain  16  25%
ZvZ_Overpool11Gas  10  50%
ZvZ_Overpool9Gas  53  74%
12 openings  100  56%

Low win rates against Zia and some other opponents suggest to me that Steamhammer had other new weaknesses besides crashing. I think Steamhammer should score over 80% against Zia.

#18 Overkill

opening  games  wins
11Gas10PoolMuta  10  90%
4PoolHard  23  96%
6PoolSpeed  28  100%
9Hatch8Pool  1  0%
OverhatchLing  2  50%
OverpoolSpeed  13  92%
ZvZ_12HatchExpo  2  50%
ZvZ_12PoolMain  1  0%
8 openings  80  91%

#19 TerranUAB

opening  games  wins
2HatchLurker  52  90%
AntiFact_13Pool  8  88%
AntiFact_2Hatch  9  78%
AntiFactory  31  90%
4 openings  100  89%

#20 CUNYbot

opening  games  wins
11Gas10PoolMuta  9  78%
OverhatchLing  34  97%
ZvZ_12PoolLing  27  96%
ZvZ_Overgas9Pool  1  0%
ZvZ_Overpool9Gas  19  89%
5 openings  90  92%

#21 OpprimoBot

opening  games  wins
11Gas10PoolLurker  3  67%
2HatchLurker  2  50%
2HatchLurkerAllIn  6  83%
6PoolSpeed  19  100%
OverpoolLurker  1  0%
OverpoolSpeed  5  80%
ZvT_12PoolMuta  20  95%
ZvT_3HatchMuta  20  100%
ZvT_3HatchMutaExpo  24  100%
9 openings  100  94%

#22 Sling

opening  games  wins
4PoolHard  4  75%
4PoolSoft  6  100%
5PoolHard2Player  3  100%
ZvZ_12HatchMain  1  0%
ZvZ_Overgas9Pool  1  0%
5 openings  15  80%

The selection of fast rush openings suggests that Sling played a macro strategy which was countered by fast rushes. But I don’t want to draw strong conclusions based on 15 non-crash games out of 125.

#23 SRbotOne

opening  games  wins
11Gas10PoolLurker  14  93%
2HatchLurker  10  90%
2HatchLurkerAllIn  10  90%
3HatchLurker  17  100%
4PoolSoft  17  100%
5PoolHard  7  100%
9HatchExpo9Pool9Gas  4  75%
9PoolLurker  3  100%
OverpoolLurker  5  100%
9 openings  87  95%

The wide range of lurker openings means that SRbotOne by Johan Kayser fought with mostly barracks units. Well, we already knew that.

#24 Bonjwa

opening  games  wins
9PoolExpo  6  100%
9PoolSunkHatch  5  100%
9PoolSunkSpeed  5  100%
AntiFact_2Hatch  3  100%
AntiFactory  5  100%
ZvT_2HatchMuta  1  100%
6 openings  25  100%

#25 Stormbreaker

opening  games  wins
11Gas10PoolMuta  1  100%
4PoolHard  1  100%
9PoolSunkHatch  8  100%
9PoolSunkSpeed  8  100%
OverhatchLing  1  100%
OverhatchMuta  7  100%
OverpoolSpeed  1  100%
OverpoolSunk  7  100%
ZvZ_12HatchExpo  2  100%
ZvZ_12HatchMain  3  100%
ZvZ_12PoolLing  1  100%
ZvZ_12PoolMain  3  100%
12 openings  43  100%

#26 Korean

opening  games  wins
4PoolHard  1  100%
4PoolSoft  3  100%
5PoolHard  5  100%
5PoolHard2Player  3  100%
5PoolSoft  1  100%
6PoolSpeed  6  100%
OverhatchLing  9  100%
OverhatchMuta  12  100%
ZvZ_12HatchExpo  13  100%
ZvZ_12HatchMain  16  100%
ZvZ_12PoolLing  14  100%
ZvZ_12PoolMain  17  100%
12 openings  100  100%

#27 Salsa

opening  games  wins
4PoolHard  2  100%
4PoolSoft  4  100%
5PoolHard  7  100%
5PoolHard2Player  1  100%
5PoolSoft  1  100%
6PoolSpeed  8  100%
OverhatchLing  11  100%
OverhatchMuta  8  100%
ZvZ_12HatchExpo  12  100%
ZvZ_12HatchMain  20  100%
ZvZ_12PoolLing  13  100%
ZvZ_12PoolMain  12  100%
ZvZ_Overgas9Pool  1  100%
13 openings  100  100%

overall

Each cell is games followed by win rate.

opening | total | ZvT | ZvP | ZvZ | ZvR
11Gas10PoolLurker | 31 68% | 28 71% | 3 33% | - | -
11Gas10PoolMuta | 26 69% | - | 1 0% | 25 72% | -
2.5HatchMuta | 1 0% | - | - | 1 0% | -
2HatchHydra | 1 0% | 1 0% | - | - | -
2HatchHydraBust | 1 0% | - | 1 0% | - | -
2HatchLurker | 87 82% | 87 82% | - | - | -
2HatchLurkerAllIn | 64 73% | 64 73% | - | - | -
3HatchHydra | 4 0% | - | 4 0% | - | -
3HatchHydraBust | 1 0% | - | - | 1 0% | -
3HatchHydraExpo | 5 0% | 1 0% | 4 0% | - | -
3HatchLingBust | 2 0% | 1 0% | 1 0% | - | -
3HatchLingExpo | 25 36% | 2 0% | 22 41% | - | 1 0%
3HatchLurker | 31 71% | 30 73% | 1 0% | - | -
4HatchBeforeGas | 3 0% | - | 3 0% | - | -
4PoolHard | 32 91% | 1 0% | - | 31 94% | -
4PoolSoft | 31 97% | 17 100% | 1 0% | 13 100% | -
5PoolHard | 19 100% | 7 100% | - | 12 100% | -
5PoolHard2Player | 9 89% | - | 1 100% | 7 100% | 1 0%
5PoolSoft | 2 100% | - | - | 2 100% | -
6PoolSpeed | 63 97% | 20 95% | - | 43 98% | -
7Pool12Hatch | 1 0% | - | 1 0% | - | -
7PoolSoft | 1 0% | - | 1 0% | - | -
8Pool | 14 50% | - | - | 14 50% | -
9Hatch8Pool | 4 0% | 1 0% | 1 0% | 2 0% | -
9HatchExpo9Pool9Gas | 51 37% | 49 39% | 2 0% | - | -
9HatchMain9Pool9Gas | 2 0% | 1 0% | - | 1 0% | -
9PoolExpo | 8 75% | 6 100% | - | - | 2 0%
9PoolHatch | 10 70% | - | 5 100% | 4 50% | 1 0%
9PoolLurker | 3 100% | 3 100% | - | - | -
9PoolSpeed | 8 62% | - | 6 83% | 1 0% | 1 0%
9PoolSunkHatch | 66 50% | 5 100% | 1 100% | 13 92% | 47 32%
9PoolSunkSpeed | 72 65% | 6 83% | 2 100% | 35 74% | 29 48%
AntiFact_13Pool | 18 56% | 18 56% | - | - | -
AntiFact_2Hatch | 97 21% | 96 21% | - | - | 1 0%
AntiFactory | 112 57% | 111 58% | 1 0% | - | -
Over10Hatch | 9 0% | 1 0% | 8 0% | - | -
Over10Hatch1Sunk | 11 9% | - | 9 11% | - | 2 0%
Over10Hatch2Hard | 1 100% | - | 1 100% | - | -
Over10Hatch2Sunk | 20 0% | 1 0% | 18 0% | - | 1 0%
Over10HatchBust | 4 25% | 3 33% | 1 0% | - | -
Over10HatchSlowLings | 5 20% | - | 5 20% | - | -
OverhatchExpoLing | 14 21% | - | 1 100% | - | 13 15%
OverhatchExpoMuta | 1 0% | 1 0% | - | - | -
OverhatchLing | 57 96% | - | - | 57 96% | -
OverhatchMuta | 29 93% | - | 1 0% | 28 96% | -
Overpool+1 | 1 0% | - | - | 1 0% | -
OverpoolHatch | 1 0% | - | 1 0% | - | -
OverpoolLurker | 28 54% | 28 54% | - | - | -
OverpoolSpeed | 61 56% | 8 62% | 15 73% | 15 87% | 23 22%
OverpoolSunk | 8 88% | - | - | 8 88% | -
OverpoolTurtle | 9 33% | - | 6 33% | 3 33% | -
PurpleSwarmBuild | 1 0% | 1 0% | - | - | -
ZvP_2HatchMuta | 9 0% | 2 0% | 7 0% | - | -
ZvP_2HatchTurtle | 1 0% | - | - | 1 0% | -
ZvP_3HatchPoolHydra | 17 0% | - | 17 0% | - | -
ZvP_4HatchPoolHydra | 1 0% | - | 1 0% | - | -
ZvT_12PoolMuta | 23 83% | 22 86% | 1 0% | - | -
ZvT_13Pool | 2 0% | 2 0% | - | - | -
ZvT_2HatchMuta | 1 100% | 1 100% | - | - | -
ZvT_3HatchMuta | 21 95% | 21 95% | - | - | -
ZvT_3HatchMutaExpo | 24 100% | 24 100% | - | - | -
ZvZ_12HatchExpo | 29 97% | - | - | 29 97% | -
ZvZ_12HatchMain | 42 93% | - | - | 42 93% | -
ZvZ_12Pool | 2 0% | - | - | 2 0% | -
ZvZ_12PoolLing | 104 79% | - | - | 104 79% | -
ZvZ_12PoolMain | 49 73% | - | - | 49 73% | -
ZvZ_Overgas9Pool | 19 21% | 14 21% | - | 5 20% | -
ZvZ_Overpool11Gas | 11 45% | - | 1 0% | 10 50% | -
ZvZ_Overpool9Gas | 76 74% | 1 0% | - | 74 76% | 1 0%
total | 1596 64% | 685 62% | 155 26% | 633 82% | 123 29%
openings played | 69 | 37 | 36 | 31 | 13

This summary table took me hours to get right, so I hope it's useful.

Steamhammer played 69 openings in 1596 non-crash games, around two-thirds of the openings it knows. No single matchup had more than 37 different openings. There were far more games against terran and zerg than against protoss and random, partly due to the crashing pattern. Against the random opponents (Tscmoo and UAlbertaBot), it settled on mostly general-purpose openings, as you might expect. Its best matchup was ZvZ, with a Jaedong-like 82% win rate (and lately, Jaedong crashes half the time too, so they’re just alike).

Openings that were both popular and successful include 2HatchLurker and 2HatchLurkerAllIn versus terran, 6PoolSpeed with a 97% win rate against mostly weak opponents, 9PoolSunkSpeed used across all matchups, and ZvZ specialties OverhatchLing, ZvZ_12PoolLing, and ZvZ_Overpool9Gas. None of the opening choices surprises me, though some of the win rates do.

CIG 2018 - Overkill was broken

Did Overkill actually perform much worse in CIG 2018 than in past years? Here are the bots carried over from 2017 to 2018, with win rates for both years taken from the official results. Overkill’s win rate collapsed from 2017 to 2018, a far bigger change than for any other bot. Iron performed poorly in 2017 because it failed on the map Hitchhiker. Other bots mostly had modestly lower win rates in this year’s stronger field. My 2017 crosstable was calculated from the detailed results, which included some corrupted data and differ a little from the official results. The exception is Sling, which differs a lot: I calculated 26.07% for 2017 versus its official 18.08%, which would shrink its year-over-year difference.

bot | 2017 | 2018
UAlbertaBot | 65.59% | 60.58%
Overkill | 62.75% | 34.68%
Ziabot | 61.75% | 51.08%
Iron | 61.62% | 74.31%
Aiur | 59.83% | 51.54%
TerranUAB | 36.78% | 34.40%
SRbotOne | 34.14% | 24.37%
OpprimoBot | 30.69% | 27.11%
Bonjwa | 30.67% | 23.57%
Sling | 18.08% | 26.52%
Salsa | 4.64% | 1.54%

Was the difference due to the maps? No. In 2017, Overkill scored 57% or more on every map (CIG 2017 bots x maps). In 2018, Overkill scored 38% or below on every map (official results). And 3 of the 5 maps were the same: Tau Cross, Andromeda, and Python.

Did they run different versions of Overkill? The source that they distributed for Overkill is identical in both years. Theoretically they might have run something different by mistake—but it produced the expected files in the write directory, so it would be a surprise.

Finally I downloaded the Overkill replays and watched some. The poor bot’s build orders were severely distorted, skipping over drones and buildings. It would do things like take gas on 7 and then stop all construction, or follow a normal-ish build but drop many drones so that its economy was anemic. Sometimes drones moved erratically instead of mining. It looked similar to play I’ve seen from Steamhammer when latency stuff is way out of whack. Of the games I looked at, some were hopelessly muddled, some were close to normal with only occasional dropped drones, and none were 100% good. I don’t know what the problem was, something corrupted or a server setting that Overkill could not cope with, but whatever it was, Overkill was badly broken and far short of its normal strength.

43864-OVER_ZIAB.REP (Overkill’s last game of the tournament) is an example replay that shows the problems.

It’s possible that other bots were affected too. If the difference was in a server setting that Overkill was not ready for, it would be surprising if every other bot was ready.

CIG 2018 - what Overkill learned

After analyzing AIUR yesterday, I ran a similar (but much simpler) analysis for the classic zerg #18 Overkill. The version in CIG 2018 has not been updated since 2015 and is the same version that still plays on SSCAIT. In 2015 it was a sensation, placing 3rd in both CIG and AIIDE; its finish at #18 in this tournament, with about a 35% win rate, shows how much the field has progressed over the past 3 years. But keep reading; Overkill appears to have been broken in this tournament. I did this analysis once before: See what Overkill learned in AIIDE 2015.

Classic Overkill knows 3 openings, a 9 pool opening which stays on one base for a good time, and 10- and 12-hatch openings to get mutalisks first. When it chooses 9 pool, that means that the opponent is either rushing (so the 9 pool is necessary to defend) or is being too greedy (which the 9 pool can exploit). Overkill counts some games twice in an attempt to learn faster, so sometimes its total game count is larger than the number of rounds in the tournament (125).
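"Counting some games twice" amounts to giving certain games extra weight in the win-rate estimate that drives opening selection. The rule below (double weight for the most recent games) is only an illustration of the idea, not Overkill's actual scheme:

```python
def weighted_win_rate(results, double_last=10):
    """Estimate an opening's win rate from a list of 0/1 results,
    counting each of the most recent `double_last` games twice."""
    total = wins = 0
    for i, won in enumerate(results):
        weight = 2 if i >= len(results) - double_last else 1
        total += weight
        wins += weight * won
    return wins / total if total else 0.0
```

With a rule like this, the effective game count can exceed the number of games actually played, just as Overkill's totals exceed the 125 rounds.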

Each cell is n games followed by win rate.

opponent | NinePoolling | TenHatchMuta | TwelveHatchMuta | total
#1 Locutus | 42 0% | 42 0% | 41 0% | 125 0%
#2 PurpleWave | 43 0% | 43 0% | 42 0% | 128 0%
#3 McRave | 44 0% | 44 0% | 43 0% | 131 0%
#4 tscmoo | 40 0% | 40 0% | 47 2% | 127 1%
#5 ISAMind | 42 0% | 42 0% | 41 0% | 125 0%
#6 Iron | 54 7% | 32 0% | 39 3% | 125 4%
#7 ZZZKBot | 47 2% | 39 0% | 47 2% | 133 2%
#8 Microwave | 54 6% | 35 0% | 42 2% | 131 3%
#9 LetaBot | 52 6% | 33 0% | 40 2% | 125 3%
#10 MegaBot | 60 12% | 24 0% | 41 7% | 125 8%
#11 UAlbertaBot | 41 0% | 41 0% | 48 2% | 130 1%
#12 Tyr | 40 0% | 39 0% | 47 2% | 126 1%
#13 Ecgberht | 57 16% | 24 4% | 42 12% | 123 12%
#14 Aiur | 94 34% | 14 7% | 17 12% | 125 28%
#15 TitanIron | 36 11% | 20 0% | 69 16% | 125 12%
#16 Ziabot | 16 0% | 16 0% | 93 23% | 125 17%
#17 Steamhammer | 107 48% | 7 0% | 10 10% | 124 42%
#19 TerranUAB | 24 67% | 3 0% | 98 83% | 125 78%
#20 CUNYbot | 18 44% | 6 17% | 101 66% | 125 61%
#21 OpprimoBot | 36 67% | 3 0% | 86 76% | 125 71%
#22 Sling | 67 46% | 6 0% | 52 42% | 125 42%
#23 SRbotOne | 23 74% | 4 25% | 95 89% | 122 84%
#24 Bonjwa | 75 92% | 4 25% | 46 87% | 125 88%
#25 Stormbreaker | 70 91% | 2 0% | 53 87% | 125 88%
#26 Korean | 77 99% | 2 0% | 46 93% | 125 95%
#27 Salsa | 46 100% | 32 94% | 46 100% | 124 98%
total | 1305 36% | 597 6% | 1372 40% | 3274 32%

The 10 hatch opening was useless in this tournament—against every opponent, 10 hatch was the worst choice, at best tying for 0. In 2015, 10 hatch was about as successful as the other openings.

Signs are that something was wrong with Overkill in this tournament. In AIIDE 2015, then #3 Overkill scored 23% against then #4 UAlbertaBot, 68% against #5 AIUR, and 99% against #17 OpprimoBot. In CIG 2018, it was 1.6% against UAlbertaBot, 28% against AIUR, 71% against OpprimoBot. All versions appear to be the same in both tournaments—I didn’t look closely, but I did unpack the sources and check dates (in particular, Overkill has file change dates up to 8 October 2015 in both tournaments). Overkill had 14 crash games in CIG 2018, not enough to account for the difference. It’s hard to believe that the maps could have shifted results that much.

Tomorrow: What went wrong with Overkill?

CIG 2018 - what AIUR learned

Here is what the classic protoss bot AIUR learned about each opponent over the course of CIG 2018. AIUR has not been updated in many years and has fallen behind the state of the art, but its varied strategies and learning still make it a tricky opponent in a long tournament. Seeing AIUR's counters for each opponent tells us something about how the opponent played. For past editions, see AIIDE 2017 what AIUR learned and what AIUR learned (AIIDE 2015).

This is generated from data in AIUR's final write directory. There were 125 rounds and 5 maps, one 2-player and two each 3- and 4-player maps. For some opponents, all games were recorded, giving 25 games on the 2-player map and 50 games each on 3- and 4-player maps. For most opponents, fewer games were recorded. AIUR recorded 2932 games, and the results table lists 318 crashes for AIUR. 2932 + 318 = 3250, the correct total game count. Unrecorded games were lost due to crashes, and for no other reason.

First the overview, summing across all opponents.

overall

strategy | 2 | 3 | 4 | total
cheese | 72 49% | 127 65% | 132 35% | 331 49%
rush | 29 41% | 269 33% | 261 55% | 559 44%
aggressive | 13 23% | 225 68% | 184 78% | 422 71%
fast expo | 33 24% | 185 48% | 207 48% | 425 46%
macro | 46 33% | 180 52% | 135 60% | 361 53%
defensive | 141 75% | 314 73% | 379 55% | 834 65%
total | 334 54% | 1300 56% | 1298 56% | 2932 56%
  • 2, 3, 4 - map size, the number of starting positions
  • n - games recorded
  • wins - winning percentage over those games
  • cheese - cannon rush
  • rush - dark templar rush
  • aggressive - fast 4 zealot drop
  • fast expo - nexus first
  • macro - aim for a strong middle game army
  • defensive - try to be safe against rushes

Looking across the bottom row, you can see that AIUR had a plus score on every size of map, and that it had to choose different strategies to do so well. It's a strong result for a bot which has essentially no micro skills and has not been updated since 2014. It does still have the best cannon rush of any bot, if you ask me.

#1 locutus

strategy | 2 | 3 | 4 | total
cheese | 1 0% | 8 0% | 25 12% | 34 9%
rush | 1 0% | 10 0% | 6 0% | 17 0%
aggressive | 1 0% | 4 0% | 5 0% | 10 0%
fast expo | 1 0% | 14 0% | 5 0% | 20 0%
macro | 1 0% | 7 0% | 4 0% | 12 0%
defensive | 1 0% | 7 14% | 5 0% | 13 8%
total | 6 0% | 50 2% | 50 6% | 106 4%

Even against the toughest opponents, AIUR can scrape a small edge with learning. Against Locutus, it pulled barely above zero, but got a few extra wins because it discovered that its cannon rush occasionally scores on 4-player maps. Results against PurpleWave below are similar. I suspect that if AIUR had played the cannon rush every game, Locutus would have adapted and nullified the edge. Maybe it did, and that’s why the edge is so small.

#2 purplewave

strategy | 2 | 3 | 4 | total
cheese | 1 0% | 8 0% | 39 18% | 48 15%
rush | 1 0% | 8 0% | 2 0% | 11 0%
aggressive | 1 0% | 10 0% | 3 0% | 14 0%
fast expo | 4 0% | 8 0% | 2 0% | 14 0%
macro | 1 0% | 10 0% | 2 0% | 13 0%
defensive | 3 0% | 6 0% | 2 0% | 11 0%
total | 11 0% | 50 0% | 50 14% | 111 6%


#3 mcrave

strategy | 2 | 3 | 4 | total
cheese | 1 100% | 1 0% | 1 0% | 3 33%
rush | 1 0% | 41 2% | 1 0% | 43 2%
aggressive | 0 0% | 2 0% | 3 0% | 5 0%
fast expo | 1 0% | 1 0% | 42 17% | 44 16%
macro | 1 0% | 3 0% | 1 0% | 5 0%
defensive | 1 0% | 2 0% | 2 0% | 5 0%
total | 5 20% | 50 2% | 50 14% | 105 9%

Against McRave, the choice is nexus first. McRave must have settled on a macro opening itself.

#4 tscmoo

strategy | 2 | 3 | 4 | total
cheese | 11 27% | 1 0% | 1 0% | 13 23%
rush | 1 0% | 1 0% | 3 0% | 5 0%
aggressive | 1 0% | 11 9% | 1 0% | 13 8%
fast expo | 5 20% | 33 15% | 1 0% | 39 15%
macro | 1 0% | 2 0% | 22 14% | 25 12%
defensive | 1 0% | 2 0% | 22 18% | 25 16%
total | 20 20% | 50 12% | 50 14% | 120 14%

Against the unpredictable Tscmoo, AIUR wavered before settling on an unpredictable set of answers. Notice that not all the strategies are well explored: If you win less than 1 game in 5, then playing an opening 3 times is not enough. If the tournament were much longer, AIUR would likely have scored higher because of its slow but effective learning.

#5 isamind

strategy | 2 | 3 | 4 | total
cheese | 1 0% | 2 0% | 4 0% | 7 0%
rush | 1 100% | 37 19% | 38 8% | 76 14%
aggressive | 0 0% | 1 0% | 3 0% | 4 0%
fast expo | 1 0% | 5 0% | 2 0% | 8 0%
macro | 1 0% | 1 0% | 2 0% | 4 0%
defensive | 1 0% | 4 0% | 1 0% | 6 0%
total | 5 20% | 50 14% | 50 6% | 105 10%

ISAMind may be based on Locutus, but unlike Locutus it is vulnerable to AIUR’s dark templar rushes. It’s a sign that it is not as mature and well tested.

#6 iron

strategy | 2 | 3 | 4 | total
cheese | 1 0% | 1 0% | 5 0% | 7 0%
rush | 1 0% | 26 19% | 2 0% | 29 17%
aggressive | 0 0% | 2 0% | 2 0% | 4 0%
fast expo | 1 0% | 1 0% | 31 10% | 33 9%
macro | 1 0% | 19 5% | 4 0% | 24 4%
defensive | 1 0% | 1 0% | 6 0% | 8 0%
total | 5 0% | 50 12% | 50 6% | 105 9%


#7 zzzkbot

strategy | 2 | 3 | 4 | total
cheese | 4 0% | 2 0% | 2 0% | 8 0%
rush | 4 0% | 4 0% | 1 0% | 9 0%
aggressive | 3 0% | 2 0% | 1 0% | 6 0%
fast expo | 3 0% | 3 0% | 1 0% | 7 0%
macro | 7 0% | 5 0% | 4 0% | 16 0%
defensive | 4 0% | 34 29% | 41 12% | 79 19%
total | 25 0% | 50 20% | 50 10% | 125 12%

4 pooler ZZZKBot is of course best countered by a defensive anti-rush strategy. Well, it helped, but the rush is too strong for AIUR to survive reliably. On the 2-player map, AIUR found no answer.

#8 microwave

strategy | 2 | 3 | 4 | total
cheese | 2 0% | 2 0% | 1 0% | 5 0%
rush | 1 0% | 27 7% | 1 0% | 29 7%
aggressive | 1 0% | 1 0% | 1 0% | 3 0%
fast expo | 1 0% | 2 0% | 1 0% | 4 0%
macro | 1 0% | 1 0% | 9 22% | 11 18%
defensive | 18 22% | 17 24% | 36 25% | 71 24%
total | 24 17% | 50 12% | 49 22% | 123 17%

Microwave apparently also played a rushy style versus AIUR. That’s interesting. I think that AIUR’s defensive strategy is good against pressure openings generally, so Microwave was likely playing low-econ but not necessarily fast rushes.

#9 letabot

strategy | 2 | 3 | 4 | total
cheese | 1 0% | 1 0% | 1 0% | 3 0%
rush | 1 0% | 1 0% | 3 33% | 5 20%
aggressive | 0 0% | 3 33% | 1 0% | 4 25%
fast expo | 1 0% | 41 49% | 43 49% | 85 48%
macro | 1 100% | 3 33% | 1 0% | 5 40%
defensive | 1 0% | 1 0% | 1 0% | 3 0%
total | 5 20% | 50 44% | 50 44% | 105 43%

Fast expo makes sense against LetaBot’s “wait for it... wait for it... here it comes!” one big smash.

#10 megabot

strategy | 2 | 3 | 4 | total
cheese | 1 0% | 2 0% | 3 0% | 6 0%
rush | 2 50% | 4 0% | 38 11% | 44 11%
aggressive | 1 0% | 3 0% | 3 0% | 7 0%
fast expo | 1 0% | 3 0% | 2 0% | 6 0%
macro | 2 0% | 36 28% | 2 0% | 40 25%
defensive | 18 94% | 2 0% | 2 0% | 22 77%
total | 25 72% | 50 20% | 50 8% | 125 26%

Why did MegaBot have so much more trouble on the 2-player map? According to the official per-map result table, MegaBot did fine overall on Destination (the one 2-player map), so its trouble came only against AIUR. Maybe I should watch replays and diagnose it.

#11 ualbertabot

strategy | 2 | 3 | 4 | total
cheese | 1 0% | 1 0% | 1 0% | 3 0%
rush | 2 0% | 43 37% | 2 0% | 47 34%
aggressive | 1 0% | 2 0% | 1 0% | 4 0%
fast expo | 1 0% | 2 0% | 1 0% | 4 0%
macro | 18 33% | 1 0% | 1 0% | 20 30%
defensive | 1 0% | 1 0% | 44 16% | 46 15%
total | 24 25% | 50 32% | 50 14% | 124 23%


#12 tyr

strategy | 2 | 3 | 4 | total
cheese | 1 0% | 1 0% | 1 0% | 3 0%
rush | 1 100% | 1 0% | 32 81% | 34 79%
aggressive | 0 0% | 37 46% | 8 75% | 45 51%
fast expo | 1 100% | 3 33% | 3 67% | 7 57%
macro | 1 0% | 6 33% | 3 33% | 10 30%
defensive | 1 0% | 2 0% | 3 33% | 6 17%
total | 5 40% | 50 40% | 50 72% | 105 55%

I suspect that Tyr suffered here because it is a jvm bot and could not write its learning file.

#13 ecgberht

strategy | 2 | 3 | 4 | total
cheese | 1 100% | 38 89% | 2 50% | 41 88%
rush | 1 100% | 1 0% | 43 67% | 45 67%
aggressive | 0 0% | 4 75% | 1 0% | 5 60%
fast expo | 1 100% | 1 0% | 2 0% | 4 25%
macro | 1 0% | 3 67% | 1 0% | 5 40%
defensive | 1 0% | 3 67% | 1 0% | 5 40%
total | 5 60% | 50 82% | 50 60% | 105 70%


#15 titaniron

strategy | 2 | 3 | 4 | total
cheese | 1 0% | 1 0% | 2 50% | 4 25%
rush | 1 0% | 1 0% | 3 33% | 5 20%
aggressive | 0 0% | 42 79% | 42 88% | 84 83%
fast expo | 1 0% | 1 0% | 1 0% | 3 0%
macro | 1 100% | 2 50% | 1 0% | 4 50%
defensive | 1 100% | 3 0% | 1 0% | 5 20%
total | 5 40% | 50 68% | 50 78% | 105 71%

TitanIron appears to have been too predictable. Notice that the winning strategy on most maps was never tried (without crashing) on the 2-player map. It might have won there too.

#16 ziabot

strategy | 2 | 3 | 4 | total
cheese | 16 50% | 2 50% | 1 0% | 19 47%
rush | 1 0% | 2 0% | 1 0% | 4 0%
aggressive | 1 0% | 1 0% | 3 33% | 5 20%
fast expo | 1 0% | 2 50% | 0 0% | 3 33%
macro | 1 0% | 1 0% | 1 0% | 3 0%
defensive | 3 33% | 42 69% | 44 57% | 89 62%
total | 23 39% | 50 62% | 50 52% | 123 54%


#17 steamhammer

strategy | 2 | 3 | 4 | total
cheese | 1 0% | 1 0% | 1 0% | 3 0%
rush | 3 67% | 4 75% | 9 100% | 16 88%
aggressive | 3 100% | 17 100% | 15 100% | 35 100%
fast expo | 2 0% | 2 0% | 2 50% | 6 17%
macro | 1 100% | 10 100% | 1 0% | 12 92%
defensive | 14 100% | 16 100% | 22 100% | 52 100%
total | 24 83% | 50 92% | 50 94% | 124 91%


#18 overkill

strategy | 2 | 3 | 4 | total
cheese | 1 0% | 3 0% | 2 50% | 6 17%
rush | 0 0% | 2 50% | 1 0% | 3 33%
aggressive | 0 0% | 1 0% | 10 60% | 11 55%
fast expo | 1 0% | 3 67% | 0 0% | 4 50%
macro | 0 0% | 0 0% | 0 0% | 0 0%
defensive | 16 88% | 41 90% | 37 78% | 94 85%
total | 18 78% | 50 80% | 50 72% | 118 76%


#19 terranuab234total
 nwinsnwinsnwinsnwins
cheese1100%888%10%1080%
rush1100%11100%30100%42100%
aggressive00%475%250%667%
fast expo1100%16100%683%2396%
macro1100%989%1090%2090%
defensive1100%250%10%450%
total5100%5092%5090%10591%


#20 cunybot234total
 nwinsnwinsnwinsnwins
cheese10%250%475%757%
rush1100%10%20%425%
aggressive00%475%1392%1788%
fast expo10%250%250%540%
macro1100%989%13100%2396%
defensive1100%32100%15100%48100%
total560%5090%4990%10488%


#21 opprimobot234total
 nwinsnwinsnwinsnwins
cheese1100%12100%683%1995%
rush1100%5100%7100%13100%
aggressive00%7100%4100%11100%
fast expo1100%11100%17100%29100%
macro1100%8100%7100%16100%
defensive1100%7100%9100%17100%
total5100%50100%5098%10599%


#22 sling234total
 nwinsnwinsnwinsnwins
cheese10%10%10%30%
rush1100%5100%250%888%
aggressive00%13100%13100%26100%
fast expo1100%7100%10100%18100%
macro1100%8100%11100%20100%
defensive1100%16100%13100%30100%
total580%5098%5096%10596%


#23 srbotone234total
 nwinsnwinsnwinsnwins
cheese10%250%10%425%
rush1100%9100%367%1392%
aggressive00%13100%16100%29100%
fast expo1100%10100%8100%19100%
macro1100%786%6100%1493%
defensive1100%9100%16100%26100%
total580%5096%5096%10595%


#24 bonjwa        2 players    3 players    4 players    total
                  n    wins    n    wins    n    wins    n    wins
cheese            1    100%    9    100%    4    75%     14   93%
rush              1    100%    13   100%    10   100%    24   100%
aggressive        0    0%      7    100%    10   100%    17   100%
fast expo         1    100%    6    100%    7    100%    14   100%
macro             1    100%    7    100%    8    100%    16   100%
defensive         1    100%    8    100%    11   100%    20   100%
total             5    100%    50   100%    50   98%     105  99%


#25 stormbreaker  2 players    3 players    4 players    total
                  n    wins    n    wins    n    wins    n    wins
cheese            4    75%     1    0%      4    75%     9    67%
rush              0    0%      5    80%     10   100%    15   93%
aggressive        0    0%      18   100%    7    100%    25   100%
fast expo         0    0%      0    0%      6    100%    6    100%
macro             0    0%      9    100%    8    100%    17   100%
defensive         20   100%    17   100%    15   100%    52   100%
total             24   96%     50   96%     50   98%     124  97%


#26 korean        2 players    3 players    4 players    total
                  n    wins    n    wins    n    wins    n    wins
cheese            7    100%    2    100%    10   100%    19   100%
rush              0    0%      7    100%    8    100%    15   100%
aggressive        0    0%      5    100%    8    100%    13   100%
fast expo         0    0%      8    100%    8    100%    16   100%
macro             0    0%      5    100%    6    100%    11   100%
defensive         14   100%    23   100%    10   100%    47   100%
total             21   100%    50   100%    50   100%    121  100%

Well, if you win every game, learning cannot help.

#27 salsa         2 players    3 players    4 players    total
                  n    wins    n    wins    n    wins    n    wins
cheese            9    100%    15   100%    9    100%    33   100%
rush              0    0%      0    0%      3    100%    3    100%
aggressive        0    0%      11   100%    8    100%    19   100%
fast expo         0    0%      0    0%      4    100%    4    100%
macro             0    0%      8    100%    7    100%    15   100%
defensive         15   100%    16   100%    19   100%    50   100%
total             24   100%    50   100%    50   100%    124  100%

CIG 2018 - bots that wrote data

The CIG organizers have released the final read/write folders for the 2018 tournament. I looked through all the folders to see whether each bot recorded information. If it saved nothing, it did not learn. If it saved some data, it may have used it for learning (or it might be log files or whatever). I also added a curve “↕” column, showing whether the bot’s win rate moves up or down (or stays approximately flat) between round 40 and the end of the tournament; in other words, whether the bot kept improving until late in the tournament. Win curves early in the tournament are noisy, so they’re hard to compare. (If anybody can’t see the Unicode up and down arrows, let me know and I can change them.)

Some bots have files in their AI folder, which may be prepared data or pre-learned data for specific opponents. I note that too. Prepared data could be kept elsewhere, including in the binary, so I didn’t see all of it. We know that PurpleWave had extensive preparations for specific opponents.

As has been mentioned, bots in Java or Scala (bots which run on the jvm) were unable to write learning data. Those that depend on their learning data were playing at a severe disadvantage. #2 PurpleWave lost narrowly to #1 Locutus and was one of the affected bots. It’s a serious problem for a tournament that wants to be taken seriously.

#    bot           ↕   info
1    Locutus           Prepared data for 11 opponents. Learning data very similar to Steamhammer’s.
2    PurpleWave    -   jvm :-(
3    McRave        -   Looks like wins and losses of each of 16 available strategies for the previous 8 games. Perhaps a sliding window?
4    tscmoo            Looks like strategy and win/loss info for each opponent, in a hard-to-read structured format. Past years have had more elaborate data.
5    ISAMind           Prepared file that looks like neural network learning data. Per-opponent learned data that looks like Steamhammer data. ISAMind is based on Locutus, so that makes sense.
6    Iron              Nothing.
7    ZZZKBot       -   Game records, one game per line, in an opaque format that looks about the same as last year.
8    Microwave         For each opponent, wins and losses for 8 different strategies.
9    LetaBot           Information about a few recent games against ZiaBot, probably not used for learning.
10   MegaBot           Extensive log data. The apparent learning files are MegaBot-vs-[opponent].xml and give scores for NUSBot, Skynet, Xelnaga (MegaBot’s three heads).
11   UAlbertaBot       Win/loss numbers for 4 protoss, 4 terran, and 5 zerg strategies. But the same strategy was always chosen for each race, so learning was turned off.
12   Tyr               jvm :-(
13   Ecgberht          jvm :-(
14   Aiur              The familiar lists of numbers for each opponent.
15   TitanIron         Nothing.
16   ZiaBot            One file with data for TerranUAB, UAlbertaBot, and 3 lines for SRbotOne. Zia’s learning looks broken or disabled.
17   Steamhammer       Steamhammer saved data when it did not crash, and successfully learned a little bit.
18   Overkill          A file for each opponent, game records with opponent/opening/score.
19   TerranUAB     -   Nothing.
20   CUNYbot           One file output.txt with strategy information and numbers, naming a few opponents but not most. A prepared file in the AI folder has the same format. It’s mysterious.
21   OpprimoBot        Nothing.
22   Sling             Nothing.
23   SRbotOne          A large number of “stats” files named with date and time, apparently game records. For each opponent, another file giving the strategy “Terran_Attrition” and win/loss numbers. I’m not sure whether this could be learning data, but the bot did earn an up arrow.
24   Bonjwa            Nothing.
25   Stormbreaker      Prepared data NN_model_policy and NN_model_stateValue, apparently neural network learning data. For each opponent, game records with 4 numbers per game. The format is like Overkill’s but records more information.
26   Korean        -   Nothing.
27   Salsa         -   Nothing.

Most striking is that the “nothing” bots cluster toward the bottom. If you don’t even try to record data, either you are Iron or you performed weakly. The jvm bots, which in fact recorded nothing due to no fault of their own, still placed higher than all the nothing bots other than Iron. Perhaps recording data is a proxy for how much effort has gone into the play.

Some bots had a rising win rate (an up arrow) despite doing no learning, most notably UAlbertaBot. I think that since UAlbertaBot plays random, its opponents can easily get confused about it. In general, I think that playing unpredictably (either being random or choosing varied openings randomly) can mess up the learning of some other bots.

I will be analyzing what certain bots learned. It will shed light on their opponents.

popular posts seem largely random

Does this make sense to anybody? Here are the most popular posts according to my statistics, leaving aside recent posts. Why these?

LetaBot’s guest post on rushbots deserves to be popular. The others seem mostly random. Is it because some posts accidentally include keywords that Google likes, or what?

many ways to defend against SAIDA’s drops in TvT

SAIDA introduces new drop skills in TvT. Bots have not faced drops like this before, and are not adept at defending. SAIDA likes to drop tanks and goliaths with several ships, at the edge of the map, using your mineral line or your buildings for cover. The drops are able to destroy a base if the defender is weak or disorganized.

There are a lot of defensive possibilities; I’ll list some. Make SAIDA pay for those drops!

active defense

• Drop prediction. If you spot moving dropships, you may be able to guess where they are going. You can try to divert wraiths or goliaths to intercept the path, or send defenders to the predicted drop zone.

• Wraiths. Seek out those dropships and make them hurt. If they’re loaded, you can force them to unload prematurely. If the drop already happened, shoot down as many as you can. Even if the dropships escape to friendly territory, you have caused delays and gained time.

• Counter drops. Drop your own units directly on top of the enemy units. SAIDA’s tanks will have to unsiege, and (if you have coordination skills) you can take the opportunity to move in with other units. A good counter drop can turn the enemy drop from a benefit into a cost.

• If all else fails, maneuver tanks to pin down and destroy the dropped units. Don’t let them stay alive and kill more of your stuff. So far, Tscmoo has done the best job of this.

exploit predictability

Bots tend to have stereotyped play. SAIDA likes to fly along the edge of the map and drop on the edge where its units cannot be surrounded. An opponent could record the events, notice the obvious pattern, and prepare special defenses. Or at least: Once you’ve seen dropships, set up some turrets to see them coming and restrict their movement. By the time big drops can happen, terran commonly has excess minerals and can afford to throw up a bunch of turrets even if they aren’t efficiently placed.

• Place turrets along the edge where SAIDA may approach. (This is more common in TvP as defense against arbiter recall.) If you have high confidence, you could even detail goliaths to lie in ambush. If SAIDA doesn’t know about the turrets, it will have to fly into range and take damage before it can evade. At worst, you will have seen the drop and can try to predict where it will go next.

• Lay spider mines in potential drop zones, such as behind your mineral line. I don’t know whether SAIDA will drop on the mines and blow up, or scan the mines and drop elsewhere, but it’s to your advantage either way. Laying mines near your mineral line is not as clever against protoss or zerg drops, because protoss and zerg can more easily drag the mines into your workers. Terran doesn’t have a good unit to drag mines with.

stay alive

• Lift the command center and run SCVs. There’s a good chance you can keep the CC alive and quickly restore the base to operation once the drop is cleared. Lifting the command center is a basic terran skill; I find it surprising that Krasi0 doesn’t have it yet.

Steamhammer 2.1 status

My energy is recovering slowly from “blrgh, is it day again?” toward “I wonder what’s for lunch?”

I got a modest amount of work done for Steamhammer 2.1. I fixed 4 different bugs in terran play, and now terran is up to snuff—there was a good one where medics liked to break away and advance on their own. Steamhammer is better than before with barracks units, still klutzy with factory units though vultures may get stuck on each other less often. At least one protoss bug is not as easy and needs actual work to solve. I also feel like fixing scourge, so we’ll see how long it takes. Should be more on the order of days than weeks.

For Steamhammer 2.2, I think the headline feature will be dropping BWTA. That will be a relief. When it looks solid, I can move to BWAPI 4.2.0 and be free of the bugs in 4.1.2. The 4.1.2 bugs effectively make drop more expensive for zerg, which has discouraged me from working on drop skills.

Last year, the end-of-year Steamhammer version 1.4a3 (gotta love that version number) was not only the absolutely strongest Steamhammer of the year, it was also relatively strongest: It showed the best results against other bots. Steamhammer finished higher in SSCAIT than in AIIDE. I’m seeing early signs that it might work out the same this year. This year, the AIIDE version includes a lot of necessary work, but not all of it is polished enough. By the end of the year, the new bugs should be smoothed out and other important problems fixed. I’m expecting a strong chance that Steamhammer will again finish higher in SSCAIT than in AIIDE. I think I am being taught a lesson in good tournament preparation.

Still coming soon-ish: CIG 2018 analysis.

new bot SAIDA

I think we have a new champion.

New terran SAIDA has been playing extremely impressive games on SSCAIT, scoring 10-0 as I write. In games so far, it breaks down both Krasi0 and Locutus with a strategy like this: Stay home on 2 bases and build up a strong tank force, move out and establish a contain as close to the enemy natural as possible, use the space this gives to reduce other enemy bases around the map with vulture raids, small tank attacks, and drops with multiple dropships (a unique skill for terran bots). Based on its debug drawing, it seems to have a sophisticated understanding of what its enemy is doing, and from the way it varies its play against different opponents, it makes use of that understanding. It scouts carefully. It can place tanks on high ground appropriately. Its drop positioning is strong. When rushed, it places a bunker in a strong rear position and pops marines in and out at the right times. When PurpleWave tried forward 2 gates, SAIDA scouted it and correctly focussed down the pylon first, then took its vultures away from the gates to hit the protoss main, perfectly done. The bot has a lot of powerful and rare skills.

SAIDA appears to play with primarily factory units against all races. If so, it may be vulnerable against the strongest zergs. Or maybe not, we haven’t seen the games yet! Looking into the binary, I see that SAIDA knows names for a wide variety of strategies by all races. If it also knows counters for those strategies—which I think we can expect—then it is prepared for anything it is likely to see. For example, here are the names it knows for zerg:

opening            “main” (current) strategy
Zerg_4_Drone       Zerg_main_zergling
Zerg_5_Drone       Zerg_main_maybe_mutal
Zerg_9_Drone       Zerg_main_hydra
Zerg_9_Hat         Zerg_main_lurker
Zerg_9_OverPool    Zerg_main_mutal
Zerg_9_Balup       Zerg_main_fast_mutal
Zerg_12_Pool       Zerg_main_hydra_mutal
Zerg_12_Hat        Zerg_main_queen_hydra
Zerg_12_Ap
Zerg_sunken_rush
Zerg_4_Drone_Real

I’m not sure what all of the names mean, but most are obvious. Steamhammer, with its huge opening repertoire, knows openings which are technically not on this list. But for practical purposes, I expect this should cover everything before hive tech.

I can see flaws in SAIDA’s play: It makes too many turrets, its tank positioning versus protoss is too compact and vulnerable, it makes micro errors. But the flaws are not easy for other bots to exploit.

This bot must be the product of long development. The rest of us have work to do!

delays

I’m other than healthy today. I don’t have much strength for difficult tasks like standing up. I’ll get back to it when I’ve recovered.

CIG 2018 stand by

As usual, the detailed results file is in a slightly different format than past files. My results from analyzing it don’t quite match the official results. Stand by while I figure it out.

CIG 2018 detailed results are out

As LetaBot mentioned in a comment, CIG 2018 detailed results are out. They include a result file that I can analyze with my software, so expect the usual colorful crosstables in the coming days. For today, a few notes:

#7 ZZZKBot, whose basic strategy is 4 pool though it has added other strategies in recent years, was the only player with a plus score over #1 Locutus. On the one hand, it shows how dominant Locutus was. On the other hand, what did ZZZKBot do to win? Was it a prepared strategy? I’ll be looking into it. There are a few other interesting upsets.

The win rate over time graph shows which bots benefited from learning during the tournament, or at least which benefited from changes in play by themselves or their opponents (you could gain win rate over time if your opponents mislearn about you). To my eye, there seems to be a higher rate of curving lines than I remember from past years. I’ll look into that too.

#17 Steamhammer, with about a 35% win rate, was the bot with the most crashes: 1553 out of 3250 games, a crash rate of 47%, much higher than I expected. I still do not know why the crashes never showed up in my test environment. On the upside, Steamhammer scored 66% in games where it did not crash, which suggests that it did approximately keep up with general progress over the last year. In any case, it’s another reminder that reliability is a top priority. The second place crashing bot was #27 Salsa at 35% crashes, and Salsa finished dead last. The third place crasher at 30% crashes was #15 TitanIron, which also finished unexpectedly low after high expectations among watchers who guessed it was a fork of Iron. Every other bot had a crash rate under 10%.

And there is source. I will look into the code of some participants to see how they tick.

Steamhammer 2.0 download

Here is the Steamhammer 2.0 download link. It’s the same zip file I submitted for AIIDE 2018, so unlike an SSCAIT upload it doesn’t include BWAPI.

Steamhammer 2.0 download with source and binary.

It is meant to play zerg only. The configuration file contains nothing for terran or protoss.

You can fill in openings yourself and it will play, but my testing shows unacceptable bugs for both terran and protoss. For example, vultures often get stuck on targets as if married to them, unable to switch away until one or the other dies (this bug allows no divorce for any cause). In zealots versus zerglings, the zealots like to move back and forth and give the zerglings free hits. “It’s only fair, we’re so much taller and stronger.”

Steamhammer 2.1 will come out when the bot again plays all races acceptably. I’m not sure how long it will take. After the big effort for AIIDE, my energy is low and I need a break. I haven’t even updated Steamhammer’s web page yet.

Steamhammer 2.0 change list

Some parts of the change list are already posted; think of those posts as part of this one (ahem, “included here by reference are....”). Here is the rest. You may notice that it is slightly long.

code changes

• More stuff in UnitUtil: Cooldown and FramesToReachAttackRange() functions added, and used in micro. GetWeapon() functions reworked for simplicity. Damage-per-frame functions added to compare weapon strengths. IsCompletedResourceDepot() added, factoring out code that was repeated in several places.
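The damage-per-frame idea can be sketched like this; the function is a hypothetical illustration, not Steamhammer’s actual UnitUtil code, though the volley arithmetic (damage amount × damage factor per cooldown) is standard Brood War weapon data.

```cpp
// Illustrative sketch: a Brood War weapon deals (damage amount x damage
// factor) per volley, once every `cooldown` frames, so dividing gives a
// strength measure that can be compared across weapons.
double damagePerFrame(int damageAmount, int damageFactor, int cooldown)
{
    if (cooldown <= 0)
        return 0.0;                 // no weapon, or bad data
    return double(damageAmount * damageFactor) / double(cooldown);
}
```

A zealot-style weapon (8 damage, factor 2, cooldown 22) then scores about 0.73, versus 0.40 for a marine-style weapon (6 damage, factor 1, cooldown 15).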

UnitInfo::estimateHealth() estimates the health of a unit which may not have been seen for a while, accounting for protoss shield regeneration and zerg hp regeneration. (Terran medic healing and SCV repair are not easy to predict.)
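A minimal sketch of such an estimate, assuming the commonly cited Brood War regeneration rates (4/256 hp per frame for zerg, 7/256 shield points per frame for protoss); the names and signature are illustrative, not Steamhammer’s actual interface:

```cpp
#include <algorithm>

// Illustrative out-of-sight health estimate. Regeneration rates are the
// commonly cited Brood War values: zerg hit points regain 4/256 per frame,
// protoss shields 7/256 per frame. Medic healing and SCV repair are ignored.
struct HealthEstimate { int hp; int shields; };

HealthEstimate estimateHealth(int lastHP, int lastShields,
                              int maxHP, int maxShields,
                              int framesSinceSeen,
                              bool isZerg, bool isProtoss)
{
    HealthEstimate e = { lastHP, lastShields };
    if (isZerg)
        e.hp = std::min(maxHP, lastHP + framesSinceSeen * 4 / 256);
    if (isProtoss)
        e.shields = std::min(maxShields, lastShields + framesSinceSeen * 7 / 256);
    return e;
}
```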

InformationManager::enemyHasSiegeMode() added, and used in tactical calculations.

• In GameCommander, I sorted the manager calls so that managers which gather information are called first, and managers which use the information are called later. They had gotten jumbled over time. The main effect is that Steamhammer reacts 1 frame faster after discovering the race of a random opponent, a crucial difference that I estimate will, over Steamhammer’s lifetime, save approximately zero games.

• I renamed SquadData::addSquad() to createSquad(), since that’s what it does, and reworked it for simplicity. I removed the declaration of SquadData::clearSquad(), which was not implemented.

• Some calls in WorkerManager iterate through bases instead of through units to find base-related information. It’s faster and simpler (but I did have to spend time to fix a bug that I introduced in the process).

• More unnecessary includes removed, bringing a negligible improvement in compile times.

• I removed the configuration option Config::Micro::UnitNearEnemyRadius. The value is now chosen dynamically in code.

• In the game info display (turned on with Config::Debug::DrawGameInfo and drawn in the upper left), most labels were not needed because the meaning is obvious. I removed the labels of items other than “Opp Plan”. Less clutter is better.

• The TimerManager display, turned on with Config::Debug::DrawModuleTimers, had grown disorganized and probably incorrect. I straightened it out. I also improved comments in the code to prevent future disorganization.

opponent model

The opponent model has always distinguished between opponents that appear to follow the same plan every game, and opponents that vary their play. Until this version, it used the information only in a minor way. Now it selects openings using an entirely different method for multi-strategy enemies. If you play the same every game, Steamhammer will try to find the best single response, exploiting your predictability. If you mix up your play to confuse Steamhammer, then Steamhammer will mix up its play to confuse you; you get minimal predictability to exploit. We’ll see how well it works, but I’m expecting it to make a big difference in a long tournament like AIIDE.

• Against a single-strategy opponent, Steamhammer sticks with the variant of epsilon-greedy that it has always used: With probability epsilon, explore randomly; otherwise, choose the best known opening according to a weighted win rate that tries to take the maps into account. It’s not strictly classic epsilon-greedy, though, because epsilon varies according to the loss rate: If we are losing a lot, explore more often and play the best known opening less often. (It’s an adaptation to having more openings available than can be tried.) I have modified it so that the exploration rate increases more rapidly with the loss rate, because I found it was too often repeating an opening that won 1 game out of many, instead of looking for a better choice.

• Against a multi-strategy opponent, Steamhammer starts with the same weighted-win-rate calculation that it uses in the single-strategy case. Each opening it has already played is an option, and has a measured win rate. Exploring a new opening is an option, and its win rate is the mean win rate of openings that have been tried, its best estimate of how likely a newly explored opening is to win—but with a hard floor, so that the exploration rate never goes too close to zero. It randomly chooses among the possibilities, giving each a probability proportional to the square of its win rate. Squaring the rates gives openings with win rates near zero little chance of being chosen—unless most openings have win rates near zero. If all win rates are zero, because there are no wins yet, the hard floor on exploration means that Steamhammer explores every time. If the win rates are high for some openings and low for many others, exploration will be rare and Steamhammer will most often randomly choose one of the better openings. It’s ad hoc but makes a certain amount of sense.
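The multi-strategy chooser might look something like this sketch; the names, the exploration floor value, and the interface are illustrative assumptions, not Steamhammer’s actual code:

```cpp
#include <algorithm>
#include <random>
#include <vector>

// Illustrative sketch of squared-win-rate opening selection with a hard
// exploration floor. Returns an index into `openings`, or -1 meaning
// "explore a new opening".
struct OpeningRecord { int games; int wins; };

int chooseOpening(const std::vector<OpeningRecord> & openings,
                  std::mt19937 & rng,
                  double exploreFloor = 0.1)
{
    // The explore option's estimated win rate is the mean win rate of the
    // openings already tried, with a hard floor so exploration never dies.
    double sumRate = 0.0;
    for (const OpeningRecord & o : openings)
        sumRate += o.games > 0 ? double(o.wins) / o.games : 0.0;
    double exploreRate = openings.empty() ? 1.0 : sumRate / openings.size();
    exploreRate = std::max(exploreRate, exploreFloor);

    // Weight each option by the square of its win rate, so openings with
    // near-zero rates are almost never repeated--unless everything is near
    // zero, in which case the floored explore option dominates.
    std::vector<double> weights;
    for (const OpeningRecord & o : openings) {
        double rate = o.games > 0 ? double(o.wins) / o.games : 0.0;
        weights.push_back(rate * rate);
    }
    weights.push_back(exploreRate * exploreRate);   // the "explore" option

    std::discrete_distribution<int> dist(weights.begin(), weights.end());
    int pick = dist(rng);
    return pick == int(openings.size()) ? -1 : pick;
}
```

With this weighting, an opening that has won 80% of its games is 16 times as likely to be chosen as one that has won 20%.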

macro

• Mineral locking. Steamhammer’s implementation follows Locutus in outline, but is different in detail. Mineral locking helps mainly in macro games, which Steamhammer is now able to play better because of the squad changes, so this was an opportune time to add it.

construction

• Prefer to expand to bases near the edge of the map. Bases near the edge are usually (not always) more protected than bases near the center of the map. In practice, Steamhammer now mostly avoids taking the risky center bases on Heartbreak Ridge and other maps, and the exposed mineral only bases on Python, at least until later in the game. I had to tune it carefully so that the natural of the 1 o’clock base on Tau Cross, far from the edge, is still preferred over the exposed 3rd base at the edge. Someday I’ll implement map analysis and figure out a real measure of exposure to attack.

• Bug fix: Don’t try to build at a location which a worker cannot reach. It never happened, as far as I know, but there was a mistake in the code.
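The edge preference above could start from a measure as simple as this hypothetical helper (the real decision surely weighs more factors, as the Tau Cross tuning shows):

```cpp
#include <algorithm>

// Illustrative only: score a base by its distance to the nearest map edge,
// in map coordinates. Smaller scores suggest more protected bases.
int distanceToNearestEdge(int x, int y, int mapWidth, int mapHeight)
{
    return std::min(std::min(x, mapWidth - x),
                    std::min(y, mapHeight - y));
}
```

A base at (10, 50) on a 128x128 map scores 10, while a center base at (64, 64) scores 64, so the edge base is preferred.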

tactics

• In the overview I said that workers are not transferred to a base that is in danger, but there is more to it than that. Each base keeps track of whether it is being attacked severely enough that workers appear to be in danger; the estimate is made by CombatCommander::updateBaseDefenseSquads(), which I wrote about under base defense, and can be accessed via the Base objects for each base with base->inWorkerDanger(). If a worker is idle, meaning it is due to be assigned a new task if one is available, then it is not assigned a task at a base where workers are in danger. A worker is normally made idle after completing any task: worker was just created, gas collection is being turned off so worker no longer needs to mine gas, worker is finished defending itself and should be put back to work, and other cases. Workers are not only not transferred to an endangered base, they are often (not always) transferred away from the endangered base because they had to defend themselves, or were pulled for defense, or otherwise changed tasks. It’s hardly perfect, but it saves workers and greatly improves Steamhammer’s resilience to attack. It was a critical improvement. Related weaknesses remain: Steamhammer may still try to build at the endangered base, or transfer drones to a distant base through the enemy army.

• Fixed CombatCommander bugs in deciding which enemy base to attack.

• Rules for recognizing enemy cloaking and assigning detectors to our squads are slightly improved. Steamhammer pays attention to whether it has cloaked units itself and needs to catch enemy observers. This allows overlords to stay safer in some game situations: We don’t have lurkers, therefore we don’t have a strong need to hunt observers, therefore overlords can stay home.

• In feeding units to the combat sim, an enemy building which was last seen uncompleted is entered as completed if it is currently out of sight. It is sometimes an improvement, sometimes a mistake. It would be better to track the estimated completion time and use that (a number of bots do).
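The idle-worker rule described above boils down to something like this sketch; Base and the field names are illustrative, not Steamhammer’s actual classes:

```cpp
#include <cstddef>
#include <vector>

// Illustrative: an idle worker may be assigned to a base only if that base
// wants workers and its workers are not judged to be in danger.
struct Base { bool needsWorkers; bool workersInDanger; };

// Return the index of a base that can safely accept an idle worker, or -1.
int pickBaseForIdleWorker(const std::vector<Base> & bases)
{
    for (std::size_t i = 0; i < bases.size(); ++i)
        if (bases[i].needsWorkers && !bases[i].workersInDanger)
            return int(i);
    return -1;
}
```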

micro

• Previous versions changed the former stateless module Micro into an object with state, but did nothing new. This version updates the state with each of our units’ orders and a little more information, but still doesn’t use the state for anything. It’s all prepared for serious work, though.

• Enemy unit movement prediction is smarter. It takes distance into account. Also, prediction is used differently in different cases to get better results in practical situations. Mutalisks, wraiths, and vultures now use prediction; as instant-acceleration units they are kited by a different routine than other units, which didn’t use prediction until now.

• Kite only units which the enemy has targeted. If nobody wants to shoot you, you don’t have to step back from the shooting. This especially helps hydralisks, which fire more slowly when kited and tend to get in each other’s way.

• Some bits of micro explicitly take latency into account in their calculations, especially kiting. It’s more accurate.

• Don’t chase an enemy if we’re predicted to be unable to catch it. CanCatchUnit() (defined in Common.cpp) figures it out. It makes less difference than I expected.

• Targeting priority: A ghost which is nuking is the highest priority target. An enemy defiler is also a high priority. These two were oversights. There are other targeting tweaks, for example to reduce cases of attacking the pylon when the cannon behind it is firing; it’s not fully successful.
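The heart of a check like CanCatchUnit() can be sketched as follows; the real routine in Common.cpp presumably works from positions and velocities, so this speed-only version is only an illustration:

```cpp
// Illustration only, not the actual Common.cpp routine: if the target is at
// least as fast as the chaser and already moving away, the gap never closes.
bool canCatchUnit(double chaserTopSpeed, double targetTopSpeed,
                  bool targetMovingAway)
{
    if (!targetMovingAway)
        return true;                          // it will come into range by itself
    return chaserTopSpeed > targetTopSpeed;   // can we close the distance?
}
```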

zerg

• If we are short of mutas or lurkers (by a simple hardcoded count), then do not substitute a drone for a muta or lurker, no matter how much we may want drones. This was the biggest cause of delayed tech switches, and fixing it makes strategic play crisper at key times. This was one of the most important fixes in version 2.0.

• Late in the game, add extractors willy-nilly everywhere that they are possible. Steamhammer was too often gas-starved in the late game, too slow to get more gas when it was needed. This is also an important fix.

• Limit scourge more stringently. Sometimes Steamhammer makes so much scourge that it has no gas for anything else, causing macro problems as it delays production to get more gas.

• The building manager is much more cautious about turning a failed expansion into a macro hatchery. It only orders the change if it appears that we actually need a macro hatchery.

• Build a queen’s nest sooner in some situations. This is to speed hive tech when it is needed. It doesn’t make a big difference.

• If the enemy has too many corsairs or valkyries, get air carapace. This gets air armor for overlords even if we are not making any other air units.

• Favor guardians more, especially versus mass cannons. Past Steamhammer versions reduced guardian use by too much, and this starts to correct it.

• Emergency reaction: If we’re dangerously short on drones, don’t spend one on a building. Oops.

• Emergency reaction: If we have no bases but do have hatcheries, then make sure that the drone limit is at least 3. Having “no bases” means that no hatchery is at a predefined base location; we might still have hatcheries that can mine. Steamhammer formerly thought that, without bases, it had no need for drones. That was OK if it had drones already, not if they were all dead. Now Steamhammer has a chance to recover, provided the enemy is also prostrate after a base trade.

• Don’t automatically make a sunken in reaction to a proxy. It was often an overreaction.

• Don’t automatically get an early sunken versus protoss 2 gate (the sunken was usually too early) or against zerg 2 hatch (it was often unnecessary).

• When planning a morphed unit type (such as a lurker), don’t count it as using up a larva. This minor bookkeeping fix should occasionally make for better production decisions.

• Cancel grossly excess overlords even in the opening book. This may help if an emergency situation comes up in the opening, but it is mainly meant to mitigate bugs which cause production loops. No simple production loops are possible, but there seem to still be some complex loops where unrelated emergency reactions fire, and each prevents the other from recognizing that it has already taken action. It’s rare, though.

• Don’t let rebuilding the spawning pool cause a production jam, and don’t allow multiple copies of the spawning pool. It was a rare but deadly bug in production unjamming.

• Fixed a crash if a hydralisk den, lair, or spire was dropped in the opening (in reaction to an emergency). It was that specific.

• Fixed a rare bug that could prevent gas from being retaken after it was lost.

• Fixed a bug in defensive reactions that could request zerglings when there was no spawning pool.

• Fixed an unimportant bug in deciding to make a hive. It had no real consequences.

zerg openings

• Fixed a number of openings that had suffered bit decay: They were broken or mistuned due to code changes, such as queue reordering and mineral locking.

• Added new turtle openings designed to exploit particular enemy strategies: 11HatchTurtleHydra, 11HatchTurtleLurker, 12HatchTurtle, ZvZ_OverpoolTurtle. These are meant to stop specific rushes and leave Steamhammer in a sound position.

• Finished up and optimized the anti-forge-expand macro openings ZvP_3BaseSpire+Den (a good success) and 4HatchBeforeGas (not as effective). Steamhammer is finally able to play macro games well, so this was important. To play these openings truly well, though, Steamhammer needs greater ability to understand and react to the enemy’s timings.

• I fiddled with opening probabilities, mainly to continue to make openings more equally likely so that the opponent model can explore more efficiently. But also to include the new openings and to adjust to Steamhammer’s new strengths and weaknesses.

• I split the opening 9PoolSpeed into 2 variants. One variant keeps the same name and makes fewer zerglings (hit them by surprise, then transition to a normal game), the other is called 9PoolSpeedAllIn and makes more zerglings (hit them hard and maintain pressure). Both are more effective in their ways than the former compromise opening.

• A few ZvZ openings make 1 drone fewer, to keep zergling numbers as high as possible.

• Some other changes.

Steamhammer 2.0 squad unit clustering

Squad unit clustering is the Steamhammer 2.0 change with the biggest effect on play. It was also the change that took the most effort to get working well, because it adds flexibility, which means more ways to go wrong.

Many bots form squads dynamically by clustering: Each cluster is a squad. Steamhammer forms squads as usual, and clusters the units in each squad into groups which act somewhat independently. The two arrangements are not necessarily different in effect; what matters is how decisions are made. Steamhammer’s system is meant to keep a clean distinction between levels of abstraction, the operational level at which squads are formed and given orders, and the tactical level at which units are maneuvered to carry out the orders. Each cluster in the squad makes decisions based on its local situation, but it is ultimately trying to carry out the same order as the other clusters in the squad.

Clustering is driven by a simple imperative: Units must react to their local environment. In the unified squad structure that Steamhammer inherited from UAlbertaBot, the squad runs a single combat simulation for front-line units which are in contact with the enemy, and the entire squad is told to advance or retreat based on the result. Units which are away from the front lines, but in contact with the enemy, are given nonsensical orders as often as not. For example, if the front line is advancing because the enemy army is away from home, then small groups of reinforcing units which run into the enemy army are also told to attack, and they end up dying. And conversely, if the front line is afraid to attack, then units behind the scenes which meet lone enemies are also afraid of them. Units must react to the situation they are in, not the situation some other unit is in.

With clustered squad units, each cluster that comes into contact with the enemy runs its own combat simulation, and advances or retreats independently depending on the result. When the army is large, the squad members tend to be spread over a wide area, and independent behavior is a necessity. The full set of cluster behaviors is complex; keep reading.

the clustering algorithm

The clustering algorithm itself is implemented in OpsBoss (intended to eventually replace CombatCommander), which is probably not the right place for it. Methods cluster either a given set of units (like the members of a squad) or all units of a player. The feature to cluster all units is intended for clustering enemy units, and not currently used, but I expect it will be valuable for deeper tactical analysis when I get that far.

A cluster is either an air cluster or a ground cluster. Air units and ground units have different movement possibilities—different decisions available—so they should be treated differently. Only an air cluster can retreat over a cliff.

Steamhammer makes circular clusters. I expect that rectangular clusters are as good, and probably a tad cheaper to calculate, but also a tad more complex. Pick an arbitrary unit as the seed of a cluster, find nearby units, calculate the center and radius of the cluster so far, and expand the radius by a fixed amount to see if that draws in any more units. If so, recalculate the center and radius, and so on; if not, the cluster is complete. Repeat until all units are in clusters. A cluster of size 1 is fine.
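The seed-and-expand loop above can be sketched in self-contained C++. This is a minimal illustration, not Steamhammer’s actual code; the `Pos` type, the `margin` parameter, and the function names are hypothetical stand-ins for BWAPI units and pixel distances:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical minimal types; the real code clusters BWAPI units.
struct Pos { double x, y; };

struct Cluster {
    Pos center{0, 0};
    double radius = 0;
    std::vector<Pos> units;
};

// Recompute the center (centroid) and radius (max distance to centroid).
static void recompute(Cluster & c) {
    double sx = 0, sy = 0;
    for (const Pos & p : c.units) { sx += p.x; sy += p.y; }
    c.center = { sx / c.units.size(), sy / c.units.size() };
    c.radius = 0;
    for (const Pos & p : c.units) {
        c.radius = std::max(c.radius, std::hypot(p.x - c.center.x, p.y - c.center.y));
    }
}

// Seed-and-expand circular clustering: take an arbitrary unclustered unit
// as the seed, pull in every unit within radius + margin, recompute the
// center and radius, and repeat until the cluster stops growing. Then
// start a new cluster. A cluster of size 1 is fine.
std::vector<Cluster> clusterUnits(std::vector<Pos> units, double margin) {
    std::vector<Cluster> clusters;
    while (!units.empty()) {
        Cluster c;
        c.units.push_back(units.back());
        units.pop_back();
        recompute(c);
        bool grew = true;
        while (grew) {
            grew = false;
            for (size_t i = 0; i < units.size(); ) {
                double d = std::hypot(units[i].x - c.center.x,
                                      units[i].y - c.center.y);
                if (d <= c.radius + margin) {
                    c.units.push_back(units[i]);
                    units.erase(units.begin() + i);
                    grew = true;
                } else {
                    ++i;
                }
            }
            if (grew) recompute(c);
        }
        clusters.push_back(c);
    }
    return clusters;
}
```

Two well-separated groups of units come out as two clusters; a lone unit comes out as a cluster of one.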

clustering in squads

Each squad reclusters its units every frame. That means that clusters have no continuity; there is no such thing as “the same cluster” the next frame. It’s a limitation. Some units, like overlords and defilers, are left out of clusters and handled separately. To save time, not every cluster makes decisions on every frame, especially in the late game when there may be many clusters. See Squad::update() for all this stuff.

The clusters then make decisions independently. Each is trying to reach the squad’s order position, and decides what to do based on its local situation. The clusters know that they are working together, so if they see a way, they will try to join up into larger clusters. If you turn on Config::Debug::DrawClusters in the configuration file (it is currently turned on for the SSCAIT stream), you’ll see (among other things) a status string for each cluster: Join Up, Advance, Attack, Retreat, and so on. The decisions work like this:

• If there is no enemy nearby that the cluster can attack, or that can attack the cluster, then the cluster is in a “no fight” situation—combat is impossible for the moment.

• If a “no fight” cluster finds itself in the vanguard, its status string is “Advance” and it moves toward the squad’s order position.

• If a “no fight” cluster is behind other clusters, its status string is “Join Up” and it tries to merge with a cluster that is ahead of it. If the cluster ahead is advancing, they will form a train; if the cluster ahead is stopped or in combat, the trailing cluster may be able to join it.

• Other clusters have to decide whether to attack or retreat (aka regroup). First, the code checks several shortcuts to see if combat sim can be skipped.

• If Steamhammer is near max supply, the status string is “Banzai!” and the decision is to attack. Every cluster of every squad with an attack order will try to get into the fight.

• If the cluster is near a static defense building that can help in the fight, the decision is to attack. There are a couple of different cases, which get different strings mentioning static defense. The behavior is not as good as it could be (most zerg units should move behind the static defense before turning around to fight, instead of standing in front), but it’s an improvement over past versions.

• If the cluster has retreated as far as it can, back into its base which is now under attack, the decision is to attack and the string is “Back to the wall”.

• If none of those checks fires, there is nothing for it but to run the combat sim. Depending on the result, the status string will be “Attack” or “Retreat”. On the enemy side, all the nearby enemies are added to the combat sim, as usual. On the friendly side, only the cluster’s own units are added; clusters assume that they are unable to cooperate. This works better than adding all nearby friendly units, because in fact different friendly units often do not cooperate. The downside is that clusters cannot cooperate when they should, for example to carry out a sandwich maneuver. A better system would be to identify top-down which clusters have the same enemies in their sights, and run one combat sim for all those clusters. I expect to do that at some point after I turn on enemy unit clustering.

• If the decision is to retreat, where should the cluster retreat to? There are several rules for this, checked in Squad::calcRegroupPosition(). I have said that Steamhammer’s old tactical misbehaviors are gone, but it’s not true: Clusters sometimes still try to retreat through the enemy force. There is plenty of room to improve retreating. At some point after I turn on enemy unit clustering, I’ll teach the retreat calculation more.

• Retreating: If there is nearby static defense in any direction, retreat toward it.

• Retreating: If part of the cluster is in range of the enemy (so we did a combat sim) and part is out of range (so it is a safe place to retreat to), retreat to the position of the cluster unit out of enemy range which is closest to the squad’s order position. This is a cluster variant of the classic UAlbertaBot retreat behavior. Restricting it to one cluster at a time makes it behave much more nicely—no more retreating to a random position because some reinforcing unit happens to be approaching from an unexpected angle.

• Retreating: Look for another cluster nearby to join up with. Favor nearby clusters that are closer to the order position and clusters which are attacking. It’s good in general for clusters to join up, but this rule also tries to fix a specific weakness: If a small cluster does its combat sim with its few units, it may be told to retreat, even though a large cluster in front of it has done its combat sim and decided to attack. It is much better for the small cluster to join the large one in its attack, so in this case Steamhammer tries to merge the clusters, or in other words, it retreats toward the enemy. It doesn’t always work; cases still occur of small clusters which are needlessly fearful.

• Retreating: If all else fails, retreat toward the main base.
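The decision cascade above can be condensed into a single function. This is an illustrative sketch, not Steamhammer’s code; the `ClusterSituation` fields are hypothetical flags standing in for the real game-state queries:

```cpp
#include <cassert>
#include <string>

// Hypothetical summary of one cluster's local situation; the real code
// derives these from game state in Squad::update() and related methods.
struct ClusterSituation {
    bool enemyInContact;            // can we attack them, or they us?
    bool inVanguard;                // frontmost cluster of the squad?
    bool nearMaxSupply;             // Steamhammer is near max supply
    bool nearHelpfulStaticDefense;  // friendly static defense can join the fight
    bool backToTheWall;             // retreated into a base now under attack
    bool simSaysAttack;             // combat sim result, own units only
};

// The cascade in the order described: "no fight" cases first, then the
// shortcuts that skip the combat sim, and the sim only as a last resort.
std::string decide(const ClusterSituation & s) {
    if (!s.enemyInContact) {
        return s.inVanguard ? "Advance" : "Join Up";
    }
    if (s.nearMaxSupply)            return "Banzai!";
    if (s.nearHelpfulStaticDefense) return "Attack (static defense)";
    if (s.backToTheWall)            return "Back to the wall";
    return s.simSaysAttack ? "Attack" : "Retreat";
}
```

The point of the ordering is that the combat sim, the most expensive and least trusted step, runs only when no cheaper rule settles the question.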

other details

The call Squad::unitNearEnemy() decides whether a unit is “in range” of the enemy. In the past, Steamhammer often did not retreat far enough for safety, especially if the enemy had sieged tanks. I improved it in 2 ways:

• It uses the information manager’s records of all enemy units’ last-known positions to decide whether an enemy is near, instead of using the MapGrid information about visible enemies only. Having to see the enemy to know to stay out of its firing range was especially bad for zerglings, which have a short sight range. It was also bad against sieged tanks, which can hit from out of sight range.

• If the enemy is known to have tanks with siege mode, then ground units make sure they stay outside of sieged tank range. Before, the assumed safe retreat distance was simply not far enough to be safe from tanks! It was painful to see a squad retreat to just inside siege range, lose the frontmost units to tank fire, and—move forward to stay centered on the retreat point. Ouch!
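The siege-range fix above amounts to one comparison: the “near enemy” radius must exceed the longest enemy weapon range, or a retreat can stop inside siege range. A minimal sketch under assumed numbers (sieged tanks have a 12-tile maximum range in Brood War, 384 pixels; the safety margin and the old default radius are made-up values for illustration):

```cpp
#include <cassert>

// Assumed distances in pixels (1 tile = 32 pixels).
constexpr int SIEGED_TANK_RANGE = 12 * 32;  // 384, sieged tank max range
constexpr int DEFAULT_NEAR_RANGE = 8 * 32;  // hypothetical old, too-short radius

// Decide whether a ground unit at the given distance (pixels) from the
// nearest last-known enemy position counts as "near the enemy". If the
// enemy has siege mode, the danger radius must cover sieged tank range.
bool unitNearEnemy(int distance, bool enemyHasSiegeMode) {
    const int margin = 2 * 32;  // assumed safety buffer beyond tank range
    const int danger = enemyHasSiegeMode
        ? SIEGED_TANK_RANGE + margin
        : DEFAULT_NEAR_RANGE;
    return distance <= danger;
}
```

With the old radius, a unit 400 pixels from a sieged tank would have been considered safe while still standing inside the tank’s fire.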

I made the minimum changes to Squad to get all this to work. The code is getting increasingly messy with special cases. For example, the micro managers which handle clustered units have to be told about the cluster and do a set intersection to figure out which units they should issue commands to, and the micro managers that handle unclustered units do not—squad code and micro code have to be coordinated, making refactoring more difficult. I will eventually rewrite the squads with a new and cleaner design, but probably not this year.

overall

This was a needed change. Units simply must pay attention to their own situations; there is no way around it.

Play versus protoss is vastly stronger. Voters have been feeding Steamhammer a lot of games against top protoss bots, which zerg mostly loses. Even in these losing games, it is easy to see that Steamhammer is putting up a much tougher fight than it used to. Since my home tests against DaQin, in which Steamhammer learned to defeat it, DaQin has been updated to be more aggressive and Steamhammer can no longer win (and I don’t know what version was entered into AIIDE, so we’ll see how the tournament goes). I can see the path to stronger zerg play, to earning wins over Locutus and PurpleWave. I can tell what needs to be done and I have confidence that it will succeed.

Play versus terran varies, and play versus zerg is weaker because of poor zergling behavior. A large part of the poor zergling behavior is due to clustering changes. In the past, when Steamhammer retreated a squad, it would not rerun the combat sim and consider attacking again until a time limit had passed (I changed the details thoroughly, but the behavior was inherited from UAlbertaBot). Since clusters do not persist from one frame to the next, it was impossible to retain this feature. The result is more indecision, and the indecision strongly affects zergling play, and causes other harm as well. It’s not easy to fix, and I’m still thinking about ways.

Part of the weakness is in retreating from a superior force that then advances. Not all of the enemy force is in sight range, and the rest is assumed to be still in its last-seen positions, so the combat sim says “ha, we can turn around and fight now!” With no retreat time limit, and the short sight range of zerglings, this happens fast and causes serious losses. The weakness was always there, and now it is worse.

Next: The big change list with everything else.