
AIIDE 2018 - race balance

This post is about race balance in AIIDE 2018, but first I want to show my version of the crosstable, because mine is minutely different from the official results. My version includes exactly one game that the official results skip, game 11821 in round 33 between AILien and CSE, recorded as a crash for CSE after 23400 frames. The only other games omitted from the official results are games that failed to start and are recorded as having a length of -1 frames. I have no idea why this one game was dropped when other crash games were kept. In any case, the effect is that my win rates for CSE and AILien are different from the official rates by a small amount, and that affects my other statistics, though the difference is only one game and it’s hardly noticeable.

(Cells are win percentages for the row bot against the column bot. Nearly all cells are based on 100 games; a few pairs played slightly fewer because some games failed to start, and exact game counts are omitted here.)

bot          overall SAID Cher  CSE Blue Locu ISAM DaQi McRa Iron ZZZK Stea Micr Last  Tyr Meta Leta Arra Ecgb UAlb Ximp CDBo Aiur Kill Will AILi CUNY Hell
SAIDA         95.91%    -   83   93   97   96   89   95   86   98   97   97  100  100  100   96   98  100  100   87   98   96   95   96   99  100   98  100
CherryPi      90.86%   17    -   72   86   89   96  100   85   96   91   81   91   82   96   99   98   89   99   98   99  100   99  100  100  100  100  100
CSE           87.08%    7   28    -   66   68   78   84   72   98   91   98   99  100  100   94   91  100  100   98  100   99   96  100  100   99   99  100
BlueBlueSky   81.48%    3   14   34    -   61   66   92   72   98   88   97   94  100  100   64   95   92  100   89   95   96   72   97  100  100  100  100
Locutus       81.01%    4   11   32   39    -   56   76   54   88   95   97   96  100   97   94   94   97  100   94   94  100   95   98  100   98  100  100
ISAMind       78.46%   11    4   22   34   44    -   63   49   98   89   96   93  100   95   88   92   98  100   95   93   99   82   97  100  100  100  100
DaQin         72.39%    5    0   16    8   24   37    -   42   92   87   99   96   96   99   61   95   99  100   88   73   97   81   98  100   97   93  100
McRave        65.74%   14   15   28   28   46   51   58    -   55   72   63   79   96   88   51   74   99   96   72   41  100   77   58   74   83   92  100
Iron          63.79%    2    4    2    2   12    2    8   45    -   41   73   86   91   94   99   66   94  100   91   99   88   97   85  100   93   91  100
ZZZKBot       51.13%    3    9    9   12    5   11   13   28   59    -   59   57   55   35   80   40   83   25   86  100   72   89   58   92   73   85   99
Steamhammer   50.99%    3   19    2    3    3    4    1   37   27   41    -   24   25   53   57   74   57   96   89   79   84   88   91   93   83   95  100
Microwave     50.46%    0    9    1    6    4    7    4   21   14   43   76    -   57   52   72   80   68   73   81   65   78   81   91   86   51   98  100
LastOrder     49.23%    0   18    0    0    0    0    4    4    9   45   75   43    -   57   88   75   83    4   95   85   89   95   89   35   92   96  100
Tyr           44.60%    0    4    0    0    3    5    1   12    6   65   47   48   43    -   20   61   96   46   58   98  100   50   94   49   62   94  100
MetaBot       44.42%    4    1    6   36    6   12   39   49    1   20   43   28   12   80    -   41   62   86   55   40   68   73   49   91   65   87   99
LetaBot       37.80%    2    2    9    5    6    8    5   26   34   60   26   20   25   39   59    -   42   76   41   78   44   36   53   23   92   89  100
Arrakhammer   37.24%    0   11    0    8    3    2    1    1    6   17   43   32   17    4   38   58    -   69   43   71   77   54   59   94   63  100  100
Ecgberht      36.72%    0    1    0    0    0    0    0    4    0   75    4   27   96   54   14   24   31    -   58   79   38   57   66   58   84   95   95
UAlbertaBot   34.71%   13    2    2   11    6    5   12   28    9   14   11   19    5   42   45   59   57   42    -   48   45   64   75   41   70   83   98
Ximp          32.61%    2    1    0    5    6    7   27   59    1    0   21   35   15    2   60   22   29   21   52    -    3   68   84   96   64   84   94
CDBot         31.98%    4    0    1    4    0    1    3    0   12   28   16   22   11    0   32   56   23   62   55   97    -   23   62   78   66   89   93
Aiur          31.56%    5    1    4   28    5   18   19   23    3   11   12   19    5   50   27   64   46   43   36   32   77    -   27   61   38   76   96
KillAll       29.64%    4    0    0    3    2    3    2   42   15   42    9    9   11    6   51   47   41   34   25   16   38   73    -   39   70   98   95
WillyT        27.76%    1    0    0    0    0    0    0   26    0    8    7   14   65   51    9   77    6   42   59    4   22   39   61    -   43   91  100
AILien        27.04%    0    0    1    0    2    0    3   17    7   27   17   49    8   38   35    8   37   16   30   36   34   62   30   57    -   96  100
CUNYBot        9.84%    2    0    1    0    0    0    7    8    9   15    5    2    4    6   13   11    0    5   17   16   11   24    2    9    4    -   95
Hellbot        1.36%    0    0    0    0    0    0    0    0    0    1    0    0    0    0    1    0    0    5    2    6    7    4    5    0    0    5    -

getting back to the subject at hand...

Here is the overall race balance. The only random player was UAlbertaBot, so that row and column reflect only UAlbertaBot’s results.

overall   vT   vP   vZ   vR
terran    52%  45%  60%  67%
protoss   57%  55%  63%  67%
zerg      43%  40%  37%  62%
random    35%  33%  33%  38%

The top winner was terran and the runner-up was zerg, but the Locutusoid mass still makes it look like a protoss world. As always, race balance reflects more who the entrants are than anything else.

Here is the more interesting table, how each bot performed against each race.

 #  bot          race     overall   vT   vP   vZ   vR
 1  SAIDA        terran    95.91%  99%  95%  97%  87%
 2  CherryPi     zerg      90.86%  82%  93%  93%  98%
 3  CSE          protoss   87.08%  79%  86%  91%  98%
 4  BlueBlueSky  protoss   81.48%  79%  76%  88%  89%
 5  Locutus      protoss   81.01%  77%  74%  89%  94%
 6  ISAMind      protoss   78.46%  80%  67%  88%  95%
 7  DaQin        protoss   72.39%  78%  54%  86%  88%
 8  McRave       protoss   65.74%  63%  57%  76%  72%
 9  Iron         terran    63.79%  67%  50%  74%  91%
10  ZZZKBot      zerg      51.13%  44%  43%  61%  86%
11  Steamhammer  zerg      50.99%  59%  39%  58%  89%
12  Microwave    zerg      50.46%  51%  37%  63%  81%
13  LastOrder    zerg      49.23%  25%  39%  70%  95%
14  Tyr          protoss   44.60%  32%  29%  65%  58%
15  MetaBot      protoss   44.42%  45%  44%  44%  55%
16  LetaBot      terran    37.80%  34%  33%  44%  41%
17  Arrakhammer  zerg      37.24%  45%  26%  46%  43%
18  Ecgberht     terran    36.72%  20%  28%  51%  58%
19  UAlbertaBot  random    34.71%  33%  33%  38%    -
20  Ximp         protoss   32.61%  28%  33%  33%  52%
21  CDBot        zerg      31.98%  42%  23%  35%  55%
22  Aiur         protoss   31.56%  35%  30%  31%  36%
23  KillAll      zerg      29.64%  28%  26%  35%  25%
24  WillyT       terran    27.76%  30%  21%  31%  59%
25  AILien       zerg      27.04%  18%  27%  32%  30%
26  CUNYBot      zerg       9.84%   7%  15%   5%  17%
27  Hellbot      protoss    1.36%   1%   1%   2%   2%

At the top, it is interesting that #1 SAIDA had a little trouble versus random #19 UAlbertaBot, which CherryPi steamrolled, and #2 CherryPi had a little trouble versus terrans, which SAIDA annihilated. The upper protoss bloc, the Locutusoids plus #8 McRave, crushed all competition below them, with the worst matchup at 54%. Everybody in the top half was strong against zerg, even the zergs. (#11 Steamhammer was the weakest of them at ZvZ due to the micro bugs that are fixed in version 2.1.) Four zergs are closely grouped around 50%, but had different paths to get there. #15 MetaBot was equally good against all races, which I think is a sign that its 3 heads are well balanced. #18 Ecgberht was strikingly stronger against zergs than other races, reflecting its infantry first strategy. There are not many surprises in the table.

AIIDE 2018 - the unfamiliar bots

I did a quick investigation of the AIIDE bots whose names I didn’t know before. Some of them are interesting, and I will follow up.

#3 CSE and #4 BlueBlueSky were quickly identified as Locutus forks. The links are to diffs helpfully prepared by Bruce, author of Locutus.

#13 LastOrder is credited to Bilibili AI Research and is related to Overkill. Bilibili is a Chinese video sharing website, and their stock prospectus boasts of “our robust artificial intelligence empowered, interest-based content curation,” so it sort of makes sense. LastOrder uses the TensorFlow machine learning framework. The description says “all the production of unit (excluding overlord), building, upgrade, tech and trigger of attack are controlled by a pre-trained Tensorflow model using method similar to ape-X DQN. model are trained distributively against 20 different bots on a cluster of 1000 machines.” It promises more details later at the github of Sijia Xu, the author of Overkill. I see an 8MB file that looks like machine learning results. There is Python code that I take to be TensorFlow glue, and C++ code that looks like Overkill. Overkill was strong in 2015 (which is the version still running on SSCAIT), but its learning experiments in 2016 and 2017 were not successful. LastOrder finished in the top half, so I gather that this time there was good progress, perhaps with the help of a larger team.

#15 MetaBot (link to github repo) is an evolved version of the familiar MegaBot. Instead of MegaBot’s 3 heads NUSBot, Skynet, and Xelnaga, MetaBot has AIUR, Skynet, and XIMP, a much stronger combination. MegaBot settled on Skynet versus most opponents, because Skynet is much stronger than NUS or Xelnaga; MetaBot should show more variety. MetaBot ranked higher than #20 XIMP or #22 AIUR (while Skynet did not compete), so there is at least a little evidence that 3 heads are better than 1.

#21 CDBot seems to have been forked from cpac, which is itself a not-very-deep fork of Steamhammer. It did not catch my interest.

#27 Hellbot is written in C# and relies on bwapi-mono-bridge2. It’s quite small and developed from scratch.

AIIDE 2018 results announced

AIIDE 2018 results are out, and they’re exciting!

  1. SAIDA
  2. CherryPi
  3. CSE
  4. BlueBlueSky
  5. Locutus

SAIDA is first and Locutus is only fifth! #1 SAIDA scored 96% overall, 83% against #2 CherryPi, and higher against every other opponent, so the winner was clear. That sounds to me like a fair claim to be world champion. All the top bots became much stronger this year. Look how far down #9 Iron has been pushed.

Zergs #10 ZZZKBot (last year’s winner), #11 Steamhammer, #12 Microwave, #13 LastOrder all scored about 50%. That looks miserable, but it put them right behind #9 Iron and in the upper half of the table—there were 27 participants total. The top bots wiped the floor with all others. Former contenders #19 UAlbertaBot, #20 XIMP, and #22 AIUR scored in the vicinity of 33%.

I’m pleased that #11 Steamhammer finished about as high as it did last year. I have more or less kept up with progress—except progress at the summit.

Stand by for tables and analysis. I’ll be looking at the top bots to see what makes them tick. I don’t know anything about #3 CSE and #4 BlueBlueSky so that should be interesting.

Steamhammer 2.1 change list

Steamhammer 2.1 is uploaded. The headline feature is that terran and protoss play acceptably well again, unlike in version 2.0, so Randomhammer is updated. Zerg should also play significantly better: Zergling micro is fixed, scourge are more effective (I hope), and defilers are more active.

I reset Steamhammer’s learned data for version 2.0, because 2.0 plays differently. Since then it has learned enough to play its best against some opponents, but it is still short of equilibrium. It reached a final elo of 2164, which is probably lower than it would be with enough games under its belt. Version 2.1 is not that different from 2.0, so it is continuing with the same data.

I reset Randomhammer’s data for 2.1, since the existing data is for version 1.4.7. It will have to figure its opponents out from scratch again, which will take months.

Stand by for source. I’m way behind in updating Steamhammer’s web page. Here is the change list:

opponent model

• Fixed a bug in InformationManager that could fail to recognize an in-base proxy.

macro

• Configured WorkersPerPatch to 2.0 for terran and protoss, to reflect mineral locking. Mineral locking helps terran and protoss over zerg, because bases tend to have more workers. Reducing the maximum worker count per base while maintaining full mining efficiency means that much more cash and supply to spend on tech and army.
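As a rough illustration of the arithmetic (the previous value of 3.0 workers per patch is my assumption for comparison, not a figure from the post):

```cpp
#include <cassert>

// Illustrative mineral-saturation arithmetic for the WorkersPerPatch setting.
// The function name is invented for this sketch.
int saturationWorkers(int mineralPatches, double workersPerPatch)
{
    return int(mineralPatches * workersPerPatch);
}
```

At a typical 9-patch base, 2.0 workers per patch saturates at 18 workers rather than 27 at 3.0, freeing 9 supply and 450 minerals' worth of workers for tech and army while mineral locking keeps mining efficiency full.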

tactics

• Medics and tanks act in clusters. They are controlled by their own code, which needed updating.

• Bug fix: A cluster of all medics which happened to reach the front line would continue to advance heedlessly into the enemy position and die. Now it falls back toward the base, where it will normally soon merge with another cluster that is on the way. To implement this, I split the former “no fight” case into 2 cases, “I have no fighting units” and “there are no enemies nearby,” which behave differently. A cluster of all medics is an example of having no fighting units.
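The split described above can be sketched as a tiny classifier. This is illustrative only; the enum and function names are my invention, not Steamhammer’s:

```cpp
#include <cassert>

// Sketch of splitting the former single "no fight" case in two.
// Names are illustrative, not Steamhammer's actual identifiers.
enum class ClusterCase
{
    Fight,            // fighting units present and enemies nearby
    NoFightingUnits,  // e.g. a cluster of all medics: fall back toward base
    NoEnemiesNearby   // nothing to fight here: carry on
};

ClusterCase classifyCluster(bool haveFightingUnits, bool enemiesNearby)
{
    if (!haveFightingUnits)
    {
        return ClusterCase::NoFightingUnits;
    }
    if (!enemiesNearby)
    {
        return ClusterCase::NoEnemiesNearby;
    }
    return ClusterCase::Fight;
}
```

The point of the split is that the two non-fight cases get different behaviors: no fighting units means retreat toward safety, no enemies nearby means keep moving.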

• Steamhammer knows a number of places that a cluster of units can retreat to. I moved “retreat to join up with another cluster” (which can be an advance rather than a retreat) ahead of “retreat to the location of a cluster unit which is not near the enemy” in priority order. It helps clusters merge a little more often.

• When there are many clusters, Steamhammer saves execution time by not updating all clusters every frame. It divides the clusters into “phases” and updates one phase per frame. In Steamhammer 2.0, the phases were calculated incorrectly, which contributed to bad micro (though it wasn’t the biggest cause). Fixed.
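The phase idea amounts to round-robin scheduling across frames. A minimal sketch, with an assumed phase count (Steamhammer's actual number of phases may differ):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch of phased cluster updates: with NumPhases phases, a
// cluster is serviced once every NumPhases frames, spreading the work out.
const int NumPhases = 4;   // assumed value for illustration

bool shouldUpdateCluster(std::size_t clusterIndex, int frame)
{
    // Update only the clusters whose phase matches the current frame's phase.
    return clusterIndex % NumPhases == std::size_t(frame % NumPhases);
}
```

A bug in a calculation like this means some clusters go unserviced for long stretches, which matches the bad micro symptom described above.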

combat sim

• Combat sim scores are based on the cost of units, not the destroyScore, because destroyScore is sometimes strange. For now I set score = mineral cost + gas cost.

• Steamhammer uses the combat sim scores in an unusual way to decide who won the simulation: It’s the side that ended up with more stuff surviving. It seems illogical, but it tested better than alternatives I tried, such as the side that lost less. Still, there are pathological cases where it gives a wrong answer. I fixed this pathological case: If you lost nothing, then you won the simulation, even if you finished with less stuff surviving. (If the other side also lost nothing, then you still won, because a draw counts as a win.)
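The winner rule plus the pathological-case fix can be sketched as follows, using the cost-based scores (score = mineral cost + gas cost). The struct and names are mine, not Steamhammer's:

```cpp
#include <cassert>

// Hedged sketch of "the side with more surviving stuff wins," with the
// pathological-case fix described above. Scores are unit costs.
struct SimResult
{
    int mySurvivingScore;     // cost of my units still standing at the end
    int enemySurvivingScore;  // cost of enemy units still standing
    int myLostScore;          // cost of my units destroyed during the sim
};

bool simWin(const SimResult & r)
{
    // Fix: if we lost nothing, we won, even if the enemy finished with
    // more surviving stuff. (If the enemy also lost nothing, it's a draw,
    // and a draw counts as a win.)
    if (r.myLostScore == 0)
    {
        return true;
    }
    // Otherwise the side with more surviving stuff wins; ties go to us.
    return r.mySurvivingScore >= r.enemySurvivingScore;
}
```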

• I also changed the units included in the combat sim in special cases. 1. If you have nothing but ground units that can’t shoot up, then ignore enemy air units that can’t shoot down. Because the side with more surviving stuff wins, this can affect who wins the simulation. This fixes another pathological case, where for example zerglings might run away from corsairs. 2. If you’re scourge, include only ground enemies that can shoot up. Scourge are never afraid of the air units that they want to destroy, only of useless death by ground enemies.

At some point I’ll add the other natural special case, for air units that can’t shoot down. All this combat sim stuff could be way more sophisticated....
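The two inclusion rules above can be sketched as filters; the unit representation here is invented for illustration:

```cpp
#include <cassert>

// Minimal unit model for the sketch, not BWAPI's actual interface.
struct SimUnit
{
    bool isAir;
    bool canShootUp;
    bool canShootDown;
};

// Case 1: if all our units are ground units that can't shoot up, ignore
// enemy air units that can't shoot down (e.g. zerglings vs. corsairs).
bool includeEnemy(bool weAreAllGroundNoAA, const SimUnit & enemy)
{
    if (weAreAllGroundNoAA && enemy.isAir && !enemy.canShootDown)
    {
        return false;
    }
    return true;
}

// Case 2: scourge include only ground enemies that can shoot up; scourge
// are never afraid of the air units they want to destroy.
bool scourgeIncludesEnemy(const SimUnit & enemy)
{
    return !enemy.isAir && enemy.canShootUp;
}
```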

micro

• I fixed the biggest cause of poor micro in Steamhammer 2.0. As part of choosing an enemy target, the melee and ranged unit controllers called CanCatchUnit() to see whether the enemy unit would be able to escape if chased. It was meant to prevent goose chases. The results of CanCatchUnit() were apparently wrong; I haven’t looked into what the trouble was, because I found that removing the calls had no effect on goose chases—long chases have become rare due to other changes. The error caused units to overlook targets that they could, and often should, attack. Zergling micro became weak, and zealot micro was worse. It’s all back to normal now.

• Vultures and wraiths would become fixated on their targets, unable to switch away even to retreat in an emergency. Fixed.

• Like clusters, defiler actions are divided into phases. There was a bug in coordinating the cluster phases and defiler phases, so that defiler actions might be skipped for a long time depending on the phase of the cluster the defiler was in. Fixing it makes defilers more active, though they still don’t swarm or plague as often as they should.

• Scourge is allowed to spread out more in regrouping. Mutalisks should usually group tight, scourge should spread out some. Formerly, all air units behaved the same.

• I made a couple adjustments to defiler plague. 1. Plague on a building was formerly worth 0.6 of plague on units with the same hit point loss; I changed the discount to 0.3, half as much, so that defilers will try harder to plague units. Buildings have a lot of hit points and threaten to dominate the scoring. (Static defense buildings are treated the same as mobile units, though.) 2. Plague gets a bonus for carrier interceptors, to exploit the plague-on-interceptor behavior, but I didn’t see Steamhammer trying hard to plague XIMP’s interceptors (only the carriers themselves). I increased the bonus by a factor of 4.
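The two adjustments can be sketched as a scoring function. The multipliers follow the description above; the function itself and its baseline (hit point loss) are my simplification:

```cpp
#include <cassert>

// Hedged sketch of the plague target scoring tweaks. Not Steamhammer's
// actual code; the base score is taken to be the hit point loss.
double plagueValue(double hitPointLoss, bool isBuilding, bool isStaticDefense, bool isInterceptor)
{
    double score = hitPointLoss;
    if (isBuilding && !isStaticDefense)
    {
        // Building discount reduced from 0.6 to 0.3, so defilers try harder
        // to plague units; static defense is scored like mobile units.
        score *= 0.3;
    }
    if (isInterceptor)
    {
        // The interceptor bonus, increased by a factor of 4 (the base
        // bonus value here is an assumption).
        score *= 4.0;
    }
    return score;
}
```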

zerg

• Scourge are in their own squad, the Scourge squad. They behave somewhat better in my tests, but it’s primitive and they still have a long way to go. I mentioned a couple other improvements to scourge behavior above. It was surprisingly difficult to get scourge to do anything sensible.

• Get an evolution chamber and a spore colony for air defense when needed even if still in book. Steamhammer formerly waited until the book line came to an end before it dared defend itself. I think this will be a net gain, though it will make mistakes sometimes.

• Tweak: Enemy dragoons and dark templar loom a little larger as reasons to make static defense.

• Fixed a bug in deciding to get defilers. Battlecruisers are an excellent reason to get defilers; valkyries, not so much.

• Strongly avoid spawning mutalisks versus large valkyrie counts. Valkyries in numbers pass through mutas like they’re hardly there.

openings

• Fixed a typo in the opening name Over10Hatch2SunkHard in the AntiZealot strategy combo. When this opening was selected, Steamhammer couldn’t find it and played its default 9 pool instead, a poor choice against mass zealots.

• Added the zerg opening AntiFactoryHydra, which may be better against SAIDA’s unit mix than Steamhammer's original AntiFactory, and the terran opening 10-10-10FD in a form close to that popularized by Flash. 10-10-10 is an opening stem that gets a super-fast factory, and 10-10-10FD is a followup that continues into an attack with 2 tanks and 8 marines, which is strong against a protoss that techs too hard or expands fast.

• I configured terran and protoss counters to the Naked expand plan by protoss; that is, I specified which openings are to be tried as counters when that enemy plan is expected. 10-10-10FD should be a good counter to protoss Naked expand.

debug drawing

• In the game info display (drawn in the upper left of the screen when turned on), Steamhammer 2.0 added an overall score versus this opponent, shown next to the opponent name, “2-3” meaning 2 wins and 3 losses. Steamhammer 2.1 also adds a score for the chosen opening, drawn next to the opening name. The context makes it easier to interpret Steamhammer’s choices. The numbers are specific to the matchup. Randomhammer will show different numbers depending on what race it rolled.

• Squads have 2 settable flags, “fight visible only” (only include visible enemies in the combat sim, not all known enemies) and “meatgrinder” (be more aggressive, willing to accept high losses). Visible-only is used by the Recon squad, and meatgrinder tested poorly and is not used. If the squad info display is turned on, it draws cyan V and M left of the squad’s information line if the flags are turned on.

downtime

The blog had some downtime due to problems at my hosting provider. There seem to be a few lingering issues, but I think the blog itself is good now. Sorry, they are usually reliable!

openbw.com

Oops, it seems that the openbw.com domain name was not renewed in time. It’s a service we’ve come to rely on. Replay links will not work until it is restored—or somebody brings up a replacement service.

Update: It’s back. They handled the issue quickly.

Killerbot-SAIDA games

I see that Killerbot by Marian Devecka has figured out its own way to beat SAIDA with its persistent mutalisk pressure: win 1 and win 2. I’m not sure how consistent it is, but as soon as SAIDA starts taking serious economic damage, terran only goes downhill. The update is a few days old now, and the most visible change is that Killerbot makes a few unupgraded hydralisks early, presumably only when the enemy has or is expected to get vultures. The hydras counter any early vulture or wraith tricks that terran might try. The idea is well known among human players, and Steamhammer uses it too.

I was looking at Steamhammer’s 2 hatch muta loss to SAIDA today, and thinking “With good muta micro and good decisions, zerg should win this.” I’ve been promising good muta micro for 2 years and haven’t delivered....

Update: Apparently SAIDA reacted by becoming more aggressive. It seems to have found an attack timing that works against Killerbot. Is this the same mechanism that solves rushes by adjusting timings, or a different kind of reaction? If it’s the same, the mechanism is quite general.

Steamhammer 2.1 progress

I have fixed the bugs affecting terran play, a bug affecting defilers, and a few bugs and weaknesses affecting all races. I also improved scourge control, and added a new opening to maintain my sanity. There is still a critical bug affecting protoss, which causes units to wander around without fighting, carrying banners “make levity not war.” Another severe weakness affects base defense, causing defenders to hang back from the action. I’ve spent the last 2 days trying to fix base defense, and it doesn’t work. I might have to think up a different solution.

Everything is good except the last 2 critical problems. Not that there’s any shortage of other weaknesses, but these 2 are so bad that they can’t be ignored. Surely they won’t resist me for long, though. Stand by!

Update the next day: I got everything working well enough, so I thought. I ran final tests and found... the newly implemented scourge micro had stopped working, though I hadn’t touched anything related. Now what has gone wrong?

various short items

SAIDA

SAIDA has been updated and is again defeating Krasi0 and Locutus. The arms race continues!

CIG 2018

I started poking at the detailed results file to figure out how to reproduce the official results exactly... then I discovered that the build order problem was wider than it first seemed. I canceled my plans. We don’t need per-map crosstables and race balance analysis of a tournament with such badly distorted results.

Steamhammer 2.1

I haven’t been working that hard on it, but I have made progress. I fixed some of the bugs introduced along with squad clustering, and found the causes of others. 2.1 should have smoother play in many cases. To say the same thing differently, Steamhammer 2.1 is suffering from feature creep, or at least bug fix creep. Hang on, it shouldn’t take too much longer.

CIG 2018 - not only Overkill was broken

I’ve been watching CIG replays. There are way too many and I watched few, but I soon noticed that Overkill was not the only bot in the tournament with broken build orders due to some unknown incompatibility. Steamhammer is also affected in every game I checked, with symptoms similar to Overkill’s though apparently less severe. Strangely, UAlbertaBot (parent of both Overkill and Steamhammer) does not seem to be affected (though I did see one game where it opened 11-11 gate instead of 9-9 gate). Tyr also had build order problems in every game I checked, and it is unrelated code.

I checked a few other bots and did not see consistent problems as with Overkill, Steamhammer, and Tyr. For example, McRave fell into an early production freeze in one game, but it played normally in other games so that was likely a garden variety bug.

It explains some mysteries about Steamhammer’s performance, though it opens others.

It seems difficult to inventory all the participants and see which were affected by similar problems. For the weaker bots, we may have to read the source to tell whether build orders are working as intended.

Can anybody diagnose it? So far we only have speculation about the cause.

looking at ISAMind

As reported, ISAMind is a fork of Locutus, and the only important difference is that ISAMind can predict the opponent’s opening plan using a trained neural network instead of the rule-based method that Locutus inherited from Steamhammer (and modified). The author of Locutus has conveniently made a branch for ISAMind, so that we can compare and verify the differences.

The neural network is a standard feedforward network trained by backpropagation (we can see this in the network data in the file ISAMind.xml). The computation is implemented using OpenCV, a computer vision library that includes some general-purpose machine learning tools. I’ve looked at OpenCV in the past, and I think it is a good choice.

The input to the network is the frame count and the counts of any early game units seen, plus any opening plan recognized already so the network can decide whether to stick with it, plus the high-level features of the rule-based recognizer for proxy, worker rush, factory tech, and number of known enemy bases. With those high-level features, the network doesn’t have much thinking to do. The output is one of 10 possible opening plans, and if none is recognized it falls back on the rule-based recognizer again. In the best case, this neural network can’t provide much of a boost.

The key question is: How was the network trained? I looked all around and found no sign of an explanation. It could be trained to reproduce the values returned by the original rule-based method, but what would be the point? It could be trained to recognize the plan that leads to the highest win rate, but that would be expensive and might learn to deliberately misrecognize plans, so that’s less likely. It could have been trained on data from games against opponents that were coded to play specific plans, but that would risk a lack of variety in the training data. It could have been trained on hand-labeled data, if they took the time to label that much data. My best guess is that they wrote a replay analysis tool that pairs what the scout saw (“here is what I saw”) with the plan that would have been chosen if the scout saw everything (“here is what I should recognize”). The scout is early and normally does see everything that is in the enemy base (not proxied in some hidden location or built later outside the main), so if my guess is right, the trained network should normally be accurate.

I ran ISAMind locally to see what the predictions looked like. My impression is that the predictions were generally accurate, but sometimes sluggish. Steamhammer recognizes a zealot rush if it sees 2 gateways and no gas. ISAMind doesn’t recognize a zealot rush until it sees the zealots. Whether the sluggishness is harmful depends on whether the bot needs to react quickly. ISAMind recognizes a fast rush at the first sign, and that is the most important enemy plan to recognize immediately. Most input information has to come from the scout probe, and the scout probe circles the base without ever looking outside, so the network rarely sees enemy expansions unless it passes them on the way in; the input data is not complete enough to recognize all the possible plans reliably, because scouting is not thorough enough. (Locutus has a commit “Scout the enemy natural” only later, on 19 September.)

Overall, ISAMind seems like a cute little job that doesn’t try to accomplish much. It is like a class project or an early experiment to start getting tools and ideas into place.

looking at TitanIron

TitanIron is, as all signs indicated, a fork of Iron. It forks from the latest Iron, the AIIDE 2017 Iron. The Iron that played in CIG 2018 was carried over from the previous CIG 2017 tournament, and is an earlier version.

#15 TitanIron crashed in 30% of its games. Its win rate was 51.46% overall, or 73.59% in non-crash games. #6 Iron itself (an earlier version) finished with 74.31% win rate, so TitanIron does not seem to be an improvement, even discounting poor code quality. Curious point: #9 LetaBot upset Iron, because LetaBot copes well with vulture and wraith harassment. But TitanIron upset LetaBot. Another curious point: TitanIron performed poorly on the map Andromeda and strongly on Destination, and about equally well on the other 3 maps. Andromeda seems a surprising map to have trouble with.

I watched some replays. In Iron-TitanIron games, the two played identical build orders until the first factory finished, when Iron made 1 vulture first while TitanIron immediately added a machine shop to get the vulture upgrades faster. The bigger difference came later, when Iron built a starport and made wraiths while TitanIron did not. I got the impression that TitanIron rarely or never goes air. The expense of going air puts TitanIron ahead in vultures for a while, so that it won some games, but it seemed that if the vulture pressure did not push Iron over the edge, then Iron would strike back and take the advantage. I watched only 1 game Locutus-TitanIron, because Locutus’s proxy pylon trick misled TitanIron just as it does Iron, and Locutus won easily. I watched a strange game against AIUR where TitanIron built a second command center far from its natural, slowly floated it over, left it in the air, and built a new command center underneath. Not all the bugs are crashing bugs. In the picture, TitanIron is losing to AIUR. Notice the nicely spaced tanks, the spider mines directly next to one tank, the barracks floating in an unhelpful position, and the spare command center in the air.

extra command center

Overall, my impression is that TitanIron’s play is often similar to Iron’s. Unlike Iron, it does not make air units (it seems to have drop skills, but I didn’t run into any games with drop). Against protoss, TitanIron makes more tanks and uses them more cautiously and often clumsily. TitanIron also seems a bit fonder of expanding and growing its economy.

TitanIron adds over 4,000 lines of code to Iron. It was made by a team of 10, so that’s not an excessive amount of new code. The crash rate and the score suggest that the team was not disciplined enough in code quality and testing (of course Steamhammer crashed even more, so I don’t get to brag). Read on and you’ll see what most of the new lines of code do. I question the choices of where to spend effort. I’m not sure what the plan behind TitanIron was supposed to be.

openings

Iron does not play different openings as such. Conceptually, I see Iron as playing one opening which it varies reactively. TitanIron adds a directory opening with code which allows it to define specific build orders. The build order system is loosely modeled on Steamhammer’s, using similar names (which are not the same as UAlbertaBot’s names)—some members of the team have worked on Steamhammer forks.

TitanIron knows 3 specific build orders, named 8BB CC (1 barracks expand), SKT (tanks first), and 5BB (marines). Based on watching replays, TitanIron retains and uses Iron’s reactive opening, with modifications.

opponent-specific strategies

Iron does not recognize opponents by name. TitanIron recognizes 2 specific opponents: Locutus and PurpleSwarm. The zerg PurpleSwarm is a curious choice, since it did not play in CIG. Maybe they found it an interesting test opponent? In any case, Locutus is the main focus. It is recognized in 4 strategy classes, Locutus, SKT, TankAdvance, and Walling. In Iron’s codebase, any number of strategies can be active at the same time, and other parts of the code check by name which strategies are active to suit their actions to the situation.
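As a concrete illustration of that design, here is a minimal sketch of a set of named strategies that can all be active at once and are queried by name. The registry class and method names are my invention for illustration, not Iron’s actual API.

```cpp
#include <cassert>
#include <set>
#include <string>

// Minimal sketch of Iron-style strategy activation: any number of named
// strategies can be active at the same time, and other parts of the code
// check by name which strategies are active. This class is hypothetical.
class StrategyRegistry
{
public:
	void Activate(const std::string & name)   { m_active.insert(name); }
	void Deactivate(const std::string & name) { m_active.erase(name); }
	bool IsActive(const std::string & name) const
	{
		return m_active.count(name) > 0;
	}

private:
	std::set<std::string> m_active;
};
```

In a scheme like this, a barracks or turret would simply check something like IsActive("Locutus") before carrying out its scripted actions.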

	Locutus::Locutus()
	{
		std::string enemyName = him().Player()->getName();
		if (enemyName == "Locutus" || enemyName == "locutus")
		{
			me().SetOpening("SKT");
			m_detected = true;
		}
	}

SKT (defined in opening/opening.cpp) builds a barracks and refinery on 11, then adds 2 factories and gets tanks before vultures. It sounds as though it should refer to the “SK terran” unit mix of marines and medics with science vessels and no tanks, but it doesn’t. The Locutus strategy turns itself off (if I understand the code’s intent correctly) after all 4 dark templar of Locutus’s DT drop are dead, or after frame 13,000. Various buildings (barracks, factory, e-bay, turret) recognize when the Locutus strategy is active and carry out scripted actions. The name “Locutus” also activates the TankAdvance strategy which seems to first guard the natural and then perform a tank push, and deactivates the Walling strategy after frame 11,000 or when above 12 marines, causing the barracks to lift and open the wall.
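The two shutoff conditions, as I read them, can be restated as simple predicates. This is a hedged sketch in my own naming, not TitanIron’s code; the constants are the ones given above.

```cpp
#include <cassert>

// Hypothetical restatement of the shutoff logic described above.
// The constants come from the post; the function names are mine.
const int kDTDropCount      = 4;      // dark templar in Locutus's DT drop
const int kLocutusOffFrame  = 13000;  // Locutus strategy timeout
const int kWallingOffFrame  = 11000;  // Walling strategy timeout
const int kWallingMarineCap = 12;     // marine count that opens the wall

// The Locutus strategy stays on until the DT drop is dead or time runs out.
bool LocutusActive(int deadDarkTemplar, int frame)
{
	return deadDarkTemplar < kDTDropCount && frame < kLocutusOffFrame;
}

// The Walling strategy turns off after frame 11,000 or when above 12 marines,
// causing the barracks to lift and open the wall.
bool WallingActive(int marineCount, int frame)
{
	return frame < kWallingOffFrame && marineCount <= kWallingMarineCap;
}
```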

TitanIron scored a total of 1 win out of 125 games against Locutus, so the special attention does not seem to have paid off.

PurpleSwarm gets less attention. (The question is why it got any.)

	Purpleswarm::Purpleswarm()
	{
		std::string enemyName = him().Player()->getName();
		if (him().Race() == BWAPI::Races::Zerg &&
			(enemyName == "Purpleswarm" || enemyName == "purpleswarm" || enemyName == "PurpleSwarm"))
		{
			me().SetOpening("5BB");
			m_detected = true;
		}
	}

5BB (also defined in opening/opening.cpp) builds barracks on 10 and 12, later adding a third barracks and training marines up to 30. I don’t see any other cases where TitanIron uses this opening. The rest of the code has no special instructions for PurpleSwarm or 5BB.
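For flavor, two of the scripted openings can be pictured as entries in a named build order table, roughly Steamhammer-style. The encoding below is my own simplification, paraphrasing the descriptions above; TitanIron’s real format in opening/opening.cpp surely differs.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

typedef std::vector<std::string> BuildOrder;

// Hypothetical table of named openings, paraphrasing the post's descriptions.
std::map<std::string, BuildOrder> MakeOpenings()
{
	std::map<std::string, BuildOrder> openings;

	// SKT: barracks and refinery on 11, then 2 factories, tanks before vultures.
	openings["SKT"] = { "11 barracks", "11 refinery", "factory", "factory", "siege tanks" };

	// 5BB: barracks on 10 and 12, a third barracks later, marines up to 30.
	openings["5BB"] = { "10 barracks", "12 barracks", "barracks", "marines to 30" };

	return openings;
}
```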

other new files

Besides the opening directory, TitanIron adds 16 files in the strategy and behavior directories, defining 8 strategies and behaviors. The added strategies are:

  • GuardNatural
  • Locutus
  • PurpleSwarm
  • SKT
  • TankAdvance

These are remarkable for being all and only the classes used when Locutus or PurpleSwarm is recognized. Do they have any other purpose? I didn’t dig into it, but I suspect that GuardNatural and TankAdvance may be used more widely against protoss.

The added unit behaviors are:

  • GuardLoc - guard a location
  • HangingBase - carry out drops
  • SKTAttack - related to SKT

GuardLoc has some connection with GuardNatural, but seems to be a general-purpose behavior, as far as I can tell. I’m not sure how HangingBase got its name.

The new opening directory and the newly added strategy and behavior files account for about 2/3rds of the lines of code added to Iron. The rest is scattered through the code and not as easy to inventory, but surely much of it must be uses of the new openings, strategies, and behaviors. I do see a lot of changes related to expanding.

SAIDA’s learning and SAIDA’s weaknesses

SAIDA is holding its position as #1 on SSCAIT, but it is under constant attack from other bots and loses some games. On the one hand, SAIDA has weaknesses against early harassment and timing attacks, especially if the opponent denies scouting. On the other hand, SAIDA appears to have a learning mechanism that recognizes rush timing and figures out a defense. The SAIDA page describes it as “He also catches perfect rush timing by using information he collected.” That’s a vague description, but the behavior does appear to involve learning from experience. MicroDK noted that SAIDA writes data only after it loses; this must be why. For example, BananaBrain tried a dark templar rush and won a series of games, but finally the learning kicked in and SAIDA figured out how to get turrets in time to stop it (SAIDA’s code was not updated). Since then, BananaBrain has mostly lost games, defeating SAIDA only once, in this game where the turret was seconds late.

Other examples include PurpleSpirit winning one game with BBS then being unable to win with it again, and Krasi0 winning with its fast barracks marine cheese with similar results.

In the latest attacks, Locutus won with center gates, making only 2 zealots before switching into dragoons, and Krasi0 added a bunker to its marine cheese to overcome SAIDA’s vulture counter to the marines (SAIDA crashed this game). Will SAIDA learn to defeat these tricks too? I don’t know, let’s find out!

How powerful is this learning mechanism? Surely there must be attacks that it cannot figure out how to forestall—or can’t figure out in reasonable time. If you find 2 winning tricks and switch between them, can it learn to defend against both? If you DT rush once so that it learns to get early turrets, does it get early turrets for the rest of time after you switch back to regular play? The unnecessary turrets give you a small advantage, and at a high level of play, small advantages are big.

Here are some of the weaknesses I see in SAIDA’s play.

  • Poor defense against unscouted early attacks, mitigated by the learning mechanism. SAIDA loses more SCVs than it should.
  • SAIDA recovers poorly from economic setbacks. It does not replenish lost SCVs as well as it should, and stops expanding after a while. If you gain an early lead, you can win by holding on and waiting for SAIDA to mine out.
  • SAIDA is vulnerable to mine drags. It sees no danger in having its spider mines and its forces next to each other. It will even place mines in its mineral line, begging you to blow up its SCVs.
  • SAIDA does not know how to build in safe locations. On some maps, like Moon Glaive, parts of the main base are easily sieged from outside. Krasi0 has won games by blasting down factories that are in range, and SAIDA keeps trying to rebuild in places that are also in range.
  • SAIDA is consistent and predictable. It varies to counter the opponent, but at heart always plays the same strategy and the same tactics. The dropships always fly along the edge.

SAIDA also has great strengths. The greatest may be the big red animated arrow that points out the main attack position. As long as SAIDA has a monopoly on big animated arrows, I think it will remain #1.

AITT bots

Submissions are closed for the AI Tinycraft Tournament (AITT), for bots limited to 3000 bytes of source code. 3 of the tiny bots have appeared on SSCAIT. Will more follow? Naturally, very small bots play very simple strategies:

  • Oh Fish - zerg 4 pool
  • PotatoMasher - protoss zealot rush
  • PurpleWavelet - protoss zealot rush

CIG 2018 - 8 game limit and other problems

I checked ISAMind and verified that it is affected by the same 8 game problem as Locutus and McRave: Its learning files store data for only 8 games total, not about 125 games as expected. Tscmoo also has a suspiciously small amount of learning data and may be affected. Ziabot has a problem with its learning data, but it looks like a different problem. Other bots appear unaffected, as far as I can judge—I could be wrong, because I don’t understand how they all work.

It’s unclear what effect the 8 game problem had on the tournament. In the best case, the CIG organizers pulled data incorrectly and the tournament itself ran normally. That seems unlikely to me. More likely, learning data for some bots was lost 8 rounds before the end. In that case, it is possible that most of the tournament ran normally, and one error near the end did not much affect results. The fact that the affected bots finished high supports the hypothesis—though, like PurpleWave, they could have finished high because they’re that good even when handicapped. In the worst case, there may have been repeated problems throughout the tournament. I’ll see if I can think of a way to use the detailed results log to narrow down the possibilities.

I feel that CIG 2018 had a lot of problems.

  • The 8 game problem, affecting Locutus, McRave, ISAMind, and possibly Tscmoo.
  • JVM bots did not write learning data at all, affecting PurpleWave, Tyr, and Ecgberht.
  • Ziabot’s learning problem (it might be a bot bug rather than a tournament bug, but Zia has always been reliable for me).
  • Overkill’s build order breakdown.

That’s a large proportion of the entrants affected by tournament surprises of one kind or another. What other problems are there that I haven’t noticed? I’ve only watched a few replays so far.

When I wrote to the CIG organizers to warn them that Steamhammer might crash a lot, they sent back what I found to be a rude reply, implying that giving them a heads-up was the wrong thing to do. That is of course down to language and cultural differences. But still, communicating with the participants is part of running a tournament.

I may skip CIG next year.