machine learning - 6 | Starcraft AI blog

CIG 2018 - what AIUR learned

Here is what the classic protoss bot AIUR learned about each opponent over the course of CIG 2018. AIUR has not been updated in many years and has fallen behind the state of the art, but its varied strategies and learning still make it a tricky opponent in a long tournament. Seeing AIUR's counters for each opponent tells us something about how the opponent played. For past editions, see AIIDE 2017 what AIUR learned and what AIUR learned (AIIDE 2015).

This is generated from data in AIUR's final write directory. There were 125 rounds and 5 maps, one 2-player and two each 3- and 4-player maps. For some opponents, all games were recorded, giving 25 games on the 2-player map and 50 games each on 3- and 4-player maps. For most opponents, fewer games were recorded. AIUR recorded 2932 games, and the results table lists 318 crashes for AIUR. 2932 + 318 = 3250, the correct total game count. Unrecorded games were lost due to crashes, and for no other reason.

First the overview, summing across all opponents.

overall	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	72	49%	127	65%	132	35%	331	49%
rush	29	41%	269	33%	261	55%	559	44%
aggressive	13	23%	225	68%	184	78%	422	71%
fast expo	33	24%	185	48%	207	48%	425	46%
macro	46	33%	180	52%	135	60%	361	53%
defensive	141	75%	314	73%	379	55%	834	65%
total	334	54%	1300	56%	1298	56%	2932	56%

2, 3, 4 - map size, the number of starting positions
n - games recorded
wins - winning percentage over those games
cheese - cannon rush
rush - dark templar rush
aggressive - fast 4 zealot drop
fast expo - nexus first
macro - aim for a strong middle game army
defensive - try to be safe against rushes

Looking across the bottom row, you can see that AIUR had a plus score on every size of map, and that it had to choose different strategies to do so well. It's a strong result for a bot which has essentially no micro skills and has not been updated since 2014. It does still have the best cannon rush of any bot, if you ask me.

#1 locutus	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	1	0%	8	0%	25	12%	34	9%
rush	1	0%	10	0%	6	0%	17	0%
aggressive	1	0%	4	0%	5	0%	10	0%
fast expo	1	0%	14	0%	5	0%	20	0%
macro	1	0%	7	0%	4	0%	12	0%
defensive	1	0%	7	14%	5	0%	13	8%
total	6	0%	50	2%	50	6%	106	4%

Even against the toughest opponents, AIUR can scrape a small edge with learning. Against Locutus, it pulled barely above zero, but got a few extra wins because it discovered that its cannon rush occasionally scores on 4-player maps. Results against PurpleWave below are similar. I suspect that if AIUR had played the cannon rush every game, Locutus would have adapted and nullified the edge. Maybe it did, and that’s why the edge is so small.

#2 purplewave	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	1	0%	8	0%	39	18%	48	15%
rush	1	0%	8	0%	2	0%	11	0%
aggressive	1	0%	10	0%	3	0%	14	0%
fast expo	4	0%	8	0%	2	0%	14	0%
macro	1	0%	10	0%	2	0%	13	0%
defensive	3	0%	6	0%	2	0%	11	0%
total	11	0%	50	0%	50	14%	111	6%

#3 mcrave	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	1	100%	1	0%	1	0%	3	33%
rush	1	0%	41	2%	1	0%	43	2%
aggressive	0	0%	2	0%	3	0%	5	0%
fast expo	1	0%	1	0%	42	17%	44	16%
macro	1	0%	3	0%	1	0%	5	0%
defensive	1	0%	2	0%	2	0%	5	0%
total	5	20%	50	2%	50	14%	105	9%

Against McRave, the choice is nexus first. McRave must have settled on a macro opening itself.

#4 tscmoo	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	11	27%	1	0%	1	0%	13	23%
rush	1	0%	1	0%	3	0%	5	0%
aggressive	1	0%	11	9%	1	0%	13	8%
fast expo	5	20%	33	15%	1	0%	39	15%
macro	1	0%	2	0%	22	14%	25	12%
defensive	1	0%	2	0%	22	18%	25	16%
total	20	20%	50	12%	50	14%	120	14%

Against the unpredictable Tscmoo, AIUR wavered before settling on an unpredictable set of answers. Notice that not all the strategies are well explored: If you win less than 1 game in 5, then playing an opening 3 times is not enough. If the tournament were much longer, AIUR would likely have scored higher because of its slow but effective learning.

#5 isamind	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	1	0%	2	0%	4	0%	7	0%
rush	1	100%	37	19%	38	8%	76	14%
aggressive	0	0%	1	0%	3	0%	4	0%
fast expo	1	0%	5	0%	2	0%	8	0%
macro	1	0%	1	0%	2	0%	4	0%
defensive	1	0%	4	0%	1	0%	6	0%
total	5	20%	50	14%	50	6%	105	10%

ISAMind may be based on Locutus, but unlike Locutus it is vulnerable to AIUR’s dark templar rushes. It’s a sign that it is not as mature and well tested.

#6 iron	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	1	0%	1	0%	5	0%	7	0%
rush	1	0%	26	19%	2	0%	29	17%
aggressive	0	0%	2	0%	2	0%	4	0%
fast expo	1	0%	1	0%	31	10%	33	9%
macro	1	0%	19	5%	4	0%	24	4%
defensive	1	0%	1	0%	6	0%	8	0%
total	5	0%	50	12%	50	6%	105	9%

#7 zzzkbot	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	4	0%	2	0%	2	0%	8	0%
rush	4	0%	4	0%	1	0%	9	0%
aggressive	3	0%	2	0%	1	0%	6	0%
fast expo	3	0%	3	0%	1	0%	7	0%
macro	7	0%	5	0%	4	0%	16	0%
defensive	4	0%	34	29%	41	12%	79	19%
total	25	0%	50	20%	50	10%	125	12%

4 pooler ZZZKBot is of course best countered by a defensive anti-rush strategy. Well, it helped, but the rush is too strong for AIUR to survive reliably. On the 2-player map, AIUR found no answer.

#8 microwave	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	2	0%	2	0%	1	0%	5	0%
rush	1	0%	27	7%	1	0%	29	7%
aggressive	1	0%	1	0%	1	0%	3	0%
fast expo	1	0%	2	0%	1	0%	4	0%
macro	1	0%	1	0%	9	22%	11	18%
defensive	18	22%	17	24%	36	25%	71	24%
total	24	17%	50	12%	49	22%	123	17%

Microwave apparently also played a rushy style versus AIUR. That’s interesting. I think that AIUR’s defensive strategy is good against pressure openings generally, so Microwave was likely playing low-econ but not necessarily fast rushes.

#9 letabot	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	1	0%	1	0%	1	0%	3	0%
rush	1	0%	1	0%	3	33%	5	20%
aggressive	0	0%	3	33%	1	0%	4	25%
fast expo	1	0%	41	49%	43	49%	85	48%
macro	1	100%	3	33%	1	0%	5	40%
defensive	1	0%	1	0%	1	0%	3	0%
total	5	20%	50	44%	50	44%	105	43%

Fast expo makes sense against LetaBot’s “wait for it... wait for it... here it comes!” one big smash.

#10 megabot	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	1	0%	2	0%	3	0%	6	0%
rush	2	50%	4	0%	38	11%	44	11%
aggressive	1	0%	3	0%	3	0%	7	0%
fast expo	1	0%	3	0%	2	0%	6	0%
macro	2	0%	36	28%	2	0%	40	25%
defensive	18	94%	2	0%	2	0%	22	77%
total	25	72%	50	20%	50	8%	125	26%

Why did MegaBot have so much more trouble on the 2-player map? According to the official per-map result table, MegaBot did fine overall on Destination (the one 2-player map), so its trouble came only against AIUR. Maybe I should watch replays and diagnose it.

#11 ualbertabot	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	1	0%	1	0%	1	0%	3	0%
rush	2	0%	43	37%	2	0%	47	34%
aggressive	1	0%	2	0%	1	0%	4	0%
fast expo	1	0%	2	0%	1	0%	4	0%
macro	18	33%	1	0%	1	0%	20	30%
defensive	1	0%	1	0%	44	16%	46	15%
total	24	25%	50	32%	50	14%	124	23%

#12 tyr	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	1	0%	1	0%	1	0%	3	0%
rush	1	100%	1	0%	32	81%	34	79%
aggressive	0	0%	37	46%	8	75%	45	51%
fast expo	1	100%	3	33%	3	67%	7	57%
macro	1	0%	6	33%	3	33%	10	30%
defensive	1	0%	2	0%	3	33%	6	17%
total	5	40%	50	40%	50	72%	105	55%

I suspect that Tyr suffered here because it is a jvm bot and could not write its learning file.

#13 ecgberht	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	1	100%	38	89%	2	50%	41	88%
rush	1	100%	1	0%	43	67%	45	67%
aggressive	0	0%	4	75%	1	0%	5	60%
fast expo	1	100%	1	0%	2	0%	4	25%
macro	1	0%	3	67%	1	0%	5	40%
defensive	1	0%	3	67%	1	0%	5	40%
total	5	60%	50	82%	50	60%	105	70%

#15 titaniron	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	1	0%	1	0%	2	50%	4	25%
rush	1	0%	1	0%	3	33%	5	20%
aggressive	0	0%	42	79%	42	88%	84	83%
fast expo	1	0%	1	0%	1	0%	3	0%
macro	1	100%	2	50%	1	0%	4	50%
defensive	1	100%	3	0%	1	0%	5	20%
total	5	40%	50	68%	50	78%	105	71%

TitanIron appears to have been too predictable. Notice that the winning strategy on most maps was never tried (without crashing) on the 2-player map. It might have won there too.

#16 ziabot	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	16	50%	2	50%	1	0%	19	47%
rush	1	0%	2	0%	1	0%	4	0%
aggressive	1	0%	1	0%	3	33%	5	20%
fast expo	1	0%	2	50%	0	0%	3	33%
macro	1	0%	1	0%	1	0%	3	0%
defensive	3	33%	42	69%	44	57%	89	62%
total	23	39%	50	62%	50	52%	123	54%

#17 steamhammer	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	1	0%	1	0%	1	0%	3	0%
rush	3	67%	4	75%	9	100%	16	88%
aggressive	3	100%	17	100%	15	100%	35	100%
fast expo	2	0%	2	0%	2	50%	6	17%
macro	1	100%	10	100%	1	0%	12	92%
defensive	14	100%	16	100%	22	100%	52	100%
total	24	83%	50	92%	50	94%	124	91%

#18 overkill	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	1	0%	3	0%	2	50%	6	17%
rush	0	0%	2	50%	1	0%	3	33%
aggressive	0	0%	1	0%	10	60%	11	55%
fast expo	1	0%	3	67%	0	0%	4	50%
macro	0	0%	0	0%	0	0%	0	0%
defensive	16	88%	41	90%	37	78%	94	85%
total	18	78%	50	80%	50	72%	118	76%

#19 terranuab	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	1	100%	8	88%	1	0%	10	80%
rush	1	100%	11	100%	30	100%	42	100%
aggressive	0	0%	4	75%	2	50%	6	67%
fast expo	1	100%	16	100%	6	83%	23	96%
macro	1	100%	9	89%	10	90%	20	90%
defensive	1	100%	2	50%	1	0%	4	50%
total	5	100%	50	92%	50	90%	105	91%

#20 cunybot	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	1	0%	2	50%	4	75%	7	57%
rush	1	100%	1	0%	2	0%	4	25%
aggressive	0	0%	4	75%	13	92%	17	88%
fast expo	1	0%	2	50%	2	50%	5	40%
macro	1	100%	9	89%	13	100%	23	96%
defensive	1	100%	32	100%	15	100%	48	100%
total	5	60%	50	90%	49	90%	104	88%

#21 opprimobot	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	1	100%	12	100%	6	83%	19	95%
rush	1	100%	5	100%	7	100%	13	100%
aggressive	0	0%	7	100%	4	100%	11	100%
fast expo	1	100%	11	100%	17	100%	29	100%
macro	1	100%	8	100%	7	100%	16	100%
defensive	1	100%	7	100%	9	100%	17	100%
total	5	100%	50	100%	50	98%	105	99%

#22 sling	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	1	0%	1	0%	1	0%	3	0%
rush	1	100%	5	100%	2	50%	8	88%
aggressive	0	0%	13	100%	13	100%	26	100%
fast expo	1	100%	7	100%	10	100%	18	100%
macro	1	100%	8	100%	11	100%	20	100%
defensive	1	100%	16	100%	13	100%	30	100%
total	5	80%	50	98%	50	96%	105	96%

#23 srbotone	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	1	0%	2	50%	1	0%	4	25%
rush	1	100%	9	100%	3	67%	13	92%
aggressive	0	0%	13	100%	16	100%	29	100%
fast expo	1	100%	10	100%	8	100%	19	100%
macro	1	100%	7	86%	6	100%	14	93%
defensive	1	100%	9	100%	16	100%	26	100%
total	5	80%	50	96%	50	96%	105	95%

#24 bonjwa	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	1	100%	9	100%	4	75%	14	93%
rush	1	100%	13	100%	10	100%	24	100%
aggressive	0	0%	7	100%	10	100%	17	100%
fast expo	1	100%	6	100%	7	100%	14	100%
macro	1	100%	7	100%	8	100%	16	100%
defensive	1	100%	8	100%	11	100%	20	100%
total	5	100%	50	100%	50	98%	105	99%

#25 stormbreaker	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	4	75%	1	0%	4	75%	9	67%
rush	0	0%	5	80%	10	100%	15	93%
aggressive	0	0%	18	100%	7	100%	25	100%
fast expo	0	0%	0	0%	6	100%	6	100%
macro	0	0%	9	100%	8	100%	17	100%
defensive	20	100%	17	100%	15	100%	52	100%
total	24	96%	50	96%	50	98%	124	97%

#26 korean	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	7	100%	2	100%	10	100%	19	100%
rush	0	0%	7	100%	8	100%	15	100%
aggressive	0	0%	5	100%	8	100%	13	100%
fast expo	0	0%	8	100%	8	100%	16	100%
macro	0	0%	5	100%	6	100%	11	100%
defensive	14	100%	23	100%	10	100%	47	100%
total	21	100%	50	100%	50	100%	121	100%

Well, if you win every game, learning cannot help.

#27 salsa	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	9	100%	15	100%	9	100%	33	100%
rush	0	0%	0	0%	3	100%	3	100%
aggressive	0	0%	11	100%	8	100%	19	100%
fast expo	0	0%	0	0%	4	100%	4	100%
macro	0	0%	8	100%	7	100%	15	100%
defensive	15	100%	16	100%	19	100%	50	100%
total	24	100%	50	100%	50	100%	124	100%

CIG 2018 - bots that wrote data

The CIG organizers have released the final read/write folders for the 2018 tournament. I looked through all the folders to see if each bot recorded information. If it saved nothing, it did not learn. If it saved some data, it may have used it for learning (or it might be log files or whatever). I also added a curve “↕” column, whether the bot’s win rate moves up or down (or is approximately flat) between round 40 and the end of the tournament—is the bot improving until late in the tournament? Win curves early in the tournament are noisy, so they’re hard to compare. (If anybody can’t see the Unicode up and down arrows, let me know and I can change them.)

Some bots have files in their AI folder, which may be prepared data or pre-learned data for specific opponents. I note that too. Prepared data could be kept elsewhere, including in the binary, so I didn’t see all of it. We know that PurpleWave had extensive preparations for specific opponents.

As has been mentioned, bots in Java or Scala (bots which run on the jvm) were unable to write learning data. Those that depend on their learning data were playing at a severe disadvantage. #2 PurpleWave lost narrowly to #1 Locutus and was one of the affected bots. It’s a serious problem for a tournament that wants to be taken seriously.

#	bot	↕	info
1	Locutus	↑	Prepared data for 11 opponents. Learning data very similar to Steamhammer’s.
2	PurpleWave	-	jvm :-(
3	McRave	-	Looks like wins and losses of each of 16 available strategies for the previous 8 games. Perhaps a sliding window?
4	tscmoo	↑	Looks like strategy and win/loss info for each opponent, in a hard-to-read structured format. Past years have had more elaborate data.
5	ISAMind	↓	Prepared file that looks like neural network learning data. Per-opponent learned data that looks like Steamhammer data. ISAMind is based on Locutus, so that makes sense.
6	Iron	↓	Nothing.
7	ZZZKBot	-	Game records, one game per line, in an opaque format that looks about the same as last year.
8	Microwave	↑	For each opponent, wins and losses for 8 different strategies.
9	LetaBot	↑	Information about a few recent games against ZiaBot, probably not used for learning.
10	MegaBot	↓	Extensive log data. The apparent learning files are `MegaBot-vs-[opponent].xml` and give scores for NUSBot, Skynet, Xelnaga (MegaBot’s three heads).
11	UAlbertaBot	↑	Win/loss numbers for 4 protoss, 4 terran, and 5 zerg strategies. But the same strategy was always chosen for each race, so learning was turned off.
12	Tyr	↓	jvm :-(
13	Ecgberht	↓	jvm :-(
14	Aiur	↑	The familiar lists of numbers for each opponent.
15	TitanIron	↓	Nothing.
16	ZiaBot	↓	One file with data for TerranUAB, UAlbertaBot, and 3 lines for SRbotOne. Zia’s learning looks broken or disabled.
17	Steamhammer	↑	Steamhammer saved data when it did not crash, and successfully learned a little bit.
18	Overkill	↑	A file for each opponent, game records with opponent/opening/score.
19	TerranUAB	-	Nothing.
20	CUNYbot	↓	One file `output.txt` with strategy information and numbers, naming a few opponents but not most. A prepared file in the `AI` folder has the same format. It’s mysterious.
21	OpprimoBot	↓	Nothing.
22	Sling	↓	Nothing.
23	SRbotOne	↑	A large number of “stats” files named with date and time, apparently game records. For each opponent, another file giving the strategy “Terran_Attrition” and win/loss numbers. I’m not sure whether this could be learning data, but the bot did earn an up arrow.
24	Bonjwa	↑	Nothing.
25	Stormbreaker	↑	Prepared data `NN_model_policy` and `NN_model_stateValue`, apparently neural network learning data. For each opponent, game records with 4 numbers per game. The format is like Overkill’s but records more information.
26	Korean	-	Nothing.
27	Salsa	-	Nothing.

Most striking is that the “nothing” bots cluster toward the bottom. If you don’t even try to record data, either you are Iron or you performed weakly. The jvm bots, which in fact recorded nothing due to no fault of their own, still placed higher than all the nothing bots other than Iron. Perhaps recording data is a proxy for how much effort has gone into the play.

Some bots had a rising win rate (an up arrow) despite doing no learning, most distractingly UAlbertaBot. I think that since UAlbertaBot plays random, its opponents can easily get confused about it. In general, I think that playing unpredictably (either being random or choosing varied openings randomly) can mess up the learning of some other bots.

I will be analyzing what certain bots learned. It will shed light on their opponents.

the critical vital absolutely essential filename question of the decade

Here is a question of virtually no importance that nevertheless has me puzzled: What should I call my learning files?

For now, Steamhammer has an opponent model file for each opponent, named om_[opponent].txt. OM for opponent model. It makes sense, or anyway it makes as much sense as an arbitrary filename can. I like names that make sense.

The next learning data I want to add is opening models, so that Steamhammer can know the timings. With knowledge of both opponent strategies and its own opening strategies, it will be able to directly compare and find counters: Your attack comes at this time, which opening is ready at that timing? It will also be able to recognize which openings are similar to others, so that when it can’t match a strategy, finding good openings by trial and error is quicker.

What should I name the opening model files? Opponent models and opening models are both empirical data, and should be updated as games are played. But the unfortunate words “opponent” and “opening” are too much alike to abbreviate nicely. Should I start with “enemy” and “build order”? “Bot” and “strategy”? Most abbreviations seem unintuitive. My best idea so far is to rename the om_* files to bot_* and use the prefix open_ for openings. Maybe OK?

What’s your idea? Because, according to the bikeshed principle, everyone should have an opinion on this....

Steamhammer’s learning results

When I uploaded Steamhammer 1.4.3 to SSCAIT on 11 June, I erased its learned data from the server. Its elo immediately plunged, partly because the voters wanted to put it through its paces against strong opponents, and partly because it needs its learning data to cope. Most rushbots, and many others, won their first or first few games against Steamhammer. That didn’t change when I uploaded 1.4.4 a week later—the improvements weren’t many.

Finally, only in the past several days, I’ve started to feel that Steamhammer has learned enough that it is closing in on its equilibrium elo. It has been wavering around the high 2100’s, not able to break above 2200 but not falling far either. It seems about right, at least for SSCAIT conditions.

The findings. As I’ve mentioned, clearing the learning data was a deliberate test to see how well the learning system works when learning from scratch. I’m fairly pleased. I see weaknesses, but only weaknesses I expected. Against XIMP, Steamhammer has settled on an opening that has won every game so far, but is not as strong as the 3 hatch before pool opening that I hand-chose for it in the old days. Steamhammer only sees that it wins. It can’t tell the difference between openings that win nearly all games because it sees only the winning rate; it needs an evaluation function that can tell it “this one wins more convincingly.” Against Proxy, it won one game with its unusual 6 pool opening. Then it played another game and recorded another win—because Proxy crashed. Steamhammer thought it had found a winner, and had to lose some games before it realized that the 6 pool was not a reliable counter. (It would be, if not for Proxy’s powerful worker defense.) Possibly 5 pool or 4 pool would succeed, but Steamhammer does not know that some openings are related to others. When I teach it that, it will be able to realize that if one opening shows promise but is not quite successful, it should try related openings.

In some cases, Steamhammer hit on surprising counters. The most striking example is against TyrProtoss, which had been winning every game with its cannon turtle into timing attack strategy. Steamhammer tried its 2 hatch lurker all-in attack, which did not make sense to me—Steamhammer’s lurkers suck when cannons are around, it has little idea how to break the cannons and no idea how to bypass them. But it won a game. We’ll see if it keeps winning!

I expected the weaknesses, and I expected the surprising counters. I feel as though I understood the learning system and its limitations fairly well. It gives me confidence that my planned improvements, when I finally get around to them, will be real improvements.

no more enemy-specific strategies for Steamhammer

Working on the opponent model today, I made one of the key changes for the next version:

    "UseEnemySpecificStrategy" : false,
    "EnemySpecificStrategy" :
    {
    },

No more openings hand-configured for known opponents. Steamhammer has to figure out everything on its own. I’ve been working toward this for a long time, and it’s good to finally take the step. I expect play to become more varied—Steamhammer is likely to discover surprising solutions for some opponents. Play should also become stronger, especially in tournaments where opponents like to prepare specially against select enemies. They’ll have to look for ways to exploit Steamhammer’s tactical and micro mistakes, because the game plans will be too adaptive.

I also wrote the terran vulture-first recognizer for the plan recognizer today. It recognizes a plan called Factory that can only be followed by terran, and Steamhammer zerg is configured to counter the plan with the AntiFactory opening. Testing against Iron, it worked perfectly: The first game, Iron won easily. The second game, Steamhammer countered and fought back hard (and happened to win). That’s how it’s supposed to work.

The recognizer was easy to write. Maybe I should write a few more recognizers and counters.

Iron should be a good test case, because Iron is strong enough to usually defeat the counter—AntiFactory puts up a tough battle, but still mostly loses. Opening learning success looks like this: Steamhammer realizes that AntiFactory is probably best, though not all that good, and explores other openings sometimes but not too often. I think I should be able to get that right.

Will playing better games against Iron entice voters on SSCAIT? I think it might happen. If so, I will quickly grow bored with similar Iron-Steamhammer games, but stream watchers may be pleased. Iron would likely lose a few elo points to Steamhammer on average, instead of gaining as it does now.

The upcoming version 1.4.2 has important improvements for all races, including some improvements I haven’t mentioned. Strategy, macro, and micro are better. Look forward to higher rankings for Steamhammer and Randomhammer.

Legionnaire’s analysis of Sparkle and Transistor

The planned post about strategy abstraction is delayed by a power outage at my house. Here’s a brief filler.

TeamLiquid has a post with analysis of new ASL maps Sparkle and Transistor by Australian protoss Legionnaire. Without drawing any strong conclusions about overall balance, Legionnaire points out how map features will affect play.

Current bots are poor at adapting to map features. More than that, it is beyond the state of the art for any AI system to adapt to maps with as few games as humans need. Humans reason out how map features affect play, and with experience they sharpen their reasoning. Machines, so far, mainly collect statistics about the course of events, and they need a vastly larger number of games to zero in on good strategy. Of course they may be able to play those many games faster, but we don’t know how to make a system that can combine reasoning with empirical learning like a human. I’m interested in Legionnaire’s expert analysis as an example that may offer clues.

adapting to the maps

Instead of configuring all of Steamhammer’s opening probabilities by hand, I want it to figure them out for itself. The starting point is data: For each matchup, keep overall statistics of all the openings in a file. Another goal is to have Steamhammer adapt its strategy to maps based on experience. So I thought, why not combine the two? For each matchup and map, keep a file with numbers for all the openings. Or for each matchup, keep a file for all the pairs (map, opening)—that way the bot has the data to generalize between maps.

Someday I also want Steamhammer to adapt its strategy reactions and its tactics to the map. At first it will analyze the map and decide what looks good (“look at that cliff over the mineral line—I should go air or drop”); later it will learn from experience what works well. I don’t expect to get to that for a long time, though.

Steamhammer has over 60 zerg openings (it doesn’t play all of them), and the count will increase. SSCAIT has 14 maps and other tournaments use fewer, so I think I should be ready for on the order of 1000 pairs (map, opening) if I keep them in one file. Each pair would be one line of data, something like “<opening> <map> <# games> <# wins>” and maybe a few more numbers like mean game length, or a statistical summary of evaluations once there is an evaluation function, or whatever else. If I want a cache of map data like number of starting positions and rush distances and so on, to use in generalizing across maps, that would be in a separate file.

In that case there would be 12 matchup files, including data for when the opponent went random: TvT TvP TvZ TvR, PvT PvP PvZ PvR, ZvT ZvP ZvZ ZvR. With up to 1000 lines per file, it seems like a reasonable amount of data. In every game, Steamhammer would read one file in onStart() and write it in onEnd(), which doesn’t seem excessive. There is one complication. If the opponent goes random, and after I give Steamhammer the ability to change its mind on the fly (which I will do), then when Steamhammer finds out the opponent’s race it may want to read that matchup file too. Reading and analyzing that much data may take more than one frame time, so it might have to go into a separate thread. Another solution for when the opponent goes random might be to read 4 matchup files during onStart() when we are allowed more time. Well, when it comes up I’ll figure out a plan. Maybe nothing special will be needed (seek time for a hard drive could exceed one frame time, but reading from SSD is faster).

That’s the data. How to use it? I haven’t decided on details. The opponent model keeps detailed records for each game against a given opponent, including the map. When we play the opponent for the first time, decisions have to be made without the opponent model, solely on the basis of the matchup+map+opening statistics. I’ll figure out a more-or-less sound way to turn the raw numbers into probability-to-play-this values, including an exploration policy. There are choices. After we’ve played an opponent many times, the opponent model will have more information (since it records more data, and it is specific to the opponent), so it can decide on its own. In between, I’ll need some kind of evidence combination procedure to blend the 2 sources of information together. I expect that a simple procedure would work fine, even weighted averaging of probabilities.

Steamhammer’s configuration file will become much shorter. I expect I’ll retain the manual configuration options for openings, for those bot authors who want to do it that way, but Steamhammer itself will rely on the data it collects. Once I have a good set, I’ll distribute it with the bot.

I’m not sure when I’ll get to all this stuff. Maybe or maybe not in the 1.4.x series.

Next: A tree of openings. It ties in.

the opponent model in Steamhammer 1.4

Today is Steamhammer 1.4’s opponent model. The features are:

Recognize some enemy opening plans.
Predict the enemy’s opening plan from experience against this opponent.
React to the enemy’s predicted or actual opening plan.
Choose openings based on the predicted plan.
Decide whether to steal gas. Does experience suggest it may be worth trying against this opponent?

The code that implements the features has these parts:

The plan recognizer, which looks at the current game situation.
Game records, which save information about past games against this opponent, including the recognized plan.
The plan predictor and gas steal decider, which draw conclusions based on the game records.
Strategy reactions to predicted or recognized enemy plans are in various places–they are uses of the opponent model, not parts of the opponent model.
Opening choices to counter the predicted plan can be set in the configuration file.

1. The plan recognizer was first written up in December. It tries to understand the opponent’s intentions during the earliest part of the game. The code is in OpponentPlan.cpp and is less than 150 lines—it is rudimentary.

Unknown - No plan recognized. The plans are not exhaustive, so this is common.
Proxy - Enemy buildings in your base. This doesn’t include cannon contains or other more distant proxies.
WorkerRush - Like Stone.
FastRush - Basic units faster than 9 pool, 8 rax, or 9 gateway.
HeavyRush - 2 gate zealots, 2 barracks marines, etc.
SafeExpand - Static defense before expanding to the natural. Zerg can’t do this.
NakedExpand - Expansion with no static defense.
Turtle - Static defense without expanding.

My early impression of the plan recognizer was that it often failed to recognize plans, but was rarely wrong when it did. With more experience, I think it often misrecognizes plans severely. It’s crude and clumsy. Even so, when it helps it helps a lot. It’s a net win.

2. Game records are handled in GameRecord.cpp. Steamhammer writes a file for each opponent, like many learning bots. Here is one record from a file named om_UAlbertaBot.txt, where “om” stands for opponent model, with annotations so you can make sense of the list of numbers.

1.4                  <- record format version (from the Steamhammer version when it was first used)
ZvRZ                 <- matchup
(2)Destination.scx   <- map
Over10Hatch          <- Steamhammer’s opening
Heavy rush           <- initial predicted enemy plan
Fast rush            <- actual enemy plan
1                    <- we won, 1 or 0
0                    <- frame we sent a scout to steal gas, 0 if never
0                    <- gas steal happened, 1 or 0 (extractor was/was not queued)
2110                 <- frame enemy first scouted our base
3102                 <- frame enemy got first combat units
0                    <- frame enemy got first air units
0                    <- frame enemy got static air defense
0                    <- frame enemy got mobile air defense
0                    <- frame enemy got cloaked units
0                    <- frame enemy got static detection
1726                 <- frame enemy got mobile detection
12309                <- last frame of the game
END GAME             <- end mark

Theoretically, a single file could have records in more than one format. Of course, only one format exists so far. The “frame enemy got” times recorded are the time we first saw such a thing, which may be much later than it actually happened. For example, here we first saw an enemy overlord (a mobile detector) on frame 1726, but it existed the whole game. The END GAME mark is redundant. It gives us a way to recover in case data in the middle of a file is corrupted—we can skip ahead past the next END GAME and continue reading the file from there.

3. The plan predictor and gas steal decider look at the game records near the start of the game.

The gas steal decider was written up in December. Nothing important has changed since then.

The plan predictor runs once at the start of the game. It looks through the game records to see what recognized plans the enemy has played. It is close to the bare minimum: It counts the recognized plans in the game records for this matchup, ignoring unknown plans, and weighting recent games more using a discount factor so that the past is gradually forgotten. That way it reacts quickly when the enemy changes its play. Whether the game was won or lost, whether past predictions were correct or wrong—all the other information is ignored.

If the opponent was random, then when the opponent’s race is found out, the plan predictor runs again. In the first run, it counted all game records where the opponent went random. In the second run, it counts only games where the opponent was the same race as this game. The predicted plan may change.

4. Strategy reactions could be written in anywhere, but so far they are in StrategyManager for terran and protoss, and in StrategyBossZerg for zerg. So far, all the reactions are reactions to the enemy plan, not to any of the other information (even though it has obvious uses). Some strategy reactions are made when the enemy plan is recognized. Some reactions must begin in time, and happen when the plan is predicted. For example, against UAlbertaBot, the initial predicted plan is “Heavy rush”. If Steamhammer finds out that UAlbertaBot is zerg, it knows that zerg follows a different plan. The predicted plan changes to “Fast rush” and efforts to stop the zerglings begin immediately (if Steamhammer is zerg or terran; the protoss reaction is turned off because I didn’t get it working).

Good strategy reactions are the hard part. I find them much more difficult than the opponent model proper. Some of Steamhammer’s reactions are weak.

5. Openings to counter specific enemy plans can be written into the configuration file in a new subsection CounterStrategies. The opening is chosen once at the start of the game and can’t be changed (in this version), so only the initial predicted enemy plan matters.

Steamhammer first looks for counter strategies specific to the enemy race. The name is in the format “Counter [plan name] v[race character]”. The race character is “U” for Unknown if the opponent went random. You can use all the usual features of random opening selection. As you can see in the example, zerg is configured with a wider range of choices.

"Counter Safe expand vT" :
	{
		"Terran" : "14CCTanks",
		"Protoss" : "13Nexus",
		"Zerg" : [
			{ "Weight" :  1, "Strategy" : "FastPool", "Weight2" : 5 },
			{ "Weight" :  9, "Strategy" : "9PoolSpeed" },
			{ "Weight" :  0, "Strategy" : "ZvT_3HatchMuta", "Weight2" : 50 },
			{ "Weight" : 50, "Strategy" : "ZvT_3HatchMutaExpo", "Weight2" : 0 },
			{ "Weight" : 30, "Strategy" : "3HatchLurker" },
			{ "Weight" : 10, "Strategy" : "3HatchPoolMuta", "Weight2" : 0 }
		]
	},

If no counter strategy is found for the specific enemy race, Steamhammer next looks to see if there’s a general one for all races—leave out the “vX” string. This example points to a reusable strategy combo that specifies openings for each race Steamhammer might be playing, no matter the enemy race. The strategy combo feature has not changed.

"Counter Worker rush" : "AntiFastCheese",

If no counter strategy is found, Steamhammer falls back on its usual random opening selection. So if the regular repertoire is best in any given case, don’t specify a counter for that case.

For the following version Steamhammer 1.4.1, I will add at least one large piece to the opponent model. I'm likely to change my mind once I see how this version does in practice, but at the moment I'm thinking of fine-tuning plan prediction and opening selection based on wins and losses. That will raise Steamhammer's performance ceiling; with enough games, it will learn much more. Other possibilities include: • Make use of more of the information in the game records. This opponent doesn't get detection, use cloaked units; that opponent gets air units around frame X, add a spire for scourge just in time. • Have Steamhammer collect data on its own openings so it can generalize: Play a fast/slow opening; play a lurker/muta opening. • Revamp the plan recognizer with a richer set of plans, maybe a hierarchy so it can refine its conclusions over time. • Machine learning for a probabilistic plan recognizer and/or plan predictor. • Restore the originally planned extensive game records, making it possible to predict unit mixes over time throughout the game. • Add the ability to change openings during play instead of deciding ahead of time, so that decisions can be made as late as possible. • Move scout timing into the opponent model alongside the gas steal decider. • Have expansion decisions or hatchery placement influenced by the opponent model. “Hmm, historically in situations like this we do poorly if the third hatchery is at an expansion. Make it a macro hatchery instead.”

I have no shortage of ideas!

Next: Brief remarks on the SSCAIT round of 8.

strategy learning by solving the game matrix

Being unpredictable to your enemies has value. How can you do strategy learning and still remain unpredictable when you should? You can’t simply require randomness, because if one strategy dominates, then you should play it every time. At other times, you may benefit from playing 2 strategies equally, or by playing a normal strategy 95% of the time and otherwise rushing. It depends on what opponent strategies counter each of yours, and the n-arm bandit methods that bots use now don’t understand that. Here’s one way to do it. It’s a step up in complexity from UCB, but not a tall step.

You can record the results of your strategies and the enemy strategies in a zero-sum game matrix, and solve the strategy game (which is the subgame of Starcraft that involves choosing your strategy). In the first cut version, each cell of the matrix is “n times it happened that I played this and the enemy played that, and k of those were wins.” Take the observed probability of win for each cell of the game matrix as the payoff for that cell, and solve the game. The solution tells you how often you should play each of your strategies, assuming that the opponent chooses optimally.

There are a couple different algorithms to solve zero-sum game matrixes fast. I personally prefer the iterative approximate algorithm (here is a simple python implementation), but it doesn’t make much difference.

If you recognize a lot of strategies on both sides, you’ll have many matrix cells to fill in, each of which requires some number of game results to produce a useful probability. 10 strategies for each side already means that a big AIIDE length tournament won’t produce enough data. For a first cut, I recommend recognizing only 2 or 3 categories of enemy strategies, such as (example 1) rush, 1 base play, 2 base play, or (example 2 for zerg) lings and/or hydras, mutalisks, lurkers. Since you’re grouping enemy strategies into broad categories, you don’t need much smarts to recognize them.

You can group your own strategies in a completely different way, if you like. There’s no reason to stick to the same categories. Also, your bot presumably knows what it is doing and doesn’t need to recognize game events as signifying that it is following a given class of strategy.

In this method, you are assumed to choose your strategy before you scout, or at least ignoring scouting information. You can take your time to recognize the enemy strategy, and base the recognition decision on anything you see during the entire game.

How do you get started learning? You might want to start with a matrix of all zeroes and only use the game matrix for decisions after you’ve gathered enough data. Instead, I suggest keeping a global matrix alongside the ones for each opponent, with floating point game counts and win counts in each cell. The global matrix has the totals for all opponents. (Or maybe there’s a global matrix for each opponent race.) When you face a new opponent, initialize the new opponent’s matrix with scaled down game counts and win counts from the global matrix, as if only a small number of games had been played in total (I suggest 1 to 3 times the number of cells in the matrix as a try). You’ll start out playing a strategy mix that is good against the average opponent, and as you accumulate data the mix will shift to specifically counter this opponent.

There are tons of ways to fancy it up if you want to try harder. You could try a variant where you estimate the enemy’s choice probabilities instead of assuming the enemy plays optimally (you’ll need a different solution algorithm). You can keep a larger game matrix in parallel with the small one, and switch to it when you’ve accumulated enough data. Or use a hierarchical method that unfolds categories when there is enough data to distinguish them. You can try a more complicated Bayesian game solution algorithm, which realizes that the numbers in each cell are empirical approximations and takes that into account (“oh, this cell doesn’t have many games, better not rely too strongly on its value”). You can include scouting information in the strategy decision (“well, I can see it’s not a rush, so strike out that option for the opponent”). You can divide your notion of strategy into any fixed number of aspects, and keep independent matrixes for each aspect, so that your strategy choices are potentially random in many different dimensions. The sky is the limit.

AIIDE 2017 what AIUR learned

Here is what AIUR learned about each opponent over the course of the tournament. I did this mostly because it’s easy; I already had the script from last year. But it’s also informative—AIUR’s reactions tell us how each bot played, and may tell bot authors what they need to work on.

The data is generated from files in AIUR’s final read directory. AIUR recorded 111 games against some opponents even though the tournament officially ran for 110 rounds; that is presumably because the tournament did run longer but was cut back to a multiple of 10 rounds for fairness (since there are 10 maps). On the other hand, AIUR’s total game count according to itself is 2938 and according to the tournament results is 2965, so it may have been unable to record some games (it is listed with 53 crashes, so that’s not a surprise). First an overall view, totalling the data for all opponents. We can see that all 6 of AIUR’s strategies (“moods” it calls them) were widely valuable: Every strategy has win rate over 50% on some map size. AIUR’s overall win rate in the tournament was 50.46%.

overall	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	159	55%	59	37%	161	44%	379	47%
rush	134	66%	87	55%	185	50%	406	56%
aggressive	107	56%	108	43%	155	30%	370	41%
fast expo	69	45%	84	33%	197	51%	350	46%
macro	46	28%	69	52%	211	37%	326	39%
defensive	352	60%	185	58%	570	55%	1107	57%
total	867	57%	592	49%	1479	48%	2938	50%

2, 3, 4 - map size, the number of starting positions
n - games recorded
wins - winning percentage over those games
cheese - cannon rush
rush - dark templar rush
aggressive - fast 4 zealot drop
fast expo - nexus first
macro - aim for a strong middle game army
defensive - be safe against rushes (not entirely successful)

#1 zzzkbot	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	16	12%	1	0%	4	0%	21	10%
rush	5	0%	1	0%	1	0%	7	0%
aggressive	3	0%	1	0%	5	0%	9	0%
fast expo	4	0%	1	0%	5	0%	10	0%
macro	3	0%	2	0%	3	0%	8	0%
defensive	3	0%	16	31%	37	24%	56	25%
total	34	6%	22	23%	55	16%	111	14%

AIUR struggled against the tournament leader but was not entirely helpless. Its cannon rush had a chance on 2 player maps and its anti-rush strategy on the others. We see how AIUR gains by taking the map size into account.

#2 purplewave	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	1	0%	1	0%	2	0%	4	0%
rush	28	79%	3	33%	40	55%	71	63%
aggressive	1	0%	3	33%	1	0%	5	20%
fast expo	1	0%	11	36%	10	60%	22	45%
macro	1	0%	2	0%	1	0%	4	0%
defensive	1	0%	1	0%	1	0%	3	0%
total	33	67%	21	29%	55	51%	109	51%

AIUR upset #2 PurpleWave, a surprising outcome. The DT rush and the fast expand were both somewhat successful—rather unrelated strategies.

#3 iron	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	5	0%	1	0%	7	0%	13	0%
rush	5	0%	2	0%	7	0%	14	0%
aggressive	3	0%	2	0%	12	0%	17	0%
fast expo	8	0%	14	7%	9	0%	31	3%
macro	6	0%	1	0%	10	0%	17	0%
defensive	5	0%	2	0%	10	0%	17	0%
total	32	0%	22	5%	55	0%	109	1%

Learning can’t help if nothing you try wins....

#4 cpac	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	1	0%	1	0%	1	0%	3	0%
rush	4	0%	0	0%	2	0%	6	0%
aggressive	2	0%	1	0%	1	0%	4	0%
fast expo	1	0%	1	0%	1	0%	3	0%
macro	2	0%	3	33%	2	0%	7	14%
defensive	24	38%	16	69%	48	50%	88	50%
total	34	26%	22	55%	55	44%	111	41%

Cpac was configured to play 5 pool against AIUR. It worked, but AIUR was able to compensate to an extent by playing its anti-rush build.

#5 microwave	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	2	0%	2	0%	4	0%	8	0%
rush	1	0%	1	0%	4	0%	6	0%
aggressive	20	20%	15	13%	11	0%	46	13%
fast expo	1	0%	2	0%	6	0%	9	0%
macro	1	0%	1	0%	4	0%	6	0%
defensive	1	0%	1	0%	26	12%	28	11%
total	26	15%	22	9%	55	5%	103	9%

Microwave was successful but showed a little vulnerability to surprise zealots dropped in its main. I suspect it’s a tactical reaction issue.

#6 cherrypi	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	1	0%	1	0%	1	0%	3	0%
rush	1	0%	1	0%	1	0%	3	0%
aggressive	2	0%	2	0%	1	0%	5	0%
fast expo	2	0%	1	0%	1	0%	4	0%
macro	2	0%	1	0%	9	11%	12	8%
defensive	26	4%	16	12%	42	12%	84	10%
total	34	3%	22	9%	55	11%	111	8%

#7 mcrave	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	26	100%	5	60%	45	62%	76	75%
rush	3	67%	9	67%	4	50%	16	62%
aggressive	1	0%	4	50%	1	0%	6	33%
fast expo	1	0%	2	50%	2	50%	5	40%
macro	1	0%	1	0%	1	0%	3	0%
defensive	1	0%	1	0%	2	0%	4	0%
total	33	85%	22	55%	55	56%	110	65%

AIUR upset McRave with its cannon rush, and the dark templar rush did well too. AIUR executes the best cannon rush of any bot, in my opinion. It is a sign that McRave’s play was not robust enough against tricks.

#8 arrakhammer	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	2	0%	2	0%	3	0%	7	0%
rush	1	0%	1	0%	4	0%	6	0%
aggressive	1	0%	5	60%	3	0%	9	33%
fast expo	1	0%	1	0%	2	0%	4	0%
macro	0	0%	12	50%	38	37%	50	40%
defensive	29	66%	1	0%	4	25%	34	59%
total	34	56%	22	41%	54	28%	110	39%

#9 tyr	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	6	67%	1	0%	1	0%	8	50%
rush	20	100%	1	0%	2	0%	23	87%
aggressive	3	33%	10	20%	1	0%	14	21%
fast expo	1	0%	7	29%	49	35%	57	33%
macro	1	0%	1	0%	1	0%	3	0%
defensive	2	50%	2	0%	1	0%	5	20%
total	33	79%	22	18%	55	31%	110	43%

The DT rush won 100% of the time on 2 player maps and was tried only a few times on larger maps, losing. Was it only unlucky on the 3 and 4 player maps, or is there a real difference? With only 3 games total, we can’t tell from the numbers. It is a weakness of AIUR’s learning: It’s slow because there is so much to learn. The flip side of the slowness is that, over a long tournament, it learns a lot.

#10 steamhammer	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	2	0%	1	0%	1	0%	4	0%
rush	2	50%	1	0%	2	0%	5	20%
aggressive	1	0%	1	0%	1	0%	3	0%
fast expo	1	0%	1	0%	1	0%	3	0%
macro	0	0%	1	0%	1	0%	2	0%
defensive	27	81%	17	88%	49	67%	93	75%
total	33	70%	22	68%	55	60%	110	65%

I was surprised to see Steamhammer upset by AIUR. I had thought that AIUR was a solved problem. On SSCAIT too, Steamhammer started to show losses against AIUR in September for the first time in months. I may have introduced a weakness in some recent version and AIUR’s learning took that long to find it on SSCAIT. In AIIDE, the tournament was easily long enough.

#11 ailien	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	1	0%	1	0%	1	0%	3	0%
rush	3	0%	1	0%	2	0%	6	0%
aggressive	1	0%	2	0%	1	0%	4	0%
fast expo	1	0%	2	50%	0	0%	3	33%
macro	4	50%	8	75%	1	0%	13	62%
defensive	24	58%	8	88%	49	37%	81	48%
total	34	47%	22	64%	54	33%	110	44%

#12 letabot	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	7	43%	1	0%	2	0%	10	30%
rush	3	33%	13	54%	43	40%	59	42%
aggressive	5	40%	1	0%	1	0%	7	29%
fast expo	13	46%	3	33%	1	0%	17	41%
macro	1	0%	1	0%	6	33%	8	25%
defensive	1	0%	3	33%	1	0%	5	20%
total	30	40%	22	41%	54	35%	106	38%

I suspect that fast expo was the best strategy on 4 player maps, but how was AIUR to know? A weakness of AIUR’s epsilon-greedy learning, compared to UCB, is that it doesn’t realize that a less-explored option is more likely to be misevaluated.

#13 ximp	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	34	35%	0	0%	1	0%	35	34%
rush	0	0%	0	0%	1	0%	1	0%
aggressive	0	0%	13	8%	52	2%	65	3%
fast expo	0	0%	9	0%	0	0%	9	0%
macro	0	0%	0	0%	1	0%	1	0%
defensive	0	0%	0	0%	0	0%	0	0%
total	34	35%	22	5%	55	2%	111	13%

#14 ualbertabot	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	0	0%	0	0%	1	100%	1	100%
rush	0	0%	0	0%	0	0%	0	0%
aggressive	0	0%	0	0%	1	100%	1	100%
fast expo	0	0%	0	0%	0	0%	0	0%
macro	0	0%	0	0%	0	0%	0	0%
defensive	34	32%	21	5%	52	27%	107	24%
total	34	32%	21	5%	54	30%	109	26%

What’s up with all those zeroes? AIUR is coded to try each strategy once before it starts making decisions, and that did not happen here. It turns out that AIUR has pre-learned data for Skynet, XIMP, and UAlbertaBot, so its learning in those cases looks different.

#16 icebot	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	1	0%	2	0%	1	0%	4	0%
rush	1	0%	2	50%	3	33%	6	33%
aggressive	3	100%	3	67%	4	50%	10	70%
fast expo	14	100%	3	67%	44	93%	61	93%
macro	4	75%	2	50%	1	0%	7	57%
defensive	9	89%	10	80%	2	50%	21	81%
total	32	88%	22	64%	55	82%	109	80%

#17 skynet	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	13	92%	0	0%	0	0%	13	92%
rush	21	95%	21	90%	51	88%	93	90%
aggressive	0	0%	0	0%	0	0%	0	0%
fast expo	0	0%	1	100%	0	0%	1	100%
macro	0	0%	0	0%	0	0%	0	0%
defensive	0	0%	0	0%	4	50%	4	50%
total	34	94%	22	91%	55	85%	111	89%

#18 killall	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	1	0%	3	0%	1	0%	5	0%
rush	1	0%	2	0%	1	0%	4	0%
aggressive	1	0%	2	0%	1	0%	4	0%
fast expo	1	0%	3	0%	1	0%	5	0%
macro	0	0%	2	0%	2	50%	4	25%
defensive	30	80%	10	70%	49	76%	89	76%
total	34	71%	22	32%	55	69%	111	62%

#19 megabot	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	3	67%	1	0%	2	0%	6	33%
rush	2	0%	14	36%	5	0%	21	24%
aggressive	6	67%	4	25%	4	0%	14	36%
fast expo	2	50%	1	0%	4	0%	7	14%
macro	1	0%	1	0%	36	25%	38	24%
defensive	17	76%	1	0%	2	0%	20	65%
total	31	65%	22	27%	53	17%	106	33%

#20 xelnaga	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	9	100%	6	83%	1	0%	16	88%
rush	19	100%	4	75%	1	0%	24	92%
aggressive	1	0%	3	33%	1	0%	5	20%
fast expo	1	0%	4	75%	1	0%	6	50%
macro	2	0%	2	50%	50	36%	54	35%
defensive	2	50%	3	67%	1	0%	6	50%
total	34	85%	22	68%	55	33%	111	56%

Against Xelnaga, AIUR found solutions on 2 and 3 player maps but not on 4 player maps. Is it another case of underexploration?

#21 overkill	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	1	0%	1	0%	3	67%	5	40%
rush	2	50%	0	0%	0	0%	2	50%
aggressive	8	100%	4	100%	7	86%	19	95%
fast expo	3	67%	3	100%	7	100%	13	92%
macro	4	75%	3	67%	12	92%	19	84%
defensive	14	93%	11	100%	26	96%	51	96%
total	32	84%	22	91%	55	93%	109	90%

#22 juno	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	5	0%	14	36%	33	15%	52	19%
rush	3	0%	1	0%	1	0%	5	0%
aggressive	2	0%	1	0%	2	0%	5	0%
fast expo	2	0%	1	0%	16	12%	19	11%
macro	1	0%	1	0%	1	0%	3	0%
defensive	19	21%	4	25%	2	0%	25	20%
total	32	12%	22	27%	55	13%	109	16%

Juno’s cannon contain upset AIUR. Learning didn’t help much, because the problem wasn’t in any of the strategies, it was in AIUR’s poor reactions to cannons appearing in front of its base. It is amusing to watch 2 bots cannon each other when sometimes both get cannons up.

#23 garmbot	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	1	0%	1	0%	1	0%	3	0%
rush	2	50%	1	0%	0	0%	3	33%
aggressive	17	94%	17	100%	3	67%	37	95%
fast expo	0	0%	1	0%	23	83%	24	79%
macro	0	0%	1	0%	1	0%	2	0%
defensive	5	80%	1	0%	27	81%	33	79%
total	25	84%	22	77%	55	78%	102	79%

#24 myscbot	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	1	0%	1	0%	2	50%	4	25%
rush	2	0%	3	67%	2	50%	7	43%
aggressive	3	33%	2	100%	9	78%	14	71%
fast expo	1	0%	2	50%	1	0%	4	25%
macro	4	50%	4	100%	3	67%	11	73%
defensive	23	61%	10	100%	38	79%	71	76%
total	34	50%	22	86%	55	75%	111	69%

#25 hannesbredberg	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	5	80%	3	100%	3	67%	11	82%
rush	2	50%	3	100%	2	50%	7	71%
aggressive	2	50%	2	50%	2	0%	6	33%
fast expo	8	100%	3	100%	9	89%	20	95%
macro	2	50%	4	100%	11	91%	17	88%
defensive	15	100%	7	100%	28	100%	50	100%
total	34	88%	22	95%	55	89%	111	90%

#26 sling	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	2	50%	1	0%	3	33%	6	33%
rush	2	50%	0	0%	1	0%	3	33%
aggressive	12	100%	0	0%	23	96%	35	97%
fast expo	1	0%	5	100%	1	0%	7	71%
macro	3	67%	5	80%	12	75%	20	75%
defensive	5	80%	11	100%	15	80%	31	87%
total	25	80%	22	91%	55	80%	102	82%

Here is another possible case of insufficient exploration. The 4 zealot drop won 100% of the time on 2 player maps and 96% of the time on 4 player maps, but was never tried on 3 player maps (I guess due to a crash, since AIUR tries to play each strategy once). It’s not a severe problem, though, because 3 player maps did have 2 strategies that scored 100%.

#27 forcebot	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	1	0%	1	0%	1	0%	3	0%
rush	0	0%	1	0%	1	0%	2	0%
aggressive	3	67%	2	0%	1	0%	6	33%
fast expo	0	0%	1	0%	1	0%	2	0%
macro	0	0%	9	78%	3	67%	12	75%
defensive	29	100%	8	75%	48	94%	85	94%
total	33	94%	22	59%	55	85%	110	83%

#28 ziabot	2		3		4		total
	n	wins	n	wins	n	wins	n	wins
cheese	12	100%	7	86%	36	86%	55	89%
rush	1	0%	1	100%	4	75%	6	67%
aggressive	6	100%	8	88%	6	83%	20	90%
fast expo	1	0%	1	0%	2	0%	4	0%
macro	3	0%	1	0%	1	0%	5	0%
defensive	6	67%	4	75%	6	83%	16	75%
total	29	76%	22	77%	55	80%	106	78%

Next: AILien’s learning.

AIIDE 2017 the learning bots

In March 2016 I analyzed which bots learned during the AIIDE 2015 tournament by looking at the data files. Here’s a similar analysis for AIIDE 2017.

I looked at the “write” directory for each bot, to see if it wrote files there, and if so, what the files looked like. Writing data doesn’t mean that the bot actually learned anything—it may not have used the data. Bots not listed in the table did not write anything interesting (maybe log files, nothing more). The table includes 15 bots of the 28 entrants, over half of them.

#	bot	info
1	ZZZKBot	varied info about each game, including tidbits like the time zone and the processor it was played on
2	PurpleWave	for each opponent, a log of games with info including the sequence of strategies followed
5	Microwave	same format as UAlbertaBot (Microwave has more strategies)
6	CherryPi	opening data for each opponent
9	Tyr	for each opponent, seems to save info only about the previous game: win or loss, and a flag or two like "FLYERS" or "CANNONS"
11	AILien	10 values per opponent: scores for zerg unit types, a few odds and ends like `macroHeavyness` and `supplyISawAir`
12	LetaBot	one file which records a log of games, with opponent, race, map, and 3 numbers per game
14	UAlbertaBot	a file for each opponent, giving for each strategy the count of wins and losses; learning was turned on this year
15	Aiur	91 values per opponent: strategy x map size
17	Skynet	a file for each opponent, with lines like "build_0_2 14 12"
19	MegaBot	many files; the important ones seem to be "MegaBot-vs-[bot name].xml" which give scores for each bot MegaBot can imitate: Skynet, NUSBot, Xelnaga
20	Xelnaga	a file for each opponent with a single number: -1, 0, 2, or 3
21	Overkill	many files with neural network data, reinforcement learning data, and opening learning data for each opponent (more than I thought!)
24	Myscbot	same format as UAlbertaBot, but only 1 strategy was played for each opponent; nothing was learned
25	HannesBredberg	2 numbers per opponent, approximately (not exactly) the win and loss counts

LetaBot seems worth looking into, to see whether its log is learning data and, if so, how it is used. PurpleWave also recorded data essentially as a log, which could be used for a wide range of purposes. And AILien has a unique learning method that I should spend some time on.

UAlbertaBot had learning turned on this year. It has sometimes left learning off because its default strategies were dominant. It’s also notable that Ziabot skipped learning this year. In the past it has learned. Ziabot also finished last.

Next: What AIUR learned.

ZZZKBot’s games

The AIIDE 2017 win rate over time graph shows #1 ZZZKBot slowly and steadily learning throughout the tournament. It’s easiest to see if you turn off most of the lines of the graph (click on the bots in the legend). #2 PurpleWave shows a different pattern: Without prior knowledge, it starts lower, learns fast at first, then more slowly, and seems to level off before the end of the tournament (though it might be only a temporary plateau).

McRave

McRave upset ZZZKBot 64-46, so watching the games versus McRave lets us see the learning algorithm in action. ZZZKBot does not have a prior strategy versus McRave, possibly because none of its 4 strategies can win reliably against the early cannons. (There are ways to bust the cannons, so it is a limitation of ZZZKBot.)

There are 10 maps in the tournament, and they are played one map per round, so it takes 10 round robins to play all maps once. ZZZKBot played its 4 pool against McRave for the first 10 rounds to see how it did on each map. The answer: It won some and lost some, depending on whether it scouted protoss quickly and whether McRave pulled probes in time to shield the cannons when necessary.

On maps where the 4 pool won, ZZZKBot repeated it when the map came up again. On maps where the 4 pool lost, ZZZKBot switched to its next strategy, the overpool speedlings. The speedlings did not usually do well, because McRave had 3 cannons up in time. ZZZKBot tried to follow up with mutalisks, but McRave scouted that coming and was more than prepared.

I watched the sequence of games on Benzene. The speedlings mostly lost, except for occasional games where zerg managed to kill the scout probe and leave McRave in the dark and unable to react in time. But ZZZKBot kept trying the strategy, only occasionally switching back to 4 pool. I didn’t watch every game, but ZZZKBot’s other 2 strategies didn’t seem to come into play at all.

Iron

Versus Iron, ZZZKBot mostly stuck with its 1 hatch hydralisk strategy, an unusual opening. One odd point is that ZZZKBot researched hydralisk range before speed, which is rare and usually seen only when attacking a protoss wall. As we see in the per-map crosstables, ZZZKBot scored poorly against Iron on 2 player maps and tended to win on 3 and 4 player maps. The difference was that on 2 player maps, Iron was expecting to be rushed and was more willing to build a bunker, which held the hydras.

ZZZKBot sometimes fell back to its sunken defense into mutalisks, but that was less effective. Iron could stop the mutalisks, and its vultures were able to find gaps in the sunken defense.

Iron is the only opponent configured for the hydralisk build order. ZZZKBot doesn’t seem to use it at all, otherwise. I think the build must have been specially developed for Iron.

XIMP

ZZZKBot chose to bust XIMP’s cannons with its speedling build. The zerglings commonly arrived when XIMP had 2 cannons, versus McRave’s 3, and XIMP is not as skilled with probes. It didn’t help that XIMP likes to leave its gateway units in its main “to defend against drops.”

XIMP won only 4 games out of 110. In 3 of its wins, it got its first zealot into the fight in time to save the day. In the fourth, XIMP expanded to another base rather than its natural. ZZZKBot brilliantly scouted the undefended nexus and chose to attack it first, which allowed XIMP’s third cannon time to finish.

CherryPi

CherryPi is an interesting case because ZZZKBot’s “prior knowledge” was guesswork, “Guessing it is like tscmooz.” CherryPi consistently played a sunken defense into mutalisk build against ZZZKBot, with 2 to 4 sunkens. Where did this trend come from? In any case, Tscmoo doesn’t play any such build as far as I’ve seen.

ZZZKBot’s learning kicked in. It tried the 4 pool; no good. It tried the speedlings; no good. It tried its own sunken defense into mutalisks, building only 1 sunken, and that worked perfectly. The sunken was often poorly placed so ZZZKBot tended to lose a few drones, but its mutalisks were out faster.

the Steamhammer forks

ZZZKBot chose its sunken defense into mutalisks versus Steamhammer (with 5 sunkens), Microwave (3 sunkens), Arrakhammer (9 sunkens, because Arrakhammer likes to bust with zerglings), and KillAll (1 sunken), with great success. The sunken count is hand-configured for each opponent. I found it frustrating, because Steamhammer knows in principle how to respond: It makes drones instead of zerglings and goes air. Unfortunately, Steamhammer’s strategy adjustments are sloppy, and it almost always got its own mutalisks too late. It did things like start another hatchery when the spire finished, and then a queen’s nest, and then—well, then it was irretrievable. I knew all along that opponent modeling is crucial for tournaments.

conclusion

Watching games, I was struck that ZZZKBot’s builds are not tight. It doesn’t expand, and in the middle game (when it gets that far) it ends up with more drones than its hatcheries can support. It suffers mineral excess that it can’t spend, and gas shortage because it has only 1 geyser.

Its micro is not tight either. ZZZKBot doesn’t have a combat simulator (well, it would probably be in a subroutine, and as the ancient Greeks declared, only straight lines and circles will do). If the 4 pool leaves cannons alive, then the next 2 followup zerglings will die to the cannons, accomplishing nothing. Then the next 2, etc., until protoss moves out and wins. Followup is minimal; the bot is about winning right away.

ZZZKBot has a lot of clever little skills, but it is missing some big ones. The weaknesses mean that stronger cheese bots are possible. I don’t think the cheese era is over yet.

Next: CherryPi.

opponent modeling, scouting, and software development

Opponent modeling is coming along, though never as fast as I would like. Steamhammer now records a pretty informative model. Making best use of the model is not as easy—it has a ton of uses. If it works as well as I hope, Steamhammer will gain scary predictive abilities, even against opponents with multiple strategies.

Opponent modeling depends on good scouting. The more Steamhammer finds out about what the opponent does, the better the model. So today I added a new scouting command, "go scout once around" which sends the worker scout on a single circuit of the enemy base and then returns it home. (Usually. There are some funny cases because the waypoint numbering is not quite clean.) In a lot of openings, I used it to replace "go scout location" which only finds the enemy base and doesn’t look to see what’s there. I’m thinking of also adding "go scout while safe".

The command is a minor addition, but while hooking up the wiring I saw awkwardness in the communication among ProductionManager which executes the commands, GameCommander which initiates scouting, and ScoutManager which does the work. I ended up spending the whole afternoon refactoring it for simplicity and testing to make sure I hadn’t broken anything.

Is that what I should be spending my time on? And yet it makes Steamhammer better.

Steamhammer opponent modeling goals

Next up for Steamhammer is opponent modeling. Some version of it will be ready for AIIDE, deadline 1 September. Opening learning methods that we have seen so far implicitly assume that the bot knows a small number of strategies and that one of them is the best counter for the opponent’s play. I want Steamhammer to be able to cope with opponents that are reactive (Iron), multi-strategy (Zia), or both (Krasi0). My goals for opponent modeling are something like this:

Play a wide range of strategies. It has been part of my plan from the first.
Learn from the events of the game, not only from the outcome. That way the bot can learn more and faster.
Learn from one trial: See an opponent’s strategy in the first game and counter it in the second, or at least make a good try. Fixed-strategy bots that Steamhammer knows a counter for should stand at a disadvantage from the second game on.
React by both choosing a counter opening and later choosing a counter unit mix in the middle game.
As more games are completed, recognize the range of the opponent’s play.
If no one opening strategy counters the full range of the opponent’s play, use game theory to estimate the best mix of openings.

It’s fancier than the strategy learning we’ve seen before, but it doesn’t seem hard to me. It’s straightforward, at least in principle. The key element is a model of game strategy. Already last night I wrote a GameRecord class that keeps a simplified description of what happens. That will be the basis for reading the opponent’s strategy over the game. As soon as we have one game record for an opponent, we can check our openings against it to predict which openings will succeed, and we can also use the record to make middle game decisions about what to produce.

Against opponents with a fixed play style, like XIMP, I expect this to be fast-acting poison. Steamhammer won’t react “Oh, carriers, I’d better make some scourge,” it will prepare ahead of time, “4 carriers will be coming in a minute or so, let me make the right amount of scourge to explode them all.” Against opponents that vary their play, it will take more games to formulate effective venom.

If I have time, I’ll do more. There is no chance that I can get all of these ideas implemented in time for AIIDE, but I’ll make what progress I can.

Generalize across opponents, so that an unknown opponent faces play that has proved strong before. If you play a lot of openings, then some of them are weak in most circumstances; so far I have accepted that.
Take more information into account, including map and starting positions.
Integrate scouting information with the opponent model to get the best possible prediction of what the opponent is aiming for this game.
Arrange the known opening lines into a tree, and use the integrated prediction to make decisions at every branch. Opening play will become reactive moment by moment, more like pro play.
Use the same mechanism to decide when to break out of the opening book.
Record the decisions made just after leaving the opening book as a new opening line to be added to the book and possibly played against other opponents. With breaking out plus adding opening lines, Steamhammer gains the ability to invent its own openings.

Steamhammer wants opening learning 3: lurkers

I rewrote Steamhammer’s lurker micro. Now lurkers fight well enough rely on. They don’t burrow and unburrow too often, unlike most bots’ lurkers. They do tend to be clumsy getting into position in a narrow space, and a side effect of not unburrowing too often is that they may cooperate poorly with other units, but one step at a time. I’m adding lurker support to the strategy boss, because it was past time.

So far I’ve written only 1 lurker opening. I’ll start slow. But as soon as there is a choice of what lair tech to aim for, Steamhammer is faced with the decision. Choosing randomly will lead to mistakes.

Against a terran that goes straight infantry, as many do, lurkers are a strong choice. Against factory units, lurkers are a poor choice (though they can be good for drops and surprise attacks). The majority of terran bots go one way or the other, though some can do either or both. If the opponent’s unit choice can be predicted, then Steamhammer can choose an optimized counter-build from the start. It won’t have to lose time scouting and reacting.

Against protoss, both lurkers and mutalisks counter zealots; any lair tech works. And dragoons are useful against both lurkers and mutalisks; zerg wants zerglings and hydralisks to combat dragoons. So the opponent model tells whether to hurry up and get a lair and how soon to take the second gas. It’s less valuable than against terran, but useful.

Against an opponent that doesn’t bring detection, lurkers are strong against any ground unit mix. The opponent model can remember that too.

Steamhammer wants opening learning 2: zealot rushes

Wuli’s zealot rush beats Steamhammer about 4 games out of 5. UAlbertaBot’s usually wins too. Carsten Nielsen’s wins fairly often, and Lukas Moravec sometimes plays a winning zealot rush. Steamhammer’s weakness is not the initial defense, which holds the zealots for quite a long time. It’s similar to games versus McRave; the weakness is in the transition to lair tech. Steamhammer ends up without enough drones.

Someday the strategy boss will be smart enough to understand the situation and make a safe transition. It will be easier if I improve Steamhammer’s defensive skills, which are simpleminded. But those things take time. In the short run, it’s easier to come up with an opening that beats zealot rushes. Then all Steamhammer needs is to know when to play that opening.

Bots should benefit hugely from opponent modeling, because other bots mostly play in stereotyped ways. Wuli and Carsten Nielsen never deviate from their rush builds. But Lukas Moravec, among others, plays more than 1 build. So to counter zealot rushes, a bot also wants plan recognition. If the scout sees 2 or more gates and a lack of other stuff (no gas, forge, or expansion), then the bot had better stay safe against a hard rush. If it sees a forge and cannons, it had better emphasize drones and hatcheries instead. Opponent modeling and plan recognition can be combined: Here are the plans the enemy has been seen to follow (described in some abstract way, such as the timings at which units appear). Based on scouting information, this past enemy behavior is the closest match, so let’s counter it. Since bots are predictable, simple opponent modeling is likely to give big leverage.

More about opponent modeling: If you know the opponent’s range of openings, then you can figure out how best to open yourself—what to do until you get scouting information and can adapt. Lukas Moravec never tries a fast rush, so you don’t need to stay safe against rushes. If the opponent never plays proxies (like most), then you don’t have to scout for proxies. If the opponent plays a mix of fast and slow openings, then you can try to estimate the best counter-mix with game theory. There are a ton of ways to gain advantage by knowing the opponent’s habits.