
adapting to the maps

Instead of configuring all of Steamhammer’s opening probabilities by hand, I want it to figure them out for itself. The starting point is data: For each matchup, keep overall statistics of all the openings in a file. Another goal is to have Steamhammer adapt its strategy to maps based on experience. So I thought, why not combine the two? For each matchup and map, keep a file with numbers for all the openings. Or for each matchup, keep a file for all the pairs (map, opening)—that way the bot has the data to generalize between maps.

Someday I also want Steamhammer to adapt its strategy reactions and its tactics to the map. At first it will analyze the map and decide what looks good (“look at that cliff over the mineral line—I should go air or drop”); later it will learn from experience what works well. I don’t expect to get to that for a long time, though.

Steamhammer has over 60 zerg openings (it doesn’t play all of them), and the count will increase. SSCAIT has 14 maps and other tournaments use fewer, so I think I should be ready for on the order of 1000 pairs (map, opening) if I keep them in one file. Each pair would be one line of data, something like “<opening> <map> <# games> <# wins>” and maybe a few more numbers like mean game length, or a statistical summary of evaluations once there is an evaluation function, or whatever else. If I want a cache of map data like number of starting positions and rush distances and so on, to use in generalizing across maps, that would be in a separate file.

In that case there would be 12 matchup files, including data for when the opponent went random: TvT TvP TvZ TvR, PvT PvP PvZ PvR, ZvT ZvP ZvZ ZvR. With up to 1000 lines per file, it seems like a reasonable amount of data. In every game, Steamhammer would read one file in onStart() and write it in onEnd(), which doesn’t seem excessive. There is one complication. If the opponent goes random, and after I give Steamhammer the ability to change its mind on the fly (which I will do), then when Steamhammer finds out the opponent’s race it may want to read that matchup file too. Reading and analyzing that much data may take more than one frame time, so it might have to go into a separate thread. Another solution for when the opponent goes random might be to read 4 matchup files during onStart() when we are allowed more time. Well, when it comes up I’ll figure out a plan. Maybe nothing special will be needed (seek time for a hard drive could exceed one frame time, but reading from SSD is faster).
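To make the bookkeeping concrete, here is a minimal sketch of how such a matchup file might be read and updated. The format is the one line per (opening, map) pair described above; the function names are my illustration, not Steamhammer’s actual code.

```python
# Sketch of one per-matchup statistics file: one line per (opening, map)
# pair, "<opening> <map> <games> <wins>". Names are hypothetical.
def read_stats(path):
    stats = {}
    try:
        with open(path) as f:
            for line in f:
                opening, game_map, games, wins = line.split()
                stats[(opening, game_map)] = (int(games), int(wins))
    except FileNotFoundError:
        pass  # no data yet for this matchup
    return stats

def record_result(stats, opening, game_map, won):
    games, wins = stats.get((opening, game_map), (0, 0))
    stats[(opening, game_map)] = (games + 1, wins + int(won))

def write_stats(path, stats):
    with open(path, "w") as f:
        for (opening, game_map), (games, wins) in sorted(stats.items()):
            f.write(f"{opening} {game_map} {games} {wins}\n")
```

The read would go in onStart() and the record-and-write in onEnd(), as the text says.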

That’s the data. How to use it? I haven’t decided on details. The opponent model keeps detailed records for each game against a given opponent, including the map. When we play the opponent for the first time, decisions have to be made without the opponent model, solely on the basis of the matchup+map+opening statistics. I’ll figure out a more-or-less sound way to turn the raw numbers into probability-to-play-this values, including an exploration policy. There are choices. After we’ve played an opponent many times, the opponent model will have more information (since it records more data, and it is specific to the opponent), so it can decide on its own. In between, I’ll need some kind of evidence combination procedure to blend the 2 sources of information together. I expect that a simple procedure would work fine, even weighted averaging of probabilities.
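As an illustration of the simple procedure suggested, here is a hypothetical weighted average of the two probability sources. The full_weight_at parameter, meaning how many recorded games against this opponent before the opponent model takes over completely, is my invention, not anything decided in the text.

```python
def blend(map_probs, opponent_probs, opponent_games, full_weight_at=20):
    """Weighted average of two probability-to-play distributions.

    map_probs comes from the matchup+map+opening statistics,
    opponent_probs from the opponent model. The opponent model's
    weight grows with the number of recorded games until it dominates.
    full_weight_at is a made-up tuning knob."""
    w = min(1.0, opponent_games / full_weight_at)
    openings = set(map_probs) | set(opponent_probs)
    mixed = {o: (1.0 - w) * map_probs.get(o, 0.0) + w * opponent_probs.get(o, 0.0)
             for o in openings}
    total = sum(mixed.values())
    return {o: p / total for o, p in mixed.items()}
```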

Steamhammer’s configuration file will become much shorter. I expect I’ll retain the manual configuration options for openings, for those bot authors who want to do it that way, but Steamhammer itself will rely on the data it collects. Once I have a good set, I’ll distribute it with the bot.

I’m not sure when I’ll get to all this stuff. Maybe or maybe not in the 1.4.x series.

Next: A tree of openings. It ties in.

the opponent model in Steamhammer 1.4

Today’s topic is Steamhammer 1.4’s opponent model. The features are:

  • Recognize some enemy opening plans.
  • Predict the enemy’s opening plan from experience against this opponent.
  • React to the enemy’s predicted or actual opening plan.
  • Choose openings based on the predicted plan.
  • Decide whether to steal gas. Does experience suggest it may be worth trying against this opponent?

The code that implements the features has these parts:

  1. The plan recognizer, which looks at the current game situation.
  2. Game records, which save information about past games against this opponent, including the recognized plan.
  3. The plan predictor and gas steal decider, which draw conclusions based on the game records.
  4. Strategy reactions to predicted or recognized enemy plans are in various places—they are uses of the opponent model, not parts of the opponent model.
  5. Opening choices to counter the predicted plan can be set in the configuration file.

1. The plan recognizer was first written up in December. It tries to understand the opponent’s intentions during the earliest part of the game. The code is in OpponentPlan.cpp and is less than 150 lines—it is rudimentary. The plans it can recognize are:

  • Unknown - No plan recognized. The plans are not exhaustive, so this is common.
  • Proxy - Enemy buildings in your base. This doesn’t include cannon contains or other more distant proxies.
  • WorkerRush - Like Stone.
  • FastRush - Basic units faster than 9 pool, 8 rax, or 9 gateway.
  • HeavyRush - 2 gate zealots, 2 barracks marines, etc.
  • SafeExpand - Static defense before expanding to the natural. Zerg can’t do this.
  • NakedExpand - Expansion with no static defense.
  • Turtle - Static defense without expanding.

My early impression of the plan recognizer was that it often failed to recognize plans, but was rarely wrong when it did. With more experience, I think it often misrecognizes plans severely. It’s crude and clumsy. Even so, when it helps it helps a lot. It’s a net win.

2. Game records are handled in GameRecord.cpp. Steamhammer writes a file for each opponent, like many learning bots. Here is one record from a file named om_UAlbertaBot.txt, where “om” stands for opponent model, with annotations so you can make sense of the list of numbers.

1.4                  <- record format version (from the Steamhammer version when it was first used)
ZvRZ                 <- matchup
(2)Destination.scx   <- map
Over10Hatch          <- Steamhammer’s opening
Heavy rush           <- initial predicted enemy plan
Fast rush            <- actual enemy plan
1                    <- we won, 1 or 0
0                    <- frame we sent a scout to steal gas, 0 if never
0                    <- gas steal happened, 1 or 0 (extractor was/was not queued)
2110                 <- frame enemy first scouted our base
3102                 <- frame enemy got first combat units
0                    <- frame enemy got first air units
0                    <- frame enemy got static air defense
0                    <- frame enemy got mobile air defense
0                    <- frame enemy got cloaked units
0                    <- frame enemy got static detection
1726                 <- frame enemy got mobile detection
12309                <- last frame of the game
END GAME             <- end mark

Theoretically, a single file could have records in more than one format. Of course, only one format exists so far. The “frame enemy got” times recorded are the time we first saw such a thing, which may be much later than it actually happened. For example, here we first saw an enemy overlord (a mobile detector) on frame 1726, but it existed the whole game. The END GAME mark is redundant. It gives us a way to recover in case data in the middle of a file is corrupted—we can skip ahead past the next END GAME and continue reading the file from there.
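A reader for this record layout is easy to sketch. The code below is mine, not the code in GameRecord.cpp; the field names are invented. It resynchronizes on the END GAME marks exactly as described: a record with the wrong number of lines is assumed corrupted and dropped.

```python
# Hypothetical reader for the 18-field record layout shown above.
FIELDS = [
    "version", "matchup", "map", "opening",
    "predicted_plan", "actual_plan",
    "won", "gas_steal_frame", "gas_steal_happened",
    "enemy_scout_frame", "enemy_combat_frame", "enemy_air_frame",
    "static_air_defense_frame", "mobile_air_defense_frame",
    "cloaked_frame", "static_detection_frame", "mobile_detection_frame",
    "last_frame",
]

def read_records(lines):
    """One dict per game record. A record with the wrong field count is
    corrupted: drop it and resume at the next END GAME mark."""
    records, current = [], []
    for line in lines:
        line = line.strip()
        if line == "END GAME":
            if len(current) == len(FIELDS):
                records.append(dict(zip(FIELDS, current)))
            current = []  # complete or damaged: either way, start fresh
        else:
            current.append(line)
    return records
```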

3. The plan predictor and gas steal decider look at the game records near the start of the game.

The gas steal decider was written up in December. Nothing important has changed since then.

The plan predictor runs once at the start of the game. It looks through the game records to see what recognized plans the enemy has played. It is close to the bare minimum: It counts the recognized plans in the game records for this matchup, ignoring unknown plans, and weighting recent games more using a discount factor so that the past is gradually forgotten. That way it reacts quickly when the enemy changes its play. Whether the game was won or lost, whether past predictions were correct or wrong—all the other information is ignored.

If the opponent was random, then when the opponent’s race is found out, the plan predictor runs again. In the first run, it counted all game records where the opponent went random. In the second run, it counts only games where the opponent was the same race as this game. The predicted plan may change.
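The discounted counting might look something like this sketch. The discount value and the record representation are my guesses, not Steamhammer’s; the point is only that the most recent game weighs 1, the one before it weighs the discount, and so on.

```python
def predict_plan(records, discount=0.8):
    """Count recognized plans with recency weighting and return the
    highest-scoring plan. Unknown plans are ignored, but still age
    the discount, so the past is gradually forgotten either way."""
    scores = {}
    weight = 1.0
    for record in reversed(records):  # newest game first
        plan = record["actual_plan"]
        if plan != "Unknown":
            scores[plan] = scores.get(plan, 0.0) + weight
        weight *= discount
    return max(scores, key=scores.get) if scores else "Unknown"
```

With a discount of 0.8, two recent “Heavy rush” games (1 + 0.8 = 1.8) outweigh three older “Fast rush” games (0.64 + 0.512 + 0.41 ≈ 1.56), so the predictor reacts quickly when the enemy changes its play.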

4. Strategy reactions could be written in anywhere, but so far they are in StrategyManager for terran and protoss, and in StrategyBossZerg for zerg. So far, all the reactions are reactions to the enemy plan, not to any of the other information (even though it has obvious uses). Some strategy reactions are made when the enemy plan is recognized. Some reactions must begin in time, and happen when the plan is predicted. For example, against UAlbertaBot, the initial predicted plan is “Heavy rush”. If Steamhammer finds out that UAlbertaBot is zerg, it knows that zerg follows a different plan. The predicted plan changes to “Fast rush” and efforts to stop the zerglings begin immediately (if Steamhammer is zerg or terran; the protoss reaction is turned off because I didn’t get it working).

Good strategy reactions are the hard part. I find them much more difficult than the opponent model proper. Some of Steamhammer’s reactions are weak.

5. Openings to counter specific enemy plans can be written into the configuration file in a new subsection CounterStrategies. The opening is chosen once at the start of the game and can’t be changed (in this version), so only the initial predicted enemy plan matters.

Steamhammer first looks for counter strategies specific to the enemy race. The name is in the format “Counter [plan name] v[race character]”. The race character is “U” for Unknown if the opponent went random. You can use all the usual features of random opening selection. As you can see in the example, zerg is configured with a wider range of choices.

"Counter Safe expand vT" :
	{
		"Terran" : "14CCTanks",
		"Protoss" : "13Nexus",
		"Zerg" : [
			{ "Weight" :  1, "Strategy" : "FastPool", "Weight2" : 5 },
			{ "Weight" :  9, "Strategy" : "9PoolSpeed" },
			{ "Weight" :  0, "Strategy" : "ZvT_3HatchMuta", "Weight2" : 50 },
			{ "Weight" : 50, "Strategy" : "ZvT_3HatchMutaExpo", "Weight2" : 0 },
			{ "Weight" : 30, "Strategy" : "3HatchLurker" },
			{ "Weight" : 10, "Strategy" : "3HatchPoolMuta", "Weight2" : 0 }
		]
	},

If no counter strategy is found for the specific enemy race, Steamhammer next looks to see if there’s a general one for all races—leave out the “vX” string. This example points to a reusable strategy combo that specifies openings for each race Steamhammer might be playing, no matter the enemy race. The strategy combo feature has not changed.

"Counter Worker rush" : "AntiFastCheese",

If no counter strategy is found, Steamhammer falls back on its usual random opening selection. So if the regular repertoire is best in any given case, don’t specify a counter for that case.
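The lookup order described above (race-specific key first, then the general key, then the normal repertoire) can be sketched like this. The function is hypothetical, not Steamhammer’s code.

```python
def counter_strategy(config, plan, enemy_race):
    """Find the configured counter for a predicted plan.

    Try "Counter <plan> v<race>" first, then the general
    "Counter <plan>". None means: fall back on the usual
    random opening selection."""
    for key in (f"Counter {plan} v{enemy_race}", f"Counter {plan}"):
        if key in config:
            return config[key]
    return None
```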

For the following version, Steamhammer 1.4.1, I will add at least one large piece to the opponent model. I’m likely to change my mind once I see how this version does in practice, but at the moment I’m thinking of fine-tuning plan prediction and opening selection based on wins and losses. That will raise Steamhammer’s performance ceiling; with enough games, it will learn much more. Other possibilities include:

  • Make use of more of the information in the game records. This opponent doesn’t get detection, use cloaked units; that opponent gets air units around frame X, add a spire for scourge just in time.
  • Have Steamhammer collect data on its own openings so it can generalize: Play a fast/slow opening; play a lurker/muta opening.
  • Revamp the plan recognizer with a richer set of plans, maybe a hierarchy so it can refine its conclusions over time.
  • Machine learning for a probabilistic plan recognizer and/or plan predictor.
  • Restore the originally planned extensive game records, making it possible to predict unit mixes over time throughout the game.
  • Add the ability to change openings during play instead of deciding ahead of time, so that decisions can be made as late as possible.
  • Move scout timing into the opponent model alongside the gas steal decider.
  • Have expansion decisions or hatchery placement influenced by the opponent model. “Hmm, historically in situations like this we do poorly if the third hatchery is at an expansion. Make it a macro hatchery instead.”

I have no shortage of ideas!

Next: Brief remarks on the SSCAIT round of 8.

strategy learning by solving the game matrix

Being unpredictable to your enemies has value. How can you do strategy learning and still remain unpredictable when you should? You can’t simply require randomness, because if one strategy dominates, then you should play it every time. At other times, you may benefit from playing 2 strategies equally, or from playing a normal strategy 95% of the time and otherwise rushing. It depends on what opponent strategies counter each of yours, and the n-armed bandit methods that bots use now don’t understand that. Here’s one way to do it. It’s a step up in complexity from UCB, but not a tall step.

You can record the results of your strategies and the enemy strategies in a zero-sum game matrix, and solve the strategy game (which is the subgame of Starcraft that involves choosing your strategy). In the first cut version, each cell of the matrix is “n times it happened that I played this and the enemy played that, and k of those were wins.” Take the observed probability of win for each cell of the game matrix as the payoff for that cell, and solve the game. The solution tells you how often you should play each of your strategies, assuming that the opponent chooses optimally.

There are a couple different algorithms to solve zero-sum game matrixes fast. I personally prefer the iterative approximate algorithm (here is a simple python implementation), but it doesn’t make much difference.
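The iterative approximate algorithm is, I believe, fictitious play: each player repeatedly plays a best response to the other’s accumulated play, and the empirical frequencies converge to an optimal mixed strategy. Here is a bare sketch of my own, not the linked implementation; the payoff matrix would hold the observed win probabilities described above.

```python
def solve_game(matrix, iterations=10000):
    """Approximate the row player's optimal mixed strategy for a
    zero-sum game by fictitious play. matrix[i][j] is the payoff to
    the row player (e.g. observed win probability for that cell)."""
    rows, cols = len(matrix), len(matrix[0])
    row_counts = [0] * rows
    row_payoff = [0.0] * rows  # cumulative payoff of each row vs. column's play so far
    col_payoff = [0.0] * cols  # cumulative payoff of each column vs. row's play so far
    for _ in range(iterations):
        i = max(range(rows), key=lambda r: row_payoff[r])  # row best response
        row_counts[i] += 1
        for j in range(cols):
            col_payoff[j] += matrix[i][j]
        j = min(range(cols), key=lambda c: col_payoff[c])  # column best response
        for r in range(rows):
            row_payoff[r] += matrix[r][j]
    return [count / iterations for count in row_counts]
```

On matching pennies the frequencies come out 50-50; if one row dominates, the solver plays it every time, which is the behavior the opening paragraph asks for.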

If you recognize a lot of strategies on both sides, you’ll have many matrix cells to fill in, each of which requires some number of game results to produce a useful probability. 10 strategies for each side already means that a big AIIDE-length tournament won’t produce enough data. For a first cut, I recommend recognizing only 2 or 3 categories of enemy strategies, such as (example 1) rush, 1 base play, 2 base play, or (example 2 for zerg) lings and/or hydras, mutalisks, lurkers. Since you’re grouping enemy strategies into broad categories, you don’t need much smarts to recognize them.

You can group your own strategies in a completely different way, if you like. There’s no reason to stick to the same categories. Also, your bot presumably knows what it is doing and doesn’t need to recognize game events as signifying that it is following a given class of strategy.

In this method, you are assumed to choose your strategy before you scout, or at least ignoring scouting information. You can take your time to recognize the enemy strategy, and base the recognition decision on anything you see during the entire game.

How do you get started learning? You might want to start with a matrix of all zeroes and only use the game matrix for decisions after you’ve gathered enough data. Instead, I suggest keeping a global matrix alongside the ones for each opponent, with floating point game counts and win counts in each cell. The global matrix has the totals for all opponents. (Or maybe there’s a global matrix for each opponent race.) When you face a new opponent, initialize the new opponent’s matrix with scaled down game counts and win counts from the global matrix, as if only a small number of games had been played in total (I suggest 1 to 3 times the number of cells in the matrix as a try). You’ll start out playing a strategy mix that is good against the average opponent, and as you accumulate data the mix will shift to specifically counter this opponent.
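The scaled-down initialization might be sketched as follows, where prior_games_total is the 1 to 3 times the cell count suggested above. The names and representation are mine.

```python
def init_from_global(global_games, global_wins, prior_games_total):
    """Seed a new opponent's matrix from the global matrix, scaled so
    it looks as if only prior_games_total games had been played.
    Counts are floating point, as the text describes."""
    total = sum(sum(row) for row in global_games)
    if total == 0:
        return ([[0.0] * len(row) for row in global_games],
                [[0.0] * len(row) for row in global_wins])
    scale = prior_games_total / total
    games = [[n * scale for n in row] for row in global_games]
    wins = [[k * scale for k in row] for row in global_wins]
    return games, wins
```

The win rates in each cell are unchanged by the scaling; only the confidence is reduced, so new evidence against this opponent quickly outweighs the prior.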

There are tons of ways to fancy it up if you want to try harder. You could try a variant where you estimate the enemy’s choice probabilities instead of assuming the enemy plays optimally (you’ll need a different solution algorithm). You can keep a larger game matrix in parallel with the small one, and switch to it when you’ve accumulated enough data. Or use a hierarchical method that unfolds categories when there is enough data to distinguish them. You can try a more complicated Bayesian game solution algorithm, which realizes that the numbers in each cell are empirical approximations and takes that into account (“oh, this cell doesn’t have many games, better not rely too strongly on its value”). You can include scouting information in the strategy decision (“well, I can see it’s not a rush, so strike out that option for the opponent”). You can divide your notion of strategy into any fixed number of aspects, and keep independent matrixes for each aspect, so that your strategy choices are potentially random in many different dimensions. The sky is the limit.

AIIDE 2017 what AIUR learned

Here is what AIUR learned about each opponent over the course of the tournament. I did this mostly because it’s easy; I already had the script from last year. But it’s also informative—AIUR’s reactions tell us how each bot played, and may tell bot authors what they need to work on.

The data is generated from files in AIUR’s final read directory. AIUR recorded 111 games against some opponents even though the tournament officially ran for 110 rounds; that is presumably because the tournament did run longer but was cut back to a multiple of 10 rounds for fairness (since there are 10 maps). On the other hand, AIUR’s total game count according to itself is 2938 and according to the tournament results is 2965, so it may have been unable to record some games (it is listed with 53 crashes, so that’s not a surprise). First an overall view, totalling the data for all opponents. We can see that all 6 of AIUR’s strategies (“moods” it calls them) were widely valuable: Every strategy has win rate over 50% on some map size. AIUR’s overall win rate in the tournament was 50.46%.

overall            2           3           4          total
                 n  wins     n  wins     n  wins     n  wins
cheese         159   55%    59   37%   161   44%   379   47%
rush           134   66%    87   55%   185   50%   406   56%
aggressive     107   56%   108   43%   155   30%   370   41%
fast expo       69   45%    84   33%   197   51%   350   46%
macro           46   28%    69   52%   211   37%   326   39%
defensive      352   60%   185   58%   570   55%  1107   57%
total          867   57%   592   49%  1479   48%  2938   50%
  • 2, 3, 4 - map size, the number of starting positions
  • n - games recorded
  • wins - winning percentage over those games
  • cheese - cannon rush
  • rush - dark templar rush
  • aggressive - fast 4 zealot drop
  • fast expo - nexus first
  • macro - aim for a strong middle game army
  • defensive - be safe against rushes (not entirely successful)
#1 zzzkbot         2           3           4          total
                 n  wins     n  wins     n  wins     n  wins
cheese          16   12%     1    0%     4    0%    21   10%
rush             5    0%     1    0%     1    0%     7    0%
aggressive       3    0%     1    0%     5    0%     9    0%
fast expo        4    0%     1    0%     5    0%    10    0%
macro            3    0%     2    0%     3    0%     8    0%
defensive        3    0%    16   31%    37   24%    56   25%
total           34    6%    22   23%    55   16%   111   14%

AIUR struggled against the tournament leader but was not entirely helpless. Its cannon rush had a chance on 2 player maps and its anti-rush strategy on the others. We see how AIUR gains by taking the map size into account.

#2 purplewave      2           3           4          total
                 n  wins     n  wins     n  wins     n  wins
cheese           1    0%     1    0%     2    0%     4    0%
rush            28   79%     3   33%    40   55%    71   63%
aggressive       1    0%     3   33%     1    0%     5   20%
fast expo        1    0%    11   36%    10   60%    22   45%
macro            1    0%     2    0%     1    0%     4    0%
defensive        1    0%     1    0%     1    0%     3    0%
total           33   67%    21   29%    55   51%   109   51%

AIUR upset #2 PurpleWave, a surprising outcome. The DT rush and the fast expand were both somewhat successful—rather unrelated strategies.

#3 iron            2           3           4          total
                 n  wins     n  wins     n  wins     n  wins
cheese           5    0%     1    0%     7    0%    13    0%
rush             5    0%     2    0%     7    0%    14    0%
aggressive       3    0%     2    0%    12    0%    17    0%
fast expo        8    0%    14    7%     9    0%    31    3%
macro            6    0%     1    0%    10    0%    17    0%
defensive        5    0%     2    0%    10    0%    17    0%
total           32    0%    22    5%    55    0%   109    1%

Learning can’t help if nothing you try wins....

#4 cpac            2           3           4          total
                 n  wins     n  wins     n  wins     n  wins
cheese           1    0%     1    0%     1    0%     3    0%
rush             4    0%     0    0%     2    0%     6    0%
aggressive       2    0%     1    0%     1    0%     4    0%
fast expo        1    0%     1    0%     1    0%     3    0%
macro            2    0%     3   33%     2    0%     7   14%
defensive       24   38%    16   69%    48   50%    88   50%
total           34   26%    22   55%    55   44%   111   41%

Cpac was configured to play 5 pool against AIUR. It worked, but AIUR was able to compensate to an extent by playing its anti-rush build.

#5 microwave       2           3           4          total
                 n  wins     n  wins     n  wins     n  wins
cheese           2    0%     2    0%     4    0%     8    0%
rush             1    0%     1    0%     4    0%     6    0%
aggressive      20   20%    15   13%    11    0%    46   13%
fast expo        1    0%     2    0%     6    0%     9    0%
macro            1    0%     1    0%     4    0%     6    0%
defensive        1    0%     1    0%    26   12%    28   11%
total           26   15%    22    9%    55    5%   103    9%

Microwave was successful but showed a little vulnerability to surprise zealots dropped in its main. I suspect it’s a tactical reaction issue.

#6 cherrypi        2           3           4          total
                 n  wins     n  wins     n  wins     n  wins
cheese           1    0%     1    0%     1    0%     3    0%
rush             1    0%     1    0%     1    0%     3    0%
aggressive       2    0%     2    0%     1    0%     5    0%
fast expo        2    0%     1    0%     1    0%     4    0%
macro            2    0%     1    0%     9   11%    12    8%
defensive       26    4%    16   12%    42   12%    84   10%
total           34    3%    22    9%    55   11%   111    8%


#7 mcrave          2           3           4          total
                 n  wins     n  wins     n  wins     n  wins
cheese          26  100%     5   60%    45   62%    76   75%
rush             3   67%     9   67%     4   50%    16   62%
aggressive       1    0%     4   50%     1    0%     6   33%
fast expo        1    0%     2   50%     2   50%     5   40%
macro            1    0%     1    0%     1    0%     3    0%
defensive        1    0%     1    0%     2    0%     4    0%
total           33   85%    22   55%    55   56%   110   65%

AIUR upset McRave with its cannon rush, and the dark templar rush did well too. AIUR executes the best cannon rush of any bot, in my opinion. It is a sign that McRave’s play was not robust enough against tricks.

#8 arrakhammer     2           3           4          total
                 n  wins     n  wins     n  wins     n  wins
cheese           2    0%     2    0%     3    0%     7    0%
rush             1    0%     1    0%     4    0%     6    0%
aggressive       1    0%     5   60%     3    0%     9   33%
fast expo        1    0%     1    0%     2    0%     4    0%
macro            0    0%    12   50%    38   37%    50   40%
defensive       29   66%     1    0%     4   25%    34   59%
total           34   56%    22   41%    54   28%   110   39%


#9 tyr             2           3           4          total
                 n  wins     n  wins     n  wins     n  wins
cheese           6   67%     1    0%     1    0%     8   50%
rush            20  100%     1    0%     2    0%    23   87%
aggressive       3   33%    10   20%     1    0%    14   21%
fast expo        1    0%     7   29%    49   35%    57   33%
macro            1    0%     1    0%     1    0%     3    0%
defensive        2   50%     2    0%     1    0%     5   20%
total           33   79%    22   18%    55   31%   110   43%

The DT rush won 100% of the time on 2 player maps and was tried only a few times on larger maps, losing. Was it only unlucky on the 3 and 4 player maps, or is there a real difference? With only 3 games total, we can’t tell from the numbers. It is a weakness of AIUR’s learning: It’s slow because there is so much to learn. The flip side of the slowness is that, over a long tournament, it learns a lot.

#10 steamhammer    2           3           4          total
                 n  wins     n  wins     n  wins     n  wins
cheese           2    0%     1    0%     1    0%     4    0%
rush             2   50%     1    0%     2    0%     5   20%
aggressive       1    0%     1    0%     1    0%     3    0%
fast expo        1    0%     1    0%     1    0%     3    0%
macro            0    0%     1    0%     1    0%     2    0%
defensive       27   81%    17   88%    49   67%    93   75%
total           33   70%    22   68%    55   60%   110   65%

I was surprised to see Steamhammer upset by AIUR. I had thought that AIUR was a solved problem. On SSCAIT too, Steamhammer started to show losses against AIUR in September for the first time in months. I may have introduced a weakness in some recent version and AIUR’s learning took that long to find it on SSCAIT. In AIIDE, the tournament was easily long enough.

#11 ailien         2           3           4          total
                 n  wins     n  wins     n  wins     n  wins
cheese           1    0%     1    0%     1    0%     3    0%
rush             3    0%     1    0%     2    0%     6    0%
aggressive       1    0%     2    0%     1    0%     4    0%
fast expo        1    0%     2   50%     0    0%     3   33%
macro            4   50%     8   75%     1    0%    13   62%
defensive       24   58%     8   88%    49   37%    81   48%
total           34   47%    22   64%    54   33%   110   44%


#12 letabot        2           3           4          total
                 n  wins     n  wins     n  wins     n  wins
cheese           7   43%     1    0%     2    0%    10   30%
rush             3   33%    13   54%    43   40%    59   42%
aggressive       5   40%     1    0%     1    0%     7   29%
fast expo       13   46%     3   33%     1    0%    17   41%
macro            1    0%     1    0%     6   33%     8   25%
defensive        1    0%     3   33%     1    0%     5   20%
total           30   40%    22   41%    54   35%   106   38%

I suspect that fast expo was the best strategy on 4 player maps, but how was AIUR to know? A weakness of AIUR’s epsilon-greedy learning, compared to UCB, is that it doesn’t realize that a less-explored option is more likely to be misevaluated.
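For contrast, UCB-style selection makes the under-exploration visible in the formula: the exploration bonus grows for options with few games, so a rarely tried strategy keeps getting reconsidered. A generic UCB1 sketch, not AIUR’s code:

```python
import math

def ucb1_choice(games, wins, c=2.0):
    """Pick the strategy index maximizing mean win rate plus an
    exploration bonus. The bonus is larger for less-explored options,
    which epsilon-greedy lacks."""
    total = sum(games)
    def score(i):
        if games[i] == 0:
            return float("inf")  # try every strategy at least once
        return wins[i] / games[i] + math.sqrt(c * math.log(total) / games[i])
    return max(range(len(games)), key=score)
```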

#13 ximp           2           3           4          total
                 n  wins     n  wins     n  wins     n  wins
cheese          34   35%     0    0%     1    0%    35   34%
rush             0    0%     0    0%     1    0%     1    0%
aggressive       0    0%    13    8%    52    2%    65    3%
fast expo        0    0%     9    0%     0    0%     9    0%
macro            0    0%     0    0%     1    0%     1    0%
defensive        0    0%     0    0%     0    0%     0    0%
total           34   35%    22    5%    55    2%   111   13%


#14 ualbertabot    2           3           4          total
                 n  wins     n  wins     n  wins     n  wins
cheese           0    0%     0    0%     1  100%     1  100%
rush             0    0%     0    0%     0    0%     0    0%
aggressive       0    0%     0    0%     1  100%     1  100%
fast expo        0    0%     0    0%     0    0%     0    0%
macro            0    0%     0    0%     0    0%     0    0%
defensive       34   32%    21    5%    52   27%   107   24%
total           34   32%    21    5%    54   30%   109   26%

What’s up with all those zeroes? AIUR is coded to try each strategy once before it starts making decisions, and that did not happen here. It turns out that AIUR has pre-learned data for Skynet, XIMP, and UAlbertaBot, so its learning in those cases looks different.

#16 icebot         2           3           4          total
                 n  wins     n  wins     n  wins     n  wins
cheese           1    0%     2    0%     1    0%     4    0%
rush             1    0%     2   50%     3   33%     6   33%
aggressive       3  100%     3   67%     4   50%    10   70%
fast expo       14  100%     3   67%    44   93%    61   93%
macro            4   75%     2   50%     1    0%     7   57%
defensive        9   89%    10   80%     2   50%    21   81%
total           32   88%    22   64%    55   82%   109   80%


#17 skynet         2           3           4          total
                 n  wins     n  wins     n  wins     n  wins
cheese          13   92%     0    0%     0    0%    13   92%
rush            21   95%    21   90%    51   88%    93   90%
aggressive       0    0%     0    0%     0    0%     0    0%
fast expo        0    0%     1  100%     0    0%     1  100%
macro            0    0%     0    0%     0    0%     0    0%
defensive        0    0%     0    0%     4   50%     4   50%
total           34   94%    22   91%    55   85%   111   89%


#18 killall        2           3           4          total
                 n  wins     n  wins     n  wins     n  wins
cheese           1    0%     3    0%     1    0%     5    0%
rush             1    0%     2    0%     1    0%     4    0%
aggressive       1    0%     2    0%     1    0%     4    0%
fast expo        1    0%     3    0%     1    0%     5    0%
macro            0    0%     2    0%     2   50%     4   25%
defensive       30   80%    10   70%    49   76%    89   76%
total           34   71%    22   32%    55   69%   111   62%


#19 megabot        2           3           4          total
                 n  wins     n  wins     n  wins     n  wins
cheese           3   67%     1    0%     2    0%     6   33%
rush             2    0%    14   36%     5    0%    21   24%
aggressive       6   67%     4   25%     4    0%    14   36%
fast expo        2   50%     1    0%     4    0%     7   14%
macro            1    0%     1    0%    36   25%    38   24%
defensive       17   76%     1    0%     2    0%    20   65%
total           31   65%    22   27%    53   17%   106   33%


#20 xelnaga        2           3           4          total
                 n  wins     n  wins     n  wins     n  wins
cheese           9  100%     6   83%     1    0%    16   88%
rush            19  100%     4   75%     1    0%    24   92%
aggressive       1    0%     3   33%     1    0%     5   20%
fast expo        1    0%     4   75%     1    0%     6   50%
macro            2    0%     2   50%    50   36%    54   35%
defensive        2   50%     3   67%     1    0%     6   50%
total           34   85%    22   68%    55   33%   111   56%

Against Xelnaga, AIUR found solutions on 2 and 3 player maps but not on 4 player maps. Is it another case of underexploration?

#21 overkill       2           3           4          total
                 n  wins     n  wins     n  wins     n  wins
cheese           1    0%     1    0%     3   67%     5   40%
rush             2   50%     0    0%     0    0%     2   50%
aggressive       8  100%     4  100%     7   86%    19   95%
fast expo        3   67%     3  100%     7  100%    13   92%
macro            4   75%     3   67%    12   92%    19   84%
defensive       14   93%    11  100%    26   96%    51   96%
total           32   84%    22   91%    55   93%   109   90%


#22 juno           2           3           4          total
                 n  wins     n  wins     n  wins     n  wins
cheese           5    0%    14   36%    33   15%    52   19%
rush             3    0%     1    0%     1    0%     5    0%
aggressive       2    0%     1    0%     2    0%     5    0%
fast expo        2    0%     1    0%    16   12%    19   11%
macro            1    0%     1    0%     1    0%     3    0%
defensive       19   21%     4   25%     2    0%    25   20%
total           32   12%    22   27%    55   13%   109   16%

Juno’s cannon contain upset AIUR. Learning didn’t help much, because the problem wasn’t in any of the strategies; it was in AIUR’s poor reactions to cannons appearing in front of its base. It is amusing to watch 2 bots cannon each other; sometimes both get cannons up.

#23 garmbot        2           3           4          total
                 n  wins     n  wins     n  wins     n  wins
cheese           1    0%     1    0%     1    0%     3    0%
rush             2   50%     1    0%     0    0%     3   33%
aggressive      17   94%    17  100%     3   67%    37   95%
fast expo        0    0%     1    0%    23   83%    24   79%
macro            0    0%     1    0%     1    0%     2    0%
defensive        5   80%     1    0%    27   81%    33   79%
total           25   84%    22   77%    55   78%   102   79%


#24 myscbot        2           3           4          total
                 n  wins     n  wins     n  wins     n  wins
cheese           1    0%     1    0%     2   50%     4   25%
rush             2    0%     3   67%     2   50%     7   43%
aggressive       3   33%     2  100%     9   78%    14   71%
fast expo        1    0%     2   50%     1    0%     4   25%
macro            4   50%     4  100%     3   67%    11   73%
defensive       23   61%    10  100%    38   79%    71   76%
total           34   50%    22   86%    55   75%   111   69%


#25 hannesbredberg 2           3           4          total
                 n  wins     n  wins     n  wins     n  wins
cheese           5   80%     3  100%     3   67%    11   82%
rush             2   50%     3  100%     2   50%     7   71%
aggressive       2   50%     2   50%     2    0%     6   33%
fast expo        8  100%     3  100%     9   89%    20   95%
macro            2   50%     4  100%    11   91%    17   88%
defensive       15  100%     7  100%    28  100%    50  100%
total           34   88%    22   95%    55   89%   111   90%


#26 sling          2           3           4          total
                 n  wins     n  wins     n  wins     n  wins
cheese           2   50%     1    0%     3   33%     6   33%
rush             2   50%     0    0%     1    0%     3   33%
aggressive      12  100%     0    0%    23   96%    35   97%
fast expo        1    0%     5  100%     1    0%     7   71%
macro            3   67%     5   80%    12   75%    20   75%
defensive        5   80%    11  100%    15   80%    31   87%
total           25   80%    22   91%    55   80%   102   82%

Here is another possible case of insufficient exploration. The 4 zealot drop won 100% of the time on 2 player maps and 96% of the time on 4 player maps, but was never tried on 3 player maps (I guess due to a crash, since AIUR tries to play each strategy once). It’s not a severe problem, though, because 3 player maps did have 2 strategies that scored 100%.

#27 forcebot       2           3           4          total
                 n  wins     n  wins     n  wins     n  wins
cheese           1    0%     1    0%     1    0%     3    0%
rush             0    0%     1    0%     1    0%     2    0%
aggressive       3   67%     2    0%     1    0%     6   33%
fast expo        0    0%     1    0%     1    0%     2    0%
macro            0    0%     9   78%     3   67%    12   75%
defensive       29  100%     8   75%    48   94%    85   94%
total           33   94%    22   59%    55   85%   110   83%


#28 ziabot         2           3           4          total
                 n  wins     n  wins     n  wins     n  wins
cheese          12  100%     7   86%    36   86%    55   89%
rush             1    0%     1  100%     4   75%     6   67%
aggressive       6  100%     8   88%     6   83%    20   90%
fast expo        1    0%     1    0%     2    0%     4    0%
macro            3    0%     1    0%     1    0%     5    0%
defensive        6   67%     4   75%     6   83%    16   75%
total           29   76%    22   77%    55   80%   106   78%

Next: AILien’s learning.

ZZZKBot’s games

The AIIDE 2017 win rate over time graph shows #1 ZZZKBot slowly and steadily learning throughout the tournament. It’s easiest to see if you turn off most of the lines of the graph (click on the bots in the legend). #2 PurpleWave shows a different pattern: Without prior knowledge, it starts lower, learns fast at first, then more slowly, and seems to level off before the end of the tournament (though it might be only a temporary plateau).

McRave

McRave upset ZZZKBot 64-46, so watching the games versus McRave lets us see the learning algorithm in action. ZZZKBot does not have a prior strategy versus McRave, possibly because none of its 4 strategies can win reliably against the early cannons. (There are ways to bust the cannons, so it is a limitation of ZZZKBot.)

There are 10 maps in the tournament, and they are played one map per round, so it takes 10 round robins to play all maps once. ZZZKBot played its 4 pool against McRave for the first 10 rounds to see how it did on each map. The answer: It won some and lost some, depending on whether it scouted protoss quickly and whether McRave pulled probes in time to shield the cannons when necessary.

On maps where the 4 pool won, ZZZKBot repeated it when the map came up again. On maps where the 4 pool lost, ZZZKBot switched to its next strategy, the overpool speedlings. The speedlings did not usually do well, because McRave had 3 cannons up in time. ZZZKBot tried to follow up with mutalisks, but McRave scouted that coming and was more than prepared.

I watched the sequence of games on Benzene. The speedlings mostly lost, except for occasional games where zerg managed to kill the scout probe and leave McRave in the dark and unable to react in time. But ZZZKBot kept trying the strategy, only occasionally switching back to 4 pool. I didn’t watch every game, but ZZZKBot’s other 2 strategies didn’t seem to come into play at all.

Iron

Versus Iron, ZZZKBot mostly stuck with its 1 hatch hydralisk strategy, an unusual opening. One odd point is that ZZZKBot researched hydralisk range before speed, which is rare and usually seen only when attacking a protoss wall. As we see in the per-map crosstables, ZZZKBot scored poorly against Iron on 2 player maps and tended to win on 3 and 4 player maps. The difference was that on 2 player maps, Iron was expecting to be rushed and was more willing to build a bunker, which held the hydras.

ZZZKBot sometimes fell back to its sunken defense into mutalisks, but that was less effective. Iron could stop the mutalisks, and its vultures were able to find gaps in the sunken defense.

Iron is the only opponent configured for the hydralisk build order. ZZZKBot doesn’t seem to use it at all, otherwise. I think the build must have been specially developed for Iron.

XIMP

ZZZKBot chose to bust XIMP’s cannons with its speedling build. The zerglings commonly arrived when XIMP had 2 cannons, versus McRave’s 3, and XIMP is not as skilled with probes. It didn’t help that XIMP likes to leave its gateway units in its main “to defend against drops.”

XIMP won only 4 games out of 110. In 3 of its wins, it got its first zealot into the fight in time to save the day. In the fourth, XIMP expanded to another base rather than its natural. ZZZKBot brilliantly scouted the undefended nexus and chose to attack it first, which allowed XIMP’s third cannon time to finish.

CherryPi

CherryPi is an interesting case because ZZZKBot’s “prior knowledge” was guesswork, “Guessing it is like tscmooz.” CherryPi consistently played a sunken defense into mutalisk build against ZZZKBot, with 2 to 4 sunkens. Where did this build come from? In any case, Tscmoo doesn’t play any such build as far as I’ve seen.

ZZZKBot’s learning kicked in. It tried the 4 pool; no good. It tried the speedlings; no good. It tried its own sunken defense into mutalisks, building only 1 sunken, and that worked perfectly. The sunken was often poorly placed so ZZZKBot tended to lose a few drones, but its mutalisks were out faster.

the Steamhammer forks

ZZZKBot chose its sunken defense into mutalisks versus Steamhammer (with 5 sunkens), Microwave (3 sunkens), Arrakhammer (9 sunkens, because Arrakhammer likes to bust with zerglings), and KillAll (1 sunken), with great success. The sunken count is hand-configured for each opponent. I found it frustrating, because Steamhammer knows in principle how to respond: It makes drones instead of zerglings and goes air. Unfortunately, Steamhammer’s strategy adjustments are sloppy, and it almost always got its own mutalisks too late. It did things like start another hatchery when the spire finished, and then a queen’s nest, and then—well, then it was irretrievable. I knew all along that opponent modeling is crucial for tournaments.

conclusion

Watching games, I was struck that ZZZKBot’s builds are not tight. It doesn’t expand, and in the middle game (when it gets that far) it ends up with more drones than its hatcheries can support. It suffers mineral excess that it can’t spend, and gas shortage because it has only 1 geyser.

Its micro is not tight either. ZZZKBot doesn’t have a combat simulator (well, it would probably be in a subroutine, and as the ancient Greeks declared, only straight lines and circles will do). If the 4 pool leaves cannons alive, then the next 2 followup zerglings will die to the cannons, accomplishing nothing. Then the next 2, etc., until protoss moves out and wins. Followup is minimal; the bot is about winning right away.

ZZZKBot has a lot of clever little skills, but it is missing some big ones. The weaknesses mean that stronger cheese bots are possible. I don’t think the cheese era is over yet.

Next: CherryPi.

opponent modeling, scouting, and software development

Opponent modeling is coming along, though never as fast as I would like. Steamhammer now records a pretty informative model. Making best use of the model is not as easy—it has a ton of uses. If it works as well as I hope, Steamhammer will gain scary predictive abilities, even against opponents with multiple strategies.

Opponent modeling depends on good scouting. The more Steamhammer finds out about what the opponent does, the better the model. So today I added a new scouting command, "go scout once around" which sends the worker scout on a single circuit of the enemy base and then returns it home. (Usually. There are some funny cases because the waypoint numbering is not quite clean.) In a lot of openings, I used it to replace "go scout location" which only finds the enemy base and doesn’t look to see what’s there. I’m thinking of also adding "go scout while safe".

The command is a minor addition, but while hooking up the wiring I saw awkwardness in the communication among ProductionManager which executes the commands, GameCommander which initiates scouting, and ScoutManager which does the work. I ended up spending the whole afternoon refactoring it for simplicity and testing to make sure I hadn’t broken anything.

Is that what I should be spending my time on? And yet it makes Steamhammer better.

Steamhammer opponent modeling goals

Next up for Steamhammer is opponent modeling. Some version of it will be ready for AIIDE, deadline 1 September. Opening learning methods that we have seen so far implicitly assume that the bot knows a small number of strategies and that one of them is the best counter for the opponent’s play. I want Steamhammer to be able to cope with opponents that are reactive (Iron), multi-strategy (Zia), or both (Krasi0). My goals for opponent modeling are something like this:

  • Play a wide range of strategies. It has been part of my plan from the first.
  • Learn from the events of the game, not only from the outcome. That way the bot can learn more and faster.
  • Learn from one trial: See an opponent’s strategy in the first game and counter it in the second, or at least make a good try. Fixed-strategy bots that Steamhammer knows a counter for should stand at a disadvantage from the second game on.
  • React by both choosing a counter opening and later choosing a counter unit mix in the middle game.
  • As more games are completed, recognize the range of the opponent’s play.
  • If no one opening strategy counters the full range of the opponent’s play, use game theory to estimate the best mix of openings.

It’s fancier than the strategy learning we’ve seen before, but it doesn’t seem hard to me. It’s straightforward, at least in principle. The key element is a model of game strategy. Already last night I wrote a GameRecord class that keeps a simplified description of what happens. That will be the basis for reading the opponent’s strategy over the game. As soon as we have one game record for an opponent, we can check our openings against it to predict which openings will succeed, and we can also use the record to make middle game decisions about what to produce.
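The GameRecord idea can be sketched as a plain struct. The names and fields here are my invention for illustration, not Steamhammer's actual class:

```cpp
#include <string>
#include <vector>

// Hypothetical sketch of a per-game record: a timeline of abstract
// observations plus the outcome, compact enough to store one per game.
struct GameEvent {
    int frame;            // when the event was observed
    std::string what;     // e.g. "enemy spire", "enemy expanded"
};

struct GameRecord {
    std::string opponent;
    std::string map;
    std::string ourOpening;
    std::vector<GameEvent> events;   // simplified description of the game
    bool weWon = false;
};
```

A list of such records is enough to ask "what did this opponent do last time, and what beat it?"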

Against opponents with a fixed play style, like XIMP, I expect this to be fast-acting poison. Steamhammer won’t react “Oh, carriers, I’d better make some scourge,” it will prepare ahead of time, “4 carriers will be coming in a minute or so, let me make the right amount of scourge to explode them all.” Against opponents that vary their play, it will take more games to formulate effective venom.

If I have time, I’ll do more. There is no chance that I can get all of these ideas implemented in time for AIIDE, but I’ll make what progress I can.

  • Generalize across opponents, so that an unknown opponent faces play that has proved strong before. If you play a lot of openings, then some of them are weak in most circumstances; so far I have accepted that.
  • Take more information into account, including map and starting positions.
  • Integrate scouting information with the opponent model to get the best possible prediction of what the opponent is aiming for this game.
  • Arrange the known opening lines into a tree, and use the integrated prediction to make decisions at every branch. Opening play will become reactive moment by moment, more like pro play.
  • Use the same mechanism to decide when to break out of the opening book.
  • Record the decisions made just after leaving the opening book as a new opening line to be added to the book and possibly played against other opponents. With breaking out plus adding opening lines, Steamhammer gains the ability to invent its own openings.

Steamhammer wants opening learning 3: lurkers

I rewrote Steamhammer’s lurker micro. Now lurkers fight well enough to rely on. They don’t burrow and unburrow too often, unlike most bots’ lurkers. They do tend to be clumsy getting into position in a narrow space, and a side effect of not unburrowing too often is that they may cooperate poorly with other units, but one step at a time. I’m adding lurker support to the strategy boss, because it was past time.

So far I’ve written only 1 lurker opening. I’ll start slow. But as soon as there is a choice of what lair tech to aim for, Steamhammer is faced with the decision. Choosing randomly will lead to mistakes.

Against a terran that goes straight infantry, as many do, lurkers are a strong choice. Against factory units, lurkers are a poor choice (though they can be good for drops and surprise attacks). The majority of terran bots go one way or the other, though some can do either or both. If the opponent’s unit choice can be predicted, then Steamhammer can choose an optimized counter-build from the start. It won’t have to lose time scouting and reacting.

Against protoss, both lurkers and mutalisks counter zealots; any lair tech works. And dragoons are useful against both lurkers and mutalisks; zerg wants zerglings and hydralisks to combat dragoons. So the opponent model tells whether to hurry up and get a lair and how soon to take the second gas. It’s less valuable than against terran, but useful.

Against an opponent that doesn’t bring detection, lurkers are strong against any ground unit mix. The opponent model can remember that too.

Steamhammer wants opening learning 2: zealot rushes

Wuli’s zealot rush beats Steamhammer about 4 games out of 5. UAlbertaBot’s usually wins too. Carsten Nielsen’s wins fairly often, and Lukas Moravec sometimes plays a winning zealot rush. Steamhammer’s weakness is not the initial defense, which holds the zealots for quite a long time. It’s similar to games versus McRave; the weakness is in the transition to lair tech. Steamhammer ends up without enough drones.

Someday the strategy boss will be smart enough to understand the situation and make a safe transition. It will be easier if I improve Steamhammer’s defensive skills, which are simpleminded. But those things take time. In the short run, it’s easier to come up with an opening that beats zealot rushes. Then all Steamhammer needs is to know when to play that opening.

Bots should benefit hugely from opponent modeling, because other bots mostly play in stereotyped ways. Wuli and Carsten Nielsen never deviate from their rush builds. But Lukas Moravec, among others, plays more than 1 build. So to counter zealot rushes, a bot also wants plan recognition. If the scout sees 2 or more gates and a lack of other stuff (no gas, forge, or expansion), then the bot had better stay safe against a hard rush. If it sees a forge and cannons, it had better emphasize drones and hatcheries instead. Opponent modeling and plan recognition can be combined: Here are the plans the enemy has been seen to follow (described in some abstract way, such as the timings at which units appear). Based on scouting information, this past enemy behavior is the closest match, so let’s counter it. Since bots are predictable, simple opponent modeling is likely to give big leverage.
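The matching step could be as simple as a nearest-neighbor lookup over landmark timings. This sketch invents the representation; nothing like it exists in Steamhammer yet:

```cpp
#include <cmath>
#include <limits>
#include <map>
#include <string>
#include <vector>

// A "plan" as the timings at which enemy landmarks were first seen,
// e.g. "2 gates" -> frame 2000. Match this game's partial observations
// against past plans by summed timing difference over what we've seen.
using Plan = std::map<std::string, int>;   // landmark -> frame first seen

int planDistance(const Plan& seen, const Plan& past) {
    int d = 0;
    for (const auto& [landmark, frame] : seen) {
        auto it = past.find(landmark);
        // A landmark the past plan never showed counts as a big mismatch.
        d += (it == past.end()) ? 10000 : std::abs(frame - it->second);
    }
    return d;
}

// Return the index of the closest past plan, or -1 if there is none.
int closestPlan(const Plan& seen, const std::vector<Plan>& history) {
    int best = -1, bestD = std::numeric_limits<int>::max();
    for (size_t i = 0; i < history.size(); ++i) {
        int d = planDistance(seen, history[i]);
        if (d < bestD) { bestD = d; best = int(i); }
    }
    return best;
}
```

With the closest past plan in hand, the bot plays whatever countered that plan before, or avoids whatever lost to it.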

More about opponent modeling: If you know the opponent’s range of openings, then you can figure out how best to open yourself—what to do until you get scouting information and can adapt. Lukas Moravec never tries a fast rush, so you don’t need to stay safe against rushes. If the opponent never plays proxies (like most), then you don’t have to scout for proxies. If the opponent plays a mix of fast and slow openings, then you can try to estimate the best counter-mix with game theory. There are a ton of ways to gain advantage by knowing the opponent’s habits.

Steamhammer wants opening learning 1: McRave

I still have some essential bugs to fix. Also I keep literally forgetting that I promised a public repository next, and having to remind myself; it’s not appealing work. But beyond all that, I’m thinking about the next step.

The next major feature will be a start on opponent modeling and opening learning. As I think about it more, I’m seeing how important a feature it is to add soon, so I don’t regret setting out down the path. I’m going to write a series of posts giving examples to show how key a feature opponent modeling is.

McRave

The current version of protoss McRave doesn’t play that strongly. Its new forge-expand strategy is not polished yet, and I think new weaknesses have been introduced. But even so, it beats Steamhammer. The live Steamhammer always plays low-econ pressure builds versus protoss, and (unlike most protoss bots, even Bereaver) McRave is a sturdy enough defender to hold off the pressure. While trying to pile on pressure, Steamhammer doesn’t make enough drones, and as the game wears on it can’t keep up.

It’s easy enough to change Steamhammer’s behavior. There is a standard opening which is slower than 12 hatch 11 pool and faster than 3 hatch before pool: It is 12 hatch 13 pool, squeezing in 2 extra drones before the spawning pool. Those 2 drones make a substantial difference in the economy, because the earlier you spawn a drone, the more it pays off. The trade is that zerglings come later and the opponent gets a window for early aggression. Anyway, this McRave version doesn’t go for early aggression, so I coded up a quick 12 hatch 13 pool into 3 hatch hydra build. Sure enough, my first draft won most test games against McRave.

I don’t want to make 12 hatch 13 pool a standard build versus protoss, because too many bots go for early attacks. Steamhammer’s low-econ openings are effective against most opponents. But I need economic openings to win against defensive opponents like McRave.

I could specify an enemy specific opening mix versus McRave and immediately turn a bunch of losses into wins, at least for a time. Steamhammer does that versus rushbots and a few others. Arrakhammer does it too; it has hand-made builds to beat Wuli and McRave, and one to give it a chance versus Iron. But it’s not satisfying. It’s not sustainable, because I have to keep updating the hand-made builds by hand as opponents appear and change. And it’s likely to fail in tournaments, because many opponents show up with surprise updates that are specifically intended to throw off opponents which tune against them.

The answer is to learn each opponent’s habits from experience.

Steamhammer versus Tyr

Here’s a curious point about Steamhammer’s ZvT versus Tyr by Simon Prins. Tyr has opening learning, and center map BBS is its answer to bots which don’t defend themselves early. The BBS has always beaten Steamhammer’s mutalisk builds and lost to its zergling builds.

In the past, Tyr has tried BBS against Steamhammer and given it up after a while of losing; it concluded that a regular opening was better. Past Steamhammer versions played zergling openings 25% of the time, which was apparently enough to deter BBS even though Tyr’s slower play didn’t consistently win 75% of the time (it varied by version).

This Steamhammer version plays zergling openings ZvT 20% of the time, because the mutalisk openings are improved more (that was my thinking, at least). Tyr apparently detected the shift in game results, and now it plays BBS every game and wins 4 out of 5, a huge upswing. Can improved play lead to worse results when it highlights remaining weaknesses for the opponent’s learning to exploit? It could also be because Tyr was updated recently. Or it could be a chance change due to the interaction of opening learning, random choice of builds by Steamhammer, and historical changes in Steamhammer’s performance.

To fix the weakness I thought of a simple adaptation, and I hope to try it out in an upcoming version. Steamhammer is prepared for early pressure versus zerg or protoss opponents, but against terran it tries to exploit the tendency of most terran bots to sit back and macro for a while. So it only has to adapt in the case of early marine pressure, as played by Tyr (with BBS) and the latest tscmoo (with an academy rush) and a number of weaker marine bots like Kruecke and KaonBot. Steamhammer should have better chances to survive if it breaks out of its prepared build when it recognizes the early pressure, and lets the strategy boss do its default thing. I doubt my simple idea is good enough by itself for all cases, but I’ll try it.

Overkill’s new learning 3 - one model or many?

My first question about Overkill’s model is: One model for all opponents, or one model for each opponent? It turns out that the answer is: It depends. It checks curMode.

enum developMode { Develop, Release };
extern developMode	curMode;

Here is an example. Bits of code like this show up for each of the 3 data files used to store the model and learning information.

	if (curMode == Develop)
	{
		filePath = "./bwapi-data/write/RL_data";
	}
	else
	{
		string enemyName = BWAPI::Broodwar->enemy()->getName();
		filePath = "./bwapi-data/write/RL_data";
		filePath += enemyName;
	}

In the AIIDE 2016 competition, curMode is set to Release. It looks as though each opponent gets its own model, learned independently. But not learned from scratch!

My idea that Overkill has a general model turned out true. (I may have read it somewhere and forgotten where.) When it plays an opponent for the first time, it uses a model defined in file ModelWeightInit.h as the initial model, and learns starting from there. I don’t see any information about how the initial model was created. It may have been trained by playing against a variety of opponents, in Develop mode.

You could say that the initial model is “how to play Starcraft” and the refined model made for each opponent is “how to beat this opponent.” The same learning system can be used both offline to learn the game and online to model the opponent.
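The pattern boils down to a fallback at load time. This is my guess at the shape of it, not Overkill's actual code; the read path and names are assumptions:

```cpp
#include <fstream>
#include <string>
#include <vector>

// Sketch of the "general model as prior" pattern: load this opponent's
// learned weights if a file for them exists, otherwise start from the
// built-in initial model; learning then refines from wherever we started.
std::vector<double> loadWeights(const std::string& enemyName,
                                const std::vector<double>& initialModel) {
    std::ifstream in("./bwapi-data/read/RL_data" + enemyName);
    if (!in) return initialModel;          // first meeting: use the prior
    std::vector<double> w;
    double x;
    while (in >> x) w.push_back(x);
    return w.empty() ? initialModel : w;
}
```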

How well did opponent modeling work? We can look at Overkill’s graph of win rate over time in the AIIDE 2016 results. Its winning rate after 20 rounds was 0.59 and after 90 rounds was 0.62. The curve shows a fairly steady rise, and visually it’s convincing that Overkill learned something about its opponents, but it improved its win rate only a little. The unsteady learning curve we saw in the description suggests that Overkill might have eventually learned more if the tournament had gone on long enough—maybe.

Next: The features of Overkill’s model.

Overkill’s new learning 1 - the description

Overkill has new strategy learning unlike what we’ve seen before. I’m excited to look into it.

Author Sijia Xu updated Overkill for AIIDE 2016 with a machine learning method to learn what units to build after the opening, depending on the game situation. As far as I know, no other bot has anything similar. So far, this version of Overkill played only in AIIDE 2016, where it finished 7th; it’s not playing on SSCAIT or anywhere else that I know of. But the code is in Sijia Xu's github repository along with a description.

overview

I’ll start with the description. To quote:

Overkill currently contain two main AI system:

  • building unit selection. Using reinforcement learning model to choose the most proper unit to build(currently only three types of unit to choose: zerglings,mutalisk,hydralisk) according to current game status, and saving the match info to continually improve the RL model itself.
  • opening strategy selection. Using UCB1 to select the best opening strategy at the the beginning of the game according to the previous match result against this opponent.

In other words, the opening strategy learning that I went over before remains the same, though according to the readme the openings have been trimmed back to let the unit selection model make more decisions.

It sounds as though Overkill’s tactical abilities are not changed. Overkill 2015 plays with zerglings, mutalisks, and hydralisks and nothing else, switching from one to the next in a fixed order when simple heuristics are triggered. Overkill 2016 plays with the same units but can change between them however it likes depending on the way the game develops. Its play should be more flexible and fun.

My thought: Unit selection is an important skill, but it is only one skill. Unit selection among only 3 units is an interesting experiment, not a way in itself to score a lot of wins. The most basic zerg skill isn’t knowing when to make which fighting unit, it’s knowing when to make drones, and that is not learned here. With no ability to use scourge, lurkers, or any hive tech, and hand-coded tactics that maneuver each type of unit independently of the other types, Overkill does not have a high ceiling on its skill level, no matter how smart the unit selection model grows. That’s why it only came in 7th, I guess. It’s the academic attitude at work, tackling one problem at a time and not worrying about full generality until later. And I’m intensely interested to see what lessons we can draw from the experiment.

the method

Now I’ll cover the “model”, “feature” and “learning methods” sections from the description. They are written to be understandable to AI people, and in fact I can understand the outline clearly enough. But it’s probably incomprehensible unless you know the jargon, so I’ll try to explain step by step in followup posts.

The basic algorithm is Q-learning, a fundamental reinforcement learning algorithm. Q-learning is well worth knowing if you care about AI at all, so I’ll write a post to explain it. Q-learning is a plug-in algorithm where you can plug in different kinds of models which are learned by another algorithm.

Overkill plugs in a model of a linear fit to a large number of binary game features, which is exactly the same kind of model that I briefly mentioned in breakthrough game programs as used in the evaluation function of the othello program Logistello. The description says that Overkill has about 4000 features, which (believe it or not) is a small number for this kind of model! Having more features might make it too slow. The features are things like which units and tech each side has, and so on. I’ll read the code and find out details.

Overkill does its linear fit to the features not using regression like Logistello but using stochastic gradient descent or SGD. Versions of SGD are state of the art and heavily used in deep learning (and more). SGD is important, but it’s not fundamental so I don’t intend to look at it.
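Both pieces fit in a few lines. This sketch assumes a squared-error objective, which may not be exactly what Overkill minimizes:

```cpp
#include <vector>

// A linear model over binary features: the value is the sum of the
// weights of the features that are "on" in the current position.
double value(const std::vector<double>& w, const std::vector<int>& onFeatures) {
    double v = 0.0;
    for (int f : onFeatures) v += w[f];
    return v;
}

// One SGD step toward a target value. For squared error, the gradient
// with respect to each active weight is just the negated error.
void sgdStep(std::vector<double>& w, const std::vector<int>& onFeatures,
             double target, double learningRate) {
    double err = target - value(w, onFeatures);
    for (int f : onFeatures) w[f] += learningRate * err;
}
```

With roughly 4000 features and only a handful active per position, both the evaluation and the update are cheap.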

The description does not tell us whether Overkill includes “who is my opponent?” in its model. Looking at the I/O files from AIIDE 2016, I see files like “feature_gradientAiur”, “feature_valueAiur”, and “RL_dataAiur” for each opponent. It looks as though Overkill learns a different model for each opponent, although it could combine the opponent data into one model. I’ll read the code and find out.

the results

The description shows us 3 graphs from one experimental run of Overkill playing 600 games versus IceBot. That’s a lot of games. The graphs show Overkill’s model appearing to converge, slowly and unsteadily, and improve its already-high winning rate.

My thoughts: 1. One experiment does not tell us whether this works. It seemed to work against IceBot. 2. The learning rate was slow. It took hundreds of games. As I pointed out in the generalization for strategy learning post, reinforcement learning doesn’t stand out as a good choice when you have limited data. 3. Therefore Overkill’s learning seems suitable only for a very long tournament. If Overkill learns a model for each opponent, then the 90 games against each opponent in AIIDE 2016 were not enough. If it learns one overall model, then there may be enough games but an experiment against one opponent doesn’t tell us.

I don’t see where this is written, but somewhere I got the idea that for AIIDE 2016, Overkill started with a general model trained against some range of opponents, its prior knowledge. Then it refined the prior knowledge with experience during the tournament. It seems like a good idea. On the other hand, Overkill’s AIIDE 2016 winning rate of 69/87 (79%) doesn’t match the expected >90% win rate from the graph if Overkill had been pre-trained against IceBot. So I’m not sure. A general model plus 90 games for each opponent could produce the results we see—it’s plausible.

Next: Q-learning.

generalization for strategy learning

This post is for people who want to do strategy learning better than we have seen so far, but who haven’t studied AI and are looking for a few hints on what’s good to try. I assume the simplest case: The bot has a fixed set of strategies and wants to choose one based on experience (but possibly influenced by scouting info). Similar ideas work in more complicated cases, too.

In past posts I looked at the strategy learning done by Overkill and AIUR. Overkill learns (strategy, opponent) and AIUR learns (strategy, opponent, map size). I found out that, on the one hand, AIUR learned more by including the map size, but on the other hand, AIUR learned more slowly and didn’t have time to explore the possibilities thoroughly and find the best. It would be nice to learn (strategy, opponent, opponent race, map, player positions, any other results of early scouting), but how could a bot possibly learn so much?

Overkill and AIUR learn tables of outcomes. Tabular learning is slow learning because it does not generalize. AIUR may win with its cannon rush on a 2-player map against opponent A and opponent B, but when it faces opponent C on a 2-player map it starts with a blank slate. It doesn’t try cannon rush first because that worked against other opponents, it says “well gosh darn I don’t know a thing yet, I’ll pick randomly.” And again, when nexus-first wins against opponent D on 2-player and 3-player maps and AIUR faces opponent D on a 4-player map for the first time, it’s “well gosh darn.”

Tabular learning is essentially the only kind of learning which does not generalize. It is a form of rote memorization; all the countless other learning algorithms try to generalize in one way or another. That doesn’t mean you should learn strategies using any random algorithm you have lying around, though. You can, but it’s best to look for one that suits the problem.

The problem requirements are not too complicated.

1. Our algorithm’s input will be a set of past observations like (strategy, opponent, any other data you want to include, game result). The output will be the strategy to play this game, where you don’t know the game result yet. Or at least the output will have enough information to let you decide on a strategy. Estimated-probability-to-win for each strategy choice is one idea.

2. Some of the inputs, like the opponent, are categorical (as opposed to numerical). We need an algorithm that likes categorical inputs. Some work best with numerical inputs. One way to look at it is: Fitting a curve from opponent A to opponent C doesn’t tell you anything about opponent B, so you don’t want an algorithm that’s always trying that.

3. The algorithm should work well with small to moderate amounts of data. In the first game of the tournament, with no observations made yet, you’ll pick a strategy from prior knowledge (pick randomly, or pick one that did well in testing, or a combination). In the second game, you want to consider your prior knowledge plus 1 data point. The prior knowledge stops some algorithms from saying “we lost the first game, by generalization all strategies always lose.” You want the 1 data point to be important enough to make some difference, and not so important that it immediately overrides prior knowledge. And so on to thousands or tens of thousands of data points if the tournament is that long (it’s hardly likely to be longer); by then, prior knowledge should not make much difference.

4. You also want to consider exploration. If you always play the strategy that looks best (a “greedy algorithm”), then you may be overlooking a strategy that plays better but happened to lose its first game, or that never got tried. You have to explore to learn well.

My suggestions. First, exploration is not hard. Epsilon-greedy (see multi-armed bandit) should always work for exploration. There may be better choices in particular cases, but you have a fallback. You can do better if the algorithm outputs not only an estimated win rate but also its confidence in the estimate: Preferentially explore options which have low confidence.

Second, prior knowledge is not too hard either. You can always encode your prior knowledge as a set of fictional data points, fish story style. Again, there may be better ways, especially if you go with a Bayesian algorithm which by definition includes priors.

The requirement to work with varying but mostly modest amounts of data means that batch algorithms that analyze the dataset as a whole are preferred. Incremental algorithms that analyze one data point at a time, like the huge family of reinforcement learning algorithms that includes most neural networks, are by and large less suitable; they have a harder time controlling the level of generalization as the amount of data increases, to learn fast enough without overfitting. It’s not that reinforcement learning won’t work, or even that it can’t be made to work just as well, but without extra knowledge and care you can expect it to be less effective or less efficient. I was surprised to see the new version of Overkill use reinforcement learning for unit production decisions—it may be a good choice, but if so it’s not obvious why.

I suggest boosted decision trees. Decision trees have good generalization properties with small and modest amounts of data, and adding a boosting algorithm increases their accuracy. Since there’s not too much data and strategy learning happens once per game, speed should not be a problem. (If it does get too slow, then discard the oldest data points.) Go look up code to implement it and check the license, you know the drill.

It’s just a suggestion. Other choices may be better.

In a little more detail, at the end of each game the bot records the result with whatever other information it wants to learn from: Opponent, race, map, etc. At the start of each game it reads the records and runs its learning algorithm from scratch (it doesn’t have to or want to remember what it thought it knew last game). You may want to vary this depending on tournament rules about when learning data becomes available.

With the learned model in hand, the bot can look at the game situation, run it through to find out what strategies seem best, and combine that with the exploration policy to decide what strategy to play.

What if some inputs are not known yet? Say the opponent is random and your scout didn’t find out the enemy race before it’s time to decide on the initial strategy. If the learning algorithm estimates win rates, here’s one way: Run the game situation through three times, once with each race, and combine the results. There are different ways to combine the results, but averaging works. The same for other information that you don’t know yet; run through each possibility that hasn’t been excluded (“I know they’re not at that base, but then my scout died”). If there’s too much unknown info to test all possibilities against your learned model, then limit it to a statistical sample.
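The race case in code, assuming the learned model exposes some predicted-win-rate function (the function here is a placeholder, not a real API):

```cpp
#include <string>
#include <vector>

// When the enemy race is unknown, score a strategy under each possible
// race and average the estimates. predictWinRate stands in for whatever
// prediction the learned model provides.
double scoreVsRandom(const std::string& strategy,
                     double (*predictWinRate)(const std::string& strategy,
                                              const std::string& race)) {
    const std::vector<std::string> races = {"Terran", "Protoss", "Zerg"};
    double sum = 0.0;
    for (const auto& race : races) sum += predictWinRate(strategy, race);
    return sum / races.size();
}
```

Weighting by how often random rolls each race (one third each) is what plain averaging already does; if scouting has excluded a race, drop it from the list.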

Generalizing across opponents. If you have an opponent model, you can do better. If you’re able to recognize characteristics of your opponents, then you can remember the information in an opponent model and use the models to generalize across opponents. It’s a way of learning counter-strategies alongside counter-this-opponent strategies. I think opponent modeling should make strategy learning more effective. “Oh, opponent X went dark templar and I won with strategy A. Now I’m fighting opponent Y, which has been known to go dark templar too.” Characteristics worth recording might include:

  • opponent random?
  • opponent race
  • how rushy/all-in? (consider the earliest attack, or the early economy)
  • when (if ever) did opponent make unit X (for each X)?
  • when did opponent get upgrade Y (for each Y)?
  • when did opponent first use spell Z (for each Z)?
  • or in less detail: when did opponent get air units/detection/etc.?
  • how soon/often did opponent expand?
  • did the opponent scout my whole base?
  • was the opponent seen to take island bases?
  • was the opponent seen to attack island bases?

Or whatever you think might help. Since there’s never a ton of data, the huge number of inputs in the list might be too much.

Generalizing across maps can follow the same kind of idea: Number of players on the map, air distance and ground distance between bases, and so on. Adapting your strategy to the map is basic to Starcraft strategy, and bots are weak at it.

Zia and its coat of many strategies

I had been hoping that Zia would start to choose between its openings, and now that it has I want to see how it’s doing. So I watched a bunch of replays. It’s using strategy learning, though I can’t say in what form. I predicted that choosing between its strategies would be advantageous, and it’s true to an extent.

With more variety, Zia has become more entertaining to watch. I like it. Zia plays these openings that I’ve seen:

  • 5 pool
  • 9 pool
  • 9 pool speed
  • 12 hatchery

I didn’t catch it playing overpool or 12 pool, which you might expect to be common.

Zia’s opening chat message gives a hint about its opponent model. It says “Nice to meet you!” for new opponents and “Hi again!” for opponents it has met before. And it either predicts a “harsh game” or claims “I may overwhelm you.” I think it picks the second message when it believes it has found a strong counter strategy.

Against opponents with a single strategy which is directly countered by one of these openings, like ZZZKBot’s 4 pool (hard countered by 9 pool plus a sunken so that the trickle of attacking lings has no chance whatever), Zia seems to learn the counter and should then win every game. Zia even managed to find a strategy that gives it a chance against IceBot—Zia won a game which brought out weaknesses in both bots, weaknesses I didn’t realize IceBot suffered from.

And I see signs that Zia adapts after the opening. For example, I saw it add a spire when it needed scourge for air defense. I get the impression that it decides flexibly between hydralisks, mutalisks, and lurkers for the middle game—at least it’s not hardwired, maybe it’s random, I hope it’s learned. I have seen it play 12 hatch, 11 pool, 10 gas and also 12 hatch, 11 gas, 10 pool; I hope it’s foreseeing how much gas it will want to boot up its future unit mix.

Playing many openings does have a disadvantage: It’s harder to play all of them well. It’s not enough to know the build, you have to know how to play it, and it adds up to a lot of knowledge. The worst is Zia’s 9 pool speed opening, which it plays in a strange way as a late zergling all-in: It makes 100% zerglings until it attacks around supply 20-24; if the attack fails, Zia doesn’t have enough economy for the middle game. (I expect a 9 hatch build would strike harder if you want to play that way.)

Zia still plays poorly overall, if you ask me. It needs to brush up on skills like not sending drones through the enemy army. It needs better scouting (it doesn’t send out its initial overlord), better tactics (no, don’t run up the ramp to fight the bunker! Hit the SCVs in the expansion first!), better engagement skills (big groups of zerglings should surround before attacking), and better micro (in lings versus zealots, retreat a ling that will die in 1 more hit). And stuff. It’s a hard game.

Zia’s current description is “Implementing more strats . . .” I guess the author has the most fun with that, which is all that really matters, but it’s not the way to a winning bot. Breadth of skills, not depth of skills: You gain more by reducing your weaknesses than by increasing your strengths. Zia already has relative strength in strategy, and will improve most with other skills.

Hmm, I should write a post about The Winning Attitude for authors of game programs. Only for those who seek the winning attitude, of course; it’s optional.

Tomorrow: Novelty maps.

Tscmoo terran apparent neural network output

I was watching the new Tscmoo terran with its reputed neural networks.

screenshot showing what looks like neural network output

Hmm, what are those red and blue dots?

detail of apparent neural network output

I read that as the output of the neural network. The dot diagram is incomprehensible unless we know about the network layout. The text is the interpretation; it looks like strategy instructions or hints to the rest of the program. I timed a couple of updates and found them 15 seconds apart, which fits with strategy information.

I can’t tell what the details mean. How can the army composition be tank-vulture if you open with two starports (see those wraiths on the screen)? Is that a prediction for the opponent, maybe? What does “support_wraiths” mean, since I didn’t notice the wraiths seeming to support or be supported by anything?