This post is for people who want to do strategy learning better than we have seen so far, but who haven’t studied AI and are looking for a few hints on what’s good to try. I assume the simplest case: The bot has a fixed set of strategies and wants to choose one based on experience (possibly influenced by scouting info). Similar ideas work in more complicated cases, too.
In past posts I looked at the strategy learning done by Overkill and AIUR. Overkill learns (strategy, opponent) and AIUR learns (strategy, opponent, map size). I found out that, on the one hand, AIUR learned more by including the map size, but on the other hand, AIUR learned more slowly and didn’t have time to explore the possibilities thoroughly and find the best. It would be nice to learn (strategy, opponent, opponent race, map, player positions, any other results of early scouting), but how could a bot possibly learn so much?
Overkill and AIUR learn tables of outcomes. Tabular learning is slow learning because it does not generalize. AIUR may win with its cannon rush on a 2-player map against opponent A and opponent B, but when it faces opponent C on a 2-player map it starts with a blank slate. It doesn’t try cannon rush first because that worked against other opponents, it says “well gosh darn I don’t know a thing yet, I’ll pick randomly.” And again, when nexus-first wins against opponent D on 2-player and 3-player maps and AIUR faces opponent D on a 4-player map for the first time, it’s “well gosh darn.”
Tabular learning is just about the only kind of learning which does not generalize. It is a form of rote memorization; the countless other learning algorithms all try to generalize in one way or another. That doesn’t mean you should learn strategies using any random algorithm you have lying around, though. You can, but it’s best to look for one that suits the problem.
The problem requirements are not too complicated.
1. Our algorithm’s input will be a set of past observations like (strategy, opponent, any other data you want to include, game result). The output will be the strategy to play this game, where you don’t know the game result yet. Or at least the output will have enough information to let you decide on a strategy. Estimated-probability-to-win for each strategy choice is one idea.
2. Some of the inputs, like the opponent, are categorical (as opposed to numerical). We need an algorithm that likes categorical inputs. Some work best with numerical inputs. One way to look at it is: Fitting a curve from opponent A to opponent C doesn’t tell you anything about opponent B, so you don’t want an algorithm that’s always trying that.
3. The algorithm should work well with small to moderate amounts of data. In the first game of the tournament, with no observations made yet, you’ll pick a strategy from prior knowledge (pick randomly, or pick one that did well in testing, or a combination). In the second game, you want to consider your prior knowledge plus 1 data point. The prior knowledge stops some algorithms from saying “we lost the first game, by generalization all strategies always lose.” You want the 1 data point to be important enough to make some difference, and not so important that it immediately overrides prior knowledge. And so on to thousands or tens of thousands of data points if the tournament is that long (it’s hardly likely to be longer); by then, prior knowledge should not make much difference.
4. You also want to consider exploration. If you always play the strategy that looks best (a “greedy algorithm”), then you may be overlooking a strategy that plays better but happened to lose its first game, or that never got tried. You have to explore to learn well.
My suggestions. First, exploration is not hard. Epsilon-greedy (see multi-armed bandit) should always work for exploration. There may be better choices in particular cases, but you have a fallback. You can do better if the algorithm outputs not only an estimated win rate but also its confidence in the estimate: Preferentially explore options which have low confidence.
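For concreteness, here is a minimal epsilon-greedy chooser in Python. It assumes the learner already produced an estimated win rate per strategy; the strategy names and numbers are made up.

```python
import random

# Minimal epsilon-greedy strategy chooser. win_rates maps strategy name to the
# model's estimated probability of winning with that strategy this game.
def choose_strategy(win_rates, epsilon=0.1):
    if random.random() < epsilon:
        # Explore: pick any strategy uniformly at random. If your model also
        # reports confidence, you could prefer low-confidence options here instead.
        return random.choice(list(win_rates))
    # Exploit: play the strategy with the highest estimated win rate.
    return max(win_rates, key=win_rates.get)

# Made-up numbers: explore 10% of the time, otherwise play the apparent best.
print(choose_strategy({"cannon_rush": 0.55, "nexus_first": 0.62, "dt_rush": 0.40}))
```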
Second, prior knowledge is not too hard either. You can always encode your prior knowledge as a set of fictional data points, fish story style. Again, there may be better ways, especially if you go with a Bayesian algorithm which by definition includes priors.
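A sketch of the fictional-data-points trick, under the assumption that each game record is a small dict. All the field names, the pretend opponent "prior", and the pretend-game count are invented here; tune the count to how strongly you trust your offline testing.

```python
# Encode prior knowledge as a handful of fictional game records.
def prior_records(test_win_rates, pretend_games=4):
    records = []
    for strategy, win_rate in test_win_rates.items():
        wins = round(win_rate * pretend_games)
        for i in range(pretend_games):
            records.append({"strategy": strategy, "opponent": "prior",
                            "race": "any", "map": "any", "won": i < wins})
    return records

# Strategies that did well in testing start with a better pretend record,
# but a few fictional games are easily outweighed by real results.
seed = prior_records({"cannon_rush": 0.5, "nexus_first": 0.75, "dt_rush": 0.25})
```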
The requirement to work with varying but mostly modest amounts of data means that batch algorithms that analyze the dataset as a whole are preferred. Incremental algorithms that analyze one data point at a time, like the huge family of reinforcement learning algorithms that includes most neural networks, are by and large less suitable; they have a harder time controlling the level of generalization as the amount of data increases, to learn fast enough without overfitting. It’s not that reinforcement learning won’t work, or even that it can’t be made to work just as well, but without extra knowledge and care you can expect it to be less effective or less efficient. I was surprised to see the new version of Overkill use reinforcement learning for unit production decisions—it may be a good choice, but if so it’s not obvious why.
I suggest boosted decision trees. Decision trees have good generalization properties with small and modest amounts of data, and adding a boosting algorithm increases their accuracy. Since there’s not too much data and strategy learning happens once per game, speed should not be a problem. (If it does get too slow, then discard the oldest data points.) Go look up code to implement it and check the license, you know the drill.
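Here is a rough sketch of what that could look like, using scikit-learn’s gradient boosted trees as one possible implementation. The record fields and parameter settings are illustrative, not a recommendation.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

def fit_model(records):
    """records: dicts like {"strategy", "opponent", "race", "map", "won"}."""
    X = [[r["strategy"], r["opponent"], r["race"], r["map"]] for r in records]
    y = [1 if r["won"] else 0 for r in records]
    model = Pipeline([
        # One-hot encoding handles the categorical inputs; an opponent or map
        # never seen before encodes to all zeros instead of raising an error.
        ("encode", OneHotEncoder(handle_unknown="ignore")),
        # Small trees and few boosting rounds to match small amounts of data.
        ("boost", GradientBoostingClassifier(n_estimators=50, max_depth=3)),
    ])
    model.fit(X, y)
    return model

def estimated_win_rate(model, strategy, opponent, race, game_map):
    # Probability of the "win" class for this (strategy, situation) combination.
    return model.predict_proba([[strategy, opponent, race, game_map]])[0][1]
```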
It’s just a suggestion. Other choices may be better.
In a little more detail, at the end of each game the bot records the result with whatever other information it wants to learn from: Opponent, race, map, etc. At the start of each game it reads the records and runs its learning algorithm from scratch (it doesn’t have to or want to remember what it thought it knew last game). You may want to vary this depending on tournament rules about when learning data becomes available.
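For illustration, the record keeping might be as simple as appending one line per game to a file in whatever read/write directory the tournament provides. The path and fields below are placeholders.

```python
import json
import os

# Placeholder path; use whatever persistent directory your tournament allows.
RECORD_FILE = os.path.join("bwapi-data", "write", "games.jsonl")

def record_game(strategy, opponent, race, game_map, won):
    # Append one line at the end of each game.
    os.makedirs(os.path.dirname(RECORD_FILE), exist_ok=True)
    with open(RECORD_FILE, "a") as f:
        f.write(json.dumps({"strategy": strategy, "opponent": opponent,
                            "race": race, "map": game_map, "won": won}) + "\n")

def load_records(path=RECORD_FILE):
    # Read everything back at the start of the next game and relearn from scratch.
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return [json.loads(line) for line in f]
```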
With the learned model in hand, the bot can take the current game situation, run it through the model to find out which strategies look best, and combine that with the exploration policy to decide what strategy to play.
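Putting the pieces together might look something like this, reusing the hypothetical helpers from the sketches above.

```python
# Score each candidate strategy with the learned model, then let the exploration
# policy make the final call. estimated_win_rate() and choose_strategy() are the
# hypothetical helpers from the earlier sketches.
def pick_strategy(model, strategies, opponent, race, game_map, epsilon=0.1):
    win_rates = {s: estimated_win_rate(model, s, opponent, race, game_map)
                 for s in strategies}
    return choose_strategy(win_rates, epsilon)
```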
What if some inputs are not known yet? Say the opponent is random and your scout didn’t find out the enemy race before it’s time to decide on the initial strategy. If the learning algorithm estimates win rates, here’s one way: Run the game situation through three times, once with each race, and combine the results. There are different ways to combine the results, but averaging works. The same for other information that you don’t know yet; run through each possibility that hasn’t been excluded (“I know they’re not at that base, but then my scout died”). If there’s too much unknown info to test all possibilities against your learned model, then limit it to a statistical sample.
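As a sketch, handling an unknown race could look like this, again reusing the hypothetical estimated_win_rate() helper from the boosted-tree example.

```python
# Query the model once per possibility that hasn't been excluded and average
# the estimates. Here the unknown input is a random opponent's race.
def win_rate_unknown_race(model, strategy, opponent, game_map):
    races = ["Terran", "Protoss", "Zerg"]
    estimates = [estimated_win_rate(model, strategy, opponent, race, game_map)
                 for race in races]
    return sum(estimates) / len(estimates)
```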
Generalizing across opponents. If you have an opponent model, you can do better. If you’re able to recognize characteristics of your opponents, then you can remember the information in an opponent model and use the models to generalize across opponents. It’s a way of learning counter-strategies alongside counter-this-opponent strategies. I think opponent modeling should make strategy learning more effective. “Oh, opponent X went dark templar and I won with strategy A. Now I’m fighting opponent Y, which has been known to go dark templar too.” Some features an opponent model might record:
- opponent random?
- opponent race
- how rushy/all-in? (consider the earliest attack, or the early economy)
- when (if ever) did opponent make unit X (for each X)?
- when did opponent get upgrade Y (for each Y)?
- when did opponent first use spell Z (for each Z)?
- or in less detail: when did opponent get air units/detection/etc.?
- how soon/often did opponent expand?
- did the opponent scout my whole base?
- was the opponent seen to take island bases?
- was the opponent seen to attack island bases?
Or whatever you think might help. Since there’s never a ton of data, the huge number of inputs in the list might be too much.
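If it helps to see it concretely, an opponent model record could be a plain data structure along these lines, covering a few of the features from the list above. Every field name here is made up, and frame counts stand in for “when”.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class OpponentModel:
    is_random: bool = False                      # did the opponent pick random race?
    race: Optional[str] = None                   # race, once known
    earliest_attack_frame: Optional[int] = None  # None = no early attack seen
    first_seen_unit: Dict[str, int] = field(default_factory=dict)     # unit type -> frame
    first_seen_upgrade: Dict[str, int] = field(default_factory=dict)  # upgrade -> frame
    expansion_frames: List[int] = field(default_factory=list)         # when expansions appeared
    scouted_my_whole_base: bool = False
    took_island_bases: bool = False
    attacked_island_bases: bool = False
```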
Generalizing across maps can follow the same kind of idea: Number of players on the map, air distance and ground distance between bases, and so on. Adapting your strategy to the map is basic to Starcraft strategy, and bots are weak at it.