As usual, I have a mass of final comments.
unused features
After a few days to mull it over, I think the unused features are left over from experiments. Sijia Xu had to try stuff out to see what would work, and not everything was worth including. More features mean that potentially more knowledge can be learned, but also that learning is slower. Leaving unused features in the code makes it easier to switch them in and out, even if you end up computing and saving features that never make a difference. I take it as a sign of the academic attitude.
choice of actions
Overkill uses its Q-learning to choose among only 3 possible actions: which combat unit to build next. It’s an important decision, but only one decision in a sky full of them.
Bots could use similar methods to learn more complex decisions, like tactical decisions: How to group units into squads and what orders to give the squads. Or complex build order choices. Or unit micro decisions. In those cases you can’t list the decisions ahead of time.
Overkill independently learns the Q values for each action. In effect, it learns 3 separate evaluations, Q(situation, zergling action), Q(situation, hydra action) and Q(situation, muta action). If zerglings and hydras are similar in some ways because they’re both ground units, or hydras and mutas are similar because they’re both ranged, then Overkill’s learning can’t take advantage of that; it can’t generalize across actions.
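To make the “3 separate evaluations” concrete, here’s a minimal sketch in Python (not Overkill’s actual code; the learning rate and discount are invented for illustration). Each action gets its own weight vector, so nothing learned about one unit type carries over to another:

```python
from collections import defaultdict

ACTIONS = ["zergling", "hydralisk", "mutalisk"]

# One independent weight vector per action: knowledge about building
# hydralisks never transfers to mutalisks, even if the actions are similar.
weights = {a: defaultdict(float) for a in ACTIONS}

def q_value(features, action):
    # Features are binary, so Q is the sum of the active features' weights.
    return sum(weights[action][f] for f in features)

def best_action(features):
    return max(ACTIONS, key=lambda a: q_value(features, a))

def td_update(features, action, reward, next_features, alpha=0.01, gamma=0.95):
    # Standard Q-learning update, applied only to the chosen action's weights.
    target = reward + gamma * max(q_value(next_features, a) for a in ACTIONS)
    error = target - q_value(features, action)
    for f in features:
        weights[action][f] += alpha * error
```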
With many actions which are unknown in advance, you need a model which generalizes over the actions. In effect, you want to fold the actions (or their predicted results) into the situation. You can add your potential decisions as primitive features of the situation, treating “these guys are going over there to attack” the same way you treat “the enemy has 3 bases”. Or you can create a separate model of the effects your decisions have on the game, and use that model’s results as the situation.
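Here’s a sketch of the “fold the actions into the situation” idea, with invented feature names. There is a single shared weight vector, and each candidate action is described by properties (ground, ranged, and so on) that are crossed with the situation features, so knowledge can generalize across actions that share properties:

```python
from collections import defaultdict

weights = defaultdict(float)  # one shared weight vector for all actions

def action_features(situation_features, action_properties):
    # Cross the situation with properties of the proposed action.
    return [f"{s} & {p}" for s in situation_features for p in action_properties]

def q_value(situation_features, action_properties):
    return sum(weights[f] for f in action_features(situation_features, action_properties))

# Example: describe actions by their properties (invented for illustration).
hydra_action = ["ground", "ranged"]
muta_action = ["air", "ranged"]
# A weight learned for "enemy has goliaths & ranged" now applies to both.
```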
Something along these lines is what DeepMind wants to do. They’ll use a deep neural network for their model, but the idea is closely related to what Overkill does. From the AI point of view, it’s a fancier version of the same underlying idea, with a similar top-level learning algorithm.
the linear model
Overkill’s model is a linear function, in the math sense: the features are binary, so evaluating an action amounts to summing the weights of whichever features are active. But because of how the weighted features are made, the evaluation of actions is not linear overall with respect to the primitive game features. Overkill’s features themselves are nonlinear, in 2 ways.

First, they are not primitive features, they are combination features. To simplify, it’s not “I have zerglings, that’s worth weight w[z]; you have goliaths, that’s worth w[g]; result w[z] + w[g].” The 2 features are (I’m still simplifying) combined into 4: “I have no zerglings and you have no goliaths, weight w[-z, -g]; I have zerglings and you have no goliaths, w[+z, -g], ....”

Second, if an underlying primitive feature like “I have n zerglings” takes a range of values, then Overkill breaks it into a set of binary features: “I have 0 to 6 zerglings, w[0 to 6]; I have 7 to 12 zerglings....” If the value of having n zerglings (with respect to taking a given action) does not increase linearly with n, then the weights will be different to reflect that. (So after combination you get a lot of weights: w[0..6 z, 0..6 g], w[7..12 z, 0..6 g], ....)
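Here’s a sketch of that feature construction, with invented bucket boundaries. A count becomes exactly one active binary feature, and two such features are crossed into a combination feature that gets its own weight:

```python
def bucket(name, count, edges=(0, 7, 13, 25)):
    # Turn a count into exactly one active binary feature like 'zerglings[7-12]'.
    for lo, hi in zip(edges, edges[1:]):
        if lo <= count < hi:
            return f"{name}[{lo}-{hi - 1}]"
    return f"{name}[{edges[-1]}+]"

def cross(f1, f2):
    # Combine two binary features into one combination feature.
    return f"{f1} & {f2}"

# "I have 9 zerglings and you have 3 goliaths" becomes one combination feature:
feature = cross(bucket("my zerglings", 9), bucket("enemy goliaths", 3))
# -> 'my zerglings[7-12] & enemy goliaths[0-6]'
```

Because each combination gets its own independent weight, the model can assign any value to “7-12 zerglings against 0-6 goliaths,” which is exactly the nonlinearity that a plain sum of primitive features can’t express.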
I should pause to explain terminology. A “primitive” feature is one that comes directly from the game, like “I have n zerglings.” Other features can be called “derived” features. Overkill’s model is linear with respect to Overkill’s derived features, but not linear with respect to the primitive game features. That’s what I mean by “nonlinear features.”
Starcraft strategy is a highly nonlinear domain (with respect to the primitive features). But if you have nonlinear features, and you have a large enough number of them, then you can re-encode the nonlinear domain as a linear model—not exactly, but as accurately as you like.
Is the linear model with nonlinear binary features a good model? For many games, it is proven good by experience. For Starcraft, I don’t know. Can you find the features you need to play well? Are there few enough that you can learn them in a reasonable time? Sijia Xu has made a start on finding the answer.
offline versus online learning
Overkill’s Q-learning seems to make sense for offline learning, “how to play Starcraft,” but looks too slow for online learning of opponent models, “how to beat this opponent.” There are ways to do both at once.
One idea is to use different models for the 2 cases: Learn one big offline model with a huge number of features, to play well on average against unknown opponents. And also learn a small model with few features for each opponent. The small model won’t be able to learn as much knowledge, but it will converge faster. The small opponent model tells you, “here is how this opponent differs from the average opponent.” To make decisions during play, add together the results of the models.
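As a sketch of that combination (the names and the feature split are invented for illustration): both models are linear, so at decision time their Q values simply add, and only the small model is updated while playing a particular opponent.

```python
def small_subset(features):
    # Hypothetical: the opponent model sees only a handful of coarse features.
    return [f for f in features if f.startswith("coarse:")]

def combined_q(offline_weights, opponent_weights, features, action):
    offline = sum(offline_weights[action].get(f, 0.0) for f in features)
    # The opponent model only has to learn how this opponent differs
    # from the average opponent, so it is small and converges fast.
    correction = sum(opponent_weights[action].get(f, 0.0) for f in small_subset(features))
    return offline + correction
```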
Another idea is to use a hierarchical model. I don’t want to go into detail, but the basic idea is to have high-level features with low-level features under them. The high-level features are few and can be learned quickly, so the model starts to be useful early. The low-level features are many and take more time, so the model can eventually learn a lot of knowledge if you keep feeding it data.
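One way to realize that, sketched below with an invented parent() grouping: give each low-level feature a weight that is added to its high-level parent’s weight. The few parent weights get updated whenever any of their children appear, so they converge quickly; the many child weights refine the estimate as data accumulates.

```python
from collections import defaultdict

high_w = defaultdict(float)   # few features, learned quickly
low_w = defaultdict(float)    # many features, learned slowly

def parent(feature):
    # Hypothetical grouping: 'my zerglings[7-12]' -> 'my zerglings'
    return feature.split("[")[0]

def q_value(features):
    return sum(high_w[parent(f)] + low_w[f] for f in features)

def update(features, error, alpha=0.01):
    for f in features:
        high_w[parent(f)] += alpha * error   # shared, fast-moving part
        low_w[f] += alpha * error            # specific, slow-to-converge part
```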
To learn an opponent model empirically, you have to explore. It’s mathematically required! And exploring an unfamiliar action will sometimes be bad play. We already have proof from opening strategy learning that opponent modeling can be worth it overall; good decisions later can make up for bad play caused by exploring early. But how far can we push it? We can learn to choose from among a few opening strategies and it helps, but can we make a sophisticated opponent model or is the cost of exploring too high? Humans can learn powerful opponent models, but humans don’t rely purely on the statistics—we also reason about our models in a way that is beyond the state of the art in AI.
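The minimum version of “you have to explore” is an epsilon-greedy policy, sketched here with an illustrative epsilon (not necessarily what Overkill uses): most of the time you pick the action your model likes best, and occasionally you deliberately pick something else so you keep collecting data about it.

```python
import random

def choose_action(features, actions, q_value, epsilon=0.1):
    if random.random() < epsilon:
        return random.choice(actions)  # explore: sometimes deliberately "bad" play
    return max(actions, key=lambda a: q_value(features, a))  # exploit what you know
```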
judgment
As Martin Rooijackers put it, humans have better judgment than bots—so far. And how can we code judgment? There are only 2 fundamental techniques, knowledge and search. To match human judgment we’ll need some combination of knowledge and search.
In a realtime domain like Starcraft, I believe it makes sense to go for knowledge first. And if you go for knowledge first, then I believe hand-coding will not get you there. You’ll fail to hand-code enough knowledge for good judgment. Machine learning is the path, and Sijia Xu has taken a step down it ahead of everyone else.
Next: More on offline versus online learning.