strategy learning by solving the game matrix

Being unpredictable to your enemies has value. How can you do strategy learning and still remain unpredictable when you should? You can’t simply require randomness, because if one strategy dominates, then you should play it every time. At other times, you may benefit from playing 2 strategies equally often, or from playing a normal strategy 95% of the time and rushing otherwise. It depends on which opponent strategies counter each of yours, and the n-armed bandit methods that bots use now don’t understand that. Here’s one way to do it. It’s a step up in complexity from UCB, but not a tall step.

You can record the results of your strategies and the enemy strategies in a zero-sum game matrix, and solve the strategy game (which is the subgame of Starcraft that involves choosing your strategy). In the first cut version, each cell of the matrix is “n times it happened that I played this and the enemy played that, and k of those were wins.” Take the observed probability of win for each cell of the game matrix as the payoff for that cell, and solve the game. The solution tells you how often you should play each of your strategies, assuming that the opponent chooses optimally.
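
As a rough sketch of the first-cut bookkeeping, assuming each cell stores the pair (n, k) described above, the payoff matrix is just k / n cell by cell. The names `Counts`, `payoff_matrix` and the `empty_cell` placeholder for cells with no games yet are illustrative assumptions, not part of any existing bot.

```python
from typing import List, Tuple

# One cell per (my strategy, enemy strategy) pair: (n, k) means
# n games where that pairing happened, k of which I won.
Counts = List[List[Tuple[float, float]]]


def payoff_matrix(counts: Counts, empty_cell: float = 0.5) -> List[List[float]]:
    """Turn game counts into a payoff matrix of observed win rates k / n.

    Cells with no games get `empty_cell`, an assumed placeholder value.
    """
    return [
        [k / n if n > 0 else empty_cell for (n, k) in row]
        for row in counts
    ]


# Rows are my strategies, columns are the enemy's strategy categories.
counts = [
    [(10, 7), (4, 1)],  # my strategy 0 vs enemy categories 0 and 1
    [(5, 2), (0, 0)],   # my strategy 1; the second pairing has not happened yet
]
print(payoff_matrix(counts))  # [[0.7, 0.25], [0.4, 0.5]]
```

The empty-cell default only matters before any data arrives; seeding the matrix from global data, as suggested further down, avoids relying on it.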

There are a couple of different algorithms to solve zero-sum game matrixes fast. I personally prefer the iterative approximate algorithm (here is a simple python implementation), but the choice doesn’t make much difference.
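
The linked implementation isn’t reproduced here, so this is a separate sketch of one well-known iterative approximate method, fictitious play (Brown’s method): each side repeatedly best-responds to the other side’s empirical mix of past choices, and the play frequencies converge toward the optimal mixed strategies. The function name and the iteration count are assumptions. An exact alternative is to set the game up as a linear program.

```python
from typing import List, Tuple


def solve_game(payoff: List[List[float]], iterations: int = 10000
               ) -> Tuple[List[float], List[float], float]:
    """Approximately solve a zero-sum matrix game by fictitious play.

    payoff[i][j] is my win rate when I play row i and the enemy plays column j.
    I maximize it, the enemy minimizes it. Returns (my mix, enemy mix, value).
    """
    rows, cols = len(payoff), len(payoff[0])
    row_counts = [0] * rows
    col_counts = [0] * cols
    # row_totals[r]: total payoff row r would have earned vs the enemy's plays so far.
    row_totals = [0.0] * rows
    # col_totals[c]: total payoff I would have earned if the enemy always played c.
    col_totals = [0.0] * cols
    i = 0  # arbitrary first choice for me
    for _ in range(iterations):
        row_counts[i] += 1
        for c in range(cols):
            col_totals[c] += payoff[i][c]
        # Enemy best-responds to my empirical mix (minimizes my payoff).
        j = min(range(cols), key=lambda c: col_totals[c])
        col_counts[j] += 1
        for r in range(rows):
            row_totals[r] += payoff[r][j]
        # I best-respond to the enemy's empirical mix.
        i = max(range(rows), key=lambda r: row_totals[r])
    my_mix = [n / iterations for n in row_counts]
    enemy_mix = [n / iterations for n in col_counts]
    # The true game value lies between these bounds; they close in as
    # iterations grow, so report the midpoint.
    lower = min(col_totals) / iterations
    upper = max(row_totals) / iterations
    return my_mix, enemy_mix, (lower + upper) / 2


# Matching pennies: I win only if my choice matches the enemy's.
mix, enemy_mix, value = solve_game([[1.0, 0.0], [0.0, 1.0]])
print(mix)  # [0.5, 0.5]
```

With a dominant strategy the same loop settles on playing it every time, which is the behavior the post asks for; with mutually countering strategies it produces a genuine mix.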

If you recognize a lot of strategies on both sides, you’ll have many matrix cells to fill in, each of which requires some number of game results to produce a useful probability. 10 strategies for each side already means 100 cells, and even a big AIIDE-length tournament won’t produce enough data to fill them. For a first cut, I recommend recognizing only 2 or 3 categories of enemy strategies, such as (example 1) rush, 1 base play, 2 base play, or (example 2 for zerg) lings and/or hydras, mutalisks, lurkers. Since you’re grouping enemy strategies into broad categories, you don’t need much smarts to recognize them.

You can group your own strategies in a completely different way, if you like. There’s no reason to stick to the same categories. Also, your bot presumably knows what it is doing and doesn’t need to recognize game events as signifying that it is following a given class of strategy.

In this method, you are assumed to choose your strategy before you scout, or at least without using scouting information. You can take your time to recognize the enemy strategy, and base the recognition on anything you see during the entire game.
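
A minimal sketch of that game-by-game cycle, assuming the (n, k) cell layout from earlier: sample your strategy from the solved mix at game start, then credit the result to the cell for your strategy and whatever enemy category you recognized by the end of the game. `choose_strategy`, `record_game` and the fixed category in the usage lines are hypothetical names for illustration.

```python
import random
from typing import List, Tuple

Counts = List[List[Tuple[float, float]]]


def choose_strategy(my_mix: List[float], rng: random.Random) -> int:
    """Pick my strategy at game start by sampling the solved mix.

    No scouting information is used, as the method assumes.
    """
    return rng.choices(range(len(my_mix)), weights=my_mix, k=1)[0]


def record_game(counts: Counts, mine: int, theirs: int, won: bool) -> None:
    """After the game, add the result to the cell for
    (my strategy, recognized enemy category)."""
    n, k = counts[mine][theirs]
    counts[mine][theirs] = (n + 1, k + (1 if won else 0))


# Usage: one game's worth of bookkeeping.
rng = random.Random()
counts = [[(0, 0), (0, 0)], [(0, 0), (0, 0)]]
mine = choose_strategy([0.7, 0.3], rng)
# ... play the game, watching for evidence of the enemy's category ...
theirs = 1  # hypothetical: the enemy's play was classified as category 1
record_game(counts, mine, theirs, won=True)
```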

How do you get started learning? You might start with a matrix of all zeroes and use the game matrix for decisions only after you’ve gathered enough data. I suggest something different: keep a global matrix alongside the ones for each opponent, with floating point game counts and win counts in each cell. The global matrix has the totals for all opponents. (Or maybe there’s a global matrix for each opponent race.) When you face a new opponent, initialize the new opponent’s matrix with scaled down game counts and win counts from the global matrix, as if only a small number of games had been played in total (I suggest 1 to 3 times the number of cells in the matrix as a first try). You’ll start out playing a strategy mix that is good against the average opponent, and as you accumulate data the mix will shift to specifically counter this opponent.
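
The seeding step might look like the sketch below, assuming the same (n, k) cells with floating point values. Scaling n and k by one common factor keeps every cell’s win rate while shrinking the total to a few games per cell, so real results against this opponent soon outweigh the prior. `init_opponent_counts` and the default of 2 games per cell are assumptions within the suggested range of 1 to 3. After each game, the result would be added to both the opponent’s matrix and the global one.

```python
from typing import List, Tuple

Counts = List[List[Tuple[float, float]]]


def init_opponent_counts(global_counts: Counts,
                         games_per_cell: float = 2.0) -> Counts:
    """Seed a new opponent's matrix from the global matrix.

    Every cell is scaled by the same factor so that the seeded matrix holds
    games_per_cell * (number of cells) games in total. Scaling n and k
    together keeps each cell's win rate k / n unchanged.
    """
    cells = sum(len(row) for row in global_counts)
    total_games = sum(n for row in global_counts for (n, _) in row)
    if total_games == 0:
        # No global data yet: start from an empty matrix.
        return [[(0.0, 0.0) for _ in row] for row in global_counts]
    scale = games_per_cell * cells / total_games
    return [[(n * scale, k * scale) for (n, k) in row] for row in global_counts]


# 200 global games over 4 cells shrink to 8 pseudo-games in total.
global_counts = [[(100, 60), (50, 10)], [(30, 15), (20, 5)]]
seeded = init_opponent_counts(global_counts)
```

A cell that the global matrix never saw stays at zero games after scaling, so it still needs some default payoff until data arrives.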

There are tons of ways to fancy it up if you want to try harder. You could try a variant where you estimate the enemy’s choice probabilities instead of assuming the enemy plays optimally (you’ll need a different solution algorithm). You can keep a larger game matrix in parallel with the small one, and switch to it when you’ve accumulated enough data. Or use a hierarchical method that unfolds categories when there is enough data to distinguish them. You can try a more complicated Bayesian game solution algorithm, which realizes that the numbers in each cell are empirical approximations and takes that into account (“oh, this cell doesn’t have many games, better not rely too strongly on its value”). You can include scouting information in the strategy decision (“well, I can see it’s not a rush, so strike out that option for the opponent”). You can divide your notion of strategy into any fixed number of aspects, and keep independent matrixes for each aspect, so that your strategy choices are potentially random in many different dimensions. The sky is the limit.

Comments

Jay Scott on Wednesday, November 22. 2017:

A disadvantage of the method I suggest is that it may not explore enough. If the global matrix has a dominant strategy and you don’t try the others, you’ll never fill in their cells. The ideal would be a Bayesian exploration/exploitation tradeoff. As a simple method, you could try enforcing a minimum probability to play each strategy, perhaps decreasing the minimum as you accumulate data.
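
One way the simple method in this comment could look, as a sketch only: blend the solved mix with uniform play so that each strategy keeps at least a floor probability, and let the floor shrink as games accumulate. The decay schedule and the `base_floor` and `decay_games` knobs are hypothetical choices, not a tested recommendation.

```python
from typing import List


def with_exploration_floor(my_mix: List[float], games_played: int,
                           base_floor: float = 0.1,
                           decay_games: float = 50.0) -> List[float]:
    """Blend the solved mix with uniform play so every strategy keeps at
    least a minimum probability, which shrinks as games accumulate.

    base_floor and decay_games are hypothetical tuning knobs.
    """
    m = len(my_mix)
    floor = base_floor / (1.0 + games_played / decay_games)
    floor = min(floor, 1.0 / m)  # a floor above 1/m cannot be satisfied
    eps = floor * m              # share of probability spread uniformly
    # Sum stays 1: (1 - eps) * 1 + m * floor = 1.
    return [(1.0 - eps) * p + floor for p in my_mix]


# Early on, a dominated strategy still gets played 10% of the time.
early = with_exploration_floor([1.0, 0.0, 0.0], games_played=0)
```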

krasi0 on Thursday, November 23. 2017:

My bot's opening learning battles on SSCAIT have already begun. Some notable examples are against Steamhammer, McRave, PurpleWave, Zia bot and tscmoo Zerg. With a very close 3:2, Steamhammer seems to be giving me the biggest headache :)

IMP on Thursday, November 23. 2017:

This is where it gets really interesting. Since I finally got a setup where I can play out full or partial games against self or other opponents fast, my dream is to let the bot categorize strategies by itself, by collecting input from each game and performing a cluster analysis. Essentially unsupervised learning. My currently preferred method would analyze replays with full information, then deduce the set of potential enemy strategies from scouting information in-game. As is usually the case, the problem already contains a meta-problem: choosing the right subset of inputs from the complete game state per frame. As you suggest too, I set the number of clusters to identify to 3 to start with.

Jay Scott on Thursday, November 23. 2017:

I like that idea.
