a couple of needed improvements for the opponent model
There are 2 changes I want to make to Steamhammer’s opponent model. Both of them will take some reading up and thinking.
1. The exploration algorithm is not good enough in practice. Its purpose is to try different openings to counter the opponent. Steamhammer uses epsilon-greedy (that is, a fixed probability) to decide when to explore, and an ad hoc strategy to decide what to explore: at first it tries openings to counter the opponent’s recognized plan, then with more games it casts a wider net.
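Steamhammer itself is written in C++ and I’m not showing its actual code; as a rough Python illustration, epsilon-greedy opening choice looks like this (the opening names and win rates are invented):

```python
import random

def choose_opening(win_rates, epsilon=0.1):
    """Epsilon-greedy: with fixed probability epsilon, explore a random
    opening; otherwise exploit the opening with the best observed win rate."""
    if random.random() < epsilon:
        # explore: any known opening, chosen uniformly at random
        return random.choice(list(win_rates))
    # exploit: best opening so far against this opponent
    return max(win_rates, key=win_rates.get)

# hypothetical record against one opponent
record = {"9PoolSpeed": 0.7, "5Pool": 0.2, "OverhatchExpo": 0.5}
```

The fixed epsilon is exactly the weakness: it says nothing about which unexplored opening to try next, and it keeps exploring at the same rate no matter how many games have been played.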
The problem with deciding when to explore is that Steamhammer knows more openings than it has time to try. With more choices than games, we can model the situation as an infinite-arm bandit. I haven’t read up on it yet. Here are a couple of papers I found in a quick look for infinite-arm bandit solutions, if you want to get ahead of me.
Algorithms for Infinitely Many-Armed Bandits (Wang, Audibert, and Munos, 2009)
Simple regret for infinitely many armed bandits (Carpentier and Valko, 2015)
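One simple idea that shows up in this literature is to sample a finite subset of arms from the infinite pool and run an ordinary bandit algorithm on just that subset. The sketch below is my own illustration of that idea using UCB1, not the algorithm from either paper; the subset size and all parameters are placeholders:

```python
import math

def ucb_on_subset(draw_new_arm, pull, horizon, k=None):
    """Sketch of an infinite-arm strategy: sample k arms, run UCB1 on them.
    draw_new_arm() yields a fresh arm; pull(arm) returns a reward in [0, 1]."""
    if k is None:
        k = max(1, math.isqrt(horizon))  # crude, unoptimized subset size
    arms = [draw_new_arm() for _ in range(k)]
    counts = {a: 0 for a in arms}
    wins = {a: 0.0 for a in arms}
    for t in range(1, horizon + 1):
        untried = [a for a in arms if counts[a] == 0]
        if untried:
            chosen = untried[0]  # pull every arm once before using the index
        else:
            # UCB1 index: empirical mean plus an exploration bonus
            chosen = max(arms, key=lambda a: wins[a] / counts[a]
                         + math.sqrt(2.0 * math.log(t) / counts[a]))
        counts[chosen] += 1
        wins[chosen] += pull(chosen)
    return max(arms, key=lambda a: wins[a] / max(1, counts[a]))
```

For opening choice, “draw a new arm” would mean picking an untried opening from the pool, which is where the opening data discussed below could guide the sampling instead of leaving it uniform.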
The problem with deciding what to explore can be solved with opening data. Data on which openings work across all opponents (against different races, on different maps, against opponents following different plans) can be used to solve both problems. But it will take a long time to collect the data; expect it late this year at the earliest, likely later.
2. It doesn’t mix up its openings enough. Steamhammer’s classic randomized openings were effective. Now it often chooses an opening depending on what the opponent has done in recent games, and can get stuck on fixed responses after the opponent has switched. For example, Proxy was updated to follow a mutalisk plan instead of a zergling plan in ZvZ, but Steamhammer is still responding to the zergling plan, and losing every game because of it.
Steamhammer tries to tell whether the opponent follows a fixed plan, or varies. It doesn’t matter whether the variation is random, or reacts to past games, or reacts to events in the current game, or is due to the bot being updated; the opponent is playing a fixed plan, or not. For example, if Steamhammer tries a 5 pool and Bereaver reacts with its cannon defense, Steamhammer will conclude “Oh, Bereaver doesn’t do the same thing every game.”
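I’m not showing Steamhammer’s actual recognition code; as a naive sketch, assuming the recognized plan from each past game is recorded as a string, the “constant opponent” test could be as simple as:

```python
def opponent_is_constant(plans):
    """Treat the opponent as playing a fixed plan when every recorded game
    shows the same recognized plan (hypothetical representation).
    Games where no plan was recognized (None) are ignored."""
    seen = {p for p in plans if p is not None}
    return len(seen) <= 1

# e.g. a Bereaver that adds cannons only when it scouts a 5 pool:
opponent_is_constant(["FFE", "CannonDefense", "FFE"])  # → False, it varies
```

The real version presumably has to cope with noisy plan recognition, which this sketch ignores.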
Right now, the “my opponent is a constant” recognition is used in a tiny way. I have always wanted to use it to optimize opening choices, like so:
If the opponent plays a fixed plan, Steamhammer should seek a fixed counter. Most bots with opening learning, including Steamhammer so far, seek the best fixed counter; it’s easier and works faster. If the opponent varies its plan, Steamhammer should in general follow a mixed strategy, which is to say, it should choose randomly among some of the effective counters. This is not as easy to code and takes longer to converge on a good solution, but it makes life difficult for opponents which try to learn or adapt.
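A minimal sketch of that decision rule, with the win-rate table and the “effective counter” margin invented for illustration:

```python
import random

def pick_counter(win_rates, opponent_is_constant):
    """Fixed opponent: seek the single best counter.
    Varying opponent: play a mixed strategy over the effective counters."""
    if opponent_is_constant:
        return max(win_rates, key=win_rates.get)
    best = max(win_rates.values())
    # mixed strategy: choose uniformly among openings near the best
    # (the 0.15 margin is an arbitrary placeholder)
    good = [o for o, w in win_rates.items() if w >= best - 0.15]
    return random.choice(good)
```

A real mixed strategy would want principled weights rather than a uniform choice over an arbitrary margin, but even this crude version denies an adapting opponent a single fixed target to learn against.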