a couple of needed improvements for the opponent model
There are 2 changes I want to make to Steamhammer’s opponent model. Both of them will take some reading up and thinking.
1. The exploration algorithm is not good enough in practice. Its purpose is to try different openings to counter the opponent. Steamhammer uses epsilon-greedy (that is, a fixed probability) to decide when to explore, and an ad hoc strategy to decide what to explore: at first it tries openings to counter the opponent’s recognized plan, then with more games it casts a wider net.
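Steamhammer itself is written in C++ and I’m not showing its actual code; as a rough Python illustration, epsilon-greedy opening choice looks like this (the opening names and win rates are invented):

```python
import random

def choose_opening(win_rates, epsilon=0.1):
    """Epsilon-greedy: with fixed probability epsilon, explore a random
    opening; otherwise exploit the opening with the best observed win rate."""
    if random.random() < epsilon:
        # explore: any known opening, chosen uniformly at random
        return random.choice(list(win_rates))
    # exploit: best opening so far against this opponent
    return max(win_rates, key=win_rates.get)

# hypothetical record against one opponent
record = {"9PoolSpeed": 0.7, "5Pool": 0.2, "OverhatchExpo": 0.5}
```

The fixed epsilon is exactly the weakness: it says nothing about which unexplored opening to try next, and it keeps exploring at the same rate no matter how many games have been played.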
The problem with deciding when to explore is that Steamhammer knows more openings than it has time to try. With more choices than games, we can model the situation as an infinite-arm bandit. I haven’t read up on it yet. Here are a couple of papers I found in a quick look for infinite-arm bandit solutions, if you want to get ahead of me.
Algorithms for Infinitely Many-Armed Bandits (Wang, Audibert, and Munos, 2009)
Simple regret for infinitely many armed bandits (Carpentier and Valko, 2015)
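One simple idea that shows up in this literature is to sample a finite subset of arms from the infinite pool and run an ordinary bandit algorithm on just that subset. The sketch below is my own illustration of that idea using UCB1, not the algorithm from either paper; the subset size and all parameters are placeholders:

```python
import math

def ucb_on_subset(draw_new_arm, pull, horizon, k=None):
    """Sketch of an infinite-arm strategy: sample k arms, run UCB1 on them.
    draw_new_arm() yields a fresh arm; pull(arm) returns a reward in [0, 1]."""
    if k is None:
        k = max(1, math.isqrt(horizon))  # crude, unoptimized subset size
    arms = [draw_new_arm() for _ in range(k)]
    counts = {a: 0 for a in arms}
    wins = {a: 0.0 for a in arms}
    for t in range(1, horizon + 1):
        untried = [a for a in arms if counts[a] == 0]
        if untried:
            chosen = untried[0]  # pull every arm once before using the index
        else:
            # UCB1 index: empirical mean plus an exploration bonus
            chosen = max(arms, key=lambda a: wins[a] / counts[a]
                         + math.sqrt(2.0 * math.log(t) / counts[a]))
        counts[chosen] += 1
        wins[chosen] += pull(chosen)
    return max(arms, key=lambda a: wins[a] / max(1, counts[a]))
```

For opening choice, “draw a new arm” would mean picking an untried opening from the pool, which is where the opening data discussed below could guide the sampling instead of leaving it uniform.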
The problem with deciding what to explore can be solved with opening data. Data on which openings work across all opponents (against different races, on different maps, against opponents following different plans) can be used to solve both problems. But it will take a long time to collect the data; expect it late this year at the earliest, likely later.
2. It doesn’t mix up its openings enough. Steamhammer’s classic randomized openings were effective. Now it often chooses an opening depending on what the opponent has done in recent games, and can get stuck on fixed responses after the opponent has switched. For example, Proxy was updated to follow a mutalisk plan instead of a zergling plan in ZvZ, but Steamhammer is still responding to the zergling plan, and losing every game because of it.
Steamhammer tries to tell whether the opponent follows a fixed plan, or varies. It doesn’t matter whether the variation is random, or reacts to past games, or reacts to events in the current game, or is due to the bot being updated; the opponent is playing a fixed plan, or not. For example, if Steamhammer tries a 5 pool and Bereaver reacts with its cannon defense, Steamhammer will conclude “Oh, Bereaver doesn’t do the same thing every game.”
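I’m not showing Steamhammer’s actual recognition code; as a naive sketch, assuming the recognized plan from each past game is recorded as a string, the “constant opponent” test could be as simple as:

```python
def opponent_is_constant(plans):
    """Treat the opponent as playing a fixed plan when every recorded game
    shows the same recognized plan (hypothetical representation).
    Games where no plan was recognized (None) are ignored."""
    seen = {p for p in plans if p is not None}
    return len(seen) <= 1

# e.g. a Bereaver that adds cannons only when it scouts a 5 pool:
opponent_is_constant(["FFE", "CannonDefense", "FFE"])  # → False, it varies
```

The real version presumably has to cope with noisy plan recognition, which this sketch ignores.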
Right now, the “my opponent is a constant” recognition is used in a tiny way. I have always wanted to use it to optimize opening choices, like so:
If the opponent plays a fixed plan, Steamhammer should seek a fixed counter. Most bots with opening learning, including Steamhammer so far, seek the best fixed counter; it’s easier and works faster. If the opponent varies its plan, Steamhammer should in general follow a mixed strategy, which is to say, it should choose randomly among some of the effective counters. This is not as easy to code and takes longer to converge on a good solution, but it makes life difficult for opponents which try to learn or adapt.
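A minimal sketch of that decision rule, with the win-rate table and the “effective counter” margin invented for illustration:

```python
import random

def pick_counter(win_rates, opponent_is_constant):
    """Fixed opponent: seek the single best counter.
    Varying opponent: play a mixed strategy over the effective counters."""
    if opponent_is_constant:
        return max(win_rates, key=win_rates.get)
    best = max(win_rates.values())
    # mixed strategy: choose uniformly among openings near the best
    # (the 0.15 margin is an arbitrary placeholder)
    good = [o for o, w in win_rates.items() if w >= best - 0.15]
    return random.choice(good)
```

A real mixed strategy would want principled weights rather than a uniform choice over an arbitrary margin, but even this crude version denies an adapting opponent a single fixed target to learn against.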