
Steamhammer plans

My goals and plans for Steamhammer development. What, I told you it was next, didn’t you believe me?

Start with the usual three levels of abstraction: strategy, tactics, and unit control. My eventual goal is to conquer the levels one after another with machine learning, starting at the top with strategy, because it’s more fun that way. I think I have to tackle each level separately: any learning method that can learn all levels at once will devour more data than I can feed it, putting it beyond my resources.

My vague plan for each level is to start with a hand-made approximation of good play and learn the differences between the approximation and actual good play. That’s also because it’s more fun that way. First, play will never be too awful, so it won’t be discouraging. Second, learning will be faster because there’s less to learn, which is important because I expect to have to relearn each model repeatedly. For example, every time the tactics change too much, the strategy model should relearn what strategy is good with this set of tactical skills.

One goal is to learn opponent models, at least at the strategy level. I’d love to learn models at each level. We’ll see if I get that far.

Another goal for the strategy level is to be capable of every opening. If an absurd build like 8 gas 14 pool is good against some weird opponent, I want Steamhammer to be able to figure that out, at least in principle. I want to be able to say: if Steamhammer played enough games, it could eventually figure out any opening, no matter how unusual. Humans can develop new builds; bots should be able to do it too.

For the moment I’m working on infrastructure. I renamed UAlbertaBot’s MetaType data structure, which represents “a thing to build/research/etc.”, to MacroAct, since I like names I can understand. I added “command” actions to MacroAct, and so far have implemented commands for scouting and taking drones off gas (both of which affect macro—they are macro actions). You can now write these into build orders in the config file, which makes the build orders more powerful; the bot’s behavior is no longer hardcoded. (Dave Churchill was apparently planning something similar, since the original MetaType already had an unused “command” variant.) I’ll use the same system for communication from the strategy boss to the production manager.

After a few more commands, I’ll add positioning information to MacroAct, so that it can distinguish between an expansion hatchery and a macro hatchery, specify which base gets the static defense building, and stuff like that.
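The idea of a build-order entry that is either a build item or a macro command, optionally tied to a position, can be sketched roughly like this. This is a hypothetical Python mock-up for illustration only: Steamhammer itself is C++, and the type names, command names, and fields here are made up, not the bot’s actual ones.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional, Tuple

class MacroActType(Enum):
    UNIT = auto()
    TECH = auto()
    UPGRADE = auto()
    COMMAND = auto()   # macro commands: scouting, pulling drones off gas, etc.

@dataclass
class MacroAct:
    """One entry in a build order: a thing to build/research/etc.,
    or a macro command, optionally tied to a map position."""
    kind: MacroActType
    name: str                                   # unit/tech name, or command name
    position: Optional[Tuple[int, int]] = None  # e.g. which base gets the building

# A build-order fragment mixing units and commands (names illustrative):
opening = [
    MacroAct(MacroActType.UNIT, "Drone"),
    MacroAct(MacroActType.COMMAND, "scout"),
    MacroAct(MacroActType.UNIT, "Hatchery", position=(64, 96)),  # macro hatch site
    MacroAct(MacroActType.COMMAND, "gas_off"),
]
```

With one uniform entry type, the production queue and the config-file parser can treat “build a hatchery at this base” and “go scout” the same way.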

That covers the most vital missing skills. Then I’ll start on the strategy boss. Goals are versatility, good macro, fast adaptation, and the ability to recover from upsets, all of which Steamhammer 0.2 is weak at.

A first working version should be ready to release some time in January. Depending on how fast I can make progress, I would like to add one more feature before the release, random choice of openings. Learning bots quickly learn to exploit Steamhammer 0.2’s fixed openings, and that’s no good. I’ll start with fixed probabilities for each opening, specified in the config file. Since the strategy boss can adapt to the game situation, the opening build orders will be shorter than now. A random mix of aggressive openings and macro openings with different tech choices should be difficult for the learners to exploit, as long as Steamhammer can play all the openings well enough. It should also be more fun for humans to play against.
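Fixed per-opening probabilities from a config file amount to a weighted random draw. A minimal sketch, assuming a simple name-to-weight table; the opening names, weights, and function are invented for illustration and are not Steamhammer’s real config format:

```python
import random

# Hypothetical opening mix; names and weights are illustrative.
OPENING_WEIGHTS = {
    "9PoolSpeed": 0.4,    # aggressive
    "12HatchMacro": 0.4,  # macro
    "10HatchLurker": 0.2, # tech choice
}

def choose_opening(rng=random):
    """Draw one opening with probability proportional to its weight."""
    openings = list(OPENING_WEIGHTS)
    weights = list(OPENING_WEIGHTS.values())
    return rng.choices(openings, weights=weights, k=1)[0]
```

As long as the weights are read from the config file, the mix can be tuned per opponent without touching code, which is exactly the hook a later learning step can adjust.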

Then it will finally be machine learning time. I’ll decide whether to learn openings or strategic play (after the opening) as the next step. The first target of opening learning will not be “learn the best opening,” it will be “learn the best mix of openings.” The second will be to develop new openings, of course.

Along the way I’ll be fixing bugs, improving micro decisions, and whatever else seems essential. But since I plan to completely replace the tactics and micro code, I’d like to minimize the work I put in and only do the essential.


Comments

Johannes Holzfuß on :

Hi. I'd start with opening learning instead of strategy learning, because that's much simpler. Thompson Sampling (sometimes known as Bayesian Bandit algorithm), for example, is trivial to implement once you have a file I/O system for storing match results. It naturally learns a mix of openings, using them in proportion to how well they perform. You can also pre-specify probabilities by adding pseudo-wins and losses to your data file, which the algorithm will then fine-tune.
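The commenter’s point that Thompson sampling is trivial once results are stored can be illustrated with a short sketch: keep a win/loss record per opening, sample a win-rate estimate from each opening’s Beta posterior, and play the opening with the best sample. The class and opening names are made up for illustration:

```python
import random

class OpeningSelector:
    """Thompson sampling over a fixed set of openings (hypothetical sketch).

    Pseudo-wins/losses act as a prior: Beta(1, 1) is uniform, and larger
    pseudo-counts pre-bias the mix, as the comment describes."""

    def __init__(self, openings, prior_wins=1, prior_losses=1):
        self.record = {o: [prior_wins, prior_losses] for o in openings}

    def choose(self, rng=random):
        # Sample a plausible win rate for each opening from its Beta
        # posterior, then play the opening with the best sample.
        samples = {o: rng.betavariate(w, l)
                   for o, (w, l) in self.record.items()}
        return max(samples, key=samples.get)

    def update(self, opening, won):
        # Record the match result for the opening that was played.
        self.record[opening][0 if won else 1] += 1
```

Because each draw is random, openings get played roughly in proportion to how likely they are to be best given the data so far, which is the natural “mix of openings” the comment mentions.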

Jay Scott on :

It’s a good suggestion. Thompson sampling directly over the openings has a weakness in the case of opponents which do their own opponent modeling: It doesn’t recognize when a mixed strategy is better. I could do Thompson sampling over the distribution of mixes... but without more modeling, we know how that would turn out. I think that some explicit game theory calculation has to work its way in there.

krasi0 on :

Great! Finally someone to take ML in BWAI seriously, approach it ambitiously, and (hopefully) carry the whole thing from start to finish. I mean, most of us have embedded a small ANN here and there, but never to any significant benefit. :)
I remember vowing about 5 years ago that my next bot will be Zerg and it will be entirely ML based, i.e. no hardcoded expert knowledge inside. Well, we all can see how it's turned out :P

Jay Scott on :

Don’t be too optimistic, the chickens are still in the eggs!
