
Continual Online Evolutionary Planning paper

I noticed that Hao Pan’s description says it uses COEP, or Continual Online Evolutionary Planning, to adapt its build order during the game. And Hao Pan is a successful bot. So I thought I’d write it up.

The original paper is from 2017, Continual Online Evolutionary Planning for In-Game Build Order Adaptation in StarCraft by Niels Justesen and Sebastian Risi. There is a corresponding github repo with source. Long-time followers may remember Niels Justesen the bot. Niels Justesen has other publications; if you like this paper, I suggest Playing Multi-Action Adversarial Games: Online Evolution vs. Tree Search as a followup.

A good implementation is not simple, but the basic idea is easy. You have a set of build order plans, which you create and update in the evolutionary algorithm style. And you have a way to measure how good a plan is; in evolutionary algorithm terminology, its fitness. When you need to make a build order decision, you select the current best plan and follow it.
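
To make that concrete, here is a minimal sketch of the loop in Python. This is my own illustration, not the paper’s code; the function names, population size, and selection scheme are all invented for the example.

```python
import random

def coep_step(population, game_state, fitness, breed, pop_size=64):
    """One generation of the continual evolution loop (illustrative sketch).

    population : list of build orders (each a list of item names)
    fitness    : function(build_order, game_state) -> float
    breed      : function(parent_a, parent_b) -> child build order
    """
    # Score every plan against the current game state.
    scored = [(fitness(plan, game_state), plan) for plan in population]
    scored.sort(key=lambda pair: pair[0], reverse=True)

    # Keep the fitter half, then refill the population with children of survivors.
    survivors = [plan for _, plan in scored[: pop_size // 2]]
    children = [breed(*random.sample(survivors, 2))
                for _ in range(pop_size - len(survivors))]
    return survivors + children

def best_plan(population, game_state, fitness):
    # When the bot needs a decision, follow the currently fittest plan.
    return max(population, key=lambda plan: fitness(plan, game_state))
```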

OK, details. In this paper, a build order plan is simply a build order, a list of stuff to build (or research) in order. In principle, it can be any length. You start with a population of different build orders, which you can generate randomly if you don’t have any better ideas. You evaluate the fitness of each one. Probably on the first try none of them will be particularly fit, so you need to make more plans. You create new builds by treating each one as if it were a stretch of DNA, and introducing different kinds of recombination and mutation (traditionally, a lot of recombination and a little mutation). The paper explains the authors’ choices. And of course evaluate the fitness of the new child builds. To keep the population size down, you delete less fit builds.
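
As an illustration only, here is one common way to implement the genetic operators on a list-of-items representation. The paper describes the authors’ own operator choices; the item vocabulary and mutation rate below are made up for the sketch.

```python
import random

# Toy item vocabulary; a real bot would use actual unit and tech types.
ITEMS = ["probe", "pylon", "gateway", "zealot",
         "cybernetics_core", "dragoon", "stargate", "corsair"]

def crossover(parent_a, parent_b):
    # One-point crossover: a prefix of one parent spliced onto a suffix of the other.
    shortest = min(len(parent_a), len(parent_b))
    if shortest < 2:
        return list(parent_a)
    cut = random.randint(1, shortest - 1)
    return parent_a[:cut] + parent_b[cut:]

def mutate(plan, rate=0.05):
    # Occasionally replace, insert, or drop an item.
    out = []
    for item in plan:
        r = random.random()
        if r < rate:
            out.append(random.choice(ITEMS))          # replace
        elif r < 2 * rate:
            out.extend([random.choice(ITEMS), item])  # insert before
        elif r < 3 * rate:
            pass                                      # delete
        else:
            out.append(item)
    return out
```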

In concept, the evolution process runs continuously, and you query it for the best known choice when you need to make a decision. The authors actually implemented a client-server architecture, with a build order server. You could run it in a different thread, or do evolution on demand, or whatever; it should all work. When you start to carry out a build, you have to update the population to reflect that: This build says make 4 probes, I made 1 probe, there are 3 left. And the fitness measure needs to know about the game state so it can tell how good a build is.
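
One plausible way to do that bookkeeping (my guess at the mechanics, not taken from the paper): when the bot starts an item, strip it from the front of any plan that agrees, so every plan in the population represents only future steps from the real game state.

```python
def advance(population, executed_item):
    # The bot just started building `executed_item`; drop it from the front of
    # every plan that agrees, so those plans stay in sync with the real game.
    updated = []
    for plan in population:
        if plan and plan[0] == executed_item:
            updated.append(plan[1:])
        else:
            updated.append(plan)  # plan disagreed; its fitness will reflect that
    return updated
```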

So how does that fitness measure work? There are two parts: a forward model that predicts what happens when you follow a build order, and the fitness itself, which measures how good the result is. You feed the forward model information about the game state; the authors use mainly unit counts for both sides. In the paper, the forward model ignores builds that are illegal (make a corsair with no stargate), because the builds will have all kinds of random stuff in them. The forward model simply simulates what units you end up with, and when.
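
A toy version of such a forward model might look like the sketch below. It tracks only unit counts and a made-up tech-requirement table; the real model also has to handle resources, supply, and build times, which I omit here.

```python
# Hypothetical tech requirements; real rules cover resources, supply, and timing.
REQUIRES = {
    "zealot": {"gateway"},
    "dragoon": {"gateway", "cybernetics_core"},
    "corsair": {"stargate"},
}

def forward_model(build_order, own_counts):
    """Roll a build order forward from current unit counts (illustrative only).

    Illegal steps (missing tech) are skipped rather than failing the whole plan,
    as in the paper, because evolved plans contain random garbage.
    """
    counts = dict(own_counts)
    for item in build_order:
        needed = REQUIRES.get(item, set())
        if all(counts.get(req, 0) > 0 for req in needed):
            counts[item] = counts.get(item, 0) + 1
        # else: skip the illegal step
    return counts
```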

The fitness itself is based on a model of what counters what. You know what enemy units you’ve seen, and the forward model tells you what units you will have yourself. The fitness measure knows that vultures beat zealots and lose to dragoons, so in the face of vultures it sees lower fitness in a build that keeps making zealots and higher fitness in a build with more dragoons.
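
Illustratively, the counter knowledge can be a simple table of pairwise scores. The numbers and units below are invented for the example, not the paper’s actual values.

```python
# Hypothetical counter scores: COUNTERS[mine][theirs] > 0 means mine beats theirs.
COUNTERS = {
    "zealot":  {"vulture": -1.0, "dragoon": 0.5},
    "dragoon": {"vulture":  1.0, "zealot": -0.5},
}

def counter_fitness(own_counts, enemy_counts):
    # Sum pairwise counter scores, weighted by how many of each unit each side has.
    score = 0.0
    for mine, n_mine in own_counts.items():
        for theirs, n_theirs in enemy_counts.items():
            score += COUNTERS.get(mine, {}).get(theirs, 0.0) * n_mine * n_theirs
    return score
```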

The build order being evaluated for fitness may be long, and you don’t want to measure fitness only at the end. You might lose before then! So they measure momentary fitness repeatedly throughout the build, and combine the local fitness values into a total fitness value, giving more weight to local fitness values that are early in the build. You’re pretty sure what’s next, not so sure about what happens later.
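
In code, the weighted combination might look something like this; the exponential discount is my own stand-in for whatever weighting the paper actually uses.

```python
def total_fitness(momentary_values, discount=0.9):
    # momentary_values[0] is the earliest snapshot, so it gets the largest weight.
    return sum((discount ** i) * f for i, f in enumerate(momentary_values))
```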

There is a video showing a game as protoss versus the built-in AI. Production decisions are made by COEP; the micro and other behavior comes from UAlbertaBot. Beating the built-in AI is not impressive, but what’s worse is that COEP showed some strange behavior. For example, COEP chose to research air attack for no reason; it never made a stargate. Even if it was planning at one time to go carriers, it made no sense to research air attack that early. Possibly there was not enough computation time to weed out all the bad decisions.

A weakness of the described version of COEP is that it only counts the enemies you’ve seen. It doesn’t try to predict the future enemy unit mix, or to weigh the risk to you of different enemy choices. But that’s a weakness of the implementation, not of the basic algorithm. Features like that can be implemented.

COEP is extremely general. You need a way to represent plans, an evolutionary way to modify them, and a way to evaluate them for fitness. I think that can describe any problem (some more naturally than others). You could use COEP for tactical decisions and micro decisions too, or for deciding what to have for lunch.

You pay for the generality with more computation. You end up inventing and evaluating many plans that you will never follow, including plans that are unrealistic or outright invalid. You can try directed evolution ideas to reduce the extra CPU time, but generality will come with a cost.

I expect that deep learning, if you can apply it, will work better than COEP. The disadvantage of deep learning is that it burns data by the supertanker load. With COEP you have to implement a bunch of stuff yourself instead of letting the computer figure it out, but you don’t need the mass of data, so it may be more accessible for a hobbyist. Of course, you can combine the methods: Follow the COEP architecture and use deep learning to create and modify the alternative plans, or to evaluate them, or both. Since deep learning is then solving an easier problem, training should be easier.


Comments

Bytekeeper on :

I'm not so sure about the deep learning. Because it's offline learning, there's no adaptability within a tournament.
If even one human player finds a flaw, everyone can abuse it. That could be solved with some online adaptiveness, like COEP has.
Maybe do some offline learning and then use COEP to adapt the network in-game?

Jay Scott on :

Yes, for straight deep learning, you have to train it with everything that it might face, or there will be holes—and for the full game, training with everything looks extremely hard. I think you want to combine DL with some search or reasoning method that has an independent ability to find and exploit weaknesses. They not only complement each other during play, but during learning: You get richer training data with fewer holes.

For production decisions alone, training with everything is not as difficult.

Hao Pan on :

Jay:

Thanks for the article on COEP.

COEP is only implemented for my Z (Fresh Meat), not yet for my T. Although Fresh Meat is currently disabled, COEP did prove to be a powerful planner for it. The author of Banana Brain actually did some tests vs Fresh Meat, and he identified several openings performed by Fresh Meat: 9-pool, overpool, 12-pool, etc. I also did some tests, pairing up Hao Pan and Fresh Meat. One example is illustrated by this video:
https://www.youtube.com/watch?v=_PD5pHjF4MU

Fresh Meat (under the name Proxy, as Fresh Meat was not on SSCAIT yet and I had to 'smuggle' Fresh Meat's dll into Proxy's folder when using sc-docker to run the game) opened with 9-pool and a quick expansion. As my custom forward model (which was based on Blizzard's AI) predicted the opponent would have a bio army supported by tanks (this is a bit contrary to your statement, "A weakness of the described version of COEP is that it only counts the enemies you’ve seen. It doesn’t try to predict the future enemy unit mix, or to weigh the risk to you of different enemy choices."), Fresh Meat then proceeded to produce hydras and quickly started researching the lurker aspect. The initial 6 lurkers took care of the bunker at the choke point and set up a contain, which was later reinforced by another batch of lurkers. Due to the overwhelming number of lurkers, Fresh Meat eventually won the game.

The success of COEP in the game above hinges on the accuracy of the forward model, which was lucky enough to predict the opponent's army composition. But in reality, such a prediction/forward model is really difficult to make. One possible future step I'm thinking of is to look into this paper by the Facebook group:
https://arxiv.org/abs/1812.00054

I ended up implementing COEP on my own, not using anything from the github repo, as I hoped to have something I could freely experiment with and tune. The implementation was done such that COEP was performed on each frame, and it was fast enough that no significant slow-down of my bot was observed. Niels Justesen was kind enough to provide guidance/tips along the way. One thing Niels helped with in particular was the selection of COEP's hyperparameters (which was largely based on empirical experience imo).

All in all, I had a good impression of COEP and might do something more with it in the future.

Jay Scott on :

Thanks for the report! I hope it inspires people.

MicroDK on :

Proxy is your bot? It is really difficult to beat... Microwave is about 50% since my last update.

Jay Scott on :

The comment says it was only named Proxy for the test.

Hao Pan on :

Right. Proxy is NOT my bot.

It's the only bot I knew for sure that used BWAPI 4.2.0. And since my bot also uses that, I swapped out Proxy's dll with Fresh Meat's for the test, as Jay commented.

MicroDK on :

I guess I misunderstood what you wrote... ;)
