I noticed that Hao Pan’s description says it uses COEP, or Continual Online Evolutionary Planning, to adapt its build order during the game. And Hao Pan is a successful bot. So I thought I’d write it up.
The original paper is from 2017, Continual Online Evolutionary Planning for In-Game Build Order Adaptation in StarCraft by Niels Justesen and Sebastian Risi. There is a corresponding GitHub repo with source. Long-time followers may remember Niels Justesen the bot. Niels Justesen has other publications; if you like this paper, I suggest Playing Multi-Action Adversarial Games: Online Evolution vs. Tree Search as a follow-up.
A good implementation is not simple, but the basic idea is easy. You have a set of build order plans, which you create and update in the evolutionary algorithm style. And you have a way to measure how good a plan is; in evolutionary algorithm terminology, its fitness. When you need to make a build order decision, you select the current best plan and follow it.
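To make the shape concrete, here is a tiny sketch in Python (my own illustration, not the authors’ code). What a plan looks like and how fitness works are described below; here a plan is just any value the fitness measure can score.

    def best_plan(population, game_state, fitness):
        """When a build order decision is due, follow the fittest plan we know of."""
        return max(population, key=lambda plan: fitness(plan, game_state))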
OK, details. In this paper, a build order plan is simply a build order, a list of stuff to build (or research) in order. In principle, it can be any length. You start with a population of different build orders, which you can generate randomly if you don’t have any better ideas. You evaluate the fitness of each one. Probably on the first try none of them will be particularly fit, so you need to make more plans. You create new builds by treating each one as if it were a stretch of DNA and introducing different kinds of recombination and mutation (traditionally, a lot of recombination and a little mutation). The paper explains the authors’ choices. And of course you evaluate the fitness of the new child builds. To keep the population size down, you delete the less fit builds.
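Here is a rough sketch of one generation, under my own assumptions; single-point crossover and a replace-one-item mutation are stand-ins, and the paper describes the authors’ actual operators.

    import random

    # Stand-in gene pool; a real bot would include every unit, building,
    # and upgrade its race can produce.
    GENES = ["probe", "pylon", "gateway", "zealot", "cybernetics_core", "dragoon"]

    def random_plan(length=20):
        """A random build order, good enough to seed the initial population."""
        return [random.choice(GENES) for _ in range(length)]

    def crossover(a, b):
        """Single-point crossover: the front of one parent, the back of the other."""
        cut = random.randrange(1, min(len(a), len(b)))
        return a[:cut] + b[cut:]

    def mutate(plan, rate=0.05):
        """Occasionally swap a build item for a random one."""
        return [random.choice(GENES) if random.random() < rate else item
                for item in plan]

    def next_generation(population, game_state, fitness, size=32):
        """Breed children, score everybody, cull back down to size."""
        children = [mutate(crossover(*random.sample(population, 2)))
                    for _ in range(size)]
        scored = sorted(population + children,
                        key=lambda p: fitness(p, game_state), reverse=True)
        return scored[:size]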
In concept, the evolution process runs continuously, and you query it for the best known choice when you need to make a decision. The authors actually implemented a client-server architecture, with a build order server. You could run it in a different thread, or do evolution on demand, or whatever; it should all work. When you start to carry out a build, you have to update the population to reflect that: This build says make 4 probes, I made 1 probe, there are 3 left. And the fitness measure needs to know about the game state so it can tell how good a build is.
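The bookkeeping might look something like this sketch (my own scheme; the paper’s details may differ): when the bot starts an item, plans that listed it next have one fewer step to go.

    def note_started(population, item):
        """Update plans after the bot begins building `item`."""
        updated = []
        for plan in population:
            if plan and plan[0] == item:
                updated.append(plan[1:])   # "make 4 probes" becomes "make 3 probes"
            else:
                updated.append(plan)       # this plan wanted something else first
        return updated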
So how does that fitness measure work? There are two parts: a forward model that predicts what happens when you follow a build order, and the fitness function itself, which measures how good the result is. You feed the forward model with information about the game state; the authors use mainly unit counts for both sides. In the paper, the forward model ignores steps that are illegal (making a corsair with no stargate), because the builds will have all kinds of random stuff in them. The forward model simply simulates what units you end up with, and when.
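A toy version of the forward model, assuming a hard-coded scrap of the tech tree (the real one also has to track resources, supply, and build times): walk the build order, skip steps whose prerequisites are missing, and record the unit counts after each step.

    from collections import Counter

    # Hypothetical prerequisites: what you must already own to build a thing.
    REQUIRES = {
        "zealot": ["gateway"],
        "cybernetics_core": ["gateway"],
        "dragoon": ["gateway", "cybernetics_core"],
        "corsair": ["stargate"],
    }

    def forward_model(own_units, plan):
        """Simulate following `plan` from the current unit counts.
        Illegal steps are simply ignored."""
        units = Counter(own_units)
        snapshots = []
        for item in plan:
            if all(units[req] > 0 for req in REQUIRES.get(item, [])):
                units[item] += 1
            snapshots.append(Counter(units))   # unit counts after this step
        return snapshots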
The fitness itself is based on a model of what counters what. You know what enemy units you’ve seen, and the forward model tells you what units you will have yourself. The fitness measure knows that vultures beat zealots and lose to dragoons, so in the face of vultures it sees lower fitness in a build that keeps making zealots and higher fitness in a build with more dragoons.
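A sketch of the counter idea with made-up numbers (the scores below are illustrative, not from the paper): score each of your predicted units against each enemy unit you have seen.

    # Hypothetical counter scores: positive means ours beats theirs.
    COUNTERS = {
        ("dragoon", "vulture"):  1.0,
        ("zealot",  "vulture"): -1.0,   # vultures beat zealots
        ("zealot",  "zergling"): 1.0,
        ("corsair", "mutalisk"): 1.0,
    }

    def momentary_fitness(own_units, enemy_units):
        """Score our unit counts against the enemy units we have seen."""
        score = 0.0
        for ours, n_ours in own_units.items():
            for theirs, n_theirs in enemy_units.items():
                score += COUNTERS.get((ours, theirs), 0.0) * n_ours * n_theirs
        return score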
The build order being evaluated for fitness may be long, and you don’t want to measure fitness only at the end. You might lose before then! So they measure a momentary fitness repeatedly throughout the build and combine the momentary values into a total fitness, giving more weight to values that come early in the build. You’re pretty sure what’s next, not so sure about what happens later.
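One natural way to do the weighting, using the forward_model and momentary_fitness sketches above, is an exponential discount so that early steps count most; I won’t claim the paper uses exactly this discount, but it is the same idea.

    def plan_fitness(own_units, enemy_units, plan, decay=0.9):
        """Total fitness of a plan: the momentary fitness after each step,
        weighted so that earlier steps matter more than later ones."""
        total = 0.0
        weight = 1.0
        for snapshot in forward_model(own_units, plan):
            total += weight * momentary_fitness(snapshot, enemy_units)
            weight *= decay
        return total

A game state holding your own unit counts and the enemy units seen so far is all this needs, which is what the fitness argument in the earlier sketches stands for.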
There is a video showing a game as protoss versus the built-in AI. Production decisions are made by COEP; the micro and other behavior are UAlbertaBot’s. Beating the built-in AI is not impressive, but what’s worse is that COEP showed some strange behavior. For example, COEP chose to research air attack for no reason; it never made a stargate. Even if it was planning at one time to go carriers, it made no sense to research air attack that early. Possibly there was not enough computation time to weed out all the bad decisions.
A weakness of the described version of COEP is that it only counts the enemies you’ve seen. It doesn’t try to predict the future enemy unit mix, or to weigh the risk to you of different enemy choices. But that’s a weakness of the implementation, not of the basic algorithm. Features like that can be implemented.
COEP is extremely general. You need a way to represent plans, an evolutionary way to modify them, and a way to evaluate them for fitness. I think that can describe any problem (some more naturally than others). You could use COEP for tactical decisions and micro decisions too, or for deciding what to have for lunch.
You pay for the generality with more computation. You end up inventing and evaluating many plans that you will never follow, including plans that are unrealistic or outright invalid. You can try directed evolution ideas to reduce the extra CPU time, but generality will come with a cost.
I expect that deep learning, if you can apply it, will work better than COEP. The disadvantage of deep learning is that it burns data by the supertanker load. With COEP you have to implement a bunch of stuff yourself instead of letting the computer figure it out, but you don’t need the mass of data, so it may be more accessible for a hobbyist. Of course, you can combine the methods: Follow the COEP architecture and use deep learning to create and modify the alternative plans, or to evaluate them, or both. Since deep learning is then solving an easier problem, training should be easier.