the Robust Continuous Build-Order Optimization paper
The paper “Robust Continuous Build-Order Optimization in StarCraft” by Dave Churchill, Michael Buro, and Richard Kelly drew some attention. It looks to have been written in late 2017 or early 2018, based on its contents and references, but it was not published until September 2019, and some of its claims fell out of date in the meantime. I found it underwhelming, which is why I did not write it up earlier, but properly speaking it is at least a small academic advance.
They take UAlbertaBot and BOSS as their experimental platform. Classical BOSS is given a goal of making a set of units (like “make this many zealots plus an observer”) and figures out how to make them in the shortest time (or tries to; it has bugs and limitations). They take that as their starting point. As I see it, the paper makes three points:
1. Classical BOSS produces a timing attack: If you maximize your forces at the end of the build plan and otherwise don’t care, you may be vulnerable before you reach your strong timing. They compare the goals of producing a fixed set of units as fast as possible, or of producing as many units as possible by a given time (making a peak in the army size curve), with the goal of maximizing the area under the army size curve, so that the army size is never far from its feasible maximum. Maximizing the area minimizes the windows during which you are vulnerable to an enemy timing attack, aiming for “I’ll always have a big army” rather than “I’ll have the biggest army possible, but only at one given point in time.” That is an important goal, so it’s good to have it available, though in a performance program it should not be the only goal available; never making a timing attack of your own is leaving money on the table.
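The difference between the two objectives can be sketched in a few lines. This is not the paper’s BOSS code; the plan data and function names are hypothetical, with army value sampled at discrete time steps and the integral approximated by a plain sum.

```python
# Two hypothetical build plans as army-value time series (one sample per step).
# Plan A is a timing attack: nothing for a long time, then a big spike.
# Plan B builds steadily: a smaller peak, but more area under the curve.
plan_a = [0, 0, 0, 0, 0, 10]
plan_b = [2, 4, 5, 6, 7, 8]

def peak(values):
    """Classical-style goal: the largest army value the plan ever reaches."""
    return max(values)

def area(values):
    """The paper's robust goal: the integral of army value over the time
    span, approximated here by a simple sum of the samples."""
    return sum(values)

plans = {"timing attack": plan_a, "steady build": plan_b}
print(max(plans, key=lambda p: peak(plans[p])))  # the spiky plan wins on peak
print(max(plans, key=lambda p: area(plans[p])))  # the steady plan wins on area
```

Under the peak objective the timing attack looks best; under the area objective the steady plan wins, which is exactly the behavior shift the paper is after.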
2. To define “army size” they introduce an abstract evaluation function that measures the “army value.” The underlying goal here is for the system to make more decisions itself: Instead of the programmer specifying “make n zealots,” the evaluation function decides whether one potential army is better than another. (Whether that actually offloads decisions from the programmer to the software depends on how you write the evaluator.) BOSS is recast to maximize the integral of the army value over a given time span, rather than its classical target. In principle, you could use any measure of army value. In their experiments, they stick with a simple total of resources spent to produce the army. It’s fast and adequate to the immediate goals of the paper, but in a performance program you’d want something more capable. An important point is that it is what they call a “one-sided” evaluation: It does not take into account possible enemy reactions during the span of future time that the BOSS search looks into. Apparently the army value must also be monotonic, in the sense that adding to your army should never make it decrease.
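The resources-spent evaluator they use is about as simple as it sounds. Here is a minimal sketch under my own assumptions (the function name and data layout are invented, not BOSS’s); the unit costs are the standard StarCraft values.

```python
# Standard StarCraft: Brood War costs as (minerals, gas).
COST = {
    "zealot":   (100, 0),
    "dragoon":  (125, 50),
    "observer": (25, 75),
}

def army_value(units):
    """Army value as total resources spent: minerals + gas summed over
    all units. Monotonic by construction: adding any unit can only
    increase the value, which is the property the search needs."""
    return sum(n * (COST[u][0] + COST[u][1]) for u, n in units.items())

army_value({"zealot": 4, "dragoon": 2})  # 4*100 + 2*175 = 750
```

Note that this is a one-sided measure: it looks only at your own units, with no term for what the enemy might field against them. That is the hole a more capable evaluator would have to fill.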
3. A key point is that this altered BOSS search is no more difficult than the classical search, in any sense. It’s easy to understand how and why it works. It can be run in the same amount of time.
The experimental tests look weak from the point of view of a performance program. They are only trying to show that the idea is valid, not trying to make it work well enough to be used seriously. They use protoss only, make only zealots and dragoons, and compare classic UAlbertaBot against a modified UAlbertaBot. They test against the top 10 finishers from AIIDE 2017, which apparently were the best bots when the paper was written. The results are not consistent, but they do show large gains against some opponents and no large losses against any. On the other hand, UAlbertaBot’s macro can deteriorate after the opening phase (though less so with protoss), so it’s not clear how much the gains mean.
It’s unclear whether the idea could succeed at all for zerg. Terran and protoss can make workers constantly, so that the army value can be at all times close to the maximum feasible army value. Zerg has to decide between workers and army, and the army you get later depends on the army you forgo now. The army value cannot stay close to the maximum feasible, and the tradeoffs are different. I expect you would get milquetoast macro plans rather than strong sharp ones.
To use this seriously, you’d need to write an army value evaluator which takes into account the opponent’s units. Making more zealots will not stop those lurkers. You’d want to rerun BOSS whenever the opponent’s unit mix threw up a surprise, like cloaked units; that would detract from the academic goal of offloading decisions from the programmer. BOSS can take up to a few seconds to run, spread over many frames. Even if you overlap build order plans, so that you’re still executing the old one while computing the new one, the new plan will be out of date by that much time. That is an intrinsic disadvantage. Nevertheless, I think it’s entirely possible that with enough effort the paper’s idea could be made to work well in a tournament program, though the paper doesn’t prove it (or try to).
Bottom line: “Robust” in the title refers to the robustness against timing attack that the idea aims for. “Continuous” is false; the BOSS search starts and finishes at discrete points in time and takes many frames, and it’s expensive so you probably don’t want to rerun it too often.
I don’t recommend the method of the paper for a performance Starcraft program. It is part of an academic research program, chipping away at the edges of the problem in hope of eventually sculpting it into something manageable. It’s interesting if you have an experimental attitude and you want to try to improve it further; the first necessary improvements are in the army value evaluation function.