Starcraft AI blog | Entries from January 2019

no end of Steamhammer delays

In Steamhammer, I made a change to construction that was supposed to fix 2 bugs: the delay in starting construction for some buildings, and the abandonment of drones whose buildings cannot be started (they sometimes sit idle for a long time before being sent back to work). I figured out a plan so simple that I thought it should work on the first try, but no, in reality it brought enough bugs for Noah’s ark. It was such a setback that my motivation faltered and I have been debugging desultorily. Everything is delayed further.

Meanwhile, AIST S2 tournament registration is opening. No announcement made it my way; I had to seek it out myself. I decided only a couple days ago that Steamhammer should participate again, though I haven’t tested it at all on the map Eddy. Even if I move slowly, the submission deadline is the end of February and I should be ready. It’s possible that tournament prep may cause more delays (last year it caused long delays; this year won’t be as extreme). I may or may not release the tournament version, depending on how it turns out.

I hope Steamhammer 2.2 will be worth the wait. The new map analysis should not affect strength much, though it will affect play to a certain extent. Some critical bugs are fixed, others are pending; if I get them all done, play should be visibly cleaner.

new bot legacy

The new bot legacy is random and plays a fixed strategy for each race. It’s brand new (the description is “myfirstbot”) and doesn’t have many skills yet. For example, it’s weak to worker harassment and can’t recover if its build is derailed. The refinements take time.

Terran - BBS. Legacy doesn’t seem entirely bent on rushing, though; it is happy to play cautiously and macro up a large marine army. Later it builds an academy and researches stim, but strangely never trains medics. Without medics, the marine force becomes as thin as paper after repeated use of stim.

Protoss - 10-12 gate zealots. This may be legacy’s strongest race. If the scouting probe survives, it may warp pylons in the enemy base for no apparent reason. (Because it needs the supply?)

Zerg - Overpool, rapidly adding hatcheries to increase production of zerglings. It’s somewhat similar to Black Crow, but without zergling speed and not as polished.

Nothing special... yet. Most bots that start as nothing special fade away after a while, but some get ongoing work and become worthy competitors.

the bot Newbie Zerg

Newbie Zerg is, of course, not a newbie at all. It is a reincarnation of an old Newbie Zerg, and of related bots like Newbie Zergrush and 5 Pool. Like those now-classic bots, it plays rushes. This version seems more closely related to 5 Pool: It always plays 5 pool and knows a variety of followups, which are sometimes selected by hand for specific opponents. It’s also alike in being updated frequently.

In the version uploaded today, Newbie Zerg is configured to play a 5 pool macro (strange as it sounds) opening against Krasi0, SAIDA, Iron, Locutus, DaQin, and Tscmoop2. The opening line starts with 5 pool and makes only 4 pairs of zerglings, then starts drone production and transitions to 2 hatch mutalisk. No other opponents are specially configured.

If the author follows their historical pattern, we can expect carefully hand-tailored opening lines to defeat specific strong and predictable opponents. But this time around, the strong opponents are better at defending rushes and are more adaptive than before. Newbie Zerg scored one win over Locutus, but now Locutus has seen its play and a second game will not be as easy. So far it has not been able to beat SAIDA or Iron. Still, it should be a difficult opponent for many bots.

SSCAIT round of 8; AlphaStar

In between ASL 7, the SSCAIT round of 8, about 4 hours of video on DeepMind’s AlphaStar, and keeping up with Steamhammer’s games, I was watching Starcraft today nearly from dawn to dusk. Coding progress: Zero. Time to do something else for a while and write about it!

In SSCAIT, the hard-to-predict matches were PurpleWave-BananaBrain and Iron-Steamhammer. PurpleWave 3-2 BananaBrain was the expected close match. In the next round we can expect PurpleWave > SAIDA and Locutus > Iron, so PurpleWave and Locutus will fight it out in the semifinal. I think Locutus has an edge, but PurpleWave retains chances.

I was not surprised by Iron > Steamhammer, but I was surprised by Iron 3-0 Steamhammer. It was unlucky that it was so one-sided. When Steamhammer has ample learning data, it has a small advantage over Iron. On a 2 player map, the 2 hatch muta variant usually wins, but here Steamhammer played it on a 4 player map where it usually loses due to poor execution—Steamhammer didn’t have enough data to connect its past wins with 2 hatch muta to the map size. On other maps, the AntiFactory build wins 50% or a little more, and in this match Steamhammer was still casting about for ways to win and didn’t try AntiFactory. It’s because I knew that Steamhammer didn’t have enough data that I gave the edge to Iron. Steamhammer is likely to win in the next round and, as others have also predicted, drop out in loser’s round 4 to SAIDA.

Interestingly, commenters who predicted Proxy > Hao Pan were wrong. All the information I can find seems to indicate that Proxy has the upper hand over Hao Pan. Did somebody hit a learning transient?

AlphaStar in Starcraft 2 gives us a foretaste of what to expect from advanced neural network learning. On the one hand, they spent huge computing resources—weeks at a time of “many thousands” of simultaneous games with 16 of Google’s TPUs per player—to learn to play protoss versus protoss on a single map. On the other hand, AlphaStar came out of that work with exceptional micro and strong judgment, areas in which all Brood War bots are currently weak. Machine learning is the way to get strong judgment. But it’s not easy.

They say that AlphaStar plays with average APM around 280 and latency around 350 ms, both somewhat slower than human. That makes its strength more impressive. They didn’t say so clearly, but I got the idea that the 350 ms latency is for free: It takes that long to evaluate their deep and complex network, so they can’t react faster! They did not talk as much about how AlphaStar’s real advantage is not in speed, but in precision: It does not misclick (at least not harmfully). Humans have a tradeoff of speed versus precision; if you do something faster, you do it with more slop. AlphaStar is a little slower, but far more precise than a human, so in fact it stands higher on the speed-precision tradeoff. It should play better, given equal knowledge. Still, it certainly takes fewer liberties than a BWAPI bot.

cannon rush reactions

In Martin Rooijackers’ post on the end of the rushbots: rushes and how to defeat them, the suggested plans to defeat a cannon rush are perfunctory, only enough to get started with. In my earlier post “cannon like Nal_rA” I suggested other ideas, but also did not go into full detail.

Cannon rushes have become more important in bot play. Jakub Trancik builds cannons in unpredictable places (usually easily visible places, though). Juno by Yuanheng Zhu tries to cannon contain and push in; it wants to contain as close as possible, but is willing to back off and contain the entrance to the natural if it judges that it can’t get closer (though it tends to judge optimistically). AIUR can place cannons in hidden positions to hit the mineral line. And now Krasi0P likes to set up a cannon contain outside the natural and then cannon push in as far as it can, plus it has a dangerous followup game plan. And those aren’t all the possibilities! So here are more ideas on how to react. I plan to implement all of these in Steamhammer, some soon and some eventually.

categorizing cannon rushes

Placement. Cannons can be placed to restrict your movement. The most restrictive placement is a containment, which might be a containment inside the main preventing you from reaching your exit, a containment outside your exit preventing you from passing your exit, or a containment outside the natural preventing you from reaching the center of the map. Or cannons can be placed to deny access to an area, such as your natural mineral line, without stopping you from leaving your base (consider cannons above the natural on Heartbreak Ridge). Or cannons can be placed to hit critical areas, such as the mineral line, the main resource depot, a key tech building like the spawning pool, or your first production building. The mineral line is a common target, since it does the most immediate harm.

Timing. In the fastest cannon rushes, the first pylon is hidden near the enemy and the forge is built next to it. Current bot rushes are played with the forge in the main and the cannons powered by the second pylon. But cannon attacks can be played at any point in the game. If you leave a base undefended, they can put cannons near it that may double as defense when they take the base for themselves (amateurs try this sometimes). On some maps, cannons on high ground can hit a mineral line on low ground (Heartbreak Ridge is only one example; see the mineral only bases on Python for another).

Followup. Most bots will keep making cannons, trying to push in as far as they can. Objectively, once there are enough cannons to give the enemy pause (maybe 4, depending on what you’re aiming for), it’s often better to make gateways and zealots instead; the zealots and cannons defend each other, and the zealots can exploit opportunities to attack. (Presumably bots don’t switch to offense until late because it’s complicated, but it’s only a matter of time before somebody implements it well). If the cannon rush holds the other side in check, then the rusher can expand and/or tech freely and get ahead in any way it chooses.

Goal. Most bots play all-in cannon rushes to win outright (because they know bot opponents will often fall over). Most humans play pressure cannon rushes to set the enemy back so they can get ahead (because they expect that opponents will know how to react). Also possible are low-cost harassment cannon rushes to cause distraction and delay (think 1 pylon and 1 cannon in a hard-to-hit location; the cannon can even be canceled before it finishes depending on what happens).

defeating a cannon rush

In defending a cannon rush, you have a series of goals. 1. It’s best if you can prevent cannons from finishing, or at least hit them while they are few and can be defeated efficiently. 2. If the cannons finish, you need to put a brake on it so the situation doesn’t grow worse. Mitigate the threat. 3. Ultimately, you want to restore your freedom of movement or your access to a denied area.

1. Stop the cannons from finishing. If you see a cannon warping in or near your base, you need to react. If you only see a pylon, you should at least check it out in case you need to react, but it is also easy to overreact. On the one hand, if it is a manner pylon or blocking pylon to delay your natural, you react differently than to a cannon rush. On the other hand, nothing stops the enemy from building cannons next to a blocking pylon. (Or a gateway. Or a shield battery for zealots which are about to arrive.)

You can pull workers to stop cannons. To be worth it, the pull needs to be not too far away, to stop at least some cannons, and to not risk too many losses. In principle, a combat simulator can tell you whether cannons can be stopped before they finish. Are any current combat simulators accurate for cannon rushes? I doubt it. This seems easy to do crudely, difficult to do with fine judgment. Another try is to chase the cannon-building probe and make its job risky, but it’ll be hard to catch.
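To show what “easy to do crudely” might look like, here is a minimal sketch of a worker-pull check. It is not a real combat simulator and not Steamhammer’s code. The function name and every parameter are hypothetical, and the numbers in the usage lines are placeholders rather than game data. It ignores how many workers can physically surround a cannon, hit points and shields growing during warp-in, and worker losses, which is exactly the fine judgment that is hard.

```python
import math

def can_stop_cannon(workers, travel_frames, frames_until_done,
                    cannon_hit_points, damage_per_hit, frames_per_hit):
    """Crude estimate: can pulled workers destroy a warping cannon in time?

    Assumes every worker attacks as soon as it arrives and nothing shoots back.
    """
    if workers <= 0:
        return False
    hits_needed = math.ceil(cannon_hit_points / damage_per_hit)
    volleys = math.ceil(hits_needed / workers)
    # The first volley lands on arrival; each later volley waits one cooldown.
    frames_to_kill = travel_frames + (volleys - 1) * frames_per_hit
    return frames_to_kill < frames_until_done

# Placeholder numbers, chosen only to exercise the function.
big_pull = can_stop_cannon(workers=6, travel_frames=100, frames_until_done=400,
                           cannon_hit_points=200, damage_per_hit=5, frames_per_hit=22)
small_pull = can_stop_cannon(workers=2, travel_frames=100, frames_until_done=400,
                             cannon_hit_points=200, damage_per_hit=5, frames_per_hit=22)
```

A rule of thumb like this can only say “clearly yes” or “clearly no”; the close cases are where a more accurate simulation would be needed.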

You also want combat units as early as possible. Start a barracks, gateway, or spawning pool. Cancel stuff if you have to. Pulling workers cuts your income and can delay combat units. Again, rules of thumb will help; excellent decisions are hard.

2. Mitigate the threat. First of all, if the cannons are in containment position, do your level best to get a worker outside before the containment is complete. If you have a scout worker already outside, it should first locate the enemy, which should be safe because there will be no gateway in the enemy main. Then keep the worker alive and hide it. If the containment is complete, you can still try to get a worker out with a diversionary attack: Send a few workers past and at the same time try to kill cannons; one or the other may succeed. If you’re zerg, the escaped drone wants to someday turn into a hatchery. If you’re terran or protoss, it might build anything. You may want a base to replace one that was denied by cannons, or production to go attack the enemy main that will be underdefended because of all the spending on cannons elsewhere. Escaping a worker is less important if you’re terran, first because terran can clear the cannons with a tank, second because terran can lift buildings to the outside.

If the cannons threaten to push closer, you may have to stop them. Zerg can commonly stop encroachment by building 1 sunken as close to the cannons as possible while out of range. Only make more than 1 if absolutely necessary. Protoss may want to add 1 cannon for the same reason, but the forge is an extra expense so it’s not as appealing. Terran needs a tank. An unsieged tank can prevent new cannons from warping in ever closer (it does need micro), and of course once siege finishes the cannons become fodder.

If the cannons deny something vital, you have to replace it. If you can’t mine enough because your main mineral line is under attack, expand. If a critical building is under attack, you may want to start a replacement before it dies. Terran should lift buildings that need to be saved or that can’t be used where they are (e.g. marines spawn in cannon range).

Those are the overall reactions. You have to get micro reactions right too. Don’t try to mine a mineral patch that is in cannon range; do mine any patches that are out of range. Don’t try to build a building, or land a floating building, in cannon range.
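The range rule for mining and building can be sketched as a simple distance filter. This is an illustration under assumptions, not Steamhammer’s implementation: positions are plain (x, y) pairs, `danger_radius` is a caller-supplied placeholder (cannon range plus whatever safety margin you want), and the function name is hypothetical.

```python
import math

def out_of_cannon_range(positions, cannons, danger_radius):
    """Keep only the positions farther than danger_radius from every cannon."""
    return [
        p for p in positions
        if all(math.dist(p, c) > danger_radius for c in cannons)
    ]

# Placeholder coordinates: one cannon, three mineral patches.
cannons = [(100, 100)]
patches = [(120, 100), (400, 100), (100, 500)]
safe_patches = out_of_cannon_range(patches, cannons, danger_radius=250)
```

The same filter applies to candidate building spots and landing spots for floating buildings.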

3. Restore your freedom. Don’t throw units away against the cannons; build up until you can take them out. This is the most basic thing, and yet most bots do it wrong, including Steamhammer. Terran only needs a tank and siege mode. Protoss wants dragoons, and if the number of cannons is huge, a reaver. Zerg prefers hydralisks. Ideally, attack containing cannons from both sides at once, including forces you’ve built up outside the containment. (Though as mentioned, the outside forces might want to hit the enemy main first.)

If you’re surviving well enough but you can’t afford to break the cannons (there are too many or the enemy reinforced them with an army), bypass them. Protoss should make a robo and shuttle. Zerg can go air or drop, depending on the situation—drop may be needed if you never got a drone out.

Steamhammer already has reactions to pull drones and to get an early pool (canceling stuff if necessary). It needs defensive fixes to prevent throwing away zerglings when the cannons are too strong. On my list for soon-ish are the encroachment-stopping sunken when needed, escaping a drone to make a hidden base, and mining and building only where safe. With those changes, Steamhammer should be resilient in the face of unexpected cannon rushes, at least until the next and stronger wave of cannon rush bots.

Long post, but I’m sure I know more than I remembered to write. What critical points did I forget or not know about?

Update: I reformatted the post and added a couple small bits.

SSCAIT 2018 knockout forecast

I’ve thought through the SSCAIT knockout bracket. The upcoming match between PurpleWave and BananaBrain could go either way, but I think PurpleWave has an edge. If PurpleWave wins, then it is likely to defeat SAIDA in the next round, then an easier opponent in the following round. In the semifinal, PurpleWave would likely face Locutus and have chances but stand at a disadvantage. SAIDA is likely to cruise through the loser’s bracket. If SAIDA faces Locutus, it is a probable win. If it faces PurpleWave, then maybe it will have played enough games for SAIDA’s learning to find a solution, but if not, then PurpleWave will win again. In the other case, PurpleWave loses to BananaBrain in round 2. In this case it will likely struggle on to face Locutus in the loser’s final, and again stand at a disadvantage. SAIDA will cruise through the top, and the final will be between SAIDA and Locutus or PurpleWave. So I think SAIDA and PurpleWave are the most likely winners, with SAIDA as the best single pick because it has fewer likely ways to lose twice.

Steamhammer defeated Krasi0P as predicted, and faces Iron next. The match could go either way, but Iron has better chances. I think if Iron wins, then Steamhammer will likely make it to loser’s round 3 or 4 before dropping out. If Steamhammer wins, in the best case it might get as far as the loser’s round 5. Last year, Steamhammer lost in loser’s round 4, so the relative level of play seems similar.

In unrelated news, I’ve re-uploaded Randomhammer. It’s the tournament version, Steamhammer 2.1.4. The last uploaded version of Randomhammer was 2.1, so there are improvements that were previously only visible in zerg play. But note that the drop openings are not working properly. Drops are working in the development version, but I still have a bunch to do before I release it.

performance differences on BASIL ladder

On the BASIL ladder, Steamhammer in early days performed poorly. But today Steamhammer ranks #8, ahead of Krasi0P. If we skip the BASIL participants which are not in the SSCAI tournament (Krasi0 terran and ChimeraBot), that corresponds to place #6 on SSCAIT, just behind BananaBrain, as compared to Steamhammer’s actual tournament finish at #11. The performance corresponds in general to Steamhammer’s performance curve in AIIDE 2018, starting low and rising strongly, but seems even more dramatic.

Many bots have different rankings on BASIL compared to SSCAIT. Random bots are handicapped on BASIL by comparison, since the opponent knows the race ahead of time. There are other differences in rules, plus the environment can cause different reliability and possibly different behavior. For most bots, I think these differences should not matter much—though anything could happen for a bot with reliability problems. Am I wrong? Am I missing something that can make a big difference?

If I’m right, then the important difference is that BASIL plays more games, so learning bots learn more. Other than environment-specific bugs, I don’t know another way to explain big differences in rank, such as Killerbot by Marian Devecka being #19 on BASIL while it came in #7 in SSCAIT 2018: Killerbot is not a learning bot. Another difference (just to give a second example) is that BASIL ranks Ecgberht one step below Arrakhammer, rather than far below (SSCAIT #16 versus #33): Ecgberht is a learning bot.

Steamhammer has a surprisingly high crash rate on BASIL, over 6%. It doesn’t crash remotely that often on SSCAIT. I’ll have to look into that.

SSCAIT round robin is over

The SSCAIT 2018 round robin is complete. The last few games were not interesting and were before dawn in my time zone, but I woke up and watched them anyway. Congratulations to the final 16, it was no easy road, and I now intend for Steamhammer to crawl its way to the top over your corpses. And congratulations to everybody else for the effort, even though others crawled upward over your corpses.

Now we can try to forecast the round of 16, which seems to be playing right now though the games are not being streamed. #1 SAIDA, #2 Locutus, #3 Iron, #4 PurpleWave, and #5 BananaBrain are all big favorites to win. Maybe somebody can work through future pairings in detail and try to tell whether SAIDA can be eliminated by facing both of the 2 opponents that scored 2-0 against it. At first glance it seems possible.

#6 Krasi0P is paired against #11 Steamhammer (I landed just below the middle of my predicted range #7-#13, which were tightly spaced after #7). In the round robin, Steamhammer won the first game, and I expected a win in the second game too, but it didn’t happen. I thought the plan recognizer would understand the cannon rush and react properly. The second game shows that it did not recognize the plan correctly, though by then it should at least have understood the plan. The plan recognizer is tested to work in this situation, so I’m still expecting a win in this match. But I was wrong once; maybe there’s a bug I haven’t found. There is a tipping point: If the plan recognizer is wrong often enough, it will make Steamhammer’s play systematically worse, and it will take more than 5 games of match play to overcome that.

#7 Killerbot by Marian Devecka lost to #10 Tscmoo protoss in the round robin, and may keep losing. I think #8 Bereaver, not updated, should be at a disadvantage to #9 Microwave. This pairing system benefits the top scorers, but also grants a relative edge to bots finishing just below the middle, who are paired against comparatively easy opponents.

Beyond the round of 16, I don’t want to forecast. Knockout tournaments by design are sensitive to every result, so mistakes compound quickly. I think you need a full Monte Carlo analysis to estimate probabilities.
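For the curious, a Monte Carlo estimate of that kind could be sketched roughly like this. It makes loud simplifications: a single-elimination bracket rather than the bracket with a loser’s side discussed in these posts, a fixed best-of-5 match length, and a caller-supplied `p_win(a, b)` giving the chance that `a` beats `b` in one game. All names are hypothetical, and the number of entrants must be a power of 2.

```python
import random
from collections import Counter

def play_match(p_game, best_of, rng):
    """Return True if the first player wins a best-of-N match."""
    needed = best_of // 2 + 1
    wins = losses = 0
    while wins < needed and losses < needed:
        if rng.random() < p_game:
            wins += 1
        else:
            losses += 1
    return wins == needed

def simulate_bracket(seeds, p_win, best_of, rng):
    """Play one single-elimination bracket; seeds are listed best first."""
    alive = list(seeds)
    while len(alive) > 1:
        winners = []
        for i in range(len(alive) // 2):
            a, b = alive[i], alive[-1 - i]  # top remaining seed meets bottom
            winners.append(a if play_match(p_win(a, b), best_of, rng) else b)
        alive = winners
    return alive[0]

def title_odds(seeds, p_win, best_of=5, trials=10000, seed=0):
    """Estimate each entrant's chance of winning the whole bracket."""
    rng = random.Random(seed)
    counts = Counter(
        simulate_bracket(seeds, p_win, best_of, rng) for _ in range(trials)
    )
    return {name: counts[name] / trials for name in seeds}

# Sanity check with a made-up rule: the alphabetically earlier name always wins.
odds = title_odds(["A", "B", "C", "D"], lambda a, b: 1.0 if a < b else 0.0,
                  trials=200)
```

The hard part is not the simulation but filling in `p_win` with honest per-game estimates for every possible pairing.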

I predicted that Steamhammer would score 4-3 in its last 7 games. Its actual score was 5-2. I made my prediction by individually estimating the probability of winning each game, and simply adding up the probabilities. Steamhammer was slightly lucky to win against Iron, which I gave a probability of 0.4 based on test games.
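The arithmetic is simple enough to sketch. Only the 0.4 for Iron comes from my estimates above; the other six probabilities below are made-up placeholders for illustration, and the function names are hypothetical. The second function assumes the games are independent, which is only roughly true for learning bots.

```python
import math

def expected_wins(win_probabilities):
    """Expected number of wins is the sum of the per-game win probabilities."""
    return sum(win_probabilities)

def win_count_std_dev(win_probabilities):
    """Spread of the win count, assuming the games are independent."""
    return math.sqrt(sum(p * (1.0 - p) for p in win_probabilities))

# Placeholder estimates; only the first value (versus Iron) is from the post.
estimates = [0.4, 0.7, 0.5, 0.6, 0.3, 0.8, 0.6]
forecast = round(expected_wins(estimates))
spread = win_count_std_dev(estimates)
```

With 7 games the spread is on the order of 1 win, so landing one game off the forecast is nothing unusual.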

SSCAIT 2018 round robin is nearly over

It looks as though SAIDA and Locutus will tie for #1-#2 in this SSCAIT. SAIDA has suffered 6 losses, as few as any bot, and it has played all games so nobody can pass it. Locutus is the only other player with 6 losses, and its remaining 3 games are against weaker opponents, so it will probably finish without another loss.

Locutus scored 2-0 against SAIDA, so I’m guessing that Locutus will be assigned #1 for pairing purposes. If the same pairing system is used as last year, #1 will be paired against #16 in the round of 16, #2 against #15, and so on. The exact pairings make a big difference in a knockout tournament.

With few games left to play, places #15-#16 are likely to be a tie between Arrakhammer and XIMP by Tomas Vajda, both with 44 losses. There is a small chance that one of them might lose and bring about a tie for place #16.

the decision: what major feature is next?

Here’s my answer to what major feature should be next?

My analysis of SSCAIT games said that most losses were due to poor strategy decisions (in the Steamhammer sense of strategy, which is mainly “what should I spend on?”). Dan Gant did his own analysis of Steamhammer losses and drew the same conclusion. To be sure, some of the bad production decisions are deliberate choices to cover weaknesses in Steamhammer’s other skills. For example, Steamhammer makes an inefficient spore colony to defend against corsairs, rather than saving the expense and going with mobile units that have more uses, because its hydras are poor at air defense. Also, some losses are traceable to tactical mistakes. Even so, most losses are due either to bugs or to poor strategy decisions. Choice 1, strategy adaptation, is my pick for the next major feature. It does the most good.

I want to work on the ops boss too, but it’s too much; I can’t do everything at once. I will miss its interesting skills. The ops boss is a prerequisite for most of the sneaky tricks, flexible tactics, and multi-step plans that I have in mind. Going that way would make Steamhammer’s play more fun.

My goal is not to win the most games in this year’s tournaments, though I wouldn’t mind. I want to make the most progress toward my end goal, an imaginary Final Steamhammer that is strong in every aspect. It just so happens that making the most progress this year helps with making the most progress in the long run. Last year I found that working on macro improvements as I had been was no longer helping; Steamhammer’s tactical and micro weaknesses were dominant, so that improving an area that was already relatively strong made no visible difference. If I made a macro improvement that I thought was big, or one I thought was small, either way Steamhammer’s performance stayed about the same, because it was losing games for other reasons. That meant I could not measure my progress! I had to concentrate on immediate improvements to be sure that I was getting anywhere at all.

This year, tactics and micro are still not strong, but are improved enough that they are no longer the weak point. It’s time to take another loop around the spiral. Next year, if not earlier, I’ll reevaluate and likely bend my course again.

I laid out a 3 phase program for strategy adaptation. My plans change often, so you can be sure I won’t carry through exactly as advertised. Given my past rate of progress and the number of side features I throw in, I doubt that the year is enough time to finish all 3 phases. Still, I should get a lot done and make big improvements.

Speaking of side features, commenters who wanted specific fixes are not entirely out of luck. I can’t do deep reworking of tactics like last year, but I will patch some of the big weaknesses. Lurkers will grow bigger brains. Units will gain at least a limited ability to foresee threats and route around them. I’ll see what I can do about losing units piecemeal while retreating. Stuff like that. Plus I’ll add some skills purely for fun, maybe queen support. Can’t let things grow dull, it’s bad luck—er, I mean, bad for motivation.

Anyway, next up is Steamhammer 2.2 with a variety of fixes and improvements, and without BWTA—good riddance. It will be out when the worst bugs are squashed. After that might be 2.2.1 or 2.3 with incremental changes to fix any fresh bugs and to help with big weaknesses. Strategy adaptation is a big enough change that it will probably deserve the name Steamhammer 3.0, but it will be a while before 3.0 appears. Early versions of strategy adaptation are likely to play worse than current Steamhammer, because it will take time to whip it into shape.

I predict 4-3 in the last 7 games

I predict that Steamhammer will score 4 wins and 3 losses in its last 7 games in the SSCAIT round robin phase. Its remaining set of games has an excess of opponents that give it trouble. Let’s see how accurate I am!

Steamhammer 2.2 status update

It looks as though Steamhammer 2.2 with regions and without BWTA will likely not be ready right after SSCAIT finishes, as I had planned. It will take longer because more stuff needs fixing. For one example, today Steamhammer lost versus UITtest because of a bug where drones entirely stopped mining minerals. I currently have WorkerManager instrumented to the gills to trace it. For another, yesterday I fixed several causes of rare production freezes and also added a workaround for the serious macro glitch that happens when Steamhammer wants to spend a lot of gas and is collecting gas slowly. Game-losing bugs like this are second on my priority list after crashes.

Also on my list is a bug that can cause infinite creep colonies to be built in the main base, instead of a limited number in the natural for defense. And I want to get the second building construction delay bug; fixing the first one got the spawning pool to start earlier, but the spire seems to start later, and that loses games too.

Plus of course I have added a few openings and made the usual varied fixes and changes. I always do.

high and low ground

In my current implementation, regions are defined entirely by chokes, which are themselves defined as narrow places in the terrain. I’m considering whether to change the definition of chokes to include changes in ground level. A narrow ramp counts as a choke no matter what. I’m thinking that maybe a wide ramp should also count as a choke. It has a similar property of defensibility; the difference is that the defensibility is asymmetric: It is easier to hold the top than the bottom. Since it’s defensible you care about it and may want to treat it as a region edge, no matter which side you’re on.

pathfinding

I’m not going to do pathfinding right away (though I’ll likely take initial steps soon), but here is my current plan. I want to separate high-level and low-level pathfinding. A high-level path from A to B is a sequence of chokes to go through, basically the same as BWEM provides. If you’re already in the same region, the sequence is empty. A low-level path is a sequence of positions to go through. A high-level path can be planned fast and if necessary replanned fast. A low-level path has to attend to details, so it saves time if you never have to plan farther ahead than the next choke, especially if you may change your mind.

For high-level pathfinding, Steamhammer needs to know all the chokes and whether each one is open or closed at last report. Therefore chokes must be found without destructible map obstacles (not how my current implementation works), and then each choke analyzed to see whether it is blocked. I also want to know the choke width, in case not all units fit through, to play on maps like Third World with narrow ramps and Blue Storm with a narrow entrance. For threat-aware high-level paths, it needs a measure of threat for each region and/or choke.
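As a rough illustration of the high-level idea (not Steamhammer’s code, which does not exist yet), a path can be a list of chokes found by breadth-first search over the region graph, skipping chokes that are blocked or too narrow. The `Choke` record, its field names, and the region and choke names in the usage lines are all hypothetical. Breadth-first search minimizes the number of chokes crossed, not the ground distance, and there is no threat measure here.

```python
from collections import deque
from dataclasses import dataclass

@dataclass(frozen=True)
class Choke:
    name: str
    region_a: str
    region_b: str
    width: int            # narrowest width, in whatever unit the caller uses
    blocked: bool = False  # open or closed at last report

def high_level_path(chokes, start, goal, unit_width=0):
    """Return choke names to pass through, [] if same region, None if no route."""
    if start == goal:
        return []
    exits = {}
    for c in chokes:
        if c.blocked or c.width < unit_width:
            continue  # unusable for this unit right now
        exits.setdefault(c.region_a, []).append((c, c.region_b))
        exits.setdefault(c.region_b, []).append((c, c.region_a))
    came_from = {start: None}
    queue = deque([start])
    while queue:
        region = queue.popleft()
        if region == goal:
            break
        for choke, neighbor in exits.get(region, []):
            if neighbor not in came_from:
                came_from[neighbor] = (region, choke)
                queue.append(neighbor)
    if goal not in came_from:
        return None
    path = []
    region = goal
    while came_from[region] is not None:
        region, choke = came_from[region]
        path.append(choke.name)
    path.reverse()
    return path

chokes = [
    Choke("ramp", "main", "natural", width=64),
    Choke("natural exit", "natural", "center", width=200),
    Choke("back door", "main", "center", width=96, blocked=True),
]
route = high_level_path(chokes, "main", "center")
wide_route = high_level_path(chokes, "main", "center", unit_width=100)
```

Because the graph has only a handful of regions, replanning whenever a choke opens or closes costs almost nothing.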

For low-level pathfinding, I want to work at the 32x32 tile level as much as possible, because the 8x8 walk tiles are 16 times as much data. I thought of scaling the “how much room is around this walk tile?” data to tile size, taking the max of the walk tile room values within the tile and filling in that as the room around the tile, perhaps with forcing to zero if the value is small or other adjustments. Other grids will also be at tile scale: This tile is blocked by a building, the threat at this tile is such-and-such. A path could be found at tile level, and any tricky parts double checked at finer resolution if necessary.
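The scaling step could be sketched like this, assuming the walk-tile room data is a list of rows whose dimensions are multiples of 4 (each 32x32 tile covers 4x4 walk tiles). The function name and the `min_room` cutoff are hypothetical, and the cutoff is just one possible form of the “forcing to zero” adjustment.

```python
WALK_TILES_PER_TILE = 4  # a 32x32 tile is 4x4 walk tiles of 8x8 pixels

def room_at_tile_scale(walk_room, min_room=0):
    """Downscale a walk-tile room grid to tile scale by taking the max.

    Values below min_room are forced to zero.
    """
    n = WALK_TILES_PER_TILE
    tile_rows = len(walk_room) // n
    tile_cols = len(walk_room[0]) // n
    tiles = []
    for ty in range(tile_rows):
        row = []
        for tx in range(tile_cols):
            best = max(
                walk_room[ty * n + dy][tx * n + dx]
                for dy in range(n)
                for dx in range(n)
            )
            row.append(best if best >= min_room else 0)
        tiles.append(row)
    return tiles

# Two tiles side by side: the left has room up to 3, the right at most 1.
walk_room = [
    [0, 0, 0, 0, 1, 1, 0, 0],
    [0, 3, 2, 0, 1, 0, 0, 0],
    [0, 2, 2, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
tile_room = room_at_tile_scale(walk_room, min_room=2)
```

Taking the max is optimistic: a tile is credited with the roomiest spot inside it, which is why tricky parts of a path would still need a double check at walk-tile resolution.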

For air units, regions don’t matter and probably all path planning can be done at tile scale. Then detailed movement decisions related to firing and cooldown will be made frame by frame at pixel scale as now.

zones? areas?

In related news, I’m thinking of changing the name “regions”. Starcraft has its own idea of regions, represented by BWAPI::Region objects. They work differently and Steamhammer currently doesn’t use them, but they could be useful for some purposes. It would be nice to avoid a name clash, so that when you see a variable named region you don’t have to look up the type to find out what it is.

I like short names, so I’m thinking Zone or Area. Any other suggestions?

SSCAIT game result error

Look closely at this line from the latest SSCAIT results. (I edited it slightly so the links work from my site. The formatting is of course different.)

Steamhammer (Zerg, AI_MODULE) | Jakub Trancik (Protoss, AI_MODULE) | Bot 1 | 2019-01-12 14:06:58 | .rep / watch

It claims to describe a game between Steamhammer and Jakub Trancik, but the replay and watch links point to a game between Steamhammer and Martin Rooijackers.

I haven’t seen an error like this before. I made a brief check of nearby game results, and they looked correct. The game numbers proceed in sequence. The replay is a genuine new Steamhammer-Martin Rooijackers game and the result is correct; only the opponent name is listed incorrectly. The unofficial crosstable shows Steamhammer as 0-1 versus Martin Rooijackers and 1-1 versus Jakub Trancik, although including this game I believe the results should be 1-1 and 0-1 respectively; I see only 3 games played in these two pairings. The only other sign of error I can find is that the replay does not appear on Steamhammer’s SSCAIT page. Is it a one-off error, or are there other confused game results? Does it affect pairings?

I’ll report it directly to SSCAIT.

Update: Michal Certicky reports back that it is now fixed: It was a bug in choosing the replay file, not a mislabeled game bug. See the comment.

what major feature should be next?

After map analysis in the upcoming Steamhammer 2.2, what major feature should I work on next? I haven’t decided. I need to make progress on all fronts, but the largest features should be done one at a time so that they don’t step on each other’s toes. My hope is to have a powerful major feature ready in time for AIIDE 2019 in the fall, a version that I can call Steamhammer 3. As always, I’ll also do many minor features and fixes and stuff. Before I start in, I expect at least a Steamhammer 2.2.1 or 2.3 version with more map analysis features, such as support for finding paths through nydus canals, or at least better static defense placement.

1. Strategy adaptation. This has a lot of parts, and calls for adding judgment skills that don’t exist yet. As phase 1, I would create abstract openings in a general format, and allow the current concrete opening lines (which specify an exact build order) as implementations of the abstract openings. If no concrete opening is known for a given abstract opening, or if the situation changes and the intended opening has to be adapted or abandoned, Steamhammer will have to make up a new concrete build order for the situation. As phase 2, I’ll have Steamhammer collect data on what works. For making the initial opening choice (and for later decisions), I’ll drop the current hand-coded probabilities and rely on the data, so that Steamhammer’s choices will be empirically grounded. At this point, Steamhammer will have far more flexible reactions during the opening builds. As phase 3, I’ll extend the abstract openings to choices of abstract strategy over the whole game. At this point, instead of following hand-written opening lines or hand-coded rules, Steamhammer will weigh decisions: I see the cannons. I can bust them with units, or fly around them, or take the opportunity to grow my economy. Which is better this time, hydras or mutas or drones? Steamhammer currently knows openings for these 3 possibilities, but if it is following its strategy rules, the rules always say to make drones.

2. Operations boss. I want to dump CombatCommander and replace its functions with OpsBoss, which currently exists mainly as a stub. The combat commander has a largely fixed set of squads, and its main job is to assign units and give orders to each squad. The ops boss will have goals and plans to achieve the goals. It will make up squads more dynamically, and assigning units will be the tail of its work. The ops boss will be able to carry out multi-step plans like taking island bases (transport a worker, mine out the blocking mineral patch, take the base, transfer workers, add defenses, return workers when the base is mined out) or complex drops, and in general will do more varied and interesting stuff.

3. Squad structure—effectively, the tactics boss. I think the Squad class needs to be completely rewritten; it is not powerful enough to represent all the behaviors that a squad needs. See awkward points and design ideas. As part of this, I would implement formations including large-scale surrounds and some level of support for different kinds of unit coordination: Vultures screen the tanks, dragoons leave a gap for the reaver to shoot through, flying detectors and arbiters maneuver to do their jobs better, etc.

what to do?

Micro still needs tons of work, but I think I can treat micro improvements as a sequence of minor features that I can tackle one at a time. Building and unit placement is important. Another vital subsystem that needs rework is production. All aspects need to be moved under the production goal system, which will improve macro for all races and let me dump BOSS for terran and protoss—and also ties in with strategy adaptation. It’s a necessary change, but ideas 1, 2, and 3 above are also necessary and seem more likely to win tournaments this year.

To me, strategy adaptation and the ops boss are the most tempting. Both make play more fun and interesting, and potentially much stronger. Strategy adaptation would stick to my original strategy-first development plan. From a development efficiency point of view, it is logical to work on squad structure before the ops boss, or at the same time because they interact and their needs are interrelated. But if I work on them together, I might not finish them this year. It’s possible that I could decide to do selected parts of more than one idea.

What do you think? One of these, or something else entirely?

funny map analysis picture

It turns out there are a lot of ways to calculate regions and chokes. In the course of putting one together, I’ve been doing some other map analysis that should be useful for micro and pathfinding. Here is one that doesn’t work yet, a color-coded debug drawing which is supposed to show the room available around each walk tile: How much space is there for a unit or army to fit into? If you know a path, you can check the tiles to find out which units are small enough to travel the path. Or you can figure out how many of your units fit behind the enemy mineral line—should you go there?

Unfortunately, it’s no good as it stands. Among other mistakes, it claims there is no room in places where there obviously is. It makes a funny picture, though.

Like BWEM, I also calculate the distance from the nearest unwalkable tile. Iron makes good use of that information. That code worked correctly on the first try....
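For readers who want to see the idea, here is one simple way to compute a distance-from-unwalkable grid: a multi-source breadth-first search, sketched in Python. It is not BWEM's or Steamhammer's actual code. The function name, the boolean grid layout, the 8-connected neighborhood (so distances count king moves), and the choice not to treat the map edge as a wall are all assumptions.

```python
from collections import deque


def distance_from_unwalkable(walkable):
    """Return a grid of step counts to the nearest unwalkable cell.

    walkable is a list of rows of booleans (hypothetical layout).
    Unwalkable cells get 0. Steps are 8-connected, so diagonals cost 1.
    If the map has no unwalkable cell at all, every entry stays None.
    """
    height = len(walkable)
    width = len(walkable[0])
    dist = [[None] * width for _ in range(height)]
    queue = deque()

    # Seed the search with every unwalkable cell at distance 0.
    for y in range(height):
        for x in range(width):
            if not walkable[y][x]:
                dist[y][x] = 0
                queue.append((x, y))

    # Expand outward one ring at a time; first visit is the shortest distance.
    while queue:
        x, y = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                nx, ny = x + dx, y + dy
                if 0 <= nx < width and 0 <= ny < height and dist[ny][nx] is None:
                    dist[ny][nx] = dist[y][x] + 1
                    queue.append((nx, ny))
    return dist
```

Each cell is visited once, so the cost is linear in the number of cells, which keeps it affordable even at walk-tile resolution.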

which weaknesses are critical?

The tournament version Steamhammer 2.1.4 suffers from a command jam bug which reissues commands far too often, causing many to be dropped. It’s a critical bug with devastating effects: units ignore their orders, freeze in place, wander past the enemy taking fire without noticing, and so on. It often starts having an effect before the zerg supply reaches 50, and the effect grows worse as supply increases. By the late game, large groups of units are sitting uselessly around the map. On its face, the bug is intolerably severe.

But how critical is it really? I look at every game that Steamhammer plays. Based on tournament losses, I estimate that if I had fixed the bug before SSCAIT started, Steamhammer’s rank would not be #10 as now, but #7—not much gain considering how closely the ranks are spaced, only a few percentage points up in win rate.

How can such a calamitous bug have so little practical effect? In Steamhammer’s early days, one version had a bug that subtly caused building construction to be delayed. I doubt any stream viewer noticed; I didn’t notice either, until surprised by unexpected losses. Experience and test games proved that it was a critical bug that caused a high rate of losses against early aggression. By comparison, the command jam bug is identifiable as the cause of loss in only a few games, like the one loss against XIMP by Tomas Vajda. In other games where the bug struck hard, as against ICEbot and MadMix, Steamhammer struggled more than it should have but won regardless.

Apparently the bug causes losses only against a narrow range of opponents which play macro games and are strong enough to exploit the weak play that the bug causes. There is no effect against a strong opponent like SAIDA, or against the weakest opponents which lose to Steamhammer’s first 6 zerglings. One explanation is that most opponents either prefer early aggression, or else fall to Steamhammer’s early aggression. Another explanation is that I may underestimate the damage the bug causes; maybe it leads to losses that are not clearly attributable.

forge expand reaction

Why is it still called “forge fast expand”? It was a fast expansion when invented, but by today’s standards it’s not fast at all. That’s why I say “forge expand.” (It has the same number of syllables as FFE.)

Though I still have region work to do, today I decided to make an important improvement to how Steamhammer reacts to forge expand and other safe macro openings. The tournament Steamhammer makes 3 attempts to adapt: 1. If the enemy’s opening plan was predicted, it tries to select a good counter opening. 2. Otherwise, having missed the prediction and gone down a poor path, it cancels any planned static defense which is now unnecessary, and 3. makes extra drones to catch up in economy. If it’s still in its opening book, it stays the course and tries to minimize the disruption by changing planned zerglings into drones, which cost the same.

Its plans are still disrupted, though, because the extra drones and the omitted static defense cause minerals to build up. Steamhammer waits until the opening is over before it makes macro hatcheries and otherwise spends down its excess resources, and that is often too late. Zerg can’t keep up with the enemy’s economy and falls far behind.

Today I added 2 new reactions that happen when we want extra drones and resources therefore threaten to build up: 4. If possible, take gas early (or take another gas early). Putting drones on gas slows down mineral accumulation and may speed up tech openings, so that mutas or lurkers come out sooner. If gas accumulates too much, Steamhammer will stop gas collection, so there’s little downside. 5. Make extra hatcheries as conditions seem ripe. One or all of the extra hatcheries may be placed at expansions, depending on the situation. The rules are more cautious than the macro hatchery rules that apply once the opening is over, because they’re still trying not to disrupt the opening line. The overall effect of the new reactions is that Steamhammer pursues the tech of its opening line, sometimes faster since it has more income, gas, and larvas than expected, and transitions into the middle game in a stronger position. It’s making a big difference in test games, including wins from positions that were sure losses otherwise.
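The shape of reactions 4 and 5 can be sketched roughly as follows, in Python for brevity. This is only an illustration, not Steamhammer's rules: the function, the reaction names, and both thresholds are hypothetical, and the real conditions for "ripe" are surely more involved than a single mineral count.

```python
def buildup_reactions(wants_extra_drones, minerals, gas, free_geysers,
                      mineral_trigger, gas_cap):
    """Return a list of hypothetical reaction names for the current state.

    mineral_trigger and gas_cap are caller-supplied thresholds (assumptions,
    not values from Steamhammer).
    """
    reactions = []
    if not wants_extra_drones:
        return reactions

    # Reaction 4: take gas early, unless gas is already piling up.
    if free_geysers > 0 and gas < gas_cap:
        reactions.append("take_gas_early")

    # Reaction 5: add a hatchery once minerals start to accumulate.
    if minerals >= mineral_trigger:
        reactions.append("extra_hatchery")

    return reactions
```

Passing the thresholds in as arguments keeps the opening-phase rules separate from, and easy to make more cautious than, whatever rules apply after the opening.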

The fix is inspired by recent losses, especially the 2 losses to Skynet by Andrew Smith when it unexpectedly (to Steamhammer) switched from zealot rushes and DT rushes to forge expand. To my intuition, the forge expand reaction seems much less important than the command jam fix, which is a critical bug fix that affects far more games. And yet, taking into account test games and the rate at which Steamhammer was surprised by macro openings in the tournament, I estimate that it will save about 2/3 as many losses. In terms of improving Elo, the two seem almost equal. How does that happen?

Apparently you have to measure the severity of weaknesses, because intuition does not seem accurate. Unfortunately, to measure with an A/B test, first you need to fix the weakness. Maybe that is an advantage of machine learning, which does its entire job by measuring weaknesses and correcting them.