
Steamhammer 1.4 testing

Steamhammer 1.4 is more or less finished. I keep thinking I should package it up and be ready to go, and then I come up with another easy way to fix a weakness. I may yet fix a few more weaknesses, but progress so far is good. I’ve been running a lot of tests.

The simplest opponent for Steamhammer is Stone. Steamhammer now defeats Stone, from most openings and on most maps, without any learning needed, because its reactions to worker rushes are improved. (It loses if it plays 4 or 5 pool, because it doesn’t have enough drones to fight with.) I think a more aggressive worker rush will beat it, because Steamhammer’s micro is uncoordinated. But if you play a worker rush in the first game, win or lose, then Steamhammer will counter it in the second game.

The next are zerg rushbots and others that play fixed strategies that the plan recognizer is able to recognize. Steamhammer may lose the first game, or the first few—though maybe not, because its rush reactions are improved. As soon as it scouts well enough to recognize the rush (it’s not too reliable), the next game it switches to a counter and almost always wins from then on. It works against ZZZKBot, it works against Neo Edmund Zerg, it works against UAlbertaBot set to play zerg, it works against 2 different versions of the old bot 5 Pool that I have saved. I think it just works.

The next step up is UAlbertaBot without learning, which plays random and has a fixed strategy for each race. Steamhammer classifies UAlbertaBot’s zerg opening as “Fast rush” and its terran and protoss openings as “Heavy rush”, so once it has collected enough data it always plays to counter the heavy rush. It wins over 90% versus terran and protoss. And because of the improvements to rush defense, plus the fact that the opponent model knows to expect the fast rush as soon as it finds out that UAlbertaBot rolled zerg, it has a chance to hold the zerg too. In the end, the learned openings scored better than the tournament version with its hand-coded counter opening.

Learning opponents are more difficult. I played test matches against the opening learning bots Microwave and Zia. Microwave has the edge over the tournament Steamhammer. Against the new Steamhammer, the opponent model prevented Microwave from finding a stable way to win. When Microwave won a game, it tended to repeat the opening the next game, while Steamhammer tended to counter Microwave’s opening of the previous game. The match came out about even, with Microwave constantly switching its openings, unable to find one that worked consistently. Zia in contrast is weaker than the tournament Steamhammer, but the new Steamhammer was not able to come to grips with it. Zia plays a variety of different openings that look the same to Steamhammer’s plan recognizer, plus it rarely repeated an opening after winning only once, so Steamhammer was unable to choose good counters. Zia is the only opponent I tested that did better against the opponent model than against the random opening selection.

In one other test, I played a match of the new Steamhammer 1.4 versus the tournament version 1.4a3, both playing random. An opponent with random race and random choice of openings should be the worst realistic case for the opponent model opening selection (only a knowledgeable adversary that can predict Steamhammer’s predictions is worse). Also, since Steamhammer is still on BWAPI 4.1.2, the bot does not know when it starts up that it went random itself. The opponent model has no choice but to take separate statistics for each matchup, so when Steamhammer plays random it learns more slowly. The new version scored approximately 2:1. Even with tons of wrong predictions from the opponent model, the new version plays better.

These are only some of the tests I’ve run. There are opponents that the opponent model provides no leverage over. The plan recognizer has a limited repertoire, and if you step outside it can’t help. In the configuration file, I removed most but not all of the hand-selected enemy-specific openings. Removing the rest will be for a later version in the 1.4.x series.

Overall, the upcoming Steamhammer 1.4 is much improved over the tournament version. It is better in all matchups, with smoother macro and several small but useful skills for terran and protoss. It is especially improved in ZvZ and ZvP. It hasn’t caught up with McRave, but it should land upsets more often. I’m eager to see how it does in the wide world of the SSCAIT ladder. Are there more opponents like Zia that will leave it confused?

Next: How it works.

you don’t have to attack

Some bots, like Willbot, take a passive line and build up a large force before taking any offensive action. Others, like Steamhammer, prefer to keep the pressure on and attack whenever possible, even at high risk.

Neither style of play shows real understanding of strategy. Here are 2 simple principles that should be uncontroversial:

1. The side with more stuff has an advantage. If you have more units, or better upgrades, or higher tech, that is an advantage. You get those things by spending minerals and gas, so ultimately the side with more resources has an advantage.

2. The side on the defensive has an advantage. If you have static defense built, tanks sieged, lurkers burrowed, or simply units deployed in good position to engage, that is an advantage. If you have a shorter route to your production buildings, then reinforcements arrive sooner and that is an advantage. If you can stop or delay or channel attacking forces with blocking buildings or chosen terrain, that is an advantage.

From these 2 axioms, we can act like Euclid and derive a theorem: If you control more resources, you don’t have to attack. You can attack if you spot a weakness, but you don’t need to. You can hang back, in a safe defensive position, until your theoretical resource advantage manifests as a practical battlefield advantage, and then attack. At least you can wait until you are maxed at 200 supply.

Containment is the common case we see in bot play. When Steamhammer has the opponent contained, it constantly tries to press forward to notice and immediately exploit any weakness. It’s usually a mistake. Typically you should contain at the best defensive location you can, take care with the forward units you risk for vision, and scout to make sure the opponent can’t sneak an expansion or bypass the containment with drops or air play. You control the resources on the rest of the map, and that is an advantage.

The general rule is: If you can take expansions and deny expansions to the opponent, you will win. You can win without ever entering a fair fight. If the opponent takes an undefended expansion, smash it. If the opponent moves their army to defend a new expansion, smash the enemy natural instead. At worst, you force the opponent to allocate forces accurately to defend all threats. Tscmoo is the bot which implements this rule the best, though it still seems a bit crude to me.

The extreme case is a map split, where each side ends up controlling about half the map—except one side controls an extra base or two. Humans sometimes play from the beginning of the game for a favorable map split. I don’t think any bot understands the idea.

How do we get from here, strategic ignorance, to there, understanding tactical force allocation risks and tradeoffs to meet the strategic goal? Well, I mis-stated it; bots don’t have to understand, they only have to take the right actions. Bots today that contain the enemy don’t (it seems to me) understand what they are doing. They are following rules that produce containments as an emergent behavior. It’s a valid approach. But I recommend more explicit knowledge representations, because I think it will lead to faster progress.

Sometime this year Steamhammer will get an evaluation function that tells it how good or bad a situation is. The first version may be a simple hand-written evaluator that is used for a few decisions. In time, I hope to create an accurate evaluator by machine learning, good for decisions throughout the game. Then the same underlying knowledge, encoded in the evaluator, will let Steamhammer adapt its openings moment by moment, choose its unit mix, and maneuver its forces.

why is SSCAIT replaying tournament matches?

Why is the SSCAIT tournament re-running so many games in the elimination phase? There may be discussion of this on Facebook or somewhere, but I haven’t seen it. (I don’t use Facebook at all, because I don’t want to support their world domination plans. It would interfere with my world domination plans.)

I keep seeing games come up that are clearly tournament matches—then, later, the same matchups appear again. It looks exactly as though tournament matches are being replayed. Which games will be declared official?

They did the same thing on a smaller scale last year, and it caused some controversy. See the comments to the post Steamhammer vs LetaBot, SSCAIT round of 16 from last January. Last year it affected Steamhammer, and yet it didn’t bother me at all, partly because I saw the single elimination bracket more as entertainment than as a test of strength. There are legitimate administrative reasons to replay games. This year it doesn’t affect Steamhammer (games have been replayed, but with the same predictable results), but it bothers me more. There is an effort to make this phase of the tournament more rigorous, and replaying games undercuts that. When a learning bot is paired against a non-learning bot, such as CherryPi versus Iron, having more games against the opponent gives an advantage to the learning bot.

I doubt there’s favoritism behind the scenes, but how can I know? This comes up in politics all the time: It is not enough to avoid impropriety. If you want to be trusted then you also have to avoid the appearance of impropriety.

optimizing one opening build

Yesterday I revised the ZvP_10Hatch opening. It was originally designed to counter 2 gate zealot rushes, and now (with the opponent model) it is used to counter expected heavy rushes by terran and random opponents too. I renamed it Over10Hatch. The basic build order is extractor trick for a 10th drone, overlord, hatchery, and a couple of sunkens to help hold off the mass zealot pressure so it’s safe to make drones and tech.

The changes are small. If you watch casually, you might not notice. The extractor trick is now before the overlord, not after, a careless oversight in the original. The first sunken is delayed slightly, and the second sunken needed to deter masses of zealots is delayed longer. The number of early zerglings is cut back to just the number needed in the worst case. The early game plays out much as before, it only looks as though Steamhammer might be cutting it a little close. (CherryPi’s versus-protoss opening often gives a similar impression. The team knows what it is doing.) You could miss that there is a slightly higher drone count.

A pro would cut it closer, but a pro has good judgment. The way the rest of the game plays out looks different. In the old build, Steamhammer would often narrowly hold the zealots, get lurkers just in time to survive, and use the opponent’s lack of tech to slowly push for victory. With the extra income from the new build, Steamhammer holds the zealot pressure easily, safely gets lurkers, and quickly smashes the protoss with a mass of lurkers and lings. The turnaround in the game looks completely different. The effect versus no-academy marine rushes like UAlbertaBot’s is similar.

I thought it was a good lesson in the importance of an efficient build.

where does the overlord start relative to the hatchery?

Steamhammer has a scouting feature that works in ZvZ matches: When it sees the position of the enemy’s first overlord, it tries to infer where the enemy base is. See Steamhammer’s new scouting skills. It turned out that on some maps—Icarus and Roadrunner—the inference can be wrong. The reason is that Steamhammer measures the distance from the potential hatchery location, not from the overlord’s start location, which is offset. If 2 bases are at nearly the same distance, the offset can matter.

So I came to the question: At the start of the game, where is zerg’s overlord placed relative to its hatchery?

The answer turned out to be simple. The overlord’s x position is offset by either -99 or +99 from the hatchery, and its y position by either -65 or +65. The position is chosen to be whatever is closer to the center of the map. In code:

// The overlord starts 99 pixels away in x and 65 pixels away in y,
// in whichever direction is toward the center of the map.
// mapWidth() and mapHeight() are in tiles; positions are in pixels, 32 pixels per tile.
overlord.x = hatchery.x + ((hatchery.x < 32 * BWAPI::Broodwar->mapWidth() / 2) ? +99 : -99);
overlord.y = hatchery.y + ((hatchery.y < 32 * BWAPI::Broodwar->mapHeight() / 2) ? +65 : -65);

I don’t promise that this is exactly right. It works in the cases I tested. I imagine that somebody on the OpenBW project could tell us the exact condition.
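For testing outside BWAPI, the same rule can be written as a standalone sketch. Map dimensions are in tiles and positions in pixels at 32 pixels per tile; the names here are mine, not Steamhammer’s:

```cpp
struct Pos { int x, y; };

// Start-of-game overlord position relative to the hatchery, per the rule above:
// offset by 99 pixels in x and 65 pixels in y, in whichever direction is
// toward the center of the map.
Pos overlordStart(Pos hatchery, int mapWidthTiles, int mapHeightTiles)
{
    Pos overlord;
    overlord.x = hatchery.x + ((hatchery.x < 32 * mapWidthTiles / 2) ? +99 : -99);
    overlord.y = hatchery.y + ((hatchery.y < 32 * mapHeightTiles / 2) ? +65 : -65);
    return overlord;
}
```

On a 128x128 tile map (center at pixel 2048), a hatchery at pixel (800, 600) in the top-left quadrant gives an overlord at (899, 665), offset toward the center.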

how Steamhammer is getting on

backward games

I added a zerg-versus-zerg 12 hatch opening to Steamhammer. It’s the greediest common ZvZ opening, so Steamhammer only plays it when the opponent model expects that the opponent will also play a hatchery-first opening. But look what happened: In a test game today against Microwave, Steamhammer opened 12 hatch and Microwave played its 5 pool opening. Steamhammer saw it coming only after the spawning pool was started; all it could do was start a sunken as soon as possible. Microwave’s zerglings tore down the natural hatchery first, and that was enough time for the sunken to finish and 4 zerglings to hatch, just enough to hold the rush. Steamhammer had a huge drone advantage, went up to 3 hatcheries, and smashed down Microwave by brute force while fighting right in the face of Microwave’s emergency defensive sunken. That’s the way to play!

It’s no wonder that pros, with their much stronger defensive skills, are so willing to play a greedy opening like 12 hatch in ZvZ.

Microwave’s 5 pool, by the way, is a clever build. Microwave builds a second hatchery at its natural to keep the zergling numbers up, and if the pressure doesn’t work out it drones up and switches to mutalisks. The opponent has to keep playing well to stay ahead. In a different test game, Steamhammer predicted the 5 pool and played an anti-rush opening which held the zerglings easily—then Steamhammer lost despite its advantage because it didn’t tech fast enough: The lair finished, and it decided to build a macro hatchery before the spire; the spire finished and it decided to build a third base before any mutalisks.... That’s not the way to play!

Steamhammer 1.4 is frozen

Steamhammer is feature-frozen for the 1.4 release. I will keep on fixing bugs and minor weaknesses (lair finished...) and tweaking the configuration, but I won’t start anything bigger that might delay the release, though there are plenty of temptations. I should be able to upload within a day after SSCAIT reopens for submissions, and release after another day or so once I can see that I didn’t break anything obvious.

Steamhammer 1.4 is substantially stronger than the tournament version in some ways. I think it is especially improved at transitioning out of the opening in ZvP. It will start off slowly on SSCAIT, because it will have to learn for itself how to beat a number of opponents that it currently beats by hand configuration. Also the opponent model has weaknesses and blind spots that are sure to cause some surprises. Still, if I didn’t mess anything up, Steamhammer should at least keep up with the competition and maintain a place near the top. Its play continues to grow more complex and interesting.

the game info display

As an example of a minor change, I have updated the game info display that Steamhammer can draw in the upper left corner.

new game info display

If we have decided to steal gas, whether in the opening build order or in the opponent model’s auto gas steal decider, the strategy line adds + steal gas.

Opp Plan is the opponent’s plan. It is often Unknown. If the opponent model has predicted the opponent’s plan and the plan recognizer hasn’t verified it yet, we get the orange word expect. The opponent model tends to be overconfident about what it expects. When the plan recognizer thinks it knows what is going on, expect disappears and the recognized plan shows. The plan recognizer also relies on inadequate information, but at least it looks first. Steamhammer uses both the expected and the recognized plans, to a limited degree, for different purposes and with different levels of confidence.
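The precedence between the two plan sources might look roughly like this; the function, strings, and layout are hypothetical illustrations, not Steamhammer’s actual code:

```cpp
#include <string>

// Hypothetical sketch of the Opp Plan display logic described above:
// a plan verified by the plan recognizer takes precedence over the
// opponent model's unverified expectation.
std::string oppPlanDisplay(const std::string & recognizedPlan,
                           const std::string & expectedPlan)
{
    if (recognizedPlan != "Unknown")
    {
        return recognizedPlan;              // the recognizer has verified a plan
    }
    if (expectedPlan != "Unknown")
    {
        return "expect " + expectedPlan;    // "expect" is drawn in orange
    }
    return "Unknown";
}
```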

The Time: line gives the game time in frames and in minutes:seconds. The minutes:seconds display is both more compact and easier to read than the original XXm YYs display. Then come the mean and maximum time spent per frame, in milliseconds, so we can start to get an idea of when things are slow.
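The mean and maximum per-frame times can be tracked with a small accumulator along these lines; this is a generic sketch, not Steamhammer’s implementation:

```cpp
#include <algorithm>

// Generic sketch: accumulate per-frame times to report mean and maximum
// milliseconds per frame, as in the game info display.
class FrameTimer
{
    double totalMs = 0.0;
    double maxMs = 0.0;
    int frames = 0;

public:
    void record(double frameMs)
    {
        totalMs += frameMs;
        maxMs = std::max(maxMs, frameMs);
        ++frames;
    }

    double meanMs() const { return frames ? totalMs / frames : 0.0; }
    double maximumMs() const { return maxMs; }
};
```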

what opponent modeling skills does CherryPi have?

CherryPi has a striking habit of barely scouting before concluding that it knows what the opponent is going to do, and then seeming reluctant to ever change its mind. A pro player will often also barely scout, but pros probe for new information later in the game and are ready to draw new conclusions. I thought the CherryPi-TyrProtoss match from the SSCAIT 2017 round of 16 was a tantalizing example (see the video). It doesn’t tell us how CherryPi works, but it offers a small hint.

In the first game Tyr opened with 2 gateways for early pressure. CherryPi scouted the 2 gateways with a drone at about 1:50 into the game and reacted somewhat logically with an array of sunkens, behind which it built up a strong economy. Tyr saw the sunkens and did not seem to react to them at all, strategically. Protoss should have expanded immediately and taken steps to prevent zerg from expanding further. Tyr did neither, and it lost.

Seeing the opponent’s build and reacting is nice, but it’s not a special skill. In this game, CherryPi did well but didn’t show anything special in terms of strategy.

In the second game we saw something that might be more interesting, but it’s still unclear. At about 2:10 the scouting drone saw a forward pylon and immediately returned home without looking further. When I first watched the replay in OpenBW I thought that CherryPi had left before seeing the forge warping in behind the drone, but when I watched the replay in StarCraft with only CherryPi’s vision turned on I saw that it did just catch the forge starting. Still, the fact that the drone turned around immediately suggests that CherryPi had seen enough; the forward pylon was all it wanted to see to understand the opponent’s build.

CherryPi reacted with 3 hatcheries before pool, which is safe versus forge expand but allows good responses from protoss too. I imagine the CherryPi team chose the build because it’s unusual in that situation and a protoss bot might not know a good reaction (though any human player would have an idea). In this case they were right, and CherryPi got ahead and won.

What do you think happened? Did CherryPi see the forge and know there couldn’t be a gateway yet, so it could safely play a slow build? Or did CherryPi take a leap in the dark and make its decision after seeing only the pylon? Nothing stops protoss from building 2 gateways at the forward pylon and making an aggressive rush—a pro is more likely to do that than to build the 2 gateways in the main. The advantages are that the rush distance is shorter and it protects the natural.

opponent modeling skills

I don’t know what CherryPi is doing. Maybe there’s discussion about it somewhere, which I haven’t seen. But I can’t help comparing it to Steamhammer’s opponent model. When all the intended features of the opponent model are implemented, Steamhammer will be able to see the pylon and immediately conclude “I’ve seen you do that before, it was the start of a forge expand opening and you’re probably playing it again. Let’s counter the forge expand.” I suspect that may be what CherryPi did—probably not in exactly the same way, but maybe in a way that’s broadly similar.

Bots tend to be predictable, and their opponents can take advantage. It’s one of the ideas behind Steamhammer’s opponent model. Seeing a forward pylon narrows down what the opponent is doing, but doesn’t zero in on one possibility. But if the opponent tends to continue the same way as in the past, you can act as though there were only one possibility and start to counter it earlier, gaining an advantage. (Against a forge you make drones, against gateways you make zerglings and a sunken. If you make unnecessary zerglings, you set yourself back.)

The development Steamhammer version can already do this, in a limited way against a random opponent. For example, against UAlbertaBot, Steamhammer says “this is probably a heavy rush (with zealots or marines), so I’ll prepare for that.” If it finds out that UAlbertaBot rolled zerg, it immediately (well, within 8 frames) realizes “uh oh, I was wrong, it is going to be a fast rush which has a different counter.” It doesn’t wait to see early zerglings, or a spawning pool, or a drone count, it immediately starts to adapt its build. The exact reaction depends on the timing, but there is code that says, for example, to cancel the second hatchery if it will allow a spawning pool to get up faster. By reacting immediately, Steamhammer has a better chance to survive despite its weak defensive skills.
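The race reaction could be sketched roughly like this; the type names, plan strings, and per-race table are illustrative assumptions, not Steamhammer’s code:

```cpp
#include <map>
#include <string>

enum class Race { Unknown, Terran, Protoss, Zerg };

// Illustrative sketch: expected plans learned per enemy race from past games.
// As soon as a random opponent's race becomes known, re-predict the plan from
// that race's statistics instead of the overall default.
struct OpponentModelSketch
{
    std::map<Race, std::string> expectedPlanByRace;
    std::string defaultPlan = "Heavy rush";

    std::string expectedPlan(Race knownRace) const
    {
        auto it = expectedPlanByRace.find(knownRace);
        if (it != expectedPlanByRace.end())
        {
            return it->second;   // race known: use the race-specific prediction
        }
        return defaultPlan;      // race still unknown: use the overall expectation
    }
};
```

In this sketch, scouting a zerg unit would flip the expectation from "Heavy rush" to whatever was recorded for zerg, and the build reactions would follow immediately.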

The opponent can thwart the opponent model, at least to an extent, by being genuinely unpredictable. That’s a countermeasure. Steamhammer’s opponent model should force top bots to vary their play more. That’s another idea behind the opponent model.

early experience with Steamhammer’s new opening selection

I’ve been testing out Steamhammer’s new opening selection algorithm, part of the opponent model.

CasiaBot plays a hand-made anti-Steamhammer build, sunken turtle into fast spire. (It does this 80% of the time, and 9 pool speed the other 20%. It’s hard to counter both at once.) Steamhammer’s strategy reactions to it are in the right direction, but are too slow and sloppy. For SSCAIT 2017, I hand-made an anti-CasiaBot build that holds off the mutalisks with a slightly faster spire and gets a second gas to win with numbers, while preventing CasiaBot from ever expanding. CasiaBot’s build is hard for Steamhammer to beat, much tougher than CherryPi’s similar turtle build. But the counter usually works, and Steamhammer scored 2-0 in the round robin.

CasiaBot was a simple first test. In the first game, Steamhammer made its usual random opening selection and lost to the turtle build. But the plan recognizer recognized the build. In the second game, Steamhammer played its counter and won. The system worked.

I didn’t test against CherryPi, but I would expect this sequence as the most likely: Steamhammer wins the first game, because its random opening selection usually beats CherryPi’s first ZvZ opening. CherryPi switches to the turtle build for the second game and evens it at 1-1. (That’s what happened in the round robin.) Steamhammer switches to the counter-turtle build and pulls ahead again to 2-1. I don’t know what happens after that. CherryPi might have another winning build and even it again at 2-2, which would set up a learning race.

UAlbertaBot is a much tougher test because it plays random, and Steamhammer doesn’t have the smarts to fully adapt to each of the strategies UAlbertaBot follows. My hand-made counter scores about 2/3, almost always winning against zerg, mostly winning against terran, and usually losing against the protoss zealot rush.

I turned off the hand-made answer and let Steamhammer learn. In the first game, UAlbertaBot rolled terran and lost to Steamhammer’s random choice. In the second game, UAlbertaBot was protoss and Steamhammer played a counter to the terran opening it had seen: Steamhammer doesn’t need to lose to learn. Stopping the marines and stopping the zealots can be done in a similar way, and Steamhammer won again. Then UAlbertaBot rolled zerg twice in a row, and Steamhammer mis-countered both times. In game 5 UAlbertaBot was protoss again, while Steamhammer (having been hit with 5 pool twice in a row) thought its opponent had switched to fast rushes and played to stop the early zerglings, losing again. But at the end of 15 games the score was 9-6, not distinguishable from the 2/3 winning rate of my hand-made counter. The openings played were completely different, most wins were against protoss and terran instead of zerg and terran, and the reasoning behind the choices was the practical “this is what I saw, now beat it” instead of the theoretical “this is what ought to work” of my hand-made counter, but the end result was the same. Good enough!

Only longer experience will show how well the system works in the wider world of the SSCAIT ladder, and whether it can hold up when bot authors look for and exploit its weaknesses. I have thought of a lot of ways it might break down. I know it doesn’t work against Juno by Yuanheng Zhu. But I have also thought of a lot of ways to improve it. I have ideas in mind for the plan recognizer, the opening selector itself, and the reactions to recognized plans. A lot of information is not exploited yet. Some ideas I will get to before release, some will wait for later versions in the 1.4.x series, and some will wait a long time.

Near the 1.4 release I’ll write a description of how it works, up to date with the release. There are several working parts, but for what it does, it’s not complex. It acts for all 3 races, retains configurability, allows for random choice of openings in every situation if the author wants, and expects that some opponents will change their behavior over time. I imagine that getting all these features is simpler than you expect.

solid versus daring

A game player of a given strength is solid if it wins reliably against weaker opponents, and daring if it loses more games to weaker opponents and makes up for it by winning some against the stronger. I think the term solid is common. I decided for myself that its opposite should be daring.

The idea applies to all games of skill with winners and losers. You can always find more solid and more daring players, unless the game is so constraining that it leaves no room for stylistic differences. From the point of view of a player with a fixed level of skill, you could say that being solid means that your style of play aims to reduce your risk of losing, while playing daringly means you try to increase your chance of winning. From the point of view of an author, you could say that trying to make your bot more solid means working to reduce exploitable weaknesses that cause losses, while trying to be more daring means creating strengths that will catch out some opponents (like timing attacks or unusual rushes or tech switches). It makes sense for authors of weak bots to focus on daringly beating the stronger, and authors of strong bots to solidly beat the weaker. (Of course it also makes sense to do whatever is more fun.)

I’ve never seen a statistical measure of solidness, in the same way that Elo is a statistical measure of strength. It seems widely useful, so I hope somebody has worked one out, or will work one out now that they know about it. A good one seems complicated, though. You could do something like estimate the winning chances each player has against each opponent with a method like that of bayeselo, then try to fit a measure of deviation from flatness over the range for each player. Does the difference between predicted and measured winning chance vary systematically depending on the predicted winning chance?

Here’s one simple measure for the top finishers in the SSCAIT round robin: What proportion of a bot’s losses came against the top 16? If most losses are against strong opponents, the bot is solid. The measure is approximately statistically fair only for the top few bots. We can see that Iron is solid and Tscmoo and McRave much more daring, while Killerbot and Bereaver are more solid than Tscmoo and McRave. I don’t think this number gives us much insight into whether Iron is more solid than Bereaver.

#  bot        top 16  loss rate %
1  Iron        7/10       70%
2  Tscmoo      4/14       28%
3  McRave      5/15       33%
4  Killerbot   9/19       47%
5  Bereaver   11/22       50%

Another simple measure for the stronger bots is: What’s the weakest opponent that you lost to in the SSCAIT round robin? The measure will be noisy, and comparisons only work for players that are close in strength. Also extremely daring lower-rank players like Oleg Ostroumov can distort it. But it’s quick to figure out and that counts for a blog post. I read the results from the unofficial crosstable.

#  bot          worst loss
1  Iron         #31 PurpleCheese
2  Tscmoo       #56 NUS Bot
3  McRave       #69 FTTankTER
4  Killerbot    #60 Oleg Ostroumov
5  Bereaver     #35 Dawid Loranc
6  Steamhammer  #44 Lukas Moravec
7  Wuli         #61 Marine Hell
8  CherryPi     #60 Oleg Ostroumov

My feeling is that Killerbot and Wuli are more solid than this noisy measure gives them credit for, and otherwise the numbers give a rough but fair idea. Iron is more solid than Tscmoo or McRave. Bereaver and Steamhammer are more solid than, say, McRave and CherryPi. In Steamhammer I’ve worked toward solidness, so I’m pleased to have it.

close game Steamhammer-LetaBot

I don’t want to write up any games that might be from the SSCAIT elimination phase, since it wouldn’t be polite to scoop the official announcement. But there are some good ones. In particular, CherryPi is starting to show its full opponent modeling skills, which are more sophisticated than we could see in the round robin phase with only 2 games against each opponent.

So here is a close game from the round robin, Steamhammer versus LetaBot by Martin Rooijackers. Steamhammer randomly chose a 1-hatchery lurker rush, a risky opening which often beats LetaBot quickly. This time, Steamhammer got distracted chasing the scouting SCV and put on no pressure with its early zerglings, allowing LetaBot to bunker safely at the front of its natural instead of in its main. Often LetaBot overreacts to the threat of the early zerglings, which is actually slight, and leaves itself weak to the lurkers. Here it defended nicely.

marines and blood

Well, not quite nicely. A bunch of marines stood in front of the defenses and were slaughtered by lurkers outside detection range of the turret. But Steamhammer is no smarter. As soon as the way to the bunkers was clear, zerg became overaggressive with the lurkers and lost them quickly. One lurker stayed outside bunker range and drained terran minerals into repair for a while, but as soon as marine range research finished in the academy, it died too. Steamhammer had rushed to lurkers with a weak economy (see the worker counts in the picture), so after an even-ish combat outcome, terran was ahead.

Followup lurkers behaved the same, killing infantry that placed itself needlessly in danger, then placing themselves needlessly in danger and dying. LetaBot expanded much later than it should have, letting zerg catch up in economy. But Steamhammer had been frittering its army away while LetaBot continued to build up despite losses, so terran was still ahead.

Finally the terran push came. The 4 sunkens and small zerg force can only delay the inevitable; the natural will definitely be lost. Zerg has 4 bases and an adequate economy, so it can attack the terran ball from all sides, but chances to save the game seem small.

the natural about to come under siege

The rear-placed sunkens were highly effective, because LetaBot assaulted them without regard to losses. Marines funneled between the buildings, suffering both sunken hits and splash damage from tank fire. Scattered zerglings ran in from every direction, but SCVs kept the tanks repaired. Steamhammer kept sending more drones to mine the natural gas, while LetaBot had expanded and added to its SCV count, so zerg was falling behind in economy again.

A small number of terran reinforcements were intercepted by zerglings taking a strange path across the map. The terran attack started to become disorganized, with some units running into the zerg main before the natural was reduced. In the picture, the spread-out tanks are under attack from both ends, and in the minimap is an engagement between zerglings and reinforcing marines, preventing the marines from joining up promptly.

terran is disorganized

The zerg army remained tiny, but it was LetaBot’s turn to rush in pell-mell without organizing its forces. Without marines supporting the tanks, and with adrenal gland research finished to increase the zergling attack rate, the small number of zerglings finished them off easily.

With the slow-to-replace tank mass destroyed, Steamhammer had enough forces to stop any followup attacks. The turnabout was sudden. Zerglings broke into the terran natural while the double bunkers were unoccupied, and simultaneously 1 lurker and a handful of zerglings erased the terran third. There was a little more drama with a last-ditch battlecruiser as Steamhammer was slow to deliver the finishing blow, but in the end an unnecessarily massive zerg force battered down the undefended terran buildings.

It was another narrow comeback. Those make fun games.

Next: Analysis of solid versus daring bots.

Steamhammer opponent model status

The opponent model can (sometimes) select openings in the development Steamhammer. That’s the key feature I wanted, so I should be able to release Steamhammer 1.4 shortly after SSCAIT ends, as I hoped. After some thought, I found a simple way to avoid the big refactoring I had expected.

It’s rather nice to see Steamhammer lose the first game, or sometimes the first several games, then realize what’s going on and abruptly counter the opponent’s play. It tries to predict the opponent’s plan. So far the prediction is made in an extremely simpleminded way, but I may improve it before release. Of course it is easier to predict an opponent that plays a fixed strategy than a learning bot, so I expect the skill to be less effective against the top bots. Predicting the opponent’s plan lets Steamhammer choose more extreme openings that are risky against an unknown opponent. Today I added a ZvZ_12Hatch opening, which is unsafe against many bots but is a strong counter to the favorite strategies of AILien and a few other zergs.
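To illustrate just how simpleminded a first-cut prediction can be while still working against fixed-strategy opponents, here is a hypothetical sketch. The plan labels and data shape are invented for the example, not Steamhammer's actual code: predict whatever plan the opponent has shown most often in past games.

```python
from collections import Counter

def predict_plan(past_plans, default="Unknown"):
    """Predict the enemy plan as the one recognized most often so far.

    past_plans: list of plan labels recognized in previous games
    (illustrative labels like "Fast rush", "Heavy rush").
    """
    if not past_plans:
        return default
    # most_common(1) returns [(plan, count)]; take the plan
    return Counter(past_plans).most_common(1)[0][0]
```

Against a bot that always plays the same strategy, this converges after one correctly scouted game; against a learning or randomizing opponent, it is naturally much weaker.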

I expect to remove most of the enemy-specific strategies that are configured for various opponents. There are way too many of them. Steamhammer will finally have the ability to adjust its play when an opponent starts playing differently. Some enemy-specific configuration will remain for this version because the opponent model is not smart enough yet. I think I will let Steamhammer learn from scratch rather than feeding it artificial “learned” data to replace the manual configuration. That will depress its results against the rushbots until it learns about them, but it will be a good test of how well the opponent model works. Steamhammer will be smarter, but at first it will look stupider.

As always, my list of bugs to fix and improvements to make is longer than I can possibly work through. CherryPi has a whole team that still can’t fix all its bugs. I should be able to make some progress before the end of the tournament, though.

One of Brood War’s permanent irritations is that updates to the game engine tend to break saved replays from previous versions. There is no backward compatibility. I have decided to add a version number to Steamhammer’s saved opponent data files. I hope that I’ll be able to release future versions with different file formats that can still make use of the learning data from older versions. It would be no fun to have to relearn the opponents from scratch because an improved version came out.
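A minimal sketch of the versioned-file idea, with invented names and an invented format (Steamhammer's real file layout is not specified here): write the version number into the data, and dispatch on it when reading, so old files can be upgraded rather than discarded.

```python
import json

FILE_VERSION = 2  # illustrative; bump whenever the format changes

def serialize_opponent_data(records):
    """Tag saved opponent data with its format version."""
    return json.dumps({"version": FILE_VERSION, "records": records})

def deserialize_opponent_data(text):
    """Read data written by any known version of the format."""
    data = json.loads(text)
    version = data.get("version", 1)  # versionless files count as version 1
    if version == 1:
        # upgrade hook for the old format (a no-op in this sketch)
        return data.get("records", [])
    return data["records"]
```

The payoff is exactly the one described above: a new release can keep learning from where the old one left off.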

aside

Speaking of SSCAIT, since the round robin phase ended, games that look like they’re from the round of 16 have been mixed into the usual random game list. It worked the same last year. I watched some of these games from the replay page, but Steamhammer’s apparent round of 16 games are omitted from the page, no doubt due to some glitch. I save all of Steamhammer’s replays. Tournament replays are especially interesting, so it will be a shame if some of them aren’t available.

SSCAIT 2017 round robin results

The SSCAIT 2017 round robin phase has finished. See the official results and the unofficial crosstable. The unofficial crosstable seems to include a few extra games; I guess there’s a small leak in the pipeline. I have a few thoughts about the results.

Of the top 5, McRave is a newcomer this year and the rest are the old guard: #1 Iron, #2 Tscmoo random, #3 McRave, #4 Killerbot by Marian Devecka, #5 Bereaver. Killerbot and Bereaver weren’t updated this year and couldn’t quite keep up with the best, but remain tough opponents. It still takes a long time to produce a strong bot.

The results were influenced by the long tail of weaker bots which brought the tournament up to 78 participants. With many weaker opponents, the top players benefit from solid play, avoiding the risk of losing. Bots with daring play, which score well against strong opponents but lose to some that are weaker, were at a disadvantage. #1 Iron is the most solid bot: Look at the crosstable and see its row of 1-1 results against its strongest opposition; it more than made up for those losses with extreme consistency in defeating the lower ranks (the weakest bot it lost a game to was #31 PurpleCheese). Tscmoo in contrast scored well against top opposition, but had more losses to the long tail. I will try a little more analysis of the solid/daring tradeoff in another post.

#7 Wuli is hanging in there. The hard zealot rush is still a successful strategy, and it executes well.

#8 CherryPi remains an interesting case. It also suffered from its daring play. To my eye, it seemed to be learning something about each opponent from the first game, and applying it in the second. As the tournament continued, it surged higher in the ranking. How high might it have finished in a very long tournament? It would be interesting to count how many times it scored a loss then a win versus win then loss in the 2 games against each opponent: A high ratio of loss-win over win-loss indicates the ability to learn from a single game. But it might not be so clear; against an opponent that also learns like McRave, or that changes its play up like Steamhammer, what CherryPi figures out from its first game might lead it astray in the second (I think that happened in the second McRave-CherryPi game).
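The proposed count is easy to compute from the results. A sketch, assuming each opponent contributes one pair of booleans (won game 1?, won game 2?):

```python
def learning_signal(series):
    """Count loss-then-win vs win-then-loss across 2-game series.

    series: list of (game1_won, game2_won) pairs, one per opponent.
    A high loss_win count relative to win_loss suggests the bot
    learned something useful from the first game.
    """
    loss_win = sum(1 for g1, g2 in series if not g1 and g2)
    win_loss = sum(1 for g1, g2 in series if g1 and not g2)
    return loss_win, win_loss
```

As the text notes, the signal is muddied when the opponent also learns or varies its play between games.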

Microwave, Neo Edmund Zerg, and TyrProtoss tied for places #9-#11, each with 31 losses. I had expected Microwave to do a little better, but I think it relies on its opening learning, and it hadn’t played all the opponents before so it didn’t know enough. I had expected the rushbot Neo Edmund Zerg to do a little worse, but the many newcomers of course all fell to its rush.

My predictions for the tournament turned out reasonably good (except for the glaring mistake of not noticing that the tournament was actually a double round robin). I did not expect Tscmoo to finish so high. Steamhammer I boldly forecast to finish in the narrow range from #4 to #8, and it ended up squarely in the middle of that range at #6. I’m pleased that I understand the performance of my own bot.

the elimination phase

According to the rules, random bots will not play in the elimination phase. So Tscmoo random and Andrey Kurdiumov are excluded, and the 16 continuing to the elimination phase should be:

  1. Iron
  2. McRave
  3. Killerbot by Marian Devecka
  4. Bereaver
  5. Steamhammer
  6. Wuli
  7. CherryPi
  8. Microwave
  9. Neo Edmund Zerg
  10. TyrProtoss
  11. XIMP by Tomas Vajda
  12. Arrakhammer
  13. Skynet by Andrew Smith
  14. LetaBot by Martin Rooijackers
  15. AILien
  16. ZurZurZur or Black Crow

Last year, Steamhammer and Zia tied for places #16-#17, and played a best-of-3 tiebreaker to decide who continued. This year ZurZurZur and Black Crow are tied for #16-#17 (excluding random bots) with 108 wins and 46 losses. I hope for another tiebreaker!

Last year the pairings were #1-#16, #2-#15, and so on. It gives the top finishers an advantage over middle finishers; #8 is paired with #9 and must play a close rival. The official pairings were tweeted while I was in the process of writing the post; here they are:

SSCAIT 2017 elimination phase pairings

This is close to what I expected, but not quite the same. The tied bots Arrakhammer and Skynet were taken in reverse order from the order listed in the official results, so Steamhammer is paired against Skynet and Bereaver against Arrakhammer. Maybe the idea is to avoid Steamhammer playing against its fork Arrakhammer? Or maybe the idea is to avoid 2 mirror matchups, ZvZ and PvP? Anyway, these are acceptable pairings by the same rules followed last year, except for the unannounced tiebreaker. Maybe ZurZurZur’s 2-0 win over Black Crow in the round robin is taken to break the tie.
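For reference, the #1-#16, #2-#15 seeding scheme from last year can be generated mechanically:

```python
def seed_pairings(n=16):
    """Standard bracket seeding: #1 vs #16, #2 vs #15, ..., #8 vs #9."""
    return [(seed, n + 1 - seed) for seed in range(1, n // 2 + 1)]
```

This is the scheme that pairs #8 against #9, giving the top seeds easier first-round opponents than the middle seeds get.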

New this year is a loser’s bracket. This is now a double elimination design, where you have to lose twice to be out, and no longer single elimination. If you lose 1 match, you fall to the loser’s bracket, where you remain until you either lose a second match or win every match and win the loser’s bracket. The final is between the winner of the winner’s bracket, which lost 0 times, and the winner of the loser’s bracket, which lost 1 time. Every other bot lost twice and is out. Giving participants a second chance makes the tournament a little more fair. On the other hand, last year the elimination phase included best-of matches for the round of 4 and later, whereas this year I’m guessing that they may be single games.

I think the rules should explain the format of the tournament. There is no clear explanation that I know of.

two McRave games

Here are 2 McRave games. The first is what will probably turn out to be the biggest upset of SSCAIT 2017, and for journalistic balance (look at me! I can pretend to be objective, just like a reporter!) the second is a win over a tough opponent that has given McRave trouble.

McRave is currently at #3, and it will probably finish there. So I find it striking that both games show easy-to-notice weaknesses on both sides. All bots have a long way to go to become truly strong.

McRave-FTTankTER

As I write, McRave is #3 and FTTankTER is #69 out of 78 entrants, with fewer than 50 games remaining to play in the tournament. There are a couple of unplayed games that theoretically could unseat this one as the biggest upset, but it’s unlikely. What I find most remarkable about the game is not that the result was such a reversal, but that it came about because FTTankTER played better. McRave didn’t lose because of a bug (at least not one that I can detect) or by playing a risky strategy and getting unlucky, but because of missing skills.

McRave-FTTankTER started with McRave fast expanding behind a single gateway and FTTankTER rushing with marines.

marines arrive at the front

McRave did not make an initial zealot, but waited for its cyber core to finish so it could get straight to dragoons, the key unit at the start in PvT. Making 1 zealot slows down dragoons a trifle but adds safety against all kinds of cheeses and fast rushes, so it’s probably smart. But even without one, McRave could have held. When a small number of marines show up at your front, they are weak. Marines gain strength in numbers because they are ranged units, but workers are faster and tougher than marines without medics or stim. Just pull workers and defend until your gateways produce. Workers can easily win fights against small numbers.

Instead, this happened:

Protoss pulled probes only after losing its first gateway, when the marine numbers had grown. The probes did not try to surround marines, but mostly milled around in front of the marines as if playing dodgeball. Nearly every probe was lost before the dragoon entered the fight. McRave was too optimistic, first in ignoring the attack, and then in continuing to throw away probes. A fallback plan would be: Abandon the natural, retreat the surviving probes, wait for the dragoon, and try a coordinated probe-dragoon defense of the main.

FTTankTER is clumsy and wasn’t able to finish off its helpless opponent, but the no-kill time limit ran out and terran won on points.

I think McRave shows some wider vulnerability to marine all-in attacks. McRave-Oleg Ostroumov is an example. Since McRave has lost fewer than 10% of its games, its weaknesses are apparently not easy to exploit.

McRave-CherryPi

CherryPi won its first game over McRave when McRave played a standard forge expand. In the second game, McRave played differently and CherryPi never seemed to notice. It was still a fight, though.

When both players learn, it becomes a race to see who can learn more and faster. With only 2 games, we can’t tell how the race would have turned out.

The game McRave-CherryPi on Benzene opened with McRave building 2 gates and CherryPi playing overpool into second and third hatcheries at the natural. CherryPi droned up as if McRave had fast expanded, which it should have known didn’t happen because its zerglings made it to the protoss natural. Zerg was underdefended, and McRave’s zealots killed a couple drones in the zerg natural and started hitting buildings.

Then a sunken started and the zealots retreated for no apparent reason. Protoss should at least take swipes at the morphing sunken until zerglings appear. The protoss scout probe in the main saw the zergling count and location, so McRave could have known it was safe. In the game above, McRave was overconfident; here it is overcautious. It is a sign of not truly understanding the situation (so far, no bot does). In the picture, the zealots have just retreated.

protoss retreats unnecessarily

Wuli beat CherryPi 2-0 with its heavy rush, but McRave likes to tech faster. CherryPi added to 3 sunkens and continued drone production, still seeming to assume that McRave had fast expanded. McRave poked repeatedly at the front without committing much or achieving much; at least it impelled zerg to spend on fighting units instead of drones. McRave often had a vanguard of units doing the combat and a rear guard that stayed out of the fight. I got the impression that McRave was not hiding its strength, but was just confused.

CherryPi had mismanaged the opening and was contained. Lurkers or mutalisks might have forced protoss back, but CherryPi got the lair late and did not make either; it wants to win with low-tier units. Sticking with zerglings and hydralisks and making many drones, zerg soon needed to expand more than it safely could, and put a hatchery at the nearby mineral-only base, barely outside the containment. McRave soon scouted it—and did nothing. Protoss continued to poke at the front and ignored the third base. It could have detached a couple of rear guard zealots to take it down; zerg could have done nothing. The picture shows protoss defeating an inadvisable zerg foray near the mineral-only third. After this, McRave ignored the third and made another poke at the front (even if the bot doesn’t notice creep, protoss had seen the hatchery with a probe). In the minimap, McRave has just started its natural nexus.

smashing a zerg escape attempt

Finally McRave felt confident enough to split its forces and kill the expansion. Before it died, CherryPi started a fourth base in the lower right corner. CherryPi was ahead in workers but had only 2 mining bases, while McRave had a far stronger army and a mostly successful containment (it only leaked a few drones).

After finishing the zerg third, McRave seemed to realize how far ahead it was and broke into the natural. With drones killed and a second nexus to make more probes, McRave had effectively caught up in economy and its army was more than zerg could face. In the picture, a high templar is storming drones that decided to fight instead of running away. The drones might as well do that; the only place they could safely run away to was the main, which was already saturated.

storming drones

CherryPi did not go down easy, but protoss was too far ahead. Oddly, though McRave made many templar and they accumulated plenty of energy, that one storm was the only one in the game. The high templar stayed in the rear guard where they were too far away to contribute. Also, both bots seemed confused by the neutral building block on the map, and got units stuck behind the block. I expect that from rough bots like Steamhammer, not from polished competitors.

CherryPi showed its curious strategic rigidity, where it believes without scouting that it knows what the opponent is doing—in this case, it even scouted that the opponent was not doing what the zerg opening assumed. To me it seems strange, because in Steamhammer the first major feature I wrote was the strategy boss which solves this exact problem, and it greatly boosted zerg’s strength. McRave showed surprising caution and slowness in taking advantage of opportunities.

Steamhammer 1.4 plans

My hope for Steamhammer 1.4 is to release it shortly after SSCAIT ends. Whether it is then or later, 1.4 will come out when it is “done enough.” I have finally decided what “done enough” means.

The headline feature is of course the opponent model. I turned off the extensive game record for this version, and kept only the brief summary of each game against each opponent, largely a list of numbers like when air units came out. The game summaries are already so rich with information that it’s difficult to exploit it all.

The opponent model will initially support 3 features:

  1. The plan recognizer, to figure out what the enemy is doing. This is in the tournament version and is being improved.
  2. Specific strategy reactions to recorded information about this opponent.
  3. Opening selection based on the opponent model, the largest addition over the tournament version. My first cut at this will be simple, but it will be able to hard-counter some opponents in the second game played while still choosing openings randomly.

The opponent model will allow Steamhammer to play specialized openings that aren’t safe against the average opponent.
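To show how simple a first cut at opening selection can be, here is a hypothetical sketch. The opening names and the plan-to-counter table are invented for illustration, not Steamhammer's actual configuration: if the model confidently predicts a plan, hard-counter it; otherwise fall back to random choice.

```python
import random

# Invented mapping from a predicted enemy plan to a counter opening.
COUNTERS = {
    "Fast rush": "ZvAll_AntiRush",
    "Heavy rush": "ZvAll_AntiHeavyRush",
}
# Invented pool of openings judged safe against an unknown opponent.
DEFAULT_OPENINGS = ["ZvAll_Overpool", "ZvAll_12Hatch", "ZvAll_9PoolSpeed"]

def choose_opening(predicted_plan, rng=random):
    """Hard-counter a predicted plan; otherwise pick a safe opening."""
    if predicted_plan in COUNTERS:
        return COUNTERS[predicted_plan]
    return rng.choice(DEFAULT_OPENINGS)
```

The risk is exactly the one the post identifies: a specialized counter opening is unsafe if the prediction is wrong, so the prediction has to earn its keep.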

Choosing openings according to the opponent model is turning out harder than I expected. The decisions are not hard; I wrote a very simple first try. Refactoring so that the right information is available at the right time for each decision, that’s the tricky part. The codebase remains on a different model. I’ll have to take care to avoid bugs.

I don’t want to ignore terran and protoss. They have been falling behind in skills. Well, they’re going to keep falling behind, but more slowly. All the opponent model features are supported to some extent by every race. I have added new strategy adaptations and emergency reactions so that terran and protoss should be a little less fragile. Today I am trying to get shield batteries working and fix protoss emergency zealot production.

As usual, bug fixes are making a big contribution. The Steamhammer development version is noticeably stronger than the tournament version, to my eyes. The BuildingManager fix helps all races with macro, and the chooseTechTarget() fix (briefly mentioned here) helps zerg avoid strategy blunders.

Steamhammer 1.4 will be the first of the 1.4.x series. I want to make more progress before I move on to the next major feature.

thoughts about CherryPi

CherryPi remains interesting, although for now it still looks like Just Another Bot.

CherryPi wants to win with masses of zerglings, or occasionally hydralisks against protoss. It has some reactions, but overall tends to show a lack of strategic flexibility. It has a plan, but if the plan doesn’t work the followup tends to be slow and inadequate (compare Killerbot by Marian Devecka, which can completely switch its unit mix when a plan doesn’t work). Today’s game against LetaBot by Martin Rooijackers is an example. CherryPi opened with a fast second hatchery and no gas, to put on early pressure with masses of slow zerglings. LetaBot saw it coming and easily repelled the lings. CherryPi kept making masses of lings with few drones even after terran had medics and stim, when no quantity of slow zerglings could pose a threat. Terran played slowly and overcautiously, making easy-to-see mistakes, but it didn’t matter because the zerg strategy was inconsequential. By the time CherryPi started slowly adding mutalisks, mutalisks were also no threat. LetaBot eventually moved out and swept aside everything in its path. (Then LetaBot got stuck on the enemy ramp and crashed, or overstepped the time limit, but that’s a different lesson.)

CherryPi seems to make a lot of decisions without scouting. For example, it makes scourge (sometimes more than a little) when the enemy has no air tech. It moves its overlords, but does not send one to the enemy base. I think it is making choices based on units that it sees, especially versus protoss. But when it feels overmatched it holds its army well back from the enemy, meaning that it can’t see. Compare Steamhammer, which is aggressive and keeps its units forward even when it’s a big risk; the countervailing advantage is that it gets to see what the enemy has and what the enemy is doing.

Against zerg, CherryPi has different openings. If one loses, it tries the next. From what I’ve seen, the first opening is a 9 pool without gas, followed by a hatchery for mass slow zerglings. It’s a safe middle-of-the-road opening, or in other words halfhearted, but CherryPi loves its favorite. If that loses, the second try is sunken turtle into mutalisks, which is successful against many zergs (I think it is likely to work against Steamhammer too). There may be fancy learning going on behind the scenes, but if so we can’t see it because the tournament doesn’t have enough games.

Against protoss, I can’t detect any such progression in CherryPi’s openings. I think it’s always playing the same opening, and adapting it somewhat to the situation. It expands with 12 hatch, sunkens up its front, and makes massive numbers of drones. It’s similar to Killerbot’s plan, and a good one in general though the early sunkens are often unnecessary (and occasionally insufficient). The games against Bereaver make a good example; the first is a loss and the second a win, but zerg plays similarly in both. In the first game, notice how zerg keeps making mass drones and maintains a strong economy even as it is losing every fight, including losing its drones at a high rate. In the second game, the difference was that the players were at cross positions on a large map, and Bereaver’s corsair play and reaver drop were less effective.

CherryPi is safe against fast rushes; it has a perfect record so far against the zerg rushbots. CherryPi is vulnerable to hard rushes. Wuli won 2-0, and so did Flash which also does heavy early zealot pressure. Black Crow’s relentless zergling waves beat both the zergling opening and the turtle opening.

Overall, CherryPi has glaring weaknesses just like other bots do. But as I write, it is ranked #11, so it is strong by the standards of this tournament. I think the main source of its strength compared to other bots is the same as the main source of Steamhammer’s strength, the pressure style of play, which works because bots are better at attack than at defense. Steamhammer is ahead of CherryPi for now, because I invested effort in stability and resilience, so it loses fewer games to bugs and basic goofs. The CherryPi team is presumably investing in smarts instead, which should pay off in the long run. I haven’t seen any sign that CherryPi has particular smarts in opponent modeling—as far as I can see its opening learning is a simple algorithm, and I can’t detect anything else it might be doing—but if it does we might not be able to tell, because the tournament is not long enough.

The next AIIDE tournament may be interesting.

cheese game McRave-MadMix

Yesterday an epic, today a short sharp shock. The game McRave vs. MadMix is a brazen cheese. Why shouldn’t I build my first pylon next to your nexus? Maybe because it couldn’t possibly succeed?

MadMix placed its first pylon in McRave’s mineral line, at the corner of the nexus. I can suggest an improvement: Place the pylon slightly to the left so it blocks access to the indented mineral patch. That is called a manner pylon because it is ever so polite. A probe sent to mine there will run behind the mineral line, slowing down the opponent’s mining. If you’re going to push a pylon into the opponent’s face, you might as well make it a manner pylon, as in the famous game Bisu-Pokju from 2007.

A manner pylon is commonly worth it, given that you have an early probe in the opponent’s base, especially if (as in Bisu-Pokju) it blocks 2 mineral patches: Between trapped workers that have to escape (which I doubt bots have the knowledge to do), workers devoted to attacking the pylon, and workers sent behind the mineral line, it can slow down the opponent’s mining more than enough to make up for its cost. It occurs to me that a bot with mineral locking could avoid some of the cost provided it has the special case knowledge to avoid locking workers to the blocked mineral patch or patches. I doubt that any bot has that knowledge yet, since I’ve never seen a bot place a manner pylon! I would be interested to see how a manner pylon interacts with LetaBot’s path smoothing. If the smoothing is not smart enough, some SCVs might be unable to mine at all—I am imagining SCVs bumping against the pylon trying to follow the shortest path.
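The special case itself is easy to state: when assigning mineral locks, skip the blocked patches. A hypothetical sketch, with no real bot's data structures implied:

```python
def assign_mineral_locks(workers, patches, blocked):
    """Lock each worker to a mineral patch, skipping blocked patches.

    workers: worker ids; patches: patch ids in preference order;
    blocked: set of patch ids walled off by, say, a manner pylon.
    Workers are spread round-robin over the remaining safe patches.
    """
    safe = [p for p in patches if p not in blocked]
    if not safe:
        return {}  # nothing safely mineable
    return {w: safe[i % len(safe)] for i, w in enumerate(workers)}
```

The hard part in practice is not this assignment but detecting which patches are blocked in the first place.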

one pylon for each side

McRave assigned 2 probes to tear down the offending pylon. MadMix calmly continued the cheesemaking process, building a gateway, then replacing the destroyed pylon with 2 fresh pylons, then laying down a second gateway, all while sending fresh probes to make sure one was always on hand. McRave seemed unimpressed and carried on with its 1 gateway build.

2 gates versus 1 gate

McRave ought to have been impressed. McRave’s first zealot was out earlier, and it could have held easily with good play. But McRave didn’t seem to know how to react; as it was attacking proxy buildings and mining gas and starting its cyber core, MadMix was killing probes and pulling ahead. The proxy won. Correct play when you get proxied like this is to delay your tech until you have the situation in hand. Your opponent set itself back to perform the proxy, and you can always stay ahead in units.

Don’t blame McRave for missing knowledge. All the top bots, Iron and Tscmoo included, have knowledge gaps wide enough to drive a government cheese truck through. It was only shortly before the tournament that I added smarts to Steamhammer to react on the fly to this kind of in-base cheese (Steamhammer makes a spawning pool if it has 9 or more drones, and will cancel gas or a second hatchery if that helps it get the pool up faster). And Steamhammer doesn’t understand how to react to other proxies like Juno’s (by Yuanheng Zhu) cannon contain (it still relies on a hand-coded counter for that). Bots need a lot of knowledge and it takes a long time to acquire.
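The reaction described for Steamhammer (make a pool at 9 or more drones, cancelling gas or a second hatchery if that helps) can be sketched roughly like this. The 9-drone and 200-mineral pool numbers follow the post and Brood War costs; the 75% cancel refunds (extractor 50 minerals, hatchery 300) are standard Brood War rules, and the rest of the structure is illustrative.

```python
def react_to_in_base_proxy(drones, minerals, morphing_extractor, morphing_hatchery):
    """Emergency reaction to in-base cheese, per the rule above:
    queue a spawning pool if we have 9+ drones, cancelling other
    construction when the refund lets the pool start sooner."""
    POOL_COST = 200          # Brood War spawning pool
    actions = []
    if drones < 9:
        return actions       # too few drones to fight even with a pool
    if minerals < POOL_COST and morphing_extractor:
        actions.append("cancel extractor")
        minerals += 37       # 75% refund of the extractor's 50 minerals
    if minerals < POOL_COST and morphing_hatchery:
        actions.append("cancel hatchery")
        minerals += 225      # 75% refund of the hatchery's 300 minerals
    actions.append("build spawning pool")  # starts as soon as minerals allow
    return actions
```

A real implementation would also weigh whether the refunded drone time and larvae matter, but the core logic is just this kind of triage.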