archive by month

Steamhammer is top zerg on SCHNAIL

So, SCHNAIL believes I successfully updated Steamhammer back on 11 July, but it is still running the old version. It’s easy to tell, because of the orthogonal movement pattern and the jittering of Watch squad zerglings before burrow is researched. I’ve been too lazy to diagnose it and try again. Presumably I got something wrong in the zip file.

Nevertheless, I was surprised to see that Steamhammer is the #1 zerg on SCHNAIL as of now. Well, counting active zergs. The top zergs by elo are

1472 Steamhammer
1463 Chris Coxe (aka ZZZKBot)
1452 ZurZurZur
1440 Crona
1432 Monster
1384 Killerbot (aka Marian Devecka)

Chris Coxe is up there because of its rushy habits. I’m not sure why Monster and Killerbot are so far down; I would have expected them to do well against humans. Microwave is ranked above Steamhammer on BASIL, but below on SCHNAIL at elo 1280. In some cases, bots on SCHNAIL may be older versions. I know some bot authors update often on BASIL and rarely on SCHNAIL. Maybe they’ll read this post, update, and try to push Steamhammer back down where it belongs.

In other news, AIIDE 2021 registration is open. Steamhammer will be competing.

Steamhammer’s performance over time

Many will have missed it since the original post was almost a year ago, but today Tully Elliston commented on the Steamhammer 3.1 change list from August 2020:

Tully Elliston: Looking at BASIL win rates, it looks like SH competitive performance dropped visibly after this version.

It does look that way. Here is BASIL’s graph of Steamhammer’s elo for 2020. BASIL throws in the ratings of top bots, which by coincidence is exactly what I want here. The version in question is the red dot on 20 August (delayed from the posting of the change list due to downtime).

graph of rating over 2020

Steamhammer improved slowly but steadily until about the time that version hit the server, then more or less held steady while the top bots gradually lifted away. The cause might be the sudden ascendance of Stardust, pushing everyone else down; the theory would be that the other bots on the graph coped better with the killer dragoons. It seems plausible to me, but Stardust is only one opponent and should not have much effect. The cause might be that I had spent a year distracted by other things and worked slowly on Steamhammer. That seems more likely to me. Or it could truly be that a weakness was introduced in this version.

Notice that Steamhammer’s improvement on the graph occurred in between widely-spaced updates. In principle, there are 3 ways that can happen: 1. By chance. 2. By artifacts of the rating system as implemented, because of bots arriving and leaving. You can get elo inflation if bots arrive, lose games and fall in elo to push everybody else up, then are dropped (and BASIL has dropped a lot of bots). 3. By Steamhammer’s opening learning. I think the opening learning is most likely. That opens another hypothesis for why improvement stopped around this version: Maybe, due to weaknesses already inherent in Steamhammer from earlier versions, the learning reached a ceiling and could no longer contribute. This suggests that there may be a bottleneck weakness somewhere, and to make big progress I have to break the bottleneck.

Wah, that is a lot of hypotheses. I looked at the long-term elo graphs for a number of bots which have not been updated the whole time, and they all show elo increases. BASIL has elo inflation, which explains some proportion of the elo rise of all bots. It also means that if your elo does not increase, maybe your bot is not staying the same, but getting worse! (We could take an average of non-updated bots and subtract out their elo inflation to get an estimate of true strength over time. There is no reason to expect that the inflation is constant over time.)
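The parenthetical idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the data layout (one elo sample per time step for every bot) and the numbers are all made up, and BASIL does not provide anything in this form. Because the inflation estimate is recomputed at every time step, it does not assume that inflation is constant over time.

```python
from statistics import mean

def inflation_adjusted(bot_elo, reference_elos):
    """Subtract the average elo drift of non-updated bots from one bot's elo.

    bot_elo: elo samples for the bot of interest, one per time step.
    reference_elos: list of elo series (same length) for bots that were
        not updated during the period.
    Returns the bot's elo with the estimated inflation since step 0 removed.
    """
    n = len(bot_elo)
    if any(len(series) != n for series in reference_elos):
        raise ValueError("all series must cover the same time steps")
    adjusted = []
    for t in range(n):
        # Average rise of the frozen bots since step 0 is the inflation estimate.
        inflation = mean(series[t] - series[0] for series in reference_elos)
        adjusted.append(bot_elo[t] - inflation)
    return adjusted

# Made-up numbers, for illustration only.
frozen_bots = [[2000, 2020, 2050], [1800, 1830, 1850]]
bot = [2100, 2125, 2150]
flat = inflation_adjusted(bot, frozen_bots)
```

In this invented example the bot's raw elo rises by 50 points, but so does the average of the frozen bots, so the adjusted series is flat.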

Here is the same graph starting from 1 January 2019 and continuing until today. BASIL began a little before the start of the graph, but the early period shows startup transients as the initial elos are established, so I left it out.

graph of rating over all time

When I compare Steamhammer to Hao Pan and BananaBrain on this graph, I can make out 3 periods. From the start until about October 2019, Steamhammer was neck-and-neck with them. From then until August 2020 or so, Steamhammer remained behind them; a gap had been opened, and the gap stayed roughly constant over that time. And since that time, Steamhammer has gained elo extremely slowly if at all, and has fallen further behind. Despite bug fixes and demonstrable improvements in some points of play, Steamhammer does not seem to be improving and (accounting for elo inflation) may be deteriorating. It is consistent with the distraction hypothesis, if you assume that I still haven’t recovered... but I think I have.

I suspect that the bottleneck weakness hypothesis is true. After watching many SCHNAIL games, I’ve concluded that Steamhammer’s tactical weaknesses in the midgame are critical. It loses too many units due to bad tactical decisions, must replace the lost combat units to stay safe, and (spending on combat units instead of drones) reaches its lategame economy too late. I suspect that if I fix the bottleneck tactical weaknesses, the other improvements I’ve made will start to show.

It’s hard to be sure, though! Gotta try it and find out.

By the way, I think the big point in these graphs is the relative decline of Krasi0. Krasi0 gained slightly over time, but lost its dominance and now is only another top bot. Subtracting elo inflation, perhaps Krasi0 is no longer improving at all.

Fresh Meat game comparison

Fresh Meat was reactivated only a few days ago (see my brief writeup). Its play shows sudden switches: Compare Fresh Meat’s first game versus Prism Cactus with its second game against the same opponent. The first game looks aimless, the second shows a clear plan more-or-less cleanly executed.

Fresh Meat has been updated repeatedly in a short time, so any given play difference might be caused by a code change. But Hao Pan hinted that it records its opponent’s unit mix and plays to counter it. From his comment: “it is built on the in-game AI provided there’s no past games. This time I rewrote the unit composition record system, and am looking forward to the build orders COEP comes up with.”

The two games and the statement both say: Here we have an example of recording the opponent’s habits and figuring out how to counter them. It’s similar in concept to what I’m doing with opening data. This kind of system can learn very fast because it reacts immediately to new data without having to do any statistical averaging or other slow stuff. And it can make drastic adaptations, because the adaptation is done by reasoning—in the case of COEP, the reasoning takes the form of an evolutionary search. Its limits are set by the data it records and the capability of its reasoning system.

SCHNAIL changed map

Hmm, has somebody been playing around a bit with their SCHNAIL installation?

base with many geysers, other alterations

This is from a replay recorded on the SCHNAIL server. I noticed a few other replays with similar map... issues.

Steamhammer 3.5.1 change list

Yesterday I uploaded Steamhammer 3.5.1 to SCHNAIL, as Steamhammer, Randomhammer, and Crazyhammer. At least some games today still seem to be running the old version, though. I’m not sure how the details of updating work.

This version concentrates on changes to improve play against humans, though I expect some of the changes to help against bot opponents too. The headline skill is new static defense analysis, which should help Randomhammer’s terran and protoss against all opponents, and zerg against humans. If performance is as good as I hope, I’ll soon upload to SSCAIT too. But I expect real games to look different from my test games.

static defense

New StaticDefense module makes decisions about static defense timing and placement for all races. (Special zerg reactions to events like rushes and proxy bunkers still exist, so the new code doesn’t handle all static defense.) The former code was only for zerg. The new code is more general and capable, and simpler in key aspects. It should be easier to work with. Like the old code, it understands that ground defenses at a natural base also protect the main base behind it, and it understands that drops bypass those defenses.

One frame it examines the situation and makes a plan for what anti-ground and anti-air static defense is needed. The next frame it starts to carry out the plan, including building prerequisite tech like an engineering bay for turrets if needed. After a fixed delay, the cycle repeats. The terran plan only calls for turrets; it’s easily extended to make bunkers, but Steamhammer doesn’t have the skills to use bunkers properly as static defense (its only skill is to put marines that it already has into a nearby bunker if the enemy is nearby too). The protoss plan calls for cannons at all necessary places to defend against cloaked units, vultures, drops, and air attack. The zerg plan is comprehensive. It makes one spore at one or two selected bases to preserve overlords if needed, or the required number of spores at all bases to defend against air attack. Steamhammer should do better against mass wraiths and mass scouts, which human players occasionally go for, and better in ZvZ when far behind in mutalisks. Sunkens are concentrated at the front line base versus bots, because nearly all bots go straight there, and spread to all exposed bases versus humans, because humans like to wipe out unprotected bases first. (At some point I want to add a SkillKit skill to remember the opponent’s past behavior, so that Steamhammer doesn’t have to rely on a blanket heuristic.)
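A rough sketch of two ideas from the paragraph above: the plan/execute/wait cycle, and the sunken policy that differs for bot and human opponents. It is not Steamhammer’s code. The names, the cycle length, and the round-robin way of spreading sunkens across exposed bases are all assumptions made for illustration.

```python
PLAN_INTERVAL = 24 * 8  # hypothetical cycle length in frames

def phase(frame, interval=PLAN_INTERVAL):
    """Plan on the first frame of a cycle, execute on the next, then wait."""
    step = frame % interval
    if step == 0:
        return "plan"
    if step == 1:
        return "execute"
    return "wait"

def plan_sunkens(exposed_bases, front_base, total, versus_human):
    """Return {base name: number of sunkens} for the current plan.

    Versus bots, everything goes to the front line base.
    Versus humans, sunkens are spread over all exposed bases
    (round-robin here, which is an assumption).
    """
    if total <= 0:
        return {}
    if not versus_human or not exposed_bases:
        return {front_base: total}
    plan = {base: 0 for base in exposed_bases}
    for i in range(total):
        plan[exposed_bases[i % len(exposed_bases)]] += 1
    return {base: count for base, count in plan.items() if count > 0}
```

For example, `plan_sunkens(["natural", "third"], "natural", 3, versus_human=True)` puts 2 sunkens at the natural and 1 at the third, while the same call with `versus_human=False` puts all 3 at the natural.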

The overall effect should be that zerg is resilient in a slightly wider range of situations, while terran and protoss become better able to survive common attacks like DT rushes, mutalisks, and drops.

Detailed building placement is not improved. I did nothing with the building placer to ensure that defenses cover approaches or buildings or the mineral line, etc. Sometimes turrets are in a tight line, so that half the mineral line is overprotected and half is open. One step at a time.

• To support this, I moved getMySpireTiming() from the zerg strategy boss to the information manager. I also updated it to work in the case where the bot is making more than one spire simultaneously (which it could do if the opening build explicitly codes it). In ZvZ, if your mutas are far enough behind then you have to start preparations for a spore colony before the enemy spire finishes.

• I moved the “front point” for defense from 7 tiles away from the hatchery to 5 tiles away. It helps zerg place sunkens in a tighter group, and should not hurt terran or protoss.

squad orders

• Assign more anti-air defenders when under attack by protoss scout air units. 2 hydras per scout were not reliable enough; 3 should do it.

• Watch squad: In ZvZ, don’t watch as many bases. Expansions are few in the matchup, and the zerglings are more valuable in combat.

• Watch squad: Don’t waste a ling trying to watch a base which is covered by enemy defense, such as a sieged tank or a cannon.

• Watch squad: Smaller combat sim radius—let the enemy get closer before fleeing.

• Watch squad: An unburrowed zergling watching a base tended to oscillate around the base position, rather than sitting still, churning useless commands. When originally implemented, the Watch squad did not misbehave like that. I found 2 bugs born since then, each of which independently caused oscillation: 1. Being at the order point automatically made a unit “near the enemy,” so that it might run away a short distance though not actually near the enemy. 2. In the distances grid used for pathfinding, the grid 0 point could be slightly offset from the true goal position. I decided to allow some tolerance.
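The two oscillation fixes in the last item can be pictured like this. It is a minimal sketch, not the actual Watch squad code: the tolerance value, the function names and the three action labels are invented. The point is that "near the enemy" comes only from real enemy detection, and that "arrived" allows some slack around the order point.

```python
import math

ARRIVAL_TOLERANCE = 48  # pixels; hypothetical value

def at_order_point(unit_pos, order_pos, tolerance=ARRIVAL_TOLERANCE):
    """True if the unit is close enough to count as having arrived."""
    return math.dist(unit_pos, order_pos) <= tolerance

def watch_action(unit_pos, order_pos, enemy_near):
    """Choose what a watching zergling should do this frame.

    enemy_near must reflect actual enemies, never the mere fact of
    standing at the order point (that was bug 1 in the list above).
    """
    if enemy_near:
        return "flee"
    if at_order_point(unit_pos, order_pos):
        return "hold"  # issue no new command, so no jitter
    return "move"
```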

micro

Diagonal movement, lack of which I noted as a disadvantage in Steamhammer’s pathfinding. Units often get where they are going faster, which should help in all matchups versus humans or bots. On the other hand, I’ve noticed that the orthogonal movement can sometimes sneak past the opponent’s army and let Steamhammer make a surprise attack. There is always a tradeoff.

Try to stay out of tank range when not actively attacking. Steamhammer mostly keeps retreated units out of enemy fire, but there are exceptions. The worst was leaving ground units nearly, but not quite, outside of tank range; given time, one enemy tank could destroy an entire ground army (common versus humans who benefit from zero-APM passive defense, rare versus bots). I made 2 changes to fix it. 1. A retreating unit checks an attack map to make sure it is out of range; if not, it hasn’t retreated far enough. 2. The combat sim seeks enemies in a wider circle if there is a risk of sieged tanks, so that it doesn’t overlook the enemy and wander back into range “because it’s safe.”
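As a sketch of the two changes, assuming positions in tiles and measuring center to center (the real game measures ranges differently): the sieged tank range of 12 tiles is the commonly cited figure and is treated here as an assumption, and the safety margin, function names and use of a plain distance test instead of a precomputed attack map are all invented for illustration.

```python
import math

SIEGE_RANGE_TILES = 12   # commonly cited sieged tank range; an assumption here
SAFETY_MARGIN_TILES = 2  # hypothetical extra margin
REACH = SIEGE_RANGE_TILES + SAFETY_MARGIN_TILES

def in_tank_range(tile, tanks, reach=REACH):
    """True if any sieged tank could plausibly hit this tile."""
    return any(math.dist(tile, tank) <= reach for tank in tanks)

def retreated_far_enough(tile, tanks):
    """Change 1: a retreating unit is done only when it is out of range."""
    return not in_tank_range(tile, tanks)

def sim_radius(base_radius, tanks_possible, reach=REACH):
    """Change 2: widen the combat sim's search circle if tanks may be sieged."""
    return max(base_radius, reach) if tanks_possible else base_radius
```

With these numbers, a unit 13 tiles from a tank has not retreated far enough, while one 15 tiles away has.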

Canceling buildings under construction, as well as unhatched eggs and cocoons, when they are under attack and about to die, works again. I introduced bugs when I “improved” it, causing it to nearly always fail. Now it is both improved and reliable, which (surprising though it may be) is better than either alone. It also cancels earlier, at a predicted 5 seconds left to live, which may be too soon. I’m guessing 2 or 3 seconds might be best; it depends on how accurate the predictions are.
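The cancel decision boils down to comparing a predicted time to live against a threshold. The sketch below assumes the simplest possible prediction, current hit points divided by incoming damage per second; how Steamhammer actually predicts it is not stated, and the function names are made up.

```python
def seconds_to_live(hit_points, incoming_dps):
    """Naive prediction: remaining hit points over incoming damage per second."""
    if incoming_dps <= 0:
        return float("inf")  # not under attack
    return hit_points / incoming_dps

def should_cancel(hit_points, incoming_dps, threshold_seconds=5.0):
    """Cancel when the predicted time to live drops to the threshold."""
    return seconds_to_live(hit_points, incoming_dps) <= threshold_seconds
```

Lowering `threshold_seconds` to 2 or 3 cancels later and relies more on the prediction being accurate, which is the tradeoff described above.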

• Bug fix in handling irradiated units: They feared splashing their radiation onto zerg buildings, though radiation does not affect buildings. Buildings apparently have rad shielding. This was a minor bug with trivial effects. I found the bug while searching for known serious bugs in handling irradiated units, bugs that regularly cause severe mistakes in games, but I could not pinpoint any of them. Except for its deadly flaws, the code appears flawless.

queens

Queens are surprisingly valuable against turtling human terrans, so I put effort into improving them. But not much; these are all simple changes. Big improvements need more complicated work (but will happen in time).

• Configured to make up to 8 queens, up from the previous limit of 6.

• Get the queen’s nest a little earlier if we have reason to want queens or defilers, or a little later if we don’t. This should help Steamhammer get queens and/or hive tech sooner when needed, and delay the expense otherwise.

• Get ensnare less often in ZvT. The combat sim understands ensnare reasonably well, but the queens cast it at inappropriate times when it does no good. Broodling is more valuable with Steamhammer’s skills.

• Don’t get broodling if the enemy has too much air-to-air strength. Wraiths and corsairs love to shoot down queens.

• Versus both terran and protoss, a higher threshold to get broodling in the first place, but once the threshold is crossed, a greater number of queens to use broodling. It’s an efficiency improvement.

• Changes to the scoring for queen broodling targets: A bigger bonus for a target which is defense matrixed. A discount for a plagued target. A bonus for a target under dark swarm. A slight increase to the bonus for already being in range, so that the queen doesn’t have to move in.
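To make the last item concrete, here is a toy version of additive target scoring. Only the direction of each term comes from the change list; every number, name and the additive form itself is a hypothetical stand-in.

```python
# All weights are invented for illustration.
MATRIX_BONUS = 60
PLAGUE_DISCOUNT = 40
SWARM_BONUS = 30
IN_RANGE_BONUS = 20

def broodling_score(base_value, matrixed=False, plagued=False,
                    under_swarm=False, in_range=False):
    """Score one candidate broodling target; higher is better."""
    score = base_value
    if matrixed:
        score += MATRIX_BONUS
    if plagued:
        score -= PLAGUE_DISCOUNT
    if under_swarm:
        score += SWARM_BONUS
    if in_range:
        score += IN_RANGE_BONUS
    return score

def best_target(candidates):
    """candidates: list of (name, base_value, flags dict). Returns the best name."""
    return max(candidates, key=lambda c: broodling_score(c[1], **c[2]))[0]
```

Given two otherwise equal tanks, one defense matrixed and one plagued, `best_target` picks the matrixed one.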

zerg

Defense against proxy cannons: Attempt to exploit the sunken range bug. This is one of 2 main expert defenses against cannons behind the minerals (the other is to push workers through the minerals to fight). If it can, Steamhammer will place a sunken which is in range of a pylon and out of range of the cannon or cannons that the pylon powers. When the cannon fires on the hatchery (or on anything else), the Brood War bug will cause the sunken to target the cannon even though the cannon is out of range and cannot fire back at the sunken. Use of this bug seems to be universally legal. If it works as intended, it should stop many tries to put cannons behind the minerals (if the cannons are too early, the hatchery will instead be canceled, or never started, and Steamhammer will have to destroy or play around the cannons).

No bot yet has tried to cannon behind Steamhammer’s minerals. Human protoss players do it often. It doesn’t always win, but with the right followup it’s effective.
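The placement condition for the sunken can be written as a simple filter over candidate tiles. This sketch assumes positions in tiles, center-to-center distances, and a range of 7 tiles for both sunkens and cannons; the real game measures range between unit edges, so treat the numbers and names as illustrative only.

```python
import math

SUNKEN_RANGE = 7.0  # tiles; assumed for illustration
CANNON_RANGE = 7.0  # tiles; assumed for illustration

def exploit_tiles(candidates, pylon, cannons,
                  sunken_range=SUNKEN_RANGE, cannon_range=CANNON_RANGE):
    """Tiles where a sunken reaches the pylon but no cannon reaches the sunken."""
    return [
        tile for tile in candidates
        if math.dist(tile, pylon) <= sunken_range
        and all(math.dist(tile, cannon) > cannon_range for cannon in cannons)
    ]

# Pylon at x=10 with one cannon behind it at x=14.
spots = exploit_tiles([(0, 0), (4, 0), (8, 0)], pylon=(10, 0), cannons=[(14, 0)])
```

Here only the tile at x=4 qualifies: x=0 is too far from the pylon, and x=8 is inside cannon range.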

• Defense against proxy cannon pushes: More often place multiple sunkens versus surrounding cannons. Opponents, both humans and MadMixP, are increasingly creeping cannons around the edges of the defense zone of Steamhammer’s single anti-cannon sunken.

• A high score for a defiler to place dark swarm over burrowed lurkers, and a lower score over unburrowed lurkers. A minor change.

• Adjust defilerScore higher against protoss dragoons. That will make Steamhammer get defilers earlier—see the item about the queen’s nest timing above under queens.

The remaining zerg items are adjustments to the unit mix scoring. I adjusted existing scoring terms to reflect results against humans on SCHNAIL, adding only one new term. Human players pose unit mix problems that bots do not.

• ZvT unit mix adjustments: Wraiths more encourage hydras. Valkyries more strongly discourage guardians and encourage devourers.

• ZvP unit mix adjustments: Lower global bias toward ultralisks. All protoss air encourages hydras. Corsairs and scouts more discourage guardians and encourage devourers. Carriers discourage devourers. The new scoring term is that merely having a stargate (which could be an inferred stargate; it doesn’t have to be directly scouted) discourages guardians. Steamhammer has been making too many guardians against protoss, and uses them in a way that’s strong enough against bots, but fatally weak against humans.

• ZvZ unit mix adjustment: Guardians and devourers are both more discouraged in general. They should be rare.
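As an illustration of how scoring terms like these combine, here is a toy version of the ZvP adjustments. The directions of the terms follow the list above; the weights, the unit names as dictionary keys, and which units count as "protoss air" are assumptions, and the ultralisk bias is left out.

```python
# Hypothetical weights; only the signs follow the change list.
AIR_TO_HYDRA = 2
FIGHTER_TO_GUARDIAN = -3
FIGHTER_TO_DEVOURER = 2
CARRIER_TO_DEVOURER = -2
STARGATE_TO_GUARDIAN = -5

PROTOSS_AIR = ("corsair", "scout", "carrier", "arbiter")  # assumed list

def zvp_adjustments(enemy, has_stargate):
    """enemy: dict of unit name -> count. Returns score deltas per zerg unit."""
    air = sum(enemy.get(name, 0) for name in PROTOSS_AIR)
    fighters = enemy.get("corsair", 0) + enemy.get("scout", 0)
    carriers = enemy.get("carrier", 0)
    delta = {"hydralisk": 0, "guardian": 0, "devourer": 0}
    delta["hydralisk"] += AIR_TO_HYDRA * air
    delta["guardian"] += FIGHTER_TO_GUARDIAN * fighters
    delta["devourer"] += FIGHTER_TO_DEVOURER * fighters
    delta["devourer"] += CARRIER_TO_DEVOURER * carriers
    if has_stargate:
        # The one new term: a stargate alone, seen or inferred, counts.
        delta["guardian"] += STARGATE_TO_GUARDIAN
    return delta
```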

openings

• In many terran and protoss openings, replaced go scout once around with go scout, meaning that the early worker scout stays inside the enemy base if it can. Compared to zerg, terran and protoss have more workers and can better afford to dedicate one to looking at stuff.

Fresh Meat is back

Hao Pan’s zerg variant of his terran bot Halo, Fresh Meat, is also back. Fresh Meat’s big claim to fame is that it uses COEP for production decisions. I wrote up COEP in 2019, where Hao Pan left a comment about the Fresh Meat version of that time.

COEP is a search method. Like any search method, it relies on you to provide a game model (the “forward model”) to answer the question “what happens if I do this?” and an evaluation function (a “fitness function”), to answer “and is it any good?” Results are shaped by the method, but depend mostly on the model and the evaluator you provide: A search method amplifies the smarts of its model and evaluator by using them to look into the future, but you need to provide those smarts in the first place, as a seed for the search to plant. (In COEP the seed grows into a lawn rather than a tree, but whatever!)
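The division of labor between search method, forward model and fitness function can be shown with a toy. To be clear, this is not COEP and has nothing to do with Fresh Meat’s code: it is plain mutation hill climbing over a made-up two-action economy, with invented costs and weights. The search loop knows nothing about the game; all the "smarts" live in `forward_model` and `fitness`.

```python
import random

ACTIONS = ("drone", "zergling")

def forward_model(plan, drones=4):
    """Toy model answering: what happens if I do this?"""
    minerals, army = 0, 0
    for action in plan:
        minerals += drones  # each drone mines 1 mineral per step
        if action == "drone" and minerals >= 5:
            minerals -= 5
            drones += 1
        elif action == "zergling" and minerals >= 3:
            minerals -= 3
            army += 1
    return drones, army

def fitness(state):
    """Toy evaluator answering: and is it any good?"""
    drones, army = state
    return 2 * army + drones

def evolve(length=10, generations=200, seed=0):
    """Mutate one action at a time and keep the child if it is no worse."""
    rng = random.Random(seed)
    best = [rng.choice(ACTIONS) for _ in range(length)]
    best_score = fitness(forward_model(best))
    for _ in range(generations):
        child = list(best)
        child[rng.randrange(length)] = rng.choice(ACTIONS)
        score = fitness(forward_model(child))
        if score >= best_score:
            best, best_score = child, score
    return best, best_score
```

Change the weights in `fitness` and the same search produces a different plan, which is the sense in which results depend mostly on the model and the evaluator.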

I was not impressed with Fresh Meat’s production decisions. It likes an early spawning pool—sometimes 4 pool, 5 pool, 7 pool, sometimes later—and makes early zerglings when it feels like it, or not when it doesn’t. It likes drones and sunkens. I haven’t seen it visibly react to enemy air units. It often lets its minerals run up to a high level, a very bad sign. I’m speculating, but the play is similar to what I’d expect if the model and evaluator were trained by play against the built-in AI. Could that be it? If so, perhaps it is learning on the ladders and will improve over time.

hard-fought Steamhammer game

Steamhammer (as the random-build Crazyhammer) played a particularly difficult game yesterday on SCHNAIL: Crazyhammer versus kkt2108 (T) at server time 21-07-08 18:28:46. As background, this terran player had just lost a couple of games to randomly-selected zergling busts with different timings, and apparently decided that it would not happen again. It didn’t!

picture showing the end result of hard fighting

I trimmed out the minimap and everything below, so that you can’t tell who’s winning. But see the signs of tough combat: The zerg natural being replaced, the rubble of terran buildings at its entrance (at the very bottom edge of the picture), the zerg main nearly mined out and with destroyed buildings, a couple of terran stragglers still not ejected, and few zerg combat units visible though apparently zerg survived the battle that just finished.

Don’t miss the simultaneous attacks by both sides at the end.

KangarooBot is back again

Random KangarooBot’s orbit has brought it around to Earth again. I mean, it plays random, and its play is kind of random. The first item in the README change list is “Spell caster class file got sued and works 35% of the time, everytime.” From a short look, it seems that first of all the author is having as much fun as ever, and as a secondary minor matter, the bot is playing somewhat better—except as terran, its terran has no desire to live, though it does put up a pretense by building a cluster of bunkers. One of its better games is this one vs Hannes Bredberg (don’t miss the command center infestation).

KangarooBot on SSCAIT
KangarooBot on BASIL
KangarooBot’s repo on github
brief KangarooBot post from August 2020
brief KangarooBot post from March 2020

Steamhammer’s ZvP win rate

Steamhammer’s BASIL win rates by matchup surprised me:

matchup   %
ZvT       62%
ZvP       60%
ZvZ       61%

What the what?!? The numbers are indistinguishable! That’s not what I expected at all! Steamhammer has always been worse at ZvP, and always by a wide margin. What happened? I went so far as to suspect a bug in BASIL and compared other bots to see if anything was glaringly suspicious, but no, nothing was obvious.

Here are the same matchups as played by Randomhammer, which plays zerg identically except for what it may have learned.

matchup   %
ZvT       63%
ZvP       65%
ZvZ       78%

Randomhammer plays weaker opponents because it is lower ranked, so its zerg rates are higher. ZvZ stands out as much stronger; the other two are again virtually equal. Compare the rates from February this year, when ZvT and ZvZ were close and ZvP was waving to them from far below.

What changed in the last few months? Steamhammer or its BASIL opponents? I’m sure the higher ZvZ win rate for Randomhammer is due to fewer ZvZ games versus Monster and Crona. But the fact that Steamhammer and Randomhammer have similar ZvP rates despite facing a different mix of opponents suggests that the ZvP change is in Steamhammer, not in the opponents. New versions since February are Steamhammer 3.4.8 and 3.5. I looked back at the change lists and nothing stood out as clearly advantageous for ZvP. Maybe the change of AbsoluteMaxWorkers from 75 to 65 is good for ZvP? It seems plausible, compared to anything else I can think of. (When Steamhammer reaches that worker count, it goes into late game big-army macro mode.)

Bot performance is sometimes mysterious. Intuition does not always bite. Does anybody have a theory?

As an aside, the matchup win rates of adias (aka SAIDA) are amazingly similar: They varied from 69.17% to 69.62% when I checked. Was that achieved by intense optimization of the play? Is it an effect of the learning system?

Steamhammer 3.5.1 coming soon

Next version Steamhammer 3.5.1 is mostly ready. The major work is done and it is passing many tests. There are tweaks in waiting and some fixes needed...

turrets surrounding a nonexistent base

... but I’m not expecting serious issues.

The changes are intended to improve play versus humans, so I’ll test on SCHNAIL first. I made tactical changes that are hard to test and may be risky; if they don’t work out, I’ll have more to do for the next version. Some changes will help versus bots too, so if it looks good it’ll be on SSCAIT shortly after.

Steamhammer > Stardust

Today on BASIL, Steamhammer scored its first win over Stardust in months (no exaggeration). Steamhammer-Stardust on Destination is a zergling rush and not that interesting, other than in being a rare win.

The build is the 6 pool speed opening that I wrote up in 2018. It is a little slower than 5 pool, in exchange for getting zergling speed to make the lings more dangerous. I suspect the build may have winning chances only on 2-player maps, because Steamhammer has to drone scout on bigger maps to find the enemy, and sending 1 drone out of only 6 total gives up a lot.

Did Stardust misdiagnose the opening because it saw the gas being mined? If so, I don’t think it was critical. After zerglings broke in, the probes felt unsafe and stopped mining; that was the real mistake. I think it’s a good guess that the overreaction has been fixed in the CoG 2021 version.

curious Steamhammer plays

Steamhammer understands that plaguing an archon doesn’t do much good. An archon has 350 shield points and only 10 hit points, and plague affects hit points only. But put enough archons together, and Steamhammer may calculate that it’s worth it anyway.

6 plagued archons in battle

Steamhammer intentionally errs on the side of excess plague. Plague on 6 archons destroys at most 6 * (10 - 1) HP, only 54. It’s questionable whether that’s enough to pay for the 3 zerglings the defiler consumed (the lings have the adrenal upgrade and do damage fast, but then again, they’ll die almost instantly against mass archons so maybe they weren’t worth much). Dark swarm might have been a better choice, but I don’t mind. As I said, it’s intentional, and anyway, Steamhammer was losing this game no matter what.
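The arithmetic above, as a tiny helper. It covers only the case in the text, units with so few hit points that plague runs them down to 1 HP; the function name is made up.

```python
ARCHON_HP = 10  # hit points only; plague does not touch the 350 shield points

def max_plague_damage(count, hit_points=ARCHON_HP):
    """Upper bound on HP destroyed: plague leaves each unit at least 1 HP."""
    return count * (hit_points - 1)

total = max_plague_damage(6)  # 6 * (10 - 1) = 54
```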

In this game versus Krasi0 on BASIL, Steamhammer selected a ZvP build that is unsafe in ZvT. Almost all ZvT builds come with a lair pretty early; the exceptions that come to mind are rushes. This was a multiple-hatch-before-pool into 6 hatch hydra build with a late lair, sensible against sufficiently slow protoss play, senseless against typical terran barracks play. Somehow Krasi0 steered right into it, starting with its thou-shalt-not-rush-me ramp bunker and continuing with goliaths. Hydras are strong against goliaths.

Both bots have a lot to learn about strategy.