versus humans - 2 | Starcraft AI blog

the kitchen sink

A SCHNAIL game. This human player was able to cope with every aggressive trick Steamhammer tried, and it tried several, but was not able to keep up their own macro at the same time.

Zerglings and ultralisks are under the dark swarms, invulnerably shredding everything. Of the ensnared marines to the left of the swarm, the majority are also plagued. Only the most recent spell shows.

When its economy is strong enough, Steamhammer switches quickly from midgame play whose goal is to hold on and not fall behind, into the endgame with power moves. As a human player, how does it feel to face so-so play for most of the game, holding after no worse than a bit of a scramble, and then be abruptly overwhelmed by a prodigious army that seems to have all the tech and be able to do everything simultaneously?

In the following game between the same players, the human was the aggressive one, moving out and repeatedly destroying zerg expansions. Steamhammer never built up its endgame army. But with the terran army away from home, mass zerglings overran the terran bases. Keeping up with AI macro is mechanically demanding for the human player. I expect it’s good practice, though.

tuning against humans

Check out this event in the game webgames vs Steamhammer on Sylphid from SCHNAIL.

Human players of terran and protoss normally group their production buildings. It’s pretty much a requirement for efficient macro. It also introduces a vulnerability: If the opponent maintains control over the area of your production and you don’t already have enough army to force them away, then you are in deep trouble. Any new units you produce will die, so you cannot regain control of your production. Here, terran has built turrets to protect its main and natural mineral lines, but none next to the barracks, an oversight that invites attack.

In the picture, I judge that Steamhammer does not hold enough control over the terran barracks to win, even though the selected barracks is in the red and in danger of burning down. It did hold that much control earlier and had strong winning chances, but frittered it all away: Mutas attacked the barracks, then were distracted away, then attacked the barracks again, then were distracted, and so on, losing mutalisks and allowing the marine count to build up. Bots rarely group their production buildings, so I never taught Steamhammer about the leverage of controlling the enemy production. In the picture, the mutas are about to be distracted away for the last time. Terran repaired the burning barracks, had unhampered production, and won easily with their huge economic advantage.

Distractibility is an issue in itself. I need to improve tactical target selection so that it doesn’t change its mind so frivolously. The skill of controlling the enemy production is a harder question for me, from a development point of view. The skill would be just as valuable in bot-vs-bot games when it applied, but it would not apply often. I wouldn’t be tuning specifically against human play, but it leans that way.
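One standard way to make target choice less frivolous is hysteresis: keep the current target unless a new candidate is better by a clear margin. Here is a minimal C++ sketch of that idea. It is not Steamhammer’s targeting code; the `Target` struct, the scores, and the 25% default margin are all hypothetical.

```cpp
#include <vector>

// Hypothetical target record: an id and a precomputed desirability score.
// Scores are assumed to be non-negative.
struct Target {
    int id;
    double score;
};

// Choose a target with hysteresis. Keep the current target unless some other
// candidate beats its score by more than `margin` (a fraction: 0.25 = 25%).
// Returns the chosen target id, or -1 if there are no candidates.
int chooseTarget(const std::vector<Target>& candidates, int currentId, double margin = 0.25) {
    const Target* best = nullptr;
    const Target* current = nullptr;
    for (const Target& t : candidates) {
        if (!best || t.score > best->score) best = &t;
        if (t.id == currentId) current = &t;
    }
    if (!best) return -1;
    // The current target is gone (destroyed or out of sight): switch freely.
    if (!current) return best->id;
    // Switch only when the best alternative is clearly better.
    if (best->score > current->score * (1.0 + margin)) return best->id;
    return current->id;
}
```

A destroyed or vanished target releases the lock immediately, so the margin only suppresses switching between live targets. The margin trades responsiveness for commitment: set it too high and the squad will ignore real threats.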

There are specific anti-human skills that don’t help against bot opponents. Humans often try to overwhelm the opponent’s multitasking with threats and feints and multi-point attacks. Multi-point attacks in themselves call for multitasking ability. Bots can do the same much better. I don’t have any plans to tune to exploit human weakness in multitasking, though I suppose I may come around to it in the distant future.

practice games on SCHNAIL

Recently a not particularly skilled protoss played more than 20 practice games versus Steamhammer on SCHNAIL, winning 3 and losing the rest. I went through the games in sequence. Steamhammer put its variety of builds to good use, sometimes 4 pooling, sometimes busting with one or another kind of all-in, sometimes defending then countering, sometimes building up then rolling everything down with hive tech. I was mostly pleased, though tactical clumsiness was an issue. In the most spectacular game, protoss aggressively attacked, destroyed and held the zerg natural, built a gateway and nexus there and started mining the zerg’s minerals, and found and cleared the only other zerg expansion. Steamhammer held its main with sunkens and units, used an escaped drone to lay down another expansion that protoss did not find, teched to lurkers, pushed down its ramp to free its natural, defeated the dragoon army, and won with the one big counter. That must have been frustrating for the human player, who apparently did not have the multitasking ability to both micro the dragoons and get observers in time (a difficult skill to learn, if you ask me, because multitasking is not natural for humans). Yet game after game followed.

Looking at it from the human player’s viewpoint, protoss started with a not very convincing gateway and forge build with no cannons. After Steamhammer easily busted it a few times, protoss started experimenting with other plans. Some were defensive, like holding with cannons and trying to expand by shuttle. The more successful were aggressive, like the game above. Over the long sequence, I thought I could see the human player’s skills slowly sharpening, with more consistent macro and more appropriate strategies. Stronger players quickly notice Steamhammer’s weaknesses and exploit them, but this was a player who hadn’t reached that level of knowledge yet. It takes experience to build skills; you have to work through all the stages.

SCHNAIL is great for this, I conclude. If you can find a bot at your strength level or above, but not hopelessly above, and with some variety in its play, you may have a good practice partner. You can polish your basics, learn what kinds of plans work, and try out new ideas. If you find yourself winning most games, then either you’ve improved that much or you’ve learned to exploit the bot’s weaknesses. Either way, if you want to keep improving, it’s time to look for your next opponent.

experience with SCHNAIL

I’ve been keeping an eye on SCHNAIL by Sonko for human-machine play. The interface is not pretty like BASIL, but it works. Few games are played, by comparison with SSCAIT and BASIL; on the days I counted games, between 64 and 111. Humans—it’s shocking but true—are somewhat slower than computers. A game on SCHNAIL can be “practice” or “ranked”, with different rules. Sonko was surprised early on that nearly all games were practice. I guess it makes sense while people are first trying out something new. Today most games are still practice games, but there is a good mix of ranked games too and most bots seem to have properly established ranks which are not very different from their ranks on BASIL. The ratings look different, though, since the midpoint is 1500 instead of 2000—and humans are better, so the bot ratings are mostly below 1500. The ratio of practice to ranked games varies depending on how popular the bot is to practice against: BananaBrain is popular and has 12% ranked games, while tscmoo is not and has 61% ranked. Some bots have few total games and no believable ranking, likely because they don’t work reliably; they should probably be removed from the rankings.

SCHNAIL supports 22 maps, at least notionally: The 15 SSCAIT maps (including Electric Circuit, which is disabled on SSCAIT) plus a handful of newer maps. I was not able to find any successful games on the maps Tres Pass or Core Breach. Perhaps they use features from Broodwar Remastered? Or do so few bots support them that none has succeeded there yet? The bot upload page comes with checkboxes so you can specify which maps the bot can play on. I didn’t realize at first that the checkboxes scrolled; a more rectangular arrangement would be easier to use. Anyway, my uploads are set to play on all maps except the two broken ones and Electric Circuit, where they run into pathing trouble.

In a practice game you choose your opponent; in a ranked game, SCHNAIL makes the match, trying to choose a bot near the human’s strength when it can. I thought there was an element of randomness in the ranked matchmaking, but if there is, it is apparently small. I often see streaks where the human is matched against the same bot repeatedly. It makes a difference, too: humans learn fast and adapt their play to what they are facing, and I sometimes see the same pattern in ranked games as in practice games, where the human loses several times and then figures out how to win.
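To illustrate what a small element of randomness could look like, here is a hypothetical C++ matchmaker that picks the bot whose rating, plus Gaussian noise, is closest to the human’s rating. This is not SCHNAIL’s code, whose real algorithm is not documented here; `BotEntry`, `pickOpponent`, and the `jitter` parameter are invented for the example.

```cpp
#include <cmath>
#include <limits>
#include <random>
#include <string>
#include <vector>

// Hypothetical ranked-pool entry.
struct BotEntry {
    std::string name;
    double rating;
};

// Choose the bot whose rating, perturbed by Gaussian noise with standard
// deviation `jitter`, is closest to the human's rating. jitter == 0 gives a
// deterministic nearest-rating match. Returns "" if the pool is empty.
std::string pickOpponent(const std::vector<BotEntry>& bots, double humanRating,
                         double jitter, std::mt19937& rng) {
    std::string bestName;
    double bestGap = std::numeric_limits<double>::infinity();
    for (const BotEntry& b : bots) {
        double perturbed = b.rating;
        if (jitter > 0.0) {
            std::normal_distribution<double> noise(0.0, jitter);
            perturbed += noise(rng);
        }
        double gap = std::fabs(perturbed - humanRating);
        if (gap < bestGap) {
            bestGap = gap;
            bestName = b.name;
        }
    }
    return bestName;
}
```

When `jitter` is small compared with the rating gaps between neighboring bots, the same nearest bot wins almost every draw, which would produce exactly the kind of streaks described above.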

Steamhammer has only a handful of games since it was updated; most days it does not play at all. Nevertheless, some of the games are ranked games, and its rating has climbed. In ranked games SCHNAIL is still matching Steamhammer against players below its current strength, and its rank will likely keep climbing for a while.

The weaknesses I see in Steamhammer’s play versus humans are about the same as the ones I see in its games against other bots. But humans pick up on weaknesses more easily, and exploit them more consistently, so the weaknesses look bigger. It’s useful, actually. I have changed some priorities after seeing games. One protoss at first struggled against Steamhammer’s macro, but after a few games realized: Put a little pressure on, and then Steamhammer can’t defend its expansions, hydras don’t think they can get there. A couple more games, and it was: Cannon the expo while zerg can’t respond, then maybe take it for protoss. Bots aren’t that smart!

Randomhammer was indeed a newer version than Steamhammer at first, explaining why it was rated a little higher. Since then I’ve updated them both, of course. I had to ask to be given control of Randomhammer, but it was no trouble. Randomhammer has no games since it was updated. Human players may be different, but BASIL provides the best available forecast: The win rate graph shows protoss not far below zerg, with terran stuck in the mud far behind. I think the big difference must make terran less fun to play against.

Crazyhammer is a just-for-fun Steamhammer zerg configuration that I uploaded to SCHNAIL yesterday. It is set to choose randomly from its library of over 200 zerg openings, paying no attention to the matchup or the map or anything. I set it to practice-only, since it is in no way competitive. Play a game and be surprised; Crazyhammer might accidentally play something sensible, or it might play a protoss build like 4 hatches before lair against your zerg.
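Uniform choice that ignores all context is about the simplest opening selector there is. A minimal C++ sketch, assuming the opening library is just a list of names; `pickCrazyOpening` is hypothetical and does not reflect Crazyhammer’s actual configuration format.

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Pick an opening uniformly at random, ignoring the matchup, the map, and
// any opponent history. Precondition: `openings` is not empty.
const std::string& pickCrazyOpening(const std::vector<std::string>& openings,
                                    std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> dist(0, openings.size() - 1);
    return openings[dist(rng)];
}
```

With over 200 openings and no weighting, each one comes up well under 1% of the time, so a sober 10 pool is as likely as any of the crazy builds.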

Will it be ordinary 2 hatch muta—or queen rush, or 5 hatches before pool, or 7 pool hydra rush, or...? Chooses utterly at random from a huge library of sensible and insane builds. For weirdness!

As a taste, there are 2 builds starting with 4 pool, 3 with 5 pool, 5 with 6 pool, 8 with 7 pool, 4 with 8 pool, and 18 builds with 9 pool. If I counted right. At the opposite end of the macro spectrum, there are 4 builds with 5 hatcheries before pool, 1 with 4 hatcheries before pool, and 11 with the more reasonable 3 hatcheries before pool. The bulk of the options lie between the extremes. Most of the openings counter something and make sense in some situation, however rare, but a few are pure nonsense thrown in for the hell of it. The two-base ultralisk rush has no known use, and the 7 pool mutalisk rush might beat you if you are a sworn pacifist. Some of the ideas I have never seen anywhere, from bot or human. I hope players will enjoy sampling a little of the prodigious variety.

Crazyhammer has been chosen for 1 game so far, a ZvZ where it went with a perfectly sober 10 pool, scouted the wrong way so that it failed to react to the enemy, and lost easily. Not crazy enough.

Steamhammer on SCHNAIL

I have felt fully busy with Steamhammer, with No Time For Anything Else, so I have been ignoring SCHNAIL. Besides, until I updated to BWAPI 4.4.0 I could not upload a new version anyway. Today I apparently felt less busy, because I logged in there for the first time since early days.

I updated Steamhammer on SCHNAIL from version 2.4.1 (dating from late 2019) to the current version 3.4.1. The current version is far stronger against other bots, and it should be much stronger against humans too—though I imagine that some of the new danger-avoiding behaviors that are effective against other bots can be exploited by devious humans. The older version already had a HumanOpponent flag which is supposed to be set automatically when running under SCHNAIL. I have never verified that it works as planned. Maybe I should do that soon....

Unsurprisingly, the old Steamhammer ranks much lower versus humans than the current Steamhammer ranks versus bots. What did surprise me is that Randomhammer ranked higher than Steamhammer proper. Ever since Randomhammer was first uploaded to SSCAIT in 2017, I have never seen Randomhammer above Steamhammer. Steamhammer’s zerg is much stronger than its alternate races. It’s hard to believe that humans are worse than bots against random opponents, so maybe this is a newer version of Randomhammer? I’ll check that out too.

I watched a few replays. It was delightful to see some of the bot exploits that people attempted. Especially when they didn’t work.

Still next: Steamhammer’s experience versus cannon rushes.

bot versus human meta differences

In the Undermind 42 podcast, purple Dan Gant offers the opinion that the bot metagame and human meta will be similar in the long run: “there may be some differences but those differences will be slight” as bot and human meta converge over time.

Maybe so. We won’t know till we know. There are reasons to be sceptical, known instances where bot and human meta differ for reasons that seem fundamental and likely to endure. Here are examples. It’s possible to argue about each one, whether it constitutes a serious or a minor meta difference, but there certainly are differences.

My examples are all in zerg matchups. That might be because zerg is affected more, or just because I know more about zerg. I suspect the latter.

Locutus-style dragoon micro

Cadenzie had the impression that once Locutus had enough dragoons, hydras became weak against them. A dragoon has high speed and a longer range than a hydra, so with perfect micro it makes sense. We’ve seen similar events in other Locutus-human zerg games: Zerg has to win by playing a more efficient build order to stay ahead in macro—but in the limit as bots improve, protoss will have efficient reactive builds. If this is true, then zerg has to seek another way to fight dragoons.

I suspect that the answer is to include speed zerglings in the unit mix along with either hydras or mutalisks. We’ve seen games where tscmoo zerg can coordinate its lings and mutas to beat Locutus dragoons at low cost: The lings get in among the goons, interfering with their movement, and the combined arms attack becomes very effective. A similar tactic should work with lings and hydras, but it would require tactical coordination and strong micro.

In any case, higher efficiency of dragoons in bot play seems like a fundamental meta difference. Wraiths are another unit with high speed and long range, and they also show higher efficiency in bot play (among the best terrans).

splitting irradiated mutalisks

In pro ZvT, irradiate counters mutalisks. Once terran has irradiate, it rarely makes sense to spawn any more mutalisks. That is because mutalisks are stacked tightly to attack, and when 1 mutalisk in the stack is irradiated, the UI provides no way for the zerg player to select it out. You have to separate the mutas until you find the glowing one. Even the greatest zergs can’t quickly and reliably separate the mutas to split out the irradiated one, so frequently all the mutas take serious damage.

In bot games, the irradiated muta can instantly fly away from the group. Irradiate becomes a weak counter to mutalisks, and zerg may well want to continue spawning mutas. Terran will have to seek another counter.

This meta difference can be chalked up to a design issue with BWAPI. BWAPI provides bots the ability to know the status of every unit in a stack, and to be able to pick out any one of them, while the game UI for human players does not. It’s not obvious how to revise the API, but we may want to do it at some point.
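With per-unit status available, the split takes only a few lines. Here is a hypothetical C++ sketch that uses a stand-in `Muta` struct rather than real BWAPI types (in BWAPI the status check would be `Unit::isIrradiated()`). Each irradiated muta is ordered straight away from the center of the healthy ones, and the rest of the stack is left alone. The `distance` parameter is an assumption; in practice it would need to exceed the irradiate damage radius.

```cpp
#include <cmath>
#include <vector>

// Minimal stand-in for a mutalisk: id, irradiate status, and position.
struct Muta {
    int id;
    bool irradiated;
    double x, y;
};

struct MoveOrder {
    int unitId;
    double x, y;
};

// Order every irradiated muta to fly `distance` pixels directly away from the
// center of the healthy mutas. Healthy mutas get no orders.
std::vector<MoveOrder> splitIrradiated(const std::vector<Muta>& group, double distance) {
    std::vector<MoveOrder> orders;
    double cx = 0.0, cy = 0.0;
    int healthy = 0;
    for (const Muta& m : group) {
        if (m.irradiated) continue;
        cx += m.x;
        cy += m.y;
        ++healthy;
    }
    if (healthy == 0) return orders;  // nobody left to protect
    cx /= healthy;
    cy /= healthy;
    for (const Muta& m : group) {
        if (!m.irradiated) continue;
        double dx = m.x - cx, dy = m.y - cy;
        double len = std::sqrt(dx * dx + dy * dy);
        if (len < 1e-9) {  // exactly at the center: pick an arbitrary direction
            dx = 1.0;
            dy = 0.0;
            len = 1.0;
        }
        orders.push_back({m.id, m.x + dx / len * distance, m.y + dy / len * distance});
    }
    return orders;
}
```

A human cannot issue this order without first finding the glowing muta by hand, which is the whole meta difference.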

mass unit control

Occasionally a top human zerg will collect more than 12 mutalisks—more than one control group—and harass with them all. It’s mechanically difficult, because each control group has to be separately commanded. Jaedong was the first to demonstrate the skill, as far as I have seen. Only the best zerg players can do it well.

Bots, of course, can control units individually and don’t much care how many there are. Bots can, in principle, better control large armies, such as large mutalisk clouds. I’m not sure how important the effect is, but it seems that it ought to have some effect on the meta.

Also see Artosis’s comment about his McRave game, “I’ve played against and cast the best Protoss players in the world, and this bot had a way better economy than anything I’ve ever seen.” Underneath, it is the same issue: Bots can individually control units, so they can make every worker at the earliest possible moment, never leave it idle, and on top of that do micro tricks to speed up mining for the entire game (humans can do that only in the very early game). A potentially stronger economy is a fundamental meta difference.
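The “never leave it idle” part is the easy half to sketch. A hypothetical C++ example, not McRave’s or Steamhammer’s code: every idle worker is sent to the mineral patch with the fewest assigned workers, and the counts are updated as it goes so that several idle workers spread out. A real bot would likely also weigh patch distance and add mining micro, which this ignores.

```cpp
#include <utility>
#include <vector>

// Hypothetical mineral patch record: id and number of workers assigned.
struct Patch {
    int id;
    int workers;
};

// Assign each idle worker to the least-saturated patch. Returns
// (workerId, patchId) pairs and updates the patch counts in place.
std::vector<std::pair<int, int>> assignIdleWorkers(const std::vector<int>& idleWorkers,
                                                   std::vector<Patch>& patches) {
    std::vector<std::pair<int, int>> out;
    if (patches.empty()) return out;
    for (int w : idleWorkers) {
        Patch* least = &patches[0];
        for (Patch& p : patches) {
            if (p.workers < least->workers) least = &p;
        }
        least->workers++;
        out.push_back({w, least->id});
    }
    return out;
}
```

Run every frame, this leaves no worker idle for more than a frame, which is something no human can match over a whole game.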

Also Cadenzie’s comment “the individual unit control and multi-tasking is sometimes beyond human ability.” A difference like that must have metagame effects.

Cadenzie versus Locutus 5 game showmatch

Thanks to SCHNAIL we’re starting to see bot matches against strong human players. Today I write about the Cadenzie (Z) 5 game match versus Locutus from last Tuesday. It comes with an interview on Making Computer Do Things. Watch the games first if you’re interested.

Cadenzie is not a top pro; to me her play looks a little slow and awkward compared to the best. But she is very strong, and when the match was announced, I expected her to win every game. In the event, she scored 3-2. I thought all the games were one-sided: Either zerg won fairly easily, or else Locutus collected enough dragoons and was able to overpower her hydralisks with superhuman dragoon micro. In the game where Locutus chose to go zealots instead, the zealots looked wimpy; Steamhammer has had the same experience.

I also felt that the first game was the only one that Cadenzie played with 100% seriousness (and she said as much in the interview). She played a well-rehearsed build with mass hydras and drop. Locutus (lacking PurpleWave’s strategy skills) did not understand how to read her build in order to cut corners in the opening, and it fell behind (slightly behind in bot terms, “massively” behind according to Cadenzie—the distance varies by skill level!). When hydras collected outside its natural, Locutus trickled units out through the narrow opening in its wall and let them be picked off, falling further behind. The front gateway fell, and then overlord speed finished. She’d gotten overlord drop first, and she picked up hydras and put them in the protoss main, where they cleaned up with little effort. I judged that she could as easily have skipped drop and powered through the front door.

I thought the most interesting answer in the interview was “I played in a tournament before where there was a team melee relay style with a mix of progamers and beginner level players and they would take turns every 2 minutes, in a way it was most similar to that.” In other words, Locutus was extremely good at some aspects of the game, and extremely weak at others. That is similar to other games where computer programs were good enough to play humans and not good enough to win every time; for example, chess programs in the old days were superhuman at tactics and weak at strategy in a very similar way.

She repeatedly emphasized that bots need to adapt more to what they scout. I think that’s the main takeaway.

Compare Artosis versus top bots on Twitch: Notice how often Artosis says “In this build, when I see such-and-such, I do so-and-so.” Human players have extensive knowledge of how to play in specific situations, and no bot comes close. SAIDA may come closest, with its one all-purpose build and numerous reactions, but its understanding is shallow by comparison. PurpleWave I think has the greatest strategy knowledge of any bot, but it has a weak understanding of tactics. Locutus relies on its strong micro and aggressive tactics, which cover for the weaknesses in other aspects. Bots not only know less, they don’t integrate their knowledge into a theory of how the game works—they don’t understand what they know, so they are weak at drawing inferences and their adaptation ability is shallow. See for example the Artosis game versus Killerbot by Marian Devecka, where Artosis was able to guess that a third zerg base at an early timing was likely, and scouted for it specifically. In other games, he did not spend effort scouting for expansions he did not see as likely.

By the way, the last game is the best, Artosis vs McRave starting at ~2 hours in.

Filling the knowledge gap will, I believe, require machine learning. Writing rules and reactions by hand will take a day or two less than forever, and search will not solve all problems if it has only handmade evaluations to rely on. For Steamhammer, I’ve figured out a way to put together familiar algorithms that will execute fast, and I expect it will also learn fast (from little data) and be reasonably accurate (not amazing like deep learning, but adequate). It’s part of my strategy adaptation goal. If it works as well as I hope, I WILL CRUSH YOU PUNY MORTALS BENEATH MY STEEL THUMB BWAHAHAHAHA, or something like that. Actually the first application will be nothing more than an evaluation function to choose openings and strategies, valuable but extracting only a little of the potential, and if it’s successful then after I explain how it works everyone else will get ahead of me again. That will be good too.

SCHNAIL closed beta

For anyone who hasn’t noticed, SCHNAIL is in closed beta as of Friday, and all bot authors are invited. The fronting web site is unfinished, but that should not be an issue for early testers.

I’ll have more to say after I’ve tried it out myself.

Update: When will the mascot character get a proper radula? Mollusk pride!

Steamhammer 2.4.2 change list

Steamhammer 2.4.2 is available as source from Steamhammer’s web page. The documentation is updated too. As far as game play goes, it is identical to version 2.4.1 in the SSCAIT annual tournament. The main difference is that Steamhammer can automatically recognize when it is running under SCHNAIL and treat its opponent as a human.

I hope SCHNAIL testers will let me know how it goes, so I can adjust the behavior against human players.

Here is what’s new.

configuration

  "Skills" :
  {
    "SCHNAILMeansHuman"       : true,
    "HumanOpponent"           : false,
    "SurrenderWhenHopeIsLost" : true,

    "ScoutHarassEnemy"   : false,
    "AutoGasSteal"       : true,
    "RandomGasStealRate" : 0.0,

    "Burrow"             : true,
    "MaxQueens"          : 1,
    "MaxInfestedTerrans" : 0
  },

• The SurrenderWhenHopeIsLost, ScoutHarassEnemy, AutoGasSteal and RandomGasStealRate items are moved from the Strategy section to the Skills section of the configuration file.

• An internal flag Config::Skills::UnderSCHNAIL is added. It does not appear in the configuration file but is set by code. It is true when Steamhammer detects SCHNAIL’s schnail.env file in the read directory. Code can use this to do something different when running under SCHNAIL; it may be useful someday.

• A flag SCHNAILMeansHuman is added. If UnderSCHNAIL and SCHNAILMeansHuman are both true, then Steamhammer overrides the configured value of the HumanOpponent flag and sets it to true.

In other words, if you set SCHNAILMeansHuman to true, then whenever Steamhammer is running under SCHNAIL, it will assume that its opponent is a human. That should almost always be what you want. If it’s not what you want, you can turn off SCHNAILMeansHuman and set the HumanOpponent flag by hand.

• Steamhammer messages “gl hf” at the start of the game if it thinks the opponent is human. It actually wishes the human to have bad luck and suffer torment (BLAST), but it doesn’t mind lying. The real purpose of the message is so you can tell whether the HumanOpponent flag is turned on when it should be.

• The game message was formerly messed up. I think I finally fixed it.

• In the IO section of the config file, I separated Config::IO::PreparedDataDirectory (bwapi-data/AI/om/ for prepared opponent model files) from Config::IO::StaticDirectory (bwapi-data/AI/) for reasons of what I prefer to call clarity. The change doesn’t affect anything but a name in the code.

the HumanOpponent flag

I watched the SCHNAIL video and I’m pleased. Today I added a HumanOpponent flag to Steamhammer’s configuration file, to make the bot more fun for humans to play against. If you set it to true, the flag has 2 effects:

1. It tells the opponent model that the opponent is unpredictable, which the opponent model takes to mean “I’d better be unpredictable too so I can’t be exploited.” It chooses openings more randomly. This is just turning on a standard Steamhammer behavior that already existed.

2. When losing, Steamhammer ggs out much earlier. Steamhammer in the past assumed that its opponent is a bot and may mess up the win, so it surrenders barely one step before it is provably unable to win. (Even so, I’ve seen a couple games over the years when it might have won if it hadn’t given up—when it had no drones and no combat units and no money, but had units in production that could outfight the opponent which was also near death. It’s extremely rare.) Versus a human, that gg timing is way way too late, unacceptably late. When HumanOpponent is turned on, Steamhammer follows a two-part rule: A. The enemy is much stronger than me—my supply is less than half the enemy’s known supply. B. I have been hurt—my supply has fallen below half of its high water mark. The B part ensures that Steamhammer doesn’t give up without a fight merely because it has been grossly outmacroed.
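The two-part gg rule translates directly into code. A hypothetical C++ sketch, not Steamhammer’s implementation: supplies are compared as integers (doubling one side instead of halving the other), and the high water mark is assumed to be updated every frame.

```cpp
// Hypothetical supply bookkeeping. Any consistent supply unit works, since
// both tests are ratios (BWAPI, for example, reports supply doubled).
struct SupplyState {
    int mySupply = 0;
    int myHighWater = 0;       // highest supply reached so far this game
    int enemyKnownSupply = 0;  // supply of enemy units seen and believed alive
};

void updateHighWater(SupplyState& s) {
    if (s.mySupply > s.myHighWater) s.myHighWater = s.mySupply;
}

// A: my supply is less than half the enemy's known supply.
// B: my supply has fallen below half of its high water mark.
// Surrender only when both hold.
bool shouldSurrenderVsHuman(const SupplyState& s) {
    bool muchWeaker = 2 * s.mySupply < s.enemyKnownSupply;
    bool hurt = 2 * s.mySupply < s.myHighWater;
    return muchWeaker && hurt;
}
```

A badly outmacroed bot whose supply never fell keeps playing, because part B fails; that is the clause that prevents giving up without a fight.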

I’m curious to find out how well the gg rule works in real human games. In test games, I thought the gg still came later than a strong human would prefer. But perhaps it is a good fit for human players who deem Steamhammer an interesting opponent. Also, the enemy’s known supply is generally less than the enemy’s true supply, and Steamhammer is weak at scouting so often it is much less. But I will need to improve scouting as part of my strategy adaptation project, and gg timing may improve when I do.

About SCHNAIL: Obviously we can’t expect SCHNAIL users to edit Steamhammer’s config file and set the HumanOpponent flag. The file won’t be exposed to them at all; they don’t need to know it exists. I asked Sonko if there will be a way for a bot to tell that it is running under SCHNAIL. Whatever the final arrangement ends up being, I will help Steamhammer fit into it so the bot does sensible things in human games.