
only one horrible game

Last year, Steamhammer finished SSCAIT for the first time with no losses due to crippling bugs and only 2 close calls. So far, it is on track to repeat in this year’s SSCAIT. I have seen all its games up to now, and there are no losses due to egregious bugs (only the standard issue flaws) and only one near miss. That’s great compared to Steamhammer’s early years, but I still want to fix the bugs.

The bad game is Steamhammer vs legacy (random zerg). Steamhammer made a number of mistakes in the game and suffered at least 2 bugs. The bug I could not accept is that it built spore colonies to defend against air attack—very early, immediately after scouting legacy’s base and seeing that it had not yet taken gas. It’s not possible to get mutalisks that fast, and without gas there was not even a hint of future risk. In fact, legacy never took its gas and played the whole game with a mass slow zergling plan. If Steamhammer had held on to the drones instead of wasting them on static defense due to a bug, I doubt the attack would have troubled it at all.

I traced the bug to, of all things, an integer overflow. The routine that figures out the time the enemy’s spire will complete returns INT_MAX for “never” if there is no evidence of an enemy spire... and I brilliantly added a margin for the mutas to hatch and fly across the map. In C++, signed integer overflow is officially undefined, so the compiler retreats to its room and laughs its head off before generating the code that will cause the most possible confusion, because “undefined” means it can do that. I don’t know what it did this time, but it was not as simple as wrapping around from an extreme positive value to an extreme negative value, because that would have caused the bug to show up in half of ZvZ games. No, it’s better if it shows up only when it will cause a disgusting blunder out of nowhere.
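Here is a minimal sketch of the kind of guard that avoids it, assuming (my naming, not the actual routine) that the spire completion time is a frame count with INT_MAX standing for “never”: saturate instead of adding past the top.

```cpp
#include <climits>

// Add a safety margin to a "spire finishes at this frame" value without
// overflowing. Signed overflow is undefined behavior in C++, so saturate:
// INT_MAX means "never", and "never plus a margin" is still "never".
// Assumes margin >= 0.
int addSafetyMargin(int spireFinishFrame, int margin)
{
    if (spireFinishFrame >= INT_MAX - margin)
    {
        return INT_MAX;
    }
    return spireFinishFrame + margin;
}
```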

Anyway, it was easy to fix. I also fixed a bug that caused multiple commanding of overlords. And I’m writing code to collect data for my main current project. Progress is underway.

SSCAIT and performance over time

Yesterday I claimed that the cannon bot Jakub Trancik “has been falling slowly in the rankings year by year, even as bots that began above it fall further.” Is it true? I see room for argument, but there is something to it.

graph of SSCAIT finishes for 5 bots over 6 years

Here is a graph of the SSCAIT finishing ranks of 5 bots over 6 years, from the 2014 through 2019 editions of the round robin phase of the annual tournament. The bots were selected to have no updates over the time period; it is the same code every year, according to the info on SSCAIT’s website. (UAlbertaBot by Dave Churchill was updated in 2015, so I didn’t include its 2014 finish.) The finishing ranks are normalized so that finishing first is 100 and finishing last is 0, so that the ranks can be compared over time even though each year had a different number of participants. The graph shows old bots falling in relative performance as new and updated bots grew stronger over the years.
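For concreteness, this is the normalization as I read it (a sketch of the arithmetic, not necessarily the exact formula behind the chart):

```cpp
// Normalized finish: first place maps to 100, last place to 0.
// rank is 1-based; n is the number of participants that year (n > 1).
double normalizedFinish(int rank, int n)
{
    return 100.0 * (n - rank) / (n - 1);
}
```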

Jakub Trancik’s finishes were nearly flat from 2014 through 2017, and it fell in 2018. It did not participate in 2019, though it has been allowed back this year. The other non-updated bots showed declines over the period, but not always steep declines. Each bot has a visible knee in the curve, where it bent more sharply down. The year of the knee, the last year of relatively stable performance, ranges from 2016 for Tomas Cere to 2018 for Skynet by Andrew Smith. That might be because performance gains have accelerated in the last few years, or it might be because it takes that long for enough new and updated bots to be tuned against the unchanging old ones. Maybe the knees occur when flashy newcomers start to exploit specific weaknesses of the old guard.

Of these 5 bots, Jakub Trancik has the flattest curve, though it doesn’t look exceptional. It is not a statistical outlier, and Skynet’s curve is almost as level. Jakub Trancik is also the least sophisticated bot, and it has the most extreme and unconventional strategy. The facts might be related.

For comparison, here’s another chart with 4 more non-updated bots. Roman Danielis missed 2016. These curves also seem to have knees, though less sharp, and the shape of Roman Danielis’s curve is not clear to the eye.

graph of SSCAIT finishes for 4 more bots

SSCAIT 2020 so far

The annual SSCAIT has progressed far enough that the competitors have roughly sorted themselves into groups. It’s about 1/4 complete, and we can get an idea of how things are going. Currently we have #1 Monster, which may in fact be the favorite to finish first, but it’s too early to talk about detailed finishing order.

Iron is doing better than I expected, though I guess it’s within the statistical margin of error. I have always been bemused by the consistency of cannonbot #38 Jakub Trancik, 11-16 for 41%, last updated in 2013. It has been falling slowly in the rankings year by year, even as bots that began above it fall further; apparently improvements that help against usual play do not help as much against the cannons. What stands out more to me are the bots that collapsed. Styx is failing to start and losing every game. Microwave, which should be in the top 16, is currently #31 of 56 with 16-15; maybe the latest update introduced a bug.

The biggest upset is #52 Marine Hell > #8 Steamhammer; Steamhammer went from lifetime 67-2 to 67-3 (since opponent modeling was added) against this opponent after failing to scout Marine Hell’s unit mix and making the wrong choice of counter units, among other mistakes. A more interesting upset is #48 Garmbot by Aurelien Lermant > #9 Dragon, where Dragon tried its usual harassing game plan but ended up defending all game instead, and could not hold it together. I was also pleased with #16 Skynet by Andrew Smith > #2 BetaStar after BetaStar chose a risky build, and this time didn’t get away with it. Don’t underestimate your foes: “It is not enough to be a good player, you must also play well” — Siegbert Tarrasch.

Steamhammer is currently at #8 with 17-5, having played only 22 games, fewer games than any other bot in the top 16 except Stardust, which has played only 20. I think Steamhammer’s most likely finish is #9 or #10, but we’ll see. Last year I was slightly pessimistic, and if the same is true this year then it may hold its position.

lazy learning

Today’s post is AI background info. To use a neural network, or a decision tree, or most machine learning methods, you train it first and query it afterward. The system includes a model, represented as a data structure with code to interpret its meaning, and the learning algorithm adjusts the model to fit the training data. Training may be slow, but the model is usually designed so that answering queries is fast.

Some bots, including Steamhammer, do their opening learning differently, without persistent model data: They store only the training data, records of what happened in past games. When querying the system—deciding what opening to play based on experience—they look through the game records to find any relevant information and decide on the fly, not keeping any other permanent representation. That is called lazy learning, as opposed to eager learning, because it computes its model lazily on demand rather than eagerly up front. It’s a special case of lazy evaluation. There is also the closely related instance-based learning, which often means the same thing.
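As an illustration of the idea only—hypothetical names and a simpler policy than any real bot uses—the whole “model” can be a scan over stored game records at decision time:

```cpp
#include <map>
#include <string>
#include <vector>

// One record per past game: what we played, against whom, and the result.
// Hypothetical structure for illustration, not any bot's actual format.
struct GameRecord
{
    std::string opponent;
    std::string opening;
    bool        won;
};

// Lazy learning: keep only the raw records and compute an answer on demand.
// The "model" is implicit in this code: pick the opening with the best
// observed win rate against this opponent, or the fallback if we know nothing.
std::string chooseOpening(const std::vector<GameRecord> & records,
                          const std::string & opponent,
                          const std::string & fallback)
{
    std::map<std::string, std::pair<int, int>> stats;   // opening -> (wins, games)

    for (const GameRecord & rec : records)
    {
        if (rec.opponent == opponent)
        {
            auto & s = stats[rec.opening];
            s.second += 1;
            if (rec.won) s.first += 1;
        }
    }

    std::string best = fallback;
    double bestRate = -1.0;
    for (const auto & entry : stats)
    {
        double rate = double(entry.second.first) / entry.second.second;
        if (rate > bestRate)
        {
            bestRate = rate;
            best = entry.first;
        }
    }
    return best;
}
```

Learning here is nothing more than appending a record after each game; all the work happens at query time.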

As an aside, this is analogous to the difference between an interpreter and a compiler. Partial evaluation of a lazy learning algorithm produces an equivalent eager learning algorithm, in exactly the same way that the appropriate Futamura projection converts an interpreter into a compiler. Not that that’s important, I just think it’s fun.

The prototypical lazy learning algorithm is k-nearest neighbors: Represent your training data as points in some space, each associated with a value. To query for some point in the space, you find the k closest points whose values you know and fit some function to them (say, take their average, which amounts to fitting a constant function). Choose k > 1 to average out any noise in individual values. I like to think of it in more general terms: Find whatever instances you can that are similar to your query, and generalize or synthesize the information in the instances into an answer to your query. That is a description of case-based reasoning. You can see k-nearest neighbors as a lightweight example and case-based reasoning as a heavyweight example of the same underlying idea.
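A minimal one-dimensional k-nearest-neighbors sketch (illustration only), which averages the values of the k stored points closest to the query:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// One stored training instance: a point and its known value.
struct Instance
{
    double x;
    double value;
};

// Lazy k-NN query: there is no training step at all. Find the k stored
// instances closest to the query point and average their values, which
// amounts to fitting a constant function to the neighbors.
double knnPredict(std::vector<Instance> data, double query, std::size_t k)
{
    k = std::min(k, data.size());
    if (k == 0)
    {
        return 0.0;    // no data, no answer
    }

    std::partial_sort(data.begin(), data.begin() + k, data.end(),
        [query](const Instance & a, const Instance & b)
        {
            return std::fabs(a.x - query) < std::fabs(b.x - query);
        });

    double sum = 0.0;
    for (std::size_t i = 0; i < k; ++i)
    {
        sum += data[i].value;
    }
    return sum / k;
}
```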

Lazy learning does not mean that the learning system does not have a model, it only means that it does not have an explicit model that it stores as a data structure. There is still code to answer queries, and the code embodies an implicit model of how to turn training data into answers.

Lazy learning has advantages and disadvantages. Learning is trivial, since it amounts to storing records of your experience, so it is easy to keep up to date. With a deep learning model, by contrast, it is not very practical to update online during regular games, because training needs so much data. On the other hand, if your learning data is large, then with lazy learning you may not be able to use all of it to answer any given query. You’ll need a way to pick out the relevant records. It’s still effective if the relevant records are few enough, but by restricting the data you use you might be dropping useful information.

Lazy learning methods are natural for opponent modeling. I wouldn’t count them out for strategic uses, though. I can imagine a case base of strategic situations with reactions and evaluations of how the reactions worked out. That could produce a bot which keeps up with the meta by learning it from experience.

Stardust-BetaStar game

Steamhammer continues its tradition of starting strongly: It has scored 7-0 in its first games. That is better than Stardust at 6-1. The winning streak will, I’m sure everyone agrees, unquestionably continue for the rest of the tournament, unless of course Steamhammer grows bored with winning. I imagine that many people have seen the Stardust loss, because it is interesting for more than one reason. To me, the key point is that Stardust took island bases.

The game is Stardust-BetaStar on Andromeda. Stardust started a shuttle immediately after its observatory, and the shuttle picked up a probe and headed out immediately, at about 8:10 into the game. By this time, Stardust had already fallen behind in worker count and army size, because its build was not as efficient—which is interesting too, since BetaStar was derived from Locutus. The shuttle headed straight back home to pick up 8 more probes, setting Stardust further behind in economy; after the transfer its main was not mining at full strength, and the timing was too early because the probes arrived at the island base before it was finished and couldn’t mine there at first either.

As soon as the island base finished and the 9 probes there had turned in their first mineral cargoes, the shuttle picked up one of them and carried it due north to the other Andromeda island base. That base too started as soon as minerals allowed, and the shuttle returned—not to the main, but to the south island base. It picked up a load of probes there, leaving the south island severely undersaturated, and flew north straight into BetaStar’s moving dragoons, being shot down with no attempt to evade. Ouch.

Meanwhile BetaStar was in Stardust’s main, starting to take things apart. Given that Stardust intended to take the second island, transferring probes from the main would have made more sense. The rest of the game was boring. BetaStar had only dragoons and never found the island bases, so they were invulnerable. Stardust mined its islands and watched dragoons move around the map with its observer, but never attempted to rebuild. At the end of the game, neither island had as much as a pylon on it. And, curiously, neither island mined gas. Overall, taking the islands contributed to Stardust’s loss, but it didn’t seem like an entirely bad idea because the islands were untouchable. In a longer game, the extra bases would have paid off.

That tells us something about the author of Stardust and Locutus. Bruce Nielsen is like me in one way: We are both willing to enable cool but half-baked features in serious games.

Here is a BASIL game where Stardust started to take an island, though the game ends before the nexus is placed: Stardust-XIAOYICOG2019 on Python. The SSCAIT maps include 3 maps (of 14 used) with islands: Andromeda, Empire of the Sun, and Python.

Steamhammer 3.4 plans

After every release, I traditionally post about my plans for what’s next. Sometimes I carelessly follow the plans before I can change my mind, though usually I can catch myself before I slip into a rut.

The next release should be 3.4, or 3.4.x if I give numbers to some test versions. Like last year, I will treat the tournament season as over, and go into infrastructure work of the kind that introduces new bugs, so that there’s time to iron them out before the next tournament season starts in the fall. The last time I considered plans, I had not decided between the opening timing project and the machine learning evaluation function project. I have to eventually do both and they support each other, so I may end up doing one after the other no matter the order. But for now I’ve decided that opening timing data (with other statistics) is my next item. I think it will make a bigger difference as a first step. I have updated the version number in the code and opened the next file to edit; the work is notionally underway.

When this project is complete (and I may do it in stages rather than all at once), Steamhammer will no longer need long sections in its configuration file telling what openings to play against each race and what the counters are to different enemy strategies. It will decide for itself based on its data. I will likely remove the configuration features altogether, since they don’t fit well with a data-driven architecture. It’s another step toward getting Steamhammer to think for itself rather than to slavishly follow instructions.

All details are up in the air; my plan remains an outline. Writing the code to collect the data I want is easy. Collecting the data will, I expect, be lengthy but straightforward. The hard part, I think, will be rewriting the opening selection code. The current version uses tricks to shortcut the work, and I won’t be able to get away with that any more, so I expect I’ll have to write it more or less from scratch, adding complexity in the process. On the upside, I can code it for modifiability, so that future opening adaptation changes (that I already know I want) will be easier.

Without more planning work, I can’t estimate how long it may take. 3.4 may be ready for AIST S4, or I may enter the current version 3.3.5. Or possibly I’ll fork an interim version with surprise improvements; anything’s possible.

Steamhammer 3.3.5 change list

Here’s what’s new in Steamhammer 3.3.5, which is the SSCAIT tournament version. Play is visibly improved with this version, and I expected a significant improvement in results, but if so it’s not showing. Santa came by, and Steamhammer’s web page is updated with binary and source downloads.

production

The “build beside the base” bug (mentioned here) is fixed. It’s a damaging bug and the fix is critical, but the issue that learning hides bugs means that damage will be ongoing for a while yet.

Building manager can construct sunkens and spores when ordered, carrying out both steps itself. Formerly, laying down the creep colony, and then morphing it to a sunken or spore, were separate steps that had to be queued as two distinct items. Further, there was no connection between the queue items, so sometimes a creep colony intended to be a sunken instead became a spore, and vice versa. The change ensures that you get the static defense you ordered when and where you want it, and simplifies other code so that future features will be easier. The implementation is not perfect; a game against Stone shows that it can go wrong when the spawning pool is destroyed and has to be replaced.

• Fixed a production bug that prevented research from being done in a hive or greater spire. The result was a production freeze, so this is a critical bug fix, though the bug was rare. When I updated to BWAPI 4.4.0 I ensured that research could be ordered in a hive or greater spire, as newly allowed by the BWAPI version, but since my mind is a steel trap I was not clever enough to verify that the research succeeds.

scouting and information

• A terran scan counts as mobile detection. For some decisions, like how useful it will be to make lurkers, having ever seen an enemy scan counts the same as having seen a science vessel or an observer—it means the enemy has the tech. (The question “does it look as though this location is in range of an enemy detector?” is different and was already answered in version 3.3. See UnitUtil::EnemyDetectorInRange(), which did not need any code change under BWAPI 4.4.0 to notice enemy scans.)

• I was annoyed by games where Steamhammer’s drone scout was stopped by a bunker or cannons at the enemy’s front, and since it was unable to move forward, stuck there for the rest of its life as the game continued around it. Must scout—can’t scout—do nothing—die young after a wasted life. I added a rule to release the worker scout in that situation once friendly combat units arrive to keep an eye on the front. It will help the economy and my mood.

overlords

• Formerly, overlords not needed for any other purpose were assigned to wait at Steamhammer’s front defensive line, where the primary sunkens go if there are any. It turned out they were too vulnerable there; sometimes masses of them were shot down before they could escape. Now one overlord is assigned to the front line, and leftovers are sent to the current “main” base, where they are less available but safer. Occasionally the main base changes and they undertake a mass migration, but I haven’t seen it cause a problem.

• Assign overlords to watch island bases, when it is safe and with a low priority.

micro

MoveSafely() is smarter. Its purpose is to move a unit to its destination while avoiding enemy attacks. Now a moving unit in danger from enemy mobile attackers seeks nearby friendly units that can defend it—an air unit seeks anti-air defenders, while a ground unit seeks anti-ground defenders. It works by clustering all potential defending units, and looks for a cluster that it can reach. (Clustering is done centrally by the ops boss, on demand, and cached for the frame. A common case is that many overlords are independently moving safely, so that clustering is more efficient than seeking defenders unit by unit. It also provides more information for the decision, like how strong the cluster is.) If it doesn’t find defenders, or if it is in danger from a static building that can’t chase it, then it retains the old behavior of fleeing directly away from the attacker.
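A simplified sketch of the decision, with hypothetical types (the real clustering and caching live in the ops boss, and the real code also checks reachability and cluster strength):

```cpp
#include <BWAPI.h>
#include <vector>

// Hypothetical summary of one defender cluster, for illustration only.
struct DefenderCluster
{
    BWAPI::Position center;
    bool            hitsAir;
    bool            hitsGround;
};

// Where should a unit in danger run? Toward friends that can defend it if
// the attacker can chase, otherwise directly away from the attacker.
BWAPI::Position fleeTarget(BWAPI::Unit unit,
                           BWAPI::Unit attacker,
                           const std::vector<DefenderCluster> & clusters)
{
    const bool wantAntiAir = unit->isFlying();

    if (attacker->getType().canMove())
    {
        for (const DefenderCluster & c : clusters)
        {
            if (wantAntiAir ? c.hitsAir : c.hitsGround)
            {
                return c.center;   // real code: the nearest cluster it can reach
            }
        }
    }

    // No suitable defenders, or the attacker is a static building that can't
    // chase: fall back to fleeing directly away from the attacker.
    return unit->getPosition() + (unit->getPosition() - attacker->getPosition());
}
```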

MoveSafely() has a safety margin. The margin is wider for a worker, and for a unit that has no regular attack (like a flying detector or a spellcaster). This is a big improvement that makes overlords safer. It has more effect in practice than the above change of seeking defenders. The reason there was no safety margin before is that I forgot it—I let myself be interrupted before I had entirely finished the feature. Concentration is important!

zerg

A bug caused gas mining to go severely wrong in a small proportion of games, persistently mining gas when it needed minerals more—a weakness that I originally fixed back in version 1.0. In one game, only 3 drones survived and all of them mined gas for the rest of the game, even as gas built up to 8000 and minerals ran out. I didn’t pin the cause down 100%, but the only code that could have caused it was a check that said “hey, I have too many drones and some are idle, there’s no loss in mining gas whether I need it or not.” I commented out the check, which never worked quite as intended anyway, and I haven’t seen the bug since. The ultimate error is likely in the worker manager.

All-new spore building code fixes further spore placement bugs beyond what the building manager changes fix. It is rather a lot of code change for a small behavior change, but the new code is more general and modifiable and future improvements will be easier.

• Limits on making sunkens and spores are slightly altered. Sunkens to protect against vultures and dark templar are possible minutely later in the game. It is a little more willing to add a spore despite already having a spire to defend with scourge.

• The building placer is a tiny bit smarter about sunken placement to stop cannon rushes and bunker rushes. I don’t notice any practical improvement.

• Steamhammer rarely had more than 1 queen on the map, though it is configured to keep up to 6 and often decided to make more than 1. Something is not right in the strategy boss. I switched it to make queens in batches, as resources allow, rather than one by one on each call “what should I make now?” Now it reaches as many as 2 queens on the map... there is still a bug somewhere. 2 queens are not an efficient number for broodling.

• Minor updates to support multiple defilers. Formerly, Steamhammer never made more than 1 defiler at a time, in part because defiler micro is cpu intensive (it was much more cpu intensive before I fixed its original bugs). I noticed that some opponents were quick to target the defiler, and killed defilers reliably so that little defiling occurred. Making more defilers, currently 2 at a time, is my attempt at a quick fix. It may help, though from games so far I think not much.

There is a small update to the strategy boss, to order more defilers up to a limit. Also, if the enemy is nearly dead, don’t make more than 1 defiler; it won’t be useful. Trivial changes to the micro controller for defilers, MicroDefilers, slightly improve efficiency; it already supported multiple defilers.

• Small adjustments to ZvZ counters in the strategy boss. Nobody will notice.

configuration

Config::Debug::DrawDefenseClusters draws the air defense and ground defense clusters used by MoveSafely() to find defenders to flee toward. A unit that can shoot both air and ground will belong to both an air defense cluster and a ground defense cluster. Static defense buildings are included in the clusters, so that (for example) an overlord fleeing a corsair may take refuge with a spore colony.

Config::Skills::MaxDefilers is added. It’s currently set to 2.

openings

• Added 9PoolSpeedSpire.

• Two related but different openings were named Over10HatchHydra. Oops. I renamed one of them to Over10HatchHydraB.

SSCAIT is popular today

I see 15 viewers on the SSCAIT stream as I write. More usually I see 1 to 4 of late, which presumably includes me when I’m watching. It’s a good sign; the annual tournament is driving interest.

SSCAIT tournament soon

I’ve just uploaded Steamhammer 3.3.5, which will be the SSCAIT tournament version unless it hits a last-minute bug. If you dare to rush through your opponent prep, now’s the time! Expect the change list after the deadline. To the eye, this version fixes all the most visible bugs introduced in and since the AIIDE version: the games look cleaner, overlords live longer, and bizarre expansion behavior does not happen. Results are only slightly improved, though, in part because of the “learning hides bugs” issue. I expected better.

Starting on 19 December, there’s been a rush of updates. In fact, every bot updated after 27 November was updated (or re-updated) on 19 December or later, so there’s a gap in the dates.

There is not much to predict about the tournament. I think everyone can foresee that the top finishers of the round robin phase will include Stardust, Krasi0 (if it competes as terran this year), Monster, and PurpleWave, and likely BananaBrain which has been doing well. Halo by Hao Pan is significantly weaker, and there is a gap below Hao Pan and adias (aka SAIDA) of nearly 100 elo before the remaining strong bots. Steamhammer is likely to finish near the middle of the top 16, and then survive not very long in the elimination phase, as in past years.

unexpected infrastructure work in Steamhammer

To fix a bug, to make new features easier, and to improve usability, I updated Steamhammer so that the building manager carries out both steps of constructing a sunken colony or a spore colony, laying down the creep colony and then morphing the creep. Openings that used to say "creep colony", [other steps while we wait for the creep to complete], "sunken colony" now go "sunken colony", [other steps]. If you ask for a creep colony, then that is all you get; other code will have to take care of morphing it, because the building manager no longer does that step by itself, only the two-step process as a unit.

The bug, by the way, is that there was no tracking of what was supposed to happen to each creep colony. When making a sunken and a spore at the same time, they often got swapped, so that instead of (say) a spore in the main and a spore and sunk in the natural, there might be 2 spores in the natural and a useless sunk in the main. If you’ve watched many Steamhammer games, you may have wondered why spores are so often oddly placed—now you know one reason.
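A sketch of the missing bookkeeping (a hypothetical class, not the actual building manager code): remember the intended final type of each creep colony and morph that exact colony when it finishes.

```cpp
#include <BWAPI.h>
#include <map>

// Hypothetical bookkeeping, for illustration only: remember which final type
// each creep colony was ordered as, so a planned sunken can never turn into
// a spore (or vice versa) just because two colonies finish at the same time.
class ColonyIntentTracker
{
    std::map<BWAPI::Unit, BWAPI::UnitType> intent;   // creep colony -> sunken or spore

public:
    void planColony(BWAPI::Unit creepColony, BWAPI::UnitType finalType)
    {
        intent[creepColony] = finalType;
    }

    // Call once per frame: morph each finished creep colony into its intended type.
    void update()
    {
        for (auto it = intent.begin(); it != intent.end(); )
        {
            BWAPI::Unit colony = it->first;
            if (!colony->exists())
            {
                it = intent.erase(it);               // colony died before morphing
            }
            else if (colony->isCompleted() &&
                     colony->getType() == BWAPI::UnitTypes::Zerg_Creep_Colony &&
                     colony->morph(it->second))      // issue the second step
            {
                it = intent.erase(it);
            }
            else
            {
                ++it;
            }
        }
    }
};
```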

I didn’t intend to do any serious infrastructure work, but this piece turned into it. It was worth it, because the bug was causing more trouble after every update, but there was more to it than I realized. The assumption that you build a creep first and later morph it as a separate step turned out to be baked into the codebase. Besides the openings and the building manager, I had to closely analyze parts of the strategy boss, the production manager, and the macro acts themselves, and make changes that risk introducing new bugs. It has worked perfectly in tests so far, but I do not have confidence that reactions and corner cases will work in every situation.

I haven’t dealt with the issue of canceling and restarting sunkens due to the curious hit point change of the sunken colony, which Steamhammer has supported since version 1.4. Changes are needed. After that I think it’s not much extra work to implement delayed morphing of sunkens, like Arrakhammer, so I may do that too. But I could be wrong about the amount of work....

How do these subtle assumptions become so deeply threaded through a codebase, and so hard to change? On the one hand, it’s a failure to separate concerns, so you feel you could have done it better. On the other, how are you supposed to know what concerns you’ll have in the future? Software is hard.

software maintenance and the decision cycle

Whether in your brain or in a Starcraft bot, to act in the world you first collect information, evaluate the information to make decisions, and execute your decisions. The steps may not be as neatly separated as the words that describe them, but they are always there. Think of the psychology concepts of perception, cognition, and motor control, or the military OODA loop (observe, orient, decide, act), and other decision cycles.

When you write a big piece of software, it matters how you organize the steps. In general terms, Steamhammer follows its parent UAlbertaBot, and many other bots, in the way it organizes them: By the decisions. The code that makes a decision is responsible for collecting whatever information it needs, by whatever combination of calling BWAPI directly and calling on the rest of the program, and responsible for executing its decisions, again sometimes calling BWAPI directly to issue orders and sometimes passing internal orders to the rest of the program. So one module makes spending decisions (“a hydralisk next”), one module controls mining workers (“send it to that patch”), and so on.

To a certain extent, that organization is inevitable. Decisions of different kinds have to be made by different code (absent super-powerful machine learning or some other extreme abstraction technique), and the code has to have inputs and outputs. But the haphazard way of collecting inputs, and of passing along outputs, is not so good. I noticed long ago, and over time I’ve seen more clearly, that it is error prone.

On the input side, the data a module sees depends on the order that modules run in: They are not independent. I sorted modules so that, on each frame, information-gathering ones like InformationManager run before decision-making ones like CombatCommander, but in the full program the dependencies are not that simple. Read closely and you’ll find comments like “this must happen before that,” and comments like “eh, the data is one frame out of date but in this case it doesn’t matter,” and special cases to work around backward dependencies. I have fixed bugs, and I feel 100% certain that there are undiscovered bugs due to computing information only after it is needed.

On the output side, it’s difficult to coordinate decisions. A common error is double commanding, where a unit is given contradictory orders: One bit says “Look out, drone, the enemy is near, run away,” then the rest of the code doesn’t remember that the decision is made and says “Hey drone, you’re not mining, get back to work.” Most orders (not all) go through the Micro module for execution, and Micro knows not to issue two BWAPI commands for a unit on the same frame, so a frequent result is that the drone is told to run away one frame, then to mine the next frame, and so on back and forth. It’s a common cause of bugs where units vibrate in place instead of doing anything useful, and the worker manager (which makes a lot of special case decisions) has a particularly elaborate internal system to try to prevent it. Literal double commanding at the BWAPI level is only one issue; the same kind of thing can also happen at higher levels of abstraction, causing problems like indecisive squads.
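The BWAPI-level part works roughly like this (a simplified sketch in the spirit of the Micro module, not its actual code): remember the frame of the last command to each unit and drop any second command in the same frame.

```cpp
#include <BWAPI.h>
#include <map>

// Simplified per-frame double-command guard at the BWAPI level.
class CommandGate
{
    std::map<BWAPI::Unit, int> lastCommandFrame;

    bool canCommand(BWAPI::Unit u) const
    {
        auto it = lastCommandFrame.find(u);
        return it == lastCommandFrame.end() ||
               it->second != BWAPI::Broodwar->getFrameCount();
    }

public:
    // Issue a move order only if the unit has not been commanded this frame.
    bool move(BWAPI::Unit u, const BWAPI::Position & target)
    {
        if (!canCommand(u))
        {
            return false;    // already commanded this frame; drop the order
        }
        lastCommandFrame[u] = BWAPI::Broodwar->getFrameCount();
        return u->move(target);
    }
};
```

Of course, dropping the literal duplicate only masks the underlying indecision; the next frame the other module gets its turn, and the unit vibrates.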

The logical fix is to add architectural barriers between input, decision, and output. In principle, each module collects all its inputs and puts them into a data structure, then draws a line under it, done. Then it makes its decisions on that basis, records the decisions in another data structure (with the idea of forcing it to resolve any conflicting decisions up front), and draws a line under that. Then it executes the recorded decisions. Input, decision, and output become separate phases of execution.

In real life the dependencies are complicated and it’s not that simple. I’m thinking that the ideal architecture for input data is a fixed declarative representation of everything that might be wanted during a given frame, which is evaluated on demand, in the style of lazy functional programming. That way dependencies are explicit, dependency loops will make themselves evident, and only the information you need is computed each frame.
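One way to sketch the on-demand, frame-cached input idea in C++ (an illustration, not a design Steamhammer implements):

```cpp
#include <BWAPI.h>
#include <functional>
#include <utility>

// Illustration of on-demand, frame-cached input data: the value is computed
// the first time it is asked for in a frame and reused for the rest of the
// frame, so dependencies are pulled in exactly when needed.
template <class T>
class FrameValue
{
    std::function<T()> compute;
    T   value{};
    int frame = -1;

public:
    explicit FrameValue(std::function<T()> f) : compute(std::move(f)) {}

    const T & get()
    {
        const int now = BWAPI::Broodwar->getFrameCount();
        if (frame != now)
        {
            value = compute();   // recompute at most once per frame
            frame = now;
        }
        return value;
    }
};

// Usage sketch: other values' compute functions can call enemyAirUnits.get(),
// which makes the dependency explicit and evaluates it only when needed.
// FrameValue<int> enemyAirUnits([]{
//     int n = 0;
//     for (BWAPI::Unit u : BWAPI::Broodwar->enemy()->getUnits())
//         if (u->isFlying()) ++n;
//     return n;
// });
```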

I don’t have such a beautiful solution for output. The Micro module is a partially implemented attempt to separate some decisions from their execution. It does help, but as we’ve seen above, even if it were a complete implementation it would not solve the problem. The decisions themselves have to be good, and though architecture can aid good decisions it can’t require them. Maybe there’s nothing for it but to be clear about exactly what you’re deciding, at what level of abstraction, and be careful to do it right.

learning hides bugs

Today I uploaded the third tournament test version of Steamhammer. Games of the second test version show that it’s already visibly stronger, with a couple of the worst weaknesses ameliorated. In particular, I fixed the heinous bug “Why would I expand to that base? No, I’ll just build macro hatcheries all around it instead.” Watching a lot of games to verify my changes, I was reminded of a lesson.

Bruce Nielsen, author of Stardust, wrote a comment about the disadvantages of opening learning:

I’ve found it refreshing to work without opening learning, as I was definitely using it in Locutus as a crutch to avoid doing necessary underlying work on stuff like worker defense or reacting to scouting information. While it of course worked to a certain extent, it also resulted in a lot of embarrassing losses from exploring builds that only work in very specific situations.

Learning seeks to adapt to the situation. Empirical learning, which is almost all of what bots do, adapts by experimentation, which means that some experiments will fail—those are the embarrassing losses. To me, the first part is more central, the “necessary underlying work” on skills. The bot’s own skills and tendencies are part of the situation that learning adapts to; if you lack a skill, learning will seek a workaround so that the lack causes fewer losses.

And the same if you have a bug. Steamhammer’s build-beside-the-base bug caused macro games to go off the rails. The loss rate did not increase as much as you might expect, because opening learning compensated by switching to all-in builds that did not lead to macro games. Now I have fixed the bug, and it should switch back to macro builds when appropriate. But the learning is slow, and it will not switch all the way back before the tournament, so Steamhammer’s tournament result may be worse than if the bug had never existed. Even though the bug is fixed, it contaminated my learning data, and having had the bug before makes play worse now.

Learning hides bugs. Which would be fine if it hid them completely, but of course that’s impossible. Bugs and weaknesses hurt less when learning can find a workaround, but still hurt. It becomes harder to evaluate your bot’s play and choose which weaknesses are more important to work on.

It makes me think that, if you’re making a serious evaluation of how well your bot is performing, you need to do some tests with learning turned off. Drop the crutch and try to walk without it. For example, you could take learning data from a previous version and freeze it, and run a test to see if there are regressions in playing strength versus particular opponents or when playing particular builds.

Steamhammer tournament plans

For the upcoming SSCAIT annual tournament, I’ll follow my usual plan. I’ve just uploaded a new test version Steamhammer 3.3.1, which fixes one of the critical bugs (and has another surprise change). I’ll drop frequent test versions until tournament time, and after the deadline I’ll release the tournament version. Time is short, so the changes will be mostly bug fixes and low-risk improvements that are unlikely to break stuff.

I expect the standard long no-upload period while the tournament runs. I will either turn to SCHNAIL, or else I’ll work on one of my machine learning ideas. Just after tournament season is the ideal time to add bugs and their associated major new features, so that the rest of the year can be spent working desperately to fix them—I mean, to tune them.

Steamhammer 3.3 change list

I’ve uploaded Steamhammer 3.3 to SSCAIT. The changes to play are slight, and stream viewers are unlikely to notice anything even if they look closely. Nevertheless, it’s an important upgrade. If no disasters strike overnight, I should get source up tomorrow.

BWAPI 4.4.0

Steamhammer is updated to BWAPI 4.4.0, at long last. In AIIDE this year, it was the only updated bot which still relied on the older BWAPI 4.1.2, now Positively Ancient by community consensus.

Part of the change is switching from VS2013 to VS2017. VS2017 has a more capable compiler backend with a stronger optimizer. Steamhammer’s DLL size fell from 1,211,392 bytes for version 3.2.20 to 1,100,800 bytes for 3.3, reflecting the difference in compiler, small changes in Steamhammer’s size, and any changes in the size of BWAPI.lib. That’s about a 9% improvement in object size for the effort of upgrading, surprisingly large. The bot presumably runs faster, but I didn’t measure it. Steamhammer already runs fast, so I doubt any speed improvement matters.

• If the opponent is terran, Steamhammer now tracks enemy comsat scans. Call InformationManager::getEnemyScans() to fetch the current scans. BWAPI 4.4.0 makes enemy scans available, where BWAPI 4.1.2 did not.

errors related to spell units

The spells comsat scan, disruption web, and dark swarm are represented in BWAPI by special units which belong to the player who casts them. Code that looks at units often has to know that. I searched through Steamhammer for code which did not know it, and found cases. Some were due to scans and only had to be fixed because of the 4.4.0 upgrade—it’s a subtlety for authors to be aware of.

• An enemy scan does not imply that the enemy has air tech.

• When deciding how many scourge are needed, don’t count enemy scans as air units.

• In squad targeting, do not target scourge at an enemy scan.

• In micro targeting (which is different from squad targeting), do not target enemy spell units.

• When clearing neutral blocks (like blocking eggs that are part of the map), do not target neutral spell units. A small number of maps have permanent neutral dwebs or swarms.

• In map analysis, don’t mark the areas covered by neutral spells as unwalkable.

• Don’t try to include a spell unit in a squad. I think spell units didn’t get in anyway, but now they are cut off at the first validity check, so it is as safe as can be.

zerg

Research in a hive. Until 4.4.0, a hive could perform no research due to a BWAPI bug. Steamhammer worked around it by getting all its research done before upgrading to hive, crimping its strategy choices. In hive rush openings, it was unable to research overlord speed ever, a serious issue. I’m so glad this is finally cured.

• Research +3 air upgrades. Until 4.4.0, it was not possible to upgrade in a greater spire due to a BWAPI bug, and Steamhammer had a workaround to avoid +3 spire upgrades altogether.

• Trivial bug fix: If Steamhammer lost its queen’s nest at any time after it started morphing its hive, but did not lose the hive, it was unable to replace the queen’s nest. Well, it rarely needed to, but it does have queen skills. The limitation is lifted.

• Indiscernible bug fix: Prevent a production freeze that could have happened if Steamhammer wanted to research burrow, had no hatchery other than a lair or hive, and the lair or hive was busy researching something else. There was virtually no chance of this bug ever occurring.

openings

• No doubt under the influence of an Infernal Compulsion Engine (old tech now superseded by Electric Motives), I added the zerg openings 9Pool8Hatch and 9Pool9Hatch. Crona plays a related opening with success.

tracking enemy comsat scans

Today I implemented tracking of enemy scans in Steamhammer, a new feature allowed by BWAPI 4.4.0. Before, there was no way for a bot to know if the terran enemy had scanned you, except to notice that a burrowed or cloaked unit you had thought was undetected was suddenly being fired on.

It’s simple. An enemy comsat scan is represented as an unseen enemy detector unit (that potentially exists() from BWAPI’s point of view) with unit type Spell_Scanner_Sweep. To find the scans, iterate through BWAPI::Broodwar->enemy()->getUnits() and pick them out by unit type, done. The scan unit has the same sight range and the same detection range as a science vessel, 10 tiles or 320 pixels.
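In code, the whole thing is a few lines (my own snippet, not necessarily Steamhammer’s exact getEnemyScans() implementation):

```cpp
#include <BWAPI.h>
#include <vector>

// Collect the positions of current enemy comsat scans. Each scan shows up as
// an enemy unit of type Spell_Scanner_Sweep (BWAPI 4.4.0 and later).
std::vector<BWAPI::Position> enemyScanPositions()
{
    std::vector<BWAPI::Position> scans;
    for (BWAPI::Unit u : BWAPI::Broodwar->enemy()->getUnits())
    {
        if (u->getType() == BWAPI::UnitTypes::Spell_Scanner_Sweep)
        {
            scans.push_back(u->getPosition());
        }
    }
    return scans;
}
```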

Unlike the disruption web unit and the dark swarm unit, the Spell_Scanner_Sweep unit is “flying”—the unit type returns true for isFlyer() (I assume that is what makes it equivalent to a science vessel). After realizing that, I searched for related bugs, and found some. One sneaky bug caused by the upgrade to BWAPI 4.4.0 was sending scourge to the location of a scan—“must shoot down the enemy air unit!” Another sneaky bug that was there all along was in analyzing the map: If the map had neutral dwebs or neutral dark swarms, then the tiles under those spells would be marked unwalkable in the internal representation; I never noticed, partly because few maps have permanent spells and partly because Steamhammer doesn’t implement pathfinding yet. I added checks for whether a given unit isSpell() in a number of places.

The scanner sweep unit has getRemoveTimer() so you can tell how long the scan will last. For enemy scans, it always returns 0 in my tests. That kind of makes sense, but it was not what I expected. I thought BWAPI would try to judge the visibility of the scan by whether you could see the swirling sparkles, which tells a human player both where and when the scan occurred, so it is enough information to judge duration. The change log says “the visibility of the scan is approximate and not accurate,” but nothing more.