
Steamhammer 3.3.6 uploaded

Steamhammer 3.3.6 is uploaded, along with matching Randomhammer. It is primarily a bug-fix release, and many of the bugs are noted here. It includes some work on opening timing, but nothing finished, so no writeup yet.

plan recognition

• I instrumented the code that incorrectly decided that Wuli was doing a “Fast rush”, and did not find a logic error. It behaved as designed. I tightened up the thresholds so that a rush has to be faster to be recognized as “fast”.

tactical analysis

• Don’t assign defenders to destroy proxy pylons, proxy supply depots, or proxy engineering bays, unless there are no other enemies to defend against. As an example, if there’s a pylon and a zealot, call in 4 zerglings to defeat the zealot and ignore the pylon for the moment. (Formerly, Steamhammer called in 6 zerglings.) If there is a pylon and no other enemy, call in 1 zergling to destroy the pylon. This more frugal defense is also an important improvement to cannon rush defense: Steamhammer does not assign defenders to defeat the cannons, but relies on its anti-cannon sunken to hold them at bay, so that mobile units can instead go attack the enemy base. (Units made outside any cannon containment should have a clear path to the enemy, and another part of cannon rush defense is to build an outside base, diverting the scout drone if necessary.) This change means that Steamhammer also does not assign defenders to defeat the pylons powering the cannons, which was allowing the enemy base to survive too often. To be sure, blunders in unit movement count more. A sketch of the rule follows at the end of this section.

• The base defense hysteresis is slightly increased, and the in-zone defense radius is increased significantly. This should ameliorate some chasing misbehaviors, such as when mutalisks seem to forget that vultures are in the base.

• base->inOverlordDanger() added: it tells whether any overlords at the given base are at risk. Used in deciding where to spawn fresh overlords (see below).

• base->inWorkerDanger() tightened up. The enemy must be 2 tiles closer before the workers are considered to be in danger. Workers have been panicking too early, losing too much mining time. More improvements are needed, but this simple tweak should help.
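To make the first item above concrete, here is a rough sketch of the rule. The names and numbers are illustrative and do not match the real code:

    // Proxy pylons, supply depots, and engineering bays do not get defenders
    // assigned against them, and neither do cannons, which are left to the
    // anti-cannon sunken.
    int zerglingDefendersWanted(int enemyCombatUnits, int proxyBuildings)
    {
        if (enemyCombatUnits == 0)
        {
            // Nothing but proxy buildings: one zergling is enough to clear them.
            return proxyBuildings > 0 ? 1 : 0;
        }
        // Size the defense to the combat units and ignore the proxy buildings.
        // 4 per zealot-sized unit stands in for the real calculation.
        return 4 * enemyCombatUnits;
    }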

production

• Fixed the bug of trying to morph the same larva twice. This is a big one. Steamhammer now does its own tracking of morph orders, bypassing BWAPI. The fix introduced a side-effect bug that made it impossible to morph a lair into a hive, because the lair remembered its previous morph order from hatchery to lair and refused to repeat it to become a hive. I fixed that by clearing the morph orders of buildings when they complete. (Morph orders of units change immediately after they complete, because units are given orders right away.)

• Selecting which larva to morph is smarter. The routine had to be rewritten anyway to solve the morph-twice bug. Formerly, Steamhammer did an ad hoc analysis of the main hatchery of each base when deciding where to make drones, so it could make drones at a base which needed them, and otherwise preferred to spawn units at whatever hatchery had the most larvas (a hatchery with 3 larvas cannot spawn more, so you want to use those first). Now it follows a more general 2-step procedure. First, it decides which bases are best for the production, sorting them into priority order. Second, it scores every larva for nearness to a priority base and for hatchery larva count. A larva that does not belong to any hatchery (because its hatchery was destroyed) gets a higher score (because it is not likely to live long), provided the enemy is not near; the enemy just destroyed the hatchery, and they’ll hit any stray units too. A sketch follows after this list.

• Base priority is implemented for drones and overlords. For drones, all base hatcheries are considered, not only the main hatchery, and a base that is under attack is avoided when possible. This is a small improvement over the former behavior, and will become more important when Steamhammer distributes hatcheries better, as I plan. For overlords, if the enemy has wraiths or corsairs or other overlord hunters and we have spores to defend, bases with a spore colony get priority. A base under active attack by anti-air units is avoided when possible. This is a substantial improvement and will save games.
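As a rough sketch of the two-step larva selection, with illustrative types and numbers that do not match the real code (step 1, choosing and ordering the best bases, is assumed already done):

    #include <vector>

    // Illustrative types only; they are not Steamhammer's classes.
    struct Hatchery { int larvaCount; };
    struct Base {};
    struct Larva
    {
        Base * base;          // the base this larva belongs to, or nullptr
        Hatchery * hatchery;  // its hatchery, or nullptr if the hatchery died
        bool enemyNearby;     // is an enemy close to this larva?
    };

    // Step 2: score a larva against the priority-ordered bases from step 1.
    // The highest-scoring larva morphs the next unit.
    int scoreLarva(const Larva & larva, const std::vector<Base *> & bestBases)
    {
        int score = 0;

        // Nearness to a priority base: earlier in the list counts for more.
        for (size_t i = 0; i < bestBases.size(); ++i)
        {
            if (larva.base == bestBases[i])
            {
                score += 100 * int(bestBases.size() - i);
                break;
            }
        }

        if (larva.hatchery)
        {
            // Prefer hatcheries with more larvas; one holding 3 cannot spawn more.
            score += 10 * larva.hatchery->larvaCount;
        }
        else if (!larva.enemyNearby)
        {
            // Orphan larva whose hatchery was destroyed: use it first since it is
            // not likely to live long, unless the enemy that killed the hatchery
            // is still nearby.
            score += 1000;
        }

        return score;
    }

Larvas that my own order tracking says were already given a morph order are filtered out before the scoring, which is the heart of the morph-twice fix.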

micro

• Drones no longer burrow in reaction to cannons, or to a nearby bunker or sunken colony. Burrowing there was an oversight in the original implementation: the drones would burrow just out of range and remain burrowed unless and until the enemy static defense was destroyed, which is not a useful behavior.

• Fixed a bug causing double commanding of overlords.

• Fixed bugs causing double commanding of drones to burrow or unburrow. It’s not harmful; the commanding infrastructure swallows the extra command safely, but I fixed it anyway. Though I didn’t entirely fix it, because I see that double commanding to unburrow can still happen.

scouting

• In the last version I added a feature to release the scouting worker under tight conditions that ensured that the scout was only released if it was unlikely to be able to scout anything more. A drone stalled from getting into the enemy base by cannons would be released when zerglings arrived to keep watch, for example. I always expected that the conditions would turn out to be too tight, and I was right. I dropped the condition on the distance to the enemy base, so that (for example) a drone chased across the map by enemies could be released if zerglings showed up to save it. I will probably loosen the conditions further in the future.

zerg

• Fixed the bug in ZvZ emergency spores caused by an integer overflow.

• In an emergency, build sunken colonies or spore colonies even if the drone count is low. Keeping the last 5 drones alive is more important than making another drone because you only have 6.

• Don’t break out of the opening versus cannons unless it is necessary to make a faster spawning pool (to make a sunk to hold the cannons at bay). This is a bug fix. It was always intended to work that way.

• Place any anti-cannon sunken colony near the ramp to the main when possible. If the base has a corresponding main base (it is the natural of some main), then prefer sunken positions near the entrance to the main. This is another advance in cannon rush defense. It will be harder for the cannons to creep past the natural and into the main, as MadMixP and Juno both like to do, and it is generally good to defend the ramp anyway. Steamhammer’s cannon rush defense has grown complex, with many necessary skills that add resilience, and yet is still easy to overcome. I should post about that.

• Limit scourge production more tightly yet. I’ve improved this over and over, and it still makes trouble.

• Tune down the scourge priority for valuable targets, except for helpless guardians and cocoons. Scourge were too often trying to fly past the corsairs to reach the carriers, and not making it.

• Spawn all desired queens at once, instead of waiting to order them until resources are available. Formerly, the rules were ordered so that more than 2 queens were almost never made, even when more were on order. This should make broodling more viable, but it’s mostly for fun.

openings

• I added new openings 12Gas11PoolLurker and 12Gas11PoolMuta. They’re more interesting than you probably guess, so I’ll post separately about them.

Steamhammer’s new bugs

From watching tournament games, I wrote down about 20 to-do items. A couple are ideas to try, but most are bugs, and 8 are critical bugs that I want to fix fast. I thought I had left Steamhammer in a more reliable state. :-(

Steamhammer lost its tournament record of no losses due to crippling bugs when it lost both games to Wuli. In both games, it misdiagnosed the protoss opening as “Fast rush” instead of “Heavy rush” and broke out of an opening that would have been strong, reacting as though it were facing a surprise 6 gate rather than 9-9 gates. You can’t survive a blunder like that against an aggressive opponent. I don’t see the error in the code, but I’ll work on it until I find it.

Two winning games were even worse, against cannon bot Juno by Yuanheng Zhu. Steamhammer started out playing a build that had been 100% successful forever, then panicked and broke out of the opening when it saw the cannons. I think that somehow a bug got into the code that decides whether the current build is good for the situation and should not be interrupted. Having broken out of the opening, Steamhammer relied on the strategy boss, which made choices much inferior to the planned opening. In one of the 2 games it played well enough to eventually bust the cannons and win, but the other game, on Python, was plagued by further severe bugs. If you know Starcraft at all, this game will offend your sense of esthetics, and possibly your sense of ethics; it is so bad it is immoral. Steamhammer smuggled a drone to a hidden expansion, one of few successful actions in the game. It built the expansion after a long delay—there is probably another bug there—then sent nearly every drone made at the outside hatchery back to the main so that it died to the containing cannons—definitely a bug, and not one I’ve seen before. Steamhammer also burrowed drones that came near cannons, so that it eventually stopped mining (even gas mining) because every surviving drone was burrowed. And other bugs. So many bugs. :-( Zerg won the game on points because it mined less and because Juno had its own misbehavior, repeatedly building cannons in sunken range where they died instantly.

The rich player usually loses if the game times out, because the rich player made more stuff and it died. And you want to be the rich player. So win your winning games, don’t let them time out!

Steamhammer won its games versus XIMP by Tomas Vajda with its usual seeming ease, but there was a bug there too. This bug I’d seen before, but I only realized the cause in watching the tournament. Steamhammer had a partial production freeze, where it built nothing but scourge and zerglings for a period, losing ground compared to its regular production. The cause is that the reactive behavior of building scourge and the fallback behavior of adding zerglings when the situation allows were jointly blocking the regular behavior of refilling the production queue. Only special case units were being made, not regular units.

The bugs above are severe, but only affected a few games each. Together, they may have cost Steamhammer one rank in the tournament, if that much. I think the latency compensation bug is more severe, even though I can’t point to a game that Steamhammer definitely lost due to it. It affects far more games, and one of the effects is to drop one drone in 12 pool or 12 hatchery builds. Being one drone short of plan, starting early in the opening, is a serious handicap. It also sometimes drops planned zerglings or other units. Many openings have their timing disrupted, so that research is not accomplished or unit counts are reached late, causing further misplays.

I fixed the latency compensation bug yesterday. To do it I had to completely rewrite the routine that chooses which larva to morph into the next unit, so I took extra time and added features. Now it is more general: It is divided into a first step that decides which bases are better locations for the next unit, and a second step that tries to find a larva near one of those bases. The old version already knew that it’s better to make a drone at a base which needs more drones. I taught it in addition that if there are corsairs or wraiths flying about, and we have spores to defend, it’s better to make overlords at a base with a spore.

As mentioned here, I plan to release a bugfix version, Steamhammer 3.3.6, first, then get back to work on my real project of 3.4. I’m often wrong in strength estimates, but even so, if I can fix the critical bugs, I think 3.3.6 should be stronger by 50 elo points.

solidity in AIIDE 2020 - part 5

A little more on daring/solid before I post about Steamhammer’s bugs.

Are the numbers reliable? Are results repeatable? If I measure another competition, will the solidity measure of the same bots come out similarly? If I measure A as more solid than B, is it true? Does solidity mean anything?

Statistically, there are two parts to the question. One part is, given a fixed set of bots, what is the spread of the solidity numbers? How many games do you need to feel sure you can tell A is more solid than B? In principle, that can be answered mathematically by running probability distributions forward through the calculation. That would be a useful exercise anyway, since it could suggest better calculations. But it turns out that I am not a statistician, and I don’t want to do it. It might be easier to answer by Monte Carlo analysis: Simulate a large number of tournaments, and see the spreads that come out.

The other part is, how does repeatability vary as the participants in the tournament vary? Will the same bot get a similar solidity number in a tournament where many of its opponents are different? What if it is a tournament where the average bot is (say) more solid than those in the original tournament? Are there other player characteristics that might make a difference? Does repeatability improve as the number of participants increases, as you would expect? That can also be answered by Monte Carlo analysis, but you’ll have to make more assumptions about how players behave. I don’t see any substitute for analyzing actual past tournaments, at least as a first step to understand the important factors.
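For what it’s worth, here is a rough sketch of the Monte Carlo idea, using the rms deviation as a stand-in for the solidity number. Everything in it is made up for illustration: the ratings, the game counts, and the simplification of reusing the true elos instead of re-imputing them from each simulated crosstable, as the real calculation would have to do.

    #include <cmath>
    #include <cstdio>
    #include <random>
    #include <vector>

    // Expected win rate for an elo difference, the usual logistic curve.
    double expected(double eloDiff)
    {
        return 1.0 / (1.0 + std::pow(10.0, -eloDiff / 400.0));
    }

    int main()
    {
        // Pretend "true" ratings and a pretend round robin size.
        std::vector<double> elo = { 455, 235, 144, 88, 51, 31, 28, 1, -71, -138, -152, -198, -522 };
        const int gamesPerPairing = 100;
        const int tournaments = 1000;

        std::mt19937 rng(12345);
        double sum = 0.0, sumSq = 0.0;   // track bot 0's rms deviation across tournaments
        for (int t = 0; t < tournaments; ++t)
        {
            double devSq = 0.0;
            for (size_t j = 1; j < elo.size(); ++j)
            {
                double p = expected(elo[0] - elo[j]);
                std::binomial_distribution<int> games(gamesPerPairing, p);
                double actual = double(games(rng)) / gamesPerPairing;
                devSq += (actual - p) * (actual - p);
            }
            double rms = std::sqrt(devSq / (elo.size() - 1));
            sum += rms;
            sumSq += rms * rms;
        }
        double mean = sum / tournaments;
        double spread = std::sqrt(sumSq / tournaments - mean * mean);
        std::printf("rms deviation from chance alone: mean %.1f%%, spread %.1f%%\n",
            100.0 * mean, 100.0 * spread);
        return 0;
    }

If the spread from chance alone is small next to the bot-to-bot differences in the part 4 table, then the ordering means something; if not, it is noise.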

I will analyze past tournaments, but not now. For the moment, I think my intuitive answers to both parts of the question are good enough: AIIDE 2020 does have enough games that the bots can likely be ordered by solidity without big mistakes, and it does not have enough varied participants to be sure that a solidity measurement from one tournament is useful for predicting the next one. In time, I want to automate the whole calculation and include it in my suite of tournament analysis software, so I can report on it as a matter of course. Right now I’d rather let the ideas simmer for a while and see if something better can be cooked up.

But mainly I want to get back to Steamhammer and make it great!

solidity in AIIDE 2020 - part 4

I computed what I decided to call the upset deviation, which you can take as the average deviation of actual from expected win rate due to upsets. An upset pairing I defined as one where you outscore a stronger opponent (do better than expected) or underscore a weaker opponent (do worse than expected). Theoretically, smaller numbers are more “solid” and bigger numbers are more “daring”. The table also carries over the rms deviation from yesterday.

To summarize the procedure I followed:

1. Compute elo ratings for each participant in the tournament.
2. Using the elo ratings, compute expected win rates for each pairing.
3. For each pairing, the difference between the actual tournament result and the expected win rate is the deviation.
4. Square each deviation and calculate the sum of the squares.
5. Extract the pairings which are upsets and calculate the sum of those squares.
6. The upset ratio is the sum of the upset squares as a ratio of the entire sum of squares.
7. For each participant, given the deviations, compute the rms deviation, which is a kind of average of the deviations. (Some people may not know what RMS is: it stands for root mean square, which means you square each number, find the arithmetic mean of the collection, then restore the original scale by taking the square root of the result.)
8. Multiply the upset ratio by the rms deviation to get the upset deviation.

bot            rms deviation   upset ratio   upset deviation
stardust       7.6%            74.9%         5.7%
purplewave     11.3%           39.6%         4.5%
bananabrain    9.9%            65.2%         6.5%
dragon         15.5%           72.9%         11.3%
mcrave         13.6%           10.1%         1.4%
microwave      15.5%           23.5%         3.6%
steamhammer    16.9%           71.2%         12.0%
daqin          21.2%           63.8%         13.5%
zzzkbot        23.7%           65.7%         15.6%
ualbertabot    9.0%            27.8%         2.5%
willyt         15.7%           33.8%         5.3%
ecgberht       14.7%           52.7%         7.8%
eggbot         6.8%            69.6%         4.8%

The upset ratio has some interest in itself, so I included it. It doesn’t say how big the upsets were; it says what proportion of the (squared) deviations were due to upsets. You have to interpret the percentage as a ratio. The upset deviation then also recognizes how big the upsets were. In this case, you interpret the percentage as the average deviation from expected win rate due to upsets. The whole procedure is ad hoc and of questionable rigor, but all the steps are logical and the results make sense to me. Can anybody suggest an improved method?
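Here is a rough sketch in C++ of steps 2 through 8 for one bot, assuming the elo ratings from step 1 are already known. The names and structure are made up for illustration; none of it is Steamhammer code, and it skips the spreadsheet quirk of counting self-matchups.

    #include <cmath>
    #include <cstdio>
    #include <vector>

    // One pairing from the crosstable, as seen by the bot being measured.
    struct Pairing
    {
        double myElo;
        double oppElo;
        double actualWinRate;   // tournament result against this opponent, 0.0 to 1.0
    };

    // Expected win rate for an elo difference (the standard logistic elo formula).
    double expectedWinRate(double eloDiff)
    {
        return 1.0 / (1.0 + std::pow(10.0, -eloDiff / 400.0));
    }

    void reportUpsetDeviation(const std::vector<Pairing> & pairings)
    {
        double sumSq = 0.0;        // sum of squared deviations over all pairings
        double upsetSumSq = 0.0;   // the same sum over upset pairings only
        for (const Pairing & p : pairings)
        {
            double expected = expectedWinRate(p.myElo - p.oppElo);
            double dev = p.actualWinRate - expected;
            sumSq += dev * dev;
            bool upset =
                (p.oppElo > p.myElo && dev > 0.0) ||   // outscored a stronger opponent
                (p.oppElo < p.myElo && dev < 0.0);     // underscored a weaker opponent
            if (upset)
            {
                upsetSumSq += dev * dev;
            }
        }
        double rms = std::sqrt(sumSq / pairings.size());
        double upsetRatio = upsetSumSq / sumSq;
        std::printf("rms deviation %.1f%%  upset ratio %.1f%%  upset deviation %.1f%%\n",
            100.0 * rms, 100.0 * upsetRatio, 100.0 * rms * upsetRatio);
    }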

By this metric, Dragon, Steamhammer, DaQin, and especially ZZZKBot are the “daring” players in this group. McRave and UAlbertaBot are the most “solid”.

Next: Steamhammer’s bugs.

Randomhammer reuploaded

I noticed that Randomhammer was briefly re-enabled on SSCAIT. I had disabled it before the tournament by uploading an empty zip file. So when Randomhammer played a game versus Feint... it wasn’t much of a game. The bot can only play when it exists.

So I reuploaded Randomhammer 3.3.5. It’s exactly the tournament version of Steamhammer, playing random.

solidity in AIIDE 2020 - part 3

I tried to be clever and make a solidity metric that was also a statistical test, so that it was easy to tell when the numbers were meaningful and when they were just noise. It didn’t work the way I wanted. Then I tried to be clever differently, and converted the results to elo differences, so that the metric ended up as a difference in elo. It’s easy to understand, easy to work with, and mathematically sound, because elo is linear. But the small change from a win rate of 98% to a win rate of 99% corresponds to the big elo difference of 122 points, so the measure was dominated by opponents at the extremes, blowing up the statistical uncertainty. OK, enough! Enough cleverness! Do it the easy way!

Here is a simple measure of goodness of fit, rms deviation of actual from expected win rate. This is not solidity, it is more like consistency or predictability: A small number means that the blue and green curves are close together. The numbers are slightly lower than correct, because I used a spreadsheet and it was easier to include the self-matchups with pretend 50% win rate and 0 deviation.

bot            rms deviation
stardust       7.6%
purplewave     11.3%
bananabrain    9.9%
dragon         15.5%
mcrave         13.6%
microwave      15.5%
steamhammer    16.9%
daqin          21.2%
zzzkbot        23.7%
ualbertabot    9.0%
willyt         15.7%
ecgberht       14.7%
eggbot         6.8%

You can eyeball the graphs and compare these numbers, and you should see that the numbers are a fair summary of how well the blue and green lines match. The bot that mostly won and the bot that mostly lost are good fits, DaQin and ZZZKBot are pretty wild, and UAlbertaBot stands out as unusually consistent. In fact, Stardust, UAlbertaBot, and EggBot all play fixed strategies (one per race for random UAlbertaBot), so it should be no surprise that they are consistent. The next most consistent by this measure is BananaBrain, which plays a wide range of strategies very unpredictably, so it is a surprise.

Next: To turn this into a solidity metric is a matter of extracting the portion of deviation which is due to upsets. It will take a bit of detail work with the spreadsheet. I’m out of time today, so I’ll do that tomorrow. It will be interesting to judge whether consistency or solidity is the more useful metric.

SSCAIT 2020 round robin is over

My first try at the solidity metric did not work well enough. The flaw is easy to fix; expect numbers tomorrow if there are no more flaws. For today, a few notes instead.

The SSCAIT 2020 round robin phase has just finished. The top ranks are no surprise. #1 is Stardust with 104-6 for 94.5%, a dominating performance after a slow start when most of the losses were front-loaded. Tied at #2-#3 are BananaBrain and BetaStar with 99-11, and they scored 1-1 against each other. There is a gap below #4 Monster with 98-12. Tied at #5-#6 are Halo by Hao Pan and PurpleWave with 91-19. This time the head-to-head score is Halo-PurpleWave 2-0. Compared to past expectations, that’s a good result for Halo and a poor result for PurpleWave. #7 Iron landed higher than I predicted.

#16 TyrProtoss, the bottom of the top, is notable for losing every game against others of the top 16, except for one game versus #15 MadMixP. It made up for it with solidity against the rest. The next bot to do at least as poorly against the top 16 is #36 Flash. #17 McRaveZ, the top of the bottom, narrowly missed the top 16, and is notable for a string of 1-1 scores against higher-ranked opponents. McRaveZ also lost 0-2 to #54 Marine Hell, the biggest 0-2 upset. Those 2 points left it 2 games behind #16 TyrProtoss, so it hurt. #18 Skynet by Andrew Smith had even more 1-1 scores against higher opponents. Skynet is old but still tough.

The biggest upsets are cannonbot #50 Jakub Trancik > #2-#3 BetaStar, #52 GarmBot by Aurelien Lermant > #8 Dragon, #54 Marine Hell > #11 Steamhammer, and #55 JumpyDoggoBot > #15 MadMixP. That’s not many extreme upsets; the top 16 are fairly solid. I think another notable upset is the old school champion #29 ICEbot > #1 Stardust.

#11 Steamhammer scored 78-32 for 70.9%. I had forecast rank #9 or #10, and I was optimistic. In the past I’ve predicted more accurately. Looking at every Steamhammer game, I learned about a half dozen bugs and weaknesses that I hadn’t seen before, some severe (I’ll post more later). I guess my prediction was off because of the unexpected weaknesses that I didn’t take into account. But in any case, #11 is the same rank that Steamhammer earned last year, and the year before too, and its win percentage varied neatly with the number of bots in the tournament. In the big picture, Steamhammer had a startup transient (weak the first year because it was barely started, extra strong the next year because other bots had not yet adapted to its new skills), and since then has been holding its level, not surpassing its neighbors but not falling behind either. That’s not bad. But this year I’m putting effort into skills no other bot has, so stand back!

Next I expect a wait while the elimination phase is run behind the scenes, then they’ll turn on bot submission. I will prioritize fixing some of the surprise bugs ahead of my bigger project of opening timing, so I’m thinking I’ll upload Steamhammer 3.3.6 sooner (the tournament version is 3.3.5) and hold 3.4 for later. Then I expect the elimination phase results will come out slowly, week by week. Steamhammer is likely to fall to the losers’ bracket in the first round of the elimination phase, and may visit E-Lemon Nation early on.

Steamhammer-Microwave rivalry

The round robin phase of the annual SSCAIT tournament is nearly over.

Steamhammer was ahead of Xiao Yi by one loss when Steamhammer’s final game came up—versus Microwave. Even with bugs that keep it out of contention, Microwave is still Steamhammer’s rival. Xiao Yi had unplayed games left, and in any case Steamhammer defeated Xiao Yi 2-0 so in the worst case it would place ahead on tiebreak. Nevertheless it was a tense pairing. The game was a long and difficult hive tech ZvZ that neither bot could play particularly well. Notice Microwave’s use of overlords to discover and eliminate burrowed drones. The mutas and devourers were all plagued, but too late....

Another difficult Steamhammer game was versus Ecgberht.

Addendum: BetaStar and BananaBrain will likely end up tied for #2-#3. They scored 1-1 against each other. Will the seeding order for the elimination phase be decided arbitrarily, or what?

solidity in AIIDE 2020 - part 2

Here are the graphs I promised. There is one for each bot in AIIDE 2020. Opponents are not labeled, but are arranged along the x-axis in order of strength. The green line shows the expected win rates against the opponents, based on the elos of the two bots (from yesterday). The blue line shows the actual win rate in the tournament. For purposes of charting, each bot has a fictional win rate of 50% against itself, on both the green and blue lines. So every chart shows green and blue crossing at 50%.

The green line has fundamentally the same shape in every graph, since it is based on fixed elo ratings. It’s just stretched a little differently each time. The blue line must roughly follow the green line; by construction, it can’t deviate in one direction without also deviating in the other. Notice that the scale is different on a couple of the later graphs. In particular, EggBot’s graph only goes to 50%.

[Charts, one per bot: Stardust, PurpleWave, BananaBrain, Dragon, McRave, Microwave, Steamhammer, DaQin, ZZZKBot, UAlbertaBot, WillyT, Ecgberht, EggBot]

It’s not easy to eyeball the upset rate as such. You have to align your eyes to the 50% point where the green and blue cross. I should have drawn vertical lines on the graphs there, but my software was not fun. Nevertheless, the general goodness of fit is easy to see, and I guess that might be just as informative. For example, you can easily tell by eye that Dragon tends to upset the strong and suffer against the weaker. The strongest players want to be solid to avoid losing, since most losses are upsets, and the weaker players want to be daring to win more, and some of that shows too. To me, it’s telling that Microwave is visibly more consistent than Steamhammer, since MicroDK aims for defensive play that avoids risk while Steamhammer aims for aggressive play and seeks risk. That is exactly the kind of difference that a solidity metric is supposed to measure.

I judge the experiment a success so far.

Next: I’ll look for a good way to turn the data into single numbers, completing the solidity metric.

solidity in AIIDE 2020 - part 1

My proposed daring/solid metric turned out to draw a surprising degree of attention. Well, I was going to try it out anyway, but now I have reason to report in detail. Today I did only the first step, finding the elo values.

I chose to rate the players relative to a fictional opponent that scored exactly 50%, giving the fictional player elo 0, because it was easy that way. The game is zero-sum, so that’s the average player of the tournament, in a sense. We only care about elo differences, so the base is arbitrary.

bot            win %    elo
Stardust       93.22    455
PurpleWave     79.44    235
BananaBrain    69.61    144
Dragon         62.38    88
McRave         57.22    51
Microwave      54.47    31
Steamhammer    54       28
DaQin          50.14    1
ZZZKBot        39.89    -71
UAlbertaBot    31.14    -138
WillyT         29.44    -152
Ecgberht       24.28    -198
EggBot         4.72     -522

I calculated the table by hand, which seemed easier for a first cut—I simply printed a big elo table and read it backwards, from winning percentage to elo difference. If the solidity metric works out, I’ll have to automate it. It doesn’t seem hard (maybe invert the function by binary search). In fact, the only reason it was easier to do it by hand the first time is that I’ll have to do it by hand anyway to verify that my code is correct.
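For the record, the logistic elo curve inverts in closed form, so even binary search is more than is needed. A minimal sketch, in plain C++ that is not Steamhammer code:

    #include <cmath>

    // Expected score for an elo difference d (the standard logistic elo curve).
    double eloExpectedScore(double d)
    {
        return 1.0 / (1.0 + std::pow(10.0, -d / 400.0));
    }

    // The inverse: the elo difference implied by a win rate p, with 0 < p < 1.
    // Binary search on eloExpectedScore() gives the same answer more slowly.
    double eloFromWinRate(double p)
    {
        return 400.0 * std::log10(p / (1.0 - p));
    }

For example, eloFromWinRate(0.9322) comes out to about 455, matching the Stardust row above.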

Next: I want to draw graphs for each player showing the expected and actual scores against each opponent. That will give a visual indication of how well the metric will work out. If it works well, I’ll choose a way to turn it into a number.

solid versus daring play styles

A win against a stronger player is an upset. A loss against a weaker player is an upset in the opposite direction; the weaker player upset you. Two players of the same strength may have different rates of upsets in their games: Maybe one often beats stronger players but loses to weaker ones, and the other has more consistent results and does not. It’s a difference in play style. I call the inconsistent, risk-taking player daring and the consistent, risk-avoiding player solid.

It should be possible to measure solidity from a tournament crosstable. But what is a mathematically correct way to do it? You don’t want to simply count upsets, because you expect to win a lot of games versus a player that is slightly better than you. You want to somehow take the severity of the upset into account. For example, a win rate that stands far from its expected value should count more. But what is the expected win rate if, say, one player scored 80% in the tournament and the other scored 50%?

Here’s one way. Finding expected win rates is what elo is for, so compute an elo rating for each player in the tournament. You could use a program like bayeselo to take all information into account, or you could simply use the tournament win rates to impute elo values, essentially running the elo function in reverse. The two methods will give slightly different answers, but not very different. Then you can use the elo function in the forward direction on the differences between elo values to find expected win rates for each pairing.

Then for each pairing you have an actual win rate from the tournament results, and a calculated expected win rate. By construction, the two are the same in an average sense—but not individually. All that is left is to turn these numbers into a metric of upset-proneness, or daring risk-seeking. I haven’t tried to work out the math of what the metric should be, but the outline is obvious. For each opponent, pick out the pairings that are upsets: Either higher-than-expected win rates against a stronger opponent, or lower-than-expected against a weaker opponent. You might ignore other pairings, on the theory that they are symmetrical anti-upsets, or you might try to refine your metric by assigning them upset values too (I think the results would be a little different but probably close). You want some difference function f(actualRate, expectedRate) that says how big the difference is; you might choose linear distance (subtract then take the absolute value). Then you want a combining function g() that accumulates the difference values into a final metric; if f is distance, then you might choose the arithmetic mean.

I’ve never seen a metric like this, but it seems like an easy idea. Has anyone seen it? Can you point it out?

Next: I want to try this for AIIDE 2020. If it works smoothly I may extend it to other past tournaments, to see whether bots retain a measurably consistent daring/solidity style over time.

Steamhammer games and status

Steamhammer played an excellent game versus Monster today. The game is kind of long and boring to watch, with repetitive action, but I’m pleased by the good play against stubborn defense. Steamhammer wasted some resources and missed some opportunities, but made no severe mistake at any point. It even expanded at a good time, which is depressingly rare in its ZvZs. Near the end, Steamhammer tried to put the cherry on top by ensnaring Monster’s mutalisks, but the mutas zoomed by too fast, the ensnare missed, and the queen was shot down. Oh well, dropping the cherry didn’t change the rest!

For a game that is not in the least excellent but is interesting for its mistakes, I like yesterday’s Steamhammer-Slater game. I watched the game live, and when Steamhammer bumbled the defense of its natural I steeled myself for a quick upset. But it was not so quick after all. The game is a showcase of ways to go wrong on both sides. Some of Steamhammer’s mistakes remain unresolved because my planned fixes are complicated and need to be implemented as projects.

The latency compensation bug is still making me scratch my head. The easiest way to work around it is to use the Micro module’s order tracking; Steamhammer already keeps track of what orders it has given to units, including larvas, so it doesn’t need to rely on BWAPI to keep it straight. I traced the backbone of the production code and added the minimal workaround, a two-line addition to the code that decides whether a unit should be added to the set of candidate producers. And... it didn’t work. In order to control where zerg units are made, to do things like make drones at bases that don’t have enough drones, there is a special-case low-level routine, and it ignores the set of candidate producers and does its own calculations from scratch—slightly complicated calculations that the candidates don’t make easier. I’m still thinking about the right fix. Maybe I can find a way to make it simple and powerful at the same time.
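The shape of the workaround, in illustrative form (the names are made up, and as described above the real code path turned out to be different):

    #include <map>

    // Skip a larva as a candidate producer if our own order tracking says we
    // already told it to morph within the last few frames, whether or not
    // BWAPI's latency compensation still shows it as an egg.
    const int morphOrderGraceFrames = 8;   // a guess at a safe margin

    bool isFreeProducer(int unitID, int currentFrame,
                        const std::map<int, int> & morphOrderFrameByUnit)
    {
        auto it = morphOrderFrameByUnit.find(unitID);
        if (it != morphOrderFrameByUnit.end() &&
            currentFrame - it->second < morphOrderGraceFrames)
        {
            return false;   // we gave this larva a morph order recently; don't pick it again
        }
        return true;
    }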

It is, by the way, a serious bug. In Steamhammer, the effect is to sometimes—at predictable times—drop a unit that was queued for production. Among other things, it turns 12 hatch openings into 11 hatch. I had noticed that Steamhammer was playing 11 hatch surprisingly often, but it does have a full suite of intentional 11 hatch openings, so I didn’t realize that it was due to a bug.

SSCAIT 2020 halfway point

The annual SSCAIT is past the halfway point of the round robin phase, and it’s time to take stock. The numbers keep changing, but here’s a snapshot.

Stardust has slowly climbed to #1 after its weak start, with only 4 losses after 50 games, as compared to #2 Monster with 7 losses after 67 games. Stardust has strong chances to hold its #1 position, though it has played fewer games. Stardust’s worst upset was against #29 ICEbot, while Monster’s was against #47 Junkbot. #5 PurpleWave is unexpectedly low, below #4 BananaBrain; from games I’ve seen, I suspect it did not get its usual thorough preparation, or perhaps the prep was concentrated on top opponents so that it can succeed in the elimination phase. #6 Iron is doing better than I expected, and is ahead of #7 Hao Pan, though they are ranked close and the edge may not stick. #9 Xiao Yi is also higher than I expected.

#14 Skynet by Andrew Smith is the only classic unupdated bot to hang on in the top 16. Other classics #17 UAlbertaBot by Dave Churchill and #18 XIMP by Tomas Vajda are just outside, and there is a gap with #16 Proxy so they may remain outside at the end of the round robin. #19 McRaveZ I had hoped would do better; its muta micro is good, but its muta decision making (which target to seek, when to attack and when to run away) is not nearly as good as Monster’s. #20 Microwave has been slowly upping its win rate and has an outside chance of making it into the top 16 by the end; I imagine its learning is figuring out how to compensate for the bugs in this version.

Steamhammer is at #13 at the moment after a few losses, but I’m still forecasting that its most likely finish is #9 or #10. It has played more of its tough games than its easy games.

Some bots get special icons on the unofficial crosstable by Lines Prower. It’s a cute touch, though for me it makes the table harder to read. The funniest is Krasi0P’s linux penguin for 2 wins and Windows logo for 2 losses. I don’t understand McRaveZ’s icons. A salt shaker for losses, OK, but a secret agent for wins? I may be missing some background. PurpleWave gets a purple heart for wins. Maybe Lines Prower doesn’t know what a purple heart means to Americans?

apparent latency compensation bug

The just played Simplicity vs BananaBrain is a fine game by Simplicity. The early defense against zealots was especially well done, and Simplicity’s tech and attack decisions were good. Recommended.

In the meantime, I’ve hit a bug that’s slowing me down. I found a reproducible case where production fails because it tries to use the same larva to produce two drones. It looks like slippage in BWAPI’s latency compensation: The production system picks a larva to produce a drone. Ask the type during the same frame after giving the morph order, and you get egg; that is latency comp at work. Ask again a couple frames later, and the egg has turned back into a larva; the production system picks it a second time, and the second morph can only fail. I think it should be easy to work around, but can it be fixed? Latency compensation is not expected to be perfect.

It makes me wonder what other slippages may be hiding under the rug.

only one horrible game

Last year, Steamhammer finished SSCAIT for the first time with no losses due to crippling bugs and only 2 close calls. So far, it is on track to repeat in this year’s SSCAIT. I have seen all its games up to now, and there are no losses due to egregious bugs (only the standard issue flaws) and only one near miss. That’s great compared to Steamhammer’s early years, but I still want to fix the bugs.

The bad game is Steamhammer vs legacy (random zerg). Steamhammer made a number of mistakes in the game and suffered at least 2 bugs. The bug I could not accept is that it built spore colonies to defend against air attack—very early, immediately after scouting legacy’s base and seeing that it had not yet taken gas. It’s not possible to get mutalisks that fast, and without gas there was not even a hint of future risk. In fact, legacy never took its gas and played the whole game with a mass slow zergling plan. If Steamhammer had held on to the drones instead of wasting them on static defense due to a bug, I doubt the attack would have troubled it at all.

I traced the bug to, of all things, an integer overflow. The routine that figures out the time the enemy’s spire will complete returns INT_MAX for “never” if there is no evidence of an enemy spire... and I brilliantly added a margin for the mutas to hatch and fly across the map. In C++, integer overflow is officially undefined, so the compiler retreats to its room and laughs its head off before generating the code that will cause the most possible confusion, because “undefined” means it can do that. I don’t know what it did this time, but it was not as simple as wrapping around from an extreme positive value to an extreme negative value, because that would have caused the bug to show up in half of ZvZ games. No, it’s better if it shows up only when it will cause a disgusting blunder out of nowhere.
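The fix amounts to checking for the “never” sentinel before adding the margin. Something like this, in outline rather than the exact code:

    #include <climits>

    // Frame at which enemy mutalisks could first arrive, or INT_MAX for "never".
    // Adding a margin to INT_MAX is signed overflow, which is undefined behavior,
    // so check for the sentinel first.
    int enemyMutaArrivalFrame(int enemySpireFinishFrame, int hatchAndFlyMargin)
    {
        if (enemySpireFinishFrame == INT_MAX)
        {
            return INT_MAX;   // no evidence of a spire: mutalisks are never coming
        }
        return enemySpireFinishFrame + hatchAndFlyMargin;
    }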

Anyway, it was easy to fix. I also fixed a bug that caused multiple commanding of overlords. And I’m writing code to collect data for my main current project. Progress is underway.