Steamhammer’s performance over time
Many will have missed it since the original post was almost a year ago, but today Tully Elliston commented on the Steamhammer 3.1 change list from August 2020:
Tully Elliston: Looking at BASIL win rates, it looks like SH competitive performance dropped visibly after this version.
It does look that way. Here is BASIL’s graph of Steamhammer’s elo for 2020. BASIL throws in the ratings of top bots, which by coincidence is exactly what I want here. The version in question is the red dot on 20 August (delayed from the posting of the change list due to downtime).
Steamhammer improved slowly but steadily up until around the time that version hit the server, then more or less held steady while the top bots gradually lifted away. The cause might be the sudden ascendance of Stardust, pushing everyone else down; the theory would be that the other bots on the graph coped better with the killer dragoons. It seems plausible to me, but Stardust is only one opponent and should not have that much effect by itself. The cause might be that I had spent a year distracted by other things and worked slowly on Steamhammer. That seems more likely to me. Or it could truly be that a weakness was introduced in this version.
Notice that Steamhammer’s improvement on the graph occurred in between widely-spaced updates. In principle, there are 3 ways that can happen:

1. By chance.
2. By artifacts of the rating system as implemented, because of bots arriving and leaving. You can get elo inflation if bots arrive, lose games and fall in elo to push everybody else up, and then are dropped (and BASIL has dropped a lot of bots).
3. By Steamhammer’s opening learning.

I think the opening learning is most likely. That opens another hypothesis for why improvement stopped around this version: Maybe, due to weaknesses already inherent in Steamhammer from earlier versions, the learning reached a ceiling and could no longer contribute. This suggests that there may be a bottleneck weakness somewhere, and to make big progress I have to break the bottleneck.
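The arrive-lose-leave inflation mechanism is easy to see in a toy simulation. This is not BASIL’s actual rating code, and the pool sizes, K-factor, and game counts are invented for illustration; it only shows that when weak newcomers bleed points to the field and are then dropped, the survivors’ average rating ends up above where everyone started:

```python
import random

def expected(ra, rb):
    # Standard Elo expected score for player A against player B.
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))

def update(ra, rb, score_a, k=32):
    # Standard Elo update; the points exchanged are zero-sum.
    ea = expected(ra, rb)
    return ra + k * (score_a - ea), rb + k * ((1 - score_a) - (1 - ea))

random.seed(0)
# 10 established bots plus 5 weak newcomers, everyone starting at 2000.
pool = {f"bot{i}": 2000.0 for i in range(10)}
pool.update({f"new{i}": 2000.0 for i in range(5)})

for _ in range(2000):
    a, b = random.sample(list(pool), 2)
    if a.startswith("new") != b.startswith("new"):
        # Newcomers always lose the cross games and bleed rating points.
        score_a = 0.0 if a.startswith("new") else 1.0
    else:
        score_a = random.choice([0.0, 1.0])  # otherwise a coin flip
    pool[a], pool[b] = update(pool[a], pool[b], score_a)

# Drop the newcomers, as BASIL drops bots, after they have fed points
# to everybody else. The survivors' mean is now above the 2000 start,
# even though no survivor got any stronger.
survivors = {n: r for n, r in pool.items() if not n.startswith("new")}
mean = sum(survivors.values()) / len(survivors)
print(f"mean rating of survivors: {mean:.0f}")
```

Because the update is zero-sum, inflation appears only at the moment the low-rated bots are removed from the pool; the points they gave away stay behind.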
Wah, that is a lot of hypotheses. I looked at the long-term elo graphs for a number of bots which have not been updated the whole time, and they all show elo increases. BASIL has elo inflation, which explains some proportion of the elo rise of all bots. It also means that if your elo does not increase, maybe your bot is not staying the same, but getting worse! (We could take an average of non-updated bots and subtract out their elo inflation to get an estimate of true strength over time. There is no reason to expect that the inflation is constant over time.)
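The subtraction idea above can be sketched in a few lines. The data format and all the numbers here are made up (BASIL’s export does not look like this); the point is only that the correction uses a per-period drift estimate rather than assuming constant inflation, and that an apparently flat or rising elo can turn into a decline once the drift is removed:

```python
# Estimate elo inflation from bots that were never updated, then
# subtract it from another bot's series. Hypothetical data: monthly
# elo samples per bot, aligned on the same dates.

def inflation_curve(reference_series):
    """Average elo of the never-updated bots at each time step,
    expressed as drift relative to the first sample."""
    n = len(next(iter(reference_series.values())))
    means = [sum(s[t] for s in reference_series.values()) / len(reference_series)
             for t in range(n)]
    return [m - means[0] for m in means]

def corrected(series, drift):
    # Subtract the estimated inflation at each step; no assumption
    # that inflation is constant over time.
    return [e - d for e, d in zip(series, drift)]

# Toy numbers: two frozen bots whose elo rises only through inflation,
# and a target bot whose raw elo creeps upward.
frozen = {
    "frozenA": [2400, 2410, 2425, 2440],
    "frozenB": [2200, 2212, 2223, 2236],
}
target = [2500, 2510, 2515, 2520]

drift = inflation_curve(frozen)
print(drift)                      # [0.0, 11.0, 24.0, 38.0]
print(corrected(target, drift))   # [2500.0, 2499.0, 2491.0, 2482.0]
```

In this invented example the target bot’s raw elo rises by 20 points, but after removing the drift measured from the frozen bots it is actually falling, which is exactly the "not staying the same, but getting worse" case.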
Here is the same graph starting from 1 January 2019 and continuing until today. BASIL began a little before the start of the graph, but the early period shows startup transients as the initial elos are established, so I left it out.
When I compare Steamhammer to Hao Pan and BananaBrain on this graph, I can make out 3 periods. From the start until about October 2019, Steamhammer was neck-and-neck with them. From then until August 2020 or so, Steamhammer remained behind them; a gap had been opened, and the gap stayed roughly constant over that time. And since that time, Steamhammer has gained elo extremely slowly if at all, and has fallen further behind. Despite bug fixes and demonstrable improvements in some points of play, Steamhammer does not seem to be improving and (accounting for elo inflation) may be deteriorating. It is consistent with the distraction hypothesis, if you assume that I still haven’t recovered... but I think I have.
I suspect that the bottleneck weakness hypothesis is true. After watching many SCHNAIL games, I’ve concluded that Steamhammer’s tactical weaknesses in the midgame are critical. It loses too many units due to bad tactical decisions, must replace the lost combat units to stay safe, and (spending on combat units instead of drones) reaches its lategame economy too late. I suspect that if I fix the bottleneck tactical weaknesses, the other improvements I’ve made will start to show.
It’s hard to be sure, though! Gotta try it and find out.
By the way, I think the big point in these graphs is the relative decline of Krasi0. Krasi0 gained slightly over time, but lost its dominance and now is only another top bot. Subtracting elo inflation, perhaps Krasi0 is no longer improving at all.
Comments
MicroDK on :
2. I think that new bots Monster and Stardust, and older bots getting stronger like BananaBrain and Hao Pan, also have a big impact; being in the top 20 means we play them fairly often.
Microwave shows the same trend, though it had a big dip from January due to bugs, which were corrected in the last updates.
Jay Scott on :
Tully Elliston on :
Hyper-optimising behavior of late game or niche units like queens or defilers is cool, but it doesn't improve competitive performance at all if the games are either a) lost before they appear or b) would be won even if the behavior weren't optimised. I can't help but feel a lot of games fall into those categories, which means that a lot of improvements may not actually deliver more wins.
If you want maximum impact from changes, you want them to impact almost every game. Changes that deliver improvement to play in the first 8 minutes of the game are a lot more powerful than changes that impact play after the 30 minute mark. Snowballing is very real, so even if the game does last 30+ minutes, dominance in the first 8 minutes can decide the result.
Jay Scott on :
MicroDK on :
Many of the older bots are still very interesting. Not because they are competitive, but because some of them have niche skills.
Tully Elliston on :
Circuit Breakers 1.0
Ground Zero 2.0
Medusa 2.2
Fighting Spirit
Overwatch
Jay Scott on :
For Steamhammer, I hope to mitigate map weaknesses with map analysis and map-specific tactical learning. But I expect that will also work better on some maps than others....
MicroDK on :
krasi0 on :
1) The emergence of Stardust and Monster as new contenders introducing some novel skills and meta which the older bots (mine included) weren't quite prepared for / weren't expecting.
2) The last update of my bot (for the period in question) was somewhere in the beginning of Jan 2021, while Stardust, Banana, PW, HP, Monster, etc. have had some significant (strong) updates since.
3) My hypothesis is that ELO inflation, while it likely exists in reality, may affect different categories of bots differently: the top few bots may gain far less from it than the middle of the pack, for example.
And still it's obvious from the graph that my bot managed to float mostly above the 3000 ELO mark, which indicates at least *stability* in strength, given that the competition had been improving in the meantime.
All that to say, the conclusion that you have reached may be a bit shaky.
Jay Scott on :
I took a brief stab at trying to estimate elo inflation from the data, but found that BASIL’s data is in a form that makes that analysis inconvenient.
krasi0 on :