Steamhammer’s performance over time
Many will have missed it since the original post was almost a year ago, but today Tully Elliston commented on the Steamhammer 3.1 change list from August 2020:
Tully Elliston: Looking at BASIL win rates, it looks like SH competitive performance dropped visibly after this version.
It does look that way. Here is BASIL’s graph of Steamhammer’s elo for 2020. BASIL throws in the ratings of top bots, which by coincidence is exactly what I want here. The version in question is the red dot on 20 August (delayed from the posting of the change list due to downtime).
Steamhammer improved slowly but steadily up until around the time that version hit the server, then more or less held steady while the top bots gradually lifted away. The cause might be the sudden ascendance of Stardust, pushing everyone else down; the theory would be that the other bots on the graph coped better with the killer dragoons. It seems plausible to me, but Stardust is only one opponent and should not have that much effect by itself. The cause might be that I had spent a year distracted by other things and worked slowly on Steamhammer. That seems more likely to me. Or it could truly be that a weakness was introduced in this version.
Notice that Steamhammer’s improvement on the graph occurred in between widely-spaced updates. In principle, there are 3 ways that can happen:

1. By chance.
2. By artifacts of the rating system as implemented, because of bots arriving and leaving. You can get elo inflation if bots arrive, lose games and fall in elo to push everybody else up, and then are dropped (and BASIL has dropped a lot of bots).
3. By Steamhammer’s opening learning.

I think the opening learning is most likely. That opens another hypothesis for why improvement stopped around this version: Maybe, due to weaknesses already inherent in Steamhammer from earlier versions, the learning reached a ceiling and could no longer contribute. This suggests that there may be a bottleneck weakness somewhere, and to make big progress I have to break the bottleneck.
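The arrive-lose-leave inflation mechanism is easy to see in a toy simulation. This is not BASIL’s actual rating code, and the pool sizes, K-factor, and game counts are invented for illustration; it only shows that when weak newcomers bleed points to the field and are then dropped, the survivors’ average rating ends up above where everyone started:

```python
import random

def expected(ra, rb):
    # Standard Elo expected score for player A against player B.
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))

def update(ra, rb, score_a, k=32):
    # Standard Elo update; the points exchanged are zero-sum.
    ea = expected(ra, rb)
    return ra + k * (score_a - ea), rb + k * ((1 - score_a) - (1 - ea))

random.seed(0)
# 10 established bots plus 5 weak newcomers, everyone starting at 2000.
pool = {f"bot{i}": 2000.0 for i in range(10)}
pool.update({f"new{i}": 2000.0 for i in range(5)})

for _ in range(2000):
    a, b = random.sample(list(pool), 2)
    if a.startswith("new") != b.startswith("new"):
        # Newcomers always lose the cross games and bleed rating points.
        score_a = 0.0 if a.startswith("new") else 1.0
    else:
        score_a = random.choice([0.0, 1.0])  # otherwise a coin flip
    pool[a], pool[b] = update(pool[a], pool[b], score_a)

# Drop the newcomers, as BASIL drops bots, after they have fed points
# to everybody else. The survivors' mean is now above the 2000 start,
# even though no survivor got any stronger.
survivors = {n: r for n, r in pool.items() if not n.startswith("new")}
mean = sum(survivors.values()) / len(survivors)
print(f"mean rating of survivors: {mean:.0f}")
```

Because the update is zero-sum, inflation appears only at the moment the low-rated bots are removed from the pool; the points they gave away stay behind.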
Wah, that is a lot of hypotheses. I looked at the long-term elo graphs for a number of bots which have not been updated the whole time, and they all show elo increases. BASIL has elo inflation, which explains some proportion of the elo rise of all bots. It also means that if your elo does not increase, maybe your bot is not staying the same, but getting worse! (We could take an average of non-updated bots and subtract out their elo inflation to get an estimate of true strength over time. There is no reason to expect that the inflation is constant over time.)
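The subtraction idea above can be sketched in a few lines. The data format and all the numbers here are made up (BASIL’s export does not look like this); the point is only that the correction uses a per-period drift estimate rather than assuming constant inflation, and that an apparently flat or rising elo can turn into a decline once the drift is removed:

```python
# Estimate elo inflation from bots that were never updated, then
# subtract it from another bot's series. Hypothetical data: monthly
# elo samples per bot, aligned on the same dates.

def inflation_curve(reference_series):
    """Average elo of the never-updated bots at each time step,
    expressed as drift relative to the first sample."""
    n = len(next(iter(reference_series.values())))
    means = [sum(s[t] for s in reference_series.values()) / len(reference_series)
             for t in range(n)]
    return [m - means[0] for m in means]

def corrected(series, drift):
    # Subtract the estimated inflation at each step; no assumption
    # that inflation is constant over time.
    return [e - d for e, d in zip(series, drift)]

# Toy numbers: two frozen bots whose elo rises only through inflation,
# and a target bot whose raw elo creeps upward.
frozen = {
    "frozenA": [2400, 2410, 2425, 2440],
    "frozenB": [2200, 2212, 2223, 2236],
}
target = [2500, 2510, 2515, 2520]

drift = inflation_curve(frozen)
print(drift)                      # [0.0, 11.0, 24.0, 38.0]
print(corrected(target, drift))   # [2500.0, 2499.0, 2491.0, 2482.0]
```

In this invented example the target bot’s raw elo rises by 20 points, but after removing the drift measured from the frozen bots it is actually falling, which is exactly the "not staying the same, but getting worse" case.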
Here is the same graph starting from 1 January 2019 and continuing until today. BASIL began a little before the start of the graph, but the early period shows startup transients as the initial elos are established, so I left it out.
When I compare Steamhammer to Hao Pan and BananaBrain on this graph, I can make out 3 periods. From the start until about October 2019, Steamhammer was neck-and-neck with them. From then until August 2020 or so, Steamhammer remained behind them; a gap had been opened, and the gap stayed roughly constant over that time. And since that time, Steamhammer has gained elo extremely slowly if at all, and has fallen further behind. Despite bug fixes and demonstrable improvements in some points of play, Steamhammer does not seem to be improving and (accounting for elo inflation) may be deteriorating. It is consistent with the distraction hypothesis, if you assume that I still haven’t recovered... but I think I have.
I suspect that the bottleneck weakness hypothesis is true. After watching many SCHNAIL games, I’ve concluded that Steamhammer’s tactical weaknesses in the midgame are critical. It loses too many units due to bad tactical decisions, must replace the lost combat units to stay safe, and (spending on combat units instead of drones) reaches its lategame economy too late. I suspect that if I fix the bottleneck tactical weaknesses, the other improvements I’ve made will start to show.
It’s hard to be sure, though! Gotta try it and find out.
By the way, I think the big point in these graphs is the relative decline of Krasi0. Krasi0 gained slightly over time, but lost its dominance and now is only another top bot. Subtracting elo inflation, perhaps Krasi0 is no longer improving at all.
Comments
MicroDK on :
2. I think that new bots Monster and Stardust, and older bots getting stronger like BananaBrain and Hao Pan, also have a big impact; being in the top 20 means we play them fairly often.
Microwave shows the same trend, though it had a big dip from January due to bugs, which were corrected in the last updates.
Jay Scott on :
Tully Elliston on :
Hyper-optimising behavior of late game or niche units like queens or defilers is cool, but it doesn't improve competitive performance at all if the games are either a) lost before they appear or b) would be won even if the behavior weren't optimised. I can't help but feel a lot of games fall into those categories, which means that a lot of improvements may not actually deliver more wins.
If you want maximum impact from changes, you want them to impact almost every game. Changes that deliver improvement to play in the first 8 minutes of the game are a lot more powerful than changes that impact play after the 30 minute mark. Snowballing is very real, so even if the game does last 30+ minutes, dominance in the first 8 minutes can decide the result.
Jay Scott on :
MicroDK on :
Many of the older bots are still very interesting. Not because they are competitive, but because some of them have niche skills.
Tully Elliston on :
Circuit Breakers 1.0
Ground Zero 2.0
Medusa 2.2
Fighting Spirit
Overwatch
Jay Scott on :
For Steamhammer, I hope to mitigate map weaknesses with map analysis and map-specific tactical learning. But I expect that will also work better on some maps than others....
MicroDK on :
krasi0 on :
1) The emergence of Stardust and Monster as new contenders introducing some novel skills and meta which the older bots (mine included) weren't quite prepared for / weren't expecting.
2) The last update of my bot (for the period in question) was somewhere in the beginning of Jan 2021, while Stardust, Banana, PW, HP, Monster, etc. have had some significant (strong) updates since.
3) My hypothesis is that ELO inflation, while it likely exists in reality, may affect different categories of bots differently: the top few bots may gain far less from it than the middle of the pack, for example.
And still it's obvious from the graph that my bot managed to float mostly above the 3000 ELO mark, which indicates at least *stability* in strength, given that the competition had been improving in the meantime.
All that to say, the conclusion that you have reached may be a bit shaky.
Jay Scott on :
I took a brief stab at trying to estimate elo inflation from the data, but found that BASIL’s data is in a form that makes that analysis inconvenient.
krasi0 on :