Steamhammer’s performance over time
Many will have missed it since the original post was almost a year ago, but today Tully Elliston commented on the Steamhammer 3.1 change list from August 2020:
Tully Elliston: Looking at BASIL win rates, it looks like SH competitive performance dropped visibly after this version.
It does look that way. Here is BASIL’s graph of Steamhammer’s elo for 2020. BASIL throws in the ratings of top bots, which by coincidence is exactly what I want here. The version in question is the red dot on 20 August (delayed from the posting of the change list due to downtime).
Steamhammer improved slowly but steadily up until around the time that version hit the server, then more or less held steady while the top bots gradually lifted away. The cause might be the sudden ascendance of Stardust, pushing everyone else down; the theory would be that the other bots on the graph coped better with the killer dragoons. It seems plausible to me, but Stardust is only one opponent and should not have much effect. The cause might be that I had spent a year distracted by other things and worked slowly on Steamhammer. That seems more likely to me. Or it could truly be that a weakness was introduced in this version.
Notice that Steamhammer’s improvement on the graph occurred in between widely-spaced updates. In principle, there are 3 ways that can happen:
1. By chance.
2. By artifacts of the rating system as implemented, because of bots arriving and leaving. You can get elo inflation if bots arrive, lose games and fall in elo to push everybody else up, then are dropped (and BASIL has dropped a lot of bots).
3. By Steamhammer’s opening learning.
I think the opening learning is most likely. That opens another hypothesis for why improvement stopped around this version: Maybe, due to weaknesses already inherent in Steamhammer from earlier versions, the learning reached a ceiling and could no longer contribute. This suggests that there may be a bottleneck weakness somewhere, and to make big progress I have to break the bottleneck.
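The inflation mechanism in point 2 can be sketched in a few lines. This is only an illustration under stated assumptions: it uses the textbook elo update with a K-factor of 32 and a starting rating of 2000, and the bot names are placeholders. BASIL's actual rating code and parameters may differ.

```python
def expected_score(r_a, r_b):
    """Textbook elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def play(ratings, winner, loser, k=32.0):
    """Apply one zero-sum elo update in place (k=32 is an assumption)."""
    gain = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += gain
    ratings[loser] -= gain


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Three established bots, all at a hypothetical starting rating.
ratings = {"A": 2000.0, "B": 2000.0, "C": 2000.0}
before = mean(ratings.values())

# A weak newcomer arrives at the same starting rating and loses every game.
ratings["newcomer"] = 2000.0
for _ in range(20):
    for veteran in ("A", "B", "C"):
        play(ratings, veteran, "newcomer")

# The newcomer is dropped, taking its below-start rating out of the pool.
del ratings["newcomer"]
after = mean(ratings.values())

# Each update is zero-sum, so the points the newcomer lost stay with the
# veterans, and the average of the remaining pool ends up above where it began.
inflation = after - before
print(f"average elo of remaining bots rose by {inflation:.1f}")
```

None of the three veterans changed at all, yet each one's rating went up. That is the sense in which bots arriving, losing, and being dropped can lift everybody else.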
Wah, that is a lot of hypotheses. I looked at the long-term elo graphs for a number of bots which have not been updated the whole time, and they all show elo increases. BASIL has elo inflation, which explains some proportion of the elo rise of all bots. It also means that if your elo does not increase, maybe your bot is not staying the same, but getting worse! (We could take an average of non-updated bots and subtract out their elo inflation to get an estimate of true strength over time. There is no reason to expect that the inflation is constant over time.)
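The parenthetical idea above could be sketched roughly as follows. The function names, bot names, and every number here are invented for illustration; none of them are real BASIL ratings. The sketch assumes each bot's elo has been sampled on the same dates, and it computes a separate drift value per date, so it does not assume that inflation is constant over time.

```python
from statistics import mean


def inflation_baseline(frozen_histories):
    """Average elo drift of non-updated bots, relative to the first sample.

    frozen_histories maps a bot name to a list of elo values, all sampled
    on the same dates (hypothetical input format).
    """
    n = len(next(iter(frozen_histories.values())))
    return [
        mean(history[i] - history[0] for history in frozen_histories.values())
        for i in range(n)
    ]


def deflate(history, baseline):
    """Subtract the estimated inflation from one bot's elo history."""
    return [elo - drift for elo, drift in zip(history, baseline)]


# Made-up numbers, for illustration only.
frozen = {
    "frozen_bot_1": [1800.0, 1820.0, 1850.0],
    "frozen_bot_2": [2000.0, 2010.0, 2050.0],
}
updated_bot = [2300.0, 2315.0, 2320.0]

baseline = inflation_baseline(frozen)      # [0.0, 15.0, 50.0]
adjusted = deflate(updated_bot, baseline)  # [2300.0, 2300.0, 2270.0]
print(adjusted)
```

In this invented example the raw elo rises by 20 points while the adjusted value falls by 30, which is the "not staying the same, but getting worse" case. With only a few non-updated bots the baseline would be noisy, so an average over as many as possible seems preferable.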
Here is the same graph starting from 1 January 2019 and continuing until today. BASIL began a little before the start of the graph, but the early period shows startup transients as the initial elos are established, so I left it out.
When I compare Steamhammer to Hao Pan and BananaBrain on this graph, I can make out 3 periods. From the start until about October 2019, Steamhammer was neck-and-neck with them. From then until August 2020 or so, Steamhammer remained behind them; a gap had been opened, and the gap stayed roughly constant over that time. And since that time, Steamhammer has gained elo extremely slowly if at all, and has fallen further behind. Despite bug fixes and demonstrable improvements in some points of play, Steamhammer does not seem to be improving and (accounting for elo inflation) may be deteriorating. It is consistent with the distraction hypothesis, if you assume that I still haven’t recovered... but I think I have.
I suspect that the bottleneck weakness hypothesis is true. After watching many SCHNAIL games, I’ve concluded that Steamhammer’s tactical weaknesses in the midgame are critical. It loses too many units due to bad tactical decisions, must replace the lost combat units to stay safe, and (spending on combat units instead of drones) reaches its lategame economy too late. I suspect that if I fix the bottleneck tactical weaknesses, the other improvements I’ve made will start to show.
It’s hard to be sure, though! Gotta try it and find out.
By the way, I think the big point in these graphs is the relative decline of Krasi0. Krasi0 gained slightly over time, but lost its dominance and now is only another top bot. Subtracting elo inflation, perhaps Krasi0 is no longer improving at all.


