The previous Steamhammer version chose its openings solely based on the enemy’s predicted plan, if there was one, or based on the matchup default openings if no plan was predicted. The new version 1.4.2 can still do that, but once it has played a given opponent enough times it prefers to choose openings that have won in past games. It has always stored the data in its game records; now it is using it.
At heart, the selection algorithm is epsilon-greedy, but there are a few wrinkles.
1. The first 5 games against each opponent, in each matchup, are the initial exploration phase. Choose openings the old way, according to the matchup or the enemy plan.
2. If any opening has a 100% win rate, play it again. This bypasses the exploration phase. If any opening has a 100% win rate on this map, play it again. Steamhammer does this even if the opening has been tried only once before, when there is little reason to trust in another win. If more than one opening has a 100% win rate on the map, choose among them randomly without regard to how many times each has been played. The idea is to encourage map specialization. It also usually ensures that a surprise winning opening is tried at least a couple more times (once because it always wins, and once because it always wins on this map) to find out whether it was a fluke.
3. Decide randomly whether to explore for new openings, or try to play a known good opening. The exploration rate (“epsilon” in “epsilon-greedy”) varies from 5% if we always win to 15% if we always lose. (I don’t have any reason to think that those are good numbers; it’s just a try. Likely it should grow toward 100% if we keep losing, because there are so many openings to explore. I think I should do some math....) The form of exploration varies according to the total number of games played against this opponent. If there are few games, Steamhammer prefers to try an opening that responds to the enemy plan. With more games, it increasingly chooses from the wider variety of openings for the matchup, ignoring the plan. If there are many games, over 30, it starts to choose openings at random from its entire universe of known openings. (Steamhammer doesn’t have game records for that many games against any opponent on SSCAIT yet, but it will happen in long tournaments.)
4. Try to choose the best known opening, the “greedy” in “epsilon-greedy.” Steamhammer combines the win numbers of each opening on the current map with the win numbers across all maps (a much larger number of games, but each is less informative because the map is different) in an ad hoc way to get a “weighted” win rate. It takes that as an estimate of the chance of winning this game if it chooses that opening, and picks the biggest number. In case of ties, it chooses randomly among the tied openings.
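The four steps above can be sketched roughly as follows. This is an illustration in Python, not Steamhammer's actual code: the names `Record`, `choose_opening`, `old_way`, `explore`, and `score` are all hypothetical, the linear interpolation between 5% and 15% is an assumption (the post only gives the two endpoints), and checking the 100% win rates before the 5-game phase is one reading of "this bypasses the exploration phase."

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Record:
    """Hypothetical win/loss tally for one opening."""
    wins: int = 0
    games: int = 0


def exploration_rate(wins: int, games: int) -> float:
    """Epsilon: 5% if we always win, 15% if we always lose.

    The linear interpolation in between is an assumption.
    """
    win_rate = wins / games if games else 0.0
    return 0.15 - 0.10 * win_rate


def perfect(records: Dict[str, Record]) -> List[str]:
    """Openings that have been played and have never lost."""
    return [name for name, r in records.items()
            if r.games > 0 and r.wins == r.games]


def choose_opening(
    overall: Dict[str, Record],     # opening -> record across all maps
    on_map: Dict[str, Record],      # opening -> record on the current map
    old_way: Callable[[], str],     # step 1: pick by enemy plan or matchup
    explore: Callable[[int], str],  # step 3: pick an opening to explore
    score: Callable[[str], float],  # step 4: estimated chance of winning
    rng: random.Random,
) -> str:
    games = sum(r.games for r in overall.values())
    wins = sum(r.wins for r in overall.values())

    # Step 2, checked first because it bypasses the exploration phase.
    for records in (overall, on_map):
        winners = perfect(records)
        if winners:
            return rng.choice(winners)

    # Step 1: initial exploration phase, the first 5 games.
    if games < 5:
        return old_way()

    # Step 3: explore with probability epsilon.
    if rng.random() < exploration_rate(wins, games):
        return explore(games)

    # Step 4: greedy choice, breaking ties randomly.
    best = max(score(name) for name in overall)
    return rng.choice([name for name in overall if score(name) == best])
```

The `explore` callback takes the game count so that it can shift from plan-based openings toward fully random ones as games accumulate; how it makes that shift is left out of the sketch.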
The weighted win rates are a crude attempt to adapt to the map without restricting the input to data about that map only. In some future version, I’ll switch to a more general context-aware algorithm that can take more information into account. Steps 2, 3, and 4 should collapse into one.
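To make the idea of a weighted win rate concrete, here is one possible blend. It is only a stand-in, since the actual ad hoc combination is not spelled out here: the function name `weighted_win_rate` and the `map_weight` parameter are hypothetical, and the sketch simply counts each game on the current map as worth extra games in the all-maps tally.

```python
def weighted_win_rate(
    map_wins: int,
    map_games: int,
    all_wins: int,
    all_games: int,
    map_weight: float = 2.0,  # hypothetical: extra weight for same-map games
) -> float:
    """Blend same-map results with results across all maps.

    The all-maps numbers already include the same-map games, so those
    games count (1 + map_weight) times in total. With no same-map games,
    the result falls back to the plain overall win rate.
    """
    total = map_weight * map_games + all_games
    if total == 0:
        return 0.0  # opening never played: no evidence either way
    return (map_weight * map_wins + all_wins) / total
```

For example, an opening that is 1 for 1 on this map and 2 for 4 overall scores (2·1 + 2) / (2·1 + 4) = 4/6 with the default weight, a bit above its overall 50%.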
On the SSCAIT server, I left the existing learning files in place to see how Steamhammer would make use of the old information. So far, the data doesn’t seem too helpful. Some of the files have systematic mispredictions that the new Steamhammer version doesn’t make, and yet still believes (“there it is in writing!”). In the case of Iron, new Steamhammer can recognize and respond to the terran plan, but old Steamhammer recorded a lot of data in which it could not. Steamhammer is choosing openings against Iron according to its old data, when it would do better to ignore it. Well, I know what to do to fix that, and it’s on the list.
Randomhammer needs a lot of data to get anywhere. Even with the old data included, there is not enough. If it has played games against an opponent as protoss, that tells it nothing about what openings it should choose as terran.