Steamhammer’s machine learning component

This information is good as of Steamhammer 4.3, the AIIDE 2024 version, which was not released when I wrote this. Consider it a preview.

Steamhammer has a machine learning position evaluator that estimates its chance of winning in the current game. It relies on unit counts and known tech for each side—it doesn’t do any tactical analysis. For example, it takes into account how many vultures the opponent has, and doesn’t care whether some of them are in its mineral line. Within that limitation, it’s fairly accurate.

There’s one evaluator for each matchup. There are six learning data files, named eval_ZvT.txt and so on. It can be configured to used fixed, pre-trained learning files, to update the files as it plays games, or to ignore the evaluator altogether.

The result of an evaluation is an estimated probability that Steamhammer will win the game versus the mix of opponents that the evaluator was trained against, in the range zero to one. I’ve been training on BASIL, which plays the most games. With these files, the evaluator estimates the chance of winning the current game, on average, against the mix of opponents Steamhammer faces on BASIL. If you ask for an evaluation before the enemy has been scouted, you’ll get back a number that is close to the bot’s win rate in that matchup overall.

Evaluation is fast enough that it doesn’t really restrict anything else that Steamhammer may want to do during the frame. You could do it once a frame for the whole game, though there’s no reason to.

evaluation

The evaluator works by binary reservoir computing (Wikipedia). The input is mixed into a “reservoir” (which doesn’t store up anything in my implementation, despite the name). It learns to evaluate by simple logistic regression (Wikipedia) on the reservoir output. The accuracy is similar to a (non-deep) neural network, but it learns faster.

It is all basic stuff for reservoir computing. I did most of it the easiest good way and ignored fancy algorithms that might be better.

Input. The input is basic information about the map (how many start locations, are there islands, etc.), the game (rough game time), and the two sides. Info about the opponent is based on what has been scouted, obviously (so if Steamhammer’s scouting improves in a new version, the evaluator will overestimate the opponent until it learns better). It includes base count, supply, unit counts, and tech (do you have scan? do you have mines?). For unit counts, units are divided into two types: The main fighting units like zealots and dragoons, and the auxiliaries like high templar and arbiters. The main units are recorded as a unit mix: What proportion of the supply do they make up (saturating at 80%)? The auxiliaries are individually counted, but only on a log2 scale and with an upper bound. For arbiters, it can count 0, 1, 2 to 3, or 4 or more. It is all the information needed for evaluation, outside of rare cases.

The input numbers are assembled into a vector of 512 bits (the same length for every matchup, though it need not be). Because of how the reservoir works, the important information has to be repeated as many times as possible for accuracy (and as few as possible for speed of learning). The various ceilings and approximations cut out less important data so that the more important can be repeated.

Reservoir. The input bits are the input to the reservoir. It runs multiple iterations of the elementary binary cellular automaton that Wolfram calls Rule 90 (Wolfram MathWorld), which combines simplicity and rich behavior. You tell it how many iterations. You want enough to combine the input bits thoroughly, and not so many that they turn into uncorrelated hash. Steamhammer does 32 iterations. (It appears in the code as 16 “steps”; each is two iterations.) Optimizing this hyperparameter would take way longer than I have, and I would not be surprised if it’s a poor choice. But it works well enough.

Readout. The readout work by simple logistic regression on the output of the reservoir. (Linear regression becomes even simpler when the independent variables are bits.) This is usual in reservoir computing. The reservoir combines the input into a complicated sample of the combinations of the input values, so that the actual learning part is as straightforward as possible.

You can attach as many readouts as you like to the same reservoir. Anything that takes the same inputs, you can learn. It’s a standard advantage of reservoir computing. Steamhammer doesn’t take advantage, at least so far.

training

Steamhammer trains its evaluator by temporal differences (Wikipedia). During the game, it runs an evaluation approximately every 30 game seconds and remembers the evaluation result and the reservoir output that gave the result. It uses TD(λ) with the backward updating approach: At the end of the game, it walks backward through the saved evaluations, and updates each one according to the temporal difference value of the one after it. It’s easy; the bookkeeping to save past values is more complicated than the calculation.

This method extracts more information from the game than you get by updating based on the game result alone. One way to look at it is that it knows when mistakes happened in the game, rather than that mistakes happened.

algorithm and files

The reservoir computing algorithm is in subproject RC. I wrote it entirely myself. It is simple and limited, but within its limitations it is general-purpose. The code that builds the binary input is the most complicated part. You provide a vector of fields, where each field is a number with its valid range plus the type of binary encoding to use for that number. You tell it how many times to repeat each number; important numbers should be repeated more so that their combinations are better sampled. Behind the scenes, it reorders the input bits so that the reservoir will combine them more effectively. The reservoir code itself uses binary tricks. It is blazing fast, even with 32-bit words. The readout is the most primitive part. It is in the reinforcement learning style and incrementally updates its logistic model using the simplest possible algorithm. It could be improved to speed up learning, but it’s good enough so far. There’s serialization, to make it easy to store the model in a file. The serialization format includes the model data plus a count of how many learning updates it has had (each temporal difference step is one update), so you can make a rough judgement of how trained the model is.

(Note: A simpler idea to scramble the input vector is to create it unscrambled and run the bits through a fixed, random permutation. My thought is that carefully interleaved bits should be better than randomly permuted bits. This is untested.)

Here are the binary encoding choices. Steamhammer’s evaluator uses all except the Choice encoding. None of its input values happen to be choices.

  , Literal           // the bit pattern as provided, don’t re-encode
  , Gray              // binary reflected Gray code
  , Thermometer       // 1 bits up to the value, then 0 bits
  , Log2Thermometer   // 1 bits up to log2(value), then 0 bits
  , Choice            // set a single bit

Steamhammer’s use of the algorithm to implement its evaluator is in file Evaluator.cpp. If you want to use the algorithm for another purpose, it shows exactly how. It includes file I/O, building the input fields for each matchup, running the evaluator either periodically or on demand and remembering the results, and the temporal differences method to update it.

using the evaluator

Steamhammer 4.3 uses past evaluations in its decision of what openings to repeat. If the opening looked good, then it is a better candidate to try again, no matter what happened later in the game. In my imagination, the big value may be against opponents that Steamhammer nearly always loses to. If the build started well despite losing, prefer it over openings that looked bad from the start. In any case, in a long series against any opponent, the feature should improve opening choices. (Though there are confounding factors, like how long the opening is.)

The newest versions add one field to the opponent model file that Steamhammer writes after every game. It is the evaluated probability of winning at the point when the opening build ends. If the bot lost before the opening build finished, the value is 0. If it won before then, the value is 1. In other games, it is between those values. The mean evaluation for each opening being considered is folded in with other information to arrive at a “goodness” for the opening.

future plans

There are great potential uses that I plan to implement in future versions. With easy modifications, the evaluator can be used to make strategy decisions during the game.

Another feature I want to implement is an opponent-specific evaluation. The evaluation so far is global, in the sense that it is an average across all opponents. It takes as input a large number of features and requires a relatively long time to learn. An opponent-specific evaluator should have a small input and learn fast. It can be combined with the global evaluation to get a result that takes a lot of information into account, and adjusts it according to the opponent. It has the potential to be much more accurate at little more cost.


Jay Scott <jay@satirist.org>
first posted 21 August 2024
last updated 10 October 2024