There’s a ton of code in CherryPi, more than I can read in a day. I tried to pick out key parts to look at.
blackboard architecture
This comment from the file CherryPi/src/upc.h explains an important part of CherryPi’s high-level architecture.
/**
* Who, where, what action tuple (unit, position, command).
*
* UPCTuples are used for module-to-module communication (via the Blackboard).
* Posting a UPCTuple to the Blackboard is akin to requesting a specific action
* from another module, and consuming a UPCTuple from the Blackboard implies
* that the consumer implements this action. The implementation can also consist
* of refining the UPCTuple so that a more lower-level module can actually
* execute it.
*
* For example, a build order module might post a UPCTuple to create a certain
* unit type, a builder module might wait until their are sufficient resources
* before consuming it and then select a worker and a location. The
* UPCToCommandModule takes care of translating executable UPCTuples (with sharp
* unit, position and command entries) to actual game commands.
*/
I think it’s an excellent architectural choice, especially for a project carried out by a team rather than an individual. Communication between modules is managed largely by the blackboard machinery itself rather than by hand through each module’s calling conventions. It’s flexible and easy to modify and extend.
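To make the idea concrete, here is a minimal sketch of how a blackboard of UPC-style tuples could work. Every name in it is my own toy version for illustration, not CherryPi’s actual UPCTuple or Blackboard classes, which are richer and different in detail:

#include <deque>
#include <iostream>
#include <optional>
#include <string>

// Who / where / what, loosely following the comment above.
struct UpcTuple {
  std::string unit;     // who (may be left vague; empty = "any")
  std::string position; // where (may also be left vague)
  std::string command;  // what
};

// A toy blackboard: modules post requests, other modules consume them.
class Blackboard {
 public:
  void post(UpcTuple upc) { upcs_.push_back(std::move(upc)); }

  // Take the first tuple matching a command, if there is one.
  std::optional<UpcTuple> consume(const std::string& command) {
    for (auto it = upcs_.begin(); it != upcs_.end(); ++it) {
      if (it->command == command) {
        UpcTuple upc = *it;
        upcs_.erase(it);
        return upc;
      }
    }
    return std::nullopt;
  }

 private:
  std::deque<UpcTuple> upcs_;
};

int main() {
  Blackboard board;

  // A build order module asks for a spawning pool, builder not yet chosen.
  board.post({"", "", "create Zerg_Spawning_Pool"});

  // A builder module refines the vague request into a sharp tuple
  // and reposts it for a lower-level module to execute.
  if (auto upc = board.consume("create Zerg_Spawning_Pool")) {
    upc->unit = "Drone #7";
    upc->position = "(34, 102)";
    board.post(*upc);
  }

  // Something like UPCToCommandModule turns the sharp tuple into a command.
  if (auto upc = board.consume("create Zerg_Spawning_Pool")) {
    std::cout << upc->unit << " builds at " << upc->position << "\n";
  }
}

The point is that the build order module never needs to know which module will satisfy its request, which is why the design is easy to extend.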
relation to Tscmoo
People have speculated that CherryPi may borrow a lot from Tscmoo the bot, since Tscmoo the person is on the team. The speculation even made it into ZZZKBot’s source code, as we saw a couple days ago. I compared two slices of code that do similar jobs in CherryPi and in the CIG 2017 version of Tscmoo: the combat simulator in both, and the code implementing the 1 hatch lurker opening in both.
Note well: If I had looked at different code, I might have drawn different conclusions. I deliberately selected code with related purposes that might be connected. In some places, CherryPi uses ideas from the old BroodwarBotQ that was written up in Gabriel Synnaeve’s PhD thesis.
1. I think CherryPi directly copied nothing from Tscmoo. I didn’t expect it to. The overall architecture was likely decided before Tscmoo the person joined the team. Besides, an academic usually wants credit to be clear, and a corporation usually wants ownership to be clear. The code in detail looks quite different.
2. In the parts I looked at for this comparison, some structure and ideas in Tscmoo were carried over and seemingly reimplemented in CherryPi, with (I should repeat) great differences in detail. It’s clear that somebody familiar with Tscmoo wrote this CherryPi code. For example, in the combat simulators, one has addUnit() and run() in that order, and the other add_unit() and run() in that order. They both refer to “teams”, both count up frames from 0 (I would have counted up from the current frame, some would have counted down to 0), and share other shallow similarities. A sketch of this shared shape follows the list.
3. CherryPi, in the parts I compared, seems to be simpler and more cleanly written. In the lurker opening in particular, I think CherryPi encodes the opening a little more abstractly. Sometimes Tscmoo has more features. Tscmoo’s combat simulator simulates splash damage, and CherryPi’s does not.
4. OpenBW is another source of ideas, and it is of course also connected with Tscmoo the person. For example, the FogOfWar class says it is based on OpenBW. It calculates visibility depending on ground height and so on.
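For point 2, here is a hedged sketch of that shared shape: add units to teams, then run, counting frames up from 0. Every name and number here is made up; neither bot’s simulator looks like this in detail.

#include <algorithm>
#include <cstdio>
#include <vector>

// One fighting unit; team is 0 or 1, dps is damage per simulated frame.
struct SimUnit {
  int team;
  double hp;
  double dps;
};

class CombatSim {
 public:
  void addUnit(SimUnit u) { units_.push_back(u); }

  // Count frames up from 0, as both bots do, until one team is wiped out
  // or the frame limit runs out. Returns the frame the fight ended on.
  int run(int maxFrames) {
    for (int frame = 0; frame < maxFrames; ++frame) {
      double incoming[2] = {0.0, 0.0};
      for (const auto& u : units_)
        if (u.hp > 0) incoming[1 - u.team] += u.dps; // damage to the enemy team
      applyDamage(0, incoming[0]);
      applyDamage(1, incoming[1]);
      if (teamHp(0) <= 0 || teamHp(1) <= 0) return frame;
    }
    return maxFrames;
  }

  double teamHp(int team) const {
    double total = 0;
    for (const auto& u : units_)
      if (u.team == team && u.hp > 0) total += u.hp;
    return total;
  }

 private:
  // Spread a team's incoming damage across its live units, first to last.
  void applyDamage(int team, double dmg) {
    for (auto& u : units_) {
      if (u.team != team || u.hp <= 0 || dmg <= 0) continue;
      double hit = std::min(u.hp, dmg);
      u.hp -= hit;
      dmg -= hit;
    }
  }

  std::vector<SimUnit> units_;
};

int main() {
  CombatSim sim;
  sim.addUnit({0, 35.0, 0.2}); // invented numbers, not real unit stats
  sim.addUnit({0, 35.0, 0.2});
  sim.addUnit({1, 40.0, 0.3});
  int end = sim.run(1000);
  std::printf("over at frame %d: team 0 hp %.1f, team 1 hp %.1f\n",
              end, sim.teamHp(0), sim.teamHp(1));
}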
the openings
I always want to know, “what openings does it play?” In the directory CherryPi/src/buildorders I see 16 classes that look like they could be build orders. The opening learning files include 15 build orders. The in_use.txt file lists these 8 build orders as active or possibly active:
- 12hatchhydras
- zvp10hatch
- 5pool
- 2basemutas
- 3basepoollings
- 1hatchlurker
- meanlingrush (9 pool speed)
- ximptest (it says this one is “unknown status”)
I will watch games and find out what openings it plays in practice. Come back tomorrow!
As a sample of how openings are defined, here is a snippet from the file CherryPi/src/buildorders/meanlingrush.cpp showing the basic definition of 9 pool speed:
buildN(Zerg_Drone, 9);
buildN(Zerg_Extractor, 1);
buildN(Zerg_Spawning_Pool, 1);
if (countPlusProduction(st, Zerg_Hatchery) == 1) {
  build(Zerg_Hatchery, nextBase);
  buildN(Zerg_Drone, 9);
}
It writes on the blackboard: make drones until you have 9, then an extractor and a spawning pool, then add a second hatchery at an expansion and rebuild the drones to 9. Simple and concise. Details like spawning the overlord and figuring out exactly when to start the second hatchery are left for other code to fill in (in Steamhammer, you have to specify them explicitly). On the other hand, here is how it says to collect only the 100 gas needed to research zergling speed:
if (hasOrInProduction(st, Metabolic_Boost) || st.gas >= 100.0) {
  state->board()->post("GathererMinGasGatherers", 0);
  state->board()->post("GathererMaxGasGatherers", 0);
} else {
  state->board()->post("GathererMinGasGatherers", 3);
  state->board()->post("GathererMaxGasGatherers", 3);
}
More writing on the blackboard. That’s a complicated test, where in Steamhammer you’d simply specify "go gas until 100". It’s fixable. They could, for example, write goals to the blackboard like “collect 100 gas for zergling speed” and have another module collect only enough gas to meet the goals.
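A minimal sketch of that goal-based alternative, assuming a hypothetical GasGoal record and gatherer hook that are not in CherryPi:

#include <string>
#include <vector>

// Hypothetical goal record; nothing like this exists in CherryPi today.
struct GasGoal {
  std::string reason; // e.g. "Metabolic_Boost"
  double amount;      // gas this goal still needs
};

// The gatherer keeps workers on gas only while some goal is unmet,
// reusing the 3-worker figure from the snippet above.
int gasWorkersWanted(const std::vector<GasGoal>& goals, double bankedGas) {
  double needed = 0;
  for (const auto& g : goals) needed += g.amount;
  return bankedGas >= needed ? 0 : 3;
}

The opening would post the goal once and never mention gas again; the gatherer module would do the arithmetic.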
machine learning
I’ll take two cases: online learning during the tournament, and offline learning before the tournament starts, which produces data that can be fed to or compiled into the bot.
For online learning, the win rate over time graph for CherryPi shows a rapid increase in win rate from .4 to .7 within the first 10 rounds, then a gradual slight decline to the end of the tournament. It looks as though CherryPi rapidly learned how to play against each opponent, then more or less froze its decisions and allowed slower-learning bots to catch up a tiny bit. (Though swings in score in early rounds can also be due to statistical noise.) The readme file says:
CherryPi is a TorchCraft Zerg bot developed by Facebook AI Research.
It uses bandits to select/learn strategies that work against a given
opponent.
“Bandits” refers to the n-armed bandit problem, which underlies the opening learning of most bots. Looking at the file CherryPi/src/models/bandit.cpp, I see that this is exactly what CherryPi is doing too. It uses the classic UCB1 algorithm to learn which opening to play against each opponent, just like many other bots.
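For reference, here is the textbook UCB1 rule applied to opening selection. This is a generic sketch of the algorithm, not the code in bandit.cpp:

#include <cmath>
#include <vector>

// Per-opening record: how often it was played and how often it won.
struct Arm {
  int plays = 0;
  int wins = 0;
};

// Classic UCB1: try every opening once, then pick the opening with the
// best win rate plus an exploration bonus that shrinks with more plays.
int pickOpening(const std::vector<Arm>& arms, int totalPlays) {
  for (std::size_t i = 0; i < arms.size(); ++i)
    if (arms[i].plays == 0) return static_cast<int>(i);
  int best = 0;
  double bestScore = -1.0;
  for (std::size_t i = 0; i < arms.size(); ++i) {
    double mean = double(arms[i].wins) / arms[i].plays;
    double bonus = std::sqrt(2.0 * std::log(double(totalPlays)) / arms[i].plays);
    if (mean + bonus > bestScore) {
      bestScore = mean + bonus;
      best = static_cast<int>(i);
    }
  }
  return best;
}

The bonus term guarantees that every opening keeps getting an occasional try, so one unlucky loss can’t bury a good opening forever.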
I looked at the opening learning files, one for each opponent. They are in JSON format and are written by a general-purpose serializer that leaves the data a little hard to interpret by eye. It looks like value2 maps between the 15 opening names and 15 opening index numbers, value3 is 15 zeroes, and value4 and value5 are the learned data for the 15 indexes 0 through 14.
The only offline learning that I found is the same opening learning, performed ahead of time against certain opponents:
- Iron
- LetaBot
- Skynet
- Xelnaga
- ZZZKBot
I can’t guess how they came up with that set of 5 opponents to pre-learn openings against. For these opponents, CherryPi relied on its offline learning exclusively; it did not write new learning data for these opponents. It’s such a strange decision that I have to wonder whether it’s a bug. In any case, we saw yesterday that it backfired against ZZZKBot, which did not play as expected: Unable to learn, CherryPi played the same unsuccessful opening every time, and lost over and over. Both ZZZKBot and CherryPi had incorrect prior knowledge about each other, and only ZZZKBot adapted.
conclusion
It is clear to me that CherryPi the project is not far along compared to where they are aiming. There are plenty of TODOs in the code. The big machine learning ideas that (if successful) could make CherryPi superhuman are not there yet; only some foundations are laid. CherryPi is still a scripted bot like others, not a machine learning bot. Even so, with (as I understand it) 8 people on the team, they have done a tremendous amount of work. They implemented ideas—most of which I didn’t write about—that I wish I had time to do myself. If they can maintain the rate of progress, then within a few years individual coders won’t be able to keep up. On the other hand, progress may slow when they get to the hard part. We’ll have to stay tuned!
Next: Games by CherryPi.