AIIDE 2021 - BananaBrain versus Dragon
BananaBrain and Dragon both recorded their own opening builds for all 157 games played, so I can align their learning files and see how their strategies matched up against each other. BananaBrain also recorded its representation of what the opponent played, so I can compare its idea of Dragon’s build with Dragon’s own idea. I first did this last year. Dragon is carried over from last year unchanged, while BananaBrain is much stronger now.
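To make the alignment concrete, here is a minimal sketch of the cross-tabulation behind the table below, assuming each bot's learning file has already been exported to a per-game CSV in the same game order; the file names and the columns "opening", "build", and "won" are hypothetical stand-ins, not the bots' actual formats.

```python
import csv
from collections import Counter

def load_rows(path):
    """Read a per-game CSV export into a list of dicts, one per game."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

bb = load_rows("bananabrain_vs_dragon.csv")      # assumed columns: "opening", "won"
dragon = load_rows("dragon_vs_bananabrain.csv")  # assumed column: "build"

# Tally games and BananaBrain wins for each (BananaBrain opening, Dragon build) pair.
games, wins = Counter(), Counter()
for bb_row, dragon_row in zip(bb, dragon):       # rows assumed to be in the same game order
    key = (bb_row["opening"], dragon_row["build"])
    games[key] += 1
    if bb_row["won"] == "1":                     # "1" = BananaBrain win
        wins[key] += 1

for (opening, build), n in sorted(games.items()):
    w = wins[(opening, build)]
    print(f"{opening:18} vs {build:18} {w}/{n} ({100 * w // n}%)")
```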
The win rates and coloring are from the point of view of BananaBrain. Blue is good for BananaBrain and red is good for Dragon.
bananabrain strategies versus dragon strategies
| | overall | 1rax fe | 2rax bio | 2rax mech | bio | dirty worker rush | mass vulture | siege expand |
|---|---|---|---|---|---|---|---|---|
overall | 117/157 75% | 8/8 100% | 19/30 63% | 8/8 100% | 12/13 92% | 8/8 100% | 27/36 75% | 35/54 65% |
PvT_10/12gate | 34/48 71% | 1/1 100% | 9/17 53% | 1/1 100% | 2/2 100% | 2/2 100% | 3/5 60% | 16/20 80% |
PvT_1gatedtexpo | 0/1 0% | - | 0/1 0% | - | - | - | - | - |
PvT_28nexus | 3/6 50% | - | 1/2 50% | - | - | - | 1/1 100% | 1/3 33% |
PvT_2gaterngexpo | 2/4 50% | - | 0/1 0% | - | - | - | 1/1 100% | 1/2 50% |
PvT_32nexus | 0/1 0% | - | - | - | - | - | - | 0/1 0% |
PvT_9/9gate | 78/96 81% | 7/7 100% | 9/9 100% | 7/7 100% | 10/11 91% | 6/6 100% | 22/29 76% | 17/27 63% |
PvT_9/9proxygate | 0/1 0% | - | - | - | - | - | - | 0/1 0% |
dragon as seen by bananabrain
dragon played | # | bananabrain recognized |
---|---|---|
1rax fe | 8 | 7 T_unknown, 1 T_fastexpand |
2rax bio | 30 | 30 T_unknown |
2rax mech | 8 | 8 T_unknown |
bio | 13 | 13 T_unknown |
dirty worker rush | 8 | 8 T_unknown |
mass vulture | 36 | 21 T_1fac, 14 T_unknown, 1 T_2fac |
siege expand | 54 | 38 T_1fac, 16 T_unknown |
Last year this table showed that BananaBrain was weak at recognizing Dragon’s builds, with a lot of unknowns. There are more recognized builds this year, but BananaBrain plays differently now, so I’m not sure whether its recognition has actually improved. What is clear is that everything is blue. Recognizing some builds does not seem to have helped BananaBrain; it did well whether it recognized Dragon’s build or not.
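One way to check that impression would be to compare BananaBrain’s win rate in games where it labeled Dragon’s build T_unknown against games where it assigned a concrete label. A minimal sketch, again assuming a hypothetical per-game CSV export with columns "recognized" and "won":

```python
import csv

# Bucket games by whether BananaBrain assigned a concrete label or T_unknown.
unknown = {"wins": 0, "games": 0}
recognized = {"wins": 0, "games": 0}

with open("bananabrain_vs_dragon.csv", newline="") as f:   # hypothetical export
    for row in csv.DictReader(f):
        bucket = unknown if row["recognized"] == "T_unknown" else recognized
        bucket["games"] += 1
        bucket["wins"] += int(row["won"])                  # "1" = BananaBrain win

for name, b in (("T_unknown", unknown), ("recognized", recognized)):
    if b["games"]:
        print(f"{name:12} {b['wins']}/{b['games']} ({100 * b['wins'] / b['games']:.0f}%)")
```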
Comments
Dan on :
I expect a lot of the T_unknowns are due to BB's opener disrupting Dragon's, e.g. the Zealot pressure stopping a CC from going down and thus preventing a Rax FE diagnosis.
Dan on :
When testing CherryPi for AIIDE 2018 I would often run multiple 100-game series against the same opponent. What was shocking was how much the winrates could diverge based on path-dependent learning behavior. It was very easy to spook the bot off a build, or let it settle on a pretty-good-but-not-best build, and wind up with series that looked very different. This effect was magnified against bots that also learned; sometimes one bot or the other would get spooked out of its probably-best build on relatively few samples.
These dynamics lead me to the opposite conclusion from Bytekeeper in his recent comment: I don't think learning opponent priors will be that critical for maintaining high winrates going forward; doing things better than your opponents and having just enough variety (or the margins to afford trading efficiency for safety, as Stardust continues to do) to avoid exploitability is sufficient.
Jay Scott on :
There is a way out in principle: A bot could have a theory of how strategies work, and reason about why a certain strategy worked or did not in each game. That’s what humans do. If we had that, bots could adapt more quickly and accurately.
Tully Elliston on :
Against a human, I think the bots that mix it up like SH will do better, and make more interesting opponents.
Bytekeeper on :
I could still be very wrong. A large enough variety of builds and skills is certainly going to be needed before any learning helps. I think many bots that use learning currently use it to "phase out" bad builds. And if they use learning for skill-based features, they will phase those out too. That is most likely not intentional.
What I mean is that even if there were a killer build, a bot would learn not to use it because of weak execution. An author who is not careful would simply disregard the build. But how can one detect this kind of error, especially without a prior probability that a build is good, in human play for example?
For something like this, a bot would need to infer a cause for failure, even if it could not figure out a solution. In that case, authors could at least compensate a bit for spooked-off builds.