Last year's CoG 2020 had 8 participants and played 5600 games, which works out to 200 for each pair of opponents. This year's CoG 2021 had 9 participants and played far fewer games, 1800, or 50 per pairing, 1/4 as many. The bot winning rates look mostly well-separated, so the smaller game count probably did not much affect the finishing order. Only #4 Microwave/#5 PurpleWave and #6 XiaoYi/#7 BetaStar are closely ranked and might have swapped places; surely the top 3 would have finished in the same order. The shorter tournament does theoretically give an edge to bots which don’t learn, or don’t learn much, or otherwise reach their learning asymptote quickly, as compared to bots which can keep improving their decisions over a long period.
Last year, 23 of the 5600 games failed to complete with a result and were not counted, about 4 per thousand. This year it was 10 of 1800 games, over 5 per thousand. Reliability is about the same, probably because the same bot, the technically tricky MetaBot, was responsible for most failures in both years.
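As a quick check of the arithmetic, here is a minimal Python sketch; the participant, game, and failure counts are the totals stated above, and nothing else is assumed.

```python
from math import comb

def games_per_pairing(participants, total_games):
    # Round robin: the total games are split evenly among all pairs of participants.
    return total_games / comb(participants, 2)

print(games_per_pairing(8, 5600))  # CoG 2020: 200.0 games per pairing
print(games_per_pairing(9, 1800))  # CoG 2021: 50.0 games per pairing

# Failed games per thousand.
print(1000 * 23 / 5600)  # 2020: about 4.1
print(1000 * 10 / 1800)  # 2021: about 5.6
```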
And of course the maps are different; only Great Barrier Reef was played in both years. With only 5 maps total, the different map choices may not average out nicely; there are likely to be accidental differences in how well each bot likes the maps on average. The difference should be pretty small, though I would still prefer 10 maps over 5.
holdover results
#6 XiaoYi, #7 BetaStar, and #8 MetaBot are carried over from last year—and also from before. We have a brief history of how well they’ve done against the field.
bot | 2019 | 2020 | 2021 |
BetaStar | 67.41% | 51.73% | 39.29% |
XiaoYi | 72.21% | 36.57% | 40.10% |
MetaBot | 59.04% | 11.02% | 23.08% |
BetaStar performed worse than last year, but XiaoYi and MetaBot both performed better. That is because the added participant this year was bottom-ranked CUNYbot, which all opponents defeated with overwhelming scores. The field is weaker, so the holdovers look stronger.
holdovers versus updated bots
CUNYbot did not play last year (it last competed in CoG 2019), but all other participants are the same. We can make a closer comparison of 2020 and 2021 by leaving out CUNYbot, new to this year’s field, and considering only the holdovers and the updated returning bots. (The comparison is possible only because the tournament has lost popularity and, in the last 2 years, has retained only a hard core.)
bot | 2020 | 2021 |
BetaStar | 51.73% | 33.43% |
XiaoYi | 36.57% | 31.81% |
MetaBot | 11.02% | 13.82% |
In this virtual tournament, MetaBot held its low position and even gained a little, but the other 2 clearly fell since last year. The field ex CUNYbot seems to have become tougher.
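For anyone who wants to redo this kind of adjustment, the idea is just to drop each bot’s games against the excluded opponent before dividing. A minimal sketch, with entirely made-up per-opponent records; the names and numbers below are placeholders, not the real data.

```python
def winrate_excluding(records, excluded):
    # records maps opponent name -> (wins, games played against that opponent).
    wins = sum(w for opp, (w, g) in records.items() if opp != excluded)
    games = sum(g for opp, (w, g) in records.items() if opp != excluded)
    return 100.0 * wins / games

# Hypothetical record, only to show the shape of the computation.
example = {
    "OpponentA": (30, 50),
    "OpponentB": (20, 50),
    "CUNYbot":   (48, 50),
}
print(winrate_excluding(example, "CUNYbot"))  # 50.0, the easy wins no longer inflate the rate
```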
holdover tournament
I noticed something funny, though, that makes me question repeatability. Here are the 2020 and 2021 results of the subtournament played among the holdover bots only. They look different, and have a different finishing order.
2020 | overall | Beta | XIAO | Meta |
#1 BetaStar | 262/388 67.53% | | 93/200 46% | 169/188 90% |
#2 XIAOYI | 260/400 65.00% | 107/200 54% | | 153/200 76% |
#3 MetaBot | 66/388 17.01% | 19/188 10% | 47/200 24% | |
2021 | overall | XIAO | Beta | Meta |
#1 XIAOYI | 64/99 64.65% | | 32/50 64% | 32/49 65% |
#2 BetaStar | 51/97 52.58% | 18/50 36% | | 33/47 70% |
#3 MetaBot | 31/96 32.29% | 17/49 35% | 14/47 30% | |
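One thing that does check out is the internal consistency of these crosstabs: each bot’s overall record is the sum of its per-opponent cells, and the two sides of a pairing account for all its games (XiaoYi’s 32/50 against BetaStar mirrors BetaStar’s 18/50 against XiaoYi). A tiny Python check over the 2021 numbers copied from the table above:

```python
# 2021 holdover crosstab as (wins, games) per opponent, copied from the table above.
results_2021 = {
    "XIAOYI":   {"BetaStar": (32, 50), "MetaBot": (32, 49)},
    "BetaStar": {"XIAOYI":   (18, 50), "MetaBot": (33, 47)},
    "MetaBot":  {"XIAOYI":   (17, 49), "BetaStar": (14, 47)},
}

for bot, row in results_2021.items():
    wins = sum(w for w, g in row.values())
    games = sum(g for w, g in row.values())
    print(f"{bot}: {wins}/{games} = {100 * wins / games:.2f}%")  # matches the overall column

# Mirror check: the two sides of each pairing cover all the games in it.
for a in results_2021:
    for b, (wins_a, games_ab) in results_2021[a].items():
        wins_b = results_2021[b][a][0]
        assert wins_a + wins_b == games_ab
```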
The bots are the same. Do the different maps make that much difference? Let’s compare the numbers by map (I put these two tables into the same sort order, not following the finishing order).
2020 | overall | BlueSt | Alchem | GreatB | Androm | LunaTh |
#2 XIAOYI | 65.00% | 85% | 81% | 59% | 62% | 38% |
#1 BetaStar | 67.53% | 59% | 65% | 82% | 58% | 73% |
#3 MetaBot | 17.01% | 6% | 4% | 9% | 28% | 40% |
2021 | overall | Rideof | GreatB | NeoAzt | NeoSni | Python |
#1 XIAOYI | 64.65% | 75% | 55% | 55% | 74% | 65% |
#2 BetaStar | 52.58% | 55% | 58% | 53% | 35% | 63% |
#3 MetaBot | 32.29% | 20% | 37% | 42% | 42% | 21% |
Compare the columns for Great Barrier Reef, the one map that was the same both years: BetaStar fell from 82% to 58%, and MetaBot rose from 9% to 37%. The same bots, with the same opponents, on the same map, produced quite different results. The smaller number of games played this year could add random variation, but this seems like a lot to me.
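To put a rough number on “seems like a lot,” here is a two-proportion z-test on BetaStar’s Great Barrier Reef win rates. The exact per-map game counts are not in the tables above, so the sample sizes below are assumptions, roughly 388/5 and 97/5 games rounded; treat the result as a ballpark, not a verdict.

```python
from math import sqrt

def two_proportion_z(p1, n1, p2, n2):
    # z statistic for the difference between two win rates (normal approximation).
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# BetaStar on Great Barrier Reef: 82% in 2020, 58% in 2021 (from the tables above).
# Per-map game counts are ASSUMED: roughly 78 games in 2020 and 19 in 2021.
print(two_proportion_z(0.82, 78, 0.58, 19))  # about 2.2 with these assumed counts
```

With these assumptions the gap is a bit over two standard errors, so the small 2021 sample alone does not comfortably explain it, which fits the guesses below.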
I assume the difference is due to the randomness of learning. A bot trying to adapt to an opponent may randomly hit on a good idea soon and learn quickly, or may hit on the good idea later and learn slowly. I’ve seen that make a huge difference. I can’t rule out that the difference is due to a change in tournament conditions; in fact, MetaBot’s much stronger performance this year and BetaStar’s weaker one suggest that conditions may indeed have changed. No matter the cause, I have to think that year-over-year comparisons are questionable. In other words, this whole post was a waste, and I might as well have forgotten history!