CIG 2016 - the final hidden in the qualifier
Yesterday I claimed that the final stage of CIG 2016 produced little new information, because it was equivalent to drawing a subset from the qualifiers. Is that true? To check, I wrote a script to render crosstables from subsets of game results.
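The script itself isn’t in this post (I’ll release it later, see below), but the idea is simple enough to sketch. Here is a minimal Python sketch of the approach, not the real script: it assumes the game results come as (winner, loser) name pairs, which is a simplification for illustration, and the overall column is the plain mean of per-opponent win rates, which appears to match the tables below but may differ slightly from the tournament’s own scoring.

```python
from collections import defaultdict

def crosstable(games, bots):
    """Print a win-rate crosstable for the games between bots in `bots`.

    `games` is an iterable of (winner, loser) name pairs; games involving
    any bot outside `bots` are ignored, so passing a subset of bots
    renders the crosstable of just that subset.
    """
    wins = defaultdict(int)      # wins[(a, b)] = games a won against b
    games_vs = defaultdict(int)  # games_vs[(a, b)] = games a played against b
    for winner, loser in games:
        if winner in bots and loser in bots:
            wins[(winner, loser)] += 1
            games_vs[(winner, loser)] += 1
            games_vs[(loser, winner)] += 1

    def rate(a, b):
        return 100.0 * wins[(a, b)] / games_vs[(a, b)] if games_vs[(a, b)] else 0.0

    # Overall score: mean of the per-opponent win rates.
    overall = {a: sum(rate(a, b) for b in bots if b != a) / (len(bots) - 1)
               for a in bots}
    order = sorted(bots, key=overall.get, reverse=True)

    print(' | overall | ' + ' | '.join(b[:4] for b in order))
    print('|'.join(['---'] * (len(order) + 2)))
    for a in order:
        row = ['-' if a == b else f'{rate(a, b):.0f}%' for b in order]
        print(f'{a} | {overall[a]:.2f}% | ' + ' | '.join(row))
```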
Here’s my rendition of the real finals. I liked the red and green color coding of win rates in the original, but some people are red-green colorblind, so my version uses red and blue instead. I also went with a higher-contrast color curve (a sketch of the kind of color mapping I mean follows the table).
 | overall | tscm | Iron | Leta | ZZZK | Over | UAlb | Mega | Aiur
---|---|---|---|---|---|---|---|---|---
tscmoo | 65.14% | - | 52% | 44% | 79% | 71% | 77% | 83% | 50%
Iron | 54.43% | 48% | - | 38% | 49% | 49% | 74% | 30% | 93%
LetaBot | 53.71% | 56% | 62% | - | 49% | 81% | 69% | 30% | 29%
ZZZKBot | 53.08% | 21% | 51% | 51% | - | 42% | 35% | 93% | 78%
Overkill | 51.43% | 29% | 51% | 19% | 58% | - | 43% | 81% | 79%
UAlbertaBot | 49.07% | 23% | 26% | 31% | 65% | 57% | - | 76% | 66%
MegaBot | 38.00% | 17% | 70% | 70% | 7% | 19% | 24% | - | 59%
Aiur | 35.14% | 50% | 7% | 71% | 22% | 21% | 34% | 41% | -
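Here’s the kind of color mapping I mean, as a standalone sketch rather than the actual code behind the colored table: win rates above 50% shade toward blue, rates below 50% toward red, and a gamma value below 1 pushes middling rates toward the extremes for more contrast.

```python
def winrate_color(rate, gamma=0.6):
    """Map a win rate in [0, 100] to a hex color on a red-white-blue scale."""
    t = (abs(rate - 50.0) / 50.0) ** gamma  # 0 at an even record, 1 at 0% or 100%
    fade = round(255 * (1.0 - t))           # non-dominant channels fade as t grows
    if rate >= 50.0:
        r, g, b = fade, fade, 255           # winning records: blue
    else:
        r, g, b = 255, fade, fade           # losing records: red
    return f'#{r:02x}{g:02x}{b:02x}'
```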
Here is the crosstable of the final hidden in the qualifier, which is to say the qualifier games played between finalists (a usage sketch of the filtering step follows the table).
 | overall | tscm | Iron | Leta | ZZZK | Over | UAlb | Mega | Aiur
---|---|---|---|---|---|---|---|---|---
tscmoo | 61.71% | - | 44% | 48% | 82% | 53% | 87% | 75% | 43%
Iron | 56.57% | 56% | - | 39% | 53% | 56% | 63% | 38% | 91%
LetaBot | 52.00% | 52% | 61% | - | 51% | 81% | 60% | 28% | 31%
ZZZKBot | 52.14% | 18% | 47% | 49% | - | 44% | 45% | 93% | 69%
Overkill | 51.57% | 47% | 44% | 19% | 56% | - | 32% | 84% | 79%
UAlbertaBot | 48.29% | 13% | 37% | 40% | 55% | 68% | - | 48% | 77%
MegaBot | 42.86% | 25% | 62% | 72% | 7% | 16% | 52% | - | 66%
Aiur | 34.86% | 57% | 9% | 69% | 31% | 21% | 23% | 34% | -
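Producing that table is just a matter of feeding the qualifier games and the set of finalists to the crosstable sketch above; the variable name for the parsed qualifier results is a placeholder.

```python
finalists = {'tscmoo', 'Iron', 'LetaBot', 'ZZZKBot',
             'Overkill', 'UAlbertaBot', 'MegaBot', 'Aiur'}

# qualifier_games: list of (winner, loser) pairs parsed from the qualifier
# results. crosstable() drops any game involving a non-finalist, which is
# exactly the "final hidden in the qualifier".
crosstable(qualifier_games, finalists)
```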
The two crosstables match closely overall. LetaBot and ZZZKBot have switched ranks, but that’s not a surprise because their scores were extremely close.
The two table cells with the largest differences are tscmoo vs Overkill and MegaBot vs UAlbertaBot. The tscmoo-Overkill numbers are within the expected range of statistical variation, according to spot checks with Fisher’s exact test, but the MegaBot-UAlbertaBot numbers are highly surprising, far outside the expected range. (The right way to do this would be to test both whole tables against each other, as a sample of samples of samples. :-) So there’s an indication that something may be afoot.
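The spot check itself is a 2x2 Fisher’s exact test on a single pairing: wins and losses in the final versus wins and losses in the qualifier. Here is a sketch with scipy; the game counts are placeholders, since I’m assuming on the order of 100 games per pairing per stage rather than quoting the exact numbers.

```python
from scipy.stats import fisher_exact

def pairing_pvalue(final_wins, final_games, qual_wins, qual_games):
    """P-value that one bot's record vs. one opponent differs between stages."""
    table = [[final_wins, final_games - final_wins],
             [qual_wins, qual_games - qual_wins]]
    _, p = fisher_exact(table)
    return p

# Placeholder counts for MegaBot vs UAlbertaBot: 24% in the final and 52%
# in the qualifier (percentages from the tables above), assuming roughly
# 100 games per pairing in each stage.
print(pairing_pvalue(24, 100, 52, 100))
```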
I had a new thought. It’s theoretically possible that the differences are caused by learning bots which generalize across opponents. Tscmoo and MegaBot are both learning bots (I verified it: they both wrote to their learning files), and both seem as though they might be able to generalize across opponents. (Overkill is a learning bot but does not generalize.) So my original claim is not 100% true: the qualifiers don’t entirely duplicate the final in the presence of learning bots which generalize across opponents. Alternatively, there could have been a problem with a big effect on that pairing, such as a bug in MegaBot related to its learning, which amounts to the same thing as mis-generalizing across opponents. We have the source and the replays, so a sufficiently deep dig should turn up the issue if it is in the bots. There’s also a chance that the issue is with the tournament operations, or with my script.
Here I combine the qualifier results with the final results to get the best numbers available (a one-line sketch of the combination follows the table). The organizers, for whatever reason, explicitly decided not to do this. Luckily, it doesn’t change the ranking of the bots.
 | overall
---|---
tscmoo | 63.43%
Iron | 55.50%
LetaBot | 52.86%
ZZZKBot | 52.61%
Overkill | 51.50%
UAlbertaBot | 48.68%
MegaBot | 40.43%
Aiur | 35.00%
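With the sketch above, the combination is a one-liner: pool the raw game records from both stages and recompute, rather than averaging percentages (the two agree only when both stages have the same number of games per pairing). Again, the variable names are placeholders.

```python
# final_games and qualifier_games: (winner, loser) pairs from each stage.
crosstable(qualifier_games + final_games, finalists)
```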
Tomorrow: More map analysis. Also I’ll release the script for others to play with.