CIG 2016 results discussion
I got ahead of myself yesterday—I should step back and talk about the CIG 2016 results more generally! Martin Rooijackers aka LetaBot sent me a few observations by e-mail. They mostly match up with my observations, and I’ll add a few of my own.
• Terran Renaissance confirmed, as predicted (probably by everybody who cared to predict).
• The top 3 finishers, besides being terran, are all bots with many updates over the last several months.
• 3 bots of the final 8 are carryovers from past years (#5 Overkill, #6 UAlbertaBot, and #8 AIUR). They scored in the lower half. #4 ZZZKBot seems to have been only slightly updated. The long work put into the top 3 paid off in playing strength.
• Martin Rooijackers observes that #7 MegaBot is the highest-scoring brand new bot. It’s true if you count Iron as a continuation of Stone. And given MegaBot’s self-description as a meta-bot that uses the strategies of others, MegaBot is arguably not brand new either. In any case, the point is that it seems to take a long period of work to get to the top. The competition is fierce.
• None of the final 8 bots dominated the others. Even tail-ender AIUR had an equal record against winner Tscmoo and a winning record against LetaBot. The CIG 2016 finals crosstable has upsets throughout. Compared to the AIIDE 2015 crosstable with 22 participants, the rate of upsets between bots near each other in rank looks visually similar, so with only 8 finalists the upsets run all the way through the table. Generally, bot #n is not clearly better than bot #n+1; the ranking is not stable at that level. In the qualifying stage, the upset rate looks steady down to #9 Tyr and then falls off. AIIDE 2015 did not have that pattern. (A rough sketch of how the upset rate could be measured, rather than eyeballed, appears after this list.)
• I predicted that ZZZKBot still had a chance to make it into the top 3. It didn’t, but it scored 53.08% to make #4 in the finals versus #3 LetaBot’s 53.71%. I think the prediction was justified. Without big updates, though, this was its last chance.
• The qualifier results and finals results look different. Iron was narrowly on top in the qualifiers, but Tscmoo pulled well ahead in the finals (a surprise to me). Apparently Tscmoo is better tuned to defeat strong opponents.
• The slides on the results page include a chart of win rates over time, which shows that learning helps some, but (as in the past) not as much as you’d hope. To get more out of it, we need smarter learning. I’ll drop a few suggestions in a future post.
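Back to the upset question: here is a minimal sketch of how the upset rate could be put into numbers, assuming the crosstable is available as a win-rate matrix keyed by bot name, and taking “upset” to mean that the lower-ranked bot scored 50% or better against a higher-ranked bot within a small rank gap. The data format, the 50% threshold, the rank-gap window, and the toy numbers below are my assumptions for illustration, not actual CIG 2016 results.

```python
# Hypothetical sketch: measure how often nearby-ranked bots upset each other
# in a round robin crosstable. "winrates" maps (bot_a, bot_b) to bot_a's win
# rate against bot_b; the format and the 50% threshold are assumptions.

def upset_rate(ranking, winrates, max_rank_gap=2):
    """Fraction of pairings within max_rank_gap of each other in rank
    where the lower-ranked bot scored 50% or better."""
    upsets = 0
    pairs = 0
    for i, higher in enumerate(ranking):
        for j in range(i + 1, min(i + 1 + max_rank_gap, len(ranking))):
            lower = ranking[j]
            pairs += 1
            if winrates[(lower, higher)] >= 0.5:
                upsets += 1
    return upsets / pairs if pairs else 0.0


# Toy example with made-up numbers, not the real crosstable.
ranking = ["Tscmoo", "Iron", "LetaBot", "ZZZKBot"]
winrates = {
    ("Iron", "Tscmoo"): 0.45, ("LetaBot", "Tscmoo"): 0.52,
    ("LetaBot", "Iron"): 0.40, ("ZZZKBot", "Iron"): 0.55,
    ("ZZZKBot", "LetaBot"): 0.48,
}
print(upset_rate(ranking, winrates))  # 0.4: 2 of the 5 nearby pairings are upsets
```

Run over both the CIG 2016 and AIIDE 2015 crosstables, something like this would turn the visual impression into a number that can be compared directly.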
The bottom line is that we’re making good progress, though we’re still not far along the path. Tscmoo’s long short-term memory is a pioneering idea and Tscmoo finished #1, but we don’t know much about it. Did the memory help results? Meanwhile, LetaBot finished #3 here, and is in a strong position as Martin Rooijackers tries to pioneer a next step in another direction, a tactical search derived from MaasCraft. Will the search lead to the hoped-for jump in strength? Tune in next time!
I question the tournament design. They ran a 100-round round robin with 16 bots and used the results to accept half of the entrants into the final—a staged design with qualifier and finals. That’s perfectly reasonable; it says that they’re more interested in who beats the strong than who consistently beats the weak. Having selected the finalists, they discarded the qualifier results and ran an independent final with 100 more rounds on the same maps for the 8 finalists. They even discarded bot learning files from the qualifier, so that nothing carried over. The final duplicated the qualifiers, only with fewer bots, and produced little new information. They could have saved the time and extracted the final results from the qualifier stage. It would have been equivalent.
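To make the equivalence concrete, here is a sketch of how the finals standings could have been read straight off the qualifier stage, assuming the per-game results are available as (winner, loser) pairs. The record format, the helper name, and the sample games are hypothetical; only the idea of filtering to finalist-versus-finalist games is the point.

```python
# Hypothetical sketch: recover "finals" standings from qualifier games by
# keeping only games between the 8 finalists. The (winner, loser) record
# format is an assumption for illustration.

from collections import defaultdict

def standings_among(finalists, games):
    """Win rate of each finalist counting only games against other finalists."""
    wins = defaultdict(int)
    played = defaultdict(int)
    for winner, loser in games:
        if winner in finalists and loser in finalists:
            wins[winner] += 1
            played[winner] += 1
            played[loser] += 1
    return sorted(
        ((wins[bot] / played[bot] if played[bot] else 0.0, bot) for bot in finalists),
        reverse=True,
    )


finalists = {"Tscmoo", "Iron", "LetaBot", "ZZZKBot",
             "Overkill", "UAlbertaBot", "MegaBot", "AIUR"}
# qualifier_games would be the full list of (winner, loser) pairs from the
# 100-round round robin; games against eliminated bots simply drop out.
qualifier_games = [("Tscmoo", "AIUR"), ("Iron", "Tyr"), ("LetaBot", "Overkill")]
for rate, bot in standings_among(finalists, qualifier_games):
    print(f"{bot}: {rate:.2%}")
```

Restricting to finalist-versus-finalist games yields exactly the pairings, on the same maps, that the separate final replayed.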
In a staged tournament, each stage should produce new information. It could add to the qualifier results. It could have more rounds. It could include seeded opponents that skipped the qualifiers (though I wouldn’t recommend that for an academic tournament). It could include different maps. It could follow harsher rules. But something!
I can understand why they didn’t pass the qualifier results through to the final stage. They had the software they had, and an organizer’s time is always short. But this final had no point. I hope future tournaments will remember the lesson.