AIIDE 2019 second first look
The AIIDE 2019 tournament has been rerun to correct an error. The results are official, different from before, and hopefully final. In the original run of the tournament, we’re told, a hardware error corrupted a file and caused McRave to crash every game against Locutus. In the corrected rerun, McRave was able to score 1 win against Locutus in 100 games, but ironically ended up with a slightly lower overall winning rate. Bugs in McRave were more important for its result than bugs in the tournament.
#1 Locutus and #2 PurpleWave maintain their positions, but Locutus no longer had plus results against every opponent: PurpleWave edged it out 55-45 in their matchup. #3 BananaBrain gained a rank, and #4 DaQin lost one. From my point of view, the most important result is that #5 Steamhammer moved ahead of #7 Microwave and #8 Iron—these competitors were tightly grouped, and it only took small changes in the results to switch their finishing order around thoroughly.
shifts in the results
The order of finishers looks different, but most winning rates in the final results are within a few percent of the deprecated original results. Exceptions are #4 DaQin at 63.33% which was formerly #3 DaQin at 69.39%, a shift of 6% down, and #6 ZZZKBot at 52.08%, formerly #9 ZZZKBot at 43.04%, a shift of 9% up. What accounts for these two bots having such different results? To my eye, it doesn’t look like typical statistical variation.
I looked at the scores of specific matchups. Surprise result one: Formerly ZZZKBot scored 18-82 versus DaQin, but this time ZZZKBot 90-10 DaQin. This one difference accounts for the entire shift in DaQin’s winning rate, moving it down a rank, and much of ZZZKBot’s shift. Surprise result two: Formerly ZZZKBot 34-66 McRave, but this time ZZZKBot 67-33. That accounts for McRave performing worse overall, and for ZZZKBot jumping up the ranks. In other matchups, ZZZKBot performed similarly in both runs of the tournament.
Why did ZZZKBot perform so differently in these 2 matchups alone? I’ll dig in later, but I can speculate; here are 3 possible reasons, and it could be something else. There is some smell of software error: 18-82 -> 90-10 and 34-66 -> 67-33 look as though the results for the players were swapped. Or perhaps ZZZKBot was affected by the hardware error in these 2 matchups. Another possibility is unstable learning. I know that Steamhammer can perform very differently in two runs of the same matchup depending on what openings it happens to randomly try (does it hit on a winner early?). ZZZKBot’s learning is complicated and hard to analyze, but maybe it is susceptible to some effect like that.



