
AIIDE 2019 second first look

The AIIDE 2019 tournament has been rerun to correct an error. The results are official, different from before, and hopefully final. In the original run of the tournament, we’re told, a hardware error corrupted a file and caused McRave to crash every game against Locutus. In the corrected rerun, McRave was able to score 1 win against Locutus in 100 games, but ironically ended up with a slightly lower overall winning rate. Bugs in McRave were more important for its result than bugs in the tournament.

#1 Locutus and #2 PurpleWave maintain their positions, but Locutus no longer has plus results against every opponent: PurpleWave edged it out 55-45 in their matchup. #3 BananaBrain gained a rank, and #4 DaQin lost one. From my point of view, the most important result is that #5 Steamhammer moved ahead of #7 Microwave and #8 Iron—these competitors were tightly grouped, and it only took small changes in the results to switch their finishing order around thoroughly.

shifts in the results

The order of finishers looks different, but most winning rates in the final results are within a few percentage points of the deprecated original results. Exceptions are #4 DaQin at 63.33%, formerly #3 DaQin at 69.39%, a shift of 6 points down, and #6 ZZZKBot at 52.08%, formerly #9 ZZZKBot at 43.04%, a shift of 9 points up. What accounts for these two bots having such different results? To my eye, it doesn’t look like typical statistical variation.

I looked at the scores of specific matchups. Surprise result one: Formerly ZZZKBot scored 18-82 versus DaQin, but this time ZZZKBot 90-10 DaQin. This one difference accounts for the entire shift in DaQin’s winning rate, moving it down a rank, and much of ZZZKBot’s shift. Surprise result two: Formerly ZZZKBot 34-66 McRave, but this time ZZZKBot 67-33 McRave. That accounts for McRave performing worse overall, and for ZZZKBot jumping up the ranks. In other matchups, ZZZKBot performed similarly in both runs of the tournament.

Why did ZZZKBot perform so differently in these 2 matchups alone? I’ll dig in later, but I can speculate; here are 3 possible reasons, though it could be something else entirely. There is some smell of software error: 18-82 -> 90-10 and 34-66 -> 67-33 look as though the results for the players were swapped. Or perhaps ZZZKBot was affected by the hardware error in these 2 matchups. A third possibility is unstable learning. I know that Steamhammer can perform very differently in two runs of the same matchup depending on what openings it happens to randomly try (does it hit on a winner early?). ZZZKBot’s learning is complicated and hard to analyze, but maybe it is susceptible to some effect like that.
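The "does it hit on a winner early?" effect is easy to reproduce with a toy model. Here is a minimal sketch, assuming a made-up "stay on a win, rotate on a loss" selection rule and invented winrates—this is not ZZZKBot's or Steamhammer's actual learning, just an illustration of how the same matchup can score very differently from run to run:

```python
import random

def run_match(strategy_winrates, games=100, seed=None):
    """Toy opening selection: start on a random strategy, keep it while it
    wins, rotate to the next one after each loss. Returns games won."""
    rng = random.Random(seed)
    current = rng.randrange(len(strategy_winrates))
    wins = 0
    for _ in range(games):
        if rng.random() < strategy_winrates[current]:
            wins += 1  # stick with a winner
        else:
            current = (current + 1) % len(strategy_winrates)  # try the next opening
    return wins

# One strong opening (60% winrate) among three weak ones (10% each) --
# hypothetical numbers, not any bot's real openings.
results = [run_match([0.6, 0.1, 0.1, 0.1], seed=s) for s in range(20)]
print(min(results), "-", max(results))  # the run-to-run spread can be wide
```

Even with identical winrates underneath, whether a run starts on the strong opening or burns games cycling through the weak ones moves the 100-game score substantially.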


Comments

Dave Churchill on :

I believe people on Discord have dug up the reasoning for the wild swings - the bots with the swings managed to pick a better strategy vs. the opponent earlier in the 2nd run, which yielded drastically different results. With 'only' 100 games, there can be huge swings depending on if/when the strategy selection converges. The bots definitely were not swapped, and there were no hardware errors this time.

The initial hardware error was caused by me swapping some RAM into a new machine right before the run of the initial tournament. The 2nd tournament was run on different hardware, so we're 100% sure that didn't happen again. This is just one of the dangers of using strategy selection. The extreme case is if both bots have some sort of Rock-Paper-Scissors and chase each other in a cycle trying to find a winning strategy. In that case, if you chose the winner first (possibly randomly), you could end up 100-0 instead of 0-100.

Dan on :

Congrats to Locutus for an extremely deserved win! The bot continues to be a monster competitor, and this tournament's new cannon-drop strategy was a clever and successful shot at sidestepping the PvP opening book.

Bot learning dynamics make for huge variances in matchup results. I've observed this on lots of 50-100 game test runs. If your bot has one good strategy for the matchup, and it loses the first time around, you're likely to take another N (= # of strategies) losses in a row before you get a second shot.

If you've got one strategy that wins 60-40, and ten strategies that lose 10-90, the odds that you get fooled into trying a succession of bad builds is very high.
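As a back-of-the-envelope check on that scenario—taking the 60-40 and 10-90 winrates from the comment as given, and assuming a simple rotate-on-loss selector, which is my model and not necessarily any bot's real one:

```python
# One strategy wins 60-40; ten strategies win 10-90 (numbers from the
# comment above; the rotate-on-loss selector is an assumed model).
p_good, p_bad, n_bad = 0.60, 0.10, 10

# Probability the good strategy loses its very first game, sending the
# selector off down the list of bad builds:
p_fooled = 1 - p_good
print(p_fooled)  # 0.4

# Expected losses racked up while cycling through all ten bad strategies
# once before the good one comes back around:
expected_bad_losses = n_bad * (1 - p_bad)
print(expected_bad_losses)  # 9.0
```

So under this toy model, a single unlucky opening game costs roughly nine further expected losses before the good build gets its second shot—consistent with the "fooled into a succession of bad builds" dynamic described above.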

A prominent example of that is the most recent SSCAIT finals. PurpleWave went 7-1 against Locutus, but the results on BASIL immediately thereafter were 50-50. Locutus had strategies with greater than 50% winrate against PurpleWave's DT-expand strategy, but didn't settle in on them in time. That match could just as easily have gone 7-1 Locutus as it did 7-1 PurpleWave.

That high-variance dynamic is a major driver for the amount of pre-training and strategy filtering that I put into PurpleWave before each contest. Not only is exploration totally unaffordable in win% formats (PurpleWave alone accounted for over half of Locutus' losses -- meaning you really need to go near-100% against everyone else to win), but a few bad early-round coinflips can quickly cost you an extra 20+ games.

I continue to argue that the need to have strong opponent priors is one of a few good reasons to move away from win% formats. Priors help in all formats, but are overweighted in formats where exploration is unaffordable.

Congrats again to Locutus. And thanks to Dave Churchill and Rick Kelly for running this year's throwdown and for ensuring a smooth and accurate operation. Looking forward to the results on the alternative map pool.

MicroDK on :

This was discussed in Discord and we found out that ZZZKBot apparently has very high variance in its learning, so it can make a big impact on a tournament. :(

I didn't believe that a big swing like 18-82 -> 90-10 was possible without some sort of bugs. And I am still sceptical...

Quatari on :

As I mentioned on Discord during the discussion, old versions of ZZZKBot like that version do have a bug in their learning logic that can partially explain its high variance against some opponents.
