AIIDE 2019 results first look
Important update on Friday 11 October: The results are invalid due to an error, and the tournament will be repeated from scratch. See Dave Churchill’s tweet: “The 2019 AIIDE StarCraft AI Competition will have to be re-run due to an error on our part causing a corrupted file which caused McRave to crash a lot of games.” The same error might have caused other problems. Even if McRave was the only bot directly affected, the competition was a round robin, so every bot’s score was potentially affected.
The AIIDE 2019 results were announced today at the conference. The AIIDE conference stream includes Dave Churchill’s presentation, starting at about 52:30. The results come with a video of Locutus versus PurpleWave, with commentary by Dan Gant focusing not on the game itself but on the AI techniques.
The standings: #1 Locutus edged out #2 PurpleWave. #3 DaQin and #4 BananaBrain were far behind, but finished out the dominant protoss bloc. (The win rate over time graph strangely omits #4 BananaBrain.) #5 Iron, #6 Microwave, #7 XiaoYi, and #8 Steamhammer were closely grouped around a 50% win rate. As in CoG, Iron was the top terran and the top returning bot, and Microwave was the top zerg.
#10 McRave did surprisingly poorly. It must be suffering from new bugs. I notice that McRave’s army has become strangely passive; it sometimes seems unwilling to fight even with a large advantage. That seems like a symptom of an important bug.
#8 Steamhammer did about as I expected, or at least as I expected after I noticed the combat sim bug that I had just added. Without that bug I think it would have finished slightly ahead of Microwave. I’m bothered by the 59% win rate against Iron, though; I expected over 90%. I tested on every map with the correct version of Iron, but must have made a mistake somewhere.
Last year, Bruce Nielsen provided diffs from Locutus for bots derived from it. This year, Dan Gant has provided diffs of a few other bots.
• Stormbreaker derived from SAIDA - Stormbreaker was disqualified because its behavior was nearly identical to SAIDA’s, though there are big code differences. According to the presentation, Stormbreaker adds a neural network but does not use it.
• XiaoYi derived from SAIDA - According to the presentation, SAIDA would likely have finished 3rd if it had played. XiaoYi placed 7th behind Microwave.
• DaQin this year versus last year. I see a great many detailed changes.
We were promised a second competition on “unknown” maps, for those bots which did not opt out. I count 8 participants for the second competition. I don’t see a sign of its results. Perhaps it has not been run yet.
As always, I will analyze both CoG and AIIDE. But CoG is showing evidence of sloppiness, so AIIDE deserves more attention. With fewer entrants in AIIDE this year, it won’t take as long to dig into them. But I think I have almost managed to interpret the CoG result file, so I’ll start there.
Comments
Dave Churchill:
- Inspection showed that most of the code changes in Stormbreaker compared to SAIDA are code re-ordering, weird large commented-out sections, and mostly cosmetic things. They call their neural network in only one location, but the result immediately goes out of scope and is never used. I spoke to the author, and after I presented these facts they agreed with the DQ.
- We are investigating why McRave failed to start in nearly every game vs. Locutus. If we find it's on our side of things, then we will re-run the tournament. We made updates to the software for more accurate reporting of crashes, and it is possible we made a mistake. But it's very strange that it would only affect one bot vs. another bot. They both use BWEB, according to McRave, so maybe that is part of the culprit? Note that BASIL doesn't check for start-up times or frame timeouts.
- Due to an earlier AIIDE, we haven't run the 2nd unknown-maps competition yet; that will come later this month. Apologies for the delays.
Jay Scott:
Looking forward to the second competition.
Bruce:
I’ve uploaded a branch for 2019 DaQin on my github as well, so a diff with my AIIDE 2018 tag is probably possible there to see the full divergence from Locutus.
McRave:
I was opposed to re-running the tournament but I understand that it's the right thing to do.
My bot did poorly mostly from timeouts, which I did not find in testing but which are obvious from some videos posted on Discord of McRave running on the AIIDE hardware / TM.
It's disappointing, but it is what it is. The bot is strong but did not constrain itself to the rules of the tournament.
Dave Churchill:
Now that we've identified this, we will be re-running the entire tournament this week, so you may want to hold off on analysis until then.
Jay Scott:
It will be interesting to compare the “unofficial preliminary” results with the final results.