comparing strength across time
We don’t get many tournaments of bots versus humans. I don’t think there have been any with conditions controlled well enough that we can judge how strong bots are and how they are improving: Enough human participants, of known strength, with known levels of familiarity with computer play, finishing enough games. Then hold events across years so we can compare. We have to make do with seeing how bots are improving against other bots. Here is my best idea so far for comparing strength across tournaments.
1. We need 2 tournaments, preferably round robin, that share some participants—exactly identical bots, the more the better. We can’t do it with humans, because we can’t get exactly identical people across time. Ideally the maps should be the same too. AIIDE has more games, and SSCAIT has more shared participants; either should work, but I think SSCAIT may work better for this purpose despite being short by comparison. You could also compare between AIIDE and SSCAIT, but it would not work as well: It would take extra effort to make sure you know which players are exactly identical, the different lengths of the tournaments mean that each provides a different amount of evidence to support the ratings, and you could get confusing results for learning bots.
2. Pool all the games from both tournaments and compute elo ratings. If some participants which are not identical have the same names, distinguish them somehow—Steamhammer 2017 versus Steamhammer 2018, or whatever.
3. The identical players have identical strength in both tournaments, so consider their elo ratings as fixed. For each tournament separately, compute the elo ratings of the remaining players while keeping the ratings of the identical players fixed. The fixed ratings are benchmarks that keep the elo comparison stable for the remaining players (the idea has been used before). There is a small code sketch of steps 2 and 3 after this list.
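Here is a minimal Python sketch of steps 2 and 3, under some assumptions of my own: each game is recorded as a (winner, loser) pair, draws are ignored, and the game lists, anchor bots, and toy data are hypothetical. Repeated small Elo updates with a shrinking K factor stand in for a proper maximum-likelihood fit.

    def expected_score(r_a, r_b):
        """Expected win rate of a player rated r_a against one rated r_b."""
        return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

    def fit_elo(games, ratings=None, fixed=frozenset(), k=16.0, passes=100, base=1500.0):
        """Fit Elo ratings from a list of (winner, loser) games.

        Players named in `fixed` keep their incoming rating untouched;
        everyone else is updated. Many passes over the game list with a
        shrinking update size give rough convergence.
        """
        ratings = dict(ratings or {})
        for winner, loser in games:
            ratings.setdefault(winner, base)
            ratings.setdefault(loser, base)
        for p in range(passes):
            step = k / (1 + p)      # shrink the update each pass
            for winner, loser in games:
                e = expected_score(ratings[winner], ratings[loser])
                if winner not in fixed:
                    ratings[winner] += step * (1.0 - e)
                if loser not in fixed:
                    ratings[loser] -= step * (1.0 - e)
        return ratings

    # Toy results, made up only so the sketch runs end to end.
    games_2017 = [("Iron", "LetaBot"), ("Iron", "Steamhammer 2017"),
                  ("Steamhammer 2017", "LetaBot")]
    games_2018 = [("Iron", "LetaBot"), ("Steamhammer 2018", "Iron"),
                  ("Steamhammer 2018", "LetaBot")]

    # Step 2: pool both tournaments and rate everyone together.
    pooled_ratings = fit_elo(games_2017 + games_2018)

    # Step 3: hold the identical bots at their pooled ratings and
    # re-rate each tournament on its own against those anchors.
    anchors = {"Iron", "LetaBot"}
    anchor_ratings = {bot: pooled_ratings[bot] for bot in anchors}
    ratings_2017 = fit_elo(games_2017, ratings=anchor_ratings, fixed=anchors)
    ratings_2018 = fit_elo(games_2018, ratings=anchor_ratings, fixed=anchors)

Because both years are measured against the same fixed anchors, Steamhammer 2017 and Steamhammer 2018 end up on the same scale, and their difference is the improvement figure we’re after.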
It’s the best way I’ve thought of to get strength comparisons across time. We can get a pretty accurate measure of how individual bots have improved—Steamhammer 2018 is this much above Steamhammer 2017. We can treat elo as a linear measure of strength (a given elo difference always corresponds to the same expected win rate), so we can simply average together the ratings of any set of bots to compare: The top 16 are x points stronger this year, the protoss are y points stronger, the spread between best and worst has widened to....
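Continuing the sketch above, here is a quick check of the linearity claim and a helper for group averages; the group membership lists are hypothetical.

    # The expected win rate depends only on the rating gap, not on where
    # the gap sits on the scale: a 100-point edge is about 64% either way.
    assert abs(expected_score(1600, 1500) - expected_score(1100, 1000)) < 1e-12

    def group_mean(ratings, bots):
        """Mean anchored Elo of a group, e.g. the top 16 or all the protoss."""
        return sum(ratings[bot] for bot in bots) / len(bots)

    # Hypothetical group lists; the difference of means is the
    # "x points stronger this year" figure.
    top_2017 = ["Iron", "LetaBot", "Steamhammer 2017"]
    top_2018 = ["Iron", "LetaBot", "Steamhammer 2018"]
    gain = group_mean(ratings_2018, top_2018) - group_mean(ratings_2017, top_2017)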
I may do this analysis for SSCAIT once it finishes. It’s a bit elaborate, but I’m interested.
Comments
Antiga / Iruian on :
Jay Scott on :
Joseph Huang on :
Jay Scott on :
Edmund Nelson on :
1. SSCAIT 2016 vs SSCAIT 2017
(good) Bots that were unchanged
1. KillerBot
2. Bereaver
3. Ximp
4. Skynet
5. UAlbertaBot
Since KillerBot was changed in 2018, we lose a very useful piece of data for this experiment. But Iron and LetaBot being unchanged in 2018 gives us 2 top bots that remained unchanged.
So there are only 4 (top) bots that remained unchanged in the 3 consecutive years of the Student StarCraft AI Tournament.
Jay Scott on :
MicroDK on :
https://github.com/Bytekeeper/sc-docker
https://github.com/Bytekeeper/sc-docker/blob/master/docker/Setup%20in%20PowerShell%20--%20instructions.md