comparing AIIDE 2015 and CIG 2016 Elo ratings

The cool technique I had in mind to compare ratings across tournaments turned out not to work. Not cool after all. But 6 bots played unchanged in both AIIDE 2015 and CIG 2016, and we can compare their relative ratings. In this table the subtract column gives the AIIDE 2015 rating minus the CIG 2016 rating, and the normalize column subtracts the 82-point average difference from each.

bot           AIIDE Elo   CIG Elo   subtract   normalize
UAlbertaBot        1895      1778        117          35
Overkill           1890      1796         94          12
Aiur               1784      1687         97          15
TerranUAB          1372      1338         34         -48
OpprimoBot         1231      1154         77          -5
Bonjwa             1171      1099         72         -10
average                                   82           0
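
As a sanity check, here is a minimal Python sketch, with the paired ratings transcribed from the table above, that recomputes the subtract column, the 82-point average offset, and the normalize column.

```python
# Paired ratings for the six bots that played unchanged in both tournaments
# (numbers transcribed from the table above).
paired = {                  # bot: (AIIDE 2015 Elo, CIG 2016 Elo)
    "UAlbertaBot": (1895, 1778),
    "Overkill":    (1890, 1796),
    "Aiur":        (1784, 1687),
    "TerranUAB":   (1372, 1338),
    "OpprimoBot":  (1231, 1154),
    "Bonjwa":      (1171, 1099),
}

# subtract = AIIDE rating minus CIG rating for each bot.
subtract = {bot: aiide - cig for bot, (aiide, cig) in paired.items()}

# The offset is the average of those differences; normalize removes it.
offset = round(sum(subtract.values()) / len(subtract))    # 82
normalize = {bot: diff - offset for bot, diff in subtract.items()}

for bot in paired:
    print(f"{bot:12}  subtract {subtract[bot]:4}  normalize {normalize[bot]:4}")
print("average offset:", offset)
```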

As you might expect, two tournaments with different maps and different opponents give different ratings. UAlbertaBot and Overkill swapped ranks among the 6. But after correcting for the 82-point offset (since only rating differences matter), the ratings turn out to be quite close between the tournaments. The biggest difference is TerranUAB's 48 points. Look up 48 points in the Elo table: it says that TerranUAB has a 57% probability of beating itself, not a drastic error.
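
The 57% figure can be reproduced with the logistic form of the Elo expected-score formula on the usual 400-point scale, which is what the Elo table encodes; a minimal sketch:

```python
def elo_expected_score(rating_diff):
    """Probability that the higher-rated side wins, given the rating difference,
    on the usual 400-point logistic Elo scale."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))

print(f"{elo_expected_score(48):.0%}")   # ~57%
```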

You can try to convert a CIG 2016 rating into a rough estimate of an AIIDE 2015 rating by adding 82. For example, tscmoo terran earned a CIG rating of 1888, which corresponds to an AIIDE rating of 1888+82 = 1970, whereas the tscmoo zerg that played in AIIDE earned a rating there of 2026. So the estimate appears to be way off. But estimates made this way are likely to be closer for bots near the middle of the pack.
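
A sketch of that rough conversion, using the 82-point offset and the tscmoo numbers quoted above:

```python
OFFSET = 82   # average AIIDE-minus-CIG difference over the six shared bots

def estimate_aiide_rating(cig_rating):
    """Rough estimate of an AIIDE 2015 rating from a CIG 2016 rating."""
    return cig_rating + OFFSET

tscmoo_cig = 1888            # tscmoo terran at CIG 2016
tscmoo_aiide_actual = 2026   # tscmoo zerg at AIIDE 2015

estimate = estimate_aiide_rating(tscmoo_cig)        # 1970
print(estimate, tscmoo_aiide_actual - estimate)     # falls 56 points short
```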

Next: Another mass of colorful crosstables.

Comments

Jay Scott:

Notice that the normalized numbers at the top of the table are positive, and those at the bottom are negative. There’s a stretching effect: The AIIDE ratings are more spread out than the CIG ratings. I think that’s because there were more participants and more games in AIIDE, which gave bayeselo more evidence to separate the ratings. A close look could probably quantify the effect and put comparisons on a firmer footing.

Bryan S Weber:

Hey, I suspect this does not work either unless the comparison groups are identical (identical bots in both tournaments). My best bet is to have your new UAlberta bot play the old tourney body for a comparative benchmark. PS, where could I look at a bot and explore what you've done?

Jay Scott:

You’re right, results will vary depending on the maps and on the set of opponents. One way to think of it is: This test is a way to get some idea of how big those effects are.

Jay Scott:

Results of past competitions and source code of a bunch of bots (this page just moved and is not updated for this year): https://www.cs.mun.ca/~dchurchill/starcraftaicomp/archive.shtml. Video showing how to get bot binaries from SSCAIT and play against them: https://youtu.be/v99ZIMsjTPM. Is that a good enough start?
