comparing AIIDE 2015 and CIG 2016 Elo ratings

The cool technique I had in mind to compare ratings across tournaments turned out not to work. Not cool after all. But 6 bots played unchanged in both AIIDE 2015 and CIG 2016, and we can compare their relative ratings. In this table the subtract column gives the AIIDE 2015 rating minus the CIG 2016 rating, and the normalize column subtracts the 82-point average difference from each.

bot           AIIDE Elo   CIG Elo   subtract   normalize
UAlbertaBot        1895      1778        117          35
Overkill           1890      1796         94          12
Aiur               1784      1687         97          15
TerranUAB          1372      1338         34         -48
OpprimoBot         1231      1154         77          -5
Bonjwa             1171      1099         72         -10
average                                   82           0
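
As a sanity check, here is a minimal Python sketch, with the paired ratings transcribed from the table above, that recomputes the subtract column, the 82-point average offset, and the normalize column.

```python
# Paired ratings for the six bots that played unchanged in both tournaments
# (numbers transcribed from the table above).
paired = {                  # bot: (AIIDE 2015 Elo, CIG 2016 Elo)
    "UAlbertaBot": (1895, 1778),
    "Overkill":    (1890, 1796),
    "Aiur":        (1784, 1687),
    "TerranUAB":   (1372, 1338),
    "OpprimoBot":  (1231, 1154),
    "Bonjwa":      (1171, 1099),
}

# subtract = AIIDE rating minus CIG rating for each bot.
subtract = {bot: aiide - cig for bot, (aiide, cig) in paired.items()}

# The offset is the average of those differences; normalize removes it.
offset = round(sum(subtract.values()) / len(subtract))    # 82
normalize = {bot: diff - offset for bot, diff in subtract.items()}

for bot in paired:
    print(f"{bot:12}  subtract {subtract[bot]:4}  normalize {normalize[bot]:4}")
print("average offset:", offset)
```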

As you might expect, two tournaments with different maps and different opponents give different ratings. UAlbertaBot and Overkill swapped ranks among the 6. But after correcting for the 82-point offset (since only rating differences matter), the ratings turn out to be quite close between the tournaments. The biggest difference is TerranUAB's 48 points. Look up 48 points in the Elo table: it says that TerranUAB has a 57% probability of beating itself, not a drastic error.
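
The 57% figure can be reproduced with the logistic form of the Elo expected-score formula on the usual 400-point scale, which is what the Elo table encodes; a minimal sketch:

```python
def elo_expected_score(rating_diff):
    """Probability that the higher-rated side wins, given the rating difference,
    on the usual 400-point logistic Elo scale."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))

print(f"{elo_expected_score(48):.0%}")   # ~57%
```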

You can try to convert a CIG 2016 rating into a rough estimate of an AIIDE 2015 rating by adding 82. For example, tscmoo terran earned a CIG rating of 1888, which corresponds to an AIIDE rating of 1888+82 = 1970, whereas the tscmoo zerg that played in AIIDE earned a rating there of 2026. So the estimate appears to be way off. But estimates made this way are likely to be closer for bots near the middle of the pack.
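
A sketch of that rough conversion, using the 82-point offset and the tscmoo numbers quoted above:

```python
OFFSET = 82   # average AIIDE-minus-CIG difference over the six shared bots

def estimate_aiide_rating(cig_rating):
    """Rough estimate of an AIIDE 2015 rating from a CIG 2016 rating."""
    return cig_rating + OFFSET

tscmoo_cig = 1888            # tscmoo terran at CIG 2016
tscmoo_aiide_actual = 2026   # tscmoo zerg at AIIDE 2015

estimate = estimate_aiide_rating(tscmoo_cig)        # 1970
print(estimate, tscmoo_aiide_actual - estimate)     # falls 56 points short
```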

Next: Another mass of colorful crosstables.

Comments

Jay Scott:

Notice that the normalized numbers at the top of the table are positive, and those at the bottom are negative. There’s a stretching effect: The AIIDE ratings are more spread out than the CIG ratings. I think that’s because there were more participants and more games in AIIDE, which gave bayeselo more evidence to separate the ratings. A close look could probably quantify the effect and put comparisons on a firmer footing.

Bryan S Weber:

Hey, I suspect this does not work either unless the comparison groups are identical (identical bots in both tournaments). My best bet is to have your new UAlberta bot play the old tourney body for a comparative benchmark. PS, where could I look at a bot and explore what you've done?

Jay Scott:

You’re right, results will vary depending on the maps and on the set of opponents. One way to think of it is: This test is a way to get some idea of how big those effects are.

Jay Scott:

Results of past competitions and source code of a bunch of bots (this page just moved and is not updated for this year): https://www.cs.mun.ca/~dchurchill/starcraftaicomp/archive.shtml. Video showing how to get bot binaries from SSCAIT and play against them: https://youtu.be/v99ZIMsjTPM. Is that a good enough start?
