comparing AIIDE 2015 and CIG 2016 Elo ratings
The cool technique I had in mind to compare ratings across tournaments turned out not to work. Not cool after all. But 6 bots played unchanged in both AIIDE 2015 and CIG 2016, and we can compare their relative ratings. In this table the subtract column gives the AIIDE 2015 rating minus the CIG 2016 rating, and the normalize column gives that difference minus the average difference of 82.
| bot | AIIDE Elo | CIG Elo | subtract | normalize |
|---|---|---|---|---|
| UAlbertaBot | 1895 | 1778 | 117 | 35 |
| Overkill | 1890 | 1796 | 94 | 12 |
| Aiur | 1784 | 1687 | 97 | 15 |
| TerranUAB | 1372 | 1338 | 34 | -48 |
| OpprimoBot | 1231 | 1154 | 77 | -5 |
| Bonjwa | 1171 | 1099 | 72 | -10 |
| average | | | 82 | 0 |
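The subtract and normalize columns are plain arithmetic, and a minimal sketch reproduces them. The ratings are copied from the table; the `ratings` dictionary and the `compare` helper are names made up for this illustration, and rounding the average to a whole number is an assumption based on the 82 shown in the table.

```python
# Ratings copied from the table: bot -> (AIIDE 2015 Elo, CIG 2016 Elo).
ratings = {
    "UAlbertaBot": (1895, 1778),
    "Overkill": (1890, 1796),
    "Aiur": (1784, 1687),
    "TerranUAB": (1372, 1338),
    "OpprimoBot": (1231, 1154),
    "Bonjwa": (1171, 1099),
}


def compare(ratings):
    """Return (offset, rows), where rows maps bot -> (subtract, normalize)."""
    # subtract column: AIIDE rating minus CIG rating
    subtract = {bot: aiide - cig for bot, (aiide, cig) in ratings.items()}
    # average offset, rounded to a whole rating point (assumed rounding)
    offset = round(sum(subtract.values()) / len(subtract))
    # normalize column: each difference minus the average offset
    rows = {bot: (diff, diff - offset) for bot, diff in subtract.items()}
    return offset, rows


offset, rows = compare(ratings)
print("average offset:", offset)
for bot, (diff, norm) in rows.items():
    print(f"{bot:12s} {diff:4d} {norm:4d}")
```

Because the offset is rounded, the normalize column sums to -1 rather than exactly 0, which the table shows as an average of 0.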
As you might expect, two tournaments with different maps and different opponents give different ratings. UAlbertaBot and Overkill swapped ranks among the 6. But after correcting for the 82 point offset (since only rating differences matter), the ratings turn out to be quite close between the tournaments. The biggest difference is for TerranUAB, at 48 points. Look up 48 points in the Elo table: it says that TerranUAB as rated in one tournament has a 57% probability of beating itself as rated in the other, not a drastic error.
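If you don't have an Elo table handy, the usual logistic Elo formula gives about the same answer. This is a sketch that assumes the standard 400-point scale; the table used for the 57% figure may be built slightly differently, and `expected_score` is just a name chosen here.

```python
def expected_score(rating_diff):
    """Expected score for a player rated rating_diff points above the opponent.

    Assumes the standard logistic Elo model with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


# TerranUAB's normalized difference between the two tournaments is 48 points.
print(f"{expected_score(48):.0%}")
```

For a 48 point gap this comes out at roughly 57%, in line with the table lookup.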
You can try to convert a CIG 2016 rating into a rough estimate of an AIIDE 2015 rating by adding 82. For example, tscmoo terran earned a CIG rating of 1888, which corresponds to an AIIDE rating of 1888+82 = 1970, whereas the tscmoo zerg that played in AIIDE earned a rating there of 2026. The estimate is 56 points short, so it appears to be way off. But estimates made this way are likely to be closer for bots near the middle of the pack.
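The conversion is a single addition, sketched here with the tscmoo numbers from the paragraph above. `OFFSET` and `cig_to_aiide` are names invented for the example, and the offset only applies to this particular pair of tournaments.

```python
OFFSET = 82  # average AIIDE-minus-CIG difference over the 6 shared bots


def cig_to_aiide(cig_rating, offset=OFFSET):
    """Rough AIIDE 2015 estimate from a CIG 2016 rating."""
    return cig_rating + offset


estimate = cig_to_aiide(1888)  # tscmoo terran's CIG 2016 rating
actual_zerg = 2026             # tscmoo zerg's AIIDE 2015 rating
print(estimate, actual_zerg - estimate)
```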
Next: Another mass of colorful crosstables.