
CIG 2016 Bayesian Elo ratings

As yesterday, here are Bayesian Elo ratings calculated by bayeselo, this time for CIG 2016. I included both the qualifier and the final, of course. Using all the games gives the best possible ratings, and confidence is higher for the 8 finalists because they played more games. But the “score” column becomes hard to interpret, because part of the top 8 bots’ score comes from the final, where they faced tougher opposition. You can’t directly compare the scores of bots 1-8 with the scores of bots 9-16; only the ratings are comparable.
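To make concrete what pooling the qualifier and the final means, here is a minimal sketch of plain maximum-likelihood Elo fitting: the Bradley-Terry model, solved with Hunter’s MM algorithm. It is not bayeselo’s code; as far as I know bayeselo adds a prior and a draw model on top of a model like this, and I leave both out. The function name fit_elo and the toy game lists are hypothetical.

```python
import math
from collections import defaultdict


def fit_elo(games, iterations=1000, mean_rating=1500.0):
    """Fit Bradley-Terry strengths with Hunter's MM algorithm, then map to Elo.

    games: list of (winner, loser) pairs. Qualifier and final games are simply
    pooled into one list. Every bot needs at least one win and one loss,
    otherwise the maximum-likelihood estimate does not exist.
    """
    bots = sorted({b for game in games for b in game})
    wins = defaultdict(int)   # total wins per bot
    pairs = defaultdict(int)  # games played per unordered pair of bots
    for winner, loser in games:
        wins[winner] += 1
        pairs[frozenset((winner, loser))] += 1

    gamma = {b: 1.0 for b in bots}  # Bradley-Terry strengths
    for _ in range(iterations):
        new = {}
        for i in bots:
            # Sum of n_ij / (gamma_i + gamma_j) over every opponent j of bot i.
            denom = sum(n / (gamma[i] + gamma[j])
                        for pair, n in pairs.items() if i in pair
                        for j in pair if j != i)
            new[i] = wins[i] / denom
        # Divide by the geometric mean so the strengths stay bounded.
        log_mean = sum(math.log(g) for g in new.values()) / len(new)
        gamma = {b: g / math.exp(log_mean) for b, g in new.items()}

    # With gamma = 10^(r/400), P(i beats j) = g_i / (g_i + g_j) is the Elo formula.
    ratings = {b: 400.0 * math.log10(g) for b, g in gamma.items()}
    shift = mean_rating - sum(ratings.values()) / len(ratings)
    return {b: r + shift for b, r in ratings.items()}


# Hypothetical toy data: qualifier and final games pooled into one list.
qualifier = [("A", "B"), ("A", "B"), ("B", "A"), ("B", "C"), ("C", "A")]
final = [("A", "B"), ("B", "A")]
print(fit_elo(qualifier + final))
```

Because every game goes into one likelihood, a finalist’s rating is pinned down by both stages at once, which is why its interval is narrower.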

Also, with this analysis it doesn’t make sense to compare rating values between tournaments. Each tournament is scaled independently to an average rating of 1500. Ratings are relative: only the differences between bots in the same tournament can be compared.
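Why only differences mean anything: in the Elo model the predicted score depends only on the rating difference, so adding the same constant to every rating changes no prediction, and the 1500 anchor is arbitrary. A minimal sketch, assuming the standard 400-point logistic scale (bayeselo’s model also has draw and advantage terms that I leave out); the function names and ratings are hypothetical.

```python
def expected_score(r_a, r_b):
    """Elo expected score of A against B on the usual 400-point logistic scale."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def recenter(ratings, mean_rating=1500.0):
    """Add the same constant to every rating so that the average is mean_rating."""
    shift = mean_rating - sum(ratings.values()) / len(ratings)
    return {bot: r + shift for bot, r in ratings.items()}


# Hypothetical ratings from some fit, before rescaling.
raw = {"X": 1900.0, "Y": 1700.0, "Z": 1300.0}
scaled = recenter(raw)
print(scaled)
print(expected_score(raw["X"], raw["Y"]), expected_score(scaled["X"], scaled["Y"]))
```

The two printed expected scores agree, so a 1500 in one tournament and a 1500 in another need not mean the same strength.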

    bot           score    Elo  95% conf.   better?
 1  tscmoo          73%   1888  1872-1904   98.5%
 2  Iron            71%   1864  1848-1880   99.9%
 3  LetaBot         68%   1827  1811-1843   99.7%
 4  Overkill        65%   1796  1781-1812   70.9%
 5  ZZZKBot         64%   1790  1775-1805   86.8%
 6  UAlbertaBot     63%   1778  1763-1793   99.8%
 7  MegaBot         60%   1746  1731-1761   99.9%
 8  Aiur            54%   1687  1671-1702   72.7%
 9  Tyr             62%   1679  1659-1699   100%
10  Ziabot          46%   1500  1479-1521   100%
11  TerranUAB       34%   1338  1316-1360   100%
12  SRbotOne        22%   1158  1133-1183   59.1%
13  OpprimoBot      22%   1154  1128-1179   97.1%
14  XelnagaII       21%   1119  1092-1145   86.3%
15  Bonjwa          19%   1099  1072-1125   100%
16  Salsa            1%    579  510-636     -

The “better?” column is the likelihood that the bot is superior to the bot ranked just below it.

The official results have LetaBot a hair ahead of ZZZKBot, then Overkill. bayeselo has ZZZKBot and Overkill reversed, saying that LetaBot is clearly superior to Overkill (99.7%), which is fairly likely (70.9%) to be superior to ZZZKBot. The difference arises because the official results count only the final. Martin Rooijackers was justified after all in saying that ZZZKBot had fallen from the top 3. All other results agree with the official ranking. The trailing finalist Aiur is only 72.7% likely to be superior to Tyr, the top non-finalist, so there is some doubt that the best bots made the final (in general that doubt can’t be avoided, though).
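For intuition about where a number like 72.7% comes from, here is a rough normal approximation that I am assuming, not bayeselo’s actual calculation: treat each rating as an independent normal estimate, recover its standard deviation from the 95% interval (half-width divided by 1.96), and ask how likely the difference is to be positive. It ignores any correlation between the two estimates. The function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sigma_from_interval(low, high):
    """Approximate standard deviation from a 95% interval: half-width / 1.96."""
    return (high - low) / 2.0 / 1.96


def prob_superior(r_a, ci_a, r_b, ci_b):
    """Rough P(A is stronger than B), treating the two ratings as
    independent normal estimates. Correlation between them is ignored."""
    s_a = sigma_from_interval(*ci_a)
    s_b = sigma_from_interval(*ci_b)
    return normal_cdf((r_a - r_b) / math.sqrt(s_a ** 2 + s_b ** 2))


# Ratings and intervals copied from the CIG 2016 table.
aiur_vs_tyr = prob_superior(1687, (1671, 1702), 1679, (1659, 1699))
overkill_vs_zzzk = prob_superior(1796, (1781, 1812), 1790, (1775, 1805))
print(f"Aiur > Tyr: {aiur_vs_tyr:.1%}")
print(f"Overkill > ZZZKBot: {overkill_vs_zzzk:.1%}")
```

Both come out within about half a percentage point of bayeselo’s 72.7% and 70.9%. That is reassuring, but the approximation is mine, and the close agreement may partly be luck.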

The tail-ender Salsa has a wide and asymmetrical confidence interval, reaching 69 points below its rating and 57 above. It takes more evidence to pin down an extreme rating than a middle-of-the-road rating.
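To see why an extreme rating gets a lopsided interval, here is a toy case with hypothetical numbers: one bot plays 200 games against opponents all fixed at 1500, and I compute a 95% likelihood-ratio interval for its rating on a 1-point grid. I don’t claim this is how bayeselo computes its intervals; it only shows the shape of the likelihood. The function names are hypothetical.

```python
import math


def log_likelihood(rating, wins, games, opp_rating):
    """Log-likelihood of `wins` out of `games` against opponents all rated opp_rating."""
    p = 1.0 / (1.0 + 10.0 ** ((opp_rating - rating) / 400.0))
    return wins * math.log(p) + (games - wins) * math.log(1.0 - p)


def likelihood_interval(wins, games, opp_rating=1500.0, lo=0, hi=3000):
    """95% likelihood-ratio interval for one rating, on a 1-point grid.

    Keeps every grid rating whose log-likelihood is within 1.92 of the maximum
    (half of 3.84, the chi-squared 95% point for 1 degree of freedom).
    """
    grid = range(lo, hi + 1)
    ll = [log_likelihood(r, wins, games, opp_rating) for r in grid]
    best = max(ll)
    inside = [r for r, v in zip(grid, ll) if v >= best - 1.92]
    mle = grid[ll.index(best)]
    return mle, min(inside), max(inside)


for wins in (2, 100):
    mle, low, high = likelihood_interval(wins, 200)
    print(f"{wins} wins: estimate {mle}, {mle - low} below, {high - mle} above")
```

With 2 wins out of 200 the interval stretches much further below the estimate than above it, while with 100 wins it is symmetric. A bot that almost never wins is clearly weak, but its games say little about exactly how weak.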

Tomorrow: I’ll try an analysis in which the ratings of unchanged bots are carried over from AIIDE 2015 to CIG 2016, so that we can compare between tournaments. I’m not sure how well it will work, or even if I can get it to work at all, but it will be interesting to try.


Comments

tscmoo on :

Yey, I still win
