CIG 2016 Bayesian Elo ratings
Same as yesterday, Bayesian Elo ratings calculated by bayeselo, this time for CIG 2016. I included both the qualifier and the final, of course. Including all games gives the best possible ratings, so confidence is higher for the 8 finalists. But the “score” column becomes difficult to interpret, because part of the score of the top 8 bots comes from the final, where they faced tougher opposition. You can’t directly compare the scores of bots 1-8 with the scores of bots 9-16; only the ratings are comparable.
Also, with this analysis it doesn’t make sense to compare the rating values between tournaments. Each tournament is independently scaled to have an average rating of 1500. Only the relative ratings of bots in the same tournament can be compared. Ratings are relative.
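To make “ratings are relative” concrete, here is a minimal sketch of the standard logistic Elo curve, which turns a rating *difference* into an expected score. Note that bayeselo’s actual model is more elaborate (it also fits draw and advantage parameters), so this is only an approximation of how to read the table:

```python
def elo_win_prob(d):
    """Expected score for a player rated d Elo points above the
    opponent, under the standard logistic Elo curve."""
    return 1.0 / (1.0 + 10.0 ** (-d / 400.0))

# Only the difference matters, not the absolute numbers.
# tscmoo (1888) vs Ziabot (1500) is a 388-point gap:
print(round(elo_win_prob(388), 3))  # about 0.903
```

Shifting every rating in a tournament up or down by the same constant changes nothing here, which is why the per-tournament scaling to an average of 1500 is harmless within a tournament but makes cross-tournament comparisons meaningless.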
| # | bot | score | Elo | 95% conf. | better? |
|---|---|---|---|---|---|
| 1 | tscmoo | 73% | 1888 | 1872-1904 | 98.5% |
| 2 | Iron | 71% | 1864 | 1848-1880 | 99.9% |
| 3 | LetaBot | 68% | 1827 | 1811-1843 | 99.7% |
| 4 | Overkill | 65% | 1796 | 1781-1812 | 70.9% |
| 5 | ZZZKBot | 64% | 1790 | 1775-1805 | 86.8% |
| 6 | UAlbertaBot | 63% | 1778 | 1763-1793 | 99.8% |
| 7 | MegaBot | 60% | 1746 | 1731-1761 | 99.9% |
| 8 | Aiur | 54% | 1687 | 1671-1702 | 72.7% |
| 9 | Tyr | 62% | 1679 | 1659-1699 | 100% |
| 10 | Ziabot | 46% | 1500 | 1479-1521 | 100% |
| 11 | TerranUAB | 34% | 1338 | 1316-1360 | 100% |
| 12 | SRbotOne | 22% | 1158 | 1133-1183 | 59.1% |
| 13 | OpprimoBot | 22% | 1154 | 1128-1179 | 97.1% |
| 14 | XelnagaII | 21% | 1119 | 1092-1145 | 86.3% |
| 15 | Bonjwa | 19% | 1099 | 1072-1125 | 100% |
| 16 | Salsa | 1% | 579 | 510-636 | - |
The official results have LetaBot a hair ahead of ZZZKBot, with Overkill following. bayeselo has ZZZKBot and Overkill reversed, saying that LetaBot is clearly superior to Overkill, which is fairly likely to be superior to ZZZKBot. The difference comes about because, of course, the official results include only the final. Martin Rooijackers was justified after all in saying that ZZZKBot had fallen from the top 3. All other results agree with the official ranking. The trailing finalist Aiur is 72.7% likely to be superior to Tyr, so there is some doubt that the best bots made the final (in general that doubt can’t be avoided, though).
The tail-ender Salsa has a wide and asymmetrical confidence interval. It takes more evidence to pin down an extreme rating than a middle-of-the-road rating.
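A rough way to see why is to invert the logistic Elo curve: near a 50% score, a small change in score moves the implied rating only a little, while near a 1% score the same change moves it enormously, so each game carries much less information about where the rating sits. This sketch is illustrative only and is not bayeselo’s actual estimator:

```python
import math

def elo_from_score(p):
    """Rating difference implied by an expected score p,
    inverting the standard logistic Elo curve."""
    return -400.0 * math.log10(1.0 / p - 1.0)

# A one-percentage-point change in score, near the middle vs the tail:
print(round(elo_from_score(0.51) - elo_from_score(0.50)))  # about 7 Elo
print(round(elo_from_score(0.02) - elo_from_score(0.01)))  # about 122 Elo
```

The same uncertainty in score therefore translates into a far wider, and lopsided, rating interval at the extremes, which matches Salsa’s 510-636 band.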
Tomorrow: I’ll try an analysis in which the ratings of unchanged bots are carried over from AIIDE 2015 to CIG 2016, so that we can compare between tournaments. I’m not sure how well it will work, or even if I can get it to work at all, but it will be interesting to try.