SSCAIT initial and current Elo ratings

I’m still working on Elo curves over time, but today I have Elo ratings for each bot in the SSCAIT data at the beginning and end of its career. Here is yesterday’s table plus the new info, now sorted by decreasing current rating—the bot’s real strength yesterday as best we can measure. The topmost ratings are, to my surprise, exactly in the order I expected!

To make the ratings easier to interpret, I added two columns labeled “expect”. Each gives the bot’s expected winning rate against an average opponent. The rating system is designed so that the average Elo rating is constant at 1500, and it’s easy to compute the expected winning rate against an opponent rated 1500. The constant average rating, by the way, means that a bot which stays the same can see its rating decline over time if its opponents improve.
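For anyone who wants to reproduce the “expect” columns, here is a minimal sketch. It assumes the standard logistic Elo formula with a 400-point scale; the post doesn't state the formula, but the table's numbers are consistent with it. The function name `expected_score` is made up for this illustration.

```python
def expected_score(rating, opponent=1500.0, scale=400.0):
    """Expected winning rate of `rating` against `opponent`.

    Assumes the usual logistic Elo curve; `scale=400` is the conventional
    value, not something the post states explicitly.
    """
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / scale))


# Spot-check against two rows of the table (krasi0 initial, La Nuee initial).
for name, elo in [("krasi0 initial", 1593), ("La Nuee initial", 1499)]:
    print(f"{name}: {expected_score(elo):.2%}")
```

A bot rated exactly 1500 comes out at 50%, which is what “average opponent” means here.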

Ratings are not accurate for bots with a very small number of games. I plan to exclude those bots from the curves over time.

bot	win %	initial Elo	initial expect	current Elo	current expect	games	earliest	latest
krasi0	68.77%	1593	63.07%	2163	97.85%	2142	2015 Nov 30	2016 Sep 27
Iron bot	77.74%	1580	61.31%	2081	96.59%	1999	2015 Nov 27	2016 Sep 26
Marian Devecka	58.66%	1790	84.15%	2065	96.28%	6289	2013 Dec 25	2016 Sep 27
Martin Rooijackers	68.50%	1840	87.62%	2011	94.99%	7290	2014 Jul 28	2016 Sep 27
tscmooz	79.80%	1823	86.52%	1991	94.41%	5006	2015 Feb 27	2016 Sep 27
tscmoo	72.06%	1838	87.50%	1978	94.00%	5719	2015 Jan 22	2016 Sep 27
LetaBot CIG 2016	75.68%	1748	80.65%	1932	92.32%	444	2016 Aug 01	2016 Sep 27
WuliBot	72.76%	1773	82.80%	1871	89.43%	984	2016 Apr 19	2016 Sep 26
Simon Prins	55.48%	1513	51.87%	1867	89.21%	5431	2015 Jan 25	2016 Sep 27
ICELab	81.12%	2189	98.14%	1865	89.10%	8344	2013 Dec 25	2016 Sep 27
FlashTest	69.44%	1744	80.29%	1863	88.99%	216	2016 Mar 22	2016 Jul 27
Sijia Xu	71.65%	1850	88.23%	1849	88.17%	2328	2015 Oct 10	2016 Sep 27
LetaBot SSCAI 2015 Final	65.87%	1710	77.01%	1813	85.84%	416	2016 Aug 04	2016 Sep 27
Dave Churchill	75.48%	1985	94.22%	1804	85.19%	8275	2013 Dec 25	2016 Sep 27
Chris Coxe	73.10%	1754	81.19%	1800	84.90%	2201	2015 Sep 03	2016 Sep 27
Tomas Vajda	79.37%	2169	97.92%	1790	84.15%	8372	2013 Dec 25	2016 Sep 27
Flash	65.69%	1458	43.98%	1777	83.13%	991	2016 Apr 18	2016 Sep 27
LetaBot IM noMCTS	60.93%	1645	69.73%	1766	82.22%	1226	2016 May 18	2016 Aug 01
Zia bot	52.24%	1568	59.66%	1757	81.45%	536	2016 Jul 07	2016 Sep 27
A Jarocki	62.77%	1711	77.11%	1741	80.02%	932	2015 Oct 04	2016 Jan 26
PeregrineBot	57.29%	1692	75.12%	1728	78.79%	1276	2016 Feb 09	2016 Sep 10
tscmoop	78.16%	1895	90.67%	1721	78.11%	1992	2015 Nov 11	2016 Sep 26
Andrew Smith	65.00%	1705	76.50%	1718	77.81%	8391	2013 Dec 25	2016 Sep 27
Florian Richoux	62.11%	1770	82.55%	1716	77.62%	8203	2013 Dec 25	2016 Sep 27
Carsten Nielsen	66.08%	1708	76.81%	1695	75.45%	4711	2015 Mar 17	2016 Sep 27
Soeren Klett	63.62%	2068	96.34%	1687	74.58%	8277	2013 Dec 25	2016 Sep 27
Vaclav Horazny	37.35%	1066	7.60%	1686	74.47%	6455	2013 Dec 25	2015 Nov 18
La Nuee	51.61%	1499	49.86%	1662	71.76%	558	2015 Dec 13	2016 Mar 18
Jakub Trancik	45.08%	1755	81.27%	1657	71.17%	8416	2013 Dec 25	2016 Sep 27
Marek Suppa	51.85%	1746	80.47%	1655	70.94%	4413	2015 Jan 05	2016 Mar 18
Krasimir Krystev	70.52%	2033	95.56%	1653	70.70%	6510	2013 Dec 25	2016 Mar 10
ASPbot2011	49.78%	1671	72.80%	1652	70.58%	227	2015 Jan 29	2016 Feb 25
Marcin Bartnicki	60.42%	1855	88.53%	1633	68.26%	1435	2014 Nov 28	2016 Mar 18
Tomas Cere	61.11%	1888	90.32%	1631	68.01%	8373	2013 Dec 25	2016 Sep 27
MegaBot	49.40%	1576	60.77%	1630	67.88%	419	2016 Aug 01	2016 Sep 27
Aurelien Lermant	58.26%	1688	74.69%	1622	66.87%	3687	2015 Jun 22	2016 Sep 27
Matej Kravjar	49.57%	1723	78.31%	1619	66.49%	3234	2013 Dec 25	2015 Feb 18
Daniel Blackburn	43.79%	1651	70.46%	1605	64.67%	6883	2013 Dec 25	2016 Jan 26
Gabriel Synnaeve	45.96%	1737	79.65%	1584	61.86%	1658	2013 Dec 25	2015 Nov 24
David Milec	49.09%	1552	57.43%	1566	59.39%	55	2015 Jan 13	2015 Jan 20
Odin2014	55.65%	1659	71.41%	1565	59.25%	5648	2014 Dec 21	2016 Sep 11
Gaoyuan Chen	48.05%	1582	61.59%	1559	58.41%	5118	2015 Feb 10	2016 Sep 27
Henri Kumpulainen	38.81%	1447	42.43%	1553	57.57%	894	2016 Jan 13	2016 May 31
Martin Dekar	33.14%	1429	39.92%	1533	54.73%	4910	2013 Dec 25	2016 Jan 25
Serega	48.20%	1771	82.64%	1505	50.72%	3803	2015 Jan 31	2016 Jan 26
Chris Ayers	35.53%	1610	65.32%	1481	47.27%	1520	2015 Aug 10	2016 Jan 26
Nathan a David	39.34%	1446	42.29%	1481	47.27%	1004	2016 Feb 23	2016 Aug 08
DAIDOES	34.02%	1370	32.12%	1471	45.84%	485	2016 Jun 13	2016 Sep 08
FlashZerg	0.00%	1474	46.27%	1459	44.13%	7	2016 Apr 24	2016 May 12
Igor Lacik	39.32%	1608	65.06%	1454	43.42%	8073	2013 Dec 25	2016 Sep 08
Matej Istenik	44.74%	1709	76.91%	1449	42.71%	8297	2013 Dec 25	2016 Sep 27
EradicatumXVR	40.88%	1537	55.30%	1443	41.87%	4687	2013 Dec 25	2016 Jan 23
Ibrahim Awwal	30.57%	1510	51.44%	1437	41.03%	530	2013 Dec 25	2014 Mar 24
Tomasz Michalski	27.02%	1314	25.53%	1432	40.34%	433	2015 Dec 22	2016 Mar 18
Oleg Ostroumov	48.75%	1714	77.41%	1431	40.20%	3641	2013 Dec 25	2016 Jan 26
NUS Bot	35.72%	1482	47.41%	1426	39.51%	3337	2015 May 19	2016 Sep 06
Martin Pinter	28.98%	1409	37.20%	1425	39.37%	3740	2013 Dec 25	2015 Dec 11
Roman Danielis	45.63%	1688	74.69%	1417	38.28%	5155	2013 Dec 25	2016 Sep 26
ZerGreenBot	22.22%	1404	36.53%	1416	38.14%	36	2016 Sep 22	2016 Sep 27
Rafael Bocquet	0.00%	1450	42.85%	1415	38.01%	10	2015 Jun 23	2015 Jun 26
Flashrelease	0.00%	1449	42.71%	1413	37.73%	8	2016 Apr 24	2016 Apr 24
Marek Kadek	37.29%	1557	58.13%	1413	37.73%	7641	2013 Dec 25	2016 May 22
Ian Nicholas DaCosta	37.12%	1394	35.20%	1404	36.53%	2928	2015 Apr 27	2016 Sep 08
AwesomeBot	29.81%	1326	26.86%	1403	36.39%	473	2016 Jun 16	2016 Sep 08
Radim Bobek	23.37%	1315	25.64%	1390	34.68%	1151	2015 Oct 01	2016 Mar 06
Adrian Sternmuller	26.89%	1436	40.89%	1375	32.75%	4529	2013 Dec 25	2016 Jul 22
Martin Strapko	19.76%	1388	34.42%	1366	31.62%	3386	2013 Dec 25	2016 Jan 26
Maja Nemsilajova	23.81%	1365	31.49%	1363	31.25%	4246	2013 Dec 25	2015 Nov 29
Johan Kayser	24.46%	1294	23.40%	1361	31.00%	413	2016 Jul 29	2016 Sep 27
UPStarcraftAI	24.75%	1346	29.18%	1360	30.88%	610	2015 Dec 24	2016 Apr 13
Martin Vlcak	28.92%	1370	32.12%	1353	30.02%	1224	2016 Feb 16	2016 Sep 07
Johannes Holzfuss	35.04%	1531	54.45%	1351	29.78%	685	2016 Mar 05	2016 Jun 15
Vojtech Jirsa	14.14%	1186	14.09%	1350	29.66%	2786	2015 Jan 12	2015 Sep 05
JompaBot	21.99%	1316	25.75%	1349	29.54%	1055	2016 Feb 04	2016 Aug 13
Rob Bogie	31.34%	1335	27.89%	1346	29.18%	651	2016 May 14	2016 Sep 06
Christoffer Artmann	20.51%	1289	22.89%	1344	28.95%	395	2016 Aug 07	2016 Sep 27
Marek Gajdos	22.69%	1251	19.26%	1331	27.43%	1384	2016 Jan 30	2016 Sep 11
Travis Shelton	23.59%	1390	34.68%	1314	25.53%	1221	2016 Feb 28	2016 Sep 06
Peter Dobsa	13.25%	1227	17.20%	1307	24.77%	3027	2015 Jan 11	2015 Oct 02
VeRLab	17.06%	1241	18.38%	1304	24.45%	897	2016 Feb 28	2016 Aug 01
Andrej Sekac	11.76%	1359	30.75%	1296	23.61%	68	2013 Dec 25	2014 Jan 04
Bjorn P Mattsson	22.22%	1351	29.78%	1295	23.50%	4442	2015 Apr 05	2016 Sep 27
Lukas Sedlacek	22.86%	1344	28.95%	1293	23.30%	70	2015 Jan 12	2015 Jan 20
Sergei Lebedinskij	13.30%	1178	13.55%	1293	23.30%	1083	2015 May 28	2015 Sep 03
Vladimir Jurenka	38.45%	1635	68.51%	1278	21.79%	6167	2013 Dec 25	2016 Sep 27
neverdieTRX	20.66%	1265	20.54%	1272	21.21%	334	2016 Jul 19	2016 Sep 10
OpprimoBot	21.85%	1321	26.30%	1256	19.71%	2009	2015 Nov 18	2016 Sep 27
Marek Kruzliak	14.45%	1151	11.83%	1255	19.62%	934	2013 Dec 25	2015 Jan 20
Sungguk Cha	18.65%	1207	15.62%	1250	19.17%	697	2016 Jun 05	2016 Sep 27
Jacob Knudsen	20.53%	1083	8.31%	1247	18.90%	1257	2016 Feb 23	2016 Sep 10
Ludmila Nemsilajova	16.04%	1133	10.79%	1228	17.28%	505	2013 Dec 25	2015 Jan 21
Karin Valisova	17.68%	1238	18.12%	1226	17.12%	1171	2013 Dec 25	2016 Jan 26
HoangPhuc	15.67%	1132	10.73%	1209	15.77%	300	2016 Jul 18	2016 Sep 07
Sebastian Mahr	15.06%	1205	15.47%	1182	13.82%	1202	2016 Jan 13	2016 Aug 08
Jan Pajan	14.48%	1210	15.85%	1179	13.61%	1119	2013 Dec 25	2016 Jan 05
Pablo Garcia Sanchez	12.20%	1123	10.25%	1174	13.28%	590	2015 Dec 24	2016 Apr 13
Ivana Kellyerova	11.47%	1129	10.57%	1131	10.68%	1630	2013 Dec 25	2015 Apr 01
Lucia Pivackova	13.29%	1111	9.63%	1090	8.63%	835	2013 Dec 25	2015 Jan 20
Tae Jun Oh	4.55%	1069	7.72%	1036	6.47%	154	2016 Mar 22	2016 Apr 11
Denis Ivancik	10.76%	1102	9.19%	1022	6.00%	502	2013 Dec 25	2015 Jan 20
ButcherBoy	4.74%	921	3.45%	970	4.52%	422	2016 Jun 21	2016 Sep 06
Jon W	5.06%	920	3.43%	964	4.37%	790	2015 Apr 30	2015 Jul 09
Matyas Novy	6.32%	1130	10.62%	885	2.82%	1693	2015 Feb 04	2015 Jul 09

How did I get the initial ratings? I had a cute idea. One of the issues with computing Elo ratings over time is: How do you initialize the ratings? Most systems either start everybody with the same rating, which makes an ugly graph, or use a different and less accurate method to estimate the rating in early games. But in this case I have the whole data set in hand. I set the final rating of every bot to the same rating and computed ratings backwards in time to find an initial rating. Then I threw away everything except the initial rating, and calculated the real ratings forward in time to find the ratings over time and the final ratings. That way every data point is equally good, from beginning to end. I doubt I’m the first to think of it, but it’s a cute idea and I’m pleased.
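The backward-then-forward idea can be sketched roughly as follows. This is not the code behind the table. It assumes plain Elo updates with a fixed step `K` (the value 16 is invented), games given as chronologically ordered `(winner, loser)` pairs, and that "backwards in time" means replaying the same games in reverse order with ordinary updates. All names here are hypothetical.

```python
from collections import defaultdict

K = 16.0        # update step size; an assumed value, the post does not give one
START = 1500.0  # common rating every bot is pinned to before the backward pass


def expected_score(ra, rb):
    """Expected winning rate of a player rated ra against one rated rb."""
    return 1.0 / (1.0 + 10.0 ** ((rb - ra) / 400.0))


def run_elo(games, ratings):
    """Apply plain Elo updates in the order given; each game is (winner, loser)."""
    for winner, loser in games:
        delta = K * (1.0 - expected_score(ratings[winner], ratings[loser]))
        ratings[winner] += delta  # zero-sum update keeps the average constant
        ratings[loser] -= delta
    return ratings


def backward_then_forward(games):
    """Return (initial, current) ratings for a chronological list of games."""
    # Pass 1: start everyone at START and replay history in reverse.
    # Where each bot ends up is taken as its rating at the start of its career.
    initial = dict(run_elo(reversed(games), defaultdict(lambda: START)))
    # Pass 2: keep only those initial ratings and replay forward in time.
    current = run_elo(games, dict(initial))
    return initial, current


# Toy history: bot A loses its first 20 games to B, then wins the next 20.
games = [("B", "A")] * 20 + [("A", "B")] * 20
initial, current = backward_then_forward(games)
print(round(initial["A"]), round(current["A"]))
```

In the toy history A starts out below 1500 and finishes above where it started, which is the shape you would want a curve for an improving bot to have.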

Next: I’ll find some sensible way to plot the curves. Stand by!

Comments

imp on Wednesday, September 28. 2016:

"The topmost ratings are, to my surprise, exactly in the order I expected!"
- that statement just made it into my quote library ;)

krasi0 on Thursday, September 29. 2016:

Calculating the initial ratings "backwards" is a neat idea! :) Still, do you think that the rankings would change if you started with a constant of, say, 1500 for each bot like in the typical scenario?

Jay Scott on Thursday, September 29. 2016:

The final ratings should only change a little if the initial ratings are 1500. The point of finding a good initial rating is to have a useful graph from beginning to end. That’s the theory, anyway. But it was only a one-line change to test it in my code, and... I hit a mystery bug, the kind that Should Never Happen. Not sure whether it affects the original results. Stand by while I solve it.

Jay Scott on Thursday, September 29. 2016:

Whew, it wasn’t a real bug, I was only confused. Setting the initial rating to 1500 causes the final ratings to be slightly compressed, so the top rating and bottom rating are around 15 points closer to the average. It’s a bigger effect than I expected, but still small. It could cause a prediction error of 2%.

Jay Scott on Thursday, September 29. 2016:

Oh, and the rankings do change some, but not in the top 10 places.

krasi0 on Thursday, September 29. 2016:

Elo ratings should also be able to provide insights into the expected chance of winning against any opponent given both players' ratings. When you have time, you could compile a (huge) cross-table with those :)

Additionally, we could investigate alternative rating schemes like TrueSkill and others. Here is a link with a comparison of some of them: https://rankade.com/ree#ranking-system-comparison

Jay Scott on Thursday, September 29. 2016:

All the mathematically sound rating systems that I know of are variations of Elo’s original. The differences amount to trying to squeeze a little more information out of each game by modeling more closely how playing strength varies in the real world, both between players and for a given player over time.

We can definitely do better than straight Elo, because we have some prior knowledge. We know that IceBot doesn’t learn and it hasn’t been updated since its last upload in 2013, so its playing strength is constant over that time. We even have theoretical limits on how fast some learning bots can learn, because we know how they work. My intuition is that if we use our prior knowledge, we will surely get more accurate ratings... but only modestly more accurate.

I think the major limit on the accuracy of Elo family systems is that they assume that playing strength is a single value, so they inherently can’t cope with intransitivity where A>B, B>C and C>A, which happens A LOT. Intransitivity distorts ratings because then ratings depend on the number of games. In the ABC example, if A vs B happens to be played more often (say C was uploaded later), then the rating system finds more evidence for A>B than for the other two relations, increasing A’s rating and decreasing B’s and C’s. For the same reason, adding a new player to the pool may change the relative ratings of existing players.

Rating systems are great and Elo is accurate on average, but it’s not a cure-all, for fundamental reasons. And my feeling is that we can do a little better than straight Elo, but not much better.

krasi0 on Thursday, September 29. 2016:

Yes, I completely agree with your analysis. Still, if you can come up with any other useful insights that we could capture from the vast game history, please share :)

Jay Scott on Thursday, September 29. 2016:

The game scores are virgin territory. Later I’ll explore there.
