solidity in AIIDE 2020 - part 4

I computed what I decided to call the upset deviation, which you can take as the average deviation of actual from expected win rate due to upsets. I defined an upset pairing as one where you outscore a stronger opponent (do better than expected) or underscore a weaker opponent (do worse than expected). Theoretically, smaller numbers are more “solid” and bigger numbers are more “daring”. The table also carries over the rms deviation from yesterday.

To summarize the procedure I followed:

1. Compute Elo ratings for each participant in a tournament.
2. Using the Elo ratings, compute expected win rates for each pairing.
3. For each pairing, the difference between the actual tournament result and the expected win rate is the deviation.
4. Square each deviation and calculate the sum of the squares.
5. Extract the pairings which are upsets and calculate the sum of those squares.
6. The upset ratio is the sum of the upset squares as a ratio of the entire sum of squares.
7. For each participant, given the deviations, compute the rms deviation, which is a kind of average of the deviations. Some people may not know what RMS is: It stands for root mean square, which means you square each number, find the arithmetic mean of the collection, then restore the original scale by taking the square root of the result.
8. Multiply the upset ratio by the rms deviation to get the upset deviation.
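For readers who prefer code, here is a minimal Python sketch of steps 2 through 8. It is not the script I actually ran. It assumes the Elo ratings from step 1 are already available, that expected win rate follows the standard logistic Elo formula with a 400-point scale, and that the sums of squares are taken per participant (which matches the per-bot table). The names `expected_win_rate` and `upset_stats` and the input layout are made up for illustration.

```python
import math

def expected_win_rate(rating_a, rating_b):
    """Standard Elo expectation for A against B (assumed 400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

def upset_stats(ratings, actual):
    """Return {name: (rms deviation, upset ratio, upset deviation)}.

    ratings: {name: Elo rating}                      (hypothetical layout)
    actual:  {(a, b): actual win rate of a versus b} (hypothetical layout)
    """
    stats = {}
    for a in ratings:
        total_sq = 0.0   # step 4: sum of all squared deviations
        upset_sq = 0.0   # step 5: sum of squared deviations from upsets
        count = 0
        for b in ratings:
            if a == b or (a, b) not in actual:
                continue
            # Step 3: deviation of actual from expected win rate.
            dev = actual[(a, b)] - expected_win_rate(ratings[a], ratings[b])
            total_sq += dev * dev
            count += 1
            # An upset: outscoring a stronger opponent,
            # or underscoring a weaker one.
            beat_stronger = ratings[b] > ratings[a] and dev > 0
            lost_to_weaker = ratings[b] < ratings[a] and dev < 0
            if beat_stronger or lost_to_weaker:
                upset_sq += dev * dev
        rms = math.sqrt(total_sq / count) if count else 0.0   # step 7
        ratio = upset_sq / total_sq if total_sq > 0 else 0.0  # step 6
        stats[a] = (rms, ratio, rms * ratio)                  # step 8
    return stats
```

Pairings between equally rated participants never count as upsets in this sketch, since neither side is the stronger one.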

bot	rms deviation	upset ratio	upset deviation
stardust	7.6%	74.9%	5.7%
purplewave	11.3%	39.6%	4.5%
bananabrain	9.9%	65.2%	6.5%
dragon	15.5%	72.9%	11.3%
mcrave	13.6%	10.1%	1.4%
microwave	15.5%	23.5%	3.6%
steamhammer	16.9%	71.2%	12.0%
daqin	21.2%	63.8%	13.5%
zzzkbot	23.7%	65.7%	15.6%
ualbertabot	9.0%	27.8%	2.5%
willyt	15.7%	33.8%	5.3%
ecgberht	14.7%	52.7%	7.8%
eggbot	6.8%	69.6%	4.8%

The upset ratio has some interest in itself, so I included it. It doesn’t say how big the upsets were; it says what proportion of the (squared) deviations were due to upsets. You have to interpret the percentage as a ratio. The upset deviation then also recognizes how big the upsets were. In this case, you interpret the percentage as the average deviation from expected win rate due to upsets. The whole procedure is ad hoc and of questionable rigor, but all the steps are logical and the results make sense to me. Can anybody suggest an improved method?

By this metric, Dragon, Steamhammer, DaQin, and especially ZZZKBot are the “daring” players in this group. McRave and UAlbertaBot are the most “solid”.
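As a sanity check on the table, the snippet below re-derives the upset deviation from the two other columns and sorts by it. The numbers are copied from the table. The published columns are rounded to one decimal, so the recomputed product is only expected to agree to within about 0.1 percentage point. The variable names are just for illustration.

```python
# (bot, rms deviation %, upset ratio %, upset deviation %) from the table.
table = [
    ("stardust", 7.6, 74.9, 5.7),
    ("purplewave", 11.3, 39.6, 4.5),
    ("bananabrain", 9.9, 65.2, 6.5),
    ("dragon", 15.5, 72.9, 11.3),
    ("mcrave", 13.6, 10.1, 1.4),
    ("microwave", 15.5, 23.5, 3.6),
    ("steamhammer", 16.9, 71.2, 12.0),
    ("daqin", 21.2, 63.8, 13.5),
    ("zzzkbot", 23.7, 65.7, 15.6),
    ("ualbertabot", 9.0, 27.8, 2.5),
    ("willyt", 15.7, 33.8, 5.3),
    ("ecgberht", 14.7, 52.7, 7.8),
    ("eggbot", 6.8, 69.6, 4.8),
]

# Upset deviation = rms deviation * upset ratio (ratio taken as a fraction).
recomputed = {bot: rms * ratio / 100.0 for bot, rms, ratio, _ in table}
max_gap = max(abs(recomputed[bot] - listed) for bot, _, _, listed in table)

# Rank from most "daring" to most "solid" by the listed upset deviation.
ranked = [bot for bot, _, _, listed in
          sorted(table, key=lambda row: row[3], reverse=True)]
most_daring = ranked[:4]
most_solid = ranked[-2:]

print(f"largest rounding gap: {max_gap:.3f} percentage points")
print("most daring:", most_daring)
print("most solid:", most_solid)
```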

Next: Steamhammer’s bugs.
