
solidity in AIIDE 2020 - part 4

I computed what I decided to call the upset deviation, which you can take as the average deviation of actual from expected win rate due to upsets. An upset pairing I defined as one where you outscore a stronger opponent (do better than expected) or underscore a weaker opponent (do worse than expected). Theoretically, smaller numbers are more “solid” and bigger numbers are more “daring”. The table also carries over the rms deviation from yesterday.

To summarize the procedure I followed:

1. Compute Elo ratings for each participant in the tournament.
2. Using the Elo ratings, compute the expected win rate for each pairing.
3. For each pairing, the deviation is the difference between the actual tournament result and the expected win rate.
4. Square each deviation and calculate the sum of the squares.
5. Extract the pairings which are upsets and calculate the sum of their squares.
6. The upset ratio is the sum of the upset squares as a ratio of the entire sum of squares.
7. For each participant, given the deviations, compute the rms deviation, which is a kind of average of the deviations. In case RMS is unfamiliar: it stands for root mean square, which means you square each number, find the arithmetic mean of the squares, then restore the original scale by taking the square root of the result.
8. Multiply the upset ratio by the rms deviation to get the upset deviation.
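The steps above can be sketched in Python for a single bot. This is a minimal sketch under stated assumptions, not the code used for the table: it starts from Elo ratings that are already known (step 1 is not shown), it assumes the standard Elo expectation on the conventional 400-point scale, it weights every pairing equally, and it treats a pairing with an exactly even expectation as never being an upset. The function names `expected_win_rate`, `is_upset`, and `solidity_metrics` are hypothetical.

```python
import math


def expected_win_rate(rating: float, opp_rating: float) -> float:
    """Elo expected score on the conventional 400-point logistic scale (an assumption)."""
    return 1.0 / (1.0 + 10.0 ** ((opp_rating - rating) / 400.0))


def is_upset(expected: float, actual: float) -> bool:
    """Outscoring a stronger opponent, or underscoring a weaker one."""
    if expected < 0.5:  # the opponent is stronger
        return actual > expected
    if expected > 0.5:  # the opponent is weaker
        return actual < expected
    return False  # evenly matched: counted as no upset (an assumption)


def solidity_metrics(rating: float, pairings: list[tuple[float, float]]) -> tuple[float, float, float]:
    """Steps 2-8 for one bot.

    pairings holds (opponent rating, actual win rate) for each opponent.
    Returns (rms deviation, upset ratio, upset deviation) as fractions.
    """
    if not pairings:
        raise ValueError("need at least one pairing")
    total_sq = 0.0
    upset_sq = 0.0
    for opp_rating, actual in pairings:
        expected = expected_win_rate(rating, opp_rating)  # step 2
        deviation = actual - expected                     # step 3
        total_sq += deviation ** 2                        # step 4
        if is_upset(expected, actual):                    # step 5
            upset_sq += deviation ** 2
    upset_ratio = upset_sq / total_sq if total_sq > 0 else 0.0  # step 6
    rms = math.sqrt(total_sq / len(pairings))                  # step 7
    return rms, upset_ratio, upset_ratio * rms                  # step 8


# Hypothetical bot rated 1600 against opponents rated 1500, 1700 and 1600.
rms, ratio, upset_dev = solidity_metrics(1600, [(1500, 0.70), (1700, 0.40), (1600, 0.55)])
print(f"rms deviation {rms:.1%}, upset ratio {ratio:.1%}, upset deviation {upset_dev:.1%}")
```

In the hypothetical example only the 40% result counts as an upset, because it outscores the stronger 1700-rated opponent; the 70% and 55% results beat expectations against a weaker and an equal opponent, so they add to the total squares but not to the upset squares.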

bot           rms deviation   upset ratio   upset deviation
stardust               7.6%         74.9%              5.7%
purplewave            11.3%         39.6%              4.5%
bananabrain            9.9%         65.2%              6.5%
dragon                15.5%         72.9%             11.3%
mcrave                13.6%         10.1%              1.4%
microwave             15.5%         23.5%              3.6%
steamhammer           16.9%         71.2%             12.0%
daqin                 21.2%         63.8%             13.5%
zzzkbot               23.7%         65.7%             15.6%
ualbertabot            9.0%         27.8%              2.5%
willyt                15.7%         33.8%              5.3%
ecgberht              14.7%         52.7%              7.8%
eggbot                 6.8%         69.6%              4.8%

The upset ratio is of some interest in itself, so I included it. It doesn’t say how big the upsets were; it says what proportion of the (squared) deviations were due to upsets, so you have to read the percentage as a ratio. The upset deviation then also takes into account how big the upsets were; here you read the percentage as the average deviation from expected win rate due to upsets. The whole procedure is ad hoc and of questionable rigor, but all the steps are logical and the results make sense to me. Can anybody suggest an improved method?
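For one bot, the steps of the procedure can be written compactly. The notation is introduced here for illustration only: the bot has n pairings with deviations d_i (actual minus expected win rate), and U is the set of pairings that are upsets.

```latex
\[ S = \sum_{i=1}^{n} d_i^2, \qquad S_U = \sum_{i \in U} d_i^2 \]
\[ \text{upset ratio} = \frac{S_U}{S}, \qquad \text{rms deviation} = \sqrt{\frac{S}{n}} \]
\[ \text{upset deviation} = \frac{S_U}{S}\sqrt{\frac{S}{n}} = \frac{S_U}{\sqrt{nS}} \]
```

Written this way, the upset deviation grows with the squared size of the upsets, S_U, and for a fixed S_U it gets smaller as the non-upset deviations (and hence S) get larger.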

By this metric, Dragon, Steamhammer, DaQin, and especially ZZZKBot are the “daring” players in this group. McRave and UAlbertaBot are the most “solid”.
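A quick way to check step 8 and the ordering in the paragraph above is to recompute and sort the table. This sketch copies the table's rounded percentages into a hypothetical `TABLE` dictionary. Because the inputs are rounded to one decimal place, a recomputed product can differ from the printed upset deviation by up to 0.1 point; ecgberht and eggbot are the two rows where the rounded product comes out 0.1 lower.

```python
# Rounded values copied from the table above, in percent:
# bot -> (rms deviation, upset ratio, upset deviation)
TABLE = {
    "stardust": (7.6, 74.9, 5.7),
    "purplewave": (11.3, 39.6, 4.5),
    "bananabrain": (9.9, 65.2, 6.5),
    "dragon": (15.5, 72.9, 11.3),
    "mcrave": (13.6, 10.1, 1.4),
    "microwave": (15.5, 23.5, 3.6),
    "steamhammer": (16.9, 71.2, 12.0),
    "daqin": (21.2, 63.8, 13.5),
    "zzzkbot": (23.7, 65.7, 15.6),
    "ualbertabot": (9.0, 27.8, 2.5),
    "willyt": (15.7, 33.8, 5.3),
    "ecgberht": (14.7, 52.7, 7.8),
    "eggbot": (6.8, 69.6, 4.8),
}


def recomputed_upset_deviation(rms_pct: float, ratio_pct: float) -> float:
    """Step 8: upset ratio times rms deviation, kept in percent."""
    return rms_pct * ratio_pct / 100.0


# With rounded inputs, the product can miss the printed value by up to 0.1 point.
for bot, (rms_pct, ratio_pct, upset_pct) in TABLE.items():
    assert abs(recomputed_upset_deviation(rms_pct, ratio_pct) - upset_pct) <= 0.1, bot

# Rank from most "daring" (largest upset deviation) to most "solid" (smallest).
ranking = sorted(TABLE, key=lambda bot: TABLE[bot][2], reverse=True)
print("most daring:", ranking[:4])  # most daring: ['zzzkbot', 'daqin', 'steamhammer', 'dragon']
print("most solid:", ranking[-2:])  # most solid: ['ualbertabot', 'mcrave']
```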

Next: Steamhammer’s bugs.
