I am hosting an unrated tournament on November 1st in Chicago (CUSS WORDS!) and have been working with Dave Turissini on a class system that would reflect a player's performance relative to their expectation for any given game. The system uses all of the score data available since the new dictionary and calculates an expected win value relative to the opponent.
If I sit across the table from someone who is rated 200 points lower than me, I am expected to win, but by how much? This system would calculate that value using historical data and then give a ranking based on how I do relative to that expectation.
Dave Turissini has done a lot of work on this and is much smarter than I am, so I have asked him to include some notes. They are below the cut:
QURSH (QUantile Rankings from Score Histories) is a sort of handicapping tool for comparing performance across different skill levels. The system can be employed in open tournaments as an alternative to class prizes, and it works by comparing game results to historical games played between people of comparable ratings.
For each game, the rating difference between the players determines which distribution is used. Currently, I'm using bins of 0-50, 51-100, 101-150, 151-200, 201-300, and > 300. Game data is also subdivided by the higher-rated player's rating (max rating) into the following groups: < 1000, 1001-1500, and > 1500. Each distribution is built from the spread, that is, the higher-rated player's score minus the lower-rated player's score, which can be negative.
The quantile of a game's spread within its distribution is then recorded as a measure of how well each player did relative to expectations. Quantile values range from 0 to 100, and the lower-rated player's quantile score is 100 minus that of the higher-rated player. For example, if player A is rated 1800 and player B is rated 1675, player A needs to win by 20 points to receive a score of 50. If A won by 10, A's score would be 46 and B's would be 54. Averaging a player's quantile scores over all of their games then gives their score for the tournament.
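To make the scoring steps concrete, here is a minimal sketch in Python (not Dave's R script). The function names, the mid-rank treatment of ties, the handling of a rating of exactly 1000, and the five-game toy distribution are all assumptions made for illustration; the real distributions hold a thousand or more games each, so the toy numbers will not reproduce the 46/54 example above.

```python
from bisect import bisect_left
from statistics import mean

# Upper edges of the rating-difference bins from the post; the last bin is > 300.
DIFF_EDGES = [50, 100, 150, 200, 300]
# Upper edges of the max-rating groups. Assumption: a rating of exactly 1000
# falls in the lowest group (the post does not say where it belongs).
MAX_RATING_EDGES = [1000, 1500]

def bin_key(rating_a, rating_b):
    """Return (max-rating group, rating-difference bin) for a pairing."""
    diff = abs(rating_a - rating_b)
    top = max(rating_a, rating_b)
    return bisect_left(MAX_RATING_EDGES, top), bisect_left(DIFF_EDGES, diff)

def quantile_score(spread, historical_spreads):
    """Percentile (0-100) of a spread within a historical distribution.

    Assumption: ties count as half below, half above (mid-rank).
    """
    below = sum(1 for s in historical_spreads if s < spread)
    equal = sum(1 for s in historical_spreads if s == spread)
    return 100.0 * (below + 0.5 * equal) / len(historical_spreads)

def score_game(rating_a, score_a, rating_b, score_b, distributions):
    """Return (quantile score for A, quantile score for B) for one game."""
    a_is_higher = rating_a >= rating_b
    # Spread is always higher-rated score minus lower-rated score.
    spread = (score_a - score_b) if a_is_higher else (score_b - score_a)
    higher = quantile_score(spread, distributions[bin_key(rating_a, rating_b)])
    lower = 100.0 - higher
    return (higher, lower) if a_is_higher else (lower, higher)

# Toy distribution (made-up spreads) for max rating > 1500, difference 101-150.
toy = {(2, 2): [-60, -10, 20, 50, 90]}

game1 = score_game(1800, 420, 1675, 400, toy)  # A wins by 20
game2 = score_game(1800, 410, 1675, 400, toy)  # A wins by 10
# A's tournament score is the average of A's per-game quantile scores.
a_tournament = mean([game1[0], game2[0]])
```

In this toy setup the median spread is 20, so winning by exactly 20 earns the higher-rated player 50, which mirrors the shape of the example above.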
I'm only using games with score information from cross-tables played after March 1, 2006, to account for dictionary changes and rating deflation. Using data received in early June, all but one of the historical distributions (max rating < 1000 and rating difference > 300) have at least 1000 games. The distributions look surprisingly similar across the three max rating groups (only a few comparisons have significant t-test and Mann-Whitney U test p-values after Bonferroni corrections), but I decided to split them up because of differences in the tails of the distributions. Expert-level players tend to have more blowout games than sub-1000 players (but not by much).
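For readers unfamiliar with the Bonferroni correction mentioned above: it divides the significance level by the number of comparisons made, so each individual test has to clear a stricter bar. A small Python sketch follows; the p-values and the 0.05 level are made up for illustration and are not taken from the actual analysis.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag which p-values stay significant after a Bonferroni correction."""
    # Each test is held to alpha divided by the number of tests.
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]

# Made-up p-values standing in for comparisons between max rating groups.
example_p = [0.001, 0.02, 0.04, 0.30]
flags = bonferroni_significant(example_p)
```

With four comparisons the per-test threshold drops from 0.05 to 0.0125, so only the first made-up p-value would still count as significant.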
I processed the results of this year's Dallas Open using three max rating groups and using a single group. Results were similar, but expert-level players were penalized when a single group was used: only 14% of them had higher scores in the single-group system, compared with all of the < 1000 players. Here are the 10 highest scores from the DO:
Rating | Score | Player
1838 | 63.33 | Peter Armstrong
1733 | 61.33 | Cecilia Le
1668 | 61.00 | Sam Dick-Onuoha
1828 | 60.05 | Doug Brockmeier
988 | 59.95 | Angela Dancho
1956 | 59.43 | Jason Katz-Brown
910 | 58.90 | Scott Hawkins
1923 | 58.43 | Joel Sherman
1248 | 58.33 | Mark Gooley
1524 | 58.05 | Keith Hagel
I like this system because it allows for comparisons across skill levels without relying on ad hoc methods. Instead, each person's performance is compared against thousands of historical games played between similar players. It's likely that underrated or extremely lucky players will end up with the highest scores, but those problems will be present in any system. Also, as more game data becomes available, the system can be expanded to look at distributions based on smaller rating differences or to break the > 300 bins into smaller pieces.
Here are some nifty histograms associated with the historical data:
http://tinyurl.com/3vjunr (EDIT: so it appears as if I cannot make public my histogram pdf on-line. If you would like to see the data we used to come up with the system, just e-mail me at aiBRETTy at GMail dot com.)
Now this obviously would be a supplemental prize. Depending on how one wanted to award it, you could just say whoever wins it wins it (Peter A above), or give it to the highest person out of the money (CeLe in this example).
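The two award options can be sketched in a few lines of Python. The scores come from the Dallas Open list above; the in-the-money flags for Peter and Cecilia follow from the example, while Sam's flag and the function name are placeholders invented for illustration.

```python
# (player, QURSH score, finished in the money?)
standings = [
    ("Peter Armstrong", 63.33, True),
    ("Cecilia Le", 61.33, False),
    ("Sam Dick-Onuoha", 61.00, False),  # flag is a placeholder, not from the post
]

def qursh_winner(standings, exclude_in_money=False):
    """Return the top QURSH scorer, optionally skipping players already in the money."""
    eligible = [s for s in standings if not (exclude_in_money and s[2])]
    return max(eligible, key=lambda s: s[1])[0]

overall = qursh_winner(standings)
out_of_money = qursh_winner(standings, exclude_in_money=True)
```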
Dave is working on a script (in the R programming language) that will use tsh outputs to do the calculations automatically and include the QURSH rankings on the wallcharts. After the tournament, I will probably post how successful everything was and hopefully clean up any problems.
If you have any comments, feel free to post them. Try to avoid the usual "you guys are idiots" response.
PS - I was personally a fan of the acronym HExAd Class System (standing for Historical Expectation-Adjusted Class System), but that's not my call.