Monthly grading moves closer as ECF embarks on consultation process

The ECF is consulting about grades being calculated and published monthly instead of twice a year. Details are on the ECF website, but the main document is reproduced below.

There is also a suggestion to adopt the Elo system, which would mean four-digit grades instead of three-digit ones.

The move to monthly grading lists would likely be especially popular with those who regularly play in congresses and also many congress organisers. It might also be attractive for 4NCL given the relative infrequency of their matches – compared with typical local chess leagues.

However, many leagues across the country would not be able to cope with grades changing every month.

Other considerations include how regularly internal club results would be processed, and by what system. Also to be debated is whether LMS would need upgrading to cope.

If Elo were adopted, it is unclear how far back games would be considered. For example, there would likely be concerns about going beyond the current 3 years to, say, 10. Also, would the current ECF categories, which give an idea of grade reliability, be lost?

This important topic will be brought up for discussion at both Dorset County and B&DCL Committee Meetings, and members are encouraged to form views, which their representative can express. At the same time, the ECF are keen to get individual views.


Here is the text on the ECF website (13 Feb ’19)

Monthly grading proposal

Proposal for calculating ECF Grades on a monthly basis

Objective
The ECF Board have decided to institute monthly grading lists. I have advised and the board have accepted that an Elo system would be the most appropriate algorithm, although there is a consultation on this. There is a lot of detail missing from this proposal which can be sorted when more is settled about administration. This document gives a direction of travel.

The Board have indicated that:

  1. The process would produce monthly lists rather than less regular lists with a “live list”.
  2. As many games as possible should be graded.
  3. At least in the early stages, there should be tolerance of delayed reporting, particularly for leagues and internal club results.

Background
There will need to be conversations with interested parties on what might replace the categories used in the current list. At present a player becomes inactive, as opposed to active, if there have been no graded results in the latest year. There is also a category F grade, which gives an indication of strength but is not treated as an acceptable grade for some purposes. These issues are parked until the ECF is closer to settling the mechanics. A proposal for those mechanics is set out below, based on statistical analysis aimed at the Board’s requirements.

Calculation routine

  1. At inception of the Elo system current grades will be converted using the formula:  new = 7.5*old + 700 (assuming the ECF want to move to a 4-digit system aligned at that point to FIDE)
  2. There will be extended deadlines for the July list to facilitate late reporting and corrections. This will be the principal list.
  3. Each month, all previous monthly lists after the last principal list will be recalculated to take into account late reported results.
  4. All results dated in the last month, plus some brought forward (see below), are collected and current grades assigned to each player. These records are then duplicated so that each result is recorded once from each player’s point of view against the opponent.
  5. These half-results are split into 4 groups: both graded (“gg”), only the player graded (“gu”), only the opponent graded (“ug”), both ungraded (“uu”).
  6. The grading formula is R1 = R0 + k(W - We), where R0 is the grade for the previous month, k is 20 or 40, W is the player’s total score for results in the month, and We is the expected total score based on FIDE table 8.1b – https://www.fide.com/fide/handbook.html?id=197&view=article
  7. The ug results are first graded and these grades are inserted into the gu and ug groups. These groups are then combined with the gg group and all these results are then graded for each player.
  8. The uu group are carried forward to the next month for potential grading then.
  9. Each ungraded player will be deemed to have drawn with an 1850 graded opponent as an extra result on initial grading.
  10. The k factor will be 20, except that in a month where a junior player has outperformed expectation the k factor will be 40.
  11. These grades are to be used for grading calculation only. Some sifting will be required for seeding or section limits, as outlined above.
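
Steps 1, 4, 5, 6 and 10 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ECF's implementation: all function and class names are hypothetical, and the expected score uses the familiar logistic curve as a stand-in for FIDE table 8.1b, which the proposal actually specifies.

```python
from dataclasses import dataclass


def to_four_digit(old_grade: float) -> float:
    """Step 1: convert a 3-digit ECF grade using new = 7.5*old + 700."""
    return 7.5 * old_grade + 700


def expected_score(own: float, opp: float) -> float:
    """Expected score for a single game.

    ASSUMPTION: the proposal looks this up in FIDE table 8.1b; the logistic
    curve below is a common approximation, not the table itself.
    """
    return 1.0 / (1.0 + 10 ** (-(own - opp) / 400.0))


@dataclass
class HalfResult:
    player: str
    opponent: str
    score: float  # 1, 0.5 or 0 from the player's point of view


def half_results(white: str, black: str, white_score: float) -> list:
    """Step 4: record each game once from each player's point of view."""
    return [
        HalfResult(white, black, white_score),
        HalfResult(black, white, 1.0 - white_score),
    ]


def group(hr: HalfResult, grades: dict) -> str:
    """Step 5: 'gg', 'gu', 'ug' or 'uu' (player first, opponent second)."""
    p = "g" if hr.player in grades else "u"
    o = "g" if hr.opponent in grades else "u"
    return p + o


def k_factor(is_junior: bool, w: float, we: float) -> int:
    """Step 10: k is 40 only for a junior who outperformed expectation."""
    return 40 if is_junior and w > we else 20


def monthly_update(r0: float, results: list, is_junior: bool = False) -> float:
    """Step 6: R1 = R0 + k(W - We) over one month's (opponent_grade, score) pairs."""
    w = sum(score for _, score in results)
    we = sum(expected_score(r0, opp) for opp, _ in results)
    return r0 + k_factor(is_junior, w, we) * (w - we)
```

For instance, an old grade of 150 converts to 1825, and an adult graded 1800 who wins a single game against another 1800 player in the month moves to 1810 at k = 20.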

Discussion
I have much affection for the current ECF system and would defend it against alternatives for six-monthly lists. However, it does not outperform Elo, and the efforts to maintain a 30-result average, where possible, make it relatively cumbersome. For adult players there would be no reason to change, but the junior calculation is not fit for purpose on a monthly basis. The smaller samples obtained increase the number of (likely rare) unconnected, and therefore ungradable, groups not linked to the main network. The calculation can also be unreliable where the link to a graded player is through other ungraded players. Finally, the theory assumes that results are all independent, which is acceptable in large samples, but at the other extreme two players just playing each other gives rise to total dependence.

Moving to Elo means that players only have to understand one formula whether FIDE or ECF.

At present all half results are allocated a grading, and this method continues that practice. As I understand it, monthly lists are advocated for players who want more immediate updates on performance. It would seem that this argument is even stronger for that initial grade.

Various methods were investigated by grading all results for the years 2012-2016, both standard play and rapid play, and testing the progressions. The key indicators were: inflation or deflation measured in several ways including a split by seniors and juniors, “stretch” being the underperformance of the stronger player against their expectation, percentage of results graded, volatility of monthly change, and number of active graded players.

Both the FIDE system and the Yorkshire Grading System worked well for adult players, but neither handled junior improvement satisfactorily. The Yorkshire system is more complex, and it is not clear how to fix the junior problem.

The FIDE system in an English context is materially deflationary for junior player grades, with contamination into adult grades. There are two reasons for this embedded in its methodology: (1) where juniors play each other both get a k factor of 40, so there is reallocation with no recognition of general improvement; (2) the calculation of the initial rating is biased downwards. Also, since players must wait for 5 rated games before being rated, just under 10% of half results are discarded. The “stretch” in FIDE ratings deteriorates over 5 years. The ECF stretch is fairly good and this is embedded in the starting grades. There are signs that the FIDE stretch performance is levelling off, but the modifications proposed slightly improve on current ECF.

The proposal can be viewed as a modified FIDE system addressing these issues. The inclusion of an initial dummy result is an idea that Jeff Sonas has aired for the FIDE method. It provides stability to the list and since most of the new players will be juniors, tends to anticipate their improvements. The 1850 was selected as it optimises the stretch statistic in the fifth year and maintains the average grade. The k=40 boost for overperforming juniors appears, over the period, to inject the right amount for junior improvement to stabilise average list grade.
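
The effect of the two k-factor rules can be illustrated with a small Python sketch. It treats a single game between two juniors as the whole month, uses the logistic curve as an assumed stand-in for FIDE table 8.1b, and all function names are hypothetical. With both players on k = 40 the points gained by one are exactly those lost by the other, whereas the proposed rule leaves a net gain in the pool.

```python
def expected_score(own: float, opp: float) -> float:
    # ASSUMPTION: logistic approximation in place of FIDE table 8.1b.
    return 1.0 / (1.0 + 10 ** (-(own - opp) / 400.0))


def net_change_equal_k(a: float, b: float, score_a: float, k: int = 40) -> float:
    """Both juniors use the same k: the two grade changes cancel out."""
    change_a = k * (score_a - expected_score(a, b))
    change_b = k * ((1.0 - score_a) - expected_score(b, a))
    return change_a + change_b


def net_change_proposal(a: float, b: float, score_a: float) -> float:
    """Proposed rule: k = 40 for the junior who beat expectation, else 20."""
    diff_a = score_a - expected_score(a, b)
    diff_b = (1.0 - score_a) - expected_score(b, a)
    k_a = 40 if diff_a > 0 else 20
    k_b = 40 if diff_b > 0 else 20
    return k_a * diff_a + k_b * diff_b
```

For two juniors both graded 1500, a decisive game gives a net change of 0 under the equal-k rule and +10 under the proposed rule (the winner gains 20, the loser drops 10), which is the kind of injection the paragraph above describes.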

The use of an initial grade based on one result continues what happens under the existing method. It is quite clear that this grade would be inappropriate for seeding or section limits. The FIDE system uses a higher k factor for the first 30 rated results and does not rate anyone without 5 rated results. It appears that the higher k factor period adds as much noise as it does reliability. There is no evidence to suggest that the period improves any of the key indicators mentioned above. The proposal goes for simplicity and inclusion.

As grading manager for many years, I am aware of how fragile a grading system can be. Stability depends on a reasonably constant profile of entrants and exits. Fortunately this has been the case in recent times. This proposal meets the enduring issues encountered in all rating systems in a relatively simple way, focusing on the key issues. It appears to work well over a recent 5-year period.

— Brian Valentine, Manager of ECF Grading