The IGS Rating System - Table of contents (first draft)
(Written by John Bate, bate on the IGS)
Title Subject
Introduction.........................intro
Computing Handicaps..................computing
How Ratings are Calculated...........calculated
Mathematical Details.................math
An Example...........................example
...........................example2
Cycles in the rating system..........cycles
Entry into the ratings system........entry
Common Questions.....................FAQ
A Few Suggestions....................suggestions
Type help <Subject> to read the matching section about the ratings.
Example: help intro (to read the introduction)
The IGS Rating System
Intro ---
A game of Go is most enjoyable when the players are evenly matched, and each
must play to the best of his or her ability in order to win. This requires
that an appropriate handicap be used, which in turn requires an accurate
estimation of the strengths of the players. This is the intended purpose of
the rating system. Consistently using the differences in the players' ratings
to determine handicaps will, in the long run, provide the most interesting and
challenging games for everyone. Also, by using the rating system, each player
should eventually win about 50% of his or her games.
To use the rating system, set your RANK to a reasonable value (using the
"rank" command) and then begin playing games. You will receive a numerical
RATING which will automatically be updated, usually once per day. (A rating of
32.00 corresponds to a rank of 1 dan.) The "stats" command will display both
your rating and your rank. Note, once you have a numerical
rating, your RANK and your RATING are completely independent. You may set your
RANK manually to any desired value, but it will not have any effect at all on
your RATING, which is computed solely from the results of your games. Your
RANK is used only to initialize your RATING to a reasonable starting value.
Computing ---
Computing Handicaps
A rating difference of 1.00 corresponds to a difference of 1 handicap stone,
which is considered to have a value of about 10 points. An even game (with 5
komi) is correct when there is no difference in the players' ratings. A no-
komi game gives Black an advantage of about 5 points, or 1/2 stone, which is
perfect when the rating difference is 0.5. A 2-stone game (without komi) gives
Black an additional advantage of exactly one stone, and so this is appropriate
when the rating difference is 1.5 (*not* 2.0). In general, an N-stone handicap
is ideal for a rating difference of N-1/2. The following table may be used to
determine the most reasonable handicap.
Rating Difference   Handicap   Komi   Ideal Rating Diff
    0.0-0.25           0        5           0.0
    0.25-1.0           0        0           0.5
    1.0-2.0            2        0           1.5
    2.0-3.0            3        0           2.5
    etc.
Note that the proper number of handicap stones may be found by rounding the
rating difference *UP* to the next integer. There is a tendency to use smaller
handicaps, giving White the advantage. This is fine, but if Black is to have a
truly even chance (and White is willing to lose half the games) then the above
table should be used. For even more accuracy, the komi may be adjusted by 1
point for each 0.1 that the rating difference departs from the ideal.
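The table and the komi adjustment above can be sketched in a few lines of Python. This is only an illustration, not the server's code: the function names are made up here, the handling of a difference that falls exactly on a boundary (0.25, 1.0, 2.0, ...) is an assumption, and the direction of the komi adjustment is taken from the "ideal rating difference" column (a larger difference means a smaller komi, possibly a negative one).

```python
import math

def suggest_handicap(diff):
    """Return (stones, komi) for a rating difference, following the table.

    diff is the stronger player's rating minus the weaker player's rating.
    Boundary handling (exactly 0.25, 1.0, ...) is an assumption of this sketch.
    """
    if diff < 0:
        raise ValueError("pass the stronger rating minus the weaker rating")
    if diff < 0.25:
        return 0, 5                      # even game with 5 komi
    stones = math.ceil(diff)             # round the difference UP
    # "1 stone" simply means Black moves first without komi.
    return (0 if stones == 1 else stones), 0

def ideal_difference(stones):
    """Rating difference for which this handicap, with no komi, is exactly even."""
    return 0.5 if stones == 0 else stones - 0.5

def fine_tuned_komi(diff, stones):
    """Komi that makes the game exactly even: 1 point per 0.1 of rating."""
    return round(10 * (ideal_difference(stones) - diff), 1)
```

For example, suggest_handicap(1.7) gives 2 stones and no komi, and fine_tuned_komi(1.7, 2) gives -2.0, meaning Black would receive 2 more points to make the game exactly even.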
Calculated:
How Ratings are Calculated
==========================
In this section, the operation of the ratings system will be described in
general terms. The exact mathematical details will be given later for those
who are interested.
Each player has a "seed" rating. When a player first enters the rating system,
the seed is set to the stated RANK value (which is the ONLY time that the rank
is used). After that, the seed is changed to the current RATING value on a
regular basis. A "likelihood" is assigned to every possible rating for a given
player. This likelihood is highest at the seed and decreases on either side of
it.
Each game between two players in the rating system is also assigned a
likelihood value which depends on the handicap, the komi, the RATINGS of the
two players, and the winner of the game. (Other factors, such as the margin of
victory, are ignored.) The system will determine which player had the
advantage based on the handicap, the komi, and the players' ratings. The
likelihood will be 0.5 if the game was exactly even (in other words, if the
handicap was ideal for those ratings). It will be larger if the winning player
had an advantage (the handicap was too small), and smaller if the winning
player was at a disadvantage (the handicap was too large).
The ratings system uses the seeds of all of the players in the rating system
and the results of all the games between these players. It then computes a set
of ratings for ALL of the players AT ONCE which maximizes the total likelihood
of the entire system. This likelihood is the product of the likelihoods of
each player's computed rating, and the likelihoods of the results of each
game, given those ratings.
Here is a good analogy which will help in understanding the ratings system.
Think of each player's seed as a fixed post, and each player's rating as a
movable post attached to the seed by a rubber band. Think of each game as a
spring joining the ratings posts of the two players. If the lower-ranked
player won, the spring will be stretched and will try to pull the ratings
closer together. If the higher-ranked player won, the spring will be
compressed and will try to push the ratings apart. The strength of the spring
will depend on the accuracy of the handicap. If the winning player had an
advantage, the spring will be weak, but if the winning player was at a
disadvantage, the spring will be strong. The entire system of posts, rubber
bands, and springs will have a stable position which will determine the
ratings of the players.
Math:
Mathematical Details
The likelihood of a particular player's rating is p(d)=exp(-((d/sigma)^2)/2)
where d is the difference between the rating and the seed, and sigma indicates
the degree of confidence that the seed accurately represents the player's
strength. Sigma begins at 1.0 for new players 9k and above, and decreases by
0.02 for each game that is processed by the rating system, until a minimum of
0.3 is reached. For players 10k and below, the sigma is 2.00 and has a minimum
of 0.6. (Sharp-eyed readers with a good knowledge of statistics may realize
that this formula should be divided by root-two-pi to be a proper likelihood
function. But this constant factor may be ignored in the ratings system.)
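As a small sketch of the two quantities just described, the following Python computes the seed likelihood p(d) and the sigma of a player after a number of processed games. The function names are made up for illustration. The text does not say how fast sigma shrinks for players 10k and below, so the step is left as a parameter and its default of 0.02 is an assumption in that case.

```python
import math

def seed_likelihood(rating, seed, sigma):
    """p(d) = exp(-((d/sigma)^2)/2), with d = rating - seed."""
    d = rating - seed
    return math.exp(-((d / sigma) ** 2) / 2)

def sigma_after(games, start=1.0, minimum=0.3, step=0.02):
    """Sigma after `games` processed games.

    Defaults are the values given for players 9k and above. For 10k and
    below use start=2.0, minimum=0.6 (the step there is an assumption).
    """
    return max(minimum, start - step * games)
```

With sigma = 1.0, a rating 0.25 away from the seed has a likelihood of about 0.969, and a rating 1.0 away has about 0.607.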
The likelihood of a particular game result is calculated as follows, given
H=handicap stones, K=komi, W=White's rating, B=Black's rating:
Effective Handicap: E = if H=0 then 0.5-0.1*K else H-0.5-0.1*K
This computes the "ideal" ratings difference that would make the game exactly
even. For example, a 2 stone game (H=2, K=0) gives E=1.5 and an even game
(H=0, K=5) gives E=0. This corresponds to the "ideal rating difference" column
in the table in the previous section.
Black's Advantage: A = E - (W-B)
This adjusts the effective handicap by the difference in the strengths of the
players, giving the net advantage for Black (if positive) or White (if
negative). For example, if the difference in ratings is W-B=2.0 and a 2-stone
game is played (E=1.5), then A=-0.5 indicating that White still has a 1/2-
stone advantage.
Likelihood of Black winning: L = if A>=0 then 1.0 - 0.5*((3/4)^(2*A))
                                 else 0.5*((3/4)^(2*-A))
This function gives L=0.5 if the game is exactly even (A=0), higher values if
Black has the advantage (A>0), and lower values if White has the advantage
(A<0). The value of L is between 0 and 1.
Likelihood of the result: G = if Black won then L else 1-L
This gives the likelihood of the observed result. It will be 0.5 if the game
was exactly even, higher if the player with the advantage won, and lower if
the player with the advantage lost.
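The four formulas above (E, A, L and G) translate directly into code. The sketch below follows them as written, with the base 3/4 exposed as a parameter; the function names are only illustrative and are not taken from the server.

```python
def effective_handicap(stones, komi):
    """E: the rating difference for which this handicap and komi are exactly even."""
    return (0.5 if stones == 0 else stones - 0.5) - 0.1 * komi

def black_advantage(stones, komi, white, black):
    """A = E - (W - B): positive favors Black, negative favors White."""
    return effective_handicap(stones, komi) - (white - black)

def black_win_likelihood(advantage, base=3 / 4):
    """L: likelihood that Black wins, given Black's advantage A."""
    if advantage >= 0:
        return 1.0 - 0.5 * base ** (2 * advantage)
    return 0.5 * base ** (-2 * advantage)

def result_likelihood(stones, komi, white, black, black_won, base=3 / 4):
    """G: likelihood of the result that was actually observed."""
    l = black_win_likelihood(black_advantage(stones, komi, white, black), base)
    return l if black_won else 1.0 - l
```

For the example in the text, black_advantage(2, 0, 34.0, 32.0) returns -0.5, and an exactly even game gives a result likelihood of 0.5 whoever wins.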
The system calculates the ratings of all of the players which will maximize
the product of all of the likelihood values (for both players and games). To
avoid numerical problems, the sum of the logarithms of the likelihoods is
used. (The details of the algorithm that does this are beyond the scope of
this document.)
Example:
An Example
Suppose that PlayerA, with a rank of 1d, wins an even game (with 5 komi)
against PlayerB, also with a rank of 1d, and that this is the only game in the
system. The seeds of both players will be initialized to 32.00, and their
sigma values will be 1.0. First, try setting their ratings to 32.00, which
will give likelihoods of 1.00 for both. The advantage of PlayerA in the game
is 0, giving a game likelihood of 0.5 and a total likelihood of 1*1*0.5=0.5.
Now try giving PlayerA a rating of 32.25, and PlayerB a rating of 31.75. This
will give both players a likelihood of 0.969 which is only a little less than
1.0 since these ratings are still close to the seed (the rubber bands haven't
stretched very much). Given these ratings, PlayerA had an advantage of 0.5
which would increase the likelihood of the game (since the player with the
advantage won) to 0.666 giving a total likelihood of .969*.969*.666=0.625
which is higher than the 0.5 produced before. Now try giving PlayerA a rating
of 33, and PlayerB a rating of 31. These values are a lot farther from the
seeds, giving likelihoods of 0.607 (the rubber bands are being stretched a bit
more). PlayerA would now have an advantage of 2.0 which increases the
likelihood of the game to 0.901 (since if PlayerA is really 2 stones stronger,
then the result is very likely). But the total likelihood is only
0.607*0.607*0.901=0.332 which is smaller than before. So this change in the
ratings is too big. By using an iterative search algorithm, the optimum values
may be found, which are ratings of 32.333 for PlayerA and 31.667 for PlayerB,
giving a total likelihood of about 0.635 which is the maximum. If the game had
been played with no komi, then the players' ratings would change by about
0.238 if Black won or 0.454 if White won. (Since it is more likely that Black
will win a no-komi game, it will affect the ratings less when it happens.) It
is left as an exercise for interested readers to verify these values. :-)
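For readers who want to try the exercise, here is a minimal Python sketch of the search. It assumes, by symmetry, that the winner moves up and the loser moves down by the same amount (both seeds and sigmas are equal), and it uses a plain ternary search, which is not necessarily what the server does. Note that the figures quoted in this example (0.666, 0.901, 32.333, 0.238, 0.454) are reproduced when the base in the likelihood formula is 2/3. With the base 3/4 given under Mathematical Details, the same two positions evaluate to 0.625 and about 0.842 instead. The base is therefore a parameter here, defaulting to 2/3 so that the numbers match the example.

```python
import math

def result_likelihood(komi, white, black, black_won, base):
    """Likelihood of the observed result of a no-handicap game (H = 0)."""
    advantage = (0.5 - 0.1 * komi) - (white - black)   # Black's advantage A
    if advantage >= 0:
        black_wins = 1.0 - 0.5 * base ** (2 * advantage)
    else:
        black_wins = 0.5 * base ** (-2 * advantage)
    return black_wins if black_won else 1.0 - black_wins

def total_likelihood(shift, komi=5, black_won=True, base=2 / 3, seed=32.0):
    """Winner sits at seed+shift, loser at seed-shift, both with sigma = 1."""
    winner, loser = seed + shift, seed - shift
    black, white = (winner, loser) if black_won else (loser, winner)
    player = math.exp(-(shift ** 2) / 2)               # same for both players
    return player * player * result_likelihood(komi, white, black, black_won, base)

def best_shift(komi=5, black_won=True, base=2 / 3):
    """Ternary search for the shift that maximizes the total likelihood."""
    lo, hi = 0.0, 2.0
    for _ in range(100):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if (total_likelihood(m1, komi, black_won, base)
                < total_likelihood(m2, komi, black_won, base)):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2
```

Calling best_shift() gives a shift of about 0.333 for the even game, while best_shift(komi=0, black_won=True) and best_shift(komi=0, black_won=False) give about 0.238 and 0.454 for the no-komi game.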
Example2:
Below is a *very* typical letter about the ratings on IGS. The lines marked
'> ' are from the person who wrote to me, and the rest are mine. I am
reluctant to post about the ratings system. People read more into the ratings
than the system was designed to accommodate. It is supposed to be
used for setting the proper handicap; that is it. Here is the letter:
>...First example was zb (me :-)) who played Lim and won but got a
>dropped rate by 0.25. If it was due to that Lim was not calculated into the
>system, then there should be NO effect on zb's rating. In no reason a winner
>should have a decreased rate instead. Another example is about players ong and
>fei. fei won a game over ong yesterday but fei got a decreased rating while
>ong got an increased one. You may argue that ong played strong players.
>However before ong got his rate increased, he played fei(4d*), xzhao(5d*), and
>ylwang(NR*) and lost all. As you stated ylwang was NOT calculated at that time
>(as NR*), so the counts for ong should be fei(4d*) and xzhao(5d*), and both had
>rating scores LOWER than ong when they played! Therefore no reason should fei
>got a decreased rating score while ong got an increased one.
Below are the games which affected fei and ong. The above example is just a
tiny sliver of the total information. Below I have included a *partial* list
of the games used in computing fei's and ong's ratings. If you can say, by
inspection, what should happen given the data below, you are doing a lot
better than I. In order to show any fault in the ratings system, you would
have to use two players who do not play *anyone* else. John Bate did this, and
the ratings system performed as well as can be expected.
If one does a stats Lim (I did a show of Lim), one can see that he has
not played any games, his rank is NR, and he has no rating. Lim has
*nothing* to do with the ratings system. zb's rating drop came from something
else in the ratings system, not from Lim. A player's rating can change
just by being active in the system. That is all it takes. Let's say you have
15 people playing each other (they are active). One new person comes in
as a 6d, and plays and wins. The new person wins so much he becomes an 8d.
As he gets promoted, he pulls all the people he played up with him.
The other people do not even have to play to get pulled up. They just have
to be part of the 'playing network', if you will. For example, the new
person plays lyu, lyu plays m6, m6 plays zhong, zhong plays fei, and so on.
If fei plays no games while the new person is winning, his rating will
still increase. We all saw this happen with nyws. If nyws had lost badly
instead of winning, he would have pulled people down.
What this boils down to is some 'local' instability, for global stability.
Players may see their ratings change, a little here and there, but globally
the ratings are accurate.
If you can show me an example with two players who do not play anyone
but each other, then I can look at the behavior and go from there.
> From the above examples I think the rating system on IGS is not working
>appropriately sometimes. I hope you can check it out and make the rating a
>perfect one. Thanks a lot for your time and patience.
I think it is working, from the above example. Or at the very least it is
arguable as to what it is doing.
Here is the data. The format is similar to the 'games' command.
Most of the info is kept with the player, so 'NR' means nothing,
as does 'xx' and the bogus score at the end.
0016 xzhao [ NR ](W) : player1 [ NR ](W) H 2 K 0.5 xx W 0.0 B 0.0
0016 ong [ NR ](W) : player4 [ NR ](W) H 4 K 0.5 xx W 0.0 B 0.0
0016 player1 [ NR ](B) : xzhao [ NR ](B) H 2 K 0.5 xx W 0.0 B 0.0
0016 fei [ NR ](W) : player2 [ NR ](W) H 0 K 5.5 xx W 0.0 B 0.0
0016 player1 [ NR ](B) : xzhao [ NR ](B) H 2 K 0.5 xx W 0.0 B 0.0
0016 player4 [ NR ](B) : player5 [ NR ](B) H 0 K 0.5 xx W 0.0 B 0.0
0016 player1 [ NR ](B) : xzhao [ NR ](B) H 2 K 0.5 xx W 0.0 B 0.0
0017 ylwang [ NR ](W) : ong [ NR ](W) H 0 K 5.5 xx W 0.0 B 0.0
0017 xzhao [ NR ](W) : ong [ NR ](W) H 0 K 5.5 xx W 0.0 B 0.0
0017 fei [ NR ](B) : ong [ NR ](B) H 0 K 0.5 xx W 0.0 B 0.0
0015 player1 [ NR ](W) : player3 [ NR ](W) H 0 K 0.5 xx W 0.0 B 0.0
0015 player3 [ NR ](W) : player4 [ NR ](W) H 0 K 0.5 xx W 0.0 B 0.0
0015 player1 [ NR ](B) : xzhao [ NR ](B) H 2 K 0.5 xx W 0.0 B 0.0
0015 ong [ NR ](W) : player4 [ NR ](W) H 4 K 0.5 xx W 0.0 B 0.0
0015 xzhao [ NR ](W) : player1 [ NR ](W) H 2 K 0.5 xx W 0.0 B 0.0
0016 player3 [ NR ](B) : player6 [ NR ](B) H 0 K 0.5 xx W 0.0 B 0.0
0018 player7 [ NR ](W) : player3 [ NR ](W) H 0 K 0.5 xx W 0.0 B 0.0
0015 player1 [ NR ](W) : player3 [ NR ](W) H 0 K 0.5 xx W 0.0 B 0.0
0015 player3 [ NR ](W) : player4 [ NR ](W) H 0 K 0.5 xx W 0.0 B 0.0
0016 player4 [ NR ](B) : player5 [ NR ](B) H 0 K 0.5 xx W 0.0 B 0.0
0017 player8 [ NR ](W) : player5 [ NR ](W) H 2 K 0.5 xx W 0.0 B 0.0
Players Ratings:
player1 34 34.2852 0.9400 1
xzhao 36 35.2148 0.5000 1
ong 36 36.6328 0.5000 1
player2 35 34.7031 0.9800 1
fei 35 35.3789 0.5800 1
As you can tell, what happens to player1 and player2 affects ong and fei.
So does what happens to player3, player4, player5, player6, and player7.
tim
Cycles:
Cycles in the rating system
===========================
The rating system operates in cycles, as follows. At the beginning of a cycle,
each player's seed is initialized to his or her current RATING value, and a
new collection of games is started. During the cycle, each player's rating is
recalculated on a daily basis using the results of all of the games played
since the beginning of the cycle, and the seed values that were set at the
beginning of the cycle. Note that the ratings will change daily, but the
underlying seed values (to which the ratings are attached by "rubber bands")
do NOT change during a cycle. Each day, a new set of games (springs) is added
to the system which will affect the ratings (movable posts), but the seeds
(fixed posts) remain anchored. At the end of the cycle, the seeds are changed
to the computed ratings, and the games are discarded. (All of the tension is
removed from the system by moving the fixed posts to match the movable posts,
and discarding all the springs.) Then a new cycle is started.
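The bookkeeping of a cycle can be sketched as follows. This is only an illustration of the description above: the class name RatingCycle is made up, and `solver` stands for whatever routine computes the ratings from the fixed seeds and the accumulated games (it is passed in as a parameter because its details are not given here).

```python
from dataclasses import dataclass, field

@dataclass
class RatingCycle:
    seeds: dict                                   # fixed posts
    ratings: dict = field(default_factory=dict)   # movable posts
    games: list = field(default_factory=list)     # springs

    def __post_init__(self):
        if not self.ratings:
            self.ratings = dict(self.seeds)       # start at the seeds

    def daily_update(self, new_games, solver):
        """Add the day's games, then recompute ALL ratings from the seeds."""
        self.games.extend(new_games)
        self.ratings = solver(self.seeds, self.games)

    def end_cycle(self):
        """Move the seeds to the computed ratings and discard the games."""
        self.seeds = dict(self.ratings)
        self.games.clear()
```

The point of the design is that daily_update always starts again from the unchanged seeds and the full list of games for the cycle, so a daily change is never permanent until end_cycle is called.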
The plan is to make each cycle last for one week, so that one entire week's
worth of games are collected before making any permanent changes to the
ratings (by changing the seeds). However, as this document is being written
(March 18, 1993), the ratings system is still in the VERY FIRST cycle which
began when the ratings system was restarted on February 1, 1993. This means
that each player still has the same seed that they had on that date, and the
"rubber bands" will prevent the ratings from moving very far from those seeds.
The system contains a very large number of springs and rubber bands which are
under a considerable amount of tension, making it difficult for any new games
to have much effect at all. Weekly cycles are on Sunday.
*Note: The cycles are now twice a week (1994).
Entry:
To enter the rating system, you need to have your rank set higher
than NR and ???. Once in the rating system, you can change your rank to
whatever you wish, even back to NR or ???. The _initial rank_ is used
as your seed rating.
The only other condition is that you play someone with their rank set
in the same manner, or play someone already in the ratings system to
have the game count.
If one player is not in the rating system and has their rank set to NR
or ???, the game will not count. Both players have to meet the
conditions for the game to count towards ratings. If you play an NR or
??? player, the game may still count. (The NR or ??? player may already
be in the rating system, having previously set a rank and played a
rated game, and then set the rank back to NR or ???.)
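The conditions above amount to a simple check on both players. The sketch below is only a restatement of those rules in Python with made-up names. It ignores other conditions mentioned elsewhere in these help files, such as the board size.

```python
UNRANKED = {"NR", "???"}

def eligible(rank, already_rated):
    """A player qualifies if already in the rating system, or if the
    current rank is set to something other than NR or ???."""
    return already_rated or rank not in UNRANKED

def game_counts(rank_a, rated_a, rank_b, rated_b):
    """A game counts only when BOTH players qualify."""
    return eligible(rank_a, rated_a) and eligible(rank_b, rated_b)
```

For example, game_counts("3k", False, "NR", True) is True, because the NR player is already in the rating system, while game_counts("3k", False, "NR", False) is False.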
FAQ:
Common Questions
Q: Why did my rating change when I didn't even play that day?
A: As soon as you play one game with another rated player, your rating (post)
is attached to everyone else's ratings through that game (a spring), until the
start of another cycle. Any change in your opponent's rating will pull on the
spring and cause a change in your rating, too. In theory, it is possible for a
game to affect the ratings of every player in the system. For example, if you
(as a 1d) lose an even game to a 3k player, it will cause a fairly large
decrease in your rating. :-( But if that "3k" player then wins several more
even games against 1d and 2d players, this will increase his or her rating
quite a bit, and this in turn will reduce the change in your rating. (Perhaps
that game was not so unlikely after all. :-)
Q: I initially set my rank at 5k, but that turned out to be too low. I changed
it to 1d, but my rating didn't change. Why?
A: When you set your rank and play ONE game against another ranked player,
your seed is initialized and you enter the ratings system. From then on, your
RANK and your RATING are entirely separate. Your RANK is only set manually
with the "rank" command, and your RATING is computed automatically. The moral
of the story: pick a reasonable rank BEFORE your FIRST game. After that it is
too late.
Q: I have played games with other ranked players, but my rating doesn't
change. Why?
A: There is a lower limit (currently 30k) below which ranks and ratings and
games are ignored. (This is because some players use NR and ??? just for
fun, and because NR and ??? are not part of the ratings system anyway.) If your
seed was set to a value lower than this, nothing will ever change it. Start
over with a new account and set your rank to something higher than ??? BEFORE
your FIRST game. (Also look at the answer to the next question.)
Q: What if the result of a game does not count towards my rating?
A: Make sure the game was against a rated player.
Q: What do I do if my rating is not changing?
A: If your rating is not changing, one of two things is happening: 1) you are
playing as expected for your rating, or 2) the ratings system is not counting
your games. To find out which, check that your number of rated games is
changing from day to day as expected. If this number is changing, the
ratings system is counting your games.
Q: Do 9x9 and 13x13 games count?
A: No. Only 19x19 games count towards ratings.
Q: Does a game with a 'NR' and '???' player count?
A: No, unless that player is already in the rating system. See: help entry, help rank
Q: Why does my rating change when I do not play?
A: Your rating depends not only on the games you play, but also on the
people you play. If you play fred and fred does very well, your rating
will increase as well, without you playing anyone else.
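This effect can be seen in a toy system of three players. The sketch below is not the server's algorithm: it assumes even games only, sigma = 1.0 for everyone, the base 3/4 as written under Mathematical Details, and a simple coordinate-by-coordinate ternary search to maximize the total log-likelihood. The names "you" and "other" are made up. You beat fred once, and then fred wins three games against someone else while you do not play.

```python
import math

BASE = 3 / 4  # base as written under "Mathematical Details"

def log_result(winner, loser):
    """Log-likelihood of an even game (E = 0) won by the player rated `winner`."""
    advantage = winner - loser
    if advantage >= 0:
        return math.log(1.0 - 0.5 * BASE ** (2 * advantage))
    return math.log(0.5 * BASE ** (-2 * advantage))

def log_total(ratings, seeds, games, sigma=1.0):
    """Sum of the log-likelihoods of all seeds (rubber bands) and games (springs)."""
    total = sum(-((ratings[p] - seeds[p]) / sigma) ** 2 / 2 for p in seeds)
    total += sum(log_result(ratings[w], ratings[l]) for w, l in games)
    return total

def solve(seeds, games, sweeps=50):
    """Adjust one rating at a time until the whole system settles."""
    ratings = dict(seeds)
    for _ in range(sweeps):
        for p in seeds:
            lo, hi = seeds[p] - 5.0, seeds[p] + 5.0
            for _ in range(60):
                m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
                if (log_total({**ratings, p: m1}, seeds, games)
                        < log_total({**ratings, p: m2}, seeds, games)):
                    lo = m1
                else:
                    hi = m2
            ratings[p] = (lo + hi) / 2
    return ratings

seeds = {"you": 32.0, "fred": 32.0, "other": 32.0}
first_game = [("you", "fred")]                       # (winner, loser)
later_games = first_game + [("fred", "other")] * 3   # fred keeps winning
before = solve(seeds, first_game)["you"]
after = solve(seeds, later_games)["you"]
```

Here `after` comes out higher than `before`, even though the "you" player has not played another game: fred's wins raise fred's rating, which makes the earlier win over fred count for more.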
Q: I set my rank too low initially, can I change it and have the rating
change accordingly?
A: No. You just have to wait for 20 games until your rating is displayed.
After 20 games I would think your rating will stabilize.
If it is more than 5 stones off, then you may need it adjusted.
Q: How can I reset my seed and start over again?
A: You cannot. Try creating a new account instead.
Q: How do handicap games affect the ratings?
A: Handicap games are treated just the same as regular games. If
you cannot keep a rating with a teaching game, you probably cannot keep
the rating.
Q: How accurate are the ratings?
A: Probably to no better than 0.5. Handicaps are usually only set to the
nearest 1.0 (one stone).
Q: Why do the ratings not move 'fast enough'?
A: The ratings do move at about the speed they should. You actually can
move from 25k* to 5k* in a matter of months. The players 10k* and
below are allowed to move more quickly than the stronger players.
This is to allow for people learning. After about 10k*, the ratings
are as slow as they are for every other player.
Q: Why do I have the wrong rating?
A: If you have played 20 games, given a close seed (within a few stones),
you probably have a rating close to your true IGS rating. Your IGS rating
will possibly not be the same as the rank you are used to using in your
local club. There is no absolute strength for a given rank. A one dan on
IGS is different from a one dan anywhere else. The names are the same,
but they are the same names for different strengths.
Q: Why are there drops in my rating?
A: As the system adjusts itself, people's ratings drift higher and higher.
We adjust the ratings DOWN or UP by tracking a few players throughout the
system. For example, if a consistent player gains a stone in strength in a
short time (less than a year), that is a sign of drift.
Q: What are the parts of the rating system?
A: There are three things which describe the ratings: rating, seed, sigma.
The seed is the anchor of your rating, and your rating stays close to it.
Your seed is an old rating from which your current rating is computed.
The sigma is a measure of how accurate your rating is. If the sigma is high,
then the 'believability' of your current seed is low.
Suggestions:
A Few Suggestions
The ratings system will work best when the ratings have had a chance to adjust
themselves to reasonably accurate values. Note that the assignment of the
value 32.00 to "1 dan" is entirely arbitrary, and that this value may drift
over time due to the continual addition and deletion of accounts from the
system. There is also a wide variation in ranks from one country to another,
and so it is best not to pay too much attention to the "rank" that corresponds
to your "rating". In the long run, it is quite possible that all of the "real"
1d players (whatever that means) will have ratings close to 30.00 or 34.00 or
some other value. Pay attention to the DIFFERENCE between your rating and your
opponent's rating, but don't worry too much about the numbers themselves.
The system will also work best when most games are played with handicaps that
are "correct" given the current ratings of the players. Games played with
handicaps that are far too large or far too small may have extreme and
undesirable effects. When you play a game, determine the numerical difference
between your rating and your opponent's, round it *UP* to the next integer,
and use that number of handicap stones. If you think you are giving away too
many stones (or getting too few), perhaps you have been improving or playing
well lately. Use the computed handicap. Losing is good for you. :-) If you
think you are giving away too few stones (or getting too many), perhaps you
have been losing a lot recently. Use the computed handicap. Maybe you will win
and get your confidence back. In the long run, using the proper handicaps will
make the game more enjoyable for everyone.
See also: best FAQ rating stats toggle ratingstats