FUMBBL :: Online Blood Bowl League

As some of you have already seen, the coach rating (CR) displayed on C division match reports looks a bit unusual. The numbers are significantly higher, and they rise and fall more quickly. This is the first step in a larger series of changes that aims to replace the core CR formula with something different (and, I believe, better).

The core formula of the old CR system was a system called Elo (named after the physics professor who came up with it, Arpad Elo), a system used widely in the world of chess. It is essentially a system designed so that winning a match against a stronger opponent gains you more rating than winning against a weaker opponent. This is done by a relatively simple formula, and functions fairly well overall (I won't go into deep detail about that here).
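For readers who want to see the shape of that formula, here is a minimal sketch of textbook Elo in Python. It is not the FUMBBL implementation: the K-factor of 32 and the 400-point scale are the common chess defaults, assumed here purely for illustration, and the asymmetric-match adjustment used on the site is not shown.

```python
def elo_expected(rating: float, opponent: float) -> float:
    """Expected score (between 0 and 1) under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def elo_update(rating: float, opponent: float, score: float, k: float = 32.0) -> float:
    """New rating after one game; score is 1 for a win, 0.5 for a draw, 0 for a loss.

    k=32 is an assumed textbook value, not the FUMBBL setting.
    """
    return rating + k * (score - elo_expected(rating, opponent))


# A 1500-rated coach wins once against a 1700 and once against a 1300.
gain_vs_stronger = elo_update(1500, 1700, 1.0) - 1500
gain_vs_weaker = elo_update(1500, 1300, 1.0) - 1500
```

With these assumed values, `gain_vs_stronger` comes out larger than `gain_vs_weaker`, which is the property described above.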

For FUMBBL, I adjusted the formula slightly to allow for asymmetric matches (i.e., matches where the sides are not equal in strength), and it has been working well over the years. So what's the point of changing it, then? There are a couple of mathematical details which are less than perfect (again, glossing over this here), but the main issue is that Elo ratings don't change unless people play. This means that the top rated coaches will remain at the top indefinitely, which creates an incentive to not play if you're at the top. Clearly, this is a bit of a problem. In the past, people have suggested decaying the rating over time, which is fundamentally an interesting idea but simply does not work in an Elo based system. Removing (or adding) rating to an Elo system will inevitably cause what's called rating deflation (or inflation), where a specific rating is worth more or less at different times. Some of that happens in Elo regardless, for subtle mathematical reasons, but it is why I never added decay to the previous rating system.

Ok, enough of the preamble. What's different? What is this new system and what does it do better?

The new formula is based on a system called Glicko 2 (named after Mark Glickman, a statistician at Harvard University), a successor to his Glicko (1) system. To simplify a complex system (I may delve into more detail in a separate blog at a later date), it can be seen as a more comprehensive variant of the Elo system (and it can, in fact, be configured to model the Elo system perfectly), where additional information is added alongside the raw rating number. The most important addition is the concept of "ratings reliability" (or ratings deviation). To put it more plainly, it's essentially a measurement of uncertainty (as implied by the similarity to standard deviation), and gives an indication of how precise the number is in mathematical terms. A new coach with a rating of 1500 and a rating deviation of 150 would mean that the system thinks the person has a rating somewhere between 1350 and 1650, a fairly wide range. A more experienced coach could still be rated 1500, but have a smaller deviation (say 50), meaning the system would think the person has a rating between 1450 and 1550, a much narrower range. This deviation is taken into account: playing against someone with a larger deviation means that the change in rating for a given game will be reduced, because the opponent's rating is less precise.
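To make the role of the deviation a little more concrete, here is a rough Python sketch. The range helper simply reproduces the plus/minus one deviation examples above. The `g` function and expected-score formula follow Glickman's published Glicko-2 description, where 173.7178 is the conversion constant to the internal scale; the function names are my own, and this is not necessarily how the site computes things.

```python
import math

GLICKO2_SCALE = 173.7178  # converts ratings and deviations to the Glicko-2 internal scale


def rating_range(rating: float, deviation: float) -> tuple[float, float]:
    """Rating plus/minus one deviation, as in the examples above."""
    return (rating - deviation, rating + deviation)


def g(opponent_deviation: float) -> float:
    """Glicko-2 weighting factor: 1.0 for a perfectly known opponent,
    shrinking towards 0 as the opponent's deviation grows."""
    phi = opponent_deviation / GLICKO2_SCALE
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)


def expected_score(rating: float, opponent_rating: float, opponent_deviation: float) -> float:
    """Expected score against an opponent, damped by how uncertain their rating is."""
    diff = (rating - opponent_rating) / GLICKO2_SCALE
    return 1.0 / (1.0 + math.exp(-g(opponent_deviation) * diff))
```

For example, `rating_range(1500, 150)` gives `(1350, 1650)`, and `g(150)` is smaller than `g(50)`, so a result against the less certain opponent carries less weight.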

The more a person plays, the more certain the system will become that the rating is "correct" and the deviation will be reduced. However, if someone does not play for a given period of time (called a rating period), the uncertainty will increase instead. It's worth noting that the rating itself does not change; only the rating deviation increases.
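As a sketch of how the deviation grows during inactivity: in Glickman's Glicko-2 description, a player who sits out a rating period keeps their rating, and the deviation is widened using their volatility. The volatility of 0.06 below is the example value from that description, assumed here for illustration; the values used on FUMBBL may differ, and the function name is hypothetical.

```python
import math

GLICKO2_SCALE = 173.7178  # converts deviations to the Glicko-2 internal scale


def deviation_after_inactivity(deviation: float, volatility: float = 0.06, periods: int = 1) -> float:
    """Deviation after sitting out `periods` rating periods.

    The rating itself is untouched; only the deviation widens.
    volatility=0.06 is an assumed example value.
    """
    phi = deviation / GLICKO2_SCALE
    for _ in range(periods):
        phi = math.sqrt(phi ** 2 + volatility ** 2)
    return phi * GLICKO2_SCALE


one_period = deviation_after_inactivity(50)
thirty_periods = deviation_after_inactivity(50, periods=30)
```

Under these assumptions a deviation of 50 grows by roughly one point after a single missed period, and by noticeably more after thirty.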

The fact that this uncertainty moves as it does allows me to use the Glicko based system to implement "decay" without really affecting the system. Instead of displaying the raw rating, the site displays rating reduced by the deviation (in a somewhat non-linear way). This means that the apparent rating will decrease as time goes by and people who stop playing will lose some of their rating. There is an upper limit for how high that uncertainty will get, meaning there is a floor for how low your rating will drop over time.
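To illustrate the idea only: the sketch below subtracts a multiple of the capped deviation from the raw rating. Everything in it is hypothetical. The actual display formula is non-linear and is not reproduced here, and the weight of 2 and the cap of 350 are placeholder values chosen for the example.

```python
def displayed_rating(rating: float, deviation: float,
                     weight: float = 2.0, max_deviation: float = 350.0) -> float:
    """Hypothetical linear stand-in for the displayed rating.

    The deviation is capped, so the displayed value has a floor
    no matter how long a coach stays inactive.
    """
    capped = min(deviation, max_deviation)
    return rating - weight * capped


active = displayed_rating(1500, 50)       # small deviation, small reduction
returning = displayed_rating(1500, 150)   # same raw rating, larger reduction
floor = displayed_rating(1500, 1000)      # deviation capped at max_deviation
```

Two coaches with the same raw rating of 1500 end up ordered by their deviation, and the cap means the displayed value never drops below `rating - weight * max_deviation`.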

So at its core, a high deviation means your rating will move faster; once you have played enough, the deviation shrinks and your rating stabilizes to an extent. Taking an extended break will increase this uncertainty and you will need a few matches for the rating to settle down again. Overall, it's a very interesting system that feels "right" to me.

With all that being said, the Glicko based system is in place and gets updated as matches are played. I am mostly happy with the current configuration, but will be looking at feedback from people who are interested in CR in general. In particular, the current rating period is set to 2 days, which might be too short, but we will see how it turns out. The deviation doesn't exactly shoot into the sky just because you didn't play in a single rating period.

For now, the only place this rating is shown is in match reports for C division games, and I will be changing over various CR views to the new system over time. Clearly, the coach page needs to be updated to show it, and I need to update the rating page that shows all the different variations.

The multiple ratings we have right now will remain (and will in fact gain another category):

- Overall - shown on the match pages; affected by all C games played
- Open - affected by matches through the gamefinder
- Blackbox - affected by matches through the blackbox scheduler
- Tournament - affected by matches in tournaments (this is new)

In addition, all these exist for each roster as well (so you will have a Tournament Orc rating for example).

I may play around with the rating periods for the different categories (tournament ratings, for example, could switch to monthly or quarterly rating periods, or synchronize with majors). For now it's easier to keep it all the same, and I can absolutely go back and recalculate ratings when configuration changes are necessary.

Either way, feel free to take a look at your past C matches and let me know what you think. I know the data isn't super visible yet, but I aim to resolve that asap.

Happy cherry picking! ;)

Posted by grant85 on 2023-01-11 13:16:57

neat! thanks for the update

Posted by MattDakka on 2023-01-11 13:18:39

Awesome update, thanks for it, Christer!

Posted by Habeli on 2023-01-11 14:54:47

For tournament play the 2 day period seems quite short. Otherwise seems like a nice system.

Posted by commisaro on 2023-01-11 17:20:37

Thanks for this! Might we get an update to the ratings displayed on our profile at some point? My profile shows my [R] rating still even though I only had 10 games in [R], and have played many more [C] games at this point.

Posted by Dynamix on 2023-01-11 17:51:42

Nice I guess for those that are interested in CR but CR fixation is a road to hell IMO

Posted by MattDakka on 2023-01-11 18:02:03

CR is a useful tool to evaluate your performance and provides a goal in a perpetual division. Also, it helps to spot quickly the good coaches who play in the Box. If I want to watch replays of a specific race, I check high CR coaches.
So, CR is nice and the more accurate, the better.

Posted by PurpleChest on 2023-01-11 19:29:44

Sounds good.
Thank you.

Posted by Halfabrain on 2023-01-11 22:02:20

Nice one Christer

Posted by erased000051 on 2023-01-11 23:33:20

I appreciate the change, especially as it concerns the rating of coaches who do not play often; it is not fair that they are always in the top ranks. But I would also appreciate having more details regarding the formula, if only to understand why, for instance, in this game which I drew I lost 1559 points and collapsed to amateur levels.
https://fumbbl.com/FUMBBL.php?page=match&id=4430508

Thank you Christer if you can clarify this point by making the mathematical formula of the rating public. I believe there are many people on this site who know the rudiments of mathematical analysis

Posted by C0ddlefish on 2023-01-12 00:07:07

That's clearly an error/bug. It's all based on Glicko-2. If you can understand that even after an explanation you are a better person than me

Posted by Christer on 2023-01-12 12:09:49

Yes, there's a bug that affects a portion of the matches and CR just goes down the drain. Investigation is in progress, and I'll try to figure it out tonight. All the CRs will be recalculated, so it will be fixed eventually.

Posted by mayhemzz22 on 2023-01-12 12:46:27

like the idea of this and interesting to see how it plays out :)

noticed an issue with my last game: I seem to have knocked -1146 off my opponent but gained 0 myself.

https://fumbbl.com/p/match?id=4430589

Posted by MattDakka on 2023-01-12 14:17:08

My last match, if it can be useful: https://fumbbl.com/FUMBBL.php?page=match&id=4430597

Won 0, my opponent lost 1314.

Posted by erased000051 on 2023-01-12 14:30:07

Christer thank you very much for the effort to fix it! Very appreciated. You're great!

Posted by garyt1 on 2023-01-12 14:40:32

Good job Christer. So if the deviation increases but mean doesn't then is the decay element just that a person with larger CR uncertainty would drop below someone with identical CR but less deviation?

Posted by Christer on 2023-01-12 16:15:27

Correct garyt1. The core rating doesn't change (which maintains the integrity of the rating system as you're not just artificially removing rating from it all). Glicko isn't inherently zero-sum like Elo so outright removing rating isn't inherently a disaster.

Posted by BigChiefSittingDuck on 2023-01-12 21:05:14

Interesting update, thank you Christer! Don't know if it's possible, but I think it would be great if the CR change on a match result reflected the 'rock - scissor - paper' nature of Blood Bowl. For example, at the moment if my halflings take on some dwarfs, that is obviously a very hard match up whoever you play, but if I win I currently get the same CR boost as if I'd played some other stunties played by the same opposing coach. It would be great if you could add to your calculations, say, a 20% boost if your 'paper' side just beat a 'scissors', no alteration if you beat another 'paper', or a 20% deduction if you just beat a 'rock', as that was one you were expected to win. I'm not that interested in CR myself (being primarily a halfling and goblin coach :D), but would that be possible? I think that would increase accuracy if you are interested in CR, plus as a bonus it would give coaches an incentive to take on hard match-ups rather than dodge them (which undoubtedly happens a lot on the gamefinder for all kinds of sides).

Posted by Eki on 2023-01-13 08:22:54

Oh yeah. It's a great update. Removing non-playing trainers is best. In 2022 I think I only played 1 RRR. Congratulations Christer, because a change like the one you have made was needed.

Posted by SanKuKai on 2023-01-17 16:44:27

Huhuhu I understand absolutely nothing about the ranking system... a week ago I was an emerging star (i'm coming for youuuu...!), yesterday I was a Star (WOW!!!: D), and today after a victory I am a Veteran (show me respect!)... If all goes well, tomorrow I will be a legend? :D

Posted by Verminardo on 2023-01-18 20:57:50

Strangely, despite all the valid points that have been made over the years on what CR does and doesn't say, I don't like the look of "Emerging Star" at the top of my profile. I don't like it at all. This is a ploy to lure estranged old hands like myself back into playin [C], isn't it? Well played, Christer, well played indeed.

If you will now excuse me, I have some teams to build for the Box Trophy.

