MattDakka



Joined: Oct 09, 2007

Posted: Apr 28, 2021 - 15:15

The CR system works quite well. (As an aside, after the last recalculation my CR got slightly higher, so I'm not personally pissed off by it; I just think it could be improved further, especially for coaches who regularly play tier 3 teams.)
Just a small comment: while it's true that TV is not very accurate and that you can make a tier 1 team with sub-optimal choices (for example a CD team made up only of Hobgoblins, or an Undead team made up only of Zombies), these are not common rosters. If you choose to play a tier 3 team, on the other hand, the roster is always going to be sub-optimal no matter its composition, and in my humble and personal opinion this should be taken into account.
That said, thanks for bothering to recalculate the rankings recently, Christer.
koadah



Joined: Mar 30, 2005

Posted: Apr 28, 2021 - 16:05

Apologies for still not getting it and combining probably incorrect assumptions.

Doesn't the higher TV team (before inducements) generally win more often than the lower TV teams?
Aren't the stunties, especially the long-lived ones, more likely to be the lower TV team?

Don't the chances of winning generally increase as the TV advantage increases?

That is the kind of thinking that makes including the inducements in the TV seem unfair.

Or have I got completely the wrong end of the stick?
Is this to fix a specific exploit?

_________________
Image
O[L]C 2016 Swiss! - April ---- All Stars - Anniversary Bowl - Teams of Stars - 13th March
Christer



Joined: Aug 02, 2003

Posted: Apr 28, 2021 - 16:33
FUMBBL Staff

The direct reason for this latest change was Snotling teams playing games at 250k TV base (through RRRs for example), bringing 500k+ inducements into the games. Zero risk because a loss was a 0 change, and a large increase for each win.

As for the typical win rates, the game is far, far more lenient on TV differences than most people believe. The running statistics say that you basically have a minimum of a 40% chance to win a game regardless of TV difference (up towards 1M difference; stats above that become volatile due to lack of data). This means that CR is a much more powerful indicator of win probability than TV is. The latest change brings the CR system more into line with that than before, in addition to getting rid of the Snotling "issue".
Halfabrain



Joined: Jan 20, 2018

Posted: Apr 28, 2021 - 16:35

Christer, I take my hat off to you for all you've done and continue to do. Fumbbl is a major part of what leisure time I have, and I will continue to enjoy and support it no matter what.

I suppose that much of this discussion comes from a fundamental misunderstanding of what you mean by CR. It is essentially a number representing your chance of winning a game, not a pure measurement of coach ability. If one only chooses tier 1 teams then that number goes up; if only tier 3, it goes down. Obviously good coaches will always have a higher CR than bad ones, but my gripe was that the vagaries of the scheduling system combined with the CR system, particularly the inducements aspect, lead to games and CR results that seemed unfair.
It doesn't take a good coach to tell that one team is better than another; the 12-year-olds at my local games club can tell that. Any competent coach can have an excellent record playing mismatched games. That doesn't make them a good coach, just good at creating favourable conditions, which frankly is not especially difficult.
koadah



Joined: Mar 30, 2005

Posted: Apr 28, 2021 - 17:09

Thanks Christer.

So it's them free snotlings wot dunnit.

Nelphine



Joined: Apr 01, 2011

Posted: Apr 28, 2021 - 17:21

Yeah, I think the fundamental misunderstanding I have is that I think of CR as being related to systems used by other games, like MMR for Starcraft 2.

In other games, this stat is used primarily as a matchmaking tool, with the goal being that 2 coaches of roughly equal 'rank' should have roughly 50/50 chances of winning.

However, in all cases that I know of for other games, your choice of team does NOT fundamentally change your chance of winning. In chess, yes, the choice of team does technically make a difference; but only for a very very small subset of games played. Over thousands of games, with coaches of roughly equal skill, choice of team won't noticeably impact win rates. Similarly, in Starcraft, while the 3 teams all play very differently, the win rates for coaches of roughly equal skill will not fundamentally be different.

Further, for many games, if there ARE differences in win rates that can be attributed to a team choice, they promptly get updated to try to account for that (such as the regular patches Starcraft 2 had in its first several years of existence.)

However, here with BB, we have teams that ARE actively designed to be worse. They DO fundamentally alter win rates.

So this leads to a misunderstanding for me. I'm expecting CR to (if we ignore the TV problem, which basically just makes for an infinite number of team choices, instead of just one team per race) result in two coaches who are playing with the same CR to have roughly the same chance of winning.

If coaches who do not have the same chance of winning play each other (but all things other than CR are equal), then the one who has a larger chance of winning, wins less CR if they win; and if they lose, they lose more CR.

So to me, then two coaches who do not have the same chance of winning (but CR is equal and something ELSE is unequal) should have the same rule applied; the one who has a larger chance of winning, wins less CR if they win; and if they lose, they lose more CR.

In particular for me, the system should be designed such that over a large number of games, and assuming their current CR is 'correct' (meaning that over thousands of games, their current CR is where their CR would stabilize) then:

If all else is equal, and they play always coaches with the same CR as they have, then they should win half their matches. This then defines how large a sample needs to be used, because this is a dice game, and so any given game can appear very lopsided. But over enough games, the dice would balance out, and so eventually it would tend towards 50% wins.

Then, we would use this large sample size for the rest.

Next, assuming all else is equal EXCEPT CR of opponents, then playing against a given CR all the time should result in a certain number of wins or losses. Each win or loss should then correspond to an appropriate change in CR such that the CR still tends towards that 'correct' value we previously defined. HOWEVER, since we actually almost never have large enough sample sizes, this actually means that CR of opponent, and CR of the player, is usually going to be off by a bit from their actual 'correct' CR. So using 'bands' of CR for this calculation is better, as it allows for a better approximation given the small sample sizes we use. So this paragraph becomes about playing against different categories of CR, instead of playing against different CR.

But then, BB has team choice. This is designed to have different win rates based on the team chosen. In theory this is roughly broken down into tiers (because TV makes things too crazy to try to actually account for everything accurately.) So then, CR should be designed such that if two coaches of equal 'correct' CR category play each other, but with different tiers of teams, then over a large sample of games, the difference in wins should still result in both coaches ending up with their 'correct' CR, in the same way that a different win rate due to different CR should still result in both coaches ending up with their 'correct' CR.

For this purpose, I'd specifically expect us to look at tiers of teams - for the same reason we look at categories of CR - instead of looking at specific teams or even team matchups.


This would NOT result in a 50/50 game each time. Instead, it would end up with whatever the expected win rate for a given TEAM has. So if the game expects that a tier 3 team should have a 25-30% win rate, then that would become the 'expected' win if a coach of a tier 3 team plays against opponents of equal CR.
Christer



Joined: Aug 02, 2003

Posted: Apr 28, 2021 - 17:23
FUMBBL Staff

First of all, you're vastly underestimating how hard it is to judge if a team is better than another. Sure, it's easy to see that a properly built CD team is better than an all-hobgoblin team with random skills, but for the most part there's a *huge* area where I doubt people can accurately judge win chances for two teams given equal ability coaches.

Most of your 12 year olds at the local games club will look at TV as their primary data and maybe do a scan of rough build choices, but it's very very hard to accurately predict which team will win more (and it's not practical to even test this given it would take hundreds of hours to do so).

Note, for the remainder of this post, I want to stress that being a "bad coach" doesn't make you a bad person. In this context, a "bad coach" is simply someone who isn't competing at the top level. I absolutely do not mean to say that just because you're playing a suboptimal minmaxed build, you're a worse person or that your choices on how to have fun aren't valid. The discussion here is strictly a ranking and competitive perspective where you're fighting to be one of the most successful coaches on the site.

Does playing tier 3 teams make you a bad coach? Quite frankly, yes. In the eyes of a ranking system you are deliberately making a poor choice given the current meta and therefore you are "bad". In all competitive games where you have active choices, there is an inherent body of knowledge about what is good and what is bad (a "meta"). If you are not following this meta you are either very, very good (meaning you are one of the few who move the meta forward with new ideas) or you're simply not among the top bracket of players (or, more specifically, not as highly rated as you could potentially be).

Again, though, I stress here that playing something deliberately worse because it's fun is absolutely a valid decision, but complaining that your ranking goes down when you make choices that reduce your win rate seems a bit crazy to me. The "meta" choices you make are absolutely a part of CR, and if you are aiming to be the top rated coach on the site you simply shouldn't be making deliberately bad ones.

As for what CR changes are "unfair" and how CR should move, it's honestly completely arbitrary. The CR numbers mean absolutely nothing in isolation. A 0.62 CR increase/decrease is meaningless on its own, and you really have to look at the system as a whole. You need to compare it to the overall population of rankings of the active coach-base. This includes the average and standard deviation (and we're more often than not assuming a normal distribution here, which is a fair but obviously not perfect model). If I increase the k-value of the rating system, or the dCR divider in the formula, the absolute numbers get scaled up or down but the relative spread of coach CRs remains the same. If I change the starting CR from 150 to 200, everyone will simply get +50 CR across the board with no additional change. If I change the k value, the standard deviation will change but the relative positions will be largely unaffected. The same goes for the CR or TV dividers used in the formula. These things have little actual effect, and numbers will simply move without relative positions of coaches being affected. You'll simply see -0.1 instead of -0.6 (or whatever).
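
To make the point about arbitrary absolute numbers concrete, here is a minimal sketch. It uses the textbook Elo update, not FUMBBL's actual CR formula (which isn't spelled out in this thread), and the `k`, `divider`, start values and the list of results are all assumptions made up for illustration. It suggests that moving the starting rating from 150 to 200 shifts everyone by the same +50, and that scaling `k` and the divider together stretches the spread without changing who is ranked above whom.

```python
def expected(ra, rb, divider=400.0):
    """Textbook Elo win expectancy for the coach rated ra (assumed form)."""
    return 1.0 / (1.0 + 10 ** ((rb - ra) / divider))


def run_league(results, n_coaches, start=150.0, k=2.0, divider=400.0):
    """Replay (coach_a, coach_b, score_a) results and return the final ratings."""
    ratings = [start] * n_coaches
    for a, b, score_a in results:
        p = expected(ratings[a], ratings[b], divider)
        delta = k * (score_a - p)  # the "S - p" term scaled by k
        ratings[a] += delta
        ratings[b] -= delta
    return ratings


def ranking(ratings):
    """Coach indices ordered from highest to lowest rating."""
    return sorted(range(len(ratings)), key=lambda i: -ratings[i])


# Made-up results: 1.0 = coach_a won, 0.5 = draw, 0.0 = coach_a lost.
results = [(0, 1, 1.0), (1, 2, 0.5), (0, 2, 1.0), (2, 1, 0.0), (0, 1, 1.0)]

base = run_league(results, 3, start=150.0)
shifted = run_league(results, 3, start=200.0)
scaled = run_league(results, 3, start=150.0, k=4.0, divider=800.0)

print([round(s - b, 6) for s, b in zip(shifted, base)])  # per-coach shift
print(ranking(base) == ranking(shifted) == ranking(scaled))
```

Changing only `k` (without the divider) would not scale the numbers exactly in this textbook form, which fits the "largely unaffected" wording above.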

The real reason for this discussion at all is that the formula changed from a state where win probability for underdogs was estimated far too low (remember, you have a roughly 40% chance to win against *any* TV difference from a statistical perspective assuming an equal CR opponent). So a game where you previously lost 0.2CR for a loss where you're the underdog will be a higher loss with the new system. The thing, though, is that the new system is significantly closer to a realistic expected win rate than before. Going from -0.2 to -0.6 "feels" rough because it's harsher than before and not because the direction of the change was bad. It's a subjective opinion and not based on any kind of data or analysis. That's a terrible starting point for a meaningful discussion, which is why I said that if you have suggestions for changes or ideas for improvement, you simply need to have the basics down. If you don't and have a starting point based on subjective opinion it'll just be a waste of time.
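
As a rough illustration of why the same loss can cost more once the underdog's win chance is estimated higher, the sketch below uses the textbook Elo change `k * (S - p)`. FUMBBL's real constants are not given in this thread, so `k = 2.0` and the two probabilities are assumed values, picked only so the losses resemble the -0.2 and -0.6 mentioned above.

```python
def cr_change(p_win, won, k=2.0):
    """Elo-style rating change: k * (S - p), with S = 1 for a win and 0 for a loss."""
    score = 1.0 if won else 0.0
    return k * (score - p_win)


# Assumed estimates of the underdog's win probability before and after the change.
old_estimate = 0.10
new_estimate = 0.30

for label, p in (("old", old_estimate), ("new", new_estimate)):
    loss = cr_change(p, won=False)
    win = cr_change(p, won=True)
    print(f"{label}: p={p:.2f}  loss={loss:+.2f}  win={win:+.2f}")
```

Under these assumptions, a higher estimated win chance makes a loss cost more and a win pay less, which is the direction of change described above.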
MattDakka



Joined: Oct 09, 2007

Posted: Apr 28, 2021 - 17:50

Since playing a tier 3 team carries a high risk of losing the game, there should be an adequately high CR reward. High CR loss risk, high CR reward.
When people bet on horses or football teams, if they bet on the horse/team with the lower win % they win a lot of money if the horse/team wins (and, the other way around, they don't lose much money, because defeat was likely).
They deliberately gamble and pick the low win % in order to win more money.
CR should be considered like betting money. When you gamble to play a tier 3 team you should gain more CR/lose less CR, in order to balance the risk/loss reward.
I don't think that a coach playing tier 3 in a competitive division is necessarily a bad coach; everybody knows that tier 3 is going to lose most of its games.
There are many top coaches playing tier 3, and they surely know they have fewer chances to win a game.
I think that their "gamble" of playing a weak tier 3 team should be encouraged by letting them win more CR points and lose fewer.
After all, they are not going to break the ranking system, because tier 3 teams are, by design, mechanically bound to lose more than they win.
Only if the CR win/loss ratio is compensated with a CR bonus will people bother to play tier 3 in a competitive division.
That said, I'm not complaining about the current CR system, I don't even play tier 3 anymore, but maybe, with a different CR gain/loss ratio, I could bother playing them from time to time.
I can live without playing tier 3 teams but honestly it's a bit sad that playing them in a competitive division is not encouraged. Racial variety is always nice to have in my opinion.
This was my last tier 3 game in the Box:
https://fumbbl.com/p/match?id=3987435
I personally don't think that that CR loss was fair, but for sure, it made me stop playing tier 3 with this CR system.
Gaining CR when you are a Legend takes too many games just to throw it to the wind if you dare to play a game with a tier 3 team and lose it.


Last edited by MattDakka on Apr 28, 2021 - 18:02; edited 1 time in total
koadah



Joined: Mar 30, 2005

Posted: Apr 28, 2021 - 17:59

Christer wrote:
...(remember, you have a roughly 40% chance to win against *any* TV difference from a statistical perspective assuming an equal CR opponent).


Is that really true for Flings & Gobbos though?

Is it that people feel that other stunties have been stitched up because of the stinking snotlings? :mrgreen:

If this happens "too often" will it be a further disincentive to play those teams?

Christer



Joined: Aug 02, 2003

Posted: Apr 28, 2021 - 18:07
FUMBBL Staff

First off, good summary of your understanding of the topic, Nelphine. I have a couple of quick comments though.

Nelphine wrote:
If all else is equal, and they play always coaches with the same CR as they have, then they should win half their matches. This then defines how large a sample needs to be used, because this is a dice game, and so any given game can appear very lopsided. But over enough games, the dice would balance out, and so eventually it would tend towards 50% wins.


This is inherently true, but also a bit "backwards" in a sense. The ELO system is directly designed to move towards exactly this. If we ignore the inherent team balance (either by the two coaches playing identical teams, or enough varied games to make it even out), they will have an expected 50% win rate if they start out at equal CR. However, the ranking system is mathematically designed to move towards making the expected win rate (p) equal the actual one. So if the two coaches started out at equal CR (say new coaches at 150), but were actually not equal, the CR system will converge towards the point where the expected p value equals their actual average win rate.

It's not the win rate that tends towards the CR, but the CR that tends towards the win rate. Slight difference here, but I thought I'd mention it.
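
A minimal sketch of that convergence, assuming the textbook Elo expectancy rather than FUMBBL's actual formula (the `k`, `divider` and the 60% true win rate are made-up values). Instead of rolling dice, each step applies the average update `k * (true_win_rate - p)`, which keeps the example deterministic. Draws are ignored.

```python
def expected(gap, divider=400.0):
    """Textbook Elo win expectancy for a rating lead of `gap` points (assumed form)."""
    return 1.0 / (1.0 + 10 ** (-gap / divider))


def settle(true_win_rate, start=150.0, k=2.0, divider=400.0, games=20000):
    """Repeatedly apply the average per-game update until the ratings settle."""
    a = b = start
    for _ in range(games):
        p = expected(a - b, divider)
        drift = k * (true_win_rate - p)  # average of k * (S - p) over many games
        a += drift
        b -= drift
    return a, b, expected(a - b, divider)


# Two "new" coaches at 150, where coach A actually wins 60% of the time.
a, b, p = settle(true_win_rate=0.60)
print(f"A={a:.2f}  B={b:.2f}  expected p={p:.4f}")
```

The ratings drift apart until the expected `p` matches the 60% actual win rate and then stop moving, which is the "CR tends towards the win rate" direction described above.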


Nelphine wrote:
So using 'bands' of CR for this calculation is better, as it allows for a better approximation given the small sample sizes we use.


This is a misunderstanding of why the brackets ("bands") are there. The ranking system we use is based on ELO, which traditionally estimates a win rate for a set of matches (a tournament) and then after this set of matches has been played compares the actual to the expected (S-p in the formula). With our modification, we calculate CR change after each match because it's "more fun" at the cost of a bit more volatility and faster movement in general.
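
The difference between the two approaches can be sketched as follows. This is textbook Elo with assumed constants (`k = 2.0`, divider 400), not the site's code, and the opponents' ratings are held fixed to keep the example short.

```python
def expected(ra, rb, divider=400.0):
    """Textbook Elo win expectancy for the coach rated ra (assumed form)."""
    return 1.0 / (1.0 + 10 ** ((rb - ra) / divider))


def tournament_update(rating, opponents, scores, k=2.0):
    """Traditional approach: one update after the whole set of matches."""
    expected_total = sum(expected(rating, opp) for opp in opponents)
    return rating + k * (sum(scores) - expected_total)


def per_match_update(rating, opponents, scores, k=2.0):
    """Per-match approach: the rating moves after every single game."""
    for opp, score in zip(opponents, scores):
        rating += k * (score - expected(rating, opp))
    return rating


opponents = [150.0, 150.0, 150.0]  # opponent ratings held fixed for simplicity
scores = [1.0, 1.0, 1.0]           # three straight wins

batch = tournament_update(150.0, opponents, scores)
stepwise = per_match_update(150.0, opponents, scores)
print(batch, stepwise)
```

The two results differ slightly because the per-match version recomputes `p` from the already-updated rating before each game.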

Also, traditionally in ELO systems, you have a limit to how many games can be played against opponents vastly "weaker" than you, essentially by segregating the highest level from the newcomers.

FUMBBL chooses not to add these limitations in the name of inclusion, and therefore we must add guards against cherry-picking behaviour. This is what the CR bracket system does. The impact of non-upsets is dampened with large CR differences, a lot more than the standard ELO formula allows for (even with the non-linear nature of the CR amplification that is in place). The brackets are not there to deal with small sample sizes.
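
One way such a guard could look is sketched below. The bracket boundaries and multipliers are entirely hypothetical (the real ones are not given in this thread), as are `k` and the divider. The only idea taken from the text is that a result where the favourite wins counts for less as the CR gap grows, while upsets keep their full weight.

```python
# Hypothetical brackets: (upper bound on CR gap, multiplier for non-upset results).
# These numbers are invented for illustration only.
HYPOTHETICAL_BRACKETS = [(10.0, 1.0), (20.0, 0.5), (float("inf"), 0.1)]


def expected(ra, rb, divider=400.0):
    """Textbook Elo win expectancy for the coach rated ra (assumed form)."""
    return 1.0 / (1.0 + 10 ** ((rb - ra) / divider))


def dampening(cr_gap):
    """Multiplier applied to non-upset results for a given CR gap."""
    for upper, factor in HYPOTHETICAL_BRACKETS:
        if abs(cr_gap) < upper:
            return factor
    return HYPOTHETICAL_BRACKETS[-1][1]


def cr_change(cr_a, cr_b, score_a, k=2.0):
    """Change for coach A; score_a is 1.0 (win), 0.5 (draw) or 0.0 (loss)."""
    raw = k * (score_a - expected(cr_a, cr_b))
    favourite_won = (cr_a > cr_b and score_a == 1.0) or (cr_a < cr_b and score_a == 0.0)
    if favourite_won:
        raw *= dampening(cr_a - cr_b)  # non-upset: shrink the movement
    return raw


print(cr_change(180.0, 150.0, 1.0))  # favourite wins: dampened gain
print(cr_change(180.0, 150.0, 0.0))  # upset: full-size loss for the favourite
```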


Nelphine wrote:
This would NOT result in a 50/50 game each time. Instead, it would end up with whatever the expected win rate for a given TEAM has. So if the game expects that a tier 3 team should have a 25-30% win rate, then that would become the 'expected' win if a coach of a tier 3 team plays against opponents of equal CR.


The core issue I have here is that you're suggesting adding a system on top of CR that fundamentally is an approximation trying to cover the fact that TV is a terrible estimate for team power. Adding a subjective bonus to these teams is basically equivalent to opening up for people to exploit the edge cases.

If I venture into this territory of trying to cover TV problems, I'll do it by reintroducing strength in some way rather than adding another system on top of the ranking formula.

We had an advanced tiering system in place for a while, where each race had a range of CR-like numbers that were banded by TV and opponent race. It didn't work very well and it mostly just pushed everything more towards a 50% win probability and acted like a dampener more than anything else. Just mostly a complex way to reduce the standard deviation of CRs. This was removed last time around after evaluating the actual effect of it (basically none).
Halfabrain



Joined: Jan 20, 2018

Posted: Apr 28, 2021 - 18:16

I understand where you're coming from and what you're saying makes perfect sense. I haven't looked into the math of it in much detail (or in fact any detail...). I suppose I was just making an observation that perhaps wasn't very useful in a meaningful way.

It's simply that several discussions I've had over the last week raised the point that the CR system did not seem to recognise just how hard it is to win, or even not to lose in a hopelessly demoralising manner, a game where you are coaching a Tier3 race against very high TV opponents, specifically heavy bash teams like Chorfs, Chaos or Nurgle.

Of course you are perfectly right that these games are designed to be very hard to win and in the grand scheme of the CR system are few enough so as to be statistically irrelevant but that doesn't mean stunty coaches don't resent the cosmic unfairness of it all or don't despise those coaches who seem to play nothing else but high TV killers. Doing that might help make you a "good" coach but it also makes you a bad person :D
Christer



Joined: Aug 02, 2003

Posted: Apr 28, 2021 - 19:00
FUMBBL Staff

koadah wrote:
Christer wrote:
...(remember, you have a roughly 40% chance to win against *any* TV difference from a statistical perspective assuming an equal CR opponent).


Is that really true for Flings & Gobbos though?


Graphing goblin games played in B shows them starting out at roughly 30% win rate at equal TV and not really dropping as the opponent gets higher. In fact, data seems to imply that at -350k or so you have a higher chance to win than at 0TV delta.

As gobbos become the "overdog", they climb linearly up to 45% or so at +200k and then stagnate (high volatility, hard to tell actual numbers).

Flings are similar with maybe 32% win rate at 0TV difference, staying the same down to maybe -400k, and linearly increasing to 45-47% at +220k.

I can't be bothered uploading these charts to the site to persist here on the forums, but if you're super interested, poke me on Discord and I can paste them to you there.

Now, as you may imagine, adding the petty cash into the mix, the charts change. The overdog behaviour is roughly the same. The underdog behaviour, however, is very different.

For goblins, win rates go up from maybe 35% at a slight disadvantage all the way up to roughly 45% at -250k.

Flings are even more extreme, with a big jump to 40% at -70k and increasing further as they become more of an underdog. At -360k or so, they're up towards 60% win rate, outperforming any other TV difference in either direction.

And this highlights the problem from prior to this change. The expected win rate was dropping while actual win rates were going up.
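
For readers who want the base-TV figures from earlier in this post in a usable form, here is a rough piecewise-linear reading of them (petty cash excluded). The function names are hypothetical, the curve shape is a simplification of the description rather than the actual charts, and it ignores the hint that goblins may do slightly better at around -350k than at 0.

```python
def base_tv_win_rate(tv_delta_k, floor, cap, cap_delta_k):
    """Approximate win rate from base TV difference.

    tv_delta_k: own TV minus opponent TV, in thousands (negative = underdog).
    """
    if tv_delta_k <= 0:
        return floor  # roughly flat for the underdog, per the quoted figures
    if tv_delta_k >= cap_delta_k:
        return cap    # stagnates here; the data is described as volatile
    return floor + (cap - floor) * tv_delta_k / cap_delta_k


def goblin_win_rate(tv_delta_k):
    # ~30% at equal TV, climbing linearly to ~45% at +200k.
    return base_tv_win_rate(tv_delta_k, floor=0.30, cap=0.45, cap_delta_k=200)


def halfling_win_rate(tv_delta_k):
    # ~32% at equal TV; 0.46 is the midpoint of the quoted 45-47% at +220k.
    return base_tv_win_rate(tv_delta_k, floor=0.32, cap=0.46, cap_delta_k=220)


for delta in (-350, 0, 100, 200):
    print(delta, round(goblin_win_rate(delta), 3), round(halfling_win_rate(delta), 3))
```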
Nelphine



Joined: Apr 01, 2011

Posted: Apr 28, 2021 - 19:00

And yes I agree it's a horrible estimation.

But SOMETHING is better than nothing for tier 3 teams, which is what we have right now.

Yes a team strength would be way way better. I've done work on that, I've seen your old work on that.

But that takes a lot of time and effort.

Doing:
Tier 1 - everyone else
Tier 2 - vampire, underworld(?), khemri(?)
Tier 3 - ogres, snots(?), halflings, goblins

Is something that could be implemented without a full team strength system. And I'd be fine if all the races with a question mark are moved up a tier.
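
A minimal sketch of what such a tier lookup might look like, purely to make the proposal concrete. Nothing here is an existing FUMBBL mechanism: combining the CR-based expectation with a tier offset in log-odds space is an assumed design choice, the 0.275 figure is just the midpoint of the 25-30% tier 3 win rate floated earlier in the thread, and tier 2 is left at a placeholder offset of 0.0 because no number was suggested for it.

```python
import math

# Tiers as listed above; any race not named defaults to tier 1.
TIERS = {
    "vampire": 2, "underworld": 2, "khemri": 2,
    "ogre": 3, "snotling": 3, "halfling": 3, "goblin": 3,
}

TIER3_VS_TIER1 = 0.275  # assumed: midpoint of the 25-30% suggested in the thread


def logit(p):
    return math.log(p / (1.0 - p))


# Log-odds offsets per tier. Tier 2 is a placeholder (no figure in the thread).
TIER_OFFSET = {1: 0.0, 2: 0.0, 3: logit(TIER3_VS_TIER1)}


def tier_adjusted_expectation(p_from_cr, race_a, race_b):
    """Shift coach A's CR-based win expectancy by the tier gap between the races."""
    shift = TIER_OFFSET[TIERS.get(race_a, 1)] - TIER_OFFSET[TIERS.get(race_b, 1)]
    return 1.0 / (1.0 + math.exp(-(logit(p_from_cr) + shift)))


print(tier_adjusted_expectation(0.5, "goblin", "orc"))       # equal CR, tier 3 vs tier 1
print(tier_adjusted_expectation(0.5, "goblin", "halfling"))  # equal CR, same tier
```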
Christer



Joined: Aug 02, 2003

Posted: Apr 28, 2021 - 19:08
FUMBBL Staff

Nelphine wrote:
But SOMETHING is better than nothing for tier 3 teams, which is what we have right now.


Nope, it's worse. The previous post highlights why it's a terrible idea to blanket boost CR for tier 3 teams and shows why you shouldn't be just making assumptions without a data-driven foundation.

Yes, you're at a disadvantage from a CR perspective picking stunty teams. I'm fine with that.
MrCushtie



Joined: Aug 10, 2018

Posted: Apr 28, 2021 - 21:16

MattDakka wrote:

CR should be considered like betting money. When you gamble to play a tier 3 team you should gain more CR/lose less CR, in order to balance the risk/loss reward.

vs
Christer wrote:

It's not the win rate that tends towards the CR, but the CR that tends towards the win rate. Slight difference here, but I thought I'd mention it.


So basically, a racial-boosted CR would make MattDakka more likely to play goblins (variety, which is good), but it wouldn't reflect that it's unlikely that he'd win as much if he did play goblins all the time.

My own feeling is that you want to focus on the race-specific coach rating, so you know how good you are vs other coaches with the same race. And I can see part of that by looking at this but it's not something that's as readily available as the CR change on the match report, and it's harder to compare yourself to others (eg top Box Goblin coach has a CR of 153.97, whereas top Chorf is at 171.61)
