VoodooMike



Joined: Nov 07, 2010

Posted: Dec 14, 2011 - 05:56

Irgy wrote:
They're not, because they could save you spending undue time on a fundamentally flawed system. Well, the criticisms more so than the assurances. Plus you don't have to just take my word for it, I have no doubt you could come up with a better system yourself in less time than I expect you'll spend defending your original idea.

All you've said is "Nuh uh, start over, I'm not going to get specific but don't criticize my criticisms" and that's kind of douchey. I don't agree with what you're calling a fundamental flaw in the idea, and you've said yourself it's nothing we can look for in the data.

You also say that using W/L/D is enough, but you don't seem to understand that even if we didn't factor in racial matchups, it would take many games before we could compare a team's record to other teams of the same race (after their first game they have either a 100%, 50% or 0% rating, for example). In order to not require a large number of games, the comparison has to be made on a match level.

If we use win percentage on a match level, based on racial match-up, we're going to see the average number be somewhere between 0 and 1, but not either of those values. Since, on a match level, you can only win, tie, or lose, that means the only difference between THAT system, and, say, the ELO based systems is whether or not a draw increases or decreases your rating. A win would always increase it, a loss would always decrease it.

Again, if you have specific information and ideas, that's cool, let's hear them... but saying "in my pocket I have something, but I'm not showing it to you, but take my word on it, it's cool" is not helpful to anybody.
Cocinero



Joined: Sep 14, 2011

Posted: Dec 14, 2011 - 06:15

Do we believe that coaching skill is equally distributed among the 24 races? The ideas you present seem fine to me for building a Race Coach Rating, but summing over all the races does not seem like a good plan. Anyway, it would not hurt.
Irgy



Joined: Feb 21, 2007

Posted: Dec 14, 2011 - 07:21

VoodooMike wrote:
and you've said yourself it's nothing we can look for in the data.


I haven't said you can't look for it at all, I've said the extent of the problem is difficult to quantify. The issue is whether score differences (within a given W/D/L result) actually correlate with skill. Since this is central to any advantage of your method, if this is difficult to quantify that's a problem for your side of this argument not mine.

VoodooMike wrote:
You also say that using W/L/D is enough, but you don't seem to understand that even if we didn't factor in racial matchups, it would take many games before we could compare a team's record to other teams of the same race (after their first game they have either a 100%, 50% or 0% rating, for example). In order to not require a large number of games, the comparison has to be made on a match level.


Adding bad information might make more data, but it doesn't necessarily make better data.

I think you're also missing the boat when you start talking about the amount of available data. Especially since your idea suggests (as I read it) discretising to "better" and "worse" - from an information theory perspective you actually lose more information by throwing away draws than you gain by shifting the odds closer to 50/50.

VoodooMike wrote:
If we use win percentage on a match level, based on racial match-up, we're going to see the average number be somewhere between 0 and 1, but not either of those values. Since, on a match level, you can only win, tie, or lose, that means the only difference between THAT system, and, say, the ELO based systems is whether or not a draw increases or decreases your rating. A win would always increase it, a loss would always decrease it.


I'm really not sure what your point is here? Yes, your rating should go up if you win and down if you lose, and possibly change if you draw. The trick is, if it was a difficult matchup it goes up by more than it would go down. So if, on average, you're winning as often as expected, it stays the same. Magic.

Yes it takes a while to average out, but this is a fundamental problem that a player's results are not a strong indication of their skill if the number of games is small. Your method does not substantially improve things in that regard.
VoodooMike



Joined: Nov 07, 2010

Posted: Dec 14, 2011 - 08:06

Irgy wrote:
I haven't said you can't look for it at all, I've said the extent of the problem is difficult to quantify. The issue is whether score differences (within a given W/D/L result) actually correlate with skill. Since this is central to any advantage of your method, if this is difficult to quantify that's a problem for your side of this argument not mine.

Given that we do not have an objective measure of skill to look at correlations with, this is a bit of a circular argument. Since the game is about scoring more TDs than the opponent (a requirement for winning the game) we can say that there will be some correlation between scoring more touchdowns than your opponent, and your skill in the game. We can also say that skillful players are ABLE to score more touchdowns against lesser coaches, and able to reduce the number of TDs the opponent scores.

Your objection is that people are not required to do so, and that's fine, but it's not as much of a problem as you seem to imagine it is. If someone is able to perform better than they actually do, then they are, by definition, not playing to the best of their ability, so the rating of their skill will not be as high as it could be if they did. It's far easier to pretend to be worse than you are than it is to pretend to be better than you are.

Irgy wrote:
I think you're also missing the boat when you start talking about the amount of available data. Especially since your idea suggests (as I read it) discretising to "better" and "worse" - from an information theory perspective you actually lose more information by throwing away draws than you gain by shifting the odds closer to 50/50.

I don't think you've fully grasped what is going on here. Nobody is throwing away draws - win percentage in BB is calculated by adding wins to half the number of draws, and dividing by the total number of games. That means draws are absolutely taken into account, but it means that the win percentage of all races will be between, but not at, 0 and 1. A game's victory value can only be 0, 0.5, or 1 on a per-match basis, and, again, it means exactly what I said it means about the system that tries to do things that way.
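As a minimal sketch of the win percentage described above (wins plus half the draws, divided by games played), something like the following would do. The function name is illustrative only and is not taken from FUMBBL.

```python
def win_percentage(wins: int, draws: int, losses: int) -> float:
    """Wins plus half the draws, divided by total games played."""
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no games played")
    return (wins + 0.5 * draws) / games


# A single match can only contribute 1, 0.5 or 0.
print(win_percentage(1, 0, 0), win_percentage(0, 1, 0), win_percentage(0, 0, 1))

# A longer record with mixed results lands strictly between 0 and 1.
print(win_percentage(6, 3, 1))
```

After one game the value is necessarily one of the three extremes, which is the small-sample problem raised earlier in the thread.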

Irgy wrote:
I'm really not sure what your point is here? Yes, your rating should go up if you win and down if you lose, and possibly change if you draw. The trick is, if it was a difficult matchup it goes up by more than it would go down. So if, on average, you're winning as often as expected, it stays the same. Magic.

The point is that such systems are only going to be accurate measures of skill when you happen to play a top tier race against other top tier races. It's very hard for a goblin team to win against, say, a skaven team. Does that mean the goblin coach sucks, or that goblins suck as a whole? Under the system I'm proposing, to demonstrate skill, the skaven coach has to not only win the game (which they are likely to do even if they're below-average coaches) but beat a certain spread in order for it to improve his or her rating, and the goblin coach doesn't have to WIN to improve the rating, only do better than goblins usually do against skaven.

Irgy wrote:
Yes it takes a while to average out, but this is a fundamental problem that a player's results are not a strong indication of their skill if the number of games is small. Your method does not substantially improve things in that regard.

In most situations the system I'm proposing works like any other rating system - if you win your rating goes up, if you lose your rating goes down. Unlike other systems that is not always the case, especially in the cases where a tier 3 team is involved in a match.
PurpleChest



Joined: Oct 25, 2003

Posted: Dec 14, 2011 - 10:47
FUMBBL Staff

VoodooMike wrote:
.... but in general if you play the way you and everyone else normally does, it will still rate things appropriately.



As always, welcome to FUMBBL. Why not play a game while you are here.

Thank you for your interesting contribution. Personally I couldn't support, or be interested in, any system or rating that tries to make me play the game a certain way, any way other than exactly as I feel.

But it is an interesting idea and contribution.

As always though, you could start a fight in an empty forum. And have.

_________________
Barbarus hic ego sum, quia non intelligor illis -Ovid
I am a barbarian here because i am not understood by anyone
dode74



Joined: Aug 14, 2009

Posted: Dec 14, 2011 - 10:51

Quote:
The point is that such systems are only going to be accurate measures of skill when you happen to play a top tier race against other top tier races. It's very hard for a goblin team to win against, say, a skaven team. Does that mean the goblin coach sucks, or that goblins suck as a whole?
Wouldn't an Elo system which racially adjusts for WDL records between races achieve the same thing, but in a more volatile manner?
Irgy



Joined: Feb 21, 2007

Posted: Dec 14, 2011 - 11:01

VoodooMike wrote:
Given that we do not have an objective measure of skill to look at correlations with, this is a bit of a circular argument. Since the game is about scoring more TDs than the opponent (a requirement for winning the game) we can say that there will be some correlation between scoring more touchdowns than your opponent, and your skill in the game. We can also say that skillful players are ABLE to score more touchdowns against lesser coaches, and able to reduce the number of TDs the opponent scores.

Your objection is that people are not required to do so, and that's fine, but it's not as much of a problem as you seem to imagine it is. If someone is able to perform better than they actually do, then they are, by definition, not playing to the best of their ability, so the rating of their skill will not be as high as it could be if they did. It's far easier to pretend to be worse than you are than it is to pretend to be better than you are.


The objective is to score more touchdowns than the opponent, not just "scoring touchdowns". The difference is subtle, but important.

Of course there's a correlation between scoring touchdowns and skill, but mostly because there's a pretty strong correlation between scoring touchdowns and winning. The question is whether there's a correlation between touchdowns and skill when you factor out the result of the game. That case is far less clear. Sure, good players are able to win by more than 1 more often, but have they, will they, and should they need to? Aren't the best players those who concentrate on skilling and protecting their players for next game instead?

Look I'm not saying there's going to be no relation, but I am saying it's far from clear the relation is so strong as to smother out any other factors.

VoodooMike wrote:
Irgy wrote:
I'm really not sure what your point is here? Yes, your rating should go up if you win and down if you lose, and possibly change if you draw. The trick is, if it was a difficult matchup it goes up by more than it would go down. So if, on average, you're winning as often as expected, it stays the same. Magic.

The point is that such systems are only going to be accurate measures of skill when you happen to play a top tier race against other top tier races. It's very hard for a goblin team to win against, say, a skaven team. Does that mean the goblin coach sucks, or that goblins suck as a whole? Under the system I'm proposing, to demonstrate skill, the skaven coach has to not only win the game (which they are likely to do even if they're below-average coaches) but beat a certain spread in order for it to improve his or her rating, and the goblin coach doesn't have to WIN to improve the rating, only do better than goblins usually do against skaven.


You're just re-explaining your system, which I already understand, and failing to understand my suggestion. You're claiming it's impossible to learn anything from W/D/L if the matchup is uneven, which is just patently untrue. Every result provides information. A result of a 45% matchup provides almost as much information as a 50% matchup. It's just a bit harder to make use of that information.
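To put a rough number on the 45% versus 50% comparison, here is a minimal sketch that treats a match as a two-outcome event and computes the Shannon entropy of the result. Ignoring draws is a simplifying assumption made for this illustration, and the function name is hypothetical.

```python
import math


def outcome_entropy(p_win: float) -> float:
    """Shannon entropy, in bits, of a win/loss outcome with win probability p_win."""
    if p_win in (0.0, 1.0):
        return 0.0  # a certain result carries no information
    p_loss = 1.0 - p_win
    return -(p_win * math.log2(p_win) + p_loss * math.log2(p_loss))


# Compare an even matchup with a slightly and a heavily lopsided one.
for p in (0.50, 0.45, 0.20):
    print(f"win probability {p:.2f}: {outcome_entropy(p):.3f} bits per game")
```

Under those assumptions an even matchup yields 1 bit per game and a 45% matchup roughly 0.99 bits, so the loss from a mildly uneven pairing is small. It only becomes substantial for very lopsided pairings.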

VoodooMike wrote:
In most situations the system I'm proposing works like any other rating system - if you win your rating goes up, if you lose your rating goes down. Unlike other systems that is not always the case, especially in the cases where a tier 3 team is involved in a match.


In your system, a dwarf player against goblins plays an extremely tight game and gives the opposition no chances, and wins 2-1 90% of the time. And yet his CR goes down, not just on average but every single game, even the ones he wins. Another coach plays a very loose dwarf passing game and wins 55% of the time, but often 3-1. His CR on the other hand, goes up on average. But who is actually the better coach here, the player who wins 90% of the time or the player who wins 55% of the time?

So no, it doesn't work like other systems where if you win your rating goes up, because if you don't beat the target TD ratio you can win and have your rating go down. And when that's not what happens, your system isn't any different anyway.

Sure if everyone totally changed the way they play so that they valued each touchdown equally and essentially ignored the win/draw/loss result, your method would not have the problems I describe. But that just comes full circle back to my original objection: why on earth do you think anyone would want to completely redefine winning for the sake of an incremental improvement to the rating system?
VoodooMike



Joined: Nov 07, 2010

Posted: Dec 14, 2011 - 11:49

PurpleChest wrote:
As always though, you could start a fight in an empty forum. And have.

Because I put this thread in this particular forum? Thanks for failing to read the first three pages.

dode74 wrote:
Wouldn't an Elo system which racially adjusts for WDL records between races achieve the same thing, but in a more volatile manner?

Does an ELO based system ever decrease your rating for a win, or increase it for a loss? If not, then no, it wouldn't, because it still doesn't address the fact that when playing a lower performance team against a higher performance team, you can be a more skilled coach but still lose the game. Racial modifiers are just going to alter how much your rating goes down due to the loss.

Irgy wrote:
In your system, a dwarf player against goblins plays an extremely tight game and gives the opposition no chances, and wins 2-1 90% of the time. And yet his CR goes down, not just on average but every single game, even the ones he wins. Another coach plays a very loose dwarf passing game and wins 55% of the time, but often 3-1. His CR on the other hand, goes up on average. But who is actually the better coach here, the player who wins 90% of the time or the player who wins 55% of the time?

If dwarf teams, on average, beat goblin teams by more than a single TD, then yes, that 2-1 victory will make the dwarf team's coach's rating go down because obviously he's *not* playing a very tight game relative to other coaches who, on average, have been beating goblin teams 2-0 - he let the goblins slip by his defense and score a TD, unlike most dwarf coaches. He won the game (as is expected) but played a sloppy game in the process.

Irgy wrote:
So no, it doesn't work like other systems where if you win your rating goes up, because if you don't beat the target TD ratio you can win and have your rating go down. And when that's not what happens, your system isn't any different anyway.

As I said, in most cases you need only win to raise your rating, as for most race pairings the average TD difference is between 1 and -1. The only time that isn't the case, in general, is in pairings against a team that performs terribly against the race in question, in which case the TD difference is slightly past the 1/-1 mark. In those cases the race has to win by more than 1 TD for the game to improve his or her rating.
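A minimal sketch of the comparison described above: a result counts in the coach's favour only when the touchdown difference beats the average difference for that race pairing. The function name and the average values are hypothetical and chosen for illustration; they are not FUMBBL data, and how the difference would be scaled into rating points is not specified here.

```python
def performance_vs_spread(td_for: int, td_against: int, avg_td_diff: float) -> float:
    """Positive when the match result beats the pairing's average TD difference."""
    return (td_for - td_against) - avg_td_diff


# Typical pairing (average difference inside +/-1): any win is positive.
print(performance_vs_spread(2, 1, 0.4) > 0)

# Lopsided pairing (assumed average of +1.3): a one-TD win falls short.
print(performance_vs_spread(2, 1, 1.3) > 0)

# The underdog in that same pairing (average of -1.3) gains from a one-TD loss.
print(performance_vs_spread(1, 2, -1.3) > 0)
```

The second and third cases are the only ones where this differs from a plain win/loss update, which is the disagreement in this exchange.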

You disagree that you can ever be demonstrating better than average skill in a game that you lose, or lower than average skill in a game you win. You don't need to reiterate that again, thanks - I disagree, obviously, and telling me the same thing a dozen times won't change that so... are we done?
Irgy



Joined: Feb 21, 2007

Posted: Dec 14, 2011 - 11:55

dode74 wrote:
Wouldn't an Elo system which racially adjusts for WDL records between races achieve the same thing, but in a more volatile manner?


Actually it really is that simple isn't it?

VoodooMike wrote:
Well, when you're ready to open Schrödinger's rating formula, let me know! Your assurances without details are worthless to me.


My mysterious system which adjusts for the racial matchup without introducing needless distortions relating to winning margins is this:

A hard racial matchup is equivalent to a fair racial matchup against a harder opponent. So all that needs to be done to make ELO account for races is modify the opponent's rating in the calculation of each adjustment.

The modification function would need figuring out, but there's no shortage of data to do it with.
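A minimal sketch of that idea, using the standard Elo expected-score formula and shifting the opponent's rating by a race-dependent offset before the adjustment is computed. The offset value, the K-factor of 32 and the function names are all assumptions made for illustration; the post leaves the modification function undefined.

```python
def expected_score(rating: float, opp_rating: float) -> float:
    """Standard Elo expected score for the first player."""
    return 1.0 / (1.0 + 10 ** ((opp_rating - rating) / 400.0))


def elo_update(rating: float, opp_rating: float, score: float,
               race_offset: float = 0.0, k: float = 32.0) -> float:
    """New rating after one game; score is 1 (win), 0.5 (draw) or 0 (loss).

    race_offset is added to the opponent's rating first: positive for a
    hard racial matchup, negative for an easy one (hypothetical values).
    """
    expected = expected_score(rating, opp_rating + race_offset)
    return rating + k * (score - expected)


# Two equally rated coaches, with and without an assumed +200 offset
# for the coach playing the weaker race.
print(elo_update(1500, 1500, 1))                   # fair matchup, win
print(elo_update(1500, 1500, 1, race_offset=200))  # hard matchup, win
print(elo_update(1500, 1500, 0, race_offset=200))  # hard matchup, loss
```

With the offset, the underdog gains more for a win and loses less for a loss. A win still never lowers the rating, which is the property VoodooMike objects to.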
koadah



Joined: Mar 30, 2005

Posted: Dec 14, 2011 - 12:06

I don't like using the overall racial modifiers. In a lot of cases that will be just plain wrong. IMO if you don't have confidence in the data then don't apply a modifier.

Losing points for a win? Sure it works. It might even be better. But gut feel is that most people just won't like it.

Though I suppose that as it is not going to be 'official' that doesn't really matter. :)

If you build it some people will look at it.

_________________
O[L]C 2016 Swiss! - April ---- All Star Bowl - Teams of Stars - 2 more teams needed
dode74



Joined: Aug 14, 2009

Posted: Dec 14, 2011 - 12:26

Quote:
Does an ELO based system ever decrease your rating for a win, or increase it for a loss? If not, then no, it wouldn't, because it still doesn't address the fact that when playing a lower performance team against a higher performance team, you can be a more skilled coach but still lose the game. Racial modifiers are just going to alter how much your rating goes down due to the loss.
It doesn't, but on those few times when you do win you'd get a fairly hefty boost due to the racial modifier. It's not like T3 teams lose all the time - Gobs v Dwarves gives 18.67(+/-8.82) as a Win% for gobs overall. So long as a part of the factoring goes into assigning points according to that win% ratio (i.e. a gob winner would get ~80% of the max, while a Dwarf winner would get ~20% from purely racial modifications) then over time the effect evens out. That's why I say such a system would be more volatile but overall the same.
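The "evens out over time" claim can be checked with a short sketch. It assumes the quoted 18.67% is used directly as the goblin win probability, that draws are ignored, and that the loser gives up exactly what the winner gains. None of those details are stated in the post, and the function name is illustrative.

```python
GOB_WIN_PCT = 0.1867  # goblins vs dwarves, figure quoted in the post above


def winner_share(p_win: float) -> float:
    """Fraction of the maximum award a winner receives, given their side's win probability."""
    return 1.0 - p_win


gob_share = winner_share(GOB_WIN_PCT)          # roughly 80% of the maximum
dwarf_share = winner_share(1.0 - GOB_WIN_PCT)  # roughly 20% of the maximum

# Expected net change per game for an average goblin coach:
# gain gob_share when winning, give up dwarf_share when losing.
expected_change = GOB_WIN_PCT * gob_share - (1.0 - GOB_WIN_PCT) * dwarf_share

print(round(gob_share, 2), round(dwarf_share, 2), round(expected_change, 6))
```

Under those assumptions the expected change is zero for a coach who wins exactly as often as the race average, so only deviations from that average move the rating in the long run.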

I guess the difficult bit will be determining the relative effects of coach skill vs racial modifier (whichever version is used).
PurpleChest



Joined: Oct 25, 2003

Posted: Dec 14, 2011 - 12:34
FUMBBL Staff

VoodooMike wrote:
PurpleChest wrote:
As always though, you could start a fight in an empty forum. And have.

Because I put this thread in this particular forum? Thanks for failing to read the first three pages.


no, i read the thread, and found it interesting, as i said.

I was commenting more on your pugnacious style than on Woodstock's choice to see non-FUMBBLers' contributions as more to do with RL BB than FUMBBL.

'You could start a fight in an empty forum' was in the nature of an amusing (to me) cultural reference, misquoting the legendary comment about sometime footballer Dennis Wise.

So, more accurately:
You could start a fight in an empty forum, but in this case have started a fight in an active forum that, as a non-FUMBBL player, has been judged ill-suited to your needs, and had said fight moved to a more appropriate, though largely inactive, forum.

It's not quite as pithy, is it?

VoodooMike



Joined: Nov 07, 2010

Posted: Dec 14, 2011 - 15:41

Mock-up of skill rating calculator

That's a quick and dirty script to create the skill ranking number that I've been talking about. It is for B only, meaning it only uses your active B teams, and it is based on data from B games. It can obviously be expanded, but that'll do for a draft.

The system uses overall match-up data. It does not break those match-ups into different TV ranges. The main reasons are what I mentioned earlier, and the simple fact that it'll be a lot of work to do that, and wasn't as much work to do it this way for the draft.

The rating is based around the arbitrary number 1000, with the SD being 1000 as well. Given the fact that the racial SDs are wide, this means you're only likely to see values between 0 and 2000, with most of them closer to 1000. Additionally, the range is bounded by number of games, to prevent people who have played very few games from having those matches skew their rating too far in either direction. The bounding formula is +/- 50 x number of matches played... thus, if you've only ever played one match, the minimum rating you can have is 950, and the maximum is 1050, and so on.
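The bounding rule above can be sketched as a simple clamp. The constants come from the post; the function name is hypothetical, and how the raw rating itself is computed is outside this sketch.

```python
BASE_RATING = 1000
BOUND_PER_MATCH = 50


def bounded_rating(raw_rating: float, matches_played: int) -> float:
    """Clamp a raw rating to BASE_RATING +/- 50 per match played."""
    spread = BOUND_PER_MATCH * matches_played
    lower, upper = BASE_RATING - spread, BASE_RATING + spread
    return max(lower, min(upper, raw_rating))


# One match played: the rating cannot leave the 950..1050 window.
print(bounded_rating(1400, 1), bounded_rating(300, 1))

# Ten matches played: the window widens to 500..1500.
print(bounded_rating(1400, 10), bounded_rating(1700, 10))
```

After 20 matches the window covers the whole 0 to 2000 range mentioned above, so the bound stops mattering for most coaches at that point.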

Since calculating the rating takes several slow page requests from FUMBBL, it only calculates a given coach's rating once per day and stores it. The next time someone requests it, it just uses the stored version, unless a day has passed since the last time it was calculated, in which case it recalculates it.
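A minimal sketch of that once-per-day caching, assuming an in-memory dictionary in place of whatever storage the actual script uses. The names `get_rating` and `compute` are hypothetical; `compute` stands in for the slow calculation that fetches pages from FUMBBL.

```python
import time

ONE_DAY = 24 * 60 * 60  # seconds
_cache = {}             # coach name -> (time computed, rating)


def get_rating(coach, compute, now=None):
    """Return the stored rating unless it is at least a day old, else recompute."""
    now = time.time() if now is None else now
    cached = _cache.get(coach)
    if cached is not None and now - cached[0] < ONE_DAY:
        return cached[1]
    rating = compute(coach)  # the slow path
    _cache[coach] = (now, rating)
    return rating
```

The optional `now` argument exists only so the expiry logic can be exercised without waiting a day; a persistent version would write the same timestamp and rating pair to a file or database instead.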
JimmyFantastic



Joined: Feb 06, 2007

Posted: Dec 14, 2011 - 15:47

Seems pretty good I have to say

_________________
Pull down the veil - actively bad for the hobby!
Corvidius



Joined: Feb 15, 2011

Posted: Dec 15, 2011 - 00:46

Seems ok, although my rating is very low.
All times are GMT + 1 Hour