harvestmouse



Joined: May 13, 2007

Posted: Dec 13, 2011 - 21:28

What a mess! I think you'd all be best getting back on subject.
VoodooMike



Joined: Nov 07, 2010

Posted: Dec 13, 2011 - 22:42

Corvidius wrote:
I think the idea is interesting but wouldn't there be diminishing returns? As people push for greater victories, the average rises and they have to take greater risks for the same rewards, or am I simply being daft? I would consider solid, consistent play to be a better benchmark to set.

Well, I'm not sure it's all that easy to shift the average even if you try particularly hard. Assuming you could, though, it would still be a measure of how well the average coach is doing with a given matchup, so your ability to meet or exceed the average result would still be a valid measure of your skill relative to the average.

Hitonagashi wrote:
VM, simply put, the idea is nice, but TD margin in no way indicates the skill of a coach. You can have a VMTD rating, that indicates how good you are at winning by large margins, but that doesn't mean much aside from your ability to win big.

There may be additional measures that could be factored in. I'm unconvinced that casualty information would be a good measure, though, since causing casualties is more a measure of play style (and a lot of luck, without specific skill combinations) than of consistent skill.

pythrr wrote:
Also, why do you care about the system so much, seeing that you don't play (and never have played) any games?

Why do, say, aerospace engineers care so much about planes if they don't fly them? I'm more interested in the big picture than I am in little moving pictures.

happygrue wrote:
Is there enough CRP data to implement this? Probably not. Until such time as some basic TV distinctions could be made (4-5 categories maybe?) it seems likely that players would be able to do crazy things with zons at low TV and chaos at high TV etc.

I think there's enough data to implement it, yes. There's not enough data to break the data down into race match-ups by TV, but there's enough data to break it down into race match-ups.

It's also feasible that a sliding scale related to TV difference/underdog victory could be applied to alter the size of the rating change for each coach based on the TV difference for the match. It should be noted that the TV difference issue is going to be present in pretty much any rating system: people who are trying to raise their ratings benefit by picking lower TV teams to play. In this system, however, they have to do more than simply win in that situation to raise their rating.

Were_M_Eye wrote:
I believe that winning a game is far more important than how much you won it by. But VoodooMike's suggestion might make for more interesting games when people go for TDs instead of stalling away a whole half.

The real issue is that any win-only rating system will only reflect coach skill in any serious way when the coach is playing a race that has equal overall performance to the race of his or her opponent. It forces coaches to play high performance races lest they be disproportionately punished, rating-wise, for their choice.

divino77 wrote:
Would concessions factor into the equation?

This is actually an important point: concessions end the game early and shift the score. While that might not be a problem in that it typically gives the victor a higher number of TDs than the usual game, it might need to be special cased in that a concession never raises the loser's rating and never lowers the winner's.
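As a rough sketch of that special case (the function name and the idea of a signed per-coach rating change are assumptions for illustration, not part of any existing FUMBBL formula), the rule amounts to clamping the two rating changes when a game ends in a concession:

```python
def apply_concession_rule(winner_delta, loser_delta, conceded):
    """Clamp rating changes for conceded games.

    winner_delta / loser_delta are the signed rating changes some
    (hypothetical) rating formula has already produced for the match.
    """
    if not conceded:
        return winner_delta, loser_delta
    # A concession never lowers the winner's rating...
    clamped_winner = max(winner_delta, 0.0)
    # ...and never raises the loser's rating.
    clamped_loser = min(loser_delta, 0.0)
    return clamped_winner, clamped_loser
```

A normal game passes through unchanged; only a conceded game where the formula would have punished the winner or rewarded the loser gets flattened to zero.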

Woodstock wrote:
As long as you don't participate on this site, and just throw out random stuff, I'm moving your threads to RL BB... Enjoy.

I participate in the forums, and this is specifically FUMBBL related as it is based on FUMBBL data. Put the threads anywhere you want, though, if you feel they somehow make baby jesus cry.

koadah wrote:
The dude even appears to be a supporter. Or is that a bug?

Why is that so surprising? This site produces interesting data that I've been working with, and someone out there is footing the bill.

Carnis wrote:
TD difference (actual vs average) - Good coaches are good defenders, this does not necessarily translate into 5-0 though. 1-0 should be as good as 2-1. 2-0 should be slightly better than 1-0.
CAS difference (actual vs average) - Good coaches are better blockers, give out less blocks, usually block more, have better built teams for blocking/getting blocked.

The theory is that using an all-encompassing average will result in a ratio that is below what is normal for a good coach, and above what is average for a bad coach, since it includes all games of that matchup. The better you are at managing to score while preventing the opponent from scoring, the better your chance of achieving a superior ratio than average, and that'd improve your rating.

As I said with casualty information, it represents a non-mandatory play style, so using that information in the rating would penalize those who don't attempt to get casualties while rewarding the frequently vilified CLPOMB teams who could then beat the spread by just hurting as many players as possible rather than trying to score at all.

I don't disagree that the rating system might benefit from using more than just TD ratio, but the question becomes "what else is there that makes sense?" and "is that information recorded so we can use it?"
dode74



Joined: Aug 14, 2009

Posted: Dec 14, 2011 - 00:53

Quote:
the problem is that as you divide the data into smaller and smaller pieces, the confidence interval will get larger and larger, making the resulting averages less and less predictively useful.
Are they predictively useful as an average if that average changes with TV? If there is a change over TV with any matchup then even large steps would improve the predictive accuracy, and therefore the accuracy of the rating.
VoodooMike



Joined: Nov 07, 2010

Posted: Dec 14, 2011 - 01:13

dode74 wrote:
Are they predictively useful as an average if that average changes with TV? If there is a change over TV with any matchup then even large steps would improve the predictive accuracy, and therefore the accuracy of the rating.

Well, first off, remember that we're talking about TDs scored ratios now, not win percentages. We do know that win percentages shift with TV difference, but do not yet know if the ratio in question does. We need to see if there's a significant relationship there or not.

If there is, then certainly there might be something to be said for applying the resulting regression equation to the degree by which ratings are changed, based on the TV difference.
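One way to read that, sketched with made-up numbers: fit a straight line of TD difference against TV difference, then judge each result against the line rather than against a flat average. Whether such a relationship exists at all is still the open question above, and every name and data point here is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: TV difference (own TV minus opponent TV) and the
# average TD difference observed at that TV difference.
tv_diffs = [-200, -100, 0, 100, 200]
td_diffs = [-1.0, -0.5, 0.0, 0.5, 1.0]
slope, intercept = fit_line(tv_diffs, td_diffs)

def expected_td_diff(tv_diff):
    """TD difference the fitted line predicts for a given TV difference."""
    return intercept + slope * tv_diff

def margin_vs_expectation(actual_td_diff, tv_diff):
    """How far a result beat (positive) or missed (negative) the prediction."""
    return actual_td_diff - expected_td_diff(tv_diff)
```

With this toy data a coach who is 100 TV up is expected to win by half a TD, so a one-TD win only counts as half a TD better than expected.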
dode74



Joined: Aug 14, 2009

Posted: Dec 14, 2011 - 01:23

What do you mean, exactly, by "TDs scored ratio"? How would that be defined?
VoodooMike



Joined: Nov 07, 2010

Posted: Dec 14, 2011 - 01:35

dode74 wrote:
What do you mean, exactly, by "TDs scored ratio"? How would that be defined?

The average scoring ratio between two types of teams. My first thought is, for, say Dwarfs vs. Goblins, to simply subtract the number of TDs scored by the dwarf team from the number of TDs scored by the Goblin team, and average those across all games played between those two races.

That (and its standard deviation) should give a way to ascertain scoring performance relative to the norm, and distance from the norm for a given match.
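A minimal sketch of that calculation, using invented scores for a handful of Goblin vs. Dwarf games (the data and the function name are assumptions for illustration only):

```python
from statistics import mean, stdev

# Hypothetical sample: (goblin_tds, dwarf_tds) for games between the two races.
games = [(1, 2), (0, 2), (1, 1), (2, 1), (0, 3), (1, 2)]

# TD difference per game: Goblin TDs minus Dwarf TDs, as described above.
diffs = [goblin - dwarf for goblin, dwarf in games]

matchup_mean = mean(diffs)  # the "norm" for this matchup
matchup_sd = stdev(diffs)   # sample standard deviation of the difference

def distance_from_norm(goblin_tds, dwarf_tds):
    """Standard deviations between one result and the matchup norm,
    from the Goblin coach's point of view."""
    return ((goblin_tds - dwarf_tds) - matchup_mean) / matchup_sd
```

Here the norm is a one-TD Goblin loss, so a Goblin coach who draws 1-1 comes out above the norm even without winning.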
dode74



Joined: Aug 14, 2009

Posted: Dec 14, 2011 - 01:41

Quote:
simply subtract the number of TDs scored by the dwarf team from the number of TDs scored by the Goblin team, and average those across all games played between those two races.

So TD difference would work?

I took a look at TD difference for Amazons vs Chaos from all R, B and L matches from 20 Jan 11. I think we're only interested in equivalent TVs (i.e. TVs which would be allowed in B), so I filtered out all the games where the TV difference was >15% of the smaller TV. This left 403 games total. I split those down by the mean TV of the two teams into 400TV steps (arbitrarily chosen) and got the following:
Code:
TV        Mean TD   Range @ CI95
<1400     +0.633     +0.42 to +0.84
1401-1800 +0.366     +0.12 to +0.61
1801-2200 -0.113     -0.51 to +0.28
2201-2600 Only 3 results - range covers almost 90%
>2601     No data

Aggregated data
All TVs   +0.44      +0.14 to +0.59

(positive numbers indicate the Amazons scored more, negative numbers indicate the Chaos team scored more)
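For anyone wanting to repeat the breakdown, the filtering and banding could be sketched roughly as follows. The game records are invented, the interval uses a plain normal approximation (the method behind the table above isn't stated, so that is an assumption), and a mean TV of exactly 1400 is put in the lowest band since the table leaves that edge open.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical records: (amazon_tv, chaos_tv, amazon_tds, chaos_tds).
games = [
    (1000, 1050, 2, 1), (1200, 1150, 1, 0), (1300, 1350, 2, 2),
    (1500, 1600, 1, 1), (1700, 1650, 2, 1), (1750, 1700, 0, 1),
    (1900, 2000, 1, 2), (2100, 2050, 1, 1), (1000, 1400, 3, 0),
]

def comparable(tv_a, tv_b, limit=0.15):
    """Keep games whose TV difference is at most 15% of the smaller TV."""
    return abs(tv_a - tv_b) <= limit * min(tv_a, tv_b)

def tv_band(mean_tv):
    """400TV-wide bands, labelled as in the table above."""
    if mean_tv <= 1400:
        return "<1400"
    if mean_tv <= 1800:
        return "1401-1800"
    if mean_tv <= 2200:
        return "1801-2200"
    if mean_tv <= 2600:
        return "2201-2600"
    return ">2601"

# Group the Amazon-minus-Chaos TD difference by the mean TV of the two teams.
bands = {}
for tv_a, tv_c, td_a, td_c in games:
    if comparable(tv_a, tv_c):
        bands.setdefault(tv_band((tv_a + tv_c) / 2), []).append(td_a - td_c)

def mean_with_ci95(diffs):
    """Mean TD difference with a normal-approximation 95% interval."""
    m = mean(diffs)
    half_width = 1.96 * stdev(diffs) / sqrt(len(diffs))
    return m, m - half_width, m + half_width

# Bands with a single game are skipped: no spread can be estimated from one point.
summary = {band: mean_with_ci95(d) for band, d in bands.items() if len(d) > 1}
```

The last record in the sample is dropped by the filter because a 400 TV gap is more than 15% of the smaller team's 1000 TV.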

Expanding the steps to 100TV and looking only at TV vs TD difference gives r=-0.577, p<0.05. The critical r value is 0.553 (11 degrees of freedom), so we can say there is a statistically significant negative correlation between the TV at which an Amazon vs Chaos game takes place and the TD difference in favour of the Amazons.
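The significance check can be reproduced without a stats package. The sketch below computes Pearson's r by hand and converts a two-tailed 5% t value into the matching critical r; the t value of 2.201 for 11 degrees of freedom is taken from standard tables, and the helper names are just for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def critical_r(t_crit, df):
    """Smallest |r| that is significant, given the critical t for df = n - 2."""
    return t_crit / sqrt(t_crit ** 2 + df)

T_CRIT_DF11 = 2.201  # two-tailed, alpha = 0.05, 11 degrees of freedom
r_crit = critical_r(T_CRIT_DF11, 11)

reported_r = -0.577  # the value quoted above for TV vs TD difference
significant = abs(reported_r) > r_crit
```

Eleven degrees of freedom corresponds to thirteen TV steps, and the reported r clears the threshold only narrowly.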
VoodooMike



Joined: Nov 07, 2010

Posted: Dec 14, 2011 - 02:36

dode74 wrote:
So TD difference would work?

It should.

dode74 wrote:
I think we're only interested in equivalent TVs (i.e. TVs which would be allowed in B), so I filtered out all the games where the TV difference was >15% of the smaller TV. This left 403 games total.

I think the system can work for non B divisions too, frankly. If we're going to restrict ourselves to B's paradigm, we'll have to stick to using only B's results.

It may not even be necessary to only use data where the range isn't massive. Why? Because as the amount of data decreases, the standard deviation will increase, and that simply means the effect of a match on the rating will decrease in those cases, which is what we'd want anyway.

For creating a mock-up we're certainly going to have to use historical data, but theoretically an implemented system would use and include current data, or at least update the ranges routinely and use the updated ranges for matches held after that point.
Irgy



Joined: Feb 21, 2007

Posted: Dec 14, 2011 - 04:08

There's one thing I fundamentally don't like about this, and that's that it introduces a difference between a win/loss with one score and a win/loss with another. The problem with that is it rewards/punishes behaviour that people are otherwise simply not playing for. People then have to play for a 3-1 rather than 3-0 loss purely for their rating, rather than concentrating on protecting their players and gathering SPP in a game they can't possibly win.

The results of a Blood Bowl game are {Win, Draw, Loss} (plus of course the team impact). If you want to change that, you have to change the way everyone plays every game. A rating system should take what we're already trying to do (W/D/L) and measure how good at it we are, not completely change what we're trying to do.
VoodooMike



Joined: Nov 07, 2010

Posted: Dec 14, 2011 - 04:17

Irgy wrote:
There's one thing I fundamentally don't like about this, and that's that it introduces a difference between a win/loss with one score and a win/loss with another. The problem with that is it rewards/punishes behaviour that people are otherwise simply not playing for. People then have to play for a 3-1 rather than 3-0 loss purely for their rating, rather than concentrating on protecting their players and gathering SPP in a game they can't possibly win.

Except that the ratios are based on a large number of historical data points, meaning it takes all the games when people were doing their normal playing, and uses the averages from that. If you're trying to game the system then sure, you want to try to score a lot of TDs (not that I see that being a bad thing, overall) but in general if you play the way you and everyone else normally does, it will still rate things appropriately (assuming the idea pans out).

Irgy wrote:
The results of a Blood Bowl game are {Win, Draw, Loss} (plus of course the team impact). If you want to change that, you have to change the way everyone plays every game. A rating system should take what we're already trying to do (W/D/L) and measure how good at it we are, not completely change what we're trying to do.

The assertion is that a W+L- system does *not* accurately measure a coach's skill unless that coach is using a higher performance race. How can you tell if a goblin coach is skilled if goblins lose most of the time against any race but other Tier 3s?

Additionally, while comparing overall win percentage can work in the long run, it requires quite a few games before you can reliably rank a team against the average.
Carnis



Joined: Feb 03, 2009

Posted: Dec 14, 2011 - 04:27

VoodooMike wrote:

Carnis wrote:
TD difference (actual vs average) - Good coaches are good defenders, this does not necessarily translate into 5-0 though. 1-0 should be as good as 2-1. 2-0 should be slightly better than 1-0.
CAS difference (actual vs average) - Good coaches are better blockers, give out less blocks, usually block more, have better built teams for blocking/getting blocked.

As I said with casualty information, it represents a non-mandatory play style, so using that information in the rating would penalize those who don't attempt to get casualties while rewarding the frequently vilified CLPOMB teams who could then beat the spread by just hurting as many players as possible rather than trying to score at all.

I don't disagree that the rating system might benefit from using more than just TD ratio, but the question becomes "what else is there that makes sense?" and "is that information recorded so we can use it?"

Would it really reward them?

If said teams cause a ton of casualties, then their expected casualties would also be extremely high, so if they only cause minor or low cas they might get penalized for falling short of the average.

So shouldn't the statistics prevent this reward effect?

Also, while non-mandatory, it is very useful to get a lot of casualties to get the win.
Irgy



Joined: Feb 21, 2007

Posted: Dec 14, 2011 - 05:25

VoodooMike wrote:
Except that the ratios are based on a large number of historical data points, meaning it takes all the games when people were doing their normal playing, and uses the averages from that. If you're trying to game the system then sure, you want to try to score a lot of TDs (not that I see that being a bad thing, overall) but in general if you play the way you and everyone else normally does, it will still rate things appropriately (assuming the idea pans out).


So, if it's used on past data, you can't intentionally game the system, sure. But already in that past data are some people who play to score touchdowns for pride when they're losing, and some people who don't. Some people also take risks to give themselves an outside chance to win (or at least draw) an otherwise lost game, at a cost of most of the time losing by more. Some people play strategies which guarantee a narrow win rather than maximising their winning margin. All these things have an impact on the rating. They're not averaged out over many games because they're personality traits, i.e. they correlate with players between games. And they don't relate to skill. The end result is you've corrected one distortion (racial matchups) by creating another (playing to win-more and lose-less).

Which of the two is a bigger distortion? Well, who knows, because the one I mention is pretty close to unmeasurable. It's a moot question though, because I can absolutely assure you there are ways of accounting for racial matchups (and TV difference) that don't introduce this distortion and do make full use of the available data.

So rather than argue about how significant the distortion is, I'd highly recommend just using a different approach which doesn't have it needlessly built in. I'm not referring to the existing methods, which you criticise in your reply, so that criticism isn't particularly relevant.

It would obviously help if I suggested a specific alternative, but you must appreciate there's work to be done and time to be spent between claiming confidence in the existence of something and proposing a specific system.

VoodooMike wrote:
How can you tell if a goblin coach is skilled if goblins lose most of the time against any race but other Tier 3s?


By whether they manage to win/draw more often than other people who use Goblins, surely.


Last edited by Irgy on Dec 14, 2011; edited 2 times in total
Irgy



Joined: Feb 21, 2007

Posted: Dec 14, 2011 - 05:33

VoodooMike wrote:
The assertion is that a W+L- system does *not* accurately measure a coach's skill unless that coach is using a higher performance race. How can you tell if a goblin coach is skilled if goblins lose most of the time against any race but other Tier 3s?

Additionally, while comparing overall win percentage can work in the long run, it requires quite a few games before you can reliably rank a team against the average.


Actually I see the confusion here. I say the rating should be based entirely on Win/Draw/Loss, rather than (for instance) score. You've taken that to mean the system can't account for anything other than the W/D/L result. This is not what I meant at all. The difference is between inputs and outputs. TV and race are inputs, and should be factored in, whereas I'm just saying W/D/L should be the only output that's considered.
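To make the input/output distinction concrete, here is a deliberately crude sketch and not a proposed system: race is treated as an input by comparing a coach only against the baseline for that race, while the only output measured is W/D/L. The point values, result strings and names are all assumptions for illustration, and TV is left out entirely.

```python
# Output side: only Win/Draw/Loss is scored (draw counted as half a win).
POINTS = {"W": 1.0, "D": 0.5, "L": 0.0}

def points_per_game(results):
    """Average W/D/L points over a sequence of 'W', 'D', 'L' results."""
    return sum(POINTS[r] for r in results) / len(results)

# Input side: the baseline is built from games played with the same race.
# Both result lists are hypothetical.
all_goblin_results = list("LLLDLWLLDL")  # every Goblin coach pooled together
coach_results = list("LDWLDL")           # one particular Goblin coach

baseline = points_per_game(all_goblin_results)
coach_rate = points_per_game(coach_results)

# Positive means this coach wins/draws more often than Goblin coaches in general.
relative_performance = coach_rate - baseline
```

A Goblin coach can lose most games and still come out positive here, because the comparison is with other Goblin coaches rather than with the field.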
VoodooMike



Joined: Nov 07, 2010

Posted: Dec 14, 2011 - 05:37

Carnis wrote:
Would it really reward them?

I suspect so. Not all Chaos teams focus on getting casualties (hard as it might be to believe); their mutation access gives them the opportunity to focus on any number of play styles. As such, the average team that focuses on casualties will likely achieve higher than average casualties, and the average team that focuses on anything else will get lower than average casualties. We'd have to expect the relationship between casualties and TDs to be inverted if we hoped that would be smoothed out.

Carnis wrote:
If said teams cause a ton of casualties, then their expected casualties would also be extremely high, so if they only cause minor or low cas they might get penalized for falling short of the average.

Brought down by teams of that race that play to score rather than playing to maim. It is likely the average CLPOMB team will cause a higher number of casualties than is average for all Chaos teams, for example.

Carnis wrote:
Also, while non-mandatory, it is very useful to get a lot of casualties to get the win.

Absolutely, there are a lot of things that are useful for winning, and by winning we mean scoring more TDs than the other guy does. Causing casualties is not a mandatory part of the game. It can help you win, or not, but we do know TDs are required to win.

Irgy wrote:
Which of the two is a bigger distortion? Well, who knows, because the one I mention is pretty close to unmeasurable. It's a moot question though, because I can absolutely assure you there are ways of accounting for racial matchups (and TV difference) that don't introduce this distortion and do make full use of the available data.

Well, when you're ready to open Schrödinger's rating formula, let me know! Your assurances without details are worthless to me.
Irgy



Joined: Feb 21, 2007

Posted: Dec 14, 2011 - 05:47

VoodooMike wrote:
Well, when you're ready to open Schrödinger's rating formula, let me know! Your assurances without details are worthless to me.


They're not, because they could save you spending undue time on a fundamentally flawed system. Well, the criticisms more so than the assurances. Plus you don't have to just take my word for it, I have no doubt you could come up with a better system yourself in less time than I expect you'll spend defending your original idea.
All times are GMT + 1 Hour