FUMBBL :: Online Blood Bowl League

Posted by koadah on 2011-09-16 15:16:04

So you're saying C-POMB is goooood right? :)

Posted by WhatBall on 2011-09-16 15:36:01

Great blog, but I have one concern over your hypothesis, which is "Poorer coaches can beat stronger coaches by taking tons of killers, and hoping the dice pan out."

The fault in trying to measure this theory is that you are assuming all players are playing to win. The glaring flaw of the Box, which is where the majority of the Claw-bomber issues arise, is that there is no incentive to win. Many coaches are happy to just try and kill your team for amusement. If somehow winning had a tangible effect on Box games, then the issue would not be as prevalent.

I still believe the rules are borked, and need fine tuning to fix this ridiculous combo though.

Posted by Garion on 2011-09-16 15:36:34

You don't need facts and figures. Bottom line is personal experience of individual games being played at a high TV shows that the aforementioned combo removes players too effectively, making too many games a formality. Why not just look at Random oracles games where he has won the toss at the start of the game, and how that correlates with the cas distribution and how his win percentage changes when he wins the toss and loses it? I think that would be quite telling for starters.

Posted by Gran on 2011-09-16 15:42:05

Sounds interesting, but I think I might have to rain on your parade a little. How are you going to be able to tell who's a casual coach (luck-player) and who's a coach with a plan? You cannot really use CR for games in B, can you, especially since not all B-coaches play in R (indeed some have never done it), and BWR is not available as far as I can tell. Also, how will your suggested method take into account the fact that some players are better with one team than another?

Posted by Malerun on 2011-09-16 15:49:54

To validate the popular claim that cpomb is overpowered, just scatterplot result vs #cpomb, and calculate the correlation and variation.

To test our original claim, I would pick two similar teams, one with and one without claw access (assuming there isn't enough non-cpomb Chaos, Orcs and Chaos spring to mind: 4 ST4, 1 BG and some linos), and calculate the correlation and variation of result and #cpomb for bad coach vs bad coach, good coach vs bad coach, bad coach vs good coach and good coach vs good coach, the first mentioned having cpomb, the latter not.

Tbh, this would have been better as a forum topic, to better structure the discussion :)
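
Malerun's "result vs #cpomb" correlation could be sketched roughly as below. This is only an illustration: the `games` list is made-up data, and scoring a result as 1 / 0.5 / 0 for win / draw / loss is an assumption, not something the thread specifies.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    # Population covariance, matched with population standard deviations below.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical data: (number of cpomb players on the team, result)
# with result scored as 1 = win, 0.5 = draw, 0 = loss (an assumption).
games = [(0, 0.5), (1, 0.0), (2, 1.0), (3, 1.0), (0, 0.0), (4, 1.0)]
cpomb_counts = [g[0] for g in games]
results = [g[1] for g in games]

r = pearson(cpomb_counts, results)
print(f"correlation between #cpomb and result: {r:.3f}")
```

A value near 0 would suggest little linear relationship in the sample; it says nothing on its own about coach skill, which is why the four coach pairings are kept separate in the proposal.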

Posted by Hitonagashi on 2011-09-16 15:54:50

Gran: You can't. There *has* to be selection, as you point out...or just using the win percentage of a coach instead of their CR (this might be better anyway!). I was originally relying on the huge dataset fumbbl has. Sure, there will be some coaches like Tarabaralla (who got to about 170 in box before playing any R games), but we can measure success.

Garion: That's the thing. I play lizards and elves, and my experience says the opposite. I won 17/23 games against chaos/nurgle/cd with my lizardmen. That's why I don't want to rely on anecdotes, but data. I'm not fussed either way...I've never used a clawmbpo player, but I've played against plenty. What I want to work out is, if it's a formality, how can you prove it?

WhatBall: I think a majority of Box coaches play to win. They may resort to killing when winning is out of the question, but they play to win. Again, personal experience, but I've played 200ish games in the LRB6 box, and never had an opponent that did a "Studman" and abandoned the ball.

Koadah: :-P

Posted by DukeTyrion on 2011-09-16 16:13:49

Can't you just take a selection of coaches that play both ClawMB and normal teams and check the win factor of both teams?

The problem is, you are not saying that their winning margin will be higher, you are just saying the dice will have a bigger impact on each individual game, so those stats might not tell you much.

Posted by freak_in_a_frock on 2011-09-16 16:26:13

You will never be able to prove it IMHO.

However it is still my biggest bugbear with Claw Pomb: the fact that it attracts newb coaches because it is the easiest way for them to win against superior opposition, but it doesn't teach them anything about how to become a better coach.

It is the equivalent of using Blanka in Street Fighter and just tapping punch all game. Yes, you might win, but what have you actually learnt? What is the point if after 100 games you still have no idea how to break a cage or pull off a reverse sweep? Not saying that all Claw Pomb coaches can't do these things, but they didn't learn them with claw pomb.

Posted by Cloggy on 2011-09-16 16:27:52

So what will it mean when a really good coach who has only ever played in B and thus has a CR of 150 beats a coach who hasn't played in Ranked for a long time but has an ancient CR of 170+ using Clawpomb?

Not a lot huh? :)

Posted by Hitonagashi on 2011-09-16 16:37:32

Correct Cloggy :).

That's why I don't want to analyse one coach, but the results over a set of games consisting of sets of many coaches, because I think CR is *broadly* accurate. In any meaningful dataset, there are useless anomalies.

Freak: Agreed.

Duke/Malerun: Hmm, might work. I didn't do this as a forum thread because I didn't want to create a flame war about whether it *is* OP, just how you would construct a method of testing whether it is. Duke's method certainly has the advantage of simplicity :). Just analysing w/d/l percentages might be enough to at least give interesting data. My problem with that approach is that, for example, I play Lizards a lot. I'd like to think I'm a reasonable lizzie coach...but I know for a fact I'm a shockingly bad dwarf one. I suspect the general theme (there are races that suit your playstyle) holds for many coaches, and so you need to analyse the race as a whole.

Posted by Ullakkomorko on 2011-09-16 17:15:21

Did I tell you about the time I won against Chuck Versus Blood Bowl? :)

I like Garion's approach though. With high TV killer teams (Chaos and Nurgle at least) the toss should count for a lot IF clawpomb is overpowered, at least over a higher number of games. I guess it would be unwieldy to test though, requiring lots of tedious going through replays.

Posted by the_Sage on 2011-09-16 17:23:41

What you want to do is create a sample of coaches who play both killer and non-killer teams.
Then make your own coach rating on some average win % per team type (deweighted to exclude bias from specific killer/non-killer contribution), and test whether poorer coaches win more with killer teams, and/or better coaches win more with non-killer teams.

Ideally you keep the individual difference information, for instance using a repeated measures ANOVA (with dependent measures win%killer, win%nonkiller, and independent measures coach_quality, played%killer.)
If you really want to get into the details you use a mixed effects model, that allows you to include per game information on the lower level (instead of per type average), while drawing conclusions on the coach level (based on coach ratings).

According to your hypothesis you should see a stronger effect of coach quality on win%nonkiller than on win%killer.
You could also split it out into who you play the game against. If luck becomes more relevant when playing against killers, then poor coaches should also lose more to killers than better coaches.

The hardest part is how to define coach_quality, particularly how to avoid biasing that value. I agree with previous posters that you can't use CR, so you'll need to compute your own substitute for BWR based on the complete list of games that you extract.
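
As a very rough first pass at the_Sage's idea (well short of a repeated measures ANOVA or mixed effects model), one could tabulate per-coach win rates by team type and compare the killer-minus-nonkiller gap for weaker and stronger coaches. Everything below is a sketch on invented data: the `games` records, the 1 / 0.5 / 0 result scoring, and using the unweighted mean of the two win rates as a stand-in for coach quality are all assumptions.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical records: (coach, team_type, result), result 1 / 0.5 / 0.
games = [
    ("A", "killer", 1.0), ("A", "killer", 0.0),
    ("A", "nonkiller", 0.0), ("A", "nonkiller", 0.0),
    ("B", "killer", 1.0), ("B", "killer", 1.0),
    ("B", "nonkiller", 1.0), ("B", "nonkiller", 1.0),
    ("C", "killer", 1.0),  # plays only killers, so C is dropped from the sample
    ("D", "killer", 0.5), ("D", "nonkiller", 1.0),
]


def win_rates(games):
    """Map coach -> (killer win rate, nonkiller win rate), both types required."""
    by_coach = defaultdict(lambda: {"killer": [], "nonkiller": []})
    for coach, team_type, result in games:
        by_coach[coach][team_type].append(result)
    return {
        coach: (mean(t["killer"]), mean(t["nonkiller"]))
        for coach, t in by_coach.items()
        if t["killer"] and t["nonkiller"]
    }


def gap_by_quality(rates):
    """Mean (killer - nonkiller) gap for below-median and other coaches."""
    # Unweighted mean of the two rates, so the share of killer games played
    # does not leak into the quality estimate (a crude "deweighting").
    quality = {c: (k + n) / 2 for c, (k, n) in rates.items()}
    cutoff = median(quality.values())
    weaker = [k - n for c, (k, n) in rates.items() if quality[c] < cutoff]
    stronger = [k - n for c, (k, n) in rates.items() if quality[c] >= cutoff]
    return (mean(weaker) if weaker else None,
            mean(stronger) if stronger else None)


rates = win_rates(games)
print(gap_by_quality(rates))
```

If the hypothesis holds, the first number (weaker coaches) should come out larger than the second over a big enough sample. Because the quality estimate is built from the same win rates it is compared against, this sketch still carries exactly the bias the post warns about.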

Posted by uuni on 2011-09-16 18:04:10

The first thing that comes to mind would be an experiment like the following:

Null hypothesis: The variance of games of different rosters is not different.

Hypothesis: Variance of games of roster of teams that normally wield clawpomb is smaller than the variance of the complement.

If I have understood it right, you could use a statistical test of whether the variance of win-ratios of clawpomb-race rosters is smaller than that of the complement rosters. Would a t-test work? Can someone more familiar with statistics tell?
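
On the t-test question: a t-test compares means, so for comparing variances the usual tools are an F-test on the variance ratio or Levene's test. A minimal sketch of the variance-ratio part is below; the win-ratio lists are invented, and turning the ratio into a p-value (via the F distribution) is deliberately left out.

```python
from statistics import variance


def variance_ratio(group_a, group_b):
    """Return (sample variance of a, sample variance of b, a / b)."""
    var_a = variance(group_a)
    var_b = variance(group_b)
    if var_b == 0:
        raise ValueError("second group has zero variance")
    return var_a, var_b, var_a / var_b


# Hypothetical per-team win ratios for each group of rosters.
clawpomb_win_ratios = [0.48, 0.52, 0.50, 0.55, 0.45]
other_win_ratios = [0.30, 0.70, 0.50, 0.65, 0.35]

var_claw, var_other, ratio = variance_ratio(clawpomb_win_ratios,
                                            other_win_ratios)
print(f"clawpomb: {var_claw:.4f}  others: {var_other:.4f}  ratio: {ratio:.3f}")
```

A ratio well below 1 would point in the direction of the stated hypothesis, but the F-test assumes roughly normal data, which win ratios over few games may not be.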

Posted by Purplegoo on 2011-09-16 19:54:05

'You don't need facts and figures. Bottom line is personal experience'...

To be honest, this line of thinking is the issue behind all of the threads, chatter and rubbish, surely?! Facts and figures are there to be used, to stop all of the noise! If someone came up with a decent test with proper answers, win.

I'm glad someone is starting down this road. To be honest, before you do a proper test, some sort of finger in the air, non scientific lead in would at least be interesting to fill the time. Find me (anyone who is in the 'overpowered' camp) 5 coaches with a mean win % of say 50 of active teams, and a standout Claw/MB/PO team with a win % of over 75%. Surely, if it's overpowered, this is a mega easy starter before anyone does some real work.
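
Purplegoo's finger-in-the-air check could be run as a simple filter over per-team win percentages. The data layout, the coach and team names, and the 5-point tolerance around 50% are all assumptions made for the sketch; the mean here is taken over all of a coach's active teams, including the standout one, which is one possible reading of the post.

```python
from statistics import mean


def standout_coaches(teams, target_mean=50.0, tolerance=5.0, standout=75.0):
    """Coaches whose mean team win % is near target_mean and who also
    have at least one Claw/MB/PO team above the standout threshold."""
    found = []
    for coach, roster in teams.items():
        overall = mean([pct for _, pct, _ in roster])
        has_standout = any(is_cpomb and pct > standout
                           for _, pct, is_cpomb in roster)
        if abs(overall - target_mean) <= tolerance and has_standout:
            found.append(coach)
    return sorted(found)


# Hypothetical data: coach -> list of (team name, win %, is a Claw/MB/PO team)
teams = {
    "coach_a": [("Chaos A", 80.0, True), ("Elves A", 40.0, False),
                ("Orcs A", 35.0, False)],
    "coach_b": [("Nurgle B", 78.0, True), ("Lizards B", 76.0, False)],
    "coach_c": [("Chaos C", 55.0, True), ("Dwarfs C", 45.0, False)],
}

print(standout_coaches(teams))
```

Here only `coach_a` qualifies: `coach_b` wins a lot with everything, and `coach_c` has no standout team.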

Posted by Garion on 2011-09-16 20:09:27

But it is going to be nigh on impossible to do this at the moment; the TFF guys just say all data from ranked and blackbox is meaningless because of the environment the teams are in. While it is clearly not the environment the game was written for, I still believe it is showing us what will happen in leagues in 5 or 6 seasons' time. I don't think we will have any conclusive proof about this combo until some leagues are seriously broken by it. Will that ever happen? I don't know. I personally can't see it happening in WIL because of the brutal nature of promotion and relegation and the caps that will likely be imposed in time, but I may be wrong. Perhaps leagues in the format of OBBA etc. will show in time whether this combo is as powerful as everyone seems to think. But until we have a good few seasons of data behind us, from a number of leagues that aren't using house rules, we will always have a large number of people doubting the integrity of any data provided.

We also have the problem of not all the inducements being ready yet, like mercs and a few secret weapons. I personally can't see them making any difference, but then others will swear by them. But anyway, I have rambled on enough; my point is basically that League is where the focus needs to be, to avoid any petty get-out arguments.

Posted by Purplegoo on 2011-09-16 20:23:02

Garion;

a) I'm not going to be down on the 'TFF guys' (especially after that wonderful thread they locked whilst going on about how we lock too many threads - quality), but the 'TV matchmaking is to blame' argument doesn't massively cut it with me. I do wonder what avoiding the issue in [R] really proves, or what better method there is (see other forum threads for why win % isn't the right move). Also, who are we proving this to? Why would TFF (a site made up of mainly TT players that will never see this 'issue') care, and indeed, why should they? This is not something that will be a cure all for all sites and all men. I want FUMBBL answers for FUMBBL.

b) Not having all the rules is an issue. Can't disagree with that.

However;

c) The information is there. We just need someone to write a sound enough test, go through the hours of number crunching and come up with a robust answer.

The worrying thing for me isn't that this can't be done since it should be simple enough, it's what do we do with the answer if it comes up 'yes'? I hope that conversation waits for the answer, not the other way around.

Either way - some loud shouting and anecdotes do not prove a single thing. Polls canvassing opinions don't prove if there is an issue (a good one would only prove people thought there was), numbers prove things. Let's have some numbers.

Posted by Garion on 2011-09-16 20:51:26

I agree with all of that.

Point A) was what I was making too, although there are many fumbblers who have also said, in the many threads kicking about, that the environment is to blame; it's not just the TFF TT crowd. But yes, do it for fumbbl sounds good.

My point was just that I don't think we will ever achieve a consensus even in fumbbl, or be able to categorically prove anything, until the leagues have run for a few seasons, which will provide more structured data.

But I will gladly help in any way I can with any tests Hitonagashi wants etc. I too hope that we can find some conclusive proof from the data out there. I think a forum thread would be a good place to start, with more explanation from Hitonagashi and Koadah about exactly what stats are obtainable, then take it from there.

Posted by RC on 2011-09-17 12:46:23

Best factor, though not flawless, is win percentage across all divisions with more than 1000 games played to determine who is a good coach, and having it spread over lots of races.

Posted by JackassRampant on 2011-10-07 19:26:14

@ Hito: I think you need either of two things.
a) A "VOA"-type stat for Blood Bowl (CR doesn't cut it), or
b) A way of factoring out coaching skill.
The former is hard. The latter might be relatively easy... separate teams into two categories, those with so many instances of POMB or ClawMB (counting ClawPOMB double) and those without. Then compare W/L stats, factoring for who kicks first. If you use a huge population, like Ranked (or Ranked + Box, even better), you'll factor out the variations in coaching skill. H0 = POMB teams have the same relative win percentage for kicking or receiving first as the general population. H1 = POMB teams win more games, relatively speaking, when they receive first. You may want three categories, including one for bash but no/little POMB or ClawMB.
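
The kick/receive comparison above might start from something like the sketch below, which only computes the size of the effect and leaves the significance test for later. The win counts are invented, and counting draws as non-wins is an assumption.

```python
def receive_kick_gap(recv_wins, recv_games, kick_wins, kick_games):
    """Win rate when receiving first minus win rate when kicking first."""
    if recv_games <= 0 or kick_games <= 0:
        raise ValueError("need at least one game in each category")
    return recv_wins / recv_games - kick_wins / kick_games


# Hypothetical counts; draws are treated as non-wins (an assumption).
pomb_gap = receive_kick_gap(recv_wins=60, recv_games=100,
                            kick_wins=45, kick_games=100)
general_gap = receive_kick_gap(recv_wins=52, recv_games=100,
                               kick_wins=48, kick_games=100)

# Under H0 the two gaps should be about equal; under H1 the POMB gap is larger.
excess = pomb_gap - general_gap
print(f"POMB gap: {pomb_gap:.2f}  general gap: {general_gap:.2f}  "
      f"excess: {excess:.2f}")
```

With real data, the excess would still need a proper test (and far more than 100 games per cell) before rejecting H0.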

@ where the POMB problem comes from: I'm a FUMBBLer and a TFF'er, and I'm quite convinced it's an artifact of the Box. It's good in Ranked, too. But the real problem is not being able to get a breather. My Orcs are happy to go up against Claw-happy Nurgle or Chaos or whatever, but they're in Ranked, so they draw a lot of mirror-matches against Orcs/Dwarfs/whatnot and don't have to trim their TV to stay away from the constant waves of damaging Chaos. One game in three, no problem. Two games in three, problem.
