Christer
2012-08-15 12:31:53
75 votes, rating 5.9
Server performance
As a lot of you have been noticing, the site has recently been sluggish every now and again, mostly during peak hours (somewhere between 8pm and midnight server time). To give you an idea of how complex this stuff is, I'll try to explain the sequence of events and what I believe is the cause (and fix) for the problems.

First off, some background.

In the ancient beginning (roughly 10 years ago), FUMBBL ran on a single server with fairly unoptimized code and database queries. This was all well and good since the number of users was small enough to not stress the system. Over time, our user base grew and I went through a series of hardware upgrades to cope with the increased traffic. When the site hardware was extensive enough, and our user base continued to increase, I ventured into the long and arduous process of optimizing the code and database queries to keep the site responding quickly enough to not be stressful to end users.

Back in 2008 and 2009 FUMBBL was responding to requests from roughly 200 coaches online at peak times. Things were looking good. In mid 2009 Cyanide's game hit the stands and GW sent us the infamous Cease and Desist letter, and a lot of people were expecting us to completely shut down. It's also important to realize that we were stuck with the LRB 4 ruleset at the time and development of the client we used had effectively halted.

FUMBBL did take a significant hit in terms of userbase from this, but we proved to be more resilient than many people thought. At the lowest point, we had a third of the traffic that we used to have.

Since then, we've doubled up on effort with a lot of work on improving the website and integrating a new client (thanks Kalimar!). Since the lowest point, we've managed to increase traffic by 50%.

Now, despite the site handling a lot more users back in early 2009, I've since added a bunch of new features that increase the number of website requests, which leads me to the actual problems as of late.

Today, FUMBBL handles on the order of 2 million page loads per month (which averages out to give or take 1 request per second). Each page load involves loading the HTML page itself, plus each asset that is part of the site (images, stylesheets, scripts). With a conservative average, let's say that there are 9 assets like this and we're looking at 10 website "requests" per second. Now, this is on average over a full 30 day period and this isn't exactly hard to deal with for the servers.
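
The averages above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name is made up for illustration, and the 30-day month and 9 assets per page are the rough assumptions from the text. Unrounded, it gives about 0.77 page loads and 7.7 requests per second, which the text rounds up to 1 and 10.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_requests_per_second(page_loads_per_month, assets_per_page, days=30):
    """Return (page loads per second, requests per second) averaged over the month."""
    seconds = days * SECONDS_PER_DAY
    pages_per_second = page_loads_per_month / seconds
    # Each page load is one request for the HTML plus one request per asset.
    requests_per_second = pages_per_second * (1 + assets_per_page)
    return pages_per_second, requests_per_second


# Figures from the post: 2 million page loads per month, 9 assets per page (assumed).
pages, requests = average_requests_per_second(2_000_000, 9)
print(f"{pages:.2f} page loads/s, {requests:.1f} requests/s")
```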

The problem, however, is that the distribution of the traffic is very different from the average. At the peak times over the last couple of days, there has been on the order of 150 coaches online at the same time. This does not include people who simply access the site and (for example) read the forums without being logged in.

Before I go into the details of what this means, let's back up a bit to the problems we've had lately. The way things progressed from my perspective is as follows:

1. Over the last couple of weeks, the website has been intermittently slow. Normally, I don't personally have any significant latency on loading pages and things are in general practically instant (since I have a practically dedicated gigabit network to the servers). Now and again, however, pages seemed to take a second or two to load. Because of the rarity of this, I didn't really think much about it.

2. A couple of days ago, I was notified that the game server went down. Ok, nothing really exceptional here. The FFB server does have a few bugs, and occasional crashes happen from time to time. I restarted the server and all was well.

3. The day after, the game server crashes again. Luckily, I'm at home since it's in the evening. People also complain that the site is very very slow. I confirm this, and pages are taking 20 seconds to load for me which is just completely unacceptable. Ok, so something is wrong, but what?

4. I start looking into the state of the servers.
- The web server is running at 5% CPU use. There is no memory swapping and no significant hard drive activity. The network is also not congested with traffic. Hmm..
- So, let's check the database. Low CPU use, no disk activity, no flood of connections to the database, no network issues. Hmm again..

5. Some time passes, because I can't figure out what's wrong.

6. Then, it dawns on me: I had a similar problem back in late 2007/early 2008 (or so.. that was a while ago).

Now, this is where we come back to the story. Remember the 10 requests per second on average? With 150 people on the site at once, and all sorts of new features that keep making website requests, I realize that the web server simply can't process all the requests without running out of connections.

"Being online" on the web is a bit complicated since there's no persistent connection to the site as such (I'll skip the details on this since this blog is long enough as it is). For site purposes, being online means loading a page within the last 10 minutes. During these 10 minutes, an average coach loads on the order of 25 pages. At roughly 10 requests per page, that is 250 requests in 600 seconds, which ends up as about 0.4 requests per second. Multiply this by 150 and you have roughly 60 requests per second to deal with.

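The peak-hour estimate can be reproduced the same way. Again this is a hedged sketch, not anything taken from the site's code: the function name is hypothetical, and the 10 requests per page is the rough per-page figure assumed earlier in the post. With unrounded inputs it comes out at about 0.42 requests per second per coach, so treat the totals as order-of-magnitude figures.

```python
def peak_requests_per_second(coaches_online, pages_per_window,
                             window_seconds, requests_per_page):
    """Return (requests/s for one coach, requests/s for all coaches online)."""
    per_coach = pages_per_window * requests_per_page / window_seconds
    return per_coach, per_coach * coaches_online


# Figures from the post: 150 coaches, 25 pages per 10-minute window,
# and an assumed 10 requests (HTML plus assets) per page.
per_coach, total = peak_requests_per_second(150, 25, 10 * 60, 10)
print(f"{per_coach:.2f} requests/s per coach, {total:.1f} requests/s in total")
```
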
To make this almost impossible to understand, though, there are a number of optimizations done by both browsers and the web server. First off, the browser wants to be as quick as possible, so it opens up multiple connections to the server to download assets (pages, images, etc). It also caches things so it won't have to download the same thing multiple times for no reason. This effectively reduces the number of connections a browser opens to the server.

The server does things to improve speed. One of the major features is that it allows a browser to download multiple assets through a single connection, which again reduces the number of connections needed per browser and page. Ok, so far so good.. So what's the problem?

In order to make this actually work in practice, the server has to open a connection when the browser asks for it, process the first request and then wait for some specified amount of time for the next request to come down the line. This wait time has to be long enough to allow the browser to receive and process the previous request and then possibly re-request a new asset.

On the surface, this is all well and good, but this also means that the connection must remain open for a number of seconds. This time is normally defaulted to 15 seconds on the server. Remember how I said browsers open multiple connections to the server to increase speed? Firefox, for example, starts up to 6 connections like this for each page load.

Each connection takes up a fair amount of resources (on the order of 25MB of RAM on the server), so there is a configured limit to the number of these connections (more on that later).

With 150 users on the site at the same time, the server has to deal with a pretty large number of connections. Empirical tests (counting the number of processes running and comparing to the number of people online) suggest that the web server has 1 connection open for each user at any one time, plus a set of spare processes to deal with spikes in connections.

Simplified, the FUMBBL site was configured as follows before today:

- A maximum of 150 connections open
- Connection timeout at 15 seconds
- 10-20 "spare" processes waiting for connections

Now, with 150 people on the site at once, it's not that hard to realize that the 150 connections are not enough. If all of these people open up a connection to the site, the server can't process more users at all, and any further requests will have to wait. Remember that there are also non-logged-in users accessing the site and we're likely looking at 200+ people.
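
A quick way to see the shortfall is to compare the connection limit with the number of people on the site. The sketch below assumes one open connection per user (the empirical figure mentioned above) and uses 200 as a stand-in for the "200+" estimate; the function name is made up for illustration.

```python
def connection_headroom(max_connections, users_online, connections_per_user=1):
    """Connections left over; a negative value means that many requests must wait."""
    return max_connections - users_online * connections_per_user


old_limit = 150  # the maximum number of open connections before the change

# Logged-in coaches only: every single connection is in use.
print(connection_headroom(old_limit, 150))

# Including visitors who are not logged in (200 is an assumed round number).
print(connection_headroom(old_limit, 200))
```

With these inputs the first call leaves no spare connections at all, and the second one is 50 connections short.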

And since browsers will sometimes try to open a second connection while retaining the first one for as long as possible (15 seconds), the site will slow to a crawl for basically everyone, every time.

Ok, so realizing this, I have now configured the server to the following:

- A maximum of 512 connections open (meaning a theoretical max memory use of 12GB)
- Connection timeout at 10 seconds (this can't be too low, in order to keep things working properly)
- 40-100 spare processes.
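
The memory figure for the new limit follows from the per-connection cost mentioned earlier. A minimal sketch, assuming about 25MB per connection and 1024MB per GB (the function name is hypothetical):

```python
MB_PER_CONNECTION = 25  # rough per-connection memory cost quoted in the post


def worst_case_memory_gb(max_connections, mb_per_connection=MB_PER_CONNECTION):
    """Memory needed if every allowed connection is open at once, in GB."""
    return max_connections * mb_per_connection / 1024


old_gb = worst_case_memory_gb(150)
new_gb = worst_case_memory_gb(512)
print(f"old limit: {old_gb:.1f} GB, new limit: {new_gb:.1f} GB")
```

For 512 connections this works out to 12.5 GB, in line with the roughly 12GB theoretical maximum quoted above. It is a worst case, since it only applies if every allowed connection is actually open.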

Assuming that this is indeed the reason for the slowdown, this change should improve things immensely and the site should be responsive even during peak hours again. I will continue to monitor the situation and am hoping that this will, indeed, fix things. If it doesn't, I honestly don't know what could cause this issue. We'll just have to see.

Thanks for taking the time to read this massive wall of text, and I hope you understood at least parts of it :)

You've spent enough time reading now, go find a game to play and have fun!
Comments
Posted by Woodstock on 2012-08-15 12:53:57
Rated 6 for the wall of text! ;)

j/k, thanks for the information.
Posted by GAZZATROT on 2012-08-15 12:55:12
Christer your dedication to this site is heart warming.

I have nothing intelligent to add to all the tech speak you (very well) explained.

Other than thank-you.
Posted by Garion on 2012-08-15 13:00:37
great read. you rock my world :P
Posted by DukeTyrion on 2012-08-15 13:03:06
Thanks for the update, and the continued great work on the site.

Reminds me a bit of when we had some techs reconfiguring kernels in Unix so that the flow of processes performed at their optimal rate.
Posted by OenarLod on 2012-08-15 13:11:53
A big thanks to BigC!
Posted by Niebling on 2012-08-15 13:28:02
Good read :) thx for all your hard work
Posted by The_Murker on 2012-08-15 13:34:20
Very well explained. Thanks for the info.
Posted by garyt1 on 2012-08-15 13:55:37
Great to read these issues should be sorted now.
Posted by the_Sage on 2012-08-15 14:20:16
While actively using the site, I'll often have 4-10 fumbbl tabs open simultaneously, with various teams, forum threads, and blogs. I guess I should limit that a bit then?
Posted by Kyyberi on 2012-08-15 15:13:06
Rated 6 as in that enormous wall of text there wasn't a single 6 in it. Almost every other numbers is, but no sixes. What does that mean?
Posted by Christer on 2012-08-15 15:16:27
Sage, the number of windows you have is pretty much irrelevant. Keep using the site in any way you prefer and I'll deal with keeping the servers spinning smoothly :)
Posted by Wreckage on 2012-08-15 15:22:57
Kyberri you are wrong:
"Remember how I said browsers open multiple connections to the server to increase speed? Firefox, for example, starts up to [b]6[/b] connections like this for each page load."

Thanks for explaining, Christer.
Posted by Emeric on 2012-08-15 15:27:28
Excellent explanation, thx.
That reminds me that there is a lot of work involved and some expensive hardware running the site : i'll go get back my gold badge :)
Posted by xcver on 2012-08-15 15:28:41
Thanks for Everything...and this :)
Posted by Purplegoo on 2012-08-15 16:02:43
I always enjoy this sort of thing. Understand none of it. But enjoy it. ;)
Posted by dfunkateer on 2012-08-15 16:48:10
Surely rating The Big C's blogs anything less than a 6 is blasphemy.

I demand to know who did this so we can burn the heretic!
Posted by CroixFer on 2012-08-15 17:26:38
Can anyone give more information on the "infamous Cease and Desist letter"? I never heard of this (probably because I am a newbie in terms of site longevity)....
Posted by pythrr on 2012-08-15 17:38:56
I am so happy there are people in the world who understand this.

And also those who keep the magic little people in my TV screen happy.
Posted by gibby33 on 2012-08-15 17:44:22
Thx for the read I really enjoyed it.
Posted by oryx on 2012-08-15 18:03:41
rated six for a very clear explanation of a complicated concept!
Posted by Dalfort on 2012-08-15 18:04:01
Thank-you for all the time you spend keeping the site/servers running AND then take the time to explain in layman's terms for us.
Posted by Dan-Da-Man on 2012-08-15 19:22:51
And this is the reason why this site kicks Cyanides ass, Thank you C.
Posted by Balle2000 on 2012-08-15 21:27:51
Wow, this web specs blog reads like an X-file novel. Reading about connection timeouts has never been this thrilling! :)
Posted by SavageJ on 2012-08-16 07:11:58
I love stories of applications with bad response times and none of the basic server metrics out of line! Thanks for the good work, C.
Posted by DatMonsta on 2012-08-16 09:05:05
Wow, even I understood what this wall of text means!
Big C, you are a wizard!