Borak Build Log - OS and Software
Preparation
The first step is to get hold of an install disc. FUMBBL uses Ubuntu Server, and I downloaded the latest LTS (Long Term Support) edition of it (14.04 at the time).

Then I write the ISO image to a blank CD/DVD (it's also possible to write it to a USB thumbdrive, but it was quicker for me to find a disc).
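For anyone going the USB route instead, the usual approach is a raw copy of the image onto the stick; something like this (the target device name is an example, so double-check it, since dd will happily overwrite the wrong drive):

    sudo dd if=ubuntu-14.04-server-amd64.iso of=/dev/sdX bs=4M
    sync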

Then we boot from the disc (some BIOS fiddling required), and get this:

Ubuntu start of installation


From this point, a fairly boring process starts, where we answer questions about location and keyboard layout. It's quite dull and I haven't documented it in detail. The only special part of the install at this point is the drive partitioning. I've opted for a 16GB swap partition (mostly because Linux usually likes having one) and the remaining space set up as one large partition for the root file system (/). I've only done this on one of the SSDs for now, and will be doing further things on the main system drive later in the process.
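For reference, the layout amounts to roughly the following if done by hand instead of through the installer's menus (device name and partition table type are assumptions on my part):

    # one 16GB swap partition, the rest as a single root partition
    parted --script /dev/sda \
        mklabel msdos \
        mkpart primary linux-swap 1MiB 16GiB \
        mkpart primary ext4 16GiB 100%
    mkswap /dev/sda1
    mkfs.ext4 /dev/sda2    # converted to btrfs later in this log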

On package selection, I choose only the OpenSSH package for now:

Ubuntu package selection


Once the OS has completed installing, the machine will reboot into the newly installed system. At this point, I log in to it (using the user I created during the install), and check the IP address of the machine (this is a temporary address that will be changed when it's time to move the server into the server network).

That being done, we SSH into the server (remember how OpenSSH was installed) from our desktop to make things easier:

First remote login
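For reference, the two steps here boil down to something like the following (user name and address are placeholders):

    # on the server console: which address did it pick up?
    ip addr show

    # from the desktop:
    ssh myuser@192.168.1.50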


See those lines about updates being available? Let's fix that first. We run "apt-get update" to refresh the list of available packages, then "apt-get upgrade" to see what needs to be upgraded.
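In other words (assuming a sudo-capable user, which is what the installer creates):

    sudo apt-get update     # refresh the package lists
    sudo apt-get upgrade    # review and install the pending updates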

Upgrading packages


This upgrade takes like 15 seconds (yay for fast internet and SSDs).

At this point, the idea is to convert the file system from EXT4 (the default one) to a file system called btrfs. This is a pretty modern file system that, in theory, allows me to set up RAID in order to prevent data loss should one of the drives go bad. So, following a guide, the next step is to boot into a live CD (first trying the Ubuntu install disc to see if it has the things I need).

.. And of course that doesn't work directly, so we go back to the installed system and make sure we have the btrfs tools installed:

Installing BtrfsTools


The Ubuntu installer doesn't have the things I need to do the btrfs conversion, so we download a proper live CD and make a disc to boot that. After some poking around, we manage to get things sorted with the btrfs conversion:

Looks like we converted to btrfs
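For anyone curious, the conversion itself boils down to roughly this once you're in a live environment that has the btrfs tools; the device name is an example, and the surrounding bookkeeping depends on which guide you follow:

    # booted from the live CD, with the root partition unmounted:
    fsck.ext4 -f /dev/sda2      # the conversion wants a clean ext4 file system
    btrfs-convert /dev/sda2     # in-place ext4 -> btrfs conversion
    # afterwards, update /etc/fstab (file system type and mount options) and
    # update grub before rebooting into the converted system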


After a bunch of further research, I managed to get the btrfs RAID mirroring configured. First I copied the partition table to the second SSD, then added the new partition to the root pool. Trying to rebalance to raid1 caused errors about there not being enough space, which I ended up solving by running a plain rebalance without converting to raid1 first. Once that rebalance was done, converting data and metadata to raid1 wasn't too hard.
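Spelled out, the steps above look roughly like this (device names are examples, and sfdisk is just one way of copying the partition table):

    sfdisk -d /dev/sda | sfdisk /dev/sdb    # copy the partition table to the second SSD
    btrfs device add /dev/sdb2 /            # add the new partition to the root pool
    btrfs balance start /                   # plain rebalance first, to get around the space errors
    btrfs balance start -dconvert=raid1 -mconvert=raid1 /
    btrfs filesystem df /                   # check the result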

After all that, I now have a RAID1 mirrored root volume:

RAID1 configured root partition. The "unknown" entry is apparently a known display issue and refers to some global reserve buffer type thing


Monday, September 14

A day later, I've gotten hold of the missing SATA cable and connected the second 4TB drive. I also configured the drives into a RAID1 setup (again on top of btrfs) and installed (but not yet configured) the firewall software I use (Shorewall). Very limited on time today, so that's about all I managed to get done.
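For the data drives there's no conversion involved, so creating a mirrored btrfs volume is pretty much a one-liner; something like this (device names and mount point are made up):

    mkfs.btrfs -d raid1 -m raid1 /dev/sdc /dev/sdd
    mkdir -p /data
    mount /dev/sdc /data
    # plus an /etc/fstab entry (by UUID) so it comes back after a reboot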

Tuesday, September 15

Work for the day started with moving the machine into the server network, and configuring IP addresses:
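On Ubuntu 14.04 that configuration lives in /etc/network/interfaces, and the entry ends up looking roughly like this (interface name and addresses here are made up):

    auto eth0
    iface eth0 inet static
        address 10.0.10.20
        netmask 255.255.255.0
        gateway 10.0.10.1
        dns-nameservers 10.0.10.1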

Network interfaces are configured and the server's in the correct network.


As you may notice, the machine's also renamed to Willow. Borak, after being saved, decided enough's enough and took off complaining about 8 years of service and not enough pay. I think he's going back to Blood Bowl..

Next up, it's time to set up Shorewall (the firewall software I use). Each machine on the FUMBBL network has its own firewall rules to limit network access as much as possible:

Initial firewall rules. Pretty simple for this server.
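Shorewall keeps its rules in /etc/shorewall/rules (with zones, interfaces and policies in their own files next to it). A sketch of what a minimal rule set for a database server could look like, with zone names and ports as examples only:

    #ACTION        SOURCE   DEST   PROTO   DPORT
    SSH(ACCEPT)    lan      $FW                     # SSH from the internal network
    ACCEPT         lan      $FW    tcp     3306     # database access for the web servers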


And now it's time to install the main software that will run on the machine: the database. I have decided to attempt a slight switch in what database software I use. Instead of MySQL with the InnoDB backend, I'm going to try forks of both projects: MariaDB instead of MySQL, and XtraDB instead of InnoDB. I'm trying MariaDB because it's more open source than MySQL at this point (and it's a fork made by the original authors of MySQL, before they sold it off to Oracle). XtraDB is a fork of InnoDB (which I currently use), essentially a version of InnoDB optimized for modern multi-core CPUs. In addition, the company in charge of XtraDB (Percona) has a couple of tools I've used for InnoDB that have impressed me, so I will give it a shot with this new install. If everything breaks down, I can always revert to MySQL and InnoDB.

So, we install MariaDB (XtraDB comes by default with MariaDB):

Installing MariaDB.
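The install itself is a single package on Ubuntu (depending on which MariaDB version you want, it may be preceded by adding MariaDB's own apt repository):

    sudo apt-get install mariadb-server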


This is where things get tricky. I want to change how the data is stored on disk (from one large table space to one file per table), which means the backups I normally do can't be used, since they're more of a file backup than a data backup (it's hard to explain this without getting incredibly technical). So, I need to make a backup using a different method (mysqldump).
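The dump on the old server looks something like this; the exact options aren't shown in any screenshot, so treat this as a sketch, but --master-data is the part that records the binary log position needed to hook up replication later:

    # on the old server: logical dump that also records the binlog position
    mysqldump --all-databases --single-transaction \
              --master-data=2 --routines --triggers > fumbbl-dump.sql

    # on the new server, before restoring, file-per-table needs to be enabled
    # in the [mysqld] section of my.cnf, otherwise the import just lands in
    # one big table space again:
    #   innodb_file_per_table = 1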

After completing the backup with the correct options and transferring the file over to the new server (I sort of do this in one step), I configure the new server to be a replication slave and restore the backup (an 11GB file). These steps are slow and boring, mostly waiting for things to complete without any visible progress or indication of how long they will take. Unfortunately, it's hard to show screenshots of this process as it involves database passwords I don't want to publish (for obvious reasons).
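Minus the real credentials, configuring the slave amounts to something like this (every value below is a placeholder):

    # restore the dump on the new server
    mysql < fumbbl-dump.sql

    # point the new server at the old one; the log file and position come from
    # the CHANGE MASTER comment near the top of the dump (written by --master-data)
    mysql -e "CHANGE MASTER TO
                MASTER_HOST='old-server',
                MASTER_USER='repl',
                MASTER_PASSWORD='********',
                MASTER_LOG_FILE='mysql-bin.000123',
                MASTER_LOG_POS=456789;"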

With the backup and restore completed, the next step is to actually start the replication and "catch up" with the remaining changes from the live server:

The new server is trying to catch up with the live one. It's a slow process.
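Starting the replication and keeping an eye on how far behind it is boils down to roughly:

    mysql -e "START SLAVE;"
    mysql -e "SHOW SLAVE STATUS\G" | grep Seconds_Behind_Master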


This part of the process turns out to be quite slow. I've verified that things do get updated on the slave, but it will take quite a while to catch up properly. It's also getting late here, so it's time for me to call it a day and continue tomorrow.

Wednesday, September 16

After spending some time thinking about the database replication slave not catching up, I decided to do some searching on the topic and ended up changing some of the database settings. For those of you who may be interested, I specifically changed the following settings:
  • innodb_flush_log_at_trx_commit = 0
  • innodb_adaptive_hash_index = 1
  • innodb_doublewrite = 0
  • innodb_flush_method = fsync
After switching to this, and restarting the database, the slave synced up practically instantly and this is what we get now:

The replication slave is now up to date!
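For completeness, these settings go in the [mysqld] section of the server configuration (my.cnf, or a file under /etc/mysql/conf.d/):

    [mysqld]
    innodb_flush_log_at_trx_commit = 0
    innodb_adaptive_hash_index     = 1
    innodb_doublewrite             = 0
    innodb_flush_method            = fsync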


Of these settings, I am changing one (innodb_flush_log_at_trx_commit), as it relates to how changes are written (and flushed) to disk. Setting it to "2" is fine for the FUMBBL database; it means that if there's a crash or power failure, a second's worth of changes could be lost (which is reasonable to me). Playing around with this setting (switching it back to "1") makes the slave fall behind the master straight away. Searching the net about this explains why: slave replication is done in a single thread, and write-heavy applications will always have a problem keeping up with the multi-threaded master server if every transaction needs to be flushed to disk instantly.

Next up, I set up the backup process on the new server. This is pretty interesting as one of the major drivers for making the change to one file per table rather than a big table space was to improve backup speeds and size.

So, I set up my backup script on the new server and run it. To give you an idea, the backups on the old server take on the order of 11 hours to complete (and they run daily) and end up roughly 78GB in size (compressed). With the new setup (and faster hardware), the backup takes 9 minutes and 15 seconds to complete and ends up at 6.9GB in size. I expected these numbers to improve, but.. Wow..

After this, I configure, and slightly update, the offsite backup script (which encrypts and sends the backup to Amazon S3). Overall, it takes roughly 10 minutes to upload the encrypted backup (more or less saturating the 100Mbps connection I have).
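The general shape of that script is encrypt-then-upload; something like the following, with the tool choice, key handling and bucket name purely illustrative rather than what the script actually contains:

    # encrypt the backup with a symmetric key kept on the server
    gpg --batch --symmetric --cipher-algo AES256 \
        --passphrase-file /root/.backup-passphrase \
        -o backup.tar.gz.gpg backup.tar.gz

    # push it to S3
    aws s3 cp backup.tar.gz.gpg s3://example-offsite-bucket/$(date +%F)/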

During the process of sorting out backups, I also moved some files around on the btrfs partition to make things a bit cleaner than before. Essentially, btrfs supports something called "subvolumes", which work a bit like folders but can be mounted into the file system properly. In a way, subvolumes are like partitions, except that the available space is shared between them (unless you enable quotas, which I don't need).

Another benefit of using subvolumes instead of mounting the "root" subvolume directly is that it lets me take snapshots should I need to in the future. In principle, a snapshot works like a copy of all the files in the subvolume, except it's effectively instant to make and takes no extra space on the drive (until files start changing, thanks to copy-on-write). Very, very useful to have available at times.
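For reference, both operations are one-liners (the paths here are just examples):

    # create a subvolume (mountable on its own, shares space with everything else)
    btrfs subvolume create /data/mysql

    # read-only snapshot of it: instant, and only starts using space as files change
    btrfs subvolume snapshot -r /data/mysql /data/snapshots/mysql-2015-09-16

    # list what exists
    btrfs subvolume list /data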

After being distracted a bit by the subvolumes, I'm getting to the point where things are close to being ready to switch over. I will, however, keep everything running as-is for a while to see if scheduled jobs function as expected and things keep ticking. I'm not particularly stressed about the current setup as it stands; even if the old server were to die on me completely, the replication setup means no loss of data (possibly a second or so due to inherent replication lag).

Other than some monitoring of the system, I will leave the server as-is until I feel confident things are working. At that point, I'll announce a maintenance downtime in order to do the actual server swap. That involves stopping the site and any running games, cutting the slave replication setup, shutting down Borak, moving the Willow hardware into the 4U case I use for servers, and starting everything up again (possibly changing Willow's IP to simplify things).

Transferring the server into the rack cabinet