Category Archives: Platform

Platform features, security updates, technical stuff


Sneak Preview: Upcoming FC Platform and Infrastructure Features

We are planning to implement some cool stuff for the Flying Circus hosting platform and its underlying infrastructure during the second half of this year. In this post, I will give a preview of the technical improvements you can look forward to.

All of these improvements are included in the platform subscription (this is what the platform subscription is actually for!) so you don’t have to pay extra for any of them.


Downloading and extracting files with batou

In our daily work with deployments, customizing a package downloaded from the internet is routine. In most cases this means downloading a tar file, extracting it, and finally modifying some of its contents to fit our needs.

This blog post will show you how this can be done during the deployment using batou, without fussing around with tar's command-line options.
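
The three steps named above (download, extract, modify) can be sketched in plain Python. This is not batou's API: the sketch uses only the standard library, and the function names `download` and `extract_and_patch`, their parameters, and the checksum check are assumptions made for illustration.

```python
import hashlib
import tarfile
import urllib.request
from pathlib import Path


def download(url: str, destination: Path, sha256: str) -> Path:
    """Fetch `url` to `destination` and verify its SHA-256 checksum (hypothetical helper)."""
    urllib.request.urlretrieve(url, destination)
    digest = hashlib.sha256(destination.read_bytes()).hexdigest()
    if digest != sha256:
        raise ValueError(f"checksum mismatch for {url}: got {digest}")
    return destination


def extract_and_patch(archive: Path, target: Path, relative_file: str,
                      old: str, new: str) -> Path:
    """Extract `archive` into `target`, then replace `old` with `new` in one file."""
    target.mkdir(parents=True, exist_ok=True)
    root = target.resolve()
    with tarfile.open(archive) as tar:
        # Refuse members that would be written outside the target directory.
        for member in tar.getmembers():
            if not (root / member.name).resolve().is_relative_to(root):
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(target)
    patched = target / relative_file
    patched.write_text(patched.read_text().replace(old, new))
    return patched
```

A deployment would call `download` first and pass the resulting path to `extract_and_patch`. A deployment tool adds idempotency on top of this, so that unchanged archives are not fetched or unpacked again.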



PostgreSQL version migration

Migrating PostgreSQL from one version to another has become pretty easy. Using pg_upgrade, it takes a few seconds to upgrade even a 100 GiB database. It becomes interesting when you switch platforms, say from 32-bit to 64-bit, as we are doing with our switch from Gentoo to NixOS. Our NixOS-based platform is stable enough for customers to use it. Some larger databases benefit especially from larger RAM sizes.

So the question is: how do we migrate from 32-bit to 64-bit with as little downtime as possible?


Ceph performance learnings (long read)

We have been using Ceph since the 0.7x releases back in 2013. We started when we were fed up with the open source iSCSI implementations and longed to provide our customers with a more elastic, manageable, and scalable solution. Ceph has generally fulfilled its promises from the perspective of functionality. However, if you have been following this blog or searched for Ceph troubles on Google, you will likely have seen our previous posts.

Aside from early software stability issues, we had to invest a good amount of manpower (and nerves) into learning how to make Ceph perform acceptably and how all the pieces fit together: hard drives, SSDs, RAID controllers, 1 and 10 Gbit networks, CPU and RAM consumption, Ceph configuration, Qemu drivers, …

Today, I’d like to present our learnings from both a technical and a methodical view. The methodical aspects in particular should be seen in the light of having run a production cluster for a comparatively long time, going through version upgrades, hardware changes, and so on. Even if you won’t be bitten by the specific issues of the 0.7x series, the methods may prove useful to avoid navigating into troublesome waters. No promises, though. 🙂


Automatic installation of Oracle Java

Our customers at times require Oracle Java for their applications. Our new platform is based on NixOS. As with most Linux distributions, Oracle Java cannot be installed just like that. Oracle’s license prevents redistribution or direct downloading from their servers. NixOS is no exception there.

While manual installation is pretty straightforward on NixOS, ultimately an automated process is what makes operators happy. We use batou for this.


Thoughts on systems management methods

Reading "Why Order Matters: Turing Equivalence in Automated Systems Administration" (by Steve Traugott and Lance Brown) 15 years ago was a career-changing moment for me. In this blog post, I will explore what some of the points made in this article mean for today’s data center infrastructures. I will also give a bit of background on what motivated our recent move to NixOS.


Improving Ceph OSD start-up behaviour with vmtouch

We have a love/hate relationship with Ceph. On one hand, it is probably the best open source distributed storage around. On the other hand, Ceph repeatedly exhibits unexpected behaviour under high load. You rightly expect Flying Circus VMs to perform evenly, and that is something we keep revisiting regularly. In the following article, I will describe an improvement we have applied to a common pain point: I/O hangs during OSD restarts.

Restarting an OSD (Object Storage Daemon) places additional load on its backing disks. Flying Circus business growth led to increasing storage I/O demand. While this is generally a good thing, it has brought our main Ceph cluster near its throughput limit several times. Danger ahead: the storage cluster runs fine as long as nothing special happens. But if something unusual happens, the cluster suddenly goes over the tipping point and performance becomes shaky.


batou – recent improvements and roadmap update

batou is our open source web application deployment utility. We use it to perform simple and complex application deployments on top of the Flying Circus platform as well as into Vagrant VMs or local developer instances.

At the recent Plone Alpine City Sprint we took the time to improve batou’s documentation a lot. You can read it at https://batou.readthedocs.org. The basic concepts (modelling your application and fitting it into an environment) should be easier to understand, and we have written a full guide through all the important features if you want to get started. The API reference and the CLI commands are now covered almost completely as well.


Announcing our new NixOS-based platform generation

This text first appeared in our October issue of Airmail, our monthly newsletter. This is an important topic and we want to make sure it reaches you.

Why are we moving towards NixOS?

NixOS is a devops-friendly package manager and Linux distribution. It solves similar issues and has similar ideas about how things should work as we had when we started building the original Flying Circus platform.

VENOM’s little brother is here – another Qemu security upgrade required

A new Qemu vulnerability has been discovered recently. We are going to proactively reboot all VMs over the next few days.

Update 2015-08-05: The VM restarts will be performed tonight during the maintenance windows according to each customer’s schedule. We decided to skip the regular lead time due to the importance of this update and to speed up another important update to our storage and backup infrastructure. We are paying close attention to keeping your applications and your data safe, especially after the events in recent months. The current and upcoming changes are part of the promised updates, upgrades, and improvements to our infrastructure in response to those outages.
