All posts by Christian Kauhaus

About Christian Kauhaus

Christian is a systems engineer working with Flying Circus Internet Operations.

Dirty Cow: Restarting all VMs

All VMs are currently affected by the “Dirty Cow” kernel bug. The upcoming release 2016_034 contains a kernel update which upgrades Linux to the unaffected version 4.4.28. As usual, the kernel update requires a reboot of all VMs.

Schedule:

  • Tue 15 through Thu 17 November 2016: reboot staging VMs
  • Thu 17 through Thu 24 November 2016: reboot production VMs

VM reboots will be scheduled within the agreed maintenance windows. We will piggy-back a Qemu binary environment update onto these reboots, as it would otherwise require a separate reboot.
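To see whether a given VM has already booted into the updated kernel, one could compare the running kernel release against the 4.4.28 version named above. The following is only a minimal sketch: the function names `parse_release` and `runs_updated_kernel` are hypothetical, and it assumes the release string starts with a dotted numeric version (for example `4.4.28-something`). It merely checks for the version shipped with release 2016_034 and says nothing about kernels from other vendors, which may carry backported fixes under different version numbers.

```python
import platform
import re

# Kernel version named in the announcement for release 2016_034.
TARGET = (4, 4, 28)


def parse_release(release):
    """Return the leading numeric version of a kernel release string as a tuple.

    Assumes the string starts with 'major.minor' or 'major.minor.patch';
    a missing patch level is treated as 0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part or 0) for part in match.groups())


def runs_updated_kernel(release, target=TARGET):
    """True if the given release is at least the target version."""
    return parse_release(release) >= target


if __name__ == "__main__":
    release = platform.release()  # same string that `uname -r` reports
    state = "updated" if runs_updated_kernel(release) else "reboot still pending"
    print(f"{release}: {state}")
```

Comparing tuples of integers rather than raw strings avoids the usual pitfall where `"4.4.6"` sorts after `"4.4.28"` lexicographically.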

Sneak Preview: Upcoming FC Platform and Infrastructure Features

We are planning to implement some cool stuff for the Flying Circus hosting platform and its underlying infrastructure during the second half of this year. In this post, I will give a preview of the technical improvements you can look forward to.

All of these improvements are included in the platform subscription (this is what the platform subscription is actually for!) so you don’t have to pay extra for any of them.

Thoughts on systems management methods

Reading “Why Order Matters: Turing Equivalence in Automated Systems Administration” (by Steve Traugott and Lance Brown) 15 years ago was a career-changing moment for me. In this blog post, I will explore what some of the points made in that article mean for today’s data center infrastructures. I will also give a bit of background on what motivated our recent move to NixOS.

Improving Ceph OSD start-up behaviour with vmtouch

We have a love/hate relationship with Ceph. On the one hand, it is probably the best open source distributed storage around. On the other hand, Ceph repeatedly exhibits unexpected behaviour under high load. You rightly expect Flying Circus VMs to perform evenly, so this is something we revisit regularly. In the following article, I will describe an improvement we have applied to a common pain point: I/O hangs during OSD restarts.

Restarting an OSD (Object Storage Daemon) places additional load on its backing disks. Flying Circus business growth has led to increasing storage I/O demand. While this is generally a good thing, it has brought our main Ceph cluster near its throughput limit several times. Danger ahead: the storage cluster runs fine as long as nothing special happens. But if something unusual happens, the cluster suddenly goes over the tipping point and performance becomes shaky.

VENOM’s little brother is here – another Qemu security upgrade required

A new Qemu vulnerability has been discovered recently. We are going to proactively reboot all VMs over the next few days.

Update 2015-08-05: The VM restarts will be performed tonight during each customer’s maintenance window. We decided to skip the regular lead time due to the importance of this update and to speed up another important update to our storage and backup infrastructure. We are paying close attention to keeping your applications and your data safe, especially after the events of recent months. The current and upcoming changes are part of the updates, upgrades, and improvements to our infrastructure that we promised in response to those outages.
