TL;DR: We are consolidating our hardware and racks in the data center and will perform an extended maintenance spread over multiple days. We have prepared thoroughly for the migration to avoid any downtime and will use this opportunity to further improve our network.
Over the last few years our data center setup has grown from a few machines in a single rack to three racks that are completely filled with servers, plus additional customer-specific racks nearby.
One of our basic tenets has always been to grow organically to avoid unnecessary waste. Now we have reached the limit of renting individual racks and our next organic step is to move to a separate row (and room!) of multiple consecutive racks. This gives us and you enough room to grow in the future while maintaining tight control over our network structure. It also gives us the chance to clean up some smaller annoyances that have accumulated over the years.
As this maintenance requires us to move all of our machines, we have used the opportunity to review and improve all our technology layers:
- We are introducing a redundant spine/core into our switching setup and are upgrading to 40G on all backbone connections.
- We are simplifying our network infrastructure by reducing it to single-vendor components.
- Our routers are being upgraded to 10 Gbit/s on internal and external interfaces.
- Our DNS is now more reliable: it runs on the routers and is included in automatic failover.
- We improved our VM migration code to better support large migration tasks like moving whole racks around.
- Overall, around 40% or more of our capacity is free across CPU, RAM and storage.
- We are keeping a set of SSDs and HDDs on hand in case some disks fail when the servers are turned off and back on.
- Virtualisation hosts that have not yet been upgraded to 10G storage interfaces will be upgraded at that time.
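To illustrate why the free capacity mentioned above matters when moving whole racks, here is a minimal sketch of a headroom check. It is not our actual migration code: the function name `can_evacuate`, the equal rack sizes and the load figures are assumptions chosen only for illustration.

```python
def can_evacuate(rack_loads, rack_capacity, evacuated):
    """Return True if the load of one rack fits into the free
    capacity of all remaining racks.

    rack_loads    -- used capacity per rack (hypothetical units)
    rack_capacity -- total capacity of each rack (assumed equal)
    evacuated     -- index of the rack to be emptied
    """
    load_to_move = rack_loads[evacuated]
    free_elsewhere = sum(
        rack_capacity - load
        for index, load in enumerate(rack_loads)
        if index != evacuated
    )
    return load_to_move <= free_elsewhere


# Assumed example: three equally sized racks, each 60% used (40% free).
print(can_evacuate([60, 60, 60], 100, 0))  # True: 60 fits into 40 + 40
# With only 30% free per rack, one rack no longer fits into the other two.
print(can_evacuate([70, 70, 70], 100, 0))  # False: 70 exceeds 30 + 30
```

In this simplified model, roughly 40% free capacity is enough to empty one of three racks at a time, while noticeably less headroom would not be. A real check would have to be done separately for CPU, RAM and storage.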
The maintenance itself will be performed during regular business and evening hours, because all involved components are fully redundant and have been tested recently. We will perform all steps slowly and carefully, leaving enough capacity and time to verify each step and reduce the chance of critical mishaps.
Nevertheless, our back office personnel will be monitoring the situation closely and will be able to respond to any issues immediately.
If you have any questions or feedback, let us know through your usual contact channels or by email to email@example.com.
Cover photo by Tristan Schmurr, © 2012 CC-BY-2.0