[Image: Elephant in Serengeti]

PostgreSQL version migration

Migrating PostgreSQL from one version to another has become pretty easy. Using pg_upgrade, it takes only a few seconds to upgrade even a 100GiB database. It gets interesting when you switch platforms, say from 32bit to 64bit, as we are currently doing with our switch from Gentoo to NixOS. Our NixOS-based platform is stable enough for customers to use, and larger databases in particular benefit from the bigger RAM sizes a 64bit platform supports.

So the question is: How to migrate from 32bit to 64bit with as little downtime as possible?

pg_upgrade

pg_upgrade modifies the data files in-place. There is no SQL dump/restore involved, hence it’s very fast. Unfortunately, but for obvious reasons, the raw disk format of PostgreSQL on 32bit is not compatible with the format on a 64bit machine. So, no pg_upgrade in this case.
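For reference, on matching architectures a pg_upgrade run is little more than pointing it at both clusters. A minimal sketch, assuming two side-by-side installations with illustrative paths (none of these paths are from our actual setup):

```shell
# Sketch only: upgrade in place by hard-linking data files.
# --link avoids copying the data, which is why it finishes in seconds.
pg_upgrade \
    --old-bindir  /usr/lib/postgresql-old/bin \
    --new-bindir  /usr/lib/postgresql-new/bin \
    --old-datadir /srv/postgresql/old/data \
    --new-datadir /srv/postgresql/new/data \
    --link
```

With a 32bit source cluster and a 64bit target, pg_upgrade refuses to run, precisely because the on-disk formats differ.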

dump/restore

This is the classic way of migrating any database to another version. Dump the entire contents into SQL and feed it to the new server. It works. It’s slow.

The test database is about 150GiB on disk. A first approach might look like this:

newdb# pg_dumpall -h olddb.fcio.net | psql

Took 5h 42m. Ouch. 

Maybe changing some of the PostgreSQL settings helps.

fsync=off
full_page_writes=off

With these settings, we no longer sync writes to disk, and full pages are no longer forced into the WAL. This should improve performance, at the cost that the database is probably broken after a system failure. That is not a problem during a restore – if the system crashes, we have to start over anyway.
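One way to apply these settings temporarily, without editing the config file, is ALTER SYSTEM plus a reload. This is a sketch assuming PostgreSQL 9.4 or later; on older versions, edit postgresql.conf and reload instead:

```shell
# Temporarily trade crash safety for restore speed on the *new* server.
sudo -iu postgres psql -c "ALTER SYSTEM SET fsync = off;"
sudo -iu postgres psql -c "ALTER SYSTEM SET full_page_writes = off;"
sudo -iu postgres psql -c "SELECT pg_reload_conf();"
```

Both parameters only need a reload, not a restart, so this can be done on a running server.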

Guess what. Didn’t change much. Actually it got a bit slower (5h 50m) but that’s probably due to other factors (virtualization, network storage).

Let’s try the custom format, dumping only the main database of this particular application. The other databases take maybe 100MiB in total, so the difference should be marginal.

newdb# pg_dump -Fc -h olddb.fcio.net dbname | pg_restore -d dbname

7h 19m. WAT. Turns out the custom format compresses the dump by default. This is quite pointless in our case and just consumes a lot of CPU. Turn compression off then!

newdb# pg_dump -Fc -Z0 -h olddb.fcio.net dbname | pg_restore -d dbname

Now it’s a bit faster than the naïve approach: 5h 21m. Still nothing to be proud of! It turns out that pg_dump both requires quite some CPU and gets slowed down by network latency. When we create the dump on the old database VM, things start to improve:

newdb# ssh olddb.fcio.net sudo -iu postgres pg_dumpall | sudo -iu postgres psql

5h 4m. We saved about 40 minutes compared to the original approach. As seen above, the custom format without compression performs better than a plain dump. Another factor is network latency and buffering. Adding a small (88MiB) buffer might decouple the two processes enough that one doesn’t have to wait for the other:

newdb# ssh olddb.fcio.net sudo -iu postgres pg_dump -Fc -Z0 dbname "|" \
    mbuffer | sudo -iu postgres pg_restore -d dbname

3h 33m. Now we are talking! Further increasing the buffer to 500MiB saved another half hour: 3h 4m.

So the final command would be:

newdb# source=OLDDBHOST
newdb# database=DBNAME
newdb# ssh $source sudo -iu postgres pg_dump -Fc -Z0 -C $database "|" \
    mbuffer -m 500m | sudo -iu postgres pg_restore -C -d postgres

Don’t forget to reset the fsync/full_page_writes parameters if you changed them.
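If you applied the settings via ALTER SYSTEM (a sketch assuming PostgreSQL 9.4+; with a plain postgresql.conf edit, just revert the file and reload), resetting looks like this:

```shell
# Restore crash safety before going back into production.
sudo -iu postgres psql -c "ALTER SYSTEM RESET fsync;"
sudo -iu postgres psql -c "ALTER SYSTEM RESET full_page_writes;"
sudo -iu postgres psql -c "SELECT pg_reload_conf();"
```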

All in all, measuring helps a lot to find the right set of tools. Your situation might be different: measure!

Image credit: Ganesh Raghunathan.
