Wednesday 23 July 2014

Chapter 8 - part 2, performance tips

Performance tips

Although pg_dump doesn't affect the running queries, its strict transactional approach has some effects on the dumped schema. Any schema change on the involved relations is blocked until the backup's end. The vacuum efficiency is affected as well, because the dead rows generated during the backup's run cannot be reclaimed, being potentially required by the backup's running transaction.
There are some tips to improve the backup's speed.

Avoid remote backups

pg_dump can connect to remote databases just like any other PostgreSQL client. It may seem reasonable, then, to use the program installed on a centralised storage server and dump the remote cluster locally.
Unfortunately, even when using the compressed format, the entire database flows uncompressed and in clear text from the server to the remote pg_dump; the compression happens locally, when the data is received.
This approach also exposes a network security issue. If the environment is not trusted, the remote connection must happen over a secure channel. This adds extra overhead to the transfer, and any failure on this layer will fail the entire backup.
A far better approach is to save the database locally, using a local connection if possible, and then copy the entire dump file using a secure transfer protocol like scp or sshfs.
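
A minimal sketch of this approach could look like the following, assuming a compressed custom format export; the database name, paths and storage host are just placeholders.

 # dump locally over the local connection, using the compressed custom format
 pg_dump -Fc -f /var/lib/pgsql/backups/db_test.dmp db_test
 # then copy the dump file to the storage server over a secure channel
 scp /var/lib/pgsql/backups/db_test.dmp backup@storage_host:/srv/backups/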

Skip replicated tables

If the database is configured as a logical slave, in Slony or Londiste for example, backing up the replicated tables' data is not important, as the contents are re-synchronised from the master when the node is attached to the replication system. The switch --exclude-table-data=TABLE is then useful for dumping the table's definition only, without the contents.
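
For example, assuming a hypothetical replicated table public.orders, the export could be run like this.

 # dump the whole database but skip the contents of the replicated table
 pg_dump -Fc --exclude-table-data=public.orders -f db_test.dmp db_test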

Check for slow CPU cores

PostgreSQL is a multitasking but not a multithreaded database system; each backend is attached to just one CPU. pg_dump opens one backend connection to the cluster in order to export the database objects. The pg_dump process receives the data flow from the backend and saves it, performing also the optional compression. In this scenario the CPU power is critical in order to avoid a bottleneck. This can be mitigated using the parallel export offered by pg_dump from version 9.3 onwards. The functionality is implemented via the snapshot exports. As these were introduced with PostgreSQL 9.2, the parallel export can happen only from this version onwards and only if the output format is directory.
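
A sketch of a parallel export, assuming four usable cores and placeholder names, could be the following.

 # parallel export with 4 jobs; requires pg_dump 9.3 and the directory format
 pg_dump -Fd -j 4 -f /var/lib/pgsql/backups/db_test_dir db_test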

Check for the available locks

PostgreSQL uses locks at various levels to ensure data consistency. For example, when a table is read, an access share lock is taken on the relation in order to prevent any structure change. Any backend issuing an ALTER TABLE which affects the table structure will wait for the lock to be released before acquiring an exclusive lock itself and then performing the change. The relation locks are stored in the pg_locks view. This view is quite unique because it has a limited number of rows. The maximum number of object lock slots is determined with this simple formula.

 max_locks_per_transaction * (max_connections + max_prepared_transactions)


The default configuration permits only 6400 object lock slots. This value is generally fine. However, if the database has a great number of relations, a full backup with pg_dump could hit the slot limit and fail with an out of memory error.
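The current limit can be checked against the running cluster with a simple query; a sketch via psql, with a placeholder database name, could look like this.

 # compute the available object lock slots from the current settings
 psql -At -d db_test -c "SELECT current_setting('max_locks_per_transaction')::int *
   (current_setting('max_connections')::int +
    current_setting('max_prepared_transactions')::int)"
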
All three GUC parameters require a restart to apply a new value, so it is very important to plan the change before the limit is reached.

Try to use flexible formats

This is more a good practice suggestion than a performance tip. Exporting the database in plain text has some advantages: it is possible to load the dump just using psql, and any file corruption can be managed in a simple way, if the damage is limited of course. The custom and directory formats need the pg_restore utility for the restoration; we'll take a look at this approach in chapter 9. On the other hand, the custom and directory formats offer, alongside the compression, the parallel restore feature and the selective restoration. The compression can be fine tuned to suit the export's nature. In this era of "big data" this is something to consider seriously.
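
As a brief sketch, with placeholder names, the two approaches could look like this: a plain text export restorable with psql alone, and a custom format export with the compression level fine tuned.

 # plain text export, restorable with just psql
 pg_dump -Fp -f db_test.sql db_test
 # custom format export with maximum compression, restorable with pg_restore,
 # which also permits the selective and the parallel restore
 pg_dump -Fc -Z 9 -f db_test.dmp db_test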
