Tuesday 16 September 2014

Book done!

The book is now complete. There's still a lot to do for reviewing the writing and fixing the bad grammar. Anyway is about 107 pages and I'm pretty satisfied. I've worked on this document for 4 months in my spare time and I became more confident with my English during the  writing.

So, what's next? I'll spend the next couple of weeks reviewing and fixing the book. After that I'll put the pdf  on lulu.com free for download and in hardcopy. The latter will have a production/shipping cost of course. I'll also put the mobi version on amazon for kindle distribution. Unfortunately amazon doesn't allow free books. I'll put the minimum price possible 0.99$. Anyway if anybody wants the mobi for free just give me a shout.

I've already started the second volume. This one will cover the PostgreSQL's advanced topics like the memory manager and the HA.

And now there are the final two sections. The restore performance and the dba advice.

Restore performances

When restoring a database, in particular in a disaster recovery scenario, the main goal is to have the data back on line as fast as possible. Usually the data section's restore is fast. If the dump has been taken using the copy statements, which are enabled by default, the reload requires a fraction of the entire restore's time. Taking the advantage of the parallel jobs, available for the custom and directory format, it's still possible to improve the data section's reload.

The other face of the coin is the post-data section. Because the objects in this section are mostly random access operations, the completion can require more time than the data section itself; even if the size of the resulting objects is smaller than the table's data. This happens because the unavoidable sort operations are CPU and memory bound. The parallel restore gives some advantage, but as seen in 10.2 each loop's process is single threaded.

Setting up an emergency postgresql.conf file can speed up the restore, reducing the time up to 40% than the production's configuration. What it follows assumes the production's database is lost and the data is restored from a custom format's backup.


When reloading the data from the dump, the database performs a so called bulk load operation. The PostgreSQL's memory manager have a subroutine which protects the shared segment from the block eviction caused by IO intensive operations. It's then very likely the ring buffer strategy will be triggered by the restore, sticking the IO in a small 4 MB buffer protecting the rest of the memory. A big shared buffer it can cache the data pages when in production but becomes useless when restoring. A smaller shared buffer, enough to hold the IO from the restore processes will result in more memory available for the backends when processing the post-data section. There's no fixed rule for the sizing. A gross approximation could be 10 MB for each parallel job with a minimum cap of 512 MB.


The wal_level parameter sets the level of redo informations stored in the WAL segments. By default is set to minimal, enabling the xlog skip. Having the database in with a standby, or simply using the point in time recovery as alternate backup strategy requires the parameter to be set to archive or hot_standby. If this is the case and you have a PITR or standby to failover, stop reading this book and act immediately. Restoring from a physical backup is several time faster than a logical restore. If you have lost the standby or PITR snapshot then before starting the reload the wal_level must be set to minimal.


Turning off fsync can improve massively the restore's speed. Having this parameter turned off is not safe, unless the cache is have the backup battery to prevent data loss in case of power failure. However, even without the battery at restore time having the fsync off is not critical. After all the database is lost, what else can happen?

checkpoint_segments, checkpoint_timeout

The checkpoint is a vital event in the database activity. When occurs all the pages not yet written to the data files are synced to disk. This in the restore context is a disturbance. Increasing the checkpoint segments and the timeout to the maximum allowed values will avoid any extra IO. In any case the dirtied blocks will be written on disk when the buffer manager will need to free some space.


There's no point in having vacuumed the tables after a complete reload. Unfortunately autovacuum does not know if a table is being restored. When the limit for the updated tuples is recognised the daemon starts a new process wasting precious CPU cycles. Turning off temporarily the setting will let the backends to stay focused on the main goal. The data restore.


Limiting the max connections to number of restore jobs is a good idea. It's ok also giving a slight headroom for one or two connections, just in case there's need to log in and check the database status. This way the available memory can be shared efficiently between the backends.


This parameter affects the index builds which are stored in the restore's post-data section. Low values will results in the backends sorting on disk and slowing down the entire process. Higher values will keep the index build in memory with great speed gain. The value should be carefully sized keeping in mind the memory available on the system. This value should be reduced by a 20% if the total ram is up to 10 GB and by 10% if bigger. This reduction is needed to consider the memory consumed by the operating system and the other processes. From the remaining ram must be subtracted the shared_buffer's memory. The remaining value must be divided by the expected backends to perform the restore. For example if we have a system with 26GB a shared_buffer of 2 GB and 10 parallel jobs to execute the restore, the maintenance_work_mem is 2.14 GB.
26 - 10% =  23.4
23.4 - 2 = 21.4
21.4 / 10 = 2.14
Ignoring this recommendation can trigger the swap usage resulting in a slower restore process.

Get DBA advice

The database administration is weird. It's very difficult to explain what a DBA does. It's a job where the statement ``failure is not an option" is the rule number zero. A DBA usually works in antisocial hours, with a very limited time window. A DBA holds a lot on the shoulders and that's the reason why those people can seem hostile or living in their own world.
Despite the strange reputation, a database expert is an incredible resource for building up efficient and scalable designs. Nowadays is very simple to set up a PostgreSQL instance. Even with the default configuration parameter the system is so efficient that basically doesn't show any problem when running at low or medium load. If this, at first sight, seems a fantastic feature actually is a really bad thing. Any mistake at design level is hidden and when the problem sooner or later appears is too late.

This final advice is probably the most important of the entire chapter. If you have a DBA don't be shy. Ask for any suggestion, even if the solution seems obvious or if the task is small. The database layer is a completely different universe. A small mistake can result in a very big problem.

Of course if there's no DBA, that's bad. Never sail without a compass. Never start a database project without an expert. A professional to take care of the most important part, the foundations.

If this is the case, ask your employer to hire a PostgreSQL DBA as soon as possible.

No comments:

Post a Comment