Wednesday 6 August 2014

Chapter 10 intro and part 1: restoring with the plain format

Yes, I know: the previous chapter was number 8 and this is number 10. 
The reason is that I realised I had completely forgotten to talk about constraints, so I've added a data integrity chapter just after the logical layout and the chapter numbers shifted. 
Anyway, the pdf on slideshare is updated with the latest version.

Restore

There's little advantage in saving the data if recovery is not possible. In this chapter we'll take a look at the fastest and safest ways to restore a saved dump.
The backup format determines which program is used for the restore. We'll look first at restoring from the plain format. Later we'll see how the custom and directory formats are the best choice, giving flexibility and performance at restore time. Finally we'll see how to improve database performance at restore time by temporarily sacrificing reliability.

The plain format

As seen in chapter 9, pg_dump by default saves the entire database in plain format. This is an SQL script offering nothing but a straightforward reload strategy: feeding the saved file into psql rebuilds the objects completely.
This format has a few advantages. For example, it's possible to edit the statements using a common text editor, provided of course that the dump is reasonably small. Even loading a file with vim becomes a stressful experience when its size is measured in gigabytes.

The data is saved by default in the COPY format, which guarantees the best performance at load time. It's still possible to save the data as INSERT statements, but this results in a very slow restore, since each statement must be parsed and planned individually.
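The difference between the two formats is easy to see in the generated script. Here's a sketch of the two forms, with an illustrative table and made-up values; pg_dump's --inserts switch produces the second form, and --column-inserts also adds the column list:

```sql
-- Default plain dump: one COPY block per table, parsed once.
COPY t_city (id_city, city_name) FROM stdin;
1	Brighton
2	London
\.

-- With pg_dump --inserts: one statement per row,
-- each one parsed and planned individually.
INSERT INTO t_city VALUES (1, 'Brighton');
INSERT INTO t_city VALUES (2, 'London');
```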

Saving the schema and the data in two separate files also requires extra care at dump time. Restoring from a data-only plain backup will very likely leave the tables with foreign keys missing their data because of key violations.

To avoid the problem, when running a data-only backup the switch --disable-triggers should be used. This emits the DISABLE TRIGGER statements before the data load and the ENABLE TRIGGER statements after the data is consistently restored. The following example shows a dump and reload session with the schema and the data saved separately.
Let's create a simple data structure: a new database with a table for the addresses and another one for the cities. Between the cities and the addresses a foreign key will enforce the relation on the city id column.
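A minimal sketch of that structure could look like the following; the database name db_addr and the table names t_city and t_address come from the dumps shown later, while the column and constraint names are assumptions:

```sql
CREATE DATABASE db_addr;

\c db_addr

CREATE TABLE t_city
(
    id_city serial,
    city_name character varying(100),
    CONSTRAINT pk_t_city PRIMARY KEY (id_city)
);

CREATE TABLE t_address
(
    id_address serial,
    id_city integer NOT NULL,  -- references the city table
    address character varying(200),
    CONSTRAINT pk_t_address PRIMARY KEY (id_address)
);

-- The foreign key enforcing the relation between addresses and cities.
ALTER TABLE t_address
    ADD CONSTRAINT fk_t_address_t_city FOREIGN KEY (id_city)
    REFERENCES t_city (id_city);
```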




Now let's put some data into it.
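Something along these lines, with assumed column names and made-up sample rows; the ids in t_address assume the serial values start at 1:

```sql
INSERT INTO t_city (city_name)
VALUES ('Brighton'), ('London');

INSERT INTO t_address (id_city, address)
VALUES (1, '20 Park Avenue'),
       (2, '15 Rose Street');
```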




We'll now execute two dumps, one for the schema and one for the data, without the disable-triggers switch.

postgres@tardis:~/dmp$ pg_dump --schema-only db_addr > db_addr.schema.sql
postgres@tardis:~/dmp$ pg_dump --data-only db_addr > db_addr.data.sql

Looking at the schema dump, it's quite obvious what it does: all the DDL is saved in the correct order to rebuild the same database structure.
The data is also saved by pg_dump in the correct order to guarantee referential integrity. In our very simple example the table t_city is dumped before the table t_address, so the data will not violate the foreign key. When a complex structure is dumped, this ordering cannot be guaranteed. Let's run the same dump with the option --disable-triggers.

postgres@tardis:~/dmp$ pg_dump --disable-triggers --data-only db_addr > db_addr.data.sql

In this case the COPY statements are enclosed by two extra statements that disable and re-enable the triggers.
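The relevant fragment of the data-only dump looks roughly like this; the rows shown are illustrative and the exact output depends on the database:

```sql
ALTER TABLE t_address DISABLE TRIGGER ALL;

COPY t_address (id_address, id_city, address) FROM stdin;
1	1	20 Park Avenue
2	2	15 Rose Street
\.

ALTER TABLE t_address ENABLE TRIGGER ALL;
```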



This way the foreign key triggers, along with any user-defined triggers, are disabled during the restore, ensuring the data load will not fail on constraint violations. After the load, the ENABLE TRIGGER statements reinstate the constraint enforcement.

The data saved in plain format is restored using the command line client psql.
Let's then create a new database and restore the saved dumps into it, first the schema and then the data.
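A sketch of the session, using db_addr_restore as an assumed name for the new database:

```shell
postgres@tardis:~/dmp$ createdb db_addr_restore
postgres@tardis:~/dmp$ psql db_addr_restore < db_addr.schema.sql
postgres@tardis:~/dmp$ psql db_addr_restore < db_addr.data.sql
```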

