Monday 25 August 2014

The missing chapter 6 part 1 and two, data integrity

I've started the sixth chapter, the one on the data integrity I've forgotten. There are the first two parts alongside with the introduction. I've also updated the book on slideshare with the new cover and the last incomplete chapter for the developers. The beautiful cover is made by Chiaretta & Bon. Kudos and many thanks.

I've also uploaded the latex sources on github for anybody to fork and review my crappy english. My former colleague and friend Craig Barnes already started reviewing the tex files, many thanks for the priceless help.


Here's the github repository url : https://github.com/the4thdoctor/pgdba_books

Data integrity

There's just one thing worse than losing the database. Having the data set full of rubbish. The data integrity has been part of PostgreSQL since the beginning. It offers various levels of strength ensuring the data is clean and consistent. In this chapter we'll have a brief look to the various constraints available. The PostgreSQL's constraints can be grouped in two kind. The table constraints and the column constraints. The table constraints are defined on the table's definition after the field's list. The column constraints appear in the field's definition after the data type. Usually for the primary keys and the unique keys the definition is written as table constraint.
The constraint applies the enforcement to any table's row without exclusion. When creating a table constraint on a fully populated table the data is validated first. Any validation error aborts the constraint creation. However, the foreign keys and check constraints accept the clause NOT VALID. With this clause the database assumes the data is valid and skips the validation. The cration is almost immediate. The new constraint is then enforced only for the new data. When using this option the data must be consistent.

Primary keys

A primary key is the unique row identifier. Having this constraint enforced ensures the row can be addressed directly using the key value. A primary key can be enforced on a single or multi column. The data aspect must be unique with the strictest level. That means the NULL values are not permitted in columns participating to the primary key. When creating a primary key this implicitly adds a new unique index on the affected fields. In order to avoid the exclusive lock on the affected table the unique index can be built before the primary using the CONCURRENTLY clause and then used in the primary key definition as shown in 8.3. Using the primary key is the fastest way to access the table's contents.
There is the primary key definition as table and column constraint.



With the table's constraint definition is possible to specify the constraint name and to have a multi column constraint. When writing a multi column constraint the participating columns should be listed separate by commas.

The most common primary key implementation, and probably the best, is to have a serial column as primary key. A serial field is short for integer NOT NULL which default value is associated to the nextval for an auto generated sequence. Because the sequence have its upper limit to the bigint upper limit, this ensures the data does not wraps in the table's lifetime. In the case the primary key is expected to reach the value of 2,147,483,647 the type of choice should be bigserial rather serial. This will create the primary key's field as bigint which upper limit is 9,223,372,036,854,775,807.

However it's still possible to alter the field later in order to match the new requirements. Because changing the data type requires a complete table's rewrite, any view referencing the affected column will abort the change.

Here's the t_data's type change output with the client message level set to debug3.



Dealing with a big amount of data presents also the problem to have enough space for fitting twice of the original table plus the downtime caused by the exclusive lock on the affected relation. A far better approach is to add a new bigint NULLable column without default value. Setting up a trigger for the inserts will keep in sync the new values with the original primary key. Then an update procedure will set the value for the rows. This should run in small batches to avoid to overfill the pg_xlog directory with long running transactions. When everything is in place the new column could then become NOT NULL and a unique index will finally enforce the uniqueness for the new field.

The primary key can then be dropped and recreated using the new unique index. This is permitted only if there's no foreign key referencing the field. In this case a multi drop and create statement is required. The final cleanup should include the trigger's drop and the old primary key removal. Any view using the old primary key should be rebuilt before the drop.

Unique keys

The unique keys are similar to the primary keys. They enforce the uniqueness using an implicit index but they allow the presence of NULL values. Their usage is for enforcing uniqueness on columns not used as primary key. Similar to the primary key the unique constraints are based on a unique index. In fact there's little difference between the unique index and the unique key except the presence of the latter in the system table pg_constraint.

Sunday 17 August 2014

Chapter 10 part 2 - The binary formats

The three binary formats supported by pg_dump are the custom, the directory and the tar format. The first two can be accessed randomly by the restore program and have the parallel restore support, being the best choice for having a flexible and reliable restore. Before the the 9.3 the only format supporting the parallel restore was the custom. With this version the directory format accepts the -j switch. This feature, combined with the parallel dump seen in 9.3 is a massive improvement for saving big amount of data. The tar format does have the limit of 12 GB in the archive's file size and doesn't offer the parallel restore nor the selective restore.

The custom format is a binary archive with a table of contents pointing the various archive sections. The directory format is a directory which name is the value provided with the -f switch. The directory contents are a toc.dat file, where the table of contents and the schema are stored. For each table there is a gzip file which name is a number corresponding to the toc entry for the saved relation. Those files store the data restore for the relation.

The restore from the binary formats requires the pg_restore usage. Because almost all the pg_dump's switches are supported by pg_restore we'll not repeat the look out. Take a look to 9.1 for the switch meanings. Anyway this is the pg_restore's help output.

pg_restore restores a PostgreSQL database from an archive created by pg_dump.

Usage:
  pg_restore [OPTION]... [FILE]

General options:
  -d, --dbname=NAME        connect to database name
  -f, --file=FILENAME      output file name
  -F, --format=c|d|t       backup file format (should be automatic)
  -l, --list               print summarized TOC of the archive
  -v, --verbose            verbose mode
  -V, --version            output version information, then exit
  -?, --help               show this help, then exit

Options controlling the restore:
  -a, --data-only              restore only the data, no schema
  -c, --clean                  clean (drop) database objects before recreating
  -C, --create                 create the target database
  -e, --exit-on-error          exit on error, default is to continue
  -I, --index=NAME             restore named index
  -j, --jobs=NUM               use this many parallel jobs to restore
  -L, --use-list=FILENAME      use table of contents from this file for
                               selecting/ordering output
  -n, --schema=NAME            restore only objects in this schema
  -O, --no-owner               skip restoration of object ownership
  -P, --function=NAME(args)    restore named function
  -s, --schema-only            restore only the schema, no data
  -S, --superuser=NAME         superuser user name to use for disabling triggers
  -t, --table=NAME             restore named table(s)
  -T, --trigger=NAME           restore named trigger
  -x, --no-privileges          skip restoration of access privileges (grant/revoke)
  -1, --single-transaction     restore as a single transaction
  --disable-triggers           disable triggers during data-only restore
  --no-data-for-failed-tables  do not restore data of tables that could not be
                               created
  --no-security-labels         do not restore security labels
  --no-tablespaces             do not restore tablespace assignments
  --section=SECTION            restore named section (pre-data, data, or post-data)
  --use-set-session-authorization
                               use SET SESSION AUTHORIZATION commands instead of
                               ALTER OWNER commands to set ownership

Connection options:
  -h, --host=HOSTNAME      database server host or socket directory
  -p, --port=PORT          database server port number
  -U, --username=NAME      connect as specified database user
  -w, --no-password        never prompt for password
  -W, --password           force password prompt (should happen automatically)
  --role=ROLENAME          do SET ROLE before restore

If no input file name is supplied, then standard input is used.

Report bugs to <pgsql-bugs@postgresql.org>.
If the database connection is omitted pg_restore sends the output to the standard output. The switch -f sends the output to a filename though. This is very useful if we want to check the original dump file is readable, executing a restore onto the /dev/null device.
The the custom and directory formats show their power when restoring on a database connection in a multi core system. Using the -j switch it's possible to specify the number of parallel jobs for the data and the post data section. This can improve massively the recovery time, running the most time consuming actions in multiple processes.

The word parallel can be confusing in some way. PostgreSQL does not supports multithreading. That means each backend process will use just only one cpu. In this context, each job take care of a different area of the restore's table of contents, The TOC is split in many queues with a fixed object list to process. For example one queue will contain the data restoration for a table, and the relation's indices and constraints.

The switch -section offers a fine grain control on which section of the archived data will be restored. In a custom and directory format there are three distinct sections.
  • pre-data This section restores only the schema definitions not affecting the speed and reliability of the data restore. e.g. table's DDL, functions creation, extensions, etc.
  • data The data restore itself, by default saved as COPY statements to speed up the process
  • post-data This section runs the restore for all the objects enforcing the data integrity, like the primary and foreign keys, triggers and the indices which presence during the restore slow down the data reload massively.
The switch -C creates the target database before starting the restoration. To do this the connection must happen first on another database.

We'll now will see how to restore the database seen in 10.1 in the same two steps approach, using the custom format.

Let's start with a complete database dump using the custom format.
postgres@tardis:~/dump$ pg_dump -Fc -f db_addr.dmp  db_addr
pg_dump: reading schemas
pg_dump: reading user-defined tables
pg_dump: reading extensions
pg_dump: reading user-defined functions
pg_dump: reading user-defined types
pg_dump: reading procedural languages
pg_dump: reading user-defined aggregate functions
pg_dump: reading user-defined operators
pg_dump: reading user-defined operator classes
pg_dump: reading user-defined operator families
pg_dump: reading user-defined text search parsers
pg_dump: reading user-defined text search templates
pg_dump: reading user-defined text search dictionaries
pg_dump: reading user-defined text search configurations
pg_dump: reading user-defined foreign-data wrappers
pg_dump: reading user-defined foreign servers
pg_dump: reading default privileges
pg_dump: reading user-defined collations
pg_dump: reading user-defined conversions
pg_dump: reading type casts
pg_dump: reading table inheritance information
pg_dump: reading event triggers
pg_dump: finding extension members
pg_dump: finding inheritance relationships
pg_dump: reading column info for interesting tables
pg_dump: finding the columns and types of table "t_address"
pg_dump: finding default expressions of table "t_address"
pg_dump: finding the columns and types of table "t_city"
pg_dump: finding default expressions of table "t_city"
pg_dump: flagging inherited columns in subtables
pg_dump: reading indexes
pg_dump: reading indexes for table "t_address"
pg_dump: reading indexes for table "t_city"
pg_dump: reading constraints
pg_dump: reading foreign key constraints for table "t_address"
pg_dump: reading foreign key constraints for table "t_city"
pg_dump: reading triggers
pg_dump: reading triggers for table "t_address"
pg_dump: reading triggers for table "t_city"
pg_dump: reading rewrite rules
pg_dump: reading large objects
pg_dump: reading dependency data
pg_dump: saving encoding = UTF8
pg_dump: saving standard_conforming_strings = on
pg_dump: saving database definition
pg_dump: dumping contents of table t_address
pg_dump: dumping contents of table t_city
We'll use a second database for the restore.





We'll then restore just the schema using the following command.
postgres@tardis:~/dump$ pg_restore -v -s -d db_addr_restore_bin db_addr.dmp 
pg_restore: connecting to database for restore
pg_restore: creating SCHEMA public
pg_restore: creating COMMENT SCHEMA public
pg_restore: creating EXTENSION plpgsql
pg_restore: creating COMMENT EXTENSION plpgsql
pg_restore: creating TABLE t_address
pg_restore: creating SEQUENCE t_address_i_id_addr_seq
pg_restore: creating SEQUENCE OWNED BY t_address_i_id_addr_seq
pg_restore: creating TABLE t_city
pg_restore: creating SEQUENCE t_city_i_id_city_seq
pg_restore: creating SEQUENCE OWNED BY t_city_i_id_city_seq
pg_restore: creating DEFAULT i_id_addr
pg_restore: creating DEFAULT i_id_city
pg_restore: creating CONSTRAINT pk_i_id_city
pg_restore: creating CONSTRAINT pk_id_address
pg_restore: creating FK CONSTRAINT fk_t_city_i_id_city
pg_restore: setting owner and privileges for DATABASE db_addr
pg_restore: setting owner and privileges for SCHEMA public
pg_restore: setting owner and privileges for COMMENT SCHEMA public
pg_restore: setting owner and privileges for ACL public
pg_restore: setting owner and privileges for EXTENSION plpgsql
pg_restore: setting owner and privileges for COMMENT EXTENSION plpgsql
pg_restore: setting owner and privileges for TABLE t_address
pg_restore: setting owner and privileges for SEQUENCE t_address_i_id_addr_seq
pg_restore: setting owner and privileges for SEQUENCE OWNED BY t_address_i_id_addr_seq
pg_restore: setting owner and privileges for TABLE t_city
pg_restore: setting owner and privileges for SEQUENCE t_city_i_id_city_seq
pg_restore: setting owner and privileges for SEQUENCE OWNED BY t_city_i_id_city_seq
pg_restore: setting owner and privileges for DEFAULT i_id_addr
pg_restore: setting owner and privileges for DEFAULT i_id_city
pg_restore: setting owner and privileges for CONSTRAINT pk_i_id_city
pg_restore: setting owner and privileges for CONSTRAINT pk_id_address
pg_restore: setting owner and privileges for FK CONSTRAINT fk_t_city_i_id_city
The dump file is specified as last parameter. The -d switch tells pg_restore which database to connect for the restore. By default the postgres user usually connects using the ident operating system daemon or the trust authentication method, when connected as local. That's the reason why in this example there's no need of specifying the username or enter the password.

The second restore's step is the data load. In the example seen in 10.1 we used the pg_dump with -disable-triggers switch in order to avoid failures caused by constraint violation. With the custom format the switch is used at restore time.
postgres@tardis:~/dump$ pg_restore --disable-triggers -v -a -d db_addr_restore_bin db_addr.dmp 
pg_restore: connecting to database for restore
pg_restore: disabling triggers for t_address
pg_restore: processing data for table "t_address"
pg_restore: enabling triggers for t_address
pg_restore: executing SEQUENCE SET t_address_i_id_addr_seq
pg_restore: disabling triggers for t_city
pg_restore: processing data for table "t_city"
pg_restore: enabling triggers for t_city
pg_restore: executing SEQUENCE SET t_city_i_id_city_seq
pg_restore: setting owner and privileges for TABLE DATA t_address
pg_restore: setting owner and privileges for SEQUENCE SET t_address_i_id_addr_seq
pg_restore: setting owner and privileges for TABLE DATA t_city
pg_restore: setting owner and privileges for SEQUENCE SET t_city_i_id_city_seq
However, this approach does not prevent the slowness caused by the indices when reloading the data. If a restore with multiple steps is required (e.g. creating the database schema and check all the relations are in place before starting) the section switch is a better choice. Let's see how it works with the example seen before.

We'll first restore the pre-data section10.1.
postgres@tardis:~/dump$ pg_restore --section=pre-data -v  -d db_addr_restore_bin db_addr.dmp 
pg_restore: connecting to database for restore
pg_restore: creating SCHEMA public
pg_restore: creating COMMENT SCHEMA public
pg_restore: creating EXTENSION plpgsql
pg_restore: creating COMMENT EXTENSION plpgsql
pg_restore: creating TABLE t_address
pg_restore: creating SEQUENCE t_address_i_id_addr_seq
pg_restore: creating SEQUENCE OWNED BY t_address_i_id_addr_seq
pg_restore: creating TABLE t_city
pg_restore: creating SEQUENCE t_city_i_id_city_seq
pg_restore: creating SEQUENCE OWNED BY t_city_i_id_city_seq
pg_restore: creating DEFAULT i_id_addr
pg_restore: creating DEFAULT i_id_city
pg_restore: setting owner and privileges for DATABASE db_addr
pg_restore: setting owner and privileges for SCHEMA public
pg_restore: setting owner and privileges for COMMENT SCHEMA public
pg_restore: setting owner and privileges for ACL public
pg_restore: setting owner and privileges for EXTENSION plpgsql
pg_restore: setting owner and privileges for COMMENT EXTENSION plpgsql
pg_restore: setting owner and privileges for TABLE t_address
pg_restore: setting owner and privileges for SEQUENCE t_address_i_id_addr_seq
pg_restore: setting owner and privileges for SEQUENCE OWNED BY t_address_i_id_addr_seq
pg_restore: setting owner and privileges for TABLE t_city
pg_restore: setting owner and privileges for SEQUENCE t_city_i_id_city_seq
pg_restore: setting owner and privileges for SEQUENCE OWNED BY t_city_i_id_city_seq
pg_restore: setting owner and privileges for DEFAULT i_id_addr
pg_restore: setting owner and privileges for DEFAULT i_id_city
Again the pg_restore loads the objects with the ownership and privileges. What's missing is the constraints creation. The second step is the data section's load.
postgres@tardis:~/dump$ pg_restore --section=data -v  -d db_addr_restore_bin db_addr.dmp 
pg_restore: connecting to database for restore
pg_restore: implied data-only restore
pg_restore: processing data for table "t_address"
pg_restore: executing SEQUENCE SET t_address_i_id_addr_seq
pg_restore: processing data for table "t_city"
pg_restore: executing SEQUENCE SET t_city_i_id_city_seq
pg_restore: setting owner and privileges for TABLE DATA t_address
pg_restore: setting owner and privileges for SEQUENCE SET t_address_i_id_addr_seq
pg_restore: setting owner and privileges for TABLE DATA t_city
pg_restore: setting owner and privileges for SEQUENCE SET t_city_i_id_city_seq
This section simply loads the table's data and sets the sequence values. Apart for the ownership no further action is performed. Finally we'll run the post-data section.
postgres@tardis:~/dump$ pg_restore --section=post-data -v  -d db_addr_restore_bin db_addr.dmp 
pg_restore: connecting to database for restore
pg_restore: creating CONSTRAINT pk_i_id_city
pg_restore: creating CONSTRAINT pk_id_address
pg_restore: creating FK CONSTRAINT fk_t_city_i_id_city
pg_restore: setting owner and privileges for CONSTRAINT pk_i_id_city
pg_restore: setting owner and privileges for CONSTRAINT pk_id_address
pg_restore: setting owner and privileges for FK CONSTRAINT fk_t_city_i_id_city
With this run the constrains (and eventually all the indices) are created in the best approach possible when dealing with the bulk data processing.

Loading the data contents without indices maximise the speed. The constraint and index build with the data already in place results in a faster build and a fresh index without any bloat.

Monday 11 August 2014

Pg chamelion and dba reactions

Finally I've found some time to complete a working prototype of the new library pg_chamelion.

The github repos is here, https://github.com/the4thdoctor/pg_chameleon, please  fork it if you want to debug or give me some feedback.

The library exports the metadata from mysql using sqlalchemy. The informations are used by the PostgreSQL library to rebuild the schema in a PostgreSQL database. Finally the data is dumped to multiple files in CSV format and reloaded into PostgreSQL using the copy_expert command.


The MySQL data is exported in the CSV  format using a custom SQL query with the MySQL's (not standard) syntax for the string concatenation.
The  replace function is also used  to have the double quotes escaped.



The copy into PostgreSQL uses the psycopg2's copy_expert cursor's method, quick and very efficient.

In the future I'll change the config file format to YAML because is simpler to manage and to write.

I'm also planning to write an alternative library to get the metadata from mysql.
I think in general using any ORM is a bad choice. The performance tuning of their horrible queries, when the amount of data becomes serious, is a pain in the ass.

IMHO their usage should be limited for toy sized databases or for quick and dirt proof of concept.

And if you think I'm rude then take a look to http://dbareactions.com/ .

Wednesday 6 August 2014

Chapter 10 intro and part 1. restoring with plain format

Yes I know, the previous chapter was number 8, this is number 10. 
The reason why is I realised I forgot completely to talk about the constraints so I've added a data integrity  chapter just after the the logical layout and the numbers shifted. 
Anyway the pdf on slideshare is updated with the latest version.

Restore

There's little advantage in saving the data if the recover is not possible. In this chapter we'll take a look to the fastest and safest way to restore recover the saved dump.
The backup format determines which program will be used for the restore. We'll take a look first on the restore from plain format. Later we'll see how the custom and directory formats are the best choice, giving flexibility and performance at restore time. Finally we'll see how to improve database performances at restore time sacrificing temporarily the reliability.

The plain format

As seen in 9 the pg_dump by default saves the entire database in plain format. This is an SQL script offering nothing but a straightforward reload strategy. Feeding the saved file into psql rebuilds completely the objects.
This format have few advantages. For example it's possible to edit the statement using a common text editor. This of course if the dump is reasonably small. Even loading a file with vim when its size is measured in gigabytes becomes a stressful experience.

The data is saved by default in the copy format. This guarantee the best performances at load time. It's still possible to save the data using the inserts but this will result in a very slow restore, having each statement to be parsed and planned.

Saving the schema and data in two separate files requires also an extra care at dump time. Restoring from a data only plain backup will very likely result in tables with foreign keys having their data missing because the key violation.

In order to avoid the problem, at backup time and when running a data only backup, the switch -disable-triggers should be used. This will emit the DISABLE TRIGGER statements before the data load and the ENABLE TRIGGER after the data is consistently restored. The following example will show a dump reload session with the separate schema and data save.
Let's create the simple data structure. We'll create a new database with a table for the addresses and another one for the cities. Between the cities an the addresses a foreign key will enforce the relation on the id city column.




Now let's put some data into it.




We'll now execute two dumps one for the schema and one for the data without the disable triggers switch.

postgres@tardis:~/dmp$ pg_dump --schema-only db_addr > db_addr.schema.sql
postgres@tardis:~/dmp$ pg_dump --data-only db_addr > db_addr.data.sql

Looking to the schema dump it's quite obvious what it does. All the DDL are saved in the correct order to restore the same database structure .
The data is then saved by pg_dump in the correct order for having the referential integrity guaranteed. In our very simple example the table t_city is dumped before the table t_address. This way the data will not violate the foreign key. In a scenario where a complex structure is dumped, this cannot be guaranteed. Let's run the same dump with the option -disable-trigger.

postgres@tardis:~/dmp$ pg_dump --disable-triggers --data-only db_addr > db_addr.data.sql

The copy statements in this case are enclosed by two extra statements for disabling and re enabling the triggers.



This way the FOREIGN KEY's triggers with any user defined trigger, will be disabled during the restore, ensuring the data will be safely stored. After the restoration the enable will restore any constraint enforcement.

The data saved in plain format is then restored using the command line client psql.
Let's then create a new database and restore it the saved dumps, first schema and then the data.


Friday 1 August 2014

Chapter 8's final two parts. pg_dumpall and the export's validation

pg_dumpall

The pg_dumpall program is mostly a wrapper for pg_dump. It's usage is mainly for dumping all the databases in the cluster. Having lesser options than pg_dump is also less flexible. However one option is absolutely useful and should be included in any backup plan to ensure a rapid recovery in case of disaster.
The -globals-only option saves on the standard output all the cluster wide options like the tablespaces, the roles and the privileges. The passwords are dumped as well encrypted in md5 format. The dump format for this utility is only text. The best way to save the globals is to specify the -f option followed by the file name. This file can be loaded into an empty cluster to recreate the global objects. The tablespaces, if any present, must have the filesystem already in place before running the sql as PostgreSQL doesn't create the filesystem structure.

This example shows the program call and the contents of the output file.
postgres@tardis:~/dmp$ pg_dumpall --globals-only -f main_globals.sql
postgres@tardis:~/dmp$ cat main_globals.sql 
--
-- PostgreSQL database cluster dump
--

SET default_transaction_read_only = off;

SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;

--
-- Roles
--

CREATE ROLE postgres;
ALTER ROLE postgres WITH SUPERUSER INHERIT CREATEROLE CREATEDB LOGIN REPLICATION;






--
-- PostgreSQL database cluster dump complete
--

postgres@tardis:~/dmp$

Backup validation

There's little advantage in having a backup if this is not restorable. The corruption danger is at various levels and unfortunately the problem appears only when the restore is needed.
The corruption have many causes. If the dump is saved locally a damaged filesystem can hide the problem meanwhile the corrupted block is not read. Also the disk subsystem with undetected problems will result in a silent corruption. In order to limit this kind of problems the filesystem of choice should be a solid one with strong journaling support.

The disk subsystem should guarantee the data reliability rather the speed. Slow disks when backing up the data, in particular if in the compressed format don't limit the speed, being the cpu power the real bottleneck.
If the dump file is transferred over the network is a good idea to generate a md5 checksum to check the file integrity after the transfer.

All those measures don't give the security the backup is restorable. The only test capable to ensure the backup is good is a test restore on separate server. This can be a single test or a more structured check. Which strategy to adopt is determined by the amount of data, the time required for the restore and the backup schedule.

The general purpose databases, which size is measurable in hundreds of gigabytes, the restore can complete in few hours and the continuous test is feasible. For the VLDB, which size is measured in terabytes, the restore can take more than one day, in particular if there are big indices requiring expensive sort on disk for the build. In this scenario a weekly restore gives a good perception if there are potential problems with the saved data.