Sunday 27 July 2014

Chapter 8 - part 3, under the bonnet. With a bonus

Snapshot exports

Since PostgreSQL 9.2 are supported the transaction's snapshot exports. A session with an open transaction, can export its consistent snapshot to any other session. The snapshot remains valid meanwhile the transaction is open. Using this functionality offers a way to run multiple backends on a consistent data set frozen in time. This feature resulted in the brilliant parallel export in the 9.3's pg_dump as described in 8.3.
In the following example, let's consider the table created in 5.5. We'll first start an explicit transaction and then we'll export the current snapshot.

 We are first starting a transaction with the REPEATABLE READ isolation level. The second statement exports the current snapshot using the function pg_export_snapshot(). Finally we are checking with a simple row count the table t_data have data inside.

We can now login with in a different session and delete all the rows from the t_data table.

With the table now empty let's import the snapshot exported by the first backend.

The function pg_export_snapshot saves the current snapshot returning the text string which identifies the snapshot. Passing the string to clients that want to import the snapshot gives to independent sessions a single consistent vision. The import is possible only until the end of the transaction that exported it. The export is useful only in the READ COMMITTED transactions, because the REPEATABLE READ and higher isolation levels use the same snapshot within their lifetime.

pg_dump under the bonnet

The pg_dump source code gives a very good picture of what exactly the backup software does. The process runs into with fixed transaction's isolation level accordingly with the server's version. The distinction is required because, becoming PostgreSQL more sophisticated at each major release, the isolation levels became more and more strict with their meanings.
More informations about the transactions are in 5.7.
From PostgreSQL 9.1 the transaction serializable became a real serialisation. The transaction's behaviour offered by the serializable in the version up to 9.0 were assigned to the REPETABLE READ transaction's isolation, its real kind. The SERIALIZABLE transaction's isolation level is still used with DEFERRABLE option when pg_dump is executed with the option -serializable-deferrable as seen in 8.1.3. The switch have effect only on the remote server with version 9.1 and later. The transaction is also set to READ ONLY, when supported by the server, in order to limit the XID generation.

Table 8.1: pg_dump's transaction isolation levels
Server version Command
>= 9.1 with -serializable-deferrable SERIALIZABLE, READ ONLY, DEFERRABLE

From the version 9.3 pg_dump supports the parallel dump using the feature seen in 5.7.1. As the snapshot export is supported from the version 9.2 this permit a parallel dump from the older major version if using the newer backup program. However, pg_dump accepts the option -no-synchronized-snapshots in order to dump in parallel jobs from the older versions. The data is not supposed to be consistent if there are read write transactions during this kind of export. To have a consistent export in this case all the transactions which modify the data must be stopped meanwhile the export is in progress.

When exporting in parallel the only permitted format is the directory. The pg_restore program since the version 9.3 supports also the directory format for the paralelle data restoration. The combination of the parallel jobs backup and the parallel jobs restore, can improve massively either the backup and the recovery speed in case of disaster.

Wednesday 23 July 2014

Chapter 8 - part 2, performance tips

Performance tips

Despite the fact pg_dump doesn't affects the running queries, its strict transactional approach have some effects on the affected schema. Any alter schema is blocked until the backup's end. The vacuum efficiency is affected as well, because all the dead rows generated during the backup's run, cannot be reclaimed being potentially required by the backup's running transaction.
There are some tips to improve the backup's speed.

Avoid remote backups

The pg_dump can connect to remote databases same like any other PostgreSQL client. It seems reasonable then to use the program installed on a centralised storage and to dump locally from the remote cluster.
Unfortunately even using the compressed format, the entire database flows uncompressed and in clear, from the server to the remote pg_dump. The compression happens locally when the data is received.
This approach expose also a network security issue. If the environment is not trusted then the remote connection must happen on a secure channel. This add an extra overhead to the transfer and any failure on this layer will fail the entire backup.
A far better approach is to save locally the database, using the local connection if possible, and then copy the entire dump file using a secure transfer protocol like scp or sshfs.

Skip replicated tables

If the database is configured as logical slave in slony or londiste for example, backing up the replicated table's data is not important as the contents are re synchronised from the master when the node is attached to the replication system. The switch -exclude-table-data=TABLE is then useful for dumping the table's definition only without the contents.

Check for slow cpu cores

PostgreSQL is a multitasking but not a multithreaded database system. Each backend is attached to just one cpu. The pg_dump opens one backend connection to the cluster in order to export the database objects. The pg_dump process receives the data flow from the backend and it saves performing also the optional compression. In this scenario the cpu power is critical in order to avoid a bottleneck. This could be helped using the parallel export offered by pg_dump from the version 9.3. The functionality is implemented via the snapshot exports. As this was introduced with PostgreSQL 9.2 the paralle export can happen only from this version and only if the output format is directory .

Check for the available locks

PostgreSQL at various level uses the locks to ensure the data consistency at various levels. For example, when a table is read an access share lock on the relation is put in order to avoid any structure change. Any backend issuing an ALTER TABLE which affect the table structure, will wait for the lock to be released before acquiring itself an exclusive lock and then perform the change. The relation's locks are stored into the pg_locks table. This table is quite unique because have a limited number of rows. The maximum number of table's lock slot is determine with this simple formula.

 max_locks_per_transaction * (max_connections + max_prepared_transactions)

The default configuration permits have only 6400 table's lock slots. This value is generally OK. However, if the database have a great number of relations, a full backup, pg_dump could hit the slot limit and fail with an out of memory error.
All the three GUC parameters require a restart to apply the new changes so is very important to plan the change before the limit is reached.

Try to use flexible formats

This is more a good practice suggestion rather a performance tip. Exporting the database in plain text have some advantages. Is possible to load the dump just using psql and any file's corruption can be managed in a simple way; if the damage is limited of course. The custom and directory formats need the pg_restore utility for the restoration. We'll take a look to this approach in 9. Anyway, the custom and directory formats have alongside with the compression, the parallel restore feature and the selective restoration. The compression can be fine tuned to suit the export's nature. In this era of ``big data'' is something to consider seriously.

Sunday 13 July 2014

Chapter 8 - Part 1, pg_dump at glance

Pretty busy week (and week end), I was able to write just those boring lines mostly derived from the pg_dump's help. I'll do better next time, I promise.


The hardware is subject to faults. In particular if the storage is lost the entire data infrastructure becomes inaccessible, sometime for good. Also human errors, like not filtered delete or table drop can happen. Having a solid backup strategy is then the best protection, ensuring the data is still recoverable. In this chapter we'll take a look only to the logical backup with pg_dump.

pg_dump at glance

As seen in 3.1.5, pg_dump is the PostgreSQL's utility for saving consistent backups. Its usage is quite simple and if launched without options it tries to connect to the local cluster with the current user sending the dump to the standard output.
The pg_dump help gives useful informations about the usage.
postgres@tardis:~/dump$ pg_dump --help
pg_dump dumps a database as a text file or to other formats.

  pg_dump [OPTION]... [DBNAME]

General options:
  -f, --file=FILENAME          output file or directory name
  -F, --format=c|d|t|p         output file format (custom, directory, tar,
                               plain text (default))
  -j, --jobs=NUM               use this many parallel jobs to dump
  -v, --verbose                verbose mode
  -V, --version                output version information, then exit
  -Z, --compress=0-9           compression level for compressed formats
  --lock-wait-timeout=TIMEOUT  fail after waiting TIMEOUT for a table lock
  -?, --help                   show this help, then exit

Options controlling the output content:
  -a, --data-only              dump only the data, not the schema
  -b, --blobs                  include large objects in dump
  -c, --clean                  clean (drop) database objects before recreating
  -C, --create                 include commands to create database in dump
  -E, --encoding=ENCODING      dump the data in encoding ENCODING
  -n, --schema=SCHEMA          dump the named schema(s) only
  -N, --exclude-schema=SCHEMA  do NOT dump the named schema(s)
  -o, --oids                   include OIDs in dump
  -O, --no-owner               skip restoration of object ownership in
                               plain-text format
  -s, --schema-only            dump only the schema, no data
  -S, --superuser=NAME         superuser user name to use in plain-text format
  -t, --table=TABLE            dump the named table(s) only
  -T, --exclude-table=TABLE    do NOT dump the named table(s)
  -x, --no-privileges          do not dump privileges (grant/revoke)
  --binary-upgrade             for use by upgrade utilities only
  --column-inserts             dump data as INSERT commands with column names
  --disable-dollar-quoting     disable dollar quoting, use SQL standard quoting
  --disable-triggers           disable triggers during data-only restore
  --exclude-table-data=TABLE   do NOT dump data for the named table(s)
  --inserts                    dump data as INSERT commands, rather than COPY
  --no-security-labels         do not dump security label assignments
  --no-synchronized-snapshots  do not use synchronized snapshots in parallel jobs
  --no-tablespaces             do not dump tablespace assignments
  --no-unlogged-table-data     do not dump unlogged table data
  --quote-all-identifiers      quote all identifiers, even if not key words
  --section=SECTION            dump named section (pre-data, data, or post-data)
  --serializable-deferrable    wait until the dump can run without anomalies
                               use SET SESSION AUTHORIZATION commands instead of
                               ALTER OWNER commands to set ownership

Connection options:
  -d, --dbname=DBNAME      database to dump
  -h, --host=HOSTNAME      database server host or socket directory
  -p, --port=PORT          database server port number
  -U, --username=NAME      connect as specified database user
  -w, --no-password        never prompt for password
  -W, --password           force password prompt (should happen automatically)
  --role=ROLENAME          do SET ROLE before dump

Connection options

The connection options specify the way the program connects to the cluster. All the options are straightforward except for the password. Is possible to avoid the password prompt or to disable it but the password cannot be specified on the command line. In an automated dump script this can be worked around exporting the variable PGPASSWORD or using the password file.
The PGPASSWORD variable is considered not secure and shouldn't be used if untrusted users are accessing the server. The password file is a text file named .pgpass and stored in the home directory of the os user which connects to the cluster.
Each file's line specify a connection using the following format.
If, for example, we want to connect to the database db_test with the username usr_test on the host tardis with port 5432 and the password is testpwd8.1, the password file will contain this row
For security reasons the file will not work if group or others accessible. In order to make it work you should issue the command chmod go-rw .pgpass . The password file is used also by other PostgreSQL programs like the client psql.

General options

The general options are a set of switches used to control the backup's output and format. The -f followed by the FILENAME outputs the backup on file.

The -F specifies the backup format and requires a second option to tell pg_dump which format to use. The option can be one of those, c d t p which corresponds to custom directory tar plain.

If the parameter is omitted then pg_dump uses the p format. This outputs a SQL script which recreates the objects when loaded into an empty database. The format is not compressed and is suitable for direct load using the client psql.
The the custom together with the directory format is most versatile format. It offers compression and flexibility at restore time. The file offers the parallel restore functionality and the selective restore of single objects.

The directory format stores the schema dump, the dump's table of contents alongside with the compressed data dump in the directory specified with the -f switch. Each table is saved in a different file and is compressed by default.From the version 9.3 this format offers the parallel dump functionality.

The tar format stores the dump in the conservative tape archive format. This format is compatible with directory format, does not supports compression and have the 8 GB limit on the size of individual tables.

The -j option specifies the number of jobs to run in parallel for dumping the data. This feature appeared in the version 9.3 and uses the transaction's snapshot export to give a consistent data snapshot to the export jobs. The switch is usable only with the directory format and only against PostgreSQL 9.2 and later.

The option -Z specifies the compression level for the compressed formats. The default is 5 resulting in a dumped archive from 5 to 8 times smaller than the original database.
The option -lock-wait-timeout is the number of milliseconds for the table's lock acquisition. When expired the dump will fail. Is useful to avoid the program to wait forever for a table lock but can result in failed backups if set too much low.

Output options

The output options control the way the program outputs the backup. Some of those options are meaningful only under specified conditions, other are quite obvious.
The -a option sets the data only export. Separating schema and data have some effects at restore time, in particular with the performance. We'll see in the detail in 9 how to build an efficient two phase restore.

The -b option exports the large objects. This is the default setting except if the -n switch is used. In this case the -b is required to export the large objects.

The options -c and -C are meaningful only for the plain output format. They respectively add the DROP and CREATE command before the object's DDL. For the archive formats the same option exists for pg_restore.

The -E specifies the character encoding for the archive. If not set the origin database encoding will be used.

The -n switch is used to dump the named schema only. It's possible to specify multiple -n switches to select many schemas or using the wildcards. However despite the efforts of pg_dump to get all the dependencies resolved, something could be missing. There's no guarantee the resulting archive can be successfully restored.

The -N switch does the opposite of the -n switch. Skips the named schema. Accepts wildcards and it's possible to specify multiple schemas with multiple -N switches. When both -n and -N are given, the behavior is to dump just the schemas that match at least one -n switch but no -N switches. If -N appears without -n, then schemas matching -N are excluded from what is otherwise a normal dump.

The -o option dumps the object id as part of the table for every table. This options should be used only if the OIDs are part of the design. Otherwise this option shouldn't be used.

The -O have effects only on plain text exports and skips the object ownership set.

The -s option dumps only the database schema.

The -S option is meaningful only for plain text exports. This is the super user which disables the triggers if the export is performed with the option -disable-triggers. Anyway, as suggested on the manual, it's better to run the restoring script as superuser.

The -t switch is used to dump the named table only. It's possible to specify multiple tables using the wildcards or specifying the -t more.

The -T does the opposite, skips the named table in the dump.

The switch -x skips dumping the privileges settings usually dumped as grant/revoke commands.

The option -binary-upgrade is used only for the in place upgrade program pg_upgrade. Is not for general usage.
The option -column-inserts result in the data exported as INSERT commands with all the column names specified. Because by default the data is saved as COPY statement, using this switch will results in a bigger dump file and a very slow restoration. It's sole advantage is any error in the restore will skip just the affected row and not the entire table's load.

The -disable-dollar-quoting disables the newer dollar quoting for the function's body and uses the standard SQL quoting.

The -disable-triggers emits the triggers disable and re enable for the data only export. Disabling the triggers ensure the foreign keys won't fail for the missing referenced data. This switch is meaningful only for the plain text export.

The -exclude-table-data=TABLE skips the data for the named table but dumps the table's definition. It follow the same rules of the -t and -T for specifying multiple tables.

The -insert dumps the data as INSERT command. Same as the -column-inserts the restore is very slow and should be used only for reloading data into non-PostgreSQL databases.

The -no-security-labels doesn't include the security labels into the dump file.

The -no-synchronized-snapshots allows the parallel export against pre 9.2 databases. Because the snapshot export feature is missing this means the database shall not change its status meanwhile the export is running. Otherwise there will be a not consistent data export. If in doubt do not use this option.

The -no-tablespaces skips the tablespace assignments.

The -no-unlogged-table-data does not export data for the unlogged relations.

The -quote-all-identifiers cause all the identifiers to be enclosed in double quotes. By default only the identifiers with keyword's name are quoted.

The -section option specifies one of the three export's sections. The pre-data, with the table, the view and the function definitions. The data section where the actual table data is dumped as COPY or INSERTS, accordingly with the command line options. The post-data section where the constraint, the index and the eventual GRANT REVOKE commands are finally executed. This switch is meaningful only for text exports.

The -serializable-deferrable uses a serializable transaction for the dump, to ensure the database state is consistent. The dump execution waits for a point in the transaction stream without anomalies to avoid the risk of the dump failing or causing other transactions to abort for the serialization_failure. The option is not beneficial for a dump intended only for disaster recovery. It's useful for the dump used for reloading the data into a read only database which needs to have a consistent state compatible with the origin's database.

The switch -use-set-session-authorization causes the usage of SET SESSION AUTHORIZATION commands instead of ALTER OWNER commands to set the objects ownership. The resulting dump is more standards compatible but the SET SESSION AUTHORIZATION command requires the super user privileges whereas ALTER OWNER doesn't.

Wednesday 2 July 2014

Chapter 7 - Part 5, The autovacuum

The autovacuum

The autovacuum daemon appeared with the revolutionary PostgreSQL version 8.0. With the version 8.3 was also enabled by default because reliable and efficient. Having the autovacuum turned on is a good idea because all the maintenance is done automatically by the system. The number of workers to start is not simple to determine. Each process consumes a connection slot and changing the number of workers requires the cluster's restart. Turning on autovacuum does't disable it, the worker starts automatically to vacuum tables near to the transacion ID and multixact ID wraparound failure. The autovacuum behaviour is controlled using few GUC parameters.


This parameter is used to enable or disable the autovacuum daemon. Changing the setting requires the cluster's restart. Turning autovacuum off never disables the daemon completely. The autovacuum process will start in any case for tables with XID older than autovacuum_freeze_max_age


The parameter sets the maximum number of autovacuum subprocesses. Changing the setting requires the cluster's restart and each subprocess consumes one PostgreSQL connection.


The parameter sets the delay between two autovacuum runs on a specified database.The delay is measured in seconds and the default value is 1 minute.


This parameter and the next one controls when the autovacuum is triggered. This one specifies the fraction of table to add to autovacuum_vacuum_threshold in order to determine whether start the vacuum. The default is 0.2, 20% of the table. This setting can be overridden for individual tables by changing storage parameters.


This parameter sets the minimum number of a table's updated or deleted tuples needed to trigger a VACUUM. The default is 50 tuples. This setting can be overridden for individual tables by changing storage parameters. For example, if we have a 10 million rows table with both parameters set to default, the autovacuum will start after 2,000,050 update or delete.


This parameter and the next one controls when the auto analyse is triggered. This one specifies the fraction of table to add to autovacuum_analyze_threshold in order to determine whether start the vacuum. The default is 0.1, 10% of the table. This setting can be overridden for individual tables by changing storage parameters.


This parameter sets the minimum number of a table's updated or deleted tuples needed to trigger an ANALYZE. The default is 50 tuples. This setting can be overridden for individual tables by changing storage parameters. For example, if we have a 10 million rows table with both parameters set to default, the autovacuum will start after 1,000,050 update or delete.


The parameter sets the maximum age of the table's pg_class.relfrozenxid, in transactions, after the VACUUM is forced to avoid the transaction ID wraparound. The process will start also if the autovacuum is disabled. The parameter can be set only at server's start but is possible to reduce the value per table by changing the storage parameter.


The parameter sets the maximum age of the table's pg_class.relminmxid, in transactions, after the VACUUM is forced to avoid the multixact ID wraparound. The process will start also if the autovacuum is disabled. The parameter can be set only at server's start but is possible to reduce the value per table by changing the storage parameter.


The parameter sets the cost delay to use in automatic VACUUM operations. If set to -1, the regular vacuum_cost_delay value will be used. The default value is 20 milliseconds.


The parameter sets cost limit value to be used in automatic VACUUM operations. If set to -1 then the regular vacuum_cost_limit value will be used. The default value is -1. The value is distributed among the running autovacuum workers. The sum of the limits of each worker never exceeds this variable. More informations on cost based vacuum here 7.1.6.