The custom format is a binary archive with a table of contents pointing the various archive sections. The directory format is a directory which name is the value provided with the -f switch. The directory contents are a toc.dat file, where the table of contents and the schema are stored. For each table there is a gzip file which name is a number corresponding to the toc entry for the saved relation. Those files store the data restore for the relation.
The restore from the binary formats requires the pg_restore usage. Because almost all the pg_dump's switches are supported by pg_restore we'll not repeat the look out. Take a look to 9.1 for the switch meanings. Anyway this is the pg_restore's help output.
pg_restore restores a PostgreSQL database from an archive created by pg_dump. Usage: pg_restore [OPTION]... [FILE] General options: -d, --dbname=NAME connect to database name -f, --file=FILENAME output file name -F, --format=c|d|t backup file format (should be automatic) -l, --list print summarized TOC of the archive -v, --verbose verbose mode -V, --version output version information, then exit -?, --help show this help, then exit Options controlling the restore: -a, --data-only restore only the data, no schema -c, --clean clean (drop) database objects before recreating -C, --create create the target database -e, --exit-on-error exit on error, default is to continue -I, --index=NAME restore named index -j, --jobs=NUM use this many parallel jobs to restore -L, --use-list=FILENAME use table of contents from this file for selecting/ordering output -n, --schema=NAME restore only objects in this schema -O, --no-owner skip restoration of object ownership -P, --function=NAME(args) restore named function -s, --schema-only restore only the schema, no data -S, --superuser=NAME superuser user name to use for disabling triggers -t, --table=NAME restore named table(s) -T, --trigger=NAME restore named trigger -x, --no-privileges skip restoration of access privileges (grant/revoke) -1, --single-transaction restore as a single transaction --disable-triggers disable triggers during data-only restore --no-data-for-failed-tables do not restore data of tables that could not be created --no-security-labels do not restore security labels --no-tablespaces do not restore tablespace assignments --section=SECTION restore named section (pre-data, data, or post-data) --use-set-session-authorization use SET SESSION AUTHORIZATION commands instead of ALTER OWNER commands to set ownership Connection options: -h, --host=HOSTNAME database server host or socket directory -p, --port=PORT database server port number -U, --username=NAME connect as specified database user -w, --no-password never prompt for password -W, --password force password prompt (should happen automatically) --role=ROLENAME do SET ROLE before restore If no input file name is supplied, then standard input is used. Report bugs to <pgsql-bugs@postgresql.org>.If the database connection is omitted pg_restore sends the output to the standard output. The switch -f sends the output to a filename though. This is very useful if we want to check the original dump file is readable, executing a restore onto the /dev/null device.
The the custom and directory formats show their power when restoring on a database connection in a multi core system. Using the -j switch it's possible to specify the number of parallel jobs for the data and the post data section. This can improve massively the recovery time, running the most time consuming actions in multiple processes.
The word parallel can be confusing in some way. PostgreSQL does not supports multithreading. That means each backend process will use just only one cpu. In this context, each job take care of a different area of the restore's table of contents, The TOC is split in many queues with a fixed object list to process. For example one queue will contain the data restoration for a table, and the relation's indices and constraints.
The switch -section offers a fine grain control on which section of the archived data will be restored. In a custom and directory format there are three distinct sections.
- pre-data This section restores only the schema definitions not affecting the speed and reliability of the data restore. e.g. table's DDL, functions creation, extensions, etc.
- data The data restore itself, by default saved as COPY statements to speed up the process
- post-data This section runs the restore for all the objects enforcing the data integrity, like the primary and foreign keys, triggers and the indices which presence during the restore slow down the data reload massively.
We'll now will see how to restore the database seen in 10.1 in the same two steps approach, using the custom format.
Let's start with a complete database dump using the custom format.
postgres@tardis:~/dump$ pg_dump -Fc -f db_addr.dmp db_addr pg_dump: reading schemas pg_dump: reading user-defined tables pg_dump: reading extensions pg_dump: reading user-defined functions pg_dump: reading user-defined types pg_dump: reading procedural languages pg_dump: reading user-defined aggregate functions pg_dump: reading user-defined operators pg_dump: reading user-defined operator classes pg_dump: reading user-defined operator families pg_dump: reading user-defined text search parsers pg_dump: reading user-defined text search templates pg_dump: reading user-defined text search dictionaries pg_dump: reading user-defined text search configurations pg_dump: reading user-defined foreign-data wrappers pg_dump: reading user-defined foreign servers pg_dump: reading default privileges pg_dump: reading user-defined collations pg_dump: reading user-defined conversions pg_dump: reading type casts pg_dump: reading table inheritance information pg_dump: reading event triggers pg_dump: finding extension members pg_dump: finding inheritance relationships pg_dump: reading column info for interesting tables pg_dump: finding the columns and types of table "t_address" pg_dump: finding default expressions of table "t_address" pg_dump: finding the columns and types of table "t_city" pg_dump: finding default expressions of table "t_city" pg_dump: flagging inherited columns in subtables pg_dump: reading indexes pg_dump: reading indexes for table "t_address" pg_dump: reading indexes for table "t_city" pg_dump: reading constraints pg_dump: reading foreign key constraints for table "t_address" pg_dump: reading foreign key constraints for table "t_city" pg_dump: reading triggers pg_dump: reading triggers for table "t_address" pg_dump: reading triggers for table "t_city" pg_dump: reading rewrite rules pg_dump: reading large objects pg_dump: reading dependency data pg_dump: saving encoding = UTF8 pg_dump: saving standard_conforming_strings = on pg_dump: saving database definition pg_dump: dumping contents of table t_address pg_dump: dumping contents of table t_cityWe'll use a second database for the restore.
We'll then restore just the schema using the following command.
postgres@tardis:~/dump$ pg_restore -v -s -d db_addr_restore_bin db_addr.dmp pg_restore: connecting to database for restore pg_restore: creating SCHEMA public pg_restore: creating COMMENT SCHEMA public pg_restore: creating EXTENSION plpgsql pg_restore: creating COMMENT EXTENSION plpgsql pg_restore: creating TABLE t_address pg_restore: creating SEQUENCE t_address_i_id_addr_seq pg_restore: creating SEQUENCE OWNED BY t_address_i_id_addr_seq pg_restore: creating TABLE t_city pg_restore: creating SEQUENCE t_city_i_id_city_seq pg_restore: creating SEQUENCE OWNED BY t_city_i_id_city_seq pg_restore: creating DEFAULT i_id_addr pg_restore: creating DEFAULT i_id_city pg_restore: creating CONSTRAINT pk_i_id_city pg_restore: creating CONSTRAINT pk_id_address pg_restore: creating FK CONSTRAINT fk_t_city_i_id_city pg_restore: setting owner and privileges for DATABASE db_addr pg_restore: setting owner and privileges for SCHEMA public pg_restore: setting owner and privileges for COMMENT SCHEMA public pg_restore: setting owner and privileges for ACL public pg_restore: setting owner and privileges for EXTENSION plpgsql pg_restore: setting owner and privileges for COMMENT EXTENSION plpgsql pg_restore: setting owner and privileges for TABLE t_address pg_restore: setting owner and privileges for SEQUENCE t_address_i_id_addr_seq pg_restore: setting owner and privileges for SEQUENCE OWNED BY t_address_i_id_addr_seq pg_restore: setting owner and privileges for TABLE t_city pg_restore: setting owner and privileges for SEQUENCE t_city_i_id_city_seq pg_restore: setting owner and privileges for SEQUENCE OWNED BY t_city_i_id_city_seq pg_restore: setting owner and privileges for DEFAULT i_id_addr pg_restore: setting owner and privileges for DEFAULT i_id_city pg_restore: setting owner and privileges for CONSTRAINT pk_i_id_city pg_restore: setting owner and privileges for CONSTRAINT pk_id_address pg_restore: setting owner and privileges for FK CONSTRAINT fk_t_city_i_id_cityThe dump file is specified as last parameter. The -d switch tells pg_restore which database to connect for the restore. By default the postgres user usually connects using the ident operating system daemon or the trust authentication method, when connected as local. That's the reason why in this example there's no need of specifying the username or enter the password.
The second restore's step is the data load. In the example seen in 10.1 we used the pg_dump with -disable-triggers switch in order to avoid failures caused by constraint violation. With the custom format the switch is used at restore time.
postgres@tardis:~/dump$ pg_restore --disable-triggers -v -a -d db_addr_restore_bin db_addr.dmp pg_restore: connecting to database for restore pg_restore: disabling triggers for t_address pg_restore: processing data for table "t_address" pg_restore: enabling triggers for t_address pg_restore: executing SEQUENCE SET t_address_i_id_addr_seq pg_restore: disabling triggers for t_city pg_restore: processing data for table "t_city" pg_restore: enabling triggers for t_city pg_restore: executing SEQUENCE SET t_city_i_id_city_seq pg_restore: setting owner and privileges for TABLE DATA t_address pg_restore: setting owner and privileges for SEQUENCE SET t_address_i_id_addr_seq pg_restore: setting owner and privileges for TABLE DATA t_city pg_restore: setting owner and privileges for SEQUENCE SET t_city_i_id_city_seqHowever, this approach does not prevent the slowness caused by the indices when reloading the data. If a restore with multiple steps is required (e.g. creating the database schema and check all the relations are in place before starting) the section switch is a better choice. Let's see how it works with the example seen before.
We'll first restore the pre-data section10.1.
postgres@tardis:~/dump$ pg_restore --section=pre-data -v -d db_addr_restore_bin db_addr.dmp pg_restore: connecting to database for restore pg_restore: creating SCHEMA public pg_restore: creating COMMENT SCHEMA public pg_restore: creating EXTENSION plpgsql pg_restore: creating COMMENT EXTENSION plpgsql pg_restore: creating TABLE t_address pg_restore: creating SEQUENCE t_address_i_id_addr_seq pg_restore: creating SEQUENCE OWNED BY t_address_i_id_addr_seq pg_restore: creating TABLE t_city pg_restore: creating SEQUENCE t_city_i_id_city_seq pg_restore: creating SEQUENCE OWNED BY t_city_i_id_city_seq pg_restore: creating DEFAULT i_id_addr pg_restore: creating DEFAULT i_id_city pg_restore: setting owner and privileges for DATABASE db_addr pg_restore: setting owner and privileges for SCHEMA public pg_restore: setting owner and privileges for COMMENT SCHEMA public pg_restore: setting owner and privileges for ACL public pg_restore: setting owner and privileges for EXTENSION plpgsql pg_restore: setting owner and privileges for COMMENT EXTENSION plpgsql pg_restore: setting owner and privileges for TABLE t_address pg_restore: setting owner and privileges for SEQUENCE t_address_i_id_addr_seq pg_restore: setting owner and privileges for SEQUENCE OWNED BY t_address_i_id_addr_seq pg_restore: setting owner and privileges for TABLE t_city pg_restore: setting owner and privileges for SEQUENCE t_city_i_id_city_seq pg_restore: setting owner and privileges for SEQUENCE OWNED BY t_city_i_id_city_seq pg_restore: setting owner and privileges for DEFAULT i_id_addr pg_restore: setting owner and privileges for DEFAULT i_id_cityAgain the pg_restore loads the objects with the ownership and privileges. What's missing is the constraints creation. The second step is the data section's load.
postgres@tardis:~/dump$ pg_restore --section=data -v -d db_addr_restore_bin db_addr.dmp pg_restore: connecting to database for restore pg_restore: implied data-only restore pg_restore: processing data for table "t_address" pg_restore: executing SEQUENCE SET t_address_i_id_addr_seq pg_restore: processing data for table "t_city" pg_restore: executing SEQUENCE SET t_city_i_id_city_seq pg_restore: setting owner and privileges for TABLE DATA t_address pg_restore: setting owner and privileges for SEQUENCE SET t_address_i_id_addr_seq pg_restore: setting owner and privileges for TABLE DATA t_city pg_restore: setting owner and privileges for SEQUENCE SET t_city_i_id_city_seqThis section simply loads the table's data and sets the sequence values. Apart for the ownership no further action is performed. Finally we'll run the post-data section.
postgres@tardis:~/dump$ pg_restore --section=post-data -v -d db_addr_restore_bin db_addr.dmp pg_restore: connecting to database for restore pg_restore: creating CONSTRAINT pk_i_id_city pg_restore: creating CONSTRAINT pk_id_address pg_restore: creating FK CONSTRAINT fk_t_city_i_id_city pg_restore: setting owner and privileges for CONSTRAINT pk_i_id_city pg_restore: setting owner and privileges for CONSTRAINT pk_id_address pg_restore: setting owner and privileges for FK CONSTRAINT fk_t_city_i_id_cityWith this run the constrains (and eventually all the indices) are created in the best approach possible when dealing with the bulk data processing.
Loading the data contents without indices maximise the speed. The constraint and index build with the data already in place results in a faster build and a fresh index without any bloat.
No comments:
Post a Comment