Saturday 19 April 2014

Chapter 4 - part 6 - The data area

As seen before the data storage area is initialized by initdb . Its structure didn't change too much from the old fashioned 7.4. In this section we'll take a look to the various subdirectories and how their usage can affect the performances.


As the name suggests, the base directory contains the database files. Each database have a dedicated subdirectory, named after the internal database's object id. A freshly initialised data directory shows only three subdirectories in the base folder. Those corresponds to the two template databases,template0 and template1, plus the postgres database. Take a look to section 4.7 for more informations.
The numerical directories contains various files, also with the numerical name which are actualy the database's relations, tables and indices.
The relation's name is set initially from the relation's object id. Any file altering operation like VACUUM FULL or REINDEX, will generate a new file with a different name. To find out the real relation's file name the relfilenode inside the pg_class system table must be queried.


The global directory contains all the cluster wide relations. In addition there's the very critical control file mentioned in 3.1.7. This small file is big exactly one database block, usually 8192 bytes, and contains critical informations for the cluster. With a corrupted control file the instance cannot start. The control file is written usually when a checkpoint occurs.


This is the most important and critical directory, for the performances and for the reliability. The directory contains the transaction's logs, named wal file. Each file is usually 16 Mb and contains all the data blocks changed during the database activity. The blocks are written first on this not volatile area to ensure the cluster's recovery in case of cras. The data blocks are then written later to the corresponding data files. If the cluster's shutdown is not clean then the wal files are replayed during the startup process from the last known consistent location read from control file.
In order to ensure good performance this location should stay on a dedicated device.


This directory contains the committed transactions in small 8k files, except for the serializable transactions. The the files are managed by the cluster and the amount is related with the two GUC parameters autovacuum_freeze_max_age and vacuum_freeze_table_age. Increasing the values for the two parameters the pg_clog must store the commit status to the ``event horizon'' of the oldest frozen transaction id. More informations about vacuum and the maintenance are in the chapter 8.


Same as pg_clog this directory stores the informations about the commited transactions in serializable transaction isolation level.


Stores the informations about the multi transaction status, used generally for the row share locks.


Stores informations about the LISTEN/NOTIFY operations.


This directory is used to store the exported snapshots. From the version 9.2 PostgreSQL offers the transaction's snapshot export where one session can open a transaction and export a consistent snapshot. This way different session can access the snapshot and read all togheter the same consistent data snapshot. This feature is used, for example, by pg_dump for the parallel export.


This directory contains the permanent files for the statistic subsystem.


This directory contains the temporary files for the statistic subsystem. As this directory is constantly written, is very likely to become an IO bottleneck. Setting the GUC parameter stats_temp_directory to a ramdisk speeds can improve the database performances.


Stores the subtransactions status data.


Stores the two phase commit data. The two phase commit allows the transaction opening independently from the session. This way even a different session can commit or rollback the transaction later.


The directory contains the symbolic links to the tablespace locations. A tablespace is a logical name pointing a physical location. As from PostgreSQL 9.2 the location is read directly from the symbolic link. This make possible to change the tablespace's position simply stopping the cluster, moving the data files in the new location, creating the new symlink and starting the cluster. More informations about the tablespace management in the chapter 7.

No comments:

Post a Comment