Sunday, 30 March 2014

Chapter 4 - part 3 - The shutdown sequence

The shutdown sequence

For those coming from Oracle, the PostgreSQL's shutdown sequence will sound familiar with few, but very important, exceptions. A PostgreSQL process enters the shutdown status when receive a specific OS signal.
This can happen using the os kill or via pg_ctl.

As seen in the section 3.1.2 the latter accepts the -m switch to specify the shutdown mode. If not specified defaults to the mode smart which sends the signal SIGTERM to the running PostgreSQL process. This way the database will not accept new connections and will wait for the existing connections to exit before the shutdown.

The most useful PostgreSQL shutdown mode is the fast mode. This way the SIGQUIT is sent to the postgresql process and the database will disconnect all the sessions rolling back the open transactions and then will enter the shutdown sequence.

Either modes, smart and fast, make the database shutdown clean, leaving the cluster in consistent state when the postgres process exits.
As soon as the postgres process enters the shutdown sequence will issue a last checkpoint consolidating the updated physical blocks in the shared buffers to the data files.

As soon as the checkpoint is complete and the last consistent position is logged on the pg_control file, the database main process exits the accessory processes like the walwriter or the checkpointer and terminates removing the pid file.
The last checkpoint can slow down the entire sequence because is commonly spread through time to avoid any disk IO activity spike. If the shared_buffer is big and contains many dirty blocks, the checkpoint can run for a very long time. In addition, if at the shutdown time, another checkpoint is running the database will wait the completion before starting the final checkpoint. This meanwhile the new connections are forbidden, whatever shutdown mode you decided to use.
In order to have an idea of what's happening on the running cluster, is a good practice to change the GUC log_checkpoints = off to log_checkpoints = on.
For more informations about the database processes take a look to the section 4.4.

If the cluster is taking too long to stop, as last resort is possible to use the immediate mode. This will send a SIGQUIT to the running process causing the immediate exit of the main process and all the sessions.
The shutdown in this case is not clean and the subsequent start will perform a crash recovery starting from the last consistent state read from the pg_control file. The process is usually harmless with one important exception.
If the cluster have unlogged tables those relations are recreated from scratch when the recovery happens and all the data in those table is lost.

This is the main reason I suggest to avoid the pg_ctlcluster shipped with debian to stop the cluster. The program doesn't offer any control on the shutdown mode and can result in disastrous data loss. For more details take a look to the subsection 3.3.4.

A last word about the SIGKILL signal. It can happen the cluster refuse to stop even using the immediate mode. In this case, as last resort the SIGKILL or kill -9 can be used. The online manual is very clear on this point. As the SIGKILL cannot be trapped and evicts from the ram the process with the shared memory, its use will results in not freeing the resources used by the killed process.

This will very likely affect the start of a fresh instance. Please refer to your sysadmin to find out the best way to perform a memory and semaphore cleanup before starting PostgreSQL after a SIGKILL.

Wednesday, 26 March 2014

Chapter 4 - part 2 - the startup sequence

The startup sequence


When the server process is started allocates the shared segment in memory. In the versions before the 9.3 this was the a potential point of failure because, if the shared_buffers was bigger than the kernel's max allowed shared memory segment the startup will fail with this sort of error

FATAL: could not create shared memory segment: Cannot allocate memory

DETAIL: Failed system call was shmget(key=X, size=XXXXXX, XXXXX).

HINT: This error usually means that PostgreSQL's request for a shared memory
segment exceeded available memory or swap space, or exceeded your kernel's
SHMALL parameter. You can either reduce the request size or reconfigure the
kernel with larger SHMALL. To reduce the request size (currently XXXXX bytes),
reduce PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or
max_connections.
In this case the kernel parameters needs adjustment. As from my Oracle experience I take no chance and I do use the values suggested for an Oracle installation which incidentally works perfectly for PostgreSQL.
kernel.shmall = 2621440
kernel.shmmax = 18253611008
kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
fs.file-max = 658576
Put those values into the file /etc/sysctl.conf and, as root, run sysctl -p.
The latest major version PostgreSQL switched from the SysV to Posix shared memory and mmap. This solved completely the shared memory tuning on Linux.
When the memory is allocated the postmaster reads the pg_control file to check if the instance requires recovery.
The pg_control contains references to the last checkpoint and the last known status for the instance.
If the instance is in dirty state, because a crash or an unclean shutdown, the startup replays the blocks from the WAL segments in the pg_xlog directory starting from the last checkpoint position read from the pg_control file.
Any corruption in the wal files or, even worse, the pg_control file results in a not startable instance.

After the recovery is complete or the cluster is in clean state, then the startup completes opening the database in production state.