Thursday 29 January 2015

The cluster in action - part 2

2.4 The background writer

Before the spread checkpoints the only solution to ease down the IO spike caused by the checkpoint was to tweak the background writer. This process were introduced with the revolutionary PostgreSQL 8.0. The writer, as the name suggests, works in the background searching for dirty buffers to write on the data files. The writer works in rounds. When the process awakes scans the shared buffer for dirty buffers. When the amount of buffers cleaned reaches the value set in bgwriter_lru_maxpages the process sleeps for the time set in bgwriter_delay.

2.5 The autovacuum

The routine vacuuming is an important task to prevent the table bloat and the dreaded XID wraparound failure. If enabled the autovacuum launcher starts one daemon for each relation with enough dead tuples to trigger the conditions set in autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor. An autovacuum daemon is a normal backend and appears in the view pg_stat_activity. Because the XID wraparound failure is a really serious problem, the autovacuum to prevent wraparound starts even if the autovacuum is turned off.

2.6 The backends

The PostgreSQL backend architecture is the brilliant solution to a nasty problem. How to guarantee the buffers are read only by one session at time and avoid the bottleneck of a long waiting queue. When a backend needs to access a particular tuple, either for read or write, the relation's pages are accessed to find the tuple matching the search criteria. When a buffer is accessed then the backend sets a pin on the buffer which prevents the other backends requiring the same page to wait. As soon as the tuple is found and processed the pin is removed. If the tuple is modified the MVCC enforces the tuple's visibility to the other backends. The process is fine grained and very efficient. Even with an high concurrency rate on the same buffers is very difficult to have the backends entangled.
A backend process is a fork of the main postgres process. It's very important to understand that the backend is not the connection but a server process which interacts with the connection. Usually the backend terminates when the connection disconnects. However, if a client disconnects ungracefully meanwhile a query is running without signalling the backend, the query will continue only to find there's nothing listening on the other side. This is bad for many reasons. First because is consuming a connection slot for nothing. Also the cluster is doing something useless consuming CPU cycles and memory.

Like everything in PostgreSQL the backend architecture is oriented to protect the data and in particular the volatile shared buffer. If for some reasons one of the backend process crashes then the postgres process terminates all the backends in order to prevent the potential shared buffer corruption. The clients should be able to manage this exception resetting the connection.

2.7 Wrap up

The cluster's background activity remains most of the time unnoticed. The users and developers can mostly ignore this aspect of the PostgreSQL architecture leaving the difficult business of understanding the database heartbeat to the DBA, which should have the final word on any potential mistake in the design specs. The next chapters will explore the PostgreSQL's architecture in details, starting with the memory.

No comments:

Post a Comment