Monday, 26 May 2014

Chapter 5 part 7 - Transactions

Transactions

PostgreSQL implements the MVCC which stands for Multi Version Concurrency Control. This offers high efficiency in multi user access for read and write queries. When a new query starts a transaction identifier is assigned, the XID a 32 bit quantity. To determine the transaction's snapshot visibility, all the committed transactions with XID lesser than the current XID are in the past and then visible. Otherwise, all the transactions with XID greater than the current XID are in the future and not visible.
This comparison happens at tuple level using two system fields xmin and xmax having the xid data type. When a transaction creates a new tuple then the transaction's xid is put into the tuple's xmin value. When a transaction deletes a tuple then the xmax value is set to the transaction's xid leaving the tuple in place for read consistency. When a tuple is visible to any transaction is called a live tuple, a tuple which is no longer visible is a dead tuple.

PostgreSQL have no dedicated field for the update's xid. That's because when an UPDATE is issued PostgreSQL creates a new tuple's version with the updated data and sets the xmax value in the old version making it disappear.

A dead tuple can be reclaimed by VACUUM if no longer required by running transactions, anyway tables updated often can result in data bloat for the dead tuples and for the eventual indices.

When designing a new data model, the PostgreSQL's peculiar behaviour on the update should be the first thing to consider, in order to limit the table and index bloat.

Among the xmin,xmax two other system fields the cmin and cmax which data type is CID, command id. Those are similar to the xmin/xmax quantities and usage and their usage is to track the internal transaction's commands, in order to avoid the command execution on the same tuple more than one time. The pratical issue is explained in the well known Halloween Problem. For more informations take a look here http://en.wikipedia.org/wiki/Halloween_Problem .

The SQL standard defines four level of transaction's isolation levels where some phenomena are permitted or forbidden.
Those phenomena are the following.

  • dirty read A transaction reads data written by a concurrent uncommitted transaction
  • nonrepeatable read A transaction re reads the data previously read and finds the data changed by another transaction which has committed since the initial read
  • phantom read A transaction re executes a query returning a set of rows satisfying a search condition and finds that the set of rows satisfying the condition has changed because another recently-committed transaction
Table 5.2 shows the isolation levels with the allowed phenomena. In PostgreSQL it's possible to set all the four isolation levels but only the three more strict are supported. Setting the isolation level to read uncommited fallback to the read committed in any case.
By default the global isolation level is set to read committed, it's possible to change the session's transaction isolation level using the command:
SET TRANSACTION ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ 
COMMITTED | READ UNCOMMITTED };
To change the default transaction isolation level cluster wide there is the GUC parameter transaction_isolation.


Table 5.2: Standard SQL Transaction Isolation Levels
Isolation Level Dirty Read Nonrepeatable Read Phantom Read
Read uncommitted Possible Possible Possible
Read committed Not possible Possible Possible
Repeatable read Not possible Not possible Possible
Serializable Not possible Not possible Not possible

No comments:

Post a Comment