Friday 6 June 2014

Chapter 6 part 4 - TOAST

TOAST, The Oversized-Attribute Storage Technique, is the PostgreSQL mechanism for storing data that overflows the page size.

The user data shown in figure 6.3 is a stream of composite data, logically described by the composite model stored in the system catalogue. The attributes in the model fall into two categories: fixed length data types and variable length data types (varlena).

For example, a four byte integer is a fixed length type and text is a variable length type. To PostgreSQL's internal routines the data at the physical level all looks the same, as a generic datum. Only when the datum is loaded into the shared buffer does it become meaningful, and it is then managed according to its kind.

The attribute kind is stored in the first two bits (see note 6.1) of the varlena length word. When both bits are zero the attribute is an ordinary value which has not been TOASTed, and the remaining bits give the datum size in bytes, including the length word.

If the first bit is set then the value has only a single-byte header, and the remaining bits of that byte give the total datum size in bytes, including the length byte. If the remaining bits are all zero, the value is a pointer to out of line data stored in a separate TOAST table, whose structure is shown in figure 6.4.

If the first bit is zero but the second bit is set then the datum is compressed and must be decompressed before use. The compression uses an algorithm of the LZ family.
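The bit tests described above can be sketched in a few lines of Python. This is only an illustration, not PostgreSQL's code: the function name classify_varlena and the returned labels are made up for the example, and the layout assumed is the little-endian one, where the two flag bits are the low-order bits of the first byte.

```python
def classify_varlena(raw: bytes) -> dict:
    """Classify a varlena datum from its leading bytes (little-endian layout assumed)."""
    first = raw[0]
    if first & 0x01:
        # First bit set: single-byte header, remaining 7 bits hold the total size.
        size = first >> 1
        if size == 0:
            # All remaining bits zero: pointer to out of line (TOAST) data.
            return {"kind": "toast_pointer", "header_bytes": 1, "size": None}
        return {"kind": "short_inline", "header_bytes": 1, "size": size}
    # First bit clear: four-byte length word, remaining 30 bits hold the total size.
    word = int.from_bytes(raw[:4], "little")
    size = word >> 2
    if word & 0x02:
        # Second bit set: the content is compressed.
        return {"kind": "compressed", "header_bytes": 4, "size": size}
    return {"kind": "plain_inline", "header_bytes": 4, "size": size}
```

For instance, a first byte of 0x01 is reported as a TOAST pointer, while the four-byte word (100 << 2) describes an ordinary 100 byte datum.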

The external TOAST table is a normal table with three fields: chunk_id, the OID identifying the toasted value; chunk_seq, an integer ordering the chunks within the value; and chunk_data, a bytea field containing the overflown data.
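To make the three fields concrete, here is a minimal Python sketch of how a value could be split into (chunk_id, chunk_seq, chunk_data) rows and put back together. The helper names to_chunks and from_chunks and the 2000 byte chunk size are assumptions made for the example; they are not PostgreSQL functions or its exact setting.

```python
from typing import Iterable, List, Tuple

CHUNK_SIZE = 2000  # assumed chunk size in bytes, for illustration only

ChunkRow = Tuple[int, int, bytes]  # (chunk_id, chunk_seq, chunk_data)


def to_chunks(chunk_id: int, value: bytes, chunk_size: int = CHUNK_SIZE) -> List[ChunkRow]:
    """Split a value into ordered rows shaped like a TOAST table's rows."""
    return [
        (chunk_id, seq, value[offset:offset + chunk_size])
        for seq, offset in enumerate(range(0, len(value), chunk_size))
    ]


def from_chunks(rows: Iterable[ChunkRow], chunk_id: int) -> bytes:
    """Rebuild a value by ordering its chunks on chunk_seq."""
    parts = sorted((seq, data) for cid, seq, data in rows if cid == chunk_id)
    return b"".join(data for _, data in parts)
```

The rows can be read back in any order, because chunk_seq alone determines how the pieces are concatenated.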

The chunk size is normally about 2 kB and is controlled at compile time by the symbol TOAST_MAX_CHUNK_SIZE. The TOAST routines are triggered only when the tuple's size exceeds TOAST_TUPLE_THRESHOLD, also 2 kB by default.
TOAST_TUPLE_TARGET, normally 2 kB as well, governs the compression's behaviour. PostgreSQL compresses the datum to bring the final size below TOAST_TUPLE_TARGET. If it cannot, the out of line storage is used.



Figure 6.4: Toast table structure

TOAST offers four different storage strategies. The strategy can be changed per column using the ALTER TABLE ... ALTER COLUMN ... SET STORAGE statement.

  • PLAIN prevents both compression and out-of-line storage. It is the only possible strategy for fixed length data types.
  • EXTENDED allows both compression and out-of-line storage. It is the default for most TOAST-able data types. Compression will be attempted first, then out-of-line storage if the row is still too big.
  • EXTERNAL allows out-of-line storage but not compression.
  • MAIN allows compression but not out-of-line storage. In practice the out-of-line storage is still performed, but only as a last resort.
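
The four strategies can be summarised as a small lookup table, together with a helper that builds the statement used to change a column's strategy. This is a sketch only: STORAGE_STRATEGIES and set_storage_sql are names invented for the example, and the helper does no identifier quoting, so it is not meant for untrusted input.

```python
# strategy name: (compression allowed, out-of-line storage allowed)
STORAGE_STRATEGIES = {
    "PLAIN": (False, False),
    "EXTENDED": (True, True),
    "EXTERNAL": (False, True),
    "MAIN": (True, False),  # out-of-line storage only as a last resort
}


def set_storage_sql(table: str, column: str, strategy: str) -> str:
    """Build the ALTER TABLE statement that sets a column's storage strategy."""
    strategy = strategy.upper()
    if strategy not in STORAGE_STRATEGIES:
        raise ValueError(f"unknown storage strategy: {strategy}")
    # Identifiers are interpolated as-is; a real tool should quote them.
    return f"ALTER TABLE {table} ALTER COLUMN {column} SET STORAGE {strategy};"
```

For a hypothetical table t_data with a column payload, set_storage_sql("t_data", "payload", "external") returns the statement ALTER TABLE t_data ALTER COLUMN payload SET STORAGE EXTERNAL;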

Alongside its reduced performance, the out of line storage has one advantage: the stored data, if untouched by an update, is left out of the new row version's generation. That's possible because the varlena is a mere pointer to the chunks, so a new row version affects only the pointer and leaves the TOAST data unchanged.

TOAST tables are stored in the pg_class table like all the other relations. The TOAST table associated with a given table can be found using a self join on the field reltoastrelid.
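The self join can be tried without a running PostgreSQL server by imitating pg_class with an in-memory SQLite table. Everything in the stand-in is an assumption for the demo: the three column subset, the OIDs and the table names t_data and t_small are invented, and the ? placeholder is SQLite's style. Against a real database the same join would run on the actual pg_class catalogue.

```python
import sqlite3

# Self join: match each relation to the relation its reltoastrelid points at.
TOAST_LOOKUP_SQL = """
SELECT tab.relname AS table_name, tst.relname AS toast_name
FROM pg_class AS tab
JOIN pg_class AS tst ON tab.reltoastrelid = tst.oid
WHERE tab.relname = ?
"""


def demo_lookup(table_name: str):
    """Run the self join against a tiny stand-in for pg_class."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pg_class (oid INTEGER PRIMARY KEY, relname TEXT, reltoastrelid INTEGER)"
    )
    conn.executemany(
        "INSERT INTO pg_class VALUES (?, ?, ?)",
        [
            (16384, "t_data", 16387),         # hypothetical table with a TOAST table
            (16387, "pg_toast_16384", 0),     # its TOAST relation
            (16390, "t_small", 0),            # no TOAST table: reltoastrelid is 0
        ],
    )
    row = conn.execute(TOAST_LOOKUP_SQL, (table_name,)).fetchone()
    conn.close()
    return row
```

A table without an associated TOAST table has reltoastrelid set to zero, so the join finds no match and the lookup returns nothing.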


6.1 On big-endian architectures these are the high-order bits; on little-endian architectures they are the low-order bits.
