Sunday, 18 May 2014

Chapter 5 part 4 - Indices

Indices

An index is a relation capable to map the values in an structured way pointing the table's position where the rows are stored. The presence of an index doesn't mean this will be used for read. By default and index block read have an estimated cost four times than the table block read. However the optimiser can be controlled using the two GUC parameters seq_page_cost for the table sequential read read and random_page_cost for the index. Lowering the latter will make more probable the optimiser will choose the indices when building the execution plan. Creating indices is a complex task. At first sight adding an index could seem a harmless action. Unfortunately their presence adds overhead to the write operations unpredictably. The rule of thumb is add an index only if really needed. Monitoring the index usage is crucial. Querying the statistics view pg_stat_all_indexes is possible to find out if the indices are used or not.
For example, the following query finds all the indices in che public schema, with zero usage from the last database statistics reset.
SELECT
        schemaname,
        relname,
        indexrelname,
        idx_scan
FROM
         pg_stat_all_indexes
WHERE
                schemaname='public'
        AND     idx_scan=0
;
PostgreSQL supports many index types.
The general purpose B-tree, implementing the Lehman and Yao's high-concurrency B-tree management algorithm. The B-tree can handle equality and range queries and returns ordered data. As the data is actually stored in the page and because the index is not TOASTable, the max length for an index entry is 1/3 of the page size. This is the limitation for the variable length indexed data (e.g. text).

The hash indices can handle only equality and aren't WAL logged. That means their changes are not replayed if the crash recovery occurs, requiring a reindex in case of unclean shutdown.

The GiST indices are the Generalised Search Tree. The GiST is a collection of indexing strategies organized under an infrastructure. They can implement arbitrary indexing schemes like B-trees, R-trees or other. The operator classes shipped with PostgreSQL are for the two elements geometrical data and for the nearest-neighbor search. As the GiST indices are not exact , when scanned the returned set doesn't requires a to remove the false positives.

The GIN indices are the Generalised Inverted Indices. This kind of index is optimised for indexing the composite data types or vectors like the full text search elements. The GIN are exact indices, when scanned the returned set doesn't require recheck.
There's no bitmap index implementation in PostgreSQL. At runtime the executor can emulate partially the bitmap indices reading the B-tree sequentially and matching the occurrences in the on the fly generated bitmap.
The index type shall be specified in the create statement. If the type is omitted then the index will default to the B-tree.
 CREATE INDEX idx_test ON t_test USING hash (t_contents);
As the index maintenance is a delicate matter, the argument is described in depth in 7.

No comments:

Post a Comment