Tuesday 17 June 2014

Chapter 7 - Part 2, analyze

analyze

The PostgreSQL's query optimiser is based on the costs estimates. When building the execution plans the planner consults the internal statistics and assigns to each node plan an estimated cost. The plan with the smallest total estimated cost is then executed. Having up to date and accurate statistics will help the database to keep up the performances.
The ANALYZE command is used to gather the usage statistics. When launched reads the data, builds up the statistics and stores them into the pg_statistics system table. The command accepts the optional clause VERBOSE to increase verbosity and the optional target table and the column list. If ANALYZE is launched with no parameters it processes all the tables in the database. Launching ANALYZE with the table name only, will process all the table's columns.

When working on large tables ANALYZE runs a sample random read on a table's portion. The GUC parameter default_statistics_target determines the amount of entries read by the sample. The default limit is 100. Increasing the value will cause the planner to get better estimates. in particular for columns having data distributed irregularly. This accuracy costs more time for the statistics gathering and space because requires a bigger storage in the pg_statistics table.

To show how the default_statistics_target can affects the estimates, let's run an ANALYZE VERBOSE with the default setting on the table created in 7.1.




Even if the table have 10 million rows, the analyse estimates only 2,909,979 rows, the 30% of the total effective storage.
Changing the default_statistics_target to its maximum value of 10000 ANALYZE will get better estimates.






This time the table estimate is correctly determined in 10 millions live rows.
The pg_statistics stores the gathered data for the database usage only. To help users and administrators it's present the view pg_stats providing a human readable visualization of the gathered statistics.

As general rule, before starting any performance tuning, it's important to check if database statistics are recent and accurate. The information is stored into the view pg_stat_all_tables 7.2.
For example this query gets the last execution of the manual and the auto vacuum with the analyze and auto analyze, for a given table.



The statistics target can be set per column to fine tune the ANALYZE with the ALTER TABLE SET STATISTICS statement.




The SET can be used, as seen before, to change the default statistics target for the current session. Otherwise is possible to change the value cluster wide changing the parameter in the postgresql.conf file.

No comments:

Post a Comment