Thursday 20 February 2014

Chapter 2 - Database installation

This chapter will cover the install procedure, on Debian Gnu linux compiling from source and using the packaged install from the pgdg archive.

Install from source

Installing from source, using the default configuration settings requires the root access as the default install location is in /usr/local/. To simplify things I've created a procedure with minimal need for root access. This of course is still required but only for the os user creation and to install the dependencies.
Before starting with the postgresql part, ask your sysadmin, or do it by yourself, to do the following

  • Create a postgres group and a postgres user
  • Add the postgres user to the postgres group
  • Install the packages
  • build-essential
  • libreadline6-dev
  • zlib1g-dev
When everything is in place login as postgres user and download the source's tarball.
mkdir download cd download wget http://ftp.postgresql.org/pub/source/v9.3.2/postgresql-9.3.2.tar.bz2
Extract the tarbal with
tar xfj postgresql-9.2.3.tar.bz2 cd postgresql-9.2.3
The configure script have the -prefix option to set the install directory in a custom location. Assuming the postgres user have his home directory in /home/postgres, we'll put the install target into the bin directory organised per mayor version. In this way is possible to have multiple versions on the same box whitout hassle.
mkdir -p /home/postgres/bin/9.3 ./configure -prefix=/home/postgres/bin/9.3
The script will check all the dependencies and will generate the makefiles. Any error at configure time is usually explained good.
When everthing looks fine you can start the build process with the make command. The time required to compile depends from the box speed. On a laptop usually this doesn't require more than 30 minutes.
After the build is complete is a good idea to run the regression tests before installing.
To do it simply run make check.
All the output is written in the directory src/test/regress/results.
If everything is fine you can complete the installation with
make install
After this, into the /home/postgres/bin/9.3 directory will appear with 4 subfolders bin include lib and share.
The bin folder is the place of the database binaries, like the client psql or the server postgres.
The include folder is the place for the server's header files.
The lib folder is the location for all shared libraries and functions used by the database installation.
In the share folder you'll find the example files and the extension.

Packaged install

The PostgreSQL Global Group mantains an apt repository to simplify the install on the GNU/Linux based on debian.
The supported Linux versions are listed on the wiki page http://wiki.postgresql.org/wiki/Apt and at moment are

  • Debian 6.0 (squeeze)
  • Debian 7.0 (wheezy)
  • Debian unstable (sid)
  • Ubuntu 10.04 (lucid)
  • Ubuntu 12.04 (precise)
  • Ubuntu 13.10 (saucy)
  • Ubuntu 14.04 (trusty)
The packages are available for amd64 and i386.
The available database versions are
  • PostgreSQL 8.4
  • PostgreSQL 9.0
  • PostgreSQL 9.1
  • PostgreSQL 9.2
  • PostgreSQL 9.3
Before starting you shall import the GPG key to validate the packages.
In a root shell simply run wget -quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
Then add the file pgdg.list into the directory /etc/apt/sources.d/ with the following contents
deb http://apt.postgresql.org/pub/repos/apt/ codename-pgdg main
Change the codename value accordingly with your distribuition. (e.g. wheezy) then you can run
apt-get update
To install the full set of binaries simply run
apt-get install postgreql-9.3 postgreql-contrib-9.3 postgreql-client-9.3
This will set up also a database cluster up and running.

Wednesday 12 February 2014

Chapter 1 - PostgreSQL at glance

With this article I'll start a publication series of the WIP books on postgresql.
I've decided to split all the original book in three or more parts simpler for me to manage and write.
I'll publish a new chapter when is finished and the book's pdf when is complete.
The document is targeted to the DBA and to people who want to learn the PostgreSQL administration.
This first chapter is a mere introduction, is surely full of errors and typos. Please forgive me, I'm not English after all and I've studied French at school.

Thanks for reading my stuff.

PostgreSQL at glance

PostgreSQL is a first class product with high end enterprise class level features. This first chapter is a general review on the product with a brief talk on the database's history.

Long time ago in a galaxy far far away...

Following the works of the Berkeley's Professor Michael Stonebraker, in the 1996 Marc G. Fournier asked for any volunteer interested in revamping the Postgres 95 project.


The answer came from Bruce Momjian,Thomas Lockhart e Vadim Mikheev, the very first PostgreSQL Global Development Team.

Features

Every time a new major release is released, new powerful features join the rich set of the product's functionalities. There's a small excerpt of what the latest version offer in terms of flexibility and reliability.

Write ahead logging

Like any RDBMS worth of this name PostgreSQL have the write ahead logging feature. In short, when a data block is updated the change is saved in a reliable location, the so called write ahead log. The effective write on the datafile is performed later. Should the database crash the WAL is scanned and the saved blocks are replayed during the crash recovery. PostgreSQL stores the redo records in fixed size segments, usually 16 MB. When the wal segment is full PostgreSQL switches to a newly created or recycled wal segment in the process called log switch.

Point in time recovery

When the log switch happens is possible to archive the previous segment in a safe location. Taking an inconsistent copy of the data directory is possible to restore a fully functional cluster because the archived wal segments have all the informations to replay the physical data blocks on the inconsistent data files. The restore can be, optionally stopped at a given point in time. For example is possible to recover a PostgreSQL cluster to one second before the a catastrophic happening (e.g. a table drop).

Standby server and high availability

The inconsistent snapshot can be configured to stay up in continuous archive recovery. PostgreSQL 8.4 supports the warm standby configuration where the standby server does not accept connections. From the version 9.0 is possible to enable the hot standby configuration to access the standby server in read only mode.

Streaming replication

The wal archiving doesn't work in real time. The wal shipping happens only after the log switch and in a low activity server this can leave the standby behind the master for a while. Using the streaming replication a standby server can get the wal blocks over a database connection in almost real time.

Transactional

PostgreSQL fully supports the transactions and is ACID compliant. From the version 8.0 the save points were introduced.

Procedural languages

Amongst the rich of feature procedural language pl/pgsql, many procedural languages such as perl or python are available for writing database functions. The DO keyword was introduced in the 9.1 to have anonymous function's code blocks.

Partitioning

The partitioning, implemented in PostgreSQL is still very basic. The partitions are tables connected with one empty parent table using the table's inheritance. Defining check constraints on the partitioned criteria the database can exclude, querying the parent table, the partitions not affected by the where condition. As the physical storage is distinct for each partition and there's no global primary key enforcement nor foreign keys can be defined on the partitioned structure.

Cost based optimizer

The cost based optimizer, or CBO, is the one of PostgreSQL's point of strenght. The query execution is dynamically determined and self adapting to the underlying data structure or the estimated amount of data affected. PostgreSQL supports also the genetic query optimizer GEQO.

Multi platform support

PostgreSQL nowadays supports almost any unix flavour and from the the version 8.0 is native to Windows.

Tablespaces

The tablespace support permits the data files fine grain distribution on the OS filesystems.

MVCC

The way PostgreSQL keeps things consistent is the MVCC which stands for Multi Version Concurrency Control. The mechanism is neat and efficient, offering great advantages and one single disadvantage. We'll see in detail further but keep in mind this important sentence.
There's no such thing like an update in PostgreSQL.

Triggers

Triggers to execute automated tasks on when DML is performed on tables and also views are supported at any level. The events triggers are also supported.

Views

The read only views are well consodlidated in PostgreSQL. In the version 9.3 was added the support for the materialized and updatable. Also the implementation is still very basic as no incremental refresh for the mat views nor update is possible on complex views. Anyway is still possible to replicate this behaviour using the triggers and procedures.

Constraint enforcement

PostgreSQL supports primary keys and unique keys to enforce local data meanwhile the referential integrity is guaranteed with the foreign keys. The check constraint to validate custom data sets is also supported.

Extension system

PostgreSQL implements the extension system. Almost all the previously known contrib modules are now implemented in this efficient way to add feature to the server using a simple SQL command.