Saturday, 26 April 2014

Chapter 5 part 1 - The connection

The connection

When a client tries to connect to a PostgreSQL cluster the process follow few stages which can result in rejection or connection. The first stage is the host based authentication where the cluster scans the pg_hba.conf file searching a match for the connection's parameters, like the host, the username etc. This file is usually stored into the data area amongst the postgresql.conf file and is read from the top to the bottom. If there's match the corresponding method is used, otherwise if there's no match then the connection is refused.
The pg_hba.conf is structured as shown in table 5.1

Table 5.1: pg_hba.conf
Type Database User Address Method
local name name ipaddress/network mask trust
host * * host name reject
hostssl       md5
hostnossl       password
        gss
        sspi
        krb5
        ident
        peer
        pam
        ldap
        radius
        cert

The column type specifies if the connection is local and happens via unix socket or host,hostssl,hostnossl, in this case the tcp/ip is used.The host type matches either an SSL or plain connection, the hostssl only a SSL connection and hostnossl only a plain connection.
The Database and User columns are used to match specific databases or users from the incoming connection. It's possible to use a wildcard to specify everything.
The column address is used only if the type uses the tcp/ip and can be an ip address with network mask or a hostname. Both ipv4 and ipv6 are supported.
The last column is the authentication method. PostgreSQL supports many methods, from the password challenge to the sophisticated radius or kerberos.
For now we'll take a look to the most common.

  • trust allow the connection without any request. Is quite useful if the password is lost but represent a treat on production.
  • peer allow the connection if the OS user matches the database user. Useful to authenticate to the database on the local boxes. Initdb sets this as default method for the local connections.
  • password allow the connection matching the user and password with pg_shadow system table. Beware asthis method sends the password in clear over the network.
  • md5 same as for password this method offer a md5 encryption for the passwords. As the md5 is deterministic a pseudo random subroutine is used during the password challenge to avoid the same string to be sent over the network.
When the connection request matches the pg_hba.conf and the authentication method is cleared, the connection becomes established. The postgres main process forks a new backend process which attaches to the shared buffer.
As the fork process is expensive the connection is a potential bottleneck. A great amount of opening connections can degenerate in zombies resulting in the worst case in a denial of service.
A solution could be to keep all the needed connections constantly established. Even if it seems reasonable, this approach have a couple of unpleasant side effects.
Each connection's slot in the GUC max_connections, consumes about 400 bytes of shared memory; each established connection requires the allocation of the work_mem and the os management of the extra backend process.

For example a 512 MB shared_buffer and 100MB work_mem, with with 500 established connections consumes about 49 GB. Even reducing the work_mem to 10MB the required memory is still 5 GB; in addiction, as seen in 4.5.2, this parameter affects the sorts and subsequently the performance, his value requires then extra care.
In this case a connection pooler like pgpool http://www.pgpool.net/ or the lightweight pgbouncer http://pgfoundry.org/projects/pgbouncer is a good solution. In particular the latter offers a very simple to use configuration, with different pooling levels.
The GUC parameter listen_addresses in a freshly initialised data area is set to localhost. This way the cluster accepts only tcp connections from the local machine. In order to have the cluster listening on the network this parameter must be changed to the correct address to listen or to * for 'all'. The parameter accepts multiple values separated by commas.
Changing the parameters max_connections and listen_addresses require the cluster shutdown and startup as described in 4.2 and 4.3

No comments:

Post a Comment