PostgreSQL Dba: The angry dba

Before anything on database administration let me introduce some best practice I acquired during my experience in the consulting company I mentioned before.
This company offer Oracle services to important enterprises in Italy and, despite the size, is well structured and organized.
The first thing I've learned was how to write good documents, not too much verbose but complete in any part.
Then I was inducted to the company's best practice in SQL coding as Oracle have some limitations in the size and characters allowed in the identifiers.
PostgreSQL does not have this limitations giving freedom of choice for any format and structure, with unicode characters (if the database supports UTF8 of course) and a 64 characters maximum length for the identifiers.
This is a good thing but, believe me, sometimes it can be a bloody nightmare.
I've written some guidelines to get things sorted out.

Rule 1 – No camel case or reserved keywords in the identifier name

Let start from the beginning.
Should you need to create a table to store debit notes for the year 2002, which name will you choose?
My old school development habits suggest me something like debit_notes_2002.
A new school developer or maybe used to work with AGILE, SCRUM,ITIL and S**** will probably use the name DebitNotes2002.
This in PostgreSQL, is a problem because all the names are converted in lower case making the camel case useless.
It's possible to keep the case preserved enclosing the identifiers between double quotes “.
That mean that ANY query involving your beautiful identifiers will contain the quotation and this affects the readability of the query.
This happens for any identifier named after reserved keyword as well.
Same story for the column names .

Rule 2 - self explaining schema

In the rule 1 I declared I would name the table for debit notes as debit_notes_2002.
I lied.
I like to put prefix in any identifier to keep track which relation I'm are dealing with.
PostgreSQL stores any relation in one system table named pg_class and give you system views like pg_tables and pg_indexes to query.
But why I should bother in system catalogue queries or people inquiry if I can let the database describe itself?
The way to do this is quite simple.
I put a prefix on any identifier to tell what kind it is or which data carries.
Adding a t_ prefix in the table name will tell to any developer and, most important to any dba/performance tuner, that the relation carries physical data.
Adding a v_ prefix to any view will tell that we are dealing with a short cut to a query and, hopefully, this will avoid the bad habit to join views resulting in the query planner going nuts.
On the same way giving a prefix to the columns to tell the data type will reduce to the developers the risk to get a wrong behaviour for wrong typecast, and will give to the dba precious informations about the space and storage used by the table.
A good structure for building self explaining names can be this.

Object	Prefix
Table	t_
View	v_
Index	idx_
Unique index	uidx_
Primary key	pk_
Foreign key	fk_
Check	chk_
Unique key	uk_
Type	ty_
Sql function	fn_sql_
Plpgsql function	fn_plpg_
Plpython function	fn_plpy_
Plperl function	fn_plpr_
Trigger	trg_
rule	rul_

Table 1: Object prefix

The same for the data type used for the column.
An example for the most common data type is shown in table 2

Type	Prefix
Character	c_
Character varying	v_
integer	i_
text	t_
bytea	by_
numeric	n_

Rule number 4 – format the SQL queries

Sometime I receive requests to debug or optimize queries written in this way
select * from debitnoteshead a join debitnoteslines b on debnotid where a.datnot=b. datnot and b.deblin>1;
Frustrating isn't it?
Despite my legendary bad temper when I'm in service I'm absolutely professional and probably too much kind because I don't bounce the request and I fix the query.
I know I know, I'm doing wrong.
The company doesn't pay my salary to hit tab, enter and ctrl+u, but to keep the company's systems up, running and sound.
So, if you don't want to waste your time formatting when you receive something like this, I suggest to simply bounce back the request asking for a pretty format.
Some simple rules can enhance the readability.

SQL keywords in upper case
all the identifiers must be indented on the same level
All the same level SQL keywords must be indented at same level
Avoid the SELECT * as is a waste of memory and doesn't tell anything useful
If possible inner join the tables explicitally in the WHERE condition
In the join use the keyword ON to make clear which column is used for joining
In the table list use self explaining aliasing

Using those rules the previous query looks like this.
SELECT
productcode,
noteid,
datnot
FROM
debitnoteshead hrd,
debitnoteslines lns
WHERE
hdr.debnotid=lns.debnotid
AND hdr.datnot=lns.datnot
AND lns.deblin>1
;
Now look beauty.

With this last advice ends my first blog post.
Just small afterword before ending.
Probably someone will think this guidelines are too strict to be accepted.
Obviously for the code already written nothing can be done, but can still teach the developers to write decent code.
I think this is a goal can be achieved and believe me, it worth it.
For the new projects this guidelines give you a powerful vision for the database structure without wasting your time through data dictionaries or asking people about the data.
IMHO our job is a war where the casualties are downtime and data loss.
I consider the data the most precious thing a company own and definitely our mission is to guarantee the smallest reaction time to fix any problem minimizing any possible loss of money.

2 comments:

Unknown20 August 2012 at 09:44
Are there any technical benefits to using the implicit join notation over explicit INNER JOIN e.g.

SELECT
productcode,
noteid,
datnot
FROM
debitnoteshead AS hrd INNER JOIN
debitnoteslines AS lns ON
hdr.debnotid = lns.debnotid
WHERE
hdr.datnot = lns.datnot AND
lns.deblin > 1;
-- missing indentation due to comments formatting constraints.

Thank you for your sharing the fruits of your experience.
Unknown20 August 2012 at 09:50
Using the implicit join give the query planner the freedom of choice.
Using the explicit join you can influence it as explained here

http://www.postgresql.org/docs/9.1/static/explicit-joins.html

I like the implicit notations because I feel the inner join more like a filter constraint, it's bad habit from my oracle experience where, before the 9i, no inner join or left outer join was allowed and the the (+) operator dominate the queries ;)