Friday, 31 July 2015

Nuts and bolts part 4 - naming conventions and editors

Named identifiers

Writing and debugging the SQL is not simple and a little bit of clarity becomes a massive help. Adding a prefix to the identifiers gives to the SQL developer a great knowledge about the database schema.




This naming convention makes clear the difference between tables which are physical entities, and the views which are names for saved SQL statement.

Adopting a similar approach for the column names makes the data type immediately recognisable.



Both naming schemas prevent the risk of having reserved keywords used for the identifier's name.

The drawback is if the object type changes. For example if a table becomes a view the name should reflect the change. In an ideal world this shouldn't happen. In the real life building a solid schema design reduces greatly this kind of issues. However, changing the data type, in particular if dealing with big tables, is more complicated than renaming a field and, after all is a DBA problem.

The editor

Unlikely many commercial RDBMS PostgreSQL, ships only with the command line client psql. There is a good quantity of third party clients with good support for the database features and a good connectivity layer. An exhaustive list of those clients can be found on the PostgreSQL wiki
https://wiki.postgresql.org/wiki/Community_Guide_to_PostgreSQL_GUI_Tools. Is difficult to say which editor is the best. When I started learning PostgreSQL the only tool available was PgAdmin 2 and phpPgAdmin. I decided for the former and then I switched to the newer PgAdmin 3 which added a powerful editor and multi platform support. In the years I tested some of the other clients like TOra, SQL workbench and SQL Maestro and I never found the same confidence and ease of usage like PgAdmin 3. Whether is the tool of your choice this should have the following features.

libpq connector

One of the reasons I do not like SQL workbench is the JDBC connector. Writing and testing the SQL code is a quick process. Write the statement, test it, then change it, the test it again and so on. The client response in this method is absolutely important. The libpq connector have virtually no lag, except the disk/network bandwidth.

Removal of the Byte Order Mark (BOM)

The BOM is a marker set in the UTF8 files to determine the byte order. The client psql manages the presence but for some reasons it can fail with an error when running sql scripts with this marker. In PgAdmin 3 is possible to disable the BOM when saving the files.

Monday, 27 July 2015

Brighton PostgreSQL Meetup

After a while I finally started a PostgreSQL Meetup in Brighton.
I reckon there are several people interested in PostgreSQL in the UK and I've met many of them in the Sussex area.

I'm quite new to the Meetup platform so I still need to learn how to use it correctly. However, anybody is absolutely welcome to help to organise a PostgreSQL community in the area.

The link is here. http://www.meetup.com/Brighton-PostgreSQL-Meetup/
I've already scheduled a PostgreSQL beer, an informal chit chat about our database of choice and the future of the community.

Save the date,
Friday, August 14, 2015 6:30 PM.

 

The location is the Brunswick in Hove. Big beer garden, no music outside and nice ales.

 

Wednesday, 22 July 2015

Nuts and Bolts - part 3

INSERT

The INSERT statement is composed by two elements, the attribute list and the values list. The lists are positional. The formatting should mark clearly the various elements of the query in order to make it simpler to spot the correspondence between the attribute and the value.

  • After the words INSERT INTO there is one space followed by the table's name
  • After the table's name there is one carriage return
  • The attributes if present are enclosed between round brackets indented one tab from the word INSERT
  • The attribute's list indents one tab from the round brackets
  • The attributes indent one tab fro the round brackets and are separated by a carriage return
  • The word VALUES indents at the same level of INSERT and is followed by a carriage return
  • The value's list is surrounded by round brackets indented one tab from the word VALUE
  • The values indent one tab fro the round brackets and are separated by a carriage return
























UPDATE

The WHERE condition on the update is the same of the SELECT. Take a look to 1.1.1 for more informations. Just after the table's name there is the keyword SET and the list of the updates. The formatting should reflect this particular structure.
  • After the table's name there is one carriage return
  • The word SET indents one tab from the word UPDATE
  • The update's list indents one tab from the word set
  • Each update is separated by a carriage return 












DELETE

The delete is the simplest of the DML as lists just the target relation and the eventual WHERE condition. The formatting is quite simple as the where condition is the only part which requires to be structured as seen in the select.








Data Definition Language

The data definition language groups all the commands defining, altering or dropping the database's relations. Each command type follows a different syntax structure.

CREATE

The create statement is composed by the verb CREATE followed by the object type (e.g. TABLE) and the new relation's name followed by a carriage return. The rest of the statement indents one tab from the verb CREATE.
The rules for the CREATE TABLE are the following.
  • The fields are indented one tab stop from the surrounding brackets
  • Between the field's name and the type there is one or more tab stops
  • The number of tabs is determined by the field with the longest name
  • Between the field's type and the optional DEFAULT there is one space
  • Between the the optional DEFAULT and the optional field's constraint there is one space
  • If the DEFAULT is missing, between the field's type and the field's constraint there is one space
  • Table constraints will require the constraint name followed by one carriage return
  • The constraint type indents one tab from the word CONSTRAINT
  • Any clause indents one tab from the constraint type


 
The rules for the CREATE VIEW are the following.
  • After the view name there is one carriage return
  • The word AS indents one tab from the verb CREATE
  • The round brackets surrounding the view definition indents at the same level of the word AS
  • The view's SELECT follows the same rules defined in the select.












ALTER

The alter statement's only rule is very simple, all the alter actions indent by one tab fromt he ALTER verb and are separated by a carriage return.

DROP

The drop statement's formatting is very simple. Everything is on a single line.

Saturday, 4 July 2015

Nuts and Bolts - part 2

WITH statements

Because the WITH statement works like having a temporary table, the formatting will follow rules similar to the CREATE TABLE statement.

• Between the WITH and the alias there is a tab followed by the word AS and a carriage return
• The round brackets surrounding the inner query indent one tab from the alias
• The inner query indents one tab from the round brackets





Conclusions 

This coding style is at first sight complex and with some drawback as it can slow down the coding process because of more keystrokes required for the uppercase and the carriage return. However, when the writing automatism is acquired the coding speed is very fast.

This coding style creates the query using a top down structure where the different part of the query are indented in order to reflecting their level inside the query’s logic. This increases the query readability in particular when dealing with complex queries.

The vertical block structure gives great control on the query’s sub sections. With query editors supporting the partial code execution (e.g. pgadmin3’s query ) is possible to highlight the query’s single sections and execute them immediately.

I developed this style from the induction I received in my first DBA job. My employer enforced along the developers and DBA a similar set of rules which worked perfectly, improving the produc- tivity and the code’s consistency across the company.


Incidentally this matches almost perfectly the way PostgreSQL processes a select. What follow is an excerpt of the PostgreSQL’ manual describing how the select is processed.


SELECT retrieves rows from zero or more tables. 
The general processing of SELECT is as follows:

1. All queries in the WITH list are computed.
These effectively serve as temporary tables that can be referenced in the FROM list.
A WITH query that is referenced more than once in FROM is computed only once.
 

2. All elements in the FROM list are computed.
(Each element in the FROM list is a real or virtual table.)
If more than one element is specified in the FROM list, they are cross-joined together.
 

3. If the WHERE clause is specified, all rows that do not satisfy the condition
are eliminated from the output.
 

4. If the GROUP BY clause is specified, or if there are aggregate function calls,
the output is combined into groups of rows that match on one or more values,
and the results of aggregate functions are computed.
If the HAVING clause is present, it eliminates groups that do not satisfy the
given condition.
 

5. The actual output rows are computed using the SELECT output expressions
for each selected row or row group.
 

6. SELECT DISTINCT eliminates duplicate rows from the result.
SELECT DISTINCT ON eliminates rows that match on all the specified expressions.
SELECT ALL (the default) will return all candidate rows, including duplicates.


7. Using the operators UNION, INTERSECT, and EXCEPT, the output of more than one
SELECT statement can be combined to form a single result set.
The UNION operator returns all rows that are in one or both of the result sets.
The INTERSECT operator returns all rows that are strictly in both result sets.
The EXCEPT operator returns the rows that are in the first result set but not in the second.
In all three cases, duplicate rows are eliminated unless ALL is specified.
The noise word DISTINCT can be added to explicitly specify eliminating duplicate rows.
Notice that DISTINCT is the default behavior here, even though ALL is the default for SELECT itself.
 

8. If the ORDER BY clause is specified, the returned rows are sorted in the specified order.
If ORDER BY is not given, the rows are returned in whatever order the system finds fastest to produce.
 

9. If the LIMIT (or FETCH FIRST) or OFFSET clause is specified, the SELECT statement only returns
a subset of the result rows.
 

10. If FOR UPDATE, FOR NO KEY UPDATE, FOR SHARE or FOR KEY SHARE is specified,
the SELECT statement locks the selected rows against concurrent updates.

 

Thursday, 2 July 2015

Nuts and bolts - part 1

Before starting with the book's topic I want to explain how to set up an efficient environment and some good practice which can improve the code's readability and quality. As somebody will notice these methods are completely opposite to the general code style trends. I'll try to give the motivation for each rule. Anyway, in general because the SQL universe is a strange place this requires strange approach. In order to write and read effectively the SQL the coder should gain the capability to get a mental map between the query's sections and the underlying logic. This can be facilitated using a clear and well defined formatting.

Code formatting

The SQL language is based on statements. Each statement is terminated by a specific character, by default ``;'' . In order to understand the logic of the SQL statements is very important to get the statements formatted in the same way the database executes them. We'll look first how to format the SELECT, then the DML and finally the DDL. Each of those queries needs a different structure because of the different logic. The common formatting rules are the following.
  • There is a tab separator of 7 spaces
  • All the keywords are in upper case
  • After a round bracket there is a carriage return and one tab indented
  • The opening and closing round brackets are indented at the same level 

SELECT

When processing a query the parser works backward. It starts from the innermost and lowest complete statements and moves upward in order to get the entire picture. There is one remarkable exception. The WITH statements are computed first because required in the rest of the query. Actually the WITH acts like a temporary table. In order to see immediately the way PostgreSQL processes the query the indention should follow the same logic.
Basic formatting rules.
  • After the word SELECT there is a carriage return
  • The columns are separated by a comma and a carriage return
  • The columns indent one tab from the respective SELECT
  • The word FROM indents at the same level of SELECT
  • The relation indent one tab from the word FROM
  • The word WHERE indents at the same level of SELECT and FROM
  • A single condition indents one tab from the WHERE
  • Multiple conditions indent two tabs from the where and the logical operators indent one tab from the where
  • Between the logical operators and the condition there is one tab
  • Group by and order by follow the same rules of the select list
  • The queries joined by UNION, INTERSECT, and EXCEPT indent one tab
  • The terminator character indents at the same level of the SELECT,FROM and WHERE
Let's format a simple select from one relation and with one or more condition in the where section.



The first thing to do is to change all the keywords in uppercase.



We'll then format the select list.










Finally  the from and the where.















Having multiple conditions requires an extra tab in order to keep aligned the logical operators and the conditions. 




















Joins.
  • If using the implicit joins the relations are separated by a comma and a carriage return, and indented at the same level
  • When using explicit joins specify the kind of join [INNER,LEFT OUTER,CROSS etc.]
  • After the left hand relation there is a carriage return
  • The join clause indents at the same level of the left hand relation
  • After the words [INNER,LEFT OUTER,CROSS] JOIN follows the relation's name and a carriage return
  • The word ON indents one tab from the JOIN and there is a carriage return
  • The join conditions follow the same rules of the WHERE condition
  • Nested joins indent one tab from the upper join
  • The usage of the aliases, with the optional word AS is strongly encouraged 
Query with implicit join 






Query with explicit inner join





Subqueries
  • All the subqueries will indent one tab from the surrounding brackets
  • If the subquery is short in can stay in line with the round brackets (e.g. SELECT now())
Query with subselect in the SELECT's list
































Query with subselect in the FROM's list