Thursday, 2 July 2015

Nuts and bolts - part 1

Before starting with the book's topic I want to explain how to set up an efficient environment and some good practice which can improve the code's readability and quality. As somebody will notice these methods are completely opposite to the general code style trends. I'll try to give the motivation for each rule. Anyway, in general because the SQL universe is a strange place this requires strange approach. In order to write and read effectively the SQL the coder should gain the capability to get a mental map between the query's sections and the underlying logic. This can be facilitated using a clear and well defined formatting.

Code formatting

The SQL language is based on statements. Each statement is terminated by a specific character, by default ``;'' . In order to understand the logic of the SQL statements is very important to get the statements formatted in the same way the database executes them. We'll look first how to format the SELECT, then the DML and finally the DDL. Each of those queries needs a different structure because of the different logic. The common formatting rules are the following.
  • There is a tab separator of 7 spaces
  • All the keywords are in upper case
  • After a round bracket there is a carriage return and one tab indented
  • The opening and closing round brackets are indented at the same level 

SELECT

When processing a query the parser works backward. It starts from the innermost and lowest complete statements and moves upward in order to get the entire picture. There is one remarkable exception. The WITH statements are computed first because required in the rest of the query. Actually the WITH acts like a temporary table. In order to see immediately the way PostgreSQL processes the query the indention should follow the same logic.
Basic formatting rules.
  • After the word SELECT there is a carriage return
  • The columns are separated by a comma and a carriage return
  • The columns indent one tab from the respective SELECT
  • The word FROM indents at the same level of SELECT
  • The relation indent one tab from the word FROM
  • The word WHERE indents at the same level of SELECT and FROM
  • A single condition indents one tab from the WHERE
  • Multiple conditions indent two tabs from the where and the logical operators indent one tab from the where
  • Between the logical operators and the condition there is one tab
  • Group by and order by follow the same rules of the select list
  • The queries joined by UNION, INTERSECT, and EXCEPT indent one tab
  • The terminator character indents at the same level of the SELECT,FROM and WHERE
Let's format a simple select from one relation and with one or more condition in the where section.



The first thing to do is to change all the keywords in uppercase.



We'll then format the select list.










Finally  the from and the where.















Having multiple conditions requires an extra tab in order to keep aligned the logical operators and the conditions. 




















Joins.
  • If using the implicit joins the relations are separated by a comma and a carriage return, and indented at the same level
  • When using explicit joins specify the kind of join [INNER,LEFT OUTER,CROSS etc.]
  • After the left hand relation there is a carriage return
  • The join clause indents at the same level of the left hand relation
  • After the words [INNER,LEFT OUTER,CROSS] JOIN follows the relation's name and a carriage return
  • The word ON indents one tab from the JOIN and there is a carriage return
  • The join conditions follow the same rules of the WHERE condition
  • Nested joins indent one tab from the upper join
  • The usage of the aliases, with the optional word AS is strongly encouraged 
Query with implicit join 






Query with explicit inner join





Subqueries
  • All the subqueries will indent one tab from the surrounding brackets
  • If the subquery is short in can stay in line with the round brackets (e.g. SELECT now())
Query with subselect in the SELECT's list
































Query with subselect in the FROM's list


4 comments:

  1. I agree that code formatting is a very important thing, but just imagine you type "select ... from ... where ..." a hundred times a day. Would it be easy for you to type all the keywords in uppercase? In fact, it is annoying and slowing down the performance of a developer.
    The only reason I can think of doing that is that the sql is to be examined later by people who are not good at sql (such formatting makes the sql more understandable).

    It is hard to imagine so many carriage returns and tabs in a non-trivial sql. I mean, if there are more then 10 tables joined with subselects, functions' calls etc. the final select would be huge. I think, keeping a sql compact is more important (especially, if it is quite complex).

    For example:

    select t1.field1,
    t1.field2,
    t1.field3,
    t1.field4
    from table1 t1
    join table2 t2 on t2.field1 = t1.field1
    join table3 t3 on t3.field1 = t1.field1
    left join table4 t4 on t4.field2 = t3.field2
    join table5 t5 on t5.field1 = t1.field1
    join table6 t6 on t6.field1 = t1.field1
    join table7 t7 on t7.field5 = t6.field5
    join table8 t8 on t8.field6 = t7.field6
    left join table9 t9 on t9.field12 = t1.field12
    where t1.field3 = 30000
    and t1.field4 <> 1;

    ReplyDelete
    Replies
    1. This is the reason why I like the pgadmin3's sql editor.
      It's quick and efficient in sql development. Similar (without the drawback) to oracle's SQL*.

      I write hundreds of lines of sql (and plpgsql) code every day and with the approach described in this post I never get lost.

      Delete
    2. Oops, all the tabs were eaten before the post saved.:)

      Delete
    3. yes, blogger is quite bad. I've in the to list a proper website for this.

      Delete