Optimizing Sql Queries. First Steps

This article will show you some simple tips to improve your database skills
By Daniel de la Peña Posted 30 December 2016

Optimizing Sql Queries. First Steps

Write SQL code is easy to learn and to write, the syntax is easy and doesn't describe the internal logic used to obtain the data. But sometimes this simplicity can be dangerous, two queries may look very similar, but their execution time could be totally different.

In these lines, we will try to know some basic tricks and techniques to improve the performance of our queries. Originally, this is devised to SqlServer but the most of tips can be applied in other database engines.

Simple useful tips

"DISTINCT"

"DISTINCT" clause is normally used to avoid duplicity errors in databases, but what many people do not know, is that it is one of the statements that more inputs and outputs generates on the hard drive and more force the processor. We must avoid it as much as possible.

SELECT DISTINCT(CustomerId) FROM dbo.Customer

Wildcard or "*"

A common mistake is to use the "*" to obtain all the columns of the table, but many times not all of them are needed.

SELECT * FROM dbo.Customer WHERE Region=1

If the number of rows of the table is small, probably there will probably be no much difference, but if the table has many rows, we will save processing time.

SELECT CustomerId, CustomerName FROM dbo.Customer WHERE Region=1

"Examine JOINS and WHERE clauses"

If there are no filters, and the majority of table is returned, consider which colums are needed. If there are no filters at all, this could be a big risk.

"WHERE"

Some tips:

1 - Try to search by primary key (clustered): if the PK is composed, use as fields as you can, if you can't search by PK try to use the other indexes.

SELECT CustomerName FROM dbo.Customer WHERE CustomerId=1

2 - Negative "WHERE" clauses: Negative filters using "NOT", "!=", or "<>" try to find something that is not there. Indexes are ignored and the entire table is read. Be careful with that.

SELECT CustomerName FROM dbo.Customer WHERE CustomerId NOT IN (1,2,3,4,5,6,7)

3 - The "LIKE" operator: Beware of the LIKE operator. It usually applies a full scan of a table and ignores indexing. If searching for a small number of records, this could be extremely inefficient. When searching for 10 records in 10 million, it is best to find those 10 records only using something like equality, and not pull them from all 10 million records, because all those 10 million records are read.

SELECT CustomerId FROM dbo.Customer WHERE CustomerName LIKE '%test%'

"TOP"

If you need to obtain certain quantity of rows, you can use TO` clause to only retrive the data that you need. It´s better to use TOP clause instead of ROWCOUNT, that could present mistakes in not sorted lists.

SELECT TOP 10 ProductId, ProductName FROM dbo.Product

"ORDER BY"

Use this clause only when is needed, if you can order the results in your client side, do it. If you have to do it, take into account the following recommendations:

  1. Keep the number of rows to order to a minimum
  2. Keep the number of columns to order to a minimum
  3. Maintain the width (physical size) of the columns to be ordered to a minimum
  4. Sort Columns with Numeric Data (NO Character Data Types)

Use stored procedure instead of heavy duty queries

Stored procedure exists as compiled objects on disk. Moreover, SQL server also keeps optimized execution plan for stored procedure. This really saves time and speeds up execution.

Avoid functions on the left-hand side of the operator

Functions are useful to process complex tasks and they can be used both in the SELECT clause and in the WHERE clause. Although their application in WHERE clauses may cause performance problems. Take a look at the following example:

SELECT CustomerId, CustomerName, ModifiedDate FROM dbo.Customer WHERE DATEDIFF(MONTH, ModifiedDate, '2016-05-01') < 0

Even if there is an index on the "ModifiedDate" column in the table "Customer", the query will still need to perform a full table scan. This is because we use the DATEDIFF function on the column ModifiedDate. The output of the function is evaluated at run time, so the server has to read all the rows in the table to retrieve the necessary data. To improve the performance, the following fix can be made:

SELECT CustomerId, CustomerName, ModifiedDate FROM dbo.Customer WHERE ModifiedDate > '2016-11-16'

This time, we aren’t using any functions in the "WHERE" clause, so the system can utilise an index to seek the data more efficiently.

The importance of indexes

Learning how to index properly, is one of the best things that you can do to improve the performance of your queries. A database without indexes would work slowly and a database with too many indexes as well. We will use the indexes in the columns for which we need to access, search, sort, etc.

An index is an on-disk structure associated with a table or view that speeds retrieval of rows from the table or view. An index contains keys built from one or more columns in the table or view. These keys are stored in a structure (B-tree) that enables SQL Server to find the row or rows associated with the key values quickly and efficiently.

There are two different types of indexes:

Clustered

Characteristic:

  1. Physical ordering of the data of the column
  2. Only one clustered index per table
  3. While creating primary key, a clustered index is created if not exist
  4. Improve the data retrival
  5. Can be created on a view
  6. Every table should need a clustered index for performance

Where we can use them?

  1. Normally the primary key, for example CustomerId in Customer table, but not always.
  2. The field most use in "WHERE", "GROUP BY" and "ORDER BY" clauses.

Non Clustered

Characteristic:

  1. Logical ordering of the data of the column, it´s like a pointer to data.
  2. No limitation of number of non clustered indexes in a table (999 per table)
  3. Can be used with unique constraint on the table acting as a composite key
  4. It should be created on columns witch are used in joins, where, and order by clause.
  5. While creating a unique constraint, a non clustered index is created

Where we can use them?

  1. Search criteria
  2. Used to join other tables
  3. Used as foreign key fields
  4. Used in the "ORDER BY" clause
  5. Etc

Indexes

How to indentify the critical queries in our Database

One thing that SQL Server allow us, is to identify the queries that are consuming more resources in our system. To do that, at DB instance level right click -> Reports -> Standard Reports – Perfomance Top Queries.

Identify critical queries

When we open the report, we have to look at the queries that both the total and the average Logical IO are high. There may be heavy queries that are called very seldom, or very quick queries that are called many times and do not need to be reviewed. We will look at both the total and the average high. For example the Query number 1 in the following report.

Identify critical queries result

In the next chapter…

In the next chapter we will learn to interpret an execution plan, and be able to optimize the queries that are heavier for our database. As we have seen in previous reports

Daniel de la Peña
Daniel de la Peña
Software and Database Developer