SQL Performance: Say no to the "NOT IN" operator

Hello friends,

Paraphrasing the title of the american television show "Say Yes to the dress", our title of today is "Say no to the NOT IN operator".

Why?

Well, we will show you that the "Not in" operator can be substituited with the "Not Exists" operator and that normally the Not Exists operator is fastest!

How we write a Query ultimately determines how much it will be efficient.
Yes, SQL Server is a declarative language (we tell him what we want to achieve and he internally decides how to get it) but if we help him by applying a series of good rules we will get improvements from the point of view of performance.

But now, as usual, we will show you with an example

Let's go!

For the example we will use our tables ORDRIG and PRODUCTS. Each table have a clustered index on the ID integer field:

       
CREATE CLUSTERED INDEX IDX_ORDRIG_ID ON ORDRIG(ID)
CREATE CLUSTERED INDEX IDX_PRODUCTS_ID ON PRODUCTS(ID)

Well now imagine your boss ask a list of rows of orders whoose description do not contain the word "CARTUC".

You could write the Query below with a NOT IN operator:

       
SELECT * FROM ORDRIG T
WHERE T.DESCR NOT IN
  (SELECT DESCR FROM PRODUCTS WHERE DESCR LIKE 'CARTUC%')

Or you could write the same query with an NOT EXISTS operator:

       
SELECT * FROM ORDRIG T
WHERE T.DESCR NOT EXISTS
  (SELECT 1 FROM PRODUCTS A WHERE T.DESCR = A.DESCR AND A.DESCR LIKE 'CARTUC%')

Yes but what is the difference? The result is the same but the execution plans differ!

Differences

When SQL Server resolve the Query with the NOT IN operator, consider the main part of the Query on the ORDRIG before and the part of the PRODUCTS table after.

For this reason SQL Server creating an execution plan first use a clustered index scan on the ORDRIG table. As a second step use the clustered index scan on the PRODUCTS table.

The type on join choosen in this case is a nested loop!

Using the NOT EXISTS operator the two tables are joined though the DESCR field and SQL Server is free to choose for the execution plan wich order use.

Especially it is free to choose to use as main table the table with less rows.

In our example the table with less rows is the PRODUCTS table while the ORDRIG table is a detail. With the two table arranged in this way SQL SERVER is free to use a more efficent Hash Match Join.

A correct join type is foundament and it is sufficient to look at the pages read to realize it:

With the NOT IN operator we read more than 2 million of pages from the PRODUCT table:

       
Tabella 'Products'. Conteggio analisi 1, letture logiche 2248139, letture fisiche 1, letture read-ahead 2, letture logiche LOB 0, letture fisiche LOB 0, letture read-ahead LOB 0.
Tabella 'OrdRig'. Conteggio analisi 1, letture logiche 35478, letture fisiche 3, letture read-ahead 35474, letture logiche LOB 0, letture fisiche LOB 0, letture read-ahead LOB 0.

While using the NOT EXISTS operator we read just 4 pages from the PRODUCT table:

       
Tabella 'OrdRig'. Conteggio analisi 1, letture logiche 35478, letture fisiche 3, letture read-ahead 9353, letture logiche LOB 0, letture fisiche LOB 0, letture read-ahead LOB 0.
Tabella 'Products'. Conteggio analisi 1, letture logiche 4, letture fisiche 0, letture read-ahead 0, letture logiche LOB 0, letture fisiche LOB 0, letture read-ahead LOB 0.

This is really cool! With the NOT IN operator we get the result in 30 seconds, with the NOT EXISTS operator we get the same result in 20 seconds.

This in another tips and another aspect to take care if you want to write well performing queries!

That's all folks and remember:

If you liked this post then leave a comment, subscribe to the blog and wait for the next post!

Luca

Previous post:SQL Server, A bit of reverse engineering inside the parser

Search This Blog

SQL Server Performance & Troubleshooting – Where Milliseconds Matter 🚀