Inside the SQL Server Query Optimizer - part 2 All about the Simplification


Hi guys,
I am back!

After the introduction done in the previous article, today we talk about the simplication step of the Query Optimizer.

In my opinion, knowing what types of optimization SQL Server might apply is very important in order to write a Query that run Quickly!

So, are you ready for the second part of this exciting journey to discover the Query Optimizer?

Ready? Go!


The Simplification

Let now start speaking about the Simplification phase.
Simplification is the first step of the optimization pipeline.
It's an important step because during this phase the optimizer try to modify the logical tree in order to remove redundancies.
Optimizer try also to change the order of the logical operation in order to facilitate later step
.

We can split the simplification phase into various sub phases that are:
  • Constant Folding
  • Domain simplification
  • Predicate push-down
  • Join simplification
  • Contradiction detection


The constant folding

During this phase SQL server evaluates one or more expression before the Query is compiled.
This happens in limited cases:


1) Arithmetic expressions that contain only constant and without variables or parameters.


This is foldable:

SELECT ID FROM TABLE WHERE IDFIELD = 5 + 7 
 
And so after the Folding we have:

SELECT ID FROM TABLE WHERE IDFIELD = 12
  


This is not foldable:

SELECT ID FROM TABLE WHERE IDFIELD = 5 + @a            
      
Because @a is a parameter


2) Logical expressions that contains only constants 


SELECT ID FROM TABLE WHERE 7 > 5 

Is foldable: since 7 > 5 after the folding we have TRUE.



3) Built-in deterministic functions whose input rely solely on provided values. Non deterministic functions and scalar user-defined functions (UDF) are not foldable.

Note: we can verify using the traceflag 8605 and then watching at the execution plan:

 
Example n° 1
       
SELECT UnitPrice from Product where UnitPrice = 18+12
OPTION
(
    RECOMPILE,
    QUERYTRACEON 3604,
    QUERYTRACEON 8605
);
       





Example n° 2
       
SELECT UnitPrice from Product where UnitPrice + 1+2 = 2+2
OPTION
(
    RECOMPILE,
    QUERYTRACEON 3604,
    QUERYTRACEON 8605
);
      





Note:
  • Only the part at the right of the “=” 2 + 2 is folded.
  • The part at the left of the “=” is not folded because a column
  • SQL doesn’t effect this calculation: UnitPrice + 1+2 = 2+2 => Unitprice = 2+2 – (1+2) => UnitPrice = 1


Note that is a best practice write T-SQL code in order to permit the use of the constant folding feature. In this manner we alleviate the need to repeatedly evaluate the expression at runtime.

The Domain simplification


During this phase the optimizer try to detect the range of the values that a column can have or an expression can have.

Example:
       
SELECT * from Product
where (UnitPrice between 1 and 2) OR
      (UnitPrice between 2 and 5)
OPTION
(
    RECOMPILE,
    QUERYTRACEON 3604,
    QUERYTRACEON 8605
);
 





Predicate push-down


Since the input tree in evaluated from bottom to top, in this step the optimizer try to push down the filter specified in the WHERE clauses.
Note that this operation should have always a positive effect.
This is because we try to reduce the number of rows that the later logical operations must do.

In other words, the predicate push-down is the action of using the predicate directly in the seek or in the scan and so reading only the rows that match the give predicate.


Example:

We have a table “Product”

 
       
SELECT P.SupplierId , P.UnitPrice
FROM Product P
WHERE  
 P.SupplierId = 1 and
 P.UnitPrice = 18
 
 
If we look at the execution plan we can see we read only 3 rows from the table products.






Join Simplification


The Join Simplification is another great feature!
It’s goal is to remove JOIN when they are not necessary.
Convert also a left outer join into an inner join when the NULLs introduced by the left outer join are later by another feature in the Query.
Lastly remove the empty subexpression.


Example 1
 
       
SELECT P.SupplierId, P.UnitPrice
From Product P
  LEFT OUTER JOIN Supplier S on S.Id = P.SupplierId
Where  
 P.SupplierId = 1 and
 P.UnitPrice = 18
    

LEFT OUTER JOIN on table Suppliers is removed



Example 2

SELECT P.SupplierId , P.UnitPrice,
  (SELECT Id FROM Supplier S where S.Id = P.SupplierId AND S.Id = 9999)
from Product P 
Where  
 P.SupplierId = 1 and
 P.UnitPrice = 18


Subexpression is removed


Example 3


These two Queries produce the same execution plan

       
SELECT P.SupplierId , P.UnitPrice, S.id 
from Product P 
  LEFT OUTER JOIN Supplier S ON S.Id = P.SupplierId
Where  
 P.SupplierId = 1 and
 P.UnitPrice = 18 and
 S.Id = 1
 

       
SELECT P.SupplierId , P.UnitPrice, S.id 
from Product P 
  INNER JOIN Supplier S ON S.Id = P.SupplierId
Where  
 P.SupplierId = 1 and
 P.UnitPrice = 18 and
 S.Id = 1
 


Example 4

Select
  O.CustomerId,
  OI.Id,
  P.id
from [Order] O
  join [OrderItem] OI on OI.OrderId = O.id
  join [Product] P on OI.ProductId = P.Id



Now If we remove the field P.ID then the optimizer does not use anymore the product table.
       
Select O.CustomerId,OI.Id  --, P.id
from [Order] O
  join [OrderItem] OI on OI.OrderId = O.id
  join [Product] P on OI.ProductId = P.Id
 
 
 

But to implement the INNER JOIN shouldn't table product still be needed?
Actually NOT!
The Optimizer use the foreign key and so there is no reason to access to the table Product

CREATE NONCLUSTERED INDEX [IndexOrderItemProductId] ON [dbo].[OrderItem]([ProductId] ASC)






 

 

Contradiction detection


The optimizer is able to resolve contradictions.


Explicit contradiction inside the Query:

SELECT * FROM Product where UnitPrice > 18.0 and UnitPrice < 18.0            





Can use the constraint.


For example with this constraint

ALTER TABLE [dbo].[Product]  WITH CHECK ADD  CONSTRAINT [CK_Product] CHECK  (([UnitPrice]<=(100000)))


Running this select

SELECT * FROM Product where UnitPrice > 100000
 

We obtain a constant scan




That's all for today! If you have found this Article AWE then wait for the continuation where we will talk about the next step of the Optimizer pipeline: the cardinality estimation!

See you soon!

Luca Biondi @ SQLServerPerformance blog!


 
 
 
 
 
 
 
 
 
 

Comments

I Post più popolari

SQL Server, execution plan and the lazy spool (clearly explained)

SQL Server, datetime vs. datetime2

La clausola NOLOCK. Approfondiamo e facciamo chiarezza!