Inside the SQL Server Query Optimizer - part 1 Introduction and the input tree

Welcome back Guys!

We are in 2020 and I would wish you a happy new year!
I hope it will be a year of professional and personal growth for you.
I hope it is a year of health, dedicated to affections and family!

Today we start with the first article of this year and the second year of this blog that's growing fast!

This first article of this year is about the Query Optimizer.
I will present to you a series of articles in which we will go into detail on how the Query optimizer works.


Enjoy the reading!


Introduction 


We had already mentioned the subject in this article wich i really invite you to read here: SQL Server, the Parametrization and the Parameter Sniffing

So let's review what we have already said:

1) The goal of the Query Optimizer is to find a good way to return data (aka find a good execution plan) that will be executed by the Query Executor.

Remember: not the best plan. This is because in a non-trivial query there can be hundreds or thousands of different ways to return the same result.

This process however is a very time-consuming process and so finding the best way to optimize the query took too long.

For this reason the optimizer is based on a cost concept. The SQL Server engine has a certain amount of time available in which it will search for the least cost among the already calculated execution plans.

2) The Query optimizer read an algebrized or input tree produced by the previous Binding phase and produce an Execution plan.
Obtain an execution plan is expensive so SQL Server try to store it in an memory called Cache Plan.


Now, let’s go into details!


As you can see on the pics below there are many steps inside the optimization pipeline:


Well, today we will talk about the input tree, while in the next article we will enter into the core talking about the Simplification step.
But! How to Watch the input tree? 
In order to take a look to the Input tree you should enable some traceflags.
They are: 8605 and 3604

For convenience we can append this OPTION cause to the end of the Query:
       
OPTION
(
    RECOMPILE,
    QUERYTRACEON 3604,
    QUERYTRACEON 8605
);

 
In the message tab of the SQL Manager will appear our input tree.


The Input Tree


The Optimizer read an input tree produces by the previous Binding phase.
So first of all, let's look at some input trees order to see logical operators of which it is composed

Case 1: Let’s start now with this simple Query:
 
       
SELECT FirstName from Customer
OPTION
(
    RECOMPILE,
    QUERYTRACEON 3604,
    QUERYTRACEON 8605
);

 
The Input tree is this below:

*** Converted Tree: ***
    LogOp_Project QCOL: [Crack_Me].[dbo].[Customer].FirstName
        LogOp_Get TBL: Customer Customer TableID=1205579333 TableReferenceID=0 IsRow: COL: IsBaseRow1000
        AncOp_PrjList
*******************

We can observe that:
  • We have 3 logical operations: LogOp_Project, LogOp_Get and AncOp_PrjList
  • After the QCOL label of the LogOp_Project we have the list of the columns that need to be pulled out.
  • After the TBL label of the LogOp_Get we have the table Id of the our table



Case 2: Using the star “*”
       
SELECT * from Customer
OPTION
(
    RECOMPILE,
    QUERYTRACEON 3604,
    QUERYTRACEON 8605
);


The Input tree is now this:

*** Converted Tree: ***
    LogOp_Project QCOL: [Crack_Me].[dbo].[Customer].Id QCOL: [Crack_Me].[dbo].[Customer].FirstName QCOL: [Crack_Me].[dbo].[Customer].LastName QCOL: [Crack_Me].[dbo].[Customer].City QCOL: [Crack_Me].[dbo].[Customer].Country QCOL: [Crack_Me].[dbo].[Customer].Phone
        LogOp_Get TBL: Customer Customer TableID=1205579333 TableReferenceID=0 IsRow: COL: IsBaseRow1000
        AncOp_PrjList
*******************

 
What changes? 
If you use the “*” SQL Server had to replace the “*” (star) with the list of the columns. 
So do not use the “*”



Case 3: Adding a LEFT function

Now we add a left function in the select clause:
       
SELECT Left(FirstName,40) from Customer
OPTION
(
    RECOMPILE,
    QUERYTRACEON 3604,
    QUERYTRACEON 8605
);


The Input tree is this:

*** Converted Tree: ***
    LogOp_Project COL: Expr1002
        LogOp_Get TBL: Customer Customer TableID=1205579333 TableReferenceID=0 IsRow: COL: IsBaseRow1000
        AncOp_PrjList
            AncOp_PrjEl COL: Expr1002
                ScaOp Intrinsic substring
                    ScaOp Const TI(int,ML=4) XVAR(int,Not Owned,Value=40)
                    ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=1)
                    ScaOp_Identifier QCOL: [Crack_Me].[dbo].[Customer].FirstName
*******************


Now we can see:
  • The introduction of the expression (Expr1002)
  • The new logical operators AncOp_PrjList and AncOP_PrjEl
  • The new logical operators ScaOp_Instrinsic ( + a data type)
  • A new logical operator ScaOp_Const
  • A new logical operator ScaOp_Identifier 


Case 4: Adding a WHERE
       
SELECT FirstName from Customer
WHERE FirstName = 'Paolo'
OPTION
(
    RECOMPILE,
    QUERYTRACEON 3604,
    QUERYTRACEON 8605
);


This is the Input tree:

*** Converted Tree: ***
    LogOp_Project QCOL: [Crack_Me].[dbo].[Customer].FirstName
        LogOp_Select
            LogOp_Get TBL: Customer Customer TableID=1205579333 TableReferenceID=0 IsRow: COL: IsBaseRow1000
            ScaOp_Comp x_cmpEq
                ScaOp_Identifier QCOL: [Crack_Me].[dbo].[Customer].FirstName
                ScaOp_Const TI(nvarchar collate 53256,Var,Trim,ML=10) XVAR(nvarchar,Owned,Value=Len,Data = (10,8097111108111))
        AncOp_PrjList
*******************
 

We can see:
  • The introduction of the logical operator LogOp_Select
  • The introduction of the logical operator LogOp_Comp



Case 5: Adding a Group By
       
SELECT FirstName from Customer
GROUP BY FirstName
OPTION
(
    RECOMPILE,
    QUERYTRACEON 3604,
    QUERYTRACEON 8605
);

The input tree is:

*** Converted Tree: ***
    LogOp_Project QCOL: [Crack_Me].[dbo].[Customer].FirstName
        LogOp_GbAgg OUT(QCOL: [Crack_Me].[dbo].[Customer].FirstName,) BY(QCOL: [Crack_Me].[dbo].[Customer].FirstName,)
            LogOp_Project
                LogOp_Get TBL: Customer Customer TableID=1205579333 TableReferenceID=0 IsRow: COL: IsBaseRow1000
                AncOp_PrjList
            AncOp_PrjList
        AncOp_PrjList
*******************

We can see:
  • The introduction of the logical operator LogOp_GbAgg
  • The Group by Aggregate logical operation (GBAGG) implements the GROUP BY operator.


Case 6
: Using the TOP operator
       
SELECT Top 10 FirstName from Customer
OPTION
(
    RECOMPILE,
    QUERYTRACEON 3604,
    QUERYTRACEON 8605
);
       
     
*** Converted Tree: ***
    LogOp_Top NoTies
        LogOp_Project QCOL: [Crack_Me].[dbo].[Customer].FirstName
            LogOp_Get TBL: Customer Customer TableID=1205579333 TableReferenceID=0 IsRow: COL: IsBaseRow1000
            AncOp_PrjList
        ScaOp_Const TI(bigint,Null,ML=8) XVAR(bigint,Not Owned,Value=10)
        ScaOp_Const TI(bigint,ML=8) XVAR(bigint,Not Owned,Value=0)
*******************


We can see a LogOp_Top logical operator in the top of the tree that means the is the last operator processed.



Case 7: The JOIN predicate

SELECT FirstName from [Order] join [Customer] on [Order].CustomerId = [Customer].Id
OPTION
(
    RECOMPILE,
    QUERYTRACEON 3604,
    QUERYTRACEON 8605
);


 *** Converted Tree: ***
    LogOp_Project QCOL: [Crack_Me].[dbo].[Customer].FirstName
        LogOp_Join
            LogOp_Get TBL: Order Order TableID=1237579447 TableReferenceID=0 IsRow: COL: IsBaseRow1000
            LogOp_Get TBL: Customer Customer TableID=1205579333 TableReferenceID=0 IsRow: COL: IsBaseRow1002
            ScaOp_Comp x_cmpEq
                ScaOp_Identifier QCOL: [Crack_Me].[dbo].[Order].CustomerId
                ScaOp_Identifier QCOL: [Crack_Me].[dbo].[Customer].Id
        AncOp_PrjList

  • We see the new logical operator Log_Op_Join
  • Then we have 2 LogOP_Get logical operations.
  • The ScaOp_Comp logical operator realize the Cartesian product.


That's all for today!

I know! All of this information was perhaps uninteresting.
However, they will be needed for the next article in which we will get to the heart of the operations of the operations carried out by the optimizer.
In fact, we will talk about the first optimization step called Simplification.
The Simplification step reads and then manipulates an input tree.
 


So thank you and wait for the next article!

Luca Biondi @ SQLServerPerformance blog!
 


 
 
 
 
 
 
 
 
 
 
 

Comments

I Post più popolari

SQL Server, execution plan and the lazy spool (clearly explained)

SQL Server, datetime vs. datetime2

La clausola NOLOCK. Approfondiamo e facciamo chiarezza!