Inside the SQL Server Query Optimizer - part 3 Cardinality estimation
After the simplification phase discussed in the last article, today we will speak about another fundamental step called Cardinality Estimation.
During this step the optimizer try to predict the number of rows returned by each node of the execution plan.
As you can imagine this is a step of primary importance because a good prediction generate an accurate execution plan with a lower processing cost to execute.
The statistics
In order to estimate the number of rows returned by a query the Cardinality Estimator use so called statistics.By default for each column and each index created, SQL Server create it's relative statistic.
But what is a statistic?
A statististic for query optimization is a binary large object called BLOBs that contain statistical information about the distribution of values for one o more columns of a table.
We can observe statistics through the T-SQL command:
DBCC SHOW_STATISTICS ( <TABLE> , <FIELD_NAME> )
The execution of this command will show you a grid with a bunch of rows called histogram statistics.
How to read the Histogram above?
For a value in the RANGE_HI_KEY columns (eg."214230003200000") the EQ_ROWS will tell your how much rows there are (eg. 4 rows).
Single Predicate (the simplest case)
Now, we will start with the simplest case.Estimate the number of rows qualified returned by a single query predicate.
This is an easy case for the Optimizer.
Suppose to execute this following simple query:
SELECT * FROM <TABLE> WHERE FIELD_NAME = '214230003200000'
Looking at the execution plan, using the statistics, the optimizer correctly indicate an estimated Number of Rows equal to 4.Multiple predicates
- One of the two sets generated by the (two) predicates, considered individually, overlaps the other.
- The two sets are disjoint.
What is not known is the cardinality of the elements in common to the two sets.
For this reason, it is also easy to understand that having only statistics for single fields it is not possible to calculate an exact cardinality value.
An example with two predicates connected by an AND operator
Given a PERSON table with these fields:
Select CAP, PIVA from Person where (CAP BETWEEN '10000' AND '400000') AND PIVA = '00759680218'
A) When I search the CAP field (CAP BETWEEN '10000' AND '400000')
Using statistics, estimated number of row is equal to 111
B) When I search the PIVA field (PIVA = '00759680218')
Using statistics, estimated number of row is equal to 13
And now? ...what is the number of rows estimated when the two conditions are placed in AND?
So the Optimizer apply this formula:
So, ( 13 * 111 ) / 512 = 2,818
Sql Server 2014 introduce a new cardinality estimator (CE) who assume some correlation between the predicates.
New CE adopt a new formula called exponential backoff algorithm:
In this case we have:
So, 512 * ( 13 / 512) * SQRT(111 / 512) = 6,052
And now another question!
What cardinality estimator produced the better rows number estimation?
OLD CE = 2,8
NEW CE = 6,4
Our Query return ….
…. 13 Rows!
In this case the NEW CE produce a better estimation for this Query but in general it depends:
If you have data with correlating predicates the new CE is likely to give you a better estimate and therefore a better plan.
However, if your predicates have no correlation, you may find the legacy CE to provide a better estimate.
That's all for today!
I hope you liked this article.
Look forward to the next part of this series of articles
Luca Biondi @ SQLServerPerformance blog!
Next post: SQL Server, Again info about statistics and the "Ascending Key Problem"
Previous post: Inside the SQL Server Query Optimizer - part 2 All about the Simplification
Comments
Post a Comment