SQL Server. Statistics, cardinality estimation and some thoughts about my previous post
Hi Guys,
I would say a big thanks to Jeff Moden who have commented my previous post SQL Server, statistics, performance and the Ascending key problem. He made me think more deeply about the statistics and so i decided to do this second post just to clarify and add some thoughts.
Enjoy the reading!
Statistic, cardinality and indexes
From the point of view of returning a correct cardinality estimate value (and therefore having a precise execution plan), reading my previous post in the example paragraph, we referred to the existence of a clustered index on the table.
It should be noted that it does not matter whether there is a clustered index or a not clustered one.
I will demonstrate this with an example.
Example
Suppose we create a heap table (therefore without any index)
Suppone our table will have an id field and a value field.
Now let's fill our table with 100 rows by putting a value from 1 to 100 in both the id and value fields.
Create Table Example (id int, [value] int)
Insert into Example (id, value)
Select top 100 ROW_NUMBER() over (order by id), ROW_NUMBER() over (order by id) from ordrig
then execute this simple command:
Select Value From Example where Value < 50
What happens?
However without an index on the value field i will read all the rows within the table.
So, for example we will add a non clustered index on the value field:
Create Index IDX_Example_Value on Example(Value)
This time we will add 5 rows with a value equal to 101
Insert into Example (id, value) Select top 5 ROW_NUMBER() over (order by id), 101 from ordrig
Statistics will not be updated.
Type now:
Select Value From Example where Value = 101
You can see that the extimated numeber or rows to be read is 1 (and not 5)
Exactly as in the case of a clustered index.
To summarize
Cardinality estimation suffers from this problem: if we insert values in the table that are outside the range of values included in the statistics, the returned cardinality will be equal to 1 until the statistics are updated. This happens with both clustered and non-clustered indexes.
The Next post..
Jeff Moden, whom I thank again, very correctly also remembered that:
[...I'll
also tell you that such an "ever-increasing" insert in the middle of an
index can cause a similar cardinality issue. It's a bit less likely
than the end-of-index insert, but it's still possible and for all the
same reasons that you identify in your good article.
The end of
index insert pattern also causes a hot-spot that can produce serious
contention during inserts. Believe it or not, an awesome fix for that
problem and the stats issue is go use a Random GUID as the Clustered
Index even if it's not the PK..]
But we will talk about this in the next post..
Next post: SQL Server, How to choose a clustered key value and the Last Page Insert Latch Contention problem
Previous post: SQL Server, statistics, performance and the Ascending key problem
Awesome update, Luca! I can't wait to see what you have to say in the upcoming article that you spoke of.
ReplyDeleteAnd thank you kindly for the honorable mention.