SQL Server. Statistics, cardinality estimation and some thoughts about my previous post

Hi Guys,

I would say a big thanks to Jeff Moden who have commented my previous post SQL Server, statistics, performance and the Ascending key problem. He made me think more deeply about the statistics and so i decided to do this second post just to clarify and add some thoughts.

Enjoy the reading!

Statistic, cardinality and indexes

From the point of view of returning a correct cardinality estimate value (and therefore having a precise execution plan), reading my previous post in the example paragraph, we referred to the existence of a clustered index on the table.

It should be noted that it does not matter whether there is a clustered index or a not clustered one.

I will demonstrate this with an example.

Example

Suppose we create a heap table (therefore without any index)
Suppone our table will have an id field and a value field.

Now let's fill our table with 100 rows by putting a value from 1 to 100 in both the id and value fields.


Create Table Example (id int, [value] int)

Insert into Example (id, value)
Select top 100 ROW_NUMBER() over (order by id), ROW_NUMBER() over (order by id) from ordrig

then execute this simple command:


Select Value From Example where Value < 50

What happens?

If the AUTO_CREATE_STATISTICS parameter is set to ON, executing this first select, statistics are created.

However without an index on the value field i will read all the rows within the table.

So, for example we will add a non clustered index on the value field:

       
Create Index IDX_Example_Value on Example(Value)

This time we will add 5 rows with a value equal to 101


Insert into Example (id, value)
Select top 5 ROW_NUMBER() over (order by id), 101 from ordrig

Statistics will not be updated.

Type now:


Select Value From Example where Value = 101

You can see that the extimated numeber or rows to be read is 1 (and not 5)

Exactly as in the case of a clustered index.

To summarize

Cardinality estimation suffers from this problem: if we insert values in the table that are outside the range of values included in the statistics, the returned cardinality will be equal to 1 until the statistics are updated. This happens with both clustered and non-clustered indexes.

The Next post..

Jeff Moden, whom I thank again, very correctly also remembered that:

[...I'll also tell you that such an "ever-increasing" insert in the middle of an index can cause a similar cardinality issue. It's a bit less likely than the end-of-index insert, but it's still possible and for all the same reasons that you identify in your good article.

The end of index insert pattern also causes a hot-spot that can produce serious contention during inserts. Believe it or not, an awesome fix for that problem and the stats issue is go use a Random GUID as the Clustered Index even if it's not the PK..]

But we will talk about this in the next post..

I hope the article is useful to you, so stay tuned!

Thank you,

Luca

Next post: SQL Server, How to choose a clustered key value and the Last Page Insert Latch Contention problem

Previous post: SQL Server, statistics, performance and the Ascending key problem

Search This Blog

SQL Server Performance Blog where milliseconds matter