SQL Server, How to choose a clustered key value and the Last Page Insert Latch Contention problem

Hi Guys,


Today you are the data architect! your first task is to design a table from scratch.
But...how do you choose the clustered key value?

It sounds easy but rember that your decisions will have a lot of consequences for the long life of the project ... and you can't go wrong!

 

How to choose a clustered key value

Today let's review some theory ..

A good Clustered key must be narrow. It is not only a question of data occupation (today storage devices are spacious) but a lot of data takes time to be read.

We always consider that the value used will be used not only in the clustered key. It will in fact be used in any non-clustered index. 

So for example fields Id and page_id, category_id must have the same data type.

SQL Server relations between the tables

Normally you should use a surrogate key value (like an ID identity, a progressive numerber) instead of a natural key (whose values ​​are formed by attributes that exist and are used in the external world, for example surname+name).

A surrogate key value cannot change and this fact is very important for the next gold rule Which says that the clustered Key value must be static.

The clustered Key value should not change as its value is replicated in every non clustered index

Finally you should choose an Ever increasing Clustered key value.

The reason depends on how SQL Server stores the information.

Sql Server Clustered index B+ tree structure

 

Using an ever increasing clustered key value means that new records will be always be stored at the end of our clustered index (within the last actual page)
This avoids page split which are expensive and avoids index fragmentation
.

Here, however, there is a danger problem that occurs when many users enter data in our table at the same time. In jargon it is said that there is a problem of scalability or that a procedure does not scale.

 

The Last Page Insert Latch Contention problem

Why is there a problem when many users enter data into our table at the same time?

I will try to explain it in the simplest way possible.

In order to manage data concurrency SQL Server engine need to guarantee consistency of its in-memory structures like index, data pages and internal structures, such as non-leaf pages in a B-Tree.

Latches are synchronization primitives that are used to do this job!

There are actually different types of latches: 

  • Buffer (pool) latches are used to protect pages in the buffer pool.
  • I/O Latches are used to protect pages not yet loaded in the buffer pool.

In our case we are talking about buffer latches.

Let's pay attention to this logic:

  • When you need to read a page, a worker thread must acquire a shared latch (PAGELATCH_SH)
  • When you need to write a page, a worker thread must acquire an exclusive Latch (PAGELATCH_EX). 

Now what happens when many users execute an insert statemement? 

We are trying to insert data on the same page so there are many worker thread that are contending about an exclusive latch on the same last page in the clustered index.

This is the Last Page Insert Latch Contention problem.


How to resolve?

Follow me in the next posts to see how to reduce or eliminate this problem!
 
There are various possibilities that today I will list only but that in the next posts we will see in detail with their PRO and CONS.
  • Adopt a random clustered key value
  • Reverse index technique
  • Enable OPTIMIZE_FOR_SEQUENTIAL_KEY option in sql 2019
  • In memory technique
  • Partitioning technique
note: random Clustered Key for your Clustered Index, because then you are distributing the inserted records across multiple different places in the Clustered Index. But a random Clustered Key also introduces so-called hard Page Splits, because SQL Server has to allocate a new data page somewhere within the leaf level of the Clustered Index. Hard page splits will also have a negative impact on the performance of your transaction log, because logging a hard page split is a lot more work than logging a “normal” INSERT (a soft page split) at the end of your Clustered Index

That's all for today! thanks and see you next time,
Luca

Comments

I Post più popolari

SQL Server, datetime vs. datetime2

SQL Server, execution plan and the lazy spool (clearly explained)

La clausola NOLOCK. Approfondiamo e facciamo chiarezza!