Speed up your Inserts and Updates ...after learning how the Write Ahead Logging mechanism works

Hi Friends, 


Thanks for the many views on this blog, i am really happy!

Today i will show you a way to speed up your inserts and updates

This require an explanation (and i hope to do it clearly) of how writing data to disk works. We will talk about Durability, Write ahead logging mechanism and the Log Buffer.

As always the one and only way to make SQL Server run faster is to ... know in detail how it works!

Enjoy the reading!


Intro: ACID properties and the durability

Let's take a step back hanging up to a post written some time ago.

I would invite you infact to read a post i wrote in 2019, the title is SQL Server: Transazioni, Lock e Deadlock. Un po di teoria spiegata in modo semplice! 

We talked about transactions, locks and ACID properties (Atomicity, Consistency, Isolation, e Durability) of a DBMS.

Today we need to remember what durability means.

So, what durability means?

Durability means that once a transaction has been confirmed changes made must never be lost.

The WRITE AHEAD LOGGING mechanism

Another question: how SQL server ensure the durability?

Someone is led to believe that SQL Server satisfies the durability property by immediately writing data to disk, but in reality it is not!

The mechanism adopted is in fact another and is called WRITE AHEAD LOGGING or WAL.

In order to avoid data losses due to a malfunction (between the moment in which SQL Server "commits" to write the data and the moment in which this data is actually written)  this mechanism mantain a Log file in which all operations that are performed on the database are recorded.

This essential log file is called transaction log.

Let's say that it is essential because in the event of a malfunction it will always be possible to recover the data by re-reading the transaction log.

However, if the transaction log is damaged, information retrieval will not be possible.

In order to improve the performance the WAL mechanism introduce a log buffer in memory to speed up the entire process.

Log writes are first done to the “log buffer”, later, under certain conditions SQL Server flush (or harden) the log buffer contents to the transaction log on the disk.

Hardened blocks can be of various size: from one sector (512 bytes) to a maximum of 60 KB (the
size of the buffer log)  

Now, what conditions cause the log buffer to be flushed on disk?  

  • The user send a commit request (or execute a single T-SQL command: in this case we have an implicit transaction)
  • The Log buffer reach 60 KB and become full
  • A checkpoint process is execute
  • A user execute the procedure sys.sp_flush_log

Note that the first point tell us an important things: each commit cause a flush of the log buffer to the transaction log even if the log buffer is not full. 

A Last thing: there is a well known datatype that measure time that SQL Server wait to complete a log buffer flush, this is the WRITELOG wait type.

 

Now, speed up your inserts and updates!

 
Imagine you have a table "CLASS" with a column ID int identity(1,1), a code and a descr varchar type fields. Your goal is to insert 100.000 rows.

Doing our tests we will measure the duration of the operation and another parameter that return the number of log flushes (*)

(*) You can get though this value with this query:

SELECT cntr_value FROM sys.dm_os_performance_counters 
WHERE counter_name = 'Log Flushes/sec' AND instance_name = N'<nome del database>'

 

This is the procedure to execute:


DECLARE @i AS INT = 1;

WHILE @i <= 100000

BEGIN

  INSERT INTO dbo.CLASS(CODE,DESCR) VALUES(cast(@i as varchar(20)),'');

  SET @i += 1;

END

 

Since the transaction is implicit, our query is equal to:

 

DECLARE @i AS INT = 1; 

WHILE @i <= 100000

BEGIN

  BEGIN TRAN

  INSERT INTO dbo.CLASS(CODE,DESCR) VALUES(cast(@i as varchar(20)),'');

  COMMIT TRAN;

  SET @i += 1;

END

 

Results: 20 seconds and 101406 flushes of the Log

Surely more than 100k flushes of a nearly empty Buffer log is not efficent so we now try to avoid to execute a commit for each insert.

Execute this commands:

DECLARE @i AS INT = 1; 

BEGIN TRAN

WHILE @i <= 100000

BEGIN 

  INSERT INTO dbo.CLASS(CODE,DESCR) VALUES(cast(@i as varchar(20)),''); 

  SET @i += 1;

END

COMMIT TRAN;

 

Results?  Bingo! Only 6 seconds and only 612 flushes of the log

Rememeber: repeated insert of (little) quantity of data probably are faster if enclosed inside a transaction.

But that's not all! 

 

Do you want to go even faster?

Oh yes ...i am hearing!!

Let me say that SQL Server has a system databases dedicated to the temporary data. This database is the TempDB.

We talked about this database at least other three times:

 

Today instead we say about the TempDB that it has two features which in our case may be of interest to us

1) A commit does not trigger a log buffer flush (wow)

2) The amount of information logged inside the TempDB is lower than others user database (more detail will follow in future if this could interest to you)


Good, i feel you are ready for another test.

So, create the CLASS table on the TempDB database:

CREATE TABLE dbo.classi(id int identity(1,1), codice varchar(8), descr varchar(40));
 

Then run again this batch:

DECLARE @i AS INT = 1;

WHILE @i <= 100000

BEGIN

  INSERT INTO TempDB.dbo.CLASS(CODE,DESCR) VALUES(cast(@i as varchar(20)),'');

  SET @i += 1;

END

 

Results: Again bingo! Only 3 seconds and only 475 flushes of the log

Finally the latest test, moving the transaction outsite the while loop:

 

DECLARE @i AS INT = 1;
 
BEGIN TRAN 

WHILE @i <= 100000

BEGIN

  INSERT INTO TempDB.dbo.CLASS(CODE,DESCR) VALUES(cast(@i as varchar(20)),'');

  SET @i += 1;

END

COMMIT

 

Results: Wow! Only 1 seconds and only 123 flushes of the log



That's all for day guys! i wish you a great week end!

If you liked this post leave a comment, subscribe to the blog and wait for the next post!

Luca












Previous post:https://sqlserverperformace.blogspot.com/2021/02/sql-server-inside-parser-getgenlex.html

 

 

 

Comments

I Post più popolari

SQL Server, datetime vs. datetime2

SQL Server, execution plan and the lazy spool (clearly explained)

La clausola NOLOCK. Approfondiamo e facciamo chiarezza!