SQL Server 2019 and the Approx_Count_Distinct function

Hi Guys,

Today we continue with the speech of the new features of SQL 2019, if last time we talked about Batch mode on Rowstore, today we talk about another new feature: the function
Approx_Count_Distinct
.

Enjoy the reading, mate!


The Approx_Count_Distinct function 

Let's suppose for simplicity we take the same "movements" table used in the last post and already filled with 3 million rows.

       
CREATE TABLE Moviments
(ID INT IDENTITY(1,1),
 YEAR FLOAT,
 QTY FLOAT,
 PRICE FLOAT,
 CLASSIFICATORS VARCHAR(10)) 
     

Now suppose you want to know the number of items purchased for each classificators.

Simple, you will say! just write this command:


SELECT COUNT(DISTINCT CLASSIFICATORS) FROM MOVIMENTS

True!

However, it will not surprise you to know that this operation, which apparently seems to be so simple, instead requires many resources (and to read through all the rows of our table):

 

       
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'moviments'. Scan count 1, logical reads 14085, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

(1 row affected)

 SQL Server Execution Times:
   CPU time = 1484 ms,  elapsed time = 1490 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

Completion time: 2021-10-11T19:11:30.4171839+02:00
      

 

In some cases it would be preferable to have an approximate result but in a shorter time!

For this reason Microsoft starting from SQL Server 2019 has introduced a new function called APPROX_COUNT_DISTINCT.

The goal of this function is to return the distinct count of values ​​"with some approximation".
Therefore, if we do not need to know the exact value of the distinct values ​​but we are satisfied with a small approximation (a few percentage points), then using this function we will have a shorter execution time.

But it's true? Let's try it!

For the test type the following command:


SELECT APPROX_COUNT_DISTINCT(CLASSIFICATORS) FROM MOVIMENTS

and execute it.


Table 'moviments'. Scan count 1, logical reads 14085, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

(1 row affected)

 SQL Server Execution Times:
   CPU time = 766 ms,  elapsed time = 770 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

Completion time: 2021-10-11T19:21:12.5513347+02:00


Wow! The Approx_Count_Distinct function works! 770 ms instead of 1490 ms!

Good! this is also another reason to switch to SQL server 2019

 

 
That's all for today! 
Stay tuned for the next post and follow me from linkedin if you want to stay updated.
Luca

Luca Biondi @ SQLServerPerformance blog!

 
 
 
 
 
 
Next post:
Previous post: https://sqlserverperformace.blogspot.com/2021/10/sql-server-2019-and-batch-mode-on.html

Comments

I Post più popolari

SQL Server, datetime vs. datetime2

SQL Server, execution plan and the lazy spool (clearly explained)

La clausola NOLOCK. Approfondiamo e facciamo chiarezza!