SQL Server, SQL Server, fast data reading with table partitioning! ...Vamonos!

Hi Guys,

Welcome back for this midweek post!

Performance has always been our favorite topic and then even today we see how to read the data faster using the table partitioning!

Are you ready? ...as my Mexican friends would say ... vamonos!

What is the table partitioning?

Table partitioning is a data organization scheme where the table data is divided into multiple storage objects called partitions.

When a table becomes large it might be useful to partition the data and this is done following a rule.

For example, if we had a huge list of people we could think of creating a partition for each initial letter of the name. The first partition would contain all the people whose name starts with the letter A, the second partition would contain all the people who start with the letter B, and so on.

The nice and interesting aspect is that when we search the table for a person starting with the letter A we will only read the data from the first partition. All other partitions would not be accessed.

This process by which the query optimizer accesses only the relevant partitions to satisfy the query filter criteria is called Partition elimination.

Let's see it in more detail with our usual example!

Divide our table into many partitions

Suppose we have a huge list of personal data.

We want to divide our table by making a partition for each initial letter.

After creating a new database called partitioning, we will create a number of filegroups (for our example we will only do 5):


CREATE DATABASE Partitioning

ALTER DATABASE Partitioning
ADD FILEGROUP Classifier_A

ALTER DATABASE Partitioning
ADD FILEGROUP Classifier_B

ALTER DATABASE Partitioning
ADD FILEGROUP Classifier_C

ALTER DATABASE Partitioning
ADD FILEGROUP Classifier_D

ALTER DATABASE Partitioning
ADD FILEGROUP Classifier_E

Then we will create the new partitions.

Remember: one partition for each filegroup


ALTER DATABASE Partitioning
    ADD FILE 
    (
    NAME = [Names_A],
    FILENAME = 'E:\Names_A.ndf',
        SIZE = 3072 KB, 
        MAXSIZE = UNLIMITED, 
        FILEGROWTH = 1024 KB
    ) TO FILEGROUP Classifier_A

ALTER DATABASE Partitioning
    ADD FILE 
    (
    NAME = [Names_B],
    FILENAME = 'E:\Names_B.ndf',
        SIZE = 3072 KB, 
        MAXSIZE = UNLIMITED, 
        FILEGROWTH = 1024 KB
    ) TO FILEGROUP Classifier_B

ALTER DATABASE Partitioning
    ADD FILE 
    (
    NAME = [Names_C],
    FILENAME = 'E:\Names_C.ndf',
        SIZE = 3072 KB, 
        MAXSIZE = UNLIMITED, 
        FILEGROWTH = 1024 KB
    ) TO FILEGROUP Classifier_C

ALTER DATABASE Partitioning
    ADD FILE 
    (
    NAME = [Names_D],
    FILENAME = 'E:\Names_D.ndf',
        SIZE = 3072 KB, 
        MAXSIZE = UNLIMITED, 
        FILEGROWTH = 1024 KB
    ) TO FILEGROUP Classifier_D


ALTER DATABASE Partitioning
    ADD FILE 
    (
    NAME = [Names_E],
    FILENAME = 'E:\Names_E.ndf',
        SIZE = 3072 KB, 
        MAXSIZE = UNLIMITED, 
        FILEGROWTH = 1024 KB
    ) TO FILEGROUP Classifier_E

The partition Names_A is associated to the filegroup Classifier_A.

The partition Names_B is associated to the filegroup Classifier_B.

And so on..

Now it is the time to add the Partition function.

What is the partition function?

The partition function is the function that defines how the data is divided.


CREATE PARTITION FUNCTION [PartitioningByClassifiers] (varchar(1))
AS RANGE RIGHT FOR VALUES ('B', 'C', 'D', 'E');

Our PartitionByClassifier partition function set 5 subset of data:

All letters before the letter B. (so the letter A)
All letters from the letter B and before the letter C. (so the letter B)
All letters from the letter C and before the letter D. (so the letter C)
All letters from the letter D and before the letter E. (so the letter D)
All letters after the letter B. (so the letter E)

Finally we need a partition scheme.


CREATE PARTITION SCHEME [Partitioninscheme]
AS PARTITION [PartitioningByClassifiers]
TO (Classifier_A, Classifier_B, Classifier_C, Classifier_D, Classifier_E);

Our partitioninschede partition scheme map each value of the partition function to a filegroup.

For example: All letters before the letter B (so the letter A) is associated to the filegroup Classifiers_A.

Finally we create the table:


CREATE TABLE Names
(FirstLetter Varchar(1),
 Name varchar(40))
ON [Partitioninscheme] (FirstLetter);

What does the part highlighted in yellow tell us?

The our table Names will use the partitionscheme partitionscheme.

The field that will be evaluated to divide the data will be the FirstLetter field.

We are now ready to insert the data into our table.
And it's our test time too!

The question is:
Will it be faster to read data from the Names table or from a table with the same structure but not partitioned?

The time of the test: Partitioned Vs. Not partitioned

We have a nice table called names ready to do some testing!

What we will do?

Let's create a new table with the same structure as the name table but not partitioned.
Then we fill in both tables and see from which of the two tables we read the data faster!

...you are curious right?

Let's fill the names table with 10 million rows using the loop below:


declare @i integer;
set @i = 0;
while @i < 10000000  -- 10M
begin 
  exec('Insert into Names (FirstLetter,Name) 
        Values (''A'',''Anna '+@i+'''), 
               (''B'',''Berto '+@i+'''), 
	     (''C'',''Claudio '+@i+'''), 
	     (''D'',''Diana '+@i+'''), 
	     (''E'',''Ettore '+@i+''')')
  set @i = @i + 1
end

The data entered will look like this:

Now let's create a table with the same structure but not partitioned:


CREATE TABLE Names_nopart
(FirstLetter Varchar(1),
 Name varchar(40))

..and we fill itwith this command:


INSERT INTO Names_nopart SELECT * FROM names

Last thing..
We add the clustered index on the Firstletter field in both tables.


CREATE CLUSTERED INDEX idx_Names_firstletter ON Names(firstletter)
CREATE CLUSTERED INDEX idx_Names_nopart_firstletter ON Names_nopart(firstletter)

Now we type these two command and measure the execution time:


SELECT * FROM Names WHERE FirstLetter in ('B','B') AND name = 'Berto 1'
SELECT * FROM Names_nopart WHERE FirstLetter in ('B','B') AND name = 'Berto 1'

...and here are the results.


 SQL Server Execution Times:
   CPU time = 31 ms,  elapsed time = 120 ms.

(1024 righe interessate)

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 186 ms.

However, it is not certain that we have such obvious advantages: Partitioning was not born to speed up reading operations but instead for reduce contentions.

But how do we see if we actually only read the correct partitions?

Selecting a clustered index and pressing the F4 key you will see two new property.

The property Partioned will be set to true.
The property number of partitions accessed, in our example will be equal to 2.

Today I will not show you anything else because I just wanted to introduce this topic that I had never talked about on this blog.

News reports ..

Just a couple of news

CU 14 for SQL Server 2019 has been published. Download it directly from the microsoft site

General availability GA release of the new SQL Server 2022 by the end of the year.

Started an affiliation with the popular virtualization software VMWARE which we will use often shortly. I suggest and recommend clicking here to watch to all the offers

That's all for today mates,

Stay tuned and have a great thanksgiving day!

Luca

Previous post: SQL Server, other ideas for a fast insert ...multivalues!

Search This Blog

SQL Server Performance Blog where milliseconds matter