SQL Server, SQL Server, fast data reading with table partitioning! ...Vamonos!
Hi Guys,
Welcome back for this midweek post!
Performance has always been our favorite topic and then even today we see how to read the data faster using the table partitioning!
Are you ready? ...as my Mexican friends would say ... vamonos!
What is the table partitioning?
Table partitioning is a data organization scheme where the table data is divided into multiple storage objects called partitions.
When a table becomes large it might be useful to partition the data and this is done following a rule.
For example, if we had a huge list of people we could think of creating a partition for each initial letter of the name. The first partition would contain all the people whose name starts with the letter A, the second partition would contain all the people who start with the letter B, and so on.
The nice and interesting aspect is that when we search the table for a person starting with the letter A we will only read the data from the first partition. All other partitions would not be accessed.
This process by which the query optimizer accesses only the relevant partitions to satisfy the query filter criteria is called Partition elimination.
Let's see it in more detail with our usual example!
Divide our table into many partitions
CREATE DATABASE Partitioning
ALTER DATABASE Partitioning
ADD FILEGROUP Classifier_A
ALTER DATABASE Partitioning
ADD FILEGROUP Classifier_B
ALTER DATABASE Partitioning
ADD FILEGROUP Classifier_C
ALTER DATABASE Partitioning
ADD FILEGROUP Classifier_D
ALTER DATABASE Partitioning
ADD FILEGROUP Classifier_E
ALTER DATABASE Partitioning ADD FILE ( NAME = [Names_A], FILENAME = 'E:\Names_A.ndf', SIZE = 3072 KB, MAXSIZE = UNLIMITED, FILEGROWTH = 1024 KB ) TO FILEGROUP Classifier_A ALTER DATABASE Partitioning ADD FILE ( NAME = [Names_B], FILENAME = 'E:\Names_B.ndf', SIZE = 3072 KB, MAXSIZE = UNLIMITED, FILEGROWTH = 1024 KB ) TO FILEGROUP Classifier_B ALTER DATABASE Partitioning ADD FILE ( NAME = [Names_C], FILENAME = 'E:\Names_C.ndf', SIZE = 3072 KB, MAXSIZE = UNLIMITED, FILEGROWTH = 1024 KB ) TO FILEGROUP Classifier_C ALTER DATABASE Partitioning ADD FILE ( NAME = [Names_D], FILENAME = 'E:\Names_D.ndf', SIZE = 3072 KB, MAXSIZE = UNLIMITED, FILEGROWTH = 1024 KB ) TO FILEGROUP Classifier_D ALTER DATABASE Partitioning ADD FILE ( NAME = [Names_E], FILENAME = 'E:\Names_E.ndf', SIZE = 3072 KB, MAXSIZE = UNLIMITED, FILEGROWTH = 1024 KB ) TO FILEGROUP Classifier_E
CREATE PARTITION FUNCTION [PartitioningByClassifiers] (varchar(1))
AS RANGE RIGHT FOR VALUES ('B', 'C', 'D', 'E');
- All letters before the letter B. (so the letter A)
- All letters from the letter B and before the letter C. (so the letter B)
- All letters from the letter C and before the letter D. (so the letter C)
- All letters from the letter D and before the letter E. (so the letter D)
- All letters after the letter B. (so the letter E)
CREATE PARTITION SCHEME [Partitioninscheme]
AS PARTITION [PartitioningByClassifiers]
TO (Classifier_A, Classifier_B, Classifier_C, Classifier_D, Classifier_E);
Our partitioninschede partition scheme map each value of the partition function to a filegroup.
CREATE TABLE Names
(FirstLetter Varchar(1),
Name varchar(40))
ON [Partitioninscheme] (FirstLetter);
What does the part highlighted in yellow tell us?We are now ready to insert the data into our table.
And it's our test time too!
Will it be faster to read data from the Names table or from a table with the same structure but not partitioned?
The time of the test: Partitioned Vs. Not partitioned
What we will do?
Then we fill in both tables and see from which of the two tables we read the data faster!
declare @i integer;
set @i = 0;
while @i < 10000000 -- 10M
begin
exec('Insert into Names (FirstLetter,Name)
Values (''A'',''Anna '+@i+'''),
(''B'',''Berto '+@i+'''),
(''C'',''Claudio '+@i+'''),
(''D'',''Diana '+@i+'''),
(''E'',''Ettore '+@i+''')')
set @i = @i + 1
end
CREATE TABLE Names_nopart
(FirstLetter Varchar(1),
Name varchar(40))
..and we fill itwith this command:
INSERT INTO Names_nopart SELECT * FROM names
Last thing..We add the clustered index on the Firstletter field in both tables.
CREATE CLUSTERED INDEX idx_Names_firstletter ON Names(firstletter)
CREATE CLUSTERED INDEX idx_Names_nopart_firstletter ON Names_nopart(firstletter)
Now we type these two command and measure the execution time:
SELECT * FROM Names WHERE FirstLetter in ('B','B') AND name = 'Berto 1'
SELECT * FROM Names_nopart WHERE FirstLetter in ('B','B') AND name = 'Berto 1'
...and here are the results.
SQL Server Execution Times:
CPU time = 31 ms, elapsed time = 120 ms.
(1024 righe interessate)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 186 ms.
.
- The property Partioned will be set to true.
- The property number of partitions accessed, in our example will be equal to 2.
News reports ..
- CU 14 for SQL Server 2019 has been published. Download it directly from the microsoft site
- General availability GA release of the new SQL Server 2022 by the end of the year.
- Started an affiliation with the popular virtualization software VMWARE which we will use often shortly. I suggest and recommend clicking here to watch to all the offers
Comments
Post a Comment