SQL Server, the bitmap operator clearly explained.
For this last post of March we will talk about an operator that we have seen in the execution plan.
Perhaps it is less known than the others but it is worth knowing.
Today we will talk about the Bitmap operator.
It is used to filter data and to improve performance.
As usual, I will try to be as clear as possible.
Enjoy the reading!
The Bitmap Operator
As mentioned earlier today we will talk about the Bitmap operator.
This operator is born with the aim of improving performance when we are dealing with a lot of data and the execution plan works in parallel.
This operator is born with the aim of improving performance when we are dealing with a lot of data and the execution plan works in parallel.
Let's see with an example how to generate an execution plan that contains the Bitmap operator.
For the example we generate a couple of heap tables:
Create table TabA (id int identity(1,1), Progr int)
Create table TabB (id int identity(1,1), Progr int)
Both tables have the same structure: a field ID of type integer and a field Progr of type integer.We will join TabA and TabB through the Prog field.
We populate the tables with enough data so that the cost of the query is sufficient to run it in parallel.
Insert into TabA (Progr) values (0)
Insert into TabA (Progr) Select MAX(Progr)+1 from TabA
go 500
Insert into TabA (Progr) Select 501 from TabA
Insert into TabB (Progr) Select Progr from TabA
go 10000
Insert into TabB (Progr) Select 501 from TabA
Table TabA has 1002 records while Table TabB has about 10 million of record.We have the data, we need a query!
Now run these two queries:
Select * from TabA a
join TabB b on a.Progr = b.Progr
where
b.Progr >= 500
Select * from TabA a
join TabB b on a.Progr = b.Progr
where
b.Progr >= 501
Two queries has two different execution plan.
Just change a value to switch from one execution plan to another
What can you notice?
You can notice that the second query execution plan has the operator we were looking for: the Bitmap operator!
Keep in mind that this operator is used only if the Optimizer thinks that the a bitmap will be selective enough to be useful.
What does it mean?
A bitmap operator is a way to perform a "semijoin reduction" operation (a process that excludes rows from the subsequent join operation)
The gain from joining fewer rows must be greater than the cost of running the bitmap operator
Now let's talk about what it is and how it works.
How the Bitmap operator works
A Bitmap is simply an array of bits where its size is equal to the max value of the Progr field.
From the following picture we can see that the Bitmap is created from the build side table that is the our TabA table.Ho the bitmap is created?
For each row of the TabA the value of the columns Progr is first
hashed, then the value returned is used to set the bit corresponding index of the arrary.
(note that the Progr is the column used for the join)
For example:
For example:
- The Progr field of the first row of the table is equal to the value 1.
- Hashing the value 1 we obtain the value 37
- I set (to 1) the 37th location of the bitmap array
Now we understand how and where the bitmap is created.
Now let's see how and where the bitmap is read.
We can see that the bitmap is read at the beginning when the TabB is scanned.
The bitmap is checked from the probe side.
Each row of the Table TabB is hashed and
checked whether the bit is set or not in Bitmap.
If the bit is not
set, then we know that the row will not qualify in the join
(since there is no match from the other table) and we can discard the row.
Doing it this way we can quickly eliminating rows without execute full join algorithm
Furthermore, from the popup we can clearly read the string "IN ROW" that means that we are using the InRow optimization. In this case the bitmap is used to eliminate the rows even earlier of the Query execution.
From the picture below without the bitmap operator we will read 20756 rows from the table TabB while with the bitmap operator we will read only 2004 rows!
This operation is called semijoin reduction and the Bitmap operator try to execute this operation as early as possible and before the parallelism operator.
All this to speed up the execution of the query!
That' all for today!
Stay tuned mates!!!
Help me to share knowledge on my blog
Previous post: SQL server, more on the Ghost cleanup task
Hi Luca. Thank you for the interesting article.
ReplyDeleteYou wrote:
"Table TabA has 1002 records while Table TabB has about 10 million of record.".
Do like most users will likely do. Run the entire block of code that populates TabA and TabB and see that there's a pretty major issue there. You need to add more GOs or the code will genarate a whole lot more rows than you think.