SQL Server is a compiler! & Where T-SQL tokens are stored?

Hello friends,

First of all i want to wish you an happy new year!

I know 2020 hitted hard for many of us and hitted hard also for me too! But it is necessary to be positive, go on and working hard. In this manner who can stop us? nobody can!

For the first post of this 2021 i would talk about the SQL Server parser exploring how it works.

We have already talked about this topic in the beginning of the 2020 with with a series of posts:

But now it is the time  to go into a deep detail!

Are you ready? go!


SQL Server is a compiler!

Yes, SQL Server is a compiler. The input of the compiler is the T-SQL command you send to the engine. Every batch you send is analized and traduced into a series of more simple instructions.
The heart of the compiler is a parser coded into the CParser class located in the SqlLang.dll.
 
The Parser retrieve the characters from the string in input, break the text into token and add the operation to be executed into a Tree structure called input tree.
Examples of token that parser is trained to find are for example the string "select","insert","update" etc etc.
 
Our parser is based on the yacc and lex model and the functions are named yyparse and yylex.


More in detail yyparse function read the input string e call yylex that return the type of token found.

At the end of this phase the T-SQL command is traduced in a series of instruction stored in the input tree.

As an example we execute the T-SQL command below on the northwind database
 
       
SELECT p.productid, p.ProductName, c.CategoryName
FROM Northwind.dbo.products p, Northwind.dbo.Categories c
WHERE p.categoryid = c.CategoryID AND p.categoryid < 4
       
 
 
The input tree obtained is:

*** Input Tree: ***
  LogOp_Project QCOL: [p].ProductID QCOL: [p].ProductName QCOL: [c].CategoryName [ Card=0 ]
     LogOp_Select [ Card=0 ]
        LogOp_Join [ Card=0 ]
           LogOp_Get TBL: Northwind.dbo.products(alias TBL: p) Northwind.dbo.products TableID=533576939 TableReferenceID=0 IsRow: COL: IsBaseRow1000  [ Card=0 ]
           LogOp_Get TBL: Northwind.dbo.categories(alias TBL: c) Northwind.dbo.categories TableID=309576141 TableReferenceID=0 IsRow: COL: IsBaseRow1001  [ Card=0 ]
           ScaOp_Const TI(bit,ML=1) XVAR(bit,Not Owned,Value=1)
        ScaOp_Logical x_lopAnd
           ScaOp_Comp x_cmpEq
              ScaOp_Identifier QCOL: [p].CategoryID
              ScaOp_Identifier QCOL: [c].CategoryID
           ScaOp_Comp x_cmpLt
              ScaOp_Identifier QCOL: [p].CategoryID
              ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=4)
     AncOp_PrjList 
 
From this moment SQL Server will not use anymore the T-SQL language. 


Where T-SQL token are stored?


What we have already told? A little recap.
 
T-SQL syntax is traduced by a parser that produce an input tree.
Source code of the parser is located in the Sqllang.dll CParser class.
The parser is triained to find the T-SQL syntax using a list of token.
The list of token contain the elements of the T-SQL grammar.

The next question is, the token are located?

Token too are located inside the SqlLang.dll.

Doing some reverse engineering we can take a look to the token list.
In the pics below the can observe 3 token : LEN , COALESCE and NULLIF.



Note that strings are stored in unicode format where 1 character uses 2 byte.
A series of "00" is used to detect the end of the token.

That's all for today and if I've intrigued you, stay tuned for the next post where we will continue to talk about the Parser! I will show you a lot of informations in preview and that you can only find here on Sqlserverperformace.blogspot.it 
I wait for you in the comments for any doubt, question or curiosity.

Luca












Previous post: SQL Server, How to find a text inside a SQL Server trigger or procedures?

Comments

I Post più popolari

SQL Server, execution plan and the lazy spool (clearly explained)

SQL Server, datetime vs. datetime2

La clausola NOLOCK. Approfondiamo e facciamo chiarezza!