SQL Server, inside the parser: the Get_Gen_Lex procedure

Hello friends,

Well, we have arrived on Mars! The Perseverance probe designed by NASA reached the Red Planet after 203 days!

"The human being by his nature always wants to overcome his limits and go beyond. If curiosity is the great engine that makes humanity progress, science is the engine.
Today is a big day for science!"

To this and to those who do not want to have limits I dedicate this post.

 

Yes, again about the Parser of SQL Server

 
Today we talk about another procedure in sqllang.dll: the Get_Gen_Lex procedure.
This is the second part of a series of posts about the parser, and you can find the first part here

Enjoy the reading!

The CParser::Get_Gen_Lex procedure

This procedure, part of the CParser class, is usually called after the GetChar procedure and it is used to classify the character just read.

In many parts of the LGetToken routine for example we have:


Procedure and parameters

Looking at the source below

 

we can say that the declaration is:

int CParser::Get_Gen_Lex(CCompatLevel param_1,ulong param_2)

The procedure has two parameters:

a CCompatLevel value (passed in CL) and param_2 (passed in EDX). It returns an int32 value in the EAX register.

This procedure returns the type of the character read.


So, let's read together the main part of the Get_Gen_Lex routine

1) As in most routines compiled from C or C++ (the languages SQL Server is written in), the function starts with the instruction that saves the rbx register: "push rbx".
Then you will find the instruction "sub rsp,20h", which allocates 0x20 bytes of stack space.
At the end of the function we find the "inverse" instructions: "add rsp,20h", "pop rbx" and "ret".
I highlight these areas in yellow.

00007fff`63b52ff0 53              push    rbx

00007fff`63b52ff1 4883ec20        sub     rsp,20h


2) The EDX register holds the second parameter of the procedure and contains the value read from the input string.

In the previous post we read the character 'S' (53h) with the GetChar procedure, and now this value is passed as param_2.

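The procedure first checks whether the value read is less than 80h. If it is not, the jae instruction jumps to the exception path described later.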

00007fff`63b52ff5 8bda mov ebx,edx  

00007fff`63b52ff7 81fa80000000 cmp edx,80h

00007fff`63b52ffd 731a            jae     sqllang!CParser::Get_Gen_Lex+0x2f (00007fff`63b53019)       


If the value read is not equal to 5Ch ("\")

00007fff`63b52fff 83fb5c cmp ebx,5Ch

00007fff`63b53002 0f840a42af00 je sqllang!CParser::Get_Gen_Lex+0x15 (00007fff`64647212)

Then the procedure reads the type of the character from the sqllang!CharTab memory area and stores it in the EAX register. The rbx register contains the value read, and it is multiplied by 4 because each entry in the table is 4 bytes wide.

00007fff`63b53008 488d0d31954402 lea rcx,[sqllang!CharTab (00007fff`65f9c540)]

00007fff`63b5300f 0fb60499 movzx eax,byte ptr [rcx+rbx*4]               


Finally the original rsp and rbx values are restored (since we are exiting from the procedure) and the value in eax is returned.                  

00007fff`63b53013 4883c420        add     rsp,20h

00007fff`63b53017 5b              pop     rbx

00007fff`63b53018 c3              ret

 

This is the normal behaviour of the procedure! 

But what does the sqllang!CharTab memory area contain?

 

The sqllang!CharTab memory area

This memory area is 80h x 4 = 200h (512) bytes wide and, for each character, contains a property called "type of character", here highlighted in orange.
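To make the layout concrete, here is a minimal Python sketch of a table with 80h entries of 4 bytes each, read the way "movzx eax,byte ptr [rcx+rbx*4]" reads it. The buffer `char_tab` and the helpers `set_type` and `get_type` are hypothetical names of mine, and the two entries I fill in are only there for illustration: the real contents live in sqllang.dll.

```python
ENTRY_SIZE = 4      # stride implied by the rbx*4 scaling in the movzx
ENTRY_COUNT = 0x80  # one entry for each value below 80h

# Hypothetical stand-in for sqllang!CharTab, initially all zeros.
char_tab = bytearray(ENTRY_COUNT * ENTRY_SIZE)


def set_type(code, char_type):
    """Store a type in the first byte of the entry for `code`."""
    char_tab[code * ENTRY_SIZE] = char_type


def get_type(code):
    """Mimic movzx eax,byte ptr [rcx+rbx*4]: read one byte, zero-extended."""
    return char_tab[code * ENTRY_SIZE]


set_type(ord("S"), 1)  # letters are type 01
set_type(ord("7"), 2)  # digits are type 02

print(hex(len(char_tab)))          # total size of the table in bytes
print(hex(ord("S") * ENTRY_SIZE))  # byte offset of the entry for 'S'
print(get_type(ord("S")), get_type(ord("7")))
```

Only the first byte of each 4-byte entry is read by this instruction; what the other three bytes hold is not covered here.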


Values from 0 to 20h are non-printable characters and return a type equal to 3 (eax=3).

Values from 21h to 2fh are special characters. Return types 05, 06, 04 and 01

Values from 30h to 39h are numerical characters. Return type 02

Values from 3ah to 40h are special characters. Return types 05 and 01


Values from 41h to 5ah are alphabetical characters. Return type 01

Values from 5bh to 60h are special characters. Return types 05, 04 and 01.


Values from 61h to 7ah are alphabetical characters. Return type 01


Values from 7bh to 7fh are special characters. Return type 05.


Briefly:

01 = alphabetical characters from a to z plus _ @ #

02 = numbers from 0 to 9

03 = non-printable characters

04 = these special characters: [ " '

05 = these special characters: { | } ~ \ ] ^ : ; < = > ? ! % & ( ) * + , - . /

06 = the $ character
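The summary above can be written down as a small lookup table. This is only a sketch in Python built from the values listed in this post, not a dump of the real table; the function name `build_type_table` is mine. Two assumptions: I map both upper and lower case letters to type 01, and I leave the backtick (60h) unmapped because the summary does not list it. Note also that "\" appears here with type 05, but the routine returns 7 for 5Ch before the table is ever consulted.

```python
def build_type_table():
    """Map each code below 80h to the type given in the summary."""
    table = {}
    for code in range(0x00, 0x21):   # 0 to 20h: non-printable
        table[code] = 3
    for code in range(0x30, 0x3A):   # '0' to '9'
        table[code] = 2
    for code in range(0x41, 0x5B):   # 'A' to 'Z' (assumed same type as a-z)
        table[code] = 1
    for code in range(0x61, 0x7B):   # 'a' to 'z'
        table[code] = 1
    for ch in "_@#":
        table[ord(ch)] = 1
    for ch in "[\"'":
        table[ord(ch)] = 4
    for ch in "{|}~\\]^:;<=>?!%&()*+,-./":
        table[ord(ch)] = 5
    table[0x7F] = 5                  # 7bh to 7fh return type 05
    table[ord("$")] = 6
    return table


types = build_type_table()
for ch in "S5$[_ ":
    print(repr(ch), types[ord(ch)])
```

With the backtick left out, the table holds 127 of the 128 possible entries.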

 

=> Ooh yes.. remember these values for the next posts...


Exception:

If the value read from the stream is equal to 0ffffffffh then eax = 9


00007fff`63b53019 83fbff cmp ebx,0FFFFFFFFh

00007fff`63b5301c 0f85f941af00 jne sqllang!CParser::Get_Gen_Lex+0x3f (00007fff`6464721b)

00007fff`63b53022 b809000000 mov eax,9

00007fff`63b53027 4883c420 add rsp,20h

00007fff`63b5302b 5b pop rbx

00007fff`63b5302c c3 ret

00007fff`63b5302d 90 nop

00007fff`63b5302e 90 nop

00007fff`63b5302f 90 nop



If the value read is equal to 5Ch then eax = 7h


00007ffd`d6627212 8d43ab lea eax,[rbx-55h] ; 5C - 55 -> 7h

00007ffd`d6627215 4883c420 add rsp,20h

00007ffd`d6627219 5b pop rbx

00007ffd`d662721a c3 ret
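Putting the branches together, the control flow seen so far can be sketched in Python as follows. This is my reading of the disassembly, not the original source: `get_gen_lex`, `SENTINEL` and the `lookup` parameter (a stand-in for the CharTab read) are hypothetical names, and the stub table contains just two example entries.

```python
SENTINEL = 0xFFFFFFFF  # the value compared against in the exception path


def get_gen_lex(value, lookup):
    """Return the character type, following the branches in the listing.

    `lookup` is a hypothetical callable standing in for the CharTab read.
    """
    value &= 0xFFFFFFFF          # ebx holds a 32-bit unsigned value
    if value < 0x80:             # cmp edx,80h / jae (unsigned compare)
        if value == 0x5C:        # cmp ebx,5Ch / je
            return value - 0x55  # lea eax,[rbx-55h]: 5Ch - 55h = 7
        return lookup(value)     # movzx eax,byte ptr [rcx+rbx*4]
    if value == SENTINEL:        # cmp ebx,0FFFFFFFFh
        return 9                 # mov eax,9
    # Other values of 80h and above take a path not described in this post.
    raise NotImplementedError("value >= 80h: path not covered here")


stub_table = {0x53: 1, 0x35: 2}  # illustrative entries only: 'S' and '5'
print(get_gen_lex(0x53, stub_table.get))
print(get_gen_lex(0x5C, stub_table.get))
print(get_gen_lex(0xFFFFFFFF, stub_table.get))
```

Because jae is an unsigned comparison, 0ffffffffh counts as "not below 80h", which is why it ends up in the exception path and not in the table lookup.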


And now?

Well, for now no more details about this procedure (I know we did not talk about what happens when the value is greater than or equal to 80h and not equal to 0ffffffffh...)

Next time we will talk about another procedure of the parser, called LGetToken.

Another step inside the parser..




That's all folks but remember:

If you liked this post leave a comment, subscribe to the blog and wait for the next post!

Luca













