SQL Server, a bit of reverse engineering inside the parser: the Parser and the GetChar procedure.

Hello friends,

Welcome to another post about the Parser. We started talking about it in these posts: SQL Server is a compiler! & Where T-SQL tokens are stored?

What did we say? Briefly, we said that every batch you send to the engine is first analyzed and then translated into a series of simpler instructions called the input tree.

Each command is parsed by a parser routine coded in the CParser class, which is located in SqlLang.dll.
So the job of the Parser is to retrieve characters from the input string, break them into tokens and finally add the operations to be executed to the input tree.
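
To give a rough idea of what "retrieve characters and break them into tokens" means, here is a toy Python sketch. It is not SQL Server's lexer: the function name and the token rules (letters, digits and underscores form a word, whitespace separates words, anything else is a one-character token) are assumptions made only for this example.

```python
def tokenize(batch):
    """Split a batch into tokens by looking at one character at a time."""
    tokens = []
    current = ""
    for char in batch:                      # one "get character" step per iteration
        if char.isalnum() or char == "_":
            current += char                 # still inside a word
            continue
        if current:
            tokens.append(current)          # the word just ended
            current = ""
        if not char.isspace():
            tokens.append(char)             # punctuation becomes its own token
    if current:
        tokens.append(current)              # flush the last word
    return tokens


print(tokenize("SELECT name FROM t;"))   # ['SELECT', 'name', 'FROM', 't', ';']
```

The real parser does much more than this, but the shape is the same: something has to hand over the characters one by one, and that is the part this post looks at.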

I know you are curious to go into greater detail! Well, you are right: this is a first!

The CParser::GetChar procedure

What is the GetChar procedure? GetChar is a routine of the CParser class that reads every character you send to the SQL Server engine.

In the picture below we can take a look at the call stack inside SqlLang.dll.
GetChar is the deepest routine, reached after sqlpars, yyparse, yylex and CParser::LPeekAfterNextToken.

This is the GetChar routine

Let's take a first look:

1) As is typical for a routine compiled from C or C++, the function starts by saving a register that it uses later and must preserve for its caller: here this is the "push rbx" instruction.
Then you will find the instruction "sub rsp,20h", which allocates 0x20 bytes of stack space.
At the end of the function we find the "inverse" instructions: "add rsp,20h", "pop rbx" and "ret".
I highlighted these areas in yellow.

2) A single parameter is passed to the function through the RCX register (for a C++ member function such as CParser::GetChar, this is the pointer to the object itself)

3) The result is returned in the EAX register


00007fff`633235f0 53              push    rbx
00007fff`633235f1 4883ec20        sub     rsp,20h

00007fff`633235f5 8b9164050000    mov     edx,dword ptr [rcx+564h]
00007fff`633235fb 488bd9          mov     rbx,rcx
00007fff`633235fe 8b8968050000    mov     ecx,dword ptr [rcx+568h]
00007fff`63323604 3bd1            cmp     edx,ecx
00007fff`63323606 7d44            jge     sqllang!CParser::GetChar+0x19 (00007fff`6332364c)
00007fff`63323608 81faa00f0000    cmp     edx,0FA0h
00007fff`6332360e 0f8310740c00    jae     sqllang!CParser::GetChar+0x44 (00007fff`633eaa24)
00007fff`63323614 48638b64050000  movsxd  rcx,dword ptr [rbx+564h]
00007fff`6332361b 0fb7944b38060000 movzx   edx,word ptr [rbx+rcx*2+638h]
00007fff`63323623 8d4101          lea     eax,[rcx+1]
00007fff`63323626 83835805000002  add     dword ptr [rbx+558h],2
00007fff`6332362d 898364050000    mov     dword ptr [rbx+564h],eax
00007fff`63323633 6683fa0a        cmp     dx,0Ah
00007fff`63323637 0f849dbd0000    je      sqllang!CParser::GetChar+0xc4 (00007fff`6332f3da)
00007fff`6332363d 0fb7c2          movzx   eax,dx
00007fff`63323640 898360050000    mov     dword ptr [rbx+560h],eax
00007fff`63323646 4883c420        add     rsp,20h
00007fff`6332364a 5b              pop     rbx
00007fff`6332364b c3              ret

00007fff`6332364c 83835805000002  add     dword ptr [rbx+558h],2
00007fff`63323653 8d4201          lea     eax,[rdx+1]
00007fff`63323656 898364050000    mov     dword ptr [rbx+564h],eax
00007fff`6332365c 83c8ff          or      eax,0FFFFFFFFh
00007fff`6332365f c78360050000ffffffff mov dword ptr [rbx+560h],0FFFFFFFFh
00007fff`63323669 4883c420        add     rsp,20h
00007fff`6332366d 5b              pop     rbx
00007fff`6332366e c3              ret
00007fff`6332366f 90              nop

But how does it work?

The memory location at [rcx+564h] stores a running index, while the location at [rcx+568h] contains the value 13 ('e').
00007fff`63b535f5 8b9164050000    mov     edx,dword ptr [rcx+564h]
00007fff`63b535fb 488bd9          mov     rbx,rcx
00007fff`63b535fe 8b8968050000    mov     ecx,dword ptr [rcx+568h]
00007fff`63b53604 3bd1            cmp     edx,ecx

If the index stored in edx is less than 13 ('e') (signed comparison, jge) and below 4000 ('0fa0') (unsigned comparison, jae) ...

00007fff`63323606 7d44            jge     sqllang!CParser::GetChar+0x19 (00007fff`6332364c)
00007fff`63323608 81faa00f0000    cmp     edx,0FA0h
00007fff`6332360e 0f8310740c00    jae     sqllang!CParser::GetChar+0x44 (00007fff`633eaa24)
... the routine reloads the index from [rbx+564h] into rcx and reads a 16-bit word from the input buffer that starts at offset 638h:
00007fff`63323614 48638b64050000  movsxd  rcx,dword ptr [rbx+564h]
00007fff`6332361b 0fb7944b38060000 movzx   edx,word ptr [rbx+rcx*2+638h]

Now the edx register contains the value 53h ('S'), the first letter of the word "SELECT".

WOW! Looking at the picture below, it is clear that we are reading the input string:

After reading the current character, the routine increments the index stored at [rbx+564h] (the same location as [rcx+564h] above, since rbx now holds the original value of rcx). It also adds 2 to the value at [rbx+558h].
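
The address arithmetic of [rbx+rcx*2+638h] can be reproduced with a small Python sketch. It assumes that the buffer at offset 638h holds UTF-16LE text (the 2-byte element size suggests it, but the disassembly alone does not prove it), and the fake object built here is only a block of zero bytes followed by the text, not the real CParser layout.

```python
import struct

BUFFER_OFFSET = 0x638   # displacement used by the movzx instruction
CHAR_SIZE = 2           # the index is scaled by 2: one 16-bit word per character


def read_char(obj_bytes, index):
    """Mimic movzx edx, word ptr [rbx+rcx*2+638h] on a byte image of the object."""
    address = BUFFER_OFFSET + index * CHAR_SIZE
    (code_unit,) = struct.unpack_from("<H", obj_bytes, address)
    return code_unit


# Fake object: zero padding up to 638h, then the batch text (assumed UTF-16LE).
fake_object = bytes(BUFFER_OFFSET) + "SELECT".encode("utf-16-le")

first = read_char(fake_object, 0)
print(hex(first), chr(first))   # 0x53 S
```

With index 0 the word read is 53h, which matches the value seen in edx.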

00007fff`63323623 8d4101          lea     eax,[rcx+1]
00007fff`63323626 83835805000002  add     dword ptr [rbx+558h],2
00007fff`6332362d 898364050000    mov     dword ptr [rbx+564h],eax
Now the procedure compares the value just read (53h) with the value 10 (0Ah), the line feed character:

00007fff`63323633 6683fa0a        cmp     dx,0Ah
00007fff`63323637 0f849dbd0000    je      sqllang!CParser::GetChar+0xc4 (00007fff`6332f3da)
If the value read is not a line feed, it is stored in the eax register (this is the result of the function). The same value is stored at [rbx+560h].
00007fff`6332363d 0fb7c2          movzx   eax,dx
00007fff`63323640 898360050000    mov     dword ptr [rbx+560h],eax

Finally the procedure comes to an end:

00007fff`63323646 4883c420        add     rsp,20h
00007fff`6332364a 5b              pop     rbx
00007fff`6332364b c3              ret
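
To put the pieces together, here is a minimal Python model of the part of GetChar described in this post. It is a sketch built only from the disassembly above, not the real source code: the class and field names are invented, treating the value at +568h as the number of characters available is an assumption, and the two branches that jump outside the listing (the 0FA0h case and the line feed case) are left unimplemented because the post does not cover them.

```python
BUFFER_LIMIT = 0xFA0   # 4000, the constant compared against the index
LINE_FEED = 0x0A
END_OF_INPUT = -1      # 0FFFFFFFFh read as a signed 32-bit value


class ParserModel:
    """Toy stand-in for the CParser fields that GetChar touches (names are made up)."""

    def __init__(self, text):
        self.buffer = [ord(c) for c in text]   # word array at +638h
        self.offset = 0                        # dword at +558h, grows by 2 per call
        self.last_char = 0                     # dword at +560h
        self.index = 0                         # dword at +564h
        self.limit = len(self.buffer)          # dword at +568h (assumed meaning)

    def get_char(self):
        if self.index >= self.limit:           # cmp edx,ecx / jge
            self.offset += 2
            self.index += 1
            self.last_char = END_OF_INPUT
            return END_OF_INPUT
        if self.index >= BUFFER_LIMIT:         # cmp edx,0FA0h / jae
            raise NotImplementedError("branch outside the listing, not modelled")
        char = self.buffer[self.index]         # movzx edx, word ptr [rbx+rcx*2+638h]
        self.offset += 2                       # add dword ptr [rbx+558h],2
        self.index += 1                        # lea eax,[rcx+1] / mov [rbx+564h],eax
        if char == LINE_FEED:                  # cmp dx,0Ah / je
            raise NotImplementedError("line feed branch, not modelled")
        self.last_char = char                  # mov dword ptr [rbx+560h],eax
        return char


parser = ParserModel("SELECT")
chars = []
while True:
    c = parser.get_char()
    if c == END_OF_INPUT:
        break
    chars.append(chr(c))
print("".join(chars), len(chars))   # SELECT 6
```

In this model six calls return a character, one for each letter of "SELECT", and one more call returns -1 once the index reaches the limit.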

I have not described the whole procedure, only the part needed to explain how the data is read.
Remember that the GetChar function is called for each character contained in the input string.
In our example, for the "SELECT" string, GetChar will be called 6 times.

If you enjoyed this post, don't miss the next one, where we will look at the functions that call GetChar!
Have a good week!! Subscribe to this blog if you like it!!!


