How to read XML file from SQL Server. The OPENXML statement and a little bit of Datascience

Hi Guys,

Welcome back!

Would you like to learn how to read XML file from SQL Server?
Would you like to learn how to use the OPENXML statement?

Today we will do also some datascience.
 So what are you waiting for? …Follow me!

Introduction

XML files are very popular nowadays.
They are nothing more than text files with a structure established by means of tags.
An example of an XML file could be this:

<ROOT>
<ROW
        STAZIONE="san pietro capofiume"
        DATAINIZIO="31/12/2018 23.00.00"
        DATAFINE="01/01/2019"
        STRUMENTO="api 200e"
        NO2="28"
        UM="ug/m3"
        F1=""
        F2="1"
        F3="1"
        F4="1"
/>
       <ROW
        STAZIONE="san pietro capofiume"
        DATAINIZIO="01/01/2019"
        DATAFINE="01/01/2019 1.00.00"
        STRUMENTO="api 200e"
        NO2="25"
        UM="ug/m3"
        F1=""
        F2="1"
        F3="1"
        F4="1"
       />
</ROOT>


As already mentioned many data are today memorized into an XML file.
For this example and in order to do some datascience in this post I will use data from Arpae

At address https://www.arpae.it/v2_rete_di_monitoraggio.asp?p=BO&s=7000027&idlivello=134&w=8#param you will find "air quality" data from the SAN PIETRO CAPOFIUME station.


SAN PIETRO CAPOFIUME station measures continuously (every hour) these parameters: NO2 (Nitrogen dioxide); O3 (Ozone); PM10; PM2.5

For this example we are interested to check the progress of the NO2 before and after the lockdown due to the coronavirus!
Inside XML file data are stored in this way:

<ROOT>
<ROW
        STAZIONE="san pietro capofiume"
        DATAINIZIO="31/12/2018 23.00.00"
        DATAFINE="01/01/2019"
        STRUMENTO="api 200e"
        NO2="28"
        UM="ug/m3"
        F1=""
        F2="1"
        F3="1"
        F4="1"
/>
       <ROW
        STAZIONE="san pietro capofiume"
        DATAINIZIO="01/01/2019"
        DATAFINE="01/01/2019 1.00.00"
        STRUMENTO="api 200e"
        NO2="25"
        UM="ug/m3"
        F1=""
        F2="1"
        F3="1"
        F4="1"
       />
</ROOT>

So, how to read them?


Read an XML file from SQL Server

The instruction that read a XML is the OPENXML.
The basic sintax is pretty simple:

       
SELECT * FROM OPENXML(@docHandle, N'/ROOT')       
 


So, how to use the OPENXML command?

I wrote this simply procedure to read XML data and write it into a SQL Server table called TABLEDATA.

DECLARE @docHandle int;
DECLARE @xmlDocument nvarchar(max); 

SET @xmlDocument = 

N'<ROOT> 
    <ROW
        STAZIONE="san pietro capofiume"
        DATAINIZIO="31/12/2018 23.00.00"
        DATAFINE="01/01/2019"
        STRUMENTO="api 200e"
        NO2="28"
        UM="ug/m3"
        F1=""
        F2="1"
        F3="1"
        F4="1"
    />
    <ROW
        STAZIONE="san pietro capofiume"
        DATAINIZIO="01/01/2019"
        DATAFINE="01/01/2019 1.00.00"
        STRUMENTO="api 200e"
        NO2="25"
        UM="ug/m3"
        F1=""
        F2="1"
        F3="1"
        F4="1"
    />
</ROOT>'


EXEC sp_xml_preparedocument @docHandle OUTPUT, @xmlDocument; 

-- Use OPENXML to provide rowset consisting of customer data. 
SELECT *  
INTO TABLEDATA
FROM OPENXML(@docHandle, N'/ROOT')
 

Once data are inserted into a SQL Server table we can analyze it.

Watching at the content of TABLEDATA it is clear that we need some query to reorganize data:


For example:

       
Select cast(tt1.text as varchar) as DataInizio, cast(tt2.text as varchar) as NO2
into dati2
From
(
(select t2.parentid,t2.localname,t1.text from TABLEDATA t1
join TABLEDATA t2 on t1.parentid = t2.id
where t1.text is not null and t2.localname in ('DATAINIZIO')) tt1
join
(select t2.parentid,t2.localname,t1.text from TABLEDATA t1
join TABLEDATA t2 on t1.parentid = t2.id
where t1.text is not null and t2.localname in ('NO2')) tt2 on tt1.parentid = tt2.parentid
)
   


Now we can read data in a clear way:

for example we can group by day through this Query:
       
Select Left(DataInizio,10) as Giorno, sum(cast(NO2 as float)) as NO2
From dati2
Group by Left(DataInizio,10)
Order by cast(Left(DataInizio,10) as datetime)



And plot data:



From the graph you can see clearly that NO2 from January 2020 it has been decreasing.
In February and in March NO2 has halved again.
This, could be, due to minor vehicles running on the streets due to the lockdown.

Conclusion

This time we have seen how read an XML throgh SQL Server.
And we also did a little datascience!

I hope you found this post useful and interesting.
If so don't forget to subscribe to this blog.
we are preparing many other interesting topics

That all for today
See you soon!

Luca


Luca Biondi @ SQLServerPerformance blog 2020!  












Previuos post: From SQLServer to ML, a first little step into machine learning

Comments

  1. thank you for info

    stay safe #covid-19

    see more: http://coronavirusnpl.com

    ReplyDelete
    Replies
    1. Thank you to you glad you find useful the post!
      Luca

      Delete

Post a Comment

I Post più popolari

SQL Server, execution plan and the lazy spool (clearly explained)

SQL Server, datetime vs. datetime2

La clausola NOLOCK. Approfondiamo e facciamo chiarezza!