SQL Server Performance Series - Understanding the Engine Beyond Traditional Indexes! XML INDEXES!

SQL Server XML Indexes – When They Help, When They Hurt, and What Nobody Tells You

SQL Server Performance Series – Understanding the Engine Beyond Traditional Indexes

Previous article: I Removed 32 Seconds From This SQL Server Query Without Adding an Index

Hi guys,

In the previous post we pushed the SQL Server engine to remove 32 seconds from a query — without adding a single index.

Today we go somewhere different. Not rowstore. Not columnstore. Not filtered indexes.

XML Indexes.

Powerful. Expensive. Often misunderstood.

Let’s break them down properly.


The Problem

You store semi-structured data inside an XML column. Then someone writes a query like this:


SELECT *
FROM Orders
WHERE OrderData.value('(/Order/CustomerID)[1]', 'INT') = 42;

It works. But performance is terrible.

Full table scan.
XML parsing per row.
CPU explosion.


Why XML Is Expensive

Without an XML index, SQL Server must:

  • Shred XML at runtime
  • Parse the document repeatedly
  • Evaluate XQuery expressions per row

That means high CPU usage and poor scalability.


Primary XML Index

The first step is creating a Primary XML Index.


CREATE PRIMARY XML INDEX PXML_Orders_OrderData
ON Orders(OrderData);

What this does:

  • Shreds XML into internal relational structures
  • Creates a hidden node table
  • Stores path, value and hierarchy information

Important: this can dramatically increase storage usage.

In some cases the XML index is larger than the base table.


Secondary XML Indexes

After creating the Primary XML Index, you can add secondary indexes depending on workload.

PATH Index


CREATE XML INDEX PXML_Orders_Path
ON Orders(OrderData)
USING XML INDEX PXML_Orders_OrderData
FOR PATH;

Optimizes path-based queries.

VALUE Index


CREATE XML INDEX PXML_Orders_Value
ON Orders(OrderData)
USING XML INDEX PXML_Orders_OrderData
FOR VALUE;

Optimizes value comparisons inside XML.

PROPERTY Index

Useful for property-bag style XML structures.


Performance Before vs After

Before XML Index:

  • Table Scan
  • High CPU
  • Slow XQuery evaluation

After Primary + Secondary XML Index:

  • XML Index Seek
  • Reduced CPU usage
  • Significant execution time improvement

On large datasets, improvements can be dramatic.


When XML Indexes Hurt

  • High insert/update workloads
  • Frequent XML modifications
  • Very large XML documents
  • Indexes created but rarely used

XML indexes are expensive to maintain.

Every DML operation updates the shredded node table.


Hidden Costs

  • Storage overhead
  • Fragmentation
  • Backup size increase
  • Longer rebuild time

Measure storage impact like this:


SELECT 
    i.name,
    ps.used_page_count * 8 / 1024.0 AS SizeMB
FROM sys.indexes i
JOIN sys.dm_db_partition_stats ps
    ON i.object_id = ps.object_id
   AND i.index_id = ps.index_id
WHERE i.object_id = OBJECT_ID('Orders');

Best Practices

  1. Create a Primary XML Index only if XML queries are frequent and performance-critical.
  2. Add secondary XML indexes based on real workload patterns.
  3. Benchmark before and after implementation.
  4. Monitor storage growth.
  5. Never create all XML index types blindly.

The Real Lesson

XML indexes are not magic performance boosters.

They are specialized tools.

If your workload depends heavily on XML querying, they can drastically reduce CPU cost.

If not, they silently increase storage and slow down write operations.


Performance tuning is about choosing the right index — not creating more indexes.

See you in the next deep dive. Maybe JSON indexing next? 

Luca Biondi

Comments

I Post più popolari

Speaking to Sql Server, sniffing the TDS protocol

SQL Server, find text in a Trigger, Stored Procedures, View and Function. Two ways and what ways is better

SQL Server, Avoid that damn Table Spool!