SQL Server Performance Series - Understanding the Engine Beyond Traditional Indexes! XML INDEXES!
SQL Server XML Indexes – When They Help, When They Hurt, and What Nobody Tells You
SQL Server Performance Series – Understanding the Engine Beyond Traditional Indexes
Previous article: I Removed 32 Seconds From This SQL Server Query Without Adding an Index
Hi guys,
In the previous post we pushed the SQL Server engine to remove 32 seconds from a query — without adding a single index.
Today we go somewhere different. Not rowstore. Not columnstore. Not filtered indexes.
XML Indexes.
Powerful. Expensive. Often misunderstood.
Let’s break them down properly.
The Problem
You store semi-structured data inside an XML column. Then someone writes a query like this:
SELECT *
FROM Orders
WHERE OrderData.value('(/Order/CustomerID)[1]', 'INT') = 42;
It works. But performance is terrible.
Full table scan.
XML parsing per row.
CPU explosion.
Why XML Is Expensive
Without an XML index, SQL Server must:
- Shred XML at runtime
- Parse the document repeatedly
- Evaluate XQuery expressions per row
That means high CPU usage and poor scalability.
Primary XML Index
The first step is creating a Primary XML Index.
CREATE PRIMARY XML INDEX PXML_Orders_OrderData
ON Orders(OrderData);
What this does:
- Shreds XML into internal relational structures
- Creates a hidden node table
- Stores path, value and hierarchy information
Important: this can dramatically increase storage usage.
In some cases the XML index is larger than the base table.
Secondary XML Indexes
After creating the Primary XML Index, you can add secondary indexes depending on workload.
PATH Index
CREATE XML INDEX PXML_Orders_Path
ON Orders(OrderData)
USING XML INDEX PXML_Orders_OrderData
FOR PATH;
Optimizes path-based queries.
VALUE Index
CREATE XML INDEX PXML_Orders_Value
ON Orders(OrderData)
USING XML INDEX PXML_Orders_OrderData
FOR VALUE;
Optimizes value comparisons inside XML.
PROPERTY Index
Useful for property-bag style XML structures.
Performance Before vs After
Before XML Index:
- Table Scan
- High CPU
- Slow XQuery evaluation
After Primary + Secondary XML Index:
- XML Index Seek
- Reduced CPU usage
- Significant execution time improvement
On large datasets, improvements can be dramatic.
When XML Indexes Hurt
- High insert/update workloads
- Frequent XML modifications
- Very large XML documents
- Indexes created but rarely used
XML indexes are expensive to maintain.
Every DML operation updates the shredded node table.
Hidden Costs
- Storage overhead
- Fragmentation
- Backup size increase
- Longer rebuild time
Measure storage impact like this:
SELECT
i.name,
ps.used_page_count * 8 / 1024.0 AS SizeMB
FROM sys.indexes i
JOIN sys.dm_db_partition_stats ps
ON i.object_id = ps.object_id
AND i.index_id = ps.index_id
WHERE i.object_id = OBJECT_ID('Orders');
Best Practices
- Create a Primary XML Index only if XML queries are frequent and performance-critical.
- Add secondary XML indexes based on real workload patterns.
- Benchmark before and after implementation.
- Monitor storage growth.
- Never create all XML index types blindly.
The Real Lesson
XML indexes are not magic performance boosters.
They are specialized tools.
If your workload depends heavily on XML querying, they can drastically reduce CPU cost.
If not, they silently increase storage and slow down write operations.
Performance tuning is about choosing the right index — not creating more indexes.
See you in the next deep dive. Maybe JSON indexing next?
Luca Biondi

Comments
Post a Comment