SQL Server 2025 CU3 – The Hidden Performance Fix Nobody Talks About (False Sharing Benchmark)
SQL Server 2025 CU3 – Does It REALLY Fix CPU Contention? Let’s Benchmark It 🔥
Hi SQL SERVER Guys,
Today... we go serious.
No theory. No assumptions. No marketing.
👉 Just tests, numbers and benchmarks.
We are going to validate one of the most interesting fixes introduced in SQL Server 2025 CU3:
"Reduces CPU contention on high-core servers by fixing cache line conflicts (false sharing), improving overall scalability."
(If you missed it, check my full CU3 breakdown here 👇)
👉 SQL Server 2025 CU3 – Critical Fixes You Should NOT Ignore
🎯 Benchmark Goal
We want to verify if CU3 really reduces:
- CPU contention
- Spinlocks / latch contention
- Scalability issues on multi-core systems
👉 In short: parallel workload scalability
⚠️ The Real Problem: False Sharing
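The bug is related to:
👉 false sharing, which happens when:
- multiple CPU cores write to the same cache line, even though each core updates its own data
- every write invalidates that cache line on the other cores
- the constant invalidations cause massive performance degradation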
This typically happens with:
- high concurrency workloads
- shared engine structures
- columnstore-heavy operations
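To make the mechanism concrete, here is a minimal Python sketch of the cache line arithmetic. It assumes a 64-byte cache line (typical on x86-64, but an assumption here) and two hypothetical 8-byte counters; the names and offsets are purely illustrative and are not taken from the SQL Server engine.

```python
CACHE_LINE_BYTES = 64  # assumption: typical x86-64 cache line size


def cache_line_of(offset_bytes, line_size=CACHE_LINE_BYTES):
    """Return the index of the cache line that contains a byte offset."""
    return offset_bytes // line_size


def shares_line(offset_a, offset_b, line_size=CACHE_LINE_BYTES):
    """True when two byte offsets fall on the same cache line."""
    return cache_line_of(offset_a, line_size) == cache_line_of(offset_b, line_size)


# Hypothetical layout 1: two 8-byte counters packed back to back.
packed = {"counter_core0": 0, "counter_core1": 8}

# Hypothetical layout 2: each counter padded out to its own cache line.
padded = {"counter_core0": 0, "counter_core1": CACHE_LINE_BYTES}

packed_conflict = shares_line(packed["counter_core0"], packed["counter_core1"])
padded_conflict = shares_line(padded["counter_core0"], padded["counter_core1"])

print("packed layout shares a line:", packed_conflict)
print("padded layout shares a line:", padded_conflict)
```

In the packed layout both counters live on line 0, so a write from one core invalidates the other core's copy even though the two never touch the same variable. Padding each counter to its own line is the usual way to avoid that.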
🧪 Test Strategy
| Scenario | Version |
|---|---|
| Test 1 | Post CU2 |
| Test 2 | Post CU3 |
🔥 STEP 1 – Build a Serious Dataset
DROP TABLE IF EXISTS BigTable;
CREATE TABLE BigTable
(
    id INT IDENTITY(1,1),
    c1 INT,
    c2 INT,
    c3 CHAR(200)  -- fixed-width padding to make every row wide
);
-- Fill the table with up to 5 million rows of random integers
INSERT INTO BigTable (c1, c2, c3)
SELECT TOP 5000000
    ABS(CHECKSUM(NEWID())),
    ABS(CHECKSUM(NEWID())),
    REPLICATE('X', 200)
FROM sys.objects a
CROSS JOIN sys.objects b
CROSS JOIN sys.objects c;
👉 We need volume + memory pressure
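A quick back-of-the-envelope check on that dataset, as a hedged Python sketch. It counts only the declared column widths (4 + 4 + 4 + 200 bytes) and ignores row and page overhead, so the real table will be somewhat larger. It also works out how many rows sys.objects must return for the triple CROSS JOIN to reach the TOP 5000000 target; if your database has fewer objects than that, the INSERT will produce fewer rows.

```python
TARGET_ROWS = 5_000_000

# Declared column widths in bytes: id INT, c1 INT, c2 INT, c3 CHAR(200)
COLUMN_BYTES = {"id": 4, "c1": 4, "c2": 4, "c3": 200}


def raw_data_bytes(rows, column_bytes):
    """Payload size from column widths only (no row or page overhead)."""
    return rows * sum(column_bytes.values())


def min_source_rows(target_rows, joins=3):
    """Smallest n such that n ** joins >= target_rows."""
    n = 1
    while n ** joins < target_rows:
        n += 1
    return n


payload_bytes = raw_data_bytes(TARGET_ROWS, COLUMN_BYTES)
payload_gib = payload_bytes / 1024 ** 3
needed_objects = min_source_rows(TARGET_ROWS)

print(f"raw payload: {payload_bytes:,} bytes (~{payload_gib:.2f} GiB)")
print(f"sys.objects must return at least {needed_objects} rows")
```

A simple SELECT COUNT(*) FROM BigTable after the load confirms whether the full 5 million rows were actually generated.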
🔥 STEP 2 – CPU Heavy Query
SELECT SUM(c1), SUM(c2) FROM BigTable WHERE c1 % 10 = 0;
👉 The WHERE clause makes the query non-SARGable (if you missed my post about it: SQL SERVER! SARGability: the One Concept You Absolutely Must Understand!) → the modulo has to be evaluated on every row, and this forces CPU work
🔥 STEP 3 – Parallel Stress (CRITICAL)
Open one session, run the query, then keep adding sessions until you have 100 of them running!
SELECT SUM(c1), SUM(c2) FROM BigTable WHERE c1 % 10 = 0;
👉 This creates:
- high concurrency
- CPU pressure
- real contention
📊 STEP 4 – What to Measure
1. CPU Pressure
SELECT TOP 10 * FROM sys.dm_os_schedulers ORDER BY runnable_tasks_count DESC;
👉 Look at:
- runnable_tasks_count
- load_factor
2. Spinlock Contention 🔥
SELECT * FROM sys.dm_os_spinlock_stats WHERE collisions > 0 ORDER BY collisions DESC;
3. Wait Stats
SELECT * FROM sys.dm_os_wait_stats ORDER BY wait_time_ms DESC;
Focus on:
- SOS_SCHEDULER_YIELD
- CXPACKET / CXCONSUMER
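One caveat when comparing runs: sys.dm_os_spinlock_stats and sys.dm_os_wait_stats are cumulative since the instance started (or since the counters were last reset), so what matters is the difference between two snapshots taken around the test. Below is a minimal Python sketch of that delta logic. The function names are hypothetical and the sample numbers are made up for illustration, not measured.

```python
def snapshot_delta(before, after):
    """Per-key increase between two cumulative counter snapshots.

    Keys missing from `before` are treated as starting at zero.
    """
    return {key: value - before.get(key, 0) for key, value in after.items()}


def top_n(delta, n=3):
    """Return the n largest increases, biggest first."""
    return sorted(delta.items(), key=lambda item: item[1], reverse=True)[:n]


# Illustrative values only (wait_type -> wait_time_ms), not measured data.
before = {"SOS_SCHEDULER_YIELD": 1_000, "CXPACKET": 5_000}
after = {"SOS_SCHEDULER_YIELD": 4_500, "CXPACKET": 5_200, "CXCONSUMER": 300}

delta = snapshot_delta(before, after)
ranked = top_n(delta)

for wait_type, increase_ms in ranked:
    print(f"{wait_type}: +{increase_ms} ms")
```

The same function works for spinlock collisions: snapshot name and collisions before and after the run, then rank the increases.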
4. Real CPU Usage
SELECT
record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') AS Idle,
record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS SQLCPU
FROM
(
SELECT CONVERT(XML, record) AS record
FROM sys.dm_os_ring_buffers
WHERE ring_buffer_type = 'RING_BUFFER_SCHEDULER_MONITOR'
) x;
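The query above returns one row per record kept in the ring buffer, each a point-in-time CPU sample. If you prefer to process the records client-side, the same two values can be pulled out of the XML in Python, as sketched below. The sample record is hand-written and simplified, its numbers are made up, and the "other processes" figure (100 minus idle minus SQL Server CPU) is a common derivation rather than something the post measures.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified sample of one ring buffer record (values are made up).
SAMPLE_RECORD = """
<Record id="1" type="RING_BUFFER_SCHEDULER_MONITOR" time="123456">
  <SchedulerMonitorEvent>
    <SystemHealth>
      <ProcessUtilization>40</ProcessUtilization>
      <SystemIdle>55</SystemIdle>
    </SystemHealth>
  </SchedulerMonitorEvent>
</Record>
"""


def parse_cpu_sample(record_xml):
    """Extract SQL Server CPU and system idle percentages from one record."""
    root = ET.fromstring(record_xml.strip())
    health = root.find("SchedulerMonitorEvent/SystemHealth")
    sql_cpu = int(health.findtext("ProcessUtilization"))
    idle = int(health.findtext("SystemIdle"))
    # Whatever is neither idle nor SQL Server is attributed to other processes.
    return {"sql_cpu": sql_cpu, "idle": idle, "other_cpu": 100 - idle - sql_cpu}


sample = parse_cpu_sample(SAMPLE_RECORD)
print(sample)
```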
📈 STEP 5 – Expected Results
✅ SQL Server 2025 CU2
- High CPU but inefficient
- Heavy spinlock contention
- Poor scaling with more threads
✅ SQL Server 2025 CU3
- More stable CPU usage
- Reduced contention
- Better scalability with concurrency
👉 This is the key point:
It’s not just about speed… it’s about SCALABILITY.
🧠 Advanced Test: Using Columnstore Indexes
To push the benchmark even further, we will repeat the entire test using Columnstore indexes.
This is particularly important because the CU3 fix specifically targets scenarios where internal engine structures—heavily used by Columnstore—can suffer from cache line contention and false sharing.
By introducing a Columnstore index, we increase:
- parallelism
- memory pressure
- internal engine activity
In other words, this is where the fix should make the biggest difference.
To do this, we create the following columnstore index on our table:
CREATE CLUSTERED COLUMNSTORE INDEX CCI_BigTable ON BigTable;
📊 STEP 6 – Benchmark Setup
In order to produce reliable and repeatable results, we installed a fresh copy of Windows 11 on VirtualBox.
We then installed SQL Server 2025 and updated it to CU2.
The virtual machine was configured with:
- 4 CPU cores
- 8 GB of RAM
To simulate a realistic and scalable workload, we developed a Python script that continuously creates sessions executing the following query:
SELECT SUM(c1), SUM(c2)
FROM BigTable
WHERE c1 % 10 = 0;
Every 5 seconds, a new session is added, gradually increasing concurrency until reaching 100 simultaneous sessions.
In parallel, a second Python script executes a stored procedure that periodically captures and stores key performance metrics into a SQL table.
This allows us to track CPU usage, contention, and scalability behavior over time.
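The original scripts are not shown in this post, so here is only a minimal Python sketch of the ramp-up idea, under stated assumptions: each "session" is a thread that re-runs the query in a loop, and `run_query` is a hypothetical zero-argument callable that you supply. In a real run it would open its own connection with the database driver of your choice and execute the SELECT above; here a fake query stands in so the sketch runs without a server.

```python
import threading
import time


def ramp_up(run_query, max_sessions=100, interval_s=5.0, stop_event=None):
    """Start one looping worker every `interval_s` seconds, up to `max_sessions`.

    `run_query` is any zero-argument callable that executes the benchmark
    query once. Returns the stop event and the list of worker threads.
    """
    stop_event = stop_event or threading.Event()
    workers = []

    def session_loop():
        # Each worker plays the role of one session re-running the query.
        while not stop_event.is_set():
            run_query()

    for _ in range(max_sessions):
        if stop_event.is_set():
            break
        worker = threading.Thread(target=session_loop, daemon=True)
        worker.start()
        workers.append(worker)
        # wait() doubles as an interruptible sleep between session starts.
        stop_event.wait(interval_s)
    return stop_event, workers


def shutdown(stop_event, workers):
    """Signal every worker to stop and wait for them to finish."""
    stop_event.set()
    for worker in workers:
        worker.join()


# Tiny demo with a fake query so the sketch runs without a database.
demo_calls = []


def fake_query():
    demo_calls.append(1)
    time.sleep(0.001)


demo_stop, demo_workers = ramp_up(fake_query, max_sessions=5, interval_s=0.01)
shutdown(demo_stop, demo_workers)
print(len(demo_workers), "sessions started,", len(demo_calls), "queries executed")
```

With the defaults (`max_sessions=100`, `interval_s=5.0`) the ramp takes a little over eight minutes to reach full concurrency, which matches the schedule described above.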
Then we installed CU3 on SQL Server, restarted it, and ran our benchmark again.
READY... FOR THE RESULTS?
📊 Benchmark Results
Load Factor
- CU2 (blue) grows much faster: more work accumulates in the queue, which is a clear sign of contention
- CU3 (orange) grows in a more controlled way: threads are handled better and there is less congestion
Collisions
What it means:
- multiple cores writing to the same cache line
- constant cache invalidations
- spinlocks going crazy
👉 This chart alone proves that fix 4836787 works
SQL CPU
In CU2 we have much higher CPU usage (obviously, the value 140 corresponds to 100%)
In CU3 the CPU usage is significantly lower (~40)
It does NOT mean CU3 is slower
It means:
➡️ CU2 wastes CPU
➡️ CU3 uses CPU efficiently
Wait Time
Less Waiting means More Throughput
In CU2 we have explosive growth (over 35 million ms)
In CU3 this is much more controlled (~23 million ms)
This means:
- fewer global waits
- fewer internal bottlenecks
- better parallelism
Max Wait Time
In CU2 we have peaks up to ~650 ms
In CU3 they remain around ~400 ms
👉 This is extremely important for real-time workloads, APIs and OLTP systems
💡 CU3 reduces the worst spikes, not just the average
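To put the two wait charts on the same scale, here is a small Python sketch that turns the approximate figures quoted above into relative reductions. The inputs are the rounded values from this post (read off the charts), so the percentages are only indicative.

```python
def pct_reduction(before, after):
    """Percentage drop going from `before` to `after`."""
    return (before - after) / before * 100.0


# Approximate values quoted in this post, in milliseconds.
total_wait_ms = {"CU2": 35_000_000, "CU3": 23_000_000}
max_wait_ms = {"CU2": 650, "CU3": 400}

total_drop = pct_reduction(total_wait_ms["CU2"], total_wait_ms["CU3"])
max_drop = pct_reduction(max_wait_ms["CU2"], max_wait_ms["CU3"])

print(f"total wait time: {total_drop:.0f}% lower")  # 34% lower
print(f"max wait time:   {max_drop:.0f}% lower")    # 38% lower
```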
Conclusion
👉 This benchmark proves that the fix:
✔ does not simply increase speed
✔ improves scalability under parallel load
✔ reduces:
- spinlock contention
- cache invalidation
- CPU waste
Final Verdict: Should You Upgrade to CU3?
If you run SQL Server on:
- multi-core machines
- high concurrency workloads
- analytics / columnstore queries
👉 YES. Upgrade immediately.
This is not a cosmetic fix.
We proved that this is an engine-level scalability improvement.
And those are rare… and extremely valuable.
No SQL Server instances were harmed during these tests… well, at least not too much

