SQL Server 2025 CU3 – The Hidden Performance Fix Nobody Talks About (False Sharing Benchmark)

SQL Server 2025 CU3 – Does It REALLY Fix CPU Contention? Let’s Benchmark It 🔥

Hi SQL SERVER Guys,

Today... we get serious.

No theory. No assumptions. No marketing.

👉 Just tests, numbers and benchmarks.

We are going to validate one of the most interesting fixes introduced in SQL Server 2025 CU3:

"Reduces CPU contention on high-core servers by fixing cache line conflicts (false sharing), improving overall scalability."

(If you missed it, check my full CU3 breakdown here 👇)

👉 SQL Server 2025 CU3 – Critical Fixes You Should NOT Ignore


🎯 Benchmark Goal

We want to verify if CU3 really reduces:

  • CPU contention
  • Spinlocks / latch contention
  • Scalability issues on multi-core systems

👉 In short: parallel workload scalability


⚠️ The Real Problem: False Sharing

The bug is related to false sharing, which happens when:

  • multiple CPU cores write to different data that sits on the same cache line
  • every write forces continuous cache invalidations on the other cores
  • the result is massive performance degradation

This typically happens with:

  • high concurrency workloads
  • shared engine structures
  • columnstore-heavy operations
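
To make the idea concrete, here is a minimal Python sketch of the address arithmetic behind false sharing. It assumes 64-byte cache lines (typical for x86-64, but an assumption here) and two hypothetical 8-byte counters. It only models which cache line each counter falls on, not SQL Server's actual internal structures.

```python
CACHE_LINE_BYTES = 64  # assumption: typical x86-64 cache line size


def cache_line_index(byte_offset: int, line_size: int = CACHE_LINE_BYTES) -> int:
    """Return the index of the cache line that contains the given byte offset."""
    return byte_offset // line_size


def shares_line(offset_a: int, offset_b: int, line_size: int = CACHE_LINE_BYTES) -> bool:
    """True when two variables live on the same cache line (false-sharing risk)."""
    return cache_line_index(offset_a, line_size) == cache_line_index(offset_b, line_size)


# Two hypothetical 8-byte counters packed next to each other:
# a write to one by core A invalidates the line holding the other on core B.
packed = shares_line(0, 8)

# The same counters padded so that each one starts on its own 64-byte line.
padded = shares_line(0, 64)

print(packed, padded)
```

Padding or realigning hot structures so they stop sharing a line is the usual remedy for this class of problem; whether CU3 does exactly that is not stated in the fix description.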

🧪 Test Strategy

Scenario    Version
Test 1      After CU2
Test 2      After CU3

🔥 STEP 1 – Build a Serious Dataset

DROP TABLE IF EXISTS BigTable;

CREATE TABLE BigTable
(
    id INT IDENTITY(1,1),
    c1 INT,
    c2 INT,
    c3 CHAR(200)
);

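-- Load 5,000,000 rows of pseudo-random integers.
-- sys.all_objects is used as the row source because sys.objects alone may
-- hold too few rows in a fresh database for the cross join to reach the TOP.
INSERT INTO BigTable (c1, c2, c3)
SELECT TOP (5000000)
    ABS(CHECKSUM(NEWID())),
    ABS(CHECKSUM(NEWID())),
    REPLICATE('X', 200)
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
CROSS JOIN sys.all_objects c;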

👉 We need volume + memory pressure
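
A quick back-of-the-envelope check of the dataset, as a Python sketch. It computes how many rows the source catalog view needs for a triple cross join to reach 5,000,000 rows, and a lower bound on the raw data size from the declared column widths (three 4-byte INTs plus CHAR(200)). Per-row and per-page overhead is deliberately ignored, so the real table will be somewhat larger.

```python
TARGET_ROWS = 5_000_000
ROW_DATA_BYTES = 4 + 4 + 4 + 200  # id INT, c1 INT, c2 INT, c3 CHAR(200)


def min_source_rows(target: int, joins: int = 3) -> int:
    """Smallest n such that n ** joins >= target (rows needed per cross-joined input)."""
    n = 1
    while n ** joins < target:
        n += 1
    return n


def raw_size_gib(rows: int, row_bytes: int) -> float:
    """Lower bound on data size in GiB, ignoring row and page overhead."""
    return rows * row_bytes / 2 ** 30


needed = min_source_rows(TARGET_ROWS)
size = raw_size_gib(TARGET_ROWS, ROW_DATA_BYTES)
print(f"source view needs at least {needed} rows")
print(f"raw data is at least {size:.2f} GiB")
```

If SELECT COUNT(*) FROM BigTable returns fewer than 5,000,000 rows after the load, the source view was too small for the cross join.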


🔥 STEP 2 – CPU Heavy Query

SELECT SUM(c1), SUM(c2)
FROM BigTable
WHERE c1 % 10 = 0;

👉 The modulo in the WHERE clause makes the predicate non-SARGable (if you missed my post about it: SQL SERVER! SARGability: the One Concept You Absolutely Must Understand!), so every row has to be read and evaluated → this forces CPU work


🔥 STEP 3 – Parallel Stress (CRITICAL)

Open one session, run the query, then keep adding sessions until 100 are running at the same time!

SELECT SUM(c1), SUM(c2)
FROM BigTable
WHERE c1 % 10 = 0;

👉 This creates:

  • high concurrency
  • CPU pressure
  • real contention

📊 STEP 4 – What to Measure

We will measure the following parameters:

1. CPU Pressure

SELECT TOP 10 *
FROM sys.dm_os_schedulers
ORDER BY runnable_tasks_count DESC;

👉 Look at:

  • runnable_tasks_count
  • load_factor

2. Spinlock Contention 🔥

SELECT *
FROM sys.dm_os_spinlock_stats
WHERE collisions > 0
ORDER BY collisions DESC;

3. Wait Stats

SELECT *
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;

Focus on:

  • SOS_SCHEDULER_YIELD
  • CXPACKET / CXCONSUMER

4. Real CPU Usage

SELECT 
    record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') AS Idle,
    record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS SQLCPU
FROM 
(
    SELECT CONVERT(XML, record) AS record
    FROM sys.dm_os_ring_buffers
    WHERE ring_buffer_type = 'RING_BUFFER_SCHEDULER_MONITOR'
) x;
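
The two values returned by this query are percentages of total machine CPU: SystemIdle is idle time and ProcessUtilization is the SQL Server process. What is left over belongs to other processes, which is worth watching when the load generator runs on the same machine. A small Python sketch of that arithmetic (the function name is hypothetical):

```python
def other_process_cpu(system_idle: int, sql_cpu: int) -> int:
    """CPU % used by everything except SQL Server, from one ring buffer sample."""
    if not (0 <= system_idle <= 100 and 0 <= sql_cpu <= 100):
        raise ValueError("both inputs must be percentages between 0 and 100")
    # Clamp at 0 in case rounding pushes the sum slightly above 100.
    return max(0, 100 - system_idle - sql_cpu)


print(other_process_cpu(system_idle=10, sql_cpu=75))
```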

📈 STEP 5 – Expected Results

If the patch notes do not lie, we expect:

SQL Server 2025 CU2

  • High CPU but inefficient
  • Heavy spinlock contention
  • Poor scaling with more threads

✅ SQL Server 2025 CU3

  • More stable CPU usage
  • Reduced contention
  • Better scalability with concurrency

👉 This is the key point:

It’s not just about speed… it’s about SCALABILITY.


🧠 Advanced Test – Using Columnstore Indexes

To push the benchmark even further, we will repeat the entire test using Columnstore indexes.

This is particularly important because the CU3 fix specifically targets scenarios where internal engine structures—heavily used by Columnstore—can suffer from cache line contention and false sharing.

By introducing a Columnstore index, we increase:

  • parallelism

  • memory pressure

  • internal engine activity

In other words, this is where the fix should make the biggest difference.

To do this, we create the following columnstore index on our table:

CREATE CLUSTERED COLUMNSTORE INDEX CCI_BigTable ON BigTable;



📊 STEP 6 – Benchmark Setup

In order to produce reliable and repeatable results, we installed a fresh copy of Windows 11 on VirtualBox.
We then installed SQL Server 2025 and updated it to CU2.

The virtual machine was configured with:

  • 4 CPU cores

  • 8 GB of RAM

To simulate a realistic and scalable workload, we developed a Python script that continuously creates sessions executing the following query:

SELECT SUM(c1), SUM(c2)
FROM BigTable
WHERE c1 % 10 = 0;

Every 5 seconds, a new session is added, gradually increasing concurrency until reaching 100 simultaneous sessions.
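
The original script is not shown, so the following is only a sketch of how such a ramp-up could be structured in Python. The `run_query` callable is a placeholder: in a real run it would open its own connection and execute the SELECT above; here a stub stands in so the ramp logic can be read and run without a server. All names are hypothetical.

```python
import threading
import time
from typing import Callable


def ramp_up(run_query: Callable[[], None],
            stop: threading.Event,
            max_sessions: int = 100,
            step_seconds: float = 5.0) -> list[threading.Thread]:
    """Start one looping worker every `step_seconds` until `max_sessions` are running."""

    def session_loop() -> None:
        # Each worker re-runs the query until told to stop, like an open session.
        while not stop.is_set():
            run_query()

    workers = []
    for _ in range(max_sessions):
        worker = threading.Thread(target=session_loop, daemon=True)
        worker.start()
        workers.append(worker)
        if stop.wait(step_seconds):  # sleeps, but wakes early if stop is set
            break
    return workers


def fake_query() -> None:
    """Stand-in for the real SELECT; sleeps briefly instead of hitting a server."""
    time.sleep(0.01)


stop_flag = threading.Event()
sessions = ramp_up(fake_query, stop_flag, max_sessions=3, step_seconds=0.05)
stop_flag.set()  # end the test: every worker exits its loop
for session in sessions:
    session.join()
print(len(sessions), "sessions started")
```

Threads are a reasonable fit here because each worker spends its time waiting on the database rather than running Python code. With the defaults (100 sessions, 5 seconds apart) the ramp takes a little over eight minutes.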

In parallel, a second Python script executes a stored procedure that periodically captures and stores key performance metrics into a SQL table.
This allows us to track CPU usage, contention, and scalability behavior over time.
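
The collector is not shown either. As a rough Python sketch, it could be a loop that calls a capture function at a fixed interval and timestamps each sample. `capture` is a placeholder for whatever executes the stored procedure; all names here are hypothetical, and the samples are kept in a list instead of a SQL table to keep the example self-contained.

```python
import time
from typing import Callable


def collect_samples(capture: Callable[[], dict],
                    samples: int,
                    interval_seconds: float) -> list[dict]:
    """Call `capture` `samples` times, `interval_seconds` apart, tagging each result."""
    started = time.monotonic()
    rows = []
    for i in range(samples):
        row = dict(capture())
        row["elapsed_s"] = round(time.monotonic() - started, 3)
        rows.append(row)
        if i < samples - 1:
            time.sleep(interval_seconds)
    return rows


def fake_capture() -> dict:
    """Stand-in for the stored procedure call; returns fixed made-up values."""
    return {"sql_cpu": 40, "collisions": 1234}


history = collect_samples(fake_capture, samples=3, interval_seconds=0.01)
print(len(history), "samples collected")
```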

Then we installed CU3, restarted SQL Server, and ran the benchmark again.

READY... FOR THE RESULTS?


📊 Benchmark Results

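I summarized all the results in a single chart, and I think it tells the whole story.

[Chart: Load Factor, Collisions, SQL CPU, Wait Time and Max Wait Time, CU2 (blue) vs CU3 (orange)]

The results I got show a big difference after updating to CU3.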

Load Factor 

The load factor is where we can see the real scalability improvement.
  • CU2 (blue) grows much faster: it accumulates more work in the queue, a clear sign of contention
  • CU3 (orange) grows in a more controlled way: it handles threads better, with less congestion
💡 This is the first signal that the fix introduced in CU3 reduces internal scheduler pressure


Collisions

In CU2 we have extremely high collisions (up to ~2M), while in CU3 they are much lower and more stable. This is exactly the behavior of false sharing.
What it means:
  • multiple cores writing to the same cache line
  • constant cache invalidations
  • spinlocks going crazy

Oh yes! CU3 drastically reduces this phenomenon
👉 This chart alone proves that fix 4836787 works


SQL CPU 

Efficiency, Not Just Consumption!
In CU2 CPU usage is much higher (obviously, the value 140 means 100%)
In CU3 CPU usage is significantly lower (~40)

👉 This is critical to understand:
It does NOT mean CU3 is slower
It means:
➡️ CU2 wastes CPU
➡️ CU3 uses CPU efficiently


Wait Time 

Less Waiting means More Throughput

In CU2 we have explosive growth (over 35 million ms) 

In CU3 this is much more controlled (~23 million ms)

This means:

  • fewer global waits

  • fewer internal bottlenecks

  • better parallelism


Max Wait Time 

Latency Under Control!
In CU2 we have peaks up to ~650 ms
In CU3 they stay around ~400 ms

👉 This is extremely important for real-time workloads, APIs and OLTP systems

💡 CU3 reduces the worst spikes, not just the average
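
Putting the reported numbers side by side, the relative improvement can be worked out with a couple of lines of Python. The inputs are the approximate values quoted in this post (~35 vs ~23 million ms of wait time, ~650 vs ~400 ms max wait), so the percentages are approximate too.

```python
def reduction_pct(before: float, after: float) -> float:
    """Percentage drop going from `before` to `after`."""
    return (before - after) / before * 100


wait_time = reduction_pct(35_000_000, 23_000_000)  # total wait time, ms
max_wait = reduction_pct(650, 400)                 # worst single wait, ms

print(f"wait time: -{wait_time:.0f}%")
print(f"max wait:  -{max_wait:.0f}%")
```

Both metrics drop by roughly a third on this 4-core VM; results on larger machines may differ.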

Conclusion

👉 This benchmark proves that the fix:

✔ does not simply increase speed
✔ improves scalability under parallel load
✔ reduces:

  • spinlock contention

  • cache invalidation

  • CPU waste



Final Verdict: Should You Upgrade to CU3?

If you run SQL Server on:

  • multi-core machines
  • high concurrency workloads
  • analytics / columnstore queries

👉 YES. Upgrade immediately.

This is not a cosmetic fix.

We proved that this is an engine-level scalability improvement.

And those are rare… and extremely valuable.



No SQL Server instances were harmed during these tests… well, at least not too much

PS: Tell me if you like this format and if you are interested in exploring other patches or topics at this level


See you in the next deep dive 👈
Luca Biondi @2026
