SQL Server 2025 CU3 Backup/Restore Performance: Benchmark and the Real Impact of Patch 4836855

🔥 SQL Server 2025 CU3 vs CU2 – Testing Fix 4836855 (Backup/Restore I/O Alignment)

Hi SQL SERVER Guys,

In the previous post we started analyzing the performance differences between SQL Server 2025 CU2 and CU3. We showed real benchmark results and a performance improvement nobody talks about.

👉 If you missed it, here is the previous article:

SQL Server 2025 CU3 – The Hidden Performance Fix Nobody Talks About (False Sharing Benchmark)

Today we continue our investigation and analyze the performance improvements introduced by the patch below:

🧪 Fix 4836855 – Backup/Restore I/O Alignment

This fix aims to improve how SQL Server handles I/O operations during backup and restore.

👉 In simple terms:

  • Reduces inefficient I/O alignment
  • Improves throughput during backup/restore
  • Reduces latch contention on I/O operations
  • Removes hidden bottlenecks caused by internal serialization

💡 Important insight:

The fix doesn’t make your disk faster — it removes artificial serialization.

So… do you want faster backups and restores? Keep reading.


⚙️ Benchmark Setup

The goal is to compare:

  • SQL Server 2025 CU2
  • SQL Server 2025 CU3

Same hardware, same database, same workload.

We want to isolate the effect of fix 4836855.
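
Before each run it is worth confirming which build the instance is really on. A minimal sketch, assuming you can query server properties; the column aliases are just illustrative names:

```sql
-- Confirm the build under test before each benchmark run.
-- ProductUpdateLevel returns the cumulative update label (for example 'CU2' or 'CU3')
-- and is NULL when no cumulative update is installed.
SELECT
    SERVERPROPERTY('ProductVersion')     AS product_version,
    SERVERPROPERTY('ProductLevel')       AS product_level,
    SERVERPROPERTY('ProductUpdateLevel') AS update_level,
    SERVERPROPERTY('Edition')            AS edition;
```

Saving this output next to every set of results makes it harder to mix up CU2 and CU3 numbers later.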


🧪 Test Query

This time we use a simple but very effective workload: a full backup and restore of the database used in the previous post.


-- Backup test
BACKUP DATABASE biondi_test
TO DISK = 'C:\temp\biondi_test.bak'
WITH INIT, STATS = 5;

-- Restore test
RESTORE DATABASE biondi_test_copy
FROM DISK = 'C:\temp\biondi_test.bak'
WITH MOVE 'biondi_test' TO 'C:\temp\biondi_test_copy.mdf',
     MOVE 'biondi_test_log' TO 'C:\temp\biondi_test_copy.ldf',
     REPLACE, STATS = 5;
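
The MOVE clauses above assume the logical file names are biondi_test and biondi_test_log. If you adapt the test to your own database, a quick check like the one below (same backup path as above) should show the names to use:

```sql
-- List the logical and physical file names stored in the backup,
-- so the MOVE clauses can reference the correct LogicalName values.
RESTORE FILELISTONLY
FROM DISK = 'C:\temp\biondi_test.bak';
```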

📊 What We Measure

To understand if the fix really works, we focus on two key areas:

1️⃣ Throughput (MB/s)

  • Measured directly from the output of the BACKUP and RESTORE commands

👉 Expectation:

  • CU3 should show higher MB/s
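
As a cross-check on the figure printed by the BACKUP command, the backup history in msdb can give an approximate throughput. This is only a sketch and not the method used for the results in this post: it divides the uncompressed backup size by a duration rounded to whole seconds, so the values will not match the command output exactly.

```sql
-- Approximate throughput of the most recent full backups of biondi_test,
-- computed from the backup history stored in msdb.
SELECT TOP (10)
    bs.database_name,
    bs.backup_start_date,
    bs.backup_finish_date,
    CAST(bs.backup_size / 1048576.0 AS decimal(18, 2)) AS backup_size_mb,
    CAST(bs.backup_size / 1048576.0
         / NULLIF(DATEDIFF(SECOND, bs.backup_start_date, bs.backup_finish_date), 0)
         AS decimal(18, 2)) AS approx_mb_per_sec   -- NULL when the backup took under 1 second
FROM msdb.dbo.backupset AS bs
WHERE bs.database_name = N'biondi_test'
  AND bs.type = 'D'                                -- 'D' = full database backup
ORDER BY bs.backup_finish_date DESC;
```

As far as I know, msdb.dbo.restorehistory does not store a duration, so for restores the MB/sec value still has to come from the RESTORE command output.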

2️⃣ I/O Latch Contention

We specifically monitor:

  • PAGEIOLATCH_SH
  • PAGEIOLATCH_EX
  • PAGEIOLATCH_UP

👉 These waits occur while a task waits on a latch for a buffer page that is in an I/O request, so they indicate pressure on physical I/O operations.

👉 Expectation:

  • Lower wait time in CU3
  • Lower average wait per task

🔍 How to Measure Waits


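-- Counters in sys.dm_os_wait_stats are cumulative since the last restart or manual clear
SELECT
    wait_type,
    wait_time_ms,
    waiting_tasks_count,
    -- multiply by 1.0 so the average is not truncated by integer division
    wait_time_ms * 1.0 / NULLIF(waiting_tasks_count, 0) AS avg_wait_ms
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'PAGEIOLATCH%'
ORDER BY wait_time_ms DESC;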

📈 What We Expect to See

If fix 4836855 is effective:

  • 🚀 Higher backup/restore throughput
  • 🔽 Reduced PAGEIOLATCH waits
  • ⚡ Lower average wait time
  • 📉 Better scalability under load

If nothing changes:

  • Same MB/s
  • Same latch contention

🧠 Results

The test was repeated 8 times and the outlier values were excluded.
Values are taken directly from the output of the backup / restore commands and are expressed in MB/sec.

Throughput test


🠆 Backup results


CU2 vs. CU3


Version   Avg (MB/s)   Variance   Std. Dev.
CU2       562.90       105.59     10.28
CU3       565.02       17.42      4.17

Performance:

CU3 is only slightly faster than CU2: +2.1 MB/s (~+0.4%).
This is a very small difference, so in terms of raw throughput, it’s not a game-changer.

Stability:

This is the real difference!
The standard deviation measured with CU2 is 10.28 MB/s, while with CU3 it is 4.17 MB/s.
The variance measured with CU2 is roughly 6× that of CU3 (105.59 vs. 17.42).

👉 This means that CU3 is significantly more stable, which is critical for predictable backup performance.


🠆 Restore results

CU2 vs. CU3



Version   Avg (MB/s)   Variance   Std. Dev.
CU2       465.01       40.06      6.33
CU3       465.31       21.62      4.65

Performance:

Virtually identical. CU3 shows only +0.3 MB/s, so there is no real speed gain.

Stability:

The standard deviation decreases from 6.33 to 4.65 (about 27% lower), so restore is roughly 25–30% more stable.
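
The gains and ratios quoted for both throughput tests can be re-derived from the summary values in the two tables. A small sketch; the only inputs are the numbers reported above, and the column names are my own:

```sql
-- Re-derive the gains and ratios from the summary values in the two result tables.
SELECT
    t.test_name,
    CAST(t.cu3_avg - t.cu2_avg AS decimal(9, 2))                          AS gain_mb_s,
    CAST(100.0 * (t.cu3_avg - t.cu2_avg) / t.cu2_avg AS decimal(9, 2))    AS gain_pct,
    CAST(t.cu2_var / t.cu3_var AS decimal(9, 2))                          AS variance_ratio,
    CAST(100.0 * (t.cu2_std - t.cu3_std) / t.cu2_std AS decimal(9, 1))    AS stdev_reduction_pct
FROM (VALUES
    ('Backup',  562.90, 565.02, 105.59, 17.42, 10.28, 4.17),
    ('Restore', 465.01, 465.31,  40.06, 21.62,  6.33, 4.65)
) AS t (test_name, cu2_avg, cu3_avg, cu2_var, cu3_var, cu2_std, cu3_std);
```

For backup this should give a gain of about 2.12 MB/s (0.38%), a variance ratio of about 6.06 and a standard deviation about 59% lower. For restore it should give about 0.30 MB/s (0.06%), a variance ratio of about 1.85 and a standard deviation about 26.5% lower.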


I/O waits test

In this test we measured the PAGEIOLATCH% wait types.
We used the following procedure to test the backup statement.
For the restore test, we replaced the backup statement with the restore one.

DROP TABLE IF EXISTS WaitStats_before;
DROP TABLE IF EXISTS WaitStats_after;

SELECT *
INTO WaitStats_before
FROM sys.dm_os_wait_stats;

BACKUP DATABASE TestDb TO DISK = 'C:\temp\TestDb_CU3.bak' WITH INIT, STATS = 5;

SELECT *
INTO WaitStats_after
FROM sys.dm_os_wait_stats;

SELECT
    a.wait_type,
    a.wait_time_ms - b.wait_time_ms AS delta_wait_ms,
    a.waiting_tasks_count - b.waiting_tasks_count AS delta_tasks
FROM WaitStats_after a
JOIN WaitStats_before b
    ON a.wait_type = b.wait_type
WHERE a.wait_time_ms > b.wait_time_ms
AND a.wait_type LIKE 'PAGEIOLATCH%'
ORDER BY delta_wait_ms DESC;
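
One of our expectations was a lower average wait, and the same two snapshots can give that number too. A minimal sketch, assuming the WaitStats_before and WaitStats_after tables created above still exist; avg_wait_ms and delta_signal_ms are illustrative column names:

```sql
-- Average wait per recorded wait during the measured operation,
-- computed from the before/after snapshots of sys.dm_os_wait_stats.
SELECT
    a.wait_type,
    a.wait_time_ms - b.wait_time_ms                 AS delta_wait_ms,
    a.waiting_tasks_count - b.waiting_tasks_count   AS delta_tasks,
    (a.wait_time_ms - b.wait_time_ms) * 1.0
        / NULLIF(a.waiting_tasks_count - b.waiting_tasks_count, 0) AS avg_wait_ms,
    a.signal_wait_time_ms - b.signal_wait_time_ms   AS delta_signal_ms  -- time spent waiting for CPU after the I/O completed
FROM WaitStats_after AS a
JOIN WaitStats_before AS b
    ON a.wait_type = b.wait_type
WHERE a.wait_type LIKE 'PAGEIOLATCH%'
  AND a.waiting_tasks_count > b.waiting_tasks_count
ORDER BY delta_wait_ms DESC;
```

Keep in mind that sys.dm_os_wait_stats is instance-wide, so any other activity running during the test ends up in the same deltas. A quiet instance gives cleaner numbers.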

🠆 Backup:


We analyzed wait stats during backups, focusing on PAGEIOLATCH_UP and PAGEIOLATCH_EX waits — these reflect typical page I/O operations.

We looked at two key metrics:

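  • delta_wait_ms → wait time accumulated during the operation (after minus before)
  • delta_tasks → number of waits registered during the operation (the increase in waiting_tasks_count), treated here as the number of threads waiting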

CU2
  • delta_wait_ms: 1–3 ms (most frequent 2–3 ms)
  • delta_tasks: 1–15 (peaks between 10–13 threads)
  • Wait types:
    • Predominant: PAGEIOLATCH_UP
    • Minor: PAGEIOLATCH_EX (sporadic exclusive latches)

Interpretation:

  • Waits are fairly regular and manageable
  • Some peaks (delta_tasks = 15) indicate brief I/O contention
  • No extreme outliers → I/O is stable

CU3
  • delta_wait_ms: 1–3 ms (similar to CU2)
  • delta_tasks: 1–13 (slightly more uniform than CU2)
  • Wait types:
    • Predominant: PAGEIOLATCH_UP
    • Minor: PAGEIOLATCH_EX

Interpretation:

  • CU3 shows slightly more uniform thread wait distribution
  • No extreme peaks → better management of thread contention
  • PAGEIOLATCH_UP remains consistent without performance degradation


Comparison CU2 vs CU3:

Key Metric              CU2       CU3              Notes
Min delta_wait_ms       1 ms      1 ms             identical
Max delta_wait_ms       3 ms      3 ms             identical
Min delta_tasks         1         1                identical
Max delta_tasks         15        13               CU3 shows slightly fewer peaks
Wait type distribution  UP + EX   UP + EX          similar
Stability               good      slightly better  CU3 more uniform


Overall Analysis

I/O Performance
  • Delta wait times are nearly identical → backup speed is unaffected
  • Number of waiting threads slightly more stable in CU3 → lower contention

Stability

  • CU3 avoids the highest thread peaks (13 vs. 15 in CU2)
  • More uniform distribution → better predictability

Finally:

  • Backup speed remains consistent
  • CU3 reduces jitter → SLA compliance more reliable
  • No extreme outliers → confirms improved I/O management vs CU2


🠆 Restore

While the backup command showed only a limited improvement, the restore results surprised me!

We analyzed wait stats during database restores, focusing on PAGEIOLATCH_SH, PAGEIOLATCH_UP, and PAGEIOLATCH_EX waits — these reflect typical page I/O operations under high load.

Two key metrics were considered:

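  • delta_wait_ms → wait time accumulated during the operation (after minus before)
  • delta_tasks → number of waits registered during the operation (the increase in waiting_tasks_count), treated here as the number of threads waiting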


CU2
  • delta_wait_ms: values range from 1 ms up to 98 ms, with the majority between 6–32 ms
  • delta_tasks: values range from 28 to 340 threads, with frequent peaks around 30–102 threads
  • Wait types:
    • Predominant: PAGEIOLATCH_SH and PAGEIOLATCH_UP
    • Minor: PAGEIOLATCH_EX (sporadic exclusive latches)

Interpretation:

  • Waits are highly variable, with occasional extreme peaks
  • Significant spikes (delta_tasks = 340, delta_wait_ms = 98) indicate heavy I/O contention
  • Restore performance may be temporarily throttled by these extreme peaks
  • Overall, CU2 shows less predictable I/O behavior under restore load


CU3
  • delta_wait_ms: values range from 1 ms up to 34 ms, most frequent between 6–30 ms
  • delta_tasks: values range from 3 to 112 threads, generally more uniform
  • Wait types:
    • Predominant: PAGEIOLATCH_SH and PAGEIOLATCH_UP
    • Minor: PAGEIOLATCH_EX

Interpretation:

  • CU3 reduces extreme wait times and thread contention peaks significantly
  • No delta_wait_ms exceeds 34 ms, and delta_tasks peaks at 112 → much smoother I/O behavior
  • Wait distribution is more uniform → restores are more predictable and stable
  • PAGEIOLATCH_SH and UP remain consistent without causing performance degradation 


Comparison CU2 vs CU3:

Key Metric              CU2              CU3           Notes
Min delta_wait_ms       1 ms             1 ms          identical
Max delta_wait_ms       98 ms            34 ms         CU3 significantly lower → reduced peak waits
Min delta_tasks         28               3             CU3 lower minimum, less thread contention
Max delta_tasks         340              112           CU3 shows much lower peak tasks → improved stability
Wait type distribution  SH + UP + EX     SH + UP + EX  similar, but CU3 peaks more controlled
Stability               highly variable  much better   CU3 reduces extreme contention spikes


Overall Analysis

I/O Performance:

  • CU3 dramatically reduces peak wait times and the number of threads waiting → restore is smoother and more predictable, even though average throughput is essentially unchanged

Stability:

  • CU3 avoids extreme spikes seen in CU2
  • Wait distribution is more uniform → better SLA predictability

Impact:

  • Restores are smoother and less prone to bottlenecks under CU3
  • No extreme outliers → overall I/O management is improved


📈 Conclusion

The “fix” introduced in CU3 appears to work as intended: there is less contention, shorter wait times, and more stable throughput. The data confirm both a reduction in latch spikes and a more uniform management of waiting threads.


🧠 Final Thoughts

This is a classic example where performance improvements are not about raw hardware speed, but about removing internal inefficiencies. Fix 4836855 targets exactly that.

👉 Hope you liked this post. I will be happy to read your suggestions and your comments. 

Stay tuned for the next posts!


Luca Biondi @2025
