What is CPU starvation in SQL Server?

In SQL Server, CPU starvation is a scheduler fairness problem where workers fail to yield CPU fairly, causing runnable queues to grow while other requests wait excessively. This occurs even when total CPU usage is not fully saturated.

Why does SOS_SCHEDULER_YIELD occur in SQL Server?

In SQL Server, SOS_SCHEDULER_YIELD occurs when a worker voluntarily releases CPU after consuming its scheduling quantum. High values are normal under CPU activity, but persistent runnable queue growth indicates scheduler pressure or starvation.

How do hash indexes affect CPU starvation in SQL Server?

In SQL Server, hash indexes can worsen CPU starvation when bucket_count sizing is incorrect and collision chains grow excessively. Large hash chains increase garbage collection traversal costs and reduce scheduler fairness under heavy Hekaton workloads.

What does sys.dm_db_xtp_hash_index_stats show in SQL Server?

In SQL Server, sys.dm_db_xtp_hash_index_stats is used to analyze hash index bucket distribution, collision levels, average chain length, and maximum chain length. This DMV helps diagnose inefficient Hekaton hash index sizing and garbage collection pressure.

Why can SQL Server be slow even when CPU usage is low?

In SQL Server, low average CPU usage can still hide scheduler unfairness and worker starvation. Runnable queues may continuously grow while specific workers monopolize schedulers, creating latency spikes, login delays, and intermittent timeouts.

What did SQL Server CU5 fix for In-Memory OLTP?

In SQL Server, CU5 fixes scheduler fairness issues during In-Memory OLTP garbage collection scans by adding periodic yields. This reduces CPU starvation risks caused by long-running hash index traversal operations under high concurrency workloads.

SQL SERVER. A deep analysis on CPU Starvation

Why a server at 40% CPU can still behave like a completely saturated system 💣

Deep dive into SOS Scheduler fairness, Hekaton garbage collection, hash index scans, and why SQL Server CU5 matters far more than most DBAs think.

Before we dive into today's topic, if you missed my previous post you can take a look at SQL SERVER CU5: Fixes CPU starvation issues

👉 If you found this deep-dive helpful, feel free to check out the ads—your support helps me keep creating high-quality SQL Server content for the community.

🧠 TL;DR

✔️ SQL Server uses cooperative scheduling through SOS Scheduler and workers must voluntarily yield CPU

✔️ SQL Server CU5 improves scheduler fairness during In-Memory OLTP hash index garbage collection scans 🚀

✔️ CPU starvation can occur even when total CPU usage looks moderate 💣

✔️ Bad hash index bucket sizing and large version chains can dramatically increase GC traversal costs ⚡

✔️ Persistent runnable_tasks_count growth is often more dangerous than raw CPU percentage 🔍

🚀 The Hook

In this post, I’ll show you why CPU starvation is one of the most misunderstood performance problems in SQL Server.

⚡ More importantly, you will understand why SQL Server CU5 quietly fixes a potentially catastrophic scheduler fairness problem inside In-Memory OLTP garbage collection scans.

💣 CPU starvation is one of the few SQL Server problems where average CPU usage can completely hide catastrophic scheduler unfairness.

⚡ A server at 40% CPU can still behave like a server at 100% CPU when scheduler fairness collapses.

Hi SQL SERVER Guys and Gals,

Many DBAs monitor CPU percentage.

Very few monitor scheduler fairness.

And that is exactly where some of the nastiest SQL Server production problems hide.

The CU5 fix around In-Memory OLTP garbage collection is important because it touches the heart of SQL Server execution internals:

⚡ SOS Scheduler
⚡ Cooperative scheduling
⚡ Worker yielding
⚡ Runnable queues
⚡ Scheduler monopolization

🧠 What Cooperative Scheduling Really Means

SQL Server does not rely on Windows preemptive scheduling alone.

Internally, SQL Server uses cooperative scheduling through SOS Scheduler.

Workers are expected to voluntarily yield execution periodically.

If an internal task runs too long without yielding, runnable queues grow and other workers experience starvation.

⚡ Key concepts every serious DBA should understand:

SOS Scheduler → SQL Server internal scheduler abstraction
Worker → execution entity assigned to tasks
Quantum → execution time slice
Yield → voluntary CPU release
Runnable Queue → waiting workers ready to execute
Scheduler Fairness → balanced CPU access across workers

💣 The problem starts when some workers fail to yield frequently enough.

💣 Why Hash Indexes Can Make Things Worse

In-Memory OLTP hash indexes are extremely fast for point lookups.

But internally they rely on buckets and hash chains.

Under bad conditions, these chains become expensive to traverse.

⚡ Incorrect bucket_count
⚡ Heavy delete workloads
⚡ Skewed key distribution
⚡ High collision rates
⚡ Massive row version churn

Large hash chains increase garbage collection traversal costs.

Under high churn workloads, GC workers may spend excessive time scanning version chains and hash buckets.

⚡ That is where scheduler monopolization begins.

🧠 Garbage Collection Internals

This is where things become very interesting under the hood.

In-Memory OLTP uses:

⚡ Row versioning
⚡ Timestamp visibility
⚡ Version chains
⚡ Epoch-based garbage collection
⚡ Memory residency

In-Memory OLTP does not immediately remove deleted rows.

Older row versions remain visible for active transactions until epoch advancement allows garbage collection cleanup.

💣 This means garbage collection is not optional.

It is a core architectural requirement of Hekaton.

And under heavy workloads, garbage collection can become extremely CPU-intensive.

🔍 DIAGNOSIS

Start by checking scheduler pressure.


-- 🔍 Scheduler Pressure Diagnostic Query

SELECT 
    scheduler_id,
    cpu_id,
    current_tasks_count,
    runnable_tasks_count,
    active_workers_count,
    pending_disk_io_count,
    load_factor
FROM sys.dm_os_schedulers
WHERE status = 'VISIBLE ONLINE';

⚡ Persistent runnable_tasks_count greater than zero is suspicious.

💣 Especially on systems using Hekaton workloads.

🧪 QUERY

Now inspect SOS_SCHEDULER_YIELD waits.


-- 🔍 Top Scheduler Waits

SELECT 
    wait_type,
    waiting_tasks_count,
    wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = 'SOS_SCHEDULER_YIELD';

⚡ Important:

Healthy yielding is NORMAL
SOS_SCHEDULER_YIELD alone is NOT automatically bad
Runnable queue growth is the real danger signal

💣 The issue appears when workers fail to yield fairly while runnable queues continue growing.

🧪 Hash Index Statistics (THIS DMV IS GOLD)


-- 🧪 Hash Index Statistics

SELECT 
    OBJECT_NAME(object_id) AS table_name,
    name,
    total_bucket_count,
    empty_bucket_count,
    avg_chain_length,
    max_chain_length
FROM sys.dm_db_xtp_hash_index_stats;

⚡ This DMV can reveal:

Bad bucket sizing
Excessive collisions
Long traversal chains
Potential GC traversal amplification

🧠 CPU Saturation vs CPU Starvation

CPU Saturation	CPU Starvation
CPU genuinely full	Scheduler fairness collapse
High query CPU cost	Worker monopolization
Parallelism pressure	Yield imbalance
System busy	System unfair

💣 Real Symptoms DBAs Actually See

⚡ CPU not at 100% but system extremely slow
⚡ Random latency spikes
⚡ Login delays
⚡ AG synchronization lag
⚡ THREADPOOL waits
⚡ Query timeouts
⚡ Runnable queue spikes
⚡ Intermittent stalls

💣 Many DBAs misdiagnose these as generic CPU issues.

In reality, the problem may be scheduler fairness collapse.

⚡ NUMA and Scheduler Imbalance

Starvation often does NOT affect all schedulers equally.

This becomes especially visible on:

NUMA systems
Soft-NUMA environments
Large core-count servers

⚡ Some schedulers may become “hot” while others remain underutilized.

This creates:

Uneven worker distribution
Localized scheduler pressure
Artificial latency spikes

🚀 My REAL Strategy

In real production systems, I NEVER evaluate only raw CPU percentage.

⚡ I correlate:

SOS_SCHEDULER_YIELD waits
Runnable queue growth
THREADPOOL waits
Hekaton workload intensity
Hash index chain statistics
NUMA imbalance
Latency spikes

💣 If runnable queues stay elevated for sustained periods, I immediately suspect scheduler unfairness.

🚀 My recommendations:

Apply CU5 if you use In-Memory OLTP heavily
Continuously monitor runnable_tasks_count
Validate hash index bucket sizing
Avoid oversized bucket_count configurations
Monitor version chain growth
Watch for scheduler imbalance across NUMA nodes

📊 TAKEAWAY

SQL Server CU5 is not “just another cumulative update.”

⚡ It directly improves scheduler fairness inside one of the most CPU-sensitive areas of the SQL Server Engine.

For latency-sensitive systems such as:

Trading platforms
Telemetry ingestion
IoT pipelines
Real-time APIs
Microservices backends

💣 Even small starvation bursts can destroy P99 latency.

And that is exactly why understanding SOS Scheduler internals is no longer optional for serious SQL Server performance troubleshooting.

📚 Official References

📢 Support the Blog: Did you find this deep-dive helpful? The ads you see here are selected to reflect your interests. If a partner's offer catches your eye, give it a look! Your engagement helps me continue publishing premium SQL Server content for the community.

Biondi Luca @2026 - Sharing over 25 years of Gained Knowledge for Passion. Share if you like my posts!

Search This Blog

SQL Server Performance & Troubleshooting – Where Milliseconds Matter 🚀

SQL SERVER CPU Starvation: When 40% CPU Behaves Like 100%

SQL SERVER. A deep analysis on CPU Starvation

🧠 TL;DR

🚀 The Hook

Hi SQL SERVER Guys and Gals,

🧠 What Cooperative Scheduling Really Means

💣 Why Hash Indexes Can Make Things Worse

🧠 Garbage Collection Internals

🔍 DIAGNOSIS

🧪 QUERY

🧪 Hash Index Statistics (THIS DMV IS GOLD)

🧠 CPU Saturation vs CPU Starvation

💣 Real Symptoms DBAs Actually See

⚡ NUMA and Scheduler Imbalance

🚀 My REAL Strategy

📊 TAKEAWAY

📚 Official References

Performance Tuning Knowledge Hub

🚀 45-Seconds DBA

🔍 Tuning & Indexing

🤖 SQL 2025 & AI

🧠 Internals

Comments

Post a Comment

I Post più popolari

SQL Server, Sub-Queries vs. Cross Apply

Speaking to Sql Server, sniffing the TDS protocol

SQL Server, find text in a Trigger, Stored Procedures, View and Function. Two ways and what ways is better

SQL SERVER CPU Starvation: When 40% CPU Behaves Like 100%

SQL SERVER. A deep analysis on CPU Starvation

🧠 TL;DR

🚀 The Hook

Hi SQL SERVER Guys and Gals,

🧠 What Cooperative Scheduling Really Means

💣 Why Hash Indexes Can Make Things Worse

🧠 Garbage Collection Internals

🔍 DIAGNOSIS

🧪 QUERY

🧪 Hash Index Statistics (THIS DMV IS GOLD)

🧠 CPU Saturation vs CPU Starvation

💣 Real Symptoms DBAs Actually See

⚡ NUMA and Scheduler Imbalance

🚀 My REAL Strategy

📊 TAKEAWAY

📚 Official References

Related Posts

Performance Tuning Knowledge Hub

🚀 45-Seconds DBA

🔍 Tuning & Indexing

🤖 SQL 2025 & AI

🧠 Internals

Comments

Post a Comment

I Post più popolari