Check Wait Stats in 45 Seconds. From Symptoms to Root Clause. The "45 Seconds DBA Series" – What Real DBAs Check First | Part 9


Check Wait Stats in 45 Seconds

From Symptoms to Root Cause. The "45 Seconds DBA Series" – Part 9 🥇

Before we dive into today's topic, if you missed my previous post you can take a look at A recap of my April 2026 posts.

👉 If you found this deep-dive helpful, feel free to check out the ads—your support helps me keep creating high-quality SQL Server content for the community.

If you aren't looking at Wait Stats, you aren't tuning performance—you're guessing. In this post, I will show you how to cut through the noise and identify exactly what is throttling your SQL Server in under 45 seconds.

✔️ Wait Stats = The Truth: They tell you why a thread is not moving. 🛠️

✔️ Resource vs. Signal: Distinguish between waiting for a disk (Resource) and waiting for a CPU (Signal). 📦

✔️ Filter the Noise: Real DBAs ignore background tasks like Lazy Writer and Checkpoint to focus on user impact. 🚀

✔️ Baseline is King: Accumulative stats mean nothing without knowing what's "normal" for your workload. ✔️

Hi SQL SERVER Guys,

When a user complains that "the database is slow," the worst thing you can do is start changing indexes or adding RAM blindly. SQL Server is a "Wait-Based" engine. Every time a request can't proceed, it records exactly why it stopped. Whether it's waiting for a lock, a data page from disk, or a CPU scheduler, the answers are already there. Let's find the truth in 45 seconds...

What It Really Is: The Waiting Game

Every request in SQL Server goes through three states: Running, Runnable, and Suspended. Wait stats are generated when a thread moves into the Suspended state because it needs a resource (like a page lock) that isn't immediately available.

💣 The Problem:

Standard monitoring often gives you a "Top Waits" list that includes background noise. If you see LAZYWRITER_SLEEP or LOGMGR_QUEUE at the top, you might panic for no reason. These are benign. Real pressure comes from:

  • PAGEIOLATCH_XX: Your storage subsystem can't keep up with the I/O demand.
  • SOS_SCHEDULER_YIELD: Your queries are burning through their 4ms quantum and waiting for a CPU turn.
  • LCK_M_X: Blocking is killing your concurrency.

🧪 The 45-Second Diagnostic Query

This script excludes the "benign" waits and calculates the percentage of impact for each wait type. It’s the first script I run on any server I’m called to troubleshoot.

-- 🔍 Global Wait Stats Diagnostic Query
WITH [Waits] AS (
    SELECT
        [wait_type],
        [wait_time_ms] / 1000.0 AS [WaitS],
        ([wait_time_ms] - [signal_wait_time_ms]) / 1000.0 AS [ResourceS],
        [signal_wait_time_ms] / 1000.0 AS [SignalS],
        [waiting_tasks_count] AS [WaitCount],
        100.0 * [wait_time_ms] / SUM([wait_time_ms]) OVER() AS [Percentage]
    FROM sys.dm_os_wait_stats
    WHERE [wait_type] NOT IN (
        'CLR_SEMAPHORE', 'LAZYWRITER_SLEEP', 'OLD_RESOURCES_CHECK',
        'SQLTRACE_BUFFER_FLUSH', 'SLEEP_TASK', 'CHECKPOINT_QUEUE',
        'XE_TIMER_EVENT', 'REQUEST_FOR_DEADLOCK_SEARCH')
)
SELECT TOP 10 * FROM [Waits]
WHERE [WaitS] > 0
ORDER BY [WaitS] DESC;
    

For detailed definitions of each wait type, refer to the official Microsoft Documentation.

🚀 My REAL Strategy

In my 25 years of experience, I’ve seen DBAs make the mistake of clearing wait stats (DBCC SQLPERF) just to "see what's happening now." Don't do it. You lose your historical context.

Instead, follow my "Signal vs Resource" rule:

  • 🧠 High Signal Waits (>20%): This is a clear indicator of CPU pressure. Your threads are ready to run but the CPU is too busy to pick them up. Don't buy more cores; optimize your code to reduce context switching.
  • 🧠 High Resource Waits: Focus on the specific resource. If PAGEIOLATCH_SH is your top wait, check your Top Logical Reads. High I/O waits are often just poor indexing forcing massive scans into memory.
  • 🧠 CXPACKET is a Symptom: If you see CXPACKET, don't just disable parallelism. Check for CXCONSUMER. If CXPACKET is high alongside LATCH_XX, you likely have skewed data distribution in your parallel threads.

📢 Support the Blog: Did you find this deep-dive helpful? The ads you see here are selected to reflect your interests. If a partner's offer catches your eye, give it a look! Your engagement helps me continue publishing premium SQL Server content for the community.

Biondi Luca @2026 - Sharing over 25 years of Gained Knowledge for Passion. Share if you like my posts!

Comments

I Post più popolari

Speaking to Sql Server, sniffing the TDS protocol

SQL Server, find text in a Trigger, Stored Procedures, View and Function. Two ways and what ways is better

SQL Server, Avoid that damn Table Spool!