Check IO Bottlenecks in 45 Seconds. The "45 Seconds DBA Series" – What Real DBAs Check First | Part 5

Before you dive into today's topic, if you missed my previous post you can take a look at 



Check IO Bottlenecks in 45 Seconds. The "45 Seconds DBA Series" – What Real DBAs Check First | Part 5 🥇

In this post, I’ll show you how to definitively identify SQL Server I/O bottlenecks in under 45 seconds using targeted DMVs. Stop guessing between latency and throughput, and start fixing the real storage queues!

TL;DR

✔️ Latency vs Throughput: Latency is speed (ms), Throughput is volume (MB/s). Know the difference! 🧠
✔️ sys.dm_io_virtual_file_stats: The ultimate DMV to check database file read/write latency 🛠️
✔️ sys.dm_os_wait_stats: Exposes global IO waits like PAGEIOLATCH_SH and WRITELOG 💣
✔️ The 20ms Rule: Consistently > 20ms latency on data files means you have a serious storage bottleneck 🚨

Hi SQL SERVER Guys,

We all know how important it is to have performing environments. Storage is usually the slowest component in any database architecture. When your queries are lagging, memory looks fine, and the CPU isn't maxed out, your disks are likely crying for help. Let's dive into diagnosing I/O pressure instantly!

🧠 What It Really Is: Latency vs. Throughput

Before we run any scripts, we need to stop confusing these two terms:

  • 🏎️ Latency: How fast a single I/O operation completes. Measured in milliseconds (ms). If it takes 50ms to read a page, your query is stalling.
  • 🌊 Throughput: How much data can be pushed through the pipe per second. Measured in MB/s or GB/s.

💣 The trap: Your storage array might be pushing 2 GB/s of throughput (looking great on paper!), but if the latency for each individual 8KB page read is 40ms, SQL Server will feel incredibly slow. This is why you must measure latency to feel the pain.

🛠️ The Essential DMVs

To measure this in 45 seconds, we rely on two critical Dynamic Management Views:

  • 1. sys.dm_io_virtual_file_stats:
    What it really is: This DMV tracks the total number of I/O operations and the total time spent waiting for those operations, broken down by individual database files (MDF, NDF, LDF). It's the gold standard for finding exactly which drive or file is bottlenecked.
  • 2. sys.dm_os_wait_stats:
    What it really is: This DMV shows what SQL Server threads are waiting on globally. If you see high waits for PAGEIOLATCH_SH (reading data from disk to memory) or WRITELOG (writing to the transaction log), you have systemic I/O pressure.

⚡ The 45-Second I/O Latency Check

Run this script to calculate the average read and write latency (in ms) for every file in your instance since the last restart:

SELECT 
    DB_NAME(vfs.database_id) AS [Database Name],
    mf.physical_name AS [Physical File],
    vfs.num_of_reads,
    (vfs.io_stall_read_ms / NULLIF(vfs.num_of_reads, 0)) AS [Avg Read Latency (ms)],
    vfs.num_of_writes,
    (vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0)) AS [Avg Write Latency (ms)]
FROM 
    sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf 
    ON vfs.database_id = mf.database_id AND vfs.file_id = mf.file_id
ORDER BY 
    [Avg Read Latency (ms)] DESC;

✔️ The Goal: You want these numbers under 5-10ms for solid state storage (SSD/NVMe).
💣 The Red Line: If you are consistently seeing numbers > 20ms, your storage is a massive bottleneck.

 The Wait Stats Check (sys.dm_os_wait_stats)

This DMV shows what SQL Server threads are waiting on globally. If you see high waits for PAGEIOLATCH_SH (reading data from disk to memory) or WRITELOG (writing to the transaction log), you have systemic I/O pressure.

Run this script to isolate and identify your top I/O related wait types:

SELECT 
    wait_type, 
    wait_time_ms / 1000.0 AS [Wait Time (Seconds)],
    waiting_tasks_count AS [Wait Count],
    (wait_time_ms / NULLIF(waiting_tasks_count, 0)) AS [Avg Wait Time (ms)]
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'PAGEIOLATCH_%' 
   OR wait_type = 'WRITELOG'
   OR wait_type = 'ASYNC_IO_COMPLETION'
ORDER BY wait_time_ms DESC;

🚀 My REAL Strategy

In my experience, DBAs often blame the SAN, the cloud provider, or the storage team immediately when they see high latency. Stop! Before you do that, check your queries! 🧪

A missing index causing a massive table scan will artificially inflate your I/O latency because it overwhelms the storage array with millions of unnecessary read requests. The storage isn't slow; it's just being asked to do too much garbage work. Always tune the query first, fix your indexes to reduce logical reads, and then if the latency is still high, go fight with the hardware team.

📢 Support the Blog: Did you find this deep-dive helpful? The ads you see here are selected to reflect your interests. If a partner's offer catches your eye, give it a look! Your engagement helps me continue publishing premium SQL Server content for the community.

Biondi Luca @2026 - Sharing over 25 years of Gained Knowledge for Passion. Share if you like my posts!

Comments

I Post più popolari

Speaking to Sql Server, sniffing the TDS protocol

SQL Server, find text in a Trigger, Stored Procedures, View and Function. Two ways and what ways is better

SQL Server, Avoid that damn Table Spool!