Check TempDB Health in 45 Seconds. The "45 Seconds DBA Series" – What Real DBAs Check First | Part 1 🥇

Before we dive into today's topic: if you missed my previous post, take a look at SQL Server 2025 CU4: Are Your Queries Lying to You? The Hidden Full-Text Search Trap (Test Day 5) 👉
If you found this deep-dive helpful, feel free to check out the ads—your support helps me keep creating high-quality SQL Server content for the community.

Check TempDB Health in 45 Seconds

In this post, I’ll show you how to detect TempDB pressure in under 45 seconds and—more importantly—how to interpret it correctly.
Because identifying the real bottleneck is what separates reactive DBAs from performance engineers.


⚡ TL;DR

✔️ TempDB pressure is a symptom, not the root cause 💣
✔️ InternalObjectsMB reveals memory and plan issues 🧠
✔️ PAGELATCH waits = concurrency bottlenecks ⚠️
✔️ Spills are silent killers of performance 💥


Hi SQL Server folks,

Welcome to the first part of the 45 Seconds DBA Series: a collection of essential, high-impact posts that every DBA should have within reach. Critical insights, designed for quick reference.

When your server is slow, but CPU and IO metrics look fine. When execution plans look acceptable, yet something is still off. When queries are unstable—fast one moment and crawling the next—and users are complaining even though you don’t see anything obvious...

This is where many DBAs waste hours.

👉 Remember: TempDB is often involved, but it is almost never the root cause.


🧠 First: Change Your Mindset

TempDB is NOT a “service database”.

👉 It is the hidden operational engine of SQL Server.

  • Intermediate operations (SORT, HASH, SPOOL)
  • Memory pressure handling
  • Row versioning (Snapshot Isolation)
  • Temporary structures

💣 TempDB is where SQL Server compensates when something is inefficient.

👉 If TempDB is under pressure… the real issue is usually elsewhere.


🧪 Step 1 — Understand HOW TempDB is Used

SELECT 
    SUM(user_object_reserved_page_count) * 8 / 1024.0 AS UserObjectsMB,
    SUM(internal_object_reserved_page_count) * 8 / 1024.0 AS InternalObjectsMB,
    SUM(version_store_reserved_page_count) * 8 / 1024.0 AS VersionStoreMB,
    SUM(unallocated_extent_page_count) * 8 / 1024.0 AS FreeSpaceMB
FROM sys.dm_db_file_space_usage;

This is a simple yet fundamental DMV query, because it shows:


✔️ UserObjectsMB

This metric represents the total disk space (in MB) occupied by objects explicitly created by users or applications within a session.

While most DBAs think only of #temp tables and table variables, there is a common oversight here: indexes.

Keep in mind: If a developer creates a 10MB temporary table but applies five heavy indexes to it, this number will skyrocket!  🚀

To be more precise: it includes the base heap/clustered index plus all non-clustered indexes and statistics associated with those user objects.
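To make that concrete, here is a minimal sketch (the table and index names are invented for illustration) of how a modest #temp table can multiply its TempDB footprint:

```sql
-- Hypothetical example: a small #temp table whose indexes inflate UserObjectsMB.
CREATE TABLE #Orders (OrderID int, CustomerID int, OrderDate datetime, Amount money);
-- ...imagine ~10MB of rows loaded here...

-- Each index below reserves ADDITIONAL user-object pages in TempDB:
CREATE CLUSTERED INDEX CX_Orders ON #Orders (OrderID);
CREATE NONCLUSTERED INDEX IX_Cust ON #Orders (CustomerID);
CREATE NONCLUSTERED INDEX IX_Date ON #Orders (OrderDate) INCLUDE (Amount);

-- UserObjectsMB now counts the clustered index PLUS every nonclustered index.
DROP TABLE #Orders;  -- an explicit drop releases the space immediately
```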


  💣 My Senior Insight

  A high value here is a "smoking gun" for application or ETL logic issues. It means your code is either materializing too much data or failing to drop large temporary structures, leading to unnecessary disk pressure.

  👉 So: high value = application or ETL logic issue.


✔️ InternalObjectsMB 

This is your key metric. InternalObjectsMB represents the space SQL Server claims in TempDB because it has run out of workspace memory (RAM).

While "bad cardinality estimates" is a common term, let’s be more precise: this value is the definitive signal that the Memory Grant assigned to the query was too small.

This is called the "Liar" scenario: SQL Server requests 50MB based on a flawed estimate, but the actual workload requires 500MB. That massive 450MB delta doesn't just disappear—it gets dumped right here into TempDB.

This category includes:

- Sorts: ORDER BY operations that can't fit in the granted memory.
- Hash Joins & Aggregates: Large joins or GROUP BY operations.
- Spools & Worktables: Intermediate storage used during query execution.


👉 If this value grows:         
  • Memory is insufficient: Workspace memory (which funds query grants) is under pressure.
  • Bad Cardinality Estimates: The optimizer is guessing wrong because of stale statistics.           
  • Poor Execution Plans: The engine is choosing expensive operators that force disk spills.
💥 This is where performance dies because you are swapping nanosecond-speed RAM for millisecond-speed Disk I/O.
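A quick way to watch this live is the memory-grants DMV. This is a sketch; the interpretation to take away is that when used_memory_kb is pinned at granted_memory_kb, the operator has likely already started spilling:

```sql
-- Sessions currently holding (or waiting for) a workspace memory grant.
SELECT
    session_id,
    requested_memory_kb,   -- what the query asked for, based on its estimates
    granted_memory_kb,     -- what it actually got
    used_memory_kb,        -- what it is consuming right now
    ideal_memory_kb        -- what the optimizer would have liked with perfect info
FROM sys.dm_exec_query_memory_grants
ORDER BY granted_memory_kb DESC;
```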

⚠️ VersionStoreMB

If this value refuses to drop, the version store cleanup task is likely blocked by a "ghost" or orphaned transaction (e.g., a BEGIN TRAN without a COMMIT in a forgotten session). This is the classic reason why TempDB hits 100% capacity even when there is no active query load.


            Snapshot Isolation & RCSI.
            👉 Growth = long-running transactions + cleanup lag.
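One way to hunt that culprit (a sketch, assuming snapshot-based isolation is in play) is to list the snapshot transactions still pinning the version store, oldest first:

```sql
-- Snapshot transactions ordered by age: the top row is usually your "ghost".
SELECT
    ast.session_id,
    ast.transaction_id,
    ast.elapsed_time_seconds,
    s.login_name,
    s.host_name
FROM sys.dm_tran_active_snapshot_database_transactions AS ast
JOIN sys.dm_exec_sessions AS s
    ON s.session_id = ast.session_id
ORDER BY ast.elapsed_time_seconds DESC;
```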


🚨 FreeSpaceMB

Never forget that Autogrow in TempDB is a serial operation! 
If you have 8 data files and one needs to grow, the entire system suffers a "micro-freeze" or sub-second stall. In high-concurrency environments where every millisecond counts, these stalls are performance killers.


            👉 Low free space = autogrowth → latency spikes.
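You can audit your current layout in seconds; uneven file sizes or percent-based growth on TempDB files are the usual suspects:

```sql
-- TempDB file layout: equal sizes and fixed-MB growth avoid autogrow stalls.
SELECT
    name,
    type_desc,
    size * 8 / 1024 AS SizeMB,
    CASE WHEN is_percent_growth = 1
         THEN CAST(growth AS varchar(10)) + ' %'
         ELSE CAST(growth * 8 / 1024 AS varchar(10)) + ' MB'
    END AS GrowthSetting
FROM tempdb.sys.database_files;
```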

🧪 Step 2 — Contention (the silent bottleneck)

SELECT 
    wait_type, 
    waiting_tasks_count, 
    wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'PAGELATCH_%'
ORDER BY wait_time_ms DESC;

When you run the query above and see PAGELATCH_UP or PAGELATCH_EX at the top of your wait stats, your first instinct might be to blame the storage subsystem. Hey, stop right there... these are NOT disk waits. If you had a disk problem, you would see PAGEIOLATCH (notice the IO).

A PAGELATCH wait happens entirely in memory: you are experiencing contention on in-memory page structures.

🔍 What is actually happening?

Imagine TempDB as a high-speed motorway. To get onto the motorway, you need to pass through a toll booth (these are the internal metadata pages like PFS, GAM, and SGAM that track where space is free).

If you have 64 CPU cores trying to cram data into TempDB at the same time, but only one "toll booth" (one metadata page), the CPUs start queuing up. They aren't waiting for the disk; they are waiting for their turn to update the map of TempDB in RAM.
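You can catch this queue in the act. The sketch below filters live waiting tasks down to TempDB (database_id 2); the page-number mapping in the comment is the classic rule of thumb:

```sql
-- Tasks waiting RIGHT NOW on latches for TempDB pages.
-- resource_description '2:1:1' = PFS, '2:1:2' = GAM, '2:1:3' = SGAM
-- (PFS pages also repeat every 8088 pages within each file).
SELECT
    wt.session_id,
    wt.wait_type,
    wt.wait_duration_ms,
    wt.resource_description
FROM sys.dm_os_waiting_tasks AS wt
WHERE wt.wait_type LIKE 'PAGELATCH_%'
  AND wt.resource_description LIKE '2:%';
```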

💣 Why you have Contention:

  • Too many concurrent allocations: Your workload is spawning dozens of threads that all want to create #temp tables at the exact same millisecond.

  • Hot Pages: Multiple sessions are fighting over the same data page in memory.

  • Bad TempDB Configuration: You haven't followed the "Pro-Rule" of having multiple data files.
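If configuration is the problem, the usual mitigation is multiple, equally sized data files (commonly one per core, up to 8). A sketch follows; the file names, path, and sizes are placeholders, so adjust them to your environment before running anything:

```sql
-- Resize the primary data file so all files end up identical:
ALTER DATABASE tempdb
    MODIFY FILE (NAME = tempdev, SIZE = 4096MB, FILEGROWTH = 512MB);

-- Add a second data file (hypothetical path):
ALTER DATABASE tempdb
    ADD FILE (NAME = tempdev2,
              FILENAME = 'T:\TempDB\tempdev2.ndf',
              SIZE = 4096MB, FILEGROWTH = 512MB);

-- Repeat for tempdev3..tempdev8. Keeping every file the same size matters:
-- the round-robin allocation algorithm only spreads pressure evenly
-- across files with equal free space.
```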

💥 The "Ghost" Result:

This is the most frustrating part of Contention. If you look at your monitoring tools, you will see:

  1. Low CPU: The processors are idle because they are stuck waiting for a latch.

  2. Low Throughput: The server is barely moving data.

  3. No Visible Blocking: You check sys.dm_exec_requests and don't see any standard SQL blocking (LCK_M_...), yet everything is crawling.


        💣 My Senior Insight: 

        If you see PAGELATCH_UP / PAGELATCH_EX, your TempDB is "bottlenecked" by its own internal management. The engine is tripping over itself. This is often the primary reason why adding more CPU power doesn't make a slow SQL Server go any faster.


🧪 Step 3 — Spills (The Silent Killers)

SELECT TOP 10
    qs.total_spills,
    qs.execution_count,
    qs.total_spills * 1.0 / qs.execution_count AS avg_spills,  -- * 1.0 avoids integer division
    qt.text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
WHERE qs.total_spills > 0
ORDER BY avg_spills DESC;

This query surfaces the cached statements that spill the most per execution.

If InternalObjectsMB is the smoke, spills are the fire. A "spill" occurs when the SQL Server optimizer estimates that a specific operation (like a Sort or a Hash Join) needs a certain amount of memory (the Memory Grant). If that estimate is wrong and the data doesn't fit in the allocated RAM, SQL Server doesn't stop the query. Instead, it "spills" the excess data into TempDB.

It is a "silent" killer because you get no errors and no warnings (unless you open the Actual Execution Plan in SSMS and hover over the affected operator, you won't even see the yellow warning triangle). You just get massive latency, because you are moving from the speed of light (memory) to the speed of a bicycle (disk I/O).


        👉 A spill happens when memory is not enough and SQL Server writes to TempDB. It happens when:

    • Wrong Statistics: The optimizer thinks 100 rows are coming, but 1,000,000 show up.
    • Bad Cardinality: Inaccurate math in the execution plan leads to a tiny memory grant.
    • Parameter Sniffing: A plan optimized for a small dataset is reused for a massive one.
    • Skewed Data: High variance in data distribution confuses the engine's estimates.

        💥 No errors. No warnings. Just slow queries.


💥 The Senior Insight!

When you run the query above and see a high avg_spills count, you have found your "Performance Vampire"!

Stop looking at TempDB configuration: If a query is spilling, adding more disks or files won't fix the underlying problem. Instead you need to fix the Plan. Update your statistics, use a plan hint, or rewrite the query to avoid large, unindexed sorts.
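For illustration only (the object names below are placeholders), here are three plan-side fixes, in ascending order of invasiveness:

```sql
-- 1. Refresh the statistics that fed the bad estimate:
UPDATE STATISTICS dbo.Sales WITH FULLSCAN;   -- dbo.Sales is a placeholder name

-- 2. Recompile so the grant is re-sized for each execution's actual parameters:
SELECT CustomerID, SUM(Amount) AS Total
FROM dbo.Sales
GROUP BY CustomerID
OPTION (RECOMPILE);

-- 3. Raise the memory grant floor for a known-bad query (SQL Server 2016+):
SELECT CustomerID, SUM(Amount) AS Total
FROM dbo.Sales
GROUP BY CustomerID
OPTION (MIN_GRANT_PERCENT = 10);
```

Start with statistics: it is the cheapest fix and the most common root cause.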

🧪 Step 4 — Who is Using TempDB NOW

SELECT 
    session_id,
    user_objects_alloc_page_count * 8 / 1024.0 AS UserMB,
    internal_objects_alloc_page_count * 8 / 1024.0 AS InternalMB
FROM sys.dm_db_session_space_usage
ORDER BY InternalMB DESC;

When the server is screaming and you need a culprit, this is your lens into real-time activity. This query isolates the specific sessions that are currently draining your TempDB resources. 

👉 How to Interpret the Live Data

  • UserMB is high: This is a specific session running a massive #temp table or a table variable. This is Application Logic: You need to talk to the developer about their data-staging strategy.

  • InternalMB is high: This is a session suffering from a massive Spill. It is likely stuck in a giant Sort or Hash Join. This is a Plan Issue. Adding disk will NOT fix the problem because the bottleneck isn't the capacity of the drive.
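To go from a session_id to the actual code, one option (a sketch using the connection's most recent SQL handle) is:

```sql
-- Session space usage joined to each session's most recent statement text.
SELECT
    su.session_id,
    su.internal_objects_alloc_page_count * 8 / 1024.0 AS InternalMB,
    su.user_objects_alloc_page_count * 8 / 1024.0     AS UserMB,
    t.text AS last_sql_text
FROM sys.dm_db_session_space_usage AS su
JOIN sys.dm_exec_connections AS c
    ON c.session_id = su.session_id
CROSS APPLY sys.dm_exec_sql_text(c.most_recent_sql_handle) AS t
ORDER BY InternalMB DESC;
```

Note that most_recent_sql_handle points at the last statement the connection sent, which may differ from the statement that allocated the space earlier in the session.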

📢Conclusion

You don't need to monitor everything: You just need to monitor the right things. 
TempDB is the first place SQL Server shows signs of exhaustion: If you can read these signs in 45 seconds, you are already ahead of 90% of the DBAs out there!

We can say that TempDB does NOT slow down SQL Server, it exposes that SQL Server is already struggling.

🚀 My REAL Strategy

  • Identify spills first
  • Analyze the ACTUAL execution plan
  • Validate cardinality & statistics
  • Check memory grants

👉 TempDB is the effect... Not the cause.

  • ✔️ Never ignore InternalObjectsMB
  • ✔️ Spills are always a signal
  • ✔️ Version store = transaction issue
  • ✔️ PAGELATCH = concurrency problem



Biondi Luca @2026 - Sharing over 25 years of Gained Knowledge for Passion. Share if you like my posts!
