I Removed 32 Seconds From This SQL Server Query ...Without Adding an Index! Part 7
I Removed 32 Seconds From This SQL Server Query Without Adding an Index
SQL Server Performance Series – No New Index. No Schema Change. Just pure engine optimization.
Hi guys,
This is not theory.
This is not lab-only tuning.
This happened on a real workload.
Original execution time: 42 seconds.
After optimization: 9.8 seconds.
No new index.
No schema change.
No hardware upgrade.
Just understanding how the engine works.
Happy reading — let’s make SQL Server fly.
The Problem
A reporting query running on a table with ~5 million rows.
Users were complaining about slow dashboard refresh.
CPU spikes appeared during heavy aggregation.
Environment:
- SQL Server 2019
- Compatibility Level 150
- Rowstore table
- No columnstore index
The Query
SELECT CustomerID,
SUM(Amount) AS TotalAmount
FROM SalesBig
GROUP BY CustomerID;
Simple. Clean. Nothing exotic.
Yet painfully slow.
Metrics Before
SET STATISTICS IO, TIME ON;
- Elapsed Time: 42,312 ms
- CPU Time: 38,774 ms
- Logical Reads: 18M+
- Operator: Hash Match (Row Mode)
Execution plan showed:
- Parallel scan
- Row Mode aggregation
- High CPU pressure
Clearly CPU-bound.
The Insight
The dataset was large.
The aggregation was heavy.
Compatibility level was 150.
Batch Mode on Rowstore was available — but SQL Server didn’t choose it.
The optimizer made a cost-based decision. So I tested an alternative.
The Change
SELECT CustomerID,
SUM(Amount) AS TotalAmount
FROM SalesBig
GROUP BY CustomerID
OPTION (USE HINT('ForceBatchMode'));
Execution plan changed instantly:
- Hash Match (Batch Mode)
- Vectorized processing
- Reduced CPU cycles per row
Metrics After
- Elapsed Time: 9,821 ms
- CPU Time: 8,104 ms
- Logical Reads: approximately the same
32 seconds removed.
4x faster execution.
Same hardware.
Same table.
Why It Worked
- Large dataset → ideal for vectorized execution
- Heavy aggregation → Batch Mode advantage
- Sufficient memory grant → no spills
- Parallelism + Batch Mode → significant CPU reduction
Batch Mode processes data in chunks instead of one row at a time. That drastically reduces CPU instructions.
What I Verified Before Calling It a Win
- No tempdb spills
- No memory pressure
- Consistent execution time across runs
- CPU reduction confirmed via DMV
SELECT total_worker_time,
total_elapsed_time,
execution_count
FROM sys.dm_exec_query_stats
ORDER BY total_worker_time DESC;
Important Warning
Forcing Batch Mode does NOT always improve performance.
If:
- Row count is small
- Query is an OLTP lookup
- Memory grant is insufficient
- Plan shape changes negatively
Then it may backfire.
The Real Lesson
Performance tuning is not about blindly adding indexes.
It’s about understanding how the execution engine really works.
The optimizer makes cost-based decisions. Sometimes testing alternative strategies reveals hidden performance potential.
Final Takeaways
- Measure first.
- Understand execution plans.
- Check compatibility level.
- Test Batch Mode carefully.
- Never assume — always benchmark.
32 seconds removed.
No index added.
No schema changed.
See you in the next deep dive — where we push SQL Server even harder
Luca Biondi @2026
Previous article: Make Your SQL Server Fly! Still on Batch Mode… The Part 6

Comments
Post a Comment