Understand stemming in SQL Server to become a better DBA
Before we dive into today's topic, if you missed my previous post you can take a look at SQL Server 2025 CU4: Does JSON_CONTAINS Break Your Queries? Test Day 2. 👉 If you found this deep-dive helpful, feel free to check out the ads—your support helps me keep creating high-quality SQL Server content for the community. 19.04.2026
Understand stemming in SQL Server to become a better DBA
In this sunday post, I’ll show you how leveraging SQL Server's stemming capabilities can drastically improve your full-text search accuracy and reduce complex query overhead. Stop writing endless OR clauses and start letting the engine do the heavy lifting for you!
TL;DR
✔️ Stemming reduces words to their root form to find all grammatical variations ✔️FORMSOF(INFLECTIONAL, ...)eliminates the need for massiveLIKEorORchains ✔️ Full-Text Search language settings dictate whether stemming works 🌍 ✔️ You can querysys.fulltext_languagesto check your instance's supported stemmers ⚙️
Hi SQL SERVER Guys,
Welcome back to another sqlserverperform_ace post, we all know how important it is to make every second count when managing SQL Server environments. Dealing with text searches across millions of rows using standard LIKE operators is a guaranteed way to kill your server's performance. By mastering Full-Text Search (FTS) and its linguistic features, you can revolutionize the way you work with SQL databases.
Let's dive into these game-changing features!
🔍 What it really is: Stemming in SQL Server
Stemming is an advanced Information Retrieval technique that reduces words down to their linguistic root form (the "stem"). This allows different grammatical variations of a word to be treated as entirely equivalent by the search engine. For example, terms like “run”, “running”, and “ran” are considered related.
In SQL Server, this powerful functionality lives inside the Full-Text Search engine (introduced way back in SQL Server 7.0 and heavily optimized since). By wrapping your search terms in the FORMSOF(INFLECTIONAL, ...) predicate, the engine automatically expands the term into all its known variations based on the language's dictionary.
🎯 Why you need it
If you aren't using stemming, you are probably working too hard.
Stemming helps to:
- Improve search result quality: Users get what they mean, not just what they type.
- Avoid writing complex queries: Say goodbye to building dynamic SQL with multiple word variations.
- Mimic modern search engines: Make your internal application search behave like Google or Bing.
🚀 Testing the Stemming
/* ========================================================= 1. Create table Docs This table will store text data for Full-Text Search tests ========================================================= */ IF OBJECT_ID('dbo.Docs', 'U') IS NOT NULL DROP TABLE dbo.Docs; GO CREATE TABLE dbo.Docs ( Id INT IDENTITY(1,1) NOT NULL, Content NVARCHAR(MAX) NOT NULL, CONSTRAINT PK_Docs PRIMARY KEY CLUSTERED (Id) ); GO /* ========================================================= 2. Insert sample data Simple dataset to test stemming behavior ========================================================= */ INSERT INTO dbo.Docs (Content) VALUES (N'I run every day'), (N'She is running fast'), (N'They ran yesterday'); GO /* ========================================================= 3. Enable Full-Text Search on database Required before creating Full-Text indexes ========================================================= */ IF NOT EXISTS ( SELECT 1 FROM sys.fulltext_catalogs ) BEGIN EXEC sp_fulltext_database 'enable'; END GO /* ========================================================= 4. Create Full-Text Catalog Logical container for Full-Text indexes ========================================================= */ IF NOT EXISTS ( SELECT 1 FROM sys.fulltext_catalogs WHERE name = 'ftCatalog' ) BEGIN CREATE FULLTEXT CATALOG ftCatalog; END GO /* ========================================================= 5. Create Full-Text Index This enables text search capabilities on the Content column LANGUAGE 1033 = English (required for stemming) ========================================================= */ IF NOT EXISTS ( SELECT 1 FROM sys.fulltext_indexes WHERE object_id = OBJECT_ID('dbo.Docs') ) BEGIN CREATE FULLTEXT INDEX ON dbo.Docs ( Content LANGUAGE 1033 ) KEY INDEX PK_Docs ON ftCatalog WITH CHANGE_TRACKING AUTO; END GO /* ========================================================= 6. Force population (optional but recommended) Ensures index is immediately usable ========================================================= */ ALTER FULLTEXT INDEX ON dbo.Docs START FULL POPULATION; GO /* ========================================================= 7. Test query WITHOUT stemming Only exact match will be returned ========================================================= */ SELECT * FROM dbo.Docs WHERE CONTAINS(Content, '"run"'); GO /* ========================================================= 8. Test query WITH stemming Expands search to include run, running, ran ========================================================= */ SELECT * FROM dbo.Docs WHERE CONTAINS(Content, 'FORMSOF(INFLECTIONAL, "run")'); GO
💣 A) Example without stemming
Let's look at standard behavior when searching a Full-Text indexed table.
VALUES
(N'I run every day'),
(N'She is running fast'),
(N'They ran yesterday');
-- Standard Contains Search
SELECT *
FROM Docs
WHERE CONTAINS(Content, '"run"');
👉 Result:
- ✔️ Finds: run
- ❌ Does NOT find: running, ran
🚀 B) The Solution: Example with stemming (inflectional search)
Now, let's inject the FORMSOF inflectional capability into the same query.
-- Inflectional Stemming Search SELECT * FROM Docs WHERE CONTAINS(Content, 'FORMSOF(INFLECTIONAL, "run")');
👉 Result:
- ✔️ run
- ✔️ running
- ✔️ ran
👉 This works seamlessly because the English language component in SQL Server includes a highly accurate stemmer.
🌍 Supported languages
Stemming is strictly language-dependent. The engine must know the linguistic rules of the text it is parsing.
✔️ Languages that typically support stemming (IStemmer included):
- English
- Italian
- French
- Spanish
- German
- Portuguese
- Dutch
❌ Languages that may NOT support stemming:
- Some Asian languages (e.g., Chinese and Japanese rely on word-breakers and N-grams, not traditional stemming).
- Less common or legacy Full-Text languages.
- Custom or unsupported language IDs (LCID).
👉 You can always check the installed languages and their word-breaker/stemmer registrations on your server by running:
SELECT * FROM sys.fulltext_languages;
🚀 My REAL Strategy
In my experience, developers often create Full-Text indexes without explicitly specifying the LANGUAGE attribute for the column, causing SQL Server to default to the server's default language. If your server is in English, but your data is in Italian, FORMSOF(INFLECTIONAL, ...) will apply English grammatical rules to Italian words, resulting in completely missed searches and confused end-users. Always bind the correct LCID to your full-text indexed columns during creation. Don't rely on server defaults!
📢 Support the Blog: Did you find this deep-dive helpful? The ads you see here are selected to reflect your interests. If a partner's offer catches your eye, give it a look! Your engagement helps me continue publishing premium SQL Server content for the community.
Biondi Luca @2026 - Sharing over 25 years of Gained Knowledge for Passion. Share if you like my posts!


Comments
Post a Comment