Understand stemming in SQL Server to become a better DBA

Before we dive into today's topic, if you missed my previous post you can take a look at SQL Server 2025 CU4: Does JSON_CONTAINS Break Your Queries? Test Day 2. 👉 If you found this deep-dive helpful, feel free to check out the ads—your support helps me keep creating high-quality SQL Server content for the community. 19.04.2026

Understand stemming in SQL Server to become a better DBA

In this sunday post, I’ll show you how leveraging SQL Server's stemming capabilities can drastically improve your full-text search accuracy and reduce complex query overhead. Stop writing endless OR clauses and start letting the engine do the heavy lifting for you!

TL;DR

✔️ Stemming reduces words to their root form to find all grammatical variations ✔️ FORMSOF(INFLECTIONAL, ...) eliminates the need for massive LIKE or OR chains ✔️ Full-Text Search language settings dictate whether stemming works 🌍 ✔️ You can query sys.fulltext_languages to check your instance's supported stemmers ⚙️

Hi SQL SERVER Guys,

Welcome back to another sqlserverperform_ace post, we all know how important it is to make every second count when managing SQL Server environments. Dealing with text searches across millions of rows using standard LIKE operators is a guaranteed way to kill your server's performance. By mastering Full-Text Search (FTS) and its linguistic features, you can revolutionize the way you work with SQL databases. 

Let's dive into these game-changing features!

🔍 What it really is: Stemming in SQL Server

Stemming is an advanced Information Retrieval technique that reduces words down to their linguistic root form (the "stem"). This allows different grammatical variations of a word to be treated as entirely equivalent by the search engine. For example, terms like “run”, “running”, and “ran” are considered related.

In SQL Server, this powerful functionality lives inside the Full-Text Search engine (introduced way back in SQL Server 7.0 and heavily optimized since). By wrapping your search terms in the FORMSOF(INFLECTIONAL, ...) predicate, the engine automatically expands the term into all its known variations based on the language's dictionary.

🎯 Why you need it

If you aren't using stemming, you are probably working too hard

Stemming helps to:

  • Improve search result quality: Users get what they mean, not just what they type.
  • Avoid writing complex queries: Say goodbye to building dynamic SQL with multiple word variations.
  • Mimic modern search engines: Make your internal application search behave like Google or Bing.

🚀 Testing the Stemming

In order to test the stemming you need to have installed the Full-Text feature component:


This is the code:

/* =========================================================
   1. Create table Docs
   This table will store text data for Full-Text Search tests
   ========================================================= */
IF OBJECT_ID('dbo.Docs', 'U') IS NOT NULL
    DROP TABLE dbo.Docs;
GO

CREATE TABLE dbo.Docs
(
    Id INT IDENTITY(1,1) NOT NULL,
    Content NVARCHAR(MAX) NOT NULL,
    CONSTRAINT PK_Docs PRIMARY KEY CLUSTERED (Id)
);
GO


/* =========================================================
   2. Insert sample data
   Simple dataset to test stemming behavior
   ========================================================= */
INSERT INTO dbo.Docs (Content)
VALUES 
(N'I run every day'),
(N'She is running fast'),
(N'They ran yesterday');
GO


/* =========================================================
   3. Enable Full-Text Search on database
   Required before creating Full-Text indexes
   ========================================================= */
IF NOT EXISTS (
    SELECT 1 
    FROM sys.fulltext_catalogs
)
BEGIN
    EXEC sp_fulltext_database 'enable';
END
GO


/* =========================================================
   4. Create Full-Text Catalog
   Logical container for Full-Text indexes
   ========================================================= */
IF NOT EXISTS (
    SELECT 1 
    FROM sys.fulltext_catalogs 
    WHERE name = 'ftCatalog'
)
BEGIN
    CREATE FULLTEXT CATALOG ftCatalog;
END
GO


/* =========================================================
   5. Create Full-Text Index
   This enables text search capabilities on the Content column
   LANGUAGE 1033 = English (required for stemming)
   ========================================================= */
IF NOT EXISTS (
    SELECT 1 
    FROM sys.fulltext_indexes 
    WHERE object_id = OBJECT_ID('dbo.Docs')
)
BEGIN
    CREATE FULLTEXT INDEX ON dbo.Docs
    (
        Content LANGUAGE 1033
    )
    KEY INDEX PK_Docs
    ON ftCatalog
    WITH CHANGE_TRACKING AUTO;
END
GO


/* =========================================================
   6. Force population (optional but recommended)
   Ensures index is immediately usable
   ========================================================= */
ALTER FULLTEXT INDEX ON dbo.Docs START FULL POPULATION;
GO


/* =========================================================
   7. Test query WITHOUT stemming
   Only exact match will be returned
   ========================================================= */
SELECT *
FROM dbo.Docs
WHERE CONTAINS(Content, '"run"');
GO


/* =========================================================
   8. Test query WITH stemming
   Expands search to include run, running, ran
   ========================================================= */
SELECT *
FROM dbo.Docs
WHERE CONTAINS(Content, 'FORMSOF(INFLECTIONAL, "run")');
GO


Let's look to the results:

💣 A) Example without stemming

Let's look at standard behavior when searching a Full-Text indexed table.

INSERT INTO Docs (Content)
VALUES
(N'I run every day'),
(N'She is running fast'),
(N'They ran yesterday');
-- Standard Contains Search
SELECT *
FROM Docs
WHERE CONTAINS(Content, '"run"');

👉 Result:

  • ✔️ Finds: run
  • ❌ Does NOT find: running, ran

🚀 B) The Solution: Example with stemming (inflectional search)

Now, let's inject the FORMSOF inflectional capability into the same query.

-- Inflectional Stemming Search
SELECT *
FROM Docs
WHERE CONTAINS(Content, 'FORMSOF(INFLECTIONAL, "run")');

👉 Result:

  • ✔️ run
  • ✔️ running
  • ✔️ ran

👉 This works seamlessly because the English language component in SQL Server includes a highly accurate stemmer.

🌍 Supported languages

Stemming is strictly language-dependent. The engine must know the linguistic rules of the text it is parsing.

✔️ Languages that typically support stemming (IStemmer included):

  • English
  • Italian
  • French
  • Spanish
  • German
  • Portuguese
  • Dutch

Languages that may NOT support stemming:

  • Some Asian languages (e.g., Chinese and Japanese rely on word-breakers and N-grams, not traditional stemming).
  • Less common or legacy Full-Text languages.
  • Custom or unsupported language IDs (LCID).

👉 You can always check the installed languages and their word-breaker/stemmer registrations on your server by running:

SELECT * FROM sys.fulltext_languages;

🚀 My REAL Strategy

In my experience, developers often create Full-Text indexes without explicitly specifying the LANGUAGE attribute for the column, causing SQL Server to default to the server's default language. If your server is in English, but your data is in Italian, FORMSOF(INFLECTIONAL, ...) will apply English grammatical rules to Italian words, resulting in completely missed searches and confused end-users. Always bind the correct LCID to your full-text indexed columns during creation. Don't rely on server defaults!


📢 Support the Blog: Did you find this deep-dive helpful? The ads you see here are selected to reflect your interests. If a partner's offer catches your eye, give it a look! Your engagement helps me continue publishing premium SQL Server content for the community.

Biondi Luca @2026 - Sharing over 25 years of Gained Knowledge for Passion. Share if you like my posts!

Comments

I Post più popolari

Speaking to Sql Server, sniffing the TDS protocol

SQL Server, find text in a Trigger, Stored Procedures, View and Function. Two ways and what ways is better

SQL Server, Avoid that damn Table Spool!