Stemming

Stemming is the process of reducing a word to its root form. This ensures variants of a word match during a search. For example, walking and walked can be stemmed to the same root word: walk. Once stemmed, an occurrence of either word would match the other in a search.

Stemming is language-dependent but often involves removing prefixes and suffixes from words.

In some cases, the root form of a stemmed word may not be a real word. For example, jumping and jumpiness can both be stemmed to jumpi. While jumpi isn’t a real English word, it doesn’t matter for search; if all variants of a word are reduced to the same root form, they will match correctly.
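To see why a non-word root such as jumpi is harmless, consider a minimal sketch of a search index keyed by stems. The stem table below is copied from the examples in the text rather than computed, and the names `STEMS`, `analyze`, `build_index`, and `search` are hypothetical helpers made up for this illustration; this is not how Elasticsearch is implemented.

```python
from collections import defaultdict

# Stems copied from the examples in the text; a real analyzer would compute them.
STEMS = {
    "walking": "walk",
    "walked": "walk",
    "jumping": "jumpi",
    "jumpiness": "jumpi",
}


def analyze(text: str) -> list[str]:
    """Lowercase, split on whitespace, and map each token to its stem."""
    return [STEMS.get(token, token) for token in text.lower().split()]


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Build an inverted index from each stem to the IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return the IDs of documents that contain every stemmed query term."""
    postings = [index.get(term, set()) for term in analyze(query)]
    return set.intersection(*postings) if postings else set()


docs = {1: "she walked home", 2: "jumping on the bed", 3: "his jumpiness showed"}
index = build_index(docs)

print(sorted(search(index, "walking")))    # [1]
print(sorted(search(index, "jumpiness")))  # [2, 3]
```

The index only ever compares stems with stems, so it makes no difference whether a stem is a dictionary word, as long as documents and queries go through the same `analyze` step.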

In Elasticsearch, stemming is handled by stemmer token filters. These token filters can be categorized based on how they stem words:

Algorithmic Stemmer - stems words based on a set of rules
Dictionary Stemmer - stems words by looking them up in a dictionary

<aside> 💡 Because stemming changes tokens, we recommend using the same stemmer token filters during index and search analysis.

</aside>
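One way to follow this recommendation is to define a single custom analyzer and attach it to the field, since a `text` field uses its `analyzer` for both indexing and searching unless a separate `search_analyzer` is set. The sketch below builds such an index definition as a Python dictionary and prints it as JSON. The names `my_stemmer`, `stemmed_text`, and the field `body` are hypothetical, and the choice of the `english` stemmer is an assumption made for the example.

```python
import json

# Index definition with one analyzer shared by index-time and search-time analysis.
# "my_stemmer", "stemmed_text" and the field name "body" are made-up names.
index_definition = {
    "settings": {
        "analysis": {
            "filter": {
                "my_stemmer": {
                    "type": "stemmer",      # Elasticsearch stemmer token filter
                    "language": "english",  # assumed language for this sketch
                }
            },
            "analyzer": {
                "stemmed_text": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "my_stemmer"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "body": {
                "type": "text",
                # No "search_analyzer" is given, so queries on this field are
                # analyzed with the same "stemmed_text" analyzer as documents.
                "analyzer": "stemmed_text",
            }
        }
    },
}

print(json.dumps(index_definition, indent=2))
```

The printed JSON could be sent as the request body when creating an index. Leaving out `search_analyzer` is the design choice that keeps the two analysis paths identical.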

Algorithmic Stemmer

Algorithmic stemmers apply a series of rules to each word to reduce it to its root form. For example, an algorithmic stemmer for English may remove the -s and -es suffixes from the end of plural words.
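The idea of applying a series of rules can be sketched with a small, ordered table of suffix rewrites. This is a toy written for illustration only: the rule list, the `MIN_STEM_LENGTH` guard, and the function name `toy_stem` are assumptions, and real algorithmic stemmers use far larger and more careful rule sets.

```python
# Toy rule-based stemmer: an illustrative sketch, not the algorithm used by
# any Elasticsearch token filter.

# (suffix, replacement) pairs, tried in order; the first matching suffix wins.
RULES = [
    ("sses", "ss"),  # classes -> class
    ("ies", "i"),    # ponies  -> poni
    ("ss", "ss"),    # glass   -> glass (shields the word from the "s" rule)
    ("ing", ""),     # walking -> walk
    ("ed", ""),      # walked  -> walk
    ("s", ""),       # walks   -> walk
]

# Hypothetical guard: refuse a rewrite that would leave fewer than 3 characters,
# so short words such as "is" or "red" are returned unchanged.
MIN_STEM_LENGTH = 3


def toy_stem(word: str) -> str:
    """Return a crude stem by applying the first matching suffix rule."""
    word = word.lower()
    for suffix, replacement in RULES:
        if word.endswith(suffix):
            stem = word[: len(word) - len(suffix)] + replacement
            return stem if len(stem) >= MIN_STEM_LENGTH else word
    return word


for w in ["walking", "walked", "walks", "classes", "glass"]:
    print(w, "->", toy_stem(w))
```

Because the rules only inspect the characters of the word itself, the function needs no word list, which is why stemmers of this kind need little setup and little memory.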

Advantages:

They require little setup and usually work well out of the box.
They use little memory.
They are typically faster than dictionary stemmers.

However, most algorithmic stemmers only alter the existing text of a word. This means they may not work well with irregular words that don’t contain their root form, such as:

be, are, and am
mouse and mice
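A short sketch makes the limitation concrete. The function `strip_suffix` below is a hypothetical, deliberately naive stemmer that only removes a trailing -es or -s. It links regular plurals, but it has no way to connect the irregular forms listed above, because none of them contains the text of its root.

```python
def strip_suffix(word: str) -> str:
    """Toy algorithmic stemmer: drop a trailing -es or -s and nothing else."""
    for suffix in ("es", "s"):
        # Keep at least two characters so very short words are left alone.
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            return word[: -len(suffix)]
    return word


groups = [
    ["box", "boxes"],     # regular plural: the root is visible in the text
    ["mouse", "mice"],    # irregular plural
    ["be", "are", "am"],  # irregular verb forms
]

for group in groups:
    stems = {strip_suffix(w) for w in group}
    verdict = "match" if len(stems) == 1 else "no match"
    print(group, "->", sorted(stems), verdict)
```

Only the first group collapses to a single stem. The other two would need outside knowledge, such as a dictionary that maps each form to its root.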