Oleander C++ stemming library
Project Website:
Github Link:
Description
Stemming is a normalization process that reduces words to their root form. It removes suffixes so that morphological variants of the same word can be compared more easily. For example, the words “predicts”, “prediction”, and “predicted” all stem to the same root, “predict”, and are therefore treated as the same word.
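The idea of suffix removal can be illustrated with a deliberately simplified sketch. The function below is a toy: `toy_stem` and its three-entry suffix list are hypothetical names chosen for this illustration, it is not the Porter algorithm, and it is not part of the Oleander API. A real stemmer applies many more rules and conditions.

```cpp
#include <string>

// Toy suffix stripper, for illustration only.
// Assumption: this is NOT the Porter algorithm and NOT the Oleander API;
// the suffix list is a hypothetical, hand-picked set.
inline std::wstring toy_stem(std::wstring word)
{
    // Suffixes are tried in order; the first match is removed.
    static const std::wstring suffixes[] = { L"ion", L"ed", L"s" };
    for (const std::wstring& suffix : suffixes)
    {
        // Strip only when at least three letters would remain.
        if (word.size() >= suffix.size() + 3 &&
            word.compare(word.size() - suffix.size(), suffix.size(), suffix) == 0)
        {
            word.erase(word.size() - suffix.size());
            break;
        }
    }
    return word;
}
```

With this toy rule set, `toy_stem(L"predicts")`, `toy_stem(L"prediction")`, and `toy_stem(L"predicted")` all return `L"predict"`, while a short word such as `L"is"` is left unchanged.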
Stemming is used primarily in information retrieval (IR) systems, where “fuzzy” string matching is a necessity. IR systems that let users search for a word and all of its variants, rather than matching only the exact query, normally rely on stemming to do so. Examples of IR systems include desktop and web-based search engines.
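As a rough sketch of how a search system might use stems for matching, the code below builds an index from stems to document positions and stems the query before looking it up. All names here (`Stemmer`, `StemIndex`, `build_stem_index`, `search`) are hypothetical and chosen for this illustration; the stemmer is passed in as a callable, so any stemming function could be plugged in. Tokenization is whitespace-only, which a real system would need to improve on.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical types for this sketch: any callable that maps a word to its stem.
using Stemmer = std::function<std::wstring(const std::wstring&)>;
using StemIndex = std::map<std::wstring, std::set<std::size_t>>;

// Maps each stem to the positions (within `documents`) of the documents containing it.
inline StemIndex build_stem_index(const std::vector<std::wstring>& documents,
                                  const Stemmer& stem)
{
    StemIndex index;
    for (std::size_t i = 0; i < documents.size(); ++i)
    {
        std::wistringstream words(documents[i]);
        std::wstring word;
        // Whitespace tokenization only; punctuation is not handled.
        while (words >> word)
        {
            index[stem(word)].insert(i);
        }
    }
    return index;
}

// Stems the query word and returns the documents that share that stem.
inline std::set<std::size_t> search(const StemIndex& index,
                                    const std::wstring& query,
                                    const Stemmer& stem)
{
    const auto match = index.find(stem(query));
    return match == index.end() ? std::set<std::size_t>{} : match->second;
}
```

Because both the indexed words and the query pass through the same stemmer, a query for “predicted” can find documents that contain only “predicts” or “prediction”, provided the stemmer reduces all three to the same stem.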
The Oleander C++ stemming library implements the Porter stemming algorithms and supports most Western European languages, as well as Russian.
Features
Full implementations of the Porter stemming algorithms
Includes stemmers for English, Danish, Dutch, French, Finnish, German, Italian, Norwegian, Portuguese, Spanish, Swedish, and Russian
Case-insensitive text handling
Designed for C++’s standard wstring class (Unicode strings)
BSD licensing
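To picture what case-insensitive handling of wstring text involves, here is a minimal sketch using only the C++ standard library. It is an assumption for illustration, not the library's internal approach: `to_lower_copy` and `equals_ignore_case` are hypothetical helper names, and `std::towlower` depends on the current C locale for characters outside the basic ASCII range.

```cpp
#include <algorithm>
#include <cwctype>
#include <string>

// Hypothetical helper: returns a lowercased copy of `text`.
// Note: std::towlower is locale-dependent for non-ASCII characters.
inline std::wstring to_lower_copy(std::wstring text)
{
    std::transform(text.begin(), text.end(), text.begin(),
                   [](wchar_t ch) {
                       return static_cast<wchar_t>(
                           std::towlower(static_cast<std::wint_t>(ch)));
                   });
    return text;
}

// Hypothetical helper: compares two wide strings while ignoring letter case.
inline bool equals_ignore_case(const std::wstring& lhs, const std::wstring& rhs)
{
    return to_lower_copy(lhs) == to_lower_copy(rhs);
}
```

Folding case before comparison means that “Predict” and “PREDICT” are treated as the same word.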