Open Source Text Processing Project: PTStemmer

PTStemmer – A Stemming toolkit for the Portuguese language

Project Website:

GitHub Link: None

Description

FEATURES

Java, Python, and .NET C# implementations of Orengo, Porter, and Savoy stemmers

Fast: can stem more than 1.5 million words per second on a typical desktop machine

Least Recently Used (LRU) stem cache

Support for lists of words to ignore (useful for stopword and named entity removal)
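
To make the last two features concrete, here is a minimal Python sketch of how an LRU stem cache and an ignore list could wrap a stemming function. It is not PTStemmer's actual API: the names CachedStemmer, toy_stem, cache_size, and ignore are all hypothetical, and toy_stem is a stand-in that only strips a trailing "s", not an implementation of Orengo, Porter, or Savoy.

```python
from collections import OrderedDict


def toy_stem(word):
    """Hypothetical placeholder stemmer: strips a trailing 's' from longer words."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


class CachedStemmer:
    """Illustrative wrapper (not PTStemmer's API): LRU cache plus ignore list."""

    def __init__(self, stem_fn, cache_size=1000, ignore=()):
        self.stem_fn = stem_fn
        self.cache_size = cache_size
        # Ignored words (e.g. stopwords, named entities) are matched case-insensitively.
        self.ignore = {w.lower() for w in ignore}
        self.cache = OrderedDict()  # insertion order tracks recency of use
        self.misses = 0             # number of calls that reached stem_fn

    def stem(self, word):
        key = word.lower()
        if key in self.ignore:
            return word  # ignored words are returned unchanged
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            return self.cache[key]
        self.misses += 1
        result = self.stem_fn(key)
        self.cache[key] = result
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return result


stemmer = CachedStemmer(toy_stem, cache_size=2, ignore=["os", "Lisboa"])
print([stemmer.stem(w) for w in ["casas", "os", "casas", "Lisboa", "livros"]])
# prints ['casa', 'os', 'casa', 'Lisboa', 'livro']
```

The second lookup of "casas" is served from the cache, so the underlying stemmer runs only twice for the five input words. With a real stemmer, the benefit comes from the fact that natural-language text repeats a small set of frequent words very often.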

MINIMUM REQUIREMENTS

Java: Java 1.5

Python: Python 2.5

.NET: .NET Framework 2.0 or Mono 2.0

