Open Source Text Processing Project: OleanderStemmingLibrary

Oleander C++ stemming library
Project Website: http://www.oleandersolutions.com/stemming/stemming.html
GitHub Link: https://github.com/OleanderSoftware/OleanderStemmingLibrary
Description: Stemming is a normalization process used to reduce words down to their root. Stemming removes inflectional suffixes so that morphological variants of the same word can be compared more …
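
To illustrate the suffix-removal idea in the description above, here is a minimal Python sketch. It is a toy, not the Porter or Snowball algorithm and not the Oleander library's API: the function name `toy_stem`, the suffix list, and the minimum stem length of three characters are all assumptions chosen for illustration.

```python
# Toy suffix stripper for illustration only.
# NOT the Porter/Snowball algorithm and NOT the Oleander API.
SUFFIXES = ("ing", "ed", "es", "s")  # hypothetical list, tried in this order


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    lower = word.lower()
    for suffix in SUFFIXES:
        if lower.endswith(suffix) and len(lower) - len(suffix) >= 3:
            return lower[: -len(suffix)]
    return lower


# Morphological variants of the same word collapse to one comparable form.
variants = ["walk", "walks", "walked", "walking"]
stems = {toy_stem(w) for w in variants}
```

Here all four variants reduce to the single stem `walk`. Real stemmers apply many more rules and conditions; a naive stripper like this one mishandles words such as "running" or "ponies".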

Open Source Text Processing Project: libstemmer

libstemmer: The C version of the libstemmer library
Project Website: http://snowball.tartarus.org/dist/libstemmer_c.tgz
GitHub Link: https://github.com/zvelo/libstemmer
Description: This contains all you need to include the Snowball stemming algorithms in a C project of your own. If you download this, you don’t need …

Open Source Text Processing Project: The Porter Stemming Algorithm

The Porter Stemming Algorithm
Project Website: http://tartarus.org/martin/PorterStemmer/
GitHub Link: None
Description: This is the ‘official’ home page for distribution of the Porter Stemming Algorithm, written and maintained by its author, Martin Porter. The Porter stemming algorithm (or ‘Porter stemmer’) is …

Open Source Text Processing Project: hmmus

hmmus: Posterior decoding with a hidden Markov model
Project Website: https://pypi.python.org/pypi/hmmus
GitHub Link: None
Description: Hmmus has some C implementations of HMM algorithms with Python bindings, and it is meant to be useful under the following conditions: The sequence of …

Open Source Text Processing Project: UMDHMM

UMDHMM: Hidden Markov Model Toolkit
Project Website: http://www.kanungo.com/software/software.html#umdhmm
GitHub Link: None
Description: Hidden Markov Model (HMM) Software: Implementation of Forward-Backward, Viterbi, and Baum-Welch algorithms. The software has been compiled and tested on UNIX platforms (Sun Solaris, DEC OSF and Linux) …
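
Of the algorithms named above, Viterbi decoding is the simplest to show in a few lines. The following Python sketch is independent of UMDHMM (which is a C toolkit) and does not reflect its interface. The function signature, the dictionary-based model layout, and the two-state example model are assumptions made for illustration. The sketch also assumes a non-empty observation sequence and multiplies raw probabilities, where a practical implementation would typically work in log space.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for a non-empty obs sequence."""
    # best[t][s] = (probability of the best path ending in s at time t,
    #               predecessor state on that path)
    best = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        row = {}
        for s in states:
            row[s] = max(
                (best[t - 1][p][0] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states
            )
        best.append(row)

    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        last = best[t][last][1]
        path.append(last)
    path.reverse()
    return path


# Hypothetical two-state model, for illustration only.
states = ("Healthy", "Fever")
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}
emit_p = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}
path = viterbi(("normal", "cold", "dizzy"), states, start_p, trans_p, emit_p)
# path == ["Healthy", "Healthy", "Fever"]
```

Forward-Backward uses the same trellis but sums over predecessor states instead of taking the maximum, and Baum-Welch uses those sums to re-estimate the model parameters.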

Open Source Text Processing Project: GHMM

GHMM: The General Hidden Markov Model library
Project Website: http://www.ghmm.org/
GitHub Link: None
Description: The General Hidden Markov Model library (GHMM) is a freely available C library implementing efficient data structures and algorithms for basic and extended HMMs with discrete …

Open Source Text Processing Project: HTK

HTK: The Hidden Markov Model Toolkit
Project Website: http://htk.eng.cam.ac.uk/
GitHub Link: None
Description: The Hidden Markov Model Toolkit (HTK) is a portable toolkit for building and manipulating hidden Markov models. HTK is primarily used for speech recognition research, although it …

Open Source Text Processing Project: MGIZA

MGIZA++: a multi-threaded word alignment tool based on GIZA++
Project Website: http://www.kyloo.net/software/doku.php/mgiza:overview
GitHub Link: https://github.com/moses-smt/mgiza
Description: MGIZA++ is a multi-threaded word alignment tool based on GIZA++. It extends GIZA++ in multiple ways:
Multi-threading: MGIZA++ can make use of multi-core platforms …

Open Source Text Processing Project: mkcls

mkcls: Training of word classes
Project Website: http://www.fjoch.com/mkcls.html
GitHub Link: https://github.com/moses-smt/giza-pp
Description: mkcls is a tool to train word classes by using a maximum-likelihood criterion. The resulting word classes are especially suited for language models or statistical translation models. The program …

Open Source Text Processing Project: GIZA++

GIZA++: Training of statistical translation models
Project Website: http://www.fjoch.com/GIZA++.html
GitHub Link: https://github.com/moses-smt/giza-pp
Description: GIZA++ is an extension of the program GIZA (part of the SMT toolkit EGYPT), which was developed by the Statistical Machine Translation team during the summer workshop …