Open Source Text Processing Project: Kaldi

Kaldi Speech Recognition Toolkit
Project Website: http://kaldi-asr.org/
Github Link: https://github.com/kaldi-asr/kaldi
Description: What is Kaldi? Kaldi is a toolkit for speech recognition written in C++ and licensed under the Apache License v2.0. Kaldi is intended for use by speech recognition researchers. …

Open Source Text Processing Project: PocketSphinx

PocketSphinx
Project Website: None
Github Link: https://github.com/cmusphinx/pocketsphinx
Description: PocketSphinx is a lightweight speech recognition engine, specifically tuned for handheld and mobile devices, though it works equally well on the desktop. This is PocketSphinx, one of Carnegie Mellon University’s open source …

Open Source Text Processing Project: Sphinxtrain

Sphinxtrain
Project Website: None
Github Link: https://github.com/cmusphinx/sphinxtrain
Description: This is SphinxTrain, Carnegie Mellon University’s open source acoustic model trainer. This directory contains the scripts and instructions necessary for building models for the CMU Sphinx Recognizer. This distribution is free software, …

Open Source Text Processing Project: sphinxbase

sphinxbase: CMU Sphinx common libraries
Project Website: None
Github Link: https://github.com/cmusphinx/sphinxbase
Description: This package contains the basic libraries shared by the CMU Sphinx trainer and all the Sphinx decoders (Sphinx-II, Sphinx-III, and PocketSphinx), as well as some common utilities for …

Open Source Text Processing Project: Serelex

Serelex – a tool for semantic relation extraction
Project Website: None
Github Link: https://github.com/jgc128/DefVectors
Description: A tool for semantic relation extraction. The program finds pairs of semantically related words based on the text definitions coming from the Wikipedia articles (other …

Open Source Text Processing Project: OleanderStemmingLibrary

Oleander C++ stemming library
Project Website: http://www.oleandersolutions.com/stemming/stemming.html
Github Link: https://github.com/OleanderSoftware/OleanderStemmingLibrary
Description: Stemming is a normalization process used to reduce words down to their root. Stemming removes inflectional suffixes so that morphological variants of the same word can be compared more …
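To make the idea of suffix removal concrete, here is a minimal Python sketch. It is a toy suffix stripper, not the Porter algorithm and not the Oleander library's API; the suffix list, the `toy_stem` and `same_stem` names, and the minimum stem length are all assumptions chosen for illustration.

```python
# Toy suffix stripper for illustration only.
# NOT the Porter algorithm and NOT the Oleander library's API.
# The suffix list and the minimum stem length are hypothetical choices.
SUFFIXES = ("ing", "ed", "es", "s")  # checked in this order


def toy_stem(word: str, min_stem_len: int = 3) -> str:
    """Strip the first matching suffix, keeping at least min_stem_len letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


def same_stem(a: str, b: str) -> bool:
    """Compare two words after reducing both to their toy stem."""
    return toy_stem(a) == toy_stem(b)


if __name__ == "__main__":
    for w in ("walking", "walked", "walks", "walk"):
        print(w, "->", toy_stem(w))
```

A real stemmer applies many more rules and conditions than this; the sketch only shows why variants such as "walking" and "walked" become comparable once the suffix is gone.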

Open Source Text Processing Project: libstemmer

libstemmer: The C version of the libstemmer library
Project Website: http://snowball.tartarus.org/dist/libstemmer_c.tgz
Github Link: https://github.com/zvelo/libstemmer
Description: This contains all you need to include the snowball stemming algorithms into a C project of your own. If you download this, you don’t need …

Open Source Text Processing Project: The Porter Stemming Algorithm

The Porter Stemming Algorithm
Project Website: http://tartarus.org/martin/PorterStemmer/
Github Link: None
Description: This is the ‘official’ home page for distribution of the Porter Stemming Algorithm, written and maintained by its author, Martin Porter. The Porter stemming algorithm (or ‘Porter stemmer’) is …

Open Source Text Processing Project: hmmus

hmmus: Posterior decoding with a hidden Markov model
Project Website: https://pypi.python.org/pypi/hmmus
Github Link: None
Description: Hmmus has some C implementations of HMM algorithms with Python bindings, and it is meant to be useful under the following conditions: The sequence of …
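As a rough illustration of what posterior decoding means, the sketch below runs a scaled forward-backward pass in plain Python and picks the most probable state at each position. It does not use or imitate the hmmus API; the function name `posterior_decode`, the list-of-lists model layout, and the example numbers are assumptions made for this sketch. It also assumes a non-empty observation sequence that has nonzero probability under the model.

```python
# Posterior decoding sketch (forward-backward), pure Python.
# This does NOT use the hmmus package; names and data layout are hypothetical.
# init[i]      : probability of starting in state i
# trans[i][j]  : probability of moving from state i to state j
# emit[i][o]   : probability that state i emits observation symbol o


def posterior_decode(obs, init, trans, emit):
    n, T = len(init), len(obs)

    # Forward pass, rescaled at each step to avoid numerical underflow.
    fwd = []
    for t, o in enumerate(obs):
        if t == 0:
            row = [init[i] * emit[i][o] for i in range(n)]
        else:
            prev = fwd[-1]
            row = [emit[j][o] * sum(prev[i] * trans[i][j] for i in range(n))
                   for j in range(n)]
        total = sum(row)
        fwd.append([x / total for x in row])

    # Backward pass, also rescaled at each step.
    bwd = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        o_next = obs[t + 1]
        row = [sum(trans[i][j] * emit[j][o_next] * bwd[t + 1][j] for j in range(n))
               for i in range(n)]
        total = sum(row)
        bwd[t] = [x / total for x in row]

    # Posterior of each state at each position is proportional to fwd * bwd.
    posteriors = []
    for t in range(T):
        row = [fwd[t][i] * bwd[t][i] for i in range(n)]
        total = sum(row)
        posteriors.append([x / total for x in row])

    # Posterior decoding: choose the most probable state independently per position.
    path = [max(range(n), key=lambda i: p[i]) for p in posteriors]
    return posteriors, path


if __name__ == "__main__":
    # Illustrative two-state model with made-up numbers.
    init = [0.5, 0.5]
    trans = [[0.9, 0.1], [0.2, 0.8]]
    emit = [[0.8, 0.2], [0.3, 0.7]]
    posteriors, path = posterior_decode([0, 0, 1, 1], init, trans, emit)
    print(path)
```

Because each position is decided on its own, the resulting state sequence is not guaranteed to be a single valid path through the model; that is the main difference from Viterbi decoding.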

Open Source Text Processing Project: UMDHMM

UMDHMM: Hidden Markov Model Toolkit
Project Website: http://www.kanungo.com/software/software.html#umdhmm
Github Link: None
Description: Hidden Markov Model (HMM) Software: Implementation of Forward-Backward, Viterbi, and Baum-Welch algorithms. The software has been compiled and tested on UNIX platforms (Sun Solaris, DEC OSF, and Linux) …
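Of the algorithms named above, Viterbi is the easiest to show in a few lines. The following is a minimal Python sketch of Viterbi decoding in log space; it is not UMDHMM's code or interface, and the function name `viterbi`, the model layout, and the example probabilities are assumptions for illustration. It assumes a non-empty observation sequence.

```python
# Viterbi decoding sketch in log space, pure Python.
# This is NOT UMDHMM's implementation or API; names and layout are hypothetical.
import math


def _log(p):
    """Log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, init, trans, emit):
    """Return the single most probable state sequence for obs."""
    n = len(init)
    # Best log-score of any path ending in each state after the first symbol.
    score = [_log(init[i]) + _log(emit[i][obs[0]]) for i in range(n)]
    back = []  # back[t][j] = best predecessor of state j at step t + 1

    for o in obs[1:]:
        pointers, new_score = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + _log(trans[i][j]))
            pointers.append(best_i)
            new_score.append(score[best_i] + _log(trans[best_i][j]) + _log(emit[j][o]))
        back.append(pointers)
        score = new_score

    # Trace back from the best final state.
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    # Illustrative model with made-up numbers: states 0 and 1, symbols 0, 1, 2.
    init = [0.6, 0.4]
    trans = [[0.7, 0.3], [0.4, 0.6]]
    emit = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
    print(viterbi([0, 1, 2], init, trans, emit))
```

Working with log probabilities turns products into sums, which keeps long sequences from underflowing to zero.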