Open Source Text Processing Project: Jahmm

Jahmm: An implementation of Hidden Markov Models in Java
Project Website: https://code.google.com/archive/p/jahmm/
Github Link: https://github.com/KommuSoft/jahmm
Description: Jahmm (pronounced “jam”) is a Java implementation of Hidden Markov Model (HMM) related algorithms. It’s been designed to be easy to use (e.g. simple …

Open Source Text Processing Project: UMDHMM

UMDHMM: Hidden Markov Model Toolkit
Project Website: http://www.kanungo.com/software/software.html#umdhmm
Github Link: None
Description: Hidden Markov Model (HMM) Software: Implementation of Forward-Backward, Viterbi, and Baum-Welch algorithms. The software has been compiled and tested on UNIX platforms (Sun Solaris, DEC OSF, and Linux) …

Open Source Text Processing Project: GHMM

GHMM: The General Hidden Markov Model library
Project Website: http://www.ghmm.org/
Github Link: None
Description: The General Hidden Markov Model library (GHMM) is a freely available C library implementing efficient data structures and algorithms for basic and extended HMMs with discrete …

Open Source Text Processing Project: MGIZA

MGIZA++: a multi-threaded word alignment tool based on GIZA++
Project Website: http://www.kyloo.net/software/doku.php/mgiza:overview
Github Link: https://github.com/moses-smt/mgiza
Description: MGIZA++ is a multi-threaded word alignment tool based on GIZA++. It extends GIZA++ in multiple ways: Multi-threading: MGIZA++ can make use of multi-core platforms …

Open Source Text Processing Project: mkcls

mkcls: Training of word classes
Project Website: http://www.fjoch.com/mkcls.html
Github Link: https://github.com/moses-smt/giza-pp
Description: mkcls is a tool to train word classes using a maximum-likelihood criterion. The resulting word classes are especially suited for language models or statistical translation models. The program …

Open Source Text Processing Project: GIZA++

GIZA++: Training of statistical translation models
Project Website: http://www.fjoch.com/GIZA++.html
Github Link: https://github.com/moses-smt/giza-pp
Description: GIZA++ is an extension of the program GIZA (part of the SMT toolkit EGYPT) which was developed by the Statistical Machine Translation team during the summer workshop …

Open Source Text Processing Project: Thot

Thot: a Toolkit for Statistical Machine Translation
Project Website: http://daormar.github.io/thot/
Github Link: https://github.com/daormar/thot
Description: Thot is an open source software toolkit for statistical machine translation (SMT). Originally, Thot incorporated tools to train phrase-based models. The new version of Thot now …

Open Source Text Processing Project: PyTeaser

PyTeaser: Summarizes news articles given a URL
Project Website: http://xiaoxu193.github.io/PyTeaser/
Github Link: https://github.com/xiaoxu193/PyTeaser
Description: PyTeaser takes any news article and extracts a brief summary from it. It’s based on the original Scala project. Summaries are created by ranking sentences …

Open Source Text Processing Project: Python TextTeaser

TextTeaser: Official version of TextTeaser
Project Website: None
Github Link: https://github.com/DataTeaser/textteaser
Description: TextTeaser is an automatic summarization algorithm. This is now the official version of TextTeaser. Future developments of TextTeaser will be in this repository. The original Scala TextTeaser can …

Open Source Text Processing Project: summarizer

summarizer: A multi-document text summarizer
Project Website: None
Github Link: https://github.com/kylehg/summarizer
Description: UNMAINTAINED: CIS-530 Final Project. NOTE: This was a school project. It is very likely riddled with bugs and is entirely unmaintained. It should not be considered for any …