Open Source Text Processing Project: GIZA++

GIZA++: Training of statistical translation models Project Website: http://www.fjoch.com/GIZA++.html Github Link: https://github.com/moses-smt/giza-pp Description GIZA++ is an extension of the program GIZA (part of the SMT toolkit EGYPT) which was developed by the Statistical Machine Translation team during the summer workshop …

Open Source Text Processing Project: KenLM

KenLM: Faster and Smaller Language Model Queries Project Website: http://kheafield.com/code/kenlm/ Github Link: https://github.com/kpu/kenlm Description KenLM Language Model Toolkit. KenLM estimates, filters, …

Open Source Text Processing Project: IRSTLM

IRSTLM: The IRST Language Modeling Toolkit Project Website: http://hlt-mt.fbk.eu/technologies/irstlm Github Link: https://github.com/irstlm-team/irstlm Description The IRST Language Modeling (IRSTLM) Toolkit features algorithms and data structures suitable to estimate, store, and access very large n-gram language models. Our software has been integrated …

Open Source Text Processing Project: SRILM

SRILM – The SRI Language Modeling Toolkit Project Website: http://www.speech.sri.com/projects/srilm/ Github Link: None Description SRILM is a toolkit for building and applying statistical language models (LMs), primarily for use in speech recognition, statistical …

Open Source Text Processing Project: Thot

Thot: a Toolkit for Statistical Machine Translation Project Website: http://daormar.github.io/thot/ Github Link: https://github.com/daormar/thot Description Thot is an open source software toolkit for statistical machine translation (SMT). Originally, Thot incorporated tools to train phrase-based models. The new version of Thot now …

Open Source Text Processing Project: berkeleyparser

berkeleyparser Project Website: None Github Link: https://github.com/slavpetrov/berkeleyparser Description “THE BERKELEY PARSER” release 1.1, migrated from Google Code to GitHub in July 2015. This package contains the Berkeley Parser as described in “Learning Accurate, Compact, and Interpretable Tree Annotation” by Slav Petrov, Leon …

Open Source Text Processing Project: berkeleylm

berkeleylm Project Website: https://code.google.com/archive/p/berkeleylm/ Github Link: https://github.com/adampauls/berkeleylm Description An N-gram Language Model Library from UC Berkeley. This project provides a library for estimating and storing large n-gram language models in memory and accessing them efficiently. It is described in this paper: http://nlp.cs.berkeley.edu/pubs/Pauls-Klein_2011_LM_paper.pdf …

Open Source Text Processing Project: Joshua

Joshua Statistical Machine Translation Toolkit Project Website: http://joshua-decoder.org/ Github Link: https://github.com/joshua-decoder/joshua Description Joshua is an open-source statistical machine translation decoder for phrase-based (new in 6.0), hierarchical, and syntax-based machine translation, written in Java. It is developed at the Human Language …

Open Source Text Processing Project: MEAD

MEAD Project Website: http://www.summarization.com/mead/ Github Link: None Description MEAD is the most elaborate publicly available platform for multi-lingual summarization and evaluation. The platform implements multiple summarization algorithms such as position-based, centroid-based, largest common subsequence, and keywords. The methods for evaluating the …

Open Source Text Processing Project: SWING

SWING: An Open-Source Text Summarizer from WING Project Website: http://wing.comp.nus.edu.sg/downloads/swing/ Github Link: https://github.com/WING-NUS/SWING Description The Summarizer from the Web IR / NLP Group (WING), hence SWING, is a modular, state-of-the-art automatic extractive text summarization system. It produces informative summaries from …