Text Processing Book: Speech and Language Processing (3rd ed. draft)

Speech and Language Processing (3rd ed. draft)
Project Website: https://web.stanford.edu/~jurafsky/slp3/
Description: chapter list, with slides and relation to the 2nd ed.
1: Introduction [Ch. 1 in 2nd ed.]
2: Regular Expressions, Text Normalization, and Edit Distance. Slides: Text [pptx] [pdf], Edit Distance [pptx] [pdf] …

Open Source Text Processing Project: MGIZA

MGIZA++: a multi-threaded word alignment tool based on GIZA++
Project Website: http://www.kyloo.net/software/doku.php/mgiza:overview
Github Link: https://github.com/moses-smt/mgiza
Description: MGIZA++ is a multi-threaded word alignment tool based on GIZA++. It extends GIZA++ in multiple ways:
Multi-threading: MGIZA++ can make use of multi-core platforms …

Open Source Text Processing Project: mkcls

mkcls: Training of word classes
Project Website: http://www.fjoch.com/mkcls.html
Github Link: https://github.com/moses-smt/giza-pp
Description: mkcls is a tool that trains word classes using a maximum-likelihood criterion. The resulting word classes are especially suited for language models or statistical translation models. The program …

Open Source Text Processing Project: GIZA++

GIZA++: Training of statistical translation models
Project Website: http://www.fjoch.com/GIZA++.html
Github Link: https://github.com/moses-smt/giza-pp
Description: GIZA++ is an extension of the program GIZA (part of the SMT toolkit EGYPT), which was developed by the Statistical Machine Translation team during the summer workshop …

Open Source Text Processing Project: Thot

Thot: a Toolkit for Statistical Machine Translation
Project Website: http://daormar.github.io/thot/
Github Link: https://github.com/daormar/thot
Description: Thot is an open source software toolkit for statistical machine translation (SMT). Originally, Thot incorporated tools to train phrase-based models. The new version of Thot now …

Open Source Text Processing Project: Joshua

Joshua Statistical Machine Translation Toolkit
Project Website: http://joshua-decoder.org/
Github Link: https://github.com/joshua-decoder/joshua
Description: Joshua is an open-source statistical machine translation decoder for phrase-based (new in 6.0), hierarchical, and syntax-based machine translation, written in Java. It is developed at the Human Language …

Open Source Text Processing Project: Moses

Moses, the machine translation system
Project Website: http://www.statmt.org/moses/
Github Link: https://github.com/moses-smt/mosesdecoder
Description: Moses is a statistical machine translation system that allows you to automatically train translation models for any language pair. All you need is a collection of translated texts …