Open Source Text Processing Project: Text-NSP

NSP: The Ngram Statistics Package
Project Website: http://search.cpan.org/~tpederse/Text-NSP/
Github Link: None
Description: The Ngram Statistics Package (NSP) is a collection of Perl modules that aid in analyzing Ngrams in text files. We define an Ngram as a sequence of ‘n’ …
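
To give a rough sense of what counting Ngrams in a text involves, here is a minimal Python sketch. It is not NSP's own Perl interface: the helper name count_ngrams is hypothetical, and it assumes that tokens are simply whitespace-separated words.

```python
from collections import Counter


def count_ngrams(text, n):
    """Count every run of n consecutive tokens in text.

    Assumption for illustration only: tokens are whitespace-separated
    words. Real toolkits usually let you configure tokenization.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    tokens = text.split()
    # Slide a window of width n across the token list.
    windows = (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return Counter(windows)


bigram_counts = count_ngrams("the cat sat on the cat", 2)
for ngram, count in bigram_counts.most_common():
    print(" ".join(ngram), count)
```

Using tuples as keys keeps the sketch independent of n, so the same function handles unigrams, bigrams, and longer sequences.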

Open Source Text Processing Project: KenLM

KenLM: Faster and Smaller Language Model Queries
Project Website: http://kheafield.com/code/kenlm/
Github Link: https://github.com/kpu/kenlm
Description: KenLM Language Model Toolkit. KenLM estimates, filters, …

Open Source Text Processing Project: IRSTLM

IRSTLM: The IRST Language Modeling Toolkit
Project Website: http://hlt-mt.fbk.eu/technologies/irstlm
Github Link: https://github.com/irstlm-team/irstlm
Description: The IRST Language Modeling (IRSTLM) Toolkit features algorithms and data structures suitable to estimate, store, and access very large n-gram language models. Our software has been integrated …

Open Source Text Processing Project: SRILM

SRILM – The SRI Language Modeling Toolkit
Project Website: http://www.speech.sri.com/projects/srilm/
Github Link: None
Description: SRILM is a toolkit for building and applying statistical language models (LMs), primarily for use in speech recognition, statistical …

Open Source Text Processing Project: berkeleylm

berkeleylm
Project Website: https://code.google.com/archive/p/berkeleylm/
Github Link: https://github.com/adampauls/berkeleylm
Description: An N-gram Language Model Library from UC Berkeley. This project provides a library for estimating and storing large n-gram language models in memory and accessing them efficiently. It is described in this paper (http://nlp.cs.berkeley.edu/pubs/Pauls-Klein_2011_LM_paper.pdf). …