Project Website: https://code.google.com/archive/p/berkeleylm/
GitHub Link: https://github.com/adampauls/berkeleylm
An N-gram Language Model Library from UC Berkeley
This project provides a library for estimating, storing, and efficiently querying large n-gram language models in memory. It is described in this paper: http://nlp.cs.berkeley.edu/pubs/Pauls-Klein_2011_LM_paper.pdf. Its data structures are both faster and smaller than those of SRILM, and nearly as fast as KenLM (http://kheafield.com/code/kenlm/), despite being written in Java rather than C++. It also achieves the best published lossless encoding of the Google n-gram corpus.
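To illustrate the kind of query such a library answers, here is a minimal, self-contained Java sketch of backoff n-gram scoring. This is a toy model using plain hash maps; it is not BerkeleyLM's actual API or its compact data structures, and the class and method names (`ToyBackoffLm`, `addNgram`, `logProb`) are invented for this example only.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Toy backoff n-gram language model (conceptual sketch only, not
// BerkeleyLM's implementation). N-grams are keyed by their space-joined
// word sequence; values are log10 probabilities and backoff weights.
public class ToyBackoffLm {
    private final Map<String, Double> logProbs = new HashMap<>();
    private final Map<String, Double> backoffs = new HashMap<>();

    public void addNgram(String ngram, double logProb, double backoff) {
        logProbs.put(ngram, logProb);
        backoffs.put(ngram, backoff);
    }

    // Katz-style backoff: if the n-gram words[start..] was seen, return its
    // stored log probability; otherwise add the backoff weight of its
    // history and recurse on the shortened n-gram.
    public double logProb(String[] words, int start) {
        String ngram = String.join(" ", Arrays.copyOfRange(words, start, words.length));
        Double p = logProbs.get(ngram);
        if (p != null) {
            return p;
        }
        if (start == words.length - 1) {
            return -99.0; // floor for out-of-vocabulary unigrams
        }
        String history = String.join(" ", Arrays.copyOfRange(words, start, words.length - 1));
        double backoff = backoffs.getOrDefault(history, 0.0);
        return backoff + logProb(words, start + 1);
    }

    public static void main(String[] args) {
        ToyBackoffLm lm = new ToyBackoffLm();
        lm.addNgram("the", -1.0, -0.5);
        lm.addNgram("cat", -2.0, 0.0);
        lm.addNgram("the cat", -0.5, 0.0);
        // Seen bigram: looked up directly.
        System.out.println(lm.logProb(new String[] { "the", "cat" }, 0));
        // Unseen bigram: backs off through the history "the" to the unigram.
        System.out.println(lm.logProb(new String[] { "the", "dog" }, 0));
    }
}
```

The point of the library is that it stores exactly this kind of table far more compactly than a hash map of strings, while keeping lookups fast.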