Open Source Text Processing Project: berkeleylm

berkeleylm

Description

An N-gram Language Model Library from UC Berkeley

This project provides a library for estimating large n-gram language models, storing them in memory, and accessing them efficiently. It is described in this paper (http://nlp.cs.berkeley.edu/pubs/Pauls-Klein_2011_LM_paper.pdf). Its data structures are faster and smaller than those of SRILM and nearly as fast as those of KenLM (http://kheafield.com/code/kenlm/), despite being written in Java instead of C++. It also achieves the best published lossless encoding of the Google n-gram corpus.
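To make the idea of storing and querying n-grams in memory concrete, here is a minimal sketch of a toy bigram model in Java. It is not berkeleylm's API, nor a reproduction of its data structures: the class name `TinyBigramLm` and its methods `train` and `prob` are hypothetical, chosen only for illustration. The sketch maps words to integer ids and packs each bigram into a single `long` key, a simple instance of the general technique of keeping n-grams in a compact numeric form. It uses plain maximum-likelihood estimates and omits smoothing and backoff, which a real language model needs.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy bigram model for illustration only; not berkeleylm's API or data structures. */
public class TinyBigramLm {
    // Word -> integer id, so n-grams can be stored as numbers instead of strings.
    private final Map<String, Integer> vocab = new HashMap<>();
    // Packed (firstId, secondId) -> number of times the bigram was seen.
    private final Map<Long, Integer> bigramCounts = new HashMap<>();
    // firstId -> number of bigrams that start with this word.
    private final Map<Integer, Integer> contextCounts = new HashMap<>();

    // Return the id for a word, assigning the next free id on first sight.
    private int id(String word) {
        Integer existing = vocab.get(word);
        if (existing != null) {
            return existing;
        }
        int fresh = vocab.size();
        vocab.put(word, fresh);
        return fresh;
    }

    // Pack two 32-bit word ids into one 64-bit key.
    private static long pack(int first, int second) {
        return ((long) first << 32) | (second & 0xFFFFFFFFL);
    }

    /** Count every adjacent word pair in the token sequence. */
    public void train(String[] tokens) {
        for (int i = 0; i + 1 < tokens.length; i++) {
            int a = id(tokens[i]);
            int b = id(tokens[i + 1]);
            bigramCounts.merge(pack(a, b), 1, Integer::sum);
            contextCounts.merge(a, 1, Integer::sum);
        }
    }

    /** Maximum-likelihood estimate of P(second | first); 0.0 if unseen. */
    public double prob(String first, String second) {
        Integer a = vocab.get(first);
        Integer b = vocab.get(second);
        if (a == null || b == null) {
            return 0.0;
        }
        int context = contextCounts.getOrDefault(a, 0);
        if (context == 0) {
            return 0.0;
        }
        return bigramCounts.getOrDefault(pack(a, b), 0) / (double) context;
    }

    public static void main(String[] args) {
        TinyBigramLm lm = new TinyBigramLm();
        lm.train("the cat sat on the mat".split(" "));
        System.out.println(lm.prob("the", "cat")); // "the" starts 2 bigrams, 1 of them "the cat"
        System.out.println(lm.prob("cat", "the")); // never observed
    }
}
```

A hash map of boxed values like this is convenient but memory-hungry; it is meant only to show the shape of the problem that a dedicated library addresses with more compact encodings.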

