An N-gram Language Model Library from UC Berkeley

This project provides a library for estimating large n-gram language models, storing them in memory, and accessing them efficiently. It is described in this paper. Its data structures are faster and smaller than SRILM and nearly as fast as KenLM, despite being written in Java instead of C++. It also achieves the best published lossless encoding of the Google n-gram corpus.

