Open Source Text Processing Project: SRILM

SRILM – The SRI Language Modeling Toolkit

Project Website:

Github Link: None

Description

SRILM is a toolkit for building and applying statistical language models (LMs), primarily for use in speech recognition, statistical tagging and segmentation, and machine translation. It has been under development at the SRI Speech Technology and Research Laboratory since 1995. The toolkit has also benefited greatly from its use and enhancement during the Johns Hopkins University/CLSP summer workshops in 1995, 1996, 1997, and 2002 (see the project history).

These pages and the software itself assume that you know what statistical language modeling is. To learn about language modeling, we recommend the following textbooks:

Speech and Language Processing by Dan Jurafsky and Jim Martin (chapter 6 in the 1st edition, chapter 4 in the 2nd edition).
Foundations of Statistical Natural Language Processing by Chris Manning and Hinrich Schütze (chapter 6).
Either book gives an excellent introduction to N-gram language modeling, which is the main type of LM supported by SRILM.
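To make the N-gram idea concrete, here is a minimal C++ sketch of a bigram (N = 2) model. It is not SRILM code and does not use SRILM's class libraries. The class name `BigramModel` and its methods are hypothetical. It uses add-one (Laplace) smoothing only to keep the example short. SRILM itself provides more refined discounting and backoff methods, such as Good-Turing and Kneser-Ney.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical bigram model for illustration only; this is not SRILM's API.
// Each sentence is padded with <s> and </s> boundary tokens.
class BigramModel {
public:
    // Count every (history, next word) pair in the training sentences.
    void train(const std::vector<std::string>& sentences) {
        for (const std::string& line : sentences) {
            const std::vector<std::string> toks = tokenize(line);
            for (std::size_t i = 0; i + 1 < toks.size(); ++i) {
                ++history_count_[toks[i]];
                ++bigram_count_[{toks[i], toks[i + 1]}];
                vocab_.insert(toks[i + 1]);
            }
        }
    }

    // P(word | prev) with add-one (Laplace) smoothing:
    // (c(prev, word) + 1) / (c(prev) + V), V = number of predictable tokens.
    double prob(const std::string& prev, const std::string& word) const {
        assert(!vocab_.empty() && "call train() before prob()");
        const double v = static_cast<double>(vocab_.size());
        const double c_bigram = lookup(bigram_count_, {prev, word});
        const double c_hist = lookup(history_count_, prev);
        return (c_bigram + 1.0) / (c_hist + v);
    }

private:
    // Whitespace tokenization plus sentence-boundary tokens.
    static std::vector<std::string> tokenize(const std::string& line) {
        std::vector<std::string> toks{"<s>"};
        std::istringstream in(line);
        for (std::string w; in >> w;) toks.push_back(w);
        toks.push_back("</s>");
        return toks;
    }

    // Count lookup that returns 0 for unseen keys.
    template <typename Map>
    static double lookup(const Map& m, const typename Map::key_type& key) {
        const auto it = m.find(key);
        return it == m.end() ? 0.0 : static_cast<double>(it->second);
    }

    std::map<std::string, std::size_t> history_count_;
    std::map<std::pair<std::string, std::string>, std::size_t> bigram_count_;
    std::set<std::string> vocab_;  // predictable tokens: words and </s>
};
```

For example, after training on the two sentences "a b" and "a c", `prob("a", "b")` evaluates to (1 + 1) / (2 + 4) = 1/3. The word "a" occurs twice as a history, and four distinct tokens (a, b, c, </s>) can follow a history. The smoothed probabilities for any history sum to 1 over those four tokens.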

SRILM consists of the following components:

A set of C++ class libraries implementing language models, supporting data structures and miscellaneous utility functions.
A set of executable programs built on top of these libraries to perform standard tasks such as training LMs and testing them on data, tagging or segmenting text, etc.
A collection of miscellaneous scripts facilitating minor related tasks.
SRILM runs on UNIX and Windows platforms.
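Testing an LM on data usually means measuring its perplexity on held-out text. The sketch below is hypothetical and independent of SRILM's own programs. It computes perplexity from per-token base-10 log probabilities, the convention used in ARPA-format model files. The function name `perplexity` is an illustrative assumption, not part of SRILM. Tools differ in which tokens they count toward N, for example end-of-sentence markers or out-of-vocabulary words. This sketch simply averages over every score it is given.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Perplexity from per-token base-10 log probabilities:
//   PPL = 10^(-(1/N) * sum_i log10 P(w_i | history_i))
// Every score passed in counts toward N; deciding which tokens to score
// (e.g. </s>, out-of-vocabulary words) is left to the caller.
double perplexity(const std::vector<double>& log10_probs) {
    assert(!log10_probs.empty() && "need at least one scored token");
    double total = 0.0;
    for (const double lp : log10_probs) total += lp;
    const double n = static_cast<double>(log10_probs.size());
    return std::pow(10.0, -total / n);
}
```

For example, a model that assigns probability 1/4 to each of four test tokens has a perplexity of 4. On that text, it is as uncertain as a uniform choice among four words.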

SRILM has been used in a great variety of statistical modeling applications.

Others have published extensions to SRILM that add new functionality.