Open Source Deep Learning Project: mxnet

MXNet: Flexible and Efficient Library for Deep Learning

Project Website:

Github Link:


MXNet is a deep learning framework designed for both efficiency and flexibility. It allows you to mix symbolic and imperative programming to maximize efficiency and productivity. At its core is a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations on the fly. A graph optimization layer on top of that makes symbolic execution fast and memory efficient. The library is portable and lightweight, and it scales to multiple GPUs and multiple machines.
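The difference between the two styles can be sketched in plain Python. This is a conceptual illustration of eager versus declared-then-executed computation, not the MXNet API:

```python
# Imperative style: each statement executes immediately.
a = 2
b = a * 3        # computed right away
c = b + 1        # computed right away

# Symbolic style: first declare the computation as a graph of
# operations, then execute the graph as a whole, which gives a
# scheduler the chance to optimize and parallelize it.
graph = [('mul', 'a', 3, 'b'),   # b = a * 3
         ('add', 'b', 1, 'c')]   # c = b + 1

def execute(graph, inputs):
    env = dict(inputs)
    for op, src, const, dst in graph:
        env[dst] = env[src] * const if op == 'mul' else env[src] + const
    return env

result = execute(graph, {'a': 2})
assert result['c'] == c == 7
```

Because the symbolic graph is known in full before execution, independent operations can be scheduled in parallel, which is what MXNet's dependency scheduler does automatically.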

MXNet is more than a deep learning project. It is also a collection of blueprints and guidelines for building deep learning systems, and a source of interesting insights into DL systems for hackers.


Design notes providing useful insights that can be reused by other DL projects
Flexible configuration for arbitrary computation graph
Mix and match good flavours of programming to maximize flexibility and efficiency
Lightweight, memory efficient and portable to smart devices
Scales up to multiple GPUs and distributed settings with automatic parallelism
Support for Python, R, C++ and Julia
Cloud-friendly and directly compatible with S3, HDFS, and Azure

Open Source Deep Learning Project: TensorFlow

TensorFlow is an Open Source Software Library for Machine Intelligence

Project Website:

Github Link:


TensorFlow™ is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API. TensorFlow was originally developed by researchers and engineers working on the Google Brain Team within Google’s Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well.
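The nodes-and-edges model can be illustrated with a toy dataflow graph in Python. This is a conceptual sketch of the idea, not TensorFlow's actual API:

```python
# Each node holds an operation and the upstream nodes whose
# outputs it consumes; those input links are the graph's edges.
class Node:
    def __init__(self, op, inputs=()):
        self.op = op          # callable applied to the input values
        self.inputs = inputs  # upstream nodes

    def evaluate(self):
        # Recursively evaluate upstream nodes, then apply this op.
        return self.op(*(n.evaluate() for n in self.inputs))

# Build a graph computing (x + y) * y with x = 2, y = 3.
x = Node(lambda: 2.0)
y = Node(lambda: 3.0)
s = Node(lambda a, b: a + b, (x, y))
prod = Node(lambda a, b: a * b, (s, y))

print(prod.evaluate())  # 15.0
```

In a real system the graph is not evaluated recursively like this; it is compiled and dispatched to CPUs or GPUs, but the node/edge structure is the same.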

Open Source Deep Learning Project: Blocks

Blocks: A Theano framework for building and training neural networks

Project Website: None

Github Link:


Blocks is a framework that helps you build neural network models on top of Theano. Currently it supports and provides:

Constructing parametrized Theano operations, called “bricks”
Pattern matching to select variables and bricks in large models
Algorithms to optimize your model
Saving and resuming of training
Monitoring and analyzing values during training progress (on the training set as well as on test sets)
Application of graph transformations, such as dropout
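The idea of a parametrized “brick” can be sketched without Theano. The class and method names here are hypothetical; real Blocks bricks wrap Theano operations, but the shape of the abstraction is the same:

```python
# A "brick" bundles parameters with the operation that applies them,
# so one component can be allocated once and reused on many inputs.
class LinearBrick:
    def __init__(self, W, b):
        self.W = W  # input_dim x output_dim weight matrix
        self.b = b  # output_dim bias vector

    def apply(self, x):
        # y_j = sum_i x_i * W[i][j] + b_j
        return [sum(xi * row[j] for xi, row in zip(x, self.W)) + bj
                for j, bj in enumerate(self.b)]

brick = LinearBrick(W=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], b=[0.5, 0.5])
out = brick.apply([1.0, 2.0, 3.0])
print(out)  # [4.5, 5.5]
```

Because the parameters live inside the brick, the same component can be selected, saved, and transformed by the framework, which is what enables the pattern matching and graph transformations listed above.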
In the future we also hope to support:

Dimension, type and axes-checking
See Also:
Fuel, the data processing engine developed primarily for Blocks.
Blocks-examples for maintained examples of scripts using Blocks.
Blocks-extras for semi-maintained additional Blocks components.

Open Source Deep Learning Project: Pylearn2

Pylearn2: A machine learning research library

Project Website:

Github Link:


Pylearn2 is a machine learning library. Most of its functionality is built on top of Theano. This means you can write Pylearn2 plugins (new models, algorithms, etc) using mathematical expressions, and Theano will optimize and stabilize those expressions for you, and compile them to a backend of your choice (CPU or GPU).

Pylearn2 Vision
Researchers add features as they need them. We avoid getting bogged down by too much top-down planning in advance.
A machine learning toolbox for easy scientific experimentation.
All models/algorithms published by the LISA lab should have reference implementations in Pylearn2.
Pylearn2 may wrap other libraries such as scikit-learn when this is practical.
Pylearn2 differs from scikit-learn in that Pylearn2 aims to provide great flexibility and make it possible for a researcher to do almost anything, while scikit-learn aims to work as a “black box” that can produce good results even if the user does not understand the implementation.
Dataset interface for vectors, images, video, …
Small framework providing everything needed for typical MLP/RBM/SDA/convolution experiments.
Easy reuse of Pylearn2 sub-components.
Using one sub-component of the library does not force you to use or learn all of the other sub-components.
Support cross-platform serialization of learned models.
Remain approachable enough to be used in the classroom (IFT6266 at the University of Montreal).

Open Source Deep Learning Project: Torch


Project Website:

Github Link:


What is Torch?
Torch is a scientific computing framework with wide support for machine learning algorithms that puts GPUs first. It is easy to use and efficient, thanks to an easy and fast scripting language, LuaJIT, and an underlying C/CUDA implementation.

A summary of core features:

a powerful N-dimensional array
lots of routines for indexing, slicing, transposing, …
amazing interface to C, via LuaJIT
linear algebra routines
neural network, and energy-based models
numeric optimization routines
Fast and efficient GPU support
Embeddable, with ports to iOS, Android and FPGA backends
Why Torch?
The goal of Torch is to have maximum flexibility and speed in building your scientific algorithms while making the process extremely simple. Torch comes with a large ecosystem of community-driven packages in machine learning, computer vision, signal processing, parallel processing, image, video, audio and networking among others, and builds on top of the Lua community.

At the heart of Torch are the popular neural network and optimization libraries which are simple to use, while having maximum flexibility in implementing complex neural network topologies. You can build arbitrary graphs of neural networks, and parallelize them over CPUs and GPUs in an efficient manner.

Using Torch
Start with our Getting Started guide to download and try Torch yourself. Torch is open-source, so you can also start with the code on the GitHub repo.

Torch is constantly evolving: it is already used within Facebook, Google, Twitter, NYU, IDIAP, Purdue and several other companies and research labs.

Open Source Deep Learning Project: Theano


Project Website:

Github Link:


Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Theano features:

tight integration with NumPy – Use numpy.ndarray in Theano-compiled functions.
transparent use of a GPU – Perform data-intensive calculations up to 140x faster than on a CPU (float32 only).
efficient symbolic differentiation – Theano does your derivatives for functions with one or many inputs.
speed and stability optimizations – Get the right answer for log(1+x) even when x is really tiny.
dynamic C code generation – Evaluate expressions faster.
extensive unit-testing and self-verification – Detect and diagnose many types of errors.
Theano has been powering large-scale computationally intensive scientific investigations since 2007. But it is also approachable enough to be used in the classroom (University of Montreal’s deep learning/machine learning classes).
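The log(1+x) point is easy to demonstrate in plain Python: for tiny x the naive formula loses all precision, and rewriting it to a stable form is exactly the kind of optimization Theano applies to your expressions automatically:

```python
import math

x = 1e-20
naive = math.log(1 + x)   # 1 + 1e-20 rounds to exactly 1.0, so this is 0.0
stable = math.log1p(x)    # computes log(1+x) accurately: ~1e-20

assert naive == 0.0
assert abs(stable - 1e-20) < 1e-30
```

With Theano you would simply write log(1 + x); the stability rewrite happens during graph optimization rather than in user code.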

Text Processing Book: Speech and Language Processing (3rd ed. draft)

Speech and Language Processing (3rd ed. draft)

Project Website:



The chapters, their slides, and their relation to the 2nd edition:

1: Introduction [Ch. 1 in 2nd ed.]
2: Regular Expressions, Text Normalization, and Edit Distance (slides: Text [pptx] [pdf]; Edit Distance [pptx] [pdf]) [Ch. 2 and parts of Ch. 3 in 2nd ed.]
3: Finite State Transducers
4: N-Grams (slides: LM [pptx] [pdf]) [Ch. 4 in 2nd ed.]
5: Neural Language Models and RNNs
6: Spelling Correction and the Noisy Channel (slides: Spelling [pptx] [pdf]) [expanded from pieces in Ch. 5 in 2nd ed.]
7: Classification: Naive Bayes, Logistic Regression, Sentiment (slides: NB [pptx] [pdf]; Sentiment [pptx] [pdf]) [new in this edition]
8: Hidden Markov Models [Ch. 6 in 2nd ed.]
9: Part-of-Speech Tagging [Ch. 5 in 2nd ed.]
10: Formal Grammars of English
11: Syntactic Parsing
12: Statistical Parsing
13: Dependency Parsing
14: Language and Complexity
15: Vector Semantics (slides: Vector [pptx] [pdf]) [expanded from parts of Ch. 19 and 20 in 2nd ed.]
16: Semantics with Dense Vectors (slides: Dense Vector [pptx] [pdf]) [new in this edition]
18: Computing with Word Senses: WSD and WordNet (slides: Intro, Sim [pptx] [pdf]; WSD [pptx] [pdf]) [expanded from parts of Ch. 19 and 20 in 2nd ed.]
21: Lexicons for Sentiment and Affect Extraction (slides: SentLex [pptx] [pdf]) [new in this edition]
16: The Representation of Sentence Meaning
17: Computational Semantics
??: Neural Models of Sentence Meaning (LSTM, CNN, etc.)
20: Information Extraction [Ch. 22 in 2nd ed.]
22: Semantic Role Labeling and Argument Structure (slides: SRL [pptx] [pdf]; Select [pptx] [pdf]) [expanded from parts of Ch. 19 and 20 in 2nd ed.]
23: Coreference Resolution and Entity Linking
24: Discourse Coherence
25: Summarization
26: Machine Translation
27: Question Answering
28: Conversational Agents
29: Speech Recognition
30: Speech Synthesis

About the Author
Dan Jurafsky is an associate professor in the Department of Linguistics and, by courtesy, in the Department of Computer Science at Stanford University. Previously, he was on the faculty of the University of Colorado, Boulder, in the Linguistics and Computer Science departments and the Institute of Cognitive Science. He was born in Yonkers, New York, and received a B.A. in Linguistics in 1983 and a Ph.D. in Computer Science in 1992, both from the University of California at Berkeley. He received the National Science Foundation CAREER award in 1998 and the MacArthur Fellowship in 2002. He has published over 90 papers on a wide range of topics in speech and language processing.

James H. Martin is a professor in the Department of Computer Science and in the Department of Linguistics, and a fellow in the Institute of Cognitive Science at the University of Colorado at Boulder. He was born in New York City, received a B.S. in Computer Science from Columbia University in 1981 and a Ph.D. in Computer Science from the University of California at Berkeley in 1988. He has authored over 70 publications in computer science including the book A Computational Model of Metaphor Interpretation.

Open Source Text Processing Project: Festival

The Festival Speech Synthesis System

Project Website:
Github Link: None


Festival offers a general framework for building speech synthesis systems as well as including examples of various modules. As a whole it offers full text-to-speech through a number of APIs: from the shell level, through a Scheme command interpreter, as a C++ library, from Java, and via an Emacs interface. Festival is multi-lingual (currently English (British and American) and Spanish), though English is the most advanced. Other groups release new languages for the system, and full tools and documentation for building new voices are available through Carnegie Mellon’s FestVox project.

The system is written in C++ and uses the Edinburgh Speech Tools Library for low-level architecture, and has a Scheme (SIOD) based command interpreter for control. Documentation is given in the FSF texinfo format, which can generate a printed manual, info files and HTML.

Festival is free software. Festival and the speech tools are distributed under an X11-type licence allowing unrestricted commercial and non-commercial use alike.

Open Source Text Processing Project: PyJulius

PyJulius: Python interface to Julius speech recognition engine

Project Website:
Github Link:


pyjulius provides a simple interface to connect to a julius module server.

First you will need to run julius with the -module option (documentation here or man julius). Julius will wait for a client to connect; this is what Client does, in a threaded way.

Let’s just write a simple program that will print whatever the julius server sends until you press CTRL+C:

#!/usr/bin/env python
import sys
import pyjulius
import Queue

# Initialize and try to connect
client = pyjulius.Client('localhost', 10500)
try:
    client.connect()
except pyjulius.ConnectionError:
    print 'Start julius as module first!'
    sys.exit(1)

# Start listening to the server
client.start()
try:
    while 1:
        try:
            result = client.results.get(False)
        except Queue.Empty:
            continue
        print repr(result)
except KeyboardInterrupt:
    print 'Exiting...'
    client.stop()  # send the stop signal
    client.join()  # wait for the thread to die
    client.disconnect()  # disconnect from julius
If you are only interested in recognitions, wait for an instance of Sentence in the queue:

if isinstance(result, pyjulius.Sentence):
    print 'Sentence "%s" recognized with score %.2f' % (result, result.score)
If you do not want Client to interpret the raw XML Element, you can set the modelize attribute to False.

If you encounter any encoding issues, have a look at the -charconv option of julius and set Client.encoding to the right value.

Open Source Text Processing Project: eSpeak

eSpeak text to speech

Project Website:
Github Link: None


eSpeak is a compact open source software speech synthesizer for English and other languages, for Linux and Windows.
eSpeak uses a “formant synthesis” method. This allows many languages to be provided in a small size. The speech is clear, and can be used at high speeds, but is not as natural or smooth as larger synthesizers which are based on human speech recordings.

eSpeak is available as:

A command line program (Linux and Windows) to speak text from a file or from stdin.
A shared library version for use by other programs. (On Windows this is a DLL).
A SAPI5 version for Windows, so it can be used with screen-readers and other programs that support the Windows SAPI5 interface.
eSpeak has been ported to other platforms, including Android, Mac OSX and Solaris.
Includes different Voices, whose characteristics can be altered.
Can produce speech output as a WAV file.
SSML (Speech Synthesis Markup Language) is supported (not complete), and also HTML.
Compact size. The program and its data, including many languages, totals about 2 Mbytes.
Can be used as a front-end to MBROLA diphone voices, see mbrola.html. eSpeak converts text to phonemes with pitch and length information.
Can translate text into phoneme codes, so it could be adapted as a front end for another speech synthesis engine.
Potential for other languages. Several are included in varying stages of progress. Help from native speakers for these or other languages is welcome.
Development tools are available for producing and tuning phoneme data.
Written in C.