Open Source Deep Learning Project: neuralnetworks

neuralnetworks: Deep Neural Networks with GPU support

Project Website: None

Github Link:


This is a Java implementation of some of the algorithms for training deep neural networks. GPU support is provided via OpenCL and Aparapi. The architecture is designed with modularity, extensibility and pluggability in mind.

Git structure

I’m using the git-flow model. The most stable (but older) sources are available in the master branch, while the latest ones are in the develop branch.

If you want to use the previous Java 7 compatible version you can check out this release.

Neural network types

Multilayer perceptron
Restricted Boltzmann Machine
Deep belief network
Stacked autoencoder
Convolutional networks with max pooling, average pooling and stochastic pooling.
Maxout networks (work-in-progress)

Training algorithms

Backpropagation – supports multilayer perceptrons, convolutional networks and dropout.
Contrastive divergence and persistent contrastive divergence implemented using these and these guidelines.
Greedy layer-wise training for deep networks – works for stacked autoencoders and DBNs, but supports any kind of training.
All the algorithms support GPU execution.
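
To make the contrastive divergence entry above more concrete, here is a minimal NumPy sketch of a single CD-1 update for a binary RBM. It is not taken from the neuralnetworks code base (which is Java and runs on the GPU); the function name cd1_update, the toy data and all hyperparameters are assumptions chosen for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, b_vis, b_hid, v0, lr, rng):
    """One CD-1 step on a batch of binary visible vectors v0 (batch x n_vis).

    Hypothetical helper for illustration; updates W, b_vis, b_hid in place.
    """
    # Positive phase: hidden probabilities and a binary sample given the data.
    h0_prob = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(v0.dtype)
    # Negative phase: one Gibbs step down to the visibles and up again.
    v1_prob = sigmoid(h0 @ W.T + b_vis)
    h1_prob = sigmoid(v1_prob @ W + b_hid)
    # Gradient estimate: data statistics minus reconstruction statistics.
    n = v0.shape[0]
    W += lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / n
    b_vis += lr * (v0 - v1_prob).mean(axis=0)
    b_hid += lr * (h0_prob - h1_prob).mean(axis=0)
    # Mean squared reconstruction error, used for monitoring only.
    return float(np.mean((v0 - v1_prob) ** 2))

rng = np.random.default_rng(0)
n_vis, n_hid = 6, 3
W = 0.01 * rng.standard_normal((n_vis, n_hid))
b_vis = np.zeros(n_vis)
b_hid = np.zeros(n_hid)
# Toy data set: two complementary binary patterns, repeated.
data = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]] * 10, dtype=float)
errors = [cd1_update(W, b_vis, b_hid, data, 0.1, rng) for _ in range(300)]
print(errors[0], errors[-1])
```

Persistent contrastive divergence differs mainly in that the negative phase continues a Gibbs chain kept between updates instead of restarting it from the data.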

Out of the box supported datasets are MNIST, CIFAR-10/CIFAR-100 (experimental, not much testing), IRIS and XOR, but you can easily implement your own.

Experimental support of RGB image preprocessing operations – affine transformations, cropping, and color scaling (see -> testImageInputProvider).

Activation functions

Weighted sum
All the functions support GPU execution. They can be applied to all types of networks and all training algorithms. You can also implement new activations.

Open Source Deep Learning Project: CUV

CUV: Matrix library for CUDA in C++ and Python

Project Website:

Github Link:


CUV is a C++ template and Python library which makes it easy to use NVIDIA(tm) CUDA.


Supported Platforms:

• This library has only been tested on Ubuntu Karmic, Lucid and Maverick. It uses
mostly standard components (except PyUBLAS) and should run without major
modification on any current Linux system.

Supported GPUs:

• By default, code is generated for the lowest compute architecture. We
recommend you change this to match your hardware. Using ccmake you can set
the build variable “CUDA_ARCHITECTURE”, for example to -arch=compute_20.
• All GT 9800 and GTX 280 and above
• GT 9200 without convolutions. It might need some minor modifications to
make the rest work. If you want to use that card and have problems, just
get in contact.
• On 8800GTS, random numbers and convolutions won’t work.


• Like Matlab, for example, CUV assumes that everything is an n-dimensional
array called a “tensor”
• Tensors can have an arbitrary data-type and can be on the host (CPU-memory)
or device (GPU-memory)
• Tensors can be column-major or row-major (1-dimensional tensors are, by
convention, row-major)
• The library defines many functions which may or may not apply to all
possible combinations. Variations are easy to add.
• For convenience, we also wrap some of the functionality provided by Alex
Krizhevsky on his website, with permission. Thanks Alex for providing your
code!
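
The difference between row-major and column-major tensors described above can be illustrated with plain NumPy arrays. This sketch does not use the CUV API; NumPy arrays stand in for host tensors, and the variable names are assumptions made for the example.

```python
import numpy as np

values = np.arange(6, dtype=np.float32)

# The same 2 x 3 logical array stored in two different memory layouts.
row_major = values.reshape((2, 3), order="C")
col_major = np.asfortranarray(row_major)

# The logical contents are identical...
same_contents = bool(np.array_equal(row_major, col_major))

# ...but the strides (in bytes) and the order of elements in memory differ.
memory_row = row_major.ravel(order="K").tolist()
memory_col = col_major.ravel(order="K").tolist()

print(same_contents)                         # True
print(row_major.strides, col_major.strides)  # (12, 4) (4, 8)
print(memory_row)  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
print(memory_col)  # [0.0, 3.0, 1.0, 4.0, 2.0, 5.0]
```

The layout matters when passing data between libraries, since a function written for one layout will read a transposed matrix if handed a buffer in the other.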

Python Integration

• CUV plays well with Python and NumPy. That is, once you have written your fast
GPU functions in CUDA/C++, you can export them using Boost.Python. You can use
NumPy for pre-processing and fancy stuff you have not yet implemented, then
push the NumPy matrix to the GPU, run your operations there, pull the result
back to the CPU and visualize it using matplotlib. Great.

Implemented Functionality

• Simple Linear Algebra for dense vectors and matrices (BLAS level 1,2,3)
• Helpful functors and abstractions
• Sparse matrices in DIA format and matrix-multiplication for these matrices
• I/O functions using boost.serialization
• Fast Random Number Generator
• Up to now, CUV was used to build dense and sparse Neural Networks and
Restricted Boltzmann Machines (RBM), convolutional or locally connected.
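
As a rough illustration of the DIA (diagonal) sparse format mentioned in the list above, the following NumPy sketch stores a matrix by its diagonals and multiplies it with a vector. It is not CUV code: the helper names and the row-aligned storage convention (data[k, i] holds A[i, i + offsets[k]]) are assumptions made for this example, and real libraries may align the diagonals differently.

```python
import numpy as np

def dia_matvec(offsets, data, x):
    """Compute y = A @ x for a square matrix A stored by diagonals.

    Assumed convention: data[k, i] holds A[i, i + offsets[k]].
    """
    n = x.shape[0]
    y = np.zeros(n, dtype=np.result_type(data, x))
    for k, off in enumerate(offsets):
        first = max(0, -off)      # first row where this diagonal is inside A
        last = min(n, n - off)    # one past the last such row
        rows = np.arange(first, last)
        y[rows] += data[k, rows] * x[rows + off]
    return y

def dia_to_dense(offsets, data):
    """Expand the diagonal storage into a dense matrix (for checking only)."""
    n = data.shape[1]
    dense = np.zeros((n, n), dtype=data.dtype)
    for k, off in enumerate(offsets):
        for i in range(n):
            j = i + off
            if 0 <= j < n:
                dense[i, j] = data[k, i]
    return dense

# Tridiagonal example: 2 on the main diagonal, -1 next to it.
offsets = [-1, 0, 1]
data = np.array([[0.0, -1.0, -1.0, -1.0],
                 [2.0, 2.0, 2.0, 2.0],
                 [-1.0, -1.0, -1.0, 0.0]])
x = np.array([1.0, 2.0, 3.0, 4.0])

y = dia_matvec(offsets, data, x)
print(y)  # [0. 0. 0. 5.]
```

Only the stored diagonals are touched, so the cost scales with the number of diagonals times the matrix size rather than with the full n x n matrix.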


• Tutorials are available on
• The documentation can be generated from the code or accessed on the


• We are eager to help you get started with CUV and improve the library
continuously! If you have any questions, feel free to contact Hannes Schulz
(schulz at ais dot uni-bonn dot de) or Andreas Mueller (amueller at ais dot
uni-bonn dot de). You can find the website of our group at http://

Open Source Deep Learning Project: CNTK

CNTK: Computational Network Toolkit

Project Website:

Github Link:


Production-quality, Open Source, Multi-machine, Multi-GPU,
Highly efficient RNN training,
Speech, Image, Text

CNTK, the Computational Network Toolkit by Microsoft Research, is a unified deep-learning toolkit that describes neural networks as a series of computational steps via a directed graph. In this directed graph, leaf nodes represent input values or network parameters, while other nodes represent matrix operations upon their inputs. CNTK makes it easy to realize and combine popular model types such as feed-forward DNNs, convolutional nets (CNNs), and recurrent networks (RNNs/LSTMs). It implements stochastic gradient descent (SGD, error backpropagation) learning with automatic differentiation and parallelization across multiple GPUs and servers. CNTK has been available under an open-source license since April 2015. It is our hope that the community will take advantage of CNTK to share ideas more quickly through the exchange of open source working code.
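
The idea of a network as a directed graph, with leaves for inputs and parameters and inner nodes for matrix operations, can be sketched in a few lines of Python. This is not CNTK's API or its network description language: the classes Node, Times and SquaredError and the function sgd_step are hypothetical names invented for this example, which trains a single linear layer with gradient descent and reverse-mode differentiation.

```python
import numpy as np

class Node:
    """Leaf node: holds an input value or a learnable parameter."""
    def __init__(self, value=None, inputs=(), is_param=False):
        self.value = value
        self.inputs = list(inputs)
        self.is_param = is_param
        self.grad = None

    def forward(self):
        pass  # leaves keep their stored value

    def backward(self):
        pass  # leaves have nothing to pass on

class Times(Node):
    """Matrix product of its two inputs."""
    def __init__(self, w, x):
        super().__init__(inputs=(w, x))

    def forward(self):
        w, x = self.inputs
        self.value = w.value @ x.value

    def backward(self):
        w, x = self.inputs
        w.grad += self.grad @ x.value.T
        x.grad += w.value.T @ self.grad

class SquaredError(Node):
    """Half the squared error, averaged over the batch (columns)."""
    def __init__(self, pred, target):
        super().__init__(inputs=(pred, target))

    def forward(self):
        pred, target = self.inputs
        diff = pred.value - target.value
        self.value = 0.5 * np.sum(diff ** 2) / diff.shape[1]

    def backward(self):
        pred, target = self.inputs
        diff = pred.value - target.value
        pred.grad += self.grad * diff / diff.shape[1]

def topo_order(root):
    """Return all nodes reachable from root, inputs before consumers."""
    order, seen = [], set()
    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent in node.inputs:
            visit(parent)
        order.append(node)
    visit(root)
    return order

def sgd_step(loss, lr):
    """One forward pass, one backward pass, one parameter update."""
    order = topo_order(loss)
    for node in order:
        node.forward()
    for node in order:
        node.grad = np.zeros_like(node.value, dtype=float)
    loss.grad = np.ones_like(loss.value, dtype=float)
    for node in reversed(order):
        node.backward()
    for node in order:
        if node.is_param:
            node.value -= lr * node.grad
    return float(loss.value)

rng = np.random.default_rng(0)
w_true = np.array([[1.0, -2.0, 0.5], [0.0, 3.0, -1.0]])
x = Node(rng.standard_normal((3, 50)))          # input leaf
y = Node(w_true @ x.value)                      # target leaf
w = Node(np.zeros((2, 3)), is_param=True)       # parameter leaf
loss = SquaredError(Times(w, x), y)
losses = [sgd_step(loss, lr=0.1) for _ in range(300)]
print(losses[0], losses[-1])
```

Because each operation only knows how to compute its own value and how to pass gradients to its inputs, new model types can be built by combining nodes without writing a new training loop.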

Wiki: Go to the CNTK Wiki for all information on CNTK including setup, examples, etc.

License: See in the root of this repository for the full license information.

Tutorial: Microsoft Computational Network Toolkit (CNTK) @ NIPS 2015 Workshops


Microsoft Computational Network Toolkit offers most efficient distributed deep learning computational performance
Microsoft researchers win ImageNet computer vision challenge (December 2015)

Open Source Deep Learning Project: Gnumpy


Project Website: None

Github Link:


Do you want to have both the compute power of GPUs and the programming convenience of Python numpy? Gnumpy + Cudamat will bring you that.

Gnumpy is a simple Python module that interfaces in a way almost identical to numpy, but does its computations on your computer’s GPU. See this example, training an RBM using Gnumpy.

Gnumpy runs on top of, and therefore requires, the excellent cudamat library, written by Vlad Mnih.

Gnumpy can run in simulation mode: everything happens on the CPU, but the interface is the same. This can be helpful if you like to write your programs on your GPU-less laptop before running them on a GPU-equipped machine. It also allows you to easily test what performance gain you get from using a GPU. The simulation mode requires npmat, written by Ilya Sutskever.

Gnumpy is licensed with a BSD-style license (i.e. it’s completely free to use for everyone, also as a component in commercial software), with one added note: if you use it for scientific work that gets published, you must include reference to the Gnumpy tech report in your publication. For details of the license, see the top of

See also this presentation by Xavier Arrufat, introducing numpy at the Python for Data Analysis meetup in Barcelona, 2013.

Open Source Deep Learning Project: CUDAMat

CUDAMat: Python module for performing basic dense linear algebra computations on the GPU using CUDA

Project Website: None

Github Link:


The aim of the cudamat project is to make it easy to perform basic matrix calculations on CUDA-enabled GPUs from Python. cudamat provides a Python matrix class that performs calculations on a GPU. At present, some of the operations our GPU matrix class supports include:

Easy conversion to and from instances of numpy.ndarray.
Limited slicing support.
Matrix multiplication and transpose.
Elementwise addition, subtraction, multiplication, and division.
Elementwise application of exp, log, pow, sqrt.
Summation, maximum and minimum along rows or columns.
Conversion of CUDA errors into Python exceptions.

The current feature set of cudamat is biased towards features needed for implementing some common machine learning algorithms. We have included implementations of feedforward neural networks and restricted Boltzmann machines in the examples that come with cudamat.


import numpy as np
import cudamat as cm

# initialize CUBLAS before any GPU operation
cm.cublas_init()

# create two random matrices and copy them to the GPU
a = cm.CUDAMatrix(np.random.rand(32, 256))
b = cm.CUDAMatrix(np.random.rand(256, 32))

# perform calculations on the GPU
c = cm.dot(a, b)
d = c.sum(axis = 0)

# copy d back to the host (CPU) and print
print(d.asarray())

Open Source Deep Learning Project: EBLearn

EBLearn: Open Source C++ Machine Learning Library

Project Website:

Github Link: None


Eblearn is an object-oriented C++ library that implements various machine learning models, including energy-based learning and gradient-based learning for machines composed of multiple heterogeneous modules. In particular, the library provides a complete set of tools for building, training, and running convolutional networks.

Open Source Deep Learning Project: Nengo

The Nengo Neural Simulator

Project Website:

Github Link: None


Nengo is a graphical and scripting-based software package for simulating large-scale neural systems. The book How to Build a Brain, which includes Nengo tutorials, is now available. This website also has additional information on the book.

To use Nengo, you define groups of neurons in terms of what they represent, and then form connections between neural groups in terms of what computation should be performed on those representations. Nengo then uses the Neural Engineering Framework (NEF) to solve for the appropriate synaptic connection weights to achieve this desired computation. Nengo also supports various kinds of learning. Nengo helps make detailed spiking neuron models that implement complex high-level cognitive algorithms.
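
The step in which the NEF solves for connection weights can be sketched with NumPy. This does not use Nengo itself: the rectified-linear rate neurons, the parameter ranges, the regularization constant and the function name solve_decoders are all assumptions made for illustration. The sketch finds linear decoders so that a population representing x yields an estimate of x squared, then combines them with a second population's encoders and gains to obtain a weight matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pre, n_post = 50, 10
x = np.linspace(-1.0, 1.0, 200)          # values the population represents

# Hypothetical tuning curves: rectified-linear rate neurons.
encoders = rng.choice([-1.0, 1.0], size=n_pre)
gains = rng.uniform(0.5, 2.0, size=n_pre)
biases = rng.uniform(-1.0, 1.0, size=n_pre)
activities = np.maximum(0.0, gains * encoders * x[:, None] + biases)

def solve_decoders(activities, target, reg=0.01):
    """Regularized least squares: find d such that activities @ d ~ target."""
    n_samples, n_neurons = activities.shape
    gram = activities.T @ activities + reg * n_samples * np.eye(n_neurons)
    return np.linalg.solve(gram, activities.T @ target)

# Desired computation on the represented value: f(x) = x ** 2.
target = x ** 2
decoders = solve_decoders(activities, target)
estimate = activities @ decoders
rmse = float(np.sqrt(np.mean((estimate - target) ** 2)))

# Full connection weights: decoders of the first population combined with
# the (assumed) encoders and gains of the receiving population.
post_encoders = rng.choice([-1.0, 1.0], size=n_post)
post_gains = rng.uniform(0.5, 2.0, size=n_post)
weights = np.outer(post_gains * post_encoders, decoders)   # (n_post, n_pre)

print(weights.shape, rmse)
```

The weight matrix is never optimized directly; it factors into a decoding step and an encoding step, which is why the desired computation can be specified in terms of the represented values.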

Among other things, Nengo has been used to implement motor control, visual attention, serial recall, action selection, working memory, attractor networks, inductive reasoning, path integration, and planning with problem solving (see the model archives and publications for details).

Open Source Deep Learning Project: CXXNET

CXXNET is a fast, concise, distributed deep learning framework

Project Website: CXXNET has now moved on to MXNet

Github Link:


CXXNET is a fast, concise, distributed deep learning framework.


Learning to use cxxnet by examples
Note on Code
User Group(TODO)
Feature Highlights

Lightweight: small but sharp knife
• cxxnet contains concise implementations of state-of-the-art deep learning models
• The project maintains a minimum of dependencies, which makes it portable and easy to build
Scale beyond single GPU and single machine
• The library works on multiple GPUs, with nearly linear speedup
• The library works in distributed mode, backed by a distributed parameter server
Easy extensibility with no requirement on GPU programming
• cxxnet is built on mshadow
• Developers write numpy-style template expressions to extend the library only once
• mshadow will generate high-performance CUDA and CPU code for users
• It brings concise and readable code, with performance matching hand-crafted kernels
Convenient interface for other languages
• Python interface for training from numpy arrays, and prediction/extraction to numpy arrays
• Matlab interface

Open Source Deep Learning Project: mshadow

mshadow: Matrix Shadow

Project Website: None

Github Link:


MShadow is a lightweight CPU/GPU Matrix/Tensor Template Library in C++/CUDA. The goal of mshadow is to provide an efficient, device-invariant and simple tensor library for machine learning projects that aim for maximum performance and control, while also emphasizing simplicity.

MShadow also provides an interface that allows writing multi-GPU and distributed deep learning programs in an easy and unified way.


Efficient: all the expressions you write are lazily evaluated and compiled into optimized code
• No temporary memory allocation happens for the expressions you write
• mshadow generates a specific kernel for every expression you write, at compile time
Device invariant: you can write the code once and it will run on both CPU and GPU
Simple: mshadow allows you to write machine learning code using expressions.
Whitebox: put a float* into the Tensor struct and take the benefit of the package; no memory allocation happens unless explicitly requested
Lightweight library: a small amount of code to support frequently used functions in machine learning
Extendable: users can write simple functions that plug into mshadow and run on GPU/CPU; no experience in CUDA is required.
MultiGPU and Distributed ML: the mshadow-ps interface allows users to write efficient multi-GPU and distributed programs in a unified way.
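
The lazy-evaluation idea behind the feature list above can be sketched in Python. mshadow itself does this with C++ expression templates resolved at compile time; the classes below (Expr, Scalar, BinaryOp, Tensor) are hypothetical and only mimic the behavior at run time: arithmetic builds an expression tree, and nothing is computed until the tree is assigned to a destination, at which point each element is written straight into an existing buffer without intermediate arrays.

```python
import operator
import numpy as np

class Expr:
    """Base class: arithmetic builds a tree instead of computing a result."""
    def __add__(self, other):
        return BinaryOp(operator.add, self, wrap(other))

    def __mul__(self, other):
        return BinaryOp(operator.mul, self, wrap(other))

    __radd__ = __add__   # valid because addition is commutative
    __rmul__ = __mul__   # valid because multiplication is commutative

class Scalar(Expr):
    def __init__(self, value):
        self.value = value

    def eval_at(self, i):
        return self.value

class BinaryOp(Expr):
    def __init__(self, fn, lhs, rhs):
        self.fn, self.lhs, self.rhs = fn, lhs, rhs

    def eval_at(self, i):
        return self.fn(self.lhs.eval_at(i), self.rhs.eval_at(i))

class Tensor(Expr):
    """Wraps an existing buffer; no copy is made for float64 arrays."""
    def __init__(self, buffer):
        self.data = np.asarray(buffer, dtype=np.float64)

    def eval_at(self, i):
        return self.data.flat[i]

    def assign(self, expr):
        # Evaluate the whole expression one element at a time, writing
        # directly into this tensor's buffer (no temporary arrays).
        for i in range(self.data.size):
            self.data.flat[i] = expr.eval_at(i)
        return self

def wrap(value):
    return value if isinstance(value, Expr) else Scalar(value)

a = Tensor([1.0, 2.0, 3.0])
b = Tensor([10.0, 20.0, 30.0])
buf = np.zeros(3)          # caller-owned memory, as in the "whitebox" item
out = Tensor(buf)

expr = a * 2.0 + b         # only builds a tree; nothing is computed yet
out.assign(expr)           # evaluation happens here
print(buf)                 # [12. 24. 36.]
```

The per-element Python loop is slow and serves only to show the control flow; in the C++ library the equivalent loop is generated by the compiler for each expression.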

Open Source Deep Learning Project: deepmat


Project Website: None

Github Link:


= Generative Stochastic Network =

A simple implementation of GSN according to (Bengio et al., 2013)

= Convolutional Neural Network =

A naive implementation (purely using Matlab)
Pooling: max (Jonathan Masci’s code) and average
Not for serious use!

= Restricted Boltzmann Machine & Deep Belief Networks =

Binary/Gaussian Visible Units + Binary Hidden Units
Enhanced Gradient, Adaptive Learning Rate
Adadelta for RBM
Contrastive Divergence
(Fast) Persistent Contrastive Divergence
Parallel Tempering
DBN: Up-down Learning Algorithm

= Deep Boltzmann Machine =

Binary/Gaussian Visible Units + Binary Hidden Units
(Persistent) Contrastive Divergence
Enhanced Gradient, Adaptive Learning Rate
Two-stage Pretraining Algorithm (example)
Centering Trick (fixed center variables only)

= Denoising Autoencoder (Tied Weights) =

Binary/Gaussian Visible Units + Binary(Sigmoid)/Gaussian Hidden Units
tanh/sigm/relu nonlinearities
Shallow: sparsity, contractive, soft-sparsity (log-cosh) regularization
Deep: stochastic backprop
Adagrad, Adadelta

= Multi-layer Perceptron =

Stochastic Backpropagation, Dropout
tanh/sigm/relu nonlinearities
Adagrad, Adadelta
Balanced minibatches using crossvalind()
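
For readers unfamiliar with the Adagrad and Adadelta update rules listed above, here is a minimal NumPy sketch of both. It is not deepmat code (deepmat is written in Matlab); the function names, the state dictionaries, the hyperparameter values and the toy quadratic objective are assumptions made for the example.

```python
import numpy as np

def adagrad_step(param, grad, state, lr=0.5, eps=1e-8):
    """Adagrad: scale each step by the root of the summed squared gradients."""
    state["sum_sq"] = state.get("sum_sq", 0.0) + grad ** 2
    return param - lr * grad / (np.sqrt(state["sum_sq"]) + eps)

def adadelta_step(param, grad, state, rho=0.95, eps=1e-6):
    """Adadelta: ratio of running RMS of past updates to running RMS of gradients."""
    avg_sq_grad = rho * state.get("avg_sq_grad", 0.0) + (1 - rho) * grad ** 2
    avg_sq_delta = state.get("avg_sq_delta", 0.0)
    delta = -np.sqrt(avg_sq_delta + eps) / np.sqrt(avg_sq_grad + eps) * grad
    state["avg_sq_grad"] = avg_sq_grad
    state["avg_sq_delta"] = rho * avg_sq_delta + (1 - rho) * delta ** 2
    return param + delta

# Toy objective: f(w) = 0.5 * ||w - target||^2, whose gradient is w - target.
target = np.array([3.0, -2.0])

def distance_after(step_fn, n_steps):
    w, state = np.zeros(2), {}
    for _ in range(n_steps):
        w = step_fn(w, w - target, state)
    return float(np.linalg.norm(w - target))

start_distance = float(np.linalg.norm(target))
adagrad_distance = distance_after(adagrad_step, 1000)
adadelta_distance = distance_after(adadelta_step, 1000)
print(start_distance, adagrad_distance, adadelta_distance)
```

Both rules keep one accumulator per parameter, so the effective step size adapts separately for each weight. Adadelta needs no global learning rate, but with a small eps it starts with very small steps.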