Open Source Deep Learning Project: dlib

dlib: A toolkit for making real world machine learning and data analysis applications in C++

Project Website:

Github Link:


Dlib is a modern C++ toolkit containing machine learning algorithms and tools for creating complex software in C++ to solve real world problems. It is used in both industry and academia in a wide range of domains including robotics, embedded devices, mobile phones, and large high performance computing environments. Dlib’s open source licensing allows you to use it in any application, free of charge.

To follow or participate in the development of dlib, subscribe to dlib on GitHub. Also be sure to read the how to contribute page if you intend to submit code to the project.

Open Source Deep Learning Project: torchnet

torchnet: Torch on steroids

Project Website: None

Github Link:


torchnet is a framework for Torch that provides a set of abstractions intended to encourage code reuse and modular programming.

At the moment, torchnet provides four sets of important classes:

Dataset: handling and pre-processing data in various ways.
Engine: training/testing a machine learning algorithm.
Meter: measuring performance or any other quantity.
Log: output performance or any other string to file / disk in a consistent manner.
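
As a rough illustration of how these four abstractions can fit together, here is a minimal Python sketch. torchnet itself is a Lua/Torch framework, so every class and method name below (ListDataset, AverageMeter, Log, Engine, test) is a hypothetical stand-in chosen for this example and should not be read as torchnet's actual API.

```python
class ListDataset:
    """Dataset: wraps a list of samples and applies a pre-processing function."""
    def __init__(self, samples, transform=lambda s: s):
        self.samples = samples
        self.transform = transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.transform(self.samples[idx])


class AverageMeter:
    """Meter: tracks the running average of any scalar quantity."""
    def __init__(self):
        self.total = 0.0
        self.count = 0

    def add(self, value):
        self.total += value
        self.count += 1

    def value(self):
        return self.total / self.count if self.count else 0.0


class Log:
    """Log: collects records in one consistent key=value format."""
    def __init__(self):
        self.lines = []

    def write(self, **fields):
        self.lines.append(" ".join(f"{k}={v}" for k, v in sorted(fields.items())))


class Engine:
    """Engine: runs the evaluation loop and reports through a meter and a log."""
    def test(self, model, loss_fn, dataset, meter, log):
        for i in range(len(dataset)):
            x, y = dataset[i]
            meter.add(loss_fn(model(x), y))
        log.write(samples=len(dataset), loss=round(meter.value(), 4))
        return meter.value()


# Toy usage: the "model" doubles its input and every target is exactly 2 * x.
dataset = ListDataset([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
meter, log = AverageMeter(), Log()
avg_loss = Engine().test(lambda x: 2 * x, lambda p, y: (p - y) ** 2, dataset, meter, log)
```

The point of the split is that each piece can be swapped independently: a different dataset, meter, or log can be passed to the same engine loop without changing it.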

Open Source Deep Learning Project: OpenNN

OpenNN – Open Neural Networks Library

Project Website:

Github Link:


OpenNN is an open source class library written in the C++ programming language which implements neural networks, a main area of deep learning research. It is intended for advanced users with strong C++ and machine learning skills.

The library implements any number of layers of non-linear processing units for supervised learning. This deep architecture allows the design of neural networks with universal approximation properties.
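
To make "any number of layers of non-linear processing units" concrete, the following Python sketch feeds an input through a stack of fully connected layers. It is an illustration only: OpenNN is a C++ library, and the function name, the (weights, biases) layer representation, and the choice of tanh as the non-linearity are all assumptions made for this example.

```python
import math

def forward(layers, inputs):
    """Feed `inputs` through each layer; a layer is a (weights, biases) pair."""
    activations = inputs
    for weights, biases in layers:
        # Each unit computes a weighted sum of the previous activations,
        # adds its bias, and applies a non-linear function (tanh here).
        activations = [
            math.tanh(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations

# Hypothetical network: 2 inputs -> 3 hidden units -> 1 output.
layers = [
    ([[0.5, -0.5], [1.0, 1.0], [0.0, 2.0]], [0.0, 0.1, -0.1]),
    ([[1.0, -1.0, 0.5]], [0.2]),
]
output = forward(layers, [1.0, 2.0])
```

Adding depth is just a matter of appending more (weights, biases) pairs to the list.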

The main advantage of OpenNN is its high performance. This library stands out in terms of execution speed and memory allocation. It is constantly optimized and parallelized in order to maximize its efficiency.

OpenNN is a software library written in C++ for predictive analytics. It implements neural networks, the most successful deep learning method.

Some typical applications of OpenNN are function regression (modelling), pattern recognition (classification) and time series prediction (forecasting).

The documentation is composed of tutorials and examples that offer a complete overview of the library. The documentation can be found at the official OpenNN site.

The CMakeLists.txt files are build files for CMake, which is also used by the CLion IDE.

The .pro files are project files for the Qt Creator IDE, which can be downloaded from its site. Note that OpenNN does not make use of the Qt library.

OpenNN is developed by Artelnics, a company specialized in artificial intelligence.

Open Source Deep Learning Project: ELEKTRONN

ELEKTRONN: A highly configurable toolkit for training 3d/2d CNNs and general Neural Networks

Project Website:

Github Link:


ELEKTRONN is a deep learning toolkit that makes powerful neural networks accessible to scientists outside of the machine learning community.

ELEKTRONN is a highly configurable toolkit for training 3D/2D CNNs and general Neural Networks.

It is written in Python 2 and based on Theano, which allows CUDA-enabled GPUs to significantly accelerate the pipeline.

The package includes a sophisticated training pipeline designed for classification/localisation tasks on 3D/2D images. Additionally, the toolkit offers training routines for tasks on non-image data.

ELEKTRONN was created by Marius Killinger and Gregor Urban at the Max Planck Institute For Medical Research to solve connectomics tasks.

Open Source Deep Learning Project: ConvNet

ConvNet: Convolutional Neural Networks for Matlab

Project Website: None

Github Link:


Convolutional Neural Networks for Matlab, including the Invariant Backpropagation algorithm (IBP). Has versions for GPU and CPU, written in CUDA, C++ and Matlab. All versions work identically. The GPU version uses kernels from Alex Krizhevsky’s library ‘cuda-convnet2’.

A convolutional neural net is a type of deep learning classification algorithm that can learn useful features from raw data by itself. Learning is performed by tuning its weights. CNNs consist of several layers, usually alternating convolutional and subsampling layers. A convolutional layer filters its input with a small matrix of weights and applies a non-linear function to the result. A subsampling layer contains no weights and simply reduces the size of its input by an averaging or max-pooling operation. The last layer is fully connected by weights to all outputs of the previous layer, and its output is also modified by a non-linear function. If your neural net consists only of fully connected layers, you get a classic neural net.
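
The two layer types described above can be sketched in a few lines of plain Python. This is an illustration, not code from the ConvNet library: the function names, the tanh non-linearity, the stride of 1, and the absence of padding are assumptions. As in many CNN libraries, the "convolution" here slides the kernel without flipping it.

```python
import math

def conv2d_valid(image, kernel, bias=0.0):
    """Filter `image` with `kernel` (no padding, stride 1), then apply tanh."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = bias
            for di in range(kh):
                for dj in range(kw):
                    s += image[i + di][j + dj] * kernel[di][dj]
            row.append(math.tanh(s))  # non-linear function on the filtered value
        out.append(row)
    return out

def max_pool(fmap, size=2):
    """Subsampling: keep the maximum of each non-overlapping size x size block."""
    return [
        [max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
         for j in range(0, len(fmap[0]) - size + 1, size)]
        for i in range(0, len(fmap) - size + 1, size)
    ]

image = [[float(r * 5 + c) for c in range(5)] for r in range(5)]  # 5x5 input
kernel = [[0.0, 0.1], [-0.1, 0.0]]                                # 2x2 weights
features = conv2d_valid(image, kernel)  # 4x4 feature map
pooled = max_pool(features, size=2)     # 2x2 map after subsampling
```

Note that the pooling function has no weights at all, which matches the description of the subsampling layer: only the convolution kernel (and bias) would be tuned during learning.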

The learning process consists of two steps, the forward and backward passes, which repeat for all objects in the training set. On the forward pass, each layer transforms the output of the previous layer according to its function. The output of the last layer is compared with the label values and the total error is computed. On the backward pass, each layer applies the corresponding transformation to the derivatives of the error with respect to its outputs and weights. After the backward pass has finished, the weights are changed in the direction that decreases the total error. This process is performed for a batch of objects simultaneously in order to decrease the sample bias. After all the objects have been processed, the process may repeat with different batch splits.
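
The forward pass, backward pass, and batched weight update can be shown on the smallest possible model, a single linear unit with a squared error. This is a sketch under stated assumptions rather than the library's training code: the function name, learning rate, batch size, and toy data are all chosen for illustration.

```python
def train_linear_neuron(data, lr=0.05, batch_size=2, epochs=500):
    """Fit y = w * x + b by mini-batch gradient descent on squared error."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            grad_w = grad_b = 0.0
            for x, y in batch:
                pred = w * x + b    # forward pass
                err = pred - y      # derivative of 0.5 * err**2 w.r.t. pred
                grad_w += err * x   # backward pass: derivative w.r.t. w
                grad_b += err       # backward pass: derivative w.r.t. b
            # Move the weights against the batch-averaged gradient,
            # i.e. in the direction that decreases the error.
            w -= lr * grad_w / len(batch)
            b -= lr * grad_b / len(batch)
    return w, b

data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # samples of y = 2x + 1
w, b = train_linear_neuron(data)
```

In a multi-layer network the backward pass applies the same idea layer by layer, passing the derivative with respect to each layer's output back to the layer before it.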

Open Source Deep Learning Project: neuralnetworks

neuralnetworks: Deep Neural Networks with GPU support

Project Website: None

Github Link:


This is a Java implementation of some of the algorithms for training deep neural networks. GPU support is provided via OpenCL and Aparapi. The architecture is designed with modularity, extensibility and pluggability in mind.

Git structure

I’m using the git-flow model. The most stable (but older) sources are available in the master branch, while the latest ones are in the develop branch.

If you want to use the previous Java 7 compatible version you can check out this release.

Neural network types

Multilayer perceptron
Restricted Boltzmann Machine
Deep belief network
Stacked autoencoder
Convolutional networks with max pooling, average pooling and stochastic pooling.
Maxout networks (work-in-progress)

Training algorithms

Backpropagation – supports multilayer perceptrons, convolutional networks and dropout.
Contrastive divergence and persistent contrastive divergence implemented using these and these guidelines.
Greedy layer-wise training for deep networks – works for stacked autoencoders and DBNs, but supports any kind of training.
All the algorithms support GPU execution.

Out of the box supported datasets are MNIST, CIFAR-10/CIFAR-100 (experimental, not much testing), IRIS and XOR, but you can easily implement your own.

Experimental support for RGB image preprocessing operations – affine transformations, cropping, and color scaling (see testImageInputProvider).

Activation functions

Weighted sum
All the functions support GPU execution. They can be applied to all types of networks and all training algorithms. You can also implement new activations.

Open Source Deep Learning Project: CUV

CUV: Matrix library for CUDA in C++ and Python

Project Website:

Github Link:


CUV is a C++ template and Python library which makes it easy to use NVIDIA(tm)


Supported Platforms:

• This library was only tested on Ubuntu Karmic, Lucid and Maverick. It uses
mostly standard components (except PyUBLAS) and should run without major
modification on any current linux system.

Supported GPUs:

• By default, code is generated for the lowest compute architecture. We
recommend you change this to match your hardware. Using ccmake you can set
the build variable “CUDA_ARCHITECTURE” for example to -arch=compute_20
• All GT 9800 and GTX 280 and above
• GT 9200 without convolutions. It might need some minor modifications to
make the rest work. If you want to use that card and have problems, just
get in contact.
• On 8800GTS, random numbers and convolutions won’t work.


• Like Matlab, for example, CUV assumes that everything is an n-dimensional
array called a “tensor”
• Tensors can have an arbitrary data-type and can be on the host (CPU-memory)
or device (GPU-memory)
• Tensors can be column-major or row-major (1-dimensional tensors are, by
convention, row-major)
• The library defines many functions which may or may not apply to all
possible combinations. Variations are easy to add.
• For convenience, we also wrap some of the functionality provided by Alex
Krizhevsky on his website, with permission. Thanks Alex for providing your
code!
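
The difference between row-major and column-major tensors mentioned above comes down to how a multi-dimensional index maps to a position in flat memory. The Python sketch below shows that mapping; the function names and the "row"/"col" flags are hypothetical and unrelated to CUV's own API.

```python
def strides(shape, order="row"):
    """Element strides of a dense tensor stored in row- or column-major order."""
    dims = range(len(shape))
    # Row-major: the last dimension varies fastest. Column-major: the first.
    dims = reversed(dims) if order == "row" else dims
    result, step = [0] * len(shape), 1
    for d in dims:
        result[d] = step
        step *= shape[d]
    return result

def flat_index(index, shape, order="row"):
    """Position of `index` in the flat memory buffer."""
    return sum(i * s for i, s in zip(index, strides(shape, order)))

shape = (2, 3)                               # 2 rows, 3 columns
row_pos = flat_index((0, 1), shape, "row")   # neighbours in a row are adjacent
col_pos = flat_index((0, 1), shape, "col")   # neighbours in a column are adjacent
```

The same element therefore sits at different offsets in the two layouts, which is why a library has to track the layout of every tensor it is handed.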

Python Integration

• CUV plays well with Python and Numpy. That is, once you have written your fast GPU
functions in CUDA/C++, you can export them using Boost.Python. You can use
Numpy for pre-processing and fancy stuff you have not yet implemented, then
push the Numpy-matrix to the GPU, run your operations there, pull again to
CPU and visualize using matplotlib. Great.

Implemented Functionality

• Simple Linear Algebra for dense vectors and matrices (BLAS level 1,2,3)
• Helpful functors and abstractions
• Sparse matrices in DIA format and matrix-multiplication for these matrices
• I/O functions using boost.serialization
• Fast Random Number Generator
• Up to now, CUV was used to build dense and sparse Neural Networks and
Restricted Boltzmann Machines (RBM), convolutional or locally connected.
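
For readers unfamiliar with the DIA (diagonal) sparse format listed above, the following Python sketch shows the idea: only the non-zero diagonals are stored, each tagged with its offset from the main diagonal. The storage convention used here (data[k][i] holds the entry in row i of diagonal offsets[k]) and the function name are assumptions for this example; CUV's internal layout may differ.

```python
def dia_matvec(offsets, data, x, n_rows):
    """Multiply a DIA-format matrix by the vector x.

    offsets[k] is the diagonal's distance from the main diagonal
    (negative = below it); data[k][i] is the entry A[i][i + offsets[k]].
    """
    y = [0.0] * n_rows
    for offset, diag in zip(offsets, data):
        for i in range(n_rows):
            j = i + offset
            if 0 <= j < len(x):  # skip slots that fall outside the matrix
                y[i] += diag[i] * x[j]
    return y

# The 3x3 tridiagonal matrix
#   [[2, 1, 0],
#    [3, 2, 1],
#    [0, 3, 2]]
offsets = [-1, 0, 1]
data = [
    [0.0, 3.0, 3.0],  # sub-diagonal (first slot unused)
    [2.0, 2.0, 2.0],  # main diagonal
    [1.0, 1.0, 0.0],  # super-diagonal (last slot unused)
]
y = dia_matvec(offsets, data, [1.0, 2.0, 3.0], n_rows=3)
```

The format pays off for banded matrices, where a handful of diagonals carry all the non-zero entries; locally connected network layers tend to produce this kind of structure.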


• Tutorials are available on
• The documentation can be generated from the code or accessed on the


• We are eager to help you get started with CUV and improve the library
continuously! If you have any questions, feel free to contact Hannes Schulz
(schulz at ais dot uni-bonn dot de) or Andreas Mueller (amueller at ais dot
uni-bonn dot de). You can find the website of our group at http://

Open Source Deep Learning Project: CNTK

CNTK: Computational Network Toolkit

Project Website:

Github Link:


Production-quality, Open Source, Multi-machine, Multi-GPU,
Highly efficient RNN training,
Speech, Image, Text

CNTK, the Computational Network Toolkit by Microsoft Research, is a unified deep-learning toolkit that describes neural networks as a series of computational steps via a directed graph. In this directed graph, leaf nodes represent input values or network parameters, while other nodes represent matrix operations upon their inputs. CNTK allows users to easily realize and combine popular model types such as feed-forward DNNs, convolutional nets (CNNs), and recurrent networks (RNNs/LSTMs). It implements stochastic gradient descent (SGD, error backpropagation) learning with automatic differentiation and parallelization across multiple GPUs and servers. CNTK has been available under an open-source license since April 2015. It is our hope that the community will take advantage of CNTK to share ideas more quickly through the exchange of open source working code.
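
The directed-graph view with automatic differentiation can be illustrated with a toy Python sketch. It is not CNTK code: the Node class, the add and mul helpers, and the backward function are hypothetical, and the nodes hold scalars instead of matrices to keep the example short.

```python
class Node:
    """A graph node: a leaf (input or parameter) or an operation on its inputs."""
    def __init__(self, value, parents=(), grad_fns=()):
        self.value = value
        self.parents = parents    # input nodes of this operation (empty for leaves)
        self.grad_fns = grad_fns  # maps the node's gradient to each parent's share
        self.grad = 0.0

def add(a, b):
    return Node(a.value + b.value, (a, b), (lambda g: g, lambda g: g))

def mul(a, b):
    return Node(a.value * b.value, (a, b),
                (lambda g: g * b.value, lambda g: g * a.value))

def backward(output):
    """Accumulate d(output)/d(node) into node.grad for every node in the graph."""
    order, seen = [], set()

    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent in node.parents:
                visit(parent)
            order.append(node)  # parents are always appended before children

    visit(output)
    output.grad = 1.0
    for node in reversed(order):  # children first, so each grad is complete
        for parent, fn in zip(node.parents, node.grad_fns):
            parent.grad += fn(node.grad)

# Leaves: input x and parameters w, b. Operation nodes: w * x, then (+ b).
x, w, b = Node(3.0), Node(2.0), Node(1.0)
y = add(mul(w, x), b)
backward(y)
```

After the backward call, w.grad and b.grad hold the derivatives that a gradient descent step would use to update the parameters.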

Wiki: Go to the CNTK Wiki for all information on CNTK including setup, examples, etc.

License: See in the root of this repository for the full license information.

Tutorial: Microsoft Computational Network Toolkit (CNTK) @ NIPS 2015 Workshops


Microsoft Computational Network Toolkit offers most efficient distributed deep learning computational performance
Microsoft researchers win ImageNet computer vision challenge (December 2015)

Open Source Deep Learning Project: Gnumpy


Project Website: None

Github Link:


Do you want to have both the compute power of GPUs and the programming convenience of Python numpy? Gnumpy + Cudamat will bring you that.

Gnumpy is a simple Python module that interfaces in a way almost identical to numpy, but does its computations on your computer’s GPU. See this example, training an RBM using Gnumpy.

Gnumpy runs on top of, and therefore requires, the excellent cudamat library, written by Vlad Mnih.

Gnumpy can run in simulation mode: everything happens on the CPU, but the interface is the same. This can be helpful if you like to write your programs on your GPU-less laptop before running them on a GPU-equipped machine. It also allows you to easily test what performance gain you get from using a GPU. The simulation mode requires npmat, written by Ilya Sutskever.

Gnumpy is licensed with a BSD-style license (i.e. it’s completely free to use for everyone, also as a component in commercial software), with one added note: if you use it for scientific work that gets published, you must include reference to the Gnumpy tech report in your publication. For details of the license, see the top of

See also this presentation by Xavier Arrufat, introducing numpy at the Python for Data Analysis meetup in Barcelona, 2013.

Open Source Deep Learning Project: CUDAMat

CUDAMat: Python module for performing basic dense linear algebra computations on the GPU using CUDA

Project Website: None

Github Link:


The aim of the cudamat project is to make it easy to perform basic matrix calculations on CUDA-enabled GPUs from Python. cudamat provides a Python matrix class that performs calculations on a GPU. At present, some of the operations our GPU matrix class supports include:

Easy conversion to and from instances of numpy.ndarray.
Limited slicing support.
Matrix multiplication and transpose.
Elementwise addition, subtraction, multiplication, and division.
Elementwise application of exp, log, pow, sqrt.
Summation, maximum and minimum along rows or columns.
Conversion of CUDA errors into Python exceptions.

The current feature set of cudamat is biased towards features needed for implementing some common machine learning algorithms. We have included implementations of feedforward neural networks and restricted Boltzmann machines in the examples that come with cudamat.


import numpy as np
import cudamat as cm

# initialize CUBLAS before any GPU operation
cm.cublas_init()

# create two random matrices and copy them to the GPU
a = cm.CUDAMatrix(np.random.rand(32, 256))
b = cm.CUDAMatrix(np.random.rand(256, 32))

# perform calculations on the GPU
c = cm.dot(a, b)
d = c.sum(axis=0)

# copy d back to the host (CPU) and print