Open Source Text Processing Project: GibbsLDA++

GibbsLDA++: A C/C++ Implementation of Latent Dirichlet Allocation

Project Website:

Github Link: None

Description

GibbsLDA++ is a C/C++ implementation of Latent Dirichlet Allocation (LDA) that uses Gibbs sampling for parameter estimation and inference. It is very fast and is designed to analyze the hidden/latent topic structures of large-scale datasets, including large collections of text/Web documents. LDA was first introduced by David Blei et al. [Blei03]. There have been several implementations of this model in C (using variational methods), Java, and Matlab. We decided to release this implementation of LDA in C/C++ using Gibbs sampling to provide an alternative for the topic-model community.
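To make the idea of Gibbs sampling for LDA more concrete, here is a minimal sketch of a collapsed Gibbs sampler in C++. It is not taken from the GibbsLDA++ source; the names `LdaState`, `init_state`, and `gibbs_sweep` are hypothetical, and the sketch assumes documents are already encoded as vectors of integer word ids in the range 0..V-1, with symmetric hyperparameters `alpha` and `beta`.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical count tables for a collapsed Gibbs sampler (illustrative only,
// not the data structures used by GibbsLDA++).
struct LdaState {
    int K = 0;                           // number of topics
    int V = 0;                           // vocabulary size
    std::vector<std::vector<int>> z;     // z[d][n]: topic of token n in document d
    std::vector<std::vector<int>> nw;    // nw[w][k]: times word w is assigned to topic k
    std::vector<std::vector<int>> nd;    // nd[d][k]: tokens of document d in topic k
    std::vector<int> nwsum;              // nwsum[k]: total tokens assigned to topic k
};

// Assign every token a uniformly random topic and fill the count tables.
LdaState init_state(const std::vector<std::vector<int>>& docs,
                    int K, int V, std::mt19937& rng) {
    assert(K > 0 && V > 0);
    LdaState s;
    s.K = K;
    s.V = V;
    s.z.resize(docs.size());
    s.nw.assign(V, std::vector<int>(K, 0));
    s.nd.assign(docs.size(), std::vector<int>(K, 0));
    s.nwsum.assign(K, 0);
    std::uniform_int_distribution<int> pick(0, K - 1);
    for (std::size_t d = 0; d < docs.size(); ++d) {
        for (int w : docs[d]) {
            assert(w >= 0 && w < V);
            const int k = pick(rng);
            s.z[d].push_back(k);
            ++s.nw[w][k];
            ++s.nd[d][k];
            ++s.nwsum[k];
        }
    }
    return s;
}

// One full sweep: resample the topic of every token from its conditional
// distribution given all other assignments.
void gibbs_sweep(const std::vector<std::vector<int>>& docs, LdaState& s,
                 double alpha, double beta, std::mt19937& rng) {
    std::vector<double> p(s.K);
    for (std::size_t d = 0; d < docs.size(); ++d) {
        for (std::size_t n = 0; n < docs[d].size(); ++n) {
            const int w = docs[d][n];
            const int old_k = s.z[d][n];
            // Remove the current token from the counts.
            --s.nw[w][old_k];
            --s.nd[d][old_k];
            --s.nwsum[old_k];
            // Unnormalized p(z = k | rest): (word-in-topic term) * (topic-in-doc term).
            for (int k = 0; k < s.K; ++k) {
                p[k] = (s.nw[w][k] + beta) / (s.nwsum[k] + s.V * beta)
                     * (s.nd[d][k] + alpha);
            }
            std::discrete_distribution<int> draw(p.begin(), p.end());
            const int new_k = draw(rng);
            // Add the token back under its newly sampled topic.
            s.z[d][n] = new_k;
            ++s.nw[w][new_k];
            ++s.nd[d][new_k];
            ++s.nwsum[new_k];
        }
    }
}
```

A caller would build the state once with `init_state`, run `gibbs_sweep` for a fixed number of iterations, and then estimate the topic-word and document-topic distributions from the smoothed counts in `nw` and `nd`. The per-document normalizer is omitted from the sampling weights because it is the same for every topic and `std::discrete_distribution` normalizes its weights internally.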

GibbsLDA++ is useful for the following potential application areas:

Information retrieval and search (analyzing the semantic/latent topic/concept structures of large text collections to support more intelligent information search).
Document classification and clustering, document summarization, and text/Web mining in general.
Content-based image clustering, object recognition, and other computer vision applications in general.
Other potential applications to biological data.

