Open Source Text Processing Project: Gensim


Gensim: Topic Modelling for Humans

Project Website:

Github Link:


Gensim is a FREE Python library:

Scalable statistical semantics
Analyze plain-text documents for semantic structure
Retrieve semantically similar documents
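
To make the last point concrete, here is a minimal, library-free sketch of similarity retrieval using bag-of-words vectors and cosine similarity. It is not gensim's implementation or API: the function names (bag_of_words, cosine_similarity, most_similar) and the toy corpus are made up for illustration, and gensim itself provides far more scalable models for the same task.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase, split on whitespace, and count term occurrences."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, documents, top_n=3):
    """Return (index, score) pairs for the top_n documents closest to query."""
    query_vec = bag_of_words(query)
    scores = [
        (i, cosine_similarity(query_vec, bag_of_words(doc)))
        for i, doc in enumerate(documents)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


# Toy corpus, invented for this example.
documents = [
    "human machine interface for lab computer applications",
    "a survey of user opinion of computer system response time",
    "the generation of random binary unordered trees",
    "graph minors and the intersection of paths in trees",
]

ranking = most_similar("random trees and graph paths", documents, top_n=2)
print(ranking)
```

The result is a short list of document indices ordered by similarity to the query, which is the same kind of "most similar articles" list described in the next paragraph. Raw word counts only capture word overlap; the semantic models gensim offers aim to go beyond that.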

Gensim started off as a collection of various Python scripts for the Czech Digital Mathematics Library in 2008, where it served to generate a short list of the most similar articles to a given article (gensim = “generate similar”).

Later versions of gensim improved its efficiency and scalability tremendously.

By now, gensim is, to my knowledge, the most robust, efficient and hassle-free piece of software to realize unsupervised semantic modelling from plain text. It stands in contrast to brittle homework-assignment implementations that do not scale on the one hand, and robust Java-esque projects that take forever just to run “hello world” on the other.
