Open Source Text Processing Project: Sumy

Sumy: Automatic text summarizer

Project Website:

Github Link:


Simple library and command line utility for extracting summaries from HTML pages or plain text. The package also contains a simple evaluation framework for text summaries. Implemented summarization methods:

Luhn – heuristic method, reference
Edmundson – heuristic method with previous statistic research, reference
Latent Semantic Analysis, LSA – one of the algorithms from the work cited here; I think the author is using more advanced algorithms now. Steinberger, J. and Ježek, K. Using Latent Semantic Analysis in Text Summarization and Summary Evaluation. In Proceedings of ISIM ‘04. 2004. pp. 93-100.
LexRank – Unsupervised approach inspired by the PageRank and HITS algorithms, reference
TextRank – some sort of combination of a few resources that I found on the internet. I really don’t remember the sources. Probably Wikipedia and some papers on the 1st page of Google 🙂
SumBasic – Method that is often used as a baseline in the literature. Source: Read about SumBasic
KL-Sum – Method that greedily adds sentences to a summary so long as doing so decreases the KL divergence. Source: Read about KL-Sum
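
To make the KL-Sum description above concrete, here is a minimal sketch of the greedy idea in plain Python 3. It is not Sumy's implementation: the function names (tokenize, kl_divergence, kl_sum), the regex tokenizer, and the additive smoothing constant eps are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into simple word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def kl_divergence(doc_freq, summary_counts, summary_len, vocab_size, eps=0.01):
    """KL(document || summary), with additive smoothing on the summary side."""
    total = 0.0
    for word, p in doc_freq.items():
        q = (summary_counts.get(word, 0) + eps) / (summary_len + eps * vocab_size)
        total += p * math.log(p / q)
    return total


def kl_sum(sentences, max_sentences):
    """Greedily add the sentence that lowers the KL divergence the most.

    Stops when max_sentences is reached or no sentence lowers the divergence.
    """
    tokenized = [tokenize(s) for s in sentences]
    doc_counts = Counter(w for toks in tokenized for w in toks)
    doc_len = sum(doc_counts.values())
    doc_freq = {w: c / doc_len for w, c in doc_counts.items()}
    vocab_size = len(doc_freq)

    chosen = []
    summary_counts = Counter()
    summary_len = 0
    best_kl = float("inf")  # divergence of the current (initially empty) summary
    while len(chosen) < max_sentences:
        best_i = None
        for i, toks in enumerate(tokenized):
            if i in chosen or not toks:
                continue
            candidate_counts = summary_counts + Counter(toks)
            kl = kl_divergence(doc_freq, candidate_counts,
                               summary_len + len(toks), vocab_size)
            if kl < best_kl:
                best_kl, best_i = kl, i
        if best_i is None:  # no remaining sentence decreases the divergence
            break
        chosen.append(best_i)
        summary_counts += Counter(tokenized[best_i])
        summary_len += len(tokenized[best_i])
    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]
```

Calling kl_sum(list_of_sentences, 3) returns at most three sentences in document order. Sentence splitting, stemming and stop-word handling are left out of the sketch on purpose.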

Make sure you have Python 2.7/3.3+ and pip (Windows, Linux) installed. The preferred way to install is simply:

$ [sudo] pip install sumy
Or for the fresh version:

$ [sudo] pip install git+git://
Sumy contains a command line utility for quick summarization of documents.

$ sumy lex-rank --length=10 --url= # what’s summarization?
$ sumy luhn --language=czech --url=
$ sumy edmundson --language=czech --length=3% --url=
$ sumy --help # for more info
Various evaluation methods for a given summarization method can be run with the commands below:

$ sumy_eval lex-rank reference_summary.txt --url=
$ sumy_eval lsa reference_summary.txt --language=czech --url=
$ sumy_eval edmundson reference_summary.txt --language=czech --url=
$ sumy_eval --help # for more info
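
For intuition about what comparing a generated summary against reference_summary.txt can involve, here is a small sketch of one common family of measures: word-overlap precision, recall and F-score. It is an illustration only and not the code behind sumy_eval; the function names and the regex tokenizer are assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into simple word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def unigram_overlap_scores(candidate, reference):
    """Return (precision, recall, f_score) of word overlap between two summaries."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    # Counter intersection keeps the minimum count of each shared word.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score
```

Precision is the share of the generated summary's words that also appear in the reference, and recall is the share of the reference's words that the generated summary covers.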
