Open Source Text Processing Project: Reduction

Reduction
Project Website: None
Github Link: https://github.com/adamfabish/Reduction
Description: Reduction is a Python script that automatically summarizes a text by extracting the sentences deemed most important.
Example usage:
    from reduction import *
    reduction = Reduction()
    text = …
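
To make the idea of extracting the "most important" sentences concrete, here is a minimal sketch of one common approach: score each sentence by the average corpus frequency of its words and keep the top scorers. This is an illustration only and is not necessarily how Reduction ranks sentences; the function names and the tiny stopword list are assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "it",
             "that", "this", "on", "for"}


def split_sentences(text):
    """Naively split on '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOPWORDS]


def summarize(text, n=1):
    """Return the n highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    # Word frequencies over the whole text.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        # Average frequency, so long sentences are not favored automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n])]
```

Calling `summarize(text, n=2)` returns the two best-scoring sentences in document order. A real summarizer would use a proper sentence tokenizer and a fuller stopword list.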

Open Source Text Processing Project: summarize

summarize: A Python library for simple text summarization
Project Website: None
Github Link: https://github.com/thavelick/summarize/
Description:
Installation
First install nltk and numpy:
    sudo pip install nltk
    sudo pip install numpy
Then install the punkt and stopwords nltk packages:
    sudo python -m …

Open Source Text Processing Project: Sumy

Sumy: Automatic text summarizer
Project Website: https://pypi.python.org/pypi/sumy
Github Link: https://github.com/miso-belica/sumy
Description: A simple library and command-line utility for extracting summaries from HTML pages or plain text. The package also contains a simple evaluation framework for text summaries. Implemented summarization methods: Luhn …

Open Source Text Processing Project: Gensim

Gensim: Topic Modelling for Humans
Project Website: https://radimrehurek.com/gensim/
Github Link: https://github.com/piskvorky/gensim/
Description: Gensim is a free Python library for scalable statistical semantics. It analyzes plain-text documents for semantic structure and retrieves semantically similar documents. Gensim started off as a collection of various Python …

Open Source Text Processing Project: Maximum Entropy Modeling Toolkit

Maximum Entropy Modeling Toolkit for Python and C++
Project Website: http://homepages.inf.ed.ac.uk/lzhang10/maxent_toolkit.html
Github Link: https://github.com/lzhang10/maxent
Description: The Maximum Entropy Toolkit provides a set of tools and a library for constructing maximum entropy (maxent) models in either Python or C++. Maxent Entropy Model …

Open Source Text Processing Project: CRF++

CRF++: Yet Another CRF toolkit
Project Website: https://taku910.github.io/crfpp/
Github Link: None
Description: CRF++ is a simple, customizable, and open source implementation of Conditional Random Fields (CRFs) for segmenting/labeling sequential data. CRF++ is designed for generic purpose and will be applied …

Open Source Text Processing Project: GibbsLDA++

GibbsLDA++: A C/C++ Implementation of Latent Dirichlet Allocation
Project Website: http://gibbslda.sourceforge.net/
Github Link: None
Description: GibbsLDA++ is a C/C++ implementation of Latent Dirichlet Allocation (LDA) that uses the Gibbs sampling technique for parameter estimation and inference. It is very fast and is …

Open Source Text Processing Project: WhatLanguage

WhatLanguage: A language detection library for Ruby that uses bloom filters for speed.
Project Website: None
Github Link: https://github.com/peterc/whatlanguage
Description: Text language detection. Quick, fast, memory efficient, and all in pure Ruby. Uses Bloom filters for aforementioned speed and memory …
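
The idea of detecting a language with Bloom filters can be sketched as follows: build one filter per language from a word list, then count how many words of the input each filter claims to contain. The sketch is written in Python rather than Ruby and does not reproduce WhatLanguage's code or data. The class, the function names, the filter sizes, and the ten-word lists are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """A small Bloom filter backed by a Python int used as a bit array."""

    def __init__(self, size_bits=4096, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # May report false positives, never false negatives.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


# Hypothetical word lists; a real detector would use thousands of words.
WORDLISTS = {
    "english": ["the", "and", "is", "in", "of", "to", "house", "city", "with", "that"],
    "french": ["le", "la", "et", "est", "dans", "de", "maison", "ville", "avec", "que"],
}


def build_filters(wordlists):
    """Build one Bloom filter per language from its word list."""
    filters = {}
    for lang, words in wordlists.items():
        bloom = BloomFilter()
        for word in words:
            bloom.add(word)
        filters[lang] = bloom
    return filters


def detect(text, filters):
    """Return the language whose filter matches the most words, or None."""
    words = text.lower().split()
    scores = {lang: sum(w in f for w in words) for lang, f in filters.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

With `filters = build_filters(WORDLISTS)`, a call such as `detect("the house is in the city", filters)` picks the language with the most word hits. The memory saving comes from storing only bit positions, never the words themselves, at the cost of occasional false positives.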

Open Source Text Processing Project: Jieba

Jieba: Chinese text segmentation
Project Website: None
Github Link: https://github.com/fxsjy/jieba
Description: “Jieba” (Chinese for “to stutter”) Chinese text segmentation: built to be the best Python Chinese word segmentation module.
Features: Supports three types of segmentation mode: Accurate Mode attempts to …
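
For readers new to Chinese word segmentation, the sketch below shows the simplest dictionary-based technique, forward maximum matching: at each position, take the longest dictionary word that matches and fall back to a single character otherwise. This is not Jieba's algorithm and does not use Jieba's API. The function name and the toy vocabulary are assumptions made for this example.

```python
def forward_max_match(text, vocab):
    """Segment text greedily, preferring the longest dictionary word at each position."""
    max_len = max(map(len, vocab)) if vocab else 1
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a match is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy vocabulary for illustration.
VOCAB = {"我", "来到", "北京", "清华", "大学", "清华大学"}

print("/".join(forward_max_match("我来到北京清华大学", VOCAB)))
# prints: 我/来到/北京/清华大学
```

Greedy matching is fast but cannot resolve ambiguous splits, which is why production segmenters rely on word statistics as well as a dictionary.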

Open Source Text Processing Project: THUTag

THUTag: A Package of Keyphrase Extraction and Social Tag Suggestion
Project Website: None
Github Link: https://github.com/YeDeming/THUTag/
Description:
Part I: THUTag Contents
Part II: How To Compile THUTag
Part III: How To Run Cross-validation of THUTag
Part IV …