Open Source Text Processing Project: Gensim

Gensim: Topic Modelling for Humans
Project Website: https://radimrehurek.com/gensim/
GitHub Link: https://github.com/piskvorky/gensim/
Description: Gensim is a free Python library for scalable statistical semantics; it analyzes plain-text documents for semantic structure and retrieves semantically similar documents. Gensim started off as a collection of various Python …

Open Source Text Processing Project: Maximum Entropy Modeling Toolkit

Maximum Entropy Modeling Toolkit for Python and C++
Project Website: http://homepages.inf.ed.ac.uk/lzhang10/maxent_toolkit.html
GitHub Link: https://github.com/lzhang10/maxent
Description: The Maximum Entropy Toolkit provides a set of tools and a library for constructing maximum entropy (maxent) models in either Python or C++. Maxent Entropy Model …

Open Source Text Processing Project: CRF++

CRF++: Yet Another CRF toolkit
Project Website: https://taku910.github.io/crfpp/
GitHub Link: None
Description: CRF++ is a simple, customizable, open source implementation of Conditional Random Fields (CRFs) for segmenting and labeling sequential data. CRF++ is designed for general-purpose use and will be applied …

Open Source Text Processing Project: GibbsLDA++

GibbsLDA++: A C/C++ Implementation of Latent Dirichlet Allocation
Project Website: http://gibbslda.sourceforge.net/
GitHub Link: None
Description: GibbsLDA++ is a C/C++ implementation of Latent Dirichlet Allocation (LDA) that uses Gibbs sampling for parameter estimation and inference. It is very fast and is …

Open Source Text Processing Project: WhatLanguage

WhatLanguage: A language detection library for Ruby that uses Bloom filters for speed.
Project Website: None
GitHub Link: https://github.com/peterc/whatlanguage
Description: Text language detection. Quick, fast, memory efficient, and all in pure Ruby. Uses Bloom filters for aforementioned speed and memory …
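
To illustrate the idea behind Bloom-filter-based language detection, here is a minimal sketch in Python. It is not WhatLanguage's code (which is written in Ruby), and nothing here reflects its actual API: the `BloomFilter` class, the `guess_language` helper, the filter size, and the tiny word lists are all assumptions made up for this example. Each language gets one filter populated with known words, and the language whose filter matches the most tokens wins.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array probed at k positions per item (hypothetical helper)."""

    def __init__(self, size_bits=8192, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive k bit positions by salting SHA-256 with the probe index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # May return a false positive, but never a false negative.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


# Toy word lists invented for this example; a real detector would
# load a large dictionary per language.
WORDS = {
    "english": ["the", "and", "is", "this", "of", "house"],
    "german": ["der", "und", "ist", "das", "nicht", "haus"],
}

FILTERS = {}
for language, words in WORDS.items():
    bloom = BloomFilter()
    for word in words:
        bloom.add(word)
    FILTERS[language] = bloom


def guess_language(text):
    """Return the language whose filter matches the most tokens."""
    tokens = text.lower().split()
    scores = {
        language: sum(1 for token in tokens if token in bloom)
        for language, bloom in FILTERS.items()
    }
    return max(scores, key=scores.get)
```

The memory benefit comes from storing only a bit array per language rather than the words themselves; the trade-off is a small chance of false positives, which a per-token vote like the one above tends to absorb.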

Open Source Text Processing Project: Jieba

Jieba: Chinese text segmentation
Project Website: None
GitHub Link: https://github.com/fxsjy/jieba
Description: “Jieba” (Chinese for “to stutter”) is a Chinese text segmentation library, built to be the best Python Chinese word segmentation module.
Features: Supports three segmentation modes: Accurate Mode attempts to …

Open Source Text Processing Project: THUTag

THUTag: A Package of Keyphrase Extraction and Social Tag Suggestion
Project Website: None
GitHub Link: https://github.com/YeDeming/THUTag/
Description:
Part I: THUTag Contents
Part II: How To Compile THUTag
Part III: How To Run Cross-validation of THUTag
Part IV …

Open Source Text Processing Project: langid

langid.py: Stand-alone language identification system
Project Website: None
GitHub Link: https://github.com/saffsd/langid.py
Description: langid.py is a standalone Language Identification (LangID) tool. The design principles are as follows: fast; pre-trained over a large number of languages (currently 97); not sensitive to domain-specific …

Open Source Text Processing Project: LingPipe

LingPipe
Project Website: http://alias-i.com/lingpipe/
GitHub Link: None
Description: LingPipe is a toolkit for processing text using computational linguistics. LingPipe is used for tasks such as: finding the names of people, organizations or locations in news; automatically classifying Twitter search results …

Open Source Text Processing Project: OpenNLP

Apache OpenNLP
Project Website: https://opennlp.apache.org/
GitHub Link: None
Description: The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, …