Open Source Text Processing Project: Maximum Entropy Modeling Toolkit

Maximum Entropy Modeling Toolkit for Python and C++
Project Website: http://homepages.inf.ed.ac.uk/lzhang10/maxent_toolkit.html
Github Link: https://github.com/lzhang10/maxent
Description: The Maximum Entropy Toolkit provides a set of tools and a library for constructing maximum entropy (maxent) models in either Python or C++. Maxent Entropy Model …

Open Source Text Processing Project: CRF++

CRF++: Yet Another CRF toolkit
Project Website: https://taku910.github.io/crfpp/
Github Link: None
Description: CRF++ is a simple, customizable, and open source implementation of Conditional Random Fields (CRFs) for segmenting/labeling sequential data. CRF++ is designed for general-purpose use and will be applied …

Open Source Text Processing Project: GibbsLDA++

GibbsLDA++: A C/C++ Implementation of Latent Dirichlet Allocation
Project Website: http://gibbslda.sourceforge.net/
Github Link: None
Description: GibbsLDA++ is a C/C++ implementation of Latent Dirichlet Allocation (LDA) that uses the Gibbs sampling technique for parameter estimation and inference. It is very fast and is …

Open Source Text Processing Project: WhatLanguage

WhatLanguage: A language detection library for Ruby that uses bloom filters for speed.
Project Website: None
Github Link: https://github.com/peterc/whatlanguage
Description: Text language detection. Quick, fast, memory efficient, and all in pure Ruby. Uses Bloom filters for aforementioned speed and memory …
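The WhatLanguage description credits Bloom filters for its speed and memory efficiency. The block below is a minimal sketch of that data structure, written in Python rather than Ruby. It is not WhatLanguage's actual code: the class name `BloomFilter`, the parameters `size_bits` and `num_hashes`, and the sample word list are all hypothetical, chosen only for illustration.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter (hypothetical; not WhatLanguage's implementation)."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed into bytes.
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True only if every position for this item is set.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


# Assumed example: a tiny per-language word set.
english_words = BloomFilter()
for word in ("the", "and", "language"):
    english_words.add(word)

print("language" in english_words)  # True: added items are always reported present
```

A lookup can return a false positive for a word that was never added, but never a false negative. That trade is what keeps the per-language word sets small in memory.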

Text Processing Book: Text Processing with Ruby

Text Processing with Ruby: Extract Value from the Data That Surrounds You
Description: Text is everywhere. Web pages, databases, the contents of files–for almost any programming task you perform, you need to process text. Cut even the most complex text-based …

Text Processing Book: Taming Text – How to Find, Organize, and Manipulate It 1st Edition

Taming Text: How to Find, Organize, and Manipulate It
Description: Taming Text, winner of the 2013 Jolt Awards for Productivity, is a hands-on, example-driven guide to working with unstructured text in the context of real-world applications. This book explores …

Open Source Text Processing Project: Jieba

Jieba: Chinese text segmentation
Project Website: None
Github Link: https://github.com/fxsjy/jieba
Description: “Jieba” (Chinese for “to stutter”) Chinese text segmentation: built to be the best Python Chinese word segmentation module.
Features: Supports three segmentation modes: Accurate Mode attempts to …

Open Source Text Processing Project: THUTag

THUTag: A Package of Keyphrase Extraction and Social Tag Suggestion
Project Website: None
Github Link: https://github.com/YeDeming/THUTag/
Description: Part I: THUTag Contents; Part II: How To Compile THUTag; Part III: How To Run Cross-validation of THUTag; Part IV …

Text Processing Book: Python 2.6 Text Processing Beginners Guide

Python 2.6 Text Processing: Beginners Guide
Description: With a basic knowledge of Python you have the potential to undertake time-saving text processing. This book is a great introduction to the various techniques, and teaches through practical examples and clear explanations. …

Open Source Text Processing Project: langid

langid.py: Stand-alone language identification system
Project Website: None
Github Link: https://github.com/saffsd/langid.py
Description: langid.py is a standalone Language Identification (LangID) tool. The design principles are as follows: fast; pre-trained over a large number of languages (currently 97); not sensitive to domain-specific …