Open Source Text Processing Project: Maximum Entropy Modeling Toolkit

Maximum Entropy Modeling Toolkit for Python and C++ Project Website: http://homepages.inf.ed.ac.uk/lzhang10/maxent_toolkit.html Github Link: https://github.com/lzhang10/maxent Description The Maximum Entropy Toolkit provides a set of tools and a library for constructing maximum entropy (maxent) models in either Python or C++. Maximum Entropy Model … Continue reading
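For readers new to the idea, a conditional maximum entropy model can be sketched in a few lines of plain Python. The sketch below is not the toolkit's API: the function names (train_maxent, predict_proba), the toy weather data, and the learning-rate settings are all assumptions made for illustration. It fits one weight per (feature, label) pair by gradient ascent on the log-likelihood, where the gradient is the observed feature count minus the count expected under the current model.

```python
import math
from collections import defaultdict


def predict_proba(weights, features, labels):
    """p(y | x) is proportional to exp(sum of weights of active (feature, y) pairs)."""
    scores = {y: sum(weights.get((f, y), 0.0) for f in features) for y in labels}
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}


def train_maxent(samples, labels, epochs=200, lr=0.5):
    """Fit (feature, label) weights by batch gradient ascent on the log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for features, gold in samples:
            probs = predict_proba(weights, features, labels)
            for f in features:
                grad[(f, gold)] += 1.0        # observed count of the feature with the gold label
                for y in labels:
                    grad[(f, y)] -= probs[y]  # expected count under the current model
        for key, g in grad.items():
            weights[key] += lr * g / len(samples)
    return weights


# Hypothetical toy data: each sample is (active binary features, label).
LABELS = ["outdoor", "indoor"]
SAMPLES = [
    (["sunny", "warm"], "outdoor"),
    (["sunny", "cool"], "outdoor"),
    (["rainy", "cool"], "indoor"),
    (["rainy", "warm"], "indoor"),
]

model = train_maxent(SAMPLES, LABELS)
probs = predict_proba(model, ["sunny", "cool"], LABELS)
print(probs)
```

Plain gradient ascent is used here only because it is short; it is a stand-in, and real toolkits typically rely on faster optimizers and add regularization.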

Open Source Text Processing Project: Jieba

Jieba: Chinese text segmentation Project Website: None Github Link: https://github.com/fxsjy/jieba Description “Jieba” (Chinese for “to stutter”) Chinese text segmentation: built to be the best Python Chinese word segmentation module. Features Supports three segmentation modes: Accurate Mode attempts to … Continue reading

Text Processing Book: Python 2.6 Text Processing Beginner's Guide

Python 2.6 Text Processing: Beginner's Guide Description With a basic knowledge of Python you have the potential to undertake time-saving text processing. This book is a great introduction to the various techniques, and teaches through practical examples and clear explanations. … Continue reading

Open Source Text Processing Project: langid

langid.py: Stand-alone language identification system Project Website: None Github Link: https://github.com/saffsd/langid.py Description langid.py is a standalone Language Identification (LangID) tool. The design principles are as follows: fast; pre-trained over a large number of languages (currently 97); not sensitive to domain-specific … Continue reading
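To make the task concrete, here is a toy language identifier in plain Python. It is not langid.py's code or API: the character-trigram naive Bayes approach, the helper names (char_ngrams, train, identify), and the six-sentence corpus are assumptions chosen for illustration. A pre-trained tool covers far more languages and data than this sketch.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Split lowercased, space-padded text into overlapping character n-grams."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(corpus):
    """Map each language to a Counter of the n-grams seen in its sentences."""
    return {
        lang: Counter(g for sentence in sentences for g in char_ngrams(sentence))
        for lang, sentences in corpus.items()
    }


def identify(models, text):
    """Return the language with the highest add-one-smoothed log-likelihood.

    A uniform prior over languages is assumed, so it drops out of the comparison.
    """
    vocab_size = len({g for counts in models.values() for g in counts})

    def log_likelihood(counts):
        total = sum(counts.values())
        return sum(
            math.log((counts[g] + 1) / (total + vocab_size))
            for g in char_ngrams(text)
        )

    return max(models, key=lambda lang: log_likelihood(models[lang]))


# Hypothetical miniature training corpus; real systems train on far more text.
CORPUS = {
    "en": [
        "the quick brown fox jumps over the lazy dog",
        "this is a simple sentence in the english language",
    ],
    "de": [
        "der schnelle braune fuchs springt über den faulen hund",
        "dies ist ein einfacher satz in der deutschen sprache",
    ],
    "fr": [
        "le renard brun rapide saute par dessus le chien paresseux",
        "ceci est une phrase simple dans la langue française",
    ],
}

models = train(CORPUS)
print(identify(models, "the dog is lazy"))
```

Add-one smoothing keeps unseen trigrams from zeroing out a language's score, which matters a great deal with a corpus this small.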

Open Source Text Processing Project: Pattern

Pattern Project Website: http://www.clips.ua.ac.be/pattern Github Link: https://github.com/clips/pattern Description Pattern is a web mining module for the Python programming language. It has tools for data mining (Google, Twitter and Wikipedia API, a web crawler, an HTML DOM parser), natural language processing … Continue reading

Open Source Text Processing Project: MBSP

MBSP for Python Project Website: http://www.clips.ua.ac.be/pages/MBSP Description MBSP is a text analysis system based on the TiMBL and MBT memory-based learning applications developed at CLiPS and ILK. It provides tools for Tokenization and Sentence Splitting, Part of Speech Tagging, … Continue reading

Text Processing Book: Text Processing in Python 1st Edition

Text Processing in Python Description Text Processing in Python describes techniques for manipulation of text using the Python programming language. At the broadest level, text processing is simply taking textual information and doing something with it. This might be restructuring … Continue reading

Text Processing Book: Python Text Processing with NLTK 2.0 Cookbook

Python Text Processing with NLTK 2.0 Cookbook Description Use Python’s NLTK suite of libraries to maximize your Natural Language Processing capabilities. * Quickly get to grips with Natural Language Processing, with Text Analysis, Text Mining, and beyond * Learn … Continue reading

Open Source Text Processing Project: TextBlob

TextBlob: Simplified Text Processing Project Website: http://textblob.readthedocs.org/en/dev/ Github Link: https://github.com/sloria/textblob Description TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech … Continue reading

Open Source Text Processing Project: spaCy

spaCy: BUILD TOMORROW’S LANGUAGE TECHNOLOGIES Project Website: https://spacy.io/ Github Link: https://github.com/honnibal/spaCy Description spaCy is a library for industrial-strength natural language processing in Python and Cython. It features state-of-the-art speed and accuracy, a concise API, and great documentation. If you’re a … Continue reading