Open Source Text Processing Project: Reduction

Reduction
Project Website: None
GitHub Link: https://github.com/adamfabish/Reduction
Description
Reduction is a Python script that automatically summarizes a text by extracting the sentences deemed most important. Example usage:
    from reduction import *
    reduction = Reduction()
    text = …
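To illustrate what "extracting the sentences deemed most important" can look like, here is a minimal TextRank-style sketch: sentences become graph nodes, edges are weighted by word overlap, and a weighted PageRank picks the top sentences. This is a generic technique written in plain Python under that assumption. It is not Reduction's code, and the function names (`summarize`, `similarity`, etc.) are hypothetical.

```python
import math
import re
from itertools import combinations


def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def word_set(sentence):
    return set(re.findall(r"[a-z']+", sentence.lower()))


def similarity(a, b):
    # TextRank-style overlap: shared words normalised by log sentence lengths.
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    return len(a & b) / denom if denom > 0 else 0.0


def summarize(text, n=2, damping=0.85, iterations=50):
    sentences = split_sentences(text)
    bags = [word_set(s) for s in sentences]
    size = len(sentences)
    weights = [[0.0] * size for _ in range(size)]
    for i, j in combinations(range(size), 2):
        weights[i][j] = weights[j][i] = similarity(bags[i], bags[j])
    out_weight = [sum(row) for row in weights]

    # Weighted PageRank over the sentence graph.
    scores = [1.0] * size
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_weight[j] * scores[j]
                for j in range(size) if out_weight[j] > 0
            )
            for i in range(size)
        ]

    top = sorted(range(size), key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(top)]  # keep original sentence order
```

A sentence that shares no words with the others gets only the baseline score `1 - damping`, so well-connected sentences win.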

Open Source Text Processing Project: summarize

summarize: A Python library for simple text summarization
Project Website: None
GitHub Link: https://github.com/thavelick/summarize/
Description
Installation
First install nltk and numpy:
    sudo pip install nltk
    sudo pip install numpy
Then install the punkt and stopwords nltk packages:
    sudo python -m …
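The punkt (sentence tokenizer) and stopwords resources hint at a common recipe: split the text into sentences, drop stopwords, and rank sentences by how frequent their remaining words are. The sketch below shows that recipe under stated assumptions. A regex stands in for punkt, and a tiny hand-written `STOPWORDS` set stands in for NLTK's corpus. It is a guess at the general shape, not the summarize library's actual code.

```python
import re
from collections import Counter

# Hypothetical stand-in for NLTK's English stopwords corpus (the real list is much longer).
STOPWORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in", "on", "it", "was"}


def frequency_summary(text, n=1):
    # Regex splitter standing in for NLTK's punkt sentence tokenizer.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    content = [[w for w in re.findall(r"[a-z]+", s.lower()) if w not in STOPWORDS]
               for s in sentences]
    freq = Counter(w for words in content for w in words)

    def score(i):
        # Average document-wide frequency of the sentence's non-stopwords.
        words = content[i]
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    best = sorted(range(len(sentences)), key=score, reverse=True)[:n]
    return [sentences[i] for i in sorted(best)]
```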

Open Source Text Processing Project: Sumy

Sumy: Automatic text summarizer
Project Website: https://pypi.python.org/pypi/sumy
GitHub Link: https://github.com/miso-belica/sumy
Description
A simple library and command-line utility for extracting summaries from HTML pages or plain texts. The package also contains a simple evaluation framework for text summaries. Implemented summarization methods: Luhn …
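Luhn's method, the first one listed, scores a sentence by its densest cluster of "significant" (frequent, non-stopword) words: a cluster with c significant words spanning s tokens scores c²/s. Below is a minimal plain-Python sketch of that scoring. The threshold `min_freq`, the gap limit `max_gap`, and the tiny `STOPWORDS` set are illustrative assumptions, not Sumy's API or defaults.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "it"}  # hypothetical tiny list


def luhn_scores(sentences, min_freq=2, max_gap=4):
    tokens = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    freq = Counter(w for toks in tokens for w in toks if w not in STOPWORDS)
    significant = {w for w, c in freq.items() if c >= min_freq}

    scores = []
    for toks in tokens:
        positions = [i for i, w in enumerate(toks) if w in significant]
        best, start = 0.0, 0
        # Group significant words into clusters separated by at most max_gap other words.
        while start < len(positions):
            end = start
            while (end + 1 < len(positions)
                   and positions[end + 1] - positions[end] - 1 <= max_gap):
                end += 1
            count = end - start + 1
            span = positions[end] - positions[start] + 1
            best = max(best, count * count / span)
            start = end + 1
        scores.append(best)
    return scores
```

In the first test sentence, three significant words span four tokens, giving 3²/4 = 2.25.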

Open Source Text Processing Project: Gensim

Gensim: Topic Modelling for Humans
Project Website: https://radimrehurek.com/gensim/
GitHub Link: https://github.com/piskvorky/gensim/
Description
Gensim is a free Python library for:
- Scalable statistical semantics
- Analyzing plain-text documents for semantic structure
- Retrieving semantically similar documents
Gensim started off as a collection of various Python …
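The simplest form of "retrieving semantically similar documents" is a bag-of-words TF-IDF vector per document, ranked against a query by cosine similarity. The sketch below implements that baseline in plain Python to show the idea. It deliberately avoids Gensim's own classes, and the helper names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    counts = [Counter(tokenize(d)) for d in docs]
    df = Counter(w for c in counts for w in c)          # document frequency per term
    idf = {w: math.log(len(docs) / n) for w, n in df.items()}
    return [{w: tf * idf[w] for w, tf in c.items()} for c in counts], idf


def cosine(u, v):
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_documents(query, docs):
    # Returns document indices, most similar to the query first.
    vectors, idf = tfidf_vectors(docs)
    q = {w: tf * idf.get(w, 0.0) for w, tf in Counter(tokenize(query)).items()}
    return sorted(range(len(docs)), key=lambda i: cosine(q, vectors[i]), reverse=True)
```

Terms that occur in every document get an IDF of zero, so they cannot drive similarity on their own.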

Open Source Text Processing Project: Jieba

Jieba: Chinese text segmentation
Project Website: None
GitHub Link: https://github.com/fxsjy/jieba
Description
“Jieba” (Chinese for “to stutter”) Chinese text segmentation: built to be the best Python Chinese word segmentation module.
Features
Supports three types of segmentation mode: Accurate Mode attempts to …
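Jieba's documentation describes its default mode roughly as follows. It builds a directed acyclic graph (DAG) of every dictionary word that could start at each character, then uses dynamic programming to pick the path with the highest total word probability. It also uses an HMM for unknown words, which this sketch omits. The `FREQ` dictionary below is a hypothetical toy. The DAG is built by brute force rather than from a prefix dictionary, so this illustrates the idea and is not Jieba's implementation.

```python
import math

# Hypothetical toy dictionary: word -> frequency (Jieba ships a much larger one).
FREQ = {"我": 50, "来到": 20, "来": 30, "到": 30, "北京": 40, "北": 10, "京": 5,
        "清华": 15, "华": 5, "清": 5, "清华大学": 30, "大学": 30, "大": 20, "学": 20}
LOG_TOTAL = math.log(sum(FREQ.values()))


def build_dag(sentence):
    # For each start index, list every end index where sentence[start:end+1] is a word.
    dag = {}
    for start in range(len(sentence)):
        ends = [end for end in range(start, len(sentence)) if sentence[start:end + 1] in FREQ]
        dag[start] = ends or [start]  # unknown characters become one-character words
    return dag


def segment(sentence):
    dag, n = build_dag(sentence), len(sentence)
    # best[i] = (best log-probability for sentence[i:], end index of its first word)
    best = {n: (0.0, 0)}
    for i in range(n - 1, -1, -1):
        best[i] = max(
            (math.log(FREQ.get(sentence[i:j + 1], 1)) - LOG_TOTAL + best[j + 1][0], j)
            for j in dag[i]
        )
    words, i = [], 0
    while i < n:
        j = best[i][1]
        words.append(sentence[i:j + 1])
        i = j + 1
    return words


print(segment("我来到北京清华大学"))  # ['我', '来到', '北京', '清华大学']
```

Because every word adds a negative log-probability, the search favours fewer, more frequent words. That is why 清华大学 beats 清华 + 大学 here.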

Text Processing Book: Python 2.6 Text Processing: Beginner's Guide

Python 2.6 Text Processing: Beginner's Guide
Description
With a basic knowledge of Python, you have the potential to undertake time-saving text processing. This book is a great introduction to the various techniques and teaches through practical examples and clear explanations. …

Open Source Text Processing Project: langid

langid.py: Stand-alone language identification system
Project Website: None
GitHub Link: https://github.com/saffsd/langid.py
Description
langid.py is a standalone Language Identification (LangID) tool. The design principles are as follows:
- Fast
- Pre-trained over a large number of languages (currently 97)
- Not sensitive to domain-specific …
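langid.py is described by its authors as a naive Bayes classifier over byte n-gram features. The sketch below trains a toy multinomial naive Bayes model on character trigrams, with add-one smoothing and equal priors, to show how such an identifier decides between languages. The two-language `SAMPLES` data is hypothetical and tiny. This is not langid.py's model, feature set, or API.

```python
import math
from collections import Counter

# Hypothetical toy training data; a real identifier is trained on large multilingual corpora.
SAMPLES = {
    "en": ["the quick brown fox jumps over the lazy dog", "this is the house that jack built"],
    "de": ["der schnelle braune fuchs springt über den faulen hund",
           "das ist das haus von dem nikolaus"],
}


def ngrams(text, n=3):
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(samples):
    counts = {lang: Counter(g for s in texts for g in ngrams(s)) for lang, texts in samples.items()}
    vocab = set().union(*counts.values())
    return counts, vocab


def identify(text, model):
    counts, vocab = model
    best_lang, best_score = None, -math.inf
    for lang, c in counts.items():
        denom = sum(c.values()) + len(vocab)  # add-one (Laplace) smoothing
        score = sum(math.log((c[g] + 1) / denom) for g in ngrams(text))
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang
```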

Open Source Text Processing Project: Pattern

Pattern
Project Website: http://www.clips.ua.ac.be/pattern
GitHub Link: https://github.com/clips/pattern
Description
Pattern is a web mining module for the Python programming language. It has tools for data mining (Google, Twitter and Wikipedia API, a web crawler, an HTML DOM parser), natural language processing …
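Pattern ships its own HTML DOM parser. As a rough standard-library analogue (not Pattern's API), Python's `html.parser.HTMLParser` can pull hyperlinks and visible text out of a page, which is the first step of most web-mining pipelines. The class name `LinkAndTextExtractor` is hypothetical.

```python
from html.parser import HTMLParser


class LinkAndTextExtractor(HTMLParser):
    """Collects hyperlink targets and visible text chunks from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text.append(data.strip())


parser = LinkAndTextExtractor()
parser.feed('<p>See <a href="https://github.com/clips/pattern">Pattern</a>.</p>')
parser.close()
print(parser.links)  # ['https://github.com/clips/pattern']
```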

Open Source Text Processing Project: MBSP

MBSP for Python
Project Website: http://www.clips.ua.ac.be/pages/MBSP
Description
MBSP is a text analysis system based on the TiMBL and MBT memory-based learning applications developed at CLiPS and ILK. It provides tools for tokenization and sentence splitting, part-of-speech tagging, …
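Memory-based learning, the approach behind TiMBL and MBT, keeps every training example in memory and labels a new case by its nearest stored neighbour. The sketch below applies that idea to part-of-speech tagging. It uses 1-nearest-neighbour search with a simple overlap metric, counting how many window features match. The feature window, the tiny `TRAINING` corpus, and all function names are hypothetical. This is not MBSP's API.

```python
# Hypothetical toy training corpus of (word, tag) pairs.
TRAINING = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("a", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
]


def features(tokens, i):
    # Window features: previous word, current word, next word, two-letter suffix.
    prev = tokens[i - 1] if i > 0 else "<s>"
    nxt = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return (prev, tokens[i], nxt, tokens[i][-2:])


def build_memory(tagged_sentences):
    # "Training" simply stores every instance verbatim.
    memory = []
    for sentence in tagged_sentences:
        tokens = [w for w, _ in sentence]
        for i, (_, tag) in enumerate(sentence):
            memory.append((features(tokens, i), tag))
    return memory


def tag(tokens, memory):
    tags = []
    for i in range(len(tokens)):
        f = features(tokens, i)
        # 1-nearest neighbour under the overlap metric (ties go to the earliest instance).
        _, best = max(memory, key=lambda inst: sum(a == b for a, b in zip(inst[0], f)))
        tags.append(best)
    return tags
```

Because the suffix and context features still match, the unseen combination "the cat barks" is tagged correctly even though it never occurs in training.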

Text Processing Book: Text Processing in Python 1st Edition

Text Processing in Python
Description
Text Processing in Python describes techniques for manipulating text using the Python programming language. At the broadest level, text processing is simply taking textual information and doing something with it. This might be restructuring …
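As a small, hypothetical illustration of "taking textual information and doing something with it", the sketch below restructures `Field: value` lines into a dictionary. It then re-emits them as an aligned, word-wrapped listing using the standard-library `textwrap` module. The function name and layout are illustrative choices, not taken from the book.

```python
import textwrap


def restructure(record_text, width=60):
    """Parse 'Field: value' lines, then print values aligned and wrapped to `width`."""
    record = {}
    for line in record_text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)  # split on the first colon only (URLs keep theirs)
            record[key.strip()] = value.strip()
    label_width = max(map(len, record), default=0)
    out = []
    for key, value in record.items():
        wrapped = textwrap.wrap(value, width=width) or [""]
        out.append(f"{key.ljust(label_width)}  {wrapped[0]}")
        out.extend(" " * (label_width + 2) + rest for rest in wrapped[1:])
    return "\n".join(out)
```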