Open Source Text Processing Project: tagger

tagger: A Python module for extracting relevant tags from text documents
Project Website: None
Github Link: https://github.com/apresta/tagger
Description: Module for extracting tags from text documents. Extracting tags from a text document involves at least three steps: splitting the document into …

Text Processing Course: Introduction to Natural Language Processing

Name: Introduction to Natural Language Processing
Website: http://people.cs.umass.edu/~brenocon/inlp2015/
Description: Natural Language Processing (NLP) is the engineering art and science of how to teach computers to understand human language. NLP is a type of artificial intelligence technology, and it’s now ubiquitous …

Open Source Text Processing Project: Moses

Moses, the machine translation system
Project Website: http://www.statmt.org/moses/
Github Link: https://github.com/moses-smt/mosesdecoder
Description: Moses is a statistical machine translation system that allows you to automatically train translation models for any language pair. All you need is a collection of translated texts …

Open Source Text Processing Project: RAKE

RAKE: A Python implementation of the Rapid Automatic Keyword Extraction algorithm
Project Website: None
Github Link: https://github.com/aneesha/RAKE
Description: A Python implementation of the Rapid Automatic Keyword Extraction (RAKE) algorithm as described in: Rose, S., Engel, D., Cramer, N., & Cowley, W. …
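
As a rough illustration of the idea behind RAKE, the sketch below splits text into candidate phrases at stopwords and punctuation, scores each word by degree divided by frequency, and sums the word scores per phrase. It is a simplified sketch and not the code from the linked repository: the tiny `STOPWORDS` set and the function names `candidate_phrases` and `rake_scores` are assumptions made for this example.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; real tools use much larger lists).
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "for", "on"}


def candidate_phrases(text):
    """Split text into runs of non-stopwords; punctuation also ends a run."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()\[\]]+", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9'-]+", fragment):
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_scores(text):
    """Return (phrase, score) pairs sorted by descending score."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)    # how often each word occurs in candidates
    degree = defaultdict(int)  # co-occurrence count, including the word itself
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    scores = {" ".join(p): sum(word_score[w] for w in p) for p in set(phrases)}
    # Sort by score, breaking ties alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    sample = ("Keyword extraction is the task of finding important phrases. "
              "Rapid keyword extraction is useful.")
    for phrase, score in rake_scores(sample):
        print(f"{score:5.2f}  {phrase}")
```

Words that tend to appear inside longer phrases get a higher degree-to-frequency ratio, so multi-word candidates usually rank above isolated words.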

Open Source Text Processing Project: KEA

KEA: Keyphrase Extraction Algorithm
Project Website: http://www.nzdl.org/Kea/
Github Link: None
Description: Keywords and keyphrases (multi-word units) are widely used in large document collections. They describe the content of single documents and provide a kind of semantic metadata that is useful …

Open Source Text Processing Project: Open Text Summarizer

Open Text Summarizer
Project Website: http://libots.sourceforge.net/
Github Link: None
Description: Automatic text summarization is the technique in which a computer program summarizes a document. A text is put into the computer and a highlighted (summarized) text is returned. The Open Text …

Open Source Text Processing Project: Reduction

Reduction
Project Website: None
Github Link: https://github.com/adamfabish/Reduction
Description: Reduction is a Python script that automatically summarizes a text by extracting the sentences deemed most important.
Example usage:
    from reduction import *
    reduction = Reduction()
    text = …
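
To make "extracting the sentences deemed most important" concrete, here is a minimal sketch of extractive summarization using plain word-frequency scoring. It is not Reduction's own algorithm or API: the function names `split_sentences` and `top_sentences`, the small `STOPWORDS` set, and the naive sentence splitter are all assumptions made for this example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real tools use larger lists).
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "it", "on", "for"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def top_sentences(text, n=1):
    """Return the n highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n])]


if __name__ == "__main__":
    sample = "Cats sleep. Cats chase mice and cats climb trees. Dogs bark loudly."
    print(top_sentences(sample, n=2))
```

Frequency scoring is only one of several possible heuristics; graph-based ranking or position-based weighting can replace `score` without changing the rest of the sketch.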

Open Source Text Processing Project: summarize

summarize: A Python library for simple text summarization
Project Website: None
Github Link: https://github.com/thavelick/summarize/
Description:
Installation
First install nltk and numpy:
    sudo pip install nltk
    sudo pip install numpy
Then install the punkt and stopwords nltk packages:
    sudo python -m …

Open Source Text Processing Project: Sumy

Sumy: Automatic text summarizer
Project Website: https://pypi.python.org/pypi/sumy
Github Link: https://github.com/miso-belica/sumy
Description: Simple library and command line utility for extracting summaries from HTML pages or plain texts. The package also contains a simple evaluation framework for text summaries. Implemented summarization methods: Luhn …

Open Source Text Processing Project: Gensim

Gensim: Topic Modelling for Humans
Project Website: https://radimrehurek.com/gensim/
Github Link: https://github.com/piskvorky/gensim/
Description: Gensim is a FREE Python library:
- Scalable statistical semantics
- Analyze plain-text documents for semantic structure
- Retrieve semantically similar documents
Gensim started off as a collection of various Python …