Open Source Text Processing Project: TextRank

Python implementation of the TextRank algorithm
Project Website: None
GitHub Link: https://github.com/davidadamojr/TextRank
Description: This is a Python implementation of TextRank for automatic keyword and sentence extraction (summarization), as done in https://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf. However, this implementation uses Levenshtein distance as the relation between …
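The excerpt above is cut off before it says which units the Levenshtein distance relates, so the following is only a minimal sketch of the distance itself, not the project's code. The function name `levenshtein` and the helper `pairwise_distances` are hypothetical names chosen for illustration, and the idea of using the distance as an edge weight between text units is an assumption.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Return the edit distance (insertions, deletions, substitutions) between a and b."""
    # Keep the shorter string in the inner loop so the row buffer stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def pairwise_distances(units):
    """Hypothetical helper: map each unordered pair of text units to its distance."""
    return {(u, v): levenshtein(u, v) for u, v in combinations(units, 2)}


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(pairwise_distances(["rank", "bank", "banker"]))
```

A graph-ranking step could then treat these pairwise values as edge weights, although how the project actually converts distance into a relation is not visible in the truncated description.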

Open Source Text Processing Project: topia.termextract

topia.termextract: Content Term Extraction using POS Tagging
Project Website: https://pypi.python.org/pypi/topia.termextract/
GitHub Link: None
Description: This package determines important terms within a given piece of content. It uses linguistic tools such as Parts-Of-Speech (POS) and some simple statistical analysis to determine …

Open Source Text Processing Project: tagger

tagger: A Python module for extracting relevant tags from text documents
Project Website: None
GitHub Link: https://github.com/apresta/tagger
Description: Module for extracting tags from text documents. Extracting tags from a text document involves at least three steps: splitting the document into …

Open Source Text Processing Project: RAKE

RAKE: A Python implementation of the Rapid Automatic Keyword Extraction algorithm
Project Website: None
GitHub Link: https://github.com/aneesha/RAKE
Description: A Python implementation of the Rapid Automatic Keyword Extraction (RAKE) algorithm as described in: Rose, S., Engel, D., Cramer, N., & Cowley, W. …
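As a rough illustration of the RAKE idea named above, the sketch below splits text into candidate phrases at stopwords and punctuation, scores each word by degree divided by frequency, and sums word scores per phrase. It is not the code from the linked repository: the function names, the tiny `STOPWORDS` set, and the tokenization regex are all assumptions made for this example.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; real use needs a fuller list).
STOPWORDS = {"a", "an", "and", "the", "of", "for", "is", "in", "on", "to",
             "are", "as", "by", "with"}


def candidate_phrases(text, stopwords=STOPWORDS):
    """Split text into candidate phrases, breaking at punctuation and stopwords."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()\n]+", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9][a-z0-9'-]*", fragment):
            if word in stopwords:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def rake_scores(text, stopwords=STOPWORDS):
    """Score each candidate phrase as the sum of degree/frequency over its words."""
    phrases = candidate_phrases(text, stopwords)
    frequency = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            frequency[word] += 1
            degree[word] += len(phrase)  # co-occurrences in this phrase, counting itself
    word_score = {word: degree[word] / frequency[word] for word in frequency}
    return {" ".join(phrase): sum(word_score[word] for word in phrase)
            for phrase in phrases}


if __name__ == "__main__":
    sample = "Keyword extraction is the task of rapid automatic keyword extraction."
    for phrase, score in sorted(rake_scores(sample).items(), key=lambda kv: -kv[1]):
        print(f"{score:5.1f}  {phrase}")
```

Longer phrases made of words that tend to co-occur with many others score highest, which is why multi-word candidates usually outrank single words under this scheme.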

Open Source Text Processing Project: KEA

KEA: Keyphrase Extraction Algorithm
Project Website: http://www.nzdl.org/Kea/
GitHub Link: None
Description: Keywords and keyphrases (multi-word units) are widely used in large document collections. They describe the content of single documents and provide a kind of semantic metadata that is useful …

Open Source Text Processing Project: Jieba

Jieba: Chinese text segmentation
Project Website: None
GitHub Link: https://github.com/fxsjy/jieba
Description: “Jieba” (Chinese for “to stutter”) Chinese text segmentation: built to be the best Python Chinese word segmentation module.
Features: Support three types of segmentation mode: Accurate Mode attempts to …

Open Source Text Processing Project: THUTag

THUTag: A Package of Keyphrase Extraction and Social Tag Suggestion
Project Website: None
GitHub Link: https://github.com/YeDeming/THUTag/
Description:
Part I: THUTag Contents
Part II: How To Compile THUTag
Part III: How To Run Cross-validation of THUTag
Part IV …