Open Source Text Processing Project: segtok

segtok: sentence segmentation and word tokenization tools
Project Website:
GitHub Link:
Description: A rule-based sentence segmenter (splitter) and a word tokenizer using orthographic features. The segtok package provides two modules, segtok.segmenter and segtok.tokenizer. The segmenter provides functionality for …
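
To make "rule-based" and "orthographic features" concrete, here is a minimal standard-library sketch of the general idea: split on terminal punctuation followed by a capitalized word, unless the preceding word is a known abbreviation. This is not segtok's code or API. The names `ABBREVIATIONS`, `split_sentences`, and `tokenize` are hypothetical, and the rules are far simpler than what a real segmenter would need.

```python
import re

# Hypothetical abbreviation list, for illustration only (not taken from segtok).
ABBREVIATIONS = {"dr", "mr", "mrs", "ms", "prof", "e.g", "i.e", "etc"}

# Candidate boundary: terminal punctuation, whitespace, then an
# uppercase letter, digit, or opening quote (an orthographic cue).
BOUNDARY = re.compile(r"([.!?])\s+(?=[A-Z0-9\"'])")


def split_sentences(text):
    """Split text into sentences using simple orthographic rules."""
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(text):
        end = match.end(1)  # position just after the punctuation mark
        words = text[start:end].split()
        last_word = words[-1].rstrip(".!?").lower() if words else ""
        # A period after a known abbreviation is not a sentence boundary.
        if match.group(1) == "." and last_word in ABBREVIATIONS:
            continue
        sentences.append(text[start:end].strip())
        start = match.end()  # skip the whitespace after the boundary
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


def tokenize(sentence):
    """Split a sentence into word and punctuation tokens.

    Words may contain internal hyphens or apostrophes; every other
    non-space character becomes its own token.
    """
    return re.findall(r"\w+(?:[-']\w+)*|[^\w\s]", sentence)


if __name__ == "__main__":
    for sentence in split_sentences("Dr. Smith arrived. He was late! Was he?"):
        print(tokenize(sentence))
```

Keeping segmentation and tokenization as separate functions mirrors the two-module split described above, but the rules themselves are assumptions chosen only for illustration.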

Open Source Text Processing Project: OpenNLP

Apache OpenNLP
Project Website:
GitHub Link: None
Description: The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, …