Stanford Word Segmenter
Project Website: http://nlp.stanford.edu/software/segmenter.shtml
Github Link: None
Description: Tokenization of raw text is a standard pre-processing step for many NLP tasks. For English, tokenization usually involves punctuation splitting and separation of some affixes like possessives. Other languages require …
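The segmenter is distributed as a Java package that is normally run from the command line. As a minimal sketch of how it might be wrapped from Python, assuming the bundled segment.sh script and the ctb (Chinese Treebank) model that ship with the distribution, and with all paths and the trailing n-best argument treated as illustrative assumptions rather than a documented API:

```python
import subprocess
from pathlib import Path

# Assumed location of the unpacked Stanford Word Segmenter distribution;
# adjust to wherever it actually lives on your machine.
SEGMENTER_DIR = Path("stanford-segmenter-2020-11-17")

def segment_file(input_path: str, model: str = "ctb", encoding: str = "UTF-8") -> str:
    """Run the bundled segment.sh script on a raw-text file and return
    the whitespace-segmented output as a single string.

    The argument order (model, file, encoding, n-best size) follows the
    script's documented usage in the distribution; verify against the
    version you download.
    """
    result = subprocess.run(
        ["./segment.sh", model, str(Path(input_path).resolve()), encoding, "0"],
        cwd=SEGMENTER_DIR,          # run from inside the distribution directory
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout

if __name__ == "__main__":
    # Example: segment a file of raw Chinese text into space-delimited words.
    print(segment_file("chinese_input.txt"))
```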