Open Source Text Processing Project: Stanford Tokenizer

Stanford Tokenizer

Project Website: http://nlp.stanford.edu/software/tokenizer.shtml
Github Link: None

Description
A tokenizer divides text into a sequence of tokens, which roughly correspond to “words”. We provide a class suitable for tokenization of English, called PTBTokenizer. It was initially designed to largely …
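To make "dividing text into tokens" concrete, here is a rough Python sketch that approximates a few Penn Treebank-style conventions with one regular expression: punctuation becomes its own token, and clitics such as n't and 's are split off from the word they attach to. This is an illustration only, not PTBTokenizer (a Java class with far more rules). The function name tokenize and the pattern are hypothetical.

```python
import re

# Hypothetical, simplified approximation of Penn Treebank-style tokenization.
# This is NOT Stanford's PTBTokenizer; it only illustrates the idea.
# Alternatives are tried left to right at each position in the text.
_TOKEN_RE = re.compile(
    r"""
    [A-Za-z]+(?=n't\b)          # word stem before a negative contraction
  | n't\b                       # the negative contraction itself
  | '(?:s|re|ve|ll|d|m)\b       # other clitics become separate tokens
  | \d+(?:[.,]\d+)*             # numbers such as 3.14 or 1,000
  | [A-Za-z]+(?:-[A-Za-z]+)*    # words, keeping internal hyphens
  | \.\.\.                      # an ellipsis stays a single token
  | \S                          # any other non-space character (punctuation)
    """,
    re.VERBOSE,
)


def tokenize(text: str) -> list[str]:
    """Split raw text into a list of word and punctuation tokens."""
    return _TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(tokenize("They don't like it, do they?"))
    print(tokenize("It's 3.14, isn't it..."))
```

For example, "They don't like it, do they?" yields They, do, n't, like, it, the comma, do, they and the question mark as separate tokens. A single ordered regex keeps the sketch short; a production tokenizer such as PTBTokenizer handles many more cases (quotes, abbreviations, URLs, normalization options).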