Open Source Text Processing Project: segtok

segtok: sentence segmentation and word tokenization tools
Project Website: http://fnl.es/segtok-a-segmentation-and-tokenization-library.html
Github Link: https://github.com/fnl/segtok
Description: A rule-based sentence segmenter (splitter) and a word tokenizer using orthographic features. The segtok package provides two modules, segtok.segmenter and segtok.tokenizer. The segmenter provides functionality for …
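To give a feel for what "rule-based" segmentation with orthographic features means, here is a minimal sketch using only the Python standard library. It is not segtok's implementation or API; the function name `split_sentences` and the single rule it applies (terminal punctuation, then whitespace, then an uppercase letter or opening quote/bracket) are assumptions made purely for illustration.

```python
import re

# Hypothetical rule (not segtok's): a sentence boundary is whitespace that
# follows '.', '!' or '?' and precedes an uppercase letter, quote, or '('.
_BOUNDARY = re.compile(r'(?<=[.!?])\s+(?=[A-Z"\'(])')


def split_sentences(text):
    """Split text into sentences using one orthographic rule."""
    return [part.strip() for part in _BOUNDARY.split(text) if part.strip()]


if __name__ == "__main__":
    sample = "It costs approx. 3 dollars. Is that fine? Yes."
    for sentence in split_sentences(sample):
        print(sentence)
```

Because the token after "approx." is a digit rather than an uppercase letter, the sketch does not split there. A real segmenter needs many more rules, for example for abbreviations that are followed by capitalized names.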

Open Source Text Processing Project: nlp-with-ruby

nlp-with-ruby: Awesome NLP with Ruby
Project Website: None
Github Link: https://github.com/arbox/nlp-with-ruby
Description: This curated list comprises awesome resources, libraries, information sources about computational processing of texts in human languages with Ruby. That field is often referred to as NLP, Computational …

Open Source Text Processing Project: textacy

textacy: higher-level NLP built on spaCy
Project Website: https://textacy.readthedocs.io
Github Link: https://github.com/chartbeat-labs/textacy
Description: textacy is a Python library for performing higher-level natural language processing (NLP) tasks, built on the high-performance spaCy library. With the basics: tokenization, part-of-speech tagging, dependency …

Open Source Text Processing Project: vivekn sentiment

Sentiment analysis using machine learning techniques
Project Website: http://sentiment.vivekn.com/
Github Link: https://github.com/vivekn/sentiment
Description: Sentiment analysis using machine learning techniques. Check info.py for the training and testing code. A demo of the tool is available here. Refer to this paper for more …

Open Source Text Processing Project: Stanford Temporal Tagger

Stanford Temporal Tagger
Project Website: http://nlp.stanford.edu/software/sutime.html
Github Link: None
Description: SUTime is a library for recognizing and normalizing time expressions. That is, it will convert "next wednesday at 3pm" to something like 2016-02-17T15:00 (depending on the assumed current reference time). …
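The dependence on a reference time can be illustrated with a small standard-library sketch. This is not SUTime (which is a Java library) and does not use its API; the helper `next_weekday_at` and the reading of "next wednesday" as "the first Wednesday strictly after the reference date" are assumptions for illustration only.

```python
from datetime import datetime, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def next_weekday_at(reference, weekday_name, hour):
    """Return the first `weekday_name` strictly after the reference date, at hour:00.

    Hypothetical helper: it resolves one fixed pattern, not free text.
    """
    target = WEEKDAYS.index(weekday_name.lower())
    # Number of days to move forward, always in the range 1..7.
    days_ahead = (target - reference.weekday() - 1) % 7 + 1
    day = reference + timedelta(days=days_ahead)
    return day.replace(hour=hour, minute=0, second=0, microsecond=0)


if __name__ == "__main__":
    # Assumed reference time: Friday, 12 February 2016, 09:30.
    reference = datetime(2016, 2, 12, 9, 30)
    resolved = next_weekday_at(reference, "wednesday", 15)
    print(resolved.isoformat(timespec="minutes"))  # 2016-02-17T15:00
```

With a different reference time the same expression resolves to a different date, which is why a temporal tagger needs the reference time as an input.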

Open Source Text Processing Project: CMU Flite

CMU Flite: a small, fast run time synthesis engine
Project Website: http://www.festvox.org/flite/
Github Link: None
Description: CMU Flite (festival-lite) is a small, fast run-time open source text to speech synthesis engine developed at CMU and primarily designed for small embedded …

Open Source Text Processing Project: MaryTTS

MaryTTS: The MARY Text-to-Speech System
Project Website: http://mary.dfki.de/
Github Link: None
Description: MaryTTS is an open-source, multilingual Text-to-Speech Synthesis platform written in Java. It was originally developed as a collaborative project of DFKI’s Language Technology Lab and the Institute of …

Text Processing Book: Speech and Language Processing (3rd ed. draft)

Speech and Language Processing (3rd ed. draft)
Project Website: https://web.stanford.edu/~jurafsky/slp3/
Description:
Chapter | Slides | Relation to 2nd ed.
1: Introduction | | [Ch. 1 in 2nd ed.]
2: Regular Expressions, Text Normalization, and Edit Distance | Text [pptx] [pdf]; Edit Distance [pptx] [pdf] | …

Open Source Text Processing Project: Festival

The Festival Speech Synthesis System
Project Website: http://www.cstr.ed.ac.uk/projects/festival/
Github Link: None
Description: Festival offers a general framework for building speech synthesis systems as well as including examples of various modules. As a whole it offers full text to speech through …

Open Source Text Processing Project: PyJulius

PyJulius: Python interface to Julius speech recognition engine
Project Website: https://pypi.python.org/pypi/pyjulius
Github Link: https://github.com/Diaoul/pyjulius
Description: pyjulius provides a simple interface to connect to the julius module server. First, you will need to run julius with the -module option (documentation here or …