Tag Archives: Text Processing Project
segtok: sentence segmentation and word tokenization tools
Project Website: http://fnl.es/segtok-a-segmentation-and-tokenization-library.html
Github Link: https://github.com/fnl/segtok
Description
A rule-based sentence segmenter (splitter) and a word tokenizer using orthographic features. The segtok package provides two modules, segtok.segmenter and segtok.tokenizer. The segmenter provides functionality for …
Sentiment analysis using machine learning techniques
Project Website: http://sentiment.vivekn.com/
Github Link: https://github.com/vivekn/sentiment
Description
Sentiment analysis using machine learning techniques. Check info.py for the training and testing code. A demo of the tool is available on the project website. Refer to this paper for more …
Stanford Temporal Tagger
Project Website: http://nlp.stanford.edu/software/sutime.html
Github Link: None
Description
SUTime is a library for recognizing and normalizing time expressions. That is, it will convert "next wednesday at 3pm" to something like 2016-02-17T15:00 (depending on the assumed current reference time). …
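The normalization described above can be illustrated with a minimal Python sketch. This is not SUTime's API or its algorithm: `resolve_next_weekday` is a hypothetical helper, the reference time is an assumed value, and "next wednesday" is taken to mean the first Wednesday strictly after the reference day, which is a simplifying assumption.

```python
from datetime import datetime, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def resolve_next_weekday(reference, weekday, hour, minute=0):
    """Resolve 'next <weekday> at <hour>' relative to a reference time.

    Hypothetical helper for illustration only; not part of SUTime.
    """
    target = WEEKDAYS.index(weekday.lower())
    # Number of days to move forward, always in 1..7, so the result
    # is strictly after the reference day.
    days_ahead = (target - reference.weekday() - 1) % 7 + 1
    day = reference + timedelta(days=days_ahead)
    return day.replace(hour=hour, minute=minute, second=0, microsecond=0)


# Assumed reference time: Friday, 12 February 2016, 09:30.
reference = datetime(2016, 2, 12, 9, 30)
resolved = resolve_next_weekday(reference, "wednesday", 15)
print(resolved.strftime("%Y-%m-%dT%H:%M"))  # 2016-02-17T15:00
```

Changing the reference time changes the result, which is why the normalized value depends on the assumed current time.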
CMU Flite: a small, fast run time synthesis engine
Project Website: http://www.festvox.org/flite/
Github Link: None
Description
CMU Flite (festival-lite) is a small, fast run-time open source text to speech synthesis engine developed at CMU and primarily designed for small embedded …
MaryTTS: The MARY Text-to-Speech System
Project Website: http://mary.dfki.de/
Github Link: None
Description
MaryTTS is an open-source, multilingual Text-to-Speech Synthesis platform written in Java. It was originally developed as a collaborative project of DFKI’s Language Technology Lab and the Institute of …
The Festival Speech Synthesis System
Project Website: http://www.cstr.ed.ac.uk/projects/festival/
Github Link: None
Description
Festival offers a general framework for building speech synthesis systems as well as including examples of various modules. As a whole it offers full text to speech through …
PyJulius: Python interface to Julius speech recognition engine
Project Website: https://pypi.python.org/pypi/pyjulius
Github Link: https://github.com/Diaoul/pyjulius
Description
pyjulius provides a simple interface to connect to the Julius module server. First, you will need to run julius with the -module option (documentation here or …
eSpeak text to speech
Project Website: http://espeak.sourceforge.net/
Github Link: None
Description
eSpeak is a compact open source software speech synthesizer for English and other languages, for Linux and Windows. eSpeak uses a “formant synthesis” method. This allows many languages …
Julius: Open-Source Large Vocabulary Continuous Speech Recognition Engine
Project Website: http://julius.osdn.jp/
Github Link: https://github.com/julius-speech/julius
Description
“Julius” is a high-performance, small-footprint large vocabulary continuous speech recognition (LVCSR) decoder software for speech-related researchers and developers. Based on word N-gram and context-dependent HMM, …
Kaldi Speech Recognition Toolkit
Project Website: http://kaldi-asr.org/
Github Link: https://github.com/kaldi-asr/kaldi
Description
What is Kaldi? Kaldi is a toolkit for speech recognition written in C++ and licensed under the Apache License v2.0. Kaldi is intended for use by speech recognition researchers. …