Open Source Text Processing Project: Snowball

Snowball Project Website: http://snowballstem.org/ Github Link: https://github.com/snowballstem/snowball Description Snowball is a small string processing language designed for creating stemming algorithms for use in Information Retrieval. This site describes Snowball, and presents several useful stemmers which have been implemented using it. … Continue reading

Open Source Text Processing Project: CRF Project

CRF Project Project Website: http://crf.sourceforge.net/ Github Link: None Description The CRF package is a Java implementation of Conditional Random Fields for sequential labeling developed by Sunita Sarawagi of IIT Bombay. The package is distributed with the hope that it will … Continue reading

Open Source Text Processing Project: Jahmm

Jahmm: An implementation of Hidden Markov Models in Java Project Website: https://code.google.com/archive/p/jahmm/ Github Link: https://github.com/KommuSoft/jahmm Description Jahmm (pronounced “jam”) is a Java implementation of Hidden Markov Model (HMM) related algorithms. It’s been designed to be easy to use (e.g. simple … Continue reading

Open Source Text Processing Project: berkeleyparser

berkeleyparser Project Website: None Github Link: https://github.com/slavpetrov/berkeleyparser Description “THE BERKELEY PARSER” release 1.1, migrated from Google Code to GitHub in July 2015. This package contains the Berkeley Parser as described in “Learning Accurate, Compact, and Interpretable Tree Annotation”, Slav Petrov, Leon … Continue reading

Open Source Text Processing Project: berkeleylm

berkeleylm Project Website: https://code.google.com/archive/p/berkeleylm/ Github Link: https://github.com/adampauls/berkeleylm Description An N-gram Language Model Library from UC Berkeley. This project provides a library for estimating and storing large n-gram language models in memory and accessing them efficiently. It is described in this paper: http://nlp.cs.berkeley.edu/pubs/Pauls-Klein_2011_LM_paper.pdf … Continue reading

Open Source Text Processing Project: Joshua

Joshua Statistical Machine Translation Toolkit Project Website: http://joshua-decoder.org/ Github Link: https://github.com/joshua-decoder/joshua Description Joshua is an open-source statistical machine translation decoder for phrase-based (new in 6.0), hierarchical, and syntax-based machine translation, written in Java. It is developed at the Human Language … Continue reading

Open Source Text Processing Project: KEA

KEA: Keyphrase Extraction Algorithm Project Website: http://www.nzdl.org/Kea/ Github Link: None Description Keywords and keyphrases (multi-word units) are widely used in large document collections. They describe the content of single documents and provide a kind of semantic metadata that is useful … Continue reading

Text Processing Book: Taming Text – How to Find, Organize, and Manipulate It 1st Edition

Taming Text: How to Find, Organize, and Manipulate It Description Summary Taming Text, winner of the 2013 Jolt Awards for Productivity, is a hands-on, example-driven guide to working with unstructured text in the context of real-world applications. This book explores … Continue reading

Open Source Text Processing Project: THUTag

THUTag: A Package of Keyphrase Extraction and Social Tag Suggestion Project Website: None Github Link: https://github.com/YeDeming/THUTag/ Description Part I : THUTag Contents Part II : How To Compile THUTag Part III : How To Run Cross-validation of THUTag Part IV … Continue reading

Open Source Text Processing Project: LingPipe

LingPipe Project Website: http://alias-i.com/lingpipe/ Github Link: None Description LingPipe is a tool kit for processing text using computational linguistics. LingPipe is used to do tasks like: find the names of people, organizations or locations in news; automatically classify Twitter search results … Continue reading