Open Source Text Processing Project: RelationFactory

End-to-end relation extraction and knowledge base population pipeline
Project Website: None
Github Link: https://github.com/beroth/relationfactory
Description: RelationFactory is a relation extraction and knowledge-base population system. It was the top-ranked system in TAC KBP 2013 English Slot-filling (http://www.nist.gov/tac/2013/KBP/index.html). If you want to …

Open Source Text Processing Project: PTStemmer

PTStemmer – A stemming toolkit for the Portuguese language
Project Website: https://code.google.com/archive/p/ptstemmer/
Github Link: None
Description: Features: Java, Python, and .NET C# implementations of the Orengo, Porter, and Savoy stemmers. Fast: can stem more than 1.5M words/second on a normal desktop …

Open Source Text Processing Project: Snowball

Snowball
Project Website: http://snowballstem.org/
Github Link: https://github.com/snowballstem/snowball
Description: Snowball is a small string processing language designed for creating stemming algorithms for use in Information Retrieval. The Snowball website describes the language and presents several useful stemmers implemented in it. …

Open Source Text Processing Project: CRF Project

CRF Project
Project Website: http://crf.sourceforge.net/
Github Link: None
Description: The CRF package is a Java implementation of Conditional Random Fields for sequential labeling, developed by Sunita Sarawagi of IIT Bombay. The package is distributed with the hope that it will …

Open Source Text Processing Project: Jahmm

Jahmm: An implementation of Hidden Markov Models in Java
Project Website: https://code.google.com/archive/p/jahmm/
Github Link: https://github.com/KommuSoft/jahmm
Description: Jahmm (pronounced “jam”) is a Java implementation of Hidden Markov Model (HMM) related algorithms. It has been designed to be easy to use (e.g. simple …
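The following sketch illustrates one computation an HMM library like Jahmm performs. It shows Viterbi decoding for a discrete HMM in Python: finding the most likely hidden-state sequence for a list of observations. It does not use or imitate Jahmm's Java API. The function name `viterbi` and the toy Healthy/Fever model (a common textbook example) are assumptions for illustration only.

```python
from typing import Dict, List, Sequence, Tuple

# Illustrative Viterbi sketch; hypothetical names, not Jahmm's API.
def viterbi(
    observations: Sequence[str],
    states: Sequence[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> Tuple[List[str], float]:
    """Return the most likely hidden-state path and its probability."""
    # best[t][s]: probability of the best path that ends in state s at step t
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back: List[Dict[str, str]] = [{}]  # back[t][s]: predecessor of s on that path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda r: best[t - 1][r] * trans_p[r][s])
            best[t][s] = best[t - 1][prev] * trans_p[prev][s] * emit_p[s][observations[t]]
            back[t][s] = prev
    # Pick the best final state, then follow back-pointers to recover the path.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]

# Toy model (hypothetical example data).
states = ["Healthy", "Fever"]
start = {"Healthy": 0.6, "Fever": 0.4}
trans = {"Healthy": {"Healthy": 0.7, "Fever": 0.3}, "Fever": {"Healthy": 0.4, "Fever": 0.6}}
emit = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}
path, prob = viterbi(["normal", "cold", "dizzy"], states, start, trans, emit)
print(path, prob)  # path: ['Healthy', 'Healthy', 'Fever']; prob ≈ 0.01512
```

The sketch multiplies raw probabilities to keep it short. On longer sequences these products underflow, so practical implementations usually work with log probabilities instead.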

Open Source Text Processing Project: berkeleyparser

berkeleyparser
Project Website: None
Github Link: https://github.com/slavpetrov/berkeleyparser
Description: “THE BERKELEY PARSER” release 1.1, migrated from Google Code to GitHub in July 2015. This package contains the Berkeley Parser as described in “Learning Accurate, Compact, and Interpretable Tree Annotation”, Slav Petrov, Leon …

Open Source Text Processing Project: berkeleylm

berkeleylm
Project Website: https://code.google.com/archive/p/berkeleylm/
Github Link: https://github.com/adampauls/berkeleylm
Description: An N-gram Language Model Library from UC Berkeley. This project provides a library for estimating and storing large n-gram language models in memory and accessing them efficiently. It is described in this paper: http://nlp.cs.berkeley.edu/pubs/Pauls-Klein_2011_LM_paper.pdf …
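The next sketch illustrates what "estimating" an n-gram language model means. It is a minimal bigram model in Python with add-one (Laplace) smoothing. It is an assumption-laden toy, not berkeleylm's code or API. The class name `BigramLM`, the `<s>`/`</s>` boundary tokens, and the choice of Laplace smoothing are ours. Compact in-memory storage, the part berkeleylm focuses on, is not modeled here.

```python
from collections import Counter
from typing import List, Sequence

class BigramLM:
    """Toy bigram language model with add-one (Laplace) smoothing.

    Hypothetical illustration only; berkeleylm is not based on this code.
    """

    def __init__(self, sentences: Sequence[List[str]]) -> None:
        self.context_counts: Counter = Counter()
        self.bigram_counts: Counter = Counter()
        vocab = {"</s>"}
        for sent in sentences:
            tokens = ["<s>"] + list(sent) + ["</s>"]
            self.context_counts.update(tokens[:-1])  # tokens that precede another token
            self.bigram_counts.update(zip(tokens[:-1], tokens[1:]))
            vocab.update(sent)
        self.vocab = vocab  # possible next tokens: all words plus end-of-sentence

    def prob(self, prev: str, word: str) -> float:
        """Smoothed estimate of P(word | prev)."""
        return (self.bigram_counts[(prev, word)] + 1) / (
            self.context_counts[prev] + len(self.vocab)
        )

    def sentence_prob(self, sent: List[str]) -> float:
        """Probability of a whole sentence as a product of bigram probabilities."""
        tokens = ["<s>"] + list(sent) + ["</s>"]
        p = 1.0
        for prev, word in zip(tokens[:-1], tokens[1:]):
            p *= self.prob(prev, word)
        return p

lm = BigramLM([["the", "cat", "sat"], ["the", "dog", "sat"]])
print(lm.prob("the", "cat"))  # 2/7, about 0.2857
```

Add-one smoothing keeps the example short and guarantees that each conditional distribution sums to one over the vocabulary. Production toolkits generally use better-performing schemes such as Kneser-Ney smoothing.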

Open Source Text Processing Project: Joshua

Joshua Statistical Machine Translation Toolkit
Project Website: http://joshua-decoder.org/
Github Link: https://github.com/joshua-decoder/joshua
Description: Joshua is an open-source statistical machine translation decoder for phrase-based (new in 6.0), hierarchical, and syntax-based machine translation, written in Java. It is developed at the Human Language …

Open Source Text Processing Project: KEA

KEA: Keyphrase Extraction Algorithm
Project Website: http://www.nzdl.org/Kea/
Github Link: None
Description: Keywords and keyphrases (multi-word units) are widely used in large document collections. They describe the content of single documents and provide a kind of semantic metadata that is useful …
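The excerpt above does not describe how KEA extracts keyphrases. As we understand the original KEA papers, it scores candidate phrases with two features and feeds them to a Naive Bayes model trained on documents with known keyphrases. The two features are TF×IDF and the relative position of a phrase's first occurrence. The Python sketch below computes only those two features, under that assumption. The stopword list, the tokenizer, the function names, and the omission of stemming and of the Naive Bayes step are all our simplifications, not KEA's code or API.

```python
import math
import re
from collections import Counter
from typing import Dict, Iterator, List, Sequence, Tuple

# Hypothetical stopword list; a real system would use a much longer one.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}

def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())

def candidate_phrases(words: List[str], max_len: int = 3) -> Iterator[Tuple[str, int]]:
    """Yield (phrase, start index) for n-grams that neither begin nor end with a stopword."""
    for i in range(len(words)):
        for n in range(1, max_len + 1):
            gram = words[i:i + n]
            if len(gram) < n:
                break
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            yield " ".join(gram), i

def kea_features(doc: str, corpus: Sequence[str], max_len: int = 3) -> Dict[str, Tuple[float, float]]:
    """Map each candidate phrase to (TF x IDF, first occurrence); assumed KEA-style features."""
    words = tokenize(doc)
    freq: Counter = Counter()
    first: Dict[str, int] = {}
    for phrase, pos in candidate_phrases(words, max_len):
        freq[phrase] += 1
        first.setdefault(phrase, pos)  # number of words preceding the first occurrence
    corpus_phrases = [{p for p, _ in candidate_phrases(tokenize(d), max_len)} for d in corpus]
    n_docs = len(corpus)
    features = {}
    for phrase, count in freq.items():
        df = sum(phrase in phrases for phrases in corpus_phrases)
        # +1 on df and N accounts for the scored document, which is not in the corpus.
        tf_idf = (count / len(words)) * -math.log2((df + 1) / (n_docs + 1))
        features[phrase] = (tf_idf, first[phrase] / len(words))
    return features

doc = "Keyphrase extraction finds keyphrases. Keyphrase extraction is useful for digital libraries."
corpus = ["Digital libraries store documents.", "Search engines index documents."]
feats = kea_features(doc, corpus)
for phrase, (tf_idf, first_occ) in sorted(feats.items(), key=lambda kv: -kv[1][0])[:3]:
    print(f"{phrase}: tf-idf={tf_idf:.3f}, first={first_occ:.2f}")
```

Phrases that are frequent in the document but rare in the corpus, and that appear early, get the most favorable feature values. In the assumed KEA setup, a trained classifier rather than a hand-written rule decides how to weigh the two features.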

Text Processing Book: Taming Text – How to Find, Organize, and Manipulate It (1st Edition)

Taming Text: How to Find, Organize, and Manipulate It
Description: Taming Text, winner of the 2013 Jolt Awards for Productivity, is a hands-on, example-driven guide to working with unstructured text in the context of real-world applications. This book explores …