Text Processing Project | TextProcessing | A Text Processing Portal for Humans

Open Source Text Processing Project: langid

Posted on January 8, 2016 by textprocessing

langid.py: Stand-alone language identification system
Project Website: None
GitHub Link: https://github.com/saffsd/langid.py
Description: langid.py is a standalone Language Identification (LangID) tool. The design principles are as follows: fast; pre-trained over a large number of languages (currently 97); not sensitive to domain-specific …

Open Source Text Processing Project: LingPipe

Posted on January 7, 2016 by textprocessing

LingPipe
Project Website: http://alias-i.com/lingpipe/
GitHub Link: None
Description: LingPipe is a tool kit for processing text using computational linguistics. LingPipe is used for tasks such as: finding the names of people, organizations or locations in news; automatically classifying Twitter search results …

Open Source Text Processing Project: OpenNLP

Posted on January 6, 2016 by textprocessing

Apache OpenNLP
Project Website: https://opennlp.apache.org/
GitHub Link: None
Description: The Apache OpenNLP library is a machine-learning-based toolkit for the processing of natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, …

Open Source Text Processing Project: GATE

Posted on January 5, 2016 by textprocessing

GATE: a full-lifecycle open source solution for text processing
Project Website: https://gate.ac.uk/
GitHub Link: None
Description: GATE is… open source software capable of solving almost any text processing problem; a mature and extensive community of developers, users, educators, students and …

Open Source Text Processing Project: ClearNLP

Posted on January 4, 2016 by textprocessing

ClearNLP: Software and resources for natural language processing
Project Website: http://www.clearnlp.com
GitHub Link: https://github.com/clir/clearnlp
Description: The ClearNLP project provides software and resources for natural language processing. The project started at the Center for Computational Language and EducAtion Research, and is currently …

Open Source Text Processing Project: Stanford Open Information Extraction

Posted on January 3, 2016 by textprocessing

Stanford Open Information Extraction
Project Website: http://nlp.stanford.edu/software/openie.shtml
GitHub Link: None
Description: Open information extraction (open IE) refers to the extraction of structured relation triples from plain text, such that the schema for these relations does not need to be specified …
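To make the idea of a relation triple concrete, here is a minimal Python sketch of the data shape only. The `Triple` class and the hand-written example are illustrative assumptions; nothing here calls the Stanford software, which is what performs the actual extraction.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One open IE result: subject, relation and object as plain strings."""
    subject: str
    relation: str
    obj: str


# Hand-written example of what an extractor might return for the
# sentence "Barack Obama was born in Hawaii". The relation is free text
# taken from the sentence, not a label from a predefined schema.
example = Triple("Barack Obama", "was born in", "Hawaii")

print(example)
```

Because the relation is just a string, no list of allowed relation types has to be declared up front, which is the point of the "open" in open IE.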

Open Source Text Processing Project: Stanford Tokenizer

Posted on January 2, 2016 by textprocessing

Stanford Tokenizer
Project Website: http://nlp.stanford.edu/software/tokenizer.shtml
GitHub Link: None
Description: A tokenizer divides text into a sequence of tokens, which roughly correspond to “words”. We provide a class suitable for tokenization of English, called PTBTokenizer. It was initially designed to largely …

Open Source Text Processing Project: Stanford Classifier

Posted on January 1, 2016 by textprocessing

Stanford Classifier
Project Website: http://nlp.stanford.edu/software/classifier.shtml
GitHub Link: None
Description: A classifier is a machine learning tool that will take data items and place them into one of k classes. A probabilistic classifier, like this one, can also give a probability …
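As a rough illustration of what "probabilistic" means here, the sketch below turns one raw score per class into a probability distribution with a softmax and then picks the most likely class. This is one common approach, not the Stanford Classifier's own code; the class labels and scores are made up for the example.

```python
import math
from typing import Dict


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn one raw score per class into probabilities that sum to 1."""
    highest = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - highest) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Hypothetical scores for k = 3 classes; a trained model would compute these.
scores = {"sports": 2.0, "politics": 0.5, "tech": -1.0}
probs = softmax(scores)
best = max(probs, key=probs.get)  # the single class the item is placed into

print(best, probs)
```

The hard decision (`best`) is what a plain classifier returns; the `probs` dictionary is the extra information a probabilistic classifier can provide.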

Open Source Text Processing Project: Stanford Word Segmenter

Posted on December 31, 2015 by textprocessing

Stanford Word Segmenter
Project Website: http://nlp.stanford.edu/software/segmenter.shtml
GitHub Link: None
Description: Tokenization of raw text is a standard pre-processing step for many NLP tasks. For English, tokenization usually involves punctuation splitting and separation of some affixes like possessives. Other languages require …
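The English case described above (punctuation splitting plus separating the possessive suffix) can be sketched with a single regular expression. This is a toy illustration under simplifying assumptions, not the Stanford tools: it only handles the straight apostrophe, and it does not treat contractions such as "didn't" specially.

```python
import re
from typing import List

# Alternatives are tried left to right at each position:
#   1. a word, optionally with internal hyphens (apostrophes excluded,
#      so a trailing 's is left for the next alternative)
#   2. the possessive suffix 's
#   3. any single character that is neither a word character nor whitespace
TOKEN_RE = re.compile(r"\w+(?:-\w+)*|'s\b|[^\w\s]")


def tokenize(text: str) -> List[str]:
    """Split English text into word, possessive and punctuation tokens."""
    return TOKEN_RE.findall(text)


print(tokenize("The cat's toy broke, sadly."))
# ['The', 'cat', "'s", 'toy', 'broke', ',', 'sadly', '.']
```

Whitespace never matches any alternative, so it acts purely as a separator and is dropped from the output.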

Open Source Text Processing Project: The Stanford Parser (A statistical parser)

Posted on December 30, 2015 by textprocessing

The Stanford Parser: A statistical parser
Project Website: http://nlp.stanford.edu/software/lex-parser.shtml
GitHub Link: None
Description: A natural language parser is a program that works out the grammatical structure of sentences, for instance, which groups of words go together (as “phrases”) and which …