Open Source Text Processing Project: The Stanford Parser (A statistical parser)

The Stanford Parser: A statistical parser

Project Website:

Github Link: None

Description

A natural language parser is a program that works out the grammatical structure of sentences, for instance, which groups of words go together (as “phrases”) and which words are the subject or object of a verb. Probabilistic parsers use knowledge of language gained from hand-parsed sentences to try to produce the most likely analysis of new sentences. These statistical parsers still make some mistakes, but commonly work rather well. Their development was one of the biggest breakthroughs in natural language processing in the 1990s.

This package is a Java implementation of probabilistic natural language parsers, both highly optimized PCFG and lexicalized dependency parsers, and a lexicalized PCFG parser. The original version of this parser was mainly written by Dan Klein, with support code and linguistic grammar development by Christopher Manning. Extensive additional work (internationalization and language-specific modeling, flexible input/output, grammar compaction, lattice parsing, k-best parsing, typed dependencies output, user support, etc.) has been done by Roger Levy, Christopher Manning, Teg Grenager, Galen Andrew, Marie-Catherine de Marneffe, Bill MacCartney, Anna Rafferty, Spence Green, Huihsin Tseng, Pi-Chuan Chang, Wolfgang Maier, and Jenny Finkel.

The lexicalized probabilistic parser implements a factored product model, with separate PCFG phrase structure and lexical dependency experts, whose preferences are combined by efficient exact inference, using an A* algorithm. Alternatively, the software can be used simply as an accurate unlexicalized stochastic context-free grammar parser. Either of these yields a statistical parsing system with good performance. A GUI is provided for viewing the phrase structure tree output of the parser.
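To make the API concrete, here is a minimal sketch, closely modeled on the ParserDemo program bundled with the distribution, that loads the serialized English PCFG grammar, parses one sentence, and prints both the phrase structure tree and typed dependencies. The class and model paths assume the standard Stanford Parser jar and the englishPCFG model are on the classpath.

import java.io.StringReader;
import java.util.List;

import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.process.CoreLabelTokenFactory;
import edu.stanford.nlp.process.PTBTokenizer;
import edu.stanford.nlp.process.Tokenizer;
import edu.stanford.nlp.trees.GrammaticalStructure;
import edu.stanford.nlp.trees.GrammaticalStructureFactory;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreebankLanguagePack;

public class ParserSketch {
  public static void main(String[] args) {
    // Load the serialized English PCFG grammar from the models jar.
    LexicalizedParser lp = LexicalizedParser.loadModel(
        "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");

    // Tokenize a raw sentence with the Penn Treebank tokenizer.
    String sent = "The quick brown fox jumps over the lazy dog.";
    Tokenizer<CoreLabel> tokenizer = PTBTokenizer.factory(new CoreLabelTokenFactory(), "")
        .getTokenizer(new StringReader(sent));
    List<CoreLabel> words = tokenizer.tokenize();

    // Parse and print the phrase structure tree.
    Tree tree = lp.apply(words);
    tree.pennPrint();

    // Convert the tree to typed dependencies (grammatical relations).
    TreebankLanguagePack tlp = lp.treebankLanguagePack();
    GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
    GrammaticalStructure gs = gsf.newGrammaticalStructure(tree);
    System.out.println(gs.typedDependenciesCCprocessed());
  }
}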

As well as providing an English parser, the parser can be and has been adapted to work with other languages. A Chinese parser based on the Chinese Treebank, a German parser based on the Negra corpus and Arabic parsers based on the Penn Arabic Treebank are also included. The parser has also been used for other languages, such as Italian, Bulgarian, and Portuguese.
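As a rough illustration, switching to another language mainly amounts to loading a different serialized grammar; the model path below is an assumption based on the standard models distribution, and languages such as Chinese additionally require word-segmented input.

import edu.stanford.nlp.parser.lexparser.LexicalizedParser;

public class LoadChineseGrammar {
  public static void main(String[] args) {
    // Model path is an assumption; the grammar files available depend on which
    // models jar from the Stanford Parser distribution is on the classpath.
    LexicalizedParser lp = LexicalizedParser.loadModel(
        "edu/stanford/nlp/models/lexparser/chineseFactored.ser.gz");
    // The rest of the API is the same as in the English sketch above; only the
    // serialized grammar (and the tokenization/segmentation step) changes.
    System.out.println(lp.treebankLanguagePack().getClass().getSimpleName());
  }
}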

The parser provides Universal Dependencies and Stanford Dependencies output as well as phrase structure trees. Typed dependencies are otherwise known as grammatical relations. This style of output is available only for English and Chinese. For more details, please refer to the Stanford Dependencies webpage and the Universal Dependencies documentation.

Shift-reduce constituency parser
As of version 3.4 in 2014, the parser includes the code necessary to run a shift-reduce parser, a much faster constituent parser with competitive accuracy. Models for this parser are distributed separately from the main parser package.
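A minimal sketch of using the shift-reduce parser, along the lines of the ShiftReduceDemo that ships with the code: the parser works from POS-tagged tokens, so a tagger runs first. Both model paths are assumptions and require the separately distributed shift-reduce models and a Stanford POS tagger model on the classpath.

import java.io.StringReader;
import java.util.List;

import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.ling.TaggedWord;
import edu.stanford.nlp.parser.shiftreduce.ShiftReduceParser;
import edu.stanford.nlp.process.DocumentPreprocessor;
import edu.stanford.nlp.tagger.maxent.MaxentTagger;
import edu.stanford.nlp.trees.Tree;

public class ShiftReduceSketch {
  public static void main(String[] args) {
    // The shift-reduce parser expects POS-tagged input, so tag the tokens first.
    MaxentTagger tagger = new MaxentTagger(
        "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger");
    ShiftReduceParser parser = ShiftReduceParser.loadModel(
        "edu/stanford/nlp/models/srparser/englishSR.ser.gz");

    String text = "The shift-reduce parser trades a little accuracy for a lot of speed.";
    for (List<HasWord> sentence : new DocumentPreprocessor(new StringReader(text))) {
      List<TaggedWord> tagged = tagger.tagSentence(sentence);
      Tree tree = parser.apply(tagged);   // constituency tree from tagged words
      tree.pennPrint();
    }
  }
}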

Neural-network dependency parser
In version 3.5.0 (October 2014) we released a high-performance dependency parser powered by a neural network. The parser outputs typed dependency parses for English and Chinese. The models for this parser are included in the general Stanford Parser models package.
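A short sketch along the lines of the bundled DependencyParserDemo: tag each sentence, then ask the neural-network dependency parser for a GrammaticalStructure. The tagger model path is an assumption; the parser model uses the package's default English model constant.

import java.io.StringReader;
import java.util.List;

import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.ling.TaggedWord;
import edu.stanford.nlp.parser.nndep.DependencyParser;
import edu.stanford.nlp.process.DocumentPreprocessor;
import edu.stanford.nlp.tagger.maxent.MaxentTagger;
import edu.stanford.nlp.trees.GrammaticalStructure;

public class NNDepSketch {
  public static void main(String[] args) {
    // The neural dependency parser also works from POS-tagged tokens.
    MaxentTagger tagger = new MaxentTagger(
        "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger");
    DependencyParser parser = DependencyParser.loadFromModelFile(DependencyParser.DEFAULT_MODEL);

    String text = "I can almost always tell when movies use fake dinosaurs.";
    for (List<HasWord> sentence : new DocumentPreprocessor(new StringReader(text))) {
      List<TaggedWord> tagged = tagger.tagSentence(sentence);
      GrammaticalStructure gs = parser.predict(tagged);
      System.out.println(gs.typedDependencies());   // typed dependency parse
    }
  }
}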

Dependency scoring
The package includes a tool for scoring generic dependency parses, in the class edu.stanford.nlp.trees.DependencyScoring. It evaluates dependency trees, reporting F1 and labeled attachment scores. The tool's usage message gives a detailed description of how to use it.

