Open Source Text Processing Project: Stanford Classifier

Stanford Classifier

Project Website:

Github Link: None

Description

A classifier is a machine learning tool that takes data items and places each one into one of k classes. A probabilistic classifier, like this one, can also give a probability distribution over the class assignment for a data item. This software is a Java implementation of a maximum entropy classifier. Maximum entropy models are otherwise known as softmax classifiers and are essentially equivalent to multiclass logistic regression models (though parameterized slightly differently, in a way that is advantageous with sparse explanatory feature vectors). In other words, this is the same basic technology that you usually get from the various cloud-based machine learning APIs (Amazon, Google, …). The classification method is described in:

Christopher Manning and Dan Klein. 2003. Optimization, Maxent Models, and Conditional Estimation without Magic. Tutorial at HLT-NAACL 2003 and ACL 2003. [pdf slides] [pdf handouts]
Version 2 of the classifier was developed by Anna Rafferty, Alex Kleeman, Jenny Finkel, and Christopher Manning.
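
To make the softmax idea above concrete, here is a minimal sketch of how a maximum entropy model turns sparse features into a probability distribution over classes. It is not the Stanford Classifier's own code: the class name `SoftmaxSketch`, the feature names, and the weights are all hypothetical, and the weights are hard-coded rather than learned. The sketch assumes one weight map per class, with absent features contributing zero, which is one way to read the sparse-friendly parameterization mentioned above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SoftmaxSketch {

    /** Linear score of one class: sum of weight * value over the features present in the item. */
    public static double score(Map<String, Double> classWeights, Map<String, Double> features) {
        double sum = 0.0;
        for (Map.Entry<String, Double> f : features.entrySet()) {
            // Features with no weight for this class contribute nothing.
            sum += classWeights.getOrDefault(f.getKey(), 0.0) * f.getValue();
        }
        return sum;
    }

    /** Softmax over the per-class scores; returns one probability per class. */
    public static double[] probabilities(List<Map<String, Double>> weightsPerClass,
                                         Map<String, Double> features) {
        int k = weightsPerClass.size();
        double[] scores = new double[k];
        double max = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < k; c++) {
            scores[c] = score(weightsPerClass.get(c), features);
            max = Math.max(max, scores[c]);
        }
        // Subtract the maximum score before exponentiating for numerical stability.
        double[] probs = new double[k];
        double z = 0.0;
        for (int c = 0; c < k; c++) {
            probs[c] = Math.exp(scores[c] - max);
            z += probs[c];
        }
        for (int c = 0; c < k; c++) {
            probs[c] /= z;
        }
        return probs;
    }

    public static void main(String[] args) {
        // Hypothetical weights for two classes: index 0 = "cheese", index 1 = "disease".
        Map<String, Double> cheese = new HashMap<>();
        cheese.put("word=brie", 2.0);
        cheese.put("word=fever", -1.0);
        Map<String, Double> disease = new HashMap<>();
        disease.put("word=brie", -1.0);
        disease.put("word=fever", 2.0);
        List<Map<String, Double>> weights = new ArrayList<>();
        weights.add(cheese);
        weights.add(disease);

        // A sparse data item: only the features that fire are stored.
        Map<String, Double> item = new HashMap<>();
        item.put("word=brie", 1.0);

        double[] p = probabilities(weights, item);
        System.out.println("P(cheese)  = " + p[0]);
        System.out.println("P(disease) = " + p[1]);
    }
}
```

Because only the features present in the item are visited, the cost of scoring grows with the number of active features rather than the size of the full feature vocabulary, which is why this formulation suits text data.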

The software requires Java (now Java 8). In addition to API access, the program includes an easy-to-use command-line interface, ColumnDataClassifier, for building models. Its features are especially suited to building models over text data, but it also supports numeric variables.

