Open Source Text Processing Project: GIZA++

GIZA++: Training of statistical translation models

Project Website:

Github Link:

Description

GIZA++ is an extension of the program GIZA (part of the SMT toolkit EGYPT), which was developed by the Statistical Machine Translation team during the 1999 summer workshop at the Center for Language and Speech Processing at Johns Hopkins University (CLSP/JHU). GIZA++ adds a number of features to GIZA. These extensions were designed and written by Franz Josef Och.

About GIZA++

The program includes the following extensions to GIZA:

Model 4;
Model 5;
Alignment models depending on word classes (software for producing word classes is available as a separate download);
Implements the HMM alignment model: Baum-Welch training, Forward-Backward algorithm, empty word, dependency on word classes, transfer to fertility models, …;
Includes variants of Model 3 and Model 4 that allow the training of the parameter p_0;
Various smoothing techniques for fertility and distortion/alignment parameters;
Significantly more efficient training of the fertility models;
Correct implementation of pegging as described in (Brown et al. 1993), together with a series of heuristics that make pegging sufficiently efficient.
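
To make the Forward-Backward step of the HMM alignment model more concrete, here is a minimal sketch in Python. It is not GIZA++ code and does not reflect how GIZA++ is implemented. It assumes a toy setting in which the hidden states are source word positions and the observations are target words, and it leaves out the empty word, word classes and numerical scaling. The function name `forward_backward` and all probability values are hypothetical, chosen only for illustration.

```python
def forward_backward(init, trans, emit):
    """Posterior state probabilities for one observation sequence.

    init[i]     : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[t][i]  : probability of the observation at time t given state i
    """
    T, N = len(emit), len(init)

    # Forward pass: alpha[t][i] = P(observations 0..t, state at t is i)
    alpha = [[0.0] * N for _ in range(T)]
    for i in range(N):
        alpha[0][i] = init[i] * emit[0][i]
    for t in range(1, T):
        for j in range(N):
            alpha[t][j] = emit[t][j] * sum(
                alpha[t - 1][i] * trans[i][j] for i in range(N)
            )

    # Backward pass: beta[t][i] = P(observations t+1..T-1 | state at t is i)
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(
                trans[i][j] * emit[t + 1][j] * beta[t + 1][j] for j in range(N)
            )

    # Total probability of the observation sequence
    likelihood = sum(alpha[T - 1])

    # gamma[t][i] = P(state at t is i | all observations)
    gamma = [
        [alpha[t][i] * beta[t][i] / likelihood for i in range(N)]
        for t in range(T)
    ]
    return gamma, likelihood


# Toy example (made-up numbers): source "the house", target "das Haus".
init = [0.5, 0.5]                  # start at source position 0 or 1
trans = [[0.3, 0.7], [0.7, 0.3]]   # jump probabilities between positions
emit = [[0.8, 0.1],                # P("das" | "the"), P("das" | "house")
        [0.2, 0.9]]                # P("Haus" | "the"), P("Haus" | "house")

gamma, likelihood = forward_backward(init, trans, emit)
for t, row in enumerate(gamma):
    print(t, [round(p, 3) for p in row])
```

In Baum-Welch training, posteriors of this kind serve as the expected counts from which the model parameters are re-estimated in each iteration.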

