Open Source Text Processing Project: MGIZA

MGIZA++: a multi-threaded word alignment tool based on GIZA++

Project Website:

Github Link:

Description

MGIZA++ is a multi-threaded word alignment tool based on GIZA++. It extends GIZA++ in multiple ways:

Multi-threading

MGIZA++ makes efficient use of multi-core platforms. On a quad-core machine it typically runs about three times faster than single-threaded GIZA++.
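To put the quad-core figure in perspective, here is a rough sketch that reads the three-fold speedup through Amdahl's law. This is an assumption made for illustration only: the MGIZA++ authors do not describe their speedup this way, and the function names below are hypothetical. Under that model, a 3x speedup on 4 cores implies that roughly 8/9 of the work runs in parallel, which in turn caps the gain from adding more cores to a single machine.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup predicted by Amdahl's law for a given parallel fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def implied_parallel_fraction(speedup: float, cores: int) -> float:
    """Invert Amdahl's law: S = 1 / ((1 - p) + p / n)  =>  p = (1 - 1/S) / (1 - 1/n)."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores)


# Figure quoted in the text: about 3x on a quad-core machine.
p = implied_parallel_fraction(3.0, 4)

if __name__ == "__main__":
    print(f"implied parallel fraction: {p:.3f}")
    # Extrapolation under the same (assumed) model, not a measured result.
    for n in (4, 8, 16):
        print(f"{n} cores -> predicted speedup {amdahl_speedup(p, n):.2f}x")
```

The extrapolated numbers are only as good as the model; real scaling depends on corpus size, I/O, and memory bandwidth.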

Memory optimization

By eliminating duplicated tables, MGIZA++ uses considerably less memory than GIZA++.

Resume training

MGIZA++ can resume training from any stage. For example, you can reuse previously trained models and continue training directly from IBM Model 4 instead of starting over from Model 1.

Integrated with Chaski

MGIZA++ can be integrated into Chaski and run on clusters, which gives an even larger speedup.

Native Windows support

MGIZA++ can now be compiled in Visual Studio, providing native MS Windows support. The latest version, however, is not stable when compiled as a 64-bit binary.

If MGIZA++ helps you, please cite the following paper in addition to the GIZA++ one:

Qin Gao, Stephan Vogel, “Parallel Implementations of Word Alignment Tool”, Software Engineering, Testing, and Quality Assurance for Natural Language Processing, pp. 49-57, June 2008.

