MGIZA++: a multi-threaded word alignment tool based on GIZA++
Project Website:
GitHub Link:
Description
MGIZA++ is a multi-threaded word alignment tool based on GIZA++. It extends GIZA++ in multiple ways:
Multi-threading
MGIZA++ makes efficient use of multi-core platforms. On a quad-core machine it typically runs about three times faster than single-threaded GIZA++.
Memory optimization
By eliminating duplicated tables, MGIZA++ uses considerably less memory than GIZA++.
Resume training
MGIZA++ can resume training from any stage. For example, you may be able to reuse previously trained models and continue training directly from IBM Model 4 instead of training all the way from Model 1.
Integrated with Chaski
MGIZA++ can be integrated into Chaski and run on clusters, which gives an even larger speedup.
Native Windows support
MGIZA++ can now be compiled with Visual Studio, providing native MS Windows support. The latest version, however, is not stable when compiled as a 64-bit binary.
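To put the multi-threading figure in context, here is a back-of-the-envelope sketch using Amdahl's law. It is not taken from the MGIZA++ code or paper. It assumes the quoted three-fold speedup on four cores is explained entirely by a fixed serial portion of the run, and the function names `parallel_fraction` and `amdahl_speedup` are made up for this illustration.

```python
def parallel_fraction(speedup: float, workers: int) -> float:
    """Solve Amdahl's law, S = 1 / ((1 - p) + p / n), for the parallel fraction p."""
    if workers < 2 or speedup <= 0:
        raise ValueError("need at least 2 workers and a positive speedup")
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / workers)


def amdahl_speedup(p: float, workers: int) -> float:
    """Speedup predicted by Amdahl's law for parallel fraction p on n workers."""
    return 1.0 / ((1.0 - p) + p / workers)


# Assumption: 3x speedup observed on 4 cores, as quoted in the description.
p = parallel_fraction(3.0, 4)
print(f"implied parallel fraction: {p:.3f}")

# Extrapolation under the same assumption; real scaling may differ.
print(f"predicted speedup on 8 cores: {amdahl_speedup(p, 8):.2f}")
```

Under this simplified model, roughly 8/9 of the run is parallel work, and doubling the core count to eight would give about a 4.5-fold speedup rather than six-fold. Treat these numbers as a rough guide only, since memory bandwidth and I/O are not modeled.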
If MGIZA++ helps you, please cite the following paper in addition to the GIZA++ one:
Qin Gao, Stephan Vogel, “Parallel Implementations of Word Alignment Tool”, Software Engineering, Testing, and Quality Assurance for Natural Language Processing, pp. 49-57, June 2008.