MGIZA++: a multi-threaded word alignment tool based on GIZA++
Project Website: http://www.kyloo.net/software/doku.php/mgiza:overview
Github Link: https://github.com/moses-smt/mgiza
MGIZA++ is a multi-threaded word alignment tool based on GIZA++. It extends GIZA++ in multiple ways:
MGIZA++ makes efficient use of multi-core platforms. A quad-core machine typically achieves about a three-fold speedup over single-threaded GIZA++.
By eliminating duplicated tables, MGIZA++ uses considerably less memory than GIZA++.
MGIZA++ can resume training from any stage. For example, you can reuse previously trained models and continue training directly from IBM Model 4 instead of starting again from Model 1.
Integrated with Chaski
MGIZA++ can be integrated into Chaski and run on clusters, which gives an even larger speedup.
Native Windows support
MGIZA++ can now be compiled in Visual Studio, providing native MS Windows support. The latest version is, however, not stable when compiled as a 64-bit binary.
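To put the quad-core figure above in perspective, here is a minimal sketch that applies Amdahl's law to the reported three-fold speedup. Amdahl's law is an assumption introduced for illustration, not a model taken from the MGIZA++ documentation, and the function names are hypothetical. Real scaling also depends on I/O, memory bandwidth, and synchronization overhead, so treat the numbers as a rough guide only.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Speedup predicted by Amdahl's law for a given parallel fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def implied_parallel_fraction(speedup: float, workers: int) -> float:
    """Solve S = 1 / ((1 - p) + p / n) for p, given S and n."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / workers)


if __name__ == "__main__":
    # Figure from the text: about 3x on 4 cores (assumed here to be exact).
    p = implied_parallel_fraction(3.0, 4)
    print(f"implied parallel fraction: {p:.3f}")
    for n in (1, 2, 4, 8, 16):
        print(f"{n:2d} threads -> {amdahl_speedup(p, n):.2f}x")
```

Under this assumption, a three-fold speedup on four cores corresponds to roughly 8/9 of the work running in parallel, which would cap the speedup on a single machine at about nine-fold no matter how many threads are added.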
If MGIZA++ helps you, please cite the following paper in addition to the GIZA++ one:
Qin Gao, Stephan Vogel, “Parallel Implementations of Word Alignment Tool”, Software Engineering, Testing, and Quality Assurance for Natural Language Processing, pp. 49-57, June 2008.