Getting started with Sentence Alignment

Sentence alignment is a long-studied problem that still attracts new work, and it is very important for machine translation.

Influential early methods are based on sentence length, measured in words or in characters:

1) Peter F. Brown, Jennifer C. Lai, and Robert L. Mercer (1991): , Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics (ACL)
2) William A. Gale and Kenneth Ward Church (1991): , Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics (ACL)
3) William A. Gale and Kenneth Ward Church (1993): , Computational Linguistics
4) Kenneth Ward Church (1993): , Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics (ACL)
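
To make the length-based idea concrete, here is a minimal sketch in Python in the spirit of the papers listed above. It is not the authors' code. It measures length in characters, assumes that the length difference of true translations is roughly normally distributed, and searches for the cheapest sequence of "beads" (1-1, 1-0, 0-1, 2-1, 1-2, 2-2) with dynamic programming. The names `PRIORS`, `C`, `S2`, `match_cost`, and `align` are hypothetical, and the numeric parameters are illustrative assumptions rather than tuned values.

```python
import math

# Illustrative bead priors (assumed values):
# (number of source sentences, number of target sentences) -> probability.
PRIORS = {(1, 1): 0.89, (1, 0): 0.0099, (0, 1): 0.0099,
          (2, 1): 0.089, (1, 2): 0.089, (2, 2): 0.011}
C = 1.0   # assumed expected number of target characters per source character
S2 = 6.8  # assumed variance of the length difference per character


def match_cost(len_src, len_tgt):
    """Negative log probability of the observed length difference."""
    if len_src == 0 and len_tgt == 0:
        return 0.0
    mean = (len_src + len_tgt / C) / 2.0
    delta = (len_tgt - len_src * C) / math.sqrt(mean * S2)
    # Two-sided tail of a standard normal; the floor avoids log(0).
    tail = max(math.erfc(abs(delta) / math.sqrt(2.0)), 1e-300)
    return -math.log(tail)


def align(src, tgt):
    """Return the cheapest list of (source indices, target indices) beads."""
    n, m = len(src), len(tgt)
    src_len = [len(s) for s in src]
    tgt_len = [len(t) for t in tgt]
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), prior in PRIORS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                step = match_cost(sum(src_len[i:ni]), sum(tgt_len[j:nj]))
                total = cost[i][j] + step - math.log(prior)
                if total < cost[ni][nj]:
                    cost[ni][nj] = total
                    back[ni][nj] = (i, j)
    # Walk the back-pointers from the end to recover the beads in order.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads


source = ["aaaa aaaa aaaa aaaa.", "bbbb bbbb bbbb bbbb.", "cc."]
target = ["aaaa aaaa aaaa aaaa, bbbb bbbb bbbb bbbb.", "cc."]
print(align(source, target))  # -> [([0, 1], [0]), ([2], [1])]
```

In the toy example the two source sentences are merged into one target sentence, so the aligner picks a 2-1 bead followed by a 1-1 bead. A purely length-based score like this knows nothing about the words themselves, which is why the later tools in this list add lexical information.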

Classical and useful methods that also use lexical information:

5) Robert C. Moore (2002): , Machine Translation: From Research to Real Users, 5th Conference of the Association for Machine Translation in the Americas, AMTA 2002 Tiburon, CA, USA, October 6-12, 2002, Proceedings
Related Perl Code:

6) Singh, Anil Kumar and Husain, Samar (2005): , Proceedings of the ACL Workshop on Building and Using Parallel Texts

7) Bleualign:

The algorithm is described in:

Rico Sennrich, Martin Volk (2010): In: Proceedings of AMTA 2010, Denver, Colorado.

Rico Sennrich; Martin Volk (2011): . In: NODALIDA 2011, Nordic Conference of Computational Linguistics, Riga.

8) hunalign

D. Varga, L. Németh, P. Halácsy, A. Kornai, V. Trón, V. Nagy (2005).

In Proceedings of the RANLP 2005, pages 590-596.

https://github.com/danielvarga/hunalign
https://github.com/bitextor/hunalign

9) yasa

yasa is a program that aligns two translations of a text sentence by sentence in order to produce a bitext.

github:

10) mALIGNa: Bilingual sentence aligner

11) SMT-LowRec


This repository is for the following paper:

Enhancing Statistical Machine Translation For Low-Resource Languages Using Semantic Similarity

The repository includes:

Corpora
Bilingual corpora: training, tuning, and test sets for language pairs: Japanese-Vietnamese, Indonesian-Vietnamese, Malay-Vietnamese, Filipino-Vietnamese.
Sentence alignment
The Java implementation of [Moore, 2002] for sentence alignment.
Extending word alignment by word similarity using word2vec
Pivot translation
The Java implementation of [Wu and Wang, 2007]

12)

13)

14) LF Aligner
LF Aligner helps translators create translation memories from texts and their translations. It relies on Hunalign for automatic sentence pairing. Input: txt, doc, docx, rtf, pdf, html. Output: tab delimited txt, TMX and xls. With web features.
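
Tab-delimited output of this kind is easy to post-process. The following is a minimal Python sketch, assuming that the first two columns of each line hold the source and target segments and that any further columns can be ignored. That column layout is an assumption, so check it against the actual output file. The function name `read_aligned_pairs` is hypothetical.

```python
import csv
import io


def read_aligned_pairs(handle):
    """Yield (source, target) pairs from tab-delimited lines.

    Assumes column 1 is the source segment and column 2 the target segment.
    Rows with a missing or empty side are skipped.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) >= 2 and row[0].strip() and row[1].strip():
            yield row[0].strip(), row[1].strip()


# In-memory sample standing in for an aligner's output file.
sample = io.StringIO("Hello.\tBonjour.\tdemo\n\tOrphan line.\nBye.\tAu revoir.\n")
pairs = list(read_aligned_pairs(sample))
print(pairs)  # -> [('Hello.', 'Bonjour.'), ('Bye.', 'Au revoir.')]
```

For a real file, pass an open file object instead of the `io.StringIO` sample, for example `open(path, encoding="utf-8", newline="")`.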


