Sentence alignment is a long-standing but still active problem, and it is very important for machine translation.
An influential family of early methods is based on sentence length, measured in words or characters:
1) Peter F. Brown, Jennifer C. Lai, and Robert L. Mercer (1991). In: Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics (ACL)
2) William A. Gale and Kenneth Ward Church (1991). In: Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics (ACL)
3) William A. Gale and Kenneth Ward Church (1993). In: Computational Linguistics
4) Kenneth Ward Church (1993). In: Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics (ACL)
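The length-based idea behind the papers above can be sketched as a small dynamic program. This is a simplified illustration in the spirit of Gale and Church, not their actual program: lengths are counted in characters, the function names `match_cost` and `align` are made up for this example, and the constants (alignment-type priors, `C`, `S2`) are the values commonly quoted from their paper and should be treated as assumptions.

```python
import math

# Prior probability of each alignment type (source count, target count).
# Assumed values, commonly quoted from Gale and Church (1993).
PRIORS = {(1, 1): 0.89, (1, 0): 0.0099, (0, 1): 0.0099,
          (2, 1): 0.089, (1, 2): 0.089, (2, 2): 0.011}
C = 1.0    # assumed expected number of target characters per source character
S2 = 6.8   # assumed variance of that ratio per character


def match_cost(len_src, len_tgt):
    """Return -log of the probability that two spans with these lengths match."""
    if len_src == 0 and len_tgt == 0:
        return 0.0
    mean = (len_src + len_tgt / C) / 2.0
    delta = (len_tgt - len_src * C) / math.sqrt(mean * S2)
    # Two-sided tail probability of a standard normal variable.
    tail = math.erfc(abs(delta) / math.sqrt(2.0))
    return -math.log(max(tail, 1e-300))


def align(src, tgt):
    """Align two lists of sentences; return a list of (src_indices, tgt_indices)."""
    ls = [len(s) for s in src]
    lt = [len(t) for t in tgt]
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            # Try every alignment type that starts at position (i, j).
            for (di, dj), prior in PRIORS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                total = (cost[i][j]
                         + match_cost(sum(ls[i:ni]), sum(lt[j:nj]))
                         - math.log(prior))
                if total < cost[ni][nj]:
                    cost[ni][nj] = total
                    back[ni][nj] = (i, j)
    # Walk the back-pointers from the end to recover the best path.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads


if __name__ == "__main__":
    src = ["Hello world.", "How are you today, my friend?", "Fine."]
    tgt = ["Bonjour le monde.", "Comment vas-tu aujourd'hui, mon ami ?", "Bien."]
    for src_ids, tgt_ids in align(src, tgt):
        print(src_ids, tgt_ids)
```

Each returned pair lists the source and target sentence indices grouped into one alignment unit, so a 2-1 merge shows up as two source indices paired with one target index. No lexical information is used, which is why the lexical methods listed next tend to be more robust on noisy text.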
A classic and still useful method that also uses lexical information:
5) Robert C. Moore (2002). In: Machine Translation: From Research to Real Users, 5th Conference of the Association for Machine Translation in the Americas, AMTA 2002, Tiburon, CA, USA, October 6-12, 2002, Proceedings
Related Perl code:
6) Anil Kumar Singh and Samar Husain (2005). In: Proceedings of the ACL Workshop on Building and Using Parallel Texts
7) Bleualign:
The algorithm is described in:
Rico Sennrich and Martin Volk (2010). In: Proceedings of AMTA 2010, Denver, Colorado.
Rico Sennrich and Martin Volk (2011). In: NODALIDA 2011, Nordic Conference of Computational Linguistics, Riga.
8) hunalign
D. Varga, L. Németh, P. Halácsy, A. Kornai, V. Trón, V. Nagy (2005). In: Proceedings of RANLP 2005, pages 590-596.
https://github.com/danielvarga/hunalign
https://github.com/bitextor/hunalign
9) yasa
yasa is a program that aligns two translations of a text sentence by sentence in order to produce a bitext.
github:
10) mALIGNa: Bilingual sentence aligner
11) SMT-LowRec
This repository is for the following paper:
Enhancing Statistical Machine Translation for Low-Resource Languages Using Semantic Similarity
The repository includes:
Corpora: bilingual training, tuning, and test sets for the language pairs Japanese-Vietnamese, Indonesian-Vietnamese, Malay-Vietnamese, and Filipino-Vietnamese.
Sentence alignment: the Java implementation of [Moore, 2002] for sentence alignment; extending word alignment by word similarity using word2vec.
Pivot translation: the Java implementation of [Wu and Wang, 2007].
12)
13)
14) LF Aligner
LF Aligner helps translators create translation memories from texts and their translations. It relies on Hunalign for automatic sentence pairing. Input formats: txt, doc, docx, rtf, pdf, html. Output formats: tab-delimited txt, TMX, and xls. It also has web features.
Ref: