Identifying Complex Sound Correspondences in Bilingual Wordlists
The determination of recurrent sound correspondences between languages
is crucial for the identification of cognates,
which are often employed in statistical machine translation
for sentence and word alignment.
In this paper,
an algorithm originally designed for extracting non-compositional compounds from bitexts
is shown to be capable of
identifying complex sound correspondences in bilingual wordlists.
In an experimental evaluation, a C++ implementation of the algorithm
achieves approximately 90\% recall and precision on authentic language data.
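To illustrate what a "recurrent sound correspondence" is, the following minimal C++ sketch counts how often segments co-occur across word pairs in a wordlist and scores each pair of segments with the Dice coefficient. This is not the algorithm evaluated in the paper. It handles only simple one-to-one correspondences, not the complex ones the paper targets. It also assumes that each word is a string of one-character segments and that word pairs of equal length are already aligned position by position. The names `countCorrespondences` and `dice`, and the toy word pairs in the usage note, are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy representation: a word is a string of one-character
// segments; a word pair is assumed to be aligned position by position.
using WordPair = std::pair<std::string, std::string>;
using Segments = std::pair<char, char>;

struct CorrespondenceCounts {
    std::map<Segments, int> joint;  // times segment x is aligned with segment y
    std::map<char, int> source;     // times x occurs in an aligned position
    std::map<char, int> target;     // times y occurs in an aligned position
};

// Count one-to-one segment correspondences. Pairs of unequal length are
// skipped because this sketch has no alignment step of its own.
CorrespondenceCounts countCorrespondences(const std::vector<WordPair>& list) {
    CorrespondenceCounts c;
    for (const auto& [a, b] : list) {
        if (a.size() != b.size()) continue;
        for (std::size_t i = 0; i < a.size(); ++i) {
            ++c.joint[{a[i], b[i]}];
            ++c.source[a[i]];
            ++c.target[b[i]];
        }
    }
    return c;
}

// Dice coefficient 2 * n(x, y) / (n(x) + n(y)): 1.0 means x and y are
// always aligned with each other; 0.0 means they never co-occur.
double dice(const CorrespondenceCounts& c, char x, char y) {
    auto j = c.joint.find({x, y});
    if (j == c.joint.end()) return 0.0;
    return 2.0 * j->second / (c.source.at(x) + c.target.at(y));
}
```

For the invented wordlist `{"pata","bada"}, {"tapa","daba"}, {"pat","pad"}`, `dice(c, 't', 'd')` is 1.0 and `dice(c, 'p', 'b')` is 0.8, because one `p` is aligned with `p` rather than `b`. A score like this only captures recurrence of single segments. Correspondences that involve several segments on one side are exactly what such a simple count misses, and they are what the approach described above is meant to capture.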