Cognates and Word Alignment in Bitexts
We evaluate several orthographic word similarity measures
in the context of bitext word alignment.
We investigate the relationship between
the length of the words and the length of their longest common subsequence.
We present an alternative to
the longest common subsequence ratio (LCSR),
a widely used orthographic word similarity measure.
Experiments involving identification of cognates in bitexts
suggest that the alternative method outperforms LCSR.
Our results also indicate that
alignment links can be used as a substitute for cognates
for the purpose of evaluating word similarity measures.
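
For readers unfamiliar with the baseline measure, the following is a minimal sketch of LCSR under its usual definition: the length of the longest common subsequence of two words divided by the length of the longer word. The function names lcs_length and lcsr are illustrative assumptions, not taken from the paper, and the sketch covers only the LCSR baseline, not the alternative measure that the paper proposes.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    # prev[j] holds the LCS length of the prefix of a processed so far and b[:j].
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Matching characters extend the LCS of the two shorter prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise keep the better result from dropping one character.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsr(a: str, b: str) -> float:
    """Longest common subsequence ratio: LCS length over the longer word's length."""
    if not a and not b:
        # Assumption: two empty strings are given similarity 0.0 to avoid dividing by zero.
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


if __name__ == "__main__":
    print(lcsr("colour", "couleur"))
```

For the pair "colour" and "couleur", the longest common subsequence is "colur" (length 5) and the longer word has 7 letters, so the ratio is 5/7.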