Computing Word Similarity and Identifying Cognates with Pair Hidden Markov Models
We present a system for computing similarity between
pairs of words. Our system is based on Pair Hidden
Markov Models, a variation on Hidden Markov Models that
has been used successfully for the alignment of
biological sequences. The parameters of the model are
automatically learned from training data that consists
of word pairs known to be similar. Our tests
focus on the identification of cognates: words of
common origin in related languages. The results show
that our system outperforms previously proposed
techniques.
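The abstract does not spell out the model, so what follows is only a minimal sketch. It assumes the standard three-state Pair HMM used for biological sequence alignment: a match state M that emits an aligned pair of symbols, and two gap states X and Y that each emit a symbol from one word only. The forward algorithm sums the probability of every alignment of the two words. The names `forward_probability`, `match_emission` and `gap_emission` and the values `DELTA`, `EPSILON` and `TAU` are hypothetical. They are toy numbers, not parameters learned from cognate pairs.

```python
import math

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

# Hypothetical toy parameters (assumed, NOT learned from training data).
DELTA = 0.1    # M -> X or M -> Y: open a gap
EPSILON = 0.3  # X -> X or Y -> Y: extend a gap
TAU = 0.05     # any state -> End


def match_emission(a: str, b: str) -> float:
    """p(a, b): probability that M emits the aligned pair (a, b).
    Toy table: identical letters share 0.6 of the mass, all other pairs 0.4."""
    n = len(ALPHABET)
    if a == b:
        return 0.6 / n
    return 0.4 / (n * (n - 1))


def gap_emission(a: str) -> float:
    """q(a): probability that a gap state emits symbol a (uniform here)."""
    return 1.0 / len(ALPHABET)


def forward_probability(x: str, y: str) -> float:
    """P(x, y) summed over all alignments, via the Pair HMM forward algorithm."""
    n, m = len(x), len(y)
    fM = [[0.0] * (m + 1) for _ in range(n + 1)]
    fX = [[0.0] * (m + 1) for _ in range(n + 1)]
    fY = [[0.0] * (m + 1) for _ in range(n + 1)]
    fM[0][0] = 1.0  # the Begin state is treated like M
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0:
                # Align x[i-1] with y[j-1] in the match state.
                fM[i][j] = match_emission(x[i - 1], y[j - 1]) * (
                    (1 - 2 * DELTA - TAU) * fM[i - 1][j - 1]
                    + (1 - EPSILON - TAU) * (fX[i - 1][j - 1] + fY[i - 1][j - 1])
                )
            if i > 0:
                # Emit x[i-1] against a gap in y.
                fX[i][j] = gap_emission(x[i - 1]) * (
                    DELTA * fM[i - 1][j] + EPSILON * fX[i - 1][j]
                )
            if j > 0:
                # Emit y[j-1] against a gap in x.
                fY[i][j] = gap_emission(y[j - 1]) * (
                    DELTA * fM[i][j - 1] + EPSILON * fY[i][j - 1]
                )
    return TAU * (fM[n][m] + fX[n][m] + fY[n][m])


if __name__ == "__main__":
    for a, b in [("night", "nacht"), ("night", "bplvw")]:
        print(a, b, math.log(forward_probability(a, b)))
```

Under these toy parameters, "night"/"nacht" receives a higher probability than "night"/"bplvw". The two pairs have the same lengths, and every alignment scores at least as well when letters coincide. Raw joint probabilities shrink as words get longer, so pairs of different lengths would need some normalization, such as a log-odds ratio against a null model, before they could be compared. Whether and how the described system normalizes is not stated here.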