Computing Word Similarity and Identifying Cognates with Pair Hidden Markov Models

We present a system for computing similarity between pairs of words. Our system is based on Pair Hidden Markov Models, a variation on Hidden Markov Models that has been used successfully for the alignment of biological sequences. The parameters of the model are automatically learned from training data that consists of word pairs known to be similar. Our tests focus on the identification of cognates --- words of common origin in related languages. The results show that our system outperforms previously proposed techniques.