Evaluation of Several Phonetic Similarity Algorithms
on the Task of Cognate Identification
We investigate the problem of measuring phonetic similarity,
focusing on the identification of cognates, words of the same origin in
different languages.
We compare representatives of two principal approaches
to computing phonetic similarity: manually designed metrics and
learning algorithms.
In particular, we consider a stochastic transducer, a Pair HMM,
several DBN models, and two constructed schemes.
We test these approaches on the task of identifying cognates among
Indo-European languages, in both supervised and unsupervised settings.
Our results suggest that the averaged context DBN model and the Pair HMM
achieve the highest accuracy given a large training set of positive
examples.