Identifying Cognates by Phonetic and Semantic Similarity
I present a method of identifying cognates in the vocabularies of
related languages.
I show that a measure of phonetic similarity based on multivalued features
performs better than "orthographic" measures, such as
the Longest Common Subsequence Ratio (LCSR) or Dice's coefficient.
I introduce a procedure for estimating semantic similarity of glosses
that employs keyword selection and WordNet.
Tests performed on vocabularies of four Algonquian languages
indicate that the method is capable of discovering on average
nearly 75% of cognates at 50% precision.
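
For reference, the two "orthographic" baselines named above can be sketched as follows. The abstract does not define them, so this sketch assumes the usual formulations: LCSR as the length of the longest common subsequence divided by the length of the longer word, and Dice's coefficient as twice the number of shared letter bigrams divided by the total number of bigrams in both words. The function names and the example word pair are illustrative only and are not taken from the test data.

```python
from collections import Counter


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsr(a: str, b: str) -> float:
    """LCS length divided by the length of the longer word (assumed definition)."""
    longer = max(len(a), len(b))
    if longer == 0:
        return 0.0  # convention chosen here for two empty strings
    return lcs_length(a, b) / longer


def dice(a: str, b: str) -> float:
    """Dice's coefficient over letter bigrams (assumed: bigrams as multisets)."""
    bigrams_a = Counter(a[i:i + 2] for i in range(len(a) - 1))
    bigrams_b = Counter(b[i:i + 2] for i in range(len(b) - 1))
    total = sum(bigrams_a.values()) + sum(bigrams_b.values())
    if total == 0:
        return 0.0  # neither word is long enough to contain a bigram
    shared = sum((bigrams_a & bigrams_b).values())
    return 2 * shared / total


if __name__ == "__main__":
    # Illustrative pair, not from the Algonquian vocabularies.
    print(lcsr("colour", "couleur"))
    print(dice("colour", "couleur"))
```

Both measures compare letters only for identity, so they cannot give partial credit to sounds that are similar but not identical; that is the gap a measure based on multivalued phonetic features is meant to close.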