Combining Evidence in Cognate Identification
Cognates are words of the same origin that belong to distinct languages.
The problem of automatic identification of cognates
arises in language reconstruction and bitext-related tasks.
Evidence of cognation may come from various information sources, such as
phonetic similarity, semantic similarity,
and recurrent sound correspondences.
I discuss ways of defining
the measures of the various types of similarity
and propose a method of combining them into
an integrated cognate identification program.
The new method requires no manual parameter tuning
and performs well when tested
on the Indo-European and Algonquian lexical data.