Multilingual Cognate Identification using Integer Linear Programming
The identification of cognates in natural languages is a crucial
part of automatic translation lexicon construction and other
multilingual lexical tasks. We present new methods for multilingual
cognate identification using the global inference framework of
Integer Linear Programming. Whereas previous approaches to cognate
identification have focused on pairs of languages, we
provide a methodology that directly forms sets of cognates
across groups of languages. We show improvements over simple
clustering techniques that do not inherently consider the
transitivity of cognate relations. Furthermore, we show that
formulations that jointly link cognates across groups of natural
languages achieve higher performance than traditional pairwise
approaches. We also describe applications of our technique to other
important problems in multilingual natural language processing.
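
The transitivity requirement mentioned above can be made concrete with a small worked formulation. The following is only a sketch of one common way to express such a constraint in an Integer Linear Program, not necessarily the formulation used in this work. The word set $W$, the pairwise scores $s_{ij}$ (assumed positive when words $i$ and $j$ look like cognates and negative otherwise), and the binary link variables $x_{ij}$ are all assumptions introduced for illustration.

```latex
\begin{align*}
\max_{x} \quad & \sum_{\substack{i, j \in W \\ i < j}} s_{ij}\, x_{ij} \\
\text{s.t.} \quad & x_{ij} + x_{jk} - x_{ik} \le 1
  && \text{for all distinct } i, j, k \in W, \\
& x_{ij} = x_{ji}, \quad x_{ij} \in \{0, 1\}
  && \text{for all distinct } i, j \in W.
\end{align*}
```

Under these assumptions, $x_{ij} = 1$ means that words $i$ and $j$ are placed in the same cognate set. The first constraint forces $x_{ik} = 1$ whenever $x_{ij} = 1$ and $x_{jk} = 1$, so any feasible solution partitions $W$ into sets of mutually linked words, which is the property that independent pairwise decisions do not guarantee.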