The Totonac-Tepehua Comparative Dictionary Project

The Totonac-Tepehua Comparative Dictionary Project is an attempt to automatically generate a dictionary of cognate sets for a relatively poorly understood language family. The project is part of the larger Upper Necaxa Totonac Field Project, which is aimed at documenting and revitalising an endangered and understudied Indigenous language of Mexico.

The project is still in its early stages, but it has already produced fairly accurate cognate identification tools that can be applied successfully to a wide variety of language families with minimal expert evaluation. It has also revealed a few surprising, linguistically interesting phoneme correspondences across languages. Still, the current cognate dictionary will likely undergo several revisions as the methods for determining cognate sets develop.

The most recently generated dictionary can be found here.

An ACL workshop paper describing our work can be found here.

If you use any of this data in your work, please cite:

Grzegorz Kondrak, David Beck, and Philip Dilts. Creating a Comparative Dictionary of Totonac-Tepehua. Proceedings of the ACL Workshop on Computing and Historical Phonology (Ninth Meeting of ACL Special Interest Group for Computational Morphology and Phonology), pp. 134-141, Prague, Czech Republic, June 2007.
[Abstract (HTML)] [PostScript] [PDF]

Please send us an e-mail if you find the data useful. We'll be happy to help if you need assistance.

The following is the only published dictionary used for the project:

Herman P. Aschmann. 1983. Vocabulario totonaco de la Sierra. Summer Institute of Linguistics, Mexico.