Towards cultural preservation

Machine Translation for Indigenous Language Preservation

Indigenous languages in Colombia face the risk of extinction due to the decline in native speakers and lack of digital resources. This project leverages AI-driven machine translation models tailored for low-resource languages, aiming to create digital tools that support linguistic preservation. Additionally, we are actively working on translations to expand the available datasets for Indigenous language machine translation, ensuring better model performance and broader linguistic coverage.


Project Timeline

2023

The project began as an academic initiative in the NLP course of the Master’s in Systems Engineering at Universidad de los Andes. Our research focused on translation models for low-resource languages, specifically studying Wayuunaiki and Ika. The results were presented at WSDM.

2024

The project continued with data collection for Wayuunaiki, Ika, Inga, and Nasa Yuwe. The results were presented at the 4th Workshop on NLP for Indigenous Languages of the Americas. We also started working with a translator of Wayuunaiki to increase the quality and accuracy of our datasets.

2025

Our datasets were included in AmericasNLP 2025, marking the first time that Colombian Indigenous languages were represented in the initiative. Additionally, we have started working with a Nasa Yuwe translator and are actively seeking new collaborations to further expand and improve our work.


Translation Model

Our Methodology

We leverage models trained on high-resource languages to develop translators for low-resource Indigenous languages. To achieve this, we gather high-quality translations of short phrases into Spanish, working closely with native speakers and expert translators.
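As a rough illustration of the data side of this methodology, the sketch below shows one way phrase pairs could be cleaned and split before training. It is not the project's actual pipeline: the `PhrasePair` type, the function names, the normalization rules, and the 20% test fraction are all assumptions made for the example, and the phrases are placeholders rather than real translations.

```python
import random
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class PhrasePair:
    """Hypothetical container for one aligned phrase pair."""
    source: str  # phrase in the Indigenous language
    target: str  # Spanish translation


def normalize(text: str) -> str:
    """Apply Unicode NFC normalization and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def build_corpus(raw_pairs):
    """Clean pairs, drop pairs with an empty side, and remove exact duplicates."""
    seen = set()
    corpus = []
    for src, tgt in raw_pairs:
        pair = PhrasePair(normalize(src), normalize(tgt))
        if pair.source and pair.target and pair not in seen:
            seen.add(pair)
            corpus.append(pair)
    return corpus


def train_test_split(corpus, test_fraction=0.2, seed=0):
    """Shuffle deterministically and hold out a fraction (at least one pair) for testing."""
    shuffled = corpus[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder data only; real pairs would come from native speakers and translators.
raw = [
    ("  <source phrase 1>  ", "<frase en español 1>"),
    ("<source phrase 1>", "<frase en español 1>"),  # duplicate after cleaning
    ("", "<frase en español 2>"),                   # dropped: empty source side
    ("<source phrase 3>", "<frase en español 3>"),
]
corpus = build_corpus(raw)
train, test = train_test_split(corpus)
```

The resulting `train` and `test` lists could then feed whatever fine-tuning setup is used on top of a pretrained high-resource model. Holding out a fixed, seeded test set keeps evaluation comparable as new translations are added.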


Our Languages

Colombia is home to a rich linguistic heritage, including Wayuunaiki, Nasa Yuwe, Inga, and Arhuaco. We collaborate with native speakers to support these languages:

  • Wayuunaiki – Spoken by the Wayuu people in La Guajira, it is the most widely spoken Indigenous language in Colombia.
  • Nasa Yuwe – Used by the Nasa community in Cauca and neighboring regions, it is considered a language isolate with distinctive grammar.
  • Inga – A Quechuan language spoken in Putumayo and Nariño, preserving Incan linguistic traditions.
  • Arhuaco (Ika) – A Chibchan language spoken in the Sierra Nevada de Santa Marta, known for its complex structure.

By working with native speakers, we help preserve and strengthen these languages through high-quality translations and language technology.

Our team of researchers

Rubén Manrique

Lead Researcher

rf.manrique@uniandes.edu.co

Juan Camilo Prieto

Researcher

jc.prietoa@uniandes.edu.co

Melissa Robles

Researcher

mv.robles@uniandes.edu.co

Translators

Antonio José Ipuana

Wayuunaiki Translator

Wayuunaiki translation services

Manuel Muyuy

Inga Translator

Universidad del Cauca