Towards cultural preservation
Machine Translation for Indigenous Language Preservation
Indigenous languages in Colombia face the risk of extinction due to the decline in native speakers and lack of digital resources. This project leverages AI-driven machine translation models tailored for low-resource languages, aiming to create digital tools that support linguistic preservation. Additionally, we are actively working on translations to expand the available datasets for Indigenous language machine translation, ensuring better model performance and broader linguistic coverage.
Project Timeline
2023
The project began as an academic initiative in the NLP course of the Master’s in Systems Engineering at Universidad de los Andes. Our research focused on translation models for low-resource languages, specifically studying Wayuunaiki and Ika. The results were presented at WSDM.
2024
The project continued with data collection for Wayuunaiki, Ika, Inga, and Nasa Yuwe. The results were presented at the 4th Workshop on NLP for Indigenous Languages of the Americas. We also started working with a translator of Wayuunaiki to increase the quality and accuracy of our datasets.
2025
Our datasets were included in AmericasNLP 2025, marking the first time that Colombian Indigenous languages were represented in the initiative. Additionally, we have started working with a Nasa Yuwe translator and are actively seeking new collaborations to further expand and improve our work.

Our Methodology
We adapt models pretrained on high-resource languages to build translators for low-resource Indigenous languages. To do this, we collect high-quality Spanish translations of short phrases, working closely with native speakers and expert translators.
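As a rough illustration of the data side of this workflow, the sketch below shows one way collected phrase pairs could be cleaned, deduplicated, and split before training. It is not our actual pipeline: the function names (`normalize`, `prepare_pairs`), the 20% development split, and the placeholder strings are all assumptions made for the example.

```python
import random
import unicodedata


def normalize(text: str) -> str:
    """Apply Unicode NFC normalization and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def prepare_pairs(raw_pairs, dev_fraction=0.2, seed=0):
    """Clean, deduplicate, and split (source phrase, Spanish phrase) pairs.

    Hypothetical helper: returns (train_pairs, dev_pairs).
    """
    seen = set()
    cleaned = []
    for source, spanish in raw_pairs:
        pair = (normalize(source), normalize(spanish))
        # Skip pairs with an empty side and exact duplicates.
        if all(pair) and pair not in seen:
            seen.add(pair)
            cleaned.append(pair)
    # A seeded shuffle keeps the split reproducible across runs.
    random.Random(seed).shuffle(cleaned)
    n_dev = max(1, int(len(cleaned) * dev_fraction)) if cleaned else 0
    return cleaned[n_dev:], cleaned[:n_dev]


if __name__ == "__main__":
    # Placeholder strings stand in for real phrases; they are not actual data.
    raw = [
        ("source phrase  1", "frase 1"),
        ("source phrase 1", "frase 1"),  # duplicate after whitespace cleanup
        ("", "frase sin origen"),        # dropped: empty source side
        ("source phrase 2", "frase 2"),
        ("source phrase 3", "frase 3"),
    ]
    train, dev = prepare_pairs(raw)
    print(len(train), "training pairs,", len(dev), "development pairs")
```

Normalizing to NFC before deduplication matters for orthographies that use diacritics, since the same phrase can otherwise be stored under different byte sequences and counted twice.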
Our Languages
Colombia is home to a rich linguistic heritage, including Wayuunaiki, Nasa Yuwe, Inga, and Arhuaco (Ika). We collaborate with native speakers to support these languages:
- Wayuunaiki – Spoken by the Wayuu people in La Guajira, it is the most widely spoken Indigenous language in Colombia.
- Nasa Yuwe – Used by the Nasa community in Cauca and neighboring regions, it is generally considered a language isolate, with its own distinctive grammar.
- Inga – A Quechuan language spoken in Putumayo and Nariño, preserving Incan linguistic traditions.
- Arhuaco (Ika) – A Chibchan language spoken in the Sierra Nevada de Santa Marta, known for its complex structure.
By working with native speakers, we help preserve and strengthen these languages through high-quality translations and language technology.

Our team of researchers
Translators

Antonio José Ipuana
Wayuunaiki Translator
Wayuunaiki translation services

Manuel Muyuy
Inga Translator
Universidad del Cauca