Speaking Every Child’s Language: Ensuring authentic translation of LEVANTE tasks across countries

Sachin Allums, Serena Lee, Dr. Amy A. Lightbody, Dr. Fionnuala O’Reilly, Dr. Michael C. Frank

October 27, 2025

A global research framework is only as good as its translation. Authentically translating the text and audio needed for our tasks into different languages remains one of the ongoing challenges in LEVANTE. Although translation might seem straightforward, particularly with the rise of AI translation tools, it is far more nuanced in practice. Every translation that we use in LEVANTE goes through a multi-step review process to ensure that it is appropriate for the language variant, culture, and context at the research sites where it will be used.

Our translation process starts with automated “AI” tools

Across our tasks, dashboard, and surveys, we are translating more than 16,000 words. We use DeepL and Google Translate to generate baseline AI translations, which are reviewed first by expert translators and then by research sites. We then use ElevenLabs to generate audio, which goes through a similar human review process.

AI tools are only a starting point, not an end point

AI is a great way to begin the task of translating text. However, without manual checking, AI translations can sometimes be laughably bad. For example, the English word fan can refer to a person who adores a particular person or franchise (e.g., a sports fan) or to an object that cools people off (e.g., a ceiling fan). In Spanish, these two senses are translated as admirador and ventilador, respectively, and they mean very different things to a child. If the tool picks the wrong sense and a child is asked to pick the image of an admirador (an admirer), they might be confused to find that the target is a picture of a ceiling fan (a ventilador)!

To ensure the translated versions of our tasks are as accurate as possible, we work with professional translators to review all AI-translated content. We manage this process using Crowdin, a platform designed for multilingual translation. Crowdin lets translators view the exact task stimuli and text that participants see on their screens as they translate, something that is far more difficult in Excel or Google Sheets. Crowdin allows us to:

generate AI translations,
have our translators correct those translations for accuracy and adapt them to a specific regional dialect, and
have experts at our partner sites check to ensure that the translations stay true to the original intention of the measure and are developmentally appropriate.

This three-step pipeline ensures that multiple sets of eyes approve every single translation.

Keeping task difficulty consistent

During cross-cultural adaptation, a task’s original level of difficulty can shift. For example, in one of our vocabulary tasks, children were asked to identify an image showing suede, a word that is challenging for English- and Spanish-speaking children. To our surprise, German children identified it with ease. After consulting our German colleagues, we realized that the difference stemmed from the translation: suede translates to Wildleder (literally, wild leather). Since leather is Leder in German, children could use this as a clue to infer the meaning of Wildleder, even if they had never heard the word before. Because German often forms longer words by combining smaller ones, it can be difficult to find terms that do not advantage German-speaking children on vocabulary tasks. Involving language consultants in this process helps us distinguish between differences arising from lexical structure and those reflecting genuine cross-cultural variation.

Working towards cultural relevance

The content of each task should be culturally relevant to children around the world. This requires thoughtful consideration of which words or scenarios may be unfamiliar to children in specific regions. For example, playing or watching American football is not a common childhood experience in Colombia or Germany. Children in these countries might not even recognize what an American football looks like. We have adopted a strategy of designing scenarios that draw on broadly applicable experiences—for instance, using soccer, the most widely played sport in the world.

Grammatical differences between languages

One major challenge for our tasks is assessing language competence across languages that differ in their structures. In our Sentence Understanding task, for example, participants match a phrase such as they are jumping over the wall to a picture. In English, they is a gender-neutral pronoun, but in other languages, the plural pronoun changes depending on whether the group is all boys, all girls, or mixed. To avoid giving away the answer in those languages, we design stimuli so that each image within an item depicts either all boys or all girls.

In addition, some languages have structures that others lack. A competent speaker of Spanish or French, for example, needs to know more verb endings than an English speaker does. When we adapt our language measures (especially Sentence Understanding) to a new language, we work with expert collaborators and consultants to create a new construct map (a list of the important components a speaker needs to know) for each language. This work is tricky and ongoing. Often we need to collect validation data to ensure that a newly developed language measure is performing appropriately.

More to do

Although we have made significant progress toward ensuring our measures are internationally valid, further work remains to be done on translation. Our immediate focus is on developing translations with our new international sites and on refining our Vocabulary and Sentence Understanding assessments. By the end of 2025, we aim to have our tasks adapted into two additional languages—Dutch and Canadian French. We also plan to adapt them for regional dialects such as Argentinian Spanish, Swiss German, and Ghanaian English. This effort will require extensive collaboration to ensure that our tasks preserve the same context, difficulty, and relevance across cultures. By prioritizing these factors, we aim to keep our measures accurate, fair, and cross-culturally robust as we move towards a deeper understanding of children’s developmental variation.

To follow updates on our translation efforts and processes, visit the internationalization section of our Researcher Site.
