EMT Blog
  • News article
  • 26 May 2025
  • Directorate-General for Translation
  • 10 min read

From data to discrimination: when words carry prejudice

By Meryem Murat, Ambre Desforges and Raphaëlle Duhamel, students studying for a Masters in Applied Foreign Languages at Grenoble Alpes University

[Header image: drawn picture of a robot with the words 'Oops, 404 error' above it. Credit: Freepik]

In a world where international communication and accessibility have become essential, particularly for economic and social reasons, neural machine translation (NMT) is omnipresent. Its development continues to have a major impact on society and culture, facilitating exchanges but also raising several issues.

Indeed, NMT relies on artificial intelligence models, in particular deep neural networks, to analyze source texts and generate translations that take the wider context into account. Popular tools such as Google Translate, DeepL, Bing Translate, Reverso, Gemini or ChatGPT, to name a few, use these technologies to provide fast and accessible translations.

Nevertheless, these systems are not free from bias, which can reflect stereotypes present in the training data. These biases can result in injustice, marginalization and exclusion in professional or cultural contexts.

Origins and challenges

The output generated by NMT systems can exhibit a variety of biases, including gender, cultural and socio-economic biases. Such models rely on algorithms and extensive corpora of training data to produce translations. The observed biases may therefore stem from prejudices already present in the data used, or from disparities between dominant data (e.g., English or other Western languages) and less-represented languages (e.g., Corsican or Gaelic).

These problems can also arise from the very limitations of neural models. The algorithms underlying NMT systems often “guess” gender or context from probabilities calculated over their training data: they can either memorize specific associations seen in that data (e.g., “nurse” is often translated as feminine) or attempt to generalize by adapting translations to a broader context.

Over-memorization leads to recurrent biases, while over-generalization can lead to mistranslations or clumsy sentences. It is, therefore, essential to remain vigilant when using NMT, as these systems can influence the translation of sensitive texts (related to gender, culture, religion, etc.) and reflect or reinforce prejudices already present in society, which can have problematic, or even dire, consequences.
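
To make the idea of “guessing from probabilities” more concrete, the short Python sketch below scores two gendered variants of the same English sentence with a general-purpose language model and reports which one the model considers more probable. The choice of model (GPT-2) and the scoring method are purely illustrative assumptions on our part, not a description of any particular NMT system, but the principle is the same: the variant seen more often in the training data receives the higher score.

```python
# Illustrative sketch: compare the probability a language model assigns to two
# gendered variants of the same sentence. Assumes the Hugging Face `transformers`
# and `torch` packages are installed; GPT-2 merely stands in for the decoder of
# a real NMT system.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def sentence_log_likelihood(text: str) -> float:
    """Total log-likelihood of the sentence under the model (higher = more probable)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean negative log-likelihood per predicted token
    return -loss.item() * (ids.size(1) - 1)

candidates = ["He is a nurse.", "She is a nurse."]
scores = {c: sentence_log_likelihood(c) for c in candidates}
for sentence, score in scores.items():
    print(f"{sentence}  log-likelihood = {score:.2f}")
print("Model prefers:", max(scores, key=scores.get))
```

Whichever variant appeared more often in the model’s training data will typically get the higher score, which is exactly how stereotypical associations end up in the output.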

Let’s take a closer look at the different types of bias that these systems generate.

The limits of neutrality: gender, culture, and minority languages

Although NMT has been shown to perform better on average than previous technologies (Junczys-Dowmunt, Dwojak & Hoang, 2016), NMT systems often have difficulty handling languages that are less gender-marked or that use neutral forms. As a result, they may associate genders with professions or roles based on stereotypes.

For example, translating Turkish, which is a gender-neutral language, can prove difficult. 

Screenshot of the translation of a sentence from Turkish into English in Google Translate

The sentence “O bir doktor ve o bir hemşire” (literally: “She/he is a doctor, and she/he is a nurse”) is translated as “He is a doctor, and she is a nurse,” which is problematic. The tool (here Google Translate) assigns a gender based on stereotypes. We can see that NMT “favors stereotypical assignments, reinforcing sexist prejudices in the translated text” (Wisniewski et al., 2022, p. 38). Problems of this kind are likely to distort the way certain groups (in this case, women) are represented in texts (representational harm[1]) and to result in lower-quality machine translation services for women (allocational harm[2]) (Crawford, 2017; Blodgett et al., 2020).
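
Readers who want to reproduce this kind of probe themselves do not need access to Google Translate. The sketch below uses a freely available open-source model; the model name (Helsinki-NLP/opus-mt-tr-en) is our own choice and its output will not necessarily match the screenshot, but it lets you check which genders a system assigns to gender-neutral Turkish sentences.

```python
# Illustrative probe: translate gender-neutral Turkish sentences with an
# open-source model and inspect which genders the system assigns.
# Assumes `transformers` and `sentencepiece` are installed; the model choice is
# an assumption, not the system behind the screenshot above.
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-tr-en")

sentences = [
    "O bir doktor ve o bir hemşire.",  # "She/he is a doctor, and she/he is a nurse."
    "O bir mühendis.",                 # "She/he is an engineer."
]
for sentence in sentences:
    translation = translator(sentence)[0]["translation_text"]
    print(f"{sentence} -> {translation}")
```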

Moreover, some NMT systems translate proper nouns and religious or cultural terms inappropriately, or ignore cultural nuances. That is the case with Reverso in the translation of the following Japanese sentence: “お花見に行きましょう.”

Screenshot of the translation of a sentence from Japanese into English in Reverso

Translating this sentence as “Let’s go flower viewing” falls short, as it fails to capture the cultural connotations of “Hanami,” the Japanese tradition of admiring the cherry blossoms (“Sakura”), which are emblematic of Japan and mark the rice-planting period. Removing the reference to the Sakura strips the translation of all cultural connotations. A more appropriate translation would be “Let’s go and see the cherry blossoms.”

The accuracy of MT varies considerably depending on the language pairs involved. For example, Google Translate achieves a 94% accuracy rate for translations from English into Spanish, two widely spoken languages, whereas for less common language pairs, such as English and Armenian, accuracy drops to 55% (Sonix, 2024). The accuracy of MT tools is therefore often lower for minority or regional languages, reflecting linguistic inequalities. Indeed, NMT relies on the analysis of large corpora of bilingual texts, but for many minority languages these corpora are limited or non-existent, which complicates the development of effective MT systems for them.
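
The Sonix figures come from that company’s own evaluation, but the general approach behind such comparisons is standard: score a system’s output against human reference translations with an automatic metric, one score per language pair. The sketch below shows one common way to do this with BLEU; the sentence lists are placeholders of our own, and a real evaluation would use hundreds of held-out segments per pair.

```python
# Illustrative sketch: score machine output against human reference translations
# with BLEU, producing one score per language pair. Assumes the `sacrebleu`
# package is installed; the data below is a placeholder, not a real test set.
import sacrebleu

def bleu_for_pair(system_output: list[str], references: list[str]) -> float:
    """Corpus-level BLEU of system output against one reference per segment."""
    return sacrebleu.corpus_bleu(system_output, [references]).score

# Placeholder data: in practice, load a held-out test set for each language pair.
pairs = {
    "English-Spanish": (["la reunión empieza a las diez"], ["la reunión empieza a las diez"]),
    "English-Armenian": (["placeholder system output"], ["placeholder reference translation"]),
}
for name, (hypotheses, references) in pairs.items():
    print(f"{name}: BLEU = {bleu_for_pair(hypotheses, references):.1f}")
```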

Systems can also erase important aspects of minority identity, depending on the languages they have to translate.

Screenshot of the translation of a sentence from French into Russian in DeepL

For example, here DeepL fails to render an expression related to LGBTQIA+ identity. The word “partner” is often used in English and in French for LGBTQIA+ couples because it is gender-neutral. The French sentence means “My partner is supportive”: the noun “partenaire” is neutral, but the feminine possessive “ma” makes it clear that the partner is a woman. The word “partner” may not have the same gender-neutral connotations in all languages, which can affect perception or understanding. Here, in Russian, “partner” is automatically translated using the masculine form, which suggests a male partner. This alters the intended meaning and erases the original nuance. Moreover, in both French and English, “partner” can also signal a non-marital relationship, another layer of identity that may be lost in translation when systems default to gendered or traditional norms.

As NMT systems are trained on existing corpora, which often contain human biases, they can produce inaccurate translations. The amplification of stereotypes or errors linked to biased training data can, therefore, also be found in texts translated by NMT.

Impacts on individuals and global relations

Biases in MT systems have multiple consequences, affecting society, individuals, and international relations. On a social level, these biases reproduce and reinforce gender stereotypes, such as the systematic association of certain professions with a specific gender (doctor = male, nurse = female). They also contribute to the marginalization of minority languages and cultures, which are often mistranslated or ignored, further exacerbating linguistic and cultural inequalities.

On a practical level, inaccurate or offensive translations can cause problems in professional contexts, particularly when drafting contracts or legal documents, where a translation error can have serious consequences. Individuals who do not conform to majority norms, including non-binary people and speakers of rare languages, also face challenges, as MT systems struggle to provide inclusive or accurate translations for these groups.

Politically, biases in MT promote cultural homogenization, in which linguistic and cultural nuances are lost. This can lead to a flattening of expressions and ideas at the expense of cultural diversity. Inaccurate translations can also cause diplomatic misunderstandings: the mistranslation of the Japanese word “mokusatsu” during WWII, interpreted as “not worthy of comment” rather than “withholding comment,” is widely believed to have contributed to the U.S. decision to drop the atomic bomb on Hiroshima (Nussbaum, n.d.). Today, a mistranslation of a political speech could similarly escalate tensions between countries or cultures.

Finally, these biases erode trust in MT tools. If translations are perceived as inaccurate or biased, users may turn away from these technologies, limiting their usefulness in an increasingly globalized world. It is crucial to recognize these consequences to better understand the urgency of developing more equitable and inclusive translation systems – or improving the ones that already exist.

Towards fairer translations: solutions to mitigate bias

There are several possible ways to mitigate these biases. One is to diversify the training datasets. By incorporating more texts from minority languages and cultures, translation models can better understand linguistic nuances and context. Emphasis should be placed on the quality of training data rather than merely increasing its quantity: high-quality, well-curated datasets would reduce bias and improve translation accuracy, ensuring better representation of minority languages in NMT systems.

Another solution is human supervision, in which expert linguists review and refine machine translations, ensuring linguistic and contextual accuracy. Some translation tools, for instance DeepL or Gemini, have already taken steps in this direction by offering a list of alternative translations that reflect different gender readings: if users are not satisfied with the default translation, they can pick the variant with the intended gender. The development of such initiatives could help mitigate biases in MT and improve inclusivity.
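
How DeepL or Gemini produce their alternative suggestions is proprietary, but one open way to achieve something similar is to ask a model for several candidate translations (an “n-best list”) via beam search and let the user choose. The sketch below shows this idea with an open-source model; the model name is again our own assumption, not a description of any commercial tool.

```python
# Illustrative sketch: surface several candidate translations so a user can pick
# the variant with the intended gender. Assumes `transformers` and `sentencepiece`
# are installed; the Helsinki-NLP model is our own choice, and this is not how
# DeepL or Gemini implement their alternative suggestions internally.
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-tr-en"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

source = "O bir doktor."  # gender-neutral Turkish: "She/he is a doctor."
batch = tokenizer([source], return_tensors="pt")
# Beam search keeps several hypotheses alive; return the top four as alternatives.
outputs = model.generate(**batch, num_beams=8, num_return_sequences=4)

for i, output in enumerate(outputs, start=1):
    print(f"Candidate {i}: {tokenizer.decode(output, skip_special_tokens=True)}")
```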

Finally, beyond technical improvements, raising public awareness about the limitations of translation tools is crucial. Users should be informed about potential biases and inaccuracies in machine-generated translations to encourage critical engagement with these technologies.

Can NMT ever be truly neutral?

NMT has become ubiquitous, yet concerns about its inherent biases remain a substantial challenge. These biases, originating in the training data and in algorithmic limitations, can perpetuate stereotypes and marginalize specific languages or identities. To address this, it is imperative to diversify data sources, enhance model performance through human supervision, and raise awareness among users of the limitations of these tools.

In this regard, developers and companies bear a significant responsibility to make their algorithms more transparent and to incorporate more diverse and representative data. They must recognize the ethical implications of their systems in order to mitigate bias and ensure more equitable translations. At the same time, users can also play a role by critically evaluating these tools, comparing outputs and reporting observed biases, thereby promoting continuous improvement.

This prompts the pivotal question of whether total neutrality can truly be achieved in an automated system. Some AI tools, such as ChatGPT and Gemini, are steadily improving and are beginning to find ways around problems associated with translation biases (e.g., Gemini can sometimes refuse to translate a sentence containing gender bias). Despite these improvements, it remains to be seen whether technology will ever be capable of accurately reflecting the richness and complexity of the world’s languages and cultures without bias or distortion.

References

Blodgett, S. L., Barocas, S., Daumé, H., III, & Wallach, H. (2020). Language (technology) is power: A critical survey of “bias” in NLP. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), 5454–5476.

Crawford, K. (2017). The trouble with bias. Conference on Neural Information Processing Systems, Keynote.

Ehrensberger-Dow, M., & O’Brien, S. (2015). Interdisciplinarity in Translation and Interpreting Process Research. Amsterdam: John Benjamins Publishing Company.

Hong, W., & Rossi, C. (2021). The cognitive turn in metaphor translation studies: A critical overview. Journal of Translation Studies, 2, 83–115.

IA et traduction : Ce qu’il faut savoir. (n.d.). Ottiaq.org, from https://ottiaq.org/app/uploads/2024/01/page-web-sur-ia-pour-site-ottiaq…

Isabelle, P., Cherry, C., & Foster, G. (2017). A challenge set approach to evaluating machine translation. In M. Palmer, R. Hwa, & S. Riedel (Eds.), Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (pp. 2486–2496). Association for Computational Linguistics.

Junczys-Dowmunt, M., Dwojak, T., & Hoang, H. (2016). Is neural machine translation ready for deployment? A case study on 30 translation directions. arXiv [cs.CL]. http://arxiv.org/abs/1610.01108

Nussbaum, H. (n.d.). Mokusatsu: One Word, Two Lessons.

Reid, J. N., & Katz, A. N. (2018). Vector space applications in metaphor comprehension. Metaphor and Symbol, 33(4), 280–294. https://doi.org/10.1080/10926488.2018.1549840

Précision de la traduction automatique : Évaluation globale. (2024). Sonix.ai. https://sonix.ai/resources/fr/la-precision-de-la-traduction-automatique/

Katzman, J. et al. (2023). Taxonomizing and measuring representational harms: A look at image tagging. arXiv:2305.01776.

Tudor, M.-D. (2022). La traduction des expressions idiomatiques à l’aide de moteurs de traduction automatique. Analele Universității Bucuresti. Limbi si Literaturi Străine, 2, 109-128. 

Vaguer, C. (2011). Expressions figées et traduction : langue, culture, traduction automatique, apprentissage, lexique. In J.-C. Anscombre & S. Mejri, Le figement linguistique : la parole entravée, Honoré Champion, pp. 391–411.

Wisniewski, G., Zhu, L., Ballier, N., & Yvon, F. (2022). Biais de genre dans un système de traduction automatique neuronale : une étude des mécanismes de transfert cross-langue. Traitement automatique des langues, 63(1), 37–61. 

[1] According to Katzman and his colleagues (2023, p. 1), representational harm is “[o]utputs that can affect the understandings, beliefs, and attitudes that people hold about particular social groups, and thus the standings of those groups within society.”

[2] According to Katzman and his colleagues (2023, p. 2), allocational harm occurs when “people belonging to particular social groups are unfairly deprived of access to important opportunities or resources.”

Details

Publication date
26 May 2025
Author
Directorate-General for Translation
Language
  • English
  • French
EMT Category
  • Translation technology