{"id":215765,"date":"2021-03-31T20:00:09","date_gmt":"2021-03-31T17:00:09","guid":{"rendered":"https:\/\/en.buradabiliyorum.com\/how-we-taught-google-translate-to-stop-being-sexist\/"},"modified":"2021-03-31T20:00:09","modified_gmt":"2021-03-31T17:00:09","slug":"how-we-taught-google-translate-to-stop-being-sexist","status":"publish","type":"post","link":"https:\/\/buradabiliyorum.com\/en\/how-we-taught-google-translate-to-stop-being-sexist\/","title":{"rendered":"#How we taught Google Translate to stop being sexist"},"content":{"rendered":"<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_85 counter-hierarchy ez-toc-counter ez-toc-custom ez-toc-container-direction\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of Contents<\/p>\n<label for=\"ez-toc-cssicon-toggle-item-6a3b9acaa7292\" class=\"ez-toc-cssicon-toggle-label\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #dd3333;color:#dd3333\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #dd3333;color:#dd3333\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/label><input type=\"checkbox\"  id=\"ez-toc-cssicon-toggle-item-6a3b9acaa7292\" checked aria-label=\"Toggle\" \/><nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link 
ez-toc-heading-1\" href=\"https:\/\/buradabiliyorum.com\/en\/how-we-taught-google-translate-to-stop-being-sexist\/#Biased_algorithms\" >Biased algorithms<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/buradabiliyorum.com\/en\/how-we-taught-google-translate-to-stop-being-sexist\/#New_translations\" >New translations<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/buradabiliyorum.com\/en\/how-we-taught-google-translate-to-stop-being-sexist\/#Overcoming_bias\" >Overcoming bias<\/a><\/li><\/ul><\/nav><\/div>\n<p>&#8220;<strong>#How we taught Google Translate to stop being sexist<\/strong>&#8221;<\/p>\n<div>\n                                Online translation tools have helped us learn new languages, communicate across linguistic borders, and view foreign websites in our native tongue. But the artificial intelligence (AI) behind them is far from perfect, often replicating rather than rejecting the biases that exist within a language or a society.<\/p>\n<p>Such tools are especially vulnerable to gender stereotyping because some languages (such as English) don\u2019t tend to gender nouns, while others (such as German) do. When translating from English to German, translation tools have to decide which gender to assign English words like \u201ccleaner.\u201d Overwhelmingly, the tools conform to the stereotype, opting for the feminine word in German.<\/p>\n<p><a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/doi.org\/10.3389\/fpsyg.2018.01561\">Biases<\/a> are human: they\u2019re part of who we are. But when left unchallenged, biases can emerge in the form of concrete negative attitudes towards others. 
Now, our team has found a way to <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/link.springer.com\/article\/10.1007\/s10676-021-09583-1\">retrain the AI<\/a> behind translation tools, using targeted training to help it to avoid gender stereotyping. Our method could be used in other fields of AI to help the <a href=\"https:\/\/buradabiliyorum.com\/en\/category\/technology\/\" data-internallinksmanager029f6b8e52c=\"4\" title=\"Technology\" target=\"_blank\" rel=\"noopener\">technology<\/a> reject, rather than replicate, biases within society.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Biased_algorithms\"><\/span>Biased algorithms<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>To the dismay of their creators, AI algorithms often develop racist or sexist traits. <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/www.forbes.com\/sites\/parmyolson\/2018\/02\/15\/the-algorithm-that-helped-google-translate-become-sexist\/?sh=22b6f48d7daa\">Google Translate<\/a> has been accused of stereotyping based on gender, such as its translations presupposing that all doctors are male and all nurses are female. Meanwhile, the AI language generator GPT-3 \u2013 which wrote an <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/www.theguardian.com\/commentisfree\/2020\/sep\/08\/robot-wrote-this-article-gpt-3\">entire article<\/a> for the Guardian in 2020 \u2013 recently showed that it was also shockingly good at producing <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/www.technologyreview.com\/2020\/07\/20\/1005454\/openai-machine-learning-language-generator-gpt-3-nlp\/\">harmful content and misinformation<\/a>.<\/p>\n<p>These AI failures aren\u2019t necessarily the fault of their creators. 
Academics and activists recently drew attention to <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/www.bbc.co.uk\/news\/uk-england-oxfordshire-51738824?intlink_from_url=&amp;\">gender bias<\/a> in the Oxford English Dictionary, where sexist synonyms of \u201cwoman\u201d \u2013 such as \u201cbitch\u201d or \u201cmaid\u201d \u2013 show how even a constantly revised, academically edited catalog of words can contain biases that reinforce stereotypes and perpetuate everyday sexism.<\/p>\n<p>AI learns bias because it isn\u2019t built in a vacuum: it learns how to think and act by reading, analyzing, and categorizing existing data \u2013 like that contained in the Oxford English Dictionary. In the case of translation AI, we expose its algorithm to billions of words of textual data and ask it to recognize and learn from the patterns it detects. We call this process <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/ieeexplore.ieee.org\/document\/5392560\">machine learning<\/a>, and along the way patterns of bias are learned as well as those of grammar and syntax.<\/p>\n<p>Ideally, the textual data we show AI won\u2019t contain bias. But there\u2019s an ongoing trend in the field towards building bigger systems trained on <a rel=\"nofollow noopener\" target=\"_blank\" href=\"http:\/\/faculty.washington.edu\/ebender\/papers\/Stochastic_Parrots.pdf\">ever-growing data sets<\/a>. We\u2019re talking hundreds of billions of words. These are obtained from the internet by using undiscriminating text-scraping tools like Common Crawl and WebText2, which maraud across the web, gobbling up every word they come across.<\/p>\n<p>The sheer size of the resultant data makes it impossible for any human to actually know what\u2019s in it. 
But we do know that some of it comes from platforms like <a href=\"https:\/\/buradabiliyorum.com\/en\/category\/social-mediaa\/\" data-internallinksmanager029f6b8e52c=\"1\" title=\"Social Media\" target=\"_blank\" rel=\"noopener\">Reddit<\/a>, which <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/www.bbc.co.uk\/news\/technology-56099232\">has made headlines<\/a> for featuring offensive, false or conspiratorial information in users\u2019 posts.<\/p>\n<figure class=\"align-center \">\n<figure class=\"post-image post-mediaBleed aligncenter\"><img loading=\"lazy\" decoding=\"async\" sizes=\"auto, (min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px\" alt=\"A magnifying glass over the Reddit logo on a web browser\" width=\"600\" height=\"409\" class=\" lazy\" src=\"https:\/\/images.theconversation.com\/files\/392527\/original\/file-20210330-13-2qqote.jpeg?ixlib=rb-1.1.0&amp;q=45&amp;auto=format&amp;w=754&amp;fit=clip\" data-lazy=\"true\" srcset=\"https:\/\/images.theconversation.com\/files\/392527\/original\/file-20210330-13-2qqote.jpeg?ixlib=rb-1.1.0&amp;q=45&amp;auto=format&amp;w=600&amp;h=409&amp;fit=crop&amp;dpr=1 600w, https:\/\/images.theconversation.com\/files\/392527\/original\/file-20210330-13-2qqote.jpeg?ixlib=rb-1.1.0&amp;q=30&amp;auto=format&amp;w=600&amp;h=409&amp;fit=crop&amp;dpr=2 1200w, https:\/\/images.theconversation.com\/files\/392527\/original\/file-20210330-13-2qqote.jpeg?ixlib=rb-1.1.0&amp;q=15&amp;auto=format&amp;w=600&amp;h=409&amp;fit=crop&amp;dpr=3 1800w, https:\/\/images.theconversation.com\/files\/392527\/original\/file-20210330-13-2qqote.jpeg?ixlib=rb-1.1.0&amp;q=45&amp;auto=format&amp;w=754&amp;h=514&amp;fit=crop&amp;dpr=1 754w, https:\/\/images.theconversation.com\/files\/392527\/original\/file-20210330-13-2qqote.jpeg?ixlib=rb-1.1.0&amp;q=30&amp;auto=format&amp;w=754&amp;h=514&amp;fit=crop&amp;dpr=2 1508w, 
https:\/\/images.theconversation.com\/files\/392527\/original\/file-20210330-13-2qqote.jpeg?ixlib=rb-1.1.0&amp;q=15&amp;auto=format&amp;w=754&amp;h=514&amp;fit=crop&amp;dpr=3 2262w\"\/><figcaption>Some of the text users share on Reddit contains language we might prefer our translation tools not to learn. <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/www.shutterstock.com\/image-photo\/lisbon-portugal-february-6-2014-photo-175132031\">Gil C\/Shutterstock<\/a><\/figcaption><\/figure>\n<\/figure>\n<h2><span class=\"ez-toc-section\" id=\"New_translations\"><\/span>New translations<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>In <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/link.springer.com\/article\/10.1007\/s10676-021-09583-1\">our research<\/a>, we wanted to search for a way to counter the bias within textual data-sets scraped from the internet. 
Our experiments used a randomly selected part of an existing English-German corpus (a selection of text) that originally contained 17.2 million pairs of sentences \u2013 half in English, half in German.<\/p>\n<p>As we\u2019ve highlighted, German has gendered forms for nouns (doctor can be \u201c<em>der Arzt<\/em>\u201d for male, \u201c<em>die \u00c4rztin<\/em>\u201d for female) where in English we don\u2019t gender these noun forms (with some exceptions, <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/www.thestage.co.uk\/your-views\/actor-or-actress-the-debate-continues-your-views-december-14\">themselves contentious<\/a>, like \u201cactor\u201d and \u201cactress\u201d).<\/p>\n<p>Our analysis of this data revealed clear gender-specific imbalances. For instance, we found that the masculine form of engineer in German (<em>der Ingenieur<\/em>) was 75 times more common than its feminine counterpart (<em>die Ingenieurin<\/em>). A translation tool trained on this data will inevitably replicate this bias, translating \u201cengineer\u201d to the male \u201c<em>der Ingenieur.<\/em>\u201d So what can be done to avoid or mitigate this?<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Overcoming_bias\"><\/span>Overcoming bias<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>A seemingly straightforward answer is to \u201cbalance\u201d the corpus before asking computers to learn from it. Perhaps, for instance, adding more female engineers to the corpus would prevent a translation system from assuming all engineers are men.<\/p>\n<p>Unfortunately, there are difficulties with this approach. Translation tools are trained for days on billions of words. 
Retraining them by altering the gender of words is possible, but it\u2019s inefficient, expensive and complicated. Adjusting the gender in languages like German is especially challenging because, in order to make grammatical sense, several words in a sentence may need to be changed to reflect the gender swap.<\/p>\n<p>Instead of this laborious gender rebalancing, we decided to retrain existing translation systems with targeted lessons. When we spotted a bias in existing tools, we retrained them on new, smaller data sets \u2013 a bit like an afternoon of gender-sensitivity training at work.<\/p>\n<p>This approach takes a fraction of the time and resources needed to train models from scratch. We were able to use just a few hundred selected translation examples \u2013 instead of millions \u2013 to adjust the behavior of translation AI in targeted ways. When testing gendered professions in translation \u2013 as we had done with \u201cengineers\u201d \u2013 the accuracy improvements after adapting were about nine times higher than those achieved with the \u201cbalanced\u201d retraining approach.<\/p>\n<p>In our research, we wanted to show that tackling hidden biases in huge data sets doesn\u2019t have to mean laboriously adjusting millions of training examples, a task which risks being dismissed as impossible. 
Instead, bias from data can be targeted and unlearned \u2013 a lesson that other <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/hbr.org\/2019\/10\/what-do-we-do-about-the-biases-in-ai\">AI researchers<\/a> can apply to their own work.<\/p>\n<p><em>This article by\u00a0<a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/theconversation.com\/profiles\/stefanie-ullmann-856014\">Stefanie Ullmann<\/a>, Postdoctoral Research Associate, <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/theconversation.com\/institutions\/university-of-cambridge-1283\">University of Cambridge<\/a> and <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/theconversation.com\/profiles\/danielle-saunders-1221845\">Danielle Saunders<\/a>, Research Student, Department of Engineering, <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/theconversation.com\/institutions\/university-of-cambridge-1283\">University of Cambridge<\/a>\u00a0is republished from <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/theconversation.com\">The Conversation<\/a> under a Creative Commons license. Read the <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/theconversation.com\/online-translators-are-sexist-heres-how-we-gave-them-a-little-gender-sensitivity-training-157846\">original article<\/a>.<\/em><\/p>\n<p class=\"c-post-pubDate\">\n                                    Published March 31, 2021 \u2014 17:00 UTC<\/p><\/div>\n<p><span style=\"color: black;\"><a style=\"color: #ff9900;\" href=\"https:\/\/thenextweb.com\/neural\/2021\/03\/31\/google-translate-is-sexist-ai-can-solve-syndication\/\" target=\"_blank\" rel=\"noopener\">Source<\/a><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>&#8220;#How we taught Google Translate to stop being sexist&#8221; Online translation tools have helped us learn new languages, communicate across linguistic borders, and view foreign websites in our native tongue. 
But the artificial intelligence (AI) behind them is far from perfect, often replicating rather than rejecting the biases that exist within a language or a&#8230;<\/p>\n","protected":false},"author":1,"featured_media":215766,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"https:\/\/img-cdn.tnwcdn.com\/image\/neural?filter_last=1&fit=1280,640&url=https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/03\/1-copy-47.jpg&signature=757bbc54172a2376aca405eb1077c76d","fifu_image_alt":"","footnotes":""},"categories":[18],"tags":[],"class_list":["post-215765","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technology"],"_links":{"self":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/215765","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/comments?post=215765"}],"version-history":[{"count":0,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/215765\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media\/215766"}],"wp:attachment":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media?parent=215765"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/categories?post=215765"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/tags?post=215765"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}