{"id":93513,"date":"2020-10-20T13:41:04","date_gmt":"2020-10-20T10:41:04","guid":{"rendered":"https:\/\/en.buradabiliyorum.com\/facebooks-new-ai-model-translates-100-languages-without-going-through-english-first\/"},"modified":"2020-10-20T13:41:04","modified_gmt":"2020-10-20T10:41:04","slug":"facebooks-new-ai-model-translates-100-languages-without-going-through-english-first","status":"publish","type":"post","link":"https:\/\/buradabiliyorum.com\/en\/facebooks-new-ai-model-translates-100-languages-without-going-through-english-first\/","title":{"rendered":"#Facebook\u2019s new AI model translates 100 languages \u2014 without going through English first"},"content":{"rendered":"<p>&#8220;<strong>#<a href=\"https:\/\/buradabiliyorum.com\/en\/category\/social-mediaa\/\" data-internallinksmanager029f6b8e52c=\"1\" title=\"Social Media\" target=\"_blank\" rel=\"noopener\">Facebook<\/a>\u2019s new AI model translates 100 languages \u2014 without going through English first<\/strong>&#8221;<br \/>\n<img decoding=\"async\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/10\/Untitled-design-2020-10-20T102109.015-796x417.png\" \/><\/p>\n<div>\n                                Facebook has\u00a0<a rel=\"nofollow noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/pytorch\/fairseq\/tree\/master\/examples\/m2m_100\">open-sourced an AI model<\/a> that can translate between any pair of 100 languages without first translating them to English as an intermediary step.<\/p>\n<p>The system, called M2M-100, is currently only a research project, but could eventually be used to translate posts for Facebook users,\u00a0nearly two-thirds of whom use a language other than English.<\/p>\n<p><span>\u201cFor years, AI researchers have been working toward building a single universal model that can understand all languages across different tasks,\u201d said Facebook research assistant Angela Fan in <a rel=\"nofollow noopener noreferrer\" target=\"_blank\" 
href=\"https:\/\/about.fb.com\/news\/2020\/10\/first-multilingual-machine-translation-model\/\">a blog post<\/a>. <\/span><\/p>\n<p><span>\u201cA single model that supports all languages, dialects, and modalities will help us better serve more people, keep translations up to date, and create new experiences for billions of people equally. This work brings us closer to this goal.\u201d<\/span><\/p>\n<p>The model was trained on a dataset of 7.5 billion sentence pairs across 100 languages that were mined from the web. Facebook says all of these resources are open source and use publicly available data.<\/p>\n<p>To manage the scale of the mining, the researchers focused on the language pairs that were most commonly requested and avoided the rarer ones, such as Sinhala-Javanese.<\/p>\n<p>They then sorted the languages into 14 groups, based on linguistic, geographic, and cultural similarities. This approach was chosen because people in countries whose languages share these characteristics would be more likely to benefit from translations between them.<\/p>\n<p>For instance, one group included common languages in India, such as Hindi, Bengali, and Marathi. 
All the possible language pairs within each group were then mined.<\/p>\n<p><iframe loading=\"lazy\" style=\"border: none; overflow: hidden;\" src=\"https:\/\/www.facebook.com\/plugins\/video.php?href=https%3A%2F%2Fwww.facebook.com%2Ffacebook%2Fvideos%2F712647556266801%2F&amp;show_text=0&amp;width=560\" width=\"560\" height=\"315\" frameborder=\"0\" scrolling=\"no\" allowfullscreen=\"allowfullscreen\" data-mce-fragment=\"1\"><\/iframe><br \/>\nThe languages of different groups were connected through a small number of bridge languages. In the example of the Indian language group,\u00a0Hindi, Bengali, and Tamil served as bridge languages for Indo-Aryan languages.<\/p>\n<p>The team then mined training data for all combinations of these bridge languages, which left them with a\u00a0dataset of 7.5 billion parallel sentences corresponding to 2,200 translation directions.<\/p>\n<p>For languages lacking quality translation data, the researchers used a method called <a rel=\"nofollow noopener noreferrer\" target=\"_blank\" href=\"https:\/\/engineering.fb.com\/ai-research\/scaling-neural-machine-translation-to-bigger-data-sets-with-faster-training-and-inference\/\">back-translation<\/a> to generate synthetic translations that can supplement the mined data.<\/p>\n<p>This combination of techniques resulted in the\u00a0first multilingual machine translation (MMT) model that can translate between any pair of 100 languages without relying on English data, according to Facebook.<\/p>\n<p>\u201cWhen translating, say, Chinese to French, most English-centric multilingual models train on Chinese to English and English to French, because English training data is the most widely available,\u201d said Fan. 
\u201cOur model directly trains on Chinese to French data to better preserve meaning.\u201d<\/p>\n<p>The model has not yet been incorporated in any products, but tests suggest it could support a wide variety of translations on Facebook,\u00a0where people post content in more than 160 languages. The company says it outperformed English-centric systems by 10 points on <a rel=\"nofollow noopener noreferrer\" target=\"_blank\" href=\"https:\/\/towardsdatascience.com\/bleu-bilingual-evaluation-understudy-2b4eab9bcfd1\">the BLEU metric for assessing machine translations<\/a>.<\/p>\n<p class=\"c-post-pubDate\">\n                                    Published October 20, 2020 \u2014 10:41 UTC\n                                <\/p>\n<\/p><\/div>\n<p><span style=\"color: black;\"><a style=\"color: #ff9900;\" href=\"https:\/\/thenextweb.com\/neural\/2020\/10\/20\/facebooks-new-ai-model-translates-100-languages-without-going-through-english-first\/\" target=\"_blank\" rel=\"noopener noreferrer\">Source<\/a><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>&#8220;#Facebook\u2019s new AI model translates 100 languages \u2014 without going 
through English first&#8221; Facebook has\u00a0open-sourced an AI model that can translate between any pair of 100 languages without first translating them to English as an intermediary step. The system, called M2M-100, is currently only a research project, but could eventually be used to translate posts&#8230;<\/p>\n","protected":false},"author":1,"featured_media":93514,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"https:\/\/img-cdn.tnwcdn.com\/image\/neural?filter_last=1&fit=1280,640&url=https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/10\/Untitled-design-2020-10-20T102109.015.png&signature=8142acd8b75be57da597ab595a608c02","fifu_image_alt":"","footnotes":""},"categories":[18],"tags":[],"class_list":["post-93513","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technology"],"_links":{"self":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/93513","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/comments?post=93513"}],"version-history":[{"count":0,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/93513\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media\/93514"}],"wp:attachment":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media?parent=93513"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/categories?post=93513"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2
\/tags?post=93513"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}