{"id":678095,"date":"2025-07-01T10:20:20","date_gmt":"2025-07-01T07:20:20","guid":{"rendered":"https:\/\/buradabiliyorum.com\/en\/the-race-to-make-ai-as-multilingual-as-europe\/"},"modified":"2025-07-01T10:20:20","modified_gmt":"2025-07-01T07:20:20","slug":"the-race-to-make-ai-as-multilingual-as-europe","status":"publish","type":"post","link":"https:\/\/buradabiliyorum.com\/en\/the-race-to-make-ai-as-multilingual-as-europe\/","title":{"rendered":"The race to make AI as multilingual as Europe"},"content":{"rendered":"<div id=\"article-main-content\">\n<p><span style=\"font-weight: 400;\">The European Union has <\/span><a rel=\"nofollow\" target=\"_blank\" 
href=\"https:\/\/european-union.europa.eu\/principles-countries-history\/languages_en\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">24 official languages<\/span><\/a><span style=\"font-weight: 400;\"> and dozens more unofficial ones spoken across the continent. If you add in the European countries outside the union, that brings at least a dozen more into the mix. Add dialects, <\/span><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/www.ethnologue.com\/insights\/how-many-languages-endangered\/\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">endangered languages<\/span><\/a><span style=\"font-weight: 400;\">, and languages brought by migrants to Europe, and you end up with hundreds of languages.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">One thing many of us in technology can agree on is that the US dominates, and that extends to the languages used online. The main reason is that American institutions, standards bodies, and companies defined how computers, their operating systems, and the software they run worked in their nascent days. This is changing, but for the short term at least, it remains the norm. It has also led to English dominating the web. 
An astounding <\/span><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/w3techs.com\/technologies\/overview\/content_language\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">50% of websites are in English<\/span><\/a><span style=\"font-weight: 400;\">, despite it being the native tongue of only about 6% of the world\u2019s population. Spanish, German, and Japanese come next, but a long way behind, each accounting for only 5-6% of the web.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Many of the new wave of AI-powered applications and services are driven by large language models (LLMs). As much of the data used to train these LLMs is scraped (controversially in many cases) from the web, <\/span><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/www.sciencedirect.com\/science\/article\/pii\/S2666389924002903\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">LLMs predominantly understand and respond in English.<\/span><\/a><span style=\"font-weight: 400;\"> As the rapid growth of AI tools shifts the technological paradigm, we risk carrying this problem into a new age.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Europe already boasts several high-profile AI companies and projects, such as <\/span><span style=\"font-weight: 400;\">Mistral<\/span><span style=\"font-weight: 400;\"> and <\/span><span style=\"font-weight: 400;\">Hugging Face<\/span><span style=\"font-weight: 400;\">. <\/span><span style=\"font-weight: 400;\">Google DeepMind<\/span><span style=\"font-weight: 400;\"> also originated as a European company. 
The continent has research projects that develop language models to enhance how AI tools comprehend less commonly spoken languages.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">This article explores some of these initiatives, questions their effectiveness, and asks whether their efforts are worthwhile or whether many users simply default to the English versions of tools. As Europe seeks to build its independence in AI and ML, does the continent have the companies and skills necessary to achieve its goals?<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Terminology_and_technology_primer\"><\/span>Terminology and technology primer<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">To make sense of what follows, you don\u2019t need to understand how models are created or trained, or how they function. But it\u2019s helpful to understand a couple of basics about models and their human language support.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Unless a model\u2019s documentation explicitly states that it is <\/span>multilingual or cross-lingual<span style=\"font-weight: 400;\">, prompting it in an unsupported language, or requesting a response in one, may cause it to translate back and forth or respond in a language it <\/span><i><span style=\"font-weight: 400;\">does<\/span><\/i><span style=\"font-weight: 400;\"> understand. Both strategies can produce unreliable and inconsistent results \u2014 especially in low-resource languages.<\/span><\/p>\n<p>High-resource<span style=\"font-weight: 400;\"> languages, such as English, benefit from abundant training data. 
<\/span>Low-resource<span style=\"font-weight: 400;\"> languages, such as Gaelic or Galician, have far less, which often leads to inferior performance.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The harder concept to explain regarding models is \u201copen,\u201d which is unusual, as software in general has had a fairly clear definition of \u201copen source\u201d for a while. I don\u2019t want to delve too deeply into this topic as the exact definition is still in flux and controversial. The summary is that even when a model calls itself \u201copen\u201d or is referred to as \u201copen,\u201d the meaning of \u201copen\u201d isn\u2019t always the same.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Here are two other useful terms to know:<\/span><\/p>\n<p><b>Training<\/b><span style=\"font-weight: 400;\"> teaches a model to make predictions or decisions based on input data.<\/span><\/p>\n<p><b>Parameters<\/b><span style=\"font-weight: 400;\"> are variables learned during model training that define how the model maps inputs to outputs. In other words, they determine how it understands and responds to your questions. The larger the number of parameters, the more complex the model is.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">With that brief explanation done, how are European AI companies and projects working to enhance these processes to improve European language support?<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Hugging_Face\"><\/span>Hugging Face<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">When someone wants to share code, they typically provide a link to their GitHub repository. When someone wants to share a model, they typically provide a Hugging Face link. 
Founded in 2016 by French entrepreneurs in New York City, the company is an active participant in creating communities and a strong proponent of open models. In 2024, it started an AI accelerator for European startups and partnered with Meta to develop translation tools based on <\/span><span style=\"font-weight: 400;\">Meta\u2019s <\/span><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/ai.meta.com\/blog\/nllb-200-high-quality-machine-translation\/\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">\u201cNo Language Left Behind\u201d model<\/span><\/a><span style=\"font-weight: 400;\">. They are also one of the driving forces behind the <\/span><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/huggingface.co\/bigscience\/bloom\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">BLOOM model<\/span><\/a><span style=\"font-weight: 400;\">, a groundbreaking multilingual model that set new standards for international collaboration, openness, and training methodologies.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Hugging Face is a useful tool for getting a rough idea of the language support in models. At the time of writing, <\/span><span style=\"font-weight: 400;\">Hugging Face lists <\/span><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/huggingface.co\/models?sort=trending\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">1,743,136 models<\/span><\/a><span style=\"font-weight: 400;\"> and <\/span><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/huggingface.co\/datasets?sort=trending\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">298,927 datasets<\/span><\/a><span style=\"font-weight: 400;\">. 
Look at its <\/span><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/huggingface.co\/languages\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">leaderboard<\/span><\/a><span style=\"font-weight: 400;\"> for monolingual models and datasets<\/span><span style=\"font-weight: 400;\">, and you see the following ranking for models and datasets that developers tag (add metadata) as supporting European languages at the time of writing:<\/span><\/p>\n<table>\n<thead>\n<tr>\n<th style=\"text-align: center;\"><strong>Language<\/strong><\/th>\n<th style=\"text-align: center;\"><strong>Language code<\/strong><\/th>\n<th style=\"text-align: center;\"><strong>Datasets<\/strong><\/th>\n<th style=\"text-align: center;\"><strong>Models<\/strong><\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td style=\"text-align: center;\"><span style=\"font-weight: 400;\">English<\/span><\/td>\n<td style=\"text-align: center;\"><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/en.wikipedia.org\/wiki\/ISO_639:en\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">en<\/span><\/a><\/td>\n<td style=\"text-align: center;\"><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/huggingface.co\/datasets?language=language:en\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">27,702<\/span><\/a><\/td>\n<td style=\"text-align: center;\"><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/huggingface.co\/models?language=en\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">205,459<\/span><\/a><\/td>\n<\/tr>\n<tr>\n<td style=\"text-align: center;\"><span style=\"font-weight: 400;\">English<\/span><\/td>\n<td style=\"text-align: center;\"><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/en.wikipedia.org\/wiki\/ISO_639:eng\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">eng<\/span><\/a><\/td>\n<td style=\"text-align: center;\"><a rel=\"nofollow\" target=\"_blank\" 
href=\"https:\/\/huggingface.co\/datasets?language=language:eng\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">1,370<\/span><\/a><\/td>\n<td style=\"text-align: center;\"><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/huggingface.co\/models?language=eng\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">1,070<\/span><\/a><\/td>\n<\/tr>\n<tr>\n<td style=\"text-align: center;\"><span style=\"font-weight: 400;\">French<\/span><\/td>\n<td style=\"text-align: center;\"><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/en.wikipedia.org\/wiki\/ISO_639:fra\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">fra<\/span><\/a><\/td>\n<td style=\"text-align: center;\"><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/huggingface.co\/datasets?language=language:fra\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">1,933<\/span><\/a><\/td>\n<td style=\"text-align: center;\"><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/huggingface.co\/models?language=fra\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">850<\/span><\/a><\/td>\n<\/tr>\n<tr>\n<td style=\"text-align: center;\"><span style=\"font-weight: 400;\">Spanish\u00a0Espa\u00f1ol<\/span><\/td>\n<td style=\"text-align: center;\"><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/en.wikipedia.org\/wiki\/ISO_639:es\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">es<\/span><\/a><\/td>\n<td style=\"text-align: center;\"><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/huggingface.co\/datasets?language=language:es\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">1,745<\/span><\/a><\/td>\n<td style=\"text-align: center;\"><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/huggingface.co\/models?language=es\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 
400;\">10,028<\/span><\/a><\/td>\n<\/tr>\n<tr>\n<td style=\"text-align: center;\"><span style=\"font-weight: 400;\">German\u00a0Deutsch<\/span><\/td>\n<td style=\"text-align: center;\"><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/en.wikipedia.org\/wiki\/ISO_639:de\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">de<\/span><\/a><\/td>\n<td style=\"text-align: center;\"><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/huggingface.co\/datasets?language=language:de\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">1,442<\/span><\/a><\/td>\n<td style=\"text-align: center;\"><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/huggingface.co\/models?language=de\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">9,714<\/span><\/a><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><span style=\"font-weight: 400;\">You can already see some issues here. These aren\u2019t tags set in stone. The community can add values freely. 
While most developers follow the standard ISO 639 language codes, there is some duplication, such as \u201cen\u201d and \u201ceng\u201d both standing for English.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">As you can see, the models are dominated by English. A similar issue applies to the datasets on Hugging Face, which lack non-English data.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">What does this mean?<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Lucie-Aim\u00e9e Kaffee, EU Policy Lead at Hugging Face, said that the tags indicate that a model has been trained to understand and process that language or that the dataset contains materials in that language. She added that confusion over language support often arises during training. \u201cWhen training a large model, it\u2019s common for other languages to accidentally get caught in training because there were some artefacts of it in that dataset,\u201d she said. \u201cThe language a model is tagged with is usually what the developers intended the model to understand.\u201d<\/span><\/p>\n<p><span style=\"font-weight: 400;\">As one of the main and busiest destinations for model developers and researchers, Hugging Face not only hosts much of their work, but also lets them create outward-facing communities to tell people how to use their models.<\/span><\/p>\n<figure class=\"post-image post-mediaBleed aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-1414113 js-lazy\" alt=\"Thomas Wolf, Co-founder &amp; Chief Science Officer, Hugging Face, on Centre Stage during day one of Web Summit 2024 at the MEO Arena in Lisbon, Portugal.\" width=\"1280\" height=\"720\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2025\/06\/Untitled-design-3.jpg\"\/><figcaption>Thomas Wolf, co-founder of Hugging Face, described BLOOM as \u201cthe world\u2019s largest open multilingual language model.\u201d Credit: <a rel=\"nofollow\" target=\"_blank\" 
href=\"https:\/\/flickr.com\/photos\/websummit\/54134874860\/in\/photolist-2qtAY4c-2qtzQii-2qtzxfK-2qtzQuq-2qtAuX3-2qtB6md-2qtubyJ-2qtHogo-2qtGvoe-2qtF2Gc-2qtGLC3-2qtHina\" target=\"_blank\" rel=\"nofollow noopener\">Shauna Clinton\/Web Summit via Sportsfile<\/a><\/figcaption><\/figure>\n<h2><span class=\"ez-toc-section\" id=\"Mistral_AI\"><\/span>Mistral AI<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Perhaps the best-known Europe-based AI company is France\u2019s <\/span><span style=\"font-weight: 400;\">Mistral AI<\/span><span style=\"font-weight: 400;\">, which unfortunately declined an interview. Its multilingual challenges partly inspired this article. <\/span><span style=\"font-weight: 400;\">At the <\/span><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/archive.fosdem.org\/2024\/schedule\/event\/fosdem-2024-2591-building-open-source-language-models\/\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">FOSDEM developer conference<\/span><\/a><span style=\"font-weight: 400;\"> in February 2024,<\/span><span style=\"font-weight: 400;\"> linguistics researcher Julie Hunter asked one of Mistral\u2019s models for a recipe in French \u2014 but it responded in English. However, 16 months is an eternity in AI development, and neither the company\u2019s \u201cLe Chat\u201d chat interface nor running its 7B model locally reproduced the same error in recent tests. 
But interestingly, 7B did produce a spelling error in the opening line: \u201cboueef\u201d \u2014 and more may follow.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">While Mistral sells several commercial models, tools, and services, its<\/span><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/huggingface.co\/mistralai\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\"> free-to-use models<\/span><\/a><span style=\"font-weight: 400;\"> are popular, and I personally tend to use <\/span><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/mistral.ai\/news\/announcing-mistral-7b\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">Mistral 7B<\/span><\/a><span style=\"font-weight: 400;\"> for running tasks through local models.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Until recently, the company wasn\u2019t explicit about its models having multilingual support, but its announcement of the <\/span><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/mistral.ai\/news\/magistral\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">Magistral model<\/span><\/a><span style=\"font-weight: 400;\"> at London Tech Week in June 2025 confirmed support for several European languages.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"EuroLLM\"><\/span>EuroLLM<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/eurollm.io\/\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">EuroLLM<\/span><\/a><span style=\"font-weight: 400;\"> was created as a partnership between Portuguese AI platform <\/span><span style=\"font-weight: 400;\">Unbabel<\/span><span style=\"font-weight: 400;\"> and several European universities to understand and generate text in all official European Union languages. 
The model also includes non-European languages widely spoken by immigrant communities and major trading partners, such as Hindi, Chinese, and Turkish.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Like some of the other open model projects in this article, its work was partly funded by the <\/span><span style=\"font-weight: 400;\">EU\u2019s <\/span><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/eurohpc-ju.europa.eu\/index_en\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">High Performance Computing Joint Undertaking program<\/span><\/a><span style=\"font-weight: 400;\"> (EuroHPC JU). Many of them share similar names and aims, making it confusing to separate them all. EuroLLM was one of the first, and as Ricardo Rei, Senior Research Scientist at Unbabel, told me, the team has learned a lot from the projects that have come since.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">As Unbabel\u2019s prime business is language translation, and translation is a key task for many multilingual models, the work on EuroLLM made sense to the Portuguese platform. Before EuroLLM, Unbabel had already been refining existing models to make its own and found them all too English-centric.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">One of the team\u2019s biggest challenges was finding sufficient training data for low-resource languages. Ultimately, the availability of training material reflects the number of people who speak the language. One of the common data sources used to train European language models is <\/span><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/www.europarl.europa.eu\/portal\/en\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">Europarl<\/span><\/a><span style=\"font-weight: 400;\">, which contains transcripts of the European Parliament\u2019s activities translated into all official EU languages. 
It\u2019s also <\/span><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/huggingface.co\/datasets\/disco-eth\/EuroSpeech\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">available as a Hugging Face dataset<\/span><\/a><span style=\"font-weight: 400;\">, thanks to <\/span><span style=\"font-weight: 400;\">ETH Z\u00fcrich<\/span><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Currently, the project has a <\/span><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/huggingface.co\/utter-project\/EuroLLM-1.7B\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">1.7B parameter model<\/span><\/a><span style=\"font-weight: 400;\"> and <\/span><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/huggingface.co\/utter-project\/EuroLLM-9B\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">a 9B parameter model<\/span><\/a><span style=\"font-weight: 400;\">, and is working on a 22B parameter model. In all cases, the models can translate, but are also general-purpose, meaning you can chat with them in a similar way to ChatGPT, mixing and matching languages as you do.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"OpenLLM_Europe\"><\/span>OpenLLM Europe<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/github.com\/OpenLLM-Europe\/European-OpenLLM-Projects?tab=readme-ov-file\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">OpenLLM Europe<\/span><\/a><span style=\"font-weight: 400;\"> isn\u2019t building anything directly, but it\u2019s fostering a Europe-wide community of LLM projects, specifically medium and low-resource languages. 
Don\u2019t let the one-page GitHub repository fool you: <\/span><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/discord.com\/invite\/b5UQTWQn\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">the Discord server<\/span><\/a><span style=\"font-weight: 400;\"> is lively and active<\/span><span style=\"font-weight: 400;\">.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"OpenEuroLLM_Lumi_and_Silo\"><\/span>OpenEuroLLM, Lumi, and Silo<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">A joint project between several European universities and companies, <\/span><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/openeurollm.eu\/\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">OpenEuroLLM<\/span><\/a><span style=\"font-weight: 400;\"> is one of the newer and larger entrants to the list of projects funded by EuroHPC. This means that it has no public models as of yet, but it involves many of the institutions and individuals behind <\/span><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/huggingface.co\/LumiOpen\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">the Lumi family of models<\/span><\/a><span style=\"font-weight: 400;\"> that focus on Scandinavian and Nordic languages. It aims to create a multilingual model, provide more datasets for other models and conform to the <\/span><span style=\"font-weight: 400;\">EU AI Act<\/span><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">I spoke with <\/span><span style=\"font-weight: 400;\">Peter Sarlin<\/span><span style=\"font-weight: 400;\"> of <\/span><span style=\"font-weight: 400;\">AMD Silo<\/span><span style=\"font-weight: 400;\">, one of the companies involved in the project and a key figure in Finnish and European AI development, about the plans. 
He explained that Finland in particular has several institutes with significant AI research programs, as well as <\/span><span style=\"font-weight: 400;\">Lumi<\/span><span style=\"font-weight: 400;\">, one of the EuroHPC supercomputers. Silo, through its SiloGen product, offers open source models to customers, with a strong focus on supporting European languages. Sarlin pointed out that while sovereignty is an important motivation for him and Silo to create and maintain models that support European languages, the stronger reason is expanding the business and helping companies build solutions for small markets such as Estonia.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u201cOpen models are great building blocks, but they aren\u2019t as performant as closed ones, and many businesses in the Nordics and Scandinavia don\u2019t have the resources to build tools based on open models,\u201d he said. \u201cSo Silo and our models can step in to fill the gaps.\u201d<\/span><\/p>\n<figure class=\"post-image post-mediaBleed aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-1414184 js-lazy\" alt=\"Silo AI CEO Peter Sarlin\" width=\"1280\" height=\"720\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2025\/06\/Untitled-design-3-2.jpg\"\/><figcaption>Under Sarlin\u2019s leadership, Silo AI built a Nordic LLM family to protect the region\u2019s linguistic diversity. 
Credit: Silo AI<\/figcaption><\/figure>\n<p><span style=\"font-weight: 400;\">The Lumi models use a \u201ccross-lingual training\u201d technique in which the model shares its parameters between high-resource and low-resource languages.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">All this prior work led to the OpenEuroLLM project, which Sarlin describes as \u201cEurope\u2019s largest open source AI initiative ever, including pretty much all AI developers in Europe apart from Mistral.\u201d<\/span><\/p>\n<p><span style=\"font-weight: 400;\">While many efforts are underway and performing well, the training data issue for low-resource languages remains the biggest challenge, especially amid the move towards more nuanced <\/span><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/2501.11223\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">reasoning models<\/span><\/a><span style=\"font-weight: 400;\">. Translations and cross-lingual training are options, but can create responses that sound unnatural to native speakers. As Sarlin said, \u201cWe don\u2019t want a model that sounds like an American speaking Finnish.\u201d<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"OpenLLM_France\"><\/span>OpenLLM France<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">France is one of the more active countries in AI development, with Mistral and Hugging Face leading the way. 
From a community perspective, the country also has <\/span><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/huggingface.co\/OpenLLM-France\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">OpenLLM France<\/span><\/a><span style=\"font-weight: 400;\">. The project (unsurprisingly) focuses on French language models, offering several models with different parameter counts, along with datasets that help other projects train and improve their French-language models. <\/span><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/github.com\/OpenLLM-France\/Claire-datasets?tab=readme-ov-file#parliamentary-proceedings\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">The datasets include<\/span><\/a><span style=\"font-weight: 400;\"> a mix of political discourse, meeting recordings, theatre shows, and casual conversations. The project also maintains <\/span><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/huggingface.co\/spaces\/le-leadboard\/OpenLLMFrenchLeaderboard\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">a leaderboard<\/span><\/a><span style=\"font-weight: 400;\"> of French models on Hugging Face<\/span><span style=\"font-weight: 400;\">, one of the few (active) European language model benchmark pages.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Do_Europeans_care_about_multilingual_AI\"><\/span>Do Europeans care about multilingual AI?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Europe is full of people and projects working on multilingual language models. But do consumers care? Unfortunately, getting language usage rates for proprietary tools such as ChatGPT or Mistral is almost impossible. 
I created a <\/span><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/www.linkedin.com\/posts\/chrischinchilla_i-am-working-on-a-piece-about-using-llms-activity-7328081633889169409-7FKM?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAAIljVUBH0xZMvfzrbeANZYOeLlaZ8y5g8E\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">poll on LinkedIn<\/span><\/a><span style=\"font-weight: 400;\"> asking if people use AI tools in their native language, English, or a mixture of both. The results were a 50\/50 split between English and a mixture of languages. This could indicate that the number of people using AI tools in a non-English language is higher than you think.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Typically, people use AI tools in English for work and in their own language for personal tasks.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Kaffee, a German and English speaker, said: \u201cI use them mostly in English because I speak English at work and with my partner at home. But then, for personal tasks\u2026, I use German.\u201d<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Kaffee mentioned that Hugging Face was working on a soon-to-be-published research project that fully analysed the usage of multilingual models on the platform. She also noted anecdotally that their usage is on the rise.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u201cUsers have a conception that models are now more multilingual. 
And with the accessibility through large models like <\/span><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/www.llama.com\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">Llama<\/span><\/a><span style=\"font-weight: 400;\">, for example, being multilingual, I think that made a big impact on the research world regarding multilingual models and the number of people wanting to now use them in their own language.\u201d<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The internet was always supposed to be global and for everyone, but the damning statistic that <\/span><span style=\"font-weight: 400;\">50% of sites are in <\/span><span style=\"font-weight: 400;\">English shows it never really worked out that way. We\u2019re entering a new phase in how we access information and who controls it. Maybe this time, the (AI) revolution will be international.<\/span><\/p>\n<\/p><\/div>\n<p><span style=\"color: black;\"><a style=\"color: #ff9900;\" href=\"https:\/\/thenextweb.com\/news\/making-multilingual-ai-in-europe\" target=\"_blank\" >Source<\/a><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The European Union has 24 official languages and dozens more unofficial ones spoken across the continent. If you add in the European countries outside the union, then that brings at least a dozen more into the mix. 
Add dialects, endangered languages, and languages brought by migrants to Europe, and you end up with hundreds of&#8230;<\/p>\n","protected":false},"author":1,"featured_media":678096,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"https:\/\/img-cdn.tnwcdn.com\/image\/tnw-blurple?filter_last=1&fit=1280%2C640&url=https%3A%2F%2Fcdn0.tnwcdn.com%2Fwp-content%2Fblogs.dir%2F1%2Ffiles%2F2025%2F06%2FUntitled-design-2-1.jpg&signature=78bae8d540942519ca137acce026ae8f","fifu_image_alt":"","footnotes":""},"categories":[18],"tags":[],"class_list":["post-678095","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technology"],"_links":{"self":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/678095","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/comments?post=678095"}],"version-history":[{"count":0,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/678095\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media\/678096"}],"wp:attachment":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media?parent=678095"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/categories?post=678095"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/tags?post=678095"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}