{"id":594085,"date":"2023-10-12T16:51:50","date_gmt":"2023-10-12T13:51:50","guid":{"rendered":"https:\/\/en.buradabiliyorum.com\/new-technique-makes-ai-hallucinations-wake-up-and-face-reality\/"},"modified":"2023-10-12T16:51:50","modified_gmt":"2023-10-12T13:51:50","slug":"new-technique-makes-ai-hallucinations-wake-up-and-face-reality","status":"publish","type":"post","link":"https:\/\/buradabiliyorum.com\/en\/new-technique-makes-ai-hallucinations-wake-up-and-face-reality\/","title":{"rendered":"New technique makes AI hallucinations wake up and face reality"},"content":{"rendered":"<div id=\"article-main-content\">\n<p>Chatbots<span> have an alarming propensity to generate false information, but present it as accurate. This phenomenon, known as AI hallucinations, has various adverse effects. At best, it restricts the benefits of artificial intelligence. At worst, it can cause real-world harm to people.<\/span><\/p>\n<p>As generative AI enters the mainstream, the alarm bells are ringing louder. In response, a team of European researchers has been <span>vigorously experimenting with remedies.<\/span> Last week, the team unveiled a promising solution. They say <span style=\"font-weight: 400;\">it can reduce AI hallucinations to single-figure percentages.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The system is the brainchild of <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/iris.ai\/\">Iris.ai<\/a>, an Oslo-based startup. Founded in 2015, the company has built an AI engine for understanding scientific text. The software scours vast quantities of research data, which it then analyses, categorises, and summarises. <span>\u00a0<\/span><\/span><\/p>\n<p>Customers include the <span style=\"font-weight: 400;\">Finnish Food Authority<span>. 
The government agency <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/www.computerweekly.com\/news\/366543192\/The-Finnish-Food-Authority-uses-AI-to-accelerate-research\">used<\/a> the system to accelerate research on a potential avian flu crisis. <\/span><\/span><span style=\"font-weight: 400;\">According to Iris.ai, the platform saves 75% of a researcher\u2019s time.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">What doesn\u2019t save their time is AI <\/span><span style=\"font-weight: 400;\">hallucinating. <\/span><\/p>\n<p><span><\/p>\n<blockquote class=\"c-richText__pullQuote\">\n<div class=\"c-richText__pullQuoteGradient\">\n<p class=\"c-richText__pullQuoteQuote\">\u201cThe key is returning responses that match what a human expert would say.\n            <\/div>\n<\/blockquote>\n<p><\/span><\/p>\n<p><span style=\"font-weight: 400;\">Today\u2019s large language models (LLMs) are notorious for spitting out nonsensical and false information. <\/span><span style=\"font-weight: 400;\">Endless examples of these outputs have emerged in recent months.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Sometimes the inaccuracies cause reputational damage. 
At the launch demo of\u00a0Microsoft Bing AI, for instance, the system <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/www.cnbc.com\/2023\/02\/14\/microsoft-bing-ai-made-several-errors-in-launch-demo-last-week-.html\">produced<\/a> an error-strewn analysis of Gap\u2019s earnings report.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">At other times, the erroneous outputs can be more harmful. ChatGPT can spout dangerous <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/nypost.com\/2023\/08\/25\/chatgpts-cancer-treatment-advice-potentially-dangerous\/\">medical recommendations.<\/a> Security analysts <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/www.securityweek.com\/chatgpt-hallucinations-can-be-exploited-to-distribute-malicious-code-packages\/\">fear<\/a> the chatbot\u2019s hallucinations could even drive malicious code packages towards software developers.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u201cUnfortunately, LLMs are so good in phrasing that it is hard to distinguish hallucinations from factually valid generated text,\u201d <span>Iris.ai CTO Victor Botev tells TNW.<\/span> \u201cIf this issue is not overcome, users of models will have to dedicate more resources to validating outputs rather than generating them.\u201d<\/span><\/p>\n<p><span style=\"font-weight: 400;\">AI hallucinations are also hampering AI\u2019s value in research.\u00a0<\/span><span style=\"font-weight: 400;\">In an Iris.ai survey of 500 corporate R&amp;D workers, only 22% of respondents said they trust systems like ChatGPT. Nonetheless, 84% of them still use ChatGPT as their primary AI tool to support research. Eek.<\/span><\/p>\n<p>These problematic practices spurred Iris.ai\u2019s work on AI hallucinations.<\/p>\n<p>Iris.ai uses several methods to measure the accuracy of AI outputs. 
<span style=\"font-weight: 400;\">The most crucial technique is validating factual correctness.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u201cWe map out the key knowledge concepts we expect to see in a correct answer,\u201d Botev says. \u201cThen we check if the AI\u2019s answer contains those facts and whether they come from reliable sources.\u201d<\/span><\/p>\n<p><span style=\"font-weight: 400;\">A secondary technique compares the AI-generated response to a verified \u201cground truth.\u201d\u00a0Using a proprietary metric dubbed <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/dl.acm.org\/doi\/10.1145\/3127526.3127530\">WISDM<\/a>, the software<\/span><span style=\"font-weight: 400;\"> scores the AI output\u2019s semantic similarity to the ground truth. This covers checks on the topics, structure, and key information.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Another method examines the coherence of the answer. To do this, Iris.ai ensures the output incorporates relevant subjects, data, and sources for the question at hand \u2014 rather than unrelated inputs.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The combination of techniques creates a benchmark for factual accuracy.<\/span><\/p>\n<p><span>\u201cThe key for us is not just returning any response, but returning responses that closely match what a human expert would say,\u201d Botev says.<\/span><\/p>\n<figure class=\"post-image post-mediaBleed aligncenter\"><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-1400626 js-lazy\" alt=\"Iris.ai founders (left to right) Maria Ritola, Jacobo Elosua, Anita Schj\u00f8ll Abildgaard, and Victor Botev\" width=\"1024\" height=\"508\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2023\/10\/Team-slide_6.jpeg\" srcset=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2023\/10\/Team-slide_6.jpeg 1024w, 
https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2023\/10\/Team-slide_6-280x139.jpeg 280w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2023\/10\/Team-slide_6-270x135.jpeg 270w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2023\/10\/Team-slide_6-540x268.jpeg 540w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2023\/10\/Team-slide_6-796x395.jpeg 796w\"\/><figcaption>Iris.ai founders (left to right) Maria Ritola, Jacobo Elosua, Anita Schj\u00f8ll Abildgaard, and Victor Botev. 
Credit: Iris.ai<\/figcaption><noscript><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-1400626\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2023\/10\/Team-slide_6.jpeg\" alt=\"Iris.ai founders (left to right) Maria Ritola, Jacobo Elosua, Anita Schj\u00f8ll Abildgaard, and Victor Botev\" width=\"1024\" height=\"508\" srcset=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2023\/10\/Team-slide_6.jpeg 1024w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2023\/10\/Team-slide_6-280x139.jpeg 280w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2023\/10\/Team-slide_6-270x135.jpeg 270w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2023\/10\/Team-slide_6-540x268.jpeg 540w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2023\/10\/Team-slide_6-796x395.jpeg 796w\"\/><\/noscript><\/figure>\n<p><span style=\"font-weight: 400;\">Under the covers, the Iris.ai system harnesses knowledge graphs, which<\/span><span style=\"font-weight: 400;\"> show relationships between data.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The knowledge graphs assess and demonstrate the steps a language model takes to reach its outputs. Essentially, they generate a chain of thoughts that the model should follow.<\/span><\/p>\n<p>The approach simplifies the verification process. <span style=\"font-weight: 400;\">By asking a model\u2019s chat function to split requests into smaller parts and then displaying the right steps, problems can be identified and resolved.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The structure could even prompt a model to identify and correct its own mistakes. 
As a result, a coherent and factually correct answer could be automatically produced.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><span><\/p>\n<blockquote class=\"c-richText__pullQuote\">\n<div class=\"c-richText__pullQuoteGradient\">\n<p class=\"c-richText__pullQuoteQuote\">\u201cWe need to break down AI\u2019s decision-making.\n            <\/div>\n<\/blockquote>\n<p><\/span>\u00a0<\/span><span style=\"font-weight: 400;\"\/><\/p>\n<p>Iris.ai has now integrated the tech into a new Chat feature, which has been added to the company\u2019s Researcher Workspace platform. In preliminary tests, the <span>feature reduced AI hallucinations to single-figure percentages.<\/span><\/p>\n<p>The problem, however, has not been entirely solved. While the approach appears effective for researchers on the Iris.ai platform, the method will be difficult to<span> scale for popular LLMs<\/span><span>. According to <span style=\"font-weight: 400;\">Botev, the c<\/span>hallenges don\u2019t stem from the tech, but from the users.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">When someone does<\/span><span style=\"font-weight: 400;\"> a Bing AI search, for instance, they may have little knowledge of the subject they\u2019re investigating. Consequently, they can misinterpret the results they receive.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u201cPeople self-misdiagnose illnesses all the time by searching their symptoms online,\u201d Botev says. \u201cWe need to be able to break down AI\u2019s decision-making process in a clear, explainable way.\u201d<\/span><\/p>\n<p>The main cause of AI hallucinations is training data issues. <span style=\"font-weight: 400;\">Microsoft recently unveiled a novel solution to the problem. 
The company\u2019s new <\/span><a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/textbooks-are-all-you-need-ii-phi-1-5-technical-report\/\"><span style=\"font-weight: 400;\">Phi-1.5 model<\/span><\/a><span style=\"font-weight: 400;\"> is <\/span><span style=\"font-weight: 400;\">pre-trained on \u201ctextbook quality\u201d data, which is both synthetically generated and filtered from web sources. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">I<\/span><span style=\"font-weight: 400;\">n theory, this technique will mitigate AI hallucinations.<span> If the training data is well structured and promotes reasoning, there should be less scope for a model to hallucinate.\u00a0<\/span><\/span><\/p>\n<p><span style=\"font-weight: 400;\">Another method involves removing bias from the data. <\/span><span style=\"font-weight: 400;\">To do this, Botev suggests training a model on coding language.<\/span><\/p>\n<p>At present, many popular LLMs are trained on a diverse range of data, from novels and newspaper articles to legal documents and social media posts. Inevitably, these sources contain human biases.<\/p>\n<p>In c<span style=\"font-weight: 400;\">oding language, there is a far greater emphasis on reason. This leaves less room for interpretation, which can guide LLMs to factually accurate answers. 
On the other hand, it could give coders a potentially terrifying power.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><span><\/p>\n<blockquote class=\"c-richText__pullQuote\">\n<div class=\"c-richText__pullQuoteGradient\">\n<p class=\"c-richText__pullQuoteQuote\">\u201cIt\u2019s a matter of trust.\n            <\/div>\n<\/blockquote>\n<p><\/span>\u00a0<\/span><span style=\"font-weight: 400;\"\/><\/p>\n<p><span style=\"font-weight: 400;\">Despite its limitations, the Iris.ai<\/span><span style=\"font-weight: 400;\">\u00a0method is a step in the right direction. By using the<\/span><span style=\"font-weight: 400;\"> knowledge graph structure, transparency and explainability can be added to AI.\u00a0\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u201cA wider understanding of the model\u2019s processes, as well as additional outside expertise with black box models, means the root causes of hallucinations across fields can be sooner identified and addressed,\u201d says Botev.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The CTO is also optimistic about external progress in the field.<\/span><span style=\"font-weight: 400;\"> He points to the<\/span><span style=\"font-weight: 400;\"> collaborations with LLM-makers to build larger datasets, infer knowledge graphs from texts, and prepare self-assessment metrics. In the future, this should yield further reductions in AI hallucinations.<\/span><\/p>\n<p>For Botev, the work serves a crucial purpose.<\/p>\n<p><span style=\"font-weight: 400;\">\u201cIt is to a large extent a matter of trust,\u201d he says. \u201cHow can users capitalise on the benefits of AI if they don\u2019t trust the model they\u2019re using to give accurate responses?\u201d<\/span>\n                        <\/div>\n<p><span style=\"color: black;\"><a style=\"color: #ff9900;\" href=\"https:\/\/thenextweb.com\/news\/ai-hallucinations-solution-iris-ai\" target=\"_blank\" rel=\"noopener\">Source<\/a><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Chatbots have an alarming propensity to generate false information, but present it as accurate. This phenomenon, known as AI hallucinations, has various adverse effects. At best, it restricts the benefits of artificial intelligence. At worst, it can cause real-world harm to people. As generative AI enters the mainstream, the alarm bells are ringing louder. 
In&#8230;<\/p>\n","protected":false},"author":1,"featured_media":594086,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"https:\/\/img-cdn.tnwcdn.com\/image\/tnw-blurple?filter_last=1&fit=1280,640&url=https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2023\/10\/Untitled-design-3-1.jpg&signature=fb7eca7b8ef925a65ea4a0bf792ec762","fifu_image_alt":"","footnotes":""},"categories":[18],"tags":[],"class_list":["post-594085","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technology"],"_links":{"self":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/594085","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/comments?post=594085"}],"version-history":[{"count":0,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/594085\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media\/594086"}],"wp:attachment":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media?parent=594085"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/categories?post=594085"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/tags?post=594085"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}