{"id":602722,"date":"2023-12-28T13:04:20","date_gmt":"2023-12-28T10:04:20","guid":{"rendered":"https:\/\/en.buradabiliyorum.com\/researchers-use-ai-chatbots-against-themselves-to-jailbreak-each-other\/"},"modified":"2023-12-28T13:04:20","modified_gmt":"2023-12-28T10:04:20","slug":"researchers-use-ai-chatbots-against-themselves-to-jailbreak-each-other","status":"publish","type":"post","link":"https:\/\/buradabiliyorum.com\/en\/researchers-use-ai-chatbots-against-themselves-to-jailbreak-each-other\/","title":{"rendered":"#Researchers use AI chatbots against themselves to &#8216;jailbreak&#8217; each other"},"content":{"rendered":"<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_85 counter-hierarchy ez-toc-counter ez-toc-custom ez-toc-container-direction\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of Contents<\/p>\n<label for=\"ez-toc-cssicon-toggle-item-6a40b13cd670b\" class=\"ez-toc-cssicon-toggle-label\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #dd3333;color:#dd3333\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #dd3333;color:#dd3333\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/label><input type=\"checkbox\"  id=\"ez-toc-cssicon-toggle-item-6a40b13cd670b\" checked aria-label=\"Toggle\" \/><nav><ul class='ez-toc-list 
ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/buradabiliyorum.com\/en\/researchers-use-ai-chatbots-against-themselves-to-jailbreak-each-other\/#Testing_the_limits_of_LLM_ethics\" >Testing the limits of LLM ethics<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/buradabiliyorum.com\/en\/researchers-use-ai-chatbots-against-themselves-to-jailbreak-each-other\/#Escalating_arms_race_between_hackers_and_LLM_developers\" >Escalating arms race between hackers and LLM developers<\/a><\/li><\/ul><\/nav><\/div>\n<div>\n<div class=\"article-gallery lightGallery\">\n<div data-thumb=\"https:\/\/scx1.b-cdn.net\/csz\/news\/tmb\/2023\/researchers-use-ai-cha.jpg\" data-src=\"https:\/\/scx2.b-cdn.net\/gfx\/news\/hires\/2023\/researchers-use-ai-cha.jpg\" data-sub-html=\"NTU Ph.D. student Mr. Liu Yi, who co-authored the paper, shows a database of successful jailbreaking prompts which managed to compromise AI chatbots, causing them to produce information that their developers deliberately restricted from revealing. Credit: Nanyang Technological University\">\n<figure class=\"article-img\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/scx1.b-cdn.net\/csz\/news\/800a\/2023\/researchers-use-ai-cha.jpg\" alt=\"Researchers use AI chatbots against themselves to \u2018jailbreak\u2019 each other\" title=\"NTU Ph.D. student Mr. Liu Yi, who co-authored the paper, shows a database of successful jailbreaking prompts which managed to compromise AI chatbots, causing them to produce information that their developers deliberately restricted from revealing. Credit: Nanyang Technological University\" width=\"800\" height=\"530\"\/><figcaption class=\"text-darken text-low-up text-truncate-js text-truncate mt-3\">\n                NTU Ph.D. student Mr. 
Liu Yi, who co-authored the paper, shows a database of successful jailbreaking prompts which managed to compromise AI chatbots, causing them to produce information that their developers deliberately restricted from revealing. Credit: Nanyang Technological University<br \/>\n            <\/figcaption><\/figure>\n<\/div>\n<\/div>\n<p>Computer scientists from Nanyang Technological University, Singapore (NTU Singapore) have managed to compromise multiple artificial intelligence (AI) chatbots, including ChatGPT, Google Bard and Microsoft Bing Chat, to produce content that breaches their developers&#8217; guidelines, an outcome known as &#8220;jailbreaking.&#8221;<\/p>\n<p>&#8220;Jailbreaking&#8221; is a term in computer security where computer hackers find and exploit flaws in a system&#8217;s software to make it do something its developers deliberately restricted it from doing.<\/p>\n<p>Furthermore, by training a large language model (LLM) on a database of prompts that had already been shown to hack these chatbots successfully, the researchers created an LLM chatbot capable of automatically generating further prompts to jailbreak other chatbots.<\/p>\n<p>LLMs form the brains of AI chatbots, enabling them to process human inputs and generate text that is almost indistinguishable from that which a human can create. This includes completing tasks such as planning a trip itinerary, telling a bedtime story, and developing computer code.<\/p>\n<p>The NTU researchers&#8217; work now adds &#8220;jailbreaking&#8221; to the list. 
Their findings may be critical in helping companies and businesses to be aware of the weaknesses and limitations of their LLM chatbots so that they can take steps to strengthen them against hackers.<\/p>\n<p>After running a series of proof-of-concept tests on LLMs to prove that their technique indeed presents a clear and present threat to them, the researchers immediately reported the issues to the relevant service providers upon initiating successful jailbreak attacks.<\/p>\n<div class=\"article-gallery lightGallery\">\n<div data-thumb=\"https:\/\/scx1.b-cdn.net\/csz\/news\/tmb\/2023\/researchers-use-ai-cha-1.jpg\" data-src=\"https:\/\/scx2.b-cdn.net\/gfx\/news\/2023\/researchers-use-ai-cha-1.jpg\" data-sub-html=\"A jailbreak attack example. Credit: &lt;i&gt;arXiv&lt;\/i&gt; (2023). DOI: 10.48550\/arxiv.2307.08715\">\n<figure class=\"article-img text-center\"><img decoding=\"async\" src=\"https:\/\/scx1.b-cdn.net\/csz\/news\/800a\/2023\/researchers-use-ai-cha-1.jpg\" alt=\"Researchers use AI chatbots against themselves to 'jailbreak' each other\" title=\"A jailbreak attack example. Credit: arXiv (2023). DOI: 10.48550\/arxiv.2307.08715\"\/><figcaption class=\"text-left text-darken text-truncate text-low-up mt-3\">\n                A jailbreak attack example. Credit: <i>arXiv<\/i> (2023). 
DOI: 10.48550\/arxiv.2307.08715<br \/>\n            <\/figcaption><\/figure>\n<\/div>\n<\/div>\n<p>Professor Liu Yang from NTU&#8217;s School of Computer Science and Engineering, who led the study, said, &#8220;Large Language Models (LLMs) have proliferated rapidly due to their exceptional ability to understand, generate, and complete human-like text, with LLM chatbots being highly popular applications for everyday use.&#8221;<\/p>\n<p>&#8220;The developers of such AI services have guardrails in place to prevent AI from generating violent, unethical, or criminal content. But AI can be outwitted, and now we have used AI against its own kind to &#8216;jailbreak&#8217; LLMs into producing such content.&#8221;<\/p>\n<p>NTU Ph.D. student Mr. Liu Yi, who co-authored the paper, said, &#8220;The paper presents a novel approach for automatically generating jailbreak prompts against fortified LLM chatbots. Training an LLM with jailbreak prompts makes it possible to automate the generation of these prompts, achieving a much higher success rate than existing methods. 
In effect, we are attacking chatbots by using them against themselves.&#8221;<\/p>\n<p>The researchers&#8217; paper describes a two-fold method for &#8220;jailbreaking&#8221; LLMs, which they named &#8220;Masterkey.&#8221;<\/p>\n<p>First, they reverse-engineered how LLMs detect and defend themselves from malicious queries. With that information, they taught an LLM to automatically learn and produce prompts that bypass the defenses of other LLMs. This process can be automated, creating a jailbreaking LLM that can adapt to and create new jailbreak prompts even after developers patch their LLMs.<\/p>\n<p>The researchers&#8217; paper, which appears on the pre-print server <i>arXiv<\/i>, has been accepted for presentation at the Network and Distributed System Security Symposium, a leading security forum, in San Diego, U.S., in February 2024.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Testing_the_limits_of_LLM_ethics\"><\/span>Testing the limits of LLM ethics<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>AI chatbots receive prompts, or a series of instructions, from human users. All LLM developers set guidelines to prevent chatbots from generating unethical, questionable, or illegal content. For example, asking an AI chatbot how to create malicious software to hack into bank accounts often results in a flat refusal to answer on the grounds of criminal activity.<\/p>\n<p>Professor Liu said, &#8220;Despite their benefits, AI chatbots remain vulnerable to jailbreak attacks. They can be compromised by malicious actors who abuse vulnerabilities to force chatbots to generate outputs that violate established rules.&#8221;<\/p>\n<p>The NTU researchers probed into ways of circumventing a chatbot by engineering prompts that slip under the radar of its ethical guidelines so that the chatbot is tricked into responding to them. 
For example, AI developers rely on keyword censors that pick up certain words that could flag potentially questionable activity and refuse to answer if such words are detected.<\/p>\n<p>One strategy the researchers employed to get around keyword censors was to create a persona that provided prompts simply containing spaces after each character. This circumvents LLM censors, which might operate from a list of banned words.<\/p>\n<p>The researchers also instructed the chatbot to reply in the guise of a persona &#8220;unreserved and devoid of moral restraints,&#8221; increasing the chances of producing unethical content.<\/p>\n<p>The researchers could infer the LLMs&#8217; inner workings and defenses by manually entering such prompts and observing the time for each prompt to succeed or fail. They were then able to reverse engineer the LLMs&#8217; hidden defense mechanisms, further identify their ineffectiveness and create a dataset of prompts which managed to jailbreak the chatbot.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Escalating_arms_race_between_hackers_and_LLM_developers\"><\/span>Escalating arms race between hackers and LLM developers<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>When vulnerabilities are found and revealed by hackers, AI chatbot developers respond by &#8220;patching&#8221; the issue, in an endlessly repeating cycle of cat-and-mouse between hacker and developer.<\/p>\n<p>With Masterkey, the NTU computer scientists upped the ante in this arms race as an AI jailbreaking chatbot can produce a large volume of prompts and continuously learn what works and what does not, allowing hackers to beat LLM developers at their own game with 
their own tools.<\/p>\n<p>The researchers first created a training dataset comprising prompts they found effective during the earlier jailbreaking reverse-engineering phase, together with unsuccessful prompts, so that Masterkey knows what not to do. The researchers fed this dataset into an LLM as a starting point and subsequently performed continuous pre-training and task tuning.<\/p>\n<p>This exposes the model to a diverse array of information and sharpens the model&#8217;s abilities by training it on tasks directly linked to jailbreaking. The result is an LLM that can better predict how to manipulate text for jailbreaking, leading to more effective and universal prompts.<\/p>\n<p>The researchers found the prompts generated by Masterkey were three times more effective than prompts generated by LLMs in jailbreaking LLMs. Masterkey was also able to learn from past prompts that failed and can be automated to constantly produce new, more effective prompts.<\/p>\n<p>The researchers say their LLM can be employed by developers themselves to strengthen their security.<\/p>\n<p>NTU Ph.D. student Mr. Deng Gelei, who co-authored the paper, said, &#8220;As LLMs continue to evolve and expand their capabilities, manual testing becomes both labor-intensive and potentially inadequate in covering all possible vulnerabilities. An automated approach to generating jailbreak prompts can ensure comprehensive coverage, evaluating a wide range of possible misuse scenarios.&#8221;<\/p>\n<div class=\"article-main__more p-4\">\n                                                                                                <strong>More information:<\/strong><br \/>\n                                                Gelei Deng et al, MasterKey: Automated Jailbreak Across Multiple Large Language Model Chatbots, <i>arXiv<\/i> (2023). 
<a rel=\"nofollow noopener\" target=\"_blank\" data-doi=\"1\" href=\"https:\/\/dx.doi.org\/10.48550\/arxiv.2307.08715\">DOI: 10.48550\/arxiv.2307.08715<\/a><\/p>\n<div class=\"mt-3\">\n                                                    <strong>Journal information:<\/strong><br \/>\n                                                                                                            <cite>arXiv<\/cite><br \/>\n                                                        <a rel=\"nofollow noopener\" target=\"_blank\" class=\"icon_open\" href=\"http:\/\/arxiv.org\/\"><br \/>\n                                                            <svg><use href=\"https:\/\/techx.b-cdn.net\/tmpl\/v2\/img\/svg\/sprite.svg#icon_open\" x=\"0\" y=\"0\"\/><\/svg><\/a>\n                                                                                                    <\/div>\n<\/p><\/div>\n<div class=\"d-inline-block text-medium my-4\">\n                                                Provided by<br \/>\n                                                                                                    Nanyang Technological University<br \/>\n                                                                                                        <a rel=\"nofollow noopener\" target=\"_blank\" class=\"icon_open\" href=\"http:\/\/www.ntu.edu.sg\/Pages\/default.aspx\"><br \/>\n                                                        <svg><use href=\"https:\/\/techx.b-cdn.net\/tmpl\/v2\/img\/svg\/sprite.svg#icon_open\" x=\"0\" y=\"0\"\/><\/svg><\/a><\/p><\/div>\n<p>                                        <!-- print only --><\/p>\n<div class=\"d-none d-print-block\">\n<p>                                                <strong>Citation<\/strong>:<br \/>\n                                                Researchers use AI chatbots against themselves to &#8216;jailbreak&#8217; each other (2023, December 28)<br \/>\n                                                retrieved 28 December 2023<br 
\/>\n                                                from https:\/\/techxplore.com\/news\/2023-12-ai-chatbots-jailbreak.html<\/p>\n<p>                                            This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no<br \/>\n                                            part may be reproduced without the written permission. The content is provided for information purposes only.<\/p><\/div>\n<\/p><\/div>\n<p><span style=\"color: black;\"><a style=\"color: #ff9900;\" href=\"https:\/\/techxplore.com\/news\/2023-12-ai-chatbots-jailbreak.html\" target=\"_blank\" 
rel=\"noopener\">Source<\/a><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>NTU Ph.D. student Mr. Liu Yi, who co-authored the paper, shows a database of successful jailbreaking prompts which managed to compromise AI chatbots, causing them to produce information that their developers deliberately restricted from revealing. Credit: Nanyang Technological University Computer scientists from Nanyang Technological University, Singapore (NTU Singapore) have managed to compromise multiple artificial intelligence&#8230;<\/p>\n","protected":false},"author":1,"featured_media":602723,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"https:\/\/scx2.b-cdn.net\/gfx\/news\/hires\/2023\/researchers-use-ai-cha.jpg","fifu_image_alt":"","footnotes":""},"categories":[16],"tags":[],"class_list":["post-602722","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-sciencee"],"_links":{"self":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/602722","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/comments?post=602722"}],"version-history":[{"count":0,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/602722\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media\/602723"}],"wp:attachment":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media?parent=602722"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/categories?post=602722"},{"taxonomy":"post_tag","embe
ddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/tags?post=602722"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}