{"id":700687,"date":"2025-11-24T02:10:14","date_gmt":"2025-11-23T23:10:14","guid":{"rendered":"https:\/\/buradabiliyorum.com\/en\/new-web-standards-could-redefine-how-ai-models-use-your-content\/"},"modified":"2025-11-24T02:10:14","modified_gmt":"2025-11-23T23:10:14","slug":"new-web-standards-could-redefine-how-ai-models-use-your-content","status":"publish","type":"post","link":"https:\/\/buradabiliyorum.com\/en\/new-web-standards-could-redefine-how-ai-models-use-your-content\/","title":{"rendered":"New web standards could redefine how AI models use your content"},"content":{"rendered":"<h2 class=\"subhead\" itemprop=\"alternativeHeadline\"><span class=\"ez-toc-section\" id=\"A_new_protocol_could_give_you_power_over_how_AI_models_collect_and_use_your_content_See_what_rules_are_being_drafted_and_why_they_matter\"><\/span>A new protocol could give you power over how AI models collect and use your content. 
See what rules are being drafted and why they matter.<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<div class=\"bialty-container\">\n<p>In recent years, the open web has felt like the Wild West. Creators have seen their work scraped, processed, and fed into large language models \u2013\u00a0mostly without their consent.<\/p>\n<p>It became a data free-for-all, with almost no way for site owners to opt out or protect their work.<\/p>\n<p>There have been efforts, such as the <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/llmstxt.org\/\">llms.txt initiative from Jeremy Howard<\/a>. Where robots.txt lets site owners allow or block crawlers, llms.txt is a proposed file that guides AI systems to a site\u2019s key content; it does not define access rules.<\/p>\n<p>But there\u2019s no clear evidence that AI companies read or honor llms.txt. Plus, Google has explicitly said it doesn\u2019t support it.<\/p>\n<p>However, a new protocol is now emerging to give site owners control over how AI companies use their content. It may become part of robots.txt, allowing owners to set clear rules for how AI systems can access and use their sites.<\/p>\n<h2 id=\"ietf-ai-preferences-working-group\" class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"IETF_AI_Preferences_Working_Group\"><\/span>IETF AI Preferences Working Group<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>To address this, the Internet Engineering Task Force (IETF) <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/www.ietf.org\/blog\/ai-pref-progress\/\">launched<\/a> the AI Preferences Working Group in January. 
The group is creating standardized, machine-readable rules that let site owners spell out whether and how AI systems can use their content.<\/p>\n<p>Since its founding in 1986, the IETF has defined the core protocols that power the Internet, including TCP\/IP, HTTP, DNS, and TLS.<\/p>\n<p>Now it\u2019s developing standards for the AI era of the open web. The AI Preferences Working Group is co-chaired by Mark Nottingham and Suresh Krishnan, with participation from leaders at Google, Microsoft, Meta, and others.<\/p>\n<p>Notably, Google\u2019s Gary Illyes is also part of the working group.<\/p>\n<p>The <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/datatracker.ietf.org\/wg\/aipref\/about\/\">goal<\/a> of this group:<\/p>\n<ul class=\"wp-block-list\">\n<li>\u201cThe AI Preferences Working Group will standardize building blocks that allow for the expression of preferences about how content is collected and processed for Artificial Intelligence (AI) model development, deployment, and use.\u201d<\/li>\n<\/ul>\n<h2 id=\"what-the-ai-preferences-group-is-proposing\" class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"What_the_AI_Preferences_Group_is_proposing\"><\/span>What the AI Preferences Group is proposing<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/datatracker.ietf.org\/doc\/charter-ietf-aipref\/\">This working group will deliver<\/a> new standards that give site owners control over how LLM-powered systems use their content on the open web. The charter lists three deliverables:<\/p>\n<ul class=\"wp-block-list\">\n<li>A standards-track document covering vocabulary for expressing AI-related preferences, independent of how those preferences are associated with content.<\/li>\n<li>Standards-track document(s) describing means of attaching or associating those preferences with content in IETF-defined protocols and formats, including but not limited to using Well-Known URIs (<a rel=\"nofollow\" 
target=\"_blank\" href=\"https:\/\/datatracker.ietf.org\/doc\/rfc8615\/\">RFC 8615<\/a>) such as the Robots Exclusion Protocol (<a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/datatracker.ietf.org\/doc\/rfc9309\/\">RFC 9309<\/a>), and HTTP response header fields.<\/li>\n<li>A standard method for reconciling multiple expressions of preferences.<\/li>\n<\/ul>\n<p><strong>As of this writing, nothing from the group is final yet.<\/strong> But they have published early documents that offer a glimpse into what the standards might look like.<\/p>\n<p>Two main documents were published by this working group in August.<\/p>\n<ul class=\"wp-block-list\">\n<li><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/datatracker.ietf.org\/doc\/draft-ietf-aipref-vocab\/\">A Vocabulary For Expressing AI Usage Preferences<\/a><\/li>\n<li><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/datatracker.ietf.org\/doc\/draft-ietf-aipref-attach\/\">Associating AI Usage Preferences with Content in HTTP<\/a> (Illyes is one of the authors of this document)<\/li>\n<\/ul>\n<p>Together, these documents propose updates to the existing <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/datatracker.ietf.org\/doc\/html\/rfc9309\">Robots Exclusion Protocol (RFC 9309)<\/a>, adding new rules and definitions that let site owners spell out how they want AI systems to use their content on the web.<\/p>\n<h2 id=\"how-it-might-work\" class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"How_it_might_work\"><\/span>How it might work<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Different AI systems on the web are categorized and given standard labels. 
It\u2019s still unclear whether there will be a directory where site owners can look up how each system is labeled.<\/p>\n<p>These are the labels defined so far:<\/p>\n<ul class=\"wp-block-list\">\n<li><strong>search:<\/strong> for indexing\/discoverability<\/li>\n<li><strong>train-ai:<\/strong> for general AI training<\/li>\n<li><strong>train-genai:<\/strong> for generative AI model training<\/li>\n<li><strong>bots:<\/strong> for all forms of automated processing (including crawling\/scraping)<\/li>\n<\/ul>\n<p>For each of these labels, two values can be set:<\/p>\n<ul class=\"wp-block-list\">\n<li>y to allow<\/li>\n<li>n to disallow<\/li>\n<\/ul>\n<figure class=\"wp-block-image size-full\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1290\" height=\"1028\" src=\"https:\/\/searchengineland.com\/wp-content\/seloads\/2025\/11\/relationship-between-categories-of-use.png\" alt=\"Relationship Between Categories Of Use\" class=\"wp-image-464992\" srcset=\"https:\/\/searchengineland.com\/wp-content\/seloads\/2025\/11\/relationship-between-categories-of-use.png 1290w, https:\/\/searchengineland.com\/wp-content\/seloads\/2025\/11\/relationship-between-categories-of-use-768x612.png 768w\" sizes=\"(max-width: 1290px) 100vw, 
1290px\"><\/figure>\n<p>The documents also note that these rules can be set at the folder level and customized for different bots. In robots.txt, they\u2019re applied through a new Content-Usage field, similar to how the Allow and Disallow fields work today.<\/p>\n<p>Here is an example robots.txt that the working group <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/datatracker.ietf.org\/doc\/draft-ietf-aipref-attach\/#:~:text=Figure%202%20shows%20a%20simple%20%22robots.txt%22%20document\">included in the document<\/a>:<\/p>\n<p>User-Agent: *<br \/>Allow: \/<br \/>Disallow: \/never\/<br \/>Content-Usage: train-ai=n<br \/>Content-Usage: \/ai-ok\/ train-ai=y<\/p>\n<p>Explanation:<br \/>Content-Usage: train-ai=n means that, by default, no content on the site may be used for AI training, while Content-Usage: \/ai-ok\/ train-ai=y makes an exception: content under the \/ai-ok\/ path may be used for training.<\/p>\n<h2 id=\"why-does-this-matter\" class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Why_does_this_matter\"><\/span>Why does this matter?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>There\u2019s been a lot of buzz in the SEO world about llms.txt and why site owners should use it alongside robots.txt, but no AI company has confirmed that its crawlers actually honor it. And we know Google doesn\u2019t use llms.txt.<\/p>\n<p>Still, site owners want clearer control over how AI companies use their content \u2013\u00a0whether for training models or powering RAG-based answers.<\/p>\n<p>The IETF\u2019s work on these new standards feels like a step in the right direction. 
And with Illyes involved as an author, I\u2019m hopeful that once the standards are finalized, Google and other tech companies will adopt them and respect the new robots.txt rules when scraping content.<\/p>\n<\/div>\n<p><span style=\"color: black;\"><a style=\"color: #ff9900;\" href=\"https:\/\/searchengineland.com\/robots-exclusions-new-rules-definitions-ietf-464990\" target=\"_blank\" >Source<\/a><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>A new protocol could give you power over how AI models collect and use your content. See what rules are being drafted and why they matter. In recent years, the open web has felt like the Wild West. 
Creators have seen their work scraped, processed, and fed into large language models \u2013\u00a0mostly without their consent&#8230;.<\/p>\n","protected":false},"author":1,"featured_media":700688,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"https:\/\/searchengineland.com\/wp-content\/seloads\/2025\/11\/ai-content-crawlers-1920.png","fifu_image_alt":"","footnotes":""},"categories":[18],"tags":[],"class_list":["post-700687","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technology"],"_links":{"self":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/700687","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/comments?post=700687"}],"version-history":[{"count":0,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/700687\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media\/700688"}],"wp:attachment":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media?parent=700687"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/categories?post=700687"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/tags?post=700687"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}