{"id":645017,"date":"2024-11-23T08:07:00","date_gmt":"2024-11-23T05:07:00","guid":{"rendered":"https:\/\/en.buradabiliyorum.com\/openai-accidentally-deleted-potential-evidence-in-ny-times-copyright-lawsuit-updated\/"},"modified":"2024-11-23T08:07:00","modified_gmt":"2024-11-23T05:07:00","slug":"openai-accidentally-deleted-potential-evidence-in-ny-times-copyright-lawsuit-updated","status":"publish","type":"post","link":"https:\/\/buradabiliyorum.com\/en\/openai-accidentally-deleted-potential-evidence-in-ny-times-copyright-lawsuit-updated\/","title":{"rendered":"OpenAI accidentally deleted potential evidence in NY Times copyright lawsuit (updated)"},"content":{"rendered":"<div>\n<p id=\"speakable-summary\" class=\"wp-block-paragraph\">Lawyers for The New York Times and Daily News, which are suing OpenAI for allegedly scraping their works to train its AI models without permission, say OpenAI engineers accidentally deleted data potentially relevant to the case. <\/p>\n<p class=\"wp-block-paragraph\">Earlier this fall, OpenAI agreed to provide two virtual machines so that counsel for The Times and Daily News could perform searches for their copyrighted content in its AI training sets. (Virtual machines are software-based computers that exist within another computer\u2019s operating system, often used for the purposes of testing, backing up data, and running apps.) 
In a <a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/storage.courtlistener.com\/recap\/gov.uscourts.nysd.612697\/gov.uscourts.nysd.612697.328.0.pdf\">letter<\/a>, attorneys for the publishers say that they and experts they hired have spent over 150 hours since November 1 searching OpenAI\u2019s training data.<\/p>\n<p class=\"wp-block-paragraph\">But on November 14, OpenAI engineers erased all the publishers\u2019 search data stored on one of the virtual machines, according to the aforementioned letter, which was filed in the U.S. District Court for the Southern District of New York late Wednesday. <\/p>\n<p class=\"wp-block-paragraph\">OpenAI tried to recover the data \u2014 and was mostly successful. However, because the folder structure and file names were \u201cirretrievably\u201d lost, the recovered data \u201ccannot be used to determine where the news plaintiffs\u2019 copied articles were used to build [OpenAI\u2019s] models,\u201d per the letter.<\/p>\n<p class=\"wp-block-paragraph\">\u201cNews plaintiffs have been forced to recreate their work from scratch using significant person-hours and computer processing time,\u201d counsel for The Times and Daily News wrote. \u201cThe news plaintiffs learned only yesterday that the recovered data is unusable and that an entire week\u2019s worth of its experts\u2019 and lawyers\u2019 work must be re-done, which is why this supplemental letter is being filed today.\u201d<\/p>\n<p class=\"wp-block-paragraph\">The plaintiffs\u2019 counsel makes clear that they have no reason to believe the deletion was intentional. But they do say the incident underscores that OpenAI \u201cis in the best position to search its own datasets\u201d for potentially infringing content using its own tools. 
<\/p>\n<p class=\"wp-block-paragraph\">An OpenAI spokesperson declined to provide a statement.<\/p>\n<p class=\"wp-block-paragraph\">But late Friday, November 22, counsel for OpenAI filed a <a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/storage.courtlistener.com\/recap\/gov.uscourts.nysd.612697\/gov.uscourts.nysd.612697.345.0.pdf\">response<\/a> to the letter sent by lawyers for The Times and Daily News on Wednesday. In their response, OpenAI\u2019s attorneys unequivocally denied that OpenAI deleted any evidence, and instead suggested that the plaintiffs were to blame for a system misconfiguration that led to a technical issue. <\/p>\n<p class=\"wp-block-paragraph\">\u201cPlaintiffs requested a configuration change to one of several machines that OpenAI has provided to search training datasets,\u201d OpenAI\u2019s counsel wrote. \u201cImplementing plaintiffs\u2019 requested change, however, resulted in removing the folder structure and some file names on one hard drive \u2014 a drive that was supposed to be used as a temporary cache \u2026 In any event, there is no reason to think that any files were actually lost.\u201d<\/p>\n<p class=\"wp-block-paragraph\">In this case and others, OpenAI has maintained that training models using publicly available data \u2014 including articles from The Times and Daily News \u2014 is fair use. In other words, in creating models like\u00a0GPT-4o, which \u201clearn\u201d from billions of examples of e-books, essays, and more to generate human-sounding text, OpenAI believes that it isn\u2019t required to license or otherwise pay for the examples \u2014 even if it makes money from those models.<\/p>\n<p class=\"wp-block-paragraph\">That being said, OpenAI has inked licensing deals with a growing number of news publishers, including the Associated Press, Business Insider owner Axel Springer, Financial Times, People parent company Dotdash Meredith, and News Corp. 
OpenAI has declined to make the terms of these deals public, but one content partner, Dotdash, is <a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/www.theverge.com\/2024\/11\/18\/24300144\/openai-is-paying-dotdash-meredith-at-least-16-million-a-year-to-license-its-content-for-ai#:~:text=Richard%20Lawler-,OpenAI%20is%20paying%20Dotdash%20Meredith%20at%20least%20%2416%20million%20a,parent%20company%2C%20Vox%20Media).\">reportedly<\/a> being paid at least $16 million per year.<\/p>\n<p class=\"wp-block-paragraph\">OpenAI has neither confirmed nor denied that it trained its AI systems on any specific copyrighted works without permission.<\/p>\n<p class=\"wp-block-paragraph\"><em>Update: Added OpenAI\u2019s response to the allegations.<\/em><\/p>\n<\/div>\n<p><span style=\"color: black;\"><a style=\"color: #ff9900;\" href=\"https:\/\/techcrunch.com\/2024\/11\/22\/openai-accidentally-deleted-potential-evidence-in-ny-times-copyright-lawsuit\/\" target=\"_blank\" >Source<\/a><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Lawyers for The New York Times and Daily News, which are suing OpenAI for allegedly scraping their works to train its AI models without 
permission, say OpenAI engineers accidentally deleted data potentially relevant to the case. Earlier this fall, OpenAI agreed to provide two virtual machines so that counsel for The Times and Daily News&#8230;<\/p>\n","protected":false},"author":1,"featured_media":645018,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/05\/openAI-spiral-teal.jpg?resize=1200,675","fifu_image_alt":"","footnotes":""},"categories":[18],"tags":[77337,152831,147146,72226,5079,152832,141199],"class_list":["post-645017","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technology","tag-ai","tag-daily-news","tag-generative-ai","tag-lawsuit","tag-new-york-times","tag-ny-times","tag-openai"],"_links":{"self":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/645017","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/comments?post=645017"}],"version-history":[{"count":0,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/645017\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media\/645018"}],"wp:attachment":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media?parent=645017"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/categories?post=645017"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/tags?post=645017"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{r
el}","templated":true}]}}