{"id":658235,"date":"2025-03-21T18:17:35","date_gmt":"2025-03-21T15:17:35","guid":{"rendered":"https:\/\/en.buradabiliyorum.com\/microsoft-is-exploring-a-way-to-credit-contributors-to-ai-training-data\/"},"modified":"2025-03-21T18:17:35","modified_gmt":"2025-03-21T15:17:35","slug":"microsoft-is-exploring-a-way-to-credit-contributors-to-ai-training-data","status":"publish","type":"post","link":"https:\/\/buradabiliyorum.com\/en\/microsoft-is-exploring-a-way-to-credit-contributors-to-ai-training-data\/","title":{"rendered":"Microsoft is exploring a way to credit contributors to AI training data"},"content":{"rendered":"<div>\n<p id=\"speakable-summary\" class=\"wp-block-paragraph\">Microsoft is launching a research project to estimate the influence of specific training examples on the text, images, and other types of media that generative AI models create. <\/p>\n<p class=\"wp-block-paragraph\">That\u2019s <a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/www.linkedin.com\/jobs\/view\/research-intern-training-time-provenance-data-dignity-at-microsoft-4079653050\/\">per a job listing<\/a> dating back to December that was recently recirculated on LinkedIn. <\/p>\n<p class=\"wp-block-paragraph\">According to the listing, which seeks a research intern, the project will attempt to demonstrate that models can be trained in such a way that the impact of particular data \u2014 e.g. photos and books \u2014 on their outputs can be \u201cefficiently and usefully estimated.\u201d<\/p>\n<p class=\"wp-block-paragraph\">\u201cCurrent neural network architectures are opaque in terms of providing sources for their generations, and there are [\u2026] good reasons to change this,\u201d reads the listing. 
\u201c[One is,] incentives, recognition, and potentially pay for people who contribute certain valuable data to unforeseen kinds of models we will want in the future, assuming the future will surprise us fundamentally.\u201d<\/p>\n<p class=\"wp-block-paragraph\">AI-powered text, code, image, video, and song generators are at the center of <a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/www.wired.com\/story\/ai-copyright-case-tracker\/\">a number of IP lawsuits<\/a> against AI companies. Frequently, these companies train their models on massive amounts of data from public websites, some of which is copyrighted. Many of the companies argue that\u00a0<a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/en.wikipedia.org\/wiki\/Fair_use\">fair use doctrine<\/a>\u00a0shields their data-scraping and training practices. But creatives \u2014 from artists to programmers to authors \u2014 largely disagree.<\/p>\n<p class=\"wp-block-paragraph\">Microsoft itself is facing at least two legal challenges from copyright holders. <\/p>\n<p class=\"wp-block-paragraph\">The New York Times sued the tech giant and its sometime collaborator, OpenAI, in December, accusing the two companies of infringing on The Times\u2019 copyright by deploying models trained on millions of its articles. 
<a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/www.lexology.com\/library\/detail.aspx?g=e8d937ff-80a0-468d-9ba8-7e0221879609\">Several software developers<\/a> have also filed suit against Microsoft, claiming that the firm\u2019s GitHub Copilot AI coding assistant was unlawfully trained using their protected works.<\/p>\n<p class=\"wp-block-paragraph\">Microsoft\u2019s new research effort, which the listing describes as \u201ctraining-time provenance,\u201d <a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/x.com\/glenweyl\/status\/1864744547143540771\">reportedly<\/a> has the involvement of Jaron Lanier, <a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/en.wikipedia.org\/wiki\/Jaron_Lanier\">the accomplished technologist and interdisciplinary scientist<\/a> at Microsoft Research. In an April 2023 <a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/www.newyorker.com\/science\/annals-of-artificial-intelligence\/there-is-no-ai\">op-ed in The New Yorker<\/a>, Lanier wrote about the concept of \u201cdata dignity,\u201d which to him meant connecting \u201cdigital stuff\u201d with \u201cthe humans who want to be known for having made it.\u201d <\/p>\n<p class=\"wp-block-paragraph\">\u201cA data-dignity approach would trace the most unique and influential contributors when a big model provides a valuable output,\u201d Lanier wrote. \u201cFor instance, if you ask a model for \u2018an animated movie of my kids in an oil-painting world of talking cats on an adventure,\u2019 then certain key oil painters, cat portraitists, voice actors, and writers \u2014 or their estates \u2014 might be calculated to have been uniquely essential to the creation of the new masterpiece. 
They would be acknowledged and motivated. They might even get paid.\u201d<\/p>\n<p class=\"wp-block-paragraph\">There are, not for nothing, already several companies attempting this. AI model developer Bria, which recently raised $40 million in venture capital, claims to \u201cprogrammatically\u201d compensate data owners according to their \u201coverall influence.\u201d Adobe and Shutterstock also award regular payouts to dataset contributors, although the exact payout amounts tend to be opaque.<\/p>\n<p class=\"wp-block-paragraph\">Few large labs have established individual contributor payout programs outside of inking licensing agreements with publishers, platforms, and data brokers. They\u2019ve instead provided means for copyright holders to \u201copt out\u201d of training. But some of these opt-out processes are onerous, and only apply to future models \u2014 not previously trained ones.<\/p>\n<p class=\"wp-block-paragraph\">Of course, Microsoft\u2019s project may amount to little more than a proof of concept. There\u2019s precedent for that. Back in\u00a0May, OpenAI said it was developing similar technology that would let creators specify how they want their works to be included in \u2014 or excluded from \u2014 training data. 
But nearly a year later, the tool has yet to see the light of day, and it often hasn\u2019t been viewed as a priority internally.<\/p>\n<p class=\"wp-block-paragraph\">Microsoft may also be trying to \u201c<a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/www.workplaceethicsadvice.com\/2021\/04\/what-is-ethics-washing.html\">ethics wash,<\/a>\u201d here \u2014 or head off regulatory and\/or court decisions disruptive to its AI business.<\/p>\n<p class=\"wp-block-paragraph\">But that the company is investigating ways to trace training data is notable in light of other AI labs\u2019 recently expressed stances on fair use. Several of the top labs, including Google and OpenAI, have published policy documents recommending that the Trump Administration weaken copyright protections as they relate to AI development. OpenAI has explicitly called on the U.S. government to codify fair use for model training, which it argues would free developers from burdensome restrictions.<\/p>\n<p class=\"wp-block-paragraph\">Microsoft didn\u2019t immediately respond to a request for comment.<\/p>\n<\/div>\n<p><span style=\"color: black;\"><a style=\"color: #ff9900;\" href=\"https:\/\/techcrunch.com\/2025\/03\/21\/microsoft-is-exploring-a-way-to-credit-contributors-to-ai-training-data\/\" target=\"_blank\" >Source<\/a><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Microsoft is launching a research project to estimate the influence of specific training examples on the text, images, and other types of media that generative AI models create. That\u2019s per a job listing dating back to December that was recently recirculated on LinkedIn. 
According to the listing, which seeks a research intern, the project will&#8230;<\/p>\n","protected":false},"author":1,"featured_media":658236,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"https:\/\/techcrunch.com\/wp-content\/uploads\/2025\/01\/GettyImages-2153485379.jpg?resize=1200,800","fifu_image_alt":"","footnotes":""},"categories":[18],"tags":[77337,151443,70286],"class_list":["post-658235","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technology","tag-ai","tag-media-entertainment","tag-microsoft"],"_links":{"self":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/658235","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/comments?post=658235"}],"version-history":[{"count":0,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/658235\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media\/658236"}],"wp:attachment":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media?parent=658235"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/categories?post=658235"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/tags?post=658235"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}