{"id":645371,"date":"2024-11-27T14:06:03","date_gmt":"2024-11-27T11:06:03","guid":{"rendered":""},"modified":"2024-11-27T14:06:03","modified_gmt":"2024-11-27T11:06:03","slug":"blueskys-open-api-means-anyone-can-scrape-your-data-for-ai-training","status":"publish","type":"post","link":"https:\/\/buradabiliyorum.com\/en\/blueskys-open-api-means-anyone-can-scrape-your-data-for-ai-training\/","title":{"rendered":"Bluesky&#8217;s open API means anyone can scrape your data for AI training"},"content":{"rendered":"<div>\n<p id=\"speakable-summary\" class=\"wp-block-paragraph\">Bluesky might not be training AI systems on user content as other <a href=\"https:\/\/buradabiliyorum.com\/en\/category\/social-mediaa\/\" data-internallinksmanager029f6b8e52c=\"1\" title=\"Social Media\" target=\"_blank\" rel=\"noopener\">social<\/a> networks are doing, but there\u2019s little stopping third parties from doing so.<\/p>\n<p class=\"wp-block-paragraph\">Per a <a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/www.404media.co\/someone-made-a-dataset-of-one-million-bluesky-posts-for-machine-learning-research\/\">report by 404 Media<\/a>, a <a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/bsky.app\/profile\/danielvanstrien.bsky.social\">machine learning librarian<\/a> at AI firm Hugging Face <a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/bsky.app\/profile\/danielvanstrien.bsky.social\/post\/3lbu6l4fxdc2e\">pulled 1 million public posts<\/a> from Bluesky via its <a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/docs.bsky.app\/docs\/advanced-guides\/firehose\">Firehose API<\/a> for machine learning research, pushing the dataset <a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/huggingface.co\/datasets\/bluesky-community\/one-million-bluesky-posts\">to a public repository<\/a>. 
Daniel van Strien later <a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/bsky.app\/profile\/danielvanstrien.bsky.social\/post\/3lbvih4luvk23\">removed the data<\/a> amid the ensuing controversy, but the episode is a timely reminder that everything you post publicly to Bluesky is, well, public.<\/p>\n<p class=\"wp-block-paragraph\">Bluesky said that it\u2019s looking at ways to enable users to communicate their consent preferences externally, though it\u2019s up to third parties whether they respect those preferences.<\/p>\n<p class=\"wp-block-paragraph\">The company <a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/bsky.app\/profile\/did:plc:z72i7hdynmk6r22z27h6tvur\/post\/3lbvgvbvl6c2c\">posted<\/a>: \u201cBluesky won\u2019t be able to enforce this consent outside of our systems. It will be up to outside developers to respect these settings. We\u2019re having ongoing conversations with engineers &amp; lawyers and we hope to have more updates to share on this shortly!\u201d<\/p>\n<p class=\"wp-block-paragraph\">What\u2019s clear here is that while Bluesky is surging in popularity, its rapid rise to the forefront of the global consciousness will mean it\u2019s subject to the same levels of scrutiny as other major social platforms.<\/p>\n<\/div>\n<p><span style=\"color: black;\"><a style=\"color: #ff9900;\" href=\"https:\/\/techcrunch.com\/2024\/11\/27\/blueskys-open-api-means-anyone-can-scrape-your-data-for-ai-training\/\" target=\"_blank\" >Source<\/a><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Bluesky might not be training AI systems on user content as other social networks are doing, but there\u2019s little stopping third parties from doing so. 
Per a report by 404 Media, a machine learning librarian at AI firm Hugging Face pulled 1 million public posts from Bluesky via its Firehose API for machine learning research, pushing&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"","fifu_image_alt":"","footnotes":""},"categories":[18],"tags":[],"class_list":["post-645371","post","type-post","status-publish","format-standard","hentry","category-technology"],"_links":{"self":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/645371","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/comments?post=645371"}],"version-history":[{"count":0,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/645371\/revisions"}],"wp:attachment":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media?parent=645371"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/categories?post=645371"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/tags?post=645371"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}