{"id":474135,"date":"2022-07-15T01:29:30","date_gmt":"2022-07-14T22:29:30","guid":{"rendered":"https:\/\/en.buradabiliyorum.com\/scathing-study-exposes-googles-harmful-approach-to-ai-development\/"},"modified":"2022-07-15T01:29:30","modified_gmt":"2022-07-14T22:29:30","slug":"scathing-study-exposes-googles-harmful-approach-to-ai-development","status":"publish","type":"post","link":"https:\/\/buradabiliyorum.com\/en\/scathing-study-exposes-googles-harmful-approach-to-ai-development\/","title":{"rendered":"#Scathing study exposes Google\u2019s harmful approach to AI development"},"content":{"rendered":"<h1><span class=\"ez-toc-section\" 
id=\"%E2%80%9CScathing_study_exposes_Googles_harmful_approach_to_AI_development%E2%80%9D\"><\/span>&#8220;Scathing study exposes Google\u2019s harmful approach to AI development&#8221;<span class=\"ez-toc-section-end\"><\/span><\/h1>\n<div id=\"article-main-content\">\n<p>A study published earlier this week <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/www.surgehq.ai\/\/blog\/30-percent-of-googles-reddit-emotions-dataset-is-mislabeled\">by Surge AI<\/a> appears to lay bare one of the biggest problems plaguing the AI industry: bullshit, exploitative data-labeling practices.<\/p>\n<p>Last year, Google built a dataset called \u201cGoEmotions.\u201d It was billed as a \u201cfine-grained emotion dataset\u201d \u2014 basically a ready-to-train-on dataset for building AI that can recognize emotional sentiment in text.<\/p>\n<p>Per <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/ai.googleblog.com\/2021\/10\/goemotions-dataset-for-fine-grained.html\">a Google blog post<\/a>:<\/p>\n<blockquote><p>In \u201cGoEmotions: A Dataset of Fine-Grained Emotions\u201d, we describe GoEmotions, a human-annotated dataset of 58k Reddit comments extracted from popular English-language subreddits and labeled with 27 emotion categories. 
As the largest fully annotated English language fine-grained emotion dataset to date, we designed the GoEmotions taxonomy with both psychology and data applicability in mind.<\/p>\n<\/blockquote>\n<p>Here\u2019s another way of putting it: Google scraped 58,000 Reddit comments and then sent those files to a third-party company for labeling. More on that later.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"The_study\"><\/span>The study<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Surge AI took a look at a sample of 1,000 labeled comments from the GoEmotions dataset and found that a significant portion of them were mislabeled.<\/p>\n<p>Per the study:<\/p>\n<blockquote><p>A whopping 30% of the dataset is severely mislabeled! (We tried training a model on the dataset ourselves, but noticed deep quality issues. 
So we took 1000 random comments, asked Surgers whether the original emotion was reasonably accurate, and found strong errors in 308 of them.)<\/p>\n<\/blockquote>\n<p>It goes on to point out some of the major problems with the dataset, including this doozy:<\/p>\n<blockquote><p>Problem #1: \u201cReddit comments were presented with no additional metadata\u201d<\/p>\n<p>First of all, language doesn\u2019t live in a vacuum! Why would you present a comment with no additional metadata? The subreddit and parent post it\u2019s replying to are especially important context.<\/p>\n<p>Imagine you see the comment \u201chis traps hide the fucking sun\u201d by itself. Would you have any idea what it means? Probably not \u2013 maybe that\u2019s why Google mislabeled it.<\/p>\n<p>But what if you were told it came from the \/r\/nattyorjuice subreddit dedicated to bodybuilding? Would you realize, then, that traps refers to someone\u2019s trapezoid muscles?<\/p>\n<\/blockquote>\n<h2><span class=\"ez-toc-section\" id=\"The_problem\"><\/span>The problem<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Stripped of context, this kind of data can\u2019t be properly labeled. Take the \u201chis traps hide the fucking sun\u201d comment above: it\u2019s impossible to imagine a single person on the planet capable of understanding every edge case of human sentiment.<\/p>\n<p>It\u2019s not that the particular labelers did a bad job; it\u2019s that they were given an impossible task.<\/p>\n<p>There are no shortcuts to gleaning insight into human communications. We\u2019re not stupid like machines are. 
We can incorporate our entire environment and lived history into the context of our communications and, through the tamest expression of our masterful grasp of semantic manipulation, turn nonsense into philosophy (shit happens) or turn a truly mundane statement into the punchline of an ageless joke (to get to the other side).<\/p>\n<p>What these Google researchers have done is spend who knows how much time and money developing a crappy digital version of a Magic 8-Ball. Sometimes it\u2019s right, sometimes it\u2019s wrong, and there\u2019s no way to be sure one way or another.<\/p>\n<p>This particular kind of AI development is a grift. It\u2019s a scam. And it\u2019s one of the oldest in the book.<\/p>\n<p>Here\u2019s how it works: The researchers took an impossible problem, \u201chow to determine human sentiment in text at massive scales without context,\u201d and used the magic of bullshit to turn it into a relatively simple one that any AI can solve: \u201chow to match keywords to labels.\u201d<\/p>\n<p>It\u2019s a grift because you don\u2019t need AI to match keywords to labels. Hell, you could have done that in Microsoft Excel 20 years ago.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"A_bit_deeper\"><\/span>A bit deeper<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>You know the dataset the AI was trained on contains mislabeled data. Thus, the only way you can be absolutely sure that a given result it returns is accurate is to verify it yourself \u2014 you have to be the so-called <i>human in the loop<\/i>. But what about all the results it doesn\u2019t return that it should?<\/p>\n<p><span>We\u2019re not trying to find all the cars that are red in a dataset of automobile images. We\u2019re making determinations about human beings.<\/span><\/p>\n<p><span>If the AI screws up and misses some red cars, those cars are unlikely to suffer negative outcomes. And if it accidentally labels some blue cars as red, those blue cars should be okay. 
<\/span><\/p>\n<p><span>But this particular dataset is specifically built for decision-making related to human outcomes.<\/span><\/p>\n<p><span>Per Google:<\/span><\/p>\n<blockquote><p><span>It\u2019s been a long-term goal among the research community to enable machines to understand context and emotion, which would, in turn, enable a variety of applications, including empathetic chatbots, models to detect harmful online behavior, and improved customer support interactions.<\/span><\/p>\n<\/blockquote>\n<p><span>Again, we know for a fact that any AI model trained on this dataset will produce erroneous outputs. That means every single time the AI makes a decision that either rewards or punishes any human, it causes demonstrable harm to other humans.<\/span><\/p>\n<p><span>If the AI\u2019s output can be used to influence human rewards \u2014 by, for example, surfacing all the resumes in a stack that have \u201cpositive sentiment\u201d in them \u2014 we have to assume that some of the resumes it didn\u2019t surface were wrongfully discriminated against.<\/span><\/p>\n<p><span>That\u2019s something a human in the loop cannot help with. It would require a person to review every single resume that <em>wasn\u2019t<\/em> selected.<\/span><\/p>\n<p><span>And if the AI has the ability to influence human <em>punishments<\/em> \u2014 by, for example, taking down content it considers \u201chate speech\u201d \u2014 we can be certain that sentiments that objectively don\u2019t deserve punishment will be erroneously surfaced and, thus, humans will be harmed. 
<\/span><\/p>\n<p><span>Worst of all, <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/www.aclu.org\/news\/privacy-technology\/how-artificial-intelligence-can-deepen-racial-and-economic-inequities\">study after study<\/a> demonstrates that these systems are inherently full of human bias and that minority groups are always disproportionately negatively impacted.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"The_solution\"><\/span><span>The solution<\/span><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span>There\u2019s only one way to fix this kind of research: throw it in the trash.<\/span><\/p>\n<p><span>It is our stance here at Neural that it is entirely unethical to train an AI on human-created content without the express individual consent of the humans who created it.<\/span><\/p>\n<p><span>Whether it\u2019s legal to do so or not is irrelevant. When I post on Reddit, I do so in the good faith that my discourse is intended for other humans. Google doesn\u2019t compensate me for my data, so it shouldn\u2019t use it, even if the terms of service allow for it.<\/span><\/p>\n<p><span>It is also our stance that it is unethical to deploy AI models trained on data that hasn\u2019t been verified to be error-free when the output from those models has the potential to affect human outcomes.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Final_thoughts\"><\/span>Final thoughts<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span>Google\u2019s researchers aren\u2019t stupid. 
They know that a generic \u201ckeyword search and comparison\u201d algorithm can\u2019t turn an AI model into a human-level expert in psychology, sociology, pop culture, and semantics just by feeding it a dataset full of randomly mislabeled Reddit posts.<\/span><\/p>\n<p>You can draw your own conclusions as to their motivations.<\/p>\n<p><span>But no amount of talent and technology can turn a bag full of bullshit into a useful AI model when human outcomes are at stake.<\/span><\/p>\n<\/div>\n<p><span style=\"color: black;\"><a style=\"color: #ff9900;\" href=\"https:\/\/thenextweb.com\/news\/scathing-study-exposes-googles-harmful-approach-ai-development\" target=\"_blank\" 
rel=\"noopener\">Source<\/a><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>&#8220;Scathing study exposes Google\u2019s harmful approach to AI development&#8221; A study published earlier this week by Surge AI appears to lay bare one of the biggest problems plaguing the AI industry: bullshit, exploitative data-labeling practices. Last year, Google built a dataset called \u201cGoEmotions.\u201d It was billed as a \u201cfine-grained emotion dataset\u201d \u2014 basically a ready-to-train-on&#8230;<\/p>\n","protected":false},"author":1,"featured_media":474136,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"https:\/\/img-cdn.tnwcdn.com\/image\/neural?filter_last=1&fit=1280,640&url=https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/10\/pichaiethics.jpg&signature=8c3d968a9a275c2f483385e3d5fac783","fifu_image_alt":"","footnotes":""},"categories":[18],"tags":[],"class_list":["post-474135","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technology"],"_links":{"self":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/474135","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/comments?post=474135"}],"version-history":[{"count":0,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/474135\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media\/474136"}],"wp:attachment":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media?parent=474135"}],"wp:term":[{"taxonomy":"category","embeddable":true
,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/categories?post=474135"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/tags?post=474135"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}