{"id":646389,"date":"2024-12-10T06:10:16","date_gmt":"2024-12-10T03:10:16","guid":{"rendered":"https:\/\/en.buradabiliyorum.com\/a-test-for-agi-is-closer-to-being-solved-but-it-may-be-flawed\/"},"modified":"2024-12-10T06:10:16","modified_gmt":"2024-12-10T03:10:16","slug":"a-test-for-agi-is-closer-to-being-solved-but-it-may-be-flawed","status":"publish","type":"post","link":"https:\/\/buradabiliyorum.com\/en\/a-test-for-agi-is-closer-to-being-solved-but-it-may-be-flawed\/","title":{"rendered":"A test for AGI is closer to being solved \u2014 but it may be flawed"},"content":{"rendered":"<div>\n<p id=\"speakable-summary\" class=\"wp-block-paragraph\">A well-known test for artificial general intelligence (AGI) is closer to being solved. But the test\u2019s creators say this points to flaws in the test\u2019s design rather than a bona fide research breakthrough.<\/p>\n<p class=\"wp-block-paragraph\">In 2019, Francois Chollet, a leading figure in the AI world, introduced the ARC-AGI benchmark, short for \u201cAbstraction and Reasoning Corpus for Artificial General Intelligence.\u201d Designed to evaluate whether an AI system can efficiently acquire new skills outside the data it was trained on, <a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/arcprize.org\/arc\">ARC-AGI<\/a>, Chollet claims, remains the only AI test to measure progress towards general intelligence (although <a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/arxiv.org\/html\/2311.02462v2#:~:text=Our%20intent%20is%20that%20an,interpersonal%20and%20intra%2Dpersonal%20social\">others<\/a> have been proposed).<\/p>\n<p class=\"wp-block-paragraph\">Until this year, the best-performing AI could only solve just under a third of the tasks in ARC-AGI. 
Chollet blamed the industry\u2019s focus on large language models (LLMs), which he believes aren\u2019t capable of actual \u201creasoning.\u201d<\/p>\n<p class=\"wp-block-paragraph\">\u201cLLMs struggle with generalization, due to being entirely reliant on memorization,\u201d he <a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/x.com\/fchollet\/status\/1755250582334709970\">said<\/a> in a series of posts on X in February. \u201cThey break down on anything that wasn\u2019t in their training data.\u201d<\/p>\n<p class=\"wp-block-paragraph\">To Chollet\u2019s point, LLMs are statistical machines. Trained on a lot of examples, they learn patterns in those examples to make predictions, for example that \u201cto whom\u201d in an email typically precedes \u201cit may concern.\u201d<\/p>\n<p class=\"wp-block-paragraph\">Chollet asserts that while LLMs might be capable of memorizing \u201creasoning patterns,\u201d it\u2019s unlikely that they can generate \u201cnew reasoning\u201d based on novel situations. \u201cIf you need to be trained on many examples of a pattern, even if it\u2019s implicit, in order to learn a reusable representation for it, you\u2019re memorizing,\u201d Chollet <a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/x.com\/fchollet\/status\/1689706474996645888\">argued<\/a> in another post.<\/p>\n<p class=\"wp-block-paragraph\">To incentivize research beyond LLMs, in June, Chollet and Zapier co-founder Mike Knoop launched a $1 million <a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/arcprize.org\/\">competition<\/a> to build open source AI capable of beating ARC-AGI. 
Out of 17,789 submissions, the best scored 55.5%, about 20 percentage points higher than 2023\u2019s top scorer, albeit short of the 85% \u201chuman-level\u201d threshold required to win.<\/p>\n<p class=\"wp-block-paragraph\">This doesn\u2019t mean we\u2019re ~20% closer to AGI, though, Knoop says.<\/p>\n<blockquote class=\"twitter-tweet\">\n<p dir=\"ltr\" lang=\"en\">Today we\u2019re announcing the winners of ARC Prize 2024. We\u2019re also publishing an extensive technical report on what we learned from the competition (link in the next tweet).<\/p>\n<p>The state-of-the-art went from 33% to 55.5%, the largest single-year increase we\u2019ve seen since 2020. The\u2026<\/p>\n<p>\u2014 Fran\u00e7ois Chollet (@fchollet) <a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/twitter.com\/fchollet\/status\/1865106416329191883?ref_src=twsrc%5Etfw\">December 6, 2024<\/a><\/p>\n<\/blockquote>\n<p class=\"wp-block-paragraph\">In a <a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/arcprize.org\/blog\/arc-prize-2024-winners-technical-report\">blog post<\/a>, Knoop said that many of the submissions to ARC-AGI have been able to \u201cbrute force\u201d their way to a solution, suggesting that a \u201clarge fraction\u201d of ARC-AGI tasks \u201c[don\u2019t] carry much useful signal towards general intelligence.\u201d<\/p>\n<p class=\"wp-block-paragraph\">ARC-AGI consists of puzzle-like problems in which an AI, given a grid of different-colored squares, has to generate the correct \u201canswer\u201d grid. The problems were designed to force an AI to adapt to new problems it hasn\u2019t seen before. But it\u2019s not clear they\u2019re achieving this. 
<\/p>\n<figure class=\"wp-block-image aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1600\" height=\"840\" src=\"https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/12\/arc-example-task.jpg?w=680\" alt=\"ARC-AGI benchmark\" class=\"wp-image-2928216\" srcset=\"https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/12\/arc-example-task.jpg 1600w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/12\/arc-example-task.jpg?resize=150,79 150w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/12\/arc-example-task.jpg?resize=300,158 300w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/12\/arc-example-task.jpg?resize=768,403 768w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/12\/arc-example-task.jpg?resize=680,357 680w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/12\/arc-example-task.jpg?resize=1200,630 1200w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/12\/arc-example-task.jpg?resize=1280,672 1280w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/12\/arc-example-task.jpg?resize=430,226 430w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/12\/arc-example-task.jpg?resize=720,378 720w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/12\/arc-example-task.jpg?resize=900,473 900w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/12\/arc-example-task.jpg?resize=800,420 800w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/12\/arc-example-task.jpg?resize=1536,806 1536w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/12\/arc-example-task.jpg?resize=668,351 668w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/12\/arc-example-task.jpg?resize=1175,617 1175w, https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/12\/arc-example-task.jpg?resize=708,372 708w\" sizes=\"auto, (max-width: 1600px) 100vw, 1600px\"\/><figcaption class=\"wp-element-caption\"><span class=\"wp-element-caption__text\">Tasks in the ARC-AGI benchmark. 
Models must solve \u2018problems\u2019 in the top row; the bottom row shows solutions. <\/span><span class=\"wp-block-image__credits\"><strong>Image Credits:<\/strong> ARC-AGI<\/span><\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">\u201c[ARC-AGI] has been unchanged since 2019 and is not perfect,\u201d Knoop acknowledged in his post.<\/p>\n<p class=\"wp-block-paragraph\">Chollet and Knoop have also faced <a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/news.ycombinator.com\/item?id=42343215\">criticism<\/a> for overselling ARC-AGI as a benchmark toward AGI \u2014 at a time when the very definition of AGI is being hotly contested. One OpenAI staff member recently <a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/x.com\/search?q=openai%20already%20achieved%20agi&amp;src=typed_query\">claimed<\/a> that AGI has \u201calready\u201d been achieved if one defines AGI as AI \u201cbetter than most humans at most tasks.\u201d<\/p>\n<p class=\"wp-block-paragraph\">Knoop and Chollet say that they plan to release a second-gen ARC-AGI benchmark to address these issues, alongside a 2025 competition. \u201cWe will continue to direct the efforts of the research community towards what we see as the most important unsolved problems in AI, and accelerate the timeline to AGI,\u201d Chollet wrote in an X <a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/x.com\/fchollet\/status\/1865106418329849920\">post<\/a>.<\/p>\n<p class=\"wp-block-paragraph\">Fixes likely won\u2019t come easily. 
If the first ARC-AGI test\u2019s shortcomings are any indication, defining intelligence for AI will be as intractable \u2014 and <a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/www.reddit.com\/r\/OpenAI\/comments\/1fgq0oy\/openai_o1_results_on_arcagi_benchmark\/\">inflammatory<\/a> \u2014 as it has been for human beings.<\/p>\n<\/div>\n<p><script async src=\"\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\n<p><span style=\"color: black;\"><a style=\"color: #ff9900;\" href=\"https:\/\/techcrunch.com\/2024\/12\/09\/a-test-for-agi-is-closer-to-being-solved-but-it-may-be-flawed\/\" target=\"_blank\" >Source<\/a><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>A well-known test for artificial general intelligence (AGI) is closer to being solved. But the test\u2019s creators say this points to flaws in the test\u2019s design rather than a bona fide research breakthrough. 
In 2019, Francois Chollet, a leading figure in the AI world, introduced the ARC-AGI benchmark, short for \u201cAbstraction and Reasoning Corpus for Artificial&#8230;<\/p>\n","protected":false},"author":1,"featured_media":646390,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"https:\/\/techcrunch.com\/wp-content\/uploads\/2024\/09\/GettyImages-496822526.jpg?resize=1200,800","fifu_image_alt":"","footnotes":""},"categories":[18],"tags":[77337,76255,151633,153090,153091,153092,153093,153094,147146,27951,153095,61514,32740],"class_list":["post-646389","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technology","tag-ai","tag-study","tag-agi","tag-arc-agi","tag-arc-agi-benchmark","tag-artificial-general-intelligence","tag-benchmark","tag-francois-chollet","tag-generative-ai","tag-intelligence","tag-reasoning","tag-research","tag-test"],"_links":{"self":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/646389","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/comments?post=646389"}],"version-history":[{"count":0,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/646389\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media\/646390"}],"wp:attachment":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media?parent=646389"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/categories?post=646389"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/b
uradabiliyorum.com\/en\/wp-json\/wp\/v2\/tags?post=646389"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}