{"id":654525,"date":"2025-02-24T22:20:14","date_gmt":"2025-02-24T19:20:14","guid":{"rendered":"https:\/\/en.buradabiliyorum.com\/anthropic-used-pokemon-to-benchmark-its-newest-ai-model\/"},"modified":"2025-02-24T22:20:14","modified_gmt":"2025-02-24T19:20:14","slug":"anthropic-used-pokemon-to-benchmark-its-newest-ai-model","status":"publish","type":"post","link":"https:\/\/buradabiliyorum.com\/en\/anthropic-used-pokemon-to-benchmark-its-newest-ai-model\/","title":{"rendered":"Anthropic used Pok\u00e9mon to benchmark its newest AI model"},"content":{"rendered":"<div>\n<p id=\"speakable-summary\" class=\"wp-block-paragraph\">Anthropic used Pok\u00e9mon to benchmark its newest AI model. Yes, really.<\/p>\n<p class=\"wp-block-paragraph\">In a blog <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/www.anthropic.com\/news\/claude-3-7-sonnet\">post<\/a> published Monday, Anthropic said that it tested its latest model, Claude 3.7 Sonnet, on the Game Boy classic\u00a0Pok\u00e9mon Red. The company equipped the model with basic memory, screen pixel input, and function calls to press buttons and navigate around the screen, allowing it to play Pok\u00e9mon continuously.<\/p>\n<p class=\"wp-block-paragraph\">A unique feature of Claude 3.7 Sonnet is its ability to engage in \u201cextended thinking.\u201d Like OpenAI\u2019s o3-mini and DeepSeek\u2019s R1, Claude 3.7 Sonnet can \u201creason\u201d through challenging problems by applying more computing \u2014 and taking more time.<\/p>\n<p class=\"wp-block-paragraph\">That came in handy in Pok\u00e9mon Red, apparently. 
<\/p>\n<p class=\"wp-block-paragraph\">Compared to a previous version of Claude, Claude 3.0 Sonnet, which failed to leave the house in Pallet Town where the story begins, Claude 3.7 Sonnet successfully battled three Pok\u00e9mon gym leaders\u00a0and won their badges.\u00a0<\/p>\n<figure class=\"wp-block-image aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1920\" height=\"1269\" src=\"https:\/\/techcrunch.com\/wp-content\/uploads\/2025\/02\/9a30c8288e402eee24d4ef60272cb6365a36207a-1920x1269-1.webp?w=680\" alt=\"Anthropic Pokemon Red\" class=\"wp-image-2970257\" srcset=\"https:\/\/techcrunch.com\/wp-content\/uploads\/2025\/02\/9a30c8288e402eee24d4ef60272cb6365a36207a-1920x1269-1.webp 1920w, https:\/\/techcrunch.com\/wp-content\/uploads\/2025\/02\/9a30c8288e402eee24d4ef60272cb6365a36207a-1920x1269-1.webp?resize=150,99 150w, https:\/\/techcrunch.com\/wp-content\/uploads\/2025\/02\/9a30c8288e402eee24d4ef60272cb6365a36207a-1920x1269-1.webp?resize=300,198 300w, https:\/\/techcrunch.com\/wp-content\/uploads\/2025\/02\/9a30c8288e402eee24d4ef60272cb6365a36207a-1920x1269-1.webp?resize=768,508 768w, https:\/\/techcrunch.com\/wp-content\/uploads\/2025\/02\/9a30c8288e402eee24d4ef60272cb6365a36207a-1920x1269-1.webp?resize=680,449 680w, https:\/\/techcrunch.com\/wp-content\/uploads\/2025\/02\/9a30c8288e402eee24d4ef60272cb6365a36207a-1920x1269-1.webp?resize=1200,793 1200w, https:\/\/techcrunch.com\/wp-content\/uploads\/2025\/02\/9a30c8288e402eee24d4ef60272cb6365a36207a-1920x1269-1.webp?resize=1280,846 1280w, https:\/\/techcrunch.com\/wp-content\/uploads\/2025\/02\/9a30c8288e402eee24d4ef60272cb6365a36207a-1920x1269-1.webp?resize=430,284 430w, https:\/\/techcrunch.com\/wp-content\/uploads\/2025\/02\/9a30c8288e402eee24d4ef60272cb6365a36207a-1920x1269-1.webp?resize=720,476 720w, https:\/\/techcrunch.com\/wp-content\/uploads\/2025\/02\/9a30c8288e402eee24d4ef60272cb6365a36207a-1920x1269-1.webp?resize=900,595 900w, 
https:\/\/techcrunch.com\/wp-content\/uploads\/2025\/02\/9a30c8288e402eee24d4ef60272cb6365a36207a-1920x1269-1.webp?resize=800,529 800w, https:\/\/techcrunch.com\/wp-content\/uploads\/2025\/02\/9a30c8288e402eee24d4ef60272cb6365a36207a-1920x1269-1.webp?resize=1536,1015 1536w, https:\/\/techcrunch.com\/wp-content\/uploads\/2025\/02\/9a30c8288e402eee24d4ef60272cb6365a36207a-1920x1269-1.webp?resize=668,442 668w, https:\/\/techcrunch.com\/wp-content\/uploads\/2025\/02\/9a30c8288e402eee24d4ef60272cb6365a36207a-1920x1269-1.webp?resize=567,375 567w, https:\/\/techcrunch.com\/wp-content\/uploads\/2025\/02\/9a30c8288e402eee24d4ef60272cb6365a36207a-1920x1269-1.webp?resize=934,617 934w, https:\/\/techcrunch.com\/wp-content\/uploads\/2025\/02\/9a30c8288e402eee24d4ef60272cb6365a36207a-1920x1269-1.webp?resize=708,468 708w\" sizes=\"auto, (max-width: 1920px) 100vw, 1920px\"\/><figcaption class=\"wp-element-caption\"><span class=\"wp-block-image__credits\"><strong>Image Credits:<\/strong>Anthropic<\/span><\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">Now, it\u2019s not clear how much computing was required for Claude 3.7 Sonnet to reach those milestones \u2014 and how long each took. Anthropic only said that the model performed 35,000 actions to reach the last gym leader, Surge.<\/p>\n<p class=\"wp-block-paragraph\">It surely won\u2019t be long before some enterprising developer finds out.<\/p>\n<p class=\"wp-block-paragraph\">Pok\u00e9mon Red is more of a toy benchmark than anything. However, there <a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/venturebeat.com\/uncategorized\/why-games-may-not-be-the-best-benchmark-for-ai\/\"><em>is<\/em> a long history<\/a> of games being used for AI benchmarking purposes. 
In the past few months alone, a number of new apps and platforms have cropped up to test models\u2019 game-playing abilities on titles ranging from <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/github.com\/OpenGenerativeAI\/llm-colosseum\">Street Fighter<\/a> to Pictionary.<\/p>\n<\/div>\n<p><span style=\"color: black;\"><a style=\"color: #ff9900;\" href=\"https:\/\/techcrunch.com\/2025\/02\/24\/anthropic-used-pokemon-to-benchmark-its-newest-ai-model\/\" target=\"_blank\" >Source<\/a><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Anthropic used Pok\u00e9mon to benchmark its newest AI model. Yes, really. In a blog post published Monday, Anthropic said that it tested its latest model, Claude 3.7 Sonnet, on the Game Boy classic\u00a0Pok\u00e9mon Red. 
The company equipped the model with basic memory, screen pixel input, and function calls to press buttons and navigate around the&#8230;<\/p>\n","protected":false},"author":1,"featured_media":654526,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"https:\/\/techcrunch.com\/wp-content\/uploads\/2019\/01\/pokemon.png?resize=1200,674","fifu_image_alt":"","footnotes":""},"categories":[18],"tags":[77337,152300,153093,154495,10751,33597,113210],"class_list":["post-654525","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technology","tag-ai","tag-anthropic","tag-benchmark","tag-claude-3-7-sonnet","tag-gaming","tag-pokemon","tag-pokemon-red"],"_links":{"self":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/654525","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/comments?post=654525"}],"version-history":[{"count":0,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/654525\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media\/654526"}],"wp:attachment":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media?parent=654525"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/categories?post=654525"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/tags?post=654525"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}