{"id":479855,"date":"2022-07-31T23:50:54","date_gmt":"2022-07-31T20:50:54","guid":{"rendered":"https:\/\/en.buradabiliyorum.com\/large-language-models-cant-plan-even-if-they-write-fancy-essays\/"},"modified":"2022-07-31T23:50:54","modified_gmt":"2022-07-31T20:50:54","slug":"large-language-models-cant-plan-even-if-they-write-fancy-essays","status":"publish","type":"post","link":"https:\/\/buradabiliyorum.com\/en\/large-language-models-cant-plan-even-if-they-write-fancy-essays\/","title":{"rendered":"#Large language models can\u2019t plan, even if they write fancy essays"},"content":{"rendered":"<h1><span class=\"ez-toc-section\" id=\"%E2%80%9CLarge_language_models_cant_plan_even_if_they_write_fancy_essays%E2%80%9D\"><\/span>&#8220;Large language models 
can\u2019t plan, even if they write fancy essays&#8221;<span class=\"ez-toc-section-end\"><\/span><\/h1>\n<div id=\"article-main-content\">\n                            <em>This article is part of our coverage of the latest in <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/tag\/ai-research-papers\/\">AI research<\/a>.<\/em><\/p>\n<p>Large language models like GPT-3 have advanced to the point that it has become difficult to measure the limits of their capabilities. When you have a very large neural network that can <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2020\/09\/14\/guardian-gpt-3-article-ai-fake-news\/\">generate articles<\/a>, write <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2021\/07\/15\/openai-codex-ai-programming\/\">software code<\/a>, and engage in conversations about <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2022\/06\/20\/lamda-large-language-models-sentient-ai\/\">sentience and life<\/a>, you should expect it to be able to reason about tasks and plan as a human does, right?<\/p>\n<p>Wrong. 
A <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/2206.10498\">study<\/a> by researchers at Arizona State University, Tempe, shows that when it comes to planning and thinking methodically, LLMs perform very poorly and suffer from many of the same failures observed in current deep learning systems.<\/p>\n<p>Interestingly, the study finds that, while very large LLMs like <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2020\/08\/17\/openai-gpt-3-commercial-ai\/\">GPT-3<\/a> and PaLM pass many of the tests that were meant to evaluate the reasoning capabilities of artificial intelligence systems, they do so because these benchmarks are either too simplistic or too flawed and can be \u201ccheated\u201d through statistical tricks, something that deep learning systems are very good at.<\/p>\n<p>With LLMs breaking new ground every day, the authors suggest a new benchmark to test the planning and reasoning capabilities of AI systems. 
The researchers hope that their findings can help steer AI research toward developing artificial intelligence systems that can handle what has become popularly known as \u201c<a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2019\/12\/23\/yoshua-bengio-neurips-2019-deep-learning\/\">system 2 thinking<\/a>\u201d tasks.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"The_illusion_of_planning_and_reasoning\"><\/span>The illusion of planning and reasoning<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>\u201cBack last year, we were evaluating GPT-3\u2019s ability to extract plans from text descriptions\u2014a task that was attempted with special purpose methods earlier\u2014and found that off-the-shelf GPT-3 does quite well compared to the special purpose methods,\u201d Subbarao Kambhampati, professor at Arizona State University and co-author of the study, told TechTalks. \u201cThat naturally made us wonder what \u2018emergent capabilities\u2019\u2014if any\u2013GPT3 has for doing the simplest planning problems (e.g., generating plans in toy domains). 
We found right away that GPT3 is pretty spectacularly bad on anecdotal tests.\u201d<\/p>\n<blockquote class=\"twitter-tweet\" data-width=\"500\" data-dnt=\"true\">\n<p lang=\"en\" dir=\"ltr\">Intrigued by the profusion of &#8217;em &#8220;<a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/twitter.com\/hashtag\/LLM?src=hash&amp;ref_src=twsrc%5Etfw\">#LLM<\/a>&#8216;s are Zero-shot &lt;XXX&gt;&#8217;s&#8221;  papers, we set out to see how good LLMs are at planning and reasoning about change.<\/p>\n<p>tldr; off-the-shelf <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/twitter.com\/hashtag\/GPT3?src=hash&amp;ref_src=twsrc%5Etfw\">#GPT3<\/a> is pretty bad at these..<\/p>\n<p>\ud83d\udc49<a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/t.co\/JuSjU9xSRY\">https:\/\/t.co\/JuSjU9xSRY<\/a> <\/p>\n<p>(w\/ <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/twitter.com\/karthikv792?ref_src=twsrc%5Etfw\">@karthikv792<\/a> <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/twitter.com\/sarath_ssreedh?ref_src=twsrc%5Etfw\">@sarath_ssreedh<\/a> &amp; <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/twitter.com\/_aolmo_?ref_src=twsrc%5Etfw\">@_aolmo_<\/a>) 1\/ <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/t.co\/sWqvDlTv3W\">pic.twitter.com\/sWqvDlTv3W<\/a><\/p>\n<p>\u2014 Subbarao Kambhampati (\u0c15\u0c02\u0c2d\u0c02\u0c2a\u0c3e\u0c1f\u0c3f \u0c38\u0c41\u0c2c\u0c4d\u0c2c\u0c3e\u0c30\u0c3e\u0c35\u0c41) (@rao2z) <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/twitter.com\/rao2z\/status\/1539435614503768065?ref_src=twsrc%5Etfw\">June 22, 2022<\/a><\/p>\n<\/blockquote>\n<p>However, one interesting fact is that GPT-3 and other large language models perform very well on benchmarks designed for common-sense reasoning, logical reasoning, and ethical reasoning, skills that were previously thought to be off-limits for deep learning systems. 
A <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/2106.07131\">previous study<\/a> by Kambhampati\u2019s group at Arizona State University shows the effectiveness of large language models in extracting plans from text descriptions. Other recent studies include one that shows LLMs can do <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2022\/07\/11\/large-language-models-zero-shot-reasoning\/\">zero-shot reasoning<\/a> if provided with a special trigger phrase.<\/p>\n<p>However, \u201creasoning\u201d is often used broadly in these benchmarks and studies, Kambhampati believes. What LLMs are doing, in fact, is creating a semblance of planning and reasoning through pattern recognition.<\/p>\n<p>\u201cMost benchmarks depend on shallow (one or two steps) type of reasoning, as well as tasks for which there is sometimes no actual ground truth (e.g., getting LLMs to reason about ethical dilemmas),\u201d he said. \u201cIt is possible for a purely pattern completion engine with no reasoning capabilities to still do fine on some of such benchmarks. After all, while System 2 reasoning abilities can get compiled to System 1 sometimes, it is also the case that System 1\u2019s \u2018reasoning abilities\u2019 may just be reflexive responses from patterns the system has seen in its training data, without actually doing anything resembling reasoning.\u201d<\/p>\n<h2><span class=\"ez-toc-section\" id=\"System_1_and_System_2_thinking\"><\/span>System 1 and System 2 thinking<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2022\/01\/24\/ai-thinking-fast-and-slow\/\">System 1 and System 2<\/a> thinking were popularized by psychologist Daniel Kahneman in his book Thinking, Fast and Slow. 
The former is the fast, reflexive, and automated type of thinking and acting that we do most of the time, such as walking, brushing our teeth, tying our shoes, or driving in a familiar area. Even a large part of speech is performed by System 1.<\/p>\n<p>System 2, on the other hand, is the slower thinking mode that we use for tasks that require methodical planning and analysis. We use System 2 to solve calculus equations, play chess, design software, plan a trip, or solve a puzzle.<\/p>\n<p>But the line between System 1 and System 2 is not clear-cut. Take driving, for example. When you are learning to drive, you must fully concentrate on how you coordinate your muscles to control the gearshift, steering wheel, and pedals while also keeping an eye on the road and the side and rear mirrors. This is clearly System 2 at work. It consumes a lot of energy, requires your full attention, and is slow. But as you gradually repeat the procedures, you learn to do them without thinking. The task of driving shifts to your System 1, enabling you to perform it without taxing your mind. One sign that a task has been integrated into System 1 is that you can do it subconsciously while focusing on another task (e.g., you can tie your shoes and talk at the same time, brush your teeth and read, or drive and talk).<\/p>\n<p>Even many of the very complicated tasks that remain in the domain of System 2 eventually become partly integrated into System 1. For example, professional chess players rely heavily on pattern recognition to speed up their decision-making process. 
You can see similar examples in math and programming, where after doing things over and over again, some of the tasks that previously required careful thinking come to you automatically.<\/p>\n<p>A similar phenomenon might be happening in deep learning systems that have been exposed to very large datasets. They might have learned to do the simple pattern-recognition phase of complex reasoning tasks.<\/p>\n<p>\u201cPlan generation requires chaining reasoning steps to come up with a plan, and a firm ground truth about correctness can be established,\u201d Kambhampati said.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"A_new_benchmark_for_testing_planning_in_LLMs\"><\/span>A new benchmark for testing planning in LLMs<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>\u201cGiven the <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/cacm.acm.org\/blogs\/blog-cacm\/261732-ai-as-an-ersatz-natural-science\/fulltext\">excitement around hidden\/emergent properties<\/a> of LLMs however, we thought it would be more constructive to develop a benchmark that provides a variety of planning\/reasoning tasks that can serve as a benchmark as people improve LLMs via finetuning and other approaches to customize\/improve their performance to\/on reasoning tasks. This is what we wound up doing,\u201d Kambhampati said.<\/p>\n<p>The team developed their benchmark based on the domains used in the International Planning Competition (<a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/www.icaps-conference.org\/competitions\/\">IPC<\/a>). The framework consists of multiple tasks that evaluate different aspects of reasoning. For example, some tasks evaluate the LLM\u2019s capacity to create valid plans to achieve a certain goal, while others test whether the generated plan is optimal. 
Other tests include reasoning about the results of a plan, recognizing whether different text descriptions refer to the same goal, reusing parts of one plan in another, shuffling plans, and more.<\/p>\n<p>To carry out the tests, the team used <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/en.wikipedia.org\/wiki\/Blocks_world\">Blocks world<\/a>, a problem framework that revolves around placing a set of different blocks in a particular order. Each problem has an initial condition, an end goal, and a set of allowed actions.<\/p>\n<p>\u201cThe benchmark itself is extensible and is meant to have tests from several of the IPC domains,\u201d Kambhampati said. \u201cWe used the Blocks world examples for illustrating the different tasks. Each of those tasks (e.g., Plan generation, goal shuffling, etc.) can also be posed in other IPC domains.\u201d<\/p>\n<p>The benchmark Kambhampati and his colleagues developed uses <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2020\/08\/12\/what-is-one-shot-learning\/\">few-shot learning<\/a>, where the prompt given to the machine learning model includes a solved example plus the main problem that must be solved.<\/p>\n<p>Unlike other benchmarks, the problem descriptions of this new benchmark are very long and detailed. Solving them requires concentration and methodical planning and can\u2019t be cheated through pattern recognition. Even a human trying to solve them would have to think carefully about each problem, take notes, possibly make visualizations, and plan the solution step by step.<\/p>\n<p>\u201cReasoning is a system-2 task in general. 
The collective delusion of the community has been to look at those types of reasoning benchmarks that could probably be handled via compilation to system 1 (e.g., \u2018the answer to this ethical dilemma, by pattern completion, is this\u2019) as against actually doing reasoning that is needed for the task at hand,\u201d Kambhampati said.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Large_language_models_are_bad_at_planning\"><\/span>Large language models are bad at planning<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>The researchers tested their framework on Davinci, the largest version of GPT-3. Their experiments show that GPT-3 has mediocre performance on some types of planning tasks but performs very poorly in areas such as plan reuse, plan generalization, optimal planning, and replanning.<\/p>\n<p>\u201cThe initial studies we have seen basically show that LLMs are particularly bad on anything that would be considered planning tasks\u2013including plan generation, optimal plan generation, plan reuse or replanning,\u201d Kambhampati said. \u201cThey do better on the planning-related tasks that don\u2019t require chains of reasoning\u2013such as goal shuffling.\u201d<\/p>\n<p>In the future, the researchers will add test cases based on other IPC domains and provide performance baselines with human subjects on the same benchmarks.<\/p>\n<p>\u201cWe are also ourselves curious as to whether other variants of LLMs do any better on these benchmarks,\u201d Kambhampati said.<\/p>\n<p>Kambhampati stresses that the goal of the project is to put the benchmark out and give an idea of where the current baseline is. The researchers hope that their work opens new windows for developing planning and reasoning capability for current AI systems. For example, one direction they propose is evaluating the effectiveness of finetuning LLMs for reasoning and planning in specific domains. 
The team already has preliminary results on an instruction-following variant of GPT-3 that seems to do marginally better on the easy tasks, although it too remains around the 5-percent level for actual plan generation tasks, Kambhampati said.<\/p>\n<p>Kambhampati also believes that learning and acquiring world models would be an essential step for any AI system that can reason and plan. Other scientists, including <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2022\/03\/07\/yann-lecun-ai-self-supervised-learning\/\">deep learning pioneer Yann LeCun<\/a>, have made similar suggestions.<\/p>\n<p>\u201cIf we agree that reasoning is part of intelligence, and want to claim LLMs do it, we certainly need plan generation benchmarks there,\u201d Kambhampati said. \u201cRather than take a magisterial negative stand, we are providing a benchmark, so that people who believe that reasoning can be emergent from LLMs even without any special mechanisms such as world models and reasoning about dynamics, can use the benchmark to support their point of view.\u201d<\/p>\n<p><em>This article was originally published by Ben Dickson on<span>\u00a0<\/span><a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/\">TechTalks<\/a>, a publication that examines trends in technology, how they affect the way we live and do business, and the problems they solve. But we also discuss the evil side of technology, the darker implications of new tech, and what we need to look out for. 
You can read the original article\u00a0<a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2022\/07\/25\/large-language-models-cant-plan\/\">here<\/a>.<\/em>\n                        <\/div>\n<p><script async src=\"\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\n<p><span style=\"color: black;\"><a style=\"color: #ff9900;\" href=\"https:\/\/thenextweb.com\/news\/large-language-models-cant-plan\" target=\"_blank\" rel=\"noopener\">Source<\/a><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>&#8220;Large language models can\u2019t plan, even if they write fancy essays&#8221; This article is part of our coverage of the latest in AI research. Large language models like GPT-3 have advanced to the point that it has become difficult to measure the limits of their capabilities. 
When you have a very large neural network that&#8230;<\/p>\n","protected":false},"author":1,"featured_media":479856,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"https:\/\/img-cdn.tnwcdn.com\/image\/neural?filter_last=1&fit=1280,640&url=https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2022\/07\/language-learning.jpg&signature=a32a0c680be5d31e7d2c46d40824dd96","fifu_image_alt":"","footnotes":""},"categories":[18],"tags":[],"class_list":["post-479855","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technology"],"_links":{"self":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/479855","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/comments?post=479855"}],"version-history":[{"count":0,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/479855\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media\/479856"}],"wp:attachment":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media?parent=479855"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/categories?post=479855"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/tags?post=479855"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}