{"id":272820,"date":"2021-06-12T13:00:42","date_gmt":"2021-06-12T10:00:42","guid":{"rendered":"https:\/\/en.buradabiliyorum.com\/reinforcement-learning-can-deliver-general-ai-says-deepmind\/"},"modified":"2021-06-12T13:00:42","modified_gmt":"2021-06-12T10:00:42","slug":"reinforcement-learning-can-deliver-general-ai-says-deepmind","status":"publish","type":"post","link":"https:\/\/buradabiliyorum.com\/en\/reinforcement-learning-can-deliver-general-ai-says-deepmind\/","title":{"rendered":"#Reinforcement learning can deliver general AI, says DeepMind"},"content":{"rendered":"<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_85 counter-hierarchy ez-toc-counter ez-toc-custom ez-toc-container-direction\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of Contents<\/p>\n<label for=\"ez-toc-cssicon-toggle-item-6a4d85853ca3c\" class=\"ez-toc-cssicon-toggle-label\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #dd3333;color:#dd3333\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #dd3333;color:#dd3333\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/label><input type=\"checkbox\"  id=\"ez-toc-cssicon-toggle-item-6a4d85853ca3c\" checked aria-label=\"Toggle\" \/><nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/buradabiliyorum.com\/en\/reinforcement-learning-can-deliver-general-ai-says-deepmind\/#Two_paths_for_AI\" >Two paths for AI<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/buradabiliyorum.com\/en\/reinforcement-learning-can-deliver-general-ai-says-deepmind\/#Developing_abilities_through_reward_maximization\" >Developing abilities through reward maximization<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/buradabiliyorum.com\/en\/reinforcement-learning-can-deliver-general-ai-says-deepmind\/#Reinforcement_learning_for_reward_maximization\" >Reinforcement learning for reward maximization<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/buradabiliyorum.com\/en\/reinforcement-learning-can-deliver-general-ai-says-deepmind\/#Strengths_and_weaknesses_of_reward_maximization\" >Strengths and weaknesses of reward maximization<\/a><\/li><\/ul><\/nav><\/div>\n<p>&#8220;<strong>#Reinforcement learning can deliver <a href=\"https:\/\/buradabiliyorum.com\/en\/category\/general\/\" data-internallinksmanager029f6b8e52c=\"3\" title=\"General\" target=\"_blank\" rel=\"noopener\">general<\/a> AI, says DeepMind<\/strong>&#8221;<\/p>\n<div>In their decades-long chase to create artificial intelligence, computer scientists have designed and developed all kinds of complicated mechanisms and technologies to replicate vision, language, reasoning, motor skills, and other abilities associated with intelligent life. While these efforts have resulted in AI systems that can efficiently solve specific problems in limited environments, they fall short of developing the kind of general intelligence seen in humans and animals.<\/p>\n<p>In a new paper submitted to the peer-reviewed<span>\u00a0<\/span><em>Artificial Intelligence\u00a0<\/em>journal, scientists at<span>\u00a0<\/span><a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2020\/12\/21\/deepminds-annual-report-why-its-hard-to-run-a-commercial-ai-lab\/\">UK-based AI lab DeepMind<\/a><span>\u00a0<\/span>argue that intelligence and its associated abilities will emerge not from formulating and solving complicated problems but by sticking to a simple but powerful principle: reward maximization.<\/p>\n<p>Titled \u201c<a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/www.sciencedirect.com\/science\/article\/pii\/S0004370221000862\">Reward is Enough<\/a>,\u201d the paper, which is still in pre-proof as of this writing, draws inspiration from studying the evolution of natural intelligence as well as drawing lessons from recent achievements in artificial intelligence. The authors suggest that reward maximization and trial-and-error experience are enough to develop behavior that exhibits the kind of abilities associated with intelligence. And from this, they conclude that reinforcement learning, a branch of AI that is based on reward maximization, can lead to the development of<span>\u00a0<\/span><a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2020\/05\/13\/what-is-artificial-general-intelligence-agi\/\">artificial general intelligence<\/a>.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Two_paths_for_AI\"><\/span>Two paths for AI<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>\u00a0<\/p>\n<figure class=\"post-image post-mediaBleed aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-1356692 aligncenter js-lazy\" alt=\"\" width=\"696\" height=\"487\" sizes=\"auto, (max-width: 696px) 100vw, 696px\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD1.jpeg\" srcset=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD1.jpeg 696w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD1-280x196.jpeg 280w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD1-386x270.jpeg 386w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD1-193x135.jpeg 193w\"\/><noscript><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-1356692 aligncenter\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD1.jpeg\" alt=\"\" width=\"696\" height=\"487\" srcset=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD1.jpeg 696w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD1-280x196.jpeg 280w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD1-386x270.jpeg 386w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD1-193x135.jpeg 193w\"\/><\/noscript><\/figure>\n<p>One common method for creating AI is to try to replicate elements of intelligent behavior in computers. For instance, our<span>\u00a0<\/span><a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2021\/05\/10\/biological-computer-vision\/\">understanding of the mammal vision system<\/a><span>\u00a0<\/span>has given rise to all kinds of AI systems that can categorize images, locate objects in photos, define the boundaries between objects, and more. Likewise, our understanding of language has helped in the development of various<span>\u00a0<\/span><a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2018\/02\/20\/ai-machine-learning-nlg-nlp\/\">natural language processing<\/a><span>\u00a0<\/span>systems, such as question answering, text generation, and machine translation.<\/p>\n<p>These are all instances of<span>\u00a0<\/span><a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2020\/04\/09\/what-is-narrow-artificial-intelligence-ani\/\">narrow artificial intelligence<\/a>, systems that have been designed to perform specific tasks instead of having general problem-solving abilities. Some scientists believe that assembling multiple narrow AI modules will produce higher intelligent systems. For example, you can have a software system that coordinates between separate<span>\u00a0<\/span><a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2019\/01\/14\/what-is-computer-vision\/\">computer vision<\/a>, voice processing, NLP, and motor control modules to solve complicated problems that require a multitude of skills.<\/p>\n<p>A different <a href=\"https:\/\/buradabiliyorum.com\/en\/category\/download-scripts-themes-apps\/\" data-internallinksmanager029f6b8e52c=\"9\" title=\"Download Scripts &amp; Themes &amp; Apps\" target=\"_blank\" rel=\"noopener\">app<\/a>roach to creating AI, proposed by the DeepMind researchers, is to recreate the simple yet effective rule that has given rise to natural intelligence. \u201c[We] consider an alternative hypothesis: that the generic objective of maximising reward is enough to drive behaviour that exhibits most if not all abilities that are studied in natural and artificial intelligence,\u201d the researchers write.<\/p>\n<p>This is basically how nature works. As far as <a href=\"https:\/\/buradabiliyorum.com\/en\/category\/sciencee\/\" data-internallinksmanager029f6b8e52c=\"5\" title=\"Science\" target=\"_blank\" rel=\"noopener\">science<\/a> is concerned, there has been no top-down intelligent design in the complex organisms that we see around us. Billions of years of natural selection and random variation have filtered lifeforms for their fitness to survive and reproduce. Living beings that were better equipped to handle the challenges and situations in their environments managed to survive and reproduce. The rest were eliminated.<\/p>\n<p>This simple yet efficient mechanism has led to the evolution of living beings with all kinds of skills and abilities to perceive, navigate, modify their environments, and communicate among themselves.<\/p>\n<p>\u201cThe natural world faced by animals and humans, and presumably also the environments faced in the future by artificial agents, are inherently so complex that they require sophisticated abilities in order to succeed (for example, to survive) within those environments,\u201d the researchers write. \u201cThus, success, as measured by maximising reward, demands a variety of abilities associated with intelligence. In such environments, any behaviour that maximises reward must necessarily exhibit those abilities. In this sense, the generic objective of reward maximization contains within it many or possibly even all the goals of intelligence.\u201d<\/p>\n<p>For example, consider a squirrel that seeks the reward of minimizing hunger. On the one hand, its sensory and motor skills help it locate and collect nuts when food is available. But a squirrel that can only find food is bound to die of hunger when food becomes scarce. This is why it also has planning skills and memory to cache the nuts and restore them in winter. And the squirrel has <a href=\"https:\/\/buradabiliyorum.com\/en\/category\/social-mediaa\/\" data-internallinksmanager029f6b8e52c=\"1\" title=\"Social Media\" target=\"_blank\" rel=\"noopener\">social<\/a> skills and knowledge to ensure other animals don\u2019t steal its nuts. If you zoom out, hunger minimization can be a subgoal of \u201cstaying alive,\u201d which also requires skills such as detecting and hiding from dangerous animals, protecting oneself from environmental threats, and seeking better habitats with seasonal changes.<\/p>\n<p>\u201cWhen abilities associated with intelligence arise as solutions to a singular goal of reward maximisation, this may in fact provide a deeper understanding since it explains<span>\u00a0<\/span><em>why<\/em><span>\u00a0<\/span>such an ability arises,\u201d the researchers write. \u201cIn contrast, when each ability is understood as the solution to its own specialised goal, the why question is side-stepped in order to focus upon<span>\u00a0<\/span><em>what<\/em><span>\u00a0<\/span>that ability does.\u201d<\/p>\n<p>Finally, the researchers argue that the \u201cmost general and scalable\u201d way to maximize reward is through agents that learn through interaction with the environment.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Developing_abilities_through_reward_maximization\"><\/span>Developing abilities through reward maximization<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><figure class=\"post-image post-mediaBleed aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-1356693 aligncenter js-lazy\" alt=\"\" width=\"1645\" height=\"863\" sizes=\"auto, (max-width: 1645px) 100vw, 1645px\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD2.jpeg\" srcset=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD2.jpeg 1645w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD2-280x147.jpeg 280w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD2-515x270.jpeg 515w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD2-257x135.jpeg 257w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD2-796x418.jpeg 796w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD2-1592x835.jpeg 1592w\"\/><noscript><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-1356693 aligncenter\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD2.jpeg\" alt=\"\" width=\"1645\" height=\"863\" srcset=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD2.jpeg 1645w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD2-280x147.jpeg 280w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD2-515x270.jpeg 515w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD2-257x135.jpeg 257w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD2-796x418.jpeg 796w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD2-1592x835.jpeg 1592w\"\/><\/noscript><\/figure>\n<p>In the paper, the AI researchers provide some high-level examples of how \u201cintelligence and associated abilities will implicitly arise in the service of maximising one of many possible reward signals, corresponding to the many pragmatic goals towards which natural or artificial intelligence may be directed.\u201d<\/p>\n<p>For example, sensory skills serve the need to survive in complicated environments. Object recognition enables animals to detect food, prey, friends, and threats, or find paths, shelters, and perches. Image segmentation enables them to tell the difference between different objects and avoid fatal mistakes such as running off a cliff or falling off a branch. Meanwhile, hearing helps detect threats where the animal can\u2019t see or find prey when they\u2019re camouflaged. Touch, taste, and smell also give the animal the advantage of having a richer sensory experience of the habitat and a greater chance of survival in dangerous environments.<\/p>\n<p>Rewards and environments also shape innate and learned knowledge in animals. For instance, hostile habitats ruled by predator animals such as lions and cheetahs reward ruminant species that have the innate knowledge to run away from threats since birth. Meanwhile, animals are also rewarded for their power to learn specific knowledge of their habitats, such as where to find food and shelter.<\/p>\n<p>The researchers also discuss the reward-powered basis of language, social intelligence, imitation, and finally, general intelligence, which they describe as \u201cmaximising a singular reward in a single, complex environment.\u201d<\/p>\n<p>Here, they draw an analogy between natural intelligence and AGI: \u201cAn animal\u2019s stream of experience is sufficiently rich and varied that it may demand a flexible ability to achieve a vast variety of subgoals (such as foraging, fighting, or fleeing), in order to succeed in maximising its overall reward (such as hunger or reproduction). Similarly, if an artificial agent\u2019s stream of experience is sufficiently rich, then many goals (such as battery-life or survival) may implicitly require the ability to achieve an equally wide variety of subgoals, and the maximisation of reward should therefore be enough to yield an artificial general intelligence.\u201d<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Reinforcement_learning_for_reward_maximization\"><\/span>Reinforcement learning for reward maximization<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><figure class=\"post-image post-mediaBleed aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1356694 js-lazy\" alt=\"\" width=\"696\" height=\"392\" sizes=\"auto, (max-width: 696px) 100vw, 696px\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD3.jpeg\" srcset=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD3.jpeg 696w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD3-280x158.jpeg 280w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD3-479x270.jpeg 479w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD3-240x135.jpeg 240w\"\/><noscript><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1356694\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD3.jpeg\" alt=\"\" width=\"696\" height=\"392\" srcset=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD3.jpeg 696w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD3-280x158.jpeg 280w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD3-479x270.jpeg 479w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD3-240x135.jpeg 240w\"\/><\/noscript><\/figure>\n<p><a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2019\/05\/28\/what-is-reinforcement-learning\/\">Reinforcement learning<\/a> is a special branch of AI algorithms that is composed of three key elements: an environment, agents, and rewards.<\/p>\n<p>By performing actions, the agent changes its own state and that of the environment. Based on how much those actions affect the goal the agent must achieve, it is rewarded or penalized. In many reinforcement learning problems, the agent has no initial knowledge of the environment and starts by taking random actions. Based on the feedback it receives, the agent learns to tune its actions and develop policies that maximize its reward.<\/p>\n<p>In their paper, the researchers at DeepMind suggest reinforcement learning as the main algorithm that can replicate reward maximization as seen in nature and can eventually lead to artificial general intelligence.<\/p>\n<p>\u201cIf an agent can continually adjust its behaviour so as to improve its cumulative reward, then any abilities that are repeatedly demanded by its environment must ultimately be produced in the agent\u2019s behaviour,\u201d the researchers write, adding that, in the course of maximizing for its reward, a good reinforcement learning agent could eventually learn perception, language, social intelligence and so forth.<\/p>\n<p>In the paper, the researchers provide several examples that show how reinforcement learning agents were able to learn general skills in <a href=\"https:\/\/buradabiliyorum.com\/en\/category\/game\/\" data-internallinksmanager029f6b8e52c=\"7\" title=\"Game\" target=\"_blank\" rel=\"noopener\">game<\/a>s and robotic environments.<\/p>\n<p>However, the researchers stress that some fundamental challenges remain unsolved. For instance, they say, \u201cWe do not offer any theoretical guarantee on the sample efficiency of reinforcement learning agents.\u201d Reinforcement learning is notoriously renowned for requiring huge amounts of data. For instance, a reinforcement learning agent might need centuries worth of gameplay to master a computer game. And AI researchers still haven\u2019t figured out how to create reinforcement learning systems that can generalize their learnings across several domains. Therefore, slight changes to the environment often require the full retraining of the model.<\/p>\n<p>The researchers also acknowledge that learning mechanisms for reward maximization is an unsolved problem that remains a central question to be further studied in reinforcement learning.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Strengths_and_weaknesses_of_reward_maximization\"><\/span>Strengths and weaknesses of reward maximization<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><figure class=\"post-image post-mediaBleed aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1356695 js-lazy\" alt=\"\" width=\"696\" height=\"418\" sizes=\"auto, (max-width: 696px) 100vw, 696px\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD4.jpeg\" srcset=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD4.jpeg 696w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD4-280x168.jpeg 280w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD4-450x270.jpeg 450w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD4-225x135.jpeg 225w\"\/><noscript><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1356695\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD4.jpeg\" alt=\"\" width=\"696\" height=\"418\" srcset=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD4.jpeg 696w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD4-280x168.jpeg 280w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD4-450x270.jpeg 450w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BD4-225x135.jpeg 225w\"\/><\/noscript><\/figure>\n<p>Patricia Churchland, neuroscientist, philosopher, and professor emerita at the University of California, San Diego, described the ideas in the paper as \u201cvery carefully and insightfully worked out.\u201d<\/p>\n<p><span>However, Churchland pointed it out to possible flaws in the paper\u2019s discussion about social decision-making. The DeepMind researchers focus on personal gains in social interactions. Churchland, who has recently written a book on the\u00a0<\/span><a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2020\/09\/28\/ai-conscience-patricia-churchland\/\">biological origins of moral intuitions<\/a><span>, argues that attachment and bonding is a powerful factor in social decision-making of mammals and birds, which is why animals put themselves in great danger to protect their children.\u00a0<\/span><\/p>\n<div style=\"text-align: center\">\n<p><iframe loading=\"lazy\" title=\"Cheetah Mom Protects Cubs from Male Lion | Animals Save Another Animals\" width=\"640\" height=\"360\" src=\"https:\/\/www.youtube.com\/embed\/mCn58l3fdGk?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe><\/p>\n<\/div>\n<p>\u201cI have tended to see bonding, and hence other-care, as an extension of the ambit of what counts as oneself\u2014\u2018me-and-mine,\u2019\u201d Churchland said. \u201cIn that case, a small modification to the [paper\u2019s] hypothesis to allow for reward maximization to me-and-mine would work quite nicely, I think. Of course, we social animals have degrees of attachment\u2014super strong to offspring, very strong to mates and kin, strong to friends and acquaintances etc., and the strength of types of attachments can vary depending on environment, and also on developmental stage.\u201d<\/p>\n<p>This is not a major criticism, Churchland said, and could likely be worked into the hypothesis quite gracefully.<\/p>\n<p>\u201cI am very impressed with the degree of detail in the paper, and how carefully they consider possible weaknesses,\u201d Churchland said. \u201cI may be wrong, but I tend to see this as a milestone.\u201d<\/p>\n<p>Data scientist Herbert Roitblat challenged the paper\u2019s position that simple learning mechanisms and trial-and-error experience are enough to develop the abilities associated with intelligence. Roitblat argued that the theories presented in the paper face several challenges when it comes to implementing them in real life.<\/p>\n<p>\u201cIf there are no time constraints, then trial and error learning might be enough, but otherwise we have the problem of an infinite number of monkeys typing for an infinite amount of time,\u201d Roitblat said.<\/p>\n<p>The<span>\u00a0<\/span><a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/en.wikipedia.org\/wiki\/Infinite_monkey_theorem\">infinite monkey theorem<\/a><span>\u00a0states<\/span> that a monkey hitting random keys on a typewriter for an infinite amount of time may eventually type any given text.<\/p>\n<p>Roitblat is the author of<span>\u00a0<\/span><a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2021\/03\/29\/ai-algorithms-representations-herbert-roitblat\/\"><em>Algorithms are Not Enough<\/em><\/a>, in which he explains why all current AI algorithms, including reinforcement learning, require careful formulation of the problem and representations created by humans.<\/p>\n<p>\u201cOnce the model and its intrinsic representation are set up, optimization or reinforcement could guide its evolution, but that does not mean that reinforcement is enough,\u201d Roitblat said.<\/p>\n<p>In the same vein, Roitblat added that the paper does not make any suggestions on how the reward, actions, and other elements of reinforcement learning are defined.<\/p>\n<p>\u201cReinforcement learning assumes that the agent has a finite set of potential actions. A reward signal and value function have been specified. In other words, the problem of general intelligence is precisely to contribute those things that reinforcement learning requires as a pre-requisite,\u201d Roitblat said. \u201cSo, if machine learning can all be reduced to some form of optimization to maximize some evaluative measure, then it must be true that reinforcement learning is relevant, but it is not very explanatory.\u201d<\/p>\n<p><i><span>This article was originally published by Ben Dickson on\u00a0<\/span><\/i><a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/\"><i><span>TechTalks<\/span><\/i><\/a><i><span>, a publication that examines trends in <a href=\"https:\/\/buradabiliyorum.com\/en\/category\/technology\/\" data-internallinksmanager029f6b8e52c=\"4\" title=\"Technology\" target=\"_blank\" rel=\"noopener\">technology<\/a>, how they affect the way we live and do business, and the problems they solve. But we also discuss the evil side of technology, the darker implications of new tech, and what we need to look out for. You can read the original article <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2021\/06\/07\/deepmind-artificial-intelligence-reward-maximization\/\">here<\/a>.<\/span><\/i><\/p>\n<p>\u00a0<\/p>\n<\/div>\n<blockquote><p><strong><span style=\"color: #ff6600;\">If you liked the article, do not forget to share it with your friends. Follow us on\u00a0<span style=\"color: #ff0000;\"><a style=\"color: #ff0000;\" href=\"https:\/\/news.google.com\/publications\/CAAqBwgKMLG0nwswvr63Aw\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Google News<\/a><\/span>\u00a0too, click on the star and choose us from your favorites.<\/span><\/strong><\/p><\/blockquote>\n<blockquote>\n<p style=\"text-align: center;\">For forums sites go to <span style=\"color: #ff9900;\"><a style=\"color: #ff9900;\" href=\"https:\/\/forum.buradabiliyorum.com\/\" target=\"_blank\" rel=\"noopener\">Forum.BuradaBiliyorum.Com<\/a><\/span><\/strong>\n<\/p><\/blockquote>\n<blockquote>\n<p style=\"text-align: center;\"><strong>If you want to read more like this article, you can visit our <span style=\"color: #ff9900;\"><a style=\"color: #ff9900;\" href=\"https:\/\/en.buradabiliyorum.com\/technology\/\" target=\"_blank\" rel=\"noopener\">Technology category.<\/a><\/span><\/strong><\/p>\n<\/blockquote>\n<p><span style=\"color: black;\"><a style=\"color: #ff9900;\" href=\"https:\/\/thenextweb.com\/news\/deepmind-reinforcement-learning-enough-general-ai-syndication\" target=\"_blank\" rel=\"noopener\">Source<\/a><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>&#8220;#Reinforcement learning can deliver general AI, says DeepMind&#8221; In their decades-long chase to create artificial intelligence, computer scientists have designed and developed all kinds of complicated mechanisms and technologies to replicate vision, language, reasoning, motor skills, and other abilities associated with intelligent life. While these efforts have resulted in AI systems that can efficiently solve&#8230;<\/p>\n","protected":false},"author":1,"featured_media":272821,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"https:\/\/img-cdn.tnwcdn.com\/image\/neural?filter_last=1&fit=1280,640&url=https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/06\/2BDSF.jpg&signature=c72563bebf471ce23430f8ec3480edde","fifu_image_alt":"","footnotes":""},"categories":[18],"tags":[],"class_list":["post-272820","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technology"],"_links":{"self":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/272820","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/comments?post=272820"}],"version-history":[{"count":0,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/272820\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media\/272821"}],"wp:attachment":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media?parent=272820"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/categories?post=272820"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/tags?post=272820"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}