{"id":317422,"date":"2021-08-06T16:00:21","date_gmt":"2021-08-06T13:00:21","guid":{"rendered":"https:\/\/en.buradabiliyorum.com\/deepminds-new-system-could-take-us-a-step-closer-to-general-ai\/"},"modified":"2021-08-06T16:00:21","modified_gmt":"2021-08-06T13:00:21","slug":"deepminds-new-system-could-take-us-a-step-closer-to-general-ai","status":"publish","type":"post","link":"https:\/\/buradabiliyorum.com\/en\/deepminds-new-system-could-take-us-a-step-closer-to-general-ai\/","title":{"rendered":"#DeepMind\u2019s new system could take us a step closer to general AI"},"content":{"rendered":"<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_84 counter-hierarchy ez-toc-counter ez-toc-custom ez-toc-container-direction\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of Contents<\/p>\n<label for=\"ez-toc-cssicon-toggle-item-6a2d0535d1cbc\" class=\"ez-toc-cssicon-toggle-label\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #dd3333;color:#dd3333\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #dd3333;color:#dd3333\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/label><input type=\"checkbox\"  id=\"ez-toc-cssicon-toggle-item-6a2d0535d1cbc\" checked aria-label=\"Toggle\" \/><nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 
ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/buradabiliyorum.com\/en\/deepminds-new-system-could-take-us-a-step-closer-to-general-ai\/#The_brittleness_of_deep_reinforcement_learning\" >The brittleness of deep reinforcement learning<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/buradabiliyorum.com\/en\/deepminds-new-system-could-take-us-a-step-closer-to-general-ai\/#The_XLand_environment\" >The XLand environment<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/buradabiliyorum.com\/en\/deepminds-new-system-could-take-us-a-step-closer-to-general-ai\/#Deep_reinforcement_learning\" >Deep reinforcement learning<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/buradabiliyorum.com\/en\/deepminds-new-system-could-take-us-a-step-closer-to-general-ai\/#High-level_behavior\" >High-level behavior<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/buradabiliyorum.com\/en\/deepminds-new-system-could-take-us-a-step-closer-to-general-ai\/#Theories_of_intelligence\" >Theories of intelligence<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/buradabiliyorum.com\/en\/deepminds-new-system-could-take-us-a-step-closer-to-general-ai\/#The_gap_between_simulation_and_the_real_world\" >The gap between simulation and the real world<\/a><\/li><\/ul><\/nav><\/div>\n<p>&#8220;<strong>#DeepMind\u2019s new system could take us a step closer to <a href=\"https:\/\/buradabiliyorum.com\/en\/category\/general\/\" data-internallinksmanager029f6b8e52c=\"3\" title=\"General\" target=\"_blank\" rel=\"noopener\">general<\/a> AI<\/strong>&#8221;<\/p>\n<div>One of the key challenges of deep reinforcement learning models\u2014the 
kind of AI systems that have mastered Go, StarCraft 2, and other games\u2014is their inability to generalize their capabilities beyond their training domain. This limit makes it very hard to apply these systems to real-world settings, where situations are much more complicated and unpredictable than the environments where AI models are trained.<\/p>\n<p>But scientists at AI research lab DeepMind claim to have taken the \u201cfirst steps to train an agent capable of playing many different games without needing human interaction data,\u201d according to a <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/deepmind.com\/blog\/article\/generally-capable-agents-emerge-from-open-ended-play\">blog post<\/a> about their new \u201copen-ended learning\u201d initiative. Their new project includes a 3D environment with realistic dynamics and deep reinforcement learning agents that can learn to solve a wide range of challenges.<\/p>\n<p>The new system, according to DeepMind\u2019s AI researchers, is an \u201cimportant step toward creating more general agents with the flexibility to adapt rapidly within constantly changing environments.\u201d<\/p>\n<p>The paper\u2019s findings show some impressive advances in applying reinforcement learning to complicated problems.
But they are also a reminder of how far current systems are from achieving the kind of general intelligence capabilities that the AI community has been <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2019\/07\/22\/general-ai-driverless-cars-impossible\/\">coveting for decades<\/a>.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"The_brittleness_of_deep_reinforcement_learning\"><\/span>The brittleness of deep reinforcement learning<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<figure class=\"post-image post-mediaBleed aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1363116 js-lazy\" alt=\"Reinforcement-learning\" width=\"696\" height=\"392\" sizes=\"auto, (max-width: 696px) 100vw, 696px\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/Reinforcement-learning.jpeg\" srcset=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/Reinforcement-learning.jpeg 696w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/Reinforcement-learning-280x158.jpeg 280w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/Reinforcement-learning-479x270.jpeg 479w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/Reinforcement-learning-240x135.jpeg 240w\"\/><noscript><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1363116\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/Reinforcement-learning.jpeg\" alt=\"Reinforcement-learning\" width=\"696\" height=\"392\" srcset=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/Reinforcement-learning.jpeg 696w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/Reinforcement-learning-280x158.jpeg 280w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/Reinforcement-learning-479x270.jpeg 479w, 
https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/Reinforcement-learning-240x135.jpeg 240w\"\/><\/noscript><\/figure>\n<p>The key advantage of <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2019\/05\/28\/what-is-reinforcement-learning\/\">reinforcement learning<\/a> is its ability to develop behavior by taking actions and getting feedback, similar to the way humans and animals learn by interacting with their environment. Some scientists describe reinforcement learning as \u201c<a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/venturebeat.com\/2021\/01\/02\/leading-computer-scientists-debate-the-next-steps-for-ai-in-2021\/\">the first computational theory of intelligence<\/a>.\u201d<\/p>\n<p>The combination of reinforcement learning and <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2021\/01\/28\/deep-learning-explainer\/\">deep neural networks<\/a>, known as deep reinforcement learning, has been at the heart of many advances in AI, including DeepMind\u2019s famous AlphaGo and <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2019\/11\/04\/deepmind-ai-starcraft-2-reinforcement-learning\/\">AlphaStar models<\/a>. In both cases, the AI systems were able to outmatch human world champions at their respective games.<\/p>\n<p>But reinforcement learning systems are also notorious for their lack of flexibility. For example, a reinforcement learning model that can play StarCraft 2 at an expert level won\u2019t be able to play a game with similar mechanics (e.g., Warcraft 3) at any level of competency. Even slight changes to the original game will considerably degrade the AI model\u2019s performance.<\/p>\n<p>\u201cThese agents are often constrained to play only the games they were trained for \u2013 whilst the exact instantiation of the game may vary (e.g.
the layout, initial conditions, opponents) the goals the agents must satisfy remain the same between training and testing. Deviation from this can lead to catastrophic failure of the agent,\u201d DeepMind\u2019s researchers write in a <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/deepmind.com\/research\/publications\/open-ended-learning-leads-to-generally-capable-agents\">paper<\/a> that provides the full details on their open-ended learning.<\/p>\n<p>Humans, on the other hand, are very good at transferring knowledge across domains.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"The_XLand_environment\"><\/span>The XLand environment<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><figure class=\"post-image post-mediaBleed aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1363119 js-lazy\" alt=\"XLand-environment\" width=\"696\" height=\"382\" sizes=\"auto, (max-width: 696px) 100vw, 696px\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/XLand-environment.jpeg\" srcset=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/XLand-environment.jpeg 696w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/XLand-environment-280x154.jpeg 280w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/XLand-environment-492x270.jpeg 492w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/XLand-environment-246x135.jpeg 246w\"\/><noscript><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1363119\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/XLand-environment.jpeg\" alt=\"XLand-environment\" width=\"696\" height=\"382\" srcset=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/XLand-environment.jpeg 696w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/XLand-environment-280x154.jpeg 280w, 
https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/XLand-environment-492x270.jpeg 492w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/XLand-environment-246x135.jpeg 246w\"\/><\/noscript><\/figure>\n<p>The goal of DeepMind\u2019s new project was to create \u201can artificial agent whose behaviour generalises beyond the set of games it was trained on.\u201d<\/p>\n<p>To this end, the team created XLand, an engine that can generate 3D environments composed of static topology and moveable objects. The game engine simulates rigid-body physics and allows players to use the objects in various ways (e.g., create ramps or block paths).<\/p>\n<p>XLand is a rich environment in which you can train agents on a virtually unlimited number of tasks. One of the main advantages of XLand is the capability to use programmatic rules to automatically generate a vast array of environments and challenges to train AI agents. This addresses one of the key challenges of machine learning systems, which often require vast amounts of manually curated training data.<\/p>\n<p>According to the blog post, the researchers created \u201cbillions of tasks in XLand, across varied games, worlds, and players.\u201d The games range from very simple goals, such as finding objects, to more complex settings in which the AI agents must weigh the benefits and tradeoffs of different rewards. Some of the games include cooperation or competition elements involving multiple agents.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Deep_reinforcement_learning\"><\/span>Deep reinforcement learning<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>DeepMind uses deep reinforcement learning and a few clever tricks to create AI agents that can thrive in the XLand environment.<\/p>\n<p>The reinforcement learning model of each agent receives a first-person view of the world, the agent\u2019s physical state (e.g., whether it is holding an object), and its current goal.
Each agent finetunes the parameters of its policy neural network to maximize its rewards on the current task. The neural network architecture contains an attention mechanism to ensure the agent can balance optimization for the subgoals required to accomplish the main goal.<\/p>\n<p>Once the agent masters its current challenge, the computational task generator creates a new challenge for the agent. Each new task is generated according to the agent\u2019s training history and in a way to help distribute the agent\u2019s skills across a vast range of challenges.<\/p>\n<p>DeepMind also used its vast computational resources (<a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2020\/12\/21\/deepminds-annual-report-why-its-hard-to-run-a-commercial-ai-lab\/\">courtesy of its owner Alphabet Inc.<\/a>) to train a large population of agents in parallel and transfer learned parameters across different agents to improve the general capabilities of the reinforcement learning systems.<\/p>\n<figure class=\"post-image post-mediaBleed aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-1363120 js-lazy\" alt=\"DeepMind-XLand-agent-training\" width=\"661\" height=\"1024\" sizes=\"auto, (max-width: 661px) 100vw, 661px\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/DeepMind-XLand-agent-training.jpeg\" srcset=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/DeepMind-XLand-agent-training.jpeg 661w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/DeepMind-XLand-agent-training-136x210.jpeg 136w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/DeepMind-XLand-agent-training-174x270.jpeg 174w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/DeepMind-XLand-agent-training-87x135.jpeg 87w\"\/><figcaption>DeepMind uses a multi-step and population-based mechanism to train many reinforcement learning agents<\/figcaption><noscript><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-1363120\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/DeepMind-XLand-agent-training.jpeg\" alt=\"DeepMind-XLand-agent-training\" width=\"661\" height=\"1024\" srcset=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/DeepMind-XLand-agent-training.jpeg 661w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/DeepMind-XLand-agent-training-136x210.jpeg 136w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/DeepMind-XLand-agent-training-174x270.jpeg 174w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/DeepMind-XLand-agent-training-87x135.jpeg 87w\"\/><\/noscript><\/figure>\n<p>The performance of the reinforcement learning agents was evaluated based on their general ability to accomplish a wide range of tasks they had not been trained on.
Some of the test tasks include well-known challenges such as \u201ccapture the flag\u201d and \u201chide and seek.\u201d<\/p>\n<p>According to DeepMind, each agent played around 700,000 unique games in 4,000 unique worlds within XLand and went through 200 billion training steps across 3.4 million unique tasks (in the paper, the researchers write that 100 million steps are equivalent to approximately 30 minutes of training).<\/p>\n<p>\u201cAt this time, our agents have been able to participate in every procedurally generated evaluation task except for a handful that were impossible even for a human,\u201d the AI researchers wrote. \u201cAnd the results we\u2019re seeing clearly exhibit general, zero-shot behaviour across the task space.\u201d<\/p>\n<p><a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2020\/08\/12\/what-is-one-shot-learning\/\">Zero-shot machine learning<\/a> models can solve problems that were not present in their training dataset. In a complicated space such as XLand, zero-shot learning might imply that the agents have obtained fundamental knowledge about their environment as opposed to memorizing sequences of image frames in specific tasks and environments.<\/p>\n<p>The reinforcement learning agents further manifested signs of generalized learning when the researchers tried to adjust them for new tasks. According to their findings, 30 minutes of <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2019\/06\/10\/what-is-transfer-learning\/\">fine-tuning on new tasks<\/a> was enough to create an impressive improvement in a reinforcement learning agent trained with the new method. 
In contrast, an agent trained from scratch for the same amount of time would have near-zero performance on most tasks.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"High-level_behavior\"><\/span>High-level behavior<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>According to DeepMind, the reinforcement learning agents exhibit the emergence of \u201cheuristic behavior\u201d such as tool use, teamwork, and multi-step planning. If proven, this can be an important milestone. Deep learning systems are often criticized for learning statistical correlations <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2021\/03\/15\/machine-learning-causality\/\">instead of causal relations<\/a>. If neural networks could develop high-level notions such as using objects to create ramps or cause occlusions, it could have a great impact on fields such as robotics and self-driving cars, where deep learning is currently struggling.<\/p>\n<p>But those are big ifs, and DeepMind\u2019s researchers are cautious about jumping to conclusions on their findings. 
\u201cGiven the nature of the environment, it is difficult to pinpoint intentionality \u2014 the behaviours we see often appear to be accidental, but still we see them occur consistently,\u201d they wrote in their blog post.<\/p>\n<p>But they are confident that their reinforcement learning agents \u201care aware of the basics of their bodies and the passage of time and that they understand the high-level structure of the games they encounter.\u201d<\/p>\n<p>Such <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2021\/07\/26\/ai-visual-reasoning-agent-dataset\/\">fundamental self-learned skills<\/a> are another one of the highly sought goals of the artificial intelligence community.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Theories_of_intelligence\"><\/span>Theories of intelligence<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><figure class=\"post-image post-mediaBleed aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1363121 js-lazy\" alt=\"DeepMind-XLand-environment\" width=\"696\" height=\"382\" sizes=\"auto, (max-width: 696px) 100vw, 696px\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/DeepMind-XLand-environment.jpeg\" srcset=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/DeepMind-XLand-environment.jpeg 696w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/DeepMind-XLand-environment-280x154.jpeg 280w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/DeepMind-XLand-environment-492x270.jpeg 492w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/DeepMind-XLand-environment-246x135.jpeg 246w\"\/><noscript><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1363121\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/DeepMind-XLand-environment.jpeg\" alt=\"DeepMind-XLand-environment\" width=\"696\" height=\"382\" 
srcset=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/DeepMind-XLand-environment.jpeg 696w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/DeepMind-XLand-environment-280x154.jpeg 280w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/DeepMind-XLand-environment-492x270.jpeg 492w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/DeepMind-XLand-environment-246x135.jpeg 246w\"\/><\/noscript><\/figure>\n<p>Some of DeepMind\u2019s top scientists <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2021\/06\/07\/deepmind-artificial-intelligence-reward-maximization\/\">published a paper<\/a> recently in which they hypothesize that a single reward and reinforcement learning are enough to eventually reach <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2020\/05\/13\/what-is-artificial-general-intelligence-agi\/\">artificial general intelligence<\/a> (AGI). An intelligent agent with the right incentives can develop all kinds of capabilities such as perception and natural language understanding, the scientists believe.<\/p>\n<p>Although DeepMind\u2019s new approach still requires the training of reinforcement learning agents on multiple engineered rewards, it is in line with their general perspective of achieving AGI through reinforcement learning.<\/p>\n<p>\u201cWhat DeepMind shows with this paper is that a single RL agent can develop the intelligence to reach many goals, rather than just one,\u201d Chris Nicholson, CEO of Pathmind, told TechTalks. \u201cAnd the skills it learns in accomplishing one thing can generalize to other goals. That is very similar to how human intelligence is applied. 
For example, we learn to grab and manipulate objects, and that is the foundation of accomplishing goals that range from pounding a hammer to making your bed.\u201d<\/p>\n<p>Nicholson also believes that other aspects of the paper\u2019s findings hint at progress toward general intelligence. \u201cParents will recognize that open-ended exploration is precisely how their toddlers learn to move through the world. They take something out of a cupboard, and put it back in. They invent their own small goals\u2014which may seem meaningless to adults\u2014and they master them,\u201d he said. \u201cDeepMind is programmatically setting goals for its agents within this world, and those agents are learning how to master them one by one.\u201d<\/p>\n<p>The reinforcement learning agents have also shown signs of developing <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2021\/04\/26\/reinforcement-learning-embodied-ai\/\">embodied intelligence<\/a> in their own virtual world, Nicholson said, like the kind humans have. 
\u201cThis is one more indication that the rich and malleable environment that people learn to move through and manipulate is conducive to the emergence of general intelligence, and that the biological and physical analogies of intelligence can guide further work in AI,\u201d he said.<\/p>\n<p>Sathyanaraya Raghavachary, Associate Professor of Computer <a href=\"https:\/\/buradabiliyorum.com\/en\/category\/sciencee\/\" data-internallinksmanager029f6b8e52c=\"5\" title=\"Science\" target=\"_blank\" rel=\"noopener\">Science<\/a> at the University of Southern California, is a bit more skeptical of the claims made in DeepMind\u2019s paper, especially the conclusions on proprioception, awareness of time, and high-level understanding of goals and environments.<\/p>\n<p>\u201cEven we humans are not fully aware of our bodies, let alone those VR agents,\u201d Raghavachary said in comments to TechTalks, adding that perception of the body requires an integrated brain that is co-designed for suitable body awareness and situatedness in space. \u201cSame with the passage of time\u2014that too would require a brain that has memory of the past, and a sense for time in relation to that past. What they (paper authors) might mean relates to the agents\u2019 tracking progressive changes in the environment resulting from their actions (e.g., as a result of moving a purple pyramid), state changes which the underlying physics simulator would generate.\u201d<\/p>\n<p>Raghavachary also points out that if the agents could understand the high-level structure of their tasks, they would not need 200 billion steps of simulated training to reach optimal results.<\/p>\n<p>\u201cThe underlying architecture lacks what it takes, to achieve these three things (body awareness, time passage, understanding high-level task structure) they point out in conclusion,\u201d he said.
\u201cOverall, XLand is simply \u2018more of the same.\u2019\u201d<\/p>\n<h2><span class=\"ez-toc-section\" id=\"The_gap_between_simulation_and_the_real_world\"><\/span>The gap between simulation and the real world<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><iframe loading=\"lazy\" title=\"Open-Ended Learning Leads to Generally Capable Agents | Results Showreel\" width=\"640\" height=\"360\" src=\"https:\/\/www.youtube.com\/embed\/lTmL7jwFfdw?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe><\/p>\n<p>In a nutshell, the paper proves that if you can create a complex enough environment, design the right reinforcement learning architecture, and expose your models to enough experience (and have a lot of money to spend on compute resources), you\u2019ll be able to generalize to various kinds of tasks in the same environment. And this is basically how natural evolution has delivered <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2021\/06\/17\/evolution-rewards-artificial-intelligence\/\">human and animal intelligence<\/a>.<\/p>\n<p>In fact, DeepMind has already done something similar with <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2019\/01\/02\/humanizing-ai-deep-learning-alphazero\/\">AlphaZero<\/a>, a reinforcement learning model that managed to master multiple two-player turn-based games. The XLand experiment has extended the same notion to a much greater level by adding the zero-shot learning element.<\/p>\n<p>But while I think that the experience from the XLand-trained agents will ultimately be transferable to real-world applications such as robotics and self-driving cars, I don\u2019t think it will be a breakthrough. 
You\u2019ll still need to make compromises (such as creating artificial limits to reduce the complexity of the real world) or create artificial enhancements (such as imbuing the machine learning models with prior knowledge or extra sensors).<\/p>\n<p>DeepMind\u2019s reinforcement learning agents might have become the masters of the virtual XLand. But their simulated world doesn\u2019t even have a fraction of the intricacies of the real world. That gap will continue to remain a challenge for a long time.<\/p>\n<p><i><span>This article was originally published by Ben Dickson on\u00a0<\/span><\/i><a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/\"><i><span>TechTalks<\/span><\/i><\/a><i><span>, a publication that examines trends in <a href=\"https:\/\/buradabiliyorum.com\/en\/category\/technology\/\" data-internallinksmanager029f6b8e52c=\"4\" title=\"Technology\" target=\"_blank\" rel=\"noopener\">technology<\/a>, how they affect the way we live and do business, and the problems they solve. But we also discuss the evil side of technology, the darker implications of new tech, and what we need to look out for. You can read the original article\u00a0<a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2021\/08\/02\/deepmind-xland-deep-reinforcement-learning\/\">here<\/a>.<\/span><\/i><\/p>\n<\/div>\n<p><span style=\"color: black;\"><a style=\"color: #ff9900;\" href=\"https:\/\/thenextweb.com\/news\/deepminds-new-system-general-ai-way-to-go-syndication\" target=\"_blank\" rel=\"noopener\">Source<\/a><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>&#8220;#DeepMind\u2019s new system could take us a step closer to general AI&#8221; One of the key challenges of deep reinforcement learning models\u2014the kind of AI systems that have mastered Go, StarCraft 2, and other games\u2014is their inability to generalize their capabilities beyond their training domain.
This limit makes it very hard to apply these systems&#8230;<\/p>\n","protected":false},"author":1,"featured_media":317423,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"https:\/\/img-cdn.tnwcdn.com\/image\/neural?filter_last=1&fit=1280,640&url=https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/1BDHed1.jpg&signature=ee7ea65509523e6e36220583577bab8f","fifu_image_alt":"","footnotes":""},"categories":[18],"tags":[],"class_list":["post-317422","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technology"],"_links":{"self":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/317422","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/comments?post=317422"}],"version-history":[{"count":0,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/317422\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media\/317423"}],"wp:attachment":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media?parent=317422"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/categories?post=317422"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/tags?post=317422"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}