{"id":378406,"date":"2021-12-09T00:10:58","date_gmt":"2021-12-08T21:10:58","guid":{"rendered":"https:\/\/en.buradabiliyorum.com\/deepminds-new-language-model-kicks-gpt-3s-butt\/"},"modified":"2021-12-09T00:10:58","modified_gmt":"2021-12-08T21:10:58","slug":"deepminds-new-language-model-kicks-gpt-3s-butt","status":"publish","type":"post","link":"https:\/\/buradabiliyorum.com\/en\/deepminds-new-language-model-kicks-gpt-3s-butt\/","title":{"rendered":"#DeepMind&#8217;s new language model kicks GPT-3&#8217;s butt"},"content":{"rendered":"<p>&#8220;<strong>#DeepMind&#8217;s new language model kicks GPT-3&#8217;s butt<\/strong>&#8221;<\/p>\n<div><span>Move over GPT-3, there\u2019s a scrappy new contender for the crown of <\/span><i>world\u2019s greatest language model<\/i><span> and it\u2019s from our old pals over at DeepMind.<\/span><\/p>\n<p><span><b>Up front:<\/b><\/span><span> The Alphabet-owned UK outfit that answered the question of whether humans or computers are better at chess once and for all \u2013 <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/deepmind.com\/blog\/article\/alphazero-shedding-new-light-grand-games-chess-shogi-and-go\">the machines won<\/a> \u2013 has now set its sights on the world of large language models (LLMs). <\/span><\/p>\n<p><span>To that end, today it announced \u201cGopher,\u201d a language model that\u2019s about 60% larger, parameter-wise, than GPT-3 and a little over a quarter of the size of Google\u2019s massive trillion-parameter LLM. 
<\/span><\/p>\n<p><span>Per <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/deepmind.com\/blog\/article\/language-modelling-at-scale\">a press release<\/a> on the DeepMind blog:<\/span><\/p>\n<blockquote><p>In our research, we found the capabilities of Gopher exceed existing language models for a number of key tasks. This includes the Massive Multitask Language Understanding (MMLU) benchmark, where Gopher demonstrates a significant advancement towards human expert performance over prior work.<\/p>\n<\/blockquote>\n<p><b>Background: <\/b>DeepMind accomplished the improvements by focusing in on areas where expanding the size of an AI model made sense.<\/p>\n<p>The more power you can shove into a model for, say, reading comprehension, the better. But the team found that other areas of LLM architecture didn\u2019t benefit as much from brute force.<\/p>\n<p>By prioritizing how the system utilizes and distributes resources, the team was able to tweak their algorithms to outperform state-of-the-art models in 80% of the benchmarks used.<\/p>\n<figure class=\"post-image post-mediaBleed aligncenter\"><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-1375265 aligncenter js-lazy\" alt=\"A figure from DeepMind's press release\" width=\"1984\" height=\"1023\" sizes=\"auto, (max-width: 1984px) 100vw, 1984px\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/12\/gopherfig1.jpg\" srcset=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/12\/gopherfig1.jpg 1984w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/12\/gopherfig1-280x144.jpg 280w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/12\/gopherfig1-262x135.jpg 262w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/12\/gopherfig1-524x270.jpg 524w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/12\/gopherfig1-1536x792.jpg 1536w, 
https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/12\/gopherfig1-796x410.jpg 796w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/12\/gopherfig1-1592x821.jpg 1592w\"\/><figcaption>Credit: <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/deepmind.com\/blog\/article\/language-modelling-at-scale\">DeepMind<\/a><\/figcaption><noscript><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-1375265 aligncenter\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/12\/gopherfig1.jpg\" alt=\"A figure from DeepMind's press release\" width=\"1984\" height=\"1023\" srcset=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/12\/gopherfig1.jpg 1984w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/12\/gopherfig1-280x144.jpg 280w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/12\/gopherfig1-262x135.jpg 262w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/12\/gopherfig1-524x270.jpg 524w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/12\/gopherfig1-1536x792.jpg 1536w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/12\/gopherfig1-796x410.jpg 796w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/12\/gopherfig1-1592x821.jpg 1592w\"\/><\/noscript><\/figure>\n<p>The DeepMind team also released papers discussing the ethics and architecture of LLMs; you can read those <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/storage.googleapis.com\/deepmind-media\/research\/language-research\/Ethical%20and%20social%20risks.pdf\">here<\/a> and <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/storage.googleapis.com\/deepmind-media\/research\/language-research\/Improving%20language%20models%20by%20retrieving.pdf\">here<\/a>.<\/p>\n<p><strong>Quick take:<\/strong> To paraphrase\u00a0the great poet Montell Jordan: <i>this is how you do it. 
<\/i><span>Instead of careening the field towards ruin by increasing the size of models exponentially until GPT-5 or GPT-6 ends up being larger than the known universe, DeepMind\u2019s trying to squeeze more <\/span><i>oomph<\/i><span> out of smaller models.<\/span><\/p>\n<p><span>Don\u2019t get me wrong, Gopher has significantly more parameters than GPT-3. But, when you consider that <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/www.wired.com\/story\/cerebras-chip-cluster-neural-networks-ai\/\">GPT-4 is expected to have about 100 trillion parameters<\/a>, it looks like DeepMind\u2019s moving in a more feasible direction. <\/span><\/p>\n<\/div>\n<p><span style=\"color: black;\"><a style=\"color: #ff9900;\" href=\"https:\/\/thenextweb.com\/news\/deepminds-new-280-billion-parameter-language-model-kicks-gpt-3s-butt-accuracy\" target=\"_blank\" 
rel=\"noopener\">Source<\/a><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>&#8220;#DeepMind&#8217;s new language model kicks GPT-3&#8217;s butt&#8221; Move over GPT-3, there\u2019s a scrappy new contender for the crown of world\u2019s greatest language model and it\u2019s from our old pals over at DeepMind. Up front: The Alphabet-owned UK outfit that answered the question of whether humans or computers are better at chess once and for all&#8230;<\/p>\n","protected":false},"author":1,"featured_media":378407,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"https:\/\/img-cdn.tnwcdn.com\/image\/neural?filter_last=1&fit=1280,640&url=https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/12\/buttkick.jpg&signature=c04853f72d2f4a282900d68fdec7b802","fifu_image_alt":"","footnotes":""},"categories":[18],"tags":[],"class_list":["post-378406","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technology"],"_links":{"self":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/378406","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/comments?post=378406"}],"version-history":[{"count":0,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/378406\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media\/378407"}],"wp:attachment":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media?parent=378406"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/w
p\/v2\/categories?post=378406"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/tags?post=378406"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}