{"id":124313,"date":"2020-12-01T21:11:56","date_gmt":"2020-12-01T18:11:56","guid":{"rendered":"https:\/\/en.buradabiliyorum.com\/shrinking-massive-neural-networks-used-to-model-language\/"},"modified":"2020-12-01T21:11:56","modified_gmt":"2020-12-01T18:11:56","slug":"shrinking-massive-neural-networks-used-to-model-language","status":"publish","type":"post","link":"https:\/\/buradabiliyorum.com\/en\/shrinking-massive-neural-networks-used-to-model-language\/","title":{"rendered":"#Shrinking massive neural networks used to model language"},"content":{"rendered":"<p>&#8220;<strong>#Shrinking massive neural networks used to model language<\/strong>&#8221;<\/p>\n<div>\n<div class=\"article-gallery lightGallery\">\n<div data-thumb=\"https:\/\/scx1.b-cdn.net\/csz\/news\/tmb\/2020\/shrinkingmas.jpg\" data-src=\"https:\/\/scx2.b-cdn.net\/gfx\/news\/2020\/shrinkingmas.jpg\" data-sub-html=\"Deep learning neural networks can be massive, demanding major computing power. In a test of the Lottery Ticket Hypothesis, MIT researchers have found leaner, more efficient subnetworks hidden within BERT models. Credit: Jose-Luis Olivares, MIT\">\n<figure class=\"article-img\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/scx1.b-cdn.net\/csz\/news\/800\/2020\/shrinkingmas.jpg\" alt=\"Shrinking massive neural networks used to model language\" title=\"Deep learning neural networks can be massive, demanding major computing power. In a test of the Lottery Ticket Hypothesis, MIT researchers have found leaner, more efficient subnetworks hidden within BERT models. Credit: Jose-Luis Olivares, MIT\" width=\"800\" height=\"480\"\/><figcaption class=\"text-darken text-low-up text-truncate-js text-truncate mt-3\">\n                Deep learning neural networks can be massive, demanding major computing power. In a test of the Lottery Ticket Hypothesis, MIT researchers have found leaner, more efficient subnetworks hidden within BERT models. 
Credit: Jose-Luis Olivares, MIT<br \/>\n            <\/figcaption><\/figure>\n<\/div>\n<\/div>\n<p>You don&#8217;t need a sledgehammer to crack a nut.<\/p>\n<p>Jonathan Frankle is researching artificial intelligence\u2014not noshing pistachios\u2014but the same philosophy applies to his &#8220;lottery ticket hypothesis.&#8221; It posits that, hidden within massive neural networks, leaner subnetworks can complete the same task more efficiently. The trick is finding those &#8216;lucky&#8217; subnetworks, dubbed winning lottery tickets.<\/p>\n<p>In a new paper, Frankle and colleagues discovered such subnetworks lurking within BERT, a state-of-the-art neural network approach to natural language processing (NLP). As a branch of artificial intelligence, NLP aims to decipher and analyze human language, with applications like predictive text generation or online chatbots. In computational terms, BERT is bulky, typically demanding supercomputing power unavailable to most users. Access to BERT&#8217;s winning lottery ticket could level the playing field, potentially allowing more users to develop effective NLP tools on a smartphone\u2014no sledgehammer needed.<\/p>\n<p>&#8220;We&#8217;re hitting the point where we&#8217;re going to have to make these models leaner and more efficient,&#8221; says Frankle, adding that this advance could one day &#8220;reduce barriers to entry&#8221; for NLP.<\/p>\n<p>Frankle, a Ph.D. 
student in Michael Carbin&#8217;s group at the MIT Computer <a href=\"https:\/\/buradabiliyorum.com\/en\/category\/sciencee\/\" data-internallinksmanager029f6b8e52c=\"5\" title=\"Science\" target=\"_blank\" rel=\"noopener\">Science<\/a> and Artificial Intelligence Laboratory, co-authored the study, which will be presented next month at the Conference on Neural Information Processing Systems. Tianlong Chen of the University of Texas at Austin is the lead author of the paper, which included collaborators Zhangyang Wang, also of the University of Texas at Austin, as well as Shiyu Chang, Sijia Liu, and Yang Zhang, all of the MIT-IBM Watson AI Lab.<\/p>\n<p>You&#8217;ve probably interacted with a BERT network today. It&#8217;s one of the technologies that underlies Google&#8217;s search engine, and it has sparked excitement among researchers since Google released BERT in 2018. BERT is a method of creating neural networks\u2014algorithms that use layered nodes, or &#8220;neurons,&#8221; to learn to perform a task through training on numerous examples. BERT is trained by repeatedly attempting to fill in words left out of a passage of writing, and its power lies in the gargantuan size of this initial training dataset. Users can then fine-tune BERT&#8217;s neural network to a particular task, like building a customer-service chatbot. But wrangling BERT takes a ton of processing power.<\/p>\n<p>&#8220;A standard BERT model these days\u2014the garden variety\u2014has 340 million parameters,&#8221; says Frankle, adding that the number can reach 1 billion. Fine-tuning such a massive network can require a supercomputer. &#8220;This is just obscenely expensive. This is way beyond the computing capability of you or me.&#8221;<\/p>\n<p>Chen agrees. Despite BERT&#8217;s burst in popularity, such models &#8220;suffer from enormous network size,&#8221; he says. 
Luckily, &#8220;the lottery ticket hypothesis seems to be a solution.&#8221;<\/p>\n<p>To cut computing costs, Chen and colleagues sought to pinpoint a smaller model concealed within BERT. They experimented by iteratively pruning parameters from the full BERT network, then comparing the new subnetwork&#8217;s performance to that of the original BERT model. They ran this comparison for a range of NLP tasks, from answering questions to filling the blank word in a sentence.<\/p>\n<p>The researchers found successful subnetworks that were 40 to 90 percent slimmer than the initial BERT model, depending on the task. Plus, they were able to identify those winning lottery tickets before running any task-specific fine-tuning\u2014a finding that could further minimize computing costs for NLP. In some cases, a subnetwork picked for one task could be repurposed for another, though Frankle notes this transferability wasn&#8217;t universal. Still, Frankle is more than happy with the group&#8217;s results.<\/p>\n<p>&#8220;I was kind of shocked this even worked,&#8221; he says. &#8220;It&#8217;s not something that I took for granted. I was expecting a much messier result than we got.&#8221;<\/p>\n<p>This discovery of a winning ticket in a BERT model is &#8220;convincing,&#8221; according to Ari Morcos, a scientist at <a href=\"https:\/\/buradabiliyorum.com\/en\/category\/social-mediaa\/\" data-internallinksmanager029f6b8e52c=\"1\" title=\"Social Media\" target=\"_blank\" rel=\"noopener\">Facebook<\/a> AI Research. &#8220;These models are becoming increasingly widespread,&#8221; says Morcos. &#8220;So it&#8217;s important to understand whether the lottery ticket hypothesis holds.&#8221; He adds that the finding could allow BERT-like models to run using far less computing power, &#8220;which could be very impactful given that these extremely large models are currently very costly to run.&#8221;<\/p>\n<p>Frankle agrees. 
He hopes this work can make BERT more accessible, because it bucks the trend of ever-growing NLP models. &#8220;I don&#8217;t know how much bigger we can go using these supercomputer-style computations,&#8221; he says. &#8220;We&#8217;re going to have to reduce the barrier to entry.&#8221; Identifying a lean, lottery-winning subnetwork does just that\u2014allowing developers who lack the computing muscle of Google or Facebook to still perform cutting-edge NLP. &#8220;The hope is that this will lower the cost, that this will make it more accessible to everyone \u2026 to the little guys who just have a laptop,&#8221; says Frankle. &#8220;To me that&#8217;s really exciting.&#8221;<\/p>\n<hr class=\"mb-4\"\/>\n<div class=\"article-main__more p-4\">\n<strong>More information:<\/strong><br \/>\nTianlong Chen et al. The Lottery Ticket Hypothesis for Pre-trained BERT Networks. 
arXiv:2007.12223 [cs.LG] <a rel=\"nofollow noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/2007.12223\">arxiv.org\/abs\/2007.12223<\/a><\/p><\/div>\n<div class=\"d-inline-block text-medium my-4\">\n                                                Provided by<br \/>\n                                                                                                    Massachusetts Institute of <a href=\"https:\/\/buradabiliyorum.com\/en\/category\/technology\/\" data-internallinksmanager029f6b8e52c=\"4\" title=\"Technology\" target=\"_blank\" rel=\"noopener\">Technology<\/a><br \/>\n                                                                                                        <a rel=\"nofollow noopener noreferrer\" target=\"_blank\" class=\"icon_open\" href=\"http:\/\/web.mit.edu\/\"><br \/>\n                                                        <svg><use href=\"https:\/\/techx.b-cdn.net\/tmpl\/v2\/img\/svg\/sprite.svg#icon_open\" x=\"0\" y=\"0\"\/><\/svg><\/a><\/p><\/div>\n<p class=\"article-main__note mt-4\">\n                                                <i>This story is republished courtesy of MIT <a href=\"https:\/\/buradabiliyorum.com\/en\/category\/news\/\" data-internallinksmanager029f6b8e52c=\"2\" title=\"News\" target=\"_blank\" rel=\"noopener\">News<\/a> (<a rel=\"nofollow noopener noreferrer\" target=\"_blank\" href=\"http:\/\/web.mit.edu\/newsoffice\/\">web.mit.edu\/newsoffice\/<\/a>), a popular site that covers news about MIT research, innovation and teaching.<\/i><\/p>\n<p>                                        <!-- print only --><\/p>\n<div class=\"d-none d-print-block\">\n<p>                                                 <strong>Citation<\/strong>:<br \/>\n                                                 Shrinking massive neural networks used to model language (2020, December  1)<br \/>\n                                                 retrieved  1 December 2020<br \/>\n                                           
from https:\/\/techxplore.com\/news\/2020-12-massive-neural-networks-language.html<\/p>\n<p>This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no<br \/>\npart may be reproduced without the written permission. The content is provided for information purposes only.<\/p><\/div>\n<\/p><\/div>\n<p><span style=\"color: black;\"><a style=\"color: #ff9900;\" href=\"https:\/\/techxplore.com\/news\/2020-12-massive-neural-networks-language.html\" target=\"_blank\" rel=\"noopener noreferrer\">Source<\/a><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>&#8220;#Shrinking massive neural networks used to model language&#8221; 
Deep learning neural networks can be massive, demanding major computing power. In a test of the Lottery Ticket Hypothesis, MIT researchers have found leaner, more efficient subnetworks hidden within BERT models. Credit: Jose-Luis Olivares, MIT You don&#8217;t need a sledgehammer to crack a nut. Jonathan Frankle is&#8230;<\/p>\n","protected":false},"author":1,"featured_media":124314,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"https:\/\/scx2.b-cdn.net\/gfx\/news\/2020\/shrinkingmas.jpg","fifu_image_alt":"","footnotes":""},"categories":[16],"tags":[],"class_list":["post-124313","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-sciencee"],"_links":{"self":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/124313","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/comments?post=124313"}],"version-history":[{"count":0,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/124313\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media\/124314"}],"wp:attachment":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media?parent=124313"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/categories?post=124313"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/tags?post=124313"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}