{"id":523020,"date":"2022-12-08T01:21:03","date_gmt":"2022-12-07T22:21:03","guid":{"rendered":"https:\/\/en.buradabiliyorum.com\/exploring-text-to-audio-models-to-make-music-from-scratch\/"},"modified":"2022-12-08T01:21:03","modified_gmt":"2022-12-07T22:21:03","slug":"exploring-text-to-audio-models-to-make-music-from-scratch","status":"publish","type":"post","link":"https:\/\/buradabiliyorum.com\/en\/exploring-text-to-audio-models-to-make-music-from-scratch\/","title":{"rendered":"#Exploring text-to-audio models to make music from scratch"},"content":{"rendered":"<h1><span class=\"ez-toc-section\" id=\"%E2%80%9CExploring_text-to-audio_models_to_make_music_from_scratch%E2%80%9D\"><\/span>&#8220;Exploring text-to-audio models to make music from scratch&#8221;<span class=\"ez-toc-section-end\"><\/span><\/h1>\n<div>\n<div class=\"article-gallery lightGallery\">\n<div data-thumb=\"https:\/\/scx1.b-cdn.net\/csz\/news\/tmb\/2022\/text-to-audio-models-m.jpg\" data-src=\"https:\/\/scx2.b-cdn.net\/gfx\/news\/hires\/2022\/text-to-audio-models-m.jpg\" data-sub-html=\"The algorithm transforms a text prompt into audio. Credit: Zach Evans\">\n<figure class=\"article-img\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/scx1.b-cdn.net\/csz\/news\/800a\/2022\/text-to-audio-models-m.jpg\" alt=\"Text-to-audio models make music from scratch #ASA183\" title=\"The algorithm transforms a text prompt into audio. Credit: Zach Evans\" width=\"800\" height=\"450\"\/><figcaption class=\"text-darken text-low-up text-truncate-js text-truncate mt-3\">The algorithm transforms a text prompt into audio. Credit: Zach Evans<\/figcaption><\/figure>\n<\/div>\n<\/div>\n<p>Type a few words into a text-to-image model, and you&#8217;ll end up with a weirdly accurate, completely unique picture. While this tool is fun to play with, it also opens up avenues of creative application and exploration and provides workflow-enhancing tools for visual artists and animators. 
For musicians, sound designers, and other audio professionals, a text-to-audio model would do the same.<\/p>\n<p>As part of the 183rd Meeting of the Acoustical Society of America, Zach Evans, of Stability AI, presented progress toward this end in his talk, &#8220;Musical audio samples generated from joint text embeddings.&#8221;<\/p>\n<p>&#8220;Text-to-image models use deep neural networks to generate original, novel images based on learned semantic correlations with text captions,&#8221; said Evans. &#8220;When trained on a large and varied data set of captioned images, they can be used to create almost any image that can be described, as well as modify images supplied by the user.&#8221;<\/p>\n<p>A text-to-audio model would work the same way, but it would generate music and other sounds from a text prompt. Among other applications, it could be used to create sound effects for video games or samples for music production.<\/p>\n<p>But training these deep learning models is more difficult than training their image counterparts.<\/p>\n<p>&#8220;One of the main difficulties with training a text-to-audio model is finding a large enough data set of text-aligned audio to train on,&#8221; said Evans. &#8220;Outside of speech data, research data sets available for text-aligned audio tend to be much smaller than those available for text-aligned images.&#8221;<\/p>\n<p>Evans and his team, including Belmont University&#8217;s Scott Hawley, have shown early success in generating coherent and relevant music and sound from text. 
They employed data compression methods to generate the audio with reduced training time and improved output quality.<\/p>\n<p>The researchers plan to expand to larger data sets and release their model as an open-source option for other researchers, developers, and audio professionals to use and improve.<\/p>\n<div class=\"article-main__more p-4\">\n<strong>More information:<\/strong><br \/>\nConference: <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/acousticalsociety.org\/asa-meetings\/\">acousticalsociety.org\/asa-meetings\/<\/a><\/div>\n<div class=\"d-inline-block text-medium my-4\">\nProvided by<br \/>\nAcoustical Society of America<\/div>\n<div class=\"d-none d-print-block\">\n<p><strong>Citation<\/strong>:<br \/>\nExploring text-to-audio models to make music from scratch (2022, December 7)<br \/>\nretrieved 7 December 2022<br \/>\nfrom https:\/\/techxplore.com\/news\/2022-12-exploring-text-to-audio-music.html<\/p>\n<p>This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.<\/p><\/div>\n<\/div>\n<p><span style=\"color: black;\"><a style=\"color: #ff9900;\" href=\"https:\/\/techxplore.com\/news\/2022-12-exploring-text-to-audio-music.html\" target=\"_blank\" 
rel=\"noopener\">Source<\/a><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>&#8220;Exploring text-to-audio models to make music from scratch&#8221; The algorithm transforms a text prompt into audio. Credit: Zach Evans Type a few words into a text-to-image model, and you&#8217;ll end up with a weirdly accurate, completely unique picture. While this tool is fun to play with, it also opens up avenues of creative application and&#8230;<\/p>\n","protected":false},"author":1,"featured_media":523021,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"https:\/\/scx2.b-cdn.net\/gfx\/news\/hires\/2022\/text-to-audio-models-m.jpg","fifu_image_alt":"","footnotes":""},"categories":[16],"tags":[],"class_list":["post-523020","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-sciencee"],"_links":{"self":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/523020","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/comments?post=523020"}],"version-history":[{"count":0,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/523020\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media\/523021"}],"wp:attachment":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media?parent=523020"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/categories?post=523020"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/tags?pos
t=523020"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}