{"id":255564,"date":"2021-05-21T15:41:29","date_gmt":"2021-05-21T12:41:29","guid":{"rendered":"https:\/\/en.buradabiliyorum.com\/can-we-teach-ai-how-to-code-welcome-to-ibms-project-codenet\/"},"modified":"2021-05-21T15:41:29","modified_gmt":"2021-05-21T12:41:29","slug":"can-we-teach-ai-how-to-code-welcome-to-ibms-project-codenet","status":"publish","type":"post","link":"https:\/\/buradabiliyorum.com\/en\/can-we-teach-ai-how-to-code-welcome-to-ibms-project-codenet\/","title":{"rendered":"#Can we teach AI how to code? Welcome to IBM\u2019s Project CodeNet"},"content":{"rendered":"<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_84 counter-hierarchy ez-toc-counter ez-toc-custom ez-toc-container-direction\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of Contents<\/p>\n<label for=\"ez-toc-cssicon-toggle-item-6a2ed91c88be2\" class=\"ez-toc-cssicon-toggle-label\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #dd3333;color:#dd3333\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #dd3333;color:#dd3333\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/label><input type=\"checkbox\"  id=\"ez-toc-cssicon-toggle-item-6a2ed91c88be2\" checked aria-label=\"Toggle\" \/><nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 
ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/buradabiliyorum.com\/en\/can-we-teach-ai-how-to-code-welcome-to-ibms-project-codenet\/#Automating_programming_with_deep_learning\" >Automating programming with deep learning<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/buradabiliyorum.com\/en\/can-we-teach-ai-how-to-code-welcome-to-ibms-project-codenet\/#The_CodeNet_dataset\" >The CodeNet dataset<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/buradabiliyorum.com\/en\/can-we-teach-ai-how-to-code-welcome-to-ibms-project-codenet\/#Programming_tasks_for_machine_learning\" >Programming tasks for machine learning<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/buradabiliyorum.com\/en\/can-we-teach-ai-how-to-code-welcome-to-ibms-project-codenet\/#A_monstrous_engineering_effort\" >A monstrous engineering effort<\/a><\/li><\/ul><\/nav><\/div>\n<p>&#8220;<strong>Can we teach AI how to code? Welcome to IBM\u2019s Project CodeNet<\/strong>&#8221;<\/p>\n<div><p>IBM\u2019s AI research division has released a 14-million-sample dataset to develop machine learning models that can help in programming tasks. 
Called Project CodeNet, the dataset is named after ImageNet, the famous repository of labeled photos that triggered a revolution in computer vision and<span>\u00a0<\/span><a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2019\/02\/15\/what-is-deep-learning-neural-networks\/\">deep learning<\/a>.<\/p>\n<p>While there\u2019s scant chance that machine learning models built on the CodeNet dataset will make human programmers redundant, there\u2019s reason to be hopeful that they will make developers more productive.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Automating_programming_with_deep_learning\"><\/span>Automating programming with deep learning<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>In the early 2010s, impressive advances in<span>\u00a0<\/span><a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2017\/08\/28\/artificial-intelligence-machine-learning-deep-learning\/\">machine learning<\/a><span>\u00a0<\/span>triggered excitement (and fear) about artificial intelligence soon automating many tasks, including programming. But AI\u2019s penetration into software development has been extremely limited.<\/p>\n<p>Human programmers discover new problems and explore different solutions using a plethora of conscious and subconscious thinking mechanisms. In contrast, most machine learning algorithms\u00a0<a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2021\/03\/29\/ai-algorithms-representations-herbert-roitblat\/\">require well-defined problems<\/a><span>\u00a0<\/span>and a lot of annotated data to develop models that can solve the same problems.<\/p>\n<p>There have been many efforts to create datasets and benchmarks to develop and evaluate \u201cAI for code\u201d systems. 
But given the creative and open nature of software development, it\u2019s very hard to create the perfect dataset for programming.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"The_CodeNet_dataset\"><\/span>The CodeNet dataset<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>With<span>\u00a0<\/span><a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/github.com\/IBM\/Project_CodeNet\">Project CodeNet<\/a>, the researchers at IBM have tried to create a multi-purpose dataset that can be used to train machine learning models for various tasks. CodeNet\u2019s creators describe it as a \u201cvery large scale, diverse, and high-quality dataset to accelerate the algorithmic advances in AI for Code.\u201d<\/p>\n<p>\u00a0<\/p>\n<figure class=\"wp-block-image size-large\">\n<p><figure class=\"post-image post-mediaBleed aligncenter\"><a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/Project-CodeNet-diversity.jpg?ssl=1\"><img loading=\"lazy\" decoding=\"async\" class=\"jetpack-js-lazy-image jetpack-js-lazy-image--handled wp-image-10362 js-lazy\" sizes=\"auto, (max-width: 696px) 100vw, 696px\" alt=\"Project CodeNet diversity\" width=\"696\" height=\"255\" data-attachment-id=\"10362\" data-permalink=\"https:\/\/bdtechtalks.com\/2021\/05\/17\/ibms-codenet-machine-learning-programming\/project-codenet-diversity\/\" data-orig-file=\"https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/Project-CodeNet-diversity.jpg?fit=1920%2C703&amp;ssl=1\" data-orig-size=\"1920,703\" data-comments-opened=\"1\" data-image-meta=\"{\" aperture=\"\" data-image-title=\"Project CodeNet diversity\" data-image-description=\"\" data-medium-file=\"https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/Project-CodeNet-diversity.jpg?fit=300%2C110&amp;ssl=1\" 
data-large-file=\"https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/Project-CodeNet-diversity.jpg?fit=696%2C255&amp;ssl=1\" data-recalc-dims=\"1\" data-lazy-loaded=\"1\" src=\"https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/Project-CodeNet-diversity.jpg?resize=696%2C255&amp;ssl=1\" srcset=\"https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/Project-CodeNet-diversity.jpg?resize=1024%2C375&amp;ssl=1 1024w, https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/Project-CodeNet-diversity.jpg?resize=300%2C110&amp;ssl=1 300w, https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/Project-CodeNet-diversity.jpg?resize=768%2C281&amp;ssl=1 768w, https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/Project-CodeNet-diversity.jpg?resize=1536%2C562&amp;ssl=1 1536w, https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/Project-CodeNet-diversity.jpg?resize=696%2C255&amp;ssl=1 696w, https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/Project-CodeNet-diversity.jpg?resize=1068%2C391&amp;ssl=1 1068w, https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/Project-CodeNet-diversity.jpg?resize=1147%2C420&amp;ssl=1 1147w, https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/Project-CodeNet-diversity.jpg?w=1920&amp;ssl=1 1920w, https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/Project-CodeNet-diversity.jpg?w=1392&amp;ssl=1 1392w\"\/><noscript><img loading=\"lazy\" decoding=\"async\" class=\"jetpack--image jetpack--image--handled wp-image-10362\" src=\"https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/Project-CodeNet-diversity.jpg?resize=696%2C255&amp;ssl=1\" alt=\"Project CodeNet diversity\" width=\"696\" height=\"255\" data-attachment-id=\"10362\" data-permalink=\"https:\/\/bdtechtalks.com\/2021\/05\/17\/ibms-codenet-machine-learning-programming\/project-codenet-diversity\/\" 
data-orig-file=\"https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/Project-CodeNet-diversity.jpg?fit=1920%2C703&amp;ssl=1\" data-orig-size=\"1920,703\" data-comments-opened=\"1\" data-image-meta=\"{\" aperture=\"\" data-image-title=\"Project CodeNet diversity\" data-image-description=\"\" data-medium-file=\"https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/Project-CodeNet-diversity.jpg?fit=300%2C110&amp;ssl=1\" data-large-file=\"https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/Project-CodeNet-diversity.jpg?fit=696%2C255&amp;ssl=1\" data-recalc-dims=\"1\" data-lazy-loaded=\"1\" srcset=\"https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/Project-CodeNet-diversity.jpg?resize=1024%2C375&amp;ssl=1 1024w, https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/Project-CodeNet-diversity.jpg?resize=300%2C110&amp;ssl=1 300w, https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/Project-CodeNet-diversity.jpg?resize=768%2C281&amp;ssl=1 768w, https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/Project-CodeNet-diversity.jpg?resize=1536%2C562&amp;ssl=1 1536w, https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/Project-CodeNet-diversity.jpg?resize=696%2C255&amp;ssl=1 696w, https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/Project-CodeNet-diversity.jpg?resize=1068%2C391&amp;ssl=1 1068w, https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/Project-CodeNet-diversity.jpg?resize=1147%2C420&amp;ssl=1 1147w, https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/Project-CodeNet-diversity.jpg?w=1920&amp;ssl=1 1920w, https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/Project-CodeNet-diversity.jpg?w=1392&amp;ssl=1 1392w\"\/><\/noscript><\/a><figcaption><a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/thenextweb.com\/news\/#\" 
data-url=\"https:\/\/twitter.com\/intent\/tweet?url=https%3A%2F%2Feditorial.thenextweb.com%2Fneural%2F2021%2F05%2F21%2Fteach-ai-how-to-code-ibm-project-codenet-syndication%2F&amp;via=thenextweb&amp;related=thenextweb&amp;text=Check out this picture on: Project CodeNet is a huge dataset of ~14M code samples spread across dozens of programming languages\" data-title=\"Share Project CodeNet is a huge dataset of ~14M code samples spread across dozens of programming languages on Twitter\" data-width=\"685\" data-height=\"500\" class=\"post-image-share popitup\" title=\"Share Project CodeNet is a huge dataset of ~14M code samples spread across dozens of programming languages on Twitter\"><i class=\"icon icon--inline icon--twitter--dark\"\/><\/a>Project CodeNet is a huge dataset of ~14M code samples spread across dozens of programming languages<\/figcaption><\/figure>\n<\/p>\n<\/figure>\n<p>The dataset contains 14 million code samples with 500 million lines of code written in 55 different programming languages. The code samples have been obtained from submissions to nearly 4,000 challenges posted on online coding platforms AIZU and AtCoder. The code samples include both correct and incorrect answers to the challenges.<\/p>\n<p>One of the key features of CodeNet is the amount of annotation that has been added to the examples. Every one of the coding challenges included in the dataset has a textual description along with CPU time and memory limits. 
Every code submission has a dozen pieces of information, including the language, the date of submission, size, execution time, acceptance, and error types.<\/p>\n<p>The researchers at IBM have also gone to great lengths to make sure the dataset is balanced along different dimensions, including programming language, acceptance, and error types.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Programming_tasks_for_machine_learning\"><\/span>Programming tasks for machine learning<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>CodeNet is not the only dataset for training machine learning models on programming tasks. But a few characteristics make it stand out. First is the sheer size of the dataset, including the number of samples and the diversity of the languages.<\/p>\n<p>But perhaps more important is the metadata that goes with the coding samples. The rich annotations added to CodeNet make it suitable for a diverse set of tasks, unlike other coding datasets that are specialized for specific programming tasks.<\/p>\n<p>There are several ways CodeNet can be used to develop machine learning models for programming tasks. One is language translation. Since each coding challenge in the dataset contains submissions in various programming languages, data scientists can use it to create machine learning models that translate code from one language to another. This can be handy for organizations that want to port old code to new languages, making it accessible to newer generations of programmers and maintainable with new development tools.<\/p>\n<p>CodeNet can also help to develop machine learning models for code recommendation. 
Recommendation tools could range from simple autocomplete-style models that finish the current line of code to more complex systems that write full functions or blocks of code.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\">\n<p><figure class=\"post-image post-mediaBleed aligncenter\"><a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/visual-studio-intellisense.jpg?ssl=1\"><img loading=\"lazy\" decoding=\"async\" class=\"jetpack-js-lazy-image jetpack-js-lazy-image--handled wp-image-10364 js-lazy\" sizes=\"auto, (max-width: 696px) 100vw, 696px\" alt=\"visual studio intellisense\" width=\"696\" height=\"392\" data-attachment-id=\"10364\" data-permalink=\"https:\/\/bdtechtalks.com\/2021\/05\/17\/ibms-codenet-machine-learning-programming\/visual-studio-intellisense\/\" data-orig-file=\"https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/visual-studio-intellisense.jpg?fit=1920%2C1080&amp;ssl=1\" data-orig-size=\"1920,1080\" data-comments-opened=\"1\" data-image-meta=\"{\" aperture=\"\" data-image-title=\"visual studio intellisense\" data-image-description=\"\" data-medium-file=\"https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/visual-studio-intellisense.jpg?fit=300%2C169&amp;ssl=1\" data-large-file=\"https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/visual-studio-intellisense.jpg?fit=696%2C392&amp;ssl=1\" data-recalc-dims=\"1\" data-lazy-loaded=\"1\" src=\"https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/visual-studio-intellisense.jpg?resize=696%2C392&amp;ssl=1\" srcset=\"https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/visual-studio-intellisense.jpg?resize=1024%2C576&amp;ssl=1 1024w, https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/visual-studio-intellisense.jpg?resize=300%2C169&amp;ssl=1 300w, 
https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/visual-studio-intellisense.jpg?resize=768%2C432&amp;ssl=1 768w, https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/visual-studio-intellisense.jpg?resize=1536%2C864&amp;ssl=1 1536w, https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/visual-studio-intellisense.jpg?resize=696%2C392&amp;ssl=1 696w, https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/visual-studio-intellisense.jpg?resize=1068%2C601&amp;ssl=1 1068w, https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/visual-studio-intellisense.jpg?resize=747%2C420&amp;ssl=1 747w, https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/visual-studio-intellisense.jpg?w=1920&amp;ssl=1 1920w, https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/visual-studio-intellisense.jpg?w=1392&amp;ssl=1 1392w\"\/><noscript><img loading=\"lazy\" decoding=\"async\" class=\"jetpack--image jetpack--image--handled wp-image-10364\" src=\"https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/visual-studio-intellisense.jpg?resize=696%2C392&amp;ssl=1\" alt=\"visual studio intellisense\" width=\"696\" height=\"392\" data-attachment-id=\"10364\" data-permalink=\"https:\/\/bdtechtalks.com\/2021\/05\/17\/ibms-codenet-machine-learning-programming\/visual-studio-intellisense\/\" data-orig-file=\"https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/visual-studio-intellisense.jpg?fit=1920%2C1080&amp;ssl=1\" data-orig-size=\"1920,1080\" data-comments-opened=\"1\" data-image-meta=\"{\" aperture=\"\" data-image-title=\"visual studio intellisense\" data-image-description=\"\" data-medium-file=\"https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/visual-studio-intellisense.jpg?fit=300%2C169&amp;ssl=1\" data-large-file=\"https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/visual-studio-intellisense.jpg?fit=696%2C392&amp;ssl=1\" 
data-recalc-dims=\"1\" data-lazy-loaded=\"1\" srcset=\"https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/visual-studio-intellisense.jpg?resize=1024%2C576&amp;ssl=1 1024w, https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/visual-studio-intellisense.jpg?resize=300%2C169&amp;ssl=1 300w, https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/visual-studio-intellisense.jpg?resize=768%2C432&amp;ssl=1 768w, https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/visual-studio-intellisense.jpg?resize=1536%2C864&amp;ssl=1 1536w, https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/visual-studio-intellisense.jpg?resize=696%2C392&amp;ssl=1 696w, https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/visual-studio-intellisense.jpg?resize=1068%2C601&amp;ssl=1 1068w, https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/visual-studio-intellisense.jpg?resize=747%2C420&amp;ssl=1 747w, https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/visual-studio-intellisense.jpg?w=1920&amp;ssl=1 1920w, https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2021\/05\/visual-studio-intellisense.jpg?w=1392&amp;ssl=1 1392w\"\/><\/noscript><\/a><figcaption><a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/thenextweb.com\/news\/#\" data-url=\"https:\/\/twitter.com\/intent\/tweet?url=https%3A%2F%2Feditorial.thenextweb.com%2Fneural%2F2021%2F05%2F21%2Fteach-ai-how-to-code-ibm-project-codenet-syndication%2F&amp;via=thenextweb&amp;related=thenextweb&amp;text=Check out this picture on: Since CodeNet has a wealth of metadata about memory and execution-time metrics, data scientists can also use it to develop code optimization systems. 
Or they can use the error-type metadata to train machine learning systems that flag potential flaws in source code.\" data-title=\"Share Since CodeNet has a wealth of metadata about memory and execution-time metrics, data scientists can also use it to develop code optimization systems. Or they can use the error-type metadata to train machine learning systems that flag potential flaws in source code. on Twitter\" data-width=\"685\" data-height=\"500\" class=\"post-image-share popitup\" title=\"Share Since CodeNet has a wealth of metadata about memory and execution-time metrics, data scientists can also use it to develop code optimization systems. Or they can use the error-type metadata to train machine learning systems that flag potential flaws in source code. on Twitter\"><i class=\"icon icon--inline icon--twitter--dark\"\/><\/a>Since CodeNet has a wealth of metadata about memory and execution-time metrics, data scientists can also use it to develop code optimization systems. Or they can use the error-type metadata to train machine learning systems that flag potential flaws in source code.<\/figcaption><\/figure>\n<\/p>\n<\/figure>\n<\/div>\n<p>A more advanced use case that would be interesting to see is code generation. CodeNet is a rich library of textual descriptions of problems and their corresponding source code. There have already been several examples of developers using<span>\u00a0<\/span><a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2020\/08\/17\/openai-gpt-3-commercial-ai\/\">advanced language models such as GPT-3<\/a><span>\u00a0<\/span>to generate code from natural language descriptions. It will be interesting to see whether CodeNet can help fine-tune these language models to become more consistent in code generation.<\/p>\n<p>The researchers at IBM have already conducted several experiments with CodeNet, including code classification, code similarity evaluation, and code completion. 
The deep learning architectures they used include simple multi-layer perceptrons,<span>\u00a0<\/span><a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2020\/01\/06\/convolutional-neural-networks-cnn-convnets\/\">convolutional neural networks<\/a>, graph neural networks, and Transformers. The results, reported in a<span>\u00a0<\/span><a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/github.com\/IBM\/Project_CodeNet\/blob\/main\/ProjectCodeNet.pdf\">paper<\/a><span>\u00a0<\/span>that details Project CodeNet, show that they obtained accuracy above 90 percent in most tasks. (Though it\u2019s worth noting that evaluating accuracy in programming is a bit different from image classification and text generation, where minor errors might result in awkward but acceptable results.)<\/p>\n<h2><span class=\"ez-toc-section\" id=\"A_monstrous_engineering_effort\"><\/span>A monstrous engineering effort<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>The engineers at IBM carried out a complicated software and data engineering effort to curate the CodeNet dataset and develop its complementary tools.<\/p>\n<p>First, they had to gather the code samples from AIZU and AtCoder. While one of them had an application programming interface that made it easy to obtain the code, the other had no easy-to-access interface, and the researchers had to develop tools that scraped the data from the platform\u2019s web pages and decomposed it into a tabular format. Then, they had to manually merge the two datasets into a unified schema.<\/p>\n<p>Next, they had to develop tools to cleanse the data by identifying and removing duplicates and samples that had a lot of dead code (source code that is not executed at runtime).<\/p>\n<p>They also developed preprocessing tools that make it easier to train machine learning models on the CodeNet corpus. 
These tools include tokenizers for different programming languages, parse trees, and a graph representation generator for use in graph neural networks.<\/p>\n<p>All these efforts are a reminder of the huge human effort needed to create efficient machine learning systems. Artificial intelligence is not ready to replace programmers (at least for the time being). But it might change the kind of tasks that require the efforts and ingenuity of human programmers.<\/p>\n<p><i><span>This article was originally published by Ben Dickson on\u00a0<\/span><\/i><a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/\"><i><span>TechTalks<\/span><\/i><\/a><i><span>, a publication that examines trends in technology, how they affect the way we live and do business, and the problems they solve. But we also discuss the evil side of technology, the darker implications of new tech, and what we need to look out for. You can read the original article\u00a0<a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bdtechtalks.com\/2021\/05\/17\/ibms-codenet-machine-learning-programming\/\">here<\/a>.<\/span><\/i><\/p>\n<\/div>\n<p><span style=\"color: black;\"><a style=\"color: #ff9900;\" href=\"https:\/\/thenextweb.com\/news\/teach-ai-how-to-code-ibm-project-codenet-syndication\" target=\"_blank\" rel=\"noopener\">Source<\/a><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>&#8220;Can we teach AI how to code? Welcome to IBM\u2019s Project CodeNet&#8221; IBM\u2019s AI research division has released a 14-million-sample dataset to develop machine learning models that can help in programming tasks. 
Called Project CodeNet, the dataset takes its name after ImageNet, the famous repository of labeled photos that triggered a revolution in computer vision&#8230;<\/p>\n","protected":false},"author":1,"featured_media":255565,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"https:\/\/img-cdn.tnwcdn.com\/image\/neural?filter_last=1&fit=1280,640&url=https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/05\/AI-code.jpeg&signature=e466ab3c937c5af582d9011cacb7d41d","fifu_image_alt":"","footnotes":""},"categories":[18],"tags":[],"class_list":["post-255564","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technology"],"_links":{"self":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/255564","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/comments?post=255564"}],"version-history":[{"count":0,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/255564\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media\/255565"}],"wp:attachment":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media?parent=255564"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/categories?post=255564"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/tags?post=255564"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}