{"id":89762,"date":"2020-10-15T14:21:54","date_gmt":"2020-10-15T11:21:54","guid":{"rendered":"https:\/\/en.buradabiliyorum.com\/microsofts-image-captioning-ai-is-pretty-darn-good-at-describing-pictures\/"},"modified":"2020-10-15T14:21:54","modified_gmt":"2020-10-15T11:21:54","slug":"microsofts-image-captioning-ai-is-pretty-darn-good-at-describing-pictures","status":"publish","type":"post","link":"https:\/\/buradabiliyorum.com\/en\/microsofts-image-captioning-ai-is-pretty-darn-good-at-describing-pictures\/","title":{"rendered":"#Microsoft&#8217;s image-captioning AI is pretty darn good at describing pictures"},"content":{"rendered":"<p>&#8220;<strong>#Microsoft&#8217;s image-captioning AI is pretty darn good at describing pictures<\/strong>&#8221;<\/p>\n<div>\n                                Microsoft has built a new AI image-captioning system that described photos more accurately than humans in limited tests.<\/p>\n<p>The model has been added to <a rel=\"nofollow noopener noreferrer\" target=\"_blank\" href=\"https:\/\/www.microsoft.com\/en-us\/ai\/seeing-ai\">Seeing AI<\/a>, a free <a href=\"https:\/\/buradabiliyorum.com\/en\/category\/download-scripts-themes-apps\/\" data-internallinksmanager029f6b8e52c=\"9\" title=\"Download Scripts &amp; Themes &amp; Apps\" target=\"_blank\" rel=\"noopener\">app<\/a> for people with visual impairments that uses a smartphone camera to read text, identify people, and describe objects and surroundings.<\/p>\n<p>It\u2019s also now available to<span>\u00a0app developers through\u00a0the Computer Vision API in Azure Cognitive Services, and will start rolling out in Microsoft Word, Outlook, and PowerPoint later this year.<\/span><\/p>\n<p>The model can generate \u201calt text\u201d image descriptions for web pages and documents, an important feature for people with limited vision that\u2019s all-too-often unavailable.<\/p>\n<section class=\"f-content-section f-content-block\">\n<div data-grid=\"container\">\n<div 
class=\"f-content-entry m-rich-content-block\">\n<p class=\"\">\u201cIdeally, everyone would include alt text for all images in documents, on the web, in <a href=\"https:\/\/buradabiliyorum.com\/en\/category\/social-mediaa\/\" data-internallinksmanager029f6b8e52c=\"1\" title=\"Social Media\" target=\"_blank\" rel=\"noopener\">social media<\/a> \u2013 as this enables people who are blind to access the content and participate in the conversation,\u201d said\u00a0Saqib Shaikh, a software engineering manager at Microsoft\u2019s AI platform group. \u201cBut, alas, people don\u2019t. So, there are several apps that use image captioning as [a] way to fill in alt text when it\u2019s missing.\u201d<\/p>\n<p><em>[Read:\u00a0Microsoft unveils efforts to make AI more accessible to people with disabilities]<\/em>\n<\/div>\n<\/div>\n<\/section>\n<p>The algorithm now tops the leaderboard of an image-captioning benchmark called\u00a0<a rel=\"nofollow noopener noreferrer\" target=\"_blank\" href=\"https:\/\/nocaps.org\/\">nocaps<\/a>.\u00a0Microsoft achieved this by\u00a0<span>pre-training a large AI model on a dataset of images paired with word tags \u2014 rather than full captions, which are less efficient to create. Each of the tags was mapped to a specific object in an image.<\/span><\/p>\n<p>The pre-trained model was then fine-tuned on a dataset of captioned images, which enabled it to compose sentences. It then used its \u201cvisual vocabulary\u201d to create captions for images containing novel objects.<\/p>\n<p>Microsoft said the model is twice as good as the one it\u2019s used in products since 2015. 
The\u00a0image below shows how these improvements work in practice:<\/p>\n<figure class=\"post-image post-mediaBleed aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-1323655 lazy\" alt=\"\" width=\"1332\" height=\"742\" sizes=\"auto, (max-width: 1332px) 100vw, 1332px\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/10\/Screenshot-2020-10-15-at-11.39.02.png\" data-lazy=\"true\" srcset=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/10\/Screenshot-2020-10-15-at-11.39.02.png 1332w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/10\/Screenshot-2020-10-15-at-11.39.02-280x156.png 280w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/10\/Screenshot-2020-10-15-at-11.39.02-485x270.png 485w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/10\/Screenshot-2020-10-15-at-11.39.02-242x135.png 242w, https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/10\/Screenshot-2020-10-15-at-11.39.02-796x443.png 796w\"\/><figcaption>Credit: Microsoft<\/figcaption><figcaption><a rel=\"nofollow noopener noreferrer\" target=\"_blank\" href=\"https:\/\/thenextweb.com\/neural\/2020\/10\/15\/microsofts-image-captioning-ai-is-pretty-darn-good-at-describing-pictures-like-a-human\/#\" data-url=\"https:\/\/twitter.com\/intent\/tweet?url=https%3A%2F%2Fthenextweb.com%2Fneural%2F2020%2F10%2F15%2Fmicrosofts-image-captioning-ai-is-pretty-darn-good-at-describing-pictures-like-a-human%2F&amp;via=thenextweb&amp;related=thenextweb&amp;text=Check out this picture on: The legacy AI captioned this image as \u201cA person sitting at a table using a laptop.\u201d The new model described it as \u201cA person using a microscope.\u201d\" data-title=\"Share The legacy AI captioned this image as \u201cA person sitting at a table using a laptop.\u201d The new model described it as \u201cA person using a microscope.\u201d on Twitter\" data-width=\"685\" data-height=\"500\" 
class=\"post-image-share popitup\" title=\"Share The legacy AI captioned this image as \u201cA person sitting at a table using a laptop.\u201d The new model described it as \u201cA person using a microscope.\u201d on Twitter\"><i class=\"icon icon--inline icon--twitter--dark\"\/><\/a>The legacy AI captioned this image as \u201cA person sitting at a table using a laptop.\u201d The new model described it as \u201cA person using a microscope.\u201d<\/figcaption><\/figure>\n<p>However, the benchmark result doesn\u2019t mean the model will be better than humans at image captioning in the real world. Harsh Agrawal, one of the creators of the benchmark, <a rel=\"nofollow noopener noreferrer\" target=\"_blank\" href=\"https:\/\/www.theverge.com\/2020\/10\/14\/21514405\/image-captioning-seeing-ai-microsoft-algorithm-word-powerpoint-outlook\">told The Verge<\/a> that its evaluation metrics \u201conly roughly correlate with human preferences\u201d and that it \u201conly covers a small percentage of all the possible visual concepts.\u201d<\/p>\n<p class=\"c-post-pubDate\">\n                                    Published October 15, 2020 \u2014 11:21 UTC\n                                <\/p>\n<\/p><\/div>\n<p><script async src=\"\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><script data-src=\"https:\/\/connect.facebook.net\/en_US\/sdk.js#xfbml=1&amp;appId=378011798897423&amp;version=v2.6\" id=\"socialSrcFacebook\" type=\"text\/template\"><\/script><\/p>\n<blockquote>\n<p style=\"text-align: center;\"><strong>For forum sites, go to <span style=\"color: #ff9900;\"><a style=\"color: #ff9900;\" href=\"https:\/\/forum.buradabiliyorum.com\/\" target=\"_blank\" rel=\"noopener noreferrer\">Forum.BuradaBiliyorum.Com<\/a><\/span><\/strong><\/p>\n<\/blockquote>\n<blockquote>\n<p style=\"text-align: center;\"><strong>If you want to read more like this article, you can visit our <span style=\"color: #ff9900;\"><a style=\"color: #ff9900;\" 
href=\"https:\/\/en.buradabiliyorum.com\/technology\/\" target=\"_blank\" rel=\"noopener noreferrer\">Technology category.<\/a><\/span><\/strong><\/p>\n<\/blockquote>\n<p><span style=\"color: black;\"><a style=\"color: #ff9900;\" href=\"https:\/\/thenextweb.com\/neural\/2020\/10\/15\/microsofts-image-captioning-ai-is-pretty-darn-good-at-describing-pictures-like-a-human\/\" target=\"_blank\" rel=\"noopener noreferrer\">Source<\/a><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>&#8220;#Microsoft&#8217;s image-captioning AI is pretty darn good at describing pictures&#8221; Microsoft has built a new AI image-captioning system that described photos more accurately than humans in limited tests. The model has been added to Seeing AI, a free app for people with visual impairments that uses a smartphone camera to read text, identify people, and&#8230;<\/p>\n","protected":false},"author":1,"featured_media":89763,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"https:\/\/img-cdn.tnwcdn.com\/image\/neural?filter_last=1&fit=1280,640&url=https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/10\/Untitled-design-2020-10-15T114509.811.png&signature=e5d10c5832a1e0b5063e4e0663ef2926","fifu_image_alt":"","footnotes":""},"categories":[18],"tags":[],"class_list":["post-89762","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technology"],"_links":{"self":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/89762","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/comments?post=89762"}],"version-hi
story":[{"count":0,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/89762\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media\/89763"}],"wp:attachment":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media?parent=89762"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/categories?post=89762"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/tags?post=89762"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}