{"id":674691,"date":"2025-06-12T14:00:31","date_gmt":"2025-06-12T11:00:31","guid":{"rendered":"https:\/\/en.buradabiliyorum.com\/smart-glasses-capture-first-person-task-demos\/"},"modified":"2025-06-12T14:00:31","modified_gmt":"2025-06-12T11:00:31","slug":"smart-glasses-capture-first-person-task-demos","status":"publish","type":"post","link":"https:\/\/buradabiliyorum.com\/en\/smart-glasses-capture-first-person-task-demos\/","title":{"rendered":"Smart glasses capture first-person task demos"},"content":{"rendered":"<div>\n<div class=\"article-gallery lightGallery\">\n<div data-thumb=\"https:\/\/scx1.b-cdn.net\/csz\/news\/tmb\/2025\/a-new-system-to-collec.jpg\" data-src=\"https:\/\/scx2.b-cdn.net\/gfx\/news\/hires\/2025\/a-new-system-to-collec.jpg\" data-sub-html=\"Human demonstrations are done with only black ovens (top). The policy transfers zero-shot to the robot with the same oven (middle) and also generalizes to a new oven instance (bottom). The points are color-coded to represent the correspondence. Credit: Liu et al.\">\n<figure class=\"article-img\">\n            <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/scx1.b-cdn.net\/csz\/news\/800a\/2025\/a-new-system-to-collec.jpg\" alt=\"A new system to collect action-labeled data for robot training using smart-glasses\" title=\"Human demonstrations are done with only black ovens (top). The policy transfers zero-shot to the robot with the same oven (middle) and also generalizes to a new oven instance (bottom). The points are color-coded to represent the correspondence. Credit: Liu et al.\" width=\"800\" height=\"488\"\/><figcaption class=\"text-darken text-low-up text-truncate-js text-truncate mt-3\">\n                Human demonstrations are done with only black ovens (top). 
The policy transfers zero-shot to the robot with the same oven (middle) and also generalizes to a new oven instance (bottom). The points are color-coded to represent the correspondence. Credit: Liu et al.<br \/>\n            <\/figcaption><\/figure>\n<\/p><\/div>\n<\/div>\n<p>Over the past few decades, robots have gradually started making their way into various real-world settings, including some malls, airports and hospitals, as well as a few offices and households.<\/p>\n<p>For robots to be deployed on a larger scale, serving as reliable everyday assistants, they should be able to complete a wide range of common manual tasks and chores, such as cleaning, washing the dishes, cooking and doing the laundry.<\/p>\n<p>Training machine learning algorithms that allow robots to successfully complete these tasks can be challenging, as it often requires extensive annotated data and\/or demonstration videos showing humans performing the tasks. Devising more effective methods to collect data to train robotics algorithms could thus be highly advantageous, as it could help to further broaden the capabilities of robots.<\/p>\n<p>Researchers at New York University and UC Berkeley recently introduced EgoZero, a new system to collect egocentric demonstrations of humans completing specific manual tasks. 
This system, introduced in a <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/2505.20290\" target=\"_blank\">paper<\/a> posted to the <i>arXiv<\/i> preprint server, relies on the use of Project Aria glasses, the smart glasses for augmented reality (AR) developed by Meta.<\/p>\n<figure class=\"mb-4\" itemscope=\"\" itemtype=\"http:\/\/schema.org\/VideoObject\">\n    <meta itemprop=\"name\" content=\"A new system to collect action-labeled data for robot training using smart-glasses\"\/><br \/>\n    <meta itemprop=\"url\" content=\"https:\/\/scx2.b-cdn.net\/gfx\/video\/2025\/a-new-system-to-collec-2.mp4\"\/><br \/>\n    <meta itemprop=\"description\" content=\"Credit: https:\/\/egozero-robot.github.io\/\"\/><br \/>\n    <meta itemprop=\"uploadDate\" content=\"2025-06-11T10:52:19-04:00\"\/><br \/>\n        <meta itemprop=\"thumbnailUrl\" content=\"https:\/\/scx1.b-cdn.net\/gfx\/video_tmb\/2025\/a-new-system-to-collec-2.mp4.jpg\"\/><br \/>\n    <meta itemprop=\"contentUrl\" content=\"https:\/\/scx2.b-cdn.net\/gfx\/video\/2025\/a-new-system-to-collec-2.mp4\"\/><br \/>\n            <video class=\"embed-responsive embed-responsive-16by9\" id=\"jwVID83876\" controls=\"\" poster=\"https:\/\/scx1.b-cdn.net\/gfx\/video_tmb\/2025\/a-new-system-to-collec-2.mp4.jpg\"><source src=\"https:\/\/scx2.b-cdn.net\/gfx\/video\/2025\/a-new-system-to-collec-2.mp4\" type=\"video\/mp4\"><\/source><\/video><figcaption class=\"text-darken text-low-up mt-4\" itemprop=\"caption\">Credit: https:\/\/egozero-robot.github.io\/<\/figcaption><\/figure>\n<p>&#8220;We believe that general-purpose robotics is bottlenecked by a lack of internet-scale data, and that the best way to address this problem would be to collect and learn from first-person human data,&#8221; Lerrel Pinto, senior author of the paper, told Tech Xplore.<\/p>\n<p>&#8220;The primary objectives of this project were to develop a way to collect accurate action-labeled data for robot training, optimize for the 
ergonomics of the data collection wearables needed, and transfer human behaviors into robot policies with zero robot data.&#8221;<\/p>\n<p>EgoZero, the new system developed by Pinto and his colleagues, relies on Project Aria smart glasses to collect video demonstrations of humans completing tasks while performing robot-executable actions, captured from the point of view of the person wearing the glasses.<\/p>\n<p>These demonstrations can then be used to train robotics algorithms on new manipulation policies, which could allow robots to successfully complete various manual tasks.<\/p>\n<p>&#8220;Unlike prior works that require multiple calibrated cameras, wrist wearables, or motion capture gloves, EgoZero is unique in that it is able to extract these 3D representations with only smart glasses (Project Aria smart glasses),&#8221; explained Ademi Adeniji, student and co-lead author of the paper.<\/p>\n<p>&#8220;As a result, robots can learn a new task from as little as 20 minutes of human demonstrations, with no teleoperation.&#8221;<\/p>\n<div class=\"article-gallery lightGallery\">\n<div data-thumb=\"https:\/\/scx1.b-cdn.net\/csz\/news\/tmb\/2025\/a-new-system-to-collec-1.jpg\" data-src=\"https:\/\/scx2.b-cdn.net\/gfx\/news\/hires\/2025\/a-new-system-to-collec-1.jpg\" data-sub-html=\"Architecture diagram. EgoZero trains policies in a unified state-action space defined as egocentric 3D points. Unlike previous methods, EgoZero localizes object points via triangulation over the camera trajectory, and computes action points via Aria MPS hand pose and a hand estimation model. These points supervise a closed-loop Transformer policy, which is rolled out on unprojected points from an iPhone during inference. 
Credit: Liu et al.\">\n<figure class=\"article-img text-center\">\n            <img decoding=\"async\" src=\"https:\/\/scx1.b-cdn.net\/csz\/news\/800a\/2025\/a-new-system-to-collec-1.jpg\" alt=\"A new system to collect action-labeled data for robot training using smart-glasses\" title=\"Architecture diagram. EgoZero trains policies in a unified state-action space defined as egocentric 3D points. Unlike previous methods, EgoZero localizes object points via triangulation over the camera trajectory, and computes action points via Aria MPS hand pose and a hand estimation model. These points supervise a closed-loop Transformer policy, which is rolled out on unprojected points from an iPhone during inference. Credit: Liu et al.\"\/><figcaption class=\"text-left text-darken text-truncate text-low-up mt-3\">\n                Architecture diagram. EgoZero trains policies in a unified state-action space defined as egocentric 3D points. Unlike previous methods, EgoZero localizes object points via triangulation over the camera trajectory, and computes action points via Aria MPS hand pose and a hand estimation model. These points supervise a closed-loop Transformer policy, which is rolled out on unprojected points from an iPhone during inference. Credit: Liu et al.<br \/>\n            <\/figcaption><\/figure>\n<\/p><\/div>\n<\/div>\n<p>To evaluate their proposed system, the researchers used it to collect video demonstrations of simple actions that are commonly completed in a household environment (e.g., opening an oven door) and then used these demonstrations to train a machine learning algorithm.<\/p>\n<p>The machine learning algorithm was then deployed on Franka Panda, a robotic arm with a gripper attached at its end. 
Notably, they found that the robotic arm successfully completed most of the tasks they tested it on, even though the algorithm planning its movements underwent minimal training.<\/p>\n<p>&#8220;EgoZero&#8217;s biggest contribution is that it can transfer human behaviors into robot policies with zero robot data, with just a pair of smart glasses,&#8221; said Pinto.<\/p>\n<p>&#8220;It extends past work (Point Policy) by showing that 3D representations enable efficient robot learning from humans, but completely in-the-wild. We hope this serves as a foundation for future exploration of representations and algorithms to enable human-to-robot learning at scale.&#8221;<\/p>\n<p>The code for the data collection system introduced by Pinto and his colleagues was <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/egozero-robot.github.io\/\" target=\"_blank\">published on GitHub<\/a> and can be easily accessed by other research teams.<\/p>\n<p>In the future, it could be used to rapidly collect datasets to train robotics algorithms, which could contribute to the further development of robots, ultimately facilitating their deployment in a greater number of households and offices worldwide.<\/p>\n<p>&#8220;We now hope to explore the tradeoffs between 2D and 3D representations at a larger scale,&#8221; added Vincent Liu, student and co-lead author of the paper.<\/p>\n<p>&#8220;EgoZero and past work (Point Policy, P3PO) have only explored single-task 3D policies, so it would be interesting to extend this framework of learning from 3D points in the form of a fine-tuned LLM\/VLM, similar to how modern VLA models are trained.&#8221;<\/p>\n<p><i>Written for you by our author <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/sciencex.com\/help\/editorial-team\/#authors\" target=\"_blank\">Ingrid Fadelli<\/a>, edited by <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/sciencex.com\/help\/editorial-team\/\" target=\"_blank\">Lisa Lock<\/a>, and fact-checked and reviewed by <a 
rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/sciencex.com\/help\/editorial-team\/\" target=\"_blank\">Robert Egan<\/a>\u2014this article is the result of careful human work. We rely on readers like you to keep independent <a href=\"https:\/\/buradabiliyorum.com\/en\/category\/sciencee\/\" data-internallinksmanager029f6b8e52c=\"5\" title=\"Science\" target=\"_blank\" rel=\"noopener\">science<\/a> journalism alive. If this reporting matters to you, please consider a <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/sciencex.com\/donate\/?utm_source=story&amp;utm_medium=story&amp;utm_campaign=story\" target=\"_blank\">donation<\/a> (especially monthly). You&#8217;ll get an <b>ad-free<\/b> account as a thank-you.<\/i><\/p>\n<div class=\"article-main__more p-4\">\n<p><strong>More information:<\/strong><br \/>\n\t\t\t\t\t\t\t\t\t\t\t\tVincent Liu et al, EgoZero: Robot Learning from Smart Glasses, <i>arXiv<\/i> (2025). <a rel=\"nofollow\" target=\"_blank\" data-doi=\"1\" href=\"https:\/\/dx.doi.org\/10.48550\/arxiv.2505.20290\" target=\"_blank\">DOI: 10.48550\/arxiv.2505.20290<\/a><\/p>\n<div class=\"mt-3\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t<strong>Journal information:<\/strong><br \/>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<cite>arXiv<\/cite><br \/>\n                                                        <a rel=\"nofollow\" target=\"_blank\" class=\"icon_open\" href=\"http:\/\/arxiv.org\/\" target=\"_blank\" rel=\"nofollow\"><br \/>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<svg>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<use href=\"https:\/\/techx.b-cdn.net\/tmpl\/v2\/img\/svg\/sprite.svg#icon_open\" x=\"0\" y=\"0\"\/>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/svg><br \/>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/a>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n<\/p><\/div>\n<p class=\"article-main__note mt-4\">\n                                                \u00a9 2025 Science X Network\n                                            <\/p>\n<p>                            
            <!-- print only --><\/p>\n<div class=\"d-none d-print-block\">\n<p>\n                                                <strong>Citation<\/strong>:<br \/>\n                                                Training robots without robots: Smart glasses capture first-person task demos (2025, June 12)<br \/>\n                                                retrieved 12 June 2025<br \/>\n                                                from https:\/\/techxplore.com\/news\/2025-06-robots-smart-glasses-capture-person.html\n                                            <\/p>\n<p>\n                                            This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no<br \/>\n                                            part may be reproduced without the written permission. The content is provided for information purposes only.\n                                            <\/p>\n<\/p><\/div>\n<\/p><\/div>\n<p><script id=\"facebook-jssdk\" async=\"\" src=\"https:\/\/connect.facebook.net\/en_US\/sdk.js\"><\/script><\/p>\n<blockquote><p><strong><span style=\"color: #ff6600;\">If you liked the article, do not forget to share it with your friends. 
Follow us on\u00a0<span style=\"color: #ff0000;\"><a style=\"color: #ff0000;\" href=\"https:\/\/news.google.com\/publications\/CAAqBwgKMN63nwsw68G3Aw\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Google News<\/a><\/span>\u00a0too: click on the star and add us to your favorites.<\/span><\/strong><\/p><\/blockquote>\n<blockquote>\n<p style=\"text-align: center;\"><strong>If you want to read more articles like this, you can visit our <span style=\"color: #ff9900;\"><a style=\"color: #ff9900;\" href=\"https:\/\/en.buradabiliyorum.com\/category\/sciencee\/\" target=\"_blank\" >Science category.<\/a><\/span><\/strong><\/p>\n<\/blockquote>\n<p><span style=\"color: black;\"><a style=\"color: #ff9900;\" href=\"https:\/\/techxplore.com\/news\/2025-06-robots-smart-glasses-capture-person.html\" target=\"_blank\" >Source<\/a><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Human demonstrations are done with only black ovens (top). The policy transfers zero-shot to the robot with the same oven (middle) and also generalizes to a new oven instance (bottom). The points are color-coded to represent the correspondence. Credit: Liu et al. 
Over the past few decades, robots have gradually started making their way into&#8230;<\/p>\n","protected":false},"author":1,"featured_media":674692,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"https:\/\/scx2.b-cdn.net\/gfx\/news\/hires\/2025\/a-new-system-to-collec.jpg","fifu_image_alt":"","footnotes":""},"categories":[16],"tags":[],"class_list":["post-674691","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-sciencee"],"_links":{"self":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/674691","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/comments?post=674691"}],"version-history":[{"count":0,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/674691\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media\/674692"}],"wp:attachment":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media?parent=674691"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/categories?post=674691"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/tags?post=674691"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}