{"id":691264,"date":"2025-09-21T22:30:44","date_gmt":"2025-09-21T19:30:44","guid":{"rendered":"https:\/\/buradabiliyorum.com\/en\/silicon-valley-bets-big-on-environments-to-train-ai-agents\/"},"modified":"2025-09-21T22:30:44","modified_gmt":"2025-09-21T19:30:44","slug":"silicon-valley-bets-big-on-environments-to-train-ai-agents","status":"publish","type":"post","link":"https:\/\/buradabiliyorum.com\/en\/silicon-valley-bets-big-on-environments-to-train-ai-agents\/","title":{"rendered":"Silicon Valley bets big on &#8216;environments&#8217; to train AI agents"},"content":{"rendered":"<div>\n<p id=\"speakable-summary\" class=\"wp-block-paragraph\">For years, Big Tech CEOs have touted visions of AI agents that can autonomously use software applications to complete tasks for people. But take today\u2019s consumer AI agents out for a spin, whether it\u2019s OpenAI\u2019s ChatGPT Agent or Perplexity\u2019s Comet, and you\u2019ll quickly realize how limited the technology still is. Making AI agents more robust may take a new set of techniques that the industry is still discovering.<\/p>\n<p class=\"wp-block-paragraph\">One of those techniques is carefully simulating workspaces where agents can be trained on multi-step tasks \u2014 known as reinforcement learning (RL) environments. 
Just as labeled datasets powered the last wave of AI, RL environments are starting to look like a critical element in the development of agents.<\/p>\n<p class=\"wp-block-paragraph\">AI researchers, founders, and investors tell TechCrunch that leading AI labs are now demanding more RL environments, and there\u2019s no shortage of startups hoping to supply them.<\/p>\n<p class=\"wp-block-paragraph\">\u201cAll the big AI labs are building RL environments in-house,\u201d said Jennifer Li, general partner at Andreessen Horowitz, in an interview with TechCrunch. \u201cBut as you can imagine, creating these datasets is very complex, so AI labs are also looking at third-party vendors that can create high-quality environments and evaluations. Everyone is looking at this space.\u201d<\/p>\n<p class=\"wp-block-paragraph\">The push for RL environments has minted a new class of well-funded startups, such as Mechanize and Prime Intellect, that aim to lead the space. Meanwhile, large data-labeling companies like Mercor and Surge say they\u2019re investing more in RL environments to keep pace with the industry\u2019s shift from static datasets to interactive simulations. 
The major labs are considering investing heavily too: according to The Information, leaders at Anthropic have discussed spending more than <a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/www.theinformation.com\/articles\/anthropic-openai-developing-ai-co-workers?rc=dp0mql\">$1 billion on RL environments<\/a> over the next year.<\/p>\n<p class=\"wp-block-paragraph\">The hope for investors and founders is that one of these startups emerges as the \u201cScale AI for environments,\u201d referring to the $29 billion data-labeling powerhouse that fueled the chatbot era.<\/p>\n<p class=\"wp-block-paragraph\">The question is whether RL environments will truly push the frontier of AI progress.<\/p>\n<h2 class=\"wp-block-heading\" id=\"h-what-is-an-rl-environment\"><span class=\"ez-toc-section\" id=\"What_is_an_RL_environment\"><\/span>What is an RL environment?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p class=\"wp-block-paragraph\">At their core, RL environments are training grounds that simulate what an AI agent would be doing in a real software application. 
One founder described building them in a <a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/www.nytimes.com\/2025\/06\/11\/technology\/ai-mechanize-jobs.html\">recent interview<\/a> as \u201clike creating a very boring video game.\u201d<\/p>\n<p class=\"wp-block-paragraph\">For example, an environment could simulate a Chrome browser and task an AI agent with purchasing a pair of socks on Amazon. The agent is graded on its performance and sent a reward signal when it succeeds (in this case, buying a worthy pair of socks).<\/p>\n<p class=\"wp-block-paragraph\">While such a task sounds relatively simple, there are a lot of places where an AI agent could get tripped up. It might get lost navigating the web page\u2019s drop-down menus, or buy too many socks. And because developers can\u2019t predict exactly what wrong turn an agent will take, the environment itself has to be robust enough to capture any unexpected behavior and still deliver useful feedback. That makes building environments far more complex than building a static dataset.<\/p>\n<p class=\"wp-block-paragraph\">Some environments are quite elaborate, allowing AI agents to use tools, access the internet, or use various software applications to complete a given task. Others are narrower, aimed at helping an agent learn specific tasks in enterprise software applications.<\/p>\n<p class=\"wp-block-paragraph\">While RL environments are the hot thing in Silicon Valley right now, there\u2019s a lot of precedent for using this technique. 
One of OpenAI\u2019s first projects back in 2016 was building \u201c<a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/openai.com\/index\/openai-gym-beta\/\">RL Gyms<\/a>,\u201d which were quite similar to the modern conception of environments. The same year, Google DeepMind\u2019s <a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/www.theguardian.com\/technology\/2016\/mar\/09\/google-deepmind-alphago-ai-defeats-human-lee-sedol-first-game-go-contest\">AlphaGo<\/a> AI system beat a world champion at the board game Go. It also used RL techniques within a simulated environment.<\/p>\n<p class=\"wp-block-paragraph\">What\u2019s unique about today\u2019s environments is that researchers are trying to build computer-using AI agents with large transformer models. Unlike AlphaGo, which was a specialized AI system working in a closed environment, today\u2019s AI agents are trained to have more general capabilities. AI researchers today have a stronger starting point, but also a more complicated goal where more can go wrong.<\/p>\n<h2 class=\"wp-block-heading\" id=\"h-a-crowded-field\"><span class=\"ez-toc-section\" id=\"A_crowded_field\"><\/span><strong>A crowded field<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p class=\"wp-block-paragraph\">AI data-labeling companies like Scale AI, Surge, and Mercor are trying to meet the moment and build out RL environments. These companies have more resources than many startups in the space, as well as deep relationships with AI labs.<\/p>\n<p class=\"wp-block-paragraph\">Surge CEO Edwin Chen tells TechCrunch he\u2019s recently seen a \u201csignificant increase\u201d in demand for RL environments within AI labs. 
Surge \u2014 which reportedly generated <a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/www.bloomberg.com\/news\/articles\/2025-07-30\/scale-rival-surge-ai-in-talks-for-funding-at-25-billion-value\">$1.2 billion in revenue<\/a> last year from working with AI labs like OpenAI, Google, Anthropic, and Meta \u2014 recently spun up a new internal organization specifically tasked with building out RL environments, he said.<\/p>\n<p class=\"wp-block-paragraph\">Close behind Surge is Mercor, a startup valued at $10 billion, which has also worked with OpenAI, Meta, and Anthropic. Mercor is pitching investors on its business building RL environments for domain-specific tasks such as coding, healthcare, and law, according to marketing materials seen by TechCrunch.<\/p>\n<p class=\"wp-block-paragraph\">Mercor CEO Brendan Foody told TechCrunch in an interview that \u201cfew understand how large the opportunity around RL environments truly is.\u201d<\/p>\n<p class=\"wp-block-paragraph\">Scale AI used to dominate the data-labeling space, but has lost ground since Meta invested $14 billion and hired away its CEO. Since then, Google and OpenAI have dropped Scale AI as a data provider, and the startup even faces competition for data-labeling work inside Meta. Still, Scale is trying to meet the moment and build environments.<\/p>\n<p class=\"wp-block-paragraph\">\u201cThis is just the nature of the business [Scale AI] is in,\u201d said Chetan Rane, Scale AI\u2019s head of product for agents and RL environments. \u201cScale has proven its ability to adapt quickly. We did this in the early days of autonomous vehicles, our first business unit. When ChatGPT came out, Scale AI adapted to that. And now, once again, we\u2019re adapting to new frontier spaces like agents and environments.\u201d<\/p>\n<p class=\"wp-block-paragraph\">Some newer players are focusing exclusively on environments from the outset. 
Among them is Mechanize, a startup founded roughly six months ago with the audacious goal of \u201cautomating all jobs.\u201d However, co-founder Matthew Barnett tells TechCrunch that his firm is starting with RL environments for AI coding agents.<\/p>\n<p class=\"wp-block-paragraph\">Mechanize aims to supply AI labs with a small number of robust RL environments, Barnett says, unlike larger data firms, which create a wide range of simple RL environments. To that end, the startup is offering software engineers <a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/jobs.ashbyhq.com\/mechanize\/4e401df6-49cc-4db3-a840-0ee2f68c019b\">$500,000 salaries<\/a> to build RL environments \u2014 far more than an hourly contractor could earn working at Scale AI or Surge.<\/p>\n<p class=\"wp-block-paragraph\">Mechanize has already been working with Anthropic on RL environments, two sources familiar with the matter told TechCrunch. Mechanize and Anthropic declined to comment on the partnership.<\/p>\n<p class=\"wp-block-paragraph\">Other startups are betting that RL environments will be influential outside of AI labs. Prime Intellect \u2014 a startup backed by AI researcher Andrej Karpathy, Founders Fund, and Menlo Ventures \u2014 is targeting smaller developers with its RL environments.<\/p>\n<p class=\"wp-block-paragraph\">Last month, Prime Intellect launched an <a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/www.primeintellect.ai\/blog\/environments\">RL environments hub<\/a>, which aims to be a \u201cHugging Face for RL environments.\u201d The idea is to give open-source developers access to the same resources that large AI labs have, and sell those developers access to computational resources in the process.<\/p>\n<p class=\"wp-block-paragraph\">Training generally capable agents in RL environments can be more computationally expensive than previous AI training techniques, according to Prime Intellect researcher Will Brown. 
Alongside startups building RL environments, there\u2019s another opportunity for GPU providers that can power the process.<\/p>\n<p class=\"wp-block-paragraph\">\u201cRL environments are going to be too large for any one company to dominate,\u201d said Brown in an interview. \u201cPart of what we\u2019re doing is just trying to build good open-source infrastructure around it. The service we sell is compute, so it is a convenient onramp to using GPUs, but we\u2019re thinking of this more in the long term.\u201d<\/p>\n<h2 class=\"wp-block-heading\" id=\"h-will-it-scale\"><span class=\"ez-toc-section\" id=\"Will_it_scale\"><\/span><strong>Will it scale?<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p class=\"wp-block-paragraph\">The open question around RL environments is whether the technique will scale like previous AI training methods.<\/p>\n<p class=\"wp-block-paragraph\">Reinforcement learning has powered some of the biggest leaps in AI over the past year, including models like OpenAI\u2019s o1 and Anthropic\u2019s Claude Opus 4. Those are particularly important breakthroughs because the methods previously used to improve AI models are now showing diminishing returns.<\/p>\n<p class=\"wp-block-paragraph\">Environments are part of AI labs\u2019 bigger bet on RL, which many believe will continue to drive progress as they add more data and computational resources to the process. Some of the OpenAI researchers behind o1 previously told TechCrunch that the company originally invested in AI reasoning models \u2014 which were created through investments in RL and test-time compute \u2014 because they thought the approach would scale nicely.<\/p>\n<p class=\"wp-block-paragraph\">The best way to scale RL remains unclear, but environments seem like a promising contender. Instead of simply rewarding chatbots for text responses, they let agents operate in simulations with tools and computers at their disposal. 
That\u2019s far more resource-intensive, but potentially more rewarding.<\/p>\n<p class=\"wp-block-paragraph\">Some are skeptical that all these RL environments will pan out. Ross Taylor, a former AI research lead at Meta who co-founded General Reasoning, tells TechCrunch that RL environments are prone to reward hacking. This is a process in which AI models cheat in order to get a reward, without really doing the task.<\/p>\n<p class=\"wp-block-paragraph\">\u201cI think people are underestimating how difficult it is to scale environments,\u201d said Taylor. \u201cEven the best publicly available [RL environments] typically don\u2019t work without serious modification.\u201d<\/p>\n<p class=\"wp-block-paragraph\">OpenAI\u2019s Head of Engineering for its API business, Sherwin Wu, said in a <a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/x.com\/swyx\/status\/1966269298974011594\">recent podcast<\/a> that he was \u201cshort\u201d on RL environment startups. Wu noted that it\u2019s a very competitive space, but also that AI research is evolving so quickly that it\u2019s hard to serve AI labs well.<\/p>\n<p class=\"wp-block-paragraph\">Karpathy, an investor in Prime Intellect who has called RL environments a potential breakthrough, has also voiced caution about the RL space more broadly. In a <a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/x.com\/karpathy\/status\/1960803117689397543\">post on X<\/a>, he raised concerns about how much more AI progress can be squeezed out of RL.<\/p>\n<p class=\"wp-block-paragraph\">\u201cI am bullish on environments and agentic interactions but I am bearish on reinforcement learning specifically,\u201d said Karpathy.<\/p>\n<p class=\"wp-block-paragraph\"><em>Update: A previous version of this article referred to Mechanize as Mechanize Work. 
It has been updated to reflect the company\u2019s official name.<\/em><\/p>\n<\/div>\n<p><span style=\"color: black;\"><a style=\"color: #ff9900;\" href=\"https:\/\/techcrunch.com\/2025\/09\/21\/silicon-valley-bets-big-on-environments-to-train-ai-agents\/\" target=\"_blank\" >Source<\/a><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>For years, Big Tech CEOs have touted visions of AI agents that can autonomously use software applications to complete tasks for people. But take today\u2019s consumer AI agents out for a spin, whether it\u2019s OpenAI\u2019s ChatGPT Agent or Perplexity\u2019s Comet, and you\u2019ll quickly realize how limited the technology still is. 
Making AI agents more robust&#8230;<\/p>\n","protected":false},"author":1,"featured_media":691265,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"https:\/\/techcrunch.com\/wp-content\/uploads\/2025\/02\/GettyImages-1356382582.jpg?resize=1200,800","fifu_image_alt":"","footnotes":""},"categories":[18],"tags":[77337,92249,153308,152300,141199,154173,158778,154755],"class_list":["post-691264","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technology","tag-ai","tag-agents","tag-ai-research","tag-anthropic","tag-openai","tag-reinforcement-learning","tag-rl","tag-scale-ai"],"_links":{"self":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/691264","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/comments?post=691264"}],"version-history":[{"count":0,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/691264\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media\/691265"}],"wp:attachment":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media?parent=691264"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/categories?post=691264"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/tags?post=691264"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}