{"id":732429,"date":"2026-06-09T22:05:15","date_gmt":"2026-06-09T19:05:15","guid":{"rendered":"https:\/\/buradabiliyorum.com\/en\/can-tech-companies-learn-to-love-cheaper-ai-models\/"},"modified":"2026-06-09T22:05:15","modified_gmt":"2026-06-09T19:05:15","slug":"can-tech-companies-learn-to-love-cheaper-ai-models","status":"publish","type":"post","link":"https:\/\/buradabiliyorum.com\/en\/can-tech-companies-learn-to-love-cheaper-ai-models\/","title":{"rendered":"Can tech companies learn to love cheaper AI models?\u00a0"},"content":{"rendered":"<div>\n<p id=\"speakable-summary\" class=\"wp-block-paragraph\">The AI boom has been\u00a0built on\u00a0a basic assumption:\u00a0bigger models are more powerful, and the most powerful models win.\u00a0Now, the industry is about to learn what happens if\u00a0that assumption starts to break.\u00a0\u00a0<\/p>\n<p class=\"wp-block-paragraph\">Mounting costs have\u00a0already\u00a0pressured users to give smaller and cheaper models a second look.\u00a0This\u00a0cost-conscious model-shopping\u00a0is\u00a0new\u00a0and\u00a0it\u2019s\u00a0unclear how it will affect the industry, but the impact is likely to\u00a0be\u00a0significant.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">One\u00a0prediction, laid out best by Coinbase co-founder Brian Armstrong,\u00a0is that it will result in the vast majority of tasks shifting to cheaper models.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">\u201cDemand for\u00a0intelligence is near infinite, but 80% of workloads will be running on 99% cheaper models within 12-18 months,\u201d Armstrong\u00a0<a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/x.com\/brian_armstrong\/status\/2063782620815876515?s=46&amp;t=45_xAnRsdQP1GVqYv9Gdbw\" target=\"_blank\" rel=\"noreferrer noopener 
nofollow\">wrote on X<\/a>. \u201c20% of workloads will still run on latest gen models where IQ maxing is important.\u201d\u00a0<\/p>\n<p class=\"wp-block-paragraph\">It\u2019s\u00a0hard to overstate what a significant shift it will be for the AI industry if Armstrong\u2019s prediction\u00a0comes true.\u00a0\u00a0<\/p>\n<p class=\"wp-block-paragraph\">Before now,\u00a0most AI companies have\u00a0competed\u00a0on quality, which has meant\u00a0defaulting to the most advanced available model. If those same jobs can be\u00a0handled by\u00a0cheaper models without affecting quality, it would mean a massive shift in the economics of\u00a0AI.\u00a0And critically, much\u00a0of the savings would be coming out of the pockets of the big labs,\u00a0dealing\u00a0a\u00a0financial\u00a0blow to OpenAI and Anthropic just as\u00a0they\u2019re\u00a0heading for\u00a0their\u00a0IPOs.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">It\u2019s\u00a0a potentially seismic change in the\u00a0industry, resting on one basic question:\u00a0Are companies\u00a0ready\u00a0to switch to smaller models?\u00a0<\/p>\n<p class=\"wp-block-paragraph\">Initial tests suggest that, when the system is arranged right, cheaper models could sub in without any sacrifice in quality. In a recent test by the legal AI tool Harvey,\u00a0the\u00a0company was able to reduce\u00a0inference costs\u00a0by 3x\u00a0without reducing quality. 
The\u00a0test, <a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/fireworks.ai\/blog\/open-source-agents-frontier-advisors\">performed in partnership<\/a>\u00a0with the inference platform Fireworks AI, combined\u00a0Claude Opus and\u00a0Fireworks\u2019\u00a0GLM 5.1,\u00a0and shifted to\u00a0Opus for the most intensive tasks.\u00a0The result was a significantly lower load in terms of server time and overall cost.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">\u201cQuality comes first, and in legal it always will,\u201d Harvey co-founder Gabe Pereyra told TechCrunch, referring to the AI legal services his startup provides. \u201cHowever, the definition of quality is evolving from simply using the most powerful model for everything, to using the best model that gets the right answer most efficiently.\u201d<\/p>\n<p class=\"wp-block-paragraph\">This\u00a0trend\u00a0is often framed\u00a0in terms of\u00a0major labs versus\u00a0Chinese\u00a0models\u00a0or\u00a0open-weight\u00a0ones,\u00a0but that misses the bigger point. The real divide\u00a0isn\u2019t\u00a0between\u00a0proprietary and open models;\u00a0it\u2019s\u00a0between\u00a0large models and small ones.\u00a0You\u00a0can save money by switching from GPT-5.5 to DeepSeek\u2019s V4\u00a0Flash, but\u00a0switching to\u00a0GPT-5.4-mini works just as well.\u00a0\u00a0<\/p>\n<p class=\"wp-block-paragraph\">There\u2019s\u00a0an active price war going on between in-house inference from the big labs and independently served open-weight models. For the bigger question of small versus\u00a0large, it\u00a0doesn\u2019t\u00a0really matter which kind of small model wins out.\u00a0\u00a0<\/p>\n<p class=\"wp-block-paragraph\">All\u00a0of\u00a0this might seem obvious \u2014 of course you\u00a0shouldn\u2019t\u00a0use more\u00a0compute\u00a0than\u00a0necessary\u00a0\u2014 but it\u00a0runs counter to\u00a0the scaling-first approach that has dominated the industry until now. 
Inspired by\u00a0<a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/en.wikipedia.org\/wiki\/Bitter_lesson\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">the\u00a0bitter lesson<\/a>, labs have leaned hard into training the most compute-intensive models possible,\u00a0pushing\u00a0the frontier of what AI models\u00a0can\u00a0do. With prices heavily subsidized by investors, clients had no reason to choose anything but the most advanced option. <\/p>\n<p class=\"wp-block-paragraph\">With token prices rising and subsidies slowing down, users are facing cost pressure for the first time.\u00a0We don\u2019t know whether the new cost pressure will actually drive enterprise users to smaller models.\u00a0They could just as easily economize by making fewer calls, using less\u00a0context,\u00a0or simply giving up on the least promising deployments.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">But if it turns out that most deployments can be run just as well on a smaller model, it could\u00a0put a\u00a0serious damper on the growing demand for inference \u2013 and raise new questions about how to justify the cost of training a frontier model.\u00a0<\/p>\n<\/div>\n<p><em>When you purchase through links in our articles, we may earn a small commission. This doesn\u2019t affect our editorial independence.<\/em><\/p>\n<p><span style=\"color: black;\"><a style=\"color: #ff9900;\" href=\"https:\/\/techcrunch.com\/2026\/06\/09\/can-tech-companies-learn-to-love-cheaper-models\/\" target=\"_blank\" >Source<\/a><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The AI boom has been\u00a0built on\u00a0a basic assumption:\u00a0bigger models are more powerful, and the most powerful models win.\u00a0Now, the industry is about to learn what happens if\u00a0that assumption starts to break.\u00a0\u00a0 Mounting costs have\u00a0already\u00a0pressured users to give smaller and cheaper models a second look.\u00a0This\u00a0cost-conscious model-shopping\u00a0is\u00a0new\u00a0and\u00a0it\u2019s\u00a0unclear how it will affect the industry, but the impact 
is&#8230;<\/p>\n","protected":false},"author":1,"featured_media":732430,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"https:\/\/techcrunch.com\/wp-content\/uploads\/2023\/08\/GettyImages-1297856112.jpg?resize=1200,675","fifu_image_alt":"","footnotes":""},"categories":[18],"tags":[77337,153012,152300,156371,141199,151454],"class_list":["post-732429","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technology","tag-ai","tag-ai-models","tag-anthropic","tag-harvey","tag-openai","tag-tc"],"_links":{"self":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/732429","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/comments?post=732429"}],"version-history":[{"count":0,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/732429\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media\/732430"}],"wp:attachment":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media?parent=732429"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/categories?post=732429"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/tags?post=732429"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}