{"id":667944,"date":"2025-05-08T21:26:35","date_gmt":"2025-05-08T18:26:35","guid":{"rendered":"https:\/\/en.buradabiliyorum.com\/google-launches-implicit-caching-to-make-accessing-its-latest-ai-models-cheaper\/"},"modified":"2025-05-08T21:26:35","modified_gmt":"2025-05-08T18:26:35","slug":"google-launches-implicit-caching-to-make-accessing-its-latest-ai-models-cheaper","status":"publish","type":"post","link":"https:\/\/buradabiliyorum.com\/en\/google-launches-implicit-caching-to-make-accessing-its-latest-ai-models-cheaper\/","title":{"rendered":"Google launches &#8216;implicit caching&#8217; to make accessing its latest AI models cheaper"},"content":{"rendered":"<div>\n<p id=\"speakable-summary\" class=\"wp-block-paragraph\">Google is rolling out a feature in its Gemini API that the company claims will make its latest AI models cheaper for third-party developers.<\/p>\n<p class=\"wp-block-paragraph\">Google calls the feature \u201cimplicit caching\u201d and says it can deliver 75% savings on \u201crepetitive context\u201d passed to models via the Gemini API. 
It supports Google\u2019s Gemini 2.5 Pro and 2.5 Flash models.<\/p>\n<p class=\"wp-block-paragraph\">That\u2019s likely to be welcome news to developers as the cost of using frontier models continues to grow.<\/p>\n<figure class=\"wp-block-embed is-type-rich is-provider-twitter wp-block-embed-twitter\">\n<div class=\"wp-block-embed__wrapper\">\n<blockquote class=\"twitter-tweet\" data-width=\"500\" data-dnt=\"true\">\n<p lang=\"en\" dir=\"ltr\">We just shipped implicit caching in the Gemini API, automatically enabling a 75% cost savings with the Gemini 2.5 models when your request hits a cache \ud83d\udea2<\/p>\n<p>We also lowered the min token required to hit caches to 1K on 2.5 Flash and 2K on 2.5 Pro!<\/p>\n<p>\u2014 Logan Kilpatrick (@OfficialLoganK) <a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/twitter.com\/OfficialLoganK\/status\/1920523026551955512?ref_src=twsrc%5Etfw\">May 8, 2025<\/a><\/p><\/blockquote>\n<\/div>\n<\/figure>\n<p class=\"wp-block-paragraph\">Caching, a widely adopted practice in the AI industry, reuses frequently accessed or pre-computed data from models to cut down on computing requirements and cost. For example, caches can store answers to questions users often ask of a model, eliminating the need for the model to recreate answers to the same request.<\/p>\n<p class=\"wp-block-paragraph\">Google previously offered model prompt caching, but only <em>explicit<\/em> prompt caching, meaning devs had to define their highest-frequency prompts. 
While cost savings were supposed to be guaranteed, explicit prompt caching typically involved a lot of manual work.<\/p>\n<p class=\"wp-block-paragraph\">Some developers weren\u2019t pleased with how Google\u2019s explicit caching implementation worked for Gemini 2.5 Pro, which they said could cause surprisingly large API bills. Complaints reached a fever pitch in the past week, <a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/www.reddit.com\/r\/CLine\/comments\/1kcnmzf\/regarding_unpredictable_pricing_w_gemini_25_pro\/\">prompting the Gemini team to apologize<\/a> and pledge to make changes.<\/p>\n<p class=\"wp-block-paragraph\">In contrast to explicit caching, implicit caching is automatic. Enabled by default for Gemini 2.5 models, it passes on cost savings if a Gemini API request to a model hits a cache.<\/p>\n<p class=\"wp-block-paragraph\">\u201c[W]hen you send a request to one of the Gemini 2.5 models, if the request shares a common prefix as one of previous requests, then it\u2019s eligible for a cache hit,\u201d explained Google in a <a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/developers.googleblog.com\/en\/gemini-2-5-models-now-support-implicit-caching\/?linkId=14353307\">blog post<\/a>. 
\u201cWe will dynamically pass cost savings back to you.\u201d<\/p>\n<p class=\"wp-block-paragraph\">The minimum prompt token count for implicit caching is 1,024 for 2.5 Flash and 2,048 for 2.5 Pro, <a rel=\"nofollow\" target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/ai.google.dev\/gemini-api\/docs\/caching?lang=python\">according to Google\u2019s developer documentation<\/a>, which is not a large amount, meaning it shouldn\u2019t take much to trigger these automatic savings. Tokens are the raw bits of data models work with, with a thousand tokens equivalent to about 750 words.<\/p>\n<p class=\"wp-block-paragraph\">Given that Google\u2019s last claims of cost savings from caching went awry, there are some buyer-beware areas in these new claims. For one, Google recommends that developers keep repetitive context at the beginning of requests to increase the chances of implicit cache hits. Context that might change from request to request should be appended at the end, the company says.<\/p>\n<p class=\"wp-block-paragraph\">For another, Google didn\u2019t offer any third-party verification that the new implicit caching system would deliver the promised automatic savings. So we\u2019ll have to see what early adopters say.<\/p>\n<\/div>\n<p><script async src=\"\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\n<blockquote><p><strong><span style=\"color: #ff6600;\">If you liked the article, do not forget to share it with your friends. 
Follow us on\u00a0<span style=\"color: #ff0000;\"><a style=\"color: #ff0000;\" href=\"https:\/\/news.google.com\/publications\/CAAqBwgKMN63nwsw68G3Aw\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Google News<\/a><\/span>\u00a0too, click on the star and choose us from your favorites.<\/span><\/strong><\/p><\/blockquote>\n<blockquote>\n<p style=\"text-align: center;\"><strong>If you want to read more like this article, you can visit our <span style=\"color: #ff9900;\"><a style=\"color: #ff9900;\" href=\"https:\/\/en.buradabiliyorum.com\/category\/technology\/\" target=\"_blank\" >Technology<\/a><\/span> category.<\/strong><\/p>\n<\/blockquote>\n<p><span style=\"color: black;\"><a style=\"color: #ff9900;\" href=\"https:\/\/techcrunch.com\/2025\/05\/08\/google-launches-implicit-caching-to-make-accessing-its-latest-ai-models-cheaper\/\" target=\"_blank\" >Source<\/a><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Google is rolling out a feature in its Gemini API that the company claims will make its latest AI models cheaper for third-party developers. Google calls the feature \u201cimplicit caching\u201d and says it can deliver 75% savings on \u201crepetitive context\u201d passed to models via the Gemini API. 
It supports Google\u2019s Gemini 2.5 Pro and 2.5&#8230;<\/p>\n","protected":false},"author":1,"featured_media":667945,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"https:\/\/techcrunch.com\/wp-content\/uploads\/2025\/03\/GettyImages-2169339854.jpg?resize=1200,857","fifu_image_alt":"","footnotes":""},"categories":[18],"tags":[77337,74864,26293],"class_list":["post-667944","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technology","tag-ai","tag-gemini","tag-google"],"_links":{"self":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/667944","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/comments?post=667944"}],"version-history":[{"count":0,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/667944\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media\/667945"}],"wp:attachment":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media?parent=667944"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/categories?post=667944"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/tags?post=667944"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}