Google AI

New ways to balance cost and reliability in the Gemini API


What Happened

Google is adding two new inference tiers to the Gemini API, Flex and Priority, giving developers a way to trade off cost against latency.

Our Take

Honestly, this is a long-overdue update to the Gemini API. We've been complaining about the cost of running these large language models for years. The Flex and Priority tiers are a decent start, but we still need more transparency on pricing and a clear roadmap for further cost-reducing features.

That said, the Flex tier's 50ms latency and 20% cost reduction are a step in the right direction. Even so, for most clients the cost of running these models is hard to justify without a clear return on investment.
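To see what a 20% discount actually means at volume, a quick back-of-envelope calculation helps. The dollar rates below are illustrative placeholders, not Google's actual Gemini pricing:

```python
# Back-of-envelope comparison of standard vs. Flex-tier monthly spend.
# NOTE: the rates here are hypothetical placeholders, not real Gemini pricing.
STANDARD_RATE = 10.00   # assumed $ per 1M tokens on the standard tier
FLEX_DISCOUNT = 0.20    # the 20% cost reduction cited above

def monthly_cost(tokens_millions: float, rate_per_million: float) -> float:
    """Dollar spend for a month of traffic at the given $/1M-token rate."""
    return tokens_millions * rate_per_million

# Example: a client pushing 500M tokens per month.
standard = monthly_cost(500, STANDARD_RATE)
flex = monthly_cost(500, STANDARD_RATE * (1 - FLEX_DISCOUNT))
print(round(standard - flex, 2))  # monthly savings in dollars -> 1000.0
```

At that (assumed) rate, the discount is worth about $1,000/month on 500M tokens; plug in your own volumes and the published prices to see whether the savings clear your ROI bar.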

What To Do

Start experimenting with the Flex tier to see whether it's a viable option for your clients.
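If you want to trial tiers without hard-coding a choice, a small routing helper keeps the decision in one place. A minimal sketch, with the caveat that the tier name strings, the `service_tier` field, and the model name below are assumptions for illustration, not the confirmed Gemini API request schema:

```python
# Hypothetical request builder: "service_tier", the tier names, and the model
# name are illustrative assumptions, not the verified google-genai schema.
VALID_TIERS = {"flex", "priority", "standard"}

def build_request(prompt: str, latency_sensitive: bool) -> dict:
    """Route latency-sensitive traffic to Priority, everything else to Flex."""
    tier = "priority" if latency_sensitive else "flex"
    assert tier in VALID_TIERS
    return {
        "model": "gemini-2.5-flash",          # placeholder model name
        "contents": prompt,
        "config": {"service_tier": tier},      # assumed field name
    }

req = build_request("Summarize this support ticket", latency_sensitive=False)
print(req["config"]["service_tier"])  # -> flex
```

Centralizing the tier choice like this means that once real pricing settles, switching a whole workload between Flex and Priority is a one-line change rather than a sweep through every call site.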
