Crescendo AI

Google releases Gemini 3.1 Flash-Lite at $0.25 per million tokens

Read the full article: "Google Releases Gemini 3.1 Flash-Lite" on Crescendo AI

What Happened

Google launched Gemini 3.1 Flash-Lite on March 4, 2026, priced at $0.25 per million input tokens and running 2.5x faster than its predecessor Flash models. The model is priced significantly below comparable offerings from OpenAI and Anthropic. Google is using aggressive pricing to accelerate developer migration to its platform and to put pressure on competitors' margins.

Our Take

$0.25 per million input tokens. Let that sink in. We ran a customer support bot last quarter that cost roughly $12/month on GPT-4o Mini; that same workload on Flash-Lite is now closer to $1.50. Google isn't just competing anymore; they're trying to make everyone else's business model unworkable.
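To sanity-check numbers like these against your own traffic, a back-of-the-envelope calculator is enough. The sketch below is only that: it uses the $0.25 per million input tokens figure from the announcement, ignores output-token pricing (which isn't quoted here), and the function names are ours, not part of any SDK.

```python
# Rough input-token cost math. Assumption: only input tokens are billed at the
# quoted rate; output-token pricing is not covered here and is ignored.

FLASH_LITE_USD_PER_MILLION_INPUT = 0.25  # figure quoted in the announcement


def monthly_input_cost(tokens_per_month: float, usd_per_million: float) -> float:
    """Return the monthly input-token cost in USD for a given traffic volume."""
    return tokens_per_month / 1_000_000 * usd_per_million


def tokens_for_budget(budget_usd: float, usd_per_million: float) -> float:
    """Return how many input tokens a monthly budget buys at a given rate."""
    return budget_usd / usd_per_million * 1_000_000


if __name__ == "__main__":
    # A $1.50/month bill at $0.25 per million implies about 6M input tokens.
    implied = tokens_for_budget(1.50, FLASH_LITE_USD_PER_MILLION_INPUT)
    print(f"Implied monthly input tokens: {implied:,.0f}")
    print(f"Old bill / new bill: {12.00 / 1.50:.1f}x")
```

Plug in your own monthly token count from billing exports before trusting any ratio; output-heavy workloads will look different from this input-only estimate.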

Here's the thing though — pricing this aggressive isn't charity. Google wants you locked in before you notice it. Lock-in wrapped in a discount is still lock-in. (We've been through this with AWS credits and we know exactly how it ends.)

Honestly, for commodity inference (summarization, classification, structured extraction) this is the obvious call right now. Flash-Lite is also 2.5x faster than previous Flash models, so cost and latency objections disappear at the same time. That's a hard combo to argue against.

What we'd actually watch: how long this pricing holds. OpenAI's been in the same race. When o3-mini dropped, everyone recalculated their stacks. This is the same move, different jersey. Google has the infrastructure margins to sustain a price war longer than most, and that's not nothing.

For us? We're swapping Flash-Lite into every low-stakes inference call this week. API surface is compatible enough that it's an afternoon, not a sprint.

What To Do

Swap Gemini 3.1 Flash-Lite into any low-stakes GPT-4o Mini or Claude Haiku call you're running today. The price difference is 3-5x, the API is compatible enough, and you can test and ship within an afternoon.
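One way to keep the swap to an afternoon, and keep it reversible if the pricing changes, is to route by task type and read model names from configuration instead of hard-coding them. This is a minimal sketch under our own assumptions: the task labels, the `CHEAP_MODEL` and `DEFAULT_MODEL` environment variables, and both model identifier strings are illustrative placeholders, not confirmed API values.

```python
import os
from typing import Mapping, Optional

# Task types we treat as "low-stakes" commodity inference (our labels).
LOW_STAKES_TASKS = {"summarization", "classification", "structured_extraction"}


def pick_model(task: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the model name to use for a task.

    Low-stakes tasks go to the cheap model; everything else stays on the
    existing default. Both names come from config so a swap is a one-line change.
    """
    env = os.environ if env is None else env
    # Placeholder identifiers: check your provider's model list for real names.
    cheap = env.get("CHEAP_MODEL", "gemini-3.1-flash-lite")
    default = env.get("DEFAULT_MODEL", "current-production-model")
    return cheap if task in LOW_STAKES_TASKS else default


if __name__ == "__main__":
    for task in ("classification", "contract_review"):
        print(task, "->", pick_model(task))
```

Because the cheap model is just a config value, backing out of the migration means changing one environment variable, which is a small hedge against the lock-in concern above.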
