Crescendo AI

Google launches Gemini 3.1 Ultra with 2M-token context


What Happened

Google released Gemini 3.1 Ultra on March 20, 2026, with a 2-million-token context window, doubling the capacity of any current competitor. The model supports native multimodal reasoning across text, images, and audio without intermediate conversion or transcription steps. Sandboxed code execution is included natively, positioning the model for agentic and developer-facing workflows.

Our Take

2 million tokens. I had to read that twice. At the usual rough conversion of about three-quarters of a word per token, that's on the order of 1.5 million words shoved into a single context window: not chunked, not summarized, just... there.

Here's the thing: this makes our entire RAG setup for document-heavy clients look like unnecessary overengineering. We've spent real hours on chunking strategies and embedding pipelines. With 2M tokens you can just send the whole thing. That's not a small deal.
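To make the "unnecessary overengineering" claim concrete, here's a back-of-the-envelope sketch. Every number in it is an illustrative assumption (a hypothetical 1.2M-token client corpus, 512-token chunks, an 8k-token legacy budget), not a measurement from the release:

```python
import math

# Illustrative assumptions -- not measured figures from the announcement.
CORPUS_TOKENS = 1_200_000   # hypothetical document-heavy client corpus
CHUNK_TOKENS = 512          # typical RAG chunk size
OLD_CONTEXT = 8_000         # the context budget we used to design around
NEW_CONTEXT = 2_000_000     # Gemini 3.1 Ultra's claimed window

def chunks_needed(corpus_tokens: int, chunk_tokens: int) -> int:
    """How many chunks the corpus gets split into for embedding and retrieval."""
    return math.ceil(corpus_tokens / chunk_tokens)

def fits_in_one_prompt(corpus_tokens: int, context: int, reserve: int = 8_000) -> bool:
    """Whether the whole corpus fits in a single prompt, reserving room for the answer."""
    return corpus_tokens + reserve <= context

print(chunks_needed(CORPUS_TOKENS, CHUNK_TOKENS))       # 2344 chunks to index and maintain
print(fits_in_one_prompt(CORPUS_TOKENS, OLD_CONTEXT))   # False -- hence the whole RAG stack
print(fits_in_one_prompt(CORPUS_TOKENS, NEW_CONTEXT))   # True -- just send it
```

That 2,344-chunk index, plus the embedding pipeline feeding it, is exactly the machinery a 2M window lets you delete for corpora this size.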

The native multimodal piece is what's getting buried under the context headline. No transcription step, no image-to-text preprocessing — it reasons across images, audio, and text natively. That quietly kills a whole class of pipelines we've been bolting together.
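Here's the shape of what gets killed, as a hedged sketch. Every function below is a stand-in (none of these are real APIs, and the conversion-step counts are the point, not the return values):

```python
# Stand-in converters -- placeholders for the speech-to-text and captioning
# services we currently bolt in front of the model. Not real APIs.
def transcribe_audio(audio: bytes) -> str:
    return "<transcript>"

def caption_image(image: bytes) -> str:
    return "<caption>"

def old_pipeline(audio: bytes, image: bytes, question: str):
    """Pre-native approach: flatten everything to text, then prompt.

    Returns the prompt plus the number of lossy conversion steps taken.
    """
    transcript = transcribe_audio(audio)
    caption = caption_image(image)
    prompt = f"{transcript}\n{caption}\n{question}"
    return prompt, 2  # two conversions, each discarding signal

def native_request(audio: bytes, image: bytes, question: str):
    """Native multimodal: raw inputs go into one request, zero conversions."""
    return [audio, image, question], 0
```

Each conversion step in the old path is a separate service to run, pay for, and debug, and each one throws away information (tone, layout, timing) before the model ever sees the input.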

Honestly? I'm skeptical of the latency and cost story at 2M tokens. Google hasn't published per-token pricing yet (classic), and inference at that scale is never free. Don't architect anything around this until you've run real benchmarks.
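Since no pricing exists yet, any cost model is a guess. Here's a parameterized sketch where the per-million-token rates are pure placeholders, so you can plug in real numbers the day Google publishes them:

```python
def prompt_cost(input_tokens: int, output_tokens: int,
                usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Cost of one call in USD, given per-million-token rates."""
    return (input_tokens / 1e6) * usd_per_m_input \
         + (output_tokens / 1e6) * usd_per_m_output

# Placeholder rates -- Google has NOT published Gemini 3.1 Ultra pricing.
HYPOTHETICAL_INPUT_RATE = 2.50    # USD per 1M input tokens (assumption)
HYPOTHETICAL_OUTPUT_RATE = 10.00  # USD per 1M output tokens (assumption)

# One full-window 2M-token prompt vs. one chunked 6k-token RAG prompt:
full_window = prompt_cost(2_000_000, 1_000,
                          HYPOTHETICAL_INPUT_RATE, HYPOTHETICAL_OUTPUT_RATE)
rag_prompt = prompt_cost(6_000, 1_000,
                         HYPOTHETICAL_INPUT_RATE, HYPOTHETICAL_OUTPUT_RATE)
print(round(full_window, 4))  # 5.01
print(round(rag_prompt, 4))   # 0.025
```

Even under made-up rates, the ratio is the lesson: a full-window call costs two orders of magnitude more per query than a chunked one, so "just send the whole thing" only wins when retrieval quality or engineering cost is the bottleneck.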

Sandboxed code execution is them going hard at the agentic use case. We're going to test it against our current workflow this week — if it holds up, some of what we've built in the last six months is getting simplified.

What To Do

Pick one RAG pipeline you built in the last year and rerun it as a direct 2M-context prompt on Gemini 3.1 Ultra — measure latency, cost per query, and accuracy against your current chunked retrieval setup before committing to any architectural changes.
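A minimal harness for that comparison might look like the sketch below. It's backend-agnostic on purpose: `query_fn` is whatever you're testing, your chunked RAG pipeline or a single full-context call, and the stub backend is there only so the skeleton runs without an API key.

```python
import statistics
import time

def benchmark(query_fn, queries, expected):
    """Run identical queries through any backend; report latency and accuracy."""
    latencies, correct = [], 0
    for query, want in zip(queries, expected):
        start = time.perf_counter()
        answer = query_fn(query)
        latencies.append(time.perf_counter() - start)
        correct += (answer == want)
    return {
        "median_latency_s": statistics.median(latencies),
        "accuracy": correct / len(queries),
    }

# Stub backend so the harness runs standalone -- swap in your real pipelines.
def stub_backend(query: str) -> str:
    return "42" if "answer" in query else "unknown"

report = benchmark(stub_backend,
                   ["what is the answer?", "what is the weather?"],
                   ["42", "sunny"])
print(report["accuracy"])  # 0.5
```

Run the same `queries` and `expected` lists through both backends and compare the two reports; add a cost field per call (once pricing exists) and you have the three numbers the recommendation above asks for.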
