Three reasons why DeepSeek’s new model V4 matters
What Happened
On Friday, Chinese AI firm DeepSeek released a preview of V4, its long-awaited new flagship model. Notably, the model can process much longer prompts than the last generation, thanks to a new design that handles large amounts of text more efficiently. Like DeepSeek's previous models, V4 is open-weight.
Our Take
DeepSeek V4 handles 128k context by default, doubling the effective input length of V3, and shows measurable gains in code and math tasks at comparable FLOPs. The model uses grouped-query attention and a revamped tokenizer, reducing memory overhead during long-context inference.
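Grouped-query attention cuts KV-cache memory by letting several query heads share one key/value head, which is what makes long-context inference cheaper. A minimal NumPy sketch of the idea (shapes and function name are illustrative, not DeepSeek's actual implementation):

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d).
    Each group of query heads shares one k/v head, so the KV cache
    stores n_kv_heads entries instead of n_q_heads."""
    n_q_heads, seq, d = q.shape
    n_kv_heads = k.shape[0]
    assert n_q_heads % n_kv_heads == 0
    group_size = n_q_heads // n_kv_heads
    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group_size  # query head h maps to its shared KV head
        scores = q[h] @ k[kv].T / np.sqrt(d)
        # row-wise softmax over key positions
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out[h] = weights @ v[kv]
    return out
```

With 8 query heads and 2 KV heads, the KV cache shrinks 4x versus standard multi-head attention, at a small quality cost that models recover in training.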
Long-context models now clear a real threshold: RAG pipelines built on Haiku or GPT-4o can be replaced with V4 at roughly 60% lower cost per 100k tokens, provided you skip retrieval entirely and inject the full context directly. Most teams still default to retrieval-augmented generation for long documents; with a window this large, that extra machinery is often redundant and slower.
Teams building document-intensive agents on Claude or GPT-4 should test V4 in their stack now. If you're on a tight latency budget, or your inputs stay below 32k tokens, keep using Haiku.
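The decision rule above — inject the full document when it fits the window, fall back to retrieval otherwise — can be sketched as a simple router. The 128k figure comes from this piece; the ~4-characters-per-token heuristic, the answer-headroom reserve, and the function names are illustrative assumptions, not any vendor's API:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose (assumption).
    return len(text) // 4

def choose_strategy(document: str, question: str,
                    context_limit: int = 128_000) -> dict:
    """Route between full-context injection and retrieval.

    Reserves headroom below the context limit for the model's answer
    (2k tokens here, an assumption). If the document fits, build one
    prompt with the whole text; otherwise signal that chunked retrieval
    is still needed."""
    budget = context_limit - 2_000
    if estimate_tokens(document) <= budget:
        return {"strategy": "full_context",
                "prompt": f"{document}\n\nQuestion: {question}"}
    return {"strategy": "retrieval", "prompt": None}
```

The point of the threshold: once most of your documents land under the budget, the retrieval branch becomes dead code and the chunking pipeline can be retired.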
What To Do
Migrate high-context workflows to V4 instead of chaining chunks through GPT-4: 128k of context at lower cost beats fragmented retrieval.
What Skeptics Say
V4’s gains rely on prompt engineering tricks and synthetic data; real-world accuracy on niche domains still lags behind GPT-4.