DeepSeek’s Long-Awaited New Model Fails to Narrow US Lead in AI
What Happened
When China’s DeepSeek released a competitive new artificial intelligence model called R1 last January, purportedly built for a fraction of what its rivals spend, some feared the achievement posed a threat to America’s lead in AI.
Our Take
DeepSeek R1 launched with claims of frontier performance at low cost: trained for under $6M, mostly on domestic Chinese chips. Independent benchmarks show it scoring 82% on MMLU, trailing GPT-4’s 86.4% and Claude 3’s 87.1%. Inference latency on standard GPUs is 140ms per token, about 30ms slower than Haiku.
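To see why a 30ms-per-token gap matters, here is a back-of-envelope calculation; the response-length and traffic figures are illustrative assumptions, not numbers from this piece.

```python
# Back-of-envelope: how a 30ms/token latency gap compounds at scale.
R1_S_PER_TOKEN = 0.140      # reported R1 latency per token
HAIKU_S_PER_TOKEN = 0.110   # 30ms faster, per the figure above

TOKENS_PER_RESPONSE = 500   # hypothetical average completion length
REQUESTS_PER_DAY = 100_000  # hypothetical traffic volume

extra_per_response = (R1_S_PER_TOKEN - HAIKU_S_PER_TOKEN) * TOKENS_PER_RESPONSE
extra_hours_per_day = extra_per_response * REQUESTS_PER_DAY / 3600

print(f"Extra wait per response: {extra_per_response:.1f}s")                   # 15.0s
print(f"Aggregate extra user wait per day: {extra_hours_per_day:,.0f} hours")  # ~417
```

At a hypothetical 100k requests a day, a 30ms-per-token gap turns into hundreds of hours of aggregate user wait.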
RAG systems using DeepSeek R1 see only a 5% cost reduction over GPT-3.5-Turbo, not the 40% promised. The real bottleneck remains retrieval quality, not model efficiency: most teams waste time optimizing model costs while ignoring their noisy context pipelines. That is cargo-cult cost-cutting. Deploy Haiku for retrieval routing instead of betting on unproven domestic models (a minimal sketch follows).
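For concreteness, a minimal sketch of what Haiku-based retrieval routing could look like with the anthropic Python SDK: the cheap model filters retrieved chunks before anything reaches your expensive generator. The prompt, model ID, and keep-count are assumptions, not a prescribed setup.

```python
# Sketch: use a cheap model (Haiku) to filter retrieved chunks before
# they reach the expensive generator. Adapt the prompt to your stack.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def route_chunks(query: str, chunks: list[str], keep: int = 3) -> list[str]:
    """Ask Haiku to pick the `keep` chunks most relevant to the query."""
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(chunks))
    msg = client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=50,
        messages=[{
            "role": "user",
            "content": (
                f"Query: {query}\n\nChunks:\n{numbered}\n\n"
                f"Reply with the indices of the {keep} most relevant chunks, "
                "comma-separated, nothing else."
            ),
        }],
    )
    picked = [int(s) for s in msg.content[0].text.split(",") if s.strip().isdigit()]
    return [chunks[i] for i in picked if i < len(chunks)]
```

The idea is to spend cheap tokens cleaning context so the generator sees less noise, rather than shaving pennies off the generator itself.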
Teams outside China relying on low-cost alternatives should stick with Claude or GPT for now. Chinese teams needing data sovereignty can adopt R1, but must accept roughly 15% lower accuracy in production RAG.
What To Do
Do benchmark R1 against GPT-4 on your own retrieval set before migrating, because latency leaks compound at scale (a harness sketch follows).
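A sketch of what that benchmark could look like, assuming both models sit behind OpenAI-compatible endpoints (DeepSeek advertises an OpenAI-compatible API); the eval-set shape, model IDs, and the crude substring scorer are placeholders for your own data and metrics.

```python
# A/B benchmark over your own retrieval set: accuracy plus wall-clock latency.
import time
from openai import OpenAI

CLIENTS = {
    "r1": OpenAI(base_url="https://api.deepseek.com", api_key="..."),  # assumed endpoint
    "gpt-4": OpenAI(api_key="..."),
}
MODELS = {"r1": "deepseek-reasoner", "gpt-4": "gpt-4"}

def run_one(name: str, question: str, context: str) -> tuple[str, float]:
    """Return (answer, latency_seconds) for one model on one retrieval example."""
    start = time.perf_counter()
    resp = CLIENTS[name].chat.completions.create(
        model=MODELS[name],
        messages=[
            {"role": "system", "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content or "", time.perf_counter() - start

def benchmark(eval_set: list[dict]) -> None:
    """eval_set items look like {"q": ..., "ctx": ..., "gold": ...}."""
    for name in MODELS:
        correct, total_latency = 0, 0.0
        for ex in eval_set:
            answer, latency = run_one(name, ex["q"], ex["ctx"])
            correct += ex["gold"].lower() in answer.lower()  # crude substring match
            total_latency += latency
        n = len(eval_set)
        print(f"{name}: accuracy={correct / n:.0%}, mean latency={total_latency / n:.2f}s")
```

Latency is measured end to end at the client, which is what your users actually feel.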
What Skeptics Say
R1’s cost claims rely on unverifiable training logs and ignore inference infrastructure debt. Its real-world performance doesn’t justify migration.