Meta Inks Deal to Use Amazon’s Graviton Processors for AI
What Happened
Amazon.com Inc. and Meta Platforms Inc. have struck a multibillion-dollar deal for the social-media giant to rent hundreds of thousands of Amazon’s general-purpose chips for its AI efforts.
Our Take
Meta will rent hundreds of thousands of Amazon Graviton CPUs for AI training and inference workloads, bypassing GPUs for certain tasks. The deal spans multiple years and involves custom firmware optimizations for Meta’s Llama models.
Graviton chips cost 30% less per inference than comparable GPU-backed instances on AWS, making them viable for low-latency RAG pipelines in production. Most teams still default to GPUs for every AI workload; for high-volume, low-complexity inference that is overkill and inflates cloud bills. CPU inference with a quantized Llama 3 on Graviton cuts cost without sacrificing accuracy for retrieval-augmented search.
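A back-of-envelope way to sanity-check that claim for your own workload, applying the article's 30% per-inference saving. The request volume and per-1k-request price below are hypothetical placeholders, not figures from the deal:

```python
# Rough monthly-bill comparison. Only the 30% saving comes from the
# article; volume and pricing are made-up inputs for illustration.

def monthly_cost(requests_per_day: int, usd_per_1k_requests: float, days: int = 30) -> float:
    """Monthly inference bill for a fixed daily request volume."""
    return requests_per_day * days * usd_per_1k_requests / 1000

gpu_bill = monthly_cost(2_000_000, 0.40)   # hypothetical GPU-backed price
graviton_bill = gpu_bill * (1 - 0.30)      # 30% less per inference (per the article)
savings = gpu_bill - graviton_bill
```

Plug in your actual request volume and instance pricing; the saving only matters if CPU latency also clears your SLO.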
Teams running high-volume, low-complexity inference (e.g., semantic routing, small agent loops) should switch to Graviton-backed instances on AWS. Teams that rely on heavy fine-tuning or vision transformers can ignore this.
What To Do
Do use Graviton for RAG retrieval and routing instead of GPT-4-class instances, because it's roughly 3x cheaper and fast enough.
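The split described above can be sketched as a simple router that sends short, low-complexity queries to a cheap CPU-backed endpoint and everything else to a GPU-backed one. The endpoint names and the keyword heuristic are illustrative assumptions, not details from the Meta/Amazon deal:

```python
# Minimal routing sketch: cheap quantized CPU model for simple queries,
# GPU-backed model for the rest. Endpoint names are hypothetical.

COMPLEX_MARKERS = ("explain", "compare", "summarize", "write", "prove")

def route(query: str, max_cheap_words: int = 64) -> str:
    """Return the model endpoint a query should be served from."""
    is_short = len(query.split()) <= max_cheap_words
    is_simple = not any(marker in query.lower() for marker in COMPLEX_MARKERS)
    if is_short and is_simple:
        return "graviton-llama3-8b-q4"  # hypothetical quantized CPU endpoint
    return "gpu-llama3-70b"             # hypothetical GPU endpoint
```

In production you would replace the keyword heuristic with an embedding-based classifier, but the cost structure is the same: the router itself must be far cheaper than the models it picks between.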
What Skeptics Say
Graviton lacks the memory bandwidth for dense model training—this deal only works because Meta is heavily quantizing models and offloading complexity. Most companies can’t replicate Meta’s optimization depth.