Clients Focused on AI Opportunity, Rockefeller CEO Says
What Happened
Greg Fleming, president and CEO at Rockefeller Capital Management, says that his clients remain focused on the market opportunity in artificial intelligence, even as uncertainty around the Iran war and energy market fallout persists. He speaks on Bloomberg Television. (Source: Bloomberg)
Our Take
Client focus on AI implies demand for production-ready systems, not just LLM demos. That shift means ROI depends on minimizing inference costs and keeping latency low. Deploying a GPT-4-based RAG system, for example, requires optimizing both context retrieval and vector database spend, which sets a new bar for infrastructure efficiency.
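To make the cost argument concrete, here is a back-of-envelope per-query cost model for a RAG system. All prices and the helper name `rag_query_cost` are illustrative assumptions, not published rates for any provider:

```python
# Back-of-envelope cost model for one RAG query.
# Prices below are illustrative assumptions, not published vendor rates.

def rag_query_cost(
    prompt_tokens: int,
    completion_tokens: int,
    vector_lookups: int,
    price_in_per_1k: float = 0.03,    # assumed input price ($/1K tokens)
    price_out_per_1k: float = 0.06,   # assumed output price ($/1K tokens)
    price_per_lookup: float = 0.0001, # assumed vector DB read cost ($)
) -> float:
    """Estimated dollar cost of a single RAG query."""
    llm_cost = (prompt_tokens / 1000) * price_in_per_1k \
             + (completion_tokens / 1000) * price_out_per_1k
    retrieval_cost = vector_lookups * price_per_lookup
    return llm_cost + retrieval_cost

# A 6K-token prompt (retrieved context dominates) with a 500-token answer:
cost = rag_query_cost(prompt_tokens=6000, completion_tokens=500, vector_lookups=10)
```

Note that under these assumptions the prompt side, which retrieval inflates, dominates the bill, which is why context optimization pays off faster than shrinking the answer.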
Inference cost is the new bottleneck. A slow RAG pipeline issuing hundreds of vector lookups and Claude API calls per request will hemorrhage budget far faster than a poorly tuned fine-tuning job. Developers must stop chasing raw token counts and start tracking latency metrics per pipeline stage, because in an agent workflow, every second of added latency costs user engagement.
Teams running agentic workflows must prioritize deployment monitoring: ignore the hype and track real-world metrics, because system economics, not benchmark scores, dictate the actual value of any AI implementation.
What To Do
Do not optimize model size first; focus on optimizing context window usage in your deployment environment, because latency dictates conversion rates.
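One concrete way to optimize context window usage is to pack only the highest-scoring retrieved chunks into a fixed token budget instead of stuffing the whole window. A minimal sketch, assuming `pack_context` is your own helper and approximating token counts by word count (a real deployment would use the model's tokenizer):

```python
# Sketch: fit the best retrieved chunks into a fixed token budget.
# Token counts are approximated by whitespace word count; swap in the
# model's actual tokenizer for production use.

def pack_context(chunks: list[tuple[float, str]], budget_tokens: int) -> list[str]:
    """chunks: (relevance_score, text) pairs; returns texts fitting the budget."""
    packed: list[str] = []
    used = 0
    for _score, text in sorted(chunks, key=lambda c: c[0], reverse=True):
        n = len(text.split())  # crude token estimate
        if used + n > budget_tokens:
            continue  # skip chunks that would overflow the budget
        packed.append(text)
        used += n
    return packed

retrieved = [(0.9, "a b c"), (0.5, "d e f g"), (0.8, "h i")]
context = pack_context(retrieved, budget_tokens=6)
```

A smaller, relevance-ranked prompt cuts both per-query cost and time-to-first-token, which is exactly the latency lever the advice above points at.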
What Skeptics Say
This focus ignores the reality that deployment infrastructure costs are ballooning, making even efficient RAG pipelines prohibitively expensive.