Engineering & AI · 12 min read

MCP vs CLI: Why Anthropic Over-Engineered a Solved Problem

On March 11, 2026, Perplexity's CTO stood at their developer conference and said they were moving away from MCP. Cloudflare published benchmarks showing MCP consumes 244,000 tokens to describe what 1,000 tokens can express in code. And the sharpest point: Anthropic's own Claude Code — built by the same company that invented MCP — uses a bash tool, not MCP, as its primary integration mechanism. A documented look at why Anthropic over-engineered a solved problem, why this pattern keeps repeating in software history, and what it tells us about how AI tooling actually evolves.

By Abhishek Sharma · Fordel Studios

The Crack Appeared Fast

On March 11, 2026, Perplexity CTO Denis Yarats took the stage at the Ask 2026 conference and announced they were moving away from MCP. The statement was direct: tool schemas eat 72% of the context window before the agent processes a single word of user input. Authentication is clunky. Most features go unused. For Perplexity's use case, MCP was more overhead than it was worth.

This was not a fringe voice. Perplexity runs one of the highest-volume AI query pipelines in the industry. When their CTO makes an architectural decision public, it carries signal.

MCP's tool definitions consume 72% of available context window before the agent processes a single word of user input.
Denis Yarats, Perplexity CTO

Cloudflare had published findings in the same period. Their Code Mode — which lets agents write and execute code rather than call pre-defined MCP tools — cut token usage by 81% compared to describing the same API surface as MCP tool definitions. For a complex integration of 2,500 API endpoints, MCP required roughly 244,000 tokens to express what Code Mode expressed in approximately 1,000.

Two major operators, same conclusion: the protocol has a cost problem. And the cost is context.
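The arithmetic behind those figures is easy to sanity-check. A quick back-of-the-envelope sketch using only the Cloudflare numbers quoted above (the per-endpoint figure is our own division, not a published statistic):

```python
# Back-of-the-envelope check on the Cloudflare figures quoted above.
mcp_tokens = 244_000      # ~tokens to describe 2,500 endpoints as MCP tool schemas
code_mode_tokens = 1_000  # ~tokens for the same surface via Code Mode
endpoints = 2_500

per_endpoint = mcp_tokens / endpoints   # schema cost per endpoint
ratio = mcp_tokens / code_mode_tokens   # overhead factor for this integration

print(f"~{per_endpoint:.0f} tokens of schema per endpoint, {ratio:.0f}x overall")
```

Roughly a hundred tokens of schema per endpoint, paid on every conversation, before any work happens.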

···

The M×N Argument

To be fair about MCP: it was designed to solve a real problem. Before the protocol existed, every AI integration was custom. You wanted your agent to query a database — you wrote a function, described it to the model, handled the call, parsed the response. You wanted it to interact with GitHub — same process, different implementation. M models multiplied by N tools produced M×N custom integrations. Every new model meant re-implementing every tool. Every new tool meant integrating it with every model.

MCP promised to reduce this to M+N. Implement the protocol once on each side. Any compliant model talks to any compliant tool server without custom glue code. Anthropic announced the protocol on November 25, 2024. OpenAI, Google DeepMind, and Microsoft followed within months. The argument sounded reasonable. On paper it still does.
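The combinatorics are simple enough to state in a few lines. A toy sketch, where the model and tool counts are our own illustrative picks, not figures from the announcement:

```python
models, tools = 10, 50  # illustrative counts, not from the announcement

custom_integrations = models * tools    # M x N: one custom adapter per pair
protocol_integrations = models + tools  # M + N: one client per model, one server per tool

print(custom_integrations, protocol_integrations)
```

At these counts the protocol collapses 500 bespoke adapters into 60 implementations, which is why the pitch sounded so reasonable.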

The Shell Was Right There

Large language models are trained on billions of shell interactions. Stack Overflow answers that show curl commands. GitHub repositories full of Makefiles and shell scripts. Man pages. README files. Decades of Unix knowledge, densely represented in the training corpus.

The practical consequence: models already know gh, git, stripe, aws, curl, jq, psql — not superficially, but deeply. They know the flags, the output formats, the pipe patterns, the error codes. This knowledge costs zero tokens to activate. There is no schema to load. No server to start. No protocol to negotiate. You give the model shell access, and it already knows how to use every mature CLI tool in existence.

CLI tools also compose natively. The model does not just know the individual tools — it knows the patterns for chaining them. `gh issue list --json number | jq '.[].number'` is not something the model needs to be taught. It is something the model has seen thousands of times. That composability is structural, not incidental.

Any tool with a CLI is immediately accessible. Most mature tools — Stripe, GitHub, AWS, Cloudflare, Kubernetes, PostgreSQL — have excellent CLIs with complete API coverage. The initialization cost is zero.

| Dimension | CLI | MCP |
|---|---|---|
| Initialization cost | Zero — model pre-trained on shell | Schema loading on every conversation |
| Model familiarity | Deep — billions of training examples | Protocol is 16 months old |
| Composability | Native via pipes and shell operators | Requires custom orchestration |
| Auth complexity | Standard credential files, env vars | OAuth flows, token management per server |
| Deployment | Tools already installed | MCP server must be running and reachable |
| Reliability | 100% in Scalekit benchmark | 72% — 7/25 runs failed (TCP timeouts) |
···

The Numbers Don't Lie

Scalekit ran 75 head-to-head comparisons for token efficiency and a separate 25-run reliability test for MCP against GitHub's Copilot server. The results were not close.

- 32x: token overhead, MCP vs CLI, for the same simple task. Scalekit benchmark: 44,026 tokens (MCP) vs 1,365 tokens (CLI). The difference is almost entirely schema — 43 tool definitions injected into every conversation.
- 72%: MCP reliability in the production benchmark. CLI achieved 100%; MCP failed 7 of 25 runs with TCP-level connection timeouts against GitHub's Copilot MCP server.
- 81%: token reduction, Code Mode vs MCP, for complex APIs. Cloudflare benchmark: 2,500 API endpoints as MCP tools = ~244,000 tokens; via Code Mode = ~1,000 tokens.

The reliability gap matters as much as the token gap. A 72% success rate is not a production-viable reliability posture for any synchronous workflow. The failures were not application errors — they were TCP-level connection timeouts, which means the underlying transport was the failure point. This is a structural problem with long-lived MCP server connections, not a configuration issue.
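One way to see why: if each call independently succeeds 72% of the time, chained calls compound the failure. A small sketch under a simplifying independence assumption (the chain lengths are illustrative, not part of the Scalekit benchmark):

```python
p = 0.72  # per-call success rate observed in the Scalekit MCP benchmark

# Probability that a chain of sequential calls all succeed, assuming independence.
for steps in (1, 3, 5):
    print(f"{steps}-step chain completes {p ** steps:.0%} of the time")
```

A five-step workflow at 72% per call completes roughly one run in five. That is not a posture you can retry your way out of cheaply.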

The token numbers explain why Perplexity moved away. At scale, that 32x difference compounds into meaningful inference cost and, more importantly, meaningful reduction in the context available for actual task work.
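At volume, the token gap becomes a line item. A rough cost sketch, where the per-token price and request volume are our own illustrative assumptions, not Scalekit figures:

```python
mcp_tokens, cli_tokens = 44_026, 1_365  # Scalekit: same task, MCP vs CLI
price_per_mtok = 3.00                   # assumed $ per million input tokens
requests_per_day = 100_000              # assumed agent traffic

def daily_cost(tokens_per_request: int) -> float:
    """Daily input-token spend for one request shape at the assumed price."""
    return tokens_per_request * requests_per_day * price_per_mtok / 1_000_000

print(f"MCP ${daily_cost(mcp_tokens):,.0f}/day vs CLI ${daily_cost(cli_tokens):,.0f}/day")
```

And the dollars are the smaller half of the bill: every schema token is also context the agent no longer has for the task itself.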

···

CORBA, SOAP, and Now MCP

This pattern has a history. In the 1990s, a committee of enterprise software companies designed CORBA — the Common Object Request Broker Architecture — to solve distributed object communication. The problem they identified was real: heterogeneous systems needed to call each other's methods across language and network boundaries. The solution they built was elaborate. CORBA's object adapter API required 200+ lines of interface definitions for functionality that needed approximately 30 lines. ACM Queue documented this in 2006, noting the ceremony-to-function ratio as a primary reason for CORBA's eventual abandonment.

SOAP repeated the pattern in the early 2000s. Microsoft's answer to web services: XML envelopes, WSDL interface description files, strict schemas, code generation pipelines. The problem SOAP addressed — cross-system method invocation over HTTP — was genuine. The solution was ceremonial.

Roy Fielding published his PhD dissertation in 2000. It described REST: use HTTP as it was designed, treat resources as URLs, use verbs as operations. HTTP was already there. REST won.

Three Honest Hypotheses

Why did MCP end up this way? Three hypotheses, none of them flattering, all of them plausible.

Why Anthropic Built a Protocol Instead of Using the Shell

01
The Unix gap

The engineers who designed MCP came predominantly from ML and research backgrounds, not systems and Unix backgrounds. They did not think instinctively in terms of shell pipelines, tool composition, and the Unix philosophy of small tools that do one thing well. They thought in terms of APIs, schemas, and protocols — the vocabulary of the environments they knew. The shell was not invisible to them; it simply was not their first instinct for the integration layer.

02
Protocol as moat

A proprietary protocol, even an "open" one, creates ecosystem gravity. If every tool implements MCP for Claude, switching to another model mid-workflow introduces friction — the new model needs MCP client support. The Linux Foundation donation in December 2025 neutralised this concern in practice, but the incentive existed at design time. A protocol with Claude as the primary client has different strategic value than a bash tool that works with any model.

03
The M×N framing was real, but the solution was wrong

The combinatorial integration explosion problem that MCP was designed to solve is genuine. The mistake was in the solution: build a new protocol layer instead of asking what primitive already solves this. The answer was the shell. Any model with bash access can call any CLI tool. The M×N problem dissolves not through a new protocol but through a shared execution environment that all models can already reason about.

···

Anthropic's Own Product Proves the Point

Claude Code is Anthropic's flagship developer product. It is the company's most visible bet on agentic AI. It ships with a bash tool — direct shell access — as its primary mechanism for interacting with the developer's environment.

Claude Code can run `gh pr create`, `stripe customers list`, `git log --oneline`, `kubectl get pods`. It does all of this without MCP servers, without JSON-RPC, without schema loading, without protocol negotiation. It opens a shell, runs commands, reads output, and reasons about what to do next.
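The mechanism is architecturally tiny. A minimal sketch of what a bash tool amounts to — our own illustration of the pattern, not Claude Code's actual implementation:

```python
import subprocess

def bash_tool(command: str, timeout: int = 60) -> str:
    """Run one shell command and return what the model sees: exit code plus output."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return f"(exit {result.returncode})\n{result.stdout}{result.stderr}"

# The agent loop is just: model emits a command, tool returns output,
# model reasons about it, repeat.
print(bash_tool("echo hello"))
```

No schema registry, no transport negotiation — the entire integration surface is a string in and a string out.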

This is not a minor implementation detail. This is the company that invented MCP, in their most-used developer product, making an explicit architectural choice to use the shell instead of their own protocol.

The company that invented MCP built their flagship developer product on a bash tool.
Fordel Studios

The most charitable interpretation is that MCP and bash serve different use cases, and Anthropic chose the right tool for each. That may be correct. The less charitable interpretation is that the engineers building Claude Code — who are closer to the daily reality of agent tool use than the team that designed MCP — made a pragmatic judgment that the protocol they inherited was not the right abstraction for their product.

···

Where MCP Survives and Where It Doesn't

MCP has genuine strengths in specific scenarios. Multi-tenant SaaS is the clearest case: when an agent needs to act on behalf of different users, each with their own credentials and access scopes, MCP's OAuth-per-user model is structurally correct. The CLI alternative — switching credential files per user — is workable but clunky at scale.

Dynamic tool discovery is another legitimate use case. If an agent needs to discover new tools at runtime without a redeploy, MCP's discovery mechanism has no obvious CLI equivalent. APIs with no CLI coverage are a third case where MCP may be the only practical option.

Where MCP fails: production agent pipelines where token cost compounds at volume, latency-sensitive workflows where server startup and schema loading add measurable overhead, and any deployment with a fixed, known toolset where the dynamic discovery benefit does not apply.

| Scenario | Better Approach | Reason |
|---|---|---|
| Single-agent, known tools with CLI | CLI | Zero initialization cost, full model familiarity |
| Multi-tenant SaaS, per-user auth | MCP | OAuth-per-user is structurally correct |
| Latency-sensitive pipeline | CLI | No server startup, no schema loading |
| Fixed toolset, no new tools at runtime | CLI | Dynamic discovery adds cost with no benefit |
| API with no CLI coverage | MCP or direct API call | No CLI alternative exists |
| Dynamic tool discovery | MCP | Protocol handles this; CLI does not |
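These scenarios reduce to a short decision rule. A toy sketch of how the choice might be encoded — the function name and flags are ours, purely illustrative:

```python
def choose_integration(has_cli: bool, per_user_oauth: bool, dynamic_discovery: bool) -> str:
    """Toy decision rule mirroring the scenarios above; CLI is the default."""
    if per_user_oauth:
        return "MCP"              # multi-tenant SaaS, per-user scopes
    if dynamic_discovery:
        return "MCP"              # toolset unknown until runtime
    if not has_cli:
        return "direct API call"  # no CLI surface exists
    return "CLI"                  # everything else

print(choose_integration(has_cli=True, per_user_oauth=False, dynamic_discovery=False))
```

The order of the checks is the point: MCP has to claim an exception before the shell loses by default.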

The security picture adds weight to the CLI side. Docker's analysis of open-source MCP servers (Docker blog: MCP Security Issues Threatening AI Infrastructure) found that 43% have command injection vulnerabilities and 43% have flawed OAuth authentication flows. These are not edge cases — they represent structural problems with how MCP server authors are handling two of the hardest security problems in software. CLI security is not perfect, but its threat model is well-understood and its failure modes are documented by decades of practice.

How We Actually Build

At Fordel, CLI is the default. The burden of proof is on MCP, not on the shell. If a tool has a mature CLI — GitHub, Stripe, AWS, Cloudflare, PostgreSQL, Kubernetes — we use the CLI. Zero initialization cost, full model familiarity, high reliability. The model already knows these tools. We do not need to teach it.

MCP earns its way in on two conditions, both of which are genuine exceptions rather than defaults. First: multi-tenant SaaS where the agent acts on behalf of distinct end users, each with their own OAuth scope — at that point, CLI credential-switching becomes clunky enough that MCP's per-user auth model is structurally correct. Second: a system that has no CLI at all, where neither a shell command nor a direct API call is practical. Both cases exist. Neither is common.

Every other integration decision starts and ends with the shell. Not because it is familiar, but because it is the simplest mechanism that meets the security and reliability requirements. Simplest wins. It always has.

The engineers building the best AI agents in 2026 know their Unix tools as well as their LLM APIs. That is not a coincidence.
