Prompt Coupling is the New Vendor Lock-in
What Happened
A new paper formalizes prompt coupling — the invisible switching cost that makes your prompts inseparable from specific LLMs. Format-level dependencies cause up to 78.3pp accuracy variation. No existing tool addresses it.
Our Take
Prompts are not portable. A formatting change as minor as removing colons from field delimiters causes up to 78.3 percentage points of accuracy swing on LLaMA-2-13B. The same prompt's performance varies by as much as 300% on GPT-4-32k depending on whether it is rendered as JSON or plain text. Abhishek Sharma's paper formalizes this as prompt coupling, borrowing the 1974 software engineering coupling taxonomy associated with Constantine and applying it to LLM-based systems.
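To make the encoding-level coupling concrete, here is a minimal sketch (field names and values are illustrative, not from the paper) of one logical prompt rendered in the two surface forms the paper contrasts: colon-delimited fields versus JSON.

```python
import json

# One logical prompt, two surface encodings. The paper's claim is that a
# model can score very differently on these, so the encoding should be
# treated as a model-specific compilation target, not an afterthought.
fields = {
    "task": "classify sentiment",
    "text": "Great battery life",
    "labels": "pos, neg",
}

def to_colon_format(fields: dict) -> str:
    # "key: value" lines -- the delimiter style whose removal drove the
    # up-to-78.3pp accuracy swing reported for LLaMA-2-13B
    return "\n".join(f"{k}: {v}" for k, v in fields.items())

def to_json_format(fields: dict) -> str:
    # the same content as a JSON object
    return json.dumps(fields, indent=2)

print(to_colon_format(fields))
print(to_json_format(fields))
```

Both strings carry identical content; only the format differs, which is exactly the axis along which the reported accuracy swings occur.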
The practical implication: every agentic tool you use (Claude Code, Cursor, Aider) deepens coupling at four simultaneous layers — encoding, interpretation, system wrapping, and execution. Analysis of Aider's 134 production model configurations confirms the pattern. None of the 10+ prompt optimization tools surveyed (DSPy, Prompty, PromptLayer) performs format-level cross-model compilation. The gap is explicit and unaddressed.
The paper introduces promptc — a transparent HTTP proxy that compiles prompts to model-specific formats without code changes to existing tools. With 75% of enterprises running multi-model strategies by mid-2026 (Gartner), prompt coupling is moving from theoretical to a real switching cost.
What To Do
Use DSPy for content optimization, but account for format coupling separately: before picking a primary model, run identical tasks on both Claude and GPT-4 with each model's native formatting. Switching later costs more than you think.
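The evaluation step above can be sketched as a small A/B harness. Everything here is a hypothetical scaffold: `call_claude` and `call_gpt4` stand in for your real API clients, and the "native" formats (XML-style tags for Claude, JSON for GPT-4) are common assumptions, not guarantees.

```python
import json

# Placeholder model clients -- swap in your actual SDK calls.
def call_claude(prompt: str) -> str:
    return "pos"  # stub response for illustration

def call_gpt4(prompt: str) -> str:
    return "pos"  # stub response for illustration

# Identical tasks with gold labels, run through both models.
CASES = [
    ({"text": "Great battery life", "labels": "pos|neg"}, "pos"),
]

def native_prompt_claude(case: dict) -> str:
    # XML-style tags, often suggested as Claude-friendly (an assumption here)
    return f"<text>{case['text']}</text>\n<labels>{case['labels']}</labels>"

def native_prompt_gpt4(case: dict) -> str:
    # JSON rendering, often suggested as GPT-friendly (an assumption here)
    return json.dumps(case)

def score(call, fmt) -> float:
    # Fraction of cases where the model's answer matches the gold label.
    hits = sum(call(fmt(case)) == gold for case, gold in CASES)
    return hits / len(CASES)

print("claude:", score(call_claude, native_prompt_claude))
print("gpt4:  ", score(call_gpt4, native_prompt_gpt4))
```

Comparing scores produced with model-native formatting, rather than one shared prompt, is what separates format coupling from genuine capability differences.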