The Invisible Lock-In
Every engineering team building on LLMs has hit this wall: prompts carefully tuned for Claude break when you switch to GPT-4. Not because the instructions are wrong — because the structural format is wrong. Claude excels with XML tags for section delineation. GPT-4 prefers Markdown headers. Gemini has its own preferences. What looks like a prompting skill issue is actually a structural coupling problem that creates real vendor lock-in.
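To make the coupling concrete, here is the same instruction presented in the two structural styles described above (the prompt text is an invented example, not taken from any vendor's documentation):

```python
import re

# The same semantic instruction, structured two ways.
# Neither version changes WHAT is asked, only HOW it is presented.

claude_style = """<instructions>
Summarize the document below in three bullet points.
</instructions>
<document>
{text}
</document>"""

gpt_style = """## Instructions
Summarize the document below in three bullet points.

## Document
{text}"""

def semantic_content(prompt: str) -> str:
    """Strip structural markup, leaving only the semantic payload."""
    no_xml = re.sub(r"</?\w+>", "", prompt)                 # drop XML tags
    no_md = re.sub(r"^#+\s*\w+$", "", no_xml, flags=re.M)   # drop Markdown headers
    return " ".join(no_md.split())
```

Stripping the markup from both versions yields an identical string, which is exactly the point: the semantic content is the same, and only the structural presentation differs per model.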
This is not a minor inconvenience. Research from Sclar et al. (ICLR 2024) demonstrated that removing colons from a prompt template swings LLaMA-2-13B accuracy from 82.6% to 4.3% — a 78 percentage point drop from punctuation alone. Separate research from He et al. found that the optimal prompt format for one model family overlaps less than 20% with the optimal format for another. Your prompts are not portable. They are coupled to their target model at a structural level.
The Pricing Dimension
This coupling problem has a direct cost dimension. LLM pricing spans a 200x range, from Gemini Flash-Lite at $0.075 per million input tokens to Claude Opus 4 at $15 per million. Teams running multi-model deployments route simple tasks to cheap models and complex tasks to expensive ones. But every prompt tuned for one model must be re-optimized for another. The engineering debt accumulates silently until someone proposes switching providers and discovers that "switching" means rewriting hundreds of prompts.
| Model | Price per 1M Input Tokens | Preferred Format | Switching Cost |
|---|---|---|---|
| Claude Opus 4 | $15.00 | XML tags, structured sections | High — XML-specific prompts |
| GPT-4o | $2.50 | Markdown, natural language | Medium — Markdown restructuring |
| Gemini 2.5 Pro | $1.25-$2.50 | Mixed, fewer structural preferences | Low-medium |
| Gemini Flash-Lite | $0.075 | Simple, direct instructions | Low — but capabilities limited |
| DeepSeek V3 | $0.27 | Markdown, structured | Medium |
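The routing pattern described above can be sketched in a few lines. The prices mirror the table; the complexity score and its thresholds are invented placeholders (real systems derive complexity from heuristics or a classifier):

```python
# Price per 1M input tokens, mirroring the table above.
PRICE_PER_1M = {
    "claude-opus-4": 15.00,
    "gpt-4o": 2.50,
    "gemini-2.5-pro": 1.25,
    "gemini-flash-lite": 0.075,
    "deepseek-v3": 0.27,
}

def route(task_complexity: float) -> str:
    """Route simple tasks to cheap models, hard tasks to expensive ones.

    task_complexity is an invented 0-1 score for illustration.
    """
    if task_complexity < 0.3:
        return "gemini-flash-lite"
    if task_complexity < 0.7:
        return "deepseek-v3"
    return "claude-opus-4"

def cost_of(model: str, input_tokens: int) -> float:
    """Input-token cost in dollars for a single request."""
    return PRICE_PER_1M[model] * input_tokens / 1_000_000
```

The catch is the one described above: each routing branch implies a differently structured prompt, so every new route silently multiplies prompt-maintenance cost.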
How the Coupling Manifests
The coupling is visible in production systems if you know where to look. Aider, the popular AI coding assistant, maintains 313 model-specific configurations in a 2,718-line YAML file. The instructions are often contradictory across models — what helps Claude hurts GPT-4, and vice versa. Claude Code only works with Claude by default; third-party developers have built proxy layers and monkey-patches to make it work with other models. Cursor's official guidance when a prompt fails? "Switch models and try again."
These are not bugs. They are symptoms of structural prompt coupling — the fact that each model has a distinct "behavioral grammar" that determines how it interprets structural elements like delimiters, section ordering, whitespace, and markup.
Formalizing the Problem: Coupling Taxonomy
Applying Larry Constantine's 1974 software coupling taxonomy to prompt engineering reveals six levels of prompt coupling, from most to least portable:
The Six Levels of Prompt Coupling
1. Data coupling: the prompt passes only the necessary data to the model, with no structural assumptions. Works across any model. Example: "Translate this to French: [text]"
2. Stamp coupling: the prompt uses structured data formats (JSON, XML) that the model must parse. Some models handle nested structures better than others.
3. Control coupling: the prompt includes instructions that control model behavior (system prompts, role assignments, behavioral constraints). These are moderately model-specific.
4. External coupling: the prompt references external context (tool schemas, API definitions, MCP resources) that the model must interpret. Highly sensitive to model capabilities.
5. Common coupling: multiple prompts share global state or context formatted for a specific model. Changing one prompt requires updating all related prompts.
6. Content coupling: the prompt relies on model-specific internal behaviors, exploiting known biases, specific tokenization patterns, or undocumented formatting preferences. Maximum vendor lock-in.
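The two ends of the spectrum, as concrete prompt strings (both invented for illustration; the tags and phrasing are not documented behavior of any model):

```python
# Data coupling: only the data varies; any model can run this.
data_coupled = "Translate this to French: {text}"

# Content coupling: the prompt leans on one model's structural quirks
# (XML sections, a model-specific scratchpad convention) and will
# degrade or break on any other model.
content_coupled = """<thinking_instructions>
Think step by step inside <scratchpad> tags before answering.
</thinking_instructions>
<task>{text}</task>"""

def portability_risk(prompt: str) -> str:
    """Crude heuristic: structural markup in the prompt signals coupling risk."""
    return "high" if "<" in prompt and ">" in prompt else "low"
```

A heuristic this crude obviously misses most of the taxonomy's middle levels, but it captures the core intuition: the more structure a prompt carries, the more model-specific its interpretation becomes.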
Existing Solutions and Their Gaps
Several tools address parts of this problem, but none solve structural coupling directly:
| Tool | What It Does | Does It Solve Structural Coupling? |
|---|---|---|
| DSPy | Optimizes prompt content via compilation | No — optimizes what you say, not how you present it |
| Guidance / Outlines | Constrains output format | No — constrains outputs, not inputs |
| LMQL | Query language for LLM interaction | Partial — abstracts some format details |
| PromptLayer | Prompt versioning and A/B testing | No — tracks prompts but does not transform them |
| LiteLLM | Unified API across providers | API-level only — does not transform prompt structure |
The gap is clear: tools either optimize prompt content (what you say) or unify the API layer (how you call the model), but none optimize prompt structure (how you present instructions to each model).
promptc: A Structural Prompt Compiler
To address this gap, we built promptc — a transparent HTTP proxy that intercepts LLM API calls and rewrites prompt structure based on the target model's preferences. It sits between your client and the API endpoint, adapting structural presentation without changing semantic content.
The architecture uses a two-pass pipeline. The first pass is deterministic: structural transforms based on model profiles — XML to Markdown conversion, section reordering, delimiter changes. The second pass (optional) uses a local Ollama model to rewrite prompts to match target behavioral patterns while preserving intent. SQLite-based caching ensures that repeated prompts are transformed once and served from cache.
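A minimal sketch of the deterministic first pass, assuming a profile table of per-model format preferences. The profile schema, model names, and tag-to-header rule here are illustrative, not promptc's actual internals:

```python
import re

# Illustrative model profiles; the real profile schema may differ.
PROFILES = {
    "claude": {"prefers": "xml"},
    "gpt-4o": {"prefers": "markdown"},
}

def xml_to_markdown(prompt: str) -> str:
    """Rewrite <section>...</section> blocks as '## Section' headers."""
    prompt = re.sub(r"<(\w+)>", lambda m: f"## {m.group(1).title()}", prompt)
    return re.sub(r"</\w+>\n?", "", prompt)

def transform(prompt: str, target_model: str) -> str:
    """Deterministic pass: adapt structure to the target model's profile."""
    profile = PROFILES.get(target_model, {})
    if profile.get("prefers") == "markdown" and "<" in prompt:
        return xml_to_markdown(prompt)
    return prompt  # already in the target's preferred shape
```

For example, `transform("<context>\nfacts\n</context>\nQ?", "gpt-4o")` returns `"## Context\nfacts\nQ?"`, while the same call with `"claude"` as the target leaves the prompt untouched.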
- Transparent HTTP proxy — drop-in replacement, no code changes required
- Model profile system — extensible format preferences per model family
- Two-pass pipeline — deterministic structural transform + optional semantic rewrite
- SQLite caching — transform once, serve from cache for repeated prompts
- Multi-turn session tracking — maintains context across conversation turns
- Web dashboard — monitor transformation activity and cache hit rates
- MIT licensed, open source: github.com/shakecodeslikecray/promptc
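The caching behavior in the feature list can be sketched as follows. The table schema and key derivation are assumptions for illustration, not promptc's actual schema:

```python
import hashlib
import sqlite3

# promptc persists its cache to disk; in-memory keeps this sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, out TEXT)")

def cache_key(prompt: str, model: str) -> str:
    """Key on both prompt and target model: the same prompt transforms
    differently per model."""
    return hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()

def cached_transform(prompt: str, model: str, transform) -> str:
    """Transform once, serve repeats from cache."""
    key = cache_key(prompt, model)
    row = conn.execute("SELECT out FROM cache WHERE key = ?", (key,)).fetchone()
    if row:
        return row[0]  # cache hit: skip the transform entirely
    out = transform(prompt, model)
    conn.execute("INSERT INTO cache (key, out) VALUES (?, ?)", (key, out))
    return out
```

Keying on the (prompt, model) pair rather than the prompt alone matters because a multi-model deployment routes the same prompt to different targets, each with its own transformed form.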
The paper accompanying this work — "Prompt Coupling: Formalizing Cross-Model Prompt Dependencies in Large Language Model Systems" (Sharma, 2026) — applies Constantine's coupling taxonomy formally to prompt engineering and proposes structural decoupling as an architectural pattern for multi-model AI systems.
Practical Implications
For engineering teams building production AI systems, the implications are immediate. If you are writing prompts that use model-specific formatting (XML tags for Claude, Markdown for GPT-4), you are accumulating switching costs whether you realize it or not. Every new prompt deepens the lock-in. Every prompt optimization for a specific model makes portability harder.
The organizations that will navigate the next wave of model releases — and the inevitable pricing changes, capability shifts, and deprecations — are the ones that treat prompt structure as an abstraction layer, not a hardcoded implementation detail.
> Your prompts are not portable. They are coupled to their target model at a structural level. The question is not whether this matters — it is whether you discover it during a planned migration or during an emergency when your provider changes pricing, deprecates a model, or experiences an outage.