Engineering & AI · 12 min read

The Vendor Lock-In Hidden in Your AI Prompts

Your prompts are structurally coupled to specific models. When Claude prefers XML tags and GPT-4 prefers Markdown, switching providers means rewriting every prompt in your system. This is vendor lock-in hiding in plain text — and research shows format alone can swing accuracy by 78 percentage points.

Author: Abhishek Sharma · Fordel Studios

The Invisible Lock-In

Every engineering team building on LLMs has hit this wall: prompts carefully tuned for Claude break when you switch to GPT-4. Not because the instructions are wrong — because the structural format is wrong. Claude excels with XML tags for section delineation. GPT-4 prefers Markdown headers. Gemini has its own preferences. What looks like a prompting skill issue is actually a structural coupling problem that creates real vendor lock-in.

This is not a minor inconvenience. Research from Sclar et al. (ICLR 2024) demonstrated that removing colons from a prompt template swings LLaMA-2-13B accuracy from 82.6% to 4.3% — a 78.3 percentage point drop from punctuation alone. Separate research from He et al. found that the optimal prompt format for one model family overlaps less than 20% with the optimal format for another. Your prompts are not portable. They are coupled to their target model at a structural level.
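The structural coupling in question can be sketched as two renderers of the same semantic content, one in each model's preferred dialect. The section names and renderer functions below are illustrative, not drawn from the cited research:

```python
# Hypothetical sketch: identical instructions, two structural "dialects".

def render_xml(sections: dict[str, str]) -> str:
    """Claude-style prompt: sections delimited with XML tags."""
    return "\n".join(f"<{name}>\n{body}\n</{name}>" for name, body in sections.items())

def render_markdown(sections: dict[str, str]) -> str:
    """GPT-style prompt: sections delimited with Markdown headers."""
    return "\n".join(f"## {name}\n{body}" for name, body in sections.items())

sections = {
    "task": "Summarize the attached report in three bullet points.",
    "constraints": "Do not exceed 50 words per bullet.",
}

print(render_xml(sections))       # what you would send to Claude
print(render_markdown(sections))  # what you would send to GPT-4
```

The semantic content is identical; only the presentation changes. The point of the research above is that this presentation difference alone can move accuracy dramatically.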

78.3 pp — accuracy variation from prompt format changes alone (Sclar et al., ICLR 2024, LLaMA-2-13B)
···

The Pricing Dimension

This coupling problem has a direct cost dimension. LLM pricing spans a 200x range — from Gemini Flash-Lite at $0.075 per million input tokens to Claude Opus 4 at $15 per million. Teams running multi-model deployments route simple tasks to cheap models and complex tasks to expensive ones. But every prompt tuned for one model must be re-optimized for another. The engineering debt accumulates silently until someone proposes switching providers and discovers that "switching" means rewriting hundreds of prompts.

| Model | Price per 1M Input Tokens | Preferred Format | Switching Cost |
|---|---|---|---|
| Claude Opus 4 | $15.00 | XML tags, structured sections | High — XML-specific prompts |
| GPT-4o | $2.50 | Markdown, natural language | Medium — Markdown restructuring |
| Gemini 2.5 Pro | $1.25–$2.50 | Mixed, fewer structural preferences | Low–medium |
| Gemini Flash-Lite | $0.075 | Simple, direct instructions | Low — but capabilities limited |
| DeepSeek V3 | $0.27 | Markdown, structured | Medium |
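A back-of-envelope sketch makes the spread concrete. The monthly volume below is illustrative; the per-token prices are the input prices from the table above:

```python
# Input-token prices from the table, USD per 1M tokens.
PRICE_PER_1M = {
    "claude-opus-4": 15.00,
    "gpt-4o": 2.50,
    "deepseek-v3": 0.27,
    "gemini-flash-lite": 0.075,
}

def monthly_cost(model: str, tokens_per_month: int) -> float:
    """Input-token cost in USD for a given monthly volume."""
    return PRICE_PER_1M[model] * tokens_per_month / 1_000_000

# A hypothetical 500M-token/month workload spans the full 200x range:
for model, price in PRICE_PER_1M.items():
    print(f"{model}: ${monthly_cost(model, 500_000_000):,.2f}/month")
```

At that volume the same traffic costs $7,500/month on Claude Opus 4 and $37.50/month on Gemini Flash-Lite — which is exactly why teams want routing, and exactly where the re-optimization cost of structurally coupled prompts bites.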

How the Coupling Manifests

The coupling is visible in production systems if you know where to look. Aider, the popular AI coding assistant, maintains 313 model-specific configurations in a 2,718-line YAML file. The instructions are often contradictory across models — what helps Claude hurts GPT-4, and vice versa. Claude Code only works with Claude by default; third-party developers have built proxy layers and monkey-patches to make it work with other models. Cursor's official guidance when a prompt fails? "Switch models and try again."

These are not bugs. They are symptoms of structural prompt coupling — the fact that each model has a distinct "behavioral grammar" that determines how it interprets structural elements like delimiters, section ordering, whitespace, and markup.

Formalizing the Problem: Coupling Taxonomy

Applying Larry Constantine's 1974 software coupling taxonomy to prompt engineering reveals six levels of prompt coupling, from most to least portable:

The Six Levels of Prompt Coupling

1. Data Coupling (most portable): Prompt passes only necessary data to the model. No structural assumptions. Works across any model. Example: "Translate this to French: [text]"

2. Stamp Coupling: Prompt uses structured data formats (JSON, XML) that the model must parse. Some models handle nested structures better than others.

3. Control Coupling: Prompt includes instructions that control model behavior — system prompts, role assignments, behavioral constraints. These are moderately model-specific.

4. External Coupling: Prompt references external context (tool schemas, API definitions, MCP resources) that the model must interpret. Highly sensitive to model capabilities.

5. Common Coupling: Multiple prompts share global state or context that is formatted for a specific model. Changing one prompt requires updating all related prompts.

6. Content Coupling (least portable): Prompt relies on model-specific internal behaviors — exploiting known biases, specific tokenization patterns, or undocumented formatting preferences. Maximum vendor lock-in.
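The two ends of the spectrum can be sketched with hypothetical prompt builders. The function names and the XML framing below are illustrative, not from the taxonomy itself:

```python
# Data coupling: only the data varies; no structural assumptions.
# This prompt works unchanged on any model.
def data_coupled(text: str) -> str:
    return f"Translate this to French: {text}"

# Toward the content-coupled end: the prompt bakes in one model's
# structural preferences (here, Claude-style XML section tags),
# so moving it to another model means rewriting its structure.
def structurally_coupled(text: str) -> str:
    return (
        "<instructions>\nTranslate the text in <source> to French.\n</instructions>\n"
        f"<source>\n{text}\n</source>"
    )

print(data_coupled("Hello"))
print(structurally_coupled("Hello"))
```

The first builder carries no switching cost; the second must be restructured for every model family it targets.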

···

Existing Solutions and Their Gaps

Several tools address parts of this problem, but none solve structural coupling directly:

| Tool | What It Does | Does It Solve Structural Coupling? |
|---|---|---|
| DSPy | Optimizes prompt content via compilation | No — optimizes what you say, not how you present it |
| Guidance / Outlines | Constrains output format | No — constrains outputs, not inputs |
| LMQL | Query language for LLM interaction | Partial — abstracts some format details |
| PromptLayer | Prompt versioning and A/B testing | No — tracks prompts but does not transform them |
| LiteLLM | Unified API across providers | API-level only — does not transform prompt structure |

The gap is clear: tools either optimize prompt content (what you say) or unify the API layer (how you call the model), but none optimize prompt structure (how you present instructions to each model).

promptc: A Structural Prompt Compiler

To address this gap, we built promptc — a transparent HTTP proxy that intercepts LLM API calls and rewrites prompt structure based on the target model's preferences. It sits between your client and the API endpoint, adapting structural presentation without changing semantic content.

The architecture uses a two-pass pipeline. The first pass is deterministic: structural transforms based on model profiles — XML to Markdown conversion, section reordering, delimiter changes. The second pass (optional) uses a local Ollama model to rewrite prompts to match target behavioral patterns while preserving intent. SQLite-based caching ensures that repeated prompts are transformed once and served from cache.
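A minimal sketch of the kind of deterministic rule the first pass applies — here, a regex-based XML-to-Markdown conversion. promptc's actual model profiles and transform rules are internals not shown here; this only illustrates the idea:

```python
import re

# Matches a Claude-style section: <name>...</name> with optional
# surrounding newlines, where the closing tag must match the opening one.
XML_SECTION = re.compile(r"<(\w+)>\n?(.*?)\n?</\1>", re.DOTALL)

def xml_to_markdown(prompt: str) -> str:
    """Rewrite <section>...</section> blocks as '## section' blocks,
    leaving the section's semantic content untouched."""
    return XML_SECTION.sub(lambda m: f"## {m.group(1)}\n{m.group(2)}", prompt)

claude_prompt = "<task>\nSummarize the report.\n</task>\n<rules>\nBe concise.\n</rules>"
print(xml_to_markdown(claude_prompt))
```

Because this pass is deterministic and purely structural, it is cheap, cacheable, and safe to apply without a model in the loop — the optional second pass handles the cases where structure and semantics interact.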

promptc Key Features
  • Transparent HTTP proxy — drop-in replacement, no code changes required
  • Model profile system — extensible format preferences per model family
  • Two-pass pipeline — deterministic structural transform + optional semantic rewrite
  • SQLite caching — transform once, serve from cache for repeated prompts
  • Multi-turn session tracking — maintains context across conversation turns
  • Web dashboard — monitor transformation activity and cache hit rates
  • MIT licensed, open source: github.com/shakecodeslikecray/promptc
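The caching idea behind "transform once, serve from cache" can be sketched with stdlib sqlite3. The table schema and key derivation below are assumptions for illustration, not promptc's actual internals:

```python
import hashlib
import sqlite3

db = sqlite3.connect(":memory:")  # promptc persists this; in-memory here for the sketch
db.execute("CREATE TABLE cache (key TEXT PRIMARY KEY, transformed TEXT)")

def cached_transform(model: str, prompt: str, transform) -> str:
    """Key the cache on (target model, original prompt) so each
    distinct prompt is transformed at most once per model."""
    key = hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()
    row = db.execute("SELECT transformed FROM cache WHERE key = ?", (key,)).fetchone()
    if row:
        return row[0]                                  # cache hit
    result = transform(prompt)                         # cache miss: run the pipeline
    db.execute("INSERT INTO cache VALUES (?, ?)", (key, result))
    return result

calls = []
def fake_pipeline(p):
    calls.append(p)        # count how often the transform actually runs
    return p.upper()

cached_transform("gpt-4o", "hello", fake_pipeline)
cached_transform("gpt-4o", "hello", fake_pipeline)
print(len(calls))  # → 1: second call was served from cache
```

This matters most for the optional second pass, where the transform involves a local model call rather than a cheap regex rewrite.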

The paper accompanying this work — "Prompt Coupling: Formalizing Cross-Model Prompt Dependencies in Large Language Model Systems" (Sharma, 2026) — applies Constantine's coupling taxonomy formally to prompt engineering and proposes structural decoupling as an architectural pattern for multi-model AI systems.

Practical Implications

For engineering teams building production AI systems, the implications are immediate. If you are writing prompts that use model-specific formatting (XML tags for Claude, Markdown for GPT-4), you are accumulating switching costs whether you realize it or not. Every new prompt deepens the lock-in. Every prompt optimization for a specific model makes portability harder.

The organizations that will navigate the next wave of model releases — and the inevitable pricing changes, capability shifts, and deprecations — are the ones that treat prompt structure as an abstraction layer, not a hardcoded implementation detail.

Your prompts are not portable. They are coupled to their target model at a structural level. The question is not whether this matters — it is whether you discover it during a planned migration or during an emergency when your provider changes pricing, deprecates a model, or experiences an outage.
Abhishek Sharma, Fordel Studios