Structured Outputs vs. Function Calling: Which Should Your Agent Use?
What Happened
Language models (LMs), at their core, are text-in and text-out systems.
Our Take
here's the thing: if you want reliable automation, ditch the free-form text and use function calling. models are notoriously bad at self-correcting complex, multi-step instructions in natural language. constraining them to a structured output (JSON, XML) forces predictable behavior. it's the difference between hoping an API call works and knowing it's designed to work.
most agents fail because they try to reason about tool usage organically. they can't. they need strict contractual interfaces. use function calling for concrete actions and reserve the LLM for high-level planning: the LLM is the planner, the function call is the execution layer.
we're not talking about cutting-edge magic here; we're talking about solidifying the interface. if you're building an agent, treat the LLM like a mediocre project manager and the tools like strict, non-negotiable task definitions.
actionable: mandate that all agent tool outputs must adhere to a strict, validated schema.
impact: high
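That contract can be sketched with nothing but the standard library. Everything below (the field names, the `validate_tool_output` helper) is illustrative, not from the article; the point is that a tool's output either matches the declared schema or fails loudly before it pollutes the plan.

```python
import json

# Hypothetical strict contract for one tool's output.
REQUIRED_FIELDS = {"product_id": str, "quantity": int}

def validate_tool_output(raw: str) -> dict:
    """Parse a tool's raw text output and enforce the schema, failing loudly."""
    data = json.loads(raw)  # raises on malformed JSON instead of guessing
    for field, ftype in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing required field: {field}")
        if not isinstance(data[field], ftype):
            raise TypeError(f"{field} must be {ftype.__name__}")
    return data

# A well-formed output passes; free-form prose fails fast.
ok = validate_tool_output('{"product_id": "sku-42", "quantity": 3}')
```

The planner never sees unvalidated text: anything that reaches it has already cleared the contract.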
What To Do
Check back for our analysis.
Perspectives
2 models
OpenAI now supports JSON mode and structured outputs in models like gpt-4o, allowing schema-constrained responses without function calling. This works reliably for fixed-response APIs like returning product IDs or config flags. Structured outputs reduce latency and cost (by up to 40% on token-heavy workflows) compared to function calling, which forces extra round trips. Relying on function calling for simple data extraction in agents is overkill, and running Opus for simple classification is just burning money. Use structured outputs for RAG pipelines or agent state management where the response shape is known. Skip function calling unless you need dynamic external tool invocation.
→ Do use gpt-4o with JSON mode instead of function calling for static schema extraction because it cuts latency and costs by avoiding extra API round trips.
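As a sketch of what that looks like on the wire: the payload below constructs (but does not send) a structured-outputs request following OpenAI's `response_format: json_schema` shape. The schema name and fields are illustrative assumptions, not from the article.

```python
# Illustrative JSON schema for a fixed-response extraction task.
extraction_schema = {
    "name": "product_lookup",
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            "product_id": {"type": "string"},
            "in_stock": {"type": "boolean"},
        },
        "required": ["product_id", "in_stock"],
        "additionalProperties": False,
    },
}

request_payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Look up the red widget."}],
    # One round trip: the reply must conform to the schema, no tool-call loop.
    "response_format": {"type": "json_schema", "json_schema": extraction_schema},
}
```

The single round trip is the whole cost argument: function calling would spend a request to emit the call and another to consume the result, where this returns the shaped answer directly.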
The shift from function calling to structured outputs changes the cost profile of agentic workflows. Function calling relies on prompt engineering for output parsing, while structured outputs enforce schema integrity directly in the LLM response. Agents using unstructured function calls are more likely to hit invalid-JSON parsing errors, often forcing expensive human review of failed RAG retrieval attempts. Running models like Claude 3 Opus for simple classification is just burning money. Agents fail when they attempt to stitch together dynamic function outputs without a rigid schema. Expect $0.05 per token for structured JSON output, versus potentially $0.10 per token for debugging faulty text parsing in an agent workflow.
→ Implement Pydantic schemas for all agent tool definitions instead of relying on descriptive function signatures, because this minimizes downstream parsing errors and stabilizes the entire chain. This forces correctness at the input layer rather than catching errors at the output layer.
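A minimal sketch of that input-layer validation, assuming Pydantic v2 is installed; the tool name and fields are hypothetical:

```python
from pydantic import BaseModel, ValidationError

# Hypothetical argument schema for one agent tool.
class SearchArgs(BaseModel):
    query: str
    max_results: int = 5

def call_search_tool(raw_json: str) -> SearchArgs:
    """Validate model-emitted arguments before the tool ever runs."""
    return SearchArgs.model_validate_json(raw_json)

# Well-typed arguments pass through as a typed object...
args = call_search_tool('{"query": "structured outputs", "max_results": 3}')

# ...while malformed arguments fail at the input layer, not mid-chain.
try:
    call_search_tool('{"query": 12345, "max_results": "many"}')
except ValidationError:
    pass
```

The `ValidationError` surfaces at the tool boundary with a precise field-level message, which is exactly the "catch it at the input layer" behavior the perspective argues for.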