AI Engineering · 2025-11-12 · 7 min read

Structured Outputs Changed How We Build AI Features

Tags: structured outputs · json mode · type safety · ai engineering

Across eighteen months of building AI features, our production incident log had a recurring theme: parsing failures. The LLM returned "The category is: Marketing" instead of just "Marketing." The model added a preamble before the JSON. The extraction output used single quotes instead of double quotes. The response included markdown code fences around the JSON block. Every week, a new variation.

We tried everything to wrangle free-text responses. Regex patterns to extract JSON from prose. Stern prompt instructions to return ONLY valid JSON, which worked 95% of the time — meaning it still failed thousands of times per month at scale. Retry logic that re-prompted on parse failure. A custom JSON repair utility that fixed common formatting issues. Each was a band-aid on the fundamental problem: asking a language model to format its output correctly is asking it to do two jobs at once, and the formatting job is the one it cares about least.
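The extraction layer we eventually deleted looked roughly like this — a simplified sketch, not our actual utility: strip any markdown fences, find the outermost braces in case of a preamble, and attempt a parse.

```typescript
// Simplified sketch of the brittle extraction layer we eventually deleted.
// The function name is illustrative, not our production code.
function extractJson(raw: string): unknown {
  // Strip markdown code fences the model sometimes wraps around JSON.
  const unfenced = raw.replace(/```(?:json)?/g, "").trim();

  // Find the outermost braces, in case the model added a preamble.
  const start = unfenced.indexOf("{");
  const end = unfenced.lastIndexOf("}");
  if (start === -1 || end <= start) return null;

  try {
    return JSON.parse(unfenced.slice(start, end + 1));
  } catch {
    // Single quotes, trailing commas, etc. still slip through. Give up.
    return null;
  }
}
```

Every branch here is a failure mode we actually hit; every `null` it returns is an incident waiting for a retry path.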

Structured output modes changed everything. OpenAI, Anthropic, and Google all now offer guaranteed structured output through JSON mode, tool use with schema enforcement, or response format constraints. The provider's inference engine constrains token generation to only produce valid outputs matching your schema. Not usually valid. Guaranteed valid. We migrated every production feature over three months. Parsing incidents dropped from twelve per month to zero. Not nearly zero. Literally zero.
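With OpenAI, for example, the constraint is expressed through the `response_format` field. The sketch below shows the request-body shape; field names follow OpenAI's Structured Outputs API, and `strict: true` is what turns best-effort JSON into grammar-level enforcement — verify against current provider docs before shipping.

```typescript
// Sketch of a Chat Completions request body using OpenAI's json_schema
// response format. Field names follow OpenAI's Structured Outputs API;
// check current provider docs before relying on them.
type JsonSchema = Record<string, unknown>;

function buildStructuredRequest(model: string, prompt: string, schema: JsonSchema) {
  return {
    model,
    messages: [{ role: "user", content: prompt }],
    response_format: {
      type: "json_schema",
      json_schema: {
        name: "extraction", // illustrative schema name
        strict: true,       // token-level enforcement, not best-effort
        schema,
      },
    },
  };
}

// Example: constrain the model to a single category field.
const body = buildStructuredRequest("gpt-4o", "Classify this ticket.", {
  type: "object",
  properties: { category: { type: "string" } },
  required: ["category"],
  additionalProperties: false,
});
```

Anthropic's tool use and Google's response schema options express the same constraint with different field names; the principle — the schema rides along with the request — is identical.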

Our implementation uses Zod schemas as the single source of truth. Define the expected output as a Zod schema, convert it to JSON Schema for the API call, and validate the response with Zod on return. This gives compile-time TypeScript safety, runtime validation, and a clear contract between application code and the AI layer. No string manipulation, no regex, no JSON.parse wrapped in try-catch.
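A minimal version of that contract looks like this. To keep the sketch dependency-free, a hand-rolled type guard stands in for Zod; in our actual codebase the Zod schema is the single source of truth and `safeParse` performs this check.

```typescript
// The contract the AI layer must satisfy. In our codebase this is a Zod
// schema; here a plain interface plus a hand-rolled guard keeps the
// sketch dependency-free.
interface Classification {
  category: string;
  confidence: number;
}

// Runtime validation on the way back from the provider: no regex,
// no string surgery, just a shape check.
function isClassification(value: unknown): value is Classification {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.category === "string" && typeof v.confidence === "number";
}

const response: unknown = { category: "Marketing", confidence: 0.92 };
if (!isClassification(response)) {
  throw new Error("AI response violated the schema contract");
}
// From here on, `response` is typed as Classification.
```

The point is that the same definition drives three things at once: the compile-time type, the runtime check, and the schema sent to the provider.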

Beyond eliminating failures, structured outputs improved accuracy by 2-5 percentage points on classification and extraction tasks. Our theory: the model wastes fewer tokens on formatting preambles and dedicates more capacity to the actual task. The schema also serves as documentation. New developers can look at the Zod schema and immediately understand what the model produces.

One pattern we use extensively: nested schemas for multi-step reasoning. Instead of free-text chain-of-thought, we define schemas with explicit reasoning, confidence, and result fields. This forces the model to separate reasoning from conclusions and gives us typed confidence scores we can threshold on programmatically.
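Shape-wise, the pattern looks like the sketch below. Field names are illustrative, and the 0.8 cutoff is a hypothetical example, not our production threshold.

```typescript
// Illustrative shape for the reasoning/confidence/result pattern.
interface ReasonedResult<T> {
  reasoning: string;  // the model's working, kept separate from the answer
  confidence: number; // 0..1, typed, so we can threshold programmatically
  result: T;
}

// Hypothetical routing policy: auto-accept high confidence, send the
// rest to human review. The 0.8 cutoff is an example, not our number.
function route<T>(r: ReasonedResult<T>): "accept" | "review" {
  return r.confidence >= 0.8 ? "accept" : "review";
}
```

Because `confidence` is a typed field rather than a phrase buried in prose, the routing decision is one comparison instead of another round of text parsing.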

Our rule of thumb: if the AI output feeds into application logic, use structured outputs with no exceptions. If the output is displayed directly to a human, use free-form text. About 70% of our features fall into the first category.

About the Author

Fordel Studios

AI-native app development for startups and growing teams. 14+ years of experience shipping production software.
