Getting Reliable Structured Output from AI Agents in 2026

March 22, 2026 · Editorial Team

Getting a language model to return valid JSON sounds like it should be solved by now. It's 2026. The models are phenomenally capable. And yet, anyone who's shipped a production system that relies on structured model output has their own stories: the model that wraps JSON in a markdown code block half the time, the one that drops a required field under load, the one that adds a helpful "Here's your data:" prefix that breaks your parser.

Structured output isn't just a prompting problem. It's an architecture problem. This guide covers what actually works, why some approaches fail in ways that aren't obvious, and the practical patterns worth building with.

Why free-form output breaks production systems

Before getting to solutions, it's worth being clear about the failure modes.

The basic issue: language models are trained to produce human-readable text. JSON, XML, and other structured formats are technically valid outputs, but the model's learned distribution pulls toward natural language. Structured output requires fighting that pull, and the fight isn't always won.

The specific failures:

Format drift under long contexts. A model that reliably returns valid JSON for short prompts can start producing malformed output as the context grows. A format instruction at the top of the system prompt carries less weight once thousands of tokens of conversation or generated output sit between it and the token being produced.

Extra verbiage. The model wants to be helpful and conversational. "Here is the extracted data:" or "I've formatted this as JSON:" or a closing "Let me know if you need anything else!" will appear before or after the structured content, breaking parsers that expect pure JSON.

Hallucinated schema variation. You asked for {"name": "string", "age": number}. The model decides that {"fullName": "string", "ageYears": number} is more descriptive and returns that instead. The model is right that it's more descriptive. Your code doesn't know that field.

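Partial output truncation. When a response hits its max-token limit, or a streamed connection drops partway through, output gets cut off mid-JSON. You get a valid-looking beginning and a parser error at the end. The API usually tells you when this happened: check for a finish_reason of "length" on OpenAI or a stop_reason of "max_tokens" on Anthropic before you try to parse.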

Type coercion surprises. You asked for a number. The model returns "42" (a string). You asked for an array. The model returns a single item without array brackets because there was only one result.
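
To make these failure modes concrete, here is a minimal sketch of a diagnostic helper that labels what went wrong with a single raw response. The function name diagnose, the label strings, and the expected mapping of field names to Python types are all assumptions made for illustration; it uses only the standard library and is meant for logging and triage, not as a replacement for real schema validation.

```python
import json


def diagnose(raw: str, expected: dict[str, type]) -> list[str]:
    """Return labels for the failure modes found in one raw model response."""
    text = raw.strip()
    problems: list[str] = []

    # Markdown fences around the payload.
    if "```" in text:
        problems.append("code_fence")
        text = text.replace("```json", "").replace("```", "").strip()

    # Conversational text before the first object.
    start = text.find("{")
    if start == -1:
        return problems + ["no_json_object"]
    if start > 0:
        problems.append("extra_verbiage")

    # raw_decode parses one JSON value and reports where it ended.
    try:
        data, end = json.JSONDecoder().raw_decode(text, start)
    except json.JSONDecodeError:
        # Heuristic: output that does not end in a closing brace was cut off.
        return problems + ["malformed" if text.endswith("}") else "truncated"]

    # Conversational text after the object.
    if text[end:].strip() and "extra_verbiage" not in problems:
        problems.append("extra_verbiage")

    # Renamed, missing, or unexpected fields.
    if data.keys() != expected.keys():
        problems.append("schema_drift")

    # Values of the wrong type, such as "42" where a number was expected.
    if any(k in data and not isinstance(data[k], t) for k, t in expected.items()):
        problems.append("wrong_type")

    return problems


schema = {"name": str, "age": int}
print(diagnose('Here is the data: {"name": "Ada", "age": 36}', schema))
print(diagnose('{"fullName": "Ada", "ageYears": 36}', schema))
print(diagnose('{"name": "Ada", "ag', schema))
```

Counting these labels per prompt version is a cheap way to see which failure mode you are actually fighting before choosing a fix.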

JSON mode: what it does and what it doesn't do

Most major LLM APIs now offer some form of JSON mode or structured output constraint. The terminology varies but the concept is the same: the API constrains the model's output so that a complete response is valid JSON.

OpenAI offers response_format: { type: "json_object" } (guarantees valid JSON) and the newer structured outputs with explicit schema (response_format: { type: "json_schema", json_schema: {...} }) that constrains output to a specific schema.

Anthropic's Claude API doesn't have a native JSON mode in the same sense, but it reliably follows structured format instructions and supports tool use with schemas. The tools parameter with a single tool definition is the common pattern for getting schema-constrained output.

Google's Gemini API has responseMimeType: "application/json" plus a responseSchema parameter.

What JSON mode actually guarantees: you get syntactically valid JSON. What it doesn't guarantee: the JSON matches the schema you intended. You can still get missing fields, extra fields, and wrong types if you only use basic JSON mode without an explicit schema.

The structured output endpoints that accept an explicit schema (OpenAI's json_schema mode, Gemini's responseSchema) are meaningfully stronger. They typically rely on constrained decoding: at each generation step, tokens that would violate the schema are excluded from sampling, so the output has to match the schema's structure. Field names come from your schema, and types are enforced.
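
As a rough illustration of what an explicit schema looks like on the wire, the sketch below builds a response_format payload in the shape OpenAI documents for its json_schema mode. The helper name strict_response_format and the deal_record fields are made up for this example, and the payload is only constructed and printed, not sent; check the provider's current documentation before relying on the exact keys.

```python
import json


def strict_response_format(name: str, properties: dict[str, dict]) -> dict:
    """Build a response_format payload for schema-constrained output."""
    return {
        "type": "json_schema",
        "json_schema": {
            "name": name,
            "strict": True,
            "schema": {
                "type": "object",
                "properties": properties,
                # Strict mode expects every property to be listed as required
                # and additional properties to be disallowed.
                "required": list(properties),
                "additionalProperties": False,
            },
        },
    }


deal_format = strict_response_format(
    "deal_record",
    {
        "company": {"type": "string"},
        "arr": {"type": "number"},
        # An optional value is expressed as a nullable type, not an omitted key.
        "close_date": {"type": ["string", "null"]},
    },
)
print(json.dumps(deal_format, indent=2))
```

Listing every field as required and modeling optional values as nullable also sidesteps the "model omitted the field" problem, because the key is always present.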

XML tags: the underrated approach

Before JSON mode existed, the standard technique for getting structured output from Claude was XML tags. Anthropic's prompting guidance has long recommended XML tags, and Claude handles XML-delimited output well, in part because explicit open/close tags make it easy for a model to track structure.

The pattern:

Extract the following fields from the contract text.
Wrap each field in XML tags exactly as shown:

<party_a>name of first party</party_a>
<party_b>name of second party</party_b>
<effective_date>ISO 8601 date</effective_date>
<term_months>integer number of months</term_months>
<jurisdiction>governing law jurisdiction</jurisdiction>

XML tags work well for several reasons. Conversational text before or after the tags doesn't break anything, because you extract only what sits between them. The model has strong learned associations between XML structure and structured data extraction. And you can parse them with simple string operations without a full JSON parser:

import re

def extract_xml_field(text: str, field: str) -> str:
    match = re.search(f"<{field}>(.*?)</{field}>", text, re.DOTALL)
    return match.group(1).strip() if match else ""

XML tags shine for extraction tasks where the output is a flat set of named fields. They're less elegant for nested objects or arrays, where JSON ends up being cleaner.
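
For the contract prompt above, a slightly stricter parser might look like the following sketch. It assumes the five tag names from that prompt, treats a missing tag as an error instead of returning an empty string, and converts the date and integer fields to real types; the function name parse_contract_fields and the sample response are illustrative only.

```python
import re
from datetime import date

FIELDS = ("party_a", "party_b", "effective_date", "term_months", "jurisdiction")


def parse_contract_fields(text: str) -> dict:
    """Pull each tagged field out of the response and convert typed fields."""
    record: dict = {}
    for field in FIELDS:
        match = re.search(rf"<{field}>(.*?)</{field}>", text, re.DOTALL)
        if match is None:
            # Distinguish "tag missing" from "tag present but empty".
            raise ValueError(f"missing <{field}> tag")
        record[field] = match.group(1).strip()

    # Both conversions raise ValueError on bad input, which a caller can
    # treat the same way as a schema validation failure.
    record["effective_date"] = date.fromisoformat(record["effective_date"])
    record["term_months"] = int(record["term_months"])
    return record


sample = """Sure, here are the fields:
<party_a>Acme Corp</party_a>
<party_b>Globex LLC</party_b>
<effective_date>2026-03-15</effective_date>
<term_months>24</term_months>
<jurisdiction>Delaware</jurisdiction>
Let me know if you need anything else!"""

print(parse_contract_fields(sample))
```

Note that the chatty first and last lines of the sample are simply ignored, which is the main practical advantage over parsing the whole response as JSON.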

Zod schemas for TypeScript consumers

If you're building in TypeScript, Zod gives you a clean pattern for defining the schema once and using it both as a prompt instruction and as a runtime validator.

import { z } from "zod";

const DealSchema = z.object({
  company: z.string(),
  arr: z.number().nonnegative(),
  stage: z.enum(["prospecting", "demo", "proposal", "closed_won", "closed_lost"]),
  next_action: z.string(),
  close_date: z.string().regex(/^\d{4}-\d{2}-\d{2}$/),
});

type Deal = z.infer<typeof DealSchema>;

You can serialize this schema to JSON Schema format and include it in your prompt:

import zodToJsonSchema from "zod-to-json-schema";

const jsonSchema = zodToJsonSchema(DealSchema, "DealSchema");
const schemaString = JSON.stringify(jsonSchema, null, 2);

const systemPrompt = `You are a CRM data extraction assistant.
Extract deal information and return it as JSON matching this schema:
${schemaString}`;

After getting the model's response, validate it before using it:

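const rawOutput = await callLLM(systemPrompt, userMessage);

let deal: Deal | null = null;
try {
  // JSON.parse throws on malformed input, so keep it inside the try block
  const validated = DealSchema.safeParse(JSON.parse(rawOutput));
  if (validated.success) {
    deal = validated.data; // typed as Deal
  } else {
    // handle validation failure: retry, log, or fall back
    console.error("Schema validation failed:", validated.error);
  }
} catch (err) {
  console.error("Model output was not valid JSON:", err);
}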

The safeParse call is the critical piece. Don't trust the model's output directly even if you're using JSON mode. The model might return a valid JSON object that doesn't match your schema. Zod validation at the application layer catches this and gives you type safety.

For nested schemas, Zod's composition works naturally:

const ContactSchema = z.object({
  name: z.string(),
  email: z.string().email(),
  role: z.string(),
});

const CompanySchema = z.object({
  name: z.string(),
  domain: z.string(),
  primary_contact: ContactSchema,
  additional_contacts: z.array(ContactSchema).max(5),
});

Pydantic for Python consumers

The Python equivalent is Pydantic. The OpenAI SDK accepts Pydantic models directly, and a Pydantic model's JSON Schema slots into an Anthropic tool definition:

from pydantic import BaseModel, Field
from typing import Literal

class DealRecord(BaseModel):
    company: str
    arr: float = Field(ge=0)
    stage: Literal["prospecting", "demo", "proposal", "closed_won", "closed_lost"]
    next_action: str
    close_date: str  # YYYY-MM-DD

With the OpenAI SDK, you can use parse() directly with Pydantic models:

from openai import OpenAI

client = OpenAI()

response = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Extract deal information from the CRM note."},
        {"role": "user", "content": crm_note},
    ],
    response_format=DealRecord,
)

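message = response.choices[0].message
if message.refusal:
    # the model declined to answer, so there is no parsed object
    raise RuntimeError(message.refusal)
deal = message.parsed
# deal is now a validated DealRecord instance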

For the Anthropic SDK, the pattern is slightly different: you use a tool definition that matches your Pydantic model:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    tools=[{
        "name": "extract_deal",
        "description": "Extract deal information from the text",
        "input_schema": DealRecord.model_json_schema(),
    }],
    tool_choice={"type": "tool", "name": "extract_deal"},
    messages=[{"role": "user", "content": crm_note}],
)

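# With a forced tool_choice the tool_use block is normally first,
# but looking it up by type is more robust than indexing
tool_use = next(block for block in response.content if block.type == "tool_use")
deal = DealRecord.model_validate(tool_use.input)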

Retry patterns for production

Even with schema-constrained endpoints, validation failures happen. A production-grade implementation needs a retry strategy.

The naive approach: retry the same call on failure. This works for transient errors (rate limits, network issues) but doesn't help for systematic errors (the schema is unclear, the model consistently misinterprets a field).

A better approach: retry with failure context. Include the validation error in the retry prompt:

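from pydantic import BaseModel, ValidationError

def extract_with_retry(text: str, schema: type[BaseModel], max_retries: int = 3) -> BaseModel:
    # call_llm is your own wrapper around the provider SDK; it returns the raw response text
    prompt = text
    last_error = None
    for attempt in range(max_retries):
        try:
            raw = call_llm(prompt)
            return schema.model_validate_json(raw)
        except ValidationError as e:
            # Covers both malformed JSON and schema mismatches
            last_error = e
            # Rebuild the prompt from the original text each time so that
            # retry wrappers don't nest inside one another
            prompt = f"""Previous attempt failed with error: {e}

Original text: {text}

Please correct your output to match the required schema exactly."""
    raise ValueError(f"Failed after {max_retries} attempts. Last error: {last_error}")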

Including the error message in the retry prompt helps the model understand what went wrong. If validation failed because close_date was returned as "March 15, 2026" instead of "2026-03-15", the retry prompt with that context will usually produce the correct format on the second attempt.
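
The distinction between transient and systematic failures can be sketched as a single loop that handles each differently: back off and resend the same prompt for transient errors, and append the validation error for systematic ones. Everything named here is hypothetical: TransientError stands in for whatever rate-limit or network exception your SDK raises, validate_deal is a toy validator, and call_model is any callable that takes a prompt and returns text.

```python
import json
import time
from datetime import date
from typing import Callable


class TransientError(Exception):
    """Hypothetical stand-in for a provider's rate-limit or network error."""


def validate_deal(raw: str) -> dict:
    """Toy validator: raises ValueError on bad JSON or a non-ISO close_date."""
    data = json.loads(raw)  # JSONDecodeError is a subclass of ValueError
    if "close_date" not in data:
        raise ValueError("missing field: close_date")
    date.fromisoformat(data["close_date"])
    return data


def call_with_retries(
    call_model: Callable[[str], str],
    validate: Callable[[str], dict],
    prompt: str,
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    feedback = ""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return validate(call_model(prompt + feedback))
        except TransientError as err:
            # Same prompt next time; wait longer before each retry.
            last_error = err
            sleep(base_delay * 2 ** attempt)
        except ValueError as err:
            # Systematic failure: tell the model what was wrong.
            last_error = err
            feedback = (
                f"\n\nYour previous output was rejected: {err}\n"
                "Return corrected JSON only."
            )
    raise RuntimeError(f"failed after {max_attempts} attempts: {last_error}")
```

Passing sleep in as a parameter is a small design choice that lets tests run instantly with a no-op instead of waiting out the backoff.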

Common failure patterns and fixes

The markdown code block problem. The model wraps its JSON in a code block:

```json
{"name": "Acme", "arr": 50000}
```

Fix: add explicit instruction "Return only the JSON with no code block formatting, no backticks, no explanatory text." Or just strip code block markers before parsing:

```python
import re


def strip_code_block(text: str) -> str:
    return re.sub(r"```(?:json)?\n?(.*?)\n?```", r"\1", text, flags=re.DOTALL).strip()
```
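
Stripping fences does not help when the model also adds a sentence before or after the JSON. One alternative, sketched here under the assumption that the payload is a single JSON object, is to let the standard library's json.JSONDecoder.raw_decode parse one value starting at each opening brace and ignore whatever surrounds it. The function name extract_first_json_object is made up for this example.

```python
import json


def extract_first_json_object(text: str) -> dict:
    """Return the first parseable JSON object embedded anywhere in text."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one value at `start` and ignores what follows.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = text.find("{", start + 1)
    raise ValueError("no JSON object found in model output")


reply = (
    "Here's your data:\n"
    "```json\n"
    '{"name": "Acme", "arr": 50000}\n'
    "```\n"
    "Let me know if you need anything else!"
)
print(extract_first_json_object(reply))
```

This tolerates fences, prefixes, and sign-offs in one pass, but it still only proves the text contains valid JSON, so schema validation should follow.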

Missing optional fields. The model omits a field when it has no data, but your schema expects null or an empty string.

Fix: In your prompt, be explicit: "If a field has no value, return null for that field. Do not omit fields from the response."
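
If the prompt instruction alone is not enough, a small defensive step before validation can fill in the gaps. This is a minimal sketch that assumes you know which fields are optional; fill_missing and the field names are illustrative, and required fields are deliberately left alone so that their absence still fails validation.

```python
def fill_missing(data: dict, optional_fields: tuple[str, ...]) -> dict:
    """Return a copy of data with absent optional fields set to None."""
    defaults = {field: None for field in optional_fields}
    # Values the model did return take precedence over the None defaults.
    return {**defaults, **data}


record = fill_missing({"company": "Acme"}, ("next_action", "close_date"))
print(record)
```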

Number as string. Model returns "42" instead of 42.

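Fix: Pydantic's default (lax) mode coerces numeric strings to numbers when the field is typed as int or float, so this often handles itself. With Zod, use z.coerce.number() instead of z.number() to enable coercion, but note that it converts through JavaScript's Number(), so an empty string or null becomes 0 rather than failing validation.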

Array with single item. Model returns {"tags": "marketing"} instead of {"tags": ["marketing"]}.

Fix: State explicitly in the prompt that array fields should always use array notation, even for a single item. "The tags field must always be an array, even if there is only one tag."
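
Both of the last two problems can also be repaired in code when you would rather not depend on the prompt. The sketch below is one possible normalizer, assuming you pass in which fields should be numbers and which should be arrays; the name normalize and its parameters are hypothetical, and a string that is not numeric still raises ValueError so that bad data is not silently accepted.

```python
def normalize(
    data: dict,
    number_fields: tuple[str, ...] = (),
    array_fields: tuple[str, ...] = (),
) -> dict:
    """Coerce numeric strings to numbers and wrap lone values in a list."""
    out = dict(data)

    for field in number_fields:
        value = out.get(field)
        if isinstance(value, str):
            try:
                out[field] = int(value)
            except ValueError:
                # Not an integer; float() raises ValueError if not numeric at all.
                out[field] = float(value)

    for field in array_fields:
        value = out.get(field)
        if value is not None and not isinstance(value, list):
            out[field] = [value]

    return out


print(normalize({"arr": "42", "tags": "marketing"}, ("arr",), ("tags",)))
```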

Choosing your approach

For greenfield TypeScript projects, use OpenAI's structured outputs with a Zod schema, or tool use with schema for Anthropic. Both give you type safety from definition through runtime validation.

For greenfield Python, use Pydantic with the OpenAI parse() endpoint or Anthropic's tool use. Instructor (a library by Jason Liu) is also worth looking at; it wraps multiple provider APIs with a unified structured output interface and handles retries.

For extraction tasks on Claude where you want simplicity, XML tags are still a solid choice, especially for flat schemas with 5-10 fields.

For any production system: always validate at the application layer regardless of which structured output mode you use. API-level schema constraints reduce failure rates dramatically but they don't eliminate them. Your Zod .safeParse() or Pydantic .model_validate() is the last line of defense.

The fundamental shift in 2025-2026 is that structured output has moved from "a hard problem" to "a largely solved problem with known patterns." The models, the APIs, and the validation libraries have all matured to a point where you can build reliable structured output into production systems without heroic effort. You just need to actually use the patterns rather than relying on prompting alone.

For related reading, the AI agent architecture guide covers how structured output fits into broader agent design. If you're building extraction pipelines specifically, the document intelligence agents guide gets into multi-step extraction patterns.