Agentbrisk

Structured Outputs from LLMs in 2026: JSON Schema, Pydantic, Zod Compared

March 15, 2026 · Editorial Team · 6 min read · structured-outputs, api, json-schema

Getting an LLM to return valid JSON isn't hard. Getting it to return valid JSON that matches a specific schema reliably, across thousands of requests, with nested objects and specific field types, is a problem that every team building production applications has to solve.

The approaches available in 2026 are significantly better than they were two years ago. Constrained decoding, native JSON modes, and Pydantic/Zod integrations have made structured output much more reliable. But the approaches differ meaningfully across providers, and the right choice depends on what language you're using, which provider you're on, and how strict your schema requirements are.


Why "just tell it to output JSON" isn't enough

The naive approach is to include "respond in JSON" in your system prompt and hope the model complies. This works sometimes. It fails in a few predictable ways:

  • The model adds explanatory text before or after the JSON block
  • The JSON is valid syntax but wrong schema (wrong field names, wrong types, missing required fields)
  • Long outputs hit the token limit and get cut off mid-stream, leaving truncated JSON that won't parse
  • The model wraps the JSON in a markdown code block (```json ... ```) that has to be stripped before parsing
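If you are limited to prompt-only JSON, a defensive parser can recover from the extra-text and code-block failures and report truncation clearly. The sketch below is stdlib-only. The function name extract_json is hypothetical and does not come from any library. It does nothing about the wrong-schema case, which still needs validation.

```python
import json
import re

# A markdown code fence, built from single backticks so the pattern stays readable.
TICKS = "`" * 3
# Captures the body of a fenced block, with or without a "json" language tag.
_FENCE = re.compile(TICKS + r"(?:json)?\s*(.*?)" + TICKS, re.DOTALL)


def extract_json(text: str) -> dict:
    """Best-effort extraction of one JSON object from free-form model output."""
    # 1. Prefer the contents of a markdown code fence, if there is one.
    match = _FENCE.search(text)
    candidate = match.group(1) if match else text

    # 2. Drop any explanatory prose before the first "{" and after the last "}".
    start = candidate.find("{")
    end = candidate.rfind("}")
    if start == -1:
        raise ValueError("no JSON object found in model output")
    if end < start:
        # An opening brace with no closing brace usually means the output was cut off.
        raise ValueError("JSON object appears to be truncated")
    candidate = candidate[start : end + 1]

    # 3. Parse. Unbalanced nesting (another truncation symptom) fails here.
    try:
        return json.loads(candidate)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid or truncated JSON: {exc}") from exc
```

The function raises instead of returning a partial result, so the caller can decide whether to retry or send the item to review.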

For occasional personal use, these failures are tolerable. For a production pipeline that processes thousands of inputs per day, even a 2% JSON parse failure rate means dozens of failures every day, which end up in manual review queues or broken automations.


Constrained decoding: the reliable solution

The underlying technology that makes structured output actually reliable is constrained decoding. Rather than hoping the model produces valid JSON, the inference engine constrains the model's token choices at each step so that the output is guaranteed to be valid against a given schema.

At each generation step, the engine computes a mask over the vocabulary that marks which tokens are legal at the current position, given the schema and the output so far. Disallowed tokens have their logits set to negative infinity, so their probability becomes zero before sampling and only legal tokens can be chosen. The output is valid by construction, provided generation is not cut off by the token limit first.
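Here is a toy illustration of a single masking step. It assumes a tiny hand-written vocabulary and uses a digit check as a stand-in for the grammar. Real engines typically derive the legal-token set from a grammar or state machine compiled from the schema, and they operate on tensors over the full vocabulary.

```python
import math

NEG_INF = float("-inf")


def apply_mask(logits: dict[str, float], allowed: set[str]) -> dict[str, float]:
    """Set the logit of every token the schema does not allow to -inf."""
    return {tok: (score if tok in allowed else NEG_INF) for tok, score in logits.items()}


def softmax(logits: dict[str, float]) -> dict[str, float]:
    """Convert logits to probabilities; -inf logits become probability 0."""
    peak = max(logits.values())
    if peak == NEG_INF:
        raise ValueError("mask removed every token; the grammar state is inconsistent")
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def constrained_greedy_step(logits: dict[str, float], allowed: set[str]) -> str:
    """Pick the most likely token among those the schema allows."""
    probs = softmax(apply_mask(logits, allowed))
    return max(probs, key=probs.get)


# Toy state: the output so far is '{"age": ' and the schema says age is an integer.
# Unconstrained, the model would prefer to emit a string here.
logits = {'"thirty-four"': 4.2, "34": 3.1, "}": 0.5, "null": 1.0}
# Stand-in for the grammar check: only integer tokens are legal at this position.
allowed_here = {tok for tok in logits if tok.isdigit()}
```

Calling `constrained_greedy_step(logits, allowed_here)` picks "34", even though the unmasked argmax is the string token.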

This works. Reliability on valid schema output goes from 95-98% with prompt-based approaches to 99.9%+ with constrained decoding. The downside is that constrained decoding requires either direct access to the inference engine (which most API providers don't give you) or a provider that implements it server-side.


OpenAI: structured outputs and JSON mode

OpenAI has two relevant features:

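JSON mode (response_format: {type: "json_object"}): Forces the model to output syntactically valid JSON (unless the output is cut off by the token limit), but doesn't validate it against any schema. The API also requires the word "JSON" to appear somewhere in your messages when JSON mode is on. Available on GPT-4o and GPT-4o-mini, as well as older models such as GPT-4 Turbo.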

Structured outputs (response_format: {type: "json_schema", json_schema: {...}}): Forces output that conforms to a specific JSON schema, implemented with constrained decoding when you set strict: true inside json_schema. This is the strong guarantee. Available on gpt-4o-mini, gpt-4o-2024-08-06 and later GPT-4o snapshots, and newer models.
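For reference, here is a minimal sketch of the raw request shape behind the SDK helpers, with a hand-written schema. The names USER_PROFILE_SCHEMA, json_schema_response_format, and extract_user are illustrative. The client argument is assumed to be an openai.OpenAI() instance, and the function is defined here but not called. The "strict": True flag is what opts in to the constrained-decoding guarantee.

```python
import json
from typing import Any

# Hand-written JSON Schema equivalent to the UserProfile model used below.
USER_PROFILE_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string"},
        "age": {"type": "integer"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "email", "age", "tags"],
    "additionalProperties": False,
}


def json_schema_response_format(name: str, schema: dict[str, Any]) -> dict[str, Any]:
    """Build the response_format payload for schema-constrained output."""
    return {
        "type": "json_schema",
        "json_schema": {
            "name": name,
            "schema": schema,
            "strict": True,  # without this, schema adherence is not guaranteed
        },
    }


def extract_user(client: Any, text: str) -> dict[str, Any]:
    """Call Chat Completions with an openai.OpenAI() client and return the parsed JSON."""
    response = client.chat.completions.create(
        model="gpt-4o-2024-11-20",
        messages=[{"role": "user", "content": f"Extract user info: {text}"}],
        response_format=json_schema_response_format("user_profile", USER_PROFILE_SCHEMA),
    )
    message = response.choices[0].message
    if message.refusal:  # a refusal comes back instead of schema-conforming content
        raise RuntimeError(f"model refused: {message.refusal}")
    # In strict mode, content is a JSON string that matches the schema.
    return json.loads(message.content)
```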

The Python SDK provides clean integration with Pydantic:

from pydantic import BaseModel
from openai import OpenAI

class UserProfile(BaseModel):
    name: str
    email: str
    age: int
    tags: list[str]

client = OpenAI()

response = client.beta.chat.completions.parse(
    model="gpt-4o-2024-11-20",
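    messages=[{"role": "user", "content": "Extract user info: John Doe, 34, john.doe@example.com, prefers mobile"}],
    response_format=UserProfile,
)

message = response.choices[0].message
if message.refusal:  # the model declined to answer; parsed is None in that case
    raise RuntimeError(message.refusal)

user = message.parsed  # a UserProfile instance
print(user.name)  # John Doe
print(type(user.age))  # <class 'int'>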

The .parse() method returns a typed Python object, not raw JSON. Type validation is done for you. This is production-ready.

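OpenAI's structured outputs support a subset of JSON Schema, and strict mode adds rules of its own. Every object must set additionalProperties: false. Every property must be listed in required, so optional values are expressed as a union with null. The root must be an object rather than an anyOf. Some keywords and validators are not supported at all. Check the OpenAI documentation for the current supported subset.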
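When you pass a Pydantic model or Zod schema, the SDK helpers generate the schema for you. If you write schemas by hand, a small conversion step helps. The helper make_strict below is hypothetical: a sketch that applies only the additionalProperties, required, and nullable rules, not the root-level or unsupported-keyword checks.

```python
import copy
from typing import Any


def make_strict(schema: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a JSON Schema adjusted for strict structured outputs.

    Every object gets additionalProperties: false and lists all of its
    properties as required. Fields that were optional become nullable instead.
    """
    schema = copy.deepcopy(schema)  # leave the caller's schema untouched
    _tighten(schema)
    return schema


def _tighten(node: Any) -> None:
    """Recursively apply the strict-mode rules to every object schema in `node`."""
    if isinstance(node, dict):
        if node.get("type") == "object" and "properties" in node:
            required = set(node.get("required", []))
            for key, prop in node["properties"].items():
                if key not in required:
                    # Optional field: allow null instead of allowing absence.
                    node["properties"][key] = {"anyOf": [prop, {"type": "null"}]}
            node["required"] = list(node["properties"])
            node["additionalProperties"] = False
        for value in node.values():
            _tighten(value)
    elif isinstance(node, list):
        for item in node:
            _tighten(item)
```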

For TypeScript/JavaScript, the Zod integration:

import OpenAI from "openai";
import { zodResponseFormat } from "openai/helpers/zod";
import { z } from "zod";

const UserProfile = z.object({
  name: z.string(),
  email: z.string().email(),
  age: z.number().int().min(0),
  tags: z.array(z.string()),
});

const client = new OpenAI();

const response = await client.beta.chat.completions.parse({
  model: "gpt-4o-2024-11-20",
  messages: [{ role: "user", content: "Extract user info: ..." }],
  response_format: zodResponseFormat(UserProfile, "user_profile"),
});

const user = response.choices[0].message.parsed;
// user is typed as z.infer<typeof UserProfile>

Anthropic: tool forcing as structured output

Anthropic's approach to structured output is via tool calling rather than a native JSON schema mode. You define a tool with the schema you want as its input schema, set tool_choice to force that specific tool, and the model must return a valid tool call matching the schema.

This approach works reliably and is well-supported, but it's a bit indirect conceptually.

import anthropic
from pydantic import BaseModel

class ArticleSummary(BaseModel):
    title: str
    key_points: list[str]
    word_count_estimate: int
    sentiment: str

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

article_text = "..."  # the article you want summarized
schema = ArticleSummary.model_json_schema()

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[{
        "name": "return_summary",
        "description": "Return the structured article summary",
        "input_schema": schema
    }],
    tool_choice={"type": "tool", "name": "return_summary"},
    messages=[{"role": "user", "content": f"Summarize this article: {article_text}"}]
)

if response.stop_reason == "max_tokens":
    raise RuntimeError("output was truncated; the tool input may be incomplete")

tool_call = next(b for b in response.content if b.type == "tool_use")
summary = ArticleSummary.model_validate(tool_call.input)  # raises ValidationError on mismatch

Forcing the tool guarantees that you get a tool call back, and in practice the arguments match the schema very reliably. Tool forcing on its own is not constrained decoding, though, so it is not the same formal guarantee as OpenAI's strict mode. Keep the Pydantic validation step, which raises a ValidationError on a mismatch, and retry when it fails.

The main practical difference is streaming. Forced tool input does stream, but it arrives as partial JSON fragments that usually can't be parsed until the block is complete. If you need to stream structured output and display partial results as they arrive, OpenAI's approach works better for that use case.
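Here is a minimal sketch of the reassembly step. It assumes the raw stream events have already been decoded into dicts shaped like Anthropic's content_block_delta events carrying input_json_delta payloads. The Python SDK also offers its own streaming helpers, which this does not use, and the function name collect_tool_input is hypothetical.

```python
import json
from typing import Any, Iterable


def collect_tool_input(events: Iterable[dict[str, Any]], index: int = 0) -> dict[str, Any]:
    """Reassemble a tool_use block's input from streamed input_json_delta events.

    Individual fragments are usually not valid JSON on their own, so they are
    buffered and parsed once the content block stops.
    """
    fragments: list[str] = []
    for event in events:
        if event.get("index") != index:
            continue  # other content blocks, or message-level events without an index
        if event["type"] == "content_block_delta" and event["delta"]["type"] == "input_json_delta":
            fragments.append(event["delta"]["partial_json"])
            # A UI could report progress here, e.g. characters received so far.
        elif event["type"] == "content_block_stop":
            raw = "".join(fragments)
            return json.loads(raw) if raw else {}
    raise ValueError("stream ended before the tool_use block finished")
```

Parsing only at content_block_stop keeps the logic simple. Showing fields before then would need an incremental or partial-JSON parser.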


Google Gemini: response schema

Gemini supports native structured output via response_schema in the generation config. It was introduced with Gemini 1.5 and is available on 2.0 and 2.5:

import google.generativeai as genai
from pydantic import BaseModel

class Product(BaseModel):
    name: str
    price: float
    in_stock: bool
    categories: list[str]

model = genai.GenerativeModel("gemini-2.0-flash")

response = model.generate_content(
    "Extract product info from: ...",
    generation_config=genai.GenerationConfig(
        response_mime_type="application/json",
        response_schema=Product,
    ),
)

product = Product.model_validate_json(response.text)

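Gemini's implementation uses constrained decoding, and reliability is similar to OpenAI's. The Pydantic integration in the legacy google-generativeai SDK shown above is slightly less polished than OpenAI's: you get a JSON string back rather than a parsed object, but it works. The newer google-genai SDK also returns the validated object as response.parsed.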


Library approaches: Instructor and Outlines

Two libraries deserve mention for teams that want provider-agnostic structured output:

Instructor wraps OpenAI, Anthropic, and Gemini APIs and provides a consistent Pydantic-based interface across all of them. You write your schema once and Instructor handles the provider-specific implementation details.

import instructor
import anthropic
from pydantic import BaseModel

client = instructor.from_anthropic(anthropic.Anthropic())

class Company(BaseModel):
    name: str
    founded: int
    headquarters: str

company = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me about Apple Inc."}],
    response_model=Company,
)

print(company.name)  # Apple Inc.

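Instructor also handles retry logic: if validation fails, it automatically sends the validation error back to the model and asks for a correction. The number of attempts is controlled by the max_retries argument to create(). Check the Instructor documentation for the current default.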
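The validate-and-retry pattern itself is easy to sketch without any library. In this stdlib-only version, call_model stands in for whatever provider call you use, and validate_company is a hand-written stand-in for Pydantic validation. Both names are hypothetical. This shows the general technique, not Instructor's actual implementation.

```python
import json
from typing import Any, Callable

Message = dict[str, str]


def validate_company(data: dict[str, Any]) -> list[str]:
    """Hypothetical validator: returns error messages, empty if the record is valid."""
    errors = []
    for field, expected in (("name", str), ("founded", int), ("headquarters", str)):
        if field not in data:
            errors.append(f"missing field '{field}'")
        elif not isinstance(data[field], expected) or isinstance(data[field], bool):
            errors.append(f"field '{field}' must be {expected.__name__}")
    return errors


def structured_call(
    call_model: Callable[[list[Message]], str],
    prompt: str,
    validate: Callable[[dict[str, Any]], list[str]],
    max_retries: int = 3,
) -> dict[str, Any]:
    """Ask for JSON, validate it, and feed errors back to the model on failure."""
    messages: list[Message] = [{"role": "user", "content": prompt}]
    for _ in range(max_retries + 1):
        reply = call_model(messages)
        try:
            data = json.loads(reply)
            errors = validate(data) if isinstance(data, dict) else ["expected a JSON object"]
        except json.JSONDecodeError as exc:
            errors = [f"invalid JSON: {exc.msg}"]
        if not errors:
            return data
        # Show the model its own answer plus the errors, and ask for a corrected version.
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "user",
                         "content": "Fix these errors and reply with JSON only: " + "; ".join(errors)})
    raise ValueError(f"still invalid after {max_retries} retries: {errors}")
```

Feeding the specific validation errors back, rather than just re-asking, gives the model a concrete target for the correction.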

Outlines is for teams running their own inference (with vLLM, llama.cpp, or similar). It implements constrained decoding locally, which gives you the strongest reliability guarantees and works offline.


Schema design for better output quality

Even with constrained decoding, some schema designs produce better results than others.

Keep schemas shallow when possible. Models fill in nested object schemas less reliably than flat ones. A schema with 3 levels of nesting will sometimes produce correct structure but wrong content in deeply nested fields. Flatten where you can.

Add descriptions to fields. JSON Schema allows a description on each property. Use it. The description is sent to the model as part of the schema, so the model sees it when filling in values. A field named ts is ambiguous; adding "description": "Unix timestamp in seconds" makes the expected format explicit. In Pydantic, set it with Field(description=...).

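Distinguish between absent and null. Decide whether a field may be missing, explicitly null, or either, and encode that in the schema rather than hoping the model knows your convention. In Pydantic, Optional[str] generates anyOf: [{type: string}, {type: null}]; it is giving the field a default (such as None) that makes it non-required. OpenAI's strict mode requires every field to be listed as required, so there optional values have to be expressed as nullable.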
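The distinction only helps if downstream code preserves it. After parsing, a missing key and an explicit null are different, but a plain record.get(key) returns None for both. The helper field_state in this stdlib sketch is hypothetical.

```python
import json
from typing import Any

_MISSING = object()  # sentinel that can never appear in parsed JSON


def field_state(record: dict[str, Any], key: str) -> str:
    """Classify a field as 'absent', 'null', or 'present'."""
    value = record.get(key, _MISSING)
    if value is _MISSING:
        return "absent"
    if value is None:
        return "null"
    return "present"


# Three possible model outputs for a person whose middle name may be unknown.
outputs = [
    '{"first": "Ada", "middle": "King"}',
    '{"first": "Ada", "middle": null}',  # explicitly "no middle name"
    '{"first": "Ada"}',                  # the model did not say either way
]
states = [field_state(json.loads(o), "middle") for o in outputs]
```

Under a schema where every field is required but nullable, the "absent" state should never occur, so seeing it is a cheap signal that the output did not come from the schema you expected.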

Use string enums for categorical values. As with tool schemas, enums constrain the model's choices and produce more consistent output than free-form strings for categorical fields. In Pydantic, a Literal[...] or Enum field type generates an enum in the schema.
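Putting these guidelines together, here is a sketch of a flat, hand-written schema for a hypothetical support-ticket extractor. It uses a description on the ambiguous field, a string enum for the categorical one, and a nullable field that is still listed as required. The schema and the check_enum_fields helper are illustrative only. An enum check like this matters only when the provider does not enforce the schema for you.

```python
# A flat schema applying the guidelines: descriptions, a string enum, and a
# nullable-but-required field instead of an optional one.
TICKET_SCHEMA = {
    "type": "object",
    "properties": {
        "ts": {"type": "integer", "description": "Unix timestamp in seconds"},
        "priority": {
            "type": "string",
            "enum": ["low", "medium", "high"],
            "description": "Urgency as judged from the ticket text",
        },
        "customer_email": {
            "anyOf": [{"type": "string"}, {"type": "null"}],
            "description": "Sender email, or null if the ticket does not include one",
        },
    },
    "required": ["ts", "priority", "customer_email"],
    "additionalProperties": False,
}


def check_enum_fields(schema: dict, record: dict) -> list[str]:
    """Report values outside a property's enum (a tiny subset of JSON Schema validation)."""
    problems = []
    for key, prop in schema["properties"].items():
        allowed = prop.get("enum")
        if allowed is not None and record.get(key) not in allowed:
            problems.append(f"{key}={record.get(key)!r} not in {allowed}")
    return problems
```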


When to use which approach

Situation | Recommended approach
Python + OpenAI | client.beta.chat.completions.parse() with Pydantic
TypeScript + OpenAI | zodResponseFormat helper
Python + Anthropic | Instructor or manual tool forcing
Python + Gemini | response_schema with Pydantic
Provider-agnostic | Instructor
Self-hosted inference | Outlines
Simple JSON, no strict schema | JSON mode (OpenAI) or system prompt

The choice for most new production Python projects in 2026 is either Instructor (if you want provider flexibility) or the native provider integration (if you're committed to a single provider). Both are mature enough for production use.


Structured output has genuinely matured. Two years ago, getting reliable JSON out of LLMs required significant engineering work: validation loops, retry logic, regex extraction from markdown code blocks, custom parsers. Today the native integrations handle most of that, and libraries like Instructor cover the remaining validation and retry work. The main remaining engineering question is schema design, which is a domain problem rather than an LLM infrastructure problem.
