Python MIT orchestrationmulti-agent

OpenAI Swarm / Agents SDK

OpenAI's official framework for multi-agent handoffs and production agent workflows

OpenAI Swarm started as a small educational experiment in late 2024 to show how multi-agent handoffs could work cleanly in Python. By early 2025 it graduated into the OpenAI Agents SDK, a production-ready framework that keeps the same handoff-first design but adds tracing, guardrails, voice, and hosted tooling. If you are building on OpenAI models and want the path of least resistance, this is the blessed route.

OpenAI released Swarm in October 2024 with an unusual disclaimer attached: this was not a production framework. It was labeled educational and experimental, meant to show one clean way to think about multi-agent coordination. The code was small enough to read in an afternoon. The core idea was a handoff primitive, one agent deciding to pass control to another agent, and that was essentially it.

That restraint turned out to be the point. Most agent frameworks had been adding features at a pace that made them hard to understand. Swarm said: here are two concepts, agents and handoffs, now go build something.

Six months later OpenAI turned that experiment into the Agents SDK, a proper production-grade library that kept the same minimal philosophy while adding the infrastructure that production use actually requires. As of May 2026 it sits at v0.17.0 with 26,000 GitHub stars and weekly releases.

From experiment to production

The transition from Swarm to Agents SDK was not a rewrite. OpenAI kept the handoff-first design and built around it. What changed was the addition of tracing baked into the OpenAI dashboard, output guardrails, human-in-the-loop hooks, voice support through the realtime API, and tighter integration with hosted tools like web search and computer use.

The original Swarm repository is now archived. If you are starting a project today, you install openai-agents and work with the Agents SDK directly. The mental model is identical to what Swarm taught, just with a runtime that can handle production traffic without you needing to build monitoring on top of it yourself.

Handoffs as the core primitive

The handoff is what makes this framework worth discussing separately from CrewAI or LangGraph. In most multi-agent systems, routing between agents is something you implement yourself, usually as a conditional in a loop or a node in a graph. In the Agents SDK, a handoff is a named concept with its own behavior and a first-class place in the trace.

When an agent decides to hand off to another agent, it does so explicitly. The runtime transfers context, conversation history, and any relevant state. The receiving agent picks up with full awareness of what happened before it arrived. This is different from simply calling another agent as a tool, because the calling agent exits and the new agent takes over the conversation turn.

The practical effect is that multi-agent pipelines become easier to follow. You can look at a trace and see exactly which agent was running at each step and why the handoff happened. That clarity is worth real money in debugging time.

Lightweight runtime

The Agents SDK does not try to be a complete application framework. It handles orchestration, conversation history, tool calls, and handoffs. It does not tell you how to store data, how to expose an API, or how to manage deployments. Those choices stay with you.

This is a deliberate design decision that separates it from heavier alternatives. LangGraph gives you a state graph and expects you to model your entire workflow inside it. CrewAI gives you roles and crews and a process manager. The Agents SDK gives you agents and handoffs, and then gets out of the way.

For teams that already have infrastructure, this is genuinely useful. You are not inheriting someone else's opinions about your stack. The SDK composes cleanly with FastAPI, existing background job systems, and whatever data layer you already use.

Installation is straightforward. Python 3.10 or newer, then pip install openai-agents. Optional extras include [voice] for the realtime audio pipeline and [redis] for distributed session storage when you need to run agents across multiple workers.

OpenAI tooling integration (assistants, computer use)

The tightest integration the Agents SDK offers is with OpenAI's own hosted tools. Web search, code interpreter, file search, and computer use are available as built-in tool types that require no additional setup beyond an API key. You attach them to an agent with a single line and the tool calls are handled by OpenAI's infrastructure, not your servers.

Computer use is the most interesting of these right now. The SDK ships with a ComputerTool that wraps the computer use model and handles the screenshot-action loop. Building a browser automation agent that can navigate interfaces it has never seen before is a few dozen lines of code, not a weekend project.

This integration is also where the model-agnostic question gets uncomfortable. All of these hosted tools are OpenAI-only. If you switch to Anthropic or a local model, you lose them and have to replace them with your own implementations. For teams committed to OpenAI's model stack this is purely an advantage. For teams that want flexibility it is a real lock-in consideration.

The SDK also integrates with the Model Context Protocol, so you can connect third-party MCP servers as tool sources. This gives you a path to a larger tool ecosystem without building each integration from scratch.

Production-ready Agents SDK

The jump from Swarm to Agents SDK added the features that make the difference between a prototype and something you can put in front of users.

Guardrails let you validate agent inputs and outputs before and after each run. You define them as functions or use OpenAI's hosted content moderation. Failed guardrails can terminate a run, route to a fallback agent, or trigger a human review step. This is not something Swarm had at all, it was entirely left to the user.

Human-in-the-loop is handled through interrupts. An agent can pause at a defined point and wait for external input before continuing. The state is preserved across the pause. This is the feature that makes the Agents SDK usable for workflows where a human needs to approve something before the agent proceeds, a common requirement in finance, legal, and internal tooling contexts.

Long-running agents can operate in containerized environments where they have filesystem access and can run processes. A coding agent that needs to run tests, fix failures, and iterate can do so inside a sandboxed container without you managing that environment yourself.

Voice support through the realtime API means you can build agents that speak and listen. The pipeline from speech to agent to speech is abstracted into the same handoff model as text agents. A voice triage agent that hands off to a specialized agent when it detects a certain intent works the same way in voice as it does in text.

Tracing and evals through OpenAI dashboard

The observability story for the Agents SDK is meaningfully better than most open-source alternatives. Every agent run generates a trace automatically. You see the full chain of agent calls, tool invocations, handoffs, and model outputs in the OpenAI dashboard without adding any instrumentation to your code.

The trace view shows you which agent was active, what tools it called, when handoffs happened, and what the model returned at each step. For multi-agent systems where something went wrong three steps in, this is the difference between a thirty-minute debugging session and an afternoon of log spelunking.

OpenAI has also started adding eval capabilities tied to the same infrastructure. You can run your agent on a test dataset, review outputs in the dashboard, and compare results across model versions or prompt changes. This is not as mature as what Braintrust or LangSmith offer for third-party evals, but for teams that live inside the OpenAI ecosystem it is a credible starting point.

The tradeoff is that all of this observability is inside OpenAI's platform. If you want to route traces to your own systems you can, the SDK supports custom trace exporters, but the out-of-the-box path goes to openai.com.

Honest comparison with alternatives

The Agents SDK is the right choice when your team is committed to OpenAI models and wants a minimal, officially maintained foundation. The handoff abstraction is clean, the hosted tool integration saves real engineering time, and tracing is built in.

LangGraph is the better choice when your workflow has serious branching logic, parallel subgraphs, or complex human approval flows. LangGraph forces you to draw your control flow explicitly, which is more work upfront but scales better to workflows that have genuine conditional complexity. It is also model-agnostic, so switching providers is not a project.

AutoGen is worth considering when you want agents that collaborate through conversation rather than structured handoffs. AutoGen's conversation model is more flexible for scenarios where agents need to negotiate or challenge each other's outputs. It is also more provider-neutral than the Agents SDK.

CrewAI sits closer to the Agents SDK in simplicity but uses a role metaphor instead of a handoff metaphor. CrewAI is faster to prototype with if your team thinks in terms of roles and responsibilities. The Agents SDK is more explicit about control flow, which makes it easier to understand what is actually happening at runtime.

If you are building AI coding agents specifically, the Agents SDK plus computer use is a compelling combination. You get a framework that handles orchestration, tools that can interact with a browser or terminal, and tracing that shows you what the agent did step by step.

Who should use it

The Agents SDK makes sense for teams that are already on OpenAI, want a minimal framework with an official support story, and need the hosted tools. It is the lowest-friction path to production if those three things are true.

It is the wrong choice if model flexibility matters to you, if you need a graph-based workflow model, or if you are working in a context where sending trace data to OpenAI is not acceptable. For those situations LangGraph or AutoGen will serve you better.

The framework has earned its position. Starting as a readable experiment and becoming a production-grade SDK in under a year is a credible arc, and the handoff abstraction that drove it is genuinely worth understanding regardless of whether you end up using it.

Key features

Agent handoffs as a first-class primitive
Lightweight runtime with no heavy abstractions
Native OpenAI tooling (Assistants, computer use, web search)
Built-in tracing and evals through OpenAI dashboard
Guardrails for input and output validation
Human-in-the-loop support
Voice agent support via gpt-realtime-2
MCP tool integration

Frequently Asked Questions

What is OpenAI Swarm?

OpenAI Swarm was an experimental Python library released in late 2024 to demonstrate a clean pattern for multi-agent handoffs. It was superseded by the OpenAI Agents SDK in early 2025, which is the production version with the same core design plus tracing, guardrails, and hosted tool support.

Is OpenAI Swarm still maintained?

The original Swarm repository is archived. Active development has moved to the OpenAI Agents SDK at github.com/openai/openai-agents-python, which reached v0.17.0 in May 2026 and is actively maintained by OpenAI.

How does OpenAI Swarm compare to AutoGen?

Swarm/Agents SDK uses a handoff model where one agent explicitly transfers control to another. AutoGen uses a conversation-based approach where agents exchange messages in a loop. Swarm is simpler to trace and reason about for linear pipelines; AutoGen offers more flexibility for agents that need to negotiate or collaborate dynamically.

Can I use OpenAI Agents SDK with other LLM providers?

The SDK is designed to work with OpenAI models and tools natively. It does support routing to other providers through LiteLLM or compatible endpoints, but you lose access to hosted tools like web search and computer use, which are OpenAI-only features.