Python · MIT · orchestration · multi-agent · conversation

AutoGen

Microsoft's multi-agent conversation framework with role-based agents and tool use

AutoGen is Microsoft Research's open-source framework for building multi-agent systems where AI agents communicate with each other to complete tasks. It introduced the conversational agent pattern that most multi-agent frameworks have since copied. The v0.4 redesign in 2025 brought a layered architecture with an event-driven Core, a higher-level AgentChat API, and AutoGen Studio for no-code prototyping. It supports Python and .NET, integrates with most major LLM providers, and runs code safely in Docker containers. AutoGen is MIT-licensed and free to use. You bring your own API keys, so costs depend entirely on which models you call and how often. As of mid-2026, Microsoft has moved active development to its broader Agent Framework, leaving AutoGen in community maintenance mode.

When multi-agent AI was still a fringe research topic, Microsoft Research shipped AutoGen and gave developers a concrete way to make agents talk to each other. That was 2023. By 2024 it had cleared 40,000 GitHub stars. By 2026 it's sitting at nearly 58,000. The number tells you something real: AutoGen set the template that everyone else followed. The question worth asking now is whether following that template still makes sense, or whether the framework that defined the pattern has been surpassed by the frameworks it inspired.

Quick verdict

AutoGen is the framework that proved multi-agent conversation was a viable software pattern, not just a research demo. The v0.4 redesign brought a cleaner architecture and genuine async support. But Microsoft has since moved active development to its broader Agent Framework, leaving AutoGen in community maintenance mode. For new projects, you need to weigh the large community and deep documentation base against the fact that the framework is no longer receiving new features.

What is AutoGen, exactly?

AutoGen started as a Microsoft Research project with a specific insight: complex AI tasks get easier when you break them into conversations between specialized agents rather than cramming everything into one massive prompt. An agent that just does code review, talking to an agent that just writes code, talking to an agent that just runs tests, produces better results than a single agent trying to do all three.

The original v0.2 API was opinionated and simple. You defined agents, gave them roles and tools, dropped them into a group chat, and let them collaborate. It was rough around the edges but it worked, and the community built a huge amount of knowledge around it.

Then came v0.4 in 2025, and it broke almost everything. Microsoft rewrote the framework from scratch around a layered architecture:

Core is the event-driven foundation. It handles message passing, agent lifecycle, and the async runtime. This is where you work if you're building production systems that need to scale.
AgentChat sits above Core and provides the high-level conversational APIs most developers actually want. This is the equivalent of the v0.2 experience, rebuilt with better abstractions.
Extensions handles integrations with MCP servers, Docker code execution, Azure services, and gRPC for distributed deployments.
Studio is a web UI built on top of AgentChat that lets you assemble agent teams without writing code.
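To make "event-driven message passing" concrete, here is a minimal sketch in plain Python. It is not AutoGen's Core API: `ToyRuntime`, `subscribe`, `publish`, and the `code_ready` topic are hypothetical names chosen for illustration. The idea it shows is that agents register handlers for a topic and the runtime delivers each published message to all of them.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable

# Hypothetical names throughout: this is NOT AutoGen's Core API.
Handler = Callable[[str], Awaitable[None]]


class ToyRuntime:
    """Routes each published message to every handler subscribed to its topic."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)  # topic -> list of handlers

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    async def publish(self, topic: str, message: str) -> None:
        # Hand the message to all subscribers of the topic at once.
        await asyncio.gather(*(h(message) for h in self._subscribers[topic]))


async def demo() -> list:
    runtime = ToyRuntime()
    received = []

    async def reviewer(message: str) -> None:
        received.append(f"reviewer got: {message}")

    async def tester(message: str) -> None:
        received.append(f"tester got: {message}")

    # Two agents react to the same event without knowing about each other.
    runtime.subscribe("code_ready", reviewer)
    runtime.subscribe("code_ready", tester)
    await runtime.publish("code_ready", "def add(a, b): return a + b")
    return received


received = asyncio.run(demo())
print(received)
```

The publisher never names its recipients, which is what lets a design like this grow from two agents to many without rewiring the sender.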

The architecture is genuinely better than v0.2. The problem is that none of your v0.2 code runs on it, and neither does any tutorial written before mid-2025. That was a painful transition for teams with production AutoGen deployments.

The features that defined the multi-agent space

Conversable agents and group chat

The core concept in AutoGen is the conversable agent: an agent that can send and receive messages, use tools, and decide when it's done. Group chat puts multiple conversable agents in a shared conversation with a manager that decides whose turn it is to speak.

In v0.4, the AgentChat API exposes AssistantAgent, UserProxyAgent, and several team types including RoundRobinGroupChat and SelectorGroupChat. The selector variant uses an LLM to decide which agent speaks next based on the conversation state, which produces more natural collaboration but introduces nondeterminism that can be hard to debug.

This pattern is genuinely powerful for tasks that benefit from multiple perspectives: code generation with a reviewer agent, research with a planner and a retriever, data analysis with a coder and an interpreter. It's also where AutoGen's complexity starts to show. When four agents are talking to each other and one goes off on a tangent, tracing what happened requires good logging that you have to set up yourself.
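The group chat idea, and the kind of trace you need to debug it, can be sketched without any framework. Everything below is a hypothetical stand-in: `ToyAgent` replies from a lambda instead of an LLM, and `round_robin` is a fixed rule where a selector-style manager would ask a model. The transcript records who said what on every turn, which is the minimum you need to see where a conversation went wrong.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical stand-ins: a real agent would call an LLM, not a lambda.
@dataclass
class ToyAgent:
    name: str
    reply: Callable[[list], str]


@dataclass
class ToyGroupChat:
    agents: list
    # Given the transcript and the turn number, return the next speaker's index.
    select_next: Callable[[list, int], int]
    max_messages: int = 6
    transcript: list = field(default_factory=list)

    def run(self, task: str) -> list:
        self.transcript = [f"user: {task}"]
        turn = 0
        while len(self.transcript) < self.max_messages:
            speaker = self.agents[self.select_next(self.transcript, turn)]
            message = speaker.reply(self.transcript)
            # Log the speaker with every message so the run can be traced later.
            self.transcript.append(f"{speaker.name}: {message}")
            if "APPROVE" in message:  # simple text-based stop condition
                break
            turn += 1
        return self.transcript


def round_robin(transcript: list, turn: int) -> int:
    """Fixed turn order; a selector-style manager would inspect the transcript."""
    return turn % 2


coder = ToyAgent("coder", lambda t: "def dedupe(xs): return list(set(xs))")
reviewer = ToyAgent(
    "reviewer",
    lambda t: "APPROVE" if len(t) >= 4 else "Please preserve order.",
)

chat = ToyGroupChat([coder, reviewer], select_next=round_robin)
for line in chat.run("Remove duplicates from a list."):
    print(line)
```

Swapping `round_robin` for a function that reads the transcript changes who speaks next without touching the agents, which is the same separation the team types provide.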

Tool use and code execution

AutoGen agents can call tools defined as Python functions, and the framework has built-in support for executing the code that agents generate. Code runs in one of two environments: a Docker container or a local subprocess. Only Docker actually isolates the code from your machine, so it is the option you should use for anything beyond personal experimentation.

The code execution loop is one of the things AutoGen did better than most early frameworks. An agent writes code, the executor runs it, the output comes back into the conversation, and the agent decides what to do next. This human-like iteration cycle produces much better results on programming tasks than single-shot code generation. The same write-run-revise loop underlies the more sophisticated patterns in tools like Claude Code, which apply similar iterative execution to actual development tasks.
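The write-run-revise loop is easy to sketch with the standard library. This is an illustration of the pattern, not AutoGen's executor: `fake_agent` is a hypothetical stand-in for an LLM that returns a buggy first draft and a fixed second one, and `run_python` uses a plain subprocess, which offers no real sandboxing.

```python
import subprocess
import sys
from typing import Optional


def run_python(code: str, timeout: float = 10.0):
    """Run code in a separate Python process; return (exit code, combined output).

    A plain subprocess is NOT a sandbox. Use a container for untrusted code.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, (proc.stdout + proc.stderr).strip()


def fake_agent(feedback: Optional[str]) -> str:
    """Hypothetical LLM stand-in: the first draft has a bug, the retry fixes it."""
    if feedback is None:
        return "print(sum([1, 2, 3]) / 0)"  # raises ZeroDivisionError
    return "print(sum([1, 2, 3]) / 3)"


def solve(max_attempts: int = 3):
    feedback = None
    for attempt in range(1, max_attempts + 1):
        code = fake_agent(feedback)
        exit_code, output = run_python(code)
        if exit_code == 0:
            return attempt, output
        # Feed the error text back so the next draft can react to it.
        feedback = output
    raise RuntimeError("no working code within the attempt budget")


attempts, output = solve()
print(attempts, output)
```

The attempt budget matters in practice: without it, an agent that keeps producing failing code will loop, and spend tokens, indefinitely.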

AutoGen Studio for visual design

AutoGen Studio is a browser-based interface that ships with the framework. You can define agents, assign them system prompts and tools, group them into teams, and run workflows from a point-and-click UI. It generates the underlying Python configuration behind the scenes.

It's genuinely useful for prototyping and for demonstrating multi-agent workflows to stakeholders who don't want to read code. It's not a production tool. The workflows you build in Studio need to be exported and hardened before they're reliable enough for real workloads, and the GUI doesn't expose all the options available in the Python API.

Multi-model support

AutoGen's model client layer supports OpenAI, Azure OpenAI, Anthropic, Google Gemini, and local models through Ollama. Switching models means changing the client configuration, not rewriting your agent logic. In practice this is useful for cost optimization: you can run planning agents on a cheap model and route the actual code generation to a stronger one.
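One way to keep model choice out of agent logic is a small routing table. The sketch below is framework-independent, and every name in it (`ModelConfig`, `MODELS`, `ROLE_TO_TIER`, `model_for`) is hypothetical. The tier assignments are examples, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    provider: str
    model: str


# Illustrative tiers only; swap in whatever providers and models you use.
MODELS = {
    "cheap": ModelConfig(provider="openai", model="gpt-4o-mini"),
    "strong": ModelConfig(provider="openai", model="gpt-4o"),
}

# Planning and review run on the cheap tier; code generation gets the strong one.
ROLE_TO_TIER = {"planner": "cheap", "reviewer": "cheap", "coder": "strong"}


def model_for(role: str) -> ModelConfig:
    """Look up the model for a role, defaulting to the cheap tier."""
    return MODELS[ROLE_TO_TIER.get(role, "cheap")]


for role in ("planner", "coder"):
    print(role, "->", model_for(role).model)
```

Because agents only ask for a role's model, moving the coder to a different provider is a one-line change to the table.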

The model-agnostic design also matters for enterprise deployments where specific providers are mandated by policy. You don't have to commit to one LLM vendor to use AutoGen.

Async architecture in v0.4

One of the real improvements in v0.4 is the async runtime in Core. The original AutoGen was built around a largely synchronous model, which meant agents took turns in a blocking sequence. The new event-driven architecture lets multiple agents work concurrently, which matters for workflows where parallelism makes sense: running a research agent and a code agent at the same time, for example, rather than waiting for one to finish before starting the other.
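The difference between blocking turns and concurrent agents can be shown with plain `asyncio`. This is a generic sketch, not AutoGen code: `toy_agent` is a hypothetical coroutine whose `asyncio.sleep` stands in for a slow LLM or tool call.

```python
import asyncio


async def toy_agent(name: str, seconds: float, log: list) -> str:
    log.append(f"{name}:start")
    await asyncio.sleep(seconds)  # stands in for a slow LLM or tool call
    log.append(f"{name}:done")
    return f"{name} result"


async def main():
    log = []
    # Both agents start immediately; neither waits for the other to finish.
    results = await asyncio.gather(
        toy_agent("research", 0.05, log),
        toy_agent("coder", 0.01, log),
    )
    return list(results), log


results, log = asyncio.run(main())
print(log)
```

In the log, both `start` entries appear before either `done` entry, and the faster agent finishes first. `gather` still returns results in the order the agents were passed in, so downstream code does not depend on timing.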

For simple workflows, this doesn't change much. For complex production systems with many agents handling different subtasks, the async foundation gives you room to scale without rewriting your orchestration logic.

Pricing

AutoGen costs nothing to use. The MIT license lets you run it in any environment, modify the source, and build commercial products on top of it without paying Microsoft or filing paperwork.

Your actual costs come from the LLM API calls your agents make. Multi-agent workflows tend to be expensive because each agent turn is a separate API call, and group chats can generate a lot of turns before a task completes. A workflow with four agents collaborating on a complex task can burn through tokens quickly, especially if you're using the selector group chat where an LLM is also deciding turn order.

Rough cost expectations:

Lightweight workflows (two agents, simple tasks): $0.01-0.10 per run depending on model
Medium complexity (4-6 agents, research or coding tasks): $0.50-3.00 per run with GPT-4 class models
Heavy workflows (many agents, long conversations): costs can reach $10+ per run

The practical move is to develop with cheaper models like GPT-4o mini or Claude Haiku, profile your actual token usage, and only upgrade to more capable models for tasks where quality measurably improves. AutoGen's multi-model support makes it easy to mix and match.
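A back-of-the-envelope estimator helps when profiling. The sketch below assumes each turn resends the whole conversation so far, so input tokens grow with every turn. The prices are placeholders, not any provider's real rates, and `Price` and `estimate_run_cost` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """USD per million tokens. Placeholder values; check your provider's rates."""
    input_per_million: float
    output_per_million: float


def estimate_run_cost(turns: int, prompt_tokens: int, reply_tokens: int, price: Price) -> float:
    """Estimate one run's cost, assuming every turn resends the full history."""
    total_in = 0
    total_out = 0
    history = prompt_tokens
    for _ in range(turns):
        total_in += history       # the call carries the conversation so far
        total_out += reply_tokens
        history += reply_tokens   # the reply joins the shared history
    cost = total_in * price.input_per_million + total_out * price.output_per_million
    return cost / 1_000_000


placeholder = Price(input_per_million=1.0, output_per_million=4.0)
cost = estimate_run_cost(turns=4, prompt_tokens=1000, reply_tokens=500, price=placeholder)
print(f"${cost:.4f} per run")
```

With these placeholder numbers, four turns consume 7,000 input tokens and 2,000 output tokens. Doubling the number of turns more than doubles the input side because the history keeps growing, which is why long group chats get expensive faster than the turn count suggests.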

There's no hosted platform, no usage dashboard, and no telemetry unless you add it yourself. You manage costs entirely through your LLM provider accounts.

Where AutoGen wins and where it doesn't

AutoGen wins when you need maximum flexibility in how agents interact. The layered API lets you stay at the high-level AgentChat surface for simple workflows and drop down to Core when you need precise control over message routing, agent lifecycle, or concurrent execution. Few frameworks give you that range without forcing you to pick one abstraction and stick with it.

The code execution pipeline is also genuinely strong. If your use case involves agents generating and running code iteratively, AutoGen's built-in Docker execution and the feedback loop between executor and assistant is well-tested and reliable.

Where AutoGen struggles is usability. The v0.4 redesign introduced a layered architecture that's better in theory but harder to navigate in practice. You need to understand which layer you're working in and why. The documentation has improved but still has gaps, and a large fraction of community resources target the old v0.2 API. Someone new to the framework in 2026 will spend real time figuring out which tutorials are outdated.

The maintenance mode announcement is the biggest practical concern. AutoGen isn't broken and it isn't going away, but it also isn't getting new features. If you're building something that needs to evolve with the AI tooling landscape, that's a limitation worth taking seriously. The ecosystem around frameworks under active development tends to grow faster.

Who AutoGen is built for

AutoGen makes the most sense for teams with specific needs:

Researchers and advanced prototypers who want maximum control over agent interaction patterns and are willing to invest time in understanding the framework. The Core API exposes primitives that other frameworks abstract away.

Teams already on AutoGen v0.4 who have working production systems. Migrating to a new framework has real costs, and AutoGen v0.4 isn't going anywhere. Continuing to use it with community support is a reasonable choice.

Developers who prioritize code execution as a first-class concern. AutoGen's Docker-based executor is one of the better-integrated code execution setups available. If your agents need to run code reliably, it's a genuine advantage.

It's harder to recommend AutoGen for teams that want fast time-to-prototype on simple role-based workflows, or teams building systems that need to stay current with actively developed tooling. Those teams will probably do better elsewhere.

AutoGen vs the alternatives

AutoGen vs CrewAI

CrewAI uses a crew-and-role mental model that feels natural from day one. You define agents as roles, wire them into a crew, and let the framework handle delegation. AutoGen's group chat is more powerful but also less predictable. CrewAI wins on developer experience for straightforward workflows. AutoGen wins when you need to customize how agents communicate at a granular level or when you need async concurrent execution.

AutoGen vs LangGraph

LangGraph takes the opposite philosophical approach: you define your agent workflow as an explicit state graph where nodes are operations and edges are transitions. Nothing happens that you didn't draw on the graph. This makes LangGraph significantly easier to debug and reason about, especially for workflows with complex branching logic, human approval steps, or conditional execution. AutoGen's group chat is more emergent and less deterministic. LangGraph is the better choice for production systems where predictability matters more than flexibility. If you're looking at coding agents specifically, LangGraph's explicit control flow tends to produce more auditable results.
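The explicit-graph idea can be illustrated in plain Python. This is not LangGraph's API: `GRAPH`, `run_graph`, and the node functions are hypothetical. Each node updates a shared state and returns the name of the next node, so every possible transition is visible in the code.

```python
# Hypothetical sketch of an explicit state graph; NOT LangGraph's API.

def write_code(state: dict) -> str:
    state["code"] = "def add(a, b): return a + b"
    return "review"  # the only edge out of this node


def review(state: dict) -> str:
    state["approved"] = "return" in state["code"]
    # Conditional edge: finish on approval, otherwise loop back to the writer.
    return "done" if state["approved"] else "write_code"


GRAPH = {"write_code": write_code, "review": review}


def run_graph(graph: dict, start: str, state: dict, max_steps: int = 10) -> list:
    """Walk the graph from `start` until a node returns "done"; return the path."""
    path = []
    node = start
    while node != "done":
        if len(path) >= max_steps:
            raise RuntimeError("step budget exceeded")
        path.append(node)
        node = graph[node](state)
    return path


state = {}
path = run_graph(GRAPH, "write_code", state)
print(path, state["approved"])
```

The returned path is a complete record of which nodes ran and in what order, which is the auditability advantage the comparison describes.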

AutoGen vs OpenAI Swarm

Swarm is a lightweight experimental library from OpenAI for agent handoffs. It's simpler than AutoGen and intentionally minimal. Swarm works well for basic routing patterns where one agent decides to transfer a task to another. It doesn't have AutoGen's group chat, code execution, or multi-model support. AutoGen is the better choice for anything beyond simple handoff patterns.

Getting started

Install AutoGen's AgentChat layer:

pip install "autogen-agentchat" "autogen-ext[openai]"

A minimal two-agent conversation:

import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.conditions import MaxMessageTermination
from autogen_ext.models.openai import OpenAIChatCompletionClient

# The OpenAI client reads your key from the OPENAI_API_KEY environment variable.
model_client = OpenAIChatCompletionClient(model="gpt-4o-mini")

coder = AssistantAgent(
    name="coder",
    model_client=model_client,
    system_message="You write Python code to solve the user's task.",
)

reviewer = AssistantAgent(
    name="reviewer",
    model_client=model_client,
    system_message="You review code for bugs and suggest improvements.",
)

# Agents alternate turns; the run stops after six messages in total.
team = RoundRobinGroupChat(
    [coder, reviewer],
    termination_condition=MaxMessageTermination(max_messages=6),
)

async def main():
    result = await team.run(task="Write a function that finds duplicate items in a list.")
    # Print only the final message of the conversation.
    print(result.messages[-1].content)

asyncio.run(main())

From here, swap in SelectorGroupChat for smarter turn management, add tools by passing Python functions to an agent's tools parameter, or drop down to the Core API if you need to customize message routing. The official docs at microsoft.github.io/autogen/stable/ are the most reliable reference.

The bottom line

AutoGen earned its reputation. It put multi-agent conversation on the map as a practical development pattern, not just a research concept. The v0.4 redesign, despite the disruption, produced a cleaner framework with a better async foundation. The 58,000-star community has built up a body of real-world knowledge that few competing frameworks can match.

The honest problem is timing. Microsoft has moved on to its Agent Framework, and AutoGen is now community-maintained. If you're starting fresh in mid-2026, that's a meaningful factor. LangGraph offers more predictable control flow. CrewAI offers faster onboarding. Both are under active development.

If you're already running AutoGen v0.4, stick with it. The framework still works, the community is active, and a forced migration has real costs. If you're choosing for the first time, be honest about whether you're picking AutoGen for its technical merits or out of name recognition.

Key features

Conversable agents that talk to each other in structured group chats
Code execution in isolated Docker containers or local subprocesses
AutoGen Studio: no-code GUI for designing multi-agent workflows
Multi-model support across OpenAI, Azure, Anthropic, Gemini, and local models
Async event-driven runtime in v0.4 for scalable concurrent agent execution
MCP server integration for extending agent capabilities
Human-in-the-loop support for approval and intervention at runtime

Frequently Asked Questions

What is AutoGen?

AutoGen is an open-source framework from Microsoft Research for building multi-agent AI systems. Agents in AutoGen can send messages to each other, use tools, execute code, and loop until a task is complete. The framework introduced the conversational agent pattern where multiple AI agents collaborate by talking to each other, a design that influenced most subsequent multi-agent frameworks. It supports Python and .NET and works with OpenAI, Azure OpenAI, Anthropic, Google Gemini, and local models.

Is AutoGen free?

Yes. AutoGen is MIT-licensed and completely free to use, modify, and deploy. There is no hosted platform fee. Your only costs are the API calls your agents make to LLM providers like OpenAI or Anthropic, which you pay directly through your own accounts.

How does AutoGen compare to CrewAI?

AutoGen gives you more explicit control over agent communication and supports event-driven async execution, which matters for complex workflows. CrewAI is faster to get started with because it uses a role-and-crew mental model that maps naturally to how people think about team tasks. AutoGen's group chat orchestration is more flexible but also harder to reason about when agents go off-script. If you want simple delegation flows, [CrewAI](/frameworks/crewai/) is easier. If you need fine-grained control over how agents talk to each other, AutoGen gives you more levers.

What changed in AutoGen v0.4?

AutoGen v0.4 was a ground-up redesign released in 2025. It introduced a layered architecture with three distinct APIs: Core for event-driven low-level control, AgentChat for high-level conversational workflows, and Extensions for third-party integrations. The async runtime replaced the synchronous execution model. This broke compatibility with all v0.2 code, which caused significant frustration in the community since most existing tutorials and production code targeted the old API.

Does AutoGen have a GUI?

Yes. AutoGen Studio is a no-code web interface included with the framework. You can use it to define agents, configure their tools and models, assemble them into teams, and run workflows without writing Python. It's best suited for prototyping and demos. For production systems, you'll still want to manage agent logic in code where you can version-control and test it.

Should I use AutoGen in 2026?

It depends on your situation. If you're starting a new project, Microsoft now recommends its Agent Framework over AutoGen, which is officially in maintenance mode. AutoGen still works, has a large community, and isn't going away soon, but you won't get new features. If you have existing AutoGen code on v0.4, staying put is reasonable. For new projects, evaluate whether the Agent Framework, [LangGraph](/frameworks/langgraph/), or [CrewAI](/frameworks/crewai/) better fits your needs before defaulting to AutoGen.