Agentbrisk
Python Apache-2.0 orchestrationcode-execution

smolagents

HuggingFace's minimal agent framework where the LLM writes Python, not JSON


smolagents is HuggingFace's minimal Python agent framework built around one unconventional idea: let the LLM write real Python code instead of structured JSON tool calls. The core logic fits in roughly a thousand lines. The CodeAgent pattern it promotes has benchmark results behind it, and for developers who find larger frameworks bloated, smolagents is one of the cleaner starting points available.

Most agent frameworks share the same basic idea: the LLM decides which tool to call, the framework formats that decision as a JSON object, the runtime executes the tool, and the result goes back into the context. smolagents rejects that design at the foundation. Instead of JSON tool dispatch, the LLM writes actual Python code. The CodeAgent runs that code in an interpreter, feeds the output back, and the loop continues until the task is done.

It sounds like a small architectural choice. In practice, it changes how agents compose actions, handle complex outputs, and build on each other's work in ways that matter at the edges of what agents can do.

The problem smolagents is solving

Tool-calling agents have a hidden ceiling. Every action has to fit into the schema of a single function call: one tool, one set of arguments, one output. When the task needs to loop over a list, filter results, nest one tool's output into another tool's input, or run a quick calculation between two API calls, JSON tool dispatch forces awkward workarounds. You either add more tools or ask the LLM to do logic inside its text response, which is fragile.

Code agents sidestep this. When the LLM can write for item in results: process(item) directly, loops, conditionals, and function composition are just Python. The framework does not need special syntax for branching or iteration because the programming language already has it.

The research behind this design is not vague intuition. The paper "Executable Code Actions Elicit Better LLM Agents" (Wang et al., 2024) showed code-based action representations outperform JSON and text-based formats on a range of agent benchmarks. HuggingFace's own benchmark comparing CodeAgent against ToolCallingAgent across multiple models showed consistent wins for the code approach, particularly on tasks that require multi-step computation.

Why the small codebase matters

smolagents is explicit about its philosophy: the core agent logic fits in roughly a thousand lines. That is not a marketing number; you can read the source in an afternoon.

This matters for a reason that most framework documentation does not mention: when something breaks in production, you need to understand what the framework is actually doing. With LangChain, tracing a bug through several layers of abstraction is a genuine skill that takes time to develop. With smolagents, there are fewer layers to trace through. The control flow is legible.

The small codebase is also a contribution to trust. You can audit what the framework does with your model calls, how it formats prompts, how it handles errors. For teams that are security-conscious or working with sensitive data, that legibility has real value beyond developer comfort.


CodeAgent: agents that write Python instead of JSON

The CodeAgent is the default and recommended agent class in smolagents. At each step of the ReAct loop, instead of generating a JSON blob that specifies a tool name and arguments, the LLM generates a Python code snippet. That snippet gets executed, and the result (whatever type it is) flows back into the context as the observation.

from smolagents import CodeAgent, DuckDuckGoSearchTool, InferenceClientModel

model = InferenceClientModel(model_id="meta-llama/Llama-3.3-70B-Instruct")
agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=model)

agent.run("Find the three most cited papers on code generation from 2024 and summarize their main contributions.")

The LLM might produce something like:

results = web_search("most cited code generation papers 2024")
papers = [r for r in results if "2024" in r["date"]]
for paper in papers[:3]:
    detail = web_search(f"{paper['title']} main contributions abstract")
    print(detail)

This is just Python. The agent can loop, filter, call the same tool multiple times with different arguments, and pass complex objects between steps without the framework needing special support for any of those patterns.

Minimal core, easy to read

The entire framework lives in a small number of files. The agent loop, the tool abstraction, the model wrapper, and the executor are all written to be read, not just used. There are no deep inheritance chains or plugin architectures that require understanding eight files to trace a single request.

This design choice has a practical consequence for debugging. When an agent misbehaves, the first place to look is the prompt that got sent to the model. smolagents makes it straightforward to log and inspect those prompts because the code is right there. Frameworks built for extensibility often trade this transparency for configurability, which is a reasonable choice, but a different one.

Sandboxed execution with E2B and Docker

The obvious concern with running LLM-generated code is security. smolagents is direct about this: the default LocalPythonExecutor is not a security boundary. If the model generates malicious code, or if a user can influence the agent's inputs, local execution is dangerous.

The framework handles this with first-class support for sandboxed runtimes. E2B provides ephemeral cloud sandboxes. Modal and Blaxel offer serverless execution environments. Docker works for self-hosted sandboxing. Switching from local to sandboxed execution is a one-line change:

from smolagents import CodeAgent, E2BExecutor

agent = CodeAgent(tools=[...], model=model, executor=E2BExecutor(api_key="..."))

The documentation leads with the security caveat rather than burying it. That reflects a more honest attitude toward agent deployment than you see in frameworks where the danger of code execution is mentioned in a footnote.

HuggingFace Hub integration

Tools and agents in smolagents can be published to and loaded from the HuggingFace Hub as Gradio Spaces. This means the community shares not just model weights but ready-to-use agent tools. A web scraper, a PDF parser, a code analysis tool: if someone has built and published it, you can pull it in directly:

from smolagents import Tool

image_tool = Tool.from_hub("m-ric/text-to-image", token="...")
agent = CodeAgent(tools=[image_tool], model=model)

The reverse also works. Tools you build can be pushed to the Hub so your team or the community can reuse them without copying code. This gives smolagents a built-in distribution mechanism that most framework ecosystems handle through package managers or documentation links.

Multi-step traces and visualization

smolagents ships with logging and visualization support for multi-step agent runs. Each step of the agent loop (what code the LLM generated, what the execution returned, what the model saw as its next observation) is captured and can be displayed through a Gradio interface or printed to the terminal.

For complex tasks that run for many steps, this trace is the primary debugging tool. It shows exactly where the agent made a wrong assumption, called a tool with bad arguments, or got stuck in a loop. Frameworks that do not capture this structure leave developers staring at logs that require reconstruction to understand.

Model support and the BYOK model

smolagents is model-agnostic by design. The InferenceClientModel connects to HuggingFace's hosted inference. LiteLLMModel supports OpenAI, Anthropic, Mistral, Cohere, and anything else LiteLLM covers, which covers most of the commercial API landscape. TransformersModel runs models locally. OllamaModel works for local inference via Ollama.

The pricing model is bring-your-own-key throughout. smolagents itself has no inference costs; you pay whatever your model provider charges, nothing more. For teams already running HuggingFace models on their own infrastructure, this integrates naturally. For teams using OpenAI or Anthropic, the LiteLLM wrapper adds minimal overhead.

The CodeAgent pattern performs best with models that have strong instruction-following and coding ability. GPT-4o, Claude Sonnet, and Llama 3.3 70B are common choices. Smaller models can work for simpler tasks, but multi-step coding agents need a model that can reason about intermediate results and write syntactically correct Python under context pressure.

Tool compatibility across ecosystems

One thoughtful design decision in smolagents is that it does not require you to rewrite tools from other ecosystems. LangChain tools can be imported directly via Tool.from_langchain(). MCP servers are supported through ToolCollection.from_mcp(). HuggingFace Spaces become tools through Tool.from_space().

This matters if you are migrating from LangChain or evaluating whether smolagents fits your existing tooling. You do not have to rewrite your web search wrapper or your database connector to try the CodeAgent approach. The existing tools plug in and the framework handles the rest.

When smolagents fits and when it does not

smolagents is the right choice for a specific type of developer and a specific type of task. If you want to understand your agent framework completely, run LLM-generated code as the primary action mechanism, and stay as close to plain Python as possible, smolagents is difficult to beat. Research teams, ML engineers building internal tools, and developers who find larger frameworks opaque will be comfortable here.

It is less suited to teams that need production-grade observability tooling comparable to what LangGraph and LangSmith offer. Multi-agent orchestration in smolagents is supported but less structured than graph-based approaches. If your workflow requires explicit state machines with branching and human-in-the-loop checkpoints, a framework that models control flow as a graph will be easier to reason about. OpenAI Swarm and LangGraph both take that direction more seriously.

For coding-specific tasks, smolagents is an obvious fit for the best AI agent for coding category. The CodeAgent's native Python execution is not a simulation of code running, it is code running, which makes it more capable and more debuggable than agents that only pretend to use code.

The benchmark argument

HuggingFace's comparison between CodeAgent and ToolCallingAgent on their internal benchmark showed code-based actions winning across model families. The gains were larger on tasks requiring multiple sequential tool calls or computation between calls, precisely the cases where JSON dispatch forces workarounds.

The caveat is that benchmarks measure what they measure. The advantage of code agents on composed, multi-step tasks does not automatically translate to every real workload. For tasks that are genuinely single-tool calls (search, retrieve, return), the ToolCallingAgent in smolagents (also available if you prefer the JSON pattern) works fine and has broader model compatibility for models that were not trained on heavy code generation.

What 27,000 stars means in practice

smolagents crossed 27,000 GitHub stars in roughly a year of existence. For an HF library that deliberately avoids features, that growth reflects a real demand for minimal agent frameworks rather than framework fatigue with the larger players. The community around it is active, the issue tracker moves fast, and the HuggingFace team releases updates regularly; v1.24.0 landed in January 2026 and development has continued since.

The open-source tooling ecosystem on the Hub is an underrated asset. Tools shared as Gradio Spaces are immediately usable, versioned, and discoverable. Over time that distribution model could become one of smolagents' more significant advantages over frameworks where the tool ecosystem is scattered across package indexes and GitHub repositories.

Verdict

smolagents is what it says it is: a small, honest, well-reasoned framework that makes the unconventional argument that LLMs should write code rather than fill in JSON templates. That argument has evidence behind it, the implementation backs it up, and the codebase is readable enough that you can verify both claims yourself.

It is not trying to be the only framework you ever need. It is trying to be the framework that gets out of the way. For the right use case, that is exactly what you want.

Key features

  • CodeAgent paradigm: LLM writes Python directly
  • Minimal core (~1000 lines of logic)
  • Sandboxed execution via E2B, Modal, Docker, and Blaxel
  • HuggingFace Hub integration for sharing tools and agents
  • Model-agnostic: OpenAI, Anthropic, local models, LiteLLM
  • MCP server and LangChain tool compatibility
  • Multi-step traces and Gradio visualization

Frequently Asked Questions

What is smolagents?
smolagents is an open-source Python library from HuggingFace for building AI agents. Its defining feature is the CodeAgent, which generates Python code to perform actions instead of calling tools through JSON. The entire agent logic fits in roughly a thousand lines of code, making it one of the smallest serious agent frameworks available.
Is smolagents free?
Yes. smolagents is Apache-2.0 licensed and completely free. You bring your own model keys: HuggingFace Inference API, OpenAI, Anthropic, or a local model via Ollama or Transformers. There is no paid tier for the framework itself.
How does smolagents compare to LangChain?
LangChain is a large ecosystem with extensive abstractions for chains, memory, retrievers, and integrations. smolagents takes the opposite approach: minimal by design, with a focus on the CodeAgent loop. If you want to understand every line of your agent code, smolagents is significantly easier to read. If you need deep ecosystem integrations or production-grade observability tooling, LangChain or LangGraph has more to offer.
Is the CodeAgent approach safe?
The default LocalPythonExecutor is not a security sandbox; running untrusted code locally is dangerous. For production use, smolagents supports sandboxed execution via E2B, Modal, Docker, and Blaxel. The documentation is explicit about this requirement, which is better than frameworks that bury the caveat.
What models work with smolagents?
Any model supported by HuggingFace Inference API, OpenAI, Anthropic, Mistral, or any provider accessible through LiteLLM. Local models work via the Transformers or Ollama integrations. The CodeAgent pattern performs best with strong instruction-following models in the GPT-4 class or equivalent open models.
Search