The ReAct Pattern Explained: How AI Agents Reason and Act
If you spend any time reading about AI agents, the acronym "ReAct" shows up constantly. It shows up in framework documentation, research papers, product announcements, and casual blog posts, often without much explanation. Most people absorb a vague sense that ReAct has something to do with reasoning, or with tool use, and move on. That vagueness is fine until you need to actually build something, or until you're trying to understand why an agent is behaving the way it is.
This guide covers what the ReAct pattern actually is, where it came from, how it works step by step, how it compares to earlier prompting approaches, and where you'll encounter it in production systems today. No prerequisites needed beyond a basic familiarity with how language models work.
What ReAct stands for
ReAct is short for "Reason + Act." It's a prompting and execution pattern for language models that interleaves two kinds of output: reasoning traces (the model thinking out loud about what to do) and actions (actual calls to tools or the environment). The model alternates between these two modes until the task is complete.
The name is also a deliberate play on "react," in the sense of responding to observations. After each action, the model receives an observation back from the environment, and that observation informs the next reasoning step. The loop is: think, act, observe, think again, act again, and so on.
The pattern was introduced in a 2022 paper by Yao et al. titled "ReAct: Synergizing Reasoning and Acting in Language Models." The paper showed that combining reasoning traces with action calls outperformed both pure reasoning and pure action-selection on a range of tasks, including question answering, fact verification, and interactive decision-making. That result held up, and the pattern became the backbone of almost every agent framework built afterward.
The problem ReAct was designed to solve
To understand why ReAct mattered, you have to understand what the alternatives were, and what was wrong with them.
Before ReAct, the dominant approach for getting an LLM to handle complex tasks was chain-of-thought prompting (CoT). Chain-of-thought asks the model to reason step by step before giving a final answer. It works well for math, logic, and multi-step reasoning that can be done entirely inside the model's head. The problem is that it's closed. The model can only use information that was in its context at the time of the prompt. If it needs to look something up, check a live data source, run code, or take any action in the real world, pure chain-of-thought can't help. It just reasons about what it already knows, which is often not enough.
There were also action-only approaches, early tool-use systems where the model would simply select a tool and call it without any explicit reasoning trace. These worked for simple cases, but fell apart on harder tasks where the right action depended on understanding the state of a multi-step problem. Without a reasoning trace, the model had no way to build up a coherent plan across multiple tool calls.
ReAct combined both. The reasoning trace gives the model a scratchpad to work out what's needed before each action. The action calls give it access to external information and capabilities. The observations from those actions feed back into the next reasoning step, so the model can update its understanding as it goes. That feedback loop is what makes ReAct qualitatively different from its predecessors.
The ReAct loop in detail
The mechanics are worth going through precisely, because "reason + act" is vague until you see what it looks like in a real prompt.
A ReAct trace has four alternating components:
Thought. The model generates a reasoning step in natural language. This is not returned to the user. It's the model working out what the situation is and what it should do next. A thought might be: "The user wants the population of Lagos in 2023. I don't have this in context. I should search for it."
Action. The model outputs a structured action call. In the original paper this was a search query; in modern implementations it's usually a function call in JSON format. The action specifies which tool to use and what arguments to pass. The model doesn't execute the action itself; it outputs the call and waits.
Observation. The external environment (the code running around the model) executes the action and returns a result. That result is injected back into the model's context as an observation. The model didn't produce this; it received it from outside.
Thought. The model reads the observation and reasons about it. "The search returned a figure of 15.9 million for 2023. That answers the question. I can now write the final response."
This cycle repeats until the model determines the task is complete and produces a final answer instead of another thought/action pair.
The critical thing here is that the model is not running in a vacuum. Every observation it receives is grounded in actual external state. That's what makes a ReAct agent capable of doing things that chain-of-thought can't: looking up current information, reading files, running code, interacting with APIs.
Understanding how AI agents work at a structural level makes the ReAct loop easier to situate. The loop is the agent's "inner loop," the thing that runs inside whatever orchestration layer is managing the overall task.
ReAct vs. chain-of-thought: a concrete comparison
The easiest way to understand the difference is to think about what each approach can handle.
With chain-of-thought, you give the model a problem and it works through it step by step. All of the "steps" are inside the model's context. If the answer requires reasoning about information the model was trained on, CoT works well. If the answer requires a fact the model doesn't have, or a computation it can't do reliably in text (like precise arithmetic on large numbers, or checking whether a file contains a certain string), CoT either fails or hallucinates.
With ReAct, the model can pause its reasoning to call a tool, get back real information, and continue. The reasoning trace serves as a working memory that persists across multiple tool calls. The model can build up a coherent picture of a complex situation over several rounds.
Here's a stylized example. Suppose you ask: "What's the current ratio for Apple, and is it above the industry median for consumer electronics?"
A chain-of-thought approach would attempt to reason from training data, likely getting stale numbers and potentially fabricating the industry median.
A ReAct approach would produce something like: "I need Apple's current ratio. I'll look up their latest balance sheet. [calls financial data tool] Balance sheet returned: current assets $135B, current liabilities $145B, ratio 0.93. Now I need the industry median. [calls second search or data tool] Industry median current ratio for consumer electronics: 1.2. Apple's ratio of 0.93 is below the median."
The final answer is grounded in actual current data. The reasoning trace is what kept the multi-step comparison coherent.
How the original ReAct paper demonstrated this
The 2022 paper tested ReAct on three task categories: HotpotQA (multi-hop question answering requiring reasoning over multiple Wikipedia passages), FEVER (fact verification), and ALFWorld (interactive text game requiring multi-step planning in a simulated household).
The key finding was consistent across all three: combining reasoning traces with action calls outperformed either approach used alone. Chain-of-thought on its own was brittle because it couldn't look things up. Action-selection without reasoning traces failed on longer-horizon tasks because there was no mechanism to maintain coherent plans across steps.
The paper also showed something important about failure modes. ReAct agents failed in two distinct ways: "hallucination" errors, where the model reasoned incorrectly even with access to correct observations, and "snowballing" errors, where an early incorrect assumption propagated through subsequent reasoning steps. Identifying these distinct failure modes was as valuable as the performance improvement, because it told practitioners where to focus when debugging.
ReAct in modern agent frameworks
You don't see raw ReAct prompts in most production code anymore. The pattern has been absorbed into the abstractions that frameworks provide. But it's worth knowing where it lives.
In LangGraph, the ReAct loop is implemented as a state graph with a specific topology: a reasoning node, an action-dispatch node, and edges that route based on whether the model output an action or a final answer. The framework handles the injection of observations back into context. You're not writing thought/action/observation formatting by hand; the framework does it. But if you look at the message sequence in the LLM's context during execution, you'll see exactly the ReAct structure.
LangChain's AgentExecutor is an older abstraction that also implements ReAct, along with variants like the OpenAI functions agent format that uses structured function calls instead of free-text action descriptions. The underlying pattern is the same.
OpenAI's function calling API, and Anthropic's tool use API, are both designed to support ReAct-style loops. The model output either produces a tool call or a final response. The application layer detects which one, executes tool calls, injects results, and loops. Again: ReAct, just with a different surface syntax.
Claude Code is a good example of a production agent that runs a ReAct-style loop over a long coding session. It reasons about the codebase, calls tools to read files, run tests, and apply edits, observes the results, and decides what to do next. The loop is not literally the 2022 paper's prompt format, but the structure is the same.
Common extensions and variants
The original ReAct formulation has been extended in several directions.
Reflexion. A variant where the agent, after completing a task, reflects on what went wrong and generates a written self-critique. This critique is stored and included in the prompt on subsequent attempts. It's ReAct with an outer learning loop around it. Research showed it significantly improved performance on tasks where a single attempt wasn't enough.
Plan-then-ReAct. A two-stage approach where a planning step generates a structured plan before the ReAct loop begins. The plan acts as a scaffold that the ReAct loop follows, which improves coherence on long-horizon tasks but reduces flexibility when the plan turns out to be wrong. This maps onto the "planner + executor" architecture described in the AI agent architecture patterns guide.
Multi-agent ReAct. Individual agents each run their own ReAct loop, but they can also call other agents as tools. This is how most modern multi-agent systems work. A coordinator agent runs a ReAct loop where some of the available actions are "delegate to agent X." Each sub-agent runs its own ReAct loop when called.
Self-consistency. Running multiple independent ReAct traces on the same task and aggregating the results, usually by taking a majority vote on the final answer. This helps with tasks where the model's reasoning is noisy and a single trace is unreliable. It costs more (multiple model calls) but can meaningfully improve accuracy on difficult questions.
What makes a ReAct agent go wrong
Understanding the failure modes helps with debugging and with designing more reliable systems.
Reasoning errors that accumulate. The model draws an incorrect conclusion in a thought step, and subsequent reasoning is built on that wrong conclusion. The observation from the next tool call might contradict it, but if the model doesn't notice the contradiction, it keeps going down the wrong path. Long chains are more vulnerable to this.
Tool call errors that don't propagate correctly. A tool returns an error, a malformed response, or empty results. If the agent doesn't handle this gracefully, it either stalls, retries indefinitely, or reasons from the error message as if it were a real observation. Good ReAct implementations treat tool errors as first-class observations and reason explicitly about them: "The search returned no results. I should try a different query."
Loop termination failures. The agent can't determine that the task is complete and keeps generating new thought/action pairs. Or it terminates too early because it convinced itself the task was done when it wasn't. Both failure modes require explicit loop management: maximum iteration limits and clear termination conditions.
Context window exhaustion. On long tasks with many reasoning steps, the full thought/action/observation history grows. At some point, early parts of the trace get truncated. The model loses access to earlier observations and starts contradicting itself. Production systems handle this with context compression: summarizing completed sections of the trace before they would be truncated.
Why ReAct became the standard
The pattern is widely used because it's actually a good match for how language models work. Models trained on human text have absorbed a lot of implicit structure about how humans reason and take action: they describe what they're trying to do, do it, see what happened, and respond. The ReAct format makes that implicit structure explicit and gives the model a scaffold to follow.
It also helps with interpretability. A pure action-selection approach gives you a sequence of tool calls with no explanation. A ReAct trace gives you a chain of reasoning that you can read and audit. When something goes wrong, you can often identify exactly where the reasoning went off the rails. For production systems where debugging and auditability matter, that's a genuine advantage.
The pattern is not perfect. The reasoning traces take tokens, and tokens cost money. For simple tool-use tasks where the right action is obvious, the reasoning overhead is unnecessary. This is why some frameworks let you configure "act-only" mode for steps that don't need explicit reasoning. The general principle is that reasoning traces are worth the cost when the action selection is non-obvious or when multi-step coherence matters.
Putting it into practice
If you're evaluating agents or frameworks and want to understand whether a system uses ReAct, look at the message sequence sent to the model during execution. If you see interleaved thought and tool-call content with observations injected between turns, that's ReAct regardless of what name the framework uses for it.
If you're building an agent and deciding whether to use a ReAct-style loop, the answer is almost always yes for any task that requires more than one tool call and where the right tool calls aren't known in advance. The overhead is modest, and the benefit in coherence and debuggability is real.
If you're debugging a ReAct agent, start with the reasoning traces. Most failures are visible in the thought steps before they show up in the final output. An agent that produces a wrong answer usually shows you where it went wrong if you read the trace.
The ReAct pattern is now old enough that "using ReAct" is roughly equivalent to "using modern agent patterns." It's the foundation that almost everything else is built on. Understanding it clearly makes the rest of the agent landscape easier to navigate.