Agentbrisk

Chain of Thought vs ReAct: Which AI Reasoning Pattern Wins?

March 8, 2026 · Editorial Team · 10 min read · prompting, reasoning, agent-design

Two prompting patterns dominate the conversation whenever developers talk about making language models think more carefully: Chain of Thought and ReAct. Both techniques ask the model to reason before answering. Both improve performance on complex tasks compared to a plain prompt. But they are built for different jobs, and using one where you need the other is a reliable way to end up with an agent that is slower, less accurate, or both.

This guide explains what each pattern actually does under the hood, where each one is genuinely strong, where each one falls apart, and how to decide which one belongs in your next project. If you are new to the idea of AI agents in general, start with How Do AI Agents Work? before reading this.

What Chain of Thought actually is

Chain of Thought - usually abbreviated CoT - was introduced in a 2022 paper by Jason Wei and colleagues at Google Brain. The core observation was simple: if you give a language model a math problem and ask it for the answer directly, it gets it wrong more often than if you tell it to show its work. Asking the model to produce intermediate reasoning steps before the final answer improved performance on multi-step reasoning benchmarks dramatically.

The basic form is a few-shot prompt where you include worked examples. Each example shows the problem, a step-by-step reasoning chain, and then the answer. The model picks up the pattern from the prompt and applies it to new problems. A variant called zero-shot CoT skips the examples entirely and simply appends the phrase "Let's think step by step" to the prompt. It works surprisingly well, most likely because the phrase cues the model to write out intermediate steps before committing to an answer.
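The two prompt shapes described above can be sketched in a few lines. This is a minimal illustration, not a canonical template: the `WorkedExample` type, the function names, and the `Q:` / `Reasoning:` / `A:` labels are assumptions chosen for readability.

```python
from typing import NamedTuple


class WorkedExample(NamedTuple):
    question: str
    reasoning: str  # the step-by-step chain shown to the model
    answer: str


def build_few_shot_cot_prompt(examples: list[WorkedExample], question: str) -> str:
    """Few-shot CoT: each example shows the problem, the reasoning, then the answer."""
    blocks = [
        f"Q: {ex.question}\nReasoning: {ex.reasoning}\nA: {ex.answer}"
        for ex in examples
    ]
    # The last block leaves the reasoning open for the model to continue.
    blocks.append(f"Q: {question}\nReasoning:")
    return "\n\n".join(blocks)


def build_zero_shot_cot_prompt(question: str) -> str:
    """Zero-shot CoT: no examples, only the trigger phrase."""
    return f"Q: {question}\nA: Let's think step by step."


examples = [
    WorkedExample(
        question="A shelf holds 3 boxes with 4 books each. How many books?",
        reasoning="There are 3 boxes. Each box has 4 books. 3 * 4 = 12.",
        answer="12",
    )
]
prompt = build_few_shot_cot_prompt(
    examples, "A crate holds 5 bags of 6 apples. How many apples?"
)
print(prompt)
```

Either string would then be sent to whatever model client you use; that call is left out here because it depends on the provider.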

What CoT does not do is reach outside the model's own knowledge. The reasoning chain is entirely self-contained. The model thinks, reasons, and concludes - all inside a single inference call. There are no tool calls, no API lookups, no search queries. The model has what it learned during training plus whatever you put in the prompt, and that is the complete budget it has to work with.

That constraint is also the design. CoT is meant for problems where the information is already in the model - math problems, logical puzzles, code debugging, reading comprehension, causal reasoning. For those tasks, the bottleneck is not missing information but the model's tendency to jump to an answer without thinking through the intermediate steps. CoT fixes that by making the intermediate steps explicit.

What ReAct actually is

ReAct, introduced in a paper by Shunyu Yao and colleagues later in 2022, takes a different approach. Instead of keeping everything inside a single inference, the model is given tools it can call - a search engine, a calculator, a code interpreter, a database query interface. The model reasons about what action to take next, calls the tool, reads the result, then reasons about the next step.

The pattern is a repeating loop: Thought, Action, Observation, repeat. The "Thought" is natural language reasoning. The "Action" is a structured tool call. The "Observation" is the tool's response, appended to the context. The model keeps looping until it has enough information to produce a final answer.
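A rough sketch of that loop is shown here. To keep it runnable offline, the model is replaced by a scripted stand-in (`scripted_model`), and the tool registry, the `Action: name[argument]` syntax, and the `Final Answer:` marker are all assumptions made for this example rather than a fixed standard.

```python
from typing import Callable

# Hypothetical tool registry: tool name -> function taking one string argument.
TOOLS: dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(float(x) for x in arg.split(","))),
}


def scripted_model(transcript: str) -> str:
    """Stand-in for an LLM call so the sketch runs without a network."""
    if "Observation:" not in transcript:
        return "Thought: I need to add the two numbers.\nAction: add[2,3]"
    return "Thought: The observation gives the sum.\nFinal Answer: 5.0"


def react_loop(question: str, model: Callable[[str], str], max_steps: int = 5) -> str:
    transcript = f"Question: {question}"
    for _step in range(max_steps):
        output = model(transcript)  # Thought plus an Action or a Final Answer
        transcript += "\n" + output
        if "Final Answer:" in output:
            return output.split("Final Answer:", 1)[1].strip()
        # Look for a line of the form "Action: tool_name[argument]".
        action = next(
            (ln for ln in output.splitlines() if ln.startswith("Action:")), None
        )
        if action is None:
            observation = "No action found. Reply with an Action or a Final Answer."
        else:
            name, _, arg = action[len("Action:"):].strip().partition("[")
            tool = TOOLS.get(name)
            observation = tool(arg.rstrip("]")) if tool else f"Unknown tool: {name}"
        # The observation is appended so the next inference can read it.
        transcript += f"\nObservation: {observation}"
    return "Stopped: step limit reached"


print(react_loop("What is 2 + 3?", scripted_model))
```

The whole transcript is passed back to the model on every turn, which is why context grows with each step.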

ReAct is described in much more detail in ReAct Pattern Explained, but the key distinction from CoT is that the reasoning and the external world are interleaved. The model does not have to know the answer upfront. It can discover information through actions and refine its reasoning as new observations come in.

This makes ReAct genuinely useful for tasks that require current information, multi-step retrieval, or actions whose results cannot be predicted before execution. A ReAct agent can look up a stock price, read a documentation page, run a calculation, and combine all three to answer a question that no single piece of training data would cover.

The structural difference in one sentence

Chain of Thought keeps all reasoning inside a single closed inference, while ReAct extends reasoning across multiple inference steps connected by real-world tool calls.

That structural difference flows into every practical tradeoff between them.

Where Chain of Thought is stronger

CoT outperforms ReAct in scenarios where the task is self-contained and latency matters. Because CoT runs in a single inference call, it is significantly faster than a ReAct loop that might make three, five, or ten tool calls before finishing. For production systems with real users waiting on responses, that speed difference is often decisive.

CoT also tends to produce cleaner, more coherent reasoning on tasks that are primarily about logical structure rather than information retrieval. When you need a model to walk through a multi-step math proof, evaluate a legal argument, or write code for a well-specified algorithm, CoT gives the model a clean workspace to think. There is no distraction from parsing tool outputs or deciding which action to take next.

Reliability is another advantage. CoT has no external dependencies. There are no tools that can time out, return unexpected formats, or fail with rate limit errors. A well-prompted CoT chain either works or it does not - and when it does not, the failure is usually traceable to the reasoning itself rather than an environmental factor.

For cost-sensitive applications, CoT is almost always cheaper. ReAct agents accumulate context with every tool call, and because each subsequent inference step re-reads that growing context, input token costs rise with every step.
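The cost effect is easy to see with back-of-the-envelope arithmetic. The sketch below assumes a fixed number of tokens is added per step and that the full context is re-sent on every inference; it ignores output tokens and any prompt caching a provider may offer. The numbers are illustrative, not measurements.

```python
def react_input_tokens(prompt_tokens: int, tokens_added_per_step: int, steps: int) -> int:
    """Total input tokens read across a ReAct loop, assuming no caching."""
    total = 0
    context = prompt_tokens
    for _ in range(steps):
        total += context                  # each inference re-reads the whole context
        context += tokens_added_per_step  # thought + action + observation get appended
    return total


single_call = react_input_tokens(500, 300, 1)  # one inference, like a CoT call
five_steps = react_input_tokens(500, 300, 5)   # 500 + 800 + 1100 + 1400 + 1700
print(single_call, five_steps)  # 500 5500
```

Under these assumptions the total grows roughly quadratically with the number of steps, since step k re-reads everything appended by the k-1 steps before it.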

Where ReAct is stronger

ReAct's advantage is completeness. For any task that requires information the model cannot possibly have - current events, proprietary databases, real-time calculations, user-specific data - CoT cannot help. The model can reason beautifully about information it does not have and produce a confident, well-structured wrong answer. ReAct solves this by letting the model go get the information it needs.

ReAct also handles tasks with conditional structure better than CoT. When step three of a task depends on the output of step two in a way that cannot be predicted upfront, CoT can only reason about it in the abstract. A ReAct agent can actually execute step two, see what it returns, and then reason about step three with real data. This is the core of agentic behavior, and it is why most serious agent frameworks are built around the ReAct loop.

Error recovery is another area where ReAct has an edge. If a tool call fails or returns something unexpected, the model can read that observation and try a different approach. A CoT chain that hits an implicit assumption it cannot satisfy will often just continue reasoning incorrectly rather than correcting course.

For multi-step workflows that interact with external systems - the kind of tasks described in AI Agent Architecture Patterns - ReAct is usually the right structural choice.

The common failure modes

Understanding where each pattern breaks down is at least as important as understanding where each one works.

CoT's most consistent failure mode is confident hallucination. The model produces a clean, well-structured reasoning chain that contains a factual error somewhere in the middle. Every subsequent step in the chain builds on that error, and the final answer is wrong in a way that looks convincing. CoT does not have a mechanism to catch this. The model cannot verify claims against external sources, so it has no way to notice when it has stated something false.

CoT also degrades on very long or branching problems. The reasoning chain grows long enough that the model starts losing track of its own earlier steps, or it picks an early framing that turns out to be wrong and cannot backtrack without abandoning the chain.

ReAct's failure modes are different. The most common one is looping: the model keeps calling tools, getting observations that are slightly unsatisfying, and making the same or very similar calls again. Without a hard step limit and an explicit exit condition, ReAct agents can exhaust their context window or rack up significant API costs on a task they are not actually making progress on.
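One simple guard against looping is to track the actions taken so far and abort when the step budget is spent or the same call keeps recurring. The function name and thresholds below are hypothetical defaults. Exact-match counting will also miss "very similar" calls, which would need some normalization or similarity check on top.

```python
from collections import Counter


def should_abort(
    actions: list[tuple[str, str]],  # (tool name, argument) for each step so far
    max_steps: int = 8,
    max_repeats: int = 2,
) -> bool:
    """Return True when the agent has hit the step limit or is repeating itself."""
    if len(actions) >= max_steps:
        return True
    counts = Counter(actions)
    return any(n > max_repeats for n in counts.values())


history = [("search", "acme q3 revenue")] * 3
print(should_abort(history))  # True: the same call was made three times
```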

ReAct agents are also more sensitive to tool output quality. If a search result returns a poorly formatted page, or a database query returns an ambiguous schema, the model may reason incorrectly about the observation and go down the wrong path. CoT reasoning is insulated from this kind of noise because it consumes no tool outputs, but ReAct observations become part of the reasoning context, and noise in the observations creates noise in the reasoning.

Finally, ReAct is harder to debug and audit. A long chain of thought is easy to read. A ReAct trace with many tool calls and observations is harder to follow, and identifying where the reasoning went wrong requires inspecting both the thought steps and the tool outputs at each turn.

Hybrid approaches worth knowing

In practice, CoT and ReAct are not mutually exclusive. Most sophisticated agent systems use both, often at different levels of the same pipeline.

One common pattern is to use ReAct at the outer level for tool selection and information gathering, but switch to CoT inside a specific tool call that requires pure reasoning. For example, a ReAct agent might search for relevant documents and then call a summarization step that uses CoT internally to reason about the content.

Another pattern is plan-then-execute: use CoT to produce a high-level plan for a complex task, then use ReAct to execute each step of the plan, allowing the agent to adapt when real tool results differ from what the plan assumed. This is one of the patterns covered in AI Agent Architecture Patterns.
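A skeletal version of plan-then-execute might look like the following. The `plan`, `execute`, and `replan` callables are placeholders: in a real system the first would be a CoT-style model call, the second a ReAct-style step runner, and the third another planning call. Here they are stubbed so the control flow can be run and inspected.

```python
from typing import Callable


def plan_then_execute(
    task: str,
    plan: Callable[[str], list[str]],
    execute: Callable[[str], tuple[bool, str]],
    replan: Callable[[str, list[str]], list[str]],
    max_replans: int = 1,
) -> list[str]:
    steps = list(plan(task))
    results: list[str] = []
    replans = 0
    while steps:
        step = steps.pop(0)
        ok, result = execute(step)
        results.append(result)
        # When reality differs from the plan, ask for a new plan for the remainder.
        if not ok and replans < max_replans:
            steps = list(replan(task, results))
            replans += 1
    return results


# Stubs standing in for model-backed components.
def stub_plan(task: str) -> list[str]:
    return ["fetch price", "compute total"]


def stub_execute(step: str) -> tuple[bool, str]:
    if step == "fetch price":
        return False, "price API unavailable"
    return True, f"done: {step}"


def stub_replan(task: str, results: list[str]) -> list[str]:
    return ["use cached price", "compute total"]


print(plan_then_execute("quote an order", stub_plan, stub_execute, stub_replan))
```

Capping replans is a design choice made for this sketch, so that a persistently failing step cannot trigger planning forever.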

Tree of Thoughts (ToT) is a later extension of CoT, proposed in 2023, that addresses the backtracking limitation by letting the model explore multiple reasoning branches and evaluate which one is most promising before committing. It is more expensive than plain CoT but handles branching problems significantly better.
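The branch-and-evaluate idea can be illustrated with a small breadth-first search that keeps only the best few partial solutions at each level. This is a toy under stated assumptions: in a real setup both `expand` (propose next thoughts) and `score` (evaluate a partial chain) would be model calls, whereas here they are plain functions over tuples of numbers.

```python
from typing import Callable

State = tuple[int, ...]


def branch_and_prune(
    root: State,
    expand: Callable[[State], list[State]],
    score: Callable[[State], float],
    depth: int,
    beam_width: int,
) -> State:
    """Explore several branches per level and keep the most promising ones."""
    frontier = [root]
    for _ in range(depth):
        candidates = [child for state in frontier for child in expand(state)]
        if not candidates:
            break
        frontier = sorted(candidates, key=score, reverse=True)[:beam_width]
    return max(frontier, key=score)


# Toy problem: pick three numbers from {1, 2, 3} whose sum is as close to 7 as possible.
def expand(state: State) -> list[State]:
    return [state + (n,) for n in (1, 2, 3)]


def score(state: State) -> float:
    return -abs(7 - sum(state))


best = branch_and_prune((), expand, score, depth=3, beam_width=2)
print(best, sum(best))
```

The extra cost mentioned above shows up directly: every level evaluates several candidates instead of extending a single chain.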

How to choose between them

The decision mostly comes down to four questions.

Does the task require information outside the model's training data? If yes, CoT alone cannot work. Use ReAct or a pipeline that includes retrieval.

Does the task require real-time actions with unpredictable outputs, like running code or querying a live database? Again, ReAct is the right structural choice.

Is latency a hard constraint? If users need a response in under two seconds, a multi-step ReAct loop is going to struggle. CoT with a well-tuned prompt is more likely to fit in the latency budget.

Is the reasoning itself the hard part, or is gathering the right information the hard part? If the problem is that the model is not thinking carefully enough, CoT addresses that. If the problem is that the model is missing critical information, ReAct addresses that.
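The four questions can be written down as a small decision helper. The field names and the order in which the checks run are assumptions made for this sketch; the article's questions do not prescribe a strict priority, and real projects will weigh them case by case.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    needs_external_info: bool  # information outside the model's training data?
    needs_live_actions: bool   # running code, querying a live database, etc.
    latency_critical: bool     # hard response-time budget?
    bottleneck: str            # "reasoning" or "information"


def choose_pattern(task: TaskProfile) -> str:
    # CoT alone cannot supply missing information or perform actions.
    if task.needs_external_info or task.needs_live_actions:
        return "ReAct"
    # Self-contained task with a tight latency budget: a single inference call.
    if task.latency_critical:
        return "CoT"
    return "ReAct" if task.bottleneck == "information" else "CoT"


print(choose_pattern(TaskProfile(False, False, True, "reasoning")))  # CoT
print(choose_pattern(TaskProfile(True, False, False, "information")))  # ReAct
```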

These questions are also covered in the broader context of Prompt Engineering for Agents, which goes into how prompting choices interact with the rest of your agent's architecture.

Practical implementation notes

For CoT, the highest-impact prompting choice is usually the quality of your few-shot examples. Well-written examples that demonstrate clean step-by-step reasoning transfer well to new problems. Poorly written examples - ones that skip steps or reason in an unclear order - tend to produce output with the same flaws. For zero-shot CoT, "Let's think step by step" works, but more specific instructions like "First identify what is being asked, then list what you know, then work through the answer" tend to work better for domain-specific tasks.

For ReAct, the most important implementation detail is the stopping condition. Define explicitly what constitutes a complete answer and build a mechanism that recognizes it. Without this, agents drift. A maximum step limit is a necessary safety valve, but it should be paired with an explicit final-answer instruction that tells the model when it has enough information to stop.

Tool design also matters far more for ReAct than most people expect when they start building agents. Tools that return clean, predictable outputs are significantly easier for the model to reason about than tools that return unstructured text, HTML, or error messages without standard formats. Spending time on tool output formatting usually pays off more than spending the same time on prompt optimization.
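One way to make tool outputs predictable is to wrap every tool in a function that always returns the same JSON envelope, whether the call succeeds or fails. The envelope fields (`tool`, `status`, `data`, `truncated`, `error`) and the 500-character cap are assumptions for illustration, not an established schema.

```python
import json
from typing import Any, Callable


def tool_result(
    tool_name: str, fn: Callable[[str], Any], argument: str, max_chars: int = 500
) -> str:
    """Run a tool and return a uniform JSON observation for the model to read."""
    try:
        data = str(fn(argument))
        payload = {
            "tool": tool_name,
            "status": "ok",
            "data": data[:max_chars],          # cap long outputs to protect the context
            "truncated": len(data) > max_chars,
        }
    except Exception as exc:
        # Failures use the same shape, so the model can read them and try something else.
        payload = {
            "tool": tool_name,
            "status": "error",
            "error": f"{type(exc).__name__}: {exc}",
        }
    return json.dumps(payload)


print(tool_result("parse_int", int, "42"))
print(tool_result("parse_int", int, "abc"))
```

Because errors come back as ordinary observations rather than exceptions, the agent loop keeps running and the model gets a chance to correct course.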

The pattern is not the bottleneck

Both Chain of Thought and ReAct are techniques for helping a language model use its capability more effectively. Neither one makes a weak model into a strong one. A model that does not have the knowledge or capability to solve a problem will not solve it better just because you added a reasoning pattern.

The real work is matching the pattern to the task, the task to the right model, and the model to appropriate tools. The pattern choice matters at the margin - sometimes significantly - but it is one variable among several. If your agent is underperforming, the reasoning pattern is worth examining, but so is the model size, the quality of your training data if you are fine-tuning, and the design of the tools themselves.

Both CoT and ReAct have earned their place in the production agent toolkit. Understanding when each one is appropriate is one of the more transferable skills in applied AI development.
