Explainer · Day 2

ReAct: when your AI agent stops thinking and starts doing

Chain of Thought lets a model reason. ReAct lets it act on that reasoning. That shift changes everything about how failures happen and who owns them.

2026-06-24 · 6 min

The failure this pattern exists to prevent

This kind of failure can reach production, and not just once. A customer support agent built on an LLM has access to three tools: it can look up order status, initiate a refund, and send a confirmation email. The team tests it thoroughly. It works well.

Two weeks after go-live, a user asks about a missing order. The agent looks up the order, cannot find it on the first call, and looks it up again. And again. It runs the same tool call dozens of times in a few minutes before someone notices the API cost spike and pulls the plug manually. The user never gets a response. The order was there all along: a field name mismatch meant the agent was querying the wrong identifier and treating the empty result as a reason to retry rather than a reason to escalate.

That is not a prompting problem. That is what happens when a model can take real actions in the world without a mechanism to stop when something is clearly not working.

What the pattern is

ReAct stands for Reason and Act. It was introduced by Yao et al. in the 2022 paper "ReAct: Synergizing Reasoning and Acting in Language Models" and has since become one of the most widely used patterns in modern agent frameworks.

The core idea is that reasoning and action are interleaved rather than separated. The model does not reason first and then act. It reasons, then acts, then observes what happened, then reasons again based on that observation, then acts again. The loop continues until the task is complete or a stopping condition is met.

The cycle works like this. The model receives a task. It generates a thought about what to do. It calls a tool based on that thought. It receives an observation from the tool. It generates a new thought based on the observation. It calls another tool or produces a final answer. This thought-action-observation loop is what separates a ReAct agent from a model that just answers questions.
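The cycle can be sketched as a small Python loop. This is a minimal illustration under stated assumptions, not the paper's code or any framework's API: the model is a plain function (here a scripted stand-in for a real LLM call), and `Action`, `Final`, `react_loop`, `scripted_model` and the `lookup_order` stub are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Union

# Hypothetical step types: on each turn the "model" returns either a tool call or a final answer.
@dataclass
class Action:
    thought: str
    tool: str
    args: dict

@dataclass
class Final:
    thought: str
    answer: str

Step = Union[Action, Final]

def react_loop(model: Callable[[list], Step], tools: dict, task: str, max_steps: int = 5):
    """Interleave thought -> action -> observation until the model returns a Final step."""
    history: list = [("task", task)]
    for _ in range(max_steps):
        step = model(history)
        history.append(("thought", step.thought))
        if isinstance(step, Final):
            return step.answer, history
        history.append(("action", (step.tool, step.args)))
        # The observation comes from the tool itself, never from model-generated text.
        observation = tools[step.tool](**step.args)
        history.append(("observation", observation))
    raise RuntimeError(f"no final answer after {max_steps} steps")

# A scripted stand-in for the LLM so the loop runs without any API.
def scripted_model(history: list) -> Step:
    observations = [value for kind, value in history if kind == "observation"]
    if not observations:
        return Action("I need the order status first.", "lookup_order", {"order_id": "ORD-8812"})
    return Final("The tool told me the status.", f"Status: {observations[-1]['status']}")

tools = {"lookup_order": lambda order_id: {"status": "processing"}}
answer, trace = react_loop(scripted_model, tools, "Has ORD-8812 shipped?")
print(answer)  # Status: processing
```

In a real agent, `model` would format `history` into a prompt and parse the reply, but the shape of the loop stays the same, and `max_steps` plays the role of the stopping condition.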

The reason this matters is grounding. A model running Chain of Thought reasons inside its own context, which is valuable for structured thinking but gives no feedback when the reasoning drifts from reality. A ReAct agent receives real observations from real tools and has to reason against actual data at each step. For tasks that touch the real world, that grounding meaningfully reduces one specific class of errors: confident claims about external state, such as an order's status, that the model never actually checked. It does not eliminate hallucinations, and it introduces a new class of risks that pure reasoning models do not have.

A concrete before and after

A user asks an agent to check if their order has shipped and, if it has not, to reschedule the delivery for next Friday.

Without ReAct, a model using only Chain of Thought might produce this.

Thought: The user wants to know if their order has shipped and wants to reschedule if it has not.
Answer: I have checked your order and it has not shipped yet. I have rescheduled your delivery for next Friday. You will receive a confirmation shortly.

The model invented the outcome. It has no tool access. It cannot check order status or reschedule anything. It produced a confident answer that is entirely fabricated. The user waits for a delivery that was never rescheduled.

With ReAct, the same task produces this.

Thought: I need to check the order status before deciding what to do.
Action: lookup_order(order_id="ORD-8812")
Observation: {"status": "processing", "estimated_ship": "2026-06-26"}
Thought: The order has not shipped yet. I should reschedule the delivery as the user requested.
Action: reschedule_delivery(order_id="ORD-8812", date="2026-07-03")
Observation: {"success": true, "new_date": "2026-07-03", "confirmation": "DEL-4421"}
Thought: The reschedule was successful. I can now confirm this to the user with the confirmation number.
Answer: Your order has not shipped yet. I have rescheduled your delivery for Friday 3 July. Your confirmation number is DEL-4421.

Every step is grounded in a real observation. The answer reflects what actually happened, not what the model assumed would happen.
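In the trace, each Action line is text the model wrote; the runtime still has to turn it into a real call, and that boundary is where grounding is enforced. A minimal sketch, assuming the `tool(name="value")` text format used in the example (production frameworks more often rely on structured function-calling output), with hypothetical stubs in place of real `lookup_order` and `reschedule_delivery` services:

```python
import ast

def parse_action(line: str) -> tuple[str, dict]:
    """Turn 'Action: tool(key="value", ...)' into (tool_name, kwargs) without calling eval()."""
    prefix = "Action:"
    if not line.startswith(prefix):
        raise ValueError(f"not an action line: {line!r}")
    call = ast.parse(line[len(prefix):].strip(), mode="eval").body
    # Only a bare function name with keyword arguments is accepted.
    if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Name) or call.args:
        raise ValueError(f"expected tool(name=value, ...): {line!r}")
    if any(kw.arg is None for kw in call.keywords):
        raise ValueError("**kwargs unpacking is not allowed")
    # literal_eval only accepts literals (strings, numbers, booleans, lists, dicts, ...).
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
    return call.func.id, kwargs

# Hypothetical stubs standing in for the real order-service tools.
TOOLS = {
    "lookup_order": lambda order_id: {"status": "processing", "estimated_ship": "2026-06-26"},
    "reschedule_delivery": lambda order_id, date: {"success": True, "new_date": date, "confirmation": "DEL-4421"},
}

def run_action(line: str, tools: dict) -> dict:
    """Dispatch a parsed action; an unknown tool becomes an explicit error observation."""
    name, kwargs = parse_action(line)
    if name not in tools:
        return {"error": f"unknown tool {name!r}"}
    return tools[name](**kwargs)

print(run_action('Action: lookup_order(order_id="ORD-8812")', TOOLS))
```

Parsing with `ast` rather than `eval()` matters because the action string is model output: a line such as `Action: __import__("os").system("ls")` is rejected instead of executed.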

Why this matters beyond the demo

The shift from reasoning to acting is the moment AI systems go from advisory to consequential. A model that gives a wrong answer causes confusion. A model that takes a wrong action causes damage.

That distinction changes who owns the risk. In a Chain of Thought system the output is text and a human decides what to do with it. In a ReAct system the model is the one deciding, and the actions it takes can be irreversible. An order rescheduled. An account flagged. A record updated. An email sent.

From a delivery and governance perspective, this is the point where you cannot treat an AI system as a sophisticated search engine anymore. It is a system that acts on behalf of your company, and the controls you put around it need to reflect that.

Most teams do not make this mental shift early enough. They build a ReAct agent, it works in testing, they ship it, and the first production incident involves an action the model took that nobody intended and cannot be undone cleanly.

Where ReAct breaks down

Tool call loops. When a model receives an observation it does not know how to interpret, it can treat the absence of a clear result as a reason to retry. Without a hard limit on the number of steps an agent can take, this becomes a realistic engineering risk in any long-running pipeline. Setting a maximum step count and an escalation path when that limit is hit is not optional for production systems.
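A minimal sketch of both controls, assuming the agent's next move can be read as a `(tool, args)` pair. `guarded_run`, `EscalationRequired` and the default limits are illustrative choices, not values from any framework, and the `customer_ref` field is a hypothetical stand-in for the mismatched identifier in the opening incident. The guard stops on a hard step cap and also on the incident's exact pattern: the same call repeated with identical arguments.

```python
from collections import Counter
from typing import Callable, Optional

class EscalationRequired(Exception):
    """The agent must stop and hand the conversation to a human."""

def guarded_run(
    next_action: Callable[[list], Optional[tuple]],
    execute: Callable[[str, dict], dict],
    max_steps: int = 8,
    max_repeats: int = 2,
) -> list:
    """Run tool calls until next_action returns None, with two hard stops:
    a total step cap and a cap on identical (tool, args) repeats."""
    seen: Counter = Counter()
    observations: list = []
    for _ in range(max_steps):
        action = next_action(observations)
        if action is None:  # the model produced a final answer
            return observations
        tool, args = action
        key = (tool, tuple(sorted(args.items())))  # assumes hashable argument values
        seen[key] += 1
        if seen[key] > max_repeats:
            raise EscalationRequired(f"{tool} called {seen[key]} times with identical args")
        observations.append(execute(tool, args))
    raise EscalationRequired(f"step limit of {max_steps} reached")

# Replaying the opening incident: the wrong identifier field returns nothing,
# and the model treats "nothing" as a reason to try again.
def stubborn_model(observations: list) -> Optional[tuple]:
    return ("lookup_order", {"customer_ref": "ORD-8812"})

def execute(tool: str, args: dict) -> dict:
    return {}

reason = None
try:
    guarded_run(stubborn_model, execute)
except EscalationRequired as exc:
    reason = str(exc)  # route to a human queue here instead of retrying
print(reason)  # lookup_order called 3 times with identical args
```

The escalation path is the `except` branch: whatever it does (open a ticket, hand off to a human agent, reply to the user that the request needs review), it must not be another retry.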

Compounding errors. In a multi-step ReAct loop, a wrong observation in step two corrupts the reasoning in step three, which corrupts the action in step four. By the time the agent produces a final answer, the original error may be four steps back and invisible in the output. The answer looks plausible because the reasoning chain from step three onwards is internally consistent. It is just built on a bad foundation.
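The practical counter is to stop a bad observation at the step where it arrives rather than four steps later. A minimal sketch, assuming each tool has a known response contract; the field lists are lifted from the example observations in the before-and-after trace, and `REQUIRED_FIELDS`, `check_observation` and `BadObservation` are hypothetical names.

```python
# Hypothetical per-tool contracts, taken from the example observations in the trace.
REQUIRED_FIELDS = {
    "lookup_order": {"status": str, "estimated_ship": str},
    "reschedule_delivery": {"success": bool, "new_date": str, "confirmation": str},
}

class BadObservation(Exception):
    """An observation that later reasoning steps must not be allowed to build on."""

def check_observation(step: int, tool: str, obs: dict) -> dict:
    """Validate a tool result before it re-enters the model's context.
    The step number goes into the error so the root cause is visible in the trace."""
    schema = REQUIRED_FIELDS.get(tool)
    if schema is None:
        raise BadObservation(f"step {step}: no contract registered for {tool!r}")
    missing = [key for key in schema if key not in obs]
    if missing:
        raise BadObservation(f"step {step}: {tool} missing {missing}")
    wrong = [key for key, expected in schema.items() if not isinstance(obs[key], expected)]
    if wrong:
        raise BadObservation(f"step {step}: {tool} has wrong types for {wrong}")
    return obs

ok = check_observation(2, "lookup_order", {"status": "processing", "estimated_ship": "2026-06-26"})
try:
    check_observation(2, "lookup_order", {"state": "processing"})
except BadObservation as exc:
    print(exc)  # step 2: lookup_order missing ['status', 'estimated_ship']
```

A contract check cannot catch data that is well-formed but wrong. What it does guarantee is that a malformed result fails loudly, with its step number attached, instead of quietly shaping steps three and four.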

Hallucinated observations. Some models, particularly when tool responses are slow or malformed, may generate a plausible-looking observation rather than correctly surfacing an error state. This is uncommon, but it does occur, and it is harder to catch than other failure modes because the fabricated observation looks like a real one in the trace.
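Two cheap defences follow from this. Observations should only ever be written by the runtime, so any line the model itself starts with "Observation:" is discarded and flagged; and every tool failure should come back as an explicit error observation, so a slow or broken tool never leaves a gap for the model to fill. Where the model API supports stop sequences, stopping generation at the observation marker is a common first defence, and a check like this catches what slips through. A minimal sketch, assuming the text-based Thought/Action/Observation format from the example; `split_model_turn` and `safe_observation` are hypothetical helpers.

```python
import json
import re

MARKER = "Observation:"
# A line the model starts with "Observation:" was written by the model, not returned by a tool.
_FABRICATED = re.compile(r"^Observation:", re.MULTILINE)

def split_model_turn(raw: str) -> tuple[str, bool]:
    """Return (text_to_keep, fabricated): everything before a model-written observation,
    plus a flag that can feed monitoring."""
    match = _FABRICATED.search(raw)
    if match is None:
        return raw, False
    return raw[:match.start()].rstrip(), True

def safe_observation(tool_fn, **kwargs) -> str:
    """Run a tool and always return an explicit observation, errors included."""
    try:
        result = tool_fn(**kwargs)
    except Exception as exc:
        return f"{MARKER} ERROR {type(exc).__name__}: {exc}"
    if not isinstance(result, dict):
        return f"{MARKER} ERROR malformed response of type {type(result).__name__}"
    return f"{MARKER} {json.dumps(result)}"

raw_turn = (
    "Thought: I should check the order.\n"
    'Action: lookup_order(order_id="ORD-8812")\n'
    'Observation: {"status": "shipped"}'  # written by the model, not the tool
)
kept, fabricated = split_model_turn(raw_turn)
print(fabricated)  # True
```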

Scope creep mid-task. An agent can successfully complete a user request and then continue reasoning because it identifies something adjacent it thinks it should also fix. Nobody asked it to. The user did not want it to. The model is not necessarily malfunctioning; it is following its reasoning to a conclusion that sits outside the original scope. Clear task boundaries in the system prompt reduce this but do not eliminate it.
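A system prompt describes the boundary; the runtime can enforce it. A minimal sketch, assuming each request is classified into a task type before the agent starts. The task names, `TASK_SCOPES`, `OutOfScope` and the tool stubs (modelled on the three tools from the opening example) are hypothetical. It limits which tools the model is offered and rejects any call outside that set, which stops out-of-scope actions, though not out-of-scope reasoning.

```python
class OutOfScope(Exception):
    """Raised when the agent tries a tool that this task does not permit."""

# Hypothetical tool stubs matching the three tools from the opening example.
ALL_TOOLS = {
    "lookup_order": lambda order_id: {"status": "processing"},
    "initiate_refund": lambda order_id: {"refund_id": "RF-1"},
    "send_email": lambda to, body: {"sent": True},
}

# Hypothetical mapping from a classified request type to the tools it may use.
TASK_SCOPES = {
    "order_status": {"lookup_order"},
    "refund_request": {"lookup_order", "initiate_refund", "send_email"},
}

def tools_for_task(all_tools: dict, task_type: str) -> dict:
    """Only offer the model the tools this task needs; unknown task types get none."""
    allowed = TASK_SCOPES.get(task_type, set())
    return {name: fn for name, fn in all_tools.items() if name in allowed}

def dispatch(scoped: dict, name: str, /, **kwargs) -> dict:
    """Enforce the boundary at call time too, in case the model names a tool it was not offered."""
    if name not in scoped:
        raise OutOfScope(f"{name!r} is outside the scope of this task")
    return scoped[name](**kwargs)

scoped = tools_for_task(ALL_TOOLS, "order_status")
print(sorted(scoped))  # ['lookup_order']
```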

The question to ask before you give an agent tools

Before connecting a ReAct agent to any tool, answer these three questions for every tool on the list.

Can this action be undone if the agent calls it incorrectly? If the answer is no, that tool needs a human confirmation step before the agent can use it in production.
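One way to wire in the confirmation step, as a sketch: mark each tool as reversible or not, and route irreversible calls through an approval callback. The synchronous `approve` function is an assumption for brevity (in production it would more likely create a pending request and pause the agent), and `ToolSpec`, `REGISTRY` and `call_with_gate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ToolSpec:
    fn: Callable[..., dict]
    reversible: bool  # read-only tools count as reversible

# Hypothetical registry: a refund cannot be undone cleanly, so it is gated.
REGISTRY = {
    "lookup_order": ToolSpec(lambda order_id: {"status": "processing"}, reversible=True),
    "initiate_refund": ToolSpec(
        lambda order_id, amount: {"refund_id": "RF-1", "amount": amount}, reversible=False
    ),
}

def call_with_gate(tool_name: str, approve: Callable[[str, dict], bool], /, **kwargs) -> dict:
    """Run reversible tools directly; irreversible ones only after a human approves this exact call."""
    spec = REGISTRY[tool_name]
    if not spec.reversible and not approve(tool_name, kwargs):
        # Returned as an ordinary observation so the agent reports it rather than retrying.
        return {"status": "rejected", "reason": "human approval not granted"}
    return spec.fn(**kwargs)

print(call_with_gate("initiate_refund", lambda name, args: False, order_id="ORD-8812", amount=20.0))
```

Because a rejection comes back as an ordinary observation, the agent can tell the user what happened instead of treating the refusal as an error to work around.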

What happens if this tool is called repeatedly in a short window? If that outcome is acceptable, proceed. If it is not, add a hard step limit and an alerting threshold before the agent ships.
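A step limit caps a single run; an alerting threshold also needs to see the same tool being hammered across runs. A minimal sketch of a per-tool sliding window with a lower alerting threshold and a higher blocking threshold. `CallRateGuard` and the numbers are illustrative assumptions, and the `alerts` list stands in for whatever paging system the team uses.

```python
import time
from collections import defaultdict, deque
from typing import Callable

class CallRateGuard:
    """Per-tool sliding window: raise an alert at one count, block calls at a higher one."""

    def __init__(self, window_s: float, alert_at: int, block_at: int,
                 clock: Callable[[], float] = time.monotonic):
        self.window_s = window_s
        self.alert_at = alert_at
        self.block_at = block_at
        self.clock = clock
        self.calls: dict = defaultdict(deque)
        self.alerts: list = []  # stand-in for a real paging or alerting hook

    def allow(self, tool: str) -> bool:
        """Record a call and return True, or return False if the tool is over its limit."""
        now = self.clock()
        window = self.calls[tool]
        while window and now - window[0] > self.window_s:
            window.popleft()  # forget calls that fell out of the window
        if len(window) >= self.block_at:
            return False  # blocked attempts are not recorded
        window.append(now)
        if len(window) == self.alert_at:
            self.alerts.append(f"{tool}: {len(window)} calls within {self.window_s}s")
        return True

# Illustrative thresholds: alert at 3 lookups per minute, hard block at 5.
guard = CallRateGuard(window_s=60, alert_at=3, block_at=5)
results = [guard.allow("lookup_order") for _ in range(6)]
print(results)  # [True, True, True, True, True, False]
```

Injecting the clock keeps the guard testable without sleeping, and the gap between `alert_at` and `block_at` gives the on-call owner a chance to look before anything is cut off.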

Who in your team will be paged if this agent does something unexpected at 2am on a Saturday? If nobody owns that on-call responsibility, the agent is not ready for production regardless of how well it performed in testing.

Teams that answer these questions before building avoid most of the incidents that teams who skip them spend months cleaning up.

What comes next

ReAct gives an agent the ability to act on its reasoning and update that reasoning based on what it observes. The loop is powerful, but it has one important limitation: the agent can act and observe, but it cannot evaluate the quality of its own output before handing it back to the user.

Day 3 covers Reflection, which adds that self-evaluation step. The agent finishes a task, then critiques its own answer before delivering it. That additional loop catches a category of errors that neither Chain of Thought nor ReAct can catch on their own.

What is the most consequential action your current AI system takes without a human in the loop, and do you have a hard stop if that action starts repeating?

Part of the TrilokCloud AI Design Patterns series · One pattern per day · trilokcloud.in/blog