Explainer · Day 3

Reflection: the step that catches what reasoning misses

Chain of Thought reasons forward. ReAct grounds reasoning in real observations. Reflection turns the model around and asks it to look at its own output before anyone else does.

2026-06-25 · 6 min

The failure this pattern exists to prevent

A legal document drafting agent produces a contract clause. The clause is grammatically correct, internally consistent, and passes the automated compliance check. It gets inserted into a client contract and nobody looks at it closely because the output looks clean and the check passed.

Three weeks later the client's lawyer flags it. The indemnity scope is too broad for the deal structure. The agent had defaulted to a standard template clause without checking whether it matched the specific parameters in the context. The output was fluent. The reasoning trace looked sensible. The problem was that the model never looked at its own work twice. Reflection can reduce how often this happens. It does not replace the human review that caught it.

That is the gap Reflection is designed to reduce.

What the pattern is

Reflection adds an evaluation step after the model produces an initial output. The model is asked to review its own answer against the original task, identify what might be wrong or incomplete, and produce an improved version. This happens before the output reaches the user.
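A minimal sketch of that draft, critique, revise sequence is shown here. The `llm` parameter is a hypothetical stand-in for whatever model client you use (any function that takes a prompt string and returns text), and the prompt wording is illustrative rather than taken from the Reflexion paper.

```python
from typing import Callable, Dict

# Hypothetical stand-in for a model call: takes a prompt, returns text.
LLM = Callable[[str], str]


def reflect_once(task: str, llm: LLM) -> Dict[str, str]:
    """Draft an answer, critique it against the original task, revise once."""
    draft = llm(f"Task: {task}\nProduce an answer.")

    # The critique always sees the original task, not just the draft.
    critique = llm(
        f"Task: {task}\nDraft answer: {draft}\n"
        "Review the draft against the task. List anything wrong or incomplete."
    )

    revised = llm(
        f"Task: {task}\nDraft answer: {draft}\nCritique: {critique}\n"
        "Write an improved answer that addresses the critique."
    )

    # Return every stage so the caller can inspect what changed and why.
    return {"draft": draft, "critique": critique, "revised": revised}
```

Only the `revised` value would go to the user. The draft and critique stay available for logging or review.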

The research that formalized this approach came from Shinn et al. in a 2023 paper called Reflexion. The core finding was that agents could improve their performance by reflecting on failures in natural language rather than requiring gradient updates or retraining. The reflection itself becomes the feedback signal.

The key difference from Chain of Thought and ReAct is the direction of the reasoning. Chain of Thought reasons forward from a question to an answer. ReAct reasons in a loop that incorporates external observations at each step. Reflection reasons backward from a completed answer to the original task and asks one question: does this output actually do what was asked of it?

That question sounds simple. In practice most models, without being explicitly prompted to ask it, do not ask it at all.

A concrete before and after

A code review agent is asked to fix a reported bug where a user input field accepts negative values it should reject.

Without Reflection the agent produces this:

    The bug is in the validation function. I have added a check that returns an error if the input value is less than zero. Patch applied. The field will now reject negative values.

The patch does fix the reported bug. But it silently breaks an existing feature that allows credit adjustments, which legitimately use negative values in a different context. The agent delivered a correct-looking fix to a narrowly scoped problem without considering the broader system.

With Reflection the agent produces the same initial patch, then adds an evaluation step:

    Initial fix: added a less-than-zero check to the validation function.
    Reflection: Does this fix the reported bug without breaking existing behaviour elsewhere in the system?
    Issue found: the credit adjustment feature in the billing module uses negative values intentionally. A blanket less-than-zero check would break that feature.
    Revised fix: scope the validation to the specific input field referenced in the bug report rather than the shared validation function. Credit adjustments use a separate code path and are not affected.
    Revised patch applied.

The reflection step did not require any external tool call or additional context. In this example, the model may already have had enough context to notice the issue. It just needed to be asked to use it.
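To make the two patches concrete, here is a rough sketch of what the revised fix might look like. Every name in it (`validate_amount`, `validate_order_quantity`, `apply_credit_adjustment`) is hypothetical, since the example above does not show the real codebase.

```python
# Hypothetical names throughout; this only illustrates the scoping decision.

def validate_amount(value: float) -> None:
    """Shared validator used by both the input field and credit adjustments."""
    if not isinstance(value, (int, float)):
        raise TypeError("amount must be a number")

# Initial patch (rejected on reflection): putting `if value < 0: raise ...`
# inside validate_amount would also reject legitimate credit adjustments.


def validate_order_quantity(value: float) -> None:
    """Revised patch: the non-negative rule applies only to the reported field."""
    validate_amount(value)
    if value < 0:
        raise ValueError("quantity must not be negative")


def apply_credit_adjustment(balance: float, adjustment: float) -> float:
    """Separate code path: negative adjustments remain valid."""
    validate_amount(adjustment)
    return balance + adjustment
```

The shared validator is left untouched, so the billing path keeps accepting negative values while the reported field rejects them.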

Why this matters beyond accuracy

In production the cost of a wrong output reaching a user is almost always higher than the cost of an extra evaluation step. A support agent that sends a wrong answer costs more to remediate than one that takes two seconds longer to self-check. A drafting agent that inserts a bad clause costs more to fix than one that catches it internally before delivery.

Reflection is not a quality guarantee. It is a second signal. The first signal is the initial output. The second signal is the model's own assessment of that output. When both signals agree the output is good, your confidence should go up. When the reflection step flags a problem, you have caught something before it left the system.

From a governance perspective, reflection traces can also be useful artifacts. They show not just what the model produced but what it considered, rejected, and why it made the changes it did. That is a richer audit trail than a final answer alone, particularly for decisions that will be reviewed later.
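One way to keep such a trace is to store each stage as a structured record. The sketch below is an assumption about what a useful record might contain. The `ReflectionTrace` class and its field names are hypothetical, not a standard schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class ReflectionTrace:
    """One reflection pass, kept alongside the final output for later audit."""
    task: str
    draft: str
    critique: str
    revised: str
    # UTC timestamp in ISO 8601 format, set when the record is created.
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        """Serialize the record so it can be written to a log or audit store."""
        return json.dumps(asdict(self), sort_keys=True)
```

Storing the critique next to the draft and the revision is what lets a later reviewer see what the model rejected and why.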

Where Reflection breaks down

Confirmation bias in self-review. A model reviewing its own output starts from a position of having produced that output. It has a systematic tendency to find reasons why the output is acceptable rather than reasons why it might not be. This is not a flaw that prompting alone reliably fixes. It means reflection is better at catching obvious errors than subtle ones, and better at improving weak outputs than identifying flawed assumptions in confident ones.

Runaway refinement. Without a clear stopping condition, a reflection loop can keep finding things to improve indefinitely. Each revision generates a new output to evaluate, and the model can always find something to adjust. In practice this means you need an explicit limit on the number of reflection passes and clear criteria for what counts as done, otherwise the loop runs until something external stops it.
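A bounded loop along these lines might look like the sketch below. The `critique` and `revise` callables are hypothetical placeholders for model calls, the default of three passes is an arbitrary choice, and "done" is defined here as the critique returning no issues.

```python
from typing import Callable, List, Tuple


def reflect_until_done(
    draft: str,
    critique: Callable[[str], List[str]],
    revise: Callable[[str, List[str]], str],
    max_passes: int = 3,
) -> Tuple[str, int, bool]:
    """Revise until the critique finds no issues or max_passes is reached.

    Returns (output, passes_used, converged).
    """
    output = draft
    for passes in range(1, max_passes + 1):
        issues = critique(output)
        if not issues:
            # Explicit stopping criterion: nothing left to fix.
            return output, passes, True
        output = revise(output, issues)
    # Hard limit reached: the last revision has not been re-checked.
    return output, max_passes, False
```

When `converged` comes back `False`, the caller knows the loop stopped because of the limit rather than because the output passed, which is a reasonable point to route it to a human.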

Over-correction. I have seen reflection make outputs worse. The model produces a correct answer, reflects on it, decides something sounds uncertain, hedges it, and the revised version is less useful than the original. This happens most often when the reflection prompt is vague. A prompt that says "review your answer" will produce different behaviour than one that says "check whether your answer addresses the specific question asked and whether any claims in it are unsupported."

The superficial fix. A model asked to improve its output will sometimes change the wording without changing the substance. The revised version sounds more careful but contains the same underlying error dressed in hedged language. You see this most often when the original error is in the reasoning rather than the phrasing. Reflection catches surface problems more reliably than it catches structural ones.

The question to ask before you skip the reflection step

For every AI-powered output in your current system that a human would review before acting on it, ask this. What recurring issue would a human reviewer catch that the model should be prompted to check for? If the answer is something specific, that is exactly what the reflection prompt should be told to look for.

The teams that get the most value from Reflection are the ones who write targeted reflection prompts based on real failure patterns they have already seen, not generic prompts asking the model to improve. Generic prompts produce generic improvements. Specific prompts catch specific failure modes.
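As an illustration, a targeted reflection prompt can be assembled from a list of checks drawn from failures you have already seen. The function name `build_reflection_prompt`, the PASS/FAIL format, and the example checks are all assumptions made for this sketch.

```python
from typing import List


def build_reflection_prompt(task: str, draft: str, checks: List[str]) -> str:
    """Build a reflection prompt from specific, previously observed failure checks."""
    if not checks:
        # Refuse to fall back to a generic "review your answer" prompt.
        raise ValueError("provide at least one specific check")
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(checks, start=1))
    return (
        f"Task:\n{task}\n\nDraft:\n{draft}\n\n"
        "Check the draft against each item below. For each item, answer "
        "PASS or FAIL with a one-line reason. Do not rewrite anything yet.\n"
        f"{numbered}"
    )


# Example checks for the contract-drafting scenario (illustrative only).
prompt = build_reflection_prompt(
    task="Draft an indemnity clause for the deal described in the context.",
    draft="(draft clause text)",
    checks=[
        "Does the indemnity scope match the deal structure in the context?",
        "Was any template language kept without checking it against the deal parameters?",
    ],
)
```

Separating the check from the rewrite is a design choice: it makes it harder for the model to hedge wording without first saying which check failed.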

What comes next

Reflection adds a self-evaluation loop that works entirely inside the model's own context. It can catch a meaningful category of errors before they reach the user, but it is limited by what the model already knows. It cannot check whether a fact is current, whether an API returned the right data, or whether an external system behaved as expected.

Day 4 covers Tool Use and Function Calling, which connects the model to external systems so it can verify things it cannot know from context alone.

For which outputs in your current AI system would a second look by the model itself catch something a human reviewer would later flag?

Part of the TrilokCloud AI Design Patterns series · One pattern per day · trilokcloud.in/blog