Explainer · Day 5

Plan and Execute: when an agent thinks before it acts

The first four patterns make a model smarter about individual steps. Plan and Execute is the first pattern that makes it think across steps before taking any of them.

2026-06-27 · 6 min

The failure this pattern exists to prevent

This failure shows up in delivery programs as often as it shows up in AI systems. A team gets a complex request. They start working immediately because the first step is obvious. Three weeks in, they hit a dependency they did not see coming. The dependency blocks two other workstreams. The timeline slips. A retrospective later reveals that ten minutes of upfront planning would have surfaced the blocker before a single line of code was written.

AI agents fail the same way. Given a complex multi-step task, a reactive agent starts executing the first obvious action. It completes step one, discovers step two requires something step one did not produce, improvises, and either delivers something incomplete or fails mid-task in a way that is expensive to reverse.

The cost of skipping the plan is always paid. The question is whether you pay it before execution or during it.

What the pattern is

Plan and Execute separates high-level task decomposition from step-by-step execution. The model receives a complex task and produces a complete plan before taking any action. The plan is then executed step by step, with each step informed by what the plan already worked out.

The separation matters for the same reason it matters in program delivery. Planning and execution require different cognitive modes. Planning needs the full picture. Execution needs focus on the current step. Mixing the two means neither gets done properly.

In practice the pattern works in two phases. In the planning phase the model breaks the task into steps, identifies what each step needs as input, and flags where external tools or information will be required. In the execution phase it works through the plan step by step. Many implementations also allow partial replanning during execution when new information changes what a later step should do. Plan and Execute is a design pattern that draws from several related approaches rather than a single canonical method, and implementations vary in how rigidly they separate the two phases.
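The two phases can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: `Step`, `make_plan`, `run_step`, and `execute` are hypothetical names, and the planner is a hard-coded stub standing in for a model call.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    name: str
    needs: list[str]             # earlier steps whose output this step consumes
    tool: Optional[str] = None   # external tool the step will require, if any


def make_plan(task: str) -> list[Step]:
    """Planning phase. A real agent would ask a model to decompose `task`;
    this stub returns a fixed plan so the sketch stays runnable."""
    return [
        Step("define_dimensions", needs=[]),
        Step("collect_data", needs=["define_dimensions"], tool="search"),
        Step("draft_table", needs=["collect_data"]),
    ]


def run_step(step: Step, inputs: dict[str, str]) -> str:
    """Execute one step. Stubbed: it only records which inputs the step saw."""
    return f"{step.name}({', '.join(sorted(inputs))})"


def execute(plan: list[Step]) -> dict[str, str]:
    """Execution phase: work through the plan in order, handing each step
    only the outputs the plan said it needs."""
    results: dict[str, str] = {}
    for step in plan:
        missing = [n for n in step.needs if n not in results]
        if missing:
            raise ValueError(f"{step.name} is missing inputs: {missing}")
        inputs = {n: results[n] for n in step.needs}
        results[step.name] = run_step(step, inputs)
    return results


plan = make_plan("Compare three products")  # the full plan exists before any step runs
results = execute(plan)
```

The point of the shape is that `make_plan` returns before `execute` starts, so a missing input is caught as a planning defect rather than discovered mid-task.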

Related work in 2023 and 2024 includes HuggingGPT, which combined a language model with specialist tools through explicit task planning, and Plan-and-Solve by Wang et al., which structured the reasoning process by separating plan generation from solution execution. Both reported improvements on complex, multi-step tasks. Planning first is especially helpful when early decisions constrain later steps, which is exactly the class of task where reactive agents fail most visibly.

A concrete before and after

A user asks an agent to research three competing products, summarise the key differences, and draft a comparison table formatted for a slide deck.

Without Plan and Execute the agent starts immediately.

Step 1: Search for Product A.
Result: found overview page, summarised key features.
Step 2: Search for Product B.
Result: found pricing page, summarised pricing tiers.
Step 3: Search for Product C.
Result: found a 2023 review, summarised reviewer opinion.
Step 4: Draft comparison table.
Problem: Product A summary has features, Product B has pricing, Product C has opinions. The three sources are not comparable. The table cannot be structured consistently.
Output: inconsistent comparison table with mismatched categories across columns.

The agent executed confidently and produced something that looks complete but is not usable. The failure was not in the execution. It was in the absence of a plan that decided what to collect before collecting anything.

With Plan and Execute the agent plans first.

Plan:
Define the comparison dimensions before searching: pricing, core features, integration options, target market.
Search each product specifically for those four dimensions.
Verify each search returns comparable data before proceeding to the next product.
Draft the table using the four dimensions as columns.
Format for slide deck: max three rows of content per cell, no prose paragraphs.

Execution: proceeds product by product against the defined dimensions.
Output: consistent comparison table, slide-ready, with the same four dimensions across all three products.

The plan added one short upfront step. In exchange, the output is typically usable as delivered and needs less rework.
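One way to read the example is that the plan fixes the table's schema before any search runs. The sketch below assumes hypothetical names (`DIMENSIONS`, `is_comparable`, `build_table`) and uses hand-written placeholder findings in place of real search results.

```python
DIMENSIONS = ["pricing", "core features", "integration options", "target market"]


def is_comparable(findings: dict[str, str]) -> bool:
    """Findings are comparable only if every planned dimension is filled in."""
    return all(findings.get(dim) for dim in DIMENSIONS)


def build_table(products: dict[str, dict[str, str]]) -> list[list[str]]:
    """Build rows with the planned dimensions as columns; refuse mismatched data."""
    rows = [["product", *DIMENSIONS]]
    for name, findings in products.items():
        if not is_comparable(findings):
            gaps = [d for d in DIMENSIONS if not findings.get(d)]
            raise ValueError(f"{name} is missing: {gaps}")
        rows.append([name, *(findings[d] for d in DIMENSIONS)])
    return rows


# Placeholder findings, invented for illustration only.
reactive_result = {"pricing": "three tiers"}  # what an unplanned search brought back
planned_result = {dim: f"notes on {dim}" for dim in DIMENSIONS}

table = build_table({"Product A": planned_result})
```

Passing `reactive_result` to `build_table` raises instead of producing a table with mismatched columns, which is the check the reactive agent never ran.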

Where planning meets production reality

This is where the pattern gets more complicated than the research papers suggest, and where technical program management (TPM) experience is more useful than academic benchmarks.

Plans made before execution are made with incomplete information. The agent plans based on what it knows at planning time. Step three may reveal something that invalidates the assumption step five was built on. A rigid plan that does not update when execution surfaces new information is not better than no plan. It is more confidently wrong.

The agents that handle this well use a replanning trigger. When an execution step produces a result that materially changes the picture, the agent pauses, updates the relevant downstream steps, and continues with a revised plan rather than pushing through with a plan that no longer fits reality. This is not a weakness in the pattern. It is how competent delivery managers actually run programs.
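A replanning trigger can be sketched as a loop that revises only the steps still ahead. Everything here is an assumption made for illustration: the function names, the callback signatures, the `max_replans` cap, and the stubbed scenario in which one product turns out to be discontinued.

```python
from typing import Callable

Plan = list[str]


def execute_with_replanning(
    plan: Plan,
    run_step: Callable[[str], str],
    changes_picture: Callable[[str, str], bool],
    replan: Callable[[Plan, str, str], Plan],
    max_replans: int = 3,
) -> tuple[list[str], int]:
    """Run steps in order. When a result materially changes the picture,
    revise the remaining (downstream) steps and continue."""
    remaining = list(plan)
    done: list[str] = []
    replans = 0
    while remaining:
        step = remaining.pop(0)
        result = run_step(step)
        done.append(step)
        if replans < max_replans and changes_picture(step, result):
            remaining = replan(remaining, step, result)
            replans += 1
    return done, replans


# Stubs, invented for illustration: Product C turns out to be discontinued.
def run_step(step: str) -> str:
    return "discontinued" if step == "search C" else "ok"


def changes_picture(step: str, result: str) -> bool:
    return result == "discontinued"


def replan(remaining: Plan, step: str, result: str) -> Plan:
    # Insert a step that records the finding before the table is drafted.
    return ["note C is discontinued"] + remaining


done, replans = execute_with_replanning(
    ["search A", "search B", "search C", "draft table"],
    run_step, changes_picture, replan,
)
```

Completed steps are never rewritten here; only the tail of the plan changes, which keeps the trace readable after a revision.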

Long plans also fail in ways short plans do not. A ten-step plan has ten opportunities for compounding errors. A wrong assumption in step two corrupts the execution of step four which corrupts the output of step seven. By the time the final output is produced the original error may be invisible in the trace. This is the same problem that makes large program dependencies hard to manage. The solution in both cases is the same. Break the plan into phases, validate the output of each phase before proceeding, and treat phase boundaries as checkpoints rather than formalities.
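Phase boundaries as checkpoints might look like the sketch below. `Phase`, `CheckpointFailed`, and `run_phases` are hypothetical names, and the two stub phases stand in for real collection and drafting work.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Phase:
    name: str
    steps: list[Callable[[dict], None]]  # each step mutates the shared state
    validate: Callable[[dict], bool]     # checkpoint run at the phase boundary


class CheckpointFailed(Exception):
    pass


def run_phases(phases: list[Phase], state: dict) -> dict:
    """Run each phase, then validate its output before the next phase starts."""
    for phase in phases:
        for step in phase.steps:
            step(state)
        if not phase.validate(state):
            # Stop here so the error cannot propagate into later phases.
            raise CheckpointFailed(phase.name)
    return state


# Illustrative stubs: a collection phase followed by a drafting phase.
def collect(state: dict) -> None:
    state["rows"] = {"A": 4, "B": 4, "C": 4}  # dimensions found per product


def draft(state: dict) -> None:
    state["table"] = sorted(state["rows"])


phases = [
    Phase("collect", [collect], lambda s: all(n == 4 for n in s["rows"].values())),
    Phase("draft", [draft], lambda s: len(s["table"]) == 3),
]
final = run_phases(phases, {})
```

A failed checkpoint names the phase that produced the bad output, so the original error stays visible instead of surfacing several steps later.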

The governance question that most teams skip is who reviews the plan before execution begins. In a human-run program a plan gets reviewed before the team starts building. In an agent-run program the plan is generated and executed in seconds with no human in the loop. For low-stakes tasks that is acceptable. For tasks where the execution touches real systems, external APIs, customer data, or irreversible actions, the plan is the right place to insert a human review step. Reviewing a plan is cheaper than reversing an execution.
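A plan-level review gate can be small. The sketch below is one possible shape, assuming each plan step declares what it touches; the `HIGH_STAKES` categories, the `touches` field, and the `approve` callback are all hypothetical.

```python
from typing import Callable

# Assumed categories; a real deployment would define its own list.
HIGH_STAKES = {"external_api", "customer_data", "irreversible"}


def needs_review(plan: list[dict]) -> bool:
    """A plan needs human review if any step touches a high-stakes category."""
    return any(HIGH_STAKES & set(step.get("touches", [])) for step in plan)


def gate(plan: list[dict], approve: Callable[[list[dict]], bool]) -> bool:
    """Return True if execution may begin. Low-stakes plans pass straight through;
    high-stakes plans wait on `approve` (a person, a ticket, a review queue)."""
    if not needs_review(plan):
        return True
    return approve(plan)


low = [{"name": "search docs", "touches": []}]
high = [{"name": "delete stale records", "touches": ["customer_data", "irreversible"]}]

auto_ok = gate(low, approve=lambda p: False)   # reviewer is never consulted
blocked = gate(high, approve=lambda p: False)  # reviewer rejects
approved = gate(high, approve=lambda p: True)  # reviewer approves
```

The gate runs once, on the plan, before any step executes, which is where a rejection costs the least.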

There are four failure modes worth knowing specifically.

Over-planning. The agent produces a plan so detailed that it becomes brittle. Every step is specified to a level that leaves no room for the judgment calls that execution always requires. The plan looks thorough and breaks on first contact with reality.

Under-replanning. The agent has a replanning mechanism but the trigger threshold is too high. It takes a significant deviation from expected results before the plan updates. Small deviations accumulate across steps and the plan drifts away from reality gradually rather than suddenly.
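One hedge against gradual drift is to track deviations cumulatively as well as per step. In the sketch below, `should_replan`, the 0-to-1 deviation scores, and both threshold values are assumptions chosen for illustration, not recommended settings.

```python
def should_replan(deviations: list[float],
                  step_threshold: float = 0.5,
                  cumulative_threshold: float = 0.6) -> bool:
    """Trigger on one large deviation OR on many small ones adding up.

    `deviations` holds one score per completed step, where 0.0 means the
    result matched expectations and 1.0 means it was completely off.
    """
    if not deviations:
        return False
    latest_too_big = deviations[-1] >= step_threshold
    drift_too_big = sum(deviations) >= cumulative_threshold
    return latest_too_big or drift_too_big


two_small = should_replan([0.25, 0.25])          # below both thresholds
three_small = should_replan([0.25, 0.25, 0.25])  # small steps, large total
one_large = should_replan([0.0, 0.5])            # single large deviation
```

With only the per-step threshold, the three small deviations would never trigger a replan; the cumulative check is what catches them.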

Plan-execution mismatch. The planning phase and the execution phase use different context. The plan was made with the full task description in context. By the time step seven is executing, the context window has filled with intermediate results and the original task framing has been pushed out. The agent executes step seven correctly against its current context but incorrectly against the original intent.
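A common mitigation is to pin the original task in every step's context and truncate only the intermediate results. The sketch below assumes a hypothetical `build_context` helper and a simple text prompt layout; real context management will depend on the model and framework in use.

```python
def build_context(task: str, plan: list[str], step_index: int,
                  history: list[str], max_history: int = 3) -> str:
    """Assemble the prompt for one step. The original task and the current step
    are always included; only intermediate results are truncated."""
    recent = history[-max_history:] if max_history > 0 else []
    parts = [
        f"ORIGINAL TASK: {task}",
        f"CURRENT STEP ({step_index + 1}/{len(plan)}): {plan[step_index]}",
        "RECENT RESULTS:",
        *recent,
    ]
    return "\n".join(parts)


task = "Compare products A, B and C in a slide-ready table"
plan = [f"step {i}" for i in range(1, 8)]
history = [f"result of step {i}" for i in range(1, 7)]

# Context for step seven: early results are dropped, the task framing is not.
context = build_context(task, plan, step_index=6, history=history)
```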

Scope expansion mid-plan. The agent identifies adjacent tasks during planning that were not in the original request and adds them to the plan. The user asked for a comparison table. The plan includes sourcing additional data, checking for recent updates, and drafting an executive summary. None of that was requested. All of it gets executed.
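A scope check between planning and execution can catch this. The sketch assumes the planner tags each step with the deliverable it serves; the `deliverable` field and the `split_by_scope` helper are hypothetical.

```python
def split_by_scope(plan: list[dict],
                   requested: set[str]) -> tuple[list[dict], list[dict]]:
    """Separate steps that serve a requested deliverable from steps the planner added."""
    in_scope = [s for s in plan if s["deliverable"] in requested]
    out_of_scope = [s for s in plan if s["deliverable"] not in requested]
    return in_scope, out_of_scope


plan = [
    {"name": "search products", "deliverable": "comparison table"},
    {"name": "draft table", "deliverable": "comparison table"},
    {"name": "draft executive summary", "deliverable": "executive summary"},
]

# The user asked only for the comparison table.
keep, flagged = split_by_scope(plan, requested={"comparison table"})
```

Flagged steps do not have to be dropped silently; they can be offered back to the user as suggestions instead of being executed.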

The questions worth asking before you ship a planning agent

For any agent that plans before it executes, answer these before it goes to production.

What is the maximum number of steps the plan can contain before a human reviews it? Define this explicitly. An agent that can generate a 40-step plan and execute all 40 steps without a checkpoint is a significant operational risk regardless of how well it performs in testing.
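Writing the limit down can be as simple as a constant and a helper that says where checkpoints fall. The value of `MAX_UNREVIEWED_STEPS` and the `checkpoints` function are assumptions for illustration; the right limit depends on the task and the systems involved.

```python
MAX_UNREVIEWED_STEPS = 10  # assumed limit, to be set per deployment


def checkpoints(plan_length: int,
                max_unreviewed: int = MAX_UNREVIEWED_STEPS) -> list[int]:
    """Return the step counts after which a human checkpoint is required."""
    if max_unreviewed < 1:
        raise ValueError("max_unreviewed must be at least 1")
    return list(range(max_unreviewed, plan_length, max_unreviewed))


long_plan = checkpoints(40)   # pauses after steps 10, 20 and 30
short_plan = checkpoints(8)   # short enough to run without a pause
```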

What triggers a replan? Write it down. If the answer is "the agent decides," that is not an answer. Define the specific conditions that cause the agent to pause and revise rather than push through: a tool call that returns an error, a result that falls outside an expected range, or a step that cannot be completed as specified.
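Those three conditions translate directly into a check that returns a reason rather than a bare yes or no. `StepResult` and `replan_reason` are hypothetical names, and the numeric range is a stand-in for whatever "expected" means for a given step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StepResult:
    error: Optional[str] = None    # set when a tool call failed
    value: Optional[float] = None  # numeric result, if the step produces one
    completed: bool = True         # False if the step could not be done as specified


def replan_reason(result: StepResult, expected: tuple[float, float]) -> Optional[str]:
    """Return why a replan is needed, or None to continue with the current plan."""
    if result.error is not None:
        return f"tool error: {result.error}"
    if not result.completed:
        return "step could not be completed as specified"
    low, high = expected
    if result.value is not None and not (low <= result.value <= high):
        return f"value {result.value} outside expected range {low}-{high}"
    return None


ok = replan_reason(StepResult(value=42.0), expected=(0, 100))
failed_call = replan_reason(StepResult(error="timeout"), expected=(0, 100))
out_of_range = replan_reason(StepResult(value=250.0), expected=(0, 100))
incomplete = replan_reason(StepResult(completed=False), expected=(0, 100))
```

Returning the reason means every replan leaves a line in the trace explaining why the plan changed.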

Who sees the plan before execution begins for tasks above a defined complexity threshold? This does not need to be a manual review for every task. It needs to be a defined policy that someone owns.

What comes next

Plan and Execute gives an agent the ability to think across multiple steps before taking any of them. The plans it generates are only as good as what it knows at planning time.

Day 6 covers Memory, which is what allows an agent to carry knowledge across tasks rather than starting fresh each time. Without memory, every plan is built from scratch. With it, the agent can draw on what previous tasks taught it and plan more accurately from the start.

For the most complex task your current AI system handles, does it plan before it acts — and if it does, does anyone see that plan before execution begins?

Part of the TrilokCloud AI Design Patterns series · One pattern per day · trilokcloud.in/blog