Agentic Planning (The Agents Season, Episode 5)

by Ben Jaffe and Katie Malone

24m | May 18, 2026

Overview of Agentic Planning (The Agents Season, Episode 5)

This episode explores planning in AI agents: how agents can move beyond reactive, step-by-step behavior and instead search, evaluate, and backtrack when solving complex tasks. Ben Jaffe and Katie Malone center the discussion on the 2023 paper Tree of Thoughts, which showed that prompting an LLM to explore multiple reasoning branches can sharply improve performance on hard combinatorial problems such as the Game of 24.

Key Idea: Planning vs. Memory

The hosts distinguish between two related but different agent capabilities:

  • Memory/context: what the model can hold or access.
  • Planning: what the model can do with that information.

A planning-capable agent can:

  • think ahead,
  • consider multiple possible paths,
  • evaluate partial solutions,
  • backtrack when a strategy fails,
  • and choose among alternatives instead of just taking the most obvious next step.

Most current agents, they note, are still closer to reaction than true planning.

Chain of Thought: Helpful, But Limited

The episode revisits chain-of-thought prompting, which asks the model to reason step by step before answering.

Why it helps

  • It externalizes intermediate reasoning.
  • It makes logic visible and easier to correct.
  • It improves performance on many tasks compared with direct answering.

Why it fails on harder search problems

  • It follows a single linear path.
  • It lacks true branching, backtracking, and evaluation of alternatives.
  • If the model goes wrong early, it often keeps going confidently in the wrong direction.

On the Game of 24 benchmark, chain-of-thought actually underperformed standard prompting (about 4% versus about 7% of puzzles solved).

Tree of Thoughts: Deliberate Search and Planning

The main focus is Tree of Thoughts, which reframes reasoning as a search tree rather than a single chain.

How it works

At each step, the model:

  • generates several possible next moves,
  • evaluates which are promising,
  • prunes dead ends,
  • and continues down the best branches.

This gives the agent a more human-like planning ability: not just “what next?” but “what are my options, and which path is most likely to work?”
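The generate, evaluate, prune, and continue loop described above can be sketched as a small beam search. This is a minimal illustration, not the paper's implementation: in Tree of Thoughts the proposing and scoring are done by prompting an LLM, whereas here `propose`, `score`, and `is_goal` are hypothetical plain-Python callables, and the toy number puzzle is invented for the example.

```python
from typing import Callable, Iterable, List, Optional, TypeVar

S = TypeVar("S")  # a "thought": any partial-solution state


def tree_search(
    root: S,
    propose: Callable[[S], Iterable[S]],  # stand-in for "ask the model for next moves"
    score: Callable[[S], float],          # stand-in for "ask the model how promising this is"
    is_goal: Callable[[S], bool],
    beam_width: int = 3,
    max_depth: int = 5,
) -> Optional[S]:
    """Breadth-first search that keeps only the best `beam_width` states per level."""
    frontier: List[S] = [root]
    for _ in range(max_depth):
        # 1. Generate several possible next moves from every kept state.
        candidates = [child for state in frontier for child in propose(state)]
        if not candidates:
            return None  # every branch was a dead end
        for candidate in candidates:
            if is_goal(candidate):
                return candidate
        # 2. Evaluate the candidates, 3. prune the weak ones,
        # 4. continue down the best branches.
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_width]
    return None


# Toy problem (an assumption for illustration): start at 1, reach 11
# using only "+3" and "*2" moves.
def propose_moves(n: int) -> List[int]:
    return [n + 3, n * 2]


def closeness(n: int) -> float:
    return -abs(11 - n)


result = tree_search(1, propose_moves, closeness, lambda n: n == 11, beam_width=2)
print(result)
```

Setting `beam_width=1` turns this into a single greedy chain with no alternatives kept, which is the failure mode attributed to chain-of-thought earlier in the episode.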

Why it matters

This approach is closer to:

  • System 2 thinking from Daniel Kahneman’s Thinking, Fast and Slow,
  • and classic AI ideas from Newell and Simon, who modeled problem solving as searching a problem space.

Striking Result from the Paper

The episode highlights the benchmark result from the Tree of Thoughts paper on the Game of 24, reported as the share of puzzles solved:

  • Standard prompting: about 7%
  • Chain-of-thought: about 4%
  • Tree of Thoughts: about 74%

The key point is that this gain came from a different reasoning strategy, not a better base model.

Why the Game of 24 Is a Good Benchmark

The Game of 24 (combine four given numbers with +, -, *, and / to reach exactly 24) is useful because it requires:

  • genuine combinatorial search,
  • exploration of many possible branches,
  • and the ability to recognize dead ends.

It’s a strong test of whether a model can search and evaluate, not just pattern-match.
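To make the search structure of the puzzle concrete, here is a small exhaustive solver with backtracking. It is a conventional brute-force sketch, not how an LLM or the Tree of Thoughts paper solves the puzzle; the function names `solve_24` and `_search` are made up for this example. It repeatedly picks two numbers, combines them with one operation, and recurses, backing out of any branch that cannot reach the target.

```python
from fractions import Fraction
from itertools import combinations
from typing import List, Optional, Tuple

Item = Tuple[Fraction, str]  # (exact value, expression that produced it)


def solve_24(numbers: List[int], target: int = 24) -> Optional[str]:
    """Return an expression using every number once that equals target, or None."""
    items = [(Fraction(n), str(n)) for n in numbers]
    return _search(items, Fraction(target))


def _search(items: List[Item], target: Fraction) -> Optional[str]:
    if len(items) == 1:
        value, expr = items[0]
        return expr if value == target else None  # dead end unless we hit the target
    # Branch: choose any pair of remaining values and any way of combining them.
    for i, j in combinations(range(len(items)), 2):
        (a, ea), (b, eb) = items[i], items[j]
        rest = [items[k] for k in range(len(items)) if k not in (i, j)]
        options: List[Item] = [
            (a + b, f"({ea} + {eb})"),
            (a * b, f"({ea} * {eb})"),
            (a - b, f"({ea} - {eb})"),
            (b - a, f"({eb} - {ea})"),
        ]
        if b != 0:
            options.append((a / b, f"({ea} / {eb})"))
        if a != 0:
            options.append((b / a, f"({eb} / {ea})"))
        for item in options:
            found = _search(rest + [item], target)
            if found is not None:
                return found
            # Otherwise backtrack and try the next option.
    return None


print(solve_24([4, 9, 10, 13]))  # prints one valid expression
print(solve_24([1, 1, 1, 1]))    # no solution exists, so this prints None
```

Exact `Fraction` arithmetic is used so that solutions passing through non-integer intermediate values are not lost to floating-point rounding.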

Limitations and Costs

Despite its power, Tree of Thoughts has major drawbacks:

  • It is computationally expensive.
  • It requires many model calls per decision point.
  • Costs can grow combinatorially as branches multiply.
  • It is often overkill for simple, linear tasks.
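A rough back-of-the-envelope count shows how quickly the number of candidates grows. The accounting below is an assumption made for illustration (one evaluated candidate per tree node, and hypothetical function names); real implementations may batch calls or sample each evaluation several times, so actual costs differ.

```python
def nodes_full_tree(branching: int, depth: int) -> int:
    """Candidates evaluated if every branch is expanded at every level."""
    return sum(branching ** level for level in range(1, depth + 1))


def nodes_with_beam(branching: int, depth: int, beam_width: int) -> int:
    """Candidates evaluated if only the best `beam_width` states survive each level."""
    total = 0
    frontier = 1  # start from a single root state
    for _ in range(depth):
        candidates = frontier * branching
        total += candidates
        frontier = min(beam_width, candidates)
    return total


# 5 proposals per state, 3 levels deep:
print(nodes_full_tree(5, 3))     # 5 + 25 + 125 = 155
print(nodes_with_beam(5, 3, 5))  # 5 + 25 + 25 = 55
print(nodes_with_beam(5, 3, 1))  # 5 + 5 + 5 = 15 (a single greedy chain)
```

Pruning keeps growth linear in depth rather than exponential, but even the pruned search evaluates several times more candidates than a single chain.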

The hosts note that for tasks like:

  • reading a file,
  • summarizing it,
  • writing an email,
  • sending it,

a simpler chain-of-thought or reactive approach is usually sufficient.

When Planning Is Actually Worth It

Tree of Thoughts-style planning is most useful when:

  • the task has many interdependent steps,
  • wrong paths are hard to distinguish early,
  • backtracking is likely to help,
  • and the agent must handle uncertainty without immediately escalating to a human.

In these cases, planning increases the range of tasks an agent can handle autonomously.

Where the Field Is Going

The episode ends by noting that modern systems are increasingly blending approaches:

  • start with cheaper reactive reasoning,
  • detect when the model is stuck,
  • and then escalate to more expensive planning/search.
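The escalation pattern in the list above can be sketched as a thin wrapper. Everything here is hypothetical: `cheap_solver`, `is_stuck`, and `planner` are placeholder callables, and how a real system detects that it is stuck is not specified in the episode.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")  # task type
R = TypeVar("R")  # result type


def solve_with_escalation(
    task: T,
    cheap_solver: Callable[[T], R],  # e.g. a single reactive or chain-of-thought pass
    is_stuck: Callable[[R], bool],   # hypothetical check that the cheap attempt failed
    planner: Callable[[T], R],       # e.g. a more expensive tree search
) -> Tuple[R, str]:
    """Try the cheap path first and pay for planning only when it fails."""
    attempt = cheap_solver(task)
    if not is_stuck(attempt):
        return attempt, "reactive"
    return planner(task), "planned"


# Toy usage with stand-in functions: the cheap solver gives up (returns None),
# so the wrapper falls back to the planner.
answer, mode = solve_with_escalation(
    "hard puzzle",
    cheap_solver=lambda task: None,
    is_stuck=lambda result: result is None,
    planner=lambda task: "solution found by search",
)
print(mode, "->", answer)
```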

They also point out that newer reasoning models may be learning some of this planning capability natively, blurring the line between:

  • prompted search and
  • learned reasoning.

Main Takeaways

  • Planning is search: good agent planning looks like exploring a problem space, not just following a single chain.
  • Tree of Thoughts is a major step forward in how we think about agent reasoning.
  • The same model can both generate and evaluate candidate paths surprisingly well.
  • Costs matter: better planning comes with significant computational overhead.
  • The key open question is when to use cheap reactive behavior and when to pay for deeper planning.

Looking Ahead

The episode tees up the next topic in the series: why agents fail, especially when they commit confidently to a bad plan over many steps rather than simply giving a single wrong answer.