ReAct and Tool Usage (The Agents Season, Episode 2)


by Ben Jaffe and Katie Malone

23 min, April 27, 2026

Overview

Ben Jaffe and Katie Malone explain how AI agents became meaningfully more useful once models stopped being isolated text generators and began interacting with the outside world through tools. The episode traces the shift from “reasoning-only” language models to systems that can reason, act, observe results, and repeat, focusing on two influential research ideas: ReAct (interleaving reasoning and action) and Toolformer (teaching models when to use tools). It also connects those ideas to modern infrastructure like MCP (Model Context Protocol) and to real-world agent products such as OpenClaw, showing how research quickly turned into practical agent systems.

Key Ideas and Main Takeaways

Why tool use mattered

  • Before 2022–2023, language models were mostly confined to a text in / text out setup.
  • They could reason over training data, but they could not:
    • look things up,
    • run code,
    • verify claims,
    • or access anything beyond their knowledge cutoff.
  • Tool use broke that “wall” by letting models interact with the outside world.

Reasoning and acting were separate research tracks

  • Reasoning improvements came from chain-of-thought prompting and similar methods.
  • Acting systems could browse or navigate environments, but often lacked reasoning and were brittle.
  • The breakthrough was combining them into a loop: think → act → observe → think again.

ReAct: Reasoning + Acting

What ReAct is

  • ReAct stands for Reasoning and Acting.
  • The core idea is to let a model alternate between:
    • generating a reasoning trace,
    • taking an action,
    • receiving an observation,
    • and updating its reasoning based on that observation.
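The alternation above can be sketched as a short loop. This is a minimal illustration, not the ReAct authors' code. The `Search[...]` and `Finish[...]` action strings loosely follow the paper's HotpotQA setup. `react_loop`, `toy_model`, and `toy_search` are hypothetical stand-ins for a real language model and a real search tool.

```python
import re
from typing import Callable, Optional

# Actions look like "Search[query]" or "Finish[answer]" (format assumed for this sketch).
ACTION_RE = re.compile(r"^(\w+)\[(.*)\]$")

Step = tuple[str, str, str]  # (thought, action, observation)


def react_loop(model: Callable[[str], tuple[str, str]],
               tools: dict[str, Callable[[str], str]],
               question: str,
               max_steps: int = 5) -> tuple[Optional[str], list[Step]]:
    """Alternate thought -> action -> observation until Finish or max_steps."""
    transcript = f"Question: {question}\n"
    trace: list[Step] = []
    for _ in range(max_steps):
        thought, action = model(transcript)  # reason, then pick an action
        match = ACTION_RE.match(action.strip())
        if match is None:
            observation = f"Invalid action: {action}"
        else:
            name, arg = match.groups()
            if name == "Finish":
                trace.append((thought, action, ""))
                return arg, trace
            tool = tools.get(name)
            observation = tool(arg) if tool else f"Unknown tool: {name}"
        trace.append((thought, action, observation))
        # Feed the observation back so the next thought can use it.
        transcript += f"Thought: {thought}\nAction: {action}\nObservation: {observation}\n"
    return None, trace  # gave up without a Finish action


# --- Hypothetical toy components, for demonstration only ---
def toy_search(query: str) -> str:
    facts = {"Paris": "Paris is the capital of France."}
    return facts.get(query, f"No results for {query}.")


def toy_model(transcript: str) -> tuple[str, str]:
    if "Observation:" not in transcript:
        return "I should look up Paris.", "Search[Paris]"
    return "The observation answers the question.", "Finish[France]"


answer, trace = react_loop(toy_model, {"Search": toy_search},
                           "Paris is the capital of which country?")
print(answer)  # France
```

Feeding each observation back into the transcript is what lets the next thought depend on retrieved facts. The `trace` list is the same kind of thought-action-observation record that makes ReAct runs inspectable.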

Why it worked

  • It grounded the model’s thinking in real, retrieved information instead of pure speculation.
  • It reduced hallucination and error propagation because the model could check its reasoning against retrieved evidence at each step.
  • It improved both:
    • multi-hop question answering
    • and interactive decision-making tasks like navigation or shopping.

The paper’s practical value

  • The thought-action-observation traces are interpretable.
  • When the model fails, you can inspect where it went off track.
  • This made ReAct an important foundation for defining what an agent actually does.

Example used in the episode

  • The hosts walk through a HotpotQA-style question about the Apple Remote and Front Row.
  • A plain reasoning-only model gives the wrong answer.
  • ReAct succeeds by:
    • querying a simple Wikipedia search API,
    • refining its search terms,
    • and using retrieved facts to arrive at the correct answer.
  • The point is not the trivia itself, but the fact that the model is checking reality as it reasons.

Toolformer: Learning When to Use Tools

What Toolformer solved

  • If ReAct is about the process of reasoning + acting, Toolformer is about tool selection.
  • It asks: can a model teach itself when to use a tool, with only a handful of demonstrations per tool instead of large amounts of human annotation?

How it worked

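  • Meta AI had the model itself, prompted with a few examples, insert candidate API calls into ordinary text.
  • The calls were executed, and only those whose results reduced the model's loss on predicting the following tokens were kept.
  • The model was then fine-tuned on text containing those useful tool calls.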
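The filtering step can be written down directly. In the Toolformer paper, a candidate call counts as useful when inserting the call *and its result* lowers the loss on the following tokens. The comparison is against the better of two baselines: no call at all, or the call with its result withheld. The sketch below takes those losses as given numbers, since computing them requires a language model. The example values and the threshold `tau` are illustrative, not taken from the paper, and `CandidateCall`, `is_useful`, and `filter_calls` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CandidateCall:
    text: str                        # the API call, e.g. "Calculator(400 / 1400)"
    loss_without_call: float         # loss on following tokens, no call inserted
    loss_call_without_result: float  # call inserted, but its result withheld
    loss_call_with_result: float     # call and its result both inserted


def is_useful(c: CandidateCall, tau: float) -> bool:
    """Keep a call only if its result beats both baselines by at least tau."""
    loss_minus = min(c.loss_without_call, c.loss_call_without_result)
    loss_plus = c.loss_call_with_result
    return loss_minus - loss_plus >= tau


def filter_calls(candidates: list[CandidateCall], tau: float = 1.0) -> list[CandidateCall]:
    return [c for c in candidates if is_useful(c, tau)]


# Illustrative numbers only.
candidates = [
    CandidateCall("Calculator(400 / 1400)", 3.2, 3.1, 1.4),  # result helps a lot
    CandidateCall("Calendar()", 2.0, 2.1, 1.9),              # barely helps
]
kept = filter_calls(candidates, tau=1.0)
print([c.text for c in kept])  # ['Calculator(400 / 1400)']
```

Comparing against the "call without result" baseline matters: it rules out cases where merely mentioning a tool name, rather than the tool's answer, is what lowers the loss.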

Tools mentioned

  • Calculator
  • Question answering system
  • Search engine (over Wikipedia)
  • Translation system
  • Calendar

Why it mattered

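  • A smaller model with tool access could compete with much larger ones: the 6.7B-parameter Toolformer, built on GPT-J, was often competitive with models such as GPT-3 in zero-shot evaluations.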
  • The model learned to offload work:
    • math → calculator
    • factual lookup → search
    • date reasoning → calendar
  • In effect, it learned its own limitations and how to route around them.
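The "math → calculator" routing also has an execution side. When generated text contains a calculator call, something has to run it and splice the result back in. Here is a minimal sketch of that step, not Toolformer's actual inference code. It assumes a bracketed `[Calculator(expr)]` marker loosely modeled on the paper's notation. The regex, `safe_eval`, and `run_calculator_calls` are hypothetical helpers, and a small arithmetic evaluator stands in for a real calculator API.

```python
import ast
import operator
import re

# Arithmetic operators the stand-in "calculator" supports.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def safe_eval(expr: str) -> float:
    """Evaluate +, -, *, / arithmetic without calling Python's eval()."""
    def ev(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.operand))
        raise ValueError(f"unsupported expression: {expr!r}")
    return ev(ast.parse(expr, mode="eval"))


# Non-greedy match so nested parentheses inside the expression still work.
CALL_RE = re.compile(r"\[Calculator\((.*?)\)\]")


def run_calculator_calls(text: str) -> str:
    """Replace each [Calculator(expr)] with [Calculator(expr) → result]."""
    def repl(m: re.Match) -> str:
        expr = m.group(1)
        result = round(safe_eval(expr), 2)
        return f"[Calculator({expr}) → {result}]"
    return CALL_RE.sub(repl, text)


out = run_calculator_calls("Out of 1400 participants, 400 [Calculator(400 / 1400)] passed.")
print(out)  # Out of 1400 participants, 400 [Calculator(400 / 1400) → 0.29] passed.
```

Parsing the expression with `ast` rather than `eval` keeps a model-written call from executing arbitrary code, a small instance of the security concerns raised later in the episode.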

MCP: The Engineering Layer for Tool Use

What MCP is

  • Model Context Protocol (MCP) was introduced by Anthropic in November 2024.
  • It is an open standard for connecting AI assistants to external systems.

What problem it solves

  • Without a standard, every model-to-tool connection required a custom integration.
  • That created a combinatorial mess: with M models and N tools, you could need up to M × N bespoke connectors, whereas a shared standard lets each side implement it once (roughly M + N).

Why it matters

  • MCP acts like a USB-C port for AI:
    • one standard interface,
    • many compatible tools.
  • It supports connections to systems like:
    • email,
    • calendars,
    • code repositories,
    • business databases,
    • and other data sources.
  • It does not change the core agent loop; it just makes tool access much easier and more standardized.
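At the wire level, MCP messages use JSON-RPC 2.0. A client discovers a server's tools with the `tools/list` method and invokes one with `tools/call`. The sketch below only builds the request payloads. It skips the transport and the initialization handshake a real client performs. The helper `jsonrpc_request`, the tool name `search_calendar`, and its arguments are hypothetical.

```python
import json
from itertools import count
from typing import Any, Optional

_request_ids = count(1)  # JSON-RPC requests carry an id so responses can be matched


def jsonrpc_request(method: str, params: Optional[dict[str, Any]] = None) -> str:
    """Serialize a JSON-RPC 2.0 request object."""
    msg: dict[str, Any] = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)


# 1. Ask the server which tools it offers.
list_req = jsonrpc_request("tools/list")

# 2. Call one of them by name with structured arguments (hypothetical tool).
call_req = jsonrpc_request(
    "tools/call",
    {"name": "search_calendar", "arguments": {"date": "2026-04-27"}},
)
print(list_req)
print(call_req)
```

Because every compliant server answers the same methods, a client written once can talk to any of them. That shared interface is the point of the USB-C comparison.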

OpenClaw: Research Ideas Become a Product

What OpenClaw represents

  • The episode uses OpenClaw as a real-world example of the modern agent stack.
  • It’s a personal AI agent that can run on a user’s own computer and interact through apps like WhatsApp or Telegram.

Why it stood out

  • Users can grant it access to things like:
    • email,
    • calendar,
    • files,
    • GitHub,
    • and other tools.
  • It uses a skill system: modular instructions that teach the agent how to use specific tools.
  • This is presented as a direct consumer-facing implementation of the agent/tool-use ideas discussed in the episode.

Bigger implication

  • The popularity of systems like OpenClaw suggests that agents are not just chatbots with upgrades.
  • Giving an AI persistent tool access and permission to act on your behalf creates a different class of system with both huge utility and serious risk.

Why This Matters for AI Agents

The foundational loop

  • The episode frames tool use as central to the definition of an AI agent:
    • observe
    • reason
    • act
    • repeat
  • Tools are how the agent interacts with the world and gets new observations.

The tradeoff

  • More tools mean more capability.
  • But more tools also mean:
    • more ways to make mistakes,
    • more oversight challenges,
    • more security concerns,
    • and harder evaluation.

Open questions raised

  • How do you evaluate long tool-use sequences?
  • How do you supervise an agent acting faster than a human can monitor?
  • What permissions should an agent have?
  • How do you decide which tools to give it?

What’s Coming Next

  • The hosts preview the next episode, which will focus on memory and context.
  • Once an agent can act over multiple steps, the next question becomes:
    • what can it remember?
    • how much context can it hold?
    • and what happens when tasks outlast its context window?

Bottom Line

This episode argues that tool use is the hinge between a chatbot and a true agent. ReAct showed how to combine reasoning with action in a grounded, interpretable loop. Toolformer showed that models can learn when to invoke tools on their own. MCP made tool integration practical at scale. Together, these developments explain how AI systems moved from impressive text generators to agents that can do useful work in the real world.