Memory Management for AI Agents (The Agents Season, Episode 4)

by Ben Jaffe and Katie Malone

24 min · May 10, 2026

Overview

Ben Jaffe and Katie Malone explain how AI agents manage memory under the hard constraint of a limited context window. The episode focuses on why long-running agents struggle to stay coherent, and how approaches like MemGPT, RAG, and compaction each address a different part of the problem. A major theme is that memory is not one thing: agents need separate ways to handle factual knowledge, task state, and how-to instructions.

Core Problem: Context Is Limited

AI agents operate in a reasoning → acting → observing loop, but everything they need must fit inside their context window.

  • Context windows are finite, even when very large.
  • Models also show a “lost in the middle” effect: information at the beginning and end of the context is used more reliably than information in the middle.
  • For long tasks, multi-session conversations, and document analysis, this creates a memory management problem.

The result: agents need an explicit strategy for deciding what stays in context, what gets summarized, and what gets stored externally.

MemGPT: Treating Agent Memory Like Operating System Memory

The episode uses MemGPT as the foundational example of agent memory management.

The OS analogy

MemGPT borrows from operating systems:

  • Context window = RAM
    • Fast, immediate, but limited
  • External storage = disk
    • Slower to access, but much larger in capacity

The agent acts like a memory manager, deciding:

  • what to keep in context,
  • what to page out,
  • what to retrieve later.

How it works

MemGPT uses tool calls to:

  • read/write external memory,
  • search past information,
  • summarize and compress old context before moving it out.
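The paging idea can be sketched in a few lines. This is a toy illustration, not MemGPT's actual implementation: the `PagedMemory` class, its method names, and the word-count tokenizer are all assumptions made for the example. In a real system the model would trigger these operations itself through tool calls, and old context would typically be summarized before being moved out.

```python
from dataclasses import dataclass, field


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


@dataclass
class PagedMemory:
    """Hypothetical memory manager: `context` plays RAM, `archive` plays disk."""
    max_context_tokens: int
    context: list[str] = field(default_factory=list)
    archive: list[str] = field(default_factory=list)

    def context_tokens(self) -> int:
        return sum(count_tokens(m) for m in self.context)

    def append(self, message: str) -> None:
        """Add a message, paging the oldest ones out until the budget is met."""
        self.context.append(message)
        # Always keep at least the newest message, even if it alone is over budget.
        while self.context_tokens() > self.max_context_tokens and len(self.context) > 1:
            self.archive.append(self.context.pop(0))

    def search_archive(self, query: str) -> list[str]:
        """Case-insensitive substring search over paged-out messages."""
        q = query.lower()
        return [m for m in self.archive if q in m.lower()]
```

A short usage example: with a six-token budget, appending a second message pushes the first one out to the archive, where it can still be found by search.

```python
mem = PagedMemory(max_context_tokens=6)
mem.append("user prefers tabs over spaces")
mem.append("deploy target is staging")
print(mem.context)                 # only the newest message remains in context
print(mem.search_archive("tabs"))  # the older message is recoverable from the archive
```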

Why it matters

This approach helped with tasks like:

  • long-document analysis,
  • multi-session conversations,
  • extended agentic workflows.

The key idea: the agent should manage its own memory, not rely on a passive prompt dump.

RAG vs. Agent Memory

The hosts distinguish retrieval-augmented generation (RAG) from MemGPT-style memory.

RAG

RAG is mainly:

  • an external system retrieving relevant information,
  • then inserting that information into the model’s context.

This is useful for background knowledge and semantic memory.

How MemGPT differs

MemGPT is more active:

  • the model decides when to store something externally,
  • decides what to retrieve,
  • can compress its own state,
  • is aware of its memory architecture.

In short:

  • RAG = retrieval done for the model
  • MemGPT = retrieval and memory management done by the agent
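The contrast comes down to who makes the retrieval decision. The sketch below is one hedged way to show it: `retrieve`, `rag_prompt`, `agent_step`, the `memory_write` and `memory_search` tool names, and the keyword matcher are all hypothetical, chosen for illustration rather than taken from any real library.

```python
DOCS = ["Paris is the capital of France.", "The build uses Python 3.12."]


def retrieve(query: str, docs: list[str]) -> list[str]:
    """Naive keyword retriever: return docs sharing at least one word with the query."""
    words = set(query.lower().split())
    return [d for d in docs if words & set(d.lower().rstrip(".").split())]


def rag_prompt(query: str) -> str:
    """RAG: the surrounding system retrieves on every call; the model has no say."""
    hits = retrieve(query, DOCS)
    return "\n".join(hits + [query])


def agent_step(action: dict, store: list[str]) -> list[str]:
    """Agent-managed memory: the model's chosen action decides what happens."""
    if action["tool"] == "memory_write":
        store.append(action["text"])
        return []
    if action["tool"] == "memory_search":
        return retrieve(action["query"], store)
    return []  # any other action leaves memory untouched
```

In `rag_prompt` the retrieval step is hard-wired into the pipeline, while in `agent_step` nothing is stored or fetched unless the model asks for it.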

Compaction: The Practical Version of Memory Management

A major practical example is compaction, as seen in tools like Claude Code.

What compaction does

When context gets too full:

  • the agent generates a structured summary of the session,
  • the detailed history is cleared,
  • the conversation resumes from the summary.

This creates the illusion of continuous conversation across very long sessions.
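A minimal sketch of that cycle follows. It assumes a word count as a rough token proxy and a placeholder `toy_summarize` function where a real system would call the model. Neither reflects how Claude Code actually implements compaction.

```python
from typing import Callable


def compact(history: list[str], max_tokens: int,
            summarize: Callable[[list[str]], str]) -> list[str]:
    """Replace the history with one summary message once it exceeds the budget."""
    used = sum(len(m.split()) for m in history)  # word count as a rough token proxy
    if used <= max_tokens:
        return history
    # The detailed history is dropped; only the summary carries forward.
    return ["[summary] " + summarize(history)]


def toy_summarize(history: list[str]) -> str:
    """Placeholder for an LLM call: keep only the first three words of each message."""
    return "; ".join(" ".join(m.split()[:3]) for m in history)
```

The truncation in `toy_summarize` is deliberately crude, and it makes the lossy nature of the step easy to see: anything past the first few words of each message is gone after compaction.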

What gets preserved

Typically, compaction keeps:

  • current state,
  • next steps,
  • key learnings,
  • major decisions.
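One way to make a summary "structured" is to give each of these items its own field. The `CompactionSummary` class below is a hypothetical container invented for illustration; the episode does not describe the actual summary format any tool uses.

```python
from dataclasses import dataclass, field


@dataclass
class CompactionSummary:
    """Hypothetical container for the four items listed above."""
    current_state: str
    next_steps: list[str] = field(default_factory=list)
    key_learnings: list[str] = field(default_factory=list)
    major_decisions: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the summary as plain text to seed the resumed conversation."""
        lines = [f"Current state: {self.current_state}"]
        for title, items in (
            ("Next steps", self.next_steps),
            ("Key learnings", self.key_learnings),
            ("Major decisions", self.major_decisions),
        ):
            if items:  # skip empty sections entirely
                lines.append(f"{title}:")
                lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)
```

Explicit fields make it harder for a summary to silently omit a category, although they do nothing to recover details that were never written into a field.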

What can be lost

Because summaries are lossy, compaction can drop:

  • exact reasoning behind old decisions,
  • subtle constraints,
  • exceptions mentioned earlier,
  • fine-grained details.

This can lead to context rot, where the agent becomes fuzzier, repeats itself, or contradicts earlier decisions.

Claude Code and the Role of CLAUDE.md

The episode uses Claude Code as a concrete example of how memory is handled in practice.

CLAUDE.md

This file acts as durable, re-injected context:

  • project conventions,
  • architectural decisions,
  • instructions that should survive compaction.

It’s a workaround for the fact that summaries alone are not reliable enough for everything important.

Reconstructed context

After compaction, the system reloads:

  • the summary,
  • recent files,
  • tool definitions,
  • instructions from CLAUDE.md.

This helps maintain continuity while acknowledging that the full history cannot be kept forever.
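The reload step can be pictured as a function that rebuilds the context from those four sources. The `rebuild_context` function, its bracketed labels, and in particular the ordering of the parts are assumptions made for this sketch, since the episode summary does not specify the order Claude Code uses.

```python
def rebuild_context(summary: str, recent_files: dict[str, str],
                    tool_definitions: list[str],
                    project_instructions: str) -> list[str]:
    """Reassemble a fresh context after compaction (ordering is an assumption)."""
    # Durable instructions go first so they sit near the start of the context.
    parts = [f"[project instructions]\n{project_instructions}"]
    parts += [f"[tool] {tool}" for tool in tool_definitions]
    parts.append(f"[summary]\n{summary}")
    # Recently touched files are re-read rather than recalled from the summary.
    parts += [f"[file {path}]\n{body}" for path, body in recent_files.items()]
    return parts
```

Note that the file contents and instructions are reloaded from their sources rather than reconstructed from the summary, which is what keeps them exact.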

The Hierarchy of Context

Another major takeaway: context is not a flat space where everything is equally important.

1. System prompt

At the top is the system prompt:

  • defines the agent’s role,
  • sets constraints,
  • is always present and highly privileged.

2. Durable project instructions

Below that is project- or user-specific persistent context, such as CLAUDE.md. This layer:

  • contains rules and conventions that should survive compaction,
  • is deliberately re-injected so that it stays near the start of the context.

3. Conversation history

Then comes the live interaction:

  • user messages,
  • assistant responses,
  • tool calls,
  • tool outputs.

This layer changes the most and is most vulnerable to:

  • lost-in-the-middle,
  • compaction loss,
  • context rot.

The broader lesson: important instructions should be placed where the model is most likely to attend to them.
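The three layers can be sketched as a context builder that treats the top two as fixed and trims only the conversation layer. The `assemble_context` function and the word-count token proxy are illustrative assumptions, not a description of any particular agent framework.

```python
def assemble_context(system_prompt: str, durable_instructions: str,
                     history: list[str], max_tokens: int) -> list[str]:
    """Build the context top-down; only the conversation layer is ever trimmed."""
    def tokens(text: str) -> int:
        return len(text.split())  # word count as a rough token proxy

    fixed = [system_prompt, durable_instructions]
    budget = max_tokens - sum(tokens(t) for t in fixed)
    kept: list[str] = []
    # Walk history newest-first so the most recent turns survive trimming.
    for message in reversed(history):
        cost = tokens(message)
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return fixed + list(reversed(kept))
```

Under this design the system prompt and durable instructions are returned even when the budget is exhausted, which mirrors the idea that the history layer is the one that absorbs the loss.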

Different Kinds of Memory

The episode frames memory in a useful three-part taxonomy:

Semantic memory

  • factual knowledge about the world
  • best handled by RAG

Episodic memory

  • what happened during a task
  • observed state, decisions, session history
  • handled by MemGPT-style systems and compaction

Procedural memory

  • how to do things
  • loaded as skills or tool-specific instructions
  • examples include GitHub or Gmail skills

This is a helpful way to think about what kind of memory a system needs, rather than treating “memory” as one generic feature.
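As a small illustration, the taxonomy can be written down as an enum with a lookup from each kind of memory to the mechanism the episode associates with it. The `MemoryKind` and `mechanism_for` names are hypothetical and exist only for this sketch.

```python
from enum import Enum


class MemoryKind(Enum):
    SEMANTIC = "semantic"      # factual knowledge about the world
    EPISODIC = "episodic"      # what happened during a task
    PROCEDURAL = "procedural"  # how to do things


# Mechanism the episode associates with each kind of memory.
MECHANISM = {
    MemoryKind.SEMANTIC: "RAG over an external knowledge base",
    MemoryKind.EPISODIC: "agent-managed memory and compaction",
    MemoryKind.PROCEDURAL: "skills or tool-specific instructions",
}


def mechanism_for(kind: MemoryKind) -> str:
    """Look up which mechanism handles a given kind of memory."""
    return MECHANISM[kind]
```

Asking which `MemoryKind` a piece of information belongs to is a quick way to decide whether it should live in a retrieval index, in session state, or in a skill file.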

Main Takeaways

  • The context window is a hard constraint, not just an inconvenience.
  • Long-running agents need explicit memory management.
  • MemGPT models memory like an operating system managing RAM and disk.
  • RAG helps with retrieving external knowledge, but it is not the same as agent-managed memory.
  • Compaction is the practical, widely used method for maintaining long sessions, but it is lossy.
  • Systems like Claude Code rely on durable files like CLAUDE.md to preserve important instructions.
  • Good agent design depends on understanding the hierarchy of context and placing critical information where the model will actually use it.

What’s Next

The hosts end by teasing the next episode, which will focus on planning: if memory is about what an agent can hold in its head, planning is about how far ahead it can think.

They suggest that the gap between search-based planning and what production agents actually do is larger than many people expect.