Lost in the Middle (The Agents Season, Episode 3)

by Ben Jaffe and Katie Malone

19 min · May 4, 2026

Overview

In this episode of Linear Digressions, Ben Jaffe and Katie Malone explain the “lost in the middle” phenomenon in large language models: even when a model has access to a long context window, it often performs best when relevant information appears at the beginning or end of the prompt, and worst when it’s buried in the middle. The episode connects this research finding to AI agents, showing why long-running agent workflows can degrade over time and why context management matters as much as context size.

What “Lost in the Middle” Means

The central idea is that LLMs do not use all parts of their context equally.

  • Beginning and end of context are easier for models to use
  • Middle information is often underutilized or effectively ignored
  • This happens even when the relevant information is well within the model’s stated context limit

The hosts compare this to human memory’s serial position effect:

  • Primacy = remembering the beginning
  • Recency = remembering the end
  • The middle is least memorable

Key Research Findings

The episode focuses on the 2023 Stanford/Berkeley paper “Lost in the Middle: How Language Models Use Long Contexts.”

Experimental setup

Researchers gave models a question and a set of documents, only one of which contained the answer. They varied where that relevant document appeared:

  • At the start
  • At the end
  • In the middle
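The setup above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's actual harness: the function names, the "Document [n]" prompt layout, and the toy documents are all assumptions made for the example. It shows only the one thing that varies between conditions, which is the slot the answer-bearing document occupies.

```python
def build_prompt(question, distractors, answer_doc, position):
    """Place answer_doc at index `position` among the distractor documents.

    Hypothetical prompt layout, chosen for illustration only.
    """
    docs = list(distractors)
    docs.insert(position, answer_doc)
    numbered = [f"Document [{i + 1}]: {d}" for i, d in enumerate(docs)]
    return "\n".join(numbered) + f"\n\nQuestion: {question}\nAnswer:"


def answer_index(prompt, answer_doc):
    """Return the 0-based slot the answer document occupies, or -1 if absent."""
    for i, line in enumerate(prompt.split("\n")):
        if line.endswith(answer_doc):
            return i
    return -1


distractors = [f"Filler passage {n}." for n in range(4)]
answer_doc = "The capital of Australia is Canberra."
question = "What is the capital of Australia?"

# Same documents and question, three placements: start, middle, end.
prompts = {
    name: build_prompt(question, distractors, answer_doc, pos)
    for name, pos in [("start", 0), ("middle", 2), ("end", 4)]
}
for name, prompt in prompts.items():
    print(name, "-> answer in slot", answer_index(prompt, answer_doc) + 1)
```

Each prompt would then be sent to the model and scored on whether the answer comes back correct. Plotting accuracy against slot number is what produces the curve described next.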

Main result: a U-shaped curve

Models performed best when the answer was:

  • First
  • Last

They performed much worse when the answer was in the middle, creating a U-shaped performance curve.

Surprising implication

In one case, GPT-3.5 Turbo did worse when the answer-bearing document sat in the middle of its context than when it was given no documents at all, suggesting that irrelevant surrounding context could actively hurt performance.

Bigger context windows do not fully solve it

Even when researchers switched to versions of the models with larger context windows, the position-based performance pattern remained largely the same. A bigger window did not automatically mean better use of what was in it.

More documents also help only a little

Giving models more retrieved documents produced only minor gains. In the episode’s example, going from 20 to 50 documents improved performance by only about 1.5% for GPT-3.5 Turbo.

Why This Happens

The episode explains that this is not just a training-data issue. While models may learn patterns from text where important information often appears early or late, the deeper explanation is architectural.

Causal masking

Decoder-style transformers, the architecture behind these LLMs, use causal masking, meaning each token can attend only to earlier tokens, not later ones. This creates an asymmetry:

  • The first token is visible to every later token
  • Middle tokens are visible to fewer downstream tokens
  • Tokens near the end benefit from other structural effects
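The first two bullets can be checked with a toy causal mask. The sketch below is plain Python with hypothetical helper names, and it models only the mask itself, not attention weights or a real transformer. It counts, for each position, how many positions in the sequence are allowed to look at it.

```python
def causal_mask(n):
    """mask[q][k] is True when query position q may attend to key position k."""
    return [[k <= q for k in range(n)] for q in range(n)]


def visibility_counts(mask):
    """For each key position, count how many query positions can see it."""
    n = len(mask)
    return [sum(mask[q][k] for q in range(n)) for k in range(n)]


mask = causal_mask(6)
counts = visibility_counts(mask)
print(counts)  # [6, 5, 4, 3, 2, 1]
```

The count falls steadily from the first position onward, which matches the advantage the beginning of the sequence gets. On its own the mask would predict that the last tokens are the least visible of all, so it cannot explain why the end also performs well; that part comes from the other structural effects mentioned in the last bullet.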

Residual connections

Residual connections are the second architectural factor: they help the end of the sequence retain signal better than the middle. Together with causal masking, which favors the beginning, this leaves the middle in a kind of dead zone.

Why This Matters for AI Agents

The episode ties the research directly to agentic systems.

AI agents:

  • Observe
  • Reason
  • Act

They repeat that loop many times, accumulating context as they go. Over long tasks, important information from early steps may become buried in the middle of a huge context window.
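The drift toward the middle can be shown with a toy loop. This is a simplified sketch under stated assumptions: the context is modeled as a list with one equally sized entry per step (real contexts are measured in tokens and entries vary in length), and `relative_position` and `run_loop` are hypothetical names invented for the example.

```python
def relative_position(index, length):
    """0.0 is the very start of the context, 1.0 is the very end."""
    return 0.0 if length <= 1 else index / (length - 1)


def run_loop(total_steps, tracked_step):
    """Append one entry per step and record where the tracked entry sits."""
    context = []
    trace = []
    for step in range(1, total_steps + 1):
        context.append(f"step {step}: observe / reason / act")
        if step >= tracked_step:
            idx = tracked_step - 1  # slot of the entry we are following
            trace.append((step, round(relative_position(idx, len(context)), 2)))
    return trace


trace = run_loop(total_steps=21, tracked_step=11)
print(trace[0])   # (11, 1.0)
print(trace[-1])  # (21, 0.5)
```

The observation recorded at step 11 starts at the very end of the context, where models tend to use information well. Ten steps later, nothing about it has changed except that it now sits at the exact midpoint.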

Practical consequence

Even if the information technically still fits in the context window, it may become functionally less accessible to the model.

This is especially important for:

  • Multi-step workflows
  • Long-running tool use
  • Agents maintaining task state across many interactions

Practical Takeaways

For people building or using agents, the episode suggests several lessons:

  • Front-load important instructions and constraints
  • Don’t assume a bigger context window solves memory problems
  • Be aware that where information appears matters, not just whether it is present
  • For long tasks, you need active context management, not just more tokens
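As a rough illustration of the first and last bullets, the sketch below assembles a context with instructions at the front and only the most recent steps kept verbatim. It is one possible shape for active context management, not a method described in the episode: `assemble_context`, the `keep_recent` parameter, and the omission marker are all assumptions, and the marker stands in for whatever summary or compaction step a real system would use.

```python
def assemble_context(instructions, history, keep_recent=3):
    """Put instructions first, collapse older history, keep recent steps verbatim."""
    split = max(len(history) - keep_recent, 0)
    older, recent = history[:split], history[split:]

    parts = ["INSTRUCTIONS:"] + [f"- {line}" for line in instructions]
    if older:
        # Placeholder only: a real system would insert an actual summary here.
        parts.append(f"[{len(older)} earlier steps omitted]")
    parts.extend(recent)
    return "\n".join(parts)


history = [f"step {i}: tool output" for i in range(1, 9)]
ctx = assemble_context(["Never delete files.", "Reply in JSON."], history)
print(ctx)
```

The design choice here is to spend the two well-used regions deliberately: constraints go at the start, the latest observations go at the end, and the bulk of the history that would otherwise fill the middle is shortened.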

What’s Next

The episode sets up the next topic in the series: context management.

The hosts note that if bigger context windows alone aren’t enough, then the real solution is to manage context more intelligently. They tease future discussion of:

  • Memory-management-inspired systems
  • Compaction
  • Techniques used in long coding sessions and agent workflows

Bottom Line

The core message is simple: LLMs are not equally attentive to all parts of a long prompt. Information in the middle is often hardest for them to use, and this has major consequences for AI agents that must reason over long sequences of observations and actions. For reliable agent behavior, context placement and context management matter just as much as context size.