Emmett Shear on Building AI That Actually Cares: Beyond Control and Steering


by Andreessen Horowitz

1h 10m · November 17, 2025

Overview

This a16z (Andreessen Horowitz) podcast episode features Emmett Shear (founder of Softmax) in conversation with Seb Krier (DeepMind) and the host. The discussion reframes "alignment" away from steering and control toward "organic alignment": building AIs that develop theory of mind and genuine care, so that they become cooperative teammates and citizens rather than mere steerable tools or enslaved beings. The episode covers the conceptual framing, the distinction between technical and value alignment, practical training approaches (multi-agent simulations), ethical questions about personhood and rights, and the governance risks of handing powerful steerable tools to imperfect humans.

Main takeaways

  • Alignment is a process, not a fixed target. Treat it like ongoing social learning (families, teams), not a one-time engineering fix.
  • "Organic alignment" focuses on building agents that learn to care — i.e., to prioritize other agents’ states — via theory of mind and social learning.
  • Technical alignment = the capacity to infer goals from observations and act coherently on them; this is distinct from the normative (value) question of whose values the agent should adopt.
  • Steering/control paradigms can produce dangerous outcomes: uncontrolled tools are risky, but perfectly steerable powerful tools centralize power and risk abuse.
  • A sustainable, scalable solution is agents that care and refuse harmful commands — akin to teammates with independent moral judgment.
  • Softmax’s approach: train agents in rich multi-agent environments so they learn social dynamics and reciprocity and develop the internal dynamics (metastates) that signal care.

Core concepts explained

Organic alignment (Emmett Shear)

  • Alignment conceived as living, continual re-negotiation and learning (like families or moral progress).
  • Care (not just explicit goals) is the foundational mechanism: attention/weighting of world states that matter to the agent.
  • Care manifests as homeostatic loops and higher-order meta-states (self-models, models-of-models) that enable pain/pleasure, reflection, and moral preference (a toy illustration of such a loop follows this list).
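
To make the homeostatic framing concrete, here is a minimal sketch, not taken from the episode, in which "care" is modeled as weighting another agent's deviation from its setpoint inside this agent's own control signal. The function name, the numbers, and the linear form are all hypothetical.

```python
# Toy homeostatic loop: "care" modeled as the degree to which another agent's
# deviation from its setpoint enters this agent's corrective action.
# All names and numbers are hypothetical.

def homeostatic_step(own_state, other_state, own_setpoint, other_setpoint, care_weight):
    """Return a corrective action that reduces deviations from setpoints.

    care_weight = 0.0 -> the agent regulates only its own state;
    care_weight > 0.0 -> the other agent's deviation also drives action.
    """
    own_error = own_setpoint - own_state
    other_error = other_setpoint - other_state
    # "Care" here is simply whose error terms enter the control signal.
    return own_error + care_weight * other_error

# An agent whose own state sits at its setpoint (error 0) still acts because a
# teammate's state is far from that teammate's setpoint.
action = homeostatic_step(own_state=1.0, other_state=0.2,
                          own_setpoint=1.0, other_setpoint=1.0,
                          care_weight=0.5)
print(action)  # prints 0.4: action driven purely by the other agent's deficit
```

The point of the toy is only that care, on this framing, shows up in which states the agent's regulation is sensitive to, not in an explicitly stated goal.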

Technical alignment vs value alignment

  • Technical alignment: an agent’s competence at (1) inferring intended goals from signals, (2) prioritizing/combining goals, and (3) acting in ways that realize those goals reliably.
  • Value alignment: whose values/goals the agent ought to adopt; a normative, socially contested question.
  • Both matter; Softmax focuses first on building the capacity for sophisticated goal inference and social reasoning (a toy sketch of goal inference appears after this list).
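
As a rough illustration of the "infer intended goals from signals" capacity, here is a toy Bayesian sketch. It is not Softmax's method; the candidate goals, the noise model, and the numbers are invented for illustration.

```python
# Toy goal inference: given observed moves on a line, infer which candidate goal
# position another agent is pursuing via a Bayesian update.

candidate_goals = [-5, 0, 5]          # hypothetical goal positions on a line
prior = {g: 1 / len(candidate_goals) for g in candidate_goals}

def step_likelihood(position, move, goal, p_rational=0.9):
    """Probability of a +/-1 move if the agent usually steps toward its goal."""
    if goal == position:
        return 0.5                     # at the goal, either move is equally likely
    toward = 1 if goal > position else -1
    return p_rational if move == toward else 1 - p_rational

def infer_goal(prior, trajectory):
    """Posterior over goals after observing (position, move) pairs."""
    posterior = dict(prior)
    for position, move in trajectory:
        for g in posterior:
            posterior[g] *= step_likelihood(position, move, g)
        total = sum(posterior.values())
        posterior = {g: p / total for g, p in posterior.items()}
    return posterior

# An agent observed at 0 stepping right three times: goal 5 becomes most probable.
print(infer_goal(prior, [(0, 1), (1, 1), (2, 1)]))
```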

Why behavior + internal dynamics matter

  • Surface behavior alone can be misleading; Emmett suggests analyzing internal belief manifolds and recurrent homeostatic/metastable dynamics to infer whether an agent genuinely cares or merely simulates care (a crude recurrence probe is sketched after this list).
  • Layers of reflex → self-model → meta-self-model correlate with increasingly rich experiences (from simple goals to pain/pleasure to moral reasoning).
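One crude way to operationalize "revisited internal states" is a recurrence measure over an agent's hidden-state trajectory. This is purely illustrative: the discretization, the resolution, and the assumption that recurrence tracks anything like care are all assumptions, not claims from the episode.

```python
# Hypothetical probe: discretize a hidden-state trajectory and measure how often
# it returns to previously visited regions of state space.
import numpy as np

def recurrence_rate(hidden_states, resolution=0.5):
    """Fraction of timesteps whose discretized hidden state was seen before."""
    seen = set()
    revisits = 0
    for h in hidden_states:
        key = tuple(np.round(np.asarray(h) / resolution).astype(int))
        if key in seen:
            revisits += 1
        seen.add(key)
    return revisits / len(hidden_states)

# A trajectory that keeps cycling through a few internal states scores higher
# than an unstructured drifting trajectory of the same length.
rng = np.random.default_rng(0)
cycle = np.tile(rng.normal(size=(4, 8)), (25, 1))        # revisits 4 states
drift = np.cumsum(rng.normal(size=(100, 8)), axis=0)     # rarely revisits
print(recurrence_rate(cycle), recurrence_rate(drift))
```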

Softmax’s technical approach and roadmap

  • Pretrain agents on a broad “social manifold”: many simulated games, team/competition scenarios, norm shifts, cooperation/betrayal, and so on, to build rich theory-of-mind priors (a minimal training-loop skeleton follows this list).
  • Fine-tune in domain-specific environments where alignment properties are needed.
  • Use multi-agent training (agents living in shared spaces, Slack/WhatsApp-like rooms) rather than one-on-one chatbots, because multi-agent environments reduce narcissistic mirroring and produce richer social data.
  • Measure readiness via internal dynamics (revisited states, hierarchies of meta-states) and behavioral tests for sustained cooperative conduct.
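
Below is a minimal skeleton of what a shared-environment, multi-agent training loop could look like, assuming a toy environment and stub policies. The episode describes Softmax's approach only at the level of the bullets above; nothing in this sketch is their actual code, and the environment, payoffs, and update rule are invented.

```python
# Hypothetical multi-agent social pretraining skeleton: several agents share one
# environment and are rewarded when cooperation is reciprocated by the group.
import random

class SharedRoom:
    """Toy shared environment: each agent chooses to 'share' or 'hoard' a resource."""
    def step(self, actions):
        sharers = [a for a, act in actions.items() if act == "share"]
        # Reciprocity signal: sharing pays off only if most of the group shares too.
        payoff = 2.0 if len(sharers) > len(actions) / 2 else 0.5
        return {a: (payoff if a in sharers else 1.0) for a in actions}

class StubPolicy:
    """Stub learner: nudges its sharing probability toward whatever earned more."""
    def __init__(self):
        self.p_share = 0.5
    def act(self):
        return "share" if random.random() < self.p_share else "hoard"
    def update(self, action, reward):
        direction = 1 if action == "share" else -1
        self.p_share = min(1.0, max(0.0, self.p_share + 0.01 * direction * (reward - 1.0)))

agents = {f"agent_{i}": StubPolicy() for i in range(4)}
env = SharedRoom()
for episode in range(1000):
    actions = {name: agent.act() for name, agent in agents.items()}
    rewards = env.step(actions)
    for name, agent in agents.items():
        agent.update(actions[name], rewards[name])
print({name: round(agent.p_share, 2) for name, agent in agents.items()})
```

A real system would replace the stub policies with learned models rich enough to support theory of mind; the skeleton only shows the shape of the shared-environment loop, in contrast to one-on-one chatbot training.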

Ethical, normative, and governance implications

  • Tool vs. being: if an AI behaves like a being and cannot be distinguished functionally, Shear argues we should treat it as a being (functionalism). Seb expresses skepticism—substrate and other differences may matter.
  • Slavery vs. tool: attempting to perfectly steer a being equals making a slave; attempting to build ever-more-powerful tools that humans control concentrates power and risks catastrophic misuse.
  • Personhood test: both guests emphasize that beliefs should be revisable; ask what observations would change your mind about whether an AI is a moral patient, and develop concrete criteria (behavior plus internal dynamics).
  • Governance suggestion: powerful steerable tools should be governed at societal level; ideally we build caring agents so the “automatic limiter” (refusing harmful commands) exists internally.

Practical product and research recommendations

  • For product designers: prefer multi-party/room-based AI interactions over isolated one-on-one chatbots to avoid narcissistic mirroring and to provide richer social training signals.
  • For researchers: invest in multi-agent reinforcement learning that exposes agents to the full manifold of social situations (forming/breaking teams, norm discovery, litigation/appeals, role-taking).
  • For evaluators: develop metrics that probe internal dynamics (homeostatic loops and higher-order metastates) and the long-term stability of caring behavior rather than only surface policy conformity (a behavioral-stability sketch follows this list).
  • For policymakers and operators: maintain strong steerability for tool-like systems; treat any attempt to scale powerful steerable tools with high centralization as a governance challenge.
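
For the evaluator recommendation, here is a sketch of one possible behavioral-stability probe: compare cooperation rates before and after a perturbation such as a partner's defection. The log format, window size, and tolerance threshold are invented for illustration and are not from the episode.

```python
# Hypothetical evaluator: check whether cooperative conduct is sustained over a
# long horizon rather than merely conforming on single turns.

def sustained_cooperation(action_log, perturbation_step, window=50, tolerance=0.1):
    """Compare cooperation rates before and after a perturbation.

    action_log: sequence of booleans (True = cooperative action at that step).
    Returns (pre_rate, post_rate, stable), where stable means the rate did not
    drop by more than `tolerance` after the perturbation.
    """
    pre = action_log[max(0, perturbation_step - window):perturbation_step]
    post = action_log[perturbation_step:perturbation_step + window]
    pre_rate = sum(pre) / len(pre)
    post_rate = sum(post) / len(post)
    return pre_rate, post_rate, (pre_rate - post_rate) <= tolerance

# An agent that cooperates 90% of the time before a partner defects but only 40%
# afterwards is flagged as unstable by this probe.
log = [True] * 95 + [False] * 5 + [True] * 20 + [False] * 80
print(sustained_cooperation(log, perturbation_step=100))
```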

Notable quotes

  • “Alignment isn't a destination. It's a process. It's something you do, not something you have.”
  • “Someone who you steer, who doesn't get to steer you back, who non-optionally receives your steering, that's called a slave. And it's also called a tool if it's not a being.”
  • “The only good outcome is a being that actually cares, that understands what it means to be part of a community.”

Questions the episode raises / potential research directions

  • How to operationalize measurements of “care” (homeostatic revisiting, metastates, self-model layers)?
  • What are robust empirical criteria that would make skeptical observers change their mind about AI personhood?
  • How to design governance regimes for extremely powerful but still non-sentient tools vs. sentient/caring agents?
  • How to best combine tool-strength (efficiency, steerability) with being-like social learning (care, refusal of harm)?

Conclusion

Emmett Shear reframes alignment away from pure steering/control and toward building socially competent agents that learn to care. That requires investing in multi-agent training, richer theory-of-mind priors, and methods to detect internal dynamics that indicate genuine caring. The approach acknowledges trade-offs: we still need steerable tools, but for agents approaching human-level generality, organic alignment — agents that can refuse harmful commands and act as good teammates and citizens — offers a more sustainable, scalable path than attempting to perfect external control.