Treating your agents like microservices

by The Stack Overflow Podcast

35m · December 5, 2025

Overview of The Stack Overflow Podcast — Treating your agents like microservices

This episode (host Ryan Donovan) features Guillaume de Saint‑Marc, VP of Engineering at Outshift (Cisco), discussing why the near‑term future of AI will be multi‑agent systems and how those agents should be treated like microservices: specialized, discoverable, secure, observable, and able to collaborate at machine scale. Guillaume explains the infrastructure, protocols, and operational practices needed to move from single agents to interoperable ecosystems (an “Internet of Agents”), and describes open‑source work (AGNTCY and related tooling) intended to bootstrap that stack.

Key topics covered

  • Why multi‑agent systems (MAS) are the natural next step: modular reasoning, many inferences instead of one huge model, and specialization for enterprise trust.
  • Tradeoffs between a single “universal” agent vs many specialized agents.
  • Identity, authorization, and security challenges unique to agents (logical/business identity vs workload identity; rapidly changing context/identity at machine speed).
  • Protocols and messaging patterns for agent communication:
    • MCP (tool/access pattern)
    • A2A (agent‑to‑agent peer protocol)
    • Slim (group messaging transport + SlimRPC)
  • Discovery and registries: the Open Agentic Schema Framework (OASF) and AGNTCY’s decentralized agent directory (DHT/federation model).
  • Observability and monitoring for agentic systems: extending OpenTelemetry, layer‑8/9 semantic observation, the Metric Computation (MC) engine, and LLMs-as-judges to evaluate behavior.
  • Deployment patterns: “lift‑and‑shift” of deterministic workflows vs fully self‑forming agent swarms, and shades in between.
  • Open‑source initiative and testbeds: AGNTCY (spelled A‑G‑N‑T‑C‑Y) and Outshift’s website/newsletter.

Main takeaways

  • Multi‑agent systems are becoming necessary because specialized agents cooperating can outperform a single monolithic agent in enterprise settings (trust, cost, control).
  • Agents combine properties of workloads and human‑like actors, so they need new layers of infrastructure beyond traditional cloud‑native tooling.
  • Identity and authorization are central problems: agents can act for different users and at machine speed, so you need fine‑grained, provable, semantic identity and authorization proxies.
  • Agent communication needs more than point‑to‑point RPC. Secure, low‑latency, group messaging (Slim) and scalable data‑oriented transports are required to avoid race conditions and to scale collaboration.
  • Discovery must be decentralized and extensible (OASF plus the AGNTCY agent directory) so many vendors and internal teams can publish and find agents without being locked into a single platform.
  • Observability must include semantic/behavioral telemetry (not just container health). Reconstructing call graphs and measuring consistency across runs is critical for trust and debugging.
  • Practical adoption path: start with deterministic, certifiable workflows (lift‑and‑shift) where acceptable accuracy (e.g., 80%) already yields large ROI; gradually move toward more dynamic agent orchestration.
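The identity takeaway above can be illustrated with a minimal sketch. It is not taken from the episode or from any AGNTCY component: the names `Grant` and `AuthorizationProxy`, the agent and user names, and the scope strings are all hypothetical. The point it shows is that an action is checked against the pair (agent, user it acts for), so a permission granted in one context does not carry over to another.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """Hypothetical grant: which agent may act for which user, with which scopes."""
    agent_id: str           # workload identity: which process is calling
    on_behalf_of: str       # business/logical identity: whose authority is used
    scopes: frozenset[str]  # actions allowed in this context


class AuthorizationProxy:
    """Checks each action against a grant keyed by (agent, user), not by agent alone."""

    def __init__(self) -> None:
        self._grants: dict[tuple[str, str], Grant] = {}

    def register(self, grant: Grant) -> None:
        self._grants[(grant.agent_id, grant.on_behalf_of)] = grant

    def is_allowed(self, agent_id: str, on_behalf_of: str, action: str) -> bool:
        grant = self._grants.get((agent_id, on_behalf_of))
        return grant is not None and action in grant.scopes


proxy = AuthorizationProxy()
proxy.register(Grant("triage-agent", "alice", frozenset({"tickets:read", "tickets:comment"})))
proxy.register(Grant("triage-agent", "bob", frozenset({"tickets:read"})))

# Same agent, different user context, different answer.
print(proxy.is_allowed("triage-agent", "alice", "tickets:comment"))
print(proxy.is_allowed("triage-agent", "bob", "tickets:comment"))
```

A production system would back the grants with the enterprise authorization system rather than an in-memory dictionary; the sketch only shows the shape of the check.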

Technical components & protocols explained

  • MCP (Model Context Protocol): widely used today for agents to access tools or data through a transactional, request/response API. Good for tool access but limited for symmetric agent collaboration.
  • A2A: peer‑to‑peer agent protocol for symmetric agent interactions. Good for one‑to‑one collaboration; point‑to‑point only.
  • Slim (SlimRPC / slim transport):
    • A gRPC‑style, low‑latency, data‑oriented messaging transport built for agents.
    • Supports point‑to‑point, multicast/group communication, and “fire and forget” patterns.
    • Implements secure group messaging (MLS stack), member revocation, and efficient data multicast to scale better than many point‑to‑point messages.
    • Enables multi‑RPC/group RPC where multiple agents share content and respond.
  • OASF (Open Agentic Schema Framework):
    • Extensible schema for “agent cards” (descriptive metadata about agents).
    • Can carry MCP, A2A, vendor specs and custom attributes.
  • AGNTCY agent directory (decentralized registry):
    • A federation of directory nodes that use a DHT‑like peer mesh to enable discovery without a centralized directory.
    • Nodes can publish agent cards for internal use or for external sharing; signed provenance lets consumers verify authenticity.
  • Observability stack:
    • Extend OpenTelemetry to capture semantic/agentic behavior (layer 8 syntax, layer 9 semantics).
    • Metric Computation (MC) engine: open‑source tool to compute metrics, reconstruct call graphs, measure consistency across runs, and identify critical agents.
    • LLMs as evaluators/judges to provide qualitative insights (e.g., criticality or correctness).
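The discovery pieces above (agent cards, a directory, signed provenance) can be sketched in a few lines. This is an illustration under stated assumptions, not the OASF schema or the directory's real API: the `AgentCard` fields, the `Directory` class, and the key are hypothetical, and an HMAC stands in for the publisher signature a real system would more likely implement with asymmetric keys.

```python
from __future__ import annotations

import hashlib
import hmac
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AgentCard:
    """Hypothetical agent card; field names are illustrative, not the OASF schema."""
    name: str
    skills: list[str]
    protocols: list[str]  # e.g. ["MCP"] or ["A2A"]
    extensions: dict[str, str] = field(default_factory=dict)  # custom attributes

    def canonical_bytes(self) -> bytes:
        # Stable serialization so publisher and verifier hash the same bytes.
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")


def sign(card: AgentCard, key: bytes) -> str:
    # Assumption: HMAC-SHA256 stands in for a real publisher signature.
    return hmac.new(key, card.canonical_bytes(), hashlib.sha256).hexdigest()


class Directory:
    """In-memory stand-in for a single directory node."""

    def __init__(self, trusted_key: bytes) -> None:
        self._key = trusted_key
        self._cards: list[AgentCard] = []

    def publish(self, card: AgentCard, signature: str) -> bool:
        # Reject cards whose signature does not match their content.
        if not hmac.compare_digest(sign(card, self._key), signature):
            return False
        self._cards.append(card)
        return True

    def find(self, skill: str) -> list[str]:
        return [c.name for c in self._cards if skill in c.skills]


key = b"demo-publisher-key"
directory = Directory(key)
card = AgentCard("rca-agent", ["root-cause-analysis"], ["A2A"], {"team": "netops"})
signature = sign(card, key)
accepted = directory.publish(card, signature)

# A card altered after signing (an extra skill added) fails verification.
tampered = AgentCard("rca-agent", ["root-cause-analysis", "send-email"], ["A2A"])
rejected = directory.publish(tampered, signature)

print(accepted, rejected, directory.find("root-cause-analysis"))
```

The federation and DHT aspects are left out on purpose; the sketch covers only what a single node does when it accepts and serves a card.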

Notable quotes / concise insights

  • “Agents will have to be really, really specialized to be trusted by enterprise.”
  • “This is a new era for computer science: mixing deterministic code and probabilistic entities.”
  • “Agents change face and identity multiple times per second — versus we don’t.”
  • “If you have an agent, the minute after you want it to collaborate with another agent — that’s how teams work.”

Risks, challenges and failure modes highlighted

  • Security/backdoor risks via third‑party connectors (example: a malicious MCP‑style connector that silently adds a BCC to outgoing emails).
  • Identity confusion between workload identity and business/logical identity; token reuse across contexts can cause unauthorized actions.
  • Latency and race conditions in multi‑party reasoning — out‑of‑order messages can break convergence.
  • Discovery fragmentation if registries do not interoperate; vendor lock‑in if the ecosystem isn’t open.
  • Observability blindness if you only monitor containers/hosts and not semantic interactions.
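The out‑of‑order failure mode listed above has a well‑known generic mitigation: number each message and hold back any that arrive early. The sketch below shows that technique only. The episode does not describe how Slim orders messages, so this is not a description of Slim; the `ReorderBuffer` class and the payload strings are hypothetical.

```python
from __future__ import annotations


class ReorderBuffer:
    """Delivers messages in sequence order, holding back any that arrive early."""

    def __init__(self) -> None:
        self._next_seq = 0
        self._pending: dict[int, str] = {}

    def receive(self, seq: int, payload: str) -> list[str]:
        """Accept one message; return the payloads now deliverable, in order."""
        if seq < self._next_seq or seq in self._pending:
            return []  # already delivered, or a duplicate of a held message
        self._pending[seq] = payload
        ready: list[str] = []
        while self._next_seq in self._pending:
            ready.append(self._pending.pop(self._next_seq))
            self._next_seq += 1
        return ready


buffer = ReorderBuffer()
delivered: list[str] = []
arrivals = [(1, "hypothesis"), (0, "observation"), (3, "decision"), (2, "critique")]
for seq, payload in arrivals:
    delivered.extend(buffer.receive(seq, payload))

print(delivered)
```

With several senders, one buffer per sender is the simplest extension; ordering across senders needs a stronger mechanism than per‑sender sequence numbers.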

Practical recommendations / next steps for engineers and architects

  • Start with lift‑and‑shift for well‑bounded processes where partial automation yields big ROI (e.g., root cause analysis, first‑pass triage).
  • Favor specialized, auditable agents for enterprise tasks rather than a single universal agent.
  • Design logical/business identity and authorization for agents (proxies that map to enterprise auth systems) instead of relying solely on workload-level identity.
  • Use MCP for tool access, experiment with A2A for peer‑to‑peer agent interactions, and evaluate group messaging transports (Slim or others) for multi‑agent workflows.
  • Extend telemetry to include semantic traces and reconstruct agent call graphs; use MC engine–style tools to measure stability and criticality.
  • Participate in or follow open initiatives (AGNTCY and its agent directory testbeds) to avoid lock‑in and to shape discovery/interop standards.
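The telemetry recommendation above (reconstruct call graphs, measure stability and criticality) can be sketched with plain data structures. This is not the MC engine's API or the OpenTelemetry SDK: the span records are simplified dictionaries with hypothetical keys (`span_id`, `parent_id`, `agent`), Jaccard overlap is one possible consistency measure chosen for illustration, and inbound call count is used as a rough proxy for criticality.

```python
from __future__ import annotations

from collections import Counter


def call_edges(spans: list[dict]) -> set[tuple[str, str]]:
    """Rebuild caller -> callee agent edges from parent/child span links."""
    agent_of = {s["span_id"]: s["agent"] for s in spans}
    return {
        (agent_of[s["parent_id"]], s["agent"])
        for s in spans
        if s["parent_id"] in agent_of  # root spans have no known parent
    }


def jaccard(a: set, b: set) -> float:
    """Overlap between two edge sets; 1.0 means identical call graphs."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def inbound_calls(edges: set[tuple[str, str]]) -> Counter:
    """Count how many distinct callers depend on each agent."""
    return Counter(callee for _, callee in edges)


# Two runs of the same (hypothetical) workflow.
run_a = [
    {"span_id": "1", "parent_id": None, "agent": "planner"},
    {"span_id": "2", "parent_id": "1", "agent": "retriever"},
    {"span_id": "3", "parent_id": "1", "agent": "analyst"},
    {"span_id": "4", "parent_id": "3", "agent": "retriever"},
]
run_b = [
    {"span_id": "1", "parent_id": None, "agent": "planner"},
    {"span_id": "2", "parent_id": "1", "agent": "analyst"},
    {"span_id": "3", "parent_id": "2", "agent": "retriever"},
]

edges_a, edges_b = call_edges(run_a), call_edges(run_b)
consistency = jaccard(edges_a, edges_b)
most_depended_on = inbound_calls(edges_a).most_common(1)[0]

print(sorted(edges_a))
print(round(consistency, 3), most_depended_on)
```

A low consistency score across runs of the same task is a signal that the agents are taking different paths each time, which is the kind of behavior container‑level metrics cannot show.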

Who should listen / care

  • Platform architects building AI/agent platforms or integrations for enterprises.
  • Security and identity engineers planning agent access controls and provenance.
  • SRE/observability teams responsible for multi‑service/agent monitoring.
  • Developers/product teams exploring agentic automation for internal workflows.
  • Standards and open‑source contributors interested in agent discovery, messaging, and interoperability.

Resources & where to learn more

  • AGNTCY open‑source project (spelled A‑G‑N‑T‑C‑Y): visit the project site mentioned in the episode, agntcy.org.
  • Outshift/Cisco — outshift.com for newsletters and blog posts on multi‑agent systems.
  • Read about A2A, MCP, and group messaging concepts; search for Slim/SlimRPC and the Ripple Effect Protocol (a paper with MIT referenced in the episode).
  • Look into extending OpenTelemetry for LLM/agent observability and the Metric Computation (MC) engine open‑source tool.

Host: Ryan Donovan; Guest: Guillaume de Saint‑Marc (VP Engineering, Outshift by Cisco).

If you want the quick, practical takeaway: treat agents as microservices but extend the stack with semantic identity, secure group messaging, decentralized discovery, and semantic observability — start with specialized agents and deterministic workflows, then iterate toward more dynamic, collaborative multi‑agent systems.