Brex’s AI Hail Mary — With CTO James Reggio

by swyx + Alessio

1h 13m · January 17, 2026

Overview

This episode of the Latent Space podcast features Brex CTO James Reggio in conversation with swyx and Alessio. It covers Brex’s company-wide AI strategy; how the engineering org adopted LLMs and agentic development; the platform and architecture behind Brex’s agent products; concrete operational use cases (onboarding, KYC, underwriting, disputes, auditing); engineering culture and hiring choices; and practical lessons about evals, guarding against regressions and hallucinations, and the human changes that follow faster code generation.

Key takeaways

  • Brex organizes AI work across three pillars: corporate AI (internal productivity tooling), operational AI (automations that reduce operating cost and meet compliance), and product AI (agents and features sold to customers).
  • They centralize platform work (LLM gateway, tooling, observability, evals, vector stores) while letting product teams build domain-specific agents that connect into that platform.
  • Brex built a small, specialized “AI center of excellence” (~10 people) that pairs “AI-native” younger engineers with Brex staff engineers to rapidly prototype agentic products; across the broader org, adoption of agentic coding and copilots is widespread.
  • Architecture favors a multi-agent (assistant + sub-agents) orchestrated approach rather than a single overloaded agent — this simplifies ownership, testing, and incremental productization.
  • Operational AI often benefits most from simple, well-audited prompt-driven SOPs rather than exotic ML approaches; many ops problems are solved by clear agentic workflows and evals.
  • Major technical choices: the Mastra agent framework (evaluated against older options like LangChain) alongside in-house orchestration, TypeScript for the new agent layer, and PGVector and Pinecone as vector DBs. They also use Retool for admin UIs and route model traffic through an LLM gateway for observability and routing.
  • Practical engineering hygiene: agentic code generation requires new emphasis on evals, regression tests, code reviews (they use Greptile, an AI-assisted reviewer), linting, and ownership to avoid slop and drift.

Brex’s AI strategy (the 3 pillars + platform)

  • Corporate AI: enable employees to 10x workflows by procuring and adopting multi-vendor AI tools, running internal experimentation, and teaching AI fluency across functions.
  • Operational AI: automate high-cost, repeatable, audit-friendly operational tasks (onboarding, KYC, underwriting, fraud, disputes and dispute documentation) to lower ops cost and expand addressable customer profiles.
  • Product AI: build agentic features and an employee/finance assistant that customers can adopt as part of their corporate AI strategy.
  • Platform (cross-cutting): LLM gateway, prompt & model routing, versioning, observability, eval frameworks, vector stores, and tooling that serves both product and ops use cases (the gateway pattern is sketched below).
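
The episode doesn’t spell out the gateway’s interface, so the following is a minimal sketch of the pattern, assuming a single internal HTTP endpoint; the endpoint URL and all type and field names (`GatewayRequest`, `promptId`, `team`) are hypothetical.

```typescript
// Hypothetical client for a centralized LLM gateway: one entry point that
// resolves versioned prompts, routes to a model, and logs usage, so product
// teams never call model vendors directly.

type GatewayRequest = {
  promptId: string;                  // versioned prompt, resolved server-side
  promptVersion?: string;            // pin a version, or default to latest approved
  variables: Record<string, string>; // template variables for the prompt
  team: string;                      // attribution for cost monitoring
};

type GatewayResponse = {
  text: string;
  model: string;                     // which provider/model the router chose
  tokens: { input: number; output: number };
};

async function callGateway(req: GatewayRequest): Promise<GatewayResponse> {
  // Single egress point: the gateway owns vendor credentials, enforces
  // limits, and records usage for renewals and budgeting.
  const res = await fetch("https://llm-gateway.internal/v1/complete", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  });
  if (!res.ok) throw new Error(`gateway error: ${res.status}`);
  return (await res.json()) as GatewayResponse;
}
```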

Metrics & goals cited:

  • Brex serves ~40,000 customer finance teams.
  • Example internal goal: touchless approval for 80% of commercial/startup onboarding, with a decision within ~60 seconds.

Engineering org & talent approach

  • Engineering size: ~300 engineers across EPD (engineering, product, and design).
  • Domain-aligned full-stack product teams (30–40 people) own product areas (cards, banking, expenses, travel, accounting).
  • AI center of excellence: ~10-person team focused on agentic LLM applications, highly cross-functional and distributed (Seattle, São Paulo, etc.).
  • Hiring philosophy: they intentionally hire ex-founders and future founders (a program dubbed “Quitters Welcome”) to attract high-agency talent who want to work on interesting problems with instant distribution.
  • Interview loop adapted for agentic development: candidates are evaluated on agentic workflows and use of AI tooling; existing employees re-ran the same exercise to upskill.

Agent architecture and tooling

  • Early infrastructure: an LLM gateway (prompt versioning, model routing, egress control, basic observability & cost monitoring) built in 2023 to manage model usage safely.
  • Agent layer: a newer agentic layer built in TypeScript (chosen for its ergonomics; Reggio frames it as the language you’d pick if you “were to start Brex today”), with some applications running on the Mastra agent framework and others on in-house multi-agent orchestration.
  • Vector and retrieval: PGVector and Pinecone serve as vector DB options (a retrieval sketch follows this list).
  • UI/UX and ops tooling: Retool for prompt/tool/eval management so domain experts can iterate without engineers.
  • Multi-vendor procurement: Brex deliberately keeps multiple model/tool vendors (e.g., different model providers, copilots) and lets employees choose; usage analytics inform renewals and budgeting.
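
The episode names the stores but not the query patterns, so here is a minimal retrieval sketch against pgvector; the `documents` table (with `content` and `embedding vector(1536)` columns) and the `embed` stub are assumptions, not Brex’s schema.

```typescript
import { Client } from "pg";

// Minimal similarity search over a pgvector-backed corpus. The table schema
// documents(content text, embedding vector(1536)) is assumed.

async function embed(text: string): Promise<number[]> {
  throw new Error("stand-in: wire this to an embedding model, e.g. via the gateway");
}

async function retrieve(query: string, k = 5): Promise<string[]> {
  const queryEmbedding = await embed(query);
  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();
  try {
    // `<=>` is pgvector's cosine-distance operator; smaller means closer.
    // JSON.stringify(number[]) yields the '[1,2,3]' literal pgvector accepts.
    const { rows } = await client.query(
      "SELECT content FROM documents ORDER BY embedding <=> $1::vector LIMIT $2",
      [JSON.stringify(queryEmbedding), k],
    );
    return rows.map((r) => r.content as string);
  } finally {
    await client.end();
  }
}
```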

Multi-agent networks — design & rationale

  • Problem: a single agent with many tools struggled to handle the breadth of Brex product domains (expenses, travel, policy, reimbursements).
  • Solution: an orchestration pattern — a single “assistant” (an employee-facing EA) delegates to many specialized sub-agents (expense agent, travel agent, policy agent, audit agent). Sub-agents can hold multi-turn dialogs among themselves (not just single RPC-style tool calls), enabling richer coordination; a sketch of the pattern follows this list.
  • Benefits:
    • Encapsulation: product teams owning domain agents can iterate without breaking the whole system.
    • Better evals: smaller domain responsibilities are easier to test and refine.
    • Realistic UX: mirrors how an executive EA would coordinate specialists.
  • Implementation note: topology is more tree-like (assistant → sub-agents) but can produce graph-like interactions when agents communicate widely.
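
None of Brex’s interfaces are shown in the episode, so the following is an illustrative sketch of the topology just described: an orchestrating assistant routes a request to a domain sub-agent and can hold a bounded multi-turn exchange with it instead of making a one-shot tool call. All names are hypothetical.

```typescript
// Assistant + sub-agent orchestration: each domain team owns a SubAgent;
// the assistant routes requests and keeps a shared message history.

type Msg = { from: string; text: string };

interface SubAgent {
  domain: string; // "expenses", "travel", "policy", "audit", ...
  handle(history: Msg[]): Promise<{ reply: string; done: boolean }>;
}

class Assistant {
  constructor(private agents: Map<string, SubAgent>) {}

  async dispatch(domain: string, request: string): Promise<string> {
    const agent = this.agents.get(domain);
    if (!agent) throw new Error(`no agent owns domain: ${domain}`);

    const history: Msg[] = [{ from: "assistant", text: request }];
    for (let turn = 0; turn < 8; turn++) {
      const { reply, done } = await agent.handle(history);
      history.push({ from: agent.domain, text: reply });
      if (done) return reply;
      // The sub-agent asked a follow-up; answer it and continue the dialog.
      history.push({ from: "assistant", text: await this.clarify(reply) });
    }
    throw new Error("dialog did not converge; escalate to a human");
  }

  private async clarify(question: string): Promise<string> {
    // In a real system this would call the LLM gateway with user and
    // account context to answer the sub-agent's follow-up question.
    return `context for: ${question}`;
  }
}
```

The `done` flag is what separates this from an RPC-style tool call: a sub-agent can ask the orchestrator for more context before committing to an answer.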

Example: Audit agent (illustrative)

  • Audit agent continuously looks for policy violations or unusual patterns (e.g., repeated $74 purchases where receipts are required at $75).
  • Review agent filters/triages audit hits to reduce false positives and decide which should become cases.
  • Cases trigger interactions with the employee assistant (collecting explanations, evidence).
  • This modular chain replaces a multi-team human workflow (outsourced auditors → finance reviewers → follow-up); an illustrative sketch follows.
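
As a hedged illustration of that chain: a deterministic detector flags the just-under-threshold pattern from the example, and a review step (stubbed here, where Brex uses an LLM review agent) decides which hits become cases. Types, thresholds, and function names are illustrative.

```typescript
// Illustrative audit chain: detect suspicious spend, triage with a review
// step, open cases for the employee assistant to follow up on.

type Transaction = { id: string; employee: string; amountCents: number };
type AuditHit = { tx: Transaction; rule: string };

// Detection pass: repeated purchases just under the $75 receipt threshold.
function detect(txs: Transaction[]): AuditHit[] {
  const byEmployee = new Map<string, Transaction[]>();
  for (const t of txs) {
    if (t.amountCents >= 7000 && t.amountCents < 7500) {
      byEmployee.set(t.employee, [...(byEmployee.get(t.employee) ?? []), t]);
    }
  }
  return [...byEmployee.values()]
    .filter((ts) => ts.length >= 3) // a repeated pattern, not a one-off
    .flatMap((ts) => ts.map((tx) => ({ tx, rule: "under-receipt-threshold" })));
}

// Triage pass: stands in for the LLM review agent that filters false
// positives using policy context before a case is opened.
async function triage(hit: AuditHit): Promise<boolean> {
  return true; // placeholder decision: "should this become a case?"
}

async function runAudit(txs: Transaction[]): Promise<void> {
  for (const hit of detect(txs)) {
    if (await triage(hit)) {
      // A real case would notify the employee assistant to collect an
      // explanation and supporting evidence from the cardholder.
      console.log(`case opened: ${hit.rule} for ${hit.tx.employee}`);
    }
  }
}
```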

Operational & product use cases

  • Onboarding & underwriting: automated customer-research agents replace manual underwriting/KYC steps; simpler architectures outperformed more complex ones (e.g., attempts at RL-based decisioning).
  • Fraud, disputes, and KYC: automations frequently implemented on the platform; disputes are more complex and lower priority due to lower frequency.
  • Employee assistant: goal is to “make Brex disappear” for employees — the card is the optimal UI, and the assistant performs booking travel, filing expenses, answering policy, etc.
  • Audit and compliance monitoring: proactive, continuous detection plus triage workflow implemented via agent networks.
  • Knowledge grounding: curated product/process corpora are essential to avoid hallucinations and provide correct product answers.

Development practices, evals, safety and ops

  • Evals are central: the platform includes evals per prompt/agent; ops teams co-own eval creation and QA loops, and production mistakes become regression tests (a sketch of this pattern follows the list).
  • Eval types: blocking accuracy/regression tests vs. softer subjective metrics (tone, coherency). Multi-turn integration tests are used but can be noisy; they sometimes seed conversations to isolate behaviors.
  • Regression prevention is a core operational change as agent deployments scale.
  • Guardrails and circuit breakers: the LLM gateway can enforce basic limits; some hallucination/false-action patterns are mitigated via system prompts and guardrails, but more work remains.
  • Code quality: adoption of agentic coding demands stronger reviews and ownership. Brex uses Greptile, an AI-assisted reviewer, plus traditional linters and CI checks, and they experiment with AI in CI to produce higher-level review comments.
  • Memory limits: agents still have poor long-term memory; this is a practical limitation for agent abilities today.
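
The episode describes the discipline but not the harness, so this sketch assumes simple shapes: each production mistake is pinned as an `EvalCase`, blocking checks gate deploys, and softer subjective metrics would be scored and trended separately.

```typescript
// Sketch of "mistakes become regression tests": every incident yields a
// pinned case, and blocking checks fail the build on any regression.

type EvalCase = {
  id: string;               // e.g., the incident that produced this case
  input: string;
  mustContain: string[];    // blocking: facts the answer must include
  mustNotContain: string[]; // blocking: e.g., claimed actions the agent can't take
};

async function runRegressions(
  cases: EvalCase[],
  agent: (input: string) => Promise<string>,
): Promise<void> {
  const failures: string[] = [];
  for (const c of cases) {
    const out = await agent(c.input);
    const missing = c.mustContain.filter((s) => !out.includes(s));
    const forbidden = c.mustNotContain.filter((s) => out.includes(s));
    if (missing.length > 0 || forbidden.length > 0) failures.push(c.id);
  }
  // Blocking evals gate the deploy; soft metrics (tone, coherency) would be
  // scored separately (e.g., by an LLM judge) and trended, not gated.
  if (failures.length > 0) {
    throw new Error(`regressions: ${failures.join(", ")}`);
  }
}
```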

Cultural change, fluency & people management

  • Fluency framework: levels such as advocate, builder, native — company-wide training & incentives (spot bonuses, AI spotlights at all-hands) encourage adoption without punitive measures.
  • Reskilling approach: ops employees are being trained to shift from executing SOPs to writing prompts, building evals, and managing agent workflows.
  • Headcount & productivity: Reggio notes agentic development amplifies both good and bad outcomes; Brex hasn’t shrunk engineering headcount despite efficiency gains — instead they aim to serve more customers with the same team size.
  • Hiring & retention: hiring founders / ex-founders is intentional; company supports people who later go start companies (Quitters Welcome).

Notable quotes & insights

  • “We have three pillars for our AI strategy — corporate, operational, and product — and the platform ties them together.”
  • “We want to build features that somebody else can say to their board: ‘We adopted Brex and this is part of our corporate AI strategy.’”
  • “Sometimes too much experience is an impediment — pairing AI-native engineers with senior product engineers has been powerful.”
  • On ops automation: “In operations you need to break down problems really granularly and form SOPs that humans can repeatedly follow and thus can be audited — that translates cleanly to LLMs.”

Risks, open challenges and lessons learned

  • Overengineering vs. simplicity: complex ML (RL for underwriting) underperformed simpler agent / search-based approaches — start simple and iterate.
  • Hallucinations and false “actions”: agents may claim to contact a team or perform a step they can’t; this requires explicit tooling or system-prompt guardrails (one guardrail pattern is sketched after this list).
  • Code drift & knowledge drift: faster code generation increases risk of team unfamiliarity with live code, making incident response harder.
  • Evals & regression testing are non-negotiable as systems go into production; multi-turn evals are useful but need careful design to avoid brittleness.
  • Platform fragmentation vs. ergonomics: Brex supports multiple frameworks and model providers to enable experimentation and avoid vendor lock-in, but this requires tooling to manage cost, routing, and observability.
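
One common mitigation for the false-action failure mode, consistent with but not confirmed by the episode, is to validate every tool call against an explicit allowlist before execution. The tool names and the `reply_to_user` fallback are illustrative.

```typescript
// Guard against "false actions": if the model emits a tool call outside its
// actual capabilities, convert it into an honest user-facing reply instead
// of silently pretending the action happened.

type ToolCall = { name: string; args: Record<string, unknown> };

const ALLOWED_TOOLS = new Set(["file_expense", "book_travel", "lookup_policy"]);

function guardToolCall(call: ToolCall): ToolCall {
  if (ALLOWED_TOOLS.has(call.name)) return call;
  // e.g., the model claims "I've contacted the fraud team" via a
  // nonexistent tool; surface the limitation rather than executing.
  return {
    name: "reply_to_user", // assumed always-available response channel
    args: { text: `I can't "${call.name}" directly. Here's what I can do instead.` },
  };
}
```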

What Brex is asking for (call to action)

  • Reggio invites builders and researchers working on multi-agent networks and richer agent-to-agent tooling to engage — Brex sees opportunity to standardize and grow tooling for agent orchestration, evaluation, and reliable multi-agent execution.

Recommendations for listeners / builders

  • If building AI features inside regulated/ops-heavy businesses, prioritize:
    • Clear, auditable SOP-to-prompt translation (a sketch follows this list)
    • Eval-first thinking (regressions as tests)
    • Modular agent design (domain agents + orchestrator)
    • Low-code admin surfaces so domain experts can iterate (e.g., Retool-style UIs)
    • Multi-vendor models for procurement flexibility and usage analytics
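
The episode doesn’t show what SOP-to-prompt translation looks like in practice, so here is a hypothetical sketch: the SOP lives as a versioned, structured artifact that ops owns, and the same artifact is rendered into the agent’s system prompt (and can be referenced by its evals). All shapes are assumptions.

```typescript
// Hypothetical SOP-to-prompt translation: the SOP is data, versioned like
// code, so the prompt an agent runs is auditable step by step.

type SOP = {
  name: string;
  version: string;      // versioned so every decision traces to an SOP revision
  steps: string[];      // granular, human-followable instructions
  escalateIf: string[]; // conditions that must route to a human reviewer
};

function toSystemPrompt(sop: SOP): string {
  return [
    `You are executing SOP "${sop.name}" (v${sop.version}). Follow the steps in order.`,
    ...sop.steps.map((step, i) => `${i + 1}. ${step}`),
    `Escalate to a human reviewer if any of the following hold:`,
    ...sop.escalateIf.map((cond) => `- ${cond}`),
    `Record a one-line justification for every step you complete.`,
  ].join("\n");
}

// Example: a fragment of a KYC-style SOP rendered into a prompt.
const prompt = toSystemPrompt({
  name: "business-onboarding-review",
  version: "1.4.0",
  steps: [
    "Verify the legal entity name against the registration document.",
    "Check the applicant against the sanctions screening result.",
  ],
  escalateIf: [
    "Any document appears altered.",
    "Sanctions screening is inconclusive.",
  ],
});
```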

Final snapshot of the setup (numbers & tech at a glance)

  • Engineering: ~300 engineers; AI center of excellence ~10 people.
  • Customers: ~40,000 finance teams.
  • Tech: TypeScript agent layer, vector stores for knowledge grounding (PGVector, Pinecone), LLM gateway, Retool admin UIs, mixed agent frameworks (in-house + Mastra), and AI-assisted code review (Greptile).
  • Goals: more automation of onboarding/ops, strong eval & regression discipline, and productizing agent features that customers can adopt as part of their corporate AI strategy.