DevDay 2025: Apps SDK, AgentKit, MCP, Codex and why Prompting is More Important than Ever

by swyx + Alessio

October 7, 2025

Hosts: swyx + Alessio
Guests: Sherwin & Christina (OpenAI Open Platform team)


Overview

This interview covers OpenAI's recent DevDay product launches and platform direction: the Apps SDK (ChatGPT-first integrations), AgentKit (builder + SDK + runtime + evals), adoption of the MCP protocol, evals and prompt optimization, internal Codex usage, reliability tooling, and ecosystem/portability trade-offs. The conversation focuses on how these pieces fit together to help developers build, deploy, evaluate, and iterate on agents and conversational apps.


Key points & main takeaways

  • Platform philosophy: OpenAI sees APIs and developer tooling as essential to distributing AI benefits broadly; recent launches are iterative steps to empower external builders.
  • Apps SDK (ChatGPT-first integrations): inverts the old website-with-a-chatbot pattern, so ChatGPT becomes the top layer and embeds apps inside it; developers retain brand control through custom UI components and polished widgets.
  • MCP adoption: OpenAI integrated MCP (originally from Anthropic) into the Agents SDK in March; they credit MCP as a useful, open protocol and participate in its steering. (A minimal MCP tool server is sketched after this list.)
  • AgentKit (Builder + SDK + Runtime + Evals):
    • Agent Builder (visual canvas) is intended as both a development playground and a deployment path (export to code or run via ChatKit).
    • Includes templates/playbooks (customer support, document discovery, data enrichment, planning, internal knowledge, etc.).
    • Supports human-in-the-loop approval nodes and stateful workflows; the roadmap includes richer modalities (voice, multimodal) and more complex approval workflows.
  • Evals improvements:
    • Now support running evals over full agent traces and grading them end-to-end; the roadmap is to break traces into parts and apply rubrics / human-in-the-loop evaluation at each stage.
    • Evals can target multiple model providers (via OpenRouter) to compare performance. (A comparison sketch follows this list.)
  • Prompt optimization is increasingly central — OpenAI invests in automated prompt tuning tied to evals; “prompts are not dying, they’re more important than ever.”
  • Codex/internal developer workflow:
    • Codex is used heavily internally for feature implementation, PR previews, and PR review assist; tip: trust the model more (let it write larger chunks).
  • ChatKit & widgets:
    • ChatKit is an embeddable, opinionated iframe (kept evergreen by OpenAI; not planned to be open-sourced) that provides polished, consumer-grade chat UX and widgets; a widget studio exists to create UI components quickly.
  • Portability & multi-model: OpenAI intends to support third-party and open models (evals can compare many providers), and is thinking about portability standards for stateful APIs.
  • Reliability & observability: New per-org service health dashboard (personal SLOs, token velocity, TPM, response codes) to help customers monitor integrations; OpenAI is aggressively improving SRE to meet high availability goals.
  • Cost / BYOK (bring-your-own-key):
    • Many developers ask for BYOK for inference cost control; it is not available out-of-the-box but is top-of-mind and a common ask.
    • Caution: stateful API stores can end up repurposed as databases; watch the cost, scale, and operational impact if you rely on them that way.
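
A minimal sketch of an MCP tool server in Python, assuming the official `mcp` package's FastMCP helper; the server name and tool are hypothetical, and the Apps SDK layers widget/UI metadata on top of servers like this:

```python
# Sketch: a bare-bones MCP tool server (assumes `pip install mcp`).
# The Apps SDK builds on MCP; this shows only the protocol-level tool part.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("pizzeria-finder")  # hypothetical server name

@mcp.tool()
def find_pizzerias(city: str, limit: int = 3) -> str:
    """Return pizzerias near a city (stubbed data for illustration)."""
    # A real server would query a backend; ChatGPT renders the result,
    # optionally inside a custom widget when built with the Apps SDK.
    return "\n".join(f"{city} Pizza Co. #{i}" for i in range(1, limit + 1))

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```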
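
And a hedged sketch of multi-provider comparison through OpenRouter's OpenAI-compatible endpoint; the model IDs and prompt are placeholders, and a real setup would feed responses into graders rather than printing them:

```python
# Sketch: run one eval case against several providers via OpenRouter.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

PROMPT = "Summarize this support ticket and propose next steps: ..."

for model in ["openai/gpt-4o-mini", "anthropic/claude-3.5-sonnet"]:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
    )
    # In practice, score each response with a rubric/grader instead.
    print(model, "->", resp.choices[0].message.content[:120])
```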

Notable quotes & insights

  • “It’s kind of inverted — there’s ChatGPT at the top layer and then the website embedded inside of it.” — on the Apps SDK experience.
  • “Prompting is more important than ever.” — recurring theme: prompts + evals + optimization remain central to building effective agents.
  • “Trust the model to do more.” — Codex power-user tip: let the model produce bigger, riskier outputs and iterate.

Topics discussed

  • Apps SDK: intent, developer experience, brand-preserving integrations, widgets & ChatKit
  • AgentKit: Agent Builder canvas, SDK, connectors, templates, human approval, export-to-code, deployment
  • MCP protocol adoption and ecosystem (Anthropic origin, steering participation)
  • Responses API, stateful APIs, and porting considerations
  • Evals: agent traces, grading, rubrics, multi-model comparison
  • Prompt optimization & automated prompt tuning
  • Codex: productivity tips and internal adoption patterns
  • Widgets, embeddable chat iframe, and trade-offs of open-sourcing
  • Service health dashboard and reliability improvements
  • BYOK, cost control, and state-as-database caution
  • Roadmap expectations: more modalities (voice, multimodal), deeper human-in-loop support, and broader third-party model support

Action items / Recommendations (for developers)

  • Try Agent Builder as a playground:
    • Use it to prototype agents, iterate on prompts, and export to the Agents SDK when ready (a minimal Agents SDK sketch follows this list).
    • Leverage provided templates (customer service, document discovery, data enrichment) to accelerate builds.
  • Integrate evals into development:
    • Capture agent traces and run evals to measure end-to-end behavior; begin defining rubrics for long agentic tasks.
    • Use evals to compare models (including open-source ones via OpenRouter) and to drive automated prompt-optimization loops.
  • Invest in prompt engineering:
    • Treat prompts as first-class artifacts, optimize them iteratively (with automated tuning where possible), and expect ongoing maintenance as models evolve (an illustrative eval-driven loop follows this list).
  • Use Codex to accelerate dev workflows:
    • Try delegating larger chunks of code; use Codex-assisted PR previews and reviews to cut down on context switching.
  • Monitor reliability:
    • Enable and watch the service health dashboard to get personal SLOs and real-time telemetry for your org.
  • Plan for cost & keys:
    • Expect to manage inference costs; watch for possible BYOK options in the future and design guardrails (rate limiting, allow-lists).
  • Give feedback:
    • OpenAI is actively seeking developer input on Agent Builder trade-offs (deterministic vs LM-driven nodes, types of logical nodes, modality priorities).
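
A minimal sketch of the kind of code Agent Builder's export-to-code path targets, using the open-source openai-agents Python SDK; the tool and agent here are toy stand-ins:

```python
# Sketch: a small tool-using agent with the openai-agents SDK
# (assumes `pip install openai-agents` and OPENAI_API_KEY set).
from agents import Agent, Runner, function_tool

@function_tool
def lookup_order(order_id: str) -> str:
    """Toy order lookup; a real agent would call your order system."""
    return f"Order {order_id}: shipped, arriving Thursday."

support_agent = Agent(
    name="Customer Support",
    instructions="Help users with order questions. Use tools when needed.",
    tools=[lookup_order],
)

result = Runner.run_sync(support_agent, "Where is order 1234?")
print(result.final_output)
```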
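
And an illustrative-only eval-driven prompt loop: score candidate system prompts against a tiny labeled set and keep the best. The cases, model, and string-match grader are stand-ins, not OpenAI's prompt optimizer:

```python
# Sketch: pick the best of several candidate prompts by eval score.
from openai import OpenAI

client = OpenAI()
CASES = [("Reset my password", "account"), ("Card was charged twice", "billing")]

def score(system_prompt: str) -> float:
    hits = 0
    for text, label in CASES:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": text},
            ],
        )
        # Toy grader: check the expected label appears in the reply.
        hits += label in resp.choices[0].message.content.lower()
    return hits / len(CASES)

candidates = [
    "Classify the ticket as 'account' or 'billing'. Reply with one word.",
    "You are a triage bot. Output exactly one label: account | billing.",
]
print("best prompt:", max(candidates, key=score))
```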

Additional technical notes

  • Agents SDK + Responses API launched earlier; MCP integrated into Agents SDK around March.
  • Evals now can evaluate long traces from agent runs; future improvements include finer-grained part-by-part evaluation and multimodal evals.
  • ChatKit is an embeddable iframe that OpenAI keeps optimized and evergreen, so developers avoid rebuilding chat UI for every model or modality change.
  • OpenAI participates in MCP steering and collaborates with other vendors to promote open protocols and multi-model ecosystem support.
