DevDay 2025 Summary: Apps SDK, AgentKit, MCP, Codex, and Why Prompting Is More Important than Ever
Hosts: swyx + Alessio
Guests: Sherwin & Christina (OpenAI Open Platform team)
Overview
This interview covers OpenAI’s recent DevDay product launches and platform direction: the Apps SDK (ChatGPT-first integrations), AgentKit (builder + SDK + runtime + evals), adoption of the MCP protocol, evals and prompt optimization, internal Codex usage, reliability tooling, and ecosystem/portability trade-offs. The conversation focuses on how these pieces fit together to help developers build, deploy, evaluate, and iterate on agents and conversational apps.
Key points & main takeaways
- Platform philosophy: OpenAI sees APIs and developer tooling as essential to distributing AI benefits broadly; recent launches are iterative steps to empower external builders.
- Apps SDK (ChatGPT-first integrations): inverts the old website-with-a-chatbot pattern; ChatGPT becomes the top layer and embeds the app inside it, while developers retain brand control through custom UI components and polished widgets.
- MCP adoption: OpenAI integrated MCP (originally created by Anthropic) into the Agents SDK in March; they credit MCP as a useful, open protocol and participate in its steering.
- AgentKit (Builder + SDK + Runtime + Evals):
- Agent Builder (a visual canvas) is intended as both a development playground and a deployment path (export to code or run via ChatKit).
- Includes templates/playbooks (customer support, document discovery, data enrichment, planning, internal knowledge, etc.).
- Supports human-in-the-loop approval nodes and stateful workflows; roadmap includes richer modalities (voice, multimodal) and more complex approval workflows.
- Evals improvements:
- Evals now support running agents and grading full agent traces; the roadmap includes breaking traces into stages and applying rubrics or human-in-the-loop evaluation to each stage.
- Evals can target multiple model providers (via OpenRouter) to compare performance.
- Prompt optimization is increasingly central: OpenAI is investing in automated prompt tuning tied to evals; “prompts are not dying, they’re more important than ever.”
- Codex/internal developer workflow:
- Codex is used heavily internally for feature implementation, PR previews, and PR review assist; tip: trust the model more (let it write larger chunks).
- ChatKit & widgets:
- ChatKit is an embeddable, opinionated chat iframe (kept evergreen by OpenAI; not planned to be open-sourced) that provides a polished, consumer-grade chat UX with widgets; a widget studio exists to create UI components quickly.
- Portability & multi-model: OpenAI intends to support third-party and open models (evals can compare many providers), and is thinking about portability standards for stateful APIs.
- Reliability & observability: New per-org service health dashboard (personal SLOs, token velocity, TPM, response codes) to help customers monitor integrations; OpenAI is aggressively improving SRE to meet high availability goals.
- Cost / BYOK (bring-your-own-key):
- Many developers ask for BYOK for inference cost control; it is not available out-of-the-box but is top-of-mind and a common ask.
- Caution: state stores attached to stateful APIs may end up repurposed as general-purpose databases; watch the cost, scale, and operational impact if you go down that path.
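The trace-grading direction described in the evals bullets above can be illustrated with a minimal sketch. The trace format, stage names, and rubric checks below are hypothetical stand-ins (a real setup would use the Evals product, typically with model-based graders per stage), but the shape — grade each stage of a full agent trace against a rubric — is the idea discussed.

```python
from dataclasses import dataclass

# Hypothetical trace step: one model turn or tool call in an agent run.
@dataclass
class TraceStep:
    stage: str    # e.g. "plan", "tool_call", "final_answer"
    content: str

def grade_trace(trace, rubric):
    """Grade a full agent trace stage by stage against a rubric.

    rubric maps a stage name to a predicate over that stage's steps;
    a stage passes only if it occurred and every step satisfies its check.
    """
    results = {}
    for stage, check in rubric.items():
        steps = [s for s in trace if s.stage == stage]
        results[stage] = bool(steps) and all(check(s) for s in steps)
    return results

trace = [
    TraceStep("plan", "look up order status, then draft a reply"),
    TraceStep("tool_call", "orders.lookup(id=123) -> shipped"),
    TraceStep("final_answer", "Your order 123 has shipped."),
]

rubric = {
    "plan": lambda s: "order" in s.content,
    "tool_call": lambda s: "orders.lookup" in s.content,
    "final_answer": lambda s: "shipped" in s.content,
}

print(grade_trace(trace, rubric))
# → {'plan': True, 'tool_call': True, 'final_answer': True}
```

Per-stage results like these are what make the roadmap item above (rubrics and human review applied to parts of a trace, rather than one pass/fail on the whole run) actionable.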
Notable quotes & insights
- “It’s kind of inverted — there’s ChatGPT at the top layer and then the website embedded inside of it.” — on the Apps SDK experience.
- “Prompting is more important than ever.” — recurring theme: prompts + evals + optimization remain central to building effective agents.
- “Trust the model to do more.” — Codex power-user tip: let the model produce bigger, riskier outputs and iterate.
Topics discussed
- Apps SDK: intent, developer experience, brand-preserving integrations, widgets & ChatKit
- AgentKit: Agent Builder canvas, SDK, connectors, templates, human approval, export-to-code, deployment
- MCP protocol adoption and ecosystem (Anthropic origin, steering participation)
- Responses API, stateful APIs, and porting considerations
- Evals: agent traces, grading, rubrics, multi-model comparison
- Prompt optimization & automated prompt tuning
- Codex: productivity tips and internal adoption patterns
- Widgets, embeddable chat iframe, and trade-offs of open-sourcing
- Service health dashboard and reliability improvements
- BYOK, cost control, and state-as-database caution
- Roadmap expectations: more modalities (voice, multimodal), deeper human-in-loop support, and broader third-party model support
Action items / Recommendations (for developers)
- Try Agent Builder as a playground:
- Use it to prototype agents, iterate on prompts, and export to the Agents SDK when ready.
- Leverage provided templates (customer service, document discovery, data enrichment) to accelerate builds.
- Integrate evals into development:
- Capture agent traces and run evals to measure end-to-end behavior; begin defining rubrics for long agentic tasks.
- Use evals to compare models (including open-source ones via OpenRouter) and to drive automated prompt-optimization loops.
- Invest in prompt engineering:
- Treat prompts as first-class, iteratively optimize them (automated tuning where possible), and expect maintenance as models evolve.
- Use Codex to accelerate dev workflows:
- Try delegating larger chunks of code; use Codex-assisted PR previews and reviews to speed context switching.
- Monitor reliability:
- Enable and watch the service health dashboard to get personal SLOs and real-time telemetry for your org.
- Plan for cost & keys:
- Expect to manage inference costs; watch for possible BYOK options in the future and design guardrails (rate limiting, allow-lists).
- Give feedback:
- OpenAI is actively seeking developer input on Agent Builder trade-offs (deterministic vs LM-driven nodes, types of logical nodes, modality priorities).
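The cost guardrails suggested above (rate limiting, allow-lists) can be sketched in a few lines. This is a generic client-side pattern, not an OpenAI API feature; the model names and `guarded_call` helper are illustrative placeholders.

```python
import time

class TokenBucket:
    """Simple token-bucket rate limiter for outbound inference calls."""
    def __init__(self, rate_per_sec, capacity):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost=1):
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

ALLOWED_MODELS = {"gpt-4.1-mini", "gpt-4.1"}  # example allow-list entries

def guarded_call(model, bucket):
    """Hypothetical wrapper: enforce the allow-list and rate limit first."""
    if model not in ALLOWED_MODELS:
        raise ValueError(f"model {model!r} is not on the allow-list")
    if not bucket.allow():
        raise RuntimeError("rate limit exceeded; retry later")
    # ... issue the actual inference request here ...
    return "ok"

bucket = TokenBucket(rate_per_sec=5, capacity=2)
print(guarded_call("gpt-4.1-mini", bucket))
```

Wrapping every inference call this way keeps spend bounded even before any platform-side BYOK or budgeting features land.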
Additional technical notes
- The Agents SDK and Responses API launched earlier; MCP was integrated into the Agents SDK around March.
- Evals now can evaluate long traces from agent runs; future improvements include finer-grained part-by-part evaluation and multimodal evals.
- ChatKit is an embeddable iframe that OpenAI optimizes and keeps evergreen, so developers avoid rebuilding their chat UI for model or modality changes.
- OpenAI participates in MCP steering and collaborates with other vendors to promote open protocols and multi-model ecosystem support.
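The multi-provider comparison mentioned in the notes above can be sketched as a small harness: run the same eval cases against each provider and compare pass rates. The provider callables here are toy stubs; a real harness would call each provider's API (e.g. routed through OpenRouter) and use a proper grader.

```python
def run_comparison(providers, cases, grader):
    """Score each provider on the same eval cases; return pass rates."""
    scores = {}
    for name, generate in providers.items():
        passed = sum(grader(case, generate(case["prompt"])) for case in cases)
        scores[name] = passed / len(cases)
    return scores

# Stub "models": real providers would be API calls behind the same interface.
providers = {
    "model-a": lambda p: p.upper(),
    "model-b": lambda p: p,
}

cases = [
    {"prompt": "hello", "expect": "HELLO"},
    {"prompt": "world", "expect": "WORLD"},
]

grader = lambda case, output: output == case["expect"]

print(run_comparison(providers, cases, grader))
# → {'model-a': 1.0, 'model-b': 0.0}
```

Keeping cases and grader fixed while swapping the provider is what makes scores comparable across open and closed models.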
