Overview of “Engineers are becoming sorcerers” | The future of software development with OpenAI’s Sherwin Wu
Lenny Rachitsky interviews Sherwin Wu, Head of Engineering for OpenAI’s API and developer platform, about how AI (especially Codex and OpenAI models) is changing software engineering, management, product design, and the startup ecosystem. The conversation covers how engineers are shifting from writing code to orchestrating agents, how teams scale AI adoption, where platform and product builders should place their bets, and practical recommendations for companies trying to get value from AI.
Key topics discussed
- Current engineer workflows at OpenAI: heavy daily use of Codex, Codex reviews 100% of PRs, and AI-first code generation.
- Role changes: IC engineers becoming managers of agents; managers able to scale leverage through AI.
- Metaphors: SICP’s “engineers as wizards” and the “Sorcerer’s Apprentice” — powerful incantations (prompts/agents) that require skill and oversight.
- Practical challenges: agent fragility, context/knowledge encoding, CI & deployment automation, code-review scaling.
- Product & platform guidance: build for where models are headed (not only where they are), avoid overfitting to current scaffolding, OpenAI’s platform/mission stance.
- Business implications: one-person high-leverage startups, second/third-order effects (growth of bespoke B2B SaaS), big opportunity in business-process automation.
- Near-term model roadmap: longer coherent task horizons (multi-hour agents), and stronger multimodal/audio capabilities.
Main takeaways
- AI is highly embedded in engineering workflows: ~95% of engineers use Codex daily; many PRs are generated and reviewed by models. This changes the nature of engineering work (more orchestration, less line-by-line coding).
- Engineers who adopt AI tools produce far more output (Sherwin cites ~70% more PRs for active Codex users) and widen the productivity gap.
- Agent-driven workflows are high-leverage but fragile: failures usually stem from poor context or missing tribal knowledge. Encoding that context (docs, structured info, comments, “skills” files) is critical.
- Don’t blindly follow user requests for scaffolding that the models may eventually obviate—models “eat your scaffolding for breakfast.” Build for the trajectory of model capabilities.
- Companies often see negative ROI on AI when deployments are top-down without bottom-up adoption or a “tiger team” to discover, adapt, and share pragmatic workflows.
- OpenAI positions itself as a platform/ecosystem company and encourages startups to build on top of models rather than fear being squashed.
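The takeaway about encoding tribal knowledge can be made concrete. Below is a minimal, hypothetical sketch (the `skills/` directory name and file layout are illustrative assumptions, not a format Sherwin specified): project conventions live in small markdown "skills" files, and every agent prompt front-loads them so the agent has the context it would otherwise lack.

```python
from pathlib import Path

def build_prompt(task: str, skills_dir: str = "skills") -> str:
    """Prepend team conventions ("skills" files) to a task so the agent
    has the tribal knowledge it would otherwise be missing."""
    sections = []
    for path in sorted(Path(skills_dir).glob("*.md")):
        sections.append(f"## {path.stem}\n{path.read_text()}")
    context = "\n\n".join(sections) or "(no skills files found)"
    return f"Team conventions:\n{context}\n\nTask:\n{task}"

# Example: write one skills file, then wrap a task with that context.
Path("skills").mkdir(exist_ok=True)
Path("skills/deploys.md").write_text(
    "Deploys go through CI; never push to prod directly."
)
prompt = build_prompt("Fix the flaky integration test in billing.")
print(prompt.splitlines()[0])  # -> Team conventions:
```

The point is the shape, not the code: structured, versioned context files are cheap to maintain and dramatically reduce agent failures rooted in missing context.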
Notable quotes & metaphors
- “Engineers are becoming tech leads. They’re managing fleets and fleets of agents. It literally feels like we’re wizards casting all these spells.”
- Kevin Weil: “This is the worst the models will ever be.” (i.e., expect continual improvement)
- “The models will eat your scaffolding for breakfast.” — risk of building rigid tooling around models that will soon be unnecessary.
- Management metaphor: treat engineers like surgeons—support the “surgeon” (top performers) and clear organizational blockers.
Practical advice & action items
- For engineers and founders:
  - “Build for where the models are going, not where they are today.” Target product experiences that will be unlocked as models improve.
  - Start small: experiment with Codex/ChatGPT on internal data (Notion, Slack, GitHub) to learn limits and workflows.
  - Automate repetitive review and CI tasks with model-assisted tooling to collapse friction (lint fixes, PR suggestions).
- For managers:
  - Spend disproportionate time with top performers: unblock them and amplify their experiments with models.
  - Create an internal tiger team (often composed of “technical-adjacent” power users, not necessarily full-stack engineers) to prototype use cases and spread best practices.
  - Use AI-assisted org knowledge (integrated with docs/commit history) to detect current and potential blockers proactively.
- For product teams:
  - Favor flexible, minimally prescriptive abstractions (e.g., simple search or “skills” files) rather than heavy scaffolding that might be superseded by model capability.
  - Build evals and quantitative tests into launches (use the Evals API) to measure agent correctness and catch regressions.
- For startups:
  - Don’t over-fear platform incumbents—focus on solving customer problems well; the opportunity space is huge.
  - Consider offering specialized B2B tooling for vertical use cases (support, podcast tools, etc.) that one-person or small teams can integrate.
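The "build evals into launches" advice can be sketched with a toy harness. This is a stdlib-only illustration of the idea, not OpenAI's Evals API; `fake_agent` is a stubbed stand-in (an assumption for this sketch) that you would replace with a real model call.

```python
# Toy eval harness: score an agent against (question, expected-substring)
# cases before rollout, so regressions show up as a dropping pass rate.

def fake_agent(question: str) -> str:
    # Stand-in for a real model/agent call (assumption for this sketch).
    canned = {
        "What HTTP status means 'not found'?": "404",
        "What does CI stand for?": "continuous integration",
    }
    return canned.get(question, "I don't know")

def run_evals(agent, cases) -> float:
    """Return the fraction of cases where the expected answer appears."""
    passed = sum(
        1 for q, expected in cases if expected.lower() in agent(q).lower()
    )
    return passed / len(cases)

cases = [
    ("What HTTP status means 'not found'?", "404"),
    ("What does CI stand for?", "continuous integration"),
    ("Who wrote SICP?", "Abelson"),  # the stub fails this one
]
score = run_evals(fake_agent, cases)
print(f"pass rate: {score:.0%}")  # -> pass rate: 67%
```

Wiring a harness like this into CI turns "the agent seems fine" into a number you can gate launches on.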
Platform & product specifics (what OpenAI’s API offers)
- Responses API: low-level primitive for sampling from models and building long-running agents.
- Agents SDK: higher-level toolkit for building multi-agent orchestration, sub-agents, guardrails, and workflows.
- AgentKit & Widgets: UI components to quickly ship polished agent interfaces.
- Evals API: tools for quantitatively testing agent behavior and correctness.
- OpenAI strategy: keep models and capabilities available via the API to foster an open ecosystem rather than exclusively building every vertical internally.
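To ground the "low-level primitive" framing, here is an offline sketch of a Responses API-style request body. Field names (`model`, `input`, role/content items) follow OpenAI's published documentation, but the model name is a placeholder and nothing here performs a network call.

```python
import json

def build_responses_payload(model: str, skills: str, task: str) -> dict:
    """Assemble a Responses API-style request body offline.
    No request is sent; this only shows the shape of the primitive."""
    return {
        "model": model,
        "input": [
            {"role": "developer", "content": skills},
            {"role": "user", "content": task},
        ],
    }

payload = build_responses_payload(
    "MODEL_NAME_HERE",  # placeholder: pick a current model from the docs
    "Deploys go through CI; secrets live in the vault.",
    "Draft a rollback plan for last night's release.",
)
print(json.dumps(payload, indent=2))
```

Higher-level layers (Agents SDK, AgentKit) wrap this primitive with orchestration, guardrails, and UI.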
Predictions & near-term technical trends
- Task length/coherence: frontier models are trending toward multi-hour coherent task handling (current benchmarks show hours-level tasks are becoming feasible).
- Multimodal/audio: substantial improvements expected in audio/speech models in the next 6–18 months; enterprise audio workflows are underrated opportunities.
- Business-process automation: major opportunity to automate many repeatable, deterministic enterprise operations (support, billing, HR workflows) outside of classic software engineering.
Risks, pitfalls, and common anti-patterns
- Relying solely on top-down mandates without bottom-up adopters or champions leads to poor ROI on AI.
- Over-engineering scaffolding (agent frameworks, vector-store heavy designs) that gets obsoleted as models improve.
- Treating models as fully autonomous without human oversight—agents can go “off the rails” if not given context or monitored.
- Underestimating support and distribution costs for very small/solo-run companies—many micro-startups may still need outsourced services or plug-in vendors.
Quick practical checklist (for teams starting with AI)
- Identify 1–3 power users (technical-adjacent) to form a tiger team.
- Hook a model (Codex/ChatGPT) to internal knowledge (Notion, Slack, GitHub) and run a few pilot workflows.
- Instrument & evaluate with quantitative tests (use evals) before wide rollout.
- Automate repetitive CI/review tasks to increase throughput and reduce reviewer fatigue.
- Iterate product abstractions with the expectation that model capabilities will improve—avoid locking into heavy scaffolding.
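The checklist item on automating repetitive review tasks can be sketched as a tiny pre-review gate. This is a deterministic, stdlib-only toy (the rule set is an illustrative assumption), standing in for the model-assisted tooling the episode describes: catch mechanical issues before a human or model reviewer spends time on them.

```python
import re

def review_diff(diff: str, max_line_len: int = 100) -> list[str]:
    """Flag mechanical issues in a unified diff before human review:
    oversized added lines and leftover debug prints."""
    findings = []
    for i, line in enumerate(diff.splitlines(), start=1):
        if not line.startswith("+"):
            continue  # only inspect added lines
        added = line[1:]
        if len(added) > max_line_len:
            findings.append(f"line {i}: exceeds {max_line_len} chars")
        if re.search(r"\bprint\(", added):
            findings.append(f"line {i}: stray debug print")
    return findings

diff = "+x = 1\n+print('debug')\n-old = 2\n"
for finding in review_diff(diff):
    print(finding)  # -> line 2: stray debug print
```

Even a gate this crude collapses reviewer friction; swapping the regex rules for a model call is the natural next step.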
Additional useful links & resources from the episode
- Sherwin on X (Twitter): @sherwinwu — he invites builders to share projects and ideas.
- Books Sherwin recommended (lightning round): There Is No Antimemetics Division (qntm), Breakneck (Dan Wang), and Apple in China (Patrick McGee).
- Tactical products mentioned: Codex, Agents SDK, AgentKit, Evals API, Ubiquiti home networking (personal recommendation).
Final thought Sherwin leaves listeners with
The next 2–3 years will be one of the most exciting, energizing periods in tech—lean in, experiment, and don’t take this moment for granted. Engage with the tools early (you don’t need to be an engineer), learn limits, and build for the model-driven future.
