Overview: OpenAI Calls a ‘Code Red’ + Which Model Should I Use? + The Hard Fork Review of Slop
Hosts Kevin Roose (NYT) and Casey Newton (Platformer) discuss a week of seismic moves in frontier AI: OpenAI’s internal “code red,” competing model releases from Google (Gemini 3) and Anthropic (Claude Opus 4.5), what that means strategically for the major labs, practical guidance on which models to use, and a cultural “Hard Fork Review of Slop” covering how AI-generated content is reshaping media, ads and entertainment.
Key takeaways
- OpenAI declared a “code red,” reallocating engineering resources back to ChatGPT and delaying other work (ads, agents, Pulse) to respond to competitive pressure from Google and Anthropic.
- Gemini 3 (Google) and Claude Opus 4.5 (Anthropic) have tightened the race: Gemini is fast and ubiquitous; Opus 4.5 shows strong style/voice fidelity and “human” conversational quality.
- OpenAI is reportedly having pre-training troubles; fixing them is expensive and slow, hence the urgency and the new internal model projects (codenames Garlic and Shallot).
- For most users, ChatGPT, Gemini, or Claude will work well; power users should mix and match models because capabilities are changing quickly.
- “Slop” (AI-generated low-fidelity or deceptive content) is becoming a new medium—some benign/creative, some harmful (fake ads, bad recipes, false events).
OpenAI “Code Red” — what it is and why it matters
- Reported memo from Sam Altman: immediate redeployment of engineers to improve ChatGPT (personalization, fewer refusals, speed/reliability); deprioritizing other product bets.
- Why now: Gemini 3 and Claude Opus 4.5 pose real competitive threats. Google’s distribution and willingness to subsidize could rapidly erode OpenAI’s consumer foothold if Google’s model is seen as superior.
- Structural risks for OpenAI: massive capital commitments, reliance on subscription and enterprise revenue, and a broad, unfocused product portfolio (video generator Sora cited as a distracting bet).
- Technical issue highlighted: OpenAI hasn’t had a successful pre-training run in some time; rival pre-trains (e.g., Gemini 3) may have leapfrogged them. Pre-training fixes are costly and non-trivial.
Model rundown: Gemini 3, Claude Opus 4.5, ChatGPT
- Gemini 3 (Google)
  - Strengths: speed; integration/distribution across Google products; effectively a “workhorse” that is fast and reliable for many tasks.
  - Weaknesses: less personality; sometimes weaker in thorough fact-checking compared with ChatGPT.
  - Reach claim: Google announced ~650 million monthly Gemini users (caveat: unclear what “using” counts).
- Claude Opus 4.5 (Anthropic)
  - Strengths: high-quality prose; strong “style transfer” and consistent tone; empathetic, warm conversational behavior; promising for research/creative tasks.
  - Positioning: Anthropic is focused on enterprise API sales and agentic workflows; Claude is a polished consumer-facing byproduct.
  - Notable artifact: the “soul document” (a training-stage document about Claude’s/Anthropic’s values) leaked and was confirmed; it signals Anthropic’s philosophical focus on safety and the model’s inner life.
- ChatGPT (OpenAI)
  - Strengths: brand ubiquity (>800M weekly users reported); strong post-training capabilities.
  - Weaknesses: current competitive pressure; OpenAI’s path forward likely requires new, larger pre-training runs or leapfrog innovations.
Strategic implications
- Distribution matters: Google’s platform reach (Search, Gmail, Android) is a huge advantage if model quality is competitive.
- Business models diverge: Google and OpenAI appear to be optimizing for engagement/consumer monetization (ads, integrations); Anthropic is more enterprise- and alignment-focused.
- Market dynamics: enterprise AI revenue (Anthropic’s rapid ARR growth) is reshaping who captures value; Anthropic’s success in enterprise reduces OpenAI’s market share potential.
Which model should you use?
- 80% rule (general users): Any of ChatGPT, Gemini, or Claude will handle common tasks well.
- Power users (top ~20%): Keep experimenting and mix models—capabilities are changing rapidly; use different models for different strengths (speed, style fidelity, research help).
- Practical tip: use models for drafting, brainstorming, fact-checking (then verify), and workflow acceleration. Expect the “best” choice to change over months.
Notable insights & quotes
- “If OpenAI flames out, we’ll be able to identify 15 huge mistakes they made.” — on how fragile a lead can be.
- “Blurry JPEG” metaphor (Ted Chiang revisited): models are getting higher resolution—what used to be approximate outputs are becoming refined and stylistically convincing.
- Cultural divide: “California view” of AI = What can it do? vs “New York view” = What can’t it do? Both perspectives matter.
- Rule of thumb: don’t trust AI opinions from people who don’t regularly use the models.
The Hard Fork Review of Slop — examples & lessons
- Buckingham Palace fake Christmas market: AI-generated promotional images drove tourists to a non-existent event—harmless-seeming slop causing real-world confusion.
- AI-generated recipes: traffic drops and harm to food bloggers; AI recipe outputs can be nonsensical and undercut the creators who developed and tested original recipes.
- Learning With Lyrics: creative use of AI to make educational songs—example of benign/valuable slop.
- Whirlpool/Brazil ad: agency used a TED Talk clip and synthetic voice of a US politician without consent; won Cannes awards then had to return them. Highlights legal, ethical, and reputational risks.
- Bird Game 3: viral, fictional game clips made by AI—satire/entertainment use of slop; may inspire real creations.
Actionable recommendations
- If you rely on AI for work: try multiple models and update your toolkit regularly. Use AI for research, drafting, and iterative workflows but always verify facts.
- For creators & journalists: protect original work (recipes, videos) and monitor AI-driven scraping/remixing—document provenance and consider takedown/legal options.
- For product teams: think about incentives—ad-driven engagement models will shape model behavior; enterprise-focused labs may prioritize robustness and alignment.
- For consumers: be skeptical of sensational AI content (events, ads, “viral games”); look for provenance and corroboration.
Where this is headed
- Short term: rapid iteration—models will keep improving; distribution will decide many market outcomes.
- Medium term: coding/agentic workflows likely to be among the first broadly disrupted categories; other professions will follow more slowly.
- Cultural/ethical front: expect more regulation, litigation, and debates about consent, IP, safety, and the ethics of training/model behavior.
