Move over, vibe-coding. Vibe-proving is here for math

by Science Friday and WNYC Studios

18m · March 27, 2026

Overview

This episode explores how recent advances in large language models are changing mathematical practice. Host Flora Lichtman interviews two research mathematicians—Dr. Emily Riehl (Johns Hopkins) and Dr. Daniel Litt (University of Toronto)—about where AI currently helps in mathematics, where it falls short, the risks of AI-generated “proofs,” and how formal proof assistants and human mathematicians fit into the future workflow. The conversation introduces the notion of "vibe-proving" (an analogue of "vibe-coding"): using AI to generate proofs or proof sketches that may look plausible but require careful human verification.

Key takeaways

  • Recent AI progress: Large models have moved from being bad at basic arithmetic to solving high-level contest problems and producing research-level proofs in some cases. This is a rapid and significant improvement but not yet a revolution.
  • Not revolutionary—yet: There are promising success stories, but AI has not (so far) independently produced fundamentally new research programs or reliably solved deeply open problems in a way that replaces mathematicians.
  • Vibe-proving = mix of promise and risk: AI can generate proofs that look convincing (“vibe-proofs”), but they often contain errors or hallucinations. Human oversight remains essential.
  • Formal verification is crucial: Natural-language output is hard to trust; translating AI proofs into the formal languages of proof assistants (Coq, Lean, Isabelle, etc.) enables automated checking and makes AI-generated proofs more trustworthy.
  • Changing skill set: AI will shift what mathematicians need to know (more ability to curate, verify, formalize, synthesize literature) while freeing time from routine tasks. Creativity, problem selection, and deep understanding remain central human strengths.
  • Community norms and standards must evolve: Because AI models don’t have the human norms of belief/credibility, mathematicians may need to demand stronger evidence (formalized proofs, verifiable artifacts) before accepting AI-generated results.
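
To make the formal-verification point concrete, here is a minimal sketch of what a machine-checkable proof looks like in Lean 4 (a small illustrative theorem, not one discussed in the episode). The kernel verifies every step; an AI-generated proof submitted in this form either compiles or is rejected, so trust shifts from the author to the checker.

```lean
-- The sum of two even numbers is even, stated with explicit witnesses.
theorem even_add_even (m n : Nat)
    (hm : ∃ k, m = 2 * k) (hn : ∃ k, n = 2 * k) :
    ∃ k, m + n = 2 * k := by
  obtain ⟨a, ha⟩ := hm          -- unpack the witness for m
  obtain ⟨b, hb⟩ := hn          -- unpack the witness for n
  exact ⟨a + b, by rw [ha, hb, Nat.mul_add]⟩  -- 2*a + 2*b = 2*(a + b)
```

If any step were wrong, Lean would flag that exact line rather than accepting a plausible-looking argument—this is the "verifiable artifact" the guests argue AI proofs should come with.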

Topics discussed

  • AI performance trajectory: from poor arithmetic to contest-level success and isolated research-level wins.
  • Differences between problem-solving and problem selection: AI has shown ability to solve or assist with problems but not to autonomously choose interesting new directions or state the most important open problems.
  • Examples and evidence (as discussed by guests):
    • Competition successes reported for major models (e.g., strong performance on International Mathematical Olympiad–level problems).
    • A small number of Erdős problems reportedly solved autonomously or with AI assistance; more examples where AI extracted or reassembled literature results.
    • Instances of bad outcomes: several Hodge conjecture preprints on the arXiv that appeared to be AI-generated nonsense—illustrating the danger of posting unverified AI proofs.
  • Vibe-proving vs vibe-coding: parallels between programmers using auto-complete tools and mathematicians using AI to sketch proofs—both can accelerate experts but also enable novices to produce plausible (but incorrect) outputs.
  • Role of proof assistants: these tools can enforce rigor and provide precise feedback on each proof step; integrating AI with proof assistants could enable automated verification and training signals for better formal proofs.
  • What makes a great mathematician: sustained engagement, intuition developed over years, creativity, ability to formulate problems, communicate ideas, and collaborate.
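
The per-step feedback mentioned above can be illustrated with a trivial Lean 4 example (illustrative only, not from the episode): automation discharges a correct goal, while an incorrect claim fails at precisely the offending line, giving an AI (or a human) a training signal localized to the broken step.

```lean
-- `omega` automatically proves linear-arithmetic goals over Nat.
example (n : Nat) : n + n = 2 * n := by omega

-- A wrong claim, e.g. `example (n : Nat) : n + n = 3 * n := by omega`,
-- would be rejected with an error at exactly this step.
```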

Notable quotes & paraphrases

  • Daniel Litt: “Mathematics is not just about solving problems... stating the problems, figuring out what is an interesting theorem to try and prove is equally important.”
  • Emily Riehl (paraphrase of point raised): If AI produced an oracle that simply told us which theorems were true without giving human-understandable insight, mathematicians would find that unsatisfactory; we prize explanatory proofs.
  • On verification: AI should ideally produce proofs in a format that proof assistants can check—this moves the burden of checking from human referees to trusted software.

Risks and limitations

  • Hallucinations: AI can produce plausible but incorrect proofs; these can be mistakenly published if users don’t verify them.
  • Trust gap: Unlike human authors, who are expected to believe in their own proofs, AI systems carry no such norm of belief; the community needs higher standards of evidence for AI-derived claims.
  • Interpretability: Long machine-generated proofs may be technically correct but not illuminating; mathematicians often want proofs that explain why something is true, not just that it is true.
  • Workload of verification: Formalizing and verifying long arguments in proof assistants is labor-intensive, though it offers the best route to reliable acceptance of AI proofs.

Practical recommendations / action items

For mathematicians and research groups:

  • Use AI as a research aid—literature synthesis, learning new areas, sketching approaches—but always verify results.
  • Prefer workflows that couple AI output with proof assistants for formal verification when possible.
  • Treat natural-language AI proofs with skepticism; avoid posting unvetted AI-generated proofs to preprint servers.
  • Learn tooling: invest time in understanding formal proof assistants and AI-assisted formalization workflows to get ahead of the transition.

For journals, arXiv moderators, and institutions:

  • Update norms and submission/peer-review processes to address AI-generated results (e.g., require verifiable artifacts or formal proofs for high-risk claims).
  • Develop standards for disclosure when AI is used in producing proofs or technical results.

For students and early-career researchers:

  • Expect changing skill requirements: familiarity with AI research tools and proof assistants will be a valuable complement to traditional mathematical training.
  • Continue to cultivate deep problem-formulation and collaboration skills—those remain central and hard to automate.

Final perspective

AI is already reshaping parts of mathematical practice—helping with computations, learning, literature review, and occasionally producing suggestive or partly correct proofs—but it has not yet replaced core human roles: creative problem selection, deep conceptual understanding, and community verification. The most productive near-term path is likely a hybrid one: humans using AI to accelerate discovery while relying on formal verification tools and human judgment to ensure correctness and comprehension.