Overview: Generating text with diffusion (and ROI with LLMs)
This episode of the Stack Overflow Podcast (recorded at AWS re:Invent) contains two floor interviews: Stefano Ermon (CEO and co‑founder of Inception) on diffusion language models (diffusion LLMs), and Aldo Lovano (chairman and co‑founder of Rumi) on an ROI‑first approach to enterprise AI and physical/robotics AI. The episode compares diffusion vs. autoregressive approaches, covers technical tradeoffs and deployment considerations, and explains how two startups are applying ML to real businesses (APIs, legacy modernization, robotics, and computer vision).
Key takeaways
- Diffusion LLMs generate many tokens in parallel by iteratively denoising a noisy token sequence, unlike autoregressive models, which output tokens left to right.
- Inception claims 5–10x speed improvements vs. similarly accurate autoregressive models because diffusion allows parallel token updates and better arithmetic/memory‑bandwidth efficiency.
- Diffusion LLMs are trained as denoisers (trained to correct corrupted text/code) rather than next‑token predictors; this gives built‑in error‑correction potential but does not eliminate hallucinations.
- Main technical challenges for diffusion LLMs: variable‑length generation, discrete token math vs. continuous diffusion math, repetition/degeneration (repetitive loops), and efficient serving/throughput.
- Rumi’s product strategy emphasizes ROI‑first enterprise AI: measure TCO and predicted ROI before implementation; product suite includes modules for legacy systems, automation, computer vision, agentic integrations, and physical AI (robots).
- Practical business focus: maintain and augment legacy systems (COBOL/mainframes) with natural‑language driven functionality and code support rather than always migrating, plus optional migration tools such as AST‑based conversion.
- Robotics/physical AI: Rumi connects agentic intelligence to physical devices (humanoids, teleoperation, edge AI) but sees limited immediate unit‑economics for general‑purpose humanoids; they view robotics and enterprise AI as complementary long‑term bets.
- Ethical and workforce impacts: both guests acknowledge potential job disruption; Rumi explicitly expects payroll reductions in some cases and emphasizes managing transition and retraining.
Stefano Ermon — Inception: Diffusion language models (what, why, how)
How diffusion LLMs work (high level)
- Start from a noisy/random token sequence (analogous to image diffusion starting from noise).
- Use transformer‑based denoisers to iteratively refine many token positions in parallel until the sequence is “clean.”
- Training objective: corrupt clean text/code and train networks to reconstruct (correct) it, a different objective from autoregressive next‑token prediction (see the sketch below).
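To make this concrete, here is a minimal toy sketch of parallel iterative denoising in the masked‑diffusion style commonly used for text (mask tokens stand in for noise). The `toy_denoiser`, vocabulary size, and confidence‑based commit schedule are illustrative assumptions, not Inception's actual model or algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, MASK, SEQ_LEN, STEPS = 100, 0, 16, 4

def toy_denoiser(tokens):
    """Stand-in for a transformer denoiser: returns per-position logits
    over the vocabulary. A real model would condition on the whole
    (partially masked) sequence in a single forward pass."""
    return rng.normal(size=(len(tokens), VOCAB))

tokens = np.full(SEQ_LEN, MASK)  # start from an all-masked ("pure noise") sequence

for step in range(STEPS):
    logits = toy_denoiser(tokens)                 # one parallel forward pass
    proposal = logits[:, 1:].argmax(axis=-1) + 1  # best non-mask token per position
    confidence = logits[:, 1:].max(axis=-1)
    still_masked = np.flatnonzero(tokens == MASK)
    # Commit only the most confident positions this round; the rest stay
    # masked and are re-predicted next iteration (built-in error correction).
    k = int(np.ceil(len(still_masked) / (STEPS - step)))
    commit = still_masked[np.argsort(-confidence[still_masked])[:k]]
    tokens[commit] = proposal[commit]

print(tokens)  # fully unmasked after STEPS parallel passes
```

The contrast with autoregressive decoding: each iteration is a single forward pass that updates many positions at once, and positions that remain masked can still be revised, which is where the built‑in error‑correction potential comes from.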
Claimed benefits
- Substantial speedups (5–10×) vs. autoregressive LMs of similar quality due to parallelism and reduced memory‑bandwidth bottlenecks.
- Higher arithmetic efficiency (weights can be reused across many tokens during inference).
- Potential for built‑in error correction (model is trained to fix mistakes rather than commit to a token forever).
Limitations and open problems
- Not yet perfect: still prone to hallucinations/errors, though denoising helps reduce some errors.
- Degeneration/repetition problems have been observed (analogous to the “six‑finger” oddities of image diffusion): models can loop or repeat content.
- Variable‑length generation is nontrivial because diffusion math is continuous while token space is discrete, which requires special math and engineering.
- Large model sizes and memory requirements remain — diffusion reduces memory bandwidth issues but not model scale.
Architecture & future directions
- Current production models are transformer‑based; Inception is also exploring alternative backbones (state‑space models) which scale better with context length.
- Diffusion LMs can combine algorithmic (denoising/inference) and architectural (non‑transformer backbones) improvements.
- Diffusion is already popular for world models and can be used where fast, accurate prediction is needed.
Deployment & availability
- Inception provides an OpenAI‑compatible API; existing autoregressive applications can migrate to Inception’s diffusion models to gain latency and cost benefits (a minimal client sketch follows this list).
- Inception has built a custom serving engine that handles continuous batching, caching, and serving at scale.
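Because the API is OpenAI‑compatible, migrating mostly amounts to pointing an existing client at a different base URL, as sketched below with the official `openai` Python client. The base URL and model id are placeholders to verify against Inception's docs, not confirmed values:

```python
from openai import OpenAI

# Placeholder endpoint and model id; consult inceptionlabs.ai docs for real values.
client = OpenAI(
    base_url="https://api.inceptionlabs.ai/v1",
    api_key="YOUR_INCEPTION_API_KEY",
)

response = client.chat.completions.create(
    model="mercury",  # hypothetical diffusion-LLM model id
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
)
print(response.choices[0].message.content)
```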
Aldo Lovano — Rumi: ROI‑first enterprise AI and physical AI
Business model and product approach
- Rumi offers an enterprise AI platform with modules for back‑office automation, legacy systems, and physical AI integrations.
- Core differentiator: an ROI‑first module that calculates the current TCO of a process, forecasts TCO after AI implementation, and estimates ROI; this module is core to every deployment (a toy version of the arithmetic follows this list).
- Company background: ~11 years in the market; started in robotics and pivoted toward enterprise AI while maintaining R&D in robotics.
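As a toy illustration of the ROI‑first arithmetic described above (current TCO vs. forecast post‑AI TCO over a planning horizon), with invented numbers; Rumi's actual module is proprietary and certainly richer than this:

```python
def roi_first_screen(current_tco, post_ai_tco, implementation_cost, years=3):
    """Toy ROI-first screen: compare annual total cost of ownership before
    and after an AI deployment over a planning horizon."""
    annual_savings = current_tco - post_ai_tco
    net_gain = annual_savings * years - implementation_cost
    return {
        "roi": net_gain / implementation_cost,            # e.g., 2.0 == 200%
        "payback_years": implementation_cost / annual_savings,
    }

# Invented numbers: $2.0M/yr process cost today, $1.4M/yr after automation,
# $600k to implement, evaluated over 3 years.
print(roi_first_screen(2_000_000, 1_400_000, 600_000))
# -> {'roi': 2.0, 'payback_years': 1.0}
```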
Legacy systems and code modernization
- Focus on large installed bases (mainframes, COBOL, legacy monoliths) where migration isn’t always chosen by customers.
- Rumi offers:
- Maintenance and support tooling to extend legacy systems via natural‑language driven functionality (“vibe coding” for legacy).
- Optionally, migration tools (e.g., AST‑based approaches) when customers opt to modernize (see the AST sketch after this list).
- Advantage: Rumi has deep domain experience and historical client code that can be used to train verticalized models for private‑code generation.
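To make “AST‑based conversion” concrete: the idea is to transform code through its syntax tree rather than with text search‑and‑replace. Below is a minimal sketch using Python's `ast` module on a hypothetical deprecated helper; real legacy migration would parse COBOL or other legacy grammars, which this toy does not:

```python
import ast

class RenameLegacyCall(ast.NodeTransformer):
    """Rewrite calls to a hypothetical legacy helper `old_fetch` into the
    modern `new_fetch`. Because this operates on the syntax tree, string
    literals that merely mention the old name are left untouched."""
    def visit_Call(self, node):
        self.generic_visit(node)
        if isinstance(node.func, ast.Name) and node.func.id == "old_fetch":
            node.func = ast.Name(id="new_fetch", ctx=ast.Load())
        return node

source = "msg = 'old_fetch is deprecated'\nresult = old_fetch('customers')"
tree = RenameLegacyCall().visit(ast.parse(source))
print(ast.unparse(tree))  # Python 3.9+
# -> msg = 'old_fetch is deprecated'
#    result = new_fetch('customers')
```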
Physical AI & robotics
- Rumi integrates agentic AI with physical devices: humanoid robots, teleoperation, edge AI for factories/distribution centers.
- Computer vision (CNN‑based) is used for tasks like picking, self‑checkout, and out‑of‑stock detection; vision inference is connected to agents that trigger actions (see the sketch after this list).
- Current state: robotics has limited immediate ROI; Rumi raises venture capital for long‑term humanoid development, expecting deployments in 5–10 years.
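Here is a schematic of the vision‑to‑agent pattern described above: a detector flags a shelf condition and an agent turns high‑confidence detections into actions. The labels, threshold, and action hook are invented stand‑ins, not Rumi's stack:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str          # e.g., "out_of_stock"
    shelf_id: str
    confidence: float

def run_detector(frame) -> list[Detection]:
    """Stand-in for CNN inference on one camera frame."""
    return [Detection("out_of_stock", "aisle-7/shelf-3", 0.93)]

def agent_step(detections: list[Detection]) -> list[str]:
    """Toy 'agent': maps high-confidence detections to actions that would
    normally be routed to a ticketing or restocking system."""
    return [
        f"create restock task for {d.shelf_id}"
        for d in detections
        if d.label == "out_of_stock" and d.confidence > 0.8
    ]

print(agent_step(run_detector(frame=None)))
# -> ['create restock task for aisle-7/shelf-3']
```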
Ethics and workforce impacts
- Rumi is explicit that automation will reduce payroll in some contexts; their ROI‑first stance includes modeling workforce impact.
- They acknowledge ethical concerns and emphasize managing transitions, retraining, and new job creation (robot operation, teleoperation, maintenance).
Market focus and company details
- Target customers: enterprise accounts (top and mid‑market), with special emphasis on Latin American companies and plans to expand to the U.S. market.
- Website/contact: rumiit.com (company details and partnership info).
Notable quotes & soundbites
- Stefano Ermon: diffusion LMs “generate multiple tokens in parallel” and are “trained to correct mistakes” rather than to predict the next token.
- Stefano: diffusion models can be “5–10x faster” than autoregressive models of similar quality.
- Aldo Lovano: Rumi’s differentiator is an “ROI‑first” core module that “tracks the return on investment of each dollar invested in AI.”
- Aldo: “We are going to go short to reduce part of the payroll of the organization” — explicit on expected workforce reductions from automation.
Practical next steps / recommendations from the episode
- If you’re latency‑sensitive (IDEs, voice agents, customer support, agentic workflows), consider evaluating diffusion LMs for lower latency and cost; Inception offers an OpenAI‑compatible API (inceptionlabs.ai).
- For enterprises with legacy systems, evaluate an ROI‑first approach: measure current TCO, forecast post‑AI TCO, and prioritize use cases with clear unit economics.
- For robotics/physical AI pilots, focus on teleoperation/edge use cases with measurable ROI (e.g., picking, self‑checkout) before investing in general‑purpose humanoids.
Where to find the companies
- Inception Labs: inceptionlabs.ai (models available via API; Stefano Ermon on LinkedIn)
- Rumi: rumiit.com (enterprise AI + robotics; Aldo Lovano contact via company site)
