Overview of Open source for awkward robots (The Stack Overflow Podcast)
This episode features Jan Liphardt, CEO and co‑founder of OpenMind, discussing the design, goals, and societal implications of OM1, an open‑source software stack for humanoid robots. The conversation covers how OM1 uses large language models (LLMs) for decision‑making, how motion and cognition are separated in the stack, hardware standardization strategies, the idea of immutable governance (storing rules on a blockchain), and the need for public conversation, regulation, and lifelong learning as robots become more common.
Key topics covered
- Guest introduction: Jan Liphardt’s background in physics and building hardware for experiments, his later work in software and healthcare, and his return to robotics, driven by an interest in embodied AI.
- OM1 overview: an open‑source operating system for humanoids that uses natural language as the communication format between many models.
- LLMs in robots: LLMs are used primarily for data fusion and high‑level decision making (the “what should happen next?”).
- Motion vs cognition: low‑latency motion/robotics models handle physical execution; LLMs handle social, memory, and decision layers.
- Decentralized governance: Asimov‑style rules stored in natural language on a blockchain (Ethereum) to enable immutable, inspectable guardrails.
- Open ecosystem: an app‑store model for humanoid skills to enable broad developer participation and transparency.
- Hardware and drivers: strategy to standardize compute via a “brain pack” (an NVIDIA‑class compute unit) and common middleware (Cyclone DDS or Zenoh) to reduce the number of driver combinations that must be supported.
- Societal impacts and policy: need for public debate on jobs, regulation, insurance, and social effects; education and lifelong learning emphasized as preparation.
How OM1 works (technical summary)
Architecture and messaging
- Internal components (vision, inertial, battery, etc.) each emit short natural‑language statements (e.g., “I see Ryan,” “battery fully charged”).
- Those sentences are fused into paragraphs and fed to LLMs that debate and decide the next action.
- Natural‑language internal communication makes it easy to inspect, add guardrails, and audit model interactions.
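The fusion step described above can be sketched as follows. This is a minimal illustration, not OM1's actual code: the `Statement` class, the `fuse` and `build_prompt` functions, and the bracketed source tags are all hypothetical names chosen for this example. It only shows how short per-component sentences might be joined into one inspectable paragraph before being handed to an LLM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """One short sentence emitted by a component (hypothetical structure)."""
    source: str  # assumed component name, e.g. "vision" or "battery"
    text: str    # natural-language sentence from that component


def fuse(statements):
    """Join sentences into one paragraph, tagging each with its source."""
    return " ".join(f"[{s.source}] {s.text.rstrip('.')}." for s in statements)


def build_prompt(statements):
    """Wrap the fused paragraph in a question for the decision model."""
    return "Current observations: " + fuse(statements) + "\nWhat should happen next?"


statements = [
    Statement("vision", "I see Ryan"),
    Statement("battery", "Battery fully charged."),
]
print(build_prompt(statements))
```

Because every input is plain text with a source tag, a log of these prompts is enough to see which component said what, which is the inspectability benefit the episode points to.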
The “mother” / referee model
- A supervisory LLM (nicknamed “mother” or coach) periodically monitors behavior and offers corrective suggestions (e.g., posture, conversational pacing).
- Acts as an extra input to the decision process—like a coach or reviewer rather than an authoritarian controller.
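One way to picture the "extra input" role is a reviewer that runs only every few cycles and appends advice to the same list of sentences the decision model already reads. The sketch below assumes a fixed review interval and uses a rule-based stub (`coach_review`) in place of a real supervisory LLM; both the function names and the interval are illustrative assumptions, not details from the episode.

```python
def coach_review(recent_actions):
    """Hypothetical stand-in for a supervisory LLM: returns advice or None."""
    if recent_actions.count("speak") >= 3:
        return "You have been talking a lot; pause and let the person respond."
    return None


def gather_inputs(sensor_statements, recent_actions, tick, review_every=5):
    """Collect decision inputs; on review ticks, add the coach's suggestion.

    The advice is appended as one more sentence, so the decision model can
    weigh it against other inputs rather than being overridden by it.
    """
    inputs = list(sensor_statements)
    if tick % review_every == 0:
        advice = coach_review(recent_actions)
        if advice is not None:
            inputs.append(f"[coach] {advice}")
    return inputs


print(gather_inputs(["[vision] I see Ryan."], ["speak", "speak", "speak"], tick=5))
```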
Motion vs decision stack
- Motion control (grasping, balance) uses specialized robotics models/world models (e.g., Google’s Gemini Robotics, vision‑action models).
- Decision and social cognition live above the motion layer; after the LLM decides “pick up red apple,” a motion model executes the physical action.
- Both layers compete for compute and power; deployment choices depend on application constraints (e.g., naval torpedoes require fully local compute).
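The hand-off between the two layers can be sketched as a simple dispatcher: the decision layer produces a short text command, and a lookup routes it to a motion routine. The skill names (`grasp`, `walk_to`), the verb table, and the prefix-matching rule are assumptions made for illustration; a real system would call a motion model rather than return a string.

```python
def grasp(target):
    """Placeholder for a motion model that performs a grasp."""
    return f"grasping {target}"


def walk_to(target):
    """Placeholder for a motion model that handles locomotion."""
    return f"walking to {target}"


# Hypothetical mapping from decision-layer verbs to motion-layer skills.
SKILLS = {"pick up": grasp, "go to": walk_to}


def dispatch(decision):
    """Route a text decision such as 'pick up red apple' to a motion skill."""
    for verb, skill in SKILLS.items():
        if decision.lower().startswith(verb + " "):
            return skill(decision[len(verb) + 1:])
    raise ValueError(f"no motion skill for decision: {decision!r}")


print(dispatch("pick up red apple"))
```

Keeping the interface between layers this narrow is what lets each side be swapped independently, for example moving the decision model off-board while motion stays local.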
Open source, app store, and governance
- OM1 is open source and available on GitHub (search OM1/OpenMind); the goal is transparency so owners can inspect and modify robot behavior.
- OpenMind aims for an app‑store model where thousands of small, focused skills are contributed by developers—similar to mobile app ecosystems.
- Governance idea: store constitutions or rules (Asimov‑style) in natural language on a blockchain, using a smart contract standard, so robots can fetch immutable guardrails.
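A rough sketch of how a robot might consume such rules: fetch the text, check it against a digest that is assumed to come from the immutable record, and only then place it in front of the decision model. Nothing here talks to a real blockchain. The rule strings are placeholders, `PINNED_DIGEST` stands in for a value that would be read from a contract, and `load_guardrails` is a hypothetical helper.

```python
import hashlib

# Placeholder rules for illustration only; not the actual on-chain text.
RULES = [
    "Do not injure a human being.",
    "Follow instructions from your owner unless they conflict with the rule above.",
]


def digest(rules):
    """SHA-256 over the newline-joined rule text."""
    return hashlib.sha256("\n".join(rules).encode("utf-8")).hexdigest()


# Assumption: in a real deployment this value would be read from the contract.
PINNED_DIGEST = digest(RULES)


def load_guardrails(fetched_rules, pinned_digest):
    """Reject rule text that does not match the pinned digest."""
    if digest(fetched_rules) != pinned_digest:
        raise ValueError("guardrail text does not match the pinned digest")
    return "Rules you must follow:\n" + "\n".join(f"- {r}" for r in fetched_rules)


print(load_guardrails(RULES, PINNED_DIGEST))
```

Because the rules are natural language, the verified text can be prepended directly to the prompt the decision model sees, and anyone can read the same text the robot reads.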
Hardware strategy and standardization
- Challenge: many sensors/actuators across different manufacturers. Solution: standard OS + drivers model, analogous to PC peripherals.
- Practical approach: attach a standardized “brain pack” (e.g., an NVIDIA‑class compute unit) to different humanoid frames; plug sensors into that backpack.
- Using middleware (Cyclone DDS or Zenoh) to route data and actions over Ethernet reduces the number of sensor‑to‑body driver permutations.
- Noted trends: accelerating hardware improvements (example: affordable multi‑finger robot hands with high MTBF), many Chinese suppliers increasing pace and lowering costs.
- Focus choice: OpenMind prioritizes socially useful capabilities (speech, memory, spatial understanding) over extremely dexterous manipulation.
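To illustrate why a shared brain pack plus middleware cuts driver work, the sketch below compares two counts and shows a toy publish/subscribe bus. The `Bus` class is an in-process stand-in written for this example; it is not the Cyclone DDS or Zenoh API. The counting model (one integration per sensor/body pair versus one adapter per sensor and per body) is a simplifying assumption.

```python
from collections import defaultdict


def pairwise_integrations(sensors, bodies):
    """Without a shared interface: one driver per sensor/body pair."""
    return sensors * bodies


def shared_adapters(sensors, bodies):
    """With a shared bus: one adapter per sensor plus one per body."""
    return sensors + bodies


class Bus:
    """Toy topic-based bus standing in for real middleware."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self._subscribers[topic]:
            callback(message)


bus = Bus()
received = []
bus.subscribe("camera/front", received.append)  # brain pack listens on a topic
bus.publish("camera/front", "I see Ryan")       # any camera adapter can publish
print(received, pairwise_integrations(6, 4), shared_adapters(6, 4))
```

With 6 sensor types and 4 robot bodies, this model gives 24 pairwise integrations versus 10 adapters, and the gap widens as either number grows.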
Societal implications and business concerns
- Rapid technical progress exposes social, legal, regulatory, and insurance questions: workplace impacts, liability for accidents, union responses, school/education policies.
- The speaker urges wider public engagement and education to bridge the gap between tech communities and general public understanding.
- Career guidance: embrace lifelong learning—people should expect to reinvent themselves continuously and build broader cognitive/system thinking skills, not just narrow technical expertise.
Notable quotes
- “If a LLM is able to generate photorealistic video and write computer code, you can extrapolate… maybe [LLMs are] also very good at generating actions that suitable hardware can execute in the real world.”
- “By virtue of all the internal communications in the software being natural language, it's very easy for us to figure out which model is saying what and then how to add natural language guardrails.”
- “I don't want this humanoid to be like my Tesla… that does over‑the‑air updates every few days and I have absolutely no idea what's going on.”
Key takeaways
- Open, inspectable robot software is important for trust, safety, and community involvement.
- LLMs are well‑suited to decision making and data fusion; specialized models handle low‑level motion control.
- Practical hardware standardization (brain packs + middleware) reduces complexity when supporting many robot bodies.
- Immutable, decentralized rule storage (blockchain smart contracts) is proposed as one component of governance for autonomous agents.
- Society must prepare—through regulation, insurance frameworks, education reform, and public discourse—for the human impacts of embodied AI.
Actionable resources & links
- OpenMind / OM1: start at openmind.org and the OpenMind GitHub (search OM1 or OpenMind) to explore code and contribute.
- If you’re a developer: consider building small, focused skills/apps for humanoids (app‑store model analogy).
Hosts/contacts:
- Host: Ryan Donovan (Stack Overflow Podcast)
- Guest: Jan Liphardt (OpenMind); see openmind.org and the OpenMind GitHub for more.
This summary captures the technical approach, architectural choices, societal concerns, and open‑source ethos discussed in the episode.
