Open Source Self-Driving with Comma AI

by Practical AI LLC

46m · April 16, 2026

Overview of Open Source Self-Driving with Comma AI

This episode of the Practical AI podcast features Harold (CTO at Comma.ai) discussing Comma’s hardware and OpenPilot software—an open source, retrofittable ADAS/autonomy stack. The conversation covers system architecture (device, on-device agent, and cloud training), Comma’s end-to-end approach to learning driving behavior, their world-model / simulation strategy, practical user experience, technical constraints (compute, control, RL, continual learning), and why open source matters for this space.

Key takeaways

  • Comma.ai provides a retrofit device you attach to a car (behind the rearview mirror, CAN-connected) that runs OpenPilot and gives highway-level autonomy (steering + longitudinal control).
  • OpenPilot is a widely used open-source autonomy stack and one of the most popular robotics projects on GitHub.
  • Runtime inference and control happen entirely on-device; training and simulation run in Comma’s data center.
  • Comma follows an “end-to-end” approach: models ingest raw video and output steering curvature and acceleration/gas/brake commands, with minimal explicit intermediate perception outputs.
  • A critical innovation: Comma trains policy models inside a learned video-based world model (a generative/simulation model that responds to control inputs), then uses that simulator to train robust recovery behavior.
  • Major unsolved technical problems highlighted: low-level controls, reinforcement learning (for tight feedback control), and continual learning (live adaptation on a per-vehicle basis).
  • Open source is central to Comma’s strategy—community contributions help port new cars and increase adoption; transparency supports user ownership of devices.

System architecture — mental model

  • Device (on-vehicle):
    • Cameras + compute + GPS/IMU (optional).
    • Runs the small policy/agent model and OpenPilot runtime processes (UI, state management, car API).
    • Sends CAN messages to the car (steer, gas, brake, engagement state).
    • Offline operation: driving works without a continuous cloud connection; connectivity is needed only for software updates and data upload.
  • Two-model separation:
    • World model (simulator): learned generative video model that can be conditioned on actions (e.g., “turn left 10 degrees”) and must be photorealistic and responsive to inputs.
    • Agent/policy: smaller model trained inside the simulator; the tiny model is what ships to devices for runtime inference.
  • Training infrastructure:
    • Central data center runs simulation-based training, diffusion-video generation, and supervisory processes.
    • Training uses hundreds of millions of miles of human-driving data, plus simulator-generated perturbed trajectories to teach recovery from mistakes.
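
To make the on-device data flow concrete, here is a minimal sketch of one control step: a camera frame goes into a policy, the policy returns curvature and acceleration, and the result is clamped and packed into messages for the car. This is not OpenPilot's actual API. The names `DrivingCommand`, `CanMessage`, `toy_policy`, the signal names, and the numeric limits are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Frame = Sequence[float]  # stand-in for a camera image (hypothetical)


@dataclass(frozen=True)
class DrivingCommand:
    curvature: float     # 1/m; the sign convention here is an assumption
    acceleration: float  # m/s^2


@dataclass(frozen=True)
class CanMessage:
    name: str    # hypothetical signal name, not a real CAN database entry
    value: float


def clamp(x: float, lo: float, hi: float) -> float:
    """Limit x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def to_can(cmd: DrivingCommand, engaged: bool,
           max_curvature: float = 0.2,
           min_accel: float = -3.5,
           max_accel: float = 2.0) -> List[CanMessage]:
    """Turn a policy output into car messages, enforcing assumed safety limits."""
    if not engaged:
        # When the driver has not engaged the system, send no actuation.
        return [CanMessage("ENGAGED", 0.0)]
    return [
        CanMessage("ENGAGED", 1.0),
        CanMessage("STEER_CURVATURE",
                   clamp(cmd.curvature, -max_curvature, max_curvature)),
        CanMessage("ACCEL", clamp(cmd.acceleration, min_accel, max_accel)),
    ]


def control_step(frame: Frame,
                 policy: Callable[[Frame], DrivingCommand],
                 engaged: bool) -> List[CanMessage]:
    """One runtime step: frame in, car messages out, no cloud involved."""
    return to_can(policy(frame), engaged)


def toy_policy(frame: Frame) -> DrivingCommand:
    """Placeholder for the shipped neural policy: steers on a left/right imbalance."""
    half = len(frame) // 2
    left, right = sum(frame[:half]), sum(frame[half:])
    return DrivingCommand(curvature=0.01 * (left - right), acceleration=0.5)


if __name__ == "__main__":
    for message in control_step([1.0, 1.0, 0.0, 0.0], toy_policy, engaged=True):
        print(message)
```

In a real system the policy would be the small neural network trained in the data center, and the clamping stage would be one of several safety checks between the model and the car.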

End-to-end vs classical modular approaches

  • End-to-end (Comma’s emphasis): raw sensor (video) → neural network → driving commands (curvature, acceleration). Minimal hand-engineered intermediate representations (no explicit lane/traffic-light detection in the core policy).
  • Classical pipelines: perception (segmentation, object detection, lanes) → planner/optimizer → control. These require heavy labeling, hand-engineered rules, and curated data pipelines.
  • Trade-offs:
    • End-to-end scales better because it needs less human labeling, but it requires sophisticated simulators and robustness methods.
    • Classical approaches remain popular at larger, better-funded companies that can afford extensive labeling and multi-stage systems.

World model / learned simulator (practical explanation)

  • Purpose: avoid pure imitation learning’s drift problem by training the agent on perturbed scenarios and supervised recoveries.
  • Requirements for a useful world model:
    • Photorealism and diversity to avoid exploitable artifacts.
    • Responsiveness to control inputs: generated video must change realistically when inputs/actions are applied.
  • Role in training:
    • Instantiate simulator on a real scene, apply perturbations (e.g., push off-center), and use the simulator (which “knows” the future) to provide supervisory signals / recovery trajectories for agent training.
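
The drift problem and the perturbation fix can be illustrated with a deliberately tiny example. The sketch below replaces the learned video world model with a one-line simulator of lateral offset, and the neural policy with a single gain; `simulate_step`, `recovery_target`, and all constants are assumptions made for illustration, not Comma's training code. It shows that logs of a well-centered human contain no information about how to recover, while perturbed scenarios with supervised recovery targets do.

```python
import random

DT = 0.1  # seconds per simulation step (assumed)


def simulate_step(offset: float, lateral_speed: float) -> float:
    """Toy stand-in for the world model: offset (m) responds to the action."""
    return offset + lateral_speed * DT


def recovery_target(offset: float, time_constant: float = 1.0) -> float:
    """Supervisory signal: lateral speed that steers back to the lane center."""
    return -offset / time_constant


def fit_gain(samples):
    """Least-squares fit of action = gain * offset (through the origin).

    Returns 0.0 when the data contains no off-center states to learn from.
    """
    numerator = sum(offset * action for offset, action in samples)
    denominator = sum(offset * offset for offset, _ in samples)
    return numerator / denominator if denominator > 0 else 0.0


def rollout(gain: float, start_offset: float, steps: int = 50) -> float:
    """Run the policy in the simulator and return the final offset."""
    offset = start_offset
    for _ in range(steps):
        offset = simulate_step(offset, gain * offset)
    return offset


# Pure imitation: the human stays centered, so every sample is (0, 0).
human_log = [(0.0, 0.0) for _ in range(100)]
imitation_gain = fit_gain(human_log)

# Perturbed training: push the car off-center and label the recovery action.
rng = random.Random(0)
perturbed_log = []
for _ in range(100):
    pushed_offset = rng.uniform(-1.0, 1.0)
    perturbed_log.append((pushed_offset, recovery_target(pushed_offset)))
recovery_gain = fit_gain(perturbed_log)

if __name__ == "__main__":
    print("imitation only, final offset:", rollout(imitation_gain, 0.5))
    print("with recoveries, final offset:", rollout(recovery_gain, 0.5))
```

The imitation-only policy never corrects a 0.5 m error, while the policy trained on perturbed states pulls the offset back toward zero. The real system faces the same issue in image space, which is why the simulator has to respond realistically to control inputs.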

User experience & product details

  • Installation: plug into the car’s CAN connector near the rearview mirror, mount on the windshield—minimal setup for supported vehicles.
  • Engagement: uses the car’s cruise-control engage button; device provides UI and audio feedback.
  • Performance: Comma reports that the system drives more than 50% of the miles logged by OpenPilot users (reliability is focused on highway driving).
  • Upgrades planned: an external GPU option to raise on-device model size/compute (expected to significantly improve edge cases like nuanced green-light detection).

Practical constraints and hardware

  • On-device compute is constrained by the thermal and placement limits of a windshield-mounted device, so Comma runs older, lower-power chips than integrated OEM systems do.
  • Typical compute gap: Comma’s device uses far less compute than a full OEM FSD computer (Harold estimated ~1/100th), so many gains come from efficient models and simulation-informed training.
  • Adding an external GPU (e.g., under the seat) could enable 10–100x larger models and measurable capability increases (e.g., better traffic light recognition).

Major technical challenges (unsolved problems)

  • Controls:
    • Cars respond inconsistently across models (delays, unknown internal logic). Classical control tuning is currently required; learning-based low-level control is still immature.
    • Per-vehicle parameters (tire stiffness, friction) often have to be learned live.
  • Reinforcement learning (RL):
    • Imitation learning alone fails for tight feedback-control tasks.
    • Current RL approaches are not yet reliable at real-world scale for driving controls; research/engineering gaps remain (reward specification, noisy real-world dynamics).
  • Continual learning:
    • The agent must adapt online to changes (tire pressure, weather, wear) to maintain performance; current systems use classical optimization for adaptation rather than seamless neural continual learning.
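
As an illustration of what "classical optimization for adaptation" can look like, the sketch below uses recursive least squares with a forgetting factor to track one per-vehicle parameter online. The episode does not say which estimator Comma uses; this is a textbook method shown under the assumption that the car's response is roughly `response = gain * command`. The class name `ScalarRLS` and all constants are hypothetical.

```python
class ScalarRLS:
    """Recursive least squares for a single gain in response = gain * command.

    The forgetting factor (< 1) down-weights old samples so the estimate
    can follow slow changes such as tire wear or pressure.
    """

    def __init__(self, initial_gain: float = 1.0,
                 initial_variance: float = 100.0,
                 forgetting: float = 0.95) -> None:
        self.gain = initial_gain
        self.variance = initial_variance
        self.forgetting = forgetting

    def update(self, command: float, response: float) -> float:
        """Fold in one (command, measured response) pair; return the estimate."""
        denominator = self.forgetting + command * self.variance * command
        correction = self.variance * command / denominator
        # Move the estimate in proportion to the prediction error.
        self.gain += correction * (response - self.gain * command)
        self.variance = (self.variance
                         - correction * command * self.variance) / self.forgetting
        return self.gain


def track(estimator: ScalarRLS, commands, true_gain: float) -> float:
    """Feed noiseless measurements from a car with the given true gain."""
    for command in commands:
        estimator.update(command, true_gain * command)
    return estimator.gain


if __name__ == "__main__":
    estimator = ScalarRLS()
    commands = [0.5, -0.3, 0.8, -0.6]
    print("after initial drive:", track(estimator, commands * 5, true_gain=0.7))
    # The vehicle changes (for example, lower tire pressure); keep adapting.
    print("after the change:   ", track(estimator, commands * 25, true_gain=0.5))
```

The estimator starts from a wrong guess, converges to the car's gain, and then follows it when the gain changes. Doing the same thing inside a neural policy, without a hand-written model of the vehicle, is the open continual-learning problem the episode describes.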

Differences vs. other companies (Tesla, Waymo)

  • Tesla / Waymo often combine large labeling/data-engineering efforts and greater on-vehicle compute budgets; they may use hybrid pipelines (some classical perception + end-to-end research).
  • Comma’s differentiator: lean end-to-end focus, learned simulators, and open-source ecosystem enabling rapid car support via community contributions.
  • Comma aims to be profitable and incremental—ship useful features today while iterating toward the longer-term end-to-end robotics vision.

Open source rationale and engineering choices

  • OpenPilot is open source to enable community-driven car ports and ecosystem growth—practical necessity for supporting many car models.
  • Language choices:
    • Heavy use of Python (~66% of the codebase) for rapid experimentation, development, and ML workflows.
    • C/C++ used where required by performance, safety standards, or low-level car interfacing.
  • Philosophy: if you own a device, you should be able to inspect and control its software; openness supports user sovereignty.

Future directions and adjacent use cases

  • Short/medium-term: improve urban red-light behavior, smoother city driving, and make models more robust with increased on-device compute.
  • Adjacent robotics: the same end-to-end + world-model approach could transfer to indoor navigation and other mobile robotics (vacuum, mowers, home robots) once perception, simulation, and control mature.
  • Long-term vision: generalized ML agents that treat different actuators (steering vs. arm movement) similarly—enabling broader robotic actions beyond driving.
  • Stated wish: simple, useful robotics products (dishwashers, vacuum cleaners, indoor helpers) that are open source and user-controlled.

Notable quotes / insights

“Self-driving was the most interesting applied robotics problem, period. It’s a place where you can make products that are immediately useful.” — Harold, CTO, Comma.ai

“If you want to make a robotic simulator with this approach, you need it to be accurate in terms of responding to inputs… It has to look photorealistic and respond accurately to inputs.” — Harold

Where to learn more / next steps

  • OpenPilot GitHub and docs (search “comma.ai OpenPilot”).
  • Comma.ai blog posts for technical write-ups (world model, releases).
  • PracticalAI.fm for this episode and other AI application discussions.

Actionable takeaways for listeners

  • If you’re curious about retrofit autonomy, check whether your car is supported by OpenPilot and review the installation/compatibility guidance on the OpenPilot site.
  • For researchers and developers: Comma’s open-source codebase and simulation-first approach are good references for end-to-end autonomy experiments.
  • If you care about device ownership and transparency, consider supporting open-source projects in robotics and autonomy that make their code and data pipelines accessible.

Sources: episode interview with Harold (CTO, Comma.ai) on the Practical AI podcast (hosts Daniel Whitenack and Chris Benson). Note: transcript contained a few transcription errors (e.g., “Kama” → Comma.ai).