Overview of Talk Python to Me — Episode #541: Monty — Python in Rust for AI
This episode (recorded Feb 17, 2026) features Samuel Colvin (creator of Pydantic) discussing Monty — a new Python interpreter written from scratch in Rust and purpose-built to safely run LLM-generated code. The conversation covers why Monty exists, how it’s different from CPython and other interpreters, key implementation choices (sandboxing, host callbacks, durability), current limitations, performance characteristics, and practical use cases inside agentic AI workflows.
Key takeaways
- Monty is a minimal, sandboxed Python runtime implemented in pure Rust (no CPython dependency) aimed specifically at executing LLM-generated code reliably and safely.
- The runtime starts in microseconds (or sub-microsecond in hot loops), enabling cheap, low-latency execution for many small runs typical in "code mode" / programmatic tool-calling scenarios.
- Every interaction with the real world (files, network, environment) must be explicitly enabled and goes through a host callback, giving strong sandboxing and fine-grained control.
- Monty can serialize its entire interpreter state to storage and resume later — enabling durable workflows where invoked tools may take minutes/hours.
- It’s intentionally limited: not a drop-in replacement for CPython. Full stdlib and third-party packages (pip installs like Pydantic, FastAPI) aren’t generally supported; instead, Monty exposes controlled shims for the needed functionality.
- LLMs both helped build Monty (accelerating implementation) and are its main target users: Monty is optimized for scenarios where the LLM writes small, verifiable Python snippets.
What Monty is and why it exists
- Purpose: run LLM-generated Python code safely, quickly, and with predictable resource control — a sweet spot between pure tool-calling and full sandbox container/VM access.
- Motivations:
- LLMs are good at writing Python and SQL; giving them a safe, fast runtime reduces token costs and increases reliability (e.g., parsing big API responses then making follow-ups).
- Avoids cold-start and orchestration complexity of spinning up containers for each code execution.
- Provides auditability/debugging: you get executable code out of the LLM to inspect and test.
Technical design and notable features
- Implementation:
- Written in Rust (pure Rust library).
- Uses Ruff’s AST parser and ty type checker (both Rust) to parse Python and perform type checking before execution.
- Offers Python and JavaScript bindings (PyO3 for Python, N-API for JS), and can be embedded directly into Rust applications.
- PGO (profile-guided optimization) builds to improve performance.
- Sandboxing model:
- No direct syscalls to host. File, env, network, and other external ops must be explicitly provided by the host as controlled callbacks or shims.
- Default: no networking or file access. The host can selectively allow or proxy these services and enforce policy (e.g., deny requests to localhost).
- Callbacks & durability:
- Instead of host-provided function pointers, Monty suspends execution when a tool-call is needed and returns a structured "call" to the host to perform.
- Full interpreter state can be serialized to a database and resumed later — useful when external tools take long to complete.
- Resource controls:
- Execution time limits, memory limits, recursion depth — intended to prevent OOMs and runaway execution.
- Language support:
- Monty targets Python 3.14 syntax (subset). The published Python package supports installation on Python 3.10–3.14 environments.
- Currently missing or partial: class definitions, context managers (with), and match statements. These are planned or in progress.
- Standard library:
- Very limited standard library implemented in Rust; JSON, datetime, regex, some typing/sys bits exist or are planned.
- Full stdlib and third-party libraries are not generally supported. Instead, Monty expects host-provided shims for key functionality (e.g., HTTP, DB access).
- WASM / browser:
- Monty can be compiled to WebAssembly and run in-browser; community examples (e.g., Simon Willison's demo) show it running via WASM/Pyodide.
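The callback-and-resume model described above can be sketched as a toy state machine. This is illustrative Python only, not Monty's actual API: run_until_call, resume, and the state dict are invented for the example; the point is that the runtime returns a structured call for the host to perform, and the suspended state is plain data that can be serialized.

```python
import pickle

# Toy model of Monty's suspend/resume flow: the interpreter runs until it
# needs the outside world, then hands the host a structured "call" to
# perform. All names here are invented for illustration.

def run_until_call(state):
    """Run the 'program' until it needs a host call or finishes."""
    if state["pc"] == 0:
        state["pc"] = 1
        # Instead of doing I/O itself, the runtime describes the call it wants.
        return {"kind": "call", "fn": "http_get", "args": ["https://example.com/data"]}
    # Final step: combine the host-provided result with local computation.
    return {"kind": "done", "value": state["locals"]["body"].upper()}

def resume(state, result):
    """Feed the host's result back into the suspended interpreter state."""
    state["locals"]["body"] = result
    return state

# Host side: run, persist the suspended state, then resume later.
state = {"pc": 0, "locals": {}}
request = run_until_call(state)
assert request["fn"] == "http_get"

# Because the suspended state is plain data, it can be serialized
# (Monty's durability story: stash it in a database, resume hours later).
blob = pickle.dumps(state)

restored = pickle.loads(blob)
restored = resume(restored, "hello")   # host performed the call
outcome = run_until_call(restored)
print(outcome["value"])  # HELLO
```

The host stays in full control: it can inspect, log, deny, or defer every requested call before resuming the interpreter.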
Performance and benchmarks
- Startup/latency focus:
- Monty shows microsecond-level cold-starts for tiny snippets (example: 1+1 measured in microseconds; hot loop ~900 ns).
- Container cold-starts and many sandboxing services measure in hundreds of milliseconds to seconds (e.g., Docker cold start ~195 ms, Pyodide ~2.8s in examples).
- Relative runtime:
- Monty is not designed to beat CPython in all workloads; observed performance varies (roughly from ~5× faster to ~5× slower depending on case). The main win is very low startup overhead and controlled execution cost for many small runs.
- CI/perf tooling:
- Monty maintainers use CodSpeed (bench PR comments, flamegraphs, CPU instruction counts via Valgrind) to prevent regressions.
Security model & developer ergonomics
- Safety-by-default: no file/network/host access unless explicitly granted.
- Host-side policy enforcement: host receives proposed calls and can validate/deny; e.g., disallow requests to localhost.
- Easier auditing: code produced by LLMs is visible and testable; you can fuzz and run unit tests against Monty vs CPython for parity.
- Fuzzing and tests: Monty leverages fuzzing (and intends to reuse CPython unit tests where appropriate) to find panics and memory errors.
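A host-side policy check of the kind described (deny localhost, require a known-safe scheme) might look like the sketch below. The function name, blocklist, and scheme set are invented for illustration, not part of Monty's API.

```python
from urllib.parse import urlparse

# Illustrative host-side policy: the runtime proposes a network call,
# and the host decides whether to perform it. Names are invented.
BLOCKED_HOSTS = {"localhost", "127.0.0.1", "0.0.0.0"}
ALLOWED_SCHEMES = {"https"}

def is_allowed(url: str) -> bool:
    """Return True if the host is willing to perform this fetch."""
    parts = urlparse(url)
    if parts.scheme not in ALLOWED_SCHEMES:
        return False
    host = (parts.hostname or "").lower()
    return host not in BLOCKED_HOSTS

print(is_allowed("https://api.example.com/v1/data"))  # True
print(is_allowed("http://api.example.com/"))          # False: not https
print(is_allowed("https://localhost:8080/admin"))     # False: blocked host
```

Because every external call funnels through the host, one small validator like this covers all code the LLM can generate.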
Where Monty fits in agentic workflows
- Ideal for "code mode": let LLMs write small Python snippets that call host-provided tools/API shims — this often reduces token usage and inference cost compared to repeatedly prompting the LLM for step-by-step tool calls.
- Placement on spectrum:
- Spectrum: plain tool calling (JSON + API) at one end, full sandboxed containers / terminal control (Claude Code, etc.) at the other, with Monty in between.
- Monty aims for more expressiveness than pure tool-calling but stronger safety and lower overhead than full sandboxing/terminal access.
- Integrations & adoption:
- Pydantic AI will add Monty as a code execution environment (so agents built with Pydantic AI can use Monty soon).
- Other projects (JustBash, BashKit) are exploring using Monty to execute Python parts securely.
- Community already using Monty for recursive language models (RLMs) and other agentic patterns.
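The "code mode" pattern above, an LLM-written snippet that may only touch host-provided shims, can be approximated in plain CPython with a restricted namespace. Note this is only a sketch of the idea: CPython's exec is not a real sandbox (which is exactly the gap Monty fills), and fetch_users and the sample snippet are invented for the example.

```python
import json

# Host-provided shim: the only capability the snippet may use.
def fetch_users() -> str:
    # Stand-in for a controlled HTTP shim; returns canned JSON here.
    return json.dumps([{"name": "Ada", "active": True},
                       {"name": "Bob", "active": False}])

SHIMS = {"fetch_users": fetch_users, "json": json}

# An LLM-generated snippet: parse a big response and compute the answer
# locally, instead of round-tripping the whole payload through the model.
llm_snippet = """
users = json.loads(fetch_users())
result = [u["name"] for u in users if u["active"]]
"""

namespace = {"__builtins__": {}}   # no open(), no __import__, etc.
namespace.update(SHIMS)
exec(llm_snippet, namespace)
print(namespace["result"])  # ['Ada']
```

The snippet filters the payload in-runtime, so only the final answer needs to flow back to the model, which is where the token savings come from.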
Current limitations and trade-offs
- Not aiming for full CPython compatibility — the bar for replacing CPython is extremely high.
- No general pip/installable third-party libraries inside Monty; full ABI compatibility would defeat the sandbox goal.
- Some Python features are not yet supported (classes, with, match). Monty will add features selectively based on need.
- Standard library will be selective and implemented in Rust on a case-by-case basis; maintainers prefer to expose small, useful shims rather than mimic every historic API.
- Developers must design host shims or whitelists for required functionality (HTTP, DBs, numeric libs).
Why LLMs made Monty feasible
- LLMs encode patterns for implementing interpreters, public APIs, and error semantics — speeding development.
- Large corpora give LLMs knowledge of CPython behavior (signatures, expected error messages), enabling automated test generation and parity checks.
- Unit tests and fuzzing make it feasible to iterate quickly and validate behavior versus CPython.
- Samuel notes Monty belongs to a class of problems where LLMs produce outsized acceleration: implementations with a canonical reference and testable outputs.
Notable quotes / insights
- “Every single place where the code can interact with the real world, it must call an external function.” — on Monty’s sandboxing and host callbacks.
- “You can serialize the entire interpreter state, put it in a database, and retrieve it later.” — durability for long-running tool calls.
- “We’re not trying to build another Python you’ll port your app to — we’re building a different tool that’s better for LLM-generated code.” — clarifying scope.
- LLMs helped make Monty feasible by bringing knowledge of interpreter implementations, APIs, and tests.
Practical next steps / action items
- Explore the Monty GitHub repo to try the Python or JavaScript bindings and read the README and issues.
- If you run agentic workflows, experiment with Monty as a code execution backend (especially for short-lived, frequently-run snippets).
- If you need specific capabilities (HTTP, DB, Polars-like dataframes), consider contributing or requesting shims that map those APIs safely to host services.
- Run evaluations: test LLM behavior with Monty shims vs emulating popular library APIs to see which approach yields more reliable generated code.
- Follow Pydantic AI / Logfire for ecosystem integrations and skills that will make Monty easier to use in agent frameworks.
Links & resources mentioned
- Monty (GitHub) — primary repo (search GitHub for “monty” / Samuel Colvin)
- Pydantic / Pydantic AI (Samuel Colvin’s projects)
- Logfire (observability platform from the same team)
- Ruff (AST parser) and ty (type checker) — Rust components Monty leverages
- CodSpeed (performance PR comments & benchmarks)
- Examples: Simon Willison’s browser WASM demo of Monty
- Projects mentioned: JustBash, BashKit, RLM implementations using Monty
If you want a concise checklist for getting started:
1. Visit Monty's GitHub repo.
2. Install the Python or JS package and run a tiny snippet.
3. Try a host-provided shim (e.g., a safe JSON load or controlled HTTP fetch) to see the callback flow.
4. Evaluate cold-start latency versus your current sandbox approach.
