#540: Modern Python monorepo with uv and prek

by Michael Kennedy

1h 2m · March 13, 2026

Overview of #540: Modern Python monorepo with uv and prek

This episode of Talk Python to Me (host Michael Kennedy) is a deep, practical look inside Apache Airflow — one of the largest open-source Python monorepos — with maintainers Jarek Potiuk and Amogh Desai. They explain why Airflow uses a monorepo, the tooling and Python packaging standards that made it feasible at scale, how the contributor workflow works (uv workspaces + prek hooks + per-package pyproject.toml), and the architectural choices (including a symlink-based “shared libraries” approach) that let them balance DRYness, isolation, and backward compatibility.

Guests and context

  • Jarek Potiuk — Airflow maintainer, Apache Software Foundation member, Security Committee member (drives security/supply chain thinking).
  • Amogh Desai — Apache Airflow PMC member, top contributor, works at Astronomer (major Airflow stakeholder).
  • Project scale highlighted: ~1.2M lines of Python (≈918k excluding comments), 100+ internal packages/distributions, heavy daily PR/issue traffic (dozens of PRs/day).

Key topics covered

  • What a monorepo is (and how it differs from a monolith or multi-repo approach).
  • Why Airflow chose/keeps a monorepo and how modern tooling changed the tradeoffs.
  • Tooling and standards that enabled scaling: uv workspaces, per-package pyproject.toml, dependency groups, inline script metadata, pip changes.
  • prek (a faster, workspace-aware alternative to pre-commit) for per-package hooks and quicker local checks.
  • Shared libraries implemented via symlinks + automated vendoring to avoid runtime/dependency conflicts while keeping code DRY.
  • Contribution/workflow realities (CI/QA, code review, AI-generated PRs, security considerations).
  • IDE integration and helper scripts for PyCharm/VS Code to make multi-package repos editable.
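The “symlink + vendoring” idea from the topic list can be sketched in a few lines. This is a hypothetical illustration, not Airflow’s actual implementation (which maintains the links via its pre-commit/prek hooks): the core move is copying the current shared-library source into each distribution at build time, so every package ships its own embedded copy rather than sharing one runtime dependency version.

```python
import shutil
from pathlib import Path


def vendor_shared(shared: Path, packages: list[Path]) -> None:
    """Embed a fresh copy of the shared library into each package.

    `shared` is the single source-of-truth directory; each package gets
    the code duplicated under its own `_vendor/` directory so consumers
    stay independent of each other's dependency versions.
    """
    for pkg in packages:
        target = pkg / "_vendor" / shared.name
        if target.exists():
            shutil.rmtree(target)          # drop the stale vendored copy
        shutil.copytree(shared, target)    # embed the exact current version


# Hypothetical layout: shared code lives once under shared_libs/,
# and each distribution vendors it at build time.
if __name__ == "__main__":
    vendor_shared(
        Path("shared_libs/timeutils"),
        [Path("packages/api"), Path("packages/core")],
    )
```

During development the shared directory can instead be symlinked into each package so edits propagate instantly; the copy step above only runs when building release artifacts.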

Main takeaways (concise)

  • Modern packaging standards + tooling make monorepos practical for large Python projects. The classic reasons to split into many repos have mostly faded if you use the right tools.
  • uv workspaces are a game-changer: they let you treat a sub-package as the “active project”, auto-sync (and isolate) the virtual environment to only the dependencies declared for that package, and use source packages in-workspace rather than PyPI installs.
    • Common commands: uv sync (create/update env for that package) and uv run pytest (auto-sync + run tests for that package).
  • Use a per-package pyproject.toml (dependency groups, inline script metadata) as the single source of truth for each distribution. Recent PEPs (inline script metadata, PEP 723; dependency groups, PEP 735) plus pip and tooling support make this workable.
  • prek (a workspace-aware pre-commit alternative) lets hooks be defined inside each distribution, runs fast, and supports tab completion and local scoping of hooks, making local dev and CI more reliable.
  • Shared libraries: symlink-and-vendor approach — embed (vendor) the exact version of shared code into a distribution at build time to avoid runtime conflicts and keep packages independent. This gives the benefits of code reuse without forcing a single shared runtime dependency version for all consumers.
  • Large-project benefits from these approaches:
    • Enforced isolation (can't import code unless declared dependency).
    • Easier local testing and reproducible builds.
    • Cleaner architecture: encourages explicit initialization, dependency injection, and fewer implicit global imports.
    • Better contributor onboarding and fewer “mystery” dependencies in developer envs.
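The workspace setup behind these takeaways can be sketched as a pair of pyproject.toml files. Package names and paths here are hypothetical; the table names ([tool.uv.workspace], [tool.uv.sources], [dependency-groups]) follow uv’s workspace conventions and PEP 735.

```toml
# Root pyproject.toml — declares the workspace so tooling can discover members
[project]
name = "my-monorepo"
version = "0.1.0"
requires-python = ">=3.10"

[tool.uv.workspace]
members = ["packages/*"]        # each member has its own pyproject.toml

# packages/api/pyproject.toml — one distribution inside the workspace
[project]
name = "my-api"
version = "0.1.0"
dependencies = ["my-core"]      # another workspace member

[dependency-groups]             # PEP 735: dev tooling kept out of runtime deps
dev = ["pytest", "ruff"]

[tool.uv.sources]
my-core = { workspace = true }  # resolve from in-workspace source, not PyPI
```

Running `uv sync` inside `packages/api` then builds a virtual environment containing only what that package declares, which is what enforces the isolation described above.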

Notable numbers & operational facts

  • ~1.2M lines of Python; ~120+ Python distributions/subpackages.
  • Weekly pulse example: ~310 active PRs in a week, ~200 merged; heavy daily review load.
  • Hundreds of automated checks (pre-commit/CI hooks) enforce quality across packages.
  • Airflow actively addresses the influx of low-quality / AI-generated PRs through contributor guidelines and triage.

Practical, actionable checklist (if you want to try this)

  • Start per-package:
    • Add a pyproject.toml to each package (declare dependencies + dependency groups).
    • Define a top-level workspace in the repo-level pyproject.toml so tooling can discover packages.
  • Adopt uv (or a workspace-capable tool) for local env management:
    • Use uv sync in a package dir to create the correct venv for that package.
    • Use uv run <tool> (pytest, linters) so env auto-syncs before running.
  • Use dependency groups (the standard [dependency-groups] table in pyproject.toml, per PEP 735) to separate dev/test tooling from runtime dependencies.
  • Use inline script metadata for runnable scripts and simpler pre-commit / tooling config.
  • Replace the monolithic pre-commit YAML with a workspace-aware solution (prek) that lets hooks live inside each package and runs only the relevant hooks locally.
  • Consider a vendoring strategy for shared internal libs:
    • Use symlink generation + pre-processing hooks to include (vendor) specific shared-library code into a distribution at build time if you need independent versioning.
  • Add small IDE helper scripts for PyCharm/VS Code that auto-discover and mark all package source/test roots — improves navigation & autocomplete.
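The inline-script-metadata step in the checklist looks like this in practice. The script below is a hypothetical example (PEP 723 syntax); the comment block at the top lets a tool like `uv run` create an isolated environment with exactly the declared dependencies before executing the file.

```python
# check_env.py — hypothetical helper script for a monorepo.
# The PEP 723 block below is read by `uv run check_env.py`:
# /// script
# requires-python = ">=3.10"
# dependencies = []
# ///
import sys


def python_ok(minimum: tuple[int, int] = (3, 10)) -> bool:
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    print("python ok" if python_ok() else "python too old")
```

Because the dependency declaration travels with the file, such scripts need no surrounding project or requirements file, which simplifies pre-commit/prek hook configuration considerably.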

Notable quotes & insights

  • “The reasons why you would like to have multiple repos are gone now if you're using the right tooling. Only the benefits of having everything in one place remain.” — Jarek
  • “uv workspaces were the most important thing for me — it let us split the repo into many distributions and make development isolated and simple.” — Amogh
  • “The best way to foresee the future is to shape it.” — Jarek (on collaborating with tool authors to make tooling support monorepo workflows)

Risks, trade-offs & operational notes

  • You must invest in CI, automated checks and rigorous contribution guidelines to handle high PR volumes and prevent regressions.
  • AI-generated contributions require triage strategies — make low-quality submissions expensive for submitters and quick to close for maintainers.
  • GitHub/host availability can still create operational pain (cloning, Git operations) for big repos — but not a blocking problem in general.

Resources (mentioned / recommended)

  • Apache Airflow GitHub repo — inspect the monorepo and the implementations described.
  • “Modern Python repo for Apache Airflow” — four-part blog series by Jarek (detailed how-to + rationale).
  • FOSDEM / conference talk from the guests (recording available from FOSDEM).
  • uv (workspace-capable Python tooling) and prek (workspace-aware pre-commit-style tool) — try them on a small repo to learn the workflow.

Final recommendation from the guests

  • If you’re considering a monorepo for a growing Python project: don’t fear it — with pyproject.toml per package, uv workspaces, dependency groups, inline script metadata and workspace-aware hooks, a monorepo is now a robust, maintainable option. Their bottom line: “Just do it.”