#534: diskcache: Your secret Python perf weapon

Summary of #534: diskcache: Your secret Python perf weapon

by Michael Kennedy

1h 14m, January 13, 2026

Overview of #534: diskcache: Your secret Python perf weapon

Host Michael Kennedy and guest Vincent Warmerdam explore diskcache — a lightweight, practical Python caching library built on SQLite. The episode covers what diskcache does, how it works under the hood, real-world use cases (web apps, notebooks, LLM experiments), advanced features (sharding/fanout, eviction policies, custom serialization), performance tradeoffs, deployment tips, and caveats (pickling/versioning, network filesystems, maintenance status).

Key points / main takeaways

  • diskcache behaves like a Python dict but persists data to disk (usually SQLite). That gives you durable, cross-process, thread-safe caching without running Redis or another server.
  • It’s especially useful for expensive or slow operations: LLM calls, image classification, heavy DB queries, and long notebook computations.
  • Very easy to adopt: dictionary-style ops plus function memoization (decorator).
  • Works well when multiple processes on the same machine can share a filesystem volume — ideal for a single VM with several worker processes.
  • Features include TTL expiries, eviction policies, sharding (fanout) for concurrent writers, Django backend, custom serializers (JSON + compression), and queue/deque-like data structures.
  • Be mindful of pickle/version compatibility, write contention on shared SQLite, and avoid using cache files on slow network filesystems.

How diskcache works (simple)

  • API is dict-like and persistent:
    • cache = Cache('/path/to/cache')
    • cache['k'] = obj
    • cache.get('k', default)
  • Under the hood: values are serialized (pickle by default), stored in a SQLite file. For simple types it uses native storage instead of pickling.
  • Supports memoize decorator for functions: caches outputs by function arguments and can set expire times.
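
A minimal, hedged sketch of both patterns above (the cache path and the slow_lookup function are illustrative stand-ins, not from the episode):

    from diskcache import Cache

    cache = Cache('/tmp/demo-cache')            # directory is created if it does not exist

    # dict-style access, persisted to SQLite on disk
    cache['greeting'] = {'text': 'hello'}
    print(cache.get('greeting', default=None))  # survives process restarts

    # function memoization keyed by arguments, with a 60-second expiry
    @cache.memoize(expire=60)
    def slow_lookup(user_id):
        return {'user': user_id, 'score': 42}   # stand-in for a slow DB query or API call

    slow_lookup(1)   # computed and stored
    slow_lookup(1)   # served from the cache until it expires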

Use cases and real examples discussed

  • LLM experiments: avoid repeat calls/costs by caching prompt→response pairs. Huge win for dev/test/benchmark loops (a sketch of this pattern follows this list).
  • Web server caching:
    • Caching Markdown→HTML fragments, parsed YouTube IDs, RSS feed generation (e.g., cache RSS for 1 minute to avoid recomputing).
    • Shared cache across multiple web worker processes via a mounted shared volume in Docker Compose.
  • Notebooks and long-running analytics: checkpoint intermediate results, prevent recomputing after kernel restarts/crashes.
  • Job/queue patterns: diskcache provides deque-like structures useful for cross-process queues (pop/push semantics).
  • Vincent’s Marimo project: used diskcache to cache repeated expensive git-blame computations and Altair chart assets.
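
A sketch of the LLM caching pattern mentioned above; call_llm, the model name, and the cache path are placeholders for your own setup:

    import hashlib
    from diskcache import Cache

    llm_cache = Cache('/tmp/llm-cache')

    def call_llm(prompt, model):
        # placeholder for the real API call (OpenAI, Anthropic, a local model, ...)
        return f'echo: {prompt}'

    def cached_completion(prompt, model='example-model'):
        # key on everything that should change the answer: model + prompt content
        key = ('completion', model, hashlib.sha256(prompt.encode()).hexdigest())
        response = llm_cache.get(key)
        if response is None:
            response = call_llm(prompt, model)
            llm_cache[key] = response
        return response

Repeated prompts during dev/test loops then hit the cache instead of the API, which is where the cost and time savings come from.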

Features & options (what to watch for)

  • Persistence: cache survives process restarts.
  • Thread/process safety: suitable for multi-process web workers on the same machine.
  • Expiry/TTL: set per-item expiry to avoid stale data.
  • Eviction policies: max size (default 1 GiB) and multiple eviction strategies (least-recently-stored by default, least-recently-used, least-frequently-used, or none).
  • Fanout (sharding): distribute keys across several SQLite files to reduce writer contention; default shard count ≈ 8.
  • Django integration: diskcache.DjangoCache as a drop-in cache backend.
  • Deque: queue-style structure for cross-process communication (push/pop from both ends).
  • Transactions and diskcache.Index: atomic reads/updates and an ordered, persistent mapping.
  • Custom disk classes: implement alternative serialization (e.g., JSON + zlib) for big, compressible text blobs, or ORJSON for performance and safer cross-version compatibility (example after this list).
  • Numpy/embeddings: store arrays, consider quantization or float16 to reduce size; converting to raw bytes/pickle may not give big wins unless you compress or quantize.
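
A sketch combining two of the options above: FanoutCache to shard writers across multiple SQLite files, and the built-in JSONDisk for JSON + zlib storage. The shard count, compress level, and path are examples; values must be JSON-serializable when using JSONDisk:

    from diskcache import FanoutCache, JSONDisk

    cache = FanoutCache(
        '/tmp/fanout-cache',
        shards=8,                # spread keys across 8 SQLite files to reduce write contention
        timeout=1,               # seconds to wait on a busy shard before giving up
        disk=JSONDisk,           # store values as compressed JSON instead of pickle
        disk_compress_level=6,   # zlib level passed through to JSONDisk
    )

    cache.set('article:123', {'title': 'diskcache', 'body': 'lots of text...'}, expire=3600)
    print(cache.get('article:123'))

JSON storage also sidesteps most pickle cross-version issues, at the cost of only supporting JSON-compatible values.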

Performance & benchmarks (practical notes)

  • Local/same-machine caching often outperforms a networked Redis because it avoids network hops; diskcache authors and examples show it can be extremely fast for many workloads.
  • Modern NVMe disks are very fast; using disk instead of RAM can be more cost-effective for large caches.
  • Concurrency caveat: SQLite handles many reads well, but concurrent writes can block — fanout (sharding) mitigates this by spreading writers across files.
  • Default diskcache size limit (1 GiB) prevents unbounded cache growth; configure to your needs.

Practical deployment considerations

  • Use a shared persistent volume when running multiple processes/containers so they can access the same cache files (Docker Compose external volume, big VM disk).
  • Avoid network/CIFS mounts for the cache file — locking and performance degrade on network filesystems.
  • Configure size limits, TTLs, and shards appropriately depending on read/write patterns and concurrency.
  • Design cache keys to include all factors that should invalidate a result (e.g., content hashes, version IDs). Good key design prevents stale data; a short example follows this list.
  • For extremely large analytical datasets, consider using columnar formats or DBs (Parquet/DuckDB) rather than treating those as cache items.
  • If sharing caches across teams or services, consider access and security (file permissions and locations).
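
A sketch of the key-design advice; the renderer version constant and the Markdown stand-in are illustrative, but the point is that anything that should invalidate a result goes into the key:

    import hashlib
    from diskcache import Cache

    cache = Cache('/tmp/render-cache')
    RENDERER_VERSION = '2'   # bump whenever the rendering logic changes

    def render_markdown(text):
        # content hash + version id: edited content or a renderer upgrade
        # produces a new key, so stale HTML is never returned
        key = ('md-html', RENDERER_VERSION, hashlib.sha256(text.encode()).hexdigest())
        html = cache.get(key)
        if html is None:
            html = '<p>' + text + '</p>'         # stand-in for real Markdown rendering
            cache.set(key, html, expire=3600)    # keep for an hour
        return html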

Tips, trade-offs & gotchas

  • Pickling: default serialization is pickle — easy and general, but vulnerable to cross-version/package mismatches. If you plan to keep caches between Python or dependency upgrades, prefer portable serializers (JSON/ORJSON) or explicit upgrade/migration strategies.
  • Custom serializers + compression: for text-heavy caches, JSON + zlib (or better compressors) can dramatically reduce disk usage (often large % savings).
  • Fanout helps when many writers contend, but if your workload is genuinely write-heavy a cache may not be the right tool; caches pay off most for read-heavy workloads.
  • Numpy arrays: raw bytes vs. pickle is often a wash on size; quantization (float16 or bucketed quantization) can drastically reduce size at a tolerable accuracy loss for embeddings (see the sketch after this list).
  • Eviction defaults: check and set useful limits (e.g., item count, disk size) to avoid runaway caches.
  • Project maintenance: diskcache is mature and widely used, but its release cadence has slowed recently. It’s OSS on GitHub — you can fork if you need tweaks — and SQLite itself is actively maintained.
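
A sketch of the embedding quantization idea; the vector size and keys are made up, and whether float16 precision is acceptable depends on your similarity-search tolerance:

    import numpy as np
    from diskcache import Cache

    cache = Cache('/tmp/embedding-cache')

    def put_embedding(key, vec):
        # float16 halves storage relative to float32; often fine for embeddings
        cache[key] = vec.astype(np.float16).tobytes()

    def get_embedding(key):
        raw = cache.get(key)
        if raw is None:
            return None
        return np.frombuffer(raw, dtype=np.float16).astype(np.float32)

    put_embedding('doc:1', np.random.rand(1536).astype(np.float32))
    print(get_embedding('doc:1').shape)   # (1536,)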

Practical recommendations / action items

  • Quick try:
    • pip install diskcache
    • Minimal usage:
      • from diskcache import Cache
      • cache = Cache('/path/to/cache')
      • cache['k'] = expensive_result
      • value = cache.get('k', default=None) (note: get() returns the default when the key is missing; it does not call a function to compute it)
  • For function-level caching: wrap expensive functions (e.g., RSS generation, model inference) with the @cache.memoize(expire=60) decorator.
  • For shared web workers: mount a persistent disk volume and point all workers to the same Cache location.
  • Use TTLs and size limits to keep the cache bounded (configuration example after this list).
  • For LLM/text-heavy caching, implement a JSON+compression disk class (or ORJSON + zlib) to reduce disk footprint.
  • Avoid putting the cache file on network filesystems; use local NVMe volumes, or consider different architecture if you need distributed cross-machine caching.
  • Design cache keys carefully — include content/version hashes to avoid stale results.
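
A configuration sketch for keeping the cache bounded; the path, 2 GiB limit, and eviction policy are examples rather than recommendations from the episode:

    from diskcache import Cache

    cache = Cache(
        '/var/cache/app',
        size_limit=2 * 1024**3,                  # evict once the cache grows past ~2 GiB
        eviction_policy='least-recently-used',   # instead of the least-recently-stored default
    )

    cache.set('rss-feed', '<rss>...</rss>', expire=60)   # per-item TTL of one minute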

Resources & where to look next

  • diskcache docs and PyPI (for the API, decorators, fanout, the Django backend, and Deque/Index).
  • Examples in episode: Marimo project / Vincent’s notebook demos (git-blame chart caching and Altair chart caching).
  • Consider related tools for larger or different problems: Redis/Valkey (in-memory, networked caches), DuckDB/Parquet for analytical storage, ORJSON for JSON serialization.
  • If you plan to productionize with SQLite persistence in the cloud, investigate approaches for backing up SQLite (e.g., streaming backups to S3, providers offering persistent SQLite).

Guest: Vincent Warmerdam — practical examples from notebooks, LLM workflows, and Marimo; Host: Michael Kennedy.

Summary verdict: diskcache is a compact, powerful tool for many real-world caching needs — extremely easy to adopt (dict + decorator), economical in resource usage, and often the fastest/cheapest win for single-machine, multi-process deployments.