Overview of Talk Python to Me — Episode #538: Python in Digital Humanities
This episode (recorded Jan 22, 2026) features David Flood (DArTh — Digital Arts & Humanities team, Harvard) in conversation with Michael Kennedy. They cover how Python and web technologies are applied to digital humanities projects, the practical problems of long‑term maintenance when grant funding ends, and pragmatic solutions: static exports, client‑side search (PageFind), browser Python / WebAssembly tricks, and sensible deployment choices. The discussion mixes project case studies, tooling recommendations, and considerations about cost, compliance, and the emerging role of AI.
Guest background
- David Flood: PhD in humanities (textual criticism); self‑taught Python since ~2019. Now works full time with Harvard’s DArTh group building web platforms for humanities research and public archives.
- Entry point: textual criticism (comparing many manuscript versions), where computational sequence‑comparison tools (phylogenetic/sequence algorithms) are applicable.
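As a rough illustration of the sequence comparison mentioned above, the sketch below aligns two word sequences with Python's standard `difflib` and lists where they differ. This is not the tooling discussed in the episode; the `collate` helper and both sample readings are invented for illustration.

```python
from difflib import SequenceMatcher


def collate(base: str, witness: str) -> list[tuple[str, str, str]]:
    """Return (operation, base_reading, witness_reading) for each point of variation."""
    a, b = base.split(), witness.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    variants = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only insertions, deletions and replacements
            variants.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return variants


# Invented sample readings, not taken from any real manuscript.
base_text = "in the beginning was the word"
witness_text = "in the beginning there was a word"

for op, base_reading, witness_reading in collate(base_text, witness_text):
    print(f"{op:8} base={base_reading!r} witness={witness_reading!r}")
```

Word-level alignment like this handles two witnesses; comparing many manuscripts at once is where the phylogenetic-style methods come in.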
Notable projects (high level)
- Amendments Project — searchable archive of ~22,000 proposed U.S. constitutional amendments (public site with faceted search).
- Mapping Color in History — pigments database combining spectral analyses of paintings with deep‑zoom image selection and provenance/analysis metadata.
- Fionn Folklore Database: multi‑language audio & text corpus (English, Scottish Gaelic, Irish), searchable and mapped geographically.
- Apatosaurus — web tool for visualizing textual collations/critical apparatus (textual criticism).
- Water Stories — public submissions during an art installation, later archived as static site.
- Tsumeb specimens / minerals database: static site using Astro + PageFind for search.
Key technical themes and takeaways
- Use the web as the easiest distribution platform for research tools (avoids app‑store friction and code signing).
- Prioritize presentation order: show compelling research results first, then explain tools. That helps non-technical audiences value the output.
- Good search + filtering (facets) is hugely impactful for researchers and public users.
- Design for archival/“end of life” from the beginning when possible — build with static export in mind to avoid losing access after grants end.
- Open source when feasible — publishing repos and static archives increases longevity.
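To make the faceted search idea above concrete, here is a minimal sketch of filtering records by selected facet values and counting the remaining values per facet. The records, field names, and helper functions are hypothetical and stand in for whatever a real project stores.

```python
from collections import Counter

# Hypothetical records; the field names are illustrative only.
RECORDS = [
    {"title": "Record A", "decade": "1860s", "topic": "voting"},
    {"title": "Record B", "decade": "1860s", "topic": "taxation"},
    {"title": "Record C", "decade": "1970s", "topic": "voting"},
]


def apply_filters(records, selected):
    """Keep records that match every selected facet value."""
    return [r for r in records if all(r.get(k) == v for k, v in selected.items())]


def facet_counts(records, fields):
    """Count how many records carry each value of each facet field."""
    return {f: Counter(r[f] for r in records if f in r) for f in fields}


hits = apply_filters(RECORDS, {"topic": "voting"})
counts = facet_counts(hits, ["decade", "topic"])
print([r["title"] for r in hits])
print({field: dict(c) for field, c in counts.items()})
```

The per-value counts are what a UI would typically show next to each filter checkbox.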
Technical details & tooling mentioned
- Backends and frameworks:
  - Django (Django REST Framework), GraphQL.
  - Postgres (RDS) as the primary DB; SQLite for lighter local/storage scenarios.
  - Containerized deployment on AWS (ECS) using AWS CDK (infrastructure-as-code).
- Search:
  - Elasticsearch (powerful but costly to operate long‑term).
  - PageFind: client‑side, very fast static site search; it has a Python API to build indexes programmatically (useful for archiving dynamic content).
- Static site approaches:
  - Django Bakery: bakes a Django site to static HTML when the views are built on its buildable class‑based views (works well if chosen early).
  - Frozen Flask: the equivalent for Flask apps.
  - Astro, Hugo: static site generators used for public sites (Astro is helpful for mixing components).
- Browser Python / WebAssembly:
  - Pyodide / PyScript and PGLite (Postgres in the browser): a proof of concept (Django WebAssembly) can run the Django admin and database in the browser via a service worker, enabling fully client‑side archives with an interactive admin.
- Other tools:
  - Librosa (audio/music processing).
  - Dev/workflow: Docker, Codespaces, GitHub (including archival concepts like GitHub’s backup initiative).
  - AI / assistants: Copilot, Claude, ChatGPT for code and complex data processing tasks.
  - uv (Python package and project manager): suggested for running tools without a pre‑installed Python; PyPI distribution for desktop CLI apps.
  - Observability / error tracking: Sentry (sponsor mention).
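One way the uv suggestion above can look in practice is a single-file script carrying inline metadata (PEP 723), which `uv run` reads to pick an interpreter and install any listed dependencies. The sketch below is illustrative only: the file name `wordfreq.py`, the sample text, and the `top_words` helper are assumptions, and the dependency list is left empty on purpose.

```python
# /// script
# requires-python = ">=3.10"
# dependencies = []
# ///
# Inline script metadata (PEP 723). With uv installed, `uv run wordfreq.py`
# would resolve an interpreter and any listed dependencies before running.
# The file name and the sample text are hypothetical.
from collections import Counter

SAMPLE = "The well ran dry, and the river ran low."


def top_words(text: str, limit: int = 3) -> list[tuple[str, int]]:
    """Return the most frequent lowercase words, ignoring basic punctuation."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return Counter(w for w in words if w).most_common(limit)


for word, count in top_words(SAMPLE):
    print(f"{count:3d}  {word}")
```

Third-party packages would go in the `dependencies` list, so a collaborator needs only uv and the script itself.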
Archival and “end of grant” strategies
Problems:
- Grants end; persistent hosting (containers, Elasticsearch, RDS) becomes expensive or unsupported by faculty.
- Compliance and institutional rules can force choices (containers on ECS simplify compliance compared with managing a single VM).
Practical solutions discussed:
- Static export (bake site into HTML/CSS/JS):
  - Tradeoffs: no dynamic writes, no Elasticsearch vector search, but much cheaper and easy to host (S3, GitHub Pages).
  - Use Django Bakery or comparable tools where feasible; design for baking from project start.
- Client‑side search with PageFind:
  - Replaces Elasticsearch for many discovery scenarios (keyword + facets). PageFind loads compact index fragments on demand and offers a Python API for indexing.
- Browser-hosted full functionality:
  - Pyodide/PyScript + PGLite allow running Python + DB in the browser (Django WebAssembly proof of concept), so a project could remain interactive without server hosting.
- Containers / images:
  - Provide Docker images or GitHub repo (Codespace) that can boot the site locally as a fallback.
- Open source / public archive:
  - Publish code and static exports to GitHub; can be rescued, forked, or hosted later.
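The static export idea above can be sketched without any framework: read every record from the database once and write one HTML file per record plus an index page. This is a simplified stand-in for what tools such as Django Bakery automate, not their API. The `story` table, its columns, and the `render` and `bake` helpers are hypothetical, and slugs are assumed to be safe to use as directory names.

```python
import html
import sqlite3
import tempfile
from pathlib import Path


def render(title: str, body_html: str) -> str:
    """Wrap already-escaped body HTML in a minimal page shell."""
    return (
        "<!doctype html><html><head><meta charset='utf-8'>"
        f"<title>{html.escape(title)}</title></head>"
        f"<body><h1>{html.escape(title)}</h1>{body_html}</body></html>"
    )


def bake(conn: sqlite3.Connection, out_dir: Path) -> list[Path]:
    """Write one static page per row, plus an index page linking to all of them."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written, links = [], []
    rows = conn.execute("SELECT slug, title, body FROM story ORDER BY slug")
    for slug, title, body in rows:
        page = out_dir / slug / "index.html"
        page.parent.mkdir(parents=True, exist_ok=True)
        page.write_text(render(title, f"<p>{html.escape(body)}</p>"), encoding="utf-8")
        links.append(f'<li><a href="{slug}/">{html.escape(title)}</a></li>')
        written.append(page)
    index = out_dir / "index.html"
    index.write_text(render("Archive", "<ul>" + "".join(links) + "</ul>"), encoding="utf-8")
    return [index, *written]


# Hypothetical schema and rows standing in for a real project database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE story (slug TEXT PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO story VALUES (?, ?, ?)",
    [("well", "The Well", "Water & memory."), ("river", "The River", "A <short> note.")],
)
out = Path(tempfile.mkdtemp()) / "site"
pages = bake(conn, out)
print([str(p.relative_to(out)) for p in pages])
```

The resulting directory can be copied as-is to any static host, such as the S3 or GitHub Pages options mentioned above.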
PageFind: why it matters
- Fast, client‑side search designed for static sites; breaks index into small chunks to only fetch relevant fragments.
- Supports faceting/filtering (important for research discovery).
- Has a Python API (can be indexed from a database dump) and integrates into static site build pipelines.
- Good fit to convert dynamic search backed by Elasticsearch into a static archive with functional search UX.
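The chunking idea above can be illustrated with a toy index: terms are split across small files, and a query reads only the files that could contain its terms. This is a conceptual sketch only; it does not reproduce PageFind's actual index format, ranking, or API. The documents, the one-file-per-leading-letter scheme, and both helper functions are assumptions made for illustration.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path

# Hypothetical documents keyed by URL.
DOCS = {
    "/pigments/azurite/": "azurite blue copper mineral pigment",
    "/pigments/vermilion/": "vermilion red mercury pigment",
    "/minerals/malachite/": "malachite green copper mineral",
}


def build_shards(docs: dict[str, str], out_dir: Path) -> None:
    """Write one small JSON file per leading letter, mapping term -> list of URLs."""
    shards = defaultdict(lambda: defaultdict(list))
    for url, text in docs.items():
        for term in sorted(set(text.lower().split())):
            shards[term[0]][term].append(url)
    out_dir.mkdir(parents=True, exist_ok=True)
    for letter, postings in shards.items():
        (out_dir / f"{letter}.json").write_text(json.dumps(postings), encoding="utf-8")


def search(query: str, index_dir: Path) -> tuple[list[str], int]:
    """Return URLs containing every query term, and the number of shard files read."""
    cache = {}
    result = None
    for term in query.lower().split():
        letter = term[0]
        if letter not in cache:  # read each shard at most once per query
            path = index_dir / f"{letter}.json"
            cache[letter] = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
        urls = set(cache[letter].get(term, []))
        result = urls if result is None else result & urls
    return sorted(result or []), len(cache)


index_dir = Path(tempfile.mkdtemp()) / "index"
build_shards(DOCS, index_dir)
urls, shards_read = search("copper mineral", index_dir)
total = len(list(index_dir.glob("*.json")))
print(urls, f"read {shards_read} of {total} shard files")
```

In a browser the same principle means a visitor downloads only the few fragments a query needs, not the whole index.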
WebAssembly / Pyodide possibilities
- Running Python with SQLite or PGLite in the browser makes it possible to:
  - Keep full interactivity (simple CRUD and admin) without servers.
  - Ship an archive that’s both static and still dynamic client‑side.
- Caveats:
  - Conversion/migration work (Postgres → SQLite/PGLite).
  - Long‑term risk if the WASM standard or browser support changes, though the approach remains promising.
  - Larger initial JS/WASM payloads, though often still preferable to ongoing hosting costs.
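Part of the Postgres to SQLite conversion work noted above is translating column types, since SQLite has a much smaller set of storage types. The sketch below shows one deliberately simplified mapping. The type table, the `sqlite_ddl` helper, and the `pigment` table are assumptions for illustration; a real migration would also need to handle constraints, indexes, sequences, and the data itself.

```python
import sqlite3

# Simplified, assumed mapping from Postgres column types to SQLite storage types.
PG_TO_SQLITE = {
    "integer": "INTEGER",
    "bigint": "INTEGER",
    "boolean": "INTEGER",       # stored as 0 or 1
    "text": "TEXT",
    "varchar": "TEXT",
    "uuid": "TEXT",
    "jsonb": "TEXT",            # JSON serialized as text
    "timestamptz": "TEXT",      # ISO 8601 strings
    "numeric": "NUMERIC",
    "double precision": "REAL",
    "bytea": "BLOB",
}


def sqlite_ddl(table: str, columns: dict[str, str]) -> str:
    """Build a SQLite CREATE TABLE statement from Postgres-style column types."""
    parts = []
    for name, pg_type in columns.items():
        base = pg_type.lower().split("(")[0].strip()  # drop lengths such as varchar(200)
        if base not in PG_TO_SQLITE:
            raise ValueError(f"no mapping for Postgres type {pg_type!r}")
        parts.append(f'"{name}" {PG_TO_SQLITE[base]}')
    return f'CREATE TABLE "{table}" ({", ".join(parts)})'


# Hypothetical table definition and row.
ddl = sqlite_ddl(
    "pigment",
    {"id": "bigint", "name": "varchar(200)", "verified": "boolean", "analysis": "jsonb"},
)
conn = sqlite3.connect(":memory:")
conn.execute(ddl)
conn.execute(
    "INSERT INTO pigment VALUES (?, ?, ?, ?)",
    (1, "azurite", int(True), '{"method": "XRF"}'),
)
print(ddl)
print(conn.execute("SELECT name, verified FROM pigment").fetchall())
```

Raising on unmapped types, instead of guessing, keeps surprises visible during the archival build.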
AI: impact and caveats
- AI tools (ChatGPT, Claude, Copilot) speed many tasks (data extraction, complex math, code scaffolding).
- Short‑term productivity gains for engineers. Potential downside: researchers may lean on AI so heavily that they never learn the technical skills that lead to careers like David’s.
- David’s practical use: combining domain knowledge with AI outputs (e.g., using Claude to help with math in Librosa workflows).
- Recommendation: teach researchers how to use AI tools responsibly (Copilot, dedicated tools) instead of copy‑pasting output blindly.
Practical recommendations / action items (for teams building DH projects)
- Design for archiving from project start if grant funding is time‑limited.
- Prefer architecture and frameworks that can be exported (e.g., class‑based views compatible with Django Bakery).
- Prioritize a great discovery UX: faceted filters + fast search (PageFind is a cost‑effective static alternative to Elasticsearch).
- Containerize builds and keep reproducible infra-as-code (CDK, Docker images), and maintain an archival build procedure (static export + index generation).
- Consider browser Python / PGLite when you need persistent interactivity without server hosting.
- Open source and publish artifacts (code, static export) when permitted; it increases longevity and discoverability.
- Teach faculty basic, safe AI usage and tooling (Copilot vs copy/paste) so they can use automation effectively and still learn from it.
- Balance compliance and cost — containers (ECS) often simplify compliance even if they cost more than a single VM.
Notable quotes / soundbites
- “What happens when the grant ends but the website can't? His answer? Static sites, client‑side search, and sneaky Python.”
- “Present your research first. Hook people with what’s possible — then tell them about the tool.”
- “Programming is a superpower, not a replacement for your job.”
- On AI: a candid note — if AI coding had existed when David started, he might not have learned the technical skills that led him to his current role.
Resources & tools mentioned (quick list)
- Frameworks & infra: Django, Django REST Framework, GraphQL, Postgres, Docker, AWS (ECS), AWS CDK, Codespaces
- Static/site: Django Bakery, Frozen Flask, Astro, Hugo
- Search & indexing: Elasticsearch, PageFind (Python API)
- Browser Python / WASM: Pyodide, PyScript, PGLite, Django WebAssembly (POC)
- Libraries: Librosa (audio)
- AI / assistants: Copilot, Claude, ChatGPT
- Other: GitHub (archival/backups), Sentry (error monitoring sponsor), CommandBook (host’s app)
Final thoughts
- Digital humanities projects show how modest technical choices unlock access to historical and cultural data at scale. The biggest near‑term practical win for sustainability is designing for archiving and using client‑side search (PageFind) to replace expensive backend search. Emerging WebAssembly options make compelling, interactive static archives possible — worth experimenting with for projects with limited long‑term funding.
