#547: Parallel Python at Anyscale with Ray

by Michael Kennedy

59 min, May 6, 2026

Overview of Talk Python to Me Episode 547: Parallel Python at Anyscale with Ray

This episode explores Ray, an open-source Python framework for distributed execution and AI workloads, with guests Edward Oakes and Richard Liaw, two of the founding engineers behind Ray and Anyscale. The conversation traces Ray’s origins at UC Berkeley’s RISE Lab, explains how it grew from reinforcement learning research tooling into a broader distributed compute platform, and shows why it has regained importance in the era of LLM post-training, multimodal pipelines, and large-scale AI orchestration.

Ray’s Origin Story

Berkeley roots and the lab ecosystem

  • Ray came out of UC Berkeley’s systems and ML research environment, specifically the RISE Lab under Ion Stoica.
  • The lab was part of a lineage of research groups that also produced Spark.
  • The lab’s interdisciplinary structure brought together:
    • distributed systems researchers
    • machine learning / reinforcement learning students
    • security-focused researchers
  • This cross-pollination helped produce practical infrastructure driven by real research needs.

Built to solve a real research bottleneck

  • Ray began because students working on reinforcement learning were trying to use Spark, but Spark was not a good fit for the dynamic, iterative, actor-based nature of RL workloads.
  • Rather than forcing a tool to fit, they built a new one that matched the problem.

Why Ray Matters Now

Reinforcement learning fell out of favor, then returned

  • Ray’s early success was tied to RLlib, Ray’s reinforcement learning library.
  • RL research lost momentum for a while, so Ray’s RL-centered identity became less visible.
  • With ChatGPT and modern LLM post-training, reinforcement learning returned in a big way:
    • pretraining builds the base model
    • post-training, including RLHF (reinforcement learning from human feedback), refines it for useful interaction
  • Ray became relevant again because this post-training stage fits its orchestration strengths.

LLMs and Ray

  • The guests noted that OpenAI used Ray for GPT-3 training orchestration.
  • Ray is now used across the lifecycle of modern AI systems:
    • training
    • fine-tuning
    • RL post-training
    • serving
    • data preprocessing
    • agent orchestration

What Ray Actually Is

Core idea

  • Ray is best described as a distributed execution engine for AI and Python workloads.
  • It lets developers write code in a familiar Python style while Ray handles:
    • task scheduling
    • data movement
    • process orchestration
    • cluster execution
    • failure handling

Two layers of value

  • Ray Core: the low-level distributed runtime with tasks, actors, and execution primitives.
  • Ray libraries: higher-level tools built on top of Ray, including:
    • Ray Data
    • Ray Train
    • Ray Tune
    • Ray Serve
    • RLlib

How Ray Fits Into the Parallel Computing Landscape

A useful way to think about parallelism

The discussion framed compute tools along two axes:

Specific vs. general

  • SQL databases are highly specific.
  • Spark is specialized for big-data style workloads.
  • Dask and Ray are more general-purpose.

Scale-up vs. scale-out

  • asyncio: concurrency within a single thread, mostly useful for I/O-bound work
  • threads: historically limited for CPU-bound work by Python’s global interpreter lock (GIL), though the free-threaded builds available since Python 3.13 relax this somewhat
  • multiprocessing: scale within a single machine
  • Ray / Dask: scale beyond one machine to a cluster
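
The lower rungs of this ladder can be tried with the standard library alone. The sketch below is an illustration added here, not something shown in the episode: it runs three simulated I/O calls concurrently, first with asyncio on a single thread and then with a thread pool. The function names and the 0.2 second delays are assumptions made for the example.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


async def fake_request(delay: float) -> float:
    """Stand-in for an I/O call; awaiting the sleep yields to the event loop."""
    await asyncio.sleep(delay)
    return delay


async def fetch_all(delays: list[float]) -> list[float]:
    # All coroutines wait concurrently on one thread.
    return await asyncio.gather(*(fake_request(d) for d in delays))


def blocking_request(delay: float) -> float:
    """Blocking variant, as a synchronous client library would behave."""
    time.sleep(delay)
    return delay


delays = [0.2, 0.2, 0.2]

start = time.perf_counter()
async_results = asyncio.run(fetch_all(delays))
async_elapsed = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=3) as pool:
    # Each blocking call gets its own thread, so the waits overlap.
    thread_results = list(pool.map(blocking_request, delays))
thread_elapsed = time.perf_counter() - start

print(f"asyncio: {async_elapsed:.2f}s, threads: {thread_elapsed:.2f}s")
```

Both variants should finish in roughly the time of one call rather than three, because the work is waiting rather than computing. CPU-bound work is where processes, and beyond one machine Ray or Dask, come in.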

Where Ray stands out

  • Ray is especially strong when workloads combine:
    • I/O
    • CPU processing
    • GPU inference/training
    • distributed coordination
  • It is designed for heterogeneous compute pipelines, not just tabular data.

Ray in Practice

The programming model

  • Ray tries to make distributed code feel like regular Python.
  • You can:
    • define functions
    • pass them to Ray
    • let Ray distribute execution across machines
  • The user writes Python; Ray handles orchestration.

Ray Data example

  • ray.data.read_parquet(...) builds a lazy, distributed dataset: files are not actually read until the pipeline is executed.
  • Data can remain partitioned across storage rather than being centralized on one machine.
  • A pipeline can include:
    • reading data from S3
    • CPU-based preprocessing
    • GPU-based model inference
    • distributed writes back to storage

Heterogeneous pipeline example

The example discussed a multimodal audio pipeline:

  • read parquet-based audio data
  • transform raw bytes into usable tensors / arrays
  • resample audio
  • run Whisper-style transcription
  • apply an LLM/VLM-based quality filter
  • persist a curated subset

This demonstrates Ray’s ability to coordinate different kinds of compute in one pipeline.

Ray’s Strengths

Orchestration across many resource types

  • Ray can schedule work across:
    • CPUs
    • GPUs
    • multiple nodes
    • different task types
  • It can also make resource-aware decisions, like:
    • allocating enough CPUs to keep GPUs busy
    • balancing I/O, preprocessing, and model execution

Local development mirrors cluster execution

  • A big benefit is that the same code can run:
    • on a laptop
    • on a single machine
    • on a large cluster
  • This makes development and debugging much less painful than systems where local and production execution differ dramatically.

Good observability and debugging

  • The Ray dashboard shows:
    • node-level resource usage
    • tasks and actors
    • failures and stack traces
    • higher-level training/serving views
  • There is also a remote debugger integration with VS Code, allowing you to inspect remote processes much like a local debugger.

Cluster Management and Deployment

Ways to run Ray

Ray can be deployed in several modes:

  • Ray cluster launcher: quick setup on AWS, GCP, Azure, or on your own hardware
  • KubeRay: Kubernetes operator for running Ray clusters on K8s
  • Anyscale: managed Ray infrastructure and platform
  • Other partners/providers also support Ray deployments

KubeRay

  • KubeRay installs a controller/operator in Kubernetes.
  • You then create Ray clusters/jobs as custom resources.
  • Kubernetes handles the pod lifecycle while Ray manages distributed execution.
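
To make the custom-resource idea concrete, the sketch below builds a minimal RayCluster manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The field layout follows the KubeRay sample manifests as best recalled here, and the cluster name, group name, worker count, and image tag are illustrative assumptions; check the KubeRay documentation for the schema matching your operator version.

```python
import json


def ray_cluster_manifest(name: str, workers: int, image: str) -> dict:
    """Build a minimal RayCluster custom resource (illustrative values only)."""

    def pod_template(container_name: str) -> dict:
        # A bare pod template; real deployments would add resource requests.
        return {"spec": {"containers": [{"name": container_name, "image": image}]}}

    return {
        "apiVersion": "ray.io/v1",
        "kind": "RayCluster",
        "metadata": {"name": name},
        "spec": {
            "headGroupSpec": {
                "rayStartParams": {},
                "template": pod_template("ray-head"),
            },
            "workerGroupSpecs": [
                {
                    "groupName": "workers",
                    "replicas": workers,
                    "rayStartParams": {},
                    "template": pod_template("ray-worker"),
                }
            ],
        },
    }


manifest = ray_cluster_manifest("demo-cluster", workers=2, image="rayproject/ray:2.9.0")
print(json.dumps(manifest, indent=2))
```

Once a resource like this is applied, the operator is responsible for creating the head and worker pods, which matches the division of labor described above.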

Fast iteration

  • Ray’s runtime environment can package local code and ship it to the cluster.
  • If you change a driver script, you can often rerun with minimal delay instead of redeploying the whole cluster.
  • This is especially valuable when iterating on AI pipelines where quick feedback matters.

Versioning and reproducibility

  • Versioning of running workflows is mostly handled by the layer above Ray:
    • Airflow
    • Kubernetes manifests
    • Anyscale job definitions
  • Ray itself focuses on execution; surrounding systems handle release/version semantics.

Ecosystem and Positioning

Ray sits in the “narrow waist”

  • The guests described Ray as a kind of narrow waist for the AI/distributed compute ecosystem:
    • higher-level libraries build on it
    • infrastructure platforms integrate underneath it
  • It aims to be the common execution layer that many AI workloads can share.

Integrations and ecosystem

  • Ray works alongside tools such as:
    • Airflow
    • Dask
    • Kubernetes
    • other workflow and automation systems
  • Some projects are adjacent to Ray, while others are built on top of Ray.
  • The ecosystem is especially active in:
    • reinforcement learning
    • multimodal data processing
    • AI pipelines

Anyscale’s Role and Business Model

Why a company matters

  • The guests emphasized that a company backing Ray is important for:
    • maintaining the core runtime
    • funding ecosystem integrations
    • supporting users at scale
    • keeping the project healthy long term

What Anyscale provides

  • Managed Ray infrastructure
  • Better interactive development
  • Faster startup and deployment
  • Shared resources across teams
  • Observability and debugging tooling
  • Enterprise support and upstream contributions

Open source monetization lesson

  • The conversation noted that open source projects often succeed commercially through:
    • managed infrastructure
    • support
    • operational tooling
    • expertise
  • For some projects, consulting/support can be a viable path; for Ray, the managed platform model fits the scale of the system.

Key Takeaways

Big ideas from the episode

  • Ray was born from real research pain, not abstraction for its own sake.
  • It became especially relevant because modern AI workloads need:
    • distributed execution
    • flexible orchestration
    • GPU-aware scheduling
    • easy debugging
  • Ray is broader than RL, but RL and post-training remain central to its identity.
  • The strongest selling point is that you can write ordinary Python and scale it out dramatically.

Practical use cases

  • Reinforcement learning
  • LLM post-training
  • Model serving
  • Multimodal preprocessing
  • Time series and finance workloads
  • Parallel backtesting
  • General-purpose distributed Python execution

Recommended Next Steps

If you want to try Ray

  • Start with the Ray documentation
  • Browse the examples gallery
  • Try a simple parallel Python workload
  • Experiment with:
    • Ray Core
    • Ray Data
    • Ray Train or Serve
    • KubeRay if you use Kubernetes

If you’re evaluating it for a team

  • Look at whether your workload has:
    • CPU + GPU stages
    • large data movement
    • distributed orchestration complexity
    • local-dev vs cluster parity pain
  • If several of these apply, Ray is likely a strong fit.

Final Impression

Ray is presented as more than a library: it’s a distributed execution platform for modern AI that makes parallel Python feel approachable. The episode highlights how Ray’s research origins, practical ergonomics, and ecosystem depth have positioned it well for today’s AI-heavy workloads.