RL Infrastructure Labs

Rollout runtime, placement, and recovery

Work through these exercises after reading the handbook. Each lab makes its state, capacity assumptions, and recovery boundaries explicit rather than hiding them behind a framework.

Lab 01 · Runnable

Rollout Capacity and Version Lag

Calculate actor production versus learner consumption, then decide which trajectory versions remain acceptable after an update.

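Change the actor count and the interval until the backlog becomes negative, meaning the learner consumes trajectories faster than the actors produce them.
Explain why version lag is both a correctness concern and a measurement concern.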
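
The backlog and version-lag questions above can be sketched in a few lines of Python. This is a minimal illustration, not the lab's starter code: the class name, field names, and the example rates are all hypothetical, and it assumes steady production and consumption rates with "interval" read as the time between learner updates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RolloutBudget:
    """Steady-state capacity model. All fields are illustrative assumptions."""
    num_actors: int                    # actor processes generating trajectories
    trajectories_per_actor_s: float    # production rate of a single actor
    batch_size: int                    # trajectories consumed per learner update
    update_interval_s: float           # seconds between learner updates

    def production_rate(self) -> float:
        return self.num_actors * self.trajectories_per_actor_s

    def consumption_rate(self) -> float:
        return self.batch_size / self.update_interval_s

    def backlog_after(self, seconds: float) -> float:
        # Positive: trajectories pile up and grow stale.
        # Negative: the learner would starve waiting for data.
        return (self.production_rate() - self.consumption_rate()) * seconds


def is_acceptable(trajectory_version: int, learner_version: int, max_lag: int) -> bool:
    """Accept a trajectory only if its policy is at most max_lag versions old."""
    lag = learner_version - trajectory_version
    return 0 <= lag <= max_lag


starved = RolloutBudget(num_actors=8, trajectories_per_actor_s=2.0,
                        batch_size=256, update_interval_s=8.0)
surplus = RolloutBudget(num_actors=20, trajectories_per_actor_s=2.0,
                        batch_size=256, update_interval_s=8.0)

print("starved backlog after 60 s:", starved.backlog_after(60.0))
print("surplus backlog after 60 s:", surplus.backlog_after(60.0))
print("version 10 usable at learner version 12, max_lag=1:",
      is_acceptable(10, 12, max_lag=1))
```

A positive backlog is not free: queued trajectories age while the learner keeps publishing new versions, so the capacity question and the version-lag question have to be answered together.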

Lab 02 · Design worksheet

Actor / Learner Placement

For a fixed accelerator budget, compare a colocated design with a disaggregated actor pool. Account for serving KV memory, learner state memory, publish frequency, and the metrics used to judge utilization.

Decision record:
resources -> roles -> persistent state -> transfer path
          -> freshness bound -> utilization metric -> failure response
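
The memory side of the worksheet can be roughed out as follows. This is a sketch under stated assumptions, not a sizing tool: the function names are hypothetical, the KV estimate assumes one key and one value tensor per layer at 2 bytes per element, and the 16 bytes per parameter figure is a common mixed-precision Adam estimate (2 for weights, 2 for gradients, 4 for an fp32 master copy, 4 + 4 for the two Adam moments) that excludes activations. The headroom functions take an aggregate view and ignore replication and fragmentation.

```python
GIB = 1024 ** 3


def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   tokens_in_flight: int, bytes_per_element: int = 2) -> int:
    # Leading 2 = one key tensor plus one value tensor per layer.
    return 2 * num_layers * num_kv_heads * head_dim * tokens_in_flight * bytes_per_element


def learner_state_bytes(num_params: int, bytes_per_param: int = 16) -> int:
    # Assumed estimate: weights, gradients, fp32 master copy, two Adam moments.
    return num_params * bytes_per_param


def colocated_headroom(devices: int, device_bytes: int, learner: int,
                       serving_weights: int, kv: int) -> int:
    """Aggregate bytes left when every device hosts both roles."""
    return devices * device_bytes - (learner + serving_weights + kv)


def disaggregated_headroom(learner_devices: int, actor_devices: int, device_bytes: int,
                           learner: int, serving_weights: int, kv: int) -> tuple[int, int]:
    """Bytes left in each pool; either value can go negative on its own."""
    learner_left = learner_devices * device_bytes - learner
    actor_left = actor_devices * device_bytes - (serving_weights + kv)
    return learner_left, actor_left


# Illustrative numbers only: a 7e9-parameter model on eight 80 GiB devices.
params = 7_000_000_000
learner = learner_state_bytes(params)
serving = params * 2                      # 2-byte serving weights (assumption)
kv = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128,
                    tokens_in_flight=1_000_000)

print("colocated headroom (GiB):",
      colocated_headroom(8, 80 * GIB, learner, serving, kv) / GIB)
for split in [(1, 7), (2, 6), (4, 4)]:
    left = disaggregated_headroom(*split, 80 * GIB, learner, serving, kv)
    print(f"learner/actor split {split}: headroom (GiB) =",
          tuple(round(x / GIB, 1) for x in left))
```

The aggregate headroom is identical for every split, so the interesting result is per pool: a split can fail because one role runs out of memory while the other sits idle, which is the utilization trade-off the decision record asks you to name.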

Lab 03 · Failure trace

Weight Sync and Checkpoint Recovery

Given a learner at policy version 12 and actors generating rollouts from policy versions 10-12, define what persists in a distributed checkpoint and which in-flight rollouts may be consumed after restart.

Validate the checkpoint design against PyTorch Distributed Checkpoint, which supports parallel save/load and load-time resharding.
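
One possible answer to the restart question can be written down as a small filter. This is a sketch of a single policy choice, not a prescribed design: the manifest fields, the `committed` flag, and the lag bound of 1 are hypothetical. It models only the metadata decision and does not call PyTorch Distributed Checkpoint, which would hold the actual model and optimizer tensors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rollout:
    rollout_id: str
    policy_version: int   # version of the policy that generated it
    committed: bool       # hypothetical: True once durably written in full


@dataclass(frozen=True)
class CheckpointManifest:
    """Metadata assumed to be saved alongside model and optimizer state."""
    policy_version: int
    optimizer_step: int
    max_version_lag: int
    consumed_rollout_ids: frozenset


def consumable_after_restart(manifest: CheckpointManifest,
                             rollouts: list[Rollout]) -> list[Rollout]:
    """Keep rollouts that are complete, unconsumed, and within the lag bound."""
    keep = []
    for r in rollouts:
        if not r.committed:
            continue  # partially written when the failure hit
        if r.rollout_id in manifest.consumed_rollout_ids:
            continue  # already trained on before the checkpoint
        lag = manifest.policy_version - r.policy_version
        if 0 <= lag <= manifest.max_version_lag:
            keep.append(r)
    return keep


manifest = CheckpointManifest(policy_version=12, optimizer_step=4800,
                              max_version_lag=1,
                              consumed_rollout_ids=frozenset({"r-11-a"}))
in_flight = [
    Rollout("r-10-a", 10, committed=True),   # too stale for max_version_lag=1
    Rollout("r-11-a", 11, committed=True),   # already consumed
    Rollout("r-11-b", 11, committed=True),
    Rollout("r-12-a", 12, committed=True),
    Rollout("r-12-b", 12, committed=False),  # incomplete at failure time
]

survivors = consumable_after_restart(manifest, in_flight)
print([r.rollout_id for r in survivors])
```

Persisting the consumed IDs and the lag bound with the checkpoint is what makes the filter reproducible: without them, a restart could either train on the same rollout twice or silently change the freshness rule.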