Rollout runtime, placement, and recovery
Use these exercises after the handbook. Each lab makes its state, capacity assumptions, and recovery boundaries explicit rather than hiding them behind a framework.
Rollout Capacity and Version Lag
Compare the rate at which actors produce trajectories with the rate at which the learner consumes them, then decide which trajectory versions remain acceptable after a policy update.
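- Change the actor count and generation interval until the backlog growth rate (production minus consumption) becomes negative, meaning the queue drains instead of growing.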
- Explain why version lag is a correctness and measurement concern.
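The capacity arithmetic can be sketched as follows. This is a minimal model that assumes every actor emits a fixed number of trajectories per generation round and the learner consumes a fixed batch per step. `RolloutCapacity` and `queue_version_lag` are hypothetical names for this exercise, not a framework API. The lag helper also assumes a FIFO queue and one policy publish per learner step.

```python
from dataclasses import dataclass


@dataclass
class RolloutCapacity:
    # Hypothetical knobs for the exercise; not a framework API.
    num_actors: int
    trajectories_per_round: int  # trajectories one actor emits per generation round
    round_interval_s: float      # seconds between an actor's generation rounds
    learner_batch: int           # trajectories consumed per learner step
    learner_step_s: float        # wall-clock seconds per learner step

    def production_rate(self) -> float:
        """Trajectories per second produced by the whole actor pool."""
        return self.num_actors * self.trajectories_per_round / self.round_interval_s

    def consumption_rate(self) -> float:
        """Trajectories per second consumed by the learner."""
        return self.learner_batch / self.learner_step_s

    def backlog_growth(self) -> float:
        """Net trajectories per second added to the queue; negative means it drains."""
        return self.production_rate() - self.consumption_rate()


def queue_version_lag(backlog: int, learner_batch: int) -> int:
    """Policy versions published while a new trajectory waits behind `backlog` items.

    Assumes a FIFO queue and one weight publish per learner step. Lag accrued
    during generation itself is not included.
    """
    return backlog // learner_batch


# Sweep the actor count downward and watch the sign of the backlog growth.
for actors in (32, 24, 16, 8):
    cfg = RolloutCapacity(actors, 4, 2.0, 32, 1.0)
    print(actors, cfg.backlog_growth())
```

In this sweep the growth turns from positive to negative between 16 and 8 actors. While it stays positive, the queue keeps growing, and `queue_version_lag` shows how that standing backlog turns directly into stale policy versions.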
Actor / Learner Placement
For a fixed accelerator budget, compare a colocated design, where actors and the learner share the same accelerators, with a disaggregated actor pool. Account for KV-cache memory on the serving side, learner state memory, weight-publish frequency, and utilization metrics.
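The memory side of this comparison can be sketched as a rough per-device estimate. It rests on several stated assumptions: bf16 serving weights and KV cache, 16 bytes per parameter of learner state for mixed-precision Adam, each group evenly sharded with no offloading, and activations ignored. `ModelShape`, `compare`, and the other helpers are hypothetical. All results are per-device GiB.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class ModelShape:
    # Hypothetical model description; substitute the real config for the lab.
    num_params: int
    num_layers: int
    num_kv_heads: int
    head_dim: int


def kv_bytes_per_token(m: ModelShape, bytes_per_elem: int = 2) -> int:
    # One key and one value vector per layer and KV head for each cached token.
    return 2 * m.num_layers * m.num_kv_heads * m.head_dim * bytes_per_elem


def serving_bytes(m: ModelShape, cached_tokens: int, bytes_per_elem: int = 2) -> int:
    # bf16 inference weights plus the KV cache for all in-flight sequences.
    return m.num_params * bytes_per_elem + cached_tokens * kv_bytes_per_token(m, bytes_per_elem)


def learner_bytes(m: ModelShape, bytes_per_param: int = 16) -> int:
    # Assumed mixed-precision Adam: bf16 weights (2) + bf16 grads (2)
    # + fp32 master weights (4) + two fp32 moment buffers (8) = 16 bytes/param.
    return m.num_params * bytes_per_param


def per_device_gib(total_bytes: int, devices: int) -> float:
    # Assumes perfectly even sharding; ignores activations and fragmentation.
    return total_bytes / devices / GIB


def compare(m: ModelShape, cached_tokens: int, total_devices: int, actor_devices: int) -> dict:
    # Colocated: every device holds a learner shard and a serving shard at once
    # (no offloading between generation and training phases).
    colocated = per_device_gib(learner_bytes(m) + serving_bytes(m, cached_tokens), total_devices)
    # Disaggregated: the same budget split into a learner group and an actor pool.
    learner = per_device_gib(learner_bytes(m), total_devices - actor_devices)
    actor = per_device_gib(serving_bytes(m, cached_tokens), actor_devices)
    return {"colocated": colocated, "disagg_learner": learner, "disagg_actor": actor}
```

If the colocated figure exceeds device memory, the design needs offloading or time-slicing between generation and training. In a disaggregated pool built from several replicated serving copies, each replica holds its own weights, so the real actor figure will be higher than this even-sharding estimate. Publish frequency and utilization still need their own accounting.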
Decision record:
resources -> roles -> persistent state -> transfer path
-> freshness bound -> utilization metric -> failure response
Weight Sync and Checkpoint Recovery
Given a learner at policy version 12 and actors still generating rollouts with policy versions 10-12, define what a distributed checkpoint must persist and which in-flight rollouts may be consumed after a restart.
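The restart rule can be made concrete with a sketch like the one below. It assumes the checkpoint stores, alongside the model and optimizer shards, the policy version, the optimizer step, the freshness bound, and the IDs of rollouts already trained on. `Rollout`, `CheckpointMeta`, and `triage_after_restart` are hypothetical names, not part of any PyTorch API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rollout:
    rollout_id: str
    policy_version: int  # version of the weights that generated this rollout


@dataclass(frozen=True)
class CheckpointMeta:
    # Hypothetical metadata persisted next to model and optimizer shards.
    policy_version: int      # version the restored learner resumes from
    optimizer_step: int
    max_lag: int             # freshness bound from the decision record
    consumed_ids: frozenset  # rollouts already used in a gradient step


def triage_after_restart(in_flight, meta):
    """Split surviving rollouts into (consumable, dropped) after a restore."""
    keep, drop = [], []
    for r in in_flight:
        lag = meta.policy_version - r.policy_version
        if r.rollout_id in meta.consumed_ids:
            drop.append(r)  # already trained on: consuming it again double-counts data
        elif lag < 0:
            drop.append(r)  # produced by weights newer than the checkpoint, which were lost
        elif lag > meta.max_lag:
            drop.append(r)  # too stale under the freshness bound
        else:
            keep.append(r)
    return keep, drop
```

Suppose the checkpoint is at version 12 and `max_lag=1`. Version-11 and version-12 rollouts then remain consumable and version-10 rollouts are dropped. If the restore instead falls back to an older checkpoint, any rollout tagged with a newer version takes the "newer than checkpoint" branch, because the weights that produced it no longer exist.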
Validate the checkpoint design against PyTorch Distributed Checkpoint, which supports parallel save/load and load-time resharding.
