Trace stages, transfers, and stream pressure
These labs treat intermediate data ownership and bounded queues as part of the model-serving design.
Multistage Event Trace
Execute a bounded autoregressive -> diffusion -> VAE decode flow and inspect rejected work when admission exceeds stage capacity.
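As a rough illustration of what such a trace might record, the sketch below simulates three bounded stage queues in plain Python. It is not taken from any reference implementation: the function name run_trace, the event tuple layout, the one-item-per-tick service rate, and the policy (reject at admission, apply backpressure between stages) are all assumptions made for this example.

```python
from collections import deque

STAGE_NAMES = ("autoregressive", "diffusion", "vae_decode")  # assumed labels


def run_trace(arrivals_per_tick, capacities):
    """Simulate a bounded three-stage flow and record an event trace.

    arrivals_per_tick: number of new requests arriving at each tick.
    capacities: maximum queue length per stage (hypothetical values).
    Assumed policy: overflow at the first stage is rejected, while a full
    downstream stage makes the upstream item wait (backpressure).
    """
    if len(capacities) != len(STAGE_NAMES) or min(capacities) < 1:
        raise ValueError("need one capacity >= 1 per stage")

    queues = [deque() for _ in STAGE_NAMES]
    events, completed, rejected = [], [], []
    next_id = 0
    tick = 0
    last = len(queues) - 1

    while tick < len(arrivals_per_tick) or any(queues):
        # Advance the last stage first so a slot freed downstream can be
        # reused by the upstream stage within the same tick.
        for i in reversed(range(len(queues))):
            if not queues[i]:
                continue
            if i == last:
                req = queues[i].popleft()
                completed.append(req)
                events.append((tick, "complete", STAGE_NAMES[i], req))
            elif len(queues[i + 1]) < capacities[i + 1]:
                req = queues[i].popleft()
                queues[i + 1].append(req)
                events.append((tick, "transfer", STAGE_NAMES[i + 1], req))
            # Otherwise the downstream queue is full and the item waits.

        # Admit new arrivals, rejecting whatever exceeds first-stage capacity.
        arriving = arrivals_per_tick[tick] if tick < len(arrivals_per_tick) else 0
        for _ in range(arriving):
            req, next_id = next_id, next_id + 1
            if len(queues[0]) < capacities[0]:
                queues[0].append(req)
                events.append((tick, "admit", STAGE_NAMES[0], req))
            else:
                rejected.append(req)
                events.append((tick, "reject", STAGE_NAMES[0], req))
        tick += 1

    return events, completed, rejected


if __name__ == "__main__":
    trace, done, dropped = run_trace([5], (2, 1, 1))
    for event in trace:
        print(event)
    print("completed:", done, "rejected:", dropped)
```

With a burst of five requests and a first-stage capacity of two, the trace shows two admissions and three rejections at tick 0, followed by transfers as downstream slots free up. A real serving stack would likely attach a rejection reason and timestamps rather than tick counters.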
Video / Diffusion Parallel Plan
Start with the latent token count and the memory used by the denoising stage. Then decide whether stage placement, sequence-parallel execution, or bounded concurrency addresses the bottleneck you measured.
Use the Video Token Count Estimator before committing to a parallel layout.
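The arithmetic behind such an estimate can be sketched as follows. This is not the estimator tool itself: the function names, the default compression strides (4x temporal, 8x spatial), the patch size of 2, and the use of ceiling division are assumptions chosen for illustration, and real models differ (some causal video VAEs, for example, keep the first frame uncompressed).

```python
import math


def latent_token_count(frames, height, width,
                       temporal_stride=4, spatial_stride=8, patch_size=2):
    """Estimate the number of latent tokens the denoiser attends over.

    All stride and patch defaults are assumed values, not model constants.
    """
    latent_frames = math.ceil(frames / temporal_stride)
    latent_h = math.ceil(height / spatial_stride)
    latent_w = math.ceil(width / spatial_stride)
    tokens_per_frame = (math.ceil(latent_h / patch_size)
                        * math.ceil(latent_w / patch_size))
    return latent_frames * tokens_per_frame


def hidden_state_bytes(tokens, hidden_dim, bytes_per_element=2):
    """Size of one hidden-state tensor of shape (tokens, hidden_dim).

    bytes_per_element=2 assumes a 16-bit dtype such as bf16 or fp16.
    """
    return tokens * hidden_dim * bytes_per_element


if __name__ == "__main__":
    tokens = latent_token_count(frames=80, height=480, width=832)
    print("latent tokens:", tokens)
    print("one hidden-state tensor (MiB):",
          hidden_state_bytes(tokens, hidden_dim=3072) / 2**20)
```

The hidden-state figure covers a single activation tensor only. Weights, attention workspace, and any conditioning or cached tensors add to it, so treat the result as a lower bound when comparing parallel layouts.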
Chunk Scheduler and Cancellation
For an audio or video output stage, describe when the first chunk is emitted, how queue capacity is bounded, how cancellation propagates upstream, and how tensors are cleaned up. Then enumerate the per-stage metrics required to confirm the behavior.
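One way to reason about these four concerns together is a small asyncio sketch. Everything here is hypothetical: the names produce_chunks and stream, the metric keys, and the use of a bytearray as a stand-in for a decoded tensor are assumptions for illustration, not part of any referenced implementation.

```python
import asyncio
import time


async def produce_chunks(queue, num_chunks, live_buffers, metrics):
    """Decode chunks into a bounded queue; put() waits when it is full."""
    for i in range(num_chunks):
        live_buffers[i] = bytearray(1024)  # stand-in for a decoded tensor
        metrics["produced"] += 1
        await queue.put(i)  # backpressure: suspends while the queue is full
        metrics["max_queue_depth"] = max(metrics["max_queue_depth"],
                                         queue.qsize())
    await queue.put(None)  # sentinel: no more chunks


async def stream(num_chunks, queue_capacity, cancel_after=None):
    """Emit chunks to a client, optionally cancelling after N chunks."""
    queue = asyncio.Queue(maxsize=queue_capacity)
    live_buffers = {}
    metrics = {"first_chunk_s": None, "produced": 0, "emitted": 0,
               "released": 0, "max_queue_depth": 0, "cancelled": False}
    start = time.monotonic()
    producer = asyncio.create_task(
        produce_chunks(queue, num_chunks, live_buffers, metrics))
    try:
        while True:
            idx = await queue.get()
            if idx is None:
                break
            if metrics["first_chunk_s"] is None:
                metrics["first_chunk_s"] = time.monotonic() - start
            live_buffers.pop(idx)  # chunk handed off, buffer released
            metrics["emitted"] += 1
            metrics["released"] += 1
            if cancel_after is not None and metrics["emitted"] >= cancel_after:
                metrics["cancelled"] = True  # simulated client disconnect
                break
    finally:
        # Propagate cancellation upstream, then release whatever is still
        # held: chunks left in the queue and the one blocked on put().
        producer.cancel()
        await asyncio.gather(producer, return_exceptions=True)
        metrics["released"] += len(live_buffers)
        live_buffers.clear()
    return metrics


if __name__ == "__main__":
    print(asyncio.run(stream(num_chunks=10, queue_capacity=2, cancel_after=3)))
```

The metrics dictionary maps onto the checks the lab asks for: time to first chunk, peak queue depth against the configured capacity, and a produced count that should equal the released count after cancellation. A mismatch between the last two would indicate a leaked buffer.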
Reference implementation boundary: vLLM-Omni disaggregated inference docs.
