This starter is annotated reading material and the source of truth for the lab preview. Running anything is optional; the reading goal is to explain the mechanism without hiding behind a framework call.
Read each condition (text, image, or video) as its own encoder path; the paths meet inside the denoiser.
## Conditioning flow
```text
prompt -> text_encoder -> text_context
reference_image -> vision_or_vae_encoder -> image_context
control_signal -> control_encoder -> block_residuals
all contexts -> denoiser(latent, timestep, contexts)
```
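The flow above can be traced end to end with a small, framework-free sketch. Everything in it is an illustrative assumption rather than the lab's implementation: `toy_embed` is a hypothetical stand-in for learned encoders, the width `DIM = 4` is arbitrary, and the choices to concatenate text and image tokens into one attention context, to add control residuals per block, and to add the timestep as a cosine embedding are one plausible way the contexts could meet inside the denoiser.

```python
import math

DIM = 4  # toy feature width (assumption, chosen for readability)

def toy_embed(items, dim=DIM):
    """Hypothetical stand-in for a learned encoder: one fixed vector per item."""
    return [[math.sin((i + 1) * (j + 1) + len(str(item))) for j in range(dim)]
            for i, item in enumerate(items)]

def text_encoder(prompt):
    return toy_embed(prompt.split())   # text_context: one token per word

def image_encoder(patches):
    return toy_embed(patches)          # image_context: one token per patch

def control_encoder(control_signal, num_blocks):
    # block_residuals: one vector per denoiser block, scaled by the mean control value
    scale = sum(control_signal) / max(len(control_signal), 1)
    return [[scale * (b + 1) * 0.1] * DIM for b in range(num_blocks)]

def cross_attend(query, context):
    """Single-head attention of one query vector over the context tokens."""
    scores = [sum(q * k for q, k in zip(query, tok)) / math.sqrt(DIM)
              for tok in context]
    peak = max(scores)                                  # subtract max for stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [sum(w / total * tok[j] for w, tok in zip(weights, context))
            for j in range(DIM)]

def denoiser(latent, timestep, contexts):
    # The timestep enters as an additive embedding on the latent.
    hidden = [x + math.cos(timestep * (j + 1)) for j, x in enumerate(latent)]
    # Text and image paths meet here as a single attention context.
    tokens = contexts["text"] + contexts["image"]
    # Each block attends to the tokens, then adds its control residual.
    for residual in contexts["control"]:
        attended = cross_attend(hidden, tokens)
        hidden = [h + a + r for h, a, r in zip(hidden, attended, residual)]
    return hidden

contexts = {
    "text": text_encoder("a red kite"),
    "image": image_encoder(["patch0", "patch1"]),
    "control": control_encoder([0.0, 1.0, 1.0], num_blocks=2),
}
prediction = denoiser([0.0] * DIM, 0.5, contexts)
```

The point of the sketch is the split: text and image conditions reach the denoiser as tokens to attend over, while the control signal arrives as per-block residuals added to the hidden state. Swapping any encoder changes only its own entry in `contexts`.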