InfraLens

Conditioning Flow

This starter is annotated reading material and the source of truth for the lab preview. Running anything is optional; the reading goal is to explain the mechanism without hiding behind a framework call.

Reading focus

Read text/image/video conditions as separate encoder paths that meet inside the denoiser.

Annotated sketch

## Conditioning flow

```text
prompt -> text_encoder -> text_context
reference_image -> vision_or_vae_encoder -> image_context
control_signal -> control_encoder -> block_residuals
all contexts -> denoiser(latent, timestep, contexts)
```
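The diagram can be traced end to end with a toy sketch in plain Python. Everything here is an assumption made for illustration: the encoders are deterministic stand-ins rather than trained models, the `Contexts` container and `attend` helper are hypothetical names, attention is reduced to a single query vector, and the timestep is applied as a plain scale instead of a learned embedding. Only the names `text_encoder`, `vision_or_vae_encoder`, `control_encoder`, `text_context`, `image_context`, `block_residuals`, and `denoiser` come from the diagram.

```python
import math
from dataclasses import dataclass

Vector = list[float]


@dataclass
class Contexts:
    text_context: list[Vector]          # one vector per prompt token
    image_context: list[Vector]         # one vector per reference feature
    block_residuals: dict[int, Vector]  # block index -> residual added there


def text_encoder(prompt: str, dim: int = 4) -> list[Vector]:
    # Toy stand-in: one deterministic vector per whitespace-separated token.
    return [[(len(tok) + i) / 10.0 for i in range(dim)] for tok in prompt.split()]


def vision_or_vae_encoder(reference_image: list[list[float]], dim: int = 4) -> list[Vector]:
    # Toy stand-in: one feature vector per image row (the row mean, repeated).
    return [[sum(row) / len(row)] * dim for row in reference_image]


def control_encoder(control_signal: list[float], blocks: tuple[int, ...], dim: int = 4) -> dict[int, Vector]:
    # Toy stand-in: the same residual for each block the control targets.
    mean = sum(control_signal) / len(control_signal)
    return {block: [mean] * dim for block in blocks}


def attend(query: Vector, context: list[Vector]) -> Vector:
    # Single-query softmax attention over the context vectors.
    scores = [sum(q * c for q, c in zip(query, vec)) for vec in context]
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [sum(w * vec[i] for w, vec in zip(weights, context)) / total
            for i in range(len(query))]


def denoiser(latent: Vector, timestep: int, contexts: Contexts, num_blocks: int = 3) -> Vector:
    hidden = list(latent)
    tokens = contexts.text_context + contexts.image_context
    for block in range(num_blocks):
        # Text and image contexts are read at every block via attention.
        hidden = [h + a for h, a in zip(hidden, attend(hidden, tokens))]
        # Control residuals are added only at the blocks they target.
        residual = contexts.block_residuals.get(block)
        if residual is not None:
            hidden = [h + r for h, r in zip(hidden, residual)]
    # Toy timestep handling: a plain scale, not a learned embedding.
    return [h / (1.0 + timestep) for h in hidden]


contexts = Contexts(
    text_context=text_encoder("a red bicycle"),
    image_context=vision_or_vae_encoder([[0.0, 1.0], [0.5, 0.5]]),
    block_residuals=control_encoder([0.2, 0.4], blocks=(0, 2)),
)
prediction = denoiser(latent=[0.1, 0.2, 0.3, 0.4], timestep=10, contexts=contexts)
```

The point of the sketch is the routing, not the numbers: text and image arrive as sequences the denoiser attends over, while control arrives as residuals keyed by block index.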

What to explain

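Text becomes embeddings: the text encoder turns the prompt into a sequence of vectors (text_context).
An image or video reference may become a latent (VAE encoder path) or encoder features (vision encoder path).
Control is injected at defined blocks: the control encoder produces block_residuals that are added at specific denoiser blocks.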
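The two routes for an image or video reference can be contrasted by tracking shapes alone. This is a minimal sketch under stated assumptions: channel concatenation is taken as one possible way a reference latent enters the denoiser input, and appending tokens as one possible way encoder features join the context. The function names `condition_as_latent` and `condition_as_features` are hypothetical, and nested lists stand in for tensors.

```python
Channel = list[float]  # flattened spatial values of one latent channel
Token = list[float]    # one context vector


def condition_as_latent(noisy_latent: list[Channel], reference_latent: list[Channel]) -> list[Channel]:
    # Latent path (assumed here: channel concatenation). The reference
    # enters through the denoiser's input, aligned position by position.
    size = len(noisy_latent[0])
    if any(len(channel) != size for channel in noisy_latent + reference_latent):
        raise ValueError("all channels must share the same spatial size")
    return noisy_latent + reference_latent


def condition_as_features(text_context: list[Token], image_features: list[Token]) -> list[Token]:
    # Feature path (assumed here: extra context tokens). The reference
    # enters through the context sequence the denoiser attends over.
    width = len(text_context[0])
    if any(len(token) != width for token in text_context + image_features):
        raise ValueError("all tokens must share the same width")
    return text_context + image_features


noisy_latent = [[0.0] * 16 for _ in range(4)]      # 4 channels, 16 positions
reference_latent = [[1.0] * 16 for _ in range(4)]  # same layout as the latent
denoiser_input = condition_as_latent(noisy_latent, reference_latent)

text_context = [[0.1] * 8 for _ in range(5)]       # 5 tokens of width 8
image_features = [[0.2] * 8 for _ in range(2)]     # 2 tokens of width 8
context = condition_as_features(text_context, image_features)

print(len(denoiser_input), "input channels;", len(context), "context tokens")
```

The latent path widens the input while keeping spatial alignment; the feature path lengthens the context and leaves the input untouched. Which one a given model uses is something to read off its architecture rather than assume.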

Common trap

Do not reduce every condition to a prompt.
Do not ignore where the condition enters the model.
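A small numeric experiment shows why the entry point matters. The sketch below is a hypothetical scalar "denoiser" in which each block is a `tanh` nonlinearity; `run_blocks` and its parameters are illustrative names, not part of any real model. The same residual is injected once, at either the first or the last block.

```python
import math


def run_blocks(hidden: float, residual: float, inject_at: int, num_blocks: int = 3) -> float:
    # Hypothetical scalar denoiser: each block applies tanh, and the
    # control residual is added after the chosen block only.
    for block in range(num_blocks):
        hidden = math.tanh(hidden)
        if block == inject_at:
            hidden += residual
    return hidden


early = run_blocks(hidden=0.5, residual=1.0, inject_at=0)
late = run_blocks(hidden=0.5, residual=1.0, inject_at=2)

print(f"inject at block 0: {early:.3f}")
print(f"inject at block 2: {late:.3f}")
```

The inputs are identical in both runs, yet the outputs differ, because a residual injected early is reshaped by every later block while one injected late reaches the output almost unchanged. A description of a condition that omits where it enters leaves out part of the mechanism.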
