# Offload / Memory Saving

This starter is annotated reading material and the source of truth for the lab preview. Running anything is optional; the reading goal is to explain the mechanism without hiding behind a framework call.

## Reading focus

Read an offload plan as a memory residency schedule: for each module, ask when it must be on the GPU and when it can live elsewhere.

## Annotated sketch

```text
## Offload reading plan

| Module | Called when | Residency question |
| --- | --- | --- |
| text encoder | before loop | can move away after embeddings |
| denoiser | every step | usually stays hot |
| VAE decoder | after loop | can load late if memory constrained |
```
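
The table can be read as a small residency simulation. The sketch below is illustrative only: the module sizes, `SIZES_GB`, `build_timeline`, and `peak_resident_gb` are hypothetical names and numbers, not part of any framework, and only weights are counted (activations are ignored).

```python
# Hypothetical weight sizes in GB; illustrative numbers, not measurements.
SIZES_GB = {"text_encoder": 1.0, "denoiser": 5.0, "vae_decoder": 0.5}

def build_timeline(num_steps):
    """Call order from the table: encoder once, denoiser every step, decoder once."""
    return ["text_encoder"] + ["denoiser"] * num_steps + ["vae_decoder"]

def peak_resident_gb(timeline, always_resident):
    """Peak GPU weight residency when every module outside
    `always_resident` is on the GPU only during its own call."""
    peak = 0.0
    for active in timeline:
        on_gpu = set(always_resident) | {active}
        peak = max(peak, sum(SIZES_GB[name] for name in on_gpu))
    return peak

timeline = build_timeline(num_steps=4)
all_hot = peak_resident_gb(timeline, always_resident=SIZES_GB.keys())
denoiser_hot = peak_resident_gb(timeline, always_resident={"denoiser"})
print(f"everything resident: {all_hot} GB, only denoiser resident: {denoiser_hot} GB")
```

Under these assumed sizes, keeping only the denoiser hot lowers the peak because the text encoder and VAE decoder are never on the GPU at the same time.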

## What to explain

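- Resident modules stay on the GPU for the whole run.
- Offloaded modules move off the GPU (typically to CPU memory) when inactive and move back before their next call.
- Transfer is the latency cost: every move takes time during which the module cannot run.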
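
One way to make the three points above concrete is to count how often each module has to be loaded under a given schedule. This is a minimal sketch under stated assumptions: `SIZES_GB`, `count_loads`, and `gb_loaded` are hypothetical, the sizes are made up, and an offloaded module is assumed to leave the GPU as soon as a different module is called.

```python
# Hypothetical weight sizes in GB; illustrative numbers, not measurements.
SIZES_GB = {"text_encoder": 1.0, "denoiser": 5.0, "vae_decoder": 0.5}

def count_loads(timeline, always_resident):
    """Count host-to-GPU loads per module.

    Resident modules are never loaded during the run. An offloaded module
    is loaded each time it is called after a different module was active.
    """
    loads = {}
    previous = None
    for active in timeline:
        if active not in always_resident and active != previous:
            loads[active] = loads.get(active, 0) + 1
        previous = active
    return loads

def gb_loaded(loads):
    """Total GB copied onto the GPU for the counted loads."""
    return sum(SIZES_GB[name] * n for name, n in loads.items())

timeline = ["text_encoder"] + ["denoiser"] * 4 + ["vae_decoder"]
loads = count_loads(timeline, always_resident={"denoiser"})
print(loads, gb_loaded(loads))
```

With the denoiser resident, only the text encoder and the VAE decoder are loaded, once each, so the transfer volume stays small compared with the memory saved.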

## Common trap

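- Offload is not free acceleration: it lowers peak memory and pays for it in transfer latency.
- Do not offload a module that is needed on every denoising step without first weighing the transfer cost, which is then paid on every step.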
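
A back-of-the-envelope comparison shows why the per-step case is the trap. Every number here is an assumption picked to keep the arithmetic simple (bandwidth, step time, and module sizes are hypothetical), and `round_trip_s` is an illustrative helper, not a library function.

```python
# All numbers are hypothetical, chosen only to make the arithmetic easy to follow.
BANDWIDTH_GB_PER_S = 10.0   # assumed host-to-GPU transfer rate
STEP_COMPUTE_S = 0.1        # assumed compute time of one denoiser step
NUM_STEPS = 30

def round_trip_s(size_gb, bandwidth_gb_per_s=BANDWIDTH_GB_PER_S):
    """Time to move a module onto the GPU and back off again."""
    return 2 * size_gb / bandwidth_gb_per_s

# Text encoder (assumed 1 GB): one round trip for the whole run.
encoder_overhead_s = round_trip_s(1.0)

# Denoiser (assumed 5 GB): one round trip per step if it is offloaded between steps.
denoiser_overhead_s = NUM_STEPS * round_trip_s(5.0)

compute_s = NUM_STEPS * STEP_COMPUTE_S
slowdown = (compute_s + denoiser_overhead_s) / compute_s

print(f"encoder offload overhead:  {encoder_overhead_s:.1f} s")
print(f"denoiser offload overhead: {denoiser_overhead_s:.1f} s")
print(f"loop slowdown if the denoiser moves every step: {slowdown:.0f}x")
```

Under these assumptions the one-time encoder move is negligible, while moving the denoiser on every step makes the loop take roughly eleven times longer. Real ratios depend on measured bandwidth and step time.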
