Lab 05: VAE Latent Space
Annotated code reading lab. Running code is optional.
This lab maps directly to the handbook section. Read that section first, then use this page and the starter file to connect the concept to concrete variables, shapes, APIs, and interview-ready explanations.
Track how VAE compression changes pixel tensors into smaller latent tensors.
Mechanism to keep in mind
- `image_shape` describes the RGB pixel-space tensor, laid out as (batch, channels, height, width).
- `latent_shape` has learned channels and downsampled spatial axes; the scale factor and channel count used here are model-specific examples.
- `decode` maps the denoised latents back to pixel space.
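The three points above can be traced as a shape-only sketch. The helper names (`encode_shape`, `denoise_shape`, `decode_shape`) are hypothetical and do not come from any library; they model only how the shape changes, not the values. The divisibility check is an assumption of this sketch, and the constants are the same example values the lab uses.

```python
from typing import Tuple

Shape = Tuple[int, int, int, int]  # (batch, channels, height, width)


def encode_shape(image_shape: Shape, latent_channels: int, scale_factor: int) -> Shape:
    """Shape after encoding: learned channels, downsampled spatial axes."""
    batch, _, height, width = image_shape
    # Assumption for this sketch: require exact divisibility.
    if height % scale_factor or width % scale_factor:
        raise ValueError("height and width must be divisible by scale_factor")
    return (batch, latent_channels, height // scale_factor, width // scale_factor)


def denoise_shape(latent_shape: Shape) -> Shape:
    """Denoising updates latent values; the latent shape is unchanged."""
    return latent_shape


def decode_shape(latent_shape: Shape, image_channels: int, scale_factor: int) -> Shape:
    """Shape after decoding: back to pixel channels and full resolution."""
    batch, _, latent_height, latent_width = latent_shape
    return (batch, image_channels, latent_height * scale_factor, latent_width * scale_factor)


image_shape = (1, 3, 1024, 1024)  # RGB pixel space
latent_shape = encode_shape(image_shape, latent_channels=4, scale_factor=8)
decoded_shape = decode_shape(denoise_shape(latent_shape), image_channels=3, scale_factor=8)

print(latent_shape)   # (1, 4, 128, 128)
print(decoded_shape)  # (1, 3, 1024, 1024)
```

Keeping the denoise step as an explicit identity on shape makes the ordering visible: decoding happens once, after the latents have been denoised.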
Annotated Code Preview
Excerpt from code/lab-05-vae-latent-space/vae_latent_shapes.py. The linked starter file is the source of truth.
# VAE Latent Space
# Annotated reading material. Running this file is optional.
# Source-of-truth focus: Track how VAE compression changes pixel tensors into smaller latent tensors.
batch, channels, height, width = 1, 3, 1024, 1024
image_shape = (batch, channels, height, width)  # RGB pixel space
vae_scale_factor = 8  # example only; check the target VAE/pipeline
latent_channels = 4  # example only; not universal
latent_shape = (batch, latent_channels, height // vae_scale_factor, width // vae_scale_factor)
image_shape_after_decode = (batch, channels, height, width)
# What to explain while reading:
# - image_shape is RGB pixel space.
# - latent_shape has learned channels and downsampled spatial axes; constants are model-specific.
# - decode maps the denoised latents back to pixel space.
#
# Common traps:
# - Do not treat latent channels as colors.
# - Do not forget the VAE when tracing full pipeline memory.
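To put a number on "smaller", the excerpt's arithmetic can be extended with an element count. This is a minimal sketch using the same example constants; the ratio it prints applies only to these values, not to VAEs in general.

```python
from math import prod

batch, channels, height, width = 1, 3, 1024, 1024
vae_scale_factor = 8  # example only; check the target VAE/pipeline
latent_channels = 4   # example only; not universal

image_shape = (batch, channels, height, width)
latent_shape = (batch, latent_channels, height // vae_scale_factor, width // vae_scale_factor)

image_elements = prod(image_shape)    # 3 * 1024 * 1024
latent_elements = prod(latent_shape)  # 4 * 128 * 128
ratio = image_elements / latent_elements

print(image_elements, latent_elements, ratio)  # 3145728 65536 48.0
```

With these constants, each spatial axis shrinks by 8 while the channel count grows from 3 to 4, so the element count drops by 8 × 8 × 3 / 4 = 48.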
What each block is doing
- Setup / contract: `image_shape` is RGB pixel space.
- Main transition: `latent_shape` has learned channels and downsampled spatial axes.
- Interview hook: `decode` maps the denoised latents back to pixel space.
Reading checkpoints
- Latent space is not RGB.
- Latent scaling constants, channel counts and spatial downsampling factors are model- and pipeline-specific.
- VAE decode can cause a memory spike, because the decoder works back up to full pixel resolution.
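The memory point can be made concrete with a rough byte count. This is a sketch under stated assumptions: `tensor_bytes` is a hypothetical helper, float16 storage is assumed, and the 128-channel full-resolution activation is an invented illustration of an intermediate decoder tensor, not a figure taken from any specific model.

```python
from math import prod

BYTES_PER_ELEMENT = {"float32": 4, "float16": 2}


def tensor_bytes(shape, dtype="float16"):
    """Bytes needed to store one dense tensor of the given shape."""
    return prod(shape) * BYTES_PER_ELEMENT[dtype]


latent = (1, 4, 128, 128)     # example latent shape from this lab
decoded = (1, 3, 1024, 1024)  # decoded RGB image
# Hypothetical intermediate activation: 128 feature channels at full resolution.
hypothetical_activation = (1, 128, 1024, 1024)

for name, shape in [
    ("latent", latent),
    ("decoded image", decoded),
    ("hypothetical activation", hypothetical_activation),
]:
    print(f"{name}: {tensor_bytes(shape) / 2**20:.3f} MiB")
# latent: 0.125 MiB
# decoded image: 6.000 MiB
# hypothetical activation: 256.000 MiB
```

The output image itself is small; the spike, when it occurs, comes from intermediate tensors that combine full spatial resolution with many feature channels. Actual peak usage depends on the decoder architecture and runtime, so measure on the target pipeline.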
What this lab prevents
- Do not treat latent channels as colors.
- Do not forget the VAE when tracing full pipeline memory.
How to say it out loud
Start from the goal: track how VAE compression changes pixel tensors into smaller latent tensors. Then explain the code by naming the state being transformed, the axis or shape that matters, and the tradeoff that would appear in a real system.
Additional intuition
- Use official docs and papers for API behavior and factual claims; use blogs only to improve the mental picture.
- If support matrices, performance behavior or backend choices are version-sensitive, check current docs before repeating them.
- A strong interview answer names the state object, the shape or axis it changes, and the tradeoff it creates.
