# Lab 12: 64-GPU Parallelism Design

This is a topology-reading worksheet. You do not need a 64-GPU cluster to use
it; the goal is to understand how parallelism choices map to communication
paths and memory pressure.
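To make "map to communication paths" concrete, here is a minimal sketch. It assumes 64 GPUs arranged as 8 nodes of 8 GPUs, with ranks numbered node by node, and a hypothetical rank ordering in which TP varies fastest, then PP, then DP. Real frameworks may order the axes differently. The function names (`coords`, `groups`, `inter_node_axes`) are illustrative only. The sketch reports which parallelism axes have at least one communication group that spans more than one node.

```python
GPUS_PER_NODE = 8  # assumption: 8 nodes x 8 GPUs, ranks numbered node by node
WORLD = 64


def coords(rank, tp, pp, dp):
    """Map a global rank to (dp, pp, tp) indices, with TP varying fastest."""
    return rank // (tp * pp), (rank // tp) % pp, rank % tp


def groups(axis, tp, pp, dp):
    """Return the rank lists that communicate along one axis: 'tp', 'pp' or 'dp'."""
    out = {}
    for rank in range(tp * pp * dp):
        d, p, t = coords(rank, tp, pp, dp)
        # Ranks in the same group share the coordinates of the other two axes.
        key = {"tp": (d, p), "pp": (d, t), "dp": (p, t)}[axis]
        out.setdefault(key, []).append(rank)
    return list(out.values())


def crosses_nodes(group):
    """True if the group spans more than one node."""
    return len({rank // GPUS_PER_NODE for rank in group}) > 1


def inter_node_axes(tp, pp, dp):
    """For each axis, report whether any of its groups needs inter-node links."""
    if tp * pp * dp != WORLD:
        raise ValueError("tp * pp * dp must equal the world size")
    return {axis: any(crosses_nodes(g) for g in groups(axis, tp, pp, dp))
            for axis in ("tp", "pp", "dp")}


print(inter_node_axes(8, 2, 4))   # {'tp': False, 'pp': True, 'dp': True}
print(inter_node_axes(16, 2, 2))  # {'tp': True, 'pp': True, 'dp': True}
```

With TP=8 under this ordering, every TP group fits inside one node. With TP=16, each TP group has to span two nodes.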

## Reading focus

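- Tensor parallelism (TP) is usually kept inside a node when its per-layer collectives are hot, meaning they are frequent and sit on the critical path of every forward and backward pass.
- Pipeline parallelism (PP) splits model depth across stages. It introduces a pipeline bubble (idle time while the pipeline fills and drains) and requires balancing work across stages.
- Data parallelism (DP) replicates the model and training step across replicas. This multiplies the global batch size by the DP degree.
- FSDP/ZeRO shards training state (parameters, gradients, optimizer state) across data-parallel ranks and adds all-gather/reduce-scatter traffic.
- Expert parallelism (EP) for mixture-of-experts (MoE) models introduces token routing and possible all-to-all pressure.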
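The TP, PP, and DP trade-offs above can be explored numerically. The following is a minimal sketch under stated assumptions: 64 GPUs with 8 per node, and TP restricted to one node. The bubble uses the idle fraction of a simple, non-interleaved GPipe/1F1B schedule, $(p-1)/(m+p-1)$ for $p$ stages and $m$ micro-batches. The micro-batch count of 16 and the helper names are hypothetical.

```python
GPUS_PER_NODE = 8   # assumption: 8 GPUs per node
WORLD = 64


def bubble_fraction(pp, microbatches):
    """Idle fraction of a simple GPipe/1F1B schedule: (p - 1) / (m + p - 1)."""
    return (pp - 1) / (microbatches + pp - 1)


def global_batch(micro_batch, microbatches, dp):
    """Samples per optimizer step: each DP replica runs `microbatches` micro-batches."""
    return micro_batch * microbatches * dp


def candidate_layouts(world=WORLD, max_tp=GPUS_PER_NODE):
    """All (tp, pp, dp) with tp * pp * dp == world and each TP group inside one node."""
    layouts = []
    for tp in range(1, max_tp + 1):
        if world % tp or max_tp % tp:
            continue  # TP must divide both the world size and the node size
        rest = world // tp
        for pp in range(1, rest + 1):
            if rest % pp == 0:
                layouts.append((tp, pp, rest // pp))
    return layouts


m = 16  # hypothetical number of micro-batches per step
for tp, pp, dp in candidate_layouts():
    if tp == GPUS_PER_NODE:
        print(f"TP={tp} PP={pp} DP={dp} "
              f"bubble={bubble_fraction(pp, m):.2f} "
              f"global_batch={global_batch(1, m, dp)}")
```

With TP fixed at 8, deeper pipelines shrink the DP degree. This lowers the global batch for a fixed micro-batch count and raises the bubble fraction.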

## File

- `topology_design_worksheet.md`

## Questions to answer while reading

- Which communication path is hottest?
- Which parallelism axis should stay on fast intra-node links?
- What fallback would you choose if communication dominates?
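One way to start on the first question is a back-of-the-envelope traffic estimate per GPU and per step. This is a rough sketch that relies on several assumptions, all hypothetical:

- Collectives use a ring all-reduce, where each rank sends $2(n-1)/n$ times the payload.
- TP follows a Megatron-style scheme without sequence parallelism, with 4 activation all-reduces per layer per micro-batch.
- Tensors are 2-byte (bf16).
- The model is a made-up 70B-parameter, 80-layer model.

Compute overlap, recomputation, and PP point-to-point traffic are ignored.

```python
def ring_allreduce_bytes(payload_bytes, n):
    """Bytes each rank sends in a ring all-reduce over n ranks: 2 * (n - 1) / n * payload."""
    if n <= 1:
        return 0.0
    return 2 * (n - 1) / n * payload_bytes


def tp_bytes_per_step(micro_batch, seq, hidden, layers_per_stage, tp, microbatches,
                      dtype_bytes=2):
    """Per-GPU TP traffic per optimizer step.

    Assumes Megatron-style TP without sequence parallelism: 4 activation
    all-reduces per layer per micro-batch (2 forward, 2 backward).
    """
    payload = micro_batch * seq * hidden * dtype_bytes
    return 4 * ring_allreduce_bytes(payload, tp) * layers_per_stage * microbatches


def dp_bytes_per_step(params, tp, pp, dp, dtype_bytes=2):
    """Per-GPU gradient all-reduce traffic per step for this rank's parameter shard."""
    shard_bytes = params / (tp * pp) * dtype_bytes
    return ring_allreduce_bytes(shard_bytes, dp)


# Hypothetical 70B-parameter, 80-layer model on TP=8, PP=2, DP=4.
tp_traffic = tp_bytes_per_step(micro_batch=1, seq=4096, hidden=8192,
                               layers_per_stage=40, tp=8, microbatches=16)
dp_traffic = dp_bytes_per_step(params=70e9, tp=8, pp=2, dp=4)
print(f"TP: {tp_traffic / 1e9:.1f} GB/step, DP: {dp_traffic / 1e9:.1f} GB/step")
```

Under these assumptions, TP moves more than twenty times as many bytes per step as the DP gradient all-reduce. This is consistent with keeping TP on fast intra-node links. If TP traffic would have to cross nodes, shrinking TP and growing PP or DP is one fallback worth evaluating.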