# Parallelism Design Template

## Cluster topology

- Nodes:
- GPUs per node:
- Intra-node links:
- Inter-node links:
- GPU/NIC affinity:

## Model assumptions

- Parameter count:
- Layers:
- Hidden size:
- Sequence length:
- Precision:
- Target global batch (sequences or tokens per step):

## TP/PP/DP choice

| Axis | Degree | Placement | Reason |
| --- | ---: | --- | --- |
| TP | | | |
| PP | | | |
| DP | | | |
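
A quick consistency check for the table above can be sketched as follows. It assumes the three degrees must multiply to the total GPU count from the topology section, and it applies a common heuristic (not a hard rule) that a TP group should fit evenly inside one node. The function name `check_layout` and its parameters are hypothetical.

```python
def check_layout(nodes, gpus_per_node, tp, pp, dp):
    """Return a list of problems with a TP/PP/DP layout (empty if none found)."""
    problems = []
    world_size = nodes * gpus_per_node

    # Every GPU belongs to exactly one (tp, pp, dp) coordinate.
    if tp * pp * dp != world_size:
        problems.append(
            f"TP*PP*DP = {tp * pp * dp}, but the cluster has {world_size} GPUs"
        )

    # Heuristic (assumption): keep each TP group on intra-node links.
    if tp > gpus_per_node or gpus_per_node % tp != 0:
        problems.append(
            f"TP={tp} does not fit evenly inside a node of {gpus_per_node} GPUs"
        )
    return problems


if __name__ == "__main__":
    # 4 nodes x 8 GPUs, TP=8 within a node, PP=2, DP=2.
    print(check_layout(nodes=4, gpus_per_node=8, tp=8, pp=2, dp=2))
```

An empty list means the degrees are consistent with the cluster size; it says nothing about whether the layout is fast.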

## FSDP/ZeRO choice

- Stage / policy:
- What is sharded:
- Expected memory saving:
- Expected communication cost:
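
For the "Expected memory saving" field, a rough per-GPU estimate of model-state memory might look like the sketch below. It assumes mixed-precision Adam with 2 bytes per parameter for weights, 2 for gradients, and 12 for optimizer state (fp32 master weights, momentum, variance), and it ignores activations, temporary buffers, and fragmentation. The function name `model_state_bytes_per_gpu` is hypothetical.

```python
def model_state_bytes_per_gpu(num_params, dp_degree, stage):
    """Estimate model-state bytes per GPU for ZeRO stage 0, 1, 2, or 3.

    Assumes mixed-precision Adam: 2 B weights + 2 B grads + 12 B optimizer
    state per parameter. Activations and buffers are NOT included.
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")
    param_b, grad_b, optim_b = 2.0, 2.0, 12.0
    if stage >= 1:
        optim_b /= dp_degree  # stage 1 shards optimizer state
    if stage >= 2:
        grad_b /= dp_degree   # stage 2 also shards gradients
    if stage >= 3:
        param_b /= dp_degree  # stage 3 also shards parameters
    return num_params * (param_b + grad_b + optim_b)


if __name__ == "__main__":
    for stage in range(4):
        gb = model_state_bytes_per_gpu(7.5e9, dp_degree=64, stage=stage) / 1e9
        print(f"stage {stage}: {gb:.1f} GB")
```

The difference between the chosen stage and stage 0 gives a first number for the memory-saving field; measured usage will be higher once activations are counted.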

## Communication map

| Collective / transfer | Where it appears | Link preference | Risk |
| --- | --- | --- | --- |
| AllReduce | | | |
| AllGather | | | |
| ReduceScatter | | | |
| Send/Recv | | | |
| AllToAll | | | |
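
When filling in the "Risk" column, it can help to estimate how many bytes each rank sends per collective. The sketch below assumes bandwidth-optimal ring algorithms, where `message_bytes` is the size of the full tensor (the AllReduce buffer, the gathered AllGather output, or the ReduceScatter input). Real libraries may choose tree or hierarchical algorithms instead, so treat the result as a rough guide. The function name is hypothetical.

```python
def ring_bytes_sent_per_rank(collective, message_bytes, ranks):
    """Bytes each rank sends for a ring-based collective (an assumption)."""
    if ranks < 1:
        raise ValueError("ranks must be at least 1")
    share = (ranks - 1) / ranks
    factors = {
        "allreduce": 2 * share,   # reduce-scatter phase + all-gather phase
        "allgather": share,
        "reducescatter": share,
    }
    if collective not in factors:
        raise ValueError(f"unsupported collective: {collective}")
    return factors[collective] * message_bytes


if __name__ == "__main__":
    # 1 GiB of gradients reduced across 8 ranks.
    print(ring_bytes_sent_per_rank("allreduce", 2**30, 8))
```

Dividing the result by the bandwidth of the link in the "Link preference" column gives a lower bound on the transfer time.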

## Expected bottlenecks

- Compute:
- Memory:
- Communication:
- Pipeline bubble:
- Checkpointing:
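
For the "Pipeline bubble" field, a first estimate is the idle fraction of a non-interleaved schedule such as GPipe or 1F1B, which is (PP - 1) / (microbatches + PP - 1). The sketch assumes every stage takes the same time per microbatch and ignores communication; the function name is hypothetical.

```python
def pipeline_bubble_fraction(pp_degree, num_microbatches):
    """Idle fraction of a non-interleaved pipeline schedule.

    Assumes equal per-stage compute time and no communication overhead.
    """
    if pp_degree < 1 or num_microbatches < 1:
        raise ValueError("pp_degree and num_microbatches must be at least 1")
    return (pp_degree - 1) / (num_microbatches + pp_degree - 1)


if __name__ == "__main__":
    # 4 pipeline stages, 16 microbatches per step.
    print(f"{pipeline_bubble_fraction(4, 16):.3f}")
```

Raising the microbatch count shrinks the bubble but also raises activation memory under some schedules, which ties this field to the "Memory" entry in the same list.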

## Risk and fallback

- Main risk:
- First metric to inspect:
- Fallback design:
- Open questions:
