# Topology Communication

This starter is annotated reading material and the source of truth for the lab preview. Running anything is optional; the reading goal is to explain the mechanism without hiding behind a framework call.

## Reading focus

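Read a communication plan by asking, for each collective, which ranks take part and which physical link (for example intra-node versus inter-node) the traffic has to cross.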
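One way to practice this question is to write the plan down as rank groups and classify each group by the links it spans. The sketch below is illustrative only: `links_crossed`, the 2-node by 4-GPU layout, and the group names are hypothetical, and it assumes consecutive ranks are packed onto the same node.

```python
from typing import Dict, List


def links_crossed(group: List[int], gpus_per_node: int) -> str:
    """Classify the slowest link class a collective over `group` must use.

    Assumption: ranks are packed onto nodes in order, so rank r lives on
    node r // gpus_per_node.
    """
    nodes = {rank // gpus_per_node for rank in group}
    return "intra-node" if len(nodes) == 1 else "inter-node"


# Hypothetical layout: 2 nodes x 4 GPUs (ranks 0-3 on node 0, 4-7 on node 1).
GPUS_PER_NODE = 4
plan: Dict[str, List[int]] = {
    "tensor-parallel all_reduce": [0, 1, 2, 3],
    "data-parallel all_reduce": [0, 4],
    "expert all_to_all": list(range(8)),
}

report = {name: links_crossed(group, GPUS_PER_NODE) for name, group in plan.items()}
for name, link in report.items():
    print(f"{name}: {link}")
```

Under this layout only the tensor-parallel group stays inside one node; the other two groups must use the inter-node link.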

## Annotated sketch

```text
## Collective map

| Strategy | Common collective | Placement concern |
| --- | --- | --- |
| DDP | all-reduce | gradient bucket size and overlap |
| FSDP | all-gather / reduce-scatter | parameter gather peak |
| MoE | all-to-all | token routing balance |
```

## What to explain

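- all_reduce is common in DDP and tensor parallelism (TP): DDP sums gradients across replicas after the backward pass, and TP sums partial layer outputs across shards.
- all_gather and reduce_scatter appear in FSDP: parameters are all-gathered before they are used, and gradients are reduce-scattered back to the shards that own them.
- all_to_all appears in expert/sequence routing: each rank sends a different chunk to every other rank, as when MoE tokens are dispatched to the ranks holding their experts.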
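To explain these collectives without a framework call, it can help to simulate what each rank holds before and after. The following is a minimal single-process sketch, not a real communication backend: each "rank" is just a Python list, the function names mirror the collectives but are local helpers, and `reduce_scatter` assumes the buffer length divides evenly by the number of ranks.

```python
from typing import List

Vector = List[float]


def all_reduce(bufs: List[Vector]) -> List[Vector]:
    """Every rank ends with the elementwise sum of all ranks' buffers."""
    total = [sum(vals) for vals in zip(*bufs)]
    return [list(total) for _ in bufs]


def all_gather(shards: List[Vector]) -> List[Vector]:
    """Every rank ends with the concatenation of all ranks' shards."""
    full = [x for shard in shards for x in shard]
    return [list(full) for _ in shards]


def reduce_scatter(bufs: List[Vector]) -> List[Vector]:
    """Rank r ends with only the r-th shard of the elementwise sum."""
    n = len(bufs)
    total = [sum(vals) for vals in zip(*bufs)]
    shard = len(total) // n  # assumes len(total) is divisible by n
    return [total[r * shard:(r + 1) * shard] for r in range(n)]


def all_to_all(chunks: List[List[Vector]]) -> List[List[Vector]]:
    """chunks[src][dst] travels to dst; dst receives one chunk from every src."""
    n = len(chunks)
    return [[chunks[src][dst] for src in range(n)] for dst in range(n)]


# Two simulated ranks, each holding a full gradient buffer.
grads = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
reduced = all_reduce(grads)
composed = all_gather(reduce_scatter(grads))

# Two ranks, each with one chunk addressed to every rank (itself included).
routed = all_to_all([[[0.0], [0.1]], [[1.0], [1.1]]])
```

Here `reduced` and `composed` hold the same values, which shows that an all_reduce can be decomposed into a reduce_scatter followed by an all_gather. `all_to_all` behaves like a transpose of the send matrix: rank 0 ends with the chunks that every source addressed to rank 0.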

## Common trap

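- Do not optimize FLOPs while ignoring communication: any communication that is not overlapped with compute adds directly to step time.
- Do not assume all GPU pairs have equal bandwidth: links inside a node are typically much faster than links between nodes, so rank placement changes what a collective costs.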
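A back-of-the-envelope estimate makes the second trap concrete. The sketch below uses the standard bandwidth term for a ring all-reduce, in which each rank sends about 2(N-1)/N of the buffer, and ignores latency and overlap. The function name and every number are placeholders chosen for illustration, not measurements of any real system.

```python
def ring_all_reduce_seconds(num_bytes: float, num_ranks: int, link_gb_per_s: float) -> float:
    """Bandwidth-only estimate for a ring all-reduce (latency ignored)."""
    if num_ranks < 2:
        return 0.0  # nothing to communicate with a single rank
    # Each rank sends (N-1)/N of the buffer in the reduce-scatter phase
    # and again in the all-gather phase.
    traffic = 2 * (num_ranks - 1) / num_ranks * num_bytes
    return traffic / (link_gb_per_s * 1e9)


GRAD_BYTES = 1e9       # hypothetical 1 GB gradient buffer
FAST_GB_PER_S = 300.0  # placeholder intra-node bandwidth
SLOW_GB_PER_S = 25.0   # placeholder inter-node bandwidth

# Ring that stays inside one node.
intra = ring_all_reduce_seconds(GRAD_BYTES, 8, FAST_GB_PER_S)
# Ring that spans nodes: modeled as paced by its slowest link.
inter = ring_all_reduce_seconds(GRAD_BYTES, 8, min(FAST_GB_PER_S, SLOW_GB_PER_S))

print(f"intra-node: {intra * 1e3:.2f} ms, cross-node: {inter * 1e3:.2f} ms")
```

With these placeholder bandwidths the same collective is 12 times slower once its ring crosses the inter-node link, even though the FLOP count of the step has not changed.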
