# Lab 09: Nsight Profiling Workflow

This directory contains annotated command templates and a checklist. The goal is
to understand which question each profiler answers. Running the Nsight tools is
optional and can wait for a later profiling pass.

## Reading focus

- The `torchrun` launch line describes the workload shape for distributed training: how many nodes and how many processes per node.
- `nsys profile` answers timeline questions: CPU gaps, kernel order, host-to-device and device-to-host (H2D/D2H) copies, and whether NCCL communication overlaps with compute.
- `ncu --set full` answers kernel questions: memory throughput, occupancy and stall reasons.
- `profiling_checklist.md` shows the baseline -> hypothesis -> one change -> re-check reasoning loop.
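
As a rough illustration of how the three tools in the list above relate, the sketch below prints one command template per tool without running any profiler. It is not the content of `profile_commands.sh`. The script name `train.py`, the process count, the kernel name pattern `my_kernel` and the report names are all hypothetical placeholders, and the flag choices are one reasonable starting point rather than a recommended configuration.

```bash
#!/usr/bin/env bash
# Sketch only: prints command templates, does not launch torchrun, nsys or ncu.
set -euo pipefail

# Hypothetical placeholders; replace with values from your own workload.
TRAIN_SCRIPT="train.py"
NPROC_PER_NODE=2
KERNEL_REGEX="my_kernel"

print_templates() {
  cat <<EOF
# 1. Workload shape: one process per GPU on this node.
torchrun --nproc_per_node=${NPROC_PER_NODE} ${TRAIN_SCRIPT}

# 2. Timeline questions: trace CUDA, NVTX and OS runtime calls for the whole launch.
nsys profile --trace=cuda,nvtx,osrt -o timeline_report torchrun --nproc_per_node=${NPROC_PER_NODE} ${TRAIN_SCRIPT}

# 3. Kernel questions: full section set for the first 5 launches of one kernel.
ncu --set full --kernel-name regex:${KERNEL_REGEX} --launch-count 5 -o kernel_report python ${TRAIN_SCRIPT}
EOF
}

print_templates
```

Limiting `ncu` to a few launches of one kernel keeps the overhead manageable, because `--set full` replays each profiled kernel several times. That is also why the timeline pass with `nsys` usually comes first: it shows which kernel is worth the detailed pass.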

## Optional command

To print the command templates later, make the script executable and run it:

```bash
chmod +x profile_commands.sh
./profile_commands.sh
```

## Questions to answer while reading

- Which profiler would you open first for timeline gaps?
- Which profiler would you open for one suspicious CUDA kernel?
- Why is changing one variable at a time part of the method?
