# Attention Backend Comparison Reading

This starter is annotated reading material and the source of truth for the lab preview. Running anything is optional; the reading goal is to explain the mechanism without hiding behind a framework call.

## Reading focus

Compare math-level attention with kernel and runtime choices: SDPA (scaled dot-product attention), FlashAttention, and PagedAttention.

## Annotated sketch

```text
## Backend comparison

| Layer | Question | Reading focus |
| --- | --- | --- |
| Math | What is computed? | QK^T, mask, softmax, V |
| Kernel | How is it computed? | SDPA / FlashAttention memory traffic |
| Runtime | How is state managed? | KV blocks, batching, paging |
```
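To make the Math row concrete, here is a minimal pure-Python sketch of the formula, written without a framework call. The function names and the `causal` flag are illustrative, not taken from any library. The `1/sqrt(d)` scaling is the usual scaled dot-product convention, which the table does not spell out.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V, causal=False):
    """Reference attention: softmax(Q K^T / sqrt(d) + mask) V, one query row at a time."""
    d = len(Q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for i, q in enumerate(Q):
        # QK^T: one score per key for this query.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in K]
        if causal:
            # Mask: a query may not attend to later positions.
            scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        weights = softmax(scores)
        # Weighted sum over the rows of V.
        out.append([sum(w * v[c] for w, v in zip(weights, V))
                    for c in range(len(V[0]))])
    return out
```

Every kernel discussed in this reading is expected to produce the same numbers as this function, up to floating-point rounding. What differs is how the work is scheduled on the hardware.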

## What to explain

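- Math names the formula: scores from QK^T, a mask, a softmax, and a weighted sum over V.
- Kernel names how the formula is executed: which implementation runs and how much memory traffic it incurs.
- Runtime names serving-time management of the KV cache and requests: block allocation, batching, and paging.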
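The runtime layer can be pictured with a toy block table. This is a hedged sketch of the paging idea only: `PagedKVCache`, `append_token`, and `release` are hypothetical names, not the API of any serving system, and no tensors are stored.

```python
from collections import deque

class PagedKVCache:
    """Toy bookkeeping for a KV cache split into fixed-size blocks (hypothetical API)."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free = deque(range(num_blocks))  # physical block ids not in use
        self.block_tables = {}                # request id -> list of physical block ids
        self.lengths = {}                     # request id -> tokens stored so far

    def append_token(self, req_id):
        """Reserve a slot for one new token; return (physical_block, offset)."""
        table = self.block_tables.setdefault(req_id, [])
        n = self.lengths.get(req_id, 0)
        if n % self.block_size == 0:
            # The last block is full (or the request is new): take another block.
            if not self.free:
                raise MemoryError("no free KV blocks")
            table.append(self.free.popleft())
        self.lengths[req_id] = n + 1
        return table[n // self.block_size], n % self.block_size

    def release(self, req_id):
        """Return all of a finished request's blocks to the free pool."""
        self.free.extend(self.block_tables.pop(req_id, []))
        self.lengths.pop(req_id, None)
```

Nothing here computes attention. The class only decides where each request's keys and values would live, which is the sense in which the runtime layer is separate from the kernel.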

## Common traps

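- Do not call FlashAttention an approximation: it computes exact attention and changes only how the computation is tiled and how much memory is read and written.
- Do not confuse the kernel backend with the serving scheduler: the kernel computes attention for the batch it is handed, while the scheduler decides which requests run and where their KV blocks live.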
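The first trap can be checked numerically. Below is a minimal sketch, for a single query row, of the tiled "online softmax" accumulation that FlashAttention-style kernels rely on, next to a naive version. It is a pure-Python illustration under simplifying assumptions (one query, no mask, names are hypothetical), not the actual kernel.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def naive_row(q, K, V):
    """Attention output for one query, materializing all scores at once."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * _dot(q, k) for k in K]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [sum(e * v[c] for e, v in zip(exps, V)) / total
            for c in range(len(V[0]))]

def tiled_row(q, K, V, block=2):
    """Same output, visiting K and V one block at a time with a running max and sum."""
    scale = 1.0 / math.sqrt(len(q))
    m = float("-inf")          # running maximum of the scores seen so far
    l = 0.0                    # running softmax denominator
    acc = [0.0] * len(V[0])    # running unnormalized output
    for start in range(0, len(K), block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        scores = [scale * _dot(q, k) for k in Kb]
        m_new = max(m, max(scores))
        correction = math.exp(m - m_new)  # rescales earlier blocks; 0.0 on the first block
        p = [math.exp(s - m_new) for s in scores]
        l = l * correction + sum(p)
        acc = [a * correction + sum(pj * v[c] for pj, v in zip(p, Vb))
               for c, a in enumerate(acc)]
        m = m_new
    return [a / l for a in acc]
```

For any block size, `tiled_row` should agree with `naive_row` up to floating-point rounding. The tiling changes the order of operations and the amount of intermediate state held at once, not the result being computed.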