# Quantization Reading Lab

This starter is annotated reading material and the source of truth for the lab preview. Running anything is optional; the reading goal is to explain the mechanism without hiding behind a framework call.

## Reading focus

Separate weight, activation and KV-cache quantization before discussing memory or quality tradeoffs.

## Annotated sketch

```text
## Quantization reading table

| Target | Saves | Main risk |
| --- | --- | --- |
| Weights | model memory / bandwidth | dequantization overhead and accuracy loss |
| Activations | temporary memory | calibration sensitivity |
| KV cache | long-context serving memory | attention-score error |
```
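The "Saves" column can be made concrete with back-of-the-envelope arithmetic. The sketch below is a rough illustration, not a measurement: the function names, the 7-billion-parameter count, and the layer/head/sequence shapes are all hypothetical values chosen for the example, and the estimate ignores the extra bytes that real schemes spend on scales and zero points.

```python
def tensor_bytes(num_elements: int, bits: int) -> int:
    """Bytes needed to store num_elements values at the given bit width."""
    return num_elements * bits // 8


def kv_cache_elements(layers: int, kv_heads: int, head_dim: int,
                      seq_len: int, batch: int) -> int:
    """Number of cached values: one key and one value vector per token, head, and layer."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch


# Hypothetical model: 7 billion weights (an assumption for illustration).
weight_count = 7_000_000_000
weights_fp16 = tensor_bytes(weight_count, 16)
weights_int4 = tensor_bytes(weight_count, 4)

# Hypothetical serving shape: 32 layers, 32 KV heads of size 128,
# a 4096-token context, batch size 1.
kv_count = kv_cache_elements(layers=32, kv_heads=32, head_dim=128,
                             seq_len=4096, batch=1)
kv_fp16 = tensor_bytes(kv_count, 16)
kv_int8 = tensor_bytes(kv_count, 8)

print(f"weights: {weights_fp16} -> {weights_int4} bytes")
print(f"KV cache: {kv_fp16} -> {kv_int8} bytes")
```

Weight memory is fixed once the model is chosen, while the KV-cache figure grows with `seq_len` and `batch`. That is why the table lists the two targets on separate rows.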

## What to explain

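- `target` names which tensor is quantized: weights, activations, or the KV cache.
- `saved_memory` depends on that target, so state which memory actually shrinks.
- `risk` records where quantization error can affect model behavior.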
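One way to keep the three fields together is a small record type. This is a minimal sketch under assumptions: the class name `QuantizationNote` and the helper `describe` are hypothetical, and the field values paraphrase the reading table rather than come from any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuantizationNote:
    """One row of the reading table (hypothetical record type)."""
    target: str        # which tensor is quantized
    saved_memory: str  # which memory shrinks as a result
    risk: str          # where quantization error can change behavior


NOTES = [
    QuantizationNote("weights", "model memory / bandwidth",
                     "dequantization and accuracy"),
    QuantizationNote("activations", "temporary memory",
                     "calibration sensitivity"),
    QuantizationNote("KV cache", "long-context serving memory",
                     "attention-score error"),
]


def describe(note: QuantizationNote) -> str:
    """Render a note as one sentence that names all three fields."""
    return (f"Quantizing {note.target} saves {note.saved_memory}; "
            f"main risk: {note.risk}.")


for note in NOTES:
    print(describe(note))
```

Because every field is required, a note cannot be written with only a bit width and no target, saving, or risk.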

## Common trap

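- Do not report only bit width; also state which tensor is quantized, because the same bit width saves different memory and carries different risk for each target.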
- Do not assume quantization preserves quality in every workload.
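The second trap can be seen with a toy round trip. The sketch below uses symmetric absmax quantization, which is one simple scheme chosen for illustration and not necessarily the one a given framework applies. The function names and both input lists are hypothetical. It quantizes two inputs at the same bit width and compares the mean absolute error after dequantization.

```python
def quantize_absmax(values, bits=8):
    """Map floats to signed integer codes using one scale for the whole list."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero input, where any scale works.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from integer codes."""
    return [c * scale for c in codes]


def mean_abs_error(values, bits=8):
    """Average absolute difference between values and their round trip."""
    codes, scale = quantize_absmax(values, bits)
    restored = dequantize(codes, scale)
    return sum(abs(a - b) for a, b in zip(values, restored)) / len(values)


# Hypothetical data: evenly spaced values in [-0.5, 0.5] ...
smooth = [i / 100 for i in range(-50, 51)]
# ... and the same values plus a single large outlier.
with_outlier = smooth + [50.0]

err_smooth = mean_abs_error(smooth)
err_outlier = mean_abs_error(with_outlier)

print(f"8-bit error without outlier: {err_smooth:.6f}")
print(f"8-bit error with outlier:    {err_outlier:.6f}")
```

Both runs use 8 bits, yet the outlier stretches the shared scale so the small values collapse onto a handful of codes and the error rises sharply. Bit width alone therefore does not predict quality; the data the tensor holds in a given workload matters too.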
