InfraLens

Quantization Reading Lab

This starter is annotated reading material and the source of truth for the lab preview. Running anything is optional; the reading goal is to explain the mechanism without hiding behind a framework call.

Reading focus

Separate weight, activation and KV-cache quantization before discussing memory or quality tradeoffs.

Annotated sketch

## Quantization reading table

| Target | Saves | Main risk |
| --- | --- | --- |
| Weights | model memory / bandwidth | dequantization overhead and accuracy loss |
| Activations | temporary memory | calibration sensitivity |
| KV cache | long-context serving memory | attention-score error |
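
As a rough worked example of why the "Saves" column differs by target, the sketch below sizes weights and a KV cache at two bit widths. The model shape (7B parameters, 32 layers, 32 KV heads, head dimension 128), the serving shape, and the function names are hypothetical, chosen only for illustration. The arithmetic ignores per-group scale and zero-point metadata, and it leaves out activations because their size depends on the runtime.

```python
def weight_bytes(num_params: int, bits: int) -> int:
    """Bytes needed to store num_params weights at the given bit width."""
    return num_params * bits // 8


def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, bits: int) -> int:
    """Bytes for cached keys and values across all layers.

    The leading factor 2 counts one key tensor and one value tensor per layer.
    """
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bits // 8


# Hypothetical model and serving shape (assumptions, not from the lab).
PARAMS = 7_000_000_000
LAYERS, KV_HEADS, HEAD_DIM = 32, 32, 128
SEQ_LEN, BATCH = 4096, 8
GIB = 1024 ** 3

w16 = weight_bytes(PARAMS, 16)   # 14_000_000_000 bytes
w4 = weight_bytes(PARAMS, 4)     # 3_500_000_000 bytes

kv16 = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, SEQ_LEN, BATCH, 16)  # 16 GiB
kv8 = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, SEQ_LEN, BATCH, 8)    # 8 GiB

print(f"weights: {w16} -> {w4} bytes")
print(f"KV cache: {kv16 / GIB} -> {kv8 / GIB} GiB")
```

Under these assumed shapes the KV cache is larger than the 16-bit weights, which is why the table lists it separately as a long-context serving cost.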

What to explain

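target names which tensor is quantized: weights, activations, or the KV cache.
saved_memory depends on that target, so the same bit width saves a different kind of memory in each row.
risk records where quantization error can affect model behavior.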
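
The three fields above can be sketched as a small record type. The field names target, saved_memory, and risk come from this page; the class name, the explain helper, and the exact row strings are hypothetical and only mirror the reading table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuantizationRow:
    """One row of the reading table (class name is illustrative)."""
    target: str        # which tensor is quantized
    saved_memory: str  # what kind of memory that target saves
    risk: str          # where error can affect model behavior


READING_TABLE = [
    QuantizationRow("weights", "model memory / bandwidth",
                    "dequantization overhead and accuracy loss"),
    QuantizationRow("activations", "temporary memory",
                    "calibration sensitivity"),
    QuantizationRow("KV cache", "long-context serving memory",
                    "attention-score error"),
]


def explain(row: QuantizationRow) -> str:
    """Render one row as a sentence that names all three fields."""
    return (f"Quantizing {row.target} saves {row.saved_memory}; "
            f"main risk: {row.risk}.")


for row in READING_TABLE:
    print(explain(row))
```

Keeping all three fields in one record makes it harder to talk about saved memory without first naming the target.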

Common trap

Do not report only bit width; also name the target, because the memory saved and the risk both depend on it.
Do not assume quantization preserves quality in every workload.
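
To see why bit width alone says little about quality, here is a minimal sketch of symmetric absmax quantization in plain Python. This scheme is a common textbook choice used only for illustration, not necessarily the one the lab has in mind; the function names and the two input lists are hypothetical. Both inputs use the same 8 bits, yet a single outlier stretches the scale and raises the round-trip error on every other value.

```python
def quantize_absmax(values, bits=8):
    """Symmetric per-tensor quantization: map the largest |value| to qmax."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax
    if scale == 0:
        scale = 1.0  # all-zero input: any scale works
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map integer codes back to approximate real values."""
    return [q * scale for q in quantized]


def mean_abs_error(values, bits=8):
    """Average round-trip error after quantizing and dequantizing."""
    quantized, scale = quantize_absmax(values, bits)
    restored = dequantize(quantized, scale)
    return sum(abs(a - b) for a, b in zip(values, restored)) / len(values)


# Two hypothetical tensors at the same bit width.
smooth = [i / 100 for i in range(-100, 101)]  # values spread over [-1, 1]
with_outlier = smooth + [50.0]                # one outlier stretches the scale

err_smooth = mean_abs_error(smooth)
err_outlier = mean_abs_error(with_outlier)

print(f"8-bit, smooth values: mean abs error {err_smooth:.5f}")
print(f"8-bit, with outlier:  mean abs error {err_outlier:.5f}")
```

A report that says only "8-bit" would describe both cases identically, even though the second loses far more precision.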
