InfraLens

Quantization Reading Lab

This starter is annotated reading material and the source of truth for the lab preview. Running anything is optional; the reading goal is to explain the mechanism without hiding behind a framework call.

Reading focus

Separate weight, activation and KV-cache quantization before discussing memory or quality tradeoffs.
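One way to keep the three targets apart is to note that each one scales with a different quantity: weights with parameter count, activations with the live batch and sequence, and the KV cache with layers, KV heads, and context length. The sketch below is byte accounting only, under stated assumptions: the function names and the model shape (7B parameters, 32 layers, 32 KV heads, head dimension 128) are hypothetical, and scale/zero-point metadata is ignored.

```python
def weight_bytes(n_params: int, bits: int) -> int:
    """Bytes to store the weights at a given bit width (metadata ignored)."""
    return n_params * bits // 8


def activation_bytes(batch: int, seq_len: int, hidden: int, bits: int) -> int:
    """Bytes for ONE hidden-state tensor of shape (batch, seq_len, hidden)."""
    return batch * seq_len * hidden * bits // 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, bits: int) -> int:
    """Bytes for the KV cache; the factor 2 covers keys and values."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bits // 8


if __name__ == "__main__":
    # Hypothetical model shape, chosen only for illustration.
    n_params, n_layers, n_kv_heads, head_dim, hidden = 7_000_000_000, 32, 32, 128, 4096
    seq_len, batch = 4096, 1

    for bits in (16, 8, 4):
        w = weight_bytes(n_params, bits)
        a = activation_bytes(batch, seq_len, hidden, bits)
        kv = kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, bits)
        print(f"{bits:>2}-bit  weights={w / 2**30:.2f} GiB  "
              f"one activation={a / 2**20:.1f} MiB  kv cache={kv / 2**30:.2f} GiB")
```

Weight bytes stay fixed as the context grows, while the KV-cache term grows linearly with `seq_len` and `batch`, which is why the two are worth discussing separately.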

Annotated sketch

## Quantization reading table

| Target | Saves | Main risk |
| --- | --- | --- |
| Weights | model memory / bandwidth | dequantization overhead and accuracy loss |
| Activations | temporary memory | calibration sensitivity |
| KV cache | long-context serving memory | attention-score error |
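The mechanism behind every row of the table is the same round trip: map floats to a small integer grid, then map back and accept the rounding error. Below is a minimal sketch of symmetric per-tensor quantization in plain Python, with no framework call. It is one common scheme, not necessarily the one the lab uses, and the names `quantize_symmetric` and `dequantize` are hypothetical.

```python
def quantize_symmetric(values, bits=8):
    """Symmetric per-tensor quantization: one scale, zero-point fixed at 0."""
    qmax = 2 ** (bits - 1) - 1                  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to [-qmax, qmax].
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map integer levels back to approximate float values."""
    return [qi * scale for qi in q]


def max_abs_error(values, bits=8):
    """Largest round-trip error for one tensor."""
    q, scale = quantize_symmetric(values, bits)
    restored = dequantize(q, scale)
    return max(abs(a - b) for a, b in zip(values, restored))


if __name__ == "__main__":
    well_behaved = [0.02, -0.5, 0.31, 1.27]
    with_outlier = [0.02, -0.5, 0.31, 100.0]    # one large value stretches the scale

    for name, tensor in (("well-behaved", well_behaved), ("with outlier", with_outlier)):
        q, scale = quantize_symmetric(tensor)
        print(name, "levels:", q, "scale:", round(scale, 4),
              "max error:", round(max_abs_error(tensor), 4))
```

In the second tensor a single outlier sets the scale, so the small values collapse onto the integer levels 0 and -1. This is a toy version of the calibration sensitivity listed for activations, and the same rounding error applied to cached keys is what perturbs attention scores.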

What to explain

Common trap