# Lab 11: Quantization Comparison

This lab is a reading plan for precision tradeoffs. It avoids assuming a
specific quantization library and focuses on how to reason about what is
quantized, what is saved, and what can break.

## Reading focus

- FP16/BF16 is the baseline for quality and common kernel support.
- INT8 may save memory and bandwidth, but it often requires calibration and outlier handling.
- INT4 weight-only mainly compresses weights; it does not imply every operation is INT4.
- KV cache quantization targets serving capacity for long contexts and high concurrency.
- Kernel support determines whether a lower bit-width actually turns into lower latency.
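
To make the INT8 point about outlier handling concrete, here is a minimal sketch of symmetric per-tensor quantization in plain Python. It is not taken from any particular library; the function names (`quantize_int8`, `dequantize`, `mean_abs_error`) and the sample values are assumptions chosen for illustration. It shows that a single large value stretches the shared scale and raises the rounding error on all the small values.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: one scale shared by every value."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the signed 8-bit range.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map integer codes back to approximate real values."""
    return [x * scale for x in q]


def mean_abs_error(values):
    """Average absolute round-trip error after quantize + dequantize."""
    q, scale = quantize_int8(values)
    restored = dequantize(q, scale)
    return sum(abs(a - b) for a, b in zip(values, restored)) / len(values)


# Hypothetical data: small values in [-0.5, 0.5], then the same plus one outlier.
typical = [0.01 * i for i in range(-50, 51)]
with_outlier = typical + [60.0]

err_typical = mean_abs_error(typical)
err_outlier = mean_abs_error(with_outlier)

print(f"mean abs error without outlier: {err_typical:.5f}")
print(f"mean abs error with outlier:    {err_outlier:.5f}")
```

In this toy setup the outlier forces most small values onto only three integer codes (-1, 0, 1), which is why calibration and finer-grained scales (per-channel or per-group) come up in INT8 reading material.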

## File

- `quantization_comparison_plan.md`

## Questions to answer while reading

- What tensor is quantized: weights, activations, or the KV cache?
- What is the quality risk?
- What kernel path must exist for this to help performance?
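
When answering the first question, a back-of-the-envelope memory estimate helps separate weight savings from KV cache savings. The sketch below assumes a hypothetical model shape (7 billion parameters, 32 layers, 32 KV heads, head dimension 128) and a hypothetical serving load (4096 tokens per sequence, batch of 8); none of these numbers come from this lab. It also ignores the extra bytes that real schemes spend on scales and zero points.

```python
def weight_bytes(num_params, bits):
    """Bytes needed to store the weights at a given bit-width."""
    return num_params * bits // 8


def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, bits):
    """Bytes needed for the KV cache at a given bit-width.

    The leading 2 accounts for one key tensor and one value tensor per layer.
    """
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bits // 8


GIB = 1024 ** 3

# Hypothetical model and serving configuration (assumptions for illustration).
num_params = 7_000_000_000
shape = dict(layers=32, kv_heads=32, head_dim=128, seq_len=4096, batch=8)

for bits in (16, 8, 4):
    w = weight_bytes(num_params, bits) / GIB
    kv = kv_cache_bytes(bits=bits, **shape) / GIB
    print(f"{bits:>2}-bit  weights: {w:6.2f} GiB   KV cache: {kv:6.2f} GiB")
```

Under these assumptions the 16-bit KV cache (16 GiB) is larger than the 16-bit weights (about 13 GiB), so weight-only INT4 would leave the biggest consumer untouched. Smaller memory alone does not answer the third question: latency improves only if a kernel exists that reads the compressed format efficiently.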
