KV Cache Calculator
What it estimates: KV cache memory for a given batch size and context length.
When to use it: before increasing context length, concurrency, or KV precision.
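As a rough illustration of the kind of estimate involved, here is a minimal sketch that assumes a standard transformer decoder storing one key and one value vector per token, per layer, per KV head. The function name and parameters (`kv_cache_bytes`, `num_kv_heads`, `head_dim`, `bytes_per_element`) are hypothetical and not taken from the calculator itself, and real serving stacks add paging and fragmentation overhead on top.

```python
def kv_cache_bytes(batch_size, seq_len, num_layers, num_kv_heads, head_dim,
                   bytes_per_element=2):
    """Estimate KV cache size in bytes (assumed formula, not the calculator's code).

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_element is 2 for FP16/BF16 and 1 for an 8-bit KV cache.
    """
    return (2 * batch_size * seq_len * num_layers
            * num_kv_heads * head_dim * bytes_per_element)


# Example: batch of 8, 4096-token context, 32 layers, 32 KV heads of dim 128, FP16.
size = kv_cache_bytes(8, 4096, 32, 32, 128, bytes_per_element=2)
print(f"{size / 2**30:.1f} GiB")  # 16.0 GiB
```

Every factor enters linearly, so doubling the context length, the batch size, or the bytes per element doubles the estimate.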
Small calculators for building system intuition. They estimate orders of magnitude, not exact production capacity.
What it estimates: the number of attention score elements, and the memory needed to materialize them, under naive attention.
When to use it: when sequence length or visual tokens make attention the bottleneck.
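A minimal sketch of this estimate, assuming "naive" means the full score matrix is materialized for every head (no fused or tiled attention kernel). The function names and parameters are hypothetical, chosen only for illustration.

```python
def attention_score_elements(batch_size, num_heads, q_len, kv_len=None):
    """Number of entries in the full attention score tensor.

    Assumes one q_len x kv_len score matrix per head per sequence.
    For self-attention, kv_len defaults to q_len.
    """
    if kv_len is None:
        kv_len = q_len
    return batch_size * num_heads * q_len * kv_len


def attention_score_bytes(batch_size, num_heads, q_len, kv_len=None,
                          bytes_per_element=2):
    """Memory for the score tensor if it is fully materialized."""
    return attention_score_elements(batch_size, num_heads, q_len, kv_len) * bytes_per_element


# Example: one sequence, 32 heads, 8192 tokens, FP16 scores.
print(attention_score_elements(1, 32, 8192))                 # 2147483648
print(attention_score_bytes(1, 32, 8192) / 2**30, "GiB")     # 4.0 GiB
```

The quadratic term is what matters here: doubling the sequence length quadruples both the element count and the memory.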
What it estimates: the number of image tokens and, for video, spacetime tokens.
When to use it: before changing resolution, frame count, patch size, or temporal patching.
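A minimal sketch of the token counts, assuming ViT-style non-overlapping square patches and a temporal patch that groups consecutive frames. The names are hypothetical, and the floor division (dropping any partial patch) is an assumption; some models pad or resize instead.

```python
def image_tokens(height, width, patch_size):
    """Tokens for one image split into patch_size x patch_size patches.

    Assumption: pixels that do not fill a whole patch are dropped.
    """
    return (height // patch_size) * (width // patch_size)


def spacetime_tokens(frames, height, width, patch_size, temporal_patch=1):
    """Tokens for a video clip.

    temporal_patch frames are merged into one temporal slice, so
    temporal_patch=1 means every frame is tokenized independently.
    """
    return (frames // temporal_patch) * image_tokens(height, width, patch_size)


print(image_tokens(224, 224, 16))                             # 196
print(spacetime_tokens(16, 224, 224, 16, temporal_patch=2))   # 1568
```

Halving the patch size quadruples the spatial token count, which in turn feeds the quadratic cost of attention.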
What it estimates: parameter memory before and after quantization.
When to use it: when comparing FP16/BF16 weights with int8 or int4 weights, including the overhead of quantization scales.
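A minimal sketch of this comparison, assuming group-wise quantization with one scale per group of weights and no zero-points. The function names, the default `group_size` of 128, and the 16-bit scale are illustrative assumptions, not values taken from the calculator.

```python
def weight_bytes(num_params, bits_per_weight):
    """Parameter memory in bytes at a given precision (rounded up)."""
    return (num_params * bits_per_weight + 7) // 8


def quantized_weight_bytes(num_params, bits_per_weight, group_size=128, scale_bits=16):
    """Quantized weights plus one scale per group (assumed scheme).

    Zero-points and any layers kept in higher precision are ignored.
    """
    num_groups = -(-num_params // group_size)  # ceiling division
    scale_bytes = (num_groups * scale_bits + 7) // 8
    return weight_bytes(num_params, bits_per_weight) + scale_bytes


params = 7_000_000_000
print(weight_bytes(params, 16))            # 14000000000  (FP16/BF16)
print(quantized_weight_bytes(params, 8))   # 7109375000   (int8 + scales)
print(quantized_weight_bytes(params, 4))   # 3609375000   (int4 + scales)
```

With these assumptions the scales add 16/128 = 0.125 bits per weight, so int4 costs about 4.125 effective bits per parameter rather than exactly 4.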