Resource estimation and whiteboard drills
Write your answer before opening the reference program. Each estimate must state exclusions and a measurement plan.
Complete LLM Whiteboard Sprint
Start here when the prompt asks for model mechanics rather than only a resource ledger. Move from attention and KV cache to losses, ordinary decoding policy, LoRA adaptation, and MoE routing; each topic ends in a validation statement.
Memory, Collective, and Pipeline Bounds
For a 7B model, FP16 parameters/gradients, 8 bytes per parameter of optimizer state, and DP=8, compute ZeRO-3 persistent bytes per rank. Then compute ring all-reduce payload for a 2 GB tensor at world size 8 and pipeline bubble for four stages with sixteen microbatches.
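Expected checks: 10.5 GB of persistent state per rank under the stated decimal-byte convention (7B parameters × 12 bytes / 8 ranks); 3.5 GB sent per rank in the ring all-reduce (2 × 7/8 × 2 GB); a bubble fraction of about 15.79% (3/19, from (p - 1)/(m + p - 1) with p = 4 stages and m = 16 microbatches).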
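The three expected checks can be reproduced with a short script. This is a sketch, not the reference program, and the function names are hypothetical. It assumes decimal gigabytes (10^9 bytes), reports the ring figure as bytes sent per rank under a reduce-scatter plus all-gather ring, and uses the (p - 1)/(m + p - 1) idle-fraction convention for the bubble, which is the one that yields 15.79% for these inputs.

```python
GB = 10**9  # decimal gigabyte, matching the stated convention


def zero3_persistent_bytes_per_rank(params, param_bytes, grad_bytes, optim_bytes, dp):
    """ZeRO-3 shards parameters, gradients, and optimizer state across dp ranks."""
    return params * (param_bytes + grad_bytes + optim_bytes) / dp


def ring_allreduce_bytes_sent_per_rank(tensor_bytes, world_size):
    """Reduce-scatter plus all-gather: each rank sends 2 * (N - 1) / N of the tensor."""
    return 2 * (world_size - 1) / world_size * tensor_bytes


def pipeline_bubble_fraction(stages, microbatches):
    """Idle fraction of a simple fill-and-drain schedule: (p - 1) / (m + p - 1)."""
    return (stages - 1) / (microbatches + stages - 1)


state = zero3_persistent_bytes_per_rank(
    params=7 * 10**9, param_bytes=2, grad_bytes=2, optim_bytes=8, dp=8
)
ring = ring_allreduce_bytes_sent_per_rank(tensor_bytes=2 * GB, world_size=8)
bubble = pipeline_bubble_fraction(stages=4, microbatches=16)

print(f"ZeRO-3 persistent state per rank: {state / GB:.2f} GB")  # 10.50 GB
print(f"Ring all-reduce sent per rank:    {ring / GB:.2f} GB")   # 3.50 GB
print(f"Pipeline bubble fraction:         {bubble:.2%}")         # 15.79%
```

These figures exclude activations, temporary all-gather buffers, and fragmentation, so a written answer should list those as exclusions and name how each would be measured.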
KV Cache and Speculative Decode
Use the existing KV cache formula to set a concurrency bound. Before estimating the number of target-model passes under speculative decoding, state the draft length, acceptance rate, and overhead assumptions. The estimate is invalid without sampling-correct verification (acceptance that preserves the target model's output distribution) and measured batching behavior.
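A minimal sketch of both estimates follows. Everything in it is an assumption made for illustration: the function names, the model shape (32 layers, 32 KV heads, head dimension 128, FP16 cache), the 40 GB cache budget, and the speculative parameters. The KV formula is the common 2 × layers × KV heads × head dimension × bytes per element per token, which may differ from the formula used elsewhere in this material. The speculative estimate assumes each drafted token is accepted independently with the same probability, which real workloads only approximate.

```python
import math


def kv_bytes_per_token(layers, kv_heads, head_dim, elem_bytes=2):
    """Bytes of K and V stored per token across all layers (factor 2 = K plus V)."""
    return 2 * layers * kv_heads * head_dim * elem_bytes


def max_concurrent_sequences(cache_budget_bytes, context_tokens, bytes_per_token):
    """Upper bound on sequences that fit if each reserves a full context window."""
    return cache_budget_bytes // (context_tokens * bytes_per_token)


def expected_tokens_per_target_pass(draft_len, acceptance):
    """Expected tokens emitted per verification pass under i.i.d. acceptance.

    Geometric sum 1 + a + a^2 + ... + a^k = (1 - a^(k+1)) / (1 - a).
    """
    if acceptance >= 1.0:
        return draft_len + 1
    return (1 - acceptance ** (draft_len + 1)) / (1 - acceptance)


def estimated_speedup(draft_len, acceptance, draft_cost_ratio):
    """Tokens per pass divided by relative cost of one pass plus draft_len draft steps."""
    tokens = expected_tokens_per_target_pass(draft_len, acceptance)
    return tokens / (1 + draft_len * draft_cost_ratio)


# Hypothetical model shape and budget, chosen only to exercise the formulas.
per_token = kv_bytes_per_token(layers=32, kv_heads=32, head_dim=128)
bound = max_concurrent_sequences(40 * 10**9, context_tokens=4096, bytes_per_token=per_token)

# Hypothetical speculative-decoding assumptions; state yours before estimating.
tokens_per_pass = expected_tokens_per_target_pass(draft_len=4, acceptance=0.8)
passes_for_1000 = math.ceil(1000 / tokens_per_pass)
speedup = estimated_speedup(draft_len=4, acceptance=0.8, draft_cost_ratio=0.05)

print(per_token, bound)
print(round(tokens_per_pass, 4), passes_for_1000, round(speedup, 3))
```

The concurrency bound is pessimistic when a paged allocator lets sequences hold less than a full context, and the speedup figure ignores verification batching effects, so both numbers need a measurement plan before they are trusted.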
Design Prompt Checklist
Choose one: high-throughput LLM serving, a 100B training job, distributed checkpointing, or RL rollout infrastructure. Deliver workload assumptions, memory ledger, communication path, fault recovery, four metrics, and one experiment that could invalidate your design.
