state and shapes -> formula -> implementation -> smoke test -> measured caveat -> interview explanation
Coding Practice
Implement small mechanisms, calculate resource bounds, and defend the assumptions that determine a system design.
Start with the complete LLM whiteboard sprint
Live-coding primitives
Practice QKV shapes, MHA/MQA/GQA/MLA cache organization, decoder flow, KV-cache accounting, post-training losses, decoding policy, LoRA, and MoE routing.
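As one way to practice the KV-cache accounting item, the sketch below counts cache bytes assuming each layer stores one K and one V tensor of shape `[batch, n_kv_heads, seq_len, head_dim]`. The function name `kv_cache_bytes` and the example configuration (32 layers, head dimension 128, 4096 tokens, fp16) are illustrative assumptions, not values from the reference code. MLA caches a compressed latent rather than per-head K and V, so it needs a different formula and is left out here.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    """Bytes held by the KV cache.

    Assumes K and V are each [batch, n_kv_heads, seq_len, head_dim] per layer,
    hence the leading factor of 2.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem


# Hypothetical model: 32 layers, 32 query heads, head_dim 128, fp16 cache.
# Only the number of KV heads changes between the three variants.
mha = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096, batch=1)
gqa = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=4096, batch=1)
mqa = kv_cache_bytes(n_layers=32, n_kv_heads=1, head_dim=128, seq_len=4096, batch=1)

for name, size in [("MHA", mha), ("GQA", gqa), ("MQA", mqa)]:
    print(f"{name}: {size / 2**30} GiB")
# MHA: 2.0 GiB
# GQA: 0.5 GiB
# MQA: 0.0625 GiB
```

Under these assumptions the cache scales linearly with the number of KV heads, which is the point to state out loud before writing any attention code.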
Reference code
Begin with the compact decoder primitive file, then implement logits filtering and low-rank adaptation as separate, checkable exercises.
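The two follow-up exercises can be sketched in plain Python so that each is checkable by hand. This is not the reference file: the function names, the list-of-lists tensors, and the row-vector convention `y = x W` are assumptions made for illustration. The LoRA sketch uses the common `alpha / r` scaling with a down-projection `A` of shape `[d_in, r]` and an up-projection `B` of shape `[r, d_out]`.

```python
import math


def softmax(logits):
    """Numerically stable softmax; -inf entries receive probability 0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(logits, k):
    """Keep the k largest logits (k >= 1 assumed); mask the rest with -inf."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    keep = set(order[:k])
    return [x if i in keep else -math.inf for i, x in enumerate(logits)]


def top_p_filter(logits, p):
    """Keep the smallest high-probability prefix whose cumulative mass reaches p."""
    probs = softmax(logits)
    order = sorted(range(len(logits)), key=lambda i: probs[i], reverse=True)
    keep, mass = set(), 0.0
    for i in order:
        keep.add(i)
        mass += probs[i]
        if mass >= p:
            break
    return [x if i in keep else -math.inf for i, x in enumerate(logits)]


def matmul(a, b):
    """Plain-list matrix product: [m, n] x [n, p] -> [m, p]."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_forward(x, w, a, b, alpha):
    """y = x W + (alpha / r) * (x A) B, where r is the number of rows of B."""
    r = len(b)
    base = matmul(x, w)
    delta = matmul(matmul(x, a), b)
    scale = alpha / r
    return [[u + scale * v for u, v in zip(br, dr)] for br, dr in zip(base, delta)]


logits = [2.0, 1.0, 0.0, -1.0]
print(top_k_filter(logits, 2))  # [2.0, 1.0, -inf, -inf]

# Smoke test: with B initialized to zero, the adapter must not change the output.
x = [[1.0, 2.0]]
w = [[1.0, 0.0], [0.0, 1.0]]
print(lora_forward(x, w, a=[[1.0], [1.0]], b=[[0.0, 0.0]], alpha=2.0))  # [[1.0, 2.0]]
```

The zero-initialized `B` check is a useful first smoke test because it fails loudly if the scale or the multiplication order is wrong.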
Implement only the missing systems checks
The runtime exercises cover rollout capacity, bounded stage scheduling, and queue backpressure. CUDA-specific kernel material remains an annotated reasoning task unless an actual CUDA toolchain and target device are part of the validation environment.
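Queue backpressure can be checked without threads by stepping a bounded queue in discrete ticks. The sketch below is a simplified model, not the exercise's implementation: `simulate_backpressure`, its parameters, and the tick-based timing are hypothetical, and a stalled producer is modeled as simply not producing the item in that tick.

```python
from collections import deque


def simulate_backpressure(produce_per_tick, consume_per_tick, capacity, ticks):
    """Step a bounded FIFO queue; the producer stalls whenever the queue is full."""
    queue = deque()
    produced = consumed = stalled = 0
    max_depth = 0
    for _ in range(ticks):
        # Producer phase: each attempt either enqueues or is refused.
        for _ in range(produce_per_tick):
            if len(queue) < capacity:
                queue.append(produced)
                produced += 1
            else:
                stalled += 1  # backpressure: the item is not produced this tick
        max_depth = max(max_depth, len(queue))
        # Consumer phase: drain up to consume_per_tick items.
        for _ in range(min(consume_per_tick, len(queue))):
            queue.popleft()
            consumed += 1
    return {
        "produced": produced,
        "consumed": consumed,
        "stalled": stalled,
        "max_depth": max_depth,
        "final_depth": len(queue),
    }


stats = simulate_backpressure(produce_per_tick=3, consume_per_tick=2, capacity=4, ticks=10)
print(stats)
# {'produced': 22, 'consumed': 20, 'stalled': 8, 'max_depth': 4, 'final_depth': 2}
```

In this model the queue depth never exceeds `capacity`, and once the queue fills the producer's effective rate drops to the consumer's rate. Those are the two properties to assert in a smoke test.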
Make dimensions and assumptions explicit
| Question | State to write first | Validation surface |
|---|---|---|
| Training memory | parameter, gradient, optimizer bytes and sharding degree | Training State estimator |
| DDP / pipeline cost | payload, world size, effective bandwidth, stages, microbatches | Collective & Pipeline estimator |
| Speculation | draft size, acceptance, draft overhead | Speculative Decode estimator |
| RL online capacity | actor production, learner demand, version lag | Rollout Capacity estimator |
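
The first three rows of the table reduce to short closed-form estimates. The sketch below is not the estimators named in the table. It is a back-of-envelope version under loudly stated assumptions: mixed-precision Adam at 16 bytes per parameter with all state sharded evenly, a bandwidth-only ring all-reduce with no latency term, a synchronous pipeline schedule with equal stage times, and an independent per-token acceptance probability for speculative decoding. All function names and example numbers are illustrative.

```python
def training_state_bytes(n_params, param_bytes=2, grad_bytes=2, optim_bytes=12, shard_degree=1):
    """Per-device bytes for parameters, gradients, and optimizer state.

    Assumes every state is sharded evenly across shard_degree devices.
    The default 12 optimizer bytes model fp32 master weights plus two Adam moments.
    """
    return n_params * (param_bytes + grad_bytes + optim_bytes) / shard_degree


def ring_allreduce_seconds(payload_bytes, world_size, bandwidth_bytes_per_s):
    """Bandwidth term of a ring all-reduce: each device moves 2 * (n - 1) / n of the payload."""
    n = world_size
    return 2 * (n - 1) / n * payload_bytes / bandwidth_bytes_per_s


def pipeline_bubble_fraction(stages, microbatches):
    """Idle fraction of a synchronous pipeline with equal stage times."""
    return (stages - 1) / (microbatches + stages - 1)


def expected_tokens_per_step(acceptance, draft_len):
    """Expected tokens emitted per target-model pass, given i.i.d. acceptance."""
    if acceptance >= 1.0:
        return draft_len + 1
    return (1 - acceptance ** (draft_len + 1)) / (1 - acceptance)


def speculative_speedup(acceptance, draft_len, draft_cost_ratio):
    """Speedup over plain decoding; draft_cost_ratio is draft cost / target cost per token."""
    return expected_tokens_per_step(acceptance, draft_len) / (1 + draft_len * draft_cost_ratio)


print(training_state_bytes(7e9, shard_degree=8) / 1e9)  # 14.0 (GB per device)
print(ring_allreduce_seconds(1e9, 8, 1e11))             # 0.0175 (seconds)
print(pipeline_bubble_fraction(stages=4, microbatches=12))  # 0.2
print(round(expected_tokens_per_step(0.8, 4), 4))       # 3.3616
```

Each function takes exactly the state listed in the table's middle column, so a disagreement with a measured number points at a specific assumption (effective bandwidth, uneven stages, correlated acceptance) rather than at the arithmetic.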
Answer from a ledger, not from a keyword list
For each architecture prompt: define workload and SLO, identify persistent and transient state, estimate the dominant cost, choose placement and communication, specify failure recovery, then name metrics that could falsify the choice.
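One way to keep the ledger honest during practice is to write it as a record and check which entries are still blank. The class below is a hypothetical helper, not part of the guide's reference code. Its field names mirror the steps listed above, and the example entries are placeholders.

```python
from dataclasses import dataclass, fields


@dataclass
class DesignLedger:
    """One entry per step of the architecture answer; empty means not yet addressed."""
    workload_and_slo: str = ""
    persistent_state: str = ""
    transient_state: str = ""
    dominant_cost: str = ""
    placement_and_communication: str = ""
    failure_recovery: str = ""
    falsifying_metrics: str = ""

    def missing(self):
        """Names of ledger entries that are still blank, in answer order."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


ledger = DesignLedger(
    workload_and_slo="placeholder: chat serving, latency target to be stated",
    dominant_cost="placeholder: KV-cache memory at long context",
)
print(ledger.missing())
# ['persistent_state', 'transient_state', 'placement_and_communication',
#  'failure_recovery', 'falsifying_metrics']
```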
