Coding Practice Labs

Labs

Resource estimation and whiteboard drills

Write your answer before opening the reference program. Each estimate must state exclusions and a measurement plan.

Sprint · LLM

Complete LLM Whiteboard Sprint

Start here when the prompt asks for model mechanics rather than only a resource ledger. Move from attention and KV cache to losses, ordinary decoding policy, LoRA adaptation, and MoE routing; each topic ends in a validation statement.

Open whiteboard sprint · Open transformer_whiteboard_primitives.py · Open decoding_strategies_whiteboard.py · Open lora_finetuning_whiteboard.py

Lab 01 · Runnable

Memory, Collective, and Pipeline Bounds

For a 7B-parameter model with FP16 parameters and gradients (2 bytes each per parameter), 8 bytes per parameter of optimizer state, and data-parallel degree DP=8, compute the ZeRO-3 persistent bytes per rank. Then compute the ring all-reduce payload each rank sends for a 2 GB tensor at world size 8, and the pipeline bubble fraction for four stages with sixteen microbatches.

Open reference implementation · Verify interactively

Expected checks: 10.5 GB of persistent state per rank under the stated decimal-byte convention (1 GB = 10^9 bytes); 3.5 GB ring payload sent per rank; a bubble estimate of 3/19, about 15.79%.
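The three expected checks can be reproduced with a few lines of arithmetic. The sketch below is not the reference program; the function names are hypothetical. It assumes ZeRO-3 shards parameters, gradients, and optimizer state evenly across the DP ranks, that ring all-reduce runs as a reduce-scatter followed by an all-gather (each rank sends 2(N-1)/N of the tensor), and that the bubble is the non-interleaved estimate (p-1)/(m+p-1). Activations, buffers, and fragmentation are excluded.

```python
GB = 10**9  # decimal-byte convention: 1 GB = 10^9 bytes


def zero3_persistent_bytes_per_rank(n_params, dp,
                                    param_bytes=2, grad_bytes=2, optim_bytes=8):
    """Persistent bytes per rank, assuming all three states are sharded evenly over dp ranks."""
    return n_params * (param_bytes + grad_bytes + optim_bytes) / dp


def ring_allreduce_bytes_sent_per_rank(tensor_bytes, world_size):
    """Bytes each rank sends: reduce-scatter plus all-gather, (N-1)/N of the tensor each."""
    return 2 * (world_size - 1) / world_size * tensor_bytes


def pipeline_bubble_fraction(stages, microbatches):
    """Idle fraction for a non-interleaved schedule: (p - 1) / (m + p - 1)."""
    return (stages - 1) / (microbatches + stages - 1)


if __name__ == "__main__":
    state = zero3_persistent_bytes_per_rank(7 * 10**9, dp=8)
    ring = ring_allreduce_bytes_sent_per_rank(2 * GB, world_size=8)
    bubble = pipeline_bubble_fraction(stages=4, microbatches=16)
    print(f"ZeRO-3 persistent state per rank: {state / GB:.2f} GB")
    print(f"Ring all-reduce sent per rank:    {ring / GB:.2f} GB")
    print(f"Pipeline bubble fraction:         {bubble:.2%}")
```

A measurement plan would compare these figures against observed allocator statistics and collective timings, since the formulas leave out the exclusions listed above.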

Lab 02 · Serving

KV Cache and Speculative Decode

Use the existing KV cache formula to set a concurrency bound, then state draft length, acceptance rate, and overhead assumptions before estimating the number of speculative passes. The estimate is invalid without sampling-correct verification (an acceptance rule that preserves the target model's output distribution) and measured batching behavior.
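A minimal sketch of this estimate follows, with every input an assumption to be stated and then measured. It uses the common per-token KV formula 2 × layers × KV heads × head dimension × bytes per element, which may differ from the course's own formula. It also assumes each drafted token is accepted independently with a fixed rate, which gives (1 - a^(k+1)) / (1 - a) expected tokens per target pass. The function names, model shape, memory budget, and draft cost ratio are all hypothetical.

```python
import math


def kv_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Keys plus values for one token across all layers (assumed common formula)."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def max_concurrent_sequences(kv_budget_bytes, max_seq_len, per_token_bytes):
    """Upper bound on sequences if each reserves a full max_seq_len of cache."""
    return kv_budget_bytes // (max_seq_len * per_token_bytes)


def expected_tokens_per_pass(accept_rate, draft_len):
    """Expected tokens emitted per target pass under i.i.d. acceptance (an assumption)."""
    if accept_rate >= 1.0:
        return draft_len + 1
    return (1 - accept_rate ** (draft_len + 1)) / (1 - accept_rate)


def estimated_target_passes(n_tokens, accept_rate, draft_len):
    """Target-model passes needed to emit n_tokens, rounded up."""
    return math.ceil(n_tokens / expected_tokens_per_pass(accept_rate, draft_len))


def estimated_speedup(accept_rate, draft_len, draft_cost_ratio):
    """Speedup over plain decoding; draft_cost_ratio is draft step cost / target step cost."""
    cost_per_pass = draft_len * draft_cost_ratio + 1
    return expected_tokens_per_pass(accept_rate, draft_len) / cost_per_pass


if __name__ == "__main__":
    # Hypothetical model shape and memory budget, not taken from the lab.
    per_token = kv_bytes_per_token(n_layers=32, n_kv_heads=8, head_dim=128)
    bound = max_concurrent_sequences(40 * 10**9, max_seq_len=4096,
                                     per_token_bytes=per_token)
    print(f"KV bytes per token: {per_token}")
    print(f"Concurrency bound:  {bound} sequences")
    print(f"Tokens per pass:    {expected_tokens_per_pass(0.8, 4):.4f}")
    print(f"Target passes:      {estimated_target_passes(1000, 0.8, 4)}")
    print(f"Estimated speedup:  {estimated_speedup(0.8, 4, 0.05):.2f}x")
```

The speedup figure ignores batching effects and verification overhead beyond the draft cost ratio, which is why measured batching behavior is required before the estimate is trusted.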

KV concept · Speculation concept · KV reference implementation

Lab 03 · Architecture

Design Prompt Checklist

Choose one: high-throughput LLM serving, a 100B training job, distributed checkpointing, or RL rollout infrastructure. Deliver workload assumptions, memory ledger, communication path, fault recovery, four metrics, and one experiment that could invalidate your design.