InfraLens - Coding Practice

Implementation Practice

Coding Practice

Implement small mechanisms, calculate resource bounds, and defend the assumptions that determine a system design.

Exercise loop

state and shapes -> formula -> implementation
      -> smoke test -> measured caveat -> interview explanation

Whiteboard path

LLM Whiteboard -> Runnable primitive example -> Decoding example -> LoRA example -> Examples browser -> Interview prompts

Model Primitives

Start with the complete LLM whiteboard sprint

Live-coding primitives

Practice QKV shapes, MHA/MQA/GQA/MLA cache organization, decoder flow, KV-cache accounting, post-training losses, decoding policy, LoRA, and MoE routing.
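KV-cache accounting reduces to one product, and the MHA/MQA/GQA choice only changes the number of cached KV heads. A minimal sketch, assuming keys and values are both cached for every layer, KV head, and token at a uniform element size (2 bytes for bf16/fp16). `kv_cache_bytes` and `kv_heads_for` are hypothetical helper names, and the shapes are illustrative rather than taken from a specific model:

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, bytes_per_elem: int = 2) -> int:
    # Factor 2 = one K tensor plus one V tensor per layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem


def kv_heads_for(variant: str, n_heads: int, n_groups: int = 8) -> int:
    # MHA: every query head has its own K/V head.
    # MQA: all query heads share a single K/V head.
    # GQA: query heads are split into n_groups, each group sharing one K/V head.
    if variant == "MHA":
        return n_heads
    if variant == "MQA":
        return 1
    if variant == "GQA":
        if n_heads % n_groups != 0:
            raise ValueError("n_heads must be divisible by n_groups")
        return n_groups
    raise ValueError(f"unknown variant: {variant}")


if __name__ == "__main__":
    # Illustrative shapes: 32 layers, 32 query heads, head_dim 128, one 4096-token sequence.
    for variant in ("MHA", "GQA", "MQA"):
        kv = kv_heads_for(variant, n_heads=32, n_groups=8)
        gib = kv_cache_bytes(32, kv, 128, seq_len=4096, batch=1) / 2**30
        print(f"{variant}: {kv} KV heads, {gib:.2f} GiB")
```

With these shapes, MHA holds 2 GiB per sequence, GQA with 8 groups holds 0.5 GiB, and MQA holds 64 MiB. MLA is left out: it caches a compressed latent per token rather than per-head K and V, so it needs its own formula.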

Reference code

Begin with the compact decoder primitive file, then implement logits filtering and low-rank adaptation as separate, checkable exercises.
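The two exercises can be checked independently. A minimal pure-Python sketch, assuming logits arrive as a flat list and matrices as nested lists; `filter_logits`, `lora_forward`, and `merge_lora` are hypothetical names. Filtering applies top-k and then the nucleus (top-p) cut, and the LoRA path uses the low-rank update y = Wx + (alpha / r) * B(Ax):

```python
import math
from typing import List

Matrix = List[List[float]]
NEG_INF = float("-inf")


def filter_logits(logits: List[float], top_k: int = 0, top_p: float = 1.0) -> List[float]:
    """Mask logits outside the top-k set, then outside the top-p nucleus, with -inf."""
    out = list(logits)
    order = sorted(range(len(out)), key=lambda i: out[i], reverse=True)
    if top_k > 0:
        for i in order[top_k:]:
            out[i] = NEG_INF
    if top_p < 1.0:
        m = max(out)  # subtract the max for a numerically stable softmax
        probs = [math.exp(x - m) for x in out]  # exp(-inf) == 0.0
        total = sum(probs)
        cum = 0.0
        for i in order:  # descending probability
            if cum >= top_p:  # mass before this token already covers top_p
                out[i] = NEG_INF
            cum += probs[i] / total
    return out


def matvec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(x: List[float], W: Matrix, A: Matrix, B: Matrix,
                 alpha: float, r: int) -> List[float]:
    """y = W x + (alpha / r) * B (A x).  W: d_out x d_in, A: r x d_in, B: d_out x r."""
    scale = alpha / r
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))  # never materializes the d_out x d_in update
    return [b + scale * d for b, d in zip(base, delta)]


def merge_lora(W: Matrix, A: Matrix, B: Matrix, alpha: float, r: int) -> Matrix:
    """Fold the adapter into the base weight for inference: W + (alpha / r) * B A."""
    scale = alpha / r
    return [[W[i][j] + scale * sum(B[i][k] * A[k][j] for k in range(r))
             for j in range(len(W[0]))] for i in range(len(W))]
```

A useful smoke test is that `merge_lora` followed by a plain matrix-vector product agrees with `lora_forward`, and that a zero-initialized `B` leaves the base output unchanged.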

Runtime Code

Implement only the missing systems checks

The runtime exercises cover rollout capacity, bounded stage scheduling, and queue backpressure. CUDA-specific kernel material remains an annotated reasoning task unless an actual CUDA toolchain and target device are part of the validation environment.
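Queue backpressure is the easiest of these checks to smoke-test in isolation. A minimal sketch, assuming two stages in one process connected by the standard library's `queue.Queue` with a `maxsize` bound; `run_bounded_stages` and its timings are hypothetical:

```python
import queue
import threading
import time
from typing import List, Tuple


def run_bounded_stages(n_items: int, capacity: int,
                       consumer_delay_s: float = 0.001) -> Tuple[List[int], int, int]:
    """Fast producer -> bounded queue -> slow consumer. Returns (consumed, peak depth, stalls)."""
    q = queue.Queue(maxsize=capacity)  # the bound between the two stages
    sentinel = object()
    consumed: List[int] = []
    peak = 0
    stalls = 0

    def producer() -> None:
        nonlocal peak, stalls
        for i in range(n_items):
            try:
                q.put_nowait(i)
            except queue.Full:
                stalls += 1  # backpressure signal: downstream is slower than upstream
                q.put(i)     # block until the consumer frees a slot
            peak = max(peak, q.qsize())
        q.put(sentinel)  # tell the consumer no more work is coming

    def consumer() -> None:
        while True:
            item = q.get()
            if item is sentinel:
                break
            time.sleep(consumer_delay_s)  # simulate a slow downstream stage
            consumed.append(item)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed, peak, stalls


if __name__ == "__main__":
    consumed, peak, stalls = run_bounded_stages(n_items=50, capacity=4)
    print(f"consumed={len(consumed)} peak_depth={peak} stalls={stalls}")
```

The bound does the work: `peak` can never exceed `capacity`, and a nonzero `stalls` count is the kind of signal worth exporting as a metric before deciding whether to add consumer capacity or shed load.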

Whiteboard Estimates

Make dimensions and assumptions explicit

Question	State to write first	Validation surface
Training memory	parameter, gradient, optimizer bytes and sharding degree	Training State estimator
DDP / pipeline cost	payload, world size, effective bandwidth, stages, microbatches	Collective & Pipeline estimator
Speculation	draft size, acceptance, draft overhead	Speculative Decode estimator
RL online capacity	actor production, learner demand, version lag	Rollout Capacity estimator
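Each row reduces to a first-order formula that is worth writing down before opening an estimator. A sketch under stated assumptions: mixed-precision Adam at 16 bytes per parameter with weights, gradients, and optimizer state all sharded; a bandwidth-only ring all-reduce; a synchronous GPipe/1F1B bubble; i.i.d. per-token draft acceptance; and a FIFO rollout buffer. All function names are hypothetical and are not claimed to match the estimators' implementations:

```python
def training_state_bytes(n_params: float, shard_degree: int = 1) -> float:
    # Assumed mixed-precision Adam: bf16 weights (2) + bf16 grads (2) + fp32 master
    # weights (4) + Adam first and second moments (4 + 4) = 16 bytes per parameter.
    # Dividing by shard_degree assumes weights, grads, and optimizer state are all sharded.
    return n_params * 16 / shard_degree


def ring_allreduce_seconds(payload_bytes: float, world_size: int,
                           bandwidth_bytes_per_s: float) -> float:
    # Reduce-scatter + all-gather: each rank sends 2 * (N - 1) / N * payload bytes.
    # Bandwidth term only; per-hop latency is ignored.
    return 2 * (world_size - 1) / world_size * payload_bytes / bandwidth_bytes_per_s


def pipeline_bubble_fraction(stages: int, microbatches: int) -> float:
    # Idle fraction of total time for a synchronous GPipe/1F1B schedule: (p - 1) / (m + p - 1).
    return (stages - 1) / (microbatches + stages - 1)


def speculative_tokens_per_step(alpha: float, k: int) -> float:
    # Expected tokens per target forward pass with k draft tokens and i.i.d.
    # per-token acceptance rate alpha: (1 - alpha^(k+1)) / (1 - alpha).
    if alpha >= 1.0:
        return float(k + 1)
    return (1 - alpha ** (k + 1)) / (1 - alpha)


def speculative_speedup(alpha: float, k: int, draft_cost: float) -> float:
    # draft_cost = cost of one draft forward pass relative to one target forward pass.
    return speculative_tokens_per_step(alpha, k) / (k * draft_cost + 1)


def rollout_balance(actor_samples_per_s: float, learner_samples_per_s: float,
                    buffered_samples: int, batch_size: int) -> tuple:
    # Learner utilization is capped by actor production.
    utilization = min(1.0, actor_samples_per_s / learner_samples_per_s)
    # Hypothetical lag proxy: with a FIFO buffer, a newly produced sample is consumed
    # after roughly buffered_samples / batch_size learner steps (policy versions).
    lag_steps = buffered_samples / batch_size
    return utilization, lag_steps
```

For example, a 7B-parameter model needs about 112 GB of training state unsharded and 14 GB per rank at 8-way sharding, before activations; a 4-stage pipeline with 12 microbatches idles 20% of the time.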

System Design

Answer from a ledger, not from a keyword list

For each architecture prompt: define workload and SLO, identify persistent and transient state, estimate the dominant cost, choose placement and communication, specify failure recovery, then name metrics that could falsify the choice.
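One way to keep an answer honest is to treat the ledger as a record whose empty entries stay visible. A minimal sketch; `DesignLedger` and its field names are hypothetical and simply mirror the steps listed above, and the example values describe an illustrative serving prompt, not a recommended design:

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class DesignLedger:
    """One entry per step of the answer; an empty entry is a skipped step."""
    workload: str = ""
    slo: str = ""
    persistent_state: str = ""
    transient_state: str = ""
    dominant_cost: str = ""
    placement_and_communication: str = ""
    failure_recovery: str = ""
    falsifying_metrics: List[str] = field(default_factory=list)

    def missing(self) -> List[str]:
        """Names of ledger steps that have not been filled in yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


if __name__ == "__main__":
    # Illustrative, partially filled ledger for a hypothetical serving prompt.
    ledger = DesignLedger(
        workload="chat serving, 4k-token prompts",
        slo="p99 time-to-first-token under 1 s",
        persistent_state="model weights",
        transient_state="per-request KV cache",
        dominant_cost="KV-cache memory at peak concurrency",
    )
    print(ledger.missing())
    # ['placement_and_communication', 'failure_recovery', 'falsifying_metrics']
```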

References