InfraLens

A clear starting point for learning AI infrastructure.

Labs

Make queue and resource behavior observable

One portable C++ artifact and one Python simulation anchor the runtime discussion in explicit state changes.

Lab 01 · C++20 Runnable

RAII Bounded Queue

Read a portable queue implementation in which a fixed capacity rejects excess work and the object's lifetime releases the state it owns, with no manual cleanup path.

clang++ -std=c++20 -Wall -Wextra -pedantic tracks/systems-runtime/labs/code/bounded_queue.cpp -o /tmp/infralens_bounded_queue
/tmp/infralens_bounded_queue

Lab 02 · Runnable

Backpressure Simulation

Run deterministic admission, rejection, and completion events; then connect queue high-water marks to the overload narrative.
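As a rough illustration of the kind of run described above, here is a minimal sketch of a deterministic backpressure loop. It is not the lab's actual script: the function name `simulate`, its parameters, and the arrival schedule are hypothetical, and the service model (one completion every few ticks) is an assumption chosen to keep the example small.

```python
from collections import deque


def simulate(arrivals, capacity, service_every):
    """Replay a fixed arrival schedule against a bounded queue.

    arrivals[t] is the number of requests that arrive at tick t.
    One queued request completes on every tick divisible by
    `service_every` (a simplifying assumption, not a real service model).
    Returns the ordered event log and the queue's high-water mark.
    """
    queue = deque()
    events = []      # (tick, kind, request_id)
    high_water = 0
    next_id = 0

    for tick, count in enumerate(arrivals):
        # Complete first, so freed capacity is visible to this tick's arrivals.
        if queue and tick % service_every == 0:
            events.append((tick, "complete", queue.popleft()))

        for _ in range(count):
            if len(queue) < capacity:
                queue.append(next_id)
                events.append((tick, "admit", next_id))
            else:
                # Backpressure: the request is refused, not buffered.
                events.append((tick, "reject", next_id))
            next_id += 1

        high_water = max(high_water, len(queue))

    return events, high_water


if __name__ == "__main__":
    log, peak = simulate([3, 0, 2, 0], capacity=2, service_every=2)
    for event in log:
        print(event)
    print("high-water mark:", peak)
```

With the schedule `[3, 0, 2, 0]`, a capacity of 2, and one completion every 2 ticks, the log contains three admissions, two rejections, and one completion, and the high-water mark equals the capacity. A high-water mark pinned at capacity alongside rejections is the signal to tie back to the overload narrative.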

Lab 03 · Decision record

Transport and Timeout Diagnosis

Given a stalled multi-GPU serving request, list the evidence for each stage: request transport, device transfer, collective synchronization, GPU work, and cleanup. Use the communication estimator only as a lower bound, then specify the trace required to verify it.

Reference: NCCL collective operations and error handling.
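To make the "lower bound" idea in Lab 03 concrete, here is a small sketch of one possible estimator. It is an assumption, not the communication estimator this lab refers to: the function name and parameters are hypothetical, and it uses only the standard bandwidth term for a ring all-reduce, ignoring latency, protocol overhead, and contention.

```python
def ring_allreduce_lower_bound_s(message_bytes, num_gpus, link_bytes_per_s):
    """Bandwidth-only lower bound on ring all-reduce time, in seconds.

    In a ring all-reduce each GPU sends and receives about
    2 * (N - 1) / N times the message size, so the transfer cannot
    finish faster than that volume divided by the per-link bandwidth.
    Latency and contention are ignored, which is why the result is
    a floor and not a prediction.
    """
    if link_bytes_per_s <= 0:
        raise ValueError("link_bytes_per_s must be positive")
    if num_gpus < 2:
        return 0.0  # a single GPU has nothing to exchange
    volume_factor = 2 * (num_gpus - 1) / num_gpus
    return volume_factor * message_bytes / link_bytes_per_s


if __name__ == "__main__":
    # Hypothetical numbers: 1 GB message, 8 GPUs, 100 GB/s per link.
    bound = ring_allreduce_lower_bound_s(1e9, 8, 100e9)
    print(f"lower bound: {bound * 1e3:.1f} ms")
```

An observed collective time far above this floor does not say which stage stalled. It only rules out raw link bandwidth as the full explanation, which is why the lab asks for a trace to confirm the diagnosis.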