Systems Runtime Interview Practice
Make lifetime, overload behavior, communication layers, and evidence explicit.
Q&A Cards
01 Why is RAII relevant to an accelerator-serving runtime?
Short Answer
Asynchronous request paths own sockets, buffers, stream events, and communicator state. RAII makes cleanup deterministic when success, cancellation, timeout, or exceptions take different control paths.
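To make the point about control paths concrete, here is a minimal sketch. It assumes a hypothetical `ScopedResource` guard and a `ReleaseLog` that stands in for the real release calls (closing a socket, freeing a device buffer, destroying a stream event); none of these names come from an actual runtime. The same three guards are released in the same order whether the request finishes, returns early on a timeout, or throws.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the runtime's release calls: it only records
// which resources were released, and in what order.
struct ReleaseLog {
    std::vector<std::string> released;
};

// Non-copyable RAII guard: the destructor releases the resource exactly once,
// whichever control path leaves the owning scope.
class ScopedResource {
public:
    ScopedResource(ReleaseLog& log, std::string name)
        : log_(log), name_(std::move(name)) {}
    ScopedResource(const ScopedResource&) = delete;
    ScopedResource& operator=(const ScopedResource&) = delete;
    ~ScopedResource() { log_.released.push_back(name_); }

private:
    ReleaseLog& log_;
    std::string name_;
};

enum class Outcome { Success, Timeout, Exception };

// Simulates one request that can leave the function by three different paths.
bool handle_request(ReleaseLog& log, Outcome outcome) {
    ScopedResource socket(log, "socket");
    ScopedResource buffer(log, "buffer");
    ScopedResource event(log, "event");

    if (outcome == Outcome::Timeout) return false;  // early return
    if (outcome == Outcome::Exception) {
        throw std::runtime_error("simulated failure");  // unwinding path
    }
    return true;  // normal completion
}

// Runs one path and returns the order in which resources were released.
std::vector<std::string> released_after(Outcome outcome) {
    ReleaseLog log;
    try {
        handle_request(log, outcome);
    } catch (const std::runtime_error&) {
        // Stack unwinding has already run the guards' destructors.
    }
    return log.released;
}

// Usage: every exit path yields the same reverse-construction cleanup order.
void demo_raii() {
    const std::vector<std::string> expected{"event", "buffer", "socket"};
    assert(released_after(Outcome::Success) == expected);
    assert(released_after(Outcome::Timeout) == expected);
    assert(released_after(Outcome::Exception) == expected);
}
```

Destructors run in reverse order of construction, so dependent resources (an event recorded on a buffer, for example) are released before what they depend on, as long as they are declared in dependency order.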
02 Why should a serving queue be bounded?
Short Answer
A bounded queue turns overload into an explicit admission policy before in-flight buffers exhaust memory and tail latency grows without limit. Rejecting work can be the reliability-preserving answer.
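One way to picture admission control is a queue whose push can fail. The sketch below assumes a hypothetical `BoundedQueue` with a fixed capacity; the class name, the `try_push`/`try_pop` interface, and the rejection counter are illustrative choices, not an API from any particular serving framework.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

// Hypothetical bounded admission queue. When full, it rejects new work
// instead of blocking the caller or growing without limit.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {}

    // Returns false when the queue is full; the caller can then shed load,
    // for example by answering with an "overloaded, retry later" status.
    bool try_push(T item) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.size() >= capacity_) {
            ++rejected_;
            return false;
        }
        items_.push_back(std::move(item));
        return true;
    }

    // Moves the oldest item into `out`; returns false if the queue is empty.
    bool try_pop(T& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop_front();
        return true;
    }

    // Number of rejected pushes so far: a direct overload signal to monitor.
    std::size_t rejected() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return rejected_;
    }

private:
    mutable std::mutex mutex_;
    std::deque<T> items_;
    const std::size_t capacity_;
    std::size_t rejected_ = 0;
};

// Usage: the third request is rejected; draining one slot admits work again.
void demo_bounded_queue() {
    BoundedQueue<int> queue(2);
    assert(queue.try_push(1));
    assert(queue.try_push(2));
    assert(!queue.try_push(3));  // full: explicit rejection
    int request = 0;
    assert(queue.try_pop(request) && request == 1);
    assert(queue.try_push(4));   // capacity freed, admitted
    assert(queue.rejected() == 1);
}
```

Because capacity is fixed, worst-case memory held by queued requests is bounded, and queue wait is bounded by capacity times per-request service time rather than by how long the overload lasts.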
03 How do TCP, NVLink, and NCCL differ, and how would you explain the difference?
Short Answer
TCP carries request/response traffic between services; NVLink is a physical high-bandwidth interconnect between devices, typically GPUs within a node; NCCL is a library that implements GPU collective communication over whatever transports and topology are available. They sit at different system layers and are not interchangeable optimizations.
| Term | System layer | Moves or provides | Not interchangeable with |
|---|---|---|---|
| TCP | Network transport. | Byte streams for service communication. | A GPU-local link or collective library. |
| NVLink | Device interconnect. | Fast local GPU-to-GPU connectivity. | A request protocol or collective API. |
| NCCL | GPU collective library. | Collectives mapped onto available transports. | The physical link itself. |
Source: NCCL User Guide
04 How would you diagnose a distributed-serving timeout?
Short Answer
Correlate request deadlines, queue waits, cancellation logs, kernel timelines, collective durations, and asynchronous communicator errors on a shared timeline. Then decide whether the root cause is admission overload, compute delay, transfer delay, collective failure, or cleanup failure.
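The decision step can be sketched as a small classifier over one correlated trace. Everything here is an assumption made for illustration: the `RequestTrace` fields, the rule that explicit errors take precedence over timings, and the "largest phase wins" heuristic. A real investigation would weigh many requests and more signals than this.

```cpp
#include <cassert>

// Hypothetical per-request trace assembled from logs and timelines.
// Field names are illustrative, not a real telemetry schema.
struct RequestTrace {
    double deadline_ms;
    double queue_wait_ms;     // time spent waiting for admission
    double compute_ms;        // from kernel timelines
    double transfer_ms;       // host/device or network transfers
    double collective_ms;     // time inside collectives
    bool communicator_error;  // asynchronous communicator error observed
    bool cleanup_error;       // cancellation or teardown did not complete
};

enum class RootCause {
    None,
    AdmissionOverload,
    ComputeDelay,
    TransferDelay,
    CollectiveFailure,
    CleanupFailure
};

// Assumed heuristic: explicit errors first, then the largest timed phase
// of a request that missed its deadline.
RootCause classify(const RequestTrace& t) {
    if (t.communicator_error) return RootCause::CollectiveFailure;
    if (t.cleanup_error) return RootCause::CleanupFailure;

    const double total =
        t.queue_wait_ms + t.compute_ms + t.transfer_ms + t.collective_ms;
    if (total <= t.deadline_ms) return RootCause::None;

    RootCause cause = RootCause::AdmissionOverload;
    double worst = t.queue_wait_ms;
    if (t.compute_ms > worst) {
        worst = t.compute_ms;
        cause = RootCause::ComputeDelay;
    }
    if (t.transfer_ms > worst) {
        worst = t.transfer_ms;
        cause = RootCause::TransferDelay;
    }
    if (t.collective_ms > worst) {
        cause = RootCause::CollectiveFailure;  // slow collective, no error raised
    }
    return cause;
}

// Usage: a request that spent most of its budget queued points at admission.
void demo_classify() {
    const RequestTrace queued{100.0, 80.0, 20.0, 5.0, 5.0, false, false};
    assert(classify(queued) == RootCause::AdmissionOverload);
}
```

The value of writing the rule down, even this crudely, is that it forces each signal in the answer to map to a distinct hypothesis that the evidence can confirm or rule out.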
