Serving Substrate

Systems Runtime Interview Practice

Make lifetime, overload behavior, communication layers, and evidence explicit.

Q&A

Q&A Cards

01 Why is RAII relevant to an accelerator-serving runtime?

Short Answer

Asynchronous request paths own sockets, buffers, stream events, and communicator state. RAII ties each release to an owning object's lifetime, so cleanup runs deterministically whether the request leaves through success, cancellation, timeout, or an exception.

Source: C++ Core Guidelines: Resource Management
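A minimal sketch of the idea in standard C++, assuming a hypothetical `BufferPool` that stands in for device buffers or stream events and only counts held handles. `ScopedRelease`, `BufferPool`, `Outcome`, and `handle_request` are illustrative names, not part of any serving framework. The request path has three exits and no explicit cleanup code; the guard's destructor returns the buffer on each of them.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <utility>

// Move-only guard: runs its release action exactly once, when the owning
// scope ends by normal return, early return, or exception. The action must
// not throw, because it runs inside a destructor.
class ScopedRelease {
public:
    explicit ScopedRelease(std::function<void()> release)
        : release_(std::move(release)) {}

    // Moving transfers ownership; the moved-from guard releases nothing.
    ScopedRelease(ScopedRelease&& other)
        : release_(std::exchange(other.release_, nullptr)) {}

    ScopedRelease(const ScopedRelease&) = delete;
    ScopedRelease& operator=(const ScopedRelease&) = delete;
    ScopedRelease& operator=(ScopedRelease&&) = delete;

    ~ScopedRelease() {
        if (release_) release_();
    }

private:
    std::function<void()> release_;
};

// Hypothetical stand-in for a pool of device buffers or stream events:
// it only tracks how many handles are currently held.
struct BufferPool {
    int in_use = 0;

    ScopedRelease acquire() {
        ++in_use;
        return ScopedRelease([this] { --in_use; });
    }
};

// Hypothetical outcomes used only to drive the different exit paths.
enum class Outcome { Success, Cancelled, Failure };

// Illustrative request path: the buffer goes back to the pool on every exit.
bool handle_request(BufferPool& pool, Outcome outcome) {
    ScopedRelease buffer = pool.acquire();
    if (outcome == Outcome::Cancelled) return false;  // early return
    if (outcome == Outcome::Failure) throw std::runtime_error("transfer failed");
    return true;
}
```

A timeout path would look the same: whatever code abandons the request only has to leave the scope. A dedicated owning type per resource (socket, event, communicator) would avoid the `std::function` indirection; the type-erased callback just keeps the sketch short.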

02 Why should a serving queue be bounded?

Short Answer

A bounded queue turns overload into an explicit admission decision: excess work is refused up front instead of accumulating until in-flight buffers exhaust memory and queueing delay drives tail latency up without limit. Rejecting work can be the reliability-preserving answer.
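A sketch of admission at the queue boundary, assuming a single in-process queue guarded by a mutex. `BoundedQueue`, `Admit`, and the capacity are illustrative, not taken from any particular server. `try_push` neither blocks nor grows the queue; a full queue answers `Rejected`, which the caller could map to an overload response.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

// Admission result: the caller must handle rejection explicitly.
enum class Admit { Accepted, Rejected };

// Minimal bounded, thread-safe FIFO (illustrative).
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity_ > 0 && "a zero-capacity queue rejects everything");
    }

    // Never blocks and never grows past capacity_: a full queue rejects,
    // so overload surfaces here instead of as memory exhaustion later.
    Admit try_push(T item) {
        std::lock_guard<std::mutex> lock(mu_);
        if (items_.size() >= capacity_) {
            ++rejected_;
            return Admit::Rejected;
        }
        items_.push_back(std::move(item));
        return Admit::Accepted;
    }

    // Moves the oldest item into `out`; returns false if the queue is empty.
    bool try_pop(T& out) {
        std::lock_guard<std::mutex> lock(mu_);
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop_front();
        return true;
    }

    // Count of rejected pushes, useful as an overload signal.
    std::size_t rejected() const {
        std::lock_guard<std::mutex> lock(mu_);
        return rejected_;
    }

private:
    const std::size_t capacity_;
    mutable std::mutex mu_;
    std::deque<T> items_;
    std::size_t rejected_ = 0;
};
```

Blocking producers on a full queue is the other common choice. It applies backpressure, but it moves the wait upstream, where the request's deadline keeps running.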

03 How should an explanation distinguish TCP, NVLink, and NCCL?

Short Answer

TCP carries request/response traffic between services; NVLink is a local GPU-to-GPU interconnect, a connectivity capability of the hardware; NCCL is a library that implements GPU collective communication over whichever transports and topology are available. They sit at different system layers and are not interchangeable optimizations.

Term | System layer | Moves or provides | Not interchangeable with
TCP | Network transport | Byte streams for service communication | A GPU-local link or a collective library
NVLink | Device interconnect | Fast local GPU-to-GPU connectivity | A request protocol or a collective API
NCCL | GPU collective library | Collectives mapped onto available transports | The physical link itself

Source: NCCL User Guide
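The layering can be made concrete with a CPU-only simulation of a ring all-reduce, one of the algorithms NCCL can select. The collective below is written purely in terms of "send one chunk to the right-hand neighbour"; whether that hop is an NVLink link, PCIe, or a network socket is a transport choice beneath it, and none of it is request traffic. `ring_allreduce` and the one-value-per-chunk layout are illustrative assumptions, not NCCL's implementation, which also pipelines data and detects topology.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified ring all-reduce (sum) over n simulated ranks. Each rank holds
// n values and value i is treated as chunk i. In every step each rank sends
// exactly one chunk to rank (r + 1) % n, so the algorithm needs nothing from
// the layer below except point-to-point delivery.
// Returns the total number of point-to-point messages.
std::size_t ring_allreduce(std::vector<std::vector<double>>& bufs) {
    const std::size_t n = bufs.size();
    std::size_t messages = 0;
    if (n < 2) return messages;
    for (const auto& b : bufs) assert(b.size() == n);

    // Phase 1, reduce-scatter: after n - 1 steps, rank r holds the full
    // sum for chunk (r + 1) % n.
    for (std::size_t step = 0; step + 1 < n; ++step) {
        std::vector<double> outgoing(n);
        for (std::size_t r = 0; r < n; ++r)  // snapshot all sends first
            outgoing[r] = bufs[r][(r + n - step) % n];
        for (std::size_t r = 0; r < n; ++r) {
            bufs[(r + 1) % n][(r + n - step) % n] += outgoing[r];
            ++messages;
        }
    }

    // Phase 2, all-gather: circulate each fully reduced chunk around the ring.
    for (std::size_t step = 0; step + 1 < n; ++step) {
        std::vector<double> outgoing(n);
        for (std::size_t r = 0; r < n; ++r)
            outgoing[r] = bufs[r][(r + 1 + n - step) % n];
        for (std::size_t r = 0; r < n; ++r) {
            bufs[(r + 1) % n][(r + 1 + n - step) % n] = outgoing[r];
            ++messages;
        }
    }
    return messages;
}
```

With n ranks the ring takes 2(n - 1) steps and every step uses every link, so in this synchronous model the slowest hop in the cycle sets the pace.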

04 How would you diagnose a distributed-serving timeout?

Short Answer

Correlate request deadlines, queue waits, cancellation logs, kernel timelines, collective durations, and asynchronous communicator errors for the affected requests. Then decide whether the root cause is admission overload, compute delay, transfer delay, collective failure, or cleanup failure.
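One way to make the final decision repeatable is to reduce each timed-out request to a per-phase time budget, take any hard error first, and otherwise blame the phase that consumed the most time. The `RequestTrace` fields, the order of the checks, and treating a budget-dominating collective as a failure (a likely stall) are all assumptions for illustration. In practice the inputs would come from tracing, profiler timelines, and communicator error polling (for NCCL, `ncclCommGetAsyncError`).

```cpp
#include <cassert>

// Hypothetical classification labels matching the categories above.
enum class RootCause {
    AdmissionOverload, ComputeDelay, TransferDelay,
    CollectiveFailure, CleanupFailure, WithinDeadline
};

// Hypothetical per-request summary assembled from traces and logs.
struct RequestTrace {
    double deadline_ms;
    double queue_wait_ms;     // admission to dequeue
    double compute_ms;        // kernel time from the device timeline
    double transfer_ms;       // host/device or network copies
    double collective_ms;     // time spent inside collectives
    bool communicator_error;  // asynchronous error reported by the communicator
    bool cleanup_completed;   // resources released after cancel/timeout
};

RootCause classify(const RequestTrace& t) {
    // Hard failures first: they explain a timeout regardless of timing.
    if (t.communicator_error) return RootCause::CollectiveFailure;
    if (!t.cleanup_completed) return RootCause::CleanupFailure;

    const double total =
        t.queue_wait_ms + t.compute_ms + t.transfer_ms + t.collective_ms;
    if (total <= t.deadline_ms) return RootCause::WithinDeadline;

    // Otherwise attribute the timeout to the phase that used the most time.
    // Assumption: a collective that dominates without an error is a stall.
    struct Phase { double ms; RootCause cause; };
    const Phase phases[] = {
        {t.queue_wait_ms, RootCause::AdmissionOverload},
        {t.compute_ms, RootCause::ComputeDelay},
        {t.transfer_ms, RootCause::TransferDelay},
        {t.collective_ms, RootCause::CollectiveFailure},
    };
    const Phase* worst = &phases[0];
    for (const Phase& p : phases)
        if (p.ms > worst->ms) worst = &p;
    return worst->cause;
}
```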