socket/event -> admission -> bounded queue -> GPU task
                    |              |              |
                 reject     timeout/cancel  release state

Systems Runtime
Reason about the C++ and networking boundary underneath accelerator services: resource lifetime, request admission, queues, communication layers, and failures.
Failure paths are part of performance correctness
Use RAII-style owned handles for sockets, staging buffers, streams, and communicator lifetimes. A timeout or cancelled request must release each resource once, even when completion events arrive late.
Name who owns each buffer before and after an asynchronous submission, and what destroys it on normal completion, cancellation, or exception.
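The ownership rule above can be sketched with shared ownership between the requester and the in-flight operation. This is a minimal illustration, not a prescribed design: `StagingBuffer`, `Request`, `settle`, and `demo_late_completion` are hypothetical names, and the release counter exists only to make the single release observable. The buffer stays alive until the last owner drops it, so a late completion never touches freed memory, and whichever of cancel or complete arrives second becomes a no-op.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical staging buffer. A real service would wrap pinned host memory
// or a device allocation; the counter only makes the release visible.
struct StagingBuffer {
    StagingBuffer(std::size_t bytes, int& release_count)
        : data(new unsigned char[bytes]), size(bytes), releases(release_count) {}
    ~StagingBuffer() { ++releases; }  // runs exactly once, when the last owner goes away
    StagingBuffer(const StagingBuffer&) = delete;
    StagingBuffer& operator=(const StagingBuffer&) = delete;

    std::unique_ptr<unsigned char[]> data;
    std::size_t size;
    int& releases;
};

// Per-request state shared by the requester and the asynchronous operation.
struct Request {
    Request(std::size_t bytes, int& release_count) : buffer(bytes, release_count) {}
    StagingBuffer buffer;
    std::atomic<bool> settled{false};  // set by the first of cancel/complete
};

// Returns true if this call settled the request, false if it arrived late.
inline bool settle(Request& r) { return !r.settled.exchange(true); }

// Simulates a timeout that fires before the completion event.
inline int demo_late_completion() {
    int releases = 0;
    auto req = std::make_shared<Request>(std::size_t{4096}, releases);
    std::shared_ptr<Request> in_flight = req;  // reference held by the async operation

    bool cancelled = settle(*req);  // the timeout wins the race
    req.reset();                    // requester drops its handle
    assert(releases == 0);          // buffer is still owned by the in-flight operation

    bool completed = settle(*in_flight);  // late completion: result must be discarded
    assert(cancelled && !completed);
    (void)cancelled;
    (void)completed;

    in_flight.reset();  // last owner gone, buffer released here
    return releases;
}
```

In this sketch `demo_late_completion()` returns 1: the buffer is released once, after the late completion has been observed and ignored.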
Bound queue memory before overload chooses for you
Stable operating point
Under stable assumptions, Little's law relates average in-flight work $L$ to arrival rate $\lambda$ and average time in system $W$: $L = \lambda W$.
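As a worked instance, take $L$ as the average number of requests in flight, $\lambda$ as the arrival rate, and $W$ as the average time in system. The numbers below are invented for illustration, not measurements from any service: 200 requests per second, each spending 50 ms in the system on average.

$$L = \lambda W = 200\ \text{s}^{-1} \times 0.05\ \text{s} = 10$$

If each in-flight request held, say, an 8 MiB staging buffer, the average footprint would be about 80 MiB. This is an average under steady state, not a bound: a queue limit still needs headroom for bursts, and the relation stops being useful once arrivals outpace service.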
Overload behavior
Once arrivals outpace service, choose explicit rejection, deferral, prioritization, or shedding. Do not treat growing host or device memory as a queue policy.
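One way to make the overload policy explicit is a queue that never grows past a fixed capacity and reports why it refused a job. The sketch below assumes C++17 and uses hypothetical names (`BoundedQueue`, `Job`, `Admit`); it shows only rejection and deadline expiry, and leaves deferral and prioritization out.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <utility>

using Clock = std::chrono::steady_clock;

enum class Admit { Accepted, RejectedFull, RejectedExpired };

// Hypothetical unit of work: an id plus the deadline the client asked for.
struct Job {
    int id;
    Clock::time_point deadline;
};

class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity_ > 0);
    }

    // Never blocks and never grows past capacity_: overload becomes an
    // explicit rejection that the caller can report to the client.
    Admit try_push(Job job, Clock::time_point now = Clock::now()) {
        if (job.deadline <= now) return Admit::RejectedExpired;
        std::lock_guard<std::mutex> lock(mu_);
        if (jobs_.size() >= capacity_) return Admit::RejectedFull;
        jobs_.push_back(std::move(job));
        return Admit::Accepted;
    }

    // Returns the oldest job that can still meet its deadline. Jobs that
    // expired while waiting are dropped and counted instead of being run.
    std::optional<Job> try_pop(Clock::time_point now = Clock::now()) {
        std::lock_guard<std::mutex> lock(mu_);
        while (!jobs_.empty()) {
            Job job = std::move(jobs_.front());
            jobs_.pop_front();
            if (job.deadline > now) return job;
            ++expired_;
        }
        return std::nullopt;
    }

    std::size_t expired() const {
        std::lock_guard<std::mutex> lock(mu_);
        return expired_;
    }

private:
    const std::size_t capacity_;
    mutable std::mutex mu_;
    std::deque<Job> jobs_;
    std::size_t expired_ = 0;
};
```

A caller would map `Admit::RejectedFull` to whatever the request path uses for back-pressure, and would choose the capacity from a memory budget rather than letting the container decide. Passing `now` explicitly keeps the deadline logic testable without sleeping.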
Name the boundary before naming the optimization
| Layer | What it moves | Questions to ask |
|---|---|---|
| TCP / HTTP request path | client payloads and streamed responses | admission, reconnect, deadline, framing |
| RDMA-style transfer | registered memory between hosts | registration, ownership, failure boundary |
| NVLink / device fabric | device-access topology | locality and available bandwidth |
| NCCL collectives | distributed tensors | collective order, communicator failure, measured duration |
Distinguish overload, communication failure, and slow compute
Collect queue wait, rejection rate, deadline expiry, buffer high-water marks, kernel timeline, collective duration, asynchronous communicator errors, and request cancellation traces. A single utilization number cannot locate the cause.
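The host-side part of that list can be kept in a small set of atomic counters. The sketch below is illustrative only: `RuntimeCounters` and its member names are hypothetical, and kernel timelines, collective durations, and asynchronous communicator errors would come from the profiler and communication library in use, which are not shown here.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>

// Hypothetical per-process counters; a real service would export these
// through whatever metrics system it already uses.
struct RuntimeCounters {
    std::atomic<std::uint64_t> admitted{0};
    std::atomic<std::uint64_t> rejected{0};
    std::atomic<std::uint64_t> deadline_expired{0};
    std::atomic<std::uint64_t> cancelled{0};
    std::atomic<std::uint64_t> queue_wait_ns_total{0};
    std::atomic<std::size_t> buffer_bytes_in_use{0};
    std::atomic<std::size_t> buffer_bytes_high_water{0};

    // Tracks current usage and raises the high-water mark if it was exceeded.
    void on_buffer_acquire(std::size_t bytes) {
        std::size_t in_use = buffer_bytes_in_use.fetch_add(bytes) + bytes;
        std::size_t seen = buffer_bytes_high_water.load();
        while (in_use > seen &&
               !buffer_bytes_high_water.compare_exchange_weak(seen, in_use)) {
            // seen was reloaded by compare_exchange_weak; retry if still lower
        }
    }

    void on_buffer_release(std::size_t bytes) {
        std::size_t before = buffer_bytes_in_use.fetch_sub(bytes);
        assert(before >= bytes);  // a release without a matching acquire is a bug
        (void)before;
    }

    // Called when a job leaves the queue, with the time it spent waiting.
    void on_dequeue(std::chrono::nanoseconds waited) {
        queue_wait_ns_total.fetch_add(static_cast<std::uint64_t>(waited.count()));
    }

    // Fraction of admission decisions that were rejections.
    double rejection_rate() const {
        std::uint64_t a = admitted.load();
        std::uint64_t r = rejected.load();
        return (a + r) == 0 ? 0.0 : static_cast<double>(r) / static_cast<double>(a + r);
    }
};
```

Read together, these separate the cases: a rising rejection rate with a flat buffer high-water mark points at admission limits, while growing queue wait with few rejections suggests the service side is slow.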
