InfraLens - Systems Runtime

Serving Substrate

Reason about the C++ and networking boundary underneath accelerator services: resource lifetime, request admission, queues, communication layers, and failures.

Request lifecycle

socket/event -> admission -> bounded queue -> GPU task
                 |               |              |
               reject        timeout/cancel   release state
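
The diagram can be read as a small state machine in which every exit converges on one release step. The sketch below is a minimal illustration under that reading; the stage and event names are hypothetical and are not taken from any runtime.

```cpp
// Hypothetical stage and event names; the diagram does not prescribe them.
enum class Stage { Received, Queued, Running, Released };
enum class Event { Admit, Reject, Dequeue, Timeout, Cancel, Finish };

// Applies `ev` if it is legal in the current stage and returns true.
// Returns false and leaves `stage` unchanged otherwise.
bool advance(Stage& stage, Event ev) {
    switch (stage) {
    case Stage::Received:  // admission decides: queue it or reject it
        if (ev == Event::Admit)  { stage = Stage::Queued;   return true; }
        if (ev == Event::Reject) { stage = Stage::Released; return true; }
        return false;
    case Stage::Queued:    // waiting in the bounded queue
        if (ev == Event::Dequeue) { stage = Stage::Running; return true; }
        if (ev == Event::Timeout || ev == Event::Cancel) {
            stage = Stage::Released;
            return true;
        }
        return false;
    case Stage::Running:   // GPU task in flight
        if (ev == Event::Finish || ev == Event::Timeout || ev == Event::Cancel) {
            stage = Stage::Released;
            return true;
        }
        return false;
    case Stage::Released:  // terminal: late events are ignored
        return false;
    }
    return false;
}
```

Making `Released` terminal is the design choice that matters here: a completion event arriving after a timeout finds nothing left to transition.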

C++ Ownership

Failure paths are part of performance correctness

Use RAII-style owning handles for sockets, staging buffers, streams, and communicators. A timed-out or cancelled request must release each resource exactly once, even when completion events arrive late.
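
One way to get release-once behavior is a move-only handle whose release step is idempotent. This is a minimal sketch, not a real library type: `OwnedHandle`, the integer resource id, and the release callback are all assumptions made for illustration.

```cpp
#include <functional>
#include <utility>

// Hypothetical move-only handle: owns one resource id and releases it once.
class OwnedHandle {
public:
    using Releaser = std::function<void(int)>;

    OwnedHandle() = default;
    OwnedHandle(int id, Releaser release)
        : id_(id), owns_(true), release_(std::move(release)) {}

    // Copying would create two owners, so it is disabled.
    OwnedHandle(const OwnedHandle&) = delete;
    OwnedHandle& operator=(const OwnedHandle&) = delete;

    // Moving transfers ownership and disarms the source.
    OwnedHandle(OwnedHandle&& other) noexcept
        : id_(other.id_),
          owns_(std::exchange(other.owns_, false)),
          release_(std::move(other.release_)) {}

    OwnedHandle& operator=(OwnedHandle&& other) noexcept {
        if (this != &other) {
            reset();  // release whatever this handle held before
            id_ = other.id_;
            owns_ = std::exchange(other.owns_, false);
            release_ = std::move(other.release_);
        }
        return *this;
    }

    ~OwnedHandle() { reset(); }

    // Idempotent: a second call (for example from a late completion event)
    // does nothing. The releaser is assumed not to throw.
    void reset() noexcept {
        if (std::exchange(owns_, false) && release_) {
            release_(id_);
        }
    }

    bool valid() const noexcept { return owns_; }

private:
    int id_ = -1;
    bool owns_ = false;
    Releaser release_;
};
```

A cancellation path and a late completion path can both call `reset()`. Only the first call runs the releaser, and the destructor covers the exception path.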

Interview test

Name who owns each buffer before and after an asynchronous submission, and what destroys it on normal completion, cancellation, or exception.
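
A way to rehearse the answer is to let the type system say it. The sketch below models submission without a real device or thread: `StagingBuffer`, `InFlightOp`, and `submit` are hypothetical names, and the destruction counter exists only to make ownership observable.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical staging buffer that counts destructions.
struct StagingBuffer {
    StagingBuffer(std::size_t n, int* freed) : bytes(n), freed_count(freed) {}
    StagingBuffer(const StagingBuffer&) = delete;
    StagingBuffer& operator=(const StagingBuffer&) = delete;
    ~StagingBuffer() { ++*freed_count; }

    std::vector<unsigned char> bytes;
    int* freed_count;
};

// Hypothetical in-flight operation: after submission it is the sole owner.
class InFlightOp {
public:
    explicit InFlightOp(std::unique_ptr<StagingBuffer> buf)
        : buf_(std::move(buf)) {}

    // Normal completion: report the size, then destroy the buffer.
    // A late second completion finds no buffer and returns 0.
    std::size_t complete() {
        std::unique_ptr<StagingBuffer> done = std::move(buf_);
        return done ? done->bytes.size() : 0;
    }

    // Cancellation or timeout: destroy the buffer now; safe to repeat.
    void cancel() noexcept { buf_.reset(); }

    bool holds_buffer() const noexcept { return buf_ != nullptr; }

private:
    // On an exception, stack unwinding destroys the op, and with it buf_.
    std::unique_ptr<StagingBuffer> buf_;
};

// Taking the unique_ptr by value forces the caller to give up ownership.
InFlightOp submit(std::unique_ptr<StagingBuffer> buf) {
    return InFlightOp(std::move(buf));
}
```

Before `submit` the caller owns the buffer. Afterwards the caller's pointer is null and the operation owns it. `complete()`, `cancel()`, and the destructor are the three destruction paths, and each runs at most once per buffer.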

Admission

Bound queue memory before overload chooses for you

Stable operating point

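Under stable assumptions, Little's law relates average in-flight work to arrival rate and time in system: $L = \lambda W$, where $L$ is the average number of requests in the system, $\lambda$ is the average arrival rate, and $W$ is the average time a request spends in the system.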
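
A worked instance, with $L$ as average requests in flight, $\lambda$ as average arrival rate, and $W$ as average time in system. The numbers are illustrative assumptions, not measurements:

```latex
L = \lambda W, \qquad
\lambda = 200\ \text{requests/s},\quad W = 0.05\ \text{s}
\;\Rightarrow\; L = 200 \times 0.05 = 10\ \text{requests in flight}
```

If each in-flight request pinned 4 MiB of staging memory (also an assumption), the average footprint would be about 40 MiB. A queue bound should be chosen against that kind of figure rather than discovered under load.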

Overload behavior

Once arrivals outpace service, choose explicit rejection, deferral, prioritization, or shedding. Do not treat growing host or device memory as a queue policy.
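
Explicit rejection can be made concrete with a queue bounded by bytes rather than left to grow. This is a minimal sketch under stated assumptions: `ByteBoundedQueue` and `Admit` are hypothetical names, the caller supplies each item's byte cost, and rejection is the only overload policy shown.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

enum class Admit { Accepted, RejectedFull };

// Hypothetical queue whose total payload never exceeds max_bytes.
template <typename T>
class ByteBoundedQueue {
public:
    explicit ByteBoundedQueue(std::size_t max_bytes) : max_bytes_(max_bytes) {}

    // Never blocks and never grows past the bound: a full queue says no.
    Admit try_push(T item, std::size_t bytes) {
        std::lock_guard<std::mutex> lock(mu_);
        // Written as a subtraction so the check cannot overflow.
        if (bytes > max_bytes_ - bytes_in_use_) {
            ++rejected_;
            return Admit::RejectedFull;
        }
        bytes_in_use_ += bytes;
        items_.push_back(Entry{std::move(item), bytes});
        return Admit::Accepted;
    }

    // Returns false when the queue is empty; otherwise moves the oldest
    // item into `out` and gives its bytes back to the budget.
    bool try_pop(T& out) {
        std::lock_guard<std::mutex> lock(mu_);
        if (items_.empty()) return false;
        out = std::move(items_.front().item);
        bytes_in_use_ -= items_.front().bytes;
        items_.pop_front();
        return true;
    }

    std::size_t bytes_in_use() const {
        std::lock_guard<std::mutex> lock(mu_);
        return bytes_in_use_;
    }

    std::size_t rejected() const {
        std::lock_guard<std::mutex> lock(mu_);
        return rejected_;
    }

private:
    struct Entry {
        T item;
        std::size_t bytes;
    };

    const std::size_t max_bytes_;
    mutable std::mutex mu_;
    std::deque<Entry> items_;
    std::size_t bytes_in_use_ = 0;
    std::size_t rejected_ = 0;
};
```

The rejection counter is part of the design rather than decoration: it is the signal that distinguishes a queue doing its job under overload from one that is silently full.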

Transport Layers

Name the boundary before naming the optimization

Layer	Moves	Question
TCP / HTTP request path	client payloads and streamed responses	admission, reconnect, deadline, framing
RDMA-style transfer	registered memory between hosts	registration, ownership, failure boundary
NVLink / device fabric	device-access topology	locality and available bandwidth
NCCL collectives	distributed tensors	collective order, communicator failure, measured duration

Troubleshooting

Distinguish overload, communication failure, and slow compute

Collect queue wait, rejection rate, deadline expiry, buffer high-water marks, kernel timeline, collective duration, asynchronous communicator errors, and request cancellation traces. A single utilization number cannot locate the cause.
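
The signals above can be compared against a known-good baseline to pick a first suspect. The sketch below is a rough triage heuristic and not a diagnostic method: the struct fields, the `first_suspect` function, the check order, and the 1.5x slack factor are all assumptions chosen for illustration.

```cpp
// Hypothetical per-window summary of the collected signals.
struct WindowSignals {
    double queue_wait_ms;       // typical wait before service starts
    double rejection_rate;      // rejected / offered, in [0, 1]
    double kernel_ms;           // per-request compute time from the kernel timeline
    double collective_ms;       // measured collective duration
    int communicator_errors;    // asynchronous communicator errors observed
};

enum class Suspect { CommunicationFailure, SlowCompute, Overload, Inconclusive };

// Compares the current window with a baseline window. `slack` is an assumed
// tolerance: a duration counts as degraded only above slack * baseline.
Suspect first_suspect(const WindowSignals& now,
                      const WindowSignals& baseline,
                      double slack = 1.5) {
    // Communicator errors or stretched collectives point at communication.
    if (now.communicator_errors > 0 ||
        now.collective_ms > slack * baseline.collective_ms) {
        return Suspect::CommunicationFailure;
    }
    // Kernels themselves got slower: compute, not queueing.
    if (now.kernel_ms > slack * baseline.kernel_ms) {
        return Suspect::SlowCompute;
    }
    // Service time looks normal but work is waiting or being turned away.
    if (now.rejection_rate > baseline.rejection_rate ||
        now.queue_wait_ms > slack * baseline.queue_wait_ms) {
        return Suspect::Overload;
    }
    return Suspect::Inconclusive;
}
```

Overload is checked last because queue wait also rises when compute or communication slows down. It is the right label only when service time still looks like the baseline.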
