
Following on from RDMA and RoCE last week, the next feature of AI and HPC networks that I will look at is lossless transport. Lossless transport is achieved in fundamentally different ways for InfiniBand and Ethernet, so I will cover InfiniBand this week and Ethernet next week.
Unlike Ethernet, which relies on upper layers such as TCP to retransmit lost packets, InfiniBand was designed to be natively lossless. Several features make this possible, such as congestion control implemented in hardware, and RDMA, which keeps network traffic away from the OS kernel and the CPU and so avoids buffering delays. However, the key feature for lossless transport is Credit-Based Flow Control (CBFC).
CBFC is essentially a very simple process. The receiver sends credits to the transmitter based on the free space in its ingress buffer, where one credit allows for transmission of 64 bytes of data. If the transmitter has packets to send and sufficient credits, it will send the data. If the number of credits drops to zero, it will stop sending until it receives more credits. This stops the receiver from overflowing its ingress buffer and dropping packets.
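The credit exchange can be sketched in a few lines of Python. This is only a toy model to illustrate the idea, not how an HCA or switch implements it: the `Receiver` and `Sender` classes and their method names are invented for this example, and the only figure taken from the text is the 64 bytes per credit.

```python
import math

CREDIT_BYTES = 64  # one credit covers 64 bytes, as described above


class Receiver:
    """Hypothetical receiver whose ingress buffer is measured in credits."""

    def __init__(self, buffer_credits):
        self.free = buffer_credits  # buffer space not yet handed out as credits

    def grant(self):
        """Advertise all currently free buffer space to the sender."""
        granted, self.free = self.free, 0
        return granted

    def drain(self, credits):
        """Data has been consumed, so this much buffer space is free again."""
        self.free += credits


class Sender:
    """Hypothetical sender that transmits only while it holds credits."""

    def __init__(self):
        self.credits = 0

    def add_credits(self, n):
        self.credits += n

    def try_send(self, packet_bytes):
        """Send if enough credits are held; otherwise hold the packet back."""
        needed = math.ceil(packet_bytes / CREDIT_BYTES)
        if needed > self.credits:
            return False
        self.credits -= needed
        return True


rx = Receiver(buffer_credits=4)
tx = Sender()
tx.add_credits(rx.grant())      # sender now holds 4 credits (256 bytes)

first = tx.try_send(128)        # True: uses 2 credits
second = tx.try_send(128)       # True: uses the last 2 credits
blocked = tx.try_send(64)       # False: no credits left, so nothing is sent

rx.drain(2)                     # receiver frees 128 bytes of buffer
tx.add_credits(rx.grant())      # and returns 2 credits
resumed = tx.try_send(64)       # True: transmission resumes
```

The point to notice is that the sender never transmits more than the receiver has room for, so the packet is delayed at the source rather than dropped at the destination.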
Diving just a little deeper, CBFC is implemented per virtual lane on each port-to-port link. An InfiniBand link can have up to 16 virtual lanes, with Lane 0 assigned to normal traffic and Lane 15 reserved for InfiniBand management traffic. Each data lane has its own set of flow credits, which allows for prioritisation and avoids Head of Line (HoL) blocking: a lane that has run out of credits does not hold up traffic on the other lanes.
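To see why separate credits per lane avoid HoL blocking, here is a minimal sketch under simplified assumptions. The `Port` class, its methods and the lane-order scheduling are hypothetical and chosen for readability; real InfiniBand hardware arbitrates between virtual lanes in a more sophisticated way.

```python
from collections import deque


class Port:
    """Hypothetical egress port with a separate credit pool per virtual lane."""

    def __init__(self, credits_per_vl):
        self.credits = dict(credits_per_vl)
        self.queues = {vl: deque() for vl in self.credits}

    def enqueue(self, vl, packet_credits):
        """Queue a packet on a lane; its size is given in credits."""
        self.queues[vl].append(packet_credits)

    def transmit_round(self):
        """Visit each lane once and send its head packet if credits allow.

        Returns the list of lanes that managed to send a packet.
        """
        sent = []
        for vl in sorted(self.queues):
            queue = self.queues[vl]
            if queue and queue[0] <= self.credits[vl]:
                self.credits[vl] -= queue.popleft()
                sent.append(vl)
        return sent


port = Port({0: 0, 1: 4})   # lane 0 has no credits, lane 1 has four
port.enqueue(0, 2)          # this packet has to wait for credits
port.enqueue(1, 2)          # this packet is unaffected by lane 0

sent_lanes = port.transmit_round()   # [1]
```

With a single shared credit pool and a single queue, the packet waiting on lane 0 would have sat in front of the lane 1 packet and blocked it.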
Finally, as CBFC works primarily on a port-to-port basis, Explicit Congestion Notification (ECN) is necessary to provide end-to-end lossless transport over the InfiniBand fabric. If a switch detects congestion on a network flow, it sets the Forward Explicit Congestion Notification (FECN) bit in the packets of that flow. When a packet marked with FECN is received by the destination, the local Host Channel Adapter (HCA) sets the Backward Explicit Congestion Notification (BECN) bit on return traffic to the transmitter, which then throttles transmission until the BECNs drop off and normal transmission resumes.
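The FECN/BECN loop can be sketched as follows. This is an illustration only: the `Packet` and `Source` types, the queue-depth threshold, and the "halve on BECN, recover by 25% otherwise" rule are all assumptions made for this example and are not the rate-adjustment mechanism defined by the InfiniBand specification.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    fecn: bool = False  # set by a congested switch on the forward path
    becn: bool = False  # set by the destination HCA on return traffic


def switch_forward(pkt, queue_depth, threshold):
    """Hypothetical switch: mark FECN when the egress queue is too deep."""
    if queue_depth > threshold:
        pkt.fecn = True
    return pkt


def destination_reply(received):
    """Destination HCA: echo a received FECN back to the source as BECN."""
    return Packet(becn=received.fecn)


class Source:
    """Hypothetical source HCA that throttles while BECNs keep arriving."""

    def __init__(self, line_rate):
        self.line_rate = line_rate
        self.rate = line_rate

    def on_return_traffic(self, pkt):
        if pkt.becn:
            # Assumed policy: halve the injection rate on every BECN.
            self.rate = max(self.rate / 2, 1.0)
        else:
            # Assumed policy: recover gradually once the BECNs stop.
            self.rate = min(self.rate * 1.25, self.line_rate)


src = Source(line_rate=100.0)

# Congested path: queue depth 10 exceeds the threshold of 8.
reply = destination_reply(switch_forward(Packet(), queue_depth=10, threshold=8))
src.on_return_traffic(reply)
throttled_rate = src.rate            # 50.0

# Congestion clears: return traffic no longer carries BECN.
for _ in range(4):
    src.on_return_traffic(Packet())
recovered_rate = src.rate            # 100.0, back at line rate
```

Whatever the exact policy, the shape is the same: the switch only marks packets, the destination reflects the mark, and the source is the one that slows down.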
