DNA: NVIDIA Networking Overview

Back in 2011, software was eating the world. But in 2025, putting all bubble discussion aside, NVIDIA are making hardware sexy again. So I thought it might be useful to position some of the networking hardware within the NVIDIA portfolio.

NVLink: NVLink provides a high-speed (up to 900 GB/s per GPU), low-latency communication path between GPUs within a node or a POD. Initially, I thought of NVLink as similar to the crossbar modules used on the backplane of chassis-based switches, but it’s really an alternative to the local PCIe bus for GPU-to-GPU connectivity. Note that the PCIe bus is still required to connect the GPU to the node’s other components, such as local storage, the NIC (Network Interface Card) or the CPU (NVLink Fusion can also be used to connect to the CPU).
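
As a quick, unofficial illustration: on a multi-GPU node with PyTorch installed, you can check which GPU pairs have peer-to-peer access enabled, which is the path NVLink accelerates (peer access can also run over PCIe, so nvidia-smi topo -m is the place to confirm whether a pair is actually wired via NVLink). A minimal sketch:

```python
# Minimal sketch, assuming a multi-GPU node with CUDA and PyTorch available.
# Peer access is the GPU-to-GPU path that NVLink accelerates; whether a given
# pair uses NVLink or PCIe is visible in `nvidia-smi topo -m`.
import torch

def peer_access_matrix():
    n = torch.cuda.device_count()
    for src in range(n):
        for dst in range(n):
            if src == dst:
                continue
            ok = torch.cuda.can_device_access_peer(src, dst)
            print(f"GPU{src} -> GPU{dst}: peer access {'yes' if ok else 'no'}")

if __name__ == "__main__":
    peer_access_matrix()
```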

NVSwitch: At scale, a limitation of NVLink is that it requires a direct physical link between each pair of GPUs, and using the standard full-mesh calculation of N*(N-1)/2 links, this massively increases the number of physical links required in a large node or POD. NVIDIA solve this limitation with NVSwitch, a purpose-built switching ASIC (an SoC, or System on a Chip) that delivers direct, non-blocking GPU-to-GPU connectivity (up to 1.8 TB/s per GPU on the latest generation).
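
To see why the full mesh doesn’t scale, here’s a back-of-the-envelope sketch (the GPU counts are purely illustrative):

```python
# Back-of-the-envelope sketch: links needed for a full mesh of N GPUs,
# N*(N-1)/2, versus one attachment per GPU into a switched (NVSwitch) fabric.
def full_mesh_links(n: int) -> int:
    return n * (n - 1) // 2

for n in (4, 8, 72, 576):
    print(f"{n:>3} GPUs: full mesh = {full_mesh_links(n):>6} links, "
          f"switched = {n:>3} GPU-to-switch attachments")
```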

ConnectX: Where NVLink is concerned with GPU-to-GPU connectivity within the node or POD, ConnectX is concerned with high-speed connectivity between nodes and PODs. ConnectX is an advanced NIC (Network Interface Card) that can run in either InfiniBand or Ethernet mode, supports RDMA (Remote Direct Memory Access), and runs at speeds of up to 400G.
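
In practice, applications rarely touch the NIC directly; they typically go through a collective library such as NCCL, which uses RDMA (and GPUDirect) over InfiniBand or RoCE when the fabric supports it and falls back to TCP sockets otherwise. A minimal sketch, assuming a PyTorch environment launched across the nodes with torchrun:

```python
# Minimal multi-node all-reduce sketch, assuming launch via torchrun
# (e.g. torchrun --nnodes=2 --nproc_per_node=8 ...). NCCL picks the best
# available transport: RDMA over InfiniBand/RoCE if present, TCP otherwise.
import os
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")     # rank/world size come from torchrun env vars
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    x = torch.ones(1024, device="cuda") * dist.get_rank()
    dist.all_reduce(x, op=dist.ReduceOp.SUM)    # inter-node traffic crosses the NIC fabric
    print(f"rank {dist.get_rank()}: sum element = {x[0].item()}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```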

BlueField: BlueField is NVIDIA’s DPU (Data Processing Unit): a smart NIC with its own CPU cores, memory, hardware accelerators, and a built-in ConnectX NIC. DPUs can be used in place of ConnectX NICs, or alongside them in a node, to provide security, encryption offload, compression, and packet inspection.

QuantumX: InfiniBand is still the gold-standard method of interconnecting the largest AI and HPC clusters (more on this in coming weeks). QuantumX is NVIDIA’s range of InfiniBand switches, supporting up to XDR (eXtended Data Rate) at 800G. These switches natively provide low-latency, lossless connectivity using Advanced Congestion Control and Adaptive Routing, and also support RDMA (Remote Direct Memory Access), in-network aggregation with SHARP (Scalable Hierarchical Aggregation and Reduction Protocol), and options for co-packaged optics.

SpectrumX: The final item for today is the SpectrumX range of Ethernet switches, again supporting ports of up to 800G. SpectrumX’s key features include RoCEv2 (RDMA over Converged Ethernet), Adaptive Routing, ultra-low latency, and Intelligent Congestion Control. These switches are frequently used in large AI and HPC deployments, connecting tens of thousands of GPUs.
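
For completeness, on an Ethernet/RoCE fabric the behaviour of NCCL (used in the all-reduce sketch above) is usually steered with a handful of environment variables. The variable names below are real NCCL settings, but the values are placeholders that depend entirely on the deployment, so treat this as a sketch rather than a recommended configuration:

```python
# Illustrative only: common NCCL knobs on a RoCE (Ethernet) fabric.
# Set these before NCCL is initialised; the values here are placeholders.
import os

os.environ.setdefault("NCCL_IB_DISABLE", "0")        # keep the IB/RoCE transport enabled
os.environ.setdefault("NCCL_IB_HCA", "mlx5")         # restrict NCCL to the ConnectX devices
os.environ.setdefault("NCCL_IB_GID_INDEX", "3")      # GID index selecting RoCEv2 (site-specific)
os.environ.setdefault("NCCL_SOCKET_IFNAME", "eth0")  # interface for bootstrap traffic (site-specific)
os.environ.setdefault("NCCL_DEBUG", "INFO")          # log which transport NCCL actually picked
```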

For more information, see the link below. Also, stay tuned for a deep dive on some of the solutions and their alternatives in coming weeks.

https://lnkd.in/eiNR6kDV