NVSwitch and NVLink Secure Partition

Bridge manages NVSwitch-equipped HGX servers by integrating with NVIDIA Fabric Manager to configure NVLink Secure Partitions. This enables per-tenant GPU isolation at the fabric level, giving each tenant full NVLink bandwidth within its own allocation while blocking all cross-tenant communication.

GPU Interconnect Architecture

IOMMU-Based Isolation (PCIe Servers)

On servers without NVSwitch — such as H100 PCIe or A100 PCIe systems — Bridge enforces GPU isolation using the IOMMU (Input/Output Memory Management Unit). The IOMMU acts as a hardware boundary between devices and system memory: GPUs are assigned to specific IOMMU groups and passed through directly to the tenant's VM, preventing any GPU assigned to one tenant from accessing memory belonging to another.
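The key invariant behind safe passthrough is that an IOMMU group is the smallest unit of assignment, so no group may span two tenants. The sketch below checks that invariant over a hypothetical tenant-to-device mapping; the function name and data shapes are illustrative, not a Bridge API.

```python
# Sketch: verify the IOMMU isolation invariant for GPU passthrough.
# VFIO assigns whole IOMMU groups to a VM, so a group shared by two
# tenants would break isolation.

def find_shared_groups(assignments: dict[str, dict[str, int]]) -> set[int]:
    """Given tenant -> {pci_address: iommu_group}, return any IOMMU
    groups that appear in more than one tenant's allocation.
    The result must be empty for a safe configuration."""
    owners: dict[int, set[str]] = {}
    for tenant, devices in assignments.items():
        for group in devices.values():
            owners.setdefault(group, set()).add(tenant)
    return {g for g, tenants in owners.items() if len(tenants) > 1}

# Safe layout: every GPU sits in its own IOMMU group.
ok = find_shared_groups({
    "tenant-a": {"0000:17:00.0": 40, "0000:2a:00.0": 41},
    "tenant-b": {"0000:3d:00.0": 42},
})

# Unsafe layout: two tenants' GPUs share IOMMU group 40.
bad = find_shared_groups({
    "tenant-a": {"0000:17:00.0": 40},
    "tenant-b": {"0000:18:00.0": 40},
})
```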

Figure: IOMMU memory model — GPU virtual addresses routed through the IOMMU to per-process page tables

Bridge configures IOMMU-based GPU passthrough as part of VM provisioning on PCIe GPU servers. This model is proven and secure, but introduces a limitation in multi-tenant environments: NVLink is disabled for partial GPU allocations. When only a subset of GPUs is assigned to a VM, GPU-to-GPU traffic falls back to PCIe, significantly reducing performance for multi-GPU workloads.

Figure: IOMMU GPU passthrough — PCI device assigned to Guest VM via VFIO driver with protected DMAs
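In practice, passthrough of this kind is expressed to the hypervisor as `vfio-pci` device assignments. The helper below is a hypothetical sketch of how a provisioning layer might build those arguments for QEMU; the `vfio-pci,host=` device syntax is standard QEMU, while the function itself is illustrative.

```python
# Sketch: express VFIO GPU passthrough as QEMU command-line fragments.
# Each host PCI GPU is handed to the guest via the vfio-pci driver.

def vfio_device_args(pci_addresses: list[str]) -> list[str]:
    """Build '-device vfio-pci,host=<addr>' pairs for each GPU."""
    args: list[str] = []
    for addr in pci_addresses:
        args += ["-device", f"vfio-pci,host={addr}"]
    return args

# Two GPUs passed through to one tenant VM (addresses are examples).
print(vfio_device_args(["0000:17:00.0", "0000:2a:00.0"]))
```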

NVLink is NVIDIA's dedicated GPU-to-GPU interconnect, delivering bandwidth measured in hundreds of gigabytes per second per GPU — far exceeding PCIe. NVLink allows GPUs to exchange data directly, bypassing the CPU, and is critical for workloads such as large language model training, distributed inference, and vision-language pipelines.

NVSwitch extends NVLink from point-to-point connections to a full non-blocking any-to-any fabric across all GPUs in the system. On HGX platforms, multiple NVSwitch chips work together to deliver terabytes per second of aggregate bandwidth, ensuring uniform and predictable GPU-to-GPU communication at any scale.

Figure: HGX-2 baseboard — 8× V100 GPUs connected via 6× NVSwitch chips (NVS 1–6) with PCIe to CPU complex
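The bandwidth gap is easy to quantify. A back-of-the-envelope calculation, assuming NVIDIA's published figures for HGX H100 (NVLink 4 at roughly 900 GB/s of bidirectional bandwidth per GPU, versus about 128 GB/s bidirectional for a PCIe Gen5 x16 link):

```python
# Rough comparison based on publicly documented figures (assumptions):
# NVLink 4 on H100: ~900 GB/s bidirectional per GPU.
# PCIe Gen5 x16:    ~128 GB/s bidirectional per link.

NVLINK4_GBPS_PER_GPU = 900   # GB/s, bidirectional, per H100 GPU
PCIE_GEN5_X16_GBPS = 128     # GB/s, bidirectional
GPUS_PER_HGX_BOARD = 8

aggregate_tbps = NVLINK4_GBPS_PER_GPU * GPUS_PER_HGX_BOARD / 1000
speedup = NVLINK4_GBPS_PER_GPU / PCIE_GEN5_X16_GBPS

print(f"Aggregate NVLink bandwidth: {aggregate_tbps:.1f} TB/s")  # 7.2 TB/s
print(f"NVLink vs PCIe, per GPU: ~{speedup:.0f}x")               # ~7x
```

This is why falling back to PCIe for partial allocations is so costly: every GPU-to-GPU transfer loses most of its available bandwidth.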

The Multi-Tenancy Challenge

The physical NVSwitch fabric connects all GPUs in the system. Without additional controls, GPUs assigned to different tenants are on the same fabric — creating a potential path for cross-tenant GPU communication.

The conventional response is to disable NVLink entirely when GPUs are split across tenants. This preserves isolation but forces all GPU-to-GPU traffic back onto PCIe, negating the performance benefits of NVSwitch and effectively reducing powerful multi-GPU servers to single-tenant machines.

| Configuration | Isolation | GPU-to-GPU Bandwidth |
| --- | --- | --- |
| NVLink disabled (PCIe fallback) | Strong | Low (PCIe-limited) |
| NVLink enabled, no partitioning | None | High (NVLink) |
| NVLink Secure Partition (Bridge) | Strong (hardware) | High (NVLink) |

Bridge resolves this challenge by configuring NVLink Secure Partitions on the NVSwitch fabric through NVIDIA Fabric Manager. Fabric Manager programs the NVSwitch routing tables to create logical NVLink domains within a single physical system:

| Property | Description |
| --- | --- |
| Intra-partition NVLink | Full NVLink bandwidth active between all GPUs within the same partition |
| Inter-partition blocking | NVLink traffic between partitions completely blocked at the hardware routing level |
| Hardware enforcement | Isolation enforced by NVSwitch hardware, not software |
| Transparent to workloads | From the workload's perspective, GPUs appear as a fully connected group with no performance penalty |

Each tenant receives a private, high-bandwidth NVLink fabric slice while remaining fully isolated from neighboring tenants at the hardware level.
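The semantics of a partitioned fabric can be modeled simply: a route between two GPUs exists only if both belong to the same partition. The class below is a conceptual illustration of that rule, not Fabric Manager's actual data structures.

```python
# Conceptual model of NVLink Secure Partition routing semantics:
# NVSwitch forwards NVLink traffic only within a partition.

class PartitionTable:
    def __init__(self) -> None:
        self._partition_of: dict[str, int] = {}  # GPU GUID -> partition id

    def assign(self, partition_id: int, gpu_guids: list[str]) -> None:
        """Place a tenant's GPUs into one isolated partition."""
        for guid in gpu_guids:
            self._partition_of[guid] = partition_id

    def route_allowed(self, src: str, dst: str) -> bool:
        """True only when both GPUs are members of the same partition."""
        return (
            src in self._partition_of
            and dst in self._partition_of
            and self._partition_of[src] == self._partition_of[dst]
        )

table = PartitionTable()
table.assign(1, ["gpu-0", "gpu-1", "gpu-2", "gpu-3"])  # tenant A
table.assign(2, ["gpu-4", "gpu-5", "gpu-6", "gpu-7"])  # tenant B
```

Within tenant A's partition every route is allowed at full bandwidth; a route from `gpu-0` to `gpu-4` simply does not exist in the fabric's routing tables.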

Bridge Integration with NVIDIA Fabric Manager

Bridge integrates with NVIDIA Fabric Manager to automate the NVSwitch partition lifecycle as part of compute allocation.

Figure: Bridge integration with NVIDIA Fabric Manager — Bridge NVLink controller and Metal Provisioning interact with Fabric Manager over out-of-band; Bridge Observability connects to DCGM and NVML for GPU and NVSwitch monitoring

Bridge manages Fabric Manager through two controllers:

NVLink controller + Metal Provisioning:

  • Communicates with Fabric Manager via the out-of-band (OOB) network and BMC.
  • Creates and removes NVLink partitions as compute is allocated and deallocated.
  • Ensures NVSwitch routing tables are programmed correctly for each tenant's GPU group.

Observability:

  • Integrates with DCGM (Data Center GPU Manager) for GPU and NVSwitch monitoring.
  • Integrates with NVML (NVIDIA Management Library) for programmatic GPU metrics access.
  • Exposes GPU and NVSwitch health metrics on the Bridge dashboard.

Partition Lifecycle

| Event | Bridge Action |
| --- | --- |
| Compute allocated to tenant | Bridge calls Fabric Manager to assign the tenant's GPUs to an isolated NVLink partition |
| Compute deallocated | Bridge removes the GPU GUIDs from the tenant's partition, immediately revoking NVLink fabric membership |

NVLink Secure Partitions are created and removed automatically with each compute allocation — no manual Fabric Manager configuration is required.
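The lifecycle above can be sketched as a pair of event handlers. `FabricManagerClient` is a hypothetical stand-in for the real Fabric Manager SDK (whose C API exposes partition activation calls such as `fmActivateFabricPartition` and `fmDeactivateFabricPartition`); the handler names are illustrative, not Bridge's actual interface.

```python
# Sketch of the automated partition lifecycle. FabricManagerClient is a
# hypothetical stand-in for the Fabric Manager SDK, kept in-memory here.

class FabricManagerClient:
    def __init__(self) -> None:
        self.active: dict[int, list[str]] = {}  # partition id -> GPU GUIDs

    def activate_partition(self, partition_id: int, gpu_guids: list[str]) -> None:
        self.active[partition_id] = list(gpu_guids)

    def deactivate_partition(self, partition_id: int) -> None:
        self.active.pop(partition_id, None)

def on_compute_allocated(fm: FabricManagerClient,
                         partition_id: int, gpu_guids: list[str]) -> None:
    """Allocation: place the tenant's GPUs in an isolated NVLink partition."""
    fm.activate_partition(partition_id, gpu_guids)

def on_compute_deallocated(fm: FabricManagerClient, partition_id: int) -> None:
    """Deallocation: revoke NVLink fabric membership immediately."""
    fm.deactivate_partition(partition_id)

fm = FabricManagerClient()
on_compute_allocated(fm, 7, ["gpu-0", "gpu-1"])   # tenant A comes online
on_compute_allocated(fm, 8, ["gpu-2"])            # tenant B comes online
on_compute_deallocated(fm, 7)                     # tenant A released
```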

Combined Isolation Model

Bridge enforces GPU isolation at two hardware levels simultaneously:

| Layer | Mechanism | Enforced By |
| --- | --- | --- |
| Host level | IOMMU device passthrough — GPU memory mapped exclusively to the tenant's VM | CPU/chipset hardware |
| Fabric level | NVLink Secure Partition — NVLink routing blocked between tenant GPU groups | NVSwitch hardware via Fabric Manager |

Together, these two layers guarantee that a tenant's GPUs cannot access memory or communicate over NVLink with GPUs belonging to any other tenant, while preserving full NVLink performance within each tenant's allocation.

Supported Hardware

NVSwitch-based GPU interconnect and NVLink Secure Partition are available on NVIDIA HGX systems:

| Platform | NVSwitch | NVLink Secure Partition |
| --- | --- | --- |
| HGX H100 | Yes | Yes |
| HGX H200 | Yes | Yes |
| HGX B200 | Yes | Yes |
| GB200 NVL72 | Yes | Yes |
| H100 PCIe / A100 PCIe | No | Not applicable (IOMMU only) |