AMD GPU

Bridge is adding support for AMD Instinct GPU accelerators. AMD GPU integration follows the same infrastructure lifecycle as NVIDIA GPU support — discovery via Redfish, catalog registration, post-provisioning software stack installation, and IOMMU-based tenant isolation.

Current Status

| Feature | Status |
| --- | --- |
| AMD Instinct MI300X discovery | In progress |
| IOMMU passthrough isolation | In progress |
| ROCm software stack post-provisioning | In progress |
| GPU observability (ROCm SMI / OTEL) | In progress |
| MIG-equivalent partitioning | Not applicable (AMD uses Compute Partitioning) |

Architecture

AMD Instinct GPUs are connected via PCIe and isolated using IOMMU passthrough — the same hardware isolation model used for NVIDIA PCIe GPUs (H100 PCIe, A100 PCIe). Bridge assigns each GPU to a specific IOMMU group and passes it directly through to the tenant's bare metal or VM environment.
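The isolation model above hinges on each GPU occupying its own IOMMU group: a device that shares a group with other functions cannot be passed through independently. A minimal sketch of that check, using hypothetical device data in place of a live scan of `/sys/kernel/iommu_groups`:

```python
# Sketch: verify a GPU can be passed through on its own.
# On a real host, group membership would be read from
# /sys/kernel/iommu_groups/<group>/devices/; the mapping below is
# hypothetical illustration data, not Bridge's actual inventory format.

def passthrough_safe(device: str, groups: dict[int, list[str]]) -> bool:
    """True if `device` is the sole member of its IOMMU group."""
    for members in groups.values():
        if device in members:
            return len(members) == 1
    raise KeyError(f"{device} not found in any IOMMU group")

# Example: one GPU alone in group 42, and a NIC sharing group 7
# with a sibling function (so it could not be assigned on its own).
iommu_groups = {
    42: ["0000:c1:00.0"],                  # AMD Instinct GPU
    7:  ["0000:03:00.0", "0000:03:00.1"],  # NIC + sibling function
}

print(passthrough_safe("0000:c1:00.0", iommu_groups))  # True
print(passthrough_safe("0000:03:00.0", iommu_groups))  # False
```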

The AMD Instinct MI300 family is built on an integrated CPU + GPU (APU) architecture with on-package HBM memory; MI300X is the GPU-focused variant with 192 GB of HBM3. Bridge catalogs MI300X as a compute flavor with combined CPU and GPU attributes.
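Because a single catalog entry carries both compute and accelerator attributes, the flavor record bundles CPU, GPU, and HBM fields together. A sketch of such a record (field names and the CPU/GPU counts are illustrative assumptions, not Bridge's actual catalog schema):

```python
from dataclasses import dataclass

# Hypothetical flavor record: field names are illustrative only,
# not Bridge's actual catalog schema.
@dataclass(frozen=True)
class ComputeFlavor:
    name: str
    cpu_cores: int        # host CPU cores exposed with the flavor
    gpu_count: int        # accelerators per node
    hbm_gib_per_gpu: int  # on-package HBM capacity
    isolation: str        # hardware isolation model

mi300x = ComputeFlavor(
    name="amd-instinct-mi300x",   # illustrative flavor name
    cpu_cores=96,                 # illustrative value
    gpu_count=8,                  # illustrative value
    hbm_gib_per_gpu=192,          # MI300X ships 192 GB HBM3 per GPU
    isolation="iommu-passthrough",
)
print(mi300x.name, mi300x.hbm_gib_per_gpu)
```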

Post-Provisioning Software Stack

For AMD GPU servers, Bridge's post-provisioning controller will install:

| Component | Purpose |
| --- | --- |
| ROCm (Radeon Open Compute) | AMD GPU compute runtime and math libraries |
| MIOpen | Deep learning primitives library (equivalent to cuDNN) |
| RCCL | Collective communications library for multi-GPU workloads |
| AMD GPU kernel driver (amdgpu) | Kernel module for AMD GPU access |
| ROCm SMI | GPU health and metrics access |
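The components above have an implicit install order: the kernel driver must be present before the ROCm user-space runtime, and the libraries depend on the runtime. A hedged sketch of how a post-provisioning step might sequence them (names are illustrative, and the ordering constraint is an assumption about dependency order, not Bridge's actual controller logic):

```python
# Illustrative post-provisioning sequence for an AMD GPU node.
# Component names mirror the table above; the ordering (kernel driver
# first, then runtime, then libraries) is an assumed dependency order,
# not Bridge's actual controller implementation.

INSTALL_PLAN = [
    ("amdgpu", "kernel driver providing device access"),
    ("rocm", "compute runtime and math libraries"),
    ("miopen", "deep learning primitives (cuDNN equivalent)"),
    ("rccl", "multi-GPU collective communications"),
    ("rocm-smi", "health and metrics access"),
]

def install_order() -> list[str]:
    """Return component names in dependency order."""
    return [name for name, _purpose in INSTALL_PLAN]

print(install_order())
```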
See Also

  • GPU Overview — Supported GPU families and resource models
  • NVIDIA — NVIDIA GPU configuration reference