# AMD GPU
Bridge is adding support for AMD Instinct GPU accelerators. AMD GPU integration follows the same infrastructure lifecycle as NVIDIA GPU support — discovery via Redfish, catalog registration, post-provisioning software stack installation, and IOMMU-based tenant isolation.
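The discovery step leans on Redfish's standard Processor schema, in which GPUs appear as processor resources with `ProcessorType` set to `"GPU"`. The following is a minimal sketch of how such resources could be filtered for AMD accelerators; the function name and the discovery flow are illustrative assumptions, not Bridge's actual implementation.

```python
# Sketch: identify AMD GPU accelerators among Redfish Processor resources.
# The field names (ProcessorType, Manufacturer, Model) come from the DMTF
# Redfish Processor schema; the overall flow is illustrative only.

def find_amd_gpus(processors: list[dict]) -> list[dict]:
    """Return Redfish Processor resources that are AMD GPU accelerators."""
    return [
        p for p in processors
        if p.get("ProcessorType") == "GPU"
        and "AMD" in p.get("Manufacturer", "")
    ]

# Abbreviated sample Processor resources as a BMC might report them:
sample = [
    {"Id": "CPU1", "ProcessorType": "CPU",
     "Manufacturer": "AMD", "Model": "EPYC 9654"},
    {"Id": "GPU1", "ProcessorType": "GPU",
     "Manufacturer": "AMD", "Model": "Instinct MI300X"},
]
```

In practice the processor list would be fetched from the BMC's Redfish endpoint before filtering; only the filtering logic is shown here.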
## Current Status
| Feature | Status |
|---|---|
| AMD Instinct MI300X discovery | In progress |
| IOMMU passthrough isolation | In progress |
| ROCm software stack post-provisioning | In progress |
| GPU observability (ROCm SMI / OTEL) | In progress |
| MIG-equivalent partitioning | Not applicable (AMD uses Compute Partitioning) |
## Architecture
AMD Instinct GPUs are connected via PCIe and isolated using IOMMU passthrough — the same hardware isolation model used for NVIDIA PCIe GPUs (H100 PCIe, A100 PCIe). Bridge assigns each GPU to a specific IOMMU group and passes it directly through to the tenant's bare metal or VM environment.
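On Linux, the kernel exposes IOMMU group membership under `/sys/kernel/iommu_groups`. A small sketch of reading that layout, assuming a helper of our own devising (not Bridge code), shows how a controller could map each PCI device to its group before passthrough:

```python
from pathlib import Path

def iommu_groups(root: str = "/sys/kernel/iommu_groups") -> dict[str, int]:
    """Map each PCI address (e.g. '0000:c1:00.0') to its IOMMU group number.

    Reads the standard Linux sysfs layout:
    /sys/kernel/iommu_groups/<group>/devices/<pci-address>.
    Illustrative helper, not Bridge's actual implementation.
    """
    mapping: dict[str, int] = {}
    base = Path(root)
    if not base.is_dir():          # IOMMU disabled or sysfs not mounted
        return mapping
    for group in base.iterdir():
        devices = group / "devices"
        if devices.is_dir():
            for dev in devices.iterdir():
                mapping[dev.name] = int(group.name)
    return mapping
```

A GPU can only be passed through safely when its IOMMU group contains no devices that must stay with the host, so a check like this would gate assignment.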
AMD Instinct MI300X is a GPU-only accelerator with 192 GB of on-package HBM3 memory; the related MI300A variant is the integrated CPU + GPU (APU) package. Bridge catalogs MI300X servers as compute flavors that combine the host's CPU attributes with the GPUs' count and memory capacity.
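A catalog entry combining host and GPU attributes could look like the sketch below. The flavor schema, field names, and the CPU/system-memory figures are illustrative assumptions; the 192 GB HBM3 capacity per MI300X is the published spec.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GpuFlavor:
    """Illustrative catalog flavor; not Bridge's actual schema."""
    name: str
    gpu_model: str
    gpu_count: int
    gpu_memory_gb: int    # on-package HBM3 per GPU
    cpu_cores: int        # host CPU cores (illustrative value below)
    system_memory_gb: int # host DRAM (illustrative value below)

mi300x_8gpu = GpuFlavor(
    name="bm.gpu.mi300x.8",      # hypothetical flavor name
    gpu_model="AMD Instinct MI300X",
    gpu_count=8,
    gpu_memory_gb=192,
    cpu_cores=128,
    system_memory_gb=2048,
)
```

An eight-GPU MI300X flavor like this exposes 1,536 GB of aggregate HBM3 to the tenant.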
## Post-Provisioning Software Stack
For AMD GPU servers, Bridge's post-provisioning controller will install:
| Component | Purpose |
|---|---|
| ROCm (Radeon Open Compute) | AMD GPU compute runtime and math libraries |
| MIOpen | Deep learning primitives library (equivalent to cuDNN) |
| RCCL | Collective communications library for multi-GPU workloads |
| AMD GPU kernel driver (amdgpu) | Kernel module for AMD GPU access |
| ROCm SMI | GPU health and metrics access |
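Once ROCm SMI is installed, the `rocm-smi` CLI can emit machine-readable metrics with its `--json` flag. The sketch below parses per-GPU edge temperatures from such output; the exact JSON key names vary across ROCm releases, so the sample payload is illustrative rather than a guaranteed schema.

```python
import json

def gpu_temperatures(rocm_smi_json: str) -> dict[str, float]:
    """Extract per-card edge temperatures from `rocm-smi --showtemp --json`
    output. Key matching is deliberately loose because field names differ
    between ROCm versions; illustrative helper, not Bridge code."""
    data = json.loads(rocm_smi_json)
    temps: dict[str, float] = {}
    for card, fields in data.items():
        for key, value in fields.items():
            if "Temperature" in key and "edge" in key:
                temps[card] = float(value)
    return temps

# Sample payload in the shape rocm-smi typically emits (keys may vary):
sample = json.dumps({
    "card0": {"Temperature (Sensor edge) (C)": "41.0"},
    "card1": {"Temperature (Sensor edge) (C)": "39.0"},
})
```

Metrics extracted this way could then be exported via OTEL for the observability pipeline mentioned in the status table.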
## Related Pages
- GPU Overview — Supported GPU families and resource models
- NVIDIA — NVIDIA GPU configuration reference