Onboarding an NVIDIA NVIS-Deployed GPU Topology with Bridge GPU CMS
We recently collaborated with the NVIDIA Infrastructure Specialist (NVIS) team to onboard a complex metadata topology deployed by NVIS into our Bridge GPU Cloud Management Software (CMS) and to validate the result. This activity demonstrates how Bridge GPU CMS can take over an NVIS-deployed GPU topology and then perform Day 1 and Day 2 activities such as discovery, dynamic multi-tenancy, observability, fault management, and more.
The Challenge
For NVIDIA Cloud Partners (NCPs), the initial deployment and configuration of GPU hardware and its subsequent management are often handled by different entities. Day 0 tasks for NCP GPU environments are often performed by NVIS, which then hands the cluster over to the NCP as what is effectively a single-tenant environment. From that point, multi-tenancy and other Day 1 and Day 2 tasks can be performed by Bridge GPU CMS.
In other words, there is a hand-off from an NVIS-deployed topology to our GPU CMS. NCPs and GPU-as-a-service providers need a robust, automated method to:
- Onboard a topology created by NVIS onto Bridge GPU CMS
- Validate that the metadata topology files created by the NVIDIA NVIS team after deploying the hardware are correctly and completely onboarded
- Efficiently provision underlay and overlay network configurations for onboarding infrastructure tenants
Validation & Onboarding with Bridge GPU CMS
We validated the successful hand-off of a 16 SU topology deployed by NVIS to Bridge GPU CMS. The validation was performed on NVIDIA Air.
Step-by-Step Workflow
- Metadata Onboarding: Imported the NVIS metadata topology file into Bridge GPU CMS
- RA Compliance Validation: Automatically validated the metadata against RA compliance rules. Any non-compliance was reported to the user immediately, with actionable feedback
- Topology Discovery: Dynamically discovered all underlying topology nodes (compute, network, and storage) referenced in the metadata
- Underlay Configuration: Configured network underlay settings for discovered nodes, ensuring base connectivity across the infrastructure
- Tenant Overlay Creation: Built tenant-specific overlay networks, enabling scalable multi-tenant operations on top of the validated infrastructure
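The workflow steps above can be sketched as a small pipeline that gates the later stages on validation. This is only an illustrative sketch, not the Bridge GPU CMS implementation: the metadata schema (`nodes`, `name`, `role`, `mgmt_ip`), the three checks, and the function names `validate_topology` and `onboard` are all assumptions made for the example, and the actual NVIS metadata format and RA compliance rules are not described in this post.

```python
import ipaddress
from dataclasses import dataclass
from typing import List

# Node kinds named in the workflow; the real rule set is an assumption here.
ALLOWED_ROLES = {"compute", "network", "storage"}


@dataclass
class Finding:
    node: str      # node the finding applies to
    message: str   # actionable description of what to fix


def validate_topology(metadata: dict) -> List[Finding]:
    """Return actionable findings; an empty list means the metadata passed."""
    findings = []
    seen = set()
    for node in metadata.get("nodes", []):
        name = node.get("name", "<unnamed>")
        if name in seen:
            findings.append(Finding(name, "duplicate node name; rename one entry"))
        seen.add(name)
        role = node.get("role")
        if role not in ALLOWED_ROLES:
            findings.append(
                Finding(name, f"unknown role {role!r}; expected one of {sorted(ALLOWED_ROLES)}")
            )
        try:
            ipaddress.ip_address(node.get("mgmt_ip", ""))
        except ValueError:
            findings.append(Finding(name, "missing or invalid mgmt_ip"))
    return findings


def onboard(metadata: dict) -> List[str]:
    """Run the stages in order and return the ones that completed."""
    completed = ["metadata_onboarding"]
    if validate_topology(metadata):
        return completed  # stop and surface the findings instead of continuing
    completed.append("compliance_validation")
    # Placeholders: a real system would discover nodes and push configuration here.
    for stage in ("topology_discovery", "underlay_configuration", "tenant_overlay_creation"):
        completed.append(stage)
    return completed


if __name__ == "__main__":
    sample = {
        "nodes": [
            {"name": "gpu-01", "role": "compute", "mgmt_ip": "10.0.0.1"},
            {"name": "leaf-01", "role": "network", "mgmt_ip": "not-an-ip"},
        ]
    }
    for finding in validate_topology(sample):
        print(f"{finding.node}: {finding.message}")
    print(onboard(sample))
```

Returning findings as data rather than raising on the first error lets every problem in a metadata file be reported in one pass, which matches the idea of giving the user actionable feedback before any underlay or overlay configuration is attempted.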
Impact
By automating the onboarding and validation of the metadata topology through Bridge GPU CMS, NCPs can achieve:
- Clean hand-off from NVIS to Bridge GPU CMS
- Faster deployment readiness
- Improved reliability of infrastructure metadata
- Streamlined compliance checks, reducing engineering effort
This use case illustrates how Bridge GPU CMS can successfully onboard a GPU topology deployed by NVIS. This validation matters to NCPs because they require a clean, disruption-free hand-off from Day 0 to Day 1 and Day 2 activities.
