Delivering Distributed AI at the Edge with Bridge
Not all AI is created equal. Centralized inference serves use cases where long response times are acceptable, but newer workloads such as physical AI, real-time agentic chatbots, digital avatars holding live conversations, and computer vision demand much faster responses. Network latency is only part of the problem: compute latency matters as well, which pushes computation closer to data sources and requires lower bandwidth usage across the network to scale cost-effectively.
These applications can't tolerate the latency of round trips to centralized data centers nor can they afford the cost of constantly transferring large volumes of data. Instead, they require inference that is geographically distributed, dynamically orchestrated, and tightly optimized for latency and bandwidth.
The Growing Need for Edge Inference
This is fueling a surge in demand for distributed inference infrastructure: systems capable of running AI models across clusters of GPUs at regional data centers and edge sites while maintaining cloud-like flexibility and scale. The distributed inference market is poised for exceptional growth, with projections indicating an expansion from USD 106.15 billion in 2025 to USD 254.98 billion by 2030, a CAGR of 19.2%.
Why NVIDIA MGX Servers Are a Game-Changer for Edge Inference
NVIDIA MGX servers are based on a modular reference design that serves a wide variety of use cases, from compute-intensive data center workloads to the edge. By improving ROI and reducing time to market, MGX sets a new standard for modular server design, and it is especially well suited to distributed inference. Some of the reasons for this are:
- Modular design allows core and edge sites to scale from 1 RU to multiple racks of servers
- High performance per watt allows maximum GPU compute capacity to be deployed at the distributed inference site
- Integration with NVIDIA Cloud Functions (NVCF) and NVIDIA NIM, both included in the NVIDIA AI Enterprise suite, which provide access to a large number of vertically oriented models and solutions
When combined with the NVIDIA Spectrum-X Ethernet networking platform for AI, MGX servers let customers extract the full performance of the underlying GPUs.
Challenges in Building an Edge Inference Stack
While MGX servers along with Spectrum-X and NVIDIA AI Enterprise offer an integrated solution stack, distributed inference presents a number of infrastructure challenges for a GPU-as-a-Service (GPUaaS) provider:
- Managing multiple sites: Distributed GPUaaS providers typically operate many sites, often in lights-out environments. The infrastructure, consisting of compute, storage, networking, and WAN gateways, has to be managed remotely at the lowest possible OPEX
- Managing isolation between multiple tenants: Distributed GPU sites host multiple tenants, each demanding the highest level of security and isolation from the others
- Matching workloads to the correct GPU site: Workloads have to be mapped to the appropriate site for latency, bandwidth, compliance, or data-gravity reasons (see the placement sketch after this list)
- Maximizing utilization: Given the high cost of GPUs, utilization has to be as close to 100% as possible at all times
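To make the matching and utilization problems concrete, here is a minimal sketch of how a scheduler might score candidate GPU sites against a workload's latency, compliance, and capacity constraints while packing work to keep utilization high. The `Site` and `Workload` shapes, the field names, and the weighting are illustrative assumptions, not part of any NVIDIA or Bridge interface.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Site:
    name: str
    region: str          # used for compliance / data-residency checks
    rtt_ms: float        # measured round-trip time from the data source
    free_gpus: int
    utilization: float   # 0.0 (idle) .. 1.0 (fully booked)

@dataclass
class Workload:
    gpus_needed: int
    max_rtt_ms: float           # latency budget for the use case
    allowed_regions: set[str]   # compliance / data-gravity constraint

def score(site: Site, wl: Workload) -> Optional[float]:
    """Return a placement score for the site, or None if it is infeasible."""
    if site.region not in wl.allowed_regions:
        return None   # compliance rules the site out entirely
    if site.rtt_ms > wl.max_rtt_ms or site.free_gpus < wl.gpus_needed:
        return None   # latency budget or capacity exceeded
    # Prefer sites that are already busy but still fit, packing work
    # tightly so fleet-wide utilization stays as close to 100% as possible.
    return site.utilization - 0.1 * site.rtt_ms / wl.max_rtt_ms

def place(sites: list[Site], wl: Workload) -> Optional[Site]:
    feasible = [(sc, s) for s in sites if (sc := score(s, wl)) is not None]
    return max(feasible, key=lambda t: t[0])[1] if feasible else None

sites = [
    Site("edge-eu-01", "eu", rtt_ms=8.0, free_gpus=4, utilization=0.70),
    Site("core-us-01", "us", rtt_ms=45.0, free_gpus=32, utilization=0.40),
]
wl = Workload(gpus_needed=2, max_rtt_ms=20.0, allowed_regions={"eu"})
print(place(sites, wl).name)   # -> edge-eu-01
```

A production scheduler would also weigh bandwidth cost and data gravity, but the shape of the decision, filtering on hard constraints and then bin-packing on utilization, stays the same.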
The Need for Secure, Dynamic Tenancy and Isolation
The above challenges require a secure and dynamic tenancy software layer for distributed inference. The ideal software solution must offer:
- Zero-touch management of the underlying hardware infrastructure, potentially across tens of thousands of edge and core sites, to slash OPEX
- Isolation between tenants for security and compliance
- Dynamic resource scaling to keep GPU utilization high
- Registration of underutilized resources with NVCF so that idle capacity can still be monetized (sketched after this list)
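The last two requirements can be pictured as a simple control loop: grow or shrink a tenant's allocation based on observed GPU utilization, and offer whatever stays idle to NVCF. The thresholds and the `register_with_nvcf` stub below are hypothetical placeholders; the real NVCF registration flow is not reproduced here.

```python
SCALE_OUT_THRESHOLD = 0.85   # assumption: request more GPUs above this load
SCALE_IN_THRESHOLD = 0.30    # assumption: release GPUs below this load

def register_with_nvcf(gpu_ids: list[str]) -> None:
    """Hypothetical stub standing in for NVCF capacity registration."""
    print(f"offering {len(gpu_ids)} idle GPU(s) to NVCF: {gpu_ids}")

def reconcile(tenant_util: float, allocated: list[str], idle_pool: list[str]) -> None:
    """One pass of an illustrative utilization-driven scaling loop."""
    if tenant_util > SCALE_OUT_THRESHOLD and idle_pool:
        allocated.append(idle_pool.pop())   # scale out from the site's idle pool
    elif tenant_util < SCALE_IN_THRESHOLD and len(allocated) > 1:
        idle_pool.append(allocated.pop())   # scale in, returning a GPU
    if idle_pool:
        register_with_nvcf(idle_pool)       # monetize whatever remains idle

reconcile(0.92, allocated=["gpu0", "gpu1"], idle_pool=["gpu2", "gpu3"])
```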
Introducing Bridge GPU Cloud Management Software
Bridge GPU CMS provides the following functionality:
- On-demand isolation spanning CPU, GPU, network, storage, and the WAN gateway
- Bare metal, virtual machine, or container instances
- Automated infrastructure management for tenants with scale-out and scale-in
- Admin functionality to discover, observe, and manage the underlying hardware across tens of thousands of sites
- Billing and user management with role-based access control (RBAC)
- Integration with open-source frameworks (Ray, vLLM) or third-party PaaS offerings (Red Hat OpenShift and more)
- Integration with NVIDIA Cloud Functions (NVCF) to monetize unused capacity
- Centralized management and orchestration of multiple edge locations
Reference Architecture: NVIDIA + Bridge for Edge Inference
Bridge GPU CMS, when coupled with NVIDIA MGX, Spectrum-X, and NVIDIA AI Enterprise, solves the problems listed above for GPUaaS providers. The components of this architecture are:
- NVIDIA MGX servers with NVIDIA BlueField-3 DPUs or ConnectX-7 Ethernet adapters
- NVIDIA Spectrum switches for communication (East-West and North-South)
- NVIDIA Spectrum switches for out-of-band (OOB) management
- NVIDIA AI Enterprise (NVIDIA NIM, NVCF)
- Optional NVIDIA Quantum InfiniBand switches for East-West communication
- External High Performance Storage (HPS) from partner solutions
- Bridge GPU Cloud Management Software (CMS)
Installation
The infrastructure is installed at the distributed edge and core locations, along with the software components including Bridge GPU CMS, and all hardware-related tests are performed before the resources are onboarded.
Onboarding
Once this is complete, the site administrator discovers the infrastructure using Bridge GPU CMS and creates the underlay network across the Spectrum switches and the network adapters or DPUs on the MGX servers. The admin then creates tenants: the logical entities that run different types of workloads on the same physical infrastructure.
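The flow just described (discover hardware, build the underlay, then carve out tenants) could be sketched as follows. Every function and type here is a hypothetical stand-in mirroring the steps in the text; Bridge GPU CMS's actual interfaces are not documented in this post.

```python
from dataclasses import dataclass

@dataclass
class TenantSpec:
    name: str
    gpus: int
    vlan: int   # illustrative: one network-isolation primitive among several

def discover_inventory(site_id: str) -> list[str]:
    """Hypothetical stand-in for hardware discovery at a site."""
    return [f"{site_id}-mgx-{i}" for i in range(4)]

def build_underlay(site_id: str, nodes: list[str]) -> str:
    """Hypothetical stand-in for Spectrum switch and DPU/NIC underlay setup."""
    return f"underlay-{site_id}-{len(nodes)}-nodes"

def create_tenant(fabric: str, spec: TenantSpec) -> None:
    """Hypothetical stand-in for carving a tenant out of shared hardware."""
    print(f"tenant {spec.name}: {spec.gpus} GPUs on VLAN {spec.vlan} via {fabric}")

def onboard_site(site_id: str, specs: list[TenantSpec]) -> None:
    nodes = discover_inventory(site_id)      # step 1: find servers, NICs, DPUs
    fabric = build_underlay(site_id, nodes)  # step 2: create the underlay network
    for spec in specs:                       # step 3: create logical tenants
        create_tenant(fabric, spec)

onboard_site("edge-eu-01", [TenantSpec("acme", gpus=8, vlan=101)])
```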
Isolation
The important consideration when allocating these resources is that they must be fully isolated, so that each tenant's workload runs without any performance or security impact from other tenants. Bridge GPU CMS ensures this by providing hard multi-tenancy at every level: CPU, GPU, memory, network adapters, network switches, internal and external storage, all the way out to the external gateway.
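One way to picture hard multi-tenancy is as a per-tenant isolation envelope that must be disjoint from every other tenant's on every layer before a workload is admitted. The layer names below follow the list in the text; the data structure and check are illustrative assumptions, not Bridge internals.

```python
from dataclasses import dataclass, field

# Layers named in the text, each of which must be isolated per tenant.
LAYERS = ("cpu", "gpu", "memory", "nic", "switch", "storage", "gateway")

@dataclass
class IsolationEnvelope:
    """Illustrative record of the resources a tenant exclusively owns."""
    tenant: str
    claims: dict[str, set[str]] = field(default_factory=dict)

def disjoint(a: IsolationEnvelope, b: IsolationEnvelope) -> bool:
    """Hard multi-tenancy holds only if no layer shares a resource."""
    return all(
        a.claims.get(layer, set()).isdisjoint(b.claims.get(layer, set()))
        for layer in LAYERS
    )

t1 = IsolationEnvelope("tenant-a", {"gpu": {"gpu0", "gpu1"}, "nic": {"vf0"}})
t2 = IsolationEnvelope("tenant-b", {"gpu": {"gpu2"}, "nic": {"vf1"}})
assert disjoint(t1, t2)   # no resource is shared on any layer
```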
Conclusion: The Path to Scalable, Multi-Tenant, Distributed AI
The next wave of AI adoption depends on pushing inference closer to where data is generated—at the edge. Achieving this requires more than raw compute; it calls for an architecture that delivers secure multi-tenancy, dynamic scaling, high utilization, and seamless integration with cloud-native AI services.
NVIDIA MGX servers, combined with Spectrum-X networking and NVIDIA AI Enterprise, provide the performance and flexibility needed for distributed edge deployments. With Bridge GPU Cloud Management Software layered on top, organizations gain the management and orchestration capabilities essential for turning this distributed infrastructure into a scalable, revenue-generating service.
The opportunity is clear: distributed inference is becoming central to how next-generation AI will be delivered. Now is the time to explore, pilot, and engage to unlock these capabilities and be part of the ecosystem shaping the future of AI at the edge.