Bridge Tenant Guide

Purpose

Bridge provides self-service, on-demand access to secure, high-performance GPU infrastructure and AI platforms — without requiring tenants to manage hardware, networking, or cloud operations.

Overview

This guide provides comprehensive instructions for tenants to manage and use resources within the Bridge platform.

Key Capabilities

As a Tenant Admin or Tenant User (both roles are explained in the following sections), you can:

Allocate Resources - Allocate bare metal servers and virtual machines
Create Clusters - Set up Slurm, JupyterHub, and Kubernetes clusters
Deploy Models - Deploy ML/AI models for inference
Configure GPU - Set up NVIDIA Multi-Instance GPU (MIG) profiles
Manage Endpoints - Create endpoints for services
Access Jupyter - Run interactive notebooks with GPU access
Deploy Applications - Deploy custom workloads and applications
Scale Infrastructure - Scale clusters up or down based on demand

Guide Structure

This guide is organized into the following sections:

Dashboard - Overview of tenant resources
Resource Allocation - Allocate servers and infrastructure
Cluster Management - Create and manage different cluster types
GPU Configuration - Configure MIG profiles
Endpoints & Services - Create service endpoints
Model Deployment - Deploy ML models
Jupyter Access - Use Jupyter notebooks
Workload Management - Deploy and manage workloads
Application Management - Deploy applications
Cleanup - Delete resources

Bridge — Tenant Overview

Tenants consume GPUs as a service with:

Clear boundaries
Predictable performance
Full usage visibility

Tenant Roles

Bridge supports two tenant personas:

Tenant Admin – Manages users, quotas, and services
Tenant User – Consumes compute, platforms, and AI services

What Tenants Get

1. Isolated GPU Infrastructure

Each tenant receives infrastructure that is:

Logically hard-isolated
Equivalent to a private GPU cloud
Consistent across bare metal, VMs, and Kubernetes

Tenants do not share any infrastructure resources (GPUs, networking, storage, etc.).

2. Multiple Consumption Models

Bare Metal GPU Instances as a Service (BMaaS)

On-demand dedicated GPU servers
Ideal for large training or regulated workloads
Can form clusters or supercomputers
Fully isolated from other Tenants
Part of Tenant VPC and Subnet(s)

Virtual Machines with GPUs as a Service (VMaaS)

On-demand GPU VMs
GPU passthrough and fractional GPUs (MIG)
Suitable for development and inference
Fully isolated from other Tenants
Part of Tenant VPC and Subnet(s)
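To make the "fractional GPUs (MIG)" item above more concrete, here is a minimal sketch of how MIG profiles divide a single GPU. It assumes an NVIDIA A100 40GB, whose published MIG profiles split the GPU into 7 compute slices and 8 memory slices. The function name fits_on_gpu is hypothetical and not part of Bridge, and the profiles Bridge actually offers may differ by GPU model. The check is necessary but not sufficient: real MIG placement has additional position rules that this sketch ignores.

```python
# (compute slices, memory slices) per MIG profile on an A100 40GB.
# Assumption: values follow NVIDIA's published A100 40GB profile table;
# other GPU models use different profile names and sizes.
A100_40GB_PROFILES = {
    "1g.5gb": (1, 1),
    "2g.10gb": (2, 2),
    "3g.20gb": (3, 4),
    "4g.20gb": (4, 4),
    "7g.40gb": (7, 8),
}
MAX_COMPUTE_SLICES = 7
MAX_MEMORY_SLICES = 8


def fits_on_gpu(profiles):
    """Return True if the requested profiles stay within the slice budget.

    Hypothetical helper: it only sums slices and does not model the
    placement constraints enforced by the driver.
    Raises KeyError for a profile name that is not in the table.
    """
    compute = sum(A100_40GB_PROFILES[name][0] for name in profiles)
    memory = sum(A100_40GB_PROFILES[name][1] for name in profiles)
    return compute <= MAX_COMPUTE_SLICES and memory <= MAX_MEMORY_SLICES


if __name__ == "__main__":
    print(fits_on_gpu(["3g.20gb", "4g.20gb"]))            # within budget
    print(fits_on_gpu(["3g.20gb", "3g.20gb", "1g.5gb"]))  # exceeds memory slices
```

The second call fails the check even though it uses only 7 compute slices, because the two 3g.20gb instances already consume all 8 memory slices.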

Platform-as-a-Service (PaaS)

Managed Kubernetes clusters
Autoscaling based on GPU utilization
Application deployment from a catalog (Marketplace)
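As an illustration of what "autoscaling based on GPU utilization" can mean, the sketch below applies the proportional rule used by the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas * current metric / target metric). This guide does not state which policy Bridge uses, so the function name, the parameter names, and the replica bounds are all assumptions, and the HPA's tolerance band and stabilization window are left out.

```python
import math


def desired_replicas(current_replicas, current_gpu_util, target_gpu_util,
                     min_replicas=1, max_replicas=10):
    """Proportional scaling rule in the style of the Kubernetes HPA.

    Hypothetical helper, not a Bridge API. Utilization values are
    percentages averaged across the current replicas.
    """
    if target_gpu_util <= 0:
        raise ValueError("target_gpu_util must be positive")
    raw = math.ceil(current_replicas * current_gpu_util / target_gpu_util)
    # Clamp to the configured bounds so the cluster never scales to zero
    # or beyond its quota.
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # 4 replicas at 90% average GPU utilization with a 60% target.
    print(desired_replicas(4, 90, 60))
```

With these inputs the rule asks for 6 replicas; if utilization fell to 30%, the same rule would scale the workload down to 2.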

3. AI & Model Services (Self-Service)

Tenants can:

Select models from curated catalogs (Hugging Face, NVIDIA NIM, private repositories)
Deploy inference endpoints
Run fine-tuning and batch jobs
Deploy Jupyter Notebooks on KAI

All services are exposed through simple UI workflows and APIs.

Tenant Admin Capabilities

Tenant Admins manage:

Tenant users and roles
Usage limits and quotas
Compute, clusters, and storage provisioning
Model catalogs and integrations
Monitoring, usage, performance, and cost tracking
Alerts and billing controls

Physical infrastructure and shared fabric remain abstracted.

Tenant End User Experience

End Users can:

Provision GPU instances or platforms
Submit training, inference, or HPC jobs
Access Jupyter Notebooks or LLM endpoints
Monitor job status and resource usage
Access logs and performance metrics

Users focus on workloads — not infrastructure.

Tenant Value Proposition

With Bridge, tenants gain:

Predictable, isolated GPU access without CapEx
Flexible infrastructure, platform, and AI services
Faster time-to-model and time-to-inference
Enterprise-grade security and observability
A consistent experience across development, training, and production

Getting Started

Start with the Dashboard Overview to see your tenant resources, then proceed to allocate infrastructure and create your first cluster.
