Bridge Tenant Guide

Purpose

Bridge provides self-service, on-demand access to secure, high-performance GPU infrastructure and AI platforms — without requiring tenants to manage hardware, networking, or cloud operations.

Overview

This guide explains how tenants manage and use resources on the Bridge platform.

Key Capabilities

As a Tenant Admin or Tenant User (both roles are described under Tenant Roles below), you can:

  • Allocate Resources - Allocate bare metal servers and virtual machines
  • Create Clusters - Set up Slurm, JupyterHub, and Kubernetes clusters
  • Deploy Models - Deploy ML/AI models for inference
  • Configure GPU - Set up NVIDIA Multi-Instance GPU (MIG) profiles
  • Manage Endpoints - Create endpoints for services
  • Access Jupyter - Run interactive notebooks with GPU access
  • Deploy Applications - Deploy custom workloads and applications
  • Scale Infrastructure - Scale clusters up or down based on demand

Guide Structure

This guide is organized into the following sections:

  1. Dashboard - Overview of tenant resources
  2. Resource Allocation - Allocate servers and infrastructure
  3. Cluster Management - Create and manage different cluster types
  4. GPU Configuration - Configure MIG profiles
  5. Endpoints & Services - Create service endpoints
  6. Model Deployment - Deploy ML models
  7. Jupyter Access - Use Jupyter notebooks
  8. Workload Management - Deploy and manage workloads
  9. Application Management - Deploy applications
  10. Cleanup - Delete resources

Bridge — Tenant Overview

Tenants consume GPUs as a service with:

  • Clear boundaries
  • Predictable performance
  • Full usage visibility

Tenant Roles

Bridge supports two tenant personas:

  • Tenant Admin – Manages users, quotas, and services
  • Tenant User – Consumes compute, platforms, and AI services

What Tenants Get

1. Isolated GPU Infrastructure

Each tenant receives infrastructure that is:

  • Hard-isolated from other tenants at the logical level
  • Equivalent to a private GPU cloud
  • Consistent across bare metal, VMs, and Kubernetes

Tenants do not share any infrastructure resources (GPUs, networking, storage, etc.).


2. Multiple Consumption Models

Bare Metal GPU Instances as a Service (BMaaS)

  • On-demand dedicated GPU servers
  • Ideal for large training or regulated workloads
  • Can form clusters or supercomputers
  • Fully isolated from other Tenants
  • Part of Tenant VPC and Subnet(s)

Virtual Machines with GPUs as a Service (VMaaS)

  • On-demand GPU VMs
  • GPU passthrough and fractional GPUs (MIG)
  • Suitable for development and inference
  • Fully isolated from other Tenants
  • Part of Tenant VPC and Subnet(s)

Platform-as-a-Service (PaaS)

  • Managed Kubernetes clusters
  • Autoscaling based on GPU utilization
  • Deploy Applications from a Catalog (Marketplace)
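
This guide does not describe the exact policy Bridge uses to autoscale on GPU utilization. As a rough illustration only, the sketch below applies the proportional rule used by the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current × observed / target)) to a GPU utilization metric. The function name, the default replica limits, and the tolerance are hypothetical and are not Bridge settings.

```python
import math


def desired_replicas(current_replicas, current_gpu_util, target_gpu_util,
                     min_replicas=1, max_replicas=8, tolerance=0.1):
    """Return a replica count from a proportional scaling rule.

    Hypothetical helper: utilization values are percentages (0-100),
    and the defaults are illustrative, not Bridge configuration.
    """
    if current_replicas < 1:
        raise ValueError("current_replicas must be at least 1")
    if target_gpu_util <= 0:
        raise ValueError("target_gpu_util must be positive")

    ratio = current_gpu_util / target_gpu_util

    # Skip scaling when utilization is already close to the target,
    # which avoids flapping between replica counts.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)

    # Clamp the result to the allowed range.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    print(desired_replicas(4, 90, 60))  # scale up: prints 6
    print(desired_replicas(4, 30, 60))  # scale down: prints 2
```

With four replicas at 90% average GPU utilization and a 60% target, the rule asks for six replicas. At 30% it drops to two. The tolerance band and the minimum and maximum limits keep the cluster from oscillating or scaling without bound.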

3. AI & Model Services (Self-Service)

Tenants can:

  • Select models from curated catalogs (Hugging Face, NVIDIA NIM, private repositories)
  • Deploy inference endpoints
  • Run fine-tuning and batch jobs
  • Deploy Jupyter Notebooks on KAI

All services are exposed through simple UI workflows and APIs.


Tenant Admin Capabilities

Tenant Admins manage:

  • Tenant users and roles
  • Usage limits and quotas
  • Compute, clusters, and storage provisioning
  • Model catalogs and integrations
  • Monitoring, usage, performance, and cost tracking
  • Alerts and billing controls

The physical infrastructure and shared fabric remain abstracted away from tenants.


Tenant User Experience

Tenant Users can:

  • Provision GPU instances or platforms
  • Submit training, inference, or HPC jobs
  • Access Jupyter Notebooks or LLM endpoints
  • Monitor job status and resource usage
  • Access logs and performance metrics

Users focus on workloads — not infrastructure.


Tenant Value Proposition

With Bridge, tenants gain:

  • Predictable, isolated GPU access without CapEx
  • Flexible infrastructure, platform, and AI services
  • Faster time-to-model and time-to-inference
  • Enterprise-grade security and observability
  • A consistent experience across development, training, and production

Getting Started

Start with the Dashboard Overview to see your tenant resources, then proceed to allocate infrastructure and create your first cluster.