Armada.ai Bridge Tenant Guide
Purpose
Armada Bridge provides self-service, on-demand access to secure, high-performance GPU infrastructure and AI platforms — without requiring tenants to manage hardware, networking, or cloud operations.
Overview
This guide provides comprehensive instructions for Tenants to manage and utilize resources within the Armada Bridge platform.
Key Capabilities
As a Tenant Admin or Tenant User (both roles are described in the sections that follow), you can:
- Allocate Resources - Allocate bare metal servers and virtual machines
- Create Clusters - Set up Slurm, JupyterHub, and Kubernetes clusters
- Deploy Models - Deploy ML/AI models for inference
- Configure GPU - Set up NVIDIA Multi-Instance GPU (MIG) profiles
- Manage Endpoints - Create endpoints for services
- Access Jupyter - Run interactive notebooks with GPU access
- Deploy Applications - Deploy custom workloads and applications
- Scale Infrastructure - Scale clusters up or down based on demand
Guide Structure
This guide is organized into the following sections:
- Dashboard - Overview of tenant resources
- Resource Allocation - Allocate servers and infrastructure
- Cluster Management - Create and manage different cluster types
- GPU Configuration - Configure MIG profiles
- Endpoints & Services - Create service endpoints
- Model Deployment - Deploy ML models
- Jupyter Access - Use Jupyter notebooks
- Workload Management - Deploy and manage workloads
- Application Management - Deploy applications
- Cleanup - Delete resources
Armada Bridge — Tenant Overview
Tenants consume GPUs as a service with:
- Clear boundaries
- Predictable performance
- Full usage visibility
Tenant Roles
Bridge supports two tenant personas:
- Tenant Admin – Manages users, quotas, and services
- Tenant User – Consumes compute, platforms, and AI services
What Tenants Get
1. Isolated GPU Infrastructure
Each tenant receives infrastructure that is:
- Hard-isolated at the logical layer
- Equivalent to a private GPU cloud
- Consistent across bare metal, VMs, and Kubernetes
Tenants do not share any infrastructure resources (GPUs, networking, storage, etc.) with other tenants.
2. Multiple Consumption Models
Bare Metal GPU Instances as a Service (BMaaS)
- On-demand, dedicated GPU servers
- Ideal for large training or regulated workloads
- Can form clusters or supercomputers
- Fully isolated from other Tenants
- Part of Tenant VPC and Subnet(s)
Virtual Machines with GPUs as a Service (VMaaS)
- On-demand GPU VMs
- GPU passthrough and fractional GPUs (MIG)
- Suitable for development and inference
- Fully isolated from other Tenants
- Part of Tenant VPC and Subnet(s)
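Fractional GPUs with MIG work by splitting a single physical GPU into isolated instances, each with its own compute slices and memory. The sketch below illustrates how standard NVIDIA MIG profiles for an A100 40GB partition the GPU's seven compute slices; the profile names and sizes are NVIDIA's documented profiles, but the validation helper itself is illustrative and not part of the Armada Bridge API.

```python
# Illustrative sketch: sizing fractional GPUs with NVIDIA MIG profiles.
# Profile names/sizes below are standard for an A100 40GB; the planning
# helper is a hypothetical example, not an Armada Bridge function.

# Compute slices and memory footprint of common A100 40GB MIG profiles.
MIG_PROFILES = {
    "1g.5gb":  {"slices": 1, "mem_gb": 5},
    "2g.10gb": {"slices": 2, "mem_gb": 10},
    "3g.20gb": {"slices": 3, "mem_gb": 20},
    "7g.40gb": {"slices": 7, "mem_gb": 40},
}

def validate_plan(profiles, total_slices=7):
    """Check that a requested set of MIG instances fits one GPU (7 slices)."""
    used = sum(MIG_PROFILES[p]["slices"] for p in profiles)
    return used <= total_slices

# Example: two 3g.20gb instances plus one 1g.5gb use all 7 slices.
plan = ["3g.20gb", "3g.20gb", "1g.5gb"]
print(validate_plan(plan))  # True
```

A plan that oversubscribes the GPU (for example, a `7g.40gb` instance plus anything else) would fail this check, which mirrors why MIG gives each tenant VM predictable, isolated performance.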
Platform-as-a-Service (PaaS)
- Managed Kubernetes clusters
- Autoscaling based on GPU utilization
- Deploy Applications from a Catalog (Marketplace)
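On a managed Kubernetes cluster, tenants request GPUs through standard Kubernetes resource limits. The sketch below builds a minimal Pod manifest using the conventional `nvidia.com/gpu` resource name from the NVIDIA device plugin; the pod name and container image are placeholders, not Armada Bridge specifics.

```python
# Illustrative sketch: a generic Kubernetes Pod spec requesting one GPU,
# as it could be submitted to a tenant's managed cluster.
# "nvidia.com/gpu" is the standard NVIDIA device-plugin resource name;
# the names and image below are placeholders.
import json

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "gpu-inference-demo"},  # placeholder name
    "spec": {
        "containers": [{
            "name": "worker",
            "image": "nvcr.io/nvidia/pytorch:24.01-py3",  # placeholder image
            # One full GPU; a MIG-backed cluster could expose fractional
            # resource names instead.
            "resources": {"limits": {"nvidia.com/gpu": 1}},
        }],
        "restartPolicy": "Never",
    },
}

print(json.dumps(pod, indent=2))
```

In practice this manifest would be applied with `kubectl apply -f`, and GPU-utilization-based autoscaling would add or remove such pods (and nodes) as demand changes.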
3. AI & Model Services (Self-Service)
Tenants can:
- Select models from curated catalogs (Hugging Face, NVIDIA NIM, private repositories)
- Deploy inference endpoints
- Run fine-tuning and batch jobs
- Deploy Jupyter Notebooks on KAI
All services are exposed through simple UI workflows and APIs.
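As a sketch of what calling a deployed inference endpoint can look like: many model-serving stacks (NVIDIA NIM among them) expose OpenAI-compatible HTTP APIs. The endpoint URL, token, and model name below are hypothetical placeholders, not documented Armada Bridge values.

```python
# Illustrative sketch: preparing a request to an OpenAI-compatible
# inference endpoint. URL, token, and model name are placeholders.
import json
import urllib.request

ENDPOINT = "https://inference.example.com/v1/chat/completions"  # placeholder
TOKEN = "YOUR_TENANT_API_TOKEN"  # placeholder

payload = {
    "model": "meta/llama-3.1-8b-instruct",  # example catalog model name
    "messages": [{"role": "user", "content": "Summarize MIG in one line."}],
    "max_tokens": 64,
}

req = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req)  # not executed here: the URL is a placeholder
print(req.get_method(), payload["model"])
```

The same request shape works from notebooks, batch jobs, or application code, so workloads can move between development and production without changing the client.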
Tenant Admin Capabilities
Tenant Admins manage:
- Tenant users and roles
- Usage limits and quotas
- Compute, clusters, and storage provisioning
- Model catalogs and integrations
- Monitoring, usage, performance, and cost tracking
- Alerts and billing controls
Physical infrastructure and the shared fabric remain abstracted away from tenants.
Tenant End User Experience
End Users can:
- Provision GPU instances or platforms
- Submit training, inference, or HPC jobs
- Access Jupyter Notebooks or LLM endpoints
- Monitor job status and resource usage
- Access logs and performance metrics
Users focus on workloads — not infrastructure.
Tenant Value Proposition
With Armada Bridge, tenants gain:
- Predictable, isolated GPU access without CapEx
- Flexible infrastructure, platform, and AI services
- Faster time-to-model and time-to-inference
- Enterprise-grade security and observability
- A consistent experience across development, training, and production
Getting Started
Start with the Dashboard Overview to see your tenant resources, then proceed to allocate infrastructure and create your first cluster.