Kubernetes Cluster (PaaS) Overview

The Kubernetes PaaS layer needs to support a diverse set of AI/ML workloads, such as:

  • Large language model (LLM) training, inference, distillation, fine-tuning, and RAG
  • Small language model training, inference
  • Deep learning training
  • HPC workloads
  • Any other cloud-native applications

These workloads require different underlying platform capabilities. For example, HPC needs physical / virtualized compute clusters, while others need container orchestration tools like Kubernetes. There is no one-size-fits-all solution for a unified PaaS layer capable of supporting all of these diverse workloads.[^1] Additionally, this layer needs to be offered in a multi-tenant manner and kept in sync with the underlying infrastructure components. Finally, the solution should also ensure optimal GPU utilization.

Armada Bridge addresses these requirements by providing a PaaS layer with the following capabilities:

  • Create / Scale / Delete Kubernetes clusters using the underlying bare-metal (BM) or virtual machine (VM) compute instances

    • Multiple Kubernetes distributions are supported (upstream Kubernetes, Red Hat OpenShift, SUSE Rancher), and other distributions can be integrated easily due to the open nature of the CMS architecture.
  • Native job scheduler with common interface for different types of job submission

  • PaaS capabilities over an underlay infrastructure comprising both bare-metal / virtualized compute nodes and Kubernetes clusters

  • Support for both containerized and non-containerized workloads

  • Dynamic autoscaling of AI/ML workloads within and across clusters, based on GPU utilization

  • Auto scale-in and scale-out of Kubernetes clusters by allocating and de-allocating GPU compute nodes

  • Registering Kubernetes clusters with NVIDIA Cloud Functions (NVCF): GPUaaS providers / NCPs need to maximize the utilization of their GPUs to get the best return on investment, and NVIDIA Cloud Functions (NVCF) is one mechanism to do so. NVCF allows NCPs to sell their unused capacity to pre-approved users as spot instances. From a technology point of view, the NCP registers a GPU-based Kubernetes cluster that scales dynamically with the number of unused GPUs at any given moment; for example, there may be only 2 GPUs available at 2 PM but 200 GPUs available at 2 AM, and the cluster registered with NVCF must scale dynamically to absorb these unused cycles. Such clusters are fully isolated from other tenants / workloads and are scaled out/in automatically, which allows NCPs / GPUaaS providers to monetize their excess GPU capacity. CMS provides usage metrics for clusters registered with NVCF to enable the billing process.

  • Supporting third-party job schedulers: The Armada Bridge PaaS layer can also utilize third-party job schedulers such as Run:ai and Slurm, based on the NCP / AI cloud provider's preferences. Run:ai is an AI/ML workload orchestration platform built on Kubernetes, providing dynamic GPU allocation, fractional GPU sharing, job queue management, and priority-based scheduling to maximize GPU utilization across multiple users and workloads. By integrating with Run:ai, Armada Bridge lets organizations take advantage of Run:ai's scheduling features while maintaining full control over resource allocation and multi-tenancy through the GPU CMS's native capabilities. Additionally, Armada Bridge provides fine-grained RBAC, tenant isolation, and a unified observability layer, allowing seamless orchestration of AI jobs while ensuring efficient GPU utilization, workload prioritization, and policy enforcement.
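The utilization-driven scale-out / scale-in described above can be sketched as a simple watermark policy. This is a minimal illustration only; the function name, thresholds, and one-node-at-a-time stepping are assumptions for the sketch, not the Armada Bridge implementation:

```python
# Sketch of GPU-utilization-driven autoscaling (illustrative only;
# the name, thresholds, and step size are assumptions, not the CMS API).

def desired_node_count(current_nodes, avg_gpu_util,
                       scale_out_at=0.85, scale_in_at=0.30,
                       min_nodes=1, max_nodes=64):
    """Return the target number of GPU nodes for a cluster.

    avg_gpu_util: mean GPU utilization across the cluster, in [0, 1].
    Scale out above the high watermark, scale in below the low
    watermark, otherwise hold steady.
    """
    if avg_gpu_util > scale_out_at:
        target = current_nodes + 1          # allocate one more GPU node
    elif avg_gpu_util < scale_in_at:
        target = current_nodes - 1          # release an idle GPU node
    else:
        target = current_nodes              # within the comfort band
    return max(min_nodes, min(max_nodes, target))
```

A real controller would run this in a reconciliation loop against live GPU metrics and then allocate / de-allocate compute nodes through the infrastructure layer.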

Different personas consume the PaaS layer in different ways.

At this layer, the NCP Admin (Super Admin) has no direct role to play, since their responsibility is limited to providing the isolated infrastructure.

Tenant Admin:

  • Create and manage clusters (add / remove nodes) comprising BM / virtualized compute nodes

  • Create and manage Kubernetes clusters with multiple master and worker nodes (add / remove nodes)

  • Define cluster wide policies

  • Create / Invite end users

  • Provision external storage for the clusters

  • Define policies and quota for end users

  • Observe and monitor the clusters

  • Specify billing parameters

  • Set FinOps alerts; enable / disable access for enterprise users
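The per-user policies and quotas a Tenant Admin defines imply an admission check at job-submission time. The sketch below shows the idea with illustrative field names (`max_gpus`, `max_jobs` are assumptions, not the CMS schema):

```python
# Sketch of a per-user quota check implied by Tenant Admin policies
# (field and function names are illustrative assumptions).
from dataclasses import dataclass

@dataclass
class UserQuota:
    max_gpus: int   # total GPUs the user may hold at once
    max_jobs: int   # concurrent jobs the user may run

def admit_job(quota, gpus_in_use, jobs_running, gpus_requested):
    """Return True if a new job fits within the user's quota."""
    if jobs_running + 1 > quota.max_jobs:
        return False                      # job-count quota exceeded
    if gpus_in_use + gpus_requested > quota.max_gpus:
        return False                      # GPU quota exceeded
    return True
```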

Tenant User:

  • Access PaaS resources

  • Submit HPC, AI/ML custom jobs / workloads

  • Monitor job status / workloads
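A Tenant User submits HPC and AI/ML jobs through the native scheduler's common interface, which can also route to third-party schedulers such as Run:ai or Slurm. The sketch below shows one way such a common interface could be shaped; the class and method names are illustrative assumptions, not the Armada Bridge API:

```python
# Sketch of a common job-submission interface dispatching to different
# scheduler backends (all names here are illustrative assumptions).
from abc import ABC, abstractmethod

class SchedulerBackend(ABC):
    @abstractmethod
    def submit(self, name: str, image: str, gpus: int) -> str:
        """Submit a job and return a backend-specific job ID."""

class KubernetesBackend(SchedulerBackend):
    def submit(self, name, image, gpus):
        # A real backend would create a Kubernetes Job with an
        # nvidia.com/gpu resource request via the Kubernetes API.
        return f"k8s/{name}"

class SlurmBackend(SchedulerBackend):
    def submit(self, name, image, gpus):
        # A real backend would generate and submit a batch script
        # requesting GPUs as generic resources.
        return f"slurm/{name}"

def submit_job(backend: SchedulerBackend, name: str, image: str, gpus: int) -> str:
    """Single entry point for users, regardless of the scheduler in use."""
    return backend.submit(name, image, gpus)
```

The same `submit_job` call works for containerized (Kubernetes) and non-containerized (Slurm) workloads, which is the point of the common interface.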