
Kubeflow

Overview

The Kubeflow template provisions a complete machine learning operations (MLOps) platform on your Kubernetes cluster. The deployment includes the full Kubeflow ecosystem: Central Dashboard, Notebooks, Pipelines, and Katib.

Accessing the cluster

Bridge provides two ways to work with your cluster after it is created:

  • Download kubeconfig — You can download the cluster kubeconfig file from the cluster menu. Use this to access the cluster from your local machine or external tools by setting KUBECONFIG.

  • Kubectl Terminal — Interact directly with the Kubernetes cluster from Bridge UI. Use this to:

    • Run kubectl commands directly from the UI to apply any additional Kubeflow configuration
    • Monitor the 50+ microservices that make up the Kubeflow stack
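
As a minimal sketch of the kubeconfig route, the helper below points kubectl at a downloaded file after checking that the file exists. The function name use_kubeconfig and the example path are hypothetical and not part of Bridge; the kubeflow namespace is the one used later in this guide.

```shell
# Point kubectl at a downloaded kubeconfig file.
# use_kubeconfig is a hypothetical helper name, not part of Bridge or kubectl.
use_kubeconfig() {
  local path="$1"
  if [ ! -f "$path" ]; then
    echo "kubeconfig not found: $path" >&2
    return 1
  fi
  export KUBECONFIG="$path"
}

# Example usage (the path is an assumption; use wherever you saved the file):
#   use_kubeconfig "$HOME/Downloads/kubeflow-cluster.yaml"
#   kubectl get nodes
#   kubectl get pods -n kubeflow
```

Exporting KUBECONFIG affects only the current shell session, so the helper has to be run again in each new terminal.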

Prerequisites

  • Tenant Admin access — Log in as a Tenant Admin to create clusters.
  • Compute resources — Minimum 16GB RAM and 4 CPUs per node recommended for Kubeflow.
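
A candidate Linux node can be checked against these recommended minimums with a short script. This is a sketch only: meets_kubeflow_minimum is a hypothetical helper, and it assumes that 16GB means 16384 MiB.

```shell
# Check a node against the recommended minimum of 4 CPUs and 16 GB RAM.
# meets_kubeflow_minimum is a hypothetical helper.
# Arguments: CPU count, memory in MiB (16 GB is taken as 16384 MiB, an assumption).
meets_kubeflow_minimum() {
  local cpus="$1" mem_mib="$2"
  [ "$cpus" -ge 4 ] && [ "$mem_mib" -ge 16384 ]
}

# Example usage on a Linux node:
#   cpus="$(nproc)"
#   mem_mib="$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo)"
#   meets_kubeflow_minimum "$cpus" "$mem_mib" && echo "OK" || echo "below recommended minimum"
```

The kernel reserves some memory, so MemTotal on a nominal 16 GB machine can read slightly below 16384 MiB. Treat a near miss as a prompt to look closer rather than a hard failure.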

Create a Kubeflow Cluster

Step 1: Start Cluster Creation

  1. Log in to Bridge as a Tenant Admin.
  2. In the left sidebar, click Compute → Cluster.
  3. Click Create Cluster.

Create Cluster Button

Step 2: Configure Cluster Details

  1. Enter a name and description for the cluster, then click Next.
  2. Select the Kubernetes version.
  3. Select the CNI plugin (Cilium is preferred for Istio compatibility).
  4. (Optional) Turn on Install NVIDIA GPU tools if you need GPU-accelerated Notebooks and Training.
  5. Click Next.

Fill Cluster Setup Details

Fill Cluster CNI Plugin

Step 3: Select Cluster Template

  1. Choose Kubeflow Template.
  2. Click Next.

Select Cluster Nodes

Step 4: Select Nodes and Create

  1. Select the cluster node(s) (Bare Metal or Virtual Machine).
  2. Click Create to start cluster creation.

Select Cluster Nodes

Step 5: Monitor Cluster Creation

Wait until the status is Running.

Note: Kubeflow is a large deployment. Even after the cluster status shows Running, allow another 5-10 minutes for all Kubeflow components to initialize.

  1. Initializing Control Planes — Status shows Processing.

Controlplane processing state

  2. Initializing Workers — Status remains Processing.

workers processing state

  3. When creation completes, the Status shows Running.

workers processing state


Post-Deployment Configuration

Step 6: Map to Hostname

The Kubeflow dashboard is served at kubeflow.armada.ai. If that domain is not resolvable through your corporate or public DNS, point your local machine at the cluster Ingress manually by adding an entry to your /etc/hosts file:

<VM_PUBLIC_IP>  kubeflow.armada.ai

Replace <VM_PUBLIC_IP> with the public IP of the node or the LoadBalancer IP provided in the Cluster Overview.
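
The hosts entry can also be added from a script. The sketch below appends the line only if the hostname is not already mapped, so it is safe to run more than once. The function name add_hosts_entry is hypothetical.

```shell
# Append "<ip>  <hostname>" to a hosts file unless the hostname is already mapped.
# add_hosts_entry is a hypothetical helper name.
add_hosts_entry() {
  local file="$1" ip="$2" host="$3"
  # -w matches the hostname as a whole word; -F treats it as a literal string.
  if grep -qwF -- "$host" "$file"; then
    echo "$host already present in $file"
    return 0
  fi
  printf '%s  %s\n' "$ip" "$host" >> "$file"
}

# Example usage (writing /etc/hosts needs root, so run the function in a root shell):
#   add_hosts_entry /etc/hosts <VM_PUBLIC_IP> kubeflow.armada.ai
```

If the hostname is already present with a different IP, the helper leaves the file unchanged. In that case, edit the existing line by hand.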

Before opening the dashboard, you can confirm that the Kubeflow pods are running:

kubectl get pods -n kubeflow
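
Because the components can take several minutes to settle, the pod listing can be turned into a simple readiness count. The sketch below assumes the default kubectl column layout (NAME READY STATUS RESTARTS AGE); pending_kubeflow_pods is a hypothetical helper name.

```shell
# Count Kubeflow pods that are not yet ready.
# pending_kubeflow_pods is a hypothetical helper; it parses the default
# "kubectl get pods" columns: NAME READY STATUS RESTARTS AGE.
pending_kubeflow_pods() {
  kubectl get pods -n kubeflow --no-headers | awk '
    $3 == "Completed" { next }             # finished job pods are fine
    {
      split($2, ready, "/")                # READY looks like "1/2"
      if ($3 != "Running" || ready[1] != ready[2]) n++
    }
    END { print n + 0 }'
}

# Example usage: poll every 15 seconds until nothing is pending.
#   while [ "$(pending_kubeflow_pods)" -gt 0 ]; do sleep 15; done
```

The count is also 0 when the namespace has no pods at all, so run the plain listing once first to confirm that pods exist.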

Step 7: Access the Dashboard

  1. Open your web browser and navigate to https://kubeflow.armada.ai.
  2. Log in using your tenant credentials (Dex/OIDC). Contact your administrator if you do not have login credentials.
  3. Upon successful authentication, you will be redirected to the Kubeflow Central Dashboard.
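
If the browser cannot reach the page, a quick command-line probe can help separate DNS or Ingress problems from login problems. This is a sketch: dashboard_status is a hypothetical helper, and the -k flag assumes the cluster may present a certificate your machine does not trust.

```shell
# Print the HTTP status code returned by the dashboard URL.
# dashboard_status is a hypothetical helper name.
dashboard_status() {
  local url="${1:-https://kubeflow.armada.ai}"
  # -s silences progress, -k skips certificate verification (test setups only),
  # -o discards the body, -w prints only the status code.
  curl -sk -o /dev/null -w '%{http_code}' --max-time 10 "$url"
}

# Example usage:
#   dashboard_status
```

A 2xx or 3xx code suggests the Ingress is answering, while 000 means curl could not connect at all, which usually points to the hostname mapping or network path.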

kubeflow dashboard-1

kubeflow dashboard-2


To learn more about using Kubeflow for MLOps, refer to the official Kubeflow documentation:

https://www.kubeflow.org/docs/