Skip to main content

Models Overview

What Are Models in Bridge?

Models in Armada Bridge is the capability to deploy and serve machine learning (ML) and AI models as inference APIs. As a Tenant Admin, you choose models from the Bridge model catalog, deploy them with GPU resources and endpoints, and expose them so that tenant users and applications can send requests and get predictions (inference) in real time.

How It Works

  1. Catalog — Bridge provides a model catalog with ready-to-deploy models from Hugging Face (e.g., Qwen, Llama) and NVIDIA NIM. Your administrator can add more models to the catalog.
  2. Deploy — You select a model from the catalog, configure name, endpoint, GPU type and count, and rate limits, then deploy. Bridge provisions the infrastructure and serves the model.
  3. Endpoint — Each deployed model is exposed through an endpoint. Applications and users call this endpoint to get predictions.
  4. Inference — After deployment, users can try the model in the Model Playground or call the endpoint from your applications via API.

For step-by-step deployment, see Deploy Model.

Key Capabilities

  • Deploy from catalog — Use pre-configured models from Hugging Face and NVIDIA NIM without building custom containers.
  • GPU-backed inference — Assign GPU type and count (e.g., L4) so inference runs with the right performance.
  • Endpoints — Expose models via endpoints so internal apps, external services, or the Model Playground can call them.
  • Rate limits and pricing — Set tokens per minute, requests per minute, and optional pricing for usage and governance.
  • Scale and operate — Scale replicas, view metrics (latency, throughput, GPU usage), and access logs from the Bridge UI or kubectl.
  • Playground — Test and iterate on deployed models in the Model Playground before integrating with applications.

Next Steps

  • Deploy Model — Step-by-step guide to selecting a model, configuring deployment, and monitoring until the model is running.
  • Model Playground — Test and interact with your deployed models from the Bridge UI.