Models Overview
What Are Models in Bridge?
Models in Bridge is the capability to deploy and serve machine learning (ML) and AI models as inference APIs. As a Tenant Admin, you choose models from Bridge model catalog, deploy them with GPU resources and endpoints, and expose them so that tenant users and applications can send requests and get predictions (inference) in real time.
How It Works
- Catalog — Bridge provides a model catalog with ready-to-deploy models from Hugging Face (e.g., Qwen, Llama), NVIDIA NIM, and Azure ML (models tracked and versioned in your Azure ML model registry). Your administrator can add more models to the catalog.
- Deploy — You select a model from the catalog, configure name, endpoint, GPU type and count, and rate limits, then deploy. Bridge provisions the infrastructure and serves the model.
- Endpoint — Each deployed model is exposed through an endpoint. Applications and users call this endpoint to get predictions.
- Inference — After deployment, users can try the model in the Model Playground or call the endpoint from your applications via API.
For step-by-step deployment, see Deploy Hugging Face Model, Deploy NIM Model, or Deploy Azure ML Model.
Key Capabilities
- Deploy from catalog — Use pre-configured models from Hugging Face, NVIDIA NIM, and Azure ML without building custom containers.
- GPU-backed inference — Assign GPU type and count (e.g., L4) so inference runs with the right performance.
- Endpoints — Expose models via endpoints so internal apps, external services, or the Model Playground can call them.
- Rate limits and pricing — Set tokens per minute, requests per minute, and optional pricing for usage and governance.
- Scale and operate — Scale replicas, view metrics (latency, throughput, GPU usage), and access logs from Bridge UI or kubectl.
- Playground — Test and iterate on deployed models in the Model Playground before integrating with applications.
Prepare Model
Model Requirements
Models should be:
- Serialized in standard format (SavedModel, ONNX, etc.)
- Packaged with dependencies
- Tested locally
- Documented with input/output specs
Supported Formats
- TensorFlow SavedModel
- PyTorch TorchScript
- ONNX models
- Custom containers
Model Catalog
Models are deployed from the Bridge model catalog. Bridge provides default models from Hugging Face (e.g., Qwen/Qwen2.5-1.5B-Instruct), NIM (e.g., meta/llama-3.2-3b-instruct), and Azure ML (models tracked and versioned in your Azure ML model registry). Azure ML integration enables hybrid-cloud workflows — data science teams can train and version models in Azure, then deploy the exact model version to a Bridge-managed Kubernetes cluster. The Available Models tab in the Models section lists all models that can be deployed. If the model you need is not listed, contact Bridge Super Administrator to add it to the catalog.
Model Requirements (for Catalog Models)
Models in the catalog are typically:
- Serialized in a standard format — For example, SavedModel, ONNX, or TorchScript, so they can be loaded and served by the runtime.
- Packaged with dependencies — Any runtime or framework dependencies are included or specified.
- Documented — Input/output specifications and usage are documented so you can configure deployment and test correctly.
The catalog may include models in these formats; the model card or catalog entry indicates the format.
Credentials (if Required)
Some catalog models require credentials to pull the model or access gated assets. Have the following ready depending on your model provider:
- Hugging Face token — For models that require Hugging Face access. Create one from https://huggingface.co/ (sign in or sign up, then create a token in your account settings).
- NVIDIA NGC API key — For NIM models distributed through NVIDIA NGC. Create one from https://ngc.nvidia.com/ (sign in or sign up, then generate an API key under Setup > API Key).
- Azure Service Principal credentials — For Azure ML models. You need the Azure ML Registry Name, Client ID, Client Secret, and Tenant ID. Contact your Azure administrator to obtain these credentials.
Next Steps
- Deploy Hugging Face Model — Deploy open-source models from Hugging Face Hub.
- Deploy NIM Model — Deploy GPU-optimized NVIDIA NIM inference containers.
- Deploy Azure ML Model — Deploy models from your Azure ML model registry to Bridge-managed Kubernetes clusters.