Version: 5.4.0

Deploy Azure ML Model

Overview

Bridge provides native support for Azure ML Model Registry as part of its hybrid-cloud capabilities. This integration allows data science teams to synchronize their machine learning lifecycles by tracking, versioning, and deploying models stored in Azure directly through the Bridge interface. By connecting Bridge's orchestration power with Azure's robust model management, users can maintain a single source of truth for their model artifacts — ensuring that the exact version of a model trained in Azure is the one deployed to a Bridge-managed Kubernetes cluster.

This guide walks you through deploying an Azure ML model from the Bridge model catalog. Bridge connects to your Azure ML registry using Azure Service Principal credentials, pulls the specified model version, deploys it on GPU infrastructure, and exposes it through an endpoint for inference.

This guide covers:

Selecting an Azure ML model from the catalog and starting deployment
Configuring the model name, endpoint, GPU type and count, and rate limits
Providing Azure ML registry name and Service Principal credentials
Monitoring deployment until the model is running

Prerequisites

Tenant Admin access — You must log in as a Tenant Admin to deploy models from the catalog.
Model catalog — The Azure ML model you want to deploy must be available in the Bridge model catalog. The Available Models tab lists all deployable models. If the model you need is not listed, contact your Bridge Super Administrator to add it.
Endpoint — You may need to create or select an LLM endpoint before or during deployment.
Azure ML Registry Name — The name of your Azure ML model registry.
Azure Service Principal credentials — You need the following to authenticate Bridge with your Azure ML registry:
- Azure Client ID — The Application (client) ID of the Service Principal.
- Azure Client Secret — The client secret for the Service Principal.
- Azure Tenant ID — Your Azure Active Directory tenant ID.

Note: If you do not have these credentials, contact your Azure administrator.
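Before entering the credentials in Bridge, it can help to sanity-check them locally. The following is a minimal sketch and not part of Bridge. It assumes the standard Microsoft identity platform client-credentials flow and the Azure Resource Manager scope. It checks that the Client ID and Tenant ID look like GUIDs and builds the token request without sending it. The function name `build_token_request` is hypothetical.

```python
import urllib.parse
import urllib.request
import uuid


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) an OAuth2 client-credentials token request.

    Assumption: the Service Principal authenticates against the Microsoft
    identity platform v2.0 token endpoint with the Azure Resource Manager scope.
    """
    # Client ID and Tenant ID are normally GUIDs; uuid.UUID raises ValueError otherwise.
    for label, value in (("tenant_id", tenant_id), ("client_id", client_id)):
        try:
            uuid.UUID(value)
        except ValueError as exc:
            raise ValueError(f"{label} does not look like a GUID: {value!r}") from exc
    if not client_secret:
        raise ValueError("client_secret must not be empty")

    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "https://management.azure.com/.default",
    }).encode("ascii")
    return urllib.request.Request(url, data=body, method="POST")
```

Passing the returned request to `urllib.request.urlopen` should yield a JSON response containing an `access_token` when the credentials are valid. A successful token request confirms only that the Service Principal can authenticate; whether it has access to the registry depends on its role assignments.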

Deploy Model

Step 1: Select Model

Log in to Bridge as a Tenant Admin.
In the left sidebar, open Models.
Open the Available Models tab. All catalog models available for deployment are listed.

Available Models

Select the provider as Azure ML. This filters the list to show only Azure ML registry models.
Find the model you want to deploy (e.g., azure/qwen2.5-1.5b-instruct) and click Deploy.

Select Azure ML Model

Step 2: Model Details

Enter a model Name and Description.
Click Next.

Azure ML Model Details

Step 3: Model Configuration

Select the dType (data type) from the dropdown.
Provide the following Azure ML credentials:
- Enter the Client ID.
- Enter the Client Secret.
- Enter the Azure Tenant ID.
- Enter the Registry Name.
Enter the Max Model Length.

Info:
dType controls the numerical precision used for the model's weights during inference. Lower precision (e.g., float16) reduces GPU memory usage and speeds up inference, while higher precision (e.g., float32) preserves accuracy. Select auto to use the model's original precision.
Max Model Length is the maximum number of tokens (input + output combined) the model can process in a single request. A higher value allows longer prompts and responses but requires more GPU memory.
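To make the dType trade-off concrete, the sketch below estimates the GPU memory taken by the weights alone as parameter count times bytes per element. It is a rough illustration, not how Bridge sizes deployments. The helper name `weight_memory_gib` is hypothetical, and the 1.5 billion parameter count is an assumption based on the example model's name.

```python
# Bytes per element for common inference dtypes.
BYTES_PER_ELEMENT = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Approximate GPU memory for the model weights alone, in GiB."""
    return num_params * BYTES_PER_ELEMENT[dtype] / 2**30


# Assumed parameter count for a "1.5b" model (illustrative only).
params = 1.5e9
for dtype in ("float16", "float32"):
    print(f"{dtype}: {weight_memory_gib(params, dtype):.1f} GiB")
```

Under these assumptions, float16 needs roughly 2.8 GiB for the weights and float32 roughly 5.6 GiB. The estimate excludes activations and the key-value cache, which grows with Max Model Length, so actual usage will be higher.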

Azure ML Model Configuration

Step 4: Select Endpoint and GPU

Select the Endpoint that will expose this model.
Select the GPU type (e.g., A100, V100, T4).
Set GPU count (e.g., 1).
Click Next.

Tip: Select the GPU type based on the model size and your workload requirements. Ensure the GPU resources allocated here have enough memory for the model version you trained and registered in Azure ML.

Select Azure ML Endpoint and GPU

Step 5: Set Rate Limits and Pricing

Configure the following rate limits and pricing:
- Token per minute — e.g., 4000000
- Request per minute — e.g., 50
- Currency — e.g., USD
- Price per million input tokens — e.g., 1
- Price per million output tokens — e.g., 1
Click Deploy.
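The pricing fields can be sanity-checked with simple arithmetic. The sketch below assumes that cost is billed linearly per token and that the tokens-per-minute limit counts input and output tokens together; how Bridge actually meters usage is not described here, and both function names are hypothetical.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 price_in_per_million: float, price_out_per_million: float) -> float:
    """Cost of one request, in the configured currency."""
    return (input_tokens / 1_000_000 * price_in_per_million
            + output_tokens / 1_000_000 * price_out_per_million)


def max_spend_per_minute(tokens_per_minute: int,
                         price_in_per_million: float, price_out_per_million: float) -> float:
    """Upper bound on spend per minute if every allowed token is billed at the higher price."""
    return tokens_per_minute / 1_000_000 * max(price_in_per_million, price_out_per_million)


# Example values from the list above: 4,000,000 tokens per minute, 1 USD per million tokens.
print(request_cost(1200, 300, 1.0, 1.0))
print(max_spend_per_minute(4_000_000, 1.0, 1.0))
```

With the example values, a request with 1,200 input tokens and 300 output tokens costs about 0.0015 USD, and the tokens-per-minute limit caps spend at about 4 USD per minute.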

Azure ML Rate Limits and Pricing

Step 6: Monitor Deployment

Deployment typically takes 10–20 minutes. Bridge pulls the exact model version from your Azure ML registry and deploys it to a Bridge-managed Kubernetes cluster.

Watch the deployment progress in the UI. The model status will initially show Processing.

Azure ML Process State

When deployment completes successfully, the model status shows Running.

Azure ML Success State
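If you prefer to wait for the status change in a script rather than watching the UI, a polling loop along the following lines could work. This is a sketch only: this guide does not document a Bridge status API, so `get_status` is a hypothetical callable that you would supply, assumed to return the same status strings shown in the UI (Processing, Running).

```python
import time
from typing import Callable


def wait_until_running(get_status: Callable[[], str],
                       timeout_s: float = 1800.0,
                       interval_s: float = 30.0,
                       sleep: Callable[[float], None] = time.sleep,
                       clock: Callable[[], float] = time.monotonic) -> str:
    """Poll until the status leaves "Processing" and return the final status.

    get_status is a hypothetical, caller-supplied function; Bridge's real
    API (if any) is not assumed here.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status != "Processing":
            # "Running" on success; any other value should be inspected by the caller.
            return status
        if clock() >= deadline:
            raise TimeoutError(f"still Processing after {timeout_s} seconds")
        sleep(interval_s)
```

The default timeout of 30 minutes leaves headroom over the typical 10–20 minute deployment time. The function returns any status other than Processing, so the caller should check that the result is Running before sending inference requests.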

Next Steps

Deploy Hugging Face Model — Deploy open-source Hugging Face models.
Deploy NIM Model — Deploy GPU-optimized NVIDIA NIM models.
Access Model Playground — Test deployed AI models interactively by sending prompts and inspecting responses in real time.
