Deploy Azure ML Model
Overview
Bridge provides native support for Azure ML Model Registry as part of its hybrid-cloud capabilities. This integration allows data science teams to synchronize their machine learning lifecycles by tracking, versioning, and deploying models stored in Azure directly through the Bridge interface. By connecting Bridge's orchestration power with Azure's robust model management, users can maintain a single source of truth for their model artifacts — ensuring that the exact version of a model trained in Azure is the one deployed to a Bridge-managed Kubernetes cluster.
This guide walks you through deploying an Azure ML model from the Bridge model catalog. Bridge connects to your Azure ML registry using Azure Service Principal credentials, pulls the specified model version, deploys it on GPU infrastructure, and exposes it through an endpoint for inference.
This guide covers:
- Selecting an Azure ML model from the catalog and starting deployment
- Configuring the model name, endpoint, GPU type and count, and rate limits
- Providing Azure ML registry name and Service Principal credentials
- Monitoring deployment until the model is running
Prerequisites
- Tenant Admin access: You must log in as a Tenant Admin to deploy models from the catalog.
- Model catalog: The Azure ML model you want to deploy must be available in the Bridge model catalog. The Available Models tab lists all deployable models. If the model you need is not listed, contact your Bridge Super Administrator to add it.
- Endpoint: You may need to create or select an LLM endpoint before or during deployment.
- Azure ML Registry Name: The name of your Azure ML model registry.
- Azure Service Principal credentials, which Bridge uses to authenticate with your Azure ML registry:
  - Azure Client ID: The Application (client) ID of the Service Principal.
  - Azure Client Secret: The client secret for the Service Principal.
  - Azure Tenant ID: Your Azure Active Directory tenant ID.
If you do not have these credentials, contact your Azure administrator.
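Before typing the credentials into Bridge, it can help to catch copy-paste mistakes locally. The Application (client) ID and the tenant ID are GUIDs, so their format can be checked without contacting Azure. The sketch below is illustrative only: the function names `is_guid` and `check_azure_credentials` and the sample values are assumptions, and the check covers format only, not whether Azure will accept the credentials.

```python
import uuid
from typing import List


def is_guid(value: str) -> bool:
    """Return True if value is a hyphenated GUID (8-4-4-4-12 hex digits)."""
    candidate = value.strip().lower()
    try:
        # Round-tripping through uuid.UUID rejects unhyphenated or braced forms.
        return str(uuid.UUID(candidate)) == candidate
    except ValueError:
        return False


def check_azure_credentials(client_id: str, client_secret: str,
                            tenant_id: str, registry_name: str) -> List[str]:
    """Collect format problems; an empty list means nothing obvious is wrong."""
    problems = []
    if not is_guid(client_id):
        problems.append("Client ID is not a GUID")
    if not client_secret.strip():
        problems.append("Client Secret is empty")
    if not is_guid(tenant_id):
        problems.append("Tenant ID is not a GUID")
    if not registry_name.strip():
        problems.append("Registry Name is empty")
    return problems


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(check_azure_credentials(
        "11111111-2222-3333-4444-555555555555",
        "example-secret",
        "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
        "my-registry",
    ))
```

An empty list only means the values are well formed. Authentication can still fail if the secret has expired or the Service Principal lacks access to the registry.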
Deploy Model
Step 1: Select Model
- Log in to Bridge as a Tenant Admin.
- In the left sidebar, open Models.
- Open the Available Models tab. All catalog models available for deployment are listed.

- Select the provider as Azure ML. This filters the list to show only Azure ML registry models.
- Find the model you want to deploy (e.g., azure/qwen2.5-1.5b-instruct) and click Deploy.

Step 2: Model Details
- Enter a model Name and Description.
- Click Next.

Step 3: Model Configuration
- Select the dType (data type) from the dropdown.
- Provide the following Azure ML credentials:
  - Enter the Client ID.
  - Enter the Client Secret.
  - Enter the Azure Tenant ID.
  - Enter the Registry Name.
- Enter the Max Model Length.
- dType controls the numerical precision used for the model's weights during inference — lower precision (e.g., float16) reduces GPU memory usage and speeds up inference, while higher precision (e.g., float32) preserves accuracy. Select auto to use the model's original precision.
- Max Model Length is the maximum number of tokens (input + output combined) the model can process in a single request. A higher value allows longer prompts and responses but requires more GPU memory.
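To make the memory trade-off concrete, the following sketch estimates GPU memory for the model weights at a given dType and for the key/value cache at a given Max Model Length. It is a rough back-of-the-envelope calculation, not how Bridge sizes deployments: the function names are hypothetical, and the layer count, KV head count, and head dimension in the example are illustrative assumptions that you would replace with values from your model's configuration. It ignores activations and runtime overhead.

```python
# Bytes per parameter for common dType choices. "auto" is not listed because
# it resolves to whatever precision the model was stored in.
BYTES_PER_VALUE = {"float32": 4, "float16": 2, "bfloat16": 2}

GIB = 1024 ** 3


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Approximate memory for the weights alone: parameters x bytes each."""
    return num_params * BYTES_PER_VALUE[dtype] / GIB


def kv_cache_memory_gib(max_model_len: int, num_layers: int,
                        num_kv_heads: int, head_dim: int, dtype: str) -> float:
    """Approximate KV cache for ONE sequence of max_model_len tokens.

    The factor 2 accounts for one key and one value tensor per layer.
    """
    values = 2 * num_layers * num_kv_heads * head_dim * max_model_len
    return values * BYTES_PER_VALUE[dtype] / GIB


if __name__ == "__main__":
    params = 1.5e9  # a 1.5B-parameter model
    for dtype in ("float16", "float32"):
        print(dtype, round(weight_memory_gib(params, dtype), 2), "GiB for weights")

    # Assumed architecture values, for illustration only.
    cache = kv_cache_memory_gib(max_model_len=32768, num_layers=28,
                                num_kv_heads=2, head_dim=128, dtype="float16")
    print("KV cache per full-length sequence:", cache, "GiB")
```

Halving the precision halves the weight memory, and the KV cache grows linearly with Max Model Length, which is why longer context limits need more GPU memory.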

Step 4: Select Endpoint and GPU
- Select the Endpoint that will expose this model.
- Select the GPU type (e.g., A100, V100, T4).
- Set the GPU count (e.g., 1).
- Click Next.
Select the GPU type based on the model size and your workload requirements. Ensure the model version you trained and registered in Azure ML is compatible with the GPU resources allocated here.
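As a rough aid for choosing the GPU type and count, the sketch below divides an estimated memory requirement by the usable memory of one GPU. Everything here is an assumption for illustration: the per-GPU memory figures (A100 and V100 ship in more than one memory size), the 90% usable fraction, and the function name `min_gpu_count`. It also assumes the model splits evenly across GPUs, which real multi-GPU serving does not guarantee.

```python
import math

# Memory per GPU in GiB. These are assumed variants; A100 also exists with
# 80 GB and V100 with 32 GB, so check the hardware in your cluster.
GPU_MEMORY_GIB = {"A100": 40, "V100": 16, "T4": 16}


def min_gpu_count(required_gib: float, gpu_type: str,
                  usable_fraction: float = 0.9) -> int:
    """Smallest GPU count whose combined usable memory covers required_gib."""
    usable_per_gpu = GPU_MEMORY_GIB[gpu_type] * usable_fraction
    return max(1, math.ceil(required_gib / usable_per_gpu))


if __name__ == "__main__":
    # A small model needing about 4 GiB fits on a single T4.
    print(min_gpu_count(4, "T4"))
    # A 30 GiB requirement needs several T4s but only one 40 GB A100.
    print(min_gpu_count(30, "T4"), min_gpu_count(30, "A100"))
```

Treat the result as a lower bound and leave headroom for concurrent requests.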

Step 5: Set Rate Limits and Pricing
- Configure the following rate limits and pricing:
  - Token per minute (e.g., 4000000)
  - Request per minute (e.g., 50)
  - Currency (e.g., USD)
  - Price per million input tokens (e.g., 1)
  - Price per million output tokens (e.g., 1)
- Click Deploy.
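The arithmetic behind these settings can be sketched as follows, using the example values above. The function names are hypothetical, and the sketch assumes that input and output tokens both count toward the token-per-minute limit, which this guide does not confirm.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 price_in_per_million: float,
                 price_out_per_million: float) -> float:
    """Cost of one request in the configured currency."""
    total = (input_tokens * price_in_per_million
             + output_tokens * price_out_per_million)
    return total / 1_000_000


def max_sustained_rpm(tokens_per_request: int, tokens_per_minute: int,
                      requests_per_minute: int) -> int:
    """Requests per minute allowed once both limits are applied."""
    by_tokens = tokens_per_minute // tokens_per_request
    return min(requests_per_minute, by_tokens)


if __name__ == "__main__":
    # 1,200 input and 300 output tokens at 1 USD per million each.
    print(request_cost(1200, 300, 1, 1))
    # With 4,000,000 tokens and 50 requests per minute, small requests
    # are capped by the request limit, very large ones by the token limit.
    print(max_sustained_rpm(2_000, 4_000_000, 50))
    print(max_sustained_rpm(100_000, 4_000_000, 50))
```

With the example limits, the request limit is the binding one until requests average more than 80,000 tokens (4,000,000 / 50).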

Step 6: Monitor Deployment
Deployment typically takes 10–20 minutes. Bridge pulls the exact model version from your Azure ML registry and deploys it to a Bridge-managed Kubernetes cluster.
- Watch the deployment progress in the UI. The model status will initially show Processing.

- When deployment completes successfully, the model status shows Running.
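If you want to script the wait instead of watching the UI, the pattern is a polling loop with a timeout. This guide only describes monitoring in the UI, so the sketch below does not call any Bridge API: `get_status` is a hypothetical callable that you would supply, and only the status names Processing and Running come from this guide. Any other status is treated as unexpected.

```python
import time
from typing import Callable


def wait_until_running(get_status: Callable[[], str],
                       timeout_s: float = 1800,
                       interval_s: float = 30,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the status is "Running".

    Returns True when the model is running and False on timeout.
    Raises RuntimeError for any status other than Processing or Running.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == "Running":
            return True
        if status != "Processing":
            raise RuntimeError(f"Unexpected model status: {status}")
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated status source standing in for a real lookup.
    statuses = iter(["Processing", "Processing", "Running"])
    print(wait_until_running(lambda: next(statuses), interval_s=0))
```

The default timeout of 30 minutes is an assumption chosen to exceed the typical 10 to 20 minute deployment time.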

Next Steps
- Deploy Hugging Face Model — Deploy open-source Hugging Face models.
- Deploy NIM Model — Deploy GPU-optimized NVIDIA NIM models.
- Access Model Playground — Test deployed AI models interactively by sending prompts and inspecting responses in real time.