Skip to main content
Version: 5.4.0

Deploy NIM Model

Overview

This guide walks you through deploying an NVIDIA NIM model from Bridge model catalog. NVIDIA NIM (NVIDIA Inference Microservices) provides pre-optimized, GPU-accelerated inference containers for popular AI models (e.g., meta/llama-3.2-3b-instruct, nvidia/llama-3.1-nemotron-70b-instruct). Bridge pulls the NIM container, serves it on GPU infrastructure, and exposes it via an endpoint for inference.

This guide covers:

  • Selecting a NIM model from the catalog and starting deployment
  • Configuring the model name, endpoint, GPU type and count, and rate limits
  • Providing NVIDIA NGC credentials (if required)
  • Monitoring deployment until the model is running

Prerequisites

  • Tenant Admin access — You must log in as a Tenant Admin to deploy models from the catalog.
  • Model catalog — The NIM model you want to deploy must be available in Bridge model catalog. The Available Models tab lists all deployable models. If the model you need is not listed, contact your Bridge Super Administrator to add it.
  • Endpoint — You may need to create or select an NIM endpoint before or during deployment.
  • NVIDIA NIM API key — NIM models are distributed through NVIDIA NGC. You need an NVIDIA NIM API key to pull the model container. Have your key ready.
note

If you do not have an NVIDIA NIM API key, follow these steps to create one:

  1. Go to https://ngc.nvidia.com/ and sign in (or sign up for a free account).
  2. Click your user icon in the top-right corner.
  3. Select Setup.
  4. In the Setup page, click Generate API Key.
  5. Click Confirm to generate a new key. Copy and save it — the key is shown only once.

Deploy Model

Step 1: Select Model

  1. Log in to Bridge as a Tenant Admin.
  2. In the left sidebar, open Models.
  3. Open the Available Models tab. All catalog models available for deployment are listed.

Available Models

  1. Select the provider as NIM. This filters the list to show only NIM models.
  2. Find the model you want to deploy (e.g., meta/llama-3.2-3b-instruct) and click Deploy.

Select NIM Model

Step 2: Model Details

  1. Enter a model Name and Description.
  2. Click Next.

NIM Model Details

Step 3: Model Configuration

  1. Enter your NVIDIA NIM API Key to authenticate and pull the NIM container image.
info
  • NVIDIA NIM API Key authenticates your access to NVIDIA NGC to pull the NIM container image. The key is stored securely and used only during the deployment process.

NIM Model Configuration

Step 4: Select Endpoint and GPU

  1. Select the Endpoint that will expose this model.
  2. Select the GPU type (e.g., A100, H100, L4).
  3. Set GPU count (e.g., 1).
  4. Click Next.

Select NIM Endpoint and GPU

Step 5: Set Rate Limits and Pricing

  1. Configure the following rate limits and pricing:
    • Token per minute — e.g., 4000000
    • Request per minute — e.g., 50
    • Currency — e.g., USD
    • Price per million input tokens — e.g., 1
    • Price per million output tokens — e.g., 1
  2. Click Deploy.

NIM Rate Limits and Pricing

Step 6: Monitor Deployment

Deployment typically takes 5–10 minutes (NIM containers are pre-optimized so startup is faster than pulling raw model weights).

  1. Watch the deployment progress in the UI. The model status will initially show Processing.

NIM Process State

  1. When deployment completes successfully, the model status shows Running.

NIM Success State

Next Steps