NVIDIA NeMo Inference Microservice Deploy Job
Deploy a model artifact from W&B to an NVIDIA NeMo Inference Microservice using W&B Launch. W&B Launch converts model artifacts to the NVIDIA NeMo Model format and deploys them to a running NIM/Triton server.
W&B Launch currently accepts a limited set of compatible model types. Deployment time varies by model and by machine type (for example, GCP's a2-ultragpu-1g).
Quickstart
- Create a launch queue if you don't have one already. See an example queue config below.
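One plausible Docker-based queue config for this setup is sketched below; the `model-store` volume name and mount path are assumptions, not values confirmed by this page:

```json
{
    "gpus": "all",
    "runtime": "nvidia",
    "volume": "model-store:/model-store/"
}
```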
- Create this job in your project:
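A sketch of registering the deploy job from a container image with `wandb job create`; the job name shown here and the `$DEPLOY_IMAGE` placeholder are assumptions:

```shell
# Register the deploy job in your W&B project from a container image.
# $ENTITY, $PROJECT, and $DEPLOY_IMAGE are placeholders you must set.
wandb job create \
  -n "deploy-to-nvidia-nemo-inference-microservice" \
  -e "$ENTITY" \
  -p "$PROJECT" \
  image "$DEPLOY_IMAGE"
```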
- Launch an agent on your GPU machine:
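A minimal agent invocation, assuming the queue created earlier is named `$QUEUE`:

```shell
# Start a Launch agent on the GPU machine, polling the queue for work.
wandb launch-agent -e "$ENTITY" -p "$PROJECT" -q "$QUEUE"
```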
- Submit the deployment launch job with your desired configs from the Launch UI. You can also submit via the CLI:
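A sketch of the CLI submission path; the job alias and the `config.json` override file are assumptions:

```shell
# Submit the deploy job to the queue, overriding config values as needed.
wandb launch \
  -j "$ENTITY/$PROJECT/deploy-to-nvidia-nemo-inference-microservice:latest" \
  -e "$ENTITY" \
  -p "$PROJECT" \
  -q "$QUEUE" \
  -c config.json
```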
- You can track the deployment process in the Launch UI.
- Once complete, you can immediately curl the endpoint to test the model. The model name is always `ensemble`.
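A quick smoke test might look like the following; the host, port, endpoint path, and request fields are assumptions about the running server, while the `ensemble` model name comes from the note above:

```shell
# Send a completion request to the deployed model.
curl -X POST "http://localhost:9999/v1/completions" \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ensemble",
        "prompt": "Tell me a joke",
        "max_tokens": 256,
        "temperature": 0.5
      }'
```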