Why use W&B Inference for LoRAs?
- Upload once, deploy instantly — no servers to manage.
- Track exactly which version is live with artifact versioning.
- Update models in seconds by swapping small LoRA files instead of the full model weights.
Workflow
- Upload your LoRA weights as a W&B artifact
- Reference the artifact URI as your model name in the API
- W&B dynamically loads your weights for inference
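A minimal sketch of steps 2 and 3 using the OpenAI-compatible client. The endpoint URL, the `project` string, and especially the artifact-style model name below are illustrative assumptions, not confirmed values; substitute your own team, project, and the artifact reference shown on your artifact's page:

```python
import openai

# W&B Inference exposes an OpenAI-compatible endpoint.
client = openai.OpenAI(
    base_url="https://api.inference.wandb.ai/v1",  # assumed endpoint
    api_key="<your-wandb-api-key>",
    project="my-team/my-project",  # W&B team/project to log against
)

# Pass the LoRA artifact reference as the model name; W&B loads the
# adapter on top of its base model at request time.
response = client.chat.completions.create(
    model="my-team/my-project/my-lora:v0",  # illustrative artifact URI
    messages=[{"role": "user", "content": "Summarize LoRA in one sentence."}],
)
print(response.choices[0].message.content)
```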
Prerequisites
You need:
- A W&B API key
- A W&B project
- Python 3.8+ with the `openai` and `wandb` packages: `pip install wandb openai`
How to add LoRAs and use them
You can add LoRAs to your W&B account and start using them in two ways:
- Upload a LoRA you trained elsewhere
- Train a new LoRA with W&B
Upload your own custom LoRA directory as a W&B artifact. This is perfect if you've trained your LoRA elsewhere (local environment, cloud provider, or partner service).

The Python code below uploads your locally stored LoRA weights to W&B as a versioned artifact. It creates a `lora` type artifact with the required metadata (base model and storage region), adds your LoRA files from a local directory, and logs it to your W&B project for use with inference.
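A minimal sketch of that upload. The directory path, artifact name, team, and project are placeholders, and passing `storage_region` as a keyword argument is an assumption based on the requirement text below; adjust to match your setup:

```python
import wandb

# Open a run in the project that should own the LoRA artifact.
with wandb.init(project="my-project", entity="my-team") as run:
    # Create a `lora` type artifact carrying the required metadata:
    # the base model the adapter was trained on, plus the storage region.
    artifact = wandb.Artifact(
        name="my-llama-lora",
        type="lora",
        metadata={"base_model": "meta-llama/Llama-3.1-8B-Instruct"},
        storage_region="coreweave-us",  # assumed kwarg; required for low latency
    )
    # Add the PEFT-format files (adapter_config.json, adapter weights).
    artifact.add_dir("./my-lora")
    # Log it; W&B versions artifacts automatically (v0, v1, ...).
    run.log_artifact(artifact)
```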
Key Requirements
To use your own LoRAs with Inference:
- The LoRA must have been trained using one of the models listed in the Supported Base Models section.
- The LoRA must be saved in PEFT format as a `lora` type artifact in your W&B account (see the sketch after this list).
- The LoRA must be stored with `storage_region="coreweave-us"` for low latency.
- When uploading, include the name of the base model you trained it on (for example, `meta-llama/Llama-3.1-8B-Instruct`). This ensures W&B can load it with the correct model.
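If you trained with Hugging Face's `peft` library, saving in PEFT format is a single call. A sketch under that assumption (the LoRA hyperparameters here are illustrative, and the adapter is untrained; the point is the on-disk layout):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Wrap a supported base model with a LoRA adapter.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
model = get_peft_model(base, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))

# save_pretrained() writes adapter_config.json plus the adapter weights,
# which is the PEFT layout the `lora` artifact expects.
model.save_pretrained("./my-lora")
```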
Supported Base Models
Inference is currently configured for the following LLMs (exact strings must be used in `wandb.base_model`). More models coming soon:
- `OpenPipe/Qwen3-14B-Instruct`
- `Qwen/Qwen2.5-14B-Instruct`
- `meta-llama/Llama-3.1-70B-Instruct`
- `meta-llama/Llama-3.1-8B-Instruct`
Pricing
Serverless LoRA Inference is simple and cost-effective: you pay only for storage and the inference you actually run, rather than for always-on servers or dedicated GPU instances.
- Storage - Storing LoRA weights is inexpensive, especially compared to maintaining your own GPU infrastructure.
- Inference usage - Calls that use LoRA artifacts are billed at the same rates as standard model inference. There are no extra fees for serving custom LoRAs.