API Reference
Learn how to use the W&B Inference API to access foundation models programmatically.
Endpoint
Access the Inference service at:
https://api.inference.wandb.ai/v1
Important
To use this endpoint, you need:
- A W&B account with Inference credits
- A valid W&B API key
If you belong to more than one team or want to attribute usage to a project, you also need your team and project IDs. In code samples, these appear as <your-team>/<your-project>. If you don't specify them, your default entity and the project name inference are used.
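As a rough illustration of how these credentials map onto a request, the sketch below builds the HTTP headers used by the curl samples on this page. The helper name `build_headers` is hypothetical, and reading the key from `OPENAI_API_KEY` follows the suggestion in the Python sample's comment rather than a documented requirement.

```python
import os

# Base URL of the Inference service, as given on this page.
BASE_URL = "https://api.inference.wandb.ai/v1"


def build_headers(api_key, project=None):
    """Return request headers (hypothetical helper, not part of any SDK).

    project is the optional "<your-team>/<your-project>" string; when it is
    omitted, the header is left out and the service defaults apply.
    """
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    if project:
        headers["OpenAI-Project"] = project
    return headers


# Prefer an environment variable over a hard-coded key.
api_key = os.environ.get("OPENAI_API_KEY", "<your-api-key>")
headers = build_headers(api_key, project="<your-team>/<your-project>")
print(sorted(headers))
```

Leaving `project` unset omits the `OpenAI-Project` header entirely, which matches the "default entity and project" behavior described above.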
Available methods
The Inference API supports two methods: chat completions and listing supported models.
Chat completions
Create a chat completion using the /chat/completions endpoint. This endpoint follows the OpenAI format for sending messages and receiving responses.
To create a chat completion, provide:
- The Inference service base URL: https://api.inference.wandb.ai/v1
- Your W&B API key: <your-api-key>
- Optional: Your W&B team and project: <your-team>/<your-project>
- A model ID from the available models
import openai

client = openai.OpenAI(
    # The custom base URL points to W&B Inference
    base_url="https://api.inference.wandb.ai/v1",
    # Get your API key from https://wandb.ai/authorize
    # Consider setting it in the environment as OPENAI_API_KEY instead for safety
    api_key="<your-api-key>",
    # Optional: Team and project for usage tracking
    project="<your-team>/<your-project>",
)

# Replace <model-id> with any model ID from the available models list
response = client.chat.completions.create(
    model="<model-id>",
    messages=[
        {"role": "system", "content": "<your-system-prompt>"},
        {"role": "user", "content": "<your-prompt>"},
    ],
)

print(response.choices[0].message.content)
curl https://api.inference.wandb.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-api-key>" \
  -H "OpenAI-Project: <your-team>/<your-project>" \
  -d '{
    "model": "<model-id>",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Tell me a joke." }
    ]
  }'
Response format
The API returns responses in OpenAI-compatible format:
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "meta-llama/Llama-3.1-8B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here's a joke for you..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 50,
    "total_tokens": 75
  }
}
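To show how the fields in this response fit together, here is a minimal sketch that parses the sample payload above with Python's standard library. It assumes you are working with the raw JSON (for example, from the curl call) rather than the client object. In the OpenAI format, a `finish_reason` of `stop` indicates the model ended its reply on its own; this page does not list the other values W&B Inference may return.

```python
import json

# Sample payload copied from the response format shown above.
raw = """
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "meta-llama/Llama-3.1-8B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Here's a joke for you..."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 25, "completion_tokens": 50, "total_tokens": 75}
}
"""

response = json.loads(raw)

# The generated text lives in the first (and here only) choice.
choice = response["choices"][0]
content = choice["message"]["content"]
finish_reason = choice["finish_reason"]

# Token counts are reported per request in the usage object.
usage = response["usage"]

print(content)
print(f"finish_reason={finish_reason}, total_tokens={usage['total_tokens']}")
```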
List supported models
Get all available models and their IDs. Use this to select models dynamically or check what’s available.
import openai

client = openai.OpenAI(
    base_url="https://api.inference.wandb.ai/v1",
    api_key="<your-api-key>",
    project="<your-team>/<your-project>",  # Optional, for usage tracking
)

response = client.models.list()

for model in response.data:
    print(model.id)
curl https://api.inference.wandb.ai/v1/models \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-api-key>" \
  -H "OpenAI-Project: <your-team>/<your-project>"
Response format
The API returns responses in OpenAI-compatible format:
{
  "object": "list",
  "data": [
    {
      "id": "deepseek-ai/DeepSeek-V3.1",
      "object": "model",
      "created": 0,
      "owned_by": "system",
      "root": "deepseek-ai/DeepSeek-V3.1"
    },
    {
      "id": "openai/gpt-oss-20b",
      "object": "model",
      "created": 0,
      "owned_by": "system",
      "root": "openai/gpt-oss-20b"
    },
    ...
  ]
}
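One way to select a model dynamically is to check a preference list against the returned IDs. The sketch below does this with the two entries visible in the sample response above (the real list is longer). The `pick_model` helper and the preference order are hypothetical and only illustrate the idea.

```python
import json

# Only the two entries visible in the sample response; the elided rest is omitted.
raw = """
{
  "object": "list",
  "data": [
    {"id": "deepseek-ai/DeepSeek-V3.1", "object": "model", "created": 0,
     "owned_by": "system", "root": "deepseek-ai/DeepSeek-V3.1"},
    {"id": "openai/gpt-oss-20b", "object": "model", "created": 0,
     "owned_by": "system", "root": "openai/gpt-oss-20b"}
  ]
}
"""

model_ids = [entry["id"] for entry in json.loads(raw)["data"]]


def pick_model(available_ids, preferred_ids):
    """Return the first preferred ID that is available, or None (hypothetical helper)."""
    available = set(available_ids)
    return next((mid for mid in preferred_ids if mid in available), None)


# Hypothetical preference order: try the first ID, fall back to the second.
chosen = pick_model(
    model_ids,
    ["meta-llama/Llama-3.1-8B-Instruct", "openai/gpt-oss-20b"],
)
print(chosen)
```

Returning `None` when nothing matches lets the caller decide whether to fail or fall back to a default.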
API errors
The following table lists common API errors you might encounter:
Error Code | Message | Cause | Solution |
---|---|---|---|
401 | Authentication failed | Your authentication credentials are incorrect or your W&B project entity and/or name are incorrect. | Ensure you’re using the correct API key and that your W&B project name and entity are correct. |
403 | Country, region, or territory not supported | Accessing the API from an unsupported location. | See Geographic restrictions. |
429 | Concurrency limit reached for requests | Too many concurrent requests. | Reduce the number of concurrent requests or increase your limits. For more information, see Usage information and limits. |
429 | You exceeded your current quota, please check your plan and billing details | Out of credits or reached monthly spending cap. | Get more credits or increase your limits. For more information, see Usage information and limits. |
429 | W&B Inference isn’t available for personal accounts. Please switch to a non-personal account to access W&B Inference | The user is on a personal account, which doesn’t have access to W&B Inference. | Switch to a non-personal account. If one isn’t available, create a Team to create a non-personal account. For more information, see Personal entities unsupported. |
500 | The server had an error while processing your request | Internal server error. | Retry after a brief wait and contact support if it persists. |
503 | The engine is currently overloaded, please try again later | Server is experiencing high traffic. | Retry your request after a short delay. |
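The retry advice in the table can be sketched as a small exponential backoff loop. This is an illustration under stated assumptions, not an official client feature. `send` is a hypothetical callable that returns a `(status_code, message)` pair, and the delay schedule is arbitrary. Only the 500, 503, and concurrency-limit 429 errors are retried, because the quota and personal-account 429 errors need a change on your side first. Matching on the message text is an assumption based on the messages listed above.

```python
import time


def is_retryable(status, message=""):
    """Return True for the errors in the table that a retry can plausibly fix."""
    if status in (500, 503):
        return True
    # Other 429 variants (quota, personal account) will not clear on their own.
    return status == 429 and "concurrency limit" in message.lower()


def call_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it succeeds, hits a non-retryable error, or runs out of attempts.

    send is a hypothetical callable returning (status_code, message_or_body).
    """
    status, body = None, None
    for attempt in range(max_attempts):
        status, body = send()
        if status < 400 or not is_retryable(status, str(body)):
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s with the defaults
    return status, body


# Simulated server: overloaded once, then succeeds. Delays are recorded, not slept.
fake_responses = iter([
    (503, "The engine is currently overloaded, please try again later"),
    (200, "ok"),
])
delays = []
result = call_with_retry(lambda: next(fake_responses), sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter keeps the loop easy to test without real waiting.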
Next steps
- Try the usage examples to see the API in action
- Explore models in the UI