API Reference

Complete API reference for the W&B Inference service.

Learn how to use the W&B Inference API to access foundation models programmatically.

Endpoint

Access the Inference service at:

https://api.inference.wandb.ai/v1

Available methods

The Inference API supports these methods:

Chat completions

Create a chat completion using the /chat/completions endpoint. This endpoint follows the OpenAI format for sending messages and receiving responses.

To create a chat completion, provide:

  • The Inference service base URL: https://api.inference.wandb.ai/v1
  • Your W&B API key: <your-api-key>
  • Optional: Your W&B team and project: <your-team>/<your-project>
  • A model ID from the available models

Python:

import openai

client = openai.OpenAI(
    # The custom base URL points to W&B Inference
    base_url='https://api.inference.wandb.ai/v1',

    # Get your API key from https://wandb.ai/authorize
    # For safety, consider setting it in the OPENAI_API_KEY environment variable instead
    api_key="<your-api-key>",

    # Optional: Team and project for usage tracking
    project="<your-team>/<your-project>",
)

# Replace <model-id> with any model ID from the available models list
response = client.chat.completions.create(
    model="<model-id>",
    messages=[
        {"role": "system", "content": "<your-system-prompt>"},
        {"role": "user", "content": "<your-prompt>"}
    ],
)

print(response.choices[0].message.content)

Bash:

curl https://api.inference.wandb.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-api-key>" \
  -H "OpenAI-Project: <your-team>/<your-project>" \
  -d '{
    "model": "<model-id>",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Tell me a joke." }
    ]
  }'
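
To show how the inputs listed above map onto the HTTP request that the curl command sends, here is a minimal sketch using only the Python standard library. The helper name `build_chat_request` is hypothetical and not part of any W&B or OpenAI library. It only assembles the request and does not send it.

```python
import json
import urllib.request

BASE_URL = "https://api.inference.wandb.ai/v1"


def build_chat_request(api_key, model, messages, project=None):
    """Assemble (but do not send) a POST request to /chat/completions.

    Hypothetical helper for illustration; it mirrors the curl example.
    """
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    # The team/project header is optional, so add it only when provided.
    if project is not None:
        headers["OpenAI-Project"] = project
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions", data=body, headers=headers, method="POST"
    )


request = build_chat_request(
    api_key="<your-api-key>",
    model="<model-id>",
    messages=[{"role": "user", "content": "Tell me a joke."}],
    project="<your-team>/<your-project>",
)
print(request.get_method(), request.full_url)
# To send it, pass `request` to urllib.request.urlopen() and json.load() the result.
```

Keeping request construction separate from sending makes the headers and body easy to inspect before any network call is made.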

Response format

The API returns responses in an OpenAI-compatible format:

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "meta-llama/Llama-3.1-8B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here's a joke for you..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 50,
    "total_tokens": 75
  }
}
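
As a sketch of how the fields in this response fit together, the snippet below parses the sample payload shown above and pulls out the reply text, the finish reason, and the token counts. The function name `summarize_completion` is hypothetical, and the code assumes the response contains at least one choice.

```python
import json

# The sample response from above, as a raw JSON string.
SAMPLE_RESPONSE = """
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "meta-llama/Llama-3.1-8B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Here's a joke for you..."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 25, "completion_tokens": 50, "total_tokens": 75}
}
"""


def summarize_completion(payload: dict) -> dict:
    """Extract the commonly used fields from a chat completion payload."""
    first_choice = payload["choices"][0]  # assumes at least one choice
    usage = payload.get("usage", {})
    return {
        "text": first_choice["message"]["content"],
        "finish_reason": first_choice["finish_reason"],
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "total_tokens": usage.get("total_tokens"),
    }


summary = summarize_completion(json.loads(SAMPLE_RESPONSE))
print(summary["text"])
print(summary["total_tokens"])
```

In the sample, `total_tokens` equals `prompt_tokens` plus `completion_tokens` (25 + 50 = 75).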

List supported models

Get all available models and their IDs. Use this to select models dynamically or check what’s available.

Python:

import openai

client = openai.OpenAI(
    base_url="https://api.inference.wandb.ai/v1",
    api_key="<your-api-key>",
    project="<your-team>/<your-project>"  # Optional, for usage tracking
)

response = client.models.list()

for model in response.data:
    print(model.id)

Bash:

curl https://api.inference.wandb.ai/v1/models \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-api-key>" \
  -H "OpenAI-Project: <your-team>/<your-project>"

Response format

The API returns responses in an OpenAI-compatible format:

{
  "object": "list",
  "data": [
    {
      "id": "deepseek-ai/DeepSeek-V3.1",
      "object": "model",
      "created": 0,
      "owned_by": "system",
      "root": "deepseek-ai/DeepSeek-V3.1"
    },
    {
      "id": "openai/gpt-oss-20b",
      "object": "model",
      "created": 0,
      "owned_by": "system",
      "root": "openai/gpt-oss-20b"
    },
    ...
  ]
}
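
One way to select a model dynamically from this listing is sketched below. The helpers `model_ids` and `pick_model` are hypothetical names, and the sample data contains only the two entries shown above, so the real list will be longer.

```python
from typing import List, Optional

# Only the two entries shown in the sample response; the real list has more.
SAMPLE_LISTING = {
    "object": "list",
    "data": [
        {"id": "deepseek-ai/DeepSeek-V3.1", "object": "model", "created": 0,
         "owned_by": "system", "root": "deepseek-ai/DeepSeek-V3.1"},
        {"id": "openai/gpt-oss-20b", "object": "model", "created": 0,
         "owned_by": "system", "root": "openai/gpt-oss-20b"},
    ],
}


def model_ids(listing: dict, prefix: Optional[str] = None) -> List[str]:
    """Return model IDs, optionally keeping only those starting with `prefix`."""
    ids = [entry["id"] for entry in listing["data"]]
    if prefix is not None:
        ids = [model_id for model_id in ids if model_id.startswith(prefix)]
    return ids


def pick_model(preferred: List[str], listing: dict) -> Optional[str]:
    """Return the first preferred model that is available, or None."""
    available = set(model_ids(listing))
    for candidate in preferred:
        if candidate in available:
            return candidate
    return None


print(model_ids(SAMPLE_LISTING, prefix="openai/"))
print(pick_model(["<model-id>", "openai/gpt-oss-20b"], SAMPLE_LISTING))
```

Checking a preference list against the live listing avoids hard-coding a model ID that may no longer be available.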

API errors

The following table lists common API errors you might encounter:

| Error code | Message | Cause | Solution |
| --- | --- | --- | --- |
| 401 | Authentication failed | Your authentication credentials are incorrect or your W&B project entity and/or name are incorrect. | Ensure you’re using the correct API key and that your W&B project name and entity are correct. |
| 403 | Country, region, or territory not supported | Accessing the API from an unsupported location. | Please see Geographic restrictions. |
| 429 | Concurrency limit reached for requests | Too many concurrent requests. | Reduce the number of concurrent requests or increase your limits. For more information, see Usage information and limits. |
| 429 | You exceeded your current quota, please check your plan and billing details | Out of credits or reached monthly spending cap. | Get more credits or increase your limits. For more information, see Usage information and limits. |
| 429 | W&B Inference isn’t available for personal accounts. Please switch to a non-personal account to access W&B Inference | The user is on a personal account, which doesn’t have access to W&B Inference. | Switch to a non-personal account. If one isn’t available, create a Team to create a non-personal account. For more information, see Personal entities unsupported. |
| 500 | The server had an error while processing your request | Internal server error. | Retry after a brief wait and contact support if it persists. |
| 503 | The engine is currently overloaded, please try again later | Server is experiencing high traffic. | Retry your request after a short delay. |
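
The "retry after a short delay" advice for 500 and 503 errors can be sketched as exponential backoff with a little jitter. This is an illustration under stated assumptions, not the service's prescribed policy: `ApiError` is a hypothetical stand-in for whatever exception your HTTP client raises, and the attempt count and delays are arbitrary. The sketch also retries every 429, but only the concurrency-limit variant is likely to clear on its own. The quota and personal-account variants need the fixes listed in the table.

```python
import random
import time

RETRYABLE_STATUS = {429, 500, 503}


class ApiError(Exception):
    """Hypothetical stand-in for an HTTP client error carrying a status code."""

    def __init__(self, status_code, message=""):
        super().__init__(message)
        self.status_code = status_code


def call_with_retries(fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying retryable errors with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ApiError as err:
            # Give up on non-retryable errors or once attempts are exhausted.
            if err.status_code not in RETRYABLE_STATUS or attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ...
            sleep(delay + random.uniform(0, delay / 10))  # small jitter


# Usage with a fake call that fails twice with 503, then succeeds.
calls = {"count": 0}


def flaky_call():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ApiError(503, "The engine is currently overloaded")
    return "ok"


result = call_with_retries(flaky_call, sleep=lambda seconds: None)  # skip real waits
print(result, calls["count"])
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.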

Next steps