Capabilities

Capabilities of W&B Inference with examples of use.

The pages below demonstrate various features and capabilities of W&B Inference’s hosted models.

1 - Enable streaming responses

How to use streaming output with W&B Inference.

Sometimes models take a while to generate a response. Setting the stream option to true returns the response as a stream of chunks, letting you display results incrementally instead of waiting for the entire response to be generated.

Streaming output is supported for all hosted models. We especially encourage using it with reasoning models, as non-streaming requests may time out if the model thinks for too long before output starts.

import openai

client = openai.OpenAI(
    base_url='https://api.inference.wandb.ai/v1',
    api_key="<your-api-key>",  # Available from https://wandb.ai/authorize
)

stream = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[
        {"role": "user", "content": "Tell me a rambling joke"}
    ],
    stream=True,
)

for chunk in stream:
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="", flush=True)
    else:
        print(chunk)  # Chunks with no choices carry the CompletionUsage object
curl https://api.inference.wandb.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-api-key>" \
  -d '{
    "model": "openai/gpt-oss-120b",
    "messages": [
      { "role": "user", "content": "Tell me a rambling joke" }
    ],
    "stream": true
  }'
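
The chunk with no choices at the end of the stream is where token usage appears. If you want usage reported explicitly, the OpenAI Python client accepts a stream_options parameter; the sketch below reuses the client from above and assumes the W&B Inference endpoint honors this parameter.

stream = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[
        {"role": "user", "content": "Tell me a rambling joke"}
    ],
    stream=True,
    stream_options={"include_usage": True},  # Assumption: the endpoint supports this OpenAI parameter
)

answer_parts = []
for chunk in stream:
    if chunk.choices:
        answer_parts.append(chunk.choices[0].delta.content or "")
    elif chunk.usage:  # The final usage-only chunk has an empty choices list
        print(f"\nTotal tokens: {chunk.usage.total_tokens}")

answer = "".join(answer_parts)  # Full response text, reassembled from the chunks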

2 - View reasoning information

How to return and view reasoning in your W&B Inference responses.

Reasoning models, like OpenAI’s GPT OSS 20B, include information about their reasoning steps in the output, alongside the final answer. This happens automatically; no additional request parameters are needed.

You can determine whether a model supports reasoning by checking the Supported Features section of its catalog page in the UI.

You can find reasoning information in the reasoning_content field of responses. This field is not present in the output of non-reasoning models.

import openai

client = openai.OpenAI(
    base_url='https://api.inference.wandb.ai/v1',
    api_key="<your-api-key>",  # Available from https://wandb.ai/authorize
)

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[
        {"role": "user", "content": "3.11 and 3.8, which is greater?"}
    ],
)

print(response.choices[0].message.reasoning_content)
print("--------------------------------")
print(response.choices[0].message.content)
curl https://api.inference.wandb.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-api-key>" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "messages": [
      { "role": "user", "content": "3.11 and 3.8, which is greater?" }
    ]
  }'
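
Because reasoning_content is not part of the standard chat completion schema, code that handles responses from multiple models shouldn’t assume the field exists. A minimal sketch of defensive access, reusing the response from above:

message = response.choices[0].message

# reasoning_content is only present for reasoning models, so fall back to None
reasoning = getattr(message, "reasoning_content", None)
if reasoning is not None:
    print("Reasoning:", reasoning)
print("Answer:", message.content)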

3 - Enable JSON mode

How to use JSON mode with W&B Inference.

Enabling JSON mode instructs the model to return its response as valid JSON. However, the response’s schema may not be consistent or adhere to a particular structure. For consistently structured JSON responses, we recommend using structured output when possible.

To enable JSON mode, specify json_object as the response_format type in the request:

import json
import openai

client = openai.OpenAI(
    base_url='https://api.inference.wandb.ai/v1',
    api_key="<your-api-key>",  # Available from https://wandb.ai/authorize
)

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant that outputs JSON."},
        {"role": "user", "content": "Give me a list of three fruits with their colors."},
    ],
    response_format={"type": "json_object"}  # This enables JSON mode
)

content = response.choices[0].message.content
parsed = json.loads(content)
print(parsed)
curl https://api.inference.wandb.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-api-key>" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant that outputs JSON."},
        {"role": "user", "content": "Give me a list of three fruits with their colors."},
    ],
    "response_format": {"type": "json_object"}
  }'
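
Since JSON mode guarantees valid JSON but not a particular shape, it’s worth checking for the keys you need before using them. A minimal sketch, reusing the response from above; the "fruits" key is hypothetical, since the model chooses its own key names:

import json

content = response.choices[0].message.content
parsed = json.loads(content)  # JSON mode guarantees this parses

# The keys are model-chosen, so check for what you need rather than assuming a shape
if isinstance(parsed, dict) and "fruits" in parsed:  # "fruits" is a hypothetical key name
    for fruit in parsed["fruits"]:
        print(fruit)
else:
    print(parsed)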

4 - Enable structured output

How to configure structured output in W&B Inference responses.

Structured output is similar to JSON mode but provides the added benefit of ensuring that the model’s response adheres to the schema you specify. We recommend using structured output instead of JSON mode when possible.

To enable structured output, specify json_schema as the response_format type in the request:

import json
import openai

client = openai.OpenAI(
    base_url='https://api.inference.wandb.ai/v1',
    api_key="<your-api-key>",  # Available from https://wandb.ai/authorize
)

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "CalendarEventResponse",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "date": {"type": "string"},
                    "participants": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["name", "date", "participants"],
                "additionalProperties": False,
            },
        },
    },
)

content = response.choices[0].message.content
parsed = json.loads(content)
print(parsed)
curl https://api.inference.wandb.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-api-key>" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "messages": [
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "CalendarEventResponse",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "date": {"type": "string"},
                    "participants": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["name", "date", "participants"],
                "additionalProperties": False,
            },
        },
    },
  }'
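
Because the response is guaranteed to match the schema, you can validate it into a typed object on the client side. A minimal sketch using Pydantic (an assumption; Pydantic v2 must be installed, and any validation library would work), with a model mirroring the schema above:

from pydantic import BaseModel

class CalendarEventResponse(BaseModel):
    name: str
    date: str
    participants: list[str]

# Validate the raw JSON string straight into a typed object
event = CalendarEventResponse.model_validate_json(response.choices[0].message.content)
print(event.name, event.date, event.participants)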

5 - Use tool calling

How to use Tool Calling with W&B Inference.

Tool calling lets you extend a model’s capabilities to include invoking tools as part of its response. Currently, W&B Inference supports only function calling.

To call functions, define them and their parameters as part of your request to the model. The model determines whether it needs to call a function to fulfill the request and, if so, returns the values to pass as its arguments.

import openai

client = openai.OpenAI(
    base_url='https://api.inference.wandb.ai/v1',
    api_key="<your-api-key>",  # Available from https://wandb.ai/authorize
)

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[
        {"role": "user", "content": "What is the weather like in San Francisco? Use Fahrenheit."},
    ],
    tool_choice="auto",
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string", "description": "City and state, e.g., 'San Francisco, CA'"},
                        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                    },
                    "required": ["location", "unit"],
                },
            },
        }
    ],
)

print(response.choices[0].message.tool_calls)
curl https://api.inference.wandb.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-api-key>" \
  -d '{
    "model": "openai/gpt-oss-20b",
        "messages": [
            {"role": "user", "content": "What is the weather like in San Francisco? Use Fahrenheit."},
        ],
        "tool_choice": "auto",
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "description": "Get the current weather in a given location",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "location": {"type": "string", "description": "City and state, e.g., 'San Francisco, CA'"},
                            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                        },
                        "required": ["location", "unit"],
                    },
                },
            }
        ],
  }'
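
The model only returns the call it wants made; your code is responsible for executing the function and sending the result back. Below is a minimal sketch of completing the round trip, reusing the client and response from above, with get_weather stubbed out for illustration:

import json

def get_weather(location, unit):
    # Stub for illustration; in practice, call a real weather service here
    return f"72 degrees {unit} and sunny in {location}"

message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    args = json.loads(call.function.arguments)  # Arguments arrive as a JSON string
    result = get_weather(**args)

    # Send the assistant's tool call and your result back for a final answer.
    # Depending on the backend, you may need to include the same tools list again.
    followup = client.chat.completions.create(
        model="openai/gpt-oss-20b",
        messages=[
            {"role": "user", "content": "What is the weather like in San Francisco? Use Fahrenheit."},
            message,  # The assistant message containing the tool call
            {"role": "tool", "tool_call_id": call.id, "content": result},
        ],
    )
    print(followup.choices[0].message.content)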