The pages below demonstrate various features and capabilities of W&B Inference’s hosted models.
Capabilities
- 1: Enable streaming responses
- 2: View reasoning information
- 3: Enable JSON mode
- 4: Enable structured output
- 5: Use tool calling
1 - Enable streaming responses
Sometimes models take a while to generate a response. Setting the stream option to true returns the response as a stream of chunks, so you can display results incrementally instead of waiting for the entire response to be generated.
Streaming output is supported for all hosted models. We especially encourage its use with reasoning models, as non-streaming requests may time out if the model thinks for too long before output starts.
import openai

client = openai.OpenAI(
    base_url='https://api.inference.wandb.ai/v1',
    api_key="<your-api-key>",  # Available from https://wandb.ai/authorize
)

stream = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[
        {"role": "user", "content": "Tell me a rambling joke"}
    ],
    stream=True,
)

for chunk in stream:
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="", flush=True)
    else:
        print(chunk)  # Final chunk carries the CompletionUsage object
curl https://api.inference.wandb.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-api-key>" \
  -d '{
    "model": "openai/gpt-oss-120b",
    "messages": [
      { "role": "user", "content": "Tell me a rambling joke" }
    ],
    "stream": true
  }'
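If you need the complete text as well as incremental display, accumulate the deltas as they arrive. A minimal sketch, using a hard-coded list of delta strings in place of a live stream (a real stream yields them via chunk.choices[0].delta.content):

```python
# Simulated delta contents standing in for chunk.choices[0].delta.content;
# the final chunks of a real stream may carry None instead of text.
deltas = ["Why did the ", "chicken cross ", "the road?", None]

full_text = ""
for delta in deltas:
    piece = delta or ""  # treat None as empty
    full_text += piece
    print(piece, end="", flush=True)
print()
```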
2 - View reasoning information
Reasoning models, like OpenAI’s GPT OSS 20B, return information about their reasoning steps alongside the final answer. This happens automatically; no additional input parameters are needed.
You can determine whether a model supports reasoning by checking the Supported Features section of its catalog page in the UI.
You can find reasoning information in the reasoning_content field of responses. This field is not present in the outputs of other models.
import openai

client = openai.OpenAI(
    base_url='https://api.inference.wandb.ai/v1',
    api_key="<your-api-key>",  # Available from https://wandb.ai/authorize
)

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[
        {"role": "user", "content": "3.11 and 3.8, which is greater?"}
    ],
)

print(response.choices[0].message.reasoning_content)
print("--------------------------------")
print(response.choices[0].message.content)
curl https://api.inference.wandb.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-api-key>" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "messages": [
      { "role": "user", "content": "3.11 and 3.8, which is greater?" }
    ]
  }'
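Because reasoning_content is absent from other models’ outputs, code that may run against multiple models should read it defensively. A sketch, using a SimpleNamespace to stand in for the SDK’s message object:

```python
from types import SimpleNamespace

# Stand-in for response.choices[0].message from a reasoning model.
message = SimpleNamespace(
    content="3.11 is greater than 3.8.",
    reasoning_content="Compare the numbers 3.11 and 3.8 as decimals...",
)

# getattr with a default avoids an AttributeError for non-reasoning models.
reasoning = getattr(message, "reasoning_content", None)
if reasoning:
    print("Reasoning:", reasoning)
print("Answer:", message.content)
```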
3 - Enable JSON mode
Enabling JSON mode instructs the model to return the response in a valid JSON format. However, the response’s schema may not be consistent or adhere to a particular structure. For consistent structured JSON responses, we recommend using structured output when possible.
To enable JSON mode, specify json_object as the response_format type in the request:
import json
import openai

client = openai.OpenAI(
    base_url='https://api.inference.wandb.ai/v1',
    api_key="<your-api-key>",  # Available from https://wandb.ai/authorize
)

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant that outputs JSON."},
        {"role": "user", "content": "Give me a list of three fruits with their colors."},
    ],
    response_format={"type": "json_object"},  # This enables JSON mode
)

content = response.choices[0].message.content
parsed = json.loads(content)
print(parsed)
curl https://api.inference.wandb.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-api-key>" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant that outputs JSON."},
      {"role": "user", "content": "Give me a list of three fruits with their colors."}
    ],
    "response_format": {"type": "json_object"}
  }'
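Since JSON mode guarantees valid JSON but not a particular shape, it is worth checking for the keys you expect before using them. A sketch with an illustrative payload in place of real model output (your model may choose a different shape):

```python
import json

# Illustrative model output; JSON mode guarantees it parses, not its shape.
raw = '{"fruits": [{"name": "apple", "color": "red"}, {"name": "banana", "color": "yellow"}]}'

parsed = json.loads(raw)
# Guard against the model choosing a different top-level structure.
fruits = parsed.get("fruits", []) if isinstance(parsed, dict) else []
for fruit in fruits:
    print(fruit.get("name"), "-", fruit.get("color"))
```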
4 - Enable structured output
Structured Output is similar to JSON mode but provides the added benefit of ensuring that the model’s response adheres to the schema you specify. We recommend using structured output instead of JSON mode when possible.
To enable structured output, specify json_schema as the response_format type in the request:
import json
import openai

client = openai.OpenAI(
    base_url='https://api.inference.wandb.ai/v1',
    api_key="<your-api-key>",  # Available from https://wandb.ai/authorize
)

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "CalendarEventResponse",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "date": {"type": "string"},
                    "participants": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["name", "date", "participants"],
                "additionalProperties": False,
            },
        },
    },
)

content = response.choices[0].message.content
parsed = json.loads(content)
print(parsed)
curl https://api.inference.wandb.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-api-key>" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "messages": [
      {"role": "system", "content": "Extract the event information."},
      {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."}
    ],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "CalendarEventResponse",
        "strict": true,
        "schema": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "date": {"type": "string"},
            "participants": {"type": "array", "items": {"type": "string"}}
          },
          "required": ["name", "date", "participants"],
          "additionalProperties": false
        }
      }
    }
  }'
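Because strict structured output guarantees the schema, the parsed response can be loaded straight into a typed object. A sketch using a dataclass, with an illustrative payload in the shape the CalendarEventResponse schema enforces (not actual model output):

```python
import json
from dataclasses import dataclass

@dataclass
class CalendarEvent:
    name: str
    date: str
    participants: list

# Illustrative payload matching the strict schema above.
content = '{"name": "Science Fair", "date": "Friday", "participants": ["Alice", "Bob"]}'

# With "strict": true, all required keys are guaranteed to be present.
event = CalendarEvent(**json.loads(content))
print(event.name, "on", event.date, "with", ", ".join(event.participants))
```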
5 - Use tool calling
Tool calling allows you to extend a model’s capabilities by having it invoke tools as part of its response. W&B Inference only supports calling functions at this time.
To call functions, declare them and their parameters as part of your request to the model. The model determines whether it needs to run a function to fulfill the request and, if so, returns the argument values to call it with.
import openai

client = openai.OpenAI(
    base_url='https://api.inference.wandb.ai/v1',
    api_key="<your-api-key>",  # Available from https://wandb.ai/authorize
)

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[
        {"role": "user", "content": "What is the weather like in San Francisco? Use Fahrenheit."},
    ],
    tool_choice="auto",
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string", "description": "City and state, e.g., 'San Francisco, CA'"},
                        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                    },
                    "required": ["location", "unit"],
                },
            },
        }
    ],
)

print(response.choices[0].message.tool_calls)
curl https://api.inference.wandb.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-api-key>" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "messages": [
      {"role": "user", "content": "What is the weather like in San Francisco? Use Fahrenheit."}
    ],
    "tool_choice": "auto",
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather in a given location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {"type": "string", "description": "City and state, for example San Francisco, CA"},
              "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location", "unit"]
          }
        }
      }
    ]
  }'
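The response’s tool_calls only name the function and its argument values; your code executes the function and sends the result back to the model in a follow-up "tool" message. A sketch of that dispatch step, with get_weather as a hypothetical local stub and a hard-coded arguments string in place of a live tool call:

```python
import json

def get_weather(location, unit):
    # Hypothetical stub; a real implementation would query a weather service.
    return {"location": location, "temperature": 68, "unit": unit}

# tool_calls[0].function.arguments arrives as a JSON-encoded string.
arguments = '{"location": "San Francisco, CA", "unit": "fahrenheit"}'
result = get_weather(**json.loads(arguments))

# Send the result back as a "tool" message, echoing the tool call's id,
# then request a second completion so the model can produce a final answer.
tool_message = {
    "role": "tool",
    "tool_call_id": "call_abc123",  # hypothetical id from tool_calls[0].id
    "content": json.dumps(result),
}
print(tool_message["content"])
```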