Follow these best practices to handle W&B Inference errors gracefully and maintain reliable applications.
1. Always implement error handling
Wrap API calls in try-except blocks:
import openai

# Configure the client (see section 5 for timeout options)
client = openai.OpenAI(
    base_url="https://api.inference.wandb.ai/v1",
    api_key="your-api-key",
)

messages = [{"role": "user", "content": "Hello!"}]

try:
    response = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=messages,
    )
except Exception as e:
    print(f"Error: {e}")
    # Handle the error appropriately for your application
2. Use retry logic with exponential backoff
Transient failures such as rate limits and temporary server errors often succeed on retry; wait longer after each failed attempt so you don't hammer the API:

import time
from typing import Optional

def call_inference_with_retry(
    client,
    messages,
    model: str,
    max_retries: int = 3,
    base_delay: float = 1.0,
) -> Optional[str]:
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model=model,
                messages=messages,
            )
            return response.choices[0].message.content
        except Exception as e:
            # Re-raise once the retry budget is exhausted
            if attempt == max_retries - 1:
                raise
            # Exponential backoff: the delay doubles after each attempt
            delay = base_delay * (2 ** attempt)
            print(f"Attempt {attempt + 1} failed ({e}), retrying in {delay}s...")
            time.sleep(delay)
    return None
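For example, assuming the `client` and `messages` from the first example:

reply = call_inference_with_retry(
    client,
    messages,
    model="meta-llama/Llama-3.1-8B-Instruct",
)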
3. Monitor your usage
- Track credit usage on the W&B Billing page
- Set up alerts before you hit limits
- Log API usage in your application (see the sketch below)
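A minimal sketch of the logging point, using Python's standard logging module with the client responses shown above; the logger name and the `log_usage` helper are illustrative, not part of any W&B API:

import logging

logger = logging.getLogger("inference")  # illustrative logger name

def log_usage(response) -> None:
    # Chat completion responses carry a `usage` object with token counts
    usage = response.usage
    if usage is not None:
        logger.info(
            "model=%s prompt_tokens=%d completion_tokens=%d total_tokens=%d",
            response.model,
            usage.prompt_tokens,
            usage.completion_tokens,
            usage.total_tokens,
        )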
4. Handle specific error codes
def handle_inference_error(error):
    error_str = str(error)
    if "401" in error_str:
        # Invalid authentication
        raise ValueError("Check your API key and project configuration")
    elif "402" in error_str:
        # Out of credits
        raise ValueError("Insufficient credits")
    elif "429" in error_str:
        # Rate limited
        return "retry"
    elif "500" in error_str or "503" in error_str:
        # Server error
        return "retry"
    else:
        # Unknown error: re-raise the original exception
        raise error
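Matching on the message string works, but if you use the OpenAI Python SDK (v1+), it raises typed exceptions you can catch directly, which is less brittle. A sketch under that assumption; `handle_inference_error_typed` is an illustrative name:

import openai

def handle_inference_error_typed(error: Exception) -> str:
    # Check the specific subclasses before the generic APIStatusError
    if isinstance(error, openai.AuthenticationError):  # 401
        raise ValueError("Check your API key and project configuration") from error
    if isinstance(error, openai.RateLimitError):  # 429
        return "retry"
    if isinstance(error, openai.InternalServerError):  # 5xx
        return "retry"
    if isinstance(error, openai.APIStatusError) and error.status_code == 402:
        raise ValueError("Insufficient credits") from error  # out of credits
    raise error  # unknown error: re-raise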
5. Set appropriate timeouts
Configure reasonable timeouts for your use case:
# For longer responses
client = openai.OpenAI(
    base_url="https://api.inference.wandb.ai/v1",
    api_key="your-api-key",
    timeout=60.0,  # 60 second timeout
)
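If you use the OpenAI Python SDK, you can also override the timeout for a single request with `with_options`, rather than reconfiguring the whole client; for example:

# Longer budget for this call only
response = client.with_options(timeout=120.0).chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=messages,
)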
Additional tips
- Log errors with timestamps for debugging
- Use async operations for better concurrency handling (see the sketch after this list)
- Implement circuit breakers for production systems
- Cache responses when appropriate to reduce API calls
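For the async tip, a minimal sketch using the SDK's AsyncOpenAI client, assuming the same base URL, API key, and model as above:

import asyncio
import openai

async_client = openai.AsyncOpenAI(
    base_url="https://api.inference.wandb.ai/v1",
    api_key="your-api-key",
)

async def ask(prompt: str) -> str:
    response = await async_client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

async def main() -> None:
    # Issue both requests concurrently instead of one after the other
    answers = await asyncio.gather(ask("First question"), ask("Second question"))
    print(answers)

asyncio.run(main())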