How do I fix server errors (500, 503) with W&B Inference?

Server errors (HTTP 5xx status codes) usually indicate temporary issues with the W&B Inference service rather than a problem with your request.

Error types

500 - Internal Server Error

Message: “The server had an error while processing your request”

This is a temporary internal error on the server side.

503 - Service Overloaded

Message: “The engine is currently overloaded, please try again later”

The service is experiencing high traffic.

How to handle server errors

Wait before retrying
- 500 errors: Wait 30-60 seconds
- 503 errors: Wait 60-120 seconds

Use exponential backoff

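import time
import openai

# HTTP status codes worth retrying: transient server-side failures.
RETRYABLE_STATUS = {500, 503}

def call_with_retry(client, messages, model, max_retries=5):
    """Call the chat completions API, retrying 500/503 errors with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model=model,
                messages=messages
            )
        except openai.APIStatusError as e:
            # openai.APIStatusError (openai>=1.0) exposes the HTTP status as .status_code.
            # Re-raise non-retryable errors, and give up after the last attempt.
            if e.status_code not in RETRYABLE_STATUS or attempt == max_retries - 1:
                raise
            # Exponential backoff: 1s, 2s, 4s, ... capped at 60 seconds.
            wait_time = min(60, 2 ** attempt)
            time.sleep(wait_time)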
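
The schedule above starts at one second, which is shorter than the waits recommended earlier for 500 and 503 errors. As a minimal sketch, the delay could instead start from a per-status minimum and add random jitter so that many clients do not retry at the same instant. The names `MIN_WAIT_SECONDS` and `retry_delay`, the 120-second cap, and the jitter formula are illustrative assumptions, not part of the W&B Inference API.

```python
import random

# Assumed minimum waits, taken from the guidance above: 30 s for 500, 60 s for 503.
MIN_WAIT_SECONDS = {500: 30.0, 503: 60.0}

def retry_delay(status_code, attempt, cap=120.0, rng=random.random):
    """Return the seconds to sleep before retry number `attempt` (0-based)."""
    # Unknown status codes fall back to a 1-second minimum (an assumption).
    floor = MIN_WAIT_SECONDS.get(status_code, 1.0)
    # Double the minimum wait on each attempt, but never exceed the cap.
    backoff = min(cap, floor * (2 ** attempt))
    # Pick a random point between the minimum wait and the backoff ceiling.
    return floor + rng() * (backoff - floor)
```

Inside a retry loop, `time.sleep(retry_delay(e.status_code, attempt))` would replace the fixed `min(60, 2 ** attempt)` wait.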

Set appropriate timeouts
- Increase timeout values for your HTTP client
- Consider async operations for better handling
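
One way to combine both suggestions is to wrap an asynchronous request in an explicit deadline. The sketch below uses only the standard library; `with_deadline` and `fake_request` are hypothetical helper names, and `fake_request` merely simulates latency in place of a real API call. The OpenAI Python SDK also accepts a `timeout` argument on its client constructor, which may be enough on its own for many cases.

```python
import asyncio

async def with_deadline(make_request, timeout_s=30.0):
    """Await make_request() but give up after timeout_s seconds."""
    try:
        return await asyncio.wait_for(make_request(), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Raise a clearer error so the caller can decide whether to retry.
        raise TimeoutError(f"request exceeded {timeout_s} s") from None

async def fake_request(delay_s):
    # Stand-in for a real API call; sleeps to simulate server latency.
    await asyncio.sleep(delay_s)
    return "ok"
```

With an async client, `make_request` could be a lambda that returns the coroutine for the chat completion call, for example `lambda: async_client.chat.completions.create(model=model, messages=messages)`, where `async_client` is assumed to be an `openai.AsyncOpenAI` instance.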

When to contact support

Contact support if:

- Errors persist for more than 10 minutes
- You see patterns of failures at specific times
- Error messages contain additional details

Provide:

- Error messages and codes
- Time when errors occurred
- Your code snippet (remove API keys)
- W&B entity and project names
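
As a rough sketch, the details listed above could be gathered at the moment an error is caught, so that the timestamp is accurate and the API key never leaves your machine. `build_support_report` and its field names are hypothetical; support does not require any particular format.

```python
from datetime import datetime, timezone

def build_support_report(status_code, message, entity, project, snippet, api_key):
    """Collect the details support asks for, with the API key redacted."""
    return {
        "status_code": status_code,
        "message": message,
        # Record the time in UTC so it can be matched against server logs.
        "occurred_at_utc": datetime.now(timezone.utc).isoformat(),
        "entity": entity,
        "project": project,
        # Replace the literal key wherever it appears in the snippet.
        "snippet": snippet.replace(api_key, "<REDACTED>") if api_key else snippet,
    }
```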


Last modified October 3, 2025
