# Why am I getting rate limit errors (429) with Serverless Inference?

This page explains why Serverless Inference returns `429` rate limit errors and how to resolve them so your requests succeed within the allowed concurrency limits.

Serverless Inference returns a `429` rate limit error when the number of requests you have in flight at the same time exceeds the concurrency limit.

**Error:** "Concurrency limit reached for requests"

**Solution:** To resolve the error, do one of the following:

* Reduce the number of parallel requests.
* Add delays between requests.
* Implement exponential backoff.

Note: Rate limits apply per W\&B project.
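The first two options can be combined by capping how many requests run at once and pausing briefly between them. The following is a minimal sketch, not an official client pattern: `run_with_limited_concurrency`, `max_workers`, and `delay_seconds` are hypothetical names, and the default of two workers is an arbitrary illustration, since this page does not state the actual concurrency limit. Each task is assumed to be a zero-argument callable that sends one inference request.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_with_limited_concurrency(tasks, max_workers=2, delay_seconds=0.0):
    """Run zero-argument callables with at most max_workers in flight.

    max_workers and delay_seconds are illustrative values (assumptions);
    tune them until the 429 errors stop.
    """
    def run_one(task):
        result = task()
        # Pause before this worker picks up its next task.
        time.sleep(delay_seconds)
        return result

    # The pool never runs more than max_workers tasks at the same time.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input tasks.
        return list(pool.map(run_one, tasks))
```

Lowering `max_workers` reduces parallelism, and raising `delay_seconds` spaces requests further apart.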

## Best practices to avoid rate limits

The following practices help your application stay within concurrency limits and recover gracefully when it does hit them.

* **Implement retry logic with exponential backoff:** Backoff spaces out retries so transient `429` responses clear before the next attempt.
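  ```python
  import time

  def retry_with_backoff(func, max_retries=3):
      """Call func, retrying on 429 errors with exponential backoff."""
      for i in range(max_retries):
          try:
              return func()
          except Exception as e:
              # Retry only on rate limit errors, and only if attempts remain.
              if "429" in str(e) and i < max_retries - 1:
                  time.sleep(2 ** i)  # Wait 1s, 2s, 4s, ... between attempts.
              else:
                  raise
  ```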

* **Use batch processing instead of parallel requests.**

* **Monitor your usage on the W\&B Billing page.**

## Default spending caps

Accounts also have default spending caps that bound overall Inference usage:

* **Pro accounts:** \$6,000 per month
* **Enterprise accounts:** \$700,000 per year

Contact your account executive or support to adjust limits.

***

<Badge stroke shape="pill" color="orange" size="md">[Inference](/support/models/tags/inference)</Badge>
