# Why am I getting rate limit errors (429) with Serverless Inference?

This page explains why Serverless Inference returns `429` rate limit errors and how to resolve them so your requests succeed within the allowed concurrency limits.

Rate limit errors (`429`) occur when you exceed concurrency limits.

**Error:** "Concurrency limit reached for requests"

**Solution:** To resolve the error, do one of the following:

* Reduce the number of parallel requests.
* Add delays between requests.
* Implement exponential backoff.

Note: Rate limits apply per W\&B project.

## Best practices to avoid rate limits

The following practices help your application stay within concurrency limits and recover gracefully when it reaches them.

* **Implement retry logic with exponential backoff:** Backoff spaces out retries so transient `429` responses clear before the next attempt.

  ```python
  import time

  def retry_with_backoff(func, max_retries=3):
      """Call func(), retrying on 429 errors with exponentially growing waits."""
      for i in range(max_retries):
          try:
              return func()
          except Exception as e:
              # Retry only when the error message mentions 429 and attempts remain.
              if "429" in str(e) and i < max_retries - 1:
                  time.sleep(2 ** i)  # wait 1s, then 2s, then 4s, ...
              else:
                  # Not a rate limit error, or out of retries: re-raise.
                  raise
  ```

* **Use batch processing instead of parallel requests.**
* **Monitor your usage on the W\&B Billing page.**

## Default spending caps

Accounts also have default spending caps that bound overall Inference usage:

* **Pro accounts:** \$6,000 per month
* **Enterprise accounts:** \$700,000 per year

To adjust these limits, contact your account executive or support.
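
Returning to the first remedy for `429` errors, reducing the number of parallel requests: one way to do this is to cap how many requests are in flight at once. The following is a minimal sketch, not an official W\&B pattern. The helper name `run_with_concurrency_limit` and the default `max_concurrent=2` are illustrative assumptions; this page does not state the actual concurrency limit, so pick a value that suits your project.

```python
from concurrent.futures import ThreadPoolExecutor

def run_with_concurrency_limit(tasks, max_concurrent=2):
    """Run zero-argument callables with at most max_concurrent in flight.

    Hypothetical helper: max_concurrent is an assumed value, not a
    documented Serverless Inference limit. Results keep the input order.
    """
    # The pool never runs more than max_workers tasks at the same time,
    # so the number of simultaneous requests stays bounded.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        futures = [pool.submit(task) for task in tasks]
        # result() blocks until each task finishes and re-raises its exception.
        return [future.result() for future in futures]
```

Each task would be a zero-argument callable that sends one request. Wrapping each task in `retry_with_backoff` combines the cap with retries, so any `429` that still occurs is retried after a delay.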