Slow wandb.init() calls or sluggish metric uploads are usually caused by network latency, large media payloads, high logging frequency, or slow startup of the W&B service process.
Slow wandb.init()
wandb.init() contacts the W&B API to create the run and verify credentials. If it hangs for more than a few seconds:
- Check connectivity: Run curl -I https://api.wandb.ai to confirm your machine can reach the W&B API. Firewall rules and proxy configurations on clusters are a common cause.
- Increase the init timeout: If the connection is intermittent, give wandb.init() more time before it gives up by raising the init_timeout setting, for example wandb.init(settings=wandb.Settings(init_timeout=120)).
- Use offline mode during testing: If you do not need live syncing while iterating, run offline with wandb.init(mode="offline") and upload later with wandb sync wandb/offline-run-[TIMESTAMP]-[ID], replacing [TIMESTAMP] and [ID] with your run's timestamp and ID.
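The checks in the list above can be scripted in Python, which helps on cluster nodes where curl is unavailable. This is a minimal sketch, not an official W&B utility. can_reach, init_options, and start_run are hypothetical helper names. The sketch assumes a recent wandb release in which wandb.init() accepts mode="offline" and wandb.Settings accepts init_timeout.

```python
import urllib.error
import urllib.request
from typing import Any, Dict, Optional


def can_reach(url: str = "https://api.wandb.ai", timeout_s: float = 5.0) -> bool:
    """Send an HTTP HEAD request (like `curl -I`) and report whether the host answered.

    Any HTTP status, even 401 or 404, proves the network path works. DNS
    failures, refused connections, timeouts, and proxy errors return False.
    urllib honors the HTTPS_PROXY/NO_PROXY environment variables, like curl.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout_s):
            return True
    except urllib.error.HTTPError:
        return True  # the server responded, just with a non-2xx status
    except OSError:
        return False  # URLError, socket timeouts, and connection errors


def init_options(init_timeout_s: Optional[float] = None, offline: bool = False) -> Dict[str, Any]:
    """Collect the wandb.init() choices from the list above into one dict."""
    options: Dict[str, Any] = {}
    if offline:
        options["mode"] = "offline"  # writes to ./wandb/offline-run-*; sync later
    if init_timeout_s is not None:
        options["init_timeout"] = init_timeout_s  # seconds before init gives up
    return options


def start_run(project: str, **options: Any):
    """Start a W&B run with the options above (wandb is imported lazily)."""
    import wandb  # requires `pip install wandb`

    timeout = options.pop("init_timeout", None)
    # Only override the SDK's default timeout when one was requested.
    settings = wandb.Settings(init_timeout=timeout) if timeout is not None else None
    return wandb.init(project=project, settings=settings, **options)
```

For example, start_run("my-project", **init_options(init_timeout_s=120, offline=True)) would start an offline run with a 120-second init timeout. The project name here is a placeholder. Run can_reach() first: if it returns False, fix the network or proxy setup before tuning timeouts.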
Slow metric uploads during training
W&B uploads metrics asynchronously in background threads so your training loop is not blocked. Uploads can fall behind when:
- You log too frequently: Calling wandb.log() on every step on a fast GPU can generate more data than the background threads can upload. Log every N steps instead.
- You log large media on every step: wandb.Image, wandb.Table, and wandb.Video objects are significantly larger than scalar metrics. Log rich media once per epoch or every N steps rather than every step.
- You hit rate limits: If you see a 429 Rate limit exceeded error, see How do I fix rate limit exceeded errors?
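Here is a minimal sketch of throttling both kinds of logging. It assumes log_fn is wandb.log, which accepts a step keyword, or any callable with the same shape. ThrottledLogger and its parameters are hypothetical names and are not part of the W&B SDK. Averaging scalars between uploads is one design choice. Logging only the latest value would also work.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Optional


class ThrottledLogger:
    """Forward metrics to `log_fn` (e.g. wandb.log) only every few steps.

    Scalars from skipped steps are averaged into the next record, so
    throttling cuts upload volume without silently discarding data. Media is
    produced by a callable so expensive objects are built only when sent.
    """

    def __init__(self, log_fn: Callable[..., Any], scalar_every: int = 50, media_every: int = 1000) -> None:
        if scalar_every < 1 or media_every < 1:
            raise ValueError("logging intervals must be >= 1")
        self.log_fn = log_fn
        self.scalar_every = scalar_every
        self.media_every = media_every
        self._sums: Dict[str, float] = defaultdict(float)
        self._counts: Dict[str, int] = defaultdict(int)

    def log(
        self,
        step: int,
        scalars: Dict[str, float],
        media_fn: Optional[Callable[[], Dict[str, Any]]] = None,
    ) -> None:
        # Accumulate every step locally; this costs no network traffic.
        for key, value in scalars.items():
            self._sums[key] += float(value)
            self._counts[key] += 1

        record: Dict[str, Any] = {}
        if step % self.scalar_every == 0 and self._counts:
            # Emit the mean of each scalar since the last upload, then reset.
            record.update({k: self._sums[k] / self._counts[k] for k in self._counts})
            self._sums.clear()
            self._counts.clear()
        if media_fn is not None and step % self.media_every == 0:
            record.update(media_fn())  # e.g. {"samples": wandb.Image(...)}
        if record:
            self.log_fn(record, step=step)
```

In a training loop, you might create logger = ThrottledLogger(wandb.log, scalar_every=50, media_every=1000). Then call logger.log(step, {"loss": loss_value}, media_fn=lambda: {"samples": wandb.Image(image_array)}), where loss_value and image_array stand in for your own data. Because the media is built inside a lambda, the wandb.Image object is only constructed on the steps where it is actually uploaded.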
Run finalization is slow
After your script calls wandb.finish() (or exits), W&B flushes any remaining buffered data. This can take a while if a large backlog built up during training. To keep finalization short, log at a moderate frequency throughout training so that no backlog builds up, rather than batching everything at the end.
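To confirm whether finalization is the bottleneck, you can time the flush. This is a minimal sketch that assumes run is the object returned by wandb.init(), which has a finish() method. timed_finish and train_then_finish are hypothetical helpers. The try/finally ensures buffered data is flushed even when training raises an exception.

```python
import time
from typing import Any, Callable


def timed_finish(run: Any) -> float:
    """Call run.finish() and return how long the final flush took, in seconds."""
    start = time.perf_counter()
    run.finish()  # waits while W&B flushes any remaining buffered data
    return time.perf_counter() - start


def train_then_finish(run: Any, train_fn: Callable[[Any], None]) -> float:
    """Run `train_fn`, then always finish the run, even if training raised."""
    try:
        train_fn(run)
    finally:
        duration = timed_finish(run)
        print(f"wandb finalization took {duration:.1f}s")
    return duration
```

For example, call train_then_finish(wandb.init(project="my-project"), my_train), where my_train is your own training function and the project name is a placeholder. If the reported duration stays long from run to run, a backlog is building up, and reducing logging frequency is the first thing to try.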
Diagnosing with debug logs
Enable debug logging to see where time is spent, then inspect wandb/debug.log and wandb/debug-internal.log.
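Here is a minimal sketch for collecting those logs from Python. It assumes the default ./wandb directory and assumes the SDK honors a WANDB_DEBUG environment variable for verbose output. Check the W&B environment-variable reference for your version. enable_debug_logging and tail_debug_logs are hypothetical helpers.

```python
import os
from collections import deque
from pathlib import Path
from typing import Dict, List


def enable_debug_logging() -> None:
    """Turn on verbose W&B logging; call before wandb.init().

    Assumption: the SDK reads the WANDB_DEBUG environment variable at startup.
    """
    os.environ["WANDB_DEBUG"] = "true"


def tail_debug_logs(wandb_dir: str = "wandb", n: int = 20) -> Dict[str, List[str]]:
    """Return the last `n` lines of debug.log and debug-internal.log, if present."""
    tails: Dict[str, List[str]] = {}
    for name in ("debug.log", "debug-internal.log"):
        path = Path(wandb_dir) / name
        if path.is_file():  # follows symlinks to the latest run's logs
            with path.open(encoding="utf-8", errors="replace") as f:
                # deque keeps only the final n lines in memory while streaming the file
                tails[name] = [line.rstrip("\n") for line in deque(f, maxlen=n)]
    return tails
```

Call enable_debug_logging() before wandb.init() and reproduce the slow step. Then print the output of tail_debug_logs() and compare timestamps around the point where the run stalled.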
For more information, see Experiments limits and performance and How do I deal with network issues?.