Why is my W&B run slow to initialize or upload? - Weights & Biases Documentation

Slow wandb.init() or sluggish metric uploads are usually caused by network latency, large media payloads, high logging frequency, or slow startup of the W&B service process.

Slow wandb.init()

wandb.init() contacts the W&B API to create the run and verify credentials. If it hangs for more than a few seconds:

Check connectivity: Run curl -I https://api.wandb.ai to confirm your machine can reach the W&B API. Firewall rules or proxy configurations on clusters are a common cause.
Increase the init timeout: If the connection is intermittent, give wandb.init() more time before it gives up:
import os
os.environ["WANDB_INIT_TIMEOUT"] = "120"  # seconds
Use offline mode during testing: If you do not need live syncing while iterating, run offline and sync later. Replace [TIMESTAMP] and [ID] with your run’s timestamp and ID:
WANDB_MODE=offline python train.py
wandb sync wandb/offline-run-[TIMESTAMP]-[ID]
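The connectivity check above can also be run from Python, which is convenient on clusters where curl is not installed. The following is a minimal sketch using only the standard library; the function name check_reachable and its opener parameter are illustrative assumptions, not part of the wandb package.

```python
import time
import urllib.error
import urllib.request


def check_reachable(url="https://api.wandb.ai", timeout=10.0,
                    opener=urllib.request.urlopen):
    """Return (reachable, elapsed_seconds) for a single HEAD request to `url`.

    `opener` is a hypothetical hook that lets you swap in a fake for testing.
    """
    request = urllib.request.Request(url, method="HEAD")
    start = time.perf_counter()
    try:
        response = opener(request, timeout=timeout)
        response.close()
        reachable = True
    except urllib.error.HTTPError:
        # The server answered, even with an error status, so the network path works.
        reachable = True
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, or a blocking proxy/firewall.
        reachable = False
    return reachable, time.perf_counter() - start
```

Calling check_reachable() before wandb.init() gives a rough signal: if it returns False, or the elapsed time is close to the timeout, raising WANDB_INIT_TIMEOUT or running offline is likely to help.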

Slow metric uploads during training

W&B uploads metrics asynchronously in background threads so your training loop is not blocked. Uploads can fall behind when:

You log too frequently: Calling wandb.log() every step on a fast GPU can generate more data than the background threads can upload. Log every N steps instead:
if step % 50 == 0:
    wandb.log({"loss": loss}, step=step)
You log large media on every step: wandb.Image, wandb.Table, and wandb.Video objects are significantly larger than scalar metrics. Log rich media every epoch or every N steps rather than every step.
You hit rate limits: If you see the 429 Rate limit exceeded error, see How do I fix rate limit exceeded errors?
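The first two points can be combined in a small wrapper that logs scalars every N steps and media on a sparser schedule. This is a sketch under stated assumptions: ThrottledLogger, scalar_every, media_every, and make_media are hypothetical names, and log_fn is expected to be wandb.log or any callable that accepts the same (data, step=...) arguments.

```python
class ThrottledLogger:
    """Forward scalars every `scalar_every` steps and media every `media_every` steps.

    `media_every` should be a multiple of `scalar_every`, because media is only
    attached on steps where scalars are logged.
    """

    def __init__(self, log_fn, scalar_every=50, media_every=1000):
        self.log_fn = log_fn  # for example, wandb.log
        self.scalar_every = scalar_every
        self.media_every = media_every

    def log(self, step, scalars, make_media=None):
        """Log on scheduled steps only; return True if anything was sent."""
        if step % self.scalar_every != 0:
            return False
        payload = dict(scalars)
        if make_media is not None and step % self.media_every == 0:
            # Media is built lazily, so skipped steps pay no encoding cost.
            payload.update(make_media())
        self.log_fn(payload, step=step)
        return True
```

In a training loop this might be used as logger = ThrottledLogger(wandb.log), followed by logger.log(step, {"loss": loss}, make_media=...) on every step. Passing a callable for the media, rather than a prebuilt wandb.Image, avoids constructing large objects on steps that are never uploaded.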

Run finalization is slow

After your script calls wandb.finish() (or exits), W&B flushes any remaining buffered data. This can take time if a large backlog built up during training. Keep logging frequency reasonable throughout training rather than batching everything at the end.

Diagnosing with debug logs

Enable debug logging to see where time is spent:

WANDB_DEBUG=true python train.py

This writes detailed timing information to wandb/debug.log and wandb/debug-internal.log. For more information, see Experiments limits and performance and How do I deal with network issues?.
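As a rough complement to the debug logs, you can measure wall-clock time around wandb.init() and wandb.finish() yourself to see which phase dominates. The helper below is a minimal sketch; the name timed and the results dictionary are illustrative assumptions, not a W&B API.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the wall-clock duration of the enclosed block in `results[label]`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Stored even if the block raises, so a failed init is still measured.
        results[label] = time.perf_counter() - start


# Usage sketch (requires wandb; the project name is a placeholder):
# timings = {}
# with timed("init", timings):
#     run = wandb.init(project="my-project")
# ... training loop ...
# with timed("finish", timings):
#     wandb.finish()
# print(timings)
```

A large "init" value points toward connectivity or service startup, while a large "finish" value suggests a backlog of buffered data from training.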
