It is extremely important to us that we never interfere with your training runs. We run wandb in a separate process to make sure that if wandb somehow crashes, your training will continue to run. If the internet goes out, wandb will continue to retry sending data to wandb.ai.
This is likely a connection problem. If your server loses internet access and data stops syncing to W&B, we mark the run as crashed after a short period of retrying.
"Is the logging function lazy? I don't want to be dependent on the network to send the results to your servers and then carry on with my local operations."
wandb.log writes a line to a local file; it does not block on any network calls. When you call wandb.init, we launch a new process on the same machine that listens for filesystem changes and talks to our web service asynchronously from your training process.
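To make the decoupling concrete, here is a minimal sketch of the general pattern: the training side only appends to a local file, and a separate reader picks up new lines whenever it gets to them. This is an illustration, not wandb's actual implementation. The function names, the JSON-lines format, and the file name are all assumptions, and the reader is called in the same process here purely to keep the example short.

```python
import json
import os
import tempfile


def log_metrics(path, metrics):
    """Training side: append one JSON line to a local file. No network I/O."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(metrics) + "\n")


def read_new_records(path, offset):
    """Syncing side: return records written since `offset`, plus the new offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Only consume complete lines; a partially written line waits for the next pass.
    end = data.rfind(b"\n") + 1
    records = [json.loads(line) for line in data[:end].splitlines()]
    return records, offset + end


# Hypothetical local history file standing in for the run's on-disk log.
history_path = os.path.join(tempfile.mkdtemp(), "history.jsonl")

# The training loop returns immediately after each local write.
for step in range(3):
    log_metrics(history_path, {"step": step, "loss": 1.0 / (step + 1)})

# A background syncer would call this periodically and upload what it finds.
pending, offset = read_new_records(history_path, 0)
print(len(pending), "records ready to upload")
```

Because the training loop never waits on the reader, a slow or unavailable network only delays the upload, not the training step.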
os.environ["WANDB_SILENT"] = "true"
Press Ctrl+D on your keyboard to stop a script that is instrumented with wandb.
If you're seeing SSL or network errors:
wandb: Network error (ConnectionError), entering retry loop.
You can try a couple of different approaches to solving this issue:
1. Upgrade your SSL certificate. If you're running the script on an Ubuntu server, run update-ca-certificates. We can't sync training logs without a valid SSL certificate because it's a security vulnerability.
2. SSL CERTIFICATE_VERIFY_FAILED: this error could be due to your company's firewall. You can set up local CAs and then use:
If our library is unable to connect to the internet it will enter a retry loop and keep attempting to stream metrics until the network is restored. During this time your program is able to continue running.
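The retry behavior described above can be sketched roughly as follows. This is not wandb's actual code: the function name, the exponential backoff schedule, and the delay limits are assumptions chosen for illustration. In wandb the retrying happens in the background process, which is why your training program is not held up by it.

```python
import time


def send_with_retry(send, payload, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call send(payload) until it succeeds, waiting longer after each failure."""
    attempt = 0
    while True:
        try:
            return send(payload)
        except ConnectionError:
            attempt += 1
            # Exponential backoff, capped so the wait never grows unbounded.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay)


# Hypothetical sender that fails twice before the "network" comes back.
calls = {"n": 0}


def flaky_send(payload):
    calls["n"] += 1
    if calls["n"] <= 2:
        raise ConnectionError("network down")
    return "ok"


# Record the delays instead of actually sleeping, so the demo runs instantly.
delays = []
result = send_with_retry(flaky_send, {"loss": 0.5}, sleep=delays.append)
print(result, delays)
```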
If you need to run on a machine without internet, you can set WANDB_MODE=offline to only have metrics stored locally on your hard drive. Later you can call wandb sync DIRECTORY to have the data streamed to our server.
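If you prefer to set offline mode from inside a Python script rather than in the shell, a minimal sketch looks like this. The project name and the metric values are hypothetical, and the train function is only an example of where the wandb calls would go. The one point that matters is that the variable is set before wandb.init is called.

```python
import os

# Set this before wandb.init so the run is stored locally only.
os.environ["WANDB_MODE"] = "offline"


def train():
    import wandb  # imported here so the environment variable is already set

    run = wandb.init(project="offline-demo")  # hypothetical project name
    for step in range(10):
        wandb.log({"step": step, "loss": 1.0 / (step + 1)})
    run.finish()
```

Once the machine has internet access again, running wandb sync DIRECTORY from the shell uploads the locally stored run.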