Looking for the gory details on how the wandb library, CLI, and UI tools work? You want the Reference documentation.
Need to track large-scale ML experiments distributed across multiple GPUs and multiple nodes? Then check out our guide to Distributed Training. For some approaches to distributed training and cross-validation, you also need to combine multiple runs together into a single experiment, as described in our guide on how to Group Runs.
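As a rough illustration of grouping, the sketch below has each training process start its own run with a shared `group` value, so the runs appear together as one experiment. The project name, the `job_type` value, the run-naming scheme, and the helper names are hypothetical; `RANK` is assumed to be set by a launcher such as `torchrun`.

```python
import os


def init_kwargs_for_rank(experiment: str, rank: int) -> dict:
    """Build wandb.init keyword arguments so every process joins one group."""
    return {
        "project": "distributed-demo",       # hypothetical project name
        "group": experiment,                 # runs sharing this value are grouped together
        "job_type": "worker",                # hypothetical label for this kind of process
        "name": f"{experiment}-rank{rank}",  # hypothetical naming scheme
    }


def start_run(experiment: str):
    """Start one run per process; call this once in each worker."""
    import wandb  # imported lazily so the helper above works without wandb installed

    rank = int(os.environ.get("RANK", "0"))  # falls back to 0 for single-process runs
    return wandb.init(**init_kwargs_for_rank(experiment, rank))
```

Calling `start_run("resnet-sweep-01")` in every worker would then produce one run per process, all filed under the same group.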
At Weights & Biases, we're all about preventing you from losing any of your work. If you're using pre-emptible compute or your machine crashes, we'll help you Resume Runs where you left off. If you're in danger of losing valuable data, wandb can even Save & Restore Files.
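One possible way to make a script resumable is to persist a run ID next to your checkpoints and pass it back to `wandb.init` with `resume="allow"`. This is a minimal sketch under assumptions: the ID file location, the project name, and both helper names are made up for illustration, and the ID is generated with `uuid` rather than any wandb utility.

```python
import uuid
from pathlib import Path


def load_or_create_run_id(path: Path) -> str:
    """Return the run ID stored at `path`, creating and saving one if absent."""
    if path.exists():
        return path.read_text().strip()
    run_id = uuid.uuid4().hex[:8]  # short random ID; must be unique within the project
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(run_id)
    return run_id


def start_or_resume(id_file: Path):
    """Start a new run, or continue the earlier one if the ID file already exists."""
    import wandb  # imported lazily so the helper above works without wandb installed

    run_id = load_or_create_run_id(id_file)
    # resume="allow" continues the run with this ID if it exists, else starts a new one
    return wandb.init(project="resume-demo", id=run_id, resume="allow")
```

After a crash or preemption, rerunning the same script with the same ID file would attach to the earlier run instead of creating a new one.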
Tired of wondering whether training has finished or, worse, crashed? Set up Alerts to Slack or your e-mail, with configurable triggers right in your Python code.
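A trigger can be as simple as a condition checked inside the training loop. The sketch below assumes a run has already been started with `wandb.init` and that alerts are enabled in your account settings; the divergence threshold and the helper names are hypothetical.

```python
import math


def loss_diverged(loss: float, threshold: float = 10.0) -> bool:
    """Hypothetical trigger: the loss is NaN or has exceeded a chosen threshold."""
    return math.isnan(loss) or loss > threshold


def alert_if_diverged(loss: float, step: int) -> bool:
    """Send an alert when the trigger fires; returns True if an alert was sent."""
    if not loss_diverged(loss):
        return False
    import wandb  # imported lazily; requires an active run created by wandb.init

    wandb.alert(
        title="Loss diverged",
        text=f"Loss is {loss} at step {step}",
        level=wandb.AlertLevel.WARN,
    )
    return True
```

Calling `alert_if_diverged(loss, step)` once per logging step keeps the trigger logic in plain Python, where it is easy to adjust.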
You can control wandb's behavior without changing your code by setting variables in your shell or job configuration, as described in our guide to Environment Variables.
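These variables are usually exported in the shell before launching a script, but they can also be set from Python before `wandb.init` is called. The following is a small sketch; the project name and the helper name are hypothetical, and the three variables shown are only a sample of what the guide covers.

```python
import os


def configure_wandb_env(offline: bool = False) -> dict:
    """Set a few wandb environment variables; call this before wandb.init."""
    settings = {
        "WANDB_PROJECT": "env-demo",                       # hypothetical project name
        "WANDB_MODE": "offline" if offline else "online",  # offline keeps data local
        "WANDB_SILENT": "true",                            # suppress wandb console output
    }
    os.environ.update(settings)
    return settings
```

Because the values live in the environment, the same training script can run online on a workstation and offline on an air-gapped cluster without edits.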