Original training script
Suppose you have a Python script that trains a model (see below). Your goal is to find the hyperparameters that maximize the validation accuracy (val_acc).
In your Python script, you define two functions: train_one_epoch and evaluate_one_epoch. The train_one_epoch function simulates training for one epoch and returns the training accuracy and loss. The evaluate_one_epoch function simulates evaluating the model on the validation data set and returns the validation accuracy and loss.
You define a configuration dictionary (config) that contains hyperparameter values such as the learning rate (lr), batch size (batch_size), and number of epochs (epochs). The values in the configuration dictionary control the training process.
Next, you define a function called main that mimics a typical training loop. For each epoch, the accuracy and loss are computed on the training and validation data sets.
This code is a mock training script. It does not train a model, but simulates the training process by generating random accuracy and loss values. The purpose of this code is to demonstrate how to integrate W&B into your training script.
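A minimal sketch consistent with that description might look like the following. The exact random formulas, signatures, and default config values are illustrative, not the tutorial's exact script:

```python
import random


def train_one_epoch(epoch, lr, batch_size):
    """Simulate training for one epoch; return training accuracy and loss."""
    acc = 0.25 + (epoch / 30) + (random.random() / 10)
    loss = 0.2 + (1 - (epoch - 1) / 10) + (random.random() / 5)
    return acc, loss


def evaluate_one_epoch(epoch):
    """Simulate evaluating on the validation set; return accuracy and loss."""
    acc = 0.1 + (epoch / 20) + (random.random() / 10)
    loss = 0.25 + (1 - (epoch - 1) / 10) + (random.random() / 6)
    return acc, loss


# Hyperparameter values that control the training process.
config = {"lr": 0.0001, "batch_size": 16, "epochs": 5}


def main():
    # A typical training loop: for each epoch, compute accuracy and
    # loss on the training and validation data sets.
    for epoch in range(1, config["epochs"] + 1):
        train_acc, train_loss = train_one_epoch(
            epoch, config["lr"], config["batch_size"]
        )
        val_acc, val_loss = evaluate_one_epoch(epoch)
        print(f"epoch={epoch} train_acc={train_acc:.3f} val_acc={val_acc:.3f}")


if __name__ == "__main__":
    main()
```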
Add W&B to your training script
Update your training script to include W&B. How you integrate W&B into your Python script or notebook depends on how you manage sweeps. You can use the W&B Python SDK to start, stop, and manage sweeps, or you can use the W&B CLI. The following steps use the W&B CLI.
Create a YAML configuration file with your sweep configuration. The configuration file contains the hyperparameters you want the sweep to explore. In the following example, the batch size (batch_size), epochs (epochs), and learning rate (lr) hyperparameters are varied during each sweep. You must provide the name of your Python script for the program key in your YAML file. For more information on how to create a W&B Sweep configuration, see Define sweep configuration.
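A configuration file along these lines matches that description. The file name (config.yaml), the script name (train.py), the search method, and the value ranges below are placeholders, not the tutorial's exact file:

```yaml
program: train.py
method: bayes
metric:
  goal: maximize
  name: val_acc
parameters:
  batch_size:
    values: [16, 32, 64]
  epochs:
    values: [5, 10, 15]
  lr:
    min: 0.0001
    max: 0.1
```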
Next, add the following to the code example, as shown in the sketch after this list:
- Import the W&B Python SDK (wandb) and PyYAML (yaml). PyYAML is used to read in the YAML configuration file.
- Read in the configuration file.
- Use wandb.init() to start a background process to sync and log data as a W&B Run. Pass the config object to the config parameter.
- Define hyperparameter values from wandb.Run.config instead of using hard coded values.
- Log the metric you want to optimize with wandb.Run.log(). You must log the metric defined in your configuration. In this example, the sweep configuration defines the sweep to maximize the val_acc value.
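Putting those changes together, the updated main function might look like the following sketch. It assumes the train_one_epoch and evaluate_one_epoch functions from the mock script above, unchanged, and the config.yaml file from the previous step:

```python
import wandb
import yaml

# train_one_epoch and evaluate_one_epoch are unchanged
# from the mock training script above.


def main():
    # Read in the sweep configuration file.
    with open("config.yaml") as file:
        config = yaml.safe_load(file)

    # Start a background process to sync and log data as a W&B Run,
    # passing the config object to the config parameter.
    run = wandb.init(config=config)

    # Define hyperparameter values from run.config
    # instead of using hard coded values.
    lr = run.config["lr"]
    batch_size = run.config["batch_size"]
    epochs = run.config["epochs"]

    for epoch in range(1, epochs + 1):
        train_acc, train_loss = train_one_epoch(epoch, lr, batch_size)
        val_acc, val_loss = evaluate_one_epoch(epoch)

        # Log the metric the sweep optimizes (val_acc) at the top level.
        run.log({
            "epoch": epoch,
            "train_acc": train_acc,
            "train_loss": train_loss,
            "val_acc": val_acc,
            "val_loss": val_loss,
        })


if __name__ == "__main__":
    main()
```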
In your CLI, set a maximum number of runs for the sweep agent to try. This is optional; in this example we set the maximum number to 5.

Next, initialize the sweep with the wandb sweep command. Provide the name of the YAML file. Optionally provide the name of the project for the project flag (--project). This returns a sweep ID. For more information on how to initialize sweeps, see Initialize sweeps.

Copy the sweep ID and replace sweepID in the following code snippet to start the sweep job with the wandb agent command. For more information, see Start sweep jobs.
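The commands might look like this; the project name (my-first-sweep) and file name (config.yaml) are placeholders, and sweepID must be replaced with the ID printed by wandb sweep:

```bash
# Optional: cap the number of runs the sweep agent tries.
NUM=5

# Initialize the sweep from the YAML file; --project is optional.
# This command prints a sweep ID.
wandb sweep --project my-first-sweep config.yaml

# Start the sweep job. Replace sweepID with the ID printed above.
wandb agent --count $NUM sweepID
```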
Logging metrics to W&B in a sweep

You must log the metric you define and are optimizing for in both your sweep configuration and with wandb.Run.log(). For example, if you define the metric to optimize as val_acc within your sweep configuration, you must also log val_acc to W&B. If you do not log the metric, W&B does not know what to optimize for.

The following is an incorrect example of logging the metric to W&B: the metric that is optimized for in the sweep configuration is val_acc, but the code logs val_acc within a nested dictionary under the key validation. You must log the metric directly, not within a nested dictionary.
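A sketch of that mistake, with the fix for contrast; the variable names are assumed from the training loop above:

```python
# Incorrect: the sweep optimizes "val_acc", but here it is nested
# under the key "validation", so W&B cannot find it.
run.log({"validation": {"val_acc": val_acc, "val_loss": val_loss}})

# Correct: log the metric directly at the top level.
run.log({"val_acc": val_acc, "val_loss": val_loss})
```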