Use W&B Sweeps to automate hyperparameter search and visualize rich, interactive experiment tracking. Pick from popular search methods such as Bayesian, grid search, and random search to explore the hyperparameter space. Scale and parallelize sweeps across one or more machines.
The preceding code snippet, and the Colab linked on this page, show how to initialize and create a sweep with the W&B CLI. See the Sweeps Walkthrough for a step-by-step outline of the W&B Python SDK commands to use to define a sweep configuration, initialize a sweep, and start a sweep.
How to get started
Depending on your use case, explore the following resources to get started with W&B Sweeps:
Read through the sweeps walkthrough for a step-by-step outline of the W&B Python SDK commands to use to define a sweep configuration, initialize a sweep, and start a sweep.
The following sections break down and explain each step in the code sample.
Set up your training code
Define a training function that takes in hyperparameter values from wandb.config and uses them to train a model and return metrics.
Optionally provide the name of the project where you want the output of the W&B Run to be stored (project parameter in wandb.init). If the project is not specified, the run is put in an “Uncategorized” project.
Both the sweep and the run must be in the same project. Therefore, the name you provide when you initialize W&B must match the name of the project you provide when you initialize a sweep.
Define the search space with a sweep configuration
Within a dictionary, specify the hyperparameters you want to sweep over. For more information about configuration options, see Define sweep configuration.
The following example demonstrates a sweep configuration that uses random search ('method':'random'). The sweep randomly selects values listed in the configuration for the batch size, epochs, and learning rate.
Throughout the sweep, W&B optimizes the metric specified in the metric key. In the following example, W&B maximizes ('goal':'maximize') the validation accuracy ('val_acc').
W&B uses a Sweep Controller to manage sweeps in the cloud (standard) or locally (local) across one or more machines. For more information about Sweep Controllers, see Search and stop algorithms locally.
A sweep identification number is returned when you initialize a sweep:
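For example, a minimal sketch (assuming a sweep configuration dictionary named sweep_configuration already exists; the project name is illustrative):

import wandb

# `wandb.sweep` registers the sweep with W&B and returns its identifier.
sweep_id = wandb.sweep(sweep=sweep_configuration, project="my-first-sweep")
print(sweep_id)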
From the terminal, press Ctrl+C to stop the run that the Sweep agent is currently executing. Once the run stops, press Ctrl+C again to kill the agent.
2 - Add W&B (wandb) to your code
Add W&B to your Python code script or Jupyter Notebook.
There are numerous ways to add the W&B Python SDK to your script or Jupyter Notebook. Outlined below is a “best practice” example of how to integrate the W&B Python SDK into your own code.
Original training script
Suppose you have the following code in a Jupyter Notebook cell or Python script. We define a function called main that mimics a typical training loop. For each epoch, the accuracy and loss are computed on the training and validation data sets. The values are randomly generated for the purpose of this example.
We defined a dictionary called config where we store hyperparameter values (line 15). At the end of the cell, we call the main function to execute the mock training code.
# train.py
import random
import numpy as np

def train_one_epoch(epoch, lr, bs):
    acc = 0.25 + ((epoch / 30) + (random.random() / 10))
    loss = 0.2 + (1 - ((epoch - 1) / 10 + random.random() / 5))
    return acc, loss

def evaluate_one_epoch(epoch):
    acc = 0.1 + ((epoch / 20) + (random.random() / 10))
    loss = 0.25 + (1 - ((epoch - 1) / 10 + random.random() / 6))
    return acc, loss

config = {"lr": 0.0001, "bs": 16, "epochs": 5}

def main():
    # Note that we define values from the `config` dictionary
    # instead of hard-coding them
    lr = config["lr"]
    bs = config["bs"]
    epochs = config["epochs"]

    for epoch in np.arange(1, epochs):
        train_acc, train_loss = train_one_epoch(epoch, lr, bs)
        val_acc, val_loss = evaluate_one_epoch(epoch)
        print("epoch: ", epoch)
        print("training accuracy:", train_acc, "training loss:", train_loss)
        print("validation accuracy:", val_acc, "validation loss:", val_loss)

# Call the main function.
main()
Training script with W&B Python SDK
The following code examples demonstrate how to add the W&B Python SDK into your code. If you start W&B Sweep jobs in the CLI, you will want to explore the CLI tab. If you start W&B Sweep jobs within a Jupyter notebook or Python script, explore the Python SDK tab.
To create a W&B Sweep, we added the following to the code example:
Line 1: Import the Weights & Biases Python SDK.
Line 6: Create a dictionary object where the key-value pairs define the sweep configuration. In the following example, the batch size (batch_size), epochs (epochs), and the learning rate (lr) hyperparameters are varied during each sweep. For more information on how to create a sweep configuration, see Define sweep configuration.
Line 19: Pass the sweep configuration dictionary to wandb.sweep. This initializes the sweep and returns a sweep ID (sweep_id). For more information on how to initialize sweeps, see Initialize sweeps.
Line 35: Use the wandb.init() API to generate a background process to sync and log data as a W&B Run.
Lines 39-41: (Optional) Define values from wandb.config instead of hard-coded values.
Line 46: Log the metric we want to optimize with wandb.log. You must log the metric defined in your configuration. Within the configuration dictionary (sweep_configuration in this example) we defined the sweep to maximize the val_acc value.
Line 57: Start the sweep with the wandb.agent API call. Provide the sweep ID (line 19), the name of the function the sweep will execute (function=main), and set the maximum number of runs to try to four (count=4). For more information on how to start W&B Sweep agents, see Start sweep agents.
import wandb
import numpy as np
import random

# Define sweep config
sweep_configuration = {
    "method": "random",
    "name": "sweep",
    "metric": {"goal": "maximize", "name": "val_acc"},
    "parameters": {
        "batch_size": {"values": [16, 32, 64]},
        "epochs": {"values": [5, 10, 15]},
        "lr": {"max": 0.1, "min": 0.0001},
    },
}

# Initialize sweep by passing in config.
# (Optional) Provide a name of the project.
sweep_id = wandb.sweep(sweep=sweep_configuration, project="my-first-sweep")

# Define training function that takes in hyperparameter
# values from `wandb.config` and uses them to train a
# model and return metrics
def train_one_epoch(epoch, lr, bs):
    acc = 0.25 + ((epoch / 30) + (random.random() / 10))
    loss = 0.2 + (1 - ((epoch - 1) / 10 + random.random() / 5))
    return acc, loss

def evaluate_one_epoch(epoch):
    acc = 0.1 + ((epoch / 20) + (random.random() / 10))
    loss = 0.25 + (1 - ((epoch - 1) / 10 + random.random() / 6))
    return acc, loss

def main():
    run = wandb.init()

    # Note that we define values from `wandb.config`
    # instead of defining hard-coded values
    lr = wandb.config.lr
    bs = wandb.config.batch_size
    epochs = wandb.config.epochs

    for epoch in np.arange(1, epochs):
        train_acc, train_loss = train_one_epoch(epoch, lr, bs)
        val_acc, val_loss = evaluate_one_epoch(epoch)
        wandb.log(
            {
                "epoch": epoch,
                "train_acc": train_acc,
                "train_loss": train_loss,
                "val_acc": val_acc,
                "val_loss": val_loss,
            }
        )

# Start sweep job.
wandb.agent(sweep_id, function=main, count=4)
To create a W&B Sweep, we first create a YAML configuration file. The configuration file contains the hyperparameters we want the sweep to explore. In the following example, the batch size (batch_size), epochs (epochs), and the learning rate (lr) hyperparameters are varied during each sweep.
Note that you must provide the name of your Python script for the program key in your YAML file.
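For example, a sketch of what such a config.yaml might look like (the method and values are illustrative; program must name your training script):

program: train.py
method: random
name: sweep
metric:
  goal: maximize
  name: val_acc
parameters:
  batch_size:
    values: [16, 32, 64]
  epochs:
    values: [5, 10, 15]
  lr:
    min: 0.0001
    max: 0.1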
Next, we add the following to the code example:
Lines 1-2: Import the Weights & Biases Python SDK (wandb) and PyYAML (yaml). PyYAML is used to read in our YAML configuration file.
Line 18: Read in the configuration file.
Line 21: Use the wandb.init() API to generate a background process to sync and log data as a W&B Run. We pass the config object to the config parameter.
Lines 25-27: Define hyperparameter values from wandb.config instead of using hard-coded values.
Lines 32-40: Log the metric we want to optimize with wandb.log. You must log the metric defined in your configuration. Within the configuration file (config.yaml in this example) we defined the sweep to maximize the val_acc value.
import wandb
import yaml
import random
import numpy as np

def train_one_epoch(epoch, lr, bs):
    acc = 0.25 + ((epoch / 30) + (random.random() / 10))
    loss = 0.2 + (1 - ((epoch - 1) / 10 + random.random() / 5))
    return acc, loss

def evaluate_one_epoch(epoch):
    acc = 0.1 + ((epoch / 20) + (random.random() / 10))
    loss = 0.25 + (1 - ((epoch - 1) / 10 + random.random() / 6))
    return acc, loss

def main():
    # Set up your default hyperparameters
    with open("./config.yaml") as file:
        config = yaml.load(file, Loader=yaml.FullLoader)

    run = wandb.init(config=config)

    # Note that we define values from `wandb.config`
    # instead of defining hard-coded values
    lr = wandb.config.lr
    bs = wandb.config.batch_size
    epochs = wandb.config.epochs

    for epoch in np.arange(1, epochs):
        train_acc, train_loss = train_one_epoch(epoch, lr, bs)
        val_acc, val_loss = evaluate_one_epoch(epoch)
        wandb.log(
            {
                "epoch": epoch,
                "train_acc": train_acc,
                "train_loss": train_loss,
                "val_acc": val_acc,
                "val_loss": val_loss,
            }
        )

# Call the main function.
main()
Navigate to your CLI. Within your CLI, set the maximum number of runs the sweep agent should try. This step is optional. In the following example we set the maximum number to five.
NUM=5
Next, initialize the sweep with the wandb sweep command. Provide the name of the YAML file. Optionally provide the name of the project for the project flag (--project):
wandb sweep --project sweep-demo-cli config.yaml
This returns a sweep ID. For more information on how to initialize sweeps, see Initialize sweeps.
Copy the sweep ID and replace sweepID in the following code snippet to start the sweep job with the wandb agent command:
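A sketch of the command (your-entity is a placeholder for your W&B entity):

wandb agent --count $NUM your-entity/sweep-demo-cli/sweepID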
For more information on how to start sweep jobs, see Start sweep jobs.
Considerations when logging metrics
Be sure to log the metric you specify in your sweep configuration explicitly to W&B. Do not log metrics for your sweep inside of a sub-directory.
For example, consider the following pseudocode. A user wants to log the validation loss ("val_loss": loss). First they pass the values into a dictionary. However, the dictionary passed to wandb.log does not explicitly access the key-value pair in the dictionary:
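A minimal sketch of this mistake (the metric values are mocked):

import wandb
import random

def train():
    # Mock a training step that produces validation metrics.
    loss = random.random()
    val_metrics = {"val_loss": loss, "val_acc": 1 - loss}
    return val_metrics

def main():
    wandb.init(project="my-first-sweep")
    val_metrics = train()
    # Incorrect: this logs the entire dictionary under "val_loss",
    # not the scalar value the sweep is optimizing.
    wandb.log({"val_loss": val_metrics})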
Instead, explicitly access the key-value pair within the Python dictionary. For example, after you create a dictionary, specify the key-value pair when you pass the dictionary to the wandb.log method:
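Continuing the sketch above, the corrected call logs the scalar value:

wandb.log({"val_loss": val_metrics["val_loss"]})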
3 - Define a sweep configuration
Learn how to create configuration files for sweeps.
A W&B Sweep combines a strategy for exploring hyperparameter values with the code that evaluates them. The strategy can be as simple as trying every option or as complex as Bayesian Optimization and Hyperband (BOHB).
Define a sweep configuration either in a Python dictionary or a YAML file. How you define your sweep configuration depends on how you want to manage your sweep.
Define your sweep configuration in a YAML file if you want to initialize a sweep and start a sweep agent from the command line. Define your sweep in a Python dictionary if you initialize a sweep and start a sweep entirely within a Python script or Jupyter notebook.
The following guide describes how to format your sweep configuration. See Sweep configuration options for a comprehensive list of top-level sweep configuration keys.
Basic structure
Both sweep configuration format options (YAML and Python dictionary) utilize key-value pairs and nested structures.
Use top-level keys within your sweep configuration to define qualities of your sweep search such as the name of the sweep (name key), the parameters to search through (parameters key), the methodology to search the parameter space (method key), and more.
For example, the following code snippets show the same sweep configuration defined within a YAML file and within a Python dictionary. Within the sweep configuration there are five top-level keys specified: program, name, method, metric, and parameters.
Define a sweep configuration in a YAML file if you want to manage sweeps interactively from the command line (CLI).
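For example, a sketch of such a configuration in YAML (the equivalent Python dictionary mirrors these key-value pairs; the names and values are illustrative):

program: train.py
name: sweepdemo
method: bayes
metric:
  goal: minimize
  name: validation_loss
parameters:
  learning_rate:
    min: 0.0001
    max: 0.1
  batch_size:
    values: [16, 32, 64]
  epoch:
    values: [5, 10, 15]
  optimizer:
    values: ["adam", "sgd"]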
Within the top level parameters key, the following keys are nested: learning_rate, batch_size, epoch, and optimizer. For each of the nested keys you specify, you can provide one or more values, a distribution, a probability, and more. For more information, see the parameters section in Sweep configuration options.
Double nested parameters
Sweep configurations support nested parameters. To delineate a nested parameter, use an additional parameters key under the top level parameter name. Sweep configs support multi-level nesting.
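For example, a hypothetical sketch where hyperparameters are nested under an optimizer parameter group:

parameters:
  optimizer:
    parameters:
      learning_rate:
        values: [0.01, 0.001]
      momentum:
        value: 0.9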
Specify a probability distribution for your random variables if you use a Bayesian or random hyperparameter search (see the sketch after this list). For each hyperparameter:
Create a top-level parameters key in your sweep config.
Within the parameters key, nest the following:
Specify the name of the hyperparameter you want to optimize.
Specify the distribution you want to use for the distribution key. Nest the distribution key-value pair underneath the hyperparameter name.
Specify one or more values to explore. The value (or values) should be in line with the distribution key.
(Optional) Use an additional parameters key under the top level parameter name to delineate a nested parameter.
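A sketch with illustrative distributions:

parameters:
  learning_rate:
    distribution: uniform
    min: 0.0001
    max: 0.1
  optimizer:
    distribution: categorical
    values: ["adam", "sgd"]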
Nested parameters defined in sweep configuration overwrite keys specified in a W&B run configuration.
For example, suppose you initialize a W&B run in a train.py Python script with a configuration that contains a nested key (nested_param.manual_key). Next, you define a sweep configuration in a dictionary called sweep_configuration. You then pass the sweep config dictionary to wandb.sweep to initialize the sweep.
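A minimal sketch of this scenario (the project name and parameter values are illustrative):

import wandb

# Initialize a run with a manually set nested config key.
run = wandb.init(config={"nested_param": {"manual_key": 1}})

# Define a sweep configuration with a nested parameter.
sweep_configuration = {
    "method": "random",
    "metric": {"goal": "maximize", "name": "val_acc"},
    "parameters": {
        "nested_param": {
            "parameters": {
                "learning_rate": {"values": [0.01, 0.001]},
            },
        },
    },
}

# Pass the sweep config dictionary to initialize the sweep.
sweep_id = wandb.sweep(sweep=sweep_configuration, project="<project>")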
The nested_param.manual_key that is passed when the W&B run is initialized is not accessible. The run.config only possesses the key-value pairs that are defined in the sweep configuration dictionary.
Sweep configuration template
The following template shows how you can configure parameters and specify search constraints. Replace hyperparameter_name with the name of your hyperparameter and any values enclosed in <>.
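One possible shape of such a template (every <...> placeholder and hyperparameter_name is yours to fill in):

program: <training_script>
method: <search_method>
metric:
  name: <metric_to_optimize>
  goal: <minimize_or_maximize>
parameters:
  hyperparameter_name0:
    value: <value>
  hyperparameter_name1:
    values: [<value_1>, <value_2>, <value_3>]
  hyperparameter_name2:
    distribution: <distribution>
    min: <min>
    max: <max>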
The following tabs show how to specify common command macros:
Remove the ${interpreter} macro and provide a value explicitly to hardcode the Python interpreter. For example, the following code snippet demonstrates how to do this:
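A sketch (python3 stands in for whichever interpreter you want to pin):

command:
  - ${env}
  - python3
  - ${program}
  - ${args}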
If your program does not use argument parsing, you can avoid passing arguments altogether and take advantage of wandb.init picking up sweep parameters into wandb.config automatically:
command:
  - ${env}
  - ${interpreter}
  - ${program}
You can change the command to pass arguments the way tools like Hydra expect. See Hydra with W&B for more information.
A sweep configuration consists of nested key-value pairs. Use top-level keys within your sweep configuration to define qualities of your sweep search such as the parameters to search through (parameters key), the methodology to search the parameter space (method key), and more.
The following table lists top-level sweep configuration keys and a brief description. See the respective sections for more information about each key.
Key
Description
program
(required) Training script to run.
method
(required) The hyperparameter search strategy.
parameters
(required) The hyperparameters to search, with their search constraints.
metric
The name, goal, and target value of the metric to optimize.
name
The name of the sweep.
early_terminate
Early stopping criteria for poorly performing runs.
command
Command structure for invoking and passing arguments to the training script
run_cap
Maximum number of runs for this sweep
See the Sweep configuration structure for more information on how to structure your sweep configuration.
metric
Use the metric top-level sweep configuration key to specify the name, the goal, and the target value of the metric to optimize.
Key
Description
name
Name of the metric to optimize.
goal
Either minimize or maximize (Default is minimize).
target
Goal value for the metric you are optimizing. The sweep does not create new runs if or when a run reaches the target value that you specify. Active agents that have a run executing when the run reaches the target wait until the run completes before the agent stops creating new runs.
parameters
In your YAML file or Python script, specify parameters as a top level key. Within the parameters key, provide the name of a hyperparameter you want to optimize. Common hyperparameters include: learning rate, batch size, epochs, optimizers, and more. For each hyperparameter you define in your sweep configuration, specify one or more search constraints.
The following table shows supported hyperparameter search constraints. Based on your hyperparameter and use case, use one of the search constraints below to tell your sweep agent where (in the case of a distribution) or what (value, values, and so forth) to search or use.
Search constraint
Description
values
Specifies all valid values for this hyperparameter. Compatible with grid.
value
Specifies the single valid value for this hyperparameter. Compatible with grid.
distribution
Specify a probability distribution. See the note following this table for information on default values.
probabilities
Specify the probability of selecting each element of values when using random.
min, max
(int or float) Maximum and minimum values. If int, for int_uniform-distributed hyperparameters. If float, for uniform-distributed hyperparameters.
mu
(float) Mean parameter for normal- or lognormal-distributed hyperparameters.
sigma
(float) Standard deviation parameter for normal- or lognormal-distributed hyperparameters.
q
(float) Quantization step size for quantized hyperparameters.
parameters
Nest other parameters inside a root level parameter.
If a distribution is not specified, W&B sets one based on the following conditions:
categorical if you specify values
int_uniform if you specify max and min as integers
uniform if you specify max and min as floats
constant if you specify a value
method
Specify the hyperparameter search strategy with the method key. There are three hyperparameter search strategies to choose from: grid, random, and Bayesian search.
Grid search
Iterate over every combination of hyperparameter values. Grid search makes uninformed decisions on the set of hyperparameter values to use on each iteration. Grid search can be computationally costly.
Grid search executes forever if it is searching within a continuous search space.
Random search
Choose a random, uninformed set of hyperparameter values on each iteration based on a distribution. Random search runs forever unless you stop the process from the command line, within your Python script, or from the W&B App UI.
If you choose random search (method: random), specify a distribution for each hyperparameter with the distribution key nested within the parameters key.
Bayesian search
In contrast to random and grid search, Bayesian models make informed decisions. Bayesian optimization uses a probabilistic model to decide which values to use through an iterative process of testing values on a surrogate function before evaluating the objective function. Bayesian search works well for small numbers of continuous parameters but scales poorly. For more information about Bayesian search, see the Bayesian Optimization Primer paper.
Bayesian search runs forever unless you stop the process from the command line, within your Python script, or from the W&B App UI.
Distribution options for random and Bayesian search
Within the parameters key, nest the name of the hyperparameter. Next, specify a distribution for the distribution key.
The following table lists the distributions W&B supports.
Value for distribution key
Description
constant
Constant distribution. Must specify the constant value (value) to use.
categorical
Categorical distribution. Must specify all valid values (values) for this hyperparameter.
int_uniform
Discrete uniform distribution on integers. Must specify max and min as integers.
uniform
Continuous uniform distribution. Must specify max and min as floats.
q_uniform
Quantized uniform distribution. Returns round(X / q) * q where X is uniform. q defaults to 1.
log_uniform
Log-uniform distribution. Returns a value X between exp(min) and exp(max) such that the natural logarithm is uniformly distributed between min and max.
log_uniform_values
Log-uniform distribution. Returns a value X between min and max such that log(X) is uniformly distributed between log(min) and log(max).
q_log_uniform
Quantized log uniform. Returns round(X / q) * q where X is log_uniform. q defaults to 1.
q_log_uniform_values
Quantized log uniform. Returns round(X / q) * q where X is log_uniform_values. q defaults to 1.
inv_log_uniform
Inverse log uniform distribution. Returns X, where log(1/X) is uniformly distributed between min and max.
inv_log_uniform_values
Inverse log uniform distribution. Returns X, where log(1/X) is uniformly distributed between log(1/max) and log(1/min).
normal
Normal distribution. Return value is normally distributed with mean mu (default 0) and standard deviation sigma (default 1).
q_normal
Quantized normal distribution. Returns round(X / q) * q where X is normal. q defaults to 1.
log_normal
Log normal distribution. Returns a value X such that the natural logarithm log(X) is normally distributed with mean mu (default 0) and standard deviation sigma (default 1).
q_log_normal
Quantized log normal distribution. Returns round(X / q) * q where X is log_normal. q defaults to 1.
early_terminate
Use early termination (early_terminate) to stop poorly performing runs. If early termination occurs, W&B stops the current run before it creates a new run with a new set of hyperparameter values.
You must specify a stopping algorithm if you use early_terminate. Nest the type key within early_terminate within your sweep configuration.
Stopping algorithm
W&B currently supports the Hyperband stopping algorithm.
Hyperband hyperparameter optimization evaluates whether a program should stop or continue at one or more pre-set iteration counts, called brackets.
When a W&B run reaches a bracket, the sweep compares that run’s metric to all previously reported metric values. The sweep terminates the run if the run’s metric value is too high (when the goal is minimization) or if the run’s metric is too low (when the goal is maximization).
Brackets are based on the number of logged iterations. The number of brackets corresponds to the number of times you log the metric you are optimizing. The iterations can correspond to steps, epochs, or something in between. The numerical value of the step counter is not used in bracket calculations.
Specify either min_iter or max_iter to create a bracket schedule.
Key
Description
min_iter
Specify the iteration for the first bracket
max_iter
Specify the maximum number of iterations.
s
Specify the total number of brackets (required for max_iter)
eta
Specify the bracket multiplier schedule (default: 3).
strict
Enable ‘strict’ mode that prunes runs aggressively, more closely following the original Hyperband paper. Defaults to false.
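For example, a minimal early_terminate sketch using Hyperband (the min_iter value is illustrative):

early_terminate:
  type: hyperband
  min_iter: 3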
Hyperband checks which W&B runs to end once every few minutes. The end run timestamp might differ from the specified brackets if your runs or iterations are short.
command
Modify the format and contents of the command with nested values within the command key. You can directly include fixed components such as filenames.
On Unix systems, /usr/bin/env ensures that the OS chooses the correct Python interpreter based on the environment.
W&B supports the following macros for variable components of the command:
Command macro
Description
${env}
/usr/bin/env on Unix systems, omitted on Windows.
${interpreter}
Expands to python.
${program}
Training script filename specified by the sweep configuration program key.
${args}
Hyperparameters and their values in the form --param1=value1 --param2=value2.
${args_no_boolean_flags}
Hyperparameters and their values in the form --param1=value1 except boolean parameters are in the form --boolean_flag_param when True and omitted when False.
${args_no_hyphens}
Hyperparameters and their values in the form param1=value1 param2=value2.
${args_json}
Hyperparameters and their values encoded as JSON.
${args_json_file}
The path to a file containing the hyperparameters and their values encoded as JSON.
${envvar}
A way to pass environment variables. ${envvar:MYENVVAR} expands to the value of the MYENVVAR environment variable.
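For example, a sketch of a command that passes hyperparameters without hyphens (param1=value1 style) to a script that parses them itself:

command:
  - ${env}
  - ${interpreter}
  - ${program}
  - ${args_no_hyphens}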
4 - Initialize a sweep
Initialize a W&B Sweep
W&B uses a Sweep Controller to manage sweeps in the cloud (standard) or locally (local) across one or more machines. After a run completes, the sweep controller issues a new set of instructions describing a new run to execute. These instructions are picked up by agents who actually perform the runs. In a typical W&B Sweep, the controller lives on the W&B server. Agents live on your machines.
The following code snippets demonstrate how to initialize sweeps with the CLI and within a Jupyter Notebook or Python script.
Before you initialize a sweep, make sure you have a sweep configuration defined either in a YAML file or a nested Python dictionary object in your script. For more information see, Define sweep configuration.
Both the W&B Sweep and the W&B Run must be in the same project. Therefore, the name you provide when you initialize W&B (wandb.init) must match the name of the project you provide when you initialize a W&B Sweep (wandb.sweep).
Use the W&B SDK to initialize a sweep. Pass the sweep configuration dictionary to the sweep parameter. Optionally provide the name of the project for the project parameter (project) where you want the output of the W&B Run to be stored. If the project is not specified, the run is put in an “Uncategorized” project.
The wandb.sweep function returns the sweep ID. The sweep ID includes the entity name and the project name. Make a note of the sweep ID.
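A minimal sketch (assuming a sweep configuration dictionary named sweep_configuration already exists; the project name is illustrative):

import wandb

sweep_id = wandb.sweep(sweep=sweep_configuration, project="project-name")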
Use the W&B CLI to initialize a sweep. Provide the name of your configuration file. Optionally provide the name of the project for the project flag. If the project is not specified, the W&B Run is put in an “Uncategorized” project.
Use the wandb sweep command to initialize a sweep. The following code example initializes a sweep for a sweeps_demo project and uses a config.yaml file for the configuration.
wandb sweep --project sweeps_demo config.yaml
This command will print out a sweep ID. The sweep ID includes the entity name and the project name. Make a note of the sweep ID.
5 - Start or stop a sweep agent
Start or stop a W&B Sweep Agent on one or more machines.
Start a W&B Sweep on one or more agents on one or more machines. W&B Sweep agents query the W&B server for hyperparameters from the sweep you launched when you initialized the W&B Sweep (wandb sweep) and use them to run model training.
To start a W&B Sweep agent, provide the W&B Sweep ID that was returned when you initialized a W&B Sweep. The W&B Sweep ID has the form:
entity/project/sweep_ID
Where:
entity: Your W&B username or team name.
project: The name of the project where you want the output of the W&B Run to be stored. If the project is not specified, the run is put in an “Uncategorized” project.
sweep_ID: The pseudo random, unique ID generated by W&B.
Provide the name of the function the W&B Sweep will execute if you start a W&B Sweep agent within a Jupyter Notebook or Python script.
The following code snippets demonstrate how to start an agent with W&B. We assume you already have a configuration file and you have already initialized a W&B Sweep. For more information about how to define a configuration file, see Define sweep configuration.
Use the wandb agent command to start a sweep. Provide the sweep ID that was returned when you initialized the sweep. Copy and paste the code snippet below and replace sweep_id with your sweep ID:
wandb agent sweep_id
Use the W&B Python SDK library to start a sweep. Provide the sweep ID that was returned when you initialized the sweep. In addition, provide the name of the function the sweep will execute.
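A minimal sketch (assuming your training function is named main and the sweep ID is stored in sweep_id):

wandb.agent(sweep_id, function=main)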
Random and Bayesian searches will run forever. You must stop the process from the command line, within your Python script, or from the Sweeps UI.
Optionally specify the number of W&B Runs a Sweep agent should try. The following code snippets demonstrate how to set a maximum number of W&B Runs with the CLI and within a Jupyter Notebook or Python script.
First, initialize your sweep. For more information, see Initialize sweeps.
sweep_id = wandb.sweep(sweep_config)
Next, start the sweep job. Provide the sweep ID generated from sweep initialization. Pass an integer value to the count parameter to set the maximum number of runs to try.
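For example, a sketch capping the agent at five runs (main is a placeholder for your training function):

wandb.agent(sweep_id, function=main, count=5)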
If you start a new run after the sweep agent has finished, within the same script or notebook, then you should call wandb.teardown() before starting the new run.
6 - Parallelize agents
Parallelize W&B Sweep agents on a multi-core or multi-GPU machine.
Parallelize your W&B Sweep agents on a multi-core or multi-GPU machine. Before you get started, ensure you have initialized your W&B Sweep. For more information on how to initialize a W&B Sweep, see Initialize sweeps.
Parallelize on a multi-CPU machine
Depending on your use case, explore the following tabs to learn how to parallelize W&B Sweep agents using the CLI or within a Jupyter Notebook.
Use the wandb agent command to parallelize your W&B Sweep agent across multiple CPUs with the terminal. Provide the sweep ID that was returned when you initialized the sweep.
Open more than one terminal window on your local machine.
Copy and paste the code snippet below and replace sweep_id with your sweep ID:
wandb agent sweep_id
Use the W&B Python SDK library to parallelize your W&B Sweep agent across multiple CPUs within Jupyter Notebooks. Ensure you have the sweep ID that was returned when you initialized the sweep. In addition, provide the name of the function the sweep will execute for the function parameter:
Open more than one Jupyter Notebook.
Copy and paste the W&B Sweep ID across multiple Jupyter Notebooks to parallelize a W&B Sweep. For example, you can paste the following code snippet into multiple Jupyter notebooks to parallelize your sweep if you have the sweep ID stored in a variable called sweep_id and the name of the function is function_name:
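A sketch of the snippet to paste in each notebook:

wandb.agent(sweep_id=sweep_id, function=function_name)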
Follow the procedure outlined below to parallelize your W&B Sweep agent across multiple GPUs with a terminal, using the CUDA Toolkit:
Open more than one terminal window on your local machine.
Specify the GPU instance to use with CUDA_VISIBLE_DEVICES when you start a W&B Sweep job (wandb agent). Assign CUDA_VISIBLE_DEVICES an integer value corresponding to the GPU instance to use.
For example, suppose you have two NVIDIA GPUs on your local machine. Open a terminal window and set CUDA_VISIBLE_DEVICES to 0 (CUDA_VISIBLE_DEVICES=0). Replace sweep_ID in the following example with the W&B Sweep ID that was returned when you initialized the W&B Sweep:
Terminal 1
CUDA_VISIBLE_DEVICES=0 wandb agent sweep_ID
Open a second terminal window. Set CUDA_VISIBLE_DEVICES to 1 (CUDA_VISIBLE_DEVICES=1). Paste the same W&B Sweep ID for sweep_ID in the following code snippet:
Terminal 2
CUDA_VISIBLE_DEVICES=1 wandb agent sweep_ID
7 - Visualize sweep results
Visualize the results of your W&B Sweeps with the W&B App UI.
Visualize the results of your W&B Sweeps with the W&B App UI. Navigate to the W&B App UI at https://wandb.ai/home. Choose the project that you specified when you initialized a W&B Sweep. You will be redirected to your project workspace. Select the Sweep icon on the left panel (broom icon). From the Sweep UI, select the name of your Sweep from the list.
By default, W&B will automatically create a parallel coordinates plot, a parameter importance plot, and a scatter plot when you start a W&B Sweep job.
Parallel coordinates charts summarize the relationship between large numbers of hyperparameters and model metrics at a glance. For more information on parallel coordinates plots, see Parallel coordinates.
The scatter plot (left) compares the W&B Runs that were generated during the Sweep. For more information about scatter plots, see Scatter Plots.
The parameter importance plot (right) lists the hyperparameters that were the best predictors of, and highly correlated to, desirable values of your metrics. For more information about parameter importance plots, see Parameter Importance.
You can alter the dependent and independent values (x and y axes) that are automatically used. Within each panel there is a pencil icon called Edit panel. Choose Edit panel. A modal will appear. Within the modal, you can alter the behavior of the graph.
For more information on all default W&B visualization options, see Panels. See the Data Visualization docs for information on how to create plots from W&B Runs that are not part of a W&B Sweep.
8 - Manage sweeps with the CLI
Pause, resume, and cancel a W&B Sweep with the CLI.
Pause, resume, and cancel a W&B Sweep with the CLI. Pausing a W&B Sweep tells the W&B agent that new W&B Runs should not be executed until the Sweep is resumed. Resuming a Sweep tells the agent to continue executing new W&B Runs. Stopping a W&B Sweep tells the W&B Sweep agent to stop creating or executing new W&B Runs. Cancelling a W&B Sweep tells the Sweep agent to kill currently executing W&B Runs and stop executing new Runs.
In each case, provide the W&B Sweep ID that was generated when you initialized the W&B Sweep. Optionally open a new terminal window to execute the following commands. A new terminal window makes it easier to execute a command if a W&B Sweep is printing output statements to your current terminal window.
Use the following guidance to pause, resume, and cancel sweeps.
Pause sweeps
Pause a W&B Sweep so it temporarily stops executing new W&B Runs. Use the wandb sweep --pause command to pause a W&B Sweep. Provide the W&B Sweep ID that you want to pause.
wandb sweep --pause entity/project/sweep_ID
Resume sweeps
Resume a paused W&B Sweep with the wandb sweep --resume command. Provide the W&B Sweep ID that you want to resume:
wandb sweep --resume entity/project/sweep_ID
Stop sweeps
Stop a W&B Sweep to stop executing new W&B Runs and let currently executing Runs finish.
wandb sweep --stop entity/project/sweep_ID
Cancel sweeps
Cancel a sweep to kill all running runs and stop running new runs. Use the wandb sweep --cancel command to cancel a W&B Sweep. Provide the W&B Sweep ID that you want to cancel.
wandb sweep --cancel entity/project/sweep_ID
For a full list of CLI command options, see the wandb sweep CLI Reference Guide.
Pause, resume, stop, and cancel a sweep across multiple agents
Pause, resume, stop, or cancel a W&B Sweep across multiple agents from a single terminal. For example, suppose you have a multi-core machine. After you initialize a W&B Sweep, you open new terminal windows and copy the Sweep ID to each new terminal.
Within any terminal, use the wandb sweep CLI command to pause, resume, stop, or cancel a W&B Sweep. For example, the following code snippet demonstrates how to pause a W&B Sweep across multiple agents with the CLI:
wandb sweep --pause entity/project/sweep_ID
Specify the --resume flag along with the Sweep ID to resume the Sweep across your agents:
wandb sweep --resume entity/project/sweep_ID
For more information on how to parallelize W&B agents, see Parallelize agents.
9 - Learn more about sweeps
Description: We examine agents trained with different side effect penalties on three different tasks: pattern creation, pattern removal, and navigation.
Description: How do we distinguish signal from pareidolia (imaginary patterns)? This article showcases what is possible with W&B and aims to inspire further exploration.
Description: Explore why hyperparameter optimization matters and look at three algorithms to automate hyperparameter tuning for your machine learning models.
The following how-to guide demonstrates how to solve real-world problems with W&B:
Description: How to use W&B Sweeps for hyperparameter tuning using XGBoost.
Sweep GitHub repository
W&B advocates open source and welcomes contributions from the community. Find the GitHub repository at https://github.com/wandb/sweeps. For information on how to contribute to the W&B open source repo, see the W&B GitHub contribution guidelines.
10 - Manage algorithms locally
Search and stop algorithms locally instead of using the W&B cloud-hosted service.
The hyperparameter controller is hosted by Weights & Biases as a cloud service by default. W&B agents communicate with the controller to determine the next set of parameters to use for training. The controller is also responsible for running early stopping algorithms to determine which runs can be stopped.
The local controller feature allows the user to commence search and stop algorithms locally. The local controller gives the user the ability to inspect and instrument the code in order to debug issues as well as develop new features which can be incorporated into the cloud service.
This feature is offered to support faster development and debugging of new algorithms for the Sweeps tool. It is not intended for actual hyperparameter optimization workloads.
Before you get started, you must install the W&B SDK (wandb). Type the following code snippet into your command line:
pip install wandb sweeps
The following examples assume you already have a configuration file and a training loop defined in a Python script or Jupyter Notebook. For more information about how to define a configuration file, see Define sweep configuration.
Run the local controller from the command line
Initialize a sweep similarly to how you normally would when you use hyperparameter controllers hosted by W&B as a cloud service. Specify the controller flag (controller) to indicate you want to use the local controller for W&B sweep jobs:
wandb sweep --controller config.yaml
Alternatively, you can separate initializing a sweep and specifying that you want to use a local controller into two steps.
To separate the steps, first add the following key-value to your sweep’s YAML configuration file:
controller:
type: local
Next, initialize the sweep:
wandb sweep config.yaml
After you initialize the sweep, start a controller with wandb controller:
# wandb sweep command will print a sweep_id
wandb controller {user}/{entity}/{sweep_id}
Once you have specified you want to use a local controller, start one or more Sweep agents to execute the sweep. Start the agents similarly to how you normally would. See Start sweep agents for more information.
wandb agent sweep_ID
Run a local controller with W&B Python SDK
The following code snippets demonstrate how to specify and use a local controller with the W&B Python SDK.
The simplest way to use a controller with the Python SDK is to pass the sweep ID to the wandb.controller method. Next, use the return objects run method to start the sweep job:
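A minimal sketch (assuming sweep_id holds the sweep ID):

import wandb

sweep = wandb.controller(sweep_id)
sweep.run()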
11 - Troubleshoot sweeps
Troubleshoot common error messages with the guidance suggested.
CommError, Run does not exist and ERROR Error uploading
If these two error messages are both returned, you have likely defined a W&B Run ID somewhere in your code. As an example, you might have a similar code snippet defined somewhere in your Jupyter Notebook or Python script:
wandb.init(id="some-string")
You cannot set a Run ID for W&B Sweeps because W&B automatically generates random, unique IDs for Runs created by W&B Sweeps.
W&B Run IDs need to be unique within a project.
If you want to set a custom name that will appear on tables and graphs, we recommend you pass a name to the name parameter when you initialize W&B. For example:
wandb.init(name="a helpful readable run name")
CUDA out of memory
Refactor your code to use process-based executions if you see this error message. More specifically, rewrite your code to a Python script. In addition, call the W&B Sweep Agent from the CLI, instead of the W&B Python SDK.
As an example, suppose you rewrite your code to a Python script called train.py. Add the name of the training script (train.py) to your YAML sweep configuration file (config.yaml in this example):
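A sketch of the relevant part of config.yaml (the method and metric keys are illustrative; program must name the script):

program: train.py
method: bayes
metric:
  name: val_loss
  goal: minimize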
Next, add the following to your train.py Python script:
if __name__ == "__main__":
    train()
Navigate to your CLI and initialize a W&B Sweep with wandb sweep:
wandb sweep config.yaml
Make a note of the W&B Sweep ID that is returned. Next, start the Sweep job with wandb agent with the CLI instead of the Python SDK (wandb.agent). Replace sweep_ID in the code snippet below with the Sweep ID that was returned in the previous step:
wandb agent sweep_ID
anaconda 400 error
The following error usually occurs when you do not log the metric that you are optimizing:
wandb: ERROR Error while calling W&B API: anaconda 400 error:
{"code": 400, "message": "TypeError: bad operand type for unary -: 'NoneType'"}
Within your YAML file or nested dictionary you specify a key named “metric” to optimize. Ensure that you log (wandb.log) this metric. In addition, ensure you use the exact metric name that you defined the sweep to optimize within your Python script or Jupyter Notebook. For more information about configuration files, see Define sweep configuration.
12 - Sweeps UI
Describes the different components of the Sweeps UI.
The state (State), creation time (Created), the entity that started the sweep (Creator), the number of runs completed (Run count), and the time it took to compute the sweep (Compute time) are displayed in the Sweeps UI. The expected number of runs a sweep will create (Est. Runs) is provided when you do a grid search over a discrete search space. You can also click on a sweep to pause, resume, stop, or kill the sweep from the interface.
13 - Tutorial: Create sweep job from project
Tutorial on how to create sweep jobs from a pre-existing W&B project.
1. Create a baseline
First, create a baseline. Download the PyTorch MNIST dataset example model from the W&B examples GitHub repository. Next, train the model. The training script is within the examples/pytorch/pytorch-cnn-fashion directory.
Clone this repo: git clone https://github.com/wandb/examples.git
Open this example: cd examples/pytorch/pytorch-cnn-fashion
Run a run manually: python train.py
Optionally explore the example in the W&B App UI dashboard.
2. Create a sweep
From your project page, open the Sweep tab in the sidebar and select Create Sweep.
The auto-generated configuration guesses values to sweep over based on the runs you have completed. Edit the configuration to specify what ranges of hyperparameters you want to try. When you launch the sweep, it starts a new process on the hosted W&B sweep server. This centralized service coordinates the agents, the machines that are running the training jobs.
3. Launch agents
Next, launch an agent locally. You can launch up to 20 agents on different machines in parallel if you want to distribute the work and finish the sweep job more quickly. The agent will print out the set of parameters it’s trying next.
Now you’re running a sweep. The dashboard shows live results as the example sweep job runs. View an example project page →
Seed a new sweep with existing runs
Launch a new sweep using existing runs that you’ve previously logged.
Open your project table.
Select the runs you want to use with checkboxes on the left side of the table.
Click the dropdown to create a new sweep.
Your sweep will now be set up on our server. All you need to do is launch one or more agents to start running runs.
If you kick off the new sweep as a Bayesian sweep, the selected runs will also seed the Gaussian Process.