Common Questions

Do I need to provide values for all hyperparameters as part of the sweep, or can I set defaults?

The hyperparameter names and values specified as part of the sweep configuration are accessible in wandb.config, a dictionary-like object.
For runs that are not part of a sweep, the values of wandb.config are usually set by providing a dictionary to the config argument of wandb.init. During a sweep, however, any configuration information passed to wandb.init is instead treated as a default value, which can be overridden by the sweep.
You can also be more explicit about the intended behavior by using config.setdefaults. Code snippets for both methods appear below:
wandb.init

```python
import wandb

# set default values for hyperparameters
config_defaults = {"lr": 0.1, "batch_size": 256}

# start a run, providing defaults
# that can be overridden by the sweep
with wandb.init(config=config_defaults) as run:
    ...  # add your training code here
```
config.setdefaults

```python
import wandb

# set default values for hyperparameters
config_defaults = {"lr": 0.1, "batch_size": 256}

# start a run
with wandb.init() as run:
    # update any values not set by the sweep
    run.config.setdefaults(config_defaults)

    # add your training code here
```

Why are my sweep agents running forever? Is there a way to set a maximum number of runs?

Random and Bayesian searches will run forever -- until you stop the process from the command line or the UI. You can set a target to automatically stop the sweep when it achieves a certain value for a metric, or you can specify the number of runs an agent should try:
Command Line

```bash
NUM=10
SWEEPID="dtzl1o7u"
wandb agent --count $NUM $SWEEPID
```

Python

```python
import wandb

sweep_id, count = "dtzl1o7u", 10
wandb.agent(sweep_id, count=count)
```

How do I set the project and entity where the sweep is logged?

Every sweep is associated with an entity (a user or a team) and a project.
These values can be set in four ways: as command-line arguments to wandb sweep, as part of the sweep configuration YAML file, as environment variables, or via the wandb/settings file.
CLI

```bash
wandb sweep --entity geoff --project capsules sweep_config.yaml
```

sweep_config.yaml

```yaml
# inside of sweep_config.yaml
entity: geoff
project: capsules
```

Environment Variables

```bash
# in the shell
export WANDB_ENTITY="geoff"
export WANDB_PROJECT="capsules"
```

```python
# pure Python
import os

os.environ["WANDB_ENTITY"] = "geoff"
os.environ["WANDB_PROJECT"] = "capsules"
```

```python
# IPython/Jupyter
%env WANDB_ENTITY=geoff
%env WANDB_PROJECT=capsules
```

wandb/settings

```ini
[default]
entity: geoff
project: capsules
```

What's with this warning about ignoring the project? Why's my sweep not logging where I expect it?

If you get this warning:

```
wandb: WARNING Ignoring project='speech-reconstruction-baseline' passed to wandb.init when running a sweep
```

then your wandb.init call includes the project argument. That argument is ignored because a sweep and its runs must live in the same project, and the project is fixed when the sweep is created, e.g. by wandb.sweep(sweep_config, project="cat-detector").
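For instance, here is a minimal sketch (the sweep_config contents and the project name are hypothetical):

```python
import wandb

sweep_config = {
    "method": "random",
    "metric": {"name": "loss", "goal": "minimize"},
    "parameters": {"lr": {"min": 0.0001, "max": 0.1}},
}

# set the project once, when the sweep is created...
sweep_id = wandb.sweep(sweep_config, project="cat-detector")

def train():
    # ...and omit the project argument inside the sweep's runs
    with wandb.init() as run:
        ...  # your training code

wandb.agent(sweep_id, function=train)
```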

Why are my agents stopping after the first run finishes?

If the error message is a 400 code from the W&B anaconda API, like this one:

```
wandb: ERROR Error while calling W&B API: anaconda 400 error: {"code": 400, "message": "TypeError: bad operand type for unary -: 'NoneType'"}
```

then the most likely reason is that the metric you are optimizing in your configuration YAML file is not a metric that you are actually logging. For example, you might be optimizing the metric f1 but logging validation_f1. Double-check that you're logging the exact metric name that you've asked the sweep to optimize.
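For instance, in this minimal sketch (the metric name validation_f1 and the parameters are hypothetical), the name under metric must match the logged key exactly:

```python
import wandb

# the sweep optimizes "validation_f1"...
sweep_config = {
    "method": "bayes",
    "metric": {"name": "validation_f1", "goal": "maximize"},
    "parameters": {"lr": {"min": 0.0001, "max": 0.1}},
}

def train():
    with wandb.init() as run:
        f1 = 0.9  # placeholder for your computed metric
        # ...so log exactly that key, not "f1" or "val_f1"
        run.log({"validation_f1": f1})
```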

How should I run sweeps on SLURM?

When using sweeps with the SLURM scheduling system, we recommend running wandb agent --count 1 SWEEP_ID in each of your scheduled jobs, which will run a single training job and then exit. This makes it easier to predict runtimes when requesting resources and takes advantage of the parallelism of hyperparameter search.
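If you launch the agent from Python rather than the CLI, a minimal per-job sketch (the sweep ID and train function are placeholders) looks like this:

```python
# equivalent of `wandb agent --count 1 SWEEP_ID`, run inside each SLURM job
import wandb

def train():
    with wandb.init() as run:
        ...  # your training code

# run exactly one trial, then exit, keeping job runtimes predictable
wandb.agent("your-sweep-id", function=train, count=1)
```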

Can I rerun a grid search?

Yes! If you exhaust a grid search but want to rerun some of the runs (for example, because some crashed), delete the runs you want to redo, hit the Resume button on the sweep control page, and then start new agents for that sweep ID. Parameter combinations with completed runs will not be retried.

What do I do if I get the error message CommError, Run does not exist?

If you're seeing that error message alongside ERROR Error uploading, you are probably setting an ID for your run, e.g. wandb.init(id="some-string"). Run IDs must be unique within a project; if an ID is reused, the error above is thrown. In the sweeps context, you can't set a manual ID for your runs because W&B automatically generates random, unique IDs for them.
If you're trying to get a nice name to show up in the table and on the graphs, we recommend using name instead of id. For example:
```python
wandb.init(name="a helpful readable run name")
```

How do I use custom commands with sweeps?

If you normally configure some aspects of training by passing command line arguments, for example:
```bash
/usr/bin/env python edflow.py -b \
    your-training-config \
    --batchsize 8 \
    --lr 0.00001
```
you can still use sweeps. You just need to edit the command key in the YAML file, like so:
```yaml
program: edflow.py
method: grid
parameters:
  batchsize:
    value: 8
  lr:
    value: 0.00001
command:
  - ${env}
  - python
  - ${program}
  - "-b"
  - your-training-config
  - ${args}
```
The ${args} key expands to all the parameters in the sweep configuration file, in a form that argparse can parse: --param1 value1 --param2 value2
If you have extra arguments that you don't want to specify with argparse you can use:
```python
import argparse

parser = argparse.ArgumentParser()
args, unknown = parser.parse_known_args()
```
Depending on the environment, python might point to Python 2. To ensure Python 3 is invoked, just use python3 instead of python when configuring the command:
```yaml
program: script.py
command:
  - ${env}
  - python3
  - ${program}
  - ${args}
```

How does the Bayesian search work?

The Gaussian process model that's used for Bayesian optimization is defined in our open source sweep logic. If you'd like extra configurability and control, try our support for Ray Tune.
We use scikit-learn's Matern kernel with the nu parameter set to 1.5 -- this corresponds to a much weaker smoothness assumption than the radial basis function (RBF) kernel makes. For details on kernels in Gaussian processes, see Chapter 4 of Rasmussen and Williams or the scikit-learn docs linked above.
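As an illustration (this uses scikit-learn directly and is not W&B's actual sweep code), the surrogate model is built roughly like this:

```python
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# nu=1.5 assumes once-differentiable functions, a weaker smoothness
# assumption than the infinitely differentiable RBF kernel implies
gp = GaussianProcessRegressor(kernel=Matern(nu=1.5))

# fit on (hyperparameter, metric) pairs observed so far; the optimizer
# then proposes the next point using the posterior mean and variance
X = [[0.001], [0.01], [0.1]]  # e.g. learning rates already tried
y = [0.71, 0.83, 0.65]        # illustrative observed validation metrics
gp.fit(X, y)
mean, std = gp.predict([[0.05]], return_std=True)
```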

What's the difference between "stopping" and "pausing" a sweep? Why isn't the wandb agent command terminating when I pause the sweep?

"Stopping" a sweep in the Sweeps UI indicates that the hyperparameter search is over. "Pausing" it merely means that new jobs should not be launched until the sweep is resumed.
If you stop the sweep instead of pausing it, then the agents will exit -- their work is done. If the sweep is merely paused, the agents will stay running in case the sweep is resumed.

Is there a way to add extra values to a sweep, or do I need to start a new one?

Once a sweep has started, you cannot change the sweep configuration. But you can go to any table view, use the checkboxes to select runs, and then use the "Create sweep" menu option to create a new sweep configuration using prior runs.

How do I use sweeps with cloud infrastructure such as AWS Batch, ECS, etc.?

In general, you need a way to publish the sweep_id to a location that any potential agent can read, and a way for those agents to consume the sweep_id and start running.
In other words, you need something that can invoke wandb agent. For instance, bring up an EC2 instance and call wandb agent on it. You might use an SQS queue to broadcast the sweep_id to a few EC2 instances, which then consume it from the queue and start running, as in the sketch below.
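A hypothetical consumer might look like this (the queue URL is a placeholder, and the sketch assumes the sweep configuration defines a program, so wandb.agent needs no function argument):

```python
# runs on each EC2 instance: read a sweep ID from SQS, then start an agent
import boto3
import wandb

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/sweep-ids"  # placeholder

sqs = boto3.client("sqs")
resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1)

for msg in resp.get("Messages", []):
    sweep_id = msg["Body"]
    sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
    # blocks here, running sweep jobs on this instance until the sweep ends
    wandb.agent(sweep_id)
```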