To have wandb resume a run automatically, pass `resume=True` to `wandb.init()`. This can be thought of as auto-resuming, where wandb “automatically” picks up from where an aborted run left off: if your process doesn't exit successfully, the next time you run it wandb will start logging from the last step.

Auto-resuming relies on the file `wandb/wandb-resume.json`, so it only works when the script is rerun from the directory in which the failed run was started.

The other way to resume is to provide the run id yourself: first call `wandb.init(id=run_id)`
and then, when you resume, call `wandb.init(id=run_id, resume="must")` if you want to be sure that it is resuming.

For full control over resuming, manage the `run_id` yourself. We provide a utility to generate a `run_id`: `wandb.util.generate_id()`. As long as you set the id to one of these unique ids for each unique run, you can say `resume="allow"` and wandb will automatically resume the run with that id.

Use `wandb.save`
to record the state of your run via checkpoint files. Create a checkpoint file through `wandb.save()`, which can then be used through `wandb.init(resume=<run-id>)`. This report illustrates how to save and restore models with W&B.

`WANDB_RUN_ID`: a globally unique string (per project) corresponding to a single run of your script. It must be no longer than 64 characters. All non-word characters will be converted to dashes.

If you set `WANDB_RESUME` equal to `"allow"`, you can always set `WANDB_RUN_ID` to a unique string and restarts of the process will be handled automatically. If you set `WANDB_RESUME` equal to `"must"`, wandb will throw an error if the run to be resumed does not exist yet instead of auto-creating a new run.

If multiple processes use the same `run_id`
concurrently, unexpected results will be recorded and rate limiting will occur.

If you resume a run with `notes` specified in `wandb.init()`, those notes will overwrite any notes that you have added in the UI.

`wandb.init(resume=True)`.
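The manual resuming flow described above (generate a `run_id` once, reuse it on every restart, and pass `resume="allow"`) can be sketched roughly as follows. This is a minimal illustration, not the library's own recipe: the helper names `load_or_create_run_id` and `init_resumable_run`, the idea of keeping the id in a local text file, and the project name in the usage note are all assumptions made for the example.

```python
from pathlib import Path
from typing import Callable


def load_or_create_run_id(path: Path, make_id: Callable[[], str]) -> str:
    """Return the run id stored at `path`, creating one with `make_id` on first use.

    Hypothetical helper: storing the id in a text file is an assumption of this
    sketch, not something wandb does for you.
    """
    if path.exists():
        return path.read_text().strip()
    run_id = make_id()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(run_id)
    return run_id


def init_resumable_run(project: str, id_file: Path):
    """Start the run on the first launch and resume it on later launches."""
    # Imported lazily so the helper above can be used without wandb installed.
    import wandb

    run_id = load_or_create_run_id(id_file, wandb.util.generate_id)
    # resume="allow": resume the run with this id if it exists, otherwise create it.
    return wandb.init(project=project, id=run_id, resume="allow")
```

A training script could call something like `init_resumable_run("my-project", Path("run_id.txt"))` at startup. Switching to `resume="must"` would instead make wandb raise an error when no run with that id exists yet.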
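The `WANDB_RUN_ID` and `WANDB_RESUME` environment variables described above can also be set from Python before `wandb.init()` is called. The sketch below is only illustrative: `normalize_run_id` and `configure_resume` are hypothetical helpers, and the normalization is an approximation of the stated rule (at most 64 characters, non-word characters turned into dashes), not wandb's actual implementation.

```python
import os
import re

MAX_RUN_ID_LENGTH = 64  # limit stated in the text above


def normalize_run_id(raw: str) -> str:
    """Approximate the stated rule: every non-word character becomes a dash."""
    run_id = re.sub(r"\W", "-", raw)
    if len(run_id) > MAX_RUN_ID_LENGTH:
        raise ValueError(f"run id is longer than {MAX_RUN_ID_LENGTH} characters")
    return run_id


def configure_resume(raw_run_id: str, mode: str = "allow") -> str:
    """Export WANDB_RUN_ID and WANDB_RESUME for a later wandb.init() call."""
    # This sketch only covers the two modes discussed in the text.
    if mode not in ("allow", "must"):
        raise ValueError("this sketch only handles 'allow' and 'must'")
    run_id = normalize_run_id(raw_run_id)
    os.environ["WANDB_RUN_ID"] = run_id
    os.environ["WANDB_RESUME"] = mode
    return run_id
```

Normalizing the id yourself is a design choice of this sketch: it keeps the id you store locally identical to the one that ends up in the environment.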
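For the checkpoint approach mentioned above, one possible shape is to write the training state to a file and hand that file to `wandb.save()`. The following is a hedged sketch: the JSON checkpoint format and the helper names `write_checkpoint`, `read_checkpoint`, and `upload_checkpoint` are assumptions made for illustration, and a real project would more likely save framework-specific model files.

```python
import json
from pathlib import Path
from typing import Any, Dict, Tuple


def write_checkpoint(path: Path, step: int, state: Dict[str, Any]) -> None:
    """Write the current step and state to a JSON checkpoint file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"step": step, "state": state}))


def read_checkpoint(path: Path) -> Tuple[int, Dict[str, Any]]:
    """Return (step, state) from the checkpoint, or (0, {}) if none exists."""
    if not path.exists():
        return 0, {}
    data = json.loads(path.read_text())
    return data["step"], data["state"]


def upload_checkpoint(path: Path) -> None:
    """Sync the checkpoint file with the active run; call after wandb.init()."""
    # Imported lazily so the two helpers above work without wandb installed.
    import wandb

    wandb.save(str(path))
```

On restart, a script could call `read_checkpoint` to decide which step to continue from, then keep calling `write_checkpoint` and `upload_checkpoint` as training proceeds.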