Save & Restore Files

Save files to the cloud and restore them locally later

This guide first shows how to save files to the cloud with wandb.save, then how to re-create them locally with wandb.restore.

Saving Files

Sometimes, rather than logging a numerical value or a piece of media, you want to log a whole file: the weights of a model, the output of other logging software, even source code.

There are two ways to associate a file with a run and upload it to W&B.

  1. Use wandb.save(filename).

  2. Put a file in the wandb run directory (wandb.run.dir); it will be uploaded at the end of the run.

If you're resuming a run, you can recover a file by calling wandb.restore(filename).

If you want to sync files as they're being written, pass a filename or glob pattern to wandb.save.

Examples of wandb.save

See this report for a complete working example.

import os
import wandb

wandb.init()
# Save a model file from the current directory
wandb.save('model.h5')
# Save all files that currently exist containing the substring "ckpt"
wandb.save('../logs/*ckpt*')
# Save any files starting with "checkpoint" whenever they're written
wandb.save(os.path.join(wandb.run.dir, "checkpoint*"))
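wandb.save accepts shell-style glob patterns such as "*ckpt*" and "checkpoint*". To check which bare file names a pattern would pick up, you can try it against candidate names with Python's standard fnmatch module. This is a sketch that assumes W&B's matching follows ordinary shell-style globbing; the file names and the matching helper are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical file names that might sit in a run directory
candidates = ["model.h5", "model_ckpt_01.h5", "checkpoint-100.pt", "notes.txt"]


def matching(pattern, names):
    """Return the names that match a shell-style glob pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


print(matching("*ckpt*", candidates))       # ['model_ckpt_01.h5']
print(matching("checkpoint*", candidates))  # ['checkpoint-100.pt']
```

Note that "checkpoint-100.pt" does not match "*ckpt*": the letters after "ck" in "checkpoint" are "po", not "pt".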

W&B's local run directories are by default inside the ./wandb directory relative to your script, and the path looks like run-20171023_105053-3o4933r0 where 20171023_105053 is the timestamp and 3o4933r0 is the ID of the run. You can set the WANDB_DIR environment variable, or the dir keyword argument of wandb.init, to an absolute path and files will be written within that directory instead.
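Because the run ID is embedded in the directory name, you can recover it (and the timestamp) from a local run directory with a small regular expression. A minimal sketch, assuming names always follow the run-YYYYMMDD_HHMMSS-<run ID> pattern described above; parse_run_dir is a hypothetical helper, not part of the wandb library.

```python
import re
from datetime import datetime

# Matches names like "run-20171023_105053-3o4933r0":
# "run-", a YYYYMMDD_HHMMSS timestamp, "-", then the run ID (assumed format)
RUN_DIR_RE = re.compile(r"^run-(\d{8}_\d{6})-(\w+)$")


def parse_run_dir(name):
    """Split a local run directory name into (timestamp, run_id)."""
    match = RUN_DIR_RE.match(name)
    if match is None:
        raise ValueError(f"not a run directory name: {name!r}")
    timestamp = datetime.strptime(match.group(1), "%Y%m%d_%H%M%S")
    return timestamp, match.group(2)


ts, run_id = parse_run_dir("run-20171023_105053-3o4933r0")
print(ts, run_id)  # 2017-10-23 10:50:53 3o4933r0
```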

Example of saving a file to the wandb run directory

The file model.h5 is saved into the wandb.run.dir and will be uploaded at the end of training.

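import os
import wandb

wandb.init()
# model, X_train, y_train, X_test and y_test are assumed to be defined already
model.fit(X_train, y_train, validation_data=(X_test, y_test),
          callbacks=[wandb.keras.WandbCallback()])
# Files written to wandb.run.dir are uploaded when the run finishes
model.save(os.path.join(wandb.run.dir, "model.h5"))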

Here's a public example page. On its Files tab you can see a model-best.h5 file. The Keras integration saves that file automatically by default, but you can also save a checkpoint manually and we'll store it in association with your run.

See the live example →

Restoring Files

Calling wandb.restore(filename) will restore a file into your local run directory. Typically, filename refers to a file generated by an earlier experiment run and uploaded to our cloud with wandb.save. This call makes a local copy of the file and returns a local file stream open for reading.

Common use cases:

  • restore the model architecture or weights generated by past runs (for more complicated version control use cases, see our Artifacts tool)

  • resume training from the last checkpoint in the case of failure (see the section on resuming for important details)

Examples of wandb.restore

See this report for a complete working example.

import wandb

# restore a model file from a specific run by user "vanpelt" in "my-project"
best_model = wandb.restore(
    'model-best.h5', run_path="vanpelt/my-project/a1b2c3d")
# restore a weights file from a checkpoint
# (NOTE: resuming must be configured if run_path is not provided)
weights_file = wandb.restore('weights.h5')
# use the "name" attribute of the returned object
# if your framework expects a filename, e.g. as in Keras
# (my_predefined_model stands for a model you have already built)
my_predefined_model.load_weights(weights_file.name)

If you don't specify a run_path, you'll need to configure resuming for your run. If you want access to files programmatically outside of training, use the Run API.
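The run_path argument in the examples above has the form entity/project/run_id (there, the entity is the user "vanpelt"). If you assemble run paths from separate values, a small check can catch malformed strings before you call wandb.restore. This is a sketch; split_run_path is a hypothetical helper, not a wandb function.

```python
def split_run_path(run_path):
    """Split 'entity/project/run_id' into its three parts (hypothetical helper)."""
    parts = run_path.strip("/").split("/")
    # Require exactly three non-empty segments
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 'entity/project/run_id', got {run_path!r}")
    entity, project, run_id = parts
    return entity, project, run_id


print(split_run_path("vanpelt/my-project/a1b2c3d"))
# ('vanpelt', 'my-project', 'a1b2c3d')
```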

Common Questions

How do I ignore files?

You can edit the wandb/settings file and set ignore_globs to a comma-separated list of globs, or set the WANDB_IGNORE_GLOBS environment variable. A common use case is to prevent the git patch that we create automatically from being uploaded, for example with WANDB_IGNORE_GLOBS=*.patch.
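To see how a comma-separated ignore list behaves, the sketch below splits a value like the one you would give WANDB_IGNORE_GLOBS into patterns and filters a list of file names with Python's fnmatch. It assumes W&B applies ordinary shell-style matching; parse_ignore_globs, files_to_upload, and the file names are hypothetical.

```python
from fnmatch import fnmatchcase


def parse_ignore_globs(value):
    """Split a comma-separated list such as '*.patch,*.tmp' into patterns."""
    return [p.strip() for p in value.split(",") if p.strip()]


def files_to_upload(filenames, ignore_globs):
    """Keep only the file names that match none of the ignore patterns."""
    return [name for name in filenames
            if not any(fnmatchcase(name, pattern) for pattern in ignore_globs)]


# "*.patch" as in the example above, plus "*.tmp" for illustration
patterns = parse_ignore_globs("*.patch,*.tmp")
print(files_to_upload(["diff.patch", "model.h5", "scratch.tmp", "output.log"], patterns))
# ['model.h5', 'output.log']
```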

How can I sync files before the run ends?

If you have a long run, you might want to see files like model checkpoints uploaded to the cloud before the run ends. By default, we wait until the end of the run to upload most files. Add wandb.save('*.pth'), or just wandb.save('latest.pth'), to your script to upload those files whenever they are written or updated.

Change directory for saving files

If you save files to AWS S3 or Google Cloud Storage by default, you might get this error: events.out.tfevents.1581193870.gpt-tpu-finetune-8jzqk-2033426287 is a cloud storage url, can't save file to wandb.

To fix this, set the log directory for TensorBoard event files (or any other files you'd like us to sync) to a path inside wandb.run.dir, so they're synced to our cloud.
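The fix amounts to handing TensorBoard (or any other writer) a local path under wandb.run.dir instead of a bucket URL. A minimal sketch of that decision, assuming s3:// and gs:// are the URL schemes you want to redirect; choose_log_dir and the "tensorboard" subdirectory name are hypothetical. In a real script you would pass wandb.run.dir as run_dir.

```python
import os
from urllib.parse import urlparse

# Schemes treated as cloud storage in this sketch (an assumption; extend as needed)
CLOUD_SCHEMES = {"s3", "gs"}


def choose_log_dir(requested, run_dir):
    """Return requested if it is a local path, else a folder inside run_dir."""
    if urlparse(requested).scheme in CLOUD_SCHEMES:
        # Cloud URL: write locally under the run directory so the files can be synced
        return os.path.join(run_dir, "tensorboard")
    return requested


print(choose_log_dir("gs://my-bucket/logs", "/tmp/wandb/run-20171023_105053-3o4933r0"))
print(choose_log_dir("logs", "/tmp/wandb/run-20171023_105053-3o4933r0"))
```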

How do I get the name of a run?

If you'd like to use the run name from within your script, use wandb.run.name to get the run name, for example "blissful-waterfall-2".

You need to call save on the run before you can access the display name:

import wandb

run = wandb.init()  # pass your usual arguments to wandb.init
run.save()
print(run.name)

How can I push all saved files from local?

Call wandb.save("*.pt") once at the top of your script, after wandb.init. All files that match that pattern will then be saved as soon as they're written to wandb.run.dir.

Can I remove local files that have already been synced to cloud storage?

Run wandb sync --clean to remove local files that have already been synced to cloud storage. For more usage details, run wandb sync --help.

What if I want to restore the state of my code?

Use the restore command of our command line tool to return to the state of your code when you ran a given run.

# creates a branch and restores the code to the state
# it was in when run $RUN_ID was executed
wandb restore $RUN_ID

How does wandb capture the state of the code?

When wandb.init is called from your script, a link is saved to the last git commit if the code is in a git repository. A diff patch is also created in case there are uncommitted changes or changes that are out of sync with your remote.
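The same two pieces of information (the last commit and a diff of uncommitted changes) can be read with plain git commands. The sketch below is a rough illustration of the idea, not wandb's own implementation; capture_code_state is a hypothetical helper, and it only diffs against the last local commit, without comparing against your remote.

```python
import subprocess


def capture_code_state(repo_dir):
    """Return (commit_sha, diff_text) for repo_dir, or None if it can't be read.

    A rough illustration only, not wandb's own code.
    """
    def git(*args):
        return subprocess.run(["git", *args], cwd=repo_dir,
                              capture_output=True, text=True, check=True).stdout

    try:
        commit = git("rev-parse", "HEAD").strip()
        # Staged and unstaged changes to tracked files, relative to the last commit
        diff = git("diff", "HEAD")
    except (OSError, subprocess.CalledProcessError):
        # git is missing, the directory doesn't exist, or it has no git commits
        return None
    return commit, diff


state = capture_code_state(".")
if state is None:
    print("not a git repository with commits (or git is unavailable)")
else:
    commit, diff = state
    print(commit, f"({len(diff.splitlines())} diff lines)")
```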