Save & Restore Files
Save files to the cloud and restore them locally later
This guide first demonstrates how to save files to the cloud with
wandb.save, then demonstrates how they can be re-created locally with
wandb.restore.
Sometimes, rather than logging a numerical value or a piece of media, you want to log a whole file: the weights of a model, the output of other logging software, even source code.
There are two ways to associate a file with a run and upload it to W&B.
- 1. Use wandb.save(filename).
- 2. Put a file in the wandb run directory, and it will get uploaded at the end of the run.
If you want to sync files as they're being written, you can specify a filename or glob in wandb.save.
# Save a model file from the current directory
wandb.save("model.h5")
# Save all files that currently exist containing the substring "ckpt"
wandb.save("*ckpt*")
# Save any files starting with "checkpoint" as they're written to the run directory
wandb.save(os.path.join(wandb.run.dir, "checkpoint*"))
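To check which files a pattern would pick up before passing it to wandb.save, the standard-library fnmatch module can serve as a rough preview. This is a minimal sketch: the file names are made up, and preview_matches is a hypothetical helper, not part of wandb. Real glob expansion may differ in details, such as how hidden files are handled.

```python
from fnmatch import fnmatch


def preview_matches(pattern, filenames):
    """Return the names that a shell-style pattern would match."""
    return [name for name in filenames if fnmatch(name, pattern)]


# Hypothetical file names, for illustration only
files = ["model.h5", "epoch3.ckpt", "checkpoint-001.pt", "notes.txt"]

ckpt_files = preview_matches("*ckpt*", files)
checkpoint_files = preview_matches("checkpoint*", files)
print(ckpt_files)
print(checkpoint_files)
```

Note that "checkpoint-001.pt" does not contain the substring "ckpt", so only the second pattern selects it.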
wandb.save accepts a policy argument, which is set to "live" by default. Available policies are:
- live (default) - sync this file to a wandb server immediately and re-sync it if it changes
- now - sync this file to a wandb server immediately, don't continue syncing if it changes
- end - only sync the file when the run finishes
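The three policies can be summarized as a small decision helper. This is only a sketch: choose_policy and save_with_policy are hypothetical names introduced here, and the wandb import is deferred so the decision logic can be read and tested without the client installed.

```python
def choose_policy(file_changes_during_run, needed_before_run_ends):
    """Pick one of the wandb.save policy names described above."""
    if not needed_before_run_ends:
        return "end"   # upload once, when the run finishes
    if file_changes_during_run:
        return "live"  # upload now and re-sync whenever the file changes
    return "now"       # upload now, ignore later changes


def save_with_policy(path, file_changes_during_run, needed_before_run_ends):
    """Call wandb.save with the chosen policy (requires an active run)."""
    import wandb  # deferred import: only needed when actually uploading

    policy = choose_policy(file_changes_during_run, needed_before_run_ends)
    return wandb.save(path, policy=policy)
```

For example, a checkpoint that is overwritten every epoch and should be visible during training maps to "live", while a final report written once at the end maps to "end".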
You can also specify the base_path argument to wandb.save. This lets you maintain a directory hierarchy, for example:
wandb.save("./results/eval/*", base_path="./results", policy="now")
This results in all files matching the pattern being saved in an
eval folder instead of at the root.
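The effect of base_path on the stored file name can be approximated with the standard library. This is a sketch of the result described above, not of wandb's internals: saved_name is a hypothetical helper and metrics.json is an invented file name.

```python
import os


def saved_name(path, base_path=None):
    """Approximate the name a file is stored under, per the text above."""
    if base_path is None:
        # Without base_path, the file lands at the root of the run's files
        return os.path.basename(path)
    # With base_path, the directory structure below it is kept
    return os.path.relpath(path, base_path)


# Hypothetical file, for illustration only
print(saved_name("./results/eval/metrics.json"))
print(saved_name("./results/eval/metrics.json", base_path="./results"))
```

The first call yields just the file name, while the second keeps the eval directory in front of it.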
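In this example, model.h5 is saved into wandb.run.dir and will be uploaded at the end of training.
model.fit(X_train, y_train, validation_data=(X_test, y_test))
# Write the trained model into the run directory so it is uploaded with the run
model.save(os.path.join(wandb.run.dir, "model.h5"))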
Here's a public example page. On the Files tab, there's a model-best.h5. That file is saved automatically by the Keras integration, but you can also save a checkpoint manually and we'll store it in association with your run.
wandb.restore(filename) will restore a file into your local run directory. Typically, filename refers to a file generated by an earlier experiment run and uploaded to our cloud with wandb.save. This call makes a local copy of the file and returns a local file stream open for reading.
Common use cases:
# restore a model file from a specific run by user "vanpelt" in "my-project"
# (replace <run_id> with the ID of the run that saved the file)
best_model = wandb.restore("model-best.h5", run_path="vanpelt/my-project/<run_id>")
# restore a weights file from a checkpoint
# (NOTE: resuming must be configured if run_path is not provided)
weights_file = wandb.restore('weights.h5')
# use the "name" attribute of the returned object
# if your framework expects a filename, e.g. as in Keras
model.load_weights(weights_file.name)
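Because wandb.restore returns an open file object, it is worth closing it once you have what you need. Below is a minimal sketch for a small text file such as a log or config. read_and_close is a hypothetical helper, and a temporary file stands in for the restored file so the snippet runs without a W&B account.

```python
import os
import tempfile


def read_and_close(handle):
    """Return (local_path, contents) from an open file object, then close it."""
    with handle:
        return handle.name, handle.read()


# Stand-in for the object returned by wandb.restore: any file opened for reading
tmp_dir = tempfile.mkdtemp()
local_path = os.path.join(tmp_dir, "notes.txt")
with open(local_path, "w") as f:
    f.write("restored contents")

path, contents = read_and_close(open(local_path))
print(path, contents)
```

For binary files such as model weights, passing the object's name attribute to your framework, as in the Keras case above, is usually the better fit than reading the stream directly.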
If you have a long run, you might want to see files like model checkpoints uploaded to the cloud before the end of the run. By default, we wait to upload most files until the end of the run. You can add a wandb.save('latest.pth') call in your script to upload those files whenever they are written or updated.
If you default to saving files in AWS S3 or Google Cloud Storage, you might get this error:
events.out.tfevents.1581193870.gpt-tpu-finetune-8jzqk-2033426287 is a cloud storage url, can't save file to wandb.
To change the log directory for TensorBoard events files or other files you'd like us to sync, save your files to wandb.run.dir so they're synced to our cloud.
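One way to avoid this error is to check the configured log directory up front and fall back to the run directory when it points at cloud storage. The sketch below assumes the s3:// and gs:// URL schemes for AWS S3 and Google Cloud Storage. is_cloud_url and choose_log_dir are hypothetical helpers, not part of wandb.

```python
from urllib.parse import urlparse

# Assumed URL schemes for AWS S3 and Google Cloud Storage
CLOUD_SCHEMES = {"s3", "gs"}


def is_cloud_url(path):
    """Return True if the path looks like a cloud storage URL."""
    return urlparse(path).scheme in CLOUD_SCHEMES


def choose_log_dir(requested_dir, run_dir):
    """Fall back to the local run directory when the requested one is remote."""
    return run_dir if is_cloud_url(requested_dir) else requested_dir
```

In a training script, this could be called as choose_log_dir(requested_dir, wandb.run.dir) after wandb.init, with the result passed to whatever writes the events files.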
If you'd like to use the run name from within your script, you can use wandb.run.name and you'll get the run name, "blissful-waterfall-2" for example.
You need to call save on the run before you can access the display name:
run = wandb.init(...)
run.save()  # after this call, the display name is available as run.name
If you call wandb.save("*.pt") once at the top of your script after wandb.init, then all files that match that pattern will be saved immediately once they're written to wandb.run.dir.
There's a command,
wandb sync --clean, that you can run to remove local files that have already been synced to cloud storage. More information about usage can be found with
wandb sync --help
# creates a branch and restores the code to the state
# it was in when run $RUN_ID was executed
wandb restore $RUN_ID
When wandb.init is called from your script, a link is saved to the last git commit if the code is in a git repository. A diff patch is also created in case there are uncommitted changes or changes that are out of sync with your remote.