`run.log_artifact()`. You can also track datasets in a remote filesystem (e.g. cloud storage in S3 or GCP) by reference, using a link or URI instead of the raw contents.

Each artifact has a type (`"raw_data"`, `"preprocessed_data"`, `"balanced_data"`) and a name (`"imagenet_cats_10K"`). When you log the same name again, W&B automatically creates a new version of the artifact with the latest contents.

Use aliases like `"best"` or `"production"` to highlight the important versions in a lineage of artifacts (`"latest"` is the alias of the most recent version). To keep data transfer lean and fast, wandb caches files.

After editing your data, call `run.log_artifact()` again to push the new version to the cloud. This automatically updates the artifact with a new version that reflects your changes, while preserving the lineage and history of previous changes. Downstream runs can then use the `latest` or a `stable` version of the data, to taste.

For example, suppose the `nature-data` directory contains two lists of photo ids, `animal-ids.txt` and `plant-ids.txt`. We edit `animal-ids.txt` to remove mislabeled examples. Logging the directory again captures the new version neatly: we checksum the artifact, identify that something changed, and track the new version. Only changed files are uploaded (in this case, we don't reupload `plant-ids.txt`), and if nothing changes at all, we don't create a new version.

For datasets, this could be `train`, `val`, or `test`; for scripts, this could be `preprocess`, `train`, `evaluate`, etc. You may also want to log other data as artifacts: a model's predictions on fixed validation data, samples of generated output, evaluation metrics, etc.
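The `nature-data` walkthrough relies on three pieces of bookkeeping: per-file checksums, skipping files whose contents are already stored, and aliases that point at specific versions. The toy model below is a minimal sketch of that bookkeeping in plain Python, assuming SHA-256 checksums and an in-memory dictionary standing in for cloud storage. `ToyArtifactStore`, `checksum`, `log`, and `get` are hypothetical names, not part of the wandb API, and this is not how W&B implements artifacts internally; with the real library you build a `wandb.Artifact` and pass it to `run.log_artifact()`. The photo ids are made up.

```python
import hashlib
from dataclasses import dataclass, field


def checksum(data: bytes) -> str:
    # SHA-256 is an illustrative choice (assumption), not necessarily what W&B uses.
    return hashlib.sha256(data).hexdigest()


@dataclass
class ToyArtifactStore:
    """Hypothetical in-memory model of a versioned, deduplicating artifact store."""

    blobs: dict = field(default_factory=dict)     # checksum -> stored ("uploaded") bytes
    versions: dict = field(default_factory=dict)  # artifact name -> list of manifests
    aliases: dict = field(default_factory=dict)   # (name, alias) -> version index
    uploads: list = field(default_factory=list)   # paths whose contents were uploaded

    def log(self, name: str, files: dict, aliases: tuple = ()) -> int:
        """Log files (path -> bytes) under `name` and return the version index."""
        # A manifest maps each path to the checksum of its contents.
        manifest = {path: checksum(data) for path, data in files.items()}
        history = self.versions.setdefault(name, [])
        if history and history[-1] == manifest:
            # Nothing changed: no upload and no new version.
            index = len(history) - 1
        else:
            for path, data in files.items():
                if manifest[path] not in self.blobs:
                    # Only contents that are not stored yet get "uploaded".
                    self.blobs[manifest[path]] = data
                    self.uploads.append(path)
            history.append(manifest)
            index = len(history) - 1
            # "latest" always follows the newest version.
            self.aliases[(name, "latest")] = index
        for alias in aliases:
            # User-chosen aliases such as "stable" pin a specific version.
            self.aliases[(name, alias)] = index
        return index

    def get(self, name: str, alias: str = "latest") -> dict:
        """Resolve an alias to a version and return that version's files."""
        manifest = self.versions[name][self.aliases[(name, alias)]]
        return {path: self.blobs[digest] for path, digest in manifest.items()}


# Usage with made-up photo ids.
store = ToyArtifactStore()
v0 = store.log(
    "nature-data",
    {"animal-ids.txt": b"101\n102\n103\n", "plant-ids.txt": b"201\n202\n"},
    aliases=("stable",),
)
# Remove a mislabeled id (102) from animal-ids.txt and log again.
v1 = store.log(
    "nature-data",
    {"animal-ids.txt": b"101\n103\n", "plant-ids.txt": b"201\n202\n"},
)
# Logging identical contents a third time changes nothing.
v2 = store.log(
    "nature-data",
    {"animal-ids.txt": b"101\n103\n", "plant-ids.txt": b"201\n202\n"},
)
print(v0, v1, v2)     # 0 1 1
print(store.uploads)  # ['animal-ids.txt', 'plant-ids.txt', 'animal-ids.txt']
print(store.get("nature-data", "stable")["animal-ids.txt"])  # b'101\n102\n103\n'
```

Keying stored contents by checksum is what lets the unchanged `plant-ids.txt` be skipped on the second log, and comparing whole manifests is what decides whether a new version is needed at all.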