run.log_artifact(). You can also track datasets in a remote filesystem (e.g. cloud storage in S3 or GCP) by reference, using a link or URI instead of the raw contents.
"balanced_data") and a name ("imagenet_cats_10K"). When you log the same name again, W&B automatically creates a new version of the artifact with the latest contents.
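As a minimal sketch of the two logging styles described above (raw contents versus a reference), assuming the public `wandb` Python client: the helper name `log_dataset`, the project name `dataset-versioning-demo`, and the `job_type` value are hypothetical and chosen only for illustration.

```python
def log_dataset(name, type_, local_dir=None, reference_uri=None,
                project="dataset-versioning-demo"):
    """Log a dataset as an artifact, either by contents or by reference.

    `project` and the job_type below are hypothetical placeholders.
    """
    # Validate arguments first: exactly one source must be given.
    if (local_dir is None) == (reference_uri is None):
        raise ValueError("pass exactly one of local_dir or reference_uri")

    import wandb  # imported lazily so argument validation works without wandb

    run = wandb.init(project=project, job_type="upload")
    artifact = wandb.Artifact(name=name, type=type_)
    if local_dir is not None:
        artifact.add_dir(local_dir)            # track the raw file contents
    else:
        artifact.add_reference(reference_uri)  # track a URI instead of the contents
    run.log_artifact(artifact)  # logging the same name again creates a new version
    run.finish()
    return artifact
```

For example, `log_dataset("imagenet_cats_10K", "balanced_data", local_dir="./data")` would log local files, while passing `reference_uri` with a bucket URI of your own would track the data by reference.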
"production" to highlight the important versions in a lineage of artifacts.
"latest" is the alias of the most recent version). To keep data transfer lean and fast, wandb caches files.
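A short sketch of fetching a specific version by alias, assuming the `wandb` client's `use_artifact` and `download` calls. The helpers `artifact_ref` and `download_dataset`, the project name, and the `job_type` are hypothetical names introduced here.

```python
def artifact_ref(name, alias="latest"):
    """Build the 'name:alias' string used to look up an artifact version."""
    return f"{name}:{alias}"


def download_dataset(name, alias="latest", project="dataset-versioning-demo"):
    """Declare the artifact as an input of this run and download its files.

    `project` and the job_type below are hypothetical placeholders.
    """
    import wandb  # imported lazily so artifact_ref works without wandb

    run = wandb.init(project=project, job_type="train")
    artifact = run.use_artifact(artifact_ref(name, alias))
    # wandb caches files, so repeated downloads can reuse local copies.
    local_path = artifact.download()
    run.finish()
    return local_path
```

Calling `download_dataset("imagenet_cats_10K", alias="production")` would resolve whichever version currently carries that alias.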
run.log_artifact() to push the new version to the cloud. This will automatically update the artifact with a new version to reflect your changes, while preserving the lineage and history of previous changes.
stable version of the data, to taste.
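One way to push an edited dataset and tag it in the same step is sketched below, assuming the `aliases` argument of `run.log_artifact()`. The helper name `log_new_version`, the project name, and the `job_type` are hypothetical.

```python
def log_new_version(name, type_, local_dir, aliases=("stable",),
                    project="dataset-versioning-demo"):
    """Log the current contents of local_dir as a new version with aliases.

    `project` and the job_type below are hypothetical placeholders.
    """
    import wandb  # imported lazily; requires the wandb client at call time

    run = wandb.init(project=project, job_type="update")
    artifact = wandb.Artifact(name=name, type=type_)
    artifact.add_dir(local_dir)
    # Attach human-readable aliases to the version being logged.
    run.log_artifact(artifact, aliases=list(aliases))
    run.finish()
    return artifact
```

Earlier versions stay in the artifact's history, so the alias can later be attached to a different version if your notion of the stable data changes.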
nature-data directory contains two lists of photo ids, animals-ids.txt and plant-ids.txt. We edit animals-ids.txt to remove mislabeled examples. This script will capture the new version neatly: we'll checksum the artifact, identify that something changed, and track the new version. Unchanged files are not reuploaded (in this case, plant-ids.txt), and if nothing changes at all, no new version is created.
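To make the checksum idea concrete, here is a small standalone illustration in plain Python. It is not W&B's actual algorithm (the client computes its own checksums when you log an artifact); the functions `directory_digest` and `has_changed` are hypothetical helpers that only show how hashing file paths and contents detects an edit.

```python
import hashlib
from pathlib import Path


def directory_digest(root):
    """Return one hex digest covering every file's relative path and contents."""
    root = Path(root)
    digest = hashlib.md5()
    # Sort so the result does not depend on filesystem listing order.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()


def has_changed(root, previous_digest):
    """True if the directory no longer matches a previously recorded digest."""
    return directory_digest(root) != previous_digest
```

Recording the digest of nature-data before editing animals-ids.txt and comparing afterwards would report a change, while an untouched directory would report none.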
test; for scripts this could be
evaluate, etc. You may also want to log other data as artifacts: a model's predictions on fixed validation data, samples of generated output, evaluation metrics, etc.