Create new artifact versions
Create a new artifact version using a single run, collaboratively using distributed writers, or as a patch against a prior version.
Create a new artifact version in one of three ways:
- Simple: A single run provides all the data for a new version. This is the most common case and works best when the run fully recreates the needed data, for example when logging saved models or model predictions in a table for analysis.
- Collaborative: A set of runs collectively provides all the data for a new version. This is best suited to distributed jobs in which multiple runs generate data, often in parallel, for example evaluating a model in a distributed manner and logging the predictions.
- Patch (coming soon): A single run provides a patch of the differences to be applied. This is best suited when a run needs to add data to an artifact without recreating all the existing data. For example, if you have a golden dataset that is created by a daily web scraper, you want each run to append new data to the dataset.
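The three modes differ mainly in how the contents of the new version are assembled. The following is a toy model under stated assumptions, not how W&B stores artifacts: a version is represented as a plain dict mapping file paths to placeholder digests, and all function names are hypothetical. It uses only the Python standard library.

```python
# Toy model only (not the W&B implementation): a "manifest" is a dict that
# maps file paths to placeholder content digests.
from typing import Dict, List

Manifest = Dict[str, str]


def simple_version(run_files: Manifest) -> Manifest:
    # Simple: one run supplies every file in the new version.
    return dict(run_files)


def collaborative_version(per_run_files: List[Manifest]) -> Manifest:
    # Collaborative: several runs each contribute part of the version;
    # the committed version is the union of their contributions.
    # Assumption of this toy: a later contribution with the same path wins.
    version: Manifest = {}
    for files in per_run_files:
        version.update(files)
    return version


def patch_version(prior: Manifest, patch: Manifest) -> Manifest:
    # Patch: start from a copy of the prior version and apply only the
    # differences, so existing files do not have to be recreated.
    version = dict(prior)
    version.update(patch)
    return version


if __name__ == "__main__":
    print(simple_version({"model.pt": "d1"}))
    print(collaborative_version([{"preds_0.csv": "a"}, {"preds_1.csv": "b"}]))
    print(patch_version({"day1.json": "x"}, {"day2.json": "y"}))
```

The rule that a later contribution overwrites an earlier one with the same path is only this sketch's choice; it is not documented W&B behavior.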
Simple Mode
To log a new version of an artifact with a single run that produces all the files in the artifact, use Simple Mode:
```python
import wandb

with wandb.init() as run:
    artifact = wandb.Artifact("artifact_name", "artifact_type")
    # Add files and assets to the artifact using
    # `.add`, `.add_file`, `.add_dir`, and `.add_reference`
    artifact.add_file("image1.png")
    run.log_artifact(artifact)
```
Use `Artifact.save()` to create the version without explicitly starting a run (if no run is active, W&B creates one to track the artifact):
```python
import wandb

artifact = wandb.Artifact("artifact_name", "artifact_type")
# Add files and assets to the artifact using
# `.add`, `.add_file`, `.add_dir`, and `.add_reference`
artifact.add_file("image1.png")
artifact.save()
```
Collaborative Mode
Use Collaborative Mode to allow a collection of runs to collaborate on a version before committing it. There are two key ideas to keep in mind when using Collaborative Mode:
- Each run in the collection needs to be aware of the same unique ID (called `distributed_id`) in order to collaborate on the same version. By default, if the run belongs to a group, Weights & Biases uses the run's `group`, as set by `wandb.init(group=GROUP)`, as the `distributed_id`.
- There must be a final run that "commits" the version, permanently locking its state.
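The `distributed_id` default described above can be expressed as a small resolution function. `resolve_distributed_id` is a hypothetical helper, not part of the wandb API; it models only the documented rule (an explicit ID is used if given, otherwise the run's group), and the `ValueError` raised when neither is available is an assumption of this sketch rather than the error W&B reports.

```python
from typing import Optional


def resolve_distributed_id(
    distributed_id: Optional[str] = None, group: Optional[str] = None
) -> str:
    """Hypothetical helper modeling the documented default.

    An explicit distributed_id is used if provided; otherwise the run's
    group (as set by wandb.init(group=...)) is used. Empty strings are
    treated as missing.
    """
    if distributed_id:
        return distributed_id
    if group:
        return group
    # With neither value, runs have no shared ID to collaborate on.
    raise ValueError("collaborative mode needs a distributed_id or a run group")


if __name__ == "__main__":
    print(resolve_distributed_id("my_dist_artifact", group="eval-job"))
    print(resolve_distributed_id(group="eval-job"))
```

Read against the documented default, this suggests that runs started with the same `wandb.init(group=...)` value can share a version without passing `distributed_id` explicitly.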
Consider the following example. Note that rather than using `log_artifact`, we use `upsert_artifact` to add each run's files to the collaborative artifact and `finish_artifact` to finalize the commit:
Run 1:
```python
import wandb

with wandb.init() as run:
    artifact = wandb.Artifact("artifact_name", "artifact_type")
    # Add files and assets to the artifact using
    # `.add`, `.add_file`, `.add_dir`, and `.add_reference`
    artifact.add_file("image1.png")
    run.upsert_artifact(
        artifact,
        distributed_id="my_dist_artifact"
    )
```
Run 2:
```python
import wandb

with wandb.init() as run:
    artifact = wandb.Artifact("artifact_name", "artifact_type")
    # Add files and assets to the artifact using
    # `.add`, `.add_file`, `.add_dir`, and `.add_reference`
    artifact.add_file("image2.png")
    run.upsert_artifact(
        artifact,
        distributed_id="my_dist_artifact"
    )
```
Run 3:
Run 3 must start after Run 1 and Run 2 complete. The run that calls `finish_artifact` can include files in the artifact, but does not need to.
```python
import wandb

with wandb.init() as run:
    artifact = wandb.Artifact("artifact_name", "artifact_type")
    # Add files and assets to the artifact using
    # `.add`, `.add_file`, `.add_dir`, and `.add_reference`
    artifact.add_file("image3.png")
    run.finish_artifact(
        artifact,
        distributed_id="my_dist_artifact"
    )
```
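To make the upsert and commit semantics concrete, here is a toy, in-memory model of the Run 1 / Run 2 / Run 3 sequence. `ToyDistributedVersion` and its methods are hypothetical stand-ins for `upsert_artifact` and `finish_artifact`; they do not talk to W&B and do not model concurrency, so the rule that the committing run starts only after the others complete is left to the caller. Raising an error on writes after the commit is this sketch's way of showing the locked state; how W&B itself responds to such a write is not covered here.

```python
from typing import Dict, Optional


class ToyDistributedVersion:
    """Toy model (not the W&B implementation) of a collaborative version.

    Runs sharing one distributed_id add files to the same pending version;
    the run that calls finish() commits it and locks its state.
    """

    def __init__(self, distributed_id: str) -> None:
        self.distributed_id = distributed_id
        self.files: Dict[str, str] = {}
        self.committed = False

    def upsert(self, files: Dict[str, str]) -> None:
        # Stand-in for run.upsert_artifact: merge this run's files.
        if self.committed:
            raise RuntimeError(
                f"version {self.distributed_id!r} is already committed"
            )
        self.files.update(files)

    def finish(self, files: Optional[Dict[str, str]] = None) -> Dict[str, str]:
        # Stand-in for run.finish_artifact: optionally add files, then lock.
        self.upsert(files or {})
        self.committed = True
        return dict(self.files)


if __name__ == "__main__":
    version = ToyDistributedVersion("my_dist_artifact")
    version.upsert({"image1.png": "digest-1"})  # Run 1
    version.upsert({"image2.png": "digest-2"})  # Run 2
    final = version.finish({"image3.png": "digest-3"})  # Run 3 commits
    print(sorted(final))  # ['image1.png', 'image2.png', 'image3.png']
    try:
        version.upsert({"late.png": "digest-4"})  # rejected: already locked
    except RuntimeError as err:
        print(err)
```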