Use Weights & Biases Artifacts to track datasets, models, dependencies, and results through each step of your machine learning pipeline. Artifacts make it easy to get a complete and auditable history of changes to your files.
Artifacts can be thought of as a versioned directory. Artifacts are either an input of a run or an output of a run. Common artifacts include entire training sets and models. Store datasets directly into artifacts, or use artifact references to point to data in other systems like Amazon S3, GCP, or your own system.
Artifacts and runs form a directed acyclic graph (DAG) because a given W&B run can use another run’s output artifact as input. You do not need to define pipelines ahead of time. Weights & Biases creates the DAG for you when you use and log artifacts.
The following animation demonstrates an example artifacts DAG as seen in the W&B App UI.
Example artifact DAG
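To make the idea concrete, here is a minimal sketch of how a graph can emerge from nothing more than use and log calls. It is a toy model written for illustration, not W&B's implementation: the LineageGraph class, its method names, and the run and artifact labels are all hypothetical.

```python
from collections import defaultdict


class LineageGraph:
    """Toy model of a run/artifact graph (hypothetical, not W&B's code)."""

    def __init__(self):
        # Maps each node to the set of nodes it points at.
        self.edges = defaultdict(set)

    def use_artifact(self, run, artifact):
        # An input artifact points at the run that consumes it.
        self.edges[artifact].add(run)

    def log_artifact(self, run, artifact):
        # A run points at the artifact it produces.
        self.edges[run].add(artifact)

    def upstream(self, node):
        """Return every run and artifact that `node` depends on."""
        seen = set()
        stack = [node]
        while stack:
            current = stack.pop()
            for source, targets in self.edges.items():
                if current in targets and source not in seen:
                    seen.add(source)
                    stack.append(source)
        return seen


graph = LineageGraph()
graph.log_artifact("run:preprocess", "artifact:animals:v0")
graph.use_artifact("run:train", "artifact:animals:v0")
graph.log_artifact("run:train", "artifact:classifier:v0")

# Everything the trained model depends on, directly or indirectly.
lineage = graph.upstream("artifact:classifier:v0")
print(sorted(lineage))
```

No pipeline is declared anywhere in this sketch. The edges are recorded as a side effect of each call, which is the property the text above describes.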
An artifact is like a directory of data. Each entry is either an actual file stored in the artifact, or a reference to an external URI. You can nest folders inside an artifact just like a regular filesystem. You can store any data, including: datasets, models, images, HTML, code, audio, raw binary data and more.
Every time you change the contents of this directory, Weights & Biases will create a new version of your artifact instead of overwriting the previous contents.
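The versioning behavior can be sketched with a content digest: a new version is recorded only when the digest of the directory changes, and earlier versions stay untouched. This is a simplified illustration under stated assumptions, not how W&B stores artifacts. The digest function and the VersionedDirectory class are hypothetical, and the sketch assumes that logging unchanged contents does not create a new version.

```python
import hashlib


def digest(files):
    """Hash a {path: bytes} mapping so identical contents give identical digests."""
    h = hashlib.sha256()
    for path in sorted(files):
        h.update(path.encode("utf-8"))
        h.update(b"\0")
        h.update(hashlib.sha256(files[path]).digest())
    return h.hexdigest()


class VersionedDirectory:
    """Toy versioned directory (hypothetical, for illustration only)."""

    def __init__(self):
        # List of (digest, snapshot); the list index is the version index.
        self.versions = []

    def log(self, files):
        """Record `files` and return the version index they belong to."""
        d = digest(files)
        if self.versions and self.versions[-1][0] == d:
            # Assumption of this sketch: unchanged contents reuse the latest version.
            return len(self.versions) - 1
        # Changed contents are appended, so older snapshots are never overwritten.
        self.versions.append((d, dict(files)))
        return len(self.versions) - 1


store = VersionedDirectory()
first = store.log({"cat.png": b"cat-bytes", "dog.png": b"dog-bytes"})
second = store.log({"cat.png": b"cat-bytes", "dog.png": b"dog-bytes", "bird.png": b"bird-bytes"})
print(first, second)
```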
As an example, assume we have a directory named images with the following structure:
|-- cat.png (2MB)
|-- dog.png (1MB)
The following code snippet demonstrates how to create a dataset artifact called ‘animals’. (Later sections explain the specifics of how this code snippet works.)
import wandb

run = wandb.init() # Initialize a W&B Run
artifact = wandb.Artifact('animals', type='dataset') # Create a dataset artifact named 'animals'
artifact.add_dir('images') # Adds every file in the images directory to the artifact
run.log_artifact(artifact) # Creates `animals:v0`
Weights & Biases automatically assigns a version, v0, and attaches an alias called latest when you create and log a new artifact object to W&B. An alias is a human-readable name that you can give to an artifact version.
If you log another artifact with the same name and type but different contents (in other words, you create another version of the artifact), W&B will increase the version index by one. The alias latest is unassigned from artifact v0 and assigned to the new version, v1.
You can use either the version index or the alias to refer to a specific artifact version.
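The relationship between version indices and aliases can be sketched as a small lookup table in which latest always moves to the newest version. This is an illustrative model only. The AliasTable class and its methods are hypothetical and do not correspond to anything in the W&B SDK.

```python
class AliasTable:
    """Toy model of version indices and aliases for one artifact name."""

    def __init__(self, name):
        self.name = name
        self.version_count = 0
        self.aliases = {}  # alias -> version index

    def add_version(self, extra_aliases=()):
        """Register a new version and return its full name, such as 'animals:v0'."""
        index = self.version_count
        self.version_count += 1
        # Reassigning the key moves each alias off its previous version.
        for alias in ("latest", *extra_aliases):
            self.aliases[alias] = index
        return f"{self.name}:v{index}"

    def resolve(self, reference):
        """Turn 'name:alias' or 'name:vN' into a version index."""
        name, _, selector = reference.partition(":")
        if name != self.name:
            raise KeyError(reference)
        if selector in self.aliases:
            return self.aliases[selector]
        if selector.startswith("v") and selector[1:].isdigit():
            index = int(selector[1:])
            if index < self.version_count:
                return index
        raise KeyError(reference)


table = AliasTable("animals")
table.add_version()                     # animals:v0, aliased as latest
table.add_version(extra_aliases=["best"])  # animals:v1, latest moves here
print(table.resolve("animals:latest"), table.resolve("animals:v0"))
```

Because an alias maps to exactly one version, pointing it at a newer version removes it from the older one, while the version index of each version never changes.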
As an example, suppose you want to upload a new image, bird.png, to your dataset artifact. Continuing from the previous code example, your directory might look similar to the following:
|-- cat.png (2MB)
|-- dog.png (1MB)
|-- bird.png (3MB)
Rerun the previous code snippet. This will produce a new artifact version, animals:v1. W&B will automatically assign the alias latest to this version. You can customize the aliases to apply to a version by passing the aliases argument to log_artifact. For more information about how to create new versions, see Create a new artifact version.
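As a rough sketch of customizing aliases, the function below logs a directory as a new version of the animals artifact and passes a list of aliases to log_artifact. The function name log_dataset_version and the alias candidate in the usage note are hypothetical. Calling the function assumes that wandb is installed and that you are logged in to a W&B account.

```python
def log_dataset_version(directory, aliases):
    """Log `directory` as a new version of the 'animals' dataset artifact."""
    # Imported inside the function so this module loads even without wandb installed.
    import wandb

    run = wandb.init()  # Initialize a W&B Run
    artifact = wandb.Artifact("animals", type="dataset")
    artifact.add_dir(directory)  # Add every file under `directory`
    # Apply the given aliases to the version created by this call.
    run.log_artifact(artifact, aliases=list(aliases))
    run.finish()  # Mark the run as finished
```

For example, log_dataset_version("images", ["candidate"]) would attach the alias candidate to the new version, which you could then reference as animals:candidate.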
To use the artifact, provide the name of the artifact along with the alias.
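import wandb

run = wandb.init() # Initialize a W&B Run
animals = run.use_artifact('animals:latest') # Mark the artifact as an input of this run
directory = animals.download() # Download the contents and return the local path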
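Because download returns a path to a local directory, you can inspect the result with ordinary filesystem tools. The helper below is a small sketch of one way to do that. The name list_files is hypothetical and is not part of the W&B SDK.

```python
import os


def list_files(directory):
    """Return the relative paths of all files under `directory`, sorted."""
    found = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            full_path = os.path.join(root, name)
            # Store paths relative to the top-level directory.
            found.append(os.path.relpath(full_path, directory))
    return sorted(found)
```

Calling list_files(directory) on the path returned by download would list every file in the downloaded artifact version, including files in nested folders.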
Depending on your use case, explore the following resources to get started with W&B Artifacts:
- Explore topics about Artifacts in the Weights & Biases Developer Guide such as:
  - Create an artifact or a new artifact version.
  - Update an artifact.
  - Download and use an artifact.
  - Delete artifacts.