Use W&B Artifacts to track datasets, models, dependencies, and results through each step of your machine learning pipeline. Artifacts make it easy to get a complete and auditable history of changes to your files.
Artifacts can be thought of as a versioned directory. Artifacts are either an input of a run or an output of a run. Common artifacts include entire training sets and models. Store datasets directly into artifacts, or use artifact references to point to data in other systems like Amazon S3, GCP, or your own system.
Artifacts and runs form a directed graph because a given W&B run can use another run’s output artifact as input. You do not need to define pipelines ahead of time. W&B creates the directed acyclic graph (DAG) for you as you use and log artifacts.
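As a rough illustration of how using and logging artifacts implies a graph, the sketch below records "use" and "log" events in plain Python and walks the resulting edges. The `Lineage` class, the run names, and the artifact names other than `animals:v0` are hypothetical and exist only for this example. It is not how W&B stores lineage internally.

```python
class Lineage:
    """Toy graph of runs and artifact versions (illustrative only, not W&B's API)."""

    def __init__(self):
        self.inputs = {}    # run name -> list of artifacts the run used
        self.producer = {}  # artifact -> run name that logged it

    def use(self, run, artifact):
        # Edge: artifact -> run (the artifact is an input of the run)
        self.inputs.setdefault(run, []).append(artifact)

    def log(self, run, artifact):
        # Edge: run -> artifact (the artifact is an output of the run)
        self.producer[artifact] = run

    def upstream(self, artifact):
        """Return every artifact that `artifact` transitively depends on."""
        seen = []
        stack = [artifact]
        while stack:
            current = stack.pop()
            run = self.producer.get(current)
            if run is None:
                continue
            for parent in self.inputs.get(run, []):
                if parent not in seen:
                    seen.append(parent)
                    stack.append(parent)
        return seen


graph = Lineage()
graph.log("preprocess-run", "animals:v0")
graph.use("train-run", "animals:v0")
graph.log("train-run", "classifier:v0")
graph.use("eval-run", "classifier:v0")
graph.log("eval-run", "metrics:v0")

print(graph.upstream("metrics:v0"))  # ['classifier:v0', 'animals:v0']
```

No pipeline is declared anywhere in this sketch. The graph falls out of the individual use and log calls, which is the same idea the paragraph above describes.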
The following animation demonstrates an example artifacts DAG as seen in the W&B App UI.
For more information about exploring an artifacts graph, see Explore and traverse an artifact graph.
How it works
An artifact is like a directory of data. Each entry is either an actual file stored in the artifact or a reference to an external URI. You can nest folders inside an artifact just like a regular filesystem. You can store any data, including datasets, models, images, HTML, code, audio, raw binary data, and more.
Every time you change the contents of this directory, W&B will create a new version of your artifact instead of overwriting the previous contents.
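One way to picture "a change to the contents" is a digest computed over every file path and its bytes: if the digest differs from the previous one, the directory has changed. The sketch below is a minimal illustration under that assumption. The helper `directory_digest`, the choice of SHA-256, and the file names are hypothetical, and W&B's actual manifest format may differ.

```python
import hashlib
import tempfile
from pathlib import Path


def directory_digest(root):
    """Hash every file's relative path and bytes into a single hex digest."""
    root = Path(root)
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")  # separator between the path and the content hash
        digest.update(hashlib.sha256(path.read_bytes()).digest())
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data"
    (data_dir / "nested").mkdir(parents=True)
    (data_dir / "a.txt").write_bytes(b"first file")
    (data_dir / "nested" / "b.txt").write_bytes(b"second file")

    first = directory_digest(data_dir)
    unchanged = directory_digest(data_dir)

    # Changing the contents of the directory changes the digest.
    (data_dir / "c.txt").write_bytes(b"third file")
    second = directory_digest(data_dir)

print(first == unchanged)  # True: same contents, same digest
print(first == second)     # False: contents changed, so a new version is warranted
```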
As an example, assume we have the following directory structure:
images
|-- cat.png (2MB)
|-- dog.png (1MB)
The following code snippet demonstrates how to create a dataset artifact called ‘animals’. (Later sections explain the specifics of how this snippet works.)
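import wandb

run = wandb.init()  # Initialize a W&B run
artifact = wandb.Artifact('animals', type='dataset')  # Create a dataset artifact named 'animals'
artifact.add_dir('images')  # Add every file in the images directory to the artifact
run.log_artifact(artifact)  # Log the artifact, which creates `animals:v0`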
W&B automatically assigns a version v0 and attaches an alias called latest when you create and log a new artifact object to W&B. An alias is a human-readable name that you can give to an artifact version.
If you log another artifact with the same name and type but different contents (in other words, you create another version of the artifact), W&B increases the version index by one. The alias latest is unassigned from artifact v0 and assigned to the new version, v1.
W&B uploads only the files that were modified between artifact versions. For more information about how artifacts are stored, see Artifacts Storage.
You can use either the version index or the alias to refer to a specific artifact version.
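The relationship between version indexes and aliases can be sketched with a small in-memory model. Everything here is hypothetical: the `ToyRegistry` class, the digest strings, and the `with-birds` alias are made up for illustration, and the rule that identical contents do not produce a new version is an assumption of this sketch. It is not the W&B API.

```python
class ToyRegistry:
    """In-memory model of version indexes and aliases (illustrative only)."""

    def __init__(self):
        self.versions = {}  # artifact name -> list of content digests (list index = version)
        self.aliases = {}   # (artifact name, alias) -> version index

    def log(self, name, digest, aliases=()):
        history = self.versions.setdefault(name, [])
        # Assumption: contents identical to the newest version do not create a new version.
        if not history or history[-1] != digest:
            history.append(digest)
        index = len(history) - 1
        # `latest` always moves to the newest version; custom aliases are attached as well.
        for alias in ("latest", *aliases):
            self.aliases[(name, alias)] = index
        return f"{name}:v{index}"

    def resolve(self, reference):
        """Resolve 'name:alias' or 'name:vN' to a version index."""
        name, _, tag = reference.partition(":")
        if (name, tag) in self.aliases:
            return self.aliases[(name, tag)]
        if tag.startswith("v") and tag[1:].isdigit():
            index = int(tag[1:])
            if index < len(self.versions.get(name, [])):
                return index
        raise KeyError(reference)


registry = ToyRegistry()
first = registry.log("animals", "digest-a")
second = registry.log("animals", "digest-b", aliases=["with-birds"])
repeat = registry.log("animals", "digest-b")  # unchanged contents

print(first, second, repeat)                   # animals:v0 animals:v1 animals:v1
print(registry.resolve("animals:latest"))      # 1
print(registry.resolve("animals:v0"))          # 0
print(registry.resolve("animals:with-birds"))  # 1
```

The version index is fixed once assigned, while an alias such as latest is a movable pointer. That is why both forms can refer to the same version at one moment and to different versions later.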
As an example, suppose you want to upload a new image, bird.png, to your dataset artifact. Continuing from the previous code example, your directory might look similar to the following:
images
|-- cat.png (2MB)
|-- dog.png (1MB)
|-- bird.png (3MB)
Rerun the previous code snippet. This produces a new artifact version, animals:v1, and W&B automatically assigns it the alias latest. You can customize the aliases applied to a version by passing the aliases argument to log_artifact. For more information about how to create new versions, see Create a new artifact version.
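To see why adding bird.png does not require re-uploading cat.png and dog.png, it can help to compare per-file digests between two versions. The sketch below is a simplified model under stated assumptions: the helpers `manifest` and `files_to_upload` are hypothetical, the byte strings stand in for real image data, and W&B's real storage logic may work differently.

```python
import hashlib


def manifest(files):
    """Map each file name to a SHA-256 digest of its bytes."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}


def files_to_upload(previous, current):
    """Return names that are new or whose digest differs from the previous version."""
    return sorted(
        name for name, digest in current.items() if previous.get(name) != digest
    )


# Placeholder bytes stand in for the image contents.
v0 = manifest({"cat.png": b"cat pixels", "dog.png": b"dog pixels"})
v1 = manifest(
    {"cat.png": b"cat pixels", "dog.png": b"dog pixels", "bird.png": b"bird pixels"}
)

print(files_to_upload({}, v0))  # ['cat.png', 'dog.png']
print(files_to_upload(v0, v1))  # ['bird.png']
```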
To use the artifact, provide the name of the artifact along with the alias.
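import wandb

run = wandb.init()  # Initialize a W&B run
animals = run.use_artifact('animals:latest')  # Declare the latest version as an input of this run
directory = animals.download()  # Download the files and return the local directory path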
For more information about how to download and use artifacts, see Use an artifact.
How to get started
Depending on your use case, explore the following resources to get started with W&B Artifacts:
- If this is your first time using W&B Artifacts, we recommend you read the Quickstart. The Quickstart walks you through setting up your first artifact.
- Explore topics about Artifacts in the W&B Developer Guide, such as:
  - Create an artifact or a new artifact version.
  - Update an artifact.
  - Download and use an artifact.
  - Delete artifacts.
- Within the W&B SDK Reference Guide, explore the Python Artifact APIs and the Artifact CLI Reference Guide.
For a step-by-step video, see Version Control Data and Model with W&B Artifacts.