Artifacts

Use W&B Artifacts to track datasets, models, dependencies, and results through each step of your machine learning pipeline. Artifacts make it easy to get a complete and auditable history of changes to your files.

An artifact can be thought of as a versioned directory. An artifact is either an input or an output of a run. Common artifacts include entire training sets and models. Store datasets directly in artifacts, or use artifact references to point to data in other systems such as Amazon S3, GCP, or your own system.

Figure: Artifact overview. Artifacts can be an input or an output of a given run.

Artifacts and runs form a directed acyclic graph (DAG) because a given W&B run can use another run’s output artifact as input. You do not need to define pipelines ahead of time. W&B creates the DAG for you as you use and log artifacts.
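As a rough sketch of how such a graph comes about, the following hypothetical two-step pipeline logs a dataset artifact from one run and then consumes it in a second run that logs a model. The project name, job types, artifact names, and local paths (`artifact-demo`, `prepare-data`, `train`, `raw-data`, `trained-model`, `data`, `model.pt`) are placeholders chosen for illustration, not names from this guide.

```python
def run_pipeline(project="artifact-demo"):
    """Two runs linked through an artifact; every name here is a placeholder."""
    import wandb  # imported here so defining the function has no side effects

    # Run 1: the dataset artifact is an OUTPUT of this run.
    with wandb.init(project=project, job_type="prepare-data") as run:
        dataset = wandb.Artifact("raw-data", type="dataset")
        dataset.add_dir("data")  # assumes a local `data` directory exists
        run.log_artifact(dataset)

    # Run 2: the same artifact is an INPUT, and the model is a new OUTPUT.
    with wandb.init(project=project, job_type="train") as run:
        run.use_artifact("raw-data:latest")
        model = wandb.Artifact("trained-model", type="model")
        model.add_file("model.pt")  # assumes training wrote this file
        run.log_artifact(model)
```

Calling `run_pipeline()` is enough to connect the two runs: the edge between them comes from the `log_artifact` and `use_artifact` calls rather than from any pipeline definition.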

The following animation demonstrates an example artifacts DAG as seen in the W&B App UI.

Figure: Example artifact DAG.

For more information about exploring an artifacts graph, see Explore and traverse an artifact graph.

How it works

An artifact is like a directory of data. Each entry is either an actual file stored in the artifact or a reference to an external URI. You can nest folders inside an artifact, just as in a regular filesystem. You can store any data, including datasets, models, images, HTML, code, audio, raw binary data, and more.

Every time you change the contents of this directory, W&B will create a new version of your artifact instead of overwriting the previous contents.
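A minimal sketch of the two kinds of entries, assuming you supply a local file and an external URI. The artifact name `my-dataset`, the helper name `build_mixed_artifact`, and the nested path `raw/cat.png` are hypothetical; `add_file` and `add_reference` are the SDK methods for adding a stored file and an external reference.

```python
def build_mixed_artifact(local_file, reference_uri, stored_path="raw/cat.png"):
    """Build (but do not log) an artifact with one file and one reference."""
    import wandb  # imported here so defining the function has no side effects

    artifact = wandb.Artifact("my-dataset", type="dataset")
    # Entry 1: an actual file, stored under a nested path inside the artifact.
    artifact.add_file(local_file, name=stored_path)
    # Entry 2: a reference to an external URI; the data stays where it is.
    artifact.add_reference(reference_uri)
    return artifact
```

For example, `build_mixed_artifact("images/cat.png", "s3://example-bucket/animals")` would return an artifact that you can then pass to `run.log_artifact`. The bucket name is made up, and resolving a cloud URI may require credentials for that storage system.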

As an example, assume we have the following directory structure:

images
|-- cat.png (2MB)
|-- dog.png (1MB)

The following code snippet demonstrates how to create a dataset artifact called ‘animals’. (The specifics of how this code snippet works are explained in greater detail in later sections.)

import wandb

run = wandb.init() # Initialize a W&B Run
artifact = wandb.Artifact('animals', type='dataset')
artifact.add_dir('images') # Adds multiple files to artifact
run.log_artifact(artifact) # Creates `animals:v0`

W&B automatically assigns a version v0 and attaches an alias called latest when you create and log a new artifact object to W&B. An alias is a human-readable name that you can give to an artifact version.

If you log another artifact with the same name and type but different contents (in other words, you create another version of the artifact), W&B increases the version index by one. The alias latest is unassigned from artifact v0 and assigned to the v1 artifact.

W&B uploads only the files that were modified between artifact versions. For more information about how artifacts are stored, see Artifacts Storage.
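The versioning behavior described above can be illustrated with a small toy model. This is not how W&B is implemented; it is a conceptual sketch that assumes versions are compared by per-file SHA-256 digests, and the names `file_digests` and `ToyArtifactStore` are invented for this example.

```python
import hashlib
from pathlib import Path


def file_digests(root: Path) -> dict:
    """Map each file's relative path to a SHA-256 digest of its bytes."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


class ToyArtifactStore:
    """Toy model of artifact versioning; an assumption, not W&B internals."""

    def __init__(self):
        self.versions = []  # one {relative path: digest} manifest per version
        self.aliases = {}   # alias name -> version index

    def log(self, root: Path) -> list:
        """Record a new version if contents changed; return files to upload."""
        manifest = file_digests(root)
        previous = self.versions[-1] if self.versions else {}
        if self.versions and manifest == previous:
            return []  # identical contents: no new version, nothing to upload
        self.versions.append(manifest)
        # `latest` moves from the previous version to the new one.
        self.aliases["latest"] = len(self.versions) - 1
        # Only files that are new or modified need to be uploaded.
        return [p for p, digest in manifest.items() if previous.get(p) != digest]
```

In this model, logging a directory that contains cat.png and dog.png creates version 0 and uploads both files. Adding bird.png and logging again creates version 1, moves `latest` to it, and uploads only bird.png.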

You can use either the version index or an alias to refer to a specific artifact version.

As an example, suppose you want to upload a new image, bird.png, to your dataset artifact. Continuing from the previous code example, your directory might look similar to the following:

images
|-- cat.png (2MB)
|-- dog.png (1MB)
|-- bird.png (3MB)

Rerun the previous code snippet. This produces a new artifact version, animals:v1, and W&B automatically assigns it the alias latest. You can customize the aliases applied to a version by passing aliases=['my-cool-alias'] to log_artifact. For more information about how to create new versions, see Create a new artifact version.
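As a sketch of the custom-alias option, the helper below logs the images directory as a new version of animals and passes an `aliases` list to `log_artifact`. The helper name `log_animals` is hypothetical, and the example assumes the images directory from above exists locally.

```python
def log_animals(directory="images", aliases=("my-cool-alias",)):
    """Log `directory` as a new `animals` version with the given aliases."""
    import wandb  # imported here so defining the function has no side effects

    with wandb.init() as run:
        artifact = wandb.Artifact("animals", type="dataset")
        artifact.add_dir(directory)
        # `aliases` names the aliases to apply to the newly logged version.
        run.log_artifact(artifact, aliases=list(aliases))
    return artifact
```

After calling `log_animals()`, the new version can be referenced as `animals:my-cool-alias` in the same way as by its version index.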

To use the artifact, provide the name of the artifact along with the alias.

import wandb

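run = wandb.init() # Initialize a W&B Run
animals = run.use_artifact('animals:latest') # Mark the artifact as an input of this run
directory = animals.download() # Download the contents; returns the local path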

For more information about how to download and use artifacts, see Use an artifact.

How to get started

Depending on your use case, explore the following resources to get started with W&B Artifacts:

  • If this is your first time using W&B Artifacts, we recommend you read the Quickstart. The Quickstart walks you through setting up your first artifact.
  • Explore topics about Artifacts in the W&B Developer Guide such as:
    • Create an artifact or a new artifact version.
    • Update an artifact.
    • Download and use an artifact.
    • Delete artifacts.
  • Within the W&B SDK Reference Guide, explore the Python Artifact APIs and the Artifact CLI Reference Guide.

For a step-by-step video, see Version Control Data and Model with W&B Artifacts.
