Artifacts

Overview of what W&B Artifacts are, how they work, and how to get started using W&B Artifacts.

Use W&B Artifacts to track and version data as the inputs and outputs of your W&B Runs. For example, a model training run might take in a dataset as input and produce a trained model as output. You can log hyperparameters, metadata, and metrics to a run, and you can use one artifact to log, track, and version the dataset used to train the model and another artifact for the resulting model checkpoints.

Use cases

You can use artifacts throughout your entire ML workflow as inputs and outputs of runs. You can use datasets, models, or even other artifacts as inputs for processing.

  • Model training: takes a dataset (training and validation data) as input and produces a trained model.
  • Dataset pre-processing: takes a raw dataset as input and produces a pre-processed dataset.
  • Model evaluation: takes a model plus a test dataset as input and produces a W&B Table.
  • Model optimization: takes a model as input and produces an optimized model.

Create an artifact

Create an artifact with four lines of code:

  1. Create a W&B run.
  2. Create an artifact object with the wandb.Artifact API.
  3. Add one or more files, such as a model file or dataset, to your artifact object.
  4. Log your artifact to W&B.

For example, the following code snippet shows how to log a file called dataset.h5 to an artifact called example_artifact:

import wandb

run = wandb.init(project="artifacts-example", job_type="add-dataset")
artifact = wandb.Artifact(name="example_artifact", type="dataset")
artifact.add_file(local_path="./dataset.h5", name="training_dataset")
artifact.save()

# Logs a version of "example_artifact" with the contents of
# dataset.h5 stored inside the artifact as "training_dataset"

Download an artifact

Indicate the artifact you want to mark as input to your run with the use_artifact method.

Following the preceding code snippet, this next code block shows how to use the example_artifact artifact:

artifact = run.use_artifact("example_artifact:latest")

This returns an artifact object.

Next, use the returned object to download all contents of the artifact:

datadir = artifact.download()  # downloads the full artifact to the default directory

Next steps

1 - Create an artifact

Create and construct a W&B Artifact. Learn how to add one or more files or a URI reference to an Artifact.

Use the W&B Python SDK to construct artifacts from W&B Runs. You can add files, directories, URIs, and files from parallel runs to artifacts. After you add a file to an artifact, save the artifact to the W&B Server or your own private server.

For information on how to track external files, such as files stored in Amazon S3, see the Track external files page.

How to construct an artifact

Construct a W&B Artifact in three steps:

1. Create an artifact Python object with wandb.Artifact()

Initialize the wandb.Artifact() class to create an artifact object. Specify the following parameters:

  • Name: Specify a name for your artifact. The name should be unique, descriptive, and easy to remember. You use an artifact's name both to identify the artifact in the W&B App UI and to refer to it when you want to use that artifact.
  • Type: Provide a type. The type should be simple, descriptive, and correspond to a single step of your machine learning pipeline. Common artifact types include 'dataset' or 'model'.

You can optionally provide a description and metadata when you initialize an artifact object. For more information on available attributes and parameters, see wandb.Artifact Class definition in the Python SDK Reference Guide.

The following example demonstrates how to create a dataset artifact:

import wandb

artifact = wandb.Artifact(name="<replace>", type="<replace>")

Replace the string arguments in the preceding code snippet with your own name and type.

2. Add one or more files to the artifact

Add files, directories, external URI references (such as Amazon S3) and more with artifact methods. For example, to add a single text file, use the add_file method:

artifact.add_file(local_path="hello_world.txt", name="optional-name")

You can also add multiple files with the add_dir method. For more information on how to add files, see Update an artifact.

3. Save your artifact to the W&B server

Finally, save your artifact to the W&B server. Artifacts are associated with a run. Therefore, use a run object's log_artifact() method to save the artifact.

# Create a W&B Run. Replace 'job-type'.
run = wandb.init(project="artifacts-example", job_type="job-type")

run.log_artifact(artifact)

You can optionally construct an artifact outside of a W&B run. For more information, see Track external files.

Add files to an artifact

The following sections demonstrate how to construct artifacts with different file types and from parallel runs.

For the following examples, assume you have a project directory with multiple files and a directory structure:

project-directory
|-- images
|   |-- cat.png
|   +-- dog.png
|-- checkpoints
|   +-- model.h5
+-- model.h5

Add a single file

The following code snippet demonstrates how to add a single local file to your artifact:

# Add a single file
artifact.add_file(local_path="path/file.format")

For example, suppose you have a file called file.txt in your local working directory.

artifact.add_file("path/file.txt")  # added as `file.txt`

The artifact now has the following content:

file.txt

Optionally, pass the desired path within the artifact for the name parameter.

artifact.add_file(local_path="path/file.format", name="new/path/file.format")

The file is then stored in the artifact as:

new/path/file.format

The path recorded in the artifact depends on the arguments you pass:

  • artifact.add_file('model.h5') stores model.h5
  • artifact.add_file('checkpoints/model.h5') stores model.h5
  • artifact.add_file('model.h5', name='models/mymodel.h5') stores models/mymodel.h5

Add multiple files

The following code snippet demonstrates how to add an entire local directory to your artifact:

# Recursively add a directory
artifact.add_dir(local_path="path/directory", name="optional-prefix")

The following API calls produce the listed artifact contents:

  • artifact.add_dir('images') stores cat.png and dog.png
  • artifact.add_dir('images', name='images') stores images/cat.png and images/dog.png
  • artifact.new_file('hello.txt') stores hello.txt

Add a URI reference

Artifacts track checksums and other information for reproducibility if the URI has a scheme that the W&B library knows how to handle.

Add an external URI reference to an artifact with the add_reference method. Replace the 'uri' string with your own URI. Optionally pass the desired path within the artifact for the name parameter.

# Add a URI reference
artifact.add_reference(uri="uri", name="optional-name")

Artifacts currently support the following URI schemes:

  • http(s)://: A path to a file accessible over HTTP. The artifact will track checksums in the form of etags and size metadata if the HTTP server supports the ETag and Content-Length response headers.
  • s3://: A path to an object or object prefix in S3. The artifact will track checksums and versioning information (if the bucket has object versioning enabled) for the referenced objects. Object prefixes are expanded to include the objects under the prefix, up to a maximum of 10,000 objects.
  • gs://: A path to an object or object prefix in GCS. The artifact will track checksums and versioning information (if the bucket has object versioning enabled) for the referenced objects. Object prefixes are expanded to include the objects under the prefix, up to a maximum of 10,000 objects.
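The checksums mentioned above are content digests. Conceptually (this is an illustration of the idea, not the exact scheme W&B uses for every provider), a digest changes exactly when the referenced bytes change, which is what lets W&B detect that a new version is needed:

```python
import hashlib

# A digest of the same bytes is stable; a digest of different bytes differs.
original = hashlib.md5(b"image bytes v1").hexdigest()
unchanged = hashlib.md5(b"image bytes v1").hexdigest()
changed = hashlib.md5(b"image bytes v2").hexdigest()

print(original == unchanged)  # True
print(original == changed)    # False
```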

The following API calls produce the listed artifact contents:

  • artifact.add_reference('s3://my-bucket/model.h5') stores model.h5
  • artifact.add_reference('s3://my-bucket/checkpoints/model.h5') stores model.h5
  • artifact.add_reference('s3://my-bucket/model.h5', name='models/mymodel.h5') stores models/mymodel.h5
  • artifact.add_reference('s3://my-bucket/images') stores cat.png and dog.png
  • artifact.add_reference('s3://my-bucket/images', name='images') stores images/cat.png and images/dog.png

Add files to artifacts from parallel runs

For large datasets or distributed training, multiple parallel runs might need to contribute to a single artifact.

import wandb
import time

# We will use ray to launch our runs in parallel
# for demonstration purposes. You can orchestrate
# your parallel runs however you want.
import ray

ray.init()

artifact_type = "dataset"
artifact_name = "parallel-artifact"
table_name = "distributed_table"
parts_path = "parts"
num_parallel = 5

# Each batch of parallel writers should have its own
# unique group name.
group_name = "writer-group-{}".format(round(time.time()))


@ray.remote
def train(i):
    """
    Our writer job. Each writer will add one image to the artifact.
    """
    with wandb.init(group=group_name) as run:
        artifact = wandb.Artifact(name=artifact_name, type=artifact_type)

        # Add data to a wandb table. In this case we use example data
        table = wandb.Table(columns=["a", "b", "c"], data=[[i, i * 2, 2**i]])

        # Add the table to folder in the artifact
        artifact.add(table, "{}/table_{}".format(parts_path, i))

        # Upserting the artifact creates or appends data to the artifact
        run.upsert_artifact(artifact)


# Launch your runs in parallel
result_ids = [train.remote(i) for i in range(num_parallel)]

# Join on all the writers to make sure their files have
# been added before finishing the artifact.
ray.get(result_ids)

# Once all the writers are finished, finish the artifact
# to mark it ready.
with wandb.init(group=group_name) as run:
    artifact = wandb.Artifact(artifact_name, type=artifact_type)

    # Create a PartitionedTable pointing to the folder of tables
    # and add it to the artifact.
    artifact.add(wandb.data_types.PartitionedTable(parts_path), table_name)

    # Finish artifact finalizes the artifact, disallowing future "upserts"
    # to this version.
    run.finish_artifact(artifact)

2 - Download and use artifacts

Download and use Artifacts from multiple projects.

Download and use an artifact that is already stored on the W&B server, or construct an artifact object and pass it in, with de-duplication handled as necessary.

Download and use an artifact stored on W&B

Download and use an artifact stored in W&B either inside or outside of a W&B Run. Use the Public API (wandb.Api) to export (or update data) already saved in W&B. For more information, see the W&B Public API Reference guide.

First, import the W&B Python SDK. Next, create a W&B Run:

import wandb

run = wandb.init(project="<example>", job_type="<job-type>")

Indicate the artifact you want to use with the use_artifact method. This returns an artifact object. The following code snippet specifies an artifact called 'bike-dataset' with the alias 'latest':

artifact = run.use_artifact("bike-dataset:latest")

Use the object returned to download all the contents of the artifact:

datadir = artifact.download()

You can optionally pass a path to the root parameter to download the contents of the artifact to a specific directory. For more information, see the Python SDK Reference Guide.

Use the get_path method to download only a subset of files:

path = artifact.get_path(name)

This fetches only the file at the path name. It returns an Entry object with the following methods:

  • Entry.download: Downloads file from the artifact at path name
  • Entry.ref: If add_reference stored the entry as a reference, returns the URI

References that have schemes that W&B knows how to handle get downloaded just like artifact files. For more information, see Track external files.

To use an artifact outside of a run, first import the W&B SDK. Next, create an artifact object with the Public API class. Provide the entity, project, artifact name, and alias associated with that artifact:

import wandb

api = wandb.Api()
artifact = api.artifact("entity/project/artifact:alias")

Use the object returned to download the contents of the artifact:

artifact.download()

You can optionally pass a path to the root parameter to download the contents of the artifact to a specific directory. For more information, see the API Reference Guide.

Use the wandb artifact get command to download an artifact from the W&B server.

$ wandb artifact get project/artifact:alias --root mnist/

Partially download an artifact

You can optionally download part of an artifact based on a prefix. Using the path_prefix parameter, you can download a single file or the content of a sub-folder.

artifact = run.use_artifact("bike-dataset:latest")

artifact.download(path_prefix="bike.png") # downloads only bike.png

Alternatively, you can download files from a certain directory:

artifact.download(path_prefix="images/bikes/") # downloads files in the images/bikes directory
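Conceptually, path_prefix selects entries whose path inside the artifact starts with the given string. The following is a sketch of those semantics in plain Python, not the W&B implementation; the file names are hypothetical:

```python
# Hypothetical artifact contents, for illustration only.
files = [
    "bike.png",
    "images/bikes/a.png",
    "images/bikes/b.png",
    "images/cars/c.png",
]

prefix = "images/bikes/"
matches = [f for f in files if f.startswith(prefix)]
print(matches)  # ['images/bikes/a.png', 'images/bikes/b.png']
```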

Use an artifact from a different project

Specify the name of the artifact along with its project name to reference an artifact. You can also reference artifacts across entities by specifying the name of the artifact with its entity name.

The following code example demonstrates how to query an artifact from another project as input to the current W&B run.

import wandb

run = wandb.init(project="<example>", job_type="<job-type>")
# Query W&B for an artifact from another project and mark it
# as an input to this run.
artifact = run.use_artifact("my-project/artifact:alias")

# Use an artifact from another entity and mark it as an input
# to this run.
artifact = run.use_artifact("my-entity/my-project/artifact:alias")

Construct and use an artifact simultaneously

Simultaneously construct and use an artifact. Create an artifact object and pass it to use_artifact. This creates an artifact in W&B if it does not exist yet. The use_artifact API is idempotent, so you can call it as many times as you like.

import wandb

run = wandb.init(project="<example>", job_type="<job-type>")

artifact = wandb.Artifact("reference-model", type="model")
artifact.add_file("model.h5")
run.use_artifact(artifact)

For more information about constructing an artifact, see Construct an artifact.

3 - Update an artifact

Update an existing Artifact inside and outside of a W&B Run.

Pass desired values to update the description, metadata, and alias of an artifact. Call the save() method to update the artifact on the W&B servers. You can update an artifact during a W&B Run or outside of a Run.

Use the W&B Public API (wandb.Api) to update an artifact outside of a run. Use the Artifact API (wandb.Artifact) to update an artifact during a run.

The following code example demonstrates how to update the description of an artifact during a W&B Run:

import wandb

run = wandb.init(project="<example>")
artifact = run.use_artifact("<artifact-name>:<alias>")
artifact.description = "<description>"
artifact.save()

The following code example demonstrates how to update the description of an artifact using the wandb.Api API:

import wandb

api = wandb.Api()

artifact = api.artifact("entity/project/artifact:alias")

# Update the description
artifact.description = "My new description"

# Selectively update metadata keys
artifact.metadata["oldKey"] = "new value"

# Replace the metadata entirely
artifact.metadata = {"newKey": "new value"}

# Add an alias
artifact.aliases.append("best")

# Remove an alias
artifact.aliases.remove("latest")

# Completely replace the aliases
artifact.aliases = ["replaced"]

# Persist all artifact modifications
artifact.save()

For more information, see the Weights and Biases Artifact API.

You can also update an Artifact collection in the same way as a singular artifact:

import wandb
run = wandb.init(project="<example>")
api = wandb.Api()
artifact = api.artifact_collection(type="<type-name>", collection="<collection-name>")
artifact.name = "<new-collection-name>"
artifact.description = "<This is where you'd describe the purpose of your collection.>"
artifact.save()

For more information, see the Artifacts Collection reference.

4 - Create an artifact alias

Create custom aliases for W&B Artifacts.

Use aliases as pointers to specific versions. By default, Run.log_artifact adds the latest alias to the logged version.

An artifact version v0 is created and attached to your artifact when you log an artifact for the first time. W&B checksums the contents when you log again to the same artifact. If the artifact changed, W&B saves a new version v1.

For example, if you want your training script to pull the most recent version of a dataset, specify latest when you use that artifact. The following code example demonstrates how to download the most recent version of a dataset artifact named bike-dataset using the latest alias:

import wandb

run = wandb.init(project="<example-project>")

artifact = run.use_artifact("bike-dataset:latest")

artifact.download()

You can also apply a custom alias to an artifact version. For example, if you want to mark a model checkpoint as the best on the AP-50 metric, you could add the string 'best-ap50' as an alias when you log the model artifact.

artifact = wandb.Artifact("run-3nq3ctyy-bike-model", type="model")
artifact.add_file("model.h5")
run.log_artifact(artifact, aliases=["latest", "best-ap50"])

5 - Create an artifact version

Create a new artifact version from a single run or from a distributed process.

Create a new artifact version with a single run or collaboratively with distributed runs. You can optionally create a new artifact version from a previous version, known as an incremental artifact.

Create new artifact versions from scratch

There are two ways to create a new artifact version: from a single run and from distributed runs. They are defined as follows:

  • Single run: A single run provides all the data for a new version. This is the most common case and is best suited when the run fully recreates the needed data. For example: outputting saved models or model predictions in a table for analysis.
  • Distributed runs: A set of runs collectively provides all the data for a new version. This is best suited for distributed jobs which have multiple runs generating data, often in parallel. For example: evaluating a model in a distributed manner, and outputting the predictions.

W&B will create a new artifact and assign it a v0 alias if you pass a name to the wandb.Artifact API that does not exist in your project. W&B checksums the contents when you log again to the same artifact. If the artifact changed, W&B saves a new version v1.

W&B will retrieve an existing artifact if you pass a name and artifact type to the wandb.Artifact API that matches an existing artifact in your project. Logging new contents to the retrieved artifact creates a version greater than v0.

Single run

Log a new version of an artifact with a single run that produces all the files in the artifact.

Based on your use case, use one of the following approaches to create a new artifact version inside or outside of a run:

Create an artifact version within a W&B run:

  1. Create a run with wandb.init. (Line 1)
  2. Create a new artifact or retrieve an existing one with wandb.Artifact. (Line 2)
  3. Add files to the artifact with .add_file. (Line 6)
  4. Log the artifact to the run with .log_artifact. (Line 7)
with wandb.init() as run:
    artifact = wandb.Artifact("artifact_name", "artifact_type")

    # Add Files and Assets to the artifact using
    # `.add`, `.add_file`, `.add_dir`, and `.add_reference`
    artifact.add_file("image1.png")
    run.log_artifact(artifact)

Create an artifact version outside of a W&B run:

  1. Create a new artifact or retrieve an existing one with wandb.Artifact. (Line 1)
  2. Add files to the artifact with .add_file. (Line 4)
  3. Save the artifact with .save. (Line 5)
artifact = wandb.Artifact("artifact_name", "artifact_type")
# Add Files and Assets to the artifact using
# `.add`, `.add_file`, `.add_dir`, and `.add_reference`
artifact.add_file("image1.png")
artifact.save()

Distributed runs

Allow a collection of runs to collaborate on a version before committing it. This is in contrast to single run mode described above where one run provides all the data for a new version.

Consider the following example. Different runs (labelled below as Run 1, Run 2, and Run 3) add a different image file to the same artifact with upsert_artifact.

Run 1:

with wandb.init() as run:
    artifact = wandb.Artifact("artifact_name", "artifact_type")
    # Add Files and Assets to the artifact using
    # `.add`, `.add_file`, `.add_dir`, and `.add_reference`
    artifact.add_file("image1.png")
    run.upsert_artifact(artifact, distributed_id="my_dist_artifact")

Run 2:

with wandb.init() as run:
    artifact = wandb.Artifact("artifact_name", "artifact_type")
    # Add Files and Assets to the artifact using
    # `.add`, `.add_file`, `.add_dir`, and `.add_reference`
    artifact.add_file("image2.png")
    run.upsert_artifact(artifact, distributed_id="my_dist_artifact")

Run 3:

Run 3 must run after Run 1 and Run 2 complete. The run that calls finish_artifact can include files in the artifact, but does not need to.

with wandb.init() as run:
    artifact = wandb.Artifact("artifact_name", "artifact_type")
    # Add Files and Assets to the artifact
    # `.add`, `.add_file`, `.add_dir`, and `.add_reference`
    artifact.add_file("image3.png")
    run.finish_artifact(artifact, distributed_id="my_dist_artifact")

Create a new artifact version from an existing version

Add, modify, or remove a subset of files from a previous artifact version without the need to re-index the files that didn’t change. Adding, modifying, or removing a subset of files from a previous artifact version creates a new artifact version known as an incremental artifact.

Here are some scenarios for each type of incremental change you might encounter:

  • add: you periodically add a new subset of files to a dataset after collecting a new batch.
  • remove: you discovered several duplicate files and want to remove them from your artifact.
  • update: you corrected annotations for a subset of files and want to replace the old files with the correct ones.

You could create an artifact from scratch to perform the same function as an incremental artifact. However, when you create an artifact from scratch, you will need to have all the contents of your artifact on your local disk. When making an incremental change, you can add, remove, or modify a single file without changing the files from a previous artifact version.

Follow the procedure below to incrementally change an artifact:

  1. Obtain the artifact version you want to perform an incremental change on. Inside a run:

saved_artifact = run.use_artifact("my_artifact:latest")

Or outside of a run, with the Public API:

client = wandb.Api()
saved_artifact = client.artifact("my_artifact:latest")

  2. Create a draft with:

draft_artifact = saved_artifact.new_draft()

  3. Perform any incremental changes you want to see in the next version. You can add, remove, or modify an existing entry.

The following examples show how to perform each of these changes:

Add a file to an existing artifact version with the add_file method:

draft_artifact.add_file("file_to_add.txt")

Remove a file from an existing artifact version with the remove method:

draft_artifact.remove("file_to_remove.txt")

Modify or replace contents by removing the old contents from the draft and adding the new contents back in:

draft_artifact.remove("modified_file.txt")
draft_artifact.add_file("modified_file.txt")
  4. Lastly, log or save your changes. To log the changes inside a W&B run:

run.log_artifact(draft_artifact)

To save them outside of a run:

draft_artifact.save()

Putting it all together, the code examples above look like the following. Inside a W&B run:

with wandb.init(job_type="modify dataset") as run:
    saved_artifact = run.use_artifact(
        "my_artifact:latest"
    )  # fetch the artifact and mark it as input to your run
    draft_artifact = saved_artifact.new_draft()  # create a draft version

    # modify a subset of files in the draft version
    draft_artifact.add_file("file_to_add.txt")
    draft_artifact.remove("dir_to_remove/")
    run.log_artifact(
        draft_artifact
    )  # log your changes to create a new version and mark it as output to your run

Outside of a run, with the Public API:

client = wandb.Api()
saved_artifact = client.artifact("my_artifact:latest")  # load your artifact
draft_artifact = saved_artifact.new_draft()  # create a draft version

# modify a subset of files in the draft version
draft_artifact.remove("deleted_file.txt")
draft_artifact.add_file("modified_file.txt")
draft_artifact.save()  # commit changes to the draft

6 - Track external files

Track files saved outside of W&B, such as in an Amazon S3 bucket, GCS bucket, HTTP file server, or even an NFS share.

Use reference artifacts to track files saved outside the W&B system, for example in an Amazon S3 bucket, GCS bucket, Azure blob, HTTP file server, or even an NFS share. Log artifacts outside of a W&B Run with the W&B CLI.

Log artifacts outside of runs

W&B creates a run when you log an artifact outside of a run. Each artifact belongs to a run, which in turn belongs to a project. An artifact (version) also belongs to a collection, and has a type.

Use the wandb artifact put command to upload an artifact to the W&B server outside of a W&B run. Provide the name of the project you want the artifact to belong to, along with the name of the artifact (project/artifact_name). Optionally provide the type (TYPE). Replace PATH in the code snippet below with the file path of the artifact you want to upload.

$ wandb artifact put --name project/artifact_name --type TYPE PATH

W&B will create a new project if the project you specify does not exist. For information on how to download an artifact, see Download and use artifacts.

Track artifacts outside of W&B

Use W&B Artifacts for dataset versioning and model lineage, and use reference artifacts to track files saved outside the W&B server. In this mode an artifact only stores metadata about the files, such as URLs, size, and checksums. The underlying data never leaves your system. See the Quick start for information on how to save files and directories to W&B servers instead.

The following describes how to construct reference artifacts and how to best incorporate them into your workflows.

Amazon S3 / GCS / Azure Blob Storage References

Use W&B Artifacts for dataset and model versioning to track references in cloud storage buckets. With artifact references, seamlessly layer tracking on top of your buckets with no modifications to your existing storage layout.

Artifacts abstract away the underlying cloud storage vendor (such as AWS, GCP, or Azure). The information described in this section applies uniformly to Amazon S3, Google Cloud Storage, and Azure Blob Storage.

Assume we have a bucket with the following structure:

s3://my-bucket
+-- datasets/
|   +-- mnist/
+-- models/
    +-- cnn/

Under mnist/ we have our dataset, a collection of images. Let's track it with an artifact:

import wandb

run = wandb.init()
artifact = wandb.Artifact("mnist", type="dataset")
artifact.add_reference("s3://my-bucket/datasets/mnist")
run.log_artifact(artifact)

Our new reference artifact mnist:latest looks and behaves similarly to a regular artifact. The only difference is that the artifact only consists of metadata about the S3/GCS/Azure object such as its ETag, size, and version ID (if object versioning is enabled on the bucket).

W&B will use the default mechanism to look for credentials based on the cloud provider you use. Read the documentation from your cloud provider to learn more about the credentials used:

  • AWS: Boto3 documentation
  • GCP: Google Cloud documentation
  • Azure: Azure documentation

For AWS, if the bucket is not located in the configured user’s default region, you must set the AWS_REGION environment variable to match the bucket region.

Interact with this artifact similarly to a normal artifact. In the App UI, you can look through the contents of the reference artifact using the file browser, explore the full dependency graph, and scan through the versioned history of your artifact.

Download a reference artifact

import wandb

run = wandb.init()
artifact = run.use_artifact("mnist:latest", type="dataset")
artifact_dir = artifact.download()

W&B will use the metadata recorded when the artifact was logged to retrieve the files from the underlying bucket when it downloads a reference artifact. If your bucket has object versioning enabled, W&B will retrieve the object version corresponding to the state of the file at the time an artifact was logged. This means that as you evolve the contents of your bucket, you can still point to the exact iteration of your data a given model was trained on since the artifact serves as a snapshot of your bucket at the time of training.

Tying it together

The following code example demonstrates a simple workflow you can use to track a dataset in Amazon S3, GCS, or Azure that feeds into a training job:

import wandb

run = wandb.init()

artifact = wandb.Artifact("mnist", type="dataset")
artifact.add_reference("s3://my-bucket/datasets/mnist")

# Track the artifact and mark it as an input to
# this run in one swoop. A new artifact version
# is only logged if the files in the bucket changed.
run.use_artifact(artifact)

artifact_dir = artifact.download()

# Perform training here...

To track models, we can log the model artifact after the training script uploads the model files to the bucket:

import boto3
import wandb

run = wandb.init()

# Training here...

s3_client = boto3.client("s3")
s3_client.upload_file("my_model.h5", "my-bucket", "models/cnn/my_model.h5")

model_artifact = wandb.Artifact("cnn", type="model")
model_artifact.add_reference("s3://my-bucket/models/cnn/")
run.log_artifact(model_artifact)

Filesystem References

Another common pattern for fast access to datasets is to expose an NFS mount point to a remote filesystem on all machines running training jobs. This can be an even simpler solution than a cloud storage bucket because from the perspective of the training script, the files look just like they are sitting on your local filesystem. Luckily, that ease of use extends into using Artifacts to track references to file systems, whether they are mounted or not.

Assume we have a filesystem mounted at /mount with the following structure:

mount
+-- datasets/
|   +-- mnist/
+-- models/
    +-- cnn/

Under mnist/ we have our dataset, a collection of images. Let’s track it with an artifact:

import wandb

run = wandb.init()
artifact = wandb.Artifact("mnist", type="dataset")
artifact.add_reference("file:///mount/datasets/mnist/")
run.log_artifact(artifact)

By default, W&B imposes a 10,000 file limit when adding a reference to a directory. You can adjust this limit by specifying max_objects= in calls to add_reference.

Note the triple slash in the URL. The first component is the file:// prefix that denotes the use of filesystem references. The second is the path to our dataset, /mount/datasets/mnist/.
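You can see this structure with Python's standard URL parser (a conceptual illustration):

```python
from urllib.parse import urlparse

parsed = urlparse("file:///mount/datasets/mnist/")
print(parsed.scheme)  # file
print(parsed.path)    # /mount/datasets/mnist/
```

The empty segment between the second and third slash is the (absent) host, which is why a local absolute path produces three slashes in a row.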

The resulting artifact mnist:latest looks and acts just like a regular artifact. The only difference is that the artifact only consists of metadata about the files, such as their sizes and MD5 checksums. The files themselves never leave your system.

You can interact with this artifact just as you would a normal artifact. In the UI, you can browse the contents of the reference artifact using the file browser, explore the full dependency graph, and scan through the versioned history of your artifact. However, the UI will not be able to render rich media such as images, audio, etc. as the data itself is not contained within the artifact.

Downloading a reference artifact is simple:

import wandb

run = wandb.init()
artifact = run.use_artifact("entity/project/mnist:latest", type="dataset")
artifact_dir = artifact.download()

For filesystem references, a download() operation copies the files from the referenced paths to construct the artifact directory. In the above example, the contents of /mount/datasets/mnist will be copied into the directory artifacts/mnist:v0/. If an artifact contains a reference to a file that was overwritten, then download() will throw an error as the artifact can no longer be reconstructed.

Putting everything together, here’s a simple workflow you can use to track a dataset under a mounted filesystem that feeds into a training job:

import wandb

run = wandb.init()

artifact = wandb.Artifact("mnist", type="dataset")
artifact.add_reference("file:///mount/datasets/mnist/")

# Track the artifact and mark it as an input to
# this run in one swoop. A new artifact version
# is only logged if the files under the directory
# changed.
run.use_artifact(artifact)

artifact_dir = artifact.download()

# Perform training here...

To track models, we can log the model artifact after the training script writes the model files to the mount point:

import wandb

run = wandb.init()

# Training here...

# Write model to disk

model_artifact = wandb.Artifact("cnn", type="model")
model_artifact.add_reference("file:///mount/models/cnn/my_model.h5")
run.log_artifact(model_artifact)

7 - Manage data

7.1 - Delete an artifact

Delete artifacts interactively with the App UI or programmatically with the W&B SDK.

Delete artifacts interactively with the App UI or programmatically with the W&B SDK. When you delete an artifact, W&B marks that artifact as a soft-delete. In other words, the artifact is marked for deletion but files are not immediately deleted from storage.

The contents of the artifact remain in a soft-delete, or pending deletion, state until a regularly run garbage collection process reviews all artifacts marked for deletion. The garbage collection process deletes associated files from storage if the artifact and its associated files are not used by previous or subsequent artifact versions.

The sections in this page describe how to delete specific artifact versions, how to delete an artifact collection, how to delete artifacts with and without aliases, and more. You can schedule when artifacts are deleted from W&B with TTL policies. For more information, see Manage data retention with Artifact TTL policy.

Delete an artifact version

To delete an artifact version:

  1. Select the name of the artifact. This will expand the artifact view and list all the artifact versions associated with that artifact.
  2. From the list of artifacts, select the artifact version you want to delete.
  3. On the right hand side of the workspace, select the kebab dropdown.
  4. Choose Delete.

An artifact version can also be deleted programmatically via the delete() method. See the examples below.

Delete multiple artifact versions with aliases

The following code example demonstrates how to delete artifacts that have aliases associated with them. Provide the entity, project name, and run ID that created the artifacts.

import wandb

api = wandb.Api()
run = api.run("entity/project/run_id")

for artifact in run.logged_artifacts():
    artifact.delete()

Set the delete_aliases parameter to True to delete the artifact even if it has one or more aliases.

import wandb

api = wandb.Api()
run = api.run("entity/project/run_id")

for artifact in run.logged_artifacts():
    # Set delete_aliases=True in order to delete
    # artifacts with one or more aliases
    artifact.delete(delete_aliases=True)

Delete multiple artifact versions with a specific alias

The following code demonstrates how to delete multiple artifact versions that have a specific alias. Provide the entity, project name, and run ID that created the artifacts. Replace the deletion logic with your own:

import wandb

api = wandb.Api()
run = api.run("entity/project_name/run_id")

# Delete artifact versions with alias 'v3' or 'v4'
for artifact_version in run.logged_artifacts():
    # Replace with your own deletion logic.
    if artifact_version.name[-2:] == "v3" or artifact_version.name[-2:] == "v4":
        artifact_version.delete(delete_aliases=True)
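
Note that a slice like artifact_version.name[-2:] only matches single-digit versions. A slightly more robust way to pull the version suffix out of a name like "dataset:v12" (an illustrative helper, not part of the W&B API, assuming names end in ":vN"):

```python
def version_suffix(artifact_name):
    # "dataset:v12" -> "v12"; works for any number of digits
    return artifact_name.rsplit(":", 1)[-1]


print(version_suffix("dataset:v3"))   # v3
print(version_suffix("dataset:v12"))  # v12
```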

Delete all versions of an artifact that do not have an alias

The following code snippet demonstrates how to delete all versions of an artifact that do not have an alias. Provide the name of the project and entity for the project and entity keys in wandb.Api, respectively. Replace the values in <> with your artifact's type and name:

import wandb

# Provide your entity and a project name when you
# use wandb.Api methods.
api = wandb.Api(overrides={"project": "project", "entity": "entity"})

artifact_type, artifact_name = "<artifact-type>", "<artifact-name>"
for v in api.artifact_versions(artifact_type, artifact_name):
    # Clean up versions that don't have an alias such as 'latest'.
    # NOTE: You can put whatever deletion logic you want here.
    if len(v.aliases) == 0:
        v.delete()

Delete an artifact collection

To delete an artifact collection:

  1. Navigate to the artifact collection you want to delete and hover over it.
  2. Select the kebab dropdown next to the artifact collection name.
  3. Choose Delete.

You can also delete an artifact collection programmatically with the delete() method. Provide the name of the project and entity for the project and entity keys in wandb.Api, respectively:

import wandb

# Provide your entity and a project name when you
# use wandb.Api methods.
api = wandb.Api(overrides={"project": "project", "entity": "entity"})
collection = api.artifact_collection(
    "<artifact_type>", "entity/project/artifact_collection_name"
)
collection.delete()

How to enable garbage collection based on how W&B is hosted

Garbage collection is enabled by default if you use W&B’s shared cloud. Based on how you host W&B, you might need to take additional steps to enable garbage collection:

  • Set the GORILLA_ARTIFACT_GC_ENABLED environment variable to true: GORILLA_ARTIFACT_GC_ENABLED=true
  • Enable bucket versioning if you use AWS, GCP, or any other storage provider such as MinIO. If you use Azure, enable soft deletion.

The following table describes how to satisfy requirements to enable garbage collection based on your deployment type.

The X indicates you must satisfy the requirement:

Deployment type                                 Environment variable   Enable versioning
Shared cloud
Shared cloud with secure storage connector                             X
Dedicated cloud
Dedicated cloud with secure storage connector                          X
Customer-managed cloud                          X                      X
Customer-managed on-prem                        X                      X

7.2 - Manage artifact data retention

Time to live policies (TTL)

Schedule when artifacts are deleted from W&B with W&B Artifact time-to-live (TTL) policy. When you delete an artifact, W&B marks that artifact as a soft-delete. In other words, the artifact is marked for deletion but files are not immediately deleted from storage. For more information on how W&B deletes artifacts, see the Delete artifacts page.

Check out this video tutorial to learn how to manage data retention with Artifacts TTL in the W&B App.

Auto-generated Artifacts

Only user-generated artifacts can use TTL policies. Artifacts auto-generated by W&B cannot have TTL policies set for them.

The following Artifact types indicate an auto-generated Artifact:

  • run_table
  • code
  • job
  • Any Artifact type starting with: wandb-*

You can check an Artifact’s type on the W&B platform or programmatically:

import wandb

run = wandb.init(project="<my-project-name>")
artifact = run.use_artifact(artifact_or_name="<my-artifact-name>")
print(artifact.type)

Replace the values enclosed with <> with your own.
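
The rules above can be expressed as a small helper, sketched here for illustration (this is not a W&B API):

```python
AUTO_GENERATED_TYPES = {"run_table", "code", "job"}


def is_auto_generated(artifact_type):
    """Return True if a TTL policy cannot be set for this artifact type."""
    return artifact_type in AUTO_GENERATED_TYPES or artifact_type.startswith("wandb-")


print(is_auto_generated("wandb-history"))  # True
print(is_auto_generated("dataset"))        # False
```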

Define who can edit and set TTL policies

Define who can set and edit TTL policies within a team. You can either grant TTL permissions only to team admins, or you can grant both team admins and team members TTL permissions.

  1. Navigate to your team’s profile page.
  2. Select the Settings tab.
  3. Navigate to the Artifacts time-to-live (TTL) section.
  4. From the TTL permissions dropdown, select who can set and edit TTL policies.
  5. Click on Review and save settings.
  6. Confirm the changes and select Save settings.

Create a TTL policy

Set a TTL policy for an artifact either when you create the artifact or retroactively after the artifact is created.

For all the code snippets below, replace the content wrapped in <> with your information to use the code snippet.

Set a TTL policy when you create an artifact

Use the W&B Python SDK to define a TTL policy when you create an artifact. TTL policies are typically defined in days.

The steps are as follows:

  1. Create an artifact.
  2. Add content to the artifact such as files, a directory, or a reference.
  3. Define a TTL time limit with the datetime.timedelta data type that is part of Python’s standard library.
  4. Log the artifact.

The following code snippet demonstrates how to create an artifact and set a TTL policy.

import wandb
from datetime import timedelta

run = wandb.init(project="<my-project-name>", entity="<my-entity>")
artifact = wandb.Artifact(name="<artifact-name>", type="<type>")
artifact.add_file("<my_file>")

artifact.ttl = timedelta(days=30)  # Set TTL policy
run.log_artifact(artifact)

The preceding code snippet sets the TTL policy for the artifact to 30 days. In other words, W&B deletes the artifact after 30 days.
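
Because the TTL is a timedelta relative to the artifact's creation time, the scheduled deletion date is simply creation time plus TTL. Purely as an illustration:

```python
from datetime import datetime, timedelta

created_at = datetime(2024, 1, 1)   # hypothetical creation time
ttl = timedelta(days=30)

expires_at = created_at + ttl
print(expires_at)  # 2024-01-31 00:00:00
```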

Set or edit a TTL policy after you create an artifact

Use the W&B App UI or the W&B Python SDK to define a TTL policy for an artifact that already exists.

  1. Fetch your artifact.
  2. Pass in a time delta to the artifact’s ttl attribute.
  3. Update the artifact with the save method.

The following code snippet shows how to set a TTL policy for an artifact:

import wandb
from datetime import timedelta

artifact = run.use_artifact("<my-entity/my-project/my-artifact:alias>")
artifact.ttl = timedelta(days=365 * 2)  # Delete in two years
artifact.save()

The preceding code example sets the TTL policy to two years.

  1. Navigate to your W&B project in the W&B App UI.
  2. Select the artifact icon on the left panel.
  3. From the list of artifacts, expand the artifact type.
  4. Select the artifact version you want to edit the TTL policy for.
  5. Click on the Version tab.
  6. From the dropdown, select Edit TTL policy.
  7. Within the modal that appears, select Custom from the TTL policy dropdown.
  8. Within the TTL duration field, set the TTL policy in units of days.
  9. Select the Update TTL button to save your changes.

Set default TTL policies for a team

Set a default TTL policy for your team. Default TTL policies apply to all existing and future artifacts based on their respective creation dates. Artifacts with existing version-level TTL policies are not affected by the team’s default TTL.

  1. Navigate to your team’s profile page.
  2. Select the Settings tab.
  3. Navigate to the Artifacts time-to-live (TTL) section.
  4. Click on the Set team’s default TTL policy.
  5. Within the Duration field, set the TTL policy in units of days.
  6. Click on Review and save settings.
  7. Confirm the changes and then select Save settings.

Set a TTL policy outside of a run

Use the public API to retrieve an artifact without fetching a run, and set the TTL policy. TTL policies are typically defined in days.

The following code sample shows how to fetch an artifact using the public API and set the TTL policy.

import wandb
from datetime import timedelta

api = wandb.Api()

artifact = api.artifact("entity/project/artifact:alias")

artifact.ttl = timedelta(days=365)  # Delete in one year

artifact.save()

Deactivate a TTL policy

Use the W&B Python SDK or W&B App UI to deactivate a TTL policy for a specific artifact version.

  1. Fetch your artifact.
  2. Set the artifact’s ttl attribute to None.
  3. Update the artifact with the save method.

The following code snippet shows how to turn off a TTL policy for an artifact:

artifact = run.use_artifact("<my-entity/my-project/my-artifact:alias>")
artifact.ttl = None
artifact.save()

  1. Navigate to your W&B project in the W&B App UI.
  2. Select the artifact icon on the left panel.
  3. From the list of artifacts, expand the artifact type.
  4. Select the artifact version you want to edit the TTL policy for.
  5. Click on the Version tab.
  6. Click on the meatball UI icon next to the Link to registry button.
  7. From the dropdown, select Edit TTL policy.
  8. Within the modal that appears, select Deactivate from the TTL policy dropdown.
  9. Select the Update TTL button to save your changes.

View TTL policies

View TTL policies for artifacts with the Python SDK or with the W&B App UI.

Use a print statement to view an artifact’s TTL policy. The following example shows how to retrieve an artifact and view its TTL policy:

artifact = run.use_artifact("<my-entity/my-project/my-artifact:alias>")
print(artifact.ttl)

View a TTL policy for an artifact with the W&B App UI.

  1. Navigate to the W&B App at https://wandb.ai.
  2. Go to your W&B Project.
  3. Within your project, select the Artifacts tab in the left sidebar.
  4. Click on a collection.

Within the collection view you can see all of the artifacts in the selected collection. Within the Time to Live column you will see the TTL policy assigned to that artifact.

7.3 - Manage artifact storage and memory allocation

Manage storage, memory allocation of W&B Artifacts.

W&B stores artifact files in a private Google Cloud Storage bucket located in the United States by default. All files are encrypted at rest and in transit.

For sensitive files, we recommend you set up Private Hosting or use reference artifacts.

During training, W&B locally saves logs, artifacts, and configuration files in the following local directories:

File       Default location    To change the default location, set:
logs       ./wandb             dir in wandb.init, or the WANDB_DIR environment variable
artifacts  ~/.cache/wandb      the WANDB_CACHE_DIR environment variable
configs    ~/.config/wandb     the WANDB_CONFIG_DIR environment variable

Clean up local artifact cache

W&B caches artifact files to speed up downloads across versions that share files in common. Over time this cache directory can become large. Run the wandb artifact cache cleanup command to prune the cache and to remove any files that have not been used recently.

The following code snippet demonstrates how to limit the size of the cache to 1GB. Copy and paste the code snippet into your terminal:

$ wandb artifact cache cleanup 1GB
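
Conceptually, the cleanup evicts the least-recently-used files until the cache fits under the target size. A standard-library sketch of that policy on an in-memory file list (not W&B's actual implementation):

```python
def prune_cache(files, limit_bytes):
    """files: list of (name, size_bytes, last_used_timestamp) tuples.
    Return the names to delete so the remaining total fits under
    limit_bytes, evicting least-recently-used files first."""
    total = sum(size for _, size, _ in files)
    to_delete = []
    for name, size, _ in sorted(files, key=lambda f: f[2]):  # oldest first
        if total <= limit_bytes:
            break
        to_delete.append(name)
        total -= size
    return to_delete


files = [("a", 600, 1), ("b", 300, 3), ("c", 400, 2)]
print(prune_cache(files, 1000))  # ['a'] -- 1300 bytes total, drop the oldest 600
```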

8 - Explore artifact graphs

Traverse automatically created directed acyclic W&B Artifact graphs.

W&B automatically tracks the artifacts a given run logged as well as the artifacts a given run uses. These artifacts can include datasets, models, evaluation results, or more. You can explore an artifact’s lineage to track and manage the various artifacts produced throughout the machine learning lifecycle.

Lineage

Tracking an artifact’s lineage has several key benefits:

  • Reproducibility: By tracking the lineage of all artifacts, teams can reproduce experiments, models, and results, which is essential for debugging, experimentation, and validating machine learning models.

  • Version Control: Artifact lineage involves versioning artifacts and tracking their changes over time. This allows teams to roll back to previous versions of data or models if needed.

  • Auditing: Having a detailed history of the artifacts and their transformations enables organizations to comply with regulatory and governance requirements.

  • Collaboration and Knowledge Sharing: Artifact lineage facilitates better collaboration among team members by providing a clear record of attempts as well as what worked, and what didn’t. This helps in avoiding duplication of efforts and accelerates the development process.

Finding an artifact’s lineage

When selecting an artifact in the Artifacts tab, you can see your artifact’s lineage. This graph view shows a general overview of your pipeline.

To view an artifact graph:

  1. Navigate to your project in the W&B App UI
  2. Choose the artifact icon on the left panel.
  3. Select Lineage.

Getting to the Lineage tab

The artifact or job type you provide appears in front of its name, with artifacts represented by blue icons and runs represented by green icons. Arrows detail the input and output of a run or artifact on the graph.

Run and artifact nodes Inputs and outputs

For a more detailed view, click any individual artifact or run to get more information on a particular object.

Previewing a run

Artifact clusters

When a level of the graph has five or more runs or artifacts, it creates a cluster. A cluster has a search bar to find specific versions of runs or artifacts, and lets you pull an individual node out of the cluster to continue investigating that node's lineage.

Clicking on a node opens a preview with an overview of the node. Clicking on the arrow extracts the individual run or artifact so you can examine the lineage of the extracted node.

Searching a run cluster

Use the API to track lineage

You can also navigate a graph using the W&B API.

Create an artifact. First, create a run with wandb.init. Then, create a new artifact or retrieve an existing one with wandb.Artifact. Next, add files to the artifact with .add_file. Finally, log the artifact to the run with .log_artifact. The finished code looks something like this:

with wandb.init() as run:
    artifact = wandb.Artifact("artifact_name", "artifact_type")

    # Add Files and Assets to the artifact using
    # `.add`, `.add_file`, `.add_dir`, and `.add_reference`
    artifact.add_file("image1.png")
    run.log_artifact(artifact)

Use the artifact object’s logged_by and used_by methods to walk the graph from the artifact:

# Walk up and down the graph from an artifact:
producer_run = artifact.logged_by()
consumer_runs = artifact.used_by()
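
Starting from those two calls, you can walk the whole graph by alternating between runs and artifacts. Here is a hedged sketch of that traversal over a generic adjacency mapping, where the dict stands in for repeated logged_by/used_by lookups and the node names are hypothetical:

```python
from collections import deque


def walk_lineage(graph, start):
    """Breadth-first traversal of a lineage graph.
    graph maps each node to the nodes it produced or consumed."""
    seen, order = {start}, []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


lineage = {
    "raw-data:v0": ["preprocess-run"],
    "preprocess-run": ["clean-data:v0"],
    "clean-data:v0": ["train-run"],
    "train-run": ["model:v0"],
}
print(walk_lineage(lineage, "raw-data:v0"))
```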

9 - Artifact data privacy and compliance

Learn where W&B files are stored by default. Explore how to save, store sensitive information.

Files are uploaded to a Google Cloud bucket managed by W&B when you log artifacts. The contents of the bucket are encrypted both at rest and in transit. Artifact files are only visible to users who have access to the corresponding project.

GCS W&B Client Server diagram

When you delete a version of an artifact, it is marked for soft deletion in our database and removed from your storage cost. When you delete an entire artifact, it is queued for permanent deletion and all of its contents are removed from the W&B bucket. If you have specific needs around file deletion, please reach out to Customer Support.

For sensitive datasets that cannot reside in a multi-tenant environment, you can use either a private W&B server connected to your cloud bucket or reference artifacts. Reference artifacts track references to private buckets without sending file contents to W&B. Reference artifacts maintain links to files on your buckets or servers. In other words, W&B only keeps track of the metadata associated with the files and not the files themselves.

W&B Client Server Cloud diagram

Create a reference artifact the same way you create a non-reference artifact:

import wandb

run = wandb.init()
artifact = wandb.Artifact("animals", type="dataset")
artifact.add_reference("s3://my-bucket/animals")
run.log_artifact(artifact)

For alternatives, contact us at contact@wandb.com to talk about private cloud and on-premises installations.

10 - Tutorial: Create, track, and use a dataset artifact

Artifacts quickstart shows how to create, track, and use a dataset artifact with W&B.

This walkthrough demonstrates how to create, track, and use a dataset artifact from W&B Runs.

1. Log into W&B

Import the W&B library and log in to W&B. You will need to sign up for a free W&B account if you have not done so already.

import wandb

wandb.login()

2. Initialize a run

Use the wandb.init() API to generate a background process to sync and log data as a W&B Run. Provide a project name and a job type:

# Create a W&B Run. Here we specify 'dataset' as the job type since this example
# shows how to create a dataset artifact.
run = wandb.init(project="artifacts-example", job_type="upload-dataset")

3. Create an artifact object

Create an artifact object with the wandb.Artifact() API. Provide a name for the artifact and a description of the file type for the name and type parameters, respectively.

For example, the following code snippet demonstrates how to create an artifact called ‘bicycle-dataset’ with a ‘dataset’ label:

artifact = wandb.Artifact(name="bicycle-dataset", type="dataset")

For more information about how to construct an artifact, see Construct artifacts.

Add the dataset to the artifact

Add a file to the artifact. Common file types include models and datasets. The following example adds a dataset named dataset.h5 that is saved locally on our machine to the artifact:

# Add a file to the artifact's contents
artifact.add_file(local_path="dataset.h5")

Replace the filename dataset.h5 in the preceding code snippet with the path to the file you want to add to the artifact.

4. Log the dataset

Use the W&B run object's log_artifact() method to both save your artifact version and declare the artifact as an output of the run.

# Save the artifact version to W&B and mark it
# as the output of this run
run.log_artifact(artifact)

A 'latest' alias is created by default when you log an artifact. For more information about artifact aliases and versions, see Create a custom alias and Create new artifact versions, respectively.

5. Download and use the artifact

The following code example demonstrates the steps you can take to use an artifact you have logged and saved to the W&B servers.

  1. First, initialize a new run object with wandb.init().
  2. Second, use the run object's use_artifact() method to tell W&B what artifact to use. This returns an artifact object.
  3. Third, use the artifact's download() method to download the contents of the artifact.

# Create a W&B Run. Here we specify 'training' for 'type'
# because we will use this run to track training.
run = wandb.init(project="artifacts-example", job_type="training")

# Query W&B for an artifact and mark it as input to this run
artifact = run.use_artifact("bicycle-dataset:latest")

# Download the artifact's contents
artifact_dir = artifact.download()

Alternatively, you can use the Public API (wandb.Api) to export (or update) data already saved to W&B, outside of a Run. See Track external files for more information.