W&B Core

W&B Core is the foundational framework supporting W&B Models and W&B Weave, and is itself supported by the W&B Platform.

W&B Core provides capabilities across the entire ML lifecycle.

1 - Artifacts

Overview of what W&B Artifacts are, how they work, and how to get started using W&B Artifacts.

Use W&B Artifacts to track and version data as the inputs and outputs of your W&B Runs. For example, a model training run might take in a dataset as input and produce a trained model as output. You can log hyperparameters, metadata, and metrics to a run, and you can use one artifact to log, track, and version the dataset used to train the model and another artifact for the resulting model checkpoints.

Use cases

You can use artifacts throughout your entire ML workflow as inputs and outputs of runs. You can use datasets, models, or even other artifacts as inputs for processing.

Use Case | Input | Output
Model Training | Dataset (training and validation data) | Trained Model
Dataset Pre-Processing | Dataset (raw data) | Dataset (pre-processed data)
Model Evaluation | Model + Dataset (test data) | W&B Table
Model Optimization | Model | Optimized Model
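
For example, here is a minimal sketch of the Model Training row above, assuming a previously logged dataset artifact named "mnist-data" and a locally saved model file "model.pt" (both placeholder names):

import wandb

# Consume a dataset artifact as input and log a trained model as output.
with wandb.init(project="artifacts-example", job_type="train") as run:
    dataset = run.use_artifact("mnist-data:latest")
    data_dir = dataset.download()

    # ... train a model on the files in data_dir ...

    model_artifact = wandb.Artifact(name="trained-model", type="model")
    model_artifact.add_file("model.pt")
    run.log_artifact(model_artifact)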

Create an artifact

Create an artifact with four lines of code:

  1. Create a W&B run.
  2. Create an artifact object with the wandb.Artifact API.
  3. Add one or more files, such as a model file or dataset, to your artifact object.
  4. Log your artifact to W&B.

For example, the following code snippet shows how to log a file called dataset.h5 to an artifact called example_artifact:

import wandb

run = wandb.init(project="artifacts-example", job_type="add-dataset")
artifact = wandb.Artifact(name="example_artifact", type="dataset")
artifact.add_file(local_path="./dataset.h5", name="training_dataset")
artifact.save()

# Logs a version of the "example_artifact" dataset artifact containing dataset.h5

Download an artifact

Indicate the artifact you want to mark as input to your run with the use_artifact method.

Following the preceding code snippet, this next code block shows how to use the example_artifact artifact:

artifact = run.use_artifact("example_artifact:latest")  # returns an artifact object

This returns an artifact object.

Next, use the returned object to download all contents of the artifact:

datadir = artifact.download()  # downloads the full "example_artifact" artifact to the default directory

1.1 - Create an artifact

Construct a W&B Artifact. Learn how to add one or more files or a URI reference to an Artifact.

Use the W&B Python SDK to construct artifacts from W&B Runs. You can add files, directories, URIs, and files from parallel runs to artifacts. After you add a file to an artifact, save the artifact to the W&B Server or your own private server.

For information on how to track external files, such as files stored in Amazon S3, see the Track external files page.

How to construct an artifact

Construct a W&B Artifact in three steps:

1. Create an artifact Python object with wandb.Artifact()

Initialize the wandb.Artifact() class to create an artifact object. Specify the following parameters:

  • Name: Specify a name for your artifact. The name should be unique, descriptive, and easy to remember. Use an artifact's name both to identify the artifact in the W&B App UI and to refer to it when you want to use that artifact.
  • Type: Provide a type. The type should be simple, descriptive and correspond to a single step of your machine learning pipeline. Common artifact types include 'dataset' or 'model'.

You can optionally provide a description and metadata when you initialize an artifact object. For more information on available attributes and parameters, see wandb.Artifact Class definition in the Python SDK Reference Guide.

The following example demonstrates how to create a dataset artifact:

import wandb

artifact = wandb.Artifact(name="<replace>", type="<replace>")

Replace the string arguments in the preceding code snippet with your own name and type.
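
You can also pass the optional description and metadata parameters mentioned above. For example, a sketch where the values shown are placeholders:

import wandb

artifact = wandb.Artifact(
    name="bike-dataset",
    type="dataset",
    description="Raw bike images collected for the 2023 labeling round",
    metadata={"num_images": 1024, "source": "internal-camera-rig"},
)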

2. Add one or more files to the artifact

Add files, directories, external URI references (such as Amazon S3) and more with artifact methods. For example, to add a single text file, use the add_file method:

artifact.add_file(local_path="hello_world.txt", name="optional-name")

You can also add multiple files with the add_dir method. For more information on how to add files, see Update an artifact.
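
For example, a minimal sketch that adds a local images/ directory (a placeholder path):

artifact.add_dir(local_path="images/")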

3. Save your artifact to the W&B server

Finally, save your artifact to the W&B server. Artifacts are associated with a run. Therefore, use a run object's log_artifact() method to save the artifact.

# Create a W&B Run. Replace 'job-type'.
run = wandb.init(project="artifacts-example", job_type="job-type")

run.log_artifact(artifact)

You can optionally construct an artifact outside of a W&B run. For more information, see Track external files.

Add files to an artifact

The following sections demonstrate how to construct artifacts with different file types and from parallel runs.

For the following examples, assume you have a project directory with multiple files and a directory structure:

project-directory
|-- images
|   |-- cat.png
|   +-- dog.png
|-- checkpoints
|   +-- model.h5
+-- model.h5

Add a single file

The following code snippet demonstrates how to add a single local file to your artifact:

# Add a single file
artifact.add_file(local_path="path/file.format")

For example, suppose you had a file called 'file.txt' in your local working directory.

artifact.add_file("path/file.txt")  # Added as 'file.txt'

The artifact now has the following content:

file.txt

Optionally, pass the desired path within the artifact for the name parameter.

artifact.add_file(local_path="path/file.txt", name="new/path/file.txt")

The artifact is stored as:

new/path/file.txt

API Call | Resulting artifact
artifact.add_file('model.h5') | model.h5
artifact.add_file('checkpoints/model.h5') | model.h5
artifact.add_file('model.h5', name='models/mymodel.h5') | models/mymodel.h5

Add multiple files

The following code snippet demonstrates how to add an entire local directory to your artifact:

# Recursively add a directory
artifact.add_dir(local_path="path/to/directory", name="optional-prefix")

The following API calls produce the following artifact content:

API Call | Resulting artifact
artifact.add_dir('images') | cat.png, dog.png
artifact.add_dir('images', name='images') | images/cat.png, images/dog.png
artifact.new_file('hello.txt') | hello.txt

Add a URI reference

Artifacts track checksums and other information for reproducibility if the URI has a scheme that the W&B library knows how to handle.

Add an external URI reference to an artifact with the add_reference method. Replace the 'uri' string with your own URI. Optionally pass the desired path within the artifact for the name parameter.

# Add a URI reference
artifact.add_reference(uri="uri", name="optional-name")

Artifacts currently support the following URI schemes:

  • http(s)://: A path to a file accessible over HTTP. The artifact will track checksums in the form of etags and size metadata if the HTTP server supports the ETag and Content-Length response headers.
  • s3://: A path to an object or object prefix in S3. The artifact will track checksums and versioning information (if the bucket has object versioning enabled) for the referenced objects. Object prefixes are expanded to include the objects under the prefix, up to a maximum of 10,000 objects.
  • gs://: A path to an object or object prefix in GCS. The artifact will track checksums and versioning information (if the bucket has object versioning enabled) for the referenced objects. Object prefixes are expanded to include the objects under the prefix, up to a maximum of 10,000 objects.

The following API calls produce the following artifact contents:

API call | Resulting artifact contents
artifact.add_reference('s3://my-bucket/model.h5') | model.h5
artifact.add_reference('s3://my-bucket/checkpoints/model.h5') | model.h5
artifact.add_reference('s3://my-bucket/model.h5', name='models/mymodel.h5') | models/mymodel.h5
artifact.add_reference('s3://my-bucket/images') | cat.png, dog.png
artifact.add_reference('s3://my-bucket/images', name='images') | images/cat.png, images/dog.png

Add files to artifacts from parallel runs

For large datasets or distributed training, multiple parallel runs might need to contribute to a single artifact.

import wandb
import time

# We will use ray to launch our runs in parallel
# for demonstration purposes. You can orchestrate
# your parallel runs however you want.
import ray

ray.init()

artifact_type = "dataset"
artifact_name = "parallel-artifact"
table_name = "distributed_table"
parts_path = "parts"
num_parallel = 5

# Each batch of parallel writers should have its own
# unique group name.
group_name = "writer-group-{}".format(round(time.time()))


@ray.remote
def train(i):
    """
    Our writer job. Each writer will add one image to the artifact.
    """
    with wandb.init(group=group_name) as run:
        artifact = wandb.Artifact(name=artifact_name, type=artifact_type)

        # Add data to a wandb table. In this case we use example data
        table = wandb.Table(columns=["a", "b", "c"], data=[[i, i * 2, 2**i]])

        # Add the table to folder in the artifact
        artifact.add(table, "{}/table_{}".format(parts_path, i))

        # Upserting the artifact creates or appends data to the artifact
        run.upsert_artifact(artifact)


# Launch your runs in parallel
result_ids = [train.remote(i) for i in range(num_parallel)]

# Join on all the writers to make sure their files have
# been added before finishing the artifact.
ray.get(result_ids)

# Once all the writers are finished, finish the artifact
# to mark it ready.
with wandb.init(group=group_name) as run:
    artifact = wandb.Artifact(artifact_name, type=artifact_type)

    # Create a "PartitionTable" pointing to the folder of tables
    # and add it to the artifact.
    artifact.add(wandb.data_types.PartitionedTable(parts_path), table_name)

    # Finish artifact finalizes the artifact, disallowing future "upserts"
    # to this version.
    run.finish_artifact(artifact)

1.2 - Download and use artifacts

Download and use Artifacts from multiple projects.

Download and use an artifact that is already stored on the W&B server, or construct an artifact object and pass it to use_artifact, which de-duplicates it as necessary.

Download and use an artifact stored on W&B

Download and use an artifact stored in W&B either inside or outside of a W&B Run. Use the Public API (wandb.Api) to export (or update) data already saved in W&B. For more information, see the W&B Public API Reference guide.

First, import the W&B Python SDK. Next, create a W&B Run:

import wandb

run = wandb.init(project="<example>", job_type="<job-type>")

Indicate the artifact you want to use with the use_artifact method. This returns an artifact object. The following code snippet specifies an artifact called 'bike-dataset' with the alias 'latest':

artifact = run.use_artifact("bike-dataset:latest")

Use the object returned to download all the contents of the artifact:

datadir = artifact.download()

You can optionally pass a path to the root parameter to download the contents of the artifact to a specific directory. For more information, see the Python SDK Reference Guide.
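
For example, a sketch that downloads the artifact contents into a specific directory (the path is a placeholder):

datadir = artifact.download(root="./artifacts/bike-dataset")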

Use the get_path method to download only a subset of files:

path = artifact.get_path(name)

This fetches only the file at the path name. It returns an Entry object with the following methods:

  • Entry.download: Downloads the file from the artifact at the path name
  • Entry.ref: If add_reference stored the entry as a reference, returns the URI
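
For example, a minimal sketch that downloads a single file from the artifact (the file name is a placeholder):

path = artifact.get_path("bike.png")  # returns an Entry for that file
local_path = path.download()  # downloads only bike.png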

References that have schemes that W&B knows how to handle get downloaded just like artifact files. For more information, see Track external files.

Alternatively, to use an artifact outside of a run, use the Public API. First, import the W&B SDK. Next, fetch the artifact with the Public API class. Provide the entity, project, artifact, and alias associated with that artifact:

import wandb

api = wandb.Api()
artifact = api.artifact("entity/project/artifact:alias")

Use the object returned to download the contents of the artifact:

artifact.download()

You can optionally pass a path to the root parameter to download the contents of the artifact to a specific directory. For more information, see the API Reference Guide.

Use the wandb artifact get command to download an artifact from the W&B server.

$ wandb artifact get project/artifact:alias --root mnist/

Partially download an artifact

You can optionally download part of an artifact based on a prefix. Using the path_prefix parameter, you can download a single file or the content of a sub-folder.

artifact = run.use_artifact("bike-dataset:latest")

artifact.download(path_prefix="bike.png") # downloads only bike.png

Alternatively, you can download files from a certain directory:

artifact.download(path_prefix="images/bikes/") # downloads files in the images/bikes directory

Use an artifact from a different project

Specify the name of the artifact along with its project name to reference an artifact. You can also reference artifacts across entities by specifying the name of the artifact with its entity name.

The following code example demonstrates how to query an artifact from another project as input to the current W&B run.

import wandb

run = wandb.init(project="<example>", job_type="<job-type>")
# Query W&B for an artifact from another project and mark it
# as an input to this run.
artifact = run.use_artifact("my-project/artifact:alias")

# Use an artifact from another entity and mark it as an input
# to this run.
artifact = run.use_artifact("my-entity/my-project/artifact:alias")

Construct and use an artifact simultaneously

Simultaneously construct and use an artifact. Create an artifact object and pass it to use_artifact. This creates an artifact in W&B if it does not exist yet. The use_artifact API is idempotent, so you can call it as many times as you like.

import wandb

run = wandb.init(project="<example>")
artifact = wandb.Artifact("reference_model", type="model")
artifact.add_file("model.h5")
run.use_artifact(artifact)

For more information about constructing an artifact, see Construct an artifact.

1.3 - Update an artifact

Update an existing Artifact inside and outside of a W&B Run.

Pass desired values to update the description, metadata, and alias of an artifact. Call the save() method to update the artifact on the W&B servers. You can update an artifact during a W&B Run or outside of a Run.

Use the W&B Public API (wandb.Api) to update an artifact outside of a run. Use the Artifact API (wandb.Artifact) to update an artifact during a run.

The following code example demonstrates how to update the description of an artifact using the wandb.Artifact API:

import wandb

run = wandb.init(project="<example>")
artifact = run.use_artifact("<artifact-name>:<alias>")
artifact.description = "<description>"
artifact.save()

The following code example demonstrates how to update the description of an artifact using the wandb.Api API:

import wandb

api = wandb.Api()

artifact = api.artifact("entity/project/artifact:alias")

# Update the description
artifact.description = "My new description"

# Selectively update metadata keys
artifact.metadata["oldKey"] = "new value"

# Replace the metadata entirely
artifact.metadata = {"newKey": "new value"}

# Add an alias
artifact.aliases.append("best")

# Remove an alias
artifact.aliases.remove("latest")

# Completely replace the aliases
artifact.aliases = ["replaced"]

# Persist all artifact modifications
artifact.save()

For more information, see the Weights and Biases Artifact API.

You can also update an Artifact collection in the same way as a singular artifact:

import wandb

run = wandb.init(project="<example>")
api = wandb.Api()
collection = api.artifact_collection(type="<type-name>", collection="<collection-name>")
collection.name = "<new-collection-name>"
collection.description = "<This is where you'd describe the purpose of your collection.>"
collection.save()

For more information, see the Artifacts Collection reference.

1.4 - Create an artifact alias

Create custom aliases for W&B Artifacts.

Use aliases as pointers to specific versions. By default, Run.log_artifact adds the latest alias to the logged version.

An artifact version v0 is created and attached to your artifact when you log an artifact for the first time. W&B checksums the contents when you log again to the same artifact. If the artifact changed, W&B saves a new version v1.

For example, if you want your training script to pull the most recent version of a dataset, specify latest when you use that artifact. The following code example demonstrates how to download the most recent version of a dataset artifact named bike-dataset using the alias latest:

import wandb

run = wandb.init(project="<example-project>")

artifact = run.use_artifact("bike-dataset:latest")

artifact.download()

You can also apply a custom alias to an artifact version. For example, if you want to mark a model checkpoint as the best on the AP-50 metric, you could add the string 'best-ap50' as an alias when you log the model artifact.

artifact = wandb.Artifact("run-3nq3ctyy-bike-model", type="model")
artifact.add_file("model.h5")
run.log_artifact(artifact, aliases=["latest", "best-ap50"])

1.5 - Create an artifact version

Create a new artifact version from a single run or from a distributed process.

Create a new artifact version with a single run or collaboratively with distributed runs. You can optionally create a new artifact version from a previous version known as an incremental artifact.

Create new artifact versions from scratch

There are two ways to create a new artifact version: from a single run and from distributed runs. They are defined as follows:

  • Single run: A single run provides all the data for a new version. This is the most common case and is best suited when the run fully recreates the needed data. For example: outputting saved models or model predictions in a table for analysis.
  • Distributed runs: A set of runs collectively provides all the data for a new version. This is best suited for distributed jobs which have multiple runs generating data, often in parallel. For example: evaluating a model in a distributed manner, and outputting the predictions.

W&B will create a new artifact and assign it a v0 alias if you pass a name to the wandb.Artifact API that does not exist in your project. W&B checksums the contents when you log again to the same artifact. If the artifact changed, W&B saves a new version v1.

W&B will retrieve an existing artifact if you pass a name and artifact type to the wandb.Artifact API that matches an existing artifact in your project. The retrieved artifact will have a version greater than 1.

Single run

Log a new version of an artifact with a single run. This case occurs when a single run produces all the files in the artifact.

Based on your use case, follow one of the approaches below to create a new artifact version inside or outside of a run:

Create an artifact version within a W&B run:

  1. Create a run with wandb.init. (Line 1)
  2. Create a new artifact or retrieve an existing one with wandb.Artifact. (Line 2)
  3. Add files to the artifact with .add_file. (Line 6)
  4. Log the artifact to the run with .log_artifact. (Line 7)
with wandb.init() as run:
    artifact = wandb.Artifact("artifact_name", "artifact_type")

    # Add Files and Assets to the artifact using
    # `.add`, `.add_file`, `.add_dir`, and `.add_reference`
    artifact.add_file("image1.png")
    run.log_artifact(artifact)

Create an artifact version outside of a W&B run:

  1. Create a new artifact or retrieve an existing one with wandb.Artifact. (Line 1)
  2. Add files to the artifact with .add_file. (Line 4)
  3. Save the artifact with .save. (Line 5)
artifact = wandb.Artifact("artifact_name", "artifact_type")
# Add Files and Assets to the artifact using
# `.add`, `.add_file`, `.add_dir`, and `.add_reference`
artifact.add_file("image1.png")
artifact.save()

Distributed runs

Allow a collection of runs to collaborate on a version before committing it. This is in contrast to single run mode described above where one run provides all the data for a new version.

Consider the following example. Different runs (labelled below as Run 1, Run 2, and Run 3) add a different image file to the same artifact with upsert_artifact.

Run 1:

with wandb.init() as run:
    artifact = wandb.Artifact("artifact_name", "artifact_type")
    # Add Files and Assets to the artifact using
    # `.add`, `.add_file`, `.add_dir`, and `.add_reference`
    artifact.add_file("image1.png")
    run.upsert_artifact(artifact, distributed_id="my_dist_artifact")

Run 2:

with wandb.init() as run:
    artifact = wandb.Artifact("artifact_name", "artifact_type")
    # Add Files and Assets to the artifact using
    # `.add`, `.add_file`, `.add_dir`, and `.add_reference`
    artifact.add_file("image2.png")
    run.upsert_artifact(artifact, distributed_id="my_dist_artifact")

Run 3

Must run after Run 1 and Run 2 complete. The Run that calls finish_artifact can include files in the artifact, but does not need to.

with wandb.init() as run:
    artifact = wandb.Artifact("artifact_name", "artifact_type")
    # Add Files and Assets to the artifact
    # `.add`, `.add_file`, `.add_dir`, and `.add_reference`
    artifact.add_file("image3.png")
    run.finish_artifact(artifact, distributed_id="my_dist_artifact")

Create a new artifact version from an existing version

Add, modify, or remove a subset of files from a previous artifact version without the need to re-index the files that didn’t change. Adding, modifying, or removing a subset of files from a previous artifact version creates a new artifact version known as an incremental artifact.

Here are some scenarios for each type of incremental change you might encounter:

  • add: you periodically add a new subset of files to a dataset after collecting a new batch.
  • remove: you discovered several duplicate files and want to remove them from your artifact.
  • update: you corrected annotations for a subset of files and want to replace the old files with the correct ones.

You could create an artifact from scratch to perform the same function as an incremental artifact. However, when you create an artifact from scratch, you will need to have all the contents of your artifact on your local disk. When making an incremental change, you can add, remove, or modify a single file without changing the files from a previous artifact version.

Follow the procedure below to incrementally change an artifact:

  1. Obtain the artifact version you want to perform an incremental change on. Inside a run:
saved_artifact = run.use_artifact("my_artifact:latest")
     Or outside of a run, with the Public API:
client = wandb.Api()
saved_artifact = client.artifact("my_artifact:latest")
  2. Create a draft with:
draft_artifact = saved_artifact.new_draft()
  3. Perform any incremental changes you want to see in the next version. You can either add, remove, or modify an existing entry.

The following examples show how to perform each of these changes:

Add a file to an existing artifact version with the add_file method:

draft_artifact.add_file("file_to_add.txt")

Remove a file from an existing artifact version with the remove method:

draft_artifact.remove("file_to_remove.txt")

Modify or replace contents by removing the old contents from the draft and adding the new contents back in:

draft_artifact.remove("modified_file.txt")
draft_artifact.add_file("modified_file.txt")
  4. Lastly, log or save your changes. Inside a W&B run:
run.log_artifact(draft_artifact)
     Outside of a run:
draft_artifact.save()

Putting it all together, the code examples above look like:

with wandb.init(job_type="modify dataset") as run:
    saved_artifact = run.use_artifact(
        "my_artifact:latest"
    )  # fetch artifact and input it into your run
    draft_artifact = saved_artifact.new_draft()  # create a draft version

    # modify a subset of files in the draft version
    draft_artifact.add_file("file_to_add.txt")
    draft_artifact.remove("dir_to_remove/")
    run.log_artifact(
        draft_artifact
    )  # log your changes to create a new version and mark it as output to your run

Outside of a run, the same workflow with the Public API looks like:

client = wandb.Api()
saved_artifact = client.artifact("my_artifact:latest")  # load your artifact
draft_artifact = saved_artifact.new_draft()  # create a draft version

# modify a subset of files in the draft version
draft_artifact.remove("deleted_file.txt")
draft_artifact.add_file("modified_file.txt")
draft_artifact.save()  # commit changes to the draft

1.6 - Track external files

Track files saved outside of W&B, such as in an Amazon S3 bucket, GCS bucket, HTTP file server, or even an NFS share.

Use reference artifacts to track files saved outside the W&B system, for example in an Amazon S3 bucket, GCS bucket, Azure blob, HTTP file server, or even an NFS share. Log artifacts outside of a W&B Run with the W&B CLI.

Log artifacts outside of runs

W&B creates a run when you log an artifact outside of a run. Each artifact belongs to a run, which in turn belongs to a project. An artifact (version) also belongs to a collection, and has a type.

Use the wandb artifact put command to upload an artifact to the W&B server outside of a W&B run. Provide the name of the project you want the artifact to belong to along with the name of the artifact (project/artifact_name). Optionally provide the type (TYPE). Replace PATH in the code snippet below with the file path of the artifact you want to upload.

$ wandb artifact put --name project/artifact_name --type TYPE PATH

W&B creates a new project if the project you specify does not exist. For information on how to download an artifact, see Download and use artifacts.

Track artifacts outside of W&B

Use W&B Artifacts for dataset versioning and model lineage, and use reference artifacts to track files saved outside the W&B server. In this mode an artifact only stores metadata about the files, such as URLs, size, and checksums. The underlying data never leaves your system. See the Quick start for information on how to save files and directories to W&B servers instead.

The following describes how to construct reference artifacts and how to best incorporate them into your workflows.

Amazon S3 / GCS / Azure Blob Storage References

Use W&B Artifacts for dataset and model versioning to track references in cloud storage buckets. With artifact references, seamlessly layer tracking on top of your buckets with no modifications to your existing storage layout.

Artifacts abstract away the underlying cloud storage vendor (such as AWS, GCP, or Azure). The information in this section applies uniformly to Amazon S3, Google Cloud Storage, and Azure Blob Storage.

Assume we have a bucket with the following structure:

s3://my-bucket
+-- datasets/
|   +-- mnist/
+-- models/
    +-- cnn/

Under mnist/ we have our dataset, a collection of images. Let's track it with an artifact:

import wandb

run = wandb.init()
artifact = wandb.Artifact("mnist", type="dataset")
artifact.add_reference("s3://my-bucket/datasets/mnist")
run.log_artifact(artifact)

Our new reference artifact mnist:latest looks and behaves similarly to a regular artifact. The only difference is that the artifact only consists of metadata about the S3/GCS/Azure object such as its ETag, size, and version ID (if object versioning is enabled on the bucket).

W&B will use the default mechanism to look for credentials based on the cloud provider you use. Read the documentation from your cloud provider to learn more about the credentials used:

Cloud provider | Credentials documentation
AWS | Boto3 documentation
GCP | Google Cloud documentation
Azure | Azure documentation

For AWS, if the bucket is not located in the configured user’s default region, you must set the AWS_REGION environment variable to match the bucket region.
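
For example, a sketch that sets the region before running your training script (the region value and script name are placeholders):

$ export AWS_REGION=us-west-2
$ python train.py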

Interact with this artifact similarly to a normal artifact. In the App UI, you can look through the contents of the reference artifact using the file browser, explore the full dependency graph, and scan through the versioned history of your artifact.

Download a reference artifact

import wandb

run = wandb.init()
artifact = run.use_artifact("mnist:latest", type="dataset")
artifact_dir = artifact.download()

W&B will use the metadata recorded when the artifact was logged to retrieve the files from the underlying bucket when it downloads a reference artifact. If your bucket has object versioning enabled, W&B will retrieve the object version corresponding to the state of the file at the time an artifact was logged. This means that as you evolve the contents of your bucket, you can still point to the exact iteration of your data a given model was trained on since the artifact serves as a snapshot of your bucket at the time of training.

Tying it together

The following code example demonstrates a simple workflow you can use to track a dataset in Amazon S3, GCS, or Azure that feeds into a training job:

import wandb

run = wandb.init()

artifact = wandb.Artifact("mnist", type="dataset")
artifact.add_reference("s3://my-bucket/datasets/mnist")

# Track the artifact and mark it as an input to
# this run in one swoop. A new artifact version
# is only logged if the files in the bucket changed.
run.use_artifact(artifact)

artifact_dir = artifact.download()

# Perform training here...

To track models, we can log the model artifact after the training script uploads the model files to the bucket:

import boto3
import wandb

run = wandb.init()

# Training here...

s3_client = boto3.client("s3")
s3_client.upload_file("my_model.h5", "my-bucket", "models/cnn/my_model.h5")

model_artifact = wandb.Artifact("cnn", type="model")
model_artifact.add_reference("s3://my-bucket/models/cnn/")
run.log_artifact(model_artifact)

Filesystem References

Another common pattern for fast access to datasets is to expose an NFS mount point to a remote filesystem on all machines running training jobs. This can be an even simpler solution than a cloud storage bucket because from the perspective of the training script, the files look just like they are sitting on your local filesystem. Luckily, that ease of use extends into using Artifacts to track references to file systems, whether they are mounted or not.

Assume we have a filesystem mounted at /mount with the following structure:

mount
+-- datasets/
|   +-- mnist/
+-- models/
    +-- cnn/

Under mnist/ we have our dataset, a collection of images. Let’s track it with an artifact:

import wandb

run = wandb.init()
artifact = wandb.Artifact("mnist", type="dataset")
artifact.add_reference("file:///mount/datasets/mnist/")
run.log_artifact(artifact)

By default, W&B imposes a 10,000 file limit when adding a reference to a directory. You can adjust this limit by specifying max_objects= in calls to add_reference.
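
For example, a sketch that raises the limit for a large directory (the value shown is arbitrary):

artifact.add_reference("file:///mount/datasets/mnist/", max_objects=50000)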

Note the triple slash in the URL. The first component is the file:// prefix that denotes the use of filesystem references. The second is the path to our dataset, /mount/datasets/mnist/.

The resulting artifact mnist:latest looks and acts just like a regular artifact. The only difference is that the artifact only consists of metadata about the files, such as their sizes and MD5 checksums. The files themselves never leave your system.

You can interact with this artifact just as you would a normal artifact. In the UI, you can browse the contents of the reference artifact using the file browser, explore the full dependency graph, and scan through the versioned history of your artifact. However, the UI will not be able to render rich media such as images, audio, etc. as the data itself is not contained within the artifact.

Downloading a reference artifact is simple:

import wandb

run = wandb.init()
artifact = run.use_artifact("entity/project/mnist:latest", type="dataset")
artifact_dir = artifact.download()

For filesystem references, a download() operation copies the files from the referenced paths to construct the artifact directory. In the above example, the contents of /mount/datasets/mnist will be copied into the directory artifacts/mnist:v0/. If an artifact contains a reference to a file that was overwritten, then download() will throw an error as the artifact can no longer be reconstructed.

Putting everything together, here’s a simple workflow you can use to track a dataset under a mounted filesystem that feeds into a training job:

import wandb

run = wandb.init()

artifact = wandb.Artifact("mnist", type="dataset")
artifact.add_reference("file:///mount/datasets/mnist/")

# Track the artifact and mark it as an input to
# this run in one swoop. A new artifact version
# is only logged if the files under the directory
# changed.
run.use_artifact(artifact)

artifact_dir = artifact.download()

# Perform training here...

To track models, we can log the model artifact after the training script writes the model files to the mount point:

import wandb

run = wandb.init()

# Training here...

# Write model to disk

model_artifact = wandb.Artifact("cnn", type="model")
model_artifact.add_reference("file:///mount/cnn/my_model.h5")
run.log_artifact(model_artifact)

1.7 - Manage data

1.7.1 - Delete an artifact

Delete artifacts interactively with the App UI or programmatically with the W&B SDK.

Delete artifacts interactively with the App UI or programmatically with the W&B SDK. When you delete an artifact, W&B marks that artifact as a soft-delete. In other words, the artifact is marked for deletion but files are not immediately deleted from storage.

The contents of the artifact remain in a soft-delete, or pending deletion, state until a regularly run garbage collection process reviews all artifacts marked for deletion. The garbage collection process deletes associated files from storage if the artifact and its associated files are not used by previous or subsequent artifact versions.

The sections in this page describe how to delete specific artifact versions, how to delete an artifact collection, how to delete artifacts with and without aliases, and more. You can schedule when artifacts are deleted from W&B with TTL policies. For more information, see Manage data retention with Artifact TTL policy.

Delete an artifact version

To delete an artifact version:

  1. Select the name of the artifact. This will expand the artifact view and list all the artifact versions associated with that artifact.
  2. From the list of artifacts, select the artifact version you want to delete.
  3. On the right hand side of the workspace, select the kebab dropdown.
  4. Choose Delete.

An artifact version can also be deleted programmatically via the delete() method. See the examples below.
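
For example, a minimal sketch that deletes one specific version with the Public API (the path components are placeholders):

import wandb

api = wandb.Api()
artifact = api.artifact("entity/project/artifact-name:v1")
artifact.delete()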

Delete multiple artifact versions with aliases

The following code example demonstrates how to delete artifacts that have aliases associated with them. Provide the entity, project name, and run ID that created the artifacts.

import wandb

api = wandb.Api()
run = api.run("entity/project/run_id")

for artifact in run.logged_artifacts():
    artifact.delete()

Set the delete_aliases parameter to True to also delete any aliases associated with the artifact when it is deleted.

import wandb

api = wandb.Api()
run = api.run("entity/project/run_id")

for artifact in run.logged_artifacts():
    # Set delete_aliases=True in order to delete
    # artifacts with one or more aliases
    artifact.delete(delete_aliases=True)

Delete multiple artifact versions with a specific alias

The following code demonstrates how to delete multiple artifact versions that have a specific alias. Provide the entity, project name, and run ID that created the artifacts. Replace the deletion logic with your own:

import wandb

api = wandb.Api()
run = api.run("entity/project_name/run_id")

# Delete artifact versions with alias 'v3' or 'v4'
for artifact_version in run.logged_artifacts():
    # Replace with your own deletion logic.
    if artifact_version.name[-2:] == "v3" or artifact_version.name[-2:] == "v4":
        artifact_version.delete(delete_aliases=True)

Delete all versions of an artifact that do not have an alias

The following code snippet demonstrates how to delete all versions of an artifact that do not have an alias. Provide the name of the project and entity for the project and entity keys in wandb.Api, respectively. Replace the <> with the name of your artifact:

import wandb

# Provide your entity and a project name when you
# use wandb.Api methods.
api = wandb.Api(overrides={"project": "project", "entity": "entity"})

artifact_type, artifact_name = "<type>", "<name>"  # provide type and name
for v in api.artifact_versions(artifact_type, artifact_name):
    # Clean up versions that don't have an alias such as 'latest'.
    # NOTE: You can put whatever deletion logic you want here.
    if len(v.aliases) == 0:
        v.delete()

Delete an artifact collection

To delete an artifact collection:

  1. Navigate to the artifact collection you want to delete and hover over it.
  2. Select the kebab dropdown next to the artifact collection name.
  3. Choose Delete.

You can also delete an artifact collection programmatically with the delete() method. Provide the name of the project and entity for the project and entity keys in wandb.Api, respectively:

import wandb

# Provide your entity and a project name when you
# use wandb.Api methods.
api = wandb.Api(overrides={"project": "project", "entity": "entity"})
collection = api.artifact_collection(
    "<artifact_type>", "entity/project/artifact_collection_name"
)
collection.delete()

How to enable garbage collection based on how W&B is hosted

Garbage collection is enabled by default if you use W&B's shared cloud. Based on how you host W&B, you might need to take additional steps to enable garbage collection:

  • Set the GORILLA_ARTIFACT_GC_ENABLED environment variable to true: GORILLA_ARTIFACT_GC_ENABLED=true
  • Enable bucket versioning if you use AWS, GCP or any other storage provider such as Minio. If you use Azure, enable soft deletion.

The following table describes how to satisfy requirements to enable garbage collection based on your deployment type.

The X indicates you must satisfy the requirement:

Deployment type | Environment variable | Enable versioning
Shared cloud | |
Shared cloud with secure storage connector | | X
Dedicated cloud | |
Dedicated cloud with secure storage connector | | X
Customer-managed cloud | X | X
Customer-managed on-prem | X | X

1.7.2 - Manage artifact data retention

Time to live policies (TTL)

Schedule when artifacts are deleted from W&B with W&B Artifact time-to-live (TTL) policy. When you delete an artifact, W&B marks that artifact as a soft-delete. In other words, the artifact is marked for deletion but files are not immediately deleted from storage. For more information on how W&B deletes artifacts, see the Delete artifacts page.

Check out this video tutorial to learn how to manage data retention with Artifacts TTL in the W&B App.

Auto-generated Artifacts

Only user-generated artifacts can use TTL policies. Artifacts auto-generated by W&B cannot have TTL policies set for them.

The following Artifact types indicate an auto-generated Artifact:

  • run_table
  • code
  • job
  • Any Artifact type starting with: wandb-*

You can check an Artifact’s type on the W&B platform or programmatically:

import wandb

run = wandb.init(project="<my-project-name>")
artifact = run.use_artifact(artifact_or_name="<my-artifact-name>")
print(artifact.type)

Replace the values enclosed with <> with your own.

Define who can edit and set TTL policies

Define who can set and edit TTL policies within a team. You can either grant TTL permissions only to team admins, or you can grant both team admins and team members TTL permissions.

  1. Navigate to your team’s profile page.
  2. Select the Settings tab.
  3. Navigate to the Artifacts time-to-live (TTL) section.
  4. From the TTL permissions dropdown, select who can set and edit TTL policies.
  5. Click on Review and save settings.
  6. Confirm the changes and select Save settings.

Create a TTL policy

Set a TTL policy for an artifact either when you create the artifact or retroactively after the artifact is created.

For all the code snippets below, replace the content wrapped in <> with your information to use the code snippet.

Set a TTL policy when you create an artifact

Use the W&B Python SDK to define a TTL policy when you create an artifact. TTL policies are typically defined in days.

The steps are as follows:

  1. Create an artifact.
  2. Add content to the artifact such as files, a directory, or a reference.
  3. Define a TTL time limit with the datetime.timedelta data type that is part of Python’s standard library.
  4. Log the artifact.

The following code snippet demonstrates how to create an artifact and set a TTL policy.

import wandb
from datetime import timedelta

run = wandb.init(project="<my-project-name>", entity="<my-entity>")
artifact = wandb.Artifact(name="<artifact-name>", type="<type>")
artifact.add_file("<my_file>")

artifact.ttl = timedelta(days=30)  # Set TTL policy
run.log_artifact(artifact)

The preceding code snippet sets the TTL policy for the artifact to 30 days. In other words, W&B deletes the artifact after 30 days.

Set or edit a TTL policy after you create an artifact

Use the W&B App UI or the W&B Python SDK to define a TTL policy for an artifact that already exists.

  1. Fetch your artifact.
  2. Pass in a time delta to the artifact’s ttl attribute.
  3. Update the artifact with the save method.

The following code snippet shows how to set a TTL policy for an artifact:

import wandb
from datetime import timedelta

artifact = run.use_artifact("<my-entity/my-project/my-artifact:alias>")
artifact.ttl = timedelta(days=365 * 2)  # Delete in two years
artifact.save()

The preceding code example sets the TTL policy to two years.

To set or edit a TTL policy with the W&B App UI instead:

  1. Navigate to your W&B project in the W&B App UI.
  2. Select the artifact icon on the left panel.
  3. From the list of artifacts, expand the artifact type you are interested in.
  4. Select the artifact version you want to edit the TTL policy for.
  5. Click on the Version tab.
  6. From the dropdown, select Edit TTL policy.
  7. Within the modal that appears, select Custom from the TTL policy dropdown.
  8. Within the TTL duration field, set the TTL policy in units of days.
  9. Select the Update TTL button to save your changes.

Set default TTL policies for a team

Set a default TTL policy for your team. Default TTL policies apply to all existing and future artifacts based on their respective creation dates. Artifacts with existing version-level TTL policies are not affected by the team’s default TTL.

  1. Navigate to your team’s profile page.
  2. Select the Settings tab.
  3. Navigate to the Artifacts time-to-live (TTL) section.
  4. Click on the Set team’s default TTL policy.
  5. Within the Duration field, set the TTL policy in units of days.
  6. Click on Review and save settings.
  7. Confirm the changes and then select Save settings.

Set a TTL policy outside of a run

Use the public API to retrieve an artifact without fetching a run, and set the TTL policy. TTL policies are typically defined in days.

The following code sample shows how to fetch an artifact using the public API and set the TTL policy.

import wandb
from datetime import timedelta

api = wandb.Api()

artifact = api.artifact("entity/project/artifact:alias")

artifact.ttl = timedelta(days=365)  # Delete in one year

artifact.save()

Deactivate a TTL policy

Use the W&B Python SDK or W&B App UI to deactivate a TTL policy for a specific artifact version.

  1. Fetch your artifact.
  2. Set the artifact’s ttl attribute to None.
  3. Update the artifact with the save method.

The following code snippet shows how to turn off a TTL policy for an artifact:

artifact = run.use_artifact("<my-entity/my-project/my-artifact:alias>")
artifact.ttl = None
artifact.save()

To deactivate a TTL policy with the W&B App UI instead:

  1. Navigate to your W&B project in the W&B App UI.
  2. Select the artifact icon on the left panel.
  3. From the list of artifacts, expand the artifact type you are interested in.
  4. Select the artifact version you want to edit the TTL policy for.
  5. Click on the Version tab.
  6. Click on the meatball UI icon next to the Link to registry button.
  7. From the dropdown, select Edit TTL policy.
  8. Within the modal that appears, select Deactivate from the TTL policy dropdown.
  9. Select the Update TTL button to save your changes.

View TTL policies

View TTL policies for artifacts with the Python SDK or with the W&B App UI.

Use a print statement to view an artifact’s TTL policy. The following example shows how to retrieve an artifact and view its TTL policy:

artifact = run.use_artifact("<my-entity/my-project/my-artifact:alias>")
print(artifact.ttl)

To view a TTL policy for an artifact with the W&B App UI:

  1. Navigate to the W&B App at https://wandb.ai.
  2. Go to your W&B Project.
  3. Within your project, select the Artifacts tab in the left sidebar.
  4. Click on a collection.

Within the collection view you can see all of the artifacts in the selected collection. Within the Time to Live column you will see the TTL policy assigned to that artifact.

1.7.3 - Manage artifact storage and memory allocation

Manage storage and memory allocation of W&B Artifacts.

W&B stores artifact files in a private Google Cloud Storage bucket located in the United States by default. All files are encrypted at rest and in transit.

For sensitive files, we recommend you set up Private Hosting or use reference artifacts.

During training, W&B locally saves logs, artifacts, and configuration files in the following local directories:

File | Default location | To change default location, set:
logs | ./wandb | dir in wandb.init or the WANDB_DIR environment variable
artifacts | ~/.cache/wandb | the WANDB_CACHE_DIR environment variable
configs | ~/.config/wandb | the WANDB_CONFIG_DIR environment variable
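
For example, a sketch that redirects all three locations through environment variables before calling wandb.init (the paths are placeholders):

import os

os.environ["WANDB_DIR"] = "/data/wandb-logs"
os.environ["WANDB_CACHE_DIR"] = "/data/wandb-cache"
os.environ["WANDB_CONFIG_DIR"] = "/data/wandb-config"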

Clean up local artifact cache

W&B caches artifact files to speed up downloads across versions that share files in common. Over time this cache directory can become large. Run the wandb artifact cache cleanup command to prune the cache and to remove any files that have not been used recently.

The following command limits the size of the cache to 1GB. Copy and paste it into your terminal:

$ wandb artifact cache cleanup 1GB

1.8 - Explore artifact graphs

Traverse automatically created directed acyclic W&B Artifact graphs.

W&B automatically tracks the artifacts a given run logged as well as the artifacts a given run uses. These artifacts can include datasets, models, evaluation results, or more. You can explore an artifact’s lineage to track and manage the various artifacts produced throughout the machine learning lifecycle.

Lineage

Tracking an artifact’s lineage has several key benefits:

  • Reproducibility: By tracking the lineage of all artifacts, teams can reproduce experiments, models, and results, which is essential for debugging, experimentation, and validating machine learning models.

  • Version Control: Artifact lineage involves versioning artifacts and tracking their changes over time. This allows teams to roll back to previous versions of data or models if needed.

  • Auditing: Having a detailed history of the artifacts and their transformations enables organizations to comply with regulatory and governance requirements.

  • Collaboration and Knowledge Sharing: Artifact lineage facilitates better collaboration among team members by providing a clear record of attempts as well as what worked, and what didn’t. This helps in avoiding duplication of efforts and accelerates the development process.

Finding an artifact’s lineage

When selecting an artifact in the Artifacts tab, you can see your artifact’s lineage. This graph view shows a general overview of your pipeline.

To view an artifact graph:

  1. Navigate to your project in the W&B App UI
  2. Choose the artifact icon on the left panel.
  3. Select Lineage.
Getting to the Lineage tab

The artifact or job type you provide appears in front of its name, with artifacts represented by blue icons and runs represented by green icons. Arrows detail the input and output of a run or artifact on the graph.

Run and artifact nodes Inputs and outputs

For a more detailed view, click any individual artifact or run to get more information on a particular object.

Previewing a run

Artifact clusters

When a level of the graph has five or more runs or artifacts, it creates a cluster. A cluster has a search bar to find specific versions of runs or artifacts, and you can pull an individual node out of a cluster to continue investigating its lineage.

Clicking on a node opens a preview with an overview of the node. Clicking on the arrow extracts the individual run or artifact so you can examine the lineage of the extracted node.

Searching a run cluster

Use the API to track lineage

You can also navigate a graph using the W&B API.

Create an artifact. First, create a run with wandb.init. Then, create a new artifact or retrieve an existing one with wandb.Artifact. Next, add files to the artifact with .add_file. Finally, log the artifact to the run with .log_artifact. The finished code looks something like this:

with wandb.init() as run:
    artifact = wandb.Artifact("artifact_name", "artifact_type")

    # Add Files and Assets to the artifact using
    # `.add`, `.add_file`, `.add_dir`, and `.add_reference`
    artifact.add_file("image1.png")
    run.log_artifact(artifact)

Use the artifact object’s logged_by and used_by methods to walk the graph from the artifact:

# Walk up and down the graph from an artifact:
producer_run = artifact.logged_by()
consumer_runs = artifact.used_by()
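
For example, here is a sketch of walking one hop further, assuming the artifact above has been logged; logged_by returns the producing run and used_by returns the consuming runs:

# Print the runs that consumed the artifact, and the artifacts
# that the producing run itself used as inputs.
for consumer_run in consumer_runs:
    print(consumer_run.name)

for parent_artifact in producer_run.used_artifacts():
    print(parent_artifact.name)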

1.9 - Artifact data privacy and compliance

Learn where W&B files are stored by default, and explore how to save and store sensitive information.

Files are uploaded to a Google Cloud bucket managed by W&B when you log artifacts. The contents of the bucket are encrypted both at rest and in transit. Artifact files are only visible to users who have access to the corresponding project.

GCS W&B Client Server diagram

When you delete a version of an artifact, it is marked for soft deletion in our database and removed from your storage cost. When you delete an entire artifact, it is queued for permanent deletion and all of its contents are removed from the W&B bucket. If you have specific needs around file deletion, please reach out to Customer Support.

For sensitive datasets that cannot reside in a multi-tenant environment, you can use either a private W&B server connected to your cloud bucket or reference artifacts. Reference artifacts track references to private buckets without sending file contents to W&B. Reference artifacts maintain links to files on your buckets or servers. In other words, W&B only keeps track of the metadata associated with the files and not the files themselves.

W&B Client Server Cloud diagram

Create a reference artifact similarly to how you create a non-reference artifact:

import wandb

run = wandb.init()
artifact = wandb.Artifact("animals", type="dataset")
artifact.add_reference("s3://my-bucket/animals")

For alternatives, contact us at contact@wandb.com to talk about private cloud and on-premises installations.

1.10 - Tutorial: Create, track, and use a dataset artifact

Artifacts quickstart shows how to create, track, and use a dataset artifact with W&B.

This walkthrough demonstrates how to create, track, and use a dataset artifact from W&B Runs.

1. Log into W&B

Import the W&B library and log in to W&B. You will need to sign up for a free W&B account if you have not done so already.

import wandb

wandb.login()

2. Initialize a run

Use the wandb.init() API to generate a background process to sync and log data as a W&B Run. Provide a project name and a job type:

# Create a W&B Run. Here we specify 'dataset' as the job type since this example
# shows how to create a dataset artifact.
run = wandb.init(project="artifacts-example", job_type="upload-dataset")

3. Create an artifact object

Create an artifact object with the wandb.Artifact() API. Provide a name for the artifact and its type for the name and type parameters, respectively.

For example, the following code snippet demonstrates how to create an artifact called 'bicycle-dataset' with a 'dataset' type:

artifact = wandb.Artifact(name="bicycle-dataset", type="dataset")

For more information about how to construct an artifact, see Construct artifacts.

Add the dataset to the artifact

Add a file to the artifact. Common file types include models and datasets. The following example adds a dataset named dataset.h5 that is saved locally on our machine to the artifact:

# Add a file to the artifact's contents
artifact.add_file(local_path="dataset.h5")

Replace the filename dataset.h5 in the preceding code snippet with the path to the file you want to add to the artifact.

4. Log the dataset

Use the W&B run object's log_artifact() method to both save your artifact version and declare the artifact as an output of the run.

# Save the artifact version to W&B and mark it
# as the output of this run
run.log_artifact(artifact)

A 'latest' alias is created by default when you log an artifact. For more information about artifact aliases and versions, see Create a custom alias and Create new artifact versions, respectively.

5. Download and use the artifact

The following code example demonstrates the steps you can take to use an artifact you have logged and saved to the W&B servers.

  1. First, initialize a new run object with wandb.init().
  2. Second, use the run object's use_artifact() method to tell W&B what artifact to use. This returns an artifact object.
  3. Third, use the artifact's download() method to download the contents of the artifact.
# Create a W&B Run. Here we specify 'training' as the job type
# because we will use this run to track training.
run = wandb.init(project="artifacts-example", job_type="training")

# Query W&B for an artifact and mark it as input to this run
artifact = run.use_artifact("bicycle-dataset:latest")

# Download the artifact's contents
artifact_dir = artifact.download()

Alternatively, you can use the Public API (wandb.Api) to export (or update) data already saved in W&B outside of a Run. See Track external files for more information.

2 - Tables

Iterate on datasets and understand model predictions

Use W&B Tables to visualize and query tabular data. For example:

  • Compare how different models perform on the same test set
  • Identify patterns in your data
  • Look at sample model predictions visually
  • Query to find commonly misclassified examples

The above image shows a table with semantic segmentation and custom metrics. View this table here in this sample project from the W&B ML Course.

How it works

A Table is a two-dimensional grid of data where each column has a single type of data. Tables support primitive and numeric types, as well as nested lists, dictionaries, and rich media types.
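
For example, a minimal sketch of a table that mixes numeric, text, and rich media columns (the image paths are placeholders):

import wandb

table = wandb.Table(columns=["id", "label", "image"])
table.add_data(0, "cat", wandb.Image("examples/cat.png"))
table.add_data(1, "dog", wandb.Image("examples/dog.png"))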

Log a Table

Log a table with a few lines of code:

  • wandb.init(): Create a run to track results.
  • wandb.Table(): Create a new table object.
    • columns: Set the column names.
    • data: Set the contents of the table.
  • run.log(): Log the table to save it to W&B.
import wandb

run = wandb.init(project="table-test")
my_table = wandb.Table(columns=["a", "b"], data=[["a1", "b1"], ["a2", "b2"]])
run.log({"Table Name": my_table})

How to get started

  • Quickstart: Learn to log data tables, visualize data, and query data.
  • Tables Gallery: See example use cases for Tables.

2.1 - Tutorial: Log tables, visualize and query data

Explore how to use W&B Tables with this 5 minute Quickstart.

The following Quickstart demonstrates how to log data tables, visualize data, and query data.

1. Log a table

Log a table with W&B. You can either construct a new table or pass a Pandas Dataframe.

To construct and log a new Table, you will use:

  • wandb.init(): Create a run to track results.
  • wandb.Table(): Create a new table object.
    • columns: Set the column names.
    • data: Set the contents of each row.
  • run.log(): Log the table to save it to W&B.

Here’s an example:

import wandb

run = wandb.init(project="table-test")
# Create and log a new table.
my_table = wandb.Table(columns=["a", "b"], data=[["a1", "b1"], ["a2", "b2"]])
run.log({"Table Name": my_table})

Alternatively, pass a Pandas DataFrame to wandb.Table() to create a new table.

import wandb
import pandas as pd

df = pd.read_csv("my_data.csv")

run = wandb.init(project="df-table")
my_table = wandb.Table(dataframe=df)
wandb.log({"Table Name": my_table})

For more information on supported data types, see wandb.Table in the W&B API Reference Guide.

2. Visualize tables in your project workspace

View the resulting table in your workspace.

  1. Navigate to your project in the W&B App.
  2. Select the name of your run in your project workspace. A new panel is added for each unique table key.

In this example, my_table is logged under the key "Table Name".

3. Compare across model versions

Log sample tables from multiple W&B Runs and compare results in the project workspace. In this example workspace, we show how to combine rows from multiple different versions in the same table.

Use the table filter, sort, and grouping features to explore and evaluate model results.

2.2 - Visualize and analyze tables

Visualize and analyze W&B Tables.

Customize your W&B Tables to answer questions about your machine learning model’s performance, analyze your data, and more.

Interactively explore your data to:

  • Compare changes precisely across models, epochs, or individual examples
  • Understand higher-level patterns in your data
  • Capture and communicate your insights with visual samples

How to view two tables

Compare two tables with a merged view or a side-by-side view. For example, the image below demonstrates a table comparison of MNIST data.

Left: mistakes after 1 training epoch. Right: mistakes after 5 epochs

Follow these steps to compare two tables:

  1. Go to your project in the W&B App.
  2. Select the artifacts icon on the left panel.
  3. Select an artifact version.

In the following image we demonstrate a model’s predictions on MNIST validation data after each of five epochs (view interactive example here).

Click on 'predictions' to view the Table
  4. Hover over the second artifact version you want to compare in the sidebar and click Compare when it appears. For example, in the image below we select a version labeled as “v4” to compare to MNIST predictions made by the same model after 5 epochs of training.
Preparing to compare model predictions after training for 1 epoch (v0, shown here) vs 5 epochs (v4)

Merged view

Initially you see both tables merged together. The first table selected has index 0 and a blue highlight, and the second table has index 1 and a yellow highlight. View a live example of merged tables here.

In the merged view, numerical columns appear as histograms by default

From the merged view, you can

  • choose the join key: use the dropdown at the top left to set the column to use as the join key for the two tables. Typically this is the unique identifier of each row, such as the filename of a specific example in your dataset or an incrementing index on your generated samples. Note that it’s currently possible to select any column, which may yield illegible tables and slow queries.
  • concatenate instead of join: select “concatenating all tables” in this dropdown to union all the rows from both tables into one larger Table instead of joining across their columns
  • reference each Table explicitly: use 0, 1, and * in the filter expression to explicitly specify a column in one or both table instances
  • visualize detailed numerical differences as histograms: compare the values in any cell at a glance

Side-by-side view

To view the two tables side-by-side, change the first dropdown from “Merge Tables: Table” to “List of: Table” and then adjust the “Page size” accordingly. The first Table selected appears on the left and the second on the right. You can also compare the tables vertically by clicking the “Vertical” checkbox.

In the side-by-side view, Table rows are independent of each other.
  • compare the tables at a glance: apply any operations (sort, filter, group) to both tables in tandem and spot any changes or differences quickly. For example, view the incorrect predictions grouped by guess, the hardest negatives overall, the confidence score distribution by true label, etc.
  • explore two tables independently: scroll through and focus on the side/rows of interest

Compare artifacts

You can also compare tables across time or model variants.

Compare tables across time

Log a table in an artifact for each meaningful step of training to analyze model performance over training time. For example, you could log a table at the end of every validation step, after every 50 epochs of training, or any frequency that makes sense for your pipeline. Use the side-by-side view to visualize changes in model predictions.
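
A minimal sketch of this pattern, using a placeholder model and validation set, might look like the following:

import wandb

# Placeholder validation data and model
val_data = [(i, i % 2) for i in range(10)]

def predict(x):
    return x % 2

run = wandb.init(project="table-over-time")
for epoch in range(5):
    table = wandb.Table(columns=["input", "prediction", "truth"])
    for x, y in val_data:
        table.add_data(x, predict(x), y)
    artifact = wandb.Artifact("val_predictions", type="predictions")
    artifact.add(table, "predictions")
    # Alias each version with its epoch so versions are easy to compare later
    run.log_artifact(artifact, aliases=[f"epoch_{epoch}"])
run.finish()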

For each label, the model makes fewer mistakes after 5 training epochs (R) than after 1 (L)

For a more detailed walkthrough of visualizing predictions across training time, see this report and this interactive notebook example.

Compare tables across model variants

Compare two artifact versions logged at the same step for two different models to analyze model performance across different configurations (hyperparameters, base architectures, and so forth).

For example, compare predictions between a baseline and a new model variant, 2x_layers_2x_lr, where the first convolutional layer doubles from 32 to 64, the second from 128 to 256, and the learning rate from 0.001 to 0.002. From this live example, use the side-by-side view and filter down to the incorrect predictions after 1 (left tab) versus 5 training epochs (right tab).

After 1 epoch, performance is mixed: precision improves for some classes and worsens for others.
After 5 epochs, the 'double' variant is catching up to the baseline.

Save your view

Tables you interact with in the run workspace, project workspace, or a report automatically save their view state. If you apply any table operations and then close your browser, the table retains the last viewed configuration when you next navigate to the table.

To save a table from a workspace in a particular state, export it to a W&B Report. To export a table to a report:

  1. Select the kebab icon (three vertical dots) in the top right corner of your workspace visualization panel.
  2. Select either Share panel or Add to report.
Share panel creates a new report, Add to report lets you append to an existing report.

Examples

The reports in the following section highlight different use cases of W&B Tables.

2.3 - Example tables

Examples of W&B Tables

The following sections highlight some of the ways you can use tables:

View your data

Log metrics and rich media during model training or evaluation, then visualize results in a persistent database synced to the cloud, or to your hosting instance.

Browse examples and verify the counts and distribution of your data

For example, check out this table that shows a balanced split of a photos dataset.

Interactively explore your data

View, sort, filter, group, join, and query tables to understand your data and model performance—no need to browse static files or rerun analysis scripts.

Listen to original songs and their synthesized versions (with timbre transfer)

For example, see this report on style-transferred audio.

Compare model versions

Quickly compare results across different training epochs, datasets, hyperparameter choices, model architectures, and so on.

See granular differences: the left model detects some red sidewalk, the right does not.

For example, see this table that compares two models on the same test images.

Track every detail and see the bigger picture

Zoom in to visualize a specific prediction at a specific step. Zoom out to see the aggregate statistics, identify patterns of errors, and understand opportunities for improvement. This tool works for comparing steps from a single model training, or results across different model versions.

For example, see this example table that analyzes results after one and then after five epochs on the MNIST dataset.

Example Projects with W&B Tables

The following highlight some real W&B Projects that use W&B Tables.

Image classification

Read this report, follow this colab, or explore this artifacts context to see how a CNN identifies ten types of living things (plants, birds, insects, etc.) from iNaturalist photos.

Compare the distribution of true labels across two different models' predictions.

Audio

Interact with audio tables in this report on timbre transfer. You can compare a recorded whale song with a synthesized rendition of the same melody on an instrument like violin or trumpet. You can also record your own songs and explore their synthesized versions in W&B with this colab.

Text

Browse text samples from training data or generated output, dynamically group by relevant fields, and align your evaluation across model variants or experiment settings. Render text as Markdown or use visual diff mode to compare texts. Explore a simple character-based RNN for generating Shakespeare in this report.

Doubling the size of the hidden layer yields some more creative prompt completions.
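
As a minimal sketch, logging generated text samples to a Table might look like the following; the generate() helper and prompts are placeholders for a real model and dataset:

import wandb

prompts = ["To be, or not to be", "Once more unto the breach"]

def generate(prompt):
    # Stand-in for a real language model
    return prompt + " ..."

run = wandb.init(project="text-table-example")
table = wandb.Table(columns=["prompt", "generation"])
for prompt in prompts:
    table.add_data(prompt, generate(prompt))
run.log({"generations": table})
run.finish()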

Video

Browse and aggregate over videos logged during training to understand your models. Here is an early example using the SafeLife benchmark for RL agents seeking to minimize side effects.

Browse easily through the few successful agents

Tabular data

View a report on how to split and pre-process tabular data with version control and de-duplication.

Tables and Artifacts work together to version control, label, and de-duplicate your dataset iterations

Comparing model variants (semantic segmentation)

An interactive notebook and live example of logging Tables for semantic segmentation and comparing different models. Try your own queries in this Table.

Find the best predictions across two models on the same test set

Analyzing improvement over training time

A detailed report on how to visualize predictions over time and the accompanying interactive notebook.

2.4 - Export table data

How to export data from tables.

Like all W&B Artifacts, Tables can be converted into pandas dataframes for easy data exporting.

Convert table to artifact

First, you need to get the table into an artifact. Add the table to an artifact with artifact.add(table, "table_name"), then retrieve it with artifact.get("table_name"):

import wandb

# Create and log a new table inside an artifact.
with wandb.init() as run:
    artifact = wandb.Artifact("my_dataset", type="dataset")
    table = wandb.Table(
        columns=["a", "b", "c"], data=[(i, i * 2, 2**i) for i in range(10)]
    )
    artifact.add(table, "my_table")
    run.log_artifact(artifact)

# Retrieve the table from the artifact you created.
with wandb.init() as run:
    artifact = run.use_artifact("my_dataset:latest")
    table = artifact.get("my_table")

Convert artifact to Dataframe

Then, convert the table into a dataframe:

# Following from the last code example:
df = table.get_dataframe()

Export Data

Now you can export the data using any method the DataFrame supports:

# Converting the table data to .csv
df.to_csv("example.csv", encoding="utf-8")

3 - Reports

Project management and collaboration tools for machine learning projects

Use W&B Reports to:

  • Organize Runs.
  • Embed and automate visualizations.
  • Describe your findings.
  • Share updates with collaborators, either as a LaTeX zip file or a PDF.

The following image shows a section of a report created from metrics that were logged to W&B over the course of training.

View the report where the above image was taken from here.

How it works

Create a collaborative report with a few clicks.

  1. Navigate to your W&B project workspace in the W&B App.
  2. Click the Create report button in the upper right corner of your workspace.
  3. A modal titled Create Report will appear. Select the charts and panels you want to add to your report. (You can add or remove charts and panels later).
  4. Click Create report.
  5. Edit the report to your desired state.
  6. Click Publish to project.
  7. Click the Share button to share your report with collaborators.

See the Create a report page for more information on how to create reports interactively and programmatically with the W&B Python SDK.

How to get started

Depending on your use case, explore the sections that follow to get started with W&B Reports.

3.1 - Create a report

Create a W&B Report with the App UI or programmatically with the Weights & Biases SDK.

Create a report interactively with the W&B App UI or programmatically with the W&B Python SDK.

  1. Navigate to your project workspace in the W&B App.

  2. Click Create report in the upper right corner of your workspace.

  3. A modal will appear. Select the charts you would like to start with. You can add or delete charts later from the report interface.

  4. Select the Filter run sets option to prevent new runs from being added to your report. You can toggle this option on or off. Once you click Create report, a draft report will be available in the report tab to continue working on.

  1. Navigate to your project workspace in the W&B App.

  2. Select the Reports tab (clipboard image) in your project.

  3. Select the Create Report button on the report page.

Create a report programmatically with the wandb library.

  1. Install W&B SDK and Workspaces API:

    pip install wandb wandb-workspaces
    
  2. Next, import the wandb and wandb_workspaces packages:

    import wandb
    import wandb_workspaces.reports.v2 as wr
    
  3. Create a report instance with the wandb_workspaces.reports.v2.Report class. Specify a name for the project.

    report = wr.Report(project="report_standard")
    
  4. Save the report. Reports are not uploaded to the W&B server until you call the .save() method:

    report.save()
    

For information on how to edit a report interactively with the App UI or programmatically, see Edit a report.

3.2 - Edit a report

Edit a report interactively with the App UI or programmatically with the W&B SDK.

Edit a report interactively with the App UI or programmatically with the W&B SDK.

Reports consist of blocks. Blocks make up the body of a report. Within these blocks you can add text, images, embedded visualizations, plots from experiments and runs, and panel grids.

Panel grids are a specific type of block that hold panels and run sets. Run sets are a collection of runs logged to a project in W&B. Panels are visualizations of run set data.

Add plots

Each panel grid has a set of run sets and a set of panels. The run sets at the bottom of the section control what data shows up on the panels in the grid. Create a new panel grid if you want to add charts that pull data from a different set of runs.

Enter a forward slash (/) in the report to display a dropdown menu. Select Add panel to add a panel. You can add any panel that is supported by W&B, including a line plot, scatter plot or parallel coordinates chart.

Add charts to a report

Add plots to a report programmatically with the SDK. Pass a list of one or more plot or chart objects to the panels parameter in the PanelGrid Public API Class. Create a plot or chart object with its associated Python Class.

The following example demonstrates how to create a line plot and a scatter plot.

import wandb
import wandb_workspaces.reports.v2 as wr

report = wr.Report(
    project="report-editing",
    title="An amazing title",
    description="A descriptive description.",
)

blocks = [
    wr.PanelGrid(
        panels=[
            wr.LinePlot(x="time", y="velocity"),
            wr.ScatterPlot(x="time", y="acceleration"),
        ]
    )
]

report.blocks = blocks
report.save()

For more information about available plots and charts you can add to a report programmatically, see wr.panels.

Add run sets

Add run sets from projects interactively with the App UI or the W&B SDK.

Enter a forward slash (/) in the report to display a dropdown menu. From the dropdown, choose Panel Grid. This will automatically import the run set from the project the report was created from.

Add run sets from projects with the wr.Runset() and wr.PanelGrid() classes. The following procedure describes how to add a run set:

  1. Create a wr.Runset() object instance. Provide the name of the project that contains the runsets for the project parameter and the entity that owns the project for the entity parameter.
  2. Create a wr.PanelGrid() object instance. Pass a list of one or more runset objects to the runsets parameter.
  3. Store one or more wr.PanelGrid() object instances in a list.
  4. Update the report instance blocks attribute with the list of panel grid instances.
import wandb
import wandb_workspaces.reports.v2 as wr

report = wr.Report(
    project="report-editing",
    title="An amazing title",
    description="A descriptive description.",
)

panel_grids = wr.PanelGrid(
    runsets=[wr.Runset(project="<project-name>", entity="<entity-name>")]
)

report.blocks = [panel_grids]
report.save()

You can optionally add runsets and panels with one call to the SDK:

import wandb
import wandb_workspaces.reports.v2 as wr

report = wr.Report(
    project="report-editing",
    title="An amazing title",
    description="A descriptive description.",
)

panel_grids = wr.PanelGrid(
    panels=[
        wr.LinePlot(
            title="line title",
            x="x",
            y=["y"],
            range_x=[0, 100],
            range_y=[0, 100],
            log_x=True,
            log_y=True,
            title_x="x axis title",
            title_y="y axis title",
            ignore_outliers=True,
            groupby="hyperparam1",
            groupby_aggfunc="mean",
            groupby_rangefunc="minmax",
            smoothing_factor=0.5,
            smoothing_type="gaussian",
            smoothing_show_original=True,
            max_runs_to_show=10,
            plot_type="stacked-area",
            font_size="large",
            legend_position="west",
        ),
        wr.ScatterPlot(
            title="scatter title",
            x="y",
            y="y",
            # z='x',
            range_x=[0, 0.0005],
            range_y=[0, 0.0005],
            # range_z=[0,1],
            log_x=False,
            log_y=False,
            # log_z=True,
            running_ymin=True,
            running_ymean=True,
            running_ymax=True,
            font_size="small",
            regression=True,
        ),
    ],
    runsets=[wr.Runset(project="<project-name>", entity="<entity-name>")],
)


report.blocks = [panel_grids]
report.save()

Add code blocks

Add code blocks to your report interactively with the App UI or with the W&B SDK.

Enter a forward slash (/) in the report to display a dropdown menu. From the dropdown choose Code.

Select the name of the programming language on the right hand of the code block. This will expand a dropdown. From the dropdown, select your programming language syntax. You can choose from Javascript, Python, CSS, JSON, HTML, Markdown, and YAML.

Use the wr.CodeBlock Class to create a code block programmatically. Provide the name of the language and the code you want to display for the language and code parameters, respectively.

For example, the following code demonstrates a YAML file:

import wandb
import wandb_workspaces.reports.v2 as wr

report = wr.Report(project="report-editing")

report.blocks = [
    wr.CodeBlock(
        code=["this:", "- is", "- a", "cool:", "- yaml", "- file"], language="yaml"
    )
]

report.save()

This will render a code block similar to:

this:
- is
- a
cool:
- yaml
- file

The following example demonstrates a Python code block:

report = wr.Report(project="report-editing")


report.blocks = [wr.CodeBlock(code=["Hello, World!"], language="python")]

report.save()

This will render a code block similar to:

Hello, World!

Add markdown

Add markdown to your report interactively with the App UI or with the W&B SDK.

Enter a forward slash (/) in the report to display a dropdown menu. From the dropdown choose Markdown.

Use the wr.MarkdownBlock class to create a markdown block programmatically. Pass a string to the text parameter:

import wandb
import wandb_workspaces.reports.v2 as wr

report = wr.Report(project="report-editing")

report.blocks = [
    wr.MarkdownBlock(text="Markdown cell with *italics* and **bold** and $e=mc^2$")
]

report.save()

This renders a markdown block with the formatted text in the report.

Add HTML elements

Add HTML elements to your report interactively with the App UI or with the W&B SDK.

Enter a forward slash (/) in the report to display a dropdown menu. From the dropdown select a type of text block. For example, to create an H2 heading block, select the Heading 2 option.

Pass a list of one or more HTML element objects to the report's blocks attribute. The following example demonstrates how to create an H1, an H2, and an unordered list:

import wandb
import wandb_workspaces.reports.v2 as wr

report = wr.Report(project="report-editing")

report.blocks = [
    wr.H1(text="How Programmatic Reports work"),
    wr.H2(text="Heading 2"),
    wr.UnorderedList(items=["Bullet 1", "Bullet 2"]),
]

report.save()

This renders the headings and the unordered list in the report.

Embed rich media links

Embed rich media within the report with the App UI or with the W&B SDK.

Copy and paste URLs into reports to embed rich media within the report. The following sections describe how this works for Twitter, YouTube, and SoundCloud links.

Twitter

Copy and paste a Tweet link URL into a report to view the Tweet within the report.

Youtube

Copy and paste a YouTube video URL link to embed a video in the report.

SoundCloud

Copy and paste a SoundCloud link to embed an audio file into a report.

Pass a list of one or more embedded media objects to the report's blocks attribute. The following example demonstrates how to embed video and Twitter media into a report:

import wandb
import wandb_workspaces.reports.v2 as wr

report = wr.Report(project="report-editing")

report.blocks = [
    wr.Video(url="https://www.youtube.com/embed/6riDJMI-Y8U"),
    wr.Twitter(
        embed_html='<blockquote class="twitter-tweet"><p lang="en" dir="ltr">The voice of an angel, truly. <a href="https://twitter.com/hashtag/MassEffect?src=hash&amp;ref_src=twsrc%5Etfw">#MassEffect</a> <a href="https://t.co/nMev97Uw7F">pic.twitter.com/nMev97Uw7F</a></p>&mdash; Mass Effect (@masseffect) <a href="https://twitter.com/masseffect/status/1428748886655569924?ref_src=twsrc%5Etfw">August 20, 2021</a></blockquote>\n'
    ),
]
report.save()

Duplicate and delete panel grids

If you have a layout that you would like to reuse, you can select a panel grid and copy-paste it to duplicate it in the same report or even paste it into a different report.

Highlight a whole panel grid section by selecting the drag handle in the upper right corner. Click and drag to highlight and select a region in a report such as panel grids, text, and headings.

Select a panel grid and press delete on your keyboard to delete a panel grid.

Collapse headers to organize Reports

Collapse headers in a Report to hide content within a text block. When the report is loaded, only headers that are expanded show content. Collapsing headers in reports can help organize your content and prevent excessive data loading.

3.3 - Collaborate on reports

Collaborate and share W&B Reports with peers, co-workers, and your team.

Once you have saved a report, you can select the Share button to collaborate. A draft copy of the report is created when you select the Edit button. Draft reports auto-save. Select Save to report to publish your changes to the shared report.

A warning notification will appear if an edit conflict occurs. This can occur if you and another collaborator edit the same report at the same time. The warning notification will guide you to resolve potential edit conflicts.

Report sharing modal for a report in a 'Public' project

Comment on reports

Click the comment button on a panel in a report to add a comment directly to that panel.

Adding a comment to a panel

3.4 - Clone and export reports

Export a W&B Report as a PDF or LaTeX.

Export reports

Export a report as a PDF or LaTeX. Within your report, select the kebab icon to expand the dropdown menu. Choose Download and select either PDF or LaTeX output format.

Cloning reports

Within your report, select the kebab icon to expand the dropdown menu. Choose the Clone this report button. Pick a destination for your cloned report in the modal. Choose Clone report.

Clone a report to reuse a project’s template and format. Cloned reports are visible to your team if you clone a report within the team’s account. Reports cloned within an individual’s account are only visible to that user.

Load a Report from a URL to use it as a template.

import wandb_workspaces.reports.v2 as wr

PROJECT = "<project-name>"
ENTITY = "<entity-name>"

report = wr.Report(
    project=PROJECT, title="Quickstart Report", description="That was easy!"
)  # Create
report.save()  # Save
new_report = wr.Report.from_url(report.url)  # Load

Edit the content within new_report.blocks.

pg = wr.PanelGrid(
    runsets=[
        wr.Runset(ENTITY, PROJECT, "First Run Set"),
        wr.Runset(ENTITY, PROJECT, "Elephants Only!", query="elephant"),
    ],
    panels=[
        wr.LinePlot(x="Step", y=["val_acc"], smoothing_factor=0.8),
        wr.BarPlot(metrics=["acc"]),
        wr.MediaBrowser(media_keys="img", num_columns=1),
        wr.RunComparer(diff_only="split", layout={"w": 24, "h": 9}),
    ],
)
new_report.blocks = (
    report.blocks[:1] + [wr.H1("Panel Grid Example"), pg] + report.blocks[1:]
)
new_report.save()

3.5 - Embed a report

Embed W&B reports directly into Notion or with an HTML IFrame element.

HTML iframe element

Select the Share button on the upper right hand corner within a report. A modal window will appear. Within the modal window, select Copy embed code. The copied code will render within an Inline Frame (IFrame) HTML element. Paste the copied code into an iframe HTML element of your choice.

Confluence

Insert the direct link to the report within an IFrame cell in Confluence.

Notion

Insert the report into a Notion document using an Embed block in Notion and the report’s embed code.

Gradio

You can use the gr.HTML element to embed W&B Reports within Gradio Apps and use them within Hugging Face Spaces.

import gradio as gr


def wandb_report(url):
    iframe = f'<iframe src="{url}" style="border:none;height:1024px;width:100%"></iframe>'
    return gr.HTML(iframe)


with gr.Blocks() as demo:
    report = wandb_report(
        "https://wandb.ai/_scott/pytorch-sweeps-demo/reports/loss-22-10-07-16-00-17---VmlldzoyNzU2NzAx"
    )
demo.launch()

3.6 - Compare runs across projects

Compare runs from two different projects with cross-project reports.

Compare runs from two different projects with cross-project reports. Use the project selector in the run set table to pick a project.

Compare runs across different projects

The visualizations in the section pull columns from the first active run set. If you do not see the metric you are looking for in the line plot, make sure that the first run set checked in the section has that column available.

This feature supports history data on time series lines, but we don’t support pulling different summary metrics from different projects. In other words, you cannot create a scatter plot from columns that are only logged in another project.

If you need to compare runs from two projects and the columns are not working, add a tag to the runs in one project and then move those runs to the other project. You can still filter only the runs from each project, but the report includes all the columns for both sets of runs.
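
As a minimal sketch (with placeholder entity and project names), you could tag every run in the source project with the Public API before moving them in the App UI:

import wandb

api = wandb.Api()

# Tag every run in the source project so they stay easy to filter after the move
for run in api.runs("<entity>/<source-project>"):
    run.tags.append("source-project")
    run.update()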

Share a view-only link to a report that is in a private project or team project.

View-only report links add a secret access token to the URL, so anyone who opens the link can view the page. Anyone can use the magic link to view the report without logging in first. For customers on W&B Local private cloud installations, these links remain behind your firewall, so only members of your team with access to your private instance and access to the view-only link can view the report.

In view-only mode, someone who is not logged in can see the charts, mouse over to see tooltips of values, zoom in and out on charts, and scroll through columns in the table. In view mode, they cannot create new charts or new table queries to explore the data. View-only visitors to the report link cannot click a run to get to the run page, and they cannot see the share modal; instead, hovering over it shows a tooltip: Sharing not available for view only access.

Send a graph to a report

Send a graph from your workspace to a report to keep track of your progress. Click the dropdown menu on the chart or panel you’d like to copy to a report and click Add to report to select the destination report.

3.7 - Example reports

Reports gallery

Notes: Add a visualization with a quick summary

Capture an important observation, an idea for future work, or a milestone reached in the development of a project. All experiment runs in your report will link to their parameters, metrics, logs, and code, so you can save the full context of your work.

Jot down some text and pull in relevant charts to illustrate your insight.

See the What To Do When Inception-ResNet-V2 Is Too Slow W&B Report for an example of how you can share comparisons of training time.

Save the best examples from a complex code base for easy reference and future interaction. See the LIDAR point clouds W&B Report for an example of how to visualize LIDAR point clouds from the Lyft dataset and annotate with 3D bounding boxes.

Collaboration: Share findings with your colleagues

Explain how to get started with a project, share what you’ve observed so far, and synthesize the latest findings. Your colleagues can make suggestions or discuss details using comments on any panel or at the end of the report.

Include dynamic settings so that your colleagues can explore for themselves, get additional insights, and better plan their next steps. In this example, three types of experiments can be visualized independently, compared, or averaged.

See the SafeLife benchmark experiments W&B Report for an example of how to share first runs and observations of a benchmark.

Use sliders and configurable media panels to showcase a model’s results or training progress. View the Cute Animals and Post-Modern Style Transfer: StarGAN v2 for Multi-Domain Image Synthesis report for an example W&B Report with sliders.

Work log: Track what you’ve tried and plan next steps

Write down your thoughts on experiments, your findings, and any gotchas and next steps as you work through a project, keeping everything organized in one place. This lets you “document” all the important pieces beyond your scripts. See the Who Is Them? Text Disambiguation With Transformers W&B Report for an example of how you can report your findings.

Tell the story of a project, which you and others can reference later to understand how and why a model was developed. See The View from the Driver’s Seat W&B Report for how you can report your findings.

See the Learning Dexterity End-to-End Using W&B Reports example to explore how the OpenAI Robotics team used W&B Reports to run massive machine learning projects.