Document and share insights across the entire organization by generating live reports in digestible, visual formats that are easily understood by non-technical stakeholders.
Use W&B Artifacts to track and version data as the inputs and outputs of your W&B Runs. For example, a model training run might take in a dataset as input and produce a trained model as output. You can log hyperparameters, metadata, and metrics to a run, and you can use an artifact to log, track, and version the dataset used to train the model as input and another artifact for the resulting model checkpoints as output.
Use cases
You can use artifacts throughout your entire ML workflow as inputs and outputs of runs. You can use datasets, models, or even other artifacts as inputs for processing.
Add one or more files, such as a model file or dataset, to your artifact object.
Log your artifact to W&B.
For example, the following code snippet shows how to log a file called dataset.h5 to an artifact called example_artifact:
import wandb

run = wandb.init(project="artifacts-example", job_type="add-dataset")
artifact = wandb.Artifact(name="example_artifact", type="dataset")
artifact.add_file(local_path="./dataset.h5", name="training_dataset")
artifact.save()

# Logs the artifact version "example_artifact:v0" as a dataset with data from dataset.h5
See the track external files page for information on how to add references to files or directories stored in external object storage, like an Amazon S3 bucket.
Download an artifact
Indicate the artifact you want to mark as input to your run with the use_artifact method.
Following the preceding code snippet, this next code block shows how to use the example_artifact artifact:
artifact = run.use_artifact("example_artifact:latest")
This returns an artifact object.
Next, use the returned object to download all contents of the artifact:
datadir = artifact.download()  # downloads the full artifact to the default directory
You can pass a custom path to the root parameter to download an artifact to a specific directory. For alternate ways to download artifacts and to see additional parameters, see the guide on downloading and using artifacts.
1. Create an artifact Python object with wandb.Artifact()
Initialize the wandb.Artifact() class to create an artifact object. Specify the following parameters:
Name: Specify a name for your artifact. The name should be unique, descriptive, and easy to remember. Use an artifact's name both to identify the artifact in the W&B App UI and to refer to it when you want to use that artifact.
Type: Provide a type. The type should be simple, descriptive, and correspond to a single step of your machine learning pipeline. Common artifact types include 'dataset' or 'model'.
The name and type you provide are used to create a directed acyclic graph. This means you can view the lineage of an artifact in the W&B App.
Artifacts cannot have the same name, even if you specify a different type for the type parameter. In other words, you cannot create an artifact named cats of type dataset and another artifact named cats of type model.
You can optionally provide a description and metadata when you initialize an artifact object. For more information on available attributes and parameters, see wandb.Artifact Class definition in the Python SDK Reference Guide.
The following example demonstrates how to create a dataset artifact:
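A minimal sketch of creating the artifact object might look like the following; the name and type strings are placeholders:

import wandb

artifact = wandb.Artifact(name="example_artifact", type="dataset")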
Replace the string arguments in the preceding code snippet with your own name and type.
2. Add one or more files to the artifact
Add files, directories, external URI references (such as Amazon S3) and more with artifact methods. For example, to add a single text file, use the add_file method:
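For instance, a minimal sketch, assuming a local file named hello_world.txt (a placeholder path):

artifact.add_file(local_path="hello_world.txt")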
You can also add multiple files with the add_dir method. For more information on how to add files, see Update an artifact.
3. Save your artifact to the W&B server
Finally, save your artifact to the W&B server. Artifacts are associated with a run. Therefore, use a run object's log_artifact() method to save the artifact.
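For example, a minimal sketch, assuming an artifact object constructed as in the previous steps:

run = wandb.init(project="artifacts-example", job_type="add-dataset")

# Save the artifact version to W&B and mark it as an output of this run
run.log_artifact(artifact)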
You can optionally construct an artifact outside of a W&B run. For more information, see Track external files.
Calls to log_artifact are performed asynchronously for performant uploads. This can cause surprising behavior when logging artifacts in a loop. For example:
for i in range(10):
    a = wandb.Artifact(
        "race",
        type="dataset",
        metadata={
            "index": i,
        },
    )
    # ... add files to artifact a ...
    run.log_artifact(a)
The artifact version v0 is NOT guaranteed to have an index of 0 in its metadata, as the artifacts may be logged in an arbitrary order.
Add files to an artifact
The following sections demonstrate how to construct artifacts with different file types and from parallel runs.
For the following examples, assume you have a project directory with multiple files and a directory structure:
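For example, a hypothetical layout consistent with the API calls shown below:

project-directory
+-- images/
|   +-- cat.png
|   +-- dog.png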
The following code snippet demonstrates how to add an entire local directory to your artifact:

# Recursively add a directory
artifact.add_dir(local_path="path/to/dir", name="optional-prefix")
The following API calls produce the following artifact content:

API Call                                  | Resulting artifact
artifact.add_dir('images')                | cat.png, dog.png
artifact.add_dir('images', name='images') | images/cat.png, images/dog.png
artifact.new_file('hello.txt')            | hello.txt
Add a URI reference
Artifacts track checksums and other information for reproducibility if the URI has a scheme that the W&B library knows how to handle.
Add an external URI reference to an artifact with the add_reference method. Replace the 'uri' string with your own URI. Optionally pass the desired path within the artifact for the name parameter.
# Add a URI reference
artifact.add_reference(uri="uri", name="optional-name")
Artifacts currently support the following URI schemes:
http(s)://: A path to a file accessible over HTTP. The artifact will track checksums in the form of etags and size metadata if the HTTP server supports the ETag and Content-Length response headers.
s3://: A path to an object or object prefix in S3. The artifact will track checksums and versioning information (if the bucket has object versioning enabled) for the referenced objects. Object prefixes are expanded to include the objects under the prefix, up to a maximum of 10,000 objects.
gs://: A path to an object or object prefix in GCS. The artifact will track checksums and versioning information (if the bucket has object versioning enabled) for the referenced objects. Object prefixes are expanded to include the objects under the prefix, up to a maximum of 10,000 objects.
For large datasets or distributed training, multiple parallel runs might need to contribute to a single artifact.
import wandb
import time

# We will use ray to launch our runs in parallel
# for demonstration purposes. You can orchestrate
# your parallel runs however you want.
import ray

ray.init()

artifact_type = "dataset"
artifact_name = "parallel-artifact"
table_name = "distributed_table"
parts_path = "parts"
num_parallel = 5

# Each batch of parallel writers should have its own
# unique group name.
group_name = "writer-group-{}".format(round(time.time()))


@ray.remote
def train(i):
    """
    Our writer job. Each writer adds one table to the artifact.
    """
    with wandb.init(group=group_name) as run:
        artifact = wandb.Artifact(name=artifact_name, type=artifact_type)

        # Add data to a wandb table. In this case we use example data
        table = wandb.Table(columns=["a", "b", "c"], data=[[i, i * 2, 2**i]])

        # Add the table to a folder in the artifact
        artifact.add(table, "{}/table_{}".format(parts_path, i))

        # Upserting the artifact creates or appends data to the artifact
        run.upsert_artifact(artifact)


# Launch your runs in parallel
result_ids = [train.remote(i) for i in range(num_parallel)]

# Join on all the writers to make sure their files have
# been added before finishing the artifact.
ray.get(result_ids)

# Once all the writers are finished, finish the artifact
# to mark it ready.
with wandb.init(group=group_name) as run:
    artifact = wandb.Artifact(artifact_name, type=artifact_type)

    # Create a "PartitionedTable" pointing to the folder of tables
    # and add it to the artifact.
    artifact.add(wandb.data_types.PartitionedTable(parts_path), table_name)

    # finish_artifact finalizes the artifact, disallowing future "upserts"
    # to this version.
    run.finish_artifact(artifact)
1.2 - Download and use artifacts
Download and use Artifacts from multiple projects.
Download and use an artifact that is already stored on the W&B server, or construct an artifact object and pass it in for de-duplication as necessary.
Team members with view-only seats cannot download artifacts.
Download and use an artifact stored on W&B
Download and use an artifact stored in W&B either inside or outside of a W&B Run. Use the Public API (wandb.Api) to export (or update data) already saved in W&B. For more information, see the W&B Public API Reference guide.
First, import the W&B Python SDK. Next, create a W&B Run:
import wandb
run = wandb.init(project="<example>", job_type="<job-type>")
Indicate the artifact you want to use with the use_artifact method. This returns an artifact object. The following code snippet specifies an artifact called 'bike-dataset' with the alias 'latest':
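A minimal sketch of that call:

artifact = run.use_artifact("bike-dataset:latest")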
Use the object returned to download all the contents of the artifact:
datadir = artifact.download()
You can optionally pass a path to the root parameter to download the contents of the artifact to a specific directory. For more information, see the Python SDK Reference Guide.
Use the get_path method to download only a subset of files:
path = artifact.get_path(name)
This fetches only the file at the path name. It returns an Entry object with the following methods:
Entry.download: Downloads the file from the artifact at the path name
Entry.ref: If add_reference stored the entry as a reference, returns the URI
References that have schemes that W&B knows how to handle get downloaded just like artifact files. For more information, see Track external files.
First, import the W&B SDK. Next, create an artifact from the Public API Class. Provide the entity, project, artifact, and alias associated with that artifact:
import wandb
api = wandb.Api()
artifact = api.artifact("entity/project/artifact:alias")
Use the object returned to download the contents of the artifact:
artifact.download()
You can optionally pass a path to the root parameter to download the contents of the artifact to a specific directory. For more information, see the API Reference Guide.
Use the wandb artifact get command to download an artifact from the W&B server.
$ wandb artifact get project/artifact:alias --root mnist/
Partially download an artifact
You can optionally download part of an artifact based on a prefix. Using the path_prefix parameter, you can download a single file or the content of a sub-folder.
artifact = run.use_artifact("bike-dataset:latest")
artifact.download(path_prefix="bike.png") # downloads only bike.png
Alternatively, you can download files from a certain directory:
artifact.download(path_prefix="images/bikes/") # downloads files in the images/bikes directory
Use an artifact from a different project
Specify the name of the artifact along with its project name to reference an artifact. You can also reference artifacts across entities by specifying the name of the artifact with its entity name.
The following code example demonstrates how to query an artifact from another project as input to the current W&B run.
import wandb
run = wandb.init(project="<example>", job_type="<job-type>")
# Query W&B for an artifact from another project and mark it
# as an input to this run.
artifact = run.use_artifact("my-project/artifact:alias")

# Use an artifact from another entity and mark it as an input
# to this run.
artifact = run.use_artifact("my-entity/my-project/artifact:alias")
Construct and use an artifact simultaneously
Simultaneously construct and use an artifact. Create an artifact object and pass it to use_artifact. This creates an artifact in W&B if it does not exist yet. The use_artifact API is idempotent, so you can call it as many times as you like.
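A minimal sketch, assuming a hypothetical artifact named reference_model built from a local file model.h5:

import wandb

run = wandb.init(project="<example>", job_type="<job-type>")

# Construct the artifact; W&B creates it if it does not exist yet
artifact = wandb.Artifact("reference_model", type="model")
artifact.add_file("model.h5")

# Mark the artifact as an input to the run (the call is idempotent)
run.use_artifact(artifact)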
Update an existing Artifact inside and outside of a W&B Run.
Pass desired values to update the description, metadata, and alias of an artifact. Call the save() method to update the artifact on the W&B servers. You can update an artifact during a W&B Run or outside of a Run.
Use the W&B Public API (wandb.Api) to update an artifact outside of a run. Use the Artifact API (wandb.Artifact) to update an artifact during a run.
You cannot update the alias of an artifact linked to a model in the Model Registry.
The following code example demonstrates how to update the description of an artifact using the wandb.Artifact API:
import wandb
run = wandb.init(project="<example>")
artifact = run.use_artifact("<artifact-name>:<alias>")
artifact.description = "<description>"
artifact.save()
The following code example demonstrates how to update the description of an artifact using the wandb.Api API:
import wandb
api = wandb.Api()
artifact = api.artifact("entity/project/artifact:alias")
# Update the description
artifact.description = "My new description"

# Selectively update metadata keys
artifact.metadata["oldKey"] = "new value"

# Replace the metadata entirely
artifact.metadata = {"newKey": "new value"}

# Add an alias
artifact.aliases.append("best")

# Remove an alias
artifact.aliases.remove("latest")

# Completely replace the aliases
artifact.aliases = ["replaced"]

# Persist all artifact modifications
artifact.save()
For more information, see the Weights and Biases Artifact API.
You can also update an Artifact collection in the same way as a singular artifact:
import wandb
run = wandb.init(project="<example>")
api = wandb.Api()
artifact = api.artifact_collection(type="<type-name>", collection="<collection-name>")
artifact.name = "<new-collection-name>"
artifact.description = "<This is where you'd describe the purpose of your collection.>"
artifact.save()
Use aliases as pointers to specific versions. By default, Run.log_artifact adds the latest alias to the logged version.
An artifact version v0 is created and attached to your artifact when you log an artifact for the first time. W&B checksums the contents when you log again to the same artifact. If the artifact changed, W&B saves a new version v1.
For example, if you want your training script to pull the most recent version of a dataset, specify latest when you use that artifact. The following code example demonstrates how to download a recent dataset artifact named bike-dataset that has the alias latest:
import wandb
run = wandb.init(project="<example-project>")
artifact = run.use_artifact("bike-dataset:latest")
artifact.download()
You can also apply a custom alias to an artifact version. For example, if you want to mark that model checkpoint is the best on the metric AP-50, you could add the string 'best-ap50' as an alias when you log the model artifact.
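For example, a minimal sketch that logs a hypothetical model checkpoint file with the custom alias; log_artifact accepts a list of aliases:

artifact = wandb.Artifact("model-checkpoint", type="model")  # hypothetical name
artifact.add_file("model.h5")

# Apply the custom alias; the default latest alias is still added
run.log_artifact(artifact, aliases=["best-ap50"])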
Create a new artifact version from a single run or from a distributed process.
Create a new artifact version with a single run or collaboratively with distributed runs. You can optionally create a new artifact version from a previous version known as an incremental artifact.
We recommend that you create an incremental artifact when you need to apply changes to a subset of files in an artifact and the original artifact is significantly larger than the subset you are changing.
Create new artifact versions from scratch
There are two ways to create a new artifact version: from a single run and from distributed runs. They are defined as follows:
Single run: A single run provides all the data for a new version. This is the most common case and is best suited when the run fully recreates the needed data. For example: outputting saved models or model predictions in a table for analysis.
Distributed runs: A set of runs collectively provides all the data for a new version. This is best suited for distributed jobs which have multiple runs generating data, often in parallel. For example: evaluating a model in a distributed manner, and outputting the predictions.
W&B will create a new artifact and assign it a v0 alias if you pass a name to the wandb.Artifact API that does not exist in your project. W&B checksums the contents when you log again to the same artifact. If the artifact changed, W&B saves a new version v1.
W&B will retrieve an existing artifact if you pass a name and artifact type to the wandb.Artifact API that matches an existing artifact in your project. The retrieved artifact will have a version greater than 1.
Single run
Log a new version of an artifact with a single run. This case occurs when a single run produces all the files in the artifact.
Based on your use case, select one of the tabs below to create a new artifact version inside or outside of a run:
Create an artifact version within a W&B run:
Create a run with wandb.init.
Create a new artifact or retrieve an existing one with wandb.Artifact.
Add files to the artifact with .add_file.
Log the artifact to the run with .log_artifact.
with wandb.init() as run:
    artifact = wandb.Artifact("artifact_name", "artifact_type")

    # Add files and assets to the artifact using
    # `.add`, `.add_file`, `.add_dir`, and `.add_reference`
    artifact.add_file("image1.png")
    run.log_artifact(artifact)
Create an artifact version outside of a W&B run:
Create a new artifact or retrieve an existing one with wandb.Artifact.
Add files to the artifact with .add_file.
Save the artifact with .save.
artifact = wandb.Artifact("artifact_name", "artifact_type")
# Add files and assets to the artifact using
# `.add`, `.add_file`, `.add_dir`, and `.add_reference`
artifact.add_file("image1.png")
artifact.save()
Distributed runs
Allow a collection of runs to collaborate on a version before committing it. This is in contrast to single run mode described above where one run provides all the data for a new version.
Each run in the collection needs to be aware of the same unique ID (called distributed_id) in order to collaborate on the same version. By default, if present, W&B uses the run’s group as set by wandb.init(group=GROUP) as the distributed_id.
There must be a final run that “commits” the version, permanently locking its state.
Use upsert_artifact to add to the collaborative artifact and finish_artifact to finalize the commit.
Consider the following example. Different runs (labelled below as Run 1, Run 2, and Run 3) add a different image file to the same artifact with upsert_artifact.
Run 1:
with wandb.init() as run:
    artifact = wandb.Artifact("artifact_name", "artifact_type")

    # Add files and assets to the artifact using
    # `.add`, `.add_file`, `.add_dir`, and `.add_reference`
    artifact.add_file("image1.png")
    run.upsert_artifact(artifact, distributed_id="my_dist_artifact")
Run 2:
with wandb.init() as run:
    artifact = wandb.Artifact("artifact_name", "artifact_type")

    # Add files and assets to the artifact using
    # `.add`, `.add_file`, `.add_dir`, and `.add_reference`
    artifact.add_file("image2.png")
    run.upsert_artifact(artifact, distributed_id="my_dist_artifact")
Run 3:
Must run after Run 1 and Run 2 complete. The Run that calls finish_artifact can include files in the artifact, but does not need to.
with wandb.init() as run:
    artifact = wandb.Artifact("artifact_name", "artifact_type")

    # Add files and assets to the artifact using
    # `.add`, `.add_file`, `.add_dir`, and `.add_reference`
    artifact.add_file("image3.png")
    run.finish_artifact(artifact, distributed_id="my_dist_artifact")
Create a new artifact version from an existing version
Add, modify, or remove a subset of files from a previous artifact version without the need to re-index the files that didn’t change. Adding, modifying, or removing a subset of files from a previous artifact version creates a new artifact version known as an incremental artifact.
Here are some scenarios for each type of incremental change you might encounter:
add: you periodically add a new subset of files to a dataset after collecting a new batch.
remove: you discovered several duplicate files and want to remove them from your artifact.
update: you corrected annotations for a subset of files and want to replace the old files with the correct ones.
You could create an artifact from scratch to perform the same function as an incremental artifact. However, when you create an artifact from scratch, you will need to have all the contents of your artifact on your local disk. When making an incremental change, you can add, remove, or modify a single file without changing the files from a previous artifact version.
You can create an incremental artifact within a single run or with a set of runs (distributed mode).
Follow the procedure below to incrementally change an artifact:
Obtain the artifact version you want to perform an incremental change on:
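Inside a run, a sketch of this step, mirroring the full example further below, looks like:

# Fetch the artifact and mark it as an input to your run
saved_artifact = run.use_artifact("my_artifact:latest")

# Create a draft version you can modify incrementally
draft_artifact = saved_artifact.new_draft()

# Add, modify, or remove a subset of files in the draft
draft_artifact.add_file("file_to_add.txt")
draft_artifact.remove("dir_to_remove/")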
Lastly, log or save your changes. The following tabs show you how to save your changes inside and outside of a W&B run. Select the tab that is appropriate for your use case:
run.log_artifact(draft_artifact)
draft_artifact.save()
Putting it all together, the code examples above look like:
with wandb.init(job_type="modify dataset") as run:
    # Fetch the artifact and mark it as an input to your run
    saved_artifact = run.use_artifact("my_artifact:latest")

    # Create a draft version
    draft_artifact = saved_artifact.new_draft()

    # Modify a subset of files in the draft version
    draft_artifact.add_file("file_to_add.txt")
    draft_artifact.remove("dir_to_remove/")

    # Log your changes to create a new version and mark it
    # as output to your run
    run.log_artifact(draft_artifact)
client = wandb.Api()
# Load your artifact
saved_artifact = client.artifact("my_artifact:latest")

# Create a draft version
draft_artifact = saved_artifact.new_draft()

# Modify a subset of files in the draft version
draft_artifact.remove("deleted_file.txt")
draft_artifact.add_file("modified_file.txt")
draft_artifact.save() # commit changes to the draft
1.6 - Track external files
Track files saved outside of W&B, such as in an Amazon S3 bucket, GCS bucket, HTTP file server, or even an NFS share.
Use reference artifacts to track files saved outside the W&B system, for example in an Amazon S3 bucket, GCS bucket, Azure blob, HTTP file server, or even an NFS share. Log artifacts outside of a W&B Run with the W&B CLI.
Log artifacts outside of runs
W&B creates a run when you log an artifact outside of a run. Each artifact belongs to a run, which in turn belongs to a project. An artifact (version) also belongs to a collection, and has a type.
Use the wandb artifact put command to upload an artifact to the W&B server outside of a W&B run. Provide the name of the project you want the artifact to belong to along with the name of the artifact (project/artifact_name). Optionally provide the type (TYPE). Replace PATH in the code snippet below with the file path of the artifact you want to upload.
$ wandb artifact put --name project/artifact_name --type TYPE PATH
W&B will create a new project if the project you specify does not exist. For information on how to download an artifact, see Download and use artifacts.
Track artifacts outside of W&B
Use W&B Artifacts for dataset versioning and model lineage, and use reference artifacts to track files saved outside the W&B server. In this mode an artifact only stores metadata about the files, such as URLs, size, and checksums. The underlying data never leaves your system. See the Quick start for information on how to save files and directories to W&B servers instead.
The following describes how to construct reference artifacts and how to best incorporate them into your workflows.
Amazon S3 / GCS / Azure Blob Storage References
Use W&B Artifacts for dataset and model versioning to track references in cloud storage buckets. With artifact references, seamlessly layer tracking on top of your buckets with no modifications to your existing storage layout.
Artifacts abstract away the underlying cloud storage vendor (such as AWS, GCP, or Azure). The information described in the following section applies uniformly to Amazon S3, Google Cloud Storage, and Azure Blob Storage.
W&B Artifacts support any Amazon S3 compatible interface, including MinIO. The scripts below work as-is, when you set the AWS_S3_ENDPOINT_URL environment variable to point at your MinIO server.
Assume we have a bucket with the following structure:
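For example, a hypothetical bucket layout consistent with the workflow in the "Tying it together" section below:

s3://my-bucket
+-- datasets/
|   +-- mnist/
+-- models/
    +-- cnn/

A sketch of tracking the mnist/ prefix as a reference artifact, mirroring that workflow, might look like:

import wandb

run = wandb.init()
artifact = wandb.Artifact("mnist", type="dataset")
artifact.add_reference("s3://my-bucket/datasets/mnist")
run.log_artifact(artifact)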
By default, W&B imposes a 10,000 object limit when adding an object prefix. You can adjust this limit by specifying max_objects= in calls to add_reference.
Our new reference artifact mnist:latest looks and behaves similarly to a regular artifact. The only difference is that the artifact only consists of metadata about the S3/GCS/Azure object such as its ETag, size, and version ID (if object versioning is enabled on the bucket).
W&B will use the default mechanism to look for credentials based on the cloud provider you use. Read the documentation from your cloud provider to learn more about the credentials used:
For AWS, if the bucket is not located in the configured user’s default region, you must set the AWS_REGION environment variable to match the bucket region.
Interact with this artifact similarly to a normal artifact. In the App UI, you can look through the contents of the reference artifact using the file browser, explore the full dependency graph, and scan through the versioned history of your artifact.
Rich media such as images, audio, video, and point clouds may fail to render in the App UI depending on the CORS configuration of your bucket. Allowlisting app.wandb.ai in your bucket's CORS settings will allow the App UI to properly render such rich media.
Panels might fail to render in the App UI for private buckets. If your company has a VPN, you could update your bucket’s access policy to whitelist IPs within your VPN.
W&B will use the metadata recorded when the artifact was logged to retrieve the files from the underlying bucket when it downloads a reference artifact. If your bucket has object versioning enabled, W&B will retrieve the object version corresponding to the state of the file at the time an artifact was logged. This means that as you evolve the contents of your bucket, you can still point to the exact iteration of your data a given model was trained on since the artifact serves as a snapshot of your bucket at the time of training.
W&B recommends that you enable ‘Object Versioning’ on your storage buckets if you overwrite files as part of your workflow. With versioning enabled on your buckets, artifacts with references to files that have been overwritten will still be intact because the older object versions are retained.
Based on your use case, read the instructions to enable object versioning: AWS, GCP, Azure.
Tying it together
The following code example demonstrates a simple workflow you can use to track a dataset in Amazon S3, GCS, or Azure that feeds into a training job:
import wandb
run = wandb.init()
artifact = wandb.Artifact("mnist", type="dataset")
artifact.add_reference("s3://my-bucket/datasets/mnist")
# Track the artifact and mark it as an input to
# this run in one swoop. A new artifact version
# is only logged if the files in the bucket changed.
run.use_artifact(artifact)
artifact_dir = artifact.download()
# Perform training here...
To track models, we can log the model artifact after the training script uploads the model files to the bucket:
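A minimal sketch, assuming the training script has already uploaded the model files to a hypothetical s3://my-bucket/models/cnn/ prefix:

import wandb

run = wandb.init()

# Training here...
# The training script uploads the model files to the bucket

model_artifact = wandb.Artifact("cnn", type="model")
model_artifact.add_reference("s3://my-bucket/models/cnn/")
run.log_artifact(model_artifact)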
Another common pattern for fast access to datasets is to expose an NFS mount point to a remote filesystem on all machines running training jobs. This can be an even simpler solution than a cloud storage bucket because from the perspective of the training script, the files look just like they are sitting on your local filesystem. Luckily, that ease of use extends into using Artifacts to track references to file systems, whether they are mounted or not.
Assume we have a filesystem mounted at /mount with the following structure:
mount
+-- datasets/
| +-- mnist/
+-- models/
+-- cnn/
Under mnist/ we have our dataset, a collection of images. Let’s track it with an artifact:
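A sketch of that step; the full workflow appears further below:

import wandb

run = wandb.init()
artifact = wandb.Artifact("mnist", type="dataset")

# Note the file:// scheme and the absolute path to the mounted dataset
artifact.add_reference("file:///mount/datasets/mnist/")
run.log_artifact(artifact)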
By default, W&B imposes a 10,000 file limit when adding a reference to a directory. You can adjust this limit by specifying max_objects= in calls to add_reference.
Note the triple slash in the URL. The first component is the file:// prefix that denotes the use of filesystem references. The second is the path to our dataset, /mount/datasets/mnist/.
The resulting artifact mnist:latest looks and acts just like a regular artifact. The only difference is that the artifact only consists of metadata about the files, such as their sizes and MD5 checksums. The files themselves never leave your system.
You can interact with this artifact just as you would a normal artifact. In the UI, you can browse the contents of the reference artifact using the file browser, explore the full dependency graph, and scan through the versioned history of your artifact. However, the UI will not be able to render rich media such as images, audio, etc. as the data itself is not contained within the artifact.
For filesystem references, a download() operation copies the files from the referenced paths to construct the artifact directory. In the above example, the contents of /mount/datasets/mnist will be copied into the directory artifacts/mnist:v0/. If an artifact contains a reference to a file that was overwritten, then download() will throw an error as the artifact can no longer be reconstructed.
Putting everything together, here’s a simple workflow you can use to track a dataset under a mounted filesystem that feeds into a training job:
import wandb
run = wandb.init()
artifact = wandb.Artifact("mnist", type="dataset")
artifact.add_reference("file:///mount/datasets/mnist/")
# Track the artifact and mark it as an input to
# this run in one swoop. A new artifact version
# is only logged if the files under the directory
# changed.
run.use_artifact(artifact)
artifact_dir = artifact.download()
# Perform training here...
To track models, we can log the model artifact after the training script writes the model files to the mount point:
import wandb
run = wandb.init()
# Training here...
# Write model to disk

model_artifact = wandb.Artifact("cnn", type="model")
model_artifact.add_reference("file:///mount/cnn/my_model.h5")
run.log_artifact(model_artifact)
1.7 - Manage data
1.7.1 - Delete an artifact
Delete artifacts interactively with the App UI or programmatically with the W&B SDK.
Delete artifacts interactively with the App UI or programmatically with the W&B SDK. When you delete an artifact, W&B marks that artifact as a soft-delete. In other words, the artifact is marked for deletion but files are not immediately deleted from storage.
The contents of the artifact remain in a soft-delete, or pending deletion, state until a regularly run garbage collection process reviews all artifacts marked for deletion. The garbage collection process deletes associated files from storage if the artifact and its associated files are not used by previous or subsequent artifact versions.
The sections in this page describe how to delete specific artifact versions, how to delete an artifact collection, how to delete artifacts with and without aliases, and more. You can schedule when artifacts are deleted from W&B with TTL policies. For more information, see Manage data retention with Artifact TTL policy.
Artifacts that are scheduled for deletion with a TTL policy, deleted with the W&B SDK, or deleted with the W&B App UI are first soft-deleted. Artifacts that are soft deleted undergo garbage collection before they are hard-deleted.
Delete an artifact version
To delete an artifact version:
Select the name of the artifact. This will expand the artifact view and list all the artifact versions associated with that artifact.
From the list of artifacts, select the artifact version you want to delete.
On the right hand side of the workspace, select the kebab dropdown.
Choose Delete.
An artifact version can also be deleted programmatically via the delete() method. See the examples below.
Delete multiple artifact versions with aliases
The following code example demonstrates how to delete artifacts that have aliases associated with them. Provide the entity, project name, and run ID that created the artifacts.
import wandb

api = wandb.Api()
run = api.run("entity/project/run_id")

for artifact in run.logged_artifacts():
    artifact.delete()
Set the delete_aliases parameter to True to delete aliases if the artifact has one or more aliases.
import wandb

api = wandb.Api()
run = api.run("entity/project/run_id")

for artifact in run.logged_artifacts():
    # Set delete_aliases=True in order to delete
    # artifacts with one or more aliases
    artifact.delete(delete_aliases=True)
Delete multiple artifact versions with a specific alias
The following code demonstrates how to delete multiple artifact versions that have a specific alias. Provide the entity, project name, and run ID that created the artifacts. Replace the deletion logic with your own:
import wandb

api = wandb.Api()
runs = api.run("entity/project_name/run_id")

# Delete artifact versions with the alias 'v3' or 'v4'
for artifact_version in runs.logged_artifacts():
    # Replace with your own deletion logic.
    if artifact_version.name[-2:] == "v3" or artifact_version.name[-2:] == "v4":
        artifact_version.delete(delete_aliases=True)
Delete all versions of an artifact that do not have an alias
The following code snippet demonstrates how to delete all versions of an artifact that do not have an alias. Provide the name of the project and entity for the project and entity keys in wandb.Api, respectively. Replace the <> with the name of your artifact:
import wandb

# Provide your entity and a project name when you
# use wandb.Api methods.
api = wandb.Api(overrides={"project": "project", "entity": "entity"})

artifact_type, artifact_name = "<type>", "<name>"  # provide type and name
for v in api.artifact_versions(artifact_type, artifact_name):
    # Clean up versions that don't have an alias such as 'latest'.
    # NOTE: You can put whatever deletion logic you want here.
    if len(v.aliases) == 0:
        v.delete()
Delete an artifact collection
To delete an artifact collection:
Navigate to the artifact collection you want to delete and hover over it.
Select the kebab dropdown next to the artifact collection name.
Choose Delete.
You can also delete an artifact collection programmatically with the delete() method. Provide the name of the project and entity for the project and entity keys in wandb.Api, respectively:
import wandb
# Provide your entity and a project name when you
# use wandb.Api methods.
api = wandb.Api(overrides={"project": "project", "entity": "entity"})

collection = api.artifact_collection(
    "<artifact_type>", "entity/project/artifact_collection_name"
)
collection.delete()
How to enable garbage collection based on how W&B is hosted
Garbage collection is enabled by default if you use W&B's shared cloud. Based on how you host W&B, you might need to take additional steps to enable garbage collection, including:
Set the GORILLA_ARTIFACT_GC_ENABLED environment variable to true: GORILLA_ARTIFACT_GC_ENABLED=true
Enable bucket versioning if you use AWS, GCP or any other storage provider such as Minio. If you use Azure, enable soft deletion.
Soft deletion in Azure is equivalent to bucket versioning in other storage providers.
Schedule when artifacts are deleted from W&B with W&B Artifact time-to-live (TTL) policy. When you delete an artifact, W&B marks that artifact as a soft-delete. In other words, the artifact is marked for deletion but files are not immediately deleted from storage. For more information on how W&B deletes artifacts, see the Delete artifacts page.
Check out this video tutorial to learn how to manage data retention with Artifacts TTL in the W&B App.
W&B deactivates the option to set a TTL policy for model artifacts linked to the Model Registry. This is to help ensure that linked models do not accidentally expire if used in production workflows.
Only team admins can view a team’s settings and access team level TTL settings such as (1) permitting who can set or edit a TTL policy or (2) setting a team default TTL.
If you do not see the option to set or edit a TTL policy in an artifact’s details in the W&B App UI or if setting a TTL programmatically does not successfully change an artifact’s TTL property, your team admin has not given you permissions to do so.
Auto-generated Artifacts
Only user-generated artifacts can use TTL policies. Artifacts auto-generated by W&B cannot have TTL policies set for them.
The following Artifact types indicate an auto-generated Artifact:
run_table
code
job
Any Artifact type starting with: wandb-*
You can check an Artifact’s type on the W&B platform or programmatically:
import wandb
run = wandb.init(project="<my-project-name>")
artifact = run.use_artifact(artifact_or_name="<my-artifact-name>")
print(artifact.type)
Replace the values enclosed with <> with your own.
Define who can edit and set TTL policies
Define who can set and edit TTL policies within a team. You can either grant TTL permissions only to team admins, or you can grant both team admins and team members TTL permissions.
Only team admins can define who can set or edit a TTL policy.
Navigate to your team’s profile page.
Select the Settings tab.
Navigate to the Artifacts time-to-live (TTL) section.
From the TTL permissions dropdown, select who can set and edit TTL policies.
Click on Review and save settings.
Confirm the changes and select Save settings.
Create a TTL policy
Set a TTL policy for an artifact either when you create the artifact or retroactively after the artifact is created.
For all the code snippets below, replace the content wrapped in <> with your information to use the code snippet.
Set a TTL policy when you create an artifact
Use the W&B Python SDK to define a TTL policy when you create an artifact. TTL policies are typically defined in days.
Defining a TTL policy when you create an artifact is similar to how you normally create an artifact, except that you also pass a time delta to the artifact's ttl attribute.
The following code snippet shows how to set a TTL policy for an artifact:
import wandb
from datetime import timedelta

run = wandb.init(project="<my-project-name>")

artifact = run.use_artifact("<my-entity/my-project/my-artifact:alias>")
artifact.ttl = timedelta(days=365 * 2)  # Delete in two years
artifact.save()
The preceding code example sets the TTL policy to two years.
Navigate to your W&B project in the W&B App UI.
Select the artifact icon on the left panel.
From the list of artifacts, expand the artifact type you are interested in.
Select the artifact version you want to edit the TTL policy for.
Click on the Version tab.
From the dropdown, select Edit TTL policy.
Within the modal that appears, select Custom from the TTL policy dropdown.
Within the TTL duration field, set the TTL policy in units of days.
Select the Update TTL button to save your changes.
Set default TTL policies for a team
Only team admins can set a default TTL policy for a team.
Set a default TTL policy for your team. Default TTL policies apply to all existing and future artifacts based on their respective creation dates. Artifacts with existing version-level TTL policies are not affected by the team’s default TTL.
Navigate to your team’s profile page.
Select the Settings tab.
Navigate to the Artifacts time-to-live (TTL) section.
Click on the Set team’s default TTL policy.
Within the Duration field, set the TTL policy in units of days.
Click on Review and save settings.
Confirm the changes and then select Save settings.
Set a TTL policy outside of a run
Use the public API to retrieve an artifact without fetching a run, and set the TTL policy. TTL policies are typically defined in days.
The following code sample shows how to fetch an artifact using the public API and set the TTL policy.
import wandb
from datetime import timedelta

api = wandb.Api()

artifact = api.artifact("entity/project/artifact:alias")
artifact.ttl = timedelta(days=365)  # Delete in one year
artifact.save()
Deactivate a TTL policy
Use the W&B Python SDK or W&B App UI to deactivate a TTL policy for a specific artifact version.
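Programmatically, a minimal sketch might look like the following; this assumes that clearing the ttl attribute (setting it to None) deactivates the policy for that artifact version:

import wandb

run = wandb.init(project="<my-project-name>")

artifact = run.use_artifact("<my-artifact-name>:latest")
artifact.ttl = None  # Assumption: None deactivates the TTL policy for this version
artifact.save()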
Within your project, select the Artifacts tab in the left sidebar.
Click on a collection.
Within the collection view you can see all of the artifacts in the selected collection. Within the Time to Live column you will see the TTL policy assigned to that artifact.
1.7.3 - Manage artifact storage and memory allocation
Manage storage, memory allocation of W&B Artifacts.
W&B stores artifact files in a private Google Cloud Storage bucket located in the United States by default. All files are encrypted at rest and in transit.
During training, W&B locally saves logs, artifacts, and configuration files in the following local directories:
File      | Default location | To change default location, set:
logs      | ./wandb          | dir in wandb.init, or the WANDB_DIR environment variable
artifacts | ~/.cache/wandb   | the WANDB_CACHE_DIR environment variable
configs   | ~/.config/wandb  | the WANDB_CONFIG_DIR environment variable
Depending on the machine wandb is initialized on, these default folders may not be located in a writeable part of the file system. This might trigger an error.
Clean up local artifact cache
W&B caches artifact files to speed up downloads across versions that share files in common. Over time this cache directory can become large. Run the wandb artifact cache cleanup command to prune the cache and to remove any files that have not been used recently.
The following code snippet demonstrates how to limit the size of the cache to 1GB. Copy and paste the code snippet into your terminal:
$ wandb artifact cache cleanup 1GB
1.8 - Explore artifact graphs
Traverse automatically created directed acyclic W&B Artifact graphs.
W&B automatically tracks the artifacts a given run logged as well as the artifacts a given run uses. These artifacts can include datasets, models, evaluation results, or more. You can explore an artifact’s lineage to track and manage the various artifacts produced throughout the machine learning lifecycle.
Lineage
Tracking an artifact’s lineage has several key benefits:
Reproducibility: By tracking the lineage of all artifacts, teams can reproduce experiments, models, and results, which is essential for debugging, experimentation, and validating machine learning models.
Version Control: Artifact lineage involves versioning artifacts and tracking their changes over time. This allows teams to roll back to previous versions of data or models if needed.
Auditing: Having a detailed history of the artifacts and their transformations enables organizations to comply with regulatory and governance requirements.
Collaboration and Knowledge Sharing: Artifact lineage facilitates better collaboration among team members by providing a clear record of attempts as well as what worked, and what didn’t. This helps in avoiding duplication of efforts and accelerates the development process.
Finding an artifact’s lineage
When selecting an artifact in the Artifacts tab, you can see your artifact’s lineage. This graph view shows a general overview of your pipeline.
To view an artifact graph:
Navigate to your project in the W&B App UI
Choose the artifact icon on the left panel.
Select Lineage.
Navigating the lineage graph
The artifact or job type you provide appears in front of its name, with artifacts represented by blue icons and runs represented by green icons. Arrows detail the input and output of a run or artifact on the graph.
You can view the type and the name of artifact in both the left sidebar and in the Lineage tab.
For a more detailed view, click any individual artifact or run to get more information on a particular object.
Artifact clusters
When a level of the graph has five or more runs or artifacts, it creates a cluster. A cluster has a search bar to find specific versions of runs or artifacts, and lets you pull an individual node out of the cluster to continue investigating the lineage of a node inside it.
Clicking on a node opens a preview with an overview of the node. Clicking on the arrow extracts the individual run or artifact so you can examine the lineage of the extracted node.
Create an artifact. First, create a run with wandb.init. Then, create a new artifact or retrieve an existing one with wandb.Artifact. Next, add files to the artifact with .add_file. Finally, log the artifact to the run with .log_artifact. The finished code looks something like this:
with wandb.init() as run:
    artifact = wandb.Artifact("artifact_name", "artifact_type")

    # Add files and assets to the artifact using
    # `.add`, `.add_file`, `.add_dir`, and `.add_reference`
    artifact.add_file("image1.png")
    run.log_artifact(artifact)
Use the artifact object’s logged_by and used_by methods to walk the graph from the artifact:
# Walk up and down the graph from an artifact:
producer_run = artifact.logged_by()
consumer_runs = artifact.used_by()
Learn where W&B files are stored by default. Explore how to save and store sensitive information.
Files are uploaded to a Google Cloud bucket managed by W&B when you log artifacts. The contents of the bucket are encrypted both at rest and in transit. Artifact files are only visible to users who have access to the corresponding project.
When you delete a version of an artifact, it is marked for soft deletion in our database and removed from your storage cost. When you delete an entire artifact, it is queued for permanent deletion and all of its contents are removed from the W&B bucket. If you have specific needs around file deletion, please reach out to Customer Support.
For sensitive datasets that cannot reside in a multi-tenant environment, you can use either a private W&B server connected to your cloud bucket or reference artifacts. Reference artifacts track references to private buckets without sending file contents to W&B. Reference artifacts maintain links to files on your buckets or servers. In other words, W&B only keeps track of the metadata associated with the files and not the files themselves.
Create a reference artifact similar to how you create a non-reference artifact:
import wandb
run = wandb.init()
artifact = wandb.Artifact("animals", type="dataset")
artifact.add_reference("s3://my-bucket/animals")
For alternatives, contact us at contact@wandb.com to talk about private cloud and on-premises installations.
1.10 - Tutorial: Create, track, and use a dataset artifact
Artifacts quickstart shows how to create, track, and use a dataset artifact with W&B.
This walkthrough demonstrates how to create, track, and use a dataset artifact from W&B Runs.
1. Log into W&B
Import the W&B library and log in to W&B. You will need to sign up for a free W&B account if you have not done so already.
import wandb
wandb.login()
2. Initialize a run
Use the wandb.init() API to generate a background process to sync and log data as a W&B Run. Provide a project name and a job type:
# Create a W&B Run. Here we specify 'upload-dataset' as the job type since this
# example shows how to create a dataset artifact.
run = wandb.init(project="artifacts-example", job_type="upload-dataset")
3. Create an artifact object
Create an artifact object with the wandb.Artifact() API. Provide a name for the artifact and the type of data it contains for the name and type parameters, respectively.
For example, the following code snippet demonstrates how to create an artifact called ‘bicycle-dataset’ with a ‘dataset’ label:
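A minimal sketch of that call:

artifact = wandb.Artifact(name="bicycle-dataset", type="dataset")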
For more information about how to construct an artifact, see Construct artifacts.
Add the dataset to the artifact
Add a file to the artifact. Common file types include models and datasets. The following example adds a dataset named dataset.h5 that is saved locally on our machine to the artifact:
# Add a file to the artifact's contents
artifact.add_file(local_path="dataset.h5")
Replace the filename dataset.h5 in the preceding code snippet with the path to the file you want to add to the artifact.
4. Log the dataset
Use the W&B run object's log_artifact() method to both save your artifact version and declare the artifact as an output of the run.
# Save the artifact version to W&B and mark it
# as the output of this run
run.log_artifact(artifact)
A 'latest' alias is created by default when you log an artifact. For more information about artifact aliases and versions, see Create a custom alias and Create new artifact versions, respectively.
5. Download and use the artifact
The following code example demonstrates the steps you can take to use an artifact you have logged and saved to the W&B servers.
First, initialize a new run object with wandb.init().
Second, use the run object's use_artifact() method to tell W&B what artifact to use. This returns an artifact object.
Third, use the artifact's download() method to download the contents of the artifact.
# Create a W&B Run. Here we specify 'training' as the job type
# because we will use this run to track training.
run = wandb.init(project="artifacts-example", job_type="training")

# Query W&B for an artifact and mark it as input to this run
artifact = run.use_artifact("bicycle-dataset:latest")

# Download the artifact's contents
artifact_dir = artifact.download()
Alternatively, you can use the Public API (wandb.Api) to export (or update) data already saved in W&B outside of a Run. See Track external files for more information.
2 - Tables
Iterate on datasets and understand model predictions
A Table is a two-dimensional grid of data where each column has a single type of data. Tables support primitive and numeric types, as well as nested lists, dictionaries, and rich media types.
import wandb
run = wandb.init(project="table-test")
# Create and log a new table.
my_table = wandb.Table(columns=["a", "b"], data=[["a1", "b1"], ["a2", "b2"]])
run.log({"Table Name": my_table})
Pass a Pandas Dataframe to wandb.Table() to create a new table.
import wandb
import pandas as pd
df = pd.read_csv("my_data.csv")
run = wandb.init(project="df-table")
my_table = wandb.Table(dataframe=df)
wandb.log({"Table Name": my_table})
For more information on supported data types, see the wandb.Table in the W&B API Reference Guide.
2. Visualize tables in your project workspace
View the resulting table in your workspace.
Navigate to your project in the W&B App.
Select the name of your run in your project workspace. A new panel is added for each unique table key.
In this example, my_table is logged under the key "Table Name".
3. Compare across model versions
Log sample tables from multiple W&B Runs and compare results in the project workspace. In this example workspace, we show how to combine rows from multiple different versions in the same table.
Use the table filter, sort, and grouping features to explore and evaluate model results.
2.2 - Visualize and analyze tables
Visualize and analyze W&B Tables.
Customize your W&B Tables to answer questions about your machine learning model’s performance, analyze your data, and more.
Interactively explore your data to:
Compare changes precisely across models, epochs, or individual examples
Understand higher-level patterns in your data
Capture and communicate your insights with visual samples
W&B Tables possess the following behaviors:
Stateless in an artifact context: any table logged alongside an artifact version resets to its default state after you close the browser window
Stateful in a workspace or report context: any changes you make to a table in a single run workspace, multi-run project workspace, or Report persist.
For information on how to save your current W&B Table view, see Save your view.
How to view two tables
Compare two tables with a merged view or a side-by-side view. For example, the image below demonstrates a table comparison of MNIST data.
Follow these steps to compare two tables:
Go to your project in the W&B App.
Select the artifacts icon on the left panel.
Select an artifact version.
In the following image we demonstrate a model’s predictions on MNIST validation data after each of five epochs (view interactive example here).
Hover over the second artifact version you want to compare in the sidebar and click Compare when it appears. For example, in the image below we select a version labeled as “v4” to compare to MNIST predictions made by the same model after 5 epochs of training.
Merged view
Initially you see both tables merged together. The first table selected has index 0 and a blue highlight, and the second table has index 1 and a yellow highlight. View a live example of merged tables here.
From the merged view, you can
choose the join key: use the dropdown at the top left to set the column to use as the join key for the two tables. Typically this is the unique identifier of each row, such as the filename of a specific example in your dataset or an incrementing index on your generated samples. Note that it’s currently possible to select any column, which may yield illegible tables and slow queries.
concatenate instead of join: select “concatenating all tables” in this dropdown to union all the rows from both tables into one larger Table instead of joining across their columns
reference each Table explicitly: use 0, 1, and * in the filter expression to explicitly specify a column in one or both table instances
visualize detailed numerical differences as histograms: compare the values in any cell at a glance
Side-by-side view
To view the two tables side-by-side, change the first dropdown from “Merge Tables: Table” to “List of: Table” and then update the “Page size” respectively. Here the first Table selected is on the left and the second one is on the right. Also, you can compare these tables vertically as well by clicking on the “Vertical” checkbox.
compare the tables at a glance: apply any operations (sort, filter, group) to both tables in tandem and spot any changes or differences quickly. For example, view the incorrect predictions grouped by guess, the hardest negatives overall, the confidence score distribution by true label, etc.
explore two tables independently: scroll through and focus on the side/rows of interest
Log a table in an artifact for each meaningful step of training to analyze model performance over training time. For example, you could log a table at the end of every validation step, after every 50 epochs of training, or any frequency that makes sense for your pipeline. Use the side-by-side view to visualize changes in model predictions.
For a more detailed walkthrough of visualizing predictions across training time, see this report and this interactive notebook example.
Compare tables across model variants
Compare two artifact versions logged at the same step for two different models to analyze model performance across different configurations (hyperparameters, base architectures, and so forth).
For example, compare predictions between a baseline and a new model variant, 2x_layers_2x_lr, where the first convolutional layer doubles from 32 to 64, the second from 128 to 256, and the learning rate from 0.001 to 0.002. From this live example, use the side-by-side view and filter down to the incorrect predictions after 1 (left tab) versus 5 training epochs (right tab).
Save your view
Tables you interact with in the run workspace, project workspace, or a report automatically save their view state. If you apply any table operations then close your browser, the table retains the last viewed configuration when you next navigate to the table.
Tables you interact with in the artifact context remain stateless.
To save a table from a workspace in a particular state, export it to a W&B Report. To export a table to report:
Select the kebab icon (three vertical dots) in the top right corner of your workspace visualization panel.
Select either Share panel or Add to report.
Examples
The following sections highlight some of the ways you can use tables:
View your data
Log metrics and rich media during model training or evaluation, then visualize results in a persistent database synced to the cloud, or to your hosting instance.
View, sort, filter, group, join, and query tables to understand your data and model performance—no need to browse static files or rerun analysis scripts.
Zoom in to visualize a specific prediction at a specific step. Zoom out to see the aggregate statistics, identify patterns of errors, and understand opportunities for improvement. This tool works for comparing steps from a single model training, or results across different model versions.
Interact with audio tables in this report on timbre transfer. You can compare a recorded whale song with a synthesized rendition of the same melody on an instrument like violin or trumpet. You can also record your own songs and explore their synthesized versions in W&B with this colab.
Text
Browse text samples from training data or generated output, dynamically group by relevant fields, and align your evaluation across model variants or experiment settings. Render text as Markdown or use visual diff mode to compare texts. Explore a simple character-based RNN for generating Shakespeare in this report.
Video
Browse and aggregate over videos logged during training to understand your models. Here is an early example using the SafeLife benchmark for RL agents seeking to minimize side effects.
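As a rough sketch, rich media like the examples above can be logged to a single table; the synthetic audio and video data below are placeholders for your own samples:
import numpy as np
import wandb

# Sketch: log text, audio, and video samples to one table for browsing and grouping.
with wandb.init(project="rich-media-tables") as run:
    table = wandb.Table(columns=["caption", "audio", "video"])
    for i in range(3):
        caption = f"sample {i}"  # placeholder text field
        waveform = np.random.uniform(-1, 1, 16000)  # one second of synthetic audio
        frames = np.random.randint(0, 255, (10, 3, 64, 64), dtype=np.uint8)  # synthetic video frames
        table.add_data(caption, wandb.Audio(waveform, sample_rate=16000), wandb.Video(frames, fps=4))
    run.log({"media_samples": table})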
Like all W&B Artifacts, Tables can be converted into pandas dataframes for easy data exporting.
Convert table to artifact
First, you’ll need to log the table to an artifact, then retrieve it with artifact.get("table_name"):
import wandb

# Create and log a new table.
with wandb.init() as r:
    artifact = wandb.Artifact("my_dataset", type="dataset")
    table = wandb.Table(
        columns=["a", "b", "c"], data=[(i, i * 2, 2**i) for i in range(10)]
    )
    artifact.add(table, "my_table")
    wandb.log_artifact(artifact)

# Retrieve the created table using the artifact you created.
with wandb.init() as r:
    artifact = r.use_artifact("my_dataset:latest")
    table = artifact.get("my_table")
Convert artifact to Dataframe
Then, convert the table into a dataframe:
# Following from the last code example:
df = table.get_dataframe()
Export Data
Now you can export the data using any method the dataframe supports:
# Converting the table data to .csv
df.to_csv("example.csv", encoding="utf-8")
Share updates with collaborators, either as a LaTeX zip file or a PDF.
The following image shows a section of a report created from metrics that were logged to W&B over the course of training.
View the report where the above image was taken from here.
How it works
Create a collaborative report with a few clicks.
Navigate to your W&B project workspace in the W&B App.
Click the Create report button in the upper right corner of your workspace.
A modal titled Create Report will appear. Select the charts and panels you want to add to your report. (You can add or remove charts and panels later).
Click Create report.
Edit the report to your desired state.
Click Publish to project.
Click the Share button to share your report with collaborators.
See the Create a report page for more information on how to create reports interactively and programmatically with the W&B Python SDK.
How to get started
Depending on your use case, explore the following resources to get started with W&B Reports:
Navigate to your project workspace in the W&B App.
Click Create report in the upper right corner of your workspace.
A modal will appear. Select the charts you would like to start with. You can add or delete charts later from the report interface.
Select the Filter run sets option to prevent new runs from being added to your report. You can toggle this option on or off. Once you click Create report, a draft report will be available in the report tab to continue working on.
Navigate to your project workspace in the W&B App.
Select the Reports tab (clipboard icon) in your project.
Select the Create Report button on the report page.
Create a report programmatically with the wandb library.
Install W&B SDK and Workspaces API:
pip install wandb wandb-workspaces
Next, import W&B and the Workspaces API:
import wandb
import wandb_workspaces.reports.v2 as wr
Create a report instance with the wandb_workspaces.reports.v2.Report class. Specify a name for the project:
report = wr.Report(project="report_standard")
Save the report. Reports are not uploaded to the W&B server until you call the .save() method:
report.save()
For information on how to edit a report interactively with the App UI or programmatically, see Edit a report.
3.2 - Edit a report
Edit a report interactively with the App UI or programmatically with the W&B SDK.
Reports consist of blocks. Blocks make up the body of a report. Within these blocks you can add text, images, embedded visualizations, plots from experiments and runs, and panel grids.
Panel grids are a specific type of block that hold panels and run sets. Run sets are a collection of runs logged to a project in W&B. Panels are visualizations of run set data.
Ensure that you have wandb-workspaces installed in addition to the W&B Python SDK if you want to programmatically edit a report:
pip install wandb wandb-workspaces
Add plots
Each panel grid has a set of run sets and a set of panels. The run sets at the bottom of the section control what data shows up on the panels in the grid. Create a new panel grid if you want to add charts that pull data from a different set of runs.
Enter a forward slash (/) in the report to display a dropdown menu. Select Add panel to add a panel. You can add any panel that is supported by W&B, including a line plot, scatter plot or parallel coordinates chart.
Add plots to a report programmatically with the SDK. Pass a list of one or more plot or chart objects to the panels parameter in the PanelGrid Public API Class. Create a plot or chart object with its associated Python Class.
The following example demonstrates how to create a line plot and a scatter plot:
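This is a minimal sketch; the metric keys ("step", "val_loss", "accuracy") are placeholders for metrics you log in your own runs:
import wandb_workspaces.reports.v2 as wr

report = wr.Report(project="report-editing")

# A panel grid holding a line plot and a scatter plot.
report.blocks = [
    wr.PanelGrid(
        panels=[
            wr.LinePlot(x="step", y=["val_loss"]),
            wr.ScatterPlot(x="val_loss", y="accuracy"),
        ]
    )
]

report.save()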
For more information about available plots and charts you can add to a report programmatically, see wr.panels.
Add run sets
Add run sets from projects interactively with the App UI or the W&B SDK.
Enter a forward slash (/) in the report to display a dropdown menu. From the dropdown, choose Panel Grid. This will automatically import the run set from the project the report was created from.
Add run sets from projects with the wr.Runset() and wr.PanelGrid classes. The following procedure describes how to add a run set (a sketch follows these steps):
Create a wr.Runset() object instance. Provide the name of the project that contains the runsets for the project parameter and the entity that owns the project for the entity parameter.
Create a wr.PanelGrid() object instance. Pass a list of one or more runset objects to the runsets parameter.
Store one or more wr.PanelGrid() object instances in a list.
Update the report instance blocks attribute with the list of panel grid instances.
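A minimal sketch of these steps; the entity and project names are hypothetical and should be replaced with your own:
import wandb_workspaces.reports.v2 as wr

report = wr.Report(project="report-editing")

# Replace the entity and project with your own values.
runset = wr.Runset(entity="my-entity", project="report-editing")

# Pass one or more run sets to the panel grid.
panel_grid = wr.PanelGrid(runsets=[runset])

# Update the report's blocks attribute with the list of panel grids.
report.blocks = [panel_grid]
report.save()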
Add code blocks to your report interactively with the App UI or with the W&B SDK.
Enter a forward slash (/) in the report to display a dropdown menu. From the dropdown choose Code.
Select the name of the programming language on the right-hand side of the code block. This expands a dropdown. From the dropdown, select your programming language syntax. You can choose from JavaScript, Python, CSS, JSON, HTML, Markdown, and YAML.
Use the wr.CodeBlock Class to create a code block programmatically. Provide the name of the language and the code you want to display for the language and code parameters, respectively.
For example, the following code block displays a YAML list:
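This is a sketch that assumes the code parameter accepts a multiline string; check wr.CodeBlock in your installed SDK version for the exact signature:
import wandb_workspaces.reports.v2 as wr

report = wr.Report(project="report-editing")

# A short YAML list rendered in a code block.
yaml_snippet = "this:\n- is\n- a\ncool:\n- yaml\n- file"

report.blocks = [
    wr.CodeBlock(code=yaml_snippet, language="yaml")
]

report.save()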
Add markdown to your report interactively with the App UI or with the W&B SDK.
Enter a forward slash (/) in the report to display a dropdown menu. From the dropdown choose Markdown.
Use the wr.MarkdownBlock class to create a markdown block programmatically. Pass a string to the text parameter:
import wandb
import wandb_workspaces.reports.v2 as wr
report = wr.Report(project="report-editing")
report.blocks = [
wr.MarkdownBlock(text="Markdown cell with *italics* and **bold** and $e=mc^2$")
]

report.save()
This will render a markdown block similar to:
Add HTML elements
Add HTML elements to your report interactively with the App UI or with the W&B SDK.
Enter a forward slash (/) in the report to display a dropdown menu. From the dropdown select a type of text block. For example, to create an H2 heading block, select the Heading 2 option.
Pass a list of one or more HTML element blocks to the report's blocks attribute. The following example demonstrates how to create an H1, an H2, and an unordered list:
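This is a sketch assuming the H1, H2, and UnorderedList block classes in wandb_workspaces.reports.v2; the heading and bullet text are placeholders:
import wandb_workspaces.reports.v2 as wr

report = wr.Report(project="report-editing")

report.blocks = [
    wr.H1(text="How Programmatic Reports work"),       # H1 heading
    wr.H2(text="Heading 2"),                            # H2 heading
    wr.UnorderedList(items=["Bullet 1", "Bullet 2"]),   # unordered list
]

report.save()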
This renders HTML elements similar to the following:
Embed rich media links
Embed rich media within the report with the App UI or with the W&B SDK.
Copy and paste URLs into reports to embed rich media within the report. The following animations demonstrate how to copy and paste URLs from Twitter, YouTube, and SoundCloud.
Twitter
Copy and paste a Tweet link URL into a report to view the Tweet within the report.
YouTube
Copy and paste a YouTube video URL link to embed a video in the report.
SoundCloud
Copy and paste a SoundCloud link to embed an audio file into a report.
Pass a list of one or more embedded media objects to the report's blocks attribute. The following example demonstrates how to embed video and Twitter media into a report:
import wandb
import wandb_workspaces.reports.v2 as wr
report = wr.Report(project="report-editing")
report.blocks = [
    wr.Video(url="https://www.youtube.com/embed/6riDJMI-Y8U"),
    wr.Twitter(
        embed_html='<blockquote class="twitter-tweet"><p lang="en" dir="ltr">The voice of an angel, truly. <a href="https://twitter.com/hashtag/MassEffect?src=hash&ref_src=twsrc%5Etfw">#MassEffect</a> <a href="https://t.co/nMev97Uw7F">pic.twitter.com/nMev97Uw7F</a></p>— Mass Effect (@masseffect) <a href="https://twitter.com/masseffect/status/1428748886655569924?ref_src=twsrc%5Etfw">August 20, 2021</a></blockquote>\n'
    ),
]
report.save()
Duplicate and delete panel grids
If you have a layout that you would like to reuse, you can select a panel grid and copy-paste it to duplicate it in the same report or even paste it into a different report.
Highlight a whole panel grid section by selecting the drag handle in the upper right corner. Click and drag to highlight and select a region in a report such as panel grids, text, and headings.
Select a panel grid and press delete on your keyboard to delete a panel grid.
Collapse headers to organize Reports
Collapse headers in a report to hide content within a text block. When the report is loaded, only headers that are expanded show content. Collapsing headers in reports can help organize your content and prevent excessive data loading. The following GIF demonstrates the process.
3.3 - Collaborate on reports
Collaborate and share W&B Reports with peers, co-workers, and your team.
Once you have saved a report, you can select the Share button to collaborate. A draft copy of the report is created when you select the Edit button. Draft reports auto-save. Select Save to report to publish your changes to the shared report.
A warning notification will appear if an edit conflict occurs. This can occur if you and another collaborator edit the same report at the same time. The warning notification will guide you to resolve potential edit conflicts.
Comment on reports
Click the comment button on a panel in a report to add a comment directly to that panel.
3.4 - Clone and export reports
Export a W&B Report as a PDF or LaTeX.
Export reports
Export a report as a PDF or LaTeX. Within your report, select the kebab icon to expand the dropdown menu. Choose Download and select either PDF or LaTeX output format.
Cloning reports
Within your report, select the kebab icon to expand the dropdown menu. Choose the Clone this report button. Pick a destination for your cloned report in the modal. Choose Clone report.
Clone a report to reuse a project’s template and format. Cloned reports are visible to your team if you clone a report within the team’s account. Reports cloned within an individual’s account are only visible to that user.
Embed W&B reports directly into Notion or with an HTML IFrame element.
HTML iframe element
Select the Share button on the upper right hand corner within a report. A modal window will appear. Within the modal window, select Copy embed code. The copied code will render within an Inline Frame (IFrame) HTML element. Paste the copied code into an iframe HTML element of your choice.
Only public reports are viewable when embedded.
Confluence
The following animation demonstrates how to insert the direct link to the report within an IFrame cell in Confluence.
Notion
The following animation demonstrates how to insert a report into a Notion document using an Embed block in Notion and the report’s embed code.
Gradio
You can use the gr.HTML element to embed W&B Reports within Gradio Apps and use them within Hugging Face Spaces.
import gradio as gr
def wandb_report(url):
    iframe = f'<iframe src={url} style="border:none;height:1024px;width:100%">'
    return gr.HTML(iframe)

with gr.Blocks() as demo:
    report = wandb_report(
        "https://wandb.ai/_scott/pytorch-sweeps-demo/reports/loss-22-10-07-16-00-17---VmlldzoyNzU2NzAx"
    )
demo.launch()
3.6 - Compare runs across projects
Compare runs from two different projects with cross-project reports.
Compare runs from two different projects with cross-project reports. Use the project selector in the run set table to pick a project.
The visualizations in the section pull columns from the first active run set. If you do not see the metric you are looking for in the line plot, make sure that the first run set checked in the section has that column available.
This feature supports history data on time series lines, but we don’t support pulling different summary metrics from different projects. In other words, you cannot create a scatter plot from columns that are only logged in another project.
If you need to compare runs from two projects and the columns are not working, add a tag to the runs in one project and then move those runs to the other project. You can still filter only the runs from each project, but the report includes all the columns for both sets of runs.
View-only report links
Share a view-only link to a report that is in a private project or team project.
View-only report links add a secret access token to the URL, so anyone who opens the link can view the page. Anyone can use the magic link to view the report without logging in first. For customers on W&B Local private cloud installations, these links remain behind your firewall, so only members of your team with access to your private instance and access to the view-only link can view the report.
In view-only mode, someone who is not logged in can see the charts, mouse over to see tooltips of values, zoom in and out on charts, and scroll through columns in the table. In view mode, they cannot create new charts or new table queries to explore the data. View-only visitors to the report link cannot click a run to get to the run page. They also cannot see the share modal; instead, hovering over it shows a tooltip that says: Sharing not available for view only access.
Magic links are only available for “Private” and “Team” projects. For “Public” (anyone can view) or “Open” (anyone can view and contribute runs) projects, you cannot turn the link on or off because the project is already available to anyone with the link.
Send a graph to a report
Send a graph from your workspace to a report to keep track of your progress. Click the dropdown menu on the chart or panel you’d like to copy to a report and click Add to report to select the destination report.
3.7 - Example reports
Reports gallery
Notes: Add a visualization with a quick summary
Capture an important observation, an idea for future work, or a milestone reached in the development of a project. All experiment runs in your report will link to their parameters, metrics, logs, and code, so you can save the full context of your work.
Jot down some text and pull in relevant charts to illustrate your insight.
Save the best examples from a complex code base for easy reference and future interaction. See the LIDAR point clouds W&B Report for an example of how to visualize LIDAR point clouds from the Lyft dataset and annotate with 3D bounding boxes.
Collaboration: Share findings with your colleagues
Explain how to get started with a project, share what you’ve observed so far, and synthesize the latest findings. Your colleagues can make suggestions or discuss details using comments on any panel or at the end of the report.
Include dynamic settings so that your colleagues can explore for themselves, get additional insights, and better plan their next steps. In this example, three types of experiments can be visualized independently, compared, or averaged.
See the SafeLife benchmark experiments W&B Report for an example of how to share first runs and observations of a benchmark.
Work log: Track what you’ve tried and plan next steps
Write down your thoughts on experiments, your findings, and any gotchas and next steps as you work through a project, keeping everything organized in one place. This lets you “document” all the important pieces beyond your scripts. See the Who Is Them? Text Disambiguation With Transformers W&B Report for an example of how you can report your findings.
Tell the story of a project, which you and others can reference later to understand how and why a model was developed. See The View from the Driver’s Seat W&B Report for how you can report your findings.
See the Learning Dexterity End-to-End Using W&B Reports report for an example of how the OpenAI Robotics team used W&B Reports to run massive machine learning projects.