Document and share insights across the entire organization by generating live reports in digestible, visual formats that are easily understood by non-technical stakeholders.
Use W&B Artifacts to track and version data as the inputs and outputs of your W&B Runs. For example, a model training run might take in a dataset as input and produce a trained model as output. You can log hyperparameters, metadata, and metrics to a run, and you can use an artifact to log, track, and version the dataset used to train the model as input and another artifact for the resulting model checkpoints as output.
Use cases
You can use artifacts throughout your entire ML workflow as inputs and outputs of runs. You can use datasets, models, or even other artifacts as inputs for processing.
To create an artifact:
1. Create a W&B run.
2. Create an artifact object.
3. Add one or more files, such as a model file or dataset, to your artifact object.
4. Log your artifact to W&B.
For example, the following code snippet shows how to log a file called dataset.h5 to an artifact called example_artifact:
import wandb
run = wandb.init(project="artifacts-example", job_type="add-dataset")
artifact = wandb.Artifact(name="example_artifact", type="dataset")
artifact.add_file(local_path="./dataset.h5", name="training_dataset")
artifact.save()
# Logs the artifact version as a dataset, saving dataset.h5 under the name "training_dataset"
See the track external files page for information on how to add references to files or directories stored in external object storage, like an Amazon S3 bucket.
Download an artifact
Indicate the artifact you want to mark as input to your run with the use_artifact method.
Following the preceding code snippet, this next code block shows how to use the training_dataset artifact:
artifact = run.use_artifact("training_dataset:latest")  # mark the artifact as an input to this run
This returns an artifact object.
Next, use the returned object to download all contents of the artifact:
datadir = artifact.download()  # downloads the full "training_dataset" artifact to the default directory
You can pass a custom path into the root parameter to download an artifact to a specific directory. For alternate ways to download artifacts and to see additional parameters, see the guide on downloading and using artifacts.
1. Create an artifact Python object with wandb.Artifact()
Initialize the wandb.Artifact() class to create an artifact object. Specify the following parameters:
Name: Specify a name for your artifact. The name should be unique, descriptive, and easy to remember. Use an artifact's name both to identify the artifact in the W&B App UI and when you want to use that artifact.
Type: Provide a type. The type should be simple, descriptive, and correspond to a single step of your machine learning pipeline. Common artifact types include 'dataset' or 'model'.
The name and type you provide are used to create a directed acyclic graph. This means you can view the lineage of an artifact in the W&B App.
Artifacts cannot have the same name, even if you specify a different value for the type parameter. In other words, you cannot create an artifact named cats of type dataset and another artifact named cats of type model.
You can optionally provide a description and metadata when you initialize an artifact object. For more information on available attributes and parameters, see wandb.Artifact Class definition in the Python SDK Reference Guide.
The following example demonstrates how to create a dataset artifact:
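A minimal sketch (the name and type strings here are placeholders to replace with your own values):

import wandb

artifact = wandb.Artifact(name="example_artifact", type="dataset")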
Replace the string arguments in the preceding code snippet with your own name and type.
2. Add one or more files to the artifact
Add files, directories, external URI references (such as Amazon S3), and more with artifact methods. For example, to add a single text file, use the add_file method (the following sketch assumes a local file named hello_world.txt):
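# hello_world.txt is a placeholder for your own file
artifact.add_file(local_path="hello_world.txt", name="optional-name")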
You can also add multiple files with the add_dir method. For more information on how to add files, see Update an artifact.
3. Save your artifact to the W&B server
Finally, save your artifact to the W&B server. Artifacts are associated with a run, so use the run object's log_artifact() method to save the artifact, as in the sketch below.
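Continuing the example above, a minimal sketch:

# Save the artifact version and mark it as an output of the run
run.log_artifact(artifact)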
You can optionally construct an artifact outside of a W&B run. For more information, see Track external files.
Calls to log_artifact are performed asynchronously for performant uploads. This can cause surprising behavior when logging artifacts in a loop. For example:
for i in range(10):
    a = wandb.Artifact(
        "race",
        type="dataset",
        metadata={
            "index": i,
        },
    )
    # ... add files to artifact a ...
    run.log_artifact(a)
The artifact version v0 is NOT guaranteed to have an index of 0 in its metadata, as the artifacts may be logged in an arbitrary order.
Add files to an artifact
The following sections demonstrate how to construct artifacts with different file types and from parallel runs.
For the following examples, assume you have a project directory with multiple files and a directory structure like the following (a representative layout used by the examples below):
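project-directory
|-- images
|   |-- cat.png
|   |-- dog.png
|-- checkpoints
|   |-- model.h5
|-- model.h5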
The following code snippet demonstrates how to add an entire local directory to your artifact:

# Recursively add a directory
artifact.add_dir(local_path="path/to/directory", name="optional-prefix")
The following API calls produce the resulting artifact contents:

| API Call | Resulting artifact |
| --- | --- |
| artifact.add_dir('images') | cat.png, dog.png |
| artifact.add_dir('images', name='images') | images/cat.png, images/dog.png |
| artifact.new_file('hello.txt') | hello.txt |
Add a URI reference
Artifacts track checksums and other information for reproducibility if the URI has a scheme that the W&B library knows how to handle.
Add an external URI reference to an artifact with the add_reference method. Replace the 'uri' string with your own URI. Optionally pass the desired path within the artifact for the name parameter.
# Add a URI reference
artifact.add_reference(uri="uri", name="optional-name")
Artifacts currently support the following URI schemes:
http(s)://: A path to a file accessible over HTTP. The artifact will track checksums in the form of etags and size metadata if the HTTP server supports the ETag and Content-Length response headers.
s3://: A path to an object or object prefix in S3. The artifact will track checksums and versioning information (if the bucket has object versioning enabled) for the referenced objects. Object prefixes are expanded to include the objects under the prefix, up to a maximum of 10,000 objects.
gs://: A path to an object or object prefix in GCS. The artifact will track checksums and versioning information (if the bucket has object versioning enabled) for the referenced objects. Object prefixes are expanded to include the objects under the prefix, up to a maximum of 10,000 objects.
For large datasets or distributed training, multiple parallel runs might need to contribute to a single artifact.
import wandb
import time

# We will use ray to launch our runs in parallel
# for demonstration purposes. You can orchestrate
# your parallel runs however you want.
import ray

ray.init()

artifact_type = "dataset"
artifact_name = "parallel-artifact"
table_name = "distributed_table"
parts_path = "parts"
num_parallel = 5

# Each batch of parallel writers should have its own
# unique group name.
group_name = "writer-group-{}".format(round(time.time()))


@ray.remote
def train(i):
    """
    Our writer job. Each writer will add one image to the artifact.
    """
    with wandb.init(group=group_name) as run:
        artifact = wandb.Artifact(name=artifact_name, type=artifact_type)

        # Add data to a wandb table. In this case we use example data
        table = wandb.Table(columns=["a", "b", "c"], data=[[i, i * 2, 2**i]])

        # Add the table to folder in the artifact
        artifact.add(table, "{}/table_{}".format(parts_path, i))

        # Upserting the artifact creates or appends data to the artifact
        run.upsert_artifact(artifact)


# Launch your runs in parallel
result_ids = [train.remote(i) for i in range(num_parallel)]

# Join on all the writers to make sure their files have
# been added before finishing the artifact.
ray.get(result_ids)

# Once all the writers are finished, finish the artifact
# to mark it ready.
with wandb.init(group=group_name) as run:
    artifact = wandb.Artifact(artifact_name, type=artifact_type)

    # Create a "PartitionTable" pointing to the folder of tables
    # and add it to the artifact.
    artifact.add(wandb.data_types.PartitionedTable(parts_path), table_name)

    # Finish artifact finalizes the artifact, disallowing future "upserts"
    # to this version.
    run.finish_artifact(artifact)
1.2 - Download and use artifacts
Download and use Artifacts from multiple projects.
Download and use an artifact that is already stored on the W&B server, or construct an artifact object and pass it in for deduplication as necessary.
Team members with view-only seats cannot download artifacts.
Download and use an artifact stored on W&B
Download and use an artifact stored in W&B either inside or outside of a W&B Run. Use the Public API (wandb.Api) to export (or update) data already saved in W&B. For more information, see the W&B Public API Reference guide.
First, import the W&B Python SDK. Next, create a W&B Run:
import wandb
run = wandb.init(project="<example>", job_type="<job-type>")
Indicate the artifact you want to use with the use_artifact method, which returns an artifact object. The following code snippet specifies an artifact called 'bike-dataset' with the alias 'latest':
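artifact = run.use_artifact("bike-dataset:latest")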
Use the object returned to download all the contents of the artifact:
datadir = artifact.download()
You can optionally pass a path to the root parameter to download the contents of the artifact to a specific directory. For more information, see the Python SDK Reference Guide.
Use the get_path method to download only a subset of files:
path = artifact.get_path(name)
This fetches only the file at the path name. It returns an Entry object with the following methods:
Entry.download: Downloads file from the artifact at path name
Entry.ref: If add_reference stored the entry as a reference, returns the URI
References that have schemes that W&B knows how to handle get downloaded just like artifact files. For more information, see Track external files.
First, import the W&B SDK. Next, create an artifact from the Public API Class. Provide the entity, project, artifact, and alias associated with that artifact:
import wandb
api = wandb.Api()
artifact = api.artifact("entity/project/artifact:alias")
Use the object returned to download the contents of the artifact:
artifact.download()
You can optionally pass a path to the root parameter to download the contents of the artifact to a specific directory. For more information, see the API Reference Guide.
Use the wandb artifact get command to download an artifact from the W&B server.
$ wandb artifact get project/artifact:alias --root mnist/
Partially download an artifact
You can optionally download part of an artifact based on a prefix. Using the path_prefix parameter, you can download a single file or the content of a sub-folder.
artifact = run.use_artifact("bike-dataset:latest")
artifact.download(path_prefix="bike.png") # downloads only bike.png
Alternatively, you can download files from a certain directory:
artifact.download(path_prefix="images/bikes/") # downloads files in the images/bikes directory
Use an artifact from a different project
Specify the name of the artifact along with its project name to reference an artifact. You can also reference artifacts across entities by specifying the name of the artifact with its entity name.
The following code example demonstrates how to query an artifact from another project as input to the current W&B run.
import wandb
run = wandb.init(project="<example>", job_type="<job-type>")
# Query W&B for an artifact from another project and mark it
# as an input to this run.
artifact = run.use_artifact("my-project/artifact:alias")

# Use an artifact from another entity and mark it as an input
# to this run.
artifact = run.use_artifact("my-entity/my-project/artifact:alias")
Construct and use an artifact simultaneously
Simultaneously construct and use an artifact. Create an artifact object and pass it to use_artifact. This creates an artifact in W&B if it does not exist yet. The use_artifact API is idempotent, so you can call it as many times as you like.
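A minimal sketch (the artifact name reference_artifact and the file file_1.txt are placeholders):

import wandb

run = wandb.init(project="<example>", job_type="<job-type>")

# If the artifact does not exist yet, this creates it;
# either way, it is marked as an input to the run.
artifact = wandb.Artifact("reference_artifact", type="dataset")
artifact.add_file("file_1.txt")
artifact = run.use_artifact(artifact)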
1.3 - Update an artifact
Update an existing Artifact inside and outside of a W&B Run.
Pass desired values to update the description, metadata, and alias of an artifact. Call the save() method to update the artifact on the W&B servers. You can update an artifact during a W&B Run or outside of a Run.
Use the W&B Public API (wandb.Api) to update an artifact outside of a run. Use the Artifact API (wandb.Artifact) to update an artifact during a run.
You cannot update the alias of an artifact linked to a model in the Model Registry.
The following code example demonstrates how to update the description of an artifact using the wandb.Artifact API:
import wandb
run = wandb.init(project="<example>")
artifact = run.use_artifact("<artifact-name>:<alias>")
artifact.description = "<description>"
artifact.save()
The following code example demonstrates how to update the description of an artifact using the wandb.Api API:
import wandb
api = wandb.Api()
artifact = api.artifact("entity/project/artifact:alias")
# Update the description
artifact.description = "My new description"

# Selectively update metadata keys
artifact.metadata["oldKey"] = "new value"

# Replace the metadata entirely
artifact.metadata = {"newKey": "new value"}

# Add an alias
artifact.aliases.append("best")

# Remove an alias
artifact.aliases.remove("latest")

# Completely replace the aliases
artifact.aliases = ["replaced"]

# Persist all artifact modifications
artifact.save()
For more information, see the Weights and Biases Artifact API.
You can also update an Artifact collection in the same way as a singular artifact:
import wandb
run = wandb.init(project="<example>")
api = wandb.Api()
artifact = api.artifact_collection(type="<type-name>", collection="<collection-name>")
artifact.name = "<new-collection-name>"
artifact.description = "<This is where you'd describe the purpose of your collection.>"
artifact.save()
1.4 - Create a custom alias
Use aliases as pointers to specific versions. By default, Run.log_artifact adds the latest alias to the logged version.
An artifact version v0 is created and attached to your artifact when you log an artifact for the first time. W&B checksums the contents when you log again to the same artifact. If the artifact changed, W&B saves a new version v1.
For example, if you want your training script to pull the most recent version of a dataset, specify latest when you use that artifact. The following code example demonstrates how to download a recent dataset artifact named bike-dataset that has the alias latest:
import wandb
run = wandb.init(project="<example-project>")
artifact = run.use_artifact("bike-dataset:latest")
artifact.download()
You can also apply a custom alias to an artifact version. For example, if you want to mark which model checkpoint is the best on the metric AP-50, you could add the string 'best-ap50' as an alias when you log the model artifact, as in the sketch below.
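A minimal sketch, assuming a model artifact object named artifact (custom aliases are passed through log_artifact's aliases parameter):

run.log_artifact(artifact, aliases=["latest", "best-ap50"])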
1.5 - Create new artifact versions
Create a new artifact version from a single run or from a distributed process.
Create a new artifact version with a single run or collaboratively with distributed runs. You can optionally create a new artifact version from a previous version, known as an incremental artifact.
We recommend that you create an incremental artifact when you need to apply changes to a subset of files in an artifact, especially when the original artifact is significantly larger than that subset.
Create new artifact versions from scratch
There are two ways to create a new artifact version: from a single run and from distributed runs. They are defined as follows:
Single run: A single run provides all the data for a new version. This is the most common case and is best suited when the run fully recreates the needed data. For example: outputting saved models or model predictions in a table for analysis.
Distributed runs: A set of runs collectively provides all the data for a new version. This is best suited for distributed jobs which have multiple runs generating data, often in parallel. For example: evaluating a model in a distributed manner, and outputting the predictions.
W&B creates a new artifact and assigns it a v0 alias if you pass a name to the wandb.Artifact API that does not exist in your project. W&B checksums the contents when you log to the same artifact again. If the artifact changed, W&B saves a new version v1.
W&B retrieves an existing artifact if you pass a name and artifact type to the wandb.Artifact API that match an existing artifact in your project. The retrieved artifact will have a version greater than v0.
Single run
Log a new version of an Artifact with a single run that produces all the files in the artifact.
Based on your use case, select one of the tabs below to create a new artifact version inside or outside of a run:
Create an artifact version within a W&B run:
Create a run with wandb.init.
Create a new artifact or retrieve an existing one with wandb.Artifact.
Add files to the artifact with .add_file.
Log the artifact to the run with .log_artifact.
with wandb.init() as run:
    artifact = wandb.Artifact("artifact_name", "artifact_type")

    # Add files and assets to the artifact using
    # `.add`, `.add_file`, `.add_dir`, and `.add_reference`
    artifact.add_file("image1.png")

    run.log_artifact(artifact)
Create an artifact version outside of a W&B run:
Create a new artifact or retrieve an existing one with wandb.Artifact.
Add files to the artifact with .add_file.
Save the artifact with .save.
artifact = wandb.Artifact("artifact_name", "artifact_type")
# Add files and assets to the artifact using
# `.add`, `.add_file`, `.add_dir`, and `.add_reference`
artifact.add_file("image1.png")
artifact.save()
Distributed runs
Allow a collection of runs to collaborate on a version before committing it. This is in contrast to single run mode described above where one run provides all the data for a new version.
Each run in the collection needs to be aware of the same unique ID (called distributed_id) in order to collaborate on the same version. By default, if present, W&B uses the run’s group as set by wandb.init(group=GROUP) as the distributed_id.
There must be a final run that “commits” the version, permanently locking its state.
Use upsert_artifact to add to the collaborative artifact and finish_artifact to finalize the commit.
Consider the following example. Different runs (labelled below as Run 1, Run 2, and Run 3) add a different image file to the same artifact with upsert_artifact.
Run 1:
with wandb.init() as run:
    artifact = wandb.Artifact("artifact_name", "artifact_type")

    # Add files and assets to the artifact using
    # `.add`, `.add_file`, `.add_dir`, and `.add_reference`
    artifact.add_file("image1.png")

    run.upsert_artifact(artifact, distributed_id="my_dist_artifact")
Run 2:
with wandb.init() as run:
    artifact = wandb.Artifact("artifact_name", "artifact_type")

    # Add files and assets to the artifact using
    # `.add`, `.add_file`, `.add_dir`, and `.add_reference`
    artifact.add_file("image2.png")

    run.upsert_artifact(artifact, distributed_id="my_dist_artifact")
Run 3
Must run after Run 1 and Run 2 complete. The Run that calls finish_artifact can include files in the artifact, but does not need to.
with wandb.init() as run:
    artifact = wandb.Artifact("artifact_name", "artifact_type")

    # Add files and assets to the artifact using
    # `.add`, `.add_file`, `.add_dir`, and `.add_reference`
    artifact.add_file("image3.png")

    run.finish_artifact(artifact, distributed_id="my_dist_artifact")
Create a new artifact version from an existing version
Add, modify, or remove a subset of files from a previous artifact version without re-indexing the files that didn't change. Adding, modifying, or removing a subset of files in this way creates a new artifact version, known as an incremental artifact.
Here are some scenarios for each type of incremental change you might encounter:
add: you periodically add a new subset of files to a dataset after collecting a new batch.
remove: you discovered several duplicate files and want to remove them from your artifact.
update: you corrected annotations for a subset of files and want to replace the old files with the correct ones.
You could create an artifact from scratch to perform the same function as an incremental artifact. However, when you create an artifact from scratch, you will need to have all the contents of your artifact on your local disk. When making an incremental change, you can add, remove, or modify a single file without changing the files from a previous artifact version.
You can create an incremental artifact within a single run or with a set of runs (distributed mode).
Follow the procedure below to incrementally change an artifact:
Obtain the artifact version you want to perform an incremental change on:
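For example, within a run, a sketch using a placeholder artifact name (outside a run, fetch the artifact with wandb.Api().artifact instead):

saved_artifact = run.use_artifact("my_artifact:latest")  # fetch the version to change
draft_artifact = saved_artifact.new_draft()  # create a draft to modify

# Add, modify, or remove files in the draft version
draft_artifact.add_file("file_to_add.txt")
draft_artifact.remove("deleted_file.txt")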
Lastly, log or save your changes. The following tabs show you how to save your changes inside and outside of a W&B run. Select the tab that is appropriate for your use case:
run.log_artifact(draft_artifact)
draft_artifact.save()
Putting it all together, the code examples above look like:
with wandb.init(job_type="modify dataset") as run:
    # Fetch the artifact and mark it as an input to your run
    saved_artifact = run.use_artifact("my_artifact:latest")

    # Create a draft version
    draft_artifact = saved_artifact.new_draft()

    # Modify a subset of files in the draft version
    draft_artifact.add_file("file_to_add.txt")
    draft_artifact.remove("dir_to_remove/")

    # Log your changes to create a new version and mark it
    # as output to your run
    run.log_artifact(draft_artifact)
client = wandb.Api()

# Load your artifact
saved_artifact = client.artifact("my_artifact:latest")

# Create a draft version
draft_artifact = saved_artifact.new_draft()

# Modify a subset of files in the draft version
draft_artifact.remove("deleted_file.txt")
draft_artifact.add_file("modified_file.txt")

# Commit changes to the draft
draft_artifact.save()
1.6 - Track external files
Track files saved outside of W&B, such as in an Amazon S3 bucket, GCS bucket, HTTP file server, or even an NFS share.
Use reference artifacts to track files saved outside the W&B system, for example in an Amazon S3 bucket, GCS bucket, Azure blob, HTTP file server, or even an NFS share. Log artifacts outside of a W&B Run with the W&B CLI.
Log artifacts outside of runs
W&B creates a run when you log an artifact outside of a run. Each artifact belongs to a run, which in turn belongs to a project. An artifact (version) also belongs to a collection, and has a type.
Use the wandb artifact put command to upload an artifact to the W&B server outside of a W&B run. Provide the name of the project you want the artifact to belong to along with the name of the artifact (project/artifact_name). Optionally provide the type (TYPE). Replace PATH in the code snippet below with the file path of the artifact you want to upload.
$ wandb artifact put --name project/artifact_name --type TYPE PATH
W&B will create a new project if the project you specify does not exist. For information on how to download an artifact, see Download and use artifacts.
Track artifacts outside of W&B
Use W&B Artifacts for dataset versioning and model lineage, and use reference artifacts to track files saved outside the W&B server. In this mode an artifact only stores metadata about the files, such as URLs, size, and checksums. The underlying data never leaves your system. See the Quick start for information on how to save files and directories to W&B servers instead.
The following describes how to construct reference artifacts and how to best incorporate them into your workflows.
Amazon S3 / GCS / Azure Blob Storage References
Use W&B Artifacts for dataset and model versioning to track references in cloud storage buckets. With artifact references, seamlessly layer tracking on top of your buckets with no modifications to your existing storage layout.
Artifacts abstract away the underlying cloud storage vendor (such as AWS, GCP, or Azure). The information in the following sections applies uniformly to Amazon S3, Google Cloud Storage, and Azure Blob Storage.
W&B Artifacts support any Amazon S3 compatible interface, including MinIO. The scripts below work as-is, when you set the AWS_S3_ENDPOINT_URL environment variable to point at your MinIO server.
Assume we have a bucket with the following structure:
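s3://my-bucket
+-- datasets/
|   +-- mnist/
+-- models/
    +-- cnn/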
By default, W&B imposes a 10,000 object limit when adding an object prefix. You can adjust this limit by specifying max_objects= in calls to add_reference.
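Under datasets/mnist we have our dataset. Track it with a reference artifact; a minimal sketch (this mirrors the full workflow shown later in this section):

import wandb

run = wandb.init()
artifact = wandb.Artifact("mnist", type="dataset")
artifact.add_reference("s3://my-bucket/datasets/mnist")
run.log_artifact(artifact)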
Our new reference artifact mnist:latest looks and behaves similarly to a regular artifact. The only difference is that the artifact only consists of metadata about the S3/GCS/Azure object such as its ETag, size, and version ID (if object versioning is enabled on the bucket).
W&B will use the default mechanism to look for credentials based on the cloud provider you use. Read the documentation from your cloud provider to learn more about the credentials used:
For AWS, if the bucket is not located in the configured user’s default region, you must set the AWS_REGION environment variable to match the bucket region.
Interact with this artifact similarly to a normal artifact. In the App UI, you can look through the contents of the reference artifact using the file browser, explore the full dependency graph, and scan through the versioned history of your artifact.
Rich media such as images, audio, video, and point clouds may fail to render in the App UI depending on the CORS configuration of your bucket. Allowlisting app.wandb.ai in your bucket's CORS settings allows the App UI to properly render such rich media.
Panels might fail to render in the App UI for private buckets. If your company has a VPN, you could update your bucket's access policy to allowlist IPs within your VPN.
W&B will use the metadata recorded when the artifact was logged to retrieve the files from the underlying bucket when it downloads a reference artifact. If your bucket has object versioning enabled, W&B will retrieve the object version corresponding to the state of the file at the time an artifact was logged. This means that as you evolve the contents of your bucket, you can still point to the exact iteration of your data a given model was trained on since the artifact serves as a snapshot of your bucket at the time of training.
W&B recommends that you enable ‘Object Versioning’ on your storage buckets if you overwrite files as part of your workflow. With versioning enabled on your buckets, artifacts with references to files that have been overwritten will still be intact because the older object versions are retained.
Based on your use case, read the instructions to enable object versioning: AWS, GCP, Azure.
Tying it together
The following code example demonstrates a simple workflow you can use to track a dataset in Amazon S3, GCS, or Azure that feeds into a training job:
import wandb
run = wandb.init()
artifact = wandb.Artifact("mnist", type="dataset")
artifact.add_reference("s3://my-bucket/datasets/mnist")
# Track the artifact and mark it as an input to
# this run in one swoop. A new artifact version
# is only logged if the files in the bucket changed.
run.use_artifact(artifact)
artifact_dir = artifact.download()
# Perform training here...
To track models, we can log the model artifact after the training script uploads the model files to the bucket:
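A minimal sketch, assuming the training script wrote the model to s3://my-bucket/models/cnn/ (this parallels the filesystem example below):

import wandb

run = wandb.init()

# Training here...
# Upload model files to the bucket with your cloud SDK...

model_artifact = wandb.Artifact("cnn", type="model")
model_artifact.add_reference("s3://my-bucket/models/cnn/")
run.log_artifact(model_artifact)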
Another common pattern for fast access to datasets is to expose an NFS mount point to a remote filesystem on all machines running training jobs. This can be an even simpler solution than a cloud storage bucket because, from the perspective of the training script, the files look just like they are sitting on your local filesystem. Luckily, that ease of use extends to using Artifacts to track references to filesystems, whether they are mounted or not.
Assume we have a filesystem mounted at /mount with the following structure:
mount
+-- datasets/
| +-- mnist/
+-- models/
+-- cnn/
Under mnist/ we have our dataset, a collection of images. Let’s track it with an artifact:
By default, W&B imposes a 10,000 file limit when adding a reference to a directory. You can adjust this limit by specifying max_objects= in calls to add_reference.
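A minimal sketch (identical in shape to the full workflow shown below):

import wandb

run = wandb.init()
artifact = wandb.Artifact("mnist", type="dataset")
artifact.add_reference("file:///mount/datasets/mnist/")
run.log_artifact(artifact)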
Note the triple slash in the URL. The first component is the file:// prefix that denotes the use of filesystem references. The second is the path to our dataset, /mount/datasets/mnist/.
The resulting artifact mnist:latest looks and acts just like a regular artifact. The only difference is that the artifact only consists of metadata about the files, such as their sizes and MD5 checksums. The files themselves never leave your system.
You can interact with this artifact just as you would a normal artifact. In the UI, you can browse the contents of the reference artifact using the file browser, explore the full dependency graph, and scan through the versioned history of your artifact. However, the UI will not be able to render rich media such as images, audio, etc. as the data itself is not contained within the artifact.
For filesystem references, a download() operation copies the files from the referenced paths to construct the artifact directory. In the above example, the contents of /mount/datasets/mnist will be copied into the directory artifacts/mnist:v0/. If an artifact contains a reference to a file that was overwritten, then download() will throw an error as the artifact can no longer be reconstructed.
Putting everything together, here’s a simple workflow you can use to track a dataset under a mounted filesystem that feeds into a training job:
import wandb
run = wandb.init()
artifact = wandb.Artifact("mnist", type="dataset")
artifact.add_reference("file:///mount/datasets/mnist/")
# Track the artifact and mark it as an input to
# this run in one swoop. A new artifact version
# is only logged if the files under the directory
# changed.
run.use_artifact(artifact)
artifact_dir = artifact.download()
# Perform training here...
To track models, we can log the model artifact after the training script writes the model files to the mount point:
import wandb
run = wandb.init()
# Training here...

# Write model to disk...

model_artifact = wandb.Artifact("cnn", type="model")
model_artifact.add_reference("file:///mount/cnn/my_model.h5")
run.log_artifact(model_artifact)
1.7 - Manage data
1.7.1 - Delete an artifact
Delete artifacts interactively with the App UI or programmatically with the W&B SDK.
Delete artifacts interactively with the App UI or programmatically with the W&B SDK. When you delete an artifact, W&B marks that artifact as a soft-delete. In other words, the artifact is marked for deletion but files are not immediately deleted from storage.
The contents of the artifact remain in a soft-delete, or pending-deletion, state until a regularly run garbage collection process reviews all artifacts marked for deletion. The garbage collection process deletes associated files from storage if the artifact and its associated files are not used by previous or subsequent artifact versions.
The sections in this page describe how to delete specific artifact versions, how to delete an artifact collection, how to delete artifacts with and without aliases, and more. You can schedule when artifacts are deleted from W&B with TTL policies. For more information, see Manage data retention with Artifact TTL policy.
Artifacts that are scheduled for deletion with a TTL policy, deleted with the W&B SDK, or deleted with the W&B App UI are first soft-deleted. Artifacts that are soft deleted undergo garbage collection before they are hard-deleted.
Delete an artifact version
To delete an artifact version:
In your project's Artifacts tab, select the name of the artifact. This expands the artifact view and lists all the artifact versions associated with that artifact.
From the list of artifacts, select the artifact version you want to delete.
On the right hand side of the workspace, select the kebab dropdown.
Choose Delete.
An artifact version can also be deleted programmatically via the delete() method. See the examples below.
Delete multiple artifact versions with aliases
The following code example demonstrates how to delete artifacts that have aliases associated with them. Provide the entity, project name, and run ID that created the artifacts.
import wandb
api = wandb.Api()
run = api.run("entity/project/run_id")

for artifact in run.logged_artifacts():
    artifact.delete()
Set the delete_aliases parameter to True to delete aliases if the artifact has one or more aliases.
import wandb
api = wandb.Api()
run = api.run("entity/project/run_id")

for artifact in run.logged_artifacts():
    # Set delete_aliases=True in order to delete
    # artifacts with one or more aliases
    artifact.delete(delete_aliases=True)
Delete multiple artifact versions with a specific alias
The following code demonstrates how to delete multiple artifact versions that have a specific alias. Provide the entity, project name, and run ID that created the artifacts. Replace the deletion logic with your own:
import wandb
api = wandb.Api()
runs = api.run("entity/project_name/run_id")

# Delete artifact versions with alias 'v3' or 'v4'
for artifact_version in runs.logged_artifacts():
    # Replace with your own deletion logic.
    if artifact_version.name[-2:] == "v3" or artifact_version.name[-2:] == "v4":
        artifact_version.delete(delete_aliases=True)
Delete all versions of an artifact that do not have an alias
The following code snippet demonstrates how to delete all versions of an artifact that do not have an alias. Provide the name of the project and entity for the project and entity keys in wandb.Api, respectively. Replace the <> with the name of your artifact:
import wandb
# Provide your entity and a project name when you
# use wandb.Api methods.
api = wandb.Api(overrides={"project": "project", "entity": "entity"})

artifact_type, artifact_name = "<type>", "<name>"  # provide type and name
for v in api.artifact_versions(artifact_type, artifact_name):
    # Clean up versions that don't have an alias such as 'latest'.
    # NOTE: You can put whatever deletion logic you want here.
    if len(v.aliases) == 0:
        v.delete()
Delete an artifact collection
To delete an artifact collection:
Navigate to the artifact collection you want to delete and hover over it.
Select the kebab dropdown next to the artifact collection name.
Choose Delete.
You can also delete an artifact collection programmatically with the delete() method. Provide the name of the project and entity for the project and entity keys in wandb.Api, respectively:
import wandb
# Provide your entity and a project name when you
# use wandb.Api methods.
api = wandb.Api(overrides={"project": "project", "entity": "entity"})

collection = api.artifact_collection(
    "<artifact_type>", "entity/project/artifact_collection_name"
)
collection.delete()
How to enable garbage collection based on how W&B is hosted
Garbage collection is enabled by default if you use W&B's shared cloud. Based on how you host W&B, you might need to take additional steps to enable garbage collection. These steps include:
Set the GORILLA_ARTIFACT_GC_ENABLED environment variable to true: GORILLA_ARTIFACT_GC_ENABLED=true
Enable bucket versioning if you use AWS, GCP, or any other storage provider such as MinIO. If you use Azure, enable soft deletion.
Soft deletion in Azure is equivalent to bucket versioning in other storage providers.
How you satisfy these requirements depends on your deployment type.
1.7.2 - Manage data retention with Artifact TTL policy
Schedule when artifacts are deleted from W&B with a W&B Artifact time-to-live (TTL) policy. When you delete an artifact, W&B marks that artifact as a soft-delete. In other words, the artifact is marked for deletion but files are not immediately deleted from storage. For more information on how W&B deletes artifacts, see the Delete artifacts page.
Check out this video tutorial to learn how to manage data retention with Artifacts TTL in the W&B App.
W&B deactivates the option to set a TTL policy for model artifacts linked to the Model Registry. This is to help ensure that linked models do not accidentally expire if used in production workflows.
Only team admins can view a team’s settings and access team level TTL settings such as (1) permitting who can set or edit a TTL policy or (2) setting a team default TTL.
If you do not see the option to set or edit a TTL policy in an artifact’s details in the W&B App UI or if setting a TTL programmatically does not successfully change an artifact’s TTL property, your team admin has not given you permissions to do so.
Auto-generated Artifacts
Only user-generated artifacts can use TTL policies. Artifacts auto-generated by W&B cannot have TTL policies set for them.
The following Artifact types indicate an auto-generated Artifact:
run_table
code
job
Any Artifact type starting with: wandb-*
You can check an Artifact’s type on the W&B platform or programmatically:
import wandb
run = wandb.init(project="<my-project-name>")
artifact = run.use_artifact(artifact_or_name="<my-artifact-name>")
print(artifact.type)
Replace the values enclosed with <> with your own.
Define who can edit and set TTL policies
Define who can set and edit TTL policies within a team. You can either grant TTL permissions only to team admins, or you can grant both team admins and team members TTL permissions.
Only team admins can define who can set or edit a TTL policy.
Navigate to your team’s profile page.
Select the Settings tab.
Navigate to the Artifacts time-to-live (TTL) section.
From the TTL permissions dropdown, select who can set and edit TTL policies.
Click on Review and save settings.
Confirm the changes and select Save settings.
Create a TTL policy
Set a TTL policy for an artifact either when you create the artifact or retroactively after the artifact is created.
For all the code snippets below, replace the content wrapped in <> with your information to use the code snippet.
Set a TTL policy when you create an artifact
Use the W&B Python SDK to define a TTL policy when you create an artifact. TTL policies are typically defined in days.
Defining a TTL policy when you create an artifact is similar to how you normally create an artifact, except that you also pass a time delta to the artifact's ttl attribute.
The following code snippet shows how to set a TTL policy for an artifact:
import wandb
from datetime import timedelta
run = wandb.init(project="<my-project-name>")
artifact = run.use_artifact("<my-entity/my-project/my-artifact:alias>")

artifact.ttl = timedelta(days=365 * 2)  # Delete in two years
artifact.save()
The preceding code example sets the TTL policy to two years.
Navigate to your W&B project in the W&B App UI.
Select the artifact icon on the left panel.
From the list of artifacts, expand the artifact type you want to edit.
Select the artifact version you want to edit the TTL policy for.
Click on the Version tab.
From the dropdown, select Edit TTL policy.
Within the modal that appears, select Custom from the TTL policy dropdown.
Within the TTL duration field, set the TTL policy in units of days.
Select the Update TTL button to save your changes.
Set default TTL policies for a team
Only team admins can set a default TTL policy for a team.
Set a default TTL policy for your team. Default TTL policies apply to all existing and future artifacts based on their respective creation dates. Artifacts with existing version-level TTL policies are not affected by the team’s default TTL.
Navigate to your team’s profile page.
Select the Settings tab.
Navigate to the Artifacts time-to-live (TTL) section.
Click on the Set team’s default TTL policy.
Within the Duration field, set the TTL policy in units of days.
Click on Review and save settings.
Confirm the changes and then select Save settings.
Set a TTL policy outside of a run
Use the public API to retrieve an artifact without fetching a run, and set the TTL policy. TTL policies are typically defined in days.
The following code sample shows how to fetch an artifact using the public API and set the TTL policy.
import wandb
from datetime import timedelta

api = wandb.Api()
artifact = api.artifact("entity/project/artifact:alias")

artifact.ttl = timedelta(days=365)  # Delete in one year
artifact.save()
Deactivate a TTL policy
Use the W&B Python SDK or W&B App UI to deactivate a TTL policy for a specific artifact version.
Within your project, select the Artifacts tab in the left sidebar.
Click on a collection.
Within the collection view you can see all of the artifacts in the selected collection. Within the Time to Live column you will see the TTL policy assigned to that artifact.
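A sketch of deactivating a TTL policy with the SDK; this assumes that assigning None to the artifact's ttl attribute removes the policy for that version:

import wandb

api = wandb.Api()
artifact = api.artifact("entity/project/artifact:alias")

artifact.ttl = None  # assumption: None deactivates the TTL policy
artifact.save()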
1.7.3 - Manage artifact storage and memory allocation
Manage storage and memory allocation of W&B Artifacts.
W&B stores artifact files in a private Google Cloud Storage bucket located in the United States by default. All files are encrypted at rest and in transit.
Depending on the machine on which wandb is initialized, these default folders may not be located in a writeable part of the filesystem. This might trigger an error.
Clean up local artifact cache
W&B caches artifact files to speed up downloads across versions that share files in common. Over time this cache directory can become large. Run the wandb artifact cache cleanup command to prune the cache and to remove any files that have not been used recently.
The following code snippet demonstrates how to limit the size of the cache to 1GB. Copy and paste the code snippet into your terminal:
$ wandb artifact cache cleanup 1GB
1.8 - Explore artifact graphs
Traverse automatically created directed acyclic W&B Artifact graphs.
W&B automatically tracks the artifacts a given run logged as well as the artifacts a given run uses. These artifacts can include datasets, models, evaluation results, or more. You can explore an artifact’s lineage to track and manage the various artifacts produced throughout the machine learning lifecycle.
Lineage
Tracking an artifact’s lineage has several key benefits:
Reproducibility: By tracking the lineage of all artifacts, teams can reproduce experiments, models, and results, which is essential for debugging, experimentation, and validating machine learning models.
Version Control: Artifact lineage involves versioning artifacts and tracking their changes over time. This allows teams to roll back to previous versions of data or models if needed.
Auditing: Having a detailed history of the artifacts and their transformations enables organizations to comply with regulatory and governance requirements.
Collaboration and Knowledge Sharing: Artifact lineage facilitates better collaboration among team members by providing a clear record of attempts as well as what worked, and what didn’t. This helps in avoiding duplication of efforts and accelerates the development process.
Finding an artifact’s lineage
When selecting an artifact in the Artifacts tab, you can see your artifact’s lineage. This graph view shows a general overview of your pipeline.
To view an artifact graph:
Navigate to your project in the W&B App UI.
Choose the artifact icon on the left panel.
Select Lineage.
Navigating the lineage graph
The artifact or job type you provide appears in front of its name, with artifacts represented by blue icons and runs represented by green icons. Arrows detail the input and output of a run or artifact on the graph.
You can view the type and the name of the artifact in both the left sidebar and in the Lineage tab.
For a more detailed view, click any individual artifact or run to get more information on a particular object.
Artifact clusters
When a level of the graph has five or more runs or artifacts, it creates a cluster. A cluster has a search bar to find specific versions of runs or artifacts, and you can pull an individual node out of a cluster to continue investigating the lineage of a node inside it.
Clicking on a node opens a preview with an overview of the node. Clicking on the arrow extracts the individual run or artifact so you can examine the lineage of the extracted node.
Create an artifact. First, create a run with wandb.init. Then, create a new artifact or retrieve an existing one with wandb.Artifact. Next, add files to the artifact with .add_file. Finally, log the artifact to the run with .log_artifact. The finished code looks something like this:
with wandb.init() as run:
    artifact = wandb.Artifact("artifact_name", "artifact_type")

    # Add files and assets to the artifact using
    # `.add`, `.add_file`, `.add_dir`, and `.add_reference`
    artifact.add_file("image1.png")

    run.log_artifact(artifact)
Use the artifact object’s logged_by and used_by methods to walk the graph from the artifact:
# Walk up and down the graph from an artifact:
producer_run = artifact.logged_by()
consumer_runs = artifact.used_by()
Learn where W&B files are stored by default. Explore how to save and store sensitive information.
Files are uploaded to a Google Cloud bucket managed by W&B when you log artifacts. The contents of the bucket are encrypted both at rest and in transit. Artifact files are only visible to users who have access to the corresponding project.
When you delete a version of an artifact, it is marked for soft deletion in our database and removed from your storage cost. When you delete an entire artifact, it is queued for permanent deletion and all of its contents are removed from the W&B bucket. If you have specific needs around file deletion, please reach out to Customer Support.
For sensitive datasets that cannot reside in a multi-tenant environment, you can use either a private W&B server connected to your cloud bucket or reference artifacts. Reference artifacts track references to private buckets without sending file contents to W&B. Reference artifacts maintain links to files on your buckets or servers. In other words, W&B only keeps track of the metadata associated with the files and not the files themselves.
Create a reference artifact in the same way you create a non-reference artifact:
import wandb
run = wandb.init()
artifact = wandb.Artifact("animals", type="dataset")
artifact.add_reference("s3://my-bucket/animals")
run.log_artifact(artifact)
For alternatives, contact us at contact@wandb.com to talk about private cloud and on-premises installations.
1.10 - Tutorial: Create, track, and use a dataset artifact
Artifacts quickstart shows how to create, track, and use a dataset artifact with W&B.
This walkthrough demonstrates how to create, track, and use a dataset artifact from W&B Runs.
1. Log into W&B
Import the W&B library and log in to W&B. You will need to sign up for a free W&B account if you have not done so already.
import wandb
wandb.login()
2. Initialize a run
Use the wandb.init() API to generate a background process to sync and log data as a W&B Run. Provide a project name and a job type:
# Create a W&B Run. Here we specify 'upload-dataset' as the job type since this
# example shows how to create a dataset artifact.
run = wandb.init(project="artifacts-example", job_type="upload-dataset")
3. Create an artifact object
Create an artifact object with the wandb.Artifact() API. Provide a name for the artifact and the type of file for the name and type parameters, respectively.
For example, the following code snippet demonstrates how to create an artifact called 'bicycle-dataset' with a 'dataset' type:
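artifact = wandb.Artifact(name="bicycle-dataset", type="dataset")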
For more information about how to construct an artifact, see Construct artifacts.
Add the dataset to the artifact
Add a file to the artifact. Common file types include models and datasets. The following example adds a dataset named dataset.h5 that is saved locally on our machine to the artifact:
# Add a file to the artifact's contents
artifact.add_file(local_path="dataset.h5")
Replace the filename dataset.h5 in the preceding code snippet with the path to the file you want to add to the artifact.
4. Log the dataset
Use the W&B run object's log_artifact() method to both save your artifact version and declare the artifact as an output of the run.
# Save the artifact version to W&B and mark it
# as the output of this run
run.log_artifact(artifact)
A 'latest' alias is created by default when you log an artifact. For more information about artifact aliases and versions, see Create a custom alias and Create new artifact versions, respectively.
5. Download and use the artifact
The following code example demonstrates the steps you can take to use an artifact you have logged and saved to the W&B servers.
First, initialize a new run object with wandb.init().
Second, use the run object's use_artifact() method to tell W&B what artifact to use. This returns an artifact object.
Third, use the artifact's download() method to download the contents of the artifact.
# Create a W&B Run. Here we specify 'training' for the job type
# because we will use this run to track training.
run = wandb.init(project="artifacts-example", job_type="training")

# Query W&B for an artifact and mark it as input to this run
artifact = run.use_artifact("bicycle-dataset:latest")

# Download the artifact's contents
artifact_dir = artifact.download()
Alternatively, you can use the Public API (wandb.Api) to export (or update) data already saved in W&B outside of a Run. See Track external files for more information.
W&B Registry is now in public preview. Visit this section to learn how to enable it for your deployment type.
W&B Registry is a curated central repository of artifact versions within your organization. Users who have permission within your organization can download, share, and collaboratively manage the lifecycle of all artifacts, regardless of the team that user belongs to.
Each registry consists of one or more collections. Each collection represents a distinct task or use case.
To add an artifact to a registry, you first log a specific artifact version to W&B. Each time you log an artifact, W&B automatically assigns a version to that artifact. Artifact versions use 0 indexing, so the first version is v0, the second version is v1, and so on.
Once you log an artifact to W&B, you can then link that specific artifact version to a collection in the registry.
The term “link” refers to pointers that connect where W&B stores the artifact and where the artifact is accessible in the registry. W&B does not duplicate artifacts when you link an artifact to a collection.
As an example, the following code example shows how to log and link a fake model artifact called "my_model.txt" to a collection named "first-collection" in the core Model registry. More specifically, the code accomplishes the following:
Initialize a W&B run.
Log the artifact to W&B.
Specify the name of the collection and registry you want to link your artifact version to.
Link the artifact to the collection.
Copy and paste the following code snippet into a Python script and run it. Ensure that you have W&B Python SDK version 0.18.6 or greater.
import wandb
import random
# Initialize a W&B run to track the artifact
run = wandb.init(project="registry_quickstart")

# Create a simulated model file so that you can log it
with open("my_model.txt", "w") as f:
    f.write("Model: " + str(random.random()))

# Log the artifact to W&B
logged_artifact = run.log_artifact(
    artifact_or_path="./my_model.txt",
    name="gemma-finetuned",
    type="model",  # Specifies artifact type
)

# Specify the name of the collection and registry
# you want to publish the artifact to
COLLECTION_NAME = "first-collection"
REGISTRY_NAME = "model"

# Link the artifact to the registry
run.link_artifact(
    artifact=logged_artifact,
    target_path=f"wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}",
)
W&B automatically creates a collection for you if the collection you specify in the run object's link_artifact(target_path=...) method does not exist within the registry you specify.
The URL that your terminal prints directs you to the project where W&B stores your artifact.
Navigate to the Registry App to view artifact versions that you and other members of your organization publish. To do so, first navigate to W&B. Select Registry in the left sidebar below Applications. Select the “Model” registry. Within the registry, you should see the “first-collection” collection with your linked artifact version.
Once you link an artifact version to a collection within a registry, members of your organization can view, download, and manage your artifact versions, create downstream automations, and more if they have the proper permissions.
If an artifact version logs metrics (such as by using run.log_artifact()), you can view metrics for that version from its details page, and you can compare metrics across artifact versions from the collection’s page. Refer to View linked artifacts in a registry.
Enable W&B Registry
Based on your deployment type, satisfy the following conditions to enable W&B Registry:
| Deployment type | How to enable |
| --- | --- |
| Multi-tenant Cloud | No action required. W&B Registry is available on the W&B App. |
| Dedicated Cloud | Contact your account team. The Solutions Architect (SA) Team enables W&B Registry within your instance's operator console. Ensure your instance is on server release version 0.59.2 or newer. |
| Self-Managed | Enable the environment variable called ENABLE_REGISTRY_UI. To learn more about enabling environment variables in server, visit these docs. In self-managed instances, your infrastructure administrator should enable this environment variable and set it to true. Ensure your instance is on server release version 0.59.2 or newer. |
Resources to get started
Depending on your use case, explore the following resources to get started with the W&B Registry:
Use W&B Registry to manage and version your artifacts, track lineage, and promote models through different lifecycle stages.
Automate your model management workflows using webhooks.
Integrate the registry with external ML systems and tools for model evaluation, monitoring, and deployment.
Migrate from the legacy Model Registry to W&B Registry
The legacy Model Registry is scheduled for deprecation with the exact date not yet decided. Before deprecating the legacy Model Registry, W&B will migrate the contents of the legacy Model Registry to the W&B Registry.
Until the migration occurs, W&B supports both the legacy Model Registry and the new Registry.
To view the legacy Model Registry, navigate to the Model Registry in the W&B App. A banner appears at the top of the page that enables you to use the legacy Model Registry App UI.
Reach out to support@wandb.com with any questions or to speak to the W&B Product Team about any concerns about the migration.
A core registry is a template for specific use cases: Models and Datasets.
By default, the Models registry is configured to accept "model" artifact types and the Dataset registry is configured to accept "dataset" artifact types. An admin can add additional accepted artifact types.
The preceding image shows the Models and the Dataset core registry along with a custom registry called Fine_Tuned_Models in the W&B Registry App UI.
A core registry has organization visibility. A registry admin cannot change the visibility of a core registry.
Custom registry
Custom registries are not restricted to "model" artifact types or "dataset" artifact types.
You can create a custom registry for each step in your machine learning pipeline, from initial data collection to final model deployment.
For example, you might create a registry called “Benchmark_Datasets” for organizing curated datasets to evaluate the performance of trained models. Within this registry, you might have a collection called “User_Query_Insurance_Answer_Test_Data” that contains a set of user questions and corresponding expert-validated answers that the model has never seen during training.
A custom registry can have either organization or restricted visibility. A registry admin can change the visibility of a custom registry from organization to restricted. However, the registry admin cannot change a custom registry's visibility from restricted to organization.
Within Custom registry, click on the Create registry button.
Provide a name for your registry in the Name field.
Optionally provide a description about the registry.
Select who can view the registry from the Registry visibility dropdown. See Registry visibility types for more information on registry visibility options.
Select either All types or Specify types from the Accepted artifact types dropdown.
(If you select Specify types) Add one or more artifact types that your registry accepts.
Click on the Create registry button.
An artifact type cannot be removed from a registry once it is saved in the registry’s settings.
For example, the following image shows a custom registry called Fine_Tuned_Models that a user is about to create. The registry is Restricted to only members that are manually added to the registry.
Visibility types
The visibility of a registry determines who can access that registry. Restricting the visibility of a custom registry helps ensure that only specified members can access that registry.
There are two registry visibility options for a custom registry:
| Visibility | Description |
| --- | --- |
| Restricted | Only invited organization members can access the registry. |
| Organization | Everyone in the org can access the registry. |
A team administrator or registry administrator can set the visibility of a custom registry.
The user who creates a custom registry with Restricted visibility is added to the registry automatically as its registry admin.
Configure the visibility of a custom registry
A team administrator or registry administrator can assign the visibility of a custom registry during or after the creation of a custom registry.
To restrict the visibility of an existing custom registry:
Click the gear icon in the upper right hand corner.
From the Registry visibility dropdown, select the desired registry visibility.
If you select Restricted visibility:
Add members of your organization that you want to have access to this registry. Scroll to the Registry members and roles section and click on the Add member button.
Within the Member field, add the email or username of the member you want to add.
Click Add new member.
See Create a custom registry for more information on how to assign the visibility of a custom registry when a team administrator creates it.
Your role in a team has no impact on, or relationship to, your role in any registry.
The following table lists the different roles a user can have and their permissions:
| Permission | Permission Group | Viewer | Member | Admin |
| --- | --- | --- | --- | --- |
| View a collection’s details | Read | X | X | X |
| View a linked artifact’s details | Read | X | X | X |
| Usage: Consume an artifact in a registry with use_artifact | Read | X | X | X |
| Download a linked artifact | Read | X | X | X |
| Download files from an artifact’s file viewer | Read | X | X | X |
| Search a registry | Read | X | X | X |
| View a registry’s settings and user list | Read | X | X | X |
| Create a new automation for a collection | Create | | X | X |
| Turn on Slack notifications for new version being added | Create | | X | X |
| Create a new collection | Create | | X | X |
| Create a new custom registry | Create | | X | X |
| Edit collection card (description) | Update | | X | X |
| Edit linked artifact description | Update | | X | X |
| Add or delete a collection’s tag | Update | | X | X |
| Add or delete an alias from a linked artifact | Update | | X | X |
| Link a new artifact | Update | | X | X |
| Edit allowed types list for a registry | Update | | X | X |
| Edit custom registry name | Update | | X | X |
| Delete a collection | Delete | | X | X |
| Delete an automation | Delete | | X | X |
| Unlink an artifact from a registry | Delete | | X | X |
| Edit accepted artifact types for a registry | Admin | | | X |
| Change registry visibility (Organization or Restricted) | Admin | | | X |
| Add users to a registry | Admin | | | X |
| Assign or change a user’s role in a registry | Admin | | | X |
Inherited permissions
A user’s permission in a registry depends on the highest level of privilege assigned to that user, whether individually or by team membership.
For example, suppose a registry admin adds a user called Nico to Registry A and assigns them a Viewer registry role. A registry admin then adds a team called Foundation Model Team to Registry A and assigns Foundation Model Team a Member registry role.
Nico is a member of the Foundation Model Team, which is a Member of the Registry. Because Member has more permission than Viewer, W&B grants Nico the Member role.
The following table demonstrates the highest level of permission in the event of a conflict between a user’s individual registry role and the registry role of a team they are a member of:
| Team registry role | Individual registry role | Inherited registry role |
| --- | --- | --- |
| Viewer | Viewer | Viewer |
| Member | Viewer | Member |
| Admin | Viewer | Admin |
If there is a conflict, W&B displays the highest level of permissions next to the name of the user.
For example, in the preceding image, Alex inherits Member role privileges because they are a member of the smle-reg-team-1 team.
Click the gear icon on the upper right hand corner.
Scroll to the Registry members and roles section.
Within the Member field, search for the user or team you want to edit permissions for.
In the Registry role column, click the user’s role.
From the dropdown, select the role you want to assign to the user.
2.4 - Create a collection
A collection is a set of linked artifact versions within a registry. Each collection represents a distinct task or use case.
For example, within the core Dataset registry you might have multiple collections. Each collection contains a different dataset such as MNIST, CIFAR-10, or ImageNet.
As another example, you might have a registry called “chatbot” that contains a collection for model artifacts, another collection for dataset artifacts, and another collection for fine-tuned model artifacts.
How you organize a registry and their collections is up to you.
If you are familiar with the W&B Model Registry, you might be aware of registered models. Registered models in the Model Registry are now referred to as collections in the W&B Registry.
Collection types
Each collection accepts one, and only one, type of artifact. The type you specify restricts what sort of artifacts you, and other members of your organization, can link to that collection.
You can think of artifact types as similar to data types in programming languages such as Python. In this analogy, a collection can store strings, integers, or floats, but not a mix of these data types.
For example, suppose you create a collection that accepts “dataset” artifact types. This means that you can only link future artifact versions that have the type “dataset” to this collection. Similarly, you can only link artifacts of type “model” to a collection that accepts only model artifact types.
You specify an artifact’s type when you create that artifact object. Note the type field in wandb.Artifact():
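A minimal sketch (the artifact name here is hypothetical):

import wandb

artifact = wandb.Artifact(
    name="zoo_dataset",  # hypothetical artifact name
    type="dataset",      # determines which collections can accept this artifact
)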
When you create a collection, you can select from a list of predefined artifact types. The artifact types available to you depend on the registry that the collection belongs to.
Check the types of artifact that a collection accepts
Before you link to a collection, inspect the artifact type that the collection accepts. You can inspect the artifact types that a collection accepts programmatically with the W&B Python SDK or interactively with the W&B App.
An error message appears if you try to link an artifact to a collection that does not accept that artifact type.
You can find the accepted artifact types on the registry card on the homepage or within a registry’s settings page.
For both methods, first navigate to your W&B Registry App.
Within the homepage of the Registry App, you can view the accepted artifact types by scrolling to the registry card of that registry. The gray horizontal ovals within the registry card list the artifact types that the registry accepts.
For example, the preceding image shows multiple registry cards on the Registry App homepage. Within the Model registry card, you can see two artifact types: model and model-new.
To view accepted artifact types within a registry’s settings page:
Click on the registry card you want to view the settings for.
Click on the gear icon in the upper right corner.
Scroll to the Accepted artifact types field.
Programmatically view the artifact types that a registry accepts with the W&B Python SDK:
import wandb

registry_name = "<registry_name>"
artifact_types = (
    wandb.Api().project(name=f"wandb-registry-{registry_name}").artifact_types()
)
print([artifact_type.name for artifact_type in artifact_types])
Note that the preceding code snippet does not initialize a run. This is because a run is unnecessary if you are only querying the W&B API and not tracking an experiment, artifact, and so on.
Once you know what type of artifact a collection accepts, you can create a collection.
Create a collection
Interactively or programmatically create a collection within a registry. You can not change the type of artifact that a collection accepts after you create it.
Programmatically create a collection
Use the wandb.init.link_artifact() method to link an artifact to a collection. Specify both the collection and the registry in the target_path parameter as a path that takes the form:

wandb-registry-{registry_name}/{collection_name}

Where registry_name is the name of the registry and collection_name is the name of the collection. Be sure to prepend the prefix wandb-registry- to the registry name.
W&B automatically creates a collection for you if you try to link an artifact to a collection that does not exist. If you specify a collection that already exists, W&B links the artifact to the existing collection.
The following code snippet shows how to programmatically create a collection. Replace the values enclosed in <> with your own:
import wandb

# Initialize a run
run = wandb.init(entity="<team_entity>", project="<project>")

# Create an artifact object
artifact = wandb.Artifact(
    name="<artifact_name>",
    type="<artifact_type>",
)

registry_name = "<registry_name>"
collection_name = "<collection_name>"
target_path = f"wandb-registry-{registry_name}/{collection_name}"

# Link the artifact to a collection
run.link_artifact(artifact=artifact, target_path=target_path)

run.finish()
Interactively create a collection
The following steps describe how to create a collection within a registry using the W&B Registry App UI:
Navigate to the Registry App in the W&B App UI.
Select a registry.
Click on the Create collection button in the upper right hand corner.
Provide a name for your collection in the Name field.
Select a type from the Type dropdown. Or, if the registry enables custom artifact types, provide one or more artifact types that this collection accepts.
Optionally provide a description of your collection in the Description field.
Optionally add one or more tags in the Tags field.
Click Link version.
From the Project dropdown, select the project where your artifact is stored.
From the Artifact collection dropdown, select your artifact.
From the Version dropdown, select the artifact version you want to link to your collection.
Click on the Create collection button.
2.5 - Link an artifact version to a registry
Link artifact versions to a collection to make them available to other members in your organization.
When you link an artifact to a registry, this “publishes” that artifact to that registry. Any user that has access to that registry can access the linked artifact versions in the collection.
In other words, linking an artifact to a registry collection brings that artifact version from a private, project-level scope, to a shared organization level scope.
The term “type” refers to the artifact object’s type. When you create an artifact object (wandb.Artifact), or log an artifact (wandb.init.log_artifact), you specify a type for the type parameter.
Link an artifact to a collection
Link an artifact version to a collection interactively or programmatically.
Before you link an artifact to a registry, check the types of artifacts that the collection permits. For more information about collection types, see “Collection types” within Create a collection.
Based on your use case, follow the instructions described in the tabs below to link an artifact version.
If an artifact version logs metrics (such as by using run.log_artifact()), you can view metrics for that version from its details page, and you can compare metrics across artifact versions from the artifact’s page. Refer to View linked artifacts in a registry.
Before you link an artifact to a collection, ensure that the registry that the collection belongs to already exists. To check that the registry exists, navigate to the Registry app on the W&B App UI and search for the name of the registry.
Use the target_path parameter to specify the collection and registry you want to link the artifact version to. The target path consists of the prefix “wandb-registry-”, the name of the registry, and the name of the collection, separated by forward slashes:
wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}
Copy and paste the code snippet below to link an artifact version to a collection within an existing registry. Replace values enclosed in <> with your own:
import wandb

# Initialize a run
run = wandb.init(
    entity="<team_entity>",
    project="<project_name>",
)

# Create an artifact object
# The type parameter specifies both the type of the
# artifact object and the collection type
artifact = wandb.Artifact(name="<name>", type="<type>")

# Add the file to the artifact object.
# Specify the path to the file on your local machine.
artifact.add_file(local_path="<local_path_to_artifact>")

# Specify the collection and registry to link the artifact to
REGISTRY_NAME = "<registry_name>"
COLLECTION_NAME = "<collection_name>"
target_path = f"wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}"

# Link the artifact to the collection
run.link_artifact(artifact=artifact, target_path=target_path)
If you want to link an artifact version to the Model registry or the Dataset registry, set the artifact type to "model" or "dataset", respectively.
Navigate to the Registry App.
Hover your mouse next to the name of the collection you want to link an artifact version to.
Select the meatball menu icon (three horizontal dots) next to View details.
From the dropdown, select Link new version.
From the sidebar that appears, select the name of a team from the Team dropdown.
From the Project dropdown, select the name of the project that contains your artifact.
From the Artifact dropdown, select the name of the artifact.
From the Version dropdown, select the artifact version you want to link to the collection.
Navigate to your project’s artifact browser on the W&B App at: https://wandb.ai/<entity>/<project>/artifacts
Select the Artifacts icon on the left sidebar.
Click on the artifact version you want to link to your registry.
Within the Version overview section, click the Link to registry button.
From the modal that appears on the right of the screen, select a registered model from the Select a registered model dropdown.
Click Next step.
(Optional) Select an alias from the Aliases dropdown.
Click Link to registry.
View a linked artifact’s metadata, version data, usage, lineage information and more in the Registry App.
View linked artifacts in a registry
View information about linked artifacts such as metadata, lineage, and usage information in the Registry App.
Navigate to the Registry App.
Select the name of the registry that you linked the artifact to.
Select the name of the collection.
If the collection’s artifacts log metrics, compare metrics across versions by clicking Show metrics.
From the list of artifact versions, select the version you want to access. Version numbers are incrementally assigned to each linked artifact version starting with v0.
To view details about an artifact version, click the version. From the tabs in this page, you can view that version’s metadata (including logged metrics), lineage, and usage information.
Make note of the Full Name field within the Version tab. The full name of a linked artifact consists of the registry name, the collection name, and the alias or index of the artifact version.
You need the full name of a linked artifact to access the artifact version programmatically.
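For example, a minimal sketch that fetches a linked artifact by its full name with the Public API (the full name shown is hypothetical):

import wandb

# Hypothetical full name copied from the Version tab of a linked artifact
full_name = "wandb-registry-model/my-collection:v0"

api = wandb.Api()
artifact = api.artifact(name=full_name)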
Troubleshooting
Below are some common things to double-check if you are not able to link an artifact.
Logging artifacts from a personal account
Artifacts logged to W&B with a personal entity can not be linked to a registry. Make sure that you log artifacts using a team entity within your organization; only artifacts logged within an organization’s team can be linked to that organization’s registry.
Find your team entity
W&B uses the name of your team as the team’s entity. For example, if your team is called team-awesome, your team entity is team-awesome.
You can confirm the name of your team by:
Navigate to your team’s W&B profile page.
Copy the site’s URL. It has the form https://wandb.ai/<team>, where <team> is both the name of your team and the team’s entity.
Log from a team entity
Specify the team as the entity when you initialize a run with wandb.init(). If you do not specify the entity when you initialize a run, the run uses your default entity which may or may not be your team entity.
import wandb

run = wandb.init(
    entity='<team_entity>',
    project='<project_name>',
)
Log the artifact to the run with run.log_artifact, or create an Artifact object, add files to it with add_file, and then log it, as shown in the sketch below.
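For example, a minimal sketch that creates an artifact, adds a single file, and logs it as an output of the run (the placeholder values are yours to replace):

import wandb

run = wandb.init(entity="<team_entity>", project="<project_name>")

# Create an artifact, add a local file, and log it as an output of the run
artifact = wandb.Artifact(name="<artifact_name>", type="<artifact_type>")
artifact.add_file(local_path="<local_path_to_file>")
run.log_artifact(artifact)

run.finish()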
If an artifact is logged to your personal entity, you will need to re-log it to an entity within your organization.
Confirm the path of a registry in the W&B App UI
There are two ways to confirm the path of a registry with the UI: create an empty collection and view the collection details or copy and paste the autogenerated code on the collection’s home page.
Ensure that you replace the name of the collection from the temporary collection with the name of the collection that you want to link your artifact to.
2.6 - Download an artifact from a registry
Use the W&B Python SDK to download an artifact linked to a registry. To download and use an artifact, you need to know the name of the registry, the name of the collection, and the alias or index of the artifact version you want to download.
To download an artifact linked to a registry, you must know the path of that linked artifact. The path consists of the registry name, collection name, and the alias or index of the artifact version you want to access.
Once you have the registry, collection, and alias or index of the artifact version, you can construct the path to the linked artifact using the following string template:
# Artifact name with version index specified
f"wandb-registry-{REGISTRY}/{COLLECTION}:v{INDEX}"

# Artifact name with alias specified
f"wandb-registry-{REGISTRY}/{COLLECTION}:{ALIAS}"
Replace the values within the curly braces {} with the name of the registry, collection, and the alias or index of the artifact version you want to access.
Specify model or dataset to link an artifact version to the core Model registry or the core Dataset registry, respectively.
Use the wandb.init.use_artifact method to access the artifact and download its contents once you have the path of the linked artifact. The following code snippet shows how to use and download an artifact linked to the W&B Registry. Replace values enclosed in <> with your own:
import wandb

REGISTRY = '<registry_name>'
COLLECTION = '<collection_name>'
ALIAS = '<artifact_alias>'

run = wandb.init(
    entity='<team_name>',
    project='<project_name>',
)

artifact_name = f"wandb-registry-{REGISTRY}/{COLLECTION}:{ALIAS}"
# artifact_name = '<artifact_name>'  # Copy and paste the Full name specified on the Registry App
fetched_artifact = run.use_artifact(artifact_or_name=artifact_name)
download_path = fetched_artifact.download()
The use_artifact() method marks the artifact you download as an input to your run. Marking an artifact as the input to a run enables W&B to track the lineage of that artifact.
If you do not want to create a run, you can use the wandb.Api() object to access the artifact:
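A minimal sketch, assuming the same path format as the preceding snippet:

import wandb

REGISTRY = "<registry_name>"
COLLECTION = "<collection_name>"
ALIAS = "<artifact_alias>"

# Access the linked artifact without creating a run
api = wandb.Api()
artifact = api.artifact(name=f"wandb-registry-{REGISTRY}/{COLLECTION}:{ALIAS}")
download_path = artifact.download()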
Example: Use and download an artifact linked to the W&B Registry
The following code example shows how a user can download an artifact linked to a collection called phi3-finetuned in the Fine-tuned Models registry. The alias of the artifact version is set to production.
import wandb

TEAM_ENTITY = "product-team-applications"
PROJECT_NAME = "user-stories"
REGISTRY = "Fine-tuned Models"
COLLECTION = "phi3-finetuned"
ALIAS = "production"

# Initialize a run inside the specified team and project
run = wandb.init(entity=TEAM_ENTITY, project=PROJECT_NAME)

artifact_name = f"wandb-registry-{REGISTRY}/{COLLECTION}:{ALIAS}"

# Access an artifact and mark it as input to your run for lineage tracking
fetched_artifact = run.use_artifact(artifact_or_name=artifact_name)

# Download artifact. Returns path to downloaded contents
downloaded_path = fetched_artifact.download()
See use_artifact and Artifact.download() in the API Reference guide for more information on possible parameters and return type.
Users with a personal entity that belong to multiple organizations
Users with a personal entity that belong to multiple organizations must also specify either the name of their organization or use a team entity when accessing artifacts linked to a registry.
import wandb
REGISTRY = "<registry_name>"
COLLECTION = "<collection_name>"
VERSION = "<version>"

# Ensure you are using your team entity to instantiate the API
api = wandb.Api(overrides={"entity": "<team-entity>"})
artifact_name = f"wandb-registry-{REGISTRY}/{COLLECTION}:{VERSION}"
artifact = api.artifact(name=artifact_name)

# Use org display name or org entity in the path
api = wandb.Api()
artifact_name = f"{ORG_NAME}/wandb-registry-{REGISTRY}/{COLLECTION}:{VERSION}"
artifact = api.artifact(name=artifact_name)
Where ORG_NAME is the display name of your organization. Multi-tenant SaaS users can find the name of their organization in the organization’s settings page at https://wandb.ai/account-settings/. Dedicated Cloud and Self-Managed users should contact their account administrator to confirm the organization’s display name.
Copy and paste pre-generated code snippet
W&B creates a code snippet that you can copy and paste into your Python script, notebook, or terminal to download an artifact linked to a registry.
Navigate to the Registry App.
Select the name of the registry that contains your artifact.
Select the name of the collection.
From the list of artifact versions, select the version you want to access.
Select the Usage tab.
Copy the code snippet shown in the Usage API section.
Paste the code snippet into your Python script, notebook, or terminal.
Only items that you have permission to view appear in the search results.
Search for registry items
To search for a registry item:
Navigate to the W&B Registry App.
Specify the search term in the search bar at the top of the page. Press Enter to search.
Search results appear below the search bar if the term you specify matches an existing registry, collection name, artifact version tag, collection tag, or alias.
The following table lists query names you can use based on the type of item you want to filter:

| Item type | Query names |
| --- | --- |
| registries | name, description, created_at, updated_at |
| collections | name, tag, description, created_at, updated_at |
| versions | tag, alias, created_at, updated_at, metadata |
The following code examples demonstrate some common search scenarios.
To use the wandb.Api().registries() method, first import the W&B Python SDK (wandb) library:
import wandb

# (Optional) Create an instance of the wandb.Api() class for readability
api = wandb.Api()
Filter all registries that contain the string model:
# Filter all registries that contain the string `model`
registry_filters = {
    "name": {"$regex": "model"}
}

# Returns an iterable of all registries that match the filters
registries = api.registries(filter=registry_filters)
Filter all collections, independent of registry, that contain the string yolo in the collection name:
# Filter all collections, independent of registry, that
# contain the string `yolo` in the collection name
collection_filters = {
    "name": {"$regex": "yolo"}
}

# Returns an iterable of all collections that match the filters
collections = api.registries().collections(filter=collection_filters)
Filter all collections, independent of registry, that contain the string yolo in the collection name and have cnn as a tag:
# Filter all collections, independent of registry, that contain the
# string `yolo` in the collection name and have `cnn` as a tag
collection_filters = {
    "name": {"$regex": "yolo"},
    "tag": "cnn",
}

# Returns an iterable of all collections that match the filters
collections = api.registries().collections(filter=collection_filters)
Find all artifact versions that contain the string model and have either the tag image-classification or the alias production:
# Find all artifact versions that contain the string `model` and
# have either the tag `image-classification` or the alias `production`
registry_filters = {
    "name": {"$regex": "model"}
}

# Use the logical $or operator to filter artifact versions
version_filters = {
    "$or": [
        {"tag": "image-classification"},
        {"alias": "production"},
    ]
}

# Returns an iterable of all artifact versions that match the filters
artifacts = api.registries(filter=registry_filters).collections().versions(filter=version_filters)
Each item in the artifacts iterable in the previous code snippet is an instance of the Artifact class. This means that you can access each artifact’s attributes, such as name, collection, aliases, tags, created_at, and more:
for art in artifacts:
    print(f"artifact name: {art.name}")
    print(f"collection artifact belongs to: {art.collection.name}")
    print(f"artifact aliases: {art.aliases}")
    print(f"tags attached to artifact: {art.tags}")
    print(f"artifact created at: {art.created_at}\n")
For a complete list of an artifact object’s attributes, see the Artifacts Class in the API Reference docs.
Filter all artifact versions with the latest alias, independent of registry or collection, created between 2024-01-08 and 2025-03-04 at 13:10 UTC:
# Find all artifact versions created between 2024-01-08 and 2025-03-04 13:10 UTC
artifact_filters = {
    "alias": "latest",
    "created_at": {"$gte": "2024-01-08", "$lte": "2025-03-04 13:10:00"},
}

# Returns an iterable of all artifact versions that match the filters
artifacts = api.registries().collections().versions(filter=artifact_filters)
Specify the date and time in the format YYYY-MM-DD HH:MM:SS. You can omit the hours, minutes, and seconds if you want to filter by date only.
See the MongoDB documentation for more information on query comparisons.
2.8 - Organize versions with tags
Use tags to organize collections or artifact versions within collections. You can add, remove, or edit tags with the Python SDK or the W&B App UI.
Create and add tags to organize your collections or artifact versions within your registry. Add, modify, view, or remove tags to a collection or artifact version with the W&B App UI or the W&B Python SDK.
When to use a tag versus using an alias
Use aliases when you need to reference a specific artifact version uniquely. For example, use an alias such as "production" or "latest" to ensure that artifact_name:alias always points to a single, specific version.
Use tags when you want more flexibility for grouping or searching. Tags are ideal when multiple versions or collections can share the same label, and you don’t need the guarantee that only one version is associated with a specific identifier.
Add a tag to a collection
Use the W&B App UI or Python SDK to add a tag to a collection:
Update a tag programmatically by reassigning or by mutating the tags attribute. W&B recommends, and it is good Python practice, that you reassign the tags attribute instead of mutating it in place.
For example, the following code snippet shows common ways to update a list with reassignment. For brevity, we continue the code example from the Add a tag to a collection section:
Click View details next to the name of the collection you want to add a tag to
Scroll down to Versions
Click View next to an artifact version
Within the Version tab, click on the plus icon (+) next to the Tags field and type in the name of the tag
Press Enter on your keyboard
Fetch the artifact version you want to add or update a tag on. Once you have the artifact version, you can access the artifact object’s tags attribute to add or modify the tags of that artifact. Pass one or more tags as a list to the artifact’s tags attribute.
Like other artifacts, you can fetch an artifact from W&B without creating a run, or you can create a run and fetch the artifact within that run. In either case, make sure to call the artifact object’s save method to update the artifact on the W&B servers.
Copy and paste the appropriate code cell below to add or modify an artifact version’s tags. Replace the values in <> with your own.
The following code snippet shows how to fetch an artifact and add a tag without creating a new run:
import wandb
ARTIFACT_TYPE = "<TYPE>"
ORG_NAME = "<org_name>"
REGISTRY_NAME = "<registry_name>"
COLLECTION_NAME = "<collection_name>"
VERSION = "<artifact_version>"

artifact_name = f"{ORG_NAME}/wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}:v{VERSION}"
artifact = wandb.Api().artifact(name=artifact_name, type=ARTIFACT_TYPE)

artifact.tags = ["tag2"]  # Provide one or more tags in a list
artifact.save()
The following code snippet shows how to fetch an artifact and add a tag by creating a new run:
import wandb

ORG_NAME = "<org_name>"
REGISTRY_NAME = "<registry_name>"
COLLECTION_NAME = "<collection_name>"
VERSION = "<artifact_version>"

run = wandb.init(entity="<entity>", project="<project>")

artifact_name = f"{ORG_NAME}/wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}:v{VERSION}"
artifact = run.use_artifact(artifact_or_name=artifact_name)

artifact.tags = ["tag2"]  # Provide one or more tags in a list
artifact.save()
Update tags that belong to an artifact version
Update a tag programmatically by reassigning or by mutating the tags attribute. W&B recommends, and it is good Python practice, that you reassign the tags attribute instead of mutating it in place.
For example, the following code snippet shows common ways to update a list with reassignment. For brevity, we continue the code example from the Add a tag to an artifact version section:
Click View details next to the name of the collection that contains the artifact version whose tags you want to view
Scroll down to the Versions section
If an artifact version has one or more tags, you can view those tags within the Tags column.
Fetch the artifact version to view its tags. Once you have the artifact version, you can view the tags that belong to that artifact by inspecting the artifact object’s tags attribute.
Like other artifacts, you can fetch an artifact from W&B without creating a run or you can create a run and fetch the artifact within that run.
Copy and paste the appropriate code cell below to view an artifact version’s tags. Replace the values in <> with your own.
The following code snippet shows how to fetch and view an artifact version’s tags without creating a new run:
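A minimal sketch, mirroring the fetch pattern shown earlier on this page:

import wandb

ORG_NAME = "<org_name>"
REGISTRY_NAME = "<registry_name>"
COLLECTION_NAME = "<collection_name>"
VERSION = "<artifact_version>"

artifact_name = f"{ORG_NAME}/wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}:v{VERSION}"
artifact = wandb.Api().artifact(name=artifact_name)

# View the tags attached to this artifact version
print(artifact.tags)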
Use the W&B Python SDK to find artifact versions that have a set of tags:
import wandb

api = wandb.Api()

tagged_artifact_versions = api.artifacts(
    type_name="<artifact_type>",
    name="<artifact_name>",
    tags=["<tag_1>", "<tag_2>"],
)
for artifact_version in tagged_artifact_versions:
print(artifact_version.tags)
2.9 - Annotate collections
Add human-friendly text to your collections to help users understand the purpose of the collection and the artifacts it contains.
Depending on the collection, you might want to include information about the training data, model architecture, task, license, references, and deployment. The following lists describe some topics worth documenting in a collection:
W&B recommends including at minimum these details:
Summary: The purpose of the collection and the machine learning framework used for the experiment.
License: The legal terms and permissions associated with the use of the machine learning model. It helps model users understand the legal framework under which they can utilize the model. Common licenses include Apache 2.0, MIT, and GPL.
References: Citations or references to relevant research papers, datasets, or external resources.
If your collection contains training data, consider including these additional details:
Training data: Describe the training data used
Processing: Processing done on the training data set.
Data storage: Where the data is stored and how to access it.
If your collection contains a machine learning model, consider including these additional details:
Architecture: Information about the model architecture, layers, and any specific design choices.
Deserialize the model: Provide information on how someone on your team can load the model into memory.
Task: The specific type of task or problem that the machine learning model is designed to perform. It’s a categorization of the model’s intended capability.
Deployment: Details on how and where the model is deployed and guidance on how the model is integrated into other enterprise systems, such as workflow orchestration platforms.
Add a description to a collection
Interactively or programmatically add a description to a collection with the W&B Registry UI or Python SDK.
Select View details next to the name of the collection.
Within the Description field, provide information about your collection. Format the text with Markdown.
Use the wandb.Api().artifact_collection() method to access a collection’s description. Use the returned object’s description property to add, or update, a description to the collection.
Specify the collection’s type for the type_name parameter and the collection’s full name for the name parameter. A collection’s name consists of the prefix “wandb-registry-”, the name of the registry, and the name of the collection, separated by forward slashes:
wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}
Copy and paste the following code snippet into your Python script or notebook. Replace values enclosed in angle brackets (<>) with your own.
import wandb

api = wandb.Api()

collection = api.artifact_collection(
    type_name="<collection_type>",
    name="<collection_name>",
)

collection.description = "This is a description."
collection.save()
For example, the preceding image shows a collection that documents a model’s architecture, intended use, performance information, and more.
2.10 - Create and view lineage maps
Create a lineage map in the W&B Registry.
Within a collection in the W&B Registry, you can view a history of the artifacts that an ML experiment uses. This history is called a lineage graph.
You can also view lineage graphs for artifacts you log to W&B that are not part of a collection.
Lineage graphs can show the specific run that logs an artifact. In addition, lineage graphs can also show which run used an artifact as an input. In other words, lineage graphs can show the input and output of a run.
For example, the following image shows artifacts created and used throughout an ML experiment:
From left to right, the image shows:
Multiple runs log the split_zoo_dataset:v4 artifact.
The “rural-feather-20” run uses the split_zoo_dataset:v4 artifact for training.
The output of the “rural-feather-20” run is a model artifact called zoo-ylbchv20:v0.
A run called “northern-lake-21” uses the model artifact zoo-ylbchv20:v0 to evaluate the model.
Track the input of a run
Mark an artifact as an input or dependency of a run with the wandb.init.use_artifact API.
The following code snippet shows how to use the use_artifact method. Replace values enclosed in angle brackets (< >) with your own:
import wandb

# Initialize a run
run = wandb.init(project="<project>", entity="<entity>")

# Get artifact, mark it as a dependency
artifact = run.use_artifact(artifact_or_name="<name>", aliases="<alias>")
The following code snippet shows how to use the wandb.init.log_artifact API. Replace values enclosed in angle brackets (< >) with your own:
import wandb

# Initialize a run
run = wandb.init(entity="<entity>", project="<project>")

artifact = wandb.Artifact(name="<artifact_name>", type="<artifact_type>")
artifact.add_file(local_path="<local_filepath>", name="<optional-name>")

# Log the artifact as an output of the run
run.log_artifact(artifact_or_path=artifact)
For more information about creating artifacts, see Create an artifact.
View lineage graphs in a collection
View the lineage of an artifact linked to a collection in the W&B Registry.
Navigate to the W&B Registry.
Select the collection that contains the artifact.
From the dropdown, click the artifact version whose lineage graph you want to view.
Select the “Lineage” tab.
Once you are in an artifact’s lineage graph page, you can view additional information about any node in that lineage graph.
Select a run node to view that run’s details, such as the run’s ID, the run’s name, the run’s state, and more. As an example, the following image shows information about the rural-feather-20 run:
Select an artifact node to view that artifact’s details, such as its full name, type, creation time, and associated aliases.
2.11 - Migrate from legacy Model Registry
W&B will transition assets from the legacy W&B Model Registry to the new W&B Registry. This migration will be fully managed and triggered by W&B, requiring no intervention from users. The process is designed to be as seamless as possible, with minimal disruption to existing workflows.
The transition will take place once the new W&B Registry includes all the functionalities currently available in the Model Registry. W&B will attempt to preserve current workflows, codebases, and references.
This guide is a living document and will be updated regularly as more information becomes available. For any questions or support, contact support@wandb.com.
How W&B Registry differs from the legacy Model Registry
W&B Registry introduces a range of new features and enhancements designed to provide a more robust and flexible environment for managing models, datasets, and other artifacts.
To view the legacy Model Registry, navigate to the Model Registry in the W&B App. A banner appears at the top of the page that enables you to use the legacy Model Registry App UI.
Organizational visibility
Artifacts linked to the legacy Model Registry have team level visibility. This means that only members of your team can view your artifacts in the legacy W&B Model Registry. W&B Registry has organization level visibility. This means that members across an organization, with correct permissions, can view artifacts linked to a registry.
Restrict visibility to a registry
Restrict who can view and access a custom registry. You can restrict visibility to a registry when you create a custom registry or after you create a custom registry. In a Restricted registry, only selected members can access the content, maintaining privacy and control. For more information about registry visibility, see Registry visibility types.
Create custom registries
Unlike the legacy Model Registry, W&B Registry is not limited to models or dataset registries. You can create custom registries tailored to specific workflows or project needs, capable of holding any arbitrary object type. This flexibility allows teams to organize and manage artifacts according to their unique requirements. For more information on how to create a custom registry, see Create a custom registry.
Custom access control
Each registry supports detailed access control, where members can be assigned specific roles such as Admin, Member, or Viewer. Admins can manage registry settings, including adding or removing members, setting roles, and configuring visibility. This ensures that teams have the necessary control over who can view, manage, and interact with the artifacts in their registries.
Terminology update
Registered models are now referred to as collections.
Summary of changes
| | Legacy W&B Model Registry | W&B Registry |
| --- | --- | --- |
| Artifact visibility | Only members of the team can view or access artifacts | Members in your organization, with correct permissions, can view or access artifacts linked to a registry |
| Custom access control | Not available | Available |
| Custom registry | Not available | Available |
| Terminology update | A set of pointers (links) to model versions is called a registered model. | A set of pointers (links) to artifact versions is called a collection. |
| wandb.init.link_model | Model Registry-specific API | Currently only compatible with the legacy Model Registry |
Preparing for the migration
W&B will migrate registered models (now called collections) and associated artifact versions from the legacy Model Registry to the W&B Registry. This process will be conducted automatically, with no action required from users.
Team visibility to organization visibility
After the migration, your model registry will have organization level visibility. You can restrict who has access to a registry by assigning roles. This helps ensure that only specific members have access to specific registries.
The migration will preserve existing permission boundaries of your current team-level registered models (soon to be called collections) in the legacy W&B Model Registry. Permissions currently defined in the legacy Model Registry will be preserved in the new Registry. This means that collections currently restricted to specific team members will remain protected during and after the migration.
Artifact path continuity
No action is currently required.
During the migration
W&B will initiate the migration process. The migration will occur during a time window that minimizes disruption to W&B services. The legacy Model Registry will transition to a read-only state once the migration begins and will remain accessible for reference.
After the migration
Post-migration, collections, artifact versions, and associated attributes will be fully accessible within the new W&B Registry. The focus is on ensuring that current workflows remain intact, with ongoing support available to help navigate any changes.
Using the new registry
Users are encouraged to explore the new features and capabilities available in the W&B Registry. The Registry will not only support the functionalities currently relied upon but also introduces enhancements such as custom registries, improved visibility, and flexible access controls.
Support is available if you are interested in trying the W&B Registry early, or for new users that prefer to start with Registry and not the legacy W&B Model Registry. Contact support@wandb.com or your Sales MLE to enable this functionality. Note that any early migration will be into a BETA version. The BETA version of W&B Registry might not have all the functionality or features of the legacy Model Registry.
For more details and to learn about the full range of features in the W&B Registry, visit the W&B Registry Guide.
FAQs
Why is W&B migrating assets from Model Registry to W&B Registry?
W&B is evolving its platform to offer more advanced features and capabilities with the new Registry. This migration is a step towards providing a more integrated and powerful toolset for managing models, datasets, and other artifacts.
What needs to be done before the migration?
No action is required from users before the migration. W&B will handle the transition, ensuring that workflows and references are preserved.
Will access to model artifacts be lost?
No, access to model artifacts will be retained after the migration. The legacy Model Registry will remain in a read-only state, and all relevant data will be migrated to the new Registry.
Will metadata related to artifacts be preserved?
Yes, important metadata related to artifact creation, lineage, and other attributes will be preserved during the migration. Users will continue to have access to all relevant metadata after the migration, ensuring that the integrity and traceability of their artifacts remain intact.
Who do I contact if I need help?
Support is available for any questions or concerns. Reach out to support@wandb.com for assistance.
2.12 - Model registry
Model registry to manage the model lifecycle from training to production
W&B will no longer support W&B Model Registry after 2024. Users are encouraged to instead use W&B Registry for linking and sharing their model artifacts versions. W&B Registry broadens the capabilities of the legacy W&B Model Registry. For more information about W&B Registry, see the Registry docs.
W&B will migrate existing model artifacts linked to the legacy Model Registry to the new W&B Registry in the Fall or early Winter of 2024. See Migrating from legacy Model Registry for information about the migration process.
The W&B Model Registry houses a team’s trained models where ML Practitioners can publish candidates for production to be consumed by downstream teams and stakeholders. It is used to house staged/candidate models and manage workflows associated with staging.
Move model versions through its ML lifecycle; from staging to production.
Track a model’s lineage and audit the history of changes to production models.
How it works
Track and manage your staged models with a few simple steps.
Log a model version: In your training script, add a few lines of code to save the model files as an artifact to W&B.
Compare performance: Check live charts to compare the metrics and sample predictions from model training and validation. Identify which model version performed the best.
Link to registry: Bookmark the best model version by linking it to a registered model, either programmatically in Python or interactively in the W&B UI.
The following code snippet demonstrates how to log and link a model to the Model Registry:
import wandb
import random

# Start a new W&B run
run = wandb.init(project="models_quickstart")

# Simulate logging model metrics
run.log({"acc": random.random()})

# Create a simulated model file
with open("my_model.h5", "w") as f:
    f.write("Model: " + str(random.random()))

# Log and link the model to the Model Registry
run.link_model(path="./my_model.h5", registered_model_name="MNIST")

run.finish()
Connect model transitions to CI/CD workflows: transition candidate models through workflow stages and automate downstream actions with webhooks or jobs.
How to get started
Depending on your use case, explore the following resources to get started with W&B Models:
Use the W&B Model Registry to manage and version your models, track lineage, and promote models through different lifecycle stages
Automate your model management workflows using webhooks.
See how the Model Registry integrates with external ML systems and tools in your model development lifecycle for model evaluation, monitoring, and deployment.
2.12.1 - Tutorial: Use W&B for model management
Learn how to use W&B for Model Management
The following walkthrough shows you how to log a model to W&B. By the end of the walkthrough you will:
Create and train a model with the MNIST dataset and the Keras framework.
Log the model that you trained to a W&B project
Mark the dataset used as a dependency to the model you created
Link the model to the W&B Registry.
Evaluate the performance of the model you link to the registry
Mark a model version ready for production.
Copy the code snippets in the order presented in this guide.
Code not unique to the Model Registry is hidden in collapsible cells.
Setting up
Before you get started, import the Python dependencies required for this walkthrough:
import wandb
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from wandb.integration.keras import WandbMetricsLogger
from sklearn.model_selection import train_test_split
Provide your W&B entity to the entity variable:
entity = "<entity>"
Create a dataset artifact
First, create a dataset. The following code snippet creates a function that downloads the MNIST dataset:
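That cell is collapsed in the original walkthrough. A minimal sketch of such a function, assuming keras.datasets.mnist and the imports from the setup section (the function name and subset sizes are hypothetical):

import numpy as np
from tensorflow import keras
from sklearn.model_selection import train_test_split

def generate_raw_data(train_size=6000):
    # Download MNIST and take a small subset to keep the walkthrough fast
    (x_train, y_train), _ = keras.datasets.mnist.load_data()
    x_train = x_train[:train_size].astype("float32") / 255.0
    x_train = np.expand_dims(x_train, -1)  # add a channel dimension: (28, 28, 1)
    y_train = y_train[:train_size]

    # Hold out part of the data for evaluation
    x_train, x_eval, y_train, y_eval = train_test_split(x_train, y_train, test_size=0.2)
    return x_train, y_train, x_eval, y_eval

x_train, y_train, x_eval, y_eval = generate_raw_data()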
Next, upload the dataset to W&B. To do this, create an artifact object and add the dataset to that artifact.
project = "model-registry-dev"
model_use_case_id = "mnist"
job_type = "build_dataset"

# Initialize a W&B run
run = wandb.init(entity=entity, project=project, job_type=job_type)

# Create W&B Table for training data
train_table = wandb.Table(data=[], columns=[])
train_table.add_column("x_train", x_train)
train_table.add_column("y_train", y_train)
train_table.add_computed_columns(lambda ndx, row: {"img": wandb.Image(row["x_train"])})

# Create W&B Table for eval data
eval_table = wandb.Table(data=[], columns=[])
eval_table.add_column("x_eval", x_eval)
eval_table.add_column("y_eval", y_eval)
eval_table.add_computed_columns(lambda ndx, row: {"img": wandb.Image(row["x_eval"])})

# Create an artifact object
artifact_name = "{}_dataset".format(model_use_case_id)
artifact = wandb.Artifact(name=artifact_name, type="dataset")

# Add wandb.WBValue objects to the artifact
artifact.add(train_table, "train_table")
artifact.add(eval_table, "eval_table")

# Persist any changes made to the artifact
artifact.save()

# Tell W&B this run is finished
run.finish()
Storing files (such as datasets) in an artifact is useful in the context of logging models because it lets you track a model’s dependencies.
Train a model
Train a model with the artifact dataset you created in the previous step.
Declare dataset artifact as an input to the run
Declare the dataset artifact you created in a previous step as the input to the W&B run. This is particularly useful in the context of logging models because declaring an artifact as an input to a run lets you track the dataset (and the version of the dataset) used to train a specific model. W&B uses the information collected to create a lineage map.
Use the use_artifact API to both declare the dataset artifact as the input of the run and to retrieve the artifact itself.
job_type = "train_model"

config = {
    "optimizer": "adam",
    "batch_size": 128,
    "epochs": 5,
    "validation_split": 0.1,
}

# Initialize a W&B run
run = wandb.init(project=project, job_type=job_type, config=config)

# Retrieve the dataset artifact
version = "latest"
name = "{}:{}".format("{}_dataset".format(model_use_case_id), version)
artifact = run.use_artifact(artifact_or_name=name)

# Get specific content from the dataframe
train_table = artifact.get("train_table")
x_train = train_table.get_column("x_train", convert_to="numpy")
y_train = train_table.get_column("y_train", convert_to="numpy")
For more information about tracking the inputs and output of a model, see Create model lineage map.
Define and train model
For this walkthrough, define a 2D Convolutional Neural Network (CNN) with Keras to classify images from the MNIST dataset.
Train CNN on MNIST data
# Store values from our config dictionary into variables for easy access
num_classes = 10
input_shape = (28, 28, 1)
loss = "categorical_crossentropy"
optimizer = run.config["optimizer"]
metrics = ["accuracy"]
batch_size = run.config["batch_size"]
epochs = run.config["epochs"]
validation_split = run.config["validation_split"]

# Create model architecture
model = keras.Sequential(
    [
        layers.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ]
)
model.compile(loss=loss, optimizer=optimizer, metrics=metrics)

# Generate labels for training data
y_train = keras.utils.to_categorical(y_train, num_classes)

# Create training and test set
x_t, x_v, y_t, y_v = train_test_split(x_train, y_train, test_size=0.33)
W&B creates a registered model for you if the name you specify for registered_model_name does not already exist.
See link_model in the API Reference guide for more information on optional parameters.
Evaluate the performance of a model
It is common practice to evaluate the performance of one or more models.
First, get the evaluation dataset artifact stored in W&B in a previous step.
job_type = "evaluate_model"

# Initialize a run
run = wandb.init(project=project, entity=entity, job_type=job_type)

model_use_case_id = "mnist"
version = "latest"

# Get dataset artifact, mark it as a dependency
artifact = run.use_artifact(
    "{}:{}".format("{}_dataset".format(model_use_case_id), version)
)

# Get desired dataframe
eval_table = artifact.get("eval_table")
x_eval = eval_table.get_column("x_eval", convert_to="numpy")
y_eval = eval_table.get_column("y_eval", convert_to="numpy")
Download the model version from W&B that you want to evaluate. Use the use_model API to access and download your model.
alias = "latest"  # alias
name = "mnist_model"  # name of the model artifact

# Access and download the model. Returns the path to the downloaded artifact
downloaded_model_path = run.use_model(name=f"{name}:{alias}")

# Log metrics, images, tables, or any data useful for evaluation
run.log(data={"loss": (loss, _)})
Promote a model version
Mark a model version ready for the next stage of your machine learning workflow with a model alias. Each registered model can have one or more model aliases. A model alias can only belong to a single model version at a time.
For example, suppose that after evaluating a model’s performance, you are confident that the model is ready for production. To promote that model version, add the production alias to that specific model version.
The production alias is one of the most common aliases used to mark a model as production-ready.
You can add an alias to a model version interactively with the W&B App UI or programmatically with the Python SDK, as in the sketch below.
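Programmatically, a minimal sketch using the Public API (the registered model path is hypothetical):

import wandb

api = wandb.Api()

# Fetch a model version linked to a registered model; the path is hypothetical
model_version = api.artifact("<entity>/model-registry/<registered-model-name>:latest")

# Add the production alias and persist the change
model_version.aliases.append("production")
model_version.save()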
A model version represents a single model checkpoint. Model versions are a snapshot at a point in time of a model and its files within an experiment.
A model version is an immutable directory of data and metadata that describes a trained model. W&B suggests that you add files to your model version that let you store (and restore) your model architecture and learned parameters at a later date.
A model version belongs to one, and only one, model artifact. A model version can belong to zero or more registered models. Model versions are stored in a model artifact in the order they are logged to the model artifact. W&B automatically creates a new model version if it detects that a model you log (to the same model artifact) has different contents than a previous model version.
Store files within model versions that are produced from the serialization process provided by your modeling library (for example, PyTorch and Keras).
Model alias
Model aliases are mutable strings that allow you to uniquely identify or reference a model version in your registered model with a semantically related identifier. You can only assign an alias to one version of a registered model. This is because an alias should refer to a unique version when used programmatically. It also allows aliases to be used to capture a model’s state (champion, candidate, production).
It is common practice to use aliases such as "best", "latest", "production", or "staging" to mark model versions with special purposes.
For example, suppose you create a model and assign it a "best" alias. You can refer to that specific model with run.use_model:
import wandb

run = wandb.init()

name = f"{entity}/{project}/{model_artifact_name}:{alias}"
run.use_model(name=name)
Model tags
Model tags are keywords or labels that belong to one or more registered models.
Use model tags to organize registered models into categories and to search over those categories in the Model Registry’s search bar. Model tags appear at the top of the Registered Model Card. You might choose to use them to group your registered models by ML task, owning team, or priority. The same model tag can be added to multiple registered models to allow for grouping.
Model tags, which are labels applied to registered models for grouping and discoverability, are different from model aliases. Model aliases are unique identifiers or nicknames that you use to fetch a model version programmatically. To learn more about using tags to organize the tasks in your Model Registry, see Organize models.
Model artifact
A model artifact is a collection of logged model versions. Model versions are stored in a model artifact in the order they are logged to the model artifact.
A model artifact can contain one or more model versions. A model artifact can be empty if no model versions are logged to it.
For example, suppose you create a model artifact. During model training, you periodically save your model during checkpoints. Each checkpoint corresponds to its own model version. All of the model versions created during your model training and checkpoint saving are stored in the same model artifact you created at the beginning of your training script.
The preceding image shows a model artifact that contains three model versions: v0, v1, and v2.
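For example, a minimal sketch in which repeated log_model calls to the same artifact name append versions v0, v1, and v2 (dummy files stand in for real checkpoints):

import wandb

run = wandb.init(project="<project>")

for epoch in range(3):
    # A dummy checkpoint file stands in for real model weights
    checkpoint_path = f"checkpoint_{epoch}.h5"
    with open(checkpoint_path, "w") as f:
        f.write(f"weights after epoch {epoch}")

    # Logging to the same artifact name appends a new model version;
    # W&B creates a new version only if the contents changed
    run.log_model(path=checkpoint_path, name="mnist-checkpoints")

run.finish()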
A registered model is a collection of pointers (links) to model versions. You can think of a registered model as a folder of “bookmarks” of candidate models for the same ML task. Each “bookmark” of a registered model is a pointer to a model version that belongs to a model artifact. You can use model tags to group your registered models.
Registered models often represent candidate models for a single modeling use case or task. For example, you might create a registered model for each image classification task based on the model you use: ImageClassifier-ResNet50, ImageClassifier-VGG16, DogBreedClassifier-MobileNetV2, and so on. Model versions are assigned version numbers in the order in which they were linked to the registered model.
Track a model, the model’s dependencies, and other information relevant to that model with the W&B Python SDK.
Under the hood, W&B creates a lineage map of the model artifact that you can view with the W&B App UI or programmatically with the W&B Python SDK. See Create model lineage map for more information.
How to log a model
Use the run.log_model API to log a model. Provide the path where your model files are saved to the path parameter. The path can be a local file, directory, or reference URI to an external bucket such as s3://bucket/path.
Optionally provide a name for the model artifact for the name parameter. If name is not specified, W&B uses the basename of the input path prepended with the run ID.
Copy and paste the following code snippet. Replace values enclosed in <> with your own.
import wandb

# Initialize a W&B run
run = wandb.init(project="<project>", entity="<entity>")

# Log the model
run.log_model(path="<path-to-model>", name="<name>")
Example: Log a Keras model to W&B
The following code example shows how to log a convolutional neural network (CNN) model to W&B.
import os
import wandb
from tensorflow import keras
from tensorflow.keras import layers
config = {"optimizer": "adam", "loss": "categorical_crossentropy"}
# Initialize a W&B run
run = wandb.init(entity="charlie", project="mnist-project", config=config)

# Training algorithm
loss = run.config["loss"]
optimizer = run.config["optimizer"]
metrics = ["accuracy"]
num_classes = 10
input_shape = (28, 28, 1)
model = keras.Sequential(
[
layers.Input(shape=input_shape),
layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
layers.MaxPooling2D(pool_size=(2, 2)),
layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
layers.MaxPooling2D(pool_size=(2, 2)),
layers.Flatten(),
layers.Dropout(0.5),
layers.Dense(num_classes, activation="softmax"),
]
)
model.compile(loss=loss, optimizer=optimizer, metrics=metrics)
# Save model
model_filename = "model.h5"
local_filepath = "./"
full_path = os.path.join(local_filepath, model_filename)
model.save(filepath=full_path)

# Log the model
run.log_model(path=full_path, name="MNIST")

# Explicitly tell W&B to end the run
run.finish()
2.12.4 - Create a registered model
Create a registered model to hold all the candidate models for your modeling tasks.
Create a registered model to hold all the candidate models for your modeling tasks. You can create a registered model interactively within the Model Registry or programmatically with the Python SDK.
Programmatically create a registered model
Programmatically register a model with the W&B Python SDK. W&B automatically creates a registered model for you if the registered model doesn’t exist.
Replace the values enclosed in <> with your own:
import wandb
run = wandb.init(entity="<entity>", project="<project>")
run.link_model(path="<path-to-model>", registered_model_name="<registered-model-name>")
run.finish()
The name you provide for registered_model_name is the name that appears in the Model Registry App.
For example, suppose you have a nightly job. It is tedious to manually link a model created each night. Instead, you could create a script that evaluates the model, and if the model improves in performance, link that model to the model registry with the W&B Python SDK.
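A minimal sketch of such a nightly script follows; the evaluation helper and the threshold are hypothetical placeholders:
import wandb

def evaluate_model(path: str) -> float:
    # Placeholder: load the model at `path` and compute a validation metric.
    # Replace with your own evaluation logic.
    return 0.97

run = wandb.init(entity="<entity>", project="<project>", job_type="nightly-eval")

accuracy = evaluate_model("<path-to-model>")
run.log({"accuracy": accuracy})

# Link the model to the registry only if it clears a chosen quality bar
if accuracy > 0.95:
    run.link_model(path="<path-to-model>", registered_model_name="<registered-model-name>")

run.finish()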
2.12.5 - Link a model version
Link a model version to a registered model with the W&B App or programmatically with the Python SDK.
Programmatically link a model
Use the link_model method to programmatically log model files to a W&B run and link them to the W&B Model Registry.
Replace the values enclosed in <> with your own:
import wandb
run = wandb.init(entity="<entity>", project="<project>")
run.link_model(path="<path-to-model>", registered_model_name="<registered-model-name>")
run.finish()
W&B creates a registered model for you if the name you specify for the registered-model-name parameter does not already exist.
For example, suppose you have an existing registered model named "Fine-Tuned-Review-Autocompletion" (registered-model-name="Fine-Tuned-Review-Autocompletion") in your Model Registry, and a few model versions are linked to it: v0, v1, v2. If you programmatically link a new model and use the same registered model name, W&B links this model to the existing registered model and assigns it the model version v3. If no registered model with this name exists, W&B creates a new registered model, and the linked model receives version v0.
Hover your mouse next to the name of the registered model you want to link a new model to.
Select the meatball menu icon (three horizontal dots) next to View details.
From the dropdown, select Link new version.
From the Project dropdown, select the name of the project that contains your model.
From the Model Artifact dropdown, select the name of the model artifact.
From the Version dropdown, select the model version you want to link to the registered model.
Navigate to your project’s artifact browser on the W&B App at: https://wandb.ai/<entity>/<project>/artifacts
Select the Artifacts icon on the left sidebar.
Click on the model version you want to link to your registry.
Within the Version overview section, click the Link to registry button.
From the modal that appears on the right of the screen, select a registered model from the Select a registered model dropdown menu.
Click Next step.
(Optional) Select an alias from the Aliases dropdown.
Click Link to registry.
View the source of linked models
There are two ways to view the source of linked models: The artifact browser within the project that the model is logged to and the W&B Model Registry.
A pointer connects a specific model version in the model registry to the source model artifact (located within the project the model is logged to). The source model artifact also has a pointer to the model registry.
Select View details next to the name of your registered model.
Within the Versions section, select View next to the model version you want to investigate.
Click on the Version tab within the right panel.
Within the Version overview section there is a row that contains a Source Version field. The Source Version field shows both the name of the model and the model’s version.
For example, the following image shows a v0 model version called mnist_model (see Source version field mnist_model:v0), linked to a registered model called MNIST-dev.
Navigate to your project’s artifact browser on the W&B App at: https://wandb.ai/<entity>/<project>/artifacts
Select the Artifacts icon on the left sidebar.
Expand the model dropdown menu from the Artifacts panel.
Select the name and version of the model linked to the model registry.
Click on the Version tab within the right panel.
Within the Version overview section there is a row that contains a Linked To field. The Linked To field shows both the name of the registered model and the version it possesses (registered-model-name:version).
For example, in the following image, there is a registered model called MNIST-dev (see the Linked To field). A model version called mnist_model with version v0 (mnist_model:v0) points to the MNIST-dev registered model.
2.12.6 - Organize models
Use model tags to organize registered models into categories and to search over those categories.
Select View details next to the name of the registered model you want to add a model tag to.
Scroll to the Model card section.
Click the plus button (+) next to the Tags field.
Type in the name for your tag or search for a pre-existing model tag.
For example, the following image shows multiple model tags added to a registered model called FineTuned-Review-Autocompletion:
2.12.7 - Create model lineage map
This page describes creating lineage graphs in the legacy W&B Model Registry. To learn about lineage graphs in W&B Registry, refer to Create and view lineage maps.
W&B will transition assets from the legacy W&B Model Registry to the new W&B Registry. This migration will be fully managed and triggered by W&B, requiring no intervention from users. The process is designed to be as seamless as possible, with minimal disruption to existing workflows. Refer to Migrate from legacy Model Registry.
A useful feature of logging model artifacts to W&B is lineage graphs. Lineage graphs show artifacts logged by a run as well as artifacts used by a specific run.
This means that, when you log a model artifact, you at a minimum have access to view the W&B run that used or produced the model artifact. If you track a dependency, you also see the inputs used by the model artifact.
For example, the following image shows artifacts created and used throughout an ML experiment:
From left to right, the image shows:
The jumping-monkey-1 W&B run created the mnist_dataset:v0 dataset artifact.
The vague-morning-5 W&B run trained a model using the mnist_dataset:v0 dataset artifact. The output of this W&B run was a model artifact called mnist_model:v0.
A run called serene-haze-6 used the model artifact (mnist_model:v0) to evaluate the model.
Track an artifact dependency
Declare a dataset artifact as an input to a W&B run with the use_artifact API to track a dependency.
The following code snippet shows how to use the use_artifact API:
import wandb

# Initialize a run
run = wandb.init(project=project, entity=entity)

# Get the artifact and mark it as a dependency
artifact = run.use_artifact(artifact_or_name="name", aliases="<alias>")
Once you have retrieved your artifact, you can use it to, for example, evaluate the performance of a model.
Example: Train a model and track a dataset as the input of a model
job_type ="train_model"config = {
"optimizer": "adam",
"batch_size": 128,
"epochs": 5,
"validation_split": 0.1,
}
run = wandb.init(project=project, job_type=job_type, config=config)
version ="latest"name ="{}:{}".format("{}_dataset".format(model_use_case_id), version)
artifact = run.use_artifact(name)
train_table = artifact.get("train_table")
x_train = train_table.get_column("x_train", convert_to="numpy")
y_train = train_table.get_column("y_train", convert_to="numpy")
# Store values from our config dictionary into variables for easy access
num_classes = 10
input_shape = (28, 28, 1)
loss = "categorical_crossentropy"
optimizer = run.config["optimizer"]
metrics = ["accuracy"]
batch_size = run.config["batch_size"]
epochs = run.config["epochs"]
validation_split = run.config["validation_split"]
# Create model architecture
model = keras.Sequential(
[
layers.Input(shape=input_shape),
layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
layers.MaxPooling2D(pool_size=(2, 2)),
layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
layers.MaxPooling2D(pool_size=(2, 2)),
layers.Flatten(),
layers.Dropout(0.5),
layers.Dense(num_classes, activation="softmax"),
]
)
model.compile(loss=loss, optimizer=optimizer, metrics=metrics)
# Generate labels for training data
y_train = keras.utils.to_categorical(y_train, num_classes)

# Create training and test set
x_t, x_v, y_t, y_v = train_test_split(x_train, y_train, test_size=0.33)

# Train the model
model.fit(
x=x_t,
y=y_t,
batch_size=batch_size,
epochs=epochs,
validation_data=(x_v, y_v),
callbacks=[WandbCallback(log_weights=True, log_evaluation=True)],
)
# Save model locally
path = "model.h5"
model.save(path)

# Link the model to the Model Registry
path = "./model.h5"
registered_model_name = "MNIST-dev"
name = "mnist_model"
run.link_model(path=path, registered_model_name=registered_model_name, name=name)
run.finish()
2.12.8 - Document machine learning model
Add descriptions to the model card to document your model.
Add a description to the model card of your registered model to document aspects of your machine learning model. Some topics worth documenting include:
Summary: A summary of what the model is. The purpose of the model. The machine learning framework the model uses, and so forth.
Training data: Describe the training data used, the processing applied to the training data set, where that data is stored, and so forth.
Architecture: Information about the model architecture, layers, and any specific design choices.
Deserialize the model: Provide information on how someone on your team can load the model into memory.
Task: The specific type of task or problem that the machine learning model is designed to perform. It’s a categorization of the model’s intended capability.
License: The legal terms and permissions associated with the use of the machine learning model. It helps model users understand the legal framework under which they can utilize the model.
References: Citations or references to relevant research papers, datasets, or external resources.
Deployment: Details on how and where the model is deployed and guidance on how the model is integrated into other enterprise systems, such as workflow orchestration platforms.
Select View details next to the name of the registered model you want to create a model card for.
Go to the Model card section.
Within the Description field, provide information about your machine learning model. Format text within a model card with Markdown markup language.
For example, the following image shows the model card of a Credit-card Default Prediction registered model.
2.12.9 - Download a model version
How to download a model with W&B Python SDK
Use the W&B Python SDK to download a model artifact that you linked to the Model Registry.
You are responsible for providing the additional Python functions or API calls needed to reconstruct and deserialize your model into a form that you can work with.
W&B suggests that you document information on how to load models into memory with model cards. For more information, see the Document machine learning models page.
Replace values within <> with your own:
import wandb
# Initialize a runrun = wandb.init(project="<project>", entity="<entity>")
# Access and download model. Returns path to downloaded artifactdownloaded_model_path = run.use_model(name="<your-model-name>")
Reference a model version with one of the following formats:
latest - Use latest alias to specify the model version that is most recently linked.
v# - Use v0, v1, v2, and so on to fetch a specific version of the registered model.
alias - Specify the custom alias that you and your team assigned to your model version
See use_model in the API Reference guide for more information on possible parameters and return type.
Example: Download and use a logged model
For example, in the following code snippet a user calls the use_model API, specifying the name of the model artifact they want to fetch along with a version or alias. The path returned from the API is stored in the downloaded_model_path variable.
import wandb

entity = "luka"
project = "NLP_Experiments"
alias = "latest"  # semantic nickname or identifier for the model version
model_artifact_name = "fine-tuned-model"

# Initialize a run
run = wandb.init()

# Access and download model. Returns path to downloaded artifact
downloaded_model_path = run.use_model(name=f"{entity}/{project}/{model_artifact_name}:{alias}")
Planned deprecation for W&B Model Registry in 2024
The preceding snippets demonstrate how to consume model artifacts using the soon-to-be-deprecated Model Registry. Use the W&B Registry to track, organize, and consume model artifacts. For more information, see the Registry docs.
2.12.10 - Create alerts and notifications
Select the registered model you want to receive notifications from.
Click on the Connect Slack button.
Follow the instructions on the OAuth page that appears to enable W&B in your Slack workspace.
Once you have configured Slack notifications for your team, you can pick and choose registered models to get notifications from.
A toggle that reads New model version linked to… appears instead of a Connect Slack button if you have Slack notifications configured for your team.
The screenshot below shows a FMNIST classifier registered model that has Slack notifications.
A message is automatically posted to the connected Slack channel each time a new model version is linked to the FMNIST classifier registered model.
2.12.11 - Manage data governance and access control
Use model registry role based access controls (RBAC) to control who can update protected aliases.
Use protected aliases to represent key stages of your model development pipeline. Only Model Registry Administrators can define, add, modify, or remove protected aliases; W&B blocks non-admin users from adding or removing protected aliases from model versions.
Only Team admins or current registry admins can manage the list of registry admins.
For example, suppose you set staging and production as protected aliases. Any member of your team can add new model versions. However, only admins can add a staging or production alias.
Set up access control
The following steps describe how to set up access controls for your team’s model registry.
3 - Reports
Share updates with collaborators, either as a LaTeX zip file or a PDF.
The following image shows a section of a report created from metrics that were logged to W&B over the course of training.
View the report where the above image was taken from here.
How it works
Create a collaborative report with a few clicks.
Navigate to your W&B project workspace in the W&B App.
Click the Create report button in the upper right corner of your workspace.
A modal titled Create Report will appear. Select the charts and panels you want to add to your report. (You can add or remove charts and panels later).
Click Create report.
Edit the report to your desired state.
Click Publish to project.
Click the Share button to share your report with collaborators.
See the Create a report page for more information on how to create reports interactively and programmatically with the W&B Python SDK.
How to get started
Depending on your use case, explore the following resources to get started with W&B Reports:
3.1 - Create a report
Navigate to your project workspace in the W&B App.
Click Create report in the upper right corner of your workspace.
A modal will appear. Select the charts you would like to start with. You can add or delete charts later from the report interface.
Select the Filter run sets option to prevent new runs from being added to your report. You can toggle this option on or off. Once you click Create report, a draft report will be available in the report tab to continue working on.
Navigate to your project workspace in the W&B App.
Select the Reports tab (clipboard icon) in your project.
Select the Create Report button on the report page.
Create a report programmatically with the wandb library.
Install W&B SDK and Workspaces API:
pip install wandb wandb-workspaces
Next, import W&B and the Workspaces API:
import wandb
import wandb_workspaces.reports.v2 as wr
Create a report instance with the wandb_workspaces.reports.v2.Report class. Specify a name for the project:
report = wr.Report(project="report_standard")
Save the report. Reports are not uploaded to the W&B server until you call the .save() method:
report.save()
For information on how to edit a report interactively with the App UI or programmatically, see Edit a report.
3.2 - Edit a report
Edit a report interactively with the App UI or programmatically with the W&B SDK.
Edit a report interactively with the App UI or programmatically with the W&B SDK.
Reports consist of blocks. Blocks make up the body of a report. Within these blocks you can add text, images, embedded visualizations, plots from experiments and runs, and panel grids.
Panel grids are a specific type of block that hold panels and run sets. Run sets are a collection of runs logged to a project in W&B. Panels are visualizations of run set data.
Ensure that you have wandb-workspaces installed in addition to the W&B Python SDK if you want to programmatically edit a report:
pip install wandb wandb-workspaces
Add plots
Each panel grid has a set of run sets and a set of panels. The run sets at the bottom of the section control what data shows up on the panels in the grid. Create a new panel grid if you want to add charts that pull data from a different set of runs.
Enter a forward slash (/) in the report to display a dropdown menu. Select Add panel to add a panel. You can add any panel that is supported by W&B, including a line plot, scatter plot or parallel coordinates chart.
Add plots to a report programmatically with the SDK. Pass a list of one or more plot or chart objects to the panels parameter in the PanelGrid Public API Class. Create a plot or chart object with its associated Python Class.
The following example demonstrates how to create a line plot and a scatter plot.
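A minimal sketch, assuming metrics named "time", "velocity", and "acceleration" were logged to the runs in your project:
import wandb_workspaces.reports.v2 as wr

report = wr.Report(project="report-editing")

blocks = [
    wr.PanelGrid(
        panels=[
            wr.LinePlot(x="time", y=["velocity"]),  # line plot of velocity over time
            wr.ScatterPlot(x="time", y="acceleration"),  # scatter plot of acceleration over time
        ]
    )
]

report.blocks = blocks
report.save()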
For more information about available plots and charts you can add to a report programmatically, see wr.panels.
Add run sets
Add run sets from projects interactively with the App UI or the W&B SDK.
Enter a forward slash (/) in the report to display a dropdown menu. From the dropdown, choose Panel Grid. This will automatically import the run set from the project the report was created from.
Add run sets from projects with the wr.Runset() and wr.PanelGrid classes. The following procedure describes how to add a run set; a code sketch follows the steps:
Create a wr.Runset() object instance. Provide the name of the project that contains the runsets for the project parameter and the entity that owns the project for the entity parameter.
Create a wr.PanelGrid() object instance. Pass a list of one or more runset objects to the runsets parameter.
Store one or more wr.PanelGrid() object instances in a list.
Update the report instance blocks attribute with the list of panel grid instances.
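A minimal sketch of the procedure above; the entity and project names are placeholders:
import wandb_workspaces.reports.v2 as wr

report = wr.Report(project="report-editing")

# A panel grid that pulls its runs from another project
panel_grids = wr.PanelGrid(
    runsets=[wr.Runset(project="<project-name>", entity="<entity-name>")]
)

report.blocks = [panel_grids]
report.save()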
A report automatically updates run sets to show the latest data from the project. You can preserve the run set in a report by freezing that run set. When you freeze a run set, you preserve the state of the run set in a report at a point in time.
To freeze a run set when viewing a report, click the snowflake icon in its panel grid near the Filter button.
Add code blocks
Add code blocks to your report interactively with the App UI or with the W&B SDK.
Enter a forward slash (/) in the report to display a dropdown menu. From the dropdown choose Code.
Select the name of the programming language on the right-hand side of the code block. This expands a dropdown. From the dropdown, select your programming language syntax. You can choose from JavaScript, Python, CSS, JSON, HTML, Markdown, and YAML.
Use the wr.CodeBlock Class to create a code block programmatically. Provide the name of the language and the code you want to display for the language and code parameters, respectively.
For example, the following code block demonstrates a list in a YAML file:
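A minimal sketch; the list contents are arbitrary examples:
import wandb_workspaces.reports.v2 as wr

report = wr.Report(project="report-editing")

report.blocks = [
    wr.CodeBlock(
        code=["this:", "- is", "- a", "cool:", "- yaml", "- file"],
        language="yaml",
    )
]

report.save()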
Add markdown
Add markdown to your report interactively with the App UI or with the W&B SDK.
Enter a forward slash (/) in the report to display a dropdown menu. From the dropdown choose Markdown.
Use the wr.MarkdownBlock class to create a markdown block programmatically. Pass a string to the text parameter:
import wandb
import wandb_workspaces.reports.v2 as wr
report = wr.Report(project="report-editing")
report.blocks = [
wr.MarkdownBlock(text="Markdown cell with *italics* and **bold** and $e=mc^2$")
]
This will render a markdown block similar to:
Add HTML elements
Add HTML elements to your report interactively with the App UI or with the W&B SDK.
Enter a forward slash (/) in the report to display a dropdown menu. From the dropdown select a type of text block. For example, to create an H2 heading block, select the Heading 2 option.
Pass a list of one or more HTML element blocks to the report's blocks attribute. The following example demonstrates how to create an H1, an H2, and an unordered list:
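A minimal sketch; the heading and bullet text are arbitrary examples:
import wandb_workspaces.reports.v2 as wr

report = wr.Report(project="report-editing")

report.blocks = [
    wr.H1(text="How Programmatic Reports work"),
    wr.H2(text="Heading 2"),
    wr.UnorderedList(items=["Bullet 1", "Bullet 2"]),
]

report.save()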
This renders HTML elements similar to the following:
Embed rich media links
Embed rich media within the report with the App UI or with the W&B SDK.
Copy and paste URLs into reports to embed rich media within the report. The following animations demonstrate how to copy and paste URLs from Twitter, YouTube, and SoundCloud.
Twitter
Copy and paste a Tweet link URL into a report to view the Tweet within the report.
YouTube
Copy and paste a YouTube video URL link to embed a video in the report.
SoundCloud
Copy and paste a SoundCloud link to embed an audio file into a report.
Pass a list of one or more embedded media objects to the report's blocks attribute. The following example demonstrates how to embed video and Twitter media into a report:
import wandb
import wandb_workspaces.reports.v2 as wr
report = wr.Report(project="report-editing")
report.blocks = [
wr.Video(url="https://www.youtube.com/embed/6riDJMI-Y8U"),
wr.Twitter(
embed_html='<blockquote class="twitter-tweet"><p lang="en" dir="ltr">The voice of an angel, truly. <a href="https://twitter.com/hashtag/MassEffect?src=hash&ref_src=twsrc%5Etfw">#MassEffect</a> <a href="https://t.co/nMev97Uw7F">pic.twitter.com/nMev97Uw7F</a></p>— Mass Effect (@masseffect) <a href="https://twitter.com/masseffect/status/1428748886655569924?ref_src=twsrc%5Etfw">August 20, 2021</a></blockquote>\n' ),
]
report.save()
Duplicate and delete panel grids
If you have a layout that you would like to reuse, you can select a panel grid and copy-paste it to duplicate it in the same report or even paste it into a different report.
Highlight a whole panel grid section by selecting the drag handle in the upper right corner. Click and drag to highlight and select a region in a report such as panel grids, text, and headings.
Select a panel grid and press delete on your keyboard to delete a panel grid.
Collapse headers to organize Reports
Collapse headers in a report to hide content within a text block. When the report is loaded, only headers that are expanded show content. Collapsing headers in reports can help organize your content and prevent excessive data loading. The following GIF demonstrates the process.
Visualize relationships across multiple dimensions
To effectively visualize relationships across multiple dimensions, use a color gradient to represent one of the variables. This enhances clarity and makes patterns easier to interpret.
Choose a variable to represent with a color gradient (e.g., penalty scores, learning rates, etc.). This allows for a clearer understanding of how penalty (color) interacts with reward/side effects (y-axis) over training time (x-axis).
Highlight key trends. Hovering over a specific group of runs highlights them in the visualization.
3.3 - Collaborate on reports
Collaborate and share W&B Reports with peers, co-workers, and your team.
Once you have saved a report, you can select the Share button to collaborate. A draft copy of the report is created when you select the Edit button. Draft reports auto-save. Select Save to report to publish your changes to the shared report.
A warning notification will appear if an edit conflict occurs. This can occur if you and another collaborator edit the same report at the same time. The warning notification will guide you to resolve potential edit conflicts.
Comment on reports
Click the comment button on a panel in a report to add a comment directly to that panel.
3.4 - Clone and export reports
Export a W&B Report as a PDF or LaTeX.
Export reports
Export a report as a PDF or LaTeX. Within your report, select the kebab icon to expand the dropdown menu. Choose Download and select either PDF or LaTeX output format.
Cloning reports
Within your report, select the kebab icon to expand the dropdown menu. Choose the Clone this report button. Pick a destination for your cloned report in the modal. Choose Clone report.
Clone a report to reuse a project's template and format. Cloned reports are visible to your team if you clone a report within the team's account. Reports cloned within an individual's account are only visible to that user.
3.5 - Embed a report
Embed W&B reports directly into Notion or with an HTML IFrame element.
HTML iframe element
Select the Share button on the upper right hand corner within a report. A modal window will appear. Within the modal window, select Copy embed code. The copied code will render within an Inline Frame (IFrame) HTML element. Paste the copied code into an iframe HTML element of your choice.
Only public reports are viewable when embedded.
Confluence
The following animation demonstrates how to insert the direct link to the report within an IFrame cell in Confluence.
Notion
The following animation demonstrates how to insert a report into a Notion document using an Embed block in Notion and the report's embed code.
Gradio
You can use the gr.HTML element to embed W&B Reports within Gradio Apps and use them within Hugging Face Spaces.
import gradio as gr

def wandb_report(url):
    iframe = f'<iframe src={url} style="border:none;height:1024px;width:100%">'
    return gr.HTML(iframe)

with gr.Blocks() as demo:
    report = wandb_report(
        "https://wandb.ai/_scott/pytorch-sweeps-demo/reports/loss-22-10-07-16-00-17---VmlldzoyNzU2NzAx"
    )
demo.launch()
3.6 - Compare runs across projects
Compare runs from two different projects with cross-project reports.
Compare runs from two different projects with cross-project reports. Use the project selector in the run set table to pick a project.
The visualizations in the section pull columns from the first active run set. If you do not see the metric you are looking for in the line plot, make sure that the first run set checked in the section has that column available.
This feature supports history data on time series lines, but does not support pulling different summary metrics from different projects. In other words, you cannot create a scatter plot from columns that are only logged in another project.
If you need to compare runs from two projects and the columns are not working, add a tag to the runs in one project and then move those runs to the other project. You can still filter only the runs from each project, but the report includes all the columns for both sets of runs.
View-only report links
Share a view-only link to a report that is in a private project or team project.
View-only report links add a secret access token to the URL, so anyone who opens the link can view the page. Anyone can use the magic link to view the report without logging in first. For customers on W&B Local private cloud installations, these links remain behind your firewall, so only members of your team with access to your private instance and access to the view-only link can view the report.
In view-only mode, someone who is not logged in can see the charts and mouse over to see tooltips of values, zoom in and out on charts, and scroll through columns in the table. When in view mode, they cannot create new charts or new table queries to explore the data. View-only visitors to the report link won’t be able to click a run to get to the run page. Also, the view-only visitors would not be able to see the share modal but instead would see a tooltip on hover which says: Sharing not available for view only access.
The magic links are only available for "Private" and "Team" projects. For "Public" (anyone can view) or "Open" (anyone can view and contribute runs) projects, the links can't be turned on or off because the project is already available to anyone with the link.
Send a graph to a report
Send a graph from your workspace to a report to keep track of your progress. Click the dropdown menu on the chart or panel you’d like to copy to a report and click Add to report to select the destination report.
3.7 - Example reports
Reports gallery
Notes: Add a visualization with a quick summary
Capture an important observation, an idea for future work, or a milestone reached in the development of a project. All experiment runs in your report will link to their parameters, metrics, logs, and code, so you can save the full context of your work.
Jot down some text and pull in relevant charts to illustrate your insight.
Save the best examples from a complex code base for easy reference and future interaction. See the LIDAR point clouds W&B Report for an example of how to visualize LIDAR point clouds from the Lyft dataset and annotate with 3D bounding boxes.
Collaboration: Share findings with your colleagues
Explain how to get started with a project, share what you’ve observed so far, and synthesize the latest findings. Your colleagues can make suggestions or discuss details using comments on any panel or at the end of the report.
Include dynamic settings so that your colleagues can explore for themselves, get additional insights, and better plan their next steps. In this example, three types of experiments can be visualized independently, compared, or averaged.
See the SafeLife benchmark experiments W&B Report for an example of how to share first runs and observations of a benchmark.
Work log: Track what you’ve tried and plan next steps
Write down your thoughts on experiments, your findings, and any gotchas and next steps as you work through a project, keeping everything organized in one place. This lets you “document” all the important pieces beyond your scripts. See the Who Is Them? Text Disambiguation With Transformers W&B Report for an example of how you can report your findings.
Tell the story of a project, which you and others can reference later to understand how and why a model was developed. See The View from the Driver’s Seat W&B Report for how you can report your findings.
See the Learning Dexterity End-to-End Using W&B Reports for an example of how the OpenAI Robotics team used W&B Reports to run massive machine learning projects.
4 - Automations
4.1 - Model registry automations
Use an Automation for model CI (automated model evaluation pipelines) and model deployment.
Create an automation to trigger workflow steps, such as automated model testing and deployment. To create an automation, define the action you want to occur based on an event type.
For example, you can create a trigger that automatically deploys a model to GitHub when you add a new version of a registered model.
Looking for companion tutorials for automations?
This tutorial shows you how to set up an automation that triggers a GitHub Action for model evaluation and deployment.
This video series shows webhook basics and how to set them up in W&B.
This demo details how to set up an automation to deploy a model to an Amazon SageMaker endpoint.
Event types
An event is a change that takes place in the W&B ecosystem. The Model Registry supports two event types:
Use Linking a new artifact to a registered model to test new model candidates.
Use Adding a new alias to a version of the registered model to specify an alias that represents a special step of your workflow, like deploy, and to trigger an action any time a new model version has that alias applied.
Create a webhook automation
Automate a webhook based on an action with the W&B App UI. To do this, first establish a webhook, then configure the webhook automation.
Your webhook’s endpoint must have a fully qualified domain name. W&B does not support connecting to an endpoint by IP address or by a hostname such as localhost. This restriction helps protect against server-side request forgery (SSRF) attacks and other related threat vectors.
Add a secret for authentication or authorization
Secrets are team-level variables that let you obfuscate private strings such as credentials, API keys, passwords, tokens, and more. W&B recommends you use secrets to store any string that you want to protect the plain text content of.
To use a secret in your webhook, you must first add that secret to your team’s secret manager.
Only W&B Admins can create, edit, or delete a secret.
Skip this section if the external server you send HTTP POST requests to does not use secrets.
Secrets are also available if you use W&B Server in an Azure, GCP, or AWS deployment. Connect with your W&B account team to discuss how you can use secrets in W&B if you use a different deployment type.
There are two types of secrets W&B suggests that you create when you use a webhook automation:
Access tokens: Authorize senders to help secure webhook requests
Secret: Ensure the authenticity and integrity of data transmitted from payloads
Follow the instructions below to create a webhook:
Navigate to the W&B App UI.
Click on Team Settings.
Scroll down the page until you find the Team secrets section.
Click on the New secret button.
A modal will appear. Provide a name for your secret in the Secret name field.
Add your secret into the Secret field.
(Optional) Repeat steps 5 and 6 to create another secret (such as an access token) if your webhook requires additional secret keys or tokens to authenticate your webhook.
Specify the secrets you want to use for your webhook automation when you configure the webhook. See the Configure a webhook section for more information.
Once you create a secret, you can access that secret in your W&B workflows with $.
If you use secrets in W&B Server, you are responsible for configuring security measures that satisfy your security needs.
W&B strongly recommends that you store secrets in a W&B instance of a cloud secrets manager provided by AWS, GCP, or Azure. Secret managers provided by AWS, GCP, and Azure are configured with advanced security capabilities.
W&B does not recommend that you use a Kubernetes cluster as the backend of your secrets store. Consider a Kubernetes cluster only if you are not able to use a W&B instance of a cloud secrets manager (AWS, GCP, or Azure), and you understand how to prevent security vulnerabilities that can occur if you use a cluster.
Configure a webhook
Before you can use a webhook, first configure that webhook in the W&B App UI.
Only W&B Admins can configure a webhook for a W&B Team.
Ensure you already created one or more secrets if your webhook requires additional secret keys or tokens to authenticate your webhook.
Navigate to the W&B App UI.
Click on Team Settings.
Scroll down the page until you find the Webhooks section.
Click on the New webhook button.
Provide a name for your webhook in the Name field.
Provide the endpoint URL for the webhook in the URL field.
(Optional) From the Secret dropdown menu, select the secret you want to use to authenticate the webhook payload.
(Optional) From the Access token dropdown menu, select the access token you want to use to authorize the sender.
(Optional) From the Access token dropdown menu select additional secret keys or tokens required to authenticate a webhook (such as an access token).
See the Troubleshoot your webhook section to view where the secret and access token are specified in the POST request.
Add a webhook
Once you have a webhook configured and (optionally) a secret, navigate to the Model Registry App at https://wandb.ai/registry/model.
From the Event type dropdown, select an event type.
(Optional) If you selected A new version is added to a registered model event, provide the name of a registered model from the Registered model dropdown.
Select Webhooks from the Action type dropdown.
Click on the Next step button.
Select a webhook from the Webhook dropdown.
(Optional) Provide a payload in the JSON expression editor. See the Example payload section for common use case examples.
Click on Next step.
Provide a name for your webhook automation in the Automation name field.
(Optional) Provide a description for your webhook.
Click on the Create automation button.
Example payloads
The following tabs demonstrate example payloads based on common use cases. The examples reference the following keys to refer to condition objects in the payload parameters:
${event_type} Refers to the type of event that triggered the action.
${event_author} Refers to the user that triggered the action.
${artifact_version} Refers to the specific artifact version that triggered the action. Passed as an artifact instance.
${artifact_version_string} Refers to the specific artifact version that triggered the action. Passed as a string.
${artifact_collection_name} Refers to the name of the artifact collection that the artifact version is linked to.
${project_name} Refers to the name of the project owning the mutation that triggered the action.
${entity_name} Refers to the name of the entity owning the mutation that triggered the action.
Verify that your access tokens have the required set of permissions to trigger your GHA workflow. For more information, see these GitHub Docs.
Send a repository dispatch from W&B to trigger a GitHub Action. For example, suppose you have a workflow that accepts a repository dispatch as a trigger for the on key:
on:
repository_dispatch:
types: BUILD_AND_DEPLOY
The payload for the repository might look something like:
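As a sketch, the payload might look like the following; the event_type value must match your workflow's types field, and the client_payload keys shown here are simply the template strings you choose to forward:
{
  "event_type": "BUILD_AND_DEPLOY",
  "client_payload": {
    "event_author": "${event_author}",
    "artifact_version": "${artifact_version}",
    "artifact_version_string": "${artifact_version_string}",
    "artifact_collection_name": "${artifact_collection_name}",
    "project_name": "${project_name}",
    "entity_name": "${entity_name}"
  }
}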
The event_type key in the webhook payload must match the types field in the GitHub workflow YAML file.
The contents and positioning of rendered template strings depend on the event or model version the automation is configured for. ${event_type} renders as either LINK_ARTIFACT or ADD_ARTIFACT_ALIAS. See below for an example mapping:
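An illustrative mapping (placeholder values; the exact strings depend on your entity, project, and artifact):
${event_type}               --> "LINK_ARTIFACT" or "ADD_ARTIFACT_ALIAS"
${event_author}             --> "<wandb-username>"
${artifact_version_string}  --> "<entity>/<project>/<artifact-name>:<alias>"
${artifact_collection_name} --> "<artifact-collection-name>"
${project_name}             --> "<project>"
${entity_name}              --> "<entity>"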
Use template strings to dynamically pass context from W&B to GitHub Actions and other tools. If those tools can call Python scripts, they can consume the registered model artifacts through the W&B API.
Review a W&B report, which illustrates how to use a GitHub Actions webhook automation for Model CI. Check out this GitHub repository to learn how to create model CI with a Modal Labs webhook.
Configure an 'Incoming Webhook' in your Teams channel to get the webhook URL. The following is an example payload:
{
  "@type": "MessageCard",
  "@context": "http://schema.org/extensions",
  "summary": "New Notification",
  "sections": [
    {
      "activityTitle": "Notification from WANDB",
      "text": "This is an example message sent via Teams webhook.",
      "facts": [
        {
          "name": "Author",
          "value": "${event_author}"
        },
        {
          "name": "Event Type",
          "value": "${event_type}"
        }
      ],
      "markdown": true
    }
  ]
}
You can use template strings to inject W&B data into your payload at the time of execution (as shown in the Teams example above).
Set up your Slack app and add an incoming webhook integration with the instructions highlighted in the Slack API documentation. Ensure that you have the secret specified under Bot User OAuth Token as your W&B webhook's access token.
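As a sketch, a minimal Slack incoming-webhook payload that forwards W&B template strings might look like the following; the message text is an arbitrary example:
{
  "text": "New event in W&B",
  "blocks": [
    {
      "type": "section",
      "text": {
        "type": "mrkdwn",
        "text": "*Event*: ${event_type} triggered by ${event_author} on ${artifact_version_string}"
      }
    }
  ]
}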
Interactively troubleshoot your webhook with the W&B App UI or programmatically with a Bash script. You can troubleshoot a webhook when you create a new webhook or edit an existing webhook.
Interactively test a webhook with the W&B App UI.
Navigate to your W&B Team Settings page.
Scroll to the Webhooks section.
Click on the three horizontal dots (meatball icon) next to the name of your webhook.
Select Test.
From the UI panel that appears, paste your POST request to the field that appears.
Click on Test webhook.
Within the W&B App UI, W&B posts the response made by your endpoint.
The following bash script generates a POST request similar to the POST request W&B sends to your webhook automation when it is triggered.
Copy and paste the code below into a shell script to troubleshoot your webhook. Specify your own values for the following:
ACCESS_TOKEN
SECRET
PAYLOAD
API_ENDPOINT
#!/bin/bash

# Your access token and secret
ACCESS_TOKEN="your_api_key"
SECRET="your_api_secret"

# The data you want to send (for example, in JSON format)
PAYLOAD='{"key1": "value1", "key2": "value2"}'

# Generate the HMAC signature
# For security, W&B includes the X-Wandb-Signature in the header, computed
# from the payload and the shared secret key associated with the webhook
# using the HMAC with SHA-256 algorithm.
SIGNATURE=$(echo -n "$PAYLOAD" | openssl dgst -sha256 -hmac "$SECRET" -binary | base64)

# Make the cURL request
curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "X-Wandb-Signature: $SIGNATURE" \
  -d "$PAYLOAD" "$API_ENDPOINT"
View automation
View automations associated with a registered model from the W&B App UI.
Scroll to the bottom of the page to the Automations section.
Hover your mouse next to the name of the automation and click on the kebab (three vertical dots) menu.
Select Delete.
4.2 - Trigger CI/CD events when artifact changes
Use a project-scoped artifact automation in your project to trigger actions when aliases or versions in an artifact collection are created or changed.
Create an automation that triggers when an artifact is changed. Use artifact automations when you want to automate downstream actions for versioning artifacts. To create an automation, define the action you want to occur based on an event type.
Artifact automations are scoped to a project. This means that only events within a project will trigger an artifact automation.
This is in contrast to automations created in the W&B Model Registry, which are scoped to the Model Registry: they are triggered when events are performed on model versions linked to the Model Registry. For information on how to create an automation for model versions, see the Automations for Model CI/CD page in the Model Registry chapter.
Event types
An event is a change that takes place in the W&B ecosystem. You can define two different event types for artifact collections in your project: A new version of an artifact is created in a collection and An artifact alias is added.
Use the A new version of an artifact is created in a collection event type for applying recurring actions to each version of an artifact. For example, you can create an automation that automatically starts a training job when a new dataset artifact version is created.
Use the An artifact alias is added event type to create an automation that activates when a specific alias is applied to an artifact version. For example, you could create an automation that triggers when someone adds the "test-set-quality-check" alias to an artifact version, which then kicks off downstream processing on that dataset.
Create a webhook automation
Automate a webhook based on an action with the W&B App UI. To do this, first establish a webhook, then configure the webhook automation.
Specify an endpoint for your webhook that has an Address record (A record). W&B does not support connecting to endpoints that are exposed directly with IP addresses such as [0-255].[0-255].[0-255].[0-255], or endpoints exposed as localhost. This restriction helps protect against server-side request forgery (SSRF) attacks and other related threat vectors.
Add a secret for authentication or authorization
Secrets are team-level variables that let you obfuscate private strings such as credentials, API keys, passwords, tokens, and more. W&B recommends you use secrets to store any string that you want to protect the plain text content of.
To use a secret in your webhook, you must first add that secret to your team’s secret manager.
Only W&B Admins can create, edit, or delete a secret.
Skip this section if the external server you send HTTP POST requests to does not use secrets.
Secrets are also available if you use W&B Server in an Azure, GCP, or AWS deployment. Connect with your W&B account team to discuss how you can use secrets in W&B if you use a different deployment type.
There are two types of secrets W&B suggests that you create when you use a webhook automation:
Access tokens: Authorize senders to help secure webhook requests
Secret: Ensure the authenticity and integrity of data transmitted from payloads
Follow the instructions below to create a webhook:
Navigate to the W&B App UI.
Click on Team Settings.
Scroll down the page until you find the Team secrets section.
Click on the New secret button.
A modal will appear. Provide a name for your secret in the Secret name field.
Add your secret into the Secret field.
(Optional) Repeat steps 5 and 6 to create another secret (such as an access token) if your webhook requires additional secret keys or tokens to authenticate your webhook.
Specify the secrets you want to use for your webhook automation when you configure the webhook. See the Configure a webhook section for more information.
Once you create a secret, you can access that secret in your W&B workflows with $.
Configure a webhook
Before you can use a webhook, first configure that webhook in the W&B App UI.
Only W&B Admins can configure a webhook for a W&B Team.
Ensure you already created one or more secrets if your webhook requires additional secret keys or tokens to authenticate your webhook.
Navigate to the W&B App UI.
Click on Team Settings.
Scroll down the page until you find the Webhooks section.
Click on the New webhook button.
Provide a name for your webhook in the Name field.
Provide the endpoint URL for the webhook in the URL field.
(Optional) From the Secret dropdown menu, select the secret you want to use to authenticate the webhook payload.
(Optional) From the Access token dropdown menu, select the access token you want to use to authorize the sender.
(Optional) From the Access token dropdown menu select additional secret keys or tokens required to authenticate a webhook (such as an access token).
See the Troubleshoot your webhook section to view where the secret and access token are specified in the POST request.
Add a webhook
Once you have a webhook configured and (optionally) a secret, navigate to your project workspace. Click on the Automations tab on the left sidebar.
From the Event type dropdown, select an event type.
If you selected A new version of an artifact is created in a collection event, provide the name of the artifact collection that the automation should respond to from the Artifact collection dropdown.
Select Webhooks from the Action type dropdown.
Click on the Next step button.
Select a webhook from the Webhook dropdown.
(Optional) Provide a payload in the JSON expression editor. See the Example payload section for common use case examples.
Click on Next step.
Provide a name for your webhook automation in the Automation name field.
(Optional) Provide a description for your webhook.
Click on the Create automation button.
Example payloads
The following tabs demonstrate example payloads based on common use cases. The examples reference the following keys to refer to condition objects in the payload parameters:
${event_type} Refers to the type of event that triggered the action.
${event_author} Refers to the user that triggered the action.
${artifact_version} Refers to the specific artifact version that triggered the action. Passed as an artifact instance.
${artifact_version_string} Refers to the specific artifact version that triggered the action. Passed as a string.
${artifact_collection_name} Refers to the name of the artifact collection that the artifact version is linked to.
${project_name} Refers to the name of the project owning the mutation that triggered the action.
${entity_name} Refers to the name of the entity owning the mutation that triggered the action.
Verify that your access tokens have the required set of permissions to trigger your GHA workflow. For more information, see these GitHub Docs.
Send a repository dispatch from W&B to trigger a GitHub Action. For example, suppose you have a workflow that accepts a repository dispatch as a trigger for the on key:
on:
repository_dispatch:
types: BUILD_AND_DEPLOY
The payload for the repository might look something like:
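As a sketch, the payload might look like the following; the event_type value must match your workflow's types field, and the client_payload keys shown here are simply the template strings you choose to forward:
{
  "event_type": "BUILD_AND_DEPLOY",
  "client_payload": {
    "event_author": "${event_author}",
    "artifact_version": "${artifact_version}",
    "artifact_version_string": "${artifact_version_string}",
    "artifact_collection_name": "${artifact_collection_name}",
    "project_name": "${project_name}",
    "entity_name": "${entity_name}"
  }
}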
The event_type key in the webhook payload must match the types field in the GitHub workflow YAML file.
The contents and positioning of rendered template strings depend on the event or model version the automation is configured for. ${event_type} renders as either LINK_ARTIFACT or ADD_ARTIFACT_ALIAS. See below for an example mapping:
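An illustrative mapping (placeholder values; the exact strings depend on your entity, project, and artifact):
${event_type}               --> "LINK_ARTIFACT" or "ADD_ARTIFACT_ALIAS"
${event_author}             --> "<wandb-username>"
${artifact_version_string}  --> "<entity>/<project>/<artifact-name>:<alias>"
${artifact_collection_name} --> "<artifact-collection-name>"
${project_name}             --> "<project>"
${entity_name}              --> "<entity>"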
Use template strings to dynamically pass context from W&B to GitHub Actions and other tools. If those tools can call Python scripts, they can consume W&B artifacts through the W&B API.
Configure an 'Incoming Webhook' in your Teams channel to get the webhook URL. The following is an example payload:
{
  "@type": "MessageCard",
  "@context": "http://schema.org/extensions",
  "summary": "New Notification",
  "sections": [
    {
      "activityTitle": "Notification from WANDB",
      "text": "This is an example message sent via Teams webhook.",
      "facts": [
        {
          "name": "Author",
          "value": "${event_author}"
        },
        {
          "name": "Event Type",
          "value": "${event_type}"
        }
      ],
      "markdown": true
    }
  ]
}
You can use template strings to inject W&B data into your payload at the time of execution (as shown in the Teams example above).
Set up your Slack app and add an incoming webhook integration with the instructions highlighted in the Slack API documentation. Ensure that you have the secret specified under Bot User OAuth Token as your W&B webhook's access token.
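As a sketch, a minimal Slack incoming-webhook payload that forwards W&B template strings might look like the following; the message text is an arbitrary example:
{
  "text": "New event in W&B",
  "blocks": [
    {
      "type": "section",
      "text": {
        "type": "mrkdwn",
        "text": "*Event*: ${event_type} triggered by ${event_author} on ${artifact_version_string}"
      }
    }
  ]
}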
Interactively troubleshoot your webhook with the W&B App UI or programmatically with a Bash script. You can troubleshoot a webhook when you create a new webhook or edit an existing webhook.
Interactively test a webhook with the W&B App UI.
Navigate to your W&B Team Settings page.
Scroll to the Webhooks section.
Click on the three horizontal dots (meatball icon) next to the name of your webhook.
Select Test.
From the UI panel that appears, paste your POST request to the field that appears.
Click on Test webhook.
Within the W&B App UI, W&B posts the response made by your endpoint.