Manage data
- 1: Delete an artifact
- 2: Manage artifact data retention
- 3: Manage artifact storage and memory allocation
1 - Delete an artifact
Delete artifacts interactively with the App UI or programmatically with the W&B SDK. When you delete an artifact, W&B marks that artifact as a soft-delete. In other words, the artifact is marked for deletion but files are not immediately deleted from storage.
The contents of the artifact remain in a soft-delete, or pending deletion, state until a regularly run garbage collection process reviews all artifacts marked for deletion. The garbage collection process deletes the associated files from storage only if the artifact and its associated files are not used by a previous or subsequent artifact version.
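A toy model of this rule in plain Python may help. It is a sketch under simplifying assumptions, not W&B's garbage collector: ArtifactVersion and garbage_collect are hypothetical names, files are identified only by path strings, and "used" means referenced by any version that has not been soft-deleted.

```python
from dataclasses import dataclass


@dataclass
class ArtifactVersion:
    """Hypothetical stand-in for an artifact version (not a W&B class)."""
    name: str
    files: set
    deleted: bool = False  # True once the version is soft-deleted


def garbage_collect(versions, storage):
    """Remove files that only soft-deleted versions reference.

    Mutates `storage` (a set of file paths) and returns the removed paths.
    """
    # Files still referenced by at least one live version must be kept.
    in_use = set()
    for v in versions:
        if not v.deleted:
            in_use |= v.files

    # Files referenced by soft-deleted versions are deletion candidates.
    candidates = set()
    for v in versions:
        if v.deleted:
            candidates |= v.files

    removed = candidates - in_use
    storage -= removed  # in-place set difference
    return removed


v0 = ArtifactVersion("dataset:v0", {"a.csv", "b.csv"}, deleted=True)
v1 = ArtifactVersion("dataset:v1", {"b.csv", "c.csv"})
storage = {"a.csv", "b.csv", "c.csv"}
print(sorted(garbage_collect([v0, v1], storage)))  # ['a.csv']
```

In this model b.csv survives because dataset:v1 still uses it, which is why soft-deleting one version does not necessarily free its storage.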
The sections in this page describe how to delete specific artifact versions, how to delete an artifact collection, how to delete artifacts with and without aliases, and more. You can schedule when artifacts are deleted from W&B with TTL policies. For more information, see Manage data retention with Artifact TTL policy.
Delete an artifact version
To delete an artifact version:
- Select the name of the artifact. This will expand the artifact view and list all the artifact versions associated with that artifact.
- From the list of artifacts, select the artifact version you want to delete.
- On the right-hand side of the workspace, select the kebab dropdown.
- Choose Delete.
An artifact version can also be deleted programmatically with the delete() method. See the examples below.
Delete multiple artifact versions with aliases
The following code example demonstrates how to delete artifacts that have aliases associated with them. Provide the entity, project name, and run ID that created the artifacts.
import wandb

api = wandb.Api()
run = api.run("entity/project/run_id")
for artifact in run.logged_artifacts():
    artifact.delete()
Set the delete_aliases parameter to the boolean value True to delete aliases if the artifact has one or more aliases.
import wandb

api = wandb.Api()
run = api.run("entity/project/run_id")
for artifact in run.logged_artifacts():
    # Set delete_aliases=True in order to delete
    # artifacts with one or more aliases
    artifact.delete(delete_aliases=True)
Delete multiple artifact versions with a specific alias
The following code demonstrates how to delete multiple artifact versions that have a specific alias. Provide the entity, project name, and run ID that created the artifacts. Replace the deletion logic with your own:
import wandb

api = wandb.Api()
run = api.run("entity/project_name/run_id")
# Delete artifact versions with alias 'v3' and 'v4'
for artifact_version in run.logged_artifacts():
    # Replace with your own deletion logic.
    if artifact_version.name[-2:] == "v3" or artifact_version.name[-2:] == "v4":
        artifact_version.delete(delete_aliases=True)
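The slice name[-2:] in the loop above only works for single-digit versions such as v3; it cannot match v10. A sketch of a sturdier check, assuming artifact names take the form "<name>:<version>", compares everything after the last colon. has_target_version is a hypothetical helper, not part of the W&B SDK.

```python
def has_target_version(artifact_name, targets):
    """Return True if the version after the last ':' is in `targets`.

    Assumes names look like "my-dataset:v3".
    """
    _, sep, version = artifact_name.rpartition(":")
    # If there is no ':' at all, sep is empty and nothing matches.
    return bool(sep) and version in targets


print(has_target_version("my-dataset:v3", {"v3", "v4"}))   # True
print(has_target_version("my-dataset:v13", {"v3", "v4"}))  # False
print(has_target_version("my-dataset:v10", {"v10"}))       # True
```

Inside the loop, the condition would become if has_target_version(artifact_version.name, {"v3", "v4"}):.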
Delete all versions of an artifact that do not have an alias
The following code snippet demonstrates how to delete all versions of an artifact that do not have an alias. Provide the names of the project and entity for the project and entity keys in wandb.Api, respectively. Replace the values enclosed in <> with the type and name of your artifact:
import wandb

# Provide your entity and a project name when you
# use wandb.Api methods.
api = wandb.Api(overrides={"project": "project", "entity": "entity"})

artifact_type, artifact_name = "<artifact_type>", "<artifact_name>"  # provide type and name
for v in api.artifact_versions(artifact_type, artifact_name):
    # Clean up versions that don't have an alias such as 'latest'.
    # NOTE: You can put whatever deletion logic you want here.
    if len(v.aliases) == 0:
        v.delete()
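The loop above deletes every match as soon as it finds it. If you would rather review the candidates first, one option is a dry-run wrapper around the same len(v.aliases) == 0 rule. The sketch below runs without W&B: FakeVersion is a hypothetical stand-in that only mimics the name, aliases, and delete() members the snippet above uses, and delete_unaliased is likewise hypothetical. In practice you would pass it the result of api.artifact_versions(artifact_type, artifact_name).

```python
from dataclasses import dataclass, field


@dataclass
class FakeVersion:
    """Hypothetical stand-in mimicking `name`, `aliases`, and `delete()`."""
    name: str
    aliases: list = field(default_factory=list)
    deleted: bool = False

    def delete(self):
        self.deleted = True


def delete_unaliased(versions, dry_run=True):
    """Delete versions with no aliases; with dry_run=True only report them."""
    candidates = [v for v in versions if len(v.aliases) == 0]
    for v in candidates:
        if dry_run:
            print(f"Would delete {v.name}")
        else:
            v.delete()
    return [v.name for v in candidates]


versions = [FakeVersion("my-dataset:v0"), FakeVersion("my-dataset:v1", ["latest"])]
delete_unaliased(versions)  # prints: Would delete my-dataset:v0
```

Once the printed list looks right, call it again with dry_run=False.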
Delete an artifact collection
To delete an artifact collection:
- Navigate to the artifact collection you want to delete and hover over it.
- Select the kebab dropdown next to the artifact collection name.
- Choose Delete.
You can also delete an artifact collection programmatically with the delete() method. Provide the names of the project and entity for the project and entity keys in wandb.Api, respectively:
import wandb
# Provide your entity and a project name when you
# use wandb.Api methods.
api = wandb.Api(overrides={"project": "project", "entity": "entity"})
collection = api.artifact_collection(
"<artifact_type>", "entity/project/artifact_collection_name"
)
collection.delete()
How to enable garbage collection based on how W&B is hosted
Garbage collection is enabled by default if you use W&B’s shared cloud. Depending on how you host W&B, you might need to take additional steps to enable garbage collection, including:
- Set the GORILLA_ARTIFACT_GC_ENABLED environment variable to true: GORILLA_ARTIFACT_GC_ENABLED=true
- Enable bucket versioning if you use AWS, GCP or any other storage provider such as Minio. If you use Azure, enable soft deletion.
Soft deletion in Azure is equivalent to bucket versioning in other storage providers.
The following table describes how to satisfy requirements to enable garbage collection based on your deployment type.
The X indicates you must satisfy the requirement:
Deployment type | Environment variable | Enable versioning |
---|---|---|
Shared cloud | | |
Shared cloud with secure storage connector | | X |
Dedicated cloud | | |
Dedicated cloud with secure storage connector | | X |
Customer-managed cloud | X | X |
Customer-managed on-prem | X | X |
2 - Manage artifact data retention
Schedule when artifacts are deleted from W&B with W&B Artifact time-to-live (TTL) policies. When you delete an artifact, W&B marks that artifact as a soft-delete. In other words, the artifact is marked for deletion but files are not immediately deleted from storage. For more information on how W&B deletes artifacts, see the Delete artifacts page.
Check out this video tutorial to learn how to manage data retention with Artifacts TTL in the W&B App.
- Only team admins can view a team’s settings and access team-level TTL settings such as (1) permitting who can set or edit a TTL policy or (2) setting a team default TTL.
- If you do not see the option to set or edit a TTL policy in an artifact’s details in the W&B App UI or if setting a TTL programmatically does not successfully change an artifact’s TTL property, your team admin has not given you permissions to do so.
Auto-generated Artifacts
Only user-generated artifacts can use TTL policies. Artifacts auto-generated by W&B cannot have TTL policies set for them.
The following Artifact types indicate an auto-generated Artifact:
- run_table
- code
- job
- Any Artifact type starting with: wandb-*
You can check an Artifact’s type on the W&B platform or programmatically:
import wandb
run = wandb.init(project="<my-project-name>")
artifact = run.use_artifact(artifact_or_name="<my-artifact-name>")
print(artifact.type)
Replace the values enclosed with <> with your own.
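If you set TTL policies in a script, you may want to skip auto-generated artifacts up front. The helper below encodes the type list above; it is a hypothetical convenience, not a W&B function, and it assumes the wandb-* pattern means any type whose name begins with the literal prefix wandb-. You could call it with artifact.type from the preceding snippet.

```python
# Auto-generated artifact types listed above; these cannot have TTL policies.
AUTO_GENERATED_TYPES = {"run_table", "code", "job"}


def is_auto_generated(artifact_type: str) -> bool:
    """Return True for artifact types that cannot have a TTL policy."""
    return artifact_type in AUTO_GENERATED_TYPES or artifact_type.startswith("wandb-")


print(is_auto_generated("run_table"))  # True
print(is_auto_generated("dataset"))    # False
```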
Define who can edit and set TTL policies
Define who can set and edit TTL policies within a team. You can either grant TTL permissions only to team admins, or you can grant both team admins and team members TTL permissions.
- Navigate to your team’s profile page.
- Select the Settings tab.
- Navigate to the Artifacts time-to-live (TTL) section.
- From the TTL permissions dropdown, select who can set and edit TTL policies.
- Click on Review and save settings.
- Confirm the changes and select Save settings.
Create a TTL policy
Set a TTL policy for an artifact either when you create the artifact or retroactively after the artifact is created.
For all the code snippets below, replace the content wrapped in <> with your information to use the code snippet.
Set a TTL policy when you create an artifact
Use the W&B Python SDK to define a TTL policy when you create an artifact. TTL policies are typically defined in days and are set with the artifact’s ttl attribute. The steps are as follows:
- Create an artifact.
- Add content to the artifact such as files, a directory, or a reference.
- Define a TTL time limit with the datetime.timedelta data type that is part of Python’s standard library.
- Log the artifact.
The following code snippet demonstrates how to create an artifact and set a TTL policy.
import wandb
from datetime import timedelta
run = wandb.init(project="<my-project-name>", entity="<my-entity>")
artifact = wandb.Artifact(name="<artifact-name>", type="<type>")
artifact.add_file("<my_file>")
artifact.ttl = timedelta(days=30) # Set TTL policy
run.log_artifact(artifact)
The preceding code snippet sets the TTL policy for the artifact to 30 days. In other words, W&B deletes the artifact after 30 days.
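Concretely, the deadline is the artifact's creation time plus the TTL. The sketch below does that arithmetic with the standard library only; expires_at is a hypothetical helper rather than a W&B API, and it returns None for ttl=None, mirroring how setting ttl to None deactivates a policy later on this page.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def expires_at(created_at: datetime, ttl: Optional[timedelta]) -> Optional[datetime]:
    """Return when an artifact's TTL runs out, or None if it has no TTL."""
    if ttl is None:
        return None
    return created_at + ttl


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(expires_at(created, timedelta(days=30)))  # 2024-01-31 00:00:00+00:00
```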
Set or edit a TTL policy after you create an artifact
Use the W&B App UI or the W&B Python SDK to define a TTL policy for an artifact that already exists.
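When you modify an artifact’s TTL, the time until the artifact expires is still calculated from the artifact’s createdAt timestamp.
To set or edit a TTL policy with the Python SDK:
- Fetch your artifact.
- Pass in a time delta to the artifact’s ttl attribute.
- Update the artifact with the save method.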
The following code snippet shows how to set a TTL policy for an artifact:
import wandb
from datetime import timedelta

run = wandb.init(project="<my-project-name>", entity="<my-entity>")
artifact = run.use_artifact("<my-entity/my-project/my-artifact:alias>")
artifact.ttl = timedelta(days=365 * 2)  # Delete in two years
artifact.save()
The preceding code example sets the TTL policy to two years.
To set or edit a TTL policy with the W&B App UI:
- Navigate to your W&B project in the W&B App UI.
- Select the artifact icon on the left panel.
- From the list of artifacts, expand the artifact type.
- Select the artifact version you want to edit the TTL policy for.
- Click on the Version tab.
- From the dropdown, select Edit TTL policy.
- Within the modal that appears, select Custom from the TTL policy dropdown.
- Within the TTL duration field, set the TTL policy in units of days.
- Select the Update TTL button to save your changes.
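Because expiration is counted from the artifact's createdAt timestamp (per the createdAt note at the start of this section) rather than from the moment you edit the TTL, a new TTL can be shorter than the artifact's current age. The sketch below, which uses only the standard library and a hypothetical remaining_lifetime helper, shows both cases for an artifact created 100 days before "now".

```python
from datetime import datetime, timedelta, timezone


def remaining_lifetime(created_at: datetime, new_ttl: timedelta, now: datetime) -> timedelta:
    """Time left before expiry after a TTL is changed.

    Expiry is anchored to created_at, not to when the TTL was edited.
    A negative result means the artifact is already past the new TTL.
    """
    return (created_at + new_ttl) - now


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
now = datetime(2024, 4, 10, tzinfo=timezone.utc)  # 100 days after creation
print(remaining_lifetime(created, timedelta(days=365 * 2), now).days)  # 630
print(remaining_lifetime(created, timedelta(days=30), now).days)       # -70
```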
Set default TTL policies for a team
Set a default TTL policy for your team. Default TTL policies apply to all existing and future artifacts based on their respective creation dates. Artifacts with existing version-level TTL policies are not affected by the team’s default TTL.
- Navigate to your team’s profile page.
- Select the Settings tab.
- Navigate to the Artifacts time-to-live (TTL) section.
- Click on the Set team’s default TTL policy.
- Within the Duration field, set the TTL policy in units of days.
- Click on Review and save settings.
- Confirm the changes and then select Save settings.
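The precedence rule above (a version-level TTL wins; otherwise the team default applies) fits in a few lines. INHERIT and effective_ttl are hypothetical names for this sketch, not W&B SDK objects, and the sketch deliberately does not model deactivated (None) policies, whose interaction with the team default this page does not spell out.

```python
from datetime import timedelta

INHERIT = object()  # hypothetical marker: no version-level TTL policy was set


def effective_ttl(version_ttl, team_default):
    """Return the TTL that applies to one artifact version.

    version_ttl is either a timedelta (a version-level policy) or INHERIT.
    """
    if version_ttl is INHERIT:
        return team_default
    return version_ttl


team_default = timedelta(days=90)
print(effective_ttl(INHERIT, team_default).days)            # 90
print(effective_ttl(timedelta(days=7), team_default).days)  # 7
```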
Set a TTL policy outside of a run
Use the public API to retrieve an artifact without fetching a run, and set the TTL policy. TTL policies are typically defined in days.
The following code sample shows how to fetch an artifact using the public API and set the TTL policy.
import wandb
from datetime import timedelta

api = wandb.Api()
artifact = api.artifact("entity/project/artifact:alias")
artifact.ttl = timedelta(days=365)  # Delete in one year
artifact.save()
Deactivate a TTL policy
Use the W&B Python SDK or W&B App UI to deactivate a TTL policy for a specific artifact version.
- Fetch your artifact.
- Set the artifact’s ttl attribute to None.
- Update the artifact with the save method.
The following code snippet shows how to turn off a TTL policy for an artifact:
import wandb

run = wandb.init(project="<my-project-name>", entity="<my-entity>")
artifact = run.use_artifact("<my-entity/my-project/my-artifact:alias>")
artifact.ttl = None
artifact.save()
To deactivate a TTL policy with the W&B App UI:
- Navigate to your W&B project in the W&B App UI.
- Select the artifact icon on the left panel.
- From the list of artifacts, expand the artifact type.
- Select the artifact version you want to edit the TTL policy for.
- Click on the Version tab.
- Click on the meatball UI icon next to the Link to registry button.
- From the dropdown, select Edit TTL policy.
- Within the modal that appears, select Deactivate from the TTL policy dropdown.
- Select the Update TTL button to save your changes.
View TTL policies
View TTL policies for artifacts with the Python SDK or with the W&B App UI.
Use a print statement to view an artifact’s TTL policy. The following example shows how to retrieve an artifact and view its TTL policy:
import wandb
run = wandb.init(project="<my-project-name>", entity="<my-entity>")
artifact = run.use_artifact("<my-entity/my-project/my-artifact:alias>")
print(artifact.ttl)
View a TTL policy for an artifact with the W&B App UI.
- Navigate to the W&B App at https://wandb.ai.
- Go to your W&B Project.
- Within your project, select the Artifacts tab in the left sidebar.
- Click on a collection.
Within the collection view, you can see all of the artifacts in the selected collection. The Time to Live column shows the TTL policy assigned to each artifact.
3 - Manage artifact storage and memory allocation
W&B stores artifact files in a private Google Cloud Storage bucket located in the United States by default. All files are encrypted at rest and in transit.
For sensitive files, we recommend you set up Private Hosting or use reference artifacts.
During training, W&B locally saves logs, artifacts, and configuration files in the following local directories:
File | Default location | To change default location, set: |
---|---|---|
logs | ./wandb | dir in wandb.init or the WANDB_DIR environment variable |
artifacts | ~/.cache/wandb | the WANDB_CACHE_DIR environment variable |
configs | ~/.config/wandb | the WANDB_CONFIG_DIR environment variable |
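Because a default location may not be writable on every machine, you could check each directory before training starts (or, if it does not exist yet, its nearest existing parent) and set the corresponding environment variable from the table if the check fails. unwritable_dirs is a hypothetical helper built on os.access; it is a preflight sketch, not something W&B runs, and os.access is only a best-effort check.

```python
import os


def unwritable_dirs(paths):
    """Return the subset of `paths` (label -> path) that is not writable.

    A directory that does not exist yet is judged by its nearest existing
    parent, since that is where it would be created.
    """
    bad = {}
    for label, path in paths.items():
        probe = os.path.abspath(os.path.expanduser(path))
        while not os.path.exists(probe):
            parent = os.path.dirname(probe)
            if parent == probe:  # reached the filesystem root
                break
            probe = parent
        if not os.access(probe, os.W_OK):
            bad[label] = path
    return bad


# Default locations from the table above.
defaults = {
    "logs": "./wandb",
    "artifacts": "~/.cache/wandb",
    "configs": "~/.config/wandb",
}
for label, path in unwritable_dirs(defaults).items():
    print(f"{label}: {path} is not writable; override it as listed in the table")
```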
Depending on the machine that wandb is initialized on, these default folders might not be located in a writeable part of the file system, which can trigger an error.
Clean up local artifact cache
W&B caches artifact files to speed up downloads across versions that share files in common. Over time, this cache directory can become large. Run the wandb artifact cache cleanup command to prune the cache and remove any files that have not been used recently.
The following command demonstrates how to limit the size of the cache to 1GB. Copy and paste it into your terminal:
$ wandb artifact cache cleanup 1GB
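As a mental model of what the command does, the sketch below prunes a cache least-recently-used first until it fits a byte budget. It is an illustration under assumptions, not the CLI's implementation: the page only says that cleanup removes files that have not been used recently, so the exact eviction order is assumed, and CachedFile and prune_cache are hypothetical names. Sizes are plain byte counts, which sidesteps whether 1GB means 10^9 or 2^30 bytes.

```python
from dataclasses import dataclass


@dataclass
class CachedFile:
    """Hypothetical cache entry."""
    path: str
    size: int          # bytes
    last_used: float   # e.g. a POSIX timestamp


def prune_cache(files, max_bytes):
    """Drop least-recently-used files until the total size is at most max_bytes.

    Returns (kept, removed) lists.
    """
    total = sum(f.size for f in files)
    removed = []
    # Oldest first, so the least recently used files are removed first.
    for f in sorted(files, key=lambda f: f.last_used):
        if total <= max_bytes:
            break
        removed.append(f)
        total -= f.size
    removed_paths = {f.path for f in removed}
    kept = [f for f in files if f.path not in removed_paths]
    return kept, removed


files = [
    CachedFile("a.bin", 600, last_used=1.0),
    CachedFile("b.bin", 300, last_used=2.0),
    CachedFile("c.bin", 400, last_used=3.0),
]
kept, removed = prune_cache(files, max_bytes=800)
print([f.path for f in removed])  # ['a.bin']
print([f.path for f in kept])     # ['b.bin', 'c.bin']
```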