Bring your own bucket (BYOB)
Overview
Bring your own bucket (BYOB) allows you to store W&B artifacts and other related sensitive data in your own cloud or on-prem infrastructure. With Dedicated Cloud or Multi-tenant Cloud, data that you store in your bucket is not copied to the W&B managed infrastructure.
- Communication between W&B SDK / CLI / UI and your buckets occurs using pre-signed URLs.
- W&B uses a garbage collection process to delete W&B Artifacts. For more information, see Deleting Artifacts.
- You can specify a sub-path when configuring a bucket to ensure that W&B does not store any files in a folder at the root of the bucket. This can help you conform to your organization’s bucket governance policy.
Data stored in the central database vs buckets
When you use BYOB, certain types of data are stored in the W&B central database, while other types are stored in your bucket.
Database
- Metadata for users, teams, artifacts, experiments, and projects
- Reports
- Experiment logs
- System metrics
- Console logs
Buckets
- Experiment files and metrics
- Artifact files
- Media files
- Run files
- Exported history metrics and system events in Parquet format
Bucket scopes
You can configure your storage bucket at one of two scopes:
Scope | Description |
---|---|
Instance level | In Dedicated Cloud and Self-Managed, any user with the required permissions within your organization or instance can access files stored in your instance’s storage bucket. Not applicable to Multi-tenant Cloud. |
Team level | If a W&B Team is configured to use a Team level storage bucket, team members can access files stored in it. Team level storage buckets allow greater data access control and data isolation for teams with highly sensitive data or strict compliance requirements. Team level storage can help different business units or departments sharing an instance to efficiently use the infrastructure and administrative resources. It can also allow separate project teams to manage AI workflows for separate customer engagements. Available for all deployment types. You configure team level BYOB when setting up the team. |
This flexible design allows for many different storage topologies, depending on your organization’s needs. For example:
- The same bucket can be used for the instance and one or more teams.
- Each team can use a separate bucket, some teams can choose to write to the instance bucket, or multiple teams can share a bucket by writing to subpaths.
- Buckets for different teams can be hosted in different cloud infrastructure environments or regions, and can be managed by different storage admin teams.
For example, suppose you have a team called Kappa in your organization. Your organization (and Team Kappa) use the Instance level storage bucket by default. Next, you create a team called Omega. When you create Team Omega, you configure a Team level storage bucket for that team. Files generated by Team Omega are not accessible by Team Kappa. However, files created by Team Kappa are accessible by Team Omega. If you want to isolate data for Team Kappa, you must configure a Team level storage bucket for them as well.
Availability matrix
W&B can connect to the following storage providers:
- CoreWeave AI Object Storage is a high-performance, S3-compatible object storage service optimized for AI workloads.
- Amazon S3 is an object storage service offering industry-leading scalability, data availability, security, and performance.
- Google Cloud Storage is a managed service for storing unstructured data at scale.
- Azure Blob Storage is a cloud-based object storage solution for storing massive amounts of unstructured data like text, binary data, images, videos, and logs.
- S3-compatible storage like MinIO hosted in your cloud or infrastructure on your premises.
The following table shows the availability of BYOB at each scope for each W&B deployment type.
W&B deployment type | Instance level | Team level | Additional information |
---|---|---|---|
Dedicated Cloud | ✓1 | ✓ | Instance and team level BYOB are supported for CoreWeave AI Object Storage, AWS S3, GCP Storage, Microsoft Azure Blob Storage, and S3-compatible storage like MinIO hosted in your cloud or on-prem infrastructure. |
Multi-tenant Cloud | Not Applicable | ✓1 | Team level BYOB is supported for AWS S3 and GCP Storage. W&B fully manages the default and only storage bucket for Microsoft Azure. Team level BYOB for CoreWeave AI Object Storage is coming soon. |
Self-Managed | ✓1 | ✓ | Instance and team level BYOB are supported for CoreWeave AI Object Storage, AWS S3, GCP Storage, Microsoft Azure Blob Storage, and S3-compatible storage like MinIO hosted in your cloud or infrastructure on your premises. |
1. CoreWeave AI Object Storage is currently supported only for instance or team level BYOB on Dedicated Cloud and Self-Managed. Team level BYOB on Multi-tenant Cloud is coming soon.
Set up BYOB
1. Provision your bucket
After verifying availability, you are ready to provision your storage bucket, including its access policy and CORS. Select a tab to continue.
- Dedicated Cloud or Self-Managed v0.70.0 or newer. Not yet available for Multi-tenant Cloud.
- A CoreWeave account with AI Object Storage enabled and with permission to create buckets, API access keys, and secret keys.
- Your W&B instance must be able to connect to CoreWeave network endpoints.
For details, see Create a CoreWeave AI Object Storage bucket in the CoreWeave documentation.
- Create the bucket with a name of your choice in your preferred CoreWeave availability zone. Optionally create a folder for W&B to use as a sub-path for all W&B files. Make a note of the bucket name, availability zone, API access key, secret key, and sub-path.
- Set the following Cross-origin resource sharing (CORS) policy for the bucket:

  CoreWeave storage is S3-compatible. For details about CORS, refer to Configuring cross-origin resource sharing (CORS) in the AWS documentation.

  ```json
  [
    {
      "AllowedHeaders": ["*"],
      "AllowedMethods": ["GET", "HEAD", "PUT"],
      "AllowedOrigins": ["*"],
      "ExposeHeaders": ["ETag"],
      "MaxAgeSeconds": 3000
    }
  ]
  ```
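Because CoreWeave AI Object Storage is S3-compatible, one way to apply this policy is with the AWS CLI pointed at the CoreWeave endpoint. This is a sketch, not the official procedure: the bucket name is a placeholder, and the (commented) `aws` command assumes the CLI is configured with your CoreWeave access key and secret key. Note that the CLI wraps the rules in a top-level `CORSRules` key.

```shell
# Save the CORS policy in the shape the AWS CLI expects.
cat > cors-policy.json <<'EOF'
{
  "CORSRules": [
    {
      "AllowedHeaders": ["*"],
      "AllowedMethods": ["GET", "HEAD", "PUT"],
      "AllowedOrigins": ["*"],
      "ExposeHeaders": ["ETag"],
      "MaxAgeSeconds": 3000
    }
  ]
}
EOF

# Apply it to the bucket (hypothetical bucket name; requires AWS CLI
# credentials for your CoreWeave access/secret key pair):
# aws s3api put-bucket-cors \
#   --bucket my-wandb-bucket \
#   --endpoint-url https://cwobject.com \
#   --cors-configuration file://cors-policy.json
```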
For details, see Create an S3 bucket in the AWS documentation.
- Provision the KMS Key.

  W&B requires you to provision a KMS Key to encrypt and decrypt the data on the S3 bucket. The key usage type must be `ENCRYPT_DECRYPT`. Assign the following policy to the key:

  ```json
  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Sid": "Internal",
        "Effect": "Allow",
        "Principal": { "AWS": "<Your_Account_Id>" },
        "Action": "kms:*",
        "Resource": "<aws_kms_key.key.arn>"
      },
      {
        "Sid": "External",
        "Effect": "Allow",
        "Principal": { "AWS": "<aws_principal_and_role_arn>" },
        "Action": [
          "kms:Decrypt",
          "kms:Describe*",
          "kms:Encrypt",
          "kms:ReEncrypt*",
          "kms:GenerateDataKey*"
        ],
        "Resource": "<aws_kms_key.key.arn>"
      }
    ]
  }
  ```

  Replace `<Your_Account_Id>` and `<aws_kms_key.key.arn>` accordingly.

  If you are using Multi-tenant Cloud or Dedicated Cloud, replace `<aws_principal_and_role_arn>` with the corresponding value:

  - For Multi-tenant Cloud: `arn:aws:iam::725579432336:role/WandbIntegration`
  - For Dedicated Cloud: `arn:aws:iam::830241207209:root`

  This policy grants your AWS account full access to the key and also assigns the required permissions to the AWS account hosting the W&B Platform. Keep a record of the KMS Key ARN.
- Provision the S3 Bucket.

  Follow these steps to provision the S3 bucket in your AWS account:

  1. Create the S3 bucket with a name of your choice. Optionally, create a folder which you can configure as a sub-path to store all W&B files.
  2. Enable server-side encryption using the KMS key from the previous step.
  3. Configure CORS with the following policy:

     ```json
     [
       {
         "AllowedHeaders": ["*"],
         "AllowedMethods": ["GET", "HEAD", "PUT"],
         "AllowedOrigins": ["*"],
         "ExposeHeaders": ["ETag"],
         "MaxAgeSeconds": 3000
       }
     ]
     ```

     If data in your bucket expires due to an object lifecycle management policy, you may lose the ability to read the history of some runs.
  4. Grant the required S3 permissions to the AWS account hosting the W&B Platform. It requires these permissions to generate pre-signed URLs that AI workloads in your cloud infrastructure or user browsers use to access the bucket:

     ```json
     {
       "Version": "2012-10-17",
       "Id": "WandBAccess",
       "Statement": [
         {
           "Sid": "WAndBAccountAccess",
           "Effect": "Allow",
           "Principal": { "AWS": "<aws_principal_and_role_arn>" },
           "Action": [
             "s3:GetObject*",
             "s3:GetEncryptionConfiguration",
             "s3:ListBucket",
             "s3:ListBucketMultipartUploads",
             "s3:ListBucketVersions",
             "s3:AbortMultipartUpload",
             "s3:DeleteObject",
             "s3:PutObject",
             "s3:GetBucketCORS",
             "s3:GetBucketLocation",
             "s3:GetBucketVersioning"
           ],
           "Resource": [
             "arn:aws:s3:::<wandb_bucket>",
             "arn:aws:s3:::<wandb_bucket>/*"
           ]
         }
       ]
     }
     ```

     Replace `<wandb_bucket>` accordingly and keep a record of the bucket name. Next, configure W&B.

     If you are using Multi-tenant Cloud or Dedicated Cloud, replace `<aws_principal_and_role_arn>` with the corresponding value:

     - For Multi-tenant Cloud: `arn:aws:iam::725579432336:role/WandbIntegration`
     - For Dedicated Cloud: `arn:aws:iam::830241207209:root`
For more details, see the AWS self-managed hosting guide.
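The S3 provisioning steps can also be scripted with the AWS CLI. The following is a non-authoritative sketch: the bucket name and region are placeholders, `<aws_kms_key.key.arn>` is the same placeholder used above, and each (commented) command assumes credentials with the corresponding permissions.

```shell
BUCKET=my-wandb-bucket   # placeholder bucket name
REGION=us-east-1         # placeholder region

# 1. Create the bucket:
# aws s3api create-bucket --bucket "$BUCKET" --region "$REGION"

# 2. Enable server-side encryption with the KMS key provisioned earlier:
# aws s3api put-bucket-encryption --bucket "$BUCKET" \
#   --server-side-encryption-configuration '{
#     "Rules": [{"ApplyServerSideEncryptionByDefault":
#       {"SSEAlgorithm": "aws:kms", "KMSMasterKeyID": "<aws_kms_key.key.arn>"}}]}'

# 3. Apply the CORS policy (for the CLI, wrap the rules in a "CORSRules" key):
# aws s3api put-bucket-cors --bucket "$BUCKET" \
#   --cors-configuration file://cors-policy.json

# 4. Attach the bucket policy granting W&B access:
# aws s3api put-bucket-policy --bucket "$BUCKET" --policy file://bucket-policy.json

echo "Provisioning sketch for $BUCKET in $REGION"
```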
For details, see Create a bucket in the GCP documentation.
- Provision the GCS bucket.

  Follow these steps to provision the GCS bucket in your GCP project:

  1. Create the GCS bucket with a name of your choice. Optionally, create a folder which you can configure as a sub-path to store all W&B files.
  2. Set the encryption type to `Google-managed`.
  3. Set the CORS policy with `gsutil`. This is not possible in the UI.
     1. Create a file called `cors-policy.json` locally.
     2. Copy the following CORS policy into the file and save it:

        ```json
        [
          {
            "origin": ["*"],
            "responseHeader": ["Content-Type"],
            "exposeHeaders": ["ETag"],
            "method": ["GET", "HEAD", "PUT"],
            "maxAgeSeconds": 3000
          }
        ]
        ```

        If data in your bucket expires due to an object lifecycle management policy, you may lose the ability to read the history of some runs.
     3. Replace `<bucket_name>` with the correct bucket name and run `gsutil`:

        ```shell
        gsutil cors set cors-policy.json gs://<bucket_name>
        ```
     4. Verify the bucket’s policy. Replace `<bucket_name>` with the correct bucket name:

        ```shell
        gsutil cors get gs://<bucket_name>
        ```
- If you are using Multi-tenant Cloud or Dedicated Cloud, grant the `storage.admin` role to the GCP service account linked to the W&B Platform. W&B requires this role to check the bucket’s CORS configuration and attributes, such as whether object versioning is enabled. If the service account does not have the `storage.admin` role, these checks result in an HTTP 403 error.

  - For Multi-tenant Cloud, the account is: `wandb-integration@wandb-production.iam.gserviceaccount.com`
  - For Dedicated Cloud, the account is: `deploy@wandb-production.iam.gserviceaccount.com`

  Keep a record of the bucket name. Next, configure W&B for BYOB.
For details, see Create a blob storage container in the Azure documentation.
- Provision the Azure Blob Storage container.

  For instance level BYOB, if you’re not using this Terraform module, follow the steps below to provision an Azure Blob Storage container in your Azure subscription:

  1. Create a container with a name of your choice. Optionally, create a folder which you can configure as a sub-path to store all W&B files.
  2. Configure the CORS policy on the container. To set the CORS policy through the UI, go to the blob storage, scroll down to Settings > Resource Sharing (CORS), and set the following:

     Parameter | Value |
     ---|---|
     Allowed Origins | `*` |
     Allowed Methods | `GET`, `HEAD`, `PUT` |
     Allowed Headers | `*` |
     Exposed Headers | `*` |
     Max Age | `3000` |

     If data in your bucket expires due to an object lifecycle management policy, you may lose the ability to read the history of some runs.
  3. Generate a storage account access key and make a note of its name and the storage account name. If you are using Dedicated Cloud, share the storage account name and access key with your W&B team using a secure sharing mechanism.

  For team level BYOB, W&B recommends that you use Terraform to provision the Azure Blob Storage container along with the necessary access mechanism and permissions. If you use Dedicated Cloud, provide the OIDC issuer URL for your instance. Make a note of the following details:

  - Storage account name
  - Storage container name
  - Managed identity client ID
  - Azure tenant ID
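Instead of the portal, the CORS settings can also be applied with the Azure CLI. This is a sketch under assumptions: the storage account name is a placeholder, and the (commented) command needs an authenticated `az` session with rights on that account.

```shell
ACCOUNT=mywandbstorage   # placeholder storage account name

# Matches the CORS table above (GET/HEAD/PUT, all origins and headers,
# max age 3000) for the blob service:
# az storage cors add --services b \
#   --methods GET HEAD PUT \
#   --origins '*' --allowed-headers '*' --exposed-headers '*' \
#   --max-age 3000 \
#   --account-name "$ACCOUNT"

echo "CORS sketch for storage account $ACCOUNT"
```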
Create your S3-compatible bucket. Make a note of:
- Access key
- Secret access key
- URL endpoint
- Bucket name
- Folder path, if applicable.
- Region
Next, determine the storage address.
2. Determine the storage address
This section explains the syntax to use to connect W&B to a BYOB storage bucket. In the examples, replace placeholder values between angle brackets (`<>`) with your bucket’s details.
Select a tab for detailed instructions.
To configure CoreWeave for instance-level BYOB, specify the bucket name rather than the full bucket path.
For team level BYOB, determine the full bucket path using the following format. Replace placeholders between angle brackets (`<>`) with the bucket’s values.
Bucket format:
```
cw://<accessKey>:<secretAccessKey>@cwobject.com/<bucketName>?tls=true
```
The `cwobject.com` HTTPS endpoint is supported. TLS 1.3 is required. Contact support to express interest in other CoreWeave endpoints. The `cw://` protocol specifier is preferred.
Bucket format:
```
s3://<accessKey>:<secretAccessKey>@<s3_regional_url_endpoint>/<bucketName>?region=<region>
```
In the address, the `region` parameter is mandatory unless both your W&B instance and your storage bucket are deployed in AWS and the W&B instance’s `AWS_REGION` matches the bucket’s AWS S3 region.
Bucket format:
```
gs://<serviceAccountEmail>:<urlEncodedPrivateKey>@<bucketName>
```
Bucket format:
```
az://:<urlEncodedAccessKey>@<storageAccountName>/<containerName>
```
Bucket format:
```
s3://<accessKey>:<secretAccessKey>@<url_endpoint>/<bucketName>?region=<region>&tls=true
```
In the address, the `region` parameter is mandatory.
This section is for S3-compatible storage buckets that are not hosted in S3, like MinIO hosted on your premises. For storage buckets hosted in AWS S3, see the AWS tab instead.
For Cloud-native storage buckets with an optional S3-compatible mode, use the Cloud-native protocol specifier when possible. For example, use `cw://` for a CoreWeave bucket, rather than `s3://`.
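The address formats above embed credentials in a URL, and the GCP and Azure variants explicitly call for URL-encoded keys. A minimal Python sketch (all values are hypothetical) of URL-encoding a credential before embedding it in an `s3://` address, so characters like `/` and `+` survive:

```python
from urllib.parse import quote


def s3_address(access_key: str, secret_key: str, endpoint: str,
               bucket: str, region: str) -> str:
    """Build an s3:// storage address with URL-encoded credentials."""
    ak = quote(access_key, safe="")
    sk = quote(secret_key, safe="")
    return f"s3://{ak}:{sk}@{endpoint}/{bucket}?region={region}"


# Hypothetical values for illustration only:
addr = s3_address("AKIAEXAMPLE", "abc/def+ghi",
                  "s3.us-east-1.amazonaws.com",
                  "my-wandb-bucket", "us-east-1")
print(addr)
```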
After determining the storage address, you are ready to configure instance level BYOB or configure team level BYOB.
3. Configure W&B
After you provision your bucket and determine its address, you are ready to configure BYOB at the instance level or team level.
Instance level BYOB
This section shows how to configure W&B for instance level BYOB. For Team Level BYOB, refer to Team level BYOB instead.
For Dedicated Cloud: Share the bucket details with your W&B team, who will configure your Dedicated Cloud instance.
For Self-Managed, you can configure instance level BYOB using the W&B App:
1. Log in to W&B as a user with the `admin` role.
2. Click the user icon at the top, then click System Console.
3. Go to Settings > System Connections.
4. In the Bucket Storage section, ensure the identity in the Identity field is granted access to the new bucket.
5. Select the Provider.
6. Enter the new Bucket Name.
7. Optionally, enter the Path to use in the new bucket.
8. Click Save.
For Self-Managed, W&B recommends using the Terraform module managed by W&B to provision a storage bucket along with the necessary access mechanism and related IAM permissions:
- AWS
- GCP
- Azure - Instance level BYOB or Team level BYOB
Team level BYOB
After you determine the storage location for your bucket, you can use the W&B App to configure team level BYOB while creating a team.
- After a team is created, its storage cannot be changed.
- For Instance level BYOB, refer to Instance level BYOB instead.
- If you plan to configure CoreWeave storage for the team, contact support to verify that your bucket is configured correctly in CoreWeave and to validate your team’s configuration, since the storage details cannot be changed after the team is created.
1. If you’re connecting to a cloud-native storage bucket in another cloud or to an S3-compatible storage bucket like MinIO for team-level BYOB in your Dedicated Cloud or Self-Managed instance, you must add the bucket path to the `GORILLA_SUPPORTED_FILE_STORES` environment variable and then restart W&B, before following the rest of these steps to use the storage bucket for a team.
2. Log in to W&B as a user with the `admin` role, click the icon at the top left to open the left navigation, then click Create a team to collaborate.
3. Provide a name for the team.
4. Set Storage Type to External storage.

   To use the instance level storage for team storage (regardless of whether it is internal or external), leave Storage Type set to Internal, even if the instance level bucket is configured for BYOB. To use separate external storage for the team, set Storage Type for the team to External and configure the bucket details in the next step.
5. Click Bucket location.
6. To use an existing bucket, select it from the list. To add a new bucket, click Add bucket at the bottom, then provide the bucket’s details:
   1. Click Cloud provider and select CoreWeave, AWS, GCP, or Azure. CoreWeave is not yet available for teams on Multi-tenant Cloud. If the cloud provider is not listed, ensure that you followed step 1 to add the bucket path to the `GORILLA_SUPPORTED_FILE_STORES` environment variable. If no buckets from that provider are included in the environment variable, that provider is not listed.
   2. Specify the bucket:
      - For CoreWeave, provide only the bucket name.
      - For Amazon S3, GCP, or S3-compatible storage, provide the full bucket path you determined earlier.
      - For Azure on W&B Dedicated Cloud or Self-Managed, set Account name to the Azure account and Container name to the Azure blob storage container.
   3. Optionally:
      - If applicable, set Path to the bucket sub-path.
      - AWS: Set KMS key ARN to the ARN of your KMS encryption key.
      - Azure: If applicable, specify values for Tenant ID and Managed Identity Client ID.
7. For Multi-tenant Cloud, optionally invite members to the team. In Invite team members, specify a comma-separated list of email addresses. Otherwise, you can invite members to the team after it is created.

   For Dedicated Cloud or Self-Managed, you can invite members to the team after it is created.
8. Click Create team.

   If W&B encounters errors accessing the bucket or detects invalid settings, an error or warning displays at the bottom of the page. Otherwise, the team is created.
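For step 1, on a Self-Managed instance the environment variable can be set before starting W&B. This is a sketch with hypothetical bucket paths: list every cross-cloud or S3-compatible storage address you intend to offer to teams, comma-separated, using the address formats determined earlier.

```shell
# Hypothetical bucket paths; substitute the storage addresses you
# determined in the previous section.
export GORILLA_SUPPORTED_FILE_STORES="s3://<accessKey>:<secretAccessKey>@minio.example.internal:9000/wandb-bucket,gs://wandb-gcs-bucket"

# Restart W&B after setting the variable so the providers appear in the UI.
echo "$GORILLA_SUPPORTED_FILE_STORES"
```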
Troubleshooting
Connecting to CoreWeave AI Object Storage
- Connection errors
  - Verify that your W&B instance can connect to CoreWeave network endpoints.
  - CoreWeave uses virtual-hosted style paths, where the bucket name is a subdomain at the beginning of the path. For example, `cw://bucket-name.cwobject.com` is correct, while `cw://cwobject.com/bucket-name/` is not.
  - Bucket names must not contain underscores (`_`) or other characters incompatible with DNS rules.
  - Bucket names must be globally unique among CoreWeave locations.
  - Bucket names must not begin with `cw-` or `vip-`, which are reserved prefixes.
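The bucket-name rules above can be checked locally before creating a bucket. A minimal sketch (a heuristic only, not CoreWeave's authoritative validation, and it cannot check global uniqueness):

```python
import re

RESERVED_PREFIXES = ("cw-", "vip-")


def is_valid_coreweave_bucket_name(name: str) -> bool:
    """Heuristic check of the rules above: no reserved prefixes,
    and DNS-compatible characters (lowercase letters, digits,
    hyphens, dots; no underscores; starts/ends alphanumeric)."""
    if name.startswith(RESERVED_PREFIXES):
        return False
    return bool(re.fullmatch(r"[a-z0-9]([a-z0-9.-]*[a-z0-9])?", name))
```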
- CORS validation failures
  - A CORS policy is required. CoreWeave is S3-compatible; for details about CORS, refer to Configuring cross-origin resource sharing (CORS) in the AWS documentation.
  - `AllowedMethods` must include the methods `GET`, `PUT`, and `HEAD`.
  - `ExposeHeaders` must include `ETag`.
  - W&B front-end domains must be included in the CORS policy’s `AllowedOrigins`. The example CORS policies provided on this page include all domains using `*`.
- LOTA endpoint issues
  - Connecting to LOTA endpoints from W&B is not yet supported. To express interest, contact support.
- Access key and permission errors
  - Verify that your CoreWeave API Access Key is not expired.
  - Verify that your CoreWeave API Access Key and Secret Key have sufficient permissions: `GetObject`, `PutObject`, `DeleteObject`, `ListBucket`. The examples on this page meet this requirement. Refer to Create and Manage Access Keys in the CoreWeave documentation.