Bring your own bucket (BYOB)

Overview

Bring your own bucket (BYOB) allows you to store W&B artifacts and other related sensitive data in your own cloud or on-prem infrastructure. If you use Dedicated Cloud or Multi-tenant Cloud, data that you store in your bucket is not copied to the W&B managed infrastructure.

Data stored in the central database vs buckets

When you use BYOB, certain types of data are stored in the W&B central database, while other types are stored in your bucket.

Database

  • Metadata for users, teams, artifacts, experiments, and projects
  • Reports
  • Experiment logs
  • System metrics
  • Console logs

Buckets

  • Experiment files and metrics
  • Artifact files
  • Media files
  • Run files
  • Exported history metrics and system events in Parquet format

Bucket scopes

You can configure your storage bucket at one of two scopes:

  • Instance level: In Dedicated Cloud and Self-Managed, any user with the required permissions within your organization or instance can access files stored in your instance's storage bucket. Not applicable to Multi-tenant Cloud.
  • Team level: If a W&B Team is configured to use a Team level storage bucket, team members can access files stored in it. Team level storage buckets allow greater data access control and data isolation for teams with highly sensitive data or strict compliance requirements.

Team level storage helps different business units or departments that share an instance use infrastructure and administrative resources efficiently. It also allows separate project teams to manage AI workflows for separate customer engagements. Team level BYOB is available for all deployment types; you configure it when setting up the team.

This flexible design allows for many different storage topologies, depending on your organization’s needs. For example:

  • The same bucket can be used for the instance and one or more teams.
  • Each team can use a separate bucket, some teams can choose to write to the instance bucket, or multiple teams can share a bucket by writing to subpaths.
  • Buckets for different teams can be hosted in different cloud infrastructure environments or regions, and can be managed by different storage admin teams.

For example, suppose you have a team called Kappa in your organization. Your organization (and Team Kappa) uses the Instance level storage bucket by default. Next, you create a team called Omega and configure a Team level storage bucket for it. Files generated by Team Omega are not accessible by Team Kappa. However, files created by Team Kappa are accessible by Team Omega. If you want to isolate data for Team Kappa, you must configure a Team level storage bucket for that team as well.

Availability matrix

W&B can connect to the following storage providers:

  • CoreWeave AI Object Storage is a high-performance, S3-compatible object storage service optimized for AI workloads.
  • Amazon S3 is an object storage service offering industry-leading scalability, data availability, security, and performance.
  • Google Cloud Storage is a managed service for storing unstructured data at scale.
  • Azure Blob Storage is a cloud-based object storage solution for storing massive amounts of unstructured data like text, binary data, images, videos, and logs.
  • S3-compatible storage like MinIO hosted in your cloud or infrastructure on your premises.

BYOB availability at each scope, for each W&B deployment type:

  • Dedicated Cloud: Instance level and team level BYOB are both available.¹ They are supported for CoreWeave AI Object Storage, AWS S3, GCP Storage, Microsoft Azure Blob Storage, and S3-compatible storage like MinIO hosted in your cloud or on-prem infrastructure.
  • Multi-tenant Cloud: Instance level BYOB is not applicable; team level BYOB is available.¹ Team level BYOB is supported for AWS S3 and GCP Storage. W&B fully manages the default and only storage bucket for Microsoft Azure. Team level BYOB for CoreWeave AI Object Storage is coming soon.
  • Self-Managed: Instance level and team level BYOB are both available.¹ They are supported for CoreWeave AI Object Storage, AWS S3, GCP Storage, Microsoft Azure Blob Storage, and S3-compatible storage like MinIO hosted in your cloud or infrastructure on your premises.

1. CoreWeave AI Object Storage is currently supported only for instance or team level BYOB on Dedicated Cloud and Self-Managed. Team level BYOB on Multi-tenant Cloud is coming soon.

Set up BYOB

1. Provision your bucket

After verifying availability, you are ready to provision your storage bucket, including its access policy and CORS configuration. Follow the instructions for your storage provider.

CoreWeave AI Object Storage

Requirements:

  • Dedicated Cloud or Self-Managed v0.70.0 or newer. Not yet available for Multi-tenant Cloud.
  • A CoreWeave account with AI Object Storage enabled and with permission to create buckets, API access keys, and secret keys.
  • Your W&B instance must be able to connect to CoreWeave network endpoints.

For details, see Create a CoreWeave AI Object Storage bucket in the CoreWeave documentation.

  1. Create the bucket with a name of your choice in your preferred CoreWeave availability zone. Optionally create a folder for W&B to use as a sub-path for all W&B files. Make a note of the bucket name, availability zone, API access key, secret key, and sub-path.
  2. Set the following Cross-origin resource sharing (CORS) policy for the bucket:
    [
      {
        "AllowedHeaders": [
          "*"
        ],
        "AllowedMethods": [
          "GET",
          "HEAD",
          "PUT"
        ],
        "AllowedOrigins": [
          "*"
        ],
        "ExposeHeaders": [
          "ETag"
        ],
        "MaxAgeSeconds": 3000
      }
    ]
    
    CoreWeave storage is S3-compatible. For details about CORS, refer to Configuring cross-origin resource sharing (CORS) in the AWS documentation.
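Because CoreWeave AI Object Storage is S3-compatible, one way to apply the CORS policy is with the AWS CLI pointed at the CoreWeave endpoint. This is a minimal sketch, not an official CoreWeave procedure; the profile name and file name are illustrative, and the S3 API expects the policy wrapped in a CORSRules object:

  # Hypothetical example. cors.json wraps the CORS policy shown above:
  #   { "CORSRules": [ ... ] }
  # Assumes an AWS CLI profile ("coreweave") configured with your CoreWeave
  # API access key and secret key.
  aws s3api put-bucket-cors \
    --bucket <bucketName> \
    --cors-configuration file://cors.json \
    --endpoint-url https://cwobject.com \
    --profile coreweave

  # Verify the configuration.
  aws s3api get-bucket-cors \
    --bucket <bucketName> \
    --endpoint-url https://cwobject.com \
    --profile coreweave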

Amazon S3

For details, see Create an S3 bucket in the AWS documentation.

  1. Provision the KMS Key.

    W&B requires you to provision a KMS Key to encrypt and decrypt the data on the S3 bucket. The key usage type must be ENCRYPT_DECRYPT. Assign the following policy to the key:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid" : "Internal",
          "Effect" : "Allow",
          "Principal" : { "AWS" : "<Your_Account_Id>" },
          "Action" : "kms:*",
          "Resource" : "<aws_kms_key.key.arn>"
        },
        {
          "Sid" : "External",
          "Effect" : "Allow",
          "Principal" : { "AWS" : "<aws_principal_and_role_arn>" },
          "Action" : [
            "kms:Decrypt",
            "kms:Describe*",
            "kms:Encrypt",
            "kms:ReEncrypt*",
            "kms:GenerateDataKey*"
          ],
          "Resource" : "<aws_kms_key.key.arn>"
        }
      ]
    }
    

    Replace <Your_Account_Id> and <aws_kms_key.key.arn> accordingly.

    If you are using Multi-tenant Cloud or Dedicated Cloud, replace <aws_principal_and_role_arn> with the value that corresponds to your deployment type.

    This policy grants your AWS account full access to the key and also assigns the required permissions to the AWS account hosting the W&B Platform. Keep a record of the KMS Key ARN.

  2. Provision the S3 Bucket.

    Follow these steps to provision the S3 bucket in your AWS account:

    1. Create the S3 bucket with a name of your choice. Optionally, create a folder which you can configure as a sub-path to store all W&B files.

    2. Enable server-side encryption using the KMS key from the previous step.

    3. Configure CORS with the following policy:

      [
        {
            "AllowedHeaders": [
                "*"
            ],
            "AllowedMethods": [
                "GET",
                "HEAD",
                "PUT"
            ],
            "AllowedOrigins": [
                "*"
            ],
            "ExposeHeaders": [
                "ETag"
            ],
            "MaxAgeSeconds": 3000
        }
      ]
      
    4. Grant the required S3 permissions to the AWS account hosting the W&B Platform. W&B requires these permissions to generate pre-signed URLs, which AI workloads in your cloud infrastructure or user browsers use to access the bucket.

      {
        "Version": "2012-10-17",
        "Id": "WandBAccess",
        "Statement": [
          {
            "Sid": "WAndBAccountAccess",
            "Effect": "Allow",
            "Principal": { "AWS": "<aws_principal_and_role_arn>" },
              "Action" : [
                "s3:GetObject*",
                "s3:GetEncryptionConfiguration",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:ListBucketVersions",
                "s3:AbortMultipartUpload",
                "s3:DeleteObject",
                "s3:PutObject",
                "s3:GetBucketCORS",
                "s3:GetBucketLocation",
                "s3:GetBucketVersioning"
              ],
            "Resource": [
              "arn:aws:s3:::<wandb_bucket>",
              "arn:aws:s3:::<wandb_bucket>/*"
            ]
          }
        ]
      }
      

      Replace <wandb_bucket> accordingly and keep a record of the bucket name. For a sketch of applying these settings with the AWS CLI, see the example after this procedure.

      If you are using Multi-tenant Cloud or Dedicated Cloud, replace <aws_principal_and_role_arn> with the corresponding value.

For more details, see the AWS self-managed hosting guide.
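The following is a minimal sketch of the same provisioning steps using the AWS CLI, under the assumption that you saved the KMS key policy, CORS policy, and bucket policy above (with placeholders replaced) to local JSON files. File names are illustrative, and the S3 API expects the CORS policy wrapped in a CORSRules object:

  # 1. Create the KMS key, then attach the key policy shown above.
  #    <kms_key_id> comes from the KeyMetadata returned by create-key.
  aws kms create-key --description "W&B BYOB bucket encryption key"
  aws kms put-key-policy \
    --key-id <kms_key_id> \
    --policy-name default \
    --policy file://kms-policy.json

  # 2. Create the bucket and enable server-side encryption with the KMS key.
  #    For regions other than us-east-1, also pass
  #    --create-bucket-configuration LocationConstraint=<region>.
  aws s3api create-bucket --bucket <wandb_bucket> --region <region>
  aws s3api put-bucket-encryption \
    --bucket <wandb_bucket> \
    --server-side-encryption-configuration '{
      "Rules": [{
        "ApplyServerSideEncryptionByDefault": {
          "SSEAlgorithm": "aws:kms",
          "KMSMasterKeyID": "<aws_kms_key.key.arn>"
        }
      }]
    }'

  # 3. Apply the CORS policy (cors.json contains { "CORSRules": [ ... ] }).
  aws s3api put-bucket-cors --bucket <wandb_bucket> --cors-configuration file://cors.json

  # 4. Apply the bucket policy granting access to the W&B Platform account.
  aws s3api put-bucket-policy --bucket <wandb_bucket> --policy file://bucket-policy.json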

Google Cloud Storage

For details, see Create a bucket in the GCP documentation.

  1. Provision the GCS bucket.

    Follow these steps to provision the GCS bucket in your GCP project:

    1. Create the GCS bucket with a name of your choice. Optionally, create a folder which you can configure as a sub-path to store all W&B files.

    2. Set encryption type to Google-managed.

    3. Set the CORS policy with gsutil. This is not possible in the UI.

      1. Create a file called cors-policy.json locally.

      2. Copy the following CORS policy into the file and save it.

        [
          {
            "origin": ["*"],
            "responseHeader": ["Content-Type"],
            "exposeHeaders": ["ETag"],
            "method": ["GET", "HEAD", "PUT"],
            "maxAgeSeconds": 3000
          }
        ]
        
    4. Replace <bucket_name> with the correct bucket name and run gsutil.

      gsutil cors set cors-policy.json gs://<bucket_name>
      
    5. Verify the bucket’s policy. Replace <bucket_name> with the correct bucket name.

      gsutil cors get gs://<bucket_name>
      
  2. If you are using Multi-tenant Cloud or Dedicated Cloud, grant the storage.admin role to the GCP service account linked to the W&B Platform. W&B requires this role to check the bucket’s CORS configuration and attributes, such as whether object versioning is enabled. If the service account does not have the storage.admin role, these checks result in an HTTP 403 error. For one way to grant the role with gsutil, see the example after these steps.

    • For Multi-tenant Cloud, the account is: wandb-integration@wandb-production.iam.gserviceaccount.com
    • For Dedicated Cloud the account is: deploy@wandb-production.iam.gserviceaccount.com

    Keep a record of the bucket name. Next, determine the storage address.
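As an example of the grant in the previous step, assuming you use gsutil (which this page already uses for CORS), you could bind the role like this; substitute the Multi-tenant Cloud service account if applicable:

  # Grant the storage.admin role on the bucket to the W&B service account.
  # Dedicated Cloud service account shown; for Multi-tenant Cloud use
  # wandb-integration@wandb-production.iam.gserviceaccount.com instead.
  gsutil iam ch \
    serviceAccount:deploy@wandb-production.iam.gserviceaccount.com:roles/storage.admin \
    gs://<bucket_name>

  # Verify the bucket's IAM policy.
  gsutil iam get gs://<bucket_name>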

Azure Blob Storage

For details, see Create a blob storage container in the Azure documentation.

  1. Provision the Azure Blob Storage container.

    For instance level BYOB, if you’re not using this Terraform module, follow the steps below to provision an Azure Blob Storage bucket in your Azure subscription:

    1. Create a bucket with a name of your choice. Optionally, create a folder which you can configure as a sub-path to store all W&B files.

    2. Configure the CORS policy on the bucket.

      To set the CORS policy through the UI, go to the storage account, scroll down to Settings > Resource sharing (CORS), and set the following values. Alternatively, you can set the same rule with the Azure CLI (see the example after these steps).

      • Allowed Origins: *
      • Allowed Methods: GET, HEAD, PUT
      • Allowed Headers: *
      • Exposed Headers: *
      • Max Age: 3000
  2. Generate a storage account access key and make a note of its name and the storage account name. If you are using Dedicated Cloud, share the storage account name and access key with your W&B team using a secure sharing mechanism.

    For team level BYOB, W&B recommends that you use Terraform to provision the Azure Blob Storage bucket along with the necessary access mechanism and permissions. If you use Dedicated Cloud, provide the OIDC issuer URL for your instance. Make a note of the following details:

    • Storage account name
    • Storage container name
    • Managed identity client id
    • Azure tenant id

S3-compatible storage

Create your S3-compatible bucket. Make a note of:

  • Access key
  • Secret access key
  • URL endpoint
  • Bucket name
  • Folder path, if applicable.
  • Region

Next, determine the storage address.

2. Determine the storage address

This section explains the syntax to use to connect W&B to a BYOB storage bucket. In the examples, replace placeholder values between angle brackets (<>) with your bucket’s details.

The address format depends on your storage provider.

CoreWeave AI Object Storage

To configure CoreWeave for instance level BYOB, specify the bucket name rather than the full bucket path.

For team level BYOB, determine the full bucket path using the following format. Replace placeholders between angle brackets (<>) with the bucket’s values.

Bucket format:

cw://<accessKey>:<secretAccessKey>@cwobject.com/<bucketName>?tls=true

The cwobject.com HTTPS endpoint is supported. TLS 1.3 is required. Contact support to express interest in other CoreWeave endpoints.

  • The cw:// protocol specifier is preferred.

Amazon S3

Bucket format:

s3://<accessKey>:<secretAccessKey>@<s3_regional_url_endpoint>/<bucketName>?region=<region>

In the address, the region parameter is mandatory unless both your W&B instance and your storage bucket are deployed in AWS, and the W&B instance’s AWS_REGION matches the bucket’s AWS S3 region.
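For example, a hypothetical bucket named wandb-prod-artifacts in us-east-1 would use an address like the following; the access key and secret shown are placeholders, not real credentials, and a secret containing characters that are not valid in a URL would need to be URL-encoded:

s3://AKIAIOSFODNN7EXAMPLE:abc123exampleSecretKey@s3.us-east-1.amazonaws.com/wandb-prod-artifacts?region=us-east-1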

Google Cloud Storage

Bucket format:

gs://<serviceAccountEmail>:<urlEncodedPrivateKey>@<bucketName>

Azure Blob Storage

Bucket format:

az://:<urlEncodedAccessKey>@<storageAccountName>/<containerName>

S3-compatible storage

Bucket format:

s3://<accessKey>:<secretAccessKey>@<url_endpoint>/<bucketName>?region=<region>&tls=true

In the address, the region parameter is mandatory.
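For example, a hypothetical MinIO deployment reachable at minio.internal.example.com with a bucket named wandb-team-data might use an address like this (all values are placeholders):

s3://minioadmin:minio-secret-example@minio.internal.example.com/wandb-team-data?region=us-east-1&tls=true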

After determining the storage address, you are ready to configure instance level BYOB or configure team level BYOB.

3. Configure W&B

After you provision your bucket and determine its address, you are ready to configure BYOB at the instance level or team level.

Instance level BYOB

This section shows how to configure W&B for instance level BYOB. For Team Level BYOB, refer to Team level BYOB instead.

For Dedicated Cloud: Share the bucket details with your W&B team, who will configure your Dedicated Cloud instance.

For Self-Managed, you can configure instance level BYOB using the W&B App:

  1. Log in to W&B as a user with the admin role.
  2. Click the user icon at the top, then click System Console.
  3. Go to Settings > System Connections.
  4. In the Bucket Storage section, ensure the identity in the Identity field is granted access to the new bucket.
  5. Select the Provider.
  6. Enter the new Bucket Name.
  7. Optionally, enter the Path to use in the new bucket.
  8. Click Save.

Team level BYOB

After you determine the storage location for your bucket, you can use the W&B App to configure team level BYOB while creating a team.

  1. For team-level BYOB in your Dedicated Cloud or Self-Managed instance, if you’re connecting to a cloud-native storage bucket in another cloud or to an S3-compatible storage bucket like MinIO, you must add the bucket path to the GORILLA_SUPPORTED_FILE_STORES environment variable and restart W&B before following the rest of these steps (see the example after this procedure).

  2. Log in to W&B as a user with the admin role, click the icon at the top left to open the left navigation, then click Create a team to collaborate.

  3. Provide a name for the team.

  4. Set Storage Type to External storage.

  5. Click Bucket location.

  6. To use an existing bucket, select it from the list. To add a new bucket, click Add bucket at the bottom, then provide the bucket’s details.

    Click Cloud provider and select CoreWeave, AWS, GCP, or Azure. CoreWeave is not yet available for teams on Multi-tenant Cloud. If the cloud provider is not listed, ensure that you followed step 1 to add the bucket path to the GORILLA_SUPPORTED_FILE_STORES environment variable. If no buckets from that provider are included in the environment variable, that provider is not listed.

    • Specify the bucket.

      • For CoreWeave, provide only the bucket name.
      • For Amazon S3, GCP, or S3-compatible storage, provide the full bucket path you determined earlier.
      • For Azure on Dedicated Cloud or Self-Managed, set Account name to the Azure storage account name and Container name to the Azure Blob Storage container name.
    • Optionally:

      • If applicable, set Path to the bucket sub-path.
      • AWS: Set KMS key ARN to the ARN of your KMS encryption key.
      • Azure: If applicable, specify values for Tenant ID and Managed Identity Client ID.
  7. For Multi-tenant Cloud, optionally invite members to the team. In Invite team members, specify a comma-separated list of email addresses. Otherwise, you can invite members to the team after it is created.

    For Dedicated Cloud or Self-Managed, you can invite members to the team after it is created.

  8. Click Create team.

    If W&B encounters errors accessing the bucket or detects invalid settings, an error or warning displays at the bottom of the page. Otherwise, the team is created.
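As a sketch of step 1, for a W&B server configured through environment variables, you might expose an additional bucket like this before restarting W&B. The value is the full bucket path you determined earlier; multiple paths are typically separated by commas, but verify the expected format against your deployment’s configuration:

  # Hypothetical example: make an S3-compatible (MinIO) bucket available for
  # team level BYOB. Replace the value with your actual bucket path.
  export GORILLA_SUPPORTED_FILE_STORES="s3://<accessKey>:<secretAccessKey>@minio.internal.example.com/wandb-team-data?region=us-east-1&tls=true"
  # Restart the W&B server processes for the change to take effect.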

Troubleshooting

Connecting to CoreWeave AI Object Storage
  • Connection errors
    • Verify that your W&B instance can connect to CoreWeave network endpoints.
    • CoreWeave uses virtual-hosted style paths, where the bucket name appears as a subdomain of the endpoint rather than as part of the path. For example: cw://bucket-name.cwobject.com is correct, while cw://cwobject.com/bucket-name/ is not.
    • Bucket names must not contain underscores (_) or other characters incompatible with DNS rules.
    • Bucket names must be globally unique among CoreWeave locations.
    • Bucket names must not begin with cw- or vip-, which are reserved prefixes.
  • CORS validation failures
    • A CORS policy is required. CoreWeave is S3-compatible; for details about CORS, refer to Configuring cross-origin resource sharing (CORS) in the AWS documentation.
    • AllowedMethods must include GET, PUT, and HEAD.
    • ExposeHeaders must include ETag.
    • W&B front-end domains must be included in the CORS policy’s AllowedOrigins. The example CORS policies provided on this page include all domains using *.
  • LOTA endpoint issues
    • Connecting to LOTA endpoints from W&B is not yet supported. To express interest, contact support.
  • Access key and permission errors
    • Verify that your CoreWeave API Access Key is not expired.
    • Verify that your CoreWeave API Access Key and Secret Key have sufficient permissions: GetObject, PutObject, DeleteObject, and ListBucket. The examples on this page meet this requirement. Refer to Create and Manage Access Keys in the CoreWeave documentation.