
W&B Platform

W&B Platform is the foundational infrastructure, tooling, and governance scaffolding that supports W&B products such as Core, Models, and Weave.

W&B Platform is available in three deployment options:

  • W&B Multi-tenant Cloud
  • W&B Dedicated Cloud
  • W&B Customer-Managed

The following responsibility matrix outlines some of the key differences between the different options:

Deployment options

The following sections provide an overview of each deployment type.

W&B Multi-tenant Cloud

W&B Multi-tenant Cloud is a fully managed service deployed in W&B’s cloud infrastructure. You can access W&B products at the scale you need, with cost-efficient pricing options and continuous updates that deliver the latest features and functionality. W&B recommends Multi-tenant Cloud for product trials, and for production AI workflows if you do not need the security of a private deployment, self-service onboarding is important, and cost efficiency is critical.

See W&B Multi-tenant Cloud for more information.

W&B Dedicated Cloud

W&B Dedicated Cloud is a single-tenant, fully managed service deployed in W&B’s cloud infrastructure. It is the best place to onboard W&B if your organization requires conformance to strict governance controls such as data residency, needs advanced security capabilities, and wants to optimize AI operating costs by not building and managing infrastructure with the required security, scale, and performance characteristics.

See W&B Dedicated Cloud for more information.

W&B Customer-Managed

With this option, you deploy and manage W&B Server on your own infrastructure. W&B Server is a self-contained package that runs the W&B Platform and its supported W&B products. W&B recommends this option if all your existing infrastructure is on-prem, or if your organization has strict regulatory needs that W&B Dedicated Cloud does not satisfy. With this option, you are fully responsible for provisioning, continuously maintaining, and upgrading the infrastructure required to support W&B Server.

See W&B Self Managed for more information.

Next steps

If you’re looking to try any of the W&B products, W&B recommends using the Multi-tenant Cloud. If you’re looking for an enterprise-friendly setup, choose the appropriate deployment type for your trial here.

1 - Deployment options

1.1 - Use W&B Multi-tenant SaaS

W&B Multi-tenant Cloud is a fully managed platform deployed in W&B’s Google Cloud Platform (GCP) account in GCP’s North America regions. W&B Multi-tenant Cloud uses autoscaling in GCP so that the platform scales up or down as traffic increases or decreases.

Data security

If you are not on an enterprise plan, all of your data is stored only in the shared cloud storage and is processed with shared cloud compute services. Depending on your pricing plan, you may be subject to storage limits.

Enterprise plan users can bring their own bucket (BYOB) using the secure storage connector at the team level to store files such as models, datasets, and more. You can configure a single bucket for multiple teams, or use separate buckets for different W&B Teams. If you do not configure a secure storage connector for a team, that team’s data is stored in the shared cloud storage.

Identity and access management (IAM)

If you are on an enterprise plan, you can use identity and access management capabilities for secure authentication and effective authorization in your W&B Organization. The following IAM features are available in Multi-tenant Cloud:

  • SSO authentication with OIDC or SAML. Reach out to your W&B team or support if you would like to configure SSO for your organization.
  • Configure appropriate user roles at the scope of the organization and within a team.
  • Use restricted projects to limit who can view, edit, and submit W&B runs to a W&B project.

Monitor

Organization admins can manage usage and billing for their account from the Billing tab in their account view. If your organization uses the shared cloud storage on Multi-tenant Cloud, an admin can optimize storage usage across the teams in the organization.

Maintenance

W&B Multi-tenant Cloud is a multi-tenant, fully managed platform. Because W&B manages it, you do not incur the overhead and costs of provisioning and maintaining the W&B platform.

Compliance

Security controls for Multi-tenant Cloud are periodically audited internally and externally. Refer to the W&B Security Portal to request the SOC2 report and other security and compliance documents.

Next steps

Access Multi-tenant Cloud directly if you are looking for non-enterprise capabilities. To start with the enterprise plan, submit this form.

1.2 - Dedicated Cloud

Use dedicated cloud (Single-tenant SaaS)

W&B Dedicated Cloud is a single-tenant, fully managed platform deployed in W&B’s AWS, GCP, or Azure cloud accounts. Each Dedicated Cloud instance has its own network, compute, and storage, isolated from other W&B Dedicated Cloud instances. Your W&B-specific metadata and data are stored in isolated cloud storage and processed using isolated cloud compute services.

W&B Dedicated Cloud is available in multiple global regions for each cloud provider.

Data security

You can bring your own bucket (BYOB) using the secure storage connector at the instance and team levels to store your files such as models, datasets, and more.

As with W&B Multi-tenant Cloud, you can configure a single bucket for multiple teams, or use separate buckets for different teams. If you do not configure a secure storage connector for a team, that team’s data is stored in the instance-level bucket.
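The storage fallback rule above can be sketched as a simple lookup. This is a minimal, hypothetical illustration of the documented behavior, not W&B’s implementation; the function name `resolve_bucket`, the mapping layout, and the bucket URIs are assumptions made for the example.

```python
from typing import Mapping


def resolve_bucket(team: str, team_buckets: Mapping[str, str], instance_bucket: str) -> str:
    """Return the bucket that stores files for `team` (hypothetical helper).

    A team-level secure storage connector bucket takes precedence; teams
    without one fall back to the instance-level bucket.
    """
    return team_buckets.get(team) or instance_bucket


# Hypothetical example: two teams share one BYOB bucket, a third has none.
team_buckets = {"research": "s3://shared-byob", "prod": "s3://shared-byob"}
print(resolve_bucket("research", team_buckets, "s3://instance-bucket"))   # s3://shared-byob
print(resolve_bucket("marketing", team_buckets, "s3://instance-bucket"))  # s3://instance-bucket
```

On Multi-tenant Cloud the same team-level rule applies, with the shared cloud storage taking the place of the instance-level bucket.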

In addition to BYOB with the secure storage connector, you can use IP allowlisting to restrict access to your Dedicated Cloud instance to trusted network locations only.

You can also connect privately to your Dedicated Cloud instance using your cloud provider’s secure connectivity solution.

Identity and access management (IAM)

Use the identity and access management capabilities for secure authentication and effective authorization in your W&B Organization. The following features are available for IAM in Dedicated Cloud instances:

Monitor

Use audit logs to track user activity within your teams and to conform to your enterprise governance requirements. You can also view organization usage in your Dedicated Cloud instance with the W&B Organization Dashboard.

Maintenance

Similar to W&B Multi-tenant Cloud, you do not incur the overhead and costs of provisioning and maintaining the W&B platform with Dedicated Cloud.

To understand how W&B manages updates on Dedicated Cloud, refer to the server release process.

Compliance

Security controls for W&B Dedicated Cloud are periodically audited internally and externally. Refer to the W&B Security Portal to request the security and compliance documents for your product assessment exercise.

Migration options

Migration to Dedicated Cloud from a Self-managed instance or Multi-tenant Cloud is supported.

Next steps

Submit this form if you are interested in using Dedicated Cloud.

1.2.1 - Supported Dedicated Cloud regions

AWS, GCP, and Azure support cloud computing services in multiple locations worldwide. Global regions help ensure that you satisfy requirements related to data residency & compliance, latency, cost efficiency and more. W&B supports many of the available global regions for Dedicated Cloud.

Supported AWS Regions

The following table lists AWS Regions that W&B currently supports for Dedicated Cloud instances.

Region location             Region name
US East (Ohio)              us-east-2
US East (N. Virginia)       us-east-1
US West (N. California)     us-west-1
US West (Oregon)            us-west-2
Canada (Central)            ca-central-1
Europe (Frankfurt)          eu-central-1
Europe (Ireland)            eu-west-1
Europe (London)             eu-west-2
Europe (Milan)              eu-south-1
Europe (Stockholm)          eu-north-1
Asia Pacific (Mumbai)       ap-south-1
Asia Pacific (Singapore)    ap-southeast-1
Asia Pacific (Sydney)       ap-southeast-2
Asia Pacific (Tokyo)        ap-northeast-1
Asia Pacific (Seoul)        ap-northeast-2

For more information about AWS Regions, see Regions, Availability Zones, and Local Zones in the AWS documentation.

See What to Consider when Selecting a Region for your Workloads for an overview of factors that you should consider when choosing an AWS Region.

Supported GCP Regions

The following table lists GCP Regions that W&B currently supports for Dedicated Cloud instances.

Region location   Region name
South Carolina    us-east1
N. Virginia       us-east4
Iowa              us-central1
Oregon            us-west1
Los Angeles       us-west2
Las Vegas         us-west4
Toronto           northamerica-northeast2
Belgium           europe-west1
London            europe-west2
Frankfurt         europe-west3
Netherlands       europe-west4
Sydney            australia-southeast1
Tokyo             asia-northeast1
Seoul             asia-northeast3

For more information about GCP Regions, see Regions and zones in the GCP Documentation.

Supported Azure Regions

The following table lists Azure regions that W&B currently supports for Dedicated Cloud instances.

Region location   Region name
Virginia          eastus
Iowa              centralus
Washington        westus2
California        westus
Canada Central    canadacentral
France Central    francecentral
Netherlands       westeurope
Tokyo, Saitama    japaneast
Seoul             koreacentral

For more information about Azure regions, see Azure geographies in the Azure Documentation.

1.2.2 - Export data from Dedicated cloud


If you want to export all the data managed in your Dedicated Cloud instance, you can use the W&B SDK’s Import and Export API to extract runs, metrics, artifacts, and more. The following table covers some of the key export use cases.

Purpose                     Documentation
Export project metadata     Projects API
Export runs in a project    Runs API
Export reports              Reports API
Export artifacts            Explore artifact graphs, Download and use artifacts

If you manage artifacts stored in the Dedicated cloud with Secure Storage Connector, you may not need to export the artifacts using the W&B SDK API.
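As a rough sketch of the "Export runs in a project" use case, the function below walks every run in a project and collects its ID, name, config, and summary metrics. It assumes `api` behaves like `wandb.Api()`, whose `runs("entity/project")` call returns run objects with `id`, `name`, `config`, and `summary` attributes; reading `summary._json_dict` and dropping config keys that start with `_` mirror the pattern in W&B’s public API export examples, so verify those attribute names against your SDK version. The helper name `export_runs` is hypothetical.

```python
from typing import Any, Dict, List


def export_runs(api: Any, project_path: str) -> List[Dict[str, Any]]:
    """Collect id, name, config, and summary metrics for every run in a project.

    `api` is expected to behave like `wandb.Api()` (an assumption of this sketch).
    """
    rows = []
    for run in api.runs(project_path):
        rows.append({
            "id": run.id,
            "name": run.name,
            # Skip internal config keys, which start with an underscore.
            "config": {k: v for k, v in run.config.items() if not k.startswith("_")},
            # Final logged value of each metric, as a plain dict.
            "summary": dict(run.summary._json_dict),
        })
    return rows


# Usage against a real instance, after `wandb login --host=...` (entity and project are placeholders):
#   import wandb
#   rows = export_runs(wandb.Api(), "my-entity/my-project")
```

Passing the API object in, rather than creating it inside the function, keeps the sketch testable with stand-in objects and lets you point it at whichever host you are logged in to.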

1.3 - Self-managed

Deploying W&B in production

Use self-managed cloud or on-prem infrastructure

Deploy W&B Server on your AWS, GCP, or Azure cloud account or within your on-premises infrastructure.

Your IT, DevOps, or MLOps team is responsible for provisioning your deployment, managing upgrades, and continuously maintaining your self-managed W&B Server instance.

Deploy W&B Server within self managed cloud accounts

W&B recommends that you use official W&B Terraform scripts to deploy W&B Server into your AWS, GCP, or Azure cloud account.

See specific cloud provider documentation for more information on how to set up W&B Server in AWS, GCP or Azure.

Deploy W&B Server in on-prem infrastructure

You need to configure several infrastructure components to set up W&B Server in your on-prem infrastructure. These components include, but are not limited to:

  • (Strongly recommended) Kubernetes cluster
  • MySQL 8 database cluster
  • Amazon S3-compatible object storage
  • Redis cache cluster

See Install on on-prem infrastructure for more information on how to install W&B Server on your on-prem infrastructure. W&B can provide recommendations for the different components and provide guidance through the installation process.

Deploy W&B Server on a custom cloud platform

You can deploy W&B Server to a cloud platform other than AWS, GCP, or Azure. The requirements are similar to those for deploying in on-prem infrastructure.

Obtain your W&B Server license

You need a W&B trial license to complete the configuration of W&B Server. Open the Deploy Manager to generate a free trial license.

The URL redirects you to a Get a License for W&B Local form. Provide the following information:

  1. Choose a deployment type from the Choose Platform step.
  2. Select the owner of the license or add a new organization in the Basic Information step.
  3. Provide a name for the instance in the Name of Instance field and optionally provide a description in the Description field in the Get a License step.
  4. Select the Generate License Key button.

A page displays with an overview of your deployment along with the license associated with the instance.

1.3.1 - Tutorial: Run a W&B Server using Docker

Run Weights & Biases on your own machines using Docker

Follow this “Hello, world!” example to learn the general workflow to install W&B Server for Dedicated Cloud and Self Managed hosting options. By the end of this demo, you will know how to host W&B Server on your local machine using a Trial Mode W&B license.

For demonstration purposes, this demo uses a local development server on port 8080 (localhost:8080).

Prerequisites

Before you get started, ensure your local machine satisfies the following requirements:

  1. Install Python
  2. Install Docker and ensure it is running
  3. Install or upgrade the latest version of W&B:
    pip install --upgrade wandb
    

1. Pull the W&B Docker image

Run the following in your terminal:

wandb server start

This command pulls the latest W&B Docker image, wandb/local, and starts a W&B Server container on your local machine.

2. Create a W&B account

Navigate to http://localhost:8080/signup and create an initial user account. Provide a name, email address, username, and password.

Click the Sign Up button to create a W&B account.

Copy your API key

After you create an account, navigate to http://localhost:8080/authorize.

Copy the W&B API key that appears on the screen. You will need this key in a later step to verify your login credentials.

3. Generate a license

Navigate to the W&B Deploy Manager at https://deploy.wandb.ai/deploy to generate a Trial Mode W&B license.

  1. Select Docker as your provider
  2. Click Next.
  3. Select a license owner from the Owner of license dropdown.
  4. Click Next.
  5. Provide a name for your license in the Name of Instance field.
  6. (Optional) Provide a description about your license in the Description field.
  7. Click the Generate License Key button.

After you click Generate License Key, W&B redirects you to a Deployment License page. Within the Deployment License page you can view information about your license instance such as the Deployment ID, the organization the license belongs to, and more.

4. Add trial license to your local host

  1. Within the Deployment License page of your license instance, click the Copy License button.
  2. Navigate to http://localhost:8080/system-admin/
  3. Paste your license into the License field.
  4. Click the Update settings button.

5. Check that your browser is running the W&B App UI

Check that W&B is running on your local machine. Navigate to http://localhost:8080/home. You should see the W&B App UI in your browser.

6. Add programmatic access to your local W&B instance

  1. Navigate to http://localhost:8080/authorize to obtain your API key.
  2. Within your terminal, execute the following:
    wandb login --host=http://localhost:8080/
    
    If you are already logged in to W&B with a different account, add the --relogin flag:
    wandb login --relogin --host=http://localhost:8080
    
  3. Paste your API key when prompted.

W&B appends a localhost profile and your API key to your .netrc file (for example, /Users/username/.netrc) for future automatic logins.
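To check which key is stored for your local instance, you can read the .netrc file with Python’s standard `netrc` module. This is a minimal sketch that assumes W&B stores the API key in the entry’s password field; the machine name it writes may include the port (for example, `localhost:8080`), so check your file for the exact entry. The helper name `stored_api_key` is hypothetical, and the key in the demo is a placeholder.

```python
import netrc
import os
import tempfile
from typing import Optional


def stored_api_key(machine: str, netrc_path: Optional[str] = None) -> Optional[str]:
    """Return the password (assumed to be the W&B API key) stored for `machine`.

    With netrc_path=None, Python reads ~/.netrc. Returns None if there is no entry.
    """
    auth = netrc.netrc(netrc_path).authenticators(machine)
    if auth is None:
        return None
    _login, _account, password = auth
    return password


# Demo with a throwaway file instead of your real ~/.netrc.
with tempfile.NamedTemporaryFile("w", suffix=".netrc", delete=False) as f:
    f.write("machine localhost\n  login user\n  password 0123456789abcdef\n")
    demo_path = f.name

print(stored_api_key("localhost", demo_path))  # 0123456789abcdef
os.remove(demo_path)
```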

Add a volume to retain data

All metadata and files you log to W&B are temporarily stored in the container’s /vol directory.

Mount a volume, or external storage, to your Docker container to retain files and metadata you store in your local W&B instance. W&B recommends that you store metadata in an external MySQL database and files in an external storage bucket such as Amazon S3.

For more information on how to mount a volume and how Docker manages data, see the Manage data in Docker page in the Docker documentation.

Volume considerations

The underlying file store should be resizable. W&B recommends that you set up alerts to inform you when you are close to reaching minimum storage thresholds so you can resize the underlying file system.

1.3.2 - Run W&B Server on Kubernetes

Deploy W&B Platform with Kubernetes Operator

W&B Kubernetes Operator

Use the W&B Kubernetes Operator to simplify deploying, administering, troubleshooting, and scaling your W&B Server deployments on Kubernetes. You can think of the operator as a smart assistant for your W&B instance.

The W&B Server architecture and design continuously evolve to expand AI developer tooling capabilities and to provide appropriate primitives for high performance, better scalability, and easier administration. That evolution applies to the compute services, the relevant storage, and the connectivity between them. To facilitate continuous updates and improvements across deployment types, W&B uses a Kubernetes operator.

For more information about Kubernetes operators, see Operator pattern in the Kubernetes documentation.

Reasons for the architecture shift

Historically, the W&B application was deployed as a single deployment and pod within a Kubernetes cluster, or as a single Docker container. W&B has always recommended, and continues to recommend, externalizing the database and object store. Externalizing the database and object store decouples the application’s state from the application itself.

As the application grew, the need to evolve from a monolithic container to a distributed system (microservices) became apparent. This change simplifies backend logic handling and makes it easier to adopt built-in Kubernetes infrastructure capabilities. Distributed systems also support deploying the new services that additional W&B features rely on.

Before 2024, any Kubernetes-related change required manually updating the terraform-kubernetes-wandb Terraform module. Updating the module involved ensuring compatibility across cloud providers, configuring the necessary Terraform variables, and running terraform apply for each backend or Kubernetes-level change.

This process was not scalable since W&B Support had to assist each customer with upgrading their Terraform module.

The solution was to implement an operator that connects to a central deploy.wandb.ai server, requests the latest specification changes for a given release channel, and applies them. The operator receives updates as long as the license is valid. W&B uses Helm both to deploy the W&B operator and, inside the operator, to template the configuration of the entire W&B Kubernetes stack: Helm within Helm, or “Helm-ception”.

How it works

You can install the operator with Helm or from source. See charts/operator for detailed instructions.

The installation process creates a deployment called controller-manager and uses a custom resource definition named weightsandbiases.apps.wandb.com (shortName: wandb), that takes a single spec and applies it to the cluster:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: weightsandbiases.apps.wandb.com

The controller-manager installs charts/operator-wandb based on the spec of the custom resource, release channel, and a user defined config. The configuration specification hierarchy enables maximum configuration flexibility at the user end and enables W&B to release new images, configurations, features, and Helm updates automatically.

Refer to the configuration specification hierarchy and configuration reference for configuration options.

Configuration specification hierarchy

Configuration specifications follow a hierarchical model where higher-level specifications override lower-level ones. Here’s how it works:

  • Release Channel Values: This base level configuration sets default values and configurations based on the release channel set by W&B for the deployment.
  • User Input Values: Users can override the default settings provided by the Release Channel Spec through the System Console.
  • Custom Resource Values: The highest level of specification, which comes from the user. Any values specified here override both the User Input and Release Channel specifications. For a detailed description of the configuration options, see Configuration Reference.

This hierarchical model ensures that configurations are flexible and customizable to meet varying needs while maintaining a manageable and systematic approach to upgrades and changes.
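The precedence order above (release channel values, then user input values, then custom resource values) can be illustrated with a recursive dictionary merge. This is a minimal sketch that assumes key-by-key merging of nested maps, similar to how Helm merges values files; the operator’s exact semantics (for example, how lists are handled) may differ, and in this sketch lists are replaced wholesale. The function names and example values are hypothetical.

```python
import copy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict in which keys from `override` win; nested dicts merge recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)  # scalars and lists replace the lower level
    return merged


def effective_config(release_channel: Dict[str, Any],
                     user_input: Dict[str, Any],
                     custom_resource: Dict[str, Any]) -> Dict[str, Any]:
    # Lowest precedence first, so each later layer overrides the earlier ones.
    return deep_merge(deep_merge(release_channel, user_input), custom_resource)


release_channel = {"global": {"host": "https://default.example.com", "extraEnv": {"A": "1"}}}
user_input = {"global": {"extraEnv": {"A": "2", "B": "3"}}}
custom_resource = {"global": {"host": "https://wandb.example.com"}}
print(effective_config(release_channel, user_input, custom_resource))
# {'global': {'host': 'https://wandb.example.com', 'extraEnv': {'A': '2', 'B': '3'}}}
```

Keys that a higher layer does not mention keep their lower-level values, which is what lets W&B ship new release channel defaults without overwriting the settings you pinned.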

Requirements to use the W&B Kubernetes Operator

Satisfy the following requirements to deploy W&B with the W&B Kubernetes operator:

Refer to the reference architecture. In addition, obtain a valid W&B Server license.

See this guide for a detailed explanation on how to set up and configure a self-managed installation.

Depending on the installation method, you might need to meet the following requirements:

  • kubectl is installed and configured with the correct Kubernetes cluster context.
  • Helm is installed.

Air-gapped installations

See the Deploy W&B in airgapped environment with Kubernetes tutorial on how to install the W&B Kubernetes Operator in an airgapped environment.

Deploy W&B Server application

This section describes different ways to deploy the W&B Kubernetes operator.

Choose one of the following:

  • If you have provisioned all required external services and want to deploy W&B onto Kubernetes with Helm CLI, continue here.
  • If you prefer managing infrastructure and the W&B Server with Terraform, continue here.
  • If you want to utilize the W&B Cloud Terraform Modules, continue here.

Deploy W&B with Helm CLI

W&B provides a Helm Chart to deploy the W&B Kubernetes operator to a Kubernetes cluster. This approach allows you to deploy W&B Server with Helm CLI or a continuous delivery tool like ArgoCD. Make sure that the above mentioned requirements are in place.

Follow these steps to install the W&B Kubernetes Operator with Helm CLI:

  1. Add the W&B Helm repository. The W&B Helm chart is available in the W&B Helm repository. Add the repo with the following commands:
    helm repo add wandb https://charts.wandb.ai
    helm repo update
  2. Install the Operator on a Kubernetes cluster. Copy and paste the following:
    helm upgrade --install operator wandb/operator -n wandb-cr --create-namespace
  3. Configure the W&B operator custom resource to trigger the W&B Server installation. Create an operator.yaml file to customize the W&B Operator deployment, specifying your custom configuration. See Configuration Reference for details.

    After you create the specification YAML and fill in your values, run the following command. The operator applies the configuration and installs the W&B Server application based on it.

    kubectl apply -f operator.yaml
    

    Wait until the deployment completes. This takes a few minutes.

  4. To verify the installation using the web UI, create the first admin user account, then follow the verification steps outlined in Verify the installation.

Deploy W&B with Helm Terraform Module

This method allows for customized deployments tailored to specific requirements, leveraging Terraform’s infrastructure-as-code approach for consistency and repeatability. The official W&B Helm-based Terraform Module is located here.

The following code can be used as a starting point and includes all necessary configuration options for a production grade deployment.

module "wandb" {
  source  = "wandb/wandb/helm"

  spec = {
    values = {
      global = {
        host    = "https://<HOST_URI>"
        license = "eyJhbGnUzaH...j9ZieKQ2x5GGfw"

        bucket = {
          # <details depend on the provider>
        }

        mysql = {
          # <redacted>
        }
      }

      ingress = {
        annotations = {
          "a" = "b"
          "x" = "y"
        }
      }
    }
  }
}

Note that the configuration options are the same as described in Configuration Reference, but that the syntax has to follow the HashiCorp Configuration Language (HCL). The Terraform module creates the W&B custom resource definition (CRD).

To see how W&B itself uses the Helm Terraform module to deploy Dedicated Cloud installations for customers, follow these links:

Deploy W&B with W&B Cloud Terraform modules

W&B provides a set of Terraform modules for AWS, GCP, and Azure. These modules deploy entire infrastructures, including Kubernetes clusters, load balancers, and MySQL databases, as well as the W&B Server application. The W&B Kubernetes Operator is already built into the following versions of the official W&B cloud-specific Terraform modules:

Terraform Registry    Source Code                                         Version
AWS                   https://github.com/wandb/terraform-aws-wandb        v4.0.0+
Azure                 https://github.com/wandb/terraform-azurerm-wandb    v2.0.0+
GCP                   https://github.com/wandb/terraform-google-wandb     v2.0.0+

This integration ensures that W&B Kubernetes Operator is ready to use for your instance with minimal setup, providing a streamlined path to deploying and managing W&B Server in your cloud environment.

For a detailed description of how to use these modules, refer to the self-managed installations section in the docs.

Verify the installation

To verify the installation, W&B recommends using the W&B CLI. The verify command executes several tests that verify all components and configurations.

Follow these steps to verify the installation:

  1. Install the W&B CLI:

    pip install wandb
    
  2. Log in to W&B:

    wandb login --host=https://YOUR_DNS_DOMAIN
    

    For example:

    wandb login --host=https://wandb.company-name.com
    
  3. Verify the installation:

    wandb verify
    

A successful installation and fully working W&B deployment shows the following output:

Default host selected:  https://wandb.company-name.com
Find detailed logs for this test at: /var/folders/pn/b3g3gnc11_sbsykqkm3tx5rh0000gp/T/tmpdtdjbxua/wandb
Checking if logged in...................................................✅
Checking signed URL upload..............................................✅
Checking ability to send large payloads through proxy...................✅
Checking requests to base url...........................................✅
Checking requests made over signed URLs.................................✅
Checking CORs configuration of the bucket...............................✅
Checking wandb package version is up to date............................✅
Checking logged metrics, saving and downloading a file..................✅
Checking artifact save and download workflows...........................✅
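If you run the verification in a CI job, you may want to flag any check that did not pass. The sketch below scans the captured output of `wandb verify` for lines that start with "Checking" and do not end with the ✅ marker. The failing line in the sample (ending in ❌) is invented for illustration; the real failure marker may differ, which is why the sketch only tests for the absence of ✅. The helper name `failed_checks` is hypothetical.

```python
from typing import List


def failed_checks(verify_output: str) -> List[str]:
    """Return the names of checks in `wandb verify` output that lack a ✅ marker."""
    failures = []
    for raw_line in verify_output.splitlines():
        line = raw_line.strip()
        if not line.startswith("Checking"):
            continue  # skip lines such as "Default host selected: ..."
        name = line.partition("..")[0]  # the text before the run of dots
        if not line.endswith("✅"):
            failures.append(name)
    return failures


sample = """Default host selected:  https://wandb.company-name.com
Checking if logged in...................................................✅
Checking signed URL upload..............................................❌
"""
print(failed_checks(sample))  # ['Checking signed URL upload']
```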

Access the W&B Management Console

The W&B Kubernetes operator comes with a management console. It is located at ${HOST_URI}/console, for example https://wandb.company-name.com/console.

There are two ways to log in to the management console:

Option 1: From the W&B application

  1. Open the W&B application in the browser and log in. Log in to the W&B application at ${HOST_URI}/, for example https://wandb.company-name.com/
  2. Access the console. Click the icon in the top right corner and then click System console. Only users with admin privileges can see the System console entry.

Option 2: Directly, with the generated password

  1. Open the console application in the browser. Open the URL described above, which redirects you to the login screen.
  2. Retrieve the password from the Kubernetes secret that the installation generates:
    kubectl get secret wandb-password -o jsonpath='{.data.password}' | base64 -d
    
    Copy the password.
  3. Log in to the console. Paste the copied password, then click Login.
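The `base64 -d` step is needed because Kubernetes stores Secret `data` values base64-encoded. If you script this step, a minimal Python equivalent that decodes the JSON printed by `kubectl get secret wandb-password -o json` might look like the following. The helper name `secret_value` is hypothetical, and the demo secret is built locally rather than read from a cluster.

```python
import base64
import json


def secret_value(secret_json: str, key: str = "password") -> str:
    """Decode one field of a Kubernetes Secret printed with `kubectl get secret -o json`."""
    secret = json.loads(secret_json)
    # Secret .data values are base64-encoded; decode to bytes, then to text.
    return base64.b64decode(secret["data"][key]).decode("utf-8")


# Locally constructed stand-in for the kubectl output.
example = json.dumps({"kind": "Secret", "data": {"password": base64.b64encode(b"s3cr3t").decode()}})
print(secret_value(example))  # s3cr3t
```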

Update the W&B Kubernetes operator

This section describes how to update the W&B Kubernetes operator.

Copy and paste the code snippets below into your terminal.

  1. First, update the repo with helm repo update:

    helm repo update
    
  2. Next, update the Helm chart with helm upgrade:

    helm upgrade operator wandb/operator -n wandb-cr --reuse-values
    

Update the W&B Server application

If you use the W&B Kubernetes operator, you no longer need to update the W&B Server application yourself.

The operator automatically updates your W&B Server application when W&B releases a new version of the software.

Migrate self-managed instances to W&B Operator

The following sections describe how to migrate from self-managing your own W&B Server installation to having the W&B Operator manage it for you. The migration process depends on how you installed W&B Server:

Migrate to Operator-based AWS Terraform Modules

For a detailed description of the migration process, continue here.

Migrate to Operator-based GCP Terraform Modules

Reach out to Customer Support or your W&B team if you have any questions or need assistance.

Migrate to Operator-based Azure Terraform Modules

Reach out to Customer Support or your W&B team if you have any questions or need assistance.

Migrate to Operator-based Helm chart

Follow these steps to migrate to the Operator-based Helm chart:

  1. Get the current W&B configuration. If W&B was deployed with a non-operator-based version of the Helm chart, export the values like this:

    helm get values wandb
    

    If W&B was deployed with Kubernetes manifests, export the values like this:

    kubectl get deployment wandb -o yaml
    

    You now have all the configuration values you need for the next step.

  2. Create a file called operator.yaml. Follow the format described in the Configuration Reference. Use the values from step 1.

  3. Scale the current deployment to 0 pods. This step stops the current deployment.

    kubectl scale --replicas=0 deployment wandb
    
  4. Update the Helm chart repo:

    helm repo update
    
  5. Install the new Helm chart:

    helm upgrade --install operator wandb/operator -n wandb-cr --create-namespace
    
  6. Configure the new Helm chart and trigger the W&B application deployment. Apply the new configuration.

    kubectl apply -f operator.yaml
    

    The deployment takes a few minutes to complete.

  7. Verify the installation. Make sure that everything works by following the steps in Verify the installation.

  8. Remove the old installation. Uninstall the old Helm chart or delete the resources that were created with manifests.

Migrate to Operator-based Terraform Helm chart

Follow these steps to migrate to the Operator-based Terraform Helm chart:

  1. Prepare the Terraform config. Replace the Terraform code from the old deployment in your Terraform config with the one that is described here. Set the same variables as before. Do not change your .tfvars file if you have one.
  2. Run Terraform. Execute terraform init, terraform plan, and terraform apply.
  3. Verify the installation. Make sure that everything works by following the steps in Verify the installation.
  4. Remove the old installation. Uninstall the old Helm chart or delete the resources that were created with manifests.

Configuration Reference for W&B Server

This section describes the configuration options for the W&B Server application. The application receives its configuration as a custom resource of kind WeightsAndBiases. Some configuration options are exposed in the configuration below; others must be set as environment variables.

The documentation has two lists of environment variables: basic and advanced. Use environment variables only if the configuration option you need is not exposed through the Helm chart.

The W&B Server application configuration file for a production deployment requires the following contents. This YAML file defines the desired state of your W&B deployment, including the version, environment variables, external resources like databases, and other necessary settings.

apiVersion: apps.wandb.com/v1
kind: WeightsAndBiases
metadata:
  labels:
    app.kubernetes.io/name: weightsandbiases
    app.kubernetes.io/instance: wandb
  name: wandb
  namespace: default
spec:
  values:
    global:
      host: https://<HOST_URI>
      license: eyJhbGnUzaH...j9ZieKQ2x5GGfw
      bucket:
        <details depend on the provider>
      mysql:
        <redacted>
    ingress:
      annotations:
        <redacted>

Find the full set of values in the W&B Helm repository, and change only those values you need to override.

Complete example

This is an example configuration that uses GCP Kubernetes with GCP Ingress and GCS (GCP Object storage):

apiVersion: apps.wandb.com/v1
kind: WeightsAndBiases
metadata:
  labels:
    app.kubernetes.io/name: weightsandbiases
    app.kubernetes.io/instance: wandb
  name: wandb
  namespace: default
spec:
  values:
    global:
      host: https://abc-wandb.sandbox-gcp.wandb.ml
      bucket:
        name: abc-wandb-moving-pipefish
        provider: gcs
      mysql:
        database: wandb_local
        host: 10.218.0.2
        name: wandb_local
        password: 8wtX6cJHizAZvYScjDzZcUarK4zZGjpV
        port: 3306
        user: wandb
      license: eyJhbGnUzaHgyQjQyQWhEU3...ZieKQ2x5GGfw
    ingress:
      annotations:
        ingress.gcp.kubernetes.io/pre-shared-cert: abc-wandb-cert-creative-puma
        kubernetes.io/ingress.class: gce
        kubernetes.io/ingress.global-static-ip-name: abc-wandb-operator-address

Host

global:
  # Provide the FQDN with protocol
  # Example host name, replace with your own
  host: https://abc-wandb.sandbox-gcp.wandb.ml

Object storage (bucket)

AWS

global:
  bucket:
    provider: "s3"
    name: ""
    kmsKey: ""
    region: ""

GCP

global:
  bucket:
    provider: "gcs"
    name: ""

Azure

global:
  bucket:
    provider: "az"
    name: ""
    secretKey: ""

Other providers (Minio, Ceph, etc.)

For other S3-compatible providers, set the bucket configuration as an environment variable as follows:

global:
  extraEnv:
    "BUCKET": "s3://wandb:changeme@mydb.com/wandb?tls=true"

The variable contains a connection string in this form:

s3://$ACCESS_KEY:$SECRET_KEY@$HOST/$BUCKET_NAME

You can optionally tell W&B to connect only over TLS if you configure a trusted SSL certificate for your object store. To do so, add the tls query parameter to the URL:

s3://$ACCESS_KEY:$SECRET_KEY@$HOST/$BUCKET_NAME?tls=true
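The following Python sketch assembles that connection string from its parts. Percent-encoding the keys is our own assumption. It is meant for secret keys that contain reserved characters such as / or +, so confirm that your W&B version decodes them before relying on it.

```python
from urllib.parse import quote


def s3_bucket_url(access_key: str, secret_key: str, host: str, bucket: str, tls: bool = False) -> str:
    """Build an s3://ACCESS_KEY:SECRET_KEY@HOST/BUCKET_NAME connection string."""
    # Percent-encode the credentials so that characters such as "/" or "+"
    # in a secret key are not read as URL delimiters.
    user = quote(access_key, safe="")
    password = quote(secret_key, safe="")
    url = f"s3://{user}:{password}@{host}/{bucket}"
    return (url + "?tls=true") if tls else url


print(s3_bucket_url("wandb", "changeme", "mydb.com", "wandb", tls=True))
# s3://wandb:changeme@mydb.com/wandb?tls=true
```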

MySQL

global:
  mysql:
    # Example values, replace with your own
    database: wandb_local
    host: 10.218.0.2
    name: wandb_local
    password: 8wtX6cJH...ZcUarK4zZGjpV
    port: 3306
    user: wandb
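Some tooling expects these fields as a single connection string, such as the mysql:// string that the AWS Terraform module later on this page passes as database_connection_string. The Python sketch below shows one way to combine the fields. Treat the URL layout (user, password, host, port, database) and the default port of 3306 as assumptions to check against your tooling. The sketch does not use the name field.

```python
from urllib.parse import quote


def mysql_dsn(mysql: dict) -> str:
    """Combine global.mysql-style fields into mysql://USER:PASSWORD@HOST:PORT/DATABASE."""
    # Percent-encode credentials so characters such as "@" do not break the URL.
    user = quote(mysql["user"], safe="")
    password = quote(mysql["password"], safe="")
    port = mysql.get("port", 3306)
    return f"mysql://{user}:{password}@{mysql['host']}:{port}/{mysql['database']}"


print(mysql_dsn({"database": "wandb_local", "host": "10.218.0.2", "password": "secret", "port": 3306, "user": "wandb"}))
# mysql://wandb:secret@10.218.0.2:3306/wandb_local
```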

License

global:
  # Example license, replace with your own
  license: eyJhbGnUzaHgyQjQy...VFnPS_KETXg1hi

Ingress

To identify the ingress class, see this FAQ entry.

Without TLS

global:
# IMPORTANT: Ingress is on the same level in the YAML as ‘global’ (not a child)
ingress:
  class: ""

With TLS

Create a secret that contains the certificate:

kubectl create secret tls wandb-ingress-tls --key wandb-ingress-tls.key --cert wandb-ingress-tls.crt

Reference the secret in the ingress configuration:

global:
# IMPORTANT: Ingress is on the same level in the YAML as ‘global’ (not a child)
ingress:
  class: ""
  annotations:
    {}
    # kubernetes.io/ingress.class: nginx
    # kubernetes.io/tls-acme: "true"
  tls: 
    - secretName: wandb-ingress-tls
      hosts:
        - <HOST_URI>

If you use the NGINX ingress controller, you might need to add the following annotation, which raises the maximum allowed request body size:

ingress:
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: 64m

Custom Kubernetes ServiceAccounts

Specify custom Kubernetes service accounts to run the W&B pods.

The following snippet creates a service account as part of the deployment with the specified name:

app:
  serviceAccount:
    name: custom-service-account
    create: true

parquet:
  serviceAccount:
    name: custom-service-account
    create: true

global:
  ...

The subsystems “app” and “parquet” run under the specified service account. The other subsystems run under the default service account.

If the service account already exists on the cluster, set create: false:

app:
  serviceAccount:
    name: custom-service-account
    create: false

parquet:
  serviceAccount:
    name: custom-service-account
    create: false
    
global:
  ...

You can specify service accounts on different subsystems such as app, parquet, console, and others:

app:
  serviceAccount:
    name: custom-service-account
    create: true

console:
  serviceAccount:
    name: custom-service-account
    create: true

global:
  ...

The service accounts can be different between the subsystems:

app:
  serviceAccount:
    name: custom-service-account
    create: false

console:
  serviceAccount:
    name: another-custom-service-account
    create: true

global:
  ...

External Redis

redis:
  install: false

global:
  redis:
    host: ""
    port: 6379
    password: ""
    parameters: {}
    caCert: ""

Alternatively, store the Redis password in a Kubernetes secret:

kubectl create secret generic redis-secret --from-literal=redis-password=supersecret

Then reference the secret in the configuration:

redis:
  install: false

global:
  redis:
    host: redis.example
    port: 9001
    auth:
      enabled: true
      secret: redis-secret
      key: redis-password
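To confirm that the secret named in auth.secret actually contains the key named in auth.key, you can inspect it with kubectl get secret redis-secret -o json. Kubernetes stores Secret data base64-encoded, so the Python sketch below decodes one key from that JSON output. The function name is hypothetical.

```python
import base64
import json


def read_secret_value(secret_json: str, key: str) -> str:
    """Decode one key from the output of `kubectl get secret <name> -o json`."""
    secret = json.loads(secret_json)
    data = secret.get("data", {})
    if key not in data:
        name = secret.get("metadata", {}).get("name", "<unknown>")
        raise KeyError(f"secret {name!r} has no key {key!r}")
    # Secret values are base64-encoded in the API object.
    return base64.b64decode(data[key]).decode("utf-8")


# Shaped like the secret created above; normally you would read kubectl's output.
sample = json.dumps({
    "metadata": {"name": "redis-secret"},
    "data": {"redis-password": base64.b64encode(b"supersecret").decode("ascii")},
})
print(read_secret_value(sample, "redis-password"))  # supersecret
```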

LDAP

Without TLS

global:
  ldap:
    enabled: true
    # LDAP server address including "ldap://" or "ldaps://"
    host:
    # LDAP search base to use for finding users
    baseDN:
    # LDAP user to bind with (if not using anonymous bind)
    bindDN:
    # Secret name and key with LDAP password to bind with (if not using anonymous bind)
    bindPW:
    # LDAP attribute for email and group ID attribute names as comma separated string values.
    attributes:
    # LDAP group allow list
    groupAllowList:
    # Enable LDAP TLS
    tls: false

With TLS

The LDAP TLS certificate configuration requires a pre-created ConfigMap that contains the certificate.

To create the ConfigMap, run the following command:

kubectl create configmap ldap-tls-cert --from-file=certificate.crt

Then reference the ConfigMap in the YAML, as in the following example:

global:
  ldap:
    enabled: true
    # LDAP server address including "ldap://" or "ldaps://"
    host:
    # LDAP search base to use for finding users
    baseDN:
    # LDAP user to bind with (if not using anonymous bind)
    bindDN:
    # Secret name and key with LDAP password to bind with (if not using anonymous bind)
    bindPW:
    # LDAP attribute for email and group ID attribute names as comma separated string values.
    attributes:
    # LDAP group allow list
    groupAllowList:
    # Enable LDAP TLS
    tls: true
    # ConfigMap name and key with CA certificate for LDAP server
    tlsCert:
      configMap:
        name: "ldap-tls-cert"
        key: "certificate.crt"

OIDC SSO

global: 
  auth:
    sessionLengthHours: 720
    oidc:
      clientId: ""
      secret: ""
      authMethod: ""
      issuer: ""
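Before you fill in issuer, it can help to confirm that the identity provider publishes an OpenID Connect discovery document. Per the OpenID Connect Discovery 1.0 specification, that document lives at the issuer URL with /.well-known/openid-configuration appended, after removing any trailing slash. The Python sketch below builds that URL. Requiring https is a choice made here, and the issuer values in the examples are hypothetical.

```python
from urllib.parse import urlparse


def discovery_url(issuer: str) -> str:
    """Return the OpenID Connect discovery document URL for an issuer."""
    parsed = urlparse(issuer)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"issuer should be an https URL, got {issuer!r}")
    # Discovery appends the well-known path to the issuer, minus a trailing slash.
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


print(discovery_url("https://login.example.com/realms/wandb/"))
# https://login.example.com/realms/wandb/.well-known/openid-configuration
```

The issuer field in the returned document should match the value you configure here exactly.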

SMTP

global:
  email:
    smtp:
      host: ""
      port: 587
      user: ""
      password: ""

Environment Variables

global:
  extraEnv:
    GLOBAL_ENV: "example"

Custom certificate authority

customCACerts is a list and can take many certificates. Certificate authorities specified in customCACerts only apply to the W&B Server application.

global:
  customCACerts:
  - |
    -----BEGIN CERTIFICATE-----
    MIIBnDCCAUKgAwIBAg.....................fucMwCgYIKoZIzj0EAwIwLDEQ
    MA4GA1UEChMHSG9tZU.....................tZUxhYiBSb290IENBMB4XDTI0
    MDQwMTA4MjgzMFoXDT.....................oNWYggsMo8O+0mWLYMAoGCCqG
    SM49BAMCA0gAMEUCIQ.....................hwuJgyQRaqMI149div72V2QIg
    P5GD+5I+02yEp58Cwxd5Bj2CvyQwTjTO4hiVl1Xd0M0=
    -----END CERTIFICATE-----    
  - |
    -----BEGIN CERTIFICATE-----
    MIIBxTCCAWugAwIB.......................qaJcwCgYIKoZIzj0EAwIwLDEQ
    MA4GA1UEChMHSG9t.......................tZUxhYiBSb290IENBMB4XDTI0
    MDQwMTA4MjgzMVoX.......................UK+moK4nZYvpNpqfvz/7m5wKU
    SAAwRQIhAIzXZMW4.......................E8UFqsCcILdXjAiA7iTluM0IU
    aIgJYVqKxXt25blH/VyBRzvNhViesfkNUQ==
    -----END CERTIFICATE-----    

Configuration Reference for W&B Operator

This section describes configuration options for the W&B Kubernetes operator (wandb-controller-manager). The operator receives its configuration in the form of a YAML file.

By default, the W&B Kubernetes operator does not need a configuration file. Create one only if required, for example to specify custom certificate authorities or to deploy in an air-gapped environment.

Find the full list of spec customization in the Helm repository.

Custom CA

The custom certificate authorities list (customCACerts) can take many certificates. Certificate authorities added here apply only to the W&B Kubernetes operator (wandb-controller-manager).

customCACerts:
- |
  -----BEGIN CERTIFICATE-----
  MIIBnDCCAUKgAwIBAg.....................fucMwCgYIKoZIzj0EAwIwLDEQ
  MA4GA1UEChMHSG9tZU.....................tZUxhYiBSb290IENBMB4XDTI0
  MDQwMTA4MjgzMFoXDT.....................oNWYggsMo8O+0mWLYMAoGCCqG
  SM49BAMCA0gAMEUCIQ.....................hwuJgyQRaqMI149div72V2QIg
  P5GD+5I+02yEp58Cwxd5Bj2CvyQwTjTO4hiVl1Xd0M0=
  -----END CERTIFICATE-----  
- |
  -----BEGIN CERTIFICATE-----
  MIIBxTCCAWugAwIB.......................qaJcwCgYIKoZIzj0EAwIwLDEQ
  MA4GA1UEChMHSG9t.......................tZUxhYiBSb290IENBMB4XDTI0
  MDQwMTA4MjgzMVoX.......................UK+moK4nZYvpNpqfvz/7m5wKU
  SAAwRQIhAIzXZMW4.......................E8UFqsCcILdXjAiA7iTluM0IU
  aIgJYVqKxXt25blH/VyBRzvNhViesfkNUQ==
  -----END CERTIFICATE-----  

FAQ

How to get the W&B Operator Console password

See Accessing the W&B Kubernetes Operator Management Console.

How to access the W&B Operator Console if Ingress doesn’t work

Execute the following command on a host that can reach the Kubernetes cluster:

kubectl port-forward svc/wandb-console 8082

Access the console in your browser at https://localhost:8082/.

See Accessing the W&B Kubernetes Operator Management Console on how to get the password (Option 2).

How to view W&B Server logs

The application pod is named wandb-app-xxx. List the pods to find its full name, then view its logs:

kubectl get pods
kubectl logs wandb-app-XXXXX-XXXXX

How to identify the Kubernetes ingress class

To list the ingress classes installed in your cluster, run:

kubectl get ingressclass

1.3.2.1 - Kubernetes operator for air-gapped instances

Deploy W&B Platform with Kubernetes Operator (Airgapped)

Introduction

This guide provides step-by-step instructions to deploy the W&B Platform in air-gapped customer-managed environments.

Use an internal repository or registry to host the Helm charts and container images. Run all commands in a shell console with proper access to the Kubernetes cluster.

You can use similar commands in any continuous delivery tool that you use to deploy Kubernetes applications.

Step 1: Prerequisites

Before starting, make sure your environment meets the following requirements:

  • Kubernetes version >= 1.28
  • Helm version >= 3
  • Access to an internal container registry with the required W&B images
  • Access to an internal Helm repository for W&B Helm charts

Step 2: Prepare internal container registry

Before proceeding with the deployment, you must ensure that the following container images are available in your internal container registry. These images are critical for the successful deployment of W&B components.

wandb/local                                             0.59.2
wandb/console                                           2.12.2
wandb/controller                                        1.13.0
otel/opentelemetry-collector-contrib                    0.97.0
bitnami/redis                                           7.2.4-debian-12-r9
quay.io/prometheus/prometheus                           v2.47.0
quay.io/prometheus-operator/prometheus-config-reloader  v0.67.0
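One common way to populate the internal registry is to pull each image on a machine with internet access, retag it, and push it to your registry. The Python sketch below only prints those docker commands for the list above. It assumes the docker CLI and a registry host of registry.yourdomain.com. It also assumes a naming scheme that keeps the last path component, so wandb/local becomes registry.yourdomain.com/local, as in the CRD example later on this page. Adjust the mapping to your registry's layout; the operator values example below, for instance, uses a library/ prefix.

```python
IMAGES = [
    ("wandb/local", "0.59.2"),
    ("wandb/console", "2.12.2"),
    ("wandb/controller", "1.13.0"),
    ("otel/opentelemetry-collector-contrib", "0.97.0"),
    ("bitnami/redis", "7.2.4-debian-12-r9"),
    ("quay.io/prometheus/prometheus", "v2.47.0"),
    ("quay.io/prometheus-operator/prometheus-config-reloader", "v0.67.0"),
]


def mirror_commands(images: list[tuple[str, str]], registry: str) -> list[str]:
    """Return docker pull/tag/push commands that copy each image into `registry`."""
    commands = []
    for source, tag in images:
        # Keep only the last path component, e.g. quay.io/prometheus/prometheus -> prometheus.
        name = source.rsplit("/", 1)[-1]
        target = f"{registry}/{name}:{tag}"
        commands += [
            f"docker pull {source}:{tag}",
            f"docker tag {source}:{tag} {target}",
            f"docker push {target}",
        ]
    return commands


print("\n".join(mirror_commands(IMAGES, "registry.yourdomain.com")))
```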

Step 3: Prepare internal Helm chart repository

Along with the container images, you also must ensure that the following Helm charts are available in your internal Helm Chart repository.

The operator chart deploys the W&B Operator, also called the Controller Manager. The platform chart deploys the W&B Platform using the values configured in the custom resource definition (CRD).

Step 4: Set up Helm repository

Now, configure the Helm repository to pull the W&B Helm charts from your internal repository. Run the following commands to add and update the Helm repository:

helm repo add local-repo https://charts.yourdomain.com
helm repo update

Step 5: Install the Kubernetes operator

The W&B Kubernetes operator, also known as the controller manager, is responsible for managing the W&B platform components. To install it in an air-gapped environment, you must configure it to use your internal container registry.

To do so, you must override the default image settings to use your internal container registry and set the key airgapped: true to indicate the expected deployment type. Update the values.yaml file as shown below:

image:
  repository: registry.yourdomain.com/library/controller
  tag: 1.13.3
airgapped: true

You can find all supported values in the official Kubernetes operator repository.

Step 6: Configure CustomResourceDefinitions

After installing the W&B Kubernetes operator, you must configure the Custom Resource Definitions (CRDs) to point to your internal Helm repository and container registry.

This configuration ensures that the Kubernetes operator uses your internal registry and repository when it deploys the required components of the W&B platform.

Below is an example of how to configure the CRD.

apiVersion: apps.wandb.com/v1
kind: WeightsAndBiases
metadata:
  labels:
    app.kubernetes.io/instance: wandb
    app.kubernetes.io/name: weightsandbiases
  name: wandb
  namespace: default

spec:
  chart:
    url: http://charts.yourdomain.com
    name: operator-wandb
    version: 0.18.0

  values:
    global:
      host: https://wandb.yourdomain.com
      license: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
      bucket:
        accessKey: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
        secretKey: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
        name: s3.yourdomain.com:port #Ex.: s3.yourdomain.com:9000
        path: bucket_name
        provider: s3
        region: us-east-1
      mysql:
        database: wandb
        host: mysql.home.lab
        password: password
        port: 3306
        user: wandb
    
    # Ensure it's set to use your own MySQL
    mysql:
      install: false

    app:
      image:
        repository: registry.yourdomain.com/local
        tag: 0.59.2

    console:
      image:
        repository: registry.yourdomain.com/console
        tag: 2.12.2

    ingress:
      annotations:
        nginx.ingress.kubernetes.io/proxy-body-size: 64m
      class: nginx

    

To deploy the W&B platform, the Kubernetes operator uses the operator-wandb chart from your internal repository and uses the values from your CRD to configure the Helm chart.

You can find all supported values in the official Kubernetes operator repository.
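A single public image reference is enough to break an air-gapped rollout, so it can be worth scanning the values in your CRD for image repositories outside your registry. The Python sketch below walks a parsed values dict and reports every repository key that does not start with your registry prefix. It assumes images are configured as image.repository, as in the example above. It only sees values you set explicitly; images that come from chart defaults are not covered unless you run it on fully rendered values.

```python
from typing import Any, Iterator


def find_external_images(values: Any, registry: str, path: str = "") -> Iterator[tuple[str, str]]:
    """Yield (path, repository) for each repository not hosted under `registry`."""
    if isinstance(values, dict):
        repo = values.get("repository")
        if isinstance(repo, str) and not repo.startswith(registry.rstrip("/") + "/"):
            yield path or "<root>", repo
        for key, child in values.items():
            yield from find_external_images(child, registry, f"{path}.{key}" if path else key)
    elif isinstance(values, list):
        for index, child in enumerate(values):
            yield from find_external_images(child, registry, f"{path}[{index}]")


values = {
    "app": {"image": {"repository": "registry.yourdomain.com/local", "tag": "0.59.2"}},
    # Left pointing at the public image by mistake:
    "console": {"image": {"repository": "wandb/console", "tag": "2.12.2"}},
}
for where, repo in find_external_images(values, "registry.yourdomain.com"):
    print(f"{where}: {repo}")  # console.image: wandb/console
```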

Step 7: Deploy the W&B platform

Finally, after setting up the Kubernetes operator and the CRD, deploy the W&B platform using the following command:

kubectl apply -f wandb.yaml

FAQ

Refer to the below frequently asked questions (FAQs) and troubleshooting tips during the deployment process:

There is another ingress class. Can that class be used?

Yes, you can configure your ingress class by modifying the ingress settings in values.yaml.

The certificate bundle has more than one certificate. Would that work?

Yes, but you must split the certificates into multiple entries in the customCACerts section of values.yaml.
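If your CA bundle is a single PEM file, the Python sketch below splits it into individual certificates and renders them in the customCACerts list format used earlier on this page. It matches only BEGIN/END CERTIFICATE blocks and does not validate the certificates. The indent parameter lets you nest the list under global for the W&B Server configuration.

```python
import re

PEM_CERT = re.compile(r"-----BEGIN CERTIFICATE-----\s.*?-----END CERTIFICATE-----", re.DOTALL)


def split_pem_bundle(bundle: str) -> list[str]:
    """Return each certificate in a PEM bundle as a separate string."""
    return [match.group(0) for match in PEM_CERT.finditer(bundle)]


def custom_ca_yaml(certs: list[str], indent: str = "") -> str:
    """Render certificates as a customCACerts YAML list of literal blocks."""
    lines = [f"{indent}customCACerts:"]
    for cert in certs:
        lines.append(f"{indent}- |")
        lines.extend(f"{indent}  {line}" for line in cert.splitlines())
    return "\n".join(lines)


bundle = (
    "-----BEGIN CERTIFICATE-----\nAAAA\n-----END CERTIFICATE-----\n"
    "-----BEGIN CERTIFICATE-----\nBBBB\n-----END CERTIFICATE-----\n"
)
certs = split_pem_bundle(bundle)
print(custom_ca_yaml(certs))
```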

How do you prevent the Kubernetes operator from applying unattended updates? Is that possible?

You can turn off auto-updates from the W&B console. Reach out to your W&B team with any questions about supported versions. Also note that W&B supports platform versions released in the last 6 months. W&B recommends performing periodic upgrades.

Does the deployment work if the environment has no connection to public repositories?

As long as you have enabled the airgapped: true configuration, the Kubernetes operator does not attempt to reach public repositories. The Kubernetes operator attempts to use your internal resources.

1.3.3 - Install on public cloud

1.3.3.1 - Deploy W&B Platform on AWS

Hosting W&B Server on AWS.

W&B recommends using the W&B Server AWS Terraform Module to deploy the platform on AWS.

Before you start, W&B recommends that you choose one of the remote backends available for Terraform to store the State File.

The State File is the necessary resource to roll out upgrades or make changes in your deployment without recreating all components.

The Terraform Module deploys the following mandatory components:

  • Load Balancer
  • AWS Identity & Access Management (IAM)
  • AWS Key Management Service (KMS)
  • Amazon Aurora MySQL
  • Amazon VPC
  • Amazon S3
  • Amazon Route 53
  • AWS Certificate Manager (ACM)
  • Elastic Load Balancing (Application Load Balancer)
  • AWS Secrets Manager

Other deployment options can also include the following optional components:

  • Amazon ElastiCache for Redis
  • Amazon SQS

Pre-requisite permissions

The account that runs Terraform must be able to create all the components described in the Introduction. It must also have permission to create IAM policies and IAM roles and to assign roles to resources.

General steps

The steps on this topic are common for any deployment option covered by this documentation.

  1. Prepare the development environment.

    • Install Terraform
    • W&B recommends creating a Git repository for version control.
  2. Create the terraform.tfvars file.

    The tfvars file content can be customized according to the installation type, but the minimum recommended content looks like the example below.

    namespace                  = "wandb"
    license                    = "xxxxxxxxxxyyyyyyyyyyyzzzzzzz"
    subdomain                  = "wandb-aws"
    domain_name                = "wandb.ml"
    zone_id                    = "xxxxxxxxxxxxxxxx"
    allowed_inbound_cidr       = ["0.0.0.0/0"]
    allowed_inbound_ipv6_cidr  = ["::/0"]
    

    Define these variables in your tfvars file before you deploy. The namespace variable is a string that prefixes all resources created by Terraform.

    The combination of subdomain and domain_name forms the FQDN at which W&B is configured. In the example above, the W&B FQDN is wandb-aws.wandb.ml, and zone_id identifies the DNS zone where the FQDN record is created.

    Both allowed_inbound_cidr and allowed_inbound_ipv6_cidr must also be set; they are mandatory inputs to the module. The preceding example permits access from any source to the W&B installation.

  3. Create the file versions.tf

    This file contains the Terraform and Terraform provider configuration required to deploy W&B in AWS.

    provider "aws" {
      region = "eu-central-1"
    
      default_tags {
        tags = {
          GithubRepo  = "terraform-aws-wandb"
          GithubOrg   = "wandb"
          Environment = "Example"
          Example     = "PublicDnsExternal"
        }
      }
    }
    

    Refer to the Terraform Official Documentation to configure the AWS provider.

    Optionally, but highly recommended, add the remote backend configuration mentioned at the beginning of this documentation.

  4. Create the file variables.tf

    For every option set in terraform.tfvars, Terraform requires a corresponding variable declaration.

    variable "namespace" {
      type        = string
      description = "Name prefix used for resources"
    }
    
    variable "domain_name" {
      type        = string
      description = "Domain name used to access instance."
    }
    
    variable "subdomain" {
      type        = string
      default     = null
      description = "Subdomain for accessing the Weights & Biases UI."
    }
    
    variable "license" {
      type = string
    }
    
    variable "zone_id" {
      type        = string
      description = "Domain for creating the Weights & Biases subdomain on."
    }
    
    variable "allowed_inbound_cidr" {
      description = "CIDRs allowed to access wandb-server."
      nullable    = false
      type        = list(string)
    }
    
    variable "allowed_inbound_ipv6_cidr" {
      description = "IPv6 CIDRs allowed to access wandb-server."
      nullable    = false
      type        = list(string)
    }
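The example values open the installation to every address, so you may want to check the CIDR lists before applying. The Python sketch below uses the standard ipaddress module to confirm that each entry is a valid network of the expected IP version, and it flags ranges with a /0 prefix. It does not reproduce any validation the Terraform module itself may perform.

```python
import ipaddress


def review_cidrs(ipv4: list[str], ipv6: list[str]) -> list[str]:
    """Validate allowed_inbound_cidr-style lists and return warnings for open ranges."""
    warnings = []
    for version, cidrs in ((4, ipv4), (6, ipv6)):
        for cidr in cidrs:
            # Raises ValueError for malformed input or networks with host bits set.
            network = ipaddress.ip_network(cidr)
            if network.version != version:
                raise ValueError(f"{cidr} is not an IPv{version} network")
            if network.prefixlen == 0:
                warnings.append(f"{cidr} allows access from any IPv{version} address")
    return warnings


print(review_cidrs(["0.0.0.0/0"], ["::/0"]))
```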
    

Recommended deployment

This is the most straightforward deployment configuration. It creates all mandatory components and installs the latest version of W&B in the Kubernetes cluster.

  1. Create the main.tf

    In the same directory where you created the files in the General Steps, create a file main.tf with the following content:

    module "wandb_infra" {
      source  = "wandb/wandb/aws"
      version = "~>2.0"
    
      namespace   = var.namespace
      domain_name = var.domain_name
      subdomain   = var.subdomain
      zone_id     = var.zone_id
    
      allowed_inbound_cidr           = var.allowed_inbound_cidr
      allowed_inbound_ipv6_cidr      = var.allowed_inbound_ipv6_cidr
    
      public_access                  = true
      external_dns                   = true
      kubernetes_public_access       = true
      kubernetes_public_access_cidrs = ["0.0.0.0/0"]
    }
    
    data "aws_eks_cluster" "app_cluster" {
      name = module.wandb_infra.cluster_id
    }
    
    data "aws_eks_cluster_auth" "app_cluster" {
      name = module.wandb_infra.cluster_id
    }
    
    provider "kubernetes" {
      host                   = data.aws_eks_cluster.app_cluster.endpoint
      cluster_ca_certificate = base64decode(data.aws_eks_cluster.app_cluster.certificate_authority.0.data)
      token                  = data.aws_eks_cluster_auth.app_cluster.token
    }
    
    module "wandb_app" {
      source  = "wandb/wandb/kubernetes"
      version = "~>1.0"
    
      license                    = var.license
      host                       = module.wandb_infra.url
      bucket                     = "s3://${module.wandb_infra.bucket_name}"
      bucket_aws_region          = module.wandb_infra.bucket_region
      bucket_queue               = "internal://"
      database_connection_string = "mysql://${module.wandb_infra.database_connection_string}"
    
      # TF attempts to deploy while the work group is
      # still spinning up if you do not wait
      depends_on = [module.wandb_infra]
    }
    
    output "bucket_name" {
      value = module.wandb_infra.bucket_name
    }
    
    output "url" {
      value = module.wandb_infra.url
    }
    
  2. Deploy W&B

    To deploy W&B, execute the following commands:

    terraform init
    terraform apply -var-file=terraform.tfvars
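The module blocks pin versions with Terraform's pessimistic constraint operator ~>, which lets only the rightmost version component given increase. For example, ~>2.0 accepts any 2.x release at or above 2.0, while ~>1.0.4 accepts 1.0.x releases from 1.0.4 up. The Python sketch below mirrors that rule for plain numeric versions to make the behavior concrete. It skips pre-release tags and single-component constraints, so treat it as an illustration rather than Terraform's resolver.

```python
def parse_version(text: str) -> tuple[int, ...]:
    return tuple(int(part) for part in text.strip().split("."))


def satisfies_pessimistic(version: str, constraint: str) -> bool:
    """Check a numeric version against a Terraform-style "~> X.Y" or "~> X.Y.Z" constraint."""
    base = parse_version(constraint.strip().removeprefix("~>"))
    if len(base) < 2:
        raise ValueError("this sketch handles only constraints with two or more components")
    # Upper bound: drop the rightmost component and increment the one before it.
    upper = base[:-2] + (base[-2] + 1,)
    candidate = parse_version(version)
    width = max(len(base), len(candidate))

    def pad(parts: tuple[int, ...]) -> tuple[int, ...]:
        # Compare equal-length tuples by padding missing components with zeros.
        return parts + (0,) * (width - len(parts))

    return pad(base) <= pad(candidate) < pad(upper)


print(satisfies_pessimistic("2.9.0", "~>2.0"))  # True
print(satisfies_pessimistic("4.7.2", "~>2.0"))  # False
```

With version = "~>2.0", Terraform therefore never selects a 4.x release of the module, such as the 4.7.2 used in the migration section below. Moving to it requires changing the constraint.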
    

Enable Redis

Another deployment option uses Redis to cache the SQL queries and speed up the application response when loading the metrics for the experiments.

You need to add the option create_elasticache_subnet = true to the same main.tf file described in the Recommended deployment section to enable the cache.

module "wandb_infra" {
  source  = "wandb/wandb/aws"
  version = "~>2.0"

  namespace   = var.namespace
  domain_name = var.domain_name
  subdomain   = var.subdomain
  zone_id     = var.zone_id
  create_elasticache_subnet = true
}
[...]

Enable message broker (queue)

The third deployment option enables an external message broker. This is optional because W&B includes an embedded broker, and the external broker does not improve performance.

The AWS resource that provides the message broker is Amazon SQS. To enable it, add the option use_internal_queue = false to the same main.tf described in the Recommended deployment section.

module "wandb_infra" {
  source  = "wandb/wandb/aws"
  version = "~>2.0"

  namespace   = var.namespace
  domain_name = var.domain_name
  subdomain   = var.subdomain
  zone_id     = var.zone_id
  use_internal_queue = false

[...]
}

Other deployment options

You can combine all three deployment options by adding all of their configurations to the same file. The Terraform Module provides several options that can be combined with the standard options and the minimal configuration found in the Recommended deployment section.

Manual configuration

To use an Amazon S3 bucket as a file storage backend for W&B, you need to create a bucket, along with an SQS queue configured to receive object creation notifications from that bucket. Your instance needs permissions to read from this queue.

Create an S3 Bucket and Bucket Notifications

Follow the procedure below to create an Amazon S3 bucket and enable bucket notifications.

  1. Navigate to Amazon S3 in the AWS Console.
  2. Select Create bucket.
  3. Within the Advanced settings, select Add notification within the Events section.
  4. Configure all object creation events to be sent to your SQS queue (see Create an SQS Queue).
Enterprise file storage settings

Enable CORS access. Your CORS configuration should look like the following:

<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<CORSRule>
    <AllowedOrigin>http://YOUR-WANDB-SERVER-IP</AllowedOrigin>
    <AllowedMethod>GET</AllowedMethod>
    <AllowedMethod>PUT</AllowedMethod>
    <AllowedHeader>*</AllowedHeader>
</CORSRule>
</CORSConfiguration>

Create an SQS Queue

Follow the procedure below to create an SQS Queue:

  1. Navigate to Amazon SQS in the AWS Console.
  2. Select Create queue.
  3. From the Details section, select a Standard queue type.
  4. Within the Access policy section, grant permission for the following actions:
  • SendMessage
  • ReceiveMessage
  • ChangeMessageVisibility
  • DeleteMessage
  • GetQueueUrl

Optionally, add an advanced access policy in the Access policy section. For example, the following policy allows the specified S3 bucket to send messages to the queue:

{
    "Version" : "2012-10-17",
    "Statement" : [
      {
        "Effect" : "Allow",
        "Principal" : "*",
        "Action" : ["sqs:SendMessage"],
        "Resource" : "<sqs-queue-arn>",
        "Condition" : {
          "ArnEquals" : { "aws:SourceArn" : "<s3-bucket-arn>" }
        }
      }
    ]
}
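If you create queues from scripts, you can generate the same policy instead of editing it by hand. The Python sketch below fills in the queue and bucket ARNs from their parts, using the arn:aws:sqs:<REGION>:<ACCOUNT>:<QUEUE> and arn:aws:s3:::<BUCKET> formats that appear on this page. The region, account ID, and names in the usage line are placeholders.

```python
import json


def sqs_queue_policy(region: str, account_id: str, queue_name: str, bucket_name: str) -> str:
    """Return a queue policy that lets one S3 bucket send notifications to the queue."""
    queue_arn = f"arn:aws:sqs:{region}:{account_id}:{queue_name}"
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": "*",
                "Action": ["sqs:SendMessage"],
                "Resource": queue_arn,
                # Accept messages only when the source is the W&B bucket.
                "Condition": {"ArnEquals": {"aws:SourceArn": bucket_arn}},
            }
        ],
    }
    return json.dumps(policy, indent=2)


print(sqs_queue_policy("us-east-1", "123456789012", "wandb-queue", "wandb-bucket"))
```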

Grant permissions to node that runs W&B

The node where W&B server is running must be configured to permit access to Amazon S3 and Amazon SQS. Depending on the type of server deployment you have opted for, you may need to add the following policy statements to your node role:

{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Sid":"",
         "Effect":"Allow",
         "Action":"s3:*",
         "Resource":[
            "arn:aws:s3:::<WANDB_BUCKET>",
            "arn:aws:s3:::<WANDB_BUCKET>/*"
         ]
      },
      {
         "Sid":"",
         "Effect":"Allow",
         "Action":[
            "sqs:*"
         ],
         "Resource":"arn:aws:sqs:<REGION>:<ACCOUNT>:<WANDB_QUEUE>"
      }
   ]
}

Configure W&B server

Finally, configure your W&B Server.

  1. Navigate to the W&B settings page at http(s)://YOUR-W&B-SERVER-HOST/system-admin.
  2. Enable the Use an external file storage backend option.
  3. Provide information about your Amazon S3 bucket, region, and Amazon SQS queue in the following format:
  • File Storage Bucket: s3://<bucket-name>
  • File Storage Region (AWS only): <region>
  • Notification Subscription: sqs://<queue-name>
  4. Select Update settings to apply the new settings.

Upgrade your W&B version

Follow the steps outlined here to update W&B:

  1. Add wandb_version to the configuration of your wandb_app module. Provide the version of W&B you want to upgrade to. For example, the following line specifies W&B version 0.48.1:
module "wandb_app" {
    source  = "wandb/wandb/kubernetes"
    version = "~>1.0"

    license       = var.license
    wandb_version = "0.48.1"
    # (other inputs unchanged)
}
  2. After you update your configuration, complete the steps described in the Recommended deployment section.

Migrate to operator-based AWS Terraform modules

This section details the steps required to upgrade from pre-operator to post-operator environments using the terraform-aws-wandb module.

Before and after architecture

Previously, the W&B architecture used:

module "wandb_infra" {
  source  = "wandb/wandb/aws"
  version = "1.16.10"
  ...
}

to control the infrastructure, and this module to deploy the W&B Server:

module "wandb_app" {
  source  = "wandb/wandb/kubernetes"
  version = "1.12.0"
}

Post-transition, the architecture uses:

module "wandb_infra" {
  source  = "wandb/wandb/aws"
  version = "4.7.2"
  ...
}

to manage both the infrastructure and the installation of W&B Server on the Kubernetes cluster, which eliminates the need for the module "wandb_app" in post-operator.tf.

This architectural shift enables additional features (like OpenTelemetry, Prometheus, HPAs, Kafka, and image updates) without requiring manual Terraform operations by SRE/Infrastructure teams.

To start from a base pre-operator installation of W&B, ensure that post-operator.tf has a .disabled file extension and that pre-operator.tf is active (that is, it does not have a .disabled extension). Those files can be found here.

Prerequisites

Before initiating the migration process, ensure the following prerequisites are met:

  • Egress: The deployment can’t be airgapped. It needs access to deploy.wandb.ai to get the latest spec for the Release Channel.
  • AWS Credentials: Proper AWS credentials configured to interact with your AWS resources.
  • Terraform Installed: The latest version of Terraform should be installed on your system.
  • Route53 Hosted Zone: An existing Route53 hosted zone corresponding to the domain under which the application will be served.
  • Pre-Operator Terraform Files: Ensure pre-operator.tf and associated variable files like pre-operator.tfvars are correctly set up.

Pre-Operator set up

Execute the following Terraform commands to initialize and apply the configuration for the Pre-Operator setup:

terraform init -upgrade
terraform apply -var-file=./pre-operator.tfvars

pre-operator.tfvars should look something like this:

namespace     = "operator-upgrade"
domain_name   = "sandbox-aws.wandb.ml"
zone_id       = "Z032246913CW32RVRY0WU"
subdomain     = "operator-upgrade"
wandb_license = "ey..."
wandb_version = "0.51.2"

The pre-operator.tf configuration calls two modules:

module "wandb_infra" {
  source  = "wandb/wandb/aws"
  version = "1.16.10"
  ...
}

This module spins up the infrastructure.

module "wandb_app" {
  source  = "wandb/wandb/kubernetes"
  version = "1.12.0"
}

This module deploys the application.

Post-Operator Setup

Make sure that pre-operator.tf has a .disabled extension, and post-operator.tf is active.

The post-operator.tfvars includes additional variables:

...
# wandb_version = "0.51.2" is now managed via the Release Channel or set in the User Spec.

# Required Operator Variables for Upgrade:
size                 = "small"
enable_dummy_dns     = true
enable_operator_alb  = true
custom_domain_filter = "sandbox-aws.wandb.ml"

Run the following commands to initialize and apply the Post-Operator configuration:

terraform init -upgrade
terraform apply -var-file=./post-operator.tfvars

The plan and apply steps will update the following resources:

actions:
  create:
    - aws_efs_backup_policy.storage_class
    - aws_efs_file_system.storage_class
    - aws_efs_mount_target.storage_class["0"]
    - aws_efs_mount_target.storage_class["1"]
    - aws_eks_addon.efs
    - aws_iam_openid_connect_provider.eks
    - aws_iam_policy.secrets_manager
    - aws_iam_role_policy_attachment.ebs_csi
    - aws_iam_role_policy_attachment.eks_efs
    - aws_iam_role_policy_attachment.node_secrets_manager
    - aws_security_group.storage_class_nfs
    - aws_security_group_rule.nfs_ingress
    - random_pet.efs
    - aws_s3_bucket_acl.file_storage
    - aws_s3_bucket_cors_configuration.file_storage
    - aws_s3_bucket_ownership_controls.file_storage
    - aws_s3_bucket_server_side_encryption_configuration.file_storage
    - helm_release.operator
    - helm_release.wandb
    - aws_cloudwatch_log_group.this[0]
    - aws_iam_policy.default
    - aws_iam_role.default
    - aws_iam_role_policy_attachment.default
    - helm_release.external_dns
    - aws_default_network_acl.this[0]
    - aws_default_route_table.default[0]
    - aws_iam_policy.default
    - aws_iam_role.default
    - aws_iam_role_policy_attachment.default
    - helm_release.aws_load_balancer_controller

  update_in_place:
    - aws_iam_policy.node_IMDSv2
    - aws_iam_policy.node_cloudwatch
    - aws_iam_policy.node_kms
    - aws_iam_policy.node_s3
    - aws_iam_policy.node_sqs
    - aws_eks_cluster.this[0]
    - aws_elasticache_replication_group.default
    - aws_rds_cluster.this[0]
    - aws_rds_cluster_instance.this["1"]
    - aws_default_security_group.this[0]
    - aws_subnet.private[0]
    - aws_subnet.private[1]
    - aws_subnet.public[0]
    - aws_subnet.public[1]
    - aws_launch_template.workers["primary"]

  destroy:
    - kubernetes_config_map.config_map
    - kubernetes_deployment.wandb
    - kubernetes_priority_class.priority
    - kubernetes_secret.secret
    - kubernetes_service.prometheus
    - kubernetes_service.service
    - random_id.snapshot_identifier[0]

  replace:
    - aws_autoscaling_attachment.autoscaling_attachment["primary"]
    - aws_route53_record.alb
    - aws_eks_node_group.workers["primary"]
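To produce a grouped summary like the one above from your own plan, one option is to save the plan and export it as JSON, then group each entry in `resource_changes` by its `change.actions` list. The following is a minimal sketch of that grouping. The function names are illustrative, and the mapping from action lists to the labels above is an assumption based on Terraform's JSON plan format, where a replacement appears as a delete/create pair.

```python
import json
from collections import defaultdict
from typing import Dict, List

# Map the action lists in Terraform's JSON plan to the labels used above.
# A replacement appears as ["delete", "create"], or as ["create", "delete"]
# when the resource uses create_before_destroy.
ACTION_LABELS = {
    ("create",): "create",
    ("update",): "update_in_place",
    ("delete",): "destroy",
    ("delete", "create"): "replace",
    ("create", "delete"): "replace",
}


def summarize_plan(plan: dict) -> Dict[str, List[str]]:
    """Group resource addresses by planned action; no-op and read are skipped."""
    summary: Dict[str, List[str]] = defaultdict(list)
    for change in plan.get("resource_changes", []):
        label = ACTION_LABELS.get(tuple(change["change"]["actions"]))
        if label is not None:
            summary[label].append(change["address"])
    return dict(summary)


def print_plan_summary(path: str) -> None:
    """Print the grouped summary for a plan exported with `terraform show -json`."""
    with open(path) as f:
        summary = summarize_plan(json.load(f))
    for label, addresses in summary.items():
        print(f"{label}:")
        for address in addresses:
            print(f"  - {address}")
```

For example, run `terraform plan -var-file=./post-operator.tfvars -out=tfplan` and `terraform show -json tfplan > plan.json`, then call `print_plan_summary("plan.json")`. Addresses in the JSON output include module prefixes such as `module.wandb_infra.`, so they may be longer than the names listed above.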

You should see output similar to the following:

[Screenshot: post-operator apply output]

Note that post-operator.tf contains a single module:

module "wandb_infra" {
  source  = "wandb/wandb/aws"
  version = "4.7.2"
  ...
}

Changes in the post-operator configuration:

  1. Update Required Providers: Change required_providers.aws.version from 3.6 to 4.0 for provider compatibility.
  2. DNS and Load Balancer Configuration: Integrate enable_dummy_dns and enable_operator_alb to manage DNS records and AWS Load Balancer setup through an Ingress.
  3. License and Size Configuration: Transfer the license and size parameters directly to the wandb_infra module to match new operational requirements.
  4. Custom Domain Handling: If necessary, use custom_domain_filter to troubleshoot DNS issues by checking the External DNS pod logs within the kube-system namespace.
  5. Helm Provider Configuration: Enable and configure the Helm provider to manage Kubernetes resources effectively:
provider "helm" {
  kubernetes {
    host                   = data.aws_eks_cluster.app_cluster.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.app_cluster.certificate_authority[0].data)
    token                  = data.aws_eks_cluster_auth.app_cluster.token
    exec {
      api_version = "client.authentication.k8s.io/v1beta1"
      args        = ["eks", "get-token", "--cluster-name", data.aws_eks_cluster.app_cluster.name]
      command     = "aws"
    }
  }
}

This setup completes the transition from the Pre-Operator to the Post-Operator configuration and lets you use the capabilities enabled by the operator model.

1.3.3.2 - Deploy W&B Platform on GCP

Hosting W&B Server on GCP.

If you have decided to self-manage W&B Server, W&B recommends using the W&B Server GCP Terraform Module to deploy the platform on GCP.

The module documentation is extensive and describes all available options.

Before you start, W&B recommends that you choose one of the remote backends available for Terraform to store the State File.

The State File is the necessary resource to roll out upgrades or make changes in your deployment without recreating all components.

The Terraform Module will deploy the following mandatory components:

  • VPC
  • Cloud SQL for MySQL
  • Cloud Storage Bucket
  • Google Kubernetes Engine
  • KMS Crypto Key
  • Load Balancer

Other deployment options can also include the following optional components:

  • Memorystore for Redis
  • Pub/Sub message system

Pre-requisite permissions

The account that runs Terraform needs the roles/owner role in the target GCP project.

General steps

The steps in this topic are common to every deployment option covered by this documentation.

  1. Prepare the development environment.

    • Install Terraform
    • We recommend creating a Git repository with the code that will be used, but you can keep your files locally.
    • Create a project in Google Cloud Console
    • Authenticate with GCP (install gcloud first): gcloud auth application-default login
  2. Create the terraform.tfvars file.

    The tfvars file content can be customized according to the installation type, but the minimum recommended configuration looks like the example below.

    project_id  = "wandb-project"
    region      = "europe-west2"
    zone        = "europe-west2-a"
    namespace   = "wandb"
    license     = "xxxxxxxxxxyyyyyyyyyyyzzzzzzz"
    subdomain   = "wandb-gcp"
    domain_name = "wandb.ml"
    

    Decide on these values before you deploy. The namespace variable is a string that prefixes all resources created by Terraform.

    The combination of subdomain and domain_name forms the FQDN on which W&B is configured. In the example above, the W&B FQDN is wandb-gcp.wandb.ml.

  3. Create the file variables.tf

    For every option configured in terraform.tfvars, Terraform requires a corresponding variable declaration.

    variable "project_id" {
      type        = string
      description = "Project ID"
    }
    
    variable "region" {
      type        = string
      description = "Google region"
    }
    
    variable "zone" {
      type        = string
      description = "Google zone"
    }
    
    variable "namespace" {
      type        = string
      description = "Namespace prefix used for resources"
    }
    
    variable "domain_name" {
      type        = string
      description = "Domain name for accessing the Weights & Biases UI."
    }
    
    variable "subdomain" {
      type        = string
      description = "Subdomain for accessing the Weights & Biases UI."
    }
    
    variable "license" {
      type        = string
      description = "W&B License"
    }
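Because each key in terraform.tfvars needs a matching variable block in variables.tf, a quick consistency check can catch a misspelled or missing name before you run Terraform. The following minimal sketch assumes the flat `name = value` layout and the `variable "name" {` declarations shown above. It uses regular expressions rather than a real HCL parser, and the function names are illustrative, so `terraform plan` remains the authoritative check.

```python
import re

# Matches "name = value" lines in a flat .tfvars file; "#" comment lines never match.
TFVARS_KEY = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_-]*)\s*=", re.MULTILINE)
# Matches 'variable "name" {' declarations in variables.tf.
VARIABLE_DECL = re.compile(r'^\s*variable\s+"([^"]+)"\s*\{', re.MULTILINE)


def undeclared_tfvars(tfvars_text: str, variables_text: str) -> set:
    """Return tfvars keys that have no matching variable declaration."""
    keys = set(TFVARS_KEY.findall(tfvars_text))
    declared = set(VARIABLE_DECL.findall(variables_text))
    return keys - declared


def missing_tfvars(tfvars_text: str, variables_text: str) -> set:
    """Return declared variables that the tfvars file does not set.

    Variables that have a default are still reported, so treat this as a hint.
    """
    keys = set(TFVARS_KEY.findall(tfvars_text))
    declared = set(VARIABLE_DECL.findall(variables_text))
    return declared - keys
```

Reading both files with `open(...).read()` and passing the text to these functions reports, for example, a tfvars key named `licnese` as undeclared and `license` as missing.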
    

Recommended deployment option

This is the most straightforward deployment configuration. It creates all mandatory components and installs the latest version of W&B in the Kubernetes cluster.

  1. Create the main.tf

    In the same directory where you created the files in the General Steps, create a file main.tf with the following content:

    provider "google" {
     project = var.project_id
     region  = var.region
     zone    = var.zone
    }
    
    provider "google-beta" {
     project = var.project_id
     region  = var.region
     zone    = var.zone
    }
    
    data "google_client_config" "current" {}
    
    provider "kubernetes" {
      host                   = "https://${module.wandb.cluster_endpoint}"
      cluster_ca_certificate = base64decode(module.wandb.cluster_ca_certificate)
      token                  = data.google_client_config.current.access_token
    }
    
    # Spin up all required services
    module "wandb" {
      source  = "wandb/wandb/google"
      version = "~> 5.0"
    
      namespace   = var.namespace
      license     = var.license
      domain_name = var.domain_name
      subdomain   = var.subdomain
    }
    
    # You'll want to update your DNS with the provisioned IP address
    output "url" {
      value = module.wandb.url
    }
    
    output "address" {
      value = module.wandb.address
    }
    
    output "bucket_name" {
      value = module.wandb.bucket_name
    }
    
  2. Deploy W&B

    To deploy W&B, execute the following commands:

    terraform init
    terraform apply -var-file=terraform.tfvars
    

Deployment with Redis Cache

Another deployment option uses Redis to cache SQL queries and speed up application response when loading experiment metrics.

To enable the cache, add the option create_redis = true to the same main.tf file described in the recommended deployment option section.

[...]

module "wandb" {
  source  = "wandb/wandb/google"
  version = "~> 1.0"

  namespace    = var.namespace
  license      = var.license
  domain_name  = var.domain_name
  subdomain    = var.subdomain
  allowed_inbound_cidrs = ["*"]
  #Enable Redis
  create_redis = true

}
[...]

Deployment with External Queue

This deployment option enables an external message broker. It is optional because W&B includes an embedded broker, and it does not improve performance.

On GCP, Pub/Sub provides the message broker. To enable it, add the option use_internal_queue = false to the same main.tf described in the recommended deployment option section.

[...]

module "wandb" {
  source  = "wandb/wandb/google"
  version = "~> 1.0"

  namespace          = var.namespace
  license            = var.license
  domain_name        = var.domain_name
  subdomain          = var.subdomain
  allowed_inbound_cidrs = ["*"]
  #Create and use Pub/Sub
  use_internal_queue = false

}

[...]

Other deployment options

You can combine all three deployment options by adding all of their configurations to the same file. The Terraform Module provides several additional options that you can combine with the standard options and with the minimal configuration in the recommended deployment option section.

Manual configuration

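To use a GCP Storage bucket as a file storage backend for W&B, you need to create a Pub/Sub topic and subscription, a Storage bucket, and a Pub/Sub notification, as described in the following sections.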

Create PubSub Topic and Subscription

Follow the procedure below to create a PubSub topic and subscription:

  1. Navigate to the Pub/Sub service within the GCP Console
  2. Select Create Topic and provide a name for your topic.
  3. At the bottom of the page, select Create subscription. Ensure Delivery Type is set to Pull.
  4. Click Create.

Make sure the service account or account that your instance runs as has the pubsub.admin role on this subscription. For details, see https://cloud.google.com/pubsub/docs/access-control#console.

Create Storage Bucket

  1. Navigate to the Cloud Storage Buckets page.
  2. Select Create bucket and provide a name for your bucket. Ensure you choose a Standard storage class.

Ensure that the service account or account that your instance runs as has both:

  1. Enable CORS access. This can only be done using the command line. First, create a file with the following CORS configuration:
cors:
- maxAgeSeconds: 3600
  method:
   - GET
   - PUT
  origin:
   - '<YOUR_W&B_SERVER_HOST>'
  responseHeader:
   - Content-Type

Note that the scheme, host, and port of the values for the origin must match exactly.

  1. Make sure you have gcloud installed and that you are logged into the correct GCP project.
  2. Next, run the following:
gcloud storage buckets update gs://<BUCKET_NAME> --cors-file=<CORS_CONFIG_FILE>
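Because the origin must match the scheme, host, and port of your W&B server exactly, it can help to derive it from the server URL rather than type it by hand. The sketch below normalizes the origin with Python's `urllib.parse` (browsers omit default ports in the Origin header) and writes the same rule shown above as JSON, keeping the top-level `cors` key from the example. The function names and the `cors.json` file name are hypothetical, and some Cloud Storage examples use a bare JSON array of rules instead, so check the format your gcloud version accepts before applying it.

```python
import json
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def cors_origin(server_url: str) -> str:
    """Reduce a W&B server URL to the scheme://host[:port] form browsers send
    in the Origin header (default ports are omitted, paths are dropped)."""
    parts = urlsplit(server_url)
    if parts.scheme not in DEFAULT_PORTS or not parts.hostname:
        raise ValueError(f"expected an http(s) URL, got {server_url!r}")
    origin = f"{parts.scheme}://{parts.hostname}"
    if parts.port and parts.port != DEFAULT_PORTS[parts.scheme]:
        origin += f":{parts.port}"
    return origin


def gcs_cors_config(server_url: str) -> dict:
    """Build the CORS rule from the example above for a single W&B origin."""
    return {
        "cors": [
            {
                "maxAgeSeconds": 3600,
                "method": ["GET", "PUT"],
                "origin": [cors_origin(server_url)],
                "responseHeader": ["Content-Type"],
            }
        ]
    }


def write_cors_file(server_url: str, path: str = "cors.json") -> None:
    """Write the configuration to a file for gcloud's --cors-file flag."""
    with open(path, "w") as f:
        json.dump(gcs_cors_config(server_url), f, indent=2)
```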

Create PubSub Notification

Follow the procedure below in your command line to create a notification stream from the Storage Bucket to the Pub/Sub topic.

  1. Log into your GCP Project.
  2. Run the following in your terminal:
gcloud pubsub topics list  # list names of topics for reference
gcloud storage ls          # list names of buckets for reference

# create bucket notification
gcloud storage buckets notifications create gs://<BUCKET_NAME> --topic=<TOPIC_NAME>

Further reference is available on the Cloud Storage website.

Configure W&B server

  1. Finally, navigate to the W&B System Connections page at http(s)://YOUR-W&B-SERVER-HOST/console/settings/system.
  2. Select the provider Google Cloud Storage (gcs).
  3. Provide the name of the GCS bucket.
  4. Press Update settings to apply the new settings.

Upgrade W&B Server

Follow the steps outlined here to update W&B:

  1. Add wandb_version to the configuration of your wandb_app module and set it to the version of W&B you want to upgrade to. For example, the following block specifies W&B version 0.58.1:
module "wandb_app" {
    source  = "wandb/wandb/kubernetes"
    version = "~>5.0"

    license       = var.license
    wandb_version = "0.58.1"
}
  2. After you update your configuration, complete the steps described in the recommended deployment option section.

1.3.3.3 - Deploy W&B Platform on Azure

Hosting W&B Server on Azure.

If you have decided to self-manage W&B Server, W&B recommends using the W&B Server Azure Terraform Module to deploy the platform on Azure.

The module documentation is extensive and contains all available options that can be used. We will cover some deployment options in this document.

Before you start, we recommend you choose one of the remote backends available for Terraform to store the State File.

The State File is the necessary resource to roll out upgrades or make changes in your deployment without recreating all components.

The Terraform Module will deploy the following mandatory components:

  • Azure Resource Group
  • Azure Virtual Network (VPC)
  • Azure MySQL Flexible Server
  • Azure Storage Account & Blob Storage
  • Azure Kubernetes Service
  • Azure Application Gateway

Other deployment options can also include the following optional components:

  • Azure Cache for Redis
  • Azure Event Grid

Pre-requisite permissions

The simplest way to configure the AzureRM provider is through the Azure CLI, but for automation an Azure Service Principal can also be useful. Regardless of the authentication method, the account that runs Terraform must be able to create all the components listed above.

General steps

The steps in this topic are common to every deployment option covered by this documentation.

  1. Prepare the development environment.
  • Install Terraform
  • We recommend creating a Git repository with the code that will be used, but you can keep your files locally.
  2. Create the terraform.tfvars file. The tfvars file content can be customized according to the installation type, but the minimum recommended configuration looks like the example below.

     namespace     = "wandb"
     license       = "xxxxxxxxxxyyyyyyyyyyyzzzzzzz"
     subdomain     = "wandb-aws"
     domain_name   = "wandb.ml"
     location      = "westeurope"
    

    Decide on these values before you deploy. The namespace variable is a string that prefixes all resources created by Terraform.

    The combination of subdomain and domain_name forms the FQDN on which W&B is configured. In the example above, the W&B FQDN is wandb-aws.wandb.ml.

  3. Create the file versions.tf. This file contains the Terraform and Terraform provider versions required to deploy W&B on Azure.

terraform {
  required_version = "~> 1.3"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.17"
    }
  }
}

Refer to the Terraform Official Documentation to configure the AzureRM provider.

Optionally, but highly recommended, add the remote backend configuration mentioned at the beginning of this document.

  4. Create the file variables.tf. For every option configured in terraform.tfvars, Terraform requires a corresponding variable declaration.
  variable "namespace" {
    type        = string
    description = "String used for prefix resources."
  }

  variable "location" {
    type        = string
    description = "Azure Resource Group location"
  }

  variable "domain_name" {
    type        = string
    description = "Domain for accessing the Weights & Biases UI."
  }

  variable "subdomain" {
    type        = string
    default     = null
    description = "Subdomain for accessing the Weights & Biases UI."
  }

  variable "license" {
    type        = string
    description = "Your wandb/local license"
  }

Recommended deployment option

This is the most straightforward deployment configuration. It creates all mandatory components and installs the latest version of W&B in the Kubernetes cluster.

  1. Create the main.tf file. In the same directory where you created the files in the General steps, create a file main.tf with the following content:
provider "azurerm" {
  features {}
}

provider "kubernetes" {
  host                   = module.wandb.cluster_host
  cluster_ca_certificate = base64decode(module.wandb.cluster_ca_certificate)
  client_key             = base64decode(module.wandb.cluster_client_key)
  client_certificate     = base64decode(module.wandb.cluster_client_certificate)
}

provider "helm" {
  kubernetes {
    host                   = module.wandb.cluster_host
    cluster_ca_certificate = base64decode(module.wandb.cluster_ca_certificate)
    client_key             = base64decode(module.wandb.cluster_client_key)
    client_certificate     = base64decode(module.wandb.cluster_client_certificate)
  }
}

# Spin up all required services
module "wandb" {
  source  = "wandb/wandb/azurerm"
  version = "~> 1.2"

  namespace   = var.namespace
  location    = var.location
  license     = var.license
  domain_name = var.domain_name
  subdomain   = var.subdomain

  deletion_protection = false

  tags = {
    "Example" : "PublicDns"
  }
}

output "address" {
  value = module.wandb.address
}

output "url" {
  value = module.wandb.url
}
  2. Deploy W&B. To deploy W&B, run the following commands:

    terraform init
    terraform apply -var-file=terraform.tfvars
    

Deployment with Redis Cache

Another deployment option uses Redis to cache SQL queries and speed up application response when loading experiment metrics.

To enable the cache, add the option create_redis = true to the same main.tf file that you used in the recommended deployment option.

# Spin up all required services
module "wandb" {
  source  = "wandb/wandb/azurerm"
  version = "~> 1.2"


  namespace   = var.namespace
  location    = var.location
  license     = var.license
  domain_name = var.domain_name
  subdomain   = var.subdomain

  create_redis       = true # Create Redis
  [...]
}

Deployment with External Queue

This deployment option enables an external message broker. It is optional because W&B includes an embedded broker, and it does not improve performance.

On Azure, Azure Event Grid provides the message broker. To enable it, add the option use_internal_queue = false to the same main.tf that you used in the recommended deployment option.

# Spin up all required services
module "wandb" {
  source  = "wandb/wandb/azurerm"
  version = "~> 1.2"


  namespace   = var.namespace
  location    = var.location
  license     = var.license
  domain_name = var.domain_name
  subdomain   = var.subdomain

  use_internal_queue       = false # Enable Azure Event Grid
  [...]
}

Other deployment options

You can combine all three deployment options by adding all of their configurations to the same file. The Terraform Module provides several additional options that you can combine with the standard options and with the minimal configuration in the recommended deployment option.

1.3.3.4 - Reference Architecture

W&B Reference Architecture

This page describes a reference architecture for a Weights & Biases deployment and outlines the recommended infrastructure and resources to support a production deployment of the platform.

Depending on your chosen deployment environment for Weights & Biases (W&B), various services can help to enhance the resiliency of your deployment.

For instance, major cloud providers offer robust managed database services which help to reduce the complexity of database configuration, maintenance, high availability, and resilience.

This reference architecture addresses some common deployment scenarios and shows how you can integrate your W&B deployment with cloud vendor services for optimal performance and reliability.

Before you start

Running any application in production comes with its own set of challenges, and W&B is no exception. While we aim to streamline the process, certain complexities may arise depending on your unique architecture and design decisions. Typically, managing a production deployment involves overseeing various components, including hardware, operating systems, networking, storage, security, the W&B platform itself, and other dependencies. This responsibility extends to both the initial setup of the environment and its ongoing maintenance.

Consider carefully whether a self-managed approach with W&B is suitable for your team and specific requirements.

A strong understanding of how to run and maintain production-grade applications is an important prerequisite before you deploy self-managed W&B. If your team needs assistance, our Professional Services team and partners offer support for implementation and optimization.

To learn more about managed solutions for running W&B instead of managing it yourself, refer to W&B Multi-tenant Cloud and W&B Dedicated Cloud.

Infrastructure

W&B infrastructure diagram

Application layer

The application layer consists of a multi-node Kubernetes cluster, with resilience against node failures. The Kubernetes cluster runs and maintains W&B’s pods.

Storage layer

The storage layer consists of a MySQL database and object storage. The MySQL database stores metadata and the object storage stores artifacts such as models and datasets.

Infrastructure requirements

Kubernetes

The W&B Server application is deployed as a Kubernetes Operator that deploys multiple Pods. For this reason, W&B requires a Kubernetes cluster with:

  • A fully configured and functioning Ingress controller
  • The capability to provision Persistent Volumes.

MySQL

W&B stores metadata in a MySQL database. The database’s performance and storage requirements depend on the shapes of the model parameters and related metadata. For example, the database grows in size as you track more training runs, and load on the database increases based on queries in run tables, user workspaces, and reports.

Consider the following when you deploy a self-managed MySQL database:

  • Backups. You should periodically back up the database to a separate facility. W&B recommends daily backups with at least 1 week of retention.
  • Performance. The disk the server is running on should be fast. W&B recommends running the database on an SSD or accelerated NAS.
  • Monitoring. Monitor the database for load. If CPU usage stays above 40% of the system for more than 5 minutes, the server is likely resource-starved.
  • Availability. Depending on your availability and durability requirements, you might want to configure a hot standby on a separate machine that streams all updates in real time from the primary server and can take over if the primary server crashes or becomes corrupted.
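The monitoring guideline above (CPU above 40% for more than 5 minutes) can be expressed as a simple rule over periodic samples. This is a minimal sketch that assumes you already collect (timestamp, CPU percent) pairs from your monitoring system; the sample format and function name are hypothetical, and the rule measures the time between the first and last high samples of an uninterrupted run.

```python
from typing import Iterable, Optional, Tuple

CPU_THRESHOLD_PERCENT = 40.0
SUSTAINED_SECONDS = 5 * 60


def is_resource_starved(samples: Iterable[Tuple[float, float]],
                        threshold: float = CPU_THRESHOLD_PERCENT,
                        window: float = SUSTAINED_SECONDS) -> bool:
    """Return True if CPU usage stayed above `threshold` percent for longer
    than `window` seconds. `samples` are (unix_timestamp, cpu_percent)
    pairs in chronological order."""
    run_start: Optional[float] = None  # first timestamp of the current high run
    for ts, cpu in samples:
        if cpu > threshold:
            if run_start is None:
                run_start = ts
            if ts - run_start > window:
                return True
        else:
            run_start = None  # any sample at or below the threshold resets the run
    return False
```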

Object storage

W&B requires object storage with pre-signed URL and CORS support, deployed in Amazon S3, Azure Blob Storage, Google Cloud Storage, or a storage service compatible with Amazon S3.
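A pre-signed URL embeds a time-limited signature in its query string, so a client can read or write a single object without holding the storage credentials. For S3-compatible stores this is typically AWS Signature Version 4 in query-parameter form. The following stdlib-only sketch illustrates how such a GET URL is built with path-style addressing (common with MinIO). The endpoint, bucket, key, credentials, and the `us-east-1` default region are placeholders that must match your store's configuration. In practice you would use an S3 SDK rather than hand-rolled signing.

```python
import datetime
import hashlib
import hmac
from typing import Optional
from urllib.parse import quote


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presign_get_url(endpoint: str, bucket: str, key: str,
                    access_key: str, secret_key: str,
                    region: str = "us-east-1", expires: int = 3600,
                    now: Optional[datetime.datetime] = None,
                    scheme: str = "https") -> str:
    """Return a SigV4 query-string pre-signed GET URL (path-style addressing).

    `endpoint` is host[:port] exactly as clients send it in the Host header.
    """
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    scope = f"{date_stamp}/{region}/s3/aws4_request"

    # Path-style URI (/bucket/key); '/' inside the key is kept as a separator.
    canonical_uri = "/" + quote(bucket, safe="") + "/" + quote(key, safe="/")
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    # Query parameters are sorted by name and fully percent-encoded ('/' -> %2F).
    canonical_query = "&".join(
        f"{quote(k, safe='')}={quote(v, safe='')}" for k, v in sorted(params.items())
    )
    canonical_request = "\n".join([
        "GET",
        canonical_uri,
        canonical_query,
        f"host:{endpoint}\n",  # canonical headers block ends with a newline
        "host",                # signed headers
        "UNSIGNED-PAYLOAD",    # pre-signed URLs do not sign the body
    ])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    # Signing key chain: date -> region -> service -> "aws4_request".
    signing_key = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    for part in (region, "s3", "aws4_request"):
        signing_key = _hmac_sha256(signing_key, part)
    signature = hmac.new(signing_key, string_to_sign.encode("utf-8"),
                         hashlib.sha256).hexdigest()
    return (f"{scheme}://{endpoint}{canonical_uri}?{canonical_query}"
            f"&X-Amz-Signature={signature}")
```

If the store supports pre-signed URLs, a plain HTTP GET on the returned URL should succeed until the expiry time passes, without any credentials in the request.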

Versions

  • Kubernetes: at least version 1.29.
  • MySQL: at least 8.0.

Networking

In a deployment connected to a public or private network, egress to the following endpoints is required during installation and during runtime:

  • https://deploy.wandb.ai
  • https://charts.wandb.ai
  • https://docker.io
  • https://quay.io
  • https://gcr.io
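To confirm that a node can reach the endpoints above, a basic TCP connectivity check is a reasonable first test. The sketch below is an illustrative helper, not a W&B tool. It opens a connection to port 443 of each host and reports which ones fail. It does not validate TLS or HTTP responses and does not go through HTTP proxies, so if your egress uses a proxy, test through the proxy instead.

```python
import socket
from typing import Callable, Dict, Iterable
from urllib.parse import urlsplit

# Endpoints listed above that must be reachable during installation and runtime.
REQUIRED_ENDPOINTS = (
    "https://deploy.wandb.ai",
    "https://charts.wandb.ai",
    "https://docker.io",
    "https://quay.io",
    "https://gcr.io",
)


def check_egress(urls: Iterable[str] = REQUIRED_ENDPOINTS, timeout: float = 5.0,
                 connect: Callable = socket.create_connection) -> Dict[str, bool]:
    """Try a TCP connection to each endpoint's host and port (443 for https).

    `connect` is injectable so the check can be tested without a network.
    """
    results = {}
    for url in urls:
        parts = urlsplit(url)
        port = parts.port or (443 if parts.scheme == "https" else 80)
        try:
            conn = connect((parts.hostname, port), timeout=timeout)
            conn.close()
            results[url] = True
        except OSError:  # DNS failures, refused connections, and timeouts
            results[url] = False
    return results
```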

Access to W&B and to the object storage is required for the training infrastructure and for each system that tracks experiments.

DNS

The fully qualified domain name (FQDN) of the W&B deployment must resolve to the IP address of the ingress/load balancer using an A record.
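Before you point clients at the deployment, you can check that the FQDN already resolves to the load balancer's address (for example, the value of the `address` Terraform output in the GCP and Azure examples). The sketch below uses `socket.gethostbyname_ex`, which returns IPv4 addresses only. The resolver is injectable for testing, and the function name is hypothetical. It checks resolution rather than the record type, so a CNAME chain that ends at the right IP also passes.

```python
import socket
from typing import Callable, List, Tuple

Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def fqdn_resolves_to(fqdn: str, expected_ip: str,
                     resolve: Resolver = socket.gethostbyname_ex) -> bool:
    """Return True if `fqdn` resolves (IPv4) to `expected_ip`."""
    try:
        _, _, addresses = resolve(fqdn)
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
    return expected_ip in addresses
```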

SSL/TLS

W&B requires a valid signed SSL/TLS certificate for secure communication between clients and the server. SSL/TLS termination must occur on the ingress/load balancer. The W&B Server application does not terminate SSL or TLS connections.

Please note: W&B does not recommend the use of self-signed certificates and custom CAs.

Supported CPU architectures

W&B runs on the Intel (x86) CPU architecture. ARM is not supported.

Infrastructure provisioning

Terraform is the recommended way to deploy W&B for production. Using Terraform, you define the required resources, their references to other resources, and their dependencies. W&B provides Terraform modules for the major cloud providers. For details, refer to Deploy W&B Server within self managed cloud accounts.

Sizing

Use the following general guidelines as a starting point when planning a deployment. W&B recommends that you monitor all components of a new deployment closely and that you make adjustments based on observed usage patterns. Continue to monitor production deployments over time and make adjustments as needed to maintain optimal performance.

Models only

Kubernetes

Environment | CPU | Memory | Disk
Test/Dev | 2 cores | 16 GB | 100 GB
Production | 8 cores | 64 GB | 100 GB

Numbers are per Kubernetes worker node.

MySQL

Environment | CPU | Memory | Disk
Test/Dev | 2 cores | 16 GB | 100 GB
Production | 8 cores | 64 GB | 500 GB

Numbers are per MySQL node.

Weave only

Kubernetes

Environment | CPU | Memory | Disk
Test/Dev | 4 cores | 32 GB | 100 GB
Production | 12 cores | 96 GB | 100 GB

Numbers are per Kubernetes worker node.

MySQL

Environment | CPU | Memory | Disk
Test/Dev | 2 cores | 16 GB | 100 GB
Production | 8 cores | 64 GB | 500 GB

Numbers are per MySQL node.

Models and Weave

Kubernetes

Environment | CPU | Memory | Disk
Test/Dev | 4 cores | 32 GB | 100 GB
Production | 16 cores | 128 GB | 100 GB

Numbers are per Kubernetes worker node.

MySQL

Environment | CPU | Memory | Disk
Test/Dev | 2 cores | 16 GB | 100 GB
Production | 8 cores | 64 GB | 500 GB

Numbers are per MySQL node.
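The sizing tables give per-node figures, so the total capacity of a cluster depends on how many worker nodes you run. This sketch encodes the Kubernetes and MySQL starting points from the tables above and multiplies them out. The node count is an assumption you choose, since this page only calls for a multi-node cluster, and the names are illustrative.

```python
from typing import Dict, NamedTuple


class NodeSpec(NamedTuple):
    cores: int
    memory_gb: int
    disk_gb: int


# Per-node Kubernetes starting points from the tables above.
K8S_NODE_SIZING: Dict[str, Dict[str, NodeSpec]] = {
    "models": {"test": NodeSpec(2, 16, 100), "prod": NodeSpec(8, 64, 100)},
    "weave": {"test": NodeSpec(4, 32, 100), "prod": NodeSpec(12, 96, 100)},
    "models+weave": {"test": NodeSpec(4, 32, 100), "prod": NodeSpec(16, 128, 100)},
}

# The MySQL starting point is the same for all three product mixes.
MYSQL_NODE_SIZING: Dict[str, NodeSpec] = {
    "test": NodeSpec(2, 16, 100),
    "prod": NodeSpec(8, 64, 500),
}


def cluster_capacity(products: str, env: str, node_count: int) -> NodeSpec:
    """Total Kubernetes capacity for `node_count` identically sized workers."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    spec = K8S_NODE_SIZING[products][env]
    return NodeSpec(spec.cores * node_count,
                    spec.memory_gb * node_count,
                    spec.disk_gb * node_count)
```

For example, three production worker nodes for Models and Weave add up to 48 cores, 384 GB of memory, and 300 GB of disk.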

Cloud provider instance recommendations

Services

Cloud | Kubernetes | MySQL | Object Storage
AWS | EKS | RDS Aurora | S3
GCP | GKE | Google Cloud SQL - MySQL | Google Cloud Storage (GCS)
Azure | AKS | Azure Database for MySQL | Azure Blob Storage

Machine types

These recommendations apply to each node of a self-managed deployment of W&B in cloud infrastructure.

AWS

Environment | K8s (Models only) | K8s (Weave only) | K8s (Models&Weave) | MySQL
Test/Dev | r6i.large | r6i.xlarge | r6i.xlarge | db.r6g.large
Production | r6i.2xlarge | r6i.4xlarge | r6i.4xlarge | db.r6g.2xlarge

GCP

Environment | K8s (Models only) | K8s (Weave only) | K8s (Models&Weave) | MySQL
Test/Dev | n2-highmem-2 | n2-highmem-4 | n2-highmem-4 | db-n1-highmem-2
Production | n2-highmem-8 | n2-highmem-16 | n2-highmem-16 | db-n1-highmem-8

Azure

Environment | K8s (Models only) | K8s (Weave only) | K8s (Models&Weave) | MySQL
Test/Dev | Standard_E2_v5 | Standard_E4_v5 | Standard_E4_v5 | MO_Standard_E2ds_v4
Production | Standard_E8_v5 | Standard_E16_v5 | Standard_E16_v5 | MO_Standard_E8ds_v4

1.3.4 - Deploy W&B Platform On-premises

Hosting W&B Server on on-premises infrastructure

Reach out to the W&B Sales Team for related question: contact@wandb.com.

Infrastructure guidelines

Before you start deploying W&B, refer to the reference architecture, especially the infrastructure requirements.

Application server

W&B recommends deploying W&B Server into its own namespace, on a node group that spans two availability zones, with the following specifications to provide the best performance, reliability, and availability:

Specification | Value
Bandwidth | Dual 10 Gigabit+ Ethernet Network
Root Disk Bandwidth (Mbps) | 4,750+
Root Disk Provision (GB) | 100+
Core Count | 4
Memory (GiB) | 8

This ensures that W&B Server has sufficient disk space to process the application data and store temporary logs before they are externalized.

It also ensures fast and reliable data transfer, the necessary processing power and memory for smooth operation, and that W&B will not be affected by any noisy neighbors.

It is important to keep in mind that these specifications are minimum requirements, and actual resource needs may vary depending on the specific usage and workload of the W&B application. Monitoring the resource usage and performance of the application is critical to ensure that it operates optimally and to make adjustments as necessary.

Database server

W&B recommends a MySQL 8 database as a metadata store. The shape of the model parameters and related metadata impacts database performance. The database grows as ML practitioners track more training runs, and it incurs read-heavy load when queries are executed in run tables, user workspaces, and reports.

To ensure optimal performance, W&B recommends deploying the W&B database onto a server with the following starting specs:

Specification | Value
Bandwidth | Dual 10 Gigabit+ Ethernet Network
Root Disk Bandwidth (Mbps) | 4,750+
Root Disk Provision (GB) | 1000+
Core Count | 4
Memory (GiB) | 32

Again, W&B recommends monitoring the resource usage and performance of the database to ensure that it operates optimally and to make adjustments as necessary.

Additionally, W&B recommends the following parameter overrides to tune the DB for MySQL 8.

Object storage

W&B is compatible with object storage that supports the S3 API, signed URLs, and CORS. W&B recommends sizing the storage array to the current needs of your practitioners and planning capacity on a regular cadence.

More details on object store configuration can be found in the how-to section.

Some tested and working providers:

Secure Storage Connector

The Secure Storage Connector is not available for teams at this time for bare metal deployments.

MySQL database

There are a number of enterprise services that make operating a scalable MySQL database simpler. W&B recommends looking into one of the following solutions:

https://www.percona.com/software/mysql-database/percona-server

https://github.com/mysql/mysql-operator

Satisfy the conditions below if you run W&B Server with MySQL 8.0, or when you upgrade from MySQL 5.7 to 8.0:

binlog_format = 'ROW'
innodb_online_alter_log_max_size = 268435456
sync_binlog = 1
innodb_flush_log_at_trx_commit = 1
binlog_row_image = 'MINIMAL'

Due to some changes in the way that MySQL 8.0 handles sort_buffer_size, you might need to update the sort_buffer_size parameter from its default value of 262144. The recommendation is to set the value to 67108864 (64MiB) to ensure that MySQL works efficiently with W&B. MySQL supports this configuration starting with v8.0.28.

Database considerations

Create a database and a user with the following SQL queries. Replace SOME_PASSWORD with a password of your choice:

CREATE USER 'wandb_local'@'%' IDENTIFIED BY 'SOME_PASSWORD';
CREATE DATABASE wandb_local CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
GRANT ALL ON wandb_local.* TO 'wandb_local'@'%' WITH GRANT OPTION;
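Rather than typing a password into these statements by hand, you can generate a random one and render the SQL from a script. This sketch uses Python's `secrets` module. The function names are hypothetical, and the check that rejects single quotes and backslashes is a simple safeguard that avoids having to escape the password inside a MySQL string literal. Store the generated password somewhere safe, because W&B needs it to connect to the database.

```python
import secrets


def generate_db_password(nbytes: int = 24) -> str:
    """Return a random URL-safe password (letters, digits, '-' and '_')."""
    return secrets.token_urlsafe(nbytes)


def render_wandb_db_sql(password: str, user: str = "wandb_local",
                        database: str = "wandb_local") -> str:
    """Render the CREATE USER, CREATE DATABASE, and GRANT statements shown above."""
    # Refuse characters that would need escaping inside a MySQL string literal.
    if "'" in password or "\\" in password:
        raise ValueError("password must not contain single quotes or backslashes")
    return "\n".join([
        f"CREATE USER '{user}'@'%' IDENTIFIED BY '{password}';",
        f"CREATE DATABASE {database} CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;",
        f"GRANT ALL ON {database}.* TO '{user}'@'%' WITH GRANT OPTION;",
    ])
```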

Parameter group configuration

Ensure that the following parameter groups are set to tune the database performance:

binlog_format = 'ROW'
innodb_online_alter_log_max_size = 268435456
sync_binlog = 1
innodb_flush_log_at_trx_commit = 1
binlog_row_image = 'MINIMAL'
sort_buffer_size = 67108864
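To compare a running server against these values, you can dump its variables with the mysql client in batch mode (for example, `mysql -B -N -e "SHOW VARIABLES" > vars.tsv`, plus your connection options) and diff them. The following is a minimal sketch of that comparison. The function names are illustrative, and values are compared as case-insensitive strings.

```python
from typing import Dict, Optional, Tuple

# Recommended overrides from the parameter group list above.
RECOMMENDED_MYSQL_SETTINGS: Dict[str, str] = {
    "binlog_format": "ROW",
    "innodb_online_alter_log_max_size": "268435456",  # 256 MiB
    "sync_binlog": "1",
    "innodb_flush_log_at_trx_commit": "1",
    "binlog_row_image": "MINIMAL",
    "sort_buffer_size": "67108864",  # 64 MiB = 64 * 1024 * 1024
}


def parse_show_variables(output: str) -> Dict[str, str]:
    """Parse tab-separated `SHOW VARIABLES` output (one name<TAB>value per line)."""
    values = {}
    for line in output.splitlines():
        if "\t" in line:
            name, value = line.split("\t", 1)
            values[name.strip()] = value.strip()
    return values


def mismatched_settings(current: Dict[str, str]) -> Dict[str, Tuple[Optional[str], str]]:
    """Return {name: (current, recommended)} for settings that differ or are missing."""
    diffs = {}
    for name, wanted in RECOMMENDED_MYSQL_SETTINGS.items():
        have = current.get(name)
        if have is None or have.upper() != wanted.upper():
            diffs[name] = (have, wanted)
    return diffs
```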

Object storage

The object store can be externally hosted on a MinIO cluster, or on any Amazon S3 compatible object store that supports signed URLs. Run the following script to check if your object store supports signed URLs.

Additionally, the following CORS policy needs to be applied to the object store.

<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<CORSRule>
    <AllowedOrigin>http://YOUR-W&B-SERVER-IP</AllowedOrigin>
    <AllowedMethod>GET</AllowedMethod>
    <AllowedMethod>PUT</AllowedMethod>
    <AllowedMethod>HEAD</AllowedMethod>
    <AllowedHeader>*</AllowedHeader>
</CORSRule>
</CORSConfiguration>

You can specify your credentials in a connection string when you connect to an Amazon S3 compatible object store. For example, you can specify the following:

s3://$ACCESS_KEY:$SECRET_KEY@$HOST/$BUCKET_NAME

You can optionally tell W&B to only connect over TLS if you configure a trusted SSL certificate for your object store. To do so, add the tls query parameter to the URL. For example, the following URL example demonstrates how to add the TLS query parameter to an Amazon S3 URI:

s3://$ACCESS_KEY:$SECRET_KEY@$HOST/$BUCKET_NAME?tls=true
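If you build the connection string programmatically, percent-encoding the credentials keeps characters that are common in secret keys, such as `/` and `+`, from being read as URL delimiters. The sketch below assumes that W&B decodes percent-encoded credentials in this URL, which you should confirm for your version before relying on it. The helper names are hypothetical.

```python
from urllib.parse import quote, unquote, urlsplit


def s3_connection_string(access_key: str, secret_key: str, host: str,
                         bucket: str, tls: bool = False) -> str:
    """Build s3://ACCESS_KEY:SECRET_KEY@HOST/BUCKET, optionally with ?tls=true."""
    url = f"s3://{quote(access_key, safe='')}:{quote(secret_key, safe='')}@{host}/{bucket}"
    return (url + "?tls=true") if tls else url


def parse_connection_string(conn: str) -> dict:
    """Split a connection string back into its parts, decoding the credentials."""
    parts = urlsplit(conn)
    return {
        "access_key": unquote(parts.username or ""),
        "secret_key": unquote(parts.password or ""),
        "host": (parts.hostname or "") + (f":{parts.port}" if parts.port else ""),
        "bucket": parts.path.lstrip("/"),
        "tls": "tls=true" in parts.query.split("&"),
    }
```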

Set BUCKET_QUEUE to internal:// if you use third-party object stores. This tells the W&B server to manage all object notifications internally instead of depending on an external SQS queue or equivalent.

The most important things to consider when running your own object store are:

  1. Storage capacity and performance. It’s fine to use magnetic disks, but you should monitor the capacity of these disks. Average W&B usage results in tens to hundreds of gigabytes. Heavy usage could result in petabytes of storage consumption.
  2. Fault tolerance. At a minimum, the physical disk storing the objects should be on a RAID array. If you use MinIO, consider running it in distributed mode.
  3. Availability. Monitoring should be configured to ensure the storage is available.
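Capacity monitoring can start as simply as a periodic disk-usage check on the volume that backs your object store. The sketch below uses only the Python standard library; the 80% threshold is an arbitrary example, and in practice you would usually feed this signal into your existing monitoring system.

```python
import shutil
from typing import Tuple


def check_capacity(path: str, max_used_fraction: float = 0.8) -> Tuple[bool, float]:
    """Return (ok, used_fraction) for the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)
    used_fraction = usage.used / usage.total
    return used_fraction <= max_used_fraction, used_fraction


# Replace "/" with the data directory of your object store.
ok, used = check_capacity("/")
print(f"{'OK' if ok else 'WARNING'}: {used:.0%} of capacity used")
```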

There are many enterprise alternatives to running your own object storage service such as:

  1. https://aws.amazon.com/s3/outposts/
  2. https://www.netapp.com/data-storage/storagegrid/

MinIO setup

If you use MinIO, run the following commands to create a bucket:

mc config host add local http://$MINIO_HOST:$MINIO_PORT "$MINIO_ACCESS_KEY" "$MINIO_SECRET_KEY" --api s3v4
mc mb --region=us-east1 local/local-files

Deploy W&B Server application to Kubernetes

The recommended installation method is with the official W&B Helm chart. Follow this section to deploy the W&B Server application.

OpenShift

W&B supports operating from within an OpenShift Kubernetes cluster.

Run the container as an un-privileged user

By default, containers use a $UID of 999. If your orchestrator requires the container to run as a non-root user, specify a $UID >= 100000 and a $GID of 0.

An example security context for Kubernetes looks similar to the following:

spec:
  securityContext:
    runAsUser: 100000
    runAsGroup: 0

Networking

Load balancer

Run a load balancer that terminates network requests at the appropriate network boundary.

Common load balancers include:

  1. Nginx Ingress
  2. Istio
  3. Caddy
  4. Cloudflare
  5. Apache
  6. HAProxy

Ensure that all machines used to execute machine learning payloads, and the devices used to access the service through web browsers, can communicate with this endpoint.
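To spot-check connectivity from a training machine, the following Python sketch attempts a TCP connection to the load balancer. It only confirms that the port is reachable, not that TLS or W&B Server itself is healthy. The function name is hypothetical.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, DNS failures, and timeouts
        return False
```

For example, `can_reach("wandb.example.com")` returns True only if the machine running it can open a connection to port 443 on that (hypothetical) host.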

SSL / TLS

W&B Server does not terminate SSL. If your security policies require SSL communication within your trusted networks, consider using a tool like Istio with sidecar containers. The load balancer itself should terminate SSL with a valid certificate. Self-signed certificates are not supported and cause a number of problems for users. If possible, use a service like Let’s Encrypt to provide trusted certificates to your load balancer. Services like Caddy and Cloudflare manage SSL for you.

Example nginx configuration

The following is an example configuration using nginx as a reverse proxy.

events {}
http {
    # If we receive X-Forwarded-Proto, pass it through; otherwise, pass along the
    # scheme used to connect to this server
    map $http_x_forwarded_proto $proxy_x_forwarded_proto {
        default $http_x_forwarded_proto;
        ''      $scheme;
    }

    # If X-Forwarded-Proto is https, set an HSTS value that tells browsers to use HTTPS
    map $http_x_forwarded_proto $sts {
        default '';
        "https" "max-age=31536000; includeSubDomains";
    }

    # If we receive X-Forwarded-Host, pass it through; otherwise, pass along $http_host
    map $http_x_forwarded_host $proxy_x_forwarded_host {
        default $http_x_forwarded_host;
        ''      $http_host;
    }

    # If we receive X-Forwarded-Port, pass it through; otherwise, pass along the
    # server port the client connected to
    map $http_x_forwarded_port $proxy_x_forwarded_port {
        default $http_x_forwarded_port;
        ''      $server_port;
    }

    # If we receive Upgrade, set Connection to "upgrade"; otherwise, delete any
    # Connection header that may have been passed to this server
    map $http_upgrade $proxy_connection {
        default upgrade;
        '' close;
    }

    server {
        listen 443 ssl;
        server_name         www.example.com;
        ssl_certificate     www.example.com.crt;
        ssl_certificate_key www.example.com.key;

        proxy_http_version 1.1;
        proxy_buffering off;
        proxy_set_header Host $http_host;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $proxy_connection;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
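        proxy_set_header X-Forwarded-Proto $proxy_x_forwarded_proto;
        proxy_set_header X-Forwarded-Host $proxy_x_forwarded_host;
        proxy_set_header X-Forwarded-Port $proxy_x_forwarded_port;

        # Send HSTS only when X-Forwarded-Proto is https ($sts is empty otherwise, so no header is added)
        add_header Strict-Transport-Security $sts always;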

        location / {
            # Replace YOUR_UPSTREAM_SERVER_IP with the address of your W&B Server
            proxy_pass  http://YOUR_UPSTREAM_SERVER_IP:8080/;
        }

        keepalive_timeout 10;
    }
}

Verify your installation

Verify that your W&B Server is configured properly. Run the following commands in your terminal:

pip install wandb
wandb login --host=https://YOUR_DNS_DOMAIN
wandb verify

Check the log files for any errors W&B Server hits at startup. For a Docker installation, run:

docker logs wandb-local

For a Kubernetes installation, find the W&B pod name and view its logs:

kubectl get pods
kubectl logs wandb-XXXXX-XXXXX

Contact W&B Support if you encounter errors.

1.3.5 - Update W&B license and version

Guide for updating W&B (Weights & Biases) version and license across different installation methods.

Update your W&B Server Version and License with the same method you installed W&B Server with. The following table lists how to update your license and version based on different deployment methods:

Installation method | Description
Terraform | W&B supports three public Terraform modules for cloud deployment: AWS, GCP, and Azure.
Helm | You can use the Helm chart to install W&B into an existing Kubernetes cluster.

Update with Terraform

Update your license and version with Terraform. The following table lists the W&B-managed Terraform module for each cloud platform.

Cloud provider | Terraform module
AWS | AWS Terraform module
GCP | GCP Terraform module
Azure | Azure Terraform module

  1. First, navigate to the W&B maintained Terraform module for your appropriate cloud provider. See the preceding table to find the appropriate Terraform module based on your cloud provider.

  2. Within your Terraform configuration, update wandb_version and license in your Terraform wandb_app module configuration:

    module "wandb_app" {
        source        = "wandb/wandb/<cloud-specific-module>"
        version       = "new_version"       # Terraform module version
        license       = "new_license_key"   # Your new license key
        wandb_version = "new_wandb_version" # Desired W&B version
        ...
    }
    
  3. Initialize Terraform, review the plan, and apply the configuration:

    terraform init
    terraform plan
    terraform apply
    
  4. (Optional) If you use a terraform.tfvars or other .tfvars file, update it (or create one) with the new W&B version and license key.

    Review the plan with the variable file:

    terraform plan -var-file="terraform.tfvars"
    

    Apply the configuration. In your Terraform workspace directory execute:

    terraform apply -var-file="terraform.tfvars"
    

Update with Helm

Update W&B with spec

  1. Specify a new version by modifying the image.tag and/or license values in your Helm chart *.yaml configuration file:

    license: 'new_license'
    image:
      repository: wandb/local
      tag: 'new_version'
    
  2. Execute the Helm upgrade with the following command:

    helm repo update
    helm upgrade --namespace=wandb --create-namespace \
      --install wandb wandb/wandb --version ${chart_version} \
      -f wandb_install_spec.yaml
    

Update license and version directly

  1. Set the new license key and image tag as environment variables:

    export LICENSE='new_license'
    export TAG='new_version'
    
  2. Upgrade your Helm release with the command below, merging the new values with the existing configuration:

    helm repo update
    helm upgrade --namespace=wandb --create-namespace \
      --install wandb wandb/wandb --version ${chart_version} \
      --reuse-values --set license=$LICENSE --set image.tag=$TAG
    

For more details, see the upgrade guide in the public repository.

Update with admin UI

This method only works for updating licenses that are not set with an environment variable in the W&B Server container, typically in self-hosted Docker installations.

  1. Obtain a new license from the W&B Deployment Page, ensuring it matches the correct organization and deployment ID for the deployment you are looking to upgrade.
  2. Access the W&B Admin UI at <host-url>/system-settings.
  3. Navigate to the license management section.
  4. Enter the new license key and save your changes.

2 - Identity and access management (IAM)

W&B Platform has three IAM scopes within W&B: Organizations, Teams, and Projects.

Organization

An Organization is the root scope in your W&B account or instance. All actions in your account or instance take place within the context of that root scope, including managing users, managing teams, managing projects within teams, tracking usage and more.

If you are using Multi-tenant Cloud, you may have more than one organization where each may correspond to a business unit, a personal user, a joint partnership with another business and more.

If you are using Dedicated Cloud or a Self-managed instance, each instance corresponds to one organization. Your company may have more than one Dedicated Cloud or Self-managed instance to map to different business units or departments, though that is strictly an optional way to manage AI practitioners across your businesses or departments.

For more information, see Manage organizations.

Team

A Team is a subscope within an organization that may map to a business unit or function, a department, or a project team in your company. You may have more than one team in your organization depending on your deployment type and pricing plan.

AI projects are organized within the context of a team. The access control within a team is governed by team admins, who may or may not be admins at the parent organization level.

For more information, see Add and manage teams.

Project

A Project is a subscope within a team that maps to an actual AI project with specific intended outcomes. You may have more than one project within a team. Each project has a visibility mode that determines who can access it.

Every project consists of Workspaces and Reports, and is linked to relevant Artifacts, Sweeps, Launch Jobs, and Automations.

2.1 - Authentication

2.1.1 - Configure SSO with LDAP

Use an LDAP server to authenticate users of W&B Server. This guide explains how to configure the LDAP settings for W&B Server. It covers mandatory and optional configurations, as well as how to configure the LDAP connection from the System Settings UI. It also describes the inputs of the LDAP configuration, such as the address, base distinguished name, and attributes. You can specify these attributes from the W&B App UI or with environment variables. You can set up either an anonymous bind or a bind with an administrator DN and password.

Configure LDAP connection

  1. Navigate to the W&B App.
  2. Select your profile icon from the upper right. From the dropdown, select System Settings.
  3. Toggle Configure LDAP Client.
  4. Add the details in the form. Refer to the Configuration parameters section for details on each input.
  5. Click Update Settings to test your settings. This establishes a test client connection with W&B Server.
  6. If your connection is verified, toggle the Enable LDAP Authentication and select the Update Settings button.

Alternatively, configure the LDAP connection with the following environment variables:

Environment variable | Required | Example
LOCAL_LDAP_ADDRESS | Yes | ldaps://ldap.example.com:636
LOCAL_LDAP_BASE_DN | Yes | dc=example,dc=org
LOCAL_LDAP_BIND_DN | No | cn=admin,dc=example,dc=org
LOCAL_LDAP_BIND_PW | No |
LOCAL_LDAP_ATTRIBUTES | Yes | email=mail,group=gidNumber
LOCAL_LDAP_TLS_ENABLE | No |
LOCAL_LDAP_GROUP_ALLOW_LIST | No |
LOCAL_LDAP_LOGIN | No |

See the Configuration parameters section for definitions of each environment variable. Note that the environment variable prefix LOCAL_LDAP was omitted from the definition names for clarity.

Configuration parameters

The following table lists and describes required and optional LDAP configurations.

Environment variable | Definition | Required
ADDRESS | The address of your LDAP server within the VPC that hosts W&B Server. | Yes
BASE_DN | The root path that searches start from. Required for any queries into this directory. | Yes
BIND_DN | Path of the administrative user registered in the LDAP server. Required if the LDAP server does not support unauthenticated binding. If specified, W&B Server connects to the LDAP server as this user. Otherwise, W&B Server connects using anonymous binding. | No
BIND_PW | The password for the administrative user, used to authenticate the bind. If left blank, W&B Server connects using anonymous binding. | No
ATTRIBUTES | The email and group ID attribute names, as a comma-separated string of key=value pairs such as email=mail,group=gidNumber. | Yes
TLS_ENABLE | Enable TLS. | No
GROUP_ALLOW_LIST | Group allowlist. | No
LOGIN | Tells W&B Server to use LDAP for authentication. Set to true or false: use false while you test the LDAP configuration, and true to start LDAP authentication. | No
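Before restarting W&B Server with LDAP settings, it can help to sanity-check them. The following Python sketch, with hypothetical helper names, reports missing required `LOCAL_LDAP_*` variables and parses the ATTRIBUTES value into a mapping. The parsing rules (comma-separated `key=value` pairs, surrounding whitespace ignored) are inferred from the examples above, not taken from W&B Server's own parser.

```python
import os
from typing import Dict, List, Mapping, Optional

# Required variables, per the table above.
REQUIRED_LDAP_VARS = ("LOCAL_LDAP_ADDRESS", "LOCAL_LDAP_BASE_DN", "LOCAL_LDAP_ATTRIBUTES")


def missing_ldap_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required LOCAL_LDAP_* variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_LDAP_VARS if not env.get(name)]


def parse_ldap_attributes(value: str) -> Dict[str, str]:
    """Parse an ATTRIBUTES string such as 'email=mail, group=gidNumber'."""
    mapping = {}
    for item in value.split(","):
        key, sep, attribute = item.strip().partition("=")
        if not sep or not key.strip() or not attribute.strip():
            raise ValueError(f"malformed attribute mapping: {item!r}")
        mapping[key.strip()] = attribute.strip()
    return mapping


print(parse_ldap_attributes("email=mail, group=gidNumber"))  # {'email': 'mail', 'group': 'gidNumber'}
```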

2.1.2 - Configure SSO with OIDC

W&B Server’s support for OpenID Connect (OIDC) compatible identity providers allows for management of user identities and group memberships through external identity providers like Okta, Keycloak, Auth0, Google, and Entra.

OpenID Connect (OIDC)

W&B Server supports the following OIDC authentication flows for integrating with external Identity Providers (IdPs).

  1. Implicit Flow with Form Post
  2. Authorization Code Flow with Proof Key for Code Exchange (PKCE)

These flows authenticate users and provide W&B Server with the necessary identity information (in the form of ID tokens) to manage access control.

The ID token is a JWT that contains the user’s identity information, such as their name, username, email, and group memberships. W&B Server uses this token to authenticate the user and map them to appropriate roles or groups in the system.

In the context of W&B Server, access tokens authorize requests to APIs on behalf of the user, but since W&B Server’s primary concern is user authentication and identity, it only requires the ID token.

You can use environment variables to configure IAM options for your Dedicated cloud or Self-managed instance.

To help you configure identity providers for Dedicated Cloud or Self-managed W&B Server installations, follow these guidelines for the various IdPs. If you’re using the SaaS version of W&B, reach out to support@wandb.com for assistance in configuring an Auth0 tenant for your organization.

Follow the procedure below to set up AWS Cognito for authorization:

  1. First, sign in to your AWS account and navigate to the AWS Cognito App.

    When you use OIDC for authentication and not authorization, public clients simplify setup.
  2. Provide an allowed callback URL to configure the application in your IdP:

    • Add http(s)://YOUR-W&B-HOST/oidc/callback as the callback URL. Replace YOUR-W&B-HOST with your W&B host path.
  3. If your IdP supports universal logout, set the Logout URL to http(s)://YOUR-W&B-HOST. Replace YOUR-W&B-HOST with your W&B host path.

    For example, if your application was running at https://wandb.mycompany.com, you would replace YOUR-W&B-HOST with wandb.mycompany.com.

    The image below demonstrates how to provide allowed callback and sign-out URLs in AWS Cognito.

    If your instance is accessible from multiple hosts, be sure to include all of them here.

    wandb/local uses the implicit grant with the form_post response type by default.

    You can also configure wandb/local to perform an authorization_code grant that uses the PKCE Code Exchange flow.

  4. Select one or more OAuth grant types to configure how AWS Cognito delivers tokens to your app.

  5. W&B requires specific OpenID Connect (OIDC) scopes. Select the following from AWS Cognito App:

    • “openid”
    • “profile”
    • “email”

    For example, your AWS Cognito App UI should look similar to the following image:

    Required fields

    Select the Auth Method in the settings page or set the OIDC_AUTH_METHOD environment variable to tell wandb/local which grant to use.

    For AWS Cognito, you must set the Auth Method to pkce.

  6. You need a Client ID and the URL of your OIDC issuer. The OpenID discovery document must be available at $OIDC_ISSUER/.well-known/openid-configuration.

    For example, you can generate your issuer URL by appending your User Pool ID to the Cognito IdP URL from the App Integration tab within the User Pools section:

    Screenshot of issuer URL in AWS Cognito

    Do not use the “Cognito domain” for the IdP URL. The Cognito issuer URL has the format https://cognito-idp.$REGION.amazonaws.com/$USER_POOL_ID, and Cognito serves its discovery document at that URL followed by /.well-known/openid-configuration.
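The following Python sketch derives the Cognito issuer URL and the discovery document location described above. The function names are hypothetical, and the region and user pool ID in the example are placeholders.

```python
def cognito_issuer_url(region: str, user_pool_id: str) -> str:
    """Issuer URL for an AWS Cognito user pool (not the "Cognito domain")."""
    return f"https://cognito-idp.{region}.amazonaws.com/{user_pool_id}"


def discovery_url(issuer: str) -> str:
    """Location of the OpenID discovery document for any OIDC issuer."""
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


issuer = cognito_issuer_url("us-east-1", "us-east-1_EXAMPLE")
print(issuer)                 # https://cognito-idp.us-east-1.amazonaws.com/us-east-1_EXAMPLE
print(discovery_url(issuer))  # https://cognito-idp.us-east-1.amazonaws.com/us-east-1_EXAMPLE/.well-known/openid-configuration
```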

Follow the procedure below to set up Okta for authorization:

  1. Log in to the Okta Portal at https://login.okta.com/.

  2. On the left side, select Applications and then Applications again.

  3. Click on “Create App integration.”

  4. On the screen named “Create a new app integration,” select OIDC - OpenID Connect and Single-Page Application. Then click “Next.”

  5. On the screen named “New Single-Page App Integration,” fill out the values as follows and click Save:

    • App integration name, for example “Weights & Biases”
    • Grant type: Select both Authorization Code and Implicit (hybrid)
    • Sign-in redirect URIs: https://YOUR_W_AND_B_URL/oidc/callback
    • Sign-out redirect URIs: https://YOUR_W_AND_B_URL/logout
    • Assignments: Select Skip group assignment for now
  6. On the overview screen of the Okta application that you just created, make note of the Client ID under Client Credentials in the General tab.

  7. To identify the Okta OIDC Issuer URL, select Settings and then Account on the left side. The Okta UI shows the company name under Organization Contact.

The OIDC issuer URL has the following format: https://COMPANY.okta.com. Replace COMPANY with the corresponding value. Make note of it.

Follow the procedure below to set up Microsoft Entra ID for authorization:

  1. Log in to the Azure Portal at https://portal.azure.com/.

  2. Select “Microsoft Entra ID” service.

  3. On the left side, select “App registrations.”

  4. On the top, click “New registration.”

    On the screen named “Register an application,” fill out the values as follows:

    • Specify a name, for example “Weights and Biases application”

    • By default the selected account type is: “Accounts in this organizational directory only (Default Directory only - Single tenant).” Modify if you need to.

    • Configure Redirect URI as type Web with value: https://YOUR_W_AND_B_URL/oidc/callback

    • Click “Register.”

    • Make a note of the “Application (client) ID” and “Directory (tenant) ID.”

  5. On the left side, click Authentication.

    • Under Front-channel logout URL, specify: https://YOUR_W_AND_B_URL/logout

    • Click “Save.”

  6. On the left side, click “Certificates & secrets.”

    • Click “Client secrets” and then click “New client secret.”

      On the screen named “Add a client secret,” fill out the values as follows:

      • Enter a description, for example “wandb”
      • Leave “Expires” as is or change if you have to.
      • Click “Add.”
    • Make a note of the “Value” of the secret. There is no need for the “Secret ID.”

You should now have made notes of three values:

  • OIDC Client ID
  • OIDC Client Secret
  • Tenant ID (needed for the OIDC issuer URL)

The OIDC issuer URL has the following format: https://login.microsoftonline.com/${TenantID}/v2.0
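Whichever IdP you use, you can check the issuer URL before entering it in W&B Server. The Python sketch below builds the Entra ID issuer URL from the format above and checks a few fields of the discovery document. `fetch_discovery_document` makes a live HTTPS request, and all function names are hypothetical. The exact-match rule for `issuer` comes from the OpenID Connect Discovery specification, not from W&B.

```python
import json
from typing import List
from urllib.request import urlopen


def entra_issuer_url(tenant_id: str) -> str:
    """Microsoft Entra ID issuer URL, using the format given above."""
    return f"https://login.microsoftonline.com/{tenant_id}/v2.0"


def fetch_discovery_document(issuer: str, timeout: float = 5.0) -> dict:
    """Download and parse $ISSUER/.well-known/openid-configuration."""
    url = issuer.rstrip("/") + "/.well-known/openid-configuration"
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def check_discovery_document(issuer: str, document: dict) -> List[str]:
    """Return problems found in a discovery document (an empty list means none found)."""
    problems = []
    # OpenID Connect Discovery requires the advertised issuer to match exactly,
    # including any trailing slash.
    if document.get("issuer") != issuer:
        problems.append(f"issuer mismatch: {document.get('issuer')!r} != {issuer!r}")
    for field in ("authorization_endpoint", "jwks_uri"):
        if field not in document:
            problems.append(f"missing {field}")
    return problems
```

For example, `check_discovery_document(issuer, fetch_discovery_document(issuer))` should return an empty list for a correctly configured issuer.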

Set up SSO on the W&B Server

To set up SSO, you need administrator privileges and the following information:

  • OIDC Client ID
  • OIDC Auth method (implicit or pkce)
  • OIDC Issuer URL
  • OIDC Client Secret (optional; depends on how you have setup your IdP)

You can configure SSO using either the W&B Server UI or by passing environment variables to the wandb/local pod. The environment variables take precedence over UI.
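For environment-variable configuration, the following Python sketch assembles a Kubernetes-style `env` list for the wandb/local container. OIDC_ISSUER and OIDC_AUTH_METHOD appear in this guide; OIDC_CLIENT_ID and OIDC_CLIENT_SECRET are assumed names that you should confirm for your W&B Server version. In a real deployment, prefer referencing the client secret from a Kubernetes Secret rather than embedding it as a plain value.

```python
from typing import Dict, List, Optional

VALID_AUTH_METHODS = ("implicit", "pkce")


def oidc_env(issuer: str, client_id: str, auth_method: str,
             client_secret: Optional[str] = None) -> List[Dict[str, str]]:
    """Return [{'name': ..., 'value': ...}, ...] entries for a container spec."""
    if auth_method not in VALID_AUTH_METHODS:
        raise ValueError(f"auth_method must be one of {VALID_AUTH_METHODS}")
    env = {
        "OIDC_ISSUER": issuer,
        "OIDC_CLIENT_ID": client_id,  # assumed variable name
        "OIDC_AUTH_METHOD": auth_method,
    }
    if client_secret:  # optional; depends on how your IdP is set up
        env["OIDC_CLIENT_SECRET"] = client_secret  # assumed variable name
    return [{"name": name, "value": value} for name, value in env.items()]


for entry in oidc_env("https://company.okta.com", "0oaEXAMPLE", "pkce"):
    print(entry)
```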

The System Console is the successor to the System Settings page and is available with deployments based on the W&B Kubernetes Operator. To configure SSO from the System Console:

  1. Refer to Access the W&B Management Console.

  2. Navigate to Settings, then Authentication. Select OIDC in the Type dropdown.

  3. Enter the values.

  4. Click on Save.

  5. Log out and then log back in, this time using the IdP login screen.

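If your deployment uses the System Settings page instead:

  1. Sign in to your Weights & Biases instance.

  2. Navigate to the W&B App.

  3. Select your profile icon in the upper right. From the dropdown, select System Settings.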

  4. Enter your Issuer, Client ID, and Authentication Method.

  5. Select Update settings.

Security Assertion Markup Language (SAML)

W&B Server does not support SAML.

2.1.3 - Use federated identities with SDK

Use identity federation to sign in with your organizational credentials through the W&B SDK. If your W&B organization admin has configured SSO for your organization, then you already use your organizational credentials to sign in to the W&B App UI. In that sense, identity federation is like SSO for the W&B SDK, but it uses JSON Web Tokens (JWTs) directly. You can use identity federation as an alternative to API keys.

RFC 7523 forms the underlying basis for identity federation with SDK.

JWT issuer setup

As a first step, an organization admin must set up a federation between your W&B organization and a publicly accessible JWT issuer.

  • Go to the Settings tab in your organization dashboard
  • In the Authentication option, press Set up JWT Issuer
  • Add the JWT issuer URL in the text box and press Create

W&B automatically looks for an OIDC discovery document at the path ${ISSUER_URL}/.well-known/openid-configuration and tries to find the JSON Web Key Set (JWKS) at the URL that the discovery document specifies. The JWKS is used for real-time validation of the JWTs, to ensure that they were issued by the relevant identity provider.

Using the JWT to access W&B

Once a JWT issuer has been setup for your W&B organization, users can start accessing the relevant W&B projects using JWTs issued by that identity provider. The mechanism for using JWTs is as follows:

  • You must sign in to the identity provider using one of the mechanisms available in your organization. Some providers can be accessed in an automated manner using an API or SDK, while others can only be accessed through a UI. Reach out to your W&B organization admin or the owner of the JWT issuer for details.
  • Once you’ve retrieved the JWT after signing in to your identity provider, store it in a file at a secure location and configure the absolute file path in an environment variable WANDB_IDENTITY_TOKEN_FILE.
  • Access your W&B project using the W&B SDK or CLI. The SDK or CLI should automatically detect the JWT and exchange it for a W&B access token after the JWT has been successfully validated. The W&B access token is used to access the relevant APIs for enabling your AI workflows, that is, to log runs, metrics, artifacts and so forth. The access token is by default stored at the path ~/.config/wandb/credentials.json. You can change that path by specifying the environment variable WANDB_CREDENTIALS_FILE.
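The following Python sketch stores a JWT you have already obtained and points WANDB_IDENTITY_TOKEN_FILE at it for the current process. The default directory and file name are assumptions of this sketch, not W&B defaults; any absolute path that only you can read should work. Set the variable before the W&B SDK starts, for example before calling `wandb.init()` in the same process.

```python
import os
from typing import Optional


def store_identity_token(jwt: str, directory: Optional[str] = None,
                         filename: str = "identity_token") -> str:
    """Write `jwt` to an owner-only file and set WANDB_IDENTITY_TOKEN_FILE to its absolute path."""
    directory = directory or os.path.expanduser("~/.config/wandb")  # assumed location
    os.makedirs(directory, exist_ok=True)
    path = os.path.abspath(os.path.join(directory, filename))
    # Create or truncate the file with permissions 0o600 (read/write for the owner only).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as token_file:
        token_file.write(jwt.strip())
    os.chmod(path, 0o600)  # tighten permissions if the file already existed
    os.environ["WANDB_IDENTITY_TOKEN_FILE"] = path
    return path
```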

JWT validation

As part of the workflow to exchange the JWT for a W&B access token and then access a project, the JWT undergoes the following validations:

  • The JWT signature is verified using the JWKS at the W&B organization level. This is the first line of defense, and if this fails, that means there’s a problem with your JWKS or how your JWT is signed.
  • The iss claim in the JWT should be equal to the issuer URL configured at the organization level.
  • The sub claim in the JWT should be equal to the user’s email address as configured in the W&B organization.
  • The aud claim in the JWT should be equal to the name of the W&B organization which houses the project that you are accessing as part of your AI workflow. In case of Dedicated Cloud or Self-managed instances, you could configure an instance-level environment variable SKIP_AUDIENCE_VALIDATION to true to skip validation of the audience claim, or use wandb as the audience.
  • The exp claim in the JWT is checked to see if the token is valid or has expired and needs to be refreshed.
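When a token is rejected, decoding its claims locally can show which check failed. The Python sketch below mirrors the iss, sub, aud, and exp checks listed above. It deliberately does not verify the signature against the JWKS, so use it only for troubleshooting, never as a substitute for W&B's own validation. The function names are hypothetical.

```python
import base64
import json
import time
from typing import List, Optional


def decode_jwt_claims(token: str) -> dict:
    """Decode a JWT's payload segment WITHOUT verifying the signature."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the base64 padding JWTs strip
    return json.loads(base64.urlsafe_b64decode(payload))


def claim_problems(claims: dict, issuer: str, email: str, audience: str,
                   now: Optional[float] = None) -> List[str]:
    """Return the claim checks from the list above that the token fails."""
    now = time.time() if now is None else now
    problems = []
    if claims.get("iss") != issuer:
        problems.append("iss does not equal the configured issuer URL")
    if claims.get("sub") != email:
        problems.append("sub does not equal the user's email address")
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]  # aud may be a string or a list
    if audience not in audiences:
        problems.append("aud does not include the expected audience")
    if not isinstance(claims.get("exp"), (int, float)) or claims["exp"] <= now:
        problems.append("exp is missing or the token has expired")
    return problems
```

Here `audience` would be the name of your W&B organization, or `wandb` on Dedicated Cloud or Self-managed instances, as described above.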

External service accounts

W&B has long supported built-in service accounts with long-lived API keys. With the identity federation capability for the SDK and CLI, you can also bring external service accounts that use JWTs for authentication, as long as those JWTs are issued by the same issuer that is configured at the organization level. A team admin can configure external service accounts within the scope of a team, like the built-in service accounts.

To configure an external service account:

  • Go to the Service Accounts tab for your team
  • Press New service account
  • Provide a name for the service account, select Federated Identity as the Authentication Method, provide a Subject, and press Create

The sub claim in the external service account’s JWT should be the same as the subject that the team admin configures in the team-level Service Accounts tab. That claim is verified as part of JWT validation. The aud claim requirement is the same as for human user JWTs.

When using an external service account’s JWT to access W&B, it’s typically easier to automate the workflow to generate the initial JWT and continuously refresh it. If you would like to attribute the runs logged using an external service account to a human user, you can configure the environment variables WANDB_USERNAME or WANDB_USER_EMAIL for your AI workflow, similar to how it’s done for the built-in service accounts.

2.1.4 - Use service accounts to automate workflows

Manage automated or non-interactive workflows using org and team scoped service accounts

A service account represents a non-human or machine user that can automatically perform common tasks across projects within a team or across teams.

  • An org admin can create a service account at the scope of the organization.
  • A team admin can create a service account at the scope of that team.

A service account’s API key allows the caller to read from or write to projects within the service account’s scope.

Service accounts allow for centralized management of workflows by multiple users or teams, to automate experiment tracking for W&B Models or to log traces for W&B Weave. You have the option to associate a human user’s identity with a workflow managed by a service account, by using either of the environment variables WANDB_USERNAME or WANDB_USER_EMAIL.

Organization-scoped service accounts

Service accounts scoped to an organization have permissions to read and write in all projects in the organization, regardless of the team, with the exception of restricted projects. Before an organization-scoped service account can access a restricted project, an admin of that project must explicitly add the service account to the project.

An organization admin can obtain the API key for an organization-scoped service account from the Service Accounts tab of the organization or account dashboard.

To create a new organization-scoped service account:

  • Click New service account button in the Service Accounts tab of your organization dashboard.
  • Enter a Name.
  • Select a default team for the service account.
  • Click Create.
  • Next to the newly created service account, click Copy API key.
  • Store the copied API key in a secret manager or another secure but accessible location.

Team-scoped service accounts

A team-scoped service account can read and write in all projects within its team, except restricted projects in that team. Before a team-scoped service account can access a restricted project, an admin of that project must explicitly add the service account to the project.

As a team admin, you can get the API key for a team-scoped service account in your team at <WANDB_HOST_URL>/<your-team-name>/service-accounts. Alternatively you can go to the Team settings for your team and then refer to the Service Accounts tab.

To create a new team scoped service account for your team:

  • Click New service account button in the Service Accounts tab of your team.
  • Enter a Name.
  • Select Generate API key (Built-in) as the authentication method.
  • Click Create.
  • Next to the newly created service account, click Copy API key.
  • Store the copied API key in a secret manager or another secure but accessible location.

If you do not configure a team in your model training or generative AI app environment that uses a team-scoped service account, the model runs or Weave traces log to the named project within the service account’s parent team. In that case, user attribution with the WANDB_USERNAME or WANDB_USER_EMAIL variables does not work unless the referenced user is part of the service account’s parent team.
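As a minimal sketch, the following Python function prepares the environment for a workflow that runs as a service account, for example a training script launched with `subprocess.run`. WANDB_API_KEY and WANDB_ENTITY are standard W&B SDK environment variables; setting WANDB_ENTITY names the target team explicitly instead of relying on the parent-team default described above. The helper name and example values are hypothetical.

```python
import os
from typing import Dict, Optional

# Variables this helper controls; stale values inherited from the parent process are dropped.
_MANAGED_VARS = ("WANDB_API_KEY", "WANDB_ENTITY", "WANDB_USERNAME", "WANDB_USER_EMAIL")


def service_account_env(api_key: str, team: Optional[str] = None,
                        user_email: Optional[str] = None) -> Dict[str, str]:
    """Return a copy of os.environ configured for a service-account workflow."""
    env = {k: v for k, v in os.environ.items() if k not in _MANAGED_VARS}
    env["WANDB_API_KEY"] = api_key  # read this from your secret manager
    if team:
        env["WANDB_ENTITY"] = team
    if user_email:
        # See the note above: attribution can fail if this user is not in the
        # service account's parent team.
        env["WANDB_USER_EMAIL"] = user_email
    return env
```

For example: `subprocess.run(["python", "train.py"], env=service_account_env(api_key, team="ml-team"), check=True)`, where `train.py` stands in for your own script.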

External service accounts

In addition to Built-in service accounts, W&B also supports team-scoped External service accounts with the W&B SDK and CLI using Identity federation with identity providers (IdPs) that can issue JSON Web Tokens (JWTs).

2.2 - Access management

Manage users and teams within an organization

The first user to sign up to W&B with a unique organization domain is assigned the instance administrator role for that organization. The organization administrator assigns team administrator roles to specific users.

A team administrator is a user in organization that has administrative permissions within a team.

The organization administrator can access and use an organization’s account settings at https://wandb.ai/account-settings/ to invite users, assign or update a user’s role, create teams, remove users from your organization, assign the billing administrator, and more. See Add and manage users for more information.

Once an organization administrator creates a team, the instance administrator or a team administrator can:

  • Invite users to that team or remove users from the team.
  • Assign or update a team member’s role.
  • Automatically add new users to a team when they join your organization.

Both the organization administrator and the team administrator use team dashboards at https://wandb.ai/<your-team-name> to manage teams. For more information on what organization administrators and team administrators can do, see Add and manage teams.

Limit visibility to specific projects

Define the scope of a W&B project to limit who can view, edit, and submit W&B runs to it. Limiting who can view a project is particularly useful if a team works with sensitive or confidential data.

An organization admin, team admin, or the owner of a project can both set and edit a project’s visibility.

For more information, see Project visibility.

2.2.1 - Manage your organization

As an administrator of an organization you can manage individual users within your organization and manage teams.

As a team administrator you can manage teams.

If you are looking to simplify user management in your organization, refer to Automate user and team management.

Change the name of your organization

  1. Navigate to https://wandb.ai/home.
  2. In the upper right corner of the page, select the User menu dropdown. Within the Account section of the dropdown, select Settings.
  3. Within the Settings tab, select General.
  4. Select the Change name button.
  5. Within the modal that appears, provide a new name for your organization and select the Save name button.

Add and manage users

As an administrator, use your organization’s dashboard to:

  • Invite or remove users.
  • Assign or update a user’s role.
  • Assign the billing administrator.

There are several ways an organization administrator can add users to an organization:

  1. Member-by-invite
  2. Auto provisioning with SSO
  3. Domain capture

Seats and pricing

The following table summarizes how seats work for Models and Weave:

Product | Seats | Cost based on
Models | Pay per seat | The number of paid Models seats you have and the usage you accrue determine your overall subscription cost. Each user can be assigned one of three seat types: Full, Viewer, and No-Access.
Weave | Free | Usage based

Invite a user

Administrators can invite users to their organization, as well as to specific teams within the organization.

On SaaS Cloud, invite users from the W&B App:

  1. Navigate to https://wandb.ai/home.
  2. In the upper right corner of the page, select the User menu dropdown. Within the Account section of the dropdown, select Users.
  3. Select Invite new user.
  4. In the modal that appears, provide the email or username of the user in the Email or username field.
  5. (Recommended) Add the user to a team from the Choose teams dropdown menu.
  6. From the Select role dropdown, select the role to assign to the user. You can change the user’s role at a later time. See the table in Assign or update a user’s role for more information about possible roles.
  7. Select the Send invite button.

After you select the Send invite button, W&B sends an invite link to the user’s email using a third-party email server. The user can access your organization once they accept the invite.

On a Dedicated Cloud or Self-managed instance, add users from the W&B Console:

  1. Navigate to https://<org-name>.io/console/settings/. Replace <org-name> with your organization name.
  2. Select the Add user button.
  3. Within the modal that appears, provide the email of the new user in the Email field.
  4. Select a role to assign to the user from the Role dropdown. You can change the user’s role at a later time. See the table in Assign or update a user’s role for more information about possible roles.
  5. Check the Send invite email to user box if you want W&B to send an invite link to the user’s email using a third-party email server.
  6. Select the Add new user button.

Auto provision users

A W&B user with matching email domain can sign in to your W&B Organization with Single Sign-On (SSO) if you configure SSO and your SSO provider permits it. SSO is available for all Enterprise licenses.

W&B assigns auto-provisioned users the Member role by default. You can change the role of auto-provisioned users at any time.

Auto provisioning with SSO is on by default for Dedicated Cloud instances and Self-managed deployments. You can turn it off so that you can selectively add specific users to your W&B organization instead.

How you turn off auto provisioning with SSO depends on your deployment type:

Dedicated Cloud: Reach out to your W&B team to turn off auto provisioning with SSO.

Self-managed: Use the W&B Console to turn off auto provisioning with SSO:

  1. Navigate to https://<org-name>.io/console/settings/. Replace <org-name> with your organization name.
  2. Choose Security.
  3. Select Disable SSO Provisioning to turn off auto provisioning with SSO.

Domain capture

Domain capture lets you automatically add people with a company email address, such as @example.com, to your W&B SaaS Cloud organization. This helps all your employees join the right organization and ensures that new users do not create assets outside of your company’s jurisdiction.

This table summarizes the behavior of new and existing users with and without domain capture enabled:

| | With domain capture | Without domain capture |
|---|---|---|
| New users | Users who sign up for W&B from verified domains are automatically added as members to your organization’s default team. They can choose additional teams to join at sign up, if you enable team joining. They can still join other organizations and teams with an invitation. | Users can create W&B accounts without knowing there is a centralized organization available. |
| Invited users | Invited users automatically join your organization when accepting your invite. Invited users are not automatically added as members to your organization’s default team. They can still join other organizations and teams with an invitation. | Invited users automatically join your organization when accepting your invite. They can still join other organizations and teams with an invitation. |
| Existing users | Existing users with verified email addresses from your domains can join your organization’s teams within the W&B App. All data that existing users create before joining your organization remains. W&B does not migrate the existing user’s data. | Existing W&B users may be spread across multiple organizations and teams. |

To automatically assign non-invited new users to a default team when they join your organization:

  1. Navigate to https://wandb.ai/home.
  2. In the upper right corner of the page, select the User menu dropdown. From the dropdown, choose Settings.
  3. Within the Settings tab, select General.
  4. Choose the Claim domain button within Domain capture.
  5. Select the team that you want new users to automatically join from the Default team dropdown. If no teams are available, you’ll need to update team settings. See the instructions in Add and manage teams.
  6. Click the Claim email domain button.

You must enable domain matching within a team’s settings before you can automatically assign non-invited new users to that team.

  1. Navigate to the team’s dashboard at https://wandb.ai/<team-name>. Replace <team-name> with the name of the team for which you want to enable domain matching.
  2. Select Team settings in the global navigation on the left side of the team’s dashboard.
  3. Within the Privacy section, toggle the “Recommend new users with matching email domains join this team upon signing up” option.

If you use a Dedicated Cloud or Self-managed deployment, reach out to your W&B Account Team to configure domain capture. Once it is configured, W&B SaaS Cloud automatically prompts users who create a W&B account with your company email address to contact your administrator and request access to your Dedicated Cloud or Self-managed instance.

| | With domain capture | Without domain capture |
|---|---|---|
| New users | Users who sign up for W&B on SaaS Cloud from verified domains are automatically prompted to contact an administrator at an email address you customize. They can still create an organization on SaaS Cloud to trial the product. | Users can create W&B SaaS Cloud accounts without learning that their company has a centralized dedicated instance. |
| Existing users | Existing W&B users may be spread across multiple organizations and teams. | Existing W&B users may be spread across multiple organizations and teams. |

Assign or update a user’s role

Every member of an organization has an organization role and a seat for both W&B Models and Weave. The type of seat they have determines both their billing status and the actions they can take in each product line.

You initially assign an organization role to a user when you invite them to your organization. You can change any user’s role at a later time.

A user within an organization can have one of the following roles:

| Role | Description |
|---|---|
| Administrator | An instance administrator who can add or remove other users in the organization, change user roles, manage custom roles, add teams, and more. W&B recommends having more than one administrator in case your administrator is unavailable. |
| Member | A regular user of the organization, invited by an instance administrator. An organization member cannot invite other users or manage existing users in the organization. |
| Viewer (Enterprise-only feature) | A view-only user of your organization, invited by an instance administrator. A viewer only has read access to the organization and the underlying teams that they are a member of. |
| Custom Roles (Enterprise-only feature) | Custom roles allow organization administrators to compose new roles by inheriting from the preceding View-Only or Member roles and adding permissions to achieve fine-grained access control. Team administrators can then assign any of those custom roles to users in their respective teams. |

To change a user’s role:

  1. Navigate to https://wandb.ai/home.
  2. In the upper right corner of the page, select the User menu dropdown. From the dropdown, choose Users.
  3. Provide the name or email of the user in the search bar.
  4. Select a role from the TEAM ROLE dropdown next to the name of the user.

Assign or update a user’s access

A user within an organization has one of the following Models seat types or Weave access types: Full, Viewer, or No access.

| Seat type | Description |
|---|---|
| Full | Users with this seat type have full permissions to write, read, and export data for Models or Weave. |
| Viewer | A view-only user of your organization. A viewer only has read access to the organization and the underlying teams that they are a part of, and view-only access to Models or Weave. |
| No access | Users with this seat type have no access to the Models or Weave products. |

The Models seat type and Weave access type are defined at the organization level and inherited by teams. To change a user’s seat type, navigate to the organization settings and follow these steps:

  1. For SaaS Cloud users, navigate to your organization’s settings at https://wandb.ai/account-settings/<organization>/settings, replacing <organization> with your organization name. For Dedicated Cloud and Self-managed deployments, navigate to https://<your-instance>.wandb.io/org/dashboard.
  2. Select the Users tab.
  3. From the Role dropdown, select the seat type you want to assign to the user.

Remove a user

  1. Navigate to https://wandb.ai/home.
  2. In the upper right corner of the page, select the User menu dropdown. From the dropdown, choose Users.
  3. Provide the name or email of the user in the search bar.
  4. Select the three dots icon (ellipsis) when it appears.
  5. From the dropdown, choose Remove member.

Assign the billing administrator

  1. Navigate to https://wandb.ai/home.
  2. In the upper right corner of the page, select the User menu dropdown. From the dropdown, choose Users.
  3. Provide the name or email of the user in the search bar.
  4. Under the Billing admin column, choose the user you want to assign as the billing administrator.

Add and manage teams

Use your organization’s dashboard to create and manage teams within your organization. An organization administrator or a team administrator can:

  • Invite users to a team or remove users from a team.
  • Manage a team member’s roles.
  • Automate the addition of users to a team when they join your organization.
  • Manage team storage with the team’s dashboard at https://wandb.ai/<team-name>.

Create a team

Use your organization’s dashboard to create a team:

  1. Navigate to https://wandb.ai/home.
  2. Select Create a team to collaborate on the left navigation panel underneath Teams.
  3. Provide a name for your team in the Team name field in the modal that appears.
  4. Choose a storage type.
  5. Select the Create team button.

After you select the Create team button, W&B redirects you to a new team page at https://wandb.ai/<team-name>, where <team-name> is the name you provided when you created the team.

Once you have a team, you can add users to that team.

Invite users to a team

Invite users to a team in your organization. Use the team’s dashboard to invite users using their email address or W&B username if they already have a W&B account.

  1. Navigate to https://wandb.ai/<team-name>.
  2. Select Team settings in the global navigation on the left side of the dashboard.
  3. Select the Users tab.
  4. Select Invite a new user.
  5. Within the modal that appears, provide the email of the user in the Email or username field and select the role to assign to that user from the Select a team role dropdown. For more information about roles a user can have in a team, see Team roles.
  6. Select the Send invite button.

In addition to inviting users manually with email invites, you can automatically add new users to a team if the new user’s email matches the domain of your organization.

Match members to a team organization during sign up

Allow new users in your organization to discover teams within your organization when they sign up. New users must have a verified email domain that matches your organization’s verified email domain. Verified new users can view a list of verified teams that belong to an organization when they sign up for a W&B account.

An organization administrator must enable domain claiming. To enable domain capture, see the steps described in Domain capture.

Assign or update a team member’s role

  1. Select the account type icon next to the name of the team member.
  2. From the dropdown, choose the account type you want that team member to possess.

This table lists the roles you can assign to a member of a team:

| Role | Definition |
|---|---|
| Administrator | A user who can add and remove other users in the team, change user roles, and configure team settings. |
| Member | A regular user of a team, invited by the team administrator by email or organization-level username. A member cannot invite other users to the team. |
| View-Only (Enterprise-only feature) | A view-only user of a team, invited by the team administrator by email or organization-level username. A view-only user only has read access to the team and its contents. |
| Service (Enterprise-only feature) | A service worker or service account is an API key that is useful for using W&B with your run automation tools. If you use an API key from a service account for your team, set the WANDB_USERNAME environment variable to correctly attribute runs to the appropriate user. |
| Custom Roles (Enterprise-only feature) | Custom roles allow organization administrators to compose new roles by inheriting from the preceding View-Only or Member roles and adding permissions to achieve fine-grained access control. Team administrators can then assign any of those custom roles to users in their respective teams. Refer to this article for details. |
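For the Service role, the environment variable approach can be sketched in Python. This is a minimal sketch, assuming your automation tool launches the training job as a subprocess; the helper name service_account_env and the example values are hypothetical. WANDB_API_KEY is the environment variable the W&B client reads its API key from, and WANDB_USERNAME attributes the resulting runs to a team member.

```python
import os
from typing import Dict, Optional


def service_account_env(api_key: str, username: str,
                        base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return an environment for a job that logs runs with a service account key.

    WANDB_API_KEY authenticates as the team service account; WANDB_USERNAME
    attributes the resulting runs to a human team member.
    """
    # Copy the base environment so the caller's mapping is never mutated.
    env = dict(os.environ if base is None else base)
    env["WANDB_API_KEY"] = api_key
    env["WANDB_USERNAME"] = username
    return env
```

For example, subprocess.run(["python", "train.py"], env=service_account_env(key, "alice"), check=True) would start a hypothetical train.py with the service account key while attributing its runs to the user alice.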

Remove users from a team

Remove a user from a team using the team’s dashboard. W&B preserves runs created in a team even if the member who created the runs is no longer on that team.

  1. Navigate to https://wandb.ai/<team-name>.
  2. Select Team settings in the left navigation bar.
  3. Select the Users tab.
  4. Hover your mouse next to the name of the user you want to remove. Select the three dots icon (ellipsis) when it appears.
  5. From the dropdown, select Remove user.

2.2.2 - Manage access control for projects

Manage project access using visibility scopes and project-level roles

Define the scope of a W&B project to limit who can view, edit, and submit W&B runs to it.

You can combine two controls to configure the access level for any project within a W&B team. Visibility scope is the higher-level mechanism: use it to control which groups of users can view or submit runs in a project. For a project with Team or Restricted visibility scope, you can then use project-level roles to control the level of access that each user has within the project.

Visibility scopes

There are four project visibility scopes you can choose from. In order of most public to most private, they are:

| Scope | Description |
|---|---|
| Open | Anyone who knows about the project can view it and submit runs or reports. |
| Public | Anyone who knows about the project can view it. Only your team can submit runs or reports. |
| Team | Only members of the parent team can view the project and submit runs or reports. Anyone outside the team cannot access the project. |
| Restricted | Only invited members from the parent team can view the project and submit runs or reports. |
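The visibility scopes can be read as a small access check. The following Python sketch is an illustrative model of the table only, not W&B's implementation; the names Visibility, can_view, and can_submit are hypothetical, and the model ignores project-level roles (for example, a View-Only project role that has read access only).

```python
from enum import Enum


class Visibility(Enum):
    OPEN = "Open"
    PUBLIC = "Public"
    TEAM = "Team"
    RESTRICTED = "Restricted"


def can_view(scope: Visibility, in_team: bool, invited: bool = False) -> bool:
    """Whether a user can view a project with the given visibility scope."""
    if scope in (Visibility.OPEN, Visibility.PUBLIC):
        return True  # anyone who knows about the project
    if scope is Visibility.TEAM:
        return in_team  # members of the parent team
    return in_team and invited  # RESTRICTED: invited parent-team members only


def can_submit(scope: Visibility, in_team: bool, invited: bool = False) -> bool:
    """Whether a user can submit runs or reports to the project."""
    if scope is Visibility.OPEN:
        return True
    if scope in (Visibility.PUBLIC, Visibility.TEAM):
        return in_team  # only the team can submit
    return in_team and invited
```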

Set visibility scope on a new or existing project

Set a project’s visibility scope when you create a project or when editing it later.

Set visibility scope when you create a new project

  1. Navigate to your W&B organization on SaaS Cloud, Dedicated Cloud, or Self-managed instance.
  2. Click the Create a new project button in the left hand sidebar’s My projects section. Alternatively, navigate to the Projects tab of your team and click the Create new project button in the upper right hand corner.
  3. After selecting the parent team and entering the name of the project, select the desired scope from the Project Visibility dropdown.

Complete the following step if you select Restricted visibility.

  1. Provide names of one or more W&B team members in the Invite team members field. Add only those members who are essential to collaborate on the project.

Edit visibility scope of an existing project

  1. Navigate to your W&B Project.
  2. Select the Overview tab on the left column.
  3. Click the Edit Project Details button on the upper right corner.
  4. From the Project Visibility dropdown, select the desired scope.

Complete the following step if you select Restricted visibility.

  1. Go to the Users tab in the project, and click Add user button to invite specific users to the restricted project.

Other key things to note for restricted scope

  • To use a team-level service account in a restricted project, invite or add it to the project explicitly. By default, a team-level service account cannot access a restricted project.
  • You cannot move runs out of a restricted project, but you can move runs from a non-restricted project into a restricted one.
  • You can change the visibility of a restricted project only to Team scope, regardless of the team privacy setting Make all future team projects private (public sharing not allowed).
  • If the owner of a restricted project is not part of the parent team anymore, the team admin should change the owner to ensure seamless operations in the project.

Project level roles

For Team or Restricted scoped projects in your team, you can assign a user a project-level role that differs from their team-level role. For example, if a user has the Member role at the team level, you can assign them the View-Only, Admin, or any available custom role within a Team or Restricted scope project in that team.

Assign project level role to a user

  1. Navigate to your W&B Project.
  2. Select the Overview tab on the left column.
  3. Go to the Users tab in the project.
  4. Click the currently assigned role for the pertinent user in the Project Role field to open a dropdown listing the other available roles.
  5. Select another role from the dropdown. The change should save immediately.

Other key things to note for project level roles

  • By default, the project-level role of every user in a Team or Restricted scoped project inherits their team-level role.
  • You cannot change the project-level role of a user who has the View-Only role at the team level.
  • If a user's project-level role is the same as their team-level role and a team admin later changes the team-level role, the project-level role automatically changes to track it.
  • If you change a user's project-level role so that it differs from their team-level role and a team admin later changes the team-level role, the project-level role stays as is.
  • If you remove a user from a restricted project while their project-level role differs from their team-level role and later add them back, they inherit the team-level role by default. If needed, change the project-level role again so that it differs from the team-level role.
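These inheritance rules behave like a per-user override layered on top of the team-level role. The Python sketch below is a toy model under that assumption, with hypothetical names (ProjectRoles, set_project_role, set_team_role, remove_user) and lowercase role strings; it is not how W&B stores roles.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ProjectRoles:
    """Toy model of project-level roles in one Team or Restricted project."""

    team_roles: Dict[str, str]  # user -> team-level role
    overrides: Dict[str, str] = field(default_factory=dict)  # user -> pinned project role

    def role(self, user: str) -> str:
        # Without an override, the project-level role tracks the team-level role.
        return self.overrides.get(user, self.team_roles[user])

    def set_project_role(self, user: str, role: str) -> None:
        if self.team_roles[user] == "view-only":
            raise ValueError("cannot change the project role of a team-level View-Only user")
        if role == self.team_roles[user]:
            self.overrides.pop(user, None)  # same as team role: keep tracking it
        else:
            self.overrides[user] = role  # differs: stays put when the team role changes

    def set_team_role(self, user: str, role: str) -> None:
        self.team_roles[user] = role  # pinned overrides are left untouched

    def remove_user(self, user: str) -> None:
        # Dropping the override means a later re-add inherits the team-level role.
        self.overrides.pop(user, None)
```

Keeping only the differences as overrides is what makes the "track the team role" and "stay as is" rules fall out without extra bookkeeping.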

2.3 - Automate user and team management

SCIM API

Use SCIM API to manage users, and the teams they belong to, in an efficient and repeatable manner. You can also use the SCIM API to manage custom roles or assign roles to users in your W&B organization. Role endpoints are not part of the official SCIM schema. W&B adds role endpoints to support automated management of custom roles.

SCIM API is especially useful if you want to:

  • manage user provisioning and de-provisioning at scale
  • manage users with a SCIM-supporting Identity Provider

The SCIM API falls into three broad categories: User, Group, and Roles.

User SCIM API

User SCIM API allows for creating, deactivating, getting the details of a user, or listing all users in a W&B organization. This API also supports assigning predefined or custom roles to users in an organization.

Group SCIM API

Group SCIM API allows for managing W&B teams, including creating or removing teams in an organization. Use the PATCH Group endpoint to add or remove users in an existing team.

Custom role API

Custom role SCIM API allows for managing custom roles, including creating, listing, or updating custom roles in an organization.

W&B Python SDK API

Just as the SCIM API lets you automate user and team management, you can use some of the methods in the W&B Python SDK API for the same purpose. Note the following methods:

| Method name | Purpose |
|---|---|
| create_user(email, admin=False) | Add a user to the organization and optionally make them an organization admin. |
| user(userNameOrEmail) | Return an existing user in the organization. |
| user.teams() | Return the teams for the user. You can get the user object using the user(userNameOrEmail) method. |
| create_team(teamName, adminUserName) | Create a new team and optionally make an organization-level user the team admin. |
| team(teamName) | Return an existing team in the organization. |
| Team.invite(userNameOrEmail, admin=False) | Add a user to the team. You can get the team object using the team(teamName) method. |
| Team.create_service_account(description) | Add a service account to the team. You can get the team object using the team(teamName) method. |
| Member.delete() | Remove a member user from a team. You can get the list of member objects in a team from the team object’s members attribute, and you can get the team object using the team(teamName) method. |
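These methods can be combined into a small onboarding routine. This is a minimal sketch, assuming api is a wandb.Api() instance or any object with the same create_user and team methods; the helper name onboard_user is hypothetical. Accepting api as a parameter keeps the function testable without a W&B server.

```python
from typing import Any, Tuple


def onboard_user(api: Any, email: str, team_name: str,
                 team_admin: bool = False) -> Tuple[Any, Any]:
    """Add a user to the organization, then invite them to an existing team.

    `api` is expected to provide create_user() and team(), for example a
    wandb.Api() instance.
    """
    user = api.create_user(email, admin=False)  # organization member, not org admin
    team = api.team(team_name)  # look up the existing team
    invited = team.invite(email, admin=team_admin)  # optionally make them team admin
    return user, invited
```

With the W&B SDK installed, a call would look like onboard_user(wandb.Api(), "new-user@example.com", "my-team"), where the email and team name are placeholders.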

2.4 - Manage users, groups, and roles with SCIM

The System for Cross-domain Identity Management (SCIM) API allows instance or organization admins to manage users, groups, and custom roles in their W&B organization. SCIM groups map to W&B teams.

The SCIM API is accessible at <host-url>/scim/ and supports the /Users and /Groups endpoints with a subset of the fields defined in RFC 7643. It additionally includes the /Roles endpoints, which are not part of the official SCIM schema. W&B adds the /Roles endpoints to support automated management of custom roles in W&B organizations.

Authentication

The SCIM API is accessible by instance or organization admins using basic authentication with their API key. With basic authentication, send the HTTP request with the Authorization header that contains the word Basic followed by a space and a base64-encoded string for username:password where password is your API key. For example, to authorize as demo:p@55w0rd, the header should be Authorization: Basic ZGVtbzpwQDU1dzByZA==.
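The Basic authentication header can be built with the Python standard library. This sketch reproduces the demo:p@55w0rd example; the helper name scim_auth_header is hypothetical.

```python
import base64


def scim_auth_header(username: str, api_key: str) -> str:
    """Build the HTTP Basic Authorization header value for the SCIM API."""
    token = base64.b64encode(f"{username}:{api_key}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


print(scim_auth_header("demo", "p@55w0rd"))  # → Basic ZGVtbzpwQDU1dzByZA==
```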

User resource

The SCIM user resource maps to W&B users.

Get user

  • Endpoint: <host-url>/scim/Users/{id}
  • Method: GET
  • Description: Retrieve the information for a specific user in your SaaS Cloud organization or your Dedicated Cloud or Self-managed instance by providing the user’s unique ID.
  • Request Example:
GET /scim/Users/abc
  • Response Example:
(Status 200)
{
    "active": true,
    "displayName": "Dev User 1",
    "emails": {
        "Value": "dev-user1@test.com",
        "Display": "",
        "Type": "",
        "Primary": true
    },
    "id": "abc",
    "meta": {
        "resourceType": "User",
        "created": "2023-10-01T00:00:00Z",
        "lastModified": "2023-10-01T00:00:00Z",
        "location": "Users/abc"
    },
    "schemas": [
        "urn:ietf:params:scim:schemas:core:2.0:User"
    ],
    "userName": "dev-user1"
}
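The Get user request can be prepared with urllib from the Python standard library. This is a minimal sketch, assuming https://example.wandb.io stands in for your <host-url>; the helper scim_request is hypothetical, and calling it with method="DELETE" would prepare the Delete user request instead. The function only builds the request; sending it is noted in a comment.

```python
import base64
import urllib.request


def scim_request(host_url: str, path: str, username: str, api_key: str,
                 method: str = "GET") -> urllib.request.Request:
    """Prepare, but do not send, an authenticated request to the SCIM API."""
    credentials = f"{username}:{api_key}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    url = f"{host_url.rstrip('/')}/scim/{path.lstrip('/')}"
    return urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}"}, method=method
    )


# Hypothetical host and credentials; replace them with your own.
req = scim_request("https://example.wandb.io", "Users/abc", "demo", "p@55w0rd")
# To send it: json.load(urllib.request.urlopen(req)) returns the User object.
```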

List users

  • Endpoint: <host-url>/scim/Users
  • Method: GET
  • Description: Retrieve the list of all users in your SaaS Cloud organization or your Dedicated Cloud or Self-managed instance.
  • Request Example:
GET /scim/Users
  • Response Example:
(Status 200)
{
    "Resources": [
        {
            "active": true,
            "displayName": "Dev User 1",
            "emails": {
                "Value": "dev-user1@test.com",
                "Display": "",
                "Type": "",
                "Primary": true
            },
            "id": "abc",
            "meta": {
                "resourceType": "User",
                "created": "2023-10-01T00:00:00Z",
                "lastModified": "2023-10-01T00:00:00Z",
                "location": "Users/abc"
            },
            "schemas": [
                "urn:ietf:params:scim:schemas:core:2.0:User"
            ],
            "userName": "dev-user1"
        }
    ],
    "itemsPerPage": 9999,
    "schemas": [
        "urn:ietf:params:scim:api:messages:2.0:ListResponse"
    ],
    "startIndex": 1,
    "totalResults": 1
}
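A List users response can be reduced to the usernames you care about, for example when reconciling W&B users against an identity provider. A minimal Python sketch, with the hypothetical helper active_usernames, that keeps only active users from a ListResponse shaped like the example above.

```python
from typing import Any, Dict, List


def active_usernames(list_response: Dict[str, Any]) -> List[str]:
    """Return the userName of every active user in a SCIM ListResponse."""
    return [
        user["userName"]
        for user in list_response.get("Resources", [])
        if user.get("active", False)  # skip deactivated users
    ]
```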

Create user

  • Endpoint: <host-url>/scim/Users
  • Method: POST
  • Description: Create a new user resource.
  • Supported Fields:
| Field | Type | Required |
|---|---|---|
| emails | Multi-Valued Array | Yes (make sure the primary email is set) |
| userName | String | Yes |
  • Request Example:
POST /scim/Users
{
  "schemas": [
    "urn:ietf:params:scim:schemas:core:2.0:User"
  ],
  "emails": [
    {
      "primary": true,
      "value": "dev-user2@test.com"
    }
  ],
  "userName": "dev-user2"
}
  • Response Example:
(Status 201)
{
    "active": true,
    "displayName": "Dev User 2",
    "emails": {
        "Value": "dev-user2@test.com",
        "Display": "",
        "Type": "",
        "Primary": true
    },
    "id": "def",
    "meta": {
        "resourceType": "User",
        "created": "2023-10-01T00:00:00Z",
        "location": "Users/def"
    },
    "schemas": [
        "urn:ietf:params:scim:schemas:core:2.0:User"
    ],
    "userName": "dev-user2"
}
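A Create user body needs only the two required fields. This Python sketch, with the hypothetical helper create_user_body, builds the body from the request example and marks the single email as primary.

```python
import json
from typing import Any, Dict


def create_user_body(email: str, username: str) -> Dict[str, Any]:
    """Build the POST /scim/Users body with the two required fields."""
    return {
        "schemas": ["urn:ietf:params:scim:schemas:core:2.0:User"],
        "emails": [{"primary": True, "value": email}],  # primary email must be set
        "userName": username,
    }


print(json.dumps(create_user_body("dev-user2@test.com", "dev-user2"), indent=2))
```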

Delete user

  • Endpoint: <host-url>/scim/Users/{id}
  • Method: DELETE
  • Description: Fully delete a user from your SaaS Cloud organization or your Dedicated Cloud or Self-managed instance by providing the user’s unique ID. Use the Create user API to add the user again to the organization or instance if needed.
  • Request Example:
DELETE /scim/Users/abc
  • Response Example:
(Status 204)

Deactivate user

  • Endpoint: <host-url>/scim/Users/{id}
  • Method: PATCH
  • Description: Temporarily deactivate a user in your Dedicated Cloud or Self-managed instance by providing the user’s unique ID. Use the Reactivate user API to reactivate the user when needed.
  • Supported Fields:
| Field | Type | Description |
|---|---|---|
| op | String | Type of operation. The only allowed value is replace. |
| value | Object | Object {"active": false} indicating that the user should be deactivated. |
  • Request Example:
PATCH /scim/Users/abc
{
    "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
    "Operations": [
        {
            "op": "replace",
            "value": {"active": false}
        }
    ]
}
  • Response Example: This returns the User object.
(Status 200)
{
    "active": false,
    "displayName": "Dev User 1",
    "emails": {
        "Value": "dev-user1@test.com",
        "Display": "",
        "Type": "",
        "Primary": true
    },
    "id": "abc",
    "meta": {
        "resourceType": "User",
        "created": "2023-10-01T00:00:00Z",
        "lastModified": "2023-10-01T00:00:00Z",
        "location": "Users/abc"
    },
    "schemas": [
        "urn:ietf:params:scim:schemas:core:2.0:User"
    ],
    "userName": "dev-user1"
}

Reactivate user

  • Endpoint: <host-url>/scim/Users/{id}
  • Method: PATCH
  • Description: Reactivate a deactivated user in your Dedicated Cloud or Self-managed instance by providing the user’s unique ID.
  • Supported Fields:
| Field | Type | Description |
|---|---|---|
| op | String | Type of operation. The only allowed value is replace. |
| value | Object | Object {"active": true} indicating that the user should be reactivated. |
  • Request Example:
PATCH /scim/Users/abc
{
    "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
    "Operations": [
        {
            "op": "replace",
            "value": {"active": true}
        }
    ]
}
  • Response Example: This returns the User object.
(Status 200)
{
    "active": true,
    "displayName": "Dev User 1",
    "emails": {
        "Value": "dev-user1@test.com",
        "Display": "",
        "Type": "",
        "Primary": true
    },
    "id": "abc",
    "meta": {
        "resourceType": "User",
        "created": "2023-10-01T00:00:00Z",
        "lastModified": "2023-10-01T00:00:00Z",
        "location": "Users/abc"
    },
    "schemas": [
        "urn:ietf:params:scim:schemas:core:2.0:User"
    ],
    "userName": "dev-user1"
}
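The Deactivate user and Reactivate user requests differ only in the boolean inside value. A minimal Python sketch, with the hypothetical helper set_active_body, that builds either PatchOp body.

```python
from typing import Any, Dict


def set_active_body(active: bool) -> Dict[str, Any]:
    """PATCH /scim/Users/{id} body: False deactivates the user, True reactivates."""
    return {
        "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
        "Operations": [{"op": "replace", "value": {"active": active}}],
    }
```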

Assign organization-level role to user

  • Endpoint: <host-url>/scim/Users/{id}
  • Method: PATCH
  • Description: Assign an organization-level role to a user. The role can be one of admin, viewer, or member, as described in Assign or update a user’s role. For SaaS Cloud, ensure that you have configured the correct organization for the SCIM API in user settings.
  • Supported Fields:
| Field | Type | Description |
|---|---|---|
| op | String | Type of operation. The only allowed value is replace. |
| path | String | The scope at which the role assignment operation takes effect. The only allowed value is organizationRole. |
| value | String | The predefined organization-level role to assign to the user. It can be one of admin, viewer, or member. This field is case insensitive for predefined roles. |
  • Request Example (sets the user’s organization-scoped role to admin):
PATCH /scim/Users/abc
{
    "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
    "Operations": [
        {
            "op": "replace",
            "path": "organizationRole",
            "value": "admin"
        }
    ]
}
  • Response Example: This returns the User object, including teamRoles (the user’s roles in all the teams they are a part of) and organizationRole (the user’s role at the organization scope).
(Status 200)
{
    "active": true,
    "displayName": "Dev User 1",
    "emails": {
        "Value": "dev-user1@test.com",
        "Display": "",
        "Type": "",
        "Primary": true
    },
    "id": "abc",
    "meta": {
        "resourceType": "User",
        "created": "2023-10-01T00:00:00Z",
        "lastModified": "2023-10-01T00:00:00Z",
        "location": "Users/abc"
    },
    "schemas": [
        "urn:ietf:params:scim:schemas:core:2.0:User"
    ],
    "userName": "dev-user1",
    "teamRoles": [
        {
            "teamName": "team1",
            "roleName": "admin"
        }
    ],
    "organizationRole": "admin"
}
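The organization role request can be built and validated before it is sent. This Python sketch, with the hypothetical helper org_role_body, lowercases the role because predefined roles are case insensitive, and rejects anything other than admin, viewer, or member.

```python
from typing import Any, Dict

ORG_ROLES = {"admin", "viewer", "member"}


def org_role_body(role: str) -> Dict[str, Any]:
    """PATCH /scim/Users/{id} body that sets a user's organization-level role."""
    normalized = role.lower()  # predefined roles are case insensitive
    if normalized not in ORG_ROLES:
        raise ValueError(f"unsupported organization role: {role!r}")
    return {
        "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
        "Operations": [
            {"op": "replace", "path": "organizationRole", "value": normalized}
        ],
    }
```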

Assign team-level role to user

  • Endpoint: <host-url>/scim/Users/{id}
  • Method: PATCH
  • Description: Assign a team-level role to a user. The role can be one of admin, viewer, member, or a custom role, as described in Assign or update a team member’s role. For SaaS Cloud, ensure that you have configured the correct organization for the SCIM API in user settings.
  • Supported Fields:
| Field | Type | Description |
|---|---|---|
| op | String | Type of operation. The only allowed value is replace. |
| path | String | The scope at which the role assignment operation takes effect. The only allowed value is teamRoles. |
| value | Object array | A one-object array where the object consists of teamName and roleName attributes. The teamName is the name of the team where the user holds the role, and roleName can be one of admin, viewer, member, or a custom role. This field is case insensitive for predefined roles and case sensitive for custom roles. |
  • Request Example (sets the user’s role in the team team1 to admin; roleName is case insensitive for predefined roles and case sensitive for custom roles):
PATCH /scim/Users/abc
{
    "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
    "Operations": [
        {
            "op": "replace",
            "path": "teamRoles",
            "value": [
                {
                    "roleName": "admin",
                    "teamName": "team1"
                }
            ]
        }
    ]
}
  • Response Example: This returns the User object, including teamRoles (the user’s roles in all the teams they are a part of) and organizationRole (the user’s role at the organization scope).
(Status 200)
{
    "active": true,
    "displayName": "Dev User 1",
    "emails": {
        "Value": "dev-user1@test.com",
        "Display": "",
        "Type": "",
        "Primary": true
    },
    "id": "abc",
    "meta": {
        "resourceType": "User",
        "created": "2023-10-01T00:00:00Z",
        "lastModified": "2023-10-01T00:00:00Z",
        "location": "Users/abc"
    },
    "schemas": [
        "urn:ietf:params:scim:schemas:core:2.0:User"
    ],
    "userName": "dev-user1",
    "teamRoles": [
        {
            "teamName": "team1",
            "roleName": "admin"
        }
    ],
    "organizationRole": "admin"
}

Group resource

The SCIM group resource maps to W&B teams: when you create a SCIM group in a W&B deployment, W&B creates a team. The same mapping applies to the other group endpoints.

Get team

  • Endpoint: <host-url>/scim/Groups/{id}
  • Method: GET
  • Description: Retrieve team information by providing the team’s unique ID.
  • Request Example:
GET /scim/Groups/ghi
  • Response Example:
(Status 200)
{
    "displayName": "wandb-devs",
    "id": "ghi",
    "members": [
        {
            "Value": "abc",
            "Ref": "",
            "Type": "",
            "Display": "dev-user1"
        }
    ],
    "meta": {
        "resourceType": "Group",
        "created": "2023-10-01T00:00:00Z",
        "lastModified": "2023-10-01T00:00:00Z",
        "location": "Groups/ghi"
    },
    "schemas": [
        "urn:ietf:params:scim:schemas:core:2.0:Group"
    ]
}
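As a small illustration of consuming this response, the sketch below parses an abridged copy of the example body and maps each member's user ID to their username. Note that the example response capitalizes the member sub-fields (Value, Display); the sketch relies on that casing as shown here.

```python
import json

# Abridged copy of the GET /scim/Groups/ghi response shown above.
response_text = """
{
    "displayName": "wandb-devs",
    "id": "ghi",
    "members": [
        {"Value": "abc", "Ref": "", "Type": "", "Display": "dev-user1"}
    ],
    "meta": {"resourceType": "Group", "location": "Groups/ghi"},
    "schemas": ["urn:ietf:params:scim:schemas:core:2.0:Group"]
}
"""


def member_usernames(group):
    """Map each team member's user ID to their display name (username)."""
    return {m["Value"]: m["Display"] for m in group.get("members", [])}


group = json.loads(response_text)
print(group["displayName"], member_usernames(group))
```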

List teams

  • Endpoint: <host-url>/scim/Groups
  • Method: GET
  • Description: Retrieve a list of teams.
  • Request Example:
GET /scim/Groups
  • Response Example:
(Status 200)
{
    "Resources": [
        {
            "displayName": "wandb-devs",
            "id": "ghi",
            "members": [
                {
                    "Value": "abc",
                    "Ref": "",
                    "Type": "",
                    "Display": "dev-user1"
                }
            ],
            "meta": {
                "resourceType": "Group",
                "created": "2023-10-01T00:00:00Z",
                "lastModified": "2023-10-01T00:00:00Z",
                "location": "Groups/ghi"
            },
            "schemas": [
                "urn:ietf:params:scim:schemas:core:2.0:Group"
            ]
        }
    ],
    "itemsPerPage": 9999,
    "schemas": [
        "urn:ietf:params:scim:api:messages:2.0:ListResponse"
    ],
    "startIndex": 1,
    "totalResults": 1
}

Create team

  • Endpoint: <host-url>/scim/Groups
  • Method: POST
  • Description: Create a new team resource.
  • Supported Fields:
Field Type Required
displayName String Yes
members Multi-Valued Array Yes (value sub-field is required and maps to a user ID)
  • Request Example:

Creating a team called wandb-support with dev-user2 as its member.

POST /scim/Groups
{
    "schemas": ["urn:ietf:params:scim:schemas:core:2.0:Group"],
    "displayName": "wandb-support",
    "members": [
        {
            "value": "def"
        }
    ]
}
  • Response Example:
(Status 201)
{
    "displayName": "wandb-support",
    "id": "jkl",
    "members": [
        {
            "Value": "def",
            "Ref": "",
            "Type": "",
            "Display": "dev-user2"
        }
    ],
    "meta": {
        "resourceType": "Group",
        "created": "2023-10-01T00:00:00Z",
        "lastModified": "2023-10-01T00:00:00Z",
        "location": "Groups/jkl"
    },
    "schemas": [
        "urn:ietf:params:scim:schemas:core:2.0:Group"
    ]
}

Update team

  • Endpoint: <host-url>/scim/Groups/{id}
  • Method: PATCH
  • Description: Update an existing team’s membership list.
  • Supported Operations: add member, remove member
  • Request Example:

Adding dev-user2 to wandb-devs

PATCH /scim/Groups/ghi
{
    "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
    "Operations": [
        {
            "op": "add",
            "path": "members",
            "value": [
                {
                    "value": "def"
                }
            ]
        }
    ]
}
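Hand-written request bodies are easy to break (for example, with a trailing comma after the last field). A minimal Python sketch that generates the add-member body with json.dumps, which always emits valid JSON, might look like this. Only the add operation shown in the example is covered; the exact body for removing a member is not documented on this page.

```python
import json


def add_members_patch(user_ids):
    """Build the SCIM PatchOp body that adds users (by user ID) to a team."""
    return {
        "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
        "Operations": [
            {
                "op": "add",
                "path": "members",
                # Each member object carries the user ID in its "value" sub-field.
                "value": [{"value": uid} for uid in user_ids],
            }
        ],
    }


body = add_members_patch(["def"])
print(json.dumps(body, indent=4))  # body for PATCH /scim/Groups/ghi
```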
  • Response Example:
(Status 200)
{
    "displayName": "wandb-devs",
    "id": "ghi",
    "members": [
        {
            "Value": "abc",
            "Ref": "",
            "Type": "",
            "Display": "dev-user1"
        },
        {
            "Value": "def",
            "Ref": "",
            "Type": "",
            "Display": "dev-user2"
        }
    ],
    "meta": {
        "resourceType": "Group",
        "created": "2023-10-01T00:00:00Z",
        "lastModified": "2023-10-01T00:01:00Z",
        "location": "Groups/ghi"
    },
    "schemas": [
        "urn:ietf:params:scim:schemas:core:2.0:Group"
    ]
}

Delete team

  • The SCIM API does not support deleting teams, because additional data is linked to teams. Delete teams from the W&B App instead, where you can confirm that you want everything deleted.

Role resource

The SCIM role resource maps to W&B custom roles. As mentioned earlier, the /Roles endpoints are not part of the official SCIM schema; W&B adds them to support automated management of custom roles in W&B organizations.

Get custom role

  • Endpoint: <host-url>/scim/Roles/{id}
  • Method: GET
  • Description: Retrieve information for a custom role by providing the role’s unique ID.
  • Request Example:
GET /scim/Roles/abc
  • Response Example:
(Status 200)
{
    "description": "A sample custom role for example",
    "id": "Um9sZTo3",
    "inheritedFrom": "member", // indicates the predefined role
    "meta": {
        "resourceType": "Role",
        "created": "2023-11-20T23:10:14Z",
        "lastModified": "2023-11-20T23:31:23Z",
        "location": "Roles/Um9sZTo3"
    },
    "name": "Sample custom role",
    "organizationID": "T3JnYW5pemF0aW9uOjE0ODQ1OA==",
    "permissions": [
        {
            "name": "artifact:read",
            "isInherited": true // inherited from member predefined role
        },
        ...
        ...
        {
            "name": "project:update",
            "isInherited": false // custom permission added by admin
        }
    ],
    "schemas": [
        ""
    ]
}

List custom roles

  • Endpoint: <host-url>/scim/Roles
  • Method: GET
  • Description: Retrieve information for all custom roles in the W&B organization.
  • Request Example:
GET /scim/Roles
  • Response Example:
(Status 200)
{
   "Resources": [
        {
            "description": "A sample custom role for example",
            "id": "Um9sZTo3",
            "inheritedFrom": "member", // indicates the predefined role that the custom role inherits from
            "meta": {
                "resourceType": "Role",
                "created": "2023-11-20T23:10:14Z",
                "lastModified": "2023-11-20T23:31:23Z",
                "location": "Roles/Um9sZTo3"
            },
            "name": "Sample custom role",
            "organizationID": "T3JnYW5pemF0aW9uOjE0ODQ1OA==",
            "permissions": [
                {
                    "name": "artifact:read",
                    "isInherited": true // inherited from member predefined role
                },
                ...
                ...
                {
                    "name": "project:update",
                    "isInherited": false // custom permission added by admin
                }
            ],
            "schemas": [
                ""
            ]
        },
        {
            "description": "Another sample custom role for example",
            "id": "Um9sZToxMg==",
            "inheritedFrom": "viewer", // indicates the predefined role that the custom role inherits from
            "meta": {
                "resourceType": "Role",
                "created": "2023-11-21T01:07:50Z",
                "location": "Roles/Um9sZToxMg=="
            },
            "name": "Sample custom role 2",
            "organizationID": "T3JnYW5pemF0aW9uOjE0ODQ1OA==",
            "permissions": [
                {
                    "name": "launchagent:read",
                    "isInherited": true // inherited from viewer predefined role
                },
                ...
                ...
                {
                    "name": "run:stop",
                    "isInherited": false // custom permission added by admin
                }
            ],
            "schemas": [
                ""
            ]
        }
    ],
    "itemsPerPage": 9999,
    "schemas": [
        "urn:ietf:params:scim:api:messages:2.0:ListResponse"
    ],
    "startIndex": 1,
    "totalResults": 2
}
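Each permission in a role carries an isInherited flag that distinguishes permissions inherited from the predefined role from those an admin added. The Python sketch below separates the two. Because the examples above contain // comments and ... placeholders, they are not valid JSON, so the sketch uses an abridged dictionary modeled on the first role instead of parsing the example verbatim.

```python
def split_permissions(role):
    """Separate a custom role's permission names into inherited and custom lists."""
    inherited, custom = [], []
    for perm in role.get("permissions", []):
        (inherited if perm.get("isInherited") else custom).append(perm["name"])
    return inherited, custom


# Abridged version of the first role in the List custom roles response.
role = {
    "name": "Sample custom role",
    "inheritedFrom": "member",
    "permissions": [
        {"name": "artifact:read", "isInherited": True},
        {"name": "project:update", "isInherited": False},
    ],
}

inherited, custom = split_permissions(role)
print(role["name"], "inherits from", role["inheritedFrom"])
print("inherited:", inherited)
print("custom:", custom)
```

To process a full list response, apply split_permissions to each entry of its Resources array.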

Create custom role

  • Endpoint: <host-url>/scim/Roles
  • Method: POST
  • Description: Create a new custom role in the W&B organization.
  • Supported Fields:
Field Type Description
name String Name of the custom role
description String Description of the custom role
permissions Object array Array of permission objects, where each object includes a name string field of the form w&bobject:operation. For example, the permission to delete W&B runs has the name run:delete.
inheritedFrom String The predefined role that the custom role inherits from. It can be either member or viewer.
  • Request Example:
POST /scim/Roles
{
    "schemas": ["urn:ietf:params:scim:schemas:core:2.0:Role"],
    "name": "Sample custom role",
    "description": "A sample custom role for example",
    "permissions": [
        {
            "name": "project:update"
        }
    ],
    "inheritedFrom": "member"
}
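A minimal Python sketch of building this request body, with a client-side sanity check on the inputs, follows. The permission-name pattern (two non-empty parts separated by a single colon) is an assumption derived from the w&bobject:operation form described above, not W&B's own validation rule; the server remains the authority on which permission names exist.

```python
import re

# Rough check for "<object>:<operation>" with no spaces or extra colons.
# This pattern is an assumption, not W&B's own validation rule.
PERMISSION_RE = re.compile(r"[^:\s]+:[^:\s]+")
BASE_ROLES = {"member", "viewer"}


def create_role_body(name, description, permissions, inherited_from):
    """Build the body for POST /scim/Roles, rejecting obviously malformed input."""
    if inherited_from not in BASE_ROLES:
        raise ValueError("inheritedFrom must be 'member' or 'viewer'")
    bad = [p for p in permissions if not PERMISSION_RE.fullmatch(p)]
    if bad:
        raise ValueError(f"permissions not of the form object:operation: {bad}")
    return {
        "schemas": ["urn:ietf:params:scim:schemas:core:2.0:Role"],
        "name": name,
        "description": description,
        "permissions": [{"name": p} for p in permissions],
        "inheritedFrom": inherited_from,
    }


body = create_role_body(
    "Sample custom role", "A sample custom role for example", ["project:update"], "member"
)
print(body)
```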
  • Response Example:
(Status 201)
{
    "description": "A sample custom role for example",
    "id": "Um9sZTo3",
    "inheritedFrom": "member", // indicates the predefined role
    "meta": {
        "resourceType": "Role",
        "created": "2023-11-20T23:10:14Z",
        "lastModified": "2023-11-20T23:31:23Z",
        "location": "Roles/Um9sZTo3"
    },
    "name": "Sample custom role",
    "organizationID": "T3JnYW5pemF0aW9uOjE0ODQ1OA==",
    "permissions": [
        {
            "name": "artifact:read",
            "isInherited": true // inherited from member predefined role
        },
        ...
        ...
        {
            "name": "project:update",
            "isInherited": false // custom permission added by admin
        }
    ],
    "schemas": [
        ""
    ]
}

Delete custom role

  • Endpoint: <host-url>/scim/Roles/{id}
  • Method: DELETE
  • Description: Delete a custom role in the W&B organization. Use this operation with caution: every user who was assigned the custom role is reassigned the predefined role that the custom role inherited from.
  • Request Example:
DELETE /scim/Roles/abc
  • Response Example:
(Status 204)

Update custom role permissions

  • Endpoint: <host-url>/scim/Roles/{id}
  • Method: PATCH
  • Description: Add or remove custom permissions in a custom role in the W&B organization.
  • Supported Fields:
Field Type Description
Operations Object array Array of operation objects
op String Type of operation within the operation object. It can be either add or remove.
path String Static field in the operation object. The only allowed value is permissions.
value Object array Array of permission objects, where each object includes a name string field of the form w&bobject:operation. For example, the permission to delete W&B runs has the name run:delete.
  • Request Example:
PATCH /scim/Roles/abc
{
    "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
    "Operations": [
        {
            "op": "add", // indicates the type of operation, other possible value being `remove`
            "path": "permissions",
            "value": [
                {
                    "name": "project:delete"
                }
            ]
        }
    ]
}
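The request body differs between adding and removing permissions only in the op field. A short Python sketch that builds either variant and rejects any other operation, per the field table above:

```python
def permissions_patch(op, permission_names):
    """Build the PATCH /scim/Roles/{id} body that adds or removes custom permissions."""
    if op not in ("add", "remove"):
        raise ValueError("op must be 'add' or 'remove'")
    return {
        "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
        "Operations": [
            {
                "op": op,
                "path": "permissions",  # the only allowed path
                "value": [{"name": n} for n in permission_names],
            }
        ],
    }


add_body = permissions_patch("add", ["project:delete"])
remove_body = permissions_patch("remove", ["project:update"])
print(add_body)
print(remove_body)
```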
  • Response Example:
(Status 200)
{
    "description": "A sample custom role for example",
    "id": "Um9sZTo3",
    "inheritedFrom": "member", // indicates the predefined role
    "meta": {
        "resourceType": "Role",
        "created": "2023-11-20T23:10:14Z",
        "lastModified": "2023-11-20T23:31:23Z",
        "location": "Roles/Um9sZTo3"
    },
    "name": "Sample custom role",
    "organizationID": "T3JnYW5pemF0aW9uOjE0ODQ1OA==",
    "permissions": [
        {
            "name": "artifact:read",
            "isInherited": true // inherited from member predefined role
        },
        ...
        ...
        {
            "name": "project:update",
            "isInherited": false // existing custom permission added by admin before the update
        },
        {
            "name": "project:delete",
            "isInherited": false // new custom permission added by admin as part of the update
        }
    ],
    "schemas": [
        ""
    ]
}

Update custom role metadata

  • Endpoint: <host-url>/scim/Roles/{id}
  • Method: PUT
  • Description: Update the name, description, or inherited role for a custom role in the W&B organization. This operation doesn't affect the existing non-inherited (custom) permissions in the custom role.
  • Supported Fields:
Field Type Description
name String Name of the custom role
description String Description of the custom role
inheritedFrom String The predefined role which the custom role inherits from. It can either be member or viewer.
  • Request Example:
PUT /scim/Roles/abc
{
    "schemas": ["urn:ietf:params:scim:schemas:core:2.0:Role"],
    "name": "Sample custom role",
    "description": "A sample custom role for example but now based on viewer",
    "inheritedFrom": "viewer"
}
  • Response Example:
(Status 200)
{
    "description": "A sample custom role for example but now based on viewer", // changed the description per the request
    "id": "Um9sZTo3",
    "inheritedFrom": "viewer", // indicates the predefined role which is changed per the request
    "meta": {
        "resourceType": "Role",
        "created": "2023-11-20T23:10:14Z",
        "lastModified": "2023-11-20T23:31:23Z",
        "location": "Roles/Um9sZTo3"
    },
    "name": "Sample custom role",
    "organizationID": "T3JnYW5pemF0aW9uOjE0ODQ1OA==",
    "permissions": [
        {
            "name": "artifact:read",
            "isInherited": true // inherited from viewer predefined role
        },
        ... // Any permissions that are in the member predefined role but not in viewer are no longer inherited after the update
        {
            "name": "project:update",
            "isInherited": false // custom permission added by admin
        },
        {
            "name": "project:delete",
            "isInherited": false // custom permission added by admin
        }
    ],
    "schemas": [
        ""
    ]
}

2.5 - Advanced IAM configuration

In addition to basic environment variables, you can use environment variables to configure IAM options for your Dedicated Cloud or Self-managed instance.

Choose any of the following environment variables for your instance depending on your IAM needs.

Environment variable Description
DISABLE_SSO_PROVISIONING Set this to true to turn off user auto-provisioning in your W&B instance.
SESSION_LENGTH If you would like to change the default user session expiry time, set this variable to the desired number of hours. For example, set SESSION_LENGTH to 24 to configure session expiry time to 24 hours. The default value is 720 hours.
GORILLA_ENABLE_SSO_GROUP_CLAIMS If you are using OIDC based SSO, set this variable to true to automate W&B team membership in your instance based on your OIDC groups. Add a groups claim to the user's OIDC token. It should be a string array where each entry is the name of a W&B team that the user should belong to. The array should include all the teams that a user is a part of.
GORILLA_LDAP_GROUP_SYNC If you are using LDAP based SSO, set it to true to automate W&B team membership in your instance based on your LDAP groups.
GORILLA_OIDC_CUSTOM_SCOPES If you are using OIDC based SSO, you can specify additional scopes that the W&B instance should request from your identity provider. These custom scopes do not change W&B's SSO functionality in any way.
GORILLA_USE_IDENTIFIER_CLAIMS If you are using OIDC based SSO, set this variable to true to enforce your users' usernames and full names from specific OIDC claims from your identity provider. If set, ensure that you configure the enforced username and full name in the preferred_username and name OIDC claims, respectively. Usernames can contain only alphanumeric characters, underscores, and hyphens.
GORILLA_DISABLE_PERSONAL_ENTITY Set this to true to turn off personal user projects in your W&B instance. If set, users cannot create new personal projects in their personal entities, and writes to existing personal projects are turned off.
GORILLA_DISABLE_ADMIN_TEAM_ACCESS Set this to true to restrict Organization or Instance Admins from self-joining or adding themselves to a W&B team, thus ensuring that only Data & AI personas have access to the projects within the teams.
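When GORILLA_USE_IDENTIFIER_CLAIMS is set, the value in the preferred_username claim must respect the username character rule above. The Python sketch below checks a claim value before you configure it in your identity provider. It assumes that "alphanumeric" means ASCII letters and digits, which is an interpretation rather than something this page states.

```python
import re

# ASCII letters and digits plus underscores and hyphens (assumed interpretation).
USERNAME_RE = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_enforced_username(name):
    """Return True if a preferred_username value uses only the allowed characters."""
    return USERNAME_RE.fullmatch(name) is not None


for candidate in ["dev-user_1", "dev.user", "dev user"]:
    print(candidate, is_valid_enforced_username(candidate))
```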

3 - Data security

3.1 - Bring your own bucket (BYOB)

Bring your own bucket (BYOB) allows you to store W&B artifacts and other related sensitive data in your own cloud or on-prem infrastructure. With Dedicated cloud or SaaS Cloud, data that you store in your bucket is not copied to the W&B-managed infrastructure.

Configuration options

You can configure your storage bucket at one of two scopes: the Instance level or the Team level.

  • Instance level: Any user that has relevant permissions within your organization can access files stored in your instance level storage bucket.
  • Team level: Members of a W&B Team can access files stored in the bucket configured at the Team level. Team level storage buckets allow greater data access control and data isolation for teams with highly sensitive data or strict compliance requirements.

You can configure your bucket at both the instance level and separately for one or more teams within your organization.

For example, suppose you have a team called Kappa in your organization. Your organization (and Team Kappa) use the Instance level storage bucket by default. Next, you create a team called Omega. When you create Team Omega, you configure a Team level storage bucket for that team. Files generated by Team Omega are not accessible by Team Kappa. However, files created by Team Kappa are accessible by Team Omega. If you want to isolate data for Team Kappa, you must configure a Team level storage bucket for them as well.
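The Kappa and Omega example reduces to a simple lookup: a team with its own team-level bucket stores files there, and every other team falls back to the shared instance-level bucket. The Python sketch below models that rule; the bucket names are hypothetical, and this is an illustration of the behavior described above, not W&B's implementation.

```python
INSTANCE_BUCKET = "s3://instance-bucket"  # hypothetical instance-level bucket


def storage_bucket_for(team, team_buckets, instance_bucket=INSTANCE_BUCKET):
    """Return the bucket that holds a team's files.

    A team with its own team-level bucket uses it; every other team falls
    back to the shared instance-level bucket.
    """
    return team_buckets.get(team, instance_bucket)


# Only Team Omega has a team-level bucket, as in the example above.
team_buckets = {"omega": "s3://omega-bucket"}  # hypothetical bucket name
print(storage_bucket_for("kappa", team_buckets))
print(storage_bucket_for("omega", team_buckets))
```

Because Kappa's files land in the instance-level bucket, which any user with relevant permissions can access, Omega members can reach them; giving Kappa an entry in team_buckets is the equivalent of configuring a Team level bucket for it.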

Availability matrix

The following table shows the availability of BYOB across different W&B Server deployment types. An X means the feature is available on the specific deployment type.

W&B Server deployment type Instance level Team level Additional information
Dedicated cloud X X Both the instance and team level BYOB are available for Amazon Web Services, Google Cloud Platform and Microsoft Azure. For the team-level BYOB, you can connect to a cloud-native storage bucket in the same or another cloud, or even to S3-compatible secure storage such as MinIO hosted in your cloud or on-prem infrastructure.
SaaS Cloud Not Applicable X The team level BYOB is available only for Amazon Web Services and Google Cloud Platform. W&B fully manages the default and only storage bucket for Microsoft Azure.
Self-managed X X Instance level BYOB is the default since the instance is fully managed by you. If your self-managed instance is in the cloud, you can connect to a cloud-native storage bucket in the same or another cloud for team-level BYOB. You can also use S3-compatible secure storage such as MinIO for either instance-level or team-level BYOB.

Cross-cloud or S3-compatible storage for team-level BYOB

You can connect to a cloud-native storage bucket in another cloud or to an S3-compatible storage bucket like MinIO for team-level BYOB in your Dedicated cloud or Self-managed instance.

To enable the use of cross-cloud or S3-compatible storage, set the GORILLA_SUPPORTED_FILE_STORES environment variable for your W&B instance to the storage bucket, including the relevant access key, in one of the following formats.

Configure an S3-compatible storage for team-level BYOB in Dedicated cloud or Self-managed instance

Specify the path using the following format:

s3://<accessKey>:<secretAccessKey>@<url_endpoint>/<bucketName>?region=<region>?tls=true

The region parameter is mandatory, except for when your W&B instance is in AWS and the AWS_REGION configured on the W&B instance nodes matches the region configured for the S3-compatible storage.
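The following Python sketch fills in the S3-compatible template exactly as documented above, including its ?region=...?tls=true suffix, and then parses the result with urllib.parse to confirm that the credentials, endpoint, and bucket landed in the expected positions. The keys, endpoint, and bucket are placeholders. The page does not say whether secret keys containing characters such as / must be URL-encoded, so the sketch inserts them unchanged; check with W&B Support if your key contains such characters.

```python
from urllib.parse import urlsplit


def s3_compatible_store(access_key, secret_key, endpoint, bucket, region):
    """Fill in the documented S3-compatible template for GORILLA_SUPPORTED_FILE_STORES."""
    # Reproduces the template literally, including "?region=...?tls=true".
    return f"s3://{access_key}:{secret_key}@{endpoint}/{bucket}?region={region}?tls=true"


value = s3_compatible_store(
    "MY_ACCESS_KEY", "MY_SECRET_KEY", "minio.example.com:9000", "wandb-team-bucket", "us-east-1"
)
parts = urlsplit(value)
# Sanity-check where each component landed.
print(parts.username, parts.hostname, parts.port, parts.path)
```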

Configure a cross-cloud native storage for team-level BYOB in Dedicated cloud or Self-managed instance

Specify the path in a format specific to the locations of your W&B instance and storage bucket:

From W&B instance in GCP or Azure to a bucket in AWS:

s3://<accessKey>:<secretAccessKey>@<s3_regional_url_endpoint>/<bucketName>

From W&B instance in GCP or AWS to a bucket in Azure:

az://:<urlEncodedAccessKey>@<storageAccountName>/<containerName>

From W&B instance in AWS or Azure to a bucket in GCP:

gs://<serviceAccountEmail>:<urlEncodedPrivateKey>@<bucketName>
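The three cross-cloud templates differ mainly in which credential must be URL-encoded. The Python sketch below fills them in, URL-encoding the Azure access key and the GCP private key as the placeholders above indicate. It assumes that "URL-encoded" means percent-encoding every reserved character (urllib.parse.quote with safe=""); all argument values are placeholders.

```python
from urllib.parse import quote


def aws_store(access_key, secret_key, s3_regional_endpoint, bucket):
    """W&B instance in GCP or Azure, bucket in AWS."""
    return f"s3://{access_key}:{secret_key}@{s3_regional_endpoint}/{bucket}"


def azure_store(access_key, storage_account, container):
    """W&B instance in GCP or AWS, bucket in Azure (note the empty user part)."""
    return f"az://:{quote(access_key, safe='')}@{storage_account}/{container}"


def gcs_store(service_account_email, private_key, bucket):
    """W&B instance in AWS or Azure, bucket in GCP."""
    return f"gs://{service_account_email}:{quote(private_key, safe='')}@{bucket}"


print(aws_store("AK", "SK", "s3.us-west-2.amazonaws.com", "wandb-bucket"))
print(azure_store("a+b/c=", "mystorageaccount", "wandb-container"))
print(gcs_store("sa@example-project.iam.gserviceaccount.com", "key\nline", "wandb-bucket"))
```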

Reach out to W&B Support at support@wandb.com for more information.

Cloud storage in same cloud as W&B platform

Based on your use case, configure a storage bucket at the team or instance level. How a storage bucket is provisioned or configured is the same irrespective of the level it’s configured at, except for the access mechanism in Azure.

For AWS:

  1. Provision the KMS Key

    W&B requires you to provision a KMS Key to encrypt and decrypt the data on the S3 bucket. The key usage type must be ENCRYPT_DECRYPT. Assign the following policy to the key:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid" : "Internal",
          "Effect" : "Allow",
          "Principal" : { "AWS" : "<Your_Account_Id>" },
          "Action" : "kms:*",
          "Resource" : "<aws_kms_key.key.arn>"
        },
        {
          "Sid" : "External",
          "Effect" : "Allow",
          "Principal" : { "AWS" : "<aws_principal_and_role_arn>" },
          "Action" : [
            "kms:Decrypt",
            "kms:Describe*",
            "kms:Encrypt",
            "kms:ReEncrypt*",
            "kms:GenerateDataKey*"
          ],
          "Resource" : "<aws_kms_key.key.arn>"
        }
      ]
    }
    

    Replace <Your_Account_Id> and <aws_kms_key.key.arn> accordingly.

    If you are using SaaS Cloud or Dedicated cloud, replace <aws_principal_and_role_arn> with the corresponding value.

    This policy grants your AWS account full access to the key and also assigns the required permissions to the AWS account hosting the W&B Platform. Keep a record of the KMS Key ARN.

  2. Provision the S3 Bucket

    Follow these steps to provision the S3 bucket in your AWS account:

    1. Create the S3 bucket with a name of your choice. Optionally, create a folder that you can configure as a sub-path to store all W&B files.

    2. Enable bucket versioning.

    3. Enable server-side encryption, using the KMS key from the previous step.

    4. Configure CORS with the following policy:

      [
          {
              "AllowedHeaders": [
                  "*"
              ],
              "AllowedMethods": [
                  "GET",
                  "HEAD",
                  "PUT"
              ],
              "AllowedOrigins": [
                  "*"
              ],
              "ExposeHeaders": [
                  "ETag"
              ],
              "MaxAgeSeconds": 3600
          }
      ]
      
    5. Grant the required S3 permissions to the AWS account hosting the W&B Platform. The W&B Platform needs these permissions to generate the pre-signed URLs that AI workloads in your cloud infrastructure or user browsers use to access the bucket.

      {
        "Version": "2012-10-17",
        "Id": "WandBAccess",
        "Statement": [
          {
            "Sid": "WAndBAccountAccess",
            "Effect": "Allow",
            "Principal": { "AWS": "<aws_principal_and_role_arn>" },
              "Action" : [
                "s3:GetObject*",
                "s3:GetEncryptionConfiguration",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:ListBucketVersions",
                "s3:AbortMultipartUpload",
                "s3:DeleteObject",
                "s3:PutObject",
                "s3:GetBucketCORS",
                "s3:GetBucketLocation",
                "s3:GetBucketVersioning"
              ],
            "Resource": [
              "arn:aws:s3:::<wandb_bucket>",
              "arn:aws:s3:::<wandb_bucket>/*"
            ]
          }
        ]
      }
      

      Replace <wandb_bucket> accordingly and keep a record of the bucket name. For instance-level BYOB on Dedicated cloud, share the bucket name with your W&B team. For team-level BYOB on any deployment type, configure the bucket when you create the team.

      If you are using SaaS Cloud or Dedicated cloud, replace <aws_principal_and_role_arn> with the corresponding value.

For more details, see the AWS self-managed hosting guide.
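If you manage these AWS policies from scripts, generating them avoids copy-paste mistakes in the placeholders. The Python sketch below fills in the KMS key policy from step 1 and the bucket policy from step 2 and prints them as JSON, ready to paste into the AWS console or pass to your own tooling. The account ID, key ARN, bucket name, and W&B principal ARN are hypothetical placeholders; obtain the real principal value for your deployment type as described above.

```python
import json

# Hypothetical placeholder values; substitute your own and the W&B-provided principal.
ACCOUNT_ID = "123456789012"
KEY_ARN = "arn:aws:kms:us-east-1:123456789012:key/example-key-id"
WANDB_PRINCIPAL = "arn:aws:iam::000000000000:role/example-wandb-role"
BUCKET = "my-wandb-bucket"

KMS_EXTERNAL_ACTIONS = [
    "kms:Decrypt", "kms:Describe*", "kms:Encrypt", "kms:ReEncrypt*", "kms:GenerateDataKey*",
]

S3_ACTIONS = [
    "s3:GetObject*", "s3:GetEncryptionConfiguration", "s3:ListBucket",
    "s3:ListBucketMultipartUploads", "s3:ListBucketVersions", "s3:AbortMultipartUpload",
    "s3:DeleteObject", "s3:PutObject", "s3:GetBucketCORS", "s3:GetBucketLocation",
    "s3:GetBucketVersioning",
]


def kms_key_policy(account_id, key_arn, wandb_principal):
    """KMS key policy from step 1 with the placeholders filled in."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Sid": "Internal", "Effect": "Allow", "Principal": {"AWS": account_id},
             "Action": "kms:*", "Resource": key_arn},
            {"Sid": "External", "Effect": "Allow", "Principal": {"AWS": wandb_principal},
             "Action": KMS_EXTERNAL_ACTIONS, "Resource": key_arn},
        ],
    }


def bucket_policy(bucket, wandb_principal):
    """S3 bucket policy from step 2 with the placeholders filled in."""
    return {
        "Version": "2012-10-17",
        "Id": "WandBAccess",
        "Statement": [
            {
                "Sid": "WAndBAccountAccess",
                "Effect": "Allow",
                "Principal": {"AWS": wandb_principal},
                "Action": S3_ACTIONS,
                # Both the bucket itself and every object in it.
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            }
        ],
    }


print(json.dumps(kms_key_policy(ACCOUNT_ID, KEY_ARN, WANDB_PRINCIPAL), indent=2))
print(json.dumps(bucket_policy(BUCKET, WANDB_PRINCIPAL), indent=2))
```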

For GCP:

  1. Provision the GCS Bucket

    Follow these steps to provision the GCS bucket in your GCP project:

    1. Create the GCS bucket with a name of your choice. Optionally, create a folder that you can configure as a sub-path to store all W&B files.

    2. Enable soft deletion.

    3. Enable object versioning.

    4. Set encryption type to Google-managed.

    5. Set the CORS policy with gsutil. This is not possible in the UI.

    6. Create a file called cors-policy.json locally.

    7. Copy the following CORS policy into the file and save it.

      [
      {
        "origin": ["*"],
        "responseHeader": ["Content-Type"],
        "exposeHeaders": ["ETag"],
        "method": ["GET", "HEAD", "PUT"],
        "maxAgeSeconds": 3600
      }
      ]
      
    8. Replace <bucket_name> with the correct bucket name and run gsutil.

      gsutil cors set cors-policy.json gs://<bucket_name>
      
    9. Verify the bucket’s policy. Replace <bucket_name> with the correct bucket name.

      gsutil cors get gs://<bucket_name>
      
  2. If you are using SaaS Cloud or Dedicated cloud, grant the Storage Admin role to the GCP service account linked to the W&B Platform:

    • For SaaS Cloud, the account is: wandb-integration@wandb-production.iam.gserviceaccount.com
    • For Dedicated cloud, the account is: deploy@wandb-production.iam.gserviceaccount.com

    Keep a record of the bucket name. For instance-level BYOB on Dedicated cloud, share the bucket name with your W&B team. For team-level BYOB on any deployment type, configure the bucket when you create the team.
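Steps 6 and 7 of the GCS procedure can also be scripted. The Python sketch below writes the documented CORS policy verbatim to cors-policy.json in the current directory and prints the gsutil command from step 8; replace <bucket_name> with your bucket before running that command.

```python
import json
from pathlib import Path

# The CORS policy from step 7, reproduced as-is.
CORS_POLICY = [
    {
        "origin": ["*"],
        "responseHeader": ["Content-Type"],
        "exposeHeaders": ["ETag"],
        "method": ["GET", "HEAD", "PUT"],
        "maxAgeSeconds": 3600,
    }
]


def write_cors_policy(path="cors-policy.json"):
    """Write the CORS policy to a local file for use with gsutil cors set."""
    policy_file = Path(path)
    policy_file.write_text(json.dumps(CORS_POLICY, indent=2))
    return policy_file


policy_file = write_cors_policy()
print(f"gsutil cors set {policy_file} gs://<bucket_name>")
```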

For Azure:

  1. Provision the Azure Blob Storage

    For instance-level BYOB, if you're not using Terraform to provision the storage, follow the steps below to provision an Azure Blob Storage bucket in your Azure subscription:

    • Create a bucket with a name of your choice. Optionally, create a folder that you can configure as a sub-path to store all W&B files.

    • Enable blob and container soft deletion.

    • Enable versioning.

    • Configure the CORS policy on the bucket

      To set the CORS policy through the UI go to the blob storage, scroll down to Settings/Resource Sharing (CORS) and then set the following:

      Parameter Value
      Allowed Origins *
      Allowed Methods GET, HEAD, PUT
      Allowed Headers *
      Exposed Headers *
      Max Age 3600
  2. Generate a storage account access key, and keep a record of that along with the storage account name. If you are using Dedicated cloud, share the storage account name and access key with your W&B team using a secure sharing mechanism.

    For team-level BYOB, W&B recommends that you use Terraform to provision the Azure Blob Storage bucket along with the necessary access mechanism and permissions. If you use Dedicated cloud, provide the OIDC issuer URL for your instance. Make a note of the following details, which you need to configure the bucket when you create the team:

    • Storage account name
    • Storage container name
    • Managed identity client id
    • Azure tenant id

Configure BYOB in W&B

To configure a storage bucket at the team level when you create a W&B Team:

  1. Provide a name for your team in the Team Name field.

  2. Select External storage for the Storage type option.

  3. Choose either New bucket from the dropdown or select an existing bucket.

    Multiple W&B Teams can use the same cloud storage bucket. To enable this, select an existing cloud storage bucket from the dropdown.

  4. From the Cloud provider dropdown, select your cloud provider.

  5. Provide the name of your storage bucket for the Name field. If you have a Dedicated cloud or Self-managed instance on Azure, provide the values for Account name and Container name fields.

  6. (Optional) Provide the bucket sub-path in the Path field. Do this if you want W&B to store its files in a folder rather than at the root of the bucket.

  7. (Optional if using AWS bucket) Provide the ARN of your KMS encryption key for the KMS key ARN field.

  8. (Optional if using Azure bucket) Provide the values for the Tenant ID and the Managed Identity Client ID fields.

  9. (Optional on SaaS Cloud) Invite team members when creating the team.

  10. Press the Create Team button.

An error or warning appears at the bottom of the page if there are issues accessing the bucket or the bucket has invalid settings.

Reach out to W&B Support at support@wandb.com to configure instance level BYOB for your Dedicated cloud or Self-managed instance.

3.2 - Access BYOB using pre-signed URLs

W&B uses pre-signed URLs to simplify access to blob storage from your AI workloads or user browsers. For basic information on pre-signed URLs, refer to Pre-signed URLs for AWS S3, Signed URLs for Google Cloud Storage and Shared Access Signature for Azure Blob Storage.

When needed, AI workloads or user browser clients within your network request pre-signed URLs from the W&B Platform. The W&B Platform then accesses the relevant blob storage to generate a pre-signed URL with the required permissions and returns it to the client. The client then uses the pre-signed URL to access the blob storage for object upload or retrieval operations. Pre-signed URLs for object downloads expire after 1 hour, and those for object uploads expire after 24 hours, because some large objects may need more time to upload in chunks.
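A long-running client may want to know when a pre-signed URL it received will stop working, so it can request a fresh one from the W&B Platform in time. The Python sketch below is hypothetical client-side bookkeeping built only on the two expiry windows stated above; it does not generate or validate real pre-signed URLs.

```python
from datetime import datetime, timedelta, timezone

# Expiry windows described above.
PRESIGNED_URL_TTL = {
    "download": timedelta(hours=1),
    "upload": timedelta(hours=24),  # longer, so large objects can upload in chunks
}


def presigned_url_expires_at(operation, issued_at):
    """Return the time at which a pre-signed URL for the operation stops working."""
    return issued_at + PRESIGNED_URL_TTL[operation]


issued = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(presigned_url_expires_at("download", issued))
print(presigned_url_expires_at("upload", issued))
```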

Team-level access control

Each pre-signed URL is restricted to specific buckets based on team-level access control in the W&B platform. If a user is part of a team that is mapped to a blob storage bucket using the secure storage connector, and that user is part of only that team, then the pre-signed URLs generated for their requests do not have permissions to access blob storage buckets mapped to other teams.
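The rule above can be modeled as a set lookup: the team-mapped buckets a user's pre-signed URLs may cover are exactly the buckets mapped to that user's own teams. The Python sketch below is a toy model of that behavior with hypothetical team and bucket names, not a description of how W&B enforces it internally.

```python
def buckets_for_user(user_teams, team_buckets):
    """Return the team-mapped buckets that a user's pre-signed URLs may cover."""
    return {team_buckets[t] for t in user_teams if t in team_buckets}


# Hypothetical team-to-bucket mapping.
team_buckets = {"omega": "gs://omega-bucket", "sigma": "s3://sigma-bucket"}

print(buckets_for_user({"omega"}, team_buckets))
print(sorted(buckets_for_user({"omega", "sigma"}, team_buckets)))
```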

Network restriction

W&B recommends restricting the networks that can use pre-signed URLs to access the blob storage, by using IAM policy based restrictions on the buckets.

For AWS, you can use VPC or IP address based network restrictions. These ensure that your W&B-specific buckets are accessed only from networks where your AI workloads are running, or from gateway IP addresses that map to your user machines if your users access artifacts using the W&B UI.

Audit logs

W&B also recommends using W&B audit logs in addition to blob storage specific audit logs. For the latter, refer to AWS S3 access logs, Google Cloud Storage audit logs, and Monitor Azure blob storage. Admin and security teams can use audit logs to track which user is doing what in the W&B product and take necessary action if they determine that some operations need to be limited for certain users.

3.3 - Configure IP allowlisting for Dedicated Cloud

You can restrict access to your Dedicated Cloud instance to an authorized list of IP addresses. This applies both to access from your AI workloads to the W&B APIs and to access from your users' browsers to the W&B app UI. Once IP allowlisting is set up for your Dedicated Cloud instance, W&B denies any requests from other, unauthorized locations. Reach out to your W&B team to configure IP allowlisting for your Dedicated Cloud instance.

IP allowlisting is available on Dedicated Cloud instances on AWS, GCP and Azure.

You can use IP allowlisting with secure private connectivity. If you do, W&B recommends using secure private connectivity for all traffic from your AI workloads and, if possible, for the majority of traffic from your users' browsers, while using IP allowlisting for instance administration from privileged locations.

3.4 - Configure private connectivity to Dedicated Cloud

You can connect to your Dedicated Cloud instance over your cloud provider's secure private network. This applies to access from your AI workloads to the W&B APIs and, optionally, to access from your users' browsers to the W&B App UI. With private connectivity, the relevant requests and responses do not transit the public network or internet.

Secure private connectivity is available on Dedicated Cloud instances on AWS, GCP, and Azure.

Once private connectivity is enabled, W&B creates a private endpoint service for your instance and provides the relevant DNS URI to connect to. You can then create private endpoints in your cloud accounts that route the relevant traffic to the private endpoint service. Private endpoints are easier to set up for AI training workloads running within your cloud VPC or VNet. To use the same mechanism for traffic from your users' browsers to the W&B App UI, you must configure appropriate DNS-based routing from your corporate network to the private endpoints in your cloud accounts.
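After you create the private endpoints and configure DNS, one quick check from a machine inside your VPC or VNet is whether the W&B hostname resolves to private addresses rather than public ones. A minimal sketch with the Python standard library follows; the hostname is hypothetical, and a private DNS answer on its own does not prove that traffic actually flows through the private endpoint.

```python
import ipaddress
import socket


def resolve(hostname: str, port: int = 443) -> list[str]:
    """Resolve a hostname to the sorted, de-duplicated IP addresses DNS returns."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # info[4] is the socket address; its first element is the IP for IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


def all_private(addresses: list[str]) -> bool:
    """True if there is at least one address and every address is private."""
    return bool(addresses) and all(ipaddress.ip_address(a).is_private for a in addresses)


if __name__ == "__main__":
    host = "mycompany.wandb.io"  # hypothetical; use the DNS URI that W&B provides
    try:
        addresses = resolve(host)
        print(addresses, "all private" if all_private(addresses) else "NOT all private")
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        print(f"could not resolve {host}: {exc}")
```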

You can use secure private connectivity together with IP allowlisting. In that case, W&B recommends using secure private connectivity for all traffic from your AI workloads and, where possible, for most traffic from your users' browsers, while reserving IP allowlisting for instance administration from privileged locations.

3.5 - Data encryption in Dedicated Cloud

W&B uses a W&B-managed cloud-native key to encrypt the W&B-managed database and object storage in every Dedicated Cloud instance, using the customer-managed encryption key (CMEK) capability of each cloud provider. In this case, W&B acts as a customer of the cloud provider while providing the W&B platform as a service to you. Using a W&B-managed key means that W&B controls the keys it uses to encrypt the data in each cloud, which supports its commitment to providing a safe and secure platform to all of its customers.

W&B uses a unique key to encrypt the data in each customer instance, providing another layer of isolation between Dedicated Cloud tenants. This capability is available on AWS, Azure, and GCP.

W&B doesn't generally allow customers to bring their own cloud-native key to encrypt the W&B-managed database and object storage in their Dedicated Cloud instance, because multiple teams and personas in an organization may have access to its cloud infrastructure for various reasons. Some of those teams or personas may not know that W&B is a critical component of the organization's technology stack, and may therefore remove the cloud-native key entirely or revoke W&B's access to it. Such an action could corrupt all data in the organization's W&B instance and leave it in an unrecoverable state.

If your organization needs to use its own cloud-native key to encrypt the W&B-managed database and object storage before it can approve Dedicated Cloud for your AI workflows, W&B can review the request on an exception basis. If approved, use of your cloud-native key for encryption is subject to the shared responsibility model of W&B Dedicated Cloud. If any user in your organization removes your key or revokes W&B's access to it at any point while your Dedicated Cloud instance is live, W&B would not be liable for any resulting data loss or corruption and would not be responsible for recovering such data.

4 - Configure privacy settings

Organization admins and team admins can configure a set of privacy settings at the organization and team scopes, respectively. Settings that organization admins configure at the organization scope are enforced for all teams in that organization.

Configure privacy settings for a team

Team admins can configure privacy settings for their respective teams from within the Privacy section of the team Settings tab. Each setting is configurable as long as it’s not enforced at the organization scope:

  • Hide this team from all non-members
  • Make all future team projects private (public sharing not allowed)
  • Allow any team member to invite other members (not just admins)
  • Turn off public sharing to outside of team for reports in private projects. This turns off existing magic links.
  • Allow users with matching organization email domain to join this team.
  • Enable code saving by default.

Enforce privacy settings for all teams

Organization admins can enforce privacy settings for all teams in their organization from within the Privacy section of the Settings tab in the account or organization dashboard. If an organization admin enforces a setting, team admins cannot change that setting for their teams.

  • Enforce team visibility restrictions
    • Enable this option to hide all teams from non-members
  • Enforce privacy for future projects
    • Enable this option to require all future projects in all teams to be private or restricted
  • Enforce invitation control
    • Enable this option to prevent non-admins from inviting members to any team
  • Enforce report sharing control
    • Enable this option to turn off public sharing of reports in private projects and deactivate existing magic links
  • Enforce team self joining restrictions
    • Enable this option to restrict users with matching organization email domain from self-joining any team
    • This setting applies only to SaaS Cloud (Multi-tenant Cloud). It's not available in Dedicated Cloud or Self-managed instances.
  • Enforce default code saving restrictions
    • Enable this option to turn off code saving by default for all teams

5 - Monitoring and usage

5.1 - Track user activity with audit logs

Use W&B audit logs to track user activity within your organization and to conform to your enterprise governance requirements. Audit logs are available in JSON format. How to access audit logs depends on your W&B platform deployment type:

  • Self-managed: Synced to an instance-level bucket every 10 minutes. Also available using the API.
  • Dedicated Cloud with secure storage connector (BYOB): Synced to the instance-level bucket (BYOB) every 10 minutes. Also available using the API.
  • Dedicated Cloud with W&B-managed storage (without BYOB): Available only using the API.

Once you have access to your audit logs, you can analyze them with your preferred tools, such as Pandas, Amazon Redshift, Google BigQuery, or Microsoft Fabric. You may need to transform the JSON-formatted audit logs into a format suited to the tool before analysis. Information on how to transform your audit logs for specific tools is outside the scope of W&B documentation.

HIPAA compliance requires that you retain audit logs for a minimum of 6 years. For HIPAA-compliant Dedicated Cloud instances with BYOB, you must configure guardrails for your managed storage, including any long-term retention storage, so that no internal or external user can delete audit logs before the end of the mandatory retention period.
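On AWS, one possible guardrail is S3 Object Lock in compliance mode with a six-year default retention. The sketch below only builds the configuration dictionary in the shape that boto3's `put_object_lock_configuration` accepts; the function name and bucket name are hypothetical. Object Lock requires bucket versioning, and default retention applies to every new object in the bucket, so you would typically apply it to a dedicated long-term retention bucket that you copy audit logs into, not to the instance bucket that holds all W&B data.

```python
import json


def object_lock_configuration(years: int = 6, mode: str = "COMPLIANCE") -> dict:
    """Build an S3 Object Lock configuration with a default retention period.

    In COMPLIANCE mode, no user, including the AWS account root user, can
    overwrite or delete a locked object version until its retention period ends.
    """
    if mode not in ("COMPLIANCE", "GOVERNANCE"):
        raise ValueError("mode must be COMPLIANCE or GOVERNANCE")
    if years < 1:
        raise ValueError("years must be a positive integer")
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": mode, "Years": years}},
    }


if __name__ == "__main__":
    # With boto3 (not imported here), this could be applied as:
    #   s3.put_object_lock_configuration(
    #       Bucket="example-audit-log-archive",  # hypothetical bucket
    #       ObjectLockConfiguration=object_lock_configuration(),
    #   )
    print(json.dumps(object_lock_configuration(), indent=2))
```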

Audit log schema

The following table lists all the keys that might be present in your audit logs. Each log entry contains only the keys relevant to its action; other keys are omitted from the entry.

Key Definition
timestamp Timestamp in RFC 3339 format. For example, 2023-01-23T12:34:56Z represents 12:34:56 UTC on January 23, 2023.
action The action the user took.
actor_user_id If present, the ID of the logged-in user who performed the action.
response_code HTTP response code for the action.
artifact_asset If present, the ID of the artifact the action was taken on.
artifact_sequence_asset If present, the ID of the artifact sequence the action was taken on.
entity_asset If present, the ID of the entity or team the action was taken on.
project_asset If present, the ID of the project the action was taken on.
report_asset If present, the ID of the report the action was taken on.
user_asset If present, the user asset the action was taken on.
cli_version If the action was taken using the Python SDK, the SDK version.
actor_ip IP address of the logged-in user.
actor_email If present, the email address of the actor.
artifact_digest If present, the digest of the artifact the action was taken on.
artifact_qualified_name If present, the qualified name of the artifact the action was taken on.
entity_name If present, the name of the entity or team the action was taken on.
project_name If present, the name of the project the action was taken on.
report_name If present, the name of the report the action was taken on.
user_email If present, the email address of the user the action was taken on.
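Because keys that are not relevant to an action are omitted, code that reads audit log entries should treat most keys as optional. The following Python sketch parses one entry and its RFC 3339 timestamp; the function names and sample values are hypothetical, and it assumes `timestamp` and `action` are present in every entry.

```python
import json
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    """Parse an RFC 3339 timestamp such as 2023-01-23T12:34:56Z into an aware datetime."""
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on,
    # so normalize it to an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def summarize(line: str) -> dict:
    """Extract a few fields from one audit log entry, using None for omitted keys."""
    entry = json.loads(line)
    return {
        "time": parse_timestamp(entry["timestamp"]),
        "action": entry["action"],
        "actor": entry.get("actor_user_id"),
        "project": entry.get("project_asset"),
    }


if __name__ == "__main__":
    # Hypothetical entry; real entries contain only the keys relevant to their action.
    sample = ('{"timestamp": "2023-01-23T12:34:56Z", "action": "project:read", '
              '"actor_user_id": "example-user-id", "response_code": 200}')
    print(summarize(sample))
```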

Personally identifiable information (PII), such as email addresses and the names of projects, teams, and reports, is available only through the API endpoint option, and you can turn it off as described below.

Fetch audit logs using API

An instance admin can fetch the audit logs for your W&B instance using the following API:

  1. Construct the full API endpoint by combining the base endpoint <wandb-platform-url>/admin/audit_logs with the following URL parameters:
    • numDays: fetch logs from today minus numDays through the most recent entry. Defaults to 0, which returns only today's logs.
    • anonymize: if set to true, remove all PII. Defaults to false.
  2. Send an HTTP GET request to the constructed endpoint, either directly from a modern browser or with a tool such as Postman, HTTPie, or cURL.

For example, if your W&B instance URL is https://mycompany.wandb.io and you want audit logs without PII for user activity within the last week, use the API endpoint https://mycompany.wandb.io/admin/audit_logs?numDays=7&anonymize=true.
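If you script this request, building the URL with a query-string encoder avoids typos such as a missing /admin/audit_logs path. The following Python sketch only constructs the endpoint; the function name is hypothetical, and how you authenticate as an instance admin is not covered on this page, so fetching is left to your HTTP client of choice.

```python
from urllib.parse import urlencode


def audit_logs_url(instance_url: str, num_days: int = 0, anonymize: bool = False) -> str:
    """Build the audit log API endpoint for a W&B instance."""
    if num_days < 0:
        raise ValueError("num_days must be zero or positive")
    # The API expects lowercase true/false, so convert the bool explicitly.
    params = {"numDays": num_days, "anonymize": str(anonymize).lower()}
    return f"{instance_url.rstrip('/')}/admin/audit_logs?{urlencode(params)}"


if __name__ == "__main__":
    print(audit_logs_url("https://mycompany.wandb.io", num_days=7, anonymize=True))
    # https://mycompany.wandb.io/admin/audit_logs?numDays=7&anonymize=true
```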

The API response contains newline-delimited JSON objects that include the fields described in the schema. This is the same format used when audit log files are synced to an instance-level bucket (where applicable, as described earlier). In that case, the audit logs are located in the /wandb-audit-logs directory of your bucket.
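Because each line is a standalone JSON object, the response body or a synced log file can be parsed line by line. A minimal standard-library sketch that counts actions follows; the function names and sample entries are hypothetical. With Pandas, `pandas.read_json(path_or_buf, lines=True)` reads the same format into a DataFrame.

```python
import json
from collections import Counter


def parse_audit_logs(text: str) -> list[dict]:
    """Parse newline-delimited JSON audit logs, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def count_actions(entries: list[dict]) -> Counter:
    """Count how often each action appears."""
    return Counter(entry.get("action", "<missing>") for entry in entries)


if __name__ == "__main__":
    # Hypothetical sample with two entries in the documented schema.
    sample = (
        '{"timestamp": "2023-01-23T12:34:56Z", "action": "user:login", "response_code": 200}\n'
        '{"timestamp": "2023-01-23T12:40:00Z", "action": "artifact:read", "response_code": 200}\n'
    )
    print(count_actions(parse_audit_logs(sample)))
```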

Actions

The following table describes possible actions that can be recorded by W&B:

Action Definition
artifact:create Artifact is created.
artifact:delete Artifact is deleted.
artifact:read Artifact is read.
project:delete Project is deleted.
project:read Project is read.
report:read Report is read.
run:delete Run is deleted.
run:delete_many Runs are deleted in batch.
run:update_many Runs are updated in batch.
run:stop Run is stopped.
run:undelete_many Runs are brought back from trash in batch.
run:update Run is updated.
sweep:create_agent Sweep agent is created.
team:invite_user User is invited to team.
team:create_service_account Service account is created for the team.
team:create Team is created.
team:uninvite User or service account is uninvited from team.
team:delete Team is deleted.
user:create User is created.
user:delete_api_key API key for the user is deleted.
user:deactivate User is deactivated.
user:create_api_key API key for the user is created.
user:permanently_delete User is permanently deleted.
user:reactivate User is reactivated.
user:update User is updated.
user:read User profile is read.
user:login User logs in.
user:initiate_login User initiates log in.
user:logout User logs out.
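Security teams often review a subset of these actions, such as API key and account lifecycle events. In the following Python sketch, the grouping of actions is an illustrative choice, not a W&B-defined category, and sorting by the raw timestamp string assumes all entries use the same UTC "Z" format.

```python
# Illustrative selection of actions from the table above; adjust to your needs.
SENSITIVE_ACTIONS = frozenset({
    "user:create_api_key",
    "user:delete_api_key",
    "user:deactivate",
    "user:reactivate",
    "user:permanently_delete",
    "team:create_service_account",
    "team:invite_user",
    "team:uninvite",
})


def sensitive_events(entries: list[dict], actions: frozenset = SENSITIVE_ACTIONS) -> list[dict]:
    """Return the entries whose action is in `actions`, oldest first."""
    hits = [e for e in entries if e.get("action") in actions]
    # RFC 3339 timestamps that share the same UTC "Z" format sort correctly as strings.
    return sorted(hits, key=lambda e: e.get("timestamp", ""))
```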

5.2 - Use Prometheus monitoring

Use Prometheus to monitor W&B Server. Prometheus installs are exposed as a Kubernetes ClusterIP service.

Follow the procedure below to access your Prometheus metrics endpoint (/metrics):

  1. Connect to the cluster with the Kubernetes command-line tool, kubectl. See the Kubernetes Accessing Clusters documentation for more information.

  2. Find the internal address (cluster IP and port) of the Prometheus service with:

    kubectl describe svc prometheus
    
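  3. Use kubectl exec to run a command inside a pod in your Kubernetes cluster that requests the endpoint at <internal address>/metrics.

    Copy the command below and execute it in your terminal. Replace <pod name> with the name of a running pod whose image includes wget, and replace <internal address> with the internal address you found in the previous step:

    kubectl exec <pod name> -- wget -qO- http://<internal address>/metrics
    

If no suitable pod is available, start a temporary test pod that you can use to reach anything in the cluster network. The pod is deleted when you exit its shell:

kubectl run -it testpod --image=alpine --restart=Never --rm -- /bin/ash

From the test pod's shell, run wget -qO- http://<internal address>/metrics.

From there, you can keep access internal to the cluster network or expose the endpoint yourself with a Kubernetes NodePort service.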
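If you prefer to inspect the metrics programmatically, for example from a pod that has Python installed, the following sketch fetches /metrics and splits each sample line of the Prometheus text exposition format into a metric name, its label block, and its value. The function names are hypothetical, the base URL is the internal address from the steps above, and the simple regular expression is an assumption that suits typical output rather than a complete parser of the format.

```python
import re
import sys
import urllib.request

# One sample line: metric_name{label="value",...} 123.4 [optional timestamp]
_SAMPLE_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)(?:\s+-?\d+)?$')


def parse_samples(text: str) -> list[tuple[str, str, float]]:
    """Return (metric name, label block, value) for each sample line."""
    samples = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = _SAMPLE_LINE.match(line)
        if match:
            name, labels, value = match.groups()
            samples.append((name, labels or "", float(value)))
    return samples


def fetch_metrics(base_url: str, timeout: float = 5.0) -> str:
    """Fetch <base_url>/metrics; this only works from inside the cluster network."""
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/metrics", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    if len(sys.argv) > 1:  # usage: python metrics.py http://<internal address>
        for name, labels, value in parse_samples(fetch_metrics(sys.argv[1])):
            print(name, labels, value)
```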

5.3 - Configure Slack alerts

Integrate W&B Server with Slack.

Create the Slack application

Follow the procedure below to create a Slack application.

  1. Visit https://api.slack.com/apps and select Create an App.

  2. Provide a name for your app in the App Name field.

  3. Select the Slack workspace in which to develop your app. Make sure it is the same workspace you intend to use for alerts.

Configure the Slack application

  1. On the left sidebar, select OAuth & Permissions.

  2. Within the Scopes section, give the bot the incoming-webhook scope. Scopes give your app permission to perform actions in your development workspace.

    For more information about OAuth scopes for Bots, see the Understanding OAuth scopes for Bots tutorial in the Slack API documentation.

  3. Configure the Redirect URL to point to your W&B installation. Use the same URL as the host URL configured in your W&B instance's system settings. You can specify multiple URLs if you have different DNS mappings to your instance.

  4. Select Save URLs.

  5. Optionally, under Restrict API Token Usage, allowlist the IP address or IP range of your W&B instances. Limiting the allowed IP addresses helps further secure your Slack application.

Register your Slack application with W&B

  1. Navigate to the System Settings or System Console page of your W&B instance, depending on your deployment.

  2. Depending on which System page you are on, do one of the following:

    • If you are in the System Console: go to Settings, then to Notifications.

    • If you are in the System Settings: turn on the Enable a custom Slack application to dispatch alerts toggle.

  3. Supply your Slack client ID and Slack secret, then click Save. To find your application's client ID and secret, go to Basic Information in your Slack app's settings.

  4. Verify that everything is working by setting up a Slack integration in the W&B app.

5.4 - View organization dashboard

View organization usage of W&B

Use the organization dashboard to get a holistic view of the users in your organization and how they use W&B, including properties such as:

  • Name: The name of the user and their W&B username.
  • Last active: The time the user last used W&B. This includes any activity that requires authentication, including viewing pages in the product, logging runs or taking any other action, or logging in.
  • Role: The role of the user.
  • Email: The email of the user.
  • Team: The names of teams the user belongs to.

View the status of a user

The Last Active column shows whether a user has a pending invitation, is active, or is deactivated. A user is in one of three states:

  • Invite pending: Admin has sent invite but user has not accepted invitation.
  • Active: User has accepted the invite and created an account.
  • Deactivated: Admin has revoked access of the user.

View and share how your organization uses W&B

View how your organization uses W&B in CSV format.

  1. Select the three dots next to the Add user button.

  2. From the dropdown, select Export as CSV.

This exports a CSV file that lists all users of an organization along with details about each user, such as their username, the timestamp of when they were last active, roles, email, and more.

View user activity

Use the Last Active column to get an activity summary of an individual user.

  1. Hover your mouse over the Last Active entry for a user.
  2. A tooltip appears and provides a summary of information about the user’s activity.

A user is active if they:

  • log in to W&B.
  • view any page in the W&B App.
  • log runs.
  • use the SDK to track an experiment.
  • interact with the W&B Server in any way.

View active users over time

Use the Users active over time plot in the Organization dashboard to get an aggregate overview of how many users are active over time.

You can use the dropdown menu to filter results based on days, months, or all time.

6 - Configure SMTP

In W&B Server, adding users to the instance or to a team triggers an email invite. To send these invites, W&B uses a third-party mail server. Some organizations have strict policies on traffic leaving the corporate network, which can prevent these email invites from ever reaching the end user. W&B Server offers an option to send these invite emails through an internal SMTP server instead.

To configure, follow the steps below:

  • Set the GORILLA_EMAIL_SINK environment variable in the Docker container or Kubernetes deployment to smtp://<user:password>@smtp.host.com:<port>
  • The username and password are optional.
  • If your SMTP server is designed to be unauthenticated, omit the credentials and set the environment variable as GORILLA_EMAIL_SINK=smtp://smtp.host.com:<port>
  • Commonly used SMTP ports are 587, 465, and 25. The correct port might differ depending on the type of mail server you use.
  • The default sender email address for SMTP is noreply@wandb.com. To change it, set the GORILLA_EMAIL_FROM_ADDRESS environment variable on the server to your preferred sender email address.
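If your SMTP password contains characters that are reserved in URLs, such as @, :, or /, the GORILLA_EMAIL_SINK value can become ambiguous. The following Python sketch builds the value with percent-encoded credentials. The function name is hypothetical, requiring both username and password together is a design choice of the sketch, and whether W&B Server decodes percent-encoded credentials is an assumption you should verify against your own server.

```python
from typing import Optional
from urllib.parse import quote


def email_sink(host: str, port: int, user: Optional[str] = None,
               password: Optional[str] = None) -> str:
    """Build a GORILLA_EMAIL_SINK value of the form smtp://user:password@host:port."""
    if (user is None) != (password is None):
        raise ValueError("provide both user and password, or neither")
    credentials = ""
    if user is not None and password is not None:
        # safe="" percent-encodes every reserved character, including "@", ":" and "/".
        credentials = f"{quote(user, safe='')}:{quote(password, safe='')}@"
    return f"smtp://{credentials}{host}:{port}"


if __name__ == "__main__":
    # Hypothetical host and credentials.
    print("GORILLA_EMAIL_SINK=" + email_sink("smtp.host.com", 587, "alerts", "p@ss:word"))
    # GORILLA_EMAIL_SINK=smtp://alerts:p%40ss%3Aword@smtp.host.com:587
```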

7 - Configure environment variables

How to configure the W&B Server installation

In addition to configuring instance-level settings through the System Settings admin UI, W&B also provides a way to configure these values in code using environment variables. Also, refer to advanced configuration for IAM.

Environment variable reference

Environment Variable Description
LICENSE Your wandb/local license
MYSQL The MySQL connection string
BUCKET The S3 / GCS bucket for storing data
BUCKET_QUEUE The SQS / Google PubSub queue for object creation events
NOTIFICATIONS_QUEUE The SQS queue on which to publish run events
AWS_REGION The AWS Region where your bucket lives
HOST The URL of your instance, using its fully qualified domain name (FQDN), for example https://my.domain.net
OIDC_ISSUER A URL to your OpenID Connect identity provider, for example https://cognito-idp.us-east-1.amazonaws.com/us-east-1_uiIFNdacd
OIDC_CLIENT_ID The client ID of the application in your identity provider
OIDC_AUTH_METHOD Implicit (default) or pkce, see below for more context
SLACK_CLIENT_ID The client ID of the Slack application you want to use for alerts
SLACK_SECRET The secret of the Slack application you want to use for alerts
LOCAL_RESTORE You can temporarily set this to true if you’re unable to access your instance. Check the logs from the container for temporary credentials.
REDIS The connection string for an external Redis instance to use with W&B.
LOGGING_ENABLED When set to true, access logs are streamed to stdout. You can also mount a sidecar container and tail /var/log/gorilla.log without setting this variable.
GORILLA_ALLOW_USER_TEAM_CREATION When set to true, allows non-admin users to create a new team. False by default.
GORILLA_DATA_RETENTION_PERIOD How long to retain deleted data from runs in hours. Deleted run data is unrecoverable. Append an h to the input value. For example, "24h".
ENABLE_REGISTRY_UI When set to true, enables the new W&B Registry UI.
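Because GORILLA_DATA_RETENTION_PERIOD is expressed in hours with an h suffix, longer periods need converting. A trivial Python helper (hypothetical name) that produces the value:

```python
def retention_period(days: int = 0, hours: int = 0) -> str:
    """Format a GORILLA_DATA_RETENTION_PERIOD value in hours, for example "24h"."""
    total = days * 24 + hours
    if total < 0:
        raise ValueError("retention period cannot be negative")
    return f"{total}h"


if __name__ == "__main__":
    print(retention_period(days=30))  # 720h
```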

Advanced Reliability Settings

Redis

Configuring an external Redis server is optional but recommended for production systems. Redis helps improve the reliability of the service and enables caching to decrease load times, especially in large projects. Use a managed Redis service such as Amazon ElastiCache with high availability (HA) and the following specifications:

  • Minimum 4GB of memory, suggested 8GB
  • Redis version 6.x
  • In-transit encryption
  • Authentication enabled

To configure the Redis instance with W&B, navigate to the W&B settings page at http(s)://YOUR-W&B-SERVER-HOST/system-admin. Enable the “Use an external Redis instance” option, and fill in the Redis connection string.

You can also configure Redis using the REDIS environment variable on the container or in your Kubernetes deployment. Alternatively, you can set up REDIS as a Kubernetes secret.

This page assumes the Redis instance runs on the default port, 6379. If you configure a different port, set up authentication, and enable TLS on the Redis instance, the connection string looks something like this: redis://$USER:$PASSWORD@$HOST:$PORT?tls=true
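To fill in that template without hand-editing, the following Python sketch builds the connection string from its parts. The function name and example values are hypothetical, and percent-encoding the credentials is an assumption about how the server parses the URL that you should verify in your setup.

```python
from typing import Optional
from urllib.parse import quote


def redis_url(host: str, port: int = 6379, user: Optional[str] = None,
              password: Optional[str] = None, tls: bool = False) -> str:
    """Build a connection string of the form redis://$USER:$PASSWORD@$HOST:$PORT?tls=true."""
    auth = ""
    if password is not None:
        # Percent-encode credentials so reserved characters don't break the URL.
        # An empty user (redis://:password@host) is allowed for password-only auth.
        auth = f"{quote(user or '', safe='')}:{quote(password, safe='')}@"
    query = "?tls=true" if tls else ""
    return f"redis://{auth}{host}:{port}{query}"


if __name__ == "__main__":
    # Hypothetical host and credentials.
    print(redis_url("redis.example.internal", 6380, "wandb", "s3cr/t", tls=True))
    # redis://wandb:s3cr%2Ft@redis.example.internal:6380?tls=true
```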

8 - Release process for W&B Server

Release process for W&B Server

Frequency and deployment types

W&B Server releases apply to the Dedicated Cloud and Self-managed deployments. There are three kinds of server releases:

Release type Description
Monthly Monthly releases include new features, enhancements, plus medium and low severity bug fixes.
Patch Patch releases include critical and high severity bug fixes. Patches are only rarely released, as needed.
Feature A feature release targets a specific release date for a new product feature, which occasionally falls before the standard monthly release.

All releases are deployed to all Dedicated Cloud instances as soon as the acceptance testing phase is complete. This keeps those managed instances fully updated, making the latest features and fixes available to relevant customers. Customers with Self-managed instances are responsible for updating on their own schedule, using the latest Docker image. Refer to release support and end of life.

Release notes

The release notes for all releases are available at W&B Server Releases on GitHub. Customers who use Slack can receive automatic release announcements in their W&B Slack channel. Ask your W&B team to enable these updates.

Release update and downtime

A server release does not generally require instance downtime for Dedicated Cloud instances and for customers with Self-managed deployments who have implemented a proper rolling update process.

Downtime might occur for the following scenarios:

  • A new feature or enhancement requires changes to the underlying infrastructure such as compute, storage or network. W&B tries to send relevant advance notifications to Dedicated Cloud customers.
  • An infrastructure change due to a security patch or to avoid support end-of-life for a particular version. For urgent changes, Dedicated Cloud customers may not receive advance notifications. The priority here is to keep the fleet secure and fully supported.

In both cases, updates roll out to all Dedicated Cloud instances without exception. Customers with Self-managed instances are responsible for managing such updates on their own schedule. Refer to release support and end of life.

Release support and end of life policy

W&B supports every server release for six months from its release date. Dedicated Cloud instances are updated automatically. Customers with Self-managed instances are responsible for updating their deployments in time to comply with the support policy. Avoid staying on a version older than six months, because doing so significantly limits the support W&B can provide.
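Self-managed customers may want to track when their current release leaves the six-month support window. The following Python sketch computes an approximate end-of-support date; the function names are hypothetical, and interpreting "six months" as six calendar months (clamped to the last day of shorter months, with the end date counted as supported) is an assumption, not a statement of W&B's exact policy.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping the day to the last day of shorter months."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def end_of_support(release_date: date, support_months: int = 6) -> date:
    """Approximate end of support: six calendar months after the release date."""
    return add_months(release_date, support_months)


def is_supported(release_date: date, today: date) -> bool:
    """True while `today` is on or before the approximate end-of-support date."""
    return today <= end_of_support(release_date)


if __name__ == "__main__":
    print(end_of_support(date(2024, 1, 31)))  # 2024-07-31 (hypothetical release date)
```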