Read the following sections in this order if you are a first-time user of W&B and you are interested in training, tracking, and visualizing machine learning models and experiments:

Learn about runs, W&B’s basic unit of computation.
Create and track machine learning experiments with Experiments.
Discover W&B’s flexible and lightweight building block for dataset and model versioning with Artifacts.
Automate hyperparameter search and explore the space of possible models with Sweeps.
Manage the model lifecycle from training to production with Registry.
Visualize predictions across model versions with our Data Visualization guide.
Organize runs, embed and automate visualizations, describe your findings, and share updates with collaborators with Reports.

Are you a first-time user of W&B?

Try the quickstart to learn how to install W&B and how to add W&B to your code.

1 - W&B Quickstart

Install W&B to track, visualize, and manage machine learning experiments of any size.

Are you looking for information on W&B Weave? See the Weave Python SDK quickstart or Weave TypeScript SDK quickstart.

To authenticate your machine with W&B, generate an API key from your user profile or at wandb.ai/authorize. Copy the API key and store it securely.

Install the `wandb` library and log in

Set the WANDB_API_KEY environment variable.
```
export WANDB_API_KEY=<your_api_key>
```
Install the wandb library and log in.
```
pip install wandb
wandb login
```

Alternatively, install the library from the command line and log in from Python:

pip install wandb

import wandb

wandb.login()

In a notebook:

!pip install wandb
import wandb
wandb.login()
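
Before logging in from a script, it can help to confirm that the API key is visible to the process. The following is a minimal sketch that uses only the standard library. The helper names `read_api_key` and `mask_key` are hypothetical and are not part of the `wandb` package.

```python
import os


def read_api_key(env=None):
    """Return WANDB_API_KEY from a mapping, or None if it is unset or blank."""
    env = os.environ if env is None else env
    key = env.get("WANDB_API_KEY", "").strip()
    return key or None


def mask_key(key, visible=4):
    """Hide all but the last few characters so the key is safe to print."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


# Example with an explicit mapping instead of the real environment.
example_env = {"WANDB_API_KEY": "abcd1234efgh5678"}
key = read_api_key(example_env)
print(mask_key(key) if key else "WANDB_API_KEY is not set")
```

Calling `read_api_key()` with no argument reads the real process environment. Masking the key before printing keeps it out of logs and notebook output.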

Start a run and track hyperparameters

In your Python script or notebook, initialize a W&B run object with wandb.init(). Use a dictionary for the config parameter to specify hyperparameter names and values.

run = wandb.init(
    project="my-awesome-project",  # Specify your project
    config={                        # Track hyperparameters and metadata
        "learning_rate": 0.01,
        "epochs": 10,
    },
)

A run serves as the core element of W&B, used to track metrics, create logs, and more.

Assemble the components

This mock training script logs simulated accuracy and loss metrics to W&B:

import wandb
import random

wandb.login()

# Project that the run is recorded to
project = "my-awesome-project"

# Dictionary with hyperparameters
config = {
    'epochs' : 10,
    'lr' : 0.01
}

with wandb.init(project=project, config=config) as run:
    offset = random.random() / 5
    print(f"lr: {config['lr']}")
    
    # Simulate a training run
    for epoch in range(2, config['epochs']):
        acc = 1 - 2**-epoch - random.random() / epoch - offset
        loss = 2**-epoch + random.random() / epoch + offset
        print(f"epoch={epoch}, accuracy={acc}, loss={loss}")
        run.log({"accuracy": acc, "loss": loss})

Visit wandb.ai/home to view recorded metrics such as accuracy and loss and how they changed during each training step. The following image shows the loss and accuracy tracked from each run. Each run object appears in the Runs column with generated names.

Shows loss and accuracy tracked from each run.
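
To inspect the shape of such simulated curves locally, without logging anything, the same arithmetic can be wrapped in a function. This is a sketch only: it assumes the epoch index (rather than the total epoch count) drives the exponential term, so accuracy tends to rise and loss tends to fall. The function name `simulate_metrics` and the seeding are illustrative choices.

```python
import random


def simulate_metrics(epochs, seed=0):
    """Return one {"accuracy", "loss"} dict per simulated epoch (2 .. epochs - 1)."""
    rng = random.Random(seed)  # Seeded so repeated calls give the same curve.
    offset = rng.random() / 5
    history = []
    for epoch in range(2, epochs):
        noise_acc = rng.random() / epoch
        noise_loss = rng.random() / epoch
        history.append(
            {
                "accuracy": 1 - 2**-epoch - noise_acc - offset,
                "loss": 2**-epoch + noise_loss + offset,
            }
        )
    return history


history = simulate_metrics(epochs=10)
for step, metrics in enumerate(history):
    print(step, round(metrics["accuracy"], 3), round(metrics["loss"], 3))
```

Each dictionary in `history` has the same shape as the dictionary passed to `run.log()` in the script above.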

Next steps

Explore more features of the W&B ecosystem:

Read the W&B Integration tutorials that combine W&B with frameworks like PyTorch, libraries like Hugging Face, and services like SageMaker.
Organize runs, automate visualizations, summarize findings, and share updates with collaborators using W&B Reports.
Create W&B Artifacts to track datasets, models, dependencies, and results throughout your machine learning pipeline.
Automate hyperparameter searches and optimize models with W&B Sweeps.
Analyze runs, visualize model predictions, and share insights on a central dashboard.
Visit W&B AI Academy to learn about LLMs, MLOps, and W&B Models through hands-on courses.
Visit weave-docs.wandb.ai to learn how to track, experiment with, evaluate, deploy, and improve your LLM-based applications using Weave.

2 - W&B Models

W&B Models is the system of record for ML Practitioners who want to organize their models, boost productivity and collaboration, and deliver production ML at scale.

With W&B Models, you can:

Track and visualize all ML experiments.
Optimize and fine-tune models at scale with hyperparameter sweeps.
Maintain a centralized hub of all models, with a seamless handoff point to DevOps and deployment.
Configure custom automations that trigger key workflows for model CI/CD.

Machine learning practitioners rely on W&B Models as their ML system of record to track and visualize experiments, manage model versions and lineage, and optimize hyperparameters.

2.1 - Experiments

Track machine learning experiments with W&B.

Track machine learning experiments with a few lines of code. You can then review the results in an interactive dashboard or export your data to Python for programmatic access using our Public API.

Use W&B Integrations if you work with popular frameworks such as PyTorch, Keras, or scikit-learn. See our Integration guides for a full list of integrations and information on how to add W&B to your code.

The image above shows an example dashboard where you can view and compare metrics across multiple runs.

How it works

Track a machine learning experiment with a few lines of code:

Create a W&B Run.
Store a dictionary of hyperparameters, such as learning rate or model type, into your configuration (wandb.Run.config).
Log metrics (wandb.Run.log()) over time in a training loop, such as accuracy and loss.
Save outputs of a run, like the model weights or a table of predictions.

The following code demonstrates a common W&B experiment tracking workflow:

# Start a run.
#
# When this block exits, it waits for logged data to finish uploading.
# If an exception is raised, the run is marked failed.
with wandb.init(entity="", project="my-project-name") as run:
  # Save model inputs and hyperparameters.
  run.config.learning_rate = 0.01

  # Run your experiment code.
  for epoch in range(num_epochs):
    # Do some training...

    # Log metrics over time to visualize model performance.
    run.log({"loss": loss})

  # Upload model outputs as artifacts.
  run.log_artifact(model)
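
The comments in the snippet above describe what happens when the `with` block exits. The following sketch imitates only that exit behavior with a hypothetical stand-in class, `SketchRun`. It is not the `wandb` Run class and performs no uploads. It just records whether the block finished normally or raised.

```python
class SketchRun:
    """Hypothetical stand-in for a run; it only records how the block ended."""

    def __init__(self):
        self.status = "running"
        self.history = []

    def log(self, metrics):
        # Keep a copy of each logged dictionary, in order.
        self.history.append(dict(metrics))

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Mark the run according to whether the block raised.
        self.status = "failed" if exc_type is not None else "finished"
        return False  # Do not swallow the exception.


with SketchRun() as ok_run:
    ok_run.log({"loss": 0.5})

bad_run = SketchRun()
try:
    with bad_run:
        raise RuntimeError("training crashed")
except RuntimeError:
    pass

print(ok_run.status, bad_run.status)
```

Returning `False` from `__exit__` lets the original exception propagate, so the calling code still sees the failure.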

Get started

Depending on your use case, explore the following resources to get started with W&B Experiments:

Read the W&B Quickstart for a step-by-step outline of the W&B Python SDK commands you could use to create, track, and use a dataset artifact.
Explore this chapter to learn how to:
- Create an experiment
- Configure experiments
- Log data from experiments
- View results from experiments
Explore the W&B Python Library within the W&B API Reference Guide.

Best practices and tips

For best practices and tips for experiments and logging, see Best Practices: Experiments and Logging.

2.1.1 - Create an experiment

Create a W&B Experiment.

Use the W&B Python SDK to track machine learning experiments. You can then review the results in an interactive dashboard or export your data to Python for programmatic access with the W&B Public API.

This guide describes how to use W&B building blocks to create a W&B Experiment.

How to create a W&B Experiment

Create a W&B Experiment in four steps:

Initialize a W&B Run
Capture a dictionary of hyperparameters
Log metrics inside your training loop
Log an artifact to W&B

Initialize a W&B run

Use wandb.init() to create a W&B Run.

The following snippet creates a run in a W&B project named “cat-classification” with the description “My first experiment” to help identify this run. Tags “baseline” and “paper1” are included to remind us that this run is a baseline experiment intended for a future paper publication.

import wandb

with wandb.init(
    project="cat-classification",
    notes="My first experiment",
    tags=["baseline", "paper1"],
) as run:
    ...

wandb.init() returns a Run object.

Note: If a project already exists when you call wandb.init(), W&B adds the new run to it. For example, if you already have a project called “cat-classification”, that project is kept as it is and the new run is added to it.

Capture a dictionary of hyperparameters

Save a dictionary of hyperparameters such as learning rate or model type. The model settings you capture in config are useful later to organize and query your results.

with wandb.init(
    ...,
    config={"epochs": 100, "learning_rate": 0.001, "batch_size": 128},
) as run:
    ...

For more information on how to configure an experiment, see Configure Experiments.

Log metrics inside your training loop

Call run.log() to log metrics about each training step such as accuracy and loss.

model, dataloader = get_model(), get_data()

for epoch in range(run.config.epochs):
    for batch in dataloader:
        loss, accuracy = model.training_step()
        run.log({"accuracy": accuracy, "loss": loss})

For more information on different data types you can log with W&B, see Log Data During Experiments.

Log an artifact to W&B

Optionally log a W&B Artifact. Artifacts make it easy to version datasets and models.

# You can save any file or even a directory. In this example, we pretend
# the model has a save() method that outputs an ONNX file.
model.save("path_to_model.onnx")
run.log_artifact("path_to_model.onnx", name="trained-model", type="model")

Learn more about Artifacts or about versioning models in Registry.

Putting it all together

The full script with the preceding code snippets is found below:

import wandb

with wandb.init(
    project="cat-classification",
    notes="My first experiment",
    tags=["baseline", "paper1"],
    # Record the run's hyperparameters.
    config={"epochs": 100, "learning_rate": 0.001, "batch_size": 128},
) as run:
    # Set up model and data.
    model, dataloader = get_model(), get_data()

    # Run your training while logging metrics to visualize model performance.
    for epoch in range(run.config["epochs"]):
        for batch in dataloader:
            loss, accuracy = model.training_step()
            run.log({"accuracy": accuracy, "loss": loss})

    # Upload the trained model as an artifact.
    model.save("path_to_model.onnx")
    run.log_artifact("path_to_model.onnx", name="trained-model", type="model")

Next steps: Visualize your experiment

Use the W&B Dashboard as a central place to organize and visualize results from your machine learning models. With just a few clicks, construct rich, interactive charts like parallel coordinates plots, parameter importance analyses, and additional chart types.

For more information on how to view experiments and specific runs, see Visualize results from experiments.

Best practices

The following are some suggested guidelines to consider when you create experiments:

Finish your runs: Use wandb.init() in a with statement to automatically mark the run as finished when the code completes or raises an exception.
- In Jupyter notebooks, it may be more convenient to manage the Run object yourself. In this case, you can explicitly call finish() on the Run object to mark it complete:
```
# In a notebook cell:
run = wandb.init()

# In a different cell:
run.finish()
```
Config: Track hyperparameters, architecture, dataset, and anything else you’d like to use to reproduce your model. These show up in columns. Use config columns to group, sort, and filter runs dynamically in the app.
Project: A project is a set of experiments you can compare together. Each project gets a dedicated dashboard page, and you can easily turn on and off different groups of runs to compare different model versions.
Notes: Set a quick commit message directly from your script. Edit and access notes in the Overview section of a run in the W&B App.
Tags: Identify baseline runs and favorite runs. You can filter runs using tags. You can edit tags at a later time on the Overview section of your project’s dashboard on the W&B App.
Create multiple run sets to compare experiments: When comparing experiments, create multiple run sets to make metrics easy to compare. You can toggle run sets on or off on the same chart or group of charts.

The following code snippet demonstrates how to define a W&B Experiment using the best practices listed above:

import wandb

config = {
    "learning_rate": 0.01,
    "momentum": 0.2,
    "architecture": "CNN",
    "dataset_id": "cats-0192",
}

with wandb.init(
    project="detect-cats",
    notes="tweak baseline",
    tags=["baseline", "paper1"],
    config=config,
) as run:
    ...

For more information about available parameters when defining a W&B Experiment, see the wandb.init() API docs in the API Reference Guide.

2.1.2 - Configure experiments

Use a dictionary-like object to save your experiment configuration

Use the config property of a run to save your training configuration:

hyperparameters
input settings such as the dataset name or model type
any other independent variables for your experiments.

The wandb.Run.config property makes it easy to analyze your experiments and reproduce your work in the future. You can group by configuration values in the W&B App, compare the configurations of different W&B runs, and evaluate how each training configuration affects the output. The config property is a dictionary-like object that can be composed from multiple dictionary-like objects.

To save output metrics or dependent variables like loss and accuracy, use wandb.Run.log() instead of wandb.Run.config.

Set up an experiment configuration

Configurations are typically defined at the beginning of a training script. Machine learning workflows vary, however, so you are not required to define a configuration at the beginning of your training script.

Use dashes (-) or underscores (_) instead of periods (.) in your config variable names.

Use the dictionary access syntax ["key"]["value"] instead of the attribute access syntax config.key.value if your script accesses wandb.Run.config keys below the root.

The following sections outline common scenarios for defining your experiment’s configuration.

Set the configuration at initialization

Pass a dictionary at the beginning of your script when you call the wandb.init() API to generate a background process to sync and log data as a W&B Run.

The following code snippet demonstrates how to define a Python dictionary with configuration values and how to pass that dictionary as an argument when you initialize a W&B Run.

import wandb

# Define a config dictionary object
config = {
    "hidden_layer_sizes": [32, 64],
    "kernel_sizes": [3],
    "activation": "ReLU",
    "pool_sizes": [2],
    "dropout": 0.5,
    "num_classes": 10,
}

# Pass the config dictionary when you initialize W&B
with wandb.init(project="config_example", config=config) as run:
    ...

If you pass a nested dictionary as the config, W&B flattens the names using dots.
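
To see what dot-flattened names look like, the following plain-Python sketch flattens a nested dictionary. It only illustrates the naming scheme and is not W&B’s implementation. The function name `flatten_config` and the sample keys are hypothetical.

```python
def flatten_config(config, parent_key="", sep="."):
    """Flatten nested dictionaries into a single level with dotted key names."""
    flat = {}
    for key, value in config.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into nested dictionaries, carrying the dotted prefix along.
            flat.update(flatten_config(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


nested = {"optimizer": {"name": "adam", "lr": 0.001}, "epochs": 10}
print(flatten_config(nested))

# A key that already contains a period collides with a flattened nested key.
print(flatten_config({"a.b": 1, "a": {"b": 2}}))
```

The second call shows why periods in config variable names are best avoided: once names are joined with dots, a literal `a.b` key and a nested `a` → `b` key can no longer be told apart.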

Access the values from the dictionary similarly to how you access other dictionaries in Python:

# Access values with the key as the index value
hidden_layer_sizes = run.config["hidden_layer_sizes"]
kernel_sizes = run.config["kernel_sizes"]
activation = run.config["activation"]

# Python dictionary get() method
hidden_layer_sizes = run.config.get("hidden_layer_sizes")
kernel_sizes = run.config.get("kernel_sizes")
activation = run.config.get("activation")

Throughout the Developer Guide and examples we copy the configuration values into separate variables. This step is optional. It is done for readability.

Set the configuration with argparse

You can set your configuration with an argparse object. argparse, short for argument parser, is a standard library module in Python 3.2 and above that makes it easy to write scripts that take advantage of all the flexibility and power of command line arguments.

This is useful for tracking results from scripts that are launched from the command line.

The following Python script demonstrates how to define a parser object to define and set your experiment config. The functions train_one_epoch and evaluate_one_epoch are provided to simulate a training loop for the purpose of this demonstration:

# config_experiment.py
import argparse
import random

import numpy as np
import wandb


# Training and evaluation demo code
def train_one_epoch(epoch, lr, bs):
    acc = 0.25 + ((epoch / 30) + (random.random() / 10))
    loss = 0.2 + (1 - ((epoch - 1) / 10 + random.random() / 5))
    return acc, loss


def evaluate_one_epoch(epoch):
    acc = 0.1 + ((epoch / 20) + (random.random() / 10))
    loss = 0.25 + (1 - ((epoch - 1) / 10 + random.random() / 6))
    return acc, loss


def main(args):
    # Start a W&B Run
    with wandb.init(project="config_example", config=args) as run:
        # Access values from config dictionary and store them
        # into variables for readability
        lr = run.config["learning_rate"]
        bs = run.config["batch_size"]
        epochs = run.config["epochs"]

        # Simulate training and logging values to W&B
        for epoch in np.arange(1, epochs):
            train_acc, train_loss = train_one_epoch(epoch, lr, bs)
            val_acc, val_loss = evaluate_one_epoch(epoch)

            run.log(
                {
                    "epoch": epoch,
                    "train_acc": train_acc,
                    "train_loss": train_loss,
                    "val_acc": val_acc,
                    "val_loss": val_loss,
                }
            )


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )

    parser.add_argument("-b", "--batch_size", type=int, default=32, help="Batch size")
    parser.add_argument(
        "-e", "--epochs", type=int, default=50, help="Number of training epochs"
    )
    parser.add_argument(
        "-lr", "--learning_rate", type=float, default=0.001, help="Learning rate"
    )

    args = parser.parse_args()
    main(args)
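
To check which keys and value types a parsed namespace would contribute to the config, you can inspect it without starting a run. The sketch below rebuilds a parser with the same three options and parses an explicit argument list. It assumes the learning rate is declared with `type=float`, since a fractional value cannot be parsed as an integer. The helper name `build_parser` is illustrative.

```python
import argparse


def build_parser():
    """Build a parser with the same three options as the script above."""
    parser = argparse.ArgumentParser(
        formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )
    parser.add_argument("-b", "--batch_size", type=int, default=32, help="Batch size")
    parser.add_argument(
        "-e", "--epochs", type=int, default=50, help="Number of training epochs"
    )
    # A learning rate is fractional, so it is parsed as a float.
    parser.add_argument(
        "-lr", "--learning_rate", type=float, default=0.001, help="Learning rate"
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is repeatable.
args = build_parser().parse_args(["-e", "5", "-lr", "0.01"])
config = vars(args)  # Convert the Namespace into a plain dictionary.
print(config)
```

Options that are not given on the command line keep their defaults, so `batch_size` stays at 32 here.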

Set the configuration throughout your script

You can add more parameters to your config object throughout your script. The following code snippet demonstrates how to add new key-value pairs to your config object:

import wandb

# Define a config dictionary object
config = {
    "hidden_layer_sizes": [32, 64],
    "kernel_sizes": [3],
    "activation": "ReLU",
    "pool_sizes": [2],
    "dropout": 0.5,
    "num_classes": 10,
}

# Pass the config dictionary when you initialize W&B
with wandb.init(project="config_example", config=config) as run:
    # Update config after you initialize W&B
    run.config["dropout"] = 0.2
    run.config.epochs = 4
    run.config["batch_size"] = 32

You can update multiple values at a time:

run.config.update({"lr": 0.1, "channels": 16})

Set the configuration after your Run has finished

Use the W&B Public API to update a completed run’s config.

You must provide the API with your entity, project name and the run’s ID. You can find these details in the Run object or in the W&B App:

with wandb.init() as run:
    ...

# Find the following values from the Run object if it was initiated from the
# current script or notebook, or you can copy them from the W&B App UI.
username = run.entity
project = run.project
run_id = run.id

# Note that api.run() returns a different type of object than wandb.init().
api = wandb.Api()
api_run = api.run(f"{username}/{project}/{run_id}")
api_run.config["bar"] = 32
api_run.update()

`absl.FLAGS`

You can also pass in absl flags.

from absl import flags

flags.DEFINE_string("model", None, "model to run")  # name, default, help

run.config.update(flags.FLAGS)  # adds absl flags to config

File-based configs

If you place a file named config-defaults.yaml in the same directory as your run script, the run automatically picks up the key-value pairs defined in the file and passes them to wandb.Run.config.

The following code snippet shows a sample config-defaults.yaml YAML file:

batch_size:
  desc: Size of each mini-batch
  value: 32

You can override the default values automatically loaded from config-defaults.yaml by setting updated values in the config argument of wandb.init(). For example:

import wandb

# Override config-defaults.yaml by passing custom values
with wandb.init(config={"epochs": 200, "batch_size": 64}) as run:
    ...
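
The override order described above can be pictured as a merge in which explicitly passed values win over file defaults. The following is a plain-Python sketch of that precedence, not W&B’s implementation. The function name `resolve_config` is hypothetical, and the file contents are written as the dictionary a YAML parser would produce for the sample file.

```python
def resolve_config(file_defaults, overrides):
    """Merge file-style defaults with explicit overrides; overrides win."""
    resolved = {}
    for key, entry in file_defaults.items():
        # Entries written as {"desc": ..., "value": ...} contribute their value.
        if isinstance(entry, dict) and "value" in entry:
            resolved[key] = entry["value"]
        else:
            resolved[key] = entry
    resolved.update(overrides)
    return resolved


# The dictionary a YAML parser would produce for the sample config-defaults.yaml.
file_defaults = {"batch_size": {"desc": "Size of each mini-batch", "value": 32}}

print(resolve_config(file_defaults, {}))
print(resolve_config(file_defaults, {"epochs": 200, "batch_size": 64}))
```

With no overrides, the file default for `batch_size` is used. With overrides, `batch_size` becomes 64 and `epochs` is added.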

To load a configuration file other than config-defaults.yaml, use the --configs command-line argument and specify the path to the file:

python train.py --configs other-config.yaml

Example use case for file-based configs

Suppose you have a YAML file with some metadata for the run, and then a dictionary of hyperparameters in your Python script. You can save both in the nested config object:

hyperparameter_defaults = dict(
    dropout=0.5,
    batch_size=100,
    learning_rate=0.001,
)

config_dictionary = dict(
    yaml=my_yaml_file,
    params=hyperparameter_defaults,
)

with wandb.init(config=config_dictionary) as run:
    ...

TensorFlow v1 flags

You can pass TensorFlow flags into the wandb.Run.config object directly.

with wandb.init() as run:
    run.config.epochs = 4

    flags = tf.app.flags
    flags.DEFINE_string("data_dir", "/tmp/data", "Data directory.")
    flags.DEFINE_integer("batch_size", 128, "Batch size.")
    run.config.update(flags.FLAGS)  # add tensorflow flags as config

2.1.3 - Projects

Compare versions of your model, explore results in a scratch workspace, and export findings to a report to save notes and visualizations.

A project is a central location where you visualize results, compare experiments, view and download artifacts, create an automation, and more.

Each project has a visibility setting that determines who can access it. For more information about who can access a project, see Project visibility.

Each project contains the following tabs:

Overview: A snapshot of your project
Workspace: A personal visualization sandbox
Runs: A table that lists all the runs in your project
Automations: Automations configured in your project
Sweeps: Automated exploration and optimization
Reports: Saved snapshots of notes, runs, and graphs
Artifacts: All runs and the artifacts associated with each run

Overview tab

Project name: The name of the project. W&B creates a project for you when you initialize a run with the name you provide for the project field. You can change the name of the project at any time by selecting the Edit button in the upper right corner.
Description: A description of the project.
Project visibility: The visibility setting that determines who can access the project. See Project visibility for more information.
Last active: Timestamp of the last time data was logged to this project.
Owner: The entity that owns this project.
Contributors: The number of users that contribute to this project.
Total runs: The total number of runs in this project.
Total compute: The sum of all run times in your project.
Undelete runs: Click the dropdown menu and click “Undelete all runs” to recover deleted runs in your project.
Delete project: Click the dot menu in the right corner to delete a project.

Workspace tab

A project’s workspace gives you a personal sandbox to compare experiments. Use projects to organize models that can be compared, such as models that address the same problem with different architectures, hyperparameters, datasets, or preprocessing.

Runs Sidebar: list of all the runs in your project.

Dot menu: hover over a row in the sidebar to see the menu appear on the left side. Use this menu to rename a run, delete a run, or stop an active run.
Visibility icon: click the eye to turn on and off runs on graphs
Color: change the run color to another one of our presets or a custom color
Search: search runs by name. This also filters visible runs in the plots.
Filter: use the sidebar filter to narrow down the set of runs visible
Group: select a config column to dynamically group your runs, for example by architecture. Grouping makes plots show up with a line along the mean value, and a shaded region for the variance of points on the graph.
Sort: pick a value to sort your runs by, for example runs with the lowest loss or highest accuracy. Sorting will affect which runs show up on the graphs.
Expand button: expand the sidebar into the full table
Run count: the number in parentheses at the top is the total number of runs in the project. The number (N visualized) is the number of runs that have the eye turned on and are available to be visualized in each plot. In the example below, the graphs are only showing the first 10 of 183 runs. Edit a graph to increase the max number of runs visible.

If you pin, hide, or change the order of columns in the Runs tab, the Runs sidebar reflects these customizations.

Panels layout: use this scratch space to explore results, add and remove charts, and compare versions of your models based on different metrics

Add a section of panels

Click the section dropdown menu and click “Add section” to create a new section for panels. You can rename sections, drag them to reorganize them, and expand and collapse sections.

Each section has options in the upper right corner:

Switch to custom layout: The custom layout allows you to resize panels individually.
Switch to standard layout: The standard layout lets you resize all panels in the section at once, and gives you pagination.
Add section: Add a section above or below from the dropdown menu, or click the button at the bottom of the page to add a new section.
Rename section: Change the title for your section.
Export section to report: Save this section of panels to a new report.
Delete section: Remove the whole section and all the charts. This can be undone with the undo button at the bottom of the page in the workspace bar.
Add panel: Click the plus button to add a panel to the section.

Move panels between sections

Drag and drop panels to reorder and organize into sections. You can also click the “Move” button in the upper right corner of a panel to select a section to move the panel to.

Resize panels

Standard layout: All panels maintain the same size, and there are pages of panels. You can resize the panels by clicking and dragging the lower right corner. Resize the section by clicking and dragging the lower right corner of the section.
Custom layout: All panels are sized individually, and there are no pages.

Search for metrics

Use the search box in the workspace to filter down the panels. This search matches the panel titles, which are by default the name of the metrics visualized.

Runs tab

Use the Runs tab to filter, group, and sort your runs.

The following tabs demonstrate some common actions you can take in the Runs tab.

The Runs tab shows details about runs in the project. It shows a large number of columns by default.

When you customize the Runs tab, the customization is also reflected in the Runs selector of the Workspace tab.

To view all visible columns, scroll the page horizontally.
To change the order of the columns, drag a column to the left or right.
To pin a column, hover over the column name, click the action menu (...) that appears, then click Pin column. Pinned columns appear near the left of the page, after the Name column. To unpin a pinned column, choose Unpin column.
To hide a column, hover over the column name, click the action menu (...) that appears, then click Hide column. To view all columns that are currently hidden, click Columns.
To show, hide, pin, and unpin multiple columns at once, click Columns.
- Click the name of a hidden column to unhide it.
- Click the name of a visible column to hide it.
- Click the pin icon next to a visible column to pin it.

Sort all rows in a Table by the value in a given column.

Hover your mouse over the column title. A kebab menu (three vertical dots) appears.
Select the kebab menu.
Choose Sort Asc or Sort Desc to sort the rows in ascending or descending order, respectively.

The preceding image demonstrates how to view sorting options for a Table column called val_acc.

Filter all rows by an expression with the Filter button on the top left of the dashboard.

Select Add filter to add one or more filters to your rows. Three dropdown menus appear. From left to right, the filter types are based on Column name, Operator, and Values.

| | Column name | Binary relation | Value |
|---|---|---|---|
| Accepted values | String | =, ≠, ≤, ≥, IN, NOT IN | Integer, float, string, timestamp, null |

The expression editor shows a list of options for each term using autocomplete on column names and logical predicate structure. You can connect multiple logical predicates into one expression using “and” or “or” (and sometimes parentheses).

The preceding image shows a filter that is based on the `val_loss` column. The filter shows runs with a validation loss less than or equal to 1.
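
The filter semantics can be sketched in plain Python by mapping each operator in the table to a comparison. This only illustrates the logic and does not reproduce how the W&B App evaluates filters. The run rows and the `filter_runs` helper are hypothetical.

```python
import operator

# Operators from the table above, mapped to Python comparisons.
OPERATORS = {
    "=": operator.eq,
    "≠": operator.ne,
    "≤": operator.le,
    "≥": operator.ge,
    "IN": lambda value, options: value in options,
    "NOT IN": lambda value, options: value not in options,
}


def filter_runs(runs, column, op, value):
    """Keep runs whose column satisfies the comparison; skip runs missing the column."""
    compare = OPERATORS[op]
    return [run for run in runs if column in run and compare(run[column], value)]


# Hypothetical rows, one dictionary per run.
runs = [
    {"name": "run-a", "val_loss": 0.42, "architecture": "CNN"},
    {"name": "run-b", "val_loss": 1.70, "architecture": "RNN"},
    {"name": "run-c", "val_loss": 1.00, "architecture": "CNN"},
]

low_loss = filter_runs(runs, "val_loss", "≤", 1)
print([run["name"] for run in low_loss])
```

Chaining two `filter_runs` calls corresponds to joining two predicates with “and”.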

Group all rows by the value in a particular column with the Group by button in a column header.

By default, this turns other numeric columns into histograms showing the distribution of values for that column across the group. Grouping is helpful for understanding higher-level patterns in your data.
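
As a rough picture of what grouping does, the sketch below groups a few hypothetical runs by a config value and summarizes one numeric column per group. The W&B App computes its own aggregates and histograms. The `group_runs` helper and the choice of count, mean, and population variance are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean, pvariance


def group_runs(runs, config_key, metric):
    """Group runs by a config value and summarize one metric per group."""
    groups = defaultdict(list)
    for run in runs:
        groups[run["config"][config_key]].append(run[metric])
    return {
        name: {"count": len(values), "mean": mean(values), "variance": pvariance(values)}
        for name, values in groups.items()
    }


# Hypothetical runs; the values are chosen to be exact in binary floating point.
runs = [
    {"config": {"architecture": "CNN"}, "val_acc": 0.5},
    {"config": {"architecture": "CNN"}, "val_acc": 0.75},
    {"config": {"architecture": "RNN"}, "val_acc": 0.25},
]

summary = group_runs(runs, "architecture", "val_acc")
print(summary)
```

Each group keeps its full list of values until it is summarized, which is the same information a per-group histogram would display.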

Automations tab

Automate downstream actions for versioning artifacts. To create an automation, define trigger events and resulting actions. Actions include executing a webhook or launching a W&B job. For more information, see Automations.

Reports tab

See all the snapshots of results in one place, and share findings with your team.

Sweeps tab

Start a new sweep from your project.

Artifacts tab

View all artifacts associated with a project, from training datasets and fine-tuned models to tables of metrics and media.

Overview panel

On the overview panel, you’ll find a variety of high-level information about the artifact, including its name and version, the hash digest used to detect changes and prevent duplication, the creation date, and any aliases. You can add or remove aliases here, and take notes on both the version and the artifact as a whole.
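
The idea of using a hash digest to detect changes and prevent duplication can be sketched with the standard library. This section does not describe how W&B computes its digest, so SHA-256 is used here purely as a stand-in, and the helper names are hypothetical.

```python
import hashlib


def content_digest(data: bytes) -> str:
    """Return a hex digest that changes whenever the bytes change."""
    return hashlib.sha256(data).hexdigest()


def is_new_version(data: bytes, known_digests: set) -> bool:
    """Report whether the content differs from every digest seen so far."""
    return content_digest(data) not in known_digests


known = {content_digest(b"weights-v0")}
print(is_new_version(b"weights-v0", known))  # Same bytes as before.
print(is_new_version(b"weights-v1", known))  # Changed bytes.
```

Because identical content always produces the same digest, comparing digests is enough to decide whether a new version needs to be stored.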

Metadata panel

The metadata panel provides access to the artifact’s metadata, which is provided when the artifact is constructed. This metadata might include configuration arguments required to reconstruct the artifact, URLs where more information can be found, or metrics produced during the run which logged the artifact. Additionally, you can see the configuration for the run which produced the artifact as well as the history metrics at the time of logging the artifact.

Usage panel

The Usage panel provides a code snippet for downloading the artifact for use outside of the web app, for example on a local machine. This section also indicates and links to the run which output the artifact and any runs which use the artifact as an input.

Files panel

The files panel lists the files and folders associated with the artifact. W&B uploads certain files for a run automatically. For example, requirements.txt shows the versions of each library the run used, and wandb-metadata.json and wandb-summary.json include information about the run. Other files may be uploaded, such as artifacts or media, depending on the run’s configuration. You can navigate through this file tree and view the contents directly in the W&B web app.

Tables associated with artifacts are particularly rich and interactive in this context. Learn more about using Tables with Artifacts.

Lineage panel

The lineage panel provides a view of all of the artifacts associated with a project and the runs that connect them to each other. It shows run types as blocks and artifacts as circles, with arrows to indicate when a run of a given type consumes or produces an artifact of a given type. The type of the particular artifact selected in the left-hand column is highlighted.

Click the Explode toggle to view all of the individual artifact versions and the specific runs that connect them.
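
A lineage view is, in effect, a directed graph in which runs consume and produce artifacts. The sketch below models a small graph as an adjacency mapping and walks it backwards to find everything a given artifact depends on. All node names and the `upstream` helper are hypothetical. This is not how W&B stores lineage.

```python
# Directed edges: artifact -> run means "consumed by"; run -> artifact means "produced".
# All node names are hypothetical.
EDGES = {
    "raw-data:v0": ["preprocess-run"],
    "preprocess-run": ["clean-data:v0"],
    "clean-data:v0": ["train-run"],
    "train-run": ["model:v0"],
}


def upstream(node, edges):
    """Return every node that the given node depends on, directly or indirectly."""
    # Invert the edges so each node maps to the nodes that feed into it.
    parents = {}
    for source, targets in edges.items():
        for target in targets:
            parents.setdefault(target, []).append(source)

    seen, stack = [], list(parents.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(parents.get(current, []))
    return seen


print(upstream("model:v0", EDGES))
```

Walking the graph backwards from a model version answers the question of which runs and datasets contributed to it.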

Action History Audit tab

The action history audit tab shows all of the alias actions and membership changes for a Collection so you can audit the entire evolution of the resource.

Versions tab

The versions tab shows all versions of the artifact as well as columns for each of the numeric values of the Run History at the time of logging the version. This allows you to compare performance and quickly identify versions of interest.
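The comparison that the versions tab supports can be sketched in a few lines of plain Python. The rows and the val_acc metric below are made-up illustration data, not values returned by W&B.

```python
# Hypothetical rows: one per artifact version, with a metric captured at logging time.
versions = [
    {"version": "v0", "val_acc": 0.81},
    {"version": "v1", "val_acc": 0.88},
    {"version": "v2", "val_acc": 0.85},
]

# Pick the version with the highest validation accuracy.
best = max(versions, key=lambda row: row["val_acc"])
print(best["version"])
```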

Create a project

You can create a project in the W&B App or programmatically by specifying a project in a call to wandb.init().

In the W&B App, you can create a project from the Projects page or from a team’s landing page.

From the Projects page:

Click the global navigation icon in the upper left. The navigation sidebar opens.
In the Projects section of the navigation, click View all to open the project overview page.
Click Create new project.
Set Team to the name of the team that will own the project.
Specify a name for your project using the Name field.
Set Project visibility, which defaults to Team.
Optionally, provide a Description.
Click Create project.

From a team’s landing page:

Click the global navigation icon in the upper left. The navigation sidebar opens.
In the Teams section of the navigation, click the name of a team to open its landing page.
In the landing page, click Create new project.
Team is automatically set to the team that owns the landing page you were viewing. If necessary, change the team.
Specify a name for your project using the Name field.
Set Project visibility, which defaults to Team.
Optionally, provide a Description.
Click Create project.

To create a project programmatically, specify a project when calling wandb.init(). If the project does not yet exist, it is created automatically, and is owned by the specified entity. For example:

import wandb

with wandb.init(entity="<entity>", project="<project_name>") as run:
    run.log({"accuracy": 0.95})

Refer to the wandb.init() API reference.

Star a project

Add a star to a project to mark that project as important. Projects that you and your team mark as important with stars appear at the top of your organization’s homepage.

For example, the following image shows two projects that are marked as important, zoo_experiment and registry_demo. Both projects appear at the top of the organization’s homepage within the Starred projects section.

There are two ways to mark a project as important: within a project’s overview tab or within your team’s profile page.

Navigate to your W&B project on the W&B App at https://wandb.ai/<team>/<project-name>.
Select the Overview tab from the project sidebar.
Choose the star icon in the upper right corner next to the Edit button.

Navigate to your team’s profile page at https://wandb.ai/<team>/projects.
Select the Projects tab.
Hover your mouse next to the project you want to star. Click the star icon that appears.

For example, the following image shows the star icon next to the “Compare_Zoo_Models” project.

Confirm that your project appears on the landing page of your organization by clicking on the organization name in the top left corner of the app.

Delete a project

You can delete your project by clicking the three dots on the right of the overview tab.

If the project is empty, you can delete it by clicking the dropdown menu in the top-right and selecting Delete project.

Add notes to a project

Add notes to your project either as a description overview or as a markdown panel within your workspace.

Add description overview to a project

Descriptions you add to your project appear in the project’s Overview tab.

Navigate to your W&B project
Select the Overview tab from the project sidebar
Choose Edit in the upper right hand corner
Add your notes in the Description field
Select the Save button

Create reports to create descriptive notes comparing runs

You can also create a W&B Report to add plots and markdown side by side. Use different sections to show different runs, and tell a story about what you worked on.

Add notes to run workspace

Navigate to your W&B project
Select the Workspace tab from the project sidebar
Choose the Add panels button from the top right corner
Select the TEXT AND CODE dropdown from the modal that appears
Select Markdown
Add your notes in the markdown panel that appears in your workspace

2.1.4 - View experiments results

A playground for exploring run data with interactive visualizations

W&B workspace is your personal sandbox to customize charts and explore model results. A W&B workspace consists of Tables and Panel sections:

Tables: All runs logged to your project are listed in the project’s table. Turn on and off runs, change colors, and expand the table to see notes, config, and summary metrics for each run.
Panel sections: A section that contains one or more panels. Create new panels, organize them, and export to reports to save snapshots of your workspace.

Workspace types

There are two main workspace categories: Personal workspaces and Saved views.

Personal workspaces: A customizable workspace for in-depth analysis of models and data visualizations. Only the owner of the workspace can edit and save changes. Teammates can view a personal workspace but cannot make changes to someone else’s personal workspace.
Saved views: Saved views are collaborative snapshots of a workspace. Anyone on your team can view, edit, and save changes to saved workspace views. Use saved workspace views for reviewing and discussing experiments, runs, and more.

The following image shows multiple personal workspaces created by Cécile-parker’s teammates. In this project, there are no saved views:

Saved workspace views

Improve team collaboration with tailored workspace views. Create Saved Views to organize your preferred setup of charts and data.

Create a new saved workspace view

Navigate to a personal workspace or a saved view.
Make edits to the workspace.
Click on the meatball menu (three horizontal dots) at the top right corner of your workspace. Click on Save as a new view.

New saved views appear in the workspace navigation menu.

Update a saved workspace view

Saved changes overwrite the previous state of the saved view. Unsaved changes are not retained. To update a saved workspace view in W&B:

Navigate to a saved view.
Make the desired changes to your charts and data within the workspace.
Click the Save button to confirm your changes.

A confirmation dialog appears when you save your updates to a workspace view. If you prefer not to see this prompt in the future, select the option Do not show this modal next time before confirming your save.

Delete a saved workspace view

Remove saved views that are no longer needed.

Navigate to the saved view you want to remove.
Select the three horizontal dots (…) at the top right of the view.
Choose Delete view.
Confirm the deletion to remove the view from your workspace menu.

Share your customized workspace with your team by sharing the workspace URL directly. All users with access to the workspace project can see the saved Views of that workspace.

Workspace templates

This feature requires an Enterprise license.

Use workspace templates to quickly create workspaces using the same settings as an existing workspace instead of the default settings for new workspaces. Currently, a workspace template can define custom line plot settings.

Default workspace settings

By default, new workspaces use these default settings for line plots:

Setting	Default
X axis	Step
Smoothing type	Time weight EMA
Smoothing weight	0
Max runs	10
Grouping in charts	on
Group aggregation	Mean

Configure your workspace template

Open any workspace or create a new one.
Configure the workspace’s line plot settings according to your preferences.
Save the settings to your workspace template:
1. At the top of the workspace, click the action menu ... near the Undo and Redo arrow icons.
2. Click Save personal workspace template.
3. Review the line plot settings for the template, then click Save.

New workspaces will use these settings instead of the defaults.
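Conceptually, a workspace template overrides the default line plot settings for new workspaces. The sketch below models that with a dictionary merge. It is an illustration only: the key names and the effective_settings helper are assumptions made for this example and are not part of W&B.

```python
from typing import Optional

# Default line plot settings from the table above; key names are illustrative.
DEFAULT_LINE_PLOT_SETTINGS = {
    "x_axis": "Step",
    "smoothing_type": "Time weight EMA",
    "smoothing_weight": 0,
    "max_runs": 10,
    "grouping_in_charts": True,
    "group_aggregation": "Mean",
}


def effective_settings(template: Optional[dict]) -> dict:
    """Return the settings a new workspace would start with in this model."""
    settings = dict(DEFAULT_LINE_PLOT_SETTINGS)
    if template:
        settings.update(template)  # Template values take precedence over defaults.
    return settings


# A template that only changes the number of runs shown.
print(effective_settings({"max_runs": 11})["max_runs"])
```

Deleting the template corresponds to calling effective_settings(None), which falls back to the defaults.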

View your workspace template

To view your workspace template’s current configuration:

From any page, select your user icon on the top right corner. From the dropdown, choose Settings.
Navigate to the Personal workspace template section. If you are using a workspace template, its configuration displays. Otherwise, the section includes no details.

Update your workspace template

To update your workspace template:

Open any workspace.
Modify the workspace’s settings. For example, set the number of runs to include to 11.
To save the changes to the template, click the action menu ... near the Undo and Redo arrow icons, then click Update personal workspace template.
Verify the settings, then click Update. The template is updated, and reapplied to all workspaces that use it.

Delete your workspace template

To delete your workspace template and go back to the default settings:

From any page, select your user icon on the top right corner. From the dropdown, choose Settings.
Navigate to the Personal workspace template section. Your workspace template’s configuration displays.
Click the trash icon next to Settings.

For Dedicated Cloud and Self-Managed, deleting your workspace template is supported on v0.70 and above. On older Server versions, update your workspace template to use the default settings instead.

Programmatically create workspaces

wandb-workspaces is a Python library for programmatically working with W&B workspaces and reports.

Define a workspace programmatically with wandb-workspaces.

You can define the workspace’s properties, such as:

Set panel layouts, colors, and section orders.
Configure workspace settings like default x-axis, section order, and collapse states.
Add and customize panels within sections to organize workspace views.
Load and modify existing workspaces using a URL.
Save changes to existing workspaces or save as new views.
Filter, group, and sort runs programmatically using simple expressions.
Customize run appearance with settings like colors and visibility.
Copy views from one workspace to another for integration and reuse.

Install Workspace API

In addition to wandb, ensure that you install wandb-workspaces:

pip install wandb wandb-workspaces

Define and save a workspace view programmatically

import wandb_workspaces.reports.v2 as wr
import wandb_workspaces.workspaces as ws

workspace = ws.Workspace(entity="your-entity", project="your-project", views=[...])
workspace.save()

Edit an existing view

existing_workspace = ws.Workspace.from_url("workspace-url")
existing_workspace.views[0] = ws.View(name="my-new-view", sections=[...])
existing_workspace.save()

Copy a workspace `saved view` to another workspace

old_workspace = ws.Workspace.from_url("old-workspace-url")
old_workspace_view = old_workspace.views[0]
new_workspace = ws.Workspace(entity="new-entity", project="new-project", views=[old_workspace_view])

new_workspace.save()

See wandb-workspace examples for comprehensive workspace API examples. For an end to end tutorial, see Programmatic Workspaces tutorial.

2.1.5 - What are runs?

Learn about the basic building block of W&B, Runs.

A run is a single unit of computation logged by W&B. You can think of a W&B Run as an atomic element of your whole project. In other words, each run is a record of a specific computation, such as training a model and logging the results, hyperparameter sweeps, and so forth.

Common patterns for initiating a run include, but are not limited to:

Training a model
Changing a hyperparameter and conducting a new experiment
Conducting a new machine learning experiment with a different model
Logging data or a model as a W&B Artifact
Downloading a W&B Artifact

W&B stores runs that you create into projects. You can view runs and their properties within the run’s project workspace on the W&B App. You can also programmatically access run properties with the wandb.Api.Run object.

Anything you log with wandb.Run.log() is recorded in that run.

import wandb

entity = "nico"  # Replace with your W&B entity
project = "awesome-project"

with wandb.init(entity=entity, project=project) as run:
    run.log({"accuracy": 0.9, "loss": 0.1})

The first line imports the W&B Python SDK. The second line initializes a run in the project awesome-project under the entity nico. The third line logs the accuracy and loss of the model to that run.

Within the terminal, W&B returns:

wandb: Syncing run earnest-sunset-1
wandb: ⭐️ View project at https://wandb.ai/nico/awesome-project
wandb: 🚀 View run at https://wandb.ai/nico/awesome-project/runs/1jx1ud12
wandb:                                                                                
wandb: 
wandb: Run history:
wandb: accuracy ▁
wandb:     loss ▁
wandb: 
wandb: Run summary:
wandb: accuracy 0.9
wandb:     loss 0.1
wandb: 
wandb: 🚀 View run earnest-sunset-1 at: https://wandb.ai/nico/awesome-project/runs/1jx1ud12
wandb: ⭐️ View project at: https://wandb.ai/nico/awesome-project
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20241105_111006-1jx1ud12/logs

The URL that W&B returns in the terminal redirects you to the run’s workspace in the W&B App UI. Note that the panels generated in the workspace correspond to the single logged point.

Logging metrics at a single point in time might not be that useful. A more realistic example in the case of training discriminative models is to log metrics at regular intervals. For example, consider the following code snippet:

import wandb
import random

config = {
    "epochs": 10,
    "learning_rate": 0.01,
}

with wandb.init(project="awesome-project", config=config) as run:
    print(f"lr: {config['learning_rate']}")

    # Simulating a training run
    for epoch in range(config['epochs']):
        offset = random.random() / 5
        acc = 1 - 2**-epoch - random.random() / (epoch + 1) - offset
        loss = 2**-epoch + random.random() / (epoch + 1) + offset
        print(f"epoch={epoch}, accuracy={acc}, loss={loss}")
        run.log({"accuracy": acc, "loss": loss})

This returns the following output:

wandb: Syncing run jolly-haze-4
wandb: ⭐️ View project at https://wandb.ai/nico/awesome-project
wandb: 🚀 View run at https://wandb.ai/nico/awesome-project/runs/pdo5110r
lr: 0.01
epoch=0, accuracy=-0.10070974957523078, loss=1.985328507123956
epoch=1, accuracy=0.2884687745057535, loss=0.7374362314407752
epoch=2, accuracy=0.7347387967382066, loss=0.4402409835486663
epoch=3, accuracy=0.7667969248039795, loss=0.26176963846423457
epoch=4, accuracy=0.7446848791003173, loss=0.24808611724405083
epoch=5, accuracy=0.8035095836268268, loss=0.16169791827329466
epoch=6, accuracy=0.861349032371624, loss=0.03432578493587426
epoch=7, accuracy=0.8794926436276016, loss=0.10331872172219471
epoch=8, accuracy=0.9424839917077272, loss=0.07767793473500445
epoch=9, accuracy=0.9584880427028566, loss=0.10531971149250456
wandb: 🚀 View run jolly-haze-4 at: https://wandb.ai/nico/awesome-project/runs/pdo5110r
wandb: Find logs at: wandb/run-20241105_111816-pdo5110r/logs

The training script calls wandb.Run.log() 10 times. Each time the script calls wandb.Run.log(), W&B logs the accuracy and loss for that epoch. Selecting the URL that W&B prints in the preceding output directs you to the run’s workspace in the W&B App UI.

W&B captures the simulated training loop within a single run called jolly-haze-4. This is because the script calls wandb.init() only once.

As another example, during a sweep, W&B explores a hyperparameter search space that you specify. W&B implements each new hyperparameter combination that the sweep creates as a unique run.

Initialize a W&B Run

Initialize a W&B Run with wandb.init(). The following code snippet shows how to import the W&B Python SDK and initialize a run.

Replace values enclosed in angle brackets (< >) with your own values:

import wandb

with wandb.init(entity="<entity>", project="<project>") as run:
    # Your code here
    pass

When you initialize a run, W&B logs your run to the project you specify for the project field (wandb.init(project="<project>")). W&B creates a new project if the project does not already exist. If the project already exists, W&B stores the run in that project.

If you do not specify a project name, W&B stores the run in a project called Uncategorized.

Each run in W&B has a unique identifier known as a run ID. You can specify a unique ID or let W&B randomly generate one for you.

Each run also has a human-readable, non-unique run name. You can specify a name for your run or let W&B randomly generate one for you. You can rename a run after initializing it.

For example, consider the following code snippet:

import wandb

run = wandb.init(entity="nico", project="awesome-project")

The code snippet produces the following output:

🚀 View run exalted-darkness-6 at: 
https://wandb.ai/nico/awesome-project/runs/pgbn9y21
Find logs at: wandb/run-20241106_090747-pgbn9y21/logs

Since the preceding code did not specify an argument for the id parameter, W&B creates a unique run ID. In the output, nico is the entity that logged the run, awesome-project is the name of the project the run is logged to, exalted-darkness-6 is the name of the run, and pgbn9y21 is the run ID.

Notebook users

Call run.finish() at the end of your run to mark the run finished. This helps ensure that the run is properly logged to your project and does not continue in the background.

import wandb

run = wandb.init(entity="<entity>", project="<project>")
# Training code, logging, and so forth
run.finish()

If you group runs into experiments, you can move a run into or out of a group or from one group to another.

Each run has a state that describes the current status of the run. See Run states for a full list of possible run states.

Run states

The following table describes the possible states a run can be in:

State	Description
`Crashed`	Run stopped sending heartbeats in the internal process, which can happen if the machine crashes.
`Failed`	Run ended with a non-zero exit status.
`Finished`	Run ended and fully synced data, or called `wandb.Run.finish()`.
`Killed`	Run was forcibly stopped before it could finish.
`Running`	Run is still running and has recently sent a heartbeat.
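If you handle run states in your own tooling, it can help to model them explicitly. The sketch below encodes the states from the table above in a plain-Python enum. The RunState class and the two helper functions are hypothetical names for this example and are not part of the W&B SDK.

```python
from enum import Enum


class RunState(Enum):
    """Run states as listed in the table above."""

    CRASHED = "Crashed"
    FAILED = "Failed"
    FINISHED = "Finished"
    KILLED = "Killed"
    RUNNING = "Running"


def is_active(state: RunState) -> bool:
    """A run is active only while it is in the Running state."""
    return state is RunState.RUNNING


def ended_without_finishing(state: RunState) -> bool:
    """True for runs that stopped without reaching the Finished state."""
    return state in {RunState.CRASHED, RunState.FAILED, RunState.KILLED}


print(is_active(RunState("Running")), ended_without_finishing(RunState("Killed")))
```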

Unique run identifiers

Run IDs are unique identifiers for runs. By default, W&B generates a random and unique run ID for you when you initialize a new run. You can also specify your own unique run ID when you initialize a run.

Autogenerated run IDs

If you do not specify a run ID when you initialize a run, W&B generates a random run ID for you. You can find the unique ID of a run in the W&B App.

Navigate to the W&B App.
Navigate to the W&B project you specified when you initialized the run.
Within your project’s workspace, select the Runs tab.
Select the Overview tab.

W&B displays the unique run ID in the Run path field. The run path consists of the name of your team, the name of the project, and the run ID. The unique ID is the last part of the run path.
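Because the run path has the form entity/project/run-ID, you can split it into its parts with ordinary string handling. The following is a small sketch in plain Python. The RunPath type and parse_run_path function are hypothetical helpers written for this example.

```python
from typing import NamedTuple


class RunPath(NamedTuple):
    entity: str
    project: str
    run_id: str


def parse_run_path(run_path: str) -> RunPath:
    """Split a run path of the form entity/project/run-ID into its parts."""
    parts = run_path.strip("/").split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected entity/project/run-ID, got {run_path!r}")
    return RunPath(*parts)


# The run ID is the last part of the run path.
print(parse_run_path("nico/awesome-project/1jx1ud12").run_id)
```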

For example, in the following image, the unique run ID is 9mxi1arc:

Custom run IDs

You can specify your own run ID by passing the id parameter to the wandb.init() method.

import wandb

run = wandb.init(entity="<entity>", project="<project>", id="<run-id>")
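If you choose your own run IDs, you need a scheme that keeps them unique within a project. One possible convention, sketched below, is a readable prefix followed by a random suffix. This is an assumption made for illustration, not W&B’s own ID generator: make_run_id is a hypothetical helper, and the sketch assumes that lowercase letters, digits, and hyphens are acceptable in a run ID.

```python
import secrets
import string

# Characters used for the random suffix (an assumption of this sketch).
ALPHABET = string.ascii_lowercase + string.digits


def make_run_id(prefix: str, suffix_length: int = 8) -> str:
    """Return an ID such as 'baseline-<random suffix>' for use as a custom run ID."""
    suffix = "".join(secrets.choice(ALPHABET) for _ in range(suffix_length))
    return f"{prefix}-{suffix}"


run_id = make_run_id("baseline")
print(run_id)

# With the wandb package installed and an authenticated session:
# run = wandb.init(entity="<entity>", project="<project>", id=run_id)
```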

You can use a run’s unique ID to directly navigate to the run’s overview page in the W&B App. The following cell shows the URL path for a specific run:

https://wandb.ai/<entity>/<project>/runs/<run-id>

Where values enclosed in angle brackets (< >) are placeholders for the actual values of the entity, project, and run ID.

Name your run

The name of a run is a human-readable, non-unique identifier.

By default, W&B generates a random run name when you initialize a new run. The name of a run appears within your project’s workspace and at the top of the run’s overview page.

Use run names as a way to quickly identify a run in your project workspace.

You can specify a name for your run by passing the name parameter to the wandb.init() method.

import wandb

with wandb.init(entity="<entity>", project="<project>", name="<run-name>") as run:
    # Your code here
    pass

Rename a run

After you initialize a run, you can rename it from your workspace or its Runs page.

Navigate to your W&B project.
Select the Workspace or Runs tab from the project sidebar.
Search or scroll to the run you want to rename.

Hover over the run name, click the three vertical dots, then select the scope:
- Rename run for project: The run is renamed across the project.
- Rename run for workspace: The run is renamed only in this workspace.
Type a new name for the run. To generate a new random name, leave the field blank.
Submit the form. The run’s new name displays. An information icon appears next to a run that has a custom name in the workspace. Hover over it for more details.

You can also rename a run from a run set in a report:

In the report, click the pencil icon to open the report editor.
In the run set, find the run to rename. Hover over the run name, click the three vertical dots, then select either:

Rename run for project: rename the run across the entire project. To generate a new random name, leave the field blank.
Rename run for panel grid: rename the run only in the report, preserving the existing name in other contexts. Generating a new random name is not supported.

Submit the form.

Click Publish report.

Add a note to a run

Notes that you add to a specific run appear on the run page in the Overview tab and in the table of runs on the project page.

Navigate to your W&B project
Select the Workspace tab from the project sidebar
Select the run you want to add a note to from the run selector
Choose the Overview tab
Select the pencil icon next to the Description field and add your notes

Stop a run

Stop a run from the W&B App or programmatically.

Navigate to the terminal or code editor where you initialized the run.
Press Ctrl+C to stop the run.

For example, following the preceding instructions, your terminal might look similar to the following:

KeyboardInterrupt
wandb: 🚀 View run legendary-meadow-2 at: https://wandb.ai/nico/history-blaster-4/runs/o8sdbztv
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 1 other file(s)
wandb: Find logs at: ./wandb/run-20241106_095857-o8sdbztv/logs

Navigate to the W&B App to confirm the run is no longer active:

Navigate to the project that your run was logging to.
Select the name of the run.

You can find the name of the run that you stop from the output of your terminal or code editor. For example, in the preceding example, the name of the run is legendary-meadow-2.

Choose the Overview tab from the project sidebar.

Next to the State field, the run’s state changes from Running to Killed.

Navigate to the project that your run is logging to.
Select the run you want to stop within the run selector.
Choose the Overview tab from the project sidebar.
Select the stop button next to the State field.

Next to the State field, the run’s state changes from Running to Killed.

See Run states for a full list of possible run states.

View logged runs

View information about a specific run, such as the state of the run, artifacts logged to the run, log files recorded during the run, and more.

To view a specific run:

Navigate to the W&B App.
Navigate to the W&B project you specified when you initialized the run.
Within the project sidebar, select the Workspace tab.
Within the run selector, click the run you want to view, or enter a partial run name to filter for matching runs.

Note that the URL path of a specific run has the following format:

https://wandb.ai/<team-name>/<project-name>/runs/<run-id>

Where values enclosed in angle brackets (< >) are placeholders for the actual values of the team name, project name, and run ID.
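The URL format above can be assembled and taken apart with the Python standard library. The sketch below assumes the public https://wandb.ai host shown in this guide. The run_url and run_id_from_url helpers are hypothetical names for this example, and a Dedicated Cloud or Self-Managed deployment would use a different base URL.

```python
from urllib.parse import quote, urlparse

# Host used in this guide; other deployments use their own host.
BASE_URL = "https://wandb.ai"


def run_url(team: str, project: str, run_id: str) -> str:
    """Build a run URL of the form <base>/<team-name>/<project-name>/runs/<run-id>."""
    return f"{BASE_URL}/{quote(team)}/{quote(project)}/runs/{quote(run_id)}"


def run_id_from_url(url: str) -> str:
    """Extract the run ID from a run URL."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) != 4 or parts[2] != "runs":
        raise ValueError(f"not a run URL: {url!r}")
    return parts[3]


url = run_url("nico", "awesome-project", "1jx1ud12")
print(url)
print(run_id_from_url(url))
```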

Customize how runs are displayed

This section shows how to customize how runs are displayed in your project’s Workspace and Runs tab, which share the same display configuration.

A workspace is limited to displaying a maximum of 1000 runs, regardless of its configuration.

To customize which columns are visible:

In the project sidebar, navigate to the Runs tab.
Above the list of runs, click Columns.
Click the name of a hidden column to show it. Click the name of a visible column to hide it.

You can optionally search by column name using fuzzy search, an exact match, or regular expressions. Drag columns to change their order.
Click Done to close the column browser.

To sort the list of runs by any visible column:

Hover over the column name, then click its action ... menu.
Click Sort ascending or Sort descending.

Pinned columns are shown on the left-hand side. Unpinned columns are shown on the right-hand side of the Runs tab and are not shown on the Workspace tab.

To pin a column:

In the project sidebar, navigate to the Runs tab.
Hover over the column name, then click its action ... menu.
Click Pin column.

To unpin a column:

In the project sidebar, navigate to the Workspace or Runs tab.
Hover over the column name, then click its action ... menu.
Click Unpin column.

By default, long run names are truncated in the middle for readability. To customize the truncation of run names:

Click the action ... menu at the top of the list of runs.
Set Run name cropping to crop the end, middle, or beginning.
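To make the three cropping modes concrete, the sketch below truncates a long name at the end, middle, or beginning. It is a plain-Python illustration of the idea, not the W&B App’s implementation. The crop_name function, its parameters, and the "..." marker are assumptions for this example.

```python
def crop_name(name: str, max_len: int, mode: str = "middle", marker: str = "...") -> str:
    """Shorten name to max_len characters by cropping the end, middle, or beginning."""
    if len(name) <= max_len:
        return name
    keep = max_len - len(marker)  # Characters of the original name that remain visible.
    if keep <= 0:
        raise ValueError("max_len must be longer than the marker")
    if mode == "end":
        return name[:keep] + marker
    if mode == "beginning":
        return marker + name[-keep:]
    if mode == "middle":
        head = (keep + 1) // 2
        tail = keep - head
        return name[:head] + marker + (name[-tail:] if tail else "")
    raise ValueError(f"unknown mode: {mode!r}")


for mode in ("end", "middle", "beginning"):
    print(mode, crop_name("abcdefghijklmnopqrstuvwxyz", 11, mode))
```

Cropping the middle keeps both the start and the end of a name visible, which helps when run names share a common prefix and differ only in a suffix.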

See the Runs tab.

Overview tab

Use the Overview tab to learn about specific run information in a project, such as:

Author: The W&B entity that creates the run.
Command: The command that initializes the run.
Description: A description of the run that you provided. This field is empty if you do not specify a description when you create the run. You can add a description to a run with the W&B App UI or programmatically with the Python SDK.
Tracked Hours: The amount of time the run is actively computing or logging data, excluding any pauses or waiting periods. This metric helps you understand the actual computational time spent on your run.
Runtime: Measures the total time from the start to the end of the run. It’s the wall-clock time for the run, including any time where the run is paused or waiting for resources. This metric provides the complete elapsed time for your run.
Git repository: The git repository associated with the run. You must enable git to view this field.
Host name: Where W&B computes the run. W&B displays the name of your machine if you initialize the run locally on your machine.
Name: The name of the run.
OS: Operating system that initializes the run.
Python executable: The command that starts the run.
Python version: Specifies the Python version that creates the run.
Run path: Identifies the unique run identifier in the form entity/project/run-ID.
Start time: The timestamp when you initialize the run.
State: The state of the run.
System hardware: The hardware W&B uses to compute the run.
Tags: A list of strings. Tags are useful for organizing related runs together or applying temporary labels like baseline or production.
W&B CLI version: The W&B CLI version installed on the machine that hosted the run command.
Git state: The most recent git commit SHA of a repository or working directory where the run is initialized. This field is empty if you do not enable Git when you create the run or if the git information is not available.

W&B stores the following information below the overview section:

Artifact Outputs: Artifact outputs produced by the run.
Config: List of config parameters saved with wandb.Run.config.
Summary: List of summary parameters saved with wandb.Run.log(). By default, W&B sets this value to the last value logged.
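The relationship between logged history and the default summary can be illustrated with a toy in-memory stand-in. The ToyRun class below is not the W&B SDK. It is a hypothetical class that mimics only the behavior described above: every log call adds a history row, and the summary keeps the last value logged for each key.

```python
class ToyRun:
    """In-memory stand-in used only to illustrate history versus summary."""

    def __init__(self) -> None:
        self.history = []
        self.summary = {}

    def log(self, metrics: dict) -> None:
        self.history.append(dict(metrics))  # Every call adds one history row.
        self.summary.update(metrics)  # The summary keeps the latest value per key.


run = ToyRun()
for loss in (0.9, 0.4, 0.2):
    run.log({"loss": loss})

print(len(run.history), run.summary)
```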

View an example project overview here.

Workspace tab

Use the Workspace tab to view, search, group, and arrange visualizations such as autogenerated and custom plots, system metrics, and more.

View an example project workspace here.

Runs tab

Use the Runs tab to filter, group, and sort your runs.

The following tabs demonstrate some common actions you can take in the Runs tab.

The Runs tab shows details about runs in the project. It shows a large number of columns by default.

To view all visible columns, scroll the page horizontally.
To change the order of the columns, drag a column to the left or right.
To pin a column, hover over the column name, click the action menu ... that appears, then click Pin column. Pinned columns appear near the left of the page, after the Name column. To unpin a pinned column, choose Unpin column.
To hide a column, hover over the column name, click the action menu ... that appears, then click Hide column. To view all columns that are currently hidden, click Columns.
To show, hide, pin, and unpin multiple columns at once, click Columns.
- Click the name of a hidden column to unhide it.
- Click the name of a visible column to hide it.
- Click the pin icon next to a visible column to pin it.

When you customize the Runs tab, the customization is also reflected in the Runs selector of the Workspace tab.

Sort all rows in a Table by the value in a given column.

Hover your mouse over the column title. A kebab menu (three vertical dots) appears.
Select the kebab menu.
Choose Sort Asc or Sort Desc to sort the rows in ascending or descending order, respectively.

The preceding image demonstrates how to view sorting options for a Table column called val_acc.

Filter all rows by an expression with the Filter button above the dashboard.

Select Add filter to add one or more filters to your rows. Three dropdown menus appear. From left to right, the filter types are based on: Column name, Operator, and Value.

	Column name	Binary relation	Value
Accepted values	String	=, ≠, ≤, ≥, IN, NOT IN	Integer, float, string, timestamp, null

The preceding image shows a filter that is based on the `val_loss` column. The filter shows runs with a validation loss less than or equal to 1.
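A filter is a triple of column name, operator, and value. The sketch below applies such a triple to a list of rows in plain Python to show how the operators behave. It is an illustration only: the rows are made-up data, and filter_rows is a hypothetical helper, not a W&B function.

```python
import operator

# Operators from the table above, mapped to Python callables.
OPERATORS = {
    "=": operator.eq,
    "≠": operator.ne,
    "≤": operator.le,
    "≥": operator.ge,
    "IN": lambda value, options: value in options,
    "NOT IN": lambda value, options: value not in options,
}


def filter_rows(rows: list, column: str, op: str, value) -> list:
    """Keep the rows whose column satisfies the operator and value."""
    test = OPERATORS[op]
    # Rows that do not have the column are skipped in this sketch.
    return [row for row in rows if column in row and test(row[column], value)]


rows = [
    {"name": "run-a", "val_loss": 0.8},
    {"name": "run-b", "val_loss": 1.3},
    {"name": "run-c", "val_loss": 1.0},
]

# Runs with a validation loss less than or equal to 1.
print([row["name"] for row in filter_rows(rows, "val_loss", "≤", 1)])
```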

Group all rows by the value in a particular column with the Group by button above the dashboard.

By default, this turns other numeric columns into histograms that each show the distribution of values for that column across the group. Grouping is helpful for understanding higher-level patterns in your data.
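Grouping collects the values of each numeric column per group, which is what makes a per-group distribution or aggregate possible. The following plain-Python sketch shows the idea with made-up rows. The group_by helper and the column names are assumptions for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows from a runs table.
rows = [
    {"optimizer": "adam", "val_acc": 0.90},
    {"optimizer": "adam", "val_acc": 0.94},
    {"optimizer": "sgd", "val_acc": 0.85},
]


def group_by(rows: list, key: str, metric: str) -> dict:
    """Collect the metric values of all rows that share the same key value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[metric])
    return dict(groups)


grouped = group_by(rows, "optimizer", "val_acc")
print(grouped)

# Each group's list of values could feed a histogram or an aggregate such as the mean.
means = {name: mean(values) for name, values in grouped.items()}
```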

The Group by feature is distinct from a run’s run group. You can group runs by run group. To move a run to a different run group, refer to Assign a group or job type to a run.

Logs tab

The Logs tab shows output printed on the command line such as the standard output (stdout) and standard error (stderr).

Choose the Download button in the upper right hand corner to download the log file.

View an example logs tab here.

Files tab

Use the Files tab to view files associated with a specific run, such as model checkpoints, validation set examples, and more.

View an example files tab here.

Artifacts tab

The Artifacts tab lists the input and output artifacts for the specified run.

View example artifact graphs.

Delete runs

Delete one or more runs from a project with the W&B App.

Navigate to the project that contains the runs you want to delete.
Select the Runs tab from the project sidebar.
Select the checkbox next to the runs you want to delete.
Choose the Delete button (trash can icon) above the table.
From the modal that appears, choose Delete.

Once a run with a specific ID is deleted, its ID may not be used again. Trying to initiate a run with a previously deleted ID will show an error and prevent initiation.

For projects that contain a large number of runs, you can use either the search bar to filter runs you want to delete using Regex or the filter button to filter runs based on their status, tags, or other properties.

Organize runs

This section provides instructions on how to organize runs using groups and job types. By assigning runs to groups (for example, experiment names) and specifying job types (for example, preprocessing, training, evaluation, debugging), you can streamline your workflow and improve model comparison.

Assign a group or job type to a run

Each run in W&B can be categorized by a group and a job type:

Group: a broad category for the experiment, used to organize and filter runs.
Job type: the function of the run, such as preprocessing, training, or evaluation.
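
As a minimal sketch of how these two fields might be set from code, the helper below builds the `group` and `job_type` keyword arguments for `wandb.init()`. The helper name `run_labels` and the `ALLOWED_JOB_TYPES` set are illustrative assumptions and are not part of the W&B SDK. The SDK itself does not restrict job types to this list.

```python
# Hypothetical helper, not part of the W&B SDK.
# The allowed job types are a local convention chosen for this sketch.
ALLOWED_JOB_TYPES = {"preprocessing", "training", "evaluation", "debugging"}


def run_labels(group: str, job_type: str) -> dict[str, str]:
    """Return keyword arguments that assign a group and a job type to a run."""
    if job_type not in ALLOWED_JOB_TYPES:
        raise ValueError(f"unexpected job type: {job_type!r}")
    return {"group": group, "job_type": job_type}


labels = run_labels("fashion-mnist-baseline", "training")
print(labels)

# In a training script you might then call:
#   run = wandb.init(project="<project-name>", **labels)
```

Checking job types against a fixed set keeps spelling consistent across scripts, which makes filtering by job type more reliable.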

The following example workspace trains a baseline model using increasing amounts of data from the Fashion-MNIST dataset. The workspace uses colors to represent the amount of data used:

Yellow to dark green indicate increasing amounts of data for the baseline model.
Light blue to violet to magenta indicate amounts of data for a more complex “double” model with additional parameters.

Use W&B’s filtering options and search bar to compare runs based on specific conditions, such as:

Training on the same dataset.
Evaluating on the same test set.

When you apply filters, the Table view is updated automatically. This allows you to identify performance differences between models, such as determining which classes are significantly more challenging for one model compared to another.

2.1.5.1 - Add labels to runs with tags

Add tags to label runs with particular features that might not be obvious from the logged metrics or artifact data.

For example, you can add a tag to a run to indicate that the run’s model is in_production, that the run is preemptible, that the run represents the baseline, and so forth.

Add tags to one or more runs

Programmatically or interactively add tags to your runs.

Based on your use case, select the tab below that best fits your needs:

You can add tags to a run when it is created:

import wandb

run = wandb.init(
  entity="entity",
  project="<project-name>",
  tags=["tag1", "tag2"]
)

You can also update the tags after you initialize a run. For example, the following code snippet shows how to update a tag if a particular metric crosses a predefined threshold:

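import wandb

run = wandb.init(
    entity="entity",
    project="capsules",
    tags=["debug"],
)

# Placeholder values: replace these with your own training logic and threshold.
threshold = 0.1
current_loss = 0.05

# Add a tag once the loss drops below the threshold.
if current_loss < threshold:
    run.tags = run.tags + ("release_candidate",)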
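
If the same condition can fire more than once, for example inside a training loop, the concatenation above would add the tag repeatedly. The following sketch shows one way to guard against that. The helper name `with_tag` is hypothetical, and the sketch assumes that `run.tags` is a tuple (or `None` when no tags are set).

```python
from typing import Optional


def with_tag(tags: Optional[tuple[str, ...]], new_tag: str) -> tuple[str, ...]:
    """Return the tags plus `new_tag`, unchanged if the tag is already present."""
    current = tuple(tags or ())
    return current if new_tag in current else current + (new_tag,)


tags = ("debug",)
tags = with_tag(tags, "release_candidate")
tags = with_tag(tags, "release_candidate")  # The second call changes nothing
print(tags)

# With a live run you might write:
#   run.tags = with_tag(run.tags, "release_candidate")
```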

After you create a run, you can update tags using the Public API. For example:

run = wandb.Api().run("{entity}/{project}/{run-id}")
run.tags.append("tag1")  # you can choose tags based on run data here
run.update()
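
To apply the same pattern to many runs, you could loop over the runs in a project. The sketch below is one possible approach. The function name `add_tag_to_runs` is hypothetical, and it only assumes that each run object exposes a `tags` list and an `update()` method, as in the snippet above.

```python
from typing import Iterable


def add_tag_to_runs(runs: Iterable, tag: str) -> int:
    """Append `tag` to every run that lacks it and save each change.

    Returns the number of runs that were modified.
    """
    changed = 0
    for run in runs:
        if tag not in run.tags:
            run.tags.append(tag)
            run.update()  # Persist the new tag list
            changed += 1
    return changed


# Usage sketch (requires network access and a valid API key):
#   import wandb
#   runs = wandb.Api().runs("<entity>/<project>")
#   add_tag_to_runs(runs, "baseline")
```

Skipping runs that already carry the tag avoids unnecessary update calls.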

This method is best suited to tagging large numbers of runs with the same tag or tags.

Navigate to your project workspace.
Select Runs from the project sidebar.
Select one or more runs from the table.
Once you select one or more runs, select the Tag button above the table.
Type the tag you want to add and select the Create new tag checkbox to add the tag.

This method is best suited to applying a tag or tags to a single run manually.

Navigate to your project workspace.
Select a run from the list of runs within your project’s workspace.
Select Overview from the project sidebar.
Select the gray plus icon (+) button next to Tags.
Type a tag you want to add and select Add below the text box to add a new tag.

Remove tags from one or more runs

Tags can also be removed from runs with the W&B App UI.

This method is best suited to removing tags from a large number of runs.

In the Run sidebar of the project, select the table icon in the upper-right. This will expand the sidebar into the full runs table.
Hover over a run in the table to see a checkbox on the left or look in the header row for a checkbox to select all runs.
Select the checkbox to enable bulk actions.
Select the runs from which you want to remove tags.
Select the Tag button above the rows of runs.
Select the checkbox next to a tag to remove it from the run.

In the left sidebar of the Run page, select the top Overview tab. The tags on the run are visible here.
Hover over a tag and select the “x” to remove it from the run.

2.1.5.2 - Create and manage multiple runs in a single process

Manage multiple runs in a single Python process using W&B’s reinit functionality

Manage multiple runs in a single Python process. This is useful for workflows where you want to keep a primary process active while creating short-lived secondary processes for sub-tasks. Some use cases include:

Keeping a single “primary” run active throughout a script while spinning up short-lived “secondary” runs for evaluations or sub-tasks.
Orchestrating sub-experiments in a single file.
Logging from one “main” process to several runs that represent different tasks or time periods.

By default, W&B assumes each Python process has only one active run at a time when you call wandb.init(). If you call wandb.init() again, W&B will either return the same run or finish the old run before starting a new one, depending on the configuration. The content in this guide explains how to use reinit to modify the wandb.init() behavior to enable multiple runs in a single Python process.

Requirements

To manage multiple runs in a single Python process, you must have W&B Python SDK version v0.19.10 or newer.

`reinit` options

Use the reinit parameter to configure how W&B handles multiple calls to wandb.init(). The following table describes valid arguments and their effects:

	Description	Creates a run?	Example use case
`create_new`	Create a new run with `wandb.init()` without finishing existing, active runs. W&B does not automatically switch the global `wandb.Run` to new runs. You must hold onto each run object yourself. See the multiple runs in one process example below for details.	Yes	Ideal for creating and managing concurrent processes. For example, a “primary” run that remains active while you start or end “secondary” runs.
`finish_previous`	Finish all active runs with `run.finish()` before creating a new run with `wandb.init()`. Default behavior for non-notebook environments.	Yes	Ideal when you want to break sequential sub-processes into separate individual runs.
`return_previous`	Return the most recent, unfinished run. Default behavior for notebook environments.	No

W&B does not support create_new mode for W&B Integrations that assume a single global run, such as Hugging Face Trainer, Keras callbacks, and PyTorch Lightning. If you use these integrations, you should run each sub-experiment in a separate process.
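
To make the differences between the three modes concrete, here is a toy model of the table above. It is not the W&B implementation: runs are plain integers, and the class name `ToyRunManager` is invented for illustration. The behavior of `return_previous` when no run is active (create a new run) is an assumption of this sketch.

```python
class ToyRunManager:
    """Tracks which toy runs are active or finished in one process."""

    def __init__(self) -> None:
        self.active: list[int] = []    # Unfinished run IDs, oldest first
        self.finished: list[int] = []  # Run IDs that have been finished
        self._next_id = 1

    def _create(self) -> int:
        run_id = self._next_id
        self._next_id += 1
        self.active.append(run_id)
        return run_id

    def init(self, reinit: str) -> int:
        if reinit == "create_new":
            # Start a new run and leave existing runs active.
            return self._create()
        if reinit == "finish_previous":
            # Finish every active run, then start a new one.
            self.finished.extend(self.active)
            self.active.clear()
            return self._create()
        if reinit == "return_previous":
            # Return the most recent unfinished run.
            # Assumption: with no active run, a new run is created.
            return self.active[-1] if self.active else self._create()
        raise ValueError(f"unknown reinit mode: {reinit!r}")


manager = ToyRunManager()
primary = manager.init("create_new")
secondary = manager.init("create_new")
same = manager.init("return_previous")
fresh = manager.init("finish_previous")
print(primary, secondary, same, fresh, manager.active, manager.finished)
```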

Specifying `reinit`

Use wandb.init() with the reinit argument directly:

import wandb
wandb.init(reinit="<create_new|finish_previous|return_previous>")

Use wandb.init() and pass a wandb.Settings object to the settings parameter. Specify reinit in the Settings object:

import wandb
wandb.init(settings=wandb.Settings(reinit="<create_new|finish_previous|return_previous>"))

Use wandb.setup() to set the reinit option globally for all runs in the current process. This is useful if you want to configure the behavior once and have it apply to all subsequent wandb.init() calls in that process.
```
import wandb
wandb.setup(wandb.Settings(reinit="<create_new|finish_previous|return_previous>"))
```
Specify the desired value for reinit in the environment variable WANDB_REINIT. Defining an environment variable applies the reinit option to wandb.init() calls.
```
export WANDB_REINIT="<create_new|finish_previous|return_previous>"
```

The following code snippet shows a high-level overview of how to set up W&B to create a new run each time you call wandb.init():

import wandb

wandb.setup(wandb.Settings(reinit="create_new"))

with wandb.init() as experiment_results_run:
    # This run logs the results of each experiment.
    # You can think of it as a parent run that collects results.
    with wandb.init() as run:
        # do_experiment() is a placeholder for your own function. It logs
        # fine-grained metrics to the given run and returns the result
        # metrics that you want to track separately.
        experiment_results = do_experiment(run)

        # After each experiment, log its results to the parent
        # run. Each point in the parent run's charts corresponds
        # to one experiment's results.
        experiment_results_run.log(experiment_results)

Example: Concurrent processes

Suppose you want to create a primary process that remains open for the script’s entire lifespan, while periodically spawning short-lived secondary processes without finishing the primary process. For example, this pattern can be useful if you want to train a model in the primary run, but compute evaluations or do other work in separate runs.

To achieve this, use reinit="create_new" and initialize multiple runs. For this example, suppose “Run A” is the primary process that remains open throughout the script, while “Run B1” and “Run B2” are short-lived secondary runs for tasks like evaluation.

The high-level workflow might look like this:

Initialize the primary process Run A with wandb.init() and log training metrics.
Initialize Run B1 (with wandb.init()), log data, then finish it.
Log more data to Run A.
Initialize Run B2, log data, then finish it.
Continue logging to Run A.
Finally finish Run A at the end.

The following Python code example demonstrates this workflow:

import wandb

def train(name: str) -> None:
    """Perform one training iteration in its own W&B run.

    Using a 'with wandb.init()' block with `reinit="create_new"` ensures that
    this training sub-run can be created even if another run (like our primary
    tracking run) is already active.
    """
    with wandb.init(
        project="my_project",
        name=name,
        reinit="create_new"
    ) as run:
        # In a real script, you'd run your training steps inside this block.
        run.log({"train_loss": 0.42})  # Replace with your real metric(s)

def evaluate_loss_accuracy() -> tuple[float, float]:
    """Returns the current model's loss and accuracy.
    
    Replace this placeholder with your real evaluation logic.
    """
    return 0.27, 0.91  # Example metric values

# Create a 'primary' run that remains active throughout multiple train/eval steps.
with wandb.init(
    project="my_project",
    name="tracking_run",
    reinit="create_new"
) as tracking_run:
    # 1) Train once under a sub-run named 'training_1'
    train("training_1")
    loss, accuracy = evaluate_loss_accuracy()
    tracking_run.log({"eval_loss": loss, "eval_accuracy": accuracy})

    # 2) Train again under a sub-run named 'training_2'
    train("training_2")
    loss, accuracy = evaluate_loss_accuracy()
    tracking_run.log({"eval_loss": loss, "eval_accuracy": accuracy})
    
    # The 'tracking_run' finishes automatically when this 'with' block ends.

Note the following key points from the previous example:

reinit="create_new" creates a new run each time you call wandb.init().
Keep a reference to each run. wandb.run does not automatically point to the new run created with reinit="create_new". Store new runs in variables like run_a, run_b1, etc., and call .log() or .finish() on those objects as needed.
You can finish sub-runs whenever you want while keeping the primary run open.
Finish your runs with run.finish() when you are done logging to them. This ensures that all data is uploaded and the run is properly closed.

2.1.5.3 - Customize run colors

W&B automatically assigns a color to each run that you create in your project. You can change the default color of a run to help you visually distinguish it from other runs in the table and graphs. Reset your project workspace to restore the default colors for all runs in the table.

Run colors are locally scoped. On the project page, custom colors apply only to your own workspace. In reports, custom colors for runs apply only at the section level. You can visualize the same run in different sections, which can use different custom colors per section.

Edit default run colors

Click the Runs tab from the project sidebar.
Click the colored dot next to the run name in the Name column.
Select a color from the color palette or the color picker, or enter a hex code.

Edit default run color in project workspace

Randomize run colors

To randomize the colors of all runs in the table:

Click the Runs tab from the project sidebar.
Hover over the Name column header, click the three horizontal dots (…), and select Randomize run colors from the dropdown menu.

The option to randomize run colors is available only after you modify the runs table in some way, such as by sorting, filtering, searching, or grouping.

Reset run colors

To restore the default colors for all runs in the table:

Click the Runs tab from the project sidebar.
Hover over the Name column header, click the three horizontal dots (…), and select Reset colors from the dropdown menu.

2.1.5.4 - Filter and search runs

How to use the sidebar and table on the project page

Use your project page to gain insights from runs logged to W&B. You can filter and search runs from both the Workspace page and the Runs page.

Filter runs

Filter runs based on their status, tags, regular expressions (RegEx) or other properties with the filter button.

See Customize run colors for more information on how to edit, randomize, and reset run colors.

Filter runs with tags

Filter runs based on their tags with the filter button.

Click on the Runs tab from the project sidebar.
Select the Filter button, which looks like a funnel, at the top of the runs table.
From left to right, select "Tags" from the dropdown menu, select a logic operator, and select a filter search value.

Filter runs with regex

If regex doesn’t give you the desired results, you can use tags to filter runs in the runs table. You can add tags when you create a run or after the run finishes. Once a run has tags, you can add a tag filter as described in Filter runs with tags.

Click on the Runs tab from the project sidebar.
Click on the search box at the top of the runs table.
Ensure that the RegEx toggle (.*) is enabled (the toggle should be blue).
Enter your regular expression in the search box.
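
Before typing a pattern into the search box, it can help to try it against a few run names locally. The sketch below uses Python's `re` module for that. The run names are made up, the helper name `matching_runs` is hypothetical, and the regex engine behind the W&B App may differ from Python's in some details, so treat the result as an approximation.

```python
import re


def matching_runs(pattern: str, run_names: list[str]) -> list[str]:
    """Return the run names that contain a match for `pattern`."""
    compiled = re.compile(pattern)
    return [name for name in run_names if compiled.search(name)]


# Example run names, invented for this sketch.
run_names = ["baseline-lr0.01", "baseline-lr0.1", "double-lr0.01", "debug-run"]

# Match baseline or double runs trained with a learning rate of exactly 0.01.
print(matching_runs(r"^(baseline|double)-lr0\.01$", run_names))
```

Escaping the dot (`\.`) matters here: an unescaped dot matches any character, which could match more run names than intended.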

Search runs

Use regular expressions (RegEx) to find runs that match the pattern you specify. When you type a query in the search box, W&B filters both the runs visible in the workspace graphs and the rows of the table.

Group runs

To group runs by one or more columns (including hidden columns):

Below the search box, click the Group button, which looks like a lined sheet of paper.
Select one or more columns to group results by.
Each set of grouped runs is collapsed by default. To expand it, click the arrow next to the group name.

Sort runs by minimum and maximum values

Sort the runs table by the minimum or maximum value of a logged metric. This is particularly useful if you want to view the best (or worst) recorded value.

The following steps describe how to sort the run table by a specific metric based on the minimum or maximum recorded value:

Hover your mouse over the column with the metric you want to sort with.
Select the kebab menu (three vertical dots).
From the dropdown, select either Show min or Show max.
From the same dropdown, select Sort by asc or Sort by desc to sort in ascending or descending order, respectively.
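
The effect of these steps can be illustrated with plain Python. The sketch below is only an analogy for what the table shows, not how the W&B App computes it. The metric histories and the helper name `sort_runs_by` are invented for illustration.

```python
from typing import Callable

# Logged values of one metric for each run, invented for this sketch.
histories = {
    "run-a": [0.9, 0.4, 0.5],
    "run-b": [0.8, 0.6, 0.3],
    "run-c": [0.7, 0.7, 0.7],
}


def sort_runs_by(
    histories: dict[str, list[float]],
    agg: Callable[[list[float]], float] = min,
    descending: bool = False,
) -> list[str]:
    """Order run names by the min (or max) value each run recorded."""
    return sorted(histories, key=lambda name: agg(histories[name]), reverse=descending)


# "Show min" followed by "Sort by asc": the run with the lowest minimum comes first.
print(sort_runs_by(histories, agg=min))

# "Show max" followed by "Sort by desc": the run with the highest maximum comes first.
print(sort_runs_by(histories, agg=max, descending=True))
```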

Search End Time for runs

W&B provides a column named End Time that logs the last heartbeat from the client process. The field is hidden by default.

Export runs table to CSV

Export the table of all your runs, hyperparameters, and summary metrics to a CSV with the download button.

2.1.5.5 - Fork a run

Forking a W&B run

The ability to fork a run is in private preview. Contact W&B Support at support@wandb.com to request access to this feature.

Use fork_from when you initialize a run with wandb.init() to “fork” from an existing W&B run. When you fork from a run, W&B creates a new run using the run ID and step of the source run.

Forking a run enables you to explore different parameters or models from a specific point in an experiment without impacting the original run.

Forking a run requires wandb SDK version >= 0.16.5.
Forking a run requires monotonically increasing steps. You cannot use non-monotonic steps defined with define_metric() to set a fork point because it would disrupt the essential chronological order of run history and system metrics.

Start a forked run

To fork a run, use the fork_from argument in wandb.init() and specify the source run ID and the step from the source run to fork from:

import wandb

# Initialize a run to be forked later
original_run = wandb.init(project="your_project_name", entity="your_entity_name")
# ... perform training or logging ...
original_run.finish()

# Fork the run from a specific step
forked_run = wandb.init(
    project="your_project_name",
    entity="your_entity_name",
    fork_from=f"{original_run.id}?_step=200",
)
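
The value passed to fork_from in the example above has the form `<run ID>?_step=<step>`. As a small sketch, a helper like the following could build that string and reject obviously invalid input. The function name `fork_point` is hypothetical and is not part of the W&B SDK.

```python
def fork_point(run_id: str, step: int) -> str:
    """Build a fork_from value in the '<run ID>?_step=<step>' form shown above."""
    if not run_id:
        raise ValueError("run_id must not be empty")
    if step < 0:
        raise ValueError("step must be zero or greater")
    return f"{run_id}?_step={step}"


print(fork_point("abc123", 200))

# In a script you might then write:
#   forked_run = wandb.init(project="<project>", fork_from=fork_point(original_run.id, 200))
```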

Using an immutable run ID

Use an immutable run ID to ensure you have a consistent and unchanging reference to a specific run. Follow these steps to obtain the immutable run ID from the user interface:

Access the Overview Tab: Navigate to the Overview tab on the source run’s page.
Copy the Immutable Run ID: Click on the ... menu (three dots) located in the top-right corner of the Overview tab. Select the Copy Immutable Run ID option from the dropdown menu.

By following these steps, you will have a stable and unchanging reference to the run, which can be used for forking a run.

Continue from a forked run

After initializing a forked run, you can continue logging to the new run. You can log the same metrics for continuity and introduce new metrics.

For example, the following code example shows how to first fork a run and then how to log metrics to the forked run starting from a training step of 200:

import wandb
import math

# Initialize the first run and log some metrics
run1 = wandb.init(project="your_project_name", entity="your_entity_name")
for i in range(300):
    run1.log({"metric": i})
run1.finish()

# Fork from the first run at a specific step and log the metric starting from step 200
run2 = wandb.init(
    project="your_project_name",
    entity="your_entity_name",
    fork_from=f"{run1.id}?_step=200",
)

# Continue logging in the new run
# For the first few steps, log the metric as is from run1
# After step 250, start logging the spikey pattern
for i in range(200, 300):
    if i < 250:
        run2.log({"metric": i})  # Continue logging from run1 without spikes
    else:
        # Introduce the spikey behavior starting from step 250
        subtle_spike = i + (2 * math.sin(i / 3.0))  # Apply a subtle spikey pattern
        run2.log({"metric": subtle_spike})
    # Additionally log the new metric at all steps
    run2.log({"additional_metric": i * 1.1})
run2.finish()

Rewind and forking compatibility

Forking complements a rewind by providing more flexibility in managing and experimenting with your runs.

When you fork from a run, W&B creates a new branch off a run at a specific point to try different parameters or models.

When you rewind a run, W&B lets you correct or modify the run history itself.

2.1.5.6 - Group runs into experiments

Group training and evaluation runs into larger experiments

Group individual jobs into experiments by passing a unique group name to wandb.init().

Use cases

Distributed training: Use grouping if your experiments are split up into different pieces with separate training and evaluation scripts that should be viewed as parts of a larger whole.
Multiple processes: Group multiple smaller processes together into an experiment.
K-fold cross-validation: Group together runs with different random seeds to see a larger experiment. Here’s an example of k-fold cross-validation with sweeps and grouping.

There are several ways to set grouping:

1. Set group in your script

Pass an optional group and job_type to wandb.init(). This gives you a dedicated group page for each experiment, which contains the individual runs. For example: wandb.init(group="experiment_1", job_type="eval")

2. Set a group environment variable

Use the WANDB_RUN_GROUP environment variable to specify a group for your runs. For more on this, check our docs for Environment Variables. The group name should be unique within your project and shared by all runs in the group. You can use wandb.util.generate_id() to generate a unique 8-character string to use in all your processes. For example: os.environ["WANDB_RUN_GROUP"] = "experiment-" + wandb.util.generate_id()
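
As a sketch of how a launcher script might share one group name across several processes, the helper below sets WANDB_RUN_GROUP only if it is not already set, so child processes started afterwards inherit the same value. The function name `set_run_group` is hypothetical. To keep the sketch runnable without wandb installed, it uses `secrets.token_hex(4)` as a stand-in for wandb.util.generate_id().

```python
import os
import secrets


def set_run_group(prefix: str = "experiment") -> str:
    """Set WANDB_RUN_GROUP once per process tree and return the group name."""
    group = os.environ.get("WANDB_RUN_GROUP")
    if group is None:
        # token_hex(4) yields 8 hexadecimal characters. It stands in for
        # wandb.util.generate_id() in this sketch.
        group = f"{prefix}-{secrets.token_hex(4)}"
        os.environ["WANDB_RUN_GROUP"] = group
    return group


group = set_run_group()
print(group)

# Processes launched from here (for example with subprocess.run) inherit
# WANDB_RUN_GROUP, so their runs land in the same group.
```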

3. Set a group in the UI

After a run is initialized, you can move it to a new group from your workspace or its Runs page.

Navigate to your W&B project.
Select the Workspace or Runs tab from the project sidebar.
Search or scroll to the run you want to move.

Hover over the run name, click the three vertical dots, then click Move to another group.
To create a new group, click New group. Type a group name, then submit the form.
Select the run’s new group from the list, then click Move.

4. Toggle grouping by columns in the UI

You can dynamically group by any column, including a column that is hidden. For example, if you use wandb.Run.config to log batch size or learning rate, you can then group by those hyperparameters dynamically in the web app. The Group by feature is distinct from a run’s run group. You can group runs by run group. To move a run to a different run group, refer to Set a group in the UI.

In the list of runs, the Group column is hidden by default.

To group runs by one or more columns:

Click Group.
Click the names of one or more columns.
If you selected more than one column, drag them to change the grouping order.
Click anywhere outside of the form to dismiss it.

Customize how runs are displayed

You can customize how runs are displayed in your project from the Workspace or Runs tabs. Both tabs use the same display configuration.

To customize which columns are visible:

Above the list of runs, click Columns.
Click the name of a hidden column to show it. Click the name of a visible column to hide it.

You can optionally search by column name using fuzzy search, an exact match, or regular expressions. Drag columns to change their order.
Click Done to close the column browser.

To sort the list of runs by any visible column:

Hover over the column name, then click its action ... menu.
Click Sort ascending or Sort descending.

Pinned columns are shown on the right-hand side. To pin or unpin a column:

Hover over the column name, then click its action ... menu.
Click Pin column or Unpin column.

By default, long run names are truncated in the middle for readability. To customize the truncation of run names:

Click the action ... menu at the top of the list of runs.
Set Run name cropping to crop the end, middle, or beginning.

Distributed training with grouping

If you set a group in wandb.init(), W&B groups runs by default in the UI. You can toggle this on and off by clicking the Group button at the top of the table. Here’s an example project generated from sample code where we set grouping. You can click on each “Group” row in the sidebar to get to a dedicated group page for that experiment.

From the project page above, you can click a Group in the left sidebar to get to a dedicated page like this one:

Grouping dynamically in the UI

You can group runs by any column, for example by hyperparameter. Here’s an example of what that looks like:

Sidebar: Runs are grouped by the number of epochs.
Graphs: Each line represents the group’s mean, and the shading indicates the variance. This behavior can be changed in the graph settings.

Turn off grouping

Click the grouping button and clear group fields at any time, which returns the table and graphs to their ungrouped state.

Grouping graph settings

Click the edit button in the upper right corner of a graph and select the Advanced tab to change the line and shading. You can select the mean, minimum, or maximum value for the line in each group. For the shading, you can turn it off or show the min and max, the standard deviation, or the standard error.
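
To make these options concrete, the following sketch computes the same kinds of aggregates for a small group of runs at each step. The data is invented, the helper name `aggregate` is hypothetical, and the use of the sample standard deviation is an assumption of this sketch. The W&B App may compute these values differently.

```python
import math
import statistics

# Metric values for three runs in one group, indexed by step (invented data).
group = {
    "run-1": [1.0, 2.0, 3.0],
    "run-2": [2.0, 4.0, 6.0],
    "run-3": [3.0, 6.0, 9.0],
}


def aggregate(group: dict[str, list[float]]) -> list[dict[str, float]]:
    """Compute per-step line and shading statistics across the runs in a group."""
    rows = []
    for values in zip(*group.values()):  # One tuple of values per step
        std = statistics.stdev(values)   # Sample standard deviation (assumption)
        rows.append(
            {
                "mean": statistics.mean(values),
                "min": min(values),
                "max": max(values),
                "std": std,
                "stderr": std / math.sqrt(len(values)),
            }
        )
    return rows


for step, row in enumerate(aggregate(group)):
    print(step, row)
```

The standard error shrinks as more runs join the group, while the min and max band can only stay the same or widen.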

2.1.5.7 - Move runs

This page shows how to move a run from one project to another, into or out of a team, or from one team to another. You must have access to the run at its current and new locations.

When you move a run, historical artifacts associated with it are not moved. To move an artifact manually, you can use the wandb artifact get SDK command or the Api.artifact API to download the artifact, then use wandb artifact put or the Api.artifact API to upload it to the run’s new location.

To customize the Runs tab, refer to Project page.

If you group runs into experiments, refer to Set a group in the UI.

Move runs between your projects

To move runs from one project to another:

Navigate to the project that contains the runs you want to move.
Select the Runs tab from the project sidebar.
Select the checkbox next to the runs you want to move.
Choose the Move button above the table.
Select the destination project from the dropdown.

Move runs to a team

Move runs to a team you are a member of:

Navigate to the project that contains the runs you want to move.
Select the Runs tab from the project sidebar.
Select the checkbox next to the runs you want to move.
Choose the Move button above the table.
Select the destination team and project from the dropdown.

2.1.5.8 - Resume a run

Resume a paused or exited W&B Run

Specify how a run should behave in the event that the run stops or crashes. To resume or enable a run to automatically resume, you will need to specify the unique run ID associated with that run for the id parameter:

run = wandb.init(
    entity="<entity>", project="<project>", id="<run ID>", resume="<resume>"
)

W&B encourages you to provide the name of the W&B Project where you want to store the run.

Pass one of the following arguments to the resume parameter to determine how W&B should respond. In each case, W&B first checks if the run ID already exists.

Argument	Description	Run ID exists	Run ID does not exist	Use case
`"must"`	W&B must resume run specified by the run ID.	W&B resumes run with the same run ID.	W&B raises an error.	Resume a run that must use the same run ID.
`"allow"`	Allow W&B to resume run if run ID exists.	W&B resumes run with the same run ID.	W&B initializes a new run with specified run ID.	Resume a run without overriding an existing run.
`"never"`	Never allow W&B to resume a run specified by run ID.	W&B raises an error.	W&B initializes a new run with specified run ID.

You can also specify resume="auto" to let W&B automatically try to restart the run on your behalf. However, you will need to ensure that you restart your run from the same directory. See the Enable runs to automatically resume section for more information.
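
The table above can be read as a small decision function. The sketch below encodes it for the "must", "allow", and "never" arguments. The function name `resume_outcome` and the outcome labels are invented for illustration. It models the table only and does not call W&B.

```python
def resume_outcome(resume: str, run_id_exists: bool) -> str:
    """Return what the table says happens for a resume argument and run ID state."""
    outcomes = {
        ("must", True): "resume",
        ("must", False): "error",
        ("allow", True): "resume",
        ("allow", False): "new run",
        ("never", True): "error",
        ("never", False): "new run",
    }
    try:
        return outcomes[(resume, run_id_exists)]
    except KeyError:
        raise ValueError(f"unsupported resume argument: {resume!r}") from None


for mode in ("must", "allow", "never"):
    print(mode, resume_outcome(mode, True), resume_outcome(mode, False))
```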

For all the examples below, replace values enclosed within <> with your own.

Resume a run that must use the same run ID

If a run is stopped, crashes, or fails, you can resume it using the same run ID. To do so, initialize a run and specify the following:

Set the resume parameter to "must" (resume="must")
Provide the run ID of the run that stopped or crashed

The following code snippet shows how to accomplish this with the W&B Python SDK:

run = wandb.init(
    entity="<entity>", project="<project>", id="<run ID>", resume="must"
)

Unexpected results will occur if multiple processes use the same id concurrently.

For more information on how to manage multiple processes, see Log distributed training experiments.

Resume a run without overriding the existing run

Resume a run that stopped or crashed without overriding the existing run. This is especially helpful if your process doesn’t exit successfully. The next time you start W&B, W&B will start logging from the last step.

Set the resume parameter to "allow" (resume="allow") when you initialize a run with W&B. Provide the run ID of the run that stopped or crashed. The following code snippet shows how to accomplish this with the W&B Python SDK:

import wandb

run = wandb.init(
    entity="<entity>", project="<project>", id="<run ID>", resume="allow"
)

Enable runs to automatically resume

The following code snippet shows how to enable runs to automatically resume with the Python SDK or with environment variables.

The following code snippet shows how to specify a W&B run ID with the Python SDK.

Replace values enclosed within <> with your own:

run = wandb.init(
    entity="<entity>", project="<project>", id="<run ID>", resume="<resume>"
)

The following example shows how to specify the W&B WANDB_RUN_ID variable in a bash script:

RUN_ID="$1"

WANDB_RESUME=allow WANDB_RUN_ID="$RUN_ID" python eval.py

Within your terminal, you could run the shell script along with the W&B run ID. The following code snippet passes the run ID akj172:

sh run_experiment.sh akj172

Automatic resuming only works if the process is restarted on top of the same filesystem as the failed process.

For example, suppose you execute a Python script called train.py in a directory called Users/AwesomeEmployee/Desktop/ImageClassify/training/. Within train.py, the script creates a run that enables automatic resuming. Suppose next that the training script is stopped. To resume this run, you would need to restart your train.py script within Users/AwesomeEmployee/Desktop/ImageClassify/training/.

If you cannot share a filesystem, specify the WANDB_RUN_ID environment variable or pass the run ID with the W&B Python SDK. See the Custom run IDs section in the “What are runs?” page for more information on run IDs.

Resume preemptible Sweeps runs

Automatically requeue interrupted sweep runs. This is particularly useful if you run a sweep agent in a compute environment that is subject to preemption such as a SLURM job in a preemptible queue, an EC2 spot instance, or a Google Cloud preemptible VM.

Use the mark_preempting function to automatically requeue interrupted sweep runs. For example:

run = wandb.init()  # Initialize a run
run.mark_preempting()

The following table outlines how W&B handles runs based on the exit status of a sweep run.

Status	Behavior
Status code 0	Run is considered to have terminated successfully and it will not be requeued.
Nonzero status	W&B automatically appends the run to a run queue associated with the sweep.
No status	Run is added to the sweep run queue. Sweep agents consume runs off the run queue until the queue is empty. Once the queue is empty, the sweep queue resumes generating new runs based on the sweep search algorithm.

2.1.5.9 - Rewind a run

Rewind

Rewind a run

The option to rewind a run is in private preview. Contact W&B Support at support@wandb.com to request access to this feature.

W&B currently does not support:

Log rewind: Logs are reset in the new run segment.
System metrics rewind: W&B logs only new system metrics after the rewind point.
Artifact association: W&B associates artifacts with the source run that produces them.

To rewind a run, you must have W&B Python SDK version >= 0.17.1.
You must use monotonically increasing steps. This does not work with non-monotonic steps defined with define_metric() because it disrupts the required chronological order of run history and system metrics.

Rewind a run to correct or modify the history of a run without losing the original data. In addition, when you rewind a run, you can log new data from that point in time. W&B recomputes the summary metrics for the run you rewind based on the newly logged history. This means the following behavior:

History truncation: W&B truncates the history to the rewind point, allowing new data logging.
Summary metrics: Recomputed based on the newly logged history.
Configuration preservation: W&B preserves the original configurations and you can merge new configurations.
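
The following toy sketch illustrates the history truncation and summary recomputation described above using plain Python lists. It is not the W&B implementation. The helper names `rewind` and `summarize` are invented, treating the rewind step as inclusive is an assumption, and the summary is modeled as the last logged value of each metric.

```python
def rewind(history: list[dict], step: int) -> list[dict]:
    """Return a copy of the history truncated at `step` (inclusive, by assumption)."""
    return [row for row in history if row["_step"] <= step]


def summarize(history: list[dict]) -> dict:
    """Model the summary as the most recent value logged for each metric."""
    summary = {}
    for row in history:
        for key, value in row.items():
            if key != "_step":
                summary[key] = value
    return summary


# Original history: 300 steps of a single metric.
original = [{"_step": i, "metric": float(i)} for i in range(300)]

# Rewind to step 200, then log new data from that point.
rewound = rewind(original, 200)
rewound.append({"_step": 201, "metric": 150.0})

print(len(original), len(rewound))
print(summarize(original), summarize(rewound))
```

Because `rewind` builds a new list, the original history stays intact, which mirrors the idea that the original data is preserved.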

When you rewind a run, W&B resets the state of the run to the specified step, preserving the original data and maintaining a consistent run ID. This means that:

Run archiving: W&B archives the original runs. Runs are accessible from the Run Overview tab.
Artifact association: W&B associates artifacts with the run that produces them.
Immutable run IDs: Introduced for consistent forking from a precise state.
Copy immutable run ID: A button to copy the immutable run ID for improved run management.

Rewind and forking compatibility

Forking complements a rewind.

When you fork from a run, W&B creates a new branch off a run at a specific point to try different parameters or models.

When you rewind a run, W&B lets you correct or modify the run history itself.

Rewind a run

Use resume_from with wandb.init() to “rewind” a run’s history to a specific step. Specify the ID of the run and the step you want to rewind from:

import wandb
import math

# Initialize the first run and log some metrics
# Replace with your_project_name and your_entity_name!
run1 = wandb.init(project="your_project_name", entity="your_entity_name")
for i in range(300):
    run1.log({"metric": i})
run1.finish()

# Rewind from the first run at a specific step and log the metric starting from step 200
run2 = wandb.init(project="your_project_name", entity="your_entity_name", resume_from=f"{run1.id}?_step=200")

# Continue logging in the new run
# For the first few steps, log the metric as is from run1
# After step 250, start logging the spikey pattern
for i in range(200, 300):
    if i < 250:
        run2.log({"metric": i, "step": i})  # Continue logging from run1 without spikes
    else:
        # Introduce the spikey behavior starting from step 250
        subtle_spike = i + (2 * math.sin(i / 3.0))  # Apply a subtle spikey pattern
        run2.log({"metric": subtle_spike, "step": i})
    # Additionally log the new metric at all steps
    run2.log({"additional_metric": i * 1.1, "step": i})
run2.finish()
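
The resume_from value in the preceding example combines a run ID and a step in the form `<run_id>?_step=<step>`. The following sketch keeps that format in one place. The helper name `step_reference` and the run ID `abc123` are hypothetical and used only for illustration.

```python
def step_reference(run_id: str, step: int) -> str:
    """Build the "<run_id>?_step=<step>" string passed to resume_from."""
    if step < 0:
        raise ValueError("step must be a non-negative integer")
    return f"{run_id}?_step={step}"


# Hypothetical run ID, for illustration only.
reference = step_reference("abc123", 200)
```

You can then pass the result to wandb.init(), for example `wandb.init(project="your_project_name", resume_from=reference)`.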

View an archived run

After you rewind a run, you can explore the archived run in the W&B App UI. Follow these steps to view archived runs:

Access the Overview Tab: Navigate to the Overview tab on the run’s page. This tab provides a comprehensive view of the run’s details and history.
Locate the Forked From field: Within the Overview tab, find the Forked From field. This field captures the history of the resumptions. The Forked From field includes a link to the source run, allowing you to trace back to the original run and understand the entire rewind history.

By using the Forked From field, you can effortlessly navigate the tree of archived resumptions and gain insights into the sequence and origin of each rewind.

Fork from a run that you rewind

To fork from a rewound run, use the fork_from argument in wandb.init() and specify the source run ID and the step from the source run to fork from:

import wandb

# Fork the run from a specific step
# rewind_run is the run object returned by wandb.init(resume_from=...)
forked_run = wandb.init(
    project="your_project_name",
    entity="your_entity_name",
    fork_from=f"{rewind_run.id}?_step=500",
)

# Continue logging in the new run
for i in range(500, 1000):
    forked_run.log({"metric": i*3})
forked_run.finish()

2.1.5.10 - Semantic run plot legends

Create semantic legends for charts

Create visually meaningful line plots and plot legends by color-coding your W&B runs based on metrics or configuration parameters. Identify patterns and trends across experiments by coloring runs according to their performance metrics (highest, lowest, or latest values). W&B automatically groups your runs into color-coded buckets based on the values of your selected parameter.

Navigate to your workspace’s settings page to configure metric or configuration-based colors for runs:

Navigate to your W&B project.
Select the Workspace tab from the project sidebar.
Click on the Settings icon (⚙️) in the top right corner.
From the drawer, select Runs then select Key-based colors.
- From the Key dropdown, select the metric you want to use for assigning colors to runs.
- From the Y value dropdown, select the y value you want to use for assigning colors to runs.
- Set the number of buckets to a value from 2 to 8.

The following sections describe how to set the metric and y value, as well as how to customize the buckets used for assigning colors to runs.

Set a metric

The metric options in your Key dropdown are derived from the key-value pairs you log to W&B and default metrics defined by W&B.

Default metrics

Relative Time (Process): The relative time of the run, measured in seconds since the start of the run.
Relative Time (Wall): The relative time of the run, measured in seconds since the start of the run, adjusted for wall clock time.
Wall Time: The wall clock time of the run, measured in seconds since the epoch.
Step: The step number of the run, which is typically used to track the progress of training or evaluation.

Custom metrics

Color runs and create meaningful plot legends based on custom metrics logged by your training or evaluation scripts. Custom metrics are logged as key-value pairs, where the key is the name of the metric and the value is the metric value.

For example, the following code snippet logs accuracy ("acc" key) and loss ("loss" key) during a training loop:

import wandb
import random

epochs = 10

with wandb.init(project="basic-intro") as run:
  # Block simulates a training loop logging metrics
  offset = random.random() / 5
  for epoch in range(2, epochs):
      acc = 1 - 2 ** -epoch - random.random() / epoch - offset
      loss = 2 ** -epoch + random.random() / epoch + offset

      # Log metrics from your script to W&B
      run.log({"acc": acc, "loss": loss})

Within the Key dropdown, both "acc" and "loss" are available options.

Set a configuration key

The configuration options in your Key dropdown are derived from the key-value pairs you pass to the config parameter when you initialize a W&B run. Configuration keys are typically used to log hyperparameters or other settings used in your training or evaluation scripts.

import wandb

config = {
  "learning_rate": 0.01,
  "batch_size": 32,
  "optimizer": "adam"
}

with wandb.init(project="basic-intro", config=config) as run:
  # Your training code here
  pass

Within the Key dropdown, "learning_rate", "batch_size", and "optimizer" are available options.

Set a y value

You can choose from the following options:

Latest: Determine color based on Y value at last logged step for each line.
Max: Color based on highest Y value logged against the metric.
Min: Color based on lowest Y value logged against the metric.
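
As a rough illustration of the three options, the following sketch reduces the values logged for one key to the single number that determines a run's color. It is a toy model rather than W&B's implementation, and the function name `y_value` and the sample values are hypothetical.

```python
def y_value(history: list[float], mode: str) -> float:
    """Reduce the values logged for one key to a single number."""
    if not history:
        raise ValueError("history must contain at least one logged value")
    if mode == "Latest":
        return history[-1]  # value at the last logged step
    if mode == "Max":
        return max(history)
    if mode == "Min":
        return min(history)
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical accuracy values logged by one run.
acc_history = [0.71, 0.83, 0.79]
```

With these values, Latest selects 0.79, Max selects 0.83, and Min selects 0.71.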

Customize buckets

Buckets are ranges of values that W&B uses to categorize runs based on the metric or configuration key you select. Buckets are evenly distributed across the range of values for the specified metric or configuration key and each bucket is assigned a unique color. Runs that fall within that bucket’s range are displayed in that color.

Consider the following:

Key is set to "Accuracy" (abbreviated as "acc").
Y value is set to "Max"

With this configuration, W&B colors each run based on its accuracy value. The colors vary from light yellow to a deeper color. Lighter colors represent lower accuracy values, while deeper colors represent higher accuracy values.

Six buckets are defined for the metric, with each bucket representing a range of accuracy values. Within the Buckets section, the following range of buckets are defined:

Bucket 1: (Min - 0.7629)
Bucket 2: (0.7629 - 0.7824)
Bucket 3: (0.7824 - 0.8019)
Bucket 4: (0.8019 - 0.8214)
Bucket 5: (0.8214 - 0.8409)
Bucket 6: (0.8409 - Max)

In the line plot below, the run with the highest accuracy (0.8232) is colored in a deep purple (Bucket 5), while the run with the lowest accuracy (0.7684) is colored in a light orange (Bucket 2). The other runs are colored based on their accuracy values, with the color gradient indicating their relative performance.
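
To see how a value maps to a bucket, the following sketch looks up the bucket number for the two accuracy values in this example. The boundaries are copied from the list above. The function name `bucket_number` is hypothetical, and the treatment of a value that falls exactly on a boundary is an assumption, not documented W&B behavior.

```python
from bisect import bisect_right

# Interior bucket boundaries copied from the example above.
EDGES = [0.7629, 0.7824, 0.8019, 0.8214, 0.8409]


def bucket_number(value: float, edges: list[float] = EDGES) -> int:
    """Return the 1-based bucket that contains value."""
    # bisect_right counts how many boundaries are <= value.
    return bisect_right(edges, value) + 1
```

Here `bucket_number(0.8232)` returns 5 and `bucket_number(0.7684)` returns 2, which matches the buckets named in the example.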

2.1.5.11 - Send an alert

Send alerts, triggered from your Python code, to your Slack or email

Create alerts with Slack or email if your run crashes or with a custom trigger. For example, you can create an alert if the gradient of your training loop starts to blow up (reports NaN) or a step in your ML pipeline completes. Alerts apply to all projects where you initialize runs, including both personal and team projects.

W&B Alerts messages then appear in Slack or in your email.

W&B Alerts require you to add run.alert() to your code. Without modifying your code, Automations provide another way to notify Slack based on an event in W&B, such as when an artifact version is created or when a run metric meets or changes by a threshold.

For example, an automation can notify a Slack channel when a new version is created, run an automated testing webhook when the production alias is added to an artifact, or start a validation job only when a run’s loss is within acceptable bounds.

Read the Automations overview or create an automation.

Create an alert

The following guide only applies to alerts in multi-tenant cloud.

If you’re using W&B Server in your Private Cloud or on W&B Dedicated Cloud, refer to Configure Slack alerts in W&B Server to set up Slack alerts.

To set up an alert, take these steps, which are detailed in the following sections:

Turn on Alerts in your W&B User Settings.
Add run.alert() to your code.
Test the configuration.

1. Turn on alerts in your W&B User Settings

In your User Settings:

Scroll to the Alerts section
Turn on Scriptable run alerts to receive alerts from run.alert()
Use Connect Slack to pick a Slack channel to post alerts. We recommend the Slackbot channel because it keeps the alerts private.
Email will go to the email address you used when you signed up for W&B. We recommend setting up a filter in your email so all these alerts go into a folder and don’t fill up your inbox.

You will only have to do this the first time you set up W&B Alerts, or when you’d like to modify how you receive alerts.

2. Add `run.alert()` to your code

Add run.alert() to your code (either in a notebook or Python script) wherever you want to trigger it.

import wandb

run = wandb.init()
run.alert(title="High Loss", text="Loss is increasing rapidly")

3. Test the configuration

Check your Slack or email for the alert message. If you didn’t receive one, make sure that email or Slack is turned on for Scriptable run alerts in your User Settings.

Example

This simple alert sends a warning when accuracy falls below a threshold. In this example, it only sends alerts at least 5 minutes apart.

import wandb
from wandb import AlertLevel

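run = wandb.init()

# Placeholder values; replace with the accuracy and threshold from your own code
acc = 0.75
threshold = 0.8

if acc < threshold: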
    run.alert(
        title="Low accuracy",
        text=f"Accuracy {acc} is below the acceptable threshold {threshold}",
        level=AlertLevel.WARN,
        wait_duration=300,
    )
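
The introduction to this page mentions alerting when training starts to report NaN values. The following is a minimal sketch of that case, assuming `run` is an active W&B run. The helper names `loss_diverged` and `alert_if_diverged` are hypothetical.

```python
import math


def loss_diverged(loss: float) -> bool:
    """Return True if the loss is NaN or infinite."""
    return math.isnan(loss) or math.isinf(loss)


def alert_if_diverged(run, loss: float, step: int) -> bool:
    """Send a W&B alert when the loss diverges. Return True if an alert was sent."""
    if not loss_diverged(loss):
        return False
    # Imported here so that loss_diverged() works without wandb installed.
    from wandb import AlertLevel

    run.alert(
        title="Loss diverged",
        text=f"Loss is {loss} at step {step}",
        level=AlertLevel.ERROR,
        wait_duration=300,  # seconds to wait before repeating this alert
    )
    return True
```

Call `alert_if_diverged(run, loss, step)` inside your training loop, after you compute the loss for the step.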

Tag or mention users

Use the at sign @ followed by the Slack user ID, wrapped in angle brackets (for example, <@U1234ABCD>), to tag yourself or your colleagues in either the title or the text of the alert. You can find a Slack user ID from their Slack profile page.

run.alert(title="Loss is NaN", text=f"Hey <@U1234ABCD> loss has gone to NaN")

Configure team alerts

Team admins can set up alerts for the team on the team settings page: wandb.ai/teams/your-team.

Team alerts apply to everyone on your team. W&B recommends using the Slackbot channel because it keeps alerts private.

Change Slack channel to send alerts to

To change what channel alerts are sent to, click Disconnect Slack and then reconnect. After you reconnect, pick a different Slack channel.

2.1.6 - Log objects and media

Keep track of metrics, videos, custom plots, and more

Log a dictionary of metrics, media, or custom objects to a step with the W&B Python SDK. W&B collects the key-value pairs during each step and stores them in one unified dictionary each time you log data with wandb.Run.log(). Data logged from your script is saved locally to your machine in a directory called wandb, then synced to the W&B cloud or your private server.

Key-value pairs are stored in one unified dictionary only if you pass the same value for each step. W&B writes all of the collected keys and values to memory if you log a different value for step.

Each call to wandb.Run.log() is a new step by default. W&B uses steps as the default x-axis when it creates charts and panels. You can optionally create and use a custom x-axis or capture a custom summary metric. For more information, see Customize log axes.

Use wandb.Run.log() to log consecutive values for each step: 0, 1, 2, and so on. It is not possible to write to a specific history step. W&B only writes to the “current” and “next” step.

Automatically logged data

W&B automatically logs the following information during a W&B Experiment:

System metrics: CPU and GPU utilization, network, etc. For the GPU, these are fetched with nvidia-smi.
Command line: The stdout and stderr are picked up and shown in the logs tab on the run page.

Turn on Code Saving in your account’s Settings page to log:

Git commit: Pick up the latest git commit and see it on the overview tab of the run page, as well as a diff.patch file if there are any uncommitted changes.
Dependencies: The requirements.txt file will be uploaded and shown on the files tab of the run page, along with any files you save to the wandb directory for the run.

What data is logged with specific W&B API calls?

With W&B, you can decide exactly what you want to log. The following lists some commonly logged objects:

Datasets: You have to specifically log images or other dataset samples for them to stream to W&B.
Plots: Use the methods in wandb.plot with wandb.Run.log() to track charts. See Log Plots for more information.
Tables: Use wandb.Table to log data to visualize and query with W&B. See Log Tables for more information.
PyTorch gradients: Add wandb.Run.watch(model) to see gradients of the weights as histograms in the UI.
Configuration information: Log hyperparameters, a link to your dataset, or the name of the architecture you’re using as config parameters, passed in like this: wandb.init(config=your_config_dictionary). See the PyTorch Integrations page for more information.
Metrics: Use wandb.Run.log() to see metrics from your model. If you log metrics like accuracy and loss from inside your training loop, you’ll get live updating graphs in the UI.

Common workflows

Compare the best accuracy: To compare the best value of a metric across runs, set the summary value for that metric. By default, summary is set to the last value you logged for each key. This is useful in the table in the UI, where you can sort and filter runs based on their summary metrics, to help compare runs in a table or bar chart based on their best accuracy, instead of final accuracy. For example: wandb.run.summary["best_accuracy"] = best_accuracy
View multiple metrics on one chart: Log multiple metrics in the same call to wandb.Run.log(), like this: wandb.Run.log({"acc": 0.9, "loss": 0.1}). Both metrics are then available to plot against in the UI.
Customize the x-axis: Add a custom x-axis to the same log call to visualize your metrics against a different axis in the W&B dashboard. For example: wandb.Run.log({'acc': 0.9, 'epoch': 3, 'batch': 117}). To set the default x-axis for a given metric, use Run.define_metric().
Log rich media and charts: wandb.Run.log() supports the logging of a wide variety of data types, from media like images and videos to tables and charts.
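
The "Compare the best accuracy" workflow can be sketched as follows, assuming `run` is an active W&B run. The helper names `best_so_far` and `log_with_best` are hypothetical.

```python
def best_so_far(values: list[float]) -> list[float]:
    """Return the running maximum of a sequence of metric values."""
    best = float("-inf")
    running = []
    for value in values:
        best = max(best, value)
        running.append(best)
    return running


def log_with_best(run, accuracies: list[float]) -> None:
    """Log each accuracy and keep the best one in the run summary."""
    for acc, best in zip(accuracies, best_so_far(accuracies)):
        run.log({"acc": acc})
        # Overrides the default summary, which keeps the last logged value.
        run.summary["best_accuracy"] = best
```

After the loop finishes, the summary holds the highest accuracy seen during the run rather than the final one.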

Best practices and tips

For best practices and tips for Experiments and logging, see Best Practices: Experiments and Logging.

2.1.6.1 - Create and track plots from experiments

Create and track plots from machine learning experiments.

Using the methods in wandb.plot, you can track charts with wandb.Run.log(), including charts that change over time during training. To learn more about our custom charting framework, check out the custom charts walkthrough.

Basic charts

These simple charts make it easy to construct basic visualizations of metrics and results.

Log a custom line plot—a list of connected and ordered points on arbitrary axes.

import wandb

with wandb.init() as run:
    data = [[x, y] for (x, y) in zip(x_values, y_values)]
    table = wandb.Table(data=data, columns=["x", "y"])
    run.log(
        {
            "my_custom_plot_id": wandb.plot.line(
                table, "x", "y", title="Custom Y vs X Line Plot"
            )
        }
    )

You can use this to log curves on any two dimensions. If you’re plotting two lists of values against each other, the number of values in the lists must match exactly. That is, each point must have an x and a y.
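
Python's built-in zip() stops at the shorter list without raising an error, so a length mismatch can go unnoticed. The following sketch checks the lengths before it builds the rows for wandb.Table. The helper name `paired_rows` is hypothetical.

```python
def paired_rows(x_values: list, y_values: list) -> list[list]:
    """Return [[x, y], ...] rows, rejecting lists of different lengths."""
    if len(x_values) != len(y_values):
        raise ValueError(
            f"x has {len(x_values)} values but y has {len(y_values)}"
        )
    return [[x, y] for x, y in zip(x_values, y_values)]
```

Pass the result as the data argument, for example `wandb.Table(data=paired_rows(x_values, y_values), columns=["x", "y"])`.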

Log a custom scatter plot—a list of points (x, y) on a pair of arbitrary axes x and y.

import wandb

with wandb.init() as run:
    data = [[x, y] for (x, y) in zip(class_x_scores, class_y_scores)]
    table = wandb.Table(data=data, columns=["class_x", "class_y"])
    run.log({"my_custom_id": wandb.plot.scatter(table, "class_x", "class_y")})

You can use this to log scatter points on any two dimensions. If you’re plotting two lists of values against each other, the number of values in the lists must match exactly. That is, each point must have an x and a y.

Log a custom bar chart—a list of labeled values as bars—natively in a few lines:

import wandb

with wandb.init() as run:
    data = [[label, val] for (label, val) in zip(labels, values)]
    table = wandb.Table(data=data, columns=["label", "value"])
    run.log(
        {
            "my_bar_chart_id": wandb.plot.bar(
                table, "label", "value", title="Custom Bar Chart"
            )
        }
    )

You can use this to log arbitrary bar charts. The number of labels and values in the lists must match exactly. Each data point must have both.

Log a custom histogram, which sorts a list of values into bins by count or frequency of occurrence, natively in a few lines. For example, given a list of prediction confidence scores (scores), visualize their distribution:

import wandb

with wandb.init() as run:
    data = [[s] for s in scores]
    table = wandb.Table(data=data, columns=["scores"])
    run.log({"my_histogram": wandb.plot.histogram(table, "scores", title="Histogram")})

You can use this to log arbitrary histograms. Note that data is a list of lists, intended to support a 2D array of rows and columns.

Plot multiple lines, or multiple different lists of x-y coordinate pairs, on one shared set of x-y axes:

import wandb

with wandb.init() as run:
    run.log(
        {
            "my_custom_id": wandb.plot.line_series(
                xs=[0, 1, 2, 3, 4],
                ys=[[10, 20, 30, 40, 50], [0.5, 11, 72, 3, 41]],
                keys=["metric Y", "metric Z"],
                title="Two Random Metrics",
                xname="x units",
            )
        }
    )

Note that the number of x and y points must match exactly. You can supply one list of x values to match multiple lists of y values, or a separate list of x values for each list of y values.

Model evaluation charts

These preset charts have built-in wandb.plot methods that make it quick and easy to log charts directly from your script and see the exact information you’re looking for in the UI.

Create a Precision-Recall curve in one line:

import wandb
with wandb.init() as run:
    # ground_truth is a list of true labels
    # predictions holds one row of per-class probability scores per example,
    # with shape (number of examples, number of classes)
    ground_truth = [0, 1, 1, 0]
    predictions = [[0.9, 0.1], [0.6, 0.4], [0.65, 0.35], [0.2, 0.8]]
    run.log({"pr": wandb.plot.pr_curve(ground_truth, predictions)})

You can log this whenever your code has access to:

a model’s predicted scores (predictions) on a set of examples
the corresponding ground truth labels (ground_truth) for those examples
(optionally) a list of the labels/class names (labels=["cat", "dog", "bird"...] if label index 0 means cat, 1 = dog, 2 = bird, etc.)
(optionally) a subset (still in list format) of the labels to visualize in the plot
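
If your binary classifier returns only a positive-class score per example, you can expand the scores into one column per class before you log the curve. The following sketch assumes that wandb.plot.pr_curve() expects scores with shape (number of examples, number of classes) and that `run` is an active W&B run. The helper names and the class names "negative" and "positive" are hypothetical.

```python
def binary_scores_to_probas(positive_scores: list[float]) -> list[list[float]]:
    """Expand positive-class scores into [P(class 0), P(class 1)] rows."""
    return [[1.0 - score, score] for score in positive_scores]


def log_pr_curve(run, ground_truth: list[int], positive_scores: list[float]) -> None:
    """Log a Precision-Recall curve for a binary classifier."""
    import wandb  # imported lazily; only this function needs the SDK

    probas = binary_scores_to_probas(positive_scores)
    run.log(
        {
            "pr": wandb.plot.pr_curve(
                ground_truth,
                probas,
                labels=["negative", "positive"],  # hypothetical class names
            )
        }
    )
```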

Create an ROC curve in one line:

import wandb

with wandb.init() as run:
    # ground_truth is a list of true labels
    # predictions holds one row of per-class probability scores per example,
    # with shape (number of examples, number of classes)
    ground_truth = [0, 1, 1, 0]
    predictions = [[0.9, 0.1], [0.6, 0.4], [0.65, 0.35], [0.2, 0.8]]
    run.log({"roc": wandb.plot.roc_curve(ground_truth, predictions)})

You can log this whenever your code has access to:

a model’s predicted scores (predictions) on a set of examples
the corresponding ground truth labels (ground_truth) for those examples
(optionally) a list of the labels/class names (labels=["cat", "dog", "bird"...] if label index 0 means cat, 1 = dog, 2 = bird, etc.)
(optionally) a subset (still in list format) of these labels to visualize on the plot

Create a multi-class confusion matrix in one line:

import wandb

cm = wandb.plot.confusion_matrix(
    y_true=ground_truth, preds=predictions, class_names=class_names
)

with wandb.init() as run:
    run.log({"conf_mat": cm})

You can log this wherever your code has access to:

a model’s predicted labels on a set of examples (preds) or the normalized probability scores (probs). The probabilities must have the shape (number of examples, number of classes). You can supply either probabilities or predictions but not both.
the corresponding ground truth labels for those examples (y_true)
a full list of the labels/class names as strings (class_names). For example, class_names=["cat", "dog", "bird"] if index 0 is cat, 1 is dog, and 2 is bird.
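
The two input forms are related: taking the index of the highest probability in each row of probs gives the predicted labels. The following sketch shows that conversion in plain Python. The helper name `probs_to_preds` and the sample probabilities are hypothetical.

```python
def probs_to_preds(probs: list[list[float]]) -> list[int]:
    """Return the index of the highest probability in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


# Hypothetical probabilities for two examples and three classes.
probs = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]]
preds = probs_to_preds(probs)
```

Pass either probs or preds to wandb.plot.confusion_matrix(), but not both.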

Interactive custom charts

For full customization, tweak a built-in Custom Chart preset or create a new preset, then save the chart. Use the chart ID to log data to that custom preset directly from your script.

import wandb
# Create a table with the columns to plot
table = wandb.Table(data=data, columns=["step", "height"])

# Map from the table's columns to the chart's fields
fields = {"x": "step", "value": "height"}

# Use the table to populate the new custom chart preset
# To use your own saved chart preset, change the vega_spec_name
# To edit the title, change the string_fields
my_custom_chart = wandb.plot_table(
    vega_spec_name="carey/new_chart",
    data_table=table,
    fields=fields,
    string_fields={"title": "Height Histogram"},
)

with wandb.init() as run:
    # Log the custom chart
    run.log({"my_custom_chart": my_custom_chart})

Matplotlib and Plotly plots

Instead of using W&B Custom Charts with wandb.plot, you can log charts generated with matplotlib and Plotly.

import wandb
import matplotlib.pyplot as plt

with wandb.init() as run:
    # Create a simple matplotlib plot
    plt.figure()
    plt.plot([1, 2, 3, 4])
    plt.ylabel("some interesting numbers")
    
    # Log the plot to W&B
    run.log({"chart": plt})

Just pass a matplotlib plot or figure object to wandb.Run.log(). By default we’ll convert the plot into a Plotly plot. If you’d rather log the plot as an image, you can pass the plot into wandb.Image. We also accept Plotly charts directly.

If you’re getting an error “You attempted to log an empty plot” then you can store the figure separately from the plot with fig = plt.figure() and then log fig in your call to wandb.Run.log().

Log custom HTML to W&B Tables

W&B supports logging interactive charts from Plotly and Bokeh as HTML and adding them to Tables.

Log Plotly figures to Tables as HTML

You can log interactive Plotly charts to wandb Tables by converting them to HTML.

import wandb
import plotly.express as px

# Initialize a new run
with wandb.init(project="log-plotly-fig-tables", name="plotly_html") as run:

    # Create a table
    table = wandb.Table(columns=["plotly_figure"])

    # Create path for Plotly figure
    path_to_plotly_html = "./plotly_figure.html"

    # Example Plotly figure
    fig = px.scatter(x=[0, 1, 2, 3, 4], y=[0, 1, 4, 9, 16])

    # Write Plotly figure to HTML
    # Setting auto_play to False prevents animated Plotly charts
    # from playing in the table automatically
    fig.write_html(path_to_plotly_html, auto_play=False)

    # Add Plotly figure as HTML file into Table
    table.add_data(wandb.Html(path_to_plotly_html))

    # Log Table
    run.log({"test_table": table})

Log Bokeh figures to Tables as HTML

You can log interactive Bokeh charts to wandb Tables by converting them to HTML.

from scipy.signal import spectrogram
import holoviews as hv
import panel as pn
from scipy.io import wavfile
import numpy as np
from bokeh.resources import INLINE

hv.extension("bokeh", logo=False)
import wandb


def save_audio_with_bokeh_plot_to_html(audio_path, html_file_name):
    sr, wav_data = wavfile.read(audio_path)
    duration = len(wav_data) / sr
    f, t, sxx = spectrogram(wav_data, sr)
    spec_gram = hv.Image((t, f, np.log10(sxx)), ["Time (s)", "Frequency (hz)"]).opts(
        width=500, height=150, labelled=[]
    )
    audio = pn.pane.Audio(wav_data, sample_rate=sr, name="Audio", throttle=500)
    slider = pn.widgets.FloatSlider(end=duration, visible=False)
    line = hv.VLine(0).opts(color="white")
    slider.jslink(audio, value="time", bidirectional=True)
    slider.jslink(line, value="glyph.location")
    combined = pn.Row(audio, spec_gram * line, slider).save(html_file_name)


html_file_name = "audio_with_plot.html"
audio_path = "hello.wav"
save_audio_with_bokeh_plot_to_html(audio_path, html_file_name)

wandb_html = wandb.Html(html_file_name)

with wandb.init(project="audio_test") as run:
    my_table = wandb.Table(columns=["audio_with_plot"], data=[[wandb_html]])
    run.log({"audio_table": my_table})

2.1.6.2 - Customize log axes

Set a custom x-axis when you log metrics to W&B. By default, W&B plots metrics against steps. Each step corresponds to a wandb.Run.log() API call.

For example, the following script has a for loop that iterates 10 times. In each iteration, the script logs a metric called validation_loss and increments the step number by 1.

import wandb

with wandb.init() as run:
  # range function creates a sequence of numbers from 0 to 9
  for i in range(10):
    log_dict = {
        "validation_loss": 1/(i+1)   
    }
    run.log(log_dict)

In the project’s workspace, the validation_loss metric is plotted against the step x-axis, which increments by 1 each time wandb.Run.log() is called. From the previous code, the x-axis shows the step numbers 0, 1, 2, …, 9.

Line plot panel that uses `step` as the x-axis.

In certain situations, it makes more sense to log metrics against a different x-axis such as a logarithmic x-axis. Use the define_metric() method to use any metric you log as a custom x-axis.

Specify the metric that you want to appear as the y-axis with the name parameter. The step_metric parameter specifies the metric you want to use as the x-axis. When you log a custom metric, specify a value for both the x-axis and the y-axis as key-value pairs in a dictionary.

Copy and paste the following code snippet to set a custom x-axis metric. Replace the values within <> with your own values:

import wandb

custom_step = "<custom_step>"  # Name of custom x-axis
metric_name = "<metric>"  # Name of y-axis metric

with wandb.init() as run:
    # Specify the step metric (x-axis) and the metric to log against it (y-axis)
    run.define_metric(step_metric = custom_step, name = metric_name)

    for i in range(10):
        log_dict = {
            custom_step: i,  # Value of x-axis; replace with your own value
            metric_name: 1 / (i + 1),  # Value of y-axis; replace with your own value
        }
        run.log(log_dict)

As an example, the following code snippet creates a custom x-axis called x_axis_squared. The value of the custom x-axis is the square of the for loop index i (i**2). The y-axis consists of mock values for validation loss ("validation_loss") using Python’s built-in random module:

import wandb
import random

with wandb.init() as run:
    run.define_metric(step_metric = "x_axis_squared", name = "validation_loss")

    for i in range(10):
        log_dict = {
            "x_axis_squared": i**2,
            "validation_loss": random.random(),
        }
        run.log(log_dict)

The following image shows the resulting plot in the W&B App UI. The validation_loss metric is plotted against the custom x-axis x_axis_squared, which is the square of the for loop index i. Note that the x-axis values are 0, 1, 4, 9, 16, 25, 36, 49, 64, 81, which correspond to the squares of 0, 1, 2, ..., 9 respectively.

Line plot panel that uses a custom x axis. Values are logged to W&B as the square of the loop number.

You can set a custom x-axis for multiple metrics using globs with string prefixes. As an example, the following code snippet plots logged metrics with the prefix train/* to the x-axis train/step:

import wandb

with wandb.init() as run:

    # set all other train/ metrics to use this step
    run.define_metric("train/*", step_metric="train/step")

    for i in range(10):
        log_dict = {
            "train/step": 2**i,  # exponential growth w/ internal W&B step
            "train/loss": 1 / (i + 1),  # x-axis is train/step
            "train/accuracy": 1 - (1 / (1 + i)),  # x-axis is train/step
            "val/loss": 1 / (1 + i),  # x-axis is internal wandb step
        }
        run.log(log_dict)
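
To reason about which x-axis each key in the preceding example uses, the following toy model maps metric names to x-axes with Python's fnmatch module. It only illustrates the prefix rule described above. W&B's own matching logic may differ, and the names `STEP_METRICS` and `x_axis_for` are hypothetical.

```python
from fnmatch import fnmatchcase

# Glob pattern -> custom x-axis, mirroring
# run.define_metric("train/*", step_metric="train/step").
STEP_METRICS = {"train/*": "train/step"}


def x_axis_for(metric: str, rules: dict[str, str] = STEP_METRICS) -> str:
    """Return the custom x-axis for a metric, or "step" for the default axis."""
    for pattern, step_metric in rules.items():
        if fnmatchcase(metric, pattern):
            return step_metric
    return "step"
```

In this model, train/loss and train/accuracy use train/step, while val/loss falls back to the default step.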

2.1.6.3 - Log distributed training experiments

Use W&B to log distributed training experiments with multiple GPUs.

During a distributed training experiment, you train a model using multiple machines or clients in parallel. W&B can help you track distributed training experiments. Based on your use case, track distributed training experiments using one of the following approaches:

Track a single process: Track a rank 0 process (also known as a “leader” or “coordinator”) with W&B. This is a common solution for logging distributed training experiments with the PyTorch Distributed Data Parallel (DDP) Class.
Track multiple processes: For multiple processes, you can either:
- Track each process separately using one run per process. You can optionally group them together in the W&B App UI.
- Track all processes to a single run.

Track a single process

This section describes how to track values and metrics available to your rank 0 process. Use this approach to track only metrics that are available from a single process. Typical metrics include GPU/CPU utilization, behavior on a shared validation set, gradients and parameters, and loss values on representative data examples.

Within the rank 0 process, initialize a W&B run with wandb.init() and log experiments (wandb.Run.log()) to that run.

The following sample Python script (log-ddp.py) demonstrates one way to track metrics on two GPUs on a single machine using PyTorch DDP. PyTorch DDP (DistributedDataParallel in torch.nn) is a popular library for distributed training. The basic principles apply to any distributed training setup, but the implementation may differ.

The Python script:

Starts multiple processes with torch.distributed.launch.
Checks the rank with the --local_rank command line argument.
If the rank is set to 0, sets up wandb logging conditionally in the train() function.

if __name__ == "__main__":
    # Get args
    args = parse_args()

    if args.local_rank == 0:  # only on main process
        # Initialize wandb run
        run = wandb.init(
            entity=args.entity,
            project=args.project,
        )
        # Train model with DDP
        train(args, run)
    else:
        train(args)
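
The preceding script reads the rank from the --local_rank command line argument. If you launch training with torchrun instead, the rank is available from environment variables such as RANK and LOCAL_RANK. The following sketch assumes a torchrun launch. The helper name `is_main_process` is hypothetical.

```python
import os


def is_main_process(env=os.environ) -> bool:
    """Return True for global rank 0. A missing RANK means a single-process launch."""
    return int(env.get("RANK", "0")) == 0
```

Use `if is_main_process():` in place of the `args.local_rank == 0` check to initialize the W&B run only on the main process.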

Explore an example dashboard showing metrics tracked from a single process.

The dashboard displays system metrics for both GPUs, such as temperature and utilization.

However, the loss values as a function of epoch and batch size were only logged from a single GPU.

Track multiple processes

Track multiple processes with W&B with one of the following approaches:

Tracking each process separately by creating a run for each process.
Tracking all processes to a single run.

Track each process separately

This section describes how to track each process separately by creating a run for each process. Within each run, you log metrics, artifacts, and so forth to that run. Call wandb.Run.finish() at the end of training to mark that the run has completed so that all processes exit properly.

You might find it difficult to keep track of runs across multiple experiments. To mitigate this, provide a value to the group parameter when you initialize W&B (wandb.init(group='group-name')) to keep track of which run belongs to a given experiment. For more information about how to keep track of training and evaluation W&B Runs in experiments, see Group Runs.

Use this approach if you want to track metrics from individual processes. Typical examples include the data and predictions on each node (for debugging data distribution) and metrics on individual batches outside of the main node. This approach is not necessary to get system metrics from all nodes nor to get summary statistics available on the main node.

The following Python code snippet demonstrates how to set the group parameter when you initialize W&B:

if __name__ == "__main__":
    # Get args
    args = parse_args()
    # Initialize run
    run = wandb.init(
        entity=args.entity,
        project=args.project,
        group="DDP",  # all runs for the experiment in one group
    )
    # Train model with DDP
    train(args, run)

    run.finish()  # mark the run as finished
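
With one run per process, it can help to make each run easy to tell apart inside the group. The sketch below builds keyword arguments for wandb.init(). The group, name, and job_type parameters exist on wandb.init(), while the helper name and the naming scheme (rank-N, "main" versus "worker") are assumptions made for illustration:

```python
def init_kwargs_for_rank(rank, group="DDP"):
    """Build wandb.init() keyword arguments for one process of a grouped job."""
    return {
        "group": group,          # same value on every process of the experiment
        "name": f"rank-{rank}",  # hypothetical naming scheme, one run per rank
        "job_type": "main" if rank == 0 else "worker",
    }


kwargs = init_kwargs_for_rank(1)
# In the training script:
# run = wandb.init(entity=args.entity, project=args.project, **kwargs)
```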

Explore the W&B App UI to view an example dashboard of metrics tracked from multiple processes. Note that there are two W&B Runs grouped together in the left sidebar. Click on a group to view the dedicated group page for the experiment. The dedicated group page displays metrics from each process separately.

The preceding image shows the W&B App UI dashboard. The sidebar lists two experiments: one labeled ‘null’ and a second (bound by a yellow box) called ‘DDP’. If you expand the group (select the Group dropdown), you see the W&B Runs that are associated with that experiment.

Track all processes to a single run

Parameters prefixed by x_ (such as x_label) are in public preview. Create a GitHub issue in the W&B repository to provide feedback.

Requirements

To track multiple processes to a single run, you must have:

W&B Python SDK version v0.19.9 or newer.
W&B Server v0.68 or newer.

In this approach you use a primary node and one or more worker nodes. Within the primary node you initialize a W&B run. For each worker node, initialize a run using the run ID used by the primary node. During training each worker node logs to the same run ID as the primary node. W&B aggregates metrics from all nodes and displays them in the W&B App UI.

Within the primary node, initialize a W&B run with wandb.init(). Pass in a wandb.Settings object to the settings parameter (wandb.init(settings=wandb.Settings())) with the following:

The mode parameter set to "shared" to enable shared mode.
A unique label for x_label. You use the value you specify for x_label to identify which node the data is coming from in logs and system metrics in the W&B App UI. If left unspecified, W&B creates a label for you using the hostname and a random hash.
Set the x_primary parameter to True to indicate that this is the primary node.
Optionally provide a list of GPU indexes ([0,1,2]) to x_stats_gpu_device_ids to specify which GPUs W&B tracks metrics for. If you do not provide a list, W&B tracks metrics for all GPUs on the machine.

Make note of the run ID of the primary node. Each worker node needs the run ID of the primary node.

x_primary=True distinguishes a primary node from worker nodes. Primary nodes are the only nodes that upload files shared across nodes such as configuration files, telemetry and more. Worker nodes do not upload these files.

For each worker node, initialize a W&B run with wandb.init() and provide the following:

A wandb.Settings object to the settings parameter (wandb.init(settings=wandb.Settings())) with:
- The mode parameter set to "shared" to enable shared mode.
- A unique label for x_label. You use the value you specify for x_label to identify which node the data is coming from in logs and system metrics in the W&B App UI. If left unspecified, W&B creates a label for you using the hostname and a random hash.
- Set the x_primary parameter to False to indicate that this is a worker node.
Pass the run ID used by the primary node to the id parameter.
Optionally set x_update_finish_state to False. This prevents non-primary nodes from updating the run’s state to finished prematurely, ensuring the run state remains consistent and managed by the primary node.

Consider using an environment variable to set the run ID of the primary node that you can then define in each worker node’s machine.

The following sample code demonstrates the high level requirements for tracking multiple processes to a single run:

import wandb

# Initialize a run in the primary node
run = wandb.init(
    entity="entity",
    project="project",
    settings=wandb.Settings(
        x_label="rank_0",
        mode="shared",
        x_primary=True,
        x_stats_gpu_device_ids=[0, 1],  # (Optional) Only track metrics for GPU 0 and 1
    ),
)

# Note the run ID of the primary node.
# Each worker node needs this run ID.
run_id = run.id

# Initialize a run in a worker node using the run ID of the primary node
run = wandb.init(
    settings=wandb.Settings(x_label="rank_1", mode="shared", x_primary=False),
    id=run_id,
)

# Initialize a run in a worker node using the run ID of the primary node
run = wandb.init(
    settings=wandb.Settings(x_label="rank_2", mode="shared", x_primary=False),
    id=run_id,
)

In a real world example, each worker node might be on a separate machine.
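
When workers run on separate machines, the primary node's run ID has to reach them through some external channel. The sketch below reads it from an environment variable and builds the arguments described above. The variable name PRIMARY_RUN_ID and the helper shared_run_kwargs are hypothetical, and the returned dictionaries are meant to be passed to wandb.Settings() and wandb.init():

```python
import os


def shared_run_kwargs(rank, environ=os.environ):
    """Return (init_kwargs, settings_kwargs) for one node in shared mode."""
    is_primary = rank == 0
    settings_kwargs = {
        "mode": "shared",
        "x_label": f"rank_{rank}",
        "x_primary": is_primary,
    }
    init_kwargs = {}
    if not is_primary:
        # Workers must reuse the primary node's run ID.
        # PRIMARY_RUN_ID is a hypothetical variable name set on each worker.
        init_kwargs["id"] = environ["PRIMARY_RUN_ID"]
        # Leave the run's final state to the primary node.
        settings_kwargs["x_update_finish_state"] = False
    return init_kwargs, settings_kwargs


init_kwargs, settings_kwargs = shared_run_kwargs(
    rank=1, environ={"PRIMARY_RUN_ID": "abc123"}
)
# In the training script:
# run = wandb.init(settings=wandb.Settings(**settings_kwargs), **init_kwargs)
```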

See the Distributed Training with Shared Mode report for an end-to-end example on how to train a model on a multi-node and multi-GPU Kubernetes cluster in GKE.

View console logs from multi node processes in the project that the run logs to:

Navigate to the project that contains the run.
Click on the Runs tab in the left sidebar.
Click on the run you want to view.
Click on the Logs tab in the left sidebar.

You can filter console logs based on the labels you provide for x_label in the UI search bar located at the top of the console log page. For example, the following image shows which options are available to filter the console log by if values rank0, rank1, rank2, rank3, rank4, rank5, and rank6 are provided to x_label.

See Console logs for more information.

W&B aggregates system metrics from all nodes and displays them in the W&B App UI. For example, the following image shows a sample dashboard with system metrics from multiple nodes. Each node possesses a unique label (rank_0, rank_1, rank_2) that you specify in the x_label parameter.

See Line plots for information on how to customize line plot panels.

Example use cases

The following code snippets demonstrate common scenarios for advanced distributed use cases.

Spawn process

Use the wandb.setup() method in your main function if you initiate a run in a spawned process:

import multiprocessing as mp

import wandb

def do_work(n):
    with wandb.init(config=dict(n=n)) as run:
        run.log(dict(this=n * n))

def main():
    wandb.setup()
    pool = mp.Pool(processes=4)
    pool.map(do_work, range(4))


if __name__ == "__main__":
    main()

Pass a run object as an argument to share runs between processes:

def do_work(run):
    # Log to the run object passed in from the parent process
    run.log(dict(this=1))

def main():
    run = wandb.init()
    p = mp.Process(target=do_work, kwargs=dict(run=run))
    p.start()
    p.join()
    run.finish()  # mark the run as finished


if __name__ == "__main__":
    main()

W&B cannot guarantee the logging order. The script author is responsible for any synchronization.
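
One way to get a deterministic order is to send results from worker processes to a single process and log them there. The sketch below assumes each worker returns a (step, metrics) pair, a convention invented for this example, and uses a stand-in run object so it executes without W&B:

```python
class RecordingRun:
    """Stand-in for a W&B run that only records what would be logged."""

    def __init__(self):
        self.history = []

    def log(self, metrics, step=None):
        self.history.append((step, metrics))


def log_in_step_order(run, results):
    """Log (step, metrics) pairs from workers in ascending step order."""
    for step, metrics in sorted(results, key=lambda pair: pair[0]):
        run.log(metrics, step=step)


# Results may arrive from workers in any order.
results = [(2, {"loss": 0.3}), (0, {"loss": 0.9}), (1, {"loss": 0.5})]
run = RecordingRun()
log_in_step_order(run, results)
```

Collecting results first trades some latency for a predictable order; other schemes, such as locks around each log call, are equally valid.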

Troubleshooting

There are two common issues you might encounter when using W&B and distributed training:

Hanging at the beginning of training - A wandb process can hang if the wandb multiprocessing interferes with the multiprocessing from distributed training.
Hanging at the end of training - A training job might hang if the wandb process does not know when it needs to exit. Call the wandb.Run.finish() API at the end of your Python script to tell W&B that the run finished. The wandb.Run.finish() API finishes uploading data and causes W&B to exit.

W&B recommends using wandb service to improve the reliability of your distributed jobs. Both of the preceding training issues are commonly found in versions of the W&B SDK where wandb service is unavailable.

Enable W&B Service

Depending on your version of the W&B SDK, you might already have W&B Service enabled by default.

W&B SDK 0.13.0 and above

W&B Service is enabled by default for versions of the W&B SDK 0.13.0 and above.

W&B SDK 0.12.5 and above

Modify your Python script to enable W&B Service for W&B SDK version 0.12.5 and above. Use the wandb.require method and pass the string "service" within your main function:

import wandb


def main():
    wandb.require("service")
    # rest-of-your-script-goes-here


if __name__ == "__main__":
    main()

For the best experience, W&B recommends that you upgrade to the latest version.

W&B SDK 0.12.4 and below

If you use W&B SDK version 0.12.4 or below, set the WANDB_START_METHOD environment variable to "thread" to use multithreading instead.
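
The variable can be exported in the shell or, as in this sketch, set from Python. Setting it before wandb is imported is a conservative assumption rather than a documented requirement:

```python
import os

# Ask older W&B SDK versions (0.12.4 and below) to use threads
# instead of starting a separate process.
os.environ["WANDB_START_METHOD"] = "thread"

# import wandb  # import and initialize W&B after the variable is set
```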

2.1.6.4 - Log media and objects

Log rich media, from 3D point clouds and molecules to HTML and histograms

We support images, video, audio, and more. Log rich media to explore your results and visually compare your runs, models, and datasets. Read on for examples and how-to guides.

For details, see the Data types reference.

For more details, check out a demo report about visualizing model predictions or watch a video walkthrough.

Pre-requisites

In order to log media objects with the W&B SDK, you may need to install additional dependencies. You can install these dependencies by running the following command:

pip install wandb[media]

Images

Log images to track inputs, outputs, filter weights, activations, and more.

Images can be logged directly from NumPy arrays, as PIL images, or from the filesystem.

Each time you log images from a step, we save them to show in the UI. Expand the image panel, and use the step slider to look at images from different steps. This makes it easy to compare how a model’s output changes during training.

It’s recommended to log fewer than 50 images per step to prevent logging from becoming a bottleneck during training and image loading from becoming a bottleneck when viewing results.

Provide arrays directly when constructing images manually, such as by using make_grid from torchvision.

Arrays are converted to png using Pillow.

import wandb

with wandb.init(project="image-log-example") as run:

    images = wandb.Image(image_array, caption="Top: Output, Bottom: Input")

    run.log({"examples": images})

We assume the image is grayscale if the last dimension is 1, RGB if it’s 3, and RGBA if it’s 4. If the array contains floats, we convert them to integers between 0 and 255. If you want to normalize your images differently, you can specify the mode manually or supply a PIL.Image instead.
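
The channel rule above can be pictured with a small helper. This is an illustration only, not W&B's internal code. The function name and the use of Pillow mode strings ("L", "RGB", "RGBA") are assumptions:

```python
def infer_image_mode(shape):
    """Map the last dimension of a (height, width, channels) shape to a mode."""
    modes = {1: "L", 3: "RGB", 4: "RGBA"}  # grayscale, RGB, RGBA
    channels = shape[-1]
    if len(shape) != 3 or channels not in modes:
        raise ValueError(f"unsupported image shape: {shape}")
    return modes[channels]


mode = infer_image_mode((28, 28, 3))
```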

For full control over the conversion of arrays to images, construct the PIL.Image yourself and provide it directly.

import wandb
from PIL import Image

with wandb.init(project="") as run:
    # Create a PIL image from a NumPy array
    image = Image.fromarray(image_array)

    # Optionally, convert to RGB if needed
    if image.mode != "RGB":
        image = image.convert("RGB")

    # Log the image
    run.log({"example": wandb.Image(image, caption="My Image")})

For even more control, create images however you like, save them to disk, and provide a filepath.

import wandb
from PIL import Image

with wandb.init(project="") as run:

    im = Image.fromarray(...)
    rgb_im = im.convert("RGB")
    rgb_im.save("myimage.jpg")

    run.log({"example": wandb.Image("myimage.jpg")})

Image overlays

Log semantic segmentation masks and interact with them (altering opacity, viewing changes over time, and more) via the W&B UI.

To log an overlay, provide a dictionary with the following keys and values to the masks keyword argument of wandb.Image:

one of two keys representing the image mask:
- "mask_data": a 2D NumPy array containing an integer class label for each pixel
- "path": (string) a path to a saved image mask file
"class_labels": (optional) a dictionary mapping the integer class labels in the image mask to their readable class names

To log multiple masks, log a mask dictionary with multiple keys, as in the code snippet below.

See a live example

Sample code

mask_data = np.array([[1, 2, 2, ..., 2, 2, 1], ...])

class_labels = {1: "tree", 2: "car", 3: "road"}

mask_img = wandb.Image(
    image,
    masks={
        "predictions": {"mask_data": mask_data, "class_labels": class_labels},
        "ground_truth": {
            # ...
        },
        # ...
    },
)

Segmentation masks for a key are defined at each step (each call to run.log()).

If steps provide different values for the same mask key, only the most recent value for the key is applied to the image.
If steps provide different mask keys, all values for each key are shown, but only those defined in the step being viewed are applied to the image. Toggling the visibility of masks not defined in the step does not change the image.

Log bounding boxes with images, and use filters and toggles to dynamically visualize different sets of boxes in the UI.

See a live example

To log a bounding box, you’ll need to provide a dictionary with the following keys and values to the boxes keyword argument of wandb.Image:

box_data: a list of dictionaries, one for each box. The box dictionary format is described below.
- position: a dictionary representing the position and size of the box in one of two formats, as described below. Boxes need not all use the same format.
  - Option 1: {"minX", "maxX", "minY", "maxY"}. Provide a set of coordinates defining the upper and lower bounds of each box dimension.
  - Option 2: {"middle", "width", "height"}. Provide a set of coordinates specifying the middle coordinates as [x,y], and width and height as scalars.
- class_id: an integer representing the class identity of the box. See class_labels key below.
- scores: a dictionary of string labels and numeric values for scores. Can be used for filtering boxes in the UI.
- domain: specify the units/format of the box coordinates. Set this to “pixel” if the box coordinates are expressed in pixel space, such as integers within the bounds of the image dimensions. By default, the domain is assumed to be a fraction/percentage of the image, expressed as a floating point number between 0 and 1.
- box_caption: (optional) a string to be displayed as the label text on this box
class_labels: (optional) A dictionary mapping class_ids to strings. By default we will generate class labels class_0, class_1, etc.

Check out this example:

import wandb

class_id_to_label = {
    1: "car",
    2: "road",
    3: "building",
    # ...
}

img = wandb.Image(
    image,
    boxes={
        "predictions": {
            "box_data": [
                {
                    # one box expressed in the default relative/fractional domain
                    "position": {"minX": 0.1, "maxX": 0.2, "minY": 0.3, "maxY": 0.4},
                    "class_id": 2,
                    "box_caption": class_id_to_label[2],
                    "scores": {"acc": 0.1, "loss": 1.2},
                },
                {
                    # another box expressed in the pixel domain
                    # (for illustration purposes only, all boxes are likely
                    # to be in the same domain/format)
                    "position": {"middle": [150, 20], "width": 68, "height": 112},
                    "domain": "pixel",
                    "class_id": 3,
                    "box_caption": "a building",
                    "scores": {"acc": 0.5, "loss": 0.7},
                },
                # ...
                # Log as many boxes as needed
            ],
            "class_labels": class_id_to_label,
        },
        # Log each meaningful group of boxes with a unique key name
        "ground_truth": {
            # ...
        },
    },
)

with wandb.init(project="my_project") as run:
    run.log({"driving_scene": img})
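
The two position formats carry the same information, so a box in one format can be rewritten in the other. The helper names below are hypothetical and the arithmetic is a sketch of that relationship, not part of the W&B API:

```python
def bounds_to_middle(position):
    """Convert {"minX", "maxX", "minY", "maxY"} to {"middle", "width", "height"}."""
    width = position["maxX"] - position["minX"]
    height = position["maxY"] - position["minY"]
    return {
        "middle": [position["minX"] + width / 2, position["minY"] + height / 2],
        "width": width,
        "height": height,
    }


def middle_to_bounds(position):
    """Convert {"middle", "width", "height"} to {"minX", "maxX", "minY", "maxY"}."""
    x, y = position["middle"]
    half_w, half_h = position["width"] / 2, position["height"] / 2
    return {
        "minX": x - half_w,
        "maxX": x + half_w,
        "minY": y - half_h,
        "maxY": y + half_h,
    }


box = bounds_to_middle({"minX": 10, "maxX": 30, "minY": 20, "maxY": 60})
```

The same arithmetic applies in either domain, as long as all values of one box use the same units.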

Image overlays in Tables

Interactive Segmentation Masks in Tables

To log Segmentation Masks in tables, you will need to provide a wandb.Image object for each row in the table.

An example is provided in the code snippet below:

table = wandb.Table(columns=["ID", "Image"])

for id, img, label in zip(ids, images, labels):
    mask_img = wandb.Image(
        img,
        masks={
            "prediction": {"mask_data": label, "class_labels": class_labels}
            # ...
        },
    )

    table.add_data(id, mask_img)

with wandb.init(project="my_project") as run:
    run.log({"Table": table})

To log Images with Bounding Boxes in tables, you will need to provide a wandb.Image object for each row in the table.

An example is provided in the code snippet below:

table = wandb.Table(columns=["ID", "Image"])

for id, img, boxes in zip(ids, images, boxes_set):
    box_img = wandb.Image(
        img,
        boxes={
            "prediction": {
                "box_data": [
                    {
                        "position": {
                            "minX": box["minX"],
                            "minY": box["minY"],
                            "maxX": box["maxX"],
                            "maxY": box["maxY"],
                        },
                        "class_id": box["class_id"],
                        "box_caption": box["caption"],
                        "domain": "pixel",
                    }
                    for box in boxes
                ],
                "class_labels": class_labels,
            }
        },
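    )

    table.add_data(id, box_img)

with wandb.init(project="my_project") as run:
    run.log({"Table": table})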

Histograms

If a sequence of numbers, such as a list, array, or tensor, is provided as the first argument, we will construct the histogram automatically by calling np.histogram. All arrays/tensors are flattened. You can use the optional num_bins keyword argument to override the default of 64 bins. The maximum number of bins supported is 512.

In the UI, histograms are plotted with the training step on the x-axis, the metric value on the y-axis, and the count represented by color, to ease comparison of histograms logged throughout training. One-off histograms stored in the run summary are displayed differently, as described at the end of this section.

run.log({"gradients": wandb.Histogram(grads)})

If you want more control, call np.histogram and pass the returned tuple to the np_histogram keyword argument.

np_hist_grads = np.histogram(grads, density=True, range=(0.0, 1.0))
run.log({"gradients": wandb.Histogram(np_histogram=np_hist_grads)})
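
np.histogram returns a (counts, bin_edges) tuple, with one more edge than there are bins. The dependency-free sketch below builds a tuple of that shape to make the structure concrete. The function is illustrative and assumes, like NumPy, that the largest value falls in the last bin. Unlike NumPy, it rejects input where all values are equal:

```python
def simple_histogram(values, num_bins):
    """Return (counts, bin_edges) for equal-width bins over [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("values must not all be equal in this sketch")
    width = (hi - lo) / num_bins
    edges = [lo + i * width for i in range(num_bins + 1)]
    counts = [0] * num_bins
    for value in values:
        # The maximum value belongs to the last bin (right edge inclusive).
        index = min(int((value - lo) / width), num_bins - 1)
        counts[index] += 1
    return counts, edges


counts, edges = simple_histogram([0, 1, 2, 3], num_bins=2)
```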

If histograms are in your summary they will appear on the Overview tab of the Run Page. If they are in your history, we plot a heatmap of bins over time on the Charts tab.

3D visualizations

Log 3D point clouds and Lidar scenes with bounding boxes. Pass in a NumPy array containing coordinates and colors for the points to render.

point_cloud = np.array([[0, 0, 0, COLOR]])

run.log({"point_cloud": wandb.Object3D(point_cloud)})

The W&B UI truncates the data at 300,000 points.

NumPy array formats

Three different formats of NumPy arrays are supported for flexible color schemes.

[[x, y, z], ...] nx3
[[x, y, z, c], ...] nx4 | c is a category in the range [1, 14] (Useful for segmentation)
[[x, y, z, r, g, b], ...] nx6 | r,g,b are values in the range [0,255] for red, green, and blue color channels.
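
A quick check before logging can catch rows that match none of the three layouts. The helper below is a sketch with a hypothetical name. It classifies a list of points by row width and checks the value ranges listed above:

```python
def point_cloud_layout(points):
    """Return "xyz", "xyzc", or "xyzrgb" for a list of equally sized rows."""
    widths = {len(row) for row in points}
    if len(widths) != 1:
        raise ValueError("all points must have the same number of values")
    width = widths.pop()
    if width == 3:
        return "xyz"
    if width == 4:
        if any(not 1 <= row[3] <= 14 for row in points):
            raise ValueError("category must be in the range [1, 14]")
        return "xyzc"
    if width == 6:
        if any(not 0 <= v <= 255 for row in points for v in row[3:]):
            raise ValueError("r, g, b must be in the range [0, 255]")
        return "xyzrgb"
    raise ValueError(f"unsupported row width: {width}")


layout = point_cloud_layout([[0.4, 1, 1.3, 255, 0, 0], [1, 1, 1, 0, 255, 0]])
```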

Python object

Using this schema, you can define a Python object and pass it in to the from_point_cloud method.

points is a NumPy array containing coordinates and colors for the points to render using the same formats as the simple point cloud renderer shown above.
boxes is a NumPy array of Python dictionaries with the following attributes:
- corners - a list of eight corners
- label - a string representing the label to be rendered on the box (Optional)
- color - rgb values representing the color of the box
- score - a numeric value that will be displayed on the bounding box that can be used to filter the bounding boxes shown (for example, to only show bounding boxes where score > 0.75). (Optional)
type is a string representing the scene type to render. Currently the only supported value is lidar/beta.

point_list = [
    [
        2566.571924017235, # x
        746.7817289698219, # y
        -15.269245470863748,# z
        76.5, # red
        127.5, # green
        89.46617199365393 # blue
    ],
    [ 2566.592983606823, 746.6791987335685, -15.275803826279521, 76.5, 127.5, 89.45471117247024 ],
    [ 2566.616361739416, 746.4903185513501, -15.28628929674075, 76.5, 127.5, 89.41336375503832 ],
    [ 2561.706014951675, 744.5349468458361, -14.877496818222781, 76.5, 127.5, 82.21868245418283 ],
    [ 2561.5281847916694, 744.2546118233013, -14.867862032341005, 76.5, 127.5, 81.87824684536432 ],
    [ 2561.3693562897465, 744.1804761656741, -14.854129178142523, 76.5, 127.5, 81.64137897587152 ],
    [ 2561.6093071504515, 744.0287526628543, -14.882135189841177, 76.5, 127.5, 81.89871499537098 ],
    # ... and so on
]

run.log({"my_first_point_cloud": wandb.Object3D.from_point_cloud(
     points = point_list,
     boxes = [{
         "corners": [
                [ 2601.2765123137915, 767.5669506323393, -17.816764802288663 ],
                [ 2599.7259021588347, 769.0082337923552, -17.816764802288663 ],
                [ 2599.7259021588347, 769.0082337923552, -19.66876480228866 ],
                [ 2601.2765123137915, 767.5669506323393, -19.66876480228866 ],
                [ 2604.8684867834395, 771.4313904894723, -17.816764802288663 ],
                [ 2603.3178766284827, 772.8726736494882, -17.816764802288663 ],
                [ 2603.3178766284827, 772.8726736494882, -19.66876480228866 ],
                [ 2604.8684867834395, 771.4313904894723, -19.66876480228866 ]
        ],
         "color": [0, 0, 255], # color in RGB of the bounding box
         "label": "car", # string displayed on the bounding box
         "score": 0.6 # numeric displayed on the bounding box
     }],
     vectors = [
        {"start": [0, 0, 0], "end": [0.1, 0.2, 0.5], "color": [255, 0, 0]}, # color is optional
     ],
     point_cloud_type = "lidar/beta",
)})

When viewing a point cloud, you can hold control and use the mouse to move around inside the space.

Point cloud files

You can use the from_file method to load in a JSON file full of point cloud data.

run.log({"my_cloud_from_file": wandb.Object3D.from_file(
     "./my_point_cloud.pts.json"
)})

An example of how to format the point cloud data is shown below.

{
    "boxes": [
        {
            "color": [
                0,
                255,
                0
            ],
            "score": 0.35,
            "label": "My label",
            "corners": [
                [
                    2589.695869075582,
                    760.7400443552185,
                    -18.044831294622487
                ],
                [
                    2590.719039645323,
                    762.3871153874499,
                    -18.044831294622487
                ],
                [
                    2590.719039645323,
                    762.3871153874499,
                    -19.54083129462249
                ],
                [
                    2589.695869075582,
                    760.7400443552185,
                    -19.54083129462249
                ],
                [
                    2594.9666662674313,
                    757.4657929961453,
                    -18.044831294622487
                ],
                [
                    2595.9898368371723,
                    759.1128640283766,
                    -18.044831294622487
                ],
                [
                    2595.9898368371723,
                    759.1128640283766,
                    -19.54083129462249
                ],
                [
                    2594.9666662674313,
                    757.4657929961453,
                    -19.54083129462249
                ]
            ]
        }
    ],
    "points": [
        [
            2566.571924017235,
            746.7817289698219,
            -15.269245470863748,
            76.5,
            127.5,
            89.46617199365393
        ],
        [
            2566.592983606823,
            746.6791987335685,
            -15.275803826279521,
            76.5,
            127.5,
            89.45471117247024
        ],
        [
            2566.616361739416,
            746.4903185513501,
            -15.28628929674075,
            76.5,
            127.5,
            89.41336375503832
        ]
    ],
    "type": "lidar/beta"
}
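
A file in this format can be produced with the standard json module. The sketch below writes the three top-level keys shown above for a scene with a single point. The helper name and temporary location are assumptions, and whether an empty boxes list is accepted by the renderer is not confirmed here:

```python
import json
import os
import tempfile


def write_point_cloud_file(path, points, boxes=(), scene_type="lidar/beta"):
    """Write points and boxes to a JSON file in the layout shown above."""
    scene = {"boxes": list(boxes), "points": list(points), "type": scene_type}
    with open(path, "w") as f:
        json.dump(scene, f)
    return path


path = os.path.join(tempfile.mkdtemp(), "my_point_cloud.pts.json")
write_point_cloud_file(path, points=[[0.0, 0.0, 0.0, 76.5, 127.5, 89.4]])
# In the training script:
# run.log({"my_cloud_from_file": wandb.Object3D.from_file(path)})
```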

NumPy arrays

Using the same array formats defined above, you can use NumPy arrays directly with the from_numpy method to define a point cloud.

run.log({"my_cloud_from_numpy_xyz": wandb.Object3D.from_numpy(
     np.array(  
        [
            [0.4, 1, 1.3], # x, y, z
            [1, 1, 1], 
            [1.2, 1, 1.2]
        ]
    )
)})

run.log({"my_cloud_from_numpy_cat": wandb.Object3D.from_numpy(
     np.array(  
        [
            [0.4, 1, 1.3, 1], # x, y, z, category 
            [1, 1, 1, 1], 
            [1.2, 1, 1.2, 12], 
            [1.2, 1, 1.3, 12], 
            [1.2, 1, 1.4, 12], 
            [1.2, 1, 1.5, 12], 
            [1.2, 1, 1.6, 11], 
            [1.2, 1, 1.7, 11], 
        ]
    )
)})

run.log({"my_cloud_from_numpy_rgb": wandb.Object3D.from_numpy(
     np.array(  
        [
            [0.4, 1, 1.3, 255, 0, 0], # x, y, z, r, g, b 
            [1, 1, 1, 0, 255, 0], 
            [1.2, 1, 1.3, 0, 255, 255],
            [1.2, 1, 1.4, 0, 255, 255],
            [1.2, 1, 1.5, 0, 0, 255],
            [1.2, 1, 1.1, 0, 0, 255],
            [1.2, 1, 0.9, 0, 0, 255],
        ]
    )
)})

run.log({"protein": wandb.Molecule("6lu7.pdb")})

Log molecular data in any of 10 file types: pdb, pqr, mmcif, mcif, cif, sdf, sd, gro, mol2, or mmtf.

W&B also supports logging molecular data from SMILES strings, rdkit mol files, and rdkit.Chem.rdchem.Mol objects.

resveratrol = rdkit.Chem.MolFromSmiles("Oc1ccc(cc1)C=Cc1cc(O)cc(c1)O")

run.log(
    {
        "resveratrol": wandb.Molecule.from_rdkit(resveratrol),
        "green fluorescent protein": wandb.Molecule.from_rdkit("2b3p.mol"),
        "acetaminophen": wandb.Molecule.from_smiles("CC(=O)Nc1ccc(O)cc1"),
    }
)

When your run finishes, you’ll be able to interact with 3D visualizations of your molecules in the UI.

See a live example using AlphaFold

Molecule structure

PNG image

wandb.Image converts NumPy arrays or instances of PIL.Image to PNGs by default.

run.log({"example": wandb.Image(...)})
# Or multiple images
run.log({"example": [wandb.Image(...) for img in images]})

Video

Videos are logged using the wandb.Video data type:

run.log({"example": wandb.Video("myvideo.mp4")})

Now you can view videos in the media browser. Go to your project workspace, run workspace, or report and click Add visualization to add a rich media panel.

2D view of a molecule

You can log a 2D view of a molecule using the wandb.Image data type and rdkit:

molecule = rdkit.Chem.MolFromSmiles("CC(=O)O")
rdkit.Chem.AllChem.Compute2DCoords(molecule)
rdkit.Chem.AllChem.GenerateDepictionMatching2DStructure(molecule, molecule)
pil_image = rdkit.Chem.Draw.MolToImage(molecule, size=(300, 300))

run.log({"acetic_acid": wandb.Image(pil_image)})

Other media

W&B also supports logging of a variety of other media types.

Audio

run.log({"whale songs": wandb.Audio(np_array, caption="OooOoo", sample_rate=32)})

A maximum of 100 audio clips can be logged per step. For more usage information, see audio-file.

Video

run.log({"video": wandb.Video(numpy_array_or_path_to_video, fps=4, format="gif")})

If a NumPy array is supplied, we assume the dimensions are, in order: time, channels, height, width. By default we create a 4 fps gif image (ffmpeg and the moviepy Python library are required when passing NumPy objects). Supported formats are "gif", "mp4", "webm", and "ogg". If you pass a string to wandb.Video, we assert the file exists and is a supported format before uploading to wandb. Passing a BytesIO object creates a temporary file with the specified format as the extension.

On the W&B Run and Project Pages, you will see your videos in the Media section.

For more usage information, see video-file.

Text

Use wandb.Table to log text in tables to show up in the UI. By default, the column headers are ["Input", "Output", "Expected"]. To ensure optimal UI performance, the default maximum number of rows is set to 10,000. However, users can explicitly override the maximum with wandb.Table.MAX_ROWS = {DESIRED_MAX}.

with wandb.init(project="my_project") as run:
    columns = ["Text", "Predicted Sentiment", "True Sentiment"]
    # Method 1
    data = [["I love my phone", "1", "1"], ["My phone sucks", "0", "-1"]]
    table = wandb.Table(data=data, columns=columns)
    run.log({"examples": table})

    # Method 2
    table = wandb.Table(columns=columns)
    table.add_data("I love my phone", "1", "1")
    table.add_data("My phone sucks", "0", "-1")
    run.log({"examples": table})

You can also pass a pandas DataFrame object.

table = wandb.Table(dataframe=my_dataframe)

For more usage information, see string.

HTML

run.log({"custom_file": wandb.Html(open("some.html"))})
run.log({"custom_string": wandb.Html('<a href="https://mysite">Link</a>')})

Custom HTML can be logged at any key, and this exposes an HTML panel on the run page. By default, we inject default styles; you can turn off default styles by passing inject=False.

run.log({"custom_file": wandb.Html(open("some.html"), inject=False)})

For more usage information, see html-file.

2.1.6.5 - Log models

Log models

The following guide describes how to log models to a W&B run and interact with them.

The following APIs are useful for tracking models as a part of your experiment tracking workflow. Use the APIs listed on this page to log models to a run, and to access metrics, tables, media, and other objects.

W&B suggests that you use W&B Artifacts if you want to:

Create and keep track of different versions of serialized data besides models, such as datasets, prompts, and more.
Explore lineage graphs of a model or any other objects tracked in W&B.
Interact with the model artifacts these methods created, such as updating properties (metadata, aliases, and descriptions).

For more information on W&B Artifacts and advanced versioning use cases, see the Artifacts documentation.

Log a model to a run

Use the log_model method to log a model artifact that contains content within a directory you specify. The log_model method also marks the resulting model artifact as an output of the W&B run.

You can track a model’s dependencies and the model’s associations if you mark the model as the input or output of a W&B run. View the lineage of the model within the W&B App UI. See the Explore and traverse artifact graphs page within the Artifacts chapter for more information.

Provide the path where your model files are saved to the path parameter. The path can be a local file, directory, or reference URI to an external bucket such as s3://bucket/path.

Replace values enclosed in <> with your own.

import wandb

# Initialize a W&B run
run = wandb.init(project="<your-project>", entity="<your-entity>")

# Log the model
run.log_model(path="<path-to-model>", name="<name>")

Optionally provide a name for the model artifact for the name parameter. If name is not specified, W&B will use the basename of the input path prepended with the run ID as the name.

Keep track of the name that you or W&B assigns to the model. You need the name of the model to retrieve the model path with the use_model method.

See log_model in the API Reference for parameters.

Example: Log a model to a run

import os
import wandb
from tensorflow import keras
from tensorflow.keras import layers

config = {"optimizer": "adam", "loss": "categorical_crossentropy"}

# Initialize a W&B run
run = wandb.init(entity="charlie", project="mnist-experiments", config=config)

# Hyperparameters
loss = run.config["loss"]
optimizer = run.config["optimizer"]
metrics = ["accuracy"]
num_classes = 10
input_shape = (28, 28, 1)

# Training algorithm
model = keras.Sequential(
    [
        layers.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ]
)

# Configure the model for training
model.compile(loss=loss, optimizer=optimizer, metrics=metrics)

# Save model
model_filename = "model.h5"
local_filepath = "./"
full_path = os.path.join(local_filepath, model_filename)
model.save(filepath=full_path)

# Log the model to the W&B run
run.log_model(path=full_path, name="MNIST")
run.finish()

The call to log_model creates a model artifact named MNIST and adds the file model.h5 to it. Your terminal or notebook prints where to find information about the run that the model was logged to:

View run different-surf-5 at: https://wandb.ai/charlie/mnist-experiments/runs/wlby6fuw
Synced 5 W&B file(s), 0 media file(s), 1 artifact file(s) and 0 other file(s)
Find logs at: ./wandb/run-20231206_103511-wlby6fuw/logs

Download and use a logged model

Use the use_model function to access and download model files previously logged to a W&B run.

Provide the name of the model artifact where the model files you want to retrieve are stored. The name you provide must match the name of an existing logged model artifact.

If you did not define name when you originally logged the files with log_model, the default name assigned is the basename of the input path, prepended with the run ID.

Replace the values enclosed in <> with your own:

import wandb

# Initialize a run
run = wandb.init(project="<your-project>", entity="<your-entity>")

# Access and download model. Returns path to downloaded artifact
downloaded_model_path = run.use_model(name="<your-model-name>")

The use_model function returns the path of downloaded model files. Keep track of this path if you want to link this model later. In the preceding code snippet, the returned path is stored in a variable called downloaded_model_path.

Example: Download and use a logged model

For example, in the following code snippet a user calls the use_model API. They specify the name of the model artifact they want to fetch and also provide a version or alias. They then store the path that the API returns in the downloaded_model_path variable.

import wandb

entity = "luka"
project = "NLP_Experiments"
alias = "latest"  # semantic nickname or identifier for the model version
model_artifact_name = "fine-tuned-model"

# Initialize a run
run = wandb.init(project=project, entity=entity)
# Access and download model. Returns path to downloaded artifact
downloaded_model_path = run.use_model(name=f"{model_artifact_name}:{alias}")

See use_model in the API Reference for parameters and return type.
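The name-and-alias pattern from the example can be wrapped in a small function. This is only a sketch: fetch_model is a hypothetical name, and the default alias of "latest" is an assumption taken from the example rather than a documented default.

```python
def fetch_model(run, model_name, alias="latest"):
    """Download a logged model and return the local path to its files.

    Hypothetical helper: `run` is any object with a `use_model(name=...)` method,
    such as the object returned by `wandb.init()`.
    """
    # Build the "<name>:<alias>" reference used in the example above
    reference = f"{model_name}:{alias}"
    return run.use_model(name=reference)
```

With a real run, you might call fetch_model(run, "fine-tuned-model") and keep the returned path for later use.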

Log and link a model to the W&B Model Registry

The link_model method is currently only compatible with the legacy W&B Model Registry, which will soon be deprecated. To learn how to link a model artifact to the new edition of model registry, visit the Registry linking guide.

Use the link_model method to log model files to a W&B Run and link them to the W&B Model Registry. If no registered model exists, W&B will create a new one for you with the name you provide for the registered_model_name parameter.

Linking a model is analogous to ‘bookmarking’ or ‘publishing’ a model to a centralized team repository of models that other members of your team can view and consume.

When you link a model, that model is not duplicated in the Registry or moved out of the project and into the registry. A linked model is a pointer to the original model in your project.

Use the Registry to organize your best models by task, manage model lifecycle, facilitate easy tracking and auditing throughout the ML lifecycle, and automate downstream actions with webhooks or jobs.

A Registered Model is a collection or folder of linked model versions in the Model Registry. Registered models typically represent candidate models for a single modeling use case or task.

The following code snippet shows how to link a model with the link_model API. Replace the values enclosed in <> with your own:

import wandb

run = wandb.init(entity="<your-entity>", project="<your-project>")
run.link_model(path="<path-to-model>", registered_model_name="<registered-model-name>")
run.finish()

See link_model in the API Reference guide for optional parameters.

If registered_model_name matches the name of a registered model that already exists within the Model Registry, the model is linked to that registered model. If no such registered model exists, W&B creates a new one and the model becomes the first one linked.

For example, suppose you have an existing registered model named “Fine-Tuned-Review-Autocompletion” in your Model Registry (see example here), and suppose that a few model versions are already linked to it: v0, v1, v2. If you call link_model with registered_model_name="Fine-Tuned-Review-Autocompletion", the new model is linked to this existing registered model as v3. If no registered model with this name exists, a new one is created and the new model is linked as v0.
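The version numbering described above can be illustrated with a toy model of the registry. This is only a sketch of the rule and does not reflect how W&B stores registered models: the registry here is a plain dictionary, and link_version and the model labels are hypothetical.

```python
def link_version(registry, registered_model_name, model):
    """Link `model` under `registered_model_name` and return its version label.

    `registry` is a plain dict that stands in for the Model Registry.
    """
    # Create the registered model if it does not exist yet
    versions = registry.setdefault(registered_model_name, [])
    versions.append(model)
    # Versions are numbered from v0 in the order they were linked
    return f"v{len(versions) - 1}"


registry = {"Fine-Tuned-Review-Autocompletion": ["model-a", "model-b", "model-c"]}
print(link_version(registry, "Fine-Tuned-Review-Autocompletion", "model-d"))  # v3
print(link_version(registry, "Brand-New-Model", "model-e"))  # v0
```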

Example: Log and link a model to the W&B Model Registry

For example, the following code snippet logs model files and links the model to a registered model named "Fine-Tuned-Review-Autocompletion".

To do this, a user calls the link_model API. When they call the API, they provide a local filepath that points to the content of the model (path) and a name for the registered model to link it to (registered_model_name).

import wandb

path = "/local/dir/model.pt"
registered_model_name = "Fine-Tuned-Review-Autocompletion"

run = wandb.init(project="llm-evaluation", entity="noa")
run.link_model(path=path, registered_model_name=registered_model_name)
run.finish()

Reminder: A registered model houses a collection of bookmarked model versions.

2.1.6.6 - Log summary metrics

In addition to values that change over time during training, it is often important to track a single value that summarizes a model or a preprocessing step. Log this information in a W&B Run’s summary dictionary. A Run’s summary dictionary can handle numpy arrays, PyTorch tensors or TensorFlow tensors. When a value is one of these types we persist the entire tensor in a binary file and store high level metrics in the summary object, such as min, mean, variance, percentiles, and more.

The last value logged with wandb.Run.log() is automatically set as the summary dictionary in a W&B Run. If a summary metric dictionary is modified, the previous value is lost.

The following code snippet demonstrates how to provide a custom summary metric to W&B:

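import argparse

import wandb

# Parse the number of epochs from the command line
parser = argparse.ArgumentParser()
parser.add_argument("--epochs", type=int, default=10)
args = parser.parse_args()

with wandb.init(config=args) as run:
    best_accuracy = 0
    for epoch in range(1, args.epochs + 1):
        # test() is a placeholder for your own evaluation function
        test_loss, test_accuracy = test()
        if test_accuracy > best_accuracy:
            # Record the best accuracy seen so far as a summary metric
            run.summary["best_accuracy"] = test_accuracy
            best_accuracy = test_accuracy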

You can update the summary attribute of an existing W&B Run after training has completed. Use the W&B Public API to update the summary attribute:

import numpy as np
import wandb

api = wandb.Api()
run = api.run("username/project/run_id")
run.summary["tensor"] = np.random.random(1000)
run.summary.update()

Customize summary metrics

Custom summary metrics are useful for capturing model performance at the best step of training in your run.summary. For example, you might want to capture the maximum accuracy or the minimum loss value, instead of the final value.

By default, the summary uses the final value from history. To customize summary metrics, pass the summary argument in define_metric. It accepts the following values:

"min"
"max"
"mean"
"best"
"last"
"none"

You can use "best" only when you also set the optional objective argument to "minimize" or "maximize".

The following example adds the min and max values of loss and accuracy to the summary:

import wandb
import random

random.seed(1)

with wandb.init() as run:
    # Min and max summary values for loss
    run.define_metric("loss", summary="min")
    run.define_metric("loss", summary="max")

    # Min and max summary values for accuracy
    run.define_metric("acc", summary="min")
    run.define_metric("acc", summary="max")

    for i in range(10):
        log_dict = {
            "loss": random.uniform(0, 1 / (i + 1)),
            "acc": random.uniform(1 / (i + 1), 1),
        }
        run.log(log_dict)
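To make the summary options concrete, the following sketch reduces a list of logged values to one number for each option. It is a hypothetical stand-in written for illustration, not the W&B implementation. In particular, treating "best" as the minimum under "minimize" and the maximum under "maximize" is an assumption based on the description above.

```python
def summarize(values, summary, objective=None):
    """Reduce a metric's logged history to a single summary value.

    Hypothetical stand-in for illustration; not the W&B implementation.
    """
    if summary == "none" or not values:
        return None
    if summary == "min":
        return min(values)
    if summary == "max":
        return max(values)
    if summary == "mean":
        return sum(values) / len(values)
    if summary == "last":
        return values[-1]
    if summary == "best":
        # "best" is only meaningful together with an objective
        if objective == "minimize":
            return min(values)
        if objective == "maximize":
            return max(values)
        raise ValueError('summary="best" requires objective "minimize" or "maximize"')
    raise ValueError(f"Unknown summary option: {summary!r}")


loss_history = [0.9, 0.4, 0.6]
print(summarize(loss_history, "last"))  # 0.6
print(summarize(loss_history, "min"))  # 0.4
print(summarize(loss_history, "best", objective="minimize"))  # 0.4
```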

View summary metrics

View summary values in a run’s Overview page or the project’s runs table.

Navigate to the W&B App.
Select the Workspace tab.
From the list of runs, click the name of the run that logged the summary values.
Select the Overview tab.
View the summary values in the Summary section.

Navigate to the W&B App.
Select the Runs tab.
Within the runs table, you can view the summary values within the columns based on the name of the summary value.

You can use the W&B Public API to fetch the summary values of a run.

The following code example demonstrates one way to retrieve the summary values logged to a specific run using the W&B Public API and pandas:

import wandb
import pandas as pd

entity = "<your-entity>"
project = "<your-project>"
run_name = "<your-run-name>"  # Name of run with summary values

api = wandb.Api()
all_runs = []

for run in api.runs(f"{entity}/{project}"):
    print("Fetching details for run: ", run.id, run.name)
    run_data = {
              "id": run.id,
              "name": run.name,
              "url": run.url,
              "state": run.state,
              "tags": run.tags,
              "config": run.config,
              "created_at": run.created_at,
              "system_metrics": run.system_metrics,
              "summary": run.summary,
              "project": run.project,
              "entity": run.entity,
              "user": run.user,
              "path": run.path,
              "notes": run.notes,
              "read_only": run.read_only,
              "history_keys": run.history_keys,
              "metadata": run.metadata,
          }
    all_runs.append(run_data)
  
# Convert to DataFrame  
df = pd.DataFrame(all_runs)

# Get row based on the column name (run) and convert to dictionary
df[df['name']==run_name].summary.reset_index(drop=True).to_dict()

2.1.6.7 - Log tables

Log tables with W&B.

Use wandb.Table to log data to visualize and query with W&B. This guide shows how to create tables, add data, retrieve data, and save tables.

Create tables

To define a Table, specify the columns you want to see for each row of data. Each row might be a single item in your training dataset, a particular step or epoch during training, a prediction made by your model on a test item, an object generated by your model, etc. Each column has a fixed type: numeric, text, boolean, image, video, audio, etc. You do not need to specify the type in advance. Give each column a name, and make sure to only pass data of that type into that column index. For a more detailed example, see the W&B Tables guide.

Use the wandb.Table constructor in one of two ways:

List of Rows: Log named columns and rows of data. For example, the following code snippet generates a table with two rows and three columns:

wandb.Table(columns=["a", "b", "c"], data=[["1a", "1b", "1c"], ["2a", "2b", "2c"]])

Pandas DataFrame: Log a DataFrame using wandb.Table(dataframe=my_df). Column names will be extracted from the DataFrame.

From an existing array or dataframe

# assume a model has returned predictions on four images
# with the following fields available:
# - the image id
# - the image pixels, wrapped in a wandb.Image()
# - the model's predicted label
# - the ground truth label
my_data = [
    [0, wandb.Image("img_0.jpg"), 0, 0],
    [1, wandb.Image("img_1.jpg"), 8, 0],
    [2, wandb.Image("img_2.jpg"), 7, 1],
    [3, wandb.Image("img_3.jpg"), 1, 1],
]

# create a wandb.Table() with corresponding columns
columns = ["id", "image", "prediction", "truth"]
test_table = wandb.Table(data=my_data, columns=columns)

Add data

Tables are mutable. As your script executes you can add more data to your table, up to 200,000 rows. There are two ways to add data to a table:

Add a Row: table.add_data("3a", "3b", "3c"). Note that the new row is not represented as a list. If your row is in list format, use the star notation, *, to expand the list to positional arguments: table.add_data(*my_row_list). The row must contain the same number of entries as there are columns in the table.
Add a Column: table.add_column(name="col_name", data=col_data). Note that the length of col_data must be equal to the table’s current number of rows. Here, col_data can be a list or a NumPy NDArray.
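The two length rules above can be checked before you touch the table. The following sketch uses plain Python lists and hypothetical helper names (check_row and check_column). It only mirrors the rules stated above and does not call into wandb.Table.

```python
def check_row(columns, row):
    """Raise if `row` does not have exactly one entry per column."""
    if len(row) != len(columns):
        raise ValueError(f"Expected {len(columns)} entries, got {len(row)}")
    return row


def check_column(num_rows, col_data):
    """Raise if `col_data` does not have exactly one entry per existing row."""
    if len(col_data) != num_rows:
        raise ValueError(f"Expected {num_rows} entries, got {len(col_data)}")
    return col_data


columns = ["a", "b", "c"]
my_row_list = ["3a", "3b", "3c"]
# With a real table: table.add_data(*check_row(columns, my_row_list))
print(check_row(columns, my_row_list))  # ['3a', '3b', '3c']
```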

Adding data incrementally

This code sample shows how to create and populate a W&B table incrementally. You define the table with predefined columns, including confidence scores for all possible labels, and add data row by row during inference. You can also add data to tables incrementally when resuming runs.

import numpy as np
import wandb

# Define the columns for the table, including confidence scores for each label
columns = ["id", "image", "guess", "truth"]
for digit in range(10):  # Add confidence score columns for each digit (0-9)
    columns.append(f"score_{digit}")

# Initialize the table with the defined columns
test_table = wandb.Table(columns=columns)

# Iterate through the test dataset and add data to the table row by row
# Each row includes the image ID, image, predicted label, true label, and confidence scores
# mnist_test_data, mnist_test_data_labels, and my_model are placeholders for your own data and model
for img_id, img in enumerate(mnist_test_data):
    true_label = mnist_test_data_labels[img_id]  # Ground truth label
    scores = my_model.predict(img)  # Assumed to return 10 confidence scores
    guess_label = int(np.argmax(scores))  # Predicted label
    test_table.add_data(
        img_id, wandb.Image(img), guess_label, true_label, *scores
    )  # Add row data to the table, one entry per column

Adding data to resumed runs

You can incrementally update a W&B table in resumed runs by loading an existing table from an artifact, retrieving the last row of data, and adding the updated metrics. Then, reinitialize the table for compatibility and log the updated version back to W&B.

import wandb

# Placeholders: replace with your own artifact reference and table name
table_tag = "<artifact-name>:<alias>"
table_name = "<table-name>"

# Initialize a run
with wandb.init(project="my_project") as run:

    # Load the existing table from the artifact
    best_checkpt_table = run.use_artifact(table_tag).get(table_name)

    # Get the last row of data from the table for resuming
    best_iter, best_metric_max, best_metric_min = best_checkpt_table.data[-1]

    # Update the best metrics as needed

    # Add the updated data to the table
    best_checkpt_table.add_data(best_iter, best_metric_max, best_metric_min)

    # Reinitialize the table with its updated data to ensure compatibility
    best_checkpt_table = wandb.Table(
        columns=["col1", "col2", "col3"], data=best_checkpt_table.data
    )

    # Log the updated table to W&B
    run.log({table_name: best_checkpt_table})

Retrieve data

Once data is in a Table, access it by column or by row:

Row Iterator: Use the row iterator of Table, such as for ndx, row in table.iterrows(): ..., to efficiently iterate over the data’s rows.
Get a Column: Retrieve a column of data using table.get_column("col_name"). As a convenience, pass convert_to="numpy" to convert the column to a NumPy NDArray of primitives. This is useful if your column contains media types such as wandb.Image so that you can access the underlying data directly.

Save tables

After you generate a table of data in your script, for example a table of model predictions, save it to W&B to visualize the results live.

Log a table to a run

Use wandb.Run.log() to save your table to the run, like so:

with wandb.init() as run:
    my_table = wandb.Table(columns=["a", "b"], data=[["1a", "1b"], ["2a", "2b"]])
    run.log({"table_key": my_table})

Each time a table is logged to the same key, a new version of the table is created and stored in the backend. This means you can log the same table across multiple training steps to see how model predictions improve over time, or compare tables across different runs, as long as they’re logged to the same key. You can log up to 200,000 rows.

To log more than 200,000 rows, you can override the limit with:

wandb.Table.MAX_ARTIFACT_ROWS = X

However, this would likely cause performance issues, such as slower queries, in the UI.
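If raising the limit is not desirable, one alternative is to split the rows across several smaller tables. This is only a sketch of that idea, not a documented W&B recommendation: chunk_rows is a hypothetical helper, and logging each chunk under its own key is an assumption about how you might organize the result.

```python
def chunk_rows(rows, max_rows=200_000):
    """Yield consecutive slices of `rows`, each with at most `max_rows` entries."""
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    for start in range(0, len(rows), max_rows):
        yield rows[start : start + max_rows]


rows = [[i, i * i] for i in range(5)]
for part, chunk in enumerate(chunk_rows(rows, max_rows=2)):
    # With W&B, each chunk could become its own table, for example:
    # run.log({f"table_part_{part}": wandb.Table(columns=["n", "n_squared"], data=chunk)})
    print(part, chunk)
```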

Access tables programmatically

In the backend, Tables are persisted as Artifacts. If you are interested in accessing a specific version, you can do so with the artifact API:

with wandb.init() as run:
    my_table = run.use_artifact("run-<run-id>-<table-name>:<tag>").get("<table-name>")

For more information on Artifacts, see the Artifacts Chapter in the Developer Guide.
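The artifact reference in the preceding snippet follows the pattern run-<run-id>-<table-name>:<tag>. As a small sketch, a hypothetical helper can assemble that string. The function name and the default tag of "latest" are assumptions made for illustration.

```python
def run_table_artifact_name(run_id, table_name, tag="latest"):
    """Build the artifact reference for a table logged with run.log()."""
    return f"run-{run_id}-{table_name}:{tag}"


print(run_table_artifact_name("wlby6fuw", "table_key"))  # run-wlby6fuw-table_key:latest
```

You could pass the result to run.use_artifact(...) and then call .get("table_key") on the returned artifact, as in the preceding snippet.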

Visualize tables

Any table logged this way will show up in your Workspace on both the Run Page and the Project Page. For more information, see Visualize and Analyze Tables.

Artifact tables

Use artifact.add() to log tables to the Artifacts section of your run instead of the workspace. This could be useful if you have a dataset that you want to log once and then reference for future runs.

with wandb.init(project="my_project") as run:
    # create a wandb Artifact for each meaningful step
    test_predictions = wandb.Artifact("mnist_test_preds", type="predictions")

    # [build up your predictions data as above]
    test_table = wandb.Table(data=data, columns=columns)
    test_predictions.add(test_table, "my_test_key")
    run.log_artifact(test_predictions)

Refer to this Colab for a detailed example of artifact.add() with image data and this Report for an example of how to use Artifacts and Tables to version control and deduplicate tabular data.

Join Artifact tables

You can join tables you have locally constructed or tables you have retrieved from other artifacts using wandb.JoinedTable(table_1, table_2, join_key).

Args	Description
table_1	(str, `wandb.Table`, ArtifactEntry) the path to a `wandb.Table` in an artifact, the table object, or ArtifactEntry
table_2	(str, `wandb.Table`, ArtifactEntry) the path to a `wandb.Table` in an artifact, the table object, or ArtifactEntry
join_key	(str, [str, str]) key or keys on which to perform the join

To join two Tables you have logged previously in an artifact context, fetch them from the artifact and join the result into a new Table.

For example, the following code example demonstrates how to read one Table of original songs called 'original_songs' and another Table of synthesized versions of the same songs called 'synth_songs'. The code joins the two tables on "song_id", and uploads the resulting table as a new W&B Table:

import wandb

with wandb.init(project="my_project") as run:

    # fetch original songs table
    orig_songs = run.use_artifact("original_songs:latest")
    orig_table = orig_songs.get("original_samples")

    # fetch synthesized songs table
    synth_songs = run.use_artifact("synth_songs:latest")
    synth_table = synth_songs.get("synth_samples")

    # join tables on "song_id"
    join_table = wandb.JoinedTable(orig_table, synth_table, "song_id")
    join_at = wandb.Artifact("synth_summary", "analysis")

    # add table to artifact and log to W&B
    join_at.add(join_table, "synth_explore")
    run.log_artifact(join_at)
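To illustrate what joining on a key such as "song_id" means, the following sketch performs an inner join on plain Python dictionaries. It is not how wandb.JoinedTable is implemented, and whether W&B keeps unmatched rows is not covered here. The function name, the left. and right. prefixes, and the sample rows are all hypothetical, and the sketch assumes the key is unique within each table.

```python
def inner_join(left_rows, right_rows, key):
    """Join two lists of dict rows on `key`, keeping only keys present in both."""
    right_by_key = {row[key]: row for row in right_rows}
    joined = []
    for left in left_rows:
        right = right_by_key.get(left[key])
        if right is not None:
            # Prefix the non-key fields so columns from each side stay distinct
            merged = {key: left[key]}
            merged.update({f"left.{k}": v for k, v in left.items() if k != key})
            merged.update({f"right.{k}": v for k, v in right.items() if k != key})
            joined.append(merged)
    return joined


orig = [{"song_id": 1, "title": "A"}, {"song_id": 2, "title": "B"}]
synth = [{"song_id": 2, "tempo": 120}, {"song_id": 3, "tempo": 90}]
print(inner_join(orig, synth, "song_id"))
# [{'song_id': 2, 'left.title': 'B', 'right.tempo': 120}]
```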

Read this tutorial for an example of how to combine two previously stored tables from different Artifact objects.

2.1.6.8 - Track CSV files with experiments

Importing and logging data into W&B

Use the W&B Python Library to log a CSV file and visualize it in a W&B Dashboard. W&B Dashboards are the central place to organize and visualize results from your machine learning models. This is particularly useful if you have a CSV file that contains information about previous machine learning experiments that are not logged in W&B, or if you have a CSV file that contains a dataset.

Import and log your dataset CSV file

We suggest you use W&B Artifacts to make the contents of the CSV file easier to reuse.

To get started, first import your CSV file. In the following code snippet, replace iris.csv with the name of your CSV file:

import wandb
import pandas as pd

# Read our CSV into a new DataFrame
new_iris_dataframe = pd.read_csv("iris.csv")

Convert the CSV file to a W&B Table to utilize W&B Dashboards.

# Convert the DataFrame into a W&B Table
iris_table = wandb.Table(dataframe=new_iris_dataframe)

Next, create a W&B Artifact and add the table to the Artifact:

# Add the table to an Artifact to increase the row
# limit to 200000 and make it easier to reuse
iris_table_artifact = wandb.Artifact("iris_artifact", type="dataset")
iris_table_artifact.add(iris_table, "iris_table")

# Log the raw csv file within an artifact to preserve our data
iris_table_artifact.add_file("iris.csv")

For more information about W&B Artifacts, see the Artifacts chapter.

Lastly, start a new W&B Run to track and log to W&B with wandb.init:

# Start a W&B run to log data
run = wandb.init(project="tables-walkthrough")

# Log the table to visualize with a run...
run.log({"iris": iris_table})

# and Log as an Artifact to increase the available row limit!
run.log_artifact(iris_table_artifact)

The wandb.init() API spawns a new background process to log data to a Run, and it synchronizes data to wandb.ai (by default). View live visualizations on your W&B Workspace Dashboard. The following image shows the output of the preceding code snippets.

The full script with the preceding code snippets is found below:

import wandb
import pandas as pd

# Read our CSV into a new DataFrame
new_iris_dataframe = pd.read_csv("iris.csv")

# Convert the DataFrame into a W&B Table
iris_table = wandb.Table(dataframe=new_iris_dataframe)

# Add the table to an Artifact to increase the row
# limit to 200000 and make it easier to reuse
iris_table_artifact = wandb.Artifact("iris_artifact", type="dataset")
iris_table_artifact.add(iris_table, "iris_table")

# log the raw csv file within an artifact to preserve our data
iris_table_artifact.add_file("iris.csv")

# Start a W&B run to log data
run = wandb.init(project="tables-walkthrough")

# Log the table to visualize with a run...
run.log({"iris": iris_table})

# and Log as an Artifact to increase the available row limit!
run.log_artifact(iris_table_artifact)

# Finish the run (useful in notebooks)
run.finish()

Import and log your CSV of Experiments

In some cases, you might have your experiment details in a CSV file. Common details found in such CSV files include:

A name for the experiment run
Initial notes
Tags to differentiate the experiments
Configurations needed for your experiment (with the added benefit of being able to utilize our Sweeps Hyperparameter Tuning).

Experiment	Model Name	Notes	Tags	Num Layers	Final Train Acc	Final Val Acc	Training Losses
Experiment 1	mnist-300-layers	Overfit way too much on training data	[latest]	300	0.99	0.90	[0.55, 0.45, 0.44, 0.42, 0.40, 0.39]
Experiment 2	mnist-250-layers	Current best model	[prod, best]	250	0.95	0.96	[0.55, 0.45, 0.44, 0.42, 0.40, 0.39]
Experiment 3	mnist-200-layers	Did worse than the baseline model. Need to debug	[debug]	200	0.76	0.70	[0.55, 0.45, 0.44, 0.42, 0.40, 0.39]
…	…	…	…	…	…	…
Experiment N	mnist-X-layers	NOTES	…	…	…	…	[…, …]

W&B can take a CSV file of experiments and convert each row into a W&B Experiment Run. The following code snippets and script demonstrate how to import and log your CSV file of experiments:

To get started, first read in your CSV file and convert it into a Pandas DataFrame. Replace "experiments.csv" with the name of your CSV file:

import wandb
import pandas as pd

FILENAME = "experiments.csv"
loaded_experiment_df = pd.read_csv(FILENAME)

PROJECT_NAME = "Converted Experiments"

EXPERIMENT_NAME_COL = "Experiment"
NOTES_COL = "Notes"
TAGS_COL = "Tags"
CONFIG_COLS = ["Num Layers"]
SUMMARY_COLS = ["Final Train Acc", "Final Val Acc"]
METRIC_COLS = ["Training Losses"]

# Format Pandas DataFrame to make it easier to work with
for i, row in loaded_experiment_df.iterrows():
    run_name = row[EXPERIMENT_NAME_COL]
    notes = row[NOTES_COL]
    tags = row[TAGS_COL]

    config = {}
    for config_col in CONFIG_COLS:
        config[config_col] = row[config_col]

    metrics = {}
    for metric_col in METRIC_COLS:
        metrics[metric_col] = row[metric_col]

    summaries = {}
    for summary_col in SUMMARY_COLS:
        summaries[summary_col] = row[summary_col]

Next, start a new W&B Run to track and log to W&B with wandb.init():

run = wandb.init(
    project=PROJECT_NAME, name=run_name, tags=tags, notes=notes, config=config
)

As an experiment runs, you might want to log every instance of your metrics so they are available to view, query, and analyze with W&B. Use the run.log() command to accomplish this:

run.log({key: val})

You can optionally log a final summary metric to define the outcome of the run using the define_metric API. This example adds the summary metrics to our run with run.summary.update():

run.summary.update(summaries)

For more information about summary metrics, see Log Summary Metrics.

Below is the full example script that converts the above sample table into a W&B Dashboard:

import wandb
import pandas as pd

FILENAME = "experiments.csv"
loaded_experiment_df = pd.read_csv(FILENAME)

PROJECT_NAME = "Converted Experiments"

EXPERIMENT_NAME_COL = "Experiment"
NOTES_COL = "Notes"
TAGS_COL = "Tags"
CONFIG_COLS = ["Num Layers"]
SUMMARY_COLS = ["Final Train Acc", "Final Val Acc"]
METRIC_COLS = ["Training Losses"]

for i, row in loaded_experiment_df.iterrows():
    run_name = row[EXPERIMENT_NAME_COL]
    notes = row[NOTES_COL]
    tags = row[TAGS_COL]

    config = {}
    for config_col in CONFIG_COLS:
        config[config_col] = row[config_col]

    metrics = {}
    for metric_col in METRIC_COLS:
        metrics[metric_col] = row[metric_col]

    summaries = {}
    for summary_col in SUMMARY_COLS:
        summaries[summary_col] = row[summary_col]

    run = wandb.init(
        project=PROJECT_NAME, name=run_name, tags=tags, notes=notes, config=config
    )

    for key, val in metrics.items():
        if isinstance(val, list):
            for _val in val:
                run.log({key: _val})
        else:
            run.log({key: val})

    run.summary.update(summaries)
    run.finish()
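Note that pd.read_csv returns cells such as [0.55, 0.45, 0.44] or [prod, best] as strings, so the isinstance(val, list) check in the script only takes effect after those cells are converted to lists. The following sketch shows one way to do that conversion. parse_list_cell is a hypothetical helper, and it assumes items are separated by commas and contain no commas themselves.

```python
def parse_list_cell(cell, cast=str):
    """Convert a CSV cell such as "[0.55, 0.45]" into a Python list.

    Hypothetical helper: `cast` converts each item, for example `float` for losses.
    """
    text = str(cell).strip()
    # Drop the surrounding brackets if present
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]
    items = [item.strip() for item in text.split(",")]
    # Skip empty items so that "[]" becomes an empty list
    return [cast(item) for item in items if item]


print(parse_list_cell("[0.55, 0.45, 0.44]", cast=float))  # [0.55, 0.45, 0.44]
print(parse_list_cell("[prod, best]"))  # ['prod', 'best']
```

In the script, you could apply it as tags = parse_list_cell(row[TAGS_COL]) and metrics[metric_col] = parse_list_cell(row[metric_col], cast=float).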

2.1.7 - Track Jupyter notebooks

Use W&B with Jupyter to get interactive visualizations without leaving your notebook. Combine custom analysis, experiments, and prototypes, all fully logged.

Use cases for W&B with Jupyter notebooks

Iterative experimentation: Run and re-run experiments, tweaking parameters, and have all the runs you do saved automatically to W&B without having to take manual notes along the way.
Code saving: When reproducing a model, it’s hard to know which cells in a notebook ran, and in which order. Turn on code saving on your settings page to save a record of cell execution for each experiment.
Custom analysis: Once runs are logged to W&B, it’s easy to get a dataframe from the API and do custom analysis, then log those results to W&B to save and share in reports.

Getting started in a notebook

Start your notebook with the following code to install W&B and link your account:

!pip install wandb -qqq
import wandb
wandb.login()

Next, set up your experiment and save hyperparameters:

wandb.init(
    project="jupyter-projo",
    config={
        "batch_size": 128,
        "learning_rate": 0.01,
        "dataset": "CIFAR-100",
    },
)

After running wandb.init(), start a new cell with %%wandb to see live graphs in the notebook. If you run this cell multiple times, data will be appended to the run.

%%wandb

# Your training loop here

Try it for yourself in this example notebook.

Rendering live W&B interfaces directly in your notebooks

You can also display any existing dashboards, sweeps, or reports directly in your notebook using the %wandb magic:

# Display a project workspace
%wandb USERNAME/PROJECT
# Display a single run
%wandb USERNAME/PROJECT/runs/RUN_ID
# Display a sweep
%wandb USERNAME/PROJECT/sweeps/SWEEP_ID
# Display a report
%wandb USERNAME/PROJECT/reports/REPORT_ID
# Specify the height of embedded iframe
%wandb USERNAME/PROJECT -h 2048

As an alternative to the %%wandb or %wandb magics, after running wandb.init() you can end any cell with wandb.Run.finish() to show in-line graphs, or call IPython.display.display(...) on any report, sweep, or run object returned from our APIs.

import wandb
from IPython.display import display
# Initialize a run
run = wandb.init()

# If cell outputs run.finish(), you'll see live graphs
run.finish()

Want to know more about what you can do with W&B? Check out our guide to logging data and media, learn how to integrate us with your favorite ML toolkits, or just dive straight into the reference docs or our repo of examples.

Additional Jupyter features in W&B

Easy authentication in Colab: When you call wandb.init for the first time in a Colab, we automatically authenticate your runtime if you’re currently logged in to W&B in your browser. On the overview tab of your run page, you’ll see a link to the Colab.
Jupyter Magic: Display dashboards, sweeps and reports directly in your notebooks. The %wandb magic accepts a path to your project, sweeps or reports and will render the W&B interface directly in the notebook.
Launch dockerized Jupyter: Call wandb docker --jupyter to launch a docker container, mount your code in it, ensure Jupyter is installed, and launch on port 8888.
Run cells in arbitrary order without fear: By default, we wait until the next time wandb.init is called to mark a run as finished. That allows you to run multiple cells (say, one to set up data, one to train, one to test) in whatever order you like and have them all log to the same run. If you turn on code saving in settings, you’ll also log the cells that were executed, in order and in the state in which they were run, enabling you to reproduce even the most non-linear of pipelines. To mark a run as complete manually in a Jupyter notebook, call run.finish():

import wandb

run = wandb.init()

# training script and logging goes here

run.finish()

2.1.8 - Experiments limits and performance

Keep your pages in W&B faster and more responsive by logging within the following suggested bounds.

Logging considerations

Use wandb.Run.log() to track experiment metrics.

Distinct metric count

For faster performance, keep the total number of distinct metrics in a project under 10,000.

import wandb

with wandb.init() as run:
    run.log(
        {
            "a": 1,  # "a" is a distinct metric
            "b": {
                "c": "hello",  # "b.c" is a distinct metric
                "d": [1, 2, 3],  # "b.d" is a distinct metric
            },
        }
    )

W&B automatically flattens nested values. This means that if you pass a dictionary, W&B turns it into a dot-separated name. For config values, W&B supports 3 dots in the name. For summary values, W&B supports 4 dots.
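The flattening rule can be sketched in a few lines of Python. This is an illustration of the naming scheme described above, not W&B's own code, and flatten is a hypothetical helper. Counting the keys of the result is one way to estimate how many distinct metrics a payload contributes.

```python
def flatten(values, prefix=""):
    """Flatten nested dictionaries into dot-separated metric names."""
    flat = {}
    for key, value in values.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into nested dictionaries, extending the dotted name
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


logged = {"a": 1, "b": {"c": "hello", "d": [1, 2, 3]}}
print(sorted(flatten(logged)))  # ['a', 'b.c', 'b.d']
```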

If your workspace suddenly slows down, check whether recent runs have unintentionally logged thousands of new metrics. (This is easiest to spot by seeing sections with thousands of plots that have only one or two runs visible on them.) If they have, consider deleting those runs and recreating them with the desired metrics.

Value width

Limit the size of a single logged value to under 1 MB and the total size of a single run.log call to under 25 MB. This limit does not apply to wandb.Media types like wandb.Image, wandb.Audio, etc.

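import json

import wandb

run = wandb.init(project="wide-values")

# not recommended: a single value with 10 million entries
run.log({"wide_key": list(range(10000000))})

# not recommended: logging the full contents of a large file in one call
with open("large_file.json", "r") as f:
    large_data = json.load(f)
    run.log(large_data)

run.finish()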

Wide values can affect the plot load times for all metrics in the run, not just the metric with the wide values.

Data is saved and tracked even if you log values wider than the recommended amount. However, your plots may load more slowly.
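
One way to catch wide values before they reach W&B is to estimate their size locally. The sketch below uses the length of the JSON encoding as an approximation; how W&B measures size internally may differ, and treating 1 MB as 1024 * 1024 bytes is an assumption. The helper names are hypothetical.

```python
import json

MAX_VALUE_BYTES = 1 * 1024 * 1024  # suggested bound for a single logged value (assumed MiB)


def approx_json_bytes(obj):
    """Approximate the size of obj as UTF-8 encoded JSON."""
    return len(json.dumps(obj).encode("utf-8"))


def oversized_keys(payload):
    """Return the keys whose values exceed the single-value bound."""
    return [key for key, value in payload.items() if approx_json_bytes(value) > MAX_VALUE_BYTES]


payload = {"loss": 0.25, "wide_key": list(range(1_000_000))}
print(oversized_keys(payload))  # ['wide_key']
```

This check only works for JSON-serializable values. Media objects such as wandb.Image are not JSON-serializable and are not subject to this limit, so filter them out before calling the helper.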

Metric frequency

Pick a logging frequency that is appropriate to the metric you are logging. As a general rule of thumb, log wider values less frequently than narrower values. W&B recommends:

Scalars: <100,000 logged points per metric
Media: <50,000 logged points per metric
Histograms: <10,000 logged points per metric

import wandb

with wandb.init(project="metric-frequency") as run:
    # Not recommended
    run.log(
        {
            "scalar": 1,  # 100,000 scalars
            "media": wandb.Image(...),  # 100,000 images
            "histogram": wandb.Histogram(...),  # 100,000 histograms
        }
    )

    # Recommended
    run.log(
        {
            "scalar": 1,  # 100,000 scalars
        },
        commit=True,
    )  # Commit batched, per-step metrics together

    run.log(
        {
            "media": wandb.Image(...),  # 50,000 images
        },
        commit=False,
    )
    
    run.log(
        {
            "histogram": wandb.Histogram(...),  # 10,000 histograms
        },
        commit=False,
    )

W&B continues to accept your logged data but pages may load more slowly if you exceed guidelines.
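
To stay within a point budget over a long training run, you can derive a logging interval from the total number of steps. This is a minimal sketch; `log_interval` and the budget value are hypothetical choices, with the budget picked to sit below the suggested bound for scalars.

```python
import math

SCALAR_POINT_BUDGET = 50_000  # a budget chosen below the suggested 100,000 bound


def log_interval(total_steps, max_points=SCALAR_POINT_BUDGET):
    """Smallest step interval that keeps logged points at or below max_points."""
    return max(1, math.ceil(total_steps / max_points))


total_steps = 1_000_000
every = log_interval(total_steps)

# Count the steps that would be logged when logging only every `every` steps.
logged_points = sum(1 for step in range(total_steps) if step % every == 0)
print(every, logged_points)  # 20 50000
```

In a training loop, you would call run.log() only when `step % every == 0`, and use a larger interval for media and histograms than for scalars.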

Config size

Limit the total size of your run config to less than 10 MB. Logging large values could slow down your project workspaces and runs table operations.

import json
import wandb

# Recommended
with wandb.init(
    project="config-size",
    config={
        "lr": 0.1,
        "batch_size": 32,
        "epochs": 4,
    }
) as run:
    # Your training code here
    pass

# Not recommended
with wandb.init(
    project="config-size",
    config={
        "large_list": list(range(10000000)),  # Large list
        "large_string": "a" * 10000000,  # Large string
    }
) as run:
    # Your training code here
    pass

# Not recommended
with open("large_config.json", "r") as f:
    large_config = json.load(f)
    wandb.init(config=large_config)

Workspace considerations

Run count

To reduce loading times, keep the total number of runs in a single project under:

100,000 on SaaS Cloud
10,000 on Dedicated Cloud or Self-managed

Run counts over these thresholds can slow down operations that involve project workspaces or runs tables, especially when grouping runs or collecting a large number of distinct metrics during runs. See also the Metric count section.

If your team accesses the same set of runs frequently, such as the set of recent runs, consider moving less frequently used runs in bulk to a new “archive” project, leaving a smaller set of runs in your working project.

Workspace performance

This section gives tips for optimizing the performance of your workspace.

Panel count

By default, a workspace is automatic, and generates standard panels for each logged key. If a workspace for a large project includes panels for many logged keys, the workspace may be slow to load and use. To improve performance, you can:

Reset the workspace to manual mode, which includes no panels by default.
Use Quick add to selectively add panels for the logged keys you need to visualize.

Deleting unused panels one at a time has little impact on performance. Instead, reset the workspace and selectively add back only those panels you need.

To learn more about configuring your workspace, refer to Panels.

Section count

Having hundreds of sections in a workspace can hurt performance. Consider creating sections based on high-level groupings of metrics and avoiding an anti-pattern of one section for each metric.

If you find you have too many sections and performance is slow, consider the workspace setting to create sections by prefix rather than suffix, which can result in fewer sections and better performance.

Metric count

When logging between 5,000 and 100,000 metrics per run, W&B recommends using a manual workspace. In Manual mode, you can easily add and remove panels in bulk as you choose to explore different sets of metrics. With a more focused set of plots, the workspace loads faster. Metrics that are not plotted are still collected and stored as usual.

To reset a workspace to manual mode, click the workspace’s action ... menu, then click Reset workspace. Resetting a workspace has no impact on stored metrics for runs. See workspace panel management.

File count

Keep the total number of files uploaded for a single run under 1,000. You can use W&B Artifacts when you need to log a large number of files. Exceeding 1,000 files in a single run can slow down your run pages.

Reports vs. Workspaces

A report is a free-form composition of arbitrary arrangements of panels, text, and media, allowing you to easily share your insights with colleagues.

By contrast, a workspace allows high-density and performant analysis of dozens to thousands of metrics across hundreds to hundreds of thousands of runs. Workspaces have optimized caching, querying, and loading capabilities, when compared to reports. Workspaces are recommended for a project that is used primarily for analysis, rather than presentation, or when you need to show 20 or more plots together.

Python script performance

Several factors can reduce the performance of your Python script:

Large data sizes. Logging large amounts of data can add more than 1 ms of overhead to the training loop.
The speed of your network and how the W&B backend is configured.
Frequent calls to wandb.Run.log(). Each call adds a small latency to the training loop, so calling it more than a few times per second can slow down training.

Is frequent logging slowing your training runs down? Check out this Colab for methods to get better performance by changing your logging strategy.

W&B does not assert any limits beyond rate limiting. The W&B Python SDK automatically retries requests that exceed limits, using exponential backoff. In this case, the W&B Python SDK reports a “Network failure” on the command line. For unpaid accounts, W&B may reach out in extreme cases where usage exceeds reasonable thresholds.
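
If you make your own requests outside the SDK, the same retry pattern can be sketched in a few lines. This is a generic illustration of exponential backoff, not the SDK's actual retry logic; `retry_with_backoff`, its parameters, and the use of ConnectionError as the retryable error are all assumptions.

```python
import random
import time


def retry_with_backoff(call, max_attempts=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call `call()` until it succeeds, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return call()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add jitter so that many clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 10))


# Demonstration with a stand-in request that fails twice, then succeeds.
attempts = {"count": 0}


def flaky_request():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated network failure")
    return "ok"


waits = []  # record the waits instead of sleeping, to keep the demo fast
result = retry_with_backoff(flaky_request, sleep=waits.append)
print(result, len(waits))  # ok 2
```

Passing `sleep` as a parameter keeps the function easy to test; in real use, leave it at the default `time.sleep`.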

Rate limits

W&B SaaS Cloud API implements a rate limit to maintain system integrity and ensure availability. This measure prevents any single user from monopolizing available resources in the shared infrastructure, ensuring that the service remains accessible to all users. You may encounter a lower rate limit for a variety of reasons.

Rate limits are subject to change.

If you encounter a rate limit, you receive a HTTP 429 Rate limit exceeded error and the response includes rate limit HTTP headers.

Rate limit HTTP headers

The following table describes rate limit HTTP headers:

Header name	Description
RateLimit-Limit	The amount of quota available per time window, scaled in the range of 0 to 1000
RateLimit-Remaining	The amount of quota remaining in the current rate limit window, scaled in the range of 0 to 1000
RateLimit-Reset	The number of seconds until the current quota resets

Rate limits on metric logging API

wandb.Run.log() logs your training data to W&B. This API is engaged through either online or offline syncing. In either case, it imposes a rate limit quota in a rolling time window. This includes limits on total request size and request rate, where the latter refers to the number of requests in a time duration.

W&B applies rate limits per W&B project. So if you have 3 projects in a team, each project has its own rate limit quota. Users on Paid plans have higher rate limits than Free plans.

If you encounter a rate limit, you receive a HTTP 429 Rate limit exceeded error and the response includes rate limit HTTP headers.

Suggestions for staying under the metrics logging API rate limit

Exceeding the rate limit may delay run.finish() until the rate limit resets. To avoid this, consider the following strategies:

Update your W&B Python SDK version: Ensure you are using the latest version of the W&B Python SDK. The W&B Python SDK is regularly updated and includes enhanced mechanisms for gracefully retrying requests and optimizing quota usage.
Reduce metric logging frequency: Minimize the frequency of logging metrics to conserve your quota. For example, you can modify your code to log metrics every five epochs instead of every epoch:

import wandb
import random

with wandb.init(project="basic-intro") as run:
    for epoch in range(1, 11):  # start at 1 to avoid dividing by zero below
        # Simulate training and evaluation
        accuracy = 1 - 2 ** -epoch - random.random() / epoch
        loss = 2 ** -epoch + random.random() / epoch

        # Log metrics every 5 epochs
        if epoch % 5 == 0:
            run.log({"acc": accuracy, "loss": loss})

Manual data syncing: W&B stores your run data locally if you are rate limited. You can manually sync your data with the command wandb sync <run-file-path>. For more details, see the wandb sync reference.

Rate limits on GraphQL API

The W&B Models UI and SDK’s public API make GraphQL requests to the server for querying and modifying data. For all GraphQL requests in SaaS Cloud, W&B applies rate limits per IP address for unauthorized requests and per user for authorized requests. The limit is based on request rate (requests per second) within a fixed time window, where your pricing plan determines the default limits. For relevant SDK requests that specify a project path (for example, reports, runs, artifacts), W&B applies rate limits per project, measured by database query time.

Users on Teams and Enterprise plans receive higher rate limits than those on the Free plan. When you hit the rate limit while using the W&B Models SDK’s public API, you see a relevant message indicating the error in the standard output.

If you encounter a rate limit, you receive a HTTP 429 Rate limit exceeded error and the response includes rate limit HTTP headers.

Suggestions for staying under the GraphQL API rate limit

If you are fetching a large volume of data using the W&B Models SDK’s public API, consider waiting at least one second between requests. If you receive a HTTP 429 Rate limit exceeded error or see RateLimit-Remaining=0 in the response headers, wait for the number of seconds specified in RateLimit-Reset before retrying.
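
The waiting rule above can be sketched as a small helper that reads the response status and the rate limit headers. This is a minimal illustration; `seconds_until_retry` is a hypothetical name, and it assumes `headers` is a mapping keyed by the exact header names shown in the rate limit headers table.

```python
def seconds_until_retry(status_code, headers, default_wait=1.0):
    """Return how long to wait before the next request."""
    remaining = headers.get("RateLimit-Remaining")
    reset = headers.get("RateLimit-Reset")
    # Treat the request as rate limited on HTTP 429 or when no quota remains.
    limited = status_code == 429 or (remaining is not None and float(remaining) <= 0)
    if limited and reset is not None:
        return float(reset)
    # Otherwise pace requests with the default wait between calls.
    return default_wait


print(seconds_until_retry(429, {"RateLimit-Reset": "30"}))  # 30.0
print(seconds_until_retry(200, {"RateLimit-Remaining": "500"}))  # 1.0
```

The default of one second mirrors the suggestion to wait at least one second between requests when fetching a large volume of data.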

Browser considerations

The W&B app can be memory-intensive and performs best in Chrome. Depending on your computer’s memory, having W&B active in 3+ tabs at once can cause performance to degrade. If you encounter unexpectedly slow performance, consider closing other tabs or applications.

Reporting performance issues to W&B

W&B takes performance seriously and investigates every report of lag. To expedite investigation, when reporting slow loading times consider invoking W&B’s built-in performance logger that captures key metrics and performance events. Append the URL parameter &PERF_LOGGING to a page that is loading slowly, then share the output of your console with your account team or Support.

2.1.9 - Reproduce experiments

Reproduce an experiment that a team member creates to verify and validate their results.

Before you reproduce an experiment, you need to make note of the:

Name of the project the run was logged to
Name of the run you want to reproduce

To reproduce an experiment:

Navigate to the project where the run is logged to.
Select the Workspace tab in the left sidebar.
From the list of runs, select the run that you want to reproduce.
Click Overview.

To continue, download the experiment’s code at a given hash or clone the experiment’s entire repository.

Download the experiment’s Python script or notebook:

In the Command field, make a note of the name of the script that created the experiment.
Select the Code tab in the left navigation bar.
Click Download next to the file that corresponds to the script or notebook.

Clone the GitHub repository your teammate used when creating the experiment. To do this:

If necessary, gain access to the GitHub repository that your teammate used to create the experiment.
Copy the Git repository field, which contains the GitHub repository URL.

Clone the repository:

git clone https://github.com/your-repo.git && cd your-repo

Copy and paste the Git state field into your terminal. The Git state is a set of Git commands that checks out the exact commit that your teammate used to create the experiment. Replace values specified in the following code snippet with your own:
```
git checkout -b "<run-name>" 0123456789012345678901234567890123456789
```

Select Files in the left navigation bar.
Download the requirements.txt file and store it in your working directory. This directory should contain either the cloned GitHub repository or the downloaded Python script or notebook.
(Recommended) Create a Python virtual environment.
Install the requirements specified in the requirements.txt file.
```
pip install -r requirements.txt
```
Now that you have the code and dependencies, you can run the script or notebook to reproduce the experiment. If you cloned a repository, you might need to navigate to the directory where the script or notebook is located. Otherwise, you can run the script or notebook from your working directory.

If you downloaded a Python notebook, navigate to the directory where you downloaded the notebook and run the following command in your terminal:

jupyter notebook

If you downloaded a Python script, navigate to the directory where you downloaded the script and run the following command in your terminal. Replace values enclosed in <> with your own:

python <your-script-name>.py

2.1.10 - Import and export data

Import data from MLFlow, export or update data that you have saved to W&B

Export data or import data with W&B Public APIs.

This feature requires python>=3.8

Import data from MLFlow

W&B supports importing data from MLFlow, including experiments, runs, artifacts, metrics, and other metadata.

Install dependencies:

# note: this requires py38+
pip install wandb[importers]

wandb login

Import all runs from an existing MLFlow server:

from wandb.apis.importers.mlflow import MlflowImporter

importer = MlflowImporter(mlflow_tracking_uri="...")

runs = importer.collect_runs()
importer.import_runs(runs)

By default, importer.collect_runs() collects all runs from the MLFlow server. If you prefer to upload a special subset, you can construct your own runs iterable and pass it to the importer.

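from typing import List

import mlflow
from wandb.apis.importers.mlflow import MlflowRun

mlflow_tracking_uri = "..."  # the same tracking URI you passed to MlflowImporter
client = mlflow.tracking.MlflowClient(mlflow_tracking_uri)

# Build a custom list of runs to import
runs: List[MlflowRun] = []
for run in client.search_runs(...):
    runs.append(MlflowRun(run, client))

importer.import_runs(runs)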

You might need to configure the Databricks CLI first if you import from Databricks MLFlow.

Set mlflow_tracking_uri="databricks" in the previous step.

To skip importing artifacts, you can pass artifacts=False:

importer.import_runs(runs, artifacts=False)

To import to a specific W&B entity and project, you can pass a Namespace:

from wandb.apis.importers import Namespace

importer.import_runs(runs, namespace=Namespace(entity, project))

Export Data

Use the Public API to export or update data that you have saved to W&B. Before using this API, log data from your script. Check the Quickstart for more details.

Use Cases for the Public API

Export Data: Pull down a dataframe for custom analysis in a Jupyter Notebook. Once you have explored the data, you can sync your findings by creating a new analysis run and logging results, for example: wandb.init(job_type="analysis")
Update Existing Runs: You can update the data logged in association with a W&B run. For example, you might want to update the config of a set of runs to include additional information, like the architecture or a hyperparameter that wasn’t originally logged.

See the Generated Reference Docs for details on available functions.

Create an API key

An API key authenticates your machine to W&B. You can generate an API key from your user profile.

For a more streamlined approach, you can generate an API key by going directly to the W&B authorization page. Copy the displayed API key and save it in a secure location such as a password manager.

Click your user profile icon in the upper right corner.
Select User Settings, then scroll to the API Keys section.
Click Reveal. Copy the displayed API key. To hide the API key, reload the page.

Find the run path

To use the Public API, you’ll often need the run path which is <entity>/<project>/<run_id>. In the app UI, open a run page and click the Overview tab to get the run path.
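
If you assemble run paths in code, a small helper can catch malformed paths before they reach the API. This is a minimal sketch; `make_run_path` and `split_run_path` are hypothetical helpers, and the example values are placeholders.

```python
def make_run_path(entity, project, run_id):
    """Join the three components into <entity>/<project>/<run_id>."""
    for part in (entity, project, run_id):
        if not part or "/" in part:
            raise ValueError(f"invalid run path component: {part!r}")
    return f"{entity}/{project}/{run_id}"


def split_run_path(path):
    """Split a run path back into (entity, project, run_id)."""
    parts = path.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <entity>/<project>/<run_id>, got {path!r}")
    return tuple(parts)


path = make_run_path("my-team", "my-project", "abc12345")
print(path)  # my-team/my-project/abc12345
print(split_run_path(path))  # ('my-team', 'my-project', 'abc12345')
```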

Export Run Data

Download data from a finished or active run. Common usage includes downloading a dataframe for custom analysis in a Jupyter notebook, or using custom logic in an automated environment.

import wandb

api = wandb.Api()
run = api.run("<entity>/<project>/<run_id>")

The most commonly used attributes of a run object are:

Attribute	Meaning
`run.config`	A dictionary of the run’s configuration information, such as the hyperparameters for a training run or the preprocessing methods for a run that creates a dataset Artifact. Think of these as the run’s inputs.
`run.history()`	A list of dictionaries meant to store values that change while the model is training such as loss. The command `run.log()` appends to this object.
`run.summary`	A dictionary of information that summarizes the run’s results. This can be scalars like accuracy and loss, or large files. By default, `run.log()` sets the summary to the final value of a logged time series. The contents of the summary can also be set directly. Think of the summary as the run’s outputs.

You can also modify or update the data of past runs. By default a single instance of an api object will cache all network requests. If your use case requires real time information in a running script, call api.flush() to get updated values.

Understanding different run attributes

The following code snippet shows how to create a run, log some data, and then access the run’s attributes:

import wandb
import random

with wandb.init(project="public-api-example") as run:
    n_epochs = 5
    config = {"n_epochs": n_epochs}
    run.config.update(config)
    for n in range(run.config.get("n_epochs")):
        run.log(
            {"val": random.randint(0, 1000), "loss": (random.randint(0, 1000) / 1000.00)}
        )

The following sections describe the different outputs for the above run object attributes.

`run.config`

{"n_epochs": 5}

`run.summary`

{
    "_runtime": 4,
    "_step": 4,
    "_timestamp": 1644345412,
    "_wandb": {"runtime": 3},
    "loss": 0.041,
    "val": 525,
}

Sampling

The default history method samples the metrics to a fixed number of samples (the default is 500; you can change this with the samples argument). If you want to export all of the data on a large run, you can use the run.scan_history() method. For more details see the API Reference.

Querying Multiple Runs

This example script finds a project and outputs a CSV of runs with name, configs and summary stats. Replace <entity> and <project> with your W&B entity and the name of your project, respectively.

import pandas as pd
import wandb

api = wandb.Api()
entity, project = "<entity>", "<project>"
runs = api.runs(entity + "/" + project)

summary_list, config_list, name_list = [], [], []
for run in runs:
    # .summary contains output keys/values for
    # metrics such as accuracy.
    #  We call ._json_dict to omit large files
    summary_list.append(run.summary._json_dict)

    # .config contains the hyperparameters.
    #  We remove special values that start with _.
    config_list.append({k: v for k, v in run.config.items() if not k.startswith("_")})

    # .name is the human-readable name of the run.
    name_list.append(run.name)

runs_df = pd.DataFrame(
    {"summary": summary_list, "config": config_list, "name": name_list}
)

runs_df.to_csv("project.csv")

The W&B API also provides a way for you to query across runs in a project with api.runs(). The most common use case is exporting runs data for custom analysis. The query interface is the same as the one MongoDB uses.

runs = api.runs(
    "username/project",
    {"$or": [{"config.experiment_name": "foo"}, {"config.experiment_name": "bar"}]},
)
print(f"Found {len(runs)} runs")

Calling api.runs returns a Runs object that is iterable and acts like a list. By default the object loads 50 runs at a time in sequence as required, but you can change the number loaded per page with the per_page keyword argument.

api.runs also accepts an order keyword argument. The default order is -created_at. To order results ascending, specify +created_at. You can also sort by config or summary values. For example, summary.val_acc or config.experiment_name.
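
Because filters are plain dictionaries, you can build them programmatically. The following sketch builds the `$or` filter shown above from a list of values; `any_of` is a hypothetical helper, and the commented call is only an illustration of where the result would be passed.

```python
def any_of(field, values):
    """Build a MongoDB-style $or filter matching any of the given values."""
    return {"$or": [{field: value} for value in values]}


filters = any_of("config.experiment_name", ["foo", "bar"])
print(filters)

# Example use with the Public API (requires wandb and network access):
# runs = wandb.Api().runs("<entity>/<project>", filters, order="+created_at", per_page=100)
```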

Error Handling

If errors occur while talking to W&B servers a wandb.CommError will be raised. The original exception can be introspected via the exc attribute.

Get the latest git commit through the API

In the UI, click on a run and then click the Overview tab on the run page to see the latest git commit. It’s also in the file wandb-metadata.json. Using the public API, you can get the git hash with run.commit.

Get a run’s name and ID during a run

After calling wandb.init() you can access the random run ID or the human readable run name from your script like this:

Unique run ID (8 character hash): run.id
Random run name (human readable): run.name

If you’re thinking about ways to set useful identifiers for your runs, here’s what we recommend:

Run ID: leave it as the generated hash. This needs to be unique across runs in your project.
Run name: This should be something short, readable, and preferably unique so that you can tell the difference between different lines on your charts.
Run notes: This is a great place to put a quick description of what you’re doing in your run. You can set this with wandb.init(notes="your notes here")
Run tags: Track things dynamically in run tags, and use filters in the UI to filter your table down to just the runs you care about. You can set tags from your script and then edit them in the UI, both in the runs table and the overview tab of the run page. See the detailed instructions here.

Public API Examples

Export data to visualize in matplotlib or seaborn

Check out our API examples for some common export patterns. You can also click the download button on a custom plot or on the expanded runs table to download a CSV from your browser.

Read metrics from a run

This example outputs timestamp and accuracy saved with run.log({"accuracy": acc}) for a run saved to "<entity>/<project>/<run_id>".

import wandb

api = wandb.Api()

run = api.run("<entity>/<project>/<run_id>")
if run.state == "finished":
    for i, row in run.history().iterrows():
        print(row["_timestamp"], row["accuracy"])

Filter runs

You can filter runs by using the MongoDB Query Language.

Date

runs = api.runs(
    "<entity>/<project>",
    {"$and": [{"created_at": {"$lt": "YYYY-MM-DDT##", "$gt": "YYYY-MM-DDT##"}}]},
)

Read specific metrics from a run

To pull specific metrics from a run, use the keys argument. The default number of samples when using run.history() is 500. Logged steps that do not include a specific metric will appear in the output dataframe as NaN. The keys argument will cause the API to sample steps that include the listed metric keys more frequently.

import wandb

api = wandb.Api()

run = api.run("<entity>/<project>/<run_id>")
if run.state == "finished":
    for i, row in run.history(keys=["accuracy"]).iterrows():
        print(row["_timestamp"], row["accuracy"])

Compare two runs

This will output the config parameters that are different between run1 and run2.

import pandas as pd
import wandb

api = wandb.Api()

# replace with your <entity>, <project>, and <run_id>
run1 = api.run("<entity>/<project>/<run_id>")
run2 = api.run("<entity>/<project>/<run_id>")


df = pd.DataFrame([run1.config, run2.config]).transpose()

df.columns = [run1.name, run2.name]
print(df[df[run1.name] != df[run2.name]])

Outputs:

              c_10_sgd_0.025_0.01_long_switch base_adam_4_conv_2fc
batch_size                                 32                   16
n_conv_layers                               5                    4
optimizer                             rmsprop                 adam
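
If you prefer not to depend on pandas, the same comparison can be sketched with plain dictionaries. `config_diff` is a hypothetical helper; the example configs reuse the values from the output above plus an assumed shared `epochs` key, and the sketch assumes config keys are strings.

```python
def config_diff(config_a, config_b):
    """Return {key: (value_a, value_b)} for keys whose values differ."""
    missing = object()  # sentinel for keys present in only one config
    diff = {}
    for key in sorted(set(config_a) | set(config_b)):
        a = config_a.get(key, missing)
        b = config_b.get(key, missing)
        if a != b:
            diff[key] = (None if a is missing else a, None if b is missing else b)
    return diff


config_1 = {"batch_size": 32, "n_conv_layers": 5, "optimizer": "rmsprop", "epochs": 10}
config_2 = {"batch_size": 16, "n_conv_layers": 4, "optimizer": "adam", "epochs": 10}
for key, (a, b) in config_diff(config_1, config_2).items():
    print(key, a, b)
```

With real runs, you would pass `run1.config` and `run2.config` instead of the example dictionaries.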

Update metrics for a run, after the run has finished

This example sets the accuracy of a previous run to 0.9. It also modifies the accuracy histogram of a previous run to be the histogram of numpy_array.

import wandb

api = wandb.Api()

run = api.run("<entity>/<project>/<run_id>")
run.summary["accuracy"] = 0.9
run.summary["accuracy_histogram"] = wandb.Histogram(numpy_array)
run.summary.update()

Rename a metric in a completed run

This example renames a summary column in your tables.

import wandb

api = wandb.Api()

run = api.run("<entity>/<project>/<run_id>")
run.summary["new_name"] = run.summary["old_name"]
del run.summary["old_name"]
run.summary.update()

Renaming a column only applies to tables. Charts will still refer to metrics by their original names.

Update config for an existing run

This example updates one of your configuration settings.

import wandb

api = wandb.Api()

run = api.run("<entity>/<project>/<run_id>")
run.config["key"] = updated_value
run.update()

Export system resource consumptions to a CSV file

The snippet below retrieves a run's system resource consumption metrics and saves them to a CSV file.

import wandb

run = wandb.Api().run("<entity>/<project>/<run_id>")

system_metrics = run.history(stream="events")
system_metrics.to_csv("sys_metrics.csv")

Get unsampled metric data

When you pull data from history, by default it’s sampled to 500 points. Get all the logged data points using run.scan_history(). Here’s an example downloading all the loss data points logged in history.

import wandb

api = wandb.Api()

run = api.run("<entity>/<project>/<run_id>")
history = run.scan_history()
losses = [row["loss"] for row in history]
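
Not every logged step necessarily contains every metric, so indexing each row directly can fail on rows that lack the key. The sketch below shows one way to skip such rows; `collect_metric` is a hypothetical helper, and the plain dictionaries stand in for the rows that run.scan_history() yields.

```python
def collect_metric(rows, key):
    """Collect values for `key`, skipping rows where it is missing or None."""
    return [row[key] for row in rows if row.get(key) is not None]


# Stand-in rows; with a real run, pass run.scan_history() instead.
rows = [
    {"_step": 0, "loss": 0.9},
    {"_step": 1, "val_acc": 0.5},
    {"_step": 2, "loss": 0.4},
]
print(collect_metric(rows, "loss"))  # [0.9, 0.4]
```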

Get paginated data from history

If metrics are being fetched slowly on our backend or API requests are timing out, you can try lowering the page size in scan_history so that individual requests don’t time out. The default page size is 500, so you can experiment with different sizes to see what works best:

import wandb

api = wandb.Api()

run = api.run("<entity>/<project>/<run_id>")
cols = ["loss", "accuracy"]  # hypothetical metric keys; replace with your own
history = run.scan_history(keys=sorted(cols), page_size=100)

Export metrics from all runs in a project to a CSV file

This script pulls down the runs in a project and produces a dataframe and a CSV of runs including their names, configs, and summary stats. Replace <entity> and <project> with your W&B entity and the name of your project, respectively.

import pandas as pd
import wandb

api = wandb.Api()
entity, project = "<entity>", "<project>"
runs = api.runs(entity + "/" + project)

summary_list, config_list, name_list = [], [], []
for run in runs:
    # .summary contains the output keys/values
    #  for metrics such as accuracy.
    #  We call ._json_dict to omit large files
    summary_list.append(run.summary._json_dict)

    # .config contains the hyperparameters.
    #  We remove special values that start with _.
    config_list.append({k: v for k, v in run.config.items() if not k.startswith("_")})

    # .name is the human-readable name of the run.
    name_list.append(run.name)

runs_df = pd.DataFrame(
    {"summary": summary_list, "config": config_list, "name": name_list}
)

runs_df.to_csv("project.csv")

Get the starting time for a run

This code snippet retrieves the time at which the run was created.

import wandb

api = wandb.Api()

run = api.run("entity/project/run_id")
start_time = run.created_at

Upload files to a finished run

The code snippet below uploads a selected file to a finished run.

import wandb

api = wandb.Api()

run = api.run("entity/project/run_id")
run.upload_file("file_name.extension")

Download a file from a run

This finds the file model-best.h5 associated with a run and saves it locally. Replace <entity>, <project>, and <run_id> with your own values.

import wandb

api = wandb.Api()

run = api.run("<entity>/<project>/<run_id>")
run.file("model-best.h5").download()

Download all files from a run

This finds all files associated with a run and saves them locally.

import wandb

api = wandb.Api()

run = api.run("<entity>/<project>/<run_id>")
for file in run.files():
    file.download()

Get runs from a specific sweep

This snippet downloads all the runs associated with a particular sweep.

import wandb

api = wandb.Api()

sweep = api.sweep("<entity>/<project>/<sweep_id>")
sweep_runs = sweep.runs

Get the best run from a sweep

The following snippet gets the best run from a given sweep.

import wandb

api = wandb.Api()

sweep = api.sweep("<entity>/<project>/<sweep_id>")
best_run = sweep.best_run()

The best_run is the run with the best metric as defined by the metric parameter in the sweep config.
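
To make the idea of a "best" run concrete, the sketch below picks a run from plain summary dictionaries according to a metric name and a goal. It illustrates the selection rule only and is not how `sweep.best_run()` is implemented; `pick_best`, the run names, and the `goal` values are assumptions modeled on a sweep config's metric settings.

```python
def pick_best(summaries, metric_name, goal="minimize"):
    """Return the name of the run with the best value of metric_name."""
    if goal not in ("minimize", "maximize"):
        raise ValueError(f"unknown goal: {goal!r}")
    # Ignore runs that never logged the metric.
    candidates = {
        name: summary[metric_name]
        for name, summary in summaries.items()
        if metric_name in summary
    }
    if not candidates:
        raise ValueError(f"no run logged {metric_name!r}")
    chooser = min if goal == "minimize" else max
    return chooser(candidates, key=candidates.get)


summaries = {
    "run-a": {"val_loss": 0.31, "val_acc": 0.88},
    "run-b": {"val_loss": 0.27, "val_acc": 0.91},
    "run-c": {"val_acc": 0.80},
}
print(pick_best(summaries, "val_loss", goal="minimize"))  # run-b
print(pick_best(summaries, "val_acc", goal="maximize"))  # run-b
```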

Download the best model file from a sweep

This snippet downloads the model file with the highest validation accuracy from a sweep with runs that saved model files to model.h5.

import wandb

api = wandb.Api()

sweep = api.sweep("<entity>/<project>/<sweep_id>")
runs = sorted(sweep.runs, key=lambda run: run.summary.get("val_acc", 0), reverse=True)
val_acc = runs[0].summary.get("val_acc", 0)
print(f"Best run {runs[0].name} with {val_acc}% val accuracy")

runs[0].file("model.h5").download(replace=True)
print("Best model saved to model.h5")

Delete all files with a given extension from a run

This snippet deletes files with a given extension from a run.

import wandb

api = wandb.Api()

run = api.run("<entity>/<project>/<run_id>")

extension = ".png"
files = run.files()
for file in files:
    if file.name.endswith(extension):
        file.delete()

Download system metrics data

This snippet produces a dataframe with all the system resource consumption metrics for a run and then saves it to a CSV.

import wandb

api = wandb.Api()

run = api.run("<entity>/<project>/<run_id>")
system_metrics = run.history(stream="events")
system_metrics.to_csv("sys_metrics.csv")

Update summary metrics

You can pass a dictionary to update summary metrics.

run.summary.update({"key": val})

Get the command that ran the run

Each run captures the command that launched it on the run overview page. To pull this command down from the API, you can run:

import json
import wandb

api = wandb.Api()

run = api.run("<entity>/<project>/<run_id>")

meta = json.load(run.file("wandb-metadata.json").download())
program = ["python"] + [meta["program"]] + meta["args"]

2.1.11 - Environment variables

Set W&B environment variables.

When you’re running a script in an automated environment, you can control W&B with environment variables set before the script runs or within the script.

# This is secret and shouldn't be checked into version control
WANDB_API_KEY=$YOUR_API_KEY
# Name and notes optional
WANDB_NAME="My first run"
WANDB_NOTES="Smaller learning rate, more regularization."

# Only needed if you don't check in the wandb/settings file
WANDB_ENTITY=$username
WANDB_PROJECT=$project

# If you don't want your script to sync to the cloud
os.environ["WANDB_MODE"] = "offline"

# Add sweep ID tracking to Run objects and related classes
os.environ["WANDB_SWEEP_ID"] = "b05fq58z"
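
When you set variables within the script, set them before you call wandb.init(). The following sketch shows the pattern using only the standard library; the project name, run name, and notes are hypothetical values.

```python
import os

# Hypothetical values for illustration; replace them with your own.
os.environ["WANDB_PROJECT"] = "my-project"
os.environ["WANDB_NAME"] = "My first run"
os.environ["WANDB_NOTES"] = "Smaller learning rate, more regularization."

# setdefault keeps any value that was already exported in the shell.
os.environ.setdefault("WANDB_MODE", "offline")

# List the names of the W&B variables that are now set (names only, no values).
wandb_settings = {k: v for k, v in os.environ.items() if k.startswith("WANDB_")}
print(sorted(wandb_settings))
```

Avoid hard-coding WANDB_API_KEY this way in a script that is checked into version control; export it in the environment instead.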

Optional environment variables

Use these optional environment variables to do things like set up authentication on remote machines.

Variable name	Usage
`WANDB_ANONYMOUS`	Set this to `allow`, `never`, or `must` to let users create anonymous runs with secret URLs.
`WANDB_API_KEY`	Sets the authentication key associated with your account. You can find your key on your settings page. This must be set if `wandb login` hasn’t been run on the remote machine.
`WANDB_BASE_URL`	If you’re using wandb/local you should set this environment variable to `http://YOUR_IP:YOUR_PORT`
`WANDB_CACHE_DIR`	Defaults to ~/.cache/wandb. Set this environment variable to override the location.
`WANDB_CONFIG_DIR`	Defaults to ~/.config/wandb. Set this environment variable to override the location.
`WANDB_CONFIG_PATHS`	Comma-separated list of YAML files to load into wandb.config. See config.
`WANDB_CONSOLE`	Set this to “off” to disable stdout / stderr logging. This defaults to “on” in environments that support it.
`WANDB_DATA_DIR`	Where to upload staging artifacts. The default location depends on your platform, because it uses the value of `user_data_dir` from the `platformdirs` Python package. Make sure this directory exists and the running user has permission to write to it.
`WANDB_DIR`	Where to store all generated files. If unset, defaults to the `wandb` directory relative to your training script. Make sure this directory exists and the running user has permission to write to it. This does not control the location of downloaded artifacts, which you can set using the `WANDB_ARTIFACT_DIR` environment variable.
`WANDB_ARTIFACT_DIR`	Where to store all downloaded artifacts. If unset, defaults to the `artifacts` directory relative to your training script. Make sure this directory exists and the running user has permission to write to it. This does not control the location of generated metadata files, which you can set using the `WANDB_DIR` environment variable.
`WANDB_DISABLE_GIT`	Prevent wandb from probing for a git repository and capturing the latest commit / diff.
`WANDB_DISABLE_CODE`	Set this to true to prevent wandb from saving notebooks or git diffs. We’ll still save the current commit if we’re in a git repo.
`WANDB_DOCKER`	Set this to a docker image digest to enable restoring of runs. This is set automatically with the wandb docker command. You can obtain an image digest by running `wandb docker my/image/name:tag --digest`
`WANDB_ENTITY`	The entity associated with your run. If you have run `wandb init` in the directory of your training script, it creates a directory named wandb and saves a default entity, which can be checked into source control. If you don’t want to create that file or want to override it, use the environment variable.
`WANDB_ERROR_REPORTING`	Set this to false to prevent wandb from logging fatal errors to its error tracking system.
`WANDB_HOST`	Set this to the hostname you want to see in the wandb interface if you don’t want to use the system provided hostname
`WANDB_IGNORE_GLOBS`	Set this to a comma separated list of file globs to ignore. These files will not be synced to the cloud.
`WANDB_JOB_NAME`	Specify a name for any jobs created by `wandb`.
`WANDB_JOB_TYPE`	Specify the job type, like “training” or “evaluation” to indicate different types of runs. See grouping for more info.
`WANDB_MODE`	If you set this to “offline” wandb will save your run metadata locally and not sync to the server. If you set this to `disabled` wandb will turn off completely.
`WANDB_NAME`	The human-readable name of your run. If not set it will be randomly generated for you
`WANDB_NOTEBOOK_NAME`	If you’re running in jupyter you can set the name of the notebook with this variable. We attempt to auto detect this.
`WANDB_NOTES`	Longer notes about your run. Markdown is allowed and you can edit this later in the UI.
`WANDB_PROJECT`	The project associated with your run. This can also be set with `wandb init`, but the environment variable overrides the value.
`WANDB_RESUME`	Defaults to `never`. If set to `auto`, wandb automatically resumes failed runs. If set to `must`, the run must already exist on startup. If you want to always generate your own unique IDs, set this to `allow` and always set `WANDB_RUN_ID`.
`WANDB_RUN_GROUP`	Specify the experiment name to automatically group runs together. See grouping for more info.
`WANDB_RUN_ID`	Set this to a globally unique string (per project) corresponding to a single run of your script. It must be no longer than 64 characters. All non-word characters will be converted to dashes. This can be used to resume an existing run in cases of failure.
`WANDB_QUIET`	Set this to `true` to limit statements logged to standard output to critical statements only. If this is set all logs will be written to `$WANDB_DIR/debug.log`.
`WANDB_SILENT`	Set this to `true` to silence wandb log statements. This is useful for scripted commands. If this is set all logs will be written to `$WANDB_DIR/debug.log`.
`WANDB_SHOW_RUN`	Set this to `true` to automatically open a browser with the run url if your operating system supports it.
`WANDB_SWEEP_ID`	Add sweep ID tracking to `Run` objects and related classes, and display in the UI.
`WANDB_TAGS`	A comma separated list of tags to be applied to the run.
`WANDB_USERNAME`	The username of a member of your team associated with the run. This can be used along with a service account API key to enable attribution of automated runs to members of your team.
`WANDB_USER_EMAIL`	The email of a member of your team associated with the run. This can be used along with a service account API key to enable attribution of automated runs to members of your team.
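The `WANDB_RUN_ID` rules in the table (at most 64 characters, non-word characters converted to dashes) can be previewed locally before you export the variable. The following is a rough sketch under the assumption that "non-word" means anything outside letters, digits, and underscores; `preview_run_id` is a hypothetical helper, not W&B's own code.

```python
import re


def preview_run_id(raw_id):
    """Approximate the documented WANDB_RUN_ID rules locally (hypothetical helper)."""
    if len(raw_id) > 64:
        raise ValueError("WANDB_RUN_ID must be no longer than 64 characters")
    # Replace every non-word character with a dash.
    return re.sub(r"\W", "-", raw_id)


print(preview_run_id("resnet50 lr=0.01/seed 3"))
```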

Singularity environments

If you’re running containers in Singularity, you can pass environment variables by prepending SINGULARITYENV_ to the variable names above. For more details, see the Singularity documentation on environment variables.
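As an illustration of the prefixing rule, this sketch builds the prefixed variable names in Python before launching a container. The helper name `singularity_env` and the sample values are hypothetical; only the `SINGULARITYENV_` prefix comes from the text above.

```python
import os


def singularity_env(variables, prefix="SINGULARITYENV_"):
    """Return a copy of `variables` with each name prefixed for Singularity."""
    return {f"{prefix}{name}": value for name, value in variables.items()}


wandb_vars = {"WANDB_PROJECT": "my-project", "WANDB_MODE": "offline"}

# Merge into the environment you would hand to the process that starts the container.
launch_env = {**os.environ, **singularity_env(wandb_vars)}
print(launch_env["SINGULARITYENV_WANDB_MODE"])
```

You could pass `launch_env` as the `env` argument of `subprocess.run` when starting the container from a Python launcher script.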

Running on AWS

If you’re running batch jobs in AWS, it’s easy to authenticate your machines with your W&B credentials. Get your API key from your settings page, and set the WANDB_API_KEY environment variable in the AWS batch job spec.

2.2 - Sweeps

Hyperparameter search and model optimization with W&B Sweeps


Use W&B Sweeps to automate hyperparameter search and visualize the results with rich, interactive experiment tracking. Pick from popular search methods such as Bayesian, grid, and random search to explore the hyperparameter space. Scale and parallelize sweeps across one or more machines.

How it works

Create a sweep with two W&B CLI commands:

Initialize a sweep

wandb sweep --project <project-name> <path-to-config file>

Start the sweep agent

wandb agent <sweep-ID>

The preceding code snippets, and the Colab linked on this page, show how to initialize and start a sweep with the W&B CLI. See the Sweeps walkthrough for a step-by-step outline of the W&B Python SDK commands to use to define a sweep configuration, initialize a sweep, and start a sweep.

How to get started

Depending on your use case, explore the following resources to get started with W&B Sweeps:

Read through the sweeps walkthrough for a step-by-step outline of the W&B Python SDK commands to use to define a sweep configuration, initialize a sweep, and start a sweep.
Explore this chapter to learn how to:
Explore a curated list of Sweep experiments that explore hyperparameter optimization with W&B Sweeps. Results are stored in W&B Reports.

For a step-by-step video, see: Tune Hyperparameters Easily with W&B Sweeps.

2.2.1 - Tutorial: Define, initialize, and run a sweep

The Sweeps quickstart shows how to define, initialize, and run a sweep.

This page shows how to define, initialize, and run a sweep. There are four main steps:

Set up your training code
Define the search space with a sweep configuration
Initialize the sweep
Start the sweep agent

Copy and paste the following code into a Jupyter Notebook or Python script:

# Import the W&B Python Library and log into W&B
import wandb

# 1: Define objective/training function
def objective(config):
    score = config.x**3 + config.y
    return score

def main():
    with wandb.init(project="my-first-sweep") as run:
        score = objective(run.config)
        run.log({"score": score})

# 2: Define the search space
sweep_configuration = {
    "method": "random",
    "metric": {"goal": "minimize", "name": "score"},
    "parameters": {
        "x": {"max": 0.1, "min": 0.01},
        "y": {"values": [1, 3, 7]},
    },
}

# 3: Start the sweep
sweep_id = wandb.sweep(sweep=sweep_configuration, project="my-first-sweep")

wandb.agent(sweep_id, function=main, count=10)

The following sections break down and explain each step in the code sample.

Set up your training code

Define a training function that takes in hyperparameter values from wandb.Run.config and uses them to train a model and return metrics.

Optionally provide the name of the project where you want the output of the W&B Run to be stored (project parameter in wandb.init()). If the project is not specified, the run is put in an “Uncategorized” project.

Both the sweep and the run must be in the same project. Therefore, the name you provide when you initialize W&B must match the name of the project you provide when you initialize a sweep.

# 1: Define objective/training function
def objective(config):
    score = config.x**3 + config.y
    return score


def main():
    with wandb.init(project="my-first-sweep") as run:
        score = objective(run.config)
        run.log({"score": score})

Define the search space with a sweep configuration

Specify the hyperparameters to sweep in a dictionary. For configuration options, see Define sweep configuration.

The following example demonstrates a sweep configuration that uses a random search ('method':'random'). The sweep randomly selects a set of values for x and y from the range and the list given in the configuration.

W&B minimizes the metric specified in the metric key when "goal": "minimize" is associated with it. In this case, W&B will optimize for minimizing the metric score ("name": "score").

# 2: Define the search space
sweep_configuration = {
    "method": "random",
    "metric": {"goal": "minimize", "name": "score"},
    "parameters": {
        "x": {"max": 0.1, "min": 0.01},
        "y": {"values": [1, 3, 7]},
    },
}
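To build intuition for what a random search over this configuration does, the following sketch samples the search space locally and evaluates the same objective (`x**3 + y`). It is an illustrative stand-in, not the W&B sampler: `sample_parameters` is a hypothetical helper, and it assumes a `min`/`max` pair means a uniform draw from that range.

```python
import random

sweep_configuration = {
    "method": "random",
    "metric": {"goal": "minimize", "name": "score"},
    "parameters": {
        "x": {"max": 0.1, "min": 0.01},
        "y": {"values": [1, 3, 7]},
    },
}


def sample_parameters(parameters, rng):
    """Draw one value per hyperparameter (illustrative, not W&B's implementation)."""
    sample = {}
    for name, spec in parameters.items():
        if "values" in spec:
            sample[name] = rng.choice(spec["values"])  # pick from the listed values
        else:
            sample[name] = rng.uniform(spec["min"], spec["max"])  # pick from the range
    return sample


rng = random.Random(0)
trials = [sample_parameters(sweep_configuration["parameters"], rng) for _ in range(10)]
scores = [trial["x"] ** 3 + trial["y"] for trial in trials]

# With "goal": "minimize", the best trial is the one with the lowest score.
best = trials[scores.index(min(scores))]
print(best, min(scores))
```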

Initialize the Sweep

W&B uses a Sweep Controller to manage sweeps on the cloud (standard) or locally (local), across one or more machines. For more information about Sweep Controllers, see Search and stop algorithms locally.

A sweep identification number is returned when you initialize a sweep:

sweep_id = wandb.sweep(sweep=sweep_configuration, project="my-first-sweep")

For more information about initializing sweeps, see Initialize sweeps.

Start the Sweep

Use the wandb.agent API call to start a sweep.

wandb.agent(sweep_id, function=main, count=10)

Visualize results (optional)

Open your project to see your live results in the W&B App dashboard. With just a few clicks, construct rich, interactive charts like parallel coordinates plots, parameter importance analyses, and additional chart types.

For more information about how to visualize results, see Visualize sweep results. For an example dashboard, see this sample Sweeps Project.

Stop the agent (optional)

In the terminal, press Ctrl+C to stop the current run. Press it again to terminate the agent.

2.2.2 - Add W&B (wandb) to your code

Add W&B to your Python code script or Jupyter Notebook.

There are numerous ways to add the W&B Python SDK to your script or notebook. This section provides a “best practice” example that shows how to integrate the W&B Python SDK into your own code.

Original training script

Suppose you have the following code in a Python script. We define a function called main that mimics a typical training loop. For each epoch, the accuracy and loss are computed on the training and validation data sets. The values are randomly generated for the purpose of this example.

We define a dictionary called config where we store hyperparameter values. At the end of the cell, we call the main function to execute the mock training code.

import random
import numpy as np

def train_one_epoch(epoch, lr, bs):
    acc = 0.25 + ((epoch / 30) + (random.random() / 10))
    loss = 0.2 + (1 - ((epoch - 1) / 10 + random.random() / 5))
    return acc, loss

def evaluate_one_epoch(epoch):
    acc = 0.1 + ((epoch / 20) + (random.random() / 10))
    loss = 0.25 + (1 - ((epoch - 1) / 10 + random.random() / 6))
    return acc, loss

# config variable with hyperparameter values
config = {"lr": 0.0001, "bs": 16, "epochs": 5}

def main():
    # Read the hyperparameter values from the `config` dictionary
    # instead of hard coding them in the training loop
    lr = config["lr"]
    bs = config["bs"]
    epochs = config["epochs"]

    for epoch in np.arange(1, epochs):
        train_acc, train_loss = train_one_epoch(epoch, lr, bs)
        val_acc, val_loss = evaluate_one_epoch(epoch)

        print("epoch: ", epoch)
        print("training accuracy:", train_acc, "training loss:", train_loss)
        print("validation accuracy:", val_acc, "validation loss:", val_loss)

# Call the main function to run the mock training code
main()

Training script with W&B Python SDK

The following code examples demonstrate how to add the W&B Python SDK into your code. If you start W&B Sweep jobs in the CLI, you will want to explore the CLI tab. If you start W&B Sweep jobs within a Jupyter notebook or Python script, explore the Python SDK tab.

To create a W&B Sweep, we added the following to the code example:

Import the W&B Python SDK.
Create a dictionary object where the key-value pairs define the sweep configuration. In the following example, the batch size (batch_size), epochs (epochs), and the learning rate (lr) hyperparameters are varied during each sweep. For more information, see Define sweep configuration.
Pass the sweep configuration dictionary to wandb.sweep(). This initializes the sweep. This returns a sweep ID (sweep_id). For more information, see Initialize sweeps.
Use the wandb.init() API to generate a background process to sync and log data as a W&B Run.
(Optional) define values from wandb.config instead of defining hard coded values.
Log the metric you want to optimize with wandb.Run.log(). You must log the metric defined in your configuration. Within the configuration dictionary (sweep_configuration in this example), you define the sweep to maximize the val_acc value.
Start the sweep with the wandb.agent API call. Provide the sweep ID and the name of the function the sweep will execute (function=main), and set the maximum number of runs to try to four (count=4). For more information, see Start sweep agents.

import wandb
import numpy as np
import random


# Define training function that takes in hyperparameter
# values from `wandb.Run.config` and uses them to train a
# model and return the metrics
def train_one_epoch(epoch, lr, bs):
    acc = 0.25 + ((epoch / 30) + (random.random() / 10))
    loss = 0.2 + (1 - ((epoch - 1) / 10 + random.random() / 5))
    return acc, loss


def evaluate_one_epoch(epoch):
    acc = 0.1 + ((epoch / 20) + (random.random() / 10))
    loss = 0.25 + (1 - ((epoch - 1) / 10 + random.random() / 6))
    return acc, loss


# Define a sweep config dictionary
sweep_configuration = {
    "method": "random",
    "name": "sweep",
    "metric": {"goal": "maximize", "name": "val_acc"},
    "parameters": {
        "batch_size": {"values": [16, 32, 64]},
        "epochs": {"values": [5, 10, 15]},
        "lr": {"max": 0.1, "min": 0.0001},
    },
}

# (Optional) Provide a name for the project.
project = "my-first-sweep"

def main():
    # Use the `with` context manager statement to automatically end the run.
    # This is equivalent to using `run.finish()` at the end of each run
    with wandb.init(project=project) as run:

        # This code fetches the hyperparameter values from `wandb.Run.config`
        # instead of defining them explicitly
        lr = run.config["lr"]
        bs = run.config["batch_size"]
        epochs = run.config["epochs"]

        # Execute the training loop and log the performance values to W&B
        for epoch in np.arange(1, epochs):
            train_acc, train_loss = train_one_epoch(epoch, lr, bs)
            val_acc, val_loss = evaluate_one_epoch(epoch)

            run.log(
                {
                    "epoch": epoch,
                    "train_acc": train_acc,
                    "train_loss": train_loss,
                    "val_acc": val_acc,
                    "val_loss": val_loss,
                }
            )


if __name__ == "__main__":
    # Initialize the sweep by passing in the config dictionary
    sweep_id = wandb.sweep(sweep=sweep_configuration, project=project)

    # Start the sweep job
    wandb.agent(sweep_id, function=main, count=4)

The preceding code snippet calls wandb.init() within a with context manager statement to generate a background process that syncs and logs data as a W&B Run. This ensures the run is properly terminated after the logged values are uploaded. An alternative approach is to call wandb.init() and wandb.Run.finish() at the beginning and end of the training script, respectively.

To create a W&B Sweep, we first create a YAML configuration file. The configuration file contains the hyperparameters we want the sweep to explore. In the following example, the batch size (batch_size), epochs (epochs), and the learning rate (lr) hyperparameters are varied during each sweep.

# config.yaml
program: train.py
method: random
name: sweep
metric:
  goal: maximize
  name: val_acc
parameters:
  batch_size:
    values: [16, 32, 64]
  lr:
    min: 0.0001
    max: 0.1
  epochs:
    values: [5, 10, 15]

For more information on how to create a W&B Sweep configuration, see Define sweep configuration.

You must provide the name of your Python script for the program key in your YAML file.

Next, we add the following to the code example:

Import the W&B Python SDK (wandb) and PyYAML (yaml). PyYAML is used to read in our YAML configuration file.
Read in the configuration file.
Use the wandb.init() API to generate a background process to sync and log data as a W&B Run. We pass the config object to the config parameter.
Define hyperparameter values from wandb.Run.config instead of using hard coded values.
Log the metric we want to optimize with wandb.Run.log(). You must log the metric defined in your configuration. Within the YAML configuration file (config.yaml in this example), we defined the sweep to maximize the val_acc value.

import wandb
import yaml
import random
import numpy as np


def train_one_epoch(epoch, lr, bs):
    acc = 0.25 + ((epoch / 30) + (random.random() / 10))
    loss = 0.2 + (1 - ((epoch - 1) / 10 + random.random() / 5))
    return acc, loss


def evaluate_one_epoch(epoch):
    acc = 0.1 + ((epoch / 20) + (random.random() / 10))
    loss = 0.25 + (1 - ((epoch - 1) / 10 + random.random() / 6))
    return acc, loss


def main():
    # Set up your default hyperparameters
    with open("./config.yaml") as file:
        config = yaml.load(file, Loader=yaml.FullLoader)

    with wandb.init(config=config) as run:
        for epoch in np.arange(1, run.config['epochs']):
            train_acc, train_loss = train_one_epoch(epoch, run.config['lr'], run.config['batch_size'])
            val_acc, val_loss = evaluate_one_epoch(epoch)
            run.log(
                {
                    "epoch": epoch,
                    "train_acc": train_acc,
                    "train_loss": train_loss,
                    "val_acc": val_acc,
                    "val_loss": val_loss,
                }
            )

# Call the main function.
main()

In your CLI, set a maximum number of runs for the sweep agent to try. This is optional. In this example, we set the maximum number to 5.

NUM=5

Next, initialize the sweep with the wandb sweep command. Provide the name of the YAML file. Optionally provide the name of the project for the project flag (--project):

wandb sweep --project sweep-demo-cli config.yaml

This returns a sweep ID. For more information on how to initialize sweeps, see Initialize sweeps.

Copy the sweep ID and replace sweepID in the following code snippet to start the sweep job with the wandb agent command:

wandb agent --count $NUM your-entity/sweep-demo-cli/sweepID

For more information, see Start sweep jobs.

Consideration when logging metrics

Be sure to log the sweep’s metric to W&B explicitly, as a top-level key. Do not log metrics for your sweep inside a nested dictionary.

For example, consider the following pseudocode. A user wants to log the validation loss ("val_loss": loss). First they pass the values into a dictionary. However, the dictionary passed to wandb.Run.log() does not explicitly access the key-value pair in the dictionary:

# Import the W&B Python Library and log into W&B
import wandb
import random

def train():
    # Simulate training and validation metrics
    offset = random.random() / 5
    epoch = 5  # Simulate an epoch value
    acc = 1 - 2**-epoch - random.random() / epoch - offset
    loss = 2**-epoch + random.random() / epoch + offset
    return loss, acc


def main():
    with wandb.init(entity="<entity>", project="my-first-sweep") as run:
        val_loss, val_acc = train()
        val_metrics = {"val_loss": val_loss, "val_acc": val_acc}
        # Incorrect. The whole dictionary is nested under "val_loss",
        # so the sweep cannot read the metric value.
        # See the next code block for the correct way to log metrics
        run.log({"val_loss": val_metrics})

sweep_configuration = {
    "method": "random",
    "metric": {"goal": "minimize", "name": "val_loss"},
    "parameters": {
        "x": {"max": 0.1, "min": 0.01},
        "y": {"values": [1, 3, 7]},
    },
}

# Initialize the sweep with the configuration dictionary
sweep_id = wandb.sweep(sweep=sweep_configuration, project="my-first-sweep")

# Start the sweep job
wandb.agent(sweep_id, function=main, count=10)

Instead, explicitly access the key-value pair within the Python dictionary. For example, the following code specifies the key-value pair when you pass the dictionary to the wandb.Run.log() method:

# Import the W&B Python Library and log into W&B
import wandb
import random


def train():
    # Simulate training and validation metrics
    offset = random.random() / 5
    epoch = 5  # Simulate an epoch value
    acc = 1 - 2**-epoch - random.random() / epoch - offset
    loss = 2**-epoch + random.random() / epoch + offset

    return loss, acc


def main():
    with wandb.init(entity="<entity>", project="my-first-sweep") as run:
        val_loss, val_acc = train()
        val_metrics = {"val_loss": val_loss, "val_acc": val_acc}
        # Correct. Explicitly access the key-value pair in the dictionary
        # when logging metrics
        run.log(
            {
                "val_loss": val_metrics["val_loss"],
                "val_acc": val_metrics["val_acc"],
            }
        )


sweep_configuration = {
    "method": "random",
    "metric": {"goal": "minimize", "name": "val_loss"},
    "parameters": {
        "x": {"max": 0.1, "min": 0.01},
        "y": {"values": [1, 3, 7]},
    },
}

sweep_id = wandb.sweep(sweep=sweep_configuration, project="my-first-sweep")

wandb.agent(sweep_id, function=main, count=10)
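To make the difference concrete without starting a run, here is a small sketch that checks whether a dictionary destined for `wandb.Run.log()` exposes the sweep metric as a top-level numeric value. The `metric_is_logged` helper and the sample values are hypothetical and are not part of the W&B SDK.

```python
import numbers


def metric_is_logged(logged, metric_name):
    """Return True if `metric_name` is a top-level key holding a single number."""
    value = logged.get(metric_name)
    return isinstance(value, numbers.Number) and not isinstance(value, bool)


val_metrics = {"val_loss": 0.42, "val_acc": 0.91}

# Nested: the metric name maps to a whole dictionary, not a number.
print(metric_is_logged({"val_loss": val_metrics}, "val_loss"))  # False

# Explicit: the metric name maps directly to its value.
print(metric_is_logged({"val_loss": val_metrics["val_loss"]}, "val_loss"))  # True
```

A check like this can run in a unit test for your training function, so that a mismatch between the logged keys and the sweep's `metric` name is caught before the sweep starts.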

2.2.3 - Define a sweep configuration

Learn how to create configuration files for sweeps.

A W&B Sweep combines a strategy for exploring hyperparameter values with the code that evaluates them. The strategy can be as simple as trying every option or as complex as Bayesian Optimization and Hyperband (BOHB).

Define a sweep configuration either in a Python dictionary or a YAML file. How you define your sweep configuration depends on how you want to manage your sweep.

Define your sweep configuration in a YAML file if you want to initialize a sweep and start a sweep agent from the command line. Define your sweep in a Python dictionary if you initialize a sweep and start a sweep entirely within a Python script or notebook.

The following guide describes how to format your sweep configuration. See Sweep configuration options for a comprehensive list of top-level sweep configuration keys.

Basic structure

Both sweep configuration format options (YAML and Python dictionary) utilize key-value pairs and nested structures.

Use top-level keys within your sweep configuration to define qualities of your sweep search such as the name of the sweep (name key), the parameters to search through (parameters key), the methodology to search the parameter space (method key), and more.

For example, the following code snippets show the same sweep configuration defined within a YAML file and within a Python dictionary. Within the sweep configuration there are five top level keys specified: program, name, method, metric and parameters.

Define a sweep configuration in a YAML file if you want to manage sweeps interactively from the command line (CLI).

program: train.py
name: sweepdemo
method: bayes
metric:
  goal: minimize
  name: validation_loss
parameters:
  learning_rate:
    min: 0.0001
    max: 0.1
  batch_size:
    values: [16, 32, 64]
  epochs:
    values: [5, 10, 15]
  optimizer:
    values: ["adam", "sgd"]

Define a sweep in a Python dictionary data structure if you define your training algorithm in a Python script or notebook.

The following code snippet stores a sweep configuration in a variable named sweep_configuration:

sweep_configuration = {
    "name": "sweepdemo",
    "method": "bayes",
    "metric": {"goal": "minimize", "name": "validation_loss"},
    "parameters": {
        "learning_rate": {"min": 0.0001, "max": 0.1},
        "batch_size": {"values": [16, 32, 64]},
        "epochs": {"values": [5, 10, 15]},
        "optimizer": {"values": ["adam", "sgd"]},
    },
}

Within the top level parameters key, the following keys are nested: learning_rate, batch_size, epochs, and optimizer. For each of the nested keys you specify, you can provide one or more values, a distribution, a probability, and more. For more information, see the parameters section in Sweep configuration options.
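As a quick way to inspect a configuration like this one, the sketch below walks the nested `parameters` key and reports how each hyperparameter is specified. The `describe_parameters` helper is hypothetical and only recognizes the `value`, `values`, and `min`/`max` forms shown on this page.

```python
sweep_configuration = {
    "name": "sweepdemo",
    "method": "bayes",
    "metric": {"goal": "minimize", "name": "validation_loss"},
    "parameters": {
        "learning_rate": {"min": 0.0001, "max": 0.1},
        "batch_size": {"values": [16, 32, 64]},
        "epochs": {"values": [5, 10, 15]},
        "optimizer": {"values": ["adam", "sgd"]},
    },
}


def describe_parameters(config):
    """Summarize how each hyperparameter is specified (hypothetical helper)."""
    summary = {}
    for name, spec in config["parameters"].items():
        if "value" in spec:
            summary[name] = "fixed"
        elif "values" in spec:
            summary[name] = f"one of {len(spec['values'])} choices"
        elif "min" in spec and "max" in spec:
            summary[name] = f"range {spec['min']} to {spec['max']}"
        else:
            summary[name] = "other"
    return summary


for name, kind in describe_parameters(sweep_configuration).items():
    print(f"{name}: {kind}")
```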

Double nested parameters

Sweep configurations support nested parameters. To delineate a nested parameter, use an additional parameters key under the top level parameter name. Sweep configs support multi-level nesting.

Specify a probability distribution for your random variables if you use a Bayesian or random hyperparameter search. For each hyperparameter:

Create a top level parameters key in your sweep config.
Within the parameters key, nest the following:
1. Specify the name of the hyperparameter you want to optimize.
2. Specify the distribution you want to use for the distribution key. Nest the distribution key-value pair underneath the hyperparameter name.
3. Specify one or more values to explore. The value (or values) should be in line with the distribution key.
  1. (Optional) Use an additional parameters key under the top level parameter name to delineate a nested parameter.

Nested parameters defined in sweep configuration overwrite keys specified in a W&B run configuration.

For example, suppose you initialize a W&B run with the following configuration in a train.py Python script (the wandb.init() call inside main). Next, you define a sweep configuration in a dictionary called sweep_configuration. You then pass the sweep config dictionary to wandb.sweep() to initialize the sweep.

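import wandb


def main():
    run = wandb.init(config={"nested_param": {"manual_key": 1}})


# Nested parameters use an additional `parameters` key under the parameter name
sweep_configuration = {
    "method": "random",
    "parameters": {
        "top_level_param": {"value": 0},
        "nested_param": {
            "parameters": {
                "learning_rate": {"value": 0.01},
                "double_nested_param": {
                    "parameters": {"x": {"value": 0.9}, "y": {"value": 0.8}}
                },
            }
        },
    },
}

# Initialize sweep by passing in config.
sweep_id = wandb.sweep(sweep=sweep_configuration, project="<project>")

# Start sweep job.
wandb.agent(sweep_id, function=main, count=4)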

The nested_param.manual_key that is passed when the W&B run is initialized is not accessible. The wandb.Run.config only contains the key-value pairs that are defined in the sweep configuration dictionary.
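The overwrite behavior described here can be modeled with plain dictionaries. This is only a sketch of the observable outcome (the sweep's value for `nested_param` replaces the one passed to `wandb.init()`), not a description of W&B internals. The variable names `run_config`, `sweep_values`, and `effective_config` are hypothetical.

```python
# Config passed to wandb.init() in the training script.
run_config = {"nested_param": {"manual_key": 1}}

# Values chosen by the sweep for one run.
sweep_values = {
    "top_level_param": 0,
    "nested_param": {
        "learning_rate": 0.01,
        "double_nested_param": {"x": 0.9, "y": 0.8},
    },
}

# A shallow merge replaces the whole "nested_param" value, so "manual_key" disappears.
effective_config = {**run_config, **sweep_values}

print("manual_key" in effective_config["nested_param"])  # False
print(effective_config["nested_param"]["learning_rate"])  # 0.01
```

If you need a manually set key to survive, one option is to keep it under a top-level name that the sweep configuration does not define.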

Sweep configuration template

The following template shows how you can configure parameters and specify search constraints. Replace hyperparameter_name with the name of your hyperparameter and any values enclosed in <>.

program: <insert>
method: <insert>
parameters:
  hyperparameter_name0:
    value: 0  
  hyperparameter_name1: 
    values: [0, 0, 0]
  hyperparameter_name: 
    distribution: <insert>
    value: <insert>
  hyperparameter_name2:  
    distribution: <insert>
    min: <insert>
    max: <insert>
    q: <insert>
  hyperparameter_name3: 
    distribution: <insert>
    values:
      - <list_of_values>
      - <list_of_values>
      - <list_of_values>
early_terminate:
  type: hyperband
  s: 0
  eta: 0
  max_iter: 0
command:
- ${Command macro}
- ${Command macro}
- ${Command macro}
- ${Command macro}

To express a numeric value using scientific notation, add the YAML !!float operator, which casts the value to a floating point number. For example, min: !!float 1e-5. See Command example.

Sweep configuration examples

program: train.py
method: random
metric:
  goal: minimize
  name: loss
parameters:
  batch_size:
    distribution: q_log_uniform_values
    max: 256 
    min: 32
    q: 8
  dropout: 
    values: [0.3, 0.4, 0.5]
  epochs:
    value: 1
  fc_layer_size: 
    values: [128, 256, 512]
  learning_rate:
    distribution: uniform
    max: 0.1
    min: 0
  optimizer:
    values: ["adam", "sgd"]

sweep_config = {
    "method": "random",
    "metric": {"goal": "minimize", "name": "loss"},
    "parameters": {
        "batch_size": {
            "distribution": "q_log_uniform_values",
            "max": 256,
            "min": 32,
            "q": 8,
        },
        "dropout": {"values": [0.3, 0.4, 0.5]},
        "epochs": {"value": 1},
        "fc_layer_size": {"values": [128, 256, 512]},
        "learning_rate": {"distribution": "uniform", "max": 0.1, "min": 0},
        "optimizer": {"values": ["adam", "sgd"]},
    },
}

Bayes hyperband example

program: train.py
method: bayes
metric:
  goal: minimize
  name: val_loss
parameters:
  dropout:
    values: [0.15, 0.2, 0.25, 0.3, 0.4]
  hidden_layer_size:
    values: [96, 128, 148]
  layer_1_size:
    values: [10, 12, 14, 16, 18, 20]
  layer_2_size:
    values: [24, 28, 32, 36, 40, 44]
  learn_rate:
    values: [0.001, 0.01, 0.003]
  decay:
    values: [1e-5, 1e-6, 1e-7]
  momentum:
    values: [0.8, 0.9, 0.95]
  epochs:
    value: 27
early_terminate:
  type: hyperband
  s: 2
  eta: 3
  max_iter: 27

The following examples show how to specify either a minimum or maximum number of iterations for early_terminate:

The brackets for this example are: [3, 3*eta, 3*eta*eta, 3*eta*eta*eta], which equals [3, 9, 27, 81].

early_terminate:
  type: hyperband
  min_iter: 3

The brackets for this example are [27/eta, 27/eta/eta], which equals [9, 3].

early_terminate:
  type: hyperband
  max_iter: 27
  s: 2
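The bracket arithmetic in the two examples above can be reproduced with a few lines of Python. This is a sketch of the arithmetic only, assuming `eta` is 3 (the value implied by the numbers above) and, for `min_iter`, that four brackets are listed. The function names are hypothetical and this is not W&B's scheduler code.

```python
def brackets_from_min_iter(min_iter, eta=3, count=4):
    """Brackets grow from min_iter by a factor of eta: min_iter * eta**k."""
    return [min_iter * eta**k for k in range(count)]


def brackets_from_max_iter(max_iter, eta=3, s=2):
    """Brackets shrink from max_iter by a factor of eta: max_iter / eta**k, k = 1..s."""
    return [max_iter // eta**k for k in range(1, s + 1)]


print(brackets_from_min_iter(3))   # [3, 9, 27, 81]
print(brackets_from_max_iter(27))  # [9, 3]
```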

Macro and custom command arguments example

For more complex command line arguments, you can use macros to pass environment variables, the Python interpreter, and additional arguments. W&B supports predefined macros and custom command line arguments that you can specify in your sweep configuration.

For example, the following sweep configuration (sweep.yaml) defines a command that runs a Python script (run.py) with the ${env}, ${interpreter}, and ${program} macros replaced with the appropriate values when the sweep runs.

The --batch_size=${batch_size}, --test=True, and --optimizer=${optimizer} arguments use custom macros to pass the values of the batch_size, test, and optimizer parameters defined in the sweep configuration.

program: run.py
method: random
metric:
  name: validation_loss
parameters:
  learning_rate:
    min: 0.0001
    max: 0.1
command:
  - ${env}
  - ${interpreter}
  - ${program}
  - "--batch_size=${batch_size}"
  - "--optimizer=${optimizer}"
  - "--test=True"

The associated Python script (run.py) can then parse these command line arguments using the argparse module.

# run.py
import wandb
import argparse


def str2bool(v):
    # argparse has no built-in boolean type, so convert the string manually
    return str(v).lower() in ('yes', 'true', 't', '1')


parser = argparse.ArgumentParser()
parser.add_argument('--batch_size', type=int)
parser.add_argument('--optimizer', type=str, choices=['adam', 'sgd'], required=True)
parser.add_argument('--test', type=str2bool, default=False)
args = parser.parse_args()

# Initialize a W&B Run
with wandb.init(project='test-project') as run:
    run.log({'validation_loss': 1})

See the Command macros section in Sweep configuration options for a list of pre-defined macros you can use in your sweep configuration.
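To visualize what macro substitution produces, the sketch below expands the `command` list from the sweep configuration above with Python's `string.Template`, which happens to use the same `${name}` syntax. The substituted values are assumptions chosen for illustration (for example, `/usr/bin/env` for `${env}` and `python` for `${interpreter}`), and this is not how W&B itself performs the expansion.

```python
from string import Template

command = [
    "${env}",
    "${interpreter}",
    "${program}",
    "--batch_size=${batch_size}",
    "--optimizer=${optimizer}",
    "--test=True",
]

# Hypothetical values for one run; the real values are chosen at launch time.
values = {
    "env": "/usr/bin/env",
    "interpreter": "python",
    "program": "run.py",
    "batch_size": 32,
    "optimizer": "adam",
}

expanded = [Template(part).substitute(values) for part in command]
print(" ".join(expanded))
```

The expanded list is the argument vector that `run.py` then parses with `argparse`.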

Boolean arguments

The argparse module does not support boolean arguments by default. To define a boolean argument, you can use the action parameter or use a custom function to convert the string representation of the boolean value to a boolean type.

As an example, you can use the following code snippet to define a boolean argument. Pass store_true or store_false to the action parameter of add_argument().

import wandb
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--test', action='store_true')
args = parser.parse_args()

args.test  # This will be True if --test is passed, otherwise False

You can also define a custom function to convert the string representation of the boolean value to a boolean type. For example, the following code snippet defines the str2bool function, which converts a string to a boolean value.

def str2bool(v: str) -> bool:
    """Convert a string to a boolean. This is required because
    argparse does not support boolean arguments by default.
    """
    if isinstance(v, bool):
        return v
    return v.lower() in ('yes', 'true', 't', '1')
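As a minimal sketch of how such a converter plugs into argparse, the following example passes str2bool as the type of the --test argument. The explicit argument list given to parse_args() is only there so the example runs without a real command line.

```python
import argparse


def str2bool(v):
    """Convert a string such as 'True' or 'no' to a boolean."""
    if isinstance(v, bool):
        return v
    return v.lower() in ("yes", "true", "t", "1")


parser = argparse.ArgumentParser()
parser.add_argument("--test", type=str2bool, default=False)

# Parse an explicit list instead of sys.argv for demonstration purposes.
args = parser.parse_args(["--test=True"])
print(args.test)  # True
```

Any string outside the accepted set, such as "no" or "false", is converted to False.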

2.2.3.1 - Sweep configuration options

A sweep configuration consists of nested key-value pairs. Use top-level keys within your sweep configuration to define qualities of your sweep search such as the parameters to search through (parameter key), the methodology to search the parameter space (method key), and more.

The following table lists top-level sweep configuration keys and a brief description. See the respective sections for more information about each key.

Top-level keys	Description
`program`	(required) Training script to run
`entity`	The entity for this sweep
`project`	The project for this sweep
`description`	Text description of the sweep
`name`	The name of the sweep, displayed in the W&B UI.
`method`	(required) The search strategy
`metric`	The metric to optimize (only used by certain search strategies and stopping criteria)
`parameters`	(required) Parameter bounds to search
`early_terminate`	Any early stopping criteria
`command`	Command structure for invoking and passing arguments to the training script
`run_cap`	Maximum number of runs for this sweep

See the Sweep configuration structure for more information on how to structure your sweep configuration.

`metric`

Use the metric top-level sweep configuration key to specify the name, the goal, and the target metric to optimize.

Key	Description
`name`	Name of the metric to optimize.
`goal`	Either `minimize` or `maximize` (Default is `minimize`).
`target`	Goal value for the metric you are optimizing. The sweep does not create new runs once a run reaches the target value that you specify. Active agents that have a run executing (when the run reaches the target) wait until the run completes before the agent stops creating new runs.

`parameters`

In your YAML file or Python script, specify parameters as a top-level key. Within the parameters key, provide the name of a hyperparameter you want to optimize. Common hyperparameters include: learning rate, batch size, epochs, optimizers, and more. For each hyperparameter you define in your sweep configuration, specify one or more search constraints.

The following table shows supported hyperparameter search constraints. Based on your hyperparameter and use case, use one of the search constraints below to tell your sweep agent where (in the case of a distribution) or what (value, values, and so forth) to search or use.

Search constraint	Description
`values`	Specifies all valid values for this hyperparameter. Compatible with `grid`.
`value`	Specifies the single valid value for this hyperparameter. Compatible with `grid`.
`distribution`	Specify a probability distribution. See the note following this table for information on default values.
`probabilities`	Specify the probability of selecting each element of `values` when using `random`.
`min`, `max`	(`int` or `float`) Minimum and maximum values. If `int`, for `int_uniform`-distributed hyperparameters. If `float`, for `uniform`-distributed hyperparameters.
`mu`	(`float`) Mean parameter for `normal`- or `log_normal`-distributed hyperparameters.
`sigma`	(`float`) Standard deviation parameter for `normal`- or `log_normal`-distributed hyperparameters.
`q`	(`float`) Quantization step size for quantized hyperparameters.
`parameters`	Nest other parameters inside a root level parameter.

W&B sets the following distributions based on the following conditions if a distribution is not specified:

categorical if you specify values
int_uniform if you specify max and min as integers
uniform if you specify max and min as floats
constant if you set value
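The defaulting rules in this list can be sketched as a small Python function. The function name `infer_distribution` is hypothetical, and this is an illustration of the rules as listed rather than W&B's actual implementation. In particular, treating a mix of integer and float bounds as uniform is an assumption.

```python
def infer_distribution(param: dict) -> str:
    """Return the distribution implied by a parameter definition."""
    if "distribution" in param:
        return param["distribution"]  # An explicit choice always wins.
    if "values" in param:
        return "categorical"
    if "value" in param:
        return "constant"
    if "min" in param and "max" in param:
        both_ints = isinstance(param["min"], int) and isinstance(param["max"], int)
        # Assumption: anything other than two integers is treated as uniform.
        return "int_uniform" if both_ints else "uniform"
    raise ValueError("Parameter needs values, value, or min and max")


print(infer_distribution({"min": 0.0001, "max": 0.1}))  # uniform
```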

`method`

Specify the hyperparameter search strategy with the method key. There are three hyperparameter search strategies to choose from: grid, random, and Bayesian search.

Grid search

Iterate over every combination of hyperparameter values. Grid search makes uninformed decisions on the set of hyperparameter values to use on each iteration. Grid search can be computationally costly.

Grid search executes forever if it is searching within a continuous search space.
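To get a feel for how quickly a grid grows, the following sketch enumerates every combination for a small discrete search space using the standard library. The helper `grid_combinations` and the parameter values are illustrative assumptions, not part of the wandb API.

```python
import itertools

# A discrete search space: two parameters with three values each, one constant.
parameters = {
    "batch_size": {"values": [16, 32, 64]},
    "epochs": {"values": [5, 10, 15]},
    "optimizer": {"value": "adam"},
}


def grid_combinations(parameters: dict) -> list:
    """Return one dict per combination of hyperparameter values."""
    names = list(parameters)
    choices = [p["values"] if "values" in p else [p["value"]] for p in parameters.values()]
    return [dict(zip(names, combo)) for combo in itertools.product(*choices)]


combos = grid_combinations(parameters)
print(len(combos))  # 9
```

The count is the product of the number of values per parameter, which is why a continuous range (with no finite list of values) never finishes.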

Random search

Choose a random, uninformed set of hyperparameter values on each iteration based on a distribution. Random search runs forever unless you stop the process from the command line, within your Python script, or the W&B App.

Specify the distribution space with the parameters key if you choose random (method: random) search.

Bayesian search

In contrast to random and grid search, Bayesian models make informed decisions. Bayesian optimization uses a probabilistic model to decide which values to use through an iterative process of testing values on a surrogate function before evaluating the objective function. Bayesian search works well for small numbers of continuous parameters but scales poorly. For more information about Bayesian search, see the Bayesian Optimization Primer paper.

Bayesian search runs forever unless you stop the process from the command line, within your Python script, or the W&B App.

Distribution options for random and Bayesian search

Within the parameters key, nest the name of the hyperparameter. Next, specify the distribution key and specify a distribution for the value.

The following table lists the distributions that W&B supports.

Value for `distribution` key	Description
`constant`	Constant distribution. Must specify the constant value (`value`) to use.
`categorical`	Categorical distribution. Must specify all valid values (`values`) for this hyperparameter.
`int_uniform`	Discrete uniform distribution on integers. Must specify `max` and `min` as integers.
`uniform`	Continuous uniform distribution. Must specify `max` and `min` as floats.
`q_uniform`	Quantized uniform distribution. Returns `round(X / q) * q` where X is uniform. `q` defaults to `1`.
`log_uniform`	Log-uniform distribution. Returns a value `X` between `exp(min)` and `exp(max)` such that the natural logarithm is uniformly distributed between `min` and `max`.
`log_uniform_values`	Log-uniform distribution. Returns a value `X` between `min` and `max` such that `log(X)` is uniformly distributed between `log(min)` and `log(max)`.
`q_log_uniform`	Quantized log uniform. Returns `round(X / q) * q` where `X` is `log_uniform`. `q` defaults to `1`.
`q_log_uniform_values`	Quantized log uniform. Returns `round(X / q) * q` where `X` is `log_uniform_values`. `q` defaults to `1`.
`inv_log_uniform`	Inverse log uniform distribution. Returns `X`, where `log(1/X)` is uniformly distributed between `min` and `max`.
`inv_log_uniform_values`	Inverse log uniform distribution. Returns `X`, where `log(1/X)` is uniformly distributed between `log(1/max)` and `log(1/min)`.
`normal`	Normal distribution. Return value is normally distributed with mean `mu` (default `0`) and standard deviation `sigma` (default `1`).
`q_normal`	Quantized normal distribution. Returns `round(X / q) * q` where `X` is `normal`. `q` defaults to `1`.
`log_normal`	Log normal distribution. Returns a value `X` such that the natural logarithm `log(X)` is normally distributed with mean `mu` (default `0`) and standard deviation `sigma` (default `1`).
`q_log_normal`	Quantized log normal distribution. Returns `round(X / q) * q` where `X` is `log_normal`. `q` defaults to `1`.
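The formulas in the table can be made concrete with a short sampling sketch. The `sample` function below is hypothetical and covers only four of the listed distributions. It follows the table's formulas with Python's random module and is not W&B's sampler.

```python
import math
import random


def sample(spec: dict, rng: random.Random) -> float:
    """Draw one value for a parameter spec, following the table's formulas."""
    dist = spec["distribution"]
    if dist == "uniform":
        return rng.uniform(spec["min"], spec["max"])
    if dist == "q_uniform":
        q = spec.get("q", 1)  # q defaults to 1
        return round(rng.uniform(spec["min"], spec["max"]) / q) * q
    if dist == "log_uniform":
        # min and max are exponents: X lies between exp(min) and exp(max).
        return math.exp(rng.uniform(spec["min"], spec["max"]))
    if dist == "log_uniform_values":
        # min and max are the actual bounds of X.
        return math.exp(rng.uniform(math.log(spec["min"]), math.log(spec["max"])))
    raise ValueError(f"Unsupported distribution in this sketch: {dist}")


rng = random.Random(0)
lr_spec = {"distribution": "log_uniform_values", "min": 1e-4, "max": 0.1}
batch_spec = {"distribution": "q_uniform", "min": 16, "max": 64, "q": 8}
print(sample(lr_spec, rng), sample(batch_spec, rng))
```

The difference between log_uniform and log_uniform_values is only in how min and max are interpreted: as exponents in the first case and as the bounds of the value itself in the second.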

`early_terminate`

Use early termination (early_terminate) to stop poorly performing runs. If early termination occurs, W&B stops the current run before it creates a new run with a new set of hyperparameter values.

You must specify a stopping algorithm if you use early_terminate. Nest the type key within early_terminate within your sweep configuration.

Stopping algorithm

W&B currently supports the Hyperband stopping algorithm.

Hyperband hyperparameter optimization evaluates whether a program should stop or continue at one or more pre-set iteration counts, called brackets.

When a W&B run reaches a bracket, the sweep compares that run’s metric to all previously reported metric values. The sweep terminates the run if the run’s metric value is too high (when the goal is minimization) or if the run’s metric is too low (when the goal is maximization).

Brackets are based on the number of logged iterations. The number of brackets corresponds to the number of times you log the metric you are optimizing. The iterations can correspond to steps, epochs, or something in between. The numerical value of the step counter is not used in bracket calculations.

Specify either min_iter or max_iter to create a bracket schedule.

Key	Description
`min_iter`	Specify the iteration for the first bracket
`max_iter`	Specify the maximum number of iterations.
`s`	Specify the total number of brackets (required for `max_iter`)
`eta`	Specify the bracket multiplier schedule (default: `3`).
`strict`	Enable ‘strict’ mode that prunes runs aggressively, more closely following the original Hyperband paper. Defaults to false.

Hyperband checks which runs to end once every few minutes. The end run timestamp might differ from the specified brackets if your runs or iterations are short.

`command`

Modify the format and contents with nested values within the command key. You can directly include fixed components such as filenames.

On Unix systems, /usr/bin/env ensures that the OS chooses the correct Python interpreter based on the environment.

W&B supports the following macros for variable components of the command:

Command macro	Description
`${env}`	`/usr/bin/env` on Unix systems, omitted on Windows.
`${interpreter}`	Expands to `python`.
`${program}`	Training script filename specified by the sweep configuration `program` key.
`${args}`	Hyperparameters and their values in the form `--param1=value1 --param2=value2`.
`${args_no_boolean_flags}`	Hyperparameters and their values in the form `--param1=value1` except boolean parameters are in the form `--boolean_flag_param` when `True` and omitted when `False`.
`${args_no_hyphens}`	Hyperparameters and their values in the form `param1=value1 param2=value2`.
`${args_json}`	Hyperparameters and their values encoded as JSON.
`${args_json_file}`	The path to a file containing the hyperparameters and their values encoded as JSON.
`${envvar}`	A way to pass environment variables. `${envvar:MYENVVAR}` expands to the value of the MYENVVAR environment variable.
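To illustrate how the macros in this table turn a command template into an argument list, here is a rough sketch. The `expand_command` function is hypothetical and handles only a subset of the macros. It mirrors the descriptions in the table and is not how the W&B agent is implemented.

```python
import sys
from string import Template


def expand_command(command, program, params, interpreter="python"):
    """Expand a subset of the command macros into an argument list."""
    expanded = []
    for token in command:
        if token == "${env}":
            # /usr/bin/env on Unix systems, omitted on Windows.
            if sys.platform != "win32":
                expanded.append("/usr/bin/env")
        elif token == "${interpreter}":
            expanded.append(interpreter)
        elif token == "${program}":
            expanded.append(program)
        elif token == "${args}":
            expanded.extend(f"--{name}={value}" for name, value in params.items())
        elif token == "${args_no_hyphens}":
            expanded.extend(f"{name}={value}" for name, value in params.items())
        else:
            # Fixed components and custom macros such as "--batch_size=${batch_size}".
            expanded.append(Template(token).safe_substitute(params))
    return expanded


command = ["${interpreter}", "${program}", "--batch_size=${batch_size}", "--test=True"]
print(expand_command(command, "run.py", {"batch_size": 32}))
# ['python', 'run.py', '--batch_size=32', '--test=True']
```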

2.2.4 - Initialize a sweep

Initialize a W&B Sweep

W&B uses a Sweep Controller to manage sweeps on the cloud (standard) or locally (local) across one or more machines. After a run completes, the sweep controller issues a new set of instructions describing a new run to execute. Agents pick up these instructions and perform the runs. In a typical W&B Sweep, the controller lives on the W&B server. Agents live on your machines.

The following code snippets demonstrate how to initialize sweeps with the CLI and within a Jupyter Notebook or Python script.

Before you initialize a sweep, make sure you have a sweep configuration defined either in a YAML file or a nested Python dictionary object in your script. For more information, see Define sweep configuration.
Both the W&B Sweep and the W&B Run must be in the same project. Therefore, the name you provide when you initialize W&B (wandb.init()) must match the name of the project you provide when you initialize a W&B Sweep (wandb.sweep()).

Use the W&B SDK to initialize a sweep. Pass the sweep configuration dictionary to the sweep parameter. Optionally provide the name of the project for the project parameter (project) where you want the output of the W&B Run to be stored. If the project is not specified, the run is put in an “Uncategorized” project.

import wandb

# Example sweep configuration
sweep_configuration = {
    "method": "random",
    "name": "sweep",
    "metric": {"goal": "maximize", "name": "val_acc"},
    "parameters": {
        "batch_size": {"values": [16, 32, 64]},
        "epochs": {"values": [5, 10, 15]},
        "lr": {"max": 0.1, "min": 0.0001},
    },
}

sweep_id = wandb.sweep(sweep=sweep_configuration, project="project-name")

The wandb.sweep() function returns the sweep ID. The sweep ID includes the entity name and the project name. Make a note of the sweep ID.

Use the W&B CLI to initialize a sweep. Provide the name of your configuration file. Optionally provide the name of the project for the project flag. If the project is not specified, the W&B Run is put in an “Uncategorized” project.

Use the wandb sweep command to initialize a sweep. The following code example initializes a sweep for a sweeps_demo project and uses a config.yaml file for the configuration.

wandb sweep --project sweeps_demo config.yaml

This command will print out a sweep ID. The sweep ID includes the entity name and the project name. Make a note of the sweep ID.

2.2.5 - Start or stop a sweep agent

Start or stop a W&B Sweep Agent on one or more machines.

Start a W&B Sweep on one or more agents on one or more machines. W&B Sweep agents query the W&B server you launched when you initialized a W&B Sweep (wandb sweep) for hyperparameters and use them to run model training.

To start a W&B Sweep agent, provide the W&B Sweep ID that was returned when you initialized a W&B Sweep. The W&B Sweep ID has the form:

entity/project/sweep_ID

Where:

entity: Your W&B username or team name.
project: The name of the project where you want the output of the W&B Run to be stored. If the project is not specified, the run is put in an “Uncategorized” project.
sweep_ID: The pseudo random, unique ID generated by W&B.
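As a small illustration of this three-part form, the following sketch splits a full sweep path into its components. The `parse_sweep_path` helper and the example values are hypothetical and not part of the wandb library.

```python
from typing import NamedTuple


class SweepPath(NamedTuple):
    entity: str
    project: str
    sweep_id: str


def parse_sweep_path(path: str) -> SweepPath:
    """Split 'entity/project/sweep_ID' into its three components."""
    parts = path.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"Expected entity/project/sweep_ID, got: {path!r}")
    return SweepPath(*parts)


# "my-team" and "sweeps_demo" are placeholder names.
print(parse_sweep_path("my-team/sweeps_demo/dtzl1o7u").project)  # sweeps_demo
```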

Provide the name of the function the W&B Sweep will execute if you start a W&B Sweep agent within a Jupyter Notebook or Python script.

The following code snippets demonstrate how to start an agent with W&B. We assume you already have a configuration file and you have already initialized a W&B Sweep. For more information about how to define a configuration file, see Define sweep configuration.

Use the wandb agent command to start a sweep. Provide the sweep ID that was returned when you initialized the sweep. Copy and paste the code snippet below and replace sweep_id with your sweep ID:

wandb agent sweep_id

Use the W&B Python SDK library to start a sweep. Provide the sweep ID that was returned when you initialized the sweep. In addition, provide the name of the function the sweep will execute.

wandb.agent(sweep_id=sweep_id, function=function_name)

Stop W&B agent

Random and Bayesian searches will run forever. You must stop the process from the command line, within your Python script, or the Sweeps UI.

Optionally specify the number of W&B Runs a Sweep agent should try. The following code snippets demonstrate how to set a maximum number of W&B Runs with the CLI and within a Jupyter Notebook or Python script.

First, initialize your sweep. For more information, see Initialize sweeps.

sweep_id = wandb.sweep(sweep_config)

Next, start the sweep job. Provide the sweep ID generated from sweep initiation. Pass an integer value to the count parameter to set the maximum number of runs to try.

sweep_id, count = "dtzl1o7u", 10
wandb.agent(sweep_id, count=count)

If you start a new run after the sweep agent has finished, within the same script or notebook, then you should call wandb.teardown() before starting the new run.

First, initialize your sweep with the wandb sweep command. For more information, see Initialize sweeps.

wandb sweep config.yaml

Pass an integer value to the count flag to set the maximum number of runs to try.

NUM=10
SWEEPID="dtzl1o7u"
wandb agent --count $NUM $SWEEPID

2.2.6 - Parallelize agents

Parallelize W&B Sweep agents on a multi-core or multi-GPU machine.

Parallelize your W&B Sweep agents on a multi-core or multi-GPU machine. Before you get started, ensure you have initialized your W&B Sweep. For more information on how to initialize a W&B Sweep, see Initialize sweeps.

Parallelize on a multi-CPU machine

Depending on your use case, use the CLI or a Jupyter Notebook to parallelize W&B Sweep agents.

Use the wandb agent command to parallelize your sweep agent across multiple CPUs with the terminal. Provide the sweep ID that was returned when you initialized the sweep.

Open more than one terminal window on your local machine.
Copy and paste the code snippet below and replace sweep_id with your sweep ID:

wandb agent sweep_id

Use the W&B Python SDK library to parallelize your W&B Sweep agent across multiple CPUs within Jupyter Notebooks. Ensure you have the sweep ID that was returned when you initialized the sweep. In addition, provide the name of the function the sweep will execute for the function parameter:

Open more than one Jupyter Notebook.
Copy and paste the W&B Sweep ID into multiple Jupyter Notebooks to parallelize a W&B Sweep. For example, you can paste the following code snippet into multiple Jupyter Notebooks to parallelize your sweep if you have the sweep ID stored in a variable called sweep_id and the name of the function is function_name:

wandb.agent(sweep_id=sweep_id, function=function_name)

Parallelize on a multi-GPU machine

Follow the procedure outlined to parallelize your W&B Sweep agent across multiple GPUs with a terminal using CUDA Toolkit:

Open more than one terminal window on your local machine.
Specify the GPU instance to use with CUDA_VISIBLE_DEVICES when you start a W&B Sweep job (wandb agent). Assign CUDA_VISIBLE_DEVICES an integer value corresponding to the GPU instance to use.

For example, suppose you have two NVIDIA GPUs on your local machine. Open a terminal window and set CUDA_VISIBLE_DEVICES to 0 (CUDA_VISIBLE_DEVICES=0). Replace sweep_ID in the following example with the W&B Sweep ID that is returned when you initialized a W&B Sweep:

Terminal 1

CUDA_VISIBLE_DEVICES=0 wandb agent sweep_ID

Open a second terminal window. Set CUDA_VISIBLE_DEVICES to 1 (CUDA_VISIBLE_DEVICES=1). Paste the same W&B Sweep ID for the sweep_ID mentioned in the following code snippet:

Terminal 2

CUDA_VISIBLE_DEVICES=1 wandb agent sweep_ID
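If you prefer a single script over several terminal windows, the same idea can be sketched with Python's subprocess module. The helper names `agent_launch_specs` and `launch_agents` are hypothetical. The sketch assumes the wandb CLI is on your PATH and simply sets CUDA_VISIBLE_DEVICES for each agent process, as in the terminal commands above.

```python
import os
import subprocess


def agent_launch_specs(sweep_id: str, gpu_ids: list) -> list:
    """Build one (command, environment) pair per GPU."""
    specs = []
    for gpu in gpu_ids:
        # Copy the current environment and pin this agent to a single GPU.
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
        specs.append((["wandb", "agent", sweep_id], env))
    return specs


def launch_agents(sweep_id: str, gpu_ids: list) -> list:
    """Start one wandb agent process per GPU and return the process handles."""
    return [subprocess.Popen(cmd, env=env) for cmd, env in agent_launch_specs(sweep_id, gpu_ids)]


# Example (not executed here): start agents on GPUs 0 and 1, then wait for them.
# processes = launch_agents("entity/project/sweep_ID", [0, 1])
# for process in processes:
#     process.wait()
```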

2.2.7 - Visualize sweep results

Visualize the results of your W&B Sweeps with the W&B App UI.

Visualize the results of your W&B Sweeps with the W&B App. Navigate to the W&B App. Choose the project that you specified when you initialized a sweep. You will be redirected to your project workspace. Select the Sweep icon on the left panel (broom icon). From the Sweep UI, select the name of your Sweep from the list.

By default, W&B will automatically create a parallel coordinates plot, a parameter importance plot, and a scatter plot when you start a W&B Sweep job.

Parallel coordinates charts summarize the relationship between large numbers of hyperparameters and model metrics at a glance. For more information on parallel coordinates plots, see Parallel coordinates.

The scatter plot (left) compares the W&B Runs that were generated during the Sweep. For more information about scatter plots, see Scatter Plots.

The parameter importance plot (right) lists the hyperparameters that were the best predictors of, and highly correlated to, desirable values of your metrics. For more information on parameter importance plots, see Parameter Importance.

You can alter the dependent and independent values (x and y axis) that are automatically used. Within each panel there is a pencil icon called Edit panel. Choose Edit panel. A modal will appear. Within the modal, you can alter the behavior of the graph.

For more information on all default W&B visualization options, see Panels. See the Data Visualization docs for information on how to create plots from W&B Runs that are not part of a W&B Sweep.

2.2.8 - Manage sweeps with the CLI

Pause, resume, and cancel a W&B Sweep with the CLI.

Pause, resume, and cancel a W&B Sweep with the CLI. Pausing a W&B Sweep tells the W&B agent that new W&B Runs should not be executed until the Sweep is resumed. Resuming a Sweep tells the agent to continue executing new W&B Runs. Stopping a W&B Sweep tells the W&B Sweep agent to stop creating or executing new W&B Runs. Cancelling a W&B Sweep tells the Sweep agent to kill currently executing W&B Runs and stop executing new Runs.

In each case, provide the W&B Sweep ID that was generated when you initialized a W&B Sweep. Optionally open a new terminal window to execute the following commands. A new terminal window makes it easier to execute a command if a W&B Sweep is printing output statements to your current terminal window.

Use the following guidance to pause, resume, and cancel sweeps.

Pause sweeps

Pause a W&B Sweep so it temporarily stops executing new W&B Runs. Use the wandb sweep --pause command to pause a W&B Sweep. Provide the W&B Sweep ID that you want to pause.

wandb sweep --pause entity/project/sweep_ID

Resume sweeps

Resume a paused W&B Sweep with the wandb sweep --resume command. Provide the W&B Sweep ID that you want to resume:

wandb sweep --resume entity/project/sweep_ID

Stop sweeps

Stop a W&B Sweep to stop executing new W&B Runs and let currently executing Runs finish. Use the wandb sweep --stop command and provide the W&B Sweep ID that you want to stop:

wandb sweep --stop entity/project/sweep_ID

Cancel sweeps

Cancel a sweep to kill all running runs and stop running new runs. Use the wandb sweep --cancel command to cancel a W&B Sweep. Provide the W&B Sweep ID that you want to cancel.

wandb sweep --cancel entity/project/sweep_ID

For a full list of CLI command options, see the wandb sweep CLI Reference Guide.

Pause, resume, stop, and cancel a sweep across multiple agents

Pause, resume, stop, or cancel a W&B Sweep across multiple agents from a single terminal. For example, suppose you have a multi-core machine. After you initialize a W&B Sweep, you open new terminal windows and copy the Sweep ID to each new terminal.

Within any terminal, use the wandb sweep CLI command to pause, resume, stop, or cancel a W&B Sweep. For example, the following code snippet demonstrates how to pause a W&B Sweep across multiple agents with the CLI:

wandb sweep --pause entity/project/sweep_ID

Specify the --resume flag along with the Sweep ID to resume the Sweep across your agents:

wandb sweep --resume entity/project/sweep_ID

For more information on how to parallelize W&B agents, see Parallelize agents.

2.2.9 - Learn more about sweeps

Collection of useful sources for Sweeps.

Academic papers

Li, Lisha, et al. “Hyperband: A novel bandit-based approach to hyperparameter optimization.” The Journal of Machine Learning Research 18.1 (2017): 6765-6816.

Sweep Experiments

The following W&B Reports demonstrate examples of projects that explore hyperparameter optimization with W&B Sweeps.

Drought Watch Benchmark Progress
- Description: Developing the baseline and exploring submissions to the Drought Watch benchmark.
Tuning Safety Penalties in Reinforcement Learning
- Description: We examine agents trained with different side effect penalties on three different tasks: pattern creation, pattern removal, and navigation.
Meaning and Noise in Hyperparameter Search with W&B Stacey Svetlichnaya
- Description: How do we distinguish signal from pareidolia (imaginary patterns)? This article showcases what is possible with W&B and aims to inspire further exploration.
Who is Them? Text Disambiguation with Transformers
- Description: Using Hugging Face to explore models for natural language understanding
DeepChem: Molecular Solubility
- Description: Predict chemical properties from molecular structure with random forests and deep nets.
Intro to MLOps: Hyperparameter Tuning
- Description: Explore why hyperparameter optimization matters and look at three algorithms to automate hyperparameter tuning for your machine learning models.

How-to guide

The following how-to guide demonstrates how to solve real-world problems with W&B:

Sweeps with XGBoost
- Description: How to use W&B Sweeps for hyperparameter tuning using XGBoost.

Sweep GitHub repository

W&B advocates open source and welcomes contributions from the community. Find the W&B Sweeps GitHub repository. For information on how to contribute to the W&B open source repo, see the W&B GitHub Contribution guidelines.

2.2.10 - Manage algorithms locally

Search and stop algorithms locally instead of using the W&B cloud-hosted service.

The hyper-parameter controller is hosted by Weights & Biases as a cloud service by default. W&B agents communicate with the controller to determine the next set of parameters to use for training. The controller is also responsible for running early stopping algorithms to determine which runs can be stopped.

The local controller feature lets you run search and stopping algorithms locally. With the local controller, you can inspect and instrument the code to debug issues and to develop new features that can be incorporated into the cloud service.

This feature is offered to support faster development and debugging of new algorithms for the Sweeps tool. It is not intended for actual hyperparameter optimization workloads.

Before you get started, you must install the W&B SDK (wandb). Type the following code snippet into your command line:

pip install wandb sweeps

The following examples assume you already have a configuration file and a training loop defined in a python script or Jupyter Notebook. For more information about how to define a configuration file, see Define sweep configuration.

Run the local controller from the command line

Initialize a sweep similarly to how you normally would when you use hyper-parameter controllers hosted by W&B as a cloud service. Specify the controller flag (controller) to indicate you want to use the local controller for W&B sweep jobs:

wandb sweep --controller config.yaml

Alternatively, you can separate initializing a sweep and specifying that you want to use a local controller into two steps.

To separate the steps, first add the following key-value to your sweep’s YAML configuration file:

controller:
  type: local

Next, initialize the sweep:

wandb sweep config.yaml

wandb sweep generates a sweep ID. After you initialize the sweep, start a controller with wandb controller:

wandb controller {entity}/{project}/{sweep_id}

Once you have specified you want to use a local controller, start one or more Sweep agents to execute the sweep. Start a W&B Sweep agent as you normally would. See Start sweep agents for more information.

wandb agent sweep_ID

Run a local controller with W&B Python SDK

The following code snippets demonstrate how to specify and use a local controller with the W&B Python SDK.

The simplest way to use a controller with the Python SDK is to pass the sweep ID to the wandb.controller method. Next, use the returned object's run method to start the sweep job:

sweep = wandb.controller(sweep_id)
sweep.run()

If you want more control of the controller loop:

import time

import wandb

sweep = wandb.controller(sweep_id)
while not sweep.done():
    sweep.print_status()
    sweep.step()
    time.sleep(5)  # Wait before polling the controller again

Or even more control over the parameters served:

import wandb

sweep = wandb.controller(sweep_id)
while not sweep.done():
    params = sweep.search()
    sweep.schedule(params)
    sweep.print_status()

If you want to specify your sweep entirely with code you can do something like this:

import wandb

sweep = wandb.controller()
sweep.configure_search("grid")
sweep.configure_program("train-dummy.py")
sweep.configure_controller(type="local")
sweep.configure_parameter("param1", value=3)
sweep.create()
sweep.run()

2.2.11 - Sweeps troubleshooting

Troubleshoot common W&B Sweep issues.

Troubleshoot common error messages with the guidance suggested.

`CommError, Run does not exist` and `ERROR Error uploading`

Your W&B Run ID might be set explicitly if both of these error messages are returned. As an example, you might have a similar code snippet defined somewhere in your Jupyter Notebook or Python script:

wandb.init(id="some-string")

You cannot set a Run ID for W&B Sweeps because W&B automatically generates random, unique IDs for Runs created by W&B Sweeps.

W&B Run IDs need to be unique within a project.

We recommend you pass a name to the name parameter when you initialize W&B if you want to set a custom name that will appear on tables and graphs. For example:

wandb.init(name="a helpful readable run name")

`CUDA out of memory`

Refactor your code to use process-based executions if you see this error message. More specifically, rewrite your code to a Python script. In addition, call the W&B Sweep Agent from the CLI, instead of the W&B Python SDK.

As an example, suppose you rewrite your code to a Python script called train.py. Add the name of the training script (train.py) to your YAML Sweep configuration file (config.yaml in this example):

program: train.py
method: bayes
metric:
  name: validation_loss
  goal: minimize
parameters:
  learning_rate:
    min: 0.0001
    max: 0.1
  optimizer:
    values: ["adam", "sgd"]

Next, add the following to your train.py Python script:

if __name__ == "__main__":
    train()

Navigate to your CLI and initialize a W&B Sweep with wandb sweep:

wandb sweep config.yaml

Make a note of the W&B Sweep ID that is returned. Next, start the Sweep job with wandb agent with the CLI instead of the Python SDK (wandb.agent). Replace sweep_ID in the code snippet below with the Sweep ID that was returned in the previous step:

wandb agent sweep_ID

`anaconda 400 error`

The following error usually occurs when you do not log the metric that you are optimizing:

wandb: ERROR Error while calling W&B API: anaconda 400 error: 
{"code": 400, "message": "TypeError: bad operand type for unary -: 'NoneType'"}

Within your YAML file or nested dictionary you specify a key named “metric” to optimize. Ensure that you log (wandb.log) this metric. In addition, ensure you use the exact metric name that you defined the sweep to optimize within your Python script or Jupyter Notebook. For more information about configuration files, see Define sweep configuration.
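A quick way to catch this mismatch before launching a sweep is to compare the metric name in your configuration with the keys you log. The sketch below is a hypothetical helper, not part of the wandb library. It assumes you can list the keys your training code passes to wandb.log.

```python
def metric_is_logged(sweep_config: dict, logged_keys) -> bool:
    """Return True if the sweep's metric name exactly matches a logged key."""
    metric_name = sweep_config.get("metric", {}).get("name")
    return metric_name is not None and metric_name in set(logged_keys)


sweep_config = {
    "method": "bayes",
    "metric": {"name": "validation_loss", "goal": "minimize"},
}

# The comparison is exact: "val_loss" does not match "validation_loss".
print(metric_is_logged(sweep_config, ["val_loss", "epoch"]))         # False
print(metric_is_logged(sweep_config, ["validation_loss", "epoch"]))  # True
```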

2.2.12 - Sweeps UI

Describes the different components of the Sweeps UI.

The state (State), creation time (Created), the entity that started the sweep (Creator), the number of runs completed (Run count), and the time it took to compute the sweep (Compute time) are displayed in the Sweeps UI. The expected number of runs a sweep will create (Est. Runs) is provided when you do a grid search over a discrete search space. You can also click on a sweep to pause, resume, stop, or kill the sweep from the interface.

2.2.13 - Tutorial: Create sweep job from project

Tutorial on how to create sweep jobs from a pre-existing W&B project.

This tutorial explains how to create sweep jobs from a pre-existing W&B project. We will use the Fashion MNIST dataset to train a PyTorch convolutional neural network to classify images. The required code and dataset are located in the W&B examples repository (PyTorch CNN Fashion).

Explore the results in this W&B Dashboard.

1. Create a project

First, create a baseline. Download the PyTorch MNIST dataset example model from W&B examples GitHub repository. Next, train the model. The training script is within the examples/pytorch/pytorch-cnn-fashion directory.

Clone this repo: git clone https://github.com/wandb/examples.git
Open this example: cd examples/pytorch/pytorch-cnn-fashion
Run a run manually: python train.py

Optionally, explore the example as it appears in the W&B App UI dashboard.

View an example project page →

2. Create a sweep

From your project page, open the Sweep tab in the sidebar and select Create Sweep.

The auto-generated configuration guesses values to sweep over based on the runs you have completed. Edit the configuration to specify what ranges of hyperparameters you want to try. When you launch the sweep, it starts a new process on the hosted W&B sweep server. This centralized service coordinates the agents, which are the machines that run the training jobs.

3. Launch agents

Next, launch an agent locally. You can launch up to 20 agents on different machines in parallel if you want to distribute the work and finish the sweep job more quickly. The agent will print out the set of parameters it’s trying next.

Now you’re running a sweep. The following image demonstrates what the dashboard looks like as the example sweep job is running. View an example project page →

Seed a new sweep with existing runs

Launch a new sweep using existing runs that you’ve previously logged.

Open your project table.
Select the runs you want to use with checkboxes on the left side of the table.
Click the dropdown to create a new sweep.

Your sweep will now be set up on our server. All you need to do is launch one or more agents to start running runs.

If you kick off the new sweep as a Bayesian sweep, the selected runs will also seed the Gaussian Process.

2.3 - Tables

Iterate on datasets and understand model predictions


Use W&B Tables to visualize and query tabular data. For example:

Compare how different models perform on the same test set
Identify patterns in your data
Look at sample model predictions visually
Query to find commonly misclassified examples

The above image shows a table with semantic segmentation predictions and custom metrics. View this table in this sample project from the W&B ML Course.

How it works

A Table is a two-dimensional grid of data where each column has a single type of data. Tables support primitive and numeric types, as well as nested lists, dictionaries, and rich media types.

Log a Table

Log a table with a few lines of code:

wandb.init(): Create a run to track results.
wandb.Table(): Create a new table object.
- columns: Set the column names.
- data: Set the contents of the table.
run.log(): Log the table to save it to W&B.

import wandb

run = wandb.init(project="table-test")
my_table = wandb.Table(columns=["a", "b"], data=[["a1", "b1"], ["a2", "b2"]])
run.log({"Table Name": my_table})

How to get started

Quickstart: Learn to log data tables, visualize data, and query data.
Tables Gallery: See example use cases for Tables.

2.3.1 - Tutorial: Log tables, visualize and query data

Explore how to use W&B Tables with this 5-minute Quickstart.

The following Quickstart demonstrates how to log data tables, visualize data, and query data.

Select the button below to try a PyTorch Quickstart example project on MNIST data.

1. Log a table

Log a table with W&B. You can either construct a new table or pass a Pandas Dataframe.

To construct and log a new Table, you will use:

wandb.init(): Create a run to track results.
wandb.Table(): Create a new table object.
- columns: Set the column names.
- data: Set the contents of each row.
wandb.Run.log(): Log the table to save it to W&B.

Here’s an example:

import wandb

with wandb.init(project="table-test") as run:
    # Create and log a new table.
    my_table = wandb.Table(columns=["a", "b"], data=[["a1", "b1"], ["a2", "b2"]])
    run.log({"Table Name": my_table})

Pass a Pandas Dataframe to wandb.Table() to create a new table.

import wandb
import pandas as pd

df = pd.read_csv("my_data.csv")

with wandb.init(project="df-table") as run:
    # Create a new table from the DataFrame
    # and log it to W&B.
    my_table = wandb.Table(dataframe=df)
    run.log({"Table Name": my_table})

For more information on supported data types, see the wandb.Table in the W&B API Reference Guide.

2. Visualize tables in your project workspace

View the resulting table in your workspace.

Navigate to your project in the W&B App.
Select the name of your run in your project workspace. A new panel is added for each unique table key.

In this example, my_table is logged under the key "Table Name".

3. Compare across model versions

Log sample tables from multiple W&B Runs and compare results in the project workspace. In this example workspace, we show how to combine rows from multiple different versions in the same table.

Use the table filter, sort, and grouping features to explore and evaluate model results.

2.3.2 - Log tables

Visualize and log tabular data with W&B Tables. A W&B Table is a two-dimensional grid of data where each column has a single type of data. Each row represents one or more data points logged to a W&B run. W&B Tables support primitive and numeric types, as well as nested lists, dictionaries, and rich media types.

A W&B Table is a specialized data type in W&B, logged as an artifact object.

You create and log table objects using the W&B Python SDK. When you create a table object, you specify the columns and data for the table and a mode. The mode determines how the table is logged and updated during your ML experiments.

INCREMENTAL mode is supported on W&B Server v0.70.0 and above.

Create and log a table

Initialize a new run with wandb.init().
Create a Table object with the wandb.Table class. Pass the column names and the table contents to the columns and data parameters, respectively. It is recommended to set the optional log_mode parameter to one of the three modes: IMMUTABLE (the default), MUTABLE, or INCREMENTAL. See Logging modes in the next section for more information.
Log the table to W&B with run.log().

The following example shows how to create and log a table with two columns, a and b, and two rows of data, ["a1", "b1"] and ["a2", "b2"]:

import wandb

# Start a new run
with wandb.init(project="table-demo") as run:

    # Create a table object with two columns and two rows of data
    my_table = wandb.Table(
        columns=["a", "b"],
        data=[["a1", "b1"], ["a2", "b2"]],
        log_mode="IMMUTABLE"
        )

    # Log the table to W&B
    run.log({"Table Name": my_table})

Logging modes

The wandb.Table log_mode parameter determines how a table is logged and updated during your ML experiments. The log_mode parameter accepts one of three arguments: IMMUTABLE, MUTABLE, and INCREMENTAL. Each mode has different implications for how a table is logged, how it can be modified, and how it is rendered in the W&B App.

The following table describes the three logging modes, the high-level differences between them, and common use cases for each mode:

Mode	Definition	Use Cases	Benefits
`IMMUTABLE`	Once a table is logged to W&B, you cannot modify it.	- Storing tabular data generated at the end of a run for further analysis	- Minimal overhead when logged at the end of a run - All rows rendered in UI
`MUTABLE`	After you log a table to W&B, you can overwrite the existing table with a new one.	- Adding columns or rows to existing tables - Enriching results with new information	- Capture Table mutations - All rows rendered in UI
`INCREMENTAL`	Add batches of new rows to a table throughout the machine learning experiment.	- Adding rows to tables in batches - Long-running training jobs - Processing large datasets in batches - Monitoring ongoing results	- View updates on UI during training - Ability to step through increments

The next sections show example code snippets for each mode along with considerations when to use each mode.

MUTABLE mode

MUTABLE mode updates an existing table by replacing the existing table with a new one. MUTABLE mode is useful when you want to add new columns and rows to an existing table in a non iterative process. Within the UI, the table is rendered with all rows and columns, including the new ones added after the initial log.

In MUTABLE mode, the table object is replaced each time you log the table. Overwriting a table with a new one is computationally expensive and can be slow for large tables.

The following example shows how to create a table in MUTABLE mode, log it, and then add new columns to it. The table object is logged three times: once with the initial data, once with the confidence scores, and once with the final predictions.

The following example uses a placeholder function load_eval_data() to load data and a placeholder function model.predict() to make predictions. You will need to replace these with your own data loading and prediction functions.

import wandb
import numpy as np

with wandb.init(project="mutable-table-demo") as run:

    # Create a table object with MUTABLE logging mode
    table = wandb.Table(columns=["input", "label", "prediction"],
                        log_mode="MUTABLE")

    # Load data and make predictions
    inputs, labels = load_eval_data() # Placeholder function
    raw_preds = model.predict(inputs) # Placeholder function

    for inp, label, pred in zip(inputs, labels, raw_preds):
        table.add_data(inp, label, pred)

    # Step 1: Log initial data 
    run.log({"eval_table": table})  # Log initial table

    # Step 2: Add confidence scores (e.g. max softmax)
    confidences = np.max(raw_preds, axis=1)
    table.add_column("confidence", confidences)
    run.log({"eval_table": table})  # Add confidence info

    # Step 3: Add post-processed predictions
    # (e.g., thresholded or smoothed outputs)
    post_preds = (confidences > 0.7).astype(int)
    table.add_column("final_prediction", post_preds)
    run.log({"eval_table": table})  # Final update with another column

If you only want to add new batches of rows (no columns) incrementally like in a training loop, consider using INCREMENTAL mode instead.

INCREMENTAL mode

In INCREMENTAL mode, you log batches of rows to a table during the machine learning experiment. This is ideal for monitoring long-running jobs or for working with large tables that would be inefficient to log again in full each time they are updated. Within the UI, the table is updated with new rows as they are logged, allowing you to view the latest data without having to wait for the entire run to finish. You can also step through the increments to view the table at different points in time.

Run workspaces in the W&B App have a limit of 100 increments. If you log more than 100 increments, only the most recent 100 are shown in the run workspace.
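One way to stay within the 100-increment limit is to log the table every few steps rather than on every step. The following plain-Python sketch computes such an interval. The helper log_interval and the step count are hypothetical; the limit of 100 is taken from the note above.

```python
import math

MAX_INCREMENTS = 100  # Run workspace limit stated above


def log_interval(total_steps, max_increments=MAX_INCREMENTS):
    """Return how many steps to wait between table logs so that the
    number of increments does not exceed max_increments."""
    return max(1, math.ceil(total_steps / max_increments))


total_steps = 2500  # Hypothetical number of training steps
interval = log_interval(total_steps)

# Steps at which a training loop would call run.log() for the table.
log_steps = [s for s in range(total_steps) if (s + 1) % interval == 0]

print(interval)        # 25
print(len(log_steps))  # 100
```

In a training loop, you would still add rows on every step and call run.log() for the table only when (step + 1) % interval == 0.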

The following example creates a table in INCREMENTAL mode, logs it, and then adds new rows to it. Note that the table is logged once per training step (step).

The following example uses a placeholder function get_training_batch() to load data, a placeholder function train_model_on_batch() to train the model, and a placeholder function predict_on_batch() to make predictions. You will need to replace these with your own data loading, training, and prediction functions.

import wandb

with wandb.init(project="incremental-table-demo") as run:

    # Create a table with INCREMENTAL logging mode
    table = wandb.Table(columns=["step", "input", "label", "prediction"],
                        log_mode="INCREMENTAL")

    # Training loop
    for step in range(get_num_batches()): # Placeholder function
        # Load batch data
        inputs, labels = get_training_batch(step) # Placeholder function

        # Train and predict
        train_model_on_batch(inputs, labels) # Placeholder function
        predictions = predict_on_batch(inputs) # Placeholder function

        # Add batch data to table
        for input_item, label, prediction in zip(inputs, labels, predictions):
            table.add_data(step, input_item, label, prediction)

        # Log the table incrementally
        run.log({"training_table": table}, step=step)

Incremental logging is generally more computationally efficient than logging a new table each time (log_mode="MUTABLE"). However, the W&B App may not render all rows in the table if you log a large number of increments. If your goal is to update and view your table data while your run is ongoing and to have all the data available for analysis, consider using two tables: one with INCREMENTAL log mode and one with IMMUTABLE log mode.

The following example shows how to combine INCREMENTAL and IMMUTABLE logging modes to achieve this.

import wandb

with wandb.init(project="combined-logging-example") as run:

    # Create an incremental table for efficient updates during training
    incr_table = wandb.Table(columns=["step", "input", "prediction", "label"],
                            log_mode="INCREMENTAL")

    # Training loop
    for step in range(get_num_batches()):
        # Process batch
        inputs, labels = get_training_batch(step)
        predictions = model.predict(inputs)

        # Add data to incremental table
        for inp, pred, label in zip(inputs, predictions, labels):
            incr_table.add_data(step, inp, pred, label)

        # Log the incremental update (suffix with -incr to distinguish from final table)
        run.log({"table-incr": incr_table}, step=step)

    # At the end of training, create a complete immutable table with all data
    # Using the default IMMUTABLE mode to preserve the complete dataset
    final_table = wandb.Table(columns=incr_table.columns, data=incr_table.data, log_mode="IMMUTABLE")
    run.log({"table": final_table})

In this example, the incr_table is logged incrementally (with log_mode="INCREMENTAL") during training. This allows you to log and view updates to the table as new data is processed. At the end of training, an immutable table (final_table) is created with all data from the incremental table. The immutable table is logged to preserve the complete dataset for further analysis and it enables you to view all rows in the W&B App.

Examples

Enriching evaluation results with MUTABLE

import wandb
import numpy as np

with wandb.init(project="mutable-logging") as run:

    # Step 1: Log initial predictions
    table = wandb.Table(columns=["input", "label", "prediction"], log_mode="MUTABLE")
    inputs, labels = load_eval_data()
    raw_preds = model.predict(inputs)

    for inp, label, pred in zip(inputs, labels, raw_preds):
        table.add_data(inp, label, pred)

    run.log({"eval_table": table})  # Log raw predictions

    # Step 2: Add confidence scores (e.g. max softmax)
    confidences = np.max(raw_preds, axis=1)
    table.add_column("confidence", confidences)
    run.log({"eval_table": table})  # Add confidence info

    # Step 3: Add post-processed predictions
    # (e.g., thresholded or smoothed outputs)
    post_preds = (confidences > 0.7).astype(int)
    table.add_column("final_prediction", post_preds)
    run.log({"eval_table": table})

Resuming runs with INCREMENTAL tables

You can continue logging to an incremental table when resuming a run:

import wandb

# Resume an existing run; replace "your-run-id" with the ID of the run to resume
resumed_run = wandb.init(project="resume-incremental", id="your-run-id", resume="must")

# Create the incremental table; no need to populate it with data from the previously logged table.
# Increments continue to be added to the Table artifact.
table = wandb.Table(columns=["step", "metric"], log_mode="INCREMENTAL")

# Continue logging
# resume_step, final_step, and compute_metric() are placeholders for your own values and function
for step in range(resume_step, final_step):
    metric = compute_metric(step)
    table.add_data(step, metric)
    resumed_run.log({"metrics": table}, step=step)

resumed_run.finish()

Increments are logged to a new table if you turn off summaries on a key used for the incremental table using wandb.Run.define_metric("<table_key>", summary="none") or wandb.Run.define_metric("*", summary="none").

Batch training with INCREMENTAL

import wandb

with wandb.init(project="batch-training-incremental") as run:

    # Create an incremental table
    table = wandb.Table(columns=["step", "input", "label", "prediction"], log_mode="INCREMENTAL")

    # Simulated training loop
    for step in range(get_num_batches()):
        # Load batch data
        inputs, labels = get_training_batch(step)

        # Train the model on this batch
        train_model_on_batch(inputs, labels)

        # Run model inference
        predictions = predict_on_batch(inputs)

        # Add data to the table
        for input_item, label, prediction in zip(inputs, labels, predictions):
            table.add_data(step, input_item, label, prediction)

        # Log the current state of the table incrementally
        run.log({"training_table": table}, step=step)

2.3.3 - Visualize and analyze tables

Visualize and analyze W&B Tables.

Customize your W&B Tables to answer questions about your machine learning model’s performance, analyze your data, and more.

Interactively explore your data to:

Compare changes precisely across models, epochs, or individual examples
Understand higher-level patterns in your data
Capture and communicate your insights with visual samples

W&B Tables possess the following behaviors:

Stateless in an artifact context: any table logged alongside an artifact version resets to its default state after you close the browser window.
Stateful in a workspace or report context: any changes you make to a table in a single run workspace, multi-run project workspace, or Report persist.

For information on how to save your current W&B Table view, see Save your view.

Compare two tables

Compare two tables with a merged view or a side-by-side view. For example, the image below demonstrates a table comparison of MNIST data.

Follow these steps to compare two tables:

Go to your project in the W&B App.
Select the artifacts icon on the left panel.
Select an artifact version.

In the following image we demonstrate a model’s predictions on MNIST validation data after each of five epochs (view interactive example here).

Click on 'predictions' to view the Table

Hover over the second artifact version you want to compare in the sidebar and click Compare when it appears. For example, in the image below we select a version labeled as “v4” to compare to MNIST predictions made by the same model after 5 epochs of training.

Merged view

Initially you see both tables merged together. The first table selected has index 0 and a blue highlight, and the second table has index 1 and a yellow highlight. View a live example of merged tables here.

From the merged view, you can:

choose the join key: use the dropdown at the top left to set the column to use as the join key for the two tables. Typically this is the unique identifier of each row, such as the filename of a specific example in your dataset or an incrementing index on your generated samples. Note that it’s currently possible to select any column, which may yield illegible tables and slow queries.
concatenate instead of join: select “concatenating all tables” in this dropdown to union all the rows from both tables into one larger Table instead of joining across their columns
reference each Table explicitly: use 0, 1, and * in the filter expression to explicitly specify a column in one or both table instances
visualize detailed numerical differences as histograms: compare the values in any cell at a glance
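To make the difference between joining and concatenating concrete, here is a plain-Python sketch on two small tables of predictions. The row data, the id join key, and the helper names are hypothetical. The sketch illustrates the two operations in general, assumes an inner join, and is not how the W&B App implements them.

```python
# Two hypothetical tables of predictions, one per model version.
table_0 = [
    {"id": "img_1", "guess": 3},
    {"id": "img_2", "guess": 7},
]
table_1 = [
    {"id": "img_1", "guess": 3},
    {"id": "img_2", "guess": 1},
]


def join_on(key, left, right):
    """Join rows that share the same key; prefix columns with the table index."""
    right_by_key = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is None:
            continue  # Inner join (an assumption): keep keys present in both tables
        merged = {key: row[key]}
        for col in row:
            if col != key:
                merged[f"0.{col}"] = row[col]
                merged[f"1.{col}"] = match[col]
        joined.append(merged)
    return joined


def concatenate(*tables):
    """Union all rows into one larger table, tagging each row with its source index."""
    return [{"table": i, **row} for i, t in enumerate(tables) for row in t]


joined = join_on("id", table_0, table_1)
stacked = concatenate(table_0, table_1)

print(len(joined))   # 2 rows, one per shared id
print(len(stacked))  # 4 rows, all rows from both tables
# Rows where the two versions disagree
print([r["id"] for r in joined if r["0.guess"] != r["1.guess"]])  # ['img_2']
```

A join key that uniquely identifies each row keeps the joined table at one row per example, which is why a filename or an incrementing index is the typical choice.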

Side-by-side view

To view the two tables side-by-side, change the first dropdown from “Merge Tables: Table” to “List of: Table” and then update the “Page size” accordingly. The first Table selected is on the left and the second one is on the right. You can also compare the tables vertically by selecting the “Vertical” checkbox.

From the side-by-side view, you can:

compare the tables at a glance: apply any operations (sort, filter, group) to both tables in tandem and spot any changes or differences quickly. For example, view the incorrect predictions grouped by guess, the hardest negatives overall, the confidence score distribution by true label, etc.
explore two tables independently: scroll through and focus on the side/rows of interest

Visualize how values change throughout your runs

View how values you log to a table change throughout your runs with a step slider. Slide the step slider to view the values logged at different steps. For example, you can view how the loss, accuracy, or other metrics change after each run.

The slider uses a key to determine the step value. The default key for the slider is _step, a special key that W&B automatically logs for you. The _step key is an integer that increments by 1 each time you call wandb.Run.log() in your code.

To add a step slider to a W&B Table:

Navigate to your project’s workspace.
Click Add panel in the top right corner of the workspace.
Select Query panel.
Within the query expression editor, select runs and press Enter on your keyboard.
Click the gear icon to view the settings for the panel.
Set Render As selector to Stepper.
Set Stepper Key to _step or the key to use as the unit for the step slider.

The following image shows a query panel with three W&B runs and the values they logged at step 295.

Within the W&B App UI you may notice duplicate values for multiple steps. This duplication can occur if multiple runs log the same value at different steps, or if a run does not log values at every step. If a value is missing for a given step, W&B uses the last value that was logged as the slider key.

Custom step key

The step key can be any numeric metric that you log in your runs, such as epoch or global_step. When you use a custom step key, W&B maps each value of that key to a step (_step) in the run.

This table shows how a custom step key epoch maps to _step values for three different runs: serene-sponge, lively-frog, and vague-cloud. Each row represents a call to wandb.Run.log() at a particular _step in a run. The columns show the corresponding epoch values, if any, that were logged at those steps. Some _step values are omitted to save space.

The first time wandb.Run.log() was called, none of the runs logged an epoch value, so the table shows empty values for epoch.

`_step`	vague-cloud (`epoch`)	lively-frog (`epoch`)	serene-sponge (`epoch`)
1
2			1
4		1	2
5	1
6			3
8		2	4
10			5
12		3	6
14			7
15	2
16		4	8
18			9
20	3	5	10

Now, if the slider is set to epoch = 1, the following happens:

vague-cloud finds epoch = 1 and returns the value logged at _step = 5
lively-frog finds epoch = 1 and returns the value logged at _step = 4
serene-sponge finds epoch = 1 and returns the value logged at _step = 2

If the slider is set to epoch = 9:

vague-cloud doesn’t log epoch = 9, so W&B uses the latest prior value, epoch = 3, and returns the value logged at _step = 20
lively-frog also doesn’t log epoch = 9, so W&B uses the latest prior value, epoch = 5, and returns the value logged at _step = 20
serene-sponge finds epoch = 9 and returns the value logged at _step = 18
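The lookup described above can be sketched in plain Python: for each run, find the last logged row whose epoch is less than or equal to the slider value and return its _step. The data is copied from the table above. The function name step_for is hypothetical, and the sketch mirrors the behavior described here rather than W&B's actual implementation.

```python
from bisect import bisect_right

# (_step, epoch) pairs per run, in logging order, taken from the table above.
runs = {
    "vague-cloud": [(5, 1), (15, 2), (20, 3)],
    "lively-frog": [(4, 1), (8, 2), (12, 3), (16, 4), (20, 5)],
    # serene-sponge logs epochs 1 through 10 at _step 2, 4, ..., 20
    "serene-sponge": [(2 * e, e) for e in range(1, 11)],
}


def step_for(rows, slider_value):
    """Return the _step of the latest row whose epoch is <= slider_value,
    or None if the run has not logged any such epoch."""
    epochs = [epoch for _, epoch in rows]
    i = bisect_right(epochs, slider_value)
    if i == 0:
        return None
    return rows[i - 1][0]


for name, rows in runs.items():
    print(name, step_for(rows, 1), step_for(rows, 9))
# vague-cloud 5 20
# lively-frog 4 20
# serene-sponge 2 18
```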

Compare artifacts

You can also compare tables across time or model variants.

Compare tables across time

Log a table in an artifact for each meaningful step of training to analyze model performance over training time. For example, you could log a table at the end of every validation step, after every 50 epochs of training, or any frequency that makes sense for your pipeline. Use the side-by-side view to visualize changes in model predictions.

For a more detailed walkthrough of visualizing predictions across training time, see the predictions over time report and this interactive notebook example.

Compare tables across model variants

Compare two artifact versions logged at the same step for two different models to analyze model performance across different configurations (hyperparameters, base architectures, and so forth).

For example, compare predictions between a baseline and a new model variant, 2x_layers_2x_lr, where the first convolutional layer doubles from 32 to 64, the second from 128 to 256, and the learning rate from 0.001 to 0.002. From this live example, use the side-by-side view and filter down to the incorrect predictions after 1 (left tab) versus 5 training epochs (right tab).

Save your view

Tables you interact with in the run workspace, project workspace, or a report automatically save their view state. If you apply any table operations and then close your browser, the table retains the last viewed configuration when you next navigate to it.

Tables you interact with in the artifact context remain stateless.

To save a table from a workspace in a particular state, export it to a W&B Report. To export a table to a report:

Select the kebab icon (three vertical dots) in the top right corner of your workspace visualization panel.
Select either Share panel or Add to report.

Examples

These reports highlight the different use cases of W&B Tables:

2.3.4 - Example tables

Examples of W&B Tables

The following sections highlight some of the ways you can use tables:

View your data

Log metrics and rich media during model training or evaluation, then visualize results in a persistent database synced to the cloud, or to your hosting instance.

For example, check out this table that shows a balanced split of a photos dataset.

Interactively explore your data

View, sort, filter, group, join, and query tables to understand your data and model performance—no need to browse static files or rerun analysis scripts.

For example, see this report on style-transferred audio.

Compare model versions

Quickly compare results across different training epochs, datasets, hyperparameter choices, model architectures, and so on.

For example, see this table that compares two models on the same test images.

Track every detail and see the bigger picture

Zoom in to visualize a specific prediction at a specific step. Zoom out to see the aggregate statistics, identify patterns of errors, and understand opportunities for improvement. This tool works for comparing steps from a single model training, or results across different model versions.

For example, see this example table that analyzes results after one and then after five epochs on the MNIST dataset.

Example Projects with W&B Tables

The following highlight some real W&B Projects that use W&B Tables.

Image classification

Read Visualize Data for Image Classification, follow the data visualization nature Colab, or explore the artifacts context to see how a CNN identifies ten types of living things (plants, birds, insects, etc.) from iNaturalist photos.

Compare the distribution of true labels across two different models' predictions.

Audio

Interact with audio tables in Whale2Song - W&B Tables for Audio on timbre transfer. You can compare a recorded whale song with a synthesized rendition of the same melody on an instrument like violin or trumpet. You can also record your own songs and explore their synthesized versions in W&B with the audio transfer Colab.

Text

Browse text samples from training data or generated output, dynamically group by relevant fields, and align your evaluation across model variants or experiment settings. Render text as Markdown or use visual diff mode to compare texts. See the Shakespeare text generation report for an example of a character-based RNN.

Doubling the size of the hidden layer yields some more creative prompt completions.

Video

Browse and aggregate over videos logged during training to understand your models. Here is an early example using the SafeLife benchmark for RL agents seeking to minimize side effects.

Browse easily through the few successful agents

Tabular data

View a report on how to split and pre-process tabular data with version control and de-duplication.

Comparing model variants (semantic segmentation)

An interactive notebook and live example of logging Tables for semantic segmentation and comparing different models. Try your own queries in this Table.

Find the best predictions across two models on the same test set

Analyzing improvement over training time

A detailed report on how to visualize predictions over time and the accompanying interactive notebook.

2.3.5 - Export table data

How to export data from tables.

Like all W&B Artifacts, Tables can be converted into pandas dataframes for easy data exporting.

Convert `table` to `artifact`

First, you’ll need to convert the table to an artifact. The easiest way to do this is to add the table to an artifact with artifact.add(table, "table_name") and then retrieve it with artifact.get("table_name"):

import wandb

# Create a new table, add it to an artifact, and log the artifact.
with wandb.init() as r:
    artifact = wandb.Artifact("my_dataset", type="dataset")
    table = wandb.Table(
        columns=["a", "b", "c"], data=[(i, i * 2, 2**i) for i in range(10)]
    )
    artifact.add(table, "my_table")
    r.log_artifact(artifact)

# Retrieve the created table using the artifact you created.
with wandb.init() as r:
    artifact = r.use_artifact("my_dataset:latest")
    table = artifact.get("my_table")

Convert `artifact` to Dataframe

Then, convert the table into a dataframe:

# Following from the last code example:
df = table.get_dataframe()

Export Data

Now you can export using any method dataframe supports:

# Converting the table data to .csv
df.to_csv("example.csv", encoding="utf-8")

Next Steps

Check out the reference documentation on artifacts.
Go through our Tables Walkthrough guide.
Check out the Dataframe reference docs.

2.4 - W&B App UI

This section provides details to help you use the W&B App UI. Manage workspaces, teams, and registries, visualize and observe experiments, create panels and reports, configure automations, and more.

Access the W&B App in a web browser.

A W&B Multi-tenant deployment is accessible on the public web at https://wandb.ai/.
A W&B Dedicated Cloud deployment is accessible at the domain you configured when you signed up for W&B Dedicated Cloud. An admin user can update the domain in the W&B Management Console. Click on the icon in the top right corner and then click System console.
A W&B Self-Managed deployment is accessible at the hostname you configured when you deployed W&B. For example, if you deploy using Helm, the hostname is configured in values.global.host. An admin user can update the domain in the W&B Management Console. Click on the icon in the top right corner and then click System console.

Learn more:

Track experiments using runs or sweeps.
Configure deployment settings and defaults.
Add panels to visualize your experiments, such as line plots, bar plots, media panels, query panels, and tables.
Add custom charts.
Create and share reports.

2.4.1 - Panels

Use workspace panel visualizations to explore your logged data by key, visualize the relationships between hyperparameters and output metrics, and more.

Workspace modes

W&B projects support two different workspace modes. The icon next to the workspace name shows its mode.

Automated workspaces automatically generate panels for all keys logged in the project. Choose an automatic workspace:

To get started quickly by visualizing all available data for the project.
For smaller projects that log fewer keys.
For broader analysis.

If you delete a panel from an automatic workspace, you can use Quick add to recreate it.

Manual workspaces start as blank slates and display only those panels intentionally added by users. Choose a manual workspace:

When you care mainly about a fraction of the keys logged in the project.
For more focused analysis.
To improve the performance of a workspace, avoiding loading panels that are less useful to you.

Use Quick add to rapidly populate a manual workspace and its sections with useful visualizations.

To change how a workspace generates panels, reset the workspace.

Undo changes to your workspace

To undo changes to your workspace, click the Undo button (arrow that points left) or type CMD + Z (macOS) or CTRL + Z (Windows / Linux).

Reset a workspace

To reset a workspace:

At the top of the workspace, click the action menu ....
Click Reset workspace.

Configure the workspace layout

To configure the workspace layout, click Settings near the top of the workspace, then click Workspace layout.

Hide empty sections during search (turned on by default)
Sort panels alphabetically (turned off by default)
Section organization (grouped by first prefix by default). To modify this setting:
1. Click the padlock icon.
2. Choose how to group panels within a section.

To configure defaults for the workspace’s line plots, refer to Line plots.

Configure a section’s layout

To configure the layout of a section, click its gear icon, then click Display preferences.

Turn on or off colored run names in tooltips (turned on by default)
Only show highlighted run in companion chart tooltips (turned off by default)
Number of runs shown in tooltips (a single run, all runs, or Default)
Display full run names on the primary chart tooltip (turned off by default)

View a panel in full-screen mode

In full-screen mode, the run selector displays and panels use full-fidelity sampling mode plots with 10,000 buckets, rather than the 1,000 buckets used otherwise.

To view a panel in full-screen mode:

Hover over the panel.
Click the panel’s action menu ..., then click the full-screen button, which looks like a viewfinder or an outline showing the four corners of a square.
When you share the panel while viewing it in full-screen mode, the resulting link opens in full-screen mode automatically.

To get back to a panel’s workspace from full-screen mode, click the left-pointing arrow at the top of the page.

Add panels

This section shows various ways to add panels to your workspace.

Add a panel manually

Add panels to your workspace one at a time, either globally or at the section level.

To add a panel globally, click Add panels in the control bar near the panel search field.
To add a panel directly to a section instead, click the section’s action ... menu, then click + Add panels.
Select the type of panel to add, such as a chart. The panel’s configuration details appear, with defaults selected.
Optionally, customize the panel and its display preferences. Configuration options depend on the type of panel you select. To learn more about the options for each type of panel, refer to the relevant section below, such as Line plots or Bar plots.
Click Apply.

Quick add panels

Use Quick add to add a panel automatically for each key you select, either globally or at the section level.

For an automated workspace with no deleted panels, the Quick add option is not visible because the workspace already includes panels for all logged keys. You can use Quick add to re-add a panel that you deleted.

To use Quick add to add a panel globally, click Add panels in the control bar near the panel search field, then click Quick add.
To use Quick add to add a panel directly to a section, click the section’s action ... menu, click Add panels, then click Quick add.
A list of panels appears. Each panel with a checkmark is already included in the workspace.
- To add all available panels, click the Add panels button at the top of the list. The Quick Add list closes and the new panels display in the workspace.
- To add an individual panel from the list, hover over the panel’s row, then click Add. Repeat this step for each panel you want to add, then click the X at the top right to close the Quick Add list. The new panels display in the workspace.
Optionally, customize the panel’s settings.

This section shows how to share a panel using a link.

To share a panel using a link, you can either:

While viewing the panel in full-screen mode, copy the URL from the browser.
Click the action menu ... and select Copy panel URL.

Share the link with the user or team. When they access the link, the panel opens in full-screen mode.

To return to a panel’s workspace from full-screen mode, click the left-pointing arrow at the top of the page.

Compose a panel’s full-screen link programmatically

In certain situations, such as when creating an automation, it can be useful to include the panel’s full-screen URL. This section shows the format for a panel’s full-screen URL. In the following example, replace the entity, project, panel, and section names in brackets.

https://wandb.ai/<ENTITY_NAME>/<PROJECT_NAME>?panelDisplayName=<PANEL_NAME>&panelSectionName=<SECTION_NAME>

If multiple panels in the same section have the same name, this URL opens the first panel with the name.
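When an automation composes this link, panel and section names that contain spaces or slashes need to be URL-encoded. The following Python sketch builds the URL from the format above using only the standard library. The function name `panel_fullscreen_url` and its `base_url` parameter are illustrative and not part of the W&B SDK, and percent-encoding the names is an assumption based on standard URL rules.

```python
from urllib.parse import quote, urlencode


def panel_fullscreen_url(entity, project, panel_name, section_name,
                         base_url="https://wandb.ai"):
    """Build a panel's full-screen URL from the documented format.

    Hypothetical helper for illustration; not part of the W&B SDK.
    """
    # Percent-encode the query values so names such as "train/loss"
    # or "My Section" survive as single query parameters.
    query = urlencode(
        {"panelDisplayName": panel_name, "panelSectionName": section_name},
        quote_via=quote,
    )
    return f"{base_url}/{quote(entity, safe='')}/{quote(project, safe='')}?{query}"


print(panel_fullscreen_url("my-team", "my-project", "train/loss", "Charts"))
# https://wandb.ai/my-team/my-project?panelDisplayName=train%2Floss&panelSectionName=Charts
```

For a Dedicated Cloud or Self-Managed deployment, pass your deployment's hostname as `base_url`.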

To embed a panel in a website or share it on social media, the panel must be viewable by anyone with the link. If a project is private, only members of the project can view the panel. If the project is public, anyone with the link can view the panel.

To get the code to embed or share a panel on social media:

From the workspace, hover over the panel, then click its action menu ....
Click the Share tab.
Change “Only those who are invited have access” to “Anyone with the link can view”. Otherwise, the choices in the next step are not available.
Choose Share on Twitter, Share on Reddit, Share on LinkedIn, or Copy embed link.

Email a panel report

To email a single panel as a stand-alone report:

Hover over the panel, then click the panel’s action menu ....
Click Share panel in report.
Select the Invite tab.
Enter an email address or username.
Optionally, change “can view” to “can edit”.
Click Invite. W&B sends an email to the user with a clickable link to the report that contains only the panel you are sharing.

Unlike when you share a panel, the recipient cannot get to the workspace from this report.

Manage panels

Edit a panel

To edit a panel:

Click its pencil icon.
Modify the panel’s settings.
To change the panel to a different type, select the type and then configure the settings.
Click Apply.

Move a panel

To move a panel to a different section, you can use the drag handle on the panel. To select the new section from a list instead:

If necessary, create a new section by clicking Add section after the last section.
Click the action ... menu for the panel.
Click Move, then select a new section.

You can also use the drag handle to rearrange panels within a section.

Duplicate a panel

To duplicate a panel:

At the top of the panel, click the action ... menu.
Click Duplicate.

If desired, you can customize or move the duplicated panel.

Remove panels

To remove a panel:

Hover your mouse over the panel.
Select the action ... menu.
Click Delete.

To remove all panels from a manual workspace, click its action ... menu, then click Clear all panels.

To remove all panels from an automatic or manual workspace, you can reset the workspace. Select Automatic to start with the default set of panels, or select Manual to start with an empty workspace with no panels.

Manage sections

By default, sections in a workspace reflect the logging hierarchy of your keys. However, in a manual workspace, sections appear only after you start adding panels.

Add a section

To add a section, click Add section after the last section.

To add a new section before or after an existing section, you can instead click the section’s action ... menu, then click New section below or New section above.

Manage a section’s panels

Sections with a large number of panels are paginated by default. The default number of panels on a page depends on the panel’s configuration and on the sizes of the panels in the section.

The Custom grid layout will soon be removed. W&B suggests that you no longer use Custom grid layouts. Consider updating your workspace from Custom grid to Standard grid.

When the Custom grid layout is removed, workspaces will be updated to use the Standard grid layout, which will no longer be configurable.

To check which layout a section uses, click the section’s action ... menu. To change a section’s layout, select Standard grid or Custom grid in the Layout grid section.
To resize a panel, hover over it, click the drag handle, and drag it to adjust the panel’s size.

If a section uses the Standard grid, resizing one panel resizes all panels in the section.
If a section uses the Custom grid, you can customize the size of each panel separately.

If a section is paginated, you can customize the number of panels to show on a page:
At the top of the section, click 1 to <X> of <Y>, where <X> is the number of visible panels and <Y> is the total number of panels.
Choose how many panels to show per page, up to 100.
To delete a panel from a section:
Hover over the panel, then click its action ... menu.
Click Delete.

If you reset a workspace to an automated workspace, all deleted panels appear again.

Rename a section

To rename a section, click its action ... menu, then click Rename section.

Delete a section

To delete a section, click its ... menu, then click Delete section. This removes the section and its panels.

2.4.1.1 - Line plots

Visualize metrics, customize axes, and compare multiple lines on a plot

Line plots show up by default when you plot metrics over time with wandb.Run.log(). Customize with chart settings to compare multiple lines on the same plot, calculate custom axes, and rename labels.

Edit line plot settings

This section shows how to edit the settings for an individual line plot panel, all line plot panels in a section, or all line plot panels in a workspace.

If you’d like to use a custom x-axis, make sure it’s logged in the same call to wandb.Run.log() that you use to log the y-axis.

Individual line plot

A line plot’s individual settings override the line plot settings for the section or the workspace. To customize a line plot:

Hover your mouse over the panel, then click the gear icon.
Within the drawer that appears, select a tab to edit its settings.
Click Apply.

Line plot settings

You can configure these settings for a line plot:

Data: Configure the plot’s data-display details.

X axis: Select the value to use for the X axis (defaults to Step). You can change the x-axis to Relative Time or select a custom axis based on values you log with W&B. You can also configure the X axis scale and range.
- Relative Time (Wall) is clock time since the process started. For example, if you start a run, resume it a day later, and then log a value, that point is plotted at 24 hours.
- Relative Time (Process) is time inside the running process. For example, if you start a run, let it run for 10 seconds, and resume it a day later, that point is plotted at 10 seconds.
- Wall Time is minutes elapsed since the start of the first run on the graph.
- Step increments by default each time wandb.Run.log() is called, and is intended to reflect the number of training steps you’ve logged from your model.
Y axis: Select one or more y-axes from the logged values, including metrics and hyperparameters that change over time. You can also configure the Y axis scale and range.
Point aggregation method. Either Random sampling (the default) or Full fidelity. Refer to Sampling.
Smoothing: Change the smoothing on the line plot. Defaults to Time weighted EMA. Other values include No smoothing, Running average, and Gaussian.
Outliers: Rescale to exclude outliers from the default plot min and max scale.
Max number of runs or groups: Show more lines on the line plot at once by increasing this number, which defaults to 10 runs. You’ll see the message “Showing first 10 runs” on the top of the chart if there are more than 10 runs available but the chart is constraining the number visible.
Chart type: Change between a line plot, an area plot, and a percentage area plot.

Grouping: Configure whether and how to group and aggregate runs in the plot.

Group by: Select a column, and all the runs with the same value in that column will be grouped together.
Agg: The aggregation that determines the value of the line on the graph. The options are mean, median, min, and max of the group.

Chart: Specify titles for the panel, the X axis, and the Y axis, hide or show the legend, and configure its position.

Legend: Customize the appearance of the panel’s legend, if it is enabled.

Legend: The field shown in the legend for each line in the plot.
Legend template: Define a fully customizable template for the legend, specifying exactly what text and variables appear in the legend at the top of the line plot as well as in the legend that appears when you hover your mouse over the plot.

Expressions: Add custom calculated expressions to the panel.

Y Axis Expressions: Add calculated metrics to your graph. You can use any of the logged metrics as well as configuration values like hyperparameters to calculate custom lines.
X Axis Expressions: Rescale the x-axis to use calculated values using custom expressions. Useful variables include _step for the default x-axis. The syntax for referencing summary values is ${summary:value}.

All line plots in a section

To customize the default settings for all line plots in a section, overriding workspace settings for line plots:

Click the section’s gear icon to open its settings.
Within the drawer that appears, select the Data or Display preferences tabs to configure the default settings for the section. For details about each Data setting, refer to the preceding section, Individual line plot. For details about each display preference, refer to Configure section layout.

All line plots in a workspace

To customize the default settings for all line plots in a workspace:

Click the workspace’s settings, which has a gear with the label Settings.
Click Line plots.
Within the drawer that appears, select the Data or Display preferences tabs to configure the default settings for the workspace.
- For details about each Data setting, refer to the preceding section, Individual line plot.
- For details about each Display preferences section, refer to Workspace display preferences. At the workspace level, you can configure the default Zooming behavior for line plots. This setting controls whether to synchronize zooming across line plots with a matching x-axis key. Disabled by default.

Visualize average values on a plot

If you have several different experiments and you’d like to see the average of their values on a plot, you can use the Grouping feature in the table. Click “Group” above the run table and select “All” to show averaged values in your graphs.

Here is what the graph looks like before averaging:

The following image shows a graph that represents average values across runs using grouped lines.

Visualize NaN value on a plot

You can also plot NaN values including PyTorch tensors on a line plot with wandb.Run.log(). For example:

import wandb

with wandb.init() as run:
    # Log a NaN value
    run.log({"test": float("nan")})

Compare two metrics on one chart

Select the Add panels button in the top right corner of the page.
From the left panel that appears, expand the Evaluation dropdown.
Select Run comparer

Change the color of the line plots

Sometimes the default color of runs is not helpful for comparison. To help overcome this, W&B provides two ways to manually change the colors.

Each run is given a random color by default upon initialization.

Upon clicking any of the colors, a color palette appears from which you can manually choose the color you want.

Hover your mouse over the panel whose settings you want to edit.
Select the pencil icon that appears.
Choose the Legend tab.

Visualize on different x axes

If you’d like to see the absolute time that an experiment has taken, or see what day an experiment ran, you can switch the x axis. Here’s an example of switching from steps to relative time and then to wall time.

Area plots

In the line plot settings, in the advanced tab, click on different plot styles to get an area plot or a percentage area plot.

Zoom

Click and drag a rectangle to zoom vertically and horizontally at the same time. This changes the x-axis and y-axis zoom.

Hide chart legend

Turn off the legend in the line plot with this simple toggle:

Create a run metrics notification

Use Automations to notify your team when a run metric meets a condition you specify. An automation can post to a Slack channel or run a webhook.

From a line plot, you can quickly create a run metrics notification for the metric it shows:

Hover over the panel, then click the bell icon.
Configure the automation using the basic or advanced configuration controls. For example, apply a run filter to limit the scope of the automation, or configure an absolute threshold.

Learn more about Automations.

Visualize CoreWeave infrastructure alerts

Observe infrastructure alerts such as GPU failures, thermal violations, and more during machine learning experiments you log to W&B. During a W&B run, CoreWeave Mission Control monitors your compute infrastructure.

This feature is in Preview and only available when training on a CoreWeave cluster. Contact your W&B representative for access.

If an error occurs, CoreWeave sends that information to W&B. W&B populates infrastructure information onto your run’s plots in your project’s workspace. CoreWeave attempts to automatically resolve some issues, and W&B surfaces that information in the run’s page.

Find infrastructure issues in a run

W&B surfaces both SLURM job issues and cluster node issues. View infrastructure errors in a run:

Navigate to your project on the W&B App.
Select the Workspace tab to view your project’s workspace.
Search and select the name of the run that contains an infrastructure issue. If CoreWeave detected an infrastructure issue, one or more red vertical lines with an exclamation mark overlay the run’s plots.
Select an issue on a plot or select the Issues button in the top right of the page. A drawer appears that lists each issue reported by CoreWeave.

Tip

To view runs with infrastructure issues at a glance, pin the Issues column to your W&B Workspace. For more information about how to pin a column, see Customize how runs are displayed.

The Overall Grafana view at the top of the drawer redirects you to the SLURM job’s Grafana dashboard, which contains system-level details about the run. The Issues summary describes the root error that the SLURM job reported to CoreWeave Mission Control. The summary section also describes any attempts CoreWeave made to automatically resolve the error.

The All Issues list shows every issue that occurred during the run, ordered by time with the most recent issue at the top. The list contains both job issue and node issue alerts. Each issue alert includes the name of the issue, the timestamp when the issue occurred, a link to the Grafana dashboard for that issue, and a brief summary that describes the issue.

The following table shows example alerts for each category of infrastructure issues:

Category	Example alerts
Node Availability & Readiness	`KubeNodeNotReadyHGX`, `NodeExtendedDownTime`
GPU/Accelerator Errors	`GPUFallenOffBusHGX`, `GPUFaultHGX`, `NodeTooFewGPUs`
Hardware Errors	`HardwareErrorFatal`, `NodeRAIDMemberDegraded`
Networking & DNS	`NodeDNSFailureHGX`, `NodeEthFlappingLegacyNonGPU`
Power, Cooling, and Management	`NodeCPUHZThrottle`, `RedfishDown`
DPU & NVSwitch	`DPUNcoreVersionBelowDesired`, `NVSwitchFaultHGX`
Miscellaneous	`NodePCISpeedRootGBT`, `NodePCIWidthRootSMC`

For detailed information on error types, see the SLURM Job Metrics on the CoreWeave Docs.

Debug infrastructure issues

Each run that you create in W&B corresponds to a single SLURM job in CoreWeave. You can view a failed job’s Grafana dashboard or discover more information about a single node. The link within the Overview section of the Issues drawer links to the SLURM job Grafana dashboard. Expand the All Issues dropdown to view both job and node issues and their respective Grafana dashboards.

Note

The Grafana dashboard is only available for W&B users with a CoreWeave account. Contact W&B to configure Grafana with your W&B organization.

Depending on the issue, you may need to adjust the SLURM job configuration, investigate the node’s status, restart the job, or take other actions as needed.

For more information about CoreWeave SLURM jobs in Grafana, see Slurm/Job Metrics on the CoreWeave Docs. See Job info: alerts for detailed information about job alerts.

2.4.1.1.1 - Line plot reference

X-Axis

You can set the x-axis of a line plot to any value that you have logged with wandb.log as long as it’s always logged as a number.

Y-Axis variables

You can set the y-axis variables to any value you have logged with wandb.log as long as you were logging numbers, arrays of numbers or a histogram of numbers. If you logged more than 1500 points for a variable, W&B samples down to 1500 points.

You can change the color of your y axis lines by changing the color of the run in the runs table.

X range and Y range

You can change the maximum and minimum values of X and Y for the plot.

X range default is from the smallest value of your x-axis to the largest.

Y range default is from the smallest value of your metrics and zero to the largest value of your metrics.

Max runs/groups

By default you will only plot 10 runs or groups of runs. The runs will be taken from the top of your runs table or run set, so if you sort your runs table or run set you can change the runs that are shown.

A workspace is limited to displaying a maximum of 1000 runs, regardless of its configuration.

Legend

You can control the legend of your chart to show, for any run, any config value that you logged and metadata from the runs, such as the created-at time or the user who created the run.

Example:

${run:displayName} - ${config:dropout} will make the legend name for each run something like royal-sweep - 0.5 where royal-sweep is the run name and 0.5 is the config parameter named dropout.
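To make the substitution concrete, the following Python sketch resolves `${namespace:key}` placeholders against a run's values. It only illustrates the idea: the `render_legend` function and the nested dictionary layout are hypothetical and say nothing about how W&B evaluates templates internally.

```python
import re

# Matches placeholders such as ${run:displayName} or ${config:dropout}.
PLACEHOLDER = re.compile(r"\$\{(\w+):([^}]+)\}")


def render_legend(template, values):
    """Replace each ${namespace:key} with values[namespace][key].

    Unknown placeholders are left untouched (an assumption of this sketch).
    """
    def substitute(match):
        namespace, key = match.group(1), match.group(2)
        return str(values.get(namespace, {}).get(key, match.group(0)))

    return PLACEHOLDER.sub(substitute, template)


run_values = {"run": {"displayName": "royal-sweep"}, "config": {"dropout": 0.5}}
print(render_legend("${run:displayName} - ${config:dropout}", run_values))
# royal-sweep - 0.5
```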

You can set values inside [[ ]] to display point-specific values in the crosshair when hovering over a chart. For example, [[ $x: $y ($original) ]] would display something like “2: 3 (2.9)”.

Supported values inside [[ ]] are as follows:

Value	Meaning
`${x}`	X value
`${y}`	Y value (Including smoothing adjustment)
`${original}`	Y value not including smoothing adjustment
`${mean}`	Mean of grouped runs
`${stddev}`	Standard Deviation of grouped runs
`${min}`	Min of grouped runs
`${max}`	Max of grouped runs
`${percent}`	Percent of total (for stacked area charts)

Grouping

You can aggregate all of the runs by turning on grouping, or group over an individual variable. You can also turn on grouping by grouping inside the table and the groups will automatically populate into the graph.

Smoothing

You can set the smoothing coefficient to be between 0 and 1 where 0 is no smoothing and 1 is maximum smoothing.

Ignore outliers

Rescale the plot to exclude outliers from the default plot min and max scale. The setting’s impact on the plot depends on the plot’s sampling mode.

For plots that use random sampling mode, when you enable Ignore outliers, only points from 5% to 95% are shown. When outliers are shown, they are not formatted differently from other points.
For plots that use full fidelity mode, all points are always shown, condensed down to the last value in each bucket. When Ignore outliers is enabled, the minimum and maximum bounds of each bucket are shaded. Otherwise, no area is shaded.

Expression

Expression lets you plot values derived from metrics like 1-accuracy. It currently only works if you are plotting a single metric. You can do simple arithmetic expressions, +, -, *, / and % as well as ** for powers.

Plot style

Select a style for your line plot.

Line plot:

Area plot:

Percentage area plot:

2.4.1.1.2 - Point aggregation

Use point aggregation methods within your line plots for improved data visualization accuracy and performance. There are two types of point aggregation modes: full fidelity and random sampling. W&B uses full fidelity mode by default.

Full fidelity

When you use full fidelity mode, W&B breaks the x-axis into dynamic buckets based on the number of data points. It then calculates the minimum, maximum, and average values within each bucket while rendering a point aggregation for the line plot.

There are three main advantages to using full fidelity mode for point aggregation:

Preserve extreme values and spikes: retain extreme values and spikes in your data
Configure how minimum and maximum points render: use the W&B App to interactively decide whether you want to show extreme (min/max) values as a shaded area.
Explore your data without losing data fidelity: W&B recalculates x-axis bucket sizes when you zoom into specific data points. This helps ensure that you can explore your data without losing accuracy. Caching is used to store previously computed aggregations to help reduce loading times which is particularly useful if you are navigating through large datasets.

Configure how minimum and maximum points render

Show or hide minimum and maximum values with shaded areas around your line plots.

The following image shows a blue line plot. The light blue shaded area represents the minimum and maximum values for each bucket.

There are three ways to render minimum and maximum values in your line plots:

Never: The min/max values are not displayed as a shaded area. Only show the aggregated line across the x-axis bucket.
On hover: The shaded area for min/max values appears dynamically when you hover over the chart. This option keeps the view uncluttered while allowing you to inspect ranges interactively.
Always: The min/max shaded area is consistently displayed for every bucket in the chart, helping you visualize the full range of values at all times. This can introduce visual noise if there are many runs visualized in the chart.

By default, the minimum and maximum values are not displayed as shaded areas. To view one of the shaded area options, follow these steps:

Navigate to your W&B project
Select the Workspace icon on the left tab
Select the gear icon in the top right corner of the screen, to the left of the Add panels button.
From the UI slider that appears, select Line plots
Within the Point aggregation section, choose On hover or Always from the Show min/max values as a shaded area dropdown menu.

Navigate to your W&B project
Select the Workspace icon on the left tab
Select the line plot panel you want to enable full fidelity mode for
Within the modal that appears, select On hover or Always from the Show min/max values as a shaded area dropdown menu.

Explore your data without losing data fidelity

Analyze specific regions of the dataset without missing critical points like extreme values or spikes. When you zoom in on a line plot, W&B adjusts the bucket sizes used to calculate the minimum, maximum, and average values within each bucket.

W&B dynamically divides the x-axis into 1,000 buckets by default. For each bucket, W&B calculates the following values:

Minimum: The lowest value in that bucket.
Maximum: The highest value in that bucket.
Average: The mean value of all points in that bucket.

W&B plots values in buckets in a way that preserves full data representation and includes extreme values in every plot. When zoomed in to 1,000 points or fewer, full fidelity mode renders every data point without additional aggregation.
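The bucketing described above can be sketched in a few lines of Python. This is a simplified illustration, not W&B's implementation: it assumes equal-width buckets over the visible x-range, whereas W&B describes its buckets as dynamic. The function name `bucket_aggregate` is hypothetical.

```python
def bucket_aggregate(xs, ys, num_buckets=1000):
    """Return one (bucket_start, minimum, maximum, average) tuple per
    non-empty bucket, assuming equal-width buckets over the x-range."""
    if not xs:
        return []
    x_min, x_max = min(xs), max(xs)
    # Fall back to a width of 1.0 when every point shares the same x value.
    width = (x_max - x_min) / num_buckets or 1.0
    buckets = {}
    for x, y in zip(xs, ys):
        # Clamp so the largest x lands in the last bucket.
        index = min(int((x - x_min) / width), num_buckets - 1)
        buckets.setdefault(index, []).append(y)
    return [
        (x_min + index * width, min(vals), max(vals), sum(vals) / len(vals))
        for index, vals in sorted(buckets.items())
    ]


# 10,000 flat points with a single spike at step 4321.
steps = list(range(10_000))
values = [0.0] * 10_000
values[4321] = 100.0

aggregated = bucket_aggregate(steps, values)
print(max(maximum for _, _, maximum, _ in aggregated))  # the spike survives: 100.0
```

Because each bucket keeps its own minimum and maximum, the spike stays visible even though roughly ten points are collapsed into each bucket.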

To zoom in on a line plot, follow these steps:

Navigate to your W&B project
Select the Workspace icon on the left tab
Optionally add a line plot panel to your workspace or navigate to an existing line plot panel.
Click and drag to select a specific region to zoom in on.

Line plot grouping and expressions

When you use Line Plot Grouping, W&B applies the following based on the mode selected:

Non-windowed sampling (grouping): Aligns points across runs on the x-axis. The average is taken if multiple points share the same x-value; otherwise, they appear as discrete points.
Windowed sampling (grouping and expressions): Divides the x-axis either into 250 buckets or the number of points in the longest line (whichever is smaller). W&B takes an average of points within each bucket.
Full fidelity (grouping and expressions): Similar to non-windowed sampling, but fetches up to 500 points per run to balance performance and detail.

Random sampling

Random sampling uses 1500 randomly sampled points to render line plots. Random sampling is useful for performance reasons when you have a large number of data points.

Random sampling samples non-deterministically. This means that random sampling sometimes excludes important outliers or spikes in the data and therefore reduces data accuracy.
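To see why sampling can drop a spike, consider the following Python sketch. It picks 1500 points uniformly at random and keeps them in x order. How W&B selects its sample is not described here, so uniform selection, the `random_sample_points` name, and the optional `seed` parameter are assumptions made for illustration.

```python
import random


def random_sample_points(xs, ys, max_points=1500, seed=None):
    """Keep at most max_points randomly chosen points, in original order.

    Uniform sampling is an assumption of this sketch.
    """
    if len(xs) <= max_points:
        return list(xs), list(ys)
    rng = random.Random(seed)
    # Sample indices without replacement, then restore x order.
    keep = sorted(rng.sample(range(len(xs)), max_points))
    return [xs[i] for i in keep], [ys[i] for i in keep]


steps = list(range(100_000))
values = [0.0] * 100_000
values[4321] = 100.0  # a single spike

sampled_steps, sampled_values = random_sample_points(steps, values)
print(len(sampled_steps))  # 1500
```

With 100,000 logged points, any single point has a 1.5% chance of being kept, so an isolated spike is usually missing from the sampled plot.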

Enable random sampling

By default, W&B uses full fidelity mode. To enable random sampling, follow these steps:

Navigate to your W&B project
Select the Workspace icon on the left tab
Select the gear icon in the top right corner of the screen, to the left of the Add panels button.
From the UI slider that appears, select Line plots
Choose Random sampling from the Point aggregation section

Navigate to your W&B project
Select the Workspace icon on the left tab
Select the line plot panel you want to enable random sampling for
Within the modal that appears, select Random sampling from the Point aggregation method section

Access non sampled data

You can access the complete history of metrics logged during a run using the W&B Run API. The following example demonstrates how to retrieve and process the loss values from a specific run:

import wandb

# Initialize the W&B API
api = wandb.Api()
run = api.run("l2k2/examples-numpy-boston/i0wt6xua")

# Retrieve the history of the 'Loss' metric
history = run.scan_history(keys=["Loss"])

# Extract the loss values from the history
losses = [row["Loss"] for row in history]

2.4.1.1.3 - Smooth line plots

In line plots, use smoothing to see trends in noisy data.

W&B supports several types of smoothing:

Time weighted exponential moving average (TWEMA) smoothing
Gaussian smoothing
Running average
Exponential moving average (EMA) smoothing

See these live in an interactive W&B report.

Time Weighted Exponential Moving Average (TWEMA) smoothing (Default)

The Time Weighted Exponential Moving Average (TWEMA) smoothing algorithm is a technique for smoothing time series data by exponentially decaying the weight of previous points. For details about the technique, see Exponential Smoothing. The range is 0 to 1. There is a de-bias term added so that early values in the time series are not biased towards zero.

The TWEMA algorithm takes the density of points on the line (the number of y values per unit of range on x-axis) into account. This allows consistent smoothing when displaying multiple lines with different characteristics simultaneously.

Here is sample code for how this works under the hood:

const smoothingWeight = Math.min(Math.sqrt(smoothingParam || 0), 0.999);
let lastY = yValues.length > 0 ? 0 : NaN;
let debiasWeight = 0;

return yValues.map((yPoint, index) => {
  const prevX = index > 0 ? index - 1 : 0;
  // VIEWPORT_SCALE scales the result to the chart's x-axis range
  const changeInX =
    ((xValues[index] - xValues[prevX]) / rangeOfX) * VIEWPORT_SCALE;
  const smoothingWeightAdj = Math.pow(smoothingWeight, changeInX);

  lastY = lastY * smoothingWeightAdj + yPoint;
  debiasWeight = debiasWeight * smoothingWeightAdj + 1;
  return lastY / debiasWeight;
});
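For readers who want to reproduce this smoothing offline, for example on history retrieved through the API, here is a line-by-line Python port of the snippet above. Two points are assumptions: the snippet does not define `rangeOfX`, so the port takes it to be the span of the x values, and it does not give a value for `VIEWPORT_SCALE`, so the port requires the caller to pass `viewport_scale`. The port also assumes x values are non-decreasing.

```python
import math


def twema_smooth(x_values, y_values, smoothing_param, viewport_scale):
    """Python port of the TWEMA snippet above.

    viewport_scale stands in for VIEWPORT_SCALE, whose value the source
    does not give. range_of_x is assumed to be max(x) - min(x).
    """
    if not y_values:
        return []
    smoothing_weight = min(math.sqrt(smoothing_param or 0), 0.999)
    # Guard against a zero range when all x values are identical.
    range_of_x = (max(x_values) - min(x_values)) or 1.0
    last_y = 0.0
    debias_weight = 0.0
    smoothed = []
    for index, y_point in enumerate(y_values):
        prev = index - 1 if index > 0 else 0
        # Larger gaps in x decay the previous value more strongly.
        change_in_x = (x_values[index] - x_values[prev]) / range_of_x * viewport_scale
        weight_adj = smoothing_weight ** change_in_x
        last_y = last_y * weight_adj + y_point
        debias_weight = debias_weight * weight_adj + 1
        smoothed.append(last_y / debias_weight)
    return smoothed


# The viewport_scale value here is arbitrary and chosen only for the demo.
print(twema_smooth([0, 1, 2], [1.0, 2.0, 3.0], smoothing_param=0, viewport_scale=1000))
# [1.0, 2.0, 3.0]
```

With a smoothing parameter of 0, the weight on previous points is zero, so the output equals the input.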

Here’s what this looks like in the app:

Gaussian smoothing

Gaussian smoothing (or Gaussian kernel smoothing) computes a weighted average of the points, where the weights correspond to a gaussian distribution with the standard deviation specified as the smoothing parameter. The smoothed value is calculated for every input x value, based on the points occurring both before and after it.
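As a rough illustration of the description above, the following Python sketch computes a Gaussian-weighted average at every input x value. It is a textbook kernel smoother, not W&B's code: the function name `gaussian_smooth` is hypothetical, and details such as kernel truncation or how W&B scales the smoothing parameter are not covered.

```python
import math


def gaussian_smooth(x_values, y_values, sigma):
    """Gaussian kernel smoothing with standard deviation sigma,
    measured in the same units as the x values."""
    if sigma <= 0:
        return list(y_values)
    smoothed = []
    for x_center in x_values:
        # Weight every point by its distance from x_center, before and after.
        weights = [
            math.exp(-0.5 * ((x - x_center) / sigma) ** 2) for x in x_values
        ]
        total = sum(weights)
        smoothed.append(sum(w * y for w, y in zip(weights, y_values)) / total)
    return smoothed


print(gaussian_smooth([0, 1, 2], [0.0, 10.0, 0.0], sigma=1.0))
```

This version is quadratic in the number of points, which is fine for a sketch but slow for long histories.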

Here’s what this looks like in the app:

Running average smoothing

Running average is a smoothing algorithm that replaces a point with the average of points in a window before and after the given x value. See “Boxcar Filter” on Wikipedia. The parameter you select for running average tells W&B the number of points to consider in the moving average.

Consider using Gaussian Smoothing instead if your points are spaced unevenly on the x-axis.
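
As a rough sketch of a running average (not W&B's implementation), the following Python function averages the points in a centered window. The name `running_average` is hypothetical, and the sketch assumes an odd `window` and clamps the window at the ends of the series.

```python
def running_average(y_values, window):
    """Replace each point with the mean of the points in a centered window."""
    if window < 1:
        raise ValueError("window must be at least 1")
    half = window // 2  # points taken on each side of the current point
    smoothed = []
    for i in range(len(y_values)):
        # Clamp the window at the ends of the series.
        lo = max(0, i - half)
        hi = min(len(y_values), i + half + 1)
        chunk = y_values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

print(running_average([1.0, 2.0, 9.0, 4.0, 5.0], window=3))
# → [1.5, 4.0, 5.0, 6.0, 4.5]
```

The window counts points, not distance along the x-axis, which is why unevenly spaced points can produce misleading averages.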

Here’s what this looks like in the app:

Exponential Moving Average (EMA) smoothing

The Exponential Moving Average (EMA) smoothing algorithm is a rule-of-thumb technique for smoothing time series data using the exponential window function. For details about the technique, see Exponential Smoothing. The smoothing parameter ranges from 0 to 1. A de-bias term is added so that early values in the time series are not biased towards zero.

In many situations, EMA smoothing is applied to a full scan of history, rather than bucketing first before smoothing. This often produces more accurate smoothing.

In the following situations, EMA smoothing is applied after bucketing instead:

Sampling
Grouping
Expressions
Non-monotonic x-axes
Time-based x-axes

Here is sample code for how this works under the hood:

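// `last`, `numAccum`, and `smoothedData` are running state set up before the loop
data.forEach(d => {
  const nextVal = d;
  last = last * smoothingWeight + (1 - smoothingWeight) * nextVal;
  numAccum++;
  // Debias so that early values are not pulled towards zero
  debiasWeight = 1.0 - Math.pow(smoothingWeight, numAccum);
  smoothedData.push(last / debiasWeight);
});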
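
For readers who prefer Python, the same recurrence can be sketched as follows. The function name `ema_smooth` is hypothetical, and starting the running value at 0 is an assumption that the sample above does not state.

```python
def ema_smooth(values, smoothing_weight):
    """Debiased exponential moving average, following the recurrence above."""
    if not 0 <= smoothing_weight < 1:
        raise ValueError("smoothing_weight must be in [0, 1)")
    last = 0.0  # assumed starting value for the running average
    smoothed = []
    for num_accum, value in enumerate(values, start=1):
        last = last * smoothing_weight + (1 - smoothing_weight) * value
        # Without this correction, early outputs would be pulled toward 0.
        debias_weight = 1.0 - smoothing_weight ** num_accum
        smoothed.append(last / debias_weight)
    return smoothed

print(ema_smooth([10.0, 10.0, 10.0], 0.9))
```

With the de-bias term, a constant series stays (up to floating-point rounding) at its constant value from the first point onward instead of ramping up from zero.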

Here’s what this looks like in the app:

Hide original data

By default, the original unsmoothed data displays in the plot as a faint line in the background. Click Show Original to turn this off.

2.4.1.2 - Bar plots

Visualize metrics, customize axes, and compare categorical data as bars.

A bar plot presents categorical data with rectangular bars which can be plotted vertically or horizontally. Bar plots show up by default with wandb.Run.log() when all logged values are of length one.

Plotting Box and horizontal Bar plots in W&B

Use the chart settings to limit the maximum number of runs shown, group runs by any config value, and rename labels.

Customize bar plots

You can also create box or violin plots to combine many summary statistics into one chart type.

Group runs via runs table.
Click ‘Add panel’ in the workspace.
Add a standard ‘Bar Chart’ and select the metric to plot.
Under the ‘Grouping’ tab, pick ‘box plot’ or ‘Violin’, etc. to plot either of these styles.

2.4.1.3 - Parallel coordinates

Compare results across machine learning experiments

Parallel coordinates charts summarize the relationship between large numbers of hyperparameters and model metrics at a glance.

Axes: Different hyperparameters from wandb.Run.config and metrics from wandb.Run.log().
Lines: Each line represents a single run. Hover over a line to see a tooltip with details about the run. All lines that match the current filters are shown. If you turn off the eye icon for a run, its line is grayed out.

Create a parallel coordinates panel

Go to the landing page for your workspace
Click Add Panels
Select Parallel coordinates

Panel Settings

To configure the panel, click the edit button in the upper right corner of the panel.

Tooltip: On hover, a legend shows up with info on each run
Titles: Edit the axis titles to be more readable
Gradient: Customize the gradient to be any color range you like
Log scale: Each axis can be set to view on a log scale independently
Flip axis: Switch the axis direction. This is useful when you have both accuracy and loss as columns

Interact with a live parallel coordinates panel

2.4.1.4 - Scatter plots

This page shows how to use scatter plots in W&B.

Use case

Use scatter plots to compare multiple runs and visualize the performance of an experiment:

Plot lines for minimum, maximum, and average values.
Customize metadata tooltips.
Control point colors.
Adjust axis ranges.
Use a log scale for the axes.

Example

The following example shows a scatter plot displaying validation accuracy for different models over several weeks of experimentation. The tooltip includes batch size, dropout, and axis values. A line also shows the running average of validation accuracy.

See a live example →

Create a scatter plot

To create a scatter plot in the W&B UI:

Navigate to the Workspaces tab.
In the Charts panel, click the action menu ....
From the pop-up menu, select Add panels.
In the Add panels menu, select Scatter plot.
Set the x and y axes to plot the data you want to view. Optionally, set maximum and minimum ranges for your axes or add a z axis.
Click Apply to create the scatter plot.
View the new scatter plot in the Charts panel.

2.4.1.5 - Media panels

A media panel visualizes logged keys for media objects, including 3D objects, audio, images, video, or point clouds. This page shows how to add and manage media panels in a workspace.

Add a media panel

To add a media panel for a logged key using the default configuration, use Quick Add. You can add a media panel globally or to a specific section.

Global: Click Add panels in the control bar near the panel search field.
Section: Click the section’s action ... menu, then click Add panels.
In the list of available panels, find the key for the panel, then click Add. Repeat this step for each media panel you want to add, then click the X at the top right to close the Quick Add list.
Optionally, configure the panel.

To add a media panel manually and configure it as part of adding it, add it globally or to a specific section:

Global: Click Add panels in the control bar near the panel search field.
Section: Click the section’s action ... menu, then click Add panels.
Click the Media section to expand it.
Select the type of media the panel visualizes: 3d objects, images, video, or audio. The panel configuration screen displays. Configure the panel, then click Apply. Refer to Configure a media panel.

Configure a media panel

Panels for all media types have the same options.

When you add a media panel manually, its configuration page opens after you select the type of media. To update the configuration for an existing panel, hover over the panel, then click the gear icon that appears at the top right. This section describes the settings available in each tab.

Overlays

This tab appears for images and point clouds logged with segmentation masks or bounding boxes.

Search and filter overlays by name.
Customize overlay colors.

Display

Customize the panel’s overall appearance and behavior.

Configure the panel’s title.
Select the media keys to visualize.
Customize the panel’s slider and playback behavior.
- Configure the slider key, which defaults to Step.
- Set Stride length to the number of steps to advance for each click of the slider.
- Turn on or off Snap to existing step. If it is turned on, the stepper advances to the next existing step after Stride length. Otherwise, it advances by Stride length even if that does not align with an existing step.
Images: Turn on or off smoothing.
3d objects: Configure the background color and point color.
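
One way to picture how Stride length and Snap to existing step interact is the following Python sketch. It is an interpretation of the description above, not the panel's actual logic. The helper name `next_slider_step` is hypothetical, and snapping forward to the first logged step at or after the target is an assumption.

```python
import bisect

def next_slider_step(current, stride, existing_steps, snap):
    """Return the step shown after one click of the slider (illustrative only)."""
    target = current + stride
    steps = sorted(existing_steps)
    if not snap or not steps:
        return target  # may not correspond to a logged step
    i = bisect.bisect_left(steps, target)
    # Snap forward to the first logged step at or after the target;
    # stay on the last logged step if the target is past the end.
    return steps[i] if i < len(steps) else steps[-1]

logged_steps = [0, 10, 25, 40]
print(next_slider_step(0, 15, logged_steps, snap=True))   # 25
print(next_slider_step(0, 15, logged_steps, snap=False))  # 15
```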

Layout

Customize the display of the panel’s individual items.

Turn on or off Grid mode.
- When it is turned on, you can choose a custom X and Y axis to plot on top of each item. More than one item displays in each row, and you limit how many rows to show.
- When it is turned off, you can customize the number of columns to use for the panel’s content, and you can configure the panel’s content, which defaults to Run.
Optionally limit the Max runs to include in the panel.
Optionally specify a Media display limit to limit the number of media items to include per run.
Images and videos: Turn on or off display of full-size media.
Images: When Fit media is turned on, resize the panel’s media to fit the panel’s size.
Point clouds: Optionally turn on the right-handed system for plotting points, rather than the default left-handed system.

All media panels in a section

To customize the default settings for all media panels in a section, overriding workspace settings for media panels:

Click the section’s gear icon to open its settings.
Click Media settings.
Within the drawer that appears, click the Display or Layout tab to configure the default media settings for the section. You can configure settings for images, videos, audio, and 3d objects. The settings that appear depend on the section’s current media panels.

For details about each setting, refer to Configure a media panel.

All media panels in a workspace

To customize the default settings for all media panels in a workspace:

Click the workspace’s settings button, which has a gear icon and the label Settings.
Click Media settings.
Within the drawer that appears, click the Display or Layout tab to configure the default media settings for the workspace. You can configure settings for images, videos, audio, and 3d objects. The settings that appear depend on the workspace’s current media panels.

For details about each setting, refer to Configure a media panel.

Interact with a media panel

Click a media panel to view it in full screen mode.
Use the stepper at the top of a media panel to step through media runs.
To configure a media panel, hover over it and click the gear icon at the top.
For an image that was logged with segmentation masks, you can customize their appearance or turn each one on or off. Hover over the panel, then click the lower gear icon.
For an image or point cloud that was logged with bounding boxes, you can customize their appearance or turn each one on or off. Hover over the panel, then click the lower gear icon.

2.4.1.6 - Save and diff code

By default, W&B only saves the latest git commit hash. You can turn on more code features to compare the code between your experiments dynamically in the UI.

Starting with wandb version 0.8.28, W&B can save the code from your main training file where you call wandb.init().

Save library code

When you enable code saving, W&B saves the code from the file that called wandb.init(). To save additional library code, you have three options:

Call `wandb.Run.log_code(".")` after calling `wandb.init()`

import wandb

with wandb.init() as run:
  run.log_code(".")

Pass a settings object to `wandb.init()` with `code_dir` set

import wandb

wandb.init(settings=wandb.Settings(code_dir="."))

This captures all python source code files in the current directory and all subdirectories as an artifact. For more control over the types and locations of source code files that are saved, see the reference docs.
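
As an illustration of that extra control, the sketch below passes a filter function to `log_code()` through its `include_fn` parameter so that only selected file types are saved. The filter `include_source`, the wrapper `log_selected_code`, and the project name are hypothetical. Check the reference docs for the exact parameters your wandb version supports.

```python
def include_source(path):
    """Hypothetical filter: keep Python files and notebooks only."""
    return path.endswith((".py", ".ipynb"))

def log_selected_code():
    """Start a run and save only the files accepted by include_source."""
    import wandb  # imported here so the filter above stays dependency-free

    with wandb.init(project="code-saving-demo") as run:  # hypothetical project name
        run.log_code(root=".", include_fn=include_source)

print(include_source("train.py"), include_source("data.csv"))
```

Call `log_selected_code()` from your training script once you are logged in to W&B.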

Set code saving in the UI

In addition to setting code saving programmatically, you can also toggle this feature in your W&B account Settings. Note that this will enable code saving for all teams associated with your account.

By default, W&B disables code saving for all teams.

Log in to your W&B account.
Go to Settings > Privacy.
Under Project and content security, toggle Disable default code saving on.

Code comparer

Compare code used in different W&B runs:

Select the Add panels button in the top right corner of the page.
Expand the TEXT AND CODE dropdown and select Code.

Jupyter session history

W&B saves the history of code executed in your Jupyter notebook session. When you call wandb.init() inside of Jupyter, W&B adds a hook to automatically save a Jupyter notebook containing the history of code executed in your current session.

Navigate to the project workspace that contains your code.
Select the Artifacts tab in the left navigation bar.
Expand the code artifact.
Select the Files tab.

This displays the cells that were run in your session along with any outputs created by calling IPython’s display method, so you can see exactly what code ran within Jupyter in a given run. When possible, W&B also saves the most recent version of the notebook, which you can find in the code directory.

2.4.1.7 - Parameter importance

Visualize the relationships between your model’s hyperparameters and output metrics

Discover which of your hyperparameters were the best predictors of, and most highly correlated with, desirable values of your metrics.

Correlation is the linear correlation between the hyperparameter and the chosen metric (in this case val_loss). So a high correlation means that when the hyperparameter has a higher value, the metric also has higher values and vice versa. Correlation is a great metric to look at but it can’t capture second order interactions between inputs and it can get messy to compare inputs with wildly different ranges.

Therefore W&B also calculates an importance metric. W&B trains a random forest with the hyperparameters as inputs and the metric as the target output, and reports the feature importance values for the random forest.

The idea for this technique was inspired by a conversation with Jeremy Howard who has pioneered the use of random forest feature importances to explore hyperparameter spaces at Fast.ai. W&B highly recommends you check out this lecture (and these notes) to learn more about the motivation behind this analysis.

The hyperparameter importance panel untangles the complicated interactions between highly correlated hyperparameters. In doing so, it helps you fine-tune your hyperparameter searches by showing you which of your hyperparameters matter the most in terms of predicting model performance.

Creating a hyperparameter importance panel

Navigate to your W&B project.
Select the Add panels button.
Expand the CHARTS dropdown, then choose Parameter importance.

If an empty panel appears, make sure that your runs are ungrouped.

With the parameter manager, you can manually set the visible and hidden parameters.

Manually setting the visible and hidden fields

Interpreting a hyperparameter importance panel

This panel shows you all the parameters passed to the wandb.Run.config object in your training script. Next, it shows the feature importances and correlations of these config parameters with respect to the model metric you select (val_loss in this case).

Importance

The importance column shows you the degree to which each hyperparameter was useful in predicting the chosen metric. Imagine a scenario where you start tuning a plethora of hyperparameters and use this plot to home in on which ones merit further exploration. The subsequent sweeps can then be limited to the most important hyperparameters, thereby finding a better model faster and cheaper.

W&B calculates importances using a tree-based model rather than a linear model because tree-based models are more tolerant of both categorical data and data that’s not normalized.

In the preceding image, you can see that epochs, learning_rate, batch_size and weight_decay were fairly important.

Correlations

Correlations capture linear relationships between individual hyperparameters and metric values. They answer the question of whether there is a significant relationship between using a hyperparameter, such as the SGD optimizer, and the val_loss (the answer in this case is yes). Correlation values range from -1 to 1, where positive values represent positive linear correlation, negative values represent negative linear correlation, and a value of 0 represents no correlation. Generally, a value greater than 0.7 in either direction represents strong correlation.

You might use this graph to further explore the values that have a higher correlation with the metric (in this case you might pick stochastic gradient descent or adam over rmsprop or nadam) or train for more epochs.

Keep the following caveats in mind when you interpret correlations:

Correlations show evidence of association, not necessarily causation.
Correlations are sensitive to outliers, which might turn a strong relationship into a moderate one, especially if the sample size of hyperparameters tried is small.
Correlations only capture linear relationships between hyperparameters and metrics. If there is a strong polynomial relationship, correlations do not capture it.
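
The last point is easy to verify with a small, self-contained Python example. The data is made up for illustration: a hypothetical hyperparameter sampled symmetrically around zero, one metric that depends on it linearly, and one that depends on it quadratically. The `pearson` helper is a plain implementation of the Pearson correlation coefficient.

```python
import math

def pearson(xs, ys):
    """Pearson (linear) correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical sweep values, centered on zero.
hyperparam = [-3, -2, -1, 0, 1, 2, 3]
linear_metric = [2 * h + 1 for h in hyperparam]
quadratic_metric = [h ** 2 for h in hyperparam]

print(pearson(hyperparam, linear_metric))     # 1.0: perfect linear relationship
print(pearson(hyperparam, quadratic_metric))  # 0.0: strong but non-linear relationship
```

The quadratic metric is fully determined by the hyperparameter, yet its correlation is zero, which is the kind of relationship a tree-based importance score can still pick up.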

The disparities between importance and correlations result from the fact that importance accounts for interactions between hyperparameters, whereas correlation only measures the effects of individual hyperparameters on metric values. Secondly, correlations capture only the linear relationships, whereas importances can capture more complex ones.

As you can see both importance and correlations are powerful tools for understanding how your hyperparameters influence model performance.

2.4.1.8 - Compare run metrics

Compare metrics across multiple runs

Use the Run Comparer to see differences and similarities across runs in your project.

Add a Run Comparer panel

Select the Add panels button in the top right corner of the page.
From the Evaluation section, select Run comparer.

Use Run Comparer

Run Comparer shows the configuration and logged metrics for the first 10 visible runs in the project, one column per run.

To change the runs to compare, you can search, filter, group, or sort the list of runs on the left-hand side. The Run Comparer updates automatically.
To filter or search for a configuration key, use the search field at the top of the Run Comparer.
To quickly see differences and hide identical values, toggle Diff only at the top of the panel.
To adjust the column width or row height, use the formatting buttons at the top of the panel.
To copy any configuration or metric’s value, hover your mouse over the value, then click the copy button. The entire value is copied, even if it is too long to display on the screen.
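
As a rough mental model for Diff only, the following Python sketch keeps only the keys whose values differ between runs. It is an illustration, not Run Comparer's implementation. The function `diff_only` and the run names and values are hypothetical, and treating a missing key as a difference is an assumption.

```python
def diff_only(run_configs):
    """Keep only keys whose values differ across runs (a missing key counts as different)."""
    all_keys = set().union(*(cfg.keys() for cfg in run_configs.values()))
    missing = object()  # sentinel so that a missing key never equals a real value
    differing = {}
    for key in sorted(all_keys):
        values = [cfg.get(key, missing) for cfg in run_configs.values()]
        if any(v != values[0] for v in values[1:]):
            differing[key] = {name: cfg.get(key) for name, cfg in run_configs.items()}
    return differing

runs = {
    "run-a": {"learning_rate": 0.01, "epochs": 10},
    "run-b": {"learning_rate": 0.001, "epochs": 10},
}
print(diff_only(runs))
```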

By default, Run Comparer does not differentiate runs with different values for job_type. This means that it is possible to compare runs that are not comparable within a project. For example, you could compare a training run to a model evaluation run. A training run could contain run logs, hyperparameters, training loss metrics, and the model itself. An evaluation run could use the model to check the model’s performance on new training data.

When you search, filter, group, or sort the list of runs in the Runs Table, the Run Comparer automatically updates to compare the first 10 runs. Filter or search within the Runs Table to compare similar runs, such as by filtering or sorting the list by job_type. Learn more about filtering runs.

2.4.1.9 - Query panels

Some features on this page are in beta, hidden behind a feature flag. Add weave-plot to your bio on your profile page to unlock all related features.

Looking for W&B Weave, W&B’s suite of tools for building Generative AI applications? Find the Weave docs at wandb.me/weave.

Use query panels to query and interactively visualize your data.

Create a query panel

Add a query to your workspace or within a report.

Navigate to your project’s workspace.
In the upper right hand corner, click Add panel.
From the dropdown, select Query panel.

Within your report, type and select /Query panel.

Alternatively, you can associate a query with a set of runs:

Within your report, type and select /Panel grid.
Click the Add panel button.
From the dropdown, select Query panel.

Query components

Expressions

Use query expressions to query your data stored in W&B such as runs, artifacts, models, tables, and more.

Example: Query a table

Suppose you want to query a W&B Table. In your training code you log a table called "cifar10_sample_table":

import wandb
with wandb.init() as run:
  run.log({"cifar10_sample_table":<MY_TABLE>})

Within the query panel you can query your table with:

runs.summary["cifar10_sample_table"]

Breaking this down:

runs is a variable automatically injected in Query Panel Expressions when the Query Panel is in a Workspace. Its “value” is the list of runs which are visible for that particular Workspace. Read about the different attributes available within a run here.
summary is an op which returns the Summary object for a Run. Ops are mapped, meaning this op is applied to each Run in the list, resulting in a list of Summary objects.
["cifar10_sample_table"] is a Pick op (denoted with brackets), with a parameter of cifar10_sample_table. Since Summary objects act like dictionaries or maps, this operation picks the cifar10_sample_table field off of each Summary object.
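
The idea of mapped ops can be mimicked in plain Python. The sketch below is an analogy only, not the query language itself: the dictionaries stand in for runs, and `summary` and `pick` are hypothetical helpers that apply an operation to every element of a list.

```python
# Stand-in data: each dict plays the role of one run visible in the workspace.
runs = [
    {"name": "run-a", "summary": {"cifar10_sample_table": "table-a", "loss": 0.4}},
    {"name": "run-b", "summary": {"cifar10_sample_table": "table-b", "loss": 0.3}},
]

def summary(run_list):
    """Mapped op: applied to every run, returning a list of summaries."""
    return [run["summary"] for run in run_list]

def pick(dicts, key):
    """Mapped pick: take one field from every summary."""
    return [d.get(key) for d in dicts]

tables = pick(summary(runs), "cifar10_sample_table")
print(tables)  # ['table-a', 'table-b']
```

The result has one entry per run, which mirrors how the query expression returns one table per visible run.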

To learn how to write your own queries interactively, see the Query panel demo.

Configurations

Select the gear icon in the upper left corner of the panel to expand the query configuration, where you can configure the type of panel and the parameters for the result panel.

Result panels

Finally, the query result panel renders the result of the query expression in an interactive form, using the selected panel type and its configuration. The following images show a Table and a Plot of the same data.

Basic operations

You can perform the following common operations within your query panels.

Sort

Sort from the column options.

Filter

You can filter either directly in the query or by using the filter button in the top left corner.

Map

Map operations iterate over lists and apply a function to each element in the data. You can do this directly with a panel query or by inserting a new column from the column options.

Groupby

You can group rows with a query or from the column options.

Concat

The concat operation concatenates two tables. You can concatenate or join tables from the panel settings.

Join

It is also possible to join tables directly in the query. Consider the following query expression:

project("luis_team_test", "weave_example_queries").runs.summary["short_table_0"].table.rows.concat.join(\
project("luis_team_test", "weave_example_queries").runs.summary["short_table_1"].table.rows.concat,\
(row) => row["Label"],(row) => row["Label"], "Table1", "Table2",\
"false", "false")

The table on the left is generated from:

project("luis_team_test", "weave_example_queries").\
runs.summary["short_table_0"].table.rows.concat

The table on the right is generated from:

project("luis_team_test", "weave_example_queries").\
runs.summary["short_table_1"].table.rows.concat

Where:

(row) => row["Label"] are selectors for each table, determining which column to join on
"Table1" and "Table2" are the names of each table when joined
The final two arguments (true or false) control the inner/outer join settings for the left and right tables
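
To make the roles of the selectors and table names concrete, here is a plain Python analogy of a join on the Label column. It is not the query engine's implementation. The function `inner_join` and the sample rows are hypothetical, and the sketch assumes that setting both outer-join flags to false yields an inner join.

```python
def inner_join(left_rows, right_rows, left_key, right_key, left_alias, right_alias):
    """Pair rows whose join keys match, nesting each side under its alias."""
    joined = []
    for left in left_rows:
        for right in right_rows:
            if left_key(left) == right_key(right):
                joined.append({left_alias: left, right_alias: right})
    return joined

table_0 = [{"Label": "cat", "score": 0.9}, {"Label": "dog", "score": 0.4}]
table_1 = [{"Label": "cat", "count": 3}, {"Label": "bird", "count": 7}]

result = inner_join(
    table_0, table_1,
    lambda row: row["Label"], lambda row: row["Label"],  # selectors for each table
    "Table1", "Table2",                                  # names for each side
)
print(result)
```

Only the cat rows appear in the result because dog and bird have no match on the other side.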

Runs object

Use query panels to access the runs object. Run objects store records of your experiments. For more details, see Accessing runs object. As a quick overview, the runs object provides:

summary: A dictionary of information that summarizes the run’s results. This can be scalars like accuracy and loss, or large files. By default, wandb.Run.log() sets the summary to the final value of a logged time series. You can set the contents of the summary directly. Think of the summary as the run’s outputs.
history: A list of dictionaries meant to store values that change while the model is training such as loss. The command wandb.Run.log() appends to this object.
config: A dictionary of the run’s configuration information, such as the hyperparameters for a training run or the preprocessing methods for a run that creates a dataset Artifact. Think of these as the run’s “inputs”.
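
The relationship between these three attributes can be illustrated with a toy Python class. `ToyRun` is a hypothetical, simplified stand-in and not the wandb API. It only models the behavior described above: logging appends to history, and the summary defaults to the most recently logged value.

```python
class ToyRun:
    """Simplified stand-in for a run; not the wandb API."""

    def __init__(self, config):
        self.config = dict(config)  # inputs: hyperparameters and settings
        self.history = []           # one dict per log() call
        self.summary = {}           # outputs: latest value per key by default

    def log(self, metrics):
        self.history.append(dict(metrics))
        self.summary.update(metrics)

run = ToyRun({"learning_rate": 0.01})
for loss in [0.9, 0.5, 0.3]:
    run.log({"loss": loss})

# The summary can also be set directly, for example to record the best value.
run.summary["best_loss"] = min(row["loss"] for row in run.history)
print(run.config, run.summary, len(run.history))
```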

Access Artifacts

Artifacts are a core concept in W&B. They are a versioned, named collection of files and directories. Use Artifacts to track model weights, datasets, and any other file or directory. Artifacts are stored in W&B and can be downloaded or used in other runs. You can find more details and examples in Accessing artifacts. Artifacts are normally accessed from the project object:

project.artifactVersion(): returns the specific artifact version for a given name and version within a project
project.artifact(""): returns the artifact for a given name within a project. You can then use .versions to get a list of all versions of this artifact
project.artifactType(): returns the artifactType for a given name within a project. You can then use .artifacts to get a list of all artifacts with this type
project.artifactTypes: returns a list of all artifact types under the project

2.4.1.9.1 - Embed objects

W&B’s Embedding Projector allows users to plot multi-dimensional embeddings on a 2D plane using common dimension reduction algorithms like PCA, UMAP, and t-SNE.

Embeddings are used to represent objects (people, images, posts, words, etc…) with a list of numbers - sometimes referred to as a vector. In machine learning and data science use cases, embeddings can be generated using a variety of approaches across a range of applications. This page assumes the reader is familiar with embeddings and is interested in visually analyzing them inside of W&B.

Embedding Examples

Hello World

W&B allows you to log embeddings using the wandb.Table class. Consider the following example of 3 embeddings, each consisting of 5 dimensions:

import wandb

with wandb.init(project="embedding_tutorial") as run:
  embeddings = [
      # D1   D2   D3   D4   D5
      [0.2, 0.4, 0.1, 0.7, 0.5],  # embedding 1
      [0.3, 0.1, 0.9, 0.2, 0.7],  # embedding 2
      [0.4, 0.5, 0.2, 0.2, 0.1],  # embedding 3
  ]
  run.log(
      {"embeddings": wandb.Table(columns=["D1", "D2", "D3", "D4", "D5"], data=embeddings)}
  )
  # The `with` block finishes the run on exit, so no explicit run.finish() call is needed.

After running the above code, the W&B dashboard will have a new Table containing your data. You can select 2D Projection from the upper right panel selector to plot the embeddings in 2 dimensions. Smart defaults are selected automatically, and you can override them in the configuration menu, which you open by clicking the gear icon. In this example, all 5 available numeric dimensions are used automatically.

Digits MNIST

While the above example shows the basic mechanics of logging embeddings, typically you are working with many more dimensions and samples. Consider the MNIST Digits dataset (UCI ML hand-written digits datasets) made available via scikit-learn. This dataset has 1797 records, each with 64 dimensions. The problem is a 10-class classification use case. You can also convert the input data to an image for visualization.

import wandb
from sklearn.datasets import load_digits

with wandb.init(project="embedding_tutorial") as run:

  # Load the dataset
  ds = load_digits(as_frame=True)
  df = ds.data

  # Create a "target" column
  df["target"] = ds.target.astype(str)
  cols = df.columns.tolist()
  df = df[cols[-1:] + cols[:-1]]

  # Create an "image" column
  df["image"] = df.apply(
      lambda row: wandb.Image(row[1:].values.reshape(8, 8) / 16.0), axis=1
  )
  cols = df.columns.tolist()
  df = df[cols[-1:] + cols[:-1]]

  run.log({"digits": df})

After running the above code, again we are presented with a Table in the UI. By selecting 2D Projection we can configure the definition of the embedding, coloring, algorithm (PCA, UMAP, t-SNE), algorithm parameters, and even overlay (in this case we show the image when hovering over a point). In this particular case, these are all “smart defaults” and you should see something very similar with a single click on 2D Projection. (Interact with this embedding tutorial example).

Logging Options

You can log embeddings in a number of different formats:

Single Embedding Column: Often your data is already in a “matrix”-like format. In this case, you can create a single embedding column - where the data type of the cell values can be list[int], list[float], or np.ndarray.
Multiple Numeric Columns: In the above two examples, we use this approach and create a column for each dimension. W&B currently accepts Python int or float values for the cells.
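
The two formats can be compared side by side with the sketch below, which builds rows for each layout from the same three embeddings. The column name `id`, the variable names, and the wrapper function `log_tables` are illustrative choices rather than anything W&B requires.

```python
embeddings = [
    [0.2, 0.4, 0.1, 0.7, 0.5],
    [0.3, 0.1, 0.9, 0.2, 0.7],
    [0.4, 0.5, 0.2, 0.2, 0.1],
]

# Format 1: a single embedding column whose cells are lists of floats.
single_columns = ["id", "embedding"]
single_rows = [[i, vec] for i, vec in enumerate(embeddings)]

# Format 2: one numeric column per dimension.
multi_columns = ["id"] + [f"D{d + 1}" for d in range(len(embeddings[0]))]
multi_rows = [[i, *vec] for i, vec in enumerate(embeddings)]

def log_tables():
    """Log both layouts as tables to one run."""
    import wandb  # imported here so the rows above can be built without wandb

    with wandb.init(project="embedding_tutorial") as run:
        run.log({
            "single_column": wandb.Table(columns=single_columns, data=single_rows),
            "multi_column": wandb.Table(columns=multi_columns, data=multi_rows),
        })

print(multi_columns)
```

Call `log_tables()` once you are logged in to W&B to compare how each table behaves with 2D Projection.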

Single embedding column Multiple numeric columns

Furthermore, just like all tables, you have many options regarding how to construct the table:

Directly from a dataframe using wandb.Table(dataframe=df)
Directly from a list of data using wandb.Table(data=[...], columns=[...])
Build the table incrementally row-by-row (great if you have a loop in your code). Add rows to your table using table.add_data(...)
Add an embedding column to your table (great if you have a list of predictions in the form of embeddings): table.add_column("col_name", ...)
Add a computed column (great if you have a function or model you want to map over your table): table.add_computed_columns(lambda row, ndx: {"embedding": model.predict(row)})

Plotting Options

After selecting 2D Projection, you can click the gear icon to edit the rendering settings. In addition to selecting the intended columns (see above), you can select an algorithm of interest (along with the desired parameters). Below you can see the parameters for UMAP and t-SNE respectively.

UMAP parameters t-SNE parameters

Note: we currently downsample to a random subset of 1000 rows and 50 dimensions for all three algorithms.

2.4.2 - Custom charts

Create custom charts in your W&B project. Log arbitrary tables of data and visualize them exactly how you want. Control details of fonts, colors, and tooltips with the power of Vega.

Code: Try an example Colab notebook.
Video: Watch a walkthrough video.
Example: Quick Keras and Sklearn demo notebook

Supported charts from vega.github.io/vega

How it works

Log data: From your script, log config and summary data.
Customize the chart: Pull in logged data with a GraphQL query. Visualize the results of your query with Vega, a powerful visualization grammar.
Log the chart: Call your own preset from your script with wandb.plot_table().

If you do not see the expected data, the column you are looking for might not be logged in the selected runs. Save your chart, go back out to the runs table, and verify selected runs using the eye icon.

Log charts from a script

Builtin presets

W&B has a number of builtin chart presets that you can log directly from your script. These include line plots, scatter plots, bar charts, histograms, PR curves, and ROC curves.

wandb.plot.line()

Log a custom line plot—a list of connected and ordered points (x,y) on arbitrary axes x and y.

with wandb.init() as run:
  data = [[x, y] for (x, y) in zip(x_values, y_values)]
  table = wandb.Table(data=data, columns=["x", "y"])
  run.log(
      {
          "my_custom_plot_id": wandb.plot.line(
              table, "x", "y", title="Custom Y vs X Line Plot"
          )
      }
  )

A line plot logs curves on any two dimensions. If you plot two lists of values against each other, the number of values in the lists must match exactly (for example, each point must have an x and a y).
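
Because Python's `zip()` silently truncates to the shorter list, a length mismatch can go unnoticed when you build the rows. A small guard like the following sketch makes the requirement explicit. The helper name `paired_rows` is hypothetical and not part of wandb.

```python
def paired_rows(x_values, y_values):
    """Build [x, y] rows, failing loudly if the lists differ in length."""
    if len(x_values) != len(y_values):
        raise ValueError(
            f"x has {len(x_values)} values but y has {len(y_values)}"
        )
    return [[x, y] for x, y in zip(x_values, y_values)]

print(paired_rows([0, 1, 2], [0.9, 0.5, 0.3]))
# → [[0, 0.9], [1, 0.5], [2, 0.3]]
```

You can pass the returned rows as the `data` argument of `wandb.Table` in the example above.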

See an example report or try an example Google Colab notebook.

wandb.plot.scatter()

Log a custom scatter plot—a list of points (x, y) on a pair of arbitrary axes x and y.

with wandb.init() as run:
  data = [[x, y] for (x, y) in zip(class_x_prediction_scores, class_y_prediction_scores)]
  table = wandb.Table(data=data, columns=["class_x", "class_y"])
  run.log({"my_custom_id": wandb.plot.scatter(table, "class_x", "class_y")})

You can use this to log scatter points on any two dimensions. Note that if you’re plotting two lists of values against each other, the number of values in the lists must match exactly (for example, each point must have an x and a y).

See an example report or try an example Google Colab notebook.

wandb.plot.bar()

Log a custom bar chart—a list of labeled values as bars—natively in a few lines:

with wandb.init() as run:
  data = [[label, val] for (label, val) in zip(labels, values)]
  table = wandb.Table(data=data, columns=["label", "value"])
  run.log(
      {
          "my_bar_chart_id": wandb.plot.bar(
              table, "label", "value", title="Custom Bar Chart"
          )
      }
  )

You can use this to log arbitrary bar charts. Note that the number of labels and values in the lists must match exactly (for example, each data point must have both).

See an example report or try an example Google Colab notebook.

wandb.plot.histogram()

Log a custom histogram, which sorts a list of values into bins by count or frequency of occurrence, natively in a few lines. Suppose you have a list of prediction confidence scores (scores) and want to visualize their distribution:

with wandb.init() as run:
  data = [[s] for s in scores]
  table = wandb.Table(data=data, columns=["scores"])
  run.log({"my_histogram": wandb.plot.histogram(table, "scores", title=None)})

You can use this to log arbitrary histograms. Note that data is a list of lists, intended to support a 2D array of rows and columns.

See an example report or try an example Google Colab notebook.

wandb.plot.pr_curve()

Create a Precision-Recall curve in one line:

with wandb.init() as run:
  plot = wandb.plot.pr_curve(ground_truth, predictions, labels=None, classes_to_plot=None)

  run.log({"pr": plot})

You can log this whenever your code has access to:

a model’s predicted scores (predictions) on a set of examples
the corresponding ground truth labels (ground_truth) for those examples
(optionally) a list of the labels/class names (labels=["cat", "dog", "bird"...] if label index 0 means cat, 1 = dog, 2 = bird, etc.)
(optionally) a subset (still in list format) of the labels to visualize in the plot
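
The snippet above leaves ground_truth and predictions undefined. The sketch below shows one way to prepare them for a binary classifier, assuming that predictions should hold one row of per-class scores for each example (shape: number of examples by number of classes). `binary_scores_to_rows` and `log_pr_curve` are hypothetical helpers, and the plotting call is assumed to need scikit-learn installed alongside wandb.

```python
def binary_scores_to_rows(positive_scores):
    """Expand positive-class scores into [negative, positive] score rows."""
    return [[1.0 - s, s] for s in positive_scores]


def log_pr_curve(run, ground_truth, positive_scores, labels=None):
    """Log a precision-recall curve for a binary classifier."""
    import wandb  # lazy import; binary_scores_to_rows has no wandb dependency

    predictions = binary_scores_to_rows(positive_scores)
    run.log({"pr": wandb.plot.pr_curve(ground_truth, predictions, labels=labels)})
```

For example, `log_pr_curve(run, [0, 1, 1], [0.2, 0.7, 0.9], labels=["cat", "dog"])` treats label index 0 as cat and 1 as dog.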

See an example report or try an example Google Colab notebook.

wandb.plot.roc_curve()

Create an ROC curve in one line:

with wandb.init() as run:
  # ground_truth is a list of true labels; predictions holds one row of
  # per-class scores [score_class_0, score_class_1] for each example
  ground_truth = [0, 1, 0, 1, 0, 1]
  predictions = [
      [0.9, 0.1], [0.6, 0.4], [0.65, 0.35], [0.2, 0.8], [0.3, 0.7], [0.1, 0.9]
  ]

  # Create the ROC curve plot
  # labels is an optional list of class names, classes_to_plot is an optional subset of those labels to visualize
  plot = wandb.plot.roc_curve(
      ground_truth, predictions, labels=None, classes_to_plot=None
  )

  run.log({"roc": plot})

You can log this whenever your code has access to:

a model’s predicted scores (predictions) on a set of examples
the corresponding ground truth labels (ground_truth) for those examples
(optionally) a list of the labels/class names (labels=["cat", "dog", "bird"...] if label index 0 means cat, 1 = dog, 2 = bird, etc.)
(optionally) a subset (still in list format) of these labels to visualize on the plot

See an example report or try an example Google Colab notebook.

Custom presets

Tweak a builtin preset, or create a new preset, then save the chart. Use the chart ID to log data to that custom preset directly from your script. Try an example Google Colab notebook.

# Create a table with the columns to plot
table = wandb.Table(data=data, columns=["step", "height"])

# Map from the table's columns to the chart's fields
fields = {"x": "step", "value": "height"}

# Use the table to populate the new custom chart preset
# To use your own saved chart preset, change the vega_spec_name
my_custom_chart = wandb.plot_table(
    vega_spec_name="carey/new_chart",
    data_table=table,
    fields=fields,
)

Log data

You can log the following data types from your script and use them in a custom chart:

Config: Initial settings of your experiment (your independent variables). This includes any named fields you’ve logged as keys to wandb.Run.config at the start of your training. For example: wandb.Run.config.learning_rate = 0.0001
Summary: Single values logged during training (your results or dependent variables). For example, wandb.Run.log({"val_acc" : 0.8}). If you write to this key multiple times during training via wandb.Run.log(), the summary is set to the final value of that key.
History: The full time series of the logged scalar is available to the query via the history field
summaryTable: If you need to log a list of multiple values, use a wandb.Table() to save that data, then query it in your custom panel.
historyTable: If you need to see the history data, then query historyTable in your custom chart panel. Each time you call wandb.Table() or log a custom chart, you’re creating a new table in history for that step.
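
As a rough illustration of how config, history, and summary relate, the sketch below logs made-up metrics and mirrors the rule that the summary keeps the final value of each key. `fake_training_history`, `expected_summary`, and `log_run` are hypothetical names, and the metric values are invented.

```python
def fake_training_history(epochs):
    """Return one metrics dict per epoch (made-up values for illustration)."""
    return [{"epoch": e, "val_acc": (50 + 10 * e) / 100} for e in range(epochs)]


def expected_summary(history):
    """Mimic the default summary: the last value logged for each key."""
    summary = {}
    for step in history:
        summary.update(step)
    return summary


def log_run(epochs=3):
    """Log the fake history to W&B (requires wandb and a logged-in session)."""
    import wandb

    # Config: settings fixed at the start of the run
    with wandb.init(config={"learning_rate": 0.0001}) as run:
        for metrics in fake_training_history(epochs):
            # History: every call adds a step; the last values form the summary
            run.log(metrics)
```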

How to log a custom table

Use wandb.Table() to log your data as a 2D array. Typically, each row of this table represents one data point, and each column denotes a field or dimension of that data point that you want to plot. As you configure a custom panel, the whole table is accessible via the named key passed to wandb.Run.log() (custom_data_table below), and the individual fields are accessible via the column names (x, y, and z). You can log tables at multiple time steps throughout your experiment. The maximum size of each table is 10,000 rows. Try an example Google Colab.

with wandb.init() as run:
  # Logging a custom table of data
  my_custom_data = [[x1, y1, z1], [x2, y2, z2]]
  run.log(
      {"custom_data_table": wandb.Table(data=my_custom_data, columns=["x", "y", "z"])}
  )
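
If a dataset has more rows than the 10,000-row limit, one option is to split it and log each piece at its own step. This is only a sketch of that idea: `split_rows` and `log_in_chunks` are hypothetical helpers, and whether per-step tables suit your chart depends on how you query them.

```python
MAX_TABLE_ROWS = 10_000  # per-table limit stated above


def split_rows(rows, max_rows=MAX_TABLE_ROWS):
    """Split rows into consecutive chunks of at most max_rows rows each."""
    return [rows[i : i + max_rows] for i in range(0, len(rows), max_rows)]


def log_in_chunks(run, rows, columns, key="custom_data_table"):
    """Log each chunk as its own wandb.Table, one run.log() call per chunk."""
    import wandb  # lazy import; split_rows has no wandb dependency

    for chunk in split_rows(rows):
        run.log({key: wandb.Table(data=chunk, columns=columns)})
```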

Customize the chart

Add a new custom chart to get started, then edit the query to select data from your visible runs. The query uses GraphQL to fetch data from the config, summary, and history fields in your runs.

Custom visualizations

Select a Chart in the upper right corner to start with a default preset. Next, select Chart fields to map the data you’re pulling in from the query to the corresponding fields in your chart.

The following image shows an example of how to select a metric and then map it to the bar chart fields below.

How to edit Vega

Click Edit at the top of the panel to go into Vega edit mode. Here you can define a Vega specification that creates an interactive chart in the UI. You can change any aspect of the chart. For example, you can change the title, pick a different color scheme, or show curves as a series of points instead of as connected lines. You can also make changes to the data itself, such as using a Vega transform to bin an array of values into a histogram. The panel preview updates interactively, so you can see the effect of your changes as you edit the Vega spec or query. Refer to the Vega documentation and tutorials.

Field references

To pull data into your chart from W&B, add template strings of the form "${field:<field-name>}" anywhere in your Vega spec. This will create a dropdown in the Chart Fields area on the right side, which users can use to select a query result column to map into Vega.

To set a default value for a field, use this syntax: "${field:<field-name>:<placeholder text>}"
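
To make the template syntax concrete, the sketch below holds a small chart spec fragment as a Python dictionary and lists the field references it contains. The spec, the field names x and y, and the placeholder text are invented for illustration, and `field_references` is a hypothetical helper rather than anything W&B provides.

```python
import json
import re

# Illustrative spec fragment; field names and placeholder text are made up.
spec = {
    "mark": "line",
    "encoding": {
        "x": {"field": "${field:x}", "type": "quantitative"},
        "y": {"field": "${field:y:value}", "type": "quantitative"},
    },
}

# Matches "${field:<field-name>}" and "${field:<field-name>:<placeholder text>}"
FIELD_PATTERN = re.compile(r"\$\{field:([^:}]+)(?::([^}]*))?\}")


def field_references(chart_spec):
    """Return {field_name: placeholder_or_None} for each template string."""
    text = json.dumps(chart_spec)
    return {name: (default or None) for name, default in FIELD_PATTERN.findall(text)}
```

Each key in the returned dictionary corresponds to one dropdown that would appear in the Chart Fields area.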

Saving chart presets

Apply any changes to a specific visualization panel with the button at the bottom of the modal. Alternatively, you can save the Vega spec to use elsewhere in your project. To save the reusable chart definition, click Save as at the top of the Vega editor and give your preset a name.

Articles and guides

Common use cases

Customize bar plots with error bars
Show model validation metrics which require custom x-y coordinates (like precision-recall curves)
Overlay data distributions from two different models/experiments as histograms
Show changes in a metric via snapshots at multiple points during training
Create a unique visualization not yet available in W&B (and hopefully share it with the world)

2.4.2.1 - Tutorial: Use custom charts

Tutorial of using the custom charts feature in the W&B UI

Use custom charts to control the data you’re loading in to a panel and its visualization.

1. Log data to W&B

First, log data in your script. Use wandb.Run.config for single points set at the beginning of training, like hyperparameters. Use wandb.Run.log() for multiple points over time, and log custom 2D arrays with wandb.Table(). We recommend logging up to 10,000 data points per logged key.

with wandb.init() as run:
  # Logging a custom table of data
  my_custom_data = [[x1, y1, z1], [x2, y2, z2]]
  run.log(
    {"custom_data_table": wandb.Table(data=my_custom_data, columns=["x", "y", "z"])}
  )

Try a quick example notebook to log the data tables, and in the next step we’ll set up custom charts. See what the resulting charts look like in the live report.

2. Create a query

Once you’ve logged data to visualize, go to your project page and click the + button to add a new panel, then select Custom Chart. You can follow along in the custom charts demo workspace.

Add a query

Click summary and select historyTable to set up a new query pulling data from the run history.
Type in the key where you logged the wandb.Table(). In the code snippet above, it was custom_data_table. In the example notebook, the keys are pr_curve and roc_curve.

Set Vega fields

Now that the query is loading in these columns, they’re available as options to select in the Vega fields dropdown menus:

x-axis: runSets_historyTable_r (recall)
y-axis: runSets_historyTable_p (precision)
color: runSets_historyTable_c (class label)

3. Customize the chart

Now that looks pretty good, but I’d like to switch from a scatter plot to a line plot. Click Edit to change the Vega spec for this built-in chart. Follow along in the custom charts demo workspace.

I updated the Vega spec to customize the visualization:

add titles for the plot, legend, x-axis, and y-axis (set “title” for each field)
change the value of “mark” from “point” to “line”
remove the unused “size” field

To save this as a preset that you can use elsewhere in this project, click Save as at the top of the page. Here’s what the result looks like, along with an ROC curve:

Bonus: Composite Histograms

Histograms can visualize numerical distributions to help us understand larger datasets. Composite histograms show multiple distributions across the same bins, letting us compare two or more metrics across different models or across different classes within our model. For a semantic segmentation model detecting objects in driving scenes, we might compare the effectiveness of optimizing for accuracy versus intersection over union (IOU), or we might want to know how well different models detect cars (large, common regions in the data) versus traffic signs (much smaller, less common regions). In the demo Colab, you can compare the confidence scores for two of the ten classes of living things.

To create your own version of the custom composite histogram panel:

Create a new Custom Chart panel in your Workspace or Report (by adding a “Custom Chart” visualization). Hit the “Edit” button in the top right to modify the Vega spec starting from any built-in panel type.
Replace that built-in Vega spec with my MVP code for a composite histogram in Vega. You can modify the main title, axis titles, input domain, and any other details directly in this Vega spec using Vega syntax (you could change the colors or even add a third histogram :)
Modify the query in the right hand side to load the correct data from your wandb logs. Add the field summaryTable and set the corresponding tableKey to class_scores to fetch the wandb.Table logged by your run. This will let you populate the two histogram bin sets (red_bins and blue_bins) via the dropdown menus with the columns of the wandb.Table logged as class_scores. For my example, I chose the animal class prediction scores for the red bins and plant for the blue bins.
You can keep making changes to the Vega spec and query until you’re happy with the plot you see in the preview rendering. Once you’re done, click Save as in the top and give your custom plot a name so you can reuse it. Then click Apply from panel library to finish your plot.
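
The logging side of these steps might look like the sketch below. It assumes the table is logged under the key class_scores, as in the steps above, with one column per class. The column names animal and plant and both helper names are assumptions made for this example.

```python
def class_score_rows(animal_scores, plant_scores):
    """Build one [animal, plant] row per example for the class_scores table."""
    if len(animal_scores) != len(plant_scores):
        raise ValueError("each example needs both an animal and a plant score")
    return [list(pair) for pair in zip(animal_scores, plant_scores)]


def log_class_scores(run, animal_scores, plant_scores):
    """Log the scores as a wandb.Table under the key class_scores."""
    import wandb  # lazy import; class_score_rows has no wandb dependency

    table = wandb.Table(
        data=class_score_rows(animal_scores, plant_scores),
        columns=["animal", "plant"],
    )
    run.log({"class_scores": table})
```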

Here’s what my results look like from a very brief experiment: training on only 1000 examples for one epoch yields a model that’s very confident that most images are not plants and very uncertain about which images might be animals.

2.4.3 - Console logs

When you run an experiment, you may notice various messages printed to your console. W&B captures console logs and displays them in the W&B App. Use these messages to debug and monitor the behavior of your experiment.

View console logs

Access console logs for a run in the W&B App:

Navigate to your project in the W&B App.
Select a run within the Runs table.
Click the Logs tab in the project sidebar.

Only 10,000 lines of your logs are shown due to storage limitations.

Types of console logs

W&B captures several types of console logs: informational messages, warnings, and errors, with a prefix to indicate the log’s severity.

Informational messages

Informational messages provide updates about the run’s progress and status. They are typically prefixed with wandb:.

wandb: Starting Run: abc123
wandb: Run data is saved locally in ./wandb/run-20240125_120000-abc123

Warning messages

Warnings about potential issues that don’t stop execution are prefixed with WARNING:

WARNING Found .wandb file, not streaming tensorboard metrics.
WARNING These runs were logged with a previous version of wandb.

Error messages

Error messages for serious issues are prefixed with ERROR:. These indicate problems that may prevent the run from completing successfully.

ERROR Unable to save notebook session history.
ERROR Failed to save notebook.

Console log settings

Within your code, pass the wandb.Settings object to wandb.init() to configure how W&B handles console logs. Within wandb.Settings, you can set the following parameters to control console log behavior:

show_errors: If set to True, error messages are displayed in the W&B App. If set to False, error messages are not shown.
silent: If set to True, all W&B console output will be suppressed. This is useful for production environments where you want to minimize console noise.
show_warnings: If set to True, warning messages are displayed in the W&B App. If set to False, warning messages are not shown.
show_info: If set to True, informational messages are displayed in the W&B App. If set to False, informational messages are not shown.

The following example shows how to configure these settings:

import wandb

settings = wandb.Settings(
    show_errors=True,  # Show error messages in the W&B App
    silent=False,      # Keep W&B console output; set to True to suppress it
    show_warnings=True # Show warning messages in the W&B App
)

with wandb.init(settings=settings) as run:
    # Your training code here
    run.log({"accuracy": 0.95})

Custom logging

W&B captures console logs from your application, but it does not interfere with your own logging setup. You can use Python’s built-in print() function or the logging module to log messages.

import wandb

with wandb.init(project="my-project") as run:
    for i in range(100, 1000, 100):
        # This will log to W&B and print to console
        run.log({"epoch": i, "loss": 0.1 * i})
        print(f"epoch: {i} loss: {0.1 * i}")

The console logs will look similar to the following:

1 epoch:  100 loss: 1.3191105127334595
2 epoch:  200 loss: 0.8664389848709106
3 epoch:  300 loss: 0.6157898902893066
4 epoch:  400 loss: 0.4961796700954437
5 epoch:  500 loss: 0.42592573165893555
6 epoch:  600 loss: 0.3771176040172577
7 epoch:  700 loss: 0.3393910825252533
8 epoch:  800 loss: 0.3082585036754608
9 epoch:  900 loss: 0.28154927492141724
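
The example above uses print(). A comparable sketch with the logging module follows. It assumes that W&B's console capture picks up whatever the handler writes to stdout, and the `make_logger` and `train` helpers, the logger name, and the loss values are all made up for illustration.

```python
import logging
import sys


def make_logger(name="train", stream=None):
    """Return a logger that writes to stream (stdout by default)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers when re-run in a notebook
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False  # keep messages out of the root logger
    return logger


def train(run, logger, epochs=3):
    """Log a made-up loss to W&B and emit a matching console message."""
    for epoch in range(1, epochs + 1):
        loss = 1.0 / epoch
        run.log({"epoch": epoch, "loss": loss})
        logger.info("epoch: %d loss: %.3f", epoch, loss)
```

Inside a `with wandb.init(project="my-project") as run:` block, call `train(run, make_logger())`.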

Time stamps

Time stamps are automatically added to each console log entry. This allows you to track when each log message was generated.

You can toggle the time stamps in the console logs on or off. Within the console page select the Timestamp visible dropdown in the top left corner. You can choose to show or hide the time stamps.

Search console logs

Use the search bar at the top of the console logs page to filter logs by keywords. You can search for specific terms, labels, or error messages.

Filter with custom labels

Parameters prefixed by x_ (such as x_label) are in public preview. Create a GitHub issue in the W&B repository to provide feedback.

To filter console logs by label, pass a label as the x_label argument in wandb.Settings, then enter that label in the search bar at the top of the console logs page.

import wandb

# Initialize a run in the primary node
run = wandb.init(
    entity="entity",
    project="project",
    settings=wandb.Settings(
        x_label="custom_label"  # (Optional) Custom label for filtering logs
    ),
)

Download console logs

Download console logs for a run in the W&B App:

Navigate to your project in the W&B App.
Select a run within the Runs table.
Click the Logs tab in the project sidebar.
Click the download button in the top right corner of the console logs page.

Copy console logs

Copy console logs for a run in the W&B App:

Navigate to your project in the W&B App.
Select a run within the Runs table.
Click the Logs tab in the project sidebar.
Click the copy button in the top right corner of the console logs page.

2.4.4 - Manage workspace, section, and panel settings

Within a given workspace page there are three different setting levels: workspaces, sections, and panels. Workspace settings apply to the entire workspace. Section settings apply to all panels within a section. Panel settings apply to individual panels.

Workspace settings

Workspace settings apply to all sections and all panels within those sections. You can edit two types of workspace settings: Workspace layout and Line plots. Workspace layout settings determine the structure of the workspace, while Line plots settings control the default settings for line plots in the workspace.

To edit settings that apply to the overall structure of this workspace:

Navigate to your project workspace.
Click the gear icon next to the New report button to view the workspace settings.
Choose Workspace layout to change the workspace’s layout, or choose Line plots to configure default settings for line plots in the workspace.

After customizing your workspace, you can use workspace templates to quickly create new workspaces with the same settings. Refer to Workspace templates.

Workspace layout options

Configure a workspace’s layout to define the overall structure of the workspace. This includes sectioning logic and panel organization.

The workspace layout options page shows whether the workspace generates panels automatically or manually. To adjust a workspace’s panel generation mode, refer to Panels.

This table describes each workspace layout option.

Workspace setting	Description
Hide empty sections during search	Hide sections that do not contain any panels when searching for a panel.
Sort panels alphabetically	Sort panels in your workspaces alphabetically.
Section organization	Remove all existing sections and panels and repopulate them with new section names. Groups the newly populated sections either by first or last prefix.

W&B suggests that you organize sections by grouping the first prefix rather than grouping by the last prefix. Grouping by the first prefix can result in fewer sections and better performance.

Line plots options

Set global defaults and custom rules for line plots in a workspace by modifying the Line plots workspace settings.

You can edit two main settings within Line plots settings: Data and Display preferences. The Data tab contains the following settings:

Line plot setting	Description
X axis	The scale of the x-axis in line plots. The x-axis is set to Step by default. See the following table for the list of x-axis options.
Range	Minimum and maximum settings to display for x axis.
Smoothing	Change the smoothing on the line plot. For more information about smoothing, see Smooth line plots.
Outliers	Rescale to exclude outliers from the default plot min and max scale.
Point aggregation method	Improve data visualization accuracy and performance. See Point aggregation for more information.
Max number of runs or groups	Limit the number of runs or groups displayed on the line plot.

In addition to Step, there are other options for the x-axis:

X axis option	Description
Relative Time (Wall)	Timestamp since the process starts. For example, suppose you start a run and resume that run the next day. If you then log something, the recorded point is 24 hours.
Relative Time (Process)	Timestamp inside the running process. For example, suppose you start a run and let it continue for 10 seconds. The next day you resume that run. The point is recorded as 10 seconds.
Wall Time	Minutes elapsed since the start of the first run on the graph.
Step	Increments each time you call `wandb.Run.log()`.
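
As a worked example of the two relative-time options, the following sketch reproduces the scenario from the table with plain datetime arithmetic. The timestamps are made up, and the formulas are one reading of the descriptions above, not W&B's actual implementation.

```python
from datetime import datetime, timedelta

start = datetime(2024, 1, 25, 12, 0, 0)   # the run starts
stop = start + timedelta(seconds=10)      # the process exits after 10 seconds
resume = start + timedelta(hours=24)      # the run is resumed the next day
log_time = resume                         # a point is logged right after resuming

# Relative Time (Wall): clock time elapsed since the run first started
relative_wall = log_time - start

# Relative Time (Process): only time spent inside a running process counts
relative_process = (stop - start) + (log_time - resume)
```

Here `relative_wall` is 24 hours and `relative_process` is 10 seconds, matching the two table rows.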

For information on how to edit an individual line plot, see Edit line panel settings in Line plots.

Within the Display preferences tab, you can toggle the following settings:

Display preference	Description
Remove legends from all panels	Remove the panel’s legend
Display colored run names in tooltips	Show the runs as colored text within the tooltip
Only show highlighted run in companion chart tooltip	Display only highlighted runs in chart tooltip
Number of runs shown in tooltips	Display the number of runs in the tooltip
Display full run names on the primary chart tooltip	Display the full name of the run in the chart tooltip

Section settings

Section settings apply to all panels within that section. Within a workspace section you can sort panels, rearrange panels, and rename the section name.

Modify section settings by selecting the three horizontal dots (…) in the upper right corner of a section.

From the dropdown, you can edit the following settings that apply to the entire section:

Section setting	Description
Rename a section	Change the name of the section
Sort panels A-Z	Sort panels within a section alphabetically
Rearrange panels	Select and drag a panel within a section to manually order your panels

The following animation demonstrates how to rearrange panels within a section:

In addition to the settings described in the preceding table, you can also edit how sections appear in your workspaces such as Add section below, Add section above, Delete section, and Add section to report.

Panel settings

Customize an individual panel’s settings to compare multiple lines on the same plot, calculate custom axes, rename labels, and more. To edit a panel’s settings:

Hover your mouse over the panel you want to edit.
Select the pencil icon that appears.
Within the modal that appears, you can edit settings related to the panel’s data, display preferences, and more.

For a complete list of settings you can apply to a panel, see Edit line panel settings.

2.4.5 - Settings

Use the Weights and Biases Settings Page to customize your individual user profile or team settings.

Within your individual user account, you can edit your profile picture, display name, geographic location, biography, and the emails associated with your account, and you can manage alerts for runs. You can also use the settings page to link your GitHub repository and delete your account. For more information, see User settings.

Use the team settings page to invite or remove new members to a team, manage alerts for team runs, change privacy settings, and view and manage storage usage. For more information about team settings, see Team settings.

2.4.5.1 - Manage user settings

Manage your profile information, account defaults, alerts, participation in beta products, GitHub integration, storage usage, account activation, and create teams in your user settings.

Navigate to your user profile page and select your user icon on the top right corner. From the dropdown, choose Settings.

Profile

Within the Profile section you can manage and modify your account name and institution. You can optionally add a biography, location, link to a personal or your institution’s website, and upload a profile image.

Edit your intro

To edit your intro, click Edit at the top of your profile. The WYSIWYG editor that opens supports Markdown.

To edit a line, click it. To save time, you can type / and choose Markdown from the list.
Use an item’s drag handles to move it.
To delete a block, click the drag handle, then click Delete.
To save your changes, click Save.

To add a follow badge for the @weights_biases account on X, you could add a Markdown-style link with an HTML <img> tag that points to the badge image:

[![X: @weights_biases](https://img.shields.io/twitter/follow/weights_biases?style=social)](https://x.com/intent/follow?screen_name=weights_biases)

In an <img> tag, you can specify width, height, or both. If you specify only one of them, the image’s proportions are maintained.

Teams

Create a new team in the Team section. To create a new team, select the New team button and provide the following:

Team name - the name of your team. The team name must be unique. Team names cannot be changed.
Team type - Select either the Work or Academic button.
Company/Organization - Provide the name of the team’s company or organization. Choose the dropdown menu to select a company or organization. You can optionally provide a new organization.

Only administrative accounts can create a team.

Beta features

Within the Beta Features section you can optionally enable fun add-ons and sneak previews of new products in development. Select the toggle switch next to the beta feature you want to enable.

Alerts

Get notified when your runs crash, finish, or set custom alerts with wandb.Run.alert(). Receive notifications either through Email or Slack. Toggle the switch next to the event type you want to receive alerts from.

Runs finished: whether a Weights and Biases run successfully finished.
Run crashed: notification if a run has failed to finish.

For more information about how to set up and manage alerts, see Send alerts with wandb.Run.alert().
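
A custom alert might be triggered as in the sketch below, which calls run.alert() when a metric falls below a threshold. The threshold, the alert title and text, and the `accuracy_alert` and `maybe_alert` helpers are assumptions made for illustration.

```python
def accuracy_alert(acc, threshold):
    """Return (title, text) for an alert, or None when acc meets the threshold."""
    if acc >= threshold:
        return None
    return (
        "Low accuracy",
        f"Accuracy {acc:.2f} is below the threshold {threshold:.2f}",
    )


def maybe_alert(run, acc, threshold=0.3):
    """Send a W&B alert from an active run if accuracy is too low."""
    import wandb  # lazy import; accuracy_alert has no wandb dependency

    message = accuracy_alert(acc, threshold)
    if message is not None:
        title, text = message
        run.alert(title=title, text=text, level=wandb.AlertLevel.WARN)
```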

Personal GitHub integration

Connect a personal GitHub account. To connect a GitHub account:

Select the Connect GitHub button. This will redirect you to an open authorization (OAuth) page.
Select the organization to grant access in the Organization access section.
Select Authorize wandb.

Delete your account

Select the Delete Account button to delete your account.

Account deletion cannot be reversed.

Storage

The Storage section describes the total storage that your account has consumed on the Weights and Biases servers. The default storage plan is 100GB. For more information about storage and pricing, see the Pricing page.

2.4.5.2 - Manage billing settings

Manage your organization’s billing settings

Navigate to your user profile page and select your user icon on the top right corner. From the dropdown, choose Billing, or choose Settings and then select the Billing tab.

Plan details

The Plan details section summarizes your organization’s current plan, charges, limits, and usage.

For details and a list of users, click Manage users.
For details about usage, click View usage.
Amount of storage your organization uses, both free and paid. From here, you can purchase additional storage and manage storage that is currently in use. Learn more about storage settings.

From here, you can compare plans or talk to Sales.

Plan usage

This section visually summarizes current usage and displays upcoming usage charges. For detailed insights into usage by month, click View usage on an individual tile. To export usage by calendar month, team, or project, click Export CSV.

Usage alerts

Usage alerts are not available on the Enterprise plan.

For organizations on paid plans, admins receive alerts via email once per billing period when certain thresholds are met, along with details about how to increase your organization’s limits if you are a billing admin and how to contact a billing admin otherwise. On the Pro plan, only the billing admin receives usage alerts.

These alerts are not configurable, and are sent when:

Your organization is approaching a monthly limit of a category of usage (85% of hours used) and when it reaches 100% of the limit, according to your plan.
Your organization’s accumulated average charges for a billing period exceed these thresholds: $200, $450, $700, and $1000. These overage charges are incurred when your organization accumulates more usage than your plan includes for tracked hours, storage, or W&B Weave data ingestion.

For questions about usage or billing, contact your account team or Support.

Payment methods

This section shows the payment methods on file for your organization. If you have not added a payment method, you will be prompted to do so when you upgrade your plan or add paid storage.

Billing admin

This section shows the current billing admin. The billing admin is an organization admin, receives all billing-related emails, and can view and manage payment methods.

In W&B Dedicated Cloud, multiple users can be billing admins. In W&B Multi-tenant Cloud, only one user at a time can be the billing admin.

To change the billing admin or assign the role to additional users:

Click Manage roles.
Search for a user.
Click the Billing admin field in that user’s row.
Read the summary, then click Change billing user.

Invoices

If you pay using a credit card, this section allows you to view monthly invoices.

For Enterprise accounts that pay via wire transfer, this section is blank. For questions, contact your account team.
If your organization incurs no charges, no invoice is generated.

2.4.5.3 - Manage team settings

Manage a team’s members, avatar, alerts, and privacy settings with the Team Settings page.

Team settings

Change your team’s settings, including members, avatar, alerts, privacy, and usage. Organization admins and team admins can view and edit a team’s settings.

Only Administration account types can change team settings or remove a member from a team.

Members

The Members section shows a list of all pending invitations and the members who have accepted the invitation to join the team. Each entry displays the member’s name, username, email, and team role, as well as their access privileges to Models and W&B Weave, which are inherited from the organization. You can choose from the standard team roles Admin, Member, and View-only. If your organization has created custom roles, you can assign a custom role instead.

See Add and Manage teams for information on how to create a team, manage teams, and manage team membership and roles. To configure who can invite new members and configure other privacy settings for the team, refer to Privacy.

Avatar

Set an avatar by navigating to the Avatar section and uploading an image.

Select Update Avatar to open a file dialog.
From the file dialog, choose the image you want to use.

Alerts

Notify your team when runs crash, finish, or set custom alerts. Your team can receive alerts either through email or Slack.

Toggle the switch next to the event type you want to receive alerts from. Weights and Biases provides the following event type options by default:

Runs finished: whether a Weights and Biases run successfully finished.
Run crashed: if a run has failed to finish.

For more information about how to set up and manage alerts, see Send alerts with wandb.Run.alert().

Slack notifications

Configure Slack destinations where your team’s automations can send notifications when an event occurs in a Registry or a project, such as when a new artifact is created or when a run metric meets a defined threshold. Refer to Create a Slack automation.

This feature is available for all Enterprise licenses.

Webhooks

Configure webhooks that your team’s automations can run when an event occurs in a Registry or a project, such as when a new artifact is created or when a run metric meets a defined threshold. Refer to Create a webhook automation.

This feature is available for all Enterprise licenses.

Privacy

Navigate to the Privacy section to change privacy settings. Only organization admins can modify privacy settings.

Turn off the ability to make future projects public or to share reports publicly.
Allow any team member to invite other members, rather than only team admins.
Manage whether code saving is turned on by default.

Usage

The Usage section describes the total memory usage the team has consumed on the Weights and Biases servers. The default storage plan is 100GB. For more information about storage and pricing, see the Pricing page.

Storage

The Storage section describes the cloud storage bucket configuration that is being used for the team’s data. For more information, see Secure Storage Connector or check out our W&B Server docs if you are self-hosting.

2.4.5.4 - Manage email settings

Manage emails from the Settings page.

Add, delete, manage email types and primary email addresses in your W&B Profile Settings page. Select your profile icon in the upper right corner of the W&B dashboard. From the dropdown, select Settings. Within the Settings page, scroll down to the Emails dashboard:

Manage primary email

The primary email is marked with a 😎 emoji. The primary email is automatically defined with the email you provided when you created a W&B account.

Select the kebab dropdown to change the primary email associated with your Weights And Biases account:

Only verified emails can be set as primary

Add emails

Select + Add Email to add an email. This will take you to an Auth0 page. You can enter in the credentials for the new email or connect using single sign-on (SSO).

Delete emails

Select the kebab dropdown and choose Delete Emails to delete an email that is registered to your W&B account.

Primary emails cannot be deleted. You need to set a different email as a primary email before deleting.

Log in methods

The Log in Methods column displays the log in methods that are associated with your account.

A verification email is sent to your email account when you create a W&B account. Your email account is considered unverified until you verify your email address. Unverified emails are displayed in red.

If you no longer have the original verification email, attempt to log in with your email address again to receive a new one.

Contact support@wandb.com for account log in issues.

2.4.5.5 - Manage teams

Collaborate with your colleagues, share results, and track all the experiments across your team.

Use W&B Teams as a central workspace for your ML team to build better models faster.

Track all the experiments your team has tried so you never duplicate work.
Save and reproduce previously trained models.
Share progress and results with your boss and collaborators.
Catch regressions and immediately get alerted when performance drops.
Benchmark model performance and compare model versions.

Create a collaborative team

Sign up or log in to your free W&B account.
Click Invite Team in the navigation bar.
Create your team and invite collaborators.
To configure your team, refer to Manage team settings.

Note: Only the admin of an organization can create a new team.

Create a team profile

You can customize your team’s profile page to show an introduction and showcase reports and projects that are visible to the public or team members. Present reports, projects, and external links.

Highlight your best research to visitors by showcasing your best public reports
Showcase the most active projects to make it easier for teammates to find them
Find collaborators by adding external links to your company or research lab’s website and any papers you’ve published

Remove team members

Team admins can open the team settings page and click the delete button next to the departing member’s name. Any runs logged to the team remain after a user leaves.

Manage team roles and permissions

Select a team role when you invite colleagues to join a team. The following team role options are available:

Admin: Team admins can add and remove other admins or team members. They have permissions to modify all projects and full deletion permissions. This includes, but is not limited to, deleting runs, projects, artifacts, and sweeps.
Member: A regular member of the team. By default, only an admin can invite a team member. To change this behavior, refer to Manage team settings.

A team member can delete only runs they created. Suppose you have two members A and B. Member B moves a run from team B’s project to a different project owned by Member A. Member A cannot delete the run Member B moved to Member A’s project. An admin can manage runs and sweep runs created by any team member.

View-Only (Enterprise-only feature): View-Only members can view assets within the team such as runs, reports, and workspaces. They can follow and comment on reports, but they cannot create, edit, or delete project overviews, reports, or runs.
Custom roles (Enterprise-only feature): Custom roles allow organization admins to compose new roles based on either the View-Only or Member role, together with additional permissions to achieve fine-grained access control. Team admins can then assign any of those custom roles to users in their respective teams. Refer to Introducing Custom Roles for W&B Teams for details.
Service accounts (Enterprise-only feature): Refer to Use service accounts to automate workflows.

W&B recommends having more than one admin in a team. This ensures that admin operations can continue when the primary admin is not available.

Team settings

Team settings allow you to manage the settings for your team and its members. With these privileges, you can effectively oversee and organize your team within W&B.

Permissions	View-Only	Team Member	Team Admin
Add team members			X
Remove team members			X
Manage team settings			X

Registry

The following table lists permissions that apply to all projects across a given team.

Permissions	View-Only	Team Member	Registry Admin	Team Admin
Add aliases		X	X	X
Add models to the registry		X	X	X
View models in the registry	X	X	X	X
Download models	X	X	X	X
Add or remove Registry Admins			X	X
Add or remove Protected Aliases			X

For more details about protected aliases, refer to Registry Access Controls.

Reports

Report permissions grant access to create, view, and edit reports. The following table lists permissions that apply to all reports across a given team.

Permissions	View-Only	Team Member	Team Admin
View reports	X	X	X
Create reports		X	X
Edit reports		X (team members can only edit their own reports)	X
Delete reports		X (team members can only delete their own reports)	X

Experiments

The following table lists permissions that apply to all experiments across a given team.

Permissions	View-Only	Team Member	Team Admin
View experiment metadata (includes history metrics, system metrics, files, and logs)	X	X	X
Edit experiment panels and workspaces		X	X
Log experiments		X	X
Delete experiments		X (team members can only delete experiments they created)	X
Stop experiments		X (team members can only stop experiments they created)	X

Artifacts

The following table lists permissions that apply to all artifacts across a given team.

Permissions	View-Only	Team Member	Team Admin
View artifacts	X	X	X
Create artifacts		X	X
Delete artifacts		X	X
Edit metadata		X	X
Edit aliases		X	X
Delete aliases		X	X
Download artifact		X	X

System settings (W&B Server only)

Use system permissions to create and manage teams and their members and to adjust system settings. These privileges enable you to effectively administer and maintain the W&B instance.

Permissions	View-Only	Team Member	Team Admin	System Admin
Configure system settings				X
Create/delete teams				X

Team service account behavior

When you configure a team in your training environment, you can use a service account from that team to log runs in either private or public projects within that team. Additionally, you can attribute those runs to a user if the WANDB_USERNAME or WANDB_USER_EMAIL variable exists in your environment and the referenced user is part of that team.
When you do not configure a team in your training environment and use a service account, the runs log to the named project within that service account’s parent team. In this case as well, you can attribute the runs to a user if the WANDB_USERNAME or WANDB_USER_EMAIL variable exists in your environment and the referenced user is part of the service account’s parent team.
A service account cannot log runs to a private project in a team different from its parent team. It can log runs to a project outside its parent team only if that project is set to Open project visibility.
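
The attribution behavior described above depends on environment variables being present before the run starts. The following sketch shows one way to set them from Python. The helper name `attribution_env` and the user name `example-user` are hypothetical; only the variable names WANDB_USERNAME and WANDB_USER_EMAIL come from the text above.

```python
import os


def attribution_env(username=None, email=None):
    """Build the environment variables that attribute runs to a user."""
    env = {}
    if username:
        env["WANDB_USERNAME"] = username
    if email:
        env["WANDB_USER_EMAIL"] = email
    return env


# Hypothetical user: apply the variables before calling wandb.init()
# in a job that authenticates with a service account.
os.environ.update(attribution_env(username="example-user"))
```

The referenced user still needs to be a member of the relevant team for the attribution to take effect.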

Team trials

See the pricing page for more information on W&B plans. You can download all your data at any time, either using the dashboard UI or the Export API.

Privacy settings

You can see the privacy settings of all team projects on the team settings page: app.wandb.ai/teams/your-team-name

Advanced configuration

Secure storage connector

The team-level secure storage connector allows teams to use their own cloud storage bucket with W&B. This provides greater data access control and data isolation for teams with highly sensitive data or strict compliance requirements. Refer to Secure Storage Connector for more information.

2.4.5.6 - Manage storage

Ways to manage W&B data storage.

If you are approaching or exceeding your storage limit, there are multiple paths forward to manage your data. The path that’s best for you will depend on your account type and your current project setup.

Manage storage consumption

W&B offers different methods of optimizing your storage consumption:

Use reference artifacts to track files saved outside the W&B system, instead of uploading them to W&B storage.
Use an external cloud storage bucket for storage. (Enterprise only)

Delete data

You can also choose to delete data to remain under your storage limit. There are several ways to do this:

Delete data interactively with the app UI.
Set a TTL policy on Artifacts so they are automatically deleted.
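
As a minimal sketch of the TTL option, the helper below sets a time-to-live on an artifact object through its `ttl` attribute. The helper name `set_ttl` and the 30-day value in the usage note are assumptions for illustration.

```python
from datetime import timedelta


def set_ttl(artifact, days):
    """Schedule `artifact` for automatic deletion after `days` days."""
    artifact.ttl = timedelta(days=days)
    return artifact
```

For example, you might call `set_ttl(artifact, 30)` on a `wandb.Artifact` object before logging it with `run.log_artifact(artifact)`.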

2.4.5.7 - System metrics

Metrics automatically logged by W&B.

This page provides detailed information about the system metrics that are tracked by the W&B SDK.

wandb automatically logs system metrics every 15 seconds.

CPU

Process CPU Percent (CPU)

Percentage of CPU usage by the process, normalized by the number of available CPUs.

W&B assigns a cpu tag to this metric.

Process CPU Threads

The number of threads utilized by the process.

W&B assigns a proc.cpu.threads tag to this metric.

Disk

By default, the usage metrics are collected for the / path. To configure the paths to be monitored, use the following setting:

import wandb

run = wandb.init(
    settings=wandb.Settings(
        x_stats_disk_paths=("/System/Volumes/Data", "/home", "/mnt/data"),
    ),
)

Disk Usage Percent

Represents the total system disk usage in percentage for specified paths.

W&B assigns a disk.{path}.usagePercent tag to this metric.

Disk Usage

Represents the total system disk usage in gigabytes (GB) for specified paths. The paths that are accessible are sampled, and the disk usage (in GB) for each path is appended to the samples.

W&B assigns a disk.{path}.usageGB tag to this metric.

Disk In

Indicates the total system disk read in megabytes (MB). The initial disk read bytes are recorded when the first sample is taken. Subsequent samples calculate the difference between the current read bytes and the initial value.

W&B assigns a disk.in tag to this metric.

Disk Out

Represents the total system disk write in megabytes (MB). Similar to Disk In, the initial disk write bytes are recorded when the first sample is taken. Subsequent samples calculate the difference between the current write bytes and the initial value.

W&B assigns a disk.out tag to this metric.
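
The Disk In and Disk Out values are differences from a baseline rather than raw counters. The sketch below models that calculation; it is not the SDK's implementation. The class name `CumulativeDelta` is hypothetical, and it assumes that one megabyte is 1024 * 1024 bytes.

```python
class CumulativeDelta:
    """Report a byte counter as the change since the first sample, in MB."""

    BYTES_PER_MB = 1024 * 1024  # assumption: binary megabytes

    def __init__(self):
        self._initial = None

    def sample(self, current_bytes):
        if self._initial is None:
            # The first sample becomes the baseline, so it reports 0 MB.
            self._initial = current_bytes
        return (current_bytes - self._initial) / self.BYTES_PER_MB
```

With this model, a process that starts sampling when the counter is at 10 MB and later reads 15 MB reports 0 MB and then 5 MB.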

Memory

Process Memory RSS

Represents the Memory Resident Set Size (RSS) in megabytes (MB) for the process. RSS is the portion of memory occupied by a process that is held in main memory (RAM).

W&B assigns a proc.memory.rssMB tag to this metric.

Process Memory Percent

Indicates the memory usage of the process as a percentage of the total available memory.

W&B assigns a proc.memory.percent tag to this metric.

Memory Percent

Represents the total system memory usage as a percentage of the total available memory.

W&B assigns a memory_percent tag to this metric.

Memory Available

Indicates the total available system memory in megabytes (MB).

W&B assigns a proc.memory.availableMB tag to this metric.

Network

Network Sent

Represents the total bytes sent over the network. The initial bytes sent are recorded when the metric is first initialized. Subsequent samples calculate the difference between the current bytes sent and the initial value.

W&B assigns a network.sent tag to this metric.

Network Received

Indicates the total bytes received over the network. Similar to Network Sent, the initial bytes received are recorded when the metric is first initialized. Subsequent samples calculate the difference between the current bytes received and the initial value.

W&B assigns a network.recv tag to this metric.

NVIDIA GPU

In addition to the metrics described below, if the process and/or its descendants use a particular GPU, W&B captures the corresponding metrics as gpu.process.{gpu_index}.{metric_name}

GPU Memory Utilization

Represents the GPU memory utilization in percent for each GPU.

W&B assigns a gpu.{gpu_index}.memory tag to this metric.

GPU Memory Allocated

Indicates the GPU memory allocated as a percentage of the total available memory for each GPU.

W&B assigns a gpu.{gpu_index}.memoryAllocated tag to this metric.

GPU Memory Allocated Bytes

Specifies the GPU memory allocated in bytes for each GPU.

W&B assigns a gpu.{gpu_index}.memoryAllocatedBytes tag to this metric.

GPU Utilization

Reflects the GPU utilization in percent for each GPU.

W&B assigns a gpu.{gpu_index}.gpu tag to this metric.

GPU Temperature

The GPU temperature in Celsius for each GPU.

W&B assigns a gpu.{gpu_index}.temp tag to this metric.

GPU Power Usage Watts

Indicates the GPU power usage in Watts for each GPU.

W&B assigns a gpu.{gpu_index}.powerWatts tag to this metric.

GPU Power Usage Percent

Reflects the GPU power usage as a percentage of its power capacity for each GPU.

W&B assigns a gpu.{gpu_index}.powerPercent tag to this metric.

GPU SM Clock Speed

Represents the clock speed of the Streaming Multiprocessor (SM) on the GPU in MHz. This metric is indicative of the processing speed within the GPU cores responsible for computation tasks.

W&B assigns a gpu.{gpu_index}.smClock tag to this metric.

GPU Memory Clock Speed

Represents the clock speed of the GPU memory in MHz, which influences the rate of data transfer between the GPU memory and processing cores.

W&B assigns a gpu.{gpu_index}.memoryClock tag to this metric.

GPU Graphics Clock Speed

Represents the base clock speed for graphics rendering operations on the GPU, expressed in MHz. This metric often reflects performance during visualization or rendering tasks.

W&B assigns a gpu.{gpu_index}.graphicsClock tag to this metric.

GPU Corrected Memory Errors

Tracks the count of memory errors on the GPU that are automatically corrected by error-checking protocols, indicating recoverable hardware issues.

W&B assigns a gpu.{gpu_index}.correctedMemoryErrors tag to this metric.

GPU Uncorrected Memory Errors

Tracks the count of memory errors on the GPU that remain uncorrected, indicating non-recoverable errors which can impact processing reliability.

W&B assigns a gpu.{gpu_index}.unCorrectedMemoryErrors tag to this metric.

GPU Encoder Utilization

Represents the percentage utilization of the GPU’s video encoder, indicating its load when encoding tasks (for example, video rendering) are running.

W&B assigns a gpu.{gpu_index}.encoderUtilization tag to this metric.

AMD GPU

W&B extracts metrics from the output of the rocm-smi tool supplied by AMD (rocm-smi -a --json).

ROCm 6.x (latest) and 5.x formats are supported. Learn more about ROCm formats in the AMD ROCm documentation. The newer format includes more details.

AMD GPU Utilization

Represents the GPU utilization in percent for each AMD GPU device.

W&B assigns a gpu.{gpu_index}.gpu tag to this metric.

AMD GPU Memory Allocated

Indicates the GPU memory allocated as a percentage of the total available memory for each AMD GPU device.

W&B assigns a gpu.{gpu_index}.memoryAllocated tag to this metric.

AMD GPU Temperature

The GPU temperature in Celsius for each AMD GPU device.

W&B assigns a gpu.{gpu_index}.temp tag to this metric.

AMD GPU Power Usage Watts

The GPU power usage in Watts for each AMD GPU device.

W&B assigns a gpu.{gpu_index}.powerWatts tag to this metric.

AMD GPU Power Usage Percent

Reflects the GPU power usage as a percentage of its power capacity for each AMD GPU device.

W&B assigns a gpu.{gpu_index}.powerPercent tag to this metric.

Apple ARM Mac GPU

Apple GPU Utilization

Indicates the GPU utilization in percent for Apple GPU devices, specifically on ARM Macs.

W&B assigns a gpu.0.gpu tag to this metric.

Apple GPU Memory Allocated

The GPU memory allocated as a percentage of the total available memory for Apple GPU devices on ARM Macs.

W&B assigns a gpu.0.memoryAllocated tag to this metric.

Apple GPU Temperature

The GPU temperature in Celsius for Apple GPU devices on ARM Macs.

W&B assigns a gpu.0.temp tag to this metric.

Apple GPU Power Usage Watts

The GPU power usage in Watts for Apple GPU devices on ARM Macs.

W&B assigns a gpu.0.powerWatts tag to this metric.

Apple GPU Power Usage Percent

The GPU power usage as a percentage of its power capacity for Apple GPU devices on ARM Macs.

W&B assigns a gpu.0.powerPercent tag to this metric.

Graphcore IPU

Graphcore IPUs (Intelligence Processing Units) are unique hardware accelerators designed specifically for machine intelligence tasks.

IPU Device Metrics

These metrics represent various statistics for a specific IPU device. Each metric has a device ID (device_id) and a metric key (metric_key) to identify it. W&B assigns an ipu.{device_id}.{metric_key} tag to this metric.

Metrics are extracted using the proprietary gcipuinfo library, which interacts with Graphcore’s gcipuinfo binary. The sample method fetches these metrics for each IPU device associated with the process ID (pid). Only the metrics that change over time, or the first time a device’s metrics are fetched, are logged to avoid logging redundant data.

For each metric, the method parse_metric is used to extract the metric’s value from its raw string representation. The metrics are then aggregated across multiple samples using the aggregate method.

The following lists available metrics and their units:

Average Board Temperature (average board temp (C)): Temperature of the IPU board in Celsius.
Average Die Temperature (average die temp (C)): Temperature of the IPU die in Celsius.
Clock Speed (clock (MHz)): The clock speed of the IPU in MHz.
IPU Power (ipu power (W)): Power consumption of the IPU in Watts.
IPU Utilization (ipu utilisation (%)): Percentage of IPU utilization.
IPU Session Utilization (ipu utilisation (session) (%)): IPU utilization percentage specific to the current session.
Data Link Speed (speed (GT/s)): Speed of data transmission in Giga-transfers per second.
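
The two ideas described above, extracting a numeric value from a raw string and logging only values that changed, can be sketched as follows. This is an illustration under assumptions and not the SDK's code: the function names `parse_metric_value` and `changed_metrics` and the raw string formats are hypothetical.

```python
import re

# Matches the first integer or decimal number in a string.
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def parse_metric_value(raw):
    """Return the first number in a raw string such as '35.5 C', or None."""
    match = _NUMBER.search(raw)
    return float(match.group()) if match else None


def changed_metrics(previous, current):
    """Keep entries of `current` that are new or differ from `previous`."""
    return {key: value for key, value in current.items() if previous.get(key) != value}
```

On the first sample `previous` is empty, so every metric is kept. On later samples only the metrics whose values moved are returned.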

Google Cloud TPU

Tensor Processing Units (TPUs) are Google’s custom-developed ASICs (Application Specific Integrated Circuits) used to accelerate machine learning workloads.

TPU Memory usage

The current High Bandwidth Memory usage in bytes per TPU core.

W&B assigns a tpu.{tpu_index}.memoryUsageBytes tag to this metric.

TPU Memory usage percentage

The current High Bandwidth Memory usage in percent per TPU core.

W&B assigns a tpu.{tpu_index}.memoryUsageBytes tag to this metric.

TPU Duty cycle

TensorCore duty cycle percentage per TPU device. Tracks the percentage of time over the sample period during which the accelerator TensorCore was actively processing. A larger value means better TensorCore utilization.

W&B assigns a tpu.{tpu_index}.dutyCycle tag to this metric.

AWS Trainium

AWS Trainium is a specialized hardware platform offered by AWS that focuses on accelerating machine learning workloads. The neuron-monitor tool from AWS is used to capture the AWS Trainium metrics.

Trainium Neuron Core Utilization

The utilization percentage of each NeuronCore, reported on a per-core basis.

W&B assigns a trn.{core_index}.neuroncore_utilization tag to this metric.

Trainium Host Memory Usage, Total

The total memory consumption on the host in bytes.

W&B assigns a trn.host_total_memory_usage tag to this metric.

Trainium Neuron Device Total Memory Usage

The total memory usage on the Neuron device in bytes.

W&B assigns a trn.neuron_device_total_memory_usage tag to this metric.

Trainium Host Memory Usage Breakdown

The following is a breakdown of memory usage on the host:

Application Memory (trn.host_total_memory_usage.application_memory): Memory used by the application.
Constants (trn.host_total_memory_usage.constants): Memory used for constants.
DMA Buffers (trn.host_total_memory_usage.dma_buffers): Memory used for Direct Memory Access buffers.
Tensors (trn.host_total_memory_usage.tensors): Memory used for tensors.

Trainium Neuron Core Memory Usage Breakdown

Detailed memory usage information for each NeuronCore:

Constants (trn.{core_index}.neuroncore_memory_usage.constants)
Model Code (trn.{core_index}.neuroncore_memory_usage.model_code)
Model Shared Scratchpad (trn.{core_index}.neuroncore_memory_usage.model_shared_scratchpad)
Runtime Memory (trn.{core_index}.neuroncore_memory_usage.runtime_memory)
Tensors (trn.{core_index}.neuroncore_memory_usage.tensors)

OpenMetrics

Capture and log metrics from external endpoints that expose OpenMetrics / Prometheus-compatible data with support for custom regex-based metric filters to be applied to the consumed endpoints.

Refer to Monitoring GPU cluster performance in W&B for a detailed example of how to use this feature in a particular case of monitoring GPU cluster performance with the NVIDIA DCGM-Exporter.
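
To make the idea of regex-based metric filters concrete, the sketch below filters a Prometheus-style text payload by metric name. It illustrates the concept only and is not how W&B implements the feature. The function name `filter_metrics` and the sample values are made up, and the parser assumes each sample line has the form `name{labels} value` with no trailing timestamp.

```python
import re

# Made-up sample payload in the Prometheus text exposition format.
SAMPLE = """\
# HELP DCGM_FI_DEV_GPU_TEMP GPU temperature (in C).
# TYPE DCGM_FI_DEV_GPU_TEMP gauge
DCGM_FI_DEV_GPU_TEMP{gpu="0"} 41
DCGM_FI_DEV_POWER_USAGE{gpu="0"} 62.5
"""


def filter_metrics(exposition_text, patterns):
    """Return {name_with_labels: value} for samples matching any regex."""
    compiled = [re.compile(pattern) for pattern in patterns]
    kept = {}
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        name, _, value = line.rpartition(" ")
        if any(regex.search(name) for regex in compiled):
            kept[name] = float(value)
    return kept
```

Calling `filter_metrics(SAMPLE, [r"GPU_TEMP"])` keeps only the temperature sample and drops the power sample.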

2.4.5.8 - Anonymous mode

Log and visualize data without a W&B account

Are you publishing code that you want anyone to be able to run easily? Use anonymous mode to let someone run your code, see a W&B dashboard, and visualize results without needing to create a W&B account first.

Allow results to be logged in anonymous mode with:

import wandb

wandb.init(anonymous="allow")

For example, the following code snippet shows how to create and log an artifact with W&B:

import wandb

run = wandb.init(anonymous="allow")

artifact = wandb.Artifact(name="art1", type="foo")
artifact.add_file(local_path="path/to/file")
run.log_artifact(artifact)

run.finish()

Try the example notebook to see how anonymous mode works.

3 - W&B Weave

Are you looking for the official Weave documentation? Head over to weave-docs.wandb.ai.

W&B Weave is a framework for tracking, experimenting with, evaluating, deploying, and improving LLM-based applications. Designed for flexibility and scalability, Weave supports every stage of your LLM application development workflow:

Tracing & Monitoring: Track LLM calls and application logic to debug and analyze production systems.
Systematic Iteration: Refine and iterate on prompts, datasets and models.
Experimentation: Experiment with different models and prompts in the LLM Playground.
Evaluation: Use custom or pre-built scorers alongside our comparison tools to systematically assess and enhance application performance.
Guardrails: Protect your application with safeguards for content moderation, prompt safety, and more.

Get started with Weave

Are you new to Weave? Set up and start using Weave with the Python quickstart or TypeScript quickstart.

Advanced guides

Learn more about advanced topics:

Integrations: Use Weave with popular LLM providers, local models, frameworks, and third-party services.
Cookbooks: Build with Weave using Python and TypeScript. Tutorials are available as interactive notebooks.
W&B AI Academy: Build advanced RAG systems, improve LLM prompting, fine-tune LLMs, and more.
Weave Python SDK
Weave TypeScript SDK
Weave Service API

4 - W&B Core

W&B Core is the foundational framework supporting W&B Models and W&B Weave, and is itself supported by the W&B Platform.

W&B Core provides capabilities across the entire ML lifecycle. With W&B Core, you can:

Version and manage ML pipelines with full lineage tracing for easy auditing and reproducibility.
Explore and evaluate data and metrics using interactive, configurable visualizations.
Document and share insights across the entire organization by generating live reports in digestible, visual formats that are easily understood by non-technical stakeholders.
Query and create visualizations of your data that serve your custom needs.
Protect sensitive strings using secrets.
Configure automations that trigger key workflows for model CI/CD.

4.1 - Artifacts

Overview of W&B Artifacts, how they work, and how to get started using them.

Use W&B Artifacts to track and version data as the inputs and outputs of your W&B Runs. For example, a model training run might take in a dataset as input and produce a trained model as output. You can log hyperparameters, metadata, and metrics to a run, and you can use an artifact to log, track, and version the dataset used to train the model as input and another artifact for the resulting model checkpoints as output.

Use cases

You can use artifacts throughout your entire ML workflow as inputs and outputs of runs. You can use datasets, models, or even other artifacts as inputs for processing.

Use Case	Input	Output
Model Training	Dataset (training and validation data)	Trained Model
Dataset Pre-Processing	Dataset (raw data)	Dataset (pre-processed data)
Model Evaluation	Model + Dataset (test data)	W&B Table
Model Optimization	Model	Optimized Model

The following code snippets are meant to be run in order.

Create an artifact

Create an artifact with four lines of code:

Create a W&B run.
Create an artifact object with the wandb.Artifact API.
Add one or more files, such as a model file or dataset, to your artifact object.
Log your artifact to W&B.

For example, the following code snippet shows how to log a file called dataset.h5 to an artifact called example_artifact:

import wandb

run = wandb.init(project="artifacts-example", job_type="add-dataset")
artifact = wandb.Artifact(name="example_artifact", type="dataset")
artifact.add_file(local_path="./dataset.h5", name="training_dataset")
artifact.save()

# Logs the artifact "example_artifact" as a dataset with data from dataset.h5

The type of the artifact affects how it appears in the W&B platform. If you do not specify a type, it defaults to unspecified.
Each label of the dropdown represents a different type parameter value. In the above code snippet, the artifact’s type is dataset.

See the track external files page for information on how to add references to files or directories stored in external object storage, like an Amazon S3 bucket.

Download an artifact

Indicate the artifact you want to mark as input to your run with the use_artifact method.

Following the preceding code snippet, this next code block shows how to use the example_artifact artifact:

artifact = run.use_artifact(
    "example_artifact:latest"
)  # marks the latest version of "example_artifact" as an input to the run

This returns an artifact object.

Next, use the returned object to download all contents of the artifact:

datadir = (
    artifact.download()
)  # downloads the full `example_artifact` artifact to the default directory.

You can pass a custom path into the root parameter to download an artifact to a specific directory. For alternate ways to download artifacts and to see additional parameters, see the guide on downloading and using artifacts.
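
As a small sketch of the root parameter, the helper below downloads an artifact into a chosen directory and lists what arrived. The helper name `download_and_list` and the directory in the usage note are assumptions for illustration; `artifact` is expected to be the object returned by `run.use_artifact()`.

```python
import os


def download_and_list(artifact, directory):
    """Download the artifact into `directory` and return its top-level entries."""
    local_dir = artifact.download(root=directory)
    return sorted(os.listdir(local_dir))
```

For example, `download_and_list(artifact, "./data")` places the artifact's contents under `./data` instead of the default directory.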

Next steps

Learn how to version and update artifacts.
Learn how to trigger downstream workflows or notify a Slack channel in response to changes to your artifacts with automations.
Learn about the registry, a space that houses trained models.
Explore the Python SDK and CLI reference guides.

4.1.1 - Create an artifact

Create and construct a W&B Artifact. Learn how to add one or more files or a URI reference to an Artifact.

Use the W&B Python SDK to construct artifacts from W&B Runs. You can add files, directories, URIs, and files from parallel runs to artifacts. After you add a file to an artifact, save the artifact to the W&B Server or your own private server.

For information on how to track external files, such as files stored in Amazon S3, see the Track external files page.

How to construct an artifact

Construct a W&B Artifact in three steps:

1. Create an artifact Python object with `wandb.Artifact()`

Initialize the wandb.Artifact() class to create an artifact object. Specify the following parameters:

Name: Specify a name for your artifact. The name should be unique, descriptive, and easy to remember. Use an artifact's name both to identify the artifact in the W&B App UI and to refer to it when you want to use that artifact.
Type: Provide a type. The type should be simple, descriptive and correspond to a single step of your machine learning pipeline. Common artifact types include 'dataset' or 'model'.

The “name” and “type” you provide are used to create a directed acyclic graph. This means you can view the lineage of an artifact on the W&B App.

See Explore and traverse artifact graphs for more information.

Artifacts cannot have the same name, even if you specify a different type for the type parameter. In other words, you cannot create an artifact named cats of type dataset and another artifact with the same name of type model.

You can optionally provide a description and metadata when you initialize an artifact object. For more information on available attributes and parameters, see the wandb.Artifact Class definition in the Python SDK Reference Guide.
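
The optional description and metadata can be passed alongside the name and type. The sketch below only assembles the keyword arguments so that it runs without the SDK installed. The helper name `dataset_artifact_kwargs`, the artifact name, and the metadata keys are hypothetical.

```python
def dataset_artifact_kwargs(name, description, metadata):
    """Collect keyword arguments for wandb.Artifact(), including the optional ones."""
    return {
        "name": name,
        "type": "dataset",
        "description": description,
        "metadata": dict(metadata),  # copy so later edits do not affect the caller
    }


kwargs = dataset_artifact_kwargs(
    name="bike-dataset",  # hypothetical artifact name
    description="Raw ride logs before pre-processing",
    metadata={"rows": 10_000, "source": "internal-export"},
)
# With the SDK installed: artifact = wandb.Artifact(**kwargs)
```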

The following example demonstrates how to create a dataset artifact:

import wandb

artifact = wandb.Artifact(name="<replace>", type="<replace>")

Replace the string arguments in the preceding code snippet with your own name and type.

2. Add one or more files to the artifact

Add files, directories, external URI references (such as Amazon S3) and more with artifact methods. For example, to add a single text file, use the add_file method:

artifact.add_file(local_path="hello_world.txt", name="optional-name")

You can also add multiple files with the add_dir method. For more ways to add files, see Update an artifact.

3. Save your artifact to the W&B server

Finally, save your artifact to the W&B server. Artifacts are associated with a run. Therefore, use a run object's log_artifact() method to save the artifact.

# Create a W&B Run. Replace 'job-type'.
run = wandb.init(project="artifacts-example", job_type="job-type")

run.log_artifact(artifact)

You can optionally construct an artifact outside of a W&B run. For more information, see Track external files.

Calls to log_artifact are performed asynchronously for performant uploads. This can cause surprising behavior when logging artifacts in a loop. For example:

for i in range(10):
    a = wandb.Artifact(
        "race",
        type="dataset",
        metadata={
            "index": i,
        },
    )
    # ... add files to artifact a ...
    run.log_artifact(a)

The artifact version v0 is NOT guaranteed to have an index of 0 in its metadata, as the artifacts may be logged in an arbitrary order.
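
One way to make the version order predictable, at the cost of slower uploads, is to wait for each artifact to finish logging before starting the next. The sketch below assumes `run` is the object returned by `wandb.init()` and that each item is a `wandb.Artifact`; the helper name `log_in_order` is hypothetical.

```python
def log_in_order(run, artifacts):
    """Log artifacts one at a time, blocking until each one finishes."""
    for artifact in artifacts:
        logged = run.log_artifact(artifact)
        logged.wait()  # block until this artifact is logged before continuing
```

Because each call blocks, the first artifact in the list is logged before the second one starts. This gives up the throughput benefit of asynchronous uploads.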

Add files to an artifact

The following sections demonstrate how to construct artifacts with different file types and from parallel runs.

For the following examples, assume you have a project directory with multiple files and a directory structure:

project-directory
|-- images
|   |-- cat.png
|   +-- dog.png
|-- checkpoints
|   +-- model.h5
+-- model.h5

Add a single file

The following code snippet demonstrates how to add a single, local file to your artifact:

# Add a single file
artifact.add_file(local_path="path/file.format")

For example, suppose you have a file called file.txt in a local directory called path:

artifact.add_file("path/file.txt")  # Added as `file.txt`

The artifact now has the following content:

file.txt

Optionally, pass the desired path within the artifact for the name parameter.

artifact.add_file(local_path="path/file.format", name="new/path/file.format")

The artifact is stored as:

new/path/file.format

API Call	Resulting artifact
`artifact.add_file('model.h5')`	model.h5
`artifact.add_file('checkpoints/model.h5')`	model.h5
`artifact.add_file('model.h5', name='models/mymodel.h5')`	models/mymodel.h5
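
The naming rule summarized in the table above can be modeled with a short function: without a name, the file keeps only its base name, and with a name, that value becomes the path inside the artifact. This is a model of the documented behavior and not the SDK's implementation; the function name `entry_name` is hypothetical.

```python
import posixpath


def entry_name(local_path, name=None):
    """Return the path a file gets inside the artifact, per the table above."""
    return name if name is not None else posixpath.basename(local_path)
```

Note that `model.h5` and `checkpoints/model.h5` both map to `model.h5`, so pass an explicit name if you need both files in the same artifact.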

Add multiple files

The following code snippet demonstrates how to add an entire local directory to your artifact:

# Recursively add a directory
artifact.add_dir(local_path="path/to/directory", name="optional-prefix")

The following API calls produce the following artifact content:

API Call	Resulting artifact
`artifact.add_dir('images')`	`cat.png` `dog.png`
`artifact.add_dir('images', name='images')`	`images/cat.png` `images/dog.png`
`artifact.new_file('hello.txt')`	`hello.txt`
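To preview which entry names a directory would produce before adding it, the mapping in the table above can be approximated locally. This is a sketch only. The function add_dir_entry_names is a hypothetical helper, and it assumes that entries are named by their path relative to the added directory, with the optional name used as a prefix.

```python
import os
import posixpath


def add_dir_entry_names(local_dir, name=None):
    """List the entry names expected from adding `local_dir` recursively.

    Hypothetical helper for illustration; it walks the local directory
    and does not call W&B.
    """
    names = []
    for root, _dirs, files in os.walk(local_dir):
        for filename in files:
            # Path of the file relative to the directory being added.
            rel = os.path.relpath(os.path.join(root, filename), local_dir)
            rel = rel.replace(os.sep, "/")
            # The optional name acts as a prefix inside the artifact.
            names.append(posixpath.join(name, rel) if name else rel)
    return sorted(names)
```

For the example project directory, add_dir_entry_names("images") lists cat.png and dog.png, and passing name="images" lists images/cat.png and images/dog.png.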

Add a URI reference

Artifacts track checksums and other information for reproducibility if the URI has a scheme that the W&B library knows how to handle.

Add an external URI reference to an artifact with the add_reference method. Replace the 'uri' string with your own URI. Optionally pass the desired path within the artifact for the name parameter.

# Add a URI reference
artifact.add_reference(uri="uri", name="optional-name")

Artifacts currently support the following URI schemes:

http(s)://: A path to a file accessible over HTTP. The artifact will track checksums in the form of etags and size metadata if the HTTP server supports the ETag and Content-Length response headers.
s3://: A path to an object or object prefix in S3. The artifact will track checksums and versioning information (if the bucket has object versioning enabled) for the referenced objects. Object prefixes are expanded to include the objects under the prefix, up to a maximum of 10,000 objects.
gs://: A path to an object or object prefix in GCS. The artifact will track checksums and versioning information (if the bucket has object versioning enabled) for the referenced objects. Object prefixes are expanded to include the objects under the prefix, up to a maximum of 10,000 objects.

The following API calls produce the following artifacts:

API call	Resulting artifact contents
`artifact.add_reference('s3://my-bucket/model.h5')`	`model.h5`
`artifact.add_reference('s3://my-bucket/checkpoints/model.h5')`	`model.h5`
`artifact.add_reference('s3://my-bucket/model.h5', name='models/mymodel.h5')`	`models/mymodel.h5`
`artifact.add_reference('s3://my-bucket/images')`	`cat.png` `dog.png`
`artifact.add_reference('s3://my-bucket/images', name='images')`	`images/cat.png` `images/dog.png`

Add files to artifacts from parallel runs

For large datasets or distributed training, multiple parallel runs might need to contribute to a single artifact.

import wandb
import time

# We will use ray to launch our runs in parallel
# for demonstration purposes. You can orchestrate
# your parallel runs however you want.
import ray

ray.init()

artifact_type = "dataset"
artifact_name = "parallel-artifact"
table_name = "distributed_table"
parts_path = "parts"
num_parallel = 5

# Each batch of parallel writers should have its own
# unique group name.
group_name = "writer-group-{}".format(round(time.time()))


@ray.remote
def train(i):
    """
    Our writer job. Each writer adds one table to the artifact.
    """
    with wandb.init(group=group_name) as run:
        artifact = wandb.Artifact(name=artifact_name, type=artifact_type)

        # Add data to a wandb table. In this case we use example data
        table = wandb.Table(columns=["a", "b", "c"], data=[[i, i * 2, 2**i]])

        # Add the table to folder in the artifact
        artifact.add(table, "{}/table_{}".format(parts_path, i))

        # Upserting the artifact creates or appends data to the artifact
        run.upsert_artifact(artifact)


# Launch your runs in parallel
result_ids = [train.remote(i) for i in range(num_parallel)]

# Join on all the writers to make sure their files have
# been added before finishing the artifact.
ray.get(result_ids)

# Once all the writers are finished, finish the artifact
# to mark it ready.
with wandb.init(group=group_name) as run:
    artifact = wandb.Artifact(artifact_name, type=artifact_type)

    # Create a "PartitionedTable" pointing to the folder of tables
    # and add it to the artifact.
    artifact.add(wandb.data_types.PartitionedTable(parts_path), table_name)

    # Finish artifact finalizes the artifact, disallowing future "upserts"
    # to this version.
    run.finish_artifact(artifact)

4.1.2 - Download and use artifacts

Download and use Artifacts from multiple projects.

Download and use an artifact that is already stored on the W&B server, or construct an artifact object and pass it in for de-duplication as necessary.

Team members with view-only seats cannot download artifacts.

Download and use an artifact stored on W&B

Download and use an artifact stored in W&B either inside or outside of a W&B Run. Use the Public API (wandb.Api) to export (or update) data already saved in W&B. For more information, see the W&B Public API Reference guide.

First, import the W&B Python SDK. Next, create a W&B Run:

import wandb

run = wandb.init(project="<example>", job_type="<job-type>")

Indicate the artifact you want to use with the use_artifact method. This returns an artifact object. The following code snippet specifies an artifact called 'bike-dataset' with the alias 'latest':

artifact = run.use_artifact("bike-dataset:latest")

Use the object returned to download all the contents of the artifact:

datadir = artifact.download()

You can optionally pass a path to the root parameter to download the contents of the artifact to a specific directory. For more information, see the Python SDK Reference Guide.

Use the get_path method to download only a subset of files:

path = artifact.get_path(name)

This fetches only the file at the path name. It returns an Entry object with the following methods:

Entry.download: Downloads file from the artifact at path name
Entry.ref: If add_reference stored the entry as a reference, returns the URI

References that have schemes that W&B knows how to handle get downloaded just like artifact files. For more information, see Track external files.
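As a minimal sketch of the single-file flow described above, the following helper fetches one entry and downloads it. The name download_one_file is hypothetical, and the sketch assumes that the entry's download method accepts an optional root argument for the target directory.

```python
def download_one_file(artifact, name, root=None):
    """Download a single file from an artifact and return its local path.

    `artifact` is an artifact object, for example the value returned by
    run.use_artifact(). `name` is the file's path inside the artifact.
    Hypothetical helper for illustration.
    """
    # get_path() looks up one entry without downloading the whole artifact.
    entry = artifact.get_path(name)
    if root is None:
        return entry.download()
    # Assumption: download() accepts `root` to choose the target directory.
    return entry.download(root=root)
```

A call such as download_one_file(artifact, "bike.png") would then return the local path of that one file.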

First, import the W&B SDK. Next, create an artifact from the Public API Class. Provide the entity, project, artifact, and alias associated with that artifact:

import wandb

api = wandb.Api()
artifact = api.artifact("entity/project/artifact:alias")

Use the object returned to download the contents of the artifact:

artifact.download()

You can optionally pass a path to the root parameter to download the contents of the artifact to a specific directory. For more information, see the API Reference Guide.

Use the wandb artifact get command to download an artifact from the W&B server.

$ wandb artifact get project/artifact:alias --root mnist/

Partially download an artifact

You can optionally download part of an artifact based on a prefix. Using the path_prefix parameter, you can download a single file or the content of a sub-folder.

artifact = run.use_artifact("bike-dataset:latest")

artifact.download(path_prefix="bike.png") # downloads only bike.png

Alternatively, you can download files from a certain directory:

artifact.download(path_prefix="images/bikes/") # downloads files in the images/bikes directory
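The effect of path_prefix can be reasoned about with a small local sketch. It assumes plain starts-with matching on forward-slash paths, and files_matching_prefix is a hypothetical helper that operates on a list of entry names rather than on a real artifact.

```python
def files_matching_prefix(entry_names, path_prefix):
    """Return the entry names that a given path_prefix would select.

    Hypothetical helper; assumes simple starts-with matching.
    """
    return [name for name in entry_names if name.startswith(path_prefix)]


# Example entry names (made up for illustration).
example_names = [
    "bike.png",
    "images/bikes/a.png",
    "images/bikes/b.png",
    "images/bikes_old/d.png",
    "images/cars/c.png",
]
```

Under this assumption, a prefix of "images/bikes/" selects only the two files in that directory, while "images/bikes" without the trailing slash would also select images/bikes_old/d.png. Ending a directory prefix with a slash avoids that kind of accidental match.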

Use an artifact from a different project

Specify the name of the artifact along with its project name to reference an artifact. You can also reference artifacts across entities by specifying the name of the artifact with its entity name.

The following code example demonstrates how to query an artifact from another project as input to the current W&B run.

import wandb

run = wandb.init(project="<example>", job_type="<job-type>")
# Query W&B for an artifact from another project and mark it
# as an input to this run.
artifact = run.use_artifact("my-project/artifact:alias")

# Use an artifact from another entity and mark it as an input
# to this run.
artifact = run.use_artifact("my-entity/my-project/artifact:alias")
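The name formats used above can be assembled with a small helper. This is a sketch, and qualified_artifact_name is a hypothetical function, not part of the W&B SDK. It only builds the string that you pass to use_artifact.

```python
def qualified_artifact_name(artifact, alias, project=None, entity=None):
    """Build "[entity/][project/]artifact:alias" from its parts.

    Hypothetical helper for illustration.
    """
    if entity and not project:
        # "entity/artifact" would be read as "project/artifact".
        raise ValueError("an entity can only be given together with a project")
    parts = [part for part in (entity, project, artifact) if part]
    return "/".join(parts) + ":" + alias
```

For example, qualified_artifact_name("artifact", "alias", project="my-project", entity="my-entity") produces "my-entity/my-project/artifact:alias".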

Construct and use an artifact simultaneously

Simultaneously construct and use an artifact. Create an artifact object and pass it to use_artifact. This creates an artifact in W&B if it does not exist yet. The use_artifact API is idempotent, so you can call it as many times as you like.

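import wandb

run = wandb.init(project="<example>", job_type="<job-type>")

# Create the artifact object; wandb.Artifact requires a name and a type
artifact = wandb.Artifact("reference-model", type="model")
artifact.add_file("model.h5")
run.use_artifact(artifact)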

For more information about constructing an artifact, see Construct an artifact.

4.1.3 - Update an artifact

Update an existing Artifact inside and outside of a W&B Run.

Pass desired values to update the description, metadata, and alias of an artifact. Call the save() method to update the artifact on the W&B servers. You can update an artifact during a W&B Run or outside of a Run.

Use the W&B Public API (wandb.Api) to update an artifact outside of a run. Use the Artifact API (wandb.Artifact) to update an artifact during a run.

You cannot update the alias of an artifact linked to a model in the Model Registry.

The following code example demonstrates how to update the description of an artifact using the wandb.Artifact API:

import wandb

run = wandb.init(project="<example>")
artifact = run.use_artifact("<artifact-name>:<alias>")
artifact.description = "<description>"
artifact.save()

The following code example demonstrates how to update the description, metadata, and aliases of an artifact using the wandb.Api API:

import wandb

api = wandb.Api()

artifact = api.artifact("entity/project/artifact:alias")

# Update the description
artifact.description = "My new description"

# Selectively update metadata keys
artifact.metadata["oldKey"] = "new value"

# Replace the metadata entirely
artifact.metadata = {"newKey": "new value"}

# Add an alias
artifact.aliases.append("best")

# Remove an alias
artifact.aliases.remove("latest")

# Completely replace the aliases
artifact.aliases = ["replaced"]

# Persist all artifact modifications
artifact.save()

For more information, see the Weights and Biases Artifact API.

You can also update an Artifact collection in the same way as a singular artifact:

import wandb
run = wandb.init(project="<example>")
api = wandb.Api()
# Pass the artifact type first, then the collection name
collection = api.artifact_collection("<type-name>", "<collection-name>")
collection.name = "<new-collection-name>"
collection.description = "<This is where you'd describe the purpose of your collection.>"
collection.save()

For more information, see the Artifacts Collection reference.

4.1.4 - Create an artifact alias

Create custom aliases for W&B Artifacts.

Use aliases as pointers to specific versions. By default, Run.log_artifact adds the latest alias to the logged version.

An artifact version v0 is created and attached to your artifact when you log an artifact for the first time. W&B checksums the contents when you log again to the same artifact. If the artifact changed, W&B saves a new version v1.
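The versioning rule described above can be illustrated with a toy model. This is not W&B's implementation. ToyArtifactHistory is a hypothetical class that hashes file contents with SHA-256 and only creates a new version when the digest differs from the most recent one.

```python
import hashlib


class ToyArtifactHistory:
    """Toy model of checksum-based versioning (not W&B's implementation)."""

    def __init__(self):
        self.digests = []  # index i holds the digest of version "v{i}"
        self.aliases = {}  # alias -> version string

    def log(self, files, aliases=("latest",)):
        """Log `files` (a dict of entry name -> bytes) and return the version."""
        hasher = hashlib.sha256()
        for name in sorted(files):
            hasher.update(name.encode("utf-8") + b"\0")
            hasher.update(files[name] + b"\0")
        digest = hasher.hexdigest()
        # Only changed contents produce a new version.
        if not self.digests or self.digests[-1] != digest:
            self.digests.append(digest)
        version = "v{}".format(len(self.digests) - 1)
        # Aliases are movable pointers to a version.
        for alias in aliases:
            self.aliases[alias] = version
        return version
```

In this model, logging the same contents twice returns v0 both times, and logging changed contents returns v1 and moves the latest alias to it.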

For example, if you want your training script to pull the most recent version of a dataset, specify latest when you use that artifact. The following code example demonstrates how to download a recent dataset artifact named bike-dataset that has an alias, latest:

import wandb

run = wandb.init(project="<example-project>")

artifact = run.use_artifact("bike-dataset:latest")

artifact.download()

You can also apply a custom alias to an artifact version. For example, if you want to mark which model checkpoint is the best on the metric AP-50, you could add the string 'best-ap50' as an alias when you log the model artifact.

artifact = wandb.Artifact("run-3nq3ctyy-bike-model", type="model")
artifact.add_file("model.h5")
run.log_artifact(artifact, aliases=["latest", "best-ap50"])

4.1.5 - Create an artifact version

Create a new artifact version from a single run or from a distributed process.

Create a new artifact version with a single run or collaboratively with distributed runs. You can optionally create a new artifact version from a previous version known as an incremental artifact.

We recommend that you create an incremental artifact when you need to apply changes to a subset of files in an artifact, where the size of the original artifact is significantly larger.

Create new artifact versions from scratch

There are two ways to create a new artifact version: from a single run and from distributed runs. They are defined as follows:

Single run: A single run provides all the data for a new version. This is the most common case and is best suited when the run fully recreates the needed data. For example: outputting saved models or model predictions in a table for analysis.
Distributed runs: A set of runs collectively provides all the data for a new version. This is best suited for distributed jobs which have multiple runs generating data, often in parallel. For example: evaluating a model in a distributed manner, and outputting the predictions.

W&B will create a new artifact and assign it a v0 alias if you pass a name to the wandb.Artifact API that does not exist in your project. W&B checksums the contents when you log again to the same artifact. If the artifact changed, W&B saves a new version v1.

W&B will retrieve an existing artifact if you pass a name and artifact type to the wandb.Artifact API that matches an existing artifact in your project. The retrieved artifact will have a version greater than 1.

Single run

Log a new version of an artifact with a single run that produces all the files in the artifact.

Based on your use case, create a new artifact version inside or outside of a run:

Create an artifact version within a W&B run:

Create a run with wandb.init.
Create a new artifact or retrieve an existing one with wandb.Artifact.
Add files to the artifact with .add_file.
Log the artifact to the run with .log_artifact.

with wandb.init() as run:
    artifact = wandb.Artifact("artifact_name", "artifact_type")

    # Add Files and Assets to the artifact using
    # `.add`, `.add_file`, `.add_dir`, and `.add_reference`
    artifact.add_file("image1.png")
    run.log_artifact(artifact)

Create an artifact version outside of a W&B run:

Create a new artifact or retrieve an existing one with wandb.Artifact.
Add files to the artifact with .add_file.
Save the artifact with .save.

artifact = wandb.Artifact("artifact_name", "artifact_type")
# Add Files and Assets to the artifact using
# `.add`, `.add_file`, `.add_dir`, and `.add_reference`
artifact.add_file("image1.png")
artifact.save()

Distributed runs

Allow a collection of runs to collaborate on a version before committing it. This is in contrast to single run mode described above where one run provides all the data for a new version.

Each run in the collection needs to be aware of the same unique ID (called distributed_id) in order to collaborate on the same version. By default, if present, W&B uses the run’s group as set by wandb.init(group=GROUP) as the distributed_id.
There must be a final run that “commits” the version, permanently locking its state.
Use upsert_artifact to add to the collaborative artifact and finish_artifact to finalize the commit.

Consider the following example. Different runs (labelled below as Run 1, Run 2, and Run 3) add a different image file to the same artifact with upsert_artifact.

Run 1:

with wandb.init() as run:
    artifact = wandb.Artifact("artifact_name", "artifact_type")
    # Add Files and Assets to the artifact using
    # `.add`, `.add_file`, `.add_dir`, and `.add_reference`
    artifact.add_file("image1.png")
    run.upsert_artifact(artifact, distributed_id="my_dist_artifact")

Run 2:

with wandb.init() as run:
    artifact = wandb.Artifact("artifact_name", "artifact_type")
    # Add Files and Assets to the artifact using
    # `.add`, `.add_file`, `.add_dir`, and `.add_reference`
    artifact.add_file("image2.png")
    run.upsert_artifact(artifact, distributed_id="my_dist_artifact")

Run 3:

Must run after Run 1 and Run 2 complete. The Run that calls finish_artifact can include files in the artifact, but does not need to.

with wandb.init() as run:
    artifact = wandb.Artifact("artifact_name", "artifact_type")
    # Add Files and Assets to the artifact using
    # `.add`, `.add_file`, `.add_dir`, and `.add_reference`
    artifact.add_file("image3.png")
    run.finish_artifact(artifact, distributed_id="my_dist_artifact")

Create a new artifact version from an existing version

Add, modify, or remove a subset of files from a previous artifact version without the need to re-index the files that didn’t change. Adding, modifying, or removing a subset of files from a previous artifact version creates a new artifact version known as an incremental artifact.

Here are some scenarios for each type of incremental change you might encounter:

add: you periodically add a new subset of files to a dataset after collecting a new batch.
remove: you discovered several duplicate files and want to remove them from your artifact.
update: you corrected annotations for a subset of files and want to replace the old files with the correct ones.

You could create an artifact from scratch to perform the same function as an incremental artifact. However, when you create an artifact from scratch, you will need to have all the contents of your artifact on your local disk. When making an incremental change, you can add, remove, or modify a single file without changing the files from a previous artifact version.

You can create an incremental artifact within a single run or with a set of runs (distributed mode).

Follow the procedure below to incrementally change an artifact:

Obtain the artifact version you want to perform an incremental change on:

# Inside a W&B run
saved_artifact = run.use_artifact("my_artifact:latest")

# Outside of a W&B run, with the Public API
client = wandb.Api()
saved_artifact = client.artifact("my_artifact:latest")

Create a draft with:

draft_artifact = saved_artifact.new_draft()

Perform any incremental changes you want to see in the next version. You can either add, remove, or modify an existing entry.

The following examples show how to perform each of these changes:

Add a file to an existing artifact version with the add_file method:

draft_artifact.add_file("file_to_add.txt")

You can also add multiple files by adding a directory with the add_dir method.

Remove a file from an existing artifact version with the remove method:

draft_artifact.remove("file_to_remove.txt")

You can also remove multiple files with the remove method by passing in a directory path.

Modify or replace contents by removing the old contents from the draft and adding the new contents back in:

draft_artifact.remove("modified_file.txt")
draft_artifact.add_file("modified_file.txt")

Lastly, log or save your changes. Log the draft inside a W&B run, or save it outside of a run, depending on your use case:

# Inside a W&B run
run.log_artifact(draft_artifact)

# Outside of a W&B run
draft_artifact.save()

Putting it all together, the code examples above look like:

with wandb.init(job_type="modify dataset") as run:
    saved_artifact = run.use_artifact(
        "my_artifact:latest"
    )  # fetch artifact and input it into your run
    draft_artifact = saved_artifact.new_draft()  # create a draft version

    # modify a subset of files in the draft version
    draft_artifact.add_file("file_to_add.txt")
    draft_artifact.remove("dir_to_remove/")
    run.log_artifact(
        draft_artifact
    )  # log your changes to create a new version and mark it as output to your run

client = wandb.Api()
saved_artifact = client.artifact("my_artifact:latest")  # load your artifact
draft_artifact = saved_artifact.new_draft()  # create a draft version

# modify a subset of files in the draft version
draft_artifact.remove("deleted_file.txt")
draft_artifact.add_file("modified_file.txt")
draft_artifact.save()  # commit changes to the draft

4.1.6 - Track external files

Track files saved in an external bucket, HTTP file server, or an NFS share.

Use reference artifacts to track and use files saved outside of W&B servers, for example in CoreWeave AI Object Storage, an Amazon Simple Storage Service (Amazon S3) bucket, GCS bucket, Azure blob, HTTP file server, or NFS share.

W&B logs metadata about the object, such as the object’s ETag and size. If object versioning is enabled on the bucket, the version ID is also logged.

If you log an artifact that does not track external files, W&B saves the artifact’s files to W&B servers. This is the default behavior when you log artifacts with the W&B Python SDK.

See the Artifacts quickstart for information on how to save files and directories to W&B servers instead.

The following describes how to construct reference artifacts.

Track an artifact in an external bucket

Use the W&B Python SDK to track references to files stored outside of W&B.

Initialize a run with wandb.init().
Create an artifact object with wandb.Artifact().
Specify the reference to the bucket path with the artifact object’s add_reference() method.
Log the artifact’s metadata with run.log_artifact().

import wandb

# Initialize a W&B run
run = wandb.init()

# Create an artifact object
artifact = wandb.Artifact(name="name", type="type")

# Add a reference to the bucket path
artifact.add_reference(uri="uri/to/your/bucket/path")

# Log the artifact's metadata
run.log_artifact(artifact)
run.finish()

Suppose your bucket has the following directory structure:

s3://my-bucket

|datasets/
  |---- mnist/
|models/
  |---- cnn/

The datasets/mnist/ directory contains a collection of images. Track the directory as a dataset with wandb.Artifact.add_reference(). The following code sample creates a reference artifact mnist:latest using the artifact object’s add_reference() method.

import wandb

run = wandb.init()
artifact = wandb.Artifact(name="mnist", type="dataset")
artifact.add_reference(uri="s3://my-bucket/datasets/mnist")
run.log_artifact(artifact)
run.finish()

Within the W&B App, you can look through the contents of the reference artifact using the file browser, explore the full dependency graph, and scan through the versioned history of your artifact. The W&B App does not render rich media such as images, audio, and so forth because the data itself is not contained within the artifact.

W&B Artifacts support any Amazon S3 compatible interface, including MinIO. The scripts described below work as-is with MinIO, when you set the AWS_S3_ENDPOINT_URL environment variable to point at your MinIO server.

By default, W&B imposes a 10,000 object limit when adding an object prefix. You can adjust this limit by specifying max_objects= when you call add_reference().
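As a brief sketch of raising the limit, the following helper passes max_objects through to add_reference. The function add_prefix_reference and the example values in the note below are hypothetical; only the max_objects parameter comes from the text above.

```python
def add_prefix_reference(artifact, uri, max_objects=None):
    """Add a reference to an object prefix, optionally raising the limit.

    `artifact` is a wandb.Artifact. Hypothetical helper for illustration.
    """
    if max_objects is None:
        # Keep the default limit of 10,000 objects.
        return artifact.add_reference(uri)
    # Override the default limit for large prefixes.
    return artifact.add_reference(uri, max_objects=max_objects)
```

For example, add_prefix_reference(artifact, "s3://my-bucket/datasets/mnist", max_objects=50000) would allow up to 50,000 objects under that prefix.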

Download an artifact from an external bucket

W&B retrieves the files from the underlying bucket when it downloads a reference artifact using the metadata recorded when the artifact is logged. If your bucket has object versioning enabled, W&B retrieves the object version that corresponds to the state of the file at the time an artifact was logged. As you evolve the contents of your bucket, you can always point to the exact version of your data a given model was trained on, because the artifact serves as a snapshot of your bucket during the training run.

The following code sample shows how to download a reference artifact. The APIs for downloading artifacts are the same for both reference and non-reference artifacts:

import wandb

run = wandb.init()
artifact = run.use_artifact("mnist:latest", type="dataset")
artifact_dir = artifact.download()

W&B recommends that you enable ‘Object Versioning’ on your storage buckets if you overwrite files as part of your workflow. With versioning enabled on your buckets, artifacts with references to files that have been overwritten will still be intact because the older object versions are retained.

Based on your use case, read the instructions to enable object versioning: AWS, GCP, Azure.

Add and download an external reference example

The following code sample uploads a model file to an Amazon S3 bucket, tracks it with a reference artifact, then downloads it:

import boto3
import wandb

run = wandb.init()

# Training here...

s3_client = boto3.client("s3")
# boto3's upload_file takes Filename, Bucket, and Key
s3_client.upload_file(Filename="my_model.h5", Bucket="my-bucket", Key="models/cnn/my_model.h5")

# Log the model artifact
model_artifact = wandb.Artifact("cnn", type="model")
model_artifact.add_reference("s3://my-bucket/models/cnn/")
run.log_artifact(model_artifact)

At a later point, you can download the model artifact. Specify the name of the artifact and its type:

import wandb

run = wandb.init()
artifact = run.use_artifact(artifact_or_name="cnn:latest", type="model")
datadir = artifact.download()

See the following reports for an end-to-end walkthrough on how to track artifacts by reference for GCP or Azure:

Cloud storage credentials

W&B uses the default mechanism to look for credentials based on the cloud provider you use. Read the documentation from your cloud provider to learn more about the credentials used:

Cloud provider	Credentials Documentation
CoreWeave AI Object Storage	CoreWeave AI Object Storage documentation
AWS	Boto3 documentation
GCP	Google Cloud documentation
Azure	Azure documentation

For AWS, if the bucket is not located in the configured user’s default region, you must set the AWS_REGION environment variable to match the bucket region.
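The two environment variables mentioned in this guide, AWS_S3_ENDPOINT_URL and AWS_REGION, can also be set from Python before you create any reference artifacts. The following is a minimal sketch. The helper configure_s3_environment and the example values in the note below are hypothetical.

```python
import os


def configure_s3_environment(endpoint_url=None, region=None, environ=os.environ):
    """Set S3-related environment variables for the current process.

    Hypothetical helper for illustration. `environ` defaults to the real
    process environment and can be replaced with a dict for testing.
    """
    if endpoint_url is not None:
        # Points S3 references at an S3-compatible server such as MinIO.
        environ["AWS_S3_ENDPOINT_URL"] = endpoint_url
    if region is not None:
        # Needed when the bucket is outside the user's default region.
        environ["AWS_REGION"] = region
    return environ
```

For example, configure_s3_environment(endpoint_url="http://localhost:9000", region="eu-west-1") would apply both settings. Setting the variables in the shell that launches your script has the same effect.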

Rich media such as images, audio, video, and point clouds may fail to render in the App UI depending on the CORS configuration of your bucket. To let the App UI render such media, allowlist app.wandb.ai in your bucket’s CORS policy.

Track an artifact in a filesystem

Another common pattern for fast access to datasets is to expose an NFS mount point to a remote filesystem on all machines running training jobs. This can be an even simpler solution than a cloud storage bucket because from the perspective of the training script, the files look just like they are sitting on your local filesystem. Luckily, that ease of use extends into using Artifacts to track references to file systems, whether they are mounted or not.

Suppose you have a filesystem mounted at /mount with the following structure:

mount
|datasets/
  |---- mnist/
|models/
  |---- cnn/

Within mnist/ is a dataset, a collection of images. You can track it with an artifact:

import wandb

run = wandb.init()
artifact = wandb.Artifact("mnist", type="dataset")
artifact.add_reference("file:///mount/datasets/mnist/")
run.log_artifact(artifact)

By default, W&B imposes a 10,000 file limit when adding a reference to a directory. You can adjust this limit by specifying max_objects= when you call add_reference().

Note the triple slash in the URL. The first component is the file:// prefix that denotes the use of filesystem references. The second component begins the path to the dataset, /mount/datasets/mnist/.
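The two components can be joined programmatically, which makes the origin of the triple slash easy to see. This is a small sketch, and filesystem_reference_uri is a hypothetical helper that only builds the string.

```python
def filesystem_reference_uri(absolute_path):
    """Build a file:// reference URI from an absolute POSIX path.

    Hypothetical helper for illustration.
    """
    if not absolute_path.startswith("/"):
        raise ValueError("expected an absolute path that starts with '/'")
    # "file://" + "/mount/..." yields the triple slash.
    return "file://" + absolute_path
```

Calling filesystem_reference_uri("/mount/datasets/mnist/") returns "file:///mount/datasets/mnist/", which is the value passed to add_reference in the example above.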

The resulting artifact mnist:latest looks and acts like a regular artifact. The only difference is that the artifact only consists of metadata about the files, such as their sizes and MD5 checksums. The files themselves never leave your system.

You can interact with this artifact just as you would a normal artifact. In the UI, you can browse the contents of the reference artifact using the file browser, explore the full dependency graph, and scan through the versioned history of your artifact. However, the UI cannot render rich media such as images and audio because the data itself is not contained within the artifact.

Downloading a reference artifact:

import wandb

run = wandb.init()
artifact = run.use_artifact("entity/project/mnist:latest", type="dataset")
artifact_dir = artifact.download()

For a filesystem reference, a download() operation copies the files from the referenced paths to construct the artifact directory. In the above example, the contents of /mount/datasets/mnist are copied into the directory artifacts/mnist:v0/. If an artifact contains a reference to a file that was overwritten, then download() will throw an error because the artifact can no longer be reconstructed.

Putting it all together, you can use the following code to track a dataset under a mounted filesystem that feeds into a training job:

import wandb

run = wandb.init()

artifact = wandb.Artifact("mnist", type="dataset")
artifact.add_reference("file:///mount/datasets/mnist/")

# Track the artifact and mark it as an input to
# this run in one swoop. A new artifact version
# is only logged if the files under the directory
# changed.
run.use_artifact(artifact)

artifact_dir = artifact.download()

# Perform training here...

To track a model, log the model artifact after the training script writes the model files to the mount point:

import wandb

run = wandb.init()

# Training here...

# Write model to disk

model_artifact = wandb.Artifact("cnn", type="model")
model_artifact.add_reference("file:///mount/models/cnn/my_model.h5")
run.log_artifact(model_artifact)

4.1.7 - Manage data

4.1.7.1 - Delete an artifact

Delete artifacts interactively with the App UI or programmatically with the W&B SDK.

Delete artifacts interactively with the App UI or programmatically with the W&B SDK. When you delete an artifact, W&B marks that artifact as a soft-delete. In other words, the artifact is marked for deletion but files are not immediately deleted from storage.

The contents of the artifact remain in a soft-deleted, or pending deletion, state until a regularly run garbage collection process reviews all artifacts marked for deletion. The garbage collection process deletes associated files from storage if the artifact and its associated files are not used by previous or subsequent artifact versions.

The sections in this page describe how to delete specific artifact versions, how to delete an artifact collection, how to delete artifacts with and without aliases, and more. You can schedule when artifacts are deleted from W&B with TTL policies. For more information, see Manage data retention with Artifact TTL policy.

Artifacts that are scheduled for deletion with a TTL policy, deleted with the W&B SDK, or deleted with the W&B App UI are first soft-deleted. Artifacts that are soft deleted undergo garbage collection before they are hard-deleted.

Delete an artifact version

To delete an artifact version:

Select the name of the artifact. This will expand the artifact view and list all the artifact versions associated with that artifact.
From the list of artifacts, select the artifact version you want to delete.
On the right hand side of the workspace, select the kebab dropdown.
Choose Delete.

An artifact version can also be deleted programmatically via the delete() method. See the examples below.

Delete multiple artifact versions with aliases

The following code example demonstrates how to delete artifacts that have aliases associated with them. Provide the entity, project name, and run ID that created the artifacts.

import wandb

api = wandb.Api()
run = api.run("entity/project/run_id")

for artifact in run.logged_artifacts():
    artifact.delete()

Set the delete_aliases parameter to True to delete an artifact version along with its aliases if it has one or more aliases.

import wandb

api = wandb.Api()
run = api.run("entity/project/run_id")

for artifact in run.logged_artifacts():
    # Set delete_aliases=True in order to delete
    # artifacts with one or more aliases
    artifact.delete(delete_aliases=True)

Delete multiple artifact versions with a specific alias

The proceeding code demonstrates how to delete multiple artifact versions that have a specific alias. Provide the entity, project name, and run ID that created the artifacts. Replace the deletion logic with your own:

import wandb

api = wandb.Api()
run = api.run("entity/project_name/run_id")

# Delete artifacts with alias 'v3' and 'v4'
for artifact_version in run.logged_artifacts():
    # Replace with your own deletion logic.
    if artifact_version.name[-2:] == "v3" or artifact_version.name[-2:] == "v4":
        artifact_version.delete(delete_aliases=True)
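
If the versions you want to remove are marked with custom aliases, one option is to filter on each version's aliases list rather than on the end of its name. The following is a minimal sketch of that selection step, not a prescribed W&B workflow. `select_by_alias` is a hypothetical helper, and it only assumes that each object exposes an `aliases` list, as W&B artifact versions do. The stand-in objects let you check the selection logic before you delete anything.

```python
from types import SimpleNamespace


def select_by_alias(versions, target_aliases):
    """Return the versions that carry at least one of the target aliases."""
    targets = set(target_aliases)
    return [v for v in versions if targets.intersection(v.aliases)]


# Stand-in objects for illustration; real artifact versions come from
# run.logged_artifacts().
versions = [
    SimpleNamespace(name="data:v0", aliases=["baseline"]),
    SimpleNamespace(name="data:v1", aliases=["candidate", "latest"]),
    SimpleNamespace(name="data:v2", aliases=[]),
]

to_delete = select_by_alias(versions, ["baseline", "candidate"])
print([v.name for v in to_delete])
```

With real artifact versions, you would then call delete(delete_aliases=True) on each selected version.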

Delete all versions of an artifact that do not have an alias

The following code snippet demonstrates how to delete all versions of an artifact that do not have an alias. Provide the name of the project and entity for the project and entity keys in wandb.Api, respectively. Replace <artifact_type> and <artifact_name> with the type and name of your artifact:

import wandb

# Provide your entity and a project name when you
# use wandb.Api methods.
api = wandb.Api(overrides={"project": "project", "entity": "entity"})

artifact_type, artifact_name = "<artifact_type>", "<artifact_name>"  # provide type and name
for v in api.artifact_versions(artifact_type, artifact_name):
    # Clean up versions that don't have an alias such as 'latest'.
    # NOTE: You can put whatever deletion logic you want here.
    if len(v.aliases) == 0:
        v.delete()

Delete an artifact collection

To delete an artifact collection:

Navigate to the artifact collection you want to delete and hover over it.
Select the kebab dropdown next to the artifact collection name.
Choose Delete.

You can also delete an artifact collection programmatically with the delete() method. Provide the name of the project and entity for the project and entity keys in wandb.Api, respectively:

import wandb

# Provide your entity and a project name when you
# use wandb.Api methods.
api = wandb.Api(overrides={"project": "project", "entity": "entity"})
collection = api.artifact_collection(
    "<artifact_type>", "entity/project/artifact_collection_name"
)
collection.delete()

How to enable garbage collection based on how W&B is hosted

Garbage collection is enabled by default if you use W&B’s shared cloud. Depending on how you host W&B, you might need to take additional steps to enable garbage collection:

Set the GORILLA_ARTIFACT_GC_ENABLED environment variable to true: GORILLA_ARTIFACT_GC_ENABLED=true
Enable bucket versioning if you use AWS, GCP or any other storage provider such as Minio. If you use Azure, enable soft deletion.
Soft deletion in Azure is equivalent to bucket versioning in other storage providers.

The following table describes how to satisfy requirements to enable garbage collection based on your deployment type.

The X indicates you must satisfy the requirement:

Deployment type	Environment variable	Enable versioning
Shared cloud
Shared cloud with secure storage connector		X
Dedicated cloud
Dedicated cloud with secure storage connector		X
Customer-managed cloud	X	X
Customer managed on-prem	X	X

Note: Secure storage connector is currently only available for Google Cloud Platform and Amazon Web Services.
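
The requirements in the table can also be written as a small lookup, which may help when you script a deployment checklist. This is only a sketch: `GC_REQUIREMENTS` and `gc_checklist` are hypothetical names, and the values simply restate the table and the list above.

```python
# Requirements from the table above, keyed by deployment type.
# Each value is (needs_env_var, needs_bucket_versioning).
GC_REQUIREMENTS = {
    "Shared cloud": (False, False),
    "Shared cloud with secure storage connector": (False, True),
    "Dedicated cloud": (False, False),
    "Dedicated cloud with secure storage connector": (False, True),
    "Customer-managed cloud": (True, True),
    "Customer managed on-prem": (True, True),
}


def gc_checklist(deployment_type, storage_provider="aws"):
    """Return the steps needed to enable garbage collection."""
    needs_env_var, needs_versioning = GC_REQUIREMENTS[deployment_type]
    steps = []
    if needs_env_var:
        steps.append("Set GORILLA_ARTIFACT_GC_ENABLED=true")
    if needs_versioning:
        # Azure uses soft deletion in place of bucket versioning.
        if storage_provider.lower() == "azure":
            steps.append("Enable soft deletion")
        else:
            steps.append("Enable bucket versioning")
    return steps


print(gc_checklist("Customer-managed cloud", storage_provider="azure"))
```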

4.1.7.2 - Manage artifact data retention

Time to live policies (TTL)

Schedule when artifacts are deleted from W&B with a W&B Artifact time-to-live (TTL) policy. When you delete an artifact, W&B soft-deletes it. In other words, the artifact is marked for deletion but its files are not immediately deleted from storage. For more information on how W&B deletes artifacts, see the Delete artifacts page.

Watch a Managing data retention with Artifacts TTL video tutorial to learn how to manage data retention with Artifacts TTL in the W&B App.

W&B deactivates the option to set a TTL policy for model artifacts linked to the Model Registry. This is to help ensure that linked models do not accidentally expire if used in production workflows.

Only team admins can view a team’s settings and access team level TTL settings such as (1) permitting who can set or edit a TTL policy or (2) setting a team default TTL.
If you do not see the option to set or edit a TTL policy in an artifact’s details in the W&B App UI or if setting a TTL programmatically does not successfully change an artifact’s TTL property, your team admin has not given you permissions to do so.

Auto-generated Artifacts

Only user-generated artifacts can use TTL policies. Artifacts auto-generated by W&B cannot have TTL policies set for them.

The following Artifact types indicate an auto-generated Artifact:

run_table
code
job
Any Artifact type starting with: wandb-*

You can check an Artifact’s type on the W&B platform or programmatically:

import wandb

run = wandb.init(project="<my-project-name>")
artifact = run.use_artifact(artifact_or_name="<my-artifact-name>")
print(artifact.type)

Replace the values enclosed with <> with your own.
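
Once you have the type string, the check against the list above can be automated. The following sketch assumes only the type names listed in this section; `is_auto_generated` is a hypothetical helper, not part of the W&B SDK.

```python
# Artifact types that indicate an auto-generated artifact, per the list above.
AUTO_GENERATED_TYPES = {"run_table", "code", "job"}


def is_auto_generated(artifact_type):
    """Return True if the type marks an artifact as auto-generated by W&B."""
    return artifact_type in AUTO_GENERATED_TYPES or artifact_type.startswith("wandb-")


for artifact_type in ["dataset", "code", "wandb-history"]:
    print(artifact_type, is_auto_generated(artifact_type))
```

You could pass artifact.type from the preceding snippet to this helper before you try to set a TTL policy.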

Define who can edit and set TTL policies

Define who can set and edit TTL policies within a team. You can either grant TTL permissions only to team admins, or you can grant both team admins and team members TTL permissions.

Only team admins can define who can set or edit a TTL policy.

Navigate to your team’s profile page.
Select the Settings tab.
Navigate to the Artifacts time-to-live (TTL) section.
From the TTL permissions dropdown, select who can set and edit TTL policies.
Click on Review and save settings.
Confirm the changes and select Save settings.

Create a TTL policy

Set a TTL policy for an artifact either when you create the artifact or retroactively after the artifact is created.

For all the code snippets below, replace the content wrapped in <> with your information to use the code snippet.

Set a TTL policy when you create an artifact

Use the W&B Python SDK to define a TTL policy when you create an artifact. TTL policies are typically defined in days.

Defining a TTL policy when you create an artifact is similar to how you normally create an artifact, except that you pass in a time delta to the artifact’s ttl attribute.

The steps are as follows:

Create an artifact.
Add content to the artifact such as files, a directory, or a reference.
Define a TTL time limit with the datetime.timedelta data type that is part of Python’s standard library.
Log the artifact.

The following code snippet demonstrates how to create an artifact and set a TTL policy.

import wandb
from datetime import timedelta

run = wandb.init(project="<my-project-name>", entity="<my-entity>")
artifact = wandb.Artifact(name="<artifact-name>", type="<type>")
artifact.add_file("<my_file>")

artifact.ttl = timedelta(days=30)  # Set TTL policy
run.log_artifact(artifact)

The preceding code snippet sets the TTL policy for the artifact to 30 days. In other words, W&B deletes the artifact after 30 days.

Set or edit a TTL policy after you create an artifact

Use the W&B App UI or the W&B Python SDK to define a TTL policy for an artifact that already exists.

When you modify an artifact’s TTL, the time the artifact takes to expire is still calculated using the artifact’s createdAt timestamp.

Fetch your artifact.
Pass in a time delta to the artifact’s ttl attribute.
Update the artifact with the save method.

The following code snippet shows how to set a TTL policy for an artifact:

import wandb
from datetime import timedelta

run = wandb.init(project="<my-project-name>", entity="<my-entity>")
artifact = run.use_artifact("<my-entity/my-project/my-artifact:alias>")
artifact.ttl = timedelta(days=365 * 2)  # Delete in two years
artifact.save()

The preceding code example sets the TTL policy to two years.
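
Because expiry is calculated from the createdAt timestamp, editing a TTL does not restart the clock. The arithmetic can be sketched as follows. W&B performs this calculation on its side; `expires_at` is a hypothetical helper and the creation date is made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def expires_at(created_at, ttl):
    """Return the expiry time: the creation time plus the TTL."""
    return created_at + ttl


# Made-up creation timestamp for illustration.
created_at = datetime(2024, 1, 1, tzinfo=timezone.utc)

# Original policy: 30 days after creation.
print(expires_at(created_at, timedelta(days=30)))

# Editing the TTL later to 90 days still counts from created_at,
# not from the time of the edit.
print(expires_at(created_at, timedelta(days=90)))
```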

Navigate to your W&B project in the W&B App UI.
Select the artifact icon on the left panel.
From the list of artifacts, expand the artifact type you
Select the artifact version you want to edit the TTL policy for.
Click on the Version tab.
From the dropdown, select Edit TTL policy.
Within the modal that appears, select Custom from the TTL policy dropdown.
Within the TTL duration field, set the TTL policy in units of days.
Select the Update TTL button to save your changes.

Set default TTL policies for a team

Only team admins can set a default TTL policy for a team.

Set a default TTL policy for your team. Default TTL policies apply to all existing and future artifacts based on their respective creation dates. Artifacts with existing version-level TTL policies are not affected by the team’s default TTL.

Navigate to your team’s profile page.
Select the Settings tab.
Navigate to the Artifacts time-to-live (TTL) section.
Click on the Set team’s default TTL policy.
Within the Duration field, set the TTL policy in units of days.
Click on Review and save settings.
Confirm the changes and then select Save settings.

Set a TTL policy outside of a run

Use the public API to retrieve an artifact without fetching a run, and set the TTL policy. TTL policies are typically defined in days.

The following code sample shows how to fetch an artifact using the public API and set the TTL policy.

import wandb
from datetime import timedelta

api = wandb.Api()

artifact = api.artifact("entity/project/artifact:alias")

artifact.ttl = timedelta(days=365)  # Delete in one year

artifact.save()

Deactivate a TTL policy

Use the W&B Python SDK or W&B App UI to deactivate a TTL policy for a specific artifact version.

Fetch your artifact.
Set the artifact’s ttl attribute to None.
Update the artifact with the save method.

The following code snippet shows how to turn off a TTL policy for an artifact:

import wandb

run = wandb.init(project="<my-project-name>")
artifact = run.use_artifact("<my-entity/my-project/my-artifact:alias>")
artifact.ttl = None
artifact.save()

Navigate to your W&B project in the W&B App UI.
Select the artifact icon on the left panel.
From the list of artifacts, expand the artifact type you
Select the artifact version you want to edit the TTL policy for.
Click on the Version tab.
Click on the meatball UI icon next to the Link to registry button.
From the dropdown, select Edit TTL policy.
Within the modal that appears, select Deactivate from the TTL policy dropdown.
Select the Update TTL button to save your changes.

View TTL policies

View TTL policies for artifacts with the Python SDK or with the W&B App UI.

Use a print statement to view an artifact’s TTL policy. The following example shows how to retrieve an artifact and view its TTL policy:

import wandb
run = wandb.init(project="<my-project-name>")
artifact = run.use_artifact("<my-entity/my-project/my-artifact:alias>")
print(artifact.ttl)

View a TTL policy for an artifact with the W&B App UI.

Navigate to the W&B App.
Go to your W&B Project.
Within your project, select the Artifacts tab in the left sidebar.
Click on a collection.

Within the collection view you can see all of the artifacts in the selected collection. Within the Time to Live column you will see the TTL policy assigned to that artifact.

4.1.7.3 - Manage artifact storage and memory allocation

Manage storage and memory allocation of W&B Artifacts.

W&B stores artifact files in a private Google Cloud Storage bucket located in the United States by default. All files are encrypted at rest and in transit.

For sensitive files, we recommend you set up Private Hosting or use reference artifacts.

During training, W&B locally saves logs, artifacts, and configuration files in the following local directories:

File	Default location	To change default location set:
logs	`./wandb`	`dir` in `wandb.init` or set the `WANDB_DIR` environment variable
artifacts	`~/.cache/wandb`	the `WANDB_CACHE_DIR` environment variable
configs	`~/.config/wandb`	the `WANDB_CONFIG_DIR` environment variable
staging artifacts for upload	`~/.cache/wandb-data/`	the `WANDB_DATA_DIR` environment variable
downloaded artifacts	`./artifacts`	the `WANDB_ARTIFACT_DIR` environment variable

For a complete guide to using environment variables to configure W&B, see the environment variables reference.

Depending on the machine on which wandb is initialized, these default folders may not be located in a writeable part of the file system. This might trigger an error.
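
To see in advance whether these locations are writeable, you can resolve them from the environment and check permissions. The sketch below restates the table above. `resolve_dirs` and `nearest_existing` are hypothetical helpers, and the mapping is a simplification: it treats each variable as a direct replacement for the default path, which may not match exactly how wandb lays out its subdirectories.

```python
import os

# Defaults and environment variables from the table above.
WANDB_DIRS = {
    "logs": ("WANDB_DIR", "./wandb"),
    "artifacts": ("WANDB_CACHE_DIR", "~/.cache/wandb"),
    "configs": ("WANDB_CONFIG_DIR", "~/.config/wandb"),
    "staging": ("WANDB_DATA_DIR", "~/.cache/wandb-data/"),
    "downloads": ("WANDB_ARTIFACT_DIR", "./artifacts"),
}


def resolve_dirs(env=None):
    """Map each kind of file to the directory selected by env or the default."""
    env = os.environ if env is None else env
    return {
        kind: os.path.expanduser(env.get(var, default))
        for kind, (var, default) in WANDB_DIRS.items()
    }


def nearest_existing(path):
    """Walk up from path until an existing directory is found."""
    path = os.path.abspath(path)
    while not os.path.isdir(path):
        parent = os.path.dirname(path)
        if parent == path:
            break
        path = parent
    return path


for kind, path in resolve_dirs().items():
    # A directory that does not exist yet is creatable only if its
    # nearest existing ancestor is writeable.
    writable = os.access(nearest_existing(path), os.W_OK)
    print(f"{kind}: {path} (writable: {writable})")
```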

Clean up local artifact cache

W&B caches artifact files to speed up downloads across versions that share files in common. Over time this cache directory can become large. Run the wandb artifact cache cleanup command to prune the cache and to remove any files that have not been used recently.

The following code snippet demonstrates how to limit the size of the cache to 1GB. Copy and paste the code snippet into your terminal:

$ wandb artifact cache cleanup 1GB
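
Before you prune, it can be useful to know how large the cache currently is. The following sketch measures a directory with the standard library only. `dir_size_bytes` and `needs_cleanup` are hypothetical helpers, and the demo runs against a throwaway directory rather than your real cache.

```python
import os
import tempfile


def dir_size_bytes(root):
    """Sum the sizes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            file_path = os.path.join(dirpath, filename)
            if os.path.isfile(file_path) and not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def needs_cleanup(root, limit_bytes):
    """Return True if the directory holds more than limit_bytes of files."""
    return dir_size_bytes(root) > limit_bytes


# Demonstrate on a throwaway directory rather than the real cache.
with tempfile.TemporaryDirectory() as demo_cache:
    with open(os.path.join(demo_cache, "blob"), "wb") as f:
        f.write(b"\0" * 2048)
    demo_size = dir_size_bytes(demo_cache)
    demo_over_limit = needs_cleanup(demo_cache, limit_bytes=1024)

print(demo_size, demo_over_limit)
```

To check your own cache, pass the artifact cache location (by default ~/.cache/wandb, or the value of WANDB_CACHE_DIR) and a byte limit that matches your target size.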

4.1.8 - Explore artifact graphs

Traverse automatically created directed acyclic W&B Artifact graphs.

W&B automatically tracks the artifacts a given run logged as well as the artifacts a given run uses. These artifacts can include datasets, models, evaluation results, or more. You can explore an artifact’s lineage to track and manage the various artifacts produced throughout the machine learning lifecycle.

Lineage

Tracking an artifact’s lineage has several key benefits:

Reproducibility: By tracking the lineage of all artifacts, teams can reproduce experiments, models, and results, which is essential for debugging, experimentation, and validating machine learning models.
Version Control: Artifact lineage involves versioning artifacts and tracking their changes over time. This allows teams to roll back to previous versions of data or models if needed.
Auditing: Having a detailed history of the artifacts and their transformations enables organizations to comply with regulatory and governance requirements.
Collaboration and Knowledge Sharing: Artifact lineage facilitates better collaboration among team members by providing a clear record of attempts as well as what worked, and what didn’t. This helps in avoiding duplication of efforts and accelerates the development process.

Finding an artifact’s lineage

When selecting an artifact in the Artifacts tab, you can see your artifact’s lineage. This graph view shows a general overview of your pipeline.

To view an artifact graph:

Navigate to your project in the W&B App UI
Choose the artifact icon on the left panel.
Select Lineage.

Navigating the lineage graph

The artifact or job type you provide appears in front of its name, with artifacts represented by blue icons and runs represented by green icons. Arrows detail the input and output of a run or artifact on the graph.

You can view the type and the name of the artifact in both the left sidebar and in the Lineage tab.

For a more detailed view, click any individual artifact or run to get more information on a particular object.

Artifact clusters

When a level of the graph has five or more runs or artifacts, W&B groups them into a cluster. A cluster has a search bar for finding specific versions of runs or artifacts, and you can pull an individual node out of a cluster to continue investigating its lineage.

Clicking on a node opens a preview with an overview of the node. Clicking on the arrow extracts the individual run or artifact so you can examine the lineage of the extracted node.

Use the API to track lineage

You can also navigate a graph using the W&B API.

Create an artifact. First, create a run with wandb.init. Then, create a new artifact or retrieve an existing one with wandb.Artifact. Next, add files to the artifact with .add_file. Finally, log the artifact to the run with .log_artifact. The finished code looks something like this:

import wandb

with wandb.init() as run:
    artifact = wandb.Artifact("artifact_name", "artifact_type")

    # Add files and assets to the artifact using
    # `.add`, `.add_file`, `.add_dir`, and `.add_reference`
    artifact.add_file("image1.png")
    run.log_artifact(artifact)

Use the artifact object’s logged_by and used_by methods to walk the graph from the artifact:

# Walk up and down the graph from an artifact:
producer_run = artifact.logged_by()
consumer_runs = artifact.used_by()
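
To walk more than one step up the graph, you can repeat the logged_by() call on each input of the producing run. The following is a minimal sketch of that traversal. `upstream_artifacts` is a hypothetical helper, and it assumes that the run returned by logged_by() exposes a used_artifacts() method, as runs returned by the public API do. The stand-in classes exist only so the example runs without a W&B project.

```python
from collections import deque


def upstream_artifacts(artifact, max_depth=3):
    """Collect names of artifacts that fed into `artifact`, breadth first."""
    seen = []
    queue = deque([(artifact, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth >= max_depth:
            continue
        producer = current.logged_by()  # run that logged this artifact, or None
        if producer is None:
            continue
        for parent in producer.used_artifacts():
            if parent.name not in seen:
                seen.append(parent.name)
                queue.append((parent, depth + 1))
    return seen


# Stand-in classes for illustration only; real objects come from the W&B API.
class FakeRun:
    def __init__(self, inputs):
        self._inputs = inputs

    def used_artifacts(self):
        return self._inputs


class FakeArtifact:
    def __init__(self, name, producer=None):
        self.name = name
        self._producer = producer

    def logged_by(self):
        return self._producer


raw = FakeArtifact("raw-data:v0")
clean = FakeArtifact("clean-data:v0", producer=FakeRun([raw]))
model = FakeArtifact("model:v0", producer=FakeRun([clean]))

print(upstream_artifacts(model))
```

The max_depth argument bounds the walk so that a long pipeline does not trigger an unbounded number of API calls.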

4.1.9 - Artifact data privacy and compliance

Learn where W&B files are stored by default. Explore how to save and store sensitive information.

Files are uploaded to a Google Cloud bucket managed by W&B when you log artifacts. The contents of the bucket are encrypted both at rest and in transit. Artifact files are only visible to users who have access to the corresponding project.

When you delete a version of an artifact, it is marked for soft deletion in our database and removed from your storage cost. When you delete an entire artifact, it is queued for permanent deletion and all of its contents are removed from the W&B bucket. If you have specific needs around file deletion, please reach out to Customer Support.

For sensitive datasets that cannot reside in a multi-tenant environment, you can use either a private W&B server connected to your cloud bucket or reference artifacts. Reference artifacts track references to private buckets without sending file contents to W&B. Reference artifacts maintain links to files on your buckets or servers. In other words, W&B only keeps track of the metadata associated with the files and not the files themselves.

Create a reference artifact similar to how you create a non-reference artifact:

import wandb

run = wandb.init()
artifact = wandb.Artifact("animals", type="dataset")
artifact.add_reference("s3://my-bucket/animals")

For alternatives, contact us at contact@wandb.com to talk about private cloud and on-premises installations.

4.1.10 - Tutorial: Create, track, and use a dataset artifact

Artifacts quickstart shows how to create, track, and use a dataset artifact with W&B.

This walkthrough demonstrates how to create, track, and use a dataset artifact from W&B Runs.

1. Log into W&B

Import the W&B library and log in to W&B. You will need to sign up for a free W&B account if you have not done so already.

import wandb

wandb.login()

2. Initialize a run

Use the wandb.init() API to generate a background process to sync and log data as a W&B Run. Provide a project name and a job type:

# Create a W&B Run. Here we specify 'upload-dataset' as the job type since this
# example shows how to create a dataset artifact.
run = wandb.init(project="artifacts-example", job_type="upload-dataset")

3. Create an artifact object

Create an artifact object with the wandb.Artifact() API. Provide a name for the artifact and a description of the file type for the name and type parameters, respectively.

For example, the following code snippet demonstrates how to create an artifact called ‘bicycle-dataset’ with a ‘dataset’ label:

artifact = wandb.Artifact(name="bicycle-dataset", type="dataset")

For more information about how to construct an artifact, see Construct artifacts.

Add the dataset to the artifact

Add a file to the artifact. Common file types include models and datasets. The following example adds a dataset named dataset.h5 that is saved locally on our machine to the artifact:

# Add a file to the artifact's contents
artifact.add_file(local_path="dataset.h5")

Replace the filename dataset.h5 in the preceding code snippet with the path to the file you want to add to the artifact.

4. Log the dataset

Use the W&B run object’s log_artifact() method to both save your artifact version and declare the artifact as an output of the run.

# Save the artifact version to W&B and mark it
# as the output of this run
run.log_artifact(artifact)

A 'latest' alias is created by default when you log an artifact. For more information about artifact aliases and versions, see Create a custom alias and Create new artifact versions, respectively.

5. Download and use the artifact

The following code example demonstrates the steps you can take to use an artifact you have logged and saved to the W&B servers.

First, initialize a new run object with wandb.init().
Second, use the run object’s use_artifact() method to tell W&B what artifact to use. This returns an artifact object.
Third, use the artifact’s download() method to download the contents of the artifact.

# Create a W&B Run. Here we specify 'training' for 'job_type'
# because we will use this run to track training.
run = wandb.init(project="artifacts-example", job_type="training")

# Query W&B for an artifact and mark it as input to this run
artifact = run.use_artifact("bicycle-dataset:latest")

# Download the artifact's contents
artifact_dir = artifact.download()

Alternatively, you can use the Public API (wandb.Api) to export or update data already saved in W&B outside of a run. See Track external files for more information.

4.2 - Secrets

Overview of W&B secrets, how they work, and how to get started using them.

W&B Secret Manager allows you to securely and centrally store, manage, and inject secrets, which are sensitive strings such as access tokens, bearer tokens, API keys, or passwords. W&B Secret Manager removes the need to add sensitive strings directly to your code or when configuring a webhook’s header or payload.

Secrets are stored and managed in each team’s Secret Manager, in the Team secrets section of the team settings.

Only W&B Admins can create, edit, or delete a secret.
Secrets are included as a core part of W&B, including in W&B Server deployments that you host in Azure, GCP, or AWS. Connect with your W&B account team to discuss how you can use secrets in W&B if you use a different deployment type.
In W&B Server, you are responsible for configuring security measures that satisfy your security needs.
- W&B strongly recommends that you store secrets in a W&B instance of a cloud provider’s secrets manager provided by AWS, GCP, or Azure, which are configured with advanced security capabilities.
- W&B recommends against using a Kubernetes cluster as the backend of your secrets store unless you are unable to use a W&B instance of a cloud secrets manager (AWS, GCP, or Azure), and you understand how to prevent security vulnerabilities that can occur if you use a cluster.

Add a secret

To add a secret:

If the receiving service requires it to authenticate incoming webhooks, generate the required token or API key. If necessary, save the sensitive string securely, such as in a password manager.
Log in to W&B and go to the team’s Settings page.
In the Team Secrets section, click New secret.
Using letters, numbers, and underscores (_), provide a name for the secret.
Paste the sensitive string into the Secret field.
Click Add secret.

Specify the secrets you want to use for your webhook automation when you configure the webhook. See the Configure a webhook section for more information.

Once you create a secret, you can access that secret in a webhook automation’s payload using the format ${SECRET_NAME}.
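
When you assemble a webhook payload in code, a small helper can keep secret references consistent with the naming rule in the steps above. This is a sketch under stated assumptions: `secret_reference` is a hypothetical helper, the payload field names are invented for illustration, and the substitution of the real secret value is assumed to happen inside W&B, not in this code.

```python
import re

# Letters, numbers, and underscores only, per the naming step above.
_SECRET_NAME = re.compile(r"[A-Za-z0-9_]+")


def secret_reference(name):
    """Return the ${NAME} placeholder for a secret, validating the name first."""
    if not _SECRET_NAME.fullmatch(name):
        raise ValueError(f"invalid secret name: {name!r}")
    return "${" + name + "}"


# Hypothetical webhook payload; the field names are made up for illustration.
payload = {
    "event": "artifact_linked",
    "token": secret_reference("DEPLOY_TOKEN"),
}
print(payload["token"])
```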

Rotate a secret

To rotate a secret and update its value:

Click the pencil icon in the secret’s row to open the secret’s details.
Set Secret to the new value. Optionally click Reveal secret to verify the new value.
Click Add secret. The secret’s value updates and no longer resolves to the previous value.

After a secret is created or updated, you can no longer reveal its current value. Instead, rotate the secret to a new value.

Delete a secret

To delete a secret:

Click the trash icon in the secret’s row.
Read the confirmation dialog, then click Delete. The secret is deleted immediately and permanently.

Manage access to secrets

A team’s automations can use the team’s secrets. Before you remove a secret, update or remove automations that use it so they don’t stop working.

4.3 - Registry

W&B Registry is a curated central repository of W&B Artifact versions within your organization. Users who have permission within your organization can download and use artifacts, share, and collaboratively manage the lifecycle of all artifacts, regardless of the team that user belongs to.

You can use the Registry to track artifact versions, audit the history of an artifact’s usage and changes, ensure governance and compliance of your artifacts, and automate downstream processes such as model CI/CD.

In summary, use W&B Registry to:

Promote artifact versions that satisfy a machine learning task to other users in your organization.
Organize artifacts with tags so that you can find or reference specific artifacts.
Track an artifact’s lineage and audit the history of changes.
Automate downstream processes such as model CI/CD.
Limit who in your organization can access artifacts in each registry.

The preceding image shows the Registry App with “Model” and “Dataset” core registries along with custom registries.

Learn the basics

Each organization initially contains two registries that you can use to organize your model and dataset artifacts called Models and Datasets, respectively. You can create additional registries to organize other artifact types based on your organization’s needs.

Each registry consists of one or more collections. Each collection represents a distinct task or use case.

To add an artifact to a registry, you first log a specific artifact version to W&B. Each time you log an artifact, W&B automatically assigns a version to that artifact. Artifact versions use 0 indexing, so the first version is v0, the second version is v1, and so on.

Once you log an artifact to W&B, you can then link that specific artifact version to a collection in the registry.

The term “link” refers to pointers that connect where W&B stores the artifact and where the artifact is accessible in the registry. W&B does not duplicate artifacts when you link an artifact to a collection.

As an example, the following code shows how to log and link a model artifact called “my_model.txt” to a collection named “first-collection” in the core registry:

Initialize a W&B Run.
Log the artifact to W&B.
Specify the name of the collection and registry to link your artifact version to.
Link the artifact to the collection.

Save this Python code to a script and run it. W&B Python SDK version 0.18.6 or newer is required.

import wandb
import random

# Initialize a W&B Run to track the artifact
run = wandb.init(project="registry_quickstart") 

# Create a simulated model file so that you can log it
with open("my_model.txt", "w") as f:
    f.write("Model: " + str(random.random()))

# Log the artifact to W&B
logged_artifact = run.log_artifact(
    artifact_or_path="./my_model.txt", 
    name="gemma-finetuned", 
    type="model" # Specifies artifact type
)

# Specify the name of the collection and registry
# you want to publish the artifact to
COLLECTION_NAME = "first-collection"
REGISTRY_NAME = "model"

# Link the artifact to the registry
run.link_artifact(
    artifact=logged_artifact, 
    target_path=f"wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}"
)

W&B automatically creates a collection for you if the collection you specify in the returned run object’s link_artifact(target_path = "") method does not exist within the registry you specify.
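
If you link artifacts from several scripts, you might centralize how the target path is built. The sketch below only reproduces the string format used in the preceding example; `registry_target_path` is a hypothetical helper and is not part of the W&B SDK.

```python
def registry_target_path(registry_name, collection_name):
    """Build the target_path string used when linking to a registry collection."""
    if not registry_name or not collection_name:
        raise ValueError("registry_name and collection_name must be non-empty")
    return f"wandb-registry-{registry_name}/{collection_name}"


print(registry_target_path("model", "first-collection"))
```

You could pass the returned string as the target_path argument of run.link_artifact().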

The URL that your terminal prints directs you to the project where W&B stores your artifact.

Navigate to the Registry App to view artifact versions that you and other members of your organization publish. To do so, first navigate to W&B. Select Registry in the left sidebar below Applications. Select the “Model” registry. Within the registry, you should see the “first-collection” collection with your linked artifact version.

Once you link an artifact version to a collection within a registry, members of your organization can view, download, and manage your artifact versions, create downstream automations, and more if they have the proper permissions.

If an artifact version logs metrics (such as by using run.log_artifact()), you can view metrics for that version from its details page, and you can compare metrics across artifact versions from the collection’s page. Refer to View linked artifacts in a registry.

Enable W&B Registry

Based on your deployment type, satisfy the following conditions to enable W&B Registry:

Deployment type	How to enable
Multi-tenant Cloud	No action required. W&B Registry is available on the W&B App.
Dedicated Cloud	Contact your account team to enable W&B Registry for your deployment.
Self-Managed	Set the environment variable `ENABLE_REGISTRY_UI` to `true`. Refer to Configure environment variables. Requires Server v0.59.2 or newer.

Resources to get started

Depending on your use case, explore the following resources to get started with the W&B Registry:

Check out the tutorial video:
- Getting started with Registry from W&B
Take the W&B Model CI/CD course and learn how to:
- Use W&B Registry to manage and version your artifacts, track lineage, and promote models through different lifecycle stages.
- Automate your model management workflows using webhooks.
- Integrate the registry with external ML systems and tools for model evaluation, monitoring, and deployment.

Migrate from the legacy Model Registry to W&B Registry

The legacy Model Registry is scheduled for deprecation with the exact date not yet decided. Before deprecating the legacy Model Registry, W&B will migrate the contents of the legacy Model Registry to the W&B Registry.

See Migrating from legacy Model Registry for more information about the migration process from the legacy Model Registry to W&B Registry.

Until the migration occurs, W&B supports both the legacy Model Registry and the new Registry.

To view the legacy Model Registry, navigate to the Model Registry in the W&B App. A banner appears at the top of the page that enables you to use the legacy Model Registry App UI.

Reach out to support@wandb.com with any questions or to speak to the W&B Product Team about any concerns about the migration.

4.3.1 - Registry types

W&B supports two types of registries: Core registries and Custom registries.

Core registry

A core registry is a template for specific use cases: Models and Datasets.

By default, the Models registry is configured to accept "model" artifact types and the Dataset registry is configured to accept "dataset" artifact types. An admin can add additional accepted artifact types.

The preceding image shows the Models and the Dataset core registry along with a custom registry called Fine_Tuned_Models in the W&B Registry App UI.

A core registry has organization visibility. A registry admin cannot change the visibility of a core registry.

Custom registry

Custom registries are not restricted to "model" artifact types or "dataset" artifact types.

You can create a custom registry for each step in your machine learning pipeline, from initial data collection to final model deployment.

For example, you might create a registry called “Benchmark_Datasets” for organizing curated datasets to evaluate the performance of trained models. Within this registry, you might have a collection called “User_Query_Insurance_Answer_Test_Data” that contains a set of user questions and corresponding expert-validated answers that the model has never seen during training.

A custom registry can have either organization or restricted visibility. A registry admin can change the visibility of a custom registry from organization to restricted. However, the registry admin cannot change a custom registry’s visibility from restricted to organization.

For information on how to create a custom registry, see Create a custom registry.

Summary

The following table summarizes the differences between core and custom registries:

	Core	Custom
Visibility	Organization visibility only. Visibility cannot be altered.	Either organization or restricted. Visibility can be altered from organization to restricted visibility.
Metadata	Preconfigured and not editable by users.	Users can edit.
Artifact types	Preconfigured and accepted artifact types cannot be removed. Users can add additional accepted artifact types.	Admin can define accepted types.
Customization	Can add additional types to the existing list.	Edit registry name, description, visibility, and accepted artifact types.
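
The visibility rules in this section can be expressed as a short check. The following sketch restates them in code. `can_change_visibility` is a hypothetical helper, and the string values "core", "custom", "organization", and "restricted" are labels chosen for this example.

```python
def can_change_visibility(registry_kind, current, target):
    """Return True if a registry admin can make this visibility change."""
    if current == target:
        return True  # nothing to change
    if registry_kind == "core":
        return False  # core registries always have organization visibility
    # Custom registries can only move from organization to restricted.
    return (current, target) == ("organization", "restricted")


print(can_change_visibility("custom", "organization", "restricted"))
print(can_change_visibility("custom", "restricted", "organization"))
print(can_change_visibility("core", "organization", "restricted"))
```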

4.3.2 - Create a custom registry

A custom registry offers flexibility and control over the artifact types that you can use, allows you to restrict the registry’s visibility, and more.

See the summary table in Registry types for a complete comparison of core and custom registries.

Create a custom registry

To create a custom registry:

Navigate to the Registry App at https://wandb.ai/registry/.
Within Custom registry, click on the Create registry button.
Provide a name for your registry in the Name field.
Optionally provide a description about the registry.
Select who can view the registry from the Registry visibility dropdown. See Registry visibility types for more information on registry visibility options.
Select either All types or Specify types from the Accepted artifacts type dropdown.
(If you select Specify types) Add one or more artifact types that your registry accepts.
Click on the Create registry button.

An artifact type cannot be removed from a registry once it is saved in the registry’s settings.

For example, the following image shows a custom registry called Fine_Tuned_Models that a user is about to create. The registry is Restricted to only members that are manually added to the registry.

Visibility types

The visibility of a registry determines who can access that registry. Restricting the visibility of a custom registry helps ensure that only specified members can access that registry.

There are two registry visibility options for a custom registry:

Visibility	Description
Restricted	Only invited organization members can access the registry.
Organization	Everyone in the org can access the registry.

A team administrator or registry administrator can set the visibility of a custom registry.

The user who creates a custom registry with Restricted visibility is added to the registry automatically as its registry admin.

Configure the visibility of a custom registry

A team administrator or registry administrator can assign the visibility of a custom registry during or after the creation of a custom registry.

To restrict the visibility of an existing custom registry:

Navigate to the Registry App at https://wandb.ai/registry/.
Select a registry.
Click on the gear icon on the upper right hand corner.
From the Registry visibility dropdown, select the desired registry visibility.
If you select Restricted visibility:
1. Add members of your organization that you want to have access to this registry. Scroll to the Registry members and roles section and click on the Add member button.
2. Within the Member field, add the email or username of the member you want to add.
3. Click Add new member.


See Create a custom registry for more information on how to assign the visibility of a custom registry when a team administrator creates it.

4.3.3 - Configure registry access

A registry admin can configure registry roles, add users, or remove users from a registry by configuring the registry’s settings.

Manage users

Add a user or a team

Registry admins can add individual users or entire teams to a registry. To add a user or team to a registry:

Navigate to the Registry at https://wandb.ai/registry/.
Select the registry you want to add a user or team to.
Click on the gear icon on the upper right hand corner to access the registry settings.
In the Registry access section, click Add access.
Enter one or more usernames, emails, or team names in the Include users and teams field.
Click Add access.

Learn more about configuring user roles in a registry, or Registry role permissions.

Remove a user or team

A registry admin can remove individual users or entire teams from a registry. To remove a user or team from a registry:

Navigate to the Registry at https://wandb.ai/registry/.
Select the registry you want to remove a user from.
Click on the gear icon on the upper right hand corner to access the registry settings.
Navigate to the Registry access section and type in the username, email, or team you want to remove.
Click the Delete button.

Removing a user from a team also removes that user’s access to the registry.

Registry roles

Each user in a registry has a registry role, which determines what they can do in that registry.

W&B automatically assigns a default registry role to a user or team when they are added to a registry.

Entity	Default registry role
Team	Viewer
User (non admin)	Viewer
Org admin	Admin

A registry admin can assign or modify roles for users and teams in a registry. See Configure user roles in a registry for more information.

W&B role types

There are two different types of roles in W&B: Team roles and Registry roles.

Your role in a team has no impact or relationship to your role in any registry.

The following table lists the different roles a user can have and their permissions:

Permission	Permission Group	Viewer	Member	Admin
View a collection’s details	Read	X	X	X
View a linked artifact’s details	Read	X	X	X
Usage: Consume an artifact in a registry with use_artifact	Read	X	X	X
Download a linked artifact	Read	X	X	X
Download files from an artifact’s file viewer	Read	X	X	X
Search a registry	Read	X	X	X
View a registry’s settings and user list	Read	X	X	X
Create a new automation for a collection	Create		X	X
Turn on Slack notifications for new version being added	Create		X	X
Create a new collection	Create		X	X
Create a new custom registry	Create		X	X
Edit collection card (description)	Update		X	X
Edit linked artifact description	Update		X	X
Add or delete a collection’s tag	Update		X	X
Add or delete an alias from a linked artifact	Update		X	X
Link a new artifact	Update		X	X
Edit allowed types list for a registry	Update		X	X
Edit custom registry name	Update		X	X
Delete a collection	Delete		X	X
Delete an automation	Delete		X	X
Unlink an artifact from a registry	Delete		X	X
Edit accepted artifact types for a registry	Admin			X
Change registry visibility (Organization or Restricted)	Admin			X
Add users to a registry	Admin			X
Assign or change a user’s role in a registry	Admin			X
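
The table above can be read by permission group: every role can read, Members and Admins can also create, update, and delete, and only Admins have the Admin group. The following is a minimal sketch of that summary. The dictionary and the `is_allowed` helper are hypothetical names used for illustration; they are not part of the W&B SDK, and W&B enforces permissions on the server rather than in client code.

```python
# Hypothetical summary of the permission table; not part of the W&B SDK.
ROLES_BY_PERMISSION_GROUP = {
    "Read": {"Viewer", "Member", "Admin"},
    "Create": {"Member", "Admin"},
    "Update": {"Member", "Admin"},
    "Delete": {"Member", "Admin"},
    "Admin": {"Admin"},
}


def is_allowed(role: str, permission_group: str) -> bool:
    """Return True if the registry role has permissions in the given group."""
    return role in ROLES_BY_PERMISSION_GROUP[permission_group]


print(is_allowed("Viewer", "Read"))    # True
print(is_allowed("Viewer", "Update"))  # False
print(is_allowed("Member", "Admin"))   # False
```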

Inherited permissions

A user’s permission in a registry depends on the highest level of privilege assigned to that user, whether individually or by team membership.

For example, suppose a registry admin adds a user called Nico to Registry A and assigns them a Viewer registry role. A registry admin then adds a team called Foundation Model Team to Registry A and assigns Foundation Model Team a Member registry role.

Nico is a member of the Foundation Model Team, which is a Member of the Registry. Because Member has more permissions than Viewer, W&B grants Nico the Member role.

The following table demonstrates the highest level of permission in the event of a conflict between a user’s individual registry role and the registry role of a team they are a member of:

Team registry role	Individual registry role	Inherited registry role
Viewer	Viewer	Viewer
Member	Viewer	Member
Admin	Viewer	Admin

If there is a conflict, W&B displays the highest level of permissions next to the name of the user.
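
The inheritance rule can be sketched as taking the highest-ranked role among a user's individual role and the roles of their teams. This is an illustration under the assumption that roles are ordered Viewer, Member, Admin; `ROLE_RANK` and `inherited_role` are hypothetical names and not part of the W&B SDK.

```python
# Hypothetical illustration of inherited permissions; not part of the W&B SDK.
ROLE_RANK = {"Viewer": 0, "Member": 1, "Admin": 2}


def inherited_role(individual_role: str, team_roles: list[str]) -> str:
    """Return the highest-privilege role among a user's own role and their teams' roles."""
    return max([individual_role, *team_roles], key=ROLE_RANK.__getitem__)


# Nico is a Viewer individually but belongs to a team with the Member role.
print(inherited_role("Viewer", ["Member"]))  # Member
```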

For example, in the following image Alex inherits Member role privileges because they are a member of the smle-reg-team-1 team.

Configure registry roles

Navigate to the Registry at https://wandb.ai/registry/.
Select the registry you want to configure.
Click the gear icon on the upper right hand corner.
Scroll to the Registry members and roles section.
Within the Member field, search for the user or team you want to edit permissions for.
In the Registry role column, click the user’s role.
From the dropdown, select the role you want to assign to the user.

4.3.4 - Create a collection

A collection is a set of linked artifact versions within a registry. Each collection represents a distinct task or use case.

For example, within the core Dataset registry you might have multiple collections. Each collection contains a different dataset such as MNIST, CIFAR-10, or ImageNet.

As another example, you might have a registry called “chatbot” that contains a collection for model artifacts, another collection for dataset artifacts, and another collection for fine-tuned model artifacts.

How you organize a registry and its collections is up to you.

If you are familiar with W&B Model Registry, you might be aware of registered models. Registered models in the Model Registry are now referred to as collections in the W&B Registry.

Collection types

Each collection accepts one, and only one, type of artifact. The type you specify restricts what sort of artifacts you, and other members of your organization, can link to that collection.

You can think of artifact types as similar to data types in programming languages such as Python. In this analogy, a collection can store strings, integers, or floats but not a mix of these data types.

For example, suppose you create a collection that accepts “dataset” artifact types. This means that you can only link future artifact versions that have the type “dataset” to this collection. Similarly, you can only link artifacts of type “model” to a collection that accepts only model artifact types.
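
To make the one-type-per-collection rule concrete, here is a small in-memory model of it. The `Collection` class below is hypothetical and exists only for illustration; it is not a W&B SDK class, and in practice W&B performs this check when you link an artifact.

```python
# Hypothetical model of the one-type-per-collection rule; not a W&B SDK class.
class Collection:
    def __init__(self, name: str, accepted_type: str) -> None:
        self.name = name
        self.accepted_type = accepted_type
        self.linked: list[str] = []  # names of linked artifacts

    def link(self, artifact_name: str, artifact_type: str) -> None:
        """Link an artifact only if its type matches the collection's type."""
        if artifact_type != self.accepted_type:
            raise ValueError(
                f"Collection '{self.name}' accepts type '{self.accepted_type}', "
                f"got '{artifact_type}'"
            )
        self.linked.append(artifact_name)


datasets = Collection("benchmark_data", accepted_type="dataset")
datasets.link("mnist", "dataset")  # accepted: types match

try:
    datasets.link("resnet", "model")  # rejected: wrong artifact type
except ValueError as error:
    print(error)
```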

You specify an artifact’s type when you create that artifact object. Note the type field in wandb.Artifact():

import wandb

# Initialize a run
run = wandb.init(
  entity = "<team_entity>",
  project = "<project>"
  )

# Create an artifact object
artifact = wandb.Artifact(
    name="<artifact_name>", 
    type="<artifact_type>"
    )

When you create a collection, you can select from a list of predefined artifact types. The artifact types available to you depend on the registry that the collection belongs to.

Before you link an artifact to a collection or create a new collection, investigate the types of artifacts that collection accepts.

Check the types of artifact that a collection accepts

Before you link to a collection, inspect the artifact type that the collection accepts. You can inspect the artifact types that a collection accepts programmatically with the W&B Python SDK or interactively with the W&B App.

An error message appears if you try to link an artifact to a collection that does not accept that artifact type.

You can find the accepted artifact types on the registry card on the homepage or within a registry’s settings page.

For both methods, first navigate to your W&B Registry App.

Within the homepage of the Registry App, you can view the accepted artifact types by scrolling to the registry card of that registry. The gray horizontal ovals within the registry card list the artifact types that registry accepts.

For example, the preceding image shows multiple registry cards on the Registry App homepage. Within the Model registry card, you can see two artifact types: model and model-new.

To view accepted artifact types within a registry’s settings page:

Click on the registry card you want to view the settings for.
Click on the gear icon in the upper right corner.
Scroll to the Accepted artifact types field.

Programmatically view the artifact types that a registry accepts with the W&B Python SDK:

import wandb

registry_name = "<registry_name>"
artifact_types = wandb.Api().project(name=f"wandb-registry-{registry_name}").artifact_types()
# Build a list so the names are printed, not a generator object
print([artifact_type.name for artifact_type in artifact_types])

Note that you do not initialize a run with the preceding code snippet. This is because it is unnecessary to create a run if you are only querying the W&B API and not tracking an experiment, an artifact, and so on.

Once you know what type of artifact a collection accepts, you can create a collection.

Create a collection

Interactively or programmatically create a collection within a registry. You cannot change the type of artifact that a collection accepts after you create it.

Programmatically create a collection

Use the link_artifact() method of the run object that wandb.init() returns to link an artifact to a collection. Pass both the collection and the registry to the target_path parameter as a path of the form:

f"wandb-registry-{registry_name}/{collection_name}"

Where registry_name is the name of the registry and collection_name is the name of the collection. Be sure to prepend the prefix wandb-registry- to the registry name.

W&B automatically creates a collection for you if you try to link an artifact to a collection that does not exist. If you specify a collection that does exist, W&B links the artifact to the existing collection.
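
Because a forgotten prefix is an easy mistake, you might wrap the path construction in a small helper. This is a minimal sketch; `build_target_path` is a hypothetical helper name, not a W&B SDK function, and it only assembles the string described above.

```python
REGISTRY_PREFIX = "wandb-registry-"


def build_target_path(registry_name: str, collection_name: str) -> str:
    """Hypothetical helper: build the target_path string for link_artifact()."""
    return f"{REGISTRY_PREFIX}{registry_name}/{collection_name}"


print(build_target_path("Model", "Zoo_Classifier_Models"))
# wandb-registry-Model/Zoo_Classifier_Models
```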

The following code snippet shows how to programmatically create a collection. Be sure to replace the values enclosed in <> with your own:

import wandb

# Initialize a run
run = wandb.init(entity = "<team_entity>", project = "<project>")

# Create an artifact object
artifact = wandb.Artifact(
  name = "<artifact_name>",
  type = "<artifact_type>"
  )

registry_name = "<registry_name>"
collection_name = "<collection_name>"
target_path = f"wandb-registry-{registry_name}/{collection_name}"

# Link the artifact to a collection
run.link_artifact(artifact = artifact, target_path = target_path)

run.finish()

Interactively create a collection

The following steps describe how to create a collection within a registry using the W&B Registry App UI:

Navigate to the Registry App in the W&B App UI.
Select a registry.
Click on the Create collection button in the upper right hand corner.
Provide a name for your collection in the Name field.
Select a type from the Type dropdown. Or, if the registry enables custom artifact types, provide one or more artifact types that this collection accepts.
Optionally provide a description of your collection in the Description field.
Optionally add one or more tags in the Tags field.
Click Link version.
From the Project dropdown, select the project where your artifact is stored.
From the Artifact collection dropdown, select your artifact.
From the Version dropdown, select the artifact version you want to link to your collection.
Click on the Create collection button.

4.3.5 - Link an artifact version to a registry

Link artifact versions to a collection to make them available to other members in your organization.

When you link an artifact to a registry, this “publishes” that artifact to that registry. Any user that has access to that registry can access the linked artifact versions in the collection.

In other words, linking an artifact to a registry collection brings that artifact version from a private, project-level scope, to a shared organization level scope.

The term “type” refers to the artifact object’s type. When you create an artifact object (wandb.Artifact) or log an artifact (run.log_artifact()), you specify a type with the type parameter.

Link an artifact to a collection

Link an artifact version to a collection interactively or programmatically.

Before you link an artifact to a registry, check the types of artifacts that collection permits. For more information about collection types, see “Collection types” within Create a collection.

Based on your use case, follow the instructions described in the tabs below to link an artifact version.

If an artifact version logs metrics (such as by using run.log_artifact()), you can view metrics for that version from its details page, and you can compare metrics across artifact versions from the artifact’s page. Refer to View linked artifacts in a registry.

Watch a video demonstrating linking a version (8 min).

Programmatically link an artifact version to a collection with wandb.Run.link_artifact().

Before you link an artifact to a collection, ensure that the registry that the collection belongs to already exists. To check that the registry exists, navigate to the Registry app on the W&B App UI and search for the name of the registry.

Use the target_path parameter to specify the collection and registry you want to link the artifact version to. The target path consists of the prefix “wandb-registry-”, followed by the name of the registry, a forward slash, and the name of the collection:

wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}

Copy and paste the code snippet below to link an artifact version to a collection within an existing registry. Replace values enclosed in <> with your own:

import wandb

# Initialize a run
run = wandb.init(
  entity = "<team_entity>",
  project = "<project_name>"
)

# Create an artifact object
# The type parameter specifies both the type of the 
# artifact object and the collection type
artifact = wandb.Artifact(name = "<name>", type = "<type>")

# Add the file to the artifact object. 
# Specify the path to the file on your local machine.
artifact.add_file(local_path = "<local_path_to_artifact>")

# Specify the collection and registry to link the artifact to
REGISTRY_NAME = "<registry_name>"  
COLLECTION_NAME = "<collection_name>"
target_path=f"wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}"

# Link the artifact to the collection
run.link_artifact(artifact = artifact, target_path = target_path)

If you want to link an artifact version to the Model registry or the Dataset registry, set the artifact type to "model" or "dataset", respectively.

Navigate to the Registry App.
Hover your mouse next to the name of the collection you want to link an artifact version to.
Select the meatball menu icon (three horizontal dots) next to View details.
From the dropdown, select Link new version.
From the sidebar that appears, select the name of a team from the Team dropdown.
From the Project dropdown, select the name of the project that contains your artifact.
From the Artifact dropdown, select the name of the artifact.
From the Version dropdown, select the artifact version you want to link to the collection.

Navigate to your project’s artifact browser on the W&B App at: https://wandb.ai/<entity>/<project>/artifacts
Select the Artifacts icon on the left sidebar.
Click on the artifact version you want to link to your registry.
Within the Version overview section, click the Link to registry button.
From the modal that appears on the right of the screen, select an artifact from the Select a register model menu dropdown.
Click Next step.
(Optional) Select an alias from the Aliases dropdown.
Click Link to registry.

View a linked artifact’s metadata, version data, usage, lineage information and more in the Registry App.

View linked artifacts in a registry

View information about linked artifacts such as metadata, lineage, and usage information in the Registry App.

Navigate to the Registry App.
Select the name of the registry that you linked the artifact to.
Select the name of the collection.
If the collection’s artifacts log metrics, compare metrics across versions by clicking Show metrics.
From the list of artifact versions, select the version you want to access. Version numbers are incrementally assigned to each linked artifact version starting with v0.
To view details about an artifact version, click the version. From the tabs in this page, you can view that version’s metadata (including logged metrics), lineage, and usage information.

Make note of the Full Name field within the Version tab. The full name of a linked artifact consists of the registry, collection name, and the alias or index of the artifact version.

wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}:v{INTEGER}

You need the full name of a linked artifact to access the artifact version programmatically.
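
If you copy a full name and need its individual parts, you can split it back into the registry, the collection, and the version or alias. The sketch below assumes the full name follows exactly the template shown above, with no organization entity in front; `parse_full_name` is a hypothetical helper, not part of the W&B SDK.

```python
REGISTRY_PREFIX = "wandb-registry-"


def parse_full_name(full_name: str) -> tuple[str, str, str]:
    """Hypothetical helper: split a full name into (registry, collection, version)."""
    path, separator, version = full_name.rpartition(":")
    if not separator:
        raise ValueError(f"Missing ':<version or alias>' in {full_name!r}")
    registry_part, separator, collection = path.partition("/")
    if not separator or not registry_part.startswith(REGISTRY_PREFIX):
        raise ValueError(f"Expected 'wandb-registry-<registry>/<collection>' in {full_name!r}")
    return registry_part[len(REGISTRY_PREFIX):], collection, version


print(parse_full_name("wandb-registry-Model/Zoo_Classifier_Models:v2"))
# ('Model', 'Zoo_Classifier_Models', 'v2')
```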

Troubleshooting

Below are some common things to double check if you are not able to link an artifact.

Logging artifacts from a personal account

Artifacts logged to W&B with a personal entity cannot be linked to the registry. Make sure that you log artifacts using a team entity within your organization. Only artifacts logged within an organization’s team can be linked to the organization’s registry.

Ensure that you log an artifact with a team entity if you want to link that artifact to a registry.

Find your team entity

W&B uses the name of your team as the team’s entity. For example, if your team is called team-awesome, your team entity is team-awesome.

To confirm the name of your team:

Navigate to your team’s W&B profile page.
Copy the site’s URL. It has the form https://wandb.ai/<team>, where <team> is both the name of your team and the team’s entity.

Log from a team entity

Specify the team as the entity when you initialize a run with wandb.init(). If you do not specify the entity when you initialize a run, the run uses your default entity which may or may not be your team entity.

import wandb   

run = wandb.init(
  entity='<team_entity>', 
  project='<project_name>'
  )

Log the artifact to the run either with run.log_artifact or by creating an Artifact object and then adding files to it with:
```
artifact = wandb.Artifact(name="<artifact_name>", type="<type>")
```
To log artifacts, see Construct artifacts.
If an artifact is logged to your personal entity, you will need to re-log it to an entity within your organization.

Confirm the path of a registry in the W&B App UI

There are two ways to confirm the path of a registry with the UI: create an empty collection and view the collection details or copy and paste the autogenerated code on the collection’s homepage.

Copy and paste autogenerated code

Navigate to the Registry app at https://wandb.ai/registry/.
Click the registry you want to link an artifact to.
At the top of the page, you will see an autogenerated code block.
Copy and paste this into your code. Be sure to replace the last part of the path with the name of your collection.

Create an empty collection

Navigate to the Registry app at https://wandb.ai/registry/.
Click the registry you want to link an artifact to.
Click on the empty collection. If an empty collection does not exist, create a new collection.
Within the code snippet that appears, identify the target_path field within .link_artifact().
(Optional) Delete the collection.

For example, after completing the steps outlined, you find the code block with the target_path parameter:

target_path = "smle-registries-bug-bash/wandb-registry-Golden Datasets/raw_images"

Breaking this down into its components, you can see what you will need to use to create the path to link your artifact programmatically:

ORG_ENTITY_NAME = "smle-registries-bug-bash"
REGISTRY_NAME = "Golden Datasets"
COLLECTION_NAME = "raw_images"

Ensure that you replace the name of the collection from the temporary collection with the name of the collection that you want to link your artifact to.
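
The same breakdown can be done in code. The following sketch splits the example path from this section into its three components; it assumes the path has exactly the `<org entity>/wandb-registry-<registry>/<collection>` shape shown in the example and uses only standard Python string methods (`str.removeprefix` requires Python 3.9 or later).

```python
# Example path taken from the autogenerated snippet in this section
target_path = "smle-registries-bug-bash/wandb-registry-Golden Datasets/raw_images"

# Split into the organization entity, the prefixed registry name, and the collection
org_entity_name, registry_part, collection_name = target_path.split("/")

# Drop the "wandb-registry-" prefix to recover the plain registry name
registry_name = registry_part.removeprefix("wandb-registry-")

print(org_entity_name)  # smle-registries-bug-bash
print(registry_name)    # Golden Datasets
print(collection_name)  # raw_images
```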

4.3.6 - Reference an artifact version with aliases

Reference a specific artifact version with one or more aliases. W&B automatically assigns aliases to each artifact you link with the same name. You can also create one or more custom aliases to reference a specific artifact version.

Aliases appear as rectangles with the name of that alias in the rectangle in the Registry UI. If an alias is protected, it appears as a gray rectangle with a lock icon. Otherwise, the alias appears as an orange rectangle. Aliases are not shared across registries.

When to use an alias versus using a tag

Use an alias to reference a specific artifact version. Each alias within a collection is unique. Only one artifact version can have a specific alias at a time.

Use tags to organize and group artifact versions or collections based on a common theme. Multiple artifact versions and collections can share the same tag.

When you add an alias to an artifact version, you can optionally start a Registry automation to notify a Slack channel or trigger a webhook.

Default aliases

W&B automatically assigns the following aliases to each artifact version you link with the same name:

The latest alias to the most recent artifact version you link to a collection.
A unique version number. W&B counts each artifact version (zero indexing) you link. W&B uses the count number to assign a unique version number to that artifact.

For example, if you link an artifact named zoo_model three times, W&B creates three aliases v0, v1, and v2 respectively. v2 also has the latest alias.
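
The behavior of the default aliases can be sketched with a small in-memory simulation: each link gets the next zero-indexed version alias, and latest moves to the newest version. The `CollectionAliases` class is hypothetical and only models the rule described above; it does not call W&B.

```python
class CollectionAliases:
    """Hypothetical in-memory model of default alias assignment."""

    def __init__(self) -> None:
        # self.versions[i] holds the aliases of the i-th linked version
        self.versions: list[set[str]] = []

    def link(self) -> str:
        """Simulate linking one more artifact version; return its version alias."""
        # Only one version can hold "latest", so remove it from older versions
        for aliases in self.versions:
            aliases.discard("latest")
        version_alias = f"v{len(self.versions)}"
        self.versions.append({version_alias, "latest"})
        return version_alias


zoo_model = CollectionAliases()
for _ in range(3):
    zoo_model.link()

print([sorted(aliases) for aliases in zoo_model.versions])
# [['v0'], ['v1'], ['latest', 'v2']]
```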

Custom aliases

Create one or more custom aliases for a specific artifact version based on your unique use case. For example:

You might use aliases such as dataset_version_v0, dataset_version_v1, and dataset_version_v2 to identify which dataset a model was trained on.
You might use a best_model alias to keep track of the best performing artifact model version.

Any user with a Member or Admin registry role on a registry can add or remove a custom alias from a linked artifact in that registry. If appropriate, use protected aliases to label and identify which artifact versions to protect from modification or deletion.

You can create a custom alias with the W&B Registry or the Python SDK. Based on your use case, click on a tab below that best fits your needs.

Navigate to the W&B Registry.
Click the View details button in a collection.
Within the Versions section, click the View button for a specific artifact version.
Click the + button to add one or more aliases next to the Aliases field.

When you link an artifact version to a collection with the Python SDK, you can optionally provide a list of one or more aliases as an argument to the aliases parameter in link_artifact(). W&B creates an alias (non-protected alias) for you if the alias you provide does not already exist.

The following code snippet demonstrates how to link an artifact version to a collection and add aliases to that artifact version with the Python SDK. Replace values within <> with your own:

import wandb

# Initialize a run
run = wandb.init(entity = "<team_entity>", project = "<project_name>")

# Create an artifact object
# The type parameter specifies both the type of the 
# artifact object and the collection type
artifact = wandb.Artifact(name = "<name>", type = "<type>")

# Add the file to the artifact object. 
# Specify the path to the file on your local machine.
artifact.add_file(local_path = "<local_path_to_artifact>")

# Specify the collection and registry to link the artifact to
REGISTRY_NAME = "<registry_name>"
COLLECTION_NAME = "<collection_name>"
target_path=f"wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}"

# Link the artifact version to the collection
# Add one or more aliases to this artifact version
run.link_artifact(
    artifact = artifact, 
    target_path = target_path, 
    aliases = ["<alias_1>", "<alias_2>"]
    )

Protected aliases

Use a protected alias to both label and identify artifact versions that should not be modified or deleted. For example, consider using a production protected alias to label and identify artifact versions that are used in your organization’s machine learning production pipeline.

Registry admins and service accounts with the Admin role can create protected aliases and add or remove protected aliases from an artifact version. Members and Viewers cannot unlink a protected version or delete a collection that contains a protected alias. See Configure registry access for details.

Common protected aliases include:

Production: The artifact version is ready for production use.
Staging: The artifact version is ready for testing.

Create a protected alias

The following steps describe how to create a protected alias in the W&B Registry UI:

Navigate to the Registry App.
Select a registry.
Click the gear button on the top right of the page to view the registry’s settings.
Within the Protected Aliases section, click the + button to add one or more protected aliases.

After creation, each protected alias appears as a gray rectangle with a lock icon in the Protected Aliases section.

Unlike custom aliases that are not protected, creating protected aliases is available exclusively in the W&B Registry UI and not programmatically with the Python SDK. To add a protected alias to an artifact version, you can use the W&B Registry UI or the Python SDK.

The following steps describe how to add a protected alias to an artifact version with the W&B Registry UI:

Navigate to the W&B Registry.
Click the View details button in a collection.
Within the Versions section, select the View button for a specific artifact version.
Click the + button to add one or more protected aliases next to the Aliases field.

After a protected alias is created, an admin can add it to an artifact version programmatically with the Python SDK. See the W&B Registry and Python SDK tabs in the Custom aliases section above for an example of how to add a protected alias to an artifact version.

Find existing aliases

You can find existing aliases with the global search bar in the W&B Registry. To find a protected alias:

Navigate to the W&B Registry App.
Specify the search term in the search bar at the top of the page. Press Enter to search.

Search results appear below the search bar if the term you specify matches an existing registry, collection name, artifact version tag, collection tag, or alias.

Example

The following code example is a continuation of the W&B Registry Tutorial. To use the following code, you must first retrieve and process the Zoo dataset as described in the notebook. Once you have the Zoo dataset, you can create an artifact version and add custom aliases to it.

The following code snippet shows how to create an artifact version and add custom aliases to it. The example uses the Zoo dataset from the UCI Machine Learning Repository and the Zoo_Classifier_Models collection in the Model registry.

import wandb

# Initialize a run
run = wandb.init(entity = "smle-reg-team-2", project = "zoo_experiment")

# Create an artifact object
# The type parameter specifies both the type of the 
# artifact object and the collection type
artifact = wandb.Artifact(name = "zoo_dataset", type = "dataset")

# Add the file to the artifact object. 
# Specify the path to the file on your local machine.
artifact.add_file(local_path="zoo_dataset.pt", name="zoo_dataset")
artifact.add_file(local_path="zoo_labels.pt", name="zoo_labels")

# Specify the collection and registry to link the artifact to
REGISTRY_NAME = "Model"
COLLECTION_NAME = "Zoo_Classifier_Models"
target_path=f"wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}"

# Link the artifact version to the collection
# Add one or more aliases to this artifact version
run.link_artifact(
    artifact = artifact,
    target_path = target_path,
    aliases = ["production-us", "production-eu"]
    )

First, you create an artifact object (wandb.Artifact()).
Next, you add two dataset PyTorch tensors to the artifact object with wandb.Artifact.add_file().
Lastly, you link the artifact version to the Zoo_Classifier_Models collection in the Model registry with link_artifact(). You also add two custom aliases to the artifact version by passing production-us and production-eu as arguments to the aliases parameter.

4.3.7 - Download an artifact from a registry

Use the W&B Python SDK to download an artifact linked to a registry. To download and use an artifact, you need to know the name of the registry, the name of the collection, and the alias or index of the artifact version you want to download.

Once you know the properties of the artifact, you can construct the path to the linked artifact and download the artifact. Alternatively, you can copy and paste a pre-generated code snippet from the W&B App UI to download an artifact linked to a registry.

Construct path to linked artifact

To download an artifact linked to a registry, you must know the path of that linked artifact. The path consists of the registry name, collection name, and the alias or index of the artifact version you want to access.

Once you have the registry, collection, and alias or index of the artifact version, you can construct the path to the linked artifact using the following string template:

# Artifact name with version index specified
f"wandb-registry-{REGISTRY}/{COLLECTION}:v{INDEX}"

# Artifact name with alias specified
f"wandb-registry-{REGISTRY}/{COLLECTION}:{ALIAS}"

Replace the values within the curly braces {} with the name of the registry, collection, and the alias or index of the artifact version you want to access.
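
The two templates can be combined into one helper that accepts either a version index or an alias. This is a sketch only; `linked_artifact_path` is a hypothetical helper name, not a W&B SDK function, and it does nothing beyond assembling the string.

```python
from typing import Optional


def linked_artifact_path(
    registry: str,
    collection: str,
    *,
    index: Optional[int] = None,
    alias: Optional[str] = None,
) -> str:
    """Hypothetical helper: build the path of a linked artifact version."""
    # Require exactly one of index or alias
    if (index is None) == (alias is None):
        raise ValueError("Specify exactly one of index or alias")
    version = f"v{index}" if index is not None else alias
    return f"wandb-registry-{registry}/{collection}:{version}"


print(linked_artifact_path("Fine-tuned Models", "phi3-finetuned", alias="production"))
# wandb-registry-Fine-tuned Models/phi3-finetuned:production
print(linked_artifact_path("Fine-tuned Models", "phi3-finetuned", index=0))
# wandb-registry-Fine-tuned Models/phi3-finetuned:v0
```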

Specify model or dataset to link an artifact version to the core Model registry or the core Dataset registry, respectively.

Once you have the path of the linked artifact, use the use_artifact() method of your run object to access the artifact and download its contents. The following code snippet shows how to use and download an artifact linked to the W&B Registry. Be sure to replace values within <> with your own:

import wandb

REGISTRY = '<registry_name>'
COLLECTION = '<collection_name>'
ALIAS = '<artifact_alias>'

run = wandb.init(
   entity = '<team_name>',
   project = '<project_name>'
   )  

artifact_name = f"wandb-registry-{REGISTRY}/{COLLECTION}:{ALIAS}"
# artifact_name = '<artifact_name>' # Copy and paste Full name specified on the Registry App
fetched_artifact = run.use_artifact(artifact_or_name = artifact_name)  
download_path = fetched_artifact.download()

The use_artifact() method marks the artifact you download as an input to the run that wandb.init() creates. Marking an artifact as the input to a run enables W&B to track the lineage of that artifact.

If you do not want to create a run, you can use the wandb.Api() object to access the artifact:

import wandb

REGISTRY = "<registry_name>"
COLLECTION = "<collection_name>"
VERSION = "<version>"

api = wandb.Api()
artifact_name = f"wandb-registry-{REGISTRY}/{COLLECTION}:{VERSION}"
artifact = api.artifact(name = artifact_name)

Example: Use and download an artifact linked to the W&B Registry

The following code example shows how a user can download an artifact linked to a collection called phi3-finetuned in the Fine-tuned Models registry. The alias of the artifact version is set to production.

import wandb

TEAM_ENTITY = "product-team-applications"
PROJECT_NAME = "user-stories"

REGISTRY = "Fine-tuned Models"
COLLECTION = "phi3-finetuned"
ALIAS = 'production'

# Initialize a run inside the specified team and project
run = wandb.init(entity=TEAM_ENTITY, project = PROJECT_NAME)

artifact_name = f"wandb-registry-{REGISTRY}/{COLLECTION}:{ALIAS}"

# Access an artifact and mark it as input to your run for lineage tracking
fetched_artifact = run.use_artifact(artifact_or_name = artifact_name)

# Download artifact. Returns path to downloaded contents
downloaded_path = fetched_artifact.download()

See use_artifact and Artifact.download() in the API Reference for parameters and return type.

Users with a personal entity that belong to multiple organizations

Users with a personal entity that belong to multiple organizations must also specify either the name of their organization or use a team entity when accessing artifacts linked to a registry.

import wandb

REGISTRY = "<registry_name>"
COLLECTION = "<collection_name>"
VERSION = "<version>"

# Ensure you are using your team entity to instantiate the API
api = wandb.Api(overrides={"entity": "<team-entity>"})
artifact_name = f"wandb-registry-{REGISTRY}/{COLLECTION}:{VERSION}"
artifact = api.artifact(name = artifact_name)

# Alternatively, use the org display name or org entity in the path
ORG_NAME = "<org_name>"
api = wandb.Api()
artifact_name = f"{ORG_NAME}/wandb-registry-{REGISTRY}/{COLLECTION}:{VERSION}"
artifact = api.artifact(name = artifact_name)

Where the ORG_NAME is the display name of your organization. Multi-tenant SaaS users can find the name of their organization in the organization’s settings page at https://wandb.ai/account-settings/. Dedicated Cloud and Self-Managed users, contact your account administrator to confirm your organization’s display name.

Copy and paste pre-generated code snippet

W&B creates a code snippet that you can copy and paste into your Python script, notebook, or terminal to download an artifact linked to a registry.

Navigate to the Registry App.
Select the name of the registry that contains your artifact.
Select the name of the collection.
From the list of artifact versions, select the version you want to access.
Select the Usage tab.
Copy the code snippet shown in the Usage API section.
Paste the code snippet into your Python script, notebook, or terminal.

4.3.8 - Find registry items

Use the global search bar in the W&B Registry App to find a registry, collection, artifact version tag, collection tag, or alias. You can use MongoDB-style queries to filter registries, collections, and artifact versions based on specific criteria using the W&B Python SDK.

Only items that you have permission to view appear in the search results.

Search for registry items

To search for a registry item:

Navigate to the W&B Registry App.
Specify the search term in the search bar at the top of the page. Press Enter to search.

Search results appear below the search bar if the term you specify matches an existing registry, collection name, artifact version tag, collection tag, or alias.

Query registry items with MongoDB-style queries

Use wandb.Api().registries() with query predicates to filter registries, collections, and artifact versions based on one or more MongoDB-style queries.

The following table lists query names you can use based on the type of item you want to filter:

item type	query name
registries	`name`, `description`, `created_at`, `updated_at`
collections	`name`, `tag`, `description`, `created_at`, `updated_at`
versions	`tag`, `alias`, `created_at`, `updated_at`, `metadata`
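
Before sending a filter, it can help to check its field names against the table above. The sketch below is a hypothetical helper, not part of the wandb SDK. It assumes that keys starting with $ are operators holding a list of sub-filters, and that a dotted key (MongoDB-style, such as a nested metadata field) should be checked by its first segment; whether W&B accepts dotted keys is an assumption here.

```python
# Field names from the table above, keyed by item type
ALLOWED_FIELDS = {
    "registries": {"name", "description", "created_at", "updated_at"},
    "collections": {"name", "tag", "description", "created_at", "updated_at"},
    "versions": {"tag", "alias", "created_at", "updated_at", "metadata"},
}


def unknown_filter_fields(item_type, filter_dict):
    """Return the field names in `filter_dict` that the table does not list."""
    allowed = ALLOWED_FIELDS[item_type]
    unknown = set()
    for key, value in filter_dict.items():
        if key.startswith("$"):
            # Logical operators such as $or hold a list of sub-filters
            if isinstance(value, list):
                for sub_filter in value:
                    if isinstance(sub_filter, dict):
                        unknown |= unknown_filter_fields(item_type, sub_filter)
        elif key.split(".")[0] not in allowed:
            # Assumption: dotted keys are checked by their first segment
            unknown.add(key)
    return unknown


print(unknown_filter_fields("collections", {"name": {"$regex": "yolo"}, "tag": "cnn"}))
print(unknown_filter_fields("versions", {"$or": [{"tag": "a"}, {"name": "x"}]}))
```

An empty set means every field name in the filter appears in the table for that item type.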

The following code examples demonstrate some common search scenarios.

To use the wandb.Api().registries() method, first import the W&B Python SDK (wandb) library:

import wandb

# (Optional) Create an instance of the wandb.Api() class for readability
api = wandb.Api()

Filter all registries that contain the string model:

# Filter all registries that contain the string `model`
registry_filters = {
    "name": {"$regex": "model"}
}

# Returns an iterable of all registries that match the filters
registries = api.registries(filter=registry_filters)

Filter all collections, independent of registry, that contain the string yolo in the collection name:

# Filter all collections, independent of registry, that
# contain the string `yolo` in the collection name
collection_filters = {
    "name": {"$regex": "yolo"}
}

# Returns an iterable of all collections that match the filters
collections = api.registries().collections(filter=collection_filters)

Filter all collections, independent of registry, that contain the string yolo in the collection name and have cnn as a tag:

# Filter all collections, independent of registry, that contain the
# string `yolo` in the collection name and have `cnn` as a tag
collection_filters = {
    "name": {"$regex": "yolo"},
    "tag": "cnn"
}

# Returns an iterable of all collections that match the filters
collections = api.registries().collections(filter=collection_filters)

Find all artifact versions in registries whose names contain the string model and that have either the tag image-classification or the production alias:

# Find all artifact versions in registries whose names contain the string
# `model` and that have either the tag `image-classification` or the
# `production` alias
registry_filters = {
    "name": {"$regex": "model"}
}

# Use logical $or operator to filter artifact versions
version_filters = {
    "$or": [
        {"tag": "image-classification"},
        {"alias": "production"}
    ]
}

# Returns an iterable of all artifact versions that match the filters
artifacts = api.registries(filter=registry_filters).collections().versions(filter=version_filters)

See the MongoDB documentation for more information on logical query operators.

Each item in the artifacts iterable in the previous code snippet is an instance of the Artifact class. This means that you can access each artifact’s attributes, such as name, collection, aliases, tags, created_at, and more:

for art in artifacts:
    print(f"artifact name: {art.name}")
    print(f"collection artifact belongs to: { art.collection.name}")
    print(f"artifact aliases: {art.aliases}")
    print(f"tags attached to artifact: {art.tags}")
    print(f"artifact created at: {art.created_at}\n")

For a complete list of an artifact object’s attributes, see the Artifacts Class in the API Reference docs.

Filter all artifact versions, independent of registry or collection, that have the latest alias and were created between 2024-01-08 and 2025-03-04 at 13:10 UTC:

# Find all artifact versions that have the `latest` alias and were created
# between 2024-01-08 and 2025-03-04 at 13:10 UTC

artifact_filters = {
    "alias": "latest",
    "created_at" : {"$gte": "2024-01-08", "$lte": "2025-03-04 13:10:00"},
}

# Returns an iterable of all artifact versions that match the filters
artifacts = api.registries().collections().versions(filter=artifact_filters)

Specify the date and time in the format YYYY-MM-DD HH:MM:SS. You can omit the hours, minutes, and seconds if you want to filter by date only.

See the MongoDB documentation for more information on query comparisons.
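
If the date bounds start out as datetime objects, they can be formatted into the YYYY-MM-DD HH:MM:SS strings described above. This is a minimal sketch: the helper name created_between is hypothetical, and it assumes the datetime values are already expressed in UTC.

```python
from datetime import datetime


def created_between(start, end):
    """Build a `created_at` range filter from two datetime objects."""
    fmt = "%Y-%m-%d %H:%M:%S"
    return {
        "created_at": {
            "$gte": start.strftime(fmt),
            "$lte": end.strftime(fmt),
        }
    }


artifact_filters = created_between(
    datetime(2024, 1, 8), datetime(2025, 3, 4, 13, 10)
)
print(artifact_filters)
```

The resulting dictionary has the same shape as the created_at entry in the example above and can be merged with other filter keys, such as alias, before it is passed as the filter argument.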

4.3.9 - Organize versions with tags

Use tags to organize collections or artifact versions within your registry. You can add, modify, view, or remove tags on a collection or artifact version with the W&B App UI or the W&B Python SDK.

When to use a tag versus using an alias

Use aliases when you need to reference a specific artifact version uniquely. For example, use an alias such as ‘production’ or ‘latest’ to ensure that artifact_name:alias always points to a single, specific version.

Use tags when you want more flexibility for grouping or searching. Tags are ideal when multiple versions or collections can share the same label, and you don’t need the guarantee that only one version is associated with a specific identifier.
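
The difference can be pictured with a toy model in which an alias maps to exactly one version while a tag maps to any number of versions. This sketch only illustrates the concept; it is not how W&B stores aliases or tags, and the names set_alias and add_tag are hypothetical.

```python
from collections import defaultdict

aliases = {}             # alias -> exactly one version
tags = defaultdict(set)  # tag -> any number of versions


def set_alias(alias, version):
    # Reassigning an alias moves it; the previous version loses it
    aliases[alias] = version


def add_tag(tag, version):
    # Adding a tag never removes it from other versions
    tags[tag].add(version)


set_alias("production", "v1")
set_alias("production", "v2")  # "production" now points only at v2
add_tag("cnn", "v1")
add_tag("cnn", "v2")           # both versions carry the "cnn" tag

print(aliases["production"])
print(sorted(tags["cnn"]))
```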

Add a tag to a collection

Use the W&B App UI or Python SDK to add a tag to a collection:

Use the W&B App UI to add a tag to a collection:

Navigate to the W&B Registry App.
Click on a registry card
Click View details next to the name of a collection
Within the collection card, click on the plus icon (+) next to the Tags field and type in the name of the tag
Press Enter on your keyboard

Use the W&B Python SDK to add a tag to a collection:

import wandb

COLLECTION_TYPE = "<collection_type>"
ORG_NAME = "<org_name>"
REGISTRY_NAME = "<registry_name>"
COLLECTION_NAME = "<collection_name>"

full_name = f"{ORG_NAME}/wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}"

collection = wandb.Api().artifact_collection(
  type_name = COLLECTION_TYPE, 
  name = full_name
  )

collection.tags = ["your-tag"]
collection.save()

Update tags that belong to a collection

Update tags programmatically by reassigning or mutating the tags attribute. W&B recommends reassigning the tags attribute rather than mutating it in place, which is also good Python practice.

For example, the following code snippet shows common ways to update a list with reassignment. For brevity, we continue the code example from the Add a tag to a collection section:

collection.tags = [*collection.tags, "new-tag", "other-tag"]
collection.tags = collection.tags + ["new-tag", "other-tag"]

collection.tags = set(collection.tags) - set(tags_to_delete)
collection.tags = []  # deletes all tags

The following code snippet shows how you can use in-place mutation to update tags that belong to a collection:

collection.tags += ["new-tag", "other-tag"]
collection.tags.append("new-tag")

collection.tags.extend(["new-tag", "other-tag"])
collection.tags[:] = ["new-tag", "other-tag"]
collection.tags.remove("existing-tag")
collection.tags.pop()
collection.tags.clear()
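
The reassignment pattern can be wrapped in small helpers that always return a new list and leave the original untouched. This is a minimal sketch with hypothetical helper names; unlike the set-based example above, it preserves tag order and skips duplicates.

```python
def with_tags_added(current, new_tags):
    """Return a new list: `current` plus any tags not already present."""
    result = list(current)
    for tag in new_tags:
        if tag not in result:
            result.append(tag)
    return result


def with_tags_removed(current, tags_to_delete):
    """Return a new list without the tags in `tags_to_delete`."""
    to_delete = set(tags_to_delete)
    return [tag for tag in current if tag not in to_delete]


existing = ["cnn", "baseline"]
print(with_tags_added(existing, ["baseline", "new-tag"]))
print(with_tags_removed(existing, ["cnn"]))
print(existing)  # The original list is unchanged
```

To apply the result, assign it back to the tags attribute, for example collection.tags = with_tags_added(collection.tags, ["new-tag"]), and then call collection.save().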

View tags that belong to a collection

Use the W&B App UI to view tags added to a collection:

Navigate to the W&B Registry App.
Click on a registry card
Click View details next to the name of a collection

If a collection has one or more tags, you can view those tags within the collection card next to the Tags field.

Tags added to a collection also appear next to the name of that collection.

For example, in the following image, a tag called “tag1” was added to the “zoo-dataset-tensors” collection.

Remove a tag from a collection

Use the W&B App UI to remove a tag from a collection:

Navigate to the W&B Registry App.
Click on a registry card
Click View details next to the name of a collection
Within the collection card, hover your mouse over the name of the tag you want to remove
Click on the cancel button (X icon)

Add a tag to an artifact version

Add a tag to an artifact version linked to a collection with the W&B App UI or with the Python SDK.

Navigate to the W&B Registry at https://wandb.ai/registry
Click on a registry card
Click View details next to the name of the collection you want to add a tag to
Scroll down to Versions
Click View next to an artifact version
Within the Version tab, click on the plus icon (+) next to the Tags field and type in the name of the tag
Press Enter on your keyboard

Fetch the artifact version whose tags you want to add or update. Once you have the artifact version, use the artifact object’s tags attribute to add or modify its tags. Assign one or more tags as a list to the artifact’s tags attribute.

Like other artifacts, you can fetch an artifact from W&B without creating a run, or you can create a run and fetch the artifact within that run. In either case, make sure to call the artifact object’s save method to update the artifact on the W&B servers.

Copy and paste the appropriate code cell below to add or modify an artifact version’s tags. Replace the values in <> with your own.

The following code snippet shows how to fetch an artifact and add a tag without creating a new run:

import wandb

ARTIFACT_TYPE = "<TYPE>"
ORG_NAME = "<org_name>"
REGISTRY_NAME = "<registry_name>"
COLLECTION_NAME = "<collection_name>"
VERSION = "<artifact_version>"

artifact_name = f"{ORG_NAME}/wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}:v{VERSION}"

artifact = wandb.Api().artifact(name = artifact_name, type = ARTIFACT_TYPE)
artifact.tags = ["tag2"] # Provide one or more tags in a list
artifact.save()

The following code snippet shows how to fetch an artifact and add a tag by creating a new run:

import wandb

ORG_NAME = "<org_name>"
REGISTRY_NAME = "<registry_name>"
COLLECTION_NAME = "<collection_name>"
VERSION = "<artifact_version>"

run = wandb.init(entity = "<entity>", project="<project>")

artifact_name = f"{ORG_NAME}/wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}:v{VERSION}"

artifact = run.use_artifact(artifact_or_name = artifact_name)
artifact.tags = ["tag2"] # Provide one or more tags in a list
artifact.save()

Update tags that belong to an artifact version

Update tags programmatically by reassigning or mutating the tags attribute. W&B recommends reassigning the tags attribute rather than mutating it in place, which is also good Python practice.

For example, the following code snippet shows common ways to update a list with reassignment. For brevity, we continue the code example from the Add a tag to an artifact version section:

artifact.tags = [*artifact.tags, "new-tag", "other-tag"]
artifact.tags = artifact.tags + ["new-tag", "other-tag"]

artifact.tags = set(artifact.tags) - set(tags_to_delete)
artifact.tags = []  # deletes all tags

The following code snippet shows how you can use in-place mutation to update tags that belong to an artifact version:

artifact.tags += ["new-tag", "other-tag"]
artifact.tags.append("new-tag")

artifact.tags.extend(["new-tag", "other-tag"])
artifact.tags[:] = ["new-tag", "other-tag"]
artifact.tags.remove("existing-tag")
artifact.tags.pop()
artifact.tags.clear()

View tags that belong to an artifact version

View tags that belong to an artifact version that is linked to a registry with the W&B App UI or with the Python SDK.

Navigate to the W&B Registry App.
Click on a registry card
Click View details next to the name of the collection that contains the artifact version
Scroll down to the Versions section

If an artifact version has one or more tags, you can view those tags within the Tags column.

Fetch the artifact version to view its tags. Once you have the artifact version, you can view the tags that belong to that artifact through the artifact object’s tags attribute.

Like other artifacts, you can fetch an artifact from W&B without creating a run, or you can create a run and fetch the artifact within that run.

Copy and paste the appropriate code cell below to view an artifact version’s tags. Replace the values in <> with your own.

The following code snippet shows how to fetch and view an artifact version’s tags without creating a new run:

import wandb

ARTIFACT_TYPE = "<TYPE>"
ORG_NAME = "<org_name>"
REGISTRY_NAME = "<registry_name>"
COLLECTION_NAME = "<collection_name>"
VERSION = "<artifact_version>"

artifact_name = f"{ORG_NAME}/wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}:v{VERSION}"

artifact = wandb.Api().artifact(name = artifact_name, type = ARTIFACT_TYPE)
print(artifact.tags)

The following code snippet shows how to fetch and view an artifact version’s tags by creating a new run:

import wandb

ORG_NAME = "<org_name>"
REGISTRY_NAME = "<registry_name>"
COLLECTION_NAME = "<collection_name>"
VERSION = "<artifact_version>"

run = wandb.init(entity = "<entity>", project="<project>")

artifact_name = f"{ORG_NAME}/wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}:v{VERSION}"

artifact = run.use_artifact(artifact_or_name = artifact_name)
print(artifact.tags)

Remove a tag from an artifact version

Navigate to the W&B Registry App.
Click on a registry card
Click View details next to the name of the collection that contains the artifact version
Scroll down to Versions
Click View next to an artifact version
Within the Version tab, hover your mouse over the name of the tag
Click on the cancel button (X icon)

Search existing tags

Use the W&B App UI to search existing tags in collections and artifact versions:

Navigate to the W&B Registry App.
Click on a registry card
Within the search bar, type in the name of a tag.

Find artifact versions with a specific tag

Use the W&B Python SDK to find artifact versions that have a set of tags:

import wandb

api = wandb.Api()
tagged_artifact_versions = api.artifacts(
    type_name = "<artifact_type>",
    name = "<artifact_name>",
    tags = ["<tag_1>", "<tag_2>"]
)

for artifact_version in tagged_artifact_versions:
    print(artifact_version.tags)

4.3.10 - Annotate collections

Add human-friendly text to your collections to help users understand the purpose of the collection and the artifacts it contains.

Depending on the collection, you might want to include information about the training data, model architecture, task, license, references, and deployment. The following lists cover topics worth documenting in a collection.

W&B recommends including at minimum these details:

Summary: The purpose of the collection. The machine learning framework used for the machine learning experiment.
License: The legal terms and permissions associated with the use of the machine learning model. It helps model users understand the legal framework under which they can utilize the model. Common licenses include Apache 2.0, MIT, and GPL.
References: Citations or references to relevant research papers, datasets, or external resources.

If your collection contains training data, consider including these additional details:

Training data: A description of the training data used.
Processing: The processing done on the training data set.
Data storage: Where the data is stored and how to access it.

If your collection contains a machine learning model, consider including these additional details:

Architecture: Information about the model architecture, layers, and any specific design choices.
Task: The specific type of task or problem that the model in the collection is designed to perform. It’s a categorization of the model’s intended capability.
Deserialize the model: Provide information on how someone on your team can load the model into memory.
Deployment: Details on how and where the model is deployed and guidance on how the model is integrated into other enterprise systems, such as workflow orchestration platforms.

Add a description to a collection

Interactively or programmatically add a description to a collection with the W&B Registry UI or Python SDK.

Navigate to the W&B Registry App.
Click on a collection.
Select View details next to the name of the collection.
Within the Description field, provide information about your collection. Format the text with Markdown.

Use the wandb.Api().artifact_collection() method to access a collection. Use the returned object’s description property to add or update the collection’s description.

Specify the collection’s type for the type_name parameter and the collection’s full name for the name parameter. A collection’s full name consists of the prefix “wandb-registry-” followed by the name of the registry, then a forward slash and the name of the collection:

wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}

Copy and paste the following code snippet into your Python script or notebook. Replace values enclosed in angle brackets (<>) with your own.

import wandb

api = wandb.Api()

collection = api.artifact_collection(
  type_name = "<collection_type>", 
  name = "<collection_name>"
  )


collection.description = "This is a description."
collection.save()

For example, the following image shows a collection that documents a model’s architecture, intended use, performance information and more.
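
The recommended topics listed earlier can be assembled into a Markdown description before you assign it to the description property. This is a minimal sketch: the helper name build_description is hypothetical, and the section text is placeholder content rather than a real collection.

```python
def build_description(sections):
    """Render a mapping of section title -> text as a Markdown string.

    Sections with empty text are skipped.
    """
    parts = []
    for title, text in sections.items():
        if text:
            parts.append(f"## {title}\n\n{text.strip()}")
    return "\n\n".join(parts)


# Placeholder content for illustration only
description = build_description({
    "Summary": "Fine-tuned image classifier for the zoo dataset.",
    "License": "Apache 2.0",
    "References": "",  # Skipped because it is empty
    "Architecture": "Convolutional network; see the training script for layers.",
})
print(description)
```

Assign the resulting string to collection.description and call collection.save(), as in the code snippet earlier in this section.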

4.3.11 - Create and view lineage maps

Create a lineage map in the W&B Registry.

Within a collection in the W&B Registry, you can view a history of the artifacts that an ML experiment uses. This history is called a lineage graph.

You can also view lineage graphs for artifacts you log to W&B that are not part of a collection.

Lineage graphs can show the specific run that logs an artifact. In addition, lineage graphs can also show which run used an artifact as an input. In other words, lineage graphs can show the input and output of a run.

For example, the following image shows artifacts created and used throughout an ML experiment:

From left to right, the image shows:

Multiple runs log the split_zoo_dataset:v4 artifact.
The “rural-feather-20” run uses the split_zoo_dataset:v4 artifact for training.
The output of the “rural-feather-20” run is a model artifact called zoo-ylbchv20:v0.
A run called “northern-lake-21” uses the model artifact zoo-ylbchv20:v0 to evaluate the model.
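
The relationships in this example can be modeled as a small graph in which each run lists its input and output artifacts. The sketch below is a toy model for illustration and does not reflect how W&B stores lineage. The run name prepare-data is a hypothetical stand-in for the runs that log the dataset, and the helper names are assumptions.

```python
# Each run lists the artifacts it used (inputs) and logged (outputs)
runs = {
    "prepare-data": {"inputs": [], "outputs": ["split_zoo_dataset:v4"]},
    "rural-feather-20": {
        "inputs": ["split_zoo_dataset:v4"],
        "outputs": ["zoo-ylbchv20:v0"],
    },
    "northern-lake-21": {"inputs": ["zoo-ylbchv20:v0"], "outputs": []},
}


def producers(artifact):
    """Return the runs that logged `artifact` as an output."""
    return [name for name, io in runs.items() if artifact in io["outputs"]]


def upstream_artifacts(artifact):
    """Return every artifact that `artifact` depends on, directly or not."""
    found = []
    stack = [artifact]
    while stack:
        current = stack.pop()
        for run_name in producers(current):
            for parent in runs[run_name]["inputs"]:
                if parent not in found:
                    found.append(parent)
                    stack.append(parent)
    return found


print(producers("zoo-ylbchv20:v0"))
print(upstream_artifacts("zoo-ylbchv20:v0"))
```

Walking from an artifact to the run that produced it, and then to that run's inputs, is the same traversal you perform visually when reading a lineage graph from right to left.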

Track the input of a run

Mark an artifact as an input or dependency of a run by calling use_artifact() on the run object that wandb.init() returns.

The following code snippet shows how to call use_artifact(). Replace values enclosed in angle brackets (< >) with your values:

import wandb

# Initialize a run
run = wandb.init(project="<project>", entity="<entity>")

# Get artifact, mark it as a dependency
artifact = run.use_artifact(artifact_or_name="<name>", aliases="<alias>")

Track the output of a run

Call log_artifact() on the run object that wandb.init() returns to declare an artifact as an output of a run.

The following code snippet shows how to call log_artifact(). Make sure to replace values enclosed in angle brackets (< >) with your values:

import wandb

# Initialize a run
run = wandb.init(entity = "<entity>", project = "<project>")
artifact = wandb.Artifact(name = "<artifact_name>", type = "<artifact_type>")
artifact.add_file(local_path = "<local_filepath>", name="<optional-name>")

# Log the artifact as an output of the run
run.log_artifact(artifact_or_path = artifact)

For more information about creating artifacts, see Create an artifact.

View lineage graphs in a collection

View the lineage of an artifact linked to a collection in the W&B Registry.

Navigate to the W&B Registry.
Select the collection that contains the artifact.
From the dropdown, click the artifact version whose lineage graph you want to view.
Select the “Lineage” tab.

Once you are in an artifact’s lineage graph page, you can view additional information about any node in that lineage graph.

Select a run node to view that run’s details, such as the run’s ID, the run’s name, the run’s state, and more. As an example, the following image shows information about the rural-feather-20 run:

Select an artifact node to view that artifact’s details, such as its full name, type, creation time, and associated aliases.

4.3.12 - Delete registry

This page shows how a Team admin or Registry admin can delete a custom registry. A core registry cannot be deleted.

A Team admin can delete any custom registry in the organization.
A Registry admin can delete a custom registry that they created.

Deleting a registry also deletes collections that belong to that registry, but does not delete artifacts linked to the registry. Such an artifact remains in the original project that the artifact was logged to.

Use the wandb API’s delete() method to delete a registry programmatically. The following example illustrates how to:

Fetch the registry you want to delete with api.registry().
Call the delete() method on the returned registry object to delete the registry.

import wandb

# Initialize the W&B API
api = wandb.Api()

# Fetch the registry you want to delete
fetched_registry = api.registry("<registry_name>")

# Deleting a registry
fetched_registry.delete()

Navigate to the Registry App at https://wandb.ai/registry/.
Select the custom registry you want to delete.
Click the gear icon in the upper right corner to view the registry’s settings.
To delete the registry, click the trash can icon in the upper right corner of the settings page.
Confirm the registry to delete by entering its name in the modal that appears, then click Delete.

4.3.13 - Migrate from legacy Model Registry

W&B is migrating from the legacy Model Registry to the enhanced W&B Registry. This transition is designed to be seamless and fully managed by W&B. The migration process will preserve your workflows while unlocking powerful new features. For any questions or support, contact support@wandb.com.

Reasons for the migration

W&B Registry offers major improvements over the legacy Model Registry:

Unified, organization-level experience: Share and manage curated artifacts across your organization, regardless of teams.
Improved governance: Use access control, restricted registries, and visibility settings to manage user access.
Enhanced functionality: New features such as custom registries, better search, audit trails, and automation support help modernize your ML infrastructure.

The following table summarizes the key differences between the legacy Model Registry and the new W&B Registry:

Feature	Legacy W&B Model Registry	W&B Registry
Artifact Visibility	Team-level only - access restricted to team members	Org-level visibility with fine-grained permission controls
Custom Registries	Not supported	Fully supported — create registries for any artifact type
Access Control	Not available	Role-based access (Admin, Member, Viewer) at the registry level
Terminology	“Registered models”: pointers to model versions	“Collections”: pointers to any artifact versions
Registry Scope	Only supports model versioning	Supports models, datasets, custom artifacts, and more
Automations	Registry-level automations	Registry- and collection-level automations supported and copied during migration
Search & Discoverability	Limited search and discoverability	Central search within W&B Registry across all registries in the organization
API Compatibility	Uses `wandb.init.link_model()` and MR-specific patterns	Modern SDK APIs (`link_artifact()`, `use_artifact()`) with auto-redirection
Migration	End-of-life	Automatically migrated and enhanced — data is copied, not deleted

Preparing for the migration

No action required: The migration is fully automated and managed by W&B. You do not need to run scripts, update configurations, or move data manually.
Stay informed: You will receive communications (banners in the W&B App UI) 2 weeks prior to your scheduled migration.
Review permissions: After the migration, admins should check registry access to ensure alignment with your team’s needs.
Use new paths in future work: Old code continues to work, but W&B recommends using the new W&B Registry paths for new projects.

Migration process

Temporary write operation pause

During migration, write operations for your team’s Model Registry will be paused for up to one hour to ensure data consistency. Write operations to the newly created migrated W&B Registry will also be paused during the migration.

Data migration

W&B will migrate the following data from the legacy Model Registry to the new W&B Registry:

Collections
Linked artifact versions
Version history
Aliases, tags, and descriptions
Automations (both collection and registry-level)
Permissions, including service account roles and protected aliases

Within the W&B App UI, the legacy Model Registry will be replaced with the new W&B Registry. Migrated registries will have the name of your team followed by mr-migrated:

<team-name>-mr-migrated

These registries default to Restricted visibility, preserving your existing privacy boundaries. Only the original members of the <team-name> will have access to their respective registries.

After the migration

After the migration completes:

The legacy Model Registry becomes read-only. You can still view and access your data, but no new writes will be allowed.
Data in the legacy Model Registry is copied to the new W&B Registry, not moved. No data is deleted.
Access all your data from the new W&B Registry.
Use the new Registry UI for versioning, governance, audit trails, and automation.
Continue using your old code.
- Existing paths and API calls will automatically redirect to the new W&B Registry.
- Artifact version paths are redirected.
The legacy Model Registry will temporarily remain visible in the UI. W&B will eventually hide the legacy Model Registry.
Explore enhanced functionality in the Registry.

Code will continue to work

Existing API calls in your code that refer to the legacy Model Registry will automatically redirect to the new W&B Registry. The following API calls will continue to work without any changes:

wandb.Api().artifact()
wandb.run.use_artifact()
wandb.run.link_artifact()
wandb.Artifact().link()

Legacy paths will redirect to new W&B Registry paths

W&B will automatically redirect legacy Model Registry paths to the new W&B Registry format. This means you can continue using your existing code without needing to refactor paths immediately. Note that automatic redirection only applies to collections that were created in the legacy Model Registry before migration.

For example:

If the legacy Model Registry already contained a collection named "my-model", the link action redirects successfully.
If the legacy Model Registry did not contain a collection named "my-model", the link action does not redirect and results in an error.

# This will redirect successfully if "my-model" existed in legacy Model Registry
run.link_artifact(artifact, "team-name/model-registry/my-model")

# This will fail if "new-model" did not exist in legacy Model Registry
run.link_artifact(artifact, "team-name/model-registry/new-model")

To fetch versions from the legacy Model Registry, paths consisted of a team name, a "model-registry" string, collection name, and version:

f"{team_name}/model-registry/{collection_name}:{version}"

W&B will automatically redirect these paths to the new W&B Registry format, which includes the organization name, a "wandb-registry" string, the team name, collection name, and version:

# Redirects to new path
f"{org_name}/wandb-registry-{team_name}/{collection_name}:{version}"
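
W&B performs the redirect for you, but when you update old code by hand, the mapping between the two templates above can be sketched as a string rewrite. The helper name to_registry_path is hypothetical and not part of the wandb SDK. It follows only the templates shown here, so confirm the actual registry name in the Registry App before relying on the result.

```python
def to_registry_path(legacy_path, org_name):
    """Rewrite a legacy Model Registry path in the new W&B Registry format.

    Expects "<team>/model-registry/<collection>:<version>".
    """
    team, marker, rest = legacy_path.split("/", 2)
    if marker != "model-registry":
        raise ValueError(f"Not a legacy Model Registry path: {legacy_path}")
    # Follows the template above: org name, then the registry, then the rest
    return f"{org_name}/wandb-registry-{team}/{rest}"


print(to_registry_path("team-name/model-registry/my-model:v3", "my-org"))
```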

Python SDK warnings

A warning may appear if you continue to use legacy Model Registry paths in your code. The warning will not break your code, but it indicates that you should update your paths to the new W&B Registry format.

Whether a warning appears depends on the version of the W&B Python SDK you are using:

Users on the latest W&B SDK (v0.21.0 and above) will see a non-breaking warning in their logs indicating that a redirect has occurred.
For older SDK versions, the redirect will still work silently without emitting a warning. Some metadata such as entity or project names may reflect legacy values.
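
To tell which behavior applies in a given environment, you can compare the installed SDK version against the v0.21.0 threshold stated above. This is a minimal sketch: the helper name emits_redirect_warning is hypothetical, and it compares only the numeric major, minor, and patch components, ignoring pre-release suffixes.

```python
def emits_redirect_warning(sdk_version):
    """Return True if `sdk_version` is 0.21.0 or later."""
    parts = []
    # Keep only the leading digits of the first three components
    for piece in sdk_version.split(".")[:3]:
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts) >= (0, 21, 0)


print(emits_redirect_warning("0.21.0"))
print(emits_redirect_warning("0.20.1"))
```

In a script, pass wandb.__version__ to the helper to check the installed SDK.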

Frequently asked questions

How will I know when my org is being migrated?

You will receive advance notice with an in-app banner or direct communication from W&B.

Will there be downtime?

Write operations to the legacy Model Registry and the new W&B Registry will be paused for approximately one hour during the migration. All other W&B services will remain available.

Will this break my code?

No. All legacy Model Registry paths and Python SDK calls will automatically redirect to the new Registry.

Will my data be deleted?

No. Your data will be copied to the new W&B Registry. The legacy Model Registry becomes read-only and later hidden. No data is removed or lost.

What if I’m using an older SDK?

Redirects will still work, but you will not see warnings about them. For the best experience, upgrade to the latest version of the W&B SDK.

Can I rename/modify my migrated registry?

Yes, renaming and other operations such as adding or removing members from a migrated registry are allowed. These registries are simply custom registries underneath, and the redirection will continue working even after migration.

Questions?

For support or to discuss your migration, contact support@wandb.com. W&B is committed to helping you transition smoothly to the new W&B Registry.

4.3.14 - Model registry

Model registry to manage the model lifecycle from training to production

W&B will eventually stop supporting W&B Model Registry. Users are encouraged to use W&B Registry instead for linking and sharing their model artifact versions. W&B Registry broadens the capabilities of the legacy W&B Model Registry. For more information about W&B Registry, see the Registry docs.

W&B will migrate existing model artifacts linked to the legacy Model Registry to the new W&B Registry in the near future. See Migrating from legacy Model Registry for information about the migration process.

The W&B Model Registry houses a team’s trained models where ML Practitioners can publish candidates for production to be consumed by downstream teams and stakeholders. It is used to house staged/candidate models and manage workflows associated with staging.

With W&B Model Registry, you can:

Bookmark your best model versions for each machine learning task.
Automate downstream processes and model CI/CD.
Move model versions through the ML lifecycle, from staging to production.
Track a model’s lineage and audit the history of changes to production models.

How it works

Track and manage your staged models with a few simple steps.

Log a model version: In your training script, add a few lines of code to save the model files as an artifact to W&B.
Compare performance: Check live charts to compare the metrics and sample predictions from model training and validation. Identify which model version performed the best.
Link to registry: Bookmark the best model version by linking it to a registered model, either programmatically in Python or interactively in the W&B UI.

The following code snippet demonstrates how to log and link a model to the Model Registry:

import wandb
import random

# Start a new W&B run
run = wandb.init(project="models_quickstart")

# Simulate logging model metrics
run.log({"acc": random.random()})

# Create a simulated model file
with open("my_model.h5", "w") as f:
    f.write("Model: " + str(random.random()))

# Log and link the model to the Model Registry
run.link_model(path="./my_model.h5", registered_model_name="MNIST")

run.finish()

Connect model transitions to CI/CD workflows: transition candidate models through workflow stages and automate downstream actions with webhooks.

How to get started

Depending on your use case, explore the following resources to get started with W&B Models:

Check out the two-part video series:
1. Logging and registering models
2. Consuming models and automating downstream processes in the Model Registry.
Read the models walkthrough for a step-by-step outline of the W&B Python SDK commands you could use to create, track, and use a dataset artifact.
Learn about:
- Protected models and access control.
- How to connect Registry to CI/CD processes.
- How to set up Slack notifications when a new model version is linked to a registered model.
Review What is an ML Model Registry? to learn how to integrate Model Registry into your ML workflow.
Take the W&B Enterprise Model Management course and learn how to:
- Use the W&B Model Registry to manage and version your models, track lineage, and promote models through different lifecycle stages
- Automate your model management workflows using webhooks.
- See how the Model Registry integrates with external ML systems and tools in your model development lifecycle for model evaluation, monitoring, and deployment.

4.3.14.1 - Tutorial: Use W&B for model management

Learn how to use W&B for Model Management

The following walkthrough shows you how to log a model to W&B. By the end of the walkthrough you will:

Create and train a model with the MNIST dataset and the Keras framework.
Log the model that you trained to a W&B project.
Mark the dataset used as a dependency of the model you created.
Link the model to the W&B Registry.
Evaluate the performance of the model you link to the registry.
Mark a model version ready for production.

Copy the code snippets in the order presented in this guide.
Code that is not unique to the Model Registry is hidden in collapsible cells.

Setting up

Before you get started, import the Python dependencies required for this walkthrough:

import wandb
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from wandb.integration.keras import WandbMetricsLogger
from sklearn.model_selection import train_test_split

Provide your W&B entity to the entity variable:

entity = "<entity>"

Create a dataset artifact

First, create a dataset. The following code snippet defines a function that downloads the MNIST dataset:

def generate_raw_data(train_size=6000):
    eval_size = int(train_size / 6)
    (x_train, y_train), (x_eval, y_eval) = keras.datasets.mnist.load_data()

    x_train = x_train.astype("float32") / 255
    x_eval = x_eval.astype("float32") / 255
    x_train = np.expand_dims(x_train, -1)
    x_eval = np.expand_dims(x_eval, -1)

    print("Generated {} rows of training data.".format(train_size))
    print("Generated {} rows of eval data.".format(eval_size))

    return (x_train[:train_size], y_train[:train_size]), (
        x_eval[:eval_size],
        y_eval[:eval_size],
    )

# Create dataset
(x_train, y_train), (x_eval, y_eval) = generate_raw_data()

Next, upload the dataset to W&B. To do this, create an artifact object and add the dataset to that artifact.

project = "model-registry-dev"

model_use_case_id = "mnist"
job_type = "build_dataset"

# Initialize a W&B run
run = wandb.init(entity=entity, project=project, job_type=job_type)

# Create W&B Table for training data
train_table = wandb.Table(data=[], columns=[])
train_table.add_column("x_train", x_train)
train_table.add_column("y_train", y_train)
train_table.add_computed_columns(lambda ndx, row: {"img": wandb.Image(row["x_train"])})

# Create W&B Table for eval data
eval_table = wandb.Table(data=[], columns=[])
eval_table.add_column("x_eval", x_eval)
eval_table.add_column("y_eval", y_eval)
eval_table.add_computed_columns(lambda ndx, row: {"img": wandb.Image(row["x_eval"])})

# Create an artifact object
artifact_name = "{}_dataset".format(model_use_case_id)
artifact = wandb.Artifact(name=artifact_name, type="dataset")

# Add wandb.WBValue obj to the artifact.
artifact.add(train_table, "train_table")
artifact.add(eval_table, "eval_table")

# Persist any changes made to the artifact.
artifact.save()

# Tell W&B this run is finished.
run.finish()

Storing files (such as datasets) in an artifact is useful in the context of logging models because it lets you track a model’s dependencies.

Train a model

Train a model with the artifact dataset you created in the previous step.

Declare dataset artifact as an input to the run

Declare the dataset artifact you created in a previous step as the input to the W&B run. This is particularly useful in the context of logging models because declaring an artifact as an input to a run lets you track the dataset (and the version of the dataset) used to train a specific model. W&B uses the information collected to create a lineage map.

Use the use_artifact API to both declare the dataset artifact as the input of the run and to retrieve the artifact itself.

job_type = "train_model"
config = {
    "optimizer": "adam",
    "batch_size": 128,
    "epochs": 5,
    "validation_split": 0.1,
}

# Initialize a W&B run
run = wandb.init(project=project, job_type=job_type, config=config)

# Retrieve the dataset artifact
version = "latest"
name = "{}:{}".format("{}_dataset".format(model_use_case_id), version)
artifact = run.use_artifact(artifact_or_name=name)

# Get specific content from the dataframe
train_table = artifact.get("train_table")
x_train = train_table.get_column("x_train", convert_to="numpy")
y_train = train_table.get_column("y_train", convert_to="numpy")

For more information about tracking the inputs and output of a model, see Create model lineage map.

Define and train model

For this walkthrough, define a 2D Convolutional Neural Network (CNN) with Keras to classify images from the MNIST dataset.

Train CNN on MNIST data

# Store values from our config dictionary in variables for easy access
num_classes = 10
input_shape = (28, 28, 1)
loss = "categorical_crossentropy"
optimizer = run.config["optimizer"]
metrics = ["accuracy"]
batch_size = run.config["batch_size"]
epochs = run.config["epochs"]
validation_split = run.config["validation_split"]

# Create model architecture
model = keras.Sequential(
    [
        layers.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ]
)
model.compile(loss=loss, optimizer=optimizer, metrics=metrics)

# Generate labels for training data
y_train = keras.utils.to_categorical(y_train, num_classes)

# Split the data into training and validation sets
x_t, x_v, y_t, y_v = train_test_split(x_train, y_train, test_size=0.33)

Next, train the model:

# Train the model
model.fit(
    x=x_t,
    y=y_t,
    batch_size=batch_size,
    epochs=epochs,
    validation_data=(x_v, y_v),
    callbacks=[WandbMetricsLogger()],
)

Finally, save the model locally on your machine:

# Save model locally
path = "model.h5"
model.save(path)

Log and link a model to the Model Registry

Use the link_model API to log one or more model files to a W&B run and link them to the W&B Model Registry.

path = "./model.h5"
registered_model_name = "MNIST-dev"
name = "mnist_model"  # Name of the model artifact; reused when evaluating the model

run.link_model(path=path, registered_model_name=registered_model_name, name=name)
run.finish()

W&B creates a registered model for you if the name you specify for registered_model_name does not already exist.

See link_model in the API Reference guide for optional parameters.

Evaluate the performance of a model

It is common practice to evaluate the performance of one or more models.

First, get the evaluation dataset artifact stored in W&B in a previous step.

job_type = "evaluate_model"

# Initialize a run
run = wandb.init(project=project, entity=entity, job_type=job_type)

model_use_case_id = "mnist"
version = "latest"

# Get dataset artifact, mark it as a dependency
artifact = run.use_artifact(
    "{}:{}".format("{}_dataset".format(model_use_case_id), version)
)

# Get desired dataframe
eval_table = artifact.get("eval_table")
x_eval = eval_table.get_column("x_eval", convert_to="numpy")
y_eval = eval_table.get_column("y_eval", convert_to="numpy")

Download the model version from W&B that you want to evaluate. Use the use_model API to access and download your model.

alias = "latest"  # alias
name = "mnist_model"  # name of the model artifact

# Access and download model. Returns path to downloaded artifact
downloaded_model_path = run.use_model(name=f"{name}:{alias}")

Load the Keras model and compute the loss:

model = keras.models.load_model(downloaded_model_path)

y_eval = keras.utils.to_categorical(y_eval, 10)
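# The model was compiled with the accuracy metric, so evaluate returns loss and accuracy
loss, accuracy = model.evaluate(x_eval, y_eval)

Finally, log the loss and accuracy metrics to the W&B run:

# Log metrics, images, tables, or any data useful for evaluation.
run.log(data={"loss": loss, "accuracy": accuracy})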

Promote a model version

Mark a model version ready for the next stage of your machine learning workflow with a model alias. Each registered model can have one or more model aliases. A model alias can only belong to a single model version at a time.

For example, suppose that after evaluating a model’s performance, you are confident that the model is ready for production. To promote that model version, add the production alias to that specific model version.

The production alias is one of the most common aliases used to mark a model as production-ready.

You can add an alias to a model version interactively with the W&B App UI or programmatically with the Python SDK. The following steps show how to add an alias with the W&B Model Registry App:

Navigate to the Model Registry App.
Click View details next to the name of your registered model.
Within the Versions section, click the View button next to the name of the model version you want to promote.
Next to the Aliases field, click the plus icon (+).
Type production into the field that appears.
Press Enter on your keyboard.
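The steps above use the App UI. For the programmatic route, one possible approach is sketched below. It assumes that the artifact object returned by the W&B public API exposes a mutable `aliases` list and a `save()` method; `promote` itself is a hypothetical helper, and the path in the usage comment follows the legacy `model-registry` path format with placeholder values.

```python
def promote(artifact, alias: str = "production") -> None:
    """Add `alias` to a model version and persist the change.

    `artifact` is expected to be an artifact object fetched through the
    W&B public API (assumption: it exposes `aliases` and `save()`).
    """
    if alias not in artifact.aliases:
        artifact.aliases.append(alias)
        artifact.save()


# Example usage (requires a W&B login and an existing linked model version):
#
#   import wandb
#   api = wandb.Api()
#   artifact = api.artifact("<entity>/model-registry/<registered-model-name>:v0")
#   promote(artifact)
```

The helper only saves when the alias is new, so calling it repeatedly on the same version is harmless.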

4.3.14.2 - Model Registry Terms and Concepts

Model Registry terms and concepts

The following terms describe key components of the W&B Model Registry: model version, model artifact, and registered model.

Model version

A model version represents a single model checkpoint. Model versions are a snapshot at a point in time of a model and its files within an experiment.

A model version is an immutable directory of data and metadata that describes a trained model. W&B suggests that you add files to your model version that let you store (and restore) your model architecture and learned parameters at a later date.

A model version belongs to one, and only one, model artifact. A model version can belong to zero or more registered models. Model versions are stored in a model artifact in the order they are logged to the model artifact. W&B automatically creates a new model version if it detects that a model you log (to the same model artifact) has different contents than a previous model version.
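The idea of creating a new version only when the contents change can be illustrated with a small content-hashing toy. This is not how W&B is implemented: `ToyModelArtifact` is a hypothetical class, and it assumes a comparison against the most recent version only.

```python
import hashlib


class ToyModelArtifact:
    """Toy stand-in for a model artifact; not the W&B implementation."""

    def __init__(self) -> None:
        self.digests = []  # index i holds the digest of version v{i}

    def log(self, model_bytes: bytes) -> str:
        digest = hashlib.sha256(model_bytes).hexdigest()
        # Assumption: compare only against the most recent version.
        if self.digests and self.digests[-1] == digest:
            return f"v{len(self.digests) - 1}"
        self.digests.append(digest)
        return f"v{len(self.digests) - 1}"


artifact = ToyModelArtifact()
print(artifact.log(b"weights-epoch-1"))  # v0
print(artifact.log(b"weights-epoch-1"))  # v0 (unchanged contents)
print(artifact.log(b"weights-epoch-2"))  # v1
```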

Store files within model versions that are produced from the serialization process provided by your modeling library (for example, PyTorch and Keras).

Model alias

Model aliases are mutable strings that allow you to uniquely identify or reference a model version in your registered model with a semantically related identifier. You can only assign an alias to one version of a registered model. This is because an alias should refer to a unique version when used programmatically. It also allows aliases to be used to capture a model’s state (champion, candidate, production).

It is common practice to use aliases such as "best", "latest", "production", or "staging" to mark model versions with special purposes.

For example, suppose you create a model and assign it a "best" alias. You can refer to that specific model with run.use_model:

import wandb

run = wandb.init()
# entity, project, and model_artifact_name are placeholders for your own values
alias = "best"
name = f"{entity}/{project}/{model_artifact_name}:{alias}"
run.use_model(name=name)
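The rule that an alias points to a single version at a time can be pictured as a mapping from alias to version. The class below is a toy illustration with hypothetical names, not the W&B implementation.

```python
class ToyRegisteredModel:
    """Toy model of alias bookkeeping; not the W&B implementation."""

    def __init__(self) -> None:
        self.alias_to_version = {}

    def set_alias(self, alias: str, version: str) -> None:
        # Assigning an alias that is already in use moves it to the new version.
        self.alias_to_version[alias] = version

    def aliases_of(self, version: str) -> list:
        return sorted(a for a, v in self.alias_to_version.items() if v == version)


model = ToyRegisteredModel()
model.set_alias("staging", "v0")
model.set_alias("production", "v0")  # one version may carry several aliases
model.set_alias("production", "v1")  # "production" moves from v0 to v1
print(model.aliases_of("v0"))  # ['staging']
print(model.aliases_of("v1"))  # ['production']
```

Because each alias maps to exactly one version, a reference such as `<name>:production` always resolves unambiguously.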

Model tags

Model tags are keywords or labels that belong to one or more registered models.

Use model tags to organize registered models into categories and to search over those categories in the Model Registry’s search bar. Model tags appear at the top of the Registered Model Card. You might choose to use them to group your registered models by ML task, owning team, or priority. The same model tag can be added to multiple registered models to allow for grouping.

Model tags, which are labels applied to registered models for grouping and discoverability, are different from model aliases. Model aliases are unique identifiers or nicknames that you use to fetch a model version programmatically. To learn more about using tags to organize the tasks in your Model Registry, see Organize models.

Model artifact

A model artifact is a collection of logged model versions. Model versions are stored in a model artifact in the order they are logged to the model artifact.

A model artifact can contain one or more model versions. A model artifact can be empty if no model versions are logged to it.

For example, suppose you create a model artifact. During model training, you periodically save your model during checkpoints. Each checkpoint corresponds to its own model version. All of the model versions created during your model training and checkpoint saving are stored in the same model artifact you created at the beginning of your training script.

The following image shows a model artifact that contains three model versions: v0, v1, and v2.

View an example model artifact here.

Registered model

A registered model is a collection of pointers (links) to model versions. You can think of a registered model as a folder of “bookmarks” of candidate models for the same ML task. Each “bookmark” of a registered model is a pointer to a model version that belongs to a model artifact. You can use model tags to group your registered models.

Registered models often represent candidate models for a single modeling use case or task. For example, you might create a registered model for each image classification task, named after the model you use: ImageClassifier-ResNet50, ImageClassifier-VGG16, DogBreedClassifier-MobileNetV2, and so on. Model versions are assigned version numbers in the order in which they were linked to the registered model.

View an example Registered Model here.

4.3.14.3 - Track a model

Track a model, the model’s dependencies, and other information relevant to that model with the W&B Python SDK.

Under the hood, W&B creates a lineage of model artifacts that you can view with the W&B App or programmatically with the W&B Python SDK. See Create model lineage map for more information.

How to log a model

Use the run.log_model API to log a model. Provide the path where your model files are saved to the path parameter. The path can be a local file, directory, or reference URI to an external bucket such as s3://bucket/path.

Optionally provide a name for the model artifact with the name parameter. If name is not specified, W&B uses the basename of the input path prepended with the run ID.

Copy and paste the following code snippet. Be sure to replace the values enclosed in <> with your own.

import wandb

# Initialize a W&B run
run = wandb.init(project="<project>", entity="<entity>")

# Log the model
run.log_model(path="<path-to-model>", name="<name>")

Example: Log a Keras model to W&B

The following code example shows how to log a convolutional neural network (CNN) model to W&B.

import os
import wandb
from tensorflow import keras
from tensorflow.keras import layers

config = {"optimizer": "adam", "loss": "categorical_crossentropy"}

# Initialize a W&B run
run = wandb.init(entity="charlie", project="mnist-project", config=config)

# Training algorithm
loss = run.config["loss"]
optimizer = run.config["optimizer"]
metrics = ["accuracy"]
num_classes = 10
input_shape = (28, 28, 1)

model = keras.Sequential(
    [
        layers.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ]
)

model.compile(loss=loss, optimizer=optimizer, metrics=metrics)

# Save model
model_filename = "model.h5"
local_filepath = "./"
full_path = os.path.join(local_filepath, model_filename)
model.save(filepath=full_path)

# Log the model
run.log_model(path=full_path, name="MNIST")

# Explicitly tell W&B to end the run.
run.finish()

4.3.14.4 - Create a registered model

Create a registered model to hold all the candidate models for your modeling tasks.

Create a registered model to hold all the candidate models for your modeling tasks. You can create a registered model interactively within the Model Registry or programmatically with the Python SDK.

Programmatically create a registered model

Programmatically register a model with the W&B Python SDK. W&B automatically creates a registered model for you if the registered model doesn’t exist.

Be sure to replace the values enclosed in <> with your own:

import wandb

run = wandb.init(entity="<entity>", project="<project>")
run.link_model(path="<path-to-model>", registered_model_name="<registered-model-name>")
run.finish()

The name you provide for registered_model_name is the name that appears in the Model Registry App.

Interactively create a registered model

Interactively create a registered model within the Model Registry App.

Navigate to the Model Registry App.
Click the New registered model button located in the top right of the Model Registry page.
From the panel that appears, select the entity you want the registered model to belong to from the Owning Entity dropdown.
Provide a name for your model in the Name field.
From the Type dropdown, select the type of artifacts to link to the registered model.
(Optional) Add a description about your model in the Description field.
(Optional) Within the Tags field, add one or more tags.
Click Register model.

Manually linking a model to the model registry is useful for one-off models. However, it is often useful to programmatically link model versions to the model registry.

For example, suppose you have a nightly job. It is tedious to manually link a model created each night. Instead, you could create a script that evaluates the model, and if the model improves in performance, link that model to the model registry with the W&B Python SDK.
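The nightly-job idea can be sketched as a small gate around link_model. In this sketch, `link_if_improved` and its accuracy arguments are hypothetical; how you compute and store the previous best score is up to your pipeline. The `run` argument is expected to be the object returned by `wandb.init()`.

```python
def link_if_improved(run, model_path, new_accuracy, best_accuracy, registered_model_name):
    """Link the model only when it beats the best accuracy seen so far.

    `run` is expected to be the object returned by `wandb.init()`.
    Returns True if the model was linked.
    """
    if new_accuracy <= best_accuracy:
        return False
    run.link_model(path=model_path, registered_model_name=registered_model_name)
    return True


# Example usage inside a nightly script (requires a W&B login):
#
#   import wandb
#   run = wandb.init(entity="<entity>", project="<project>")
#   link_if_improved(run, "<path-to-model>", new_accuracy, best_accuracy,
#                    "<registered-model-name>")
#   run.finish()
```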

4.3.14.5 - Link a model version

Link a model version to a registered model with the W&B App or programmatically with the Python SDK.

Programmatically link a model

Use the link_model method to programmatically log model files to a W&B run and link it to the W&B Model Registry.

Be sure to replace the values enclosed in <> with your own:

import wandb

run = wandb.init(entity="<entity>", project="<project>")
run.link_model(path="<path-to-model>", registered_model_name="<registered-model-name>")
run.finish()

W&B creates a registered model for you if the name you specify for the registered_model_name parameter does not already exist.

For example, suppose you have an existing registered model named “Fine-Tuned-Review-Autocompletion” (registered_model_name="Fine-Tuned-Review-Autocompletion") in your Model Registry, and that a few model versions are linked to it: v0, v1, and v2. If you programmatically link a new model and use the same registered model name (registered_model_name="Fine-Tuned-Review-Autocompletion"), W&B links this model to the existing registered model and assigns it model version v3. If no registered model with this name exists, W&B creates a new registered model and assigns the linked model version v0.

See an example “Fine-Tuned-Review-Autocompletion” registered model here.
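The version numbering described in this example can be modeled with a few lines of plain Python. `ToyModelRegistry` and the source version names are hypothetical and exist only to illustrate the numbering rule; this is not the W&B implementation.

```python
class ToyModelRegistry:
    """Toy model of version numbering; not the W&B implementation."""

    def __init__(self) -> None:
        self.registered_models = {}  # name -> list of linked source versions

    def link(self, source_version: str, registered_model_name: str) -> str:
        # Create the registered model on first use, then append the link.
        links = self.registered_models.setdefault(registered_model_name, [])
        links.append(source_version)
        return f"v{len(links) - 1}"


registry = ToyModelRegistry()
name = "Fine-Tuned-Review-Autocompletion"
for source in ("model:v4", "model:v7", "model:v9"):
    registry.link(source, name)  # become v0, v1, v2

print(registry.link("model:v12", name))  # v3
print(registry.link("model:v0", "Brand-New-Model"))  # v0
```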

Interactively link a model

Interactively link a model with the Model Registry or with the Artifact browser.

Navigate to the Model Registry App.
Hover your mouse next to the name of the registered model you want to link a new model to.
Select the meatball menu icon (three horizontal dots) next to View details.
From the dropdown, select Link new version.
From the Project dropdown, select the name of the project that contains your model.
From the Model Artifact dropdown, select the name of the model artifact.
From the Version dropdown, select the model version you want to link to the registered model.

Navigate to your project’s artifact browser on the W&B App at: https://wandb.ai/<entity>/<project>/artifacts
Select the Artifacts icon on the left sidebar.
Click on the model version you want to link to your registry.
Within the Version overview section, click the Link to registry button.
From the modal that appears on the right of the screen, select a registered model from the Select a register model menu dropdown.
Click Next step.
(Optional) Select an alias from the Aliases dropdown.
Click Link to registry.

View the source of linked models

There are two ways to view the source of linked models: the artifact browser within the project that the model is logged to, and the W&B Model Registry.

A pointer connects a specific model version in the model registry to the source model artifact (located within the project the model is logged to). The source model artifact also has a pointer to the model registry.

Navigate to your Model Registry App.
Select View details next to the name of your registered model.
Within the Versions section, select View next to the model version you want to investigate.
Click on the Version tab within the right panel.
Within the Version overview section there is a row that contains a Source Version field. The Source Version field shows both the name of the model and the model’s version.

For example, the following image shows a v0 model version called mnist_model (see Source version field mnist_model:v0), linked to a registered model called MNIST-dev.

Navigate to your project’s artifact browser on the W&B App at: https://wandb.ai/<entity>/<project>/artifacts
Select the Artifacts icon on the left sidebar.
Expand the model dropdown menu from the Artifacts panel.
Select the name and version of the model linked to the model registry.
Click on the Version tab within the right panel.
Within the Version overview section there is a row that contains a Linked To field. The Linked To field shows both the name of the registered model and its version (registered-model-name:version).

For example, in the following image, there is a registered model called MNIST-dev (see the Linked To field). A model version called mnist_model with a version v0 (mnist_model:v0) points to the MNIST-dev registered model.

4.3.14.6 - Organize models

Use model tags to organize registered models into categories and to search over those categories.

Navigate to the W&B Model Registry app.
Select View details next to the name of the registered model you want to add a model tag to.
Scroll to the Model card section.
Click the plus button (+) next to the Tags field.
Type in the name for your tag or search for a pre-existing model tag. For example, the following image shows multiple model tags added to a registered model called FineTuned-Review-Autocompletion:

4.3.14.7 - Create model lineage map

This page describes creating lineage graphs in the legacy W&B Model Registry. To learn about lineage graphs in W&B Registry, refer to Create and view lineage maps.

W&B will transition assets from the legacy W&B Model Registry to the new W&B Registry. This migration will be fully managed and triggered by W&B, requiring no intervention from users. The process is designed to be as seamless as possible, with minimal disruption to existing workflows. Refer to Migrate from legacy Model Registry.

A useful feature of logging model artifacts to W&B is lineage graphs. Lineage graphs show the artifacts logged by a run as well as the artifacts used by a specific run.

This means that, when you log a model artifact, you at a minimum have access to view the W&B run that used or produced the model artifact. If you track a dependency, you also see the inputs used by the model artifact.

For example, the following image shows artifacts created and used throughout an ML experiment:

From left to right, the image shows:

The jumping-monkey-1 W&B run created the mnist_dataset:v0 dataset artifact.
The vague-morning-5 W&B run trained a model using the mnist_dataset:v0 dataset artifact. The output of this W&B run was a model artifact called mnist_model:v0.
A run called serene-haze-6 used the model artifact (mnist_model:v0) to evaluate the model.
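The relationships listed above form a small directed graph. As a rough illustration (a plain-Python toy, not a W&B API), the sketch below encodes them as edges and walks upstream from the model artifact; `EDGES` and `upstream` are hypothetical names.

```python
# Each edge points from a producer or input to whatever consumes or results from it.
EDGES = [
    ("run:jumping-monkey-1", "artifact:mnist_dataset:v0"),  # run logs dataset
    ("artifact:mnist_dataset:v0", "run:vague-morning-5"),   # run uses dataset
    ("run:vague-morning-5", "artifact:mnist_model:v0"),     # run logs model
    ("artifact:mnist_model:v0", "run:serene-haze-6"),       # run uses model
]


def upstream(node: str) -> list:
    """Return every run and artifact that `node` depends on, nearest first."""
    parents = [src for src, dst in EDGES if dst == node]
    result = []
    for parent in parents:
        result.append(parent)
        result.extend(upstream(parent))
    return result


print(upstream("artifact:mnist_model:v0"))
# ['run:vague-morning-5', 'artifact:mnist_dataset:v0', 'run:jumping-monkey-1']
```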

Track an artifact dependency

Declare a dataset artifact as an input to a W&B run with the use_artifact API to track a dependency.

The following code snippet shows how to use the use_artifact API:

# Initialize a run
run = wandb.init(project=project, entity=entity)

# Get artifact, mark it as a dependency
artifact = run.use_artifact(artifact_or_name="name", aliases="<alias>")

Once you have retrieved your artifact, you can use it to, for example, evaluate the performance of a model.

Example: Train a model and track a dataset as the input of a model

job_type = "train_model"

config = {
    "optimizer": "adam",
    "batch_size": 128,
    "epochs": 5,
    "validation_split": 0.1,
}

run = wandb.init(project=project, job_type=job_type, config=config)

version = "latest"
name = "{}:{}".format("{}_dataset".format(model_use_case_id), version)

artifact = run.use_artifact(name)

train_table = artifact.get("train_table")
x_train = train_table.get_column("x_train", convert_to="numpy")
y_train = train_table.get_column("y_train", convert_to="numpy")

# Store values from our config dictionary in variables for easy access
num_classes = 10
input_shape = (28, 28, 1)
loss = "categorical_crossentropy"
optimizer = run.config["optimizer"]
metrics = ["accuracy"]
batch_size = run.config["batch_size"]
epochs = run.config["epochs"]
validation_split = run.config["validation_split"]

# Create model architecture
model = keras.Sequential(
    [
        layers.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ]
)
model.compile(loss=loss, optimizer=optimizer, metrics=metrics)

# Generate labels for training data
y_train = keras.utils.to_categorical(y_train, num_classes)

# Split the data into training and validation sets
x_t, x_v, y_t, y_v = train_test_split(x_train, y_train, test_size=0.33)

# Train the model
model.fit(
    x=x_t,
    y=y_t,
    batch_size=batch_size,
    epochs=epochs,
    validation_data=(x_v, y_v),
    callbacks=[WandbMetricsLogger()],
)

# Save model locally
path = "model.h5"
model.save(path)

path = "./model.h5"
registered_model_name = "MNIST-dev"
name = "mnist_model"

run.link_model(path=path, registered_model_name=registered_model_name, name=name)
run.finish()

4.3.14.8 - Document machine learning model

Add descriptions to the model card to document your model

Add a description to the model card of your registered model to document aspects of your machine learning model. Some topics worth documenting include:

Summary: A summary of what the model is. The purpose of the model. The machine learning framework the model uses, and so forth.
Training data: Describe the training data used, the processing done on the training data set, where that data is stored, and so forth.
Architecture: Information about the model architecture, layers, and any specific design choices.
Deserialize the model: Provide information on how someone on your team can load the model into memory.
Task: The specific type of task or problem that the machine learning model is designed to perform. It’s a categorization of the model’s intended capability.
License: The legal terms and permissions associated with the use of the machine learning model. It helps model users understand the legal framework under which they can utilize the model.
References: Citations or references to relevant research papers, datasets, or external resources.
Deployment: Details on how and where the model is deployed and guidance on how the model is integrated into other enterprise systems, such as workflow orchestration platforms.

Add a description to the model card

Navigate to the W&B Model Registry app.
Select View details next to the name of the registered model you want to create a model card for.
Go to the Model card section.
Within the Description field, provide information about your machine learning model. Format text within a model card with Markdown markup language.

For example, the following image shows the model card of a Credit-card Default Prediction registered model.
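Because the Description field accepts Markdown, one option is to assemble the description from the topics listed earlier on this page and paste the result into the field. The sketch below is only a suggestion: `render_model_card`, the heading layout, and the sample text are assumptions, not a W&B format.

```python
def render_model_card(sections: dict) -> str:
    """Join section titles and bodies into a Markdown string."""
    parts = [f"## {title}\n\n{body.strip()}" for title, body in sections.items()]
    return "\n\n".join(parts)


card = render_model_card(
    {
        "Summary": "CNN that classifies handwritten digits.",
        "Training data": "MNIST, pixel values scaled to [0, 1].",
        "Deserialize the model": "Load with `keras.models.load_model(path)`.",
    }
)
print(card)
```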

4.3.14.9 - Download a model version

How to download a model with W&B Python SDK

Use the W&B Python SDK to download a model artifact that you linked to the Model Registry.

You are responsible for providing any additional Python functions or API calls needed to reconstruct or deserialize your model into a form that you can work with.

W&B suggests that you document information on how to load models into memory with model cards. For more information, see the Document machine learning models page.

Replace values within <> with your own:

import wandb

# Initialize a run
run = wandb.init(project="<project>", entity="<entity>")

# Access and download model. Returns path to downloaded artifact
downloaded_model_path = run.use_model(name="<your-model-name>")

Reference a model version with one of the following formats:

latest - Use latest alias to specify the model version that is most recently linked.
v# - Use v0, v1, v2, and so on to fetch a specific version in the Registered Model
alias - Specify the custom alias that you and your team assigned to your model version

See use_model in the API Reference guide for more information on possible parameters and return type.
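Once `use_model` returns a path, loading the model is up to you. As a minimal sketch, the hypothetical helper below (not part of the W&B SDK) locates candidate model files at or under the returned path. The file suffixes are assumptions; adjust them to the format your framework writes.

```python
from pathlib import Path


def find_model_files(downloaded_path, suffixes=(".h5", ".pt", ".pkl")):
    """Return model files at or under downloaded_path, sorted by path.

    Handles both cases: the path points at a single file or at a directory.
    """
    root = Path(downloaded_path)
    candidates = [root] if root.is_file() else sorted(root.rglob("*"))
    return [p for p in candidates if p.is_file() and p.suffix in suffixes]


# Usage (assumes downloaded_model_path came from run.use_model):
# model_files = find_model_files(downloaded_model_path)
# Then load model_files[0] with your framework's own loader.
```

Documenting the loader call in the model card saves teammates from guessing which framework and file format to use.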

Example: Download and use a logged model

For example, the following code snippet calls the use_model API. It specifies the name of the model artifact to fetch along with a version or alias, then stores the path that the API returns in the downloaded_model_path variable.

import wandb

entity = "luka"
project = "NLP_Experiments"
alias = "latest"  # semantic nickname or identifier for the model version
model_artifact_name = "fine-tuned-model"

# Initialize a run
run = wandb.init()

# Access and download model. Returns path to downloaded artifact
downloaded_model_path = run.use_model(name=f"{entity}/{project}/{model_artifact_name}:{alias}")
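The name passed to `use_model` in this example follows the pattern `entity/project/model-name:version`. The sketch below shows one way to assemble and sanity-check that string. `model_reference` and `version_kind` are hypothetical helpers, and rejecting `/` and `:` inside each part is an assumption made for this sketch, not a documented W&B rule.

```python
import re


def model_reference(entity, project, name, version="latest"):
    """Build '<entity>/<project>/<name>:<version>' from its parts."""
    parts = (("entity", entity), ("project", project), ("name", name), ("version", version))
    for label, part in parts:
        # Assumption for this sketch: separators may not appear inside a part.
        if not part or "/" in part or ":" in part:
            raise ValueError(f"invalid {label}: {part!r}")
    return f"{entity}/{project}/{name}:{version}"


def version_kind(version):
    """Classify a version string using the three formats listed on this page."""
    if version == "latest":
        return "latest alias"
    if re.fullmatch(r"v\d+", version):
        return "version index"
    return "custom alias"


print(model_reference("luka", "NLP_Experiments", "fine-tuned-model"))
print(version_kind("v2"))
```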

Planned deprecation for W&B Model Registry in 2024

The following tabs demonstrate how to consume model artifacts using the soon-to-be-deprecated Model Registry.

Use the W&B Registry to track, organize and consume model artifacts. For more information see the Registry docs.

Replace values within <> with your own:

import wandb
# Initialize a run
run = wandb.init(project="<project>", entity="<entity>")
# Access and download model. Returns path to downloaded artifact
downloaded_model_path = run.use_model(name="<your-model-name>")

Reference a model version with one of the following formats:

latest - Use latest alias to specify the model version that is most recently linked.
v# - Use v0, v1, v2, and so on to fetch a specific version in the Registered Model
alias - Specify the custom alias that you and your team assigned to your model version

See use_model in the API Reference guide for parameters and return type.

Navigate to the Model Registry App.
Select View details next to the name of the registered model that contains the model you want to download.
Within the Versions section, select the View button next to the model version you want to download.
Select the Files tab.
Click on the download button next to the model file you want to download.

4.3.14.10 - Create alerts and notifications

Get Slack notifications when a new model version is linked to the model registry.

Receive Slack notifications when a new model version is linked to the model registry.

Navigate to the W&B Model Registry app.
Select the registered model you want to receive notifications from.
Click on the Connect Slack button.
Follow the instructions to enable W&B in your Slack workspace that appear on the OAuth page.

Once you have configured Slack notifications for your team, you can pick and choose registered models to get notifications from.

A toggle that reads New model version linked to… appears instead of a Connect Slack button if you have Slack notifications configured for your team.

The screenshot below shows a FMNIST classifier registered model that has Slack notifications.

A message is automatically posted to the connected Slack channel each time a new model version is linked to the FMNIST classifier registered model.

4.3.14.11 - Manage data governance and access control

Use model registry role based access controls (RBAC) to control who can update protected aliases.

Use protected aliases to represent key stages of your model development pipeline. Only Model Registry admins can add, modify, or remove protected aliases. W&B blocks non-admin users from adding or removing protected aliases from model versions.

Only Team admins or current registry admins can manage the list of registry admins.

For example, suppose you set staging and production as protected aliases. Any member of your team can add new model versions. However, only admins can add a staging or production alias.
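The rule in this example can be written down as a small predicate. This is only an illustrative model of the behavior described above, not W&B code; `can_change_alias` is a hypothetical name.

```python
def can_change_alias(alias, is_registry_admin, protected_aliases):
    """Return True if a user may add or remove alias on a model version.

    Registry admins may change any alias. Other users may change only
    aliases that are not protected.
    """
    return is_registry_admin or alias not in protected_aliases


protected = {"staging", "production"}

print(can_change_alias("production", False, protected))  # non-admin, protected alias
print(can_change_alias("candidate", False, protected))   # non-admin, unprotected alias
print(can_change_alias("production", True, protected))   # registry admin
```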

Set up access control

The following steps describe how to set up access controls for your team’s model registry.

Navigate to the W&B Model Registry app.
Select the gear button on the top right of the page.
Select the Manage registry admins button.
Within the Members tab, select the users you want to grant access to add and remove protected aliases from model versions.

Add protected aliases

Navigate to the W&B Model Registry app.
Select the gear button on the top right of the page.
Scroll down to the Protected Aliases section.
Click on the plus icon (+) to add a new alias.

4.4 - Reports

Project management and collaboration tools for machine learning projects

Use W&B Reports to:

Organize Runs.
Embed and automate visualizations.
Describe your findings.
Share updates with collaborators, either as a LaTeX zip file or a PDF.

The following image shows a section of a report created from metrics that were logged to W&B over the course of training.

View the report where the above image was taken from here.

How it works

Create a collaborative report with a few clicks.

Navigate to your W&B project workspace in the W&B App.
Click the Create report button in the upper right corner of your workspace.

A modal titled Create Report will appear. Select the charts and panels you want to add to your report. (You can add or remove charts and panels later).
Click Create report.
Edit the report to your desired state.
Click Publish to project.
Click the Share button to share your report with collaborators.

See the Create a report page for more information on how to create reports interactively and programmatically with the W&B Python SDK.

How to get started

Depending on your use case, explore the following resources to get started with W&B Reports:

Check out our video demonstration to get an overview of W&B Reports.
Explore the Reports gallery for examples of live reports.
Try the Programmatic Workspaces tutorial to learn how to create and customize your workspace.
Read curated Reports in W&B Fully Connected.

Recommended practices and tips

For best practices and tips for Reports, see Best Practices: Reports.

4.4.1 - Create a report

Create a W&B Report with the W&B App or programmatically.

W&B Report and Workspace API is in Public Preview.

Select a tab below to learn how to create a report in the W&B App or programmatically with the W&B Report and Workspace API.

See this Google Colab for an example on how to programmatically create a report.

Navigate to your project workspace in the W&B App.
Click Create report in the upper right corner of your workspace.
A modal will appear. Select the charts you would like to start with. You can add or delete charts later from the report interface.
Select the Filter run sets option to prevent new runs from being added to your report. You can toggle this option on or off. Once you click Create report, a draft report will be available in the report tab to continue working on.

Navigate to your project workspace in the W&B App.
Select the Reports tab (clipboard icon) in your project.
Select the Create Report button on the report page.

Create a report programmatically:

Install W&B SDK (wandb) and Report and Workspace API (wandb-workspaces):
```
pip install wandb wandb-workspaces
```

Next, import workspaces

import wandb
import wandb_workspaces.reports.v2 as wr

Create a report instance with the wandb_workspaces.reports.v2.Report class (imported above as wr.Report). Specify a name for the project.
```
report = wr.Report(project="report_standard")
```
Save the report. Reports are not uploaded to the W&B server until you call the .save() method:
```
report.save()
```

For information on how to edit a report interactively with the App UI or programmatically, see Edit a report.

4.4.2 - Edit a report

Edit a report interactively with the App UI or programmatically with the W&B SDK.

W&B Report and Workspace API is in Public Preview.

Reports consist of blocks. Blocks make up the body of a report. Within these blocks you can add text, images, embedded visualizations, plots from experiments and runs, and panel grids.

Panel grids are a specific type of block that hold panels and run sets. Run sets are a collection of runs logged to a project in W&B. Panels are visualizations of run set data.

Check out the Programmatic workspaces tutorial for a step-by-step example of how to create and customize a saved workspace view.

Verify that you have the W&B Report and Workspace API wandb-workspaces installed in addition to the W&B Python SDK if you want to programmatically edit a report:

pip install wandb wandb-workspaces

Add plots

Each panel grid has a set of run sets and a set of panels. The run sets at the bottom of the section control what data shows up on the panels in the grid. Create a new panel grid if you want to add charts that pull data from a different set of runs.

Enter a forward slash (/) in the report to display a dropdown menu. Select Add panel to add a panel. You can add any panel that is supported by W&B, including a line plot, scatter plot or parallel coordinates chart.

Add plots to a report programmatically with the SDK. Pass a list of one or more plot or chart objects to the panels parameter in the PanelGrid Public API Class. Create a plot or chart object with its associated Python Class.

The following example demonstrates how to create a line plot and a scatter plot.

import wandb
import wandb_workspaces.reports.v2 as wr

report = wr.Report(
    project="report-editing",
    title="An amazing title",
    description="A descriptive description.",
)

blocks = [
    wr.PanelGrid(
        panels=[
            wr.LinePlot(x="time", y="velocity"),
            wr.ScatterPlot(x="time", y="acceleration"),
        ]
    )
]

report.blocks = blocks
report.save()

For more information about available plots and charts you can add to a report programmatically, see wr.panels.

Add run sets

Add run sets from projects interactively with the App UI or the W&B SDK.

Enter a forward slash (/) in the report to display a dropdown menu. From the dropdown, choose Panel Grid. This will automatically import the run set from the project the report was created from.

If you import a panel into a report, run names are inherited from the project. In the report, you can optionally rename a run to give the reader more context. The run is renamed only in the individual panel. If you clone the panel in the same report, the run is also renamed in the cloned panel.

In the report, click the pencil icon to open the report editor.
In the run set, find the run to rename. Hover over the run name, then click the three vertical dots. Select one of the following choices, then submit the form.
- Rename run for project: rename the run across the entire project. To generate a new random name, leave the field blank.
- Rename run for panel grid: rename the run only in the report, preserving the existing name in other contexts. Generating a new random name is not supported.
Click Publish report.

Add run sets from projects with the wr.Runset() and wr.PanelGrid() classes. The following procedure describes how to add a run set:

Create a wr.Runset() object instance. Provide the name of the project that contains the run sets for the project parameter and the entity that owns the project for the entity parameter.
Create a wr.PanelGrid() object instance. Pass a list of one or more runset objects to the runsets parameter.
Store one or more wr.PanelGrid() object instances in a list.
Update the report instance blocks attribute with the list of panel grid instances.

import wandb
import wandb_workspaces.reports.v2 as wr

report = wr.Report(
    project="report-editing",
    title="An amazing title",
    description="A descriptive description.",
)

panel_grids = wr.PanelGrid(
    runsets=[wr.Runset(project="<project-name>", entity="<entity-name>")]
)

report.blocks = [panel_grids]
report.save()

You can optionally add runsets and panels with one call to the SDK:

import wandb
import wandb_workspaces.reports.v2 as wr

report = wr.Report(
    project="report-editing",
    title="An amazing title",
    description="A descriptive description.",
)

panel_grids = wr.PanelGrid(
    panels=[
        wr.LinePlot(
            title="line title",
            x="x",
            y=["y"],
            range_x=[0, 100],
            range_y=[0, 100],
            log_x=True,
            log_y=True,
            title_x="x axis title",
            title_y="y axis title",
            ignore_outliers=True,
            groupby="hyperparam1",
            groupby_aggfunc="mean",
            groupby_rangefunc="minmax",
            smoothing_factor=0.5,
            smoothing_type="gaussian",
            smoothing_show_original=True,
            max_runs_to_show=10,
            plot_type="stacked-area",
            font_size="large",
            legend_position="west",
        ),
        wr.ScatterPlot(
            title="scatter title",
            x="y",
            y="y",
            # z='x',
            range_x=[0, 0.0005],
            range_y=[0, 0.0005],
            # range_z=[0,1],
            log_x=False,
            log_y=False,
            # log_z=True,
            running_ymin=True,
            running_ymean=True,
            running_ymax=True,
            font_size="small",
            regression=True,
        ),
    ],
    runsets=[wr.Runset(project="<project-name>", entity="<entity-name>")],
)


report.blocks = [panel_grids]
report.save()

Freeze a run set

A report automatically updates run sets to show the latest data from the project. You can preserve the run set in a report by freezing that run set. When you freeze a run set, you preserve the state of the run set in a report at a point in time.

To freeze a run set when viewing a report, click the snowflake icon in its panel grid near the Filter button.

Filter a run set programmatically

Programmatically filter run sets and add them to a report with the Workspace and Reports API.

The general syntax for a filter expression is:

Filter('key') operation <value>

Where key is the name of the filter, operation is a comparison or logical operator (for example, >, <, ==, in, not in, or, and), and <value> is the value to compare against. Filter is a placeholder for the type of filter you want to apply. The following table lists the available filters and their descriptions:

| Filter | Description | Available keys |
| --- | --- | --- |
| `Config('key')` | Filter by config values | Values specified in `config` parameter in `wandb.init(config=)`. |
| `SummaryMetric('key')` | Filter by summary metrics | Values you log to a run with `wandb.Run.log()`. |
| `Tags('key')` | Filter by tags | Tag values that you add to your run (programmatically or with the W&B App). |
| `Metric('key')` | Filter by run properties | `tags`, `state`, `displayName`, `jobType` |

Once you have defined your filters, you can create a report and pass the filtered run sets to wr.PanelGrid(runsets=). See the Report and Workspace API tabs throughout this page for more information on how to add various elements to a report programmatically.

The following examples demonstrate how to filter run sets in a report.

Config filters

Filter a runset by one or more config values. Config values are parameters you specify in your run configuration (wandb.init(config=)).

For example, the following code snippet first initializes a run with a config value for learning_rate and batch_size, then filters runs in a report based on the learning_rate config value.

import wandb

config = {
    "learning_rate": 0.01,
    "batch_size": 32,
}

with wandb.init(project="<project>", entity="<entity>", config=config) as run:
    # Your training code here
    pass

Within your Python script or notebook, you can then programmatically filter runs that have a learning rate greater than 0.01.

import wandb_workspaces.reports.v2 as wr

runset = wr.Runset(
  entity="your_entity",
  project="your_project",
  filters="Config('learning_rate') > 0.01"
)

You can also filter by multiple config values with the and operator:

runset = wr.Runset(
  entity="your_entity",
  project="your_project",
  filters="Config('learning_rate') > 0.01 and Config('batch_size') == 32"
)

Continuing from the previous example, you can create a report with the filtered runset as follows:

report = wr.Report(
  entity="your_entity",
  project="your_project",
  title="My Report"
)

report.blocks = [
  wr.PanelGrid(
      runsets=[runset],
      panels=[
          wr.LinePlot(
              x="Step",
              y=["accuracy"],
          )
      ]
  )
]

report.save()

Metric filters

Filter run sets based on a run’s: tag (tags), run state (state), run name (displayName), or job type (jobType).

Metric filters use a different syntax. Pass the values to compare against as a list.

Metric('key') operation [<value>]

For example, consider the following Python snippet that creates three runs and assigns each of them a name:

import wandb

# Create three separate runs, each with its own display name
for i in range(3):
    with wandb.init(project="<project>", entity="<entity>", name=f"run{i+1}") as run:
        # Your training code here
        pass

When you create your report, you can filter runs by their display name. For example, to filter runs with names run1, run2, and run3, you can use the following code:

runset = wr.Runset(
  entity="your_entity",
  project="your_project",
  filters="Metric('displayName') in ['run1', 'run2', 'run3']"
)

You can find the name of the run in the Overview page of a run in the W&B App or programmatically with Api.runs().run.name.
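If you already have run objects in hand, for example from `wandb.Api().runs("<entity>/<project>")`, the display-name filter can be built from their `name` attribute. The following is a minimal sketch: `display_name_filter` is a hypothetical helper, and the `SimpleNamespace` objects only mimic the `name` attribute so that the example runs without a W&B connection.

```python
from types import SimpleNamespace


def display_name_filter(runs):
    """Build a Metric('displayName') filter from objects that have a name attribute."""
    names = sorted({run.name for run in runs})  # de-duplicate and order the names
    return f"Metric('displayName') in {names!r}"


# Stand-ins for run objects; in practice these would come from the W&B API.
runs = [SimpleNamespace(name="run2"), SimpleNamespace(name="run1")]
expr = display_name_filter(runs)
print(expr)
```

The resulting string can be passed as the filters argument of wr.Runset.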

The following examples demonstrate how to filter a runset by the run’s state (finished, crashed, or running):

runset = wr.Runset(
  entity="your_entity",
  project="your_project",
  filters="Metric('state') in ['finished']"
)

runset = wr.Runset(
  entity="your_entity",
  project="your_project",
  filters="Metric('state') not in ['crashed']"
)

SummaryMetric filters

The following examples demonstrate how to filter a run set by summary metrics. Summary metrics are the values you log to a run with wandb.Run.log(). After you log a run, you can find the names of your summary metrics in the W&B App under the Summary section of a run’s Overview page.

runset = wr.Runset(
  entity="your_entity",
  project="your_project",
  filters="SummaryMetric('accuracy') > 0.9"
)

runset = wr.Runset(
  entity="your_entity",
  project="your_project",
  filters="Metric('state') in ['finished'] and SummaryMetric('train/train_loss') < 0.5"
)

Tags filters

The following code snippet shows how to filter a run set by its tags. Tags are values you add to a run (programmatically or with the W&B App).

runset = wr.Runset(
  entity="your_entity",
  project="your_project",
  filters="Tags('training') == 'training'"
)
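The filter expressions in this section are plain strings, so they can be assembled in code when the keys or thresholds are not known in advance. The sketch below is one way to do that under the syntax described above. `filter_expr` and `all_of` are hypothetical helpers, not part of the Report and Workspace API.

```python
FILTER_KINDS = {"Config", "SummaryMetric", "Tags", "Metric"}


def filter_expr(kind, key, op, value):
    """Return a single filter expression such as Config('learning_rate') > 0.01."""
    if kind not in FILTER_KINDS:
        raise ValueError(f"unknown filter type: {kind!r}")
    # repr() quotes strings and formats lists the way the examples above do.
    return f"{kind}({key!r}) {op} {value!r}"


def all_of(*expressions):
    """Combine expressions with the and operator."""
    return " and ".join(expressions)


combined = all_of(
    filter_expr("Metric", "state", "in", ["finished"]),
    filter_expr("SummaryMetric", "train/train_loss", "<", 0.5),
)
print(combined)
```

Pass the resulting string as the filters argument when you create the run set.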

Add code blocks

Add code blocks to your report interactively with the App UI or with the W&B SDK.

Enter a forward slash (/) in the report to display a dropdown menu. From the dropdown choose Code.

Select the name of the programming language on the right-hand side of the code block. This will expand a dropdown. From the dropdown, select your programming language syntax. You can choose from JavaScript, Python, CSS, JSON, HTML, Markdown, and YAML.

Use the wr.CodeBlock Class to create a code block programmatically. Provide the name of the language and the code you want to display for the language and code parameters, respectively.

For example, the following code creates a code block that contains a list in YAML:

import wandb
import wandb_workspaces.reports.v2 as wr

report = wr.Report(project="report-editing")

report.blocks = [
    wr.CodeBlock(
        code=["this:", "- is", "- a", "cool:", "- yaml", "- file"], language="yaml"
    )
]

report.save()

This will render a code block similar to:

this:
- is
- a
cool:
- yaml
- file

The following example demonstrates a Python code block:

report = wr.Report(project="report-editing")


report.blocks = [wr.CodeBlock(code=["Hello, World!"], language="python")]

report.save()

This will render a code block similar to:

Hello, World!

Add markdown

Add markdown to your report interactively with the App UI or with the W&B SDK.

Enter a forward slash (/) in the report to display a dropdown menu. From the dropdown choose Markdown.

Use the wr.MarkdownBlock class to create a markdown block programmatically. Pass a string to the text parameter:

import wandb
import wandb_workspaces.reports.v2 as wr

report = wr.Report(project="report-editing")

report.blocks = [
    wr.MarkdownBlock(text="Markdown cell with *italics* and **bold** and $e=mc^2$")
]

report.save()

This will render a markdown block similar to:

Add HTML elements

Add HTML elements to your report interactively with the App UI or with the W&B SDK.

Enter a forward slash (/) in the report to display a dropdown menu. From the dropdown select a type of text block. For example, to create an H2 heading block, select the Heading 2 option.

Pass a list of one or more HTML elements to the report's blocks attribute (report.blocks). The following example demonstrates how to create an H1, H2, and an unordered list:

import wandb
import wandb_workspaces.reports.v2 as wr

report = wr.Report(project="report-editing")

report.blocks = [
    wr.H1(text="How Programmatic Reports work"),
    wr.H2(text="Heading 2"),
    wr.UnorderedList(items=["Bullet 1", "Bullet 2"]),
]

report.save()

This renders the following HTML elements:

Embed rich media links

Embed rich media within the report with the App UI or with the W&B SDK.

Copy and paste URLs into reports to embed rich media within the report. The following animations demonstrate how to copy and paste URLs from Twitter, YouTube, and SoundCloud.

Twitter

Copy and paste a Tweet link URL into a report to view the Tweet within the report.

YouTube

Copy and paste a YouTube video URL link to embed a video in the report.

SoundCloud

Copy and paste a SoundCloud link to embed an audio file into a report.

Pass a list of one or more embedded media objects to the report's blocks attribute (report.blocks). The following example demonstrates how to embed video and Twitter media into a report:

import wandb
import wandb_workspaces.reports.v2 as wr

report = wr.Report(project="report-editing")

report.blocks = [
    wr.Video(url="https://www.youtube.com/embed/6riDJMI-Y8U"),
    wr.Twitter(
        embed_html='<blockquote class="twitter-tweet"><p lang="en" dir="ltr">The voice of an angel, truly. <a href="https://twitter.com/hashtag/MassEffect?src=hash&amp;ref_src=twsrc%5Etfw">#MassEffect</a> <a href="https://t.co/nMev97Uw7F">pic.twitter.com/nMev97Uw7F</a></p>&mdash; Mass Effect (@masseffect) <a href="https://twitter.com/masseffect/status/1428748886655569924?ref_src=twsrc%5Etfw">August 20, 2021</a></blockquote>\n'
    ),
]
report.save()

Duplicate and delete panel grids

If you have a layout that you would like to reuse, you can select a panel grid and copy-paste it to duplicate it in the same report or even paste it into a different report.

Highlight a whole panel grid section by selecting the drag handle in the upper right corner. Click and drag to highlight and select a region in a report such as panel grids, text, and headings.

Select a panel grid and press delete on your keyboard to delete a panel grid.

Collapse headers to organize Reports

Collapse headers in a Report to hide content within a text block. When the report is loaded, only headers that are expanded will show content. Collapsing headers in reports can help organize your content and prevent excessive data loading. The following gif demonstrates the process.

Visualize relationships across multiple dimensions

To effectively visualize relationships across multiple dimensions, use a color gradient to represent one of the variables. This enhances clarity and makes patterns easier to interpret.

Choose a variable to represent with a color gradient (e.g., penalty scores, learning rates, etc.). This allows for a clearer understanding of how penalty (color) interacts with reward/side effects (y-axis) over training time (x-axis).
Highlight key trends. Hovering over a specific group of runs highlights them in the visualization.

4.4.3 - Collaborate on reports

Collaborate and share W&B Reports with peers, co-workers, and your team.

This page describes various ways to collaborate on reports with your team.

When viewing a report, click Share, then:

To share a link to the report with an email address or a username, click Invite. Enter an email address or username, select Can view or Can edit, then click Invite. If you share by email, the email address does not need to be a member of your organization or team.
To generate a sharing link instead, click Share. Adjust the permissions for the link, then click Copy report link. Share the link with the member.

When viewing the report, click a panel to open it in full screen mode. If you copy the URL from the browser and share it with another user, when they access the link the panel will open directly in full screen mode.

Edit a report

When any team member clicks the Edit button to begin editing the report, a draft is automatically saved. Select Save to report to publish your changes.

If an edit conflict occurs, such as when two team members edit the report at once, a warning notification helps you to resolve any conflicts.

Comment on reports

Click Comment to leave a comment on a report.

To comment directly on a panel, hover over the panel, then click the comment button, which looks like a speech bubble.

Star a report

If your team has a large number of reports, click Star at the top of a report to add it to your favorites. When viewing your team’s list of reports, click the star in a report’s row to add it to your favorites. Starred reports appear at the top of the list.

From the list of reports, you can see how many members have starred each report to gauge its relative popularity.

4.4.4 - Clone and export reports

Export a W&B Report as a PDF or LaTeX.

W&B Report and Workspace API is in Public Preview.

Export reports

Export a report as a PDF or LaTeX. Within your report, select the kebab icon to expand the dropdown menu. Choose Download and select either PDF or LaTeX output format.

Cloning reports

Within your report, select the kebab icon to expand the dropdown menu. Choose the Clone this report button. Pick a destination for your cloned report in the modal. Choose Clone report.

Clone a report to reuse a project’s template and format. Cloned projects are visible to your team if you clone a project within the team’s account. Projects cloned within an individual’s account are only visible to that user.

Load a Report from a URL to use it as a template.

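import wandb_workspaces.reports.v2 as wr

ENTITY = "<entity>"  # Replace with your entity
PROJECT = "<project>"  # Replace with your project

report = wr.Report(
    project=PROJECT, title="Quickstart Report", description="That was easy!"
)  # Create
report.save()  # Save
new_report = wr.Report.from_url(report.url)  # Load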

Edit the content within new_report.blocks.

pg = wr.PanelGrid(
    runsets=[
        wr.Runset(ENTITY, PROJECT, "First Run Set"),
        wr.Runset(ENTITY, PROJECT, "Elephants Only!", query="elephant"),
    ],
    panels=[
        wr.LinePlot(x="Step", y=["val_acc"], smoothing_factor=0.8),
        wr.BarPlot(metrics=["acc"]),
        wr.MediaBrowser(media_keys="img", num_columns=1),
        wr.RunComparer(diff_only="split", layout={"w": 24, "h": 9}),
    ],
)
new_report.blocks = (
    report.blocks[:1] + [wr.H1("Panel Grid Example"), pg] + report.blocks[1:]
)
new_report.save()
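The slicing in the snippet above inserts new blocks after the first existing block. Because a report's blocks are handled as an ordinary Python list, the same pattern can be wrapped in a small function. This is an illustrative sketch with a hypothetical helper name, and it uses strings in place of real block objects so that it runs without W&B.

```python
def insert_blocks(blocks, new_blocks, index):
    """Return a new list with new_blocks inserted before position index."""
    blocks = list(blocks)  # copy so the original list is left unchanged
    return blocks[:index] + list(new_blocks) + blocks[index:]


# Strings stand in for report blocks such as wr.H1 or wr.PanelGrid.
existing = ["intro", "results", "conclusion"]
updated = insert_blocks(existing, ["heading", "panel grid"], 1)
print(updated)
```

Returning a new list rather than mutating in place keeps the template's blocks intact while you build the cloned report.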

4.4.5 - Embed a report

Embed W&B reports directly into Notion or with an HTML IFrame element.

HTML iframe element

Select the Share button on the upper right hand corner within a report. A modal window will appear. Within the modal window, select Copy embed code. The copied code will render within an Inline Frame (IFrame) HTML element. Paste the copied code into an iframe HTML element of your choice.

Only public reports are viewable when embedded.
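If you need to generate the iframe markup yourself rather than copy it from the Share modal, the following sketch builds it from a report URL. `report_iframe` is a hypothetical helper, the URL is a placeholder, and the style values are assumptions. Prefer the embed code from the modal when it is available.

```python
from html import escape


def report_iframe(url, height_px=1024):
    """Return an iframe element that embeds the report at url."""
    if not url.startswith("https://"):
        raise ValueError("expected an https:// report URL")
    src = escape(url, quote=True)  # keep quotes and angle brackets out of the attribute
    return (
        f'<iframe src="{src}" '
        f'style="border:none;height:{height_px}px;width:100%"></iframe>'
    )


# Placeholder URL for illustration only.
html_snippet = report_iframe("https://wandb.ai/example-entity/example-project/reports/example")
print(html_snippet)
```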

Confluence

The following animation demonstrates how to insert the direct link to the report within an IFrame cell in Confluence.

Notion

The following animation demonstrates how to insert a report into a Notion document using an Embed block in Notion and the report’s embedded code.

Gradio

You can use the gr.HTML element to embed W&B Reports within Gradio Apps and use them within Hugging Face Spaces.

import gradio as gr


def wandb_report(url):
    iframe = f'<iframe src="{url}" style="border:none;height:1024px;width:100%"></iframe>'
    return gr.HTML(iframe)


with gr.Blocks() as demo:
    report = wandb_report(
        "https://wandb.ai/_scott/pytorch-sweeps-demo/reports/loss-22-10-07-16-00-17---VmlldzoyNzU2NzAx"
    )
demo.launch()

4.4.6 - Compare runs across projects

Compare runs from two different projects with cross-project reports.

Watch a video demonstrating comparing runs across projects (2 min).

Compare runs from two different projects with cross-project reports. Use the project selector in the run set table to pick a project.

The visualizations in the section pull columns from the first active runset. Make sure that the first run set checked in the section has that column available if you do not see the metric you are looking for in the line plot.

This feature supports history data on time series lines, but it does not support pulling different summary metrics from different projects. In other words, you cannot create a scatter plot from columns that are only logged in another project.

If you need to compare runs from two projects and the columns are not working, add a tag to the runs in one project and then move those runs to the other project. You can still filter only the runs from each project, but the report includes all the columns for both sets of runs.

View-only report links

Share a view-only link to a report that is in a private project or team project.

View-only report links add a secret access token to the URL, so anyone who opens the link can view the page. Anyone can use the magic link to view the report without logging in first. For customers on W&B Local private cloud installations, these links remain behind your firewall, so only members of your team with access to your private instance and access to the view-only link can view the report.

In view-only mode, someone who is not logged in can see the charts, mouse over them to see tooltips of values, zoom in and out on charts, and scroll through columns in the table. They cannot create new charts or new table queries to explore the data, and they cannot click a run to get to the run page. View-only visitors also cannot see the share modal; instead, they see a tooltip on hover that says: Sharing not available for view only access.

The magic links are only available for “Private” and “Team” projects. For “Public” (anyone can view) or “Open” (anyone can view and contribute runs) projects, the links cannot be turned on or off because the project is already available to anyone with the link.

Send a graph to a report

Send a graph from your workspace to a report to keep track of your progress. Click the dropdown menu on the chart or panel you’d like to copy to a report and click Add to report to select the destination report.

4.4.7 - Example reports

Reports gallery

Notes: Add a visualization with a quick summary

Capture an important observation, an idea for future work, or a milestone reached in the development of a project. All experiment runs in your report will link to their parameters, metrics, logs, and code, so you can save the full context of your work.

Jot down some text and pull in relevant charts to illustrate your insight.

See the What To Do When Inception-ResNet-V2 Is Too Slow W&B Report for an example of how you can share comparisons of training time.

Save the best examples from a complex code base for easy reference and future interaction. See the LIDAR point clouds W&B Report for an example of how to visualize LIDAR point clouds from the Lyft dataset and annotate with 3D bounding boxes.

Explain how to get started with a project, share what you’ve observed so far, and synthesize the latest findings. Your colleagues can make suggestions or discuss details using comments on any panel or at the end of the report.

Include dynamic settings so that your colleagues can explore for themselves, get additional insights, and better plan their next steps. In this example, three types of experiments can be visualized independently, compared, or averaged.

See the SafeLife benchmark experiments W&B Report for an example of how to share first runs and observations of a benchmark.

Use sliders and configurable media panels to showcase a model’s results or training progress. View the Cute Animals and Post-Modern Style Transfer: StarGAN v2 for Multi-Domain Image Synthesis report for an example W&B Report with sliders.

Work log: Track what you’ve tried and plan next steps

Write down your thoughts on experiments, your findings, and any gotchas and next steps as you work through a project, keeping everything organized in one place. This lets you “document” all the important pieces beyond your scripts. See the Who Is Them? Text Disambiguation With Transformers W&B Report for an example of how you can report your findings.

Tell the story of a project, which you and others can reference later to understand how and why a model was developed. See The View from the Driver’s Seat W&B Report for how you can report your findings.

See the Learning Dexterity End-to-End Using W&B Reports report for an example of how the OpenAI Robotics team used W&B Reports to run massive machine learning projects.

4.5 - Automations

This feature requires a Pro or Enterprise plan.

This page describes automations in W&B. Create an automation to trigger workflow steps, such as automated model testing and deployment, based on an event in W&B, such as when an artifact version is created or when a run metric meets or changes by a threshold.

For example, an automation can notify a Slack channel when a new version is created, trigger an automated testing webhook when the production alias is added to an artifact, or start a validation job only when a run’s loss is within acceptable bounds.

Overview

An automation can start when a specific event occurs in a registry or project.

In a Registry, an automation can start:

When a new artifact version is linked to a collection. For example, trigger testing and validation workflows for new candidate models.
When an alias is added to an artifact version. For example, trigger a deployment workflow when an alias is added to a model version.

In a project, an automation can start:

When a new version is added to an artifact. For example, start a training job when a new version of a dataset artifact is added to a given collection.
When an alias is added to an artifact version. For example, trigger a PII redaction workflow when the alias “redaction” is added to a dataset artifact.
When a tag is added to an artifact version. For example, trigger a geo-specific workflow when the tag “europe” is added to an artifact version.
When a metric for a run meets or exceeds a configured threshold.
When a metric for a run changes by a configured threshold.
When a run’s status changes to Running, Failed, or Finished.

Optionally filter runs by user or run name.

For more details, see Automation events and scopes.

To create an automation, you:

If required, configure secrets for sensitive strings the automation requires, such as access tokens, passwords, or sensitive configuration details. Secrets are defined in your Team Settings. Secrets are most commonly used in webhook automations to securely pass credentials or tokens to the webhook’s external service without exposing them in plain text or hard-coding them in the webhook’s payload.
Configure the webhook or Slack notification to authorize W&B to post to Slack or run the webhook on your behalf. A single automation action (webhook or Slack notification) can be used by multiple automations. These actions are defined in your Team Settings.
In the project or registry, create the automation:
1. Define the event to watch for, such as when a new artifact version is added.
2. Define the action to take when the event occurs (posting to a Slack channel or running a webhook). For a webhook, specify a secret to use for the access token and/or a secret to send with the payload, if required.

Limitations

Run metric automations are currently supported only in W&B Multi-tenant Cloud.

Next steps

Create an automation.
Learn about Automation events and scopes.
Create a secret.

4.5.1 - Create an automation

This feature requires a Pro or Enterprise plan.

This page gives an overview of creating and managing W&B automations. For more detailed instructions, refer to Create a Slack automation or Create a webhook automation.

Looking for companion tutorials for automations?

Requirements

A team admin can create and manage automations for the team’s projects, as well as components of their automations, such as webhooks, secrets, and Slack integrations. Refer to Team settings.
To create a registry automation, you must have access to the registry. Refer to Configure Registry access.
To create a Slack automation, you must have permission to post to the Slack instance and channel you select.

Create an automation

Create an automation from the project or registry’s Automations tab. At a high level, to create an automation, follow these steps:

If necessary, create a W&B secret for each sensitive string required by the automation, such as an access token, password, or SSH key. Secrets are defined in your Team Settings. Secrets are most commonly used in webhook automations.
Configure the webhook or Slack integration to authorize W&B to post to Slack or run the webhook on your behalf. A single webhook or Slack integration can be used by multiple automations. These actions are defined in your Team Settings.
In the project or registry, create the automation, which specifies the event to watch for and the action to take (such as posting to Slack or running a webhook). When you create a webhook automation, you configure the payload it sends.

Or, from a line plot in the workspace, you can quickly create a run metric automation for the metric it shows:

Hover over the panel, then click the bell icon at the top of the panel.
Configure the automation using the basic or advanced configuration controls. For example, apply a run filter to limit the scope of the automation, or configure an absolute threshold.

For details, refer to Create a Slack automation or Create a webhook automation.

View and manage automations

View and manage automations from a project or registry’s Automations tab.

To view an automation’s details, click its name.
To edit an automation, click its action ... menu, then click Edit automation.
To delete an automation, click its action ... menu, then click Delete automation.

Next steps

Learn more about automation events and scopes
Create a Slack automation.
Create a webhook automation.
Create a secret.

4.5.1.1 - Create a Slack automation

This feature requires a Pro or Enterprise plan.

This page shows how to create a Slack automation. To create a webhook automation, refer to Create a webhook automation instead.

At a high level, to create a Slack automation, you take these steps:

Add a Slack integration, which authorizes W&B to post to the Slack instance and channel.
Create the automation, which defines the event to watch for and the channel to notify.

Add a Slack integration

A team admin can add a Slack integration to the team.

Log in to W&B and go to Team Settings.
In the Slack channel integrations section, click Connect Slack to add a new Slack instance. To add a channel for an existing Slack instance, click New integration.
If necessary, sign in to Slack in your browser. When prompted, grant W&B permission to post to the Slack channel you select. Read the page, then click Search for a channel and begin typing the channel name. Select the channel from the list, then click Allow.
In Slack, go to the channel you selected. If you see a post like [Your Slack handle] added an integration to this channel: Weights & Biases, the integration is configured correctly.

Now you can create an automation that notifies the Slack channel you configured.

View and manage Slack integrations

A team admin can view and manage the team’s Slack instances and channels.

Log in to W&B and go to Team Settings.
View each Slack destination in the Slack channel integrations section.
Delete a destination by clicking its trash icon.

Create an automation

After you add a Slack integration, select Registry or Project, then follow these steps to create an automation that notifies the Slack channel.

A Registry admin can create automations in that registry.

Log in to W&B.
Click the name of a registry to view its details.
To create an automation scoped to the registry, click the Automations tab, then click Create automation. An automation that is scoped to a registry is automatically applied to all of its collections (including those created in the future).

To create an automation scoped only to a specific collection in the registry, click the collection’s action ... menu, then click Create automation. Alternatively, while viewing a collection, create an automation for it using the Create automation button in the Automations section of the collection’s details page.
Choose the event to watch for.

Fill in any additional fields that appear, which depend upon the event. For example, if you select An artifact alias is added, you must specify the Alias regex.

Click Next step.
Select the team that owns the Slack integration.
Set Action type to Slack notification. Select the Slack channel, then click Next step.
Provide a name for the automation. Optionally, provide a description.
Click Create automation.
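
The Alias regex field takes a regular expression. As a rough way to check a pattern before you save the automation, the sketch below tests a few candidate aliases with Python's `re` module. Whether W&B matches the pattern against the whole alias or searches within it is not stated here, so the helper assumes full matching; the function name, pattern, and aliases are hypothetical.

```python
import re


def matching_aliases(pattern: str, aliases: list[str]) -> list[str]:
    """Return the aliases that the pattern matches in full.

    Full matching is an assumption made for illustration;
    W&B's own matching rules may differ.
    """
    compiled = re.compile(pattern)
    return [alias for alias in aliases if compiled.fullmatch(alias)]


# Hypothetical aliases and pattern.
candidates = ["production", "prod-eu", "staging", "latest"]
print(matching_aliases(r"prod(uction)?(-\w+)?", candidates))
```

If the pattern turns out to be applied as a search rather than a full match, anchoring it with `^` and `$` keeps the behavior the same in both cases.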

A W&B admin can create automations in a project.

Log in to W&B.
Go to the project page and click the Automations tab, then click Create automation.

Or, from a line plot in the workspace, you can quickly create a run metric automation for the metric it shows. Hover over the panel, then click the bell icon at the top of the panel.
Choose the event to watch for.

Fill in any additional fields that appear, which depend upon the event. For example, if you select An artifact alias is added, you must specify the Alias regex.

Click Next step.
Select the team that owns the Slack integration.
Set Action type to Slack notification. Select the Slack channel, then click Next step.
Provide a name for the automation. Optionally, provide a description.
Click Create automation.

View and manage automations

Manage the registry’s automations from the registry’s Automations tab.
Manage a collection’s automations from the Automations section of the collection’s details page.

From either of these pages, a Registry admin can manage existing automations:

To view an automation’s details, click its name.
To edit an automation, click its action ... menu, then click Edit automation.
To delete an automation, click its action ... menu, then click Delete automation. Confirmation is required.

A W&B admin can view and manage a project’s automations from the project’s Automations tab.

To view an automation’s details, click its name.
To edit an automation, click its action ... menu, then click Edit automation.
To delete an automation, click its action ... menu, then click Delete automation. Confirmation is required.

4.5.1.2 - Create a webhook automation

This feature requires a Pro or Enterprise plan.

This page shows how to create a webhook automation. To create a Slack automation, refer to Create a Slack automation instead.

At a high level, to create a webhook automation, you take these steps:

If necessary, create a W&B secret for each sensitive string required by the automation, such as an access token, password, or SSH key. Secrets are defined in your Team Settings.
Create a webhook to define the endpoint and authorization details and grant the integration access to any secrets it needs.
Create the automation to define the event to watch for and the payload W&B will send. Grant the automation access to any secrets it needs for the payload.

Create a webhook

A team admin can add a webhook for the team.

If the webhook requires a Bearer token or its payload requires a sensitive string, create a secret that contains it before creating the webhook. You can configure at most one access token and one other secret for a webhook. Your webhook’s authentication and authorization requirements are determined by the webhook’s service.

Log in to W&B and go to the Team Settings page.
In the Webhooks section, click New webhook.
Provide a name for the webhook.
Provide the endpoint URL for the webhook.
If the webhook requires a Bearer token, set Access token to the secret that contains it. When using the webhook automation, W&B sets the Authorization: Bearer HTTP header to the access token, and you can access the token in the ${ACCESS_TOKEN} payload variable. Learn more about the structure of the POST request W&B sends to the webhook service in Troubleshoot your webhook.
If the webhook requires a password or other sensitive string in its payload, set Secret to the secret that contains it. When you configure the automation that uses the webhook, you can access the secret as a payload variable by prefixing its name with $.

If the webhook’s access token is stored in a secret, you must also complete the next step to specify the secret as the access token.
To verify that W&B can connect and authenticate to the endpoint:
1. Optionally, provide a payload to test. To refer to a secret the webhook has access to in the payload, prefix its name with $. This payload is only used for testing and is not saved. You configure an automation’s payload when you create the automation. See Troubleshoot your webhook to view where the secret and access token are specified in the POST request.
2. Click Test. W&B attempts to connect to the webhook’s endpoint using the credentials you configured. If you provided a payload, W&B sends it.
If the test does not succeed, verify the webhook’s configuration and try again. If necessary, refer to Troubleshoot your webhook.

Screenshot showing two webhooks in a Team

Now you can create an automation that uses the webhook.

Create an automation

After you configure a webhook, select Registry or Project, then follow these steps to create an automation that triggers the webhook.

A Registry admin can create automations in that registry. Registry automations are applied to all collections in the registry, including those added in the future.

Log in to W&B.
Click the name of a registry to view its details.
To create an automation scoped to the registry, click the Automations tab, then click Create automation. An automation that is scoped to a registry is automatically applied to all of its collections (including those created in the future).

To create an automation scoped only to a specific collection in the registry, click the collection’s action ... menu, then click Create automation. Alternatively, while viewing a collection, create an automation for it using the Create automation button in the Automations section of the collection’s details page.
Choose the event to watch for. Fill in any additional fields that appear, which depend upon the event. For example, if you select An artifact alias is added, you must specify the Alias regex. Click Next step.
Select the team that owns the webhook.
Set Action type to Webhooks, then select the webhook to use.
If you configured an access token for the webhook, you can access the token in the ${ACCESS_TOKEN} payload variable. If you configured a secret for the webhook, you can access it in the payload by prefixing its name with $. Your webhook’s requirements are determined by the webhook’s service.
Click Next step.
Provide a name for the automation. Optionally, provide a description. Click Create automation.

A W&B admin can create automations in a project.

Log in to W&B and go to the project page.
In the sidebar, click Automations, then click Create automation.

Or, from a line plot in the workspace, you can quickly create a run metric automation for the metric it shows. Hover over the panel, then click the bell icon at the top of the panel.
Choose the event to watch for, such as when an artifact alias is added or when a run metric meets a given threshold.
1. Fill in any additional fields that appear, which depend upon the event. For example, if you select An artifact alias is added, you must specify the Alias regex.
2. Optionally specify a collection filter. Otherwise, the automation is applied to all collections in the project, including those added in the future.
Click Next step.
Select the team that owns the webhook.
Set Action type to Webhooks, then select the webhook to use.
If your webhook requires a payload, construct it and paste it into the Payload field. If you configured an access token for the webhook, you can access the token in the ${ACCESS_TOKEN} payload variable. If you configured a secret for the webhook, you can access it in the payload by prefixing its name with $. Your webhook’s requirements are determined by the webhook’s service.
Click Next step.
Provide a name for the automation. Optionally, provide a description. Click Create automation.

View and manage automations

Manage a registry’s automations from the registry’s Automations tab.
Manage a collection’s automations from the Automations section of the collection’s details page.

From either of these pages, a Registry admin can manage existing automations:

To view an automation’s details, click its name.
To edit an automation, click its action ... menu, then click Edit automation.
To delete an automation, click its action ... menu, then click Delete automation. Confirmation is required.

A W&B admin can view and manage a project’s automations from the project’s Automations tab.

To view an automation’s details, click its name.
To edit an automation, click its action ... menu, then click Edit automation.
To delete an automation, click its action ... menu, then click Delete automation. Confirmation is required.

Payload reference

Use these sections to construct your webhook’s payload. For details about testing your webhook and its payload, refer to Troubleshoot your webhook.

Payload variables

This section describes the variables you can use to construct your webhook’s payload.

Variable	Details
`${project_name}`	The name of the project that owns the mutation that triggered the action.
`${entity_name}`	The name of the entity or team that owns the mutation that triggered the action.
`${event_type}`	The type of event that triggered the action.
`${event_author}`	The user that triggered the action.
`${alias}`	Contains an artifact’s alias if the automation is triggered by the An artifact alias is added event. For other automations, this variable is blank.
`${tag}`	Contains an artifact’s tags if the automation is triggered by the An artifact tag is added event. For other automations, this variable is blank.
`${artifact_collection_name}`	The name of the artifact collection that the artifact version is linked to.
`${artifact_metadata.<KEY>}`	The value of an arbitrary top-level metadata key from the artifact version that triggered the action. Replace `<KEY>` with the name of a top-level metadata key. Only top-level metadata keys are available in the webhook’s payload.
`${artifact_version}`	The `wandb.Artifact` representation of the artifact version that triggered the action.
`${artifact_version_string}`	The `string` representation of the artifact version that triggered the action.
`${ACCESS_TOKEN}`	The value of the access token configured in the webhook, if an access token is configured. The access token is automatically passed in the `Authorization: Bearer` HTTP header.
`${SECRET_NAME}`	If configured, the value of a secret configured in the webhook. Replace `SECRET_NAME` with the name of the secret.

Payload examples

This section includes examples of webhook payloads for some common use cases. The examples demonstrate how to use payload variables.

Verify that your access tokens have the required set of permissions to trigger your GitHub Actions workflow. For more information, see these GitHub Docs.

Send a repository dispatch from W&B to trigger a GitHub action. For example, suppose you have a GitHub workflow file that accepts a repository dispatch as a trigger for the on key:

on:
  repository_dispatch:
    types: BUILD_AND_DEPLOY

The payload for the repository might look something like:

{
  "event_type": "BUILD_AND_DEPLOY",
  "client_payload": {
    "event_author": "${event_author}",
    "artifact_version": "${artifact_version}",
    "artifact_version_string": "${artifact_version_string}",
    "artifact_collection_name": "${artifact_collection_name}",
    "project_name": "${project_name}",
    "entity_name": "${entity_name}"
  }
}

The event_type key in the webhook payload must match the types field in the GitHub workflow YAML file.
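
To check the workflow trigger independently of W&B, you can send the same kind of repository dispatch yourself. The sketch below builds the request with Python's standard library. It assumes GitHub's REST endpoint `POST /repos/{owner}/{repo}/dispatches`; the helper name, owner, repository, token, and payload values are placeholders, not values from W&B.

```python
import json
import urllib.request


def build_dispatch_request(owner: str, repo: str, token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a repository dispatch request for GitHub."""
    return urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/dispatches",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values; event_type must match the workflow's `types` field.
payload = {
    "event_type": "BUILD_AND_DEPLOY",
    "client_payload": {"event_author": "example-user"},
}
request = build_dispatch_request("my-org", "my-repo", "<your_github_token>", payload)
print(request.get_method(), request.full_url)

# To actually send the dispatch, uncomment the next line.
# urllib.request.urlopen(request)
```

If the workflow does not start, compare the `event_type` you sent with the `types` value in the workflow file before looking at the W&B side.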

The contents and positioning of rendered template strings depend on the event or model version the automation is configured for. ${event_type} will render as either LINK_ARTIFACT or ADD_ARTIFACT_ALIAS. See below for an example mapping:

${event_type} --> "LINK_ARTIFACT" or "ADD_ARTIFACT_ALIAS"
${event_author} --> "<wandb-user>"
${artifact_version} --> "wandb-artifact://_id/QXJ0aWZhY3Q6NTE3ODg5ODg3"
${artifact_version_string} --> "<entity>/model-registry/<registered_model_name>:<alias>"
${artifact_collection_name} --> "<registered_model_name>"
${project_name} --> "model-registry"
${entity_name} --> "<entity>"

Use template strings to dynamically pass context from W&B to GitHub Actions and other tools. If those tools can call Python scripts, they can consume the registered model artifacts through the W&B API.
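
To preview what a payload might look like after substitution, you can render the template strings locally. The sketch below is an illustration, not W&B's implementation: `render` is a hypothetical helper that replaces each `${name}` with a sample value and, mirroring how `${alias}` and `${tag}` are described as blank for other events, renders unknown names as empty strings. All sample values are made up.

```python
import json
import re

# Matches ${name}, including dotted names such as ${artifact_metadata.KEY}.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_.]*)\}")


def render(template: str, values: dict[str, str]) -> str:
    """Replace each ${name} with its value; unknown names become empty strings."""
    return PLACEHOLDER.sub(lambda match: values.get(match.group(1), ""), template)


template = json.dumps({
    "event_type": "BUILD_AND_DEPLOY",
    "client_payload": {
        "event_author": "${event_author}",
        "artifact_version_string": "${artifact_version_string}",
        "alias": "${alias}",
    },
})

# Hypothetical sample values for a local preview.
sample_values = {
    "event_author": "example-user",
    "artifact_version_string": "example-team/model-registry/example-model:v3",
}

rendered = json.loads(render(template, sample_values))
print(json.dumps(rendered, indent=2))
```

Parsing the rendered text with `json.loads` is a quick check that the payload is still valid JSON after substitution.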

For more information about repository dispatch, see the official documentation on the GitHub Marketplace.
Watch the videos Webhook Automations for Model Evaluation and Webhook Automations for Model Deployment, which guide you to create automations for model evaluation and deployment.
Review a W&B report, which illustrates how to use a GitHub Actions webhook automation for Model CI. Check out this GitHub repository to learn how to create model CI with a Modal Labs webhook.

This example payload shows how to notify your Teams channel using a webhook:

{
  "@type": "MessageCard",
  "@context": "http://schema.org/extensions",
  "summary": "New Notification",
  "sections": [
    {
      "activityTitle": "Notification from WANDB",
      "text": "This is an example message sent via Teams webhook.",
      "facts": [
        {
          "name": "Author",
          "value": "${event_author}"
        },
        {
          "name": "Event Type",
          "value": "${event_type}"
        }
      ],
      "markdown": true
    }
  ]
}

You can use template strings to inject W&B data into your payload at the time of execution (as shown in the Teams example above).

This section is provided for historical purposes. If you currently use a webhook to integrate with Slack, W&B recommends that you update your configuration to use the new Slack integration instead (see Create a Slack automation).

Set up your Slack app and add an incoming webhook integration with the instructions highlighted in the Slack API documentation. Ensure that you have the secret specified under Bot User OAuth Token as your W&B webhook’s access token.

The following is an example payload:

{
    "text": "New alert from WANDB!",
    "blocks": [
        {
            "type": "section",
            "text": {
                "type": "mrkdwn",
                "text": "Registry event: ${event_type}"
            }
        },
        {
            "type": "section",
            "text": {
                "type": "mrkdwn",
                "text": "New version: ${artifact_version_string}"
            }
        },
        {
            "type": "divider"
        },
        {
            "type": "section",
            "text": {
                "type": "mrkdwn",
                "text": "Author: ${event_author}"
            }
        }
    ]
}

Troubleshoot your webhook

Interactively troubleshoot your webhook with the W&B App UI or programmatically with a Bash script. You can troubleshoot a webhook when you create a new webhook or edit an existing webhook.

For details about the format W&B uses for the POST request, refer to the Bash script tab.

A team admin can test a webhook interactively with the W&B App UI.

Navigate to your W&B Team Settings page.
Scroll to the Webhooks section.
Click the three horizontal dots (meatball icon) next to the name of your webhook.
Select Test.
In the UI panel that appears, paste your POST request into the field provided.
Click on Test webhook. Within the W&B App UI, W&B posts the response from your endpoint.

Watch the video Testing Webhooks in W&B for a demonstration.

This shell script shows one method to generate a POST request similar to the request W&B sends to your webhook automation when it is triggered.

Copy and paste the code below into a shell script to troubleshoot your webhook. Specify your own values for:

ACCESS_TOKEN
SECRET
PAYLOAD
API_ENDPOINT

webhook_test.sh
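
The same kind of request can also be sketched in Python. The example below is an illustration, not the webhook_test.sh script. It sets the `Authorization: Bearer` header described earlier and, as an assumption that this page does not spell out, signs the payload with the secret using HMAC-SHA256 and sends the base64-encoded result in an `X-Wandb-Signature` header. The helper name, endpoint, token, secret, and payload values are placeholders.

```python
import base64
import hashlib
import hmac
import json
import urllib.request


def build_test_request(api_endpoint: str, access_token: str, secret: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request resembling a webhook automation call."""
    body = json.dumps(payload).encode("utf-8")
    # Assumption: the signature is a base64-encoded HMAC-SHA256 of the body.
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return urllib.request.Request(
        api_endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
            "X-Wandb-Signature": signature,  # assumed header name
        },
    )


# Placeholder values: replace with your own before sending.
test_request = build_test_request(
    api_endpoint="https://example.com/hooks/wandb",
    access_token="<your_access_token>",
    secret="<your_secret>",
    payload={"key1": "value1", "key2": "value2"},
)
print(test_request.get_method(), test_request.full_url)

# To actually send the request, uncomment the next line.
# urllib.request.urlopen(test_request)
```

Building the request separately from sending it lets you inspect the headers and body before they reach your endpoint.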

4.5.2 - Automation events and scopes

This feature requires a Pro or Enterprise plan.

An automation can start when a specific event occurs within a project or registry. This page describes the events that can trigger an automation within each scope. Learn more about automations in the Automations overview or Create an automation.

Registry

This section describes the scopes and events for an automation in a Registry.

Navigate to the Registry App at https://wandb.ai/registry/.
Click the name of a registry, then view and create automations in the Automations tab.

Screenshot of the Registry Automations tab with an automation

Learn more about creating automations.

Scopes

You can create a Registry automation at these scopes:

Registry level: The automation watches for the event taking place on any collection within a specific registry, including collections added in the future.
Collection level: A single collection in a specific registry.

Events

A Registry automation can watch for these events:

A new version is linked to a collection: Test and validate new models or datasets when they are added to a registry.
An artifact alias is added: Trigger a specific step of your workflow when a new artifact version has a specific alias applied. For example, deploy a model when it has the production alias applied.

Project

This section describes the scopes and events for an automation in a project.

Navigate to your W&B project on the W&B App at https://wandb.ai/<team>/<project-name>.
View and create automations in the Automations tab.

Screenshot of the Project Automations tab with an automation

Learn more about creating automations.

Scopes

You can create a project automation at these scopes:

Project level: The automation watches for the event taking place on any collection in the project.
Collection level: All collections in the project that match the filter you specify.

Artifact events

This section describes the events related to an artifact that can trigger an automation.

A new version is added to an artifact: Apply recurring actions to each version of an artifact. For example, start a training job when a new dataset artifact version is created.
An artifact alias is added: Trigger a specific step of your workflow when a new artifact version in a project or collection has a specific alias applied. For example, run a series of downstream processing steps when an artifact has the test-set-quality-check alias applied, or run a workflow each time a new artifact version gains the latest alias. Only one artifact version can have a given alias at a point in time.
An artifact tag is added: Trigger a specific step of your workflow when an artifact version in a project or collection has a specific tag applied. For example, trigger a geo-specific workflow when the tag “europe” is added to an artifact version. Artifact tags are used for grouping and filtering, and a given tag can be assigned to multiple artifact versions simultaneously.

Run events

An automation can be triggered by a change in a run’s status or a change in a metric value.

Run status change

Currently available only in W&B Multi-tenant Cloud.
A run with Killed status cannot trigger an automation. This status indicates that the run was stopped forcibly by an admin user.

Trigger a workflow when a run changes its status to Running, Finished, or Failed. Optionally, you can further limit the runs that can trigger an automation by filtering by the user that started a run or the run’s name.

Screenshot showing a run status change automation

Because run status is a property of the entire run, you can create a run status automation only from the Automations page, not from a workspace.

Run metrics change

Currently available only in W&B Multi-tenant Cloud.

Trigger a workflow based on a logged value for a metric, either a metric in a run’s history or a system metric such as cpu, which tracks the percentage of CPU utilization. W&B logs system metrics automatically every 15 seconds.

You can create a run metrics automation from the project’s Automations tab or directly from a line plot panel in a workspace.

To set up a run metric automation, you configure how to compare the metric’s value with the threshold you specify. Your choices depend on the event type and on any filters you specify.

Optionally, you can further limit the runs that can trigger an automation by filtering by the user that started a run or the run’s name.

Threshold

For Run metrics threshold met events, you configure:

The window of most recently logged values to consider (defaults to 5).
Whether to evaluate the Average, Min, or Max value within the window.
The comparison to make:
- Above
- Above or equal to
- Below
- Below or equal to
- Not equal to
- Equal to

For example, trigger an automation when average accuracy is above .6.

Screenshot showing a run metrics threshold automation
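
The threshold check described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not W&B's implementation: the function and option names are hypothetical, and the sketch simply uses however many values are available when fewer than the full window have been logged.

```python
import operator
from statistics import mean

AGGREGATIONS = {"average": mean, "min": min, "max": max}
COMPARISONS = {
    "above": operator.gt,
    "above_or_equal": operator.ge,
    "below": operator.lt,
    "below_or_equal": operator.le,
    "not_equal": operator.ne,
    "equal": operator.eq,
}


def threshold_met(values, threshold, aggregation="average", comparison="above", window=5):
    """Aggregate the most recent `window` values and compare the result to the threshold."""
    recent = values[-window:]
    if not recent:
        return False  # nothing logged yet
    aggregated = AGGREGATIONS[aggregation](recent)
    return COMPARISONS[comparison](aggregated, threshold)


# Hypothetical accuracy history for one run.
accuracy = [0.41, 0.52, 0.58, 0.61, 0.63, 0.66, 0.68]
print(threshold_met(accuracy, 0.6))                     # average of the last 5 values
print(threshold_met(accuracy, 0.6, aggregation="min"))  # minimum of the last 5 values
```

With this history, the average of the last five values is above 0.6 while the minimum (0.58) is not, which shows how the choice of Average, Min, or Max changes whether the event fires.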

Change threshold

For Run metrics change threshold met events, the automation uses two “windows” of values to check whether to start:

The current window of recently logged values to consider (defaults to 10).
The prior window of recently logged values to consider (defaults to 50).

The current and prior windows are consecutive and do not overlap.

To create the automation, you configure:

The current window of logged values (defaults to 10).
The prior window of logged values (defaults to 50).
Whether to evaluate the values as relative or absolute (defaults to Relative).
The comparison to make:
- Increases by at least
- Decreases by at least
- Increases or decreases by at least

For example, trigger an automation when average loss decreases by at least .25.

Screenshot showing a run metrics change threshold automation
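
The two-window comparison can be sketched as follows. This is a rough illustration, not W&B's implementation: it assumes that the average of the current window is compared with the average of the prior window, and that a relative change is expressed as a fraction of the prior average. The function and option names are hypothetical.

```python
from statistics import mean


def change_threshold_met(values, threshold, current_window=10, prior_window=50,
                         relative=True, direction="either"):
    """Compare the average of the current window with the average of the prior window."""
    current = values[-current_window:]
    # The prior window ends where the current window begins, so they do not overlap.
    prior = values[-(current_window + prior_window):-current_window]
    if not current or not prior:
        return False  # not enough history to compare
    prior_avg = mean(prior)
    change = mean(current) - prior_avg
    if relative:
        if prior_avg == 0:
            return False  # relative change is undefined
        change /= abs(prior_avg)
    if direction == "increase":
        return change >= threshold
    if direction == "decrease":
        return -change >= threshold
    return abs(change) >= threshold


# Hypothetical loss history, using small windows to keep the example short.
loss = [1.0, 1.0, 0.9, 0.9, 0.7, 0.6]
print(change_threshold_met(loss, 0.25, current_window=2, prior_window=4,
                           relative=False, direction="decrease"))
```

Here the prior window averages 0.95 and the current window averages 0.65, so an absolute decrease of at least 0.25 is met.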

Run filters

This section describes how the automation selects runs to evaluate.

By default, any run in the project triggers the automation when the event occurs. To consider only specific runs, specify a run filter.
Each run is considered individually and can potentially trigger the automation.
Each run’s values are put into a separate window and compared to the threshold separately.
In a 24 hour period, a particular automation can fire at most once per run.
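
The once-per-run limit can be pictured as a per-run cooldown. The class below is a hypothetical sketch, not W&B code, and it assumes the 24 hour period is a rolling window that starts when the automation last fired for that run.

```python
from datetime import datetime, timedelta


class PerRunCooldown:
    """Allow at most one firing per run within a rolling period (assumed: 24 hours)."""

    def __init__(self, period=timedelta(hours=24)):
        self.period = period
        self._last_fired = {}  # run ID -> time of the most recent firing

    def try_fire(self, run_id, now):
        """Record and allow the firing unless this run fired within the period."""
        last = self._last_fired.get(run_id)
        if last is not None and now - last < self.period:
            return False
        self._last_fired[run_id] = now
        return True


cooldown = PerRunCooldown()
start = datetime(2024, 1, 1, 9, 0)
print(cooldown.try_fire("run-a", start))                       # first firing for run-a
print(cooldown.try_fire("run-a", start + timedelta(hours=1)))  # same run, one hour later
print(cooldown.try_fire("run-b", start + timedelta(hours=1)))  # a different run
```

Each run is tracked separately, which matches the statement that every run is considered individually.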

Next steps

5 - W&B Platform

W&B Platform is the foundational infrastructure, tooling, and governance scaffolding that supports W&B products such as Core, Models, and Weave.

W&B Platform is available in three different deployment options:

W&B Multi-tenant Cloud
W&B Dedicated Cloud
W&B Customer-managed

The following responsibility matrix outlines some of the key differences:

	Multi-tenant Cloud	Dedicated Cloud	Customer-managed
MySQL / DB management	Fully hosted and managed by W&B	Fully hosted & managed by W&B on cloud or region of customer choice	Fully hosted and managed by customer
Object Storage (S3/GCS/Blob storage)	Option 1: Fully hosted by W&B Option 2: Customer can configure their own bucket per team, using the Secure Storage Connector	Option 1: Fully hosted by W&B Option 2: Customer can configure their own bucket per instance or team, using the Secure Storage Connector	Fully hosted and managed by customer
SSO Support	W&B managed via Auth0	Option 1: Customer managed Option 2: Managed by W&B via Auth0	Fully managed by customer
W&B Service (App)	Fully managed by W&B	Fully managed by W&B	Fully managed by customer
App security	Fully managed by W&B	Shared responsibility of W&B and customer	Fully managed by customer
Maintenance (upgrades, backups, etc.)	Managed by W&B	Managed by W&B	Managed by customer
Support	Support SLA	Support SLA	Support SLA
Supported cloud infrastructure	GCP	AWS, GCP, Azure	AWS, GCP, Azure, On-Prem bare-metal

Deployment options

The following sections provide an overview of each deployment type.

W&B Multi-tenant Cloud

W&B Multi-tenant Cloud is a fully managed service deployed in W&B’s cloud infrastructure, where you can access W&B products at the scale you need, with cost-efficient pricing options and continuous updates that deliver the latest features. W&B recommends using Multi-tenant Cloud for your product trial, or for managing your production AI workflows if you do not need the security of a private deployment, self-service onboarding is important, and cost efficiency is critical.

See W&B Multi-tenant Cloud for more information.

W&B Dedicated Cloud

W&B Dedicated Cloud is a single-tenant, fully managed service deployed in W&B’s cloud infrastructure. It is the best place to onboard W&B if your organization requires conformance to strict governance controls including data residency, needs advanced security capabilities, and wants to optimize its AI operating costs by not having to build and manage infrastructure with the required security, scale, and performance characteristics.

See W&B Dedicated Cloud for more information.

W&B Customer-Managed

With this option, you can deploy and manage W&B Server on your own managed infrastructure. W&B Server is a self-contained, packaged mechanism to run the W&B Platform and its supported W&B products. W&B recommends this option if all your existing infrastructure is on-prem, or if your organization has strict regulatory needs that W&B Dedicated Cloud does not satisfy. With this option, you are fully responsible for provisioning, continuously maintaining, and upgrading the infrastructure required to support W&B Server.

See W&B Self Managed for more information.

Next steps

If you’re looking to try any of the W&B products, W&B recommends using the Multi-tenant Cloud. If you’re looking for an enterprise-friendly setup, choose the appropriate deployment type for your trial here.

5.1 - Deployment options

This section describes the different ways you can deploy W&B.

W&B Multi-tenant Cloud

W&B Multi-tenant Cloud is fully managed by W&B, including upgrades, maintenance, platform security, and capacity planning. Multi-tenant Cloud is deployed in W&B’s Google Cloud Platform (GCP) account in GCP’s North America regions. Bring your own bucket (BYOB) optionally allows you to store W&B Artifacts and other related sensitive data in your own cloud or on-premises infrastructure.

See W&B Multi-tenant Cloud or get started for free.

W&B Dedicated Cloud

W&B Dedicated Cloud is a single-tenant, fully managed platform designed with enterprise organizations in mind. W&B Dedicated Cloud is deployed in W&B’s AWS, GCP or Azure account. Dedicated Cloud provides more flexibility than Multi-tenant Cloud, but less complexity than W&B Self-Managed. Upgrades, maintenance, platform security, and capacity planning are managed by W&B. Each Dedicated Cloud instance has its own isolated network, compute and storage from other W&B Dedicated Cloud instances.

Your W&B specific metadata and data is stored in an isolated cloud storage and is processed using isolated cloud compute services. Bring your own bucket (BYOB) optionally allows you to store artifacts and other related sensitive data in your own cloud or on-premises infrastructure.

W&B Dedicated Cloud includes an enterprise license with support for important security and other enterprise-friendly capabilities.

For organizations with advanced security or compliance requirements, features such as HIPAA compliance, Single Sign On, or Customer Managed Encryption Keys (CMEK) are available with Enterprise support. Request more information.

See W&B Dedicated Cloud or get started for free.

W&B Self-Managed

W&B Self-Managed is entirely managed by you, either on your premises or in cloud infrastructure that you manage. Your IT/DevOps/MLOps team is responsible for:

Provisioning your deployment.
Securing your infrastructure in accordance with your organization’s policies and Security Technical Implementation Guidelines (STIG), if applicable.
Managing upgrades and applying patches.
Continuously maintaining your self managed W&B Server instance.

You can optionally obtain an enterprise license for W&B Self-Managed. An enterprise license includes support for important security and other enterprise-friendly capabilities.

See W&B Self-Managed or review the reference architecture guidelines.

5.1.1 - Use W&B Multi-tenant Cloud

W&B Multi-tenant Cloud is a fully managed platform deployed in W&B’s Google Cloud Platform (GCP) account in GCP’s North America regions. W&B Multi-tenant Cloud uses autoscaling in GCP so that the platform scales appropriately as traffic increases or decreases.

W&B Multi-tenant Cloud scales to meet your organization’s needs, and supports logging up to 250,000 metrics per project with up to 1 million data points per metric. For larger deployments, contact support.

Data security

For users on Free or Pro plans, all data is stored only in the shared cloud storage and is processed with shared cloud compute services. Depending on your pricing plan, you may be subject to storage limits.

Users on an Enterprise plan can bring their own bucket (BYOB) using the secure storage connector at the team level to store their files such as models, datasets, and more. You can configure a single bucket for multiple teams or you can use separate buckets for different W&B Teams. If you do not configure BYOB for a team, the team’s data is stored in the shared cloud storage.

You are responsible for ensuring that your deployment complies with your organization’s policies and Security Technical Implementation Guidelines (STIG), if applicable.

Identity and access management (IAM)

If you are on an Enterprise plan, enhanced identity and access management capabilities allow for secure authentication and effective authorization for your W&B deployment:

SSO authentication with OIDC or SAML. Reach out to your W&B team or support if you would like to configure SSO for your organization.
Configure appropriate user roles at the scope of the organization and within a team.
Define the scope of a W&B project to limit who can view, edit, and submit W&B runs to it with restricted projects.

Monitor

Organization admins can manage usage and billing for their account from the Billing tab in their account view. If using the shared cloud storage on Multi-tenant Cloud, an admin can optimize storage usage across different teams in their organization.

Maintenance

W&B Multi-tenant Cloud is a multi-tenant, fully managed platform. Since W&B Multi-tenant Cloud is managed by W&B, you do not incur the overhead and costs of provisioning and maintaining the W&B platform.

Compliance

Security controls for Multi-tenant Cloud are periodically audited internally and externally. Refer to the W&B Security Portal to request the SOC2 report and other security and compliance documents.

Next steps

Access Multi-tenant Cloud directly to get started with most features for free. To try out enhanced data security and IAM features, request an Enterprise trial.

5.1.2 - Dedicated Cloud

Use W&B Dedicated Cloud for single-tenant SaaS

W&B Dedicated Cloud is a single-tenant, fully managed platform deployed in W&B’s AWS, GCP, or Azure cloud accounts. Each Dedicated Cloud instance has its own isolated network, compute and storage from other W&B Dedicated Cloud instances. Your W&B specific metadata and data is stored in an isolated cloud storage and is processed using isolated cloud compute services.

W&B Dedicated Cloud is available in multiple global regions for each cloud provider.

Data security

You can bring your own bucket (BYOB) using the secure storage connector at the instance and team levels to store your files such as models, datasets, and more.

Similar to W&B Multi-tenant Cloud, you can configure a single bucket for multiple teams or you can use separate buckets for different teams. If you do not configure the secure storage connector for a team, that team’s data is stored in the instance-level bucket.

In addition to BYOB with secure storage connector, you can use IP allowlisting to restrict access to your Dedicated Cloud instance from only trusted network locations.

You can connect privately to your Dedicated Cloud instance using your cloud provider’s secure connectivity solution.

You are responsible for ensuring that your deployment complies with your organization’s policies and Security Technical Implementation Guidelines (STIG), if applicable.

Identity and access management (IAM)

Use the identity and access management capabilities for secure authentication and effective authorization in your W&B Organization. The following features are available for IAM in Dedicated Cloud instances:

Authenticate with SSO using OpenID Connect (OIDC) or with LDAP.
Configure appropriate user roles at the scope of the organization and within a team.
Define the scope of a W&B project to limit who can view, edit, and submit W&B runs to it with restricted projects.
Leverage JSON Web Tokens with identity federation to access W&B APIs.

Monitor

Use Audit logs to track user activity within your teams and to conform to your enterprise governance requirements. You can also view organization usage in your Dedicated Cloud instance with the W&B Organization Dashboard.

Maintenance

Similar to W&B Multi-tenant Cloud, you do not incur the overhead and costs of provisioning and maintaining the W&B platform with Dedicated Cloud.

To understand how W&B manages updates on Dedicated Cloud, refer to the server release process.

Compliance

Security controls for W&B Dedicated Cloud are periodically audited internally and externally. Refer to the W&B Security Portal to request the security and compliance documents for your product assessment exercise.

Migration options

Migration to Dedicated Cloud from a Self-Managed instance or Multi-tenant Cloud is supported, subject to specific limits and migration-related constraints.

Next steps

Submit this form if you are interested in using Dedicated Cloud.

5.1.2.1 - Supported Dedicated Cloud regions

AWS, GCP, and Azure support cloud computing services in multiple locations worldwide. Global regions help ensure that you satisfy requirements related to data residency & compliance, latency, cost efficiency and more. W&B supports many of the available global regions for Dedicated Cloud.

Reach out to W&B Support if your preferred AWS, GCP, or Azure Region is not listed. W&B can validate if the relevant region has all the services that Dedicated Cloud needs and prioritize support depending on the outcome of the evaluation.

Supported AWS Regions

The following table lists AWS Regions that W&B currently supports for Dedicated Cloud instances.

Region location	Region name
US East (Ohio)	us-east-2
US East (N. Virginia)	us-east-1
US West (N. California)	us-west-1
US West (Oregon)	us-west-2
Canada (Central)	ca-central-1
Europe (Frankfurt)	eu-central-1
Europe (Ireland)	eu-west-1
Europe (London)	eu-west-2
Europe (Milan)	eu-south-1
Europe (Stockholm)	eu-north-1
Asia Pacific (Mumbai)	ap-south-1
Asia Pacific (Singapore)	ap-southeast-1
Asia Pacific (Sydney)	ap-southeast-2
Asia Pacific (Tokyo)	ap-northeast-1
Asia Pacific (Seoul)	ap-northeast-2

For more information about AWS Regions, see the Regions, Availability Zones, and Local Zones in the AWS Documentation.

See What to Consider when Selecting a Region for your Workloads for an overview of factors that you should consider when choosing an AWS Region.

Supported GCP Regions

The following table lists GCP Regions that W&B currently supports for Dedicated Cloud instances.

Region location	Region name
South Carolina	us-east1
N. Virginia	us-east4
Iowa	us-central1
Oregon	us-west1
Los Angeles	us-west2
Las Vegas	us-west4
Toronto	northamerica-northeast2
Belgium	europe-west1
London	europe-west2
Frankfurt	europe-west3
Netherlands	europe-west4
Sydney	australia-southeast1
Tokyo	asia-northeast1
Seoul	asia-northeast3

For more information about GCP Regions, see Regions and zones in the GCP Documentation.

Supported Azure Regions

The following table lists Azure regions that W&B currently supports for Dedicated Cloud instances.

Region location	Region name
Virginia	eastus
Iowa	centralus
Washington	westus2
California	westus
Canada Central	canadacentral
France Central	francecentral
Netherlands	westeurope
Tokyo, Saitama	japaneast
Seoul	koreacentral

For more information about Azure regions, see Azure geographies in the Azure Documentation.

5.1.2.2 - Export data from Dedicated cloud

Export data from Dedicated cloud

If you would like to export all the data managed in your Dedicated cloud instance, you can use the W&B SDK API to extract the runs, metrics, artifacts, and more with the Import and Export API. The following table covers some of the key exporting use cases.

Purpose	Documentation
Export project metadata	Projects API
Export runs in a project	Runs API
Export reports	Report and Workspace API
Export artifacts	Explore artifact graphs, Download and use artifacts

If you manage artifacts stored in the Dedicated cloud with Secure Storage Connector, you may not need to export the artifacts using the W&B SDK API.

Using the W&B SDK API to export all of your data can be slow if you have a large number of runs, artifacts, and so on. W&B recommends running the export process in appropriately sized batches so as not to overwhelm your Dedicated cloud instance.
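
A batched export could be structured roughly as follows. The helper names (`batched`, `export_in_batches`), the batch size, and the pause between batches are assumptions for illustration, not W&B recommendations. The helpers work on any iterable, so they can be tried without network access.

```python
import time
from itertools import islice


def batched(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def export_in_batches(items, to_record, write_batch, batch_size=50, pause_seconds=0.0):
    """Convert items to records batch by batch and return how many were exported."""
    total = 0
    for batch in batched(items, batch_size):
        write_batch([to_record(item) for item in batch])
        total += len(batch)
        if pause_seconds:
            time.sleep(pause_seconds)  # give the instance room between batches
    return total


# Usage with the W&B public API (not executed here; needs credentials and network access):
#   import wandb
#   runs = wandb.Api().runs("<entity>/<project>")
#   export_in_batches(
#       runs,
#       to_record=lambda run: {"name": run.name, "config": run.config},
#       write_batch=print,
#   )
```

Keeping the conversion (`to_record`) and the destination (`write_batch`) as parameters lets you tune the batch size and pause for your instance without changing the export loop.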

5.1.3 - Self-Managed

Deploying W&B in production

Use W&B Self-Managed on cloud or on-prem infrastructure

W&B recommends fully managed deployment options such as W&B Multi-tenant Cloud or W&B Dedicated Cloud deployment types. W&B fully managed services are simple and secure to use, with minimum to no configuration required.

Deploy W&B Server on your AWS, GCP, or Azure cloud account or within your on-premises infrastructure.

Your IT/DevOps/MLOps team is responsible for:

Provisioning your deployment.
Securing your infrastructure in accordance with your organization’s policies and Security Technical Implementation Guidelines (STIG), if applicable.
Managing upgrades and applying patches.
Continuously maintaining your Self-Managed W&B Server instance.

Deploy W&B Server within self-managed cloud accounts

W&B recommends that you use official W&B Terraform scripts to deploy W&B Server into your AWS, GCP, or Azure cloud account.

See specific cloud provider documentation for more information on how to set up W&B Server in AWS, GCP, or Azure.

Deploy W&B Server in on-prem infrastructure

To set up W&B Server in your on-premises infrastructure, you need to configure several infrastructure components. Some of those components include, but are not limited to:

(Strongly recommended) Kubernetes cluster
MySQL 8 database cluster
Amazon S3-compatible object storage
Redis cache cluster

See Install on on-prem infrastructure for detailed instructions to install W&B Server on your on-prem infrastructure. W&B can provide recommendations for the different components and provide guidance through the installation process.

Deploy W&B Server on a custom cloud platform

You can deploy W&B Server to a cloud platform that is not AWS, GCP, or Azure. The requirements are similar to those for deploying in on-prem infrastructure.

Obtain your W&B Server license

You need a W&B trial license to complete your configuration of W&B Server. Open the Deploy Manager to generate a free trial license.

If you do not already have a W&B account, create one to generate your free license.

If you need an enterprise license for W&B Server which includes support for important security & other enterprise-friendly capabilities, submit this form or reach out to your W&B team.

The URL redirects you to a Get a License for W&B Local form. Provide the following information:

Choose a deployment type from the Choose Platform step.
Select the owner of the license or add a new organization in the Basic Information step.
Provide a name for the instance in the Name of Instance field and optionally provide a description in the Description field in the Get a License step.
Select the Generate License Key button.

A page displays with an overview of your deployment along with the license associated with the instance.

5.1.3.1 - Reference Architecture

W&B Reference Architecture

This page describes a reference architecture for a W&B deployment and outlines the recommended infrastructure and resources to support a production deployment of the platform.

Depending on your chosen deployment environment for W&B, various services can help to enhance the resiliency of your deployment.

For instance, major cloud providers offer robust managed database services which help to reduce the complexity of database configuration, maintenance, high availability, and resilience.

This reference architecture addresses some common deployment scenarios and shows how you can integrate your W&B deployment with cloud vendor services for optimal performance and reliability.

Before you start

Running any application in production comes with its own set of challenges, and W&B is no exception. While we aim to streamline the process, certain complexities may arise depending on your unique architecture and design decisions. Typically, managing a production deployment involves overseeing various components, including hardware, operating systems, networking, storage, security, the W&B platform itself, and other dependencies. This responsibility extends to both the initial setup of the environment and its ongoing maintenance.

Consider carefully whether a self-managed approach with W&B is suitable for your team and specific requirements.

A strong understanding of how to run and maintain production-grade applications is an important prerequisite before you deploy self-managed W&B. If your team needs assistance, our Professional Services team and partners offer support for implementation and optimization.

To learn more about managed solutions for running W&B instead of managing it yourself, refer to W&B Multi-tenant Cloud and W&B Dedicated Cloud.

Infrastructure

Application layer

The application layer consists of a multi-node Kubernetes cluster, with resilience against node failures. The Kubernetes cluster runs and maintains W&B’s pods.

Storage layer

The storage layer consists of a MySQL database and object storage. The MySQL database stores metadata and the object storage stores artifacts such as models and datasets.

Infrastructure requirements

Kubernetes

The W&B Server application is deployed as a Kubernetes Operator that deploys multiple pods. For this reason, W&B requires a Kubernetes cluster with:

A fully configured and functioning Ingress controller.
The capability to provision Persistent Volumes.

MySQL

W&B stores metadata in a MySQL database. The database’s performance and storage requirements depend on the shapes of the model parameters and related metadata. For example, the database grows in size as you track more training runs, and load on the database increases based on queries in run tables, user workspaces, and reports.

Consider the following when you deploy a self-managed MySQL database:

Backups. You should periodically back up the database to a separate facility. W&B recommends daily backups with at least 1 week of retention.
Performance. The disk the server is running on should be fast. W&B recommends running the database on an SSD or accelerated NAS.
Monitoring. The database should be monitored for load. If CPU usage stays above 40% for more than 5 minutes, the server is likely resource starved.
Availability. Depending on your availability and durability requirements, you might want to configure a hot standby on a separate machine that streams all updates in real time from the primary server and that you can fail over to if the primary server crashes or becomes corrupted.
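
The monitoring guideline above (CPU sustained above 40% for more than 5 minutes) can be expressed as a small check over timestamped samples. This is a hypothetical sketch: the function name and the sample format are assumptions, and in practice you would likely express the same rule in your monitoring system.

```python
def cpu_starved(samples, threshold=40.0, sustained_seconds=300):
    """Return True if CPU stays above `threshold` for longer than `sustained_seconds`.

    `samples` is an iterable of (timestamp_seconds, cpu_percent) pairs in time order.
    """
    streak_start = None
    for timestamp, cpu in samples:
        if cpu > threshold:
            if streak_start is None:
                streak_start = timestamp  # a new streak of high usage begins
            if timestamp - streak_start > sustained_seconds:
                return True
        else:
            streak_start = None  # usage dropped, so the streak resets
    return False


# One sample per minute for 8 minutes at 55% CPU.
busy = [(minute * 60, 55.0) for minute in range(8)]
print(cpu_starved(busy))
```

A single sample at or below the threshold resets the streak, so short spikes do not count as sustained load.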

Object storage

W&B requires object storage with pre-signed URL and CORS support, deployed in one of:

CoreWeave AI Object Storage is a high-performance, S3-compatible object storage service optimized for AI workloads.
Amazon S3 is an object storage service offering industry-leading scalability, data availability, security, and performance.
Google Cloud Storage is a managed service for storing unstructured data at scale.
Azure Blob Storage is a cloud-based object storage solution for storing massive amounts of unstructured data like text, binary data, images, videos, and logs.
S3-compatible storage like MinIO hosted in your cloud or infrastructure on your premises.

Versions

Software	Minimum version
Kubernetes	v1.29
MySQL	v8.0.0, “General Availability” releases only
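
A pre-flight check against the minimum versions in the table might look like the following sketch. The function names and the `MINIMUM` dictionary are hypothetical, and the check compares numeric versions only; it does not verify that a MySQL build is a “General Availability” release.

```python
import re

# Minimum versions from the table above, as tuples of integers.
MINIMUM = {"kubernetes": (1, 29), "mysql": (8, 0, 0)}


def parse_version(text):
    """Turn 'v1.29', '1.29.3', or '8.0.36-log' into a tuple of integers."""
    match = re.match(r"v?(\d+(?:\.\d+)*)", text.strip())
    if not match:
        raise ValueError(f"unrecognized version: {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(software, version):
    """Compare numerically, padding the shorter version with zeros."""
    found, required = parse_version(version), MINIMUM[software]
    width = max(len(found), len(required))
    found += (0,) * (width - len(found))
    required += (0,) * (width - len(required))
    return found >= required


print(meets_minimum("kubernetes", "v1.28.9"))
print(meets_minimum("mysql", "8.0.36"))
```

Comparing tuples of integers avoids the pitfalls of string comparison, where "1.9" would sort after "1.29".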

Networking

For a networked deployment, egress to these endpoints is required during both installation and runtime:

To learn about air-gapped deployments, refer to Kubernetes operator for air-gapped instances. Access to W&B and to the object storage is required for the training infrastructure and for each system that tracks the needs of experiments.

DNS

The fully qualified domain name (FQDN) of the W&B deployment must resolve to the IP address of the ingress/load balancer using an A record.
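
To confirm the DNS requirement before installing, you could check that the FQDN resolves to the load balancer address. The sketch below uses only the Python standard library. The function names are hypothetical, and the hostname and IP address in the usage comment are placeholders. Note that `socket.getaddrinfo` reports the resolved IPv4 addresses but does not tell you whether they came from an A record directly or through a CNAME.

```python
import socket


def resolve_ipv4(hostname):
    """Return the set of IPv4 addresses that `hostname` resolves to."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return {info[4][0] for info in infos}


def fqdn_points_to(hostname, expected_ip, resolver=resolve_ipv4):
    """Return True if `expected_ip` is among the addresses `hostname` resolves to."""
    return expected_ip in resolver(hostname)


# Usage (placeholder values; requires working DNS):
#   fqdn_points_to("wandb.example.com", "203.0.113.10")
```

The `resolver` parameter exists so the check can be exercised with a stub in tests or in environments without DNS access.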

SSL/TLS

W&B requires a valid signed SSL/TLS certificate for secure communication between clients and the server. SSL/TLS termination must occur on the ingress/load balancer. The W&B Server application does not terminate SSL or TLS connections.

Please note: W&B does not recommend the use of self-signed certificates and custom CAs.

Supported CPU architectures

W&B runs on the Intel (x86) CPU architecture. ARM is not supported.

Infrastructure provisioning

Terraform is the recommended way to deploy W&B for production. Using Terraform, you define the required resources, their references to other resources, and their dependencies. W&B provides Terraform modules for the major cloud providers. For details, refer to Deploy W&B Server within self managed cloud accounts.

Sizing

Use the following general guidelines as a starting point when planning a deployment. W&B recommends that you monitor all components of a new deployment closely and that you make adjustments based on observed usage patterns. Continue to monitor production deployments over time and make adjustments as needed to maintain optimal performance.

Models only

Kubernetes

Environment	CPU	Memory	Disk
Test/Dev	2 cores	16 GB	100 GB
Production	8 cores	64 GB	100 GB

Numbers are per Kubernetes worker node.

MySQL

Environment	CPU	Memory	Disk
Test/Dev	2 cores	16 GB	100 GB
Production	8 cores	64 GB	500 GB

Numbers are per MySQL node.

Weave only

Kubernetes

Environment	CPU	Memory	Disk
Test/Dev	4 cores	32 GB	100 GB
Production	12 cores	96 GB	100 GB

Numbers are per Kubernetes worker node.

MySQL

Environment	CPU	Memory	Disk
Test/Dev	2 cores	16 GB	100 GB
Production	8 cores	64 GB	500 GB

Numbers are per MySQL node.

Models and Weave

Kubernetes

Environment	CPU	Memory	Disk
Test/Dev	4 cores	32 GB	100 GB
Production	16 cores	128 GB	100 GB

Numbers are per Kubernetes worker node.

MySQL

Environment	CPU	Memory	Disk
Test/Dev	2 cores	16 GB	100 GB
Production	8 cores	64 GB	500 GB

Numbers are per MySQL node.

Cloud provider instance recommendations

Services

Cloud	Kubernetes	MySQL	Object Storage
AWS	EKS	RDS Aurora	S3
GCP	GKE	Google Cloud SQL - MySQL	Google Cloud Storage (GCS)
Azure	AKS	Azure Database for MySQL	Azure Blob Storage

Machine types

These recommendations apply to each node of a self-managed deployment of W&B in cloud infrastructure.

AWS

Environment	K8s (Models only)	K8s (Weave only)	K8s (Models&Weave)	MySQL
Test/Dev	r6i.large	r6i.xlarge	r6i.xlarge	db.r6g.large
Production	r6i.2xlarge	r6i.4xlarge	r6i.4xlarge	db.r6g.2xlarge

GCP

Environment	K8s (Models only)	K8s (Weave only)	K8s (Models&Weave)	MySQL
Test/Dev	n2-highmem-2	n2-highmem-4	n2-highmem-4	db-n1-highmem-2
Production	n2-highmem-8	n2-highmem-16	n2-highmem-16	db-n1-highmem-8

Azure

Environment	K8s (Models only)	K8s (Weave only)	K8s (Models&Weave)	MySQL
Test/Dev	Standard_E2_v5	Standard_E4_v5	Standard_E4_v5	MO_Standard_E2ds_v4
Production	Standard_E8_v5	Standard_E16_v5	Standard_E16_v5	MO_Standard_E8ds_v4

5.1.3.2 - Run W&B Server on Kubernetes

Deploy W&B Platform with Kubernetes Operator

W&B Kubernetes Operator

Use the W&B Kubernetes Operator to simplify deploying, administering, troubleshooting, and scaling your W&B Server deployments on Kubernetes. You can think of the operator as a smart assistant for your W&B instance.

The W&B Server architecture and design continuously evolve to expand AI developer tooling capabilities and to provide appropriate primitives for high performance, better scalability, and easier administration. That evolution applies to the compute services, the relevant storage, and the connectivity between them. To help facilitate continuous updates and improvements across deployment types, W&B uses a Kubernetes operator.

W&B uses the operator to deploy and manage Dedicated cloud instances on AWS, GCP and Azure public clouds.

For more information about Kubernetes operators, see Operator pattern in the Kubernetes documentation.

Reasons for the architecture shift

Historically, the W&B application was deployed as a single deployment and pod within a Kubernetes cluster or as a single Docker container. W&B has recommended, and continues to recommend, externalizing the database and object store. Externalizing the database and object store decouples the application’s state.

As the application grew, the need to evolve from a monolithic container to a distributed system (microservices) became apparent. This change facilitates backend logic handling and seamlessly introduces built-in Kubernetes infrastructure capabilities. A distributed system also supports deploying new services essential for additional features that W&B relies on.

Before 2024, any Kubernetes-related change required manually updating the terraform-kubernetes-wandb Terraform module. Updating the Terraform module meant ensuring compatibility across cloud providers, configuring the necessary Terraform variables, and executing a Terraform apply for each backend or Kubernetes-level change.

This process was not scalable since W&B Support had to assist each customer with upgrading their Terraform module.

The solution was to implement an operator that connects to a central deploy.wandb.ai server to request the latest specification changes for a given release channel and apply them. Updates are received as long as the license is valid. Helm is used as both the deployment mechanism for the W&B operator and the means for the operator to handle all configuration templating of the W&B Kubernetes stack, Helm-ception.

How it works

You can install the operator with Helm or from the source. See charts/operator for detailed instructions.

The installation process creates a deployment called controller-manager and uses a custom resource definition named weightsandbiases.apps.wandb.com (shortName: wandb), that takes a single spec and applies it to the cluster:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: weightsandbiases.apps.wandb.com

The controller-manager installs charts/operator-wandb based on the spec of the custom resource, release channel, and a user defined config. The configuration specification hierarchy enables maximum configuration flexibility at the user end and enables W&B to release new images, configurations, features, and Helm updates automatically.

Refer to the configuration specification hierarchy and configuration reference for configuration options.

The deployment consists of multiple pods, one per service. Each pod’s name is prefixed with wandb-.

Configuration specification hierarchy

Configuration specifications follow a hierarchical model where higher-level specifications override lower-level ones. Here’s how it works:

Release Channel Values: This base level configuration sets default values and configurations based on the release channel set by W&B for the deployment.
User Input Values: Users can override the default settings provided by the Release Channel Spec through the System Console.
Custom Resource Values: The highest level of specification, which comes from the user. Any values specified here override both the User Input and Release Channel specifications. For a detailed description of the configuration options, see Configuration Reference.

This hierarchical model ensures that configurations are flexible and customizable to meet varying needs while maintaining a manageable and systematic approach to upgrades and changes.
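
The override order can be illustrated with a layered merge, where each higher level wins over the levels below it. This is a conceptual sketch only: the function names and the configuration keys are hypothetical, and the key-by-key merging of nested values is an assumption rather than a description of how the operator resolves its specifications.

```python
import copy


def deep_merge(base, override):
    """Return a new dict where `override` wins; nested dicts are merged key by key."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def effective_config(release_channel, user_input, custom_resource):
    """Apply the three levels from lowest to highest precedence."""
    result = {}
    for layer in (release_channel, user_input, custom_resource):
        result = deep_merge(result, layer)
    return result


# Hypothetical keys, for illustration only.
release_channel = {"app": {"replicas": 1, "image": "stable"}, "logLevel": "info"}
user_input = {"app": {"replicas": 2}}
custom_resource = {"app": {"image": "pinned"}, "logLevel": "debug"}
print(effective_config(release_channel, user_input, custom_resource))
```

In this example the custom resource decides `image` and `logLevel`, the user input decides `replicas`, and the release channel supplies anything the higher levels leave unset.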

Requirements to use the W&B Kubernetes Operator

Satisfy the following requirements to deploy W&B with the W&B Kubernetes operator:

Refer to the reference architecture. In addition, obtain a valid W&B Server license.

See the bare-metal installation guide for a detailed explanation on how to set up and configure a self-managed installation.

Depending on the installation method, you might need to meet the following requirements:

Kubectl installed and configured with the correct Kubernetes cluster context.
Helm is installed.

Air-gapped installations

See the Deploy W&B in airgapped environment with Kubernetes tutorial on how to install the W&B Kubernetes Operator in an airgapped environment.

Deploy W&B Server application

This section describes different ways to deploy the W&B Kubernetes operator.

The W&B Operator is the default and recommended installation method for W&B Server.

Deploy W&B with Helm CLI

W&B provides a Helm Chart to deploy the W&B Kubernetes operator to a Kubernetes cluster. This approach allows you to deploy W&B Server with Helm CLI or a continuous delivery tool like ArgoCD. Make sure that the requirements described above are in place.

Follow these steps to install the W&B Kubernetes Operator with Helm CLI:

Add the W&B Helm repository. The W&B Helm chart is available in the W&B Helm repository:
```
helm repo add wandb https://charts.wandb.ai
helm repo update
```

Install the Operator on a Kubernetes cluster:

helm upgrade --install operator wandb/operator -n wandb-cr --create-namespace

Configure the W&B operator custom resource to trigger the W&B Server installation, either by overriding the default configuration with a Helm values.yaml file or by fully customizing the custom resource definition (CRD) directly.
- values.yaml override (recommended): Create a new file named values.yaml that includes only the keys from the full values.yaml specification that you want to override. For example, to configure MySQL:
  values.yaml
- Full CRD: Copy this example configuration to a new file named operator.yaml. Make the required changes to the file. Refer to Configuration Reference.
  operator.yaml
Start the Operator with your custom configuration so that it can install, configure, and manage the W&B Server application.
- To start the Operator with a values.yaml override:
```
kubectl apply -f values.yaml
```
- To start the operator with a fully customized CRD:
```
kubectl apply -f operator.yaml
```
Wait until the deployment completes. This takes a few minutes.
To verify the installation using the web UI, create the first admin user account, then follow the verification steps outlined in Verify the installation.

Deploy W&B with Helm Terraform Module

This method allows for customized deployments tailored to specific requirements, leveraging Terraform’s infrastructure-as-code approach for consistency and repeatability. The official W&B Helm-based Terraform Module is located here.

The following code can be used as a starting point and includes all necessary configuration options for a production grade deployment.

module "wandb" {
  source  = "wandb/wandb/helm"

  spec = {
    values = {
      global = {
        host    = "https://<HOST_URI>"
        license = "eyJhbGnUzaH...j9ZieKQ2x5GGfw"

        bucket = {
          <details depend on the provider>
        }

        mysql = {
          <redacted>
        }
      }

      ingress = {
        annotations = {
          "a" = "b"
          "x" = "y"
        }
      }
    }
  }
}

Note that the configuration options are the same as described in Configuration Reference, but that the syntax has to follow the HashiCorp Configuration Language (HCL). The Terraform module creates the W&B custom resource definition (CRD).

To see how W&B uses the Helm Terraform module to deploy “Dedicated cloud” installations for customers, follow these links:

Deploy W&B with W&B Cloud Terraform modules

W&B provides a set of Terraform Modules for AWS, GCP, and Azure. These modules deploy entire infrastructures, including Kubernetes clusters, load balancers, and MySQL databases, as well as the W&B Server application. The W&B Kubernetes Operator is already included in the following versions of the official W&B cloud-specific Terraform Modules:

Terraform Registry	Source Code	Version
AWS	https://github.com/wandb/terraform-aws-wandb	v4.0.0+
Azure	https://github.com/wandb/terraform-azurerm-wandb	v2.0.0+
GCP	https://github.com/wandb/terraform-google-wandb	v2.0.0+

This integration ensures that W&B Kubernetes Operator is ready to use for your instance with minimal setup, providing a streamlined path to deploying and managing W&B Server in your cloud environment.

For a detailed description on how to use these modules, refer to the self-managed installations section in the docs.

Verify the installation

To verify the installation, W&B recommends using the W&B CLI. The verify command executes several tests that verify all components and configurations.

This step assumes that the first admin user account is created with the browser.

Follow these steps to verify the installation:

Install the W&B CLI:
```
pip install wandb
```

Log in to W&B:

wandb login --host=https://YOUR_DNS_DOMAIN

For example:

wandb login --host=https://wandb.company-name.com

Verify the installation:
```
wandb verify
```

A successful installation and fully working W&B deployment shows the following output:

Default host selected:  https://wandb.company-name.com
Find detailed logs for this test at: /var/folders/pn/b3g3gnc11_sbsykqkm3tx5rh0000gp/T/tmpdtdjbxua/wandb
Checking if logged in...................................................✅
Checking signed URL upload..............................................✅
Checking ability to send large payloads through proxy...................✅
Checking requests to base url...........................................✅
Checking requests made over signed URLs.................................✅
Checking CORs configuration of the bucket...............................✅
Checking wandb package version is up to date............................✅
Checking logged metrics, saving and downloading a file..................✅
Checking artifact save and download workflows...........................✅
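
If you capture this output in a script, for example in a post-deployment check, you can scan it for checks that did not pass. The following is a minimal sketch that assumes each check is printed on one line starting with "Checking" and that a passing check ends with ✅, as in the sample above. The `failed_checks` helper and the ❌ marker in the sample are hypothetical.

```python
def failed_checks(verify_output: str) -> list[str]:
    """Return the names of 'Checking ...' lines that do not end with a check mark."""
    failures = []
    for line in verify_output.splitlines():
        line = line.strip()
        if not line.startswith("Checking"):
            continue
        if not line.endswith("✅"):
            # Drop the dotted leader and whatever marker follows it.
            failures.append(line.split("...", 1)[0])
    return failures


# Hypothetical captured output; the failure marker is an assumption.
sample = (
    "Default host selected:  https://wandb.company-name.com\n"
    "Checking if logged in...................................................✅\n"
    "Checking signed URL upload..............................................❌\n"
)
print(failed_checks(sample))
```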

Access the W&B Management Console

The W&B Kubernetes operator comes with a management console. It is located at ${HOST_URI}/console, for example https://wandb.company-name.com/console.

There are two ways to log in to the management console:

Option 1 (recommended):

Open the W&B application in the browser and log in. Log in to the W&B application at ${HOST_URI}/, for example https://wandb.company-name.com/.
Access the console. Click the icon in the top right corner, then click System console. Only users with admin privileges can see the System console entry.

Option 2: W&B recommends that you access the console using the following steps only if Option 1 does not work.

Open the console application in the browser. Open the URL described above, which redirects you to the login screen.
Retrieve the password from the Kubernetes secret that the installation generates:
```
kubectl get secret wandb-password -o jsonpath='{.data.password}' | base64 -d
```
Copy the password.
Log in to the console. Paste the copied password, then click Login.
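
Kubernetes stores the values in a secret's data field base64-encoded, which is why the command above pipes the result through base64 -d. The following sketch shows the same decoding step in Python, assuming you exported the secret as JSON (for example with kubectl get secret wandb-password -o json). The `password_from_secret` helper and the sample secret are hypothetical.

```python
import base64
import json


def password_from_secret(secret_json: str, key: str = "password") -> str:
    """Decode one base64-encoded entry from a Kubernetes secret exported as JSON."""
    secret = json.loads(secret_json)
    return base64.b64decode(secret["data"][key]).decode("utf-8")


# Hypothetical secret content for illustration only.
sample = json.dumps(
    {
        "kind": "Secret",
        "data": {"password": base64.b64encode(b"example-password").decode("ascii")},
    }
)
print(password_from_secret(sample))
```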

Update the W&B Kubernetes operator

This section describes how to update the W&B Kubernetes operator.

Updating the W&B Kubernetes operator does not update the W&B server application.
If you use a Helm chart that does not use the W&B Kubernetes operator, see the instructions here before you follow the steps below to update the W&B operator.

Copy and paste the code snippets below into your terminal.

First, update the repo with helm repo update:
```
helm repo update
```

Next, update the Helm chart with helm upgrade:

helm upgrade operator wandb/operator -n wandb-cr --reuse-values

Update the W&B Server application

You no longer need to update the W&B Server application manually if you use the W&B Kubernetes operator.

The operator automatically updates your W&B Server application when W&B releases a new version of the software.

Migrate self-managed instances to W&B Operator

The following sections describe how to migrate from self-managing your own W&B Server installation to using the W&B Operator to do this for you. The migration process depends on how you installed W&B Server:

The W&B Operator is the default and recommended installation method for W&B Server. Reach out to Customer Support or your W&B team if you have any questions.

If you used the official W&B Cloud Terraform Modules, navigate to the appropriate documentation and follow the steps there:
- AWS
- GCP
- Azure
If you used the W&B Non-Operator Helm chart, continue here.
If you used the W&B Non-Operator Helm chart with Terraform, continue here.
If you created the Kubernetes resources with manifests, continue here.

Migrate to Operator-based AWS Terraform Modules

For a detailed description of the migration process, continue here.

Migrate to Operator-based GCP Terraform Modules

Reach out to Customer Support or your W&B team if you have any questions or need assistance.

Migrate to Operator-based Azure Terraform Modules

Reach out to Customer Support or your W&B team if you have any questions or need assistance.

Migrate to Operator-based Helm chart

Follow these steps to migrate to the Operator-based Helm chart:

Get the current W&B configuration. If W&B was deployed with a non-operator-based version of the Helm chart, export the values like this:
```
helm get values wandb
```
If W&B was deployed with Kubernetes manifests, export the values like this:
```
kubectl get deployment wandb -o yaml
```
You now have all the configuration values you need for the next step.
Create a file called operator.yaml. Follow the format described in the Configuration Reference. Use the values from step 1.
Scale the current deployment to 0 pods. This step stops the current deployment.
```
kubectl scale --replicas=0 deployment wandb
```
Update the Helm chart repo:
```
helm repo update
```

Install the new Helm chart:

helm upgrade --install operator wandb/operator -n wandb-cr --create-namespace

Configure the new helm chart and trigger W&B application deployment. Apply the new configuration.
```
kubectl apply -f operator.yaml
```
The deployment takes a few minutes to complete.
Verify the installation. Make sure that everything works by following the steps in Verify the installation.
Remove the old installation. Uninstall the old Helm chart or delete the resources that were created with manifests.

Migrate to Operator-based Terraform Helm chart

Follow these steps to migrate to the Operator-based Helm chart:

Prepare the Terraform config. Replace the Terraform code from the old deployment in your Terraform config with the one that is described here. Set the same variables as before. Do not change the .tfvars file if you have one.
Run Terraform. Execute terraform init, terraform plan, and terraform apply.
Verify the installation. Make sure that everything works by following the steps in Verify the installation.
Remove the old installation. Uninstall the old Helm chart or delete the resources that were created with manifests.

Configuration Reference for W&B Server

This section describes the configuration options for the W&B Server application. The application receives its configuration as a custom resource definition named WeightsAndBiases. Some configuration options are exposed in the configuration below; others must be set as environment variables.

The documentation has two lists of environment variables: basic and advanced. Use environment variables only if the configuration option that you need is not exposed in the Helm chart.

The W&B Server application configuration file for a production deployment requires the following contents. This YAML file defines the desired state of your W&B deployment, including the version, environment variables, external resources like databases, and other necessary settings.

apiVersion: apps.wandb.com/v1
kind: WeightsAndBiases
metadata:
  labels:
    app.kubernetes.io/name: weightsandbiases
    app.kubernetes.io/instance: wandb
  name: wandb
  namespace: default
spec:
  values:
    global:
      host: https://<HOST_URI>
      license: eyJhbGnUzaH...j9ZieKQ2x5GGfw
      bucket:
        <details depend on the provider>
      mysql:
        <redacted>
    ingress:
      annotations:
        <redacted>

Find the full set of values in the W&B Helm repository, and change only those values you need to override.

Complete example

This is an example configuration that uses GCP Kubernetes with GCP Ingress and GCS (GCP Object storage):

apiVersion: apps.wandb.com/v1
kind: WeightsAndBiases
metadata:
  labels:
    app.kubernetes.io/name: weightsandbiases
    app.kubernetes.io/instance: wandb
  name: wandb
  namespace: default
spec:
  values:
    global:
      host: https://abc-wandb.sandbox-gcp.wandb.ml
      bucket:
        name: abc-wandb-moving-pipefish
        provider: gcs
      mysql:
        database: wandb_local
        host: 10.218.0.2
        name: wandb_local
        password: 8wtX6cJHizAZvYScjDzZcUarK4zZGjpV
        port: 3306
        user: wandb
      license: eyJhbGnUzaHgyQjQyQWhEU3...ZieKQ2x5GGfw
    ingress:
      annotations:
        ingress.gcp.kubernetes.io/pre-shared-cert: abc-wandb-cert-creative-puma
        kubernetes.io/ingress.class: gce
        kubernetes.io/ingress.global-static-ip-name: abc-wandb-operator-address
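
If you generate this resource from a script rather than writing it by hand, one option is to assemble it as a dictionary and serialize it to JSON, which kubectl apply -f accepts alongside YAML. The following is a minimal sketch of that approach. The `build_custom_resource` helper and all placeholder values are hypothetical; the field names mirror the example above.

```python
import json


def build_custom_resource(host, license_key, bucket, mysql,
                          ingress_annotations=None, namespace="default"):
    """Assemble a WeightsAndBiases custom resource as a plain dict."""
    resource = {
        "apiVersion": "apps.wandb.com/v1",
        "kind": "WeightsAndBiases",
        "metadata": {
            "labels": {
                "app.kubernetes.io/name": "weightsandbiases",
                "app.kubernetes.io/instance": "wandb",
            },
            "name": "wandb",
            "namespace": namespace,
        },
        "spec": {
            "values": {
                "global": {
                    "host": host,
                    "license": license_key,
                    "bucket": bucket,
                    "mysql": mysql,
                }
            }
        },
    }
    # Ingress sits next to "global", not inside it.
    if ingress_annotations:
        resource["spec"]["values"]["ingress"] = {"annotations": ingress_annotations}
    return resource


# Placeholder values; replace with your own.
resource = build_custom_resource(
    host="https://wandb.example.com",
    license_key="<YOUR_LICENSE>",
    bucket={"provider": "gcs", "name": "<BUCKET_NAME>"},
    mysql={"host": "<MYSQL_HOST>", "port": 3306, "database": "wandb_local",
           "user": "wandb", "password": "<MYSQL_PASSWORD>"},
)
manifest = json.dumps(resource, indent=2)
print(manifest)
```

Building the resource in code makes it easier to keep secrets out of version control, because the placeholders can be filled in from your own secret store at deploy time.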

Host

# Provide the FQDN with protocol
global:
  # example host name, replace with your own
  host: https://wandb.example.com
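
A common mistake is to provide the host without the protocol. The following sketch shows one way to check the value before you apply the configuration. It assumes only that the host must include an http or https scheme and a host name; the `is_valid_host` helper is hypothetical and is not part of any W&B tooling.

```python
from urllib.parse import urlparse


def is_valid_host(value: str) -> bool:
    """Return True if value has an http(s) scheme and a non-empty host name."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.hostname)


print(is_valid_host("https://wandb.example.com"))
print(is_valid_host("wandb.example.com"))
```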

Object storage (bucket)

AWS

global:
  bucket:
    provider: "s3"
    name: ""
    kmsKey: ""
    region: ""

GCP

global:
  bucket:
    provider: "gcs"
    name: ""

Azure

global:
  bucket:
    provider: "az"
    name: ""
    secretKey: ""

Other providers (Minio, Ceph, etc.)

For other S3 compatible providers, set the bucket configuration as follows:

global:
  bucket:
    # Example values, replace with your own
    provider: s3
    name: storage.example.com
    kmsKey: null
    path: wandb
    region: default
    accessKey: 5WOA500...P5DK7I
    secretKey: HDKYe4Q...JAp1YyjysnX

For S3-compatible storage hosted outside of AWS, kmsKey must be null.

To reference accessKey and secretKey from a secret:

global:
  bucket:
    # Example values, replace with your own
    provider: s3
    name: storage.example.com
    kmsKey: null
    path: wandb
    region: default
    secret:
      secretName: bucket-secret
      accessKeyName: ACCESS_KEY
      secretKeyName: SECRET_KEY
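
The secret referenced above must already exist in the cluster. The following is a minimal sketch of how such a secret manifest could be generated, assuming a standard Opaque Kubernetes secret whose data values are base64-encoded. The `build_opaque_secret` helper and the credential values are hypothetical; the secret name and key names match the configuration above.

```python
import base64
import json


def build_opaque_secret(name: str, values: dict[str, str]) -> dict:
    """Build an Opaque Kubernetes Secret manifest with base64-encoded data values."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        "data": {
            key: base64.b64encode(value.encode("utf-8")).decode("ascii")
            for key, value in values.items()
        },
    }


# Hypothetical credentials for illustration only.
bucket_secret = build_opaque_secret(
    "bucket-secret",
    {"ACCESS_KEY": "example-access-key", "SECRET_KEY": "example-secret-key"},
)
print(json.dumps(bucket_secret, indent=2))
```

The same pattern applies to the other secret references in this section, such as the MySQL password and the license, with the secret name and key changed accordingly.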

MySQL

global:
   mysql:
     # Example values, replace with your own
     host: db.example.com
     port: 3306
     database: wandb_local
     user: wandb
     password: 8wtX6cJH...ZcUarK4zZGjpV

To reference the password from a secret:

global:
   mysql:
     # Example values, replace with your own
     host: db.example.com
     port: 3306
     database: wandb_local
     user: wandb
     passwordSecret:
       name: database-secret
       passwordKey: MYSQL_WANDB_PASSWORD

License

global:
  # Example license, replace with your own
  license: eyJhbGnUzaHgyQjQy...VFnPS_KETXg1hi

To reference the license from a secret:

global:
  licenseSecret:
    name: license-secret
    key: CUSTOMER_WANDB_LICENSE

Ingress

To identify the ingress class, see this FAQ entry.

Without TLS

global:
# IMPORTANT: Ingress is on the same level in the YAML as ‘global’ (not a child)
ingress:
  class: ""

With TLS

Create a secret that contains the certificate

kubectl create secret tls wandb-ingress-tls --key wandb-ingress-tls.key --cert wandb-ingress-tls.crt

Reference the secret in the ingress configuration

global:
# IMPORTANT: Ingress is on the same level in the YAML as ‘global’ (not a child)
ingress:
  class: ""
  annotations:
    {}
    # kubernetes.io/ingress.class: nginx
    # kubernetes.io/tls-acme: "true"
  tls: 
    - secretName: wandb-ingress-tls
      hosts:
        - <HOST_URI>

If you use the NGINX ingress controller, you might need to add the following annotation:

ingress:
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: 64m

Custom Kubernetes ServiceAccounts

Specify custom Kubernetes service accounts to run the W&B pods.

The following snippet creates a service account as part of the deployment with the specified name:

app:
  serviceAccount:
    name: custom-service-account
    create: true

parquet:
  serviceAccount:
    name: custom-service-account
    create: true

global:
  ...

The subsystems “app” and “parquet” run under the specified service account. The other subsystems run under the default service account.

If the service account already exists on the cluster, set create: false:

app:
  serviceAccount:
    name: custom-service-account
    create: false

parquet:
  serviceAccount:
    name: custom-service-account
    create: false
    
global:
  ...

You can specify service accounts on different subsystems such as app, parquet, console, and others:

app:
  serviceAccount:
    name: custom-service-account
    create: true

console:
  serviceAccount:
    name: custom-service-account
    create: true

global:
  ...

The service accounts can be different between the subsystems:

app:
  serviceAccount:
    name: custom-service-account
    create: false

console:
  serviceAccount:
    name: another-custom-service-account
    create: true

global:
  ...

External Redis

redis:
  install: false

global:
  redis:
    host: ""
    port: 6379
    password: ""
    parameters: {}
    caCert: ""

To reference the password from a secret:

kubectl create secret generic redis-secret --from-literal=redis-password=supersecret

Reference it in the configuration below:

redis:
  install: false

global:
  redis:
    host: redis.example
    port: 9001
    auth:
      enabled: true
      secret: redis-secret
      key: redis-password

LDAP

Without TLS

global:
  ldap:
    enabled: true
    # LDAP server address including "ldap://" or "ldaps://"
    host:
    # LDAP search base to use for finding users
    baseDN:
    # LDAP user to bind with (if not using anonymous bind)
    bindDN:
    # Secret name and key with LDAP password to bind with (if not using anonymous bind)
    bindPW:
    # LDAP attribute for email and group ID attribute names as comma separated string values.
    attributes:
    # LDAP group allow list
    groupAllowList:
    # Enable LDAP TLS
    tls: false

With TLS

The LDAP TLS cert configuration requires a config map pre-created with the certificate content.

To create the config map you can use the following command:

kubectl create configmap ldap-tls-cert --from-file=certificate.crt

Then reference the config map in the YAML, as in the following example:

global:
  ldap:
    enabled: true
    # LDAP server address including "ldap://" or "ldaps://"
    host:
    # LDAP search base to use for finding users
    baseDN:
    # LDAP user to bind with (if not using anonymous bind)
    bindDN:
    # Secret name and key with LDAP password to bind with (if not using anonymous bind)
    bindPW:
    # LDAP attribute for email and group ID attribute names as comma separated string values.
    attributes:
    # LDAP group allow list
    groupAllowList:
    # Enable LDAP TLS
    tls: true
    # ConfigMap name and key with CA certificate for LDAP server
    tlsCert:
      configMap:
        name: "ldap-tls-cert"
        key: "certificate.crt"

OIDC SSO

global: 
  auth:
    sessionLengthHours: 720
    oidc:
      clientId: ""
      secret: ""
      # Only include if your IdP requires it.
      authMethod: ""
      issuer: ""

authMethod is optional.

SMTP

global:
  email:
    smtp:
      host: ""
      port: 587
      user: ""
      password: ""

Environment Variables

global:
  extraEnv:
    GLOBAL_ENV: "example"

Custom certificate authority

customCACerts is a list and can take many certificates. Certificate authorities specified in customCACerts only apply to the W&B Server application.

global:
  customCACerts:
  - |
    -----BEGIN CERTIFICATE-----
    MIIBnDCCAUKgAwIBAg.....................fucMwCgYIKoZIzj0EAwIwLDEQ
    MA4GA1UEChMHSG9tZU.....................tZUxhYiBSb290IENBMB4XDTI0
    MDQwMTA4MjgzMFoXDT.....................oNWYggsMo8O+0mWLYMAoGCCqG
    SM49BAMCA0gAMEUCIQ.....................hwuJgyQRaqMI149div72V2QIg
    P5GD+5I+02yEp58Cwxd5Bj2CvyQwTjTO4hiVl1Xd0M0=
    -----END CERTIFICATE-----
  - |
    -----BEGIN CERTIFICATE-----
    MIIBxTCCAWugAwIB.......................qaJcwCgYIKoZIzj0EAwIwLDEQ
    MA4GA1UEChMHSG9t.......................tZUxhYiBSb290IENBMB4XDTI0
    MDQwMTA4MjgzMVoX.......................UK+moK4nZYvpNpqfvz/7m5wKU
    SAAwRQIhAIzXZMW4.......................E8UFqsCcILdXjAiA7iTluM0IU
    aIgJYVqKxXt25blH/VyBRzvNhViesfkNUQ==
    -----END CERTIFICATE-----

CA certificates can also be stored in a ConfigMap:

global:
  caCertsConfigMap: custom-ca-certs

The ConfigMap must look like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: custom-ca-certs
data:
  ca-cert1.crt: |
    -----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----
  ca-cert2.crt: |
    -----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----

If using a ConfigMap, each key in the ConfigMap must end with .crt (for example, my-cert.crt or ca-cert1.crt). This naming convention is required for update-ca-certificates to parse and add each certificate to the system CA store.
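
Because a key without the .crt suffix is easy to overlook, it can help to check the ConfigMap before you apply it. The following sketch assumes the ConfigMap has been loaded into a dictionary, for example from its JSON form; the `invalid_ca_keys` helper and the sample data are hypothetical.

```python
def invalid_ca_keys(configmap: dict) -> list[str]:
    """Return the ConfigMap data keys that do not end with .crt, sorted by name."""
    return sorted(key for key in configmap.get("data", {}) if not key.endswith(".crt"))


# Hypothetical ConfigMap content for illustration only.
sample_configmap = {
    "apiVersion": "v1",
    "kind": "ConfigMap",
    "metadata": {"name": "custom-ca-certs"},
    "data": {
        "ca-cert1.crt": "-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----\n",
        "ca-cert2.pem": "-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----\n",
    },
}
print(invalid_ca_keys(sample_configmap))
```

In this sample, ca-cert2.pem is reported because it does not follow the required naming convention.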

Custom security context

Each W&B component supports custom security context configurations of the following form:

pod:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1001
    runAsGroup: 0
    fsGroup: 1001
    fsGroupChangePolicy: Always
    seccompProfile:
      type: RuntimeDefault
container:
  securityContext:
    capabilities:
      drop:
        - ALL
    readOnlyRootFilesystem: false
    allowPrivilegeEscalation: false

The only valid value for runAsGroup: is 0. Any other value is an error.

For example, to configure the application pod, add a section app to your configuration:

global:
  ...
app:
  pod:
    securityContext:
      runAsNonRoot: true
      runAsUser: 1001
      runAsGroup: 0
      fsGroup: 1001
      fsGroupChangePolicy: Always
      seccompProfile:
        type: RuntimeDefault
  container:
    securityContext:
      capabilities:
        drop:
          - ALL
      readOnlyRootFilesystem: false
      allowPrivilegeEscalation: false

The same concept applies to console, weave, weave-trace and parquet.

Configuration Reference for W&B Operator

This section describes configuration options for W&B Kubernetes operator (wandb-controller-manager). The operator receives its configuration in the form of a YAML file.

By default, the W&B Kubernetes operator does not need a configuration file. Create a configuration file only if required. For example, you might need one to specify custom certificate authorities or to deploy in an air-gapped environment.

Find the full list of spec customization in the Helm repository.

Custom CA

customCACerts is a list and can take many certificates. Certificate authorities specified in customCACerts only apply to the W&B Kubernetes operator (wandb-controller-manager).

customCACerts:
- |
  -----BEGIN CERTIFICATE-----
  MIIBnDCCAUKgAwIBAg.....................fucMwCgYIKoZIzj0EAwIwLDEQ
  MA4GA1UEChMHSG9tZU.....................tZUxhYiBSb290IENBMB4XDTI0
  MDQwMTA4MjgzMFoXDT.....................oNWYggsMo8O+0mWLYMAoGCCqG
  SM49BAMCA0gAMEUCIQ.....................hwuJgyQRaqMI149div72V2QIg
  P5GD+5I+02yEp58Cwxd5Bj2CvyQwTjTO4hiVl1Xd0M0=
  -----END CERTIFICATE-----
- |
  -----BEGIN CERTIFICATE-----
  MIIBxTCCAWugAwIB.......................qaJcwCgYIKoZIzj0EAwIwLDEQ
  MA4GA1UEChMHSG9t.......................tZUxhYiBSb290IENBMB4XDTI0
  MDQwMTA4MjgzMVoX.......................UK+moK4nZYvpNpqfvz/7m5wKU
  SAAwRQIhAIzXZMW4.......................E8UFqsCcILdXjAiA7iTluM0IU
  aIgJYVqKxXt25blH/VyBRzvNhViesfkNUQ==
  -----END CERTIFICATE-----

CA certificates can also be stored in a ConfigMap:

caCertsConfigMap: custom-ca-certs

The ConfigMap must look like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: custom-ca-certs
data:
  ca-cert1.crt: |
    -----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----
  ca-cert2.crt: |
    -----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----

Each key in the ConfigMap must end with .crt (e.g., my-cert.crt or ca-cert1.crt). This naming convention is required for update-ca-certificates to parse and add each certificate to the system CA store.

FAQ

What is the purpose/role of each individual pod?

wandb-app: the core of W&B, including the GraphQL API and frontend application. It powers most of our platform’s functionality.
wandb-console: the administration console, accessed via /console.
wandb-otel: the OpenTelemetry agent, which collects metrics and logs from resources at the Kubernetes layer for display in the administration console.
wandb-prometheus: the Prometheus server, which captures metrics from various components for display in the administration console.
wandb-parquet: a backend microservice separate from the wandb-app pod that exports database data to object storage in Parquet format.
wandb-weave: another backend microservice that loads query tables in the UI and supports various core app features.
wandb-weave-trace: a framework for tracking, experimenting with, evaluating, deploying, and improving LLM-based applications. The framework is accessed via the wandb-app pod.

How to get the W&B Operator Console password

See Accessing the W&B Kubernetes Operator Management Console.

How to access the W&B Operator Console if Ingress doesn’t work

Execute the following command on a host that can reach the Kubernetes cluster:

kubectl port-forward svc/wandb-console 8082

Access the console in the browser with https://localhost:8082/console.

See Accessing the W&B Kubernetes Operator Management Console on how to get the password (Option 2).

How to view W&B Server logs

The application pod is named wandb-app-xxx.

kubectl get pods
kubectl logs wandb-XXXXX-XXXXX

How to identify the Kubernetes ingress class

To list the ingress classes installed in your cluster, run:

kubectl get ingressclass

5.1.3.2.1 - Kubernetes operator for air-gapped instances

Deploy W&B Platform with Kubernetes Operator (Airgapped)

Introduction

This guide provides step-by-step instructions to deploy the W&B Platform in air-gapped customer-managed environments.

Use an internal repository or registry to host the Helm charts and container images. Run all commands in a shell console with proper access to the Kubernetes cluster.

You could utilize similar commands in any continuous delivery tooling that you use to deploy Kubernetes applications.

Step 1: Prerequisites

Before starting, make sure your environment meets the following requirements:

Kubernetes version >= 1.28
Helm version >= 3
Access to an internal container registry with the required W&B images
Access to an internal Helm repository for W&B Helm charts
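
The version requirements above can be checked in a script before you start. The following is a minimal sketch that compares version strings such as those reported by kubectl version or helm version. It assumes you have already captured those strings; the `parse_version` and `meets_minimum` helpers and the sample versions are hypothetical.

```python
import re


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the leading numeric components from strings such as 'v1.29.3'."""
    match = re.search(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups() if part is not None)


def meets_minimum(reported: str, minimum: tuple[int, ...]) -> bool:
    """Return True if the reported version is at least the minimum."""
    return parse_version(reported) >= minimum


# Hypothetical version strings for illustration only.
print(meets_minimum("v1.29.3", (1, 28)))  # Kubernetes >= 1.28
print(meets_minimum("v3.14.0", (3,)))     # Helm >= 3
```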

Step 2: Prepare internal container registry

Before proceeding with the deployment, you must ensure that the following container images are available in your internal container registry:

These images are critical for the successful deployment of W&B components. W&B recommends that you use WSM to prepare the container registry.

If your organization already uses an internal container registry, you can add the images to it. Otherwise, follow the next section to use a tool called WSM to prepare the container registry.

You are responsible for tracking the Operator’s requirements and for checking for and downloading image upgrades, either by using WSM or by using your organization’s own processes.

Install WSM

Install WSM using one of these methods.

WSM requires a functioning Docker installation.

Bash

Run the Bash script directly from GitHub:

curl -sSL https://raw.githubusercontent.com/wandb/wsm/main/install.sh | bash

The script downloads the binary to the folder in which you executed the script. To move it to another folder, execute:

sudo mv wsm /usr/local/bin

GitHub

Download or clone WSM from the W&B managed wandb/wsm GitHub repository at https://github.com/wandb/wsm. See the wandb/wsm release notes for the latest release.

List images and their versions

Get an up to date list of image versions using wsm list.

wsm list

The output looks similar to the following:

:package: Starting the process to list all images required for deployment...
Operator Images:
  wandb/controller:1.16.1
W&B Images:
  wandb/local:0.62.2
  docker.io/bitnami/redis:7.2.4-debian-12-r9
  quay.io/prometheus-operator/prometheus-config-reloader:v0.67.0
  quay.io/prometheus/prometheus:v2.47.0
  otel/opentelemetry-collector-contrib:0.97.0
  wandb/console:2.13.1
Here are the images required to deploy W&B. Ensure these images are available in your internal container registry and update the values.yaml accordingly.
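
To compare this list against the contents of your internal registry, you can split each entry into a repository and a tag. The following sketch assumes the output format shown above, in which image lines are indented by two spaces; the `parse_image_list` helper is hypothetical and is not part of WSM.

```python
def parse_image_list(output: str) -> list[tuple[str, str]]:
    """Return (repository, tag) pairs for the indented image lines in the output."""
    images = []
    for line in output.splitlines():
        # Section titles and messages are not indented; image lines are.
        if not line.startswith("  "):
            continue
        repository, _, tag = line.strip().rpartition(":")
        if repository and tag:
            images.append((repository, tag))
    return images


# Excerpt of the sample output above.
sample = "\n".join(
    [
        "Operator Images:",
        "  wandb/controller:1.16.1",
        "W&B Images:",
        "  wandb/local:0.62.2",
        "  docker.io/bitnami/redis:7.2.4-debian-12-r9",
    ]
)
for repository, tag in parse_image_list(sample):
    print(repository, tag)
```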

Download images

Download all images in the latest versions using wsm download.

wsm download

The output looks similar to the following:

Downloading operator helm chart
Downloading wandb helm chart
✓ wandb/controller:1.16.1
✓ docker.io/bitnami/redis:7.2.4-debian-12-r9
✓ otel/opentelemetry-collector-contrib:0.97.0
✓ quay.io/prometheus-operator/prometheus-config-reloader:v0.67.0
✓ wandb/console:2.13.1
✓ quay.io/prometheus/prometheus:v2.47.0

  Done! Installed 7 packages.

WSM downloads a .tgz archive for each image to the bundle directory.

Step 3: Prepare internal Helm chart repository

Along with the container images, you also must ensure that the following Helm charts are available in your internal Helm Chart repository. The WSM tool introduced in the last step can also download the Helm charts. Alternatively, download them here:

The operator chart is used to deploy the W&B Operator, which is also referred to as the Controller Manager. The platform chart is used to deploy the W&B Platform using the values configured in the custom resource definition (CRD).

Step 4: Set up Helm repository

Now, configure the Helm repository to pull the W&B Helm charts from your internal repository. Run the following commands to add and update the Helm repository:

helm repo add local-repo https://charts.yourdomain.com
helm repo update

Step 5: Install the Kubernetes operator

The W&B Kubernetes operator, also known as the controller manager, is responsible for managing the W&B platform components. To install it in an air-gapped environment, you must configure it to use your internal container registry.

To do so, you must override the default image settings to use your internal container registry and set the key airgapped: true to indicate the expected deployment type. Update the values.yaml file as shown below:

image:
  repository: registry.yourdomain.com/library/controller
  tag: 1.13.3
airgapped: true

Replace the tag with the version that is available in your internal registry.

Install the operator and the CRD:

helm upgrade --install operator local-repo/operator -n wandb --create-namespace -f values.yaml

For full details about the supported values, refer to the Kubernetes operator GitHub repository.

Step 6: Configure W&B Custom Resource

After installing the W&B Kubernetes operator, you must configure the Custom Resource (CR) to point to your internal Helm repository and container registry.

This configuration ensures that the Kubernetes operator uses your internal registry and repository when it deploys the required components of the W&B platform.

Copy this example CR to a new file named wandb.yaml.

apiVersion: apps.wandb.com/v1
kind: WeightsAndBiases
metadata:
  labels:
    app.kubernetes.io/instance: wandb
    app.kubernetes.io/name: weightsandbiases
  name: wandb
  namespace: default

spec:
  chart:
    url: http://charts.yourdomain.com
    name: operator-wandb
    version: 0.18.0

  values:
    global:
      host: https://wandb.yourdomain.com
      license: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
      bucket:
        accessKey: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
        secretKey: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
        name: s3.yourdomain.com:port #Ex.: s3.yourdomain.com:9000
        path: bucket_name
        provider: s3
        region: us-east-1
      mysql:
        database: wandb
        host: mysql.home.lab
        password: password
        port: 3306
        user: wandb
      extraEnv:
        ENABLE_REGISTRY_UI: 'true'
    
    # If install: true, Helm installs a MySQL database for the deployment to use. Set to `false` to use your own external MySQL deployment.
    mysql:
      install: false

    app:
      image:
        repository: registry.yourdomain.com/local
        tag: 0.59.2

    console:
      image:
        repository: registry.yourdomain.com/console
        tag: 2.12.2

    ingress:
      annotations:
        nginx.ingress.kubernetes.io/proxy-body-size: 64m
      class: nginx

To deploy the W&B platform, the Kubernetes Operator uses the values from your CR to configure the operator-wandb Helm chart from your internal repository.

Replace all tags/versions with the versions that are available in your internal registry.

More information on creating the preceding configuration file can be found here.

Step 7: Deploy the W&B platform

Now that the Kubernetes operator and the CR are configured, apply the wandb.yaml configuration to deploy the W&B platform:

kubectl apply -f wandb.yaml

FAQ

Refer to the following frequently asked questions (FAQs) and troubleshooting tips during the deployment process:

There is another ingress class. Can that class be used?

Yes, you can configure your ingress class by modifying the ingress settings in values.yaml.

The certificate bundle has more than one certificate. Would that work?

You must split the certificates into multiple entries in the customCACerts section of values.yaml.

How do you prevent the Kubernetes operator from applying unattended updates?

You can turn off auto-updates from the W&B console. Reach out to your W&B team for any questions on the supported versions. W&B supports a major W&B Server release for 12 months from its initial release date. Customers with Self-managed instances are responsible for upgrading in time to maintain support. Avoid staying on an unsupported version. Refer to Release policies and processes.

W&B strongly recommends that customers with Self-managed instances update their deployments to the latest release at least once per quarter to maintain support and receive the latest features, performance improvements, and fixes.

Does the deployment work if the environment has no connection to public repositories?

If your configuration sets airgapped to true, the Kubernetes operator uses only your internal resources and does not attempt to connect to public repositories.

5.1.3.3 - Install on public cloud

5.1.3.3.1 - Deploy W&B Platform on AWS

Hosting W&B Server on AWS.

W&B recommends using the W&B Server AWS Terraform Module to deploy the platform on AWS.

Before you start, W&B recommends that you choose one of the remote backends available for Terraform to store the State File.

The State File is the necessary resource to roll out upgrades or make changes in your deployment without recreating all components.

The Terraform Module deploys the following mandatory components:

Load Balancer
AWS Identity & Access Management (IAM)
AWS Key Management Service (KMS)
Amazon Aurora MySQL
Amazon VPC
Amazon S3
Amazon Route53
AWS Certificate Manager (ACM)
Amazon Elastic Load Balancing (ALB)
AWS Secrets Manager

Other deployment options can also include the following optional components:

Amazon ElastiCache for Redis
Amazon SQS

Pre-requisite permissions

The account that runs Terraform needs permission to create all components described in the Introduction, as well as permission to create IAM policies and IAM roles and to assign roles to resources.

General steps

The steps in this topic are common to every deployment option covered by this documentation.

Prepare the development environment.
- Install Terraform
- W&B recommends creating a Git repository for version control.
Create the terraform.tfvars file.

The content of the tfvars file can be customized according to the installation type, but the minimum recommended configuration looks like the following example.
```
namespace                  = "wandb"
license                    = "xxxxxxxxxxyyyyyyyyyyyzzzzzzz"
subdomain                  = "wandb-aws"
domain_name                = "wandb.ml"
zone_id                    = "xxxxxxxxxxxxxxxx"
allowed_inbound_cidr       = ["0.0.0.0/0"]
allowed_inbound_ipv6_cidr  = ["::/0"]
eks_cluster_version        = "1.29"
```
Define the variables in your tfvars file before you deploy. The namespace variable is a string that prefixes all resources created by Terraform.

The combination of subdomain and domain_name forms the FQDN at which W&B is configured. In the example above, the W&B FQDN is wandb-aws.wandb.ml, and zone_id identifies the DNS zone where the FQDN record is created.

Both allowed_inbound_cidr and allowed_inbound_ipv6_cidr must also be set, because they are mandatory inputs to the module. The preceding example permits access to the W&B installation from any source.
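To make the two points above concrete, here is a small sketch that joins subdomain and domain_name into the FQDN and checks whether a CIDR list is open to every source. The helper names `fqdn` and `is_open_to_all` are illustrative and are not part of the Terraform module. The sketch relies only on Python's standard `ipaddress` module.

```python
import ipaddress

def fqdn(subdomain: str, domain_name: str) -> str:
    """Join subdomain and domain_name into the FQDN described in the text."""
    return f"{subdomain}.{domain_name}"

def is_open_to_all(cidrs: list[str]) -> bool:
    """True if any CIDR has prefix length 0, meaning it covers every address."""
    return any(ipaddress.ip_network(cidr).prefixlen == 0 for cidr in cidrs)

print(fqdn("wandb-aws", "wandb.ml"))   # values from the example tfvars
print(is_open_to_all(["0.0.0.0/0"]))   # IPv4 value from the example tfvars
print(is_open_to_all(["::/0"]))        # IPv6 value from the example tfvars
print(is_open_to_all(["10.0.0.0/8"]))  # a narrower, hypothetical range
```

If you want to restrict access, replace the open ranges in terraform.tfvars with the CIDR blocks of your own networks.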
Create the file versions.tf

This file contains the Terraform and Terraform provider versions required to deploy W&B on AWS.
```
provider "aws" {
  region = "eu-central-1"

  default_tags {
    tags = {
      GithubRepo  = "terraform-aws-wandb"
      GithubOrg   = "wandb"
      Environment = "Example"
      Example     = "PublicDnsExternal"
    }
  }
}
```
Refer to the Terraform Official Documentation to configure the AWS provider.

Optionally, but highly recommended, add the remote backend configuration mentioned at the beginning of this documentation.

Create the file variables.tf

For every option configured in terraform.tfvars, Terraform requires a corresponding variable declaration.

variable "namespace" {
  type        = string
  description = "Name prefix used for resources"
}

variable "domain_name" {
  type        = string
  description = "Domain name used to access instance."
}

variable "subdomain" {
  type        = string
  default     = null
  description = "Subdomain for accessing the Weights & Biases UI."
}

variable "license" {
  type = string
}

variable "zone_id" {
  type        = string
  description = "Domain for creating the Weights & Biases subdomain on."
}

variable "allowed_inbound_cidr" {
 description = "CIDRs allowed to access wandb-server."
 nullable    = false
 type        = list(string)
}

variable "allowed_inbound_ipv6_cidr" {
 description = "CIDRs allowed to access wandb-server."
 nullable    = false
 type        = list(string)
}

variable "eks_cluster_version" {
 description = "EKS cluster kubernetes version"
 nullable    = false
 type        = string
}
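One way to confirm that every entry in terraform.tfvars has a matching declaration is a quick text comparison of the two files. This is a rough sketch under stated assumptions: the function names are hypothetical, the matching is regex-based rather than a real HCL parse, and it only handles flat `name = value` assignments like those in the example.

```python
import re

def tfvars_keys(tfvars_text: str) -> set[str]:
    """Names assigned at the start of a line in a .tfvars file."""
    pattern = r"^[ \t]*([A-Za-z_][A-Za-z0-9_-]*)[ \t]*="
    return set(re.findall(pattern, tfvars_text, re.MULTILINE))

def declared_variables(variables_tf_text: str) -> set[str]:
    """Names declared with `variable "name"` blocks."""
    pattern = r'^[ \t]*variable[ \t]+"([^"]+)"'
    return set(re.findall(pattern, variables_tf_text, re.MULTILINE))

def undeclared(tfvars_text: str, variables_tf_text: str) -> list[str]:
    """tfvars entries that have no variable declaration."""
    return sorted(tfvars_keys(tfvars_text) - declared_variables(variables_tf_text))

# Shortened, illustrative file contents.
tfvars = 'namespace = "wandb"\nlicense   = "xxx"\nzone_id   = "xxx"\n'
variables = (
    'variable "namespace" {\n  type = string\n}\n'
    'variable "license" {\n  type = string\n}\n'
)

missing = undeclared(tfvars, variables)
print(missing)
```

An empty list means every value in the tfvars file has a declaration.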

Recommended deployment option

This is the most straightforward deployment option. It creates all mandatory components and installs the latest version of W&B in the Kubernetes cluster.

Create the main.tf

In the same directory where you created the files in the General Steps, create a file main.tf with the following content:

module "wandb_infra" {
  source  = "wandb/wandb/aws"
  version = "~>7.0"

  namespace   = var.namespace
  domain_name = var.domain_name
  subdomain   = var.subdomain
  zone_id     = var.zone_id

  allowed_inbound_cidr           = var.allowed_inbound_cidr
  allowed_inbound_ipv6_cidr      = var.allowed_inbound_ipv6_cidr

  public_access                  = true
  external_dns                   = true
  kubernetes_public_access       = true
  kubernetes_public_access_cidrs = ["0.0.0.0/0"]
  eks_cluster_version            = var.eks_cluster_version
}

 data "aws_eks_cluster" "eks_cluster_id" {
   name = module.wandb_infra.cluster_name
 }

 data "aws_eks_cluster_auth" "eks_cluster_auth" {
   name = module.wandb_infra.cluster_name
 }

 provider "kubernetes" {
   host                   = data.aws_eks_cluster.eks_cluster_id.endpoint
   cluster_ca_certificate = base64decode(data.aws_eks_cluster.eks_cluster_id.certificate_authority.0.data)
   token                  = data.aws_eks_cluster_auth.eks_cluster_auth.token
 }


 provider "helm" {
   kubernetes {
     host                   = data.aws_eks_cluster.eks_cluster_id.endpoint
     cluster_ca_certificate = base64decode(data.aws_eks_cluster.eks_cluster_id.certificate_authority.0.data)
     token                  = data.aws_eks_cluster_auth.eks_cluster_auth.token
   }
 }

 output "url" {
   value = module.wandb_infra.url
 }

 output "bucket" {
   value = module.wandb_infra.bucket_name
 }

Deploy W&B

To deploy W&B, execute the following commands:

terraform init
terraform apply -var-file=terraform.tfvars

Enable Redis

Another deployment option uses Redis to cache the SQL queries and speed up the application response when loading the metrics for the experiments.

You need to add the option create_elasticache_subnet = true to the same main.tf file described in the Recommended deployment section to enable the cache.

module "wandb_infra" {
  source  = "wandb/wandb/aws"
  version = "~>7.0"

  namespace   = var.namespace
  domain_name = var.domain_name
  subdomain   = var.subdomain
  zone_id     = var.zone_id
  create_elasticache_subnet = true
}
[...]

Enable message broker (queue)

Deployment option 3 enables the external message broker. This is optional because W&B ships with an embedded broker. This option doesn’t improve performance.

The AWS resource that provides the message broker is Amazon SQS. To enable it, add the option use_internal_queue = false to the same main.tf described in the Recommended deployment section.

module "wandb_infra" {
  source  = "wandb/wandb/aws"
  version = "~>7.0"

  namespace   = var.namespace
  domain_name = var.domain_name
  subdomain   = var.subdomain
  zone_id     = var.zone_id
  use_internal_queue = false

[...]
}

Other deployment options

You can combine all three deployment options by adding all configurations to the same file. The Terraform Module provides several options that can be combined with the standard options and the minimal configuration found in the Recommended deployment section.

Manual configuration

To use an Amazon S3 bucket as a file storage backend for W&B, you will need to:

Create an Amazon S3 Bucket and Bucket Notifications
Create SQS Queue
Grant Permissions to Node Running W&B

You’ll need to create a bucket, along with an SQS queue configured to receive object creation notifications from that bucket. Your instance needs permission to read from this queue.

Create an S3 Bucket and Bucket Notifications

Follow the procedure below to create an Amazon S3 bucket and enable bucket notifications.

Navigate to Amazon S3 in the AWS Console.
Select Create bucket.
Within the Advanced settings, select Add notification within the Events section.
Configure all object creation events to be sent to the SQS Queue you configured earlier.

Enable CORS access. Your CORS configuration should look like the following:

<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<CORSRule>
    <AllowedOrigin>http://YOUR-W&B-SERVER-IP</AllowedOrigin>
    <AllowedMethod>GET</AllowedMethod>
    <AllowedMethod>PUT</AllowedMethod>
    <AllowedHeader>*</AllowedHeader>
</CORSRule>
</CORSConfiguration>

Create an SQS Queue

Follow the procedure below to create an SQS Queue:

Navigate to Amazon SQS in the AWS Console.
Select Create queue.
From the Details section, select a Standard queue type.
Within the Access policy section, grant permission for the following actions:

SendMessage
ReceiveMessage
ChangeMessageVisibility
DeleteMessage
GetQueueUrl

Optionally add an advanced access policy in the Access Policy section. For example, the policy for accessing Amazon SQS with a statement is as follows:

{
    "Version" : "2012-10-17",
    "Statement" : [
      {
        "Effect" : "Allow",
        "Principal" : "*",
        "Action" : ["sqs:SendMessage"],
        "Resource" : "<sqs-queue-arn>",
        "Condition" : {
          "ArnEquals" : { "aws:SourceArn" : "<s3-bucket-arn>" }
        }
      }
    ]
}
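If you prefer to generate the access policy rather than edit JSON by hand, the statement above can be produced from the two ARNs. This is a minimal sketch: the function name `s3_to_sqs_policy` and the sample ARNs are hypothetical, and the policy body simply mirrors the example statement.

```python
import json

def s3_to_sqs_policy(queue_arn: str, bucket_arn: str) -> str:
    """Build the policy that lets the given bucket send messages to the queue."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": "*",
                "Action": ["sqs:SendMessage"],
                "Resource": queue_arn,
                # Only accept messages that originate from this bucket.
                "Condition": {"ArnEquals": {"aws:SourceArn": bucket_arn}},
            }
        ],
    }
    return json.dumps(policy, indent=2)

# Hypothetical ARNs for illustration only.
policy_json = s3_to_sqs_policy(
    "arn:aws:sqs:us-east-1:123456789012:wandb-queue",
    "arn:aws:s3:::wandb-bucket",
)
print(policy_json)
```

Paste the printed JSON into the Access Policy editor, substituting your own queue and bucket ARNs.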

Grant permissions to node that runs W&B

The node where W&B server is running must be configured to permit access to Amazon S3 and Amazon SQS. Depending on the type of server deployment you have opted for, you may need to add the following policy statements to your node role:

{
   "Statement":[
      {
         "Sid":"",
         "Effect":"Allow",
         "Action":"s3:*",
         "Resource":"arn:aws:s3:::<WANDB_BUCKET>"
      },
      {
         "Sid":"",
         "Effect":"Allow",
         "Action":[
            "sqs:*"
         ],
         "Resource":"arn:aws:sqs:<REGION>:<ACCOUNT>:<WANDB_QUEUE>"
      }
   ]
}

Configure W&B server

Finally, configure your W&B Server.

Navigate to the W&B settings page at http(s)://YOUR-W&B-SERVER-HOST/system-admin.
Enable the Use an external file storage backend option.
Provide information about your Amazon S3 bucket, region, and Amazon SQS queue in the following format:

File Storage Bucket: s3://<bucket-name>
File Storage Region (AWS only): <region>
Notification Subscription: sqs://<queue-name>

Select Update settings to apply the new settings.

Upgrade your W&B version

Follow the steps outlined here to update W&B:

Add wandb_version to your configuration in your wandb_app module. Provide the version of W&B you want to upgrade to. For example, the following line specifies W&B version 0.48.1:

module "wandb_app" {
    source  = "wandb/wandb/kubernetes"
    version = "~>1.0"

    license       = var.license
    wandb_version = "0.48.1"

Alternatively, you can add wandb_version to terraform.tfvars, create a variable with the same name, and use var.wandb_version instead of the literal value.

After you update your configuration, complete the steps described in the Recommended deployment section.

Migrate to operator-based AWS Terraform modules

This section details the steps required to upgrade from pre-operator to post-operator environments using the terraform-aws-wandb module.

The transition to a Kubernetes operator pattern is necessary for the W&B architecture. See the architecture shift explanation for a detailed explanation.

Before and after architecture

Previously, the W&B architecture used:

module "wandb_infra" {
  source  = "wandb/wandb/aws"
  version = "1.16.10"
  ...
}

to control the infrastructure, and this module to deploy the W&B Server:

module "wandb_app" {
  source  = "wandb/wandb/kubernetes"
  version = "1.12.0"
}

Post-transition, the architecture uses:

module "wandb_infra" {
  source  = "wandb/wandb/aws"
  version = "4.7.2"
  ...
}

to manage both the installation of infrastructure and the W&B Server to the Kubernetes cluster, thus eliminating the need for the module "wandb_app" in post-operator.tf.

This architectural shift enables additional features (like OpenTelemetry, Prometheus, HPAs, Kafka, and image updates) without requiring manual Terraform operations by SRE/Infrastructure teams.

To start with a base installation of the W&B Pre-Operator, ensure that post-operator.tf has a .disabled file extension and that pre-operator.tf is active (that is, it does not have a .disabled extension). Those files can be found here.
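Switching between the two configurations comes down to renaming files. The sketch below shows one way to do that; it assumes the disabled form is named by appending `.disabled` (for example `post-operator.tf.disabled`), and the helper name `activate` is hypothetical. It runs against a temporary directory with empty placeholder files so it is safe to try.

```python
import tempfile
from pathlib import Path

DISABLED = ".disabled"  # assumed suffix, e.g. post-operator.tf.disabled

def activate(directory: Path, active: str, inactive: str) -> list[str]:
    """Enable <active>.tf, disable <inactive>.tf, and return the file names."""
    parked = directory / f"{active}.tf{DISABLED}"
    if parked.exists():
        parked.rename(directory / f"{active}.tf")
    live = directory / f"{inactive}.tf"
    if live.exists():
        live.rename(directory / f"{inactive}.tf{DISABLED}")
    return sorted(path.name for path in directory.iterdir())

# Demo in a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    (workdir / "pre-operator.tf.disabled").touch()
    (workdir / "post-operator.tf").touch()
    files = activate(workdir, active="pre-operator", inactive="post-operator")
    print(files)
```

Swapping the two arguments reverses the change when you move on to the Post-Operator setup.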

Prerequisites

Before initiating the migration process, ensure the following prerequisites are met:

Egress: The deployment can’t be airgapped. It needs access to deploy.wandb.ai to get the latest spec for the Release Channel.
AWS Credentials: Proper AWS credentials configured to interact with your AWS resources.
Terraform Installed: The latest version of Terraform should be installed on your system.
Route53 Hosted Zone: An existing Route53 hosted zone corresponding to the domain under which the application will be served.
Pre-Operator Terraform Files: Ensure pre-operator.tf and associated variable files like pre-operator.tfvars are correctly set up.

Pre-Operator set up

Execute the following Terraform commands to initialize and apply the configuration for the Pre-Operator setup:

terraform init -upgrade
terraform apply -var-file=./pre-operator.tfvars

pre-operator.tfvars should look something like this:

namespace     = "operator-upgrade"
domain_name   = "sandbox-aws.wandb.ml"
zone_id       = "Z032246913CW32RVRY0WU"
subdomain     = "operator-upgrade"
wandb_license = "ey..."
wandb_version = "0.51.2"

The pre-operator.tf configuration calls two modules:

module "wandb_infra" {
  source  = "wandb/wandb/aws"
  version = "1.16.10"
  ...
}

This module spins up the infrastructure.

module "wandb_app" {
  source  = "wandb/wandb/kubernetes"
  version = "1.12.0"
}

This module deploys the application.

Post-Operator Setup

Make sure that pre-operator.tf has a .disabled extension, and post-operator.tf is active.

The post-operator.tfvars includes additional variables:

...
# wandb_version = "0.51.2" is now managed via the Release Channel or set in the User Spec.

# Required Operator Variables for Upgrade:
size                 = "small"
enable_dummy_dns     = true
enable_operator_alb  = true
custom_domain_filter = "sandbox-aws.wandb.ml"

Run the following commands to initialize and apply the Post-Operator configuration:

terraform init -upgrade
terraform apply -var-file=./post-operator.tfvars

The plan and apply steps will update the following resources:

actions:
  create:
    - aws_efs_backup_policy.storage_class
    - aws_efs_file_system.storage_class
    - aws_efs_mount_target.storage_class["0"]
    - aws_efs_mount_target.storage_class["1"]
    - aws_eks_addon.efs
    - aws_iam_openid_connect_provider.eks
    - aws_iam_policy.secrets_manager
    - aws_iam_role_policy_attachment.ebs_csi
    - aws_iam_role_policy_attachment.eks_efs
    - aws_iam_role_policy_attachment.node_secrets_manager
    - aws_security_group.storage_class_nfs
    - aws_security_group_rule.nfs_ingress
    - random_pet.efs
    - aws_s3_bucket_acl.file_storage
    - aws_s3_bucket_cors_configuration.file_storage
    - aws_s3_bucket_ownership_controls.file_storage
    - aws_s3_bucket_server_side_encryption_configuration.file_storage
    - helm_release.operator
    - helm_release.wandb
    - aws_cloudwatch_log_group.this[0]
    - aws_iam_policy.default
    - aws_iam_role.default
    - aws_iam_role_policy_attachment.default
    - helm_release.external_dns
    - aws_default_network_acl.this[0]
    - aws_default_route_table.default[0]
    - aws_iam_policy.default
    - aws_iam_role.default
    - aws_iam_role_policy_attachment.default
    - helm_release.aws_load_balancer_controller

  update_in_place:
    - aws_iam_policy.node_IMDSv2
    - aws_iam_policy.node_cloudwatch
    - aws_iam_policy.node_kms
    - aws_iam_policy.node_s3
    - aws_iam_policy.node_sqs
    - aws_eks_cluster.this[0]
    - aws_elasticache_replication_group.default
    - aws_rds_cluster.this[0]
    - aws_rds_cluster_instance.this["1"]
    - aws_default_security_group.this[0]
    - aws_subnet.private[0]
    - aws_subnet.private[1]
    - aws_subnet.public[0]
    - aws_subnet.public[1]
    - aws_launch_template.workers["primary"]

  destroy:
    - kubernetes_config_map.config_map
    - kubernetes_deployment.wandb
    - kubernetes_priority_class.priority
    - kubernetes_secret.secret
    - kubernetes_service.prometheus
    - kubernetes_service.service
    - random_id.snapshot_identifier[0]

  replace:
    - aws_autoscaling_attachment.autoscaling_attachment["primary"]
    - aws_route53_record.alb
    - aws_eks_node_group.workers["primary"]
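To produce a grouped summary like the one above from your own plan, you can save the plan with `terraform plan -out=tfplan`, export it with `terraform show -json tfplan`, and group the `resource_changes` entries by their planned actions. The following is a sketch only: the function name `summarize_plan` and the group labels are chosen to match the listing above, and the sample input is hand-written rather than real plan output.

```python
import json
from collections import defaultdict

def summarize_plan(plan_json: str) -> dict[str, list[str]]:
    """Group resource addresses by the action Terraform plans to take."""
    groups: dict[str, list[str]] = defaultdict(list)
    for change in json.loads(plan_json).get("resource_changes", []):
        actions = change["change"]["actions"]
        if actions in (["no-op"], ["read"]):
            continue  # nothing is modified for these entries
        if set(actions) == {"create", "delete"}:
            label = "replace"
        elif actions == ["delete"]:
            label = "destroy"
        elif actions == ["update"]:
            label = "update_in_place"
        else:
            label = "/".join(actions)  # e.g. "create"
        groups[label].append(change["address"])
    return dict(groups)

# Hand-written sample shaped like `terraform show -json` output (illustrative).
sample = json.dumps({
    "resource_changes": [
        {"address": "helm_release.operator", "change": {"actions": ["create"]}},
        {"address": "aws_iam_policy.node_s3", "change": {"actions": ["update"]}},
        {"address": "kubernetes_deployment.wandb", "change": {"actions": ["delete"]}},
        {"address": "aws_route53_record.alb", "change": {"actions": ["delete", "create"]}},
    ]
})

summary = summarize_plan(sample)
for label, addresses in summary.items():
    print(label, addresses)
```

Reviewing the destroy and replace groups before you apply is a quick way to confirm that only the expected pre-operator resources are removed.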

Note that in post-operator.tf, there is a single:

module "wandb_infra" {
  source  = "wandb/wandb/aws"
  version = "4.7.2"
  ...
}

Changes in the post-operator configuration:

Update Required Providers: Change required_providers.aws.version from 3.6 to 4.0 for provider compatibility.
DNS and Load Balancer Configuration: Integrate enable_dummy_dns and enable_operator_alb to manage DNS records and AWS Load Balancer setup through an Ingress.
License and Size Configuration: Transfer the license and size parameters directly to the wandb_infra module to match new operational requirements.
Custom Domain Handling: If necessary, use custom_domain_filter to troubleshoot DNS issues by checking the External DNS pod logs within the kube-system namespace.
Helm Provider Configuration: Enable and configure the Helm provider to manage Kubernetes resources effectively:

provider "helm" {
  kubernetes {
    host                   = data.aws_eks_cluster.app_cluster.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.app_cluster.certificate_authority[0].data)
    token                  = data.aws_eks_cluster_auth.app_cluster.token
    exec {
      api_version = "client.authentication.k8s.io/v1beta1"
      args        = ["eks", "get-token", "--cluster-name", data.aws_eks_cluster.app_cluster.name]
      command     = "aws"
    }
  }
}

This comprehensive setup ensures a smooth transition from the Pre-Operator to the Post-Operator configuration, leveraging new efficiencies and capabilities enabled by the operator model.

5.1.3.3.2 - Deploy W&B Platform on GCP

Hosting W&B Server on GCP.

If you’ve decided to self-manage W&B Server, W&B recommends using the W&B Server GCP Terraform Module to deploy the platform on GCP.

The module documentation is extensive and contains all available options that can be used.

Before you start, W&B recommends that you choose one of the remote backends available for Terraform to store the State File.

The State File is the necessary resource to roll out upgrades or make changes in your deployment without recreating all components.

The Terraform Module will deploy the following mandatory components:

VPC
Cloud SQL for MySQL
Cloud Storage Bucket
Google Kubernetes Engine
KMS Crypto Key
Load Balancer

Other deployment options can also include the following optional components:

Memorystore for Redis
Pub/Sub messaging system

Pre-requisite permissions

The account that runs Terraform needs the roles/owner role in the GCP project used.

General steps

The steps in this topic are common to every deployment option covered by this documentation.

Prepare the development environment.
- Install Terraform
- We recommend creating a Git repository with the code that will be used, but you can keep your files locally.
- Create a project in Google Cloud Console
- Authenticate with GCP (make sure to install gcloud first): gcloud auth application-default login
Create the terraform.tfvars file.

The content of the tfvars file can be customized according to the installation type, but the minimum recommended configuration looks like the following example.
```
project_id  = "wandb-project"
region      = "europe-west2"
zone        = "europe-west2-a"
namespace   = "wandb"
license     = "xxxxxxxxxxyyyyyyyyyyyzzzzzzz"
subdomain   = "wandb-gcp"
domain_name = "wandb.ml"
```
Decide on the values of these variables before you deploy. The namespace variable is a string that prefixes all resources created by Terraform.

The combination of subdomain and domain_name forms the FQDN at which W&B is configured. In the example above, the W&B FQDN is wandb-gcp.wandb.ml.

Create the file variables.tf

For every option configured in terraform.tfvars, Terraform requires a corresponding variable declaration.

variable "project_id" {
  type        = string
  description = "Project ID"
}

variable "region" {
  type        = string
  description = "Google region"
}

variable "zone" {
  type        = string
  description = "Google zone"
}

variable "namespace" {
  type        = string
  description = "Namespace prefix used for resources"
}

variable "domain_name" {
  type        = string
  description = "Domain name for accessing the Weights & Biases UI."
}

variable "subdomain" {
  type        = string
  description = "Subdomain for accessing the Weights & Biases UI."
}

variable "license" {
  type        = string
  description = "W&B License"
}

Deployment - Recommended (~20 mins)

This is the most straightforward deployment option. It creates all mandatory components and installs the latest version of W&B in the Kubernetes cluster.

Create the main.tf

In the same directory where you created the files in the General Steps, create a file main.tf with the following content:

provider "google" {
 project = var.project_id
 region  = var.region
 zone    = var.zone
}

provider "google-beta" {
 project = var.project_id
 region  = var.region
 zone    = var.zone
}

data "google_client_config" "current" {}

provider "kubernetes" {
  host                   = "https://${module.wandb.cluster_endpoint}"
  cluster_ca_certificate = base64decode(module.wandb.cluster_ca_certificate)
  token                  = data.google_client_config.current.access_token
}

# Spin up all required services
module "wandb" {
  source  = "wandb/wandb/google"
  version = "~> 5.0"

  namespace   = var.namespace
  license     = var.license
  domain_name = var.domain_name
  subdomain   = var.subdomain
}

# You'll want to update your DNS with the provisioned IP address
output "url" {
  value = module.wandb.url
}

output "address" {
  value = module.wandb.address
}

output "bucket_name" {
  value = module.wandb.bucket_name
}

Deploy W&B

To deploy W&B, execute the following commands:

terraform init
terraform apply -var-file=terraform.tfvars

Deployment with Redis Cache

Another deployment option uses Redis to cache the SQL queries and speed up the application response when loading the metrics for the experiments.

You need to add the option create_redis = true to the same main.tf file specified in the recommended Deployment option section to enable the cache.

[...]

module "wandb" {
  source  = "wandb/wandb/google"
  version = "~> 1.0"

  namespace    = var.namespace
  license      = var.license
  domain_name  = var.domain_name
  subdomain    = var.subdomain
  allowed_inbound_cidrs = ["*"]
  #Enable Redis
  create_redis = true

}
[...]

Deployment with External Queue

Deployment option 3 enables the external message broker. This is optional because W&B ships with an embedded broker. This option doesn’t improve performance.

The GCP resource that provides the message broker is Pub/Sub. To enable it, add the option use_internal_queue = false to the same main.tf specified in the recommended Deployment option section.

[...]

module "wandb" {
  source  = "wandb/wandb/google"
  version = "~> 1.0"

  namespace          = var.namespace
  license            = var.license
  domain_name        = var.domain_name
  subdomain          = var.subdomain
  allowed_inbound_cidrs = ["*"]
  #Create and use Pub/Sub
  use_internal_queue = false

}

[...]

Other deployment options

Manual configuration

To use a GCP Storage bucket as a file storage backend for W&B, you will need to create a:

PubSub Topic and Subscription
Storage Bucket
PubSub Notification

Create PubSub Topic and Subscription

Follow the procedure below to create a PubSub topic and subscription:

Navigate to the Pub/Sub service within the GCP Console
Select Create Topic and provide a name for your topic.
At the bottom of the page, select Create subscription. Ensure Delivery Type is set to Pull.
Click Create.

Make sure the service account or account that your instance runs as has the pubsub.admin role on this subscription. For details, see https://cloud.google.com/pubsub/docs/access-control#console.

Create Storage Bucket

Navigate to the Cloud Storage Buckets page.
Select Create bucket and provide a name for your bucket. Ensure you choose a Standard storage class.

Ensure that the service account or account that your instance runs as has both:

access to the bucket you created in the previous step
storage.objectAdmin role on this bucket. For details, see https://cloud.google.com/storage/docs/access-control/using-iam-permissions#bucket-add

Your instance also needs the iam.serviceAccounts.signBlob permission in GCP to create signed file URLs. To grant this permission, add the Service Account Token Creator role to the service account or IAM member that your instance runs as.

Enable CORS access. This can only be done using the command line. First, create a JSON file with the following CORS configuration.

cors:
- maxAgeSeconds: 3600
  method:
   - GET
   - PUT
  origin:
   - '<YOUR_W&B_SERVER_HOST>'
  responseHeader:
   - Content-Type

Note that the scheme, host, and port of the values for the origin must match exactly.
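Because the origin must match on scheme, host, and port, it can help to derive it from the URL you actually use to reach W&B. The sketch below does this with Python's standard `urllib.parse`. The helper name `cors_origin` and the example host are hypothetical, and IPv6 literal hosts are not handled.

```python
from urllib.parse import urlsplit

def cors_origin(url: str) -> str:
    """Reduce a full URL to its origin: scheme://host[:port]."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.hostname:
        raise ValueError(f"not an absolute URL: {url!r}")
    origin = f"{parts.scheme}://{parts.hostname}"
    if parts.port is not None:
        origin += f":{parts.port}"  # keep an explicit port so it matches exactly
    return origin

# Hypothetical server URLs for illustration.
print(cors_origin("https://wandb.example.com:8080/console/settings/system"))
print(cors_origin("http://wandb.example.com/"))
```

Use the printed value, without a trailing slash or path, as the origin entry in the CORS configuration.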

Make sure you have gcloud installed and are logged in to the correct GCP project.
Next, run the following:

gcloud storage buckets update gs://<BUCKET_NAME> --cors-file=<CORS_CONFIG_FILE>

Create PubSub Notification

Follow the procedure below in your command line to create a notification stream from the Storage Bucket to the Pub/Sub topic.

You must use the CLI to create a notification stream. Ensure you have gcloud installed.

Log into your GCP Project.
Run the following in your terminal:

gcloud pubsub topics list  # list names of topics for reference
gcloud storage ls          # list names of buckets for reference

# create bucket notification
gcloud storage buckets notifications create gs://<BUCKET_NAME> --topic=<TOPIC_NAME>

Further reference is available on the Cloud Storage website.

Configure W&B server

Finally, navigate to the W&B System Connections page at http(s)://YOUR-W&B-SERVER-HOST/console/settings/system.
Select the provider Google Cloud Storage (gcs).
Provide the name of the GCS bucket.

Press Update settings to apply the new settings.

Upgrade W&B Server

Follow the steps outlined here to update W&B:

Add wandb_version to your configuration in your wandb_app module. Provide the version of W&B you want to upgrade to. For example, the following line specifies W&B version 0.58.1:

module "wandb_app" {
    source  = "wandb/wandb/kubernetes"
    version = "~>5.0"

    license       = var.license
    wandb_version = "0.58.1"

Alternatively, you can add wandb_version to terraform.tfvars, create a variable with the same name, and use var.wandb_version instead of the literal value.

After you update your configuration, complete the steps described in the Deployment option section.

5.1.3.3.3 - Deploy W&B Platform on Azure

Hosting W&B Server on Azure.

If you’ve decided to self-manage W&B Server, W&B recommends using the W&B Server Azure Terraform Module to deploy the platform on Azure.

The module documentation is extensive and contains all available options that can be used. We will cover some deployment options in this document.

Before you start, we recommend you choose one of the remote backends available for Terraform to store the State File.

The State File is the necessary resource to roll out upgrades or make changes in your deployment without recreating all components.

The Terraform Module will deploy the following mandatory components:

Azure Resource Group
Azure Virtual Network (VPC)
Azure MySQL Flexible Server
Azure Storage Account & Blob Storage
Azure Kubernetes Service
Azure Application Gateway

Other deployment options can also include the following optional components:

Azure Cache for Redis
Azure Event Grid

Pre-requisite permissions

The simplest way to configure the AzureRM provider is through the Azure CLI, but for automation, an Azure Service Principal can also be useful. Regardless of the authentication method used, the account that runs Terraform must be able to create all components described in the Introduction.

General steps

The steps in this topic are common to every deployment option covered by this documentation.

Prepare the development environment.

Install Terraform
We recommend creating a Git repository with the code that will be used, but you can keep your files locally.

Create the terraform.tfvars file.

The content of the tfvars file can be customized according to the installation type, but the minimum recommended configuration looks like the following example.
```
 namespace     = "wandb"
 license       = "xxxxxxxxxxyyyyyyyyyyyzzzzzzz"
 subdomain     = "wandb-aws"
 domain_name   = "wandb.ml"
 location      = "westeurope"
```
Decide on the values of these variables before you deploy. The namespace variable is a string that prefixes all resources created by Terraform.

The combination of subdomain and domain_name forms the FQDN at which W&B is configured. In the example above, the W&B FQDN is wandb-aws.wandb.ml.

Create the file versions.tf.

This file contains the Terraform and Terraform provider versions required to deploy W&B on Azure.

terraform {
  required_version = "~> 1.3"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.17"
    }
  }
}

Refer to the Terraform Official Documentation to configure the Azure provider.

Optionally, but highly recommended, you can add the remote backend configuration mentioned at the beginning of this documentation.

Create the file variables.tf.

For every option configured in terraform.tfvars, Terraform requires a corresponding variable declaration.

  variable "namespace" {
    type        = string
    description = "String used for prefix resources."
  }

  variable "location" {
    type        = string
    description = "Azure Resource Group location"
  }

  variable "domain_name" {
    type        = string
    description = "Domain for accessing the Weights & Biases UI."
  }

  variable "subdomain" {
    type        = string
    default     = null
    description = "Subdomain for accessing the Weights & Biases UI. Default creates record at Route53 Route."
  }

  variable "license" {
    type        = string
    description = "Your wandb/local license"
  }
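
Because every name assigned in terraform.tfvars needs a matching declaration, a quick consistency check before running Terraform can catch mismatched names. The following is a minimal sketch, assuming flat `name = value` assignments (no nested maps or heredocs); the helper names are hypothetical and the regular expressions are not a full HCL parser.

```python
import re


def declared_variables(variables_tf: str) -> set:
    """Return the names declared with `variable "<name>"` blocks."""
    return set(re.findall(r'^\s*variable\s+"([^"]+)"', variables_tf, flags=re.MULTILINE))


def assigned_variables(tfvars: str) -> set:
    """Return the names assigned at the start of a line in a .tfvars file."""
    return set(re.findall(r'^\s*([A-Za-z_][\w-]*)\s*=', tfvars, flags=re.MULTILINE))


def undeclared(tfvars: str, variables_tf: str) -> set:
    """Return names that are assigned in tfvars but never declared."""
    return assigned_variables(tfvars) - declared_variables(variables_tf)
```

To use it, read the two files (for example with `pathlib.Path("terraform.tfvars").read_text()`) and pass their contents to `undeclared`. An empty result means every assignment has a declaration.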

Recommended deployment

This is the most straightforward deployment option. It creates all mandatory components and installs the latest version of W&B in the Kubernetes cluster.

Create the main.tf file. In the same directory where you created the files in the General steps, create a file main.tf with the following content:

provider "azurerm" {
  features {}
}

provider "kubernetes" {
  host                   = module.wandb.cluster_host
  cluster_ca_certificate = base64decode(module.wandb.cluster_ca_certificate)
  client_key             = base64decode(module.wandb.cluster_client_key)
  client_certificate     = base64decode(module.wandb.cluster_client_certificate)
}

provider "helm" {
  kubernetes {
    host                   = module.wandb.cluster_host
    cluster_ca_certificate = base64decode(module.wandb.cluster_ca_certificate)
    client_key             = base64decode(module.wandb.cluster_client_key)
    client_certificate     = base64decode(module.wandb.cluster_client_certificate)
  }
}

# Spin up all required services
module "wandb" {
  source  = "wandb/wandb/azurerm"
  version = "~> 1.2"

  namespace   = var.namespace
  location    = var.location
  license     = var.license
  domain_name = var.domain_name
  subdomain   = var.subdomain

  deletion_protection = false

  tags = {
    "Example" : "PublicDns"
  }
}

output "address" {
  value = module.wandb.address
}

output "url" {
  value = module.wandb.url
}

Deploy W&B. To deploy W&B, execute the following commands:

terraform init
terraform apply -var-file=terraform.tfvars

Deployment with REDIS Cache

Another deployment option uses Redis to cache the SQL queries and speed up the application response when loading the metrics for the experiments.

You must add the option create_redis = true to the same main.tf file that you used in recommended deployment to enable the cache.

# Spin up all required services
module "wandb" {
  source  = "wandb/wandb/azurerm"
  version = "~> 1.2"


  namespace   = var.namespace
  location    = var.location
  license     = var.license
  domain_name = var.domain_name
  subdomain   = var.subdomain

  create_redis       = true # Create Redis
  [...]
}

Deployment with External Queue

The third deployment option enables the external message broker. This is optional because W&B includes an embedded broker. This option does not improve performance.

The Azure resource that provides the message broker is Azure Event Grid. To enable it, add the option use_internal_queue = false to the same main.tf that you used in the recommended deployment:

# Spin up all required services
module "wandb" {
  source  = "wandb/wandb/azurerm"
  version = "~> 1.2"


  namespace   = var.namespace
  location    = var.location
  license     = var.license
  domain_name = var.domain_name
  subdomain   = var.subdomain

  use_internal_queue       = false # Enable Azure Event Grid
  [...]
}

Other deployment options

You can combine all three deployment options by adding all configurations to the same file. The Terraform module provides several options that you can combine with the standard options and the minimal configuration found in the recommended deployment.

5.1.3.4 - Deploy W&B Platform On-premises

Hosting W&B Server on on-premises infrastructure

Reach out to the W&B Sales Team with related questions: contact@wandb.com.

Infrastructure guidelines

Before you start deploying W&B, refer to the reference architecture, especially the infrastructure requirements.

MySQL database

W&B does not recommend using MySQL 5.7. If you are using MySQL 5.7, migrate to MySQL 8 for best compatibility with latest versions of W&B Server. The W&B Server currently only supports MySQL 8 versions 8.0.28 and above.

There are a number of enterprise services that make operating a scalable MySQL database simpler. W&B recommends looking into one of the following solutions:

Percona Server for MySQL

MySQL Operator for Kubernetes

Satisfy the conditions below if you run W&B Server with MySQL 8.0 or when you upgrade from MySQL 5.7 to 8.0:

binlog_format = 'ROW'
innodb_online_alter_log_max_size = 268435456
sync_binlog = 1
innodb_flush_log_at_trx_commit = 1
binlog_row_image = 'MINIMAL'

Due to some changes in the way that MySQL 8.0 handles sort_buffer_size, you might need to update the sort_buffer_size parameter from its default value of 262144. The recommendation is to set the value to 67108864 (64MiB) to ensure that MySQL works efficiently with W&B. MySQL supports this configuration starting with v8.0.28.

Database considerations

Create a database and a user with the following SQL statements. Replace SOME_PASSWORD with a password of your choice:

CREATE USER 'wandb_local'@'%' IDENTIFIED BY 'SOME_PASSWORD';
CREATE DATABASE wandb_local CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
GRANT ALL ON wandb_local.* TO 'wandb_local'@'%' WITH GRANT OPTION;

This works only if the SSL certificate is trusted. W&B does not support self-signed certificates.

Parameter group configuration

Ensure that the following parameter groups are set to tune the database performance:

binlog_format = 'ROW'
innodb_online_alter_log_max_size = 268435456
sync_binlog = 1
innodb_flush_log_at_trx_commit = 1
binlog_row_image = 'MINIMAL'
sort_buffer_size = 67108864
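
To confirm a database matches the settings above, you can compare them against the values the server reports (for example, the output of `SHOW VARIABLES`). The following is a minimal sketch, assuming you have already collected the current values into a dictionary of strings; the helper name `find_mismatches` is hypothetical and no database connection is made here.

```python
from __future__ import annotations

# Recommended values from the list above, as strings because MySQL
# reports variable values as text.
RECOMMENDED = {
    "binlog_format": "ROW",
    "innodb_online_alter_log_max_size": "268435456",  # 256 MiB
    "sync_binlog": "1",
    "innodb_flush_log_at_trx_commit": "1",
    "binlog_row_image": "MINIMAL",
    "sort_buffer_size": "67108864",  # 64 MiB = 64 * 1024 * 1024
}


def find_mismatches(current: dict[str, str]) -> dict[str, tuple[str | None, str]]:
    """Map each setting that differs to an (actual, expected) pair."""
    mismatches = {}
    for name, expected in RECOMMENDED.items():
        actual = current.get(name)
        # Compare case-insensitively so 'row' and 'ROW' are treated alike.
        if actual is None or actual.upper() != expected:
            mismatches[name] = (actual, expected)
    return mismatches
```

An empty result means all six settings match the recommendation.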

Object storage

The object store can be externally hosted on a MinIO cluster or on any Amazon S3 compatible object store that supports signed URLs. Run the following script to check if your object store supports signed URLs.

Additionally, the following CORS policy needs to be applied to the object store.

<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<CORSRule>
    <AllowedOrigin>http://YOUR-W&B-SERVER-IP</AllowedOrigin>
    <AllowedMethod>GET</AllowedMethod>
    <AllowedMethod>PUT</AllowedMethod>
    <AllowedMethod>HEAD</AllowedMethod>
    <AllowedHeader>*</AllowedHeader>
</CORSRule>
</CORSConfiguration>

You can specify your credentials in a connection string when you connect to an Amazon S3 compatible object store. For example, you can specify the following:

s3://$ACCESS_KEY:$SECRET_KEY@$HOST/$BUCKET_NAME

You can optionally tell W&B to connect only over TLS if you configure a trusted SSL certificate for your object store. To do so, add the tls query parameter to the URL. The following example demonstrates how to add the tls query parameter to an Amazon S3 URI:

s3://$ACCESS_KEY:$SECRET_KEY@$HOST/$BUCKET_NAME?tls=true
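
If you generate the connection string from a script, a small helper keeps the format consistent. This is a sketch only: the function name is hypothetical, and percent-encoding the credentials is an assumption made here so that characters such as `/` or `+` do not break the URL structure. Confirm the expected encoding with your W&B Server version before relying on it.

```python
from urllib.parse import quote


def build_connection_string(access_key, secret_key, host, bucket, tls=False):
    """Build an s3://ACCESS_KEY:SECRET_KEY@HOST/BUCKET connection string."""
    # Assumption: credentials are percent-encoded so reserved characters
    # cannot be mistaken for URL delimiters.
    credentials = f"{quote(access_key, safe='')}:{quote(secret_key, safe='')}"
    url = f"s3://{credentials}@{host}/{bucket}"
    # Append the tls query parameter only when TLS-only access is requested.
    return f"{url}?tls=true" if tls else url
```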

This works only if the SSL certificate is trusted. W&B does not support self-signed certificates.

Set BUCKET_QUEUE to internal:// if you use third-party object stores. This tells the W&B server to manage all object notifications internally instead of depending on an external SQS queue or equivalent.

The most important things to consider when running your own object store are:

Storage capacity and performance. It’s fine to use magnetic disks, but you should monitor the capacity of these disks. Average W&B usage results in tens to hundreds of gigabytes. Heavy usage could result in petabytes of storage consumption.
Fault tolerance. At a minimum, the physical disk storing the objects should be on a RAID array. If you use MinIO, consider running it in distributed mode.
Availability. Monitoring should be configured to ensure the storage is available.

There are many enterprise alternatives to running your own object storage service such as:

MinIO set up

If you use MinIO, you can run the following commands to create a bucket.

mc config host add local http://$MINIO_HOST:$MINIO_PORT "$MINIO_ACCESS_KEY" "$MINIO_SECRET_KEY" --api s3v4
mc mb --region=us-east1 local/local-files

Deploy W&B Server application to Kubernetes

The recommended installation method is with the official W&B Helm chart. Follow the Helm CLI deployment section to deploy the W&B Server application.

OpenShift

W&B supports operating from within an OpenShift Kubernetes cluster.

W&B recommends you install with the official W&B Helm chart.

Run the container as an un-privileged user

By default, containers use a $UID of 999. Specify $UID >= 100000 and a $GID of 0 if your orchestrator requires the container run with a non-root user.

W&B must start as the root group ($GID=0) for file system permissions to function properly.

An example security context for Kubernetes looks similar to the following:

spec:
  securityContext:
    runAsUser: 100000
    runAsGroup: 0

Networking

Load balancer

Run a load balancer that terminates network requests at the appropriate network boundary.

Common load balancers include:

Ensure that all machines used to execute machine learning payloads, and the devices used to access the service through web browsers, can communicate to this endpoint.

SSL / TLS

W&B Server does not terminate SSL. If your security policies require SSL communication within your trusted networks, consider using a tool like Istio and sidecar containers. The load balancer itself should terminate SSL with a valid certificate. Self-signed certificates are not supported and cause a number of challenges for users. If possible, use a service like Let’s Encrypt to provide trusted certificates to your load balancer. Services like Caddy and Cloudflare manage SSL for you.

Example nginx configuration

The following is an example configuration using nginx as a reverse proxy.

events {}
http {
    # If we receive X-Forwarded-Proto, pass it through; otherwise, pass along the
    # scheme used to connect to this server
    map $http_x_forwarded_proto $proxy_x_forwarded_proto {
        default $http_x_forwarded_proto;
        ''      $scheme;
    }

    # Also, in the above case, force HTTPS
    map $http_x_forwarded_proto $sts {
        default '';
        "https" "max-age=31536000; includeSubDomains";
    }

    # If we receive X-Forwarded-Host, pass it through; otherwise, pass along $http_host
    map $http_x_forwarded_host $proxy_x_forwarded_host {
        default $http_x_forwarded_host;
        ''      $http_host;
    }

    # If we receive X-Forwarded-Port, pass it through; otherwise, pass along the
    # server port the client connected to
    map $http_x_forwarded_port $proxy_x_forwarded_port {
        default $http_x_forwarded_port;
        ''      $server_port;
    }

    # If we receive Upgrade, set Connection to "upgrade"; otherwise, delete any
    # Connection header that may have been passed to this server
    map $http_upgrade $proxy_connection {
        default upgrade;
        '' close;
    }

    server {
        listen 443 ssl;
        server_name         www.example.com;
        ssl_certificate     www.example.com.crt;
        ssl_certificate_key www.example.com.key;

        proxy_http_version 1.1;
        proxy_buffering off;
        proxy_set_header Host $http_host;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $proxy_connection;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $proxy_x_forwarded_proto;
        proxy_set_header X-Forwarded-Host $proxy_x_forwarded_host;

        location / {
            proxy_pass  http://$YOUR_UPSTREAM_SERVER_IP:8080/;
        }

        keepalive_timeout 10;
    }
}

Verify your installation

Verify that your W&B Server is configured properly. Run the following commands in your terminal:

pip install wandb
wandb login --host=https://YOUR_DNS_DOMAIN
wandb verify

Check log files to view any errors the W&B Server hits at startup. Run the following commands:

docker logs wandb-local

kubectl get pods
kubectl logs wandb-XXXXX-XXXXX

Contact W&B Support if you encounter errors.

5.1.3.5 - Update W&B license and version

Guide for updating W&B version and license across different installation methods.

Update your W&B Server Version and License with the same method you installed W&B Server with. The following table lists how to update your license and version based on different deployment methods:

Release Type	Description
Terraform	W&B supports three public Terraform modules for cloud deployment: AWS, GCP, and Azure.
Helm	You can use the Helm Chart to install W&B into an existing Kubernetes cluster.

Update with Terraform

Update your license and version with Terraform. The following table lists the W&B-managed Terraform modules by cloud platform.

Cloud provider	Terraform module
AWS	AWS Terraform module
GCP	GCP Terraform module
Azure	Azure Terraform module

First, navigate to the W&B maintained Terraform module for your appropriate cloud provider. See the preceding table to find the appropriate Terraform module based on your cloud provider.

Within your Terraform configuration, update wandb_version and license in your Terraform wandb_app module configuration:

module "wandb_app" {
    source  = "wandb/wandb/<cloud-specific-module>"
    version = "new_version"
    license       = "new_license_key" # Your new license key
    wandb_version = "new_wandb_version" # Desired W&B version
    ...
}

Apply the Terraform configuration with terraform plan and terraform apply.
```
terraform init
terraform apply
```
(Optional) If you use a terraform.tfvars or other .tfvars file, update or create it with the new W&B version and license key, then preview the changes:
```
terraform plan -var-file="terraform.tfvars"
```
Apply the configuration. In your Terraform workspace directory execute:
```
terraform apply -var-file="terraform.tfvars"
```

Update with Helm

Update W&B with spec

Specify a new version by modifying the image.tag and/or license values in your Helm chart *.yaml configuration file:
```
license: 'new_license'
image:
  repository: wandb/local
  tag: 'new_version'
```

Execute the Helm upgrade with the following command:

helm repo update
helm upgrade --namespace=wandb --create-namespace \
  --install wandb wandb/wandb --version ${chart_version} \
  -f ${wandb_install_spec.yaml}

Update license and version directly

Set the new license key and image tag as environment variables:
```
export LICENSE='new_license'
export TAG='new_version'
```

Upgrade your Helm release with the command below, merging the new values with the existing configuration:

helm repo update
helm upgrade --namespace=wandb --create-namespace \
  --install wandb wandb/wandb --version ${chart_version} \
  --reuse-values --set license=$LICENSE --set image.tag=$TAG

For more details, see the upgrade guide in the public repository.

Update with admin UI

This method only works for updating licenses that are not set with an environment variable in the W&B server container, typically in self-managed Docker installations.

Obtain a new license from the W&B Deployment Page, ensuring it matches the correct organization and deployment ID for the deployment you are looking to upgrade.
Access the W&B Admin UI at <host-url>/system-settings.
Navigate to the license management section.
Enter the new license key and save your changes.

5.1.3.6 - Disable automatic updates for W&B Server

Learn how to disable automatic updates for W&B Server.

This page shows how to disable automatic version upgrades for W&B Server and pin its version. These instructions work for deployments managed by the W&B Kubernetes Operator only.

W&B supports a major W&B Server release for 12 months from its initial release date. Customers with Self-managed instances are responsible for upgrading in time to maintain support. Avoid staying on an unsupported version. W&B strongly recommends customers with Self-managed instances to update their deployments with the latest release at minimum once per quarter to maintain support and receive the latest features, performance improvements, and fixes.

Requirements

W&B Kubernetes Operator v1.13.0 or newer
System Console v2.12.2 or newer

To verify that you meet these requirements, refer to the W&B Custom Resource or Helm chart for your instance. Check the version values for the operator-wandb and system-console components.

Disable automatic updates

Log in to the W&B App as a user with the admin role.
Click the user icon at the top, then click System Console.
Go to Settings > Advanced, then select the Other tab.
In the Disable Auto Upgrades section, turn on Pin specific version.
Click the Select a version drop-down, select a W&B Server version.
Click Save.

Automatic upgrades are turned off and W&B Server is pinned at the version you selected.
Verify that automatic upgrades are turned off. Go to the Operator tab and search the reconciliation logs for the string Version Pinning is enabled.

│info 2025-04-17T17:24:16Z wandb default No changes found
│info 2025-04-17T17:24:16Z wandb default Active spec found
│info 2025-04-17T17:24:16Z wandb default Desired spec
│info 2025-04-17T17:24:16Z wandb default License
│info 2025-04-17T17:24:16Z wandb default Version Pinning is enabled
│info 2025-04-17T17:24:16Z wandb default Found Weights & Biases instance, processing the spec...
│info 2025-04-17T17:24:16Z wandb default === Reconciling Weights & Biases instance...

5.2 - Identity and access management (IAM)

W&B Platform has three IAM scopes: Organizations, Teams, and Projects.

Organization

An Organization is the root scope in your W&B account or instance. All actions in your account or instance take place within the context of that root scope, including managing users, managing teams, managing projects within teams, tracking usage and more.

If you are using Multi-tenant Cloud, you may have more than one organization where each may correspond to a business unit, a personal user, a joint partnership with another business and more.

If you are using Dedicated Cloud or a Self-managed instance, it corresponds to one organization. Your company may have more than one Dedicated Cloud or Self-managed instance to map to different business units or departments, though that is strictly an optional way to manage AI practitioners across your businesses or departments.

For more information, see Manage organizations.

Team

A Team is a subscope within an organization that may map to a business unit or function, a department, or a project team in your company. You may have more than one team in your organization depending on your deployment type and pricing plan.

AI projects are organized within the context of a team. The access control within a team is governed by team admins, who may or may not be admins at the parent organization level.

For more information, see Add and manage teams.

Project

A Project is a subscope within a team, that maps to an actual AI project with specific intended outcomes. You may have more than one project within a team. Each project has a visibility mode which determines who can access it.

Every project is comprised of Workspaces and Reports, and is linked to relevant Artifacts, Sweeps, and Automations.

5.2.1 - Authentication

5.2.1.1 - Configure SSO with LDAP

Authenticate your credentials with the W&B Server LDAP server. The following guide explains how to configure the settings for W&B Server. It covers mandatory and optional configurations, as well as instructions for configuring the LDAP connection from the System Settings UI. It also describes the inputs of the LDAP configuration, such as the address, base distinguished name, and attributes. You can specify these attributes from the W&B App UI or with environment variables. You can set up either an anonymous bind or a bind with an administrator DN and password.

Only W&B Admin roles can enable and configure LDAP authentication.

Configure LDAP connection

Navigate to the W&B App.
Select your profile icon from the upper right. From the dropdown, select System Settings.
Toggle Configure LDAP Client.
Add the details in the form. Refer to the Configuration parameters section for details on each input.
Click on Update Settings to test your settings. This will establish a test client/connection with the W&B server.
If your connection is verified, toggle the Enable LDAP Authentication and select the Update Settings button.

Set an LDAP connection with the following environment variables:

Environment variable	Required	Example
`LOCAL_LDAP_ADDRESS`	Yes	`ldaps://ldap.example.com:636`
`LOCAL_LDAP_BASE_DN`	Yes	`dc=example,dc=org`
`LOCAL_LDAP_BIND_DN`	No	`cn=admin,dc=example,dc=org`
`LOCAL_LDAP_BIND_PW`	No
`LOCAL_LDAP_ATTRIBUTES`	Yes	`email=mail,group=gidNumber`
`LOCAL_LDAP_TLS_ENABLE`	No
`LOCAL_LDAP_GROUP_ALLOW_LIST`	No
`LOCAL_LDAP_LOGIN`	No

See the Configuration parameters section for definitions of each environment variable. Note that the environment variable prefix LOCAL_LDAP was omitted from the definition names for clarity.

Configuration parameters

The following table lists and describes required and optional LDAP configurations.

Environment variable	Definition	Required
`ADDRESS`	This is the address of your LDAP server within the VPC that hosts W&B Server.	Yes
`BASE_DN`	The root path that searches start from. It is required for any query into this directory.	Yes
`BIND_DN`	Path of the administrative user registered in the LDAP server. This is required if the LDAP server does not support unauthenticated binding. If specified, W&B Server connects to the LDAP server as this user. Otherwise, W&B Server connects using anonymous binding.	No
`BIND_PW`	The password for administrative user, this is used to authenticate the binding. If left blank, W&B Server connects using anonymous binding.	No
`ATTRIBUTES`	The email and group ID attribute names, provided as a comma-separated string.	Yes
`TLS_ENABLE`	Enable TLS.	No
`GROUP_ALLOW_LIST`	Group allowlist.	No
`LOGIN`	This tells W&B Server to use LDAP to authenticate. Set to either `True` or `False`. Optionally set this to false to test the LDAP configuration. Set this to true to start LDAP authentication.	No
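
The `ATTRIBUTES` value is a comma-separated list of `name=ldap_attribute` pairs. As an illustration of that format only (how W&B Server parses the value internally is not described here, and the helper name is hypothetical), the following sketch splits such a string and checks that both the email and group mappings are present:

```python
def parse_ldap_attributes(value: str) -> dict:
    """Split a string such as 'email=mail,group=gidNumber' into a mapping."""
    mapping = {}
    for pair in value.split(","):
        name, separator, ldap_attribute = pair.strip().partition("=")
        if not separator or not name.strip() or not ldap_attribute.strip():
            raise ValueError(f"expected name=attribute, got {pair!r}")
        mapping[name.strip()] = ldap_attribute.strip()

    # The table above asks for both an email and a group ID attribute name.
    missing = {"email", "group"} - mapping.keys()
    if missing:
        raise ValueError(f"missing attribute mappings: {sorted(missing)}")
    return mapping
```

Running a check like this before setting `LOCAL_LDAP_ATTRIBUTES` can catch a stray space or a missing pair early.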

5.2.1.2 - Configure SSO with OIDC

W&B Server’s support for OpenID Connect (OIDC) compatible identity providers allows for management of user identities and group memberships through external identity providers like Okta, Keycloak, Auth0, Google, and Entra.

OpenID Connect (OIDC)

W&B Server supports the following OIDC authentication flows for integrating with external Identity Providers (IdPs).

Implicit Flow with Form Post
Authorization Code Flow with Proof Key for Code Exchange (PKCE)

These flows authenticate users and provide W&B Server with the necessary identity information (in the form of ID tokens) to manage access control.

The ID token is a JWT that contains the user’s identity information, such as their name, username, email, and group memberships. W&B Server uses this token to authenticate the user and map them to appropriate roles or groups in the system.

In the context of W&B Server, access tokens authorize requests to APIs on behalf of the user, but since W&B Server’s primary concern is user authentication and identity, it only requires the ID token.

You can use environment variables to configure IAM options for your Dedicated cloud or Self-managed instance.

To configure Identity Providers for Dedicated cloud or Self-managed W&B Server installations, follow these guidelines for various IdPs. If you’re using the SaaS version of W&B, reach out to support@wandb.com for assistance in configuring an Auth0 tenant for your organization.

Follow the procedure below to set up AWS Cognito for authorization:

First, sign in to your AWS account and navigate to the AWS Cognito App.
Provide an allowed callback URL to configure the application in your IdP:
- Add http(s)://YOUR-W&B-HOST/oidc/callback as the callback URL. Replace YOUR-W&B-HOST with your W&B host path.
If your IdP supports universal logout, set the Logout URL to http(s)://YOUR-W&B-HOST. Replace YOUR-W&B-HOST with your W&B host path.

For example, if your application was running at https://wandb.mycompany.com, you would replace YOUR-W&B-HOST with wandb.mycompany.com.

The image below demonstrates how to provide allowed callback and sign-out URLs in AWS Cognito.

wandb/local uses the implicit grant with the form_post response type by default.

You can also configure wandb/local to perform an authorization_code grant that uses the PKCE Code Exchange flow.
Select one or more OAuth grant types to configure how AWS Cognito delivers tokens to your app.
W&B requires specific OpenID Connect (OIDC) scopes. Select the following from AWS Cognito App:
- “openid”
- “profile”
- “email”
For example, your AWS Cognito App UI should look similar to the following image:

Select the Auth Method in the settings page or set the OIDC_AUTH_METHOD environment variable to tell wandb/local which grant to use.

You must set the Auth Method to pkce.
You need a Client ID and the URL of your OIDC issuer. The OpenID discovery document must be available at $OIDC_ISSUER/.well-known/openid-configuration.

For example, you can generate your issuer URL by appending your User Pool ID to the Cognito IdP URL from the App Integration tab within the User Pools section:

Do not use the “Cognito domain” for the IdP URL. Cognito provides its discovery document at https://cognito-idp.$REGION.amazonaws.com/$USER_POOL_ID
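
The issuer URL and the discovery document location described above can be assembled as follows. This is a minimal sketch using only string formatting; the function names are hypothetical, and the region and User Pool ID in the usage note are placeholder values.

```python
def cognito_issuer_url(region: str, user_pool_id: str) -> str:
    """Build the Cognito issuer URL from the region and the User Pool ID."""
    return f"https://cognito-idp.{region}.amazonaws.com/{user_pool_id}"


def discovery_document_url(issuer_url: str) -> str:
    """Return where the OpenID discovery document must be available."""
    return issuer_url.rstrip("/") + "/.well-known/openid-configuration"
```

For example, `discovery_document_url(cognito_issuer_url("us-east-1", "us-east-1_EXAMPLE"))` yields a URL you can open in a browser to confirm that the issuer is reachable and serves a discovery document.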

Follow the procedure below to set up Okta for authorization:

Log in to the Okta Portal.
On the left side, select Applications and then Applications again.
Click on “Create App integration.”
On the screen named “Create a new app integration,” select OIDC - OpenID Connect and Single-Page Application. Then click “Next.”
On the screen named “New Single-Page App Integration,” fill out the values as follows and click Save:
- App integration name, for example “W&B”
- Grant type: Select both Authorization Code and Implicit (hybrid)
- Sign-in redirect URIs: https://YOUR_W_AND_B_URL/oidc/callback
- Sign-out redirect URIs: https://YOUR_W_AND_B_URL/logout
- Assignments: Select Skip group assignment for now
On the overview screen of the Okta application that you just created, make note of the Client ID under Client Credentials under the General tab:
To identify the Okta OIDC Issuer URL, select Settings and then Account on the left side. The Okta UI shows the company name under Organization Contact.

The OIDC issuer URL has the following format: https://COMPANY.okta.com. Replace COMPANY with the corresponding value. Make note of it.

Follow the procedure below to set up Microsoft Entra ID for authorization:

Log in to the Azure Portal.
Select “Microsoft Entra ID” service.
On the left side, select “App registrations.”
On the top, click “New registration.”

On the screen named “Register an application,” fill out the values as follows:
- Specify a name, for example “Weights and Biases application”
- By default the selected account type is: “Accounts in this organizational directory only (Default Directory only - Single tenant).” Modify if you need to.
- Configure Redirect URI as type Web with value: https://YOUR_W_AND_B_URL/oidc/callback
- Click “Register.”
- Make a note of the “Application (client) ID” and “Directory (tenant) ID.”
On the left side, click Authentication.
- Under Front-channel logout URL, specify: https://YOUR_W_AND_B_URL/logout
- Click “Save.”
On the left side, click “Certificates & secrets.”
- Click “Client secrets” and then click “New client secret.”
  
  On the screen named “Add a client secret,” fill out the values as follows:
  - Enter a description, for example “wandb”
  - Leave “Expires” as is or change if you have to.
  - Click “Add.”
- Make a note of the “Value” of the secret. There is no need for the “Secret ID.”

You should now have noted three values:

OIDC Client ID
OIDC Client Secret
Tenant ID, which is needed for the OIDC Issuer URL

The OIDC issuer URL has the following format: https://login.microsoftonline.com/${TenantID}/v2.0

Set up SSO on the W&B Server

To set up SSO, you need administrator privileges and the following information:

OIDC Client ID
OIDC Auth method (implicit or pkce)
OIDC Issuer URL
OIDC Client Secret (optional; depends on how you have set up your IdP)

If your IdP requires an OIDC Client Secret, specify it with the environment variable OIDC_CLIENT_SECRET.

In the UI, go to System Console > Settings > Advanced > User Spec and add OIDC_CLIENT_SECRET to the extraENV section as shown below.
In Helm, configure values.global.extraEnv as shown below.

values:
  global:
    extraEnv:
      OIDC_CLIENT_SECRET: "<your_secret>"

If you’re unable to log in to your instance after configuring SSO, you can restart the instance with the LOCAL_RESTORE=true environment variable set. This outputs a temporary password to the container’s logs and disables SSO. Once you’ve resolved any issues with SSO, you must remove that environment variable to enable SSO again.

The System Console is the successor to the System Settings page. It is available with the W&B Kubernetes Operator based deployment.

Refer to Access the W&B Management Console.
Navigate to Settings, then Authentication. Select OIDC in the Type dropdown.
Enter the values.
Click on Save.
Log out and then log back in, this time using the IdP login screen.

Sign in to your Weights & Biases instance.
Navigate to the W&B App.
From the dropdown, select System Settings:
Enter your Issuer, Client ID, and Authentication Method.
Select Update settings.

If you’re unable to log in to your instance after configuring SSO, you can restart the instance with the LOCAL_RESTORE=true environment variable set. This outputs a temporary password to the container’s logs and turns off SSO. Once you’ve resolved any issues with SSO, you must remove that environment variable to enable SSO again.

Security Assertion Markup Language (SAML)

W&B Server does not support SAML.

5.2.1.3 - Use federated identities with SDK

Use identity federation to sign in using your organizational credentials through the W&B SDK. If your W&B organization admin has configured SSO for your organization, then you already use your organizational credentials to sign in to the W&B app UI. In that sense, identity federation is like SSO for the W&B SDK, but it uses JSON Web Tokens (JWTs) directly. You can use identity federation as an alternative to API keys.

RFC 7523 forms the underlying basis for identity federation with SDK.

Identity federation is available in Preview for Enterprise plans on all platform types - SaaS Cloud, Dedicated Cloud, and Self-managed instances. Reach out to your W&B team for any questions.

For the purpose of this document, the terms identity provider and JWT issuer are used interchangeably. Both refer to one and the same thing in the context of this capability.

JWT issuer setup

As a first step, an organization admin must set up a federation between your W&B organization and a publicly accessible JWT issuer.

Go to the Settings tab in your organization dashboard
In the Authentication option, press Set up JWT Issuer
Add the JWT issuer URL in the text box and press Create

W&B automatically looks for an OIDC discovery document at the path ${ISSUER_URL}/.well-known/oidc-configuration, and tries to find the JSON Web Key Set (JWKS) at a relevant URL in the discovery document. The JWKS is used for real-time validation of the JWTs to ensure that they have been issued by the relevant identity provider.

Using the JWT to access W&B

Once a JWT issuer has been set up for your W&B organization, users can start accessing the relevant W&B projects using JWTs issued by that identity provider. The mechanism for using JWTs is as follows:

You must sign in to the identity provider using one of the mechanisms available in your organization. Some providers can be accessed in an automated manner using an API or SDK, while some can only be accessed using a relevant UI. Reach out to your W&B organization admin or the owner of the JWT issuer for details.
Once you’ve retrieved the JWT after signing in to your identity provider, store it in a file at a secure location and configure the absolute file path in an environment variable WANDB_IDENTITY_TOKEN_FILE.
Access your W&B project using the W&B SDK or CLI. The SDK or CLI should automatically detect the JWT and exchange it for a W&B access token after the JWT has been successfully validated. The W&B access token is used to access the relevant APIs for enabling your AI workflows, that is, to log runs, metrics, artifacts and so forth. The access token is by default stored at the path ~/.config/wandb/credentials.json. You can change that path by specifying the environment variable WANDB_CREDENTIALS_FILE.
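As a minimal sketch of the second step, the following Python helper writes a JWT to a file that only the current user can read and points WANDB_IDENTITY_TOKEN_FILE at its absolute path. The function name and the example path are hypothetical, and how you obtain the JWT depends on your identity provider.

```python
import os
from pathlib import Path


def store_identity_token(jwt: str, token_path: str) -> str:
    """Write the JWT to token_path and export WANDB_IDENTITY_TOKEN_FILE."""
    path = Path(token_path).expanduser().resolve()
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create (or truncate) the file with owner-only permissions.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(jwt)
    # Tighten permissions in case the file already existed with a wider mode.
    os.chmod(path, 0o600)
    # The variable must hold an absolute path.
    os.environ["WANDB_IDENTITY_TOKEN_FILE"] = str(path)
    return str(path)
```

For example, `store_identity_token(jwt, "~/.config/my-idp/wandb-token.jwt")` before the first W&B SDK call in the same process. The variable set this way only applies to the current process and its children.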

JWTs are meant to be short-lived credentials to address the shortcomings of long-lived credentials like API keys, passwords and so forth. Depending on the JWT expiry time configured in your identity provider, you must continuously refresh the JWT and ensure that it’s stored in the file referenced by the environment variable WANDB_IDENTITY_TOKEN_FILE.

The W&B access token also has a default expiry duration, after which the SDK or the CLI automatically tries to refresh it using your JWT. If your JWT has also expired by that time and has not been refreshed, authentication can fail. If possible, implement the JWT retrieval and post-expiry refresh mechanism as part of the AI workload that uses the W&B SDK or CLI.
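One way a workload could decide when to refresh is to read the exp claim from the stored JWT and compare it with the current time. The sketch below uses only the Python standard library and does not verify the signature, so it is suitable for scheduling a refresh but not for trusting a token. The function names and the five-minute margin are assumptions.

```python
import base64
import json
import time
from typing import Optional


def jwt_expiry(jwt: str) -> int:
    """Return the exp claim (seconds since the epoch) without verifying the JWT."""
    payload_b64 = jwt.split(".")[1]
    # JWT segments are base64url encoded without padding, so restore it.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    return int(claims["exp"])


def needs_refresh(jwt: str, margin_seconds: int = 300,
                  now: Optional[float] = None) -> bool:
    """True if the JWT expires within margin_seconds of the current time."""
    current = time.time() if now is None else now
    return jwt_expiry(jwt) - current <= margin_seconds
```

A workload could call `needs_refresh` on the contents of the file referenced by WANDB_IDENTITY_TOKEN_FILE and, when it returns True, fetch a new JWT from the identity provider and overwrite the file.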

JWT validation

As part of the workflow to exchange the JWT for a W&B access token and then access a project, the JWT undergoes the following validations:

The JWT signature is verified using the JWKS at the W&B organization level. This is the first line of defense, and if this fails, that means there’s a problem with your JWKS or how your JWT is signed.
The iss claim in the JWT should be equal to the issuer URL configured at the organization level.
The sub claim in the JWT should be equal to the user’s email address as configured in the W&B organization.
The aud claim in the JWT should be equal to the name of the W&B organization that houses the project you are accessing as part of your AI workflow. For Dedicated Cloud or Self-managed instances, you can set the instance-level environment variable SKIP_AUDIENCE_VALIDATION to true to skip validation of the audience claim, or use wandb as the audience.
The exp claim in the JWT is checked to see if the token is valid or has expired and needs to be refreshed.
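The claim checks listed above can be approximated locally to debug a token before handing it to the SDK. The following is a sketch under stated assumptions, not W&B's validation code: it takes already decoded claims, skips signature verification entirely, and the function name is hypothetical. It accepts aud as either a string or a list because RFC 7519 allows both forms.

```python
import time
from typing import List, Optional


def claim_errors(claims: dict, issuer_url: str, user_email: str,
                 organization: str, now: Optional[float] = None) -> List[str]:
    """Return the names of claims that would fail the checks described above."""
    current = time.time() if now is None else now
    errors = []
    if claims.get("iss") != issuer_url:
        errors.append("iss")
    if claims.get("sub") != user_email:
        errors.append("sub")
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if organization not in audiences:
        errors.append("aud")
    # A missing exp claim is treated as already expired.
    if claims.get("exp", 0) <= current:
        errors.append("exp")
    return errors
```

An empty list means the claims look consistent with the organization's configuration. For an external service account, pass the configured Subject instead of a user email.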

External service accounts

W&B has long supported built-in service accounts with long-lived API keys. With the identity federation capability for the SDK and CLI, you can also bring external service accounts that use JWTs for authentication, as long as those JWTs are issued by the same issuer that is configured at the organization level. A team admin can configure external service accounts within the scope of a team, like the built-in service accounts.

To configure an external service account:

Go to the Service Accounts tab for your team
Press New service account
Provide a name for the service account, select Federated Identity as the Authentication Method, provide a Subject, and press Create

The sub claim in the external service account’s JWT should be the same as the subject that the team admin configures for it in the team-level Service Accounts tab. That claim is verified as part of JWT validation. The aud claim requirement is the same as for human user JWTs.

When using an external service account’s JWT to access W&B, it’s typically easier to automate the workflow to generate the initial JWT and continuously refresh it. If you would like to attribute the runs logged using an external service account to a human user, you can configure the environment variables WANDB_USERNAME or WANDB_USER_EMAIL for your AI workflow, similar to how it’s done for the built-in service accounts.

W&B recommends using a mix of built-in and external service accounts across your AI workloads with different levels of data sensitivity, in order to strike a balance between flexibility and simplicity.

5.2.1.4 - Use service accounts to automate workflows

Manage automated or non-interactive workflows using org and team scoped service accounts

A service account represents a non-human or machine user that can automatically perform common tasks across projects within a team or across teams.

An org admin can create a service account at the scope of the organization.
A team admin can create a service account at the scope of that team.

A service account’s API key allows the caller to read from or write to projects within the service account’s scope.

Service accounts allow for centralized management of workflows by multiple users or teams, to automate experiment tracking for W&B Models or to log traces for W&B Weave. You have the option to associate a human user’s identity with a workflow managed by a service account, by using either of the environment variables WANDB_USERNAME or WANDB_USER_EMAIL.

Service accounts are available on Dedicated Cloud, Self-managed instances with an enterprise license, and enterprise accounts in SaaS Cloud.

Organization-scoped service accounts

Service accounts scoped to an organization have permissions to read and write in all projects in the organization, regardless of the team, with the exception of restricted projects. Before an organization-scoped service account can access a restricted project, an admin of that project must explicitly add the service account to the project.

An organization admin can obtain the API key for an organization-scoped service account from the Service Accounts tab of the organization or account dashboard.

To create a new organization-scoped service account:

Click the New service account button in the Service Accounts tab of your organization dashboard.
Enter a Name.
Select a default team for the service account.
Click Create.
Next to the newly created service account, click Copy API key.
Store the copied API key in a secret manager or another secure but accessible location.

An organization-scoped service account requires a default team, even though it has access to non-restricted projects owned by all teams within the organization. This helps to prevent a workload from failing if the WANDB_ENTITY variable is not set in the environment for your model training or generative AI app. To use an organization-scoped service account for a project in a different team, you must set the WANDB_ENTITY environment variable to that team.
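The environment for a workload that runs as a service account can be assembled as shown below. This is a sketch: the helper name is hypothetical, while WANDB_API_KEY, WANDB_ENTITY, WANDB_USERNAME, and WANDB_USER_EMAIL are the environment variables named in this guide.

```python
import os
from typing import Dict, Optional


def service_account_env(api_key: str, entity: Optional[str] = None,
                        username: Optional[str] = None,
                        user_email: Optional[str] = None) -> Dict[str, str]:
    """Build the W&B environment variables for a service-account workload."""
    env = {"WANDB_API_KEY": api_key}
    if entity:
        # Needed when logging to a team other than the default team.
        env["WANDB_ENTITY"] = entity
    if username:
        # Optional: attribute runs to a human user by username.
        env["WANDB_USERNAME"] = username
    if user_email:
        # Optional: attribute runs to a human user by email.
        env["WANDB_USER_EMAIL"] = user_email
    return env


# Usage: apply the variables before the workload initializes W&B, for example
# os.environ.update(service_account_env(api_key, entity="other-team"))
```

Read the API key from your secret manager at runtime rather than hard-coding it in the script.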

Team-scoped service accounts

A team-scoped service account can read and write in all projects within its team, except to restricted projects in that team. Before a team-scoped service account can access a restricted project, an admin of that project must explicitly add the service account to the project.

As a team admin, you can get the API key for a team-scoped service account in your team at <WANDB_HOST_URL>/<your-team-name>/service-accounts. Alternatively you can go to the Team settings for your team and then refer to the Service Accounts tab.

To create a new team-scoped service account for your team:

Click the New service account button in the Service Accounts tab of your team.
Enter a Name.
Select Generate API key (Built-in) as the authentication method.
Click Create.
Next to the newly created service account, click Copy API key.
Store the copied API key in a secret manager or another secure but accessible location.

If you do not configure a team in your model training or generative AI app environment that uses a team-scoped service account, the model runs or Weave traces log to the named project within the service account’s parent team. In such a scenario, user attribution using the WANDB_USERNAME or WANDB_USER_EMAIL variables does not work unless the referenced user is part of the service account’s parent team.

A team-scoped service account cannot log runs to a Team-scoped or Restricted-scoped project in a team different from its parent team, but it can log runs to an Open visibility project within another team.
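The rules for team-scoped service accounts can be summarized as a small decision function. This is an illustrative model of the text in this section, not an API: the function and the lowercase scope names are hypothetical. Treating Public projects in another team as not writable is an assumption that follows from the Public scope allowing only the owning team to submit runs.

```python
def team_service_account_can_log(visibility: str, same_team: bool,
                                 added_to_project: bool = False) -> bool:
    """Model whether a team-scoped service account can log runs to a project.

    visibility: "open", "public", "team", or "restricted".
    same_team: the project belongs to the service account's parent team.
    added_to_project: a project admin explicitly added the service account.
    """
    if visibility not in {"open", "public", "team", "restricted"}:
        raise ValueError(f"unknown visibility scope: {visibility}")
    if same_team:
        # Within its own team, only restricted projects need explicit access.
        return visibility != "restricted" or added_to_project
    # In another team, only Open projects accept runs.
    return visibility == "open"
```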

External service accounts

In addition to Built-in service accounts, W&B also supports team-scoped External service accounts with the W&B SDK and CLI using Identity federation with identity providers (IdPs) that can issue JSON Web Tokens (JWTs).

5.2.2 - Access management

Manage users and teams within an organization

The first user to sign up to W&B with a unique organization domain is assigned that organization’s instance administrator role. The organization administrator assigns specific users team administrator roles.

W&B recommends having more than one instance admin in an organization. This ensures that admin operations can continue when the primary admin is not available.

A team administrator is a user in an organization who has administrative permissions within a team.

Organization administrators can access and use an organization’s account settings at https://wandb.ai/account-settings/ to invite users, assign or update a user’s role, create teams, remove users from your organization, assign the billing administrator, and more. See Add and manage users for more information.

Once an organization administrator creates a team, the instance administrator or a team administrator can:

Invite users to that team or remove users from the team. By default, only an admin can do this. To change this behavior, refer to Team settings.
Assign or update a team member’s role.
Automatically add new users to a team when they join your organization.

Both the organization administrator and the team administrator use team dashboards at https://wandb.ai/<your-team-name> to manage teams. For more information, and to configure a team’s default privacy settings, see Add and manage teams.

Maintain admin access

You must ensure that at least one admin user exists in your instance or organization at all times. Otherwise, no user will be able to configure or maintain your organization’s W&B account.

If users are managed interactively, admin access is required to delete a user, including another admin user. This helps to reduce the risk of the sole admin user being removed.

However, if an organization uses automated processes to deprovision users from W&B, a deprovisioning operation could inadvertently remove the last remaining admin from the instance or organization.
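An automated deprovisioning job can guard against this case by checking that at least one admin remains before it removes anyone. The following sketch assumes the job already has a mapping of usernames to organization roles. The function name, the data shape, and the "admin" role string are hypothetical and not part of any W&B API.

```python
from typing import Dict, Iterable


def safe_to_deprovision(users: Dict[str, str], to_remove: Iterable[str]) -> bool:
    """Return True if at least one admin remains after removing to_remove.

    users maps each username to its organization role, for example "admin".
    """
    removing = set(to_remove)
    remaining_admins = [
        name for name, role in users.items()
        if role == "admin" and name not in removing
    ]
    return len(remaining_admins) > 0
```

A job could abort and alert an operator when this check returns False instead of proceeding with the removal.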

For assistance with developing operational procedures, or to restore admin access, contact support.

Limit visibility to specific projects

Define the scope of a W&B project to limit who can view, edit, and submit W&B runs to it. Limiting who can view a project is particularly useful if a team works with sensitive or confidential data.

An organization admin, team admin, or the owner of a project can both set and edit a project’s visibility.

For more information, see Project visibility.

5.2.2.1 - Manage your organization

As an admin of an organization you can manage individual users within your organization and manage teams.

As a team admin you can manage teams.

The following workflow applies to users with instance admin roles. Reach out to an admin in your organization if you believe you should have instance admin permissions.

If you are looking to simplify user management in your organization, refer to Automate user and team management.

Change the name of your organization

The following workflow only applies to W&B Multi-tenant SaaS Cloud.

Navigate to https://wandb.ai/home.
In the upper right corner of the page, select the User menu dropdown. Within the Account section of the dropdown, select Settings.
Within the Settings tab, select General.
Select the Change name button.
Within the modal that appears, provide a new name for your organization and select the Save name button.

Add and manage users

As an admin, use your organization’s dashboard to:

Invite or remove users.
Assign or update a user’s organization role, and create custom roles.
Assign the billing admin.

There are several ways an organization admin can add users to an organization:

Member-by-invite
Auto provisioning with SSO
Domain capture

Seats and pricing

The following table summarizes how seats work for Models and Weave:

Product	Seats	Cost based on
Models	Pay per seat	The number of paid Models seats you have and the usage you’ve accrued determine your overall subscription cost. Each user can be assigned one of three available seat types: Full, Viewer, or No-Access.
Weave	Free	Usage based

Invite a user

Admins can invite users to their organization, as well as to specific teams within the organization.

Navigate to https://wandb.ai/home.
In the upper right corner of the page, select the User menu dropdown. Within the Account section of the dropdown, select Users.
Select Invite new user.
In the modal that appears, provide the email or username of the user in the Email or username field.
(Recommended) Add the user to a team from the Choose teams dropdown menu.
From the Select role dropdown, select the role to assign to the user. You can change the user’s role at a later time. See the table listed in Assign a role for more information about possible roles.
Choose the Send invite button.

W&B sends an invite link using a third-party email server to the user’s email after you select the Send invite button. A user can access your organization once they accept the invite.

Navigate to https://<org-name>.io/console/settings/. Replace <org-name> with your organization name.
Select the Add user button
Within the modal that appears, provide the email of the new user in the Email field.
Select a role to assign to the user from the Role dropdown. You can change the user’s role at a later time. See the table listed in Assign a role for more information about possible roles.
Check the Send invite email to user box if you want W&B to send an invite link using a third-party email server to the user’s email.
Select the Add new user button.

Auto provision users

A W&B user with matching email domain can sign in to your W&B Organization with Single Sign-On (SSO) if you configure SSO and your SSO provider permits it. SSO is available for all Enterprise licenses.

Enable SSO for authentication

W&B strongly recommends and encourages that users authenticate using Single Sign-On (SSO). Reach out to your W&B team to enable SSO for your organization.

To learn more about how to set up SSO with Dedicated cloud or Self-managed instances, refer to SSO with OIDC or SSO with LDAP.

W&B assigns auto-provisioned users the Member role by default. You can change the role of auto-provisioned users at any time.

Auto-provisioning users with SSO is on by default for Dedicated cloud instances and Self-managed deployments. You can turn off auto provisioning. Turning auto provisioning off enables you to selectively add specific users to your W&B organization.

The following tabs describe how to turn off auto provisioning with SSO based on deployment type:

Reach out to your W&B team if you are on a Dedicated cloud instance and you want to turn off auto provisioning with SSO.

Use the W&B Console to turn off auto provisioning with SSO:

Navigate to https://<org-name>.io/console/settings/. Replace <org-name> with your organization name.
Choose Security
Select Disable SSO Provisioning to turn off auto provisioning with SSO.

Auto provisioning with SSO is useful for adding users to an organization at scale because organization admins do not need to generate individual user invitations.

Create custom roles

An Enterprise license is required to create or assign custom roles on Dedicated cloud or Self-managed deployments.

Organization admins can compose a new role based on either the View-Only or Member role and add additional permissions to achieve fine-grained access control. Team admins can assign a custom role to a team member. Custom roles are created at the organization level but are assigned at the team level.

To create a custom role:

Navigate to https://wandb.ai/home.
In the upper right corner of the page, select the User menu dropdown. Within the Account section of the dropdown, select Settings.
Click Roles.
In the Custom roles section, click Create a role.
Provide a name for the role. Optionally provide a description.
Choose the role to base the custom role on, either Viewer or Member.
To add permissions, click the Search permissions field, then select one or more permissions to add.
Review the Custom role permissions section, which summarizes the permissions the role has.
Click Create Role.

Use the W&B Console to create a custom role:

Navigate to https://<org-name>.io/console/settings/. Replace <org-name> with your organization name.
In the Custom roles section, click Create a role.
Provide a name for the role. Optionally provide a description.
Choose the role to base the custom role on, either Viewer or Member.
To add permissions, click the Search permissions field, then select one or more permissions to add.
Review the Custom role permissions section, which summarizes the permissions the role has.
Click Create Role.

A team admin can now assign the custom role to members of a team from the Team settings.

Domain capture

Domain capture helps your employees join your company’s organization and ensures that new users do not create assets outside of your company’s jurisdiction.

Domains must be unique

Domains are unique identifiers. This means that you cannot use a domain that is already in use by another organization.

Domain capture lets you automatically add people with a company email address, such as @example.com, to your W&B SaaS cloud organization. This helps all your employees join the right organization and ensures that new users do not create assets outside of your company jurisdiction.
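Conceptually, domain capture matches the domain part of a new user's email address against the domains an organization has claimed. The sketch below illustrates that matching only. It is not W&B's implementation, and the function name and the case-insensitive comparison are assumptions.

```python
from typing import Iterable


def captured_by_domain(email: str, claimed_domains: Iterable[str]) -> bool:
    """Return True if the email's domain is one of the claimed domains."""
    local, separator, domain = email.rpartition("@")
    if not separator or not local or not domain:
        # Not a usable email address.
        return False
    # Accept domains written either as "example.com" or "@example.com".
    normalized = {d.strip().lstrip("@").lower() for d in claimed_domains}
    return domain.lower() in normalized
```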

This table summarizes the behavior of new and existing users with and without domain capture enabled:

	With domain capture	Without domain capture
New users	Users who sign up for W&B from verified domains are automatically added as members to your organization’s default team. They can choose additional teams to join at sign up, if you enable team joining. They can still join other organizations and teams with an invitation.	Users can create W&B accounts without knowing there is a centralized organization available.
Invited users	Invited users automatically join your organization when accepting your invite. Invited users are not automatically added as members to your organization’s default team. They can still join other organizations and teams with an invitation.	Invited users automatically join your organization when accepting your invite. They can still join other organizations and teams with an invitation.
Existing users	Existing users with verified email addresses from your domains can join your organization’s teams within the W&B App. All data that existing users create before joining your organization remains. W&B does not migrate the existing user’s data.	Existing W&B users may be spread across multiple organizations and teams.

To automatically assign non-invited new users to a default team when they join your organization:

Navigate to https://wandb.ai/home.
In the upper right corner of the page, select the User menu dropdown. From the dropdown, choose Settings.
Within the Settings tab, select General.
Choose the Claim domain button within Domain capture.
Select the team that you want new users to automatically join from the Default team dropdown. If no teams are available, you’ll need to update team settings. See the instructions in Add and manage teams.
Click the Claim email domain button.

You must enable domain matching within a team’s settings before you can automatically assign non-invited new users to that team.

Navigate to the team’s dashboard at https://wandb.ai/<team-name>, where <team-name> is the name of the team for which you want to enable domain matching.
Select Team settings in the global navigation on the left side of the team’s dashboard.
Within the Privacy section, toggle the “Recommend new users with matching email domains join this team upon signing up” option.

Reach out to your W&B Account Team to configure domain capture if you use a Dedicated or Self-managed deployment. Once configured, your W&B SaaS instance automatically prompts users who create a W&B account with your company email address to contact your admin to request access to your Dedicated or Self-managed instance.

	With domain capture	Without domain capture
New users	Users who sign up for W&B on SaaS cloud from verified domains are automatically prompted to contact an admin with an email address you customize. They can still create an organization on SaaS cloud to trial the product.	Users can create W&B SaaS cloud accounts without learning their company has a centralized dedicated instance.
Existing users	Existing W&B users may be spread across multiple organizations and teams.	Existing W&B users may be spread across multiple organizations and teams.

Assign or update a user’s role

Every member in an Organization has an organization role and seat for both W&B Models and Weave. The type of seat they have determines both their billing status and the actions they can take in each product line.

You initially assign an organization role to a user when you invite them to your organization. You can change any user’s role at a later time.

A user within an organization can have one of the following roles:

Role	Descriptions
Admin	An instance admin who can add or remove other users to the organization, change user roles, manage custom roles, add teams and more. W&B recommends ensuring there is more than one admin in the event that your admin is unavailable.
Member	A regular user of the organization, invited by an instance admin. An organization member cannot invite other users or manage existing users in the organization.
Viewer (Enterprise-only feature)	A view-only user of your organization, invited by an instance admin. A viewer only has read access to the organization and the underlying teams that they are a member of.
Custom Roles (Enterprise-only feature)	Custom roles allow organization admins to compose new roles by inheriting from the preceding View-Only or Member roles, and adding additional permissions to achieve fine-grained access control. Team admins can then assign any of those custom roles to users in their respective teams.

To change a user’s role:

Navigate to https://wandb.ai/home.
In the upper right corner of the page, select the User menu dropdown. From the dropdown, choose Users.
Provide the name or email of the user in the search bar.
Select a role from the TEAM ROLE dropdown next to the name of the user.

Assign or update a user’s access

A user within an organization has one of the following Models seat or Weave access types: full, viewer, or no access.

Seat type	Description
Full	Users with this role type have full permissions to write, read, and export data for Models or Weave.
Viewer	A view-only user of your organization. A viewer only has read access to the organization and the underlying teams that they are a part of, and view only access to Models or Weave.
No access	Users with this role have no access to the Models or Weave products.

Models seat type and Weave access type are defined at the organization level, and inherited by the team. If you want to change a user’s seat type, navigate to the organization settings and follow these steps:

For SaaS users, navigate to your organization’s settings at https://wandb.ai/account-settings/<organization>/settings. Replace the value enclosed in angle brackets (<>) with your organization name. For Dedicated and Self-managed deployments, navigate to https://<your-instance>.wandb.io/org/dashboard.
Select the Users tab.
From the Role dropdown, select the seat type you want to assign to the user.

The organization role and subscription type determine which seat types are available within your organization.

Remove a user

Navigate to https://wandb.ai/home.
In the upper right corner of the page, select the User menu dropdown. From the dropdown, choose Users.
Provide the name or email of the user in the search bar.
Select the ellipses or three dots icon (…) when it appears.
From the dropdown, choose Remove member.

Assign the billing admin

Navigate to https://wandb.ai/home.
In the upper right corner of the page, select the User menu dropdown. From the dropdown, choose Users.
Provide the name or email of the user in the search bar.
Under the Billing admin column, choose the user you want to assign as the billing admin.

Add and manage teams

Use your organization’s dashboard to create and manage teams within your organization. An organization admin or a team admin can:

Invite users to a team or remove users from a team.
Manage a team member’s roles.
Automate the addition of users to a team when they join your organization.
Manage team storage with the team’s dashboard at https://wandb.ai/<team-name>.

Create a team

Use your organization’s dashboard to create a team:

Navigate to https://wandb.ai/home.
Select Create a team to collaborate on the left navigation panel underneath Teams.
Provide a name for your team in the Team name field in the modal that appears.
Choose a storage type.
Select the Create team button.

After you select the Create team button, W&B redirects you to a new team page at https://wandb.ai/<team-name>, where <team-name> is the name you provided when you created the team.

Once you have a team, you can add users to that team.

Invite users to a team

Invite users to a team in your organization. Use the team’s dashboard to invite users using their email address or W&B username if they already have a W&B account.

Navigate to https://wandb.ai/<team-name>.
Select Team settings in the global navigation on the left side of the dashboard.
Select the Users tab.
Choose Invite a new user.
Within the modal that appears, provide the email of the user in the Email or username field and select the role to assign to that user from the Select a team role dropdown. For more information about roles a user can have in a team, see Team roles.
Choose the Send invite button.

By default, only a team or instance admin can invite members to a team. To change this behavior, refer to Team settings.

In addition to inviting users manually with email invites, you can automatically add new users to a team if the new user’s email matches the domain of your organization.

Allow new users within your organization to discover teams within your organization when they sign up. New users must have a verified email domain that matches your organization’s verified email domain. Verified new users can view a list of verified teams that belong to an organization when they sign up for a W&B account.

An organization admin must enable domain claiming. To enable domain capture, see the steps described in Domain capture.

Assign or update a team member’s role

Select the account type icon next to the name of the team member.
From the dropdown, choose the account type you want that team member to possess.

This table lists the roles you can assign to a member of a team:

Role	Definition
admin	A user who can add and remove other users in the team, change user roles, and configure team settings.
Member	A regular user of a team, invited by email or their organization-level username by the team admin. A member user cannot invite other users to the team.
View-Only (Enterprise-only feature)	A view-only user of a team, invited by email or their organization-level username by the team admin. A view-only user only has read access to the team and its contents.
Service (Enterprise-only feature)	A service worker or service account is an API key that is useful for using W&B with your run automation tools. If you use an API key from a service account for your team, make sure to set the environment variable `WANDB_USERNAME` to correctly attribute runs to the appropriate user.
Custom Roles (Enterprise-only feature)	Custom roles allow organization admins to compose new roles by inheriting from the preceding View-Only or Member roles, and adding additional permissions to achieve fine-grained access control. Team admins can then assign any of those custom roles to users in their respective teams. Refer to the custom roles announcement for details.

Only organizations with an Enterprise license on Dedicated cloud or Self-managed deployments can assign custom roles to members of a team.

Remove users from a team

Remove a user from a team using the team’s dashboard. W&B preserves runs created in a team even if the member who created the runs is no longer on that team.

Navigate to https://wandb.ai/<team-name>.
Select Team settings in the left navigation bar.
Select the Users tab.
Hover your mouse next to the name of the user you want to delete. Select the ellipses or three dots icon (…) when it appears.
From the dropdown, select Remove user.

5.2.2.2 - Manage access control for projects

Manage project access using visibility scopes and project-level roles

Define the scope of a W&B project to limit who can view, edit, and submit W&B runs to it.

You can use a combination of two controls to configure the access level for any project within a W&B team. Visibility scope is the higher-level mechanism. Use it to control which groups of users can view or submit runs in a project. For a project with Team or Restricted visibility scope, you can then use project-level roles to control the level of access that each user has within the project.

The owner of a project, a team admin, or an organization admin can set or edit a project’s visibility.

Visibility scopes

There are four project visibility scopes you can choose from. In order of most public to most private, they are:

Scope	Description
Open	Anyone who knows about the project can view it and submit runs or reports.
Public	Anyone who knows about the project can view it. Only your team can submit runs or reports.
Team	Only members of the parent team can view the project and submit runs or reports. Anyone outside the team cannot access the project.
Restricted	Only invited members from the parent team can view the project and submit runs or reports.

Set a project’s scope to Restricted if you would like to collaborate on workflows related to sensitive or confidential data. When you create a restricted project within a team, you can invite or add specific members from the team to collaborate on relevant experiments, artifacts, reports, and so forth.

Unlike other project scopes, members of a team do not get implicit access to a restricted project. At the same time, team admins can join restricted projects if needed.
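The four scopes can be read as a pair of rules, one for viewing and one for submitting runs or reports. The following sketch encodes the table above for illustration only. The function names and lowercase scope strings are hypothetical, and the sketch does not model project-level roles or the ability of team admins to join restricted projects.

```python
def can_view(scope: str, is_team_member: bool, is_invited: bool = False) -> bool:
    """Model who can view a project under each visibility scope."""
    if scope in ("open", "public"):
        return True
    if scope == "team":
        return is_team_member
    if scope == "restricted":
        # Only invited members of the parent team.
        return is_team_member and is_invited
    raise ValueError(f"unknown visibility scope: {scope}")


def can_submit(scope: str, is_team_member: bool, is_invited: bool = False) -> bool:
    """Model who can submit runs or reports under each visibility scope."""
    if scope == "open":
        return True
    if scope in ("public", "team"):
        return is_team_member
    if scope == "restricted":
        return is_team_member and is_invited
    raise ValueError(f"unknown visibility scope: {scope}")
```

For example, a user outside the team can view a Public project but cannot submit runs to it, while a team member who has not been invited can neither view nor submit to a Restricted project.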

Set visibility scope on a new or existing project

Set a project’s visibility scope when you create a project or when editing it later.

Only the owner of the project or a team admin can set or edit its visibility scope.
When a team admin enables Make all future team projects private (public sharing not allowed) within a team’s privacy setting, that turns off Open and Public project visibility scopes for that team. In this case, your team can only use Team and Restricted scopes.

Set visibility scope when you create a new project

Navigate to your W&B organization on SaaS Cloud, Dedicated Cloud, or Self-managed instance.
Click the Create a new project button in the left hand sidebar’s My projects section. Alternatively, navigate to the Projects tab of your team and click the Create new project button in the upper right hand corner.
After selecting the parent team and entering the name of the project, select the desired scope from the Project Visibility dropdown.

Complete the following step if you select Restricted visibility.

Provide names of one or more W&B team members in the Invite team members field. Add only those members who are essential to collaborate on the project.

You can add or remove members in a restricted project later, from its Users tab.

Edit visibility scope of an existing project

Navigate to your W&B Project.
Select the Overview tab on the left column.
Click the Edit Project Details button on the upper right corner.
From the Project Visibility dropdown, select the desired scope.

Complete the following step if you select Restricted visibility.

Go to the Users tab in the project, and click the Add user button to invite specific users to the restricted project.

All members of a team lose access to a project if you change its visibility scope from Team to Restricted, unless you invite the required team members to the project.
All members of a team get access to a project if you change its visibility scope from Restricted to Team.
If you remove a team member from the user list for a restricted project, they lose access to that project.

Other key things to note for restricted scope

If you want to use a team-level service account in a restricted project, you must specifically invite or add it to the project. Otherwise, a team-level service account cannot access a restricted project.
You cannot move runs out of a restricted project, but you can move runs from a non-restricted project to a restricted one.
You can change the visibility of a restricted project only to Team scope, irrespective of the team privacy setting Make all future team projects private (public sharing not allowed).
If the owner of a restricted project is not part of the parent team anymore, the team admin should change the owner to ensure seamless operations in the project.

Project level roles

For Team or Restricted scoped projects in your team, you can assign a specific role to a user, which can differ from that user’s team level role. For example, if a user has the Member role at the team level, you can assign the View-Only role, the Admin role, or any available custom role to that user within a Team or Restricted scoped project in that team.

Project level roles are in preview on SaaS Cloud, Dedicated Cloud, and Self-managed instances.

Assign project level role to a user

Navigate to your W&B Project.
Select the Overview tab on the left column.
Go to the Users tab in the project.
Click the currently assigned role for the pertinent user in the Project Role field, which should open up a dropdown listing the other available roles.
Select another role from the dropdown. It should save instantly.

When you change the project level role for a user to be different from their team level role, the project level role includes a * to indicate the difference.

Other key things to note for project level roles

By default, the project level role of every user in a Team or Restricted scoped project is inherited from that user’s team level role.
You cannot change the project level role of a user who has the View-only role at the team level.
If a user’s project level role within a particular project is the same as their team level role, and a team admin later changes the team level role, the project level role changes automatically to track the team level role.
If you change a user’s project level role within a particular project so that it differs from their team level role, and a team admin later changes the team level role, the project level role remains as is.
If you remove a user from a restricted project while their project level role differs from their team level role, and you later add the user back to the project, they inherit the team level role by default. If needed, change the project level role again so that it differs from the team level role.
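
The inheritance rules above can be summarized with a small illustrative model. This is only a sketch for reasoning about the behavior, not W&B’s implementation: the class `ProjectMembership`, its method names, and the lowercase role strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProjectMembership:
    """Hypothetical model of one user's role in one project (not a W&B API)."""

    team_role: str
    override: Optional[str] = None  # None means "inherit the team level role"

    @property
    def project_role(self) -> str:
        # The effective role is the override if one exists, else the team role.
        return self.override if self.override is not None else self.team_role

    def set_project_role(self, role: str) -> None:
        if self.team_role == "view-only":
            raise ValueError("cannot change the project role of a team level View-only user")
        # Choosing the same role as the team role is treated as plain inheritance.
        self.override = None if role == self.team_role else role

    def set_team_role(self, role: str) -> None:
        # Inherited project roles follow this change; overrides stay as they are.
        self.team_role = role

    def readd_to_project(self) -> None:
        # Removing and re-adding a user resets them to the inherited role.
        self.override = None
```

In this model, a user whose project role was never changed follows every team level change, while a user with a differing project role keeps it until they are removed from and re-added to the project.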

5.2.3 - Automate user and team management

SCIM API

Use the SCIM API to manage users, and the teams they belong to, in an efficient and repeatable manner. You can also use the SCIM API to manage custom roles or assign roles to users in your W&B organization. Role endpoints are not part of the official SCIM schema. W&B adds role endpoints to support automated management of custom roles.

The SCIM API is especially useful if you want to:

manage user provisioning and de-provisioning at scale
manage users with a SCIM-supporting Identity Provider

The SCIM API falls broadly into three categories: User, Group, and Roles.

User SCIM API

The User SCIM API lets you create a user, deactivate a user, get the details of a user, or list all users in a W&B organization. This API also supports assigning predefined or custom roles to users in an organization.

Deactivate a user within a W&B organization with the DELETE User endpoint. Deactivated users can no longer sign in. However, deactivated users still appear in the organization’s user list.

To fully remove a deactivated user from the user list, you must remove the user from the organization.

It is possible to re-enable a deactivated user, if needed.

Group SCIM API

The Group SCIM API lets you manage W&B teams, including creating or removing teams in an organization. Use the PATCH Group endpoint to add or remove users in an existing team.

There is no notion of a group of users having the same role within W&B. A W&B team closely resembles a group, and allows diverse personas with different roles to work collaboratively on a set of related projects. Teams can consist of different groups of users. Assign each user in a team a role: team admin, member, viewer, or a custom role.

W&B maps Group SCIM API endpoints to W&B teams because of the similarity between groups and W&B teams.

Custom role API

Custom role SCIM API allows for managing custom roles, including creating, listing, or updating custom roles in an organization.

Delete a custom role with caution.

Delete a custom role within a W&B organization with the DELETE Role endpoint. The predefined role that the custom role inherits is assigned to all users that are assigned the custom role before the operation.

Update the inherited role for a custom role with the PUT Role endpoint. This operation doesn’t affect any of the existing custom (that is, non-inherited) permissions in the custom role.

W&B Python SDK API

As with the SCIM API, you can use some of the methods available in the W&B Python SDK API to automate user and team management. Note the following methods:

Method name	Purpose
`create_user(email, admin=False)`	Add a user to the organization and optionally make them the organization admin.
`user(userNameOrEmail)`	Return an existing user in the organization.
`user.teams()`	Return the teams for the user. You can get the user object using the user(userNameOrEmail) method.
`create_team(teamName, adminUserName)`	Create a new team and optionally make an organization-level user the team admin.
`team(teamName)`	Return an existing team in the organization.
`Team.invite(userNameOrEmail, admin=False)`	Add a user to the team. You can get the team object using the team(teamName) method.
`Team.create_service_account(description)`	Add a service account to the team. You can get the team object using the team(teamName) method.
`Member.delete()`	Remove a member user from a team. You can get the list of member objects in a team using the team object’s `members` attribute. And you can get the team object using the team(teamName) method.
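
As a minimal sketch of how these methods can be combined, the helper below adds a user to the organization and then invites them to an existing team. The function name `onboard_user` is hypothetical. It assumes `api` behaves like `wandb.Api()` and calls only methods listed in the table above.

```python
def onboard_user(api, email, team_name):
    """Create a user, then invite them to an existing team.

    `api` is expected to behave like `wandb.Api()`. The helper itself is
    illustrative and is not part of the W&B SDK.
    """
    api.create_user(email, admin=False)  # add the user to the organization
    team = api.team(team_name)           # look up the existing team
    team.invite(email, admin=False)      # add the user as a regular team member
    return team
```

With the real SDK, you would call it as `onboard_user(wandb.Api(), "dev-user2@test.com", "my-team")` after `import wandb`. Passing the API object in as a parameter keeps the helper easy to exercise against a stub before running it against a live organization.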

5.2.4 - Manage users, groups, and roles with SCIM

Watch a video demonstrating SCIM in action (12 min)

Overview

The System for Cross-domain Identity Management (SCIM) API allows instance or organization admins to manage users, groups, and custom roles in their W&B organization. SCIM groups map to W&B teams.

The SCIM API is accessible at <host-url>/scim/ and supports the /Users and /Groups endpoints with a subset of the fields defined in RFC 7643. It additionally includes the /Roles endpoints, which are not part of the official SCIM schema. W&B adds the /Roles endpoints to support automated management of custom roles in W&B organizations.

If you are an admin of multiple Enterprise SaaS Cloud organizations, you must configure the organization where SCIM API requests are sent. Click your profile image, then click User Settings. The setting is named Default API organization. This is required for all hosting options, including Dedicated Cloud, Self-managed instances, and SaaS Cloud. In SaaS Cloud, the organization admin must configure the default organization in user settings to ensure that the SCIM API requests go to the right organization.

The chosen hosting option determines the value for the <host-url> placeholder used in the examples in this page.

In addition, examples use user IDs such as abc and def. Real requests and responses have hashed values for user IDs.

Authentication

Access to the SCIM API can be authenticated in two ways:

Users

An organization or instance admin can use basic authentication with their API key to access the SCIM API. Set the HTTP request’s Authorization header to the string Basic followed by a space, then the base-64 encoded string in the format username:API-KEY. In other words, replace the username and API key with your values separated with a : character, then base-64-encode the result. For example, to authorize as demo:p@55w0rd, the header should be Authorization: Basic ZGVtbzpwQDU1dzByZA==.

Service accounts

An organization service account with the admin role can access the SCIM API. The username is left blank and only the API key is used. Find the API key for service accounts in the Service account tab in the organization dashboard. Refer to Organization-scoped service accounts.

Set the HTTP request’s Authorization header to the string Basic followed by a space, then the base-64 encoded string in the format :API-KEY (notice the colon at the beginning with no username). For example, to authorize with only an API key such as sa-p@55w0rd, set the header to: Authorization: Basic OnNhLXBANTV3MHJk.
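
The two header formats described above can be produced with a few lines of standard-library Python. This is a sketch; the helper name `scim_auth_header` is hypothetical.

```python
import base64


def scim_auth_header(api_key: str, username: str = "") -> dict:
    """Return the Authorization header for a SCIM request.

    Leave `username` empty for a service account, which encodes ":API-KEY".
    """
    credentials = f"{username}:{api_key}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

Calling `scim_auth_header("p@55w0rd", "demo")` reproduces the user example above, and `scim_auth_header("sa-p@55w0rd")` reproduces the service account example.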

User Management

The SCIM user resource maps to W&B users. Use these endpoints to manage users in your organization.

Get User

Retrieves information for a specific user in your organization.

Endpoint

URL: <host-url>/scim/Users/{id}
Method: GET

Parameters

Parameter	Type	Required	Description
id	string	Yes	The unique ID of the user

Example

GET /scim/Users/abc

(Status 200)

{
    "active": true,
    "displayName": "Dev User 1",
    "emails": {
        "Value": "dev-user1@test.com",
        "Display": "",
        "Type": "",
        "Primary": true
    },
    "id": "abc",
    "meta": {
        "resourceType": "User",
        "created": "2023-10-01T00:00:00Z",
        "lastModified": "2023-10-01T00:00:00Z",
        "location": "Users/abc"
    },
    "schemas": [
        "urn:ietf:params:scim:schemas:core:2.0:User"
    ],
    "userName": "dev-user1"
}

List Users

Retrieves a list of all users in your organization.

Endpoint

URL: <host-url>/scim/Users
Method: GET

Example

GET /scim/Users

(Status 200)

{
    "Resources": [
        {
            "active": true,
            "displayName": "Dev User 1",
            "emails": {
                "Value": "dev-user1@test.com",
                "Display": "",
                "Type": "",
                "Primary": true
            },
            "id": "abc",
            "meta": {
                "resourceType": "User",
                "created": "2023-10-01T00:00:00Z",
                "lastModified": "2023-10-01T00:00:00Z",
                "location": "Users/abc"
            },
            "schemas": [
                "urn:ietf:params:scim:schemas:core:2.0:User"
            ],
            "userName": "dev-user1"
        }
    ],
    "itemsPerPage": 9999,
    "schemas": [
        "urn:ietf:params:scim:api:messages:2.0:ListResponse"
    ],
    "startIndex": 1,
    "totalResults": 1
}
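
The following sketch shows one way to call the list endpoint from Python using only the standard library and to read the `Resources` array of the response. The helper names (`list_users_request`, `send`, `usernames_by_id`) are hypothetical, and the `auth_header` argument is assumed to be a dictionary containing the Authorization header described in the Authentication section.

```python
import json
import urllib.request


def list_users_request(host_url: str, auth_header: dict) -> urllib.request.Request:
    """Build (but do not send) a GET request for <host-url>/scim/Users."""
    url = host_url.rstrip("/") + "/scim/Users"
    return urllib.request.Request(url, headers=auth_header, method="GET")


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def usernames_by_id(list_response: dict) -> dict:
    """Map each user's id to its userName from a ListResponse body."""
    return {user["id"]: user["userName"] for user in list_response.get("Resources", [])}
```

Separating request construction from sending makes it possible to inspect the URL and headers before anything reaches your W&B instance.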

Create User

Creates a new user in your organization.

Endpoint

URL: <host-url>/scim/Users
Method: POST

Parameters

Parameter	Type	Required	Description
emails	array	Yes	Array of email objects. Must include a primary email
userName	string	Yes	The username for the new user

Example

POST /scim/Users

{
    "schemas": [
        "urn:ietf:params:scim:schemas:core:2.0:User"
    ],
    "emails": [
        {
            "primary": true,
            "value": "dev-user2@test.com"
        }
    ],
    "userName": "dev-user2"
}

POST /scim/Users

{
    "schemas": [
        "urn:ietf:params:scim:schemas:core:2.0:User",
        "urn:ietf:params:scim:schemas:extension:teams:2.0:User"
    ],
    "emails": [
        {
            "primary": true,
            "value": "dev-user2@test.com"
        }
    ],
    "userName": "dev-user2",
    "urn:ietf:params:scim:schemas:extension:teams:2.0:User": {
        "teams": ["my-team"]
    }
}

Response

(Status 201)

{
    "active": true,
    "displayName": "Dev User 2",
    "emails": {
        "Value": "dev-user2@test.com",
        "Display": "",
        "Type": "",
        "Primary": true
    },
    "id": "def",
    "meta": {
        "resourceType": "User",
        "created": "2023-10-01T00:00:00Z",
        "location": "Users/def"
    },
    "schemas": [
        "urn:ietf:params:scim:schemas:core:2.0:User"
    ],
    "userName": "dev-user2"
}

(Status 201)

{
    "active": true,
    "displayName": "Dev User 2",
    "emails": {
        "Value": "dev-user2@test.com",
        "Display": "",
        "Type": "",
        "Primary": true
    },
    "id": "def",
    "meta": {
        "resourceType": "User",
        "created": "2023-10-01T00:00:00Z",
        "location": "Users/def"
    },
    "schemas": [
        "urn:ietf:params:scim:schemas:core:2.0:User",
        "urn:ietf:params:scim:schemas:extension:teams:2.0:User"
    ],
    "userName": "dev-user2",
    "organizationRole": "member",
    "teamRoles": [
        {
            "teamName": "my-team",
            "roleName": "member"
        }
    ],
    "groups": [
        {
            "value": "my-team-id"
        }
    ]
}
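
The two request bodies shown above differ only in the teams extension. A small helper can build either one. This is a sketch, and the function name `create_user_body` is hypothetical.

```python
USER_SCHEMA = "urn:ietf:params:scim:schemas:core:2.0:User"
TEAMS_EXTENSION = "urn:ietf:params:scim:schemas:extension:teams:2.0:User"


def create_user_body(email, username, teams=None):
    """Build the JSON body for POST /scim/Users as a Python dict."""
    body = {
        "schemas": [USER_SCHEMA],
        "emails": [{"primary": True, "value": email}],
        "userName": username,
    }
    if teams:
        # Add the teams extension only when team names are provided.
        body["schemas"].append(TEAMS_EXTENSION)
        body[TEAMS_EXTENSION] = {"teams": list(teams)}
    return body
```

Serialize the result with `json.dumps` before sending it as the request body.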

Delete User

Maintain admin access

You must ensure that at least one admin user exists in your instance or organization at all times. Otherwise, no user will be able to configure or maintain your organization’s W&B account. If an organization uses SCIM or another automated process to deprovision users from W&B, a deprovisioning operation could inadvertently remove the last remaining admin from the instance or organization.

For assistance with developing operational procedures, or to restore admin access, contact support.

Fully deletes a user from your organization.

Endpoint

URL: <host-url>/scim/Users/{id}
Method: DELETE

Parameters

Parameter	Type	Required	Description
id	string	Yes	The unique ID of the user to delete

Example

DELETE /scim/Users/abc

(Status 204)

To temporarily deactivate the user, refer to Deactivate user API which uses the PATCH endpoint.

Deactivate User

Temporarily deactivates a user in your organization.

Endpoint

URL: <host-url>/scim/Users/{id}
Method: PATCH

Parameters

Parameter	Type	Required	Description
id	string	Yes	The unique ID of the user to deactivate
op	string	Yes	Must be “replace”
value	object	Yes	Object with `{"active": false}`

User deactivation and reactivation operations are not supported in SaaS Cloud.

Example

PATCH /scim/Users/abc

{
    "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
    "Operations": [
        {
            "op": "replace",
            "value": {"active": false}
        }
    ]
}

(Status 200)

{
    "active": false,
    "displayName": "Dev User 1",
    "emails": {
        "Value": "dev-user1@test.com",
        "Display": "",
        "Type": "",
        "Primary": true
    },
    "id": "abc",
    "meta": {
        "resourceType": "User",
        "created": "2023-10-01T00:00:00Z",
        "lastModified": "2023-10-01T00:00:00Z",
        "location": "Users/abc"
    },
    "schemas": [
        "urn:ietf:params:scim:schemas:core:2.0:User"
    ],
    "userName": "dev-user1"
}

Reactivate User

Reactivates a previously deactivated user in your organization.

Endpoint

URL: <host-url>/scim/Users/{id}
Method: PATCH

Parameters

Parameter	Type	Required	Description
id	string	Yes	The unique ID of the user to reactivate
op	string	Yes	Must be “replace”
value	object	Yes	Object with `{"active": true}`

User deactivation and reactivation operations are not supported in SaaS Cloud.

Example

PATCH /scim/Users/abc

{
    "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
    "Operations": [
        {
            "op": "replace",
            "value": {"active": true}
        }
    ]
}

(Status 200)

{
    "active": true,
    "displayName": "Dev User 1",
    "emails": {
        "Value": "dev-user1@test.com",
        "Display": "",
        "Type": "",
        "Primary": true
    },
    "id": "abc",
    "meta": {
        "resourceType": "User",
        "created": "2023-10-01T00:00:00Z",
        "lastModified": "2023-10-01T00:00:00Z",
        "location": "Users/abc"
    },
    "schemas": [
        "urn:ietf:params:scim:schemas:core:2.0:User"
    ],
    "userName": "dev-user1"
}
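
The deactivate and reactivate requests share the same shape and differ only in the boolean value. As a sketch, with the hypothetical helper name `set_active_body`:

```python
import json

PATCH_OP_SCHEMA = "urn:ietf:params:scim:api:messages:2.0:PatchOp"


def set_active_body(active: bool) -> str:
    """Return the JSON body for PATCH /scim/Users/{id} that sets `active`."""
    body = {
        "schemas": [PATCH_OP_SCHEMA],
        "Operations": [{"op": "replace", "value": {"active": active}}],
    }
    return json.dumps(body)
```

Use `set_active_body(False)` to deactivate a user and `set_active_body(True)` to reactivate one.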

Assign Organization Role

Assigns an organization-level role to a user.

Endpoint

URL: <host-url>/scim/Users/{id}
Method: PATCH

Parameters

Parameter	Type	Required	Description
id	string	Yes	The unique ID of the user
op	string	Yes	Must be “replace”
path	string	Yes	Must be “organizationRole”
value	string	Yes	Role name (“admin” or “member”)

The viewer role is deprecated and can no longer be set in the UI. W&B assigns the member role to a user if you attempt to assign the viewer role using SCIM. The user is automatically provisioned with Models and Weave seats if possible. Otherwise, a Seat limit reached error is logged. For organizations that use Registry, the user is automatically assigned the viewer role in registries that are visible at the organization level.

Example

PATCH /scim/Users/abc

{
    "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
    "Operations": [
        {
            "op": "replace",
            "path": "organizationRole",
            "value": "admin"
        }
    ]
}

(Status 200)

{
    "active": true,
    "displayName": "Dev User 1",
    "emails": {
        "Value": "dev-user1@test.com",
        "Display": "",
        "Type": "",
        "Primary": true
    },
    "id": "abc",
    "meta": {
        "resourceType": "User",
        "created": "2023-10-01T00:00:00Z",
        "lastModified": "2023-10-01T00:00:00Z",
        "location": "Users/abc"
    },
    "schemas": [
        "urn:ietf:params:scim:schemas:core:2.0:User"
    ],
    "userName": "dev-user1",
    "teamRoles": [
        {
            "teamName": "team1",
            "roleName": "admin"
        }
    ],
    "organizationRole": "admin"
}

Assign Team Role

Assigns a team-level role to a user.

Endpoint

URL: <host-url>/scim/Users/{id}
Method: PATCH

Parameters

Parameter	Type	Required	Description
id	string	Yes	The unique ID of the user
op	string	Yes	Must be “replace”
path	string	Yes	Must be “teamRoles”
value	array	Yes	Array of objects with `teamName` and `roleName`

Example

PATCH /scim/Users/abc

{
    "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
    "Operations": [
        {
            "op": "replace",
            "path": "teamRoles",
            "value": [
                {
                    "roleName": "admin",
                    "teamName": "team1"
                }
            ]
        }
    ]
}

(Status 200)

{
    "active": true,
    "displayName": "Dev User 1",
    "emails": {
        "Value": "dev-user1@test.com",
        "Display": "",
        "Type": "",
        "Primary": true
    },
    "id": "abc",
    "meta": {
        "resourceType": "User",
        "created": "2023-10-01T00:00:00Z",
        "lastModified": "2023-10-01T00:00:00Z",
        "location": "Users/abc"
    },
    "schemas": [
        "urn:ietf:params:scim:schemas:core:2.0:User"
    ],
    "userName": "dev-user1",
    "teamRoles": [
        {
            "teamName": "team1",
            "roleName": "admin"
        }
    ],
    "organizationRole": "admin"
}
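
Both role assignments are PatchOp requests that differ in `path` and `value`. The sketch below builds each body. The helper names are hypothetical, and rejecting any organization role other than `admin` or `member` on the client side is a design choice of this example rather than documented API behavior.

```python
PATCH_OP_SCHEMA = "urn:ietf:params:scim:api:messages:2.0:PatchOp"


def _patch_body(path, value):
    # Wrap a single "replace" operation in a PatchOp message.
    return {
        "schemas": [PATCH_OP_SCHEMA],
        "Operations": [{"op": "replace", "path": path, "value": value}],
    }


def assign_org_role_body(role):
    """Body for assigning an organization-level role."""
    if role not in ("admin", "member"):
        raise ValueError("organization role must be 'admin' or 'member'")
    return _patch_body("organizationRole", role)


def assign_team_roles_body(team_roles):
    """Body for assigning team-level roles from a {team name: role name} dict."""
    value = [{"roleName": role, "teamName": team} for team, role in team_roles.items()]
    return _patch_body("teamRoles", value)
```

For example, `assign_team_roles_body({"team1": "admin"})` produces the same body as the Assign Team Role request above.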

Group resource

The SCIM group resource maps to W&B teams. That is, when you create a SCIM group in a W&B deployment, it creates a W&B team. The same applies to the other group endpoints.

Get team

Endpoint: <host-url>/scim/Groups/{id}
Method: GET
Description: Retrieve team information by providing the team’s unique ID.
Request Example:

GET /scim/Groups/ghi

Response Example:

(Status 200)

{
    "displayName": "wandb-devs",
    "id": "ghi",
    "members": [
        {
            "Value": "abc",
            "Ref": "",
            "Type": "",
            "Display": "dev-user1"
        }
    ],
    "meta": {
        "resourceType": "Group",
        "created": "2023-10-01T00:00:00Z",
        "lastModified": "2023-10-01T00:00:00Z",
        "location": "Groups/ghi"
    },
    "schemas": [
        "urn:ietf:params:scim:schemas:core:2.0:Group"
    ]
}

List teams

Endpoint: <host-url>/scim/Groups
Method: GET
Description: Retrieve a list of teams.
Request Example:

GET /scim/Groups

Response Example:

(Status 200)

{
    "Resources": [
        {
            "displayName": "wandb-devs",
            "id": "ghi",
            "members": [
                {
                    "Value": "abc",
                    "Ref": "",
                    "Type": "",
                    "Display": "dev-user1"
                }
            ],
            "meta": {
                "resourceType": "Group",
                "created": "2023-10-01T00:00:00Z",
                "lastModified": "2023-10-01T00:00:00Z",
                "location": "Groups/ghi"
            },
            "schemas": [
                "urn:ietf:params:scim:schemas:core:2.0:Group"
            ]
        }
    ],
    "itemsPerPage": 9999,
    "schemas": [
        "urn:ietf:params:scim:api:messages:2.0:ListResponse"
    ],
    "startIndex": 1,
    "totalResults": 1
}

Create team

Endpoint: <host-url>/scim/Groups
Method: POST
Description: Create a new team resource.
Supported Fields:

Field	Type	Required
displayName	String	Yes
members	Multi-Valued Array	Yes (`value` sub-field is required and maps to a user ID)

Request Example:

Creating a team called wandb-support with dev-user2 as its member.

POST /scim/Groups

{
    "schemas": ["urn:ietf:params:scim:schemas:core:2.0:Group"],
    "displayName": "wandb-support",
    "members": [
        {
            "value": "def"
        }
    ]
}

Response Example:

(Status 201)

{
    "displayName": "wandb-support",
    "id": "jkl",
    "members": [
        {
            "Value": "def",
            "Ref": "",
            "Type": "",
            "Display": "dev-user2"
        }
    ],
    "meta": {
        "resourceType": "Group",
        "created": "2023-10-01T00:00:00Z",
        "lastModified": "2023-10-01T00:00:00Z",
        "location": "Groups/jkl"
    },
    "schemas": [
        "urn:ietf:params:scim:schemas:core:2.0:Group"
    ]
}

Update team

Endpoint: <host-url>/scim/Groups/{id}
Method: PATCH
Description: Update an existing team’s membership list.
Supported Operations: add member, remove member

The remove operations follow RFC 7644 SCIM protocol specifications. Use the filter syntax members[value eq "{user_id}"] to remove a specific user, or members to remove all users from the team.

Replace {team_id} with the actual team ID and {user_id} with the actual user ID in your requests.

Adding a user to a team

Adding dev-user2 to wandb-devs:

PATCH /scim/Groups/{team_id}

{
    "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
    "Operations": [
        {
            "op": "add",
            "path": "members",
            "value": [
                {
                    "value": "{user_id}"
                }
            ]
        }
    ]
}

(Status 200)

{
    "displayName": "wandb-devs",
    "id": "ghi",
    "members": [
        {
            "Value": "abc",
            "Ref": "",
            "Type": "",
            "Display": "dev-user1"
        },
        {
            "Value": "def",
            "Ref": "",
            "Type": "",
            "Display": "dev-user2"
        }
    ],
    "meta": {
        "resourceType": "Group",
        "created": "2023-10-01T00:00:00Z",
        "lastModified": "2023-10-01T00:01:00Z",
        "location": "Groups/ghi"
    },
    "schemas": [
        "urn:ietf:params:scim:schemas:core:2.0:Group"
    ]
}

Removing a specific user from a team

Removing dev-user2 from wandb-devs:

PATCH /scim/Groups/{team_id}

{
    "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
    "Operations": [
        {
            "op": "remove",
            "path": "members[value eq \"{user_id}\"]"
        }
    ]
}

(Status 200)

{
    "displayName": "wandb-devs",
    "id": "ghi",
    "members": [
        {
            "Value": "abc",
            "Ref": "",
            "Type": "",
            "Display": "dev-user1"
        }
    ],
    "meta": {
        "resourceType": "Group",
        "created": "2023-10-01T00:00:00Z",
        "lastModified": "2023-10-01T00:01:00Z",
        "location": "Groups/ghi"
    },
    "schemas": [
        "urn:ietf:params:scim:schemas:core:2.0:Group"
    ]
}

Removing all users from a team

Removing all users from wandb-devs:

PATCH /scim/Groups/{team_id}

{
    "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
    "Operations": [
        {
            "op": "remove",
            "path": "members"
        }
    ]
}

(Status 200)

{
    "displayName": "wandb-devs",
    "id": "ghi",
    "members": null,
    "meta": {
        "resourceType": "Group",
        "created": "2023-10-01T00:00:00Z",
        "lastModified": "2023-10-01T00:01:00Z",
        "location": "Groups/ghi"
    },
    "schemas": [
        "urn:ietf:params:scim:schemas:core:2.0:Group"
    ]
}
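
The three membership operations above can be generated from a pair of small helpers. This is a sketch with hypothetical function names. Note how the filter path for removing a single user embeds the user ID in double quotes, which `json.dumps` escapes when the body is serialized.

```python
import json

PATCH_OP_SCHEMA = "urn:ietf:params:scim:api:messages:2.0:PatchOp"


def add_members_op(user_ids):
    """Operation that adds the given user IDs to the team."""
    return {"op": "add", "path": "members", "value": [{"value": uid} for uid in user_ids]}


def remove_members_op(user_id=None):
    """Operation that removes one user, or every user when `user_id` is None."""
    path = "members" if user_id is None else f'members[value eq "{user_id}"]'
    return {"op": "remove", "path": path}


def team_patch_body(*operations):
    """Serialize one or more operations as a PatchOp request body."""
    return json.dumps({"schemas": [PATCH_OP_SCHEMA], "Operations": list(operations)})
```

For example, `team_patch_body(remove_members_op("def"))` yields the body for removing the user with ID `def`, while `team_patch_body(remove_members_op())` removes all users from the team.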

Delete team

Deleting teams is currently unsupported by the SCIM API since there is additional data linked to teams. Delete teams from the app to confirm you want everything deleted.

Role resource

The SCIM role resource maps to W&B custom roles. As mentioned earlier, the /Roles endpoints are not part of the official SCIM schema. W&B adds the /Roles endpoints to support automated management of custom roles in W&B organizations.

Get custom role

Endpoint: <host-url>/scim/Roles/{id}
Method: GET
Description: Retrieve information for a custom role by providing the role’s unique ID.
Request Example:

GET /scim/Roles/abc

Response Example:

(Status 200)

{
    "description": "A sample custom role for example",
    "id": "Um9sZTo3",
    "inheritedFrom": "member", // indicates the predefined role
    "meta": {
        "resourceType": "Role",
        "created": "2023-11-20T23:10:14Z",
        "lastModified": "2023-11-20T23:31:23Z",
        "location": "Roles/Um9sZTo3"
    },
    "name": "Sample custom role",
    "organizationID": "T3JnYW5pemF0aW9uOjE0ODQ1OA==",
    "permissions": [
        {
            "name": "artifact:read",
            "isInherited": true // inherited from member predefined role
        },
        ...
        ...
        {
            "name": "project:update",
            "isInherited": false // custom permission added by admin
        }
    ],
    "schemas": [
        ""
    ]
}

List custom roles

Endpoint: <host-url>/scim/Roles
Method: GET
Description: Retrieve information for all custom roles in the W&B organization
Request Example:

GET /scim/Roles

Response Example:

(Status 200)

{
   "Resources": [
        {
            "description": "A sample custom role for example",
            "id": "Um9sZTo3",
            "inheritedFrom": "member", // indicates the predefined role that the custom role inherits from
            "meta": {
                "resourceType": "Role",
                "created": "2023-11-20T23:10:14Z",
                "lastModified": "2023-11-20T23:31:23Z",
                "location": "Roles/Um9sZTo3"
            },
            "name": "Sample custom role",
            "organizationID": "T3JnYW5pemF0aW9uOjE0ODQ1OA==",
            "permissions": [
                {
                    "name": "artifact:read",
                    "isInherited": true // inherited from member predefined role
                },
                ...
                ...
                {
                    "name": "project:update",
                    "isInherited": false // custom permission added by admin
                }
            ],
            "schemas": [
                ""
            ]
        },
        {
            "description": "Another sample custom role for example",
            "id": "Um9sZToxMg==",
            "inheritedFrom": "viewer", // indicates the predefined role that the custom role inherits from
            "meta": {
                "resourceType": "Role",
                "created": "2023-11-21T01:07:50Z",
                "location": "Roles/Um9sZToxMg=="
            },
            "name": "Sample custom role 2",
            "organizationID": "T3JnYW5pemF0aW9uOjE0ODQ1OA==",
            "permissions": [
                {
                    "name": "launchagent:read",
                    "isInherited": true // inherited from viewer predefined role
                },
                ...
                ...
                {
                    "name": "run:stop",
                    "isInherited": false // custom permission added by admin
                }
            ],
            "schemas": [
                ""
            ]
        }
    ],
    "itemsPerPage": 9999,
    "schemas": [
        "urn:ietf:params:scim:api:messages:2.0:ListResponse"
    ],
    "startIndex": 1,
    "totalResults": 2
}

Create custom role

Endpoint: <host-url>/scim/Roles
Method: POST
Description: Create a new custom role in the W&B organization.
Supported Fields:

Field	Type	Description
name	String	Name of the custom role
description	String	Description of the custom role
permissions	Object array	Array of permission objects where each object includes a `name` string field that has value of the form `w&bobject:operation`. For example, a permission object for delete operation on W&B runs would have `name` as `run:delete`.
inheritedFrom	String	The predefined role which the custom role would inherit from. It can either be `member` or `viewer`.

Request Example:

POST /scim/Roles

{
    "schemas": ["urn:ietf:params:scim:schemas:core:2.0:Role"],
    "name": "Sample custom role",
    "description": "A sample custom role for example",
    "permissions": [
        {
            "name": "project:update"
        }
    ],
    "inheritedFrom": "member"
}

Response Example:

(Status 201)

{
    "description": "A sample custom role for example",
    "id": "Um9sZTo3",
    "inheritedFrom": "member", // indicates the predefined role
    "meta": {
        "resourceType": "Role",
        "created": "2023-11-20T23:10:14Z",
        "lastModified": "2023-11-20T23:31:23Z",
        "location": "Roles/Um9sZTo3"
    },
    "name": "Sample custom role",
    "organizationID": "T3JnYW5pemF0aW9uOjE0ODQ1OA==",
    "permissions": [
        {
            "name": "artifact:read",
            "isInherited": true // inherited from member predefined role
        },
        ...
        ...
        {
            "name": "project:update",
            "isInherited": false // custom permission added by admin
        }
    ],
    "schemas": [
        ""
    ]
}
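
As a sketch, the request body for creating a custom role can be assembled and checked in Python before it is sent. The helper name `create_role_body` is hypothetical. The permission check is an assumption of this example: it accepts any two non-empty parts separated by a single colon, which matches the `object:operation` form described above but is not an official validation rule.

```python
import re

ROLE_SCHEMA = "urn:ietf:params:scim:schemas:core:2.0:Role"
# Assumed shape of a permission name: "<object>:<operation>".
_PERMISSION = re.compile(r"[^:\s]+:[^:\s]+")


def create_role_body(name, description, permissions, inherited_from="member"):
    """Build the JSON body for POST /scim/Roles as a Python dict."""
    if inherited_from not in ("member", "viewer"):
        raise ValueError("inheritedFrom must be 'member' or 'viewer'")
    for permission in permissions:
        if _PERMISSION.fullmatch(permission) is None:
            raise ValueError(f"unexpected permission name: {permission!r}")
    return {
        "schemas": [ROLE_SCHEMA],
        "name": name,
        "description": description,
        "permissions": [{"name": permission} for permission in permissions],
        "inheritedFrom": inherited_from,
    }
```

Calling `create_role_body("Sample custom role", "A sample custom role for example", ["project:update"])` reproduces the request example above.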

Delete custom role

Endpoint: <host-url>/scim/Roles/{id}
Method: DELETE
Description: Delete a custom role in the W&B organization. Use it with caution. The predefined role from which the custom role inherited is now assigned to all users that were assigned the custom role before the operation.
Request Example:

DELETE /scim/Roles/abc

5.2.5 - Advanced IAM configuration

In addition to basic environment variables, you can use environment variables to configure IAM options for your Dedicated Cloud or Self-managed instance.

Choose any of the following environment variables for your instance depending on your IAM needs.

Environment variable	Description
`DISABLE_SSO_PROVISIONING`	Set this to `true` to turn off user auto-provisioning in your W&B instance.
`SESSION_LENGTH`	If you would like to change the default user session expiry time, set this variable to the desired number of hours. For example, set SESSION_LENGTH to `24` to configure session expiry time to 24 hours. The default value is 720 hours.
`GORILLA_ENABLE_SSO_GROUP_CLAIMS`	If you are using OIDC based SSO, set this variable to `true` to automate W&B team membership in your instance based on your OIDC groups. Add a `groups` claim to each user’s OIDC token. The claim should be a string array where each entry is the name of a W&B team that the user should belong to. The array should include all the teams that the user is a part of.
`GORILLA_LDAP_GROUP_SYNC`	If you are using LDAP based SSO, set it to `true` to automate W&B team membership in your instance based on your LDAP groups.
`GORILLA_OIDC_CUSTOM_SCOPES`	If you are using OIDC based SSO, you can specify additional scopes that W&B instance should request from your identity provider. W&B does not change the SSO functionality due to these custom scopes in any way.
`GORILLA_USE_IDENTIFIER_CLAIMS`	If you are using OIDC based SSO, set this variable to `true` to enforce username and full name of your users using specific OIDC claims from your identity provider. If set, ensure that you configure the enforced username and full name in the `preferred_username` and `name` OIDC claims respectively. Usernames can only contain alphanumeric characters along with underscores and hyphens as special characters.
`GORILLA_DISABLE_PERSONAL_ENTITY`	Set this to `true` to turn off personal entities. This prevents users from creating new personal projects in their personal entities and from writing to existing personal projects.
`GORILLA_DISABLE_ADMIN_TEAM_ACCESS`	Set this to `true` to restrict Organization or Instance Admins from self-joining or adding themselves to a W&B team, thus ensuring that only Data & AI personas have access to the projects within the teams.
`WANDB_IDENTITY_TOKEN_FILE`	For identity federation, the absolute path to the local directory where JSON Web Tokens (JWTs) are stored.

W&B advises you to exercise caution and understand all implications before enabling some of these settings, such as GORILLA_DISABLE_ADMIN_TEAM_ACCESS. Reach out to your W&B team with any questions.
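
Before enabling `GORILLA_ENABLE_SSO_GROUP_CLAIMS` or `GORILLA_USE_IDENTIFIER_CLAIMS`, it can help to check that your identity provider emits claims in the shape described in the table. The following sketch inspects an already-decoded token payload. The function name `check_oidc_claims` is hypothetical, and treating "alphanumeric" as ASCII letters and digits is an assumption.

```python
import re

# Usernames: alphanumeric characters plus underscores and hyphens.
# Assumption: "alphanumeric" means ASCII letters and digits.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def check_oidc_claims(claims):
    """Return a list of problems found in a decoded OIDC token payload."""
    problems = []
    groups = claims.get("groups")
    if not isinstance(groups, list) or not all(isinstance(g, str) for g in groups):
        problems.append("groups must be an array of W&B team names")
    username = claims.get("preferred_username") or ""
    if not USERNAME_PATTERN.fullmatch(username):
        problems.append("preferred_username is missing or has invalid characters")
    if not claims.get("name"):
        problems.append("name (full name) is missing")
    return problems


good = {"groups": ["team-a"], "preferred_username": "jane_doe-1", "name": "Jane Doe"}
bad = {"groups": "team-a", "preferred_username": "jane doe"}
print(check_oidc_claims(good))
print(check_oidc_claims(bad))
```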

5.3 - Data security

5.3.1 - Bring your own bucket (BYOB)

Overview

Bring your own bucket (BYOB) allows you to store W&B artifacts and other related sensitive data in your own cloud or on-prem infrastructure. With Dedicated Cloud or Multi-tenant Cloud, data that you store in your bucket is not copied to the W&B managed infrastructure.

Communication between W&B SDK / CLI / UI and your buckets occurs using pre-signed URLs.
W&B uses a garbage collection process to delete W&B Artifacts. For more information, see Deleting Artifacts.
You can specify a sub-path when configuring a bucket, to ensure that W&B does not store any files in a folder at the root of the bucket. It can help you better conform to your organization’s bucket governance policy.

Data stored in the central database vs buckets

When using BYOB functionality, certain types of data will be stored in the W&B central database, and other types will be stored in your bucket.

Database

Metadata for users, teams, artifacts, experiments, and projects
Reports
Experiment logs
System metrics
Console logs

Buckets

Experiment files and metrics
Artifact files
Media files
Run files
Exported history metrics and system events in Parquet format

Bucket scopes

There are two scopes you can configure your storage bucket to:

Scope	Description
Instance level	In Dedicated Cloud and Self-Managed, any user with the required permissions within your organization or instance can access files stored in your instance’s storage bucket. Not applicable to Multi-tenant Cloud.
Team level	If a W&B Team is configured to use a Team level storage bucket, team members can access files stored in it. Team level storage buckets allow greater data access control and data isolation for teams with highly sensitive data or strict compliance requirements. Team level storage can help different business units or departments sharing an instance to efficiently use the infrastructure and administrative resources. It can also allow separate project teams to manage AI workflows for separate customer engagements. Available for all deployment types. You configure team level BYOB when setting up the team.

This flexible design allows for many different storage topologies, depending on your organization’s needs. For example:

The same bucket can be used for the instance and one or more teams.
Each team can use a separate bucket, some teams can choose to write to the instance bucket, or multiple teams can share a bucket by writing to subpaths.
Buckets for different teams can be hosted in different cloud infrastructure environments or regions, and can be managed by different storage admin teams.

For example, suppose you have a team called Kappa in your organization. Your organization (and Team Kappa) use the Instance level storage bucket by default. Next, you create a team called Omega. When you create Team Omega, you configure a Team level storage bucket for that team. Files generated by Team Omega are not accessible by Team Kappa. However, files created by Team Kappa are accessible by Team Omega. If you want to isolate data for Team Kappa, you must configure a Team level storage bucket for them as well.
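
The Kappa and Omega example can be expressed as a toy rule in Python. This is only an illustration of the scoping described above, not how W&B implements access control. The function name, the dictionary layout, and the bucket name are hypothetical, and the sketch does not model teams that share a bucket.

```python
def can_access(reader_team, owner_team, team_buckets):
    """Toy rule: can reader_team reach files written by owner_team?

    team_buckets maps a team name to its Team level bucket, or to None
    when the team writes to the Instance level bucket.
    """
    if team_buckets.get(owner_team) is None:
        # Instance level bucket: reachable by any user in the organization
        # who has the required permissions.
        return True
    # Team level bucket: reachable by members of that team only.
    return reader_team == owner_team


team_buckets = {"Kappa": None, "Omega": "omega-bucket"}  # hypothetical bucket name
print(can_access("Kappa", "Omega", team_buckets))  # False
print(can_access("Omega", "Kappa", team_buckets))  # True
```

Giving Team Kappa its own entry in `team_buckets` makes the second call return `False`, which mirrors the isolation step described above.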

Availability matrix

W&B can connect to the following storage providers:

CoreWeave AI Object Storage is a high-performance, S3-compatible object storage service optimized for AI workloads.
Amazon S3 is an object storage service offering industry-leading scalability, data availability, security, and performance.
Google Cloud Storage is a managed service for storing unstructured data at scale.
Azure Blob Storage is a cloud-based object storage solution for storing massive amounts of unstructured data like text, binary data, images, videos, and logs.
S3-compatible storage like MinIO hosted in your cloud or infrastructure on your premises.

The following table shows the availability of BYOB at each scope for each W&B deployment type.

W&B deployment type	Instance level	Team level	Additional information
Dedicated Cloud	✓	✓	Instance and team level BYOB are supported for CoreWeave AI Object Storage, Amazon S3, GCP Storage, Microsoft Azure Blob Storage, and S3-compatible storage like MinIO hosted in your cloud or on-premises infrastructure.
Multi-tenant Cloud	Not Applicable	✓	Team level BYOB is supported for CoreWeave AI Object Storage, Amazon S3, and GCP Storage. W&B fully manages the default and only storage bucket for Microsoft Azure.
Self-Managed	✓	✓	Instance and team level BYOB are supported for CoreWeave AI Object Storage, Amazon S3, GCP Storage, Microsoft Azure Blob Storage, and S3-compatible storage like MinIO hosted in your cloud or infrastructure on your premises.

The following sections guide you through the process of setting up BYOB.

Provision your bucket

After verifying availability, you are ready to provision your storage bucket, including its access policy and CORS. Select a tab to continue.

Requirements:

Dedicated Cloud or Self-Hosted v0.70.0 or newer, or Multi-tenant Cloud.
A CoreWeave account with AI Object Storage enabled and with permission to create buckets, API access keys, and secret keys.
Your W&B instance must be able to connect to CoreWeave network endpoints.

For details, see Create a CoreWeave AI Object Storage bucket in the CoreWeave documentation.

Multi-tenant Cloud: Obtain your organization ID, which is required for your bucket policy.
1. Log in to the W&B App.
2. In the left navigation, click Create a new team.
3. In the drawer that opens, copy the W&B organization ID, which is located above Invite team members.
4. Leave this page open. You will use it to configure W&B.
In CoreWeave, create the bucket with a name of your choice in your preferred CoreWeave availability zone. Optionally create a folder for W&B to use as a sub-path for all W&B files. Make a note of the bucket name, availability zone, API access key, secret key, and sub-path.

Set the following Cross-origin resource sharing (CORS) policy for the bucket:

[
  {
    "AllowedHeaders": [
      "*"
    ],
    "AllowedMethods": [
      "GET",
      "HEAD",
      "PUT"
    ],
    "AllowedOrigins": [
      "*"
    ],
    "ExposeHeaders": [
      "ETag"
    ],
    "MaxAgeSeconds": 3000
  }
]

CoreWeave storage is S3-compatible. For details about CORS, refer to Configuring cross-origin resource sharing (CORS) in the AWS documentation.

Multi-tenant Cloud: Configure a bucket policy that grants the required permissions for your W&B deployment to access the bucket and generate pre-signed URLs that AI workloads in your cloud infrastructure or user browsers utilize to access the bucket. Refer to Bucket Policy Reference in the CoreWeave documentation.

Replace <cw-bucket> with the CoreWeave bucket name and replace <wb-org-id> with the W&B organization ID you obtained in step 1.

{
  "Version": "2012-10-17",
  "Statement": [
  {
    "Sid": "AllowWandbUser",
    "Action": [
      "s3:GetObject*",
      "s3:GetEncryptionConfiguration",
      "s3:ListBucket",
      "s3:ListBucketMultipartUploads",
      "s3:ListBucketVersions",
      "s3:AbortMultipartUpload",
      "s3:DeleteObject",
      "s3:PutObject",
      "s3:GetBucketCORS",
      "s3:GetBucketLocation",
      "s3:GetBucketVersioning"
    ],
    "Effect": "Allow",
    "Resource": [
      "arn:aws:s3:::<cw-bucket>/*",
      "arn:aws:s3:::<cw-bucket>"
    ],
    "Principal": {
      "CW": "arn:aws:iam::wandb:static/wandb-integration"
    },
    "Condition": {
      "StringLike": {
        "wandb:OrgID": [
          "<wb-org-id>"
        ]
      }
    }
  },
  {
    "Sid": "AllowUsersInOrg",
    "Action": "s3:*",
    "Effect": "Allow",
    "Resource": [
      "arn:aws:s3:::<cw-bucket>",
      "arn:aws:s3:::<cw-bucket>/*"
    ],
    "Principal": {
      "CW": "arn:aws:iam::<cw-storage-org-id>:*"
    }
  }]
}

The clause beginning with "Sid": "AllowUsersInOrg" grants users in your W&B organization direct access to the bucket. If you don’t need this ability, you can omit the clause from your policy.

For details, see Create an S3 bucket in the AWS documentation.

Provision the KMS Key.

W&B requires you to provision a KMS Key to encrypt and decrypt the data on the S3 bucket. The key usage type must be ENCRYPT_DECRYPT. Assign the following policy to the key:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid" : "Internal",
      "Effect" : "Allow",
      "Principal" : { "AWS" : "<Your_Account_Id>" },
      "Action" : "kms:*",
      "Resource" : "<aws_kms_key.key.arn>"
    },
    {
      "Sid" : "External",
      "Effect" : "Allow",
      "Principal" : { "AWS" : "<aws_principal_and_role_arn>" },
      "Action" : [
        "kms:Decrypt",
        "kms:Describe*",
        "kms:Encrypt",
        "kms:ReEncrypt*",
        "kms:GenerateDataKey*"
      ],
      "Resource" : "<aws_kms_key.key.arn>"
    }
  ]
}

Replace <Your_Account_Id> and <aws_kms_key.key.arn> accordingly.

If you are using Multi-tenant Cloud or Dedicated Cloud, replace <aws_principal_and_role_arn> with the corresponding value:

For Multi-tenant Cloud: arn:aws:iam::725579432336:role/WandbIntegration
For Dedicated Cloud: arn:aws:iam::830241207209:root

This policy grants your AWS account full access to the key and also assigns the required permissions to the AWS account hosting the W&B Platform. Keep a record of the KMS Key ARN.

Provision the S3 Bucket.

Follow these steps to provision the S3 bucket in your AWS account:

Create the S3 bucket with a name of your choice. Optionally create a folder which you can configure as sub-path to store all W&B files.
Enable server side encryption, using the KMS key from the previous step.

Configure CORS with the following policy:

[
  {
      "AllowedHeaders": [
          "*"
      ],
      "AllowedMethods": [
          "GET",
          "HEAD",
          "PUT"
      ],
      "AllowedOrigins": [
          "*"
      ],
      "ExposeHeaders": [
          "ETag"
      ],
      "MaxAgeSeconds": 3000
  }
]

If data in your bucket expires due to an object lifecycle management policy, you may lose the ability to read the history of some runs.

Grant the required S3 permissions to the AWS account hosting the W&B Platform, which requires these permissions to generate pre-signed URLs that AI workloads in your cloud infrastructure or user browsers utilize to access the bucket.

{
  "Version": "2012-10-17",
  "Id": "WandBAccess",
  "Statement": [
    {
      "Sid": "WAndBAccountAccess",
      "Effect": "Allow",
      "Principal": { "AWS": "<aws_principal_and_role_arn>" },
        "Action" : [
          "s3:GetObject*",
          "s3:GetEncryptionConfiguration",
          "s3:ListBucket",
          "s3:ListBucketMultipartUploads",
          "s3:ListBucketVersions",
          "s3:AbortMultipartUpload",
          "s3:DeleteObject",
          "s3:PutObject",
          "s3:GetBucketCORS",
          "s3:GetBucketLocation",
          "s3:GetBucketVersioning"
        ],
      "Resource": [
        "arn:aws:s3:::<wandb_bucket>",
        "arn:aws:s3:::<wandb_bucket>/*"
      ]
    }
  ]
}

Replace <wandb_bucket> accordingly and keep a record of the bucket name. Next, configure W&B.

If you are using Multi-tenant Cloud or Dedicated Cloud, replace <aws_principal_and_role_arn> with the corresponding value.

For Multi-tenant Cloud: arn:aws:iam::725579432336:role/WandbIntegration
For Dedicated Cloud: arn:aws:iam::830241207209:root

For more details, see the AWS self-managed hosting guide.
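
If you manage bucket policies from scripts, the S3 policy above can be rendered with the placeholders filled in. The following sketch reuses the actions and principal ARNs listed in this section. The function name `build_bucket_policy`, the deployment keys, and the bucket name are hypothetical, and Self-Managed deployments would need to supply their own principal.

```python
import json

# Principal ARNs listed above for W&B-hosted deployments.
WANDB_PRINCIPALS = {
    "multi-tenant": "arn:aws:iam::725579432336:role/WandbIntegration",
    "dedicated": "arn:aws:iam::830241207209:root",
}

# Actions copied from the bucket policy shown above.
S3_ACTIONS = [
    "s3:GetObject*",
    "s3:GetEncryptionConfiguration",
    "s3:ListBucket",
    "s3:ListBucketMultipartUploads",
    "s3:ListBucketVersions",
    "s3:AbortMultipartUpload",
    "s3:DeleteObject",
    "s3:PutObject",
    "s3:GetBucketCORS",
    "s3:GetBucketLocation",
    "s3:GetBucketVersioning",
]


def build_bucket_policy(bucket, deployment):
    """Return the bucket policy as a dict for the given deployment type."""
    return {
        "Version": "2012-10-17",
        "Id": "WandBAccess",
        "Statement": [
            {
                "Sid": "WAndBAccountAccess",
                "Effect": "Allow",
                "Principal": {"AWS": WANDB_PRINCIPALS[deployment]},
                "Action": S3_ACTIONS,
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            }
        ],
    }


policy = build_bucket_policy("my-wandb-bucket", "dedicated")  # hypothetical bucket
print(json.dumps(policy, indent=2))
```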

For details, see Create a bucket in the GCP documentation.

Provision the GCS bucket.

Follow these steps to provision the GCS bucket in your GCP project:
1. Create the GCS bucket with a name of your choice. Optionally create a folder which you can configure as sub-path to store all W&B files.
2. Set encryption type to Google-managed.
3. Set the CORS policy with gsutil. This is not possible in the UI.
  1. Create a file called cors-policy.json locally.
  2. Copy the following CORS policy into the file and save it.
```
[
  {
    "origin": ["*"],
    "responseHeader": ["Content-Type"],
    "exposeHeaders": ["ETag"],
    "method": ["GET", "HEAD", "PUT"],
    "maxAgeSeconds": 3000
  }
]
```
    If data in your bucket expires due to an object lifecycle management policy, you may lose the ability to read the history of some runs.
4. Replace <bucket_name> with the correct bucket name and run gsutil.
```
gsutil cors set cors-policy.json gs://<bucket_name>
```
5. Verify the bucket’s policy. Replace <bucket_name> with the correct bucket name.
```
gsutil cors get gs://<bucket_name>
```
If you are using Multi-tenant Cloud or Dedicated Cloud, grant the storage.admin role to the GCP service account linked to the W&B Platform. W&B requires this role to check the bucket’s CORS configuration and attributes, such as whether object versioning is enabled. If the service account does not have the storage.admin role, these checks result in an HTTP 403 error.
- For Multi-tenant Cloud, the account is: wandb-integration@wandb-production.iam.gserviceaccount.com
- For Dedicated Cloud the account is: deploy@wandb-production.iam.gserviceaccount.com
Keep a record of the bucket name. Next, configure W&B for BYOB.

For details, see Create a blob storage container in the Azure documentation.

Provision the Azure Blob Storage container.

For instance level BYOB, if you’re not using this Terraform module, follow the steps below to provision an Azure Blob Storage bucket in your Azure subscription:

Create a bucket with a name of your choice. Optionally create a folder which you can configure as sub-path to store all W&B files.

Configure the CORS policy on the bucket

To set the CORS policy through the UI go to the blob storage, scroll down to Settings/Resource Sharing (CORS) and then set the following:

Parameter	Value
Allowed Origins	`*`
Allowed Methods	`GET`, `HEAD`, `PUT`
Allowed Headers	`*`
Exposed Headers	`*`
Max Age	`3000`

If data in your bucket expires due to an object lifecycle management policy, you may lose the ability to read the history of some runs.

Generate a storage account access key and make a note of its name and the storage account name. If you are using Dedicated Cloud, share the storage account name and access key with your W&B team using a secure sharing mechanism.

For team level BYOB, W&B recommends that you use Terraform to provision the Azure Blob Storage bucket along with the necessary access mechanism and permissions. If you use Dedicated Cloud, provide the OIDC issuer URL for your instance. Make a note of the following details:
- Storage account name
- Storage container name
- Managed identity client id
- Azure tenant id

Create your S3-compatible bucket. Make a note of:

Access key
Secret access key
URL endpoint
Bucket name
Folder path, if applicable.
Region

Next, determine the storage address.

Determine the storage address

This section explains the syntax to use to connect a W&B Team to a BYOB storage bucket. In the examples, replace placeholder values between angle brackets (<>) with your bucket’s details. Select a tab for detailed instructions.

This section is relevant only for team level BYOB on Dedicated Cloud or Self-Managed. For instance level BYOB or for Multi-tenant Cloud, you are ready to Configure W&B.

Determine the full bucket path using the following format. Replace placeholders between angle brackets (<>) with the bucket’s values.

Bucket format:

cw://<accessKey>:<secretAccessKey>@cwobject.com/<bucketName>?tls=true

The cwobject.com HTTPS endpoint is supported. TLS 1.3 is required. Contact support to express interest in other CoreWeave endpoints.

Bucket format:

s3://<accessKey>:<secretAccessKey>@<s3_regional_url_endpoint>/<bucketName>?region=<region>

In the address, the region parameter is mandatory unless both your W&B instance and your storage bucket are deployed in AWS, and the W&B instance’s AWS_REGION matches the bucket’s AWS S3 region.

Bucket format:

gs://<serviceAccountEmail>:<urlEncodedPrivateKey>@<bucketName>

Bucket format:

az://:<urlEncodedAccessKey>@<storageAccountName>/<containerName>

Bucket format:

s3://<accessKey>:<secretAccessKey>@<url_endpoint>/<bucketName>?region=<region>&tls=true

In the address, the region parameter is mandatory.

This section is for S3-compatible storage buckets that are not hosted in S3, like MinIO hosted on your premises. For storage buckets hosted in AWS S3, see the AWS tab instead.

For Cloud-native storage buckets with an optional S3-compatible mode, use the Cloud-native protocol specifier when possible. For example, use cw:// for a CoreWeave bucket, rather than s3://.

After determining the storage address, you are ready to configure team level BYOB.
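
The address formats above can be assembled programmatically. The following sketch builds each one from its parts and URL-encodes only the fields that the formats mark as URL-encoded (the GCS private key and the Azure access key). All function names and sample values are hypothetical.

```python
from urllib.parse import quote


def coreweave_address(access_key, secret_key, bucket):
    return f"cw://{access_key}:{secret_key}@cwobject.com/{bucket}?tls=true"


def s3_address(access_key, secret_key, endpoint, bucket, region, tls=False):
    """Works for AWS S3 and, with tls=True, for the S3-compatible format."""
    address = f"s3://{access_key}:{secret_key}@{endpoint}/{bucket}?region={region}"
    if tls:
        address += "&tls=true"
    return address


def gcs_address(service_account_email, private_key, bucket):
    # The private key must be URL-encoded.
    return f"gs://{service_account_email}:{quote(private_key, safe='')}@{bucket}"


def azure_address(access_key, storage_account, container):
    # The access key must be URL-encoded; the user part before ':' is empty.
    return f"az://:{quote(access_key, safe='')}@{storage_account}/{container}"


# Sample values for illustration only.
print(s3_address("AK", "SK", "s3.us-east-1.amazonaws.com", "my-bucket", "us-east-1"))
print(azure_address("abc+/=", "myaccount", "mycontainer"))
```

Keep in mind that these addresses embed credentials, so treat the resulting strings as secrets.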

Configure W&B

After you provision your bucket and determine its address, you are ready to configure BYOB at the instance level or team level.

Plan your storage bucket layout carefully. After you configure a storage bucket for W&B, migrating its data to another bucket is complex and requires the assistance of W&B. This applies to storage for Dedicated Cloud and Self-Managed, as well as team-level storage for Multi-tenant Cloud. For questions, contact support.

Instance level BYOB

For CoreWeave AI Object Storage at the instance level, contact W&B support instead of following these instructions. Self-service configuration is not yet supported.

For Dedicated Cloud: Share the bucket details with your W&B team, who will configure your Dedicated Cloud instance.

For Self-Managed, you can configure instance level BYOB using the W&B App:

Log in to W&B as a user with the admin role.
Click the user icon at the top, then click System Console.
Go to Settings > System Connections.
In the Bucket Storage section, ensure the identity in the Identity field is granted access to the new bucket.
Select the Provider.
Enter the Bucket Name.
Optionally, enter the Path to use in the new bucket.
Click Save.

For Self-Managed, W&B recommends using the Terraform module managed by W&B to provision a storage bucket along with the necessary access mechanism and related IAM permissions:

AWS
GCP
Azure - Instance level BYOB or Team level BYOB

Team level BYOB

After you determine the storage location for your bucket, you can use the W&B App to configure team level BYOB while creating a team.

After a team is created, its storage cannot be changed.
For Instance level BYOB, refer to Instance level BYOB instead.
If you plan to configure CoreWeave storage for the team, contact support to verify that your bucket is configured correctly in CoreWeave and to validate your team’s configuration, since the storage details cannot be changed after the team is created.

Select your deployment type to continue.

Dedicated Cloud: You must provide the bucket path to your account team so that they can add it to your instance’s supported file stores before following the rest of these steps to use the storage bucket for a team.
Self-Managed: You must add the bucket path to the GORILLA_SUPPORTED_FILE_STORES environment variable and then restart W&B before following the rest of these steps to use the storage bucket for a team.
Log in to W&B as a user with the admin role, click the icon at the top left to open the left navigation, then click Create a team to collaborate.
Provide a name for the team.
Set Storage Type to External storage.

To use the instance level storage for team storage (regardless of whether it is internal or external), leave Storage Type set to Internal, even if the instance level bucket is configured for BYOB. To use separate external storage for the team, set Storage Type for the team to External and configure the bucket details in the next step.
Click Bucket location.
To use an existing bucket, select it from the list. To add a new bucket, click Add bucket at the bottom, then provide the bucket’s details.

Click Cloud provider and select CoreWeave, AWS, GCP, or Azure.

If the cloud provider is not listed, ensure that you have followed step 1 to add the bucket path to the supported file stores for your instance. If the storage provider is still not listed, contact support for assistance.
Specify the bucket details.
- For CoreWeave, provide only the bucket name.
- For Amazon S3, GCP, or S3-compatible storage, provide the full bucket path you determined earlier.
- For Azure on W&B Dedicated or Self-Managed, set Account name to the Azure account and Container name to the Azure blob storage container.
- Optionally:
  - If applicable, set Path to the bucket sub-path.
  - AWS: Set KMS key ARN to the ARN of your KMS encryption key.
  - Azure: If applicable, specify values for Tenant ID and Managed Identity Client ID.
Click Create team.

If W&B encounters errors accessing the bucket or detects invalid settings, an error or warning displays at the bottom of the page. Otherwise, the team is created.

Switch to the browser window where you began creating the new team to obtain the W&B organization ID. Otherwise, log in to W&B as a user with the admin role, click the icon at the top left to open the left navigation, then click Create a team to collaborate.
Provide a name for the team.
Set Storage Type to External storage.
Click Bucket location.
To use an existing bucket, select it from the list. To add a new bucket, click Add bucket at the bottom, then provide the bucket’s details.

Click Cloud provider and select CoreWeave, AWS, GCP, or Azure.
Specify the bucket details.
- For CoreWeave, provide only the bucket name.
- For Amazon S3, GCP, or S3-compatible storage, provide the full bucket path you determined earlier.
- For Azure on W&B Dedicated or Self-Managed, set Account name to the Azure account and Container name to the Azure blob storage container.
- Optionally:
  - If applicable, set Path to the bucket sub-path.
  - AWS: Set KMS key ARN to the ARN of your KMS encryption key.
  - Azure: If applicable, specify values for Tenant ID and Managed Identity Client ID.
- Invite members to the team. In Invite team members, specify a comma-separated list of email addresses. Otherwise, you can invite members to the team after it is created.
Click Create team.

If W&B encounters errors accessing the bucket or detects invalid settings, an error or warning displays at the bottom of the page. Otherwise, the team is created.

Troubleshooting

Connecting to CoreWeave AI Object Storage

Connection errors
- Verify that your W&B instance can connect to CoreWeave network endpoints.
- CoreWeave uses virtual-hosted style paths, where the bucket name is a subdomain at the beginning of the path. For example: cw://bucket-name.cwobject.com is correct, while cw://cwobject.com/bucket-name/ is not.
- Bucket names must not contain underscores (_) or other characters incompatible with DNS rules.
- Bucket names must be globally unique among CoreWeave locations.
- Bucket names must not begin with cw- or vip-, which are reserved prefixes.
CORS validation failures
- A CORS policy is required. CoreWeave is S3-compatible; for details about CORS, see Configuring cross-origin resource sharing (CORS) in the AWS documentation.
- AllowedMethods must include methods GET, PUT, and HEAD.
- ExposeHeaders must include `ETag`.
- W&B front-end domains must be included in the CORS policy’s AllowedOrigins. The example CORS policies provided on this page include all domains using *.
LOTA endpoint issues
- Connecting to LOTA endpoints from W&B is not yet supported. To express interest, contact support.
Access key and permission errors
- Verify that your CoreWeave API Access Key is not expired.
- Verify that your CoreWeave API Access Key and Secret Key have sufficient permissions: GetObject, PutObject, DeleteObject, and ListBucket. The examples on this page meet this requirement. Refer to Create and Manage Access Keys in the CoreWeave documentation.
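
The CORS requirements listed under CORS validation failures can be checked locally before you configure W&B. The following sketch validates a policy in the S3-style JSON format used on this page. It is not the validation that W&B itself performs, and the function names are hypothetical.

```python
REQUIRED_METHODS = {"GET", "PUT", "HEAD"}


def rule_problems(rule):
    """Return the problems found in a single CORS rule."""
    problems = []
    missing = REQUIRED_METHODS - set(rule.get("AllowedMethods", []))
    if missing:
        problems.append(f"AllowedMethods is missing {sorted(missing)}")
    if "ETag" not in rule.get("ExposeHeaders", []):
        problems.append("ExposeHeaders must include ETag")
    if not rule.get("AllowedOrigins"):
        problems.append("AllowedOrigins must list the W&B front-end domains or *")
    return problems


def policy_is_valid(policy):
    """A policy passes if at least one rule meets every requirement."""
    return any(not rule_problems(rule) for rule in policy)


example_policy = [
    {
        "AllowedHeaders": ["*"],
        "AllowedMethods": ["GET", "HEAD", "PUT"],
        "AllowedOrigins": ["*"],
        "ExposeHeaders": ["ETag"],
        "MaxAgeSeconds": 3000,
    }
]
broken_policy = [{"AllowedMethods": ["GET", "HEAD"], "AllowedOrigins": ["*"]}]

print(policy_is_valid(example_policy))
print(rule_problems(broken_policy[0]))
```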

5.3.2 - Access BYOB using pre-signed URLs

W&B uses pre-signed URLs to simplify access to blob storage from your AI workloads or user browsers. For basic information on pre-signed URLs, refer to the cloud provider’s documentation:

Pre-signed URLs for AWS S3, which also applies to S3-compatible storage like CoreWeave AI Object Storage.
Signed URLs for Google Cloud Storage
Shared Access Signature for Azure Blob Storage

How it works:

When needed, AI workloads or user browser clients within your network request pre-signed URLs from W&B.
W&B responds to the request by accessing the blob storage to generate the pre-signed URL with the required permissions.
W&B returns the pre-signed URL to the client.
The client uses the pre-signed URL to read or write to the blob storage.

A pre-signed URL expires after:

Reading: 1 hour
Writing: 24 hours, to allow more time to upload large objects in chunks.
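
As a small illustration of these lifetimes, the following sketch computes when a URL issued at a given time stops working. The names are hypothetical, and the code only restates the durations listed above.

```python
from datetime import datetime, timedelta, timezone

# Lifetimes stated above for each kind of pre-signed URL.
PRESIGNED_URL_LIFETIME = {
    "read": timedelta(hours=1),
    "write": timedelta(hours=24),
}


def expires_at(operation, issued_at):
    """Return the time at which a pre-signed URL for `operation` expires."""
    return issued_at + PRESIGNED_URL_LIFETIME[operation]


issued = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(expires_at("read", issued).isoformat())
print(expires_at("write", issued).isoformat())
```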

Team-level access control

Each pre-signed URL is restricted to specific buckets based on team level access control in the W&B platform. If a user belongs to only one team, and that team is mapped to a storage bucket using the secure storage connector, then the pre-signed URLs generated for that user’s requests do not have permission to access storage buckets mapped to other teams.

W&B recommends adding users to only the teams that they are supposed to be a part of.

Network restriction

W&B recommends using IAM policies to restrict the networks from which pre-signed URLs can be used to access external storage. This helps to ensure that your W&B specific buckets are accessed only from networks where your AI workloads are running, or from gateway IP addresses that map to your user machines.

For CoreWeave AI Object Storage, refer to Bucket policy reference in the CoreWeave documentation.
For AWS S3 or S3-compatible storage like MinIO hosted on your premises, refer to the S3 userguide, the MinIO documentation, or the documentation for your S3-compatible storage provider.

Audit logs

W&B recommends using W&B audit logs together with blob storage specific audit logs. For blob storage audit logs, refer to the documentation for your cloud provider.

Admin and security teams can use audit logs to keep track of which user is doing what in the W&B product and take necessary action if they determine that some operations need to be limited for certain users.

Pre-signed URLs are the only supported blob storage access mechanism in W&B. W&B recommends configuring some or all of the above list of security controls according to your organization’s needs.

Determine the user that requested a pre-signed URL

When W&B returns a pre-signed URL, a query parameter in the URL contains the requester’s username:

Storage provider	Signed URL query parameter
CoreWeave AI Object Storage	`X-User`
AWS S3 storage	`X-User`
Google Cloud storage	`X-User`
Azure blob storage	`scid`
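
When reviewing storage access logs, you can extract the requester from a logged URL by reading the query parameter named in the table. The following sketch uses only the Python standard library. The provider keys, function name, and sample URL are hypothetical.

```python
from urllib.parse import parse_qs, urlparse

# Query parameter that carries the requester's username, per the table above.
USER_PARAM = {
    "coreweave": "X-User",
    "aws": "X-User",
    "gcs": "X-User",
    "azure": "scid",
}


def requester(url, provider):
    """Return the username embedded in a pre-signed URL, or None if absent."""
    values = parse_qs(urlparse(url).query).get(USER_PARAM[provider])
    return values[0] if values else None


# Hypothetical URL for illustration only.
sample = "https://bucket.example.com/path/to/object?X-Amz-Expires=3600&X-User=jane"
print(requester(sample, "aws"))
```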

5.3.3 - Configure IP allowlisting for Dedicated Cloud

You can restrict access to your Dedicated Cloud instance to an authorized list of IP addresses. This applies to access from your AI workloads to the W&B APIs and from your user browsers to the W&B app UI. Once IP allowlisting has been set up for your Dedicated Cloud instance, W&B denies requests from any other location. Reach out to your W&B team to configure IP allowlisting for your Dedicated Cloud instance.

IP allowlisting is available on Dedicated Cloud instances on AWS, GCP and Azure.

You can use IP allowlisting with secure private connectivity. If you use IP allowlisting with secure private connectivity, W&B recommends using secure private connectivity for all traffic from your AI workloads and the majority of the traffic from your user browsers if possible, while using IP allowlisting for instance administration from privileged locations.

W&B strongly recommends using CIDR blocks assigned to your corporate or business egress gateways rather than individual /32 IP addresses. Using individual IP addresses is not scalable and has strict limits per cloud.
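
To prepare an allowlist, you can use Python’s standard `ipaddress` module to collapse individual /32 addresses into CIDR blocks and to check whether a given client address would be covered. This is a local planning aid only; the function names are hypothetical and the addresses come from documentation-reserved ranges.

```python
import ipaddress


def collapse(addresses):
    """Merge adjacent addresses or networks into the fewest CIDR blocks."""
    networks = (ipaddress.ip_network(a) for a in addresses)
    return [str(n) for n in ipaddress.collapse_addresses(networks)]


def is_allowed(client_ip, allowlist):
    """Return True if client_ip falls inside any CIDR block in allowlist."""
    ip = ipaddress.ip_address(client_ip)
    return any(ip in ipaddress.ip_network(cidr) for cidr in allowlist)


individual = ["203.0.113.0/32", "203.0.113.1/32", "203.0.113.2/32", "203.0.113.3/32"]
print(collapse(individual))  # ['203.0.113.0/30']
print(is_allowed("203.0.113.77", ["203.0.113.0/24"]))  # True
print(is_allowed("198.51.100.1", ["203.0.113.0/24"]))  # False
```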

5.3.4 - Configure private connectivity to Dedicated Cloud

You can connect to your Dedicated Cloud instance over the cloud provider’s secure private network. This applies to the access from your AI workloads to the W&B APIs and optionally from your user browsers to the W&B app UI as well. When using private connectivity, the relevant requests and responses do not transit through the public network or internet.

Secure private connectivity is coming soon as an advanced security option with Dedicated Cloud.

Secure private connectivity is available on Dedicated Cloud instances on AWS, GCP and Azure:

Using AWS PrivateLink on AWS
Using GCP Private Service Connect on GCP
Using Azure Private Link on Azure

Once enabled, W&B creates a private endpoint service for your instance and provides you with the relevant DNS URI to connect to. With that, you can create private endpoints in your cloud accounts that route the relevant traffic to the private endpoint service. Private endpoints are easier to set up for your AI training workloads running within your cloud VPC or VNet. To use the same mechanism for traffic from your user browsers to the W&B app UI, you must configure appropriate DNS based routing from your corporate network to the private endpoints in your cloud accounts.

If you would like to use this feature, contact your W&B team.

You can use secure private connectivity with IP allowlisting. If you use secure private connectivity with IP allowlisting, W&B recommends that you use secure private connectivity for all traffic from your AI workloads and the majority of the traffic from your user browsers if possible, while using IP allowlisting for instance administration from privileged locations.

5.3.5 - Data encryption in Dedicated cloud

W&B uses a W&B-managed cloud-native key to encrypt the W&B-managed database and object storage in every Dedicated cloud, by using the customer-managed encryption key (CMEK) capability in each cloud. In this case, W&B acts as a customer of the cloud provider, while providing the W&B platform as a service to you. Using a W&B-managed key means that W&B has control over the keys that it uses to encrypt the data in each cloud, thus doubling down on its promise to provide a highly safe and secure platform to all of its customers.

W&B uses a unique key to encrypt the data in each customer instance, providing another layer of isolation between Dedicated cloud tenants. The capability is available on AWS, Azure and GCP.

Dedicated cloud instances on GCP and Azure that W&B provisioned before August 2024 use the default cloud provider managed key for encrypting the W&B-managed database and object storage. Only new instances that W&B has been creating starting August 2024 use the W&B-managed cloud-native key for the relevant encryption.

Dedicated cloud instances on AWS have been using the W&B-managed cloud-native key for encryption from before August 2024.

W&B doesn’t generally allow customers to bring their own cloud-native key to encrypt the W&B-managed database and object storage in their Dedicated cloud instance, because multiple teams and personas in an organization could have access to its cloud infrastructure for various reasons. Some of those teams or personas may not have context on W&B as a critical component in the organization’s technology stack, and thus may remove the cloud-native key completely or revoke W&B’s access to it. Such an action could corrupt all data in the organization’s W&B instance and leave it in an irrecoverable state.

If your organization needs to use its own cloud-native key to encrypt the W&B-managed database and object storage in order to approve the use of Dedicated cloud for your AI workflows, W&B can review the request on an exception basis. If approved, use of your cloud-native key for encryption would conform to the shared responsibility model of W&B Dedicated cloud. If any user in your organization removes your key or revokes W&B’s access to it at any point while your Dedicated cloud instance is live, W&B would not be liable for any resulting data loss or corruption and would not be responsible for recovery of such data.

5.4 - Configure privacy settings

Organization and Team admins can configure a set of privacy settings at the organization and team scopes respectively. When configured at the organization scope, organization admins enforce those settings for all teams in that organization.

W&B recommends that organization admins enforce a privacy setting only after communicating the change in advance to all team admins and users in their organization. This avoids unexpected changes in their workflows.

Configure privacy settings for a team

Team admins can configure privacy settings for their respective teams from within the Privacy section of the team Settings tab. Each setting is configurable as long as it’s not enforced at the organization scope:

Hide this team from all non-members
Make all future team projects private (public sharing not allowed)
Allow any team member to invite other members (not just admins)
Turn off public sharing to outside of team for reports in private projects. This turns off existing magic links.
Allow users with matching organization email domain to join this team.
- This setting is applicable only to SaaS Cloud. It’s not available in Dedicated Cloud or Self-managed instances.
Enable code saving by default.

Enforce privacy settings for all teams

Organization admins can enforce privacy settings for all teams in their organization from within the Privacy section of the Settings tab in the account or organization dashboard. If organization admins enforce a setting, team admins are not allowed to configure that within their respective teams.

Enforce team visibility restrictions
- Enable this option to hide all teams from non-members
Enforce privacy for future projects
- Enable this option to enforce all future projects in all teams to be private or restricted
Enforce invitation control
- Enable this option to prevent non-admins from inviting members to any team
Enforce report sharing control
- Enable this option to turn off public sharing of reports in private projects and deactivate existing magic links
Enforce team self joining restrictions
- Enable this option to restrict users with matching organization email domain from self-joining any team
- This setting is applicable only to SaaS Cloud. It’s not available in Dedicated Cloud or Self-managed instances.
Enforce default code saving restrictions
- Enable this option to turn off code saving by default for all teams

5.5 - Monitoring and usage

5.5.1 - Track user activity with audit logs

Use W&B audit logs to track user activity within your organization and to conform to your enterprise governance requirements. Audit logs are available in JSON format. Refer to Audit log schema.

How to access audit logs depends on your W&B platform deployment type:

W&B Platform Deployment type	Audit logs access mechanism
Self-Managed	Synced to instance-level bucket every 10 minutes. Also available using the API.
Dedicated Cloud with secure storage connector (BYOB)	Synced to instance-level bucket (BYOB) every 10 minutes. Also available using the API.
Dedicated Cloud with W&B managed storage (without BYOB)	Available only by using the API.
Multi-tenant Cloud	Available for Enterprise plans only. Available only by using the API.

After fetching audit logs, you can analyze them using tools like Pandas, Amazon Redshift, Google BigQuery, or Microsoft Fabric. Some audit log analysis tools do not support JSON; refer to the documentation for your analysis tool for guidelines and requirements for transforming the JSON-formatted audit logs before analysis.

Audit log retention

If you require audit logs to be retained for a specific period of time, W&B recommends periodically transferring logs to long-term storage, either using storage buckets or the Audit Logging API.

If you are subject to the Health Insurance Portability and Accountability Act of 1996 (HIPAA), audit logs must be retained for a minimum of 6 years in an environment where they cannot be deleted or modified by any internal or external actor before the end of the mandatory retention period. For HIPAA-compliant Dedicated Cloud instances with BYOB, you must configure guardrails for your managed storage, including any long-term retention storage.

Audit log schema

This table shows all keys which may appear in an audit log entry, ordered alphabetically. Depending on the action and the circumstances, a specific log entry may include only a subset of the possible fields.

Key	Definition
`action`	The action of the event.
`actor_email`	The email address of the user that initiated the action, if applicable.
`actor_ip`	The IP address of the user that initiated the action.
`actor_user_id`	The ID of the logged-in user who performed the action, if applicable.
`artifact_asset`	The artifact ID associated with the action, if applicable.
`artifact_digest`	The artifact digest associated with the action, if applicable.
`artifact_qualified_name`	The full name of the artifact associated with the action, if applicable.
`artifact_sequence_asset`	The artifact sequence ID associated with the action, if applicable.
`cli_version`	The version of the Python SDK that initiated the action, if applicable.
`entity_asset`	The entity or team ID associated with the action, if applicable.
`entity_name`	The entity or team name associated with the action, if applicable.
`project_asset`	The project associated with the action, if applicable.
`project_name`	The name of the project associated with the action, if applicable.
`report_asset`	The report ID associated with the action, if applicable.
`report_name`	The name of the report associated with the action, if applicable.
`response_code`	The HTTP response code for the action, if applicable.
`timestamp`	The time of the event in RFC3339 format. For example, `2023-01-23T12:34:56Z` represents January 23, 2023 at 12:34:56 UTC.
`user_asset`	The user asset the action impacts (rather than the user performing the action), if applicable.
`user_email`	The email address of the user the action impacts (rather than the email address of the user performing the action), if applicable.
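
When analyzing entries, it can help to convert the `timestamp` string into a timezone-aware value. The following is a minimal sketch using only the Python standard library. The helper name `parse_audit_timestamp` is hypothetical and is not part of any W&B SDK.

```python
from datetime import datetime, timezone

def parse_audit_timestamp(value: str) -> datetime:
    """Parse an RFC 3339 timestamp such as 2023-01-23T12:34:56Z."""
    # datetime.fromisoformat() accepts a trailing "Z" only on Python 3.11+,
    # so normalize it to an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value).astimezone(timezone.utc)

print(parse_audit_timestamp("2023-01-23T12:34:56Z").isoformat())
```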

Personally identifiable information (PII)

Personally identifiable information (PII), such as email addresses and the names of projects, teams, and reports, is available only using the API endpoint option.

For Self-Managed and Dedicated Cloud, an organization admin can exclude PII when fetching audit logs.
For Multi-tenant Cloud, the API endpoint always returns relevant fields for audit logs, including PII. This is not configurable.

Fetch audit logs

An organization or instance admin can fetch the audit logs for a W&B instance using the Audit Logging API, at the endpoint audit_logs/.

If a user other than an admin attempts to fetch audit logs, an HTTP 403 error occurs, indicating that access is denied.
If you are an admin of multiple Enterprise Multi-tenant Cloud organizations, you must configure the organization where audit logging API requests are sent. Click your profile image, then click User Settings. The setting is named Default API organization.

Determine the correct API endpoint for your instance:
- Self-Managed: <wandb-platform-url>/admin/audit_logs
- Dedicated Cloud: <wandb-platform-url>/admin/audit_logs
- Multi-tenant Cloud (Enterprise required): https://api.wandb.ai/audit_logs
In the following steps, replace <API-endpoint> with your API endpoint.
Construct the full API endpoint from the base endpoint, and optionally include URL parameters:
- anonymize: if set to true, remove any PII; defaults to false. Refer to Exclude PII when fetching audit logs. Not supported for Multi-tenant Cloud.
- numDays: logs are fetched starting from today - numDays to most recent; defaults to 0, which returns logs only for today. For Multi-tenant Cloud, you can fetch audit logs from a maximum of 7 days in the past.
- startDate: an optional date with format YYYY-MM-DD. Supported only on Multi-tenant Cloud.
  
  startDate and numDays interact:
  - If you set both startDate and numDays, logs are returned from startDate to startDate + numDays.
  - If you omit startDate but include numDays, logs are returned from today - numDays to today.
  - If you set neither startDate nor numDays, logs are returned for today only.
Execute an HTTP GET request on the constructed fully qualified API endpoint using a web browser or a tool like Postman, HTTPie, or cURL.
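
The URL construction described in the steps above can be sketched in Python as follows. This is an illustration only: the function name `build_audit_log_url` and its argument names are hypothetical, and only the URL parameter names (`numDays`, `anonymize`, `startDate`) come from this page.

```python
from typing import Optional
from urllib.parse import urlencode

def build_audit_log_url(
    api_endpoint: str,
    num_days: Optional[int] = None,
    anonymize: bool = False,
    start_date: Optional[str] = None,
) -> str:
    """Append the optional audit log URL parameters to a base endpoint."""
    params = {}
    if start_date is not None:
        params["startDate"] = start_date  # YYYY-MM-DD, Multi-tenant Cloud only
    if num_days is not None:
        params["numDays"] = num_days
    if anonymize:
        params["anonymize"] = "true"  # not supported on Multi-tenant Cloud
    query = urlencode(params)
    return f"{api_endpoint}?{query}" if query else api_endpoint

print(build_audit_log_url(
    "https://mycompany.wandb.io/admin/audit_logs", num_days=7, anonymize=True
))
```

The sketch only builds the URL. It does not check that a parameter is supported on your deployment type, so combine it with the restrictions listed above.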

The API response contains new-line separated JSON objects. Objects will include the fields described in the schema, just like when audit logs are synced to an instance-level bucket. In those cases, the audit logs are located in the /wandb-audit-logs directory in your bucket.
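
Because the response is a sequence of JSON objects separated by newlines, it can be parsed one line at a time. The sketch below assumes the response body is already available as a string. The sample entries are made up for illustration and use only keys and actions listed on this page, and the helper name `parse_audit_log_response` is hypothetical.

```python
import json
from collections import Counter

def parse_audit_log_response(body: str) -> list:
    """Parse newline-separated JSON objects, skipping blank lines."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]

# Made-up sample entries that use keys from the audit log schema.
sample = (
    '{"action": "run:delete", "timestamp": "2023-01-23T12:34:56Z"}\n'
    '{"action": "artifact:read", "timestamp": "2023-01-23T12:35:10Z"}\n'
    '{"action": "run:delete", "timestamp": "2023-01-23T12:36:02Z"}\n'
)
entries = parse_audit_log_response(sample)

# Tally how often each action occurs; entries may omit optional fields,
# so use .get() rather than indexing.
action_counts = Counter(entry.get("action") for entry in entries)
print(action_counts.most_common())
```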

Use basic authentication

To use basic authentication with your API key to access the audit logs API, set the HTTP request’s Authorization header to the string Basic followed by a space, then the base-64 encoded string in the format username:API-KEY. In other words, replace the username and API key with your values separated with a : character, then base-64-encode the result. For example, to authorize as demo:p@55w0rd, the header should be Authorization: Basic ZGVtbzpwQDU1dzByZA==.
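
The header construction above can be reproduced with the Python standard library. This is a minimal sketch: the helper name `basic_auth_header` is hypothetical, and the credentials are the `demo:p@55w0rd` example from the paragraph above.

```python
import base64

def basic_auth_header(username: str, api_key: str) -> dict:
    """Return the Authorization header for HTTP basic authentication."""
    credentials = f"{username}:{api_key}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}

print(basic_auth_header("demo", "p@55w0rd"))
# {'Authorization': 'Basic ZGVtbzpwQDU1dzByZA=='}
```

Most HTTP clients can also build this header for you when given a username and password, which avoids encoding the string by hand.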

Exclude PII when fetching audit logs

For Self-Managed and Dedicated Cloud, a W&B organization or instance admin can exclude PII when fetching audit logs. For Multi-tenant Cloud, the API endpoint always returns relevant fields for audit logs, including PII. This is not configurable.

To exclude PII, pass the anonymize=true URL parameter. For example, if your W&B instance URL is https://mycompany.wandb.io and you would like to get audit logs for user activity within the last week and exclude PII, use an API endpoint like:

https://mycompany.wandb.io/admin/audit_logs?numDays=7&anonymize=true

Actions

This table describes possible actions that can be recorded by W&B, sorted alphabetically.

Action	Definition
`artifact:create`	Artifact is created.
`artifact:delete`	Artifact is deleted.
`artifact:read`	Artifact is read.
`project:delete`	Project is deleted.
`project:read`	Project is read.
`report:read`	Report is read. ¹
`run:delete_many`	Batch of runs is deleted.
`run:delete`	Run is deleted.
`run:stop`	Run is stopped.
`run:undelete_many`	Batch of runs is restored from trash.
`run:update_many`	Batch of runs is updated.
`run:update`	Run is updated.
`sweep:create_agent`	Sweep agent is created.
`team:create_service_account`	Service account is created for the team.
`team:create`	Team is created.
`team:delete`	Team is deleted.
`team:invite_user`	User is invited to team.
`team:uninvite`	User or service account is uninvited from team.
`user:create_api_key`	API key for the user is created. ¹
`user:create`	User is created. ¹
`user:deactivate`	User is deactivated. ¹
`user:delete_api_key`	API key for the user is deleted. ¹
`user:initiate_login`	User initiates log in. ¹
`user:login`	User logs in. ¹
`user:logout`	User logs out. ¹
`user:permanently_delete`	User is permanently deleted. ¹
`user:reactivate`	User is reactivated. ¹
`user:read`	User profile is read. ¹
`user:update`	User is updated. ¹

1: On Multi-tenant Cloud, audit logs are not collected for:

Open or Public projects.
The report:read action.
User actions which are not tied to a specific organization.

5.5.2 - Use Prometheus monitoring

Use Prometheus with W&B Server. Prometheus installs are exposed as a Kubernetes ClusterIP service.

Prometheus monitoring is only available with Self-managed instances.

Follow the procedure below to access your Prometheus metrics endpoint (/metrics):

Connect to the cluster with the Kubernetes CLI toolkit, kubectl. See the Kubernetes Accessing Clusters documentation for more information.
Find the internal address of the cluster with:
```
kubectl describe svc prometheus
```
Start a shell session inside your container running in your Kubernetes cluster with kubectl exec. Hit the endpoint at <internal address>/metrics.

Copy the command below and execute it in your terminal and replace <internal address> with your internal address:
```
kubectl exec <internal address>/metrics
```

A test pod starts, which you can exec into just to access anything in the network:

kubectl run -it testpod --image=alpine bin/ash --restart=Never --rm

From there, you can choose to keep access internal to the network or expose it yourself with a Kubernetes NodePort service.

5.5.3 - Configure Slack alerts

Integrate W&B Server with Slack.

Watch a video demonstrating setting up Slack alerts on W&B Dedicated Cloud deployment (6 min).

Create the Slack application

Follow the procedure below to create a Slack application.

Visit https://api.slack.com/apps and select Create an App.
Provide a name for your app in the App Name field.
Select the Slack workspace where you want to develop your app. Ensure that the Slack workspace you use is the same workspace you intend to use for alerts.

Configure the Slack application

On the left sidebar, select OAuth & Permissions.
Within the Scopes section, provide the bot with the incoming_webhook scope. Scopes give your app permission to perform actions in your development workspace.

For more information about OAuth scopes for Bots, see the Understanding OAuth scopes for Bots tutorial in the Slack API documentation.
Configure the Redirect URL to point to your W&B installation. Use the same URL that your host URL is set to in your local system settings. You can specify multiple URLs if you have different DNS mappings to your instance.
Select Save URLs.
Optionally, under Restrict API Token Usage, allowlist the IP address or IP range of your W&B instances. Limiting the allowed IP addresses helps further secure your Slack application.

Register your Slack application with W&B

Navigate to the System Settings or System Console page of your W&B instance, depending on your deployment.
Depending on the System page you are on, follow one of the options below:
- If you are in the System Console: go to Settings then to Notifications
- If you are in the System Settings: toggle the Enable a custom Slack application to dispatch alerts to enable a custom Slack application
Supply your Slack client ID and Slack secret then click Save. Navigate to Basic Information in Settings to find your application’s client ID and secret.
Verify that everything is working by setting up a Slack integration in the W&B app.

5.5.4 - View organization activity

This page shows various ways to view activity within your W&B organization.

View user status and activity

To access the Organization Dashboard, navigate to https://<org-name>.io/org/dashboard/. Replace <org-name> with your organization name. The Users tab opens by default. It lists all users, along with data about each user.
To sort the list by user status, click the Last Active column label. Each user’s status is one of the following:
- Invite pending: Admin has sent invite but user has not accepted invitation.
- Active: User has accepted the invite and created an account.
- -: The user was previously active but has not been active in the last 6 months.
- Deactivated: Admin has revoked access of the user.
To see details about a user’s last activity, hover your mouse over the Last Active field for the user. A tooltip appears that shows when the user was added and how many total days the user has been active.

A user is active if they:
- log in to W&B.
- view any page in the W&B App.
- log runs.
- use the SDK to track an experiment.
- interact with the W&B Server in any way.

Navigate to the Members page. This page lists all users, along with data about each user.
To sort the list by user status, click the Last Active column label. Each user’s status is one of the following:
- Invite pending: Admin has sent invite but user has not accepted invitation.
- Active: User has accepted the invite and created an account.
- -: A hyphen indicates that the user has not yet been active within the organization.
A user is active if they perform any auditable action scoped to the organization after May 8, 2025. For a full list, refer to Actions in the Audit Logging page.

Export user details

From the Users tab, you can export details about how your organization uses W&B in CSV format.

Navigate to the Organization Dashboard at https://<org-name>.io/org/dashboard/. Replace <org-name> with your organization name. The Users tab opens by default.
Click the action ... menu next to the Invite new user button.
Click Export as CSV. The downloaded CSV file lists details about each user of an organization, such as their user name and email address, the time they were last active, their roles, and more.

Exporting users is not available for Multi-tenant Cloud.

View activity over time

This section shows how to get an aggregate view of activity over time.

Use the plots in the Activity tab to get an aggregate view of how many users have been active over time.

To access the Organization Dashboard, navigate to https://<org-name>.io/org/dashboard/. Replace <org-name> with your organization name.
Click the Activity tab.
The Total active users plot shows how many unique users have been active in a period of time (defaults to 3 months).
The Users active over time plot shows the fluctuation of active users over a period of time (defaults to 6 months). Hover your mouse over a point to see the number of users on that date.

To change the period of time for a plot, use the drop-down. You can select:

Last 30 days
Last 3 months
Last 6 months
Last 12 months
All time

Use the plots in the Activity Dashboard to get an aggregate view of activity over time:

Click the user profile icon at the top right.
Under Account, click Users.
View the Activity Panel above the list of users. It shows:

The Active user count badge shows how many unique users have been active in a period of time (defaults to 3 months). A user is active if they perform any auditable action scoped to the organization. For a full list, refer to Actions in the Audit Logging page.
The Weekly active users plot shows the number of users active per week.
The Most active user leaderboard ranks the top ten most active users by how many days they were active over the period of time, as well as when they were most recently active.

To adjust the span of time the plots show, click the date picker in the top right. You can choose 7, 30, or 90 days. The default date range is 30 days. All of the plots share the same time range and update automatically.

5.6 - Configure SMTP

In W&B Server, adding users to the instance or a team triggers an email invite. To send these email invites, W&B uses a third-party mail server. Some organizations have strict policies on traffic leaving the corporate network, which can prevent these email invites from reaching the end user. W&B Server offers an option to send these invite emails through an internal SMTP server.

To configure, follow the steps below:

Set the GORILLA_EMAIL_SINK environment variable in the Docker container or the Kubernetes deployment to smtp://<user:password>@smtp.host.com:<port>
username and password are optional
If you’re using an SMTP server that’s designed to be unauthenticated, set the environment variable without credentials, for example GORILLA_EMAIL_SINK=smtp://smtp.host.com:<port>
Commonly used port numbers for SMTP are ports 587, 465 and 25. Note that this might differ based on the type of the mail server you’re using.
To configure the default sender email address for SMTP, which is initially set to noreply@wandb.com, you can update it to an email address of your choice. This can be done by setting the GORILLA_EMAIL_FROM_ADDRESS environment variable on the server to your desired sender email address.
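
As an illustration of the two value formats above, the following Python sketch assembles a `GORILLA_EMAIL_SINK` value with or without credentials. The helper name `email_sink_url` is hypothetical. Percent-encoding reserved characters in the credentials is an assumption based on general URL conventions, not something this page confirms for W&B Server, so verify the resulting value against your deployment.

```python
from typing import Optional
from urllib.parse import quote

def email_sink_url(host: str, port: int,
                   user: Optional[str] = None,
                   password: Optional[str] = None) -> str:
    """Build a value for GORILLA_EMAIL_SINK; credentials are optional."""
    if user is None:
        # Unauthenticated SMTP server: no user:password@ prefix.
        return f"smtp://{host}:{port}"
    # Assumption: reserved characters in credentials are percent-encoded.
    credentials = quote(user, safe="")
    if password is not None:
        credentials += ":" + quote(password, safe="")
    return f"smtp://{credentials}@{host}:{port}"

print(email_sink_url("smtp.host.com", 587))
print(email_sink_url("smtp.host.com", 465, "mailer", "p@ss"))
```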

5.7 - Configure environment variables

How to configure the W&B Server installation

In addition to configuring instance level settings via the System Settings admin UI, W&B also provides a way to configure these values via code using Environment Variables. Also, refer to advanced configuration for IAM.

Environment variable reference

Environment Variable	Description
`LICENSE`	Your wandb/local license
`MYSQL`	The MySQL connection string
`BUCKET`	The S3 / GCS bucket for storing data
`BUCKET_QUEUE`	The SQS / Google PubSub queue for object creation events
`NOTIFICATIONS_QUEUE`	The SQS queue on which to publish run events
`AWS_REGION`	The AWS Region where your bucket lives
`HOST`	The FQDN of your instance, for example `https://my.domain.net`
`OIDC_ISSUER`	A URL to your OpenID Connect identity provider, for example `https://cognito-idp.us-east-1.amazonaws.com/us-east-1_uiIFNdacd`
`OIDC_CLIENT_ID`	The Client ID of application in your identity provider
`OIDC_AUTH_METHOD`	Implicit (default) or pkce, see below for more context
`SLACK_CLIENT_ID`	The client ID of the Slack application you want to use for alerts
`SLACK_SECRET`	The secret of the Slack application you want to use for alerts
`LOCAL_RESTORE`	You can temporarily set this to true if you’re unable to access your instance. Check the logs from the container for temporary credentials.
`REDIS`	Can be used to set up an external Redis instance with W&B.
`LOGGING_ENABLED`	When set to true, access logs are streamed to stdout. You can also mount a sidecar container and tail `/var/log/gorilla.log` without setting this variable.
`GORILLA_ALLOW_USER_TEAM_CREATION`	When set to true, allows non-admin users to create a new team. False by default.
`GORILLA_CUSTOMER_SECRET_STORE_SOURCE`	Sets the secret manager for storing team secrets used by W&B Weave. These secret managers are supported: Internal secret manager (default): `k8s-secretmanager://wandb-secret` AWS Secret Manager: `aws-secretmanager` GCP Secret Manager: `gcp-secretmanager` Azure: `az-secretmanger`
`GORILLA_DATA_RETENTION_PERIOD`	How long to retain deleted data from runs in hours. Deleted run data is unrecoverable. Append an `h` to the input value. For example, `"24h"`.
`GORILLA_DISABLE_PERSONAL_ENTITY`	When set to true, turns off personal entities. Prevents creation of new personal projects in their personal entities and prevents writing to existing personal projects.
`ENABLE_REGISTRY_UI`	When set to true, enables the new W&B Registry UI.
`WANDB_ARTIFACT_DIR`	Where to store all downloaded artifacts. If unset, defaults to the `artifacts` directory relative to your training script. Make sure this directory exists and the running user has permission to write to it. This does not control the location of generated metadata files, which you can set using the `WANDB_DIR` environment variable.
`WANDB_DATA_DIR`	Where to upload staging artifacts. The default location depends on your platform, because it uses the value of `user_data_dir` from the `platformdirs` Python package. Make sure this directory exists and the running user has permission to write to it.
`WANDB_DIR`	Where to store all generated files. If unset, defaults to the `wandb` directory relative to your training script. Make sure this directory exists and the running user has permission to write to it. This does not control the location of downloaded artifacts, which you can set using the `WANDB_ARTIFACT_DIR` environment variable.
`WANDB_IDENTITY_TOKEN_FILE`	For identity federation, the absolute path to the local directory where JSON Web Tokens (JWTs) are stored.

Use the GORILLA_DATA_RETENTION_PERIOD environment variable cautiously. Data is removed immediately once the environment variable is set. We also recommend that you back up both the database and the storage bucket before you enable this flag.

Advanced Reliability Settings

Redis

Configuring an external Redis server is optional but recommended for production systems. Redis helps improve the reliability of the service and enables caching to decrease load times, especially in large projects. Use a managed Redis service such as ElastiCache with high availability (HA) and the following specifications:

Minimum 4GB of memory, suggested 8GB
Redis version 6.x
In transit encryption
Authentication enabled

To configure the Redis instance with W&B, you can navigate to the W&B settings page at http(s)://YOUR-W&B-SERVER-HOST/system-admin. Enable the “Use an external Redis instance” option, and fill in the Redis connection string in the following format:

You can also configure Redis using the environment variable REDIS on the container or in your Kubernetes deployment. Alternatively, you can set up REDIS as a Kubernetes secret.

This page assumes the Redis instance is running on the default port, 6379. If you configure a different port, set up authentication, and also want TLS enabled on the Redis instance, the connection string format looks something like: redis://$USER:$PASSWORD@$HOST:$PORT?tls=true
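
Before setting the REDIS value, it can be useful to sanity-check a connection string of the form shown above. The following sketch splits the string with the Python standard library. The helper name `describe_redis_url` and the host `redis.example.com` are hypothetical, and the sketch does not connect to Redis or validate anything beyond the URL shape.

```python
from urllib.parse import urlsplit, parse_qs

def describe_redis_url(url: str) -> dict:
    """Split a Redis connection string into its parts for a sanity check."""
    parts = urlsplit(url)
    if parts.scheme != "redis":
        raise ValueError(f"expected a redis:// URL, got {parts.scheme!r}")
    query = parse_qs(parts.query)
    return {
        "host": parts.hostname,
        "port": parts.port or 6379,  # default port assumed by this page
        "has_auth": parts.password is not None,
        "tls": query.get("tls", ["false"])[0] == "true",
    }

print(describe_redis_url("redis://user:secret@redis.example.com:6380?tls=true"))
```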

6 - W&B Inference

Access open-source foundation models through W&B Weave and an OpenAI-compatible API

W&B Inference gives you access to leading open-source foundation models through W&B Weave and an OpenAI-compatible API. You can:

Build AI applications and agents without signing up for a hosting provider or self-hosting a model
Try supported models in the W&B Weave Playground

With Weave, you can trace, evaluate, monitor, and improve your W&B Inference-powered applications.

Quickstart

Here’s a simple example using Python:

import openai

client = openai.OpenAI(
    # The custom base URL points to W&B Inference
    base_url='https://api.inference.wandb.ai/v1',
    
    # Get your API key from https://wandb.ai/authorize
    api_key="<your-api-key>",
    
    # Team and project are required for usage tracking
    project="<your-team>/<your-project>",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a joke."}
    ],
)

print(response.choices[0].message.content)

Next steps

Review the available models and usage information and limits
Set up your account using the prerequisites
Use the service through the API or UI
Try the usage examples

Usage details

Important

W&B Inference credits come with Free, Pro, and Academic plans for a limited time. Availability may vary for Enterprise accounts. When credits run out:

Free users must upgrade to a paid plan to continue using Inference.
👉 Upgrade to Pro or Enterprise
Pro users are billed monthly for usage beyond free credits, up to a default cap of $6,000/month. See Account tiers and default usage caps
Enterprise usage is capped at $700,000/year. Your account executive handles billing and limit increases. See Account tiers and default usage caps

To learn more, visit the pricing page or see model-specific costs.

6.1 - Prerequisites

Set up your environment to use W&B Inference

Complete these steps before using the W&B Inference service through the API or UI.

Tip

Before starting, review the usage information and limits to understand costs and restrictions.

Set up your W&B account and project

You need these items to access W&B Inference:

A W&B account
Sign up at W&B
A W&B API key
Get your API key at https://wandb.ai/authorize
A W&B project
Create a project in your W&B account to track usage

Set up your environment (Python)

To use the Inference API with Python, you also need to:

Complete the general requirements above
Install the required libraries:
```
pip install openai weave
```

Note

The weave library is optional but recommended. It lets you trace your LLM applications. Learn more in the Weave Quickstart.

See usage examples for code samples using W&B Inference with Weave.

Next steps

After completing the prerequisites:

Check the API reference to learn about available endpoints
Try the usage examples to see the service in action
Use the UI guide to access models through the web interface

6.2 - Available Models

Browse the foundation models available through W&B Inference

W&B Inference provides access to several open-source foundation models. Each model has different strengths and use cases.

Model comparison

Model	Model ID (for API usage)	Type	Context Window	Parameters	Description
OpenAI GPT OSS 120B	`openai/gpt-oss-120b`	Text	131,000	5.1B-117B (Active-Total)	Efficient Mixture-of-Experts model designed for high-reasoning, agentic and general-purpose use cases.
OpenAI GPT OSS 20B	`openai/gpt-oss-20b`	Text	131,000	3.6B-20B (Active-Total)	Lower latency Mixture-of-Experts model trained on OpenAI’s Harmony response format with reasoning capabilities.
Qwen3 235B A22B Thinking-2507	`Qwen/Qwen3-235B-A22B-Thinking-2507`	Text	262K	22B-235B (Active-Total)	High-performance Mixture-of-Experts model optimized for structured reasoning, math, and long-form generation
Qwen3 235B A22B-2507	`Qwen/Qwen3-235B-A22B-Instruct-2507`	Text	262K	22B-235B (Active-Total)	Efficient multilingual, Mixture-of-Experts, instruction-tuned model, optimized for logical reasoning
Qwen3 Coder 480B A35B	`Qwen/Qwen3-Coder-480B-A35B-Instruct`	Text	262K	35B-480B (Active-Total)	Mixture-of-Experts model optimized for coding tasks such as function calling, tooling use, and long-context reasoning
MoonshotAI Kimi K2	`moonshotai/Kimi-K2-Instruct`	Text	128K	32B-1T (Active-Total)	Mixture-of-Experts model optimized for complex tool use, reasoning, and code synthesis
DeepSeek R1-0528	`deepseek-ai/DeepSeek-R1-0528`	Text	161K	37B-680B (Active-Total)	Optimized for precise reasoning tasks including complex coding, math, and structured document analysis
DeepSeek V3-0324	`deepseek-ai/DeepSeek-V3-0324`	Text	161K	37B-680B (Active-Total)	Robust Mixture-of-Experts model tailored for high-complexity language processing and comprehensive document analysis
Meta Llama 3.1 8B	`meta-llama/Llama-3.1-8B-Instruct`	Text	128K	8B (Total)	Efficient conversational model optimized for responsive multilingual chatbot interactions
Meta Llama 3.3 70B	`meta-llama/Llama-3.3-70B-Instruct`	Text	128K	70B (Total)	Multilingual model excelling in conversational tasks, detailed instruction-following, and coding
Meta Llama 4 Scout	`meta-llama/Llama-4-Scout-17B-16E-Instruct`	Text, Vision	64K	17B-109B (Active-Total)	Multi-modal model integrating text and image understanding, ideal for visual tasks and combined analysis
Microsoft Phi 4 Mini 3.8B	`microsoft/Phi-4-mini-instruct`	Text	128K	3.8B (Active-Total)	Compact, efficient model ideal for fast responses in resource-constrained environments

Using model IDs

When using the API, specify the model using its ID from the table above. For example:

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[...]
)

Next steps

Check usage limits and pricing for each model
See API reference for how to use these models
Try models in the W&B Playground

6.3 - Usage Information and Limits

Understand pricing, usage limits, and account restrictions for W&B Inference

Learn about pricing, limits, and other important usage information before using W&B Inference.

Pricing

For detailed model pricing information, visit W&B Inference pricing.

Purchase more credits

W&B Inference credits come with Free, Pro, and Academic plans for a limited time. Enterprise availability may vary. When credits run out:

Free accounts must upgrade to a paid plan to continue using W&B Inference. Upgrade to Pro or Enterprise
Pro plan users are billed for overages monthly, based on model-specific pricing
Enterprise accounts should contact their account executive

Account tiers and default usage caps

Each account tier has a default spending cap to help manage costs and prevent unexpected charges. W&B requires prepayment for paid Inference access.

Some users may need to change their cap. Contact your account executive or support to adjust your limit.

Account Tier	Default Cap	How to Change Limit
Pro	$6,000/month	Contact your account executive or support for manual review
Enterprise	$700,000/year	Contact your account executive or support for manual review

Concurrency limits

If you exceed the rate limit, the API returns a 429 Concurrency limit reached for requests response. To fix this error, reduce the number of concurrent requests. For detailed troubleshooting, see W&B Inference support articles.

W&B applies rate limits per W&B project. For example, if you have 3 projects in a team, each project has its own rate limit quota.
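One way to stay under the limit is to cap the number of requests in flight and retry with exponential backoff when a 429 comes back. The following is a minimal sketch, not part of the W&B API: `RateLimited` is a hypothetical stand-in for whatever exception your client raises on HTTP 429 (with the `openai` package this is typically `openai.RateLimitError`), and `call_with_backoff` and `run_bounded` are illustrative helper names.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


class RateLimited(Exception):
    """Hypothetical stand-in for the error your client raises on HTTP 429."""


def call_with_backoff(fn, *args, max_retries=5, base_delay=0.5, sleep=time.sleep):
    """Call fn(*args), retrying with exponential backoff when it is rate limited."""
    for attempt in range(max_retries + 1):
        try:
            return fn(*args)
        except RateLimited:
            if attempt == max_retries:
                raise
            # Wait roughly 0.5s, 1s, 2s, ... plus a little jitter before retrying.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))


def run_bounded(fn, prompts, max_concurrency=4):
    """Run fn over prompts with at most max_concurrency requests in flight."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(lambda p: call_with_backoff(fn, p), prompts))
```

In practice, `fn` would wrap a single chat completion request. Lowering `max_concurrency` is the direct fix for the 429 error; the backoff only smooths over occasional spikes.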

Personal entities unsupported

Note

Personal entities were deprecated in May 2024, so this only applies to legacy accounts.

Personal accounts (personal entities) don’t support W&B Inference. To access W&B Inference, switch to a non-personal account by creating a Team.

Geographic restrictions

The Inference service is only available from supported geographic locations. For more information, see the Terms of Service.

Next steps

Review the prerequisites before getting started
See available models and their specific costs

6.4 - API Reference

Complete API reference for W&B Inference service

Learn how to use the W&B Inference API to access foundation models programmatically.

Tip

Having trouble with API errors? See W&B Inference support articles for solutions.

Endpoint

Access the Inference service at:

https://api.inference.wandb.ai/v1

Important

To use this endpoint, you need:

A W&B account with Inference credits
A valid W&B API key
A W&B entity (team) and project

In code samples, these appear as <your-team>/<your-project>.

Available methods

The Inference API supports these methods:

Chat completions

Create a chat completion using the /chat/completions endpoint. This endpoint follows the OpenAI format for sending messages and receiving responses.

To create a chat completion, provide:

The Inference service base URL: https://api.inference.wandb.ai/v1
Your W&B API key: <your-api-key>
Your W&B entity and project: <your-team>/<your-project>
A model ID from the available models

curl https://api.inference.wandb.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-api-key>" \
  -H "OpenAI-Project: <your-team>/<your-project>" \
  -d '{
    "model": "<model-id>",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Tell me a joke." }
    ]
  }'

import openai

client = openai.OpenAI(
    # The custom base URL points to W&B Inference
    base_url='https://api.inference.wandb.ai/v1',

    # Get your API key from https://wandb.ai/authorize
    # Consider setting it in the environment as OPENAI_API_KEY instead for safety
    api_key="<your-api-key>",

    # Team and project are required for usage tracking
    project="<your-team>/<your-project>",
)

# Replace <model-id> with any model ID from the available models list
response = client.chat.completions.create(
    model="<model-id>",
    messages=[
        {"role": "system", "content": "<your-system-prompt>"},
        {"role": "user", "content": "<your-prompt>"}
    ],
)

print(response.choices[0].message.content)

List supported models

Get all available models and their IDs. Use this to select models dynamically or check what’s available.

curl https://api.inference.wandb.ai/v1/models \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-api-key>" \
  -H "OpenAI-Project: <your-team>/<your-project>"

import openai

client = openai.OpenAI(
    base_url="https://api.inference.wandb.ai/v1",
    api_key="<your-api-key>",
    project="<your-team>/<your-project>"
)

response = client.models.list()

for model in response.data:
    print(model.id)
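To select a model dynamically, you can compare the returned IDs against an ordered list of preferences. The sketch below assumes you have already collected the IDs into a plain list; `pick_model` is a hypothetical helper, and the hard-coded `available_ids` stands in for the real response.

```python
def pick_model(available_ids, preferred_ids):
    """Return the first preferred model ID that the service currently lists."""
    available = set(available_ids)
    for model_id in preferred_ids:
        if model_id in available:
            return model_id
    raise ValueError(f"None of {list(preferred_ids)} are available")


# In real use, build this list from the API response, for example:
#   available_ids = [m.id for m in client.models.list().data]
available_ids = [
    "meta-llama/Llama-3.1-8B-Instruct",
    "deepseek-ai/DeepSeek-V3-0324",
]

chosen = pick_model(
    available_ids,
    ["openai/gpt-oss-20b", "meta-llama/Llama-3.1-8B-Instruct"],
)
print(chosen)
```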

Response format

The API returns responses in OpenAI-compatible format:

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "meta-llama/Llama-3.1-8B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here's a joke for you..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 50,
    "total_tokens": 75
  }
}
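If you call the endpoint without an SDK, for example with curl, you can read the same fields from the raw JSON. The following sketch parses the sample response above with the standard library; `summarize` is a hypothetical helper name, and it only assumes the fields shown in the sample.

```python
import json

raw = """
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "meta-llama/Llama-3.1-8B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Here's a joke for you..."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 25, "completion_tokens": 50, "total_tokens": 75}
}
"""


def summarize(response: dict) -> dict:
    """Pull out the generated text, the stop reason, and the token count."""
    choice = response["choices"][0]
    usage = response.get("usage", {})
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": usage.get("total_tokens"),
    }


summary = summarize(json.loads(raw))
print(summary["text"])
```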

Next steps

Try the usage examples to see the API in action
Explore models in the UI

6.5 - Usage Examples

Learn how to use W&B Inference with practical code examples

These examples show how to use W&B Inference with Weave for tracing, evaluation, and comparison.

Basic example: Trace Llama 3.1 8B with Weave

This example shows how to send a prompt to the Llama 3.1 8B model and trace the call with Weave. Tracing captures the full input and output of the LLM call, monitors performance, and lets you analyze results in the Weave UI.

Tip

Learn more about tracing in Weave.

In this example:

You define a @weave.op()-decorated function that makes a chat completion request
Your traces are recorded and linked to your W&B entity and project
The function is automatically traced, logging inputs, outputs, latency, and metadata
The result prints in the terminal, and the trace appears in your Traces tab at https://wandb.ai

Before running this example, complete the prerequisites.

import weave
import openai

# Set the Weave team and project for tracing
weave.init("<your-team>/<your-project>")

client = openai.OpenAI(
    base_url='https://api.inference.wandb.ai/v1',

    # Get your API key from https://wandb.ai/authorize
    api_key="<your-api-key>",

    # Required for W&B inference usage tracking
    project="<your-team>/<your-project>",
)

# Trace the model call in Weave
@weave.op()
def run_chat():
    response = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Tell me a joke."}
        ],
    )
    return response.choices[0].message.content

# Run and log the traced call
output = run_chat()
print(output)

After running the code, view the trace in Weave by:

Clicking the link printed in the terminal (for example: https://wandb.ai/<your-team>/<your-project>/r/call/01977f8f-839d-7dda-b0c2-27292ef0e04g)
Or navigating to https://wandb.ai and selecting the Traces tab

Advanced example: Use Weave Evaluations and Leaderboards

Besides tracing model calls, you can also evaluate performance and publish leaderboards. This example compares two models on a question-answer dataset.

Before running this example, complete the prerequisites.

import os
import asyncio
import openai
import weave
from weave.flow import leaderboard
from weave.trace.ref_util import get_ref

# Set the Weave team and project for tracing
weave.init("<your-team>/<your-project>")

dataset = [
    {"input": "What is 2 + 2?", "target": "4"},
    {"input": "Name a primary color.", "target": "red"},
]

@weave.op
def exact_match(target: str, output: str) -> float:
    return float(target.strip().lower() == output.strip().lower())

class WBInferenceModel(weave.Model):
    model: str

    @weave.op
    def predict(self, prompt: str) -> str:
        client = openai.OpenAI(
            base_url="https://api.inference.wandb.ai/v1",
            # Get your API key from https://wandb.ai/authorize
            api_key="<your-api-key>",
            # Required for W&B inference usage tracking
            project="<your-team>/<your-project>",
        )
        resp = client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

llama = WBInferenceModel(model="meta-llama/Llama-3.1-8B-Instruct")
deepseek = WBInferenceModel(model="deepseek-ai/DeepSeek-V3-0324")

def preprocess_model_input(example):
    return {"prompt": example["input"]}

evaluation = weave.Evaluation(
    name="QA",
    dataset=dataset,
    scorers=[exact_match],
    preprocess_model_input=preprocess_model_input,
)

async def run_eval():
    await evaluation.evaluate(llama)
    await evaluation.evaluate(deepseek)

asyncio.run(run_eval())

spec = leaderboard.Leaderboard(
    name="Inference Leaderboard",
    description="Compare models on a QA dataset",
    columns=[
        leaderboard.LeaderboardColumn(
            evaluation_object_ref=get_ref(evaluation).uri(),
            scorer_name="exact_match",
            summary_metric_path="mean",
        )
    ],
)

weave.publish(spec)

After running this code, go to your W&B account at https://wandb.ai/ and:

Select the Traces tab to view your traces
Select the Evals tab to view your model evaluations
Select the Leaders tab to view the generated leaderboard

Next steps

Explore the API reference for all available methods
Try models in the UI

6.6 - UI Guide

Access W&B Inference models through the web interface

Learn how to use the W&B Inference service through the web UI. Before using the UI, complete the prerequisites.

Access the Inference service

You can access the Inference service from three places:

Direct link

Go to https://wandb.ai/inference.

From the Inference tab

Go to your W&B account at https://wandb.ai/
Select Inference from the left sidebar
A page displays with available models and model information

Using an Inference model in the Playground

From the Playground tab

Select Playground from the left sidebar. The Playground chat UI appears
Hover over W&B Inference in the LLM dropdown list. A dropdown with available models appears on the right
From the models dropdown, you can:
- Click any model name to try it in the Playground
- Compare multiple models

The Inference models dropdown in Playground

Try a model in the Playground

After selecting a model, you can test it in the Playground.

Compare multiple models

You can compare Inference models side by side in the Playground. Access the Compare view from two places:

From the Inference tab

Select Inference from the left sidebar. The available models page appears
Click anywhere on a model card (except the model name) to select it. The card border turns blue
Repeat for each model you want to compare
Click Compare N models in the Playground on any selected card. N shows the number of models selected
The comparison view opens

Now you can compare models and use all features from Try a model in the Playground.

Select multiple models to compare in Playground

From the Playground tab

Select Playground from the left sidebar. The Playground chat UI appears
Hover over W&B Inference in the LLM dropdown list. The models dropdown appears on the right
Select Compare from the dropdown. The Inference tab appears
Click anywhere on a model card (except the model name) to select it. The card border turns blue
Repeat for each model you want to compare
Click Compare N models in the Playground on any selected card. The comparison view opens

Now you can compare models and use all features from Try a model in the Playground.

View billing and usage information

Organization admins can track credit balance, usage history, and upcoming bills from the W&B UI:

Go to the W&B Billing page in the UI
Find the Inference billing information card in the bottom right corner
From here you can:
- Click View usage to see your usage over time
- View upcoming inference charges (for paid plans)

Tip

Visit the Inference pricing page for per-model pricing details.

Next steps

Review available models to find the best one for your needs
Try the API for programmatic access
See usage examples for code samples

6.7 - Support

Find answers to common W&B Inference questions

7 - Integrations

W&B integrations make it fast and easy to set up experiment tracking and data versioning inside existing projects. Check out integrations for ML frameworks such as PyTorch, ML libraries such as Hugging Face, or cloud services such as Amazon SageMaker.

Examples: Try the code with notebook and script examples for each integration.
Video Tutorials: Learn to use W&B with YouTube video tutorials

7.1 - Add wandb to any library

Add wandb to any library

This guide provides best practices on how to integrate W&B into your Python library to get powerful Experiment Tracking, GPU and System Monitoring, Model Checkpointing, and more for your own library.

If you are still learning how to use W&B, we recommend exploring the other W&B Guides in these docs, such as Experiment Tracking, before reading further.

Below we cover tips and best practices for when the codebase you are working on is more complicated than a single Python training script or Jupyter notebook. The topics covered are:

Setup requirements
User Login
Starting a wandb Run
Defining a Run Config
Logging to W&B
Distributed Training
Model Checkpointing and More
Hyper-parameter tuning
Advanced Integrations

Setup requirements

Before you get started, decide whether or not to require W&B in your library’s dependencies:

Require W&B on installation

Add the W&B Python library (wandb) to your dependencies file, for example, in your requirements.txt file:

torch==1.8.0 
...
wandb==0.13.*

Make W&B optional on installation

There are two ways to make the W&B SDK (wandb) optional:

A. Raise an error when a user tries to use wandb functionality without installing it manually and show an appropriate error message:

try: 
    import wandb 
except ImportError: 
    raise ImportError(
        "You are trying to use wandb which is not currently installed. "
        "Please install it using pip install wandb"
    )
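The snippet above raises as soon as the module containing it is imported. If you would rather defer the error until a user actually calls a wandb-backed feature, one possible approach is to check for the package lazily. This is a sketch using only the standard library; `require` and `get_wandb` are hypothetical helper names, not part of the wandb SDK.

```python
import importlib
import importlib.util


def require(module_name: str, install_hint: str):
    """Import and return a module, or raise a helpful ImportError if it is missing."""
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"This feature needs '{module_name}', which is not installed. "
            f"Install it with: {install_hint}"
        )
    return importlib.import_module(module_name)


def get_wandb():
    """Call this inside functions that need wandb, not at module import time."""
    return require("wandb", "pip install wandb")
```

With this pattern, users who never enable W&B logging can import and use your library without installing wandb.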

B. Add wandb as an optional dependency to your pyproject.toml file, if you are building a Python package:

[project]
name = "my_awesome_lib"
version = "0.1.0"
dependencies = [
    "torch",
    "scikit-learn"
]

[project.optional-dependencies]
dev = [
    "wandb"
]

Create an API key

An API key authenticates a client or machine to W&B. You can generate an API key from your user profile.

For a more streamlined approach, you can generate an API key by going directly to the W&B authorization page. Copy the displayed API key and save it in a secure location such as a password manager.

Click your user profile icon in the upper right corner.
Select User Settings, then scroll to the API Keys section.
Click Reveal. Copy the displayed API key. To hide the API key, reload the page.

Install the `wandb` library and log in

To install the wandb library locally and log in:

Set the WANDB_API_KEY environment variable to your API key.
```
export WANDB_API_KEY=<your_api_key>
```
Install the wandb library and log in.
```
pip install wandb

wandb login
```

pip install wandb

import wandb
wandb.login()

!pip install wandb

import wandb
wandb.login()

If a user is using wandb for the first time without following any of the steps mentioned above, they will automatically be prompted to log in when your script calls wandb.init.

Start a run

A W&B Run is a unit of computation logged by W&B. Typically, you associate a single W&B Run with each training experiment.

Initialize W&B and start a Run within your code with:

run = wandb.init()

Optionally, you can set the project name yourself, or let users set it through a parameter in your code such as wandb_project. Similarly, let users pass their username or team name through a parameter such as wandb_entity, which maps to the entity parameter:

run = wandb.init(project=wandb_project, entity=wandb_entity)

You must call run.finish() to finish the run. If this works with your integration’s design, use the run as a context manager:

# When this block exits, it calls run.finish() automatically.
# If it exits due to an exception, it uses run.finish(exit_code=1) which
# marks the run as failed.
with wandb.init() as run:
    ...

When to call `wandb.init`?

Your library should create the W&B Run as early as possible because any output in your console, including error messages, is logged as part of the W&B Run. This makes debugging easier.

Use `wandb` as an optional dependency

If you want to make wandb optional when your users use your library, you can either:

Define a wandb flag such as:

trainer = my_trainer(..., use_wandb=True)

python train.py ... --use-wandb

Or, set wandb to be disabled in wandb.init:

wandb.init(mode="disabled")

export WANDB_MODE=disabled

wandb disabled

Or, set wandb to offline mode. wandb still runs, but it does not try to communicate with W&B over the internet:

export WANDB_MODE=offline

os.environ['WANDB_MODE'] = 'offline'

wandb offline
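The flag and the mode setting can be combined so that your library's own switch takes priority and the environment variable is respected otherwise. The helper below is a sketch: `resolve_wandb_mode` is a hypothetical name, and the returned string is meant to be passed as `wandb.init(mode=...)`.

```python
import os


def resolve_wandb_mode(use_wandb: bool, environ=None) -> str:
    """Pick a value for wandb.init(mode=...) from a library flag and WANDB_MODE."""
    environ = os.environ if environ is None else environ
    if not use_wandb:
        # The user opted out through your library's own API.
        return "disabled"
    # Otherwise respect WANDB_MODE if the user exported it; default to online.
    return environ.get("WANDB_MODE", "online")


print(resolve_wandb_mode(use_wandb=False, environ={}))
```

Because a disabled run still accepts logging calls, the rest of your integration code does not need to branch on the flag.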

Define a run config

With a wandb run config, you can provide metadata about your model, dataset, and so on when you create a W&B Run. You can use this information to compare different experiments and quickly understand the main differences.

Typical config parameters you can log include:

Model name, version, architecture parameters, etc.
Dataset name, version, number of train/val examples, etc.
Training parameters such as learning rate, batch size, optimizer, etc.

The following code snippet shows how to log a config:

config = {"batch_size": 32, ...}
wandb.init(..., config=config)
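If your library already keeps its settings in a structured object, you can convert it to a plain dictionary before passing it as the config. The sketch below uses a standard-library dataclass; `TrainConfig` and its fields are illustrative assumptions, not names defined by W&B.

```python
from dataclasses import asdict, dataclass


@dataclass
class TrainConfig:
    # Hypothetical settings covering model, dataset, and training parameters.
    model_name: str = "resnet18"
    dataset_name: str = "flowers"
    learning_rate: float = 1e-3
    batch_size: int = 32
    optimizer: str = "adam"


# Override one default, then convert to a plain dict.
config = asdict(TrainConfig(batch_size=64))

# Pass the result as wandb.init(..., config=config).
print(config)
```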

Update the run config

Use wandb.Run.config.update to update the config. Updating your configuration dictionary is useful when parameters are obtained after the dictionary was defined. For example, you might want to add a model’s parameters after the model is instantiated.

run.config.update({"model_parameters": 3500})

For more information on how to define a config file, see Configure experiments.

Log to W&B

Log metrics

Create a dictionary where each key is the name of a metric and each value is the metric's value. Pass this dictionary to run.log:

for epoch in range(NUM_EPOCHS):
    for input, ground_truth in data: 
        prediction = model(input) 
        loss = loss_fn(prediction, ground_truth) 
        metrics = { "loss": loss } 
        run.log(metrics)

If you have a lot of metrics, you can have them automatically grouped in the UI by using prefixes in the metric name, such as train/... and val/.... This will create separate sections in your W&B Workspace for your training and validation metrics, or other metric types you’d like to separate:

metrics = {
    "train/loss": 0.4,
    "train/learning_rate": 0.4,
    "val/loss": 0.5, 
    "val/accuracy": 0.7
}
run.log(metrics)

See the wandb.Run.log() reference.
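To apply prefixes consistently, a library can add them in one place instead of at every logging site. The following is a small sketch; `with_prefix` is a hypothetical helper and the metric values are placeholders.

```python
def with_prefix(prefix: str, metrics: dict) -> dict:
    """Return a copy of metrics with every key rewritten as '<prefix>/<name>'."""
    return {f"{prefix}/{name}": value for name, value in metrics.items()}


train_metrics = {"loss": 0.4}
val_metrics = {"loss": 0.5, "accuracy": 0.7}

# Merge both groups into one dictionary, ready to pass to run.log.
metrics = {**with_prefix("train", train_metrics), **with_prefix("val", val_metrics)}
print(metrics)
```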

Prevent x-axis misalignments

If you perform multiple calls to run.log for the same training step, the wandb SDK increments an internal step counter for each call to run.log. This counter may not align with the training step in your training loop.

To avoid this situation, define your x-axis step explicitly with run.define_metric, one time, immediately after you call wandb.init:

with wandb.init(...) as run:
    run.define_metric("*", step_metric="global_step")

The glob pattern, *, means that every metric will use global_step as the x-axis in your charts. If you only want certain metrics to be logged against global_step, you can specify them instead:

run.define_metric("train/loss", step_metric="global_step")

Now, log your metrics together with the current global_step value each time you call run.log:

for step, (input, ground_truth) in enumerate(data):
    ...
    run.log({"global_step": step, "train/loss": 0.1})
    run.log({"global_step": step, "eval/loss": 0.2})

If you do not have access to the independent step variable, for example “global_step” is not available during your validation loop, the previously logged value for “global_step” is automatically used by wandb. In this case, ensure you log an initial value for the metric so it has been defined when it’s needed.
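A complementary approach, separate from run.define_metric, is to collect everything produced during one training step and emit it in a single log call, so that the step value and all metrics always travel together. The sketch below is an assumption about how a library might do this: `StepBuffer` is a hypothetical class, and `log_fn` would be `run.log` in real use.

```python
class StepBuffer:
    """Collect metrics for one training step and emit them in a single log call."""

    def __init__(self, log_fn):
        self._log_fn = log_fn  # for example, run.log
        self._pending = {}

    def add(self, metrics: dict):
        """Stage metrics without logging them yet."""
        self._pending.update(metrics)

    def flush(self, global_step: int):
        """Log all staged metrics together with the step value, then reset."""
        if self._pending:
            self._log_fn({"global_step": global_step, **self._pending})
            self._pending = {}


# Demonstration with a list standing in for run.log.
logged = []
buffer = StepBuffer(logged.append)
buffer.add({"train/loss": 0.1})
buffer.add({"eval/loss": 0.2})
buffer.flush(global_step=3)
print(logged)
```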

Log images, tables, audio, and more

In addition to metrics, you can log plots, histograms, tables, text, and media such as images, videos, audio, 3D objects, and more.

Some considerations when logging data include:

How often should the metric be logged? Should it be optional?
What type of data could be helpful in visualizing?
- For images, you can log sample predictions, segmentation masks, etc., to see the evolution over time.
- For text, you can log tables of sample predictions for later exploration.

See the logging guide for media, objects, plots, and more.

Distributed training

For frameworks supporting distributed environments, you can adapt any of the following workflows:

Detect which process is the “main” process and only use wandb there. Any required data coming from other processes must be routed to the main process first. This is the recommended workflow.
Call wandb in every process and auto-group them by giving them all the same unique group name.

See Log Distributed Training Experiments for more details.
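For the first workflow, many launchers expose the process rank through an environment variable. The sketch below assumes the `RANK` variable, which launchers such as torchrun set; your framework may expose the rank differently, and `is_main_process` is a hypothetical helper name.

```python
import os


def is_main_process(environ=None) -> bool:
    """Return True for rank 0, or when no RANK variable is set (single process)."""
    environ = os.environ if environ is None else environ
    return int(environ.get("RANK", "0")) == 0


# Only the main process would go on to call wandb.init and log metrics.
print(is_main_process({"RANK": "0"}), is_main_process({"RANK": "3"}))
```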

Log model checkpoints and more

If your framework uses or produces models or datasets, you can log them for full traceability and have wandb automatically monitor your entire pipeline through W&B Artifacts.

Stored Datasets and Model Checkpoints in W&B

When using Artifacts, it might be useful but not necessary to let your users define:

The ability to log model checkpoints or datasets (in case you want to make it optional).
The path/reference of the artifact being used as input, if any. For example, user/project/artifact.
The frequency for logging Artifacts.

Log model checkpoints

You can log model checkpoints to W&B. Use the unique wandb Run ID in the name of each output checkpoint to tell checkpoints from different Runs apart. You can also attach metadata and aliases to each model, as shown below:

metadata = {"eval/accuracy": 0.8, "train/steps": 800}

artifact = wandb.Artifact(
    name=f"model-{run.id}",
    metadata=metadata,
    type="model",
)
artifact.add_dir("output_model")  # local directory where the model weights are stored

aliases = ["best", "epoch_10"]
run.log_artifact(artifact, aliases=aliases)

For information on how to create a custom alias, see Create a Custom Alias.

You can log output Artifacts at any frequency (for example, every epoch, every 500 steps, and so on) and they are automatically versioned.
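The logging frequency and the aliases can be driven by two small helpers that your training loop calls. This is a sketch under assumed conventions: `should_checkpoint` and `checkpoint_aliases` are hypothetical names, and the `epoch_<n>` and `best` aliases simply mirror the example above.

```python
def should_checkpoint(step: int, every_n_steps: int) -> bool:
    """Return True on every Nth step, skipping step 0."""
    return step > 0 and step % every_n_steps == 0


def checkpoint_aliases(epoch: int, is_best: bool) -> list:
    """Build the alias list to pass to run.log_artifact(artifact, aliases=...)."""
    aliases = [f"epoch_{epoch}"]
    if is_best:
        aliases.append("best")
    return aliases


print(should_checkpoint(step=500, every_n_steps=500))
print(checkpoint_aliases(epoch=10, is_best=True))
```

Exposing `every_n_steps` as a user-facing option keeps the upload volume under the user's control.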

Log and track pre-trained models or datasets

You can log artifacts that are used as inputs to your training such as pre-trained models or datasets. The following snippet demonstrates how to log an Artifact and add it as an input to the ongoing Run as shown in the graph above.

artifact_input_data = wandb.Artifact(name="flowers", type="dataset")
artifact_input_data.add_file("flowers.npy")
run.use_artifact(artifact_input_data)

Download an artifact

When you reuse an Artifact (dataset, model, etc.), wandb downloads a copy locally and caches it:

artifact = run.use_artifact("user/project/artifact:latest")
local_path = artifact.download("./tmp")

Artifacts can be found in the Artifacts section of W&B and can be referenced with aliases generated automatically (latest, v2, v3) or manually when logging (best_accuracy, etc.).

To download an Artifact without creating a wandb run (through wandb.init), for example in distributed environments or for simple inference, you can instead reference the artifact with the wandb API:

artifact = wandb.Api().artifact("user/project/artifact:latest")
local_path = artifact.download()

For more information, see Download and Use Artifacts.

Tune hyper-parameters

If your library would like to leverage W&B hyper-parameter tuning, W&B Sweeps can also be added to your library.

Advanced integrations

You can also see what advanced W&B integrations look like in the following integrations. Note that most integrations will not be as complex as these:

7.2 - Azure OpenAI Fine-Tuning

How to Fine-Tune Azure OpenAI models using W&B.

Introduction

When you fine-tune GPT-3.5 or GPT-4 models on Microsoft Azure, W&B helps you track, analyze, and improve model performance. It automatically captures metrics and supports systematic evaluation through W&B’s experiment tracking and evaluation tools.

Prerequisites

Set up Azure OpenAI service according to official Azure documentation.
Configure a W&B account with an API key.

Workflow overview

1. Fine-tuning setup

Prepare training data according to Azure OpenAI requirements.
Configure the fine-tuning job in Azure OpenAI.
W&B automatically tracks the fine-tuning process, logging metrics and hyperparameters.

2. Experiment tracking

During fine-tuning, W&B captures:

Training and validation metrics
Model hyperparameters
Resource utilization
Training artifacts

3. Model evaluation

After fine-tuning, use W&B Weave to:

Evaluate model outputs against reference datasets
Compare performance across different fine-tuning runs
Analyze model behavior on specific test cases
Make data-driven decisions for model selection

Real-world example

Explore the medical note generation demo to see how this integration facilitates:
- Systematic tracking of fine-tuning experiments
- Model evaluation using domain-specific metrics
Go through an interactive demo of fine-tuning in a notebook

Additional resources

7.3 - Catalyst

How to integrate W&B with Catalyst, a PyTorch framework.

Catalyst is a PyTorch framework for deep learning R&D that focuses on reproducibility, rapid experimentation, and codebase reuse so you can create something new.

Catalyst includes a W&B integration for logging parameters, metrics, images, and other artifacts.

Check out their documentation of the integration, which includes examples using Python and Hydra.

Interactive Example

Run an example colab to see Catalyst and W&B integration in action.

7.4 - Cohere fine-tuning

How to Fine-Tune Cohere models using W&B.

With W&B you can log your Cohere model’s fine-tuning metrics and configuration to analyze and understand the performance of your models and share the results with your colleagues.

This guide from Cohere has a full example of how to kick off a fine-tuning run. For details on the API, see the Cohere API docs.

Log your Cohere fine-tuning results

To add Cohere fine-tuning logging to your W&B workspace:

Create a WandbConfig with your W&B API key, W&B entity and project name. You can find your W&B API key at https://wandb.ai/authorize

Pass this config to the FinetunedModel object along with your model name, dataset and hyperparameters to kick off your fine-tuning run.

from cohere.finetuning import WandbConfig, FinetunedModel, Settings

# create a config with your W&B details
wandb_ft_config = WandbConfig(
    api_key="<wandb_api_key>",
    entity="my-entity",  # must be a valid entity associated with the provided API key
    project="cohere-ft",
)

...  # set up your datasets and hyperparameters

# start a fine-tuning run on Cohere
# `co` is assumed to be an already initialized Cohere client
cmd_r_finetune = co.finetuning.create_finetuned_model(
    request=FinetunedModel(
        name="command-r-ft",
        settings=Settings(
            base_model=...,
            dataset_id=...,
            hyperparameters=...,
            wandb=wandb_ft_config,  # pass your W&B config here
        ),
    ),
)

View your model’s fine-tuning training and validation metrics and hyperparameters in the W&B project that you created.

Organize runs

Your W&B runs are automatically organized and can be filtered/sorted based on any configuration parameter such as job type, base model, learning rate and any other hyper-parameter.

In addition, you can rename your runs, add notes or create tags to group them.

Resources

Cohere Fine-tuning Example

7.5 - Databricks

How to integrate W&B with Databricks.

W&B integrates with Databricks by customizing the W&B Jupyter notebook experience in the Databricks environment.

Configure Databricks

Install wandb in the cluster

Navigate to your cluster configuration, choose your cluster, click Libraries. Click Install New, choose PyPI, and add the package wandb.

Set up authentication

To authenticate your W&B account you can add a Databricks secret which your notebooks can query.

# install databricks cli
pip install databricks-cli

# Generate a token from databricks UI
databricks configure --token

# Create a scope with one of the two commands (depending on whether you have security features enabled on Databricks):
# with security add-on
databricks secrets create-scope --scope wandb
# without security add-on
databricks secrets create-scope --scope wandb --initial-manage-principal users

# Add your api_key from: https://app.wandb.ai/authorize
databricks secrets put --scope wandb --key api_key

Examples

Simple example

import os
import wandb

api_key = dbutils.secrets.get("wandb", "api_key")
wandb.login(key=api_key)

with wandb.init() as run:
    run.log({"foo": 1})

Sweeps

Setup required (temporary) for notebooks attempting to use wandb.sweep() or wandb.agent():

import os

# These will not be necessary in the future
os.environ["WANDB_ENTITY"] = "my-entity"
os.environ["WANDB_PROJECT"] = "my-project-that-exists"

7.6 - DeepChecks

How to integrate W&B with DeepChecks.

Try in Colab

DeepChecks helps you validate your machine learning models and data, such as verifying your data’s integrity, inspecting its distributions, validating data splits, evaluating your model and comparing between different models, all with minimal effort.

Getting Started

To use DeepChecks with W&B you will first need to sign up for a W&B account. With the W&B integration in DeepChecks you can quickly get started like so:

import wandb

wandb.login()

# import your check from deepchecks
from deepchecks.checks import ModelErrorAnalysis

# run your check
result = ModelErrorAnalysis().run(...)

# push that result to wandb
result.to_wandb()

You can also log an entire DeepChecks test suite to W&B.

import wandb

wandb.login()

# import your full_suite tests from deepchecks
from deepchecks.suites import full_suite

# create and run a DeepChecks test suite
suite_result = full_suite().run(...)

# push the results to wandb
# here you can pass any wandb.init configs and arguments you need
suite_result.to_wandb(project="my-suite-project", config={"suite-name": "full-suite"})

Example

This Report shows off the power of using DeepChecks and W&B.

Any questions or issues about this W&B integration? Open an issue in the DeepChecks GitHub repository and we’ll catch it and get you an answer.

7.7 - DeepChem

How to integrate W&B with the DeepChem library.

The DeepChem library provides open source tools that democratize the use of deep-learning in drug discovery, materials science, chemistry, and biology. This W&B integration adds simple and easy-to-use experiment tracking and model checkpointing while training models using DeepChem.

DeepChem logging in 3 lines of code

logger = WandbLogger(…)
model = TorchModel(…, wandb_logger=logger)
model.fit(…)

Report and Google Colab

Explore the Using W&B with DeepChem: Molecular Graph Convolutional Networks article for example charts generated using the W&B DeepChem integration.

To dive straight into working code, check out this Google Colab.

Track experiments

Set up W&B for DeepChem models of type KerasModel or TorchModel.

An API key authenticates your machine to W&B. You can generate an API key from your user profile.

For a more streamlined approach, you can generate an API key by going directly to the W&B authorization page. Copy the displayed API key and save it in a secure location such as a password manager.

Click your user profile icon in the upper right corner.
Select User Settings, then scroll to the API Keys section.
Click Reveal. Copy the displayed API key. To hide the API key, reload the page.

Install the `wandb` library and log in

To install the wandb library locally and log in:

Command line:

Set the WANDB_API_KEY environment variable to your API key.
```
export WANDB_API_KEY=<your_api_key>
```
Install the wandb library and log in.
```
pip install wandb

wandb login
```

Python:

pip install wandb

import wandb
wandb.login()

Notebook:

!pip install wandb

import wandb
wandb.login()

Log your training and evaluation data to W&B

Training loss and evaluation metrics can be automatically logged to W&B. Optional evaluation can be enabled using the DeepChem ValidationCallback. The WandbLogger detects the ValidationCallback and logs the metrics it generates.

from deepchem.models import TorchModel, ValidationCallback

vc = ValidationCallback(…)  # optional
model = TorchModel(…, wandb_logger=logger)
model.fit(…, callbacks=[vc])
logger.finish()

from deepchem.models import KerasModel, ValidationCallback

vc = ValidationCallback(…)  # optional
model = KerasModel(…, wandb_logger=logger)
model.fit(…, callbacks=[vc])
logger.finish()

7.8 - Docker

How to integrate W&B with Docker.

Docker Integration

W&B can store a pointer to the Docker image that your code ran in, giving you the ability to restore a previous experiment to the exact environment it was run in. The wandb library looks for the WANDB_DOCKER environment variable to persist this state. We provide a few helpers that automatically set this state.

Local Development

wandb docker is a command that starts a Docker container, passes in wandb environment variables, mounts your code, and ensures wandb is installed. By default, the command uses a Docker image with TensorFlow, PyTorch, Keras, and Jupyter installed. You can use the same command to start your own Docker image: wandb docker my/image:latest. The command mounts the current directory into the “/app” directory of the container. You can change this with the “--dir” flag.

Production

The wandb docker-run command is provided for production workloads. It is meant to be a drop-in replacement for nvidia-docker. It is a simple wrapper around the docker run command that adds your credentials and the WANDB_DOCKER environment variable to the call. If you do not pass the “--runtime” flag and nvidia-docker is available on the machine, the command also sets the runtime to nvidia.
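
As a rough illustration of what such a wrapper does, the sketch below assembles a docker run argument list that forwards an API key and records the image in WANDB_DOCKER. The helper name build_docker_run_command and its parameters are hypothetical. This is not the wandb implementation, and it does not detect nvidia-docker; it only mirrors the behavior described above.

```python
import shlex


def build_docker_run_command(image, command, api_key, runtime=None):
    """Assemble a `docker run` argument list (hypothetical helper)."""
    args = ["docker", "run"]
    if runtime is not None:
        # Only pass --runtime when the caller asked for one explicitly.
        args += ["--runtime", runtime]
    # Forward the credentials and record which image the code runs in.
    args += ["-e", f"WANDB_API_KEY={api_key}", "-e", f"WANDB_DOCKER={image}"]
    args.append(image)
    args += list(command)
    return args


cmd = build_docker_run_command(
    "my/image:latest", ["python", "train.py"], api_key="<your_api_key>"
)
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(cmd))
```

Returning a list rather than a single string keeps the arguments safe to hand to subprocess.run without extra shell quoting.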

Kubernetes

If you run your training workloads in Kubernetes and the k8s API is exposed to your pod (which is the case by default), wandb queries the API for the digest of the Docker image and automatically sets the WANDB_DOCKER environment variable.

Restoring

If a run was instrumented with the WANDB_DOCKER environment variable, calling wandb restore username/project:run_id checks out a new branch that restores your code, then launches the exact Docker image used for training, pre-populated with the original command.

7.9 - Farama Gymnasium

How to integrate W&B with Farama Gymnasium.

If you’re using Farama Gymnasium, we automatically log videos of your environment generated by gymnasium.wrappers.Monitor. Set the monitor_gym keyword argument of wandb.init to True.

Our Gymnasium integration is very light. We look at the name of the video file being logged from Gymnasium and name the logged video after it, or fall back to "videos" if we don’t find a match. If you want more control, you can always log a video manually.
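
The two options above can be sketched as follows. The project name, the helper names, and the video path argument are placeholders chosen for illustration. wandb is imported inside the functions so the snippet can be read and imported without the library installed.

```python
# Keyword arguments for wandb.init; the project name is a placeholder.
init_kwargs = {"project": "gymnasium-videos", "monitor_gym": True}


def start_run():
    """Start a W&B run that picks up videos recorded by Gymnasium."""
    import wandb  # imported lazily; requires `pip install wandb`

    return wandb.init(**init_kwargs)


def log_video_manually(run, video_path):
    """Log one video file yourself when you want more control."""
    import wandb

    # "videos" matches the fallback key mentioned above.
    run.log({"videos": wandb.Video(video_path)})
```

Call start_run() once before creating the environment, and use log_video_manually(run, path) only for files that the automatic logging does not cover.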

Check out this report to learn more on how to use Gymnasium with the CleanRL library.

7.10 - fastai

If you’re using fastai to train your models, W&B has an easy integration using the WandbCallback. Explore the details in the interactive docs with examples.

An API key authenticates your machine to W&B. You can generate an API key from your user profile.

For a more streamlined approach, you can generate an API key by going directly to the W&B authorization page. Copy the displayed API key and save it in a secure location such as a password manager.

Click your user profile icon in the upper right corner.
Select User Settings, then scroll to the API Keys section.
Click Reveal. Copy the displayed API key. To hide the API key, reload the page.

Install the `wandb` library and log in

To install the wandb library locally and log in:

Command line:

Set the WANDB_API_KEY environment variable to your API key.
```
export WANDB_API_KEY=<your_api_key>
```
Install the wandb library and log in.
```
pip install wandb

wandb login
```

Python:

pip install wandb

import wandb
wandb.login()

Notebook:

!pip install wandb

import wandb
wandb.login()

Add the `WandbCallback` to the `learner` or `fit` method

import wandb
from fastai.callback.wandb import *

# start logging a wandb run
wandb.init(project="my_project")

# To log only during one training phase
learn.fit(..., cbs=WandbCallback())

# To log continuously for all training phases
learn = learner(..., cbs=WandbCallback())

If you use version 1 of Fastai, refer to the Fastai v1 docs.

WandbCallback Arguments

WandbCallback accepts the following arguments:

Args	Description
log	Which of the model’s values to log: `gradients`, `parameters`, `all`, or `None` (default). Losses and metrics are always logged.
log_preds	Whether to log prediction samples (defaults to `True`).
log_preds_every_epoch	Whether to log predictions every epoch or only at the end (defaults to `False`).
log_model	Whether to log the model (defaults to `False`). This also requires `SaveModelCallback`.
model_name	The name of the file to save. Overrides `SaveModelCallback`.
log_dataset	`False` (default): do not log the dataset. `True`: log the folder referenced by `learn.dls.path`. You can also pass a path explicitly to choose which folder to log. Note: the subfolder “models” is always ignored.
dataset_name	Name of the logged dataset (defaults to the folder name).
valid_dl	`DataLoaders` containing items used for prediction samples (defaults to random items from `learn.dls.valid`).
n_preds	Number of logged predictions (defaults to 36).
seed	Used for defining random samples.
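
As a minimal sketch of how these arguments fit together, the snippet below collects a few of them in a dictionary and builds the callback from it. The helper name make_wandb_callback and the chosen values (including the seed) are illustrative, not recommendations. fastai is imported inside the function so the dictionary can be inspected without fastai installed.

```python
# Argument names come from the table above; the values are illustrative.
callback_kwargs = {
    "log": "all",                    # log gradients and parameters
    "log_preds": True,               # log prediction samples
    "log_preds_every_epoch": False,  # log predictions only at the end
    "n_preds": 36,                   # number of logged predictions
    "seed": 12345,                   # hypothetical seed for sample selection
}


def make_wandb_callback():
    """Build a WandbCallback from the settings above."""
    from fastai.callback.wandb import WandbCallback  # requires fastai

    return WandbCallback(**callback_kwargs)
```

You would then pass the result to training, for example learn.fit(..., cbs=make_wandb_callback()).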

For custom workflows, you can manually log your datasets and models:

log_dataset(path, name=None, metadata={})
log_model(path, name=None, metadata={})

Note: any subfolder “models” will be ignored.

Distributed Training

fastai supports distributed training by using the context manager distrib_ctx. W&B supports this automatically and enables you to track your multi-GPU experiments out of the box.

Review this minimal example:

import wandb
from fastai.vision.all import *
from fastai.distributed import *
from fastai.callback.wandb import WandbCallback

wandb.require(experiment="service")
path = rank0_first(lambda: untar_data(URLs.PETS) / "images")

def train():
    dls = ImageDataLoaders.from_name_func(
        path,
        get_image_files(path),
        valid_pct=0.2,
        label_func=lambda x: x[0].isupper(),
        item_tfms=Resize(224),
    )
    wandb.init(project="fastai_ddp", entity="capecape")
    cb = WandbCallback()
    learn = vision_learner(dls, resnet34, metrics=error_rate, cbs=cb).to_fp16()
    with learn.distrib_ctx(sync_bn=False):
        learn.fit(1)

if __name__ == "__main__":
    train()

Then, in your terminal, run:

$ torchrun --nproc_per_node 2 train.py

In this case, the machine has 2 GPUs.

You can now run distributed training directly inside a notebook.

import wandb
from fastai.vision.all import *

from accelerate import notebook_launcher
from fastai.distributed import *
from fastai.callback.wandb import WandbCallback

wandb.require(experiment="service")
path = untar_data(URLs.PETS) / "images"

def train():
    dls = ImageDataLoaders.from_name_func(
        path,
        get_image_files(path),
        valid_pct=0.2,
        label_func=lambda x: x[0].isupper(),
        item_tfms=Resize(224),
    )
    wandb.init(project="fastai_ddp", entity="capecape")
    cb = WandbCallback()
    learn = vision_learner(dls, resnet34, metrics=error_rate, cbs=cb).to_fp16()
    with learn.distrib_ctx(in_notebook=True, sync_bn=False):
        learn.fit(1)

notebook_launcher(train, num_processes=2)

Log only on the main process

In the examples above, wandb launches one run per process, so at the end of training you end up with two runs. This can be confusing, and you may want to log only on the main process. To do so, detect manually which process you are in and avoid creating runs (that is, calling wandb.init) in all other processes.

import wandb
from fastai.vision.all import *
from fastai.distributed import *
from fastai.callback.wandb import WandbCallback

wandb.require(experiment="service")
path = rank0_first(lambda: untar_data(URLs.PETS) / "images")

def train():
    cb = []
    dls = ImageDataLoaders.from_name_func(
        path,
        get_image_files(path),
        valid_pct=0.2,
        label_func=lambda x: x[0].isupper(),
        item_tfms=Resize(224),
    )
    if rank_distrib() == 0:
        run = wandb.init(project="fastai_ddp", entity="capecape")
        cb = WandbCallback()
    learn = vision_learner(dls, resnet34, metrics=error_rate, cbs=cb).to_fp16()
    with learn.distrib_ctx(sync_bn=False):
        learn.fit(1)

if __name__ == "__main__":
    train()

In your terminal, run:

$ torchrun --nproc_per_node 2 train.py

import wandb
from fastai.vision.all import *

from accelerate import notebook_launcher
from fastai.distributed import *
from fastai.callback.wandb import WandbCallback

wandb.require(experiment="service")
path = untar_data(URLs.PETS) / "images"

def train():
    cb = []
    dls = ImageDataLoaders.from_name_func(
        path,
        get_image_files(path),
        valid_pct=0.2,
        label_func=lambda x: x[0].isupper(),
        item_tfms=Resize(224),
    )
    if rank_distrib() == 0:
        run = wandb.init(project="fastai_ddp", entity="capecape")
        cb = WandbCallback()
    learn = vision_learner(dls, resnet34, metrics=error_rate, cbs=cb).to_fp16()
    with learn.distrib_ctx(in_notebook=True, sync_bn=False):
        learn.fit(1)

notebook_launcher(train, num_processes=2)
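
The rank check used in the examples above can be illustrated without fastai. This is a minimal sketch that assumes the launcher exports a RANK environment variable, as torchrun does. The helper name is_main_process is hypothetical and stands in for the rank_distrib() == 0 test.

```python
import os


def is_main_process(environ=None):
    """Return True when this process has rank 0 (or no rank is set)."""
    environ = os.environ if environ is None else environ
    return int(environ.get("RANK", 0)) == 0


# Only the rank-0 process would call wandb.init and attach WandbCallback.
for rank in ("0", "1"):
    print(rank, is_main_process({"RANK": rank}))
```

Treating a missing RANK as rank 0 means the same script still logs when it is run as a single, non-distributed process.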

Examples

Visualize, track, and compare Fastai models: A thoroughly documented walkthrough.
Image Segmentation on CamVid: A sample use case of the integration.

7.10.1 - fastai v1

This documentation is for fastai v1. If you use the current version of fastai, refer to the fastai page.

For scripts using fastai v1, we have a callback that can automatically log model topology, losses, metrics, weights, gradients, sample predictions and best trained model.

import wandb
from wandb.fastai import WandbCallback

wandb.init()

learn = cnn_learner(data, model, callback_fns=WandbCallback)
learn.fit(epochs)

Requested logged data is configurable through the callback constructor.

from functools import partial

learn = cnn_learner(
    data, model, callback_fns=partial(WandbCallback, input_type="images")
)

It is also possible to use WandbCallback only when starting training. In this case it must be instantiated.

learn.fit(epochs, callbacks=WandbCallback(learn))

Custom parameters can also be given at that stage.

learn.fit(epochs, callbacks=WandbCallback(learn, input_type="images"))

Example Code

We’ve created a few examples for you to see how the integration works:

Fastai v1

Classify Simpsons characters: A simple demo to track and compare Fastai models
Semantic Segmentation with Fastai: Optimize neural networks on self-driving cars

Options

The WandbCallback() class supports a number of options:

Keyword argument	Default	Description
learn	N/A	the fast.ai learner to hook.
save_model	True	save the model if it’s improved at each step. It will also load best model at the end of training.
mode	auto	`min`, `max`, or `auto`: How to compare the training metric specified in `monitor` between steps.
monitor	None	training metric used to measure performance for saving the best model. None defaults to validation loss.
log	gradients	`gradients`, `parameters`, `all`, or None. Losses & metrics are always logged.
input_type	None	`images` or `None`. Used to display sample predictions.
validation_data	None	data used for sample predictions if `input_type` is set.
predictions	36	number of predictions to make if `input_type` is set and `validation_data` is `None`.
seed	12345	initialize random generator for sample predictions if `input_type` is set and `validation_data` is `None`.

7.11 - Hugging Face Transformers

The Hugging Face Transformers library makes state-of-the-art NLP models like BERT and training techniques like mixed precision and gradient checkpointing easy to use. The W&B integration adds rich, flexible experiment tracking and model versioning to interactive centralized dashboards without compromising that ease of use.

Next-level logging in a few lines

import os

os.environ["WANDB_PROJECT"] = "<my-amazing-project>"  # name your W&B project
os.environ["WANDB_LOG_MODEL"] = "checkpoint"  # log all model checkpoints

from transformers import TrainingArguments, Trainer

args = TrainingArguments(..., report_to="wandb")  # turn on W&B logging
trainer = Trainer(..., args=args)

If you’d rather dive straight into working code, check out this Google Colab.

Get started: track experiments

An API key authenticates your machine to W&B. You can generate an API key from your user profile.

For a more streamlined approach, you can generate an API key by going directly to the W&B authorization page. Copy the displayed API key and save it in a secure location such as a password manager.

Click your user profile icon in the upper right corner.
Select User Settings, then scroll to the API Keys section.
Click Reveal. Copy the displayed API key. To hide the API key, reload the page.

Install the `wandb` library and log in

To install the wandb library locally and log in:

Command line:

Set the WANDB_API_KEY environment variable to your API key.
```
export WANDB_API_KEY=<your_api_key>
```
Install the wandb library and log in.
```
pip install wandb

wandb login
```

Python:

pip install wandb

import wandb
wandb.login()

Notebook:

!pip install wandb

import wandb
wandb.login()

If you are using W&B for the first time, you might want to check out our quickstart.

Name the project

A W&B Project is where all of the charts, data, and models logged from related runs are stored. Naming your project helps you organize your work and keep all the information about a single project in one place.

To add a run to a project simply set the WANDB_PROJECT environment variable to the name of your project. The WandbCallback will pick up this project name environment variable and use it when setting up your run.

Command line:

WANDB_PROJECT=amazon_sentiment_analysis

Python:

import os
os.environ["WANDB_PROJECT"] = "amazon_sentiment_analysis"

Notebook:

%env WANDB_PROJECT=amazon_sentiment_analysis

Make sure you set the project name before you initialize the Trainer.

If a project name is not specified the project name defaults to huggingface.
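
The fallback described above can be sketched as a small lookup. The function name resolve_project_name is hypothetical. It mirrors the documented behavior (WANDB_PROJECT if set, otherwise huggingface) and is not the library's own code.

```python
import os


def resolve_project_name(environ=None, default="huggingface"):
    """Return the project name from WANDB_PROJECT, or the default."""
    environ = os.environ if environ is None else environ
    return environ.get("WANDB_PROJECT", default)


print(resolve_project_name({"WANDB_PROJECT": "amazon_sentiment_analysis"}))
print(resolve_project_name({}))
```

Because the variable is read when the run is set up, changing it after the Trainer has been initialized has no effect on that run.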

Log your training runs to W&B

This is the most important step: when defining your Trainer training arguments, either inside your code or from the command line, set report_to to "wandb" to enable logging with W&B.

The logging_steps argument in TrainingArguments will control how often training metrics are pushed to W&B during training. You can also give a name to the training run in W&B using the run_name argument.

That’s it. Now your models will log losses, evaluation metrics, model topology, and gradients to W&B while they train.

# Run your Python script with W&B logging enabled.
# --report_to wandb enables logging to W&B.
# --run_name sets the name of the W&B run (optional).
# Add your other command line arguments after these flags.
python run_glue.py \
  --report_to wandb \
  --run_name bert-base-high-lr

from transformers import TrainingArguments, Trainer

args = TrainingArguments(
    # other args and kwargs here
    report_to="wandb",  # enable logging to W&B
    run_name="bert-base-high-lr",  # name of the W&B run (optional)
    logging_steps=1,  # how often to log to W&B
)

trainer = Trainer(
    # other args and kwargs here
    args=args,  # your training args
)

trainer.train()  # start training and logging to W&B

Using TensorFlow? Just swap the PyTorch Trainer for the TensorFlow TFTrainer.

Turn on model checkpointing

Using Artifacts, you can store up to 100GB of models and datasets for free and then use the W&B Registry. Using Registry, you can register models to explore and evaluate them, prepare them for staging, or deploy them in your production environment.

To log your Hugging Face model checkpoints to Artifacts, set the WANDB_LOG_MODEL environment variable to one of:

checkpoint: Upload a checkpoint every args.save_steps from the TrainingArguments.
end: Upload the model at the end of training, if load_best_model_at_end is also set.
false: Do not upload the model.

Command line:

WANDB_LOG_MODEL="checkpoint"

Python:

import os

os.environ["WANDB_LOG_MODEL"] = "checkpoint"

Notebook:

%env WANDB_LOG_MODEL=checkpoint

Any Transformers Trainer you initialize from now on will upload models to your W&B project. The model checkpoints you log will be viewable through the Artifacts UI, and include the full model lineage (see an example model checkpoint in the UI here).

By default, your model will be saved to W&B Artifacts as model-{run_id} when WANDB_LOG_MODEL is set to end or checkpoint-{run_id} when WANDB_LOG_MODEL is set to checkpoint. However, if you pass a run_name in your TrainingArguments, the model will be saved as model-{run_name} or checkpoint-{run_name}.
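
The naming rule above can be written out as a short function. This is a sketch of the rule as stated in this section, with a hypothetical helper name (expected_artifact_name). It is useful for working out which Artifact name to look up later, not a reproduction of the integration's code.

```python
def expected_artifact_name(log_model, run_id, run_name=None):
    """Return the Artifact name implied by WANDB_LOG_MODEL, or None."""
    if log_model == "false":
        return None  # nothing is uploaded
    prefixes = {"end": "model", "checkpoint": "checkpoint"}
    if log_model not in prefixes:
        raise ValueError(f"unexpected WANDB_LOG_MODEL value: {log_model!r}")
    # A run_name passed to TrainingArguments replaces the run ID.
    suffix = run_name if run_name else run_id
    return f"{prefixes[log_model]}-{suffix}"


print(expected_artifact_name("end", "abc123"))
print(expected_artifact_name("checkpoint", "abc123", run_name="bert-base-high-lr"))
```

The run ID abc123 is a made-up value; in practice you would take it from your W&B workspace.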

W&B Registry

Once you have logged your checkpoints to Artifacts, you can then register your best model checkpoints and centralize them across your team with Registry. Using Registry, you can organize your best models by task, manage the lifecycles of models, track and audit the entire ML lifecycle, and automate downstream actions.

To link a model Artifact, refer to Registry.

Visualize evaluation outputs during training

Visualizing your model outputs during training or evaluation is often essential to understanding how your model is training.

By using the callbacks system in the Transformers Trainer, you can log additional helpful data, such as your model’s text generation outputs or other predictions, to W&B Tables.

See the Custom logging section below for a full guide on how to log evaluation outputs during training to a W&B Table like this:

Shows a W&B Table with evaluation outputs

Finish your W&B Run (Notebook only)

If your training is encapsulated in a Python script, the W&B run will end when your script finishes.

If you are using a Jupyter or Google Colab notebook, you’ll need to tell us when you’re done with training by calling run.finish().

run = wandb.init()
trainer.train()  # start training and logging to W&B

# post-training analysis, testing, other logged code

run.finish()

Visualize your results

Once you have logged your training results you can explore your results dynamically in the W&B Dashboard. It’s easy to compare across dozens of runs at once, zoom in on interesting findings, and coax insights out of complex data with flexible, interactive visualizations.

Advanced features and FAQs

How do I save the best model?

If you pass TrainingArguments with load_best_model_at_end=True to your Trainer, W&B saves the best performing model checkpoint to Artifacts.

If you save your model checkpoints as Artifacts, you can promote them to the Registry. In Registry, you can:

Organize your best model versions by ML task.
Centralize models and share them with your team.
Stage models for production or bookmark them for further evaluation.
Trigger downstream CI/CD processes.

How do I load a saved model?

If you saved your model to W&B Artifacts with WANDB_LOG_MODEL, you can download your model weights for additional training or to run inference. You just load them back into the same Hugging Face architecture that you used before.

import wandb
from transformers import AutoModelForSequenceClassification

# Create a new run
with wandb.init(project="amazon_sentiment_analysis") as run:
    # Pass the name and version of Artifact
    my_model_name = "model-bert-base-high-lr:latest"
    my_model_artifact = run.use_artifact(my_model_name)

    # Download model weights to a folder and return the path
    model_dir = my_model_artifact.download()

    # Load your Hugging Face model from that folder
    #  using the same model class
    model = AutoModelForSequenceClassification.from_pretrained(
        model_dir, num_labels=num_labels
    )

    # Do additional training, or run inference

How do I resume training from a checkpoint?

If you set WANDB_LOG_MODEL='checkpoint', you can also resume training by using the model_dir as the model_name_or_path argument in your TrainingArguments and passing resume_from_checkpoint=True to Trainer.

last_run_id = "xxxxxxxx"  # fetch the run_id from your wandb workspace

# resume the wandb run from the run_id
with wandb.init(
    project=os.environ["WANDB_PROJECT"],
    id=last_run_id,
    resume="must",
) as run:
    # Connect an Artifact to the run
    my_checkpoint_name = f"checkpoint-{last_run_id}:latest"
    my_checkpoint_artifact = run.use_artifact(my_checkpoint_name)

    # Download checkpoint to a folder and return the path
    checkpoint_dir = my_checkpoint_artifact.download()

    # reinitialize your model and trainer
    model = AutoModelForSequenceClassification.from_pretrained(
        "<model_name>", num_labels=num_labels
    )
    # your awesome training arguments here.
    training_args = TrainingArguments()

    trainer = Trainer(model=model, args=training_args)

    # make sure to use the checkpoint dir to resume training from the checkpoint
    trainer.train(resume_from_checkpoint=checkpoint_dir)

How do I log and view evaluation samples during training?

Logging to W&B via the Transformers Trainer is taken care of by the WandbCallback in the Transformers library. If you need to customize your Hugging Face logging you can modify this callback by subclassing WandbCallback and adding additional functionality that leverages additional methods from the Trainer class.

Below is the general pattern to add this new callback to the HF Trainer, and further down is a code-complete example to log evaluation outputs to a W&B Table:

# Instantiate the Trainer as normal
trainer = Trainer()

# Instantiate the new logging callback, passing it the Trainer object
evals_callback = WandbEvalsCallback(trainer, tokenizer, ...)

# Add the callback to the Trainer
trainer.add_callback(evals_callback)

# Begin Trainer training as normal
trainer.train()

View evaluation samples during training

The following section shows how to customize the WandbCallback to run model predictions and log evaluation samples to a W&B Table during training. We will log every eval_steps using the on_evaluate method of the Trainer callback.

Here, we wrote a decode_predictions function to decode the predictions and labels from the model output using the tokenizer.

Then, we create a pandas DataFrame from the predictions and labels and add an epoch column to the DataFrame.

Finally, we create a wandb.Table from the DataFrame and log it to wandb. Additionally, we can control the frequency of logging by logging the predictions every freq epochs.

Note: Unlike the regular WandbCallback this custom callback needs to be added to the trainer after the Trainer is instantiated and not during initialization of the Trainer. This is because the Trainer instance is passed to the callback during initialization.

from transformers.integrations import WandbCallback
import pandas as pd


def decode_predictions(tokenizer, predictions):
    labels = tokenizer.batch_decode(predictions.label_ids)
    logits = predictions.predictions.argmax(axis=-1)
    prediction_text = tokenizer.batch_decode(logits)
    return {"labels": labels, "predictions": prediction_text}


class WandbPredictionProgressCallback(WandbCallback):
    """Custom WandbCallback to log model predictions during training.

    This callback logs model predictions and labels to a wandb.Table at each
    logging step during training. It lets you visualize the
    model predictions as the training progresses.

    Attributes:
        trainer (Trainer): The Hugging Face Trainer instance.
        tokenizer (AutoTokenizer): The tokenizer associated with the model.
        sample_dataset (Dataset): A subset of the validation dataset
          for generating predictions.
        num_samples (int, optional): Number of samples to select from
          the validation dataset for generating predictions. Defaults to 100.
        freq (int, optional): Frequency of logging. Defaults to 2.
    """

    def __init__(self, trainer, tokenizer, val_dataset, num_samples=100, freq=2):
        """Initializes the WandbPredictionProgressCallback instance.

        Args:
            trainer (Trainer): The Hugging Face Trainer instance.
            tokenizer (AutoTokenizer): The tokenizer associated
              with the model.
            val_dataset (Dataset): The validation dataset.
            num_samples (int, optional): Number of samples to select from
              the validation dataset for generating predictions.
              Defaults to 100.
            freq (int, optional): Frequency of logging. Defaults to 2.
        """
        super().__init__()
        self.trainer = trainer
        self.tokenizer = tokenizer
        self.sample_dataset = val_dataset.select(range(num_samples))
        self.freq = freq

    def on_evaluate(self, args, state, control, **kwargs):
        super().on_evaluate(args, state, control, **kwargs)
        # control the frequency of logging by logging the predictions
        # every `freq` epochs
        if state.epoch % self.freq == 0:
            # generate predictions
            predictions = self.trainer.predict(self.sample_dataset)
            # decode predictions and labels
            predictions = decode_predictions(self.tokenizer, predictions)
            # add predictions to a wandb.Table
            predictions_df = pd.DataFrame(predictions)
            predictions_df["epoch"] = state.epoch
            records_table = self._wandb.Table(dataframe=predictions_df)
            # log the table to wandb
            self._wandb.log({"sample_predictions": records_table})


# First, instantiate the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=lm_datasets["train"],
    eval_dataset=lm_datasets["validation"],
)

# Instantiate the WandbPredictionProgressCallback
progress_callback = WandbPredictionProgressCallback(
    trainer=trainer,
    tokenizer=tokenizer,
    val_dataset=lm_datasets["validation"],
    num_samples=10,
    freq=2,
)

# Add the callback to the trainer
trainer.add_callback(progress_callback)
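
To see what the freq check in on_evaluate does in practice, here is a small standalone sketch of the same modulo test. It assumes state.epoch holds whole values such as 2.0 at epoch boundaries. The list of evaluation epochs is made up for illustration, and should_log_predictions is a hypothetical helper.

```python
def should_log_predictions(epoch, freq):
    """Mirror the `state.epoch % self.freq == 0` check from the callback."""
    return epoch % freq == 0


# Evaluations at the end of epochs 1 to 4, plus one mid-epoch evaluation at 2.5.
evaluation_epochs = [1.0, 2.0, 2.5, 3.0, 4.0]
logged = [e for e in evaluation_epochs if should_log_predictions(e, freq=2)]
print(logged)  # [2.0, 4.0]
```

Note that an evaluation that lands at a fractional epoch, such as 2.5, never passes the check. If you evaluate on a step schedule rather than once per epoch, you may prefer to base the condition on state.global_step instead.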

For a more detailed example, refer to this Colab.

What additional W&B settings are available?

Further configuration of what is logged with Trainer is possible by setting environment variables. A full list of W&B environment variables can be found here.

Environment Variable	Usage
`WANDB_PROJECT`	Give your project a name (`huggingface` by default).
`WANDB_LOG_MODEL`	Log the model checkpoint as a W&B Artifact. `false` (default): no model checkpointing. `checkpoint`: a checkpoint is uploaded every args.save_steps (set in the Trainer’s TrainingArguments). `end`: the final model checkpoint is uploaded at the end of training.
`WANDB_WATCH`	Set whether to log your model’s gradients, parameters, or neither. `false` (default): no gradient or parameter logging. `gradients`: log histograms of the gradients. `all`: log histograms of gradients and parameters.
`WANDB_DISABLED`	Set to `true` to turn off logging entirely (`false` by default).
`WANDB_QUIET`	Set to `true` to limit statements logged to standard output to critical statements only (`false` by default).
`WANDB_SILENT`	Set to `true` to silence the output printed by wandb (`false` by default).

Command line:

WANDB_WATCH=all
WANDB_SILENT=true

Notebook:

%env WANDB_WATCH=all
%env WANDB_SILENT=true
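
The same settings can also be applied from Python before the Trainer is created. The sketch below is one possible way to do it. The helper configure_wandb_env and its parameters are hypothetical, and the allowed WANDB_WATCH values are taken from the table above.

```python
import os

# Values listed for WANDB_WATCH in the table above.
ALLOWED_WATCH = {"false", "gradients", "all"}


def configure_wandb_env(watch="all", silent=True, environ=None):
    """Set WANDB_WATCH and WANDB_SILENT in the given environment mapping."""
    environ = os.environ if environ is None else environ
    if watch not in ALLOWED_WATCH:
        raise ValueError(f"WANDB_WATCH must be one of {sorted(ALLOWED_WATCH)}")
    environ["WANDB_WATCH"] = watch
    # Environment variables are strings, so booleans are written as text.
    environ["WANDB_SILENT"] = "true" if silent else "false"
    return environ


# Passing a plain dict keeps this demonstration from touching the real environment.
print(configure_wandb_env(environ={}))
```

Call configure_wandb_env() with no environ argument to modify the real process environment, and do so before initializing the Trainer.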

How do I customize `wandb.init`?

The WandbCallback that Trainer uses calls wandb.init under the hood when Trainer is initialized. Alternatively, you can set up your runs manually by calling wandb.init before the Trainer is initialized. This gives you full control over your W&B run configuration.

An example of what you might want to pass to init is below. For wandb.init() details, see the wandb.init() reference.

wandb.init(
    project="amazon_sentiment_analysis",
    name="bert-base-high-lr",
    tags=["baseline", "high-lr"],
    group="bert",
)

Additional resources

Below are 6 Transformers and W&B related articles you might enjoy.

Hyperparameter Optimization for Hugging Face Transformers

Three strategies for hyperparameter optimization for Hugging Face Transformers are compared: Grid Search, Bayesian Optimization, and Population Based Training.
We use a standard uncased BERT model from Hugging Face transformers, and we want to fine-tune on the RTE dataset from the SuperGLUE benchmark.
Results show that Population Based Training is the most effective approach to hyperparameter optimization of our Hugging Face transformer model.

Read the Hyperparameter Optimization for Hugging Face Transformers report.

Hugging Tweets: Train a Model to Generate Tweets

In the article, the author demonstrates how to fine-tune a pre-trained GPT2 HuggingFace Transformer model on anyone’s Tweets in five minutes.
The model uses the following pipeline: Downloading Tweets, Optimizing the Dataset, Initial Experiments, Comparing Losses Between Users, Fine-Tuning the Model.

Read the full report here.

Sentence Classification With Hugging Face BERT and W&B

In this article, we’ll build a sentence classifier leveraging the power of recent breakthroughs in Natural Language Processing, focusing on an application of transfer learning to NLP.
We’ll be using The Corpus of Linguistic Acceptability (CoLA) dataset for single sentence classification, which is a set of sentences labeled as grammatically correct or incorrect that was first published in May 2018.
We’ll use Google’s BERT to create high performance models with minimal effort on a range of NLP tasks.

Read the full report here.

A Step by Step Guide to Tracking Hugging Face Model Performance

We use W&B and Hugging Face transformers to train DistilBERT, a Transformer that’s 40% smaller than BERT but retains 97% of BERT’s accuracy, on the GLUE benchmark.
The GLUE benchmark is a collection of nine datasets and tasks for training NLP models.

Read the full report here.

Examples of Early Stopping in HuggingFace

Fine-tuning a Hugging Face Transformer using Early Stopping regularization can be done natively in PyTorch or TensorFlow.
Using the EarlyStopping callback in TensorFlow is straightforward with the tf.keras.callbacks.EarlyStopping callback.
In PyTorch, there is not an off-the-shelf early stopping method, but there is a working early stopping hook available on GitHub Gist.

Read the full report here.

How to Fine-Tune Hugging Face Transformers on a Custom Dataset

We fine-tune a DistilBERT transformer for sentiment analysis (binary classification) on a custom IMDB dataset.

Read the full report here.

Get help or request features

For any issues, questions, or feature requests for the Hugging Face W&B integration, feel free to post in this thread on the Hugging Face forums or open an issue on the Hugging Face Transformers GitHub repo.

7.12 - Hugging Face Diffusers

Hugging Face Diffusers is the go-to library for state-of-the-art pre-trained diffusion models for generating images, audio, and even 3D structures of molecules. The W&B integration adds rich, flexible experiment tracking, media visualization, pipeline architecture, and configuration management to interactive centralized dashboards without compromising that ease of use.

Next-level logging in just two lines

Log all the prompts, negative prompts, generated media, and configs associated with your experiment by adding two lines of code:

# import the autolog function
from wandb.integration.diffusers import autolog

# call the autolog before calling the pipeline
autolog(init=dict(project="diffusers_logging"))


An example of how the results of your experiment are logged.

Get started

Install diffusers, transformers, accelerate, and wandb.

Command line:

pip install --upgrade diffusers transformers accelerate wandb

Notebook:

!pip install --upgrade diffusers transformers accelerate wandb

Use autolog to initialize a W&B Run and automatically track the inputs and the outputs from all supported pipeline calls.

You can call the autolog() function with the init parameter, which accepts a dictionary of parameters required by wandb.init().
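
As a rough sketch of how the init dictionary might be assembled, the example below builds the keyword arguments separately and passes them to autolog() in a second step. The helper names `build_init_kwargs` and `start_autolog` are hypothetical, and the run name and tag are placeholders. `project`, `name`, `tags`, and `config` are parameters of wandb.init().

```python
def build_init_kwargs(project, name=None, tags=None, config=None):
    """Collect keyword arguments destined for wandb.init()."""
    kwargs = {"project": project}
    if name is not None:
        kwargs["name"] = name
    if tags:
        kwargs["tags"] = list(tags)
    if config:
        kwargs["config"] = dict(config)
    return kwargs


def start_autolog(init_kwargs):
    # Imported lazily so the dictionary can be built and inspected
    # on a machine where wandb is not installed.
    from wandb.integration.diffusers import autolog

    autolog(init=init_kwargs)


init_kwargs = build_init_kwargs(
    "diffusers_logging",
    name="sd-2-1-baseline",     # placeholder run name
    tags=["stable-diffusion"],  # placeholder tag
)
print(init_kwargs)
# Call start_autolog(init_kwargs) before invoking any pipeline.
```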

When you call autolog(), it initializes a W&B Run and automatically tracks the inputs and the outputs from all supported pipeline calls.
- Each pipeline call is tracked in its own table in the workspace, and the configs associated with the pipeline call are appended to the list of workflows in the configs for that run.
- The prompts, negative prompts, and the generated media are logged in a wandb.Table.
- All other configs associated with the experiment including seed and the pipeline architecture are stored in the config section for the run.
- The generated media for each pipeline call are also logged in media panels in the run.
You can find a [list of supported pipeline calls](https://github.com/wandb/wandb/blob/main/wandb/integration/diffusers/autologger.py#L12-L72). To request a new feature for this integration or report a bug, open an issue on the [W&B GitHub issues page](https://github.com/wandb/wandb/issues).

Examples

Autologging

Here is a brief end-to-end example of autolog in action:

Script:

import torch
from diffusers import DiffusionPipeline

# import the autolog function
from wandb.integration.diffusers import autolog

# call the autolog before calling the pipeline
autolog(init=dict(project="diffusers_logging"))

# Initialize the diffusion pipeline
pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

# Define the prompts, negative prompts, and seed.
prompt = ["a photograph of an astronaut riding a horse", "a photograph of a dragon"]
negative_prompt = ["ugly, deformed", "ugly, deformed"]
generator = torch.Generator(device="cpu").manual_seed(10)

# call the pipeline to generate the images
images = pipeline(
    prompt,
    negative_prompt=negative_prompt,
    num_images_per_prompt=2,
    generator=generator,
)

Notebook:

import torch
from diffusers import DiffusionPipeline

import wandb

# import the autolog function
from wandb.integration.diffusers import autolog

run = wandb.init()

# call the autolog before calling the pipeline
autolog(init=dict(project="diffusers_logging"))

# Initialize the diffusion pipeline
pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

# Define the prompts, negative prompts, and seed.
prompt = ["a photograph of an astronaut riding a horse", "a photograph of a dragon"]
negative_prompt = ["ugly, deformed", "ugly, deformed"]
generator = torch.Generator(device="cpu").manual_seed(10)

# call the pipeline to generate the images
images = pipeline(
    prompt,
    negative_prompt=negative_prompt,
    num_images_per_prompt=2,
    generator=generator,
)

# Finish the experiment
run.finish()

The results of a single experiment:
The results of multiple experiments:
The config of an experiment:

In IPython notebook environments, you need to explicitly call wandb.Run.finish() after calling the pipeline. This is not necessary when executing Python scripts.
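
One way to make sure the run is always finished in a notebook, even when generation raises an error, is a try/finally wrapper. The sketch below is an illustration under assumptions: `generate_and_finish` is a hypothetical helper, and `FakeRun` is a stand-in for the object returned by wandb.init() so that the example executes without wandb.

```python
def generate_and_finish(run, generate):
    """Call generate() and always finish the run, even if generation raises."""
    try:
        return generate()
    finally:
        run.finish()


class FakeRun:
    """Stand-in for a W&B run; only the finish() method is modeled."""

    def __init__(self):
        self.finished = False

    def finish(self):
        self.finished = True


fake_run = FakeRun()
result = generate_and_finish(fake_run, lambda: "images")
print(result, fake_run.finished)
```

In a notebook, you would pass the run returned by wandb.init() and a function that wraps the pipeline call.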

Tracking multi-pipeline workflows

This section demonstrates autolog with a typical Stable Diffusion XL + Refiner workflow, in which the latents generated by the StableDiffusionXLPipeline are refined by the corresponding refiner.

Script:

import torch
from diffusers import StableDiffusionXLImg2ImgPipeline, StableDiffusionXLPipeline
from wandb.integration.diffusers import autolog

# initialize the SDXL base pipeline
base_pipeline = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)
base_pipeline.enable_model_cpu_offload()

# initialize the SDXL refiner pipeline
refiner_pipeline = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    text_encoder_2=base_pipeline.text_encoder_2,
    vae=base_pipeline.vae,
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)
refiner_pipeline.enable_model_cpu_offload()

prompt = "a photo of an astronaut riding a horse on mars"
negative_prompt = "static, frame, painting, illustration, sd character, low quality, low resolution, greyscale, monochrome, nose, cropped, lowres, jpeg artifacts, deformed iris, deformed pupils, bad eyes, semi-realistic worst quality, bad lips, deformed mouth, deformed face, deformed fingers, deformed toes standing still, posing"

# Make the experiment reproducible by controlling randomness.
# The seed would be automatically logged to WandB.
seed = 42
generator_base = torch.Generator(device="cuda").manual_seed(seed)
generator_refiner = torch.Generator(device="cuda").manual_seed(seed)

# Call WandB Autolog for Diffusers. This would automatically log
# the prompts, generated images, pipeline architecture and all
# associated experiment configs to W&B, thus making your
# image generation experiments easy to reproduce, share and analyze.
autolog(init=dict(project="sdxl"))

# Call the base pipeline to generate the latents
image = base_pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    output_type="latent",
    generator=generator_base,
).images[0]

# Call the refiner pipeline to generate the refined image
image = refiner_pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=image[None, :],
    generator=generator_refiner,
).images[0]

Notebook:

import torch
from diffusers import StableDiffusionXLImg2ImgPipeline, StableDiffusionXLPipeline

import wandb
from wandb.integration.diffusers import autolog

run = wandb.init()

# initialize the SDXL base pipeline
base_pipeline = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)
base_pipeline.enable_model_cpu_offload()

# initialize the SDXL refiner pipeline
refiner_pipeline = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    text_encoder_2=base_pipeline.text_encoder_2,
    vae=base_pipeline.vae,
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)
refiner_pipeline.enable_model_cpu_offload()

prompt = "a photo of an astronaut riding a horse on mars"
negative_prompt = "static, frame, painting, illustration, sd character, low quality, low resolution, greyscale, monochrome, nose, cropped, lowres, jpeg artifacts, deformed iris, deformed pupils, bad eyes, semi-realistic worst quality, bad lips, deformed mouth, deformed face, deformed fingers, deformed toes standing still, posing"

# Make the experiment reproducible by controlling randomness.
# The seed would be automatically logged to WandB.
seed = 42
generator_base = torch.Generator(device="cuda").manual_seed(seed)
generator_refiner = torch.Generator(device="cuda").manual_seed(seed)

# Call WandB Autolog for Diffusers. This would automatically log
# the prompts, generated images, pipeline architecture and all
# associated experiment configs to W&B, thus making your
# image generation experiments easy to reproduce, share and analyze.
autolog(init=dict(project="sdxl"))

# Call the base pipeline to generate the latents
image = base_pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    output_type="latent",
    generator=generator_base,
).images[0]

# Call the refiner pipeline to generate the refined image
image = refiner_pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=image[None, :],
    generator=generator_refiner,
).images[0]

# Finish the experiment
run.finish()

Example of a Stable Diffusion XL + Refiner experiment:

More resources

¹ You must also set the AZURE_STORAGE_ACCOUNT and AZURE_STORAGE_KEY environment variables.

Parameter	Description
`log_freq`	(`epoch`, `batch`, or an `int`): if `epoch`, logs metrics at the end of each epoch. If `batch`, logs metrics at the end of each batch. If an `int`, logs metrics at the end of that many batches. Defaults to `epoch`.
`initial_global_step`	(int): Use this argument to correctly log the learning rate when you resume training from some initial_epoch, and a learning rate scheduler is used. This can be computed as step_size * initial_step. Defaults to 0.

Parameter	Description
`filepath`	(str): path to save the model file.
`monitor`	(str): The metric name to monitor.
`verbose`	(int): Verbosity mode, 0 or 1. Mode 0 is silent, and mode 1 displays messages when the callback takes an action.
`save_best_only`	(Boolean): if `save_best_only=True`, saves only the latest model or the model it considers the best, as defined by the `monitor` and `mode` attributes.
`save_weights_only`	(Boolean): if True, saves only the model’s weights.
`mode`	(`auto`, `min`, or `max`): For `val_acc`, set it to `max`, for `val_loss`, set it to `min`, and so on
`save_freq`	(“epoch” or int): When using ‘epoch’, the callback saves the model after each epoch. When using an integer, the callback saves the model at end of this many batches. Note that when monitoring validation metrics such as `val_acc` or `val_loss`, `save_freq` must be set to “epoch” as those metrics are only available at the end of an epoch.
`options`	(str): Optional `tf.train.CheckpointOptions` object if `save_weights_only` is true or optional `tf.saved_model.SaveOptions` object if `save_weights_only` is false.
`initial_value_threshold`	(float): Floating point initial “best” value of the metric to be monitored.

Parameter	Description
`data_table_columns`	(list) List of column names for the `data_table`
`pred_table_columns`	(list) List of column names for the `pred_table`

Arguments
`monitor`	(str) name of metric to monitor. Defaults to `val_loss`.
`mode`	(str) one of {`auto`, `min`, `max`}. `min`: save the model when the monitored metric is minimized. `max`: save the model when the monitored metric is maximized. `auto`: try to guess when to save the model (default).
`save_model`	`True`: save a model when the monitored metric beats all previous epochs. `False`: don’t save models.
`save_graph`	(boolean) if True save model graph to wandb (default to True).
`save_weights_only`	(boolean) if True, saves only the model’s weights (`model.save_weights(filepath)`). Otherwise, saves the full model.
`log_weights`	(boolean) if True save histograms of the model’s layer’s weights.
`log_gradients`	(boolean) if True log histograms of the training gradients
`training_data`	(tuple) Same format `(X,y)` as passed to `model.fit`. This is needed for calculating gradients - this is mandatory if `log_gradients` is `True`.
`validation_data`	(tuple) Same format `(X,y)` as passed to `model.fit`. A set of data for wandb to visualize. If you set this field, every epoch, wandb makes a small number of predictions and saves the results for later visualization.
`generator`	(generator) a generator that returns validation data for wandb to visualize. This generator should return tuples `(X,y)`. Set either `validation_data` or `generator` for wandb to visualize specific data examples.
`validation_steps`	(int) if `validation_data` is a generator, how many steps to run the generator for the full validation set.
`labels`	(list) If you are visualizing your data with wandb, this list of labels converts numeric output to understandable strings when you are building a classifier with multiple classes. For a binary classifier, you can pass in a list of two labels [`label for false`, `label for true`]. If neither `validation_data` nor `generator` is set, this does nothing.
`predictions`	(int) the number of predictions to make for visualization each epoch, max is 100.
`input_type`	(string) type of the model input to help visualization. Can be one of: (`image`, `images`, `segmentation_mask`).
`output_type`	(string) type of the model output to help visualization. Can be one of: (`image`, `images`, `segmentation_mask`).
`log_evaluation`	(boolean) if True, save a Table containing validation data and the model’s predictions at each epoch. See `validation_indexes`, `validation_row_processor`, and `output_row_processor` for additional details.
`class_colors`	([float, float, float]) if the input or output is a segmentation mask, an array containing an rgb tuple (range 0-1) for each class.
`log_batch_frequency`	(integer) if None, callback logs every epoch. If set to integer, callback logs training metrics every `log_batch_frequency` batches.
`log_best_prefix`	(string) if None, saves no extra summary metrics. If set to a string, prepends the monitored metric and epoch with the prefix and saves the results as summary metrics.
`validation_indexes`	([wandb.data_types._TableLinkMixin]) an ordered list of index keys to associate with each validation example. If `log_evaluation` is True and you provide `validation_indexes`, does not create a Table of validation data. Instead, associates each prediction with the row represented by the `TableLinkMixin`. To obtain a list of row keys, use `Table.get_index()` .
`validation_row_processor`	(Callable) a function to apply to the validation data, commonly used to visualize the data. The function receives an `ndx` (int) and a `row` (dict). If your model has a single input, then `row["input"]` contains the input data for the row. Otherwise, it contains the names of the input slots. If your fit function takes a single target, then `row["target"]` contains the target data for the row. Otherwise, it contains the names of the output slots. For example, if your input data is a single array, to visualize the data as an Image, provide `lambda ndx, row: {"img": wandb.Image(row["input"])}` as the processor. Ignored if `log_evaluation` is False or `validation_indexes` are present.
`output_row_processor`	(Callable) same as `validation_row_processor`, but applied to the model’s output. `row["output"]` contains the results of the model output.
`infer_missing_processors`	(Boolean) Determines whether to infer `validation_row_processor` and `output_row_processor` if they are missing. Defaults to True. If you provide `labels`, W&B attempts to infer classification-type processors where appropriate.
`log_evaluation_frequency`	(int) Determines how often to log evaluation results. Defaults to `0` to log only at the end of training. Set to 1 to log every epoch, 2 to log every other epoch, and so on. Has no effect when `log_evaluation` is False.

Kubeflow Pipelines	W&B	Location in W&B
Input Scalar	`config`	Overview tab
Output Scalar	`summary`	Overview tab
Input Artifact	Input Artifact	Artifacts tab
Output Artifact	Output Artifact	Artifacts tab

Data	Client library	UI
`Parameter(...)`	`wandb.Run.config`	Overview tab, Config
`datasets`, `models`, `others`	`wandb.Run.use_artifact("{var_name}:latest")`	Artifacts tab
Base Python types (`dict`, `list`, `str`, etc.)	`wandb.Run.summary`	Overview tab, Summary

kwarg	Options
`datasets`	`True`: Log instance variables that are a dataset. `False`: Don't log them.
`models`	`True`: Log instance variables that are a model. `False`: Don't log them.
`others`	`True`: Log anything else that is serializable as a pickle. `False`: Don't log it.
`settings`	`wandb.Settings(…)`: Specify your own `wandb` settings for this step or flow. `None`: Equivalent to passing `wandb.Settings()`. By default, if `settings.run_group` is `None`, it is set to `{flow_name}/{run_id}`, and if `settings.run_job_type` is `None`, it is set to `{run_job_type}/{step_name}`.

Logging Setting	Type
default (always on)	`dict, list, set, str, int, float, bool`
`datasets`	`pd.DataFrame`, `pathlib.Path`
`models`	`nn.Module`, `sklearn.base.BaseEstimator`
`others`	Anything that is pickle-able and JSON serializable

Kind of Variable	behavior	Example	Data Type
Instance	Auto-logged	`self.accuracy`	`float`
Instance	Logged if `datasets=True`	`self.df`	`pd.DataFrame`
Instance	Not logged if `datasets=False`	`self.df`	`pd.DataFrame`
Local	Never logged	`accuracy`	`float`
Local	Never logged	`df`	`pd.DataFrame`

Parameter	Description
`project`	W&B Project name (str, optional)
`group`	W&B group name (str, optional)
`name`	W&B Run name. If not specified, the State.run_name is used (str, optional)
`entity`	W&B entity name, such as your username or W&B Team name (str, optional)
`tags`	W&B tags (List[str], optional)
`log_artifacts`	Whether to log checkpoints to wandb, default: `false` (bool, optional)
`rank_zero_only`	Whether to log only on the rank-zero process. When logging artifacts, it is highly recommended to log on all ranks. Artifacts from ranks ≥1 are not stored, which may discard pertinent information. For example, when using Deepspeed ZeRO, it would be impossible to restore from checkpoints without artifacts from all ranks, default: `True` (bool, optional)
`init_kwargs`	Params to pass to `wandb.init()` such as your wandb `config` etc. See the `wandb.init()` parameters for parameters that `wandb.init()` accepts.

Argument	Description
fine_tune_job_id	This is the OpenAI Fine-Tune ID which you get when you create your fine-tune job using `client.fine_tuning.jobs.create`. If this argument is None (default), all the OpenAI fine-tune jobs that haven’t already been synced will be synced to W&B.
openai_client	Pass an initialized OpenAI client to `sync`. If no client is provided, one is initialized by the logger itself. By default it is None.
num_fine_tunes	If no ID is provided, then all the unsynced fine-tunes will be logged to W&B. This argument allows you to select the number of recent fine-tunes to sync. If num_fine_tunes is 5, it selects the 5 most recent fine-tunes.
project	W&B project name where your fine-tune metrics, models, data, etc. will be logged. By default, the project name is “OpenAI-Fine-Tune.”
entity	W&B Username or team name where you’re sending runs. By default, your default entity is used, which is usually your username.
overwrite	Forces logging and overwrite existing wandb run of the same fine-tune job. By default this is False.
wait_for_job_success	Once an OpenAI fine-tuning job is started it usually takes a bit of time. To ensure that your metrics are logged to W&B as soon as the fine-tune job is finished, this setting will check every 60 seconds for the status of the fine-tune job to change to `succeeded`. Once the fine-tune job is detected as being successful, the metrics will be synced automatically to W&B. Set to True by default.
model_artifact_name	The name of the model artifact that is logged. Defaults to `"model-metadata"`.
model_artifact_type	The type of the model artifact that is logged. Defaults to `"model"`.
**kwargs_wandb_init	Any additional argument passed directly to `wandb.init()`
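
The arguments in the table above could be combined as in the following sketch. It assumes the logger is imported from `wandb.integration.openai.fine_tuning`. The helpers `build_sync_kwargs` and `sync_fine_tune` are hypothetical, and the job ID and project name are placeholders. The defaults mirror the ones listed in the table.

```python
def build_sync_kwargs(fine_tune_job_id=None, **overrides):
    """Assemble keyword arguments for WandbLogger.sync from the documented defaults."""
    kwargs = {
        "fine_tune_job_id": fine_tune_job_id,  # None syncs all unsynced jobs
        "project": "OpenAI-Fine-Tune",
        "overwrite": False,
        "wait_for_job_success": True,
    }
    kwargs.update(overrides)
    return kwargs


def sync_fine_tune(sync_kwargs):
    # Imported lazily: requires the wandb and openai packages and valid credentials.
    from wandb.integration.openai.fine_tuning import WandbLogger

    WandbLogger.sync(**sync_kwargs)


# Placeholder job ID and project name.
sync_kwargs = build_sync_kwargs("ftjob-abc123", project="my-fine-tunes")
print(sync_kwargs)
```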

Metric	Description
`loss`	The loss of the model
`lr`	The learning rate
`tokens_per_second`	The tokens per second of the model
`grad_norm`	The gradient norm of the model
`global_step`	The current step in the training loop, taking gradient accumulation into account: gradients are accumulated on every batch, and the step increments each time the optimizer updates the model, which happens once every `gradient_accumulation_steps` batches.
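
To illustrate how `global_step` relates to the number of batches under gradient accumulation, here is a small arithmetic sketch. It assumes that a trailing, incomplete accumulation window does not trigger an optimizer step; actual trainers may handle that case differently. The function name `count_global_steps` is hypothetical.

```python
def count_global_steps(num_batches, gradient_accumulation_steps):
    """Return how many optimizer steps occur over num_batches batches."""
    global_step = 0
    for batch_index in range(1, num_batches + 1):
        # Gradients are accumulated on every batch ...
        if batch_index % gradient_accumulation_steps == 0:
            # ... and the optimizer steps once the accumulation window is full.
            global_step += 1
    return global_step


steps = count_global_steps(num_batches=8, gradient_accumulation_steps=4)
print(steps)
```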

Parameter	Description
`project`	Define what wandb Project to log to
`name`	Give a name to your wandb run
`log_model`	Log all models if `log_model="all"` or at end of training if `log_model=True`
`save_dir`	Path where data is saved

Parameter	Type	Description
`wandb_run`	`wandb.wandb_run.Run`	wandb run used to log data.
`save_model`	bool (default=True)	Whether to save a checkpoint of the best model and upload it to your Run on W&B.
`keys_ignored`	str or list of str (default=None)	Key or list of keys that should not be logged to W&B. Note that in addition to the keys provided by the user, keys such as those starting with `event_` or ending on `_best` are ignored by default.
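
The filtering rule described for `keys_ignored` can be sketched as follows. This is an illustration of the rule as stated in the table, not the callback's actual implementation. `filter_logged_keys` is a hypothetical function and the history row is made up.

```python
def filter_logged_keys(history_row, keys_ignored=None):
    """Drop user-ignored keys plus keys starting with event_ or ending with _best."""
    if keys_ignored is None:
        ignored = set()
    elif isinstance(keys_ignored, str):
        ignored = {keys_ignored}
    else:
        ignored = set(keys_ignored)
    return {
        key: value
        for key, value in history_row.items()
        if key not in ignored
        and not key.startswith("event_")
        and not key.endswith("_best")
    }


# Made-up history row for one epoch.
row = {
    "epoch": 1,
    "train_loss": 0.42,
    "valid_loss": 0.51,
    "valid_loss_best": True,
    "event_lr": False,
    "dur": 3.2,
}
logged = filter_logged_keys(row, keys_ignored="dur")
print(logged)
```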

Method	Description
`initialize`()	(Re-)Set the initial state of the callback.
`on_batch_begin`(net[, X, y, training])	Called at the beginning of each batch.
`on_batch_end`(net[, X, y, training])	Called at the end of each batch.
`on_epoch_begin`(net[, dataset_train, …])	Called at the beginning of each epoch.
`on_epoch_end`(net, **kwargs)	Log values from the last history step and save best model
`on_grad_computed`(net, named_parameters[, X, …])	Called once per batch after gradients have been computed but before an update step was performed.
`on_train_begin`(net, **kwargs)	Log model topology and add a hook for gradients
`on_train_end`(net[, X, y])	Called at the end of training.

Name	Description
`project_name`	`str`. The name of the W&B Project. The project will be created automatically if it doesn’t exist yet.
`remove_config_values`	`List[str]` . A list of values to exclude from the config before it is uploaded to W&B. `[]` by default.
`model_log_interval`	`Optional int`. `None` by default. If set, enables model versioning with Artifacts. Pass in the number of steps to wait between logging model checkpoints.
`log_dataset_dir`	`Optional str`. If passed a path, the dataset will be uploaded as an Artifact at the beginning of training. `None` by default.
`entity`	`Optional str` . If passed, the run will be created in the specified entity
`run_name`	`Optional str` . If specified, the run will be created with the specified name.

Argument	Usage
`verbose`	The verbosity of sb3 output
`model_save_path`	Path to the folder where the model will be saved. The default value is `None`, so the model is not logged
`model_save_freq`	Frequency to save the model
`gradient_save_freq`	Frequency to log gradients. The default value is 0, so the gradients are not logged

Cloud provider	Credentials	Logging directory format
S3	`aws configure`	`s3://bucket/path/to/logs`
GCS	`gcloud auth application-default login`	`gs://bucket/path/to/logs`
Azure	`az login`¹	`az://account/container/path/to/logs`
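
As a small illustration of the logging directory formats in the table above, the sketch below maps a URI to its provider using only the standard library. The `describe_log_dir` function and the `PROVIDERS` mapping are hypothetical and are not part of any W&B API.

```python
from urllib.parse import urlparse

# URI scheme -> cloud provider, following the table above.
PROVIDERS = {"s3": "S3", "gs": "GCS", "az": "Azure"}


def describe_log_dir(uri):
    """Return (provider, bucket_or_account, remaining_path) for a cloud log directory."""
    parsed = urlparse(uri)
    provider = PROVIDERS.get(parsed.scheme)
    if provider is None:
        raise ValueError(f"Unsupported logging directory: {uri!r}")
    # For s3:// and gs:// the first component is the bucket;
    # for az:// it is the storage account, followed by the container.
    return provider, parsed.netloc, parsed.path.lstrip("/")


for uri in (
    "s3://bucket/path/to/logs",
    "gs://bucket/path/to/logs",
    "az://account/container/path/to/logs",
):
    print(describe_log_dir(uri))
```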
