What are runs?

Learn about the basic building block of W&B, Runs.

A run is a single unit of computation logged by W&B. You can think of a W&B run as an atomic element of your whole project. In other words, each run is a record of a specific computation, such as training a model and logging the results, a trial in a hyperparameter sweep, and so forth.

Common patterns for initiating a run include, but are not limited to:

  • Training a model
  • Changing a hyperparameter and conducting a new experiment
  • Conducting a new machine learning experiment with a different model
  • Logging data or a model as a W&B Artifact
  • Downloading a W&B Artifact

W&B stores runs that you create into projects. You can view runs and their properties within the run’s project workspace on the W&B App UI. You can also programmatically access run properties with the wandb.Api.Run object.

Anything you log with run.log is recorded in that run. Consider the following code snippet.

import wandb

run = wandb.init(entity="nico", project="awesome-project")
run.log({"accuracy": 0.9, "loss": 0.1})

The first line imports the W&B Python SDK. The second line initializes a run in the project awesome-project under the entity nico. The third line logs the accuracy and loss of the model to that run.

Within the terminal, W&B returns:

wandb: Syncing run earnest-sunset-1
wandb: ⭐️ View project at https://wandb.ai/nico/awesome-project
wandb: 🚀 View run at https://wandb.ai/nico/awesome-project/runs/1jx1ud12
wandb:                                                                                
wandb: 
wandb: Run history:
wandb: accuracy ▁
wandb:     loss ▁
wandb: 
wandb: Run summary:
wandb: accuracy 0.9
wandb:     loss 0.1
wandb: 
wandb: 🚀 View run earnest-sunset-1 at: https://wandb.ai/nico/awesome-project/runs/1jx1ud12
wandb: ⭐️ View project at: https://wandb.ai/nico/awesome-project
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20241105_111006-1jx1ud12/logs

The URL W&B returns in the terminal redirects you to the run’s workspace in the W&B App UI. Note that the panels generated in the workspace correspond to the single logged point.

Logging metrics at a single point in time might not be that useful. A more realistic example, in the case of training discriminative models, is to log metrics at regular intervals. For example, consider the following code snippet:

import random
import wandb

epochs = 10
lr = 0.01

run = wandb.init(
    entity="nico",
    project="awesome-project",
    config={
        "learning_rate": lr,
        "epochs": epochs,
    },
)

offset = random.random() / 5

# simulating a training run
for epoch in range(epochs):
    acc = 1 - 2**-epoch - random.random() / (epoch + 1) - offset
    loss = 2**-epoch + random.random() / (epoch + 1) + offset
    print(f"epoch={epoch}, accuracy={acc}, loss={loss}")
    run.log({"accuracy": acc, "loss": loss})

This returns the following output:

wandb: Syncing run jolly-haze-4
wandb: ⭐️ View project at https://wandb.ai/nico/awesome-project
wandb: 🚀 View run at https://wandb.ai/nico/awesome-project/runs/pdo5110r
lr: 0.01
epoch=0, accuracy=-0.10070974957523078, loss=1.985328507123956
epoch=1, accuracy=0.2884687745057535, loss=0.7374362314407752
epoch=2, accuracy=0.7347387967382066, loss=0.4402409835486663
epoch=3, accuracy=0.7667969248039795, loss=0.26176963846423457
epoch=4, accuracy=0.7446848791003173, loss=0.24808611724405083
epoch=5, accuracy=0.8035095836268268, loss=0.16169791827329466
epoch=6, accuracy=0.861349032371624, loss=0.03432578493587426
epoch=7, accuracy=0.8794926436276016, loss=0.10331872172219471
epoch=8, accuracy=0.9424839917077272, loss=0.07767793473500445
epoch=9, accuracy=0.9584880427028566, loss=0.10531971149250456
wandb: 🚀 View run jolly-haze-4 at: https://wandb.ai/nico/awesome-project/runs/pdo5110r
wandb: Find logs at: wandb/run-20241105_111816-pdo5110r/logs

The training script calls run.log 10 times. Each time the script calls run.log, W&B logs the accuracy and loss for that epoch. Selecting the URL that W&B prints in the preceding output directs you to the run’s workspace in the W&B App UI.

Note that W&B captures the simulated training loop within a single run called jolly-haze-4. This is because the script calls the wandb.init method only once.

As another example, during a sweep, W&B explores a hyperparameter search space that you specify. W&B implements each new hyperparameter combination that the sweep creates as a unique run.

Initialize a run

Initialize a W&B run with wandb.init(). The following code snippet shows how to import the W&B Python SDK and initialize a run.

Replace values enclosed in angle brackets (< >) with your own values:

import wandb

run = wandb.init(entity="<entity>", project="<project>")

When you initialize a run, W&B logs your run to the project you specify for the project field (wandb.init(project="<project>")). W&B creates a new project if the project does not already exist. If the project already exists, W&B stores the run in that project.

Each run in W&B has a unique identifier known as a run ID. You can specify a unique ID or let W&B randomly generate one for you.

Each run also has a human-readable, non-unique identifier known as a run name. You can specify a name for your run or let W&B randomly generate one for you.
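To make the two identifiers concrete, here is a purely illustrative sketch of how a random 8-character run ID and an adjective-noun-number run name could be generated. This is not W&B's actual implementation; the word lists and the name scheme are assumptions based on the example names shown in this section (such as earnest-sunset-1).

```python
import random
import string


def generate_run_id(length=8):
    """Sketch of a random run ID: lowercase letters and digits."""
    alphabet = string.ascii_lowercase + string.digits
    return "".join(random.choice(alphabet) for _ in range(length))


def generate_run_name(counter):
    """Sketch of a human-readable, non-unique run name like 'earnest-sunset-1'."""
    adjectives = ["earnest", "jolly", "exalted", "legendary"]
    nouns = ["sunset", "haze", "darkness", "meadow"]
    return f"{random.choice(adjectives)}-{random.choice(nouns)}-{counter}"


run_id = generate_run_id()
run_name = generate_run_name(1)
```

The key distinction to take away: the ID must be unique within a project, while the name is only a readable label and may repeat.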

For example, consider the following code snippet:

import wandb

run = wandb.init(entity="nico", project="awesome-project")

The code snippet produces the following output:

🚀 View run exalted-darkness-6 at: 
https://wandb.ai/nico/awesome-project/runs/pgbn9y21
Find logs at: wandb/run-20241106_090747-pgbn9y21/logs

Since the preceding code did not specify an argument for the id parameter, W&B creates a unique run ID. In this output, nico is the entity that logged the run, awesome-project is the name of the project the run is logged to, exalted-darkness-6 is the name of the run, and pgbn9y21 is the run ID.

Each run has a state that describes the current status of the run. See Run states for a full list of possible run states.

Run states

The following table describes the possible states a run can be in:

State Description
Finished The run ended and fully synced its data, or wandb.finish() was called.
Failed The run ended with a non-zero exit status.
Crashed The run stopped sending heartbeats in the internal process, which can happen if the machine crashes.
Running The run is still running and has recently sent a heartbeat.

Unique run identifiers

Run IDs are unique identifiers for runs. By default, W&B generates a random and unique run ID for you when you initialize a new run. You can also specify your own unique run ID when you initialize a run.

Autogenerated run IDs

If you do not specify a run ID when you initialize a run, W&B generates a random run ID for you. You can find the unique ID of a run in the W&B App UI.

  1. Navigate to the W&B App UI at https://wandb.ai/home.
  2. Navigate to the W&B project you specified when you initialized the run.
  3. Within your project’s workspace, select the Runs tab.
  4. Select the Overview tab.

W&B displays the unique run ID in the Run path field. The run path consists of the name of your team, the name of the project, and the run ID. The unique ID is the last part of the run path.
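Because the run path always has three slash-separated parts, it can be split back into its components with a one-line helper. This is an illustrative sketch (the function name and return shape are not part of the W&B SDK):

```python
def parse_run_path(run_path):
    """Split an 'entity/project/run-id' run path into its three components."""
    entity, project, run_id = run_path.split("/")
    return {"entity": entity, "project": project, "run_id": run_id}


parts = parse_run_path("nico/awesome-project/1jx1ud12")
```

The last component, parts["run_id"], is the unique run ID described above.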

For example, in the run path nico/awesome-project/9mxi1arc, the unique run ID is 9mxi1arc.

Custom run IDs

You can specify your own run ID by passing the id parameter to the wandb.init method.

import wandb

run = wandb.init(entity="<entity>", project="<project>", id="<run-id>")

You can use a run’s unique ID to navigate directly to the run’s overview page in the W&B App UI. The following line shows the URL path for a specific run:

https://wandb.ai/<entity>/<project>/runs/<run-id>

Where values enclosed in angle brackets (< >) are placeholders for the actual values of the entity, project, and run ID.

Name your run

The name of a run is a human-readable, non-unique identifier.

By default, W&B generates a random run name when you initialize a new run. The name of a run appears within your project’s workspace and at the top of the run’s overview page.

You can specify a name for your run by passing the name parameter to the wandb.init method.

import wandb

run = wandb.init(entity="<entity>", project="<project>", name="<run-name>")

Add a note to a run

Notes that you add to a specific run appear on the run page in the Overview tab and in the table of runs on the project page.

  1. Navigate to your W&B project
  2. Select the Workspace tab from the project sidebar
  3. Select the run you want to add a note to from the run selector
  4. Choose the Overview tab
  5. Select the pencil icon next to the Description field and add your notes

Stop a run

Stop a run from the W&B App or programmatically.

  1. Navigate to the terminal or code editor where you initialized the run.
  2. Press Ctrl+C to stop the run.

For example, following the preceding instructions, your terminal might look similar to the following:

KeyboardInterrupt
wandb: 🚀 View run legendary-meadow-2 at: https://wandb.ai/nico/history-blaster-4/runs/o8sdbztv
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 1 other file(s)
wandb: Find logs at: ./wandb/run-20241106_095857-o8sdbztv/logs

Navigate to the W&B App UI to confirm the run is no longer active:

  1. Navigate to the project that your run was logging to.
  2. Select the name of the run.
  3. Choose the Overview tab from the project sidebar.

Next to the State field, the run’s state changes from running to Killed.

  1. Navigate to the project that your run is logging to.
  2. Select the run you want to stop within the run selector.
  3. Choose the Overview tab from the project sidebar.
  4. Select the top button next to the State field.

Next to the State field, the run’s state changes from running to Killed.

See Run states for a full list of possible run states.

View logged runs

View information about a specific run, such as the state of the run, artifacts logged to the run, log files recorded during the run, and more.

To view a specific run:

  1. Navigate to the W&B App UI at https://wandb.ai/home.

  2. Navigate to the W&B project you specified when you initialized the run.

  3. Within the project sidebar, select the Workspace tab.

  4. Within the run selector, click the run you want to view, or enter a partial run name to filter for matching runs.

    By default, long run names are truncated in the middle for readability. To truncate run names at the beginning or end instead, click the action ... menu at the top of the list of runs, then set Run name cropping to crop the end, middle, or beginning.

Note that the URL path of a specific run has the following format:

https://wandb.ai/<team-name>/<project-name>/runs/<run-id>

Where values enclosed in angle brackets (< >) are placeholders for the actual values of the team name, project name, and run ID.
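Given those three values, assembling the URL is a straightforward string operation. A minimal helper, using illustrative names (only the URL shape above comes from the documentation):

```python
def run_url(team, project, run_id, base="https://wandb.ai"):
    """Assemble the W&B App URL for a run from its components."""
    return f"{base}/{team}/{project}/runs/{run_id}"


url = run_url("nico", "awesome-project", "1jx1ud12")
```

This can be handy for printing clickable links from your own tooling, for example in CI logs.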

Overview tab

Use the Overview tab to learn about specific run information in a project, such as:

  • Author: The W&B entity that creates the run.
  • Command: The command that initializes the run.
  • Description: A description of the run that you provided. This field is empty if you do not specify a description when you create the run. You can add a description to a run with the W&B App UI or programmatically with the Python SDK.
  • Duration: The amount of time the run is actively computing or logging data, excluding any pauses or waiting.
  • Git repository: The git repository associated with the run. You must enable git to view this field.
  • Host name: Where W&B computes the run. W&B displays the name of your machine if you initialize the run locally on your machine.
  • Name: The name of the run.
  • OS: Operating system that initializes the run.
  • Python executable: The command that starts the run.
  • Python version: Specifies the Python version that creates the run.
  • Run path: Identifies the unique run identifier in the form entity/project/run-ID.
  • Runtime: Measures the total time from the start to the end of the run. It’s the wall-clock time for the run. Runtime includes any time where the run is paused or waiting for resources, while duration does not.
  • Start time: The timestamp when you initialize the run.
  • State: The state of the run.
  • System hardware: The hardware W&B uses to compute the run.
  • Tags: A list of strings. Tags are useful for organizing related runs together or applying temporary labels like baseline or production.
  • W&B CLI version: The W&B CLI version installed on the machine that hosted the run command.

W&B stores the following information below the overview section:

  • Artifact Outputs: Artifact outputs produced by the run.
  • Config: List of config parameters saved with wandb.config.
  • Summary: List of summary parameters saved with wandb.log(). By default, W&B sets this value to the last value logged.
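The "last value logged" default for summary values can be simulated with plain Python, without the SDK. A sketch of the semantics only:

```python
# Simulated history: each dict is one run.log() call, in order.
logged = [{"accuracy": 0.6}, {"accuracy": 0.75}, {"accuracy": 0.9}]

# By default, the summary holds the most recent value logged for each key.
summary = {}
for row in logged:
    summary.update(row)
```

After three logged values for accuracy, the summary reflects only the final one, 0.9.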

View an example project overview here.

Workspace tab

Use the Workspace tab to view, search, group, and arrange visualizations such as autogenerated and custom plots, system metrics, and more.

View an example project workspace here

System tab

The System tab shows system metrics tracked for a specific run such as CPU utilization, system memory, disk I/O, network traffic, GPU utilization and more.

For a full list of system metrics W&B tracks, see System metrics.

View an example system tab here.

Logs tab

The Log tab shows output printed on the command line such as the standard output (stdout) and standard error (stderr).

Choose the Download button in the upper right hand corner to download the log file.

View an example logs tab here.

Files tab

Use the Files tab to view files associated with a specific run, such as model checkpoints, validation set examples, and more.

View an example files tab here.

Artifacts tab

The Artifacts tab lists the input and output artifacts for the specified run.

View an example artifacts tab here.

Delete runs

Delete one or more runs from a project with the W&B App.

  1. Navigate to the project that contains the runs you want to delete.
  2. Select the Runs tab from the project sidebar.
  3. Select the checkbox next to the runs you want to delete.
  4. Choose the Delete button (trash can icon) above the table.
  5. From the modal that appears, choose Delete.

1 - Add labels to runs with tags

Add tags to label runs with particular features that might not be obvious from the logged metrics or artifact data.

For example, you can add a tag to a run to indicate that the run’s model is in_production, that the run is preemptible, that the run represents the baseline, and so forth.

Add tags to one or more runs

Programmatically or interactively add tags to your runs.

Based on your use case, select the tab below that best fits your needs:

You can add tags to a run when it is created:

import wandb

run = wandb.init(
    entity="<entity>",
    project="<project-name>",
    tags=["tag1", "tag2"],
)

You can also update the tags after you initialize a run. For example, the following code snippet shows how to add a tag if a particular metric crosses a pre-defined threshold:

import wandb

run = wandb.init(
    entity="<entity>",
    project="capsules",
    tags=["debug"],
)

# python logic to train model

if current_loss < threshold:
    run.tags = run.tags + ("release_candidate",)

After you create a run, you can update tags using the Public API. For example:

run = wandb.Api().run("<entity>/<project>/<run-id>")
run.tags.append("tag1")  # you can choose tags based on run data here
run.update()

This method is best suited to tagging large numbers of runs with the same tag or tags.

  1. Navigate to your project workspace.
  2. Select Runs from the project sidebar.
  3. Select one or more runs from the table.
  4. Once you select one or more runs, select the Tag button above the table.
  5. Type the tag you want to add and select the Create new tag checkbox to add the tag.

This method is best suited to applying a tag or tags to a single run manually.

  1. Navigate to your project workspace.
  2. Select a run from the list of runs within your project’s workspace.
  3. Select Overview from the project sidebar.
  4. Select the gray plus icon (+) button next to Tags.
  5. Type a tag you want to add and select Add below the text box to add a new tag.

Remove tags from one or more runs

Tags can also be removed from runs with the W&B App UI.

This method is best suited to removing tags from a large number of runs.

  1. In the Run sidebar of the project, select the table icon in the upper-right. This will expand the sidebar into the full runs table.
  2. Hover over a run in the table to see a checkbox on the left or look in the header row for a checkbox to select all runs.
  3. Select the checkbox to enable bulk actions.
  4. Select the runs you want to remove tags from.
  5. Select the Tag button above the rows of runs.
  6. Select the checkbox next to a tag to remove it from the run.

This method is best suited to removing a tag or tags from a single run manually.

  1. In the left sidebar of the Run page, select the top Overview tab. The tags on the run are visible here.
  2. Hover over a tag and select the “x” to remove it from the run.

2 - Filter and search runs

How to use the sidebar and table on the project page

Use your project page to gain insights from runs logged to W&B.

Filter runs

Filter runs based on their status, tags, or other properties with the filter button.

Filter runs with tags

Filter runs based on their tags with the filter button.

Filter runs with regex

If regex doesn’t provide the desired results, you can use tags to filter runs in the Runs Table. Tags can be added either on run creation or after runs have finished. Once tags are added to a run, you can add a tag filter.

Search run names

Use regex to find runs whose names match a pattern you specify. When you type a query in the search box, it filters the visible runs in the workspace graphs as well as the rows of the table.
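The matching behavior can be sketched locally with Python's re module. The run names below are invented for illustration; this is not the App's implementation, only the idea of regex-based filtering:

```python
import re

run_names = ["earnest-sunset-1", "jolly-haze-4", "exalted-darkness-6", "baseline-2"]


def filter_runs(names, pattern):
    """Keep only the run names matching the regex, like the workspace search box."""
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]


matches = filter_runs(run_names, r"haze|darkness")
```

A pattern like `^baseline` would instead match only names that start with "baseline".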

Sort runs by minimum and maximum values

Sort the runs table by the minimum or maximum value of a logged metric. This is particularly useful if you want to view the best (or worst) recorded value.

The following steps describe how to sort the run table by a specific metric based on the minimum or maximum recorded value:

  1. Hover your mouse over the column with the metric you want to sort with.
  2. Select the kebab menu (three vertical dots).
  3. From the dropdown, select either Show min or Show max.
  4. From the same dropdown, select Sort by asc or Sort by desc to sort in ascending or descending order, respectively.
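The sort the UI performs can be sketched over a local list of runs. The data here is invented; the point is how "Show min"/"Show max" combine with ascending or descending order:

```python
runs = [
    {"name": "run-a", "loss": [0.9, 0.4, 0.2]},
    {"name": "run-b", "loss": [0.8, 0.3, 0.5]},
    {"name": "run-c", "loss": [1.1, 0.7, 0.6]},
]

# "Show min" + "Sort by asc": order runs by their best (lowest) recorded loss.
by_min_loss = sorted(runs, key=lambda r: min(r["loss"]))

# "Show max" + "Sort by desc": order runs by their worst (highest) recorded loss.
by_max_loss = sorted(runs, key=lambda r: max(r["loss"]), reverse=True)
```

With a loss metric, sorting ascending by minimum surfaces the best run first; for a metric like accuracy you would typically sort descending by maximum instead.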

Search End Time for runs

We provide a column named End Time that logs the last heartbeat from the client process. The field is hidden by default.

Export runs table to CSV

Export the table of all your runs, hyperparameters, and summary metrics to a CSV with the download button.

3 - Fork a run

Forking a W&B run

Use fork_from when you initialize a run with wandb.init() to “fork” from an existing W&B run. When you fork from a run, W&B creates a new run using the run ID and step of the source run.

Forking a run enables you to explore different parameters or models from a specific point in an experiment without impacting the original run.
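As the examples in this section show, the fork_from value is a single string combining the source run ID and the step. A small helper to build and parse that format; the helper names are illustrative, and only the "<run-id>?_step=<step>" shape is taken from the examples here:

```python
def fork_spec(run_id, step):
    """Build a fork_from value like 'abc123?_step=200'."""
    return f"{run_id}?_step={step}"


def parse_fork_spec(spec):
    """Split a fork_from value back into (run_id, step)."""
    run_id, _, step = spec.partition("?_step=")
    return run_id, int(step)


spec = fork_spec("abc123", 200)
```

Keeping the spec in one string makes it easy to pass through configs or CLI arguments to the process that calls wandb.init().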

Start a forked run

To fork a run, use the fork_from argument in wandb.init() and specify the source run ID and the step from the source run to fork from:

import wandb

# Initialize a run to be forked later
original_run = wandb.init(project="your_project_name", entity="your_entity_name")
# ... perform training or logging ...
original_run.finish()

# Fork the run from a specific step
forked_run = wandb.init(
    project="your_project_name",
    entity="your_entity_name",
    fork_from=f"{original_run.id}?_step=200",
)

Using an immutable run ID

Use an immutable run ID to ensure you have a consistent and unchanging reference to a specific run. Follow these steps to obtain the immutable run ID from the user interface:

  1. Access the Overview Tab: Navigate to the Overview tab on the source run’s page.

  2. Copy the Immutable Run ID: Click on the ... menu (three dots) located in the top-right corner of the Overview tab. Select the Copy Immutable Run ID option from the dropdown menu.

By following these steps, you will have a stable and unchanging reference to the run, which can be used for forking a run.

Continue from a forked run

After initializing a forked run, you can continue logging to the new run. You can log the same metrics for continuity and introduce new metrics.

For example, the following code example shows how to first fork a run and then how to log metrics to the forked run starting from a training step of 200:

import wandb
import math

# Initialize the first run and log some metrics
run1 = wandb.init(project="your_project_name", entity="your_entity_name")
for i in range(300):
    run1.log({"metric": i})
run1.finish()

# Fork from the first run at a specific step and log the metric starting from step 200
run2 = wandb.init(
    project="your_project_name",
    entity="your_entity_name",
    fork_from=f"{run1.id}?_step=200",
)

# Continue logging in the new run
# For the first few steps, log the metric as is from run1
# After step 250, start logging the spikey pattern
for i in range(200, 300):
    if i < 250:
        run2.log({"metric": i})  # Continue logging from run1 without spikes
    else:
        # Introduce the spikey behavior starting from step 250
        subtle_spike = i + (2 * math.sin(i / 3.0))  # Apply a subtle spikey pattern
        run2.log({"metric": subtle_spike})
    # Additionally log the new metric at all steps
    run2.log({"additional_metric": i * 1.1})
run2.finish()

4 - Group runs into experiments

Group training and evaluation runs into larger experiments

Group individual jobs into experiments by passing a unique group name to wandb.init().

Use cases

  1. Distributed training: Use grouping if your experiments are split up into different pieces with separate training and evaluation scripts that should be viewed as parts of a larger whole.
  2. Multiple processes: Group multiple smaller processes together into an experiment.
  3. K-fold cross-validation: Group together runs with different random seeds to see a larger experiment. Here’s an example of k-fold cross-validation with sweeps and grouping.

There are three ways to set grouping:

1. Set group in your script

Pass an optional group and job_type to wandb.init(). This gives you a dedicated group page for each experiment, which contains the individual runs. For example: wandb.init(group="experiment_1", job_type="eval")

2. Set a group environment variable

Use the WANDB_RUN_GROUP environment variable to specify a group for your runs. For more on this, check our docs for Environment Variables. The group should be unique within your project and shared by all runs in the group. You can use wandb.util.generate_id() to generate a unique 8-character string to use in all your processes. For example, os.environ["WANDB_RUN_GROUP"] = "experiment-" + wandb.util.generate_id()
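Putting those pieces together, the following sketch sets the environment variable before any run starts. It uses a local stand-in for wandb.util.generate_id() so the example runs without the wandb package; the stand-in's alphabet is an assumption:

```python
import os
import random
import string


def generate_id(length=8):
    """Stand-in for wandb.util.generate_id(): a random 8-character string."""
    alphabet = string.ascii_lowercase + string.digits
    return "".join(random.choice(alphabet) for _ in range(length))


# Every process that sets this variable before calling wandb.init()
# joins the same group.
os.environ["WANDB_RUN_GROUP"] = "experiment-" + generate_id()
```

In a distributed setup, generate the ID once in the launcher and export it to every worker so that all processes share the same group.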

3. Toggle grouping in the UI

You can dynamically group by any config column. For example, if you use wandb.config to log batch size or learning rate, you can then group by those hyperparameters dynamically in the web app.

Distributed training with grouping

If you set grouping in wandb.init(), W&B groups runs by default in the UI. You can toggle this on and off by clicking the Group button at the top of the table. Here’s an example project generated from sample code where we set grouping. You can click each Group row in the sidebar to get to a dedicated group page for that experiment.

From the project page above, you can click a Group in the left sidebar to get to a dedicated page like this one:

Grouping dynamically in the UI

You can group runs by any column, for example by hyperparameter. Here’s an example of what that looks like:

  • Sidebar: Runs are grouped by the number of epochs.
  • Graphs: Each line represents the group’s mean, and the shading indicates the variance. This behavior can be changed in the graph settings.

Turn off grouping

Click the grouping button and clear the group fields at any time to return the table and graphs to their ungrouped state.

Grouping graph settings

Click the edit button in the upper right corner of a graph and select the Advanced tab to change the line and shading. You can select the mean, minimum, or maximum value for the line in each group. For the shading, you can turn shading off, or show the min and max, the standard deviation, or the standard error.

5 - Move runs

Move runs between your projects or to a team you are a member of.

Move runs between your projects

To move runs from one project to another:

  1. Navigate to the project that contains the runs you want to move.
  2. Select the Runs tab from the project sidebar.
  3. Select the checkbox next to the runs you want to move.
  4. Choose the Move button above the table.
  5. Select the destination project from the dropdown.

Move runs to a team

Move runs to a team you are a member of:

  1. Navigate to the project that contains the runs you want to move.
  2. Select the Runs tab from the project sidebar.
  3. Select the checkbox next to the runs you want to move.
  4. Choose the Move button above the table.
  5. Select the destination team and project from the dropdown.

6 - Resume a run

Resume a paused or exited W&B Run

Specify how a run should behave in the event that it stops or crashes. To resume a run or enable it to automatically resume, specify the unique run ID associated with that run for the id parameter:

run = wandb.init(
    entity="<entity>", project="<project>", id="<run ID>", resume="<resume>"
)

Pass one of the following arguments to the resume parameter to determine how W&B should respond. In each case, W&B first checks if the run ID already exists.

Argument Description Run ID exists Run ID does not exist Use case
"must" W&B must resume the run specified by the run ID. W&B resumes the run with the same run ID. W&B raises an error. Resume a run that must use the same run ID.
"allow" Allow W&B to resume the run if the run ID exists. W&B resumes the run with the same run ID. W&B initializes a new run with the specified run ID. Resume a run without overriding an existing run.
"never" Never allow W&B to resume a run specified by the run ID. W&B raises an error. W&B initializes a new run with the specified run ID.
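The table can be condensed into a small decision function. This is a sketch of the documented behavior, not W&B's implementation:

```python
def resume_behavior(mode, run_id_exists):
    """Return what W&B does for a given resume mode, per the table above."""
    if mode == "must":
        return "resume existing run" if run_id_exists else "error"
    if mode == "allow":
        return "resume existing run" if run_id_exists else "new run with that ID"
    if mode == "never":
        return "error" if run_id_exists else "new run with that ID"
    raise ValueError(f"unknown resume mode: {mode}")
```

For example, resume_behavior("must", False) returns "error", matching the "must" row of the table.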

You can also specify resume="auto" to let W&B automatically try to restart the run on your behalf. However, you need to restart your run from the same directory. See the Enable runs to automatically resume section for more information.

For all the examples below, replace values enclosed within <> with your own.

Resume a run that must use the same run ID

If a run is stopped, crashes, or fails, you can resume it using the same run ID. To do so, initialize a run and specify the following:

  • Set the resume parameter to "must" (resume="must")
  • Provide the run ID of the run that stopped or crashed

The following code snippet shows how to accomplish this with the W&B Python SDK:

run = wandb.init(
    entity="<entity>", project="<project>", id="<run ID>", resume="must"
)

Resume a run without overriding the existing run

Resume a run that stopped or crashed without overriding the existing run. This is especially helpful if your process doesn’t exit successfully. The next time you start W&B, W&B will start logging from the last step.

Set the resume parameter to "allow" (resume="allow") when you initialize a run with W&B. Provide the run ID of the run that stopped or crashed. The following code snippet shows how to accomplish this with the W&B Python SDK:

import wandb

run = wandb.init(
    entity="<entity>", project="<project>", id="<run ID>", resume="allow"
)

Enable runs to automatically resume

The following code snippet shows how to enable runs to automatically resume with the Python SDK or with environment variables.

The following code snippet shows how to specify a W&B run ID with the Python SDK.

Replace values enclosed within <> with your own:

run = wandb.init(
    entity="<entity>", project="<project>", id="<run ID>", resume="<resume>"
)

The following example shows how to specify the W&B WANDB_RUN_ID variable in a bash script:

RUN_ID="$1"

WANDB_RESUME=allow WANDB_RUN_ID="$RUN_ID" python eval.py

Within your terminal, you could run the shell script along with the W&B run ID. The following code snippet passes the run ID akj172:

sh run_experiment.sh akj172

For example, suppose you execute a Python script called train.py in a directory called Users/AwesomeEmployee/Desktop/ImageClassify/training/. Within train.py, the script creates a run that enables automatic resuming. Suppose the training script is then stopped. To resume this run, restart the train.py script within Users/AwesomeEmployee/Desktop/ImageClassify/training/.

Resume preemptible Sweeps runs

Automatically requeue interrupted sweep runs. This is particularly useful if you run a sweep agent in a compute environment that is subject to preemption such as a SLURM job in a preemptible queue, an EC2 spot instance, or a Google Cloud preemptible VM.

Use the mark_preempting function to enable W&B to automatically requeue interrupted sweep runs. For example, the following code snippet marks a run as preemptible:

run = wandb.init()  # Initialize a run
run.mark_preempting()

The following table outlines how W&B handles runs based on the exit status of a sweep run.

Status Behavior
Status code 0 Run is considered to have terminated successfully and it will not be requeued.
Nonzero status W&B automatically appends the run to a run queue associated with the sweep.
No status Run is added to the sweep run queue. Sweep agents consume runs off the run queue until the queue is empty. Once the queue is empty, the sweep queue resumes generating new runs based on the sweep search algorithm.
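The requeue rule above reduces to a simple predicate: only a clean exit (status 0) keeps a run off the queue. A sketch of that decision, not the SDK's implementation:

```python
def should_requeue(exit_status):
    """Sketch of the sweep requeue rule from the table above."""
    if exit_status == 0:
        return False  # terminated successfully; not requeued
    return True  # nonzero status or no status: appended to the sweep's run queue
```

Passing None models the "no status" row, where the run is still added back to the sweep run queue.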

7 - Rewind a run

Rewind a run

Rewind a run to correct or modify its history without losing the original data. In addition, when you rewind a run, you can log new data from that point in time. W&B recomputes the summary metrics for the rewound run based on the newly logged history. This means the following behavior:

  • History truncation: W&B truncates the history to the rewind point, allowing new data logging.
  • Summary metrics: Recomputed based on the newly logged history.
  • Configuration preservation: W&B preserves the original configurations and you can merge new configurations.
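The history-truncation and summary-recomputation behavior can be simulated with plain Python lists. This is a sketch of the semantics only, not the SDK:

```python
def rewind(history, step):
    """Truncate logged history at `step`, as a rewind does."""
    return [row for row in history if row["_step"] < step]


# Simulated history: 300 logged steps of a decaying loss.
history = [{"_step": s, "loss": 1.0 / (s + 1)} for s in range(300)]

# Rewind to step 200: everything at or after step 200 is dropped.
truncated = rewind(history, 200)

# New data logged after the rewind extends the truncated history, and the
# summary is recomputed from what is now the last logged value.
truncated.append({"_step": 200, "loss": 0.004})
summary = {"loss": truncated[-1]["loss"]}
```

After the rewind, steps 200 through 299 of the original history no longer contribute to the summary; only the newly logged data from step 200 onward does.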

When you rewind a run, W&B resets the state of the run to the specified step, preserving the original data and maintaining a consistent run ID. This means that:

  • Run archiving: W&B archives the original runs. Runs are accessible from the Run Overview tab.
  • Artifact association: W&B associates artifacts with the runs that produced them.
  • Immutable run IDs: Introduced for consistent forking from a precise state.
  • Copy immutable run ID: A button to copy the immutable run ID for improved run management.

Rewind a run

Use resume_from with wandb.init() to “rewind” a run’s history to a specific step. Specify the name of the run and the step you want to rewind from:

import wandb
import math

# Initialize the first run and log some metrics
# Replace with your_project_name and your_entity_name!
run1 = wandb.init(project="your_project_name", entity="your_entity_name")
for i in range(300):
    run1.log({"metric": i})
run1.finish()

# Rewind from the first run at a specific step and log the metric starting from step 200
run2 = wandb.init(
    project="your_project_name",
    entity="your_entity_name",
    resume_from=f"{run1.id}?_step=200",
)

# Continue logging in the new run
# For the first few steps, log the metric as is from run1
# After step 250, start logging the spikey pattern
for i in range(200, 300):
    if i < 250:
        run2.log({"metric": i, "step": i})  # Continue logging from run1 without spikes
    else:
        # Introduce the spikey behavior starting from step 250
        subtle_spike = i + (2 * math.sin(i / 3.0))  # Apply a subtle spikey pattern
        run2.log({"metric": subtle_spike, "step": i})
    # Additionally log the new metric at all steps
    run2.log({"additional_metric": i * 1.1, "step": i})
run2.finish()

View an archived run

After you rewind a run, you can explore the archived run in the W&B App UI. Follow these steps to view archived runs:

  1. Access the Overview Tab: Navigate to the Overview tab on the run’s page. This tab provides a comprehensive view of the run’s details and history.
  2. Locate the Forked From field: Within the Overview tab, find the Forked From field. This field captures the history of the resumptions. The Forked From field includes a link to the source run, allowing you to trace back to the original run and understand the entire rewind history.

By using the Forked From field, you can effortlessly navigate the tree of archived resumptions and gain insights into the sequence and origin of each rewind.

Fork from a run that you rewind

To fork from a rewound run, use the fork_from argument in wandb.init() and specify the source run ID and the step from the source run to fork from:

import wandb

# Fork the run from a specific step
# `rewind_run` is the rewound run from the previous example (e.g., run2)
forked_run = wandb.init(
    project="your_project_name",
    entity="your_entity_name",
    fork_from=f"{rewind_run.id}?_step=500",
)

# Continue logging in the new run
for i in range(500, 1000):
    forked_run.log({"metric": i*3})
forked_run.finish()
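Both resume_from and fork_from take a string of the form `<run_id>?_step=<step>`, as the f-strings above show. A tiny helper (hypothetical, not part of the SDK) makes the format explicit:

```python
def run_spec(run_id: str, step: int) -> str:
    """Build the "<run_id>?_step=<step>" string accepted by the
    resume_from and fork_from arguments (illustrative helper,
    not part of the wandb SDK)."""
    return f"{run_id}?_step={step}"

print(run_spec("1jx1ud12", 500))  # → 1jx1ud12?_step=500
```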

8 - Send an alert

Send alerts, triggered from your Python code, to your Slack or email

Create alerts with Slack or email if your run crashes or with a custom trigger. For example, you can create an alert if the gradient of your training loop starts to blow up (reports NaN) or a step in your ML pipeline completes. Alerts apply to all projects where you initialize runs, including both personal and team projects.
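For the NaN-gradient case mentioned above, a minimal sketch of a check you might place inside a training loop; `loss_is_bad` is an illustrative helper (not part of the SDK), while `run.alert` is the real call shown later in this section:

```python
import math

def loss_is_bad(loss: float) -> bool:
    """Return True when the loss is NaN or infinite (illustrative helper)."""
    return math.isnan(loss) or math.isinf(loss)

# Inside a training loop, you might guard each step like this:
# if loss_is_bad(loss):
#     run.alert(title="Loss is NaN", text=f"Loss became {loss} at this step")

print(loss_is_bad(float("nan")), loss_is_bad(0.1))  # → True False
```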

After setup, W&B Alerts messages appear in Slack (or your email).

How to create an alert

There are three main steps to set up an alert:

  1. Turn on Alerts in your W&B User Settings
  2. Add run.alert() to your code
  3. Confirm alert is set up properly

1. Turn on alerts in your W&B User Settings

In your User Settings:

  • Scroll to the Alerts section
  • Turn on Scriptable run alerts to receive alerts from run.alert()
  • Use Connect Slack to pick a Slack channel to post alerts. We recommend the Slackbot channel because it keeps the alerts private.
  • Email will go to the email address you used when you signed up for W&B. We recommend setting up a filter in your email so all these alerts go into a folder and don’t fill up your inbox.

You will only have to do this the first time you set up W&B Alerts, or when you’d like to modify how you receive alerts.

Alerts settings in W&B User Settings

2. Add run.alert() to your code

Add run.alert() to your code (either in a notebook or Python script) wherever you’d like it to be triggered:

import wandb

run = wandb.init()
run.alert(title="High Loss", text="Loss is increasing rapidly")

3. Check your Slack or email

Check your Slack or email for the alert message. If you didn’t receive any, make sure email or Slack is turned on for Scriptable run alerts in your User Settings.

Example

This simple alert sends a warning when accuracy falls below a threshold. The wait_duration argument ensures alerts are sent at most once every 5 minutes (300 seconds).

import wandb
from wandb import AlertLevel

run = wandb.init()

# Example values; replace with your model's accuracy and your own threshold
acc, threshold = 0.3, 0.9

if acc < threshold:
    run.alert(
        title="Low accuracy",
        text=f"Accuracy {acc} is below the acceptable threshold {threshold}",
        level=AlertLevel.WARN,
        wait_duration=300,
    )

How to tag or mention users

Use the at sign @ followed by the Slack user ID to tag yourself or your colleagues in either the title or the text of the alert. You can find a Slack user ID from their Slack profile page.

run.alert(title="Loss is NaN", text=f"Hey <@U1234ABCD> loss has gone to NaN")

Team alerts

Team admins can set up alerts for the team on the team settings page: wandb.ai/teams/your-team.

Team alerts apply to everyone on your team. W&B recommends using the Slackbot channel because it keeps alerts private.

Change Slack channel to send alerts to

To change what channel alerts are sent to, click Disconnect Slack and then reconnect. After you reconnect, pick a different Slack channel.