W&B integrations make it fast and easy to set up experiment tracking and data versioning inside existing projects. Check out integrations for ML frameworks such as PyTorch, ML libraries such as Hugging Face, or cloud services such as Amazon SageMaker.
Related resources
Examples: Try the code with notebook and script examples for each integration.
Video Tutorials: Learn to use W&B with YouTube video tutorials
1 - Add wandb to any library
Add wandb to any library
This guide provides best practices on how to integrate W&B into your Python library to get powerful Experiment Tracking, GPU and System Monitoring, Model Checkpointing, and more for your own library.
If you are still learning how to use W&B, we recommend exploring the other W&B Guides in these docs, such as Experiment Tracking, before reading further.
Below we cover tips and best practices for when the codebase you are working on is more complicated than a single Python training script or Jupyter notebook. The topics covered are:
Setup requirements
User Login
Starting a wandb Run
Defining a Run Config
Logging to W&B
Distributed Training
Model Checkpointing and More
Hyper-parameter tuning
Advanced Integrations
Setup requirements
Before you get started, decide whether or not to require W&B in your library’s dependencies:
Require W&B On Installation
Add the W&B Python library (wandb) to your dependencies file, for example, in your requirements.txt file:
torch==1.8.0
...
wandb==0.13.*
Make W&B optional On Installation
There are two ways to make the W&B SDK (wandb) optional:
A. Raise an error when a user tries to use wandb functionality without installing it manually and show an appropriate error message:
try:
    import wandb
except ImportError:
    raise ImportError(
        "You are trying to use wandb which is not currently installed. "
        "Please install it using pip install wandb"
    )
B. Add wandb as an optional dependency to your pyproject.toml file, if you are building a Python package:
[project]
name = "my_awesome_lib"
version = "0.1.0"
dependencies = [
    "torch",
    "sklearn",
]

[project.optional-dependencies]
dev = [
    "wandb",
]
User Login
There are a few ways for your users to log in to W&B:
Log into W&B with a bash command in a terminal:
wandb login $MY_WANDB_KEY
If they’re in a Jupyter or Colab notebook, log into W&B like so:
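For example, a minimal sketch of what that notebook cell can look like (wandb.login prompts for an API key if one is not already configured):

```python
import wandb

wandb.login()
```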
If a user is using wandb for the first time without following any of the steps mentioned above, they will automatically be prompted to log in when your script calls wandb.init.
Starting A wandb Run
A W&B Run is a unit of computation logged by W&B. Typically, you associate a single W&B Run per training experiment.
Initialize W&B and start a Run within your code with:
wandb.init()
Optionally, you can provide a default name for their project, or let the user set it themselves with a parameter such as wandb_project in your code, along with a wandb_entity parameter for the username or team name passed to entity:
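A minimal sketch of passing those through (wandb_project and wandb_entity are example parameter names your library might expose):

```python
import wandb

# forward the user-supplied project and entity to the W&B Run
wandb.init(project=wandb_project, entity=wandb_entity)
```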
Your library should create the W&B Run as early as possible, because any output in your console, including error messages, is logged as part of the W&B Run. This makes debugging easier.
Run The Library With wandb As Optional
If you want to make wandb optional when your users use your library, you can either:
Define a wandb flag such as:
trainer = my_trainer(..., use_wandb=True)
python train.py ... --use-wandb
Or, set wandb to be disabled in wandb.init:
wandb.init(mode="disabled")
export WANDB_MODE=disabled
or
wandb disabled
Or, set wandb to be offline - note this will still run wandb, it just won’t try to communicate back to W&B over the internet:
export WANDB_MODE=offline
or
os.environ["WANDB_MODE"] = "offline"
wandb offline
Defining A wandb Run Config
With a wandb run config, you can provide metadata about your model, dataset, and so on when you create a W&B Run. You can use this information to compare different experiments and quickly understand the main differences.
Typical config parameters you can log include:
Model name, version, architecture parameters, etc.
Dataset name, version, number of train/val examples, etc.
Training parameters such as learning rate, batch size, optimizer, etc.
The following code snippet shows how to log a config:
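A minimal sketch of that snippet (the hyperparameter names and values are only examples):

```python
import wandb

config = {"batch_size": 32, "learning_rate": 1e-3, "model_name": "resnet50"}
wandb.init(project="my_project", config=config)
```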
Use wandb.config.update to update the config. Updating your configuration dictionary is useful when parameters are obtained after the dictionary was defined. For example, you might want to add a model’s parameters after the model is instantiated.
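For example, a sketch of adding a value after the fact (the key and value here are illustrative):

```python
# add the parameter count to the config once the model has been instantiated
wandb.config.update({"model_parameters": 3_000_000})
```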
Logging to W&B
Create a dictionary where the key is the name of the metric and the value is its value. Pass this dictionary object to wandb.log:
for epoch in range(NUM_EPOCHS):
    for input, ground_truth in data:
        prediction = model(input)
        loss = loss_fn(prediction, ground_truth)
        metrics = {"loss": loss}
        wandb.log(metrics)
If you have a lot of metrics, you can have them automatically grouped in the UI by using prefixes in the metric name, such as train/... and val/.... This will create separate sections in your W&B Workspace for your training and validation metrics, or other metric types you’d like to separate:
Sometimes you might need to perform multiple calls to wandb.log for the same training step. The wandb SDK has its own internal step counter that is incremented every time a wandb.log call is made. This means that there is a possibility that the wandb log counter is not aligned with the training step in your training loop.
To avoid this, we recommend that you specifically define your x-axis step. You can define the x-axis with wandb.define_metric and you only need to do this once, after wandb.init is called:
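A minimal sketch, assuming you want a metric named global_step as the shared x-axis (the project name is a placeholder):

```python
wandb.init(project="my_project")

# define the custom x-axis metric
wandb.define_metric("global_step")
# set all other metrics to use this step as their x-axis
wandb.define_metric("*", step_metric="global_step")
```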
The glob pattern, *, means that every metric will use global_step as the x-axis in your charts. If you only want certain metrics to be logged against global_step, you can specify them instead:
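For example, to tie only the training and evaluation losses used below to global_step:

```python
wandb.define_metric("train/loss", step_metric="global_step")
wandb.define_metric("eval/loss", step_metric="global_step")
```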
Now that you’ve called wandb.define_metric, you just need to log your metrics as well as your step_metric, global_step, every time you call wandb.log:
for step, (input, ground_truth) in enumerate(data):
    ...
    wandb.log({"global_step": step, "train/loss": 0.1})
    wandb.log({"global_step": step, "eval/loss": 0.2})
If you do not have access to the independent step variable, for example “global_step” is not available during your validation loop, the previously logged value for “global_step” is automatically used by wandb. In this case, ensure you log an initial value for the metric so it has been defined when it’s needed.
Log Images, Tables, Text, Audio and More
In addition to metrics, you can log plots, histograms, tables, text, and media such as images, videos, audio, 3D objects, and more.
Some considerations when logging data include:
How often should the metric be logged? Should it be optional?
What type of data could be helpful in visualizing?
For images, you can log sample predictions, segmentation masks, etc., to see the evolution over time.
For text, you can log tables of sample predictions for later exploration.
Refer to Log Data with wandb.log for a full guide on logging media, objects, plots, and more.
Distributed Training
For frameworks supporting distributed environments, you can adapt any of the following workflows:
Detect which is the “main” process and only use wandb there. Any required data coming from other processes must be routed to the main process first. (This is the recommended workflow.)
Call wandb in every process and auto-group them by giving them all the same unique group name, as sketched after this list.
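A minimal sketch of the grouped approach (the group and job_type values are only examples):

```python
import wandb

# call this in every process; runs that share a group name are grouped together in the UI
wandb.init(group="experiment-1", job_type="train")
```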
If your framework uses or produces models or datasets, you can log them for full traceability and have wandb automatically monitor your entire pipeline through W&B Artifacts.
When using Artifacts, it might be useful but not necessary to let your users define:
The ability to log model checkpoints or datasets (in case you want to make it optional).
The path/reference of the artifact being used as input, if any. For example, user/project/artifact.
The frequency for logging Artifacts.
Log Model Checkpoints
You can log Model Checkpoints to W&B. It is useful to leverage the unique wandb Run ID to name output Model Checkpoints so you can tell them apart across Runs. You can also add useful metadata and aliases to each model, as shown below:
metadata = {"eval/accuracy": 0.8, "train/steps": 800}
artifact = wandb.Artifact(
    name=f"model-{wandb.run.id}",
    metadata=metadata,
    type="model",
)
artifact.add_dir("output_model")  # local directory where the model weights are stored

aliases = ["best", "epoch_10"]
wandb.log_artifact(artifact, aliases=aliases)
You can log output Artifacts at any frequency (for example, every epoch, every 500 steps, and so on) and they are automatically versioned.
Log And Track Pre-trained Models Or Datasets
You can log artifacts that are used as inputs to your training, such as pre-trained models or datasets. The following snippet demonstrates how to log an Artifact and add it as an input to the ongoing Run:
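A minimal sketch of what that can look like (the artifact name, type, and file path are placeholders):

```python
import wandb

# log a dataset artifact and mark it as an input to the current run
artifact = wandb.Artifact(name="my-dataset", type="dataset")
artifact.add_file("dataset.csv")
wandb.use_artifact(artifact)
```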
Artifacts can be found in the Artifacts section of W&B and can be referenced with aliases generated automatically (latest, v2, v3) or manually when logging (best_accuracy, etc.).
To download an Artifact without creating a wandb run (through wandb.init), for example in distributed environments or for simple inference, you can instead reference the artifact with the wandb API:
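A minimal sketch using the public API (the artifact path is a placeholder in the entity/project/name:alias form):

```python
import wandb

api = wandb.Api()
artifact = api.artifact("user/project/artifact:latest")
local_path = artifact.download()
```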
2 - Azure OpenAI fine-tuning
Fine-tuning GPT-3.5 or GPT-4 models on Microsoft Azure using W&B tracks, analyzes, and improves model performance by automatically capturing metrics and facilitating systematic evaluation through W&B’s experiment tracking and evaluation tools.
3 - Catalyst
How to integrate W&B with Catalyst, a PyTorch framework.
Catalyst is a PyTorch framework for deep learning R&D that focuses on reproducibility, rapid experimentation, and codebase reuse so you can create something new.
Catalyst includes a W&B integration for logging parameters, metrics, images, and other artifacts.
Run an example colab to see Catalyst and W&B integration in action.
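As a rough sketch only (not the official snippet): the project and run names below are placeholders, and the exact WandbLogger arguments may differ between Catalyst versions, so check the Catalyst docs. Wiring the logger into a Catalyst runner looks roughly like this:

```python
from catalyst import dl

runner = dl.SupervisedRunner()
runner.train(
    # model, criterion, optimizer, loaders, etc. go here
    loggers={"wandb": dl.WandbLogger(project="my_catalyst_project", name="my_run")},
)
```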
4 - Cohere fine-tuning
How to Fine-Tune Cohere models using W&B.
With Weights & Biases you can log your Cohere model’s fine-tuning metrics and configuration to analyze and understand the performance of your models and share the results with your colleagues.
To add Cohere fine-tuning logging to your W&B workspace:
Create a WandbConfig with your W&B API key, W&B entity and project name. You can find your W&B API key at https://wandb.ai/authorize
Pass this config to the FinetunedModel object along with your model name, dataset and hyperparameters to kick off your fine-tuning run.
from cohere.finetuning import WandbConfig, FinetunedModel, Settings

# create a config with your W&B details
wandb_ft_config = WandbConfig(
    api_key="<wandb_api_key>",
    entity="my-entity",  # must be a valid entity associated with the provided API key
    project="cohere-ft",
)

...  # set up your datasets and hyperparameters

# start a fine-tuning run on cohere
cmd_r_finetune = co.finetuning.create_finetuned_model(
    request=FinetunedModel(
        name="command-r-ft",
        settings=Settings(
            base_model=...,
            dataset_id=...,
            hyperparameters=...,
            wandb=wandb_ft_config,  # pass your W&B config here
        ),
    ),
)
View your model’s fine-tuning training and validation metrics and hyperparameters in the W&B project that you created.
Organize runs
Your W&B runs are automatically organized and can be filtered/sorted based on any configuration parameter such as job type, base model, learning rate and any other hyper-parameter.
In addition, you can rename your runs, add notes or create tags to group them.
5 - Databricks
W&B integrates with Databricks by customizing the W&B Jupyter notebook experience in the Databricks environment.
Configure Databricks
Install wandb in the cluster
Navigate to your cluster configuration, choose your cluster, click Libraries. Click Install New, choose PyPI, and add the package wandb.
Set up authentication
To authenticate your W&B account you can add a Databricks secret which your notebooks can query.
# install databricks cli
pip install databricks-cli

# Generate a token from databricks UI
databricks configure --token

# Create a scope with one of the two commands (depending if you have security features enabled on databricks):
# with security add-on
databricks secrets create-scope --scope wandb

# without security add-on
databricks secrets create-scope --scope wandb --initial-manage-principal users

# Add your api_key from: https://app.wandb.ai/authorize
databricks secrets put --scope wandb --key api_key
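With the secret in place, a notebook on the cluster can read it back and log in. This is a minimal sketch that assumes the standard dbutils object available in Databricks notebooks:

```python
import wandb

# read the API key from the secret scope created above (dbutils is provided by Databricks)
api_key = dbutils.secrets.get(scope="wandb", key="api_key")
wandb.login(key=api_key)
```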
6 - DeepChecks
DeepChecks helps you validate your machine learning models and data, such as verifying your data’s integrity, inspecting its distributions, validating data splits, evaluating your model, and comparing different models, all with minimal effort.
To use DeepChecks with Weights & Biases you will first need to sign up for a Weights & Biases account here. With the Weights & Biases integration in DeepChecks you can quickly get started like so:
import wandb
wandb.login()
# import your check from deepchecks
from deepchecks.checks import ModelErrorAnalysis

# run your check
result = ModelErrorAnalysis()

# push that result to wandb
result.to_wandb()
You can also log an entire DeepChecks test suite to Weights & Biases:
import wandb
wandb.login()
# import your full_suite tests from deepchecks
from deepchecks.suites import full_suite

# create and run a DeepChecks test suite
suite_result = full_suite().run(...)

# push these results to wandb
# here you can pass any wandb.init configs and arguments you need
suite_result.to_wandb(project="my-suite-project", config={"suite-name": "full-suite"})
Example
This Report shows off the power of using DeepChecks and Weights & Biases.
Any questions or issues about this Weights & Biases integration? Open an issue in the DeepChecks github repository and we’ll catch it and get you an answer :)
7 - DeepChem
How to integrate W&B with DeepChem library.
The DeepChem library provides open source tools that democratize the use of deep-learning in drug discovery, materials science, chemistry, and biology. This W&B integration adds simple and easy-to-use experiment tracking and model checkpointing while training models using DeepChem.
DeepChem logging in 3 lines of code
logger = WandbLogger(…)
model = TorchModel(…, wandb_logger=logger)
model.fit(…)
from deepchem.models import WandbLogger
logger = WandbLogger(entity="my_entity", project="my_project")
Log your training and evaluation data to W&B
Training loss and evaluation metrics can be automatically logged to Weights & Biases. Optional evaluation can be enabled using the DeepChem ValidationCallback; the WandbLogger detects the ValidationCallback and logs the metrics it generates.
```python
from deepchem.models import TorchModel, ValidationCallback
vc = ValidationCallback(…) # optional
model = TorchModel(…, wandb_logger=logger)
model.fit(…, callbacks=[vc])
logger.finish()
```
```python
from deepchem.models import KerasModel, ValidationCallback
vc = ValidationCallback(…) # optional
model = KerasModel(…, wandb_logger=logger)
model.fit(…, callbacks=[vc])
logger.finish()
```
8 - Docker
How to integrate W&B with Docker.
Docker Integration
W&B can store a pointer to the Docker image that your code ran in, giving you the ability to restore a previous experiment to the exact environment it was run in. The wandb library looks for the WANDB_DOCKER environment variable to persist this state. We provide a few helpers that automatically set this state.
Local Development
wandb docker is a command that starts a docker container, passes in wandb environment variables, mounts your code, and ensures wandb is installed. By default the command uses a docker image with TensorFlow, PyTorch, Keras, and Jupyter installed. You can use the same command to start your own docker image: wandb docker my/image:latest. The command mounts the current directory into the /app directory of the container; you can change this with the --dir flag.
Production
The wandb docker-run command is provided for production workloads. It’s meant to be a drop-in replacement for nvidia-docker. It’s a simple wrapper around the docker run command that adds your credentials and the WANDB_DOCKER environment variable to the call. If you do not pass the --runtime flag and nvidia-docker is available on the machine, this also ensures the runtime is set to nvidia.
Kubernetes
If you run your training workloads in Kubernetes and the k8s API is exposed to your pod (which is the case by default), wandb will query the API for the digest of the docker image and automatically set the WANDB_DOCKER environment variable.
Restoring
If a run was instrumented with the WANDB_DOCKER environment variable, calling wandb restore username/project:run_id will check out a new branch restoring your code, then launch the exact docker image used for training, pre-populated with the original command.
9 - Farama Gymnasium
How to integrate W&B with Farama Gymnasium.
If you’re using Farama Gymnasium we will automatically log videos of your environment generated by gymnasium.wrappers.Monitor. Just set the monitor_gym keyword argument of wandb.init to True.
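For example (the project name is only a placeholder):

```python
import wandb

wandb.init(project="gymnasium-videos", monitor_gym=True)
```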
Our gymnasium integration is very light. We simply look at the name of the video file being logged from gymnasium and name it after that or fall back to "videos" if we don’t find a match. If you want more control, you can always just manually log a video.
Check out this report to learn more on how to use Gymnasium with the CleanRL library.
10 - fastai
If you’re using fastai to train your models, W&B has an easy integration using the WandbCallback. Explore the details in interactive docs with examples →
Add the WandbCallback to the learner or fit method:
import wandb
from fastai.callback.wandb import *

# start logging a wandb run
wandb.init(project="my_project")

# To log only during one training phase
learn.fit(..., cbs=WandbCallback())

# To log continuously for all training phases
learn = learner(..., cbs=WandbCallback())
If you use version 1 of Fastai, refer to the Fastai v1 docs.
WandbCallback Arguments
WandbCallback accepts the following arguments:
| Args | Description |
|---|---|
| log | Whether to log the model’s gradients, parameters, all, or None (default). Losses and metrics are always logged. |
| log_preds | Whether to log prediction samples (default to True). |
| log_preds_every_epoch | Whether to log predictions every epoch or at the end (default to False). |
| log_model | Whether to log the model (default to False). This also requires SaveModelCallback. |
| model_name | The name of the file to save; overrides SaveModelCallback. |
| log_dataset | False (default), or True to log the folder referenced by learn.dls.path. A path can also be defined explicitly to reference which folder to log. Note: subfolder “models” is always ignored. |
| dataset_name | Name of the logged dataset (default to the folder name). |
| valid_dl | DataLoaders containing items used for prediction samples (default to random items from learn.dls.valid). |
| n_preds | Number of logged predictions (default to 36). |
| seed | Used for defining random samples. |
For custom workflows, you can manually log your datasets and models:
log_dataset(path, name=None, metadata={})
log_model(path, name=None, metadata={})
Note: any subfolder “models” will be ignored.
Distributed Training
fastai supports distributed training by using the context manager distrib_ctx. W&B supports this automatically and enables you to track your Multi-GPU experiments out of the box.
In the examples above, wandb launches one run per process. At the end of the training, you will end up with two runs. This can sometimes be confusing, and you may want to log only on the main process. To do so, you will have to manually detect which process you are in and avoid creating runs (that is, avoid calling wandb.init) in all the other processes.
This documentation is for fastai v1.
If you use the current version of fastai, you should refer to fastai page.
For scripts using fastai v1, we have a callback that can automatically log model topology, losses, metrics, weights, gradients, sample predictions and best trained model.
The Hugging Face Transformers library makes state-of-the-art NLP models like BERT and training techniques like mixed precision and gradient checkpointing easy to use. The W&B integration adds rich, flexible experiment tracking and model versioning to interactive centralized dashboards without compromising that ease of use.
Next-level logging in a few lines
os.environ["WANDB_PROJECT"] ="<my-amazing-project>"# name your W&B projectos.environ["WANDB_LOG_MODEL"] ="checkpoint"# log all model checkpointsfrom transformers import TrainingArguments, Trainer
args = TrainingArguments(..., report_to="wandb") # turn on W&B loggingtrainer = Trainer(..., args=args)
If you’d rather dive straight into working code, check out this Google Colab.
1. Install the wandb library and log in
To log in with your training script, you’ll need to sign in to your account at www.wandb.ai; you will then find your API key on the Authorize page.
If you are using Weights & Biases for the first time, you might want to check out our quickstart.
pip install wandb
wandb login
!pip install wandb
import wandb
wandb.login()
2. Name the project
A W&B Project is where all of the charts, data, and models logged from related runs are stored. Naming your project helps you organize your work and keep all the information about a single project in one place.
To add a run to a project simply set the WANDB_PROJECT environment variable to the name of your project. The WandbCallback will pick up this project name environment variable and use it when setting up your run.
WANDB_PROJECT=amazon_sentiment_analysis
%env WANDB_PROJECT=amazon_sentiment_analysis
import os
os.environ["WANDB_PROJECT"]="amazon_sentiment_analysis"
Make sure you set the project name before you initialize the Trainer.
If a project name is not specified the project name defaults to huggingface.
3. Log your training runs to W&B
The most important step when defining your Trainer training arguments, either inside your code or from the command line, is to set report_to to "wandb" in order to enable logging with Weights & Biases.
The logging_steps argument in TrainingArguments will control how often training metrics are pushed to W&B during training. You can also give a name to the training run in W&B using the run_name argument.
That’s it. Now your models will log losses, evaluation metrics, model topology, and gradients to Weights & Biases while they train.
python run_glue.py \    # run your Python script
  --report_to wandb \    # enable logging to W&B
  --run_name bert-base-high-lr \    # name of the W&B run (optional)
  # other command line arguments here
from transformers import TrainingArguments, Trainer
args = TrainingArguments(
    # other args and kwargs here
    report_to="wandb",  # enable logging to W&B
    run_name="bert-base-high-lr",  # name of the W&B run (optional)
    logging_steps=1,  # how often to log to W&B
)

trainer = Trainer(
    # other args and kwargs here
    args=args,  # your training args
)
trainer.train() # start training and logging to W&B
Using TensorFlow? Just swap the PyTorch Trainer for the TensorFlow TFTrainer.
4. Turn on model checkpointing
Using Weights & Biases’ Artifacts, you can store up to 100GB of models and datasets for free and then use the Weights & Biases Model Registry to register models to prepare them for staging or deployment in your production environment.
Logging your Hugging Face model checkpoints to Artifacts can be done by setting the WANDB_LOG_MODEL environment variable to end, checkpoint, or false:
checkpoint: a checkpoint will be uploaded every args.save_steps from the TrainingArguments.
end: the model will be uploaded at the end of training.
Use WANDB_LOG_MODEL along with load_best_model_at_end to upload the best model at the end of training.
import os
os.environ["WANDB_LOG_MODEL"] ="checkpoint"
WANDB_LOG_MODEL="checkpoint"
%env WANDB_LOG_MODEL="checkpoint"
Any Transformers Trainer you initialize from now on will upload models to your W&B project. The model checkpoints you log will be viewable through the Artifacts UI, and include the full model lineage (see an example model checkpoint in the UI here).
By default, your model will be saved to W&B Artifacts as model-{run_id} when WANDB_LOG_MODEL is set to end, or as checkpoint-{run_id} when it is set to checkpoint.
However, if you pass a run_name in your TrainingArguments, the model will be saved as model-{run_name} or checkpoint-{run_name}.
W&B Model Registry
Once you have logged your checkpoints to Artifacts, you can then register your best model checkpoints and centralize them across your team using the Weights & Biases Model Registry. Here you can organize your best models by task, manage model lifecycle, facilitate easy tracking and auditing throughout the ML lifecycle, and automate downstream actions with webhooks or jobs.
See the Model Registry documentation for how to link a model Artifact to the Model Registry.
5. Visualize evaluation outputs during training
Visualizing your model outputs during training or evaluation is often essential to really understand how your model is training.
By using the callbacks system in the Transformers Trainer, you can log additional helpful data to W&B such as your models’ text generation outputs or other predictions to W&B Tables.
See the Custom logging section below for a full guide on how to log evaluation outputs while training to a W&B Table like this:
6. Finish your W&B Run (Notebook only)
If your training is encapsulated in a Python script, the W&B run will end when your script finishes.
If you are using a Jupyter or Google Colab notebook, you’ll need to tell us when you’re done with training by calling wandb.finish().
trainer.train()  # start training and logging to W&B

# post-training analysis, testing, other logged code

wandb.finish()
7. Visualize your results
Once you have logged your training results you can explore your results dynamically in the W&B Dashboard. It’s easy to compare across dozens of runs at once, zoom in on interesting findings, and coax insights out of complex data with flexible, interactive visualizations.
Advanced features and FAQs
How do I save the best model?
If load_best_model_at_end=True is set in the TrainingArguments that are passed to the Trainer, then W&B will save the best performing model checkpoint to Artifacts.
If you’d like to centralize all your best model versions across your team to organize them by ML task, stage them for production, bookmark them for further evaluation, or kick off downstream Model CI/CD processes then ensure you’re saving your model checkpoints to Artifacts. Once logged to Artifacts, these checkpoints can then be promoted to the Model Registry.
How do I load a saved model?
If you saved your model to W&B Artifacts with WANDB_LOG_MODEL, you can download your model weights for additional training or to run inference. You just load them back into the same Hugging Face architecture that you used before.
# Create a new run
with wandb.init(project="amazon_sentiment_analysis") as run:
    # Pass the name and version of Artifact
    my_model_name = "model-bert-base-high-lr:latest"
    my_model_artifact = run.use_artifact(my_model_name)

    # Download model weights to a folder and return the path
    model_dir = my_model_artifact.download()

    # Load your Hugging Face model from that folder
    # using the same model class
    model = AutoModelForSequenceClassification.from_pretrained(
        model_dir, num_labels=num_labels
    )

    # Do additional training, or run inference
How do I resume training from a checkpoint?
If you had set WANDB_LOG_MODEL='checkpoint', you can also resume training by using the model_dir as the model_name_or_path argument in your TrainingArguments and passing resume_from_checkpoint=True to Trainer.
last_run_id ="xxxxxxxx"# fetch the run_id from your wandb workspace# resume the wandb run from the run_idwith wandb.init(
project=os.environ["WANDB_PROJECT"],
id=last_run_id,
resume="must",
) as run:
# Connect an Artifact to the run my_checkpoint_name =f"checkpoint-{last_run_id}:latest" my_checkpoint_artifact = run.use_artifact(my_model_name)
# Download checkpoint to a folder and return the path checkpoint_dir = my_checkpoint_artifact.download()
# reinitialize your model and trainer model = AutoModelForSequenceClassification.from_pretrained(
"<model_name>", num_labels=num_labels
)
# your awesome training arguments here. training_args = TrainingArguments()
trainer = Trainer(model=model, args=training_args)
# make sure use the checkpoint dir to resume training from the checkpoint trainer.train(resume_from_checkpoint=checkpoint_dir)
How do I log and view evaluation samples during training?
Logging to Weights & Biases via the Transformers Trainer is taken care of by the WandbCallback in the Transformers library. If you need to customize your Hugging Face logging you can modify this callback by subclassing WandbCallback and adding additional functionality that leverages additional methods from the Trainer class.
Below is the general pattern to add this new callback to the HF Trainer, and further down is a code-complete example to log evaluation outputs to a W&B Table:
# Instantiate the Trainer as normal
trainer = Trainer()

# Instantiate the new logging callback, passing it the Trainer object
evals_callback = WandbEvalsCallback(trainer, tokenizer, ...)

# Add the callback to the Trainer
trainer.add_callback(evals_callback)

# Begin Trainer training as normal
trainer.train()
View evaluation samples during training
The following section shows how to customize the WandbCallback to run model predictions and log evaluation samples to a W&B Table during training. We will do this every eval_steps, using the on_evaluate method of the Trainer callback.
Here, we wrote a decode_predictions function to decode the predictions and labels from the model output using the tokenizer.
Then, we create a pandas DataFrame from the predictions and labels and add an epoch column to the DataFrame.
Finally, we create a wandb.Table from the DataFrame and log it to wandb.
Additionally, we can control the frequency of logging by logging the predictions every freq epochs.
Note: Unlike the regular WandbCallback this custom callback needs to be added to the trainer after the Trainer is instantiated and not during initialization of the Trainer.
This is because the Trainer instance is passed to the callback during initialization.
from transformers.integrations import WandbCallback
import pandas as pd


def decode_predictions(tokenizer, predictions):
    labels = tokenizer.batch_decode(predictions.label_ids)
    logits = predictions.predictions.argmax(axis=-1)
    prediction_text = tokenizer.batch_decode(logits)
    return {"labels": labels, "predictions": prediction_text}


class WandbPredictionProgressCallback(WandbCallback):
    """Custom WandbCallback to log model predictions during training.

    This callback logs model predictions and labels to a wandb.Table at each
    logging step during training. It allows to visualize the
    model predictions as the training progresses.

    Attributes:
        trainer (Trainer): The Hugging Face Trainer instance.
        tokenizer (AutoTokenizer): The tokenizer associated with the model.
        sample_dataset (Dataset): A subset of the validation dataset
            for generating predictions.
        num_samples (int, optional): Number of samples to select from
            the validation dataset for generating predictions. Defaults to 100.
        freq (int, optional): Frequency of logging. Defaults to 2.
    """

    def __init__(self, trainer, tokenizer, val_dataset, num_samples=100, freq=2):
        """Initializes the WandbPredictionProgressCallback instance.

        Args:
            trainer (Trainer): The Hugging Face Trainer instance.
            tokenizer (AutoTokenizer): The tokenizer associated
                with the model.
            val_dataset (Dataset): The validation dataset.
            num_samples (int, optional): Number of samples to select from
                the validation dataset for generating predictions.
                Defaults to 100.
            freq (int, optional): Frequency of logging. Defaults to 2.
        """
        super().__init__()
        self.trainer = trainer
        self.tokenizer = tokenizer
        self.sample_dataset = val_dataset.select(range(num_samples))
        self.freq = freq

    def on_evaluate(self, args, state, control, **kwargs):
        super().on_evaluate(args, state, control, **kwargs)
        # control the frequency of logging by logging the predictions
        # every `freq` epochs
        if state.epoch % self.freq == 0:
            # generate predictions
            predictions = self.trainer.predict(self.sample_dataset)
            # decode predictions and labels
            predictions = decode_predictions(self.tokenizer, predictions)
            # add predictions to a wandb.Table
            predictions_df = pd.DataFrame(predictions)
            predictions_df["epoch"] = state.epoch
            records_table = self._wandb.Table(dataframe=predictions_df)
            # log the table to wandb
            self._wandb.log({"sample_predictions": records_table})


# First, instantiate the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=lm_datasets["train"],
    eval_dataset=lm_datasets["validation"],
)

# Instantiate the WandbPredictionProgressCallback
progress_callback = WandbPredictionProgressCallback(
    trainer=trainer,
    tokenizer=tokenizer,
    val_dataset=lm_datasets["validation"],
    num_samples=10,
    freq=2,
)

# Add the callback to the trainer
trainer.add_callback(progress_callback)
For a more detailed example please refer to this colab
What additional W&B settings are available?
Further configuration of what is logged with Trainer is possible by setting environment variables. A full list of W&B environment variables can be found here.
| Environment Variable | Usage |
|---|---|
| WANDB_PROJECT | Give your project a name (huggingface by default). |
| WANDB_LOG_MODEL | Log the model checkpoint as a W&B Artifact (false by default). <br>false (default): no model checkpointing. <br>checkpoint: a checkpoint is uploaded every args.save_steps (set in the Trainer’s TrainingArguments). <br>end: the final model checkpoint is uploaded at the end of training. |
| WANDB_WATCH | Set whether you’d like to log your model’s gradients, parameters, or neither. <br>false (default): no gradient or parameter logging. <br>gradients: log histograms of the gradients. <br>all: log histograms of gradients and parameters. |
| WANDB_DISABLED | Set to true to turn off logging entirely (false by default). |
| WANDB_SILENT | Set to true to silence the output printed by wandb (false by default). |
WANDB_WATCH=all
WANDB_SILENT=true
%env WANDB_WATCH=all
%env WANDB_SILENT=true
How do I customize wandb.init?
The WandbCallback that Trainer uses will call wandb.init under the hood when the Trainer is initialized. You can alternatively set up your runs manually by calling wandb.init before the Trainer is initialized. This gives you full control over your W&B run configuration.
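A minimal sketch of what that can look like (the project, name, tags, and group values here are only examples):

```python
import wandb

wandb.init(
    project="amazon_sentiment_analysis",
    name="bert-base-high-lr",
    tags=["baseline", "high-lr"],
    group="bert",
)
```

Then initialize the Trainer as usual; since a run already exists, the WandbCallback logs to it instead of starting a new one.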
Below are 6 Transformers and W&B related articles you might enjoy
Hyperparameter Optimization for Hugging Face Transformers
Three strategies for hyperparameter optimization for Hugging Face Transformers are compared: Grid Search, Bayesian Optimization, and Population Based Training.
We use a standard uncased BERT model from Hugging Face transformers, and we want to fine-tune on the RTE dataset from the SuperGLUE benchmark
Results show that Population Based Training is the most effective approach to hyperparameter optimization of our Hugging Face transformer model.
In the article, the author demonstrates how to fine-tune a pre-trained GPT2 HuggingFace Transformer model on anyone’s Tweets in five minutes.
The model uses the following pipeline: Downloading Tweets, Optimizing the Dataset, Initial Experiments, Comparing Losses Between Users, Fine-Tuning the Model.
Sentence Classification With Hugging Face BERT and WB
In this article, we’ll build a sentence classifier leveraging the power of recent breakthroughs in Natural Language Processing, focusing on an application of transfer learning to NLP.
We’ll be using The Corpus of Linguistic Acceptability (CoLA) dataset for single sentence classification, which is a set of sentences labeled as grammatically correct or incorrect that was first published in May 2018.
We’ll use Google’s BERT to create high performance models with minimal effort on a range of NLP tasks.
A Step by Step Guide to Tracking Hugging Face Model Performance
We use Weights & Biases and Hugging Face transformers to train DistilBERT, a Transformer that’s 40% smaller than BERT but retains 97% of BERT’s accuracy, on the GLUE benchmark
The GLUE benchmark is a collection of nine datasets and tasks for training NLP models
Hugging Face Diffusers is the go-to library for state-of-the-art pre-trained diffusion models for generating images, audio, and even 3D structures of molecules. The W&B integration adds rich, flexible experiment tracking, media visualization, pipeline architecture, and configuration management to interactive centralized dashboards without compromising that ease of use.
Next-level logging in just two lines
Log all the prompts, negative prompts, generated media, and configs associated with your experiment by simply including 2 lines of code. Here are the 2 lines of code to begin logging:
# import the autolog function
from wandb.integration.diffusers import autolog

# call the autolog before calling the pipeline
autolog(init=dict(project="diffusers_logging"))
An example of how the results of your experiment are logged.
Get started
Install diffusers, transformers, accelerate, and wandb.
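For example, from the command line or a notebook cell:

```shell
pip install --upgrade diffusers transformers accelerate wandb
```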
Use autolog to initialize a Weights & Biases run and automatically track the inputs and the outputs from all supported pipeline calls.
You can call the autolog() function with the init parameter, which accepts a dictionary of parameters required by wandb.init().
When you call autolog(), it initializes a Weights & Biases run and automatically tracks the inputs and the outputs from all supported pipeline calls.
Each pipeline call is tracked into its own table in the workspace, and the configs associated with the pipeline call are appended to the list of workflows in the configs for that run.
The prompts, negative prompts, and the generated media are logged in a wandb.Table.
All other configs associated with the experiment including seed and the pipeline architecture are stored in the config section for the run.
The generated media for each pipeline call are also logged in media panels in the run.
You can find a list of supported pipeline calls [here](https://github.com/wandb/wandb/blob/main/wandb/integration/diffusers/autologger.py#L12-L72). In case you want to request a new feature for this integration or report a bug associated with it, please open an issue on [https://github.com/wandb/wandb/issues](https://github.com/wandb/wandb/issues).
Examples
Autologging
Here is a brief end-to-end example of the autolog in action:
import torch
from diffusers import DiffusionPipeline
# import the autolog function
from wandb.integration.diffusers import autolog

# call the autolog before calling the pipeline
autolog(init=dict(project="diffusers_logging"))

# Initialize the diffusion pipeline
pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

# Define the prompts, negative prompts, and seed.
prompt = ["a photograph of an astronaut riding a horse", "a photograph of a dragon"]
negative_prompt = ["ugly, deformed", "ugly, deformed"]
generator = torch.Generator(device="cpu").manual_seed(10)

# call the pipeline to generate the images
images = pipeline(
    prompt,
    negative_prompt=negative_prompt,
    num_images_per_prompt=2,
    generator=generator,
)
import torch
from diffusers import DiffusionPipeline
import wandb
# import the autolog function
from wandb.integration.diffusers import autolog

# call the autolog before calling the pipeline
autolog(init=dict(project="diffusers_logging"))

# Initialize the diffusion pipeline
pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

# Define the prompts, negative prompts, and seed.
prompt = ["a photograph of an astronaut riding a horse", "a photograph of a dragon"]
negative_prompt = ["ugly, deformed", "ugly, deformed"]
generator = torch.Generator(device="cpu").manual_seed(10)

# call the pipeline to generate the images
images = pipeline(
    prompt,
    negative_prompt=negative_prompt,
    num_images_per_prompt=2,
    generator=generator,
)

# Finish the experiment
wandb.finish()
The results of a single experiment:
The results of multiple experiments:
The config of an experiment:
You need to explicitly call wandb.finish() when executing the code in IPython notebook environments after calling the pipeline. This is not necessary when executing python scripts.
Hugging Face AutoTrain is a no-code tool for training state-of-the-art models for Natural Language Processing (NLP), Computer Vision (CV), Speech, and even Tabular tasks.
Weights & Biases is directly integrated into Hugging Face AutoTrain, providing experiment tracking and config management. It’s as easy as using a single parameter in the CLI command for your experiments.
Install prerequisites
Install autotrain-advanced and wandb.
pip install --upgrade autotrain-advanced wandb
!pip install --upgrade autotrain-advanced wandb
To demonstrate these changes, this page fine-tunes an LLM on a math dataset to achieve a SoTA result in pass@1 on the GSM8k Benchmarks.
Prepare the dataset
Hugging Face AutoTrain expects your CSV custom dataset to have a specific format to work properly.
Your training file must contain a text column, which the training uses. For best results, the text column’s data must conform to the ### Human: Question?### Assistant: Answer. format. Review a great example in timdettmers/openassistant-guanaco.
However, the MetaMathQA dataset includes the columns query, response, and type. First, pre-process this dataset. Remove the type column and combine the content of the query and response columns into a new text column in the ### Human: Query?### Assistant: Response. format. Training uses the resulting dataset, rishiraj/guanaco-style-metamath.
Train using autotrain
You can start training using autotrain-advanced from the command line or a notebook. Use the --log argument to choose a logger; pass --log wandb to log your results to a W&B run, as in the sketch below.
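A rough sketch of what the command can look like. Apart from --log wandb and the dataset named above, the flags and values here are illustrative and may differ between autotrain-advanced releases, so check autotrain llm --help for the exact arguments:

```shell
autotrain llm \
  --train \
  --model <base-model-id> \
  --project-name my-autotrain-llm \
  --data-path rishiraj/guanaco-style-metamath \
  --text-column text \
  --log wandb
```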
Training and inference at scale made simple, efficient and adaptable
Hugging Face Accelerate is a library that enables the same PyTorch code to run across any distributed configuration, to simplify model training and inference at scale.
Accelerate includes a Weights & Biases Tracker, which we show how to use below. You can also read more about Accelerate Trackers in their docs here.
Start logging with Accelerate
To get started with Accelerate and Weights & Biases you can follow the pseudocode below:
from accelerate import Accelerator

# Tell the Accelerator object to log with wandb
accelerator = Accelerator(log_with="wandb")

# Initialise your wandb run, passing wandb parameters and any config information
accelerator.init_trackers(
    project_name="my_project",
    config={"dropout": 0.1, "learning_rate": 1e-2},
    init_kwargs={"wandb": {"entity": "my-wandb-team"}},
)

...

# Log to wandb by calling `accelerator.log`, `step` is optional
accelerator.log({"train_loss": 1.12, "valid_loss": 0.8}, step=global_step)

# Make sure that the wandb tracker finishes correctly
accelerator.end_training()
In more detail, you need to:
Pass log_with="wandb" when initialising the Accelerator class.
Pass any parameters you want to pass to wandb.init via a nested dict to init_kwargs.
Pass any other experiment config information you want to log to your wandb run via config.
Use the .log method to log to Weights & Biases; the step argument is optional.
Call .end_training when finished training.
Access the W&B tracker
To access the W&B tracker, use the Accelerator.get_tracker() method. Pass in the string corresponding to a tracker’s .name attribute, which returns the tracker on the main process.
wandb_tracker = accelerator.get_tracker("wandb")
From there you can interact with wandb’s run object like normal:
wandb_tracker.log_artifact(some_artifact_to_log)
Trackers built in Accelerate will automatically execute on the correct process, so if a tracker is only meant to be run on the main process it will do so automatically.
If you want to truly remove Accelerate’s wrapping entirely, you can achieve the same outcome with:
wandb_tracker = accelerator.get_tracker("wandb", unwrap=True)
if accelerator.is_main_process:
wandb_tracker.log_artifact(some_artifact_to_log)
Accelerate Articles
Below is an Accelerate article you may enjoy
HuggingFace Accelerate Super Charged With Weights & Biases
In this article, we’ll look at what HuggingFace Accelerate has to offer and how simple it is to perform distributed training and evaluation, while logging results to Weights & Biases
Hydra is an open-source Python framework that simplifies the development of research and other complex applications. The key feature is the ability to dynamically create a hierarchical configuration by composition and override it through config files and the command line.
You can continue to use Hydra for configuration management while taking advantage of the power of W&B.
Track metrics
Track your metrics as normal with wandb.init and wandb.log. Here, wandb.entity and wandb.project are defined within a hydra configuration file.
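A minimal sketch, assuming your Hydra config defines a wandb group with entity and project keys (the config_path and config_name values are placeholders):

```python
import hydra
import wandb


@hydra.main(config_path="configs/", config_name="defaults", version_base=None)
def run_experiment(cfg):
    wandb.init(entity=cfg.wandb.entity, project=cfg.wandb.project)
    loss = 0.5  # placeholder for a metric computed by your training loop
    wandb.log({"loss": loss})


if __name__ == "__main__":
    run_experiment()
```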
Hydra uses omegaconf as the default way to interface with configuration dictionaries. OmegaConf’s DictConfig is not a subclass of primitive dictionaries, so directly passing Hydra’s Config to wandb.config leads to unexpected results on the dashboard. It’s necessary to convert omegaconf.DictConfig to the primitive dict type before passing it to wandb.config.
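One way to do the conversion, sketched under the same assumptions as above, is to pass the converted dict through the config argument of wandb.init:

```python
import hydra
import wandb
from omegaconf import DictConfig, OmegaConf


@hydra.main(config_path="configs/", config_name="defaults", version_base=None)
def run_experiment(cfg: DictConfig):
    wandb.init(
        entity=cfg.wandb.entity,
        project=cfg.wandb.project,
        # convert the DictConfig to a primitive dict before handing it to wandb
        config=OmegaConf.to_container(cfg, resolve=True, throw_on_missing=True),
    )


if __name__ == "__main__":
    run_experiment()
```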
If your process hangs when started, this may be caused by this known issue. To solve this, try changing wandb’s multiprocessing protocol, either by adding an extra settings parameter to `wandb.init` as:
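```python
wandb.init(settings=wandb.Settings(start_method="thread"))
```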
or by setting a global environment variable from your shell:
$ export WANDB_START_METHOD=thread
Optimize Hyperparameters
W&B Sweeps is a highly scalable hyperparameter search platform, which provides interesting insights and visualizations about W&B experiments with minimal code real estate. Sweeps integrates seamlessly with Hydra projects with no coding requirements. The only thing needed is a configuration file describing the various parameters to sweep over, as normal.
W&B automatically creates a sweep inside your project and returns a wandb agent command for you to run on each machine you want to run your sweep.
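As a sketch, once you have written a standard sweep configuration file (sweep.yaml here is a placeholder name), you create the sweep and then start agents with the command W&B prints back:

```shell
wandb sweep sweep.yaml                      # creates the sweep and prints a wandb agent command
wandb agent <entity>/<project>/<sweep_id>   # run this on each machine that should join the sweep
```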
Pass parameters not present in Hydra defaults
Hydra supports passing extra parameters through the command line which aren’t present in the default configuration file, by using a + before the parameter. For example, you can pass an extra parameter with some value by simply calling:
$ python program.py +experiment=some_experiment
Unfortunately, you cannot sweep over such + configurations in the same way as regular Hydra configuration parameters. To work around this, you can initialize the experiment parameter with a default empty file and use W&B Sweeps to override those empty configs on each call. For more information, read this W&B Report.
W&B has three callbacks for Keras, available from wandb v0.13.4. For the legacy WandbCallback scroll down.
WandbMetricsLogger : Use this callback for Experiment Tracking. It logs your training and validation metrics along with system metrics to Weights and Biases.
WandbModelCheckpoint : Use this callback to log your model checkpoints to Weights and Biases Artifacts.
WandbEvalCallback: This base callback logs model predictions to Weights and Biases Tables for interactive visualization.
These new callbacks:
Adhere to Keras design philosophy.
Reduce the cognitive load of using a single callback (WandbCallback) for everything.
Make it easy for Keras users to modify the callback by subclassing it to support their niche use case.
WandbMetricsLogger automatically logs Keras’ logs dictionary that callback methods such as on_epoch_end, on_batch_end etc, take as an argument.
This tracks:
Training and validation metrics defined in model.compile.
System (CPU/GPU/TPU) metrics.
Learning rate (both for a fixed value and for a learning rate scheduler).
import wandb
from wandb.integration.keras import WandbMetricsLogger
# Initialize a new W&B run
wandb.init(config={"bs": 12})

# Pass the WandbMetricsLogger to model.fit
model.fit(
    X_train, y_train, validation_data=(X_test, y_test), callbacks=[WandbMetricsLogger()]
)
WandbMetricsLogger reference
| Parameter | Description |
|---|---|
| log_freq | (epoch, batch, or an int): if epoch, logs metrics at the end of each epoch. If batch, logs metrics at the end of each batch. If an int, logs metrics at the end of that many batches. Defaults to epoch. |
| initial_global_step | (int): Use this argument to correctly log the learning rate when you resume training from some initial_epoch and a learning rate scheduler is used. This can be computed as step_size * initial_step. Defaults to 0. |
Use the WandbModelCheckpoint callback to save the Keras model (SavedModel format) or model weights periodically, and upload them to W&B as a wandb.Artifact for model versioning.
This callback is subclassed from tf.keras.callbacks.ModelCheckpoint, thus the checkpointing logic is taken care of by the parent callback.
This callback saves:
The model that has achieved best performance based on the monitor.
The model at the end of every epoch regardless of the performance.
The model at the end of the epoch or after a fixed number of training batches.
Only model weights or the whole model.
The model either in SavedModel format or in .h5 format.
Use this callback in conjunction with WandbMetricsLogger.
import wandb
from wandb.integration.keras import WandbMetricsLogger, WandbModelCheckpoint
# Initialize a new W&B run
wandb.init(config={"bs": 12})

# Pass the WandbModelCheckpoint to model.fit
model.fit(
    X_train,
    y_train,
    validation_data=(X_test, y_test),
    callbacks=[
        WandbMetricsLogger(),
        WandbModelCheckpoint("models"),
    ],
)
WandbModelCheckpoint reference
| Parameter | Description |
|---|---|
| filepath | (str): path to save the model file. |
| monitor | (str): The metric name to monitor. |
| verbose | (int): Verbosity mode, 0 or 1. Mode 0 is silent, and mode 1 displays messages when the callback takes an action. |
| save_best_only | (Boolean): if save_best_only=True, it only saves the latest model or the model it considers the best, according to the monitor and mode attributes. |
| save_weights_only | (Boolean): if True, saves only the model’s weights. |
| mode | (auto, min, or max): For val_acc, set it to max; for val_loss, set it to min; and so on. |
| save_freq | (“epoch” or int): When using “epoch”, the callback saves the model after each epoch. When using an integer, the callback saves the model at the end of that many batches. Note that when monitoring validation metrics such as val_acc or val_loss, save_freq must be set to “epoch” as those metrics are only available at the end of an epoch. |
| options | (str): Optional tf.train.CheckpointOptions object if save_weights_only is true, or optional tf.saved_model.SaveOptions object if save_weights_only is false. |
| initial_value_threshold | (float): Floating point initial “best” value of the metric to be monitored. |
Log checkpoints after N epochs
By default (save_freq="epoch"), the callback creates a checkpoint and uploads it as an artifact after each epoch. To create a checkpoint after a specific number of batches, set save_freq to an integer. To checkpoint after N epochs, compute the cardinality of the train dataloader and pass it to save_freq:
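A minimal sketch of that computation, assuming a batched tf.data pipeline (trainloader here is just a stand-in for your real dataset):

```python
import tensorflow as tf
from wandb.integration.keras import WandbModelCheckpoint

# trainloader stands in for your batched tf.data pipeline
trainloader = tf.data.Dataset.from_tensor_slices(list(range(128))).batch(32)

N = 5  # checkpoint every 5 epochs
steps_per_epoch = int(trainloader.cardinality().numpy())
checkpoint_cb = WandbModelCheckpoint("models", save_freq=N * steps_per_epoch)
```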
While checkpointing on TPUs you might encounter the error message UnimplementedError: File system scheme '[local]' not implemented. This happens because the model directory (filepath) must use a cloud storage bucket path (gs://bucket-name/...), and this bucket must be accessible from the TPU server. You can, however, use a local path for checkpointing, which in turn is uploaded as an Artifact.
The WandbEvalCallback is an abstract base class to build Keras callbacks primarily for model prediction and, secondarily, dataset visualization.
This abstract callback is agnostic with respect to the dataset and the task. To use this, inherit from this base WandbEvalCallback callback class and implement the add_ground_truth and add_model_prediction methods.
The WandbEvalCallback is a utility class that provides methods to:
Create data and prediction wandb.Table instances.
Log data and prediction Tables as wandb.Artifact.
Log the data table on_train_begin.
Log the prediction table on_epoch_end.
The following example uses WandbClfEvalCallback for an image classification task. This example callback logs the validation data (data_table) to W&B, performs inference, and logs the prediction (pred_table) to W&B at the end of every epoch.
import tensorflow as tf  # needed for tf.argmax below

import wandb
from wandb.integration.keras import WandbMetricsLogger, WandbEvalCallback


# Implement your model prediction visualization callback
class WandbClfEvalCallback(WandbEvalCallback):
    def __init__(
        self, validation_data, data_table_columns, pred_table_columns, num_samples=100
    ):
        super().__init__(data_table_columns, pred_table_columns)
        self.x = validation_data[0]
        self.y = validation_data[1]

    def add_ground_truth(self, logs=None):
        for idx, (image, label) in enumerate(zip(self.x, self.y)):
            self.data_table.add_data(idx, wandb.Image(image), label)

    def add_model_predictions(self, epoch, logs=None):
        preds = self.model.predict(self.x, verbose=0)
        preds = tf.argmax(preds, axis=-1)
        table_idxs = self.data_table_ref.get_index()

        for idx in table_idxs:
            pred = preds[idx]
            self.pred_table.add_data(
                epoch,
                self.data_table_ref.data[idx][0],
                self.data_table_ref.data[idx][1],
                self.data_table_ref.data[idx][2],
                pred,
            )


# ...

# Initialize a new W&B run
wandb.init(config={"hyper": "parameter"})

# Add the Callbacks to Model.fit
model.fit(
    X_train,
    y_train,
    validation_data=(X_test, y_test),
    callbacks=[
        WandbMetricsLogger(),
        WandbClfEvalCallback(
            validation_data=(X_test, y_test),
            data_table_columns=["idx", "image", "label"],
            pred_table_columns=["epoch", "idx", "image", "label", "pred"],
        ),
    ],
)
The W&B Artifact page includes Table logs by default, rather than the Workspace page.
WandbEvalCallback reference
| Parameter | Description |
|---|---|
| data_table_columns | (list) List of column names for the data_table. |
| pred_table_columns | (list) List of column names for the pred_table. |
Memory footprint details
We log the data_table to W&B when the on_train_begin method is invoked. Once it’s uploaded as a W&B Artifact, we get a reference to this table, which can be accessed using the data_table_ref class variable. The data_table_ref is a 2D list that can be indexed like self.data_table_ref[idx][n], where idx is the row number and n is the column number. You can see this usage in the example above.
Customize the callback
You can override the on_train_begin or on_epoch_end methods to have more fine-grained control. If you want to log the samples after N batches, you can implement the on_train_batch_end method.
💡 If you are implementing a callback for model prediction visualization by inheriting WandbEvalCallback and something needs to be clarified or fixed, please let us know by opening an issue.
WandbCallback [legacy]
Use the W&B library WandbCallback Class to automatically save all the metrics and the loss values tracked in model.fit.
import wandb
from wandb.integration.keras import WandbCallback
wandb.init(config={"hyper": "parameter"})
...  # code to set up your model in Keras

# Pass the callback to model.fit
model.fit(
    X_train, y_train, validation_data=(X_test, y_test), callbacks=[WandbCallback()]
)
The WandbCallback class supports a wide variety of logging configuration options: specifying a metric to monitor, tracking of weights and gradients, logging of predictions on training_data and validation_data, and more.
Automatically logs history data from any metrics collected by Keras: loss and anything passed into keras_model.compile().
Sets summary metrics for the run associated with the “best” training step, as defined by the monitor and mode attributes. This defaults to the epoch with the minimum val_loss. WandbCallback by default saves the model associated with the best epoch.
Optionally logs gradient and parameter histogram.
Optionally saves training and validation data for wandb to visualize.
WandbCallback reference
| Arguments | Description |
|---|---|
| monitor | (str) name of metric to monitor. Defaults to val_loss. |
| mode | (str) one of {auto, min, max}. min: save model when monitor is minimized. max: save model when monitor is maximized. auto: try to guess when to save the model (default). |
| save_model | True: save a model when monitor beats all previous epochs. False: don’t save models. |
| save_graph | (boolean) if True, save the model graph to wandb (default to True). |
| save_weights_only | (boolean) if True, saves only the model’s weights (model.save_weights(filepath)). Otherwise, saves the full model. |
| log_weights | (boolean) if True, save histograms of the model’s layers’ weights. |
| log_gradients | (boolean) if True, log histograms of the training gradients. |
| training_data | (tuple) Same format (X,y) as passed to model.fit. This is needed for calculating gradients; it is mandatory if log_gradients is True. |
| validation_data | (tuple) Same format (X,y) as passed to model.fit. A set of data for wandb to visualize. If you set this field, every epoch, wandb makes a small number of predictions and saves the results for later visualization. |
| generator | (generator) a generator that returns validation data for wandb to visualize. This generator should return tuples (X,y). Either validation_data or generator should be set for wandb to visualize specific data examples. |
| validation_steps | (int) if validation_data is a generator, how many steps to run the generator for the full validation set. |
| labels | (list) If you are visualizing your data with wandb, this list of labels converts numeric output to understandable strings if you are building a classifier with multiple classes. For a binary classifier, you can pass in a list of two labels: [label for false, label for true]. If validation_data and generator are both false, this does nothing. |
| predictions | (int) the number of predictions to make for visualization each epoch, max is 100. |
| input_type | (string) type of the model input to help visualization. Can be one of: (image, images, segmentation_mask). |
| output_type | (string) type of the model output to help visualization. Can be one of: (image, images, segmentation_mask). |
| log_evaluation | (boolean) if True, save a Table containing validation data and the model’s predictions at each epoch. See validation_indexes, validation_row_processor, and output_row_processor for additional details. |
| class_colors | ([float, float, float]) if the input or output is a segmentation mask, an array containing an rgb tuple (range 0-1) for each class. |
| log_batch_frequency | (integer) if None, callback logs every epoch. If set to an integer, callback logs training metrics every log_batch_frequency batches. |
| log_best_prefix | (string) if None, saves no extra summary metrics. If set to a string, prepends the monitored metric and epoch with the prefix and saves the results as summary metrics. |
| validation_indexes | ([wandb.data_types._TableLinkMixin]) an ordered list of index keys to associate with each validation example. If log_evaluation is True and you provide validation_indexes, does not create a Table of validation data. Instead, associates each prediction with the row represented by the TableLinkMixin. To obtain a list of row keys, use Table.get_index(). |
| validation_row_processor | (Callable) a function to apply to the validation data, commonly used to visualize the data. The function receives an ndx (int) and a row (dict). If your model has a single input, then row["input"] contains the input data for the row. Otherwise, it contains the names of the input slots. If your fit function takes a single target, then row["target"] contains the target data for the row. Otherwise, it contains the names of the output slots. For example, if your input data is a single array, to visualize the data as an Image, provide lambda ndx, row: {"img": wandb.Image(row["input"])} as the processor. Ignored if log_evaluation is False or validation_indexes are present. |
| output_row_processor | (Callable) same as validation_row_processor, but applied to the model’s output. row["output"] contains the results of the model output. |
| infer_missing_processors | (Boolean) Determines whether to infer validation_row_processor and output_row_processor if they are missing. Defaults to True. If you provide labels, W&B attempts to infer classification-type processors where appropriate. |
| log_evaluation_frequency | (int) Determines how often to log evaluation results. Defaults to 0 to log only at the end of training. Set to 1 to log every epoch, 2 to log every other epoch, and so on. Has no effect when log_evaluation is False. |
Frequently Asked Questions
How do I use Keras multiprocessing with wandb?
When setting use_multiprocessing=True, this error may occur:
Error("You must call wandb.init() before wandb.config.batch_size")
To work around it:
In the Sequence class construction, add: wandb.init(group='...').
In main, make sure you’re using if __name__ == "__main__": and put the rest of your script logic inside it.
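Below is a minimal sketch of that workaround with a hypothetical Sequence subclass; the group name, batch size, and the commented model.fit call are placeholders.

```python
import wandb
from tensorflow import keras


class MySequence(keras.utils.Sequence):
    """Hypothetical data generator used with use_multiprocessing=True."""

    def __init__(self):
        # Workaround step 1: call wandb.init(group=...) in the Sequence constructor
        # so the worker process has an active run before wandb.config is touched
        wandb.init(group="keras-multiprocessing")
        self.batch_size = wandb.config.batch_size

    def __len__(self):
        ...

    def __getitem__(self, idx):
        ...  # return one batch of (x, y)


# Workaround step 2: keep the rest of the script logic under the __main__ guard
if __name__ == "__main__":
    wandb.init(group="keras-multiprocessing", config={"batch_size": 32})
    # model.fit(MySequence(), use_multiprocessing=True, workers=4)
```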
17 - Kubeflow Pipelines (kfp)
How to integrate W&B with Kubeflow Pipelines.
Overview
Kubeflow Pipelines (kfp) is a platform for building and deploying portable, scalable machine learning (ML) workflows based on Docker containers.
This integration lets users apply decorators to kfp python functional components to automatically log parameters and artifacts to W&B.
This feature was enabled in wandb==0.12.11 and requires kfp<2.0.0
Quickstart
Install W&B and login
!pip install kfp wandb
import wandb
wandb.login()
pip install kfp wandb
wandb login
Decorate your components
Add the @wandb_log decorator and create your components as usual. This will automatically log the input/outputs parameters and artifacts to W&B each time you run your pipeline.
from kfp import components
from wandb.integration.kfp import wandb_log
@wandb_log
def add(a: float, b: float) -> float:
    return a + b
add = components.create_component_from_func(add)
Pass environment variables to containers
You may need to explicitly pass environment variables to your containers. For two-way linking, you should also set the environment variable WANDB_KUBEFLOW_URL to the base URL of your Kubeflow Pipelines instance, for example https://kubeflow.mysite.com.
import os
from kfp import dsl
from kubernetes.client.models import V1EnvVar
def add_wandb_env_variables(op):
    env = {
        "WANDB_API_KEY": os.getenv("WANDB_API_KEY"),
        "WANDB_BASE_URL": os.getenv("WANDB_BASE_URL"),
    }

    for name, value in env.items():
        op = op.add_env_variable(V1EnvVar(name, value))
    return op


@dsl.pipeline(name="example-pipeline")
def example_pipeline(param1: str, param2: int):
    conf = dsl.get_pipeline_conf()
    conf.add_op_transformer(add_wandb_env_variables)
Access your data programmatically
Via the Kubeflow Pipelines UI
Click on any Run in the Kubeflow Pipelines UI that has been logged with W&B.
Find details about inputs and outputs in the Input/Output and ML Metadata tabs.
View the W&B web app from the Visualizations tab.
Via the web app UI
The web app UI has the same content as the Visualizations tab in Kubeflow Pipelines, but with more space. Learn more about the web app UI here.
If you want finer control of logging, you can sprinkle in wandb.log and wandb.log_artifact calls in the component.
With explicit wandb.log_artifacts calls
In the example below, we train a model. The @wandb_log decorator automatically tracks the relevant inputs and outputs. If you want to log the training process itself, you can explicitly add that logging like so:
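The following is a hedged sketch rather than a verbatim example: the component body, metric names, and model path are placeholders that only illustrate where explicit wandb.log calls can sit alongside the @wandb_log decorator.

```python
import wandb
from kfp import components
from wandb.integration.kfp import wandb_log


@wandb_log
def train_model(train_data_path: str, epochs: int) -> str:
    """Hypothetical training component; its inputs/outputs are logged by @wandb_log."""
    for epoch in range(epochs):
        loss = ...  # run one epoch of training here
        # Explicitly log the training process in addition to the decorator's logging
        wandb.log({"epoch": epoch, "train/loss": loss})

    model_path = "model.pt"
    ...  # save the trained model to model_path
    return model_path


train_model = components.create_component_from_func(train_model)
```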
The wandb library includes a special callback for LightGBM. It’s also easy to use the generic logging features of Weights & Biases to track large experiments, like hyperparameter sweeps.
from wandb.integration.lightgbm import wandb_callback, log_summary
import lightgbm as lgb
# Log metrics to W&B
gbm = lgb.train(..., callbacks=[wandb_callback()])

# Log feature importance plot and upload model checkpoint to W&B
log_summary(gbm, save_model_checkpoint=True)
Attaining the maximum performance out of models requires tuning hyperparameters, like tree depth and learning rate. Weights & Biases includes Sweeps, a powerful toolkit for configuring, orchestrating, and analyzing large hyperparameter testing experiments.
To learn more about these tools and see an example of how to use Sweeps with XGBoost, check out this interactive Colab notebook.
Decorating a step turns logging off or on for certain types within that step.
In this example, all datasets and models in start will be logged
from wandb.integration.metaflow import wandb_log
class WandbExampleFlow(FlowSpec):
    @wandb_log(datasets=True, models=True, settings=wandb.Settings(...))
    @step
    def start(self):
        self.raw_df = pd.read_csv(...)  # pd.DataFrame -> upload as dataset
        self.model_file = torch.load(...)  # nn.Module -> upload as model
        self.next(self.transform)
Decorating a flow is equivalent to decorating all the constituent steps with a default.
In this case, all steps in WandbExampleFlow log datasets and models by default, just like decorating each step with @wandb_log(datasets=True, models=True).
from wandb.integration.metaflow import wandb_log
@wandb_log(datasets=True, models=True)  # decorate all @steps
class WandbExampleFlow(FlowSpec):
    @step
    def start(self):
        self.raw_df = pd.read_csv(...)  # pd.DataFrame -> upload as dataset
        self.model_file = torch.load(...)  # nn.Module -> upload as model
        self.next(self.transform)
Decorating the flow is equivalent to decorating all steps with a default. That means if you later decorate a Step with another @wandb_log, it overrides the flow-level decoration.
In this example:
start and mid log both datasets and models.
end logs neither datasets nor models.
from wandb.integration.metaflow import wandb_log
@wandb_log(datasets=True, models=True)  # same as decorating start and mid
class WandbExampleFlow(FlowSpec):
    # this step will log datasets and models
    @step
    def start(self):
        self.raw_df = pd.read_csv(...)  # pd.DataFrame -> upload as dataset
        self.model_file = torch.load(...)  # nn.Module -> upload as model
        self.next(self.mid)

    # this step will also log datasets and models
    @step
    def mid(self):
        self.raw_df = pd.read_csv(...)  # pd.DataFrame -> upload as dataset
        self.model_file = torch.load(...)  # nn.Module -> upload as model
        self.next(self.end)

    # this step is overwritten and will NOT log datasets OR models
    @wandb_log(datasets=False, models=False)
    @step
    def end(self):
        self.raw_df = pd.read_csv(...)
        self.model_file = torch.load(...)
Access your data programmatically
You can access the information we've captured in three ways: inside the original Python process being logged using the wandb client library, with the web app UI, or programmatically using our Public API. Parameters are saved to W&B's config and can be found in the Overview tab. Datasets, models, and others are saved to W&B Artifacts and can be found in the Artifacts tab. Base Python types are saved to W&B's summary dict and can be found in the Overview tab. See our guide to the Public API for details on using the API to get this information programmatically from outside the run.
Cheat sheet
| Data | Client library | UI |
| --- | --- | --- |
| Parameter(...) | wandb.config | Overview tab, Config |
| datasets, models, others | wandb.use_artifact("{var_name}:latest") | Artifacts tab |
| Base Python types (dict, list, str, etc.) | wandb.summary | Overview tab, Summary |
wandb_log kwargs
| kwarg | Options |
| --- | --- |
| datasets | True: Log instance variables that are a dataset. False |
| models | True: Log instance variables that are a model. False |
| others | True: Log anything else that is serializable as a pickle. False |
| settings | wandb.Settings(…): Specify your own wandb settings for this step or flow. None: Equivalent to passing wandb.Settings() |
By default, if:
settings.run_group is None, it will be set to {flow_name}/{run_id}
settings.run_job_type is None, it will be set to {run_job_type}/{step_name}
Frequently Asked Questions
What exactly do you log? Do you log all instance and local variables?
wandb_log only logs instance variables. Local variables are NEVER logged. This is useful to avoid logging unnecessary data.
Which data types get logged?
We currently support these types:
| Logging Setting | Type |
| --- | --- |
| default (always on) | dict, list, set, str, int, float, bool |
| datasets | pd.DataFrame, pathlib.Path |
| models | nn.Module, sklearn.base.BaseEstimator |
| others | Anything that is pickle-able and JSON serializable |
How can I configure logging behavior?
| Kind of Variable | Behavior | Example | Data Type |
| --- | --- | --- | --- |
| Instance | Auto-logged | self.accuracy | float |
| Instance | Logged if datasets=True | self.df | pd.DataFrame |
| Instance | Not logged if datasets=False | self.df | pd.DataFrame |
| Local | Never logged | accuracy | float |
| Local | Never logged | df | pd.DataFrame |
Is artifact lineage tracked?
Yes. If you have an artifact that is an output of step A and an input to step B, we automatically construct the lineage DAG for you.
MMEngine by OpenMMLab is a foundational library for training deep learning models based on PyTorch. MMEngine implements a next-generation training architecture for the OpenMMLab algorithm library, providing a unified execution foundation for over 30 algorithm libraries within OpenMMLab. Its core components include the training engine, evaluation engine, and module management.
You can use MMEngine's WandbVisBackend with W&B to track training and evaluation metrics and to log additional records such as graphs, images, and scalars.
Get started
Install openmim and wandb.
pip install -q -U openmim wandb
!pip install -q -U openmim wandb
Next, install mmengine and mmcv using mim.
mim install -q mmengine mmcv
!mim install -q mmengine mmcv
Use the WandbVisBackend with MMEngine Runner
This section demonstrates a typical workflow using WandbVisBackend using mmengine.runner.Runner.
Define a visualizer from a visualization config.
from mmengine.visualization import Visualizer
# define the visualization configs
visualization_cfg = dict(
    name="wandb_visualizer",
    vis_backends=[
        dict(
            type="WandbVisBackend",
            init_kwargs=dict(project="mmengine"),
        )
    ],
    save_dir="runs/wandb",
)

# get the visualizer from the visualization configs
visualizer = Visualizer.get_instance(**visualization_cfg)
You pass a dictionary of arguments for [W&B run initialization](/ref/python/init) input parameters to `init_kwargs`.
Initialize a runner with the visualizer, and call runner.train().
from mmengine.runner import Runner
# build the mmengine Runner, which is a training helper for PyTorch
runner = Runner(
    model,
    work_dir="runs/gan/",
    train_dataloader=train_dataloader,
    train_cfg=train_cfg,
    optim_wrapper=opt_wrapper_dict,
    visualizer=visualizer,  # pass the visualizer
)

# start training
runner.train()
Use the WandbVisBackend with OpenMMLab computer vision libraries
The WandbVisBackend can also be used easily to track experiments with OpenMMLab computer vision libraries such as MMDetection.
# inherit base configs from the default runtime config
_base_ = ["../_base_/default_runtime.py"]

# Assign the `WandbVisBackend` config dictionary to the
# `vis_backends` of the `visualizer` from the base config
_base_.visualizer.vis_backends = [
    dict(
        type="WandbVisBackend",
        init_kwargs={
            "project": "mmdet",
            "entity": "geekyrakshit",
        },
    ),
]
22 - MMF
How to integrate W&B with Meta AI’s MMF.
The WandbLogger class in Meta AI’s MMF library will enable Weights & Biases to log the training/validation metrics, system (GPU and CPU) metrics, model checkpoints and configuration parameters.
Current features
The following features are currently supported by the WandbLogger in MMF:
Training & Validation metrics
Learning Rate over time
Model Checkpoint saving to W&B Artifacts
GPU and CPU system metrics
Training configuration parameters
Config parameters
The following options are available in MMF config to enable and customize the wandb logging:
training:
  wandb:
    enabled: true
    # An entity is a username or team name where you're sending runs.
    # By default it will log the run to your user account.
    entity: null
    # Project name to be used while logging the experiment with wandb
    project: mmf
    # Experiment/run name to be used while logging the experiment
    # under the project with wandb. The default experiment name
    # is: ${training.experiment_name}
    name: ${training.experiment_name}
    # Turn on model checkpointing, saving checkpoints to W&B Artifacts
    log_model_checkpoint: true
    # Additional argument values that you want to pass to wandb.init().
    # Check out the documentation at /ref/python/init
    # to see what arguments are available, such as:
    # job_type: 'train'
    # tags: ['tag1', 'tag2']
env:
  # To change the path to the directory where wandb metadata would be
  # stored (Default: env.log_dir):
  wandb_logdir: ${env:MMF_WANDB_LOGDIR,}
23 - MosaicML Composer
State of the art algorithms to train your neural networks
Composer is a library for training neural networks better, faster, and cheaper. It contains many state-of-the-art methods for accelerating neural network training and improving generalization, along with an optional Trainer API that makes composing many different enhancements easy.
W&B provides a lightweight wrapper for logging your ML experiments. But you don’t need to combine the two yourself: W&B is incorporated directly into the Composer library via the WandBLogger.
Start logging to W&B
from composer import Trainer
from composer.loggers import WandBLogger
trainer = Trainer(..., logger=WandBLogger())
Use Composer’s WandBLogger
The Composer library uses the WandBLogger class in the Trainer to log metrics to Weights & Biases. It is as simple as instantiating the logger and passing it to the Trainer.
The most common parameters for WandBLogger are listed below; see the Composer documentation for the full list and descriptions.
| Parameter | Description |
| --- | --- |
| project | W&B project name (str, optional) |
| group | W&B group name (str, optional) |
| name | W&B run name. If not specified, the State.run_name is used (str, optional) |
| entity | W&B entity name, such as your username or W&B Team name (str, optional) |
| tags | W&B tags (List[str], optional) |
| log_artifacts | Whether to log checkpoints to wandb. Default: false (bool, optional) |
| rank_zero_only | Whether to log only on the rank-zero process. When logging artifacts, it is highly recommended to log on all ranks. Artifacts from ranks ≥1 are not stored, which may discard pertinent information. For example, when using DeepSpeed ZeRO, it would be impossible to restore from checkpoints without artifacts from all ranks. Default: True (bool, optional) |
| init_kwargs | Params to pass to wandb.init, such as your wandb config. See here for the full list of arguments wandb.init accepts |
A typical usage would be:
init_kwargs = {"notes":"Testing higher learning rate in this experiment",
"config":{"arch":"Llama",
"use_mixed_precision":True
}
}
wandb_logger = WandBLogger(log_artifacts=True, init_kwargs=init_kwargs)
Log prediction samples
You can use Composer's Callbacks system to control when you log to Weights & Biases via the WandBLogger. In this example, a sample of the validation images and predictions is logged:
import wandb
from composer import Callback, State, Logger
class LogPredictions(Callback):
    def __init__(self, num_samples=100, seed=1234):
        super().__init__()
        self.num_samples = num_samples
        self.data = []

    def eval_batch_end(self, state: State, logger: Logger):
        """Compute predictions per batch and store them on self.data"""
        if state.timer.epoch == state.max_duration:  # on last val epoch
            if len(self.data) < self.num_samples:
                n = self.num_samples
                x, y = state.batch_pair
                outputs = state.outputs.argmax(-1)
                data = [
                    [wandb.Image(x_i), y_i, y_pred]
                    for x_i, y_i, y_pred in list(zip(x[:n], y[:n], outputs[:n]))
                ]
                self.data += data

    def eval_end(self, state: State, logger: Logger):
        """Create a wandb.Table and log it"""
        columns = ["image", "ground truth", "prediction"]
        table = wandb.Table(columns=columns, data=self.data[: self.num_samples])
        wandb.log({"sample_table": table}, step=int(state.timer.batch))


...

trainer = Trainer(
    ...,
    loggers=[WandBLogger()],
    callbacks=[LogPredictions()],
)
Use the W&B OpenAI API integration to log requests, responses, token counts and model metadata for all OpenAI models, including fine-tuned models.
See the OpenAI fine-tuning integration to learn how to use W&B to track your fine-tuning experiments, models, and datasets and share your results with your colleagues.
By logging your API inputs and outputs, you can quickly evaluate the performance of different prompts, compare model settings (such as temperature), and track other usage metrics such as token usage.
Install OpenAI Python API library
The W&B autolog integration works with OpenAI version 0.28.1 and below.
To install OpenAI Python API version 0.28.1, run:
pip install openai==0.28.1
Use the OpenAI Python API
1. Import autolog and initialise it
First, import autolog from wandb.integration.openai and initialise it.
import os
import openai
from wandb.integration.openai import autolog
autolog({"project": "gpt5"})
You can optionally pass a dictionary with arguments that wandb.init() accepts to autolog. This includes a project name, team name, entity, and more. For more information about wandb.init, see the API Reference Guide.
2. Call the OpenAI API
Each call you make to the OpenAI API is now logged to W&B automatically.
os.environ["OPENAI_API_KEY"] ="XXX"chat_request_kwargs = dict(
model="gpt-3.5-turbo",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Who won the world series in 2020?"},
{"role": "assistant", "content": "The Los Angeles Dodgers"},
{"role": "user", "content": "Where was it played?"},
],
)
response = openai.ChatCompletion.create(**chat_request_kwargs)
3. View your OpenAI API inputs and responses
Click on the W&B run link generated by autolog in step 1. This redirects you to your project workspace in the W&B App.
Select a run you created to view the trace table, trace timeline and the model architecture of the OpenAI LLM used.
Turn off autolog
W&B recommends that you call disable() to close all W&B processes when you are finished using the OpenAI API.
autolog.disable()
Now your inputs and completions will be logged to W&B, ready for analysis or to be shared with colleagues.
Log your OpenAI GPT-3.5 or GPT-4 model’s fine-tuning metrics and configuration to W&B. Utilize the W&B ecosystem to track your fine-tuning experiments, models, and datasets and share your results with your colleagues.
See the Weights and Biases Integration section in the OpenAI documentation for supplemental information on how to integrate W&B with OpenAI for fine-tuning.
Install or update OpenAI Python API
The W&B OpenAI fine-tuning integration works with OpenAI version 1.0 and above. See the PyPI documentation for the latest version of the OpenAI Python API library.
To install OpenAI Python API, run:
pip install openai
If you already have OpenAI Python API installed, you can update it with:
pip install -U openai
Sync your OpenAI fine-tuning results
Integrate W&B with OpenAI’s fine-tuning API to log your fine-tuning metrics and configuration to W&B. To do this, use the WandbLogger class from the wandb.integration.openai.fine_tuning module.
from wandb.integration.openai.fine_tuning import WandbLogger
# Fine-tuning logic
WandbLogger.sync(fine_tune_job_id=FINETUNE_JOB_ID)
Sync your fine-tunes
Sync your results from your script
from wandb.integration.openai.fine_tuning import WandbLogger
# one-line command
WandbLogger.sync()

# passing optional parameters
WandbLogger.sync(
    fine_tune_job_id=None,
    num_fine_tunes=None,
    project="OpenAI-Fine-Tune",
    entity=None,
    overwrite=False,
    model_artifact_name="model-metadata",
    model_artifact_type="model",
    **kwargs_wandb_init
)
Reference
| Argument | Description |
| --- | --- |
| fine_tune_job_id | The OpenAI Fine-Tune ID you get when you create your fine-tune job using client.fine_tuning.jobs.create. If this argument is None (default), all the OpenAI fine-tune jobs that haven't already been synced will be synced to W&B. |
| openai_client | Pass an initialized OpenAI client to sync. If no client is provided, one is initialized by the logger itself. By default it is None. |
| num_fine_tunes | If no ID is provided, then all the unsynced fine-tunes will be logged to W&B. This argument allows you to select the number of recent fine-tunes to sync. If num_fine_tunes is 5, it selects the 5 most recent fine-tunes. |
| project | Weights and Biases project name where your fine-tune metrics, models, data, etc. will be logged. By default, the project name is "OpenAI-Fine-Tune". |
| entity | W&B username or team name where you're sending runs. By default, your default entity is used, which is usually your username. |
| overwrite | Forces logging and overwrites the existing wandb run of the same fine-tune job. By default this is False. |
| wait_for_job_success | Once an OpenAI fine-tuning job is started, it usually takes a bit of time. To ensure that your metrics are logged to W&B as soon as the fine-tune job is finished, this setting checks every 60 seconds for the status of the fine-tune job to change to succeeded. Once the fine-tune job is detected as successful, the metrics are synced automatically to W&B. Set to True by default. |
| model_artifact_name | The name of the model artifact that is logged. Defaults to "model-metadata". |
| model_artifact_type | The type of the model artifact that is logged. Defaults to "model". |
| **kwargs_wandb_init | Any additional arguments passed directly to wandb.init() |
Dataset Versioning and Visualization
Versioning
The training and validation data that you upload to OpenAI for fine-tuning are automatically logged as W&B Artifacts for easier version control. Below is a view of the training file in Artifacts. Here you can see the W&B run that logged this file, when it was logged, which version of the dataset this is, the metadata, and the DAG lineage from the training data to the trained model.
Visualization
The datasets are visualized as W&B Tables, which allows you to explore, search, and interact with the dataset. Check out the training samples visualized using W&B Tables below.
The fine-tuned model and model versioning
OpenAI gives you an ID of the fine-tuned model. Since we don't have access to the model weights, the WandbLogger creates a model_metadata.json file with all the details (hyperparameters, data file IDs, etc.) of the model along with the `fine_tuned_model` ID, and logs it as a W&B Artifact.
This model (metadata) artifact can further be linked to a model in the W&B Model Registry.
Frequently Asked Questions
How do I share my fine-tune results with my team in W&B?
Log your fine-tune jobs to your team account with:
WandbLogger.sync(entity="YOUR_TEAM_NAME")
How can I organize my runs?
Your W&B runs are automatically organized and can be filtered/sorted based on any configuration parameter such as job type, base model, learning rate, training filename and any other hyper-parameter.
In addition, you can rename your runs, add notes or create tags to group them.
Once you're satisfied, you can save your workspace and use it to create a report, importing data from your runs and saved artifacts (training/validation files).
How can I access my fine-tuned model?
The fine-tuned model ID is logged to W&B as an artifact (model_metadata.json) as well as to the run config.
The training and validation data are logged automatically to W&B as artifacts. The metadata including the ID for the fine-tuned model is also logged as artifacts.
You can always control the pipeline using low level wandb APIs like wandb.Artifact, wandb.log, etc. This will allow complete traceability of your data and models.
“The team that has been maintaining Gym since 2021 has moved all future development to Gymnasium, a drop in replacement for Gym (import gymnasium as gym), and Gym will not be receiving any future updates.” (Source)
Since Gym is no longer an actively maintained project, try out our integration with Gymnasium.
If you’re using OpenAI Gym, Weights & Biases automatically logs videos of your environment generated by gym.wrappers.Monitor. Just set the monitor_gym keyword argument to wandb.init to True or call wandb.gym.monitor().
Our gym integration is very light. We simply look at the name of the video file being logged from gym and name it after that or fall back to "videos" if we don’t find a match. If you want more control, you can always just manually log a video.
PaddleDetection is an end-to-end object-detection development kit based on PaddlePaddle. It detects various mainstream objects, segments instances, and tracks and detects keypoints using configurable modules such as network components, data augmentations, and losses.
PaddleDetection now includes a built-in W&B integration which logs all your training and validation metrics, as well as your model checkpoints and their corresponding metadata.
The PaddleDetection WandbLogger logs your training and evaluation metrics to Weights & Biases as well as your model checkpoints while training.
Read a W&B blog post which illustrates how to integrate a YOLOX model with PaddleDetection on a subset of the COCO2017 dataset.
Use PaddleDetection with W&B
Sign up and log in to W&B
Sign up for a free Weights & Biases account, then pip install the wandb library. To log in, you'll need to be signed in to your account at www.wandb.ai. Once signed in, you will find your API key on the Authorize page.
PaddleOCR aims to create multilingual, leading, and practical OCR tools, implemented in PaddlePaddle, that help users train better models and apply them in practice. PaddleOCR supports a variety of cutting-edge OCR algorithms and develops industrial solutions. PaddleOCR now comes with a Weights & Biases integration for logging training and evaluation metrics along with model checkpoints and corresponding metadata.
Example Blog & Colab
Read here to see how to train a model with PaddleOCR on the ICDAR2015 dataset. This also comes with a Google Colab, and the corresponding live W&B dashboard is available here. There is also a Chinese version of this blog here: W&B对您的OCR模型进行训练和调试 (Train and debug your OCR model with W&B).
Use PaddleOCR with Weights & Biases
1. Sign up and Log in to wandb
Sign up for a free account, then from the command line install the wandb library in a Python 3 environment. To log in, you'll need to be signed in to your account at www.wandb.ai; then you will find your API key on the Authorize page.
pip install wandb
wandb login
!pip install wandb
wandb.login()
2. Add wandb to your config.yml file
PaddleOCR requires configuration variables to be provided using a yaml file. Adding the following snippet at the end of the configuration yaml file will automatically log all training and validation metrics to a W&B dashboard along with model checkpoints:
Global:
  use_wandb: True
Any additional, optional arguments that you might like to pass to wandb.init can also be added under the wandb header in the yaml file:
wandb:
  project: CoolOCR  # (optional) this is the wandb project name
  entity: my_team   # (optional) if you're using a wandb team, you can pass the team name here
  name: MyOCRModel  # (optional) this is the name of the wandb run
3. Pass the config.yml file to train.py
The yaml file is then provided as an argument to the training script available in the PaddleOCR repository.
python tools/train.py -c config.yml
Once you run your train.py file with Weights & Biases turned on, a link will be generated to bring you to your W&B dashboard:
Feedback or Issues?
If you have any feedback or issues about the Weights & Biases integration please open an issue on the PaddleOCR GitHub or email support@wandb.com.
29 - Prodigy
How to integrate W&B with Prodigy.
Prodigy is an annotation tool for creating training and evaluation data for machine learning models, error analysis, data inspection & cleaning. W&B Tables allow you to log, visualize, analyze, and share datasets (and more!) inside W&B.
The W&B integration with Prodigy adds simple and easy-to-use functionality to upload your Prodigy-annotated dataset directly to W&B for use with Tables.
Run a few lines of code, like these:
import wandb
from wandb.integration.prodigy import upload_dataset
with wandb.init(project="prodigy"):
upload_dataset("news_headlines_ner")
and get visual, interactive, shareable tables like this one:
Quickstart
Use wandb.integration.prodigy.upload_dataset to upload your annotated prodigy dataset directly from the local Prodigy database to W&B in our Table format. For more information on Prodigy, including installation & setup, please refer to the Prodigy documentation.
W&B will automatically try to convert images and named entity fields to wandb.Image and wandb.Html respectively. Extra columns may be added to the resulting table to include these visualizations.
PyTorch is one of the most popular frameworks for deep learning in Python, especially among researchers. W&B provides first class support for PyTorch, from logging gradients to profiling your code on the CPU and GPU.
To automatically log gradients, you can call wandb.watch and pass in your PyTorch model.
import wandb
wandb.init(config=args)
model = ...  # set up your model

# Magic
wandb.watch(model, log_freq=100)
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
    output = model(data)
    loss = F.nll_loss(output, target)
    loss.backward()
    optimizer.step()
    if batch_idx % args.log_interval == 0:
        wandb.log({"loss": loss})
If you need to track multiple models in the same script, you can call wandb.watch on each model separately. Reference documentation for this function is here.
Gradients, metrics, and the graph won’t be logged until wandb.log is called after a forward and backward pass.
Log images and media
You can pass PyTorch Tensors with image data into wandb.Image and utilities from torchvision will be used to convert them to images automatically:
images_t = ...  # generate or load images as PyTorch Tensors
wandb.log({"examples": [wandb.Image(im) for im in images_t]})
For more on logging rich media to W&B in PyTorch and other frameworks, check out our media logging guide.
If you also want to include information alongside media, like your model’s predictions or derived metrics, use a wandb.Table.
my_table = wandb.Table()
my_table.add_column("image", images_t)
my_table.add_column("label", labels)
my_table.add_column("class_prediction", predictions_t)
# Log your Table to W&B
wandb.log({"mnist_predictions": my_table})
For more on logging and visualizing datasets and models, check out our guide to W&B Tables.
Profile PyTorch code
W&B integrates directly with PyTorch Kineto’s Tensorboard plugin to provide tools for profiling PyTorch code, inspecting the details of CPU and GPU communication, and identifying bottlenecks and optimizations.
profile_dir ="path/to/run/tbprofile/"profiler = torch.profiler.profile(
schedule=schedule, # see the profiler docs for details on scheduling on_trace_ready=torch.profiler.tensorboard_trace_handler(profile_dir),
with_stack=True,
)
with profiler:
...# run the code you want to profile here# see the profiler docs for detailed usage information# create a wandb Artifactprofile_art = wandb.Artifact("trace", type="profile")
# add the pt.trace.json files to the Artifactprofile_art.add_file(glob.glob(profile_dir +".pt.trace.json"))
# log the artifactprofile_art.save()
The interactive trace viewing tool is based on the Chrome Trace Viewer, which works best with the Chrome browser.
31 - PyTorch Geometric
PyTorch Geometric or PyG is one of the most popular libraries for geometric deep learning and W&B works extremely well with it for visualizing graphs and tracking experiments.
Get started
After you have installed PyTorch Geometric, install the wandb library and log in.
pip install wandb
wandb login
!pip install wandb
import wandb
wandb.login()
Visualize the graphs
You can save details about the input graphs including number of edges, number of nodes and more. W&B supports logging plotly charts and HTML panels so any visualizations you create for your graph can then also be logged to W&B.
Use PyVis
The following snippet shows how you could do that with PyVis and HTML.
from pyvis.network import Network
from tqdm import tqdm

import wandb

wandb.init(project="graph_vis")

net = Network(height="750px", width="100%", bgcolor="#222222", font_color="white")

# Add the edges from the PyG graph to the PyVis network
for e in tqdm(g.edge_index.T):
    src = e[0].item()
    dst = e[1].item()
    net.add_node(dst)
    net.add_node(src)
    net.add_edge(src, dst, value=0.1)

# Save the PyVis visualisation to an HTML file
net.show("graph.html")
wandb.log({"eda/graph": wandb.Html("graph.html")})
wandb.finish()
Use Plotly
To use plotly to create a graph visualization, first you need to convert the PyG graph to a networkx object. Following this you will need to create Plotly scatter plots for both nodes and edges. The snippet below can be used for this task.
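A minimal sketch under stated assumptions follows: it assumes a PyG data object named g, converts it with torch_geometric.utils.to_networkx, lays it out with networkx, draws nodes and edges as simple Plotly scatter traces, and logs the figure with wandb.Plotly.

```python
import networkx as nx
import plotly.graph_objects as go
from torch_geometric.utils import to_networkx

import wandb

wandb.init(project="graph_vis")

# Convert the PyG graph `g` to a networkx object and compute a 2D layout
G = to_networkx(g)
pos = nx.spring_layout(G)

# Edge trace: one line segment per edge, separated by None
edge_x, edge_y = [], []
for src, dst in G.edges():
    edge_x += [pos[src][0], pos[dst][0], None]
    edge_y += [pos[src][1], pos[dst][1], None]
edge_trace = go.Scatter(x=edge_x, y=edge_y, mode="lines", line=dict(width=0.5))

# Node trace: one marker per node
node_x = [pos[n][0] for n in G.nodes()]
node_y = [pos[n][1] for n in G.nodes()]
node_trace = go.Scatter(x=node_x, y=node_y, mode="markers")

fig = go.Figure(data=[edge_trace, node_trace])
wandb.log({"eda/graph_plotly": wandb.Plotly(fig)})
wandb.finish()
```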
You can use W&B to track your experiments and related metrics, such as loss functions, accuracy, and more. Add the following line to your training loop:
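For example, a typical call inside the loop (the metric names are placeholders) might be:

```python
# inside your training loop, after computing the loss and accuracy
wandb.log({"train/loss": loss.item(), "train/accuracy": acc})
```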
torchtune is a PyTorch-based library designed to streamline the authoring, fine-tuning, and experimentation processes for large language models (LLMs). Additionally, torchtune has built-in support for logging with W&B, enhancing tracking and visualization of training processes.
Enable W&B logging on the recipe’s config file by modifying the metric_logger section. Change the _component_ to torchtune.utils.metric_logging.WandBLogger class. You can also pass a project name and log_every_n_steps to customize the logging behavior.
You can also pass any other kwargs as you would to the wandb.init method. For example, if you are working on a team, you can pass the entity argument to the WandBLogger class to specify the team name.
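For example, a minimal config sketch; the project and entity values are placeholders, and the exact placement of log_every_n_steps in the file is an assumption to verify against your recipe.

```yaml
# inside the recipe's YAML config file
metric_logger:
  _component_: torchtune.utils.metric_logging.WandBLogger
  project: my-torchtune-project   # placeholder project name
  entity: my-team                 # optional, e.g. when logging to a team

log_every_n_steps: 10
```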
You can explore the W&B dashboard to see the logged metrics. By default W&B logs all of the hyperparameters from the config file and the launch overrides.
W&B captures the resolved config on the Overview tab. W&B also stores the config in YAML format on the Files tab.
Logged Metrics
Each recipe has its own training loop. Check each individual recipe to see its logged metrics, which include these by default:
| Metric | Description |
| --- | --- |
| loss | The loss of the model |
| lr | The learning rate |
| tokens_per_second | The tokens per second of the model |
| grad_norm | The gradient norm of the model |
| global_step | The current step in the training loop. It takes gradient accumulation into account: each time an optimizer step is taken and the model is updated (once every gradient_accumulation_steps batches), global_step is incremented. |
global_step is not the same as the number of batches processed. It is incremented by 1 each time an optimizer step is taken. For example, if the dataloader has 10 batches, gradient accumulation steps is 2, and training runs for 3 epochs, the optimizer steps 15 times; in this case global_step ranges from 1 to 15.
The streamlined design of torchtune makes it easy to add custom metrics or modify existing ones. It suffices to modify the corresponding recipe file; for example, you could log current_epoch as a percentage of the total number of epochs as follows:
# inside the `train.py` function in the recipe file
self._metric_logger.log_dict(
    {"current_epoch": self.epochs * self.global_step / self._steps_per_epoch},
    step=self.global_step,
)
This is a fast evolving library, the current metrics are subject to change. If you want to add a custom metric, you should modify the recipe and call the corresponding self._metric_logger.* function.
Save and load checkpoints
The torchtune library supports various checkpoint formats. Depending on the origin of the model you are using, you should switch to the appropriate checkpointer class.
If you want to save the model checkpoints to W&B Artifacts, the simplest solution is to override the save_checkpoint functions inside the corresponding recipe.
Here is an example of how you can override the save_checkpoint function to save the model checkpoints to W&B Artifacts.
def save_checkpoint(self, epoch: int) -> None:
    ...
    ## Let's save the checkpoint to W&B
    ## depending on the Checkpointer Class the file will be named differently
    ## Here is an example for the full_finetune case
    checkpoint_file = Path.joinpath(
        self._checkpointer._output_dir, f"torchtune_model_{epoch}"
    ).with_suffix(".pt")

    wandb_artifact = wandb.Artifact(
        name=f"torchtune_model_{epoch}",
        type="model",
        # description of the model checkpoint
        description="Model checkpoint",
        # you can add whatever metadata you want as a dict
        metadata={
            utils.SEED_KEY: self.seed,
            utils.EPOCHS_KEY: self.epochs_run,
            utils.TOTAL_EPOCHS_KEY: self.total_epochs,
            utils.MAX_STEPS_KEY: self.max_steps_per_epoch,
        },
    )
    wandb_artifact.add_file(checkpoint_file)
    wandb.log_artifact(wandb_artifact)
Ignite supports a Weights & Biases handler to log metrics, model/optimizer parameters, and gradients during training and validation. It can also be used to log model checkpoints to the Weights & Biases cloud. This class is also a wrapper for the wandb module, which means you can call any wandb function using this handler. See the examples for how to save model parameters and gradients.
Basic setup
from argparse import ArgumentParser
import wandb
import torch
from torch import nn
from torch.optim import SGD
from torch.utils.data import DataLoader
import torch.nn.functional as F
from torchvision.transforms import Compose, ToTensor, Normalize
from torchvision.datasets import MNIST
from ignite.engine import Events, create_supervised_trainer, create_supervised_evaluator
from ignite.metrics import Accuracy, Loss
from tqdm import tqdm
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=-1)


def get_data_loaders(train_batch_size, val_batch_size):
    data_transform = Compose([ToTensor(), Normalize((0.1307,), (0.3081,))])

    train_loader = DataLoader(
        MNIST(download=True, root=".", transform=data_transform, train=True),
        batch_size=train_batch_size,
        shuffle=True,
    )
    val_loader = DataLoader(
        MNIST(download=False, root=".", transform=data_transform, train=False),
        batch_size=val_batch_size,
        shuffle=False,
    )
    return train_loader, val_loader
Using WandBLogger in ignite is a modular process. First, you create a WandBLogger object. Next, you attach it to a trainer or evaluator to automatically log the metrics. This example:
Logs training loss, attached to the trainer object.
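A minimal sketch of that pattern is shown below. It assumes the WandBLogger from ignite.contrib.handlers.wandb_logger and trainer/evaluator objects created with create_supervised_trainer and create_supervised_evaluator; the project name, config values, and metric names are placeholders.

```python
from ignite.contrib.handlers.wandb_logger import WandBLogger

# 1. Create the WandBLogger (keyword arguments are forwarded to wandb.init)
wandb_logger = WandBLogger(
    project="pytorch-ignite-integration",  # placeholder project name
    config={"max_epochs": 10, "train_batch_size": 64, "val_batch_size": 1000},
)

# 2. Attach it to the trainer to log the training loss on every iteration
wandb_logger.attach_output_handler(
    trainer,
    event_name=Events.ITERATION_COMPLETED,
    tag="training",
    output_transform=lambda loss: {"loss": loss},
)

# 3. Attach it to the evaluator to log validation metrics after each epoch,
#    using the trainer's iteration count as the global step
wandb_logger.attach_output_handler(
    evaluator,
    event_name=Events.EPOCH_COMPLETED,
    tag="validation",
    metric_names=["accuracy", "loss"],
    global_step_transform=lambda *_: trainer.state.iteration,
)
```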
PyTorch Lightning provides a lightweight wrapper for organizing your PyTorch code and easily adding advanced features such as distributed training and 16-bit precision. W&B provides a lightweight wrapper for logging your ML experiments. But you don’t need to combine the two yourself: Weights & Biases is incorporated directly into the PyTorch Lightning library via the WandbLogger.
Integrate with Lightning
from lightning.pytorch.loggers import WandbLogger
from lightning.pytorch import Trainer
wandb_logger = WandbLogger(log_model="all")
trainer = Trainer(logger=wandb_logger)
Using wandb.log(): The WandbLogger logs to W&B using the Trainer’s global_step. If you make additional calls to wandb.log directly in your code, do not use the step argument in wandb.log().
Instead, log the Trainer’s global_step like your other metrics:
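A minimal illustration (the metric name is a placeholder):

```python
# log extra values without passing `step`, and include the Trainer's
# global_step as just another metric
wandb.log({"my_custom_metric": value, "trainer/global_step": trainer.global_step})
```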
# add one parameter
wandb_logger.experiment.config["key"] = value

# add multiple parameters
wandb_logger.experiment.config.update({key1: val1, key2: val2})

# use the wandb module directly
wandb.config["key"] = value
wandb.config.update()
Log gradients, parameter histogram and model topology
You can pass your model object to wandb_logger.watch() to monitor your model's gradients and parameters as you train. See the PyTorch Lightning WandbLogger documentation.
Log metrics
You can log your metrics to W&B when using the WandbLogger by calling self.log('my_metric_name', metric_value) within your LightningModule, such as in your training_step or validation_step methods.
The code snippet below shows how to define your LightningModule to log your metrics and your LightningModule hyperparameters. This example uses the torchmetrics library to calculate the metrics.
import torch
from torch.nn import Linear, CrossEntropyLoss, functional as F
from torch.optim import Adam
from torchmetrics.functional import accuracy
from lightning.pytorch import LightningModule
class My_LitModule(LightningModule):
    def __init__(self, n_classes=10, n_layer_1=128, n_layer_2=256, lr=1e-3):
        """method used to define the model parameters"""
        super().__init__()

        # mnist images are (1, 28, 28) (channels, width, height)
        self.layer_1 = Linear(28 * 28, n_layer_1)
        self.layer_2 = Linear(n_layer_1, n_layer_2)
        self.layer_3 = Linear(n_layer_2, n_classes)

        self.loss = CrossEntropyLoss()
        self.lr = lr

        # save hyper-parameters to self.hparams (auto-logged by W&B)
        self.save_hyperparameters()

    def forward(self, x):
        """method used for inference input -> output"""
        # (b, 1, 28, 28) -> (b, 1*28*28)
        batch_size, channels, width, height = x.size()
        x = x.view(batch_size, -1)

        # let's do 3 x (linear + relu)
        x = F.relu(self.layer_1(x))
        x = F.relu(self.layer_2(x))
        x = self.layer_3(x)
        return x

    def training_step(self, batch, batch_idx):
        """needs to return a loss from a single batch"""
        _, loss, acc = self._get_preds_loss_accuracy(batch)

        # Log loss and metric
        self.log("train_loss", loss)
        self.log("train_accuracy", acc)
        return loss

    def validation_step(self, batch, batch_idx):
        """used for logging metrics"""
        preds, loss, acc = self._get_preds_loss_accuracy(batch)

        # Log loss and metric
        self.log("val_loss", loss)
        self.log("val_accuracy", acc)
        return preds

    def configure_optimizers(self):
        """defines model optimizer"""
        return Adam(self.parameters(), lr=self.lr)

    def _get_preds_loss_accuracy(self, batch):
        """convenience function since train/valid/test steps are similar"""
        x, y = batch
        logits = self(x)
        preds = torch.argmax(logits, dim=1)
        loss = self.loss(logits, y)
        acc = accuracy(preds, y)
        return preds, loss, acc
import lightning as L
import torch
import torchvision as tv
from wandb.integration.lightning.fabric import WandbLogger
import wandb
wandb_logger = WandbLogger()

fabric = L.Fabric(loggers=[wandb_logger])
fabric.launch()
model = tv.models.resnet18()
optimizer = torch.optim.SGD(model.parameters(), lr=lr)
model, optimizer = fabric.setup(model, optimizer)
train_dataloader = fabric.setup_dataloaders(
torch.utils.data.DataLoader(train_dataset, batch_size=batch_size)
)
model.train()
for epoch in range(num_epochs):
    for batch in train_dataloader:
        optimizer.zero_grad()
        loss = model(batch)
        loss.backward()
        optimizer.step()
        fabric.log_dict({"loss": loss})
Log the min/max of a metric
Using wandb's define_metric function, you can define whether you'd like your W&B summary metric to display the min, max, mean, or best value for that metric. If define_metric isn't used, then the last value logged will appear in your summary metrics. See the define_metric reference docs here and the guide here for more.
To tell W&B to keep track of the max validation accuracy in the W&B summary metric, call wandb.define_metric only once, at the beginning of training:
class My_LitModule(LightningModule):
    ...

    def validation_step(self, batch, batch_idx):
        if trainer.global_step == 0:
            wandb.define_metric("val_accuracy", summary="max")

        preds, loss, acc = self._get_preds_loss_accuracy(batch)

        # Log loss and metric
        self.log("val_loss", loss)
        self.log("val_accuracy", acc)
        return preds
The latest and best aliases are automatically set to easily retrieve a model checkpoint from a W&B Artifact:
# reference can be retrieved in the artifacts panel
# "VERSION" can be a version (ex: "v2") or an alias ("latest" or "best")
checkpoint_reference = "USER/PROJECT/MODEL-RUN_ID:VERSION"

# download the checkpoint locally (if not already cached)
artifact_dir = wandb_logger.download_artifact(checkpoint_reference, artifact_type="model")

# request the raw checkpoint
full_checkpoint = fabric.load(Path(artifact_dir) / "model.ckpt")

model.load_state_dict(full_checkpoint["model"])
optimizer.load_state_dict(full_checkpoint["optimizer"])
The model checkpoints you log are viewable through the W&B Artifacts UI, and include the full model lineage (see an example model checkpoint in the UI here).
To bookmark your best model checkpoints and centralize them across your team, you can link them to the W&B Model Registry.
Here you can organize your best models by task, manage the model lifecycle, facilitate easy tracking and auditing throughout the ML lifecycle, and automate downstream actions with webhooks or jobs.
Log images, text, and more
The WandbLogger has log_image, log_text and log_table methods for logging media.
You can also directly call wandb.log or trainer.logger.experiment.log to log other media types such as Audio, Molecules, Point Clouds, 3D Objects and more.
# using tensors, numpy arrays or PIL images
wandb_logger.log_image(key="samples", images=[img1, img2])

# adding captions
wandb_logger.log_image(key="samples", images=[img1, img2], caption=["tree", "person"])

# using file paths
wandb_logger.log_image(key="samples", images=["img_1.jpg", "img_2.jpg"])

# using .log in the trainer
trainer.logger.experiment.log(
    {"samples": [wandb.Image(img, caption=caption) for (img, caption) in my_images]},
    step=current_trainer_global_step,
)

# data should be a list of lists
columns = ["input", "label", "prediction"]
my_data = [["cheese", "english", "english"], ["fromage", "french", "spanish"]]

# using columns and data
wandb_logger.log_text(key="my_samples", columns=columns, data=my_data)

# using a pandas DataFrame
wandb_logger.log_text(key="my_samples", dataframe=my_dataframe)

# log a W&B Table that has a text caption, an image and audio
columns = ["caption", "image", "sound"]

# data should be a list of lists
my_data = [
    ["cheese", wandb.Image(img_1), wandb.Audio(snd_1)],
    ["wine", wandb.Image(img_2), wandb.Audio(snd_2)],
]

# log the Table
wandb_logger.log_table(key="my_samples", columns=columns, data=my_data)
You can use Lightning’s Callbacks system to control when you log to Weights & Biases via the WandbLogger, in this example we log a sample of our validation images and predictions:
import torch
import wandb
import lightning.pytorch as pl
from lightning.pytorch.loggers import WandbLogger
# or
# from wandb.integration.lightning.fabric import WandbLogger
from lightning.pytorch.callbacks import Callback


class LogPredictionSamplesCallback(Callback):
    def on_validation_batch_end(
        self, trainer, pl_module, outputs, batch, batch_idx, dataloader_idx
    ):
        """Called when the validation batch ends."""

        # `outputs` comes from `LightningModule.validation_step`,
        # which corresponds to our model predictions in this case

        # Let's log 20 sample image predictions from the first batch
        if batch_idx == 0:
            n = 20
            x, y = batch
            images = [img for img in x[:n]]
            captions = [
                f"Ground Truth: {y_i} - Prediction: {y_pred}"
                for y_i, y_pred in zip(y[:n], outputs[:n])
            ]

            # Option 1: log images with `WandbLogger.log_image`
            wandb_logger.log_image(key="sample_images", images=images, caption=captions)

            # Option 2: log images and predictions as a W&B Table
            columns = ["image", "ground truth", "prediction"]
            data = [
                [wandb.Image(x_i), y_i, y_pred]
                for x_i, y_i, y_pred in list(zip(x[:n], y[:n], outputs[:n]))
            ]
            wandb_logger.log_table(key="sample_table", columns=columns, data=data)


trainer = pl.Trainer(callbacks=[LogPredictionSamplesCallback()])
Use multiple GPUs with Lightning and W&B
PyTorch Lightning has multi-GPU support through its DDP interface. However, PyTorch Lightning's design requires you to be careful about how you instantiate your GPUs.
Lightning assumes that each GPU (or rank) in your training loop must be instantiated in exactly the same way, with the same initial conditions. However, only the rank-0 process gets access to the wandb.run object; for non-zero rank processes, wandb.run is None. This could cause your non-zero rank processes to fail. Such a situation can put you in a deadlock, because the rank-0 process will wait for the non-zero rank processes to join, which have already crashed.
For this reason, be careful about how you set up your training code. The recommended way is to make your code independent of the wandb.run object.
class MNISTClassifier(pl.LightningModule):
    def __init__(self):
        super(MNISTClassifier, self).__init__()

        self.model = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 128),
            nn.ReLU(),
            nn.Linear(128, 10),
        )

        self.loss = nn.CrossEntropyLoss()

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.forward(x)
        loss = self.loss(y_hat, y)

        self.log("train/loss", loss)
        return {"train_loss": loss}

    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.forward(x)
        loss = self.loss(y_hat, y)

        self.log("val/loss", loss)
        return {"val_loss": loss}

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.001)


def main():
    # Setting all the random seeds to the same value.
    # This is important in a distributed training setting.
    # Each rank will get its own set of initial weights.
    # If they don't match up, the gradients will not match either,
    # leading to training that may not converge.
    pl.seed_everything(1)

    train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=4)
    val_loader = DataLoader(val_dataset, batch_size=64, shuffle=False, num_workers=4)

    model = MNISTClassifier()
    wandb_logger = WandbLogger(project="<project_name>")
    callbacks = [
        ModelCheckpoint(
            dirpath="checkpoints",
            every_n_train_steps=100,
        ),
    ]
    trainer = pl.Trainer(
        max_epochs=3, gpus=2, logger=wandb_logger, strategy="ddp", callbacks=callbacks
    )
    trainer.fit(model, train_loader, val_loader)
Examples
You can follow along in a video tutorial with a Colab here.
Frequently Asked Questions
How does W&B integrate with Lightning?
The core integration is based on the Lightning loggers API, which lets you write much of your logging code in a framework-agnostic way. Loggers are passed to the Lightning Trainer and are triggered based on that API’s rich hook-and-callback system. This keeps your research code well-separated from engineering and logging code.
What does the integration log without any additional code?
We’ll save your model checkpoints to W&B, where you can view them or download them for use in future runs. We’ll also capture system metrics, like GPU usage and network I/O, environment information, like hardware and OS information, code state (including git commit and diff patch, notebook contents and session history), and anything printed to the standard out.
What if I need to use wandb.run in my training setup?
You need to expand the scope of the variable you need to access yourself. In other words, make sure that the initial conditions are the same on all processes.
if os.environ.get("LOCAL_RANK", None) isNone:
os.environ["WANDB_DIR"] = wandb.run.dir
If they are, you can use os.environ["WANDB_DIR"] to set up the model checkpoints directory. This way, any non-zero rank process can access wandb.run.dir.
35 - Ray Tune
How to integrate W&B with Ray Tune.
W&B integrates with Ray by offering two lightweight integrations.
The WandbLoggerCallback automatically logs metrics reported to Tune to the Wandb API.
The setup_wandb() function, which can be used with the function API, automatically initializes the Wandb API with Tune's training information. You can use the Wandb API as usual, such as by using wandb.log() to log your training process.
Configure the integration
from ray.air.integrations.wandb import WandbLoggerCallback
Wandb configuration is done by passing a wandb key to the config parameter of tune.run() (see example below).
The content of the wandb config entry is passed to wandb.init() as keyword arguments. The exceptions are the following settings, which are used to configure the WandbLoggerCallback itself:
Parameters
project (str): Name of the Wandb project. Mandatory.
api_key_file (str): Path to file containing the Wandb API KEY.
api_key (str): Wandb API Key. Alternative to setting api_key_file.
excludes (list): List of metrics to exclude from the log.
log_config (bool): Whether to log the config parameter of the results dictionary. Defaults to False.
upload_checkpoints (bool): If True, model checkpoints are uploaded as artifacts. Defaults to False.
Example
from ray import tune, train
from ray.air.integrations.wandb import WandbLoggerCallback
def train_fc(config):
    for i in range(10):
        train.report({"mean_accuracy": (i + config["alpha"]) / 10})
tuner = tune.Tuner(
    train_fc,
    param_space={
        "alpha": tune.grid_search([0.1, 0.2, 0.3]),
        "beta": tune.uniform(0.5, 1.0),
    },
    run_config=train.RunConfig(
        callbacks=[
            WandbLoggerCallback(
                project="<your-project>", api_key="<your-api-key>", log_config=True
            )
        ]
    ),
)
results = tuner.fit()
setup_wandb
from ray.air.integrations.wandb import setup_wandb
This utility function helps initialize Wandb for use with Ray Tune. For basic usage, call setup_wandb() in your training function:
from ray.air.integrations.wandb import setup_wandb
def train_fn(config):
    # Initialize wandb
    wandb = setup_wandb(config)

    for i in range(10):
        loss = config["a"] + config["b"]
        wandb.log({"loss": loss})
        tune.report(loss=loss)
tuner = tune.Tuner(
    train_fn,
    param_space={
        # define search space here
        "a": tune.choice([1, 2, 3]),
        "b": tune.choice([4, 5, 6]),
        # wandb configuration
        "wandb": {"project": "Optimization_Project", "api_key_file": "/path/to/file"},
    },
)
results = tuner.fit()
Example Code
We’ve created a few examples for you to see how the integration works:
Dashboard: View dashboard generated from the example.
36 - SageMaker
How to integrate W&B with Amazon SageMaker.
W&B integrates with Amazon SageMaker, automatically reading hyperparameters, grouping distributed runs, and resuming runs from checkpoints.
Authentication
W&B looks for a file named secrets.env relative to the training script and loads it into the environment when wandb.init() is called. You can generate a secrets.env file by calling wandb.sagemaker_auth(path="source_dir") in the script you use to launch your experiments. Be sure to add this file to your .gitignore!
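A minimal sketch of that launcher-side call, with a placeholder source directory and the estimator setup left out:

```python
import wandb

# Writes a secrets.env file into source_dir so the training job can authenticate
wandb.sagemaker_auth(path="source_dir")

# ...then create and fit your SageMaker estimator as usual,
# pointing it at the same source_dir
```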
Existing estimators
If you're using one of SageMaker's preconfigured estimators, you need to add a requirements.txt to your source directory that includes wandb:
wandb
If you’re using an estimator that’s running Python 2, you’ll need to install psutil directly from this wheel before installing wandb:
Review a complete example on GitHub, and read more on our blog.
You can also read the tutorial on deploying a sentiment analyzer using SageMaker and W&B.
The W&B sweep agent will only behave as expected in a SageMaker job if our SageMaker integration is turned off. You can turn off the SageMaker integration in your runs by modifying your invocation of wandb.init as follows:
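A minimal sketch of that call, assuming the sagemaker_disable field on wandb.Settings (verify the setting name against your installed wandb version):

import wandb

# Turn off the automatic SageMaker integration for this run (field name assumed)
wandb.init(settings=wandb.Settings(sagemaker_disable=True))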
37 - Scikit-Learn
If you are using Weights and Biases for the first time, check out a quickstart
pip install wandb
wandb login
!pip install wandb
wandb.login()
Log metrics
import wandb
wandb.init(project="visualize-sklearn")
y_pred = clf.predict(X_test)
accuracy = sklearn.metrics.accuracy_score(y_true, y_pred)
# If logging metrics over time, then use wandb.log
wandb.log({"accuracy": accuracy})

# OR to log a final metric at the end of training, you can also use wandb.summary
wandb.summary["accuracy"] = accuracy
After training a model and making predictions, you can then generate plots in wandb to analyze your predictions. See the Supported plots section below for a full list of supported charts.
# Visualize single plot
wandb.sklearn.plot_confusion_matrix(y_true, y_pred, labels)
All plots
W&B has functions such as plot_classifier that generate several relevant plots at once:
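A minimal sketch of the all-in-one classifier call, assuming a fitted classifier clf, the usual train/test splits, and predicted labels and probabilities (model_name and feature_names are illustrative):

wandb.sklearn.plot_classifier(
    clf,
    X_train, X_test,
    y_train, y_test,
    y_pred, y_probas,
    labels,
    model_name="SVC",
    feature_names=None,
)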
Plots created with Matplotlib can also be logged to the W&B dashboard. To do that, first install plotly:
pip install plotly
Finally, the plots can be logged on W&B’s dashboard as follows:
import matplotlib.pyplot as plt
import wandb
wandb.init(project="visualize-sklearn")
# do all the plt.plot(), plt.scatter(), etc. here.
# ...

# instead of doing plt.show() do:
wandb.log({"plot": plt})
Supported plots
Learning curve
Trains the model on datasets of varying lengths and generates a plot of cross-validated scores vs. dataset size, for both training and test sets.
wandb.sklearn.plot_learning_curve(model, X, y)
model (clf or reg): Takes in a fitted regressor or classifier.
X (arr): Dataset features.
y (arr): Dataset labels.
ROC
ROC curves plot true positive rate (y-axis) vs false positive rate (x-axis). The ideal score is a TPR = 1 and FPR = 0, which is the point on the top left. Typically we calculate the area under the ROC curve (AUC-ROC), and the greater the AUC-ROC the better.
wandb.sklearn.plot_roc(y_true, y_probas, labels)
y_true (arr): Test set labels.
y_probas (arr): Test set predicted probabilities.
labels (list): Named labels for target variable (y).
Class proportions
Plots the distribution of target classes in training and test sets. Useful for detecting imbalanced classes and ensuring that one class doesn’t have a disproportionate influence on the model.
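The corresponding call, assuming the standard wandb.sklearn signature for this plot:

wandb.sklearn.plot_class_proportions(y_train, y_test, labels)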
labels (list): Named labels for target variable (y).
Precision recall curve
Computes the tradeoff between precision and recall for different thresholds. A high area under the curve represents both high recall and high precision, where high precision relates to a low false positive rate, and high recall relates to a low false negative rate.
High scores for both show that the classifier is returning accurate results (high precision), as well as returning a majority of all positive results (high recall). PR curve is useful when the classes are very imbalanced.
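Assuming the usual wandb.sklearn signature, the call looks like:

wandb.sklearn.plot_precision_recall(y_true, y_probas, labels)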
labels (list): Named labels for target variable (y).
Feature importances
Evaluates and plots the importance of each feature for the classification task. Only works with classifiers that have a feature_importances_ attribute, like trees.
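The call, with the fitted model and feature names as in the parameter list below:

wandb.sklearn.plot_feature_importances(model, feature_names)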
feature_names (list): Names for features. Makes plots easier to read by replacing feature indexes with corresponding names.
Calibration curve
Plots how well calibrated the predicted probabilities of a classifier are, and how to calibrate an uncalibrated classifier. Compares the estimated predicted probabilities of a baseline logistic regression model, of the model passed as an argument, and of both its isotonic and sigmoid calibrations.
The closer the calibration curves are to a diagonal, the better. A transposed sigmoid-like curve represents an overfitted classifier, while a sigmoid-like curve represents an underfitted classifier. By training isotonic and sigmoid calibrations of the model and comparing their curves, we can figure out whether the model is over- or under-fitting and, if so, which calibration (sigmoid or isotonic) might help fix this.
wandb.sklearn.plot_calibration_curve(clf, X, y, 'RandomForestClassifier')
model (clf): Takes in a fitted classifier.
X (arr): Training set features.
y (arr): Training set labels.
model_name (str): Model name. Defaults to ‘Classifier’
Confusion matrix
Computes the confusion matrix to evaluate the accuracy of a classification. It’s useful for assessing the quality of model predictions and finding patterns in the predictions the model gets wrong. The diagonal represents the predictions the model got right, such as where the actual label is equal to the predicted label.
model (clf or reg): Takes in a fitted regressor or classifier.
X (arr): Training set features.
y (arr): Training set labels.
X_test (arr): Test set features.
y_test (arr): Test set labels.
Elbow plot
Measures and plots the percentage of variance explained as a function of the number of clusters, along with training times. Useful in picking the optimal number of clusters.
wandb.sklearn.plot_elbow_curve(model, X_train)
model (clusterer): Takes in a fitted clusterer.
X (arr): Training set features.
Silhouette plot
Measures & plots how close each point in one cluster is to points in the neighboring clusters. The thickness of the clusters corresponds to the cluster size. The vertical line represents the average silhouette score of all the points.
Silhouette coefficients near +1 indicate that the sample is far away from the neighboring clusters. A value of 0 indicates that the sample is on or very close to the decision boundary between two neighboring clusters and negative values indicate that those samples might have been assigned to the wrong cluster.
In general we want all silhouette cluster scores to be above average (past the red line) and as close to 1 as possible. We also prefer cluster sizes that reflect the underlying patterns in the data.
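A sketch of the call, with a fitted clusterer and the cluster labels described below:

wandb.sklearn.plot_silhouette(model, X_train, cluster_labels)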
cluster_labels (list): Names for cluster labels. Makes plots easier to read by replacing cluster indexes with corresponding names.
Outlier candidates plot
Measures a datapoint's influence on a regression model via Cook's distance. Instances with heavily skewed influences could potentially be outliers. Useful for outlier detection.
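Assuming the standard wandb.sklearn signature for this plot, the call is:

wandb.sklearn.plot_outlier_candidates(model, X, y)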
Residuals plot
Measures and plots the predicted target values (y-axis) vs. the difference between actual and predicted target values (x-axis), as well as the distribution of the residual error.
Generally, the residuals of a well-fit model should be randomly distributed because good models will account for most phenomena in a data set, except for random error.
wandb.sklearn.plot_residuals(model, X, y)
model (regressor): Takes in a fitted regressor.
X (arr): Training set features.
y (arr): Training set labels.
If you have any questions, we'd love to answer them in our Slack community.
Example
Run in colab: A simple notebook to get you started
38 - Simple Transformers
How to integrate W&B with the Transformers library by Hugging Face.
This library is based on the Transformers library by Hugging Face. Simple Transformers lets you quickly train and evaluate Transformer models. Only 3 lines of code are needed to initialize, train, and evaluate a model. It supports Sequence Classification, Token Classification (NER), Question Answering, Language Model Fine-Tuning, Language Model Training, Language Generation, T5 Model, Seq2Seq Tasks, Multi-Modal Classification, and Conversational AI.
To use Weights & Biases for visualizing model training, set a project name for W&B in the wandb_project attribute of the args dictionary. This logs all hyperparameter values, training losses, and evaluation metrics to the given project.
model = ClassificationModel('roberta', 'roberta-base', args={'wandb_project': 'project-name'})
Any additional arguments that go into wandb.init can be passed as wandb_kwargs.
Structure
The library is designed to have a separate class for every NLP task. The classes that provide similar functionality are grouped together.
simpletransformers.classification - Includes all Classification models.
ClassificationModel
MultiLabelClassificationModel
simpletransformers.ner - Includes all Named Entity Recognition models.
NERModel
simpletransformers.question_answering - Includes all Question Answering models.
QuestionAnsweringModel
Here are some minimal examples
MultiLabel Classification
from simpletransformers.classification import MultiLabelClassificationModel

model = MultiLabelClassificationModel(
    "distilbert",
    "distilbert-base-uncased",
    num_labels=6,
    args={"reprocess_input_data": True, "overwrite_output_dir": True,
          "num_train_epochs": epochs, "learning_rate": learning_rate,
          "wandb_project": "simpletransformers"},
)
# Train the model
model.train_model(train_df)
# Evaluate the model
result, model_outputs, wrong_predictions = model.eval_model(eval_df)
SimpleTransformers provides classes as well as training scripts for all common natural language tasks. Here is the complete list of global arguments that are supported by the library, with their default arguments.
39 - Skorch
You can use Weights & Biases with Skorch to automatically log the model with the best performance, along with all model performance metrics, the model topology, and compute resources after each epoch. Every file saved in wandb_run.dir is automatically logged to W&B servers.
save_model
bool (default=True)
Whether to save a checkpoint of the best model and upload it to your Run on W&B servers.
keys_ignored
str or list of str (default=None)
Key or list of keys that should not be logged to wandb. Note that in addition to the keys provided by the user, keys such as those starting with event_ or ending on _best are ignored by default.
Example Code
We’ve created a few examples for you to see how the integration works:
# Install wandb
# ... pip install wandb

import wandb
from skorch.callbacks import WandbLogger

# Create a wandb Run
wandb_run = wandb.init()
# Alternative: Create a wandb Run without a W&B account
wandb_run = wandb.init(anonymous="allow")

# Log hyper-parameters (optional)
wandb_run.config.update({"learning rate": 1e-3, "batch size": 32})

net = NeuralNet(..., callbacks=[WandbLogger(wandb_run)])
net.fit(X, y)
Method reference
initialize(): (Re-)set the initial state of the callback.
on_batch_begin(net[, X, y, training]): Called at the beginning of each batch.
on_batch_end(net[, X, y, training]): Called at the end of each batch.
on_epoch_begin(net[, dataset_train, …]): Called at the beginning of each epoch.
on_epoch_end(net, **kwargs): Log values from the last history step and save the best model.
on_grad_computed(net, named_parameters[, X, …]): Called once per batch after gradients have been computed but before an update step was performed.
on_train_begin(net, **kwargs): Log the model topology and add a hook for gradients.
on_train_end(net[, X, y]): Called at the end of training.
40 - spaCy
spaCy is a popular “industrial-strength” NLP library: fast, accurate models with a minimum of fuss. As of spaCy v3, Weights and Biases can now be used with spacy train to track your spaCy model’s training metrics as well as to save and version your models and datasets. And all it takes is a few added lines in your configuration.
1. Install the wandb library and log in
pip install wandb
wandb login
!pip install wandb
import wandb
wandb.login()
2. Add the WandbLogger to your spaCy config file
spaCy config files are used to specify all aspects of training, not just logging – GPU allocation, optimizer choice, dataset paths, and more. Minimally, under [training.logger] you need to provide the key @loggers with the value "spacy.WandbLogger.v3", plus a project_name.
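A minimal [training.logger] block might look like the following (the project name and the remove_config_values entries are placeholders to adapt to your own config):

[training.logger]
@loggers = "spacy.WandbLogger.v3"
project_name = "my_spacy_project"
remove_config_values = ["paths.train", "paths.dev", "corpora.train.path", "corpora.dev.path"]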
For more on how spaCy training config files work and on other options you can pass in to customize training, check out spaCy’s documentation.
project_name
str. The name of the W&B Project. The project will be created automatically if it doesn't exist yet.
remove_config_values
List[str]. A list of values to exclude from the config before it is uploaded to W&B. [] by default.
model_log_interval
Optional int. None by default. If set, model versioning with Artifacts will be enabled. Pass in the number of steps to wait between logging model checkpoints.
log_dataset_dir
Optional str. If passed a path, the dataset will be uploaded as an Artifact at the beginning of training. None by default.
entity
Optional str. If passed, the run will be created in the specified entity.
run_name
Optional str. If specified, the run will be created with the specified name.
3. Start training
Once you have added the WandbLogger to your spaCy training config, you can run spacy train as usual.
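For example (config and corpus paths are placeholders):

python -m spacy train config.cfg --output ./output --paths.train ./train.spacy --paths.dev ./dev.spacy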
When training begins, a link to your training run's W&B page will be output, which will take you to this run's experiment tracking dashboard in the Weights & Biases web UI.
41 - Stable Baselines 3
How to integrate W&B with Stable Baselines 3.
Stable Baselines 3 (SB3) is a set of reliable implementations of reinforcement learning algorithms in PyTorch. W&B’s SB3 integration:
Records metrics such as losses and episodic returns.
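A minimal sketch of wiring SB3 to W&B with the wandb.integration.sb3.WandbCallback, assuming a CartPole-v1 environment and PPO (project name, timestep count, and save paths are illustrative):

import wandb
from wandb.integration.sb3 import WandbCallback
from stable_baselines3 import PPO

config = {"policy_type": "MlpPolicy", "total_timesteps": 25_000, "env_name": "CartPole-v1"}

# sync_tensorboard=True forwards the TensorBoard metrics SB3 writes to W&B
run = wandb.init(project="sb3", config=config, sync_tensorboard=True)

model = PPO(config["policy_type"], config["env_name"], verbose=1, tensorboard_log=f"runs/{run.id}")
model.learn(
    total_timesteps=config["total_timesteps"],
    callback=WandbCallback(model_save_path=f"models/{run.id}", verbose=2),
)
run.finish()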
42 - TensorBoard
W&B supports embedded TensorBoard for W&B Multi-tenant SaaS.
Upload your TensorBoard logs to the cloud, quickly share your results among colleagues and classmates, and keep your analysis in one centralized location.
Get started
import wandb
# Start a wandb run with `sync_tensorboard=True`
wandb.init(project="my-project", sync_tensorboard=True)

# Your training code using TensorBoard
# ...

# [Optional] Finish the wandb run to upload the TensorBoard logs to W&B (if running in a notebook)
wandb.finish()
Once your run finishes, you can access your TensorBoard event files in W&B and you can visualize your metrics in native W&B charts, together with additional useful information like the system’s CPU or GPU utilization, the git state, the terminal command the run used, and more.
W&B supports TensorBoard with all versions of TensorFlow. W&B also supports TensorBoard 1.14 and higher with PyTorch as well as TensorBoardX.
Frequently asked questions
How can I log metrics to W&B that aren’t logged to TensorBoard?
If you need to log additional custom metrics that aren't being logged to TensorBoard, you can call wandb.log in your code: wandb.log({"custom": 0.8})
Setting the step argument in wandb.log is turned off when syncing TensorBoard. If you'd like to set a different step count, you can log the metrics with a step metric, as shown below:
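For example, where global_step stands in for your own step counter (the variable name is illustrative):

wandb.log({"custom": 0.8, "global_step": global_step})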
How do I configure Tensorboard when I’m using it with wandb?
If you want more control over how TensorBoard is patched you can call wandb.tensorboard.patch instead of passing sync_tensorboard=True to wandb.init.
import wandb

wandb.tensorboard.patch(root_logdir="<logging_directory>")
wandb.init()

# Finish the wandb run to upload the TensorBoard logs to W&B (if running in a notebook)
wandb.finish()
You can pass tensorboard_x=False to this method to ensure vanilla TensorBoard is patched. If you're using TensorBoard > 1.14 with PyTorch, you can pass pytorch=True to ensure it's patched. Both of these options have smart defaults depending on what versions of these libraries have been imported.
By default, we also sync the tfevents files and any .pbtxt files. This enables us to launch a TensorBoard instance on your behalf. You will see a TensorBoard tab on the run page. This behavior can be turned off by passing save=False to wandb.tensorboard.patch:
import wandb

wandb.init()
wandb.tensorboard.patch(save=False, tensorboard_x=True)

# If running in a notebook, finish the wandb run to upload the TensorBoard logs to W&B
wandb.finish()
You must call either wandb.init or wandb.tensorboard.patch before calling tf.summary.create_file_writer or constructing a SummaryWriter via torch.utils.tensorboard.
How do I sync historical TensorBoard runs?
If you have existing tfevents files stored locally and you would like to import them into W&B, you can run wandb sync log_dir, where log_dir is a local directory containing the tfevents files.
How do I use Google Colab or Jupyter with TensorBoard?
If running your code in a Jupyter or Colab notebook, make sure to call wandb.finish() at the end of your training. This will finish the wandb run and upload the TensorBoard logs to W&B so they can be visualized. This is not necessary when running a .py script, as wandb finishes automatically when the script finishes.
To run shell commands in a notebook environment, you must prepend a !, as in !wandb sync directoryname.
How do I use PyTorch with TensorBoard?
If you use PyTorch’s TensorBoard integration, you may need to manually upload the PyTorch Profiler JSON file.
43 - TensorFlow
If you're already using TensorBoard, it's easy to integrate with wandb.
import tensorflow as tf
import wandb
wandb.init(config=tf.flags.FLAGS, sync_tensorboard=True)
Log custom metrics
If you need to log additional custom metrics that aren't being logged to TensorBoard, you can call wandb.log in your code: wandb.log({"custom": 0.8})
Setting the step argument in wandb.log is turned off when syncing TensorBoard. If you'd like to set a different step count, you can log the metrics with a step metric, as shown below:
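For example, logging your own step counter as an ordinary metric (global_step is an illustrative name):

wandb.log({"custom": 0.8, "global_step": global_step})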
If you want more control over what gets logged, wandb also provides a hook for TensorFlow estimators. It will log all tf.summary values in the graph.
import tensorflow as tf
import wandb
wandb.init(config=tf.flags.FLAGS)
estimator.train(hooks=[wandb.tensorflow.WandbHook(steps_per_log=1000)])
Log manually
The simplest way to log metrics in TensorFlow is by logging tf.summary with the TensorFlow logger:
import tensorflow as tf
import wandb

with tf.Session() as sess:
    # ...
    wandb.tensorflow.log(tf.summary.merge_all())
With TensorFlow 2, the recommended way of training a model with a custom loop is via tf.GradientTape. You can read more about it here. If you want to incorporate wandb to log metrics in your custom TensorFlow training loops, you can follow this snippet:
with tf.GradientTape() as tape:
    # Get the probabilities
    predictions = model(features)
    # Calculate the loss
    loss = loss_func(labels, predictions)

# Log your metrics
wandb.log({"loss": loss.numpy()})
# Get the gradients
gradients = tape.gradient(loss, model.trainable_variables)
# Update the weights
optimizer.apply_gradients(zip(gradients, model.trainable_variables))
When the cofounders started working on W&B, they were inspired to build a tool for the frustrated TensorBoard users at OpenAI. Here are a few things we’ve focused on improving:
Reproduce models: Weights & Biases is good for experimentation, exploration, and reproducing models later. We capture not just the metrics, but also the hyperparameters and version of the code, and we can save your version-control status and model checkpoints for you so your project is reproducible.
Automatic organization: Whether you’re picking up a project from a collaborator, coming back from a vacation, or dusting off an old project, W&B makes it easy to see all the models that have been tried so no one wastes hours, GPU cycles, or carbon re-running experiments.
Fast, flexible integration: Add W&B to your project in 5 minutes. Install our free open-source Python package and add a couple of lines to your code, and every time you run your model you’ll have nice logged metrics and records.
Persistent, centralized dashboard: No matter where you train your models, whether on your local machine, in a shared lab cluster, or on spot instances in the cloud, your results are shared to the same centralized dashboard. You don’t need to spend your time copying and organizing TensorBoard files from different machines.
Powerful tables: Search, filter, sort, and group results from different models. It’s easy to look over thousands of model versions and find the best performing models for different tasks. TensorBoard isn’t built to work well on large projects.
Tools for collaboration: Use W&B to organize complex machine learning projects. It’s easy to share a link to W&B, and you can use private teams to have everyone sending results to a shared project. We also support collaboration via reports— add interactive visualizations and describe your work in markdown. This is a great way to keep a work log, share findings with your supervisor, or present findings to your lab or team.
Customizing Training Loops in TensorFlow 2 - Article | Dashboard
44 - W&B for Julia
How to integrate W&B with Julia.
For those running machine learning experiments in the Julia programming language, a community contributor has created an unofficial set of Julia bindings called wandb.jl that you can use.
You can find examples in the documentation on the wandb.jl repository. Their “Getting Started” example is here:
using Wandb, Dates, Logging

# Start a new run, tracking hyperparameters in config
lg = WandbLogger(project = "Wandb.jl",
                 name = "wandbjl-demo-$(now())",
                 config = Dict("learning_rate" => 0.01,
                               "dropout" => 0.2,
                               "architecture" => "CNN",
                               "dataset" => "CIFAR-100"))

# Use LoggingExtras.jl to log to multiple loggers together
global_logger(lg)

# Simulating the training or evaluation loop
for x ∈ 1:50
    acc = log(1 + x + rand() * get_config(lg, "learning_rate") + rand() + get_config(lg, "dropout"))
    loss = 10 - log(1 + x + rand() + x * get_config(lg, "learning_rate") + rand() + get_config(lg, "dropout"))
    # Log metrics from your script to W&B
    @info "metrics" accuracy=acc loss=loss
end

# Finish the run
close(lg)
45 - XGBoost
The wandb library has a WandbCallback callback for logging metrics, configs and saved boosters from training with XGBoost. Here you can see a live Weights & Biases dashboard with outputs from the XGBoost WandbCallback.
Get started
Logging XGBoost metrics, configs and booster models to Weights & Biases is as easy as passing the WandbCallback to XGBoost:
import wandb
from wandb.integration.xgboost import WandbCallback
from xgboost import XGBClassifier

# ...
# Start a wandb run
run = wandb.init()
# Pass WandbCallback to the model
bst = XGBClassifier()
bst.fit(X_train, y_train, callbacks=[WandbCallback(log_model=True)])
# Close your wandb run
run.finish()
You can open this notebook for a comprehensive look at logging with XGBoost and Weights & Biases.
WandbCallback reference
Functionality
Passing WandbCallback to a XGBoost model will:
log the booster model configuration to Weights & Biases
log evaluation metrics collected by XGBoost, such as rmse, accuracy, etc., to Weights & Biases
log training metrics collected by XGBoost (if you provide data to eval_set)
log the best score and the best iteration
save and upload your trained model to Weights & Biases Artifacts (when log_model = True)
log feature importance plot when log_feature_importance=True (default).
Capture the best eval metric in wandb.summary when define_metric=True (default).
Arguments
log_model: (boolean) if True save and upload the model to Weights & Biases Artifacts
log_feature_importance: (boolean) if True log a feature importance bar plot
importance_type: (str) one of {weight, gain, cover, total_gain, total_cover} for tree model. weight for linear model.
define_metric: (boolean) if True (default) capture model performance at the best step, instead of the last step, of training in your wandb.summary.
Attaining the maximum performance out of models requires tuning hyperparameters, like tree depth and learning rate. Weights & Biases includes Sweeps, a powerful toolkit for configuring, orchestrating, and analyzing large hyperparameter testing experiments.
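A minimal sketch of a sweep over two XGBoost hyperparameters; the metric name, project name, and the X_train/X_valid data assumed to exist in scope are all placeholders:

import wandb
from xgboost import XGBClassifier

def train():
    # Each sweep run receives its sampled hyperparameters in run.config
    with wandb.init() as run:
        model = XGBClassifier(
            max_depth=run.config.max_depth,
            learning_rate=run.config.learning_rate,
        )
        model.fit(X_train, y_train)  # X_train/y_train assumed to exist in scope
        wandb.log({"validation_score": model.score(X_valid, y_valid)})

sweep_config = {
    "method": "random",
    "metric": {"name": "validation_score", "goal": "maximize"},
    "parameters": {
        "max_depth": {"values": [3, 5, 7]},
        "learning_rate": {"min": 0.01, "max": 0.3},
    },
}

sweep_id = wandb.sweep(sweep_config, project="xgboost-sweeps")
wandb.agent(sweep_id, function=train, count=10)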
46 - YOLOv5
Ultralytics' YOLOv5 ("You Only Look Once") model family enables real-time object detection with convolutional neural networks without all the agonizing pain.
Weights & Biases is directly integrated into YOLOv5, providing experiment metric tracking, model and dataset versioning, rich model prediction visualization, and more. It’s as easy as running a single pip install before you run your YOLO experiments.
All W&B logging features are compatible with data-parallel multi-GPU training, such as with PyTorch DDP.
Track core experiments
Simply by installing wandb, you’ll activate the built-in W&B logging features: system metrics, model metrics, and media logged to interactive Dashboards.
pip install wandb
git clone https://github.com/ultralytics/yolov5.git
python yolov5/train.py # train a small network on a small dataset
Just follow the links printed to standard output by wandb.
Customize the integration
By passing a few simple command line arguments to YOLO, you can take advantage of even more W&B features.
Passing a number to --save_period will turn on model versioning. At the end of every save_period epochs, the model weights will be saved to W&B. The best-performing model on the validation set will be tagged automatically.
Turning on the --upload_dataset flag will also upload the dataset for data versioning.
Passing a number to --bbox_interval will turn on data visualization. At the end of every bbox_interval epochs, the outputs of the model on the validation set will be uploaded to W&B.
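For example, a training command that turns on all three options (flag spellings follow the list above; --data, --weights, and --epochs are standard YOLOv5 arguments, and the values are illustrative):

python yolov5/train.py --data coco128.yaml --weights yolov5s.pt --epochs 5 \
    --save_period 1 --bbox_interval 1 --upload_dataset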
47 - Ultralytics
Ultralytics is the home for cutting-edge, state-of-the-art computer vision models for tasks like image classification, object detection, image segmentation, and pose estimation. Not only does it host YOLOv8, the latest iteration in the YOLO series of real-time object detection models, but also other powerful computer vision models such as SAM (Segment Anything Model), RT-DETR, and YOLO-NAS. Besides providing implementations of these models, Ultralytics also provides out-of-the-box workflows for training, fine-tuning, and applying these models using an easy-to-use API.
The development team has tested the integration with ultralytics v8.0.238 and below. To report any issues with the integration, create a GitHub issue with the tag yolov8.
Track experiments and visualize validation results
This section demonstrates a typical workflow of using an Ultralytics model for training, fine-tuning, and validation and performing experiment tracking, model-checkpointing, and visualization of the model’s performance using W&B.
To use the W&B integration with Ultralytics, import the wandb.integration.ultralytics.add_wandb_callback function.
import wandb
from wandb.integration.ultralytics import add_wandb_callback
from ultralytics import YOLO
Initialize the YOLO model of your choice, and invoke the add_wandb_callback function on it before performing inference with the model. This ensures that when you perform training, fine-tuning, validation, or inference, it automatically saves the experiment logs and the images, overlaid with both ground-truth and the respective prediction results, using the interactive overlays for computer vision tasks on W&B, along with additional insights in a wandb.Table.
# Initialize YOLO Model
model = YOLO("yolov8n.pt")

# Add W&B callback for Ultralytics
add_wandb_callback(model, enable_model_checkpointing=True)

# Train/fine-tune your model. At the end of each epoch, predictions on validation batches
# are logged to a W&B Table with insightful and interactive overlays for computer vision tasks.
model.train(project="ultralytics", data="coco128.yaml", epochs=5, imgsz=640)

# Finish the W&B run
wandb.finish()
Here's what an experiment tracked using W&B for an Ultralytics training or fine-tuning workflow looks like.
In order to use the W&B integration with Ultralytics, you need to import the wandb.integration.ultralytics.add_wandb_callback function.
import wandb
from wandb.integration.ultralytics import add_wandb_callback
from ultralytics.engine.model import YOLO
Download a few images to test the integration on. You can use still images, videos, or camera sources. For more information on inference sources, check out the Ultralytics docs.
Next, initialize your desired YOLO model and invoke the add_wandb_callback function on it before you perform inference with the model. This ensures that when you perform inference, it automatically logs the images overlaid with your interactive overlays for computer vision tasks along with additional insights in a wandb.Table.
# Initialize YOLO Model
model = YOLO("yolov8n.pt")

# Add W&B callback for Ultralytics
add_wandb_callback(model, enable_model_checkpointing=True)

# Perform prediction, which automatically logs to a W&B Table
# with interactive overlays for bounding boxes and segmentation masks
model(
    [
        "./assets/img1.jpeg",
        "./assets/img3.png",
        "./assets/img4.jpeg",
        "./assets/img5.jpeg",
    ]
)

# Finish the W&B run
wandb.finish()
You do not need to explicitly initialize a run using wandb.init() for a training or fine-tuning workflow. However, if the code involves only prediction, you must explicitly create a run.
48 - YOLOX
YOLOX is an anchor-free version of YOLO with strong performance for object detection. You can use the YOLOX Weights & Biases integration to turn on logging of metrics related to training, validation, and the system, and you can interactively validate predictions with a single command-line argument.
Get started
To use YOLOX with Weights & Biases you will first need to sign up for a Weights & Biases account here.
Then just use the --logger wandb command line argument to turn on logging with wandb. Optionally, you can also pass all of the arguments that wandb.init would expect; just prepend wandb- to the start of each argument.
num_eval_imges controls the number of validation set images and predictions that are logged to Weights & Biases tables for model evaluation.
# login to wandb
wandb login

# call your yolox training script with the `wandb` logger argument
python tools/train.py .... --logger wandb \
    wandb-project <project-name> \
    wandb-entity <entity> \
    wandb-name <run-name> \
    wandb-id <run-id> \
    wandb-save_dir <save-dir> \
    wandb-num_eval_imges <num-images> \
    wandb-log_checkpoints <bool>