Integration tutorials
- 1: PyTorch
- 2: PyTorch Lightning
- 3: Hugging Face
- 4: TensorFlow
- 5: TensorFlow Sweeps
- 6: 3D brain tumor segmentation with MONAI
- 7: Keras
- 8: Keras models
- 9: Keras tables
- 10: XGBoost Sweeps
1 - PyTorch
Use Weights & Biases for machine learning experiment tracking, dataset versioning, and project collaboration.
What this notebook covers
We show you how to integrate Weights & Biases with your PyTorch code to add experiment tracking to your pipeline.
# import the library
import wandb
# start a new experiment
wandb.init(project="new-sota-model")
# capture a dictionary of hyperparameters with config
wandb.config = {"learning_rate": 0.001, "epochs": 100, "batch_size": 128}
# set up model and data
model, dataloader = get_model(), get_data()
# optional: track gradients
wandb.watch(model)
for batch in dataloader:
metrics = model.training_step()
# log metrics inside your training loop to visualize model performance
wandb.log(metrics)
# optional: save model at the end
model.to_onnx()
wandb.save("model.onnx")
Follow along with a video tutorial.
Note: Sections starting with Step are all you need to integrate W&B in an existing pipeline. The rest just loads data and defines a model.
Install, import, and log in
import os
import random
import numpy as np
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
from tqdm.auto import tqdm
# Ensure deterministic behavior
torch.backends.cudnn.deterministic = True
random.seed(hash("setting random seeds") % 2**32 - 1)
np.random.seed(hash("improves reproducibility") % 2**32 - 1)
torch.manual_seed(hash("by removing stochasticity") % 2**32 - 1)
torch.cuda.manual_seed_all(hash("so runs are repeatable") % 2**32 - 1)
# Device configuration
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# remove slow mirror from list of MNIST mirrors
torchvision.datasets.MNIST.mirrors = [mirror for mirror in torchvision.datasets.MNIST.mirrors
if not mirror.startswith("http://yann.lecun.com")]
0️⃣ Step 0: Install W&B
To get started, we’ll need to get the library. `wandb` is easily installed using `pip`.
!pip install wandb onnx -Uq
1️⃣ Step 1: Import W&B and Login
In order to log data to our web service, you’ll need to log in.
If this is your first time using W&B, you’ll need to sign up for a free account at the link that appears.
import wandb
wandb.login()
Define the Experiment and Pipeline
Track metadata and hyperparameters with wandb.init
Programmatically, the first thing we do is define our experiment: what are the hyperparameters? what metadata is associated with this run?
It’s a pretty common workflow to store this information in a `config` dictionary (or similar object) and then access it as needed.

For this example, we’re only letting a few hyperparameters vary and hand-coding the rest. But any part of your model can be part of the `config`.
We also include some metadata: we’re using the MNIST dataset and a convolutional architecture. If we later work with, say, fully connected architectures on CIFAR in the same project, this will help us separate our runs.
config = dict(
epochs=5,
classes=10,
kernels=[16, 32],
batch_size=128,
learning_rate=0.005,
dataset="MNIST",
architecture="CNN")
Now, let’s define the overall pipeline, which is pretty typical for model-training:
- we first `make` a model, plus associated data and optimizer, then
- we `train` the model accordingly, and finally
- `test` it to see how training went.
We’ll implement these functions below.
def model_pipeline(hyperparameters):
# tell wandb to get started
with wandb.init(project="pytorch-demo", config=hyperparameters):
# access all HPs through wandb.config, so logging matches execution.
config = wandb.config
# make the model, data, and optimization problem
model, train_loader, test_loader, criterion, optimizer = make(config)
print(model)
# and use them to train the model
train(model, train_loader, criterion, optimizer, config)
# and test its final performance
test(model, test_loader)
return model
The only difference here from a standard pipeline is that it all occurs inside the context of `wandb.init`. Calling this function sets up a line of communication between your code and our servers.

Passing the `config` dictionary to `wandb.init` immediately logs all that information to us, so you’ll always know what hyperparameter values you set your experiment to use.

To ensure the values you chose and logged are always the ones that get used in your model, we recommend using the `wandb.config` copy of your object. Check the definition of `make` below to see some examples.

Side Note: We take care to run our code in separate processes, so that any issues on our end (such as if a giant sea monster attacks our data centers) don’t crash your code. Once the issue is resolved, such as when the Kraken returns to the deep, you can log the data with `wandb sync`.
def make(config):
# Make the data
train, test = get_data(train=True), get_data(train=False)
train_loader = make_loader(train, batch_size=config.batch_size)
test_loader = make_loader(test, batch_size=config.batch_size)
# Make the model
model = ConvNet(config.kernels, config.classes).to(device)
# Make the loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(
model.parameters(), lr=config.learning_rate)
return model, train_loader, test_loader, criterion, optimizer
Define the Data Loading and Model
Now, we need to specify how the data is loaded and what the model looks like.
This part is very important, but it’s no different from what it would be without `wandb`, so we won’t dwell on it.
def get_data(slice=5, train=True):
full_dataset = torchvision.datasets.MNIST(root=".",
train=train,
transform=transforms.ToTensor(),
download=True)
# equiv to slicing with [::slice]
sub_dataset = torch.utils.data.Subset(
full_dataset, indices=range(0, len(full_dataset), slice))
return sub_dataset
def make_loader(dataset, batch_size):
loader = torch.utils.data.DataLoader(dataset=dataset,
batch_size=batch_size,
shuffle=True,
pin_memory=True, num_workers=2)
return loader
Defining the model is normally the fun part.
But nothing changes with `wandb`, so we’re gonna stick with a standard ConvNet architecture.
Don’t be afraid to mess around with this and try some experiments – all your results will be logged on wandb.ai.
# Conventional and convolutional neural network
class ConvNet(nn.Module):
def __init__(self, kernels, classes=10):
super(ConvNet, self).__init__()
self.layer1 = nn.Sequential(
nn.Conv2d(1, kernels[0], kernel_size=5, stride=1, padding=2),
nn.ReLU(),
nn.MaxPool2d(kernel_size=2, stride=2))
self.layer2 = nn.Sequential(
nn.Conv2d(kernels[0], kernels[1], kernel_size=5, stride=1, padding=2),
nn.ReLU(),
nn.MaxPool2d(kernel_size=2, stride=2))
self.fc = nn.Linear(7 * 7 * kernels[-1], classes)
def forward(self, x):
out = self.layer1(x)
out = self.layer2(out)
out = out.reshape(out.size(0), -1)
out = self.fc(out)
return out
Define Training Logic
Moving on in our `model_pipeline`, it’s time to specify how we `train`. Two `wandb` functions come into play here: `watch` and `log`.

Track gradients with `wandb.watch` and everything else with `wandb.log`

`wandb.watch` will log the gradients and the parameters of your model every `log_freq` steps of training. All you need to do is call it before you start training.

The rest of the training code remains the same: we iterate over epochs and batches, running forward and backward passes and applying our `optimizer`.
def train(model, loader, criterion, optimizer, config):
# Tell wandb to watch what the model gets up to: gradients, weights, and more.
wandb.watch(model, criterion, log="all", log_freq=10)
# Run training and track with wandb
total_batches = len(loader) * config.epochs
example_ct = 0 # number of examples seen
batch_ct = 0
for epoch in tqdm(range(config.epochs)):
for _, (images, labels) in enumerate(loader):
loss = train_batch(images, labels, model, optimizer, criterion)
example_ct += len(images)
batch_ct += 1
# Report metrics every 25th batch
if ((batch_ct + 1) % 25) == 0:
train_log(loss, example_ct, epoch)
def train_batch(images, labels, model, optimizer, criterion):
images, labels = images.to(device), labels.to(device)
# Forward pass ➡
outputs = model(images)
loss = criterion(outputs, labels)
# Backward pass ⬅
optimizer.zero_grad()
loss.backward()
# Step with optimizer
optimizer.step()
return loss
The only difference is in the logging code: where previously you might have reported metrics by printing to the terminal, now you pass the same information to `wandb.log`.

`wandb.log` expects a dictionary with strings as keys. These strings identify the objects being logged, which make up the values. You can also optionally log which `step` of training you’re on.

Side Note: I like to use the number of examples the model has seen, since this makes for easier comparison across batch sizes, but you can use raw steps or batch count. For longer training runs, it can also make sense to log by `epoch`.
def train_log(loss, example_ct, epoch):
# Where the magic happens
wandb.log({"epoch": epoch, "loss": loss}, step=example_ct)
print(f"Loss after {str(example_ct).zfill(5)} examples: {loss:.3f}")
Define Testing Logic
Once the model is done training, we want to test it: run it against some fresh data from production, perhaps, or apply it to some hand-curated examples.
(Optional) Call wandb.save
This is also a great time to save the model’s architecture and final parameters to disk. For maximum compatibility, we’ll export our model in the Open Neural Network eXchange (ONNX) format.

Passing that filename to `wandb.save` ensures that the model parameters are saved to W&B’s servers: no more losing track of which `.h5` or `.pb` corresponds to which training runs.

For more advanced `wandb` features for storing, versioning, and distributing models, check out our Artifacts tools.
def test(model, test_loader):
model.eval()
# Run the model on some test examples
with torch.no_grad():
correct, total = 0, 0
for images, labels in test_loader:
images, labels = images.to(device), labels.to(device)
outputs = model(images)
_, predicted = torch.max(outputs.data, 1)
total += labels.size(0)
correct += (predicted == labels).sum().item()
print(f"Accuracy of the model on the {total} " +
f"test images: {correct / total:%}")
wandb.log({"test_accuracy": correct / total})
# Save the model in the exchangeable ONNX format
torch.onnx.export(model, images, "model.onnx")
wandb.save("model.onnx")
Run training and watch your metrics live on wandb.ai
Now that we’ve defined the whole pipeline and slipped in those few lines of W&B code, we’re ready to run our fully tracked experiment.
We’ll report a few links to you: our documentation, the Project page, which organizes all the runs in a project, and the Run page, where this run’s results will be stored.
Navigate to the Run page and check out these tabs:
- Charts, where the model gradients, parameter values, and loss are logged throughout training
- System, which contains a variety of system metrics, including Disk I/O utilization, CPU and GPU metrics (watch that temperature soar 🔥), and more
- Logs, which has a copy of anything pushed to standard out during training
- Files, where, once training is complete, you can click on `model.onnx` to view our network with the Netron model viewer.
Once the run is finished, when the `with wandb.init` block exits, we’ll also print a summary of the results in the cell output.
# Build, train and analyze the model with the pipeline
model = model_pipeline(config)
Test Hyperparameters with Sweeps
We only looked at a single set of hyperparameters in this example. But an important part of most ML workflows is iterating over a number of hyperparameters.
You can use Weights & Biases Sweeps to automate hyperparameter testing and explore the space of possible models and optimization strategies.
Check out Hyperparameter Optimization in PyTorch using W&B Sweeps
Running a hyperparameter sweep with Weights & Biases is very easy. There are just 3 simple steps:
1. Define the sweep: We do this by creating a dictionary or a YAML file that specifies the parameters to search through, the search strategy, the optimization metric, and so on.
2. Initialize the sweep: `sweep_id = wandb.sweep(sweep_config)`
3. Run the sweep agent: `wandb.agent(sweep_id, function=train)`
That’s all there is to running a hyperparameter sweep.
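For concreteness, here is a minimal sketch of what those three steps might look like for this project. The hyperparameter names, values, and the `sweep_train` stub are illustrative assumptions, not part of the pipeline above:

```python
import wandb

def sweep_train():
    # Placeholder training function: the sweep agent calls this once per run.
    # Hyperparameters chosen by the sweep arrive via wandb.config after init.
    with wandb.init() as run:
        lr = run.config.learning_rate
        batch_size = run.config.batch_size
        # ... build loaders/model with these values and run the training loop ...
        run.log({"loss": 0.0})  # placeholder for the metric the sweep optimizes

# 1. Define the sweep: random search over two hyperparameters, minimizing "loss"
sweep_config = {
    "method": "random",
    "metric": {"name": "loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"values": [0.005, 0.001, 0.0005]},
        "batch_size": {"values": [64, 128, 256]},
    },
}

# 2. Initialize the sweep
sweep_id = wandb.sweep(sweep_config, project="pytorch-demo")

# 3. Run the sweep agent, capping the number of runs at 5
wandb.agent(sweep_id, function=sweep_train, count=5)
```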
Example Gallery
See examples of projects tracked and visualized with W&B in our Gallery →
Advanced Setup
- Environment variables: Set API keys in environment variables so you can run training on a managed cluster.
- Offline mode: Use `dryrun` mode to train offline and sync results later.
- On-prem: Install W&B in a private cloud or air-gapped servers in your own infrastructure. We have local installations for everyone from academics to enterprise teams.
- Sweeps: Set up hyperparameter search quickly with our lightweight tool for tuning.
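As a quick sketch of the first two items, assuming you run on a cluster without interactive login (the API key value is a placeholder, and `WANDB_MODE=offline` is the current spelling of `dryrun`):

```python
import os

# Authenticate non-interactively, e.g. on a managed cluster
os.environ["WANDB_API_KEY"] = "<your-api-key>"  # placeholder

# Train offline; sync results later from the command line with `wandb sync`
os.environ["WANDB_MODE"] = "offline"

import wandb

wandb.init(project="pytorch-demo")
```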
2 - PyTorch Lightning
We will build an image classification pipeline using PyTorch Lightning. We will follow this style guide to increase the readability and reproducibility of our code. A cool explanation of this is available here.

Setting up PyTorch Lightning and W&B
For this tutorial, we need PyTorch Lightning and Weights and Biases.
pip install lightning -q
pip install wandb -qU
import lightning.pytorch as pl
# your favorite machine learning tracking tool
from lightning.pytorch.loggers import WandbLogger
import torch
from torch import nn
from torch.nn import functional as F
from torch.utils.data import random_split, DataLoader
from torchmetrics import Accuracy
from torchvision import transforms
from torchvision.datasets import CIFAR10
import wandb
Now you’ll need to log in to your wandb account.
wandb.login()
DataModule - The Data Pipeline we Deserve
DataModules are a way of decoupling data-related hooks from the LightningModule so you can develop dataset agnostic models.
It organizes the data pipeline into one shareable and reusable class. A datamodule encapsulates the five steps involved in data processing in PyTorch:
- Download / tokenize / process.
- Clean and (maybe) save to disk.
- Load inside Dataset.
- Apply transforms (rotate, tokenize, etc…).
- Wrap inside a DataLoader.
Learn more about datamodules here. Let’s build a datamodule for the CIFAR-10 dataset.
class CIFAR10DataModule(pl.LightningDataModule):
def __init__(self, batch_size, data_dir: str = './'):
super().__init__()
self.data_dir = data_dir
self.batch_size = batch_size
self.transform = transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
self.num_classes = 10
def prepare_data(self):
CIFAR10(self.data_dir, train=True, download=True)
CIFAR10(self.data_dir, train=False, download=True)
def setup(self, stage=None):
# Assign train/val datasets for use in dataloaders
if stage == 'fit' or stage is None:
cifar_full = CIFAR10(self.data_dir, train=True, transform=self.transform)
self.cifar_train, self.cifar_val = random_split(cifar_full, [45000, 5000])
# Assign test dataset for use in dataloader(s)
if stage == 'test' or stage is None:
self.cifar_test = CIFAR10(self.data_dir, train=False, transform=self.transform)
def train_dataloader(self):
return DataLoader(self.cifar_train, batch_size=self.batch_size, shuffle=True)
def val_dataloader(self):
return DataLoader(self.cifar_val, batch_size=self.batch_size)
def test_dataloader(self):
return DataLoader(self.cifar_test, batch_size=self.batch_size)
Callbacks
A callback is a self-contained program that can be reused across projects. PyTorch Lightning comes with a few built-in callbacks which are regularly used. Learn more about callbacks in PyTorch Lightning here.
Built-in Callbacks
In this tutorial, we will use the Early Stopping and Model Checkpoint built-in callbacks. They can be passed to the `Trainer`.
Custom Callbacks
If you are familiar with custom Keras callbacks, the ability to do the same in your PyTorch pipeline is just a cherry on the cake.

Since we are performing image classification, the ability to visualize the model’s predictions on some samples of images can be helpful. This, in the form of a callback, can help debug the model at an early stage.
class ImagePredictionLogger(pl.callbacks.Callback):
def __init__(self, val_samples, num_samples=32):
super().__init__()
self.num_samples = num_samples
self.val_imgs, self.val_labels = val_samples
def on_validation_epoch_end(self, trainer, pl_module):
# Bring the tensors to CPU
val_imgs = self.val_imgs.to(device=pl_module.device)
val_labels = self.val_labels.to(device=pl_module.device)
# Get model prediction
logits = pl_module(val_imgs)
preds = torch.argmax(logits, -1)
# Log the images as wandb Image
trainer.logger.experiment.log({
"examples":[wandb.Image(x, caption=f"Pred:{pred}, Label:{y}")
for x, pred, y in zip(val_imgs[:self.num_samples],
preds[:self.num_samples],
val_labels[:self.num_samples])]
})
LightningModule - Define the System
The LightningModule defines a system and not a model. Here a system groups all the research code into a single class to make it self-contained. `LightningModule` organizes your PyTorch code into 5 sections:
- Computations (`__init__`)
- Train loop (`training_step`)
- Validation loop (`validation_step`)
- Test loop (`test_step`)
- Optimizers (`configure_optimizers`)
One can thus build a dataset agnostic model that can be easily shared. Let’s build a system for CIFAR-10 classification.
class LitModel(pl.LightningModule):
def __init__(self, input_shape, num_classes, learning_rate=2e-4):
super().__init__()
# log hyperparameters
self.save_hyperparameters()
self.learning_rate = learning_rate
self.conv1 = nn.Conv2d(3, 32, 3, 1)
self.conv2 = nn.Conv2d(32, 32, 3, 1)
self.conv3 = nn.Conv2d(32, 64, 3, 1)
self.conv4 = nn.Conv2d(64, 64, 3, 1)
self.pool1 = torch.nn.MaxPool2d(2)
self.pool2 = torch.nn.MaxPool2d(2)
n_sizes = self._get_conv_output(input_shape)
self.fc1 = nn.Linear(n_sizes, 512)
self.fc2 = nn.Linear(512, 128)
self.fc3 = nn.Linear(128, num_classes)
self.accuracy = Accuracy(task='multiclass', num_classes=num_classes)
# returns the size of the output tensor going into Linear layer from the conv block.
def _get_conv_output(self, shape):
batch_size = 1
tmp_input = torch.rand(batch_size, *shape)  # Variable is deprecated; a plain tensor works
output_feat = self._forward_features(tmp_input)
n_size = output_feat.data.view(batch_size, -1).size(1)
return n_size
# returns the feature tensor from the conv block
def _forward_features(self, x):
x = F.relu(self.conv1(x))
x = self.pool1(F.relu(self.conv2(x)))
x = F.relu(self.conv3(x))
x = self.pool2(F.relu(self.conv4(x)))
return x
# will be used during inference
def forward(self, x):
x = self._forward_features(x)
x = x.view(x.size(0), -1)
x = F.relu(self.fc1(x))
x = F.relu(self.fc2(x))
x = F.log_softmax(self.fc3(x), dim=1)
return x
def training_step(self, batch, batch_idx):
x, y = batch
logits = self(x)
loss = F.nll_loss(logits, y)
# training metrics
preds = torch.argmax(logits, dim=1)
acc = self.accuracy(preds, y)
self.log('train_loss', loss, on_step=True, on_epoch=True, logger=True)
self.log('train_acc', acc, on_step=True, on_epoch=True, logger=True)
return loss
def validation_step(self, batch, batch_idx):
x, y = batch
logits = self(x)
loss = F.nll_loss(logits, y)
# validation metrics
preds = torch.argmax(logits, dim=1)
acc = self.accuracy(preds, y)
self.log('val_loss', loss, prog_bar=True)
self.log('val_acc', acc, prog_bar=True)
return loss
def test_step(self, batch, batch_idx):
x, y = batch
logits = self(x)
loss = F.nll_loss(logits, y)
# test metrics
preds = torch.argmax(logits, dim=1)
acc = self.accuracy(preds, y)
self.log('test_loss', loss, prog_bar=True)
self.log('test_acc', acc, prog_bar=True)
return loss
def configure_optimizers(self):
optimizer = torch.optim.Adam(self.parameters(), lr=self.learning_rate)
return optimizer
Train and Evaluate
Now that we have organized our data pipeline using `DataModule` and our model architecture plus training loop using `LightningModule`, the PyTorch Lightning `Trainer` automates everything else for us.

The Trainer automates:
- Epoch and batch iteration
- Calling of `optimizer.step()`, `backward`, `zero_grad()`
- Calling of `.eval()`, enabling/disabling grads
- Saving and loading weights
- Weights and Biases logging
- Multi-GPU training support
- TPU support
- 16-bit training support
dm = CIFAR10DataModule(batch_size=32)
# To access the x_dataloader we need to call prepare_data and setup.
dm.prepare_data()
dm.setup()
# Samples required by the custom ImagePredictionLogger callback to log image predictions.
val_samples = next(iter(dm.val_dataloader()))
val_imgs, val_labels = val_samples[0], val_samples[1]
val_imgs.shape, val_labels.shape
model = LitModel((3, 32, 32), dm.num_classes)
# Initialize wandb logger
wandb_logger = WandbLogger(project='wandb-lightning', job_type='train')
# Initialize Callbacks
early_stop_callback = pl.callbacks.EarlyStopping(monitor="val_loss")
checkpoint_callback = pl.callbacks.ModelCheckpoint()
# Initialize a trainer
trainer = pl.Trainer(max_epochs=2,
logger=wandb_logger,
callbacks=[early_stop_callback,
ImagePredictionLogger(val_samples),
checkpoint_callback],
)
# Train the model
trainer.fit(model, dm)
# Evaluate the model on the held-out test set ⚡⚡
trainer.test(dataloaders=dm.test_dataloader())
# Close wandb run
wandb.finish()
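If you later want to reload the best weights saved by the `ModelCheckpoint` callback, here is a minimal sketch, assuming the `checkpoint_callback` and `LitModel` defined above:

```python
# ModelCheckpoint records the path of the best-scoring checkpoint it saved.
best_path = checkpoint_callback.best_model_path
print(f"Best checkpoint: {best_path}")

# save_hyperparameters() in __init__ lets Lightning rebuild the module
# from the checkpoint without re-passing constructor arguments.
best_model = LitModel.load_from_checkpoint(best_path)
best_model.eval()
```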
Final Thoughts
I come from the TensorFlow/Keras ecosystem and find PyTorch a bit overwhelming, even though it’s an elegant framework. Just my personal experience though. While exploring PyTorch Lightning, I realized that almost all of the reasons that kept me away from PyTorch are taken care of. Here’s a quick summary of my excitement:
- Then: Conventional PyTorch model definition used to be all over the place. With the model in some `model.py` script and the training loop in the `train.py` file. It was a lot of looking back and forth to understand the pipeline.
- Now: The `LightningModule` acts as a system where the model is defined along with the `training_step`, `validation_step`, etc. Now it’s modular and shareable.
- Then: The best part about TensorFlow/Keras is the input data pipeline. Their dataset catalog is rich and growing. PyTorch’s data pipeline used to be the biggest pain point. In normal PyTorch code, the data download/cleaning/preparation is usually scattered across many files.
- Now: The DataModule organizes the data pipeline into one shareable and reusable class. It’s simply a collection of a `train_dataloader`, `val_dataloader`(s), and `test_dataloader`(s) along with the matching transforms and data processing/download steps required.
- Then: With Keras, one can call `model.fit` to train the model and `model.predict` to run inference on. `model.evaluate` offered a good old simple evaluation on the test data. This is not the case with PyTorch. One will usually find separate `train.py` and `test.py` files.
- Now: With the `LightningModule` in place, the `Trainer` automates everything. One needs to just call `trainer.fit` and `trainer.test` to train and evaluate the model.
- Then: TensorFlow loves TPU, PyTorch…
- Now: With PyTorch Lightning, it’s so easy to train the same model with multiple GPUs and even on TPU.
- Then: I am a big fan of Callbacks and prefer writing custom callbacks. Something as trivial as Early Stopping used to be a point of discussion with conventional PyTorch.
- Now: With PyTorch Lightning, using Early Stopping and Model Checkpointing is a piece of cake. I can even write custom callbacks.
🎨 Conclusion and Resources
I hope you find this report helpful. I encourage you to play with the code and train an image classifier with a dataset of your choice.
Here are some resources to learn more about PyTorch Lightning:
- Step-by-step walk-through - This is one of the official tutorials. Their documentation is really well written and I highly recommend it as a good learning resource.
- Use Pytorch Lightning with Weights & Biases - This is a quick colab that you can run through to learn more about how to use W&B with PyTorch Lightning.
3 - Hugging Face
Visualize your Hugging Face model’s performance quickly with a seamless W&B integration. Compare hyperparameters, output metrics, and system stats like GPU utilization across your models.
Why should I use W&B?
- Unified dashboard: Central repository for all your model metrics and predictions
- Lightweight: No code changes required to integrate with Hugging Face
- Accessible: Free for individuals and academic teams
- Secure: All projects are private by default
- Trusted: Used by machine learning teams at OpenAI, Toyota, Lyft and more
Think of W&B like GitHub for machine learning models: save machine learning experiments to your private, hosted dashboard. Experiment quickly with the confidence that all the versions of your models are saved for you, no matter where you’re running your scripts.

W&B’s lightweight integration works with any Python script, and all you need to do is sign up for a free W&B account to start tracking and visualizing your models.
In the Hugging Face Transformers repo, we’ve instrumented the Trainer to automatically log training and evaluation metrics to W&B at each logging step.
Here’s an in depth look at how the integration works: Hugging Face + W&B Report.
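If you are writing your own training script rather than using `run_glue.py`, a minimal sketch of opting into this logging looks like the following (the output path and run name are illustrative):

```python
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="./results",     # placeholder output path
    report_to="wandb",          # stream metrics to Weights & Biases
    run_name="bert-base-mrpc",  # illustrative W&B run name
    logging_steps=50,
    num_train_epochs=3,
)

# trainer = Trainer(model=model, args=training_args, ...)  # model/data omitted
# trainer.train()
```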
Install, import, and log in
Install the Hugging Face and Weights & Biases libraries, and the GLUE dataset and training script for this tutorial.
- Hugging Face Transformers: Natural language models and datasets
- Weights & Biases: Experiment tracking and visualization
- GLUE dataset: A language understanding benchmark dataset
- GLUE script: Model training script for sequence classification
!pip install datasets wandb evaluate accelerate -qU
!wget https://raw.githubusercontent.com/huggingface/transformers/master/examples/pytorch/text-classification/run_glue.py
# the run_glue.py script requires transformers dev
!pip install -q git+https://github.com/huggingface/transformers
Before continuing, sign up for a free account.
Put in your API key
Once you’ve signed up, run the next cell and click on the link to get your API key and authenticate this notebook.
import wandb
wandb.login()
Optionally, we can set environment variables to customize W&B logging. See documentation.
# Optional: log both gradients and parameters
%env WANDB_WATCH=all
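Another commonly used variable is `WANDB_LOG_MODEL`, which uploads the trained model to W&B as an artifact; the accepted values depend on your `transformers` version, so treat this as a sketch:

```python
# Optional: upload the final model as a W&B artifact when training ends
%env WANDB_LOG_MODEL=end
```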
Train the model
Next, call the downloaded training script `run_glue.py` and see training automatically get tracked to the Weights & Biases dashboard. This script fine-tunes BERT on the Microsoft Research Paraphrase Corpus: pairs of sentences with human annotations indicating whether they are semantically equivalent.
%env WANDB_PROJECT=huggingface-demo
%env TASK_NAME=MRPC
!python run_glue.py \
--model_name_or_path bert-base-uncased \
--task_name $TASK_NAME \
--do_train \
--do_eval \
--max_seq_length 256 \
--per_device_train_batch_size 32 \
--learning_rate 2e-4 \
--num_train_epochs 3 \
--output_dir /tmp/$TASK_NAME/ \
--overwrite_output_dir \
--logging_steps 50
Visualize results in dashboard
Click the link printed out above, or go to wandb.ai to see your results stream in live. The link to see your run in the browser will appear after all the dependencies are loaded. Look for the following output: “wandb: 🚀 View run at [URL to your unique run]”
Visualize Model Performance: It’s easy to look across dozens of experiments, zoom in on interesting findings, and visualize highly dimensional data.

Compare Architectures: Here’s an example comparing BERT vs DistilBERT. It’s easy to see how different architectures affect the evaluation accuracy throughout training with automatic line plot visualizations.
Track key information effortlessly by default
Weights & Biases saves a new run for each experiment. Here’s the information that gets saved by default:
- Hyperparameters: Settings for your model are saved in Config
- Model Metrics: Time series data of metrics streaming in are saved in Log
- Terminal Logs: Command line outputs are saved and available in a tab
- System Metrics: GPU and CPU utilization, memory, temperature etc.
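All of this information remains queryable after training. As a small sketch using W&B’s public API (the entity/project/run path is a placeholder):

```python
import wandb

api = wandb.Api()
run = api.run("<entity>/<project>/<run_id>")  # placeholder run path

print(run.config)        # hyperparameters
print(run.summary)       # final metric values
history = run.history()  # sampled time series of logged metrics
```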
Learn more
- Documentation: docs on the Weights & Biases and Hugging Face integration
- Videos: tutorials, interviews with practitioners, and more on our YouTube channel
- Contact: Message us at contact@wandb.com with questions
4 - TensorFlow
Use Weights & Biases for machine learning experiment tracking, dataset versioning, and project collaboration.
What this notebook covers
- Easy integration of Weights and Biases with your TensorFlow pipeline for experiment tracking.
- Computing metrics with `keras.metrics`
- Using `wandb.log` to log those metrics in your custom training loop
Note: Sections starting with Step are all you need to integrate W&B into existing code. The rest is just a standard MNIST example.
import tensorflow as tf
from tensorflow import keras
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Install, Import, Login
Install W&B
%%capture
!pip install wandb
Import W&B and login
import wandb
from wandb.integration.keras import WandbMetricsLogger
wandb.login()
Side note: If this is your first time using W&B or you are not logged in, the link that appears after running `wandb.login()` will take you to a sign-up/login page. Signing up is as easy as one click.
Prepare Dataset
# Prepare the training dataset
BATCH_SIZE = 64
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = np.reshape(x_train, (-1, 784))
x_test = np.reshape(x_test, (-1, 784))
# build input pipeline using tf.data
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(BATCH_SIZE)
val_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test))
val_dataset = val_dataset.batch(BATCH_SIZE)
Define the Model and the Training Loop
def make_model():
inputs = keras.Input(shape=(784,), name="digits")
x1 = keras.layers.Dense(64, activation="relu")(inputs)
x2 = keras.layers.Dense(64, activation="relu")(x1)
outputs = keras.layers.Dense(10, name="predictions")(x2)
return keras.Model(inputs=inputs, outputs=outputs)
def train_step(x, y, model, optimizer, loss_fn, train_acc_metric):
with tf.GradientTape() as tape:
logits = model(x, training=True)
loss_value = loss_fn(y, logits)
grads = tape.gradient(loss_value, model.trainable_weights)
optimizer.apply_gradients(zip(grads, model.trainable_weights))
train_acc_metric.update_state(y, logits)
return loss_value
def test_step(x, y, model, loss_fn, val_acc_metric):
val_logits = model(x, training=False)
loss_value = loss_fn(y, val_logits)
val_acc_metric.update_state(y, val_logits)
return loss_value
Add `wandb.log` to your training loop
def train(train_dataset, val_dataset, model, optimizer,
train_acc_metric, val_acc_metric,
epochs=10, log_step=200, val_log_step=50):
for epoch in range(epochs):
print("\nStart of epoch %d" % (epoch,))
train_loss = []
val_loss = []
# Iterate over the batches of the dataset
for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
loss_value = train_step(x_batch_train, y_batch_train,
model, optimizer,
loss_fn, train_acc_metric)
train_loss.append(float(loss_value))
# Run a validation loop at the end of each epoch
for step, (x_batch_val, y_batch_val) in enumerate(val_dataset):
val_loss_value = test_step(x_batch_val, y_batch_val,
model, loss_fn,
val_acc_metric)
val_loss.append(float(val_loss_value))
# Display metrics at the end of each epoch
train_acc = train_acc_metric.result()
print("Training acc over epoch: %.4f" % (float(train_acc),))
val_acc = val_acc_metric.result()
print("Validation acc: %.4f" % (float(val_acc),))
# Reset metrics at the end of each epoch
train_acc_metric.reset_states()
val_acc_metric.reset_states()
# ⭐: log metrics using wandb.log
wandb.log({'epochs': epoch,
'loss': np.mean(train_loss),
'acc': float(train_acc),
'val_loss': np.mean(val_loss),
'val_acc':float(val_acc)})
Run Training
Call `wandb.init` to start a run
This lets us know you’re launching an experiment, so we can give it a unique ID and a dashboard.
Check out the official documentation
# initialize wandb with your project name and optionally with configurations.
# play around with the config values and see the result on your wandb dashboard.
config = {
"learning_rate": 0.001,
"epochs": 10,
"batch_size": 64,
"log_step": 200,
"val_log_step": 50,
"architecture": "CNN",
"dataset": "CIFAR-10"
}
run = wandb.init(project='my-tf-integration', config=config)
config = wandb.config
# Initialize model.
model = make_model()
# Instantiate an optimizer to train the model.
optimizer = keras.optimizers.SGD(learning_rate=config.learning_rate)
# Instantiate a loss function.
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
# Prepare the metrics.
train_acc_metric = keras.metrics.SparseCategoricalAccuracy()
val_acc_metric = keras.metrics.SparseCategoricalAccuracy()
train(train_dataset,
val_dataset,
model,
optimizer,
train_acc_metric,
val_acc_metric,
epochs=config.epochs,
log_step=config.log_step,
val_log_step=config.val_log_step)
run.finish() # In Jupyter/Colab, let us know you're finished!
Visualize Results
Click on the run page link above to see your live results.
Sweep 101
Use Weights & Biases Sweeps to automate hyperparameter optimization and explore the space of possible models.
Check out Hyperparameter Optimization in TensorFlow using W&B Sweeps
Benefits of using W&B Sweeps
- Quick setup: With just a few lines of code you can run W&B sweeps.
- Transparent: We cite all the algorithms we’re using, and our code is open source.
- Powerful: Our sweeps are completely customizable and configurable. You can launch a sweep across dozens of machines, and it’s just as easy as starting a sweep on your laptop.
Example Gallery
See examples of projects tracked and visualized with W&B in our gallery of examples, Fully Connected →
📏 Best Practices
- Projects: Log multiple runs to a project to compare them.
wandb.init(project="project-name")
- Groups: For multiple processes or cross validation folds, log each process as a run and group them together.
wandb.init(group='experiment-1')
- Tags: Add tags to track your current baseline or production model.
- Notes: Type notes in the table to track the changes between runs.
- Reports: Take quick notes on progress to share with colleagues and make dashboards and snapshots of your ML projects.
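A minimal sketch combining several of these practices in one `wandb.init` call (the project, group, tag, and note values are placeholders):

```python
import wandb

run = wandb.init(
    project="project-name",                # compare runs within one project
    group="experiment-1",                  # group related processes or CV folds
    tags=["baseline"],                     # mark the current baseline model
    notes="Trying a lower learning rate",  # free-form notes shown in the runs table
)
```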
Advanced Setup
- Environment variables: Set API keys in environment variables so you can run training on a managed cluster.
- Offline mode
- On-prem: Install W&B in a private cloud or air-gapped servers in your own infrastructure. We have local installations for everyone from academics to enterprise teams.
- Artifacts: Track and version models and datasets in a streamlined way that automatically picks up your pipeline steps as you train models.
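As a brief sketch of the Artifacts workflow mentioned above (the artifact name and file path are illustrative):

```python
import wandb

run = wandb.init(project="my-tf-integration")

# Log a dataset as a versioned artifact; the same pattern works for models.
artifact = wandb.Artifact("mnist-preprocessed", type="dataset")  # illustrative name
artifact.add_file("x_train.npy")  # placeholder file path
run.log_artifact(artifact)

run.finish()
```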
5 - TensorFlow Sweeps
Use Weights & Biases for machine learning experiment tracking, dataset versioning, and project collaboration. Use Weights & Biases Sweeps to automate hyperparameter optimization and explore the space of possible models, complete with interactive dashboards like this:
Why Should I Use Sweeps?
- Quick setup: With just a few lines of code, you can run W&B sweeps.
- Transparent: The project cites all algorithms used, and the code is open source.
- Powerful: Sweeps are completely customizable and configurable. You can launch a sweep across dozens of machines, and it’s just as easy as starting a sweep on your laptop.
Check out the official documentation
What this notebook covers
- Simple steps to get started with W&B Sweeps with a custom training loop in TensorFlow.
- Finding the best hyperparameters for an image classification task.
Note: Sections starting with Step are all you need to perform a hyperparameter sweep in existing code. The rest of the code is there to set up a simple example.
Install, Import, and Log in
Install W&B
%%capture
!pip install wandb
Import W&B and Login
import tqdm
import tensorflow as tf
from tensorflow import keras
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import wandb
from wandb.integration.keras import WandbMetricsLogger
wandb.login()
Side note: If this is your first time using W&B or you are not logged in, the link that appears after running `wandb.login()` will take you to a sign-up/login page. Signing up is as easy as a few clicks.
Prepare Dataset
# Prepare the training dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train / 255.0
x_test = x_test / 255.0
x_train = np.reshape(x_train, (-1, 784))
x_test = np.reshape(x_test, (-1, 784))
Build a Simple Classifier MLP
def Model():
inputs = keras.Input(shape=(784,), name="digits")
x1 = keras.layers.Dense(64, activation="relu")(inputs)
x2 = keras.layers.Dense(64, activation="relu")(x1)
outputs = keras.layers.Dense(10, name="predictions")(x2)
return keras.Model(inputs=inputs, outputs=outputs)
def train_step(x, y, model, optimizer, loss_fn, train_acc_metric):
with tf.GradientTape() as tape:
logits = model(x, training=True)
loss_value = loss_fn(y, logits)
grads = tape.gradient(loss_value, model.trainable_weights)
optimizer.apply_gradients(zip(grads, model.trainable_weights))
train_acc_metric.update_state(y, logits)
return loss_value
def test_step(x, y, model, loss_fn, val_acc_metric):
val_logits = model(x, training=False)
loss_value = loss_fn(y, val_logits)
val_acc_metric.update_state(y, val_logits)
return loss_value
Write a Training Loop
def train(
train_dataset,
val_dataset,
model,
optimizer,
loss_fn,
train_acc_metric,
val_acc_metric,
epochs=10,
log_step=200,
val_log_step=50,
):
for epoch in range(epochs):
print("\nStart of epoch %d" % (epoch,))
train_loss = []
val_loss = []
# Iterate over the batches of the dataset
for step, (x_batch_train, y_batch_train) in tqdm.tqdm(
enumerate(train_dataset), total=len(train_dataset)
):
loss_value = train_step(
x_batch_train,
y_batch_train,
model,
optimizer,
loss_fn,
train_acc_metric,
)
train_loss.append(float(loss_value))
# Run a validation loop at the end of each epoch
for step, (x_batch_val, y_batch_val) in enumerate(val_dataset):
val_loss_value = test_step(
x_batch_val, y_batch_val, model, loss_fn, val_acc_metric
)
val_loss.append(float(val_loss_value))
# Display metrics at the end of each epoch
train_acc = train_acc_metric.result()
print("Training acc over epoch: %.4f" % (float(train_acc),))
val_acc = val_acc_metric.result()
print("Validation acc: %.4f" % (float(val_acc),))
# Reset metrics at the end of each epoch
train_acc_metric.reset_states()
val_acc_metric.reset_states()
# 3️⃣ log metrics using wandb.log
wandb.log(
{
"epochs": epoch,
"loss": np.mean(train_loss),
"acc": float(train_acc),
"val_loss": np.mean(val_loss),
"val_acc": float(val_acc),
}
)
Configure the Sweep
This is where you will:
- Define the hyperparameters you’re sweeping over
- Provide your hyperparameter optimization method. We have `random`, `grid`, and `bayes` methods.
- Provide an objective and a `metric` if using `bayes`, for example to `minimize` the `val_loss`.
- Use `hyperband` for early termination of poorly performing runs.

Check out more on Sweep Configs
sweep_config = {
"method": "random",
"metric": {"name": "val_loss", "goal": "minimize"},
"early_terminate": {"type": "hyperband", "min_iter": 5},
"parameters": {
"batch_size": {"values": [32, 64, 128, 256]},
"learning_rate": {"values": [0.01, 0.005, 0.001, 0.0005, 0.0001]},
},
}
Wrap the Training Loop
You’ll need a function, like `sweep_train` below, that uses `wandb.config` to set the hyperparameters before `train` gets called.
def sweep_train(config_defaults=None):
# Set default values
config_defaults = {"batch_size": 64, "learning_rate": 0.01}
# Initialize wandb with a sample project name
wandb.init(config=config_defaults) # this gets over-written in the Sweep
# Specify the other hyperparameters to the configuration, if any
wandb.config.epochs = 2
wandb.config.log_step = 20
wandb.config.val_log_step = 50
wandb.config.architecture_name = "MLP"
wandb.config.dataset_name = "MNIST"
# build input pipeline using tf.data
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = (
train_dataset.shuffle(buffer_size=1024)
.batch(wandb.config.batch_size)
.prefetch(buffer_size=tf.data.AUTOTUNE)
)
val_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test))
val_dataset = val_dataset.batch(wandb.config.batch_size).prefetch(
buffer_size=tf.data.AUTOTUNE
)
# initialize model
model = Model()
# Instantiate an optimizer to train the model.
optimizer = keras.optimizers.SGD(learning_rate=wandb.config.learning_rate)
# Instantiate a loss function.
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
# Prepare the metrics.
train_acc_metric = keras.metrics.SparseCategoricalAccuracy()
val_acc_metric = keras.metrics.SparseCategoricalAccuracy()
train(
train_dataset,
val_dataset,
model,
optimizer,
loss_fn,
train_acc_metric,
val_acc_metric,
epochs=wandb.config.epochs,
log_step=wandb.config.log_step,
val_log_step=wandb.config.val_log_step,
)
Initialize Sweep and Run Agent
sweep_id = wandb.sweep(sweep_config, project="sweeps-tensorflow")
You can limit the number of total runs with the `count` parameter. We limit it to 10 to make the script run fast; feel free to increase the number of runs and see what happens.
wandb.agent(sweep_id, function=sweep_train, count=10)
Visualize Results
Click on the Sweep URL link above to see your live results.
Example Gallery
See examples of projects tracked and visualized with W&B in the Gallery →
Best Practices
- Projects: Log multiple runs to a project to compare them.
wandb.init(project="project-name")
- Groups: For multiple processes or cross validation folds, log each process as a run and group them together.
wandb.init(group='experiment-1')
- Tags: Add tags to track your current baseline or production model.
- Notes: Type notes in the table to track the changes between runs.
- Reports: Take quick notes on progress to share with colleagues and make dashboards and snapshots of your ML projects.
Advanced Setup
- Environment variables: Set API keys in environment variables so you can run training on a managed cluster.
- Offline mode
- On-prem: Install W&B in a private cloud or air-gapped servers in your own infrastructure. Everyone from academics to enterprise teams uses local installations.
6 - 3D brain tumor segmentation with MONAI
This tutorial demonstrates how to construct a training workflow for a multi-label 3D brain tumor segmentation task using MONAI, and how to use the experiment tracking and data visualization features of Weights & Biases. The tutorial contains the following features:
- Initialize a Weights & Biases run and synchronize all configs associated with the run for reproducibility.
- MONAI transform API:
- MONAI Transforms for dictionary format data.
- How to define a new transform according to MONAI
transforms
API. - How to randomly adjust intensity for data augmentation.
- Data Loading and Visualization:
- Load
Nifti
image with metadata, load a list of images and stack them. - Cache IO and transforms to accelerate training and validation.
- Visualize the data using
wandb.Table
and interactive segmentation overlay on Weights & Biases.
- Load
- Training a 3D
SegResNet
model- Using the
networks
,losses
, andmetrics
APIs from MONAI. - Training the 3D
SegResNet
model using a PyTorch training loop. - Track the training experiment using Weights & Biases.
- Log and version model checkpoints as model artifacts on Weights & Biases.
- Using the
- Visualize and compare the predictions on the validation dataset using
wandb.Table
and interactive segmentation overlay on Weights & Biases.
Setup and Installation
First, install the latest version of both MONAI and Weights and Biases.
!python -c "import monai" || pip install -q -U "monai[nibabel, tqdm]"
!python -c "import wandb" || pip install -q -U wandb
import os
import numpy as np
from tqdm.auto import tqdm
import wandb
from monai.apps import DecathlonDataset
from monai.data import DataLoader, decollate_batch
from monai.losses import DiceLoss
from monai.inferers import sliding_window_inference
from monai.metrics import DiceMetric
from monai.networks.nets import SegResNet
from monai.transforms import (
Activations,
AsDiscrete,
Compose,
LoadImaged,
MapTransform,
NormalizeIntensityd,
Orientationd,
RandFlipd,
RandScaleIntensityd,
RandShiftIntensityd,
RandSpatialCropd,
Spacingd,
EnsureTyped,
EnsureChannelFirstd,
)
from monai.utils import set_determinism
import torch
Then, authenticate the Colab instance to use W&B.
wandb.login()
Initialize a W&B Run
Start a new W&B run to start tracking the experiment.
wandb.init(project="monai-brain-tumor-segmentation")
Using a proper config system is a recommended best practice for reproducible machine learning. You can track the hyperparameters for every experiment using W&B.
config = wandb.config
config.seed = 0
config.roi_size = [224, 224, 144]
config.batch_size = 1
config.num_workers = 4
config.max_train_images_visualized = 20
config.max_val_images_visualized = 20
config.dice_loss_smoothen_numerator = 0
config.dice_loss_smoothen_denominator = 1e-5
config.dice_loss_squared_prediction = True
config.dice_loss_target_onehot = False
config.dice_loss_apply_sigmoid = True
config.initial_learning_rate = 1e-4
config.weight_decay = 1e-5
config.max_train_epochs = 50
config.validation_intervals = 1
config.dataset_dir = "./dataset/"
config.checkpoint_dir = "./checkpoints"
config.inference_roi_size = (128, 128, 64)
config.max_prediction_images_visualized = 20
You also need to set the random seed for modules to enable or disable deterministic training.
set_determinism(seed=config.seed)
# Create directories
os.makedirs(config.dataset_dir, exist_ok=True)
os.makedirs(config.checkpoint_dir, exist_ok=True)
Data Loading and Transformation
Here, use the `monai.transforms` API to create a custom transform that converts the multi-class labels into a multi-label segmentation task in one-hot format.
class ConvertToMultiChannelBasedOnBratsClassesd(MapTransform):
"""
Convert labels to multi channels based on brats classes:
label 1 is the peritumoral edema
label 2 is the GD-enhancing tumor
label 3 is the necrotic and non-enhancing tumor core
The possible classes are TC (Tumor core), WT (Whole tumor)
and ET (Enhancing tumor).
Reference: https://github.com/Project-MONAI/tutorials/blob/main/3d_segmentation/brats_segmentation_3d.ipynb
"""
def __call__(self, data):
d = dict(data)
for key in self.keys:
result = []
# merge label 2 and label 3 to construct TC
result.append(torch.logical_or(d[key] == 2, d[key] == 3))
# merge labels 1, 2 and 3 to construct WT
result.append(
torch.logical_or(
torch.logical_or(d[key] == 2, d[key] == 3), d[key] == 1
)
)
# label 2 is ET
result.append(d[key] == 2)
d[key] = torch.stack(result, dim=0).float()
return d
Next, set up transforms for training and validation datasets respectively.
train_transform = Compose(
[
# load 4 Nifti images and stack them together
LoadImaged(keys=["image", "label"]),
EnsureChannelFirstd(keys="image"),
EnsureTyped(keys=["image", "label"]),
ConvertToMultiChannelBasedOnBratsClassesd(keys="label"),
Orientationd(keys=["image", "label"], axcodes="RAS"),
Spacingd(
keys=["image", "label"],
pixdim=(1.0, 1.0, 1.0),
mode=("bilinear", "nearest"),
),
RandSpatialCropd(
keys=["image", "label"], roi_size=config.roi_size, random_size=False
),
RandFlipd(keys=["image", "label"], prob=0.5, spatial_axis=0),
RandFlipd(keys=["image", "label"], prob=0.5, spatial_axis=1),
RandFlipd(keys=["image", "label"], prob=0.5, spatial_axis=2),
NormalizeIntensityd(keys="image", nonzero=True, channel_wise=True),
RandScaleIntensityd(keys="image", factors=0.1, prob=1.0),
RandShiftIntensityd(keys="image", offsets=0.1, prob=1.0),
]
)
val_transform = Compose(
[
LoadImaged(keys=["image", "label"]),
EnsureChannelFirstd(keys="image"),
EnsureTyped(keys=["image", "label"]),
ConvertToMultiChannelBasedOnBratsClassesd(keys="label"),
Orientationd(keys=["image", "label"], axcodes="RAS"),
Spacingd(
keys=["image", "label"],
pixdim=(1.0, 1.0, 1.0),
mode=("bilinear", "nearest"),
),
NormalizeIntensityd(keys="image", nonzero=True, channel_wise=True),
]
)
The Dataset
The dataset used for this experiment comes from http://medicaldecathlon.com/. It uses multi-modal multi-site MRI data (FLAIR, T1w, T1gd, T2w) to segment gliomas, necrotic/active tumour, and oedema. The dataset consists of 750 4D volumes (484 training + 266 testing).

Use the `DecathlonDataset` to automatically download and extract the dataset. It inherits from the MONAI `CacheDataset`, which enables you to set `cache_num=N` to cache `N` items for training, and use the default arguments to cache all the items for validation, depending on your memory size.
train_dataset = DecathlonDataset(
root_dir=config.dataset_dir,
task="Task01_BrainTumour",
transform=val_transform,
section="training",
download=True,
cache_rate=0.0,
num_workers=4,
)
val_dataset = DecathlonDataset(
root_dir=config.dataset_dir,
task="Task01_BrainTumour",
transform=val_transform,
section="validation",
download=False,
cache_rate=0.0,
num_workers=4,
)
Note: Instead of applying `train_transform` to the `train_dataset`, apply `val_transform` to both the training and validation datasets. This is because, before training, you would be visualizing samples from both the splits of the dataset.

Visualizing the Dataset
Weights & Biases supports images, video, audio, and more. You can log rich media to explore your results and visually compare our runs, models, and datasets. Use the segmentation mask overlay system to visualize our data volumes. To log segmentation masks in tables, you must provide a wandb.Image
object for each row in the table.
An example is provided in the pseudocode below:
table = wandb.Table(columns=["ID", "Image"])
for id, img, label in zip(ids, images, labels):
mask_img = wandb.Image(
img,
masks={
"prediction": {"mask_data": label, "class_labels": class_labels}
# ...
},
)
table.add_data(id, mask_img)
wandb.log({"Table": table})
Now write a simple utility function that takes a sample image, a label, a `wandb.Table` object, and some associated metadata, and populates the rows of a table to be logged to the Weights & Biases dashboard.
def log_data_samples_into_tables(
sample_image: np.array,
sample_label: np.array,
split: str = None,
data_idx: int = None,
table: wandb.Table = None,
):
num_channels, _, _, num_slices = sample_image.shape
with tqdm(total=num_slices, leave=False) as progress_bar:
for slice_idx in range(num_slices):
ground_truth_wandb_images = []
for channel_idx in range(num_channels):
                masks = {
                    "ground-truth/Tumor-Core": {
                        "mask_data": sample_label[0, :, :, slice_idx],
                        "class_labels": {0: "background", 1: "Tumor Core"},
                    },
                    "ground-truth/Whole-Tumor": {
                        "mask_data": sample_label[1, :, :, slice_idx] * 2,
                        "class_labels": {0: "background", 2: "Whole Tumor"},
                    },
                    "ground-truth/Enhancing-Tumor": {
                        "mask_data": sample_label[2, :, :, slice_idx] * 3,
                        "class_labels": {0: "background", 3: "Enhancing Tumor"},
                    },
                }
                ground_truth_wandb_images.append(
                    wandb.Image(
                        sample_image[channel_idx, :, :, slice_idx],
                        masks=masks,
                    )
                )
table.add_data(split, data_idx, slice_idx, *ground_truth_wandb_images)
progress_bar.update(1)
return table
Next, define the `wandb.Table` object and the columns it consists of, so that it can be populated with the data visualizations.
table = wandb.Table(
columns=[
"Split",
"Data Index",
"Slice Index",
"Image-Channel-0",
"Image-Channel-1",
"Image-Channel-2",
"Image-Channel-3",
]
)
Then, loop over the `train_dataset` and `val_dataset` respectively to generate the visualizations for the data samples and populate the rows of the table to be logged to the dashboard.
# Generate visualizations for train_dataset
max_samples = (
min(config.max_train_images_visualized, len(train_dataset))
if config.max_train_images_visualized > 0
else len(train_dataset)
)
progress_bar = tqdm(
enumerate(train_dataset[:max_samples]),
total=max_samples,
desc="Generating Train Dataset Visualizations:",
)
for data_idx, sample in progress_bar:
sample_image = sample["image"].detach().cpu().numpy()
sample_label = sample["label"].detach().cpu().numpy()
table = log_data_samples_into_tables(
sample_image,
sample_label,
split="train",
data_idx=data_idx,
table=table,
)
# Generate visualizations for val_dataset
max_samples = (
min(config.max_val_images_visualized, len(val_dataset))
if config.max_val_images_visualized > 0
else len(val_dataset)
)
progress_bar = tqdm(
enumerate(val_dataset[:max_samples]),
total=max_samples,
desc="Generating Validation Dataset Visualizations:",
)
for data_idx, sample in progress_bar:
sample_image = sample["image"].detach().cpu().numpy()
sample_label = sample["label"].detach().cpu().numpy()
table = log_data_samples_into_tables(
sample_image,
sample_label,
split="val",
data_idx=data_idx,
table=table,
)
# Log the table to your dashboard
wandb.log({"Tumor-Segmentation-Data": table})
The data appears on the W&B dashboard in an interactive tabular format. We can see each channel of a particular slice from a data volume overlaid with the respective segmentation mask in each row. You can write Weave queries to filter the data on the table and focus on one particular row.
An example of logged table data.
Open an image and see how you can interact with each of the segmentation masks using the interactive overlay.
An example of visualized segmentation maps.
Loading the Data
Create the PyTorch DataLoaders for loading the data from the datasets. Before creating the DataLoaders, set the `transform` for `train_dataset` to `train_transform` to pre-process and transform the data for training.
# apply train_transforms to the training dataset
train_dataset.transform = train_transform
# create the train_loader
train_loader = DataLoader(
train_dataset,
batch_size=config.batch_size,
shuffle=True,
num_workers=config.num_workers,
)
# create the val_loader
val_loader = DataLoader(
val_dataset,
batch_size=config.batch_size,
shuffle=False,
num_workers=config.num_workers,
)
Creating the Model, Loss, and Optimizer
This tutorial creates a `SegResNet` model based on the paper 3D MRI brain tumor segmentation using auto-encoder regularization. The `SegResNet` model comes implemented as a PyTorch Module as part of the `monai.networks` API. Also create an optimizer and a learning rate scheduler.
device = torch.device("cuda:0")
# create model
model = SegResNet(
blocks_down=[1, 2, 2, 4],
blocks_up=[1, 1, 1],
init_filters=16,
in_channels=4,
out_channels=3,
dropout_prob=0.2,
).to(device)
# create optimizer
optimizer = torch.optim.Adam(
model.parameters(),
config.initial_learning_rate,
weight_decay=config.weight_decay,
)
# create learning rate scheduler
lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
optimizer, T_max=config.max_train_epochs
)
Define the loss as multi-label `DiceLoss` using the `monai.losses` API, and the corresponding dice metrics using the `monai.metrics` API.
loss_function = DiceLoss(
smooth_nr=config.dice_loss_smoothen_numerator,
smooth_dr=config.dice_loss_smoothen_denominator,
squared_pred=config.dice_loss_squared_prediction,
to_onehot_y=config.dice_loss_target_onehot,
sigmoid=config.dice_loss_apply_sigmoid,
)
dice_metric = DiceMetric(include_background=True, reduction="mean")
dice_metric_batch = DiceMetric(include_background=True, reduction="mean_batch")
post_trans = Compose([Activations(sigmoid=True), AsDiscrete(threshold=0.5)])
# use automatic mixed-precision to accelerate training
scaler = torch.cuda.amp.GradScaler()
torch.backends.cudnn.benchmark = True
Define a small utility for mixed-precision inference. This will be useful during the validation step of the training process and when you want to run the model after training.
def inference(model, input):
def _compute(input):
return sliding_window_inference(
inputs=input,
roi_size=(240, 240, 160),
sw_batch_size=1,
predictor=model,
overlap=0.5,
)
with torch.cuda.amp.autocast():
return _compute(input)
Training and Validation
Before training, define the metric properties which will later be logged with `wandb.log()` for tracking the training and validation experiments.
wandb.define_metric("epoch/epoch_step")
wandb.define_metric("epoch/*", step_metric="epoch/epoch_step")
wandb.define_metric("batch/batch_step")
wandb.define_metric("batch/*", step_metric="batch/batch_step")
wandb.define_metric("validation/validation_step")
wandb.define_metric("validation/*", step_metric="validation/validation_step")
batch_step = 0
validation_step = 0
metric_values = []
metric_values_tumor_core = []
metric_values_whole_tumor = []
metric_values_enhanced_tumor = []
Execute Standard PyTorch Training Loop
# Define a W&B Artifact object
artifact = wandb.Artifact(
name=f"{wandb.run.id}-checkpoint", type="model"
)
epoch_progress_bar = tqdm(range(config.max_train_epochs), desc="Training:")
for epoch in epoch_progress_bar:
model.train()
epoch_loss = 0
total_batch_steps = len(train_dataset) // train_loader.batch_size
batch_progress_bar = tqdm(train_loader, total=total_batch_steps, leave=False)
# Training Step
for batch_data in batch_progress_bar:
inputs, labels = (
batch_data["image"].to(device),
batch_data["label"].to(device),
)
optimizer.zero_grad()
with torch.cuda.amp.autocast():
outputs = model(inputs)
loss = loss_function(outputs, labels)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
epoch_loss += loss.item()
batch_progress_bar.set_description(f"train_loss: {loss.item():.4f}:")
## Log batch-wise training loss to W&B
wandb.log({"batch/batch_step": batch_step, "batch/train_loss": loss.item()})
batch_step += 1
lr_scheduler.step()
epoch_loss /= total_batch_steps
## Log batch-wise training loss and learning rate to W&B
wandb.log(
{
"epoch/epoch_step": epoch,
"epoch/mean_train_loss": epoch_loss,
"epoch/learning_rate": lr_scheduler.get_last_lr()[0],
}
)
epoch_progress_bar.set_description(f"Training: train_loss: {epoch_loss:.4f}:")
# Validation and model checkpointing step
if (epoch + 1) % config.validation_intervals == 0:
model.eval()
with torch.no_grad():
for val_data in val_loader:
val_inputs, val_labels = (
val_data["image"].to(device),
val_data["label"].to(device),
)
val_outputs = inference(model, val_inputs)
val_outputs = [post_trans(i) for i in decollate_batch(val_outputs)]
dice_metric(y_pred=val_outputs, y=val_labels)
dice_metric_batch(y_pred=val_outputs, y=val_labels)
metric_values.append(dice_metric.aggregate().item())
metric_batch = dice_metric_batch.aggregate()
metric_values_tumor_core.append(metric_batch[0].item())
metric_values_whole_tumor.append(metric_batch[1].item())
metric_values_enhanced_tumor.append(metric_batch[2].item())
dice_metric.reset()
dice_metric_batch.reset()
checkpoint_path = os.path.join(config.checkpoint_dir, "model.pth")
torch.save(model.state_dict(), checkpoint_path)
# Log and version model checkpoints using W&B artifacts.
artifact.add_file(local_path=checkpoint_path)
wandb.log_artifact(artifact, aliases=[f"epoch_{epoch}"])
# Log validation metrics to W&B dashboard.
wandb.log(
{
"validation/validation_step": validation_step,
"validation/mean_dice": metric_values[-1],
"validation/mean_dice_tumor_core": metric_values_tumor_core[-1],
"validation/mean_dice_whole_tumor": metric_values_whole_tumor[-1],
"validation/mean_dice_enhanced_tumor": metric_values_enhanced_tumor[-1],
}
)
validation_step += 1
# Wait for this artifact to finish logging
artifact.wait()
Instrumenting the code with wandb.log
not only enables tracking all the metrics associated with training and validation, but also all the system metrics (our CPU and GPU in this case) on the W&B dashboard.
An example of training and validation process tracking on W&B.
Navigate to the artifacts tab in the W&B run dashboard to access the different versions of model checkpoint artifacts logged during training.
An example of model checkpoint logging and versioning on W&B.
Inference
Using the artifacts interface, you can select which version of the artifact is the best model checkpoint, in this case according to the mean epoch-wise training loss. You can also explore the entire lineage of the artifact and use the version that you need.
An example of model artifact tracking on W&B.
Fetch the version of the model artifact with the best epoch-wise mean training loss and load the checkpoint state dictionary to the model.
model_artifact = wandb.use_artifact(
"geekyrakshit/monai-brain-tumor-segmentation/d5ex6n4a-checkpoint:v49",
type="model",
)
model_artifact_dir = model_artifact.download()
model.load_state_dict(torch.load(os.path.join(model_artifact_dir, "model.pth")))
model.eval()
Visualizing Predictions and Comparing with the Ground Truth Labels
Create another utility function to visualize the predictions of the trained model and compare them with the corresponding ground-truth segmentation masks using the interactive segmentation mask overlay.
def log_predictions_into_tables(
sample_image: np.ndarray,
sample_label: np.ndarray,
predicted_label: np.ndarray,
split: str = None,
data_idx: int = None,
table: wandb.Table = None,
):
num_channels, _, _, num_slices = sample_image.shape
with tqdm(total=num_slices, leave=False) as progress_bar:
for slice_idx in range(num_slices):
wandb_images = []
for channel_idx in range(num_channels):
wandb_images += [
wandb.Image(
sample_image[channel_idx, :, :, slice_idx],
masks={
"ground-truth/Tumor-Core": {
"mask_data": sample_label[0, :, :, slice_idx],
"class_labels": {0: "background", 1: "Tumor Core"},
},
"prediction/Tumor-Core": {
"mask_data": predicted_label[0, :, :, slice_idx] * 2,
"class_labels": {0: "background", 2: "Tumor Core"},
},
},
),
wandb.Image(
sample_image[channel_idx, :, :, slice_idx],
masks={
"ground-truth/Whole-Tumor": {
"mask_data": sample_label[1, :, :, slice_idx],
"class_labels": {0: "background", 1: "Whole Tumor"},
},
"prediction/Whole-Tumor": {
"mask_data": predicted_label[1, :, :, slice_idx] * 2,
"class_labels": {0: "background", 2: "Whole Tumor"},
},
},
),
wandb.Image(
sample_image[channel_idx, :, :, slice_idx],
masks={
"ground-truth/Enhancing-Tumor": {
"mask_data": sample_label[2, :, :, slice_idx],
"class_labels": {0: "background", 1: "Enhancing Tumor"},
},
"prediction/Enhancing-Tumor": {
"mask_data": predicted_label[2, :, :, slice_idx] * 2,
"class_labels": {0: "background", 2: "Enhancing Tumor"},
},
},
),
]
table.add_data(split, data_idx, slice_idx, *wandb_images)
progress_bar.update(1)
return table
Log the prediction results to the prediction table.
# create the prediction table
prediction_table = wandb.Table(
columns=[
"Split",
"Data Index",
"Slice Index",
"Image-Channel-0/Tumor-Core",
"Image-Channel-1/Tumor-Core",
"Image-Channel-2/Tumor-Core",
"Image-Channel-3/Tumor-Core",
"Image-Channel-0/Whole-Tumor",
"Image-Channel-1/Whole-Tumor",
"Image-Channel-2/Whole-Tumor",
"Image-Channel-3/Whole-Tumor",
"Image-Channel-0/Enhancing-Tumor",
"Image-Channel-1/Enhancing-Tumor",
"Image-Channel-2/Enhancing-Tumor",
"Image-Channel-3/Enhancing-Tumor",
]
)
# Perform inference and visualization
with torch.no_grad():
max_samples = (
min(config.max_prediction_images_visualized, len(val_dataset))
if config.max_prediction_images_visualized > 0
else len(val_dataset)
)
progress_bar = tqdm(
enumerate(val_dataset[:max_samples]),
total=max_samples,
desc="Generating Predictions:",
)
for data_idx, sample in progress_bar:
val_input = sample["image"].unsqueeze(0).to(device)
val_output = inference(model, val_input)
val_output = post_trans(val_output[0])
prediction_table = log_predictions_into_tables(
sample_image=sample["image"].cpu().numpy(),
sample_label=sample["label"].cpu().numpy(),
predicted_label=val_output.cpu().numpy(),
data_idx=data_idx,
split="validation",
table=prediction_table,
)
wandb.log({"Predictions/Tumor-Segmentation-Data": prediction_table})
# End the experiment
wandb.finish()
Use the interactive segmentation mask overlay to analyze and compare the predicted segmentation masks and the ground-truth labels for each class.
An example of predictions and ground-truth visualization on W&B.
Acknowledgements and more resources
7 - Keras
Use Weights & Biases for machine learning experiment tracking, dataset versioning, and project collaboration. This Colab notebook introduces the WandbMetricsLogger
callback. Use this callback for experiment tracking; it logs your training and validation metrics along with system metrics to Weights & Biases.
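As a minimal sketch (model and train_data here are hypothetical placeholders for a compiled Keras model and a tf.data pipeline), the callback is simply passed to model.fit(). Its log_freq argument accepts "epoch" (the default), "batch", or an integer N to log every N batches:
import wandb
from wandb.integration.keras import WandbMetricsLogger

run = wandb.init(project="my-project")  # hypothetical project name
model.fit(
    train_data,  # hypothetical tf.data.Dataset yielding (image, label) batches
    epochs=5,
    callbacks=[WandbMetricsLogger(log_freq="epoch")],  # log metrics once per epoch
)
run.finish()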
Setup and Installation
First, let us install the latest version of Weights & Biases. We will then authenticate this Colab instance to use W&B.
!pip install -qq -U wandb
import os
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras import models
import tensorflow_datasets as tfds
# Weights and Biases related imports
import wandb
from wandb.integration.keras import WandbMetricsLogger
If this is your first time using W&B or you are not logged in, the link that appears after running wandb.login()
will take you to the sign-up/login page. Signing up for a free account takes only a few clicks.
wandb.login()
Hyperparameters
Using a proper config system is a recommended best practice for reproducible machine learning. We can track the hyperparameters for every experiment using W&B. In this colab we will be using a simple Python dict
as our config system.
configs = dict(
num_classes=10,
shuffle_buffer=1024,
batch_size=64,
image_size=28,
image_channels=1,
earlystopping_patience=3,
learning_rate=1e-3,
epochs=10,
)
Dataset
In this colab, we will be using the Fashion-MNIST dataset from the TensorFlow Datasets catalog. We aim to build a simple image classification pipeline using TensorFlow/Keras.
train_ds, valid_ds = tfds.load("fashion_mnist", split=["train", "test"])
AUTOTUNE = tf.data.AUTOTUNE
def parse_data(example):
# Get image
image = example["image"]
# image = tf.image.convert_image_dtype(image, dtype=tf.float32)
# Get label
label = example["label"]
label = tf.one_hot(label, depth=configs["num_classes"])
return image, label
def get_dataloader(ds, configs, dataloader_type="train"):
dataloader = ds.map(parse_data, num_parallel_calls=AUTOTUNE)
if dataloader_type == "train":
dataloader = dataloader.shuffle(configs["shuffle_buffer"])
dataloader = dataloader.batch(configs["batch_size"]).prefetch(AUTOTUNE)
return dataloader
trainloader = get_dataloader(train_ds, configs)
validloader = get_dataloader(valid_ds, configs, dataloader_type="valid")
Model
def get_model(configs):
backbone = tf.keras.applications.mobilenet_v2.MobileNetV2(
weights="imagenet", include_top=False
)
backbone.trainable = False
inputs = layers.Input(
shape=(configs["image_size"], configs["image_size"], configs["image_channels"])
)
resize = layers.Resizing(32, 32)(inputs)
neck = layers.Conv2D(3, (3, 3), padding="same")(resize)
preprocess_input = tf.keras.applications.mobilenet.preprocess_input(neck)
x = backbone(preprocess_input)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(configs["num_classes"], activation="softmax")(x)
return models.Model(inputs=inputs, outputs=outputs)
tf.keras.backend.clear_session()
model = get_model(configs)
model.summary()
Compile Model
model.compile(
optimizer="adam",
loss="categorical_crossentropy",
metrics=[
"accuracy",
tf.keras.metrics.TopKCategoricalAccuracy(k=5, name="top@5_accuracy"),
],
)
Train
# Initialize a W&B run
run = wandb.init(project="intro-keras", config=configs)
# Train your model
model.fit(
trainloader,
epochs=configs["epochs"],
validation_data=validloader,
callbacks=[
WandbMetricsLogger(log_freq=10)
], # Notice the use of WandbMetricsLogger here
)
# Close the W&B run
run.finish()
8 - Keras models
Use Weights & Biases for machine learning experiment tracking, dataset versioning, and project collaboration. This Colab notebook introduces the WandbModelCheckpoint
callback. Use this callback to log your model checkpoints to Weights & Biases Artifacts.
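WandbModelCheckpoint subclasses tf.keras.callbacks.ModelCheckpoint, so the familiar ModelCheckpoint arguments apply. A minimal sketch, with hypothetical argument values:
from wandb.integration.keras import WandbModelCheckpoint

checkpoint_callback = WandbModelCheckpoint(
    filepath="models/",    # local path where checkpoints are written before upload
    monitor="val_loss",    # metric used to decide which checkpoint is best
    save_best_only=False,  # set True to keep only the best-performing checkpoint
)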
Setup and Installation
First, let us install the latest version of Weights & Biases. We will then authenticate this Colab instance to use W&B.
!pip install -qq -U wandb
import os
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras import models
import tensorflow_datasets as tfds
# Weights and Biases related imports
import wandb
from wandb.integration.keras import WandbMetricsLogger
from wandb.integration.keras import WandbModelCheckpoint
If this is your first time using W&B or you are not logged in, the link that appears after running wandb.login()
will take you to the sign-up/login page. Signing up for a free account takes only a few clicks.
wandb.login()
Hyperparameters
Using a proper config system is a recommended best practice for reproducible machine learning. We can track the hyperparameters for every experiment using W&B. In this colab we will be using a simple Python dict
as our config system.
configs = dict(
    num_classes=10,
    shuffle_buffer=1024,
    batch_size=64,
    image_size=28,
    image_channels=1,
    earlystopping_patience=3,
    learning_rate=1e-3,
    epochs=10,
)
Dataset
In this colab, we will be using the Fashion-MNIST dataset from the TensorFlow Datasets catalog. We aim to build a simple image classification pipeline using TensorFlow/Keras.
train_ds, valid_ds = tfds.load('fashion_mnist', split=['train', 'test'])
AUTOTUNE = tf.data.AUTOTUNE
def parse_data(example):
# Get image
image = example["image"]
# image = tf.image.convert_image_dtype(image, dtype=tf.float32)
# Get label
label = example["label"]
label = tf.one_hot(label, depth=configs["num_classes"])
return image, label
def get_dataloader(ds, configs, dataloader_type="train"):
    dataloader = ds.map(parse_data, num_parallel_calls=AUTOTUNE)
    if dataloader_type == "train":
        dataloader = dataloader.shuffle(configs["shuffle_buffer"])
    dataloader = dataloader.batch(configs["batch_size"]).prefetch(AUTOTUNE)
    return dataloader
trainloader = get_dataloader(train_ds, configs)
validloader = get_dataloader(valid_ds, configs, dataloader_type="valid")
Model
def get_model(configs):
    backbone = tf.keras.applications.mobilenet_v2.MobileNetV2(
        weights="imagenet", include_top=False
    )
    backbone.trainable = False
    inputs = layers.Input(
        shape=(configs["image_size"], configs["image_size"], configs["image_channels"])
    )
    resize = layers.Resizing(32, 32)(inputs)
    neck = layers.Conv2D(3, (3, 3), padding="same")(resize)
    preprocess_input = tf.keras.applications.mobilenet.preprocess_input(neck)
    x = backbone(preprocess_input)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(configs["num_classes"], activation="softmax")(x)
    return models.Model(inputs=inputs, outputs=outputs)
tf.keras.backend.clear_session()
model = get_model(configs)
model.summary()
Compile Model
model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=[
        "accuracy",
        tf.keras.metrics.TopKCategoricalAccuracy(k=5, name="top@5_accuracy"),
    ],
)
Train
# Initialize a W&B run
run = wandb.init(project="intro-keras", config=configs)
# Train your model
model.fit(
    trainloader,
    epochs=configs["epochs"],
    validation_data=validloader,
    callbacks=[
        WandbMetricsLogger(log_freq=10),
        WandbModelCheckpoint(filepath="models/"),  # Notice the use of WandbModelCheckpoint here
    ],
)
# Close the W&B run
run.finish()
9 - Keras tables
Use Weights & Biases for machine learning experiment tracking, dataset versioning, and project collaboration. This Colab notebook introduces the WandbEvalCallback,
an abstract callback that can be inherited to build useful callbacks for model prediction visualization and dataset visualization.
Setup and Installation
First, let us install the latest version of Weights & Biases. We will then authenticate this Colab instance to use W&B.
!pip install -qq -U wandb
import os
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras import models
import tensorflow_datasets as tfds
# Weights and Biases related imports
import wandb
from wandb.integration.keras import WandbMetricsLogger
from wandb.integration.keras import WandbModelCheckpoint
from wandb.integration.keras import WandbEvalCallback
If this is your first time using W&B or you are not logged in, the link that appears after running wandb.login()
will take you to the sign-up/login page. Signing up for a free account takes only a few clicks.
wandb.login()
Hyperparameters
Using a proper config system is a recommended best practice for reproducible machine learning. We can track the hyperparameters for every experiment using W&B. In this colab we will be using a simple Python dict
as our config system.
configs = dict(
num_classes=10,
shuffle_buffer=1024,
batch_size=64,
image_size=28,
image_channels=1,
earlystopping_patience=3,
learning_rate=1e-3,
epochs=10,
)
Dataset
In this colab, we will be using the Fashion-MNIST dataset from the TensorFlow Datasets catalog. We aim to build a simple image classification pipeline using TensorFlow/Keras.
train_ds, valid_ds = tfds.load("fashion_mnist", split=["train", "test"])
AUTOTUNE = tf.data.AUTOTUNE
def parse_data(example):
# Get image
image = example["image"]
# image = tf.image.convert_image_dtype(image, dtype=tf.float32)
# Get label
label = example["label"]
label = tf.one_hot(label, depth=configs["num_classes"])
return image, label
def get_dataloader(ds, configs, dataloader_type="train"):
    dataloader = ds.map(parse_data, num_parallel_calls=AUTOTUNE)
    if dataloader_type == "train":
        dataloader = dataloader.shuffle(configs["shuffle_buffer"])
    dataloader = dataloader.batch(configs["batch_size"]).prefetch(AUTOTUNE)
    return dataloader
trainloader = get_dataloader(train_ds, configs)
validloader = get_dataloader(valid_ds, configs, dataloader_type="valid")
Model
def get_model(configs):
backbone = tf.keras.applications.mobilenet_v2.MobileNetV2(
weights="imagenet", include_top=False
)
backbone.trainable = False
inputs = layers.Input(
shape=(configs["image_size"], configs["image_size"], configs["image_channels"])
)
resize = layers.Resizing(32, 32)(inputs)
neck = layers.Conv2D(3, (3, 3), padding="same")(resize)
preprocess_input = tf.keras.applications.mobilenet.preprocess_input(neck)
x = backbone(preprocess_input)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(configs["num_classes"], activation="softmax")(x)
return models.Model(inputs=inputs, outputs=outputs)
tf.keras.backend.clear_session()
model = get_model(configs)
model.summary()
Compile Model
model.compile(
optimizer="adam",
loss="categorical_crossentropy",
metrics=[
"accuracy",
tf.keras.metrics.TopKCategoricalAccuracy(k=5, name="top@5_accuracy"),
],
)
WandbEvalCallback
The WandbEvalCallback
is an abstract base class for building Keras callbacks, primarily for model prediction visualization and secondarily for dataset visualization.
This abstract callback is agnostic to the dataset and the task. To use it, inherit from this base callback class and implement the add_ground_truth
and add_model_predictions
methods.
The WandbEvalCallback
is a utility class that provides helpful methods to:
- create data and prediction wandb.Table instances,
- log data and prediction Tables as wandb.Artifact,
- log the data table on_train_begin,
- log the prediction table on_epoch_end.
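In skeletal form, a subclass only needs to populate the two tables (a structural sketch with a hypothetical class name; the concrete, runnable example follows below):
from wandb.integration.keras import WandbEvalCallback

class MyEvalCallback(WandbEvalCallback):
    def __init__(self, data_table_columns, pred_table_columns):
        super().__init__(data_table_columns, pred_table_columns)

    def add_ground_truth(self, logs=None):
        # fill self.data_table with validation samples (called once, on_train_begin)
        ...

    def add_model_predictions(self, epoch, logs=None):
        # fill self.pred_table with predictions for this epoch (called on_epoch_end)
        ...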
As an example, we have implemented WandbClfEvalCallback
below for an image classification task. This example callback:
- logs the validation data (
data_table
) to W&B, - performs inference and logs the prediction (
pred_table
) to W&B on every epoch end.
How the memory footprint is reduced
We log the data_table
to W&B when the on_train_begin
method is invoked. Once it's uploaded as a W&B Artifact, we get a reference to this table, which can be accessed using the data_table_ref
class variable. The data_table_ref
is a 2D list that can be indexed like self.data_table_ref[idx][n],
where idx
is the row number and n
is the column number. Let's see the usage in the example below.
class WandbClfEvalCallback(WandbEvalCallback):
def __init__(
self, validloader, data_table_columns, pred_table_columns, num_samples=100
):
super().__init__(data_table_columns, pred_table_columns)
self.val_data = validloader.unbatch().take(num_samples)
def add_ground_truth(self, logs=None):
for idx, (image, label) in enumerate(self.val_data):
self.data_table.add_data(idx, wandb.Image(image), np.argmax(label, axis=-1))
def add_model_predictions(self, epoch, logs=None):
# Get predictions
preds = self._inference()
table_idxs = self.data_table_ref.get_index()
for idx in table_idxs:
pred = preds[idx]
self.pred_table.add_data(
epoch,
self.data_table_ref.data[idx][0],
self.data_table_ref.data[idx][1],
self.data_table_ref.data[idx][2],
pred,
)
def _inference(self):
preds = []
for image, label in self.val_data:
pred = self.model(tf.expand_dims(image, axis=0))
argmax_pred = tf.argmax(pred, axis=-1).numpy()[0]
preds.append(argmax_pred)
return preds
Train
# Initialize a W&B run
run = wandb.init(project="intro-keras", config=configs)
# Train your model
model.fit(
trainloader,
epochs=configs["epochs"],
validation_data=validloader,
callbacks=[
WandbMetricsLogger(log_freq=10),
WandbClfEvalCallback(
validloader,
data_table_columns=["idx", "image", "ground_truth"],
pred_table_columns=["epoch", "idx", "image", "ground_truth", "prediction"],
), # Notice the use of WandbEvalCallback here
],
)
# Close the W&B run
run.finish()
10 - XGBoost Sweeps
Use Weights & Biases for machine learning experiment tracking, dataset versioning, and project collaboration. Squeezing the best performance out of tree-based models requires selecting the right hyperparameters.
How many early_stopping_rounds
? What should the max_depth
of a tree be?
Searching through high-dimensional hyperparameter spaces to find the most performant model can get unwieldy very fast. Hyperparameter sweeps provide an organized and efficient way to conduct a battle royale of models and crown a winner. They enable this by automatically searching through combinations of hyperparameter values to find the optimal ones.
In this tutorial we’ll see how you can run sophisticated hyperparameter sweeps on XGBoost models in 3 easy steps using Weights and Biases.
For a teaser, check out the plots below:
Sweeps: An Overview
Running a hyperparameter sweep with Weights & Biases is very easy. There are just 3 simple steps:
-
Define the sweep: we do this by creating a dictionary-like object that specifies the sweep: which parameters to search through, which search strategy to use, which metric to optimize.
-
Initialize the sweep: with one line of code we initialize the sweep and pass in the dictionary of sweep configurations:
sweep_id = wandb.sweep(sweep_config)
-
Run the sweep agent: also accomplished with one line of code, we call wandb.agent() and pass the sweep_id along with a function that defines your model architecture and trains it: wandb.agent(sweep_id, function=train)
That’s all there is to running a hyperparameter sweep.
In the notebook below, we’ll walk through these 3 steps in more detail.
We highly encourage you to fork this notebook, tweak the parameters, or try the model with your own dataset.
Resources
!pip install wandb -qU
import wandb
wandb.login()
1. Define the Sweep
Weights & Biases sweeps give you powerful levers to configure your sweeps exactly how you want them, with just a few lines of code. The sweeps config can be defined as a dictionary or a YAML file.
Let’s walk through some of them together:
- Metric: This is the metric the sweeps are attempting to optimize. Metrics can take a
name
(this metric should be logged by your training script) and agoal
(maximize
orminimize
). - Search Strategy: Specified using the
"method"
key. We support several different search strategies with sweeps. - Grid Search: Iterates over every combination of hyperparameter values.
- Random Search: Iterates over randomly chosen combinations of hyperparameter values.
- Bayesian Search: Creates a probabilistic model that maps hyperparameters to probability of a metric score, and chooses parameters with high probability of improving the metric. The objective of Bayesian optimization is to spend more time in picking the hyperparameter values, but in doing so trying out fewer hyperparameter values.
- Parameters: A dictionary containing the hyperparameter names, and discrete values, a range, or distributions from which to pull their values on each iteration.
For details, see the list of all sweep configuration options.
sweep_config = {
"method": "random", # try grid or random
"metric": {
"name": "accuracy",
"goal": "maximize"
},
"parameters": {
"booster": {
"values": ["gbtree","gblinear"]
},
"max_depth": {
"values": [3, 6, 9, 12]
},
"learning_rate": {
"values": [0.1, 0.05, 0.2]
},
"subsample": {
"values": [1, 0.5, 0.3]
}
}
}
2. Initialize the Sweep
Calling wandb.sweep
starts a Sweep Controller: a centralized process that provides parameter settings to any agents that query it, and expects them to report metric performance back via wandb
logging.
sweep_id = wandb.sweep(sweep_config, project="XGBoost-sweeps")
Define your training process
Before we can run the sweep, we need to define a function that creates and trains the model – the function that takes in hyperparameter values and spits out metrics.
We’ll also need wandb
to be integrated into our script.
There are three main components:
- wandb.init(): Initialize a new W&B run. Each run is a single execution of the training script.
- wandb.config: Save all your hyperparameters in a config object. This lets you use our app to sort and compare your runs by hyperparameter values.
- wandb.log(): Logs metrics and custom objects, such as images, videos, audio files, HTML, plots, or point clouds.
We also need to download the data:
!wget https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv
# XGBoost model for Pima Indians dataset
from numpy import loadtxt
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
def train():
config_defaults = {
"booster": "gbtree",
"max_depth": 3,
"learning_rate": 0.1,
"subsample": 1,
"seed": 117,
"test_size": 0.33,
}
wandb.init(config=config_defaults)  # defaults are overridden during the sweep
config = wandb.config
# load data and split into predictors and targets
dataset = loadtxt("pima-indians-diabetes.data.csv", delimiter=",")
X, Y = dataset[:, :8], dataset[:, 8]
# split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, Y,
test_size=config.test_size,
random_state=config.seed)
# fit model on train
model = XGBClassifier(booster=config.booster, max_depth=config.max_depth,
learning_rate=config.learning_rate, subsample=config.subsample)
model.fit(X_train, y_train)
# make predictions on test
y_pred = model.predict(X_test)
predictions = [round(value) for value in y_pred]
# evaluate predictions
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy:.0%}")
wandb.log({"accuracy": accuracy})
3. Run the Sweep with an agent
Now, we call wandb.agent
to start up our sweep.
You can call wandb.agent
on any machine where you’re logged into W&B that has
- the
sweep_id
, - the dataset and
train
function
and that machine will join the sweep.
Note: a random
sweep will by default run forever, trying new parameter combinations until the cows come home – or until you turn the sweep off from the app UI. You can prevent this by providing a total count
of runs you’d like the agent
to complete.
wandb.agent(sweep_id, train, count=25)
Visualize your results
Now that your sweep is finished, it’s time to look at the results.
Weights & Biases will generate a number of useful plots for you automatically.
Parallel coordinates plot
This plot maps hyperparameter values to model metrics. It’s useful for homing in on combinations of hyperparameters that led to the best model performance.
This plot seems to indicate that using a tree as our learner slightly, but not mind-blowingly, outperforms using a simple linear model as our learner.
Hyperparameter importance plot
The hyperparameter importance plot shows which hyperparameter values had the biggest impact on your metrics.
We report both the correlation (treating it as a linear predictor) and the feature importance (after training a random forest on your results) so you can see which parameters had the biggest effect and whether that effect was positive or negative.
Reading this chart, we see quantitative confirmation
of the trend we noticed in the parallel coordinates chart above:
the largest impact on validation accuracy came from the choice of
learner, and the gblinear
learners were generally worse than gbtree
learners.
These visualizations can help you save both time and resources running expensive hyperparameter optimizations by homing in on the parameters (and value ranges) that matter the most and are thereby worthy of further exploration.