PyTorch Lightning
Build scalable, structured, high-performance PyTorch models with Lightning and log them with W&B.
PyTorch Lightning provides a lightweight wrapper for organizing your PyTorch code and easily adding advanced features such as distributed training and 16-bit precision. W&B provides a lightweight wrapper for logging your ML experiments. But you don't need to combine the two yourself: Weights & Biases is incorporated directly into the PyTorch Lightning library via the WandbLogger.

⚡ Get going lightning-fast with just two lines.

from pytorch_lightning.loggers import WandbLogger
from pytorch_lightning import Trainer

wandb_logger = WandbLogger()
trainer = Trainer(logger=wandb_logger)
Interactive dashboards accessible anywhere, and more!

Using PyTorch Lightning's WandbLogger

PyTorch Lightning has a WandbLogger class that can be used to seamlessly log metrics, model weights, media and more. Just instantiate the WandbLogger and pass it to Lightning's Trainer.
wandb_logger = WandbLogger()
trainer = Trainer(logger=wandb_logger)

Logger arguments

Below are some of the most commonly used parameters of WandbLogger. See the PyTorch Lightning WandbLogger documentation for a full list and description.
project: Define what wandb Project to log to
name: Give a name to your wandb run
log_model: Log all models if log_model="all", or only at the end of training if log_model=True
save_dir: Path where data is saved
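For example, a minimal sketch passing these arguments might look like the following (the project name, run name, and save directory below are just placeholders):

# placeholder project/run names; adjust to your own setup
wandb_logger = WandbLogger(
    project="my-awesome-project",   # W&B Project to log to
    name="first-experiment",        # name of this run
    log_model="all",                # log every checkpoint during training
    save_dir="./wandb_logs",        # local directory for W&B files
)
trainer = Trainer(logger=wandb_logger)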

Log your LightningModule hyperparameters

class LitModule(LightningModule):
    def __init__(self, *args, **kwargs):
        super().__init__()
        self.save_hyperparameters()

Log additional config parameters

# add one parameter
wandb_logger.experiment.config["key"] = value

# add multiple parameters
wandb_logger.experiment.config.update({"key1": val1, "key2": val2})

# use the wandb module directly
wandb.config["key"] = value
wandb.config.update({"key1": val1, "key2": val2})

Log gradients, parameter histogram and model topology

You can pass your model object to wandb_logger.watch() to monitor your model's gradients and parameters as you train. See the PyTorch Lightning WandbLogger documentation for a full description.
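For example, assuming model is your LightningModule instance, a minimal sketch might look like this (the log and log_freq values below are only illustrative):

# log gradients and parameter histograms every 100 training steps
wandb_logger.watch(model, log="all", log_freq=100)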

Log metrics

You can log your metrics to W&B when using the WandbLogger by calling self.log('my_metric_name', metric_value) within your LightningModule, such as in your training_step or validation_step methods.
The code snippet below shows how to define your LightningModule to log your metrics and your LightningModule hyperparameters. In this example we use the torchmetrics library to calculate our metrics.
import torch
from torch.nn import Linear, CrossEntropyLoss, functional as F
from torch.optim import Adam
from torchmetrics.functional import accuracy
from pytorch_lightning import LightningModule


class My_LitModule(LightningModule):

    def __init__(self, n_classes=10, n_layer_1=128, n_layer_2=256, lr=1e-3):
        '''method used to define our model parameters'''
        super().__init__()

        # mnist images are (1, 28, 28) (channels, width, height)
        self.layer_1 = Linear(28 * 28, n_layer_1)
        self.layer_2 = Linear(n_layer_1, n_layer_2)
        self.layer_3 = Linear(n_layer_2, n_classes)

        self.loss = CrossEntropyLoss()
        self.lr = lr

        # save hyper-parameters to self.hparams (auto-logged by W&B)
        self.save_hyperparameters()

    def forward(self, x):
        '''method used for inference input -> output'''

        # (b, 1, 28, 28) -> (b, 1*28*28)
        batch_size, channels, width, height = x.size()
        x = x.view(batch_size, -1)

        # let's do 3 x (linear + relu)
        x = F.relu(self.layer_1(x))
        x = F.relu(self.layer_2(x))
        x = self.layer_3(x)
        return x

    def training_step(self, batch, batch_idx):
        '''needs to return a loss from a single batch'''
        _, loss, acc = self._get_preds_loss_accuracy(batch)

        # Log loss and metric
        self.log('train_loss', loss)
        self.log('train_accuracy', acc)
        return loss

    def validation_step(self, batch, batch_idx):
        '''used for logging metrics'''
        preds, loss, acc = self._get_preds_loss_accuracy(batch)

        # Log loss and metric
        self.log('val_loss', loss)
        self.log('val_accuracy', acc)
        return preds

    def configure_optimizers(self):
        '''defines model optimizer'''
        return Adam(self.parameters(), lr=self.lr)

    def _get_preds_loss_accuracy(self, batch):
        '''convenience function since train/valid/test steps are similar'''
        x, y = batch
        logits = self(x)
        preds = torch.argmax(logits, dim=1)
        loss = self.loss(logits, y)
        acc = accuracy(preds, y)
        return preds, loss, acc

Log the min/max of your metric

Using wandb's define_metric function you can define whether you'd like your W&B summary metric to display the min, max, mean or best value for that metric. If define_metric isn't used, then the last value logged will appear in your summary metrics. See the define_metric reference docs here and the guide here for more.
To tell W&B to keep track of the max validation accuracy in the W&B summary metric, you just need to call wandb.define_metric once, e.g. you can call it at the beginning of training like so:
class My_LitModule(LightningModule):
    ...

    def validation_step(self, batch, batch_idx):
        if self.trainer.global_step == 0:
            wandb.define_metric('val_accuracy', summary='max')

        preds, loss, acc = self._get_preds_loss_accuracy(batch)

        # Log loss and metric
        self.log('val_loss', loss)
        self.log('val_accuracy', acc)
        return preds

Log images, text and more

The WandbLogger has log_image, log_text and log_table methods for logging media.
You can also directly call wandb.log or trainer.logger.experiment.log to log other media types such as Audio, Molecules, Point Clouds, 3D Objects and more.
When using wandb.log or trainer.logger.experiment.log within your trainer, make sure to also include "global_step": trainer.global_step in the dictionary being passed. That way, you can line up the information you're currently logging with information logged via other methods.
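As a rough sketch of that pattern, the snippet below logs an audio clip alongside the current step (audio_array and the sample rate are placeholders for your own data):

trainer.logger.experiment.log({
    "audio_sample": wandb.Audio(audio_array, sample_rate=16000),  # placeholder audio data
    "global_step": trainer.global_step,  # lines this media up with other logged metrics
})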
Log Images

# using tensors, numpy arrays or PIL images
wandb_logger.log_image(key="samples", images=[img1, img2])

# adding captions
wandb_logger.log_image(key="samples", images=[img1, img2], caption=["tree", "person"])

# using file path
wandb_logger.log_image(key="samples", images=["img_1.jpg", "img_2.jpg"])

# using .log in the trainer
trainer.logger.experiment.log({
    "samples": [wandb.Image(img, caption=caption)
                for (img, caption) in my_images]
})
Log Text

# data should be a list of lists
columns = ["input", "label", "prediction"]
my_data = [["cheese", "english", "english"], ["fromage", "french", "spanish"]]

# using columns and data
wandb_logger.log_text(key="my_samples", columns=columns, data=my_data)

# using a pandas DataFrame
wandb_logger.log_text(key="my_samples", dataframe=my_dataframe)
Log Tables

# log a W&B Table that has a text caption, an image and audio
columns = ["caption", "image", "sound"]

# data should be a list of lists
my_data = [["cheese", wandb.Image(img_1), wandb.Audio(snd_1)],
           ["wine", wandb.Image(img_2), wandb.Audio(snd_2)]]

# log the Table
wandb_logger.log_table(key="my_samples", columns=columns, data=my_data)
You can use Lightning's Callbacks system to control when you log to Weights & Biases via the WandbLogger. In this example we log a sample of our validation images and predictions:
Log Image Predictions
import torch
import wandb
import pytorch_lightning as pl
from pytorch_lightning.loggers import WandbLogger


class LogPredictionSamplesCallback(pl.Callback):

    def on_validation_batch_end(
        self, trainer, pl_module, outputs, batch, batch_idx, dataloader_idx):
        """Called when the validation batch ends."""

        # `outputs` comes from `LightningModule.validation_step`
        # which corresponds to our model predictions in this case

        # Let's log 20 sample image predictions from the first batch
        if batch_idx == 0:
            n = 20
            x, y = batch
            images = [img for img in x[:n]]
            captions = [f'Ground Truth: {y_i} - Prediction: {y_pred}'
                        for y_i, y_pred in zip(y[:n], outputs[:n])]

            # Option 1: log images with `WandbLogger.log_image`
            # (`wandb_logger` is the WandbLogger instance passed to the Trainer)
            wandb_logger.log_image(
                key='sample_images',
                images=images,
                caption=captions)

            # Option 2: log images and predictions as a W&B Table
            columns = ['image', 'ground truth', 'prediction']
            data = [[wandb.Image(x_i), y_i, y_pred]
                    for x_i, y_i, y_pred in list(zip(x[:n], y[:n], outputs[:n]))]
            wandb_logger.log_table(
                key='sample_table',
                columns=columns,
                data=data)

...

trainer = pl.Trainer(
    ...,
    callbacks=[LogPredictionSamplesCallback()]
)

How to use multiple GPUs with Lightning and W&B?

PyTorch Lightning has multi-GPU support through its DDP interface. However, PyTorch Lightning's design requires us to be careful about how we instantiate our processes.
Lightning assumes that each GPU (or rank) in your training loop must be instantiated in exactly the same way, with the same initial conditions. However, only the rank 0 process gets access to the wandb.run object; for non-zero rank processes, wandb.run is None. This can cause your non-zero rank processes to fail, which in turn can put you in a deadlock: the rank 0 process waits for the non-zero rank processes to join, but they have already crashed.
For this reason, we have to be careful about how we set up our training code. The recommended way to set it up is to have your code be independent of the wandb.run object.
import torch
from torch import nn
from torch.utils.data import DataLoader
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint
from pytorch_lightning.loggers import WandbLogger


class MNISTClassifier(pl.LightningModule):
    def __init__(self):
        super(MNISTClassifier, self).__init__()

        self.model = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 128),
            nn.ReLU(),
            nn.Linear(128, 10),
        )

        self.loss = nn.CrossEntropyLoss()

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.forward(x)
        loss = self.loss(y_hat, y)

        self.log("train/loss", loss)
        return {"train_loss": loss}

    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.forward(x)
        loss = self.loss(y_hat, y)

        self.log("val/loss", loss)
        return {"val_loss": loss}

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.001)


def main():
    # Setting all the random seeds to the same value.
    # This is important in a distributed training setting.
    # Each rank will get its own set of initial weights.
    # If they don't match up, the gradients will not match either,
    # leading to training that may not converge.
    pl.seed_everything(1)

    # train_dataset and val_dataset are assumed to be defined elsewhere
    # (e.g. torchvision's MNIST datasets)
    train_loader = DataLoader(train_dataset, batch_size=64,
                              shuffle=True,
                              num_workers=4)
    val_loader = DataLoader(val_dataset,
                            batch_size=64,
                            shuffle=False,
                            num_workers=4)

    model = MNISTClassifier()
    wandb_logger = WandbLogger(project="<project_name>")
    callbacks = [
        ModelCheckpoint(
            dirpath="checkpoints",
            every_n_train_steps=100,
        ),
    ]
    trainer = pl.Trainer(
        max_epochs=3,
        gpus=2,
        logger=wandb_logger,
        strategy="ddp",
        callbacks=callbacks,
    )
    trainer.fit(model, train_loader, val_loader)

Check out interactive examples!

You can follow along with our video tutorial and our tutorial Colab here.

Frequently Asked Questions

How does W&B integrate with Lightning?

The core integration is based on the Lightning loggers API, which lets you write much of your logging code in a framework-agnostic way. Loggers are passed to the Lightning Trainer and are triggered based on that API's rich hook-and-callback system. This keeps your research code well-separated from engineering and logging code.
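As a rough illustration of that separation (CSVLogger below is just one alternative logger Lightning ships with; your training code stays the same either way):

from pytorch_lightning import Trainer
from pytorch_lightning.loggers import CSVLogger, WandbLogger

# the LightningModule and Trainer setup are unchanged; only the logger argument differs
logger = WandbLogger(project="my-project")   # or: CSVLogger(save_dir="logs/")
trainer = Trainer(logger=logger)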

What does the integration log without any additional code?

We'll save your model checkpoints to W&B, where you can view them or download them for use in future runs. We'll also capture system metrics, like GPU usage and network I/O; environment information, like hardware and OS details; code state, including the git commit and diff patch, notebook contents and session history; and anything printed to standard output.

What if I really need to use wandb.run in my training setup?

Essentially, you have to expand the scope of the variables you need to access yourself, making sure that the initial conditions are the same on all processes.
if os.environ.get("LOCAL_RANK", None) is None:
    os.environ["WANDB_DIR"] = wandb.run.dir
Then, you can use os.environ["WANDB_DIR"] to set up the model checkpoint directory. This way, the value of wandb.run.dir is available to non-zero rank processes as well.
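For example, a minimal sketch (assuming the snippet above has already run on rank 0 and that you use a ModelCheckpoint callback) could pass that directory to the checkpoint callback:

import os
from pytorch_lightning.callbacks import ModelCheckpoint

# every rank reads the same directory from the environment,
# so checkpoints end up under the rank-0 run directory
checkpoint_callback = ModelCheckpoint(dirpath=os.environ.get("WANDB_DIR", "checkpoints"))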