Fastai
If you're using fastai to train your models, W&B has an easy integration using the WandbCallback. Explore the details in the interactive docs with examples →

Start Logging with W&B

a) Sign up for a free account at https://wandb.ai/site, then log in to your wandb account.
b) Install the wandb library on your machine in a Python 3 environment using pip.
c) Log in to the wandb library on your machine. You will find your API key at https://wandb.ai/authorize.
Notebook
!pip install wandb

import wandb
wandb.login()
Command Line
pip install wandb
wandb login
Then add the WandbCallback to the Learner or to the fit method:
import wandb
from fastai.callback.wandb import *

# Start logging a W&B run
wandb.init(project='my_project')

# To log only during one training phase
learn.fit(..., cbs=WandbCallback())

# To log continuously for all training phases
learn = learner(..., cbs=WandbCallback())
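For reference, here is a minimal end-to-end sketch. The project name is a placeholder, and it assumes fastai v2 with its bundled MNIST_SAMPLE toy dataset:
import wandb
from fastai.vision.all import *
from fastai.callback.wandb import *

# Start a W&B run ('my_project' is a placeholder name)
wandb.init(project='my_project')

# Small toy dataset bundled with fastai
path = untar_data(URLs.MNIST_SAMPLE)
dls = ImageDataLoaders.from_folder(path)

# Attach the callback to the Learner so every call to fit is logged
learn = vision_learner(dls, resnet18, metrics=accuracy, cbs=WandbCallback())
learn.fine_tune(1)

# Mark the run as finished
wandb.finish()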
If you use version 1 of Fastai, refer to the Fastai v1 docs.

WandbCallback Arguments

WandbCallback accepts the following arguments:
log: What to log from the model: "gradients", "parameters", "all", or None (default). Losses and metrics are always logged.
log_preds: Whether to log prediction samples (default: True).
log_preds_every_epoch: Whether to log predictions every epoch or only at the end (default: False).
log_model: Whether to log the model (default: False). This also requires SaveModelCallback.
model_name: The name of the file to save; overrides SaveModelCallback.
log_dataset:
  • False (default)
  • True will log the folder referenced by learn.dls.path.
  • A path can be defined explicitly to reference which folder to log.
  Note: the subfolder "models" is always ignored.
dataset_name: Name of the logged dataset (default: the folder name).
valid_dl: DataLoaders containing the items used for prediction samples (default: random items from learn.dls.valid).
n_preds: Number of logged predictions (default: 36).
seed: Used for defining random samples.
For custom workflows, you can manually log your datasets and models:
  • log_dataset(path, name=None, metadata={})
  • log_model(path, name=None, metadata={})
Note: any subfolder "models" will be ignored.
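Putting several of these options together, here is a minimal sketch. The project name, dataset, and argument values are illustrative, and the checkpoint path assumes SaveModelCallback's defaults:
import wandb
from fastai.vision.all import *
from fastai.callback.wandb import *

wandb.init(project='my_project')  # placeholder project name

path = untar_data(URLs.PETS)/'images'
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2,
    label_func=lambda x: x[0].isupper(), item_tfms=Resize(224))

learn = vision_learner(
    dls, resnet34, metrics=error_rate,
    cbs=[
        SaveModelCallback(),            # required for log_model to upload checkpoints
        WandbCallback(log='gradients',  # or 'parameters', 'all', None (default)
                      log_preds=True,
                      n_preds=36,
                      log_model=True,   # upload the checkpoint saved by SaveModelCallback
                      seed=12345),
    ])
learn.fit_one_cycle(1)

# Custom workflows: log the dataset folder and the saved model manually
log_dataset(path, name='pets')                                    # subfolder "models" is ignored
log_model(learn.path/'models'/'model.pth', name='pets-resnet34')  # default SaveModelCallback path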

Distributed Training

fastai supports distributed training through the distrib_ctx context manager. W&B supports this automatically and enables you to track your multi-GPU experiments out of the box.
A minimal example is shown below:
Notebook
Script
You can now run distributed training directly inside a notebook!
import wandb
from fastai.vision.all import *

from accelerate import notebook_launcher
from fastai.distributed import *
from fastai.callback.wandb import WandbCallback

wandb.require(experiment="service")
path = untar_data(URLs.PETS)/'images'

def train():
    dls = ImageDataLoaders.from_name_func(
        path, get_image_files(path), valid_pct=0.2,
        label_func=lambda x: x[0].isupper(), item_tfms=Resize(224))
    wandb.init(project='fastai_ddp', entity='capecape')
    cb = WandbCallback()
    learn = vision_learner(dls, resnet34, metrics=error_rate, cbs=cb).to_fp16()
    with learn.distrib_ctx(in_notebook=True, sync_bn=False):
        learn.fit(1)

notebook_launcher(train, num_processes=2)
train.py
import wandb
from fastai.vision.all import *
from fastai.distributed import *
from fastai.callback.wandb import WandbCallback

wandb.require(experiment="service")
path = rank0_first(lambda: untar_data(URLs.PETS)/'images')

def train():
    dls = ImageDataLoaders.from_name_func(
        path, get_image_files(path), valid_pct=0.2,
        label_func=lambda x: x[0].isupper(), item_tfms=Resize(224))
    wandb.init(project='fastai_ddp', entity='capecape')
    cb = WandbCallback()
    learn = vision_learner(dls, resnet34, metrics=error_rate, cbs=cb).to_fp16()
    with learn.distrib_ctx(sync_bn=False):
        learn.fit(1)

if __name__ == "__main__":
    train()
Then, in your terminal, execute:
$ torchrun --nproc_per_node 2 train.py
In this case, the machine has 2 GPUs; set --nproc_per_node to the number of GPUs available.

Logging only on the main process

In the examples above, wandb launches one run per process, so at the end of training you end up with two runs. This can sometimes be confusing, and you may want to log only on the main process. To do so, you have to manually detect which process you are in and avoid creating runs (calling wandb.init) in all the other processes.
Notebook
Script
import wandb
from fastai.vision.all import *

from accelerate import notebook_launcher
from fastai.distributed import *
from fastai.callback.wandb import WandbCallback

wandb.require(experiment="service")
path = untar_data(URLs.PETS)/'images'

def train():
    cb = []
    dls = ImageDataLoaders.from_name_func(
        path, get_image_files(path), valid_pct=0.2,
        label_func=lambda x: x[0].isupper(), item_tfms=Resize(224))
    if rank_distrib() == 0:
        run = wandb.init(project='fastai_ddp', entity='capecape')
        cb = WandbCallback()
    learn = vision_learner(dls, resnet34, metrics=error_rate, cbs=cb).to_fp16()
    with learn.distrib_ctx(in_notebook=True, sync_bn=False):
        learn.fit(1)

notebook_launcher(train, num_processes=2)
train.py
import wandb
from fastai.vision.all import *
from fastai.distributed import *
from fastai.callback.wandb import WandbCallback

wandb.require(experiment="service")
path = rank0_first(lambda: untar_data(URLs.PETS)/'images')

def train():
    cb = []
    dls = ImageDataLoaders.from_name_func(
        path, get_image_files(path), valid_pct=0.2,
        label_func=lambda x: x[0].isupper(), item_tfms=Resize(224))
    if rank_distrib() == 0:
        run = wandb.init(project='fastai_ddp', entity='capecape')
        cb = WandbCallback()
    learn = vision_learner(dls, resnet34, metrics=error_rate, cbs=cb).to_fp16()
    with learn.distrib_ctx(sync_bn=False):
        learn.fit(1)

if __name__ == "__main__":
    train()
Then, in your terminal, call:
$ torchrun --nproc_per_node 2 train.py

Examples
