How can I use wandb with multiprocessing, e.g. distributed training?
If a training program uses multiple processes, structure the program to avoid making wandb method calls from processes that did not call `wandb.init()`.
Manage multiprocess training using one of these approaches:

- Call `wandb.init` in all processes and use the `group` keyword argument to create a shared group. Each process will have its own wandb run, and the UI will group the training processes together.
- Call `wandb.init` from only one process and pass the data to log to it through multiprocessing queues.
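The second approach can be sketched with Python's `multiprocessing` module: worker processes put metric dicts on a queue, and only the main process consumes and logs them. This is a minimal illustration, not code from the guide; a plain list stands in for `wandb.log()` so the sketch runs without a W&B account.

```python
import multiprocessing as mp

def worker(rank, queue):
    # Workers never touch wandb; they send metrics to the main
    # process through the queue instead of calling wandb.log().
    for step in range(3):
        queue.put({"rank": rank, "step": step, "loss": 1.0 / (step + 1)})
    queue.put(None)  # sentinel: this worker is done

def main():
    queue = mp.Queue()
    num_workers = 2
    procs = [mp.Process(target=worker, args=(rank, queue))
             for rank in range(num_workers)]
    for p in procs:
        p.start()

    # Only this process would call wandb.init(); here a plain list
    # stands in for wandb.log so the sketch runs without a W&B account.
    logged = []
    done = 0
    while done < num_workers:
        item = queue.get()
        if item is None:
            done += 1
        else:
            logged.append(item)  # real code: wandb.log(item)

    for p in procs:
        p.join()
    return logged

if __name__ == "__main__":
    metrics = main()
    print(f"logged {len(metrics)} metric dicts")
```

For the first approach, each process instead calls `wandb.init(group="my-experiment")` itself, and the W&B UI groups the resulting runs under that name.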
Refer to the Distributed Training Guide for detailed explanations of these approaches, including code examples with Torch DDP.