Compare hyperparameters, output metrics, and system stats like GPU utilization across your models.
🤔 Why should I use W&B?
- Unified dashboard: Central repository for all your model metrics and predictions
- Lightweight: No code changes required to integrate with Hugging Face
- Accessible: Free for individuals and academic teams
- Secure: All projects are private by default
- Trusted: Used by machine learning teams at OpenAI, Toyota, Lyft and more
Think of W&B like GitHub for machine learning models: save machine learning experiments to your private, hosted dashboard. Experiment quickly with the confidence that all the versions of your models are saved for you, no matter where you're running your scripts.
W&B's lightweight integrations work with any Python script; all you need to do is sign up for a free W&B account to start tracking and visualizing your models.
In the Hugging Face Transformers repo, we've instrumented the Trainer to automatically log training and evaluation metrics to W&B at each logging step.
Here's an in-depth look at how the integration works: Hugging Face + W&B Report.
🚀 Install, Import, and Log in
Install the Hugging Face and Weights & Biases libraries, and the GLUE dataset and training script for this tutorial.
- Hugging Face Transformers: Natural language models and datasets
- Weights & Biases: Experiment tracking and visualization
- GLUE dataset: A language understanding benchmark dataset
- GLUE script: Model training script for sequence classification
!pip install datasets wandb evaluate accelerate -qU
# the run_glue.py script requires transformers dev
!pip install -q git+https://github.com/huggingface/transformers
🔑 Put in your API key
Once you've signed up, run the next cell and click on the link to get your API key and authenticate this notebook.
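The login cell can be sketched as follows. This is a minimal example, assuming the `wandb` package installed above; the wrapper name `wandb_login` is hypothetical and exists only to keep the import local to the call.

```python
def wandb_login() -> bool:
    """Authenticate this session with W&B and report whether it succeeded."""
    # Imported inside the function so the definition itself has no side effects.
    import wandb

    # wandb.login() prompts for an API key unless one is already stored locally
    # or supplied through the WANDB_API_KEY environment variable.
    return wandb.login()
```

Calling `wandb_login()` in a notebook cell prints the authorization link and waits for the key to be pasted in.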
Optionally, we can set environment variables to customize W&B logging. See documentation.
# Optional: log both gradients and parameters
%env WANDB_WATCH=all
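Outside a notebook, the same settings can be applied from plain Python. This is a sketch under the assumption that the Hugging Face integration reads `WANDB_PROJECT` and `WANDB_WATCH` from the environment when training starts; the project name is a hypothetical placeholder.

```python
import os

# Group runs under one project (hypothetical name, pick your own).
os.environ["WANDB_PROJECT"] = "huggingface-demo"

# "all" asks the integration to log both gradients and parameters.
os.environ["WANDB_WATCH"] = "all"
```

Set these before launching training: a script started from this process inherits the environment, while variables set afterwards are not picked up by a run that is already in progress.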
👟 Train the model
Next, call the downloaded training script run_glue.py and watch training get tracked automatically in the Weights & Biases dashboard. This script fine-tunes BERT on the Microsoft Research Paraphrase Corpus (MRPC): pairs of sentences with human annotations indicating whether they are semantically equivalent.
%env TASK_NAME=MRPC
!python run_glue.py \
--model_name_or_path bert-base-uncased \
--task_name $TASK_NAME \
--do_train \
--do_eval \
--max_seq_length 256 \
--per_device_train_batch_size 32 \
--learning_rate 2e-4 \
--num_train_epochs 3 \
--output_dir /tmp/$TASK_NAME/
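When running outside a notebook, the same invocation can be assembled in Python so the task name is explicit rather than read from a shell variable. This is a sketch, not part of the original tutorial: the helper name `build_glue_command` is hypothetical, and `--do_train` and `--do_eval` are included on the assumption that the script should both train and evaluate.

```python
import shlex
import sys


def build_glue_command(task_name: str = "MRPC") -> list[str]:
    """Return the run_glue.py invocation as an argument list."""
    return [
        sys.executable, "run_glue.py",
        "--model_name_or_path", "bert-base-uncased",
        "--task_name", task_name,
        "--do_train",  # assumption: run the training loop
        "--do_eval",   # assumption: evaluate after training
        "--max_seq_length", "256",
        "--per_device_train_batch_size", "32",
        "--learning_rate", "2e-4",
        "--num_train_epochs", "3",
        "--output_dir", f"/tmp/{task_name}/",
    ]


if __name__ == "__main__":
    # Print the command for inspection instead of launching it.
    print(shlex.join(build_glue_command()))
```

Passing the returned list to `subprocess.run(..., check=True)` launches training; building the list first makes it easy to inspect or vary hyperparameters between runs.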
👀 Visualize results in dashboard
Click the link printed out above, or go to wandb.ai to see your results stream in live. The link to see your run in the browser will appear after all the dependencies are loaded. Look for the following output: "wandb: 🚀 View run at [URL to your unique run]"
Visualize Model Performance
It's easy to look across dozens of experiments, zoom in on interesting findings, and visualize high-dimensional data.
Compare Architectures
Here's an example comparing BERT vs DistilBERT. Automatic line plot visualizations make it easy to see how different architectures affect the evaluation accuracy throughout training.
📈 Track key information effortlessly by default
Weights & Biases saves a new run for each experiment. Here's the information that gets saved by default:
- Hyperparameters: Settings for your model are saved in Config
- Model Metrics: Time series data of metrics streaming in are saved in Log
- Terminal Logs: Command line outputs are saved and available in a tab
- System Metrics: GPU and CPU utilization, memory, temperature, etc.
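To make the Config and Log items above concrete, here is a minimal sketch of the calls that populate them when logging by hand. The Trainer integration does the equivalent automatically; the project name, the helper name `log_example_run`, and the metric values are made-up placeholders, and `mode="disabled"` keeps the example from contacting the W&B servers.

```python
def log_example_run() -> None:
    """Create a throwaway run that records hyperparameters and a metric series."""
    # Imported inside the function so the definition itself has no side effects.
    import wandb

    run = wandb.init(
        project="huggingface-demo",  # hypothetical project name
        # Hyperparameters passed here appear under Config.
        config={"learning_rate": 2e-4, "num_train_epochs": 3, "batch_size": 32},
        mode="disabled",  # remove this argument to log to your dashboard
    )
    for step in range(3):
        # Each call appends one point to the time series shown under Log.
        run.log({"train/loss": 1.0 / (step + 1)})  # placeholder values
    run.finish()
```

Terminal logs and system metrics need no explicit calls; they are collected in the background while a run is active.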