Hugging Face

Try in a Colab Notebook here →

Visualize your Hugging Face model's performance quickly with a seamless W&B integration.

Compare hyperparameters, output metrics, and system stats like GPU utilization across your models.

🤔 Why should I use W&B?

  • Unified dashboard: Central repository for all your model metrics and predictions
  • Lightweight: No code changes required to integrate with Hugging Face
  • Accessible: Free for individuals and academic teams
  • Secure: All projects are private by default
  • Trusted: Used by machine learning teams at OpenAI, Toyota, Lyft and more

Think of W&B like GitHub for machine learning models: save machine learning experiments to your private, hosted dashboard. Experiment quickly with the confidence that all the versions of your models are saved for you, no matter where you're running your scripts.

W&B's lightweight integrations work with any Python script. All you need to do is sign up for a free W&B account to start tracking and visualizing your models.

In the Hugging Face Transformers repo, we've instrumented the Trainer to automatically log training and evaluation metrics to W&B at each logging step.

Here's an in-depth look at how the integration works: Hugging Face + W&B Report.
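For readers who build a Trainer in their own code rather than calling a script, here is a minimal sketch of the settings that control this logging. It assumes `transformers` is installed; the helper names, output directory, and run name are hypothetical and chosen only for illustration. `report_to`, `logging_steps`, and `run_name` are existing `TrainingArguments` parameters.

```python
def trainer_logging_kwargs(output_dir, run_name, logging_steps=50):
    """Collect the TrainingArguments settings that affect W&B logging."""
    return {
        "output_dir": output_dir,
        "report_to": "wandb",            # send Trainer logs to W&B
        "logging_steps": logging_steps,  # log metrics every N training steps
        "run_name": run_name,            # name shown for the run in the dashboard
    }


def make_training_args(**kwargs):
    """Build TrainingArguments; imported lazily so this file loads without transformers."""
    from transformers import TrainingArguments

    return TrainingArguments(**kwargs)


# Hypothetical values for illustration only.
kwargs = trainer_logging_kwargs("/tmp/demo", run_name="bert-demo")
# training_args = make_training_args(**kwargs)  # pass the result to Trainer(args=...)
```

The resulting object would be passed to `Trainer(args=...)` as usual; nothing else in the training loop needs to change.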

🚀 Install, Import, and Log in

Install the Hugging Face and Weights & Biases libraries, and the GLUE dataset and training script for this tutorial.

!pip install datasets wandb evaluate accelerate -qU
# the script requires transformers dev
!pip install -q git+

πŸ–ŠοΈ Sign up for a free account →​

πŸ”‘ Put in your API key​

Once you've signed up, run the next cell and click on the link to get your API key and authenticate this notebook.

import wandb
wandb.login()  # prints a link to your API key and authenticates this notebook

Optionally, we can set environment variables to customize W&B logging. See documentation.

# Optional: log both gradients and parameters
%env WANDB_WATCH=all
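Outside a notebook the `%env` magic is not available. A minimal sketch of the plain-Python equivalent follows, assuming it runs before the Trainer is created so that the integration can read the value when logging is set up.

```python
import os

# Plain-Python equivalent of the `%env WANDB_WATCH=all` cell above.
# Assumption: this executes before the Trainer is constructed.
os.environ["WANDB_WATCH"] = "all"  # log both gradients and parameters
```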

👟 Train the model

Next, call the downloaded training script and watch training get tracked automatically in the Weights & Biases dashboard. This script fine-tunes BERT on the Microsoft Research Paraphrase Corpus, which consists of pairs of sentences with human annotations indicating whether they are semantically equivalent.

%env WANDB_PROJECT=huggingface-demo

!python \
--model_name_or_path bert-base-uncased \
--task_name $TASK_NAME \
--do_train \
--do_eval \
--max_seq_length 256 \
--per_device_train_batch_size 32 \
--learning_rate 2e-4 \
--num_train_epochs 3 \
--output_dir /tmp/$TASK_NAME/ \
--overwrite_output_dir \
--logging_steps 50
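The same launch can be sketched from plain Python with `subprocess`, which is handy when running outside a notebook. The script path and task name are parameters because they are not fixed here; `path/to/training_script.py` and `"mrpc"` below are placeholders and assumptions, not values taken from this page. The function names are hypothetical.

```python
import os
import subprocess
import sys


def build_command(script_path, task_name):
    """Assemble the argument list used in the cell above."""
    return [
        sys.executable, script_path,
        "--model_name_or_path", "bert-base-uncased",
        "--task_name", task_name,
        "--do_train",
        "--do_eval",
        "--max_seq_length", "256",
        "--per_device_train_batch_size", "32",
        "--learning_rate", "2e-4",
        "--num_train_epochs", "3",
        "--output_dir", f"/tmp/{task_name}/",
        "--overwrite_output_dir",
        "--logging_steps", "50",
    ]


def launch(script_path, task_name, project="huggingface-demo"):
    """Run the training script with WANDB_PROJECT set for the child process."""
    env = dict(os.environ, WANDB_PROJECT=project)
    subprocess.run(build_command(script_path, task_name), env=env, check=True)


# Placeholder path and assumed task name, for illustration only.
cmd = build_command("path/to/training_script.py", "mrpc")
# launch("path/to/training_script.py", "mrpc")  # uncomment to start training
```

Passing the arguments as a list avoids shell quoting issues, and `check=True` raises an error if the script exits with a non-zero status.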

👀 Visualize results in dashboard

Click the link printed out above, or go to your W&B dashboard, to see your results stream in live. The link to see your run in the browser will appear after all the dependencies are loaded. Look for the following output: "wandb: 🚀 View run at [URL to your unique run]"

Visualize Model Performance

It's easy to look across dozens of experiments, zoom in on interesting findings, and visualize high-dimensional data.

Compare Architectures

Here's an example comparing BERT vs DistilBERT. Automatic line plot visualizations make it easy to see how different architectures affect the evaluation accuracy throughout training.

📈 Track key information effortlessly by default

Weights & Biases saves a new run for each experiment. Here's the information that gets saved by default:

  • Hyperparameters: Settings for your model are saved in Config
  • Model Metrics: Time series data of metrics streaming in are saved in Log
  • Terminal Logs: Command line outputs are saved and available in a tab
  • System Metrics: GPU and CPU utilization, memory, temperature, etc.
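To make the Config and Log items above concrete, here is a rough sketch of what the equivalent manual calls could look like with the `wandb` library directly. The Trainer integration does this for you, so this is an illustration and not the integration's actual code. The helper names are hypothetical, and the loss values are made up for illustration, not results from a real run.

```python
def to_history(losses):
    """Turn a list of loss values into one metrics dict per step."""
    return [{"train/loss": loss} for loss in losses]


def log_run(project, config, history):
    """Send one run to W&B; needs `pip install wandb` and a prior login."""
    import wandb  # imported lazily so the helpers above work without wandb

    run = wandb.init(project=project, config=config)  # hyperparameters go to Config
    for step, metrics in enumerate(history):
        run.log(metrics, step=step)  # time series metrics go to Log
    run.finish()


# Illustrative values only.
config = {
    "learning_rate": 2e-4,
    "num_train_epochs": 3,
    "per_device_train_batch_size": 32,
}
history = to_history([0.69, 0.52, 0.41])
# log_run("huggingface-demo", config, history)  # uncomment to create a run
```

Terminal logs and system metrics are collected by the `wandb` process itself while a run is active, so they need no explicit calls.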

🤓 Learn more!

  • Documentation: docs on the Weights & Biases and Hugging Face integration
  • Videos: tutorials, interviews with practitioners, and more on our YouTube channel
  • Contact: Message us with questions