Hugging Face Transformers
The Hugging Face Transformers library makes state-of-the-art NLP models like BERT and training techniques like mixed precision and gradient checkpointing easy to use. The W&B integration adds rich, flexible experiment tracking and model versioning to interactive centralized dashboards without compromising that ease of use.
🤗 Next-level logging in a few lines
os.eviron["WANDB_PROJECT"] = "<my-amazing-project>" # log to your project
os.eviron["WANDB_LOG_MODEL"] = "all" # log your models
from transformers import TrainingArguments, Trainer
args = TrainingArguments(... , report_to="wandb")
trainer = Trainer(... , args=args)
If you'd rather dive straight into working code, check out this Google Colab.
Getting started: track experiments
1) Sign up, install the `wandb` library, and log in
a) Sign up for a free account
b) Pip install the `wandb` library
c) To log in from your training script, you'll need to be signed in to your account at www.wandb.ai; you can then find your API key on the Authorize page.
If you are using Weights & Biases for the first time, you might want to check out our quickstart.
- Command Line
- Notebook
pip install wandb
wandb login
!pip install wandb
import wandb
wandb.login()
2) Name the project
A Project is where all of the charts, data, and models logged from related runs are stored. Naming your project helps you organize your work and keep all the information about a single project in one place.
To add a run to a project, simply set the `WANDB_PROJECT` environment variable to the name of your project. The `WandbCallback` will pick up this project name and use it when setting up your run.
- Command Line
- Notebook
WANDB_PROJECT=amazon_sentiment_analysis
%env WANDB_PROJECT=amazon_sentiment_analysis
Make sure you set the project name before you initialize the `Trainer`.
If a project name is not specified, it defaults to "huggingface".
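If you prefer to configure this in Python (for example, at the top of a notebook), you can also set the variable with `os.environ`. A minimal sketch, where the project name is just an example:
import os

# Set before the Trainer is initialized so the WandbCallback picks it up
os.environ["WANDB_PROJECT"] = "amazon_sentiment_analysis"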
3) Log your training runs to W&B
This is the most important step: when defining your `Trainer` training arguments, either inside your code or from the command line, set `report_to` to `"wandb"` in order to enable logging with Weights & Biases.
You can also give a name to the training run using the `run_name` argument.
Using TensorFlow? Just swap the PyTorch `Trainer` for the TensorFlow `TFTrainer`.
That's it! Now your models will log losses, evaluation metrics, model topology, and gradients to Weights & Biases while they train.
- Command Line
- Notebook
# run your Python script; --report_to enables logging to W&B,
# and --run_name (optional) names the W&B run
python run_glue.py \
    --report_to wandb \
    --run_name bert-base-high-lr
    # ... other command line arguments here
from transformers import TrainingArguments, Trainer
args = TrainingArguments(
# other args and kwargs here
report_to="wandb", # enable logging to W&B
run_name="bert-base-high-lr" # name of the W&B run (optional)
)
trainer = Trainer(
# other args and kwargs here
args=args, # your training args
)
trainer.train() # start training and logging to W&B
4) Turn on model checkpoints and versioning
Using Weights & Biases' Artifacts, you can store up to 100GB of models and datasets. Logging your Hugging Face model to W&B Artifacts can be done by setting a W&B environment variable called WANDB_LOG_MODEL
to one of 'end'
or 'checkpoint'
.
'end'
logs only the final model while 'checkpoint'
logs the model checkpoints every save_steps
in the TrainingArguments
.
- Command Line
- Notebook
WANDB_LOG_MODEL='end'
%env WANDB_LOG_MODEL='end'
By default, your model will be saved to W&B Artifacts as `model-{run_id}` when `WANDB_LOG_MODEL` is set to `end`, or as `checkpoint-{run_id}` when it is set to `checkpoint`. However, if you pass a `run_name` in your `TrainingArguments`, the model will be saved as `model-{run_name}` or `checkpoint-{run_name}` instead.
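Putting this together, a minimal sketch of checkpoint logging (the run name and `save_steps` value below are illustrative); with these settings, checkpoints would be logged as `checkpoint-bert-base-high-lr`:
import os
from transformers import TrainingArguments

os.environ["WANDB_LOG_MODEL"] = "checkpoint"  # log a checkpoint every save_steps

args = TrainingArguments(
    # other args and kwargs here
    report_to="wandb",
    run_name="bert-base-high-lr",  # artifacts named checkpoint-bert-base-high-lr
    save_steps=500,  # illustrative; checkpoints are saved (and logged) this often
)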
Any `Trainer` you initialize from now on will upload models to your W&B project.
The model checkpoints you log will be viewable through the W&B Artifacts UI and include the full model lineage (see an example model checkpoint in the UI here).
To bookmark your best model checkpoints and centralize them across your team, you can link them to the W&B Model Registry. There you can organize your best models by task, manage model lifecycle, facilitate easy tracking and auditing throughout the ML lifecycle, and automate downstream actions with webhooks or jobs.
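As a rough sketch of what linking a model from code might look like (the registry target path "model-registry/Amazon Sentiment" is hypothetical, and this assumes the `run.link_artifact` API available in recent versions of the wandb client):
import wandb

with wandb.init(project="amazon_sentiment_analysis") as run:
    # Fetch a previously logged model artifact...
    artifact = run.use_artifact("model-bert-base-high-lr:latest")
    # ...and link it to a registered model in the Model Registry
    run.link_artifact(artifact, "model-registry/Amazon Sentiment")  # hypothetical path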
(Notebook only) Finish your W&B Run
If your training is encapsulated in a Python script, the W&B run will end when your script finishes.
If you are using a Jupyter or Google Colab notebook, you'll need to tell us when you're done with training by calling `wandb.finish()`.
trainer.train() # start training and logging to W&B
# post-training analysis, testing, other logged code
wandb.finish()
5) Visualize your results
Once you have logged your training results you can explore your results dynamically in the W&B Dashboard. It's easy to compare across dozens of runs at once, zoom in on interesting findings, and coax insights out of complex data with flexible, interactive visualizations.
Advanced features and FAQs
How do I save the best model?
Want to centralize all your best model versions across your team to organize them by ML task, stage them for production, bookmark them for further evaluation, or kick off downstream Model CI/CD processes? Check out the Model Registry.
If `load_best_model_at_end=True` is passed to `Trainer`, then W&B will save the best performing model to Artifacts.
Loading a saved model
If you saved your model to W&B Artifacts with `WANDB_LOG_MODEL`, you can download your model weights for additional training or to run inference. You just load them back into the same Hugging Face architecture that you used before.
import wandb
from transformers import AutoModelForSequenceClassification

num_labels = 2  # set to the number of labels used for your task

# Create a new run
with wandb.init(project="amazon_sentiment_analysis") as run:
    # Connect an Artifact to the run
    my_model_name = "model-bert-base-high-lr:latest"
    my_model_artifact = run.use_artifact(my_model_name)

    # Download model weights to a folder and return the path
    model_dir = my_model_artifact.download()

    # Load your Hugging Face model from that folder
    # using the same model class
    model = AutoModelForSequenceClassification.from_pretrained(
        model_dir, num_labels=num_labels
    )

    # Do additional training, or run inference
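From there, inference might look like the following sketch, assuming a tokenizer was passed to the `Trainer` (in which case it is saved alongside the model weights); the input sentence is just an example:
from transformers import AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained(model_dir)
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("This product exceeded my expectations!"))  # example input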
Resume training from a checkpoint
If you set `WANDB_LOG_MODEL='checkpoint'`, you can also resume training by using the `model_dir` as the `model_name_or_path` argument in your `TrainingArguments` and passing `resume_from_checkpoint=True` to `trainer.train()`.
import os
import wandb
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer

num_labels = 2  # set to the number of labels used for your task
last_run_id = "xxxxxxxx"  # fetch the run_id from your wandb workspace

# resume the wandb run from the run_id
with wandb.init(
    project=os.environ["WANDB_PROJECT"],
    id=last_run_id,
    resume="must",
) as run:
    # Connect an Artifact to the run
    my_checkpoint_name = f"checkpoint-{last_run_id}:latest"
    my_checkpoint_artifact = run.use_artifact(my_checkpoint_name)

    # Download checkpoint to a folder and return the path
    checkpoint_dir = my_checkpoint_artifact.download()

    # reinitialize your model and trainer
    model = AutoModelForSequenceClassification.from_pretrained(
        <model_name>, num_labels=num_labels
    )
    # your awesome training arguments here.
    training_args = TrainingArguments(...)

    trainer = Trainer(model=model, args=training_args, ...)

    # make sure to use the checkpoint dir to resume training from the checkpoint
    trainer.train(resume_from_checkpoint=checkpoint_dir)
Additional W&B settingsโ
Further configuration of what is logged with `Trainer` is possible by setting environment variables. A full list of W&B environment variables can be found here.
| Environment Variable | Usage |
|---|---|
| `WANDB_PROJECT` | Give your project a name (`huggingface` by default) |
| `WANDB_LOG_MODEL` | Log the model as an artifact at the end of training (`false` by default) |
| `WANDB_WATCH` | Set whether you'd like to log your model's gradients, parameters, or neither |
| `WANDB_DISABLED` | Set to `true` to disable logging entirely (`false` by default) |
| `WANDB_SILENT` | Set to `true` to silence the output printed by wandb (`false` by default) |
- Command Line
- Notebook
WANDB_WATCH=all
WANDB_SILENT=true
%env WANDB_WATCH=all
%env WANDB_SILENT=true
Customize `wandb.init`
The `WandbCallback` that `Trainer` uses will call `wandb.init` under the hood when the `Trainer` is initialized. You can alternatively set up your runs manually by calling `wandb.init` before the `Trainer` is initialized. This gives you full control over your W&B run configuration.
An example of what you might want to pass to `init` is below. For more details on how to use `wandb.init`, check out the reference documentation.
import wandb

wandb.init(
    project="amazon_sentiment_analysis",
    name="bert-base-high-lr",
    tags=["baseline", "high-lr"],
    group="bert",
)
Custom logging
Logging to Weights & Biases via the Transformers `Trainer` is taken care of by the `WandbCallback` (reference documentation) in the Transformers library. If you need to customize your Hugging Face logging, you can modify this callback.
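As a hedged sketch of one possible customization (the subclass name and the `debug/` metric filter are hypothetical; this assumes `WandbCallback` from `transformers.integrations` and the standard `on_log` callback signature):
from transformers.integrations import WandbCallback

class FilteredWandbCallback(WandbCallback):  # hypothetical subclass
    def on_log(self, args, state, control, model=None, logs=None, **kwargs):
        # Example customization: drop metrics you don't want sent to W&B
        if logs is not None:
            logs = {k: v for k, v in logs.items() if not k.startswith("debug/")}
        super().on_log(args, state, control, model=model, logs=logs, **kwargs)

# Pass the callback explicitly and set report_to="none" in TrainingArguments
# so the default WandbCallback is not added a second time:
# trainer = Trainer(..., callbacks=[FilteredWandbCallback()])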
Highlighted Articles
Below are six Transformers and W&B related articles you might enjoy:
Hyperparameter Optimization for Hugging Face Transformers
- Three strategies for hyperparameter optimization for Hugging Face Transformers are compared: Grid Search, Bayesian Optimization, and Population Based Training.
- We use a standard uncased BERT model from Hugging Face Transformers, and we want to fine-tune on the RTE dataset from the SuperGLUE benchmark.
- Results show that Population Based Training is the most effective approach to hyperparameter optimization of our Hugging Face transformer model.
Read the full report here.
Hugging Tweets: Train a Model to Generate Tweets
- In the article, the author demonstrates how to fine-tune a pre-trained GPT2 HuggingFace Transformer model on anyone's Tweets in five minutes.
- The model uses the following pipeline: Downloading Tweets, Optimizing the Dataset, Initial Experiments, Comparing Losses Between Users, Fine-Tuning the Model.
Read the full report here.
Sentence Classification With Hugging Face BERT and W&B
- In this article, we'll build a sentence classifier leveraging the power of recent breakthroughs in Natural Language Processing, focusing on an application of transfer learning to NLP.
- We'll be using The Corpus of Linguistic Acceptability (CoLA) dataset for single sentence classification, which is a set of sentences labeled as grammatically correct or incorrect that was first published in May 2018.
- We'll use Google's BERT to create high performance models with minimal effort on a range of NLP tasks.
Read the full report here.
A Step by Step Guide to Tracking Hugging Face Model Performance
- We use Weights & Biases and Hugging Face Transformers to train DistilBERT, a Transformer that's 40% smaller than BERT but retains 97% of BERT's accuracy, on the GLUE benchmark.
- The GLUE benchmark is a collection of nine datasets and tasks for training NLP models.
Read the full report here.
Early Stopping in HuggingFace - Examples
- Fine-tuning a Hugging Face Transformer using early stopping regularization can be done natively in PyTorch or TensorFlow.
- Using early stopping in TensorFlow is straightforward with the `tf.keras.callbacks.EarlyStopping` callback.
- In PyTorch, there is no off-the-shelf early stopping method, but a working early stopping hook is available on GitHub Gist.
Read the full report here.
How to Fine-Tune Hugging Face Transformers on a Custom Dataset
We fine-tune a DistilBERT transformer for sentiment analysis (binary classification) on a custom IMDB dataset.
Read the full report here.
Issues, questions, feature requests
For any issues, questions, or feature requests for the Hugging Face W&B integration, feel free to post in this thread on the Hugging Face forums or open an issue on the Hugging Face Transformers GitHub repo.