spaCy
A Weights & Biases integration for the spaCy library: industrial strength NLP, logged with W&B
spaCy is a popular "industrial-strength" NLP library: fast, accurate models with a minimum of fuss. As of spaCy v3, Weights & Biases can be used with `spacy train` to track your spaCy model's training metrics as well as to save and version your models and datasets. All it takes is a few added lines in your configuration!
First, install `wandb` and log in.
In a notebook:
!pip install wandb
import wandb
wandb.login()
From the command line:
pip install wandb
wandb login
spaCy config files are used to specify all aspects of training, not just logging: GPU allocation, optimizer choice, dataset paths, and more. Minimally, under `[training.logger]` you need to provide the key `@loggers` with the value `"spacy.WandbLogger.v3"`, plus a `project_name`. You can also turn on dataset and model versioning by adding a single line each to the config file (`log_dataset_dir` and `model_log_interval`, respectively). For more on how spaCy training config files work and on other options you can pass in to customize training, check out spaCy's documentation.
config.cfg
[training.logger]
@loggers = "spacy.WandbLogger.v3"
project_name = "my_spacy_project"
remove_config_values = ["paths.train", "paths.dev", "corpora.train.path", "corpora.dev.path"]
log_dataset_dir = "./corpus"
model_log_interval = 1000
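To make the minimal requirement concrete, here is a rough sketch that reads the `[training.logger]` block above with Python's standard `configparser` and checks for the two required keys. It is only an approximation: spaCy loads configs through its own parser (which also resolves `${...}` variables and registered functions), and `read_logger_settings` and `check_wandb_logger` are hypothetical helpers, not part of spaCy or W&B.

```python
import configparser
import json

# The [training.logger] block from config.cfg, inlined so the sketch is self-contained.
CONFIG_TEXT = """
[training.logger]
@loggers = "spacy.WandbLogger.v3"
project_name = "my_spacy_project"
remove_config_values = ["paths.train", "paths.dev", "corpora.train.path", "corpora.dev.path"]
log_dataset_dir = "./corpus"
model_log_interval = 1000
"""


def read_logger_settings(text: str) -> dict:
    """Hypothetical helper: parse [training.logger] and decode each JSON-style value."""
    # interpolation=None: spaCy's ${...} references are not resolved here.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep keys exactly as written
    parser.read_string(text)
    section = parser["training.logger"]
    return {key: json.loads(value) for key, value in section.items()}


def check_wandb_logger(settings: dict) -> list:
    """Hypothetical helper: list problems with the minimal W&B logger settings."""
    problems = []
    if settings.get("@loggers") != "spacy.WandbLogger.v3":
        problems.append('@loggers should be "spacy.WandbLogger.v3"')
    if not settings.get("project_name"):
        problems.append("project_name is required")
    return problems


settings = read_logger_settings(CONFIG_TEXT)
print(settings["model_log_interval"])  # 1000
print(check_wandb_logger(settings))    # []
```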
Name | Description |
---|---|
`project_name` | `str`. The name of the Weights & Biases project. The project will be created automatically if it doesn’t exist yet. |
`remove_config_values` | `List[str]`. A list of values to exclude from the config before it is uploaded to W&B. `[]` by default. |
`model_log_interval` | `Optional[int]`. If set, model versioning with Artifacts will be enabled. Pass in the number of steps to wait between logging model checkpoints. `None` by default. |
`log_dataset_dir` | `Optional[str]`. If passed a path, the dataset will be uploaded as an Artifact at the beginning of training. `None` by default. |
`entity` | `Optional[str]`. If passed, the run will be created in the specified entity (a W&B user or team). |
`run_name` | `Optional[str]`. If specified, the run will be created with the specified name. |
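Two of these options describe behavior rather than plain settings. The sketch below illustrates them under stated assumptions: `remove_config_values` is read as deleting dotted paths (the same dotted notation spaCy uses for overrides such as `paths.train`) from a copy of the config, and `model_log_interval` is read as logging a checkpoint at every nonzero multiple of that step count. Both helpers are hypothetical stand-ins written for illustration; the real logic lives inside the `WandbLogger` and may differ in details such as how missing keys are handled.

```python
import copy
from typing import List, Optional


def remove_config_values(config: dict, dotted_keys: List[str]) -> dict:
    """Hypothetical helper: deep-copy a nested config and drop each dotted path.

    Paths that do not exist are skipped silently (an assumption of this sketch).
    """
    result = copy.deepcopy(config)
    for dotted in dotted_keys:
        *parents, leaf = dotted.split(".")
        node = result
        # Walk down to the dict that should contain `leaf`; becomes None if a level is missing.
        for part in parents:
            node = node.get(part) if isinstance(node, dict) else None
        if isinstance(node, dict):
            node.pop(leaf, None)
    return result


def should_log_model(step: int, model_log_interval: Optional[int] = None) -> bool:
    """Assumption: a checkpoint is logged at every nonzero multiple of the interval."""
    return bool(model_log_interval) and step > 0 and step % model_log_interval == 0


config = {
    "paths": {"train": "./train", "dev": "./dev", "vectors": None},
    "corpora": {"train": {"path": "${paths.train}"}, "dev": {"path": "${paths.dev}"}},
}
cleaned = remove_config_values(
    config, ["paths.train", "paths.dev", "corpora.train.path", "corpora.dev.path"]
)
print(cleaned["paths"])    # {'vectors': None}
print(cleaned["corpora"])  # {'train': {}, 'dev': {}}
print(should_log_model(2000, 1000), should_log_model(2500, 1000))  # True False
```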
Once you have added the `WandbLogger` to your spaCy training config, you can run `spacy train` as usual.
In a notebook:
!python -m spacy train \
config.cfg \
--output ./output \
--paths.train ./train \
--paths.dev ./dev
From the command line:
python -m spacy train \
config.cfg \
--output ./output \
--paths.train ./train \
--paths.dev ./dev
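If you would rather launch training from a plain Python script than from a notebook `!` cell, one option is to build the same command and hand it to `subprocess`. This is a sketch: `build_train_command` and `run_training` are hypothetical helpers, and the paths are the placeholders from the commands above.

```python
import subprocess
import sys
from typing import List


def build_train_command(config_path: str, output_dir: str, train_path: str, dev_path: str) -> List[str]:
    """Hypothetical helper: the `python -m spacy train` invocation above as an argument list."""
    return [
        sys.executable, "-m", "spacy", "train",
        config_path,
        "--output", output_dir,
        # Dotted overrides fill in [paths] values of the config at run time.
        "--paths.train", train_path,
        "--paths.dev", dev_path,
    ]


def run_training(cmd: List[str]) -> None:
    """Run the command; raises CalledProcessError if training exits with a nonzero status."""
    subprocess.run(cmd, check=True)


cmd = build_train_command("config.cfg", "./output", "./train", "./dev")
# run_training(cmd)  # requires spaCy and wandb to be installed in this environment
```

Using `sys.executable` rather than a bare `python` keeps training in the same environment (and therefore the same `wandb` install) as the calling script.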