[Beta] Model Management
Catalog and version models, standardize model evaluation, and promote the best models to production
We are actively building out the model registry and model evaluation use cases for W&B. Please contact us with questions and suggestions at [email protected].
Use W&B for Model Management to track and report on the complete lifecycle of a model:
  1. Datasets: The exact version of the dataset a model trained on
  2. Code: The code used in model training
  3. Models: The weights of the trained model itself
  4. Metrics: The evaluation results of a model on different golden datasets
  5. Statuses: Where each model is in the pipeline (ex. "staging" or "production")
Use the interactive W&B UI to view all saved model versions, compare models on evaluation metrics, and track the status of models at different stages in the pipeline.
To unlock Weave panels, add weave-report to your profile page bio.

Quickstart Walkthrough

Clone our GitHub Examples Repo and follow along with this model-evaluation code example.

1. Install requirements

Install the Weights & Biases library wandb and other dependencies.
pip install -r requirements.txt

2. Register a dataset

Generate and register a dataset for a particular model use case. In this example, we use the MNIST dataset for simplicity.

3. Train some models

Train a model based on the latest available dataset for the given model use case. Tweak hyperparameters from the command line when you run the training script (shown here as train.py), like this:
python train.py --validation_split 0.05
python train.py --validation_split 0.2
python train.py --batch_size 64
python train.py --batch_size 160
Later you'll be able to compare training performance for different models in the W&B dashboard.
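A sketch of how such a training script might read those hyperparameters and pass them to W&B as the run config. The script structure and default values are assumptions, not the repo's actual code:

```python
import argparse

def parse_args(argv=None):
    # Hyperparameters tweakable from the command line, matching the
    # example commands above; the defaults are assumptions.
    parser = argparse.ArgumentParser()
    parser.add_argument("--validation_split", type=float, default=0.1)
    parser.add_argument("--batch_size", type=int, default=128)
    return parser.parse_args(argv)

def train(args):
    import wandb  # imported here so parse_args stays dependency-free
    run = wandb.init(project="model-registry-example",
                     job_type="train",
                     config=vars(args),
                     mode="offline")
    # ... build and fit the model using run.config, then log the
    # trained weights as a model artifact ...
    run.finish()

# The real script would end with: train(parse_args())
```

Logging hyperparameters through wandb.init(config=...) is what makes them show up as comparable columns in the dashboard.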
Did you hit a weird error? Try waiting a little longer for your dataset from Step #2 to get registered before running Step #3 to train on that dataset.
Here is an example dashboard comparing the models we've trained so far.

4. Evaluate candidate models

Next, run the evaluator script, which:
  1. Finds all models that haven't yet been evaluated on the latest evaluation dataset
  2. Runs the evaluation job for each model
  3. Labels the best model "production" to feed into an inference system
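The three steps above can be sketched as plain-Python selection logic. The real evaluator reads model artifacts through the W&B API; this schematic stands in with dicts and a pluggable evaluation function:

```python
def evaluate_candidates(models, evaluated_ids, evaluate_fn):
    """Evaluate unscored models, then label the best one 'production'."""
    # 1. Find models that haven't been evaluated yet.
    candidates = [m for m in models if m["id"] not in evaluated_ids]
    # 2. Run the evaluation job for each candidate.
    for model in candidates:
        model["loss"] = evaluate_fn(model)
    # 3. Label the lowest-loss model "production".
    scored = [m for m in models if "loss" in m]
    best = min(scored, key=lambda m: m["loss"])
    best["status"] = "production"
    return best

# Toy usage: v0 was already evaluated; v1 and v2 are new candidates.
models = [{"id": "v0", "loss": 0.9}, {"id": "v1"}, {"id": "v2"}]
losses = {"v1": 0.5, "v2": 0.7}
best = evaluate_candidates(models, {"v0"}, lambda m: losses[m["id"]])
# best is v1, now labeled "production"
```

In the real pipeline the "production" label corresponds to an artifact alias, which an inference system can resolve to fetch the current best weights.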

5. Visualize results

Create tables to visualize your results. Here's an example report that captures and compares trained models:
In this example, this Weave table is visualizing logged model Artifacts with:
  1. Model link: A link to the registered model artifact in the app
  2. Version: A unique version number for each registered model
  3. Status: A label to indicate key model versions, like production
  4. Loss @ 10k: Model metric calculated on an evaluation set of 10k examples
  5. Loss @ 1k: Model metric calculated on an evaluation set of 1k examples

Core features for model management

There are a few key features you can use to build your own Model Registry:
  1. Runs: Track a job execution in your ML pipeline — ex. model training, model evaluation
  2. Artifacts: Track job inputs and outputs — ex. datasets, trained models
  3. Tables: Track and visualize tabular data — ex. evaluation datasets, model predictions
  4. Weave: Query and visualize logged data — ex. a list of trained models
  5. Reports: Organize and visualize results — ex. charts, tables, and notes

Model Registry Table

Once you have logged model Artifacts, it's time to query those artifacts.

1. Activate Weave

Go to your profile page and add weave-report to your bio to activate this new beta query feature.

2. Create a report

In a project, go to the Reports tab and click Create a report.

3. Add a Weave panel

Type /weave to create a new Weave panel in your report. If you want to remove the Weave panel later, you can click the handle on the left sidebar and click Delete.

4. Query your logged models

Start typing a query in the Weave panel.
Here's what each piece of the query in my example means:
  • projects("carey", "a_model_registry_example"): This pulls data from the entity carey and the project called a_model_registry_example.
  • artifactType("model"): This pulls all the artifacts of type model in this project.
  • artifactVersions: This pulls all the artifact versions of type model.
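Chained together, the three pieces above form the full query:

```
projects("carey", "a_model_registry_example").artifactType("model").artifactVersions
```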

5. Add a model link column

Add a column to pull all the links to different logged model artifacts.

6. Get the evaluation metric for each model

Create a new row in the table, and query for the loss. This was calculated in the evaluation step, which tested each model on a held-out dataset.
Optionally, you can rename the loss column so it's more readable.

7. Add a date created column

Sometimes it's nice to sort the table by the created time. Add a column:

8. Add a status column

Use the artifact aliases field to keep track of the status of different artifacts in your model registry. Add a column with row.aliases.
Then set the panel to visualize the results of the query as: List of: String