
Model Management Walkthrough

In this walkthrough you'll learn how to use Weights & Biases for Model Management. Track, visualize, and report on the complete production model workflow.

  1. Model Versioning: Save and restore every version of your model & learned parameters - organize versions by use case and objective. Track training metrics, assign custom metadata, and document rich markdown descriptions of your models.
  2. Model Lineage: Track the exact code, hyperparameters, & training dataset used to produce the model. Enable model reproducibility.
  3. Model Lifecycle: Promote promising models to positions like "staging" or "production" - allowing downstream users to fetch the best model automatically. Communicate progress collaboratively in Reports.

We are actively building new Model Management features. Please reach out with questions or suggestions at


Please see the Artifact Tab details for a discussion of all content available in the Model Registry!


Now we will walk through a canonical workflow for producing, organizing, and consuming trained models:

  1. Create a new Registered Model
  2. Train & log Model Versions
  3. Link Model Versions to the Registered Model
  4. Use a Model Version
  5. Evaluate Model Performance
  6. Promote a Version to Production
  7. Consume the Production Model
  8. Build a Reporting Dashboard

A companion Colab notebook is provided. Its first code block covers steps 2-3 and its second code block covers steps 4-6.

1. Create a new Registered Model

First, create a Registered Model to hold all the candidate models for your particular modeling task. In this tutorial, we will use the classic MNIST Dataset - 28x28 grayscale input images with output classes from 0-9. The video below demonstrates how to create a new Registered Model.

  1. Visit your Model Registry at (linked from homepage).

  2. Click the Create Registered Model button at the top of the Model Registry.

  3. Make sure the Owning Entity and Owning Project are set correctly to the values you desire. Enter a unique name for your new Registered Model that describes the modeling task or use-case of interest.

2. Train & log Model Versions

Next, you will log a model from your training script:

  1. (Optional) Declare your dataset as a dependency so that it is tracked for reproducibility and auditability
  2. Serialize your model to disk periodically (and/or at the end of training) using the serialization process provided by your modeling library (e.g. PyTorch or Keras).
  3. Add your model files to an Artifact of type "model"
    • Note: We use the name f'mnist-nn-{}'. While not required, it is advisable to name-space your "draft" Artifacts with the Run id in order to stay organized
  4. (Optional) Log training metrics associated with the performance of your model during training.
    • Note: The data logged immediately before logging your Model Version will automatically be associated with that version
  5. Log your model
    • Note: If you are logging multiple versions, it is advisable to add an alias of "best" to your Model Version when it outperforms the prior versions. This will make it easy to find the model with peak performance - especially when the tail end of training may overfit!

By default, you should use the native W&B Artifacts API to log your serialized model. However, since this pattern is so common, we have provided a single method which combines serialization, Artifact creation, and logging. See the "(Beta) Using log_model" tab for details.

import wandb

# Always initialize a W&B run to start tracking
wandb.init()

# (Optional) Declare an upstream dataset dependency
# see the `Declare Dataset Dependency` tab for
# alternative examples.
dataset = wandb.use_artifact("mnist:latest")

# At the end of every epoch (or at the end of your script)...
# ... Serialize your model to "path/to/" using your modeling library
# ... Create a Model Version, name-spaced with the Run id
art = wandb.Artifact(f"mnist-nn-{wandb.run.id}", type="model")
# ... Add the serialized files
art.add_file("path/to/", "")
# (Optional) Log training metrics
wandb.log({"train_loss": 0.345, "val_loss": 0.456})
# ... Log the Version
if model_is_best:
    # If the model is the best model so far, add "best" to the aliases
    wandb.log_artifact(art, aliases=["latest", "best"])
else:
    # Otherwise log the Version without the "best" alias
    wandb.log_artifact(art)
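
The snippet above relies on a `model_is_best` flag without showing where it comes from. A minimal sketch of one way to derive it, assuming that a lower validation loss means a better model; `BestTracker` and `aliases_for` are hypothetical helper names and are not part of the W&B API:

```python
import math


class BestTracker:
    """Tracks the lowest validation loss seen so far (hypothetical helper)."""

    def __init__(self):
        self.best_val_loss = math.inf

    def update(self, val_loss):
        """Return True if val_loss strictly improves on the best so far."""
        if val_loss < self.best_val_loss:
            self.best_val_loss = val_loss
            return True
        return False


def aliases_for(model_is_best):
    """Aliases to attach to a Model Version when logging it."""
    return ["latest", "best"] if model_is_best else ["latest"]


# Simulated validation losses for four epochs.
tracker = BestTracker()
history = [aliases_for(tracker.update(loss)) for loss in [0.9, 0.5, 0.7, 0.4]]
```

In a training loop, the result of `tracker.update(val_loss)` would play the role of `model_is_best`. The third epoch (loss 0.7) does not improve on 0.5, so it would be logged without the "best" alias.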

After logging one or more Model Versions, you will have a new Model Artifact in your Artifact Browser. Here, we can see the results of logging 5 versions to an artifact named mnist_nn-1r9jjogr.

If you are following along with the example notebook, you should see a Run Workspace with charts similar to the image below.

3. Link Model Versions to the Registered Model

Now, let's say that we are ready to link one of our Model Versions to the Registered Model. We can accomplish this manually as well as via an API.

The video below demonstrates how to manually link a Model Version to your newly created Registered Model:

  1. Navigate to the Model Version of interest
  2. Click the link icon
  3. Select the target Registered Model
  4. (Optional) Add additional aliases

After you link the Model Version, you will see hyperlinks connecting the Version in the Registered Model to the source Artifact and vice versa.

4. Use a Model Version

Now we are ready to consume a Model - perhaps to evaluate its performance, make predictions against a dataset, or use it in a live production context. As with logging a Model, you may choose to use the raw Artifact API or the more opinionated beta APIs.

You can load in a Model Version using the use_artifact method.

import wandb

# Always initialize a W&B run to start tracking
wandb.init()

# Download your Model Version files
path = wandb.use_artifact("[[entity/]project/]collectionName:latest").download()

# Reconstruct your model object in memory:
# `make_model_from_data` below represents your deserialization logic
# to load in a model from disk
model = make_model_from_data(path)
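
What `make_model_from_data` looks like depends on how the model was serialized. As a minimal sketch, assuming the Model Version contains a single pickled object in a file named `model.pkl` (a hypothetical file name; PyTorch and Keras provide their own loaders), it could be written as:

```python
import pickle
import tempfile
from pathlib import Path


def make_model_from_data(path, filename="model.pkl"):
    """Load a pickled model from the directory returned by `.download()`."""
    model_file = Path(path) / filename
    if not model_file.is_file():
        raise FileNotFoundError(f"expected {filename} in {path}")
    with model_file.open("rb") as f:
        # Only unpickle files that come from a source you trust.
        return pickle.load(f)


# Round-trip demo: a temporary directory stands in for the download path.
with tempfile.TemporaryDirectory() as download_dir:
    toy_model = {"weights": [0.1, 0.2], "bias": 0.3}
    with open(Path(download_dir) / "model.pkl", "wb") as f:
        pickle.dump(toy_model, f)
    restored = make_model_from_data(download_dir)
```

The explicit file check gives a clearer error than a bare `open` failure when the downloaded Version does not contain the expected file.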

5. Evaluate Model Performance

After training many Models, you will likely want to evaluate the performance of those models. In most circumstances you will have some held-out data which serves as a test dataset, independent of the dataset your models have access to during training. To evaluate a Model Version, you will want to first complete step 4 above to load a model into memory. Then:

  1. (Optional) Declare a data dependency on your evaluation data
  2. Log metrics, media, tables, and anything else useful for evaluation

# ... continuation from 4

# (Optional) Declare an upstream evaluation dataset dependency
dataset = wandb.use_artifact("mnist-evaluation:latest")

# Evaluate your model according to your use-case
loss, accuracy, predictions = evaluate_model(model, dataset)

# Log out metrics, images, tables, or any data useful for evaluation.
wandb.log({"loss": loss, "accuracy": accuracy, "predictions": predictions})
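
`evaluate_model` is left undefined above. A minimal sketch, assuming the model is a callable that returns per-class probabilities and that the evaluation data has already been loaded into `(input, label)` pairs (both are assumptions; an artifact returned by `use_artifact` would first need to be downloaded and parsed into that form):

```python
import math


def evaluate_model(model, examples):
    """Return (mean cross-entropy loss, accuracy, predicted labels)."""
    total_loss = 0.0
    correct = 0
    predictions = []
    for x, label in examples:
        probs = model(x)
        # Predicted class is the index with the highest probability.
        predicted = max(range(len(probs)), key=probs.__getitem__)
        predictions.append(predicted)
        correct += int(predicted == label)
        # Clamp to avoid log(0) when the true class gets zero probability.
        total_loss += -math.log(max(probs[label], 1e-12))
    n = len(predictions)
    if n == 0:
        raise ValueError("no evaluation examples")
    return total_loss / n, correct / n, predictions


def toy_model(x):
    """Hypothetical two-class model: favors class 1 when x > 0."""
    return [0.2, 0.8] if x > 0 else [0.9, 0.1]


loss, accuracy, predictions = evaluate_model(toy_model, [(1, 1), (-1, 0), (2, 0)])
```

Here the third example is misclassified, so the accuracy is two out of three and the loss is dominated by that example's low probability for the true class.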

If you are executing similar code, as demonstrated in the notebook, you should see a workspace similar to the image below - here we even show model predictions against the test data!

6. Promote a Version to Production

Next, you will likely want to denote which Version in the Registered Model is intended for Production. Here, we use the concept of aliases. Each Registered Model can have any aliases that make sense for your use case; production is the one we see most often. Each alias can only be assigned to a single Version at a time.

The image below shows the new production alias added to v1 of the Registered Model!
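
Because each alias can point at only one Version at a time, assigning an alias to a new Version moves it rather than copying it. The toy class below (`AliasTable`, a hypothetical name unrelated to the W&B client) models only that behavior:

```python
class AliasTable:
    """In-memory model of alias -> version assignments for one Registered Model."""

    def __init__(self):
        self._alias_to_version = {}

    def assign(self, alias, version):
        """Point `alias` at `version`; return the version it pointed at before."""
        previous = self._alias_to_version.get(alias)
        self._alias_to_version[alias] = version
        return previous

    def aliases_of(self, version):
        """All aliases currently attached to `version`, sorted by name."""
        return sorted(a for a, v in self._alias_to_version.items() if v == version)


table = AliasTable()
table.assign("staging", "v1")
table.assign("production", "v0")
moved_from = table.assign("production", "v1")  # promote v1 to production
```

After the last call, v1 carries both aliases and v0 carries none, which mirrors what you would expect to see in the Registered Model after a promotion.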

7. Consume the Production Model


You can reference a Version within the Registered Model using different alias strategies:

  • latest - which will fetch the most recently linked Version
  • v# - using v0, v1, v2, ... you can fetch a specific version in the Registered Model
  • production - you can use any custom alias that you and your team have assigned
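
Consuming the production model presumably follows the same pattern as step 4, with a different alias in the reference. The helper below (`model_reference`, a hypothetical name) only builds a string of the form `[[entity/]project/]collectionName:alias`; the collection, project, and entity names are placeholders.

```python
def model_reference(collection, alias="latest", project=None, entity=None):
    """Build a reference of the form [[entity/]project/]collection:alias."""
    if entity and not project:
        raise ValueError("entity requires project")
    prefix = ""
    if project:
        prefix = f"{project}/"
        if entity:
            prefix = f"{entity}/{prefix}"
    return f"{prefix}{collection}:{alias}"


# One reference per alias strategy listed above.
refs = [
    model_reference("MNIST"),
    model_reference("MNIST", "v2", project="my-project"),
    model_reference("MNIST", "production", project="my-project", entity="my-team"),
]
```

Each resulting string can be passed to `wandb.use_artifact(...)` and downloaded as in step 4.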

8. Build a Reporting Dashboard

Using Weave Panels, you can display any of the Model Registry/Artifact views inside of Reports! See a demo here. Below is a full-page screenshot of an example Model Dashboard.
