Metaflow

Overview

Metaflow is a framework created by Netflix for creating and running ML workflows.
This integration lets users apply decorators to Metaflow steps and flows to automatically log parameters and artifacts to W&B.
  • Decorating a step will enable or disable logging for certain types within that step.
  • Decorating the flow will enable or disable logging for every step in the flow.

Quickstart

Install W&B and log in

Notebook:

    !pip install -Uqqq metaflow fastcore wandb

    import wandb
    wandb.login()

Command line:

    pip install -Uqqq metaflow fastcore wandb
    wandb login

Decorate your flows and steps

Step

Decorating a step will enable or disable logging for certain types within that step.
In this example, all datasets and models in start will be logged.

    import pandas as pd
    import torch
    import wandb
    from metaflow import FlowSpec, step
    from wandb.integration.metaflow import wandb_log

    class WandbExampleFlow(FlowSpec):
        @wandb_log(datasets=True, models=True, settings=wandb.Settings(...))
        @step
        def start(self):
            self.raw_df = pd.read_csv(...)     # pd.DataFrame -> upload as dataset
            self.model_file = torch.load(...)  # nn.Module -> upload as model
            self.next(self.transform)
Flow

Decorating a flow is equivalent to decorating all the constituent steps with a default.
In this case, all steps in WandbExampleFlow will log datasets and models by default, the same as decorating each step with @wandb_log(datasets=True, models=True).

    import pandas as pd
    import torch
    from metaflow import FlowSpec, step
    from wandb.integration.metaflow import wandb_log

    @wandb_log(datasets=True, models=True)  # decorate all @step
    class WandbExampleFlow(FlowSpec):
        @step
        def start(self):
            self.raw_df = pd.read_csv(...)     # pd.DataFrame -> upload as dataset
            self.model_file = torch.load(...)  # nn.Module -> upload as model
            self.next(self.transform)
Both flow and steps

Decorating the flow is equivalent to decorating all steps with a default. That means if you later decorate a step with another @wandb_log, you will override the flow-level decoration.
In the example below:
  • start and mid will log datasets and models, but
  • end will not log datasets or models.

    import pandas as pd
    import torch
    from metaflow import FlowSpec, step
    from wandb.integration.metaflow import wandb_log

    @wandb_log(datasets=True, models=True)  # same as decorating start and mid
    class WandbExampleFlow(FlowSpec):
        # this step will log datasets and models
        @step
        def start(self):
            self.raw_df = pd.read_csv(...)     # pd.DataFrame -> upload as dataset
            self.model_file = torch.load(...)  # nn.Module -> upload as model
            self.next(self.mid)

        # this step will also log datasets and models
        @step
        def mid(self):
            self.raw_df = pd.read_csv(...)     # pd.DataFrame -> upload as dataset
            self.model_file = torch.load(...)  # nn.Module -> upload as model
            self.next(self.end)

        # this step is overridden and will NOT log datasets OR models
        @wandb_log(datasets=False, models=False)
        @step
        def end(self):
            self.raw_df = pd.read_csv(...)
            self.model_file = torch.load(...)
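The override rule for the flow-plus-step example can be sketched in plain Python. This is only an illustration of the behaviour described above, not the integration's implementation: the names FLOW_DEFAULT, STEP_OVERRIDES, and effective_flags are hypothetical, and the sketch assumes that a step-level decoration replaces the flow-level one wholesale.

```python
# Flow-level decoration from the example: applies to every step by default.
FLOW_DEFAULT = {"datasets": True, "models": True}

# Only `end` carries its own @wandb_log in the example (hypothetical structure).
STEP_OVERRIDES = {"end": {"datasets": False, "models": False}}


def effective_flags(step_name, flow_default, step_overrides):
    """Return the logging flags a step ends up with.

    Assumption: a step-level decoration, when present, replaces the
    flow-level default entirely.
    """
    override = step_overrides.get(step_name)
    return dict(override) if override is not None else dict(flow_default)


resolved = {
    name: effective_flags(name, FLOW_DEFAULT, STEP_OVERRIDES)
    for name in ("start", "mid", "end")
}
for name, flags in resolved.items():
    print(name, flags)
```

Under that assumption, start and mid inherit the flow default while end resolves to its own flags.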

Where is my data? Can I access it programmatically?

You can access the information we've captured in three ways: inside the original Python process being logged using the wandb client library, via the web app UI, or programmatically using our Public API.
  • Parameters are saved to W&B's config and can be found in the Overview tab.
  • Datasets, models, and others are saved to W&B Artifacts and can be found in the Artifacts tab.
  • Base Python types are saved to W&B's summary dict and can be found in the Overview tab.
See our guide to the Public API for details on using the API to get this information programmatically from outside the original process.
Here's a cheatsheet:
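| Data                                      | Client library                          | UI                    |
|-------------------------------------------|-----------------------------------------|-----------------------|
| Parameter(...)                            | wandb.config                            | Overview tab, Config  |
| datasets, models, others                  | wandb.use_artifact("{var_name}:latest") | Artifacts tab         |
| Base Python types (dict, list, str, etc.) | wandb.summary                           | Overview tab, Summary |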
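As a rough sketch of the programmatic route, the helper below reads a run's config, summary, and logged artifact names. It assumes an object that behaves like wandb.Api() (its run(path) method, and the run's config, summary, and logged_artifacts() members). The helper name fetch_run_info, the run path, and the fake data are hypothetical, and the API object is passed in so the sketch can run offline.

```python
from types import SimpleNamespace


def fetch_run_info(api, run_path):
    """Collect config, summary, and logged artifact names for one run.

    `api` is expected to behave like `wandb.Api()`; it is injected so the
    sketch can be exercised without network access.
    """
    run = api.run(run_path)
    return {
        "config": dict(run.config),
        "summary": run.summary,
        "artifacts": [artifact.name for artifact in run.logged_artifacts()],
    }


# Offline stand-in mimicking only the attributes used above (hypothetical data).
fake_run = SimpleNamespace(
    config={"learning_rate": 0.01},
    summary={"accuracy": 0.91},
    logged_artifacts=lambda: [SimpleNamespace(name="raw_df:v0")],
)
fake_api = SimpleNamespace(run=lambda path: fake_run)

info = fetch_run_info(fake_api, "my-entity/my-project/my-run-id")
print(info["config"], info["artifacts"])
```

With the real client, the same helper would presumably be called as fetch_run_info(wandb.Api(), "<entity>/<project>/<run_id>"), with the path filled in for your own run.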

wandb_log kwargs

| kwarg    | Options                                                                          |
|----------|----------------------------------------------------------------------------------|
| datasets | True: Log instance variables that are a dataset. False: Do not log them.         |
| models   | True: Log instance variables that are a model. False: Do not log them.           |
| others   | True: Log anything else that is serializable as a pickle. False: Do not log it.  |
| settings | wandb.Settings(...): Specify your own wandb settings for this step or flow. None: Equivalent to passing wandb.Settings(). |
By default, if:
  • settings.run_group is None, it will be set to {flow_name}/{run_id}
  • settings.run_job_type is None, it will be set to {run_job_type}/{step_name}
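The "fill in only when None" rule for the run group can be sketched as below. This is an illustration, not the integration's code: resolve_run_group is a hypothetical helper, and the flow name and run ID values are made up. The job type default follows the same pattern and is not sketched here.

```python
def resolve_run_group(run_group, flow_name, run_id):
    """Keep a user-supplied group; otherwise derive one from the flow run."""
    if run_group is None:
        return f"{flow_name}/{run_id}"
    return run_group


# No group supplied: the default groups all steps of one flow run together.
default_group = resolve_run_group(None, "WandbExampleFlow", "1234")
# A user-supplied group is left untouched.
custom_group = resolve_run_group("my-experiments", "WandbExampleFlow", "1234")
print(default_group, custom_group)
```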

Frequently Asked Questions

What exactly do you log? Do you log all instance and local variables?

wandb_log only logs instance variables. Local variables are NEVER logged. This is useful to avoid logging unnecessary data.
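The distinction comes down to ordinary Python mechanics: attributes assigned on self remain on the object after a method returns, while local variables do not. The sketch below illustrates this with a hypothetical FakeStep class (not Metaflow's FlowSpec) and does not claim to mirror how wandb_log inspects a step.

```python
class FakeStep:
    """Hypothetical stand-in for a flow step; not Metaflow's FlowSpec."""

    def run(self):
        self.accuracy = 0.91   # instance variable: stays on the object
        scratch = [1, 2, 3]    # local variable: gone once run() returns
        return len(scratch)


def instance_variables(obj):
    """Return a copy of the attributes stored on the object itself."""
    return dict(vars(obj))


step_obj = FakeStep()
step_obj.run()
captured = instance_variables(step_obj)
print(captured)
```

Only accuracy shows up in captured; scratch is not reachable from the object at all.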

Which data types get logged?

We currently support these types:
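| Logging Setting     | Type                                           |
|---------------------|------------------------------------------------|
| default (always on) | dict, list, set, str, int, float, bool         |
| datasets            | pd.DataFrame, pathlib.Path                     |
| models              | nn.Module, sklearn.base.BaseEstimator          |
| others              | Anything else that is serializable as a pickle |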

Examples of logging behaviour

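| Kind of Variable | Behaviour                    | Example       | Data Type    |
|------------------|------------------------------|---------------|--------------|
| Instance         | Auto-logged                  | self.accuracy | float        |
| Instance         | Logged if datasets=True      | self.df       | pd.DataFrame |
| Instance         | Not logged if datasets=False | self.df       | pd.DataFrame |
| Local            | Never logged                 | accuracy      | float        |
| Local            | Never logged                 | df            | pd.DataFrame |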
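The table's behaviour can be approximated by a small type-based dispatch. The sketch below is an assumption-laden illustration, not the integration's logic: decide and FakeModel are hypothetical, FakeModel stands in for nn.Module and sklearn.base.BaseEstimator, and pathlib.Path is the only dataset type used so the example needs nothing beyond the standard library.

```python
from pathlib import Path

# Types that are always logged, per the supported-types list.
BASIC_TYPES = (dict, list, set, str, int, float, bool)
# The documented dataset types also include pd.DataFrame (omitted here).
DATASET_TYPES = (Path,)


class FakeModel:
    """Hypothetical stand-in for nn.Module / sklearn.base.BaseEstimator."""


MODEL_TYPES = (FakeModel,)


def decide(value, *, datasets, models, others):
    """Say what would happen to one instance variable under the given flags."""
    if isinstance(value, BASIC_TYPES):
        return "summary"
    if isinstance(value, DATASET_TYPES):
        return "dataset artifact" if datasets else "skipped"
    if isinstance(value, MODEL_TYPES):
        return "model artifact" if models else "skipped"
    return "other artifact" if others else "skipped"


flags = {"datasets": True, "models": False, "others": False}
print(decide(0.91, **flags))              # basic type
print(decide(Path("data.csv"), **flags))  # dataset type, datasets=True
print(decide(FakeModel(), **flags))       # model type, models=False
```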

Does this track artifact lineage?

Yes! If you have an artifact that is an output of step A and an input to step B, we automatically construct the lineage DAG for you.
For an example of this behaviour, please see this notebook and its corresponding W&B Artifacts page.
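Conceptually, lineage links each artifact's producing step to the steps that consume it. The sketch below shows that idea on plain tuples. It is not how W&B builds the graph: lineage_edges and the step records are hypothetical, and the sketch assumes steps are listed in execution order.

```python
def lineage_edges(steps):
    """Derive (producer, artifact, consumer) edges from step records.

    Each record is (step_name, input_artifacts, output_artifacts), listed
    in execution order (an assumption of this sketch).
    """
    producer_of = {}
    edges = []
    for name, inputs, outputs in steps:
        for artifact in inputs:
            if artifact in producer_of:
                edges.append((producer_of[artifact], artifact, name))
        for artifact in outputs:
            producer_of[artifact] = name
    return edges


records = [
    ("A", [], ["raw_df"]),               # step A outputs raw_df
    ("B", ["raw_df"], ["model_file"]),   # step B consumes raw_df
]
edges = lineage_edges(records)
print(edges)
```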