Kubeflow Pipelines (kfp)

Overview

Kubeflow Pipelines (kfp) is a platform for building and deploying portable, scalable machine learning (ML) workflows based on Docker containers.
This integration lets you apply a decorator to KFP Python function-based components to automatically log parameters and artifacts to W&B.
This feature was added in wandb==0.12.11 and requires kfp<2.0.0.

Quickstart

Install W&B and login

Notebook

```python
!pip install kfp wandb

import wandb
wandb.login()
```

Command Line

```shell
pip install kfp wandb
wandb login
```

Decorate your components

Add the @wandb_log decorator and create your components as usual. This automatically logs the input/output parameters and artifacts to W&B each time you run your pipeline.
```python
from kfp import components
from wandb.integration.kfp import wandb_log

@wandb_log
def add(a: float, b: float) -> float:
    return a + b

add = components.create_component_from_func(add)
```

Passing env vars to containers

You may need to explicitly pass W&B environment variables to your containers. For two-way linking, you should also set the environment variable WANDB_KUBEFLOW_URL to the base URL of your Kubeflow Pipelines instance (e.g. https://kubeflow.mysite.com).
```python
import os

from kfp import dsl
from kubernetes.client.models import V1EnvVar

def add_wandb_env_variables(op):
    env = {
        "WANDB_API_KEY": os.getenv("WANDB_API_KEY"),
        "WANDB_BASE_URL": os.getenv("WANDB_BASE_URL"),
        "WANDB_KUBEFLOW_URL": os.getenv("WANDB_KUBEFLOW_URL"),  # for two-way linking
    }

    for name, value in env.items():
        op = op.add_env_variable(V1EnvVar(name, value))
    return op

@dsl.pipeline(name="example-pipeline")
def example_pipeline(...):
    conf = dsl.get_pipeline_conf()
    conf.add_op_transformer(add_wandb_env_variables)
    ...
```

Where is my data? Can I access it programmatically?

Via the Kubeflow Pipelines UI

Click on any Run in the Kubeflow Pipelines UI that has been logged with W&B.
  • Inputs and outputs will be tracked in the Input/Output and ML Metadata tabs
  • You can also view the W&B web app from the Visualizations tab.
Get a view of W&B in the Kubeflow UI

Via the web app UI

The web app UI has the same content as the Visualizations tab in Kubeflow Pipelines, but with more space! Learn more about the web app UI here.
View details about a particular run (and link back to the Kubeflow UI)
See the full DAG of inputs and outputs at each stage of your pipeline

Via the Public API (for programmatic access)

Concept mapping from Kubeflow Pipelines to W&B

Here's a mapping of Kubeflow Pipelines concepts to W&B:

| Kubeflow Pipelines | W&B | Location in W&B |
| --- | --- | --- |
| Input Scalar | config | Config section of the run page |
| Output Scalar | summary | Summary on the run's Overview tab |
| Input Artifact | Input Artifact | Run's Artifacts tab |
| Output Artifact | Output Artifact | Run's Artifacts tab |

Fine-grain logging

If you want finer control of logging, you can sprinkle in wandb.log and wandb.log_artifact calls in the component.

With explicit wandb logging calls

In the example below, we train a model. The @wandb_log decorator automatically tracks the relevant inputs and outputs. If you also want to log the training process, you can explicitly add that logging like so:
```python
@wandb_log
def train_model(
    train_dataloader_path: components.InputPath("dataloader"),
    test_dataloader_path: components.InputPath("dataloader"),
    model_path: components.OutputPath("pytorch_model")
):
    ...
    for epoch in epochs:
        for batch_idx, (data, target) in enumerate(train_dataloader):
            ...
            if batch_idx % log_interval == 0:
                wandb.log({
                    "epoch": epoch,
                    "step": batch_idx * len(data),
                    "loss": loss.item()
                })
    ...
    wandb.log_artifact(model_artifact)
```

With implicit wandb integrations

If you're using a framework integration we support, you can also pass in the callback directly:
```python
@wandb_log
def train_model(
    train_dataloader_path: components.InputPath("dataloader"),
    test_dataloader_path: components.InputPath("dataloader"),
    model_path: components.OutputPath("pytorch_model")
):
    from pytorch_lightning import Trainer
    from pytorch_lightning.loggers import WandbLogger

    trainer = Trainer(logger=WandbLogger())
    ...  # do training
```