DSPy
Use W&B with DSPy to track and optimize your language model programs. W&B complements the Weave DSPy integration by providing:
- Evaluation metrics tracking over time
- W&B Tables for program signature evolution
- Integration with DSPy optimizers like MIPROv2
For comprehensive observability when optimizing DSPy modules, enable the integration in both W&B and Weave.
Install and authenticate
Install the required libraries and authenticate with W&B:
From the command line:
- Install the required libraries:
pip install wandb weave dspy
- Set the WANDB_API_KEY environment variable and log in:
export WANDB_API_KEY=<your_api_key>
wandb login
In a Python script:
- Install the required libraries:
pip install wandb weave dspy
- In your code, log in to W&B:
import wandb
wandb.login()
In a notebook:
Install and import the required libraries, then log in to W&B:
!pip install wandb weave dspy
import wandb
wandb.login()
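Before calling wandb.login() in a script, it can help to confirm that the WANDB_API_KEY environment variable is actually set. The following is a minimal sketch using only the standard library. The helper name api_key_configured is hypothetical and is not part of the wandb package.

```python
import os


def api_key_configured(env=None):
    """Return True if WANDB_API_KEY is set to a non-empty value.

    `env` defaults to os.environ; pass a plain dict to check other mappings.
    This helper is illustrative only and is not part of wandb.
    """
    env = os.environ if env is None else env
    return bool(env.get("WANDB_API_KEY", "").strip())


if __name__ == "__main__":
    if api_key_configured():
        print("WANDB_API_KEY is set.")
    else:
        print("WANDB_API_KEY is not set; wandb.login() may prompt for a key.")
```

A check like this is mostly useful in non-interactive jobs, where an unexpected login prompt would otherwise stall the run.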
New to W&B? See our quickstart guide.
Track program optimization (experimental)
For DSPy optimizers that use dspy.Evaluate (such as MIPROv2), use the WandbDSPyCallback to log evaluation metrics over time and track program signature evolution in W&B Tables.
import dspy
from dspy.datasets import MATH
import weave
import wandb
from wandb.integration.dspy import WandbDSPyCallback
# Initialize W&B and Weave
project_name = "dspy-optimization"
weave.init(project_name)
wandb.init(project=project_name)
# Add W&B callback to DSPy
dspy.settings.callbacks.append(WandbDSPyCallback())
# Configure language models
teacher_lm = dspy.LM('openai/gpt-4o', max_tokens=2000, cache=True)
student_lm = dspy.LM('openai/gpt-4o-mini', max_tokens=2000)
dspy.configure(lm=student_lm)
# Load dataset and define program
dataset = MATH(subset='algebra')
program = dspy.ChainOfThought("question -> answer")
# Configure and run optimizer
optimizer = dspy.MIPROv2(
    metric=dataset.metric,
    auto="light",
    num_threads=24,
    teacher_settings=dict(lm=teacher_lm),
    prompt_model=student_lm
)
optimized_program = optimizer.compile(
    program,
    trainset=dataset.train,
    max_bootstrapped_demos=2,
    max_labeled_demos=2
)
After running this code, you receive both a W&B Run URL and a Weave URL. W&B displays evaluation metrics over time, along with Tables that show the evolution of program signatures. The run’s Overview tab includes links to Weave traces for detailed inspection.

For comprehensive details about Weave tracing, evaluation, and optimization with DSPy, see the Weave DSPy integration guide.
Log predictions to W&B Tables
Enable detailed prediction logging to inspect individual examples during optimization. The callback creates a W&B Table for each evaluation step, which helps you analyze specific successes and failures.
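import dspy
from wandb.integration.dspy import WandbDSPyCallback
# Enable prediction logging (enabled by default)
callback = WandbDSPyCallback(log_results=True)
dspy.settings.callbacks.append(callback)
# Run your optimization. This assumes `optimizer` and `program` are already
# defined, and that `train_data` is your training set (for example, dataset.train).
optimized_program = optimizer.compile(program, trainset=train_data)
# To disable prediction logging, create the callback with log_results=False instead:
# callback = WandbDSPyCallback(log_results=False)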
Access prediction data
After optimization, find your prediction data in W&B:
- Navigate to your run’s Overview page.
- Look for Table panels named with a pattern like predictions_0, predictions_1, and so forth.
- Filter by is_correct to analyze failures.
- Compare tables across runs in the project workspace.
Each table includes columns for:
- example: Input data
- prediction: Model output
- is_correct: Evaluation result
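The failure analysis described above can be sketched offline in plain Python. The rows below are made-up sample data that only mirror the three column names. They are not output from a real run, and the sketch assumes is_correct holds a boolean, which may differ from what your metric logs.

```python
# Hypothetical rows shaped like a predictions table (example, prediction, is_correct).
rows = [
    {"example": "2 + 2", "prediction": "4", "is_correct": True},
    {"example": "3 * 3", "prediction": "6", "is_correct": False},
    {"example": "10 / 2", "prediction": "5", "is_correct": True},
]


def failures(table_rows):
    """Return only the rows where the evaluation marked the prediction as wrong."""
    return [row for row in table_rows if not row["is_correct"]]


def accuracy(table_rows):
    """Return the fraction of correct rows, or 0.0 for an empty table."""
    if not table_rows:
        return 0.0
    correct = sum(1 for row in table_rows if row["is_correct"])
    return correct / len(table_rows)


if __name__ == "__main__":
    for row in failures(rows):
        print(f"FAILED: {row['example']!r} -> {row['prediction']!r}")
    print(f"accuracy: {accuracy(rows):.2f}")
```

Filtering in the W&B UI gives the same view interactively. A script like this is only useful once you have exported the rows yourself.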
Learn more in the W&B Tables guide and the Tables tutorial.
Save and version DSPy programs
To reproduce and version your best DSPy programs, save them as W&B Artifacts. Choose between saving the complete program or only the state.
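import dspy
from wandb.integration.dspy import WandbDSPyCallback
# Create callback instance
callback = WandbDSPyCallback()
dspy.settings.callbacks.append(callback)
# Run optimization. This assumes `optimizer` and `program` are already
# defined, and that `train_data` is your training set (for example, dataset.train).
optimized_program = optimizer.compile(program, trainset=train_data)
# Save options (choose the one that fits your workflow):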
# 1. Complete program (recommended) - includes architecture and state
callback.log_best_model(optimized_program, save_program=True)
# 2. State only as JSON - lighter weight, human-readable
callback.log_best_model(optimized_program, save_program=False, choice="json")
# 3. State only as pickle - preserves Python objects
callback.log_best_model(optimized_program, save_program=False, choice="pkl")
# Add custom aliases for versioning
callback.log_best_model(
    optimized_program,
    save_program=True,
    aliases=["best", "production", "v2.0"]
)
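The trade-off between the JSON and pickle state formats can be illustrated with the standard library alone. This sketch uses a made-up state dictionary. It is not DSPy's actual saved-state layout and does not show what the callback does internally. It shows only the general difference between the two serialization formats.

```python
import json
import pickle

# Hypothetical state for illustration; a real DSPy program state looks different.
state = {
    "instructions": "Solve the problem step by step.",
    "demo_pair": ("2 + 2", "4"),  # a tuple, to show how each format treats Python types
}

# JSON is plain text that you can read and diff, but it only knows JSON types.
json_text = json.dumps(state, indent=2)
from_json = json.loads(json_text)

# Pickle is an opaque byte string, but it round-trips Python objects as they were.
pickled = pickle.dumps(state)
from_pickle = pickle.loads(pickled)

if __name__ == "__main__":
    print(json_text)
    print(type(from_json["demo_pair"]).__name__)    # tuple became a list
    print(type(from_pickle["demo_pair"]).__name__)  # tuple preserved
```

As a general precaution, unpickle only files you trust, because loading a pickle can execute arbitrary code.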