Getting started
To enable Weave tracing for your Verdict pipelines, call weave.init(project=...) at the beginning of your script. Use the project argument to log to a specific W&B team in the form team-name/project-name, or pass project-name alone to log to your default team or entity.
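As a minimal sketch of the two naming forms, the hypothetical helper resolve_project below builds the project string, and init_tracing passes it to weave.init as the first positional argument. The helper names and the example team and project names are illustrative assumptions, not part of Weave.

```python
from typing import Optional


def resolve_project(project: str, team: Optional[str] = None) -> str:
    """Hypothetical helper: return 'team-name/project-name' or just 'project-name'."""
    return f"{team}/{project}" if team else project


def init_tracing(project: str, team: Optional[str] = None):
    """Initialize Weave before any Verdict pipeline is created."""
    # Imported lazily so this module can be loaded without Weave installed.
    import weave

    return weave.init(resolve_project(project, team))
```

Calling init_tracing("verdict-demo", team="my-team") would log to my-team/verdict-demo, while init_tracing("verdict-demo") would log to the default entity.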
Tracking call metadata
To attach custom metadata to your Verdict pipeline calls, use the weave.attributes context manager. This context manager lets you tag a specific block of code, such as a pipeline run or evaluation batch, so that you can filter and group related traces later in the Weave UI.
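The sketch below shows one way to wrap a pipeline run in weave.attributes. It assumes weave.init has already been called and that pipeline is an object with a run method, as Verdict pipelines have; the function name run_with_attributes and the attribute keys in the usage note are hypothetical.

```python
def run_with_attributes(pipeline, data, attributes: dict):
    """Run a pipeline so that every call traced inside the block carries `attributes`."""
    # Imported lazily so this module can be loaded without Weave installed.
    import weave

    with weave.attributes(attributes):
        return pipeline.run(data)
```

For example, run_with_attributes(pipeline, data, {"pipeline_version": "v2", "batch": "nightly"}) would tag the run with two illustrative keys that you could later filter on.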
Traces
Storing traces of AI evaluation pipelines in a central database helps during both development and production. These traces support debugging and improving your evaluation workflows, and they provide a useful dataset. Weave automatically captures traces for your Verdict applications. It tracks and logs all calls made through the Verdict library, including:
- Pipeline execution steps.
- JudgeUnit evaluations.
- Layer transformations.
- Pooling operations.
- Custom units and transformations.
Pipeline tracing example
Weave traces nested pipeline operations, so you can see how each step in a multi-stage Verdict pipeline is captured. For a pipeline that feeds a Layer of JudgeUnit evaluations into a MeanPoolUnit, the resulting trace shows:
- The main Pipeline execution.
- Each JudgeUnit evaluation within the Layer.
- The MeanPoolUnit aggregation step.
- Timing information for each operation.
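A pipeline with the shape listed above might be built as in the following sketch. The import paths, the {source.text} prompt placeholder, and the default arguments of Layer, JudgeUnit, and MeanPoolUnit reflect my understanding of the Verdict package and should be treated as assumptions; the prompts, function names, and project name are illustrative.

```python
JUDGE_PROMPTS = [
    "Rate coherence: {source.text}",
    "Rate relevance: {source.text}",
    "Rate accuracy: {source.text}",
]


def build_pipeline(prompts=JUDGE_PROMPTS):
    """Build Pipeline >> Layer(three judges) >> MeanPoolUnit."""
    # Imported lazily so this module can be loaded without Verdict installed.
    from verdict import Layer, Pipeline
    from verdict.common.judge import JudgeUnit
    from verdict.transform import MeanPoolUnit

    judges = [JudgeUnit().prompt(p) for p in prompts]
    return Pipeline() >> Layer(judges) >> MeanPoolUnit()


def run_traced(text: str, project: str = "verdict-demo"):
    """Initialize Weave, then run the pipeline so its nested steps are traced."""
    import weave
    from verdict.schema import Schema

    weave.init(project)
    return build_pipeline().run(Schema.of(text=text))
```

Running the pipeline requires credentials for both W&B and the model provider that the judges call.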
Configuration
When you call weave.init(), Weave automatically enables tracing for Verdict pipelines. The integration works by patching the Pipeline.__init__() method to inject a VerdictTracer that forwards all trace data to Weave.
You don’t need any additional configuration. Weave automatically performs the following:
- Captures all pipeline operations.
- Tracks execution timing.
- Logs inputs and outputs.
- Maintains trace hierarchy.
- Handles concurrent pipeline execution.
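The patching approach described in this section can be illustrated with stand-in classes. This is a sketch of the general technique only: the Pipeline and RecordingTracer classes and the patch_pipeline_init function below are hypothetical stand-ins, not Verdict's or Weave's actual implementation.

```python
import functools


class Pipeline:
    """Stand-in for a pipeline class that accepts optional tracers."""

    def __init__(self, tracer=None):
        self.tracers = list(tracer or [])


class RecordingTracer:
    """Stand-in for a tracer that would forward trace data to a backend."""

    def __init__(self, name):
        self.name = name


def patch_pipeline_init(cls, make_tracer):
    """Wrap cls.__init__ so every new instance also gets a tracer from make_tracer."""
    original_init = cls.__init__

    @functools.wraps(original_init)
    def patched_init(self, *args, **kwargs):
        original_init(self, *args, **kwargs)  # keep the user's own tracers
        self.tracers.append(make_tracer())    # then inject the extra one

    cls.__init__ = patched_init
    return original_init  # returned so the patch can be undone


original_init = patch_pipeline_init(Pipeline, lambda: RecordingTracer("weave"))
pipeline = Pipeline(tracer=[RecordingTracer("console")])
print([t.name for t in pipeline.tracers])  # ['console', 'weave']
```

Because the wrapper calls the original constructor first, tracers supplied by the caller are preserved and the injected tracer is appended after them.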
Custom tracers and Weave
If you already use custom Verdict tracers in your application, Weave’s VerdictTracer can run alongside them, so you don’t have to choose between integrations.
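A minimal sketch of combining both kinds of tracing follows. It assumes that Verdict's Pipeline accepts your own tracers through a tracer argument; that argument name, the function name build_traced_pipeline, and the project name in the usage note are assumptions to check against your Verdict version.

```python
def build_traced_pipeline(project: str, own_tracers):
    """Create a pipeline that keeps the caller's tracers while Weave tracing is enabled."""
    # Imported lazily so this module can be loaded without Weave or Verdict installed.
    import weave
    from verdict import Pipeline

    weave.init(project)  # enable Weave tracing before the pipeline is constructed
    # Assumption: Pipeline takes user tracers via `tracer`; Weave adds its own separately.
    return Pipeline(tracer=list(own_tracers))
```

With this arrangement you would pass only your existing tracer instances, for example build_traced_pipeline("verdict-demo", [my_tracer]), and rely on Weave to attach its VerdictTracer automatically.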
Models and evaluations
Organizing and evaluating AI systems with multiple pipeline components can be challenging. Using weave.Model, you can capture and organize experimental details like prompts, pipeline configurations, and evaluation parameters, making it easier to compare different iterations.
A Verdict pipeline can be wrapped in a weave.Model, with details such as the judge prompt declared as model attributes.
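The sketch below shows one possible wrapper. The class name JudgeModel, the factory function make_judge_model, and the shape of the returned dictionary are hypothetical, and the Verdict import paths are assumptions. The class is defined inside a function only so that the module can be loaded without Weave or Verdict installed; in an application you would normally define it at module level.

```python
def make_judge_model(judge_prompt: str):
    """Return a weave.Model that scores text with a single-judge Verdict pipeline."""
    import weave
    from verdict import Pipeline
    from verdict.common.judge import JudgeUnit
    from verdict.schema import Schema

    class JudgeModel(weave.Model):
        # Declared attributes are recorded by Weave as part of the model.
        judge_prompt: str

        @weave.op()
        def predict(self, text: str) -> dict:
            # Build the pipeline from the stored prompt, then run it on one input.
            pipeline = Pipeline() >> JudgeUnit().prompt(self.judge_prompt)
            result = pipeline.run(Schema.of(text=text))
            return {"text": text, "result": result}

    return JudgeModel(judge_prompt=judge_prompt)
```

Changing judge_prompt produces a distinct model configuration, which is what makes different iterations comparable.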
Evaluations
Evaluations help you measure the performance of your evaluation pipelines themselves. By using the weave.Evaluation class, you can capture how well your Verdict pipelines perform on specific tasks or datasets.
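As a rough sketch, an evaluation pairs a dataset with one or more scorer functions and runs a model over every row. The scorer score_within_tolerance, the dataset column expected_score, and the assumption that the model's predict method returns a dictionary with a numeric "score" key are all hypothetical. Recent Weave versions pass the model result to scorers as output; older versions used a different parameter name.

```python
import asyncio


def score_within_tolerance(expected_score: float, output: dict) -> dict:
    """Hypothetical scorer: is the judged score within 1 point of the reference?"""
    judged = output.get("score")
    ok = judged is not None and abs(judged - expected_score) <= 1.0
    return {"within_tolerance": ok}


def run_evaluation(model, examples, project: str = "verdict-demo"):
    """Evaluate `model` on `examples`, a list of dicts with 'text' and 'expected_score'."""
    # Imported lazily so this module can be loaded without Weave installed.
    import weave

    weave.init(project)
    evaluation = weave.Evaluation(dataset=examples, scorers=[score_within_tolerance])
    # Evaluation.evaluate is a coroutine, so drive it with asyncio.
    return asyncio.run(evaluation.evaluate(model))
```

Each dataset row supplies text to the model and expected_score to the scorer, so column names must match the parameter names of predict and of the scorer.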