Integrate with Weave: production dashboard

This notebook demonstrates how to use Weave’s APIs and functions to create a custom dashboard for production monitoring as an extension to the Traces view in Weave. This guide is intended for developers and ML engineers who run LLM applications in production and want tailored visibility into performance, cost, and user feedback beyond what the default Traces view provides. This notebook focuses on:

Fetching traces, costs, feedback, and other metrics from Weave.
Creating aggregate views for user feedback and cost distribution.
Creating visualizations for token usage and latency over time.

By the end of this notebook, you will have a working custom dashboard that aggregates trace data, costs, and feedback from your Weave project into a single view. You can try the dashboard with your own Weave project by installing streamlit and running the production dashboard script.

[Figure: Example production dashboard with Weave]

Setup

To begin, install the following packages:

!pip install streamlit pandas plotly weave

Implementation

The following sections walk through initializing the Weave client, fetching call data, and generating visualizations for the dashboard.

Initialize the Weave client and define costs

First, set up a function that initializes the Weave client and registers costs for each model. This step is required so that downstream cost queries can attribute per-token pricing to each call. W&B ships default costs for many common models and also lets you add your own custom costs and custom models. The following example adds custom costs for a few models and relies on the default costs for the rest. Weave calculates costs from the tokens tracked for each call. For many LLM vendor libraries, Weave tracks token usage automatically, but you can also return custom token counts for any call. For more information about defining the token count and cost calculation for a custom model, see the custom cost cookbook.

import weave

PROJECT_NAME = "wandb-smle/weave-cookboook-demo"

MODEL_NAMES = [
    # model name, prompt cost, completion cost
    ("gpt-4o-2024-05-13", 0.03, 0.06),
    ("gpt-4o-mini-2024-07-18", 0.03, 0.06),
    ("gemini/gemini-1.5-flash", 0.00025, 0.0005),
    ("gpt-4o-mini", 0.03, 0.06),
    ("gpt-4-turbo", 0.03, 0.06),
    ("claude-3-haiku-20240307", 0.01, 0.03),
    ("gpt-4o", 0.03, 0.06),
]

def init_weave_client(project_name):
    try:
        client = weave.init(project_name)
        for model, prompt_cost, completion_cost in MODEL_NAMES:
            client.add_cost(
                llm_id=model,
                prompt_token_cost=prompt_cost,
                completion_token_cost=completion_cost,
            )
    except Exception as e:
        print(f"Failed to initialize Weave client for project '{project_name}': {e}")
        return None
    else:
        return client

client = init_weave_client(PROJECT_NAME)
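
To make the per-token pricing concrete, the following is a minimal sketch of how a call's cost could be derived from its token counts. It assumes the cost is simply the token count multiplied by the per-token price, summed over prompt and completion. The helper `estimate_call_cost` and the token counts are hypothetical and are not part of the Weave API.

```python
def estimate_call_cost(
    prompt_tokens, completion_tokens, prompt_token_cost, completion_token_cost
):
    """Hypothetical helper: linear cost model (tokens times per-token price)."""
    return (
        prompt_tokens * prompt_token_cost
        + completion_tokens * completion_token_cost
    )


# Illustrative token counts, priced with the gemini/gemini-1.5-flash entry
# from MODEL_NAMES above (0.00025 per prompt token, 0.0005 per completion token).
cost = estimate_call_cost(1000, 500, 0.00025, 0.0005)
print(f"Estimated cost: {cost:.4f}")
```

Under this assumption, doubling either token count doubles that side's contribution, which is a quick sanity check to run against the costs Weave reports for a call.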

Fetch calls data from Weave

With the client initialized and costs configured, the next step is to pull call data from Weave. There are two options for fetching call data:

Fetching data call-by-call.
Using high-level APIs.

The following sections describe each option.

Fetch data call-by-call

The first option is to retrieve a filtered list of calls and extract the data you need call by call. To do this, use the calls_query_stream API, which fetches the calls data from Weave and accepts the following parameters:

filter dictionary: The filter parameters that select which calls to fetch. See the CallSchema reference for more details.
expand_columns list: The columns to expand in the calls data.
sort_by list: The sorting parameters for the calls data.
include_costs boolean: Whether to include costs in the calls data.
include_feedback boolean: Whether to include feedback in the calls data.

import itertools
from datetime import datetime, timedelta

import pandas as pd

def fetch_calls(client, project_id, start_time, trace_roots_only, limit):
    filter_params = {
        "project_id": project_id,
        "filter": {"started_at": start_time, "trace_roots_only": trace_roots_only},
        "expand_columns": ["inputs.example", "inputs.model"],
        "sort_by": [{"field": "started_at", "direction": "desc"}],
        "include_costs": True,
        "include_feedback": True,
    }
    try:
        calls_stream = client.server.calls_query_stream(filter_params)
        calls = list(
            itertools.islice(calls_stream, limit)
        )  # limit the number of calls to fetch if too many
        print(f"Fetched {len(calls)} calls.")
    except Exception as e:
        print(f"Error fetching calls: {e}")
        return []
    else:
        return calls

calls = fetch_calls(client, PROJECT_NAME, datetime.now() - timedelta(days=1), True, 100)
# the raw data is a list of Call objects
pd.DataFrame([call.dict() for call in calls]).head(3)
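
The `itertools.islice` call in `fetch_calls` caps how many items are pulled from the stream without reading the rest. The sketch below illustrates that behavior with a hypothetical generator, `fake_call_stream`, standing in for the real call stream. It does not contact Weave, and the `take` helper is likewise an illustrative name.

```python
import itertools


def fake_call_stream(total, consumed):
    """Hypothetical stand-in for a call stream; records each item it yields."""
    for i in range(total):
        consumed.append(i)
        yield {"id": f"call-{i}"}


def take(stream, limit):
    """Pull at most `limit` items from any iterator, leaving the rest unread."""
    return list(itertools.islice(stream, limit))


consumed = []
calls_sample = take(fake_call_stream(1000, consumed), 3)
# Compare how many items were returned with how many the generator produced.
print(len(calls_sample), len(consumed))
```

Because the generator is only advanced as far as the limit requires, a small `limit` keeps the fetch cheap even when the project holds many calls.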

Next, process the calls returned by Weave. Extract the relevant fields into a list of dictionaries, then convert that list to a pandas DataFrame and return it.

import json
from datetime import datetime

import pandas as pd

def process_calls(calls):
    records = []
    for call in calls:
        feedback = call.summary.get("weave", {}).get("feedback", [])
        thumbs_up = sum(
            1
            for item in feedback
            if isinstance(item, dict) and item.get("payload", {}).get("emoji") == "👍"
        )
        thumbs_down = sum(
            1
            for item in feedback
            if isinstance(item, dict) and item.get("payload", {}).get("emoji") == "👎"
        )
        latency = call.summary.get("weave", {}).get("latency_ms", 0)

        records.append(
            {
                "Call ID": call.id,
                "Trace ID": call.trace_id,  # this is a unique ID for the trace that can be used to retrieve it
                "Display Name": call.display_name,  # this is an optional name you can set in the UI or programmatically
                "Latency (ms)": latency,
                "Thumbs Up": thumbs_up,
                "Thumbs Down": thumbs_down,
                "Started At": pd.to_datetime(getattr(call, "started_at", datetime.min)),
                "Inputs": json.dumps(call.inputs, default=str),
                "Outputs": json.dumps(call.output, default=str),
            }
        )
    return pd.DataFrame(records)
df_calls = process_calls(calls)
df_calls.head(3)
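
The feedback counting in `process_calls` can be checked in isolation. The sketch below applies the same emoji test to a hand-written feedback list. The sample entries are made up for illustration and only assume the `payload` / `emoji` shape used above; `count_reactions` is a hypothetical helper name.

```python
from collections import Counter


def count_reactions(feedback):
    """Count emoji reactions, skipping entries without a usable payload."""
    counts = Counter()
    for item in feedback:
        if not isinstance(item, dict):
            continue  # ignore malformed entries
        payload = item.get("payload")
        if not isinstance(payload, dict):
            continue  # ignore entries with a missing or non-dict payload
        emoji = payload.get("emoji")
        if emoji:
            counts[emoji] += 1
    return counts


# Made-up feedback entries, including two that should be ignored.
sample_feedback = [
    {"payload": {"emoji": "👍"}},
    {"payload": {"emoji": "👎"}},
    {"payload": {"emoji": "👍"}},
    {"payload": {"note": "no emoji here"}},
    "malformed entry",
]
counts = count_reactions(sample_feedback)
print(counts["👍"], counts["👎"])
```

A `Counter` returns zero for reactions that never occur, so the same helper covers calls with no feedback at all.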

Use high-level APIs

Instead of iterating over every call, you can use Weave's high-level APIs to access model costs, feedback, and other metrics directly. For example, use the query_costs API to fetch the costs of all LLMs used in the project:

# Use cost API to get costs
costs = client.query_costs()
df_costs = pd.DataFrame([cost.dict() for cost in costs])
df_costs["total_cost"] = (
    df_costs["prompt_token_cost"] + df_costs["completion_token_cost"]
)

# only show the first row for every unique llm_id
df_costs.drop_duplicates(subset="llm_id", keep="first")

Gather inputs and generate visualizations

With call data and costs available as DataFrames, you can generate the visualizations using plotly. This is a starter dashboard that you can customize as you like. For a more advanced example, check out the Streamlit example in the knowledge-worker-weave repo.

import plotly.express as px
import plotly.graph_objects as go

def plot_feedback_pie_chart(thumbs_up, thumbs_down):
    fig = go.Figure(
        data=[
            go.Pie(
                labels=["Thumbs Up", "Thumbs Down"],
                values=[thumbs_up, thumbs_down],
                marker={"colors": ["#66b3ff", "#ff9999"]},
                hole=0.3,
            )
        ]
    )
    fig.update_traces(textinfo="percent+label", hoverinfo="label+percent")
    fig.update_layout(showlegend=False, title="Feedback Summary")
    return fig

def plot_model_cost_distribution(df):
    fig = px.bar(
        df,
        x="llm_id",
        y="total_cost",
        color="llm_id",
        title="Cost Distribution by Model",
    )
    fig.update_layout(xaxis_title="Model", yaxis_title="Cost (USD)")
    return fig

# See the source code for all the plots
plot_feedback_pie_chart(df_calls["Thumbs Up"].sum(), df_calls["Thumbs Down"].sum())
plot_model_cost_distribution(df_costs)
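
The introduction also mentions latency over time, which is not plotted in the cells above. As a hedged sketch of the aggregation such a chart could build on, the following groups latency values into hourly buckets and averages them. The function name `mean_latency_by_hour` and the sample records are hypothetical; the keys mirror the `Started At` and `Latency (ms)` columns of `df_calls`.

```python
from collections import defaultdict
from datetime import datetime


def mean_latency_by_hour(records):
    """Average latency per hour, returned in chronological order."""
    buckets = defaultdict(list)
    for record in records:
        # Truncate the timestamp to the start of its hour.
        hour = record["Started At"].replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(record["Latency (ms)"])
    return {
        hour: sum(values) / len(values)
        for hour, values in sorted(buckets.items())
    }


# Made-up records for illustration only.
sample_records = [
    {"Started At": datetime(2024, 7, 1, 9, 5), "Latency (ms)": 100},
    {"Started At": datetime(2024, 7, 1, 9, 40), "Latency (ms)": 300},
    {"Started At": datetime(2024, 7, 1, 10, 15), "Latency (ms)": 250},
]
hourly = mean_latency_by_hour(sample_records)
for hour, latency in hourly.items():
    print(hour.isoformat(), latency)
```

The resulting mapping has one point per hour, which is the shape a line chart of latency over time needs.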

Conclusion

This cookbook demonstrated how to create a custom production monitoring dashboard using Weave’s APIs and functions. Weave focuses on fast integrations, both for getting data in and for extracting it for custom processing.

Data input:
- Framework-agnostic tracing with the @weave.op() decorator and the option to import calls from CSV (see the related import cookbook).
- Service API endpoints to log to Weave from various programming frameworks and languages. See the Service API reference for more details.
Data output:
- Download the data in CSV, TSV, JSONL, or JSON formats. See the Service API reference for more details.
- Export using programmatic access to the data. See the “Use Python” section in the export panel as described in this cookbook. See Querying and exporting calls for more details.

This custom dashboard extends Weave’s native Traces view to allow tailored monitoring of LLM applications in production. To view a more complex dashboard, check out the Streamlit example in the agent-dev-collection repo, where you can add your own Weave project URL.
