Weave for Agents is in public preview. Features, APIs, and the Agents view UI may change before general availability.
In W&B Weave’s Agents view, the Signals tab shows tags and ratings for your agent’s conversations. Signals surface quality and safety issues to flag problems, find patterns, and highlight the traces that need your attention. Use signals to automatically score the quality of your agent’s responses, notice when a user is frustrated, or flag NSFW content.

Get started

To view signals for your project:
  1. Navigate to https://wandb.ai and select your project.
  2. In the sidebar menu, select Agents to view all agent conversations saved for your project.
  3. In the tab bar, click Signals.
Figure: The Signals tab showing a list of scored turns for the agent.

Key terms

  • Turn: One back-and-forth exchange between the user and the agent.
  • Rating: A numeric score between 0.0 and 1.0 assigned to a matching span.
  • Tags: Labels assigned to matching spans, such as “user-frustration” or “nsfw”.

Signals table

Each row represents the output of one of your signal monitors. The following columns appear by default.
Column | Description
Type | The part of the conversation being scored. Currently only “turn” is supported.
Scorer | The name of the signal that produced this score.
Last message | A preview of the last message in the scored turn, with the role shown below.
Agent | The agent associated with the scored turn.
Scores | The numeric rating from 0.0 to 1.0, or a tag if matched. We recommend using consistent ratings where 1 indicates good and 0 indicates bad, but your scorers can use any scale you define.
Trend | Displays an inline chart showing how this signal is trending over time. Shows either the average value (for ratings) or the count (for tags).
When | When the signal was scored.
Use the time window selector and Filter bar to narrow results by scorer, agent, score range, or time period. The Score volume timeline shows how many ratings and tags were produced over time. It reflects the rows shown in the table, and you can drag across the timeline to filter the results.
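As a rough illustration of what the Trend column summarizes, the sketch below groups scored rows by day and computes the average for ratings or the count for tags. The row layout and the daily_trend helper are hypothetical, used only for this example; they are not the Weave data model or API, and the actual bucketing interval in the UI may differ.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical rows: (timestamp, kind, value). Not the Weave data model.
rows = [
    (datetime(2025, 1, 1, 9), "rating", 0.75),
    (datetime(2025, 1, 1, 15), "rating", 0.25),
    (datetime(2025, 1, 2, 10), "rating", 1.0),
    (datetime(2025, 1, 1, 11), "tag", "user-frustration"),
    (datetime(2025, 1, 2, 12), "tag", "user-frustration"),
    (datetime(2025, 1, 2, 13), "tag", "nsfw"),
]


def daily_trend(rows, kind):
    """Summarize rows of one kind per day: average for ratings, count for tags."""
    buckets = defaultdict(list)
    for timestamp, row_kind, value in rows:
        if row_kind == kind:
            buckets[timestamp.date()].append(value)
    if kind == "rating":
        return {day: mean(values) for day, values in sorted(buckets.items())}
    return {day: len(values) for day, values in sorted(buckets.items())}


rating_trend = daily_trend(rows, "rating")  # average rating per day
tag_trend = daily_trend(rows, "tag")        # number of tagged rows per day
```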

Create a new signal

Select + New signal to create a new scorer for your agent.

Scorer type

Choose to create either a Rating scorer or a Tags scorer.
  • Rating: Assigns a score between 0 and 1 to each matching span.
  • Tags: Assigns up to 10 tags to each matching span. The signals UI only displays rows for spans that matched at least one tag, so your tag scorer might be working even if you don’t see any output.
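The display rule for Tags scorers can be pictured with a small sketch: every sampled turn is scored, but only turns with at least one matched tag produce a visible row. The result dictionaries and the visible_rows helper below are assumptions made for illustration, not Weave objects.

```python
MAX_TAGS = 10  # per-span limit stated above

# Hypothetical scorer results; an empty list means no tag matched.
results = [
    {"turn_id": "t1", "tags": ["user-frustration"]},
    {"turn_id": "t2", "tags": []},
    {"turn_id": "t3", "tags": ["nsfw", "user-frustration"]},
]


def visible_rows(results):
    """Keep only results that matched at least one tag."""
    return [result for result in results if result["tags"]]


shown = visible_rows(results)
scored_count = len(results)  # turns the scorer evaluated
shown_count = len(shown)     # turns that would appear as rows
```

In this sketch the scorer evaluated three turns but only two would appear, which is why an empty table does not by itself mean the scorer is failing.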

Only score turns matching

Use this selector to restrict which turns the signal scores by attributes such as Agent name, Operation name, Tool name, or Request model. Multiple filters are combined with AND logic, so a turn is scored only if it matches every filter. To score every turn, select the x at the end of the filter row to remove it.
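The AND behavior can be sketched as follows: a turn is selected only when every filter matches, and an empty filter set selects every turn. The field names (agent_name, request_model) and the matches_all helper are hypothetical stand-ins for the UI filters, not Weave identifiers.

```python
# Hypothetical turn attributes; the keys are illustrative only.
turns = [
    {"id": "t1", "agent_name": "support-bot", "request_model": "model-a"},
    {"id": "t2", "agent_name": "support-bot", "request_model": "model-b"},
    {"id": "t3", "agent_name": "billing-bot", "request_model": "model-a"},
]


def matches_all(turn, filters):
    """Return True only if the turn satisfies every filter (AND logic)."""
    return all(turn.get(field) == expected for field, expected in filters.items())


filters = {"agent_name": "support-bot", "request_model": "model-a"}
selected = [turn["id"] for turn in turns if matches_all(turn, filters)]

# With no filters, all() over an empty sequence is True, so every turn matches.
unfiltered = [turn["id"] for turn in turns if matches_all(turn, {})]
```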

Prompt template

Choose a starter template from the lists below, then adjust the prompt that appears inline as Scorer prompt. Weave resolves template variables, such as {input_messages}, {output_messages}, and {system_instructions}, during scoring.
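To make the idea of template variables concrete, here is a minimal sketch of placeholder substitution. It is an assumption about the general technique, not a description of how Weave resolves templates internally; the resolve_template helper, the sample prompt, and the sample values are all invented for this example.

```python
import re

# Matches placeholders such as {input_messages}.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def resolve_template(template, variables):
    """Replace known {name} placeholders in one pass; leave unknown ones untouched."""
    def substitute(match):
        name = match.group(1)
        return variables.get(name, match.group(0))
    return PLACEHOLDER.sub(substitute, template)


template = (
    "Instructions: {system_instructions}\n"
    "User said: {input_messages}\n"
    "Agent replied: {output_messages}\n"
    "Rate the response from 0 to 1."
)

# Hypothetical values for a single turn.
variables = {
    "system_instructions": "Be concise.",
    "input_messages": "How do I reset my password?",
    "output_messages": "Open Settings and choose Reset password.",
}

prompt = resolve_template(template, variables)
```

A single regular-expression pass is used here so that text inserted for one variable is never re-scanned for other placeholders.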

Rating templates

Template | What it evaluates
User Satisfaction | Whether the user is satisfied (positive feedback, follow-up engagement, task completion) or dissatisfied (complaints, repeated rephrasing, abandonment).
User Good Intent | Whether the user’s intent is benign and legitimate, versus jailbreak attempts, harmful requests, or prompt injection.
Safe-for-Work | Whether the conversation is appropriate for any professional setting, versus explicit, violent, or otherwise inappropriate workplace content.
Response Quality | Whether the agent’s response is accurate, complete, and directly addresses the user’s request.

Tags templates

Template | What it detects
User Frustration | User shows signs of frustration, anger, confusion, or dissatisfaction.
Malicious Intent (Jailbreaking) | User attempts to jailbreak the system, extract restricted content, perform prompt injection, use role-play exploits, or otherwise manipulate the agent into ignoring its guardrails.
NSFW | User input or agent output contains explicit sexual content, graphic violence, or other material inappropriate for a workplace setting.
Low Quality Response | Agent output that is factually wrong, off-topic, evasive, repetitive, refuses without justification, or otherwise fails to address the user’s request.

Scorer name

Choose the display name for this signal.

Advanced settings

Under Advanced, configure the following options.
  • Inference model: The LLM to use for scoring. Serverless Inference is the recommended default when available.
  • Sample rate: For high-traffic agents, set a sample rate to score a fraction of turns instead of every turn and reduce cost.
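One common way to implement a sample rate is an independent random draw per turn, sketched below. This is an assumed approach for illustration only; the source does not state how Weave selects turns, and the should_score helper is hypothetical.

```python
import random


def should_score(sample_rate, rng):
    """Decide whether to score one turn, given a rate between 0.0 and 1.0."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0.0 and 1.0")
    # rng.random() returns a float in [0.0, 1.0), so a rate of 1.0 scores
    # every turn and a rate of 0.0 scores none.
    return rng.random() < sample_rate


rng = random.Random(42)  # seeded so the example is repeatable
total_turns = 10_000
scored = sum(should_score(0.1, rng) for _ in range(total_turns))
print(f"scored {scored} of {total_turns} turns")
```

With a rate of 0.1, roughly one turn in ten is scored, so the number of scoring calls to the inference model drops by about the same factor.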

Manage and edit signals

Select Manage signals to open a drawer that lists all active signals for the project. From there you can toggle signals on or off, delete them, or edit any signal. The editor shows the same fields as + New signal.

Troubleshoot signals

Signal activity appears under Traces in the project sidebar. If you don’t see the expected signal matches, debug using the Traces table. For example, check the scorer name and the Status column for error conditions. Scorer execution errors show a red indicator for Status and include details for the error.