W&B Weave is an observability and evaluation platform for building reliable LLM applications. Weave helps you understand what your AI application is doing, measure how well it performs, and systematically improve it over time.

Building LLM applications is fundamentally different from traditional software development. LLM outputs are non-deterministic, making debugging harder. Quality is subjective and context-dependent. Small prompt changes can cause unexpected behavior changes. Traditional testing approaches fall short.

Weave addresses these challenges by providing:
  • Visibility into every LLM call, input, and output in your application
  • Systematic evaluation to measure performance against curated test cases
  • Version tracking for prompts, models, and data so you can understand what changed
  • Feedback collection to capture human judgments and production signals

Debug with traces

Weave automatically traces your LLM calls and shows them in an interactive UI. You can see exactly what went into each call, what came out, how long it took, and how calls relate to each other. Get started with tracing
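As a minimal sketch (the project name and function below are placeholders; any function decorated with weave.op is traced):

```python
import weave

# Point Weave at a project; all traced calls are grouped under this name.
weave.init("my-llm-app")

@weave.op()
def extract_keywords(text: str) -> list[str]:
    # Stand-in for an LLM call; Weave records the inputs, output, and latency.
    return [word for word in text.split() if len(word) > 6]

extract_keywords("Weave shows exactly what went into each call and what came out")
```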

Evaluate systematically

Run your application against curated test datasets and measure performance with scoring functions. Track how changes to prompts or models affect quality over time. Build an evaluation pipeline
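A sketch of what such a pipeline can look like (the dataset, scorer, and model are toy placeholders; scorer parameters are matched by name against dataset fields, plus the model's output):

```python
import asyncio
import weave

weave.init("my-llm-app")

# A small curated test set; each row is one evaluation example.
examples = [
    {"question": "What is 2 + 2?", "expected": "4"},
    {"question": "What is the capital of France?", "expected": "Paris"},
]

# A scoring function: `expected` comes from the dataset row,
# `output` is whatever the model returned for that row.
@weave.op()
def exact_match(expected: str, output: str) -> dict:
    return {"correct": expected == output}

@weave.op()
def answer(question: str) -> str:
    # Placeholder model; swap in a real LLM call here.
    return "4" if "2 + 2" in question else "Paris"

evaluation = weave.Evaluation(dataset=examples, scorers=[exact_match])
asyncio.run(evaluation.evaluate(answer))
```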

Version everything

Weave tracks versions of your prompts, datasets, and model configurations. When something breaks, you can see exactly what changed. When something works, you can reproduce it. Learn about versioning
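For example, a sketch of publishing a versioned object (the dataset name and rows are placeholders):

```python
import weave

weave.init("my-llm-app")

# Publishing records an immutable, named version of the object.
dataset = weave.Dataset(
    name="support-questions",
    rows=[
        {"question": "How do I reset my password?"},
        {"question": "Where can I download my invoice?"},
    ],
)
weave.publish(dataset)

# Fetch a version back later; ":latest" can be replaced with a specific version.
recovered = weave.ref("support-questions:latest").get()
```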

Experiment with prompts and models

Bring your own API keys to quickly test prompts and compare responses from a range of commercial models in the Playground. Experiment in the Weave Playground

Collect feedback

Capture human feedback, annotations, and corrections from production use. Use this data to build better test cases and improve your application. Collect feedback
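For example, attaching feedback to a traced call from Python (a sketch; the op and feedback values are placeholders, and invoking an op via .call() returns both the result and the Call object for the trace):

```python
import weave

weave.init("my-llm-app")

@weave.op()
def generate_reply(prompt: str) -> str:
    return "Sure, here is a draft reply."  # placeholder output

# .call() returns the result plus the Call object representing the trace.
result, call = generate_reply.call("Help me write an apology email")

# Attach human judgments; they appear next to the call in the Weave UI.
call.feedback.add_reaction("👍")
call.feedback.add_note("Good tone, but too short.")
```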

Monitor production

Score production traffic with the same scorers you use in evaluation. Set up guardrails to catch issues before they reach users. Set up guardrails and monitors
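A minimal guardrail sketch, assuming the call.apply_scorer API for scoring live calls (the scorer and its check are toy placeholders; a real guardrail would use a proper detector):

```python
import asyncio
import weave

weave.init("my-llm-app")

class ContainsEmail(weave.Scorer):
    @weave.op()
    def score(self, output: str) -> dict:
        # Toy check standing in for a real PII detector.
        return {"flagged": "@" in output}

@weave.op()
def generate_reply(prompt: str) -> str:
    return "Reach me at alice@example.com"  # placeholder output

async def main() -> str:
    result, call = generate_reply.call("How do I contact support?")
    # Run the scorer against this production call and record the result.
    scored = await call.apply_scorer(ContainsEmail())
    if scored.result["flagged"]:
        return "[response withheld by guardrail]"  # act before the user sees it
    return result

print(asyncio.run(main()))
```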

Supported languages

Weave provides SDKs for Python and TypeScript. Install the Python SDK with:
pip install weave
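The TypeScript SDK is installed from npm (assuming the package keeps its current name, weave):
npm install weave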
Both SDKs support tracing, evaluation, datasets, and the other core Weave features. Some advanced features, such as class-based Models and Scorers, are not yet available in the TypeScript SDK.

Integrations

Weave integrates with popular LLM providers and frameworks:
  • LLM providers: OpenAI, Anthropic, Google, Mistral, Cohere, and more
  • Frameworks: LangChain, LlamaIndex, DSPy, CrewAI, and more
  • Local models: Ollama, vLLM, and other local inference servers
When you use a supported integration, Weave automatically traces LLM calls without additional code changes. View all integrations
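For instance, with the OpenAI integration (a sketch; the model name is a placeholder and OPENAI_API_KEY is read from the environment):

```python
import weave
from openai import OpenAI

# Once weave.init runs, calls made through supported client libraries
# are traced automatically; no decorators or wrappers needed.
weave.init("my-llm-app")

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)
```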

Next steps