- Visibility into every LLM call, input, and output in your application
- Systematic evaluation to measure performance against curated test cases
- Version tracking for prompts, models, and data so you can understand what changed
- Feedback collection to capture human judgments and production signals
Debug with traces
Weave automatically traces your LLM calls and shows them in an interactive UI. You can see exactly what went into each call, what came out, how long it took, and how calls relate to each other.

Get started with tracing
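As a rough sketch of what this looks like with the Python SDK (the project name and function are placeholders; the decorated function stands in for an LLM call):

```python
import weave

# Placeholder project name; traces are grouped under this project in the UI.
weave.init("quickstart-tracing")

@weave.op()  # records inputs, outputs, latency, and nesting for every call
def extract_fruit(sentence: str) -> str:
    # Stand-in for an LLM call; real provider calls are traced the same way.
    return sentence.split()[-1]

extract_fruit("I would like to eat an apple")
```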
Evaluate systematically

Run your application against curated test datasets and measure performance with scoring functions. Track how changes to prompts or models affect quality over time.

Build an evaluation pipeline
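A small evaluation sketch, assuming the current Python SDK where scoring functions receive dataset columns plus the model `output` (the dataset rows, model stub, and scorer here are illustrative):

```python
import asyncio
import weave
from weave import Evaluation

weave.init("quickstart-evaluation")  # placeholder project name

# A small hand-curated test set; rows are plain dicts.
examples = [
    {"question": "What is 2 + 2?", "expected": "4"},
    {"question": "What colour is the sky?", "expected": "blue"},
]

@weave.op()
def exact_match(expected: str, output: str) -> dict:
    # Scoring functions get dataset columns plus the model output.
    return {"correct": expected.strip().lower() == output.strip().lower()}

@weave.op()
def model(question: str) -> str:
    # Stand-in for your LLM-backed application.
    return "4" if "2 + 2" in question else "blue"

evaluation = Evaluation(dataset=examples, scorers=[exact_match])
asyncio.run(evaluation.evaluate(model))
```

Re-running the same evaluation after changing a prompt or model gives a like-for-like comparison of the scores.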
Version everything

Weave tracks versions of your prompts, datasets, and model configurations. When something breaks, you can see exactly what changed. When something works, you can reproduce it.

Learn about versioning
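For instance, publishing a dataset stores it as a named, versioned object, and re-publishing changed rows creates a new version rather than overwriting the old one. A sketch with placeholder names, assuming the short `name:version` reference form resolves within the initialized project:

```python
import weave

weave.init("quickstart-versioning")  # placeholder project name

dataset = weave.Dataset(
    name="qa-examples",  # placeholder object name
    rows=[{"question": "What is the capital of France?", "expected": "Paris"}],
)
weave.publish(dataset)  # stores this content as a new version of "qa-examples"

# Later (or from another process), fetch a specific version by reference.
fetched = weave.ref("qa-examples:latest").get()
```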
Experiment with prompts and models

Bring your own API keys to quickly test prompts and compare responses across commercial models in the Playground.

Experiment in the Weave Playground
Collect feedback

Capture human feedback, annotations, and corrections from production use. Use this data to build better test cases and improve your application.

Collect feedback
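A sketch of attaching feedback to a specific call from code (the op and the feedback values are illustrative); calling an op via `.call()` returns the result together with the call object so it can be annotated afterwards:

```python
import weave

weave.init("quickstart-feedback")  # placeholder project name

@weave.op()
def generate(prompt: str) -> str:
    return "stubbed response"  # stand-in for an LLM call

# .call() returns both the output and the Call object for the trace.
result, call = generate.call("Summarize this support ticket")

# Record a human judgment against that exact call.
call.feedback.add_reaction("👍")
call.feedback.add_note("Good summary, but too long.")
```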
Monitor production

Score production traffic with the same scorers you use in evaluation. Set up guardrails to catch issues before they reach users.

Set up guardrails and monitors
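A rough guardrail sketch, assuming a scorer can be applied to a live call via `call.apply_scorer` and its result inspected before responding; the `ContainsEmail` check and project name are made up, so substitute a real detector and policy:

```python
import asyncio
import weave
from weave import Scorer

weave.init("quickstart-guardrails")  # placeholder project name

class ContainsEmail(Scorer):  # hypothetical scorer; use a real PII detector in practice
    @weave.op()
    def score(self, output: str) -> dict:
        return {"flagged": "@" in output}

@weave.op()
def generate(prompt: str) -> str:
    return "Contact me at jane@example.com"  # stand-in for an LLM response

async def main() -> str:
    result, call = generate.call("Write a reply to the customer")
    # The score is attached to the production call, so the same scorer
    # doubles as a monitor in the UI and a guardrail in code.
    check = await call.apply_scorer(ContainsEmail())
    if check.result["flagged"]:
        return "[redacted]"  # block or rewrite before it reaches the user
    return result

print(asyncio.run(main()))
```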
Supported languages

Weave provides SDKs for Python and TypeScript:

- Python
- TypeScript
Integrations
Weave integrates with popular LLM providers and frameworks (see the example after this list):

- LLM providers: OpenAI, Anthropic, Google, Mistral, Cohere, and more
- Frameworks: LangChain, LlamaIndex, DSPy, CrewAI, and more
- Local models: Ollama, vLLM, and other local inference servers
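As an illustration, using OpenAI as the example provider (the project and model names are placeholders), initializing Weave is enough for calls made through a supported client library to be traced automatically, with no decorators required:

```python
import weave
from openai import OpenAI

weave.init("quickstart-integrations")  # placeholder project name

client = OpenAI()  # expects OPENAI_API_KEY in the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; any supported model works
    messages=[{"role": "user", "content": "Say hello in French."}],
)
print(response.choices[0].message.content)
```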