LangChain

Weave tracks and logs calls made through the LangChain Python library.

When working with LLMs, debugging is part of the job. Whether a model call fails, an output is misformatted, or nested model calls create confusion, pinpointing the issue can be hard. LangChain applications often consist of multiple steps and LLM calls, so it helps to see the inner workings of your chains and agents. Weave captures traces for your LangChain applications automatically, which lets you monitor and analyze your application’s performance and makes it easier to debug and optimize your LLM workflows.

This guide is for developers building LangChain applications who want to add tracing, evaluation, and observability with Weave. It walks through enabling automatic tracing, attaching metadata, controlling tracing manually, and wrapping LangChain chains as Weave models for evaluation.

Getting started

To get started, call weave.init() at the beginning of your script. The argument to weave.init() is a project name that Weave uses to organize your traces.

import weave
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

# Initialize Weave with your project name
weave.init("langchain_demo")

llm = ChatOpenAI()
prompt = PromptTemplate.from_template("1 + {number} = ")

llm_chain = prompt | llm

output = llm_chain.invoke({"number": 2})

print(output)

Track call metadata

Custom metadata helps you filter and analyze traces in the Weave UI. To track metadata from your LangChain calls, use the weave.attributes context manager. This context manager lets you set custom metadata for a specific block of code, such as a chain or a single request.

import weave
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

# Initialize Weave with your project name
weave.init("langchain_demo")

llm = ChatOpenAI()
prompt = PromptTemplate.from_template("1 + {number} = ")

llm_chain = prompt | llm

with weave.attributes({"my_awesome_attribute": "value"}):
    output = llm_chain.invoke({"number": 2})

print(output)

Weave automatically tracks the metadata against the trace of the LangChain call. You can view the metadata in the Weave web interface.

Weave UI showing LangChain trace metadata attributes

Traces

Storing traces of LLM applications in a central database helps during both development and production. These traces give you a dataset you can use to debug and improve your application. Weave captures traces for your LangChain applications automatically. Weave tracks and logs calls made through the LangChain library, including prompt templates, chains, LLM calls, tools, and agent steps. You can view the traces in the Weave web interface.

Trace calls manually

Besides automatic tracing, you can trace calls manually using the WeaveTracer callback or the weave_tracing_enabled context manager. These methods resemble using request callbacks in individual parts of a LangChain application. Use them when you want to trace specific chains or invocations rather than your whole application. The following sections describe each approach.

Note: Weave traces LangChain Runnables by default once you call weave.init(). To trace only specific chains or individual requests, first disable this behavior by setting the environment variable WEAVE_TRACE_LANGCHAIN to "false" before calling weave.init(), then enable tracing where you need it with one of the methods below.

Use `WeaveTracer`

You can pass the WeaveTracer callback to individual LangChain components to trace specific requests.

import os

os.environ["WEAVE_TRACE_LANGCHAIN"] = "false"  # <-- explicitly disable global LangChain tracing

from weave.integrations.langchain import WeaveTracer
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
import weave

# Initialize Weave with your project name
weave.init("langchain_demo")  # <-- global LangChain tracing stays off because the env var is set to "false"

weave_tracer = WeaveTracer()

config = {"callbacks": [weave_tracer]}

llm = ChatOpenAI()
prompt = PromptTemplate.from_template("1 + {number} = ")

llm_chain = prompt | llm

output = llm_chain.invoke({"number": 2}, config=config)  # <-- traces only this invocation

llm_chain.invoke({"number": 4})  # <-- LangChain calls are not traced here, but OpenAI calls still are

Use the `weave_tracing_enabled` context manager

Alternatively, you can use the weave_tracing_enabled context manager to enable tracing for specific blocks of code.

import os

os.environ["WEAVE_TRACE_LANGCHAIN"] = "false"  # <-- explicitly disable global LangChain tracing

from weave.integrations.langchain import weave_tracing_enabled
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
import weave

# Initialize Weave with your project name
weave.init("langchain_demo")  # <-- global LangChain tracing stays off because the env var is set to "false"

llm = ChatOpenAI()
prompt = PromptTemplate.from_template("1 + {number} = ")

llm_chain = prompt | llm

with weave_tracing_enabled():  # <-- traces only the calls made inside this block
    output = llm_chain.invoke({"number": 2})

llm_chain.invoke({"number": 4})  # <-- LangChain calls are not traced here, but OpenAI calls still are

Configuration

When you call weave.init(), Weave enables tracing by setting the environment variable WEAVE_TRACE_LANGCHAIN to "true". This lets Weave automatically capture traces for your LangChain applications. To disable this behavior, set the environment variable to "false" before you call weave.init().

Relation to LangChain callbacks

This section explains how Weave’s tracing integrates with LangChain’s callback system, so you can choose the approach that best fits your application.

Auto logging

The automatic logging provided by weave.init() is similar to passing a constructor callback to every component in a LangChain application. This means Weave tracks all interactions globally across your entire application, including prompt templates, chains, LLM calls, tools, and agent steps.

Manual logging

The manual logging methods (WeaveTracer and weave_tracing_enabled) are similar to using request callbacks in individual parts of a LangChain application. They give you finer control over which parts of your application Weave traces. The two kinds of LangChain callbacks differ in scope:

Constructor callbacks: Apply to the entire chain or component and log all interactions consistently.
Request callbacks: Apply to specific requests and provide detailed tracing of particular invocations.
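
To make the difference in scope concrete, the following toy model mimics the two callback styles without importing LangChain or Weave. ToyStep and its callbacks parameters are hypothetical stand-ins, not LangChain classes; the sketch only assumes that a callback given to the constructor fires on every call, while a callback given to invoke fires for that call alone.

```python
from typing import Callable, List, Optional

# A callback receives the step name and the input value.
Callback = Callable[[str, int], None]


class ToyStep:
    """Toy stand-in for a chain component (hypothetical, not a LangChain class)."""

    def __init__(self, name: str, callbacks: Optional[List[Callback]] = None):
        self.name = name
        # Constructor-scoped callbacks: attached once, run on every call.
        self.callbacks = list(callbacks or [])

    def invoke(self, number: int, callbacks: Optional[List[Callback]] = None) -> int:
        # Request-scoped callbacks: run for this call only.
        for callback in self.callbacks + list(callbacks or []):
            callback(self.name, number)
        return number + 1


constructor_log: List[str] = []
request_log: List[str] = []

step = ToyStep(
    "add_one",
    callbacks=[lambda name, n: constructor_log.append(f"{name}({n})")],
)

step.invoke(2)
step.invoke(4, callbacks=[lambda name, n: request_log.append(f"{name}({n})")])

print(constructor_log)  # both calls were logged
print(request_log)      # only the second call was logged
```

Roughly speaking, automatic tracing plays the role of the constructor-scoped callback here, and WeaveTracer or weave_tracing_enabled plays the role of the request-scoped one.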

When you integrate Weave with LangChain, you get logging and monitoring of your LLM applications, which makes debugging and performance tuning easier. For more information, see the LangChain documentation.

Models and evaluations

Organizing and evaluating LLMs across different use cases gets harder as you add components like prompts, model configurations, and inference parameters. With weave.Model, you can capture and organize experimental details like system prompts or the models you use, which makes it easier to compare iterations. The following sections show how to wrap a LangChain chain as a weave.Model and then evaluate it. The following example wraps a LangChain chain in a weave.Model:

import json
import asyncio

import weave

from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

# Initialize Weave with your project name
weave.init("langchain_demo")

class ExtractFruitsModel(weave.Model):
    model_name: str
    prompt_template: str

    @weave.op()
    async def predict(self, sentence: str) -> dict:
        llm = ChatOpenAI(model=self.model_name, temperature=0.0)
        prompt = PromptTemplate.from_template(self.prompt_template)

        llm_chain = prompt | llm
        response = llm_chain.invoke({"sentence": sentence})
        result = response.content

        if result is None:
            raise ValueError("No response from model")
        parsed = json.loads(result)
        return parsed

model = ExtractFruitsModel(
    model_name="gpt-3.5-turbo-1106",
    prompt_template='Extract fields ("fruit": <str>, "color": <str>, "flavor": <str>) from the following text, as json: {sentence}',
)
sentence = "There are many fruits that were found on the recently discovered planet Goocrux. There are neoskizzles that grow there, which are purple and taste like candy."

prediction = asyncio.run(model.predict(sentence))

# if you're in a Jupyter Notebook, run:
# prediction = await model.predict(sentence)

print(prediction)

This code creates a model that you can visualize in the Weave UI:

Weave UI showing a LangChain chain wrapped as a Weave Model

You can also use Weave Models with serve and with Evaluations.
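
The predict method above passes the model's reply straight to json.loads, which raises an error if the model wraps the JSON in a Markdown code fence or returns something other than JSON. The following is a minimal sketch of a more tolerant parser that uses only the standard library. The helper name parse_json_reply is hypothetical and is not part of Weave or LangChain; whether you need it depends on the prompt and model you use.

```python
import json
import re

# Three backticks, built at runtime so this example contains no literal fence.
FENCE = "`" * 3

# Matches an optional "json" language tag and captures the fenced body.
_FENCED_BODY = re.compile(
    re.escape(FENCE) + r"(?:json)?\s*(.*?)\s*" + re.escape(FENCE),
    re.DOTALL,
)


def parse_json_reply(text: str) -> dict:
    """Hypothetical helper: return the JSON object in a model reply.

    Accepts bare JSON or JSON wrapped in a Markdown code fence.
    Raises ValueError if no JSON object can be parsed.
    """
    match = _FENCED_BODY.search(text)
    candidate = match.group(1) if match else text.strip()
    try:
        parsed = json.loads(candidate)
    except json.JSONDecodeError as err:
        raise ValueError(f"Model reply is not valid JSON: {err}") from err
    if not isinstance(parsed, dict):
        raise ValueError("Expected a JSON object at the top level")
    return parsed


fenced = (
    FENCE
    + 'json\n{"fruit": "neoskizzles", "color": "purple", "flavor": "candy"}\n'
    + FENCE
)
print(parse_json_reply(fenced)["fruit"])
```

Inside predict, a helper like this would replace the direct json.loads(result) call, so that a malformed reply surfaces as one clearly worded ValueError.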

Evaluations

Evaluations help you measure the performance of your models. The weave.Evaluation class captures how well your model performs on specific tasks or datasets. This makes it easier to compare different models and iterations of your application. The following example demonstrates how to evaluate the preceding model:

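# Continues from the preceding example: reuses `model` and the initialized Weave project.
import asyncio

import weave
from weave.scorers import MultiTaskBinaryClassificationF1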

sentences = [
    "There are many fruits that were found on the recently discovered planet Goocrux. There are neoskizzles that grow there, which are purple and taste like candy.",
    "Pounits are a bright green color and are more savory than sweet.",
    "Finally, there are fruits called glowls, which have a very sour and bitter taste which is acidic and caustic, and a pale orange tinge to them.",
]
labels = [
    {"fruit": "neoskizzles", "color": "purple", "flavor": "candy"},
    {"fruit": "pounits", "color": "bright green", "flavor": "savory"},
    {"fruit": "glowls", "color": "pale orange", "flavor": "sour and bitter"},
]
examples = [
    {"id": "0", "sentence": sentences[0], "target": labels[0]},
    {"id": "1", "sentence": sentences[1], "target": labels[1]},
    {"id": "2", "sentence": sentences[2], "target": labels[2]},
]

@weave.op()
def fruit_name_score(target: dict, output: dict) -> dict:
    return {"correct": target["fruit"] == output["fruit"]}


evaluation = weave.Evaluation(
    dataset=examples,
    scorers=[
        MultiTaskBinaryClassificationF1(class_names=["fruit", "color", "flavor"]),
        fruit_name_score,
    ],
)
scores = asyncio.run(evaluation.evaluate(model))
# if you're in a Jupyter Notebook, run:
# scores = await evaluation.evaluate(model)

print(scores)

This code generates an evaluation trace that you can visualize in the Weave UI:

Weave UI showing a LangChain evaluation trace
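
The fruit_name_score scorer above raises a KeyError if the model output has no fruit key, and it treats "Neoskizzles" and "neoskizzles" as different values. The sketch below shows one way to build the examples list without manual indexing and to score each field more tolerantly. It uses only the standard library. The names build_examples and lenient_field_scores are hypothetical, and the matching rule (trim whitespace, ignore case) is an assumption about what should count as correct.

```python
from typing import Dict, List

FIELDS = ("fruit", "color", "flavor")


def build_examples(sentences: List[str], labels: List[dict]) -> List[dict]:
    """Pair each sentence with its label and assign a string id."""
    if len(sentences) != len(labels):
        raise ValueError("sentences and labels must have the same length")
    return [
        {"id": str(index), "sentence": sentence, "target": label}
        for index, (sentence, label) in enumerate(zip(sentences, labels))
    ]


def _normalize(value: object) -> str:
    # Assumed matching rule: compare trimmed, lowercased strings.
    return str(value).strip().lower()


def lenient_field_scores(target: dict, output: dict) -> Dict[str, bool]:
    """Hypothetical scorer: per-field match that tolerates missing keys."""
    scores = {}
    for field in FIELDS:
        predicted = output.get(field) if isinstance(output, dict) else None
        scores[f"{field}_correct"] = (
            predicted is not None
            and _normalize(predicted) == _normalize(target[field])
        )
    return scores


examples = build_examples(
    ["Pounits are a bright green color and are more savory than sweet."],
    [{"fruit": "pounits", "color": "bright green", "flavor": "savory"}],
)
print(examples[0]["id"])
print(lenient_field_scores(examples[0]["target"], {"fruit": " Pounits ", "color": "green"}))
```

To use a scorer like this with weave.Evaluation, decorate it with @weave.op() as in the example above and keep the target and output parameter names.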

Known issues

Tracing async calls: A bug in LangChain's AsyncCallbackManager implementation causes async calls to be traced out of order. As a result, the order of calls in a trace may be inaccurate when you use the ainvoke, astream, or abatch methods of LangChain Runnables. Weave has filed a PR to fix this.
