Many developers run open source models like Llama 3, Mixtral, Gemma, and Phi on their own hardware. Weave supports local model runners out of the box, as long as they expose an OpenAI SDK-compatible API. This guide shows you how to trace calls to locally hosted models with Weave so you can capture inputs, outputs, and metadata the same way you would for hosted LLM providers. It's intended for developers who run open source models on their own machine and want observability for those calls.

Wrap local model functions with @weave.op()

If your local model isn't served through an OpenAI-compatible runner, you can still capture traces by wrapping your own model-calling functions. This works with any LLM: initialize Weave with weave.init('<your-project-name>'), then decorate the functions that call your model with @weave.op(). For more details, see the tracing guide.

Update your OpenAI SDK code to use local models

If your local model runner supports the OpenAI SDK, you can point your existing OpenAI client at it with two changes. The first and most important is the base_url parameter passed to openai.OpenAI(). It tells the OpenAI SDK to send requests to your local server instead of the OpenAI hosted API.
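```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:1234/v1",  # your local server; many runners use a /v1 path
    api_key="local",  # any string works for local models
)
```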
The second change is the api_key. For local models it can be any string, but you must set it explicitly. Otherwise, the OpenAI SDK tries to read it from the OPENAI_API_KEY environment variable and raises an error if that variable isn't set.

Supported local model runners

The following runners let you download and run models from Hugging Face on your computer. Each runner exposes an OpenAI-compatible endpoint that you can point the OpenAI SDK at using the preceding changes.