Wrap local model functions with @weave.op()
If your local model isn’t accessed through an OpenAI-compatible runner, you can still capture traces by wrapping your own model-calling functions. Integrate Weave with any LLM by initializing Weave with weave.init('<your-project-name>') and then wrapping the calls to your LLMs with weave.op(). For more details, see the tracing guide.
Update your OpenAI SDK code to use local models
If your local model runner supports the OpenAI SDK, you can point the existing OpenAI client at it with two changes. The main change is thebase_url parameter during the openai.OpenAI() initialization. This tells the OpenAI SDK to send requests to your local server instead of the OpenAI hosted API.
api_key can be any string, but you must override it. Otherwise, the OpenAI SDK reads it from environment variables and shows an error.
Supported local model runners
The following runners let you download and run models from Hugging Face on your computer. Each runner exposes an OpenAI-compatible endpoint that you can point the OpenAI SDK at using the preceding changes.- Nomic GPT4All - support through Local Server in settings (FAQ)
- LMStudio - Local Server OpenAI SDK support docs
- Ollama - OpenAI compatibility for OpenAI SDK
- llama-cpp-python - Python package for running
llama.cppwith OpenAI SDK support - llamafile -
http://localhost:8080/v1automatically supports the OpenAI SDK when you run Llamafile