# Learn Weave with W&B Inference

> Learn Weave fundamentals using W&B Inference to trace model calls, compare outputs, and run evaluations.

export const GitHubLink = ({url}) => <a href={url} target="_blank" rel="noopener noreferrer" className="github-source-link">
    <svg width="20" height="20" viewBox="0 0 24 24" fill="currentColor" xmlns="http://www.w3.org/2000/svg">
      <path d="M12 0C5.37 0 0 5.37 0 12c0 5.31 3.435 9.795 8.205 11.385.6.105.825-.255.825-.57 0-.285-.015-1.23-.015-2.235-3.015.555-3.795-.735-4.035-1.41-.135-.345-.72-1.41-1.23-1.695-.42-.225-1.02-.78-.015-.795.945-.015 1.62.87 1.845 1.23 1.08 1.815 2.805 1.305 3.495.99.105-.78.42-1.305.765-1.605-2.67-.3-5.46-1.335-5.46-5.925 0-1.305.465-2.385 1.23-3.225-.12-.3-.54-1.53.12-3.18 0 0 1.005-.315 3.3 1.23.96-.27 1.98-.405 3-.405s2.04.135 3 .405c2.295-1.56 3.3-1.23 3.3-1.23.66 1.65.24 2.88.12 3.18.765.84 1.23 1.905 1.23 3.225 0 4.605-2.805 5.625-5.475 5.925.435.375.81 1.095.81 2.22 0 1.605-.015 2.895-.015 3.3 0 .315.225.69.825.57A12.02 12.02 0 0024 12c0-6.63-5.37-12-12-12z" />
    </svg>
    GitHub source
  </a>;

export const ColabLink = ({url}) => <a href={url} target="_blank" rel="noopener noreferrer" className="colab-link">
    <svg width="20" height="20" viewBox="0 0 24 24" fill="currentColor" xmlns="http://www.w3.org/2000/svg">
      <path d="M14.25.18l.9.2.73.26.59.3.45.32.34.34.25.34.16.33.1.3.04.26.02.2-.01.13V8.5l-.05.63-.13.55-.21.46-.26.38-.3.31-.33.25-.35.19-.35.14-.33.1-.3.07-.26.04-.21.02H8.77l-.69.05-.59.14-.5.22-.41.27-.33.32-.27.35-.2.36-.15.37-.1.35-.07.32-.04.27-.02.21v3.06H3.17l-.21-.03-.28-.07-.32-.12-.35-.18-.36-.26-.36-.36-.35-.46-.32-.59-.28-.73-.21-.88-.14-1.05-.05-1.23.06-1.22.16-1.04.24-.87.32-.71.36-.57.4-.44.42-.33.42-.24.4-.16.36-.1.32-.05.24-.01h.16l.06.01h8.16v-.83H6.18l-.01-2.75-.02-.37.05-.34.11-.31.17-.28.25-.26.31-.23.38-.2.44-.18.51-.15.58-.12.64-.1.71-.06.77-.04.84-.02 1.27.05zm-6.3 1.98l-.23.33-.08.41.08.41.23.34.33.22.41.09.41-.09.33-.22.23-.34.08-.41-.08-.41-.23-.33-.33-.22-.41-.09-.41.09zm13.09 3.95l.28.06.32.12.35.18.36.27.36.35.35.47.32.59.28.73.21.88.14 1.04.05 1.23-.06 1.23-.16 1.04-.24.86-.32.71-.36.57-.4.45-.42.33-.42.24-.4.16-.36.09-.32.05-.24.02-.16-.01h-8.22v.82h5.84l.01 2.76.02.36-.05.34-.11.31-.17.29-.25.25-.31.24-.38.2-.44.17-.51.15-.58.13-.64.09-.71.07-.77.04-.84.01-1.27-.04-1.07-.14-.9-.2-.73-.25-.59-.3-.45-.33-.34-.34-.25-.34-.16-.33-.1-.3-.04-.25-.02-.2.01-.13v-5.34l.05-.64.13-.54.21-.46.26-.38.3-.32.33-.24.35-.2.35-.14.33-.1.3-.06.26-.04.21-.02.13-.01h5.84l.69-.05.59-.14.5-.21.41-.28.33-.32.27-.35.2-.36.15-.36.1-.35.07-.32.04-.28.02-.21V6.07h2.09l.14.01.21.03zm-6.47 14.25l-.23.33-.08.41.08.41.23.33.33.23.41.08.41-.08.33-.23.23-.33.08-.41-.08-.41-.23-.33-.33-.23-.41-.08-.41.08z" />
    </svg>
    Try in Colab
  </a>;

<div style={{ display: 'flex', gap: '12px', flexWrap: 'wrap' }}>
  <ColabLink url="https://colab.research.google.com/github/wandb/docs/blob/main/weave/cookbooks/source/learn-weave-with-inference.ipynb" />

  <GitHubLink url="https://github.com/wandb/docs/blob/main/weave/cookbooks/source/learn-weave-with-inference.ipynb" />
</div>

This guide shows you how to use W\&B Weave with [W\&B Inference](https://docs.wandb.ai/inference/). Use W\&B Inference to build and trace LLM applications using live open-source models without setting up your own infrastructure or managing API keys from multiple providers. With your W\&B API key, you can interact with [all models hosted by W\&B Inference](https://docs.wandb.ai/inference/models/).

## What you'll learn

This guide shows you how to:

* Set up Weave and W\&B Inference
* Build a basic LLM application with automatic tracing
* Compare multiple models
* Evaluate model performance on a dataset
* View your results in the Weave UI

## Prerequisites

* A [W\&B account](https://wandb.ai/signup)
* Python 3.9+ (the Python examples use built-in generic annotations such as `list[str]`) or Node.js 18+
* Required packages installed:
  * **Python**: `pip install weave openai`
  * **TypeScript**: `npm install weave openai`
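* A W\&B API key from [wandb.ai/settings](https://wandb.ai/settings), ideally set as the `WANDB_API_KEY` environment variable. The `openai` package is only used as an OpenAI-compatible client, so no OpenAI API key is needed.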
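
The Python examples in this guide pass the API key as a string literal. Here is a minimal sketch for reading it from the environment instead, assuming the key is stored in `WANDB_API_KEY` (the variable the TypeScript examples in this guide read). `get_wandb_api_key` is a hypothetical helper name, not part of Weave or the OpenAI client:

```python
import os


def get_wandb_api_key(env_var: str = "WANDB_API_KEY") -> str:
    """Return the API key stored in env_var, or raise a clear error if it is missing."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        # Fail early with an actionable message instead of a later authentication error
        raise RuntimeError(
            f"Set the {env_var} environment variable to your W&B API key "
            "(available at https://wandb.ai/settings)."
        )
    return api_key
```

With this helper in place, the client constructor in the examples could take `api_key=get_wandb_api_key()` in place of the `"YOUR_WANDB_API_KEY"` placeholder.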

## Trace your first LLM call

To begin, copy and run the following example, which calls Llama 3.1 8B through W\&B Inference.

When you run this code, Weave:

* Traces your LLM call automatically
* Logs inputs, outputs, latency, and token usage
* Provides a link to view your trace in the Weave UI

<Tabs>
  <Tab title="Python">
    ```python lines theme={null}
    import weave
    import openai

    # Initialize Weave - replace with your-team/your-project
    weave.init("<team-name>/inference-quickstart")

    # Create an OpenAI-compatible client pointing to W&B Inference
    client = openai.OpenAI(
        base_url='https://api.inference.wandb.ai/v1',
        api_key="YOUR_WANDB_API_KEY",  # Replace with your actual API key
        project="<team-name>/my-first-weave-project",  # Required for usage tracking
    )

    # Decorate your function to enable tracing; use the standard OpenAI client
    @weave.op()
    def ask_llama(question: str) -> str:
        response = client.chat.completions.create(
            model="meta-llama/Llama-3.1-8B-Instruct",
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": question}
            ],
        )
        return response.choices[0].message.content

    # Call your function - Weave automatically traces everything
    result = ask_llama("What are the benefits of using W&B Weave for LLM development?")
    print(result)
    ```
  </Tab>

  <Tab title="TypeScript">
    ```typescript lines theme={null}
    import * as weave from 'weave';
    import OpenAI from 'openai';

    // Initialize Weave - Replace values enclosed in "<>" with your own.
    await weave.init("<team-name>/inference-quickstart")

    // Create an OpenAI-compatible client pointing to W&B Inference
    const client = new OpenAI({
        baseURL: 'https://api.inference.wandb.ai/v1',  // W&B Inference endpoint
        apiKey: process.env.WANDB_API_KEY || 'YOUR_WANDB_API_KEY', // Replace with your API key or set the WANDB_API_KEY environment variable
    });

    // Wrap your function with weave.op to enable tracing
    const askLlama = weave.op(async function askLlama(question: string): Promise<string> {
        const response = await client.chat.completions.create({
            model: 'meta-llama/Llama-3.1-8B-Instruct',
            messages: [
                { role: 'system', content: 'You are a helpful assistant.' },
                { role: 'user', content: question }
            ],
        });
        return response.choices[0].message.content || '';
    });

    // Call your function - Weave automatically traces everything
    const result = await askLlama('What are the benefits of using W&B Weave for LLM development?');
    console.log(result);
    ```
  </Tab>
</Tabs>

## Build a text summarization application

Next, run the following basic summarization app. It chains three traced functions, which shows how Weave records nested operations as a single trace tree:

<Tabs>
  <Tab title="Python">
    ```python lines theme={null}
    import weave
    import openai

    # Initialize Weave - Replace values enclosed in "<>" with your own.
    weave.init("<team-name>/inference-quickstart")

    client = openai.OpenAI(
        base_url='https://api.inference.wandb.ai/v1',
        api_key="YOUR_WANDB_API_KEY",  # Replace with your actual API key
        project="<team-name>/my-first-weave-project",  # Required for usage tracking
    )

    @weave.op()
    def extract_key_points(text: str) -> list[str]:
        """Extract key points from a text."""
        response = client.chat.completions.create(
            model="meta-llama/Llama-3.1-8B-Instruct",
            messages=[
                {"role": "system", "content": "Extract 3-5 key points from the text. Return each point on a new line."},
                {"role": "user", "content": text}
            ],
        )
        # Returns response without blank lines
        return [line for line in response.choices[0].message.content.strip().splitlines() if line.strip()]

    @weave.op()
    def create_summary(key_points: list[str]) -> str:
        """Create a concise summary based on key points."""
        points_text = "\n".join(f"- {point}" for point in key_points)
        response = client.chat.completions.create(
            model="meta-llama/Llama-3.1-8B-Instruct",
            messages=[
                {"role": "system", "content": "Create a one-sentence summary based on these key points."},
                {"role": "user", "content": f"Key points:\n{points_text}"}
            ],
        )
        return response.choices[0].message.content

    @weave.op()
    def summarize_text(text: str) -> dict:
        """Main summarization pipeline."""
        key_points = extract_key_points(text)
        summary = create_summary(key_points)
        return {
            "key_points": key_points,
            "summary": summary
        }

    # Try it with sample text
    sample_text = """
    The Apollo 11 mission was a historic spaceflight that landed the first humans on the Moon 
    on July 20, 1969. Commander Neil Armstrong and lunar module pilot Buzz Aldrin descended 
    to the lunar surface while Michael Collins remained in orbit. Armstrong became the first 
    person to step onto the Moon, followed by Aldrin 19 minutes later. They spent about 
    two and a quarter hours together outside the spacecraft, collecting samples and taking photographs.
    """

    result = summarize_text(sample_text)
    print("Key Points:", result["key_points"])
    print("\nSummary:", result["summary"])
    ```
  </Tab>

  <Tab title="TypeScript">
    ```typescript lines theme={null}
    import * as weave from 'weave';
    import OpenAI from 'openai';

    // Initialize Weave - replace with your-team/your-project
    await weave.init('<team-name>/inference-quickstart');

    const client = new OpenAI({
        baseURL: 'https://api.inference.wandb.ai/v1',
        apiKey: process.env.WANDB_API_KEY || 'YOUR_WANDB_API_KEY',  // Replace with your API key or set the WANDB_API_KEY environment variable
    });

    const extractKeyPoints = weave.op(async function extractKeyPoints(text: string): Promise<string[]> {
        const response = await client.chat.completions.create({
            model: 'meta-llama/Llama-3.1-8B-Instruct',
            messages: [
                { role: 'system', content: 'Extract 3-5 key points from the text. Return each point on a new line.' },
                { role: 'user', content: text }
            ],
        });
        // Returns response without blank lines
        const content = response.choices[0].message.content || '';
        return content.split('\n').map(line => line.trim()).filter(line => line.length > 0);
    });

    const createSummary = weave.op(async function createSummary(keyPoints: string[]): Promise<string> {
        const pointsText = keyPoints.map(point => `- ${point}`).join('\n');
        const response = await client.chat.completions.create({
            model: 'meta-llama/Llama-3.1-8B-Instruct',
            messages: [
                { role: 'system', content: 'Create a one-sentence summary based on these key points.' },
                { role: 'user', content: `Key points:\n${pointsText}` }
            ],
        });
        return response.choices[0].message.content || '';
    });

    const summarizeText = weave.op(async function summarizeText(text: string): Promise<{key_points: string[], summary: string}> {
        const keyPoints = await extractKeyPoints(text);
        const summary = await createSummary(keyPoints);
        return {
            key_points: keyPoints,
            summary: summary
        };
    });

    // Try it with sample text
    const sampleText = `
    The Apollo 11 mission was a historic spaceflight that landed the first humans on the Moon 
    on July 20, 1969. Commander Neil Armstrong and lunar module pilot Buzz Aldrin descended 
    to the lunar surface while Michael Collins remained in orbit. Armstrong became the first 
    person to step onto the Moon, followed by Aldrin 19 minutes later. They spent about 
    two and a quarter hours together outside the spacecraft, collecting samples and taking photographs.
    `;

    const result = await summarizeText(sampleText);
    console.log('Key Points:', result.key_points);
    console.log('\nSummary:', result.summary);
    ```
  </Tab>
</Tabs>
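
The key-point parsing step can be checked offline, without calling a model. The sketch below uses a hypothetical `parse_key_points` helper and a hand-written response string (not real model output). As a variant on the guide's code, it also strips leading bullet markers, on the assumption that the model may prefix each point with `-` or `*` and that `create_summary` adds its own `- ` prefix:

```python
def parse_key_points(raw: str) -> list[str]:
    """Split a response into non-empty lines and drop leading bullet markers."""
    points = []
    for line in raw.strip().splitlines():
        # Remove surrounding whitespace, then any leading "-", "*", or "•" marker
        cleaned = line.strip().lstrip("-*•").strip()
        if cleaned:
            points.append(cleaned)
    return points


# Hand-written stand-in for a model response, including a blank line
sample_response = """
- Apollo 11 landed the first humans on the Moon.

- Armstrong stepped onto the surface first.
* Collins remained in orbit.
"""

points = parse_key_points(sample_response)
print(points)
```

Whether stripping bullet markers is desirable depends on how the model formats its output. The guide's original list comprehension keeps each line as returned.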

## Compare multiple models

W\&B Inference provides access to multiple models. Use the following code to compare how Llama and DeepSeek respond to the same prompt:

<Tabs>
  <Tab title="Python">
    ```python lines theme={null}
    import weave
    import openai

    # Initialize Weave - replace with your-team/your-project
    weave.init("<team-name>/inference-quickstart")

    client = openai.OpenAI(
        base_url='https://api.inference.wandb.ai/v1',
        api_key="YOUR_WANDB_API_KEY",  # Replace with your actual API key
        project="<team-name>/my-first-weave-project",  # Required for usage tracking
    )

    # Define a Model class to compare different LLMs
    class InferenceModel(weave.Model):
        model_name: str
        
        @weave.op()
        def predict(self, question: str) -> str:
            response = client.chat.completions.create(
                model=self.model_name,
                messages=[
                    {"role": "user", "content": question}
                ],
            )
            return response.choices[0].message.content

    # Create instances for different models
    llama_model = InferenceModel(model_name="meta-llama/Llama-3.1-8B-Instruct")
    deepseek_model = InferenceModel(model_name="deepseek-ai/DeepSeek-V3.1")

    # Compare their responses
    test_question = "Explain quantum computing in one paragraph for a high school student."

    print("Llama 3.1 8B response:")
    print(llama_model.predict(test_question))
    print("\n" + "="*50 + "\n")
    print("DeepSeek V3.1 response:")
    print(deepseek_model.predict(test_question))
    ```
  </Tab>

  <Tab title="TypeScript">
    ```typescript lines theme={null}
    import * as weave from 'weave';
    import OpenAI from 'openai';

    // Initialize Weave - replace with your-team/your-project
    await weave.init("<team-name>/inference-quickstart")

    const client = new OpenAI({
      baseURL: 'https://api.inference.wandb.ai/v1',
      apiKey: process.env.WANDB_API_KEY || 'YOUR_WANDB_API_KEY', // Replace with your API key or set the WANDB_API_KEY environment variable
    });

    // Create model functions using weave.op (weave.Model is not supported in TypeScript)
    function createModel(modelName: string) {
      return weave.op(async function predict(question: string): Promise<string> {
        const response = await client.chat.completions.create({
          model: modelName,
          messages: [
            { role: 'user', content: question }
          ],
        });
        return response.choices[0].message.content || '';
      });
    }

    // Create instances for different models
    const llamaModel = createModel('meta-llama/Llama-3.1-8B-Instruct');
    const deepseekModel = createModel('deepseek-ai/DeepSeek-V3.1');

    // Compare their responses
    const testQuestion = 'Explain quantum computing in one paragraph for a high school student.';

    console.log('Llama 3.1 8B response:');
    console.log(await llamaModel(testQuestion));
    console.log('\n' + '='.repeat(50) + '\n');
    console.log('DeepSeek V3.1 response:');
    console.log(await deepseekModel(testQuestion));
    ```
  </Tab>
</Tabs>
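
To compare more than two models, the paired print statements generalize to a loop. The sketch below is one way to do that, under stated assumptions: `compare_models` is a hypothetical helper (not part of Weave), and the two stub functions stand in for `llama_model.predict` and `deepseek_model.predict` so the block runs offline:

```python
from typing import Callable, Dict


def compare_models(predictors: Dict[str, Callable[[str], str]], question: str) -> Dict[str, str]:
    """Send the same question to each predictor and collect responses by label."""
    responses = {}
    for label, predict in predictors.items():
        responses[label] = predict(question)
    return responses


# Offline stand-ins for real model calls (hypothetical, for illustration only)
def fake_llama(question: str) -> str:
    return f"[llama stub] {question}"


def fake_deepseek(question: str) -> str:
    return f"[deepseek stub] {question}"


results = compare_models(
    {"Llama 3.1 8B": fake_llama, "DeepSeek V3.1": fake_deepseek},
    "Explain quantum computing in one paragraph for a high school student.",
)

for label, text in results.items():
    print(f"{label} response:")
    print(text)
    print("=" * 50)
```

In the guide's script, the mapping would presumably hold the real bound methods, for example `{"Llama 3.1 8B": llama_model.predict}`. Because `predict` is decorated with `@weave.op()`, each call should still be traced.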

## Evaluate model performance

Evaluate how well a model performs on a Q\&A task using Weave's built-in `EvaluationLogger`. This provides structured evaluation tracking with automatic aggregation, token usage capture, and rich comparison features in the UI.

Append the following code to the script you used in the prior section:

<Tabs>
  <Tab title="Python">
    ```python lines theme={null}
    from typing import Optional
    from weave import EvaluationLogger

    # Create a simple dataset
    dataset = [
        {"question": "What is 2 + 2?", "expected": "4"},
        {"question": "What is the capital of France?", "expected": "Paris"},
        {"question": "Name a primary color", "expected_one_of": ["red", "blue", "yellow"]},
    ]

    # Define a scorer
    @weave.op()
    def accuracy_scorer(expected: str, output: str, expected_one_of: Optional[list[str]] = None) -> dict:
        """Score the accuracy of the model output."""
        output_clean = output.strip().lower()
        
        if expected_one_of:
            is_correct = any(option.lower() in output_clean for option in expected_one_of)
        else:
            is_correct = expected.lower() in output_clean
        
        return {"correct": is_correct, "score": 1.0 if is_correct else 0.0}

    # Evaluate a model using Weave's EvaluationLogger
    def evaluate_model(model: InferenceModel, dataset: list[dict]):
        """Run evaluation on a dataset using Weave's built-in evaluation framework."""
        # Initialize EvaluationLogger BEFORE calling the model to capture token usage
        # This is especially important for W&B Inference to track costs
            # Convert model name to a valid format (replace "/", "-", and "." with underscores)
        safe_model_name = model.model_name.replace("/", "_").replace("-", "_").replace(".", "_")
        eval_logger = EvaluationLogger(
            model=safe_model_name,
            dataset="qa_dataset"
        )
        
        for example in dataset:
            # Get model prediction
            output = model.predict(example["question"])
            
            # Log the prediction
            pred_logger = eval_logger.log_prediction(
                inputs={"question": example["question"]},
                output=output
            )
            
            # Score the output
            score = accuracy_scorer(
                expected=example.get("expected", ""),
                output=output,
                expected_one_of=example.get("expected_one_of")
            )
            
            # Log the score
            pred_logger.log_score(
                scorer="accuracy",
                score=score["score"]
            )
            
            # Finish logging for this prediction
            pred_logger.finish()
        
        # Log summary - Weave automatically aggregates the accuracy scores
        eval_logger.log_summary()
        print(f"Evaluation complete for {model.model_name} (logged as: {safe_model_name}). View results in the Weave UI.")

    # Compare multiple models - a key feature of Weave's evaluation framework
    models_to_compare = [
        llama_model,
        deepseek_model,
    ]

    for model in models_to_compare:
        evaluate_model(model, dataset)

    # In the Weave UI, navigate to the Evals tab to compare results across models
    ```
  </Tab>

  <Tab title="TypeScript">
    ```typescript lines theme={null}
    import { EvaluationLogger } from 'weave';

    // Create a simple dataset
    interface DatasetExample {
      question: string;
      expected?: string;
      expected_one_of?: string[];
    }

    const dataset: DatasetExample[] = [
      { question: 'What is 2 + 2?', expected: '4' },
      { question: 'What is the capital of France?', expected: 'Paris' },
      { question: 'Name a primary color', expected_one_of: ['red', 'blue', 'yellow'] },
    ];

    // Define a scorer
    const accuracyScorer = weave.op(function accuracyScorer(args: {
      expected: string;
      output: string;
      expected_one_of?: string[];
    }): { correct: boolean; score: number } {
      const outputClean = args.output.trim().toLowerCase();
      
      let isCorrect: boolean;
      if (args.expected_one_of) {
        isCorrect = args.expected_one_of.some(option => 
          outputClean.includes(option.toLowerCase())
        );
      } else {
        isCorrect = outputClean.includes(args.expected.toLowerCase());
      }
      
      return { correct: isCorrect, score: isCorrect ? 1.0 : 0.0 };
    });

    // Evaluate a model using Weave's EvaluationLogger
    async function evaluateModel(
      model: (question: string) => Promise<string>,
      modelName: string,
      dataset: DatasetExample[]
    ): Promise<void> {
      // Initialize EvaluationLogger BEFORE calling the model to capture token usage
      // This is especially important for W&B Inference to track costs
      // Convert model name to a valid format (replace "/", "-", and "." with underscores)
      const safeModelName = modelName.replace(/\//g, '_').replace(/-/g, '_').replace(/\./g, '_');
      const evalLogger = new EvaluationLogger({
        name: 'inference_evaluation',
        model: { name: safeModelName },
        dataset: 'qa_dataset'
      });
      
      for (const example of dataset) {
        // Get model prediction
        const output = await model(example.question);
        
        // Log the prediction
        const predLogger = evalLogger.logPrediction(
          { question: example.question },
          output
        );
        
        // Score the output
        const score = await accuracyScorer({
          expected: example.expected || '',
          output: output,
          expected_one_of: example.expected_one_of
        });
        
        // Log the score
        predLogger.logScore('accuracy', score.score);
        
        // Finish logging for this prediction
        predLogger.finish();
      }
      
      // Log summary - Weave automatically aggregates the accuracy scores
      await evalLogger.logSummary();
      console.log(`Evaluation complete for ${modelName} (logged as: ${safeModelName}). View results in the Weave UI.`);
    }

    // Compare multiple models - a key feature of Weave's evaluation framework
    const modelsToCompare = [
      { model: llamaModel, name: 'meta-llama/Llama-3.1-8B-Instruct' },
      { model: deepseekModel, name: 'deepseek-ai/DeepSeek-V3.1' },
    ];

    for (const { model, name } of modelsToCompare) {
      await evaluateModel(model, name, dataset);
    }

    // In the Weave UI, navigate to the Evals tab to compare results across models
    ```
  </Tab>
</Tabs>

Running the examples in this guide prints links to the traces in your terminal. Click any link to view that trace in the Weave UI.

In the Weave UI, you can:

* Review a timeline of all your LLM calls
* Inspect inputs and outputs for each operation
* View token usage and estimated costs (automatically captured by EvaluationLogger)
* Analyze latency and performance metrics
* Navigate to the **Evals** tab to see aggregated evaluation results
* Use the **Compare** feature to analyze performance across different models
* Page through specific examples to see how different models performed on the same inputs
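
Before reading aggregated scores in the UI, it can help to sanity-check the scorer locally. The sketch below repeats the substring logic of `accuracy_scorer` without the Weave decorator, using hand-written outputs rather than real model responses. The mean at the end is a local check only and is not claimed to match how Weave aggregates scores:

```python
from typing import Optional


def accuracy_score(expected: str, output: str, expected_one_of: Optional[list[str]] = None) -> float:
    """Return 1.0 if the expected answer (or any allowed option) is a substring of the output."""
    output_clean = output.strip().lower()
    if expected_one_of:
        is_correct = any(option.lower() in output_clean for option in expected_one_of)
    else:
        is_correct = expected.lower() in output_clean
    return 1.0 if is_correct else 0.0


# Hand-written outputs standing in for model responses (not real model output)
examples = [
    {"expected": "4", "output": "2 + 2 equals 4."},
    {"expected": "Paris", "output": "The capital of France is Paris."},
    # False positive: "4" is a substring of "14", so a wrong answer scores 1.0
    {"expected": "4", "output": "The answer is 14."},
    {"expected": "", "expected_one_of": ["red", "blue", "yellow"], "output": "Green"},
]

scores = [
    accuracy_score(e["expected"], e["output"], e.get("expected_one_of"))
    for e in examples
]
mean_accuracy = sum(scores) / len(scores)

print(scores)         # [1.0, 1.0, 1.0, 0.0]
print(mean_accuracy)  # 0.75
```

The third example shows a limitation of substring matching: short expected answers can match unrelated text. A stricter comparison, such as exact match after normalization, may suit some datasets better.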

## Available models

For a complete list of available models, see the [Available Models section](https://docs.wandb.ai/inference/models/) in the W\&B Inference documentation.

## Next steps

* **Use the Playground**: [Try models interactively](/weave/guides/tools/playground#access-the-playground) in the Weave Playground
* **Build evaluations**: Learn about [systematic evaluation](/weave/guides/core-types/evaluations) of your LLM applications
* **Try other integrations**: Weave works with [OpenAI, Anthropic, and many more](/weave/guides/integrations)

## Troubleshooting

<details>
  <summary>Authentication errors</summary>

  If you get authentication errors:

  1. Verify you have a valid W\&B account
  2. Check that you're using the correct API key from [wandb.ai/settings](https://wandb.ai/settings)
  3. Ensure your project name follows the format `team-name/project-name`
</details>

<details>
  <summary>Rate limit errors</summary>

  W\&B Inference has concurrency limits per project. If you hit rate limits:

  * Reduce the number of concurrent requests
  * Add delays between calls
  * Consider upgrading your plan for higher limits

  For more details, see the [limits documentation for W\&B Inference](https://docs.wandb.ai/inference/usage-limits/).
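
One way to add delays between calls is to retry with exponential backoff. The sketch below is a generic helper under stated assumptions: `call_with_backoff` and `FakeRateLimitError` are hypothetical names, and the flaky function stands in for a real request so the block runs offline:

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying with exponential backoff and jitter on the given exceptions."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the original error
            # Delay doubles each attempt (1s, 2s, 4s, ...) plus up to 10% random jitter
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("unreachable")


class FakeRateLimitError(Exception):
    """Stand-in for a rate limit error so the demo runs offline."""


calls = {"count": 0}


def flaky_request() -> str:
    """Fail twice with a simulated rate limit, then succeed."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise FakeRateLimitError("simulated 429")
    return "ok"


delays = []  # Record the delays instead of actually sleeping
result = call_with_backoff(flaky_request, retry_on=(FakeRateLimitError,), sleep=delays.append)
print(result, calls["count"])  # ok 3
```

With the `openai` Python package (v1 or later), the matching exception is `openai.RateLimitError`, so a real call might look like `call_with_backoff(lambda: ask_llama(question), retry_on=(openai.RateLimitError,))`. The OpenAI client also accepts a `max_retries` argument, which may be enough for occasional limits.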
</details>

<details>
  <summary>Running out of credits</summary>

  The free tier includes limited credits. See the [usage and limits documentation](https://docs.wandb.ai/inference/usage-limits/) for details.
</details>
