> ## Documentation Index
> Fetch the complete documentation index at: https://docs.wandb.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Évaluer des applications RAG

> Créez et évaluez des applications RAG à l’aide de Weave et de juges LLM

export const GitHubLink = ({url}) => <a href={url} target="_blank" rel="noopener noreferrer" className="github-source-link">
    <svg width="20" height="20" viewBox="0 0 24 24" fill="currentColor" xmlns="http://www.w3.org/2000/svg">
      <path d="M12 0C5.37 0 0 5.37 0 12c0 5.31 3.435 9.795 8.205 11.385.6.105.825-.255.825-.57 0-.285-.015-1.23-.015-2.235-3.015.555-3.795-.735-4.035-1.41-.135-.345-.72-1.41-1.23-1.695-.42-.225-1.02-.78-.015-.795.945-.015 1.62.87 1.845 1.23 1.08 1.815 2.805 1.305 3.495.99.105-.78.42-1.305.765-1.605-2.67-.3-5.46-1.335-5.46-5.925 0-1.305.465-2.385 1.23-3.225-.12-.3-.54-1.53.12-3.18 0 0 1.005-.315 3.3 1.23.96-.27 1.98-.405 3-.405s2.04.135 3 .405c2.295-1.56 3.3-1.23 3.3-1.23.66 1.65.24 2.88.12 3.18.765.84 1.23 1.905 1.23 3.225 0 4.605-2.805 5.625-5.475 5.925.435.375.81 1.095.81 2.22 0 1.605-.015 2.895-.015 3.3 0 .315.225.69.825.57A12.02 12.02 0 0024 12c0-6.63-5.37-12-12-12z" />
    </svg>
    Source GitHub
  </a>;

export const ColabLink = ({url}) => <a href={url} target="_blank" rel="noopener noreferrer" className="colab-link">
    <svg width="20" height="20" viewBox="0 0 24 24" fill="currentColor" xmlns="http://www.w3.org/2000/svg">
      <path d="M14.25.18l.9.2.73.26.59.3.45.32.34.34.25.34.16.33.1.3.04.26.02.2-.01.13V8.5l-.05.63-.13.55-.21.46-.26.38-.3.31-.33.25-.35.19-.35.14-.33.1-.3.07-.26.04-.21.02H8.77l-.69.05-.59.14-.5.22-.41.27-.33.32-.27.35-.2.36-.15.37-.1.35-.07.32-.04.27-.02.21v3.06H3.17l-.21-.03-.28-.07-.32-.12-.35-.18-.36-.26-.36-.36-.35-.46-.32-.59-.28-.73-.21-.88-.14-1.05-.05-1.23.06-1.22.16-1.04.24-.87.32-.71.36-.57.4-.44.42-.33.42-.24.4-.16.36-.1.32-.05.24-.01h.16l.06.01h8.16v-.83H6.18l-.01-2.75-.02-.37.05-.34.11-.31.17-.28.25-.26.31-.23.38-.2.44-.18.51-.15.58-.12.64-.1.71-.06.77-.04.84-.02 1.27.05zm-6.3 1.98l-.23.33-.08.41.08.41.23.34.33.22.41.09.41-.09.33-.22.23-.34.08-.41-.08-.41-.23-.33-.33-.22-.41-.09-.41.09zm13.09 3.95l.28.06.32.12.35.18.36.27.36.35.35.47.32.59.28.73.21.88.14 1.04.05 1.23-.06 1.23-.16 1.04-.24.86-.32.71-.36.57-.4.45-.42.33-.42.24-.4.16-.36.09-.32.05-.24.02-.16-.01h-8.22v.82h5.84l.01 2.76.02.36-.05.34-.11.31-.17.29-.25.25-.31.24-.38.2-.44.17-.51.15-.58.13-.64.09-.71.07-.77.04-.84.01-1.27-.04-1.07-.14-.9-.2-.73-.25-.59-.3-.45-.33-.34-.34-.25-.34-.16-.33-.1-.3-.04-.25-.02-.2.01-.13v-5.34l.05-.64.13-.54.21-.46.26-.38.3-.32.33-.24.35-.2.35-.14.33-.1.3-.06.26-.04.21-.02.13-.01h5.84l.69-.05.59-.14.5-.21.41-.28.33-.32.27-.35.2-.36.15-.36.1-.35.07-.32.04-.28.02-.21V6.07h2.09l.14.01.21.03zm-6.47 14.25l-.23.33-.08.41.08.41.23.33.33.23.41.08.41-.08.33-.23.23-.33.08-.41-.08-.41-.23-.33-.33-.23-.41-.08-.41.08z" />
    </svg>
    Essayer sur Colab
  </a>;

<div style={{ display: 'flex', gap: '12px', flexWrap: 'wrap' }}>
  <ColabLink url="https://colab.research.google.com/github/wandb/docs/blob/main/weave/cookbooks/source/evaluate_rag_applications.ipynb" />

  <GitHubLink url="https://github.com/wandb/docs/blob/main/weave/cookbooks/source/evaluate_rag_applications.ipynb" />
</div>

La génération augmentée par récupération (RAG) est une méthode courante pour créer des applications d’IA générative ayant accès à des bases de connaissances personnalisées. Ce tutoriel vous explique comment créer une application RAG et utiliser Weave pour suivre ses étapes de récupération et évaluer ses réponses avec un juge LLM, afin de mesurer et d’améliorer la qualité des réponses renvoyées par votre application.

Ce guide s’adresse aux développeurs qui créent des applications RAG et souhaitent ajouter de l’observabilité et une évaluation systématique à leurs pipelines.

<img src="https://mintcdn.com/wb-21fd5541/aRvhhwVWqlxBzke5/images/evals-hero.png?fit=max&auto=format&n=aRvhhwVWqlxBzke5&q=85&s=7d7466d666ad412ed3916bfab533d118" alt="Illustration Evals" width="4100" height="2160" data-path="images/evals-hero.png" />

<div id="what-youll-learn">
  ## Ce que vous apprendrez
</div>

Ce guide vous montre comment :

* Créer une base de connaissances.
* Créer une application RAG avec une étape de récupération qui identifie les documents pertinents.
* Suivre les étapes de récupération avec Weave.
* Évaluer des applications RAG à l’aide d’un juge LLM pour mesurer la précision du contexte.
* Définir des fonctions de score personnalisées.

<div id="prerequisites">
  ## Prérequis
</div>

* Un [compte W\&B](https://wandb.ai/signup)
* Python 3.10+ ou Node.js 18+
* Packages requis :
  * **Python**: `pip install weave openai`
  * **TypeScript**: `npm install weave openai`
* Une [clé API OpenAI](https://platform.openai.com/api-keys) configurée comme variable d'environnement.

<div id="build-a-knowledge-base">
  ## Créer une base de connaissances
</div>

La base de connaissances est le corpus que votre application RAG interroge au moment de la requête. Dans cette section, vous calculez des embeddings vectoriels pour un petit ensemble d’articles afin de pouvoir récupérer plus tard l’article le plus pertinent pour une question donnée.

Commencez par calculer les embeddings des articles. En général, vous ne feriez cela qu'une seule fois pour vos articles, puis vous stockeriez les embeddings et les métadonnées dans une base de données, mais ici, pour plus de simplicité, cette opération est effectuée à chaque exécution du script.

<Tabs>
  <Tab title="Python">
    ```python lines theme={null}
    from openai import OpenAI
    import weave
    from weave import Model
    import numpy as np
    import json
    import asyncio

    articles = [
        "Novo Nordisk and Eli Lilly rival soars 32 percent after promising weight loss drug results Shares of Denmarks Zealand Pharma shot 32 percent higher in morning trade, after results showed success in its liver disease treatment survodutide, which is also on trial as a drug to treat obesity. The trial “tells us that the 6mg dose is safe, which is the top dose used in the ongoing [Phase 3] obesity trial too,” one analyst said in a note. The results come amid feverish investor interest in drugs that can be used for weight loss.",
        "Berkshire shares jump after big profit gain as Buffetts conglomerate nears $1 trillion valuation Berkshire Hathaway shares rose on Monday after Warren Buffetts conglomerate posted strong earnings for the fourth quarter over the weekend. Berkshires Class A and B shares jumped more than 1.5%, each. Class A shares are higher by more than 17% this year, while Class B has gained more than 18%. Berkshire was last valued at $930.1 billion, up from $905.5 billion where it closed on Friday, according to FactSet. Berkshire on Saturday posted fourth-quarter operating earnings of $8.481 billion, about 28 percent higher than the $6.625 billion from the year-ago period, driven by big gains in its insurance business. Operating earnings refers to profits from businesses across insurance, railroads and utilities. Meanwhile, Berkshires cash levels also swelled to record levels. The conglomerate held $167.6 billion in cash in the fourth quarter, surpassing the $157.2 billion record the conglomerate held in the prior quarter.",
        "Highmark Health says its combining tech from Google and Epic to give doctors easier access to information Highmark Health announced it is integrating technology from Google Cloud and the health-care software company Epic Systems. The integration aims to make it easier for both payers and providers to access key information they need, even if its stored across multiple points and formats, the company said. Highmark is the parent company of a health plan with 7 million members, a provider network of 14 hospitals and other entities",
        "Rivian and Lucid shares plunge after weak EV earnings reports Shares of electric vehicle makers Rivian and Lucid fell Thursday after the companies reported stagnant production in their fourth-quarter earnings after the bell Wednesday. Rivian shares sank about 25 percent, and Lucids stock dropped around 17 percent. Rivian forecast it will make 57,000 vehicles in 2024, slightly less than the 57,232 vehicles it produced in 2023. Lucid said it expects to make 9,000 vehicles in 2024, more than the 8,428 vehicles it made in 2023.",
        "Mauritius blocks Norwegian cruise ship over fears of a potential cholera outbreak Local authorities on Sunday denied permission for the Norwegian Dawn ship, which has 2,184 passengers and 1,026 crew on board, to access the Mauritius capital of Port Louis, citing “potential health risks.” The Mauritius Ports Authority said Sunday that samples were taken from at least 15 passengers on board the cruise ship. A spokesperson for the U.S.-headquartered Norwegian Cruise Line Holdings said Sunday that 'a small number of guests experienced mild symptoms of a stomach-related illness' during Norwegian Dawns South Africa voyage.",
        "Intuitive Machines lands on the moon in historic first for a U.S. company Intuitive Machines Nova-C cargo lander, named Odysseus after the mythological Greek hero, is the first U.S. spacecraft to soft land on the lunar surface since 1972. Intuitive Machines is the first company to pull off a moon landing — government agencies have carried out all previously successful missions. The company's stock surged in extended trading Thursday, after falling 11 percent in regular trading.",
        "Lunar landing photos: Intuitive Machines Odysseus sends back first images from the moon Intuitive Machines cargo moon lander Odysseus returned its first images from the surface. Company executives believe the lander caught its landing gear sideways on the moon's surface while touching down and tipped over. Despite resting on its side, the company's historic IM-1 mission is still operating on the moon.",
    ]

    def docs_to_embeddings(docs: list) -> list:
        openai = OpenAI()
        document_embeddings = []
        for doc in docs:
            response = (
                openai.embeddings.create(input=doc, model="text-embedding-3-small")
                .data[0]
                .embedding
            )
            document_embeddings.append(response)
        return document_embeddings

    article_embeddings = docs_to_embeddings(articles) # Remarque : en général, vous ne feriez cela qu'une seule fois pour vos articles et stockeriez les embeddings et métadonnées dans une base de données
    ```
  </Tab>

  <Tab title="TypeScript">
    ```typescript twoslash lines theme={null}
    // @noErrors
    require('dotenv').config();
    import { OpenAI } from 'openai';
    import * as weave from 'weave';

    interface Article {
        text: string;
        embedding?: number[];
    }

    const articles: Article[] = [
        { 
            text: `Novo Nordisk and Eli Lilly rival soars 32 percent after promising weight loss drug results Shares of Denmarks Zealand Pharma shot 32 percent higher in morning trade, after results showed success in its liver disease treatment survodutide, which is also on trial as a drug to treat obesity. The trial tells us that the 6mg dose is safe, which is the top dose used in the ongoing [Phase 3] obesity trial too, one analyst said in a note. The results come amid feverish investor interest in drugs that can be used for weight loss.`
        },
        { 
            text: `Berkshire shares jump after big profit gain as Buffetts conglomerate nears $1 trillion valuation Berkshire Hathaway shares rose on Monday after Warren Buffetts conglomerate posted strong earnings for the fourth quarter over the weekend. Berkshires Class A and B shares jumped more than 1.5%, each. Class A shares are higher by more than 17% this year, while Class B has gained more than 18%. Berkshire was last valued at $930.1 billion, up from $905.5 billion where it closed on Friday, according to FactSet. Berkshire on Saturday posted fourth-quarter operating earnings of $8.481 billion, about 28 percent higher than the $6.625 billion from the year-ago period, driven by big gains in its insurance business. Operating earnings refers to profits from businesses across insurance, railroads and utilities. Meanwhile, Berkshires cash levels also swelled to record levels. The conglomerate held $167.6 billion in cash in the fourth quarter, surpassing the $157.2 billion record the conglomerate held in the prior quarter.`
        },
        { 
            text: `Highmark Health says its combining tech from Google and Epic to give doctors easier access to information Highmark Health announced it is integrating technology from Google Cloud and the health-care software company Epic Systems. The integration aims to make it easier for both payers and providers to access key information they need, even if its stored across multiple points and formats, the company said. Highmark is the parent company of a health plan with 7 million members, a provider network of 14 hospitals and other entities`
        }
    ];

    function cosineSimilarity(a: number[], b: number[]): number {
        const dotProduct = a.reduce((sum, val, i) => sum + val * b[i], 0);
        const magnitudeA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
        const magnitudeB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
        return dotProduct / (magnitudeA * magnitudeB);
    }

    const docsToEmbeddings = weave.op(async function(docs: Article[]): Promise<Article[]> {
        const openai = new OpenAI();
        const enrichedDocs = await Promise.all(docs.map(async (doc) => {
            const response = await openai.embeddings.create({
                input: doc.text,
                model: "text-embedding-3-small"
            });
            return {
                ...doc,
                embedding: response.data[0].embedding
            };
        }));
        return enrichedDocs;
    });
    ```
  </Tab>
</Tabs>

<div id="create-a-rag-app">
  ## Créer une application RAG
</div>

Une fois la base de connaissances en place, vous pouvez maintenant créer l'application RAG elle-même. Cette section combine l'étape de récupération avec un appel LLM et encapsule les deux avec Weave afin que chaque entrée et sortie soit automatiquement suivie.

Ensuite, encapsulez la fonction de récupération `get_most_relevant_document` avec le décorateur `weave.op()`, puis créez une classe `Model`. Encapsuler la fonction de récupération avec `weave.op()` permet à Weave de capturer ses entrées et sorties pour chaque appel, ce qui rend l'étape de récupération inspectable par la suite. Appelez `weave.init('<team-name>/rag-quickstart')` pour commencer à suivre toutes les entrées et sorties de vos fonctions afin de pouvoir les inspecter plus tard. Si vous ne spécifiez pas de nom d'équipe, Weave enregistre la sortie dans votre [équipe ou entité W\&B par défaut](/fr/platform/app/settings-page/user-settings#default-team).

<Tabs>
  <Tab title="Python">
    ```python lines theme={null}
    from openai import OpenAI
    import weave
    from weave import Model
    import numpy as np
    import asyncio

    @weave.op()
    def get_most_relevant_document(query):
        openai = OpenAI()
        query_embedding = (
            openai.embeddings.create(input=query, model="text-embedding-3-small")
            .data[0]
            .embedding
        )
        similarities = [
            np.dot(query_embedding, doc_emb)
            / (np.linalg.norm(query_embedding) * np.linalg.norm(doc_emb))
            for doc_emb in article_embeddings
        ]
        # Obtenir l'index du document le plus similaire
        most_relevant_doc_index = np.argmax(similarities)
        return articles[most_relevant_doc_index]

    class RAGModel(Model):
        system_message: str
        model_name: str = "gpt-3.5-turbo-1106"

        @weave.op()
        def predict(self, question: str) -> dict: # remarque : `question` sera utilisé plus tard pour sélectionner des données à partir de nos lignes d'évaluation
            from openai import OpenAI
            context = get_most_relevant_document(question)
            client = OpenAI()
            query = f"""Use the following information to answer the subsequent question. If the answer cannot be found, write "I don't know."
            Context:
            \"\"\"
            {context}
            \"\"\"
            Question: {question}"""
            response = client.chat.completions.create(
                model=self.model_name,
                messages=[
                    {"role": "system", "content": self.system_message},
                    {"role": "user", "content": query},
                ],
                temperature=0.0,
                response_format={"type": "text"},
            )
            answer = response.choices[0].message.content
            return {'answer': answer, 'context': context}

    # Définissez les noms de votre équipe et de votre projet
    weave.init('<team-name>/rag-quickstart')
    model = RAGModel(
        system_message="You are an expert in finance and answer questions related to finance, financial services, and financial markets. When responding based on provided information, be sure to cite the source."
    )
    model.predict("What significant result was reported about Zealand Pharma's obesity trial?")
    ```
  </Tab>

  <Tab title="TypeScript">
    ```typescript twoslash lines theme={null}
    // @noErrors
    class RAGModel {
        private openai: OpenAI;
        private systemMessage: string;
        private modelName: string;
        private articleEmbeddings: Article[];

        constructor(config: {
            systemMessage: string;
            modelName?: string;
            articleEmbeddings: Article[];
        }) {
            this.openai = new OpenAI();
            this.systemMessage = config.systemMessage;
            this.modelName = config.modelName || "gpt-3.5-turbo-1106";
            this.articleEmbeddings = config.articleEmbeddings;
            this.predict = weave.op(this, this.predict);
        }

        async predict(question: string): Promise<{
            answer: string;
            context: string;
        }> {
            const context = await this.getMostRelevantDocument(question);
            
            const response = await this.openai.chat.completions.create({
                model: this.modelName,
                messages: [
                    { role: "system", content: this.systemMessage },
                    { role: "user", content: `Use the following information to answer the subsequent question. If the answer cannot be found, write "I don't know."
                        Context:
                        """
                        ${context}
                        """
                        Question: ${question}` }
                ],
                temperature: 0
            });

            return {
                answer: response.choices[0].message.content || "",
                context
            };
        }
    }
    ```
  </Tab>
</Tabs>

<div id="evaluate-with-an-llm-judge">
  ## Évaluer avec un juge LLM
</div>

Maintenant que votre application RAG est en cours d’exécution et suivie par Weave, l’étape suivante consiste à évaluer dans quelle mesure elle répond bien aux questions. Cette section montre comment utiliser un LLM comme juge automatique afin que vous puissiez attribuer un score aux réponses de votre application sans rédiger manuellement des libellés.

Lorsqu’il n’existe pas de moyen simple d’évaluer votre application, une approche consiste à utiliser un LLM pour en évaluer certains aspects. Voici un exemple d’utilisation d’un juge LLM pour mesurer la précision du contexte, en lui demandant de vérifier si le contexte a été utile pour parvenir à la réponse fournie. Ce prompt a été adapté à partir du framework [RAGAS](https://docs.ragas.io/en/stable/).

<div id="define-a-scoring-function">
  ### Définir une fonction de score
</div>

Comme dans le tutoriel [Créer un pipeline d’Évaluation](/fr/weave/tutorial-eval), définissez un ensemble de lignes d’exemple pour tester votre application, ainsi qu’une fonction de score. La fonction de score prend une ligne et l’évalue. Les arguments d’entrée doivent correspondre aux clés de votre ligne ; ici, `question` provient donc du dictionnaire de la ligne. `output` est la sortie du modèle. L’entrée du modèle est récupérée depuis l’exemple en fonction de son argument d’entrée ; ici aussi, il s’agit de `question`. Cet exemple utilise des fonctions `async` afin qu’elles s’exécutent en parallèle. Pour une brève introduction à l’async, consultez la [documentation Python `asyncio`](https://docs.python.org/3/library/asyncio.html).

<Tabs>
  <Tab title="Python">
    ```python lines theme={null}
    from openai import OpenAI
    import weave
    import asyncio

    @weave.op()
    async def context_precision_score(question, output):
        context_precision_prompt = """Given question, answer and context verify if the context was useful in arriving at the given answer. Give verdict as "1" if useful and "0" if not with json output.
        Output in only valid JSON format.

        question: {question}
        context: {context}
        answer: {answer}
        verdict: """
        client = OpenAI()

        prompt = context_precision_prompt.format(
            question=question,
            context=output['context'],
            answer=output['answer'],
        )

        response = client.chat.completions.create(
            model="gpt-4-turbo-preview",
            messages=[{"role": "user", "content": prompt}],
            response_format={ "type": "json_object" }
        )
        response_message = response.choices[0].message
        response = json.loads(response_message.content)
        return {
            "verdict": int(response["verdict"]) == 1,
        }

    questions = [
        {"question": "What significant result was reported about Zealand Pharma's obesity trial?"},
        {"question": "How much did Berkshire Hathaway's cash levels increase in the fourth quarter?"},
        {"question": "What is the goal of Highmark Health's integration of Google Cloud and Epic Systems technology?"},
        {"question": "What were Rivian and Lucid's vehicle production forecasts for 2024?"},
        {"question": "Why was the Norwegian Dawn cruise ship denied access to Mauritius?"},
        {"question": "Which company achieved the first U.S. moon landing since 1972?"},
        {"question": "What issue did Intuitive Machines' lunar lander encounter upon landing on the moon?"}
    ]
    evaluation = weave.Evaluation(dataset=questions, scorers=[context_precision_score])
    asyncio.run(evaluation.evaluate(model)) # remarque : vous devrez définir un modèle à évaluer
    ```
  </Tab>

  <Tab title="TypeScript">
    ```typescript twoslash lines theme={null}
    // @noErrors
    const contextPrecisionScore = weave.op(async function(args: {
        datasetRow: QuestionRow;
        modelOutput: { answer: string; context: string; }
    }): Promise<ScorerResult> {
        const openai = new OpenAI();
        
        const prompt = `Given question, answer and context verify if the context was useful...`;

        const response = await openai.chat.completions.create({
            model: "gpt-4-turbo-preview",
            messages: [{ role: "user", content: prompt }],
            response_format: { type: "json_object" }
        });

        const result = JSON.parse(response.choices[0].message.content || "{}");
        return {
            verdict: parseInt(result.verdict) === 1
        };
    });

    const evaluation = new weave.Evaluation({
        dataset: createQuestionDataset(),
        scorers: [contextPrecisionScore]
    });

    await evaluation.evaluate({
        model: weave.op((args: { datasetRow: QuestionRow }) => 
            model.predict(args.datasetRow.question)
        )
    });
    ```
  </Tab>
</Tabs>

<div id="optional-define-a-scorer-class">
  ### Facultatif : définir une classe `Scorer`
</div>

La fonction de score de la section précédente fonctionne bien pour les cas simples, mais une classe `Scorer` est utile si vous voulez réutiliser le même juge dans plusieurs évaluations ou personnaliser la manière dont les scores sont agrégés. Les étapes suivantes montrent quand et comment en définir une.

Dans certaines applications, vous pouvez vouloir créer des classes d’évaluation personnalisées — par exemple lorsqu’il faut créer une classe `LLMJudge` standardisée avec des paramètres spécifiques (par exemple, modèle de chat et prompt), une logique de score propre à chaque ligne et un calcul spécifique du score agrégé. Weave propose une liste de classes `Scorer` prêtes à l’emploi et permet aussi de créer facilement un `Scorer` personnalisé. L’exemple suivant montre comment créer une `class CorrectnessLLMJudge(Scorer)` personnalisée.

Dans les grandes lignes, les étapes pour créer un `Scorer` personnalisé sont :

1. Définissez une classe personnalisée qui hérite de `weave.flow.scorer.Scorer`.
2. Redéfinissez la fonction `score` et ajoutez `@weave.op()` si vous voulez suivre chaque appel de la fonction.
   * Cette fonction doit définir un argument `output` dans lequel la prédiction du modèle est transmise. Définissez-le avec le type `Optional[dict]` au cas où le modèle renverrait `"None"`.
   * Les autres arguments peuvent être de type `Any` ou `dict`, ou correspondre à des colonnes spécifiques du jeu de données utilisé pour évaluer le modèle avec la classe `weave.Evaluate`. Ils doivent avoir exactement les mêmes noms que les noms de colonnes ou les clés d’une ligne après son passage à `preprocess_model_input`, si celle-ci est utilisée.
3. Facultatif : Redéfinissez la fonction `summarize` pour personnaliser le calcul du score agrégé. Par défaut, Weave utilise la fonction `weave.flow.scorer.auto_summarize` si vous ne définissez pas de fonction personnalisée.
   * Cette fonction doit avoir un décorateur `@weave.op()`.

<Tabs>
  <Tab title="Python">
    ```python lines theme={null}
    from weave import Scorer

    class CorrectnessLLMJudge(Scorer):
        prompt: str
        model_name: str
        device: str

        @weave.op()
        async def score(self, output: Optional[dict], query: str, answer: str) -> Any:
            """Évalue l’exactitude des prédictions en comparant pred, query et target.
            Arguments :
                - output: le dict qui sera fourni par le modèle évalué
                - query: la question posée - telle que définie dans le jeu de données
                - answer: la réponse cible - telle que définie dans le jeu de données
            Retourne :
                - dict unique {nom de métrique: valeur d’évaluation unique}"""

            # get_model est défini comme une fonction générique de récupération de modèle basée sur les paramètres fournis (OpenAI,HF...)
            eval_model = get_model(
                model_name = self.model_name,
                prompt = self.prompt
                device = self.device,
            )
            # évaluation asynchrone pour accélérer l’évaluation - cela n’a pas besoin d’être asynchrone
            grade = await eval_model.async_predict(
                {
                    "query": query,
                    "answer": answer,
                    "result": output.get("result"),
                }
            )
            # analyse de la sortie - pourrait être effectuée de manière plus robuste avec pydantic
            evaluation = "incorrect" not in grade["text"].strip().lower()

            # le nom de colonne affiché dans Weave
            return {"correct": evaluation}

        @weave.op()
        def summarize(self, score_rows: list) -> Optional[dict]:
            """Agrège tous les scores calculés pour chaque ligne par la fonction de score.
            Arguments :
                - score_rows: une liste de dicts. Chaque dict contient des métriques et des scores
            Retourne :
                - dict imbriqué avec la même structure que l’entrée"""

            # si rien n’est fourni, la fonction weave.flow.scorer.auto_summarize est utilisée
            # return auto_summarize(score_rows)

            valid_data = [x.get("correct") for x in score_rows if x.get("correct") is not None]
            count_true = list(valid_data).count(True)
            int_data = [int(x) for x in valid_data]

            sample_mean = np.mean(int_data) if int_data else 0
            sample_variance = np.var(int_data) if int_data else 0
            sample_error = np.sqrt(sample_variance / len(int_data)) if int_data else 0

            # la couche supplémentaire "correct" n’est pas nécessaire mais ajoute de la structure dans l’interface utilisateur
            return {
                "correct": {
                    "true_count": count_true,
                    "true_fraction": sample_mean,
                    "stderr": sample_error,
                }
            }
    ```
  </Tab>

  <Tab title="TypeScript">
    ```plaintext theme={null}
    Cette fonctionnalité n’est pas encore disponible en TypeScript.
    ```
  </Tab>
</Tabs>

Pour l’utiliser comme évaluateur, initialisez-le, puis passez-le à l’argument `scorers` dans votre `Evaluation`, comme ceci :

<Tabs>
  <Tab title="Python">
    ```python lines theme={null}
    evaluation = weave.Evaluation(dataset=questions, scorers=[CorrectnessLLMJudge()])
    ```
  </Tab>

  <Tab title="TypeScript">
    ```plaintext theme={null}
    Cette fonctionnalité n’est pas encore disponible en TypeScript.
    ```
  </Tab>
</Tabs>

<div id="put-it-all-together">
  ## Assemblez le tout
</div>

Cette section regroupe tout ce qui a été vu dans les étapes précédentes dans un exemple complet de bout en bout, afin que vous puissiez voir comment les différents éléments s’articulent et les adapter à votre propre application RAG.

Pour obtenir le même résultat avec vos applications RAG :

* Encapsulez les appels LLM et les fonctions des étapes de récupération avec `weave.op()`.
* Facultatif : créez une sous-classe `Model` avec une fonction `predict` et les informations de l’application.
* Collectez des exemples à évaluer.
* Créez des fonctions de score qui évaluent un exemple.
* Utilisez la classe `Evaluation` pour exécuter des évaluations sur vos exemples.

**REMARQUE :** Il arrive que l’exécution asynchrone des évaluations atteigne une limite de débit sur les modèles d’OpenAI, d’Anthropic et d’autres fournisseurs. Pour l’éviter, vous pouvez définir une variable d’environnement afin de limiter le nombre de workers parallèles, par exemple `WEAVE_PARALLELISM=3`.

Voici le code dans son intégralité.

<Tabs>
  <Tab title="Python">
    ```python lines {34,52-77} theme={null}
    from openai import OpenAI
    import weave
    from weave import Model
    import numpy as np
    import json
    import asyncio

    # Exemples à utiliser pour les évaluations
    articles = [
        "Novo Nordisk and Eli Lilly rival soars 32 percent after promising weight loss drug results Shares of Denmarks Zealand Pharma shot 32 percent higher in morning trade, after results showed success in its liver disease treatment survodutide, which is also on trial as a drug to treat obesity. The trial “tells us that the 6mg dose is safe, which is the top dose used in the ongoing [Phase 3] obesity trial too,” one analyst said in a note. The results come amid feverish investor interest in drugs that can be used for weight loss.",
        "Berkshire shares jump after big profit gain as Buffetts conglomerate nears $1 trillion valuation Berkshire Hathaway shares rose on Monday after Warren Buffetts conglomerate posted strong earnings for the fourth quarter over the weekend. Berkshires Class A and B shares jumped more than 1.5%, each. Class A shares are higher by more than 17% this year, while Class B has gained more than 18%. Berkshire was last valued at $930.1 billion, up from $905.5 billion where it closed on Friday, according to FactSet. Berkshire on Saturday posted fourth-quarter operating earnings of $8.481 billion, about 28 percent higher than the $6.625 billion from the year-ago period, driven by big gains in its insurance business. Operating earnings refers to profits from businesses across insurance, railroads and utilities. Meanwhile, Berkshires cash levels also swelled to record levels. The conglomerate held $167.6 billion in cash in the fourth quarter, surpassing the $157.2 billion record the conglomerate held in the prior quarter.",
        "Highmark Health says its combining tech from Google and Epic to give doctors easier access to information Highmark Health announced it is integrating technology from Google Cloud and the health-care software company Epic Systems. The integration aims to make it easier for both payers and providers to access key information they need, even if it's stored across multiple points and formats, the company said. Highmark is the parent company of a health plan with 7 million members, a provider network of 14 hospitals and other entities",
        "Rivian and Lucid shares plunge after weak EV earnings reports Shares of electric vehicle makers Rivian and Lucid fell Thursday after the companies reported stagnant production in their fourth-quarter earnings after the bell Wednesday. Rivian shares sank about 25 percent, and Lucids stock dropped around 17 percent. Rivian forecast it will make 57,000 vehicles in 2024, slightly less than the 57,232 vehicles it produced in 2023. Lucid said it expects to make 9,000 vehicles in 2024, more than the 8,428 vehicles it made in 2023.",
        "Mauritius blocks Norwegian cruise ship over fears of a potential cholera outbreak Local authorities on Sunday denied permission for the Norwegian Dawn ship, which has 2,184 passengers and 1,026 crew on board, to access the Mauritius capital of Port Louis, citing “potential health risks.” The Mauritius Ports Authority said Sunday that samples were taken from at least 15 passengers on board the cruise ship. A spokesperson for the U.S.-headquartered Norwegian Cruise Line Holdings said Sunday that 'a small number of guests experienced mild symptoms of a stomach-related illness' during Norwegian Dawns South Africa voyage.",
        "Intuitive Machines lands on the moon in historic first for a U.S. company Intuitive Machines Nova-C cargo lander, named Odysseus after the mythological Greek hero, is the first U.S. spacecraft to soft land on the lunar surface since 1972. Intuitive Machines is the first company to pull off a moon landing — government agencies have carried out all previously successful missions. The company's stock surged in extended trading Thursday, after falling 11 percent in regular trading.",
        "Lunar landing photos: Intuitive Machines Odysseus sends back first images from the moon Intuitive Machines cargo moon lander Odysseus returned its first images from the surface. Company executives believe the lander caught its landing gear sideways on the surface of the moon while touching down and tipped over. Despite resting on its side, the company's historic IM-1 mission is still operating on the moon.",
    ]

    def docs_to_embeddings(docs: list) -> list:
        openai = OpenAI()
        document_embeddings = []
        for doc in docs:
            response = (
                openai.embeddings.create(input=doc, model="text-embedding-3-small")
                .data[0]
                .embedding
            )
            document_embeddings.append(response)
        return document_embeddings

    article_embeddings = docs_to_embeddings(articles) # Remarque : en général, vous ne feriez cela qu'une seule fois pour vos articles et stockeriez les embeddings et les métadonnées dans une base de données

    # Ajouter un décorateur à l'étape de récupération
    @weave.op()
    def get_most_relevant_document(query):
        openai = OpenAI()
        query_embedding = (
            openai.embeddings.create(input=query, model="text-embedding-3-small")
            .data[0]
            .embedding
        )
        similarities = [
            np.dot(query_embedding, doc_emb)
            / (np.linalg.norm(query_embedding) * np.linalg.norm(doc_emb))
            for doc_emb in article_embeddings
        ]
        # Obtenir l'index du document le plus similaire
        most_relevant_doc_index = np.argmax(similarities)
        return articles[most_relevant_doc_index]

    # Créer une sous-classe de Model avec les détails de l'application, ainsi qu'une fonction predict qui produit une réponse
    class RAGModel(Model):
        system_message: str
        model_name: str = "gpt-3.5-turbo-1106"

        @weave.op()
        def predict(self, question: str) -> dict: # remarque : `question` sera utilisé plus tard pour sélectionner des données dans nos lignes d'évaluation
            from openai import OpenAI
            context = get_most_relevant_document(question)
            client = OpenAI()
            query = f"""Use the following information to answer the subsequent question. If the answer cannot be found, write "I don't know."
            Context:
            \"\"\"
            {context}
            \"\"\"
            Question: {question}"""
            response = client.chat.completions.create(
                model=self.model_name,
                messages=[
                    {"role": "system", "content": self.system_message},
                    {"role": "user", "content": query},
                ],
                temperature=0.0,
                response_format={"type": "text"},
            )
            answer = response.choices[0].message.content
            return {'answer': answer, 'context': context}

    # Définir les noms de votre équipe et de votre projet
    weave.init('<team-name>/rag-quickstart')
    model = RAGModel(
        system_message="You are an expert in finance and answer questions related to finance, financial services, and financial markets. When responding based on provided information, be sure to cite the source."
    )

    # Voici notre fonction de score qui utilise notre question et la sortie pour produire un score
    @weave.op()
    async def context_precision_score(question, output):
        context_precision_prompt = """Given question, answer and context verify if the context was useful in arriving at the given answer. Give verdict as "1" if useful and "0" if not with json output.
        Output in only valid JSON format.

        question: {question}
        context: {context}
        answer: {answer}
        verdict: """
        client = OpenAI()

        prompt = context_precision_prompt.format(
            question=question,
            context=output['context'],
            answer=output['answer'],
        )

        response = client.chat.completions.create(
            model="gpt-4-turbo-preview",
            messages=[{"role": "user", "content": prompt}],
            response_format={ "type": "json_object" }
        )
        response_message = response.choices[0].message
        response = json.loads(response_message.content)
        return {
            "verdict": int(response["verdict"]) == 1,
        }

    questions = [
        {"question": "What significant result was reported about Zealand Pharma's obesity trial?"},
        {"question": "How much did Berkshire Hathaway's cash levels increase in the fourth quarter?"},
        {"question": "What is the goal of Highmark Health's integration of Google Cloud and Epic Systems technology?"},
        {"question": "What were Rivian and Lucid's vehicle production forecasts for 2024?"},
        {"question": "Why was the Norwegian Dawn cruise ship denied access to Mauritius?"},
        {"question": "Which company achieved the first U.S. moon landing since 1972?"},
        {"question": "What issue did Intuitive Machines' lunar lander encounter upon landing on the moon?"}
    ]

    # Définir un objet Evaluation et transmettre des exemples de questions ainsi que des fonctions de score
    evaluation = weave.Evaluation(dataset=questions, scorers=[context_precision_score])
    asyncio.run(evaluation.evaluate(model))
    ```
  </Tab>

  <Tab title="TypeScript">
    ```typescript twoslash lines theme={null}
    // @noErrors
    require('dotenv').config();
    import { OpenAI } from 'openai';
    import * as weave from 'weave';

    interface Article {
        text: string;
        embedding?: number[];
    }

    const articles: Article[] = [
        { 
            text: `Novo Nordisk and Eli Lilly rival soars 32 percent after promising weight loss drug results Shares of Denmarks Zealand Pharma shot 32 percent higher in morning trade, after results showed success in its liver disease treatment survodutide, which is also on trial as a drug to treat obesity. The trial tells us that the 6mg dose is safe, which is the top dose used in the ongoing [Phase 3] obesity trial too, one analyst said in a note. The results come amid feverish investor interest in drugs that can be used for weight loss.`
        },
        { 
            text: `Berkshire shares jump after big profit gain as Buffetts conglomerate nears $1 trillion valuation Berkshire Hathaway shares rose on Monday after Warren Buffetts conglomerate posted strong earnings for the fourth quarter over the weekend. Berkshires Class A and B shares jumped more than 1.5%, each. Class A shares are higher by more than 17% this year, while Class B has gained more than 18%. Berkshire was last valued at $930.1 billion, up from $905.5 billion where it closed on Friday, according to FactSet. Berkshire on Saturday posted fourth-quarter operating earnings of $8.481 billion, about 28 percent higher than the $6.625 billion from the year-ago period, driven by big gains in its insurance business. Operating earnings refers to profits from businesses across insurance, railroads and utilities. Meanwhile, Berkshires cash levels also swelled to record levels. The conglomerate held $167.6 billion in cash in the fourth quarter, surpassing the $157.2 billion record the conglomerate held in the prior quarter.`
        },
        { 
            text: `Highmark Health says its combining tech from Google and Epic to give doctors easier access to information Highmark Health announced it is integrating technology from Google Cloud and the health-care software company Epic Systems. The integration aims to make it easier for both payers and providers to access key information they need, even if its stored across multiple points and formats, the company said. Highmark is the parent company of a health plan with 7 million members, a provider network of 14 hospitals and other entities`
        }
    ];

    function cosineSimilarity(a: number[], b: number[]): number {
        const dotProduct = a.reduce((sum, val, i) => sum + val * b[i], 0);
        const magnitudeA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
        const magnitudeB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
        return dotProduct / (magnitudeA * magnitudeB);
    }

    const docsToEmbeddings = weave.op(async function(docs: Article[]): Promise<Article[]> {
        const openai = new OpenAI();
        const enrichedDocs = await Promise.all(docs.map(async (doc) => {
            const response = await openai.embeddings.create({
                input: doc.text,
                model: "text-embedding-3-small"
            });
            return {
                ...doc,
                embedding: response.data[0].embedding
            };
        }));
        return enrichedDocs;
    });

    class RAGModel {
        private openai: OpenAI;
        private systemMessage: string;
        private modelName: string;
        private articleEmbeddings: Article[];

        constructor(config: {
            systemMessage: string;
            modelName?: string;
            articleEmbeddings: Article[];
        }) {
            this.openai = new OpenAI();
            this.systemMessage = config.systemMessage;
            this.modelName = config.modelName || "gpt-3.5-turbo-1106";
            this.articleEmbeddings = config.articleEmbeddings;
            this.predict = weave.op(this, this.predict);
        }

        private async getMostRelevantDocument(query: string): Promise<string> {
            const queryEmbedding = await this.openai.embeddings.create({
                input: query,
                model: "text-embedding-3-small"
            });

            const similarities = this.articleEmbeddings.map(doc => {
                if (!doc.embedding) return 0;
                return cosineSimilarity(queryEmbedding.data[0].embedding, doc.embedding);
            });

            const mostRelevantIndex = similarities.indexOf(Math.max(...similarities));
            return this.articleEmbeddings[mostRelevantIndex].text;
        }

        async predict(question: string): Promise<{
            answer: string;
            context: string;
        }> {
            const context = await this.getMostRelevantDocument(question);
            
            const response = await this.openai.chat.completions.create({
                model: this.modelName,
                messages: [
                    { role: "system", content: this.systemMessage },
                    { 
                        role: "user", 
                        content: `Use the following information to answer the subsequent question. If the answer cannot be found, write "I don't know."
                        Context:
                        """
                        ${context}
                        """
                        Question: ${question}`
                    }
                ],
                temperature: 0
            });

            return {
                answer: response.choices[0].message.content || "",
                context
            };
        }
    }

    interface ScorerResult {
        verdict: boolean;
    }

    interface QuestionRow {
        question: string;
    }

    function createQuestionDataset(): weave.Dataset<QuestionRow> {
        return new weave.Dataset<QuestionRow>({
            id: 'rag-questions',
            rows: [
                { question: "What significant result was reported about Zealand Pharma's obesity trial?" },
                { question: "How much did Berkshire Hathaway's cash levels increase in the fourth quarter?" },
                { question: "What is the goal of Highmark Health's integration of Google Cloud and Epic Systems technology?" }
            ]
        });
    }

    const contextPrecisionScore = weave.op(async function(args: {
        datasetRow: QuestionRow;
        modelOutput: { answer: string; context: string; }
    }): Promise<ScorerResult> {
        const openai = new OpenAI();
        
        const prompt = `Given question, answer and context verify if the context was useful in arriving at the given answer. Give verdict as "1" if useful and "0" if not with json output.
        Output in only valid JSON format.

        question: ${args.datasetRow.question}
        context: ${args.modelOutput.context}
        answer: ${args.modelOutput.answer}
        verdict: `;

        const response = await openai.chat.completions.create({
            model: "gpt-4-turbo-preview",
            messages: [{ role: "user", content: prompt }],
            response_format: { type: "json_object" }
        });

        const result = JSON.parse(response.choices[0].message.content || "{}");
        return {
            verdict: parseInt(result.verdict) === 1
        };
    });

    async function main() {
        # Définissez les noms de votre équipe et de votre projet
        await weave.init('<team-name>/rag-quickstart');
        
        const articleEmbeddings = await docsToEmbeddings(articles);
        
        const model = new RAGModel({
            systemMessage: "You are an expert in finance and answer questions related to finance, financial services, and financial markets. When responding based on provided information, be sure to cite the source.",
            articleEmbeddings
        });

        const evaluation = new weave.Evaluation({
            dataset: createQuestionDataset(),
            scorers: [contextPrecisionScore]
        });

        const results = await evaluation.evaluate({
            model: weave.op((args: { datasetRow: QuestionRow }) => 
                model.predict(args.datasetRow.question)
            )
        });
        
        console.log('Evaluation results:', results);
    }

    if (require.main === module) {
        main().catch(console.error);
    }
    ```
  </Tab>
</Tabs>

<div id="conclusion">
  ## Conclusion
</div>

Ce tutoriel vous a montré comment intégrer l’observabilité à différentes étapes de vos applications, comme l’étape de récupération dans cet exemple. Vous avez également appris à créer des fonctions de score plus complexes, comme un juge LLM, pour évaluer automatiquement les réponses de l’application.

<div id="next-steps">
  ## Prochaines étapes
</div>

Voir le [cours RAG++](https://www.wandb.courses/courses/rag-in-production?utm_source=wandb_docs\&utm_medium=code\&utm_campaign=weave_docs) pour approfondir les techniques pratiques de RAG destinées aux ingénieurs. Vous y apprendrez des solutions prêtes pour la Production proposées par W\&B, Cohere et Weaviate afin d'optimiser les performances, de réduire les coûts et d'améliorer la précision et la pertinence de vos applications.
