Chat Completion 생성 - Weights & Biases Documentation

Chat Completion 생성

curl --request POST \
  --url https://api.example.com/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "messages": [
    {
      "content": "<string>",
      "role": "<string>",
      "name": "<string>"
    }
  ],
  "add_generation_prompt": true,
  "add_special_tokens": false,
  "allowed_token_ids": [
    123
  ],
  "bad_words": [
    "<string>"
  ],
  "cache_salt": "<string>",
  "chat_template": "<string>",
  "chat_template_kwargs": {},
  "continue_final_message": false,
  "documents": [
    {}
  ],
  "echo": false,
  "frequency_penalty": 0,
  "ignore_eos": false,
  "include_reasoning": true,
  "include_stop_str_in_output": false,
  "kv_transfer_params": {},
  "length_penalty": 1,
  "logit_bias": {},
  "logprobs": false,
  "max_completion_tokens": 123,
  "max_tokens": 123,
  "min_p": 123,
  "min_tokens": 0,
  "mm_processor_kwargs": {},
  "model": "<string>",
  "n": 1,
  "parallel_tool_calls": true,
  "presence_penalty": 0,
  "priority": 0,
  "prompt_logprobs": 123,
  "repetition_detection": {
    "max_pattern_size": 0,
    "min_count": 0,
    "min_pattern_size": 0
  },
  "repetition_penalty": 123,
  "request_id": "<string>",
  "response_format": {
    "json_schema": {
      "name": "<string>",
      "description": "<string>",
      "schema": {},
      "strict": true
    }
  },
  "return_token_ids": true,
  "return_tokens_as_token_ids": true,
  "seed": 0,
  "skip_special_tokens": true,
  "spaces_between_special_tokens": true,
  "stop": [],
  "stop_token_ids": [],
  "stream": false,
  "stream_options": {
    "continuous_usage_stats": false,
    "include_usage": true
  },
  "structured_outputs": {
    "_backend": "<string>",
    "_backend_was_auto": false,
    "choice": [
      "<string>"
    ],
    "disable_additional_properties": false,
    "disable_any_whitespace": false,
    "disable_fallback": false,
    "grammar": "<string>",
    "json": "<string>",
    "json_object": true,
    "regex": "<string>",
    "structural_tag": "<string>",
    "whitespace_pattern": "<string>"
  },
  "temperature": 123,
  "tool_choice": "none",
  "tools": [
    {
      "function": {
        "name": "<string>",
        "description": "<string>",
        "parameters": {}
      },
      "type": "function"
    }
  ],
  "top_k": 123,
  "top_logprobs": 0,
  "top_p": 123,
  "truncate_prompt_tokens": 4611686018427388000,
  "use_beam_search": false,
  "user": "<string>",
  "vllm_xargs": {}
}
'

{
  "choices": [
    {
      "index": 123,
      "message": {
        "role": "<string>",
        "annotations": {
          "type": "<string>",
          "url_citation": {
            "end_index": 123,
            "start_index": 123,
            "title": "<string>",
            "url": "<string>"
          }
        },
        "audio": {
          "data": "<string>",
          "expires_at": 123,
          "id": "<string>",
          "transcript": "<string>"
        },
        "content": "<string>",
        "function_call": {
          "arguments": "<string>",
          "name": "<string>"
        },
        "reasoning": "<string>",
        "refusal": "<string>",
        "tool_calls": [
          {
            "function": {
              "arguments": "<string>",
              "name": "<string>"
            },
            "id": "<string>",
            "type": "function"
          }
        ]
      },
      "finish_reason": "stop",
      "logprobs": {
        "content": [
          {
            "token": "<string>",
            "bytes": [
              123
            ],
            "logprob": -9999,
            "top_logprobs": [
              {
                "token": "<string>",
                "bytes": [
                  123
                ],
                "logprob": -9999
              }
            ]
          }
        ]
      },
      "stop_reason": 123,
      "token_ids": [
        123
      ]
    }
  ],
  "model": "<string>",
  "usage": {
    "completion_tokens": 0,
    "prompt_tokens": 0,
    "prompt_tokens_details": {
      "cached_tokens": 123
    },
    "total_tokens": 0
  },
  "created": 123,
  "id": "<string>",
  "kv_transfer_params": {},
  "object": "chat.completion",
  "prompt_logprobs": [
    {}
  ],
  "prompt_token_ids": [
    123
  ],
  "system_fingerprint": "<string>"
}

POST

chat

completions

Chat Completion 생성

curl --request POST \
  --url https://api.example.com/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "messages": [
    {
      "content": "<string>",
      "role": "<string>",
      "name": "<string>"
    }
  ],
  "add_generation_prompt": true,
  "add_special_tokens": false,
  "allowed_token_ids": [
    123
  ],
  "bad_words": [
    "<string>"
  ],
  "cache_salt": "<string>",
  "chat_template": "<string>",
  "chat_template_kwargs": {},
  "continue_final_message": false,
  "documents": [
    {}
  ],
  "echo": false,
  "frequency_penalty": 0,
  "ignore_eos": false,
  "include_reasoning": true,
  "include_stop_str_in_output": false,
  "kv_transfer_params": {},
  "length_penalty": 1,
  "logit_bias": {},
  "logprobs": false,
  "max_completion_tokens": 123,
  "max_tokens": 123,
  "min_p": 123,
  "min_tokens": 0,
  "mm_processor_kwargs": {},
  "model": "<string>",
  "n": 1,
  "parallel_tool_calls": true,
  "presence_penalty": 0,
  "priority": 0,
  "prompt_logprobs": 123,
  "repetition_detection": {
    "max_pattern_size": 0,
    "min_count": 0,
    "min_pattern_size": 0
  },
  "repetition_penalty": 123,
  "request_id": "<string>",
  "response_format": {
    "json_schema": {
      "name": "<string>",
      "description": "<string>",
      "schema": {},
      "strict": true
    }
  },
  "return_token_ids": true,
  "return_tokens_as_token_ids": true,
  "seed": 0,
  "skip_special_tokens": true,
  "spaces_between_special_tokens": true,
  "stop": [],
  "stop_token_ids": [],
  "stream": false,
  "stream_options": {
    "continuous_usage_stats": false,
    "include_usage": true
  },
  "structured_outputs": {
    "_backend": "<string>",
    "_backend_was_auto": false,
    "choice": [
      "<string>"
    ],
    "disable_additional_properties": false,
    "disable_any_whitespace": false,
    "disable_fallback": false,
    "grammar": "<string>",
    "json": "<string>",
    "json_object": true,
    "regex": "<string>",
    "structural_tag": "<string>",
    "whitespace_pattern": "<string>"
  },
  "temperature": 123,
  "tool_choice": "none",
  "tools": [
    {
      "function": {
        "name": "<string>",
        "description": "<string>",
        "parameters": {}
      },
      "type": "function"
    }
  ],
  "top_k": 123,
  "top_logprobs": 0,
  "top_p": 123,
  "truncate_prompt_tokens": 4611686018427388000,
  "use_beam_search": false,
  "user": "<string>",
  "vllm_xargs": {}
}
'

{
  "choices": [
    {
      "index": 123,
      "message": {
        "role": "<string>",
        "annotations": {
          "type": "<string>",
          "url_citation": {
            "end_index": 123,
            "start_index": 123,
            "title": "<string>",
            "url": "<string>"
          }
        },
        "audio": {
          "data": "<string>",
          "expires_at": 123,
          "id": "<string>",
          "transcript": "<string>"
        },
        "content": "<string>",
        "function_call": {
          "arguments": "<string>",
          "name": "<string>"
        },
        "reasoning": "<string>",
        "refusal": "<string>",
        "tool_calls": [
          {
            "function": {
              "arguments": "<string>",
              "name": "<string>"
            },
            "id": "<string>",
            "type": "function"
          }
        ]
      },
      "finish_reason": "stop",
      "logprobs": {
        "content": [
          {
            "token": "<string>",
            "bytes": [
              123
            ],
            "logprob": -9999,
            "top_logprobs": [
              {
                "token": "<string>",
                "bytes": [
                  123
                ],
                "logprob": -9999
              }
            ]
          }
        ]
      },
      "stop_reason": 123,
      "token_ids": [
        123
      ]
    }
  ],
  "model": "<string>",
  "usage": {
    "completion_tokens": 0,
    "prompt_tokens": 0,
    "prompt_tokens_details": {
      "cached_tokens": 123
    },
    "total_tokens": 0
  },
  "created": 123,
  "id": "<string>",
  "kv_transfer_params": {},
  "object": "chat.completion",
  "prompt_logprobs": [
    {}
  ],
  "prompt_token_ids": [
    123
  ],
  "system_fingerprint": "<string>"
}

인증

Authorization

string

header

필수

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

본문

application/json

messages

필수

사용자가 보낸 메시지와 관계없이 모델이 따라야 하는 개발자 제공 지침입니다. o1 모델 및 이후 모델에서는 developer 메시지가 기존 system 메시지를 대체합니다.

Show child attributes

add_generation_prompt

boolean

기본값:true

true이면 생성 프롬프트가 채팅 템플릿에 추가됩니다. 이는 모델의 tokenizer 설정에 있는 채팅 템플릿에서 사용하는 매개변수입니다.

add_special_tokens

boolean

기본값:false

true이면 채팅 템플릿으로 추가되는 내용 외에도 특수 토큰(예: BOS)이 프롬프트에 추가됩니다. 대부분의 모델에서는 채팅 템플릿이 특수 토큰 추가를 처리하므로 false로 설정해야 합니다(기본값도 false).

allowed_token_ids

integer[] | null

bad_words

string[]

cache_salt

string | null

지정하면 다중 사용자 환경에서 공격자가 프롬프트를 추측하지 못하도록 제공된 문자열을 사용해 prefix cache에 솔트를 추가합니다. 솔트는 무작위여야 하고, 제3자가 접근할 수 없도록 보호되어야 하며, 예측할 수 없을 만큼 충분히 길어야 합니다(예: 256비트에 해당하는 base64 인코딩 43자).

chat_template

string | null

이 변환에 사용할 Jinja 템플릿입니다. transformers v4.44부터는 기본 채팅 템플릿이 더 이상 허용되지 않으므로 tokenizer에 채팅 템플릿이 정의되어 있지 않다면 반드시 제공해야 합니다.

chat_template_kwargs

Chat Template Kwargs · object

템플릿 렌더러에 전달할 추가 키워드 인수입니다. 채팅 템플릿에서 접근할 수 있습니다.

continue_final_message

boolean

기본값:false

이 값을 설정하면 채팅이 마지막 메시지가 EOS 토큰 없이 열린 형태가 되도록 포맷됩니다. 모델은 새 메시지를 시작하는 대신 이 메시지를 이어서 생성합니다. 이를 통해 모델 응답의 일부를 "미리 채워 넣을" 수 있습니다. add_generation_prompt와 동시에 사용할 수 없습니다.

documents

Documents · object[] | null

모델이 RAG(검색 증강 생성)를 수행할 때 접근할 수 있는 문서를 나타내는 dict 목록입니다. 템플릿이 RAG를 지원하지 않으면 이 argument는 아무런 효과가 없습니다. 각 문서는 "title" 및 "text" 키를 포함하는 dict로 구성하는 것을 권장합니다.

Show child attributes

echo

boolean

기본값:false

true이면 새 메시지가 마지막 메시지와 동일한 역할에 속할 경우, 마지막 메시지 앞에 추가됩니다.

frequency_penalty

number | null

기본값:0

ignore_eos

boolean

기본값:false

include_reasoning

boolean

기본값:true

include_stop_str_in_output

boolean

기본값:false

kv_transfer_params

Kv Transfer Params · object

분리형 서빙에 사용되는 KVTransfer 매개변수입니다.

length_penalty

number

기본값:1

logit_bias

Logit Bias · object

Show child attributes

logprobs

boolean | null

기본값:false

max_completion_tokens

integer | null

max_tokens

integer | null

지원 중단

min_p

number | null

min_tokens

integer

기본값:0

mm_processor_kwargs

Mm Processor Kwargs · object

HF 프로세서에 전달할 추가 kwargs입니다.

model

string | null

integer | null

기본값:1

parallel_tool_calls

boolean | null

기본값:true

presence_penalty

number | null

기본값:0

priority

integer

기본값:0

요청의 우선순위입니다(값이 낮을수록 더 먼저 처리되며, 기본값은 0). 서빙 중인 모델이 우선순위 스케줄링을 사용하지 않는 경우, 0이 아닌 우선순위를 지정하면 오류가 발생합니다.

prompt_logprobs

integer | null

reasoning_effort

enum<string> | null

사용 가능한 옵션:

low,

medium,

high

repetition_detection

RepetitionDetectionParams · object

출력 토큰에서 반복되는 N-gram 패턴을 감지하기 위한 매개변수입니다. 이러한 반복이 감지되면 생성이 조기에 종료됩니다. LLM은 때때로 반복적이고 유용하지 않은 토큰 패턴을 생성하며, 최대 출력 길이에 도달할 때까지 멈추지 않을 수 있습니다(예: 'abcdabcdabcd...' 또는 '\emoji \emoji \emoji ...'). 이 기능은 이러한 동작을 감지해 조기에 종료함으로써 시간과 토큰을 절약합니다.

Show child attributes

repetition_penalty

number | null

request_id

string

이 요청과 관련된 request_id입니다. 호출자가 이를 설정하지 않으면 random_uuid가 생성됩니다. 이 ID는 Inference 과정 전반에서 사용되며 Response에 반환됩니다.

response_format

ResponseFormat · object

ResponseFormat
StructuralTagResponseFormat
LegacyStructuralTagResponseFormat

Show child attributes

return_token_ids

boolean | null

지정하면 생성된 텍스트와 함께 token ID도 결과에 포함됩니다. 스트리밍 모드에서는 prompt_token_ids가 첫 번째 청크에만 포함되고, token_ids에는 각 청크의 delta token이 포함됩니다. 이는 디버깅하거나 생성된 텍스트를 입력 token에 다시 매핑해야 할 때 유용합니다.

return_tokens_as_token_ids

boolean | null

'logprobs'와 함께 지정하면 JSON으로 인코딩할 수 없는 token을 식별할 수 있도록 token이 'token_id:{token_id}' 형식의 문자열로 표현됩니다.

seed

integer | null

필수 범위: -9223372036854776000 <= x <= 9223372036854776000

skip_special_tokens

boolean

기본값:true

spaces_between_special_tokens

boolean

기본값:true

stop

기본값:[]

stop_token_ids

integer[] | null

stream

boolean | null

기본값:false

stream_options

StreamOptions · object

Show child attributes

structured_outputs

StructuredOutputsParams · object

구조화된 출력용 추가 kwargs입니다.

Show child attributes

temperature

number | null

tool_choice

기본값:none

Allowed value: "none"

tools

ChatCompletionToolsParam · object[] | null

Show child attributes

top_k

integer | null

top_logprobs

integer | null

기본값:0

top_p

number | null

truncate_prompt_tokens

integer | null

필수 범위: -1 <= x <= 9223372036854776000

use_beam_search

boolean

기본값:false

user

string | null

vllm_xargs

Vllm Xargs · object

맞춤형 확장에서 사용하는 추가 요청 매개변수로, 문자열 값 또는 숫자 값(또는 그 목록)을 받습니다.

Show child attributes

응답

성공 응답

choices

ChatCompletionResponseChoice · object[]

필수

Show child attributes

model

string

필수

usage

UsageInfo · object

필수

Show child attributes

created

integer

string

kv_transfer_params

Kv Transfer Params · object

KVTransfer 매개변수입니다.

object

string

기본값:chat.completion

Allowed value: "chat.completion"

prompt_logprobs

(object | null)[] | null

Show child attributes

prompt_token_ids

integer[] | null

service_tier

enum<string> | null

사용 가능한 옵션:

auto,

default,

flex,

scale,

priority

system_fingerprint

string | null

API 개요

Chat Completion 생성

⌘I