Chat Completions API

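The Chat Completions API is the most basic interface for talking to OpenAI's language models. Every call follows a simple request-response pattern: you send a list of messages, each tagged with a role, and a completion comes back.

At the heart of every request is the messages array. Each message has a role and content. The available roles are:

system / developer: high-level instructions that shape model behavior. From GPT-5.x onward, developer is the recommended one.
user: the message sent by the actual end user.
assistant: previous model responses (for multi-turn context).
tool: the result of a tool/function call, tied to its call by tool_call_id.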
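
To make the role list concrete, here is a minimal sketch of a multi-turn messages array that uses all of the roles, with a small sanity check. The validate_messages helper, the call id "call_123", and the get_weather function name are made up for illustration; they are not part of the API or the SDK.

```python
import json

VALID_ROLES = {"system", "developer", "user", "assistant", "tool"}


def validate_messages(messages):
    """Hypothetical helper: reject unknown roles and tool messages without a tool_call_id."""
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in VALID_ROLES:
            raise ValueError(f"message {i}: unknown role {role!r}")
        if role == "tool" and "tool_call_id" not in msg:
            raise ValueError(f"message {i}: tool message needs tool_call_id")
    return messages


messages = validate_messages([
    {"role": "developer", "content": "You are a concise assistant."},
    {"role": "user", "content": "What's the weather in Seoul?"},
    # Earlier assistant turn that asked for a tool call (id and name are invented).
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_123",
            "type": "function",
            "function": {
                "name": "get_weather",
                "arguments": json.dumps({"city": "Seoul"}),
            },
        }],
    },
    # Tool result, linked back to the call above through tool_call_id.
    {"role": "tool", "tool_call_id": "call_123", "content": json.dumps({"temp_c": 21})},
])
```

The point to notice is the link between the assistant's tool_calls[0].id and the tool message's tool_call_id: that is how the model knows which call a result belongs to.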

Request / Response lifecycle

The flow is simple: send a POST, get a 200 back, pull the text out of choices, and check the token counts in usage. In streaming mode, set stream: true and the chunks arrive over SSE (Server-Sent Events).
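
As a rough sketch of what consuming that stream involves, the function below stitches the text back together from raw SSE lines. It assumes each event is a single "data:" line holding JSON with choices[].delta.content and that the stream ends with "data: [DONE]"; the iter_sse_text name and the sample lines are invented for the example, and in practice the official SDK does this parsing for you.

```python
import json


def iter_sse_text(lines):
    """Yield content fragments from an iterable of SSE lines (illustrative helper)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hand-written sample stream, not captured from a real call.
sample = [
    'data: {"id":"x","choices":[{"delta":{"content":"Hel"}}]}',
    "",
    'data: {"id":"x","choices":[{"delta":{"content":"lo"}}]}',
    "",
    "data: [DONE]",
]

print("".join(iter_sse_text(sample)))  # prints Hello
```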

Key response fields

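The response contains a choices array (usually of length 1). Each element has a message object and a finish_reason, which is one of stop (natural end), length (truncated at the token cap), tool_calls (the model is calling a tool), or content_filter (moderation). usage reports the exact token counts.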
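
Here is a small sketch of pulling those fields out of a parsed JSON response. The response dict is hand-written to match the shape described above (the id, text, and token numbers are invented), and summarize is a hypothetical helper, not an SDK function.

```python
# Hand-written example payload; values are illustrative only.
response = {
    "id": "chatcmpl-example",
    "object": "chat.completion",
    "created": 0,
    "model": "gpt-5.4",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "About 299,792 km/s."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 20, "completion_tokens": 8, "total_tokens": 28},
}


def summarize(resp):
    """Collect the text, finish_reason, and total token count of the first choice."""
    choice = resp["choices"][0]
    return {
        "text": choice["message"].get("content"),
        "finish_reason": choice["finish_reason"],
        "total_tokens": resp["usage"]["total_tokens"],
    }


summary = summarize(response)
print(summary["finish_reason"], summary["total_tokens"])  # prints stop 28
```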

Why you shouldn't ignore finish_reason

If you don't read finish_reason, length-truncated responses and tool-call responses silently pass for normal completions. That kind of bug tends to show up in production rather than in dev. Check it on every call.
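
One way to enforce that check is a small guard that every response goes through before the text is used. This is a sketch under assumptions: it works on a dict-shaped choice, and the IncompleteCompletion exception and unwrap_choice function are names invented here, not part of any library.

```python
class IncompleteCompletion(Exception):
    """Raised when a choice did not end with a usable result (hypothetical type)."""


def unwrap_choice(choice):
    """Return ("text", str) or ("tool_calls", list); raise for anything else."""
    reason = choice["finish_reason"]
    message = choice["message"]
    if reason == "stop":
        return ("text", message["content"])
    if reason == "tool_calls":
        # The caller is expected to run the tools and send the results back.
        return ("tool_calls", message["tool_calls"])
    if reason == "length":
        raise IncompleteCompletion("truncated at the token cap")
    if reason == "content_filter":
        raise IncompleteCompletion("content withheld by moderation")
    raise IncompleteCompletion(f"unexpected finish_reason: {reason!r}")


kind, value = unwrap_choice(
    {"finish_reason": "stop", "message": {"role": "assistant", "content": "Done."}}
)
print(kind, value)  # prints text Done.
```

Raising on length and content_filter is a design choice: it forces the caller to decide what to do (retry with a higher cap, shorten the prompt, show an error) instead of passing a partial answer along as if it were complete.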

Code

Request shape (HTTP)·text

Client → POST /v1/chat/completions (messages[], model, params)
       ← HTTP 200 { id, object, created, model, choices[], usage }

Streaming SSE shape·text

Client → POST /v1/chat/completions (stream: true)
       ← HTTP 200 text/event-stream
         data: {"id":"...","choices":[{"delta":{"content":"..."}}]}
         data: [DONE]

Minimal Python example·python

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from env

completion = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "developer", "content": "You are a concise assistant."},
        {"role": "user", "content": "What is the speed of light?"},
    ],
    temperature=0.7,
    max_completion_tokens=200,
)

print(completion.choices[0].message.content)
print(f"Tokens used: {completion.usage.total_tokens}")
print(f"Finish reason: {completion.choices[0].finish_reason}")
