POST /api/chat: the conversational endpoint

Shape

/api/chat takes a messages array (system / user / assistant / tool roles). Depending on stream, it returns either streaming NDJSON or a single JSON object. Use this endpoint for everything except raw completion and FIM (fill-in-the-middle).
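
A minimal streaming sketch, assuming the default stream: true. In that mode the body is NDJSON: one JSON object per line, each holding a fragment in message.content. The last object has done: true and carries the timing fields. collect_stream and stream_chat are hypothetical helper names, and the URL assumes Ollama's default localhost:11434.

```python
import json
from typing import Iterable


def collect_stream(lines: Iterable[str]) -> tuple[str, dict]:
    """Join the message.content fragments of an NDJSON chat stream.

    Returns (full_text, final_chunk); the final chunk (done: true) is the
    one that carries total_duration, eval_count and the other metrics.
    """
    parts: list[str] = []
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines between objects
        chunk = json.loads(line)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            return "".join(parts), chunk
    raise RuntimeError("stream ended before a chunk with done: true")


def stream_chat(model: str, messages: list[dict]) -> tuple[str, dict]:
    # Imported here so collect_stream works without the third-party dependency.
    import httpx

    with httpx.stream(
        "POST",
        "http://localhost:11434/api/chat",
        json={"model": model, "messages": messages},  # "stream" omitted: defaults to true
        timeout=120.0,
    ) as resp:
        resp.raise_for_status()
        return collect_stream(resp.iter_lines())
```

Keeping the line parsing separate from the HTTP call means it can be tested without a running server. To show tokens as they arrive, print each fragment inside the loop instead of only joining them at the end.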

Request fields

model (required): the model name exactly as it appears in ollama list.
messages: array of {role, content, images?, tool_calls?}. Roles: system, user, assistant, tool.
stream: boolean (default true).
format: "json" or a JSON Schema object (for structured output).
options: inference parameters (temperature, top_p, num_ctx, num_predict, etc.).
tools: array of tool definitions in OpenAI format (for function calling).
keep_alive: how long the model stays in memory after this request (default "5m"; a negative value such as -1 keeps it loaded indefinitely).
think: boolean; enables thinking output on reasoning models.
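
The fields above can be assembled as shown in the sketch below. It builds a request body that only includes the optional fields you actually set. build_chat_request and image_message are hypothetical helpers, and the schema, model name, and option values are illustrative. Images are attached to a message as base64 strings.

```python
from __future__ import annotations

import base64
from typing import Any


def image_message(text: str, image_bytes: bytes) -> dict[str, Any]:
    """User turn with one attached image; images travel as base64 strings."""
    return {
        "role": "user",
        "content": text,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
    }


def build_chat_request(
    model: str,
    messages: list[dict[str, Any]],
    *,
    stream: bool = False,
    fmt: str | dict[str, Any] | None = None,  # "json" or a JSON Schema object
    options: dict[str, Any] | None = None,    # temperature, num_ctx, ...
    tools: list[dict[str, Any]] | None = None,
    keep_alive: str | int | None = None,       # e.g. "10m", or -1 to keep loaded
    think: bool | None = None,
) -> dict[str, Any]:
    """Assemble a /api/chat body, leaving out optional fields that are None."""
    body: dict[str, Any] = {"model": model, "messages": messages, "stream": stream}
    optional = {
        "format": fmt,
        "options": options,
        "tools": tools,
        "keep_alive": keep_alive,
        "think": think,
    }
    body.update({k: v for k, v in optional.items() if v is not None})
    return body


# Example: structured output constrained by a (hypothetical) schema.
schema = {
    "type": "object",
    "properties": {"term": {"type": "string"}, "definition": {"type": "string"}},
    "required": ["term", "definition"],
}
body = build_chat_request(
    "qwen2.5:7b",
    [{"role": "user", "content": "Define quantization."}],
    fmt=schema,
    options={"temperature": 0, "num_ctx": 8192},
    keep_alive="10m",
)
```

With format set to a schema, message.content in the response should be a JSON string that json.loads can parse into an object of that shape.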

Response fields (non-streaming)

message: {role, content, tool_calls?}, plus thinking when think is enabled.
done: boolean (always true when stream is off).
Timing and counts: total_duration, load_duration, prompt_eval_duration, and eval_duration are in nanoseconds; prompt_eval_count and eval_count are token counts.
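
Durations are in nanoseconds and counts are in tokens, so throughput is count / (duration / 1e9). The sketch below turns one final response object into seconds and tokens per second. summarize_timing is a hypothetical name. The sketch defensively treats a missing or zero field as 0, so it does not divide by zero.

```python
NS_PER_S = 1_000_000_000  # Ollama reports durations in nanoseconds


def _seconds(resp: dict, key: str) -> float:
    return resp.get(key, 0) / NS_PER_S


def _rate(resp: dict, count_key: str, dur_key: str) -> float:
    """Tokens per second; 0.0 when the duration is missing or zero."""
    dur_s = _seconds(resp, dur_key)
    return resp.get(count_key, 0) / dur_s if dur_s else 0.0


def summarize_timing(resp: dict) -> dict[str, float]:
    return {
        "total_s": _seconds(resp, "total_duration"),
        "load_s": _seconds(resp, "load_duration"),
        "prompt_tok_per_s": _rate(resp, "prompt_eval_count", "prompt_eval_duration"),
        "gen_tok_per_s": _rate(resp, "eval_count", "eval_duration"),
    }
```

Prompt throughput and generation throughput are reported separately because they measure different phases: reading the input versus producing output tokens.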

Why prefer /api/chat over /api/generate?

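The messages shape forces you to be explicit about role boundaries: system instruction vs. user turn vs. assistant output vs. tool result. That structure is what later lets tool use, multi-turn context, and the adapter pattern work consistently. /api/generate looks simpler, but it pushes you toward a single concatenated-string prompt, which is harder to compose.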
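
Here is a sketch of how the role structure composes during tool use. When the model replies with tool_calls, the client appends that assistant turn plus one tool-role message per call, then sends the whole list back. get_time, TOOLS, and apply_tool_calls are hypothetical names. The sketch assumes, as in Ollama's documentation examples, that each call's arguments arrive as a JSON object; it also accepts an OpenAI-style JSON string.

```python
import json
from typing import Any, Callable


def get_time(city: str) -> str:
    """Hypothetical local tool; a stub standing in for real work."""
    return f"12:00 in {city}"


# Hypothetical registry mapping tool names the model may call to Python functions.
TOOLS: dict[str, Callable[..., str]] = {"get_time": get_time}


def apply_tool_calls(
    history: list[dict[str, Any]], assistant_msg: dict[str, Any]
) -> list[dict[str, Any]]:
    """Return a new message list: the assistant turn, then one tool message per call."""
    updated = history + [assistant_msg]  # keep the turn that requested the tools
    for call in assistant_msg.get("tool_calls") or []:
        fn = call["function"]
        args = fn.get("arguments") or {}
        if isinstance(args, str):  # tolerate OpenAI-style JSON-string arguments
            args = json.loads(args)
        result = TOOLS[fn["name"]](**args)
        updated.append({"role": "tool", "content": result})
    return updated


history = [{"role": "user", "content": "What time is it in Seoul?"}]
reply = {  # shaped like an assistant message that requested one tool call
    "role": "assistant",
    "content": "",
    "tool_calls": [{"function": {"name": "get_time", "arguments": {"city": "Seoul"}}}],
}
next_messages = apply_tool_calls(history, reply)
# next_messages is sent back to /api/chat as "messages" for the final answer.
```

With /api/generate, the same round trip would mean serializing the user turn, the tool request, and the tool result into one prompt string by hand.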

Code

Python: non-streaming chat

import httpx

def chat(model: str, messages: list[dict]) -> dict:
    # stream=False: Ollama returns one JSON object instead of NDJSON chunks.
    resp = httpx.post(
        "http://localhost:11434/api/chat",
        json={"model": model, "messages": messages, "stream": False},
        timeout=120.0,  # generous: the first call may include model load time
    )
    resp.raise_for_status()
    return resp.json()

result = chat("qwen2.5:7b", [
    {"role": "system", "content": "You are concise."},
    {"role": "user", "content": "Define quantization in two sentences."},
])
print(result["message"]["content"])
# Durations are in nanoseconds; eval_count is a token count.
print(f"Took {result['total_duration'] / 1e9:.2f}s")
print(f"Generated {result['eval_count']} tokens at "
      f"{result['eval_count'] / (result['eval_duration'] / 1e9):.1f} tok/s")

TypeScript: non-streaming chat

type Msg = { role: "system" | "user" | "assistant" | "tool"; content: string };

async function chat(model: string, messages: Msg[]) {
  const res = await fetch("http://localhost:11434/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, messages, stream: false }),
  });
  if (!res.ok) throw new Error(`Ollama ${res.status}: ${await res.text()}`);
  return res.json();
}

// Top-level await requires an ES module context.
const out = await chat("qwen2.5:7b", [
  { role: "system", content: "You are concise." },
  { role: "user", content: "Define quantization in two sentences." },
]);
console.log(out.message.content);
