API surface

The whole surface in one table

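The Ollama HTTP API is a small REST surface: fewer than 10 endpoints you'll actually touch. Streaming is ON by default. Every duration in a response is in nanoseconds. There's no authentication. The security boundary is that the daemon binds to localhost by default.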

Method	Endpoint	Purpose
POST	`/api/chat`	Conversational (messages array). Use this for almost everything.
POST	`/api/generate`	Raw completion (single prompt). FIM / autocomplete style.
GET	`/api/tags`	List installed models.
POST	`/api/show`	Model details.
POST	`/api/pull`	Download a model (progress stream).
DELETE	`/api/delete`	Remove a model.
POST	`/api/embed`	Generate embeddings.
GET	`/api/ps`	List currently loaded models.
GET	`/api/version`	Daemon version.
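To make the table concrete, here's a minimal sketch that turns it into a lookup plus a request builder, using only the Python standard library. The names `ENDPOINTS` and `build_request` are made up for this illustration, and the base URL is the localhost address from this lesson's curl examples. Nothing gets sent here; it only builds the request object.

```python
import json
import urllib.request

# Default address used in this lesson's curl examples.
BASE_URL = "http://localhost:11434"

# The table above as data: short name -> (HTTP method, path).
ENDPOINTS = {
    "chat": ("POST", "/api/chat"),
    "generate": ("POST", "/api/generate"),
    "tags": ("GET", "/api/tags"),
    "show": ("POST", "/api/show"),
    "pull": ("POST", "/api/pull"),
    "delete": ("DELETE", "/api/delete"),
    "embed": ("POST", "/api/embed"),
    "ps": ("GET", "/api/ps"),
    "version": ("GET", "/api/version"),
}


def build_request(name, payload=None, base_url=BASE_URL):
    """Build (but do not send) a urllib request for one endpoint in the table."""
    method, path = ENDPOINTS[name]
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(base_url + path, data=data, headers=headers, method=method)


req = build_request("show", {"model": "qwen2.5:7b"})
print(req.get_method(), req.full_url)

# To actually send it (needs a running daemon):
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp))
```

Keeping the method next to the path makes it harder to send a GET to `/api/show` or forget that delete is a real `DELETE`.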

Two defaults that bite

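Streaming is ON by default. Send {"model": "...", "messages": [...]} without "stream": false and you get NDJSON back (one JSON object per line), not a single JSON object. This is where people often break their JSON parser.
Durations are in nanoseconds. total_duration: 1234567890 is 1.23 seconds, not 1234 seconds. Always divide by 1e9 before logging.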
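Both defaults can be handled in a few lines. The sketch below parses a streamed response line by line and converts the final duration to seconds. The sample lines are hand-written to mimic the shape of `/api/chat` stream chunks (trimmed to the fields used here), not captured from a real daemon, and `collect_stream` / `ns_to_seconds` are hypothetical helper names.

```python
import json

NANOSECONDS_PER_SECOND = 1_000_000_000


def collect_stream(lines):
    """Join streamed chat chunks into (full_text, final_chunk)."""
    parts = []
    final = None
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # ignore blank lines
        chunk = json.loads(raw)  # each line is its own JSON object
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            final = chunk  # the last chunk carries the timing fields
            break
    return "".join(parts), final


def ns_to_seconds(ns):
    """Convert a nanosecond duration to seconds."""
    return ns / NANOSECONDS_PER_SECOND


# Hand-written sample chunks (assumption: trimmed to the fields used above).
SAMPLE = [
    '{"model":"qwen2.5:7b","message":{"role":"assistant","content":"Hi"},"done":false}',
    '{"model":"qwen2.5:7b","message":{"role":"assistant","content":" there"},"done":false}',
    '{"model":"qwen2.5:7b","message":{"role":"assistant","content":""},"done":true,"total_duration":1234567890}',
]

text, final = collect_stream(SAMPLE)
seconds = ns_to_seconds(final["total_duration"])
print(text)               # Hi there
print(f"{seconds:.2f}s")  # 1.23s
```

Calling `json.loads` on the whole body at once is exactly the mistake described above; parsing per line avoids it.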

OpenAI-compatible endpoint

Ollama also exposes an OpenAI-compatible surface at /v1/chat/completions, as a drop-in for the OpenAI Python / TypeScript SDKs. It's mostly compatible but not identical; see the serving track later on. To use native Ollama features (NDJSON streaming, structured output via format, the embed endpoint), use /api/....
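One visible difference between the two surfaces is where the reply text lives: the native non-streaming `/api/chat` response puts it under `message`, while the OpenAI-style response nests it under `choices[0].message`. Here's a rough sketch of a helper that tolerates both. The sample bodies are hand-written and heavily trimmed, and `extract_reply` is a hypothetical name, not part of any SDK.

```python
def extract_reply(body):
    """Pull the assistant text out of either response shape."""
    if "choices" in body:
        # OpenAI-style body, as returned by /v1/chat/completions
        return body["choices"][0]["message"]["content"]
    # Native /api/chat body with "stream": false
    return body["message"]["content"]


# Hand-written sample bodies (assumption: trimmed, not captured from a real daemon).
NATIVE = {
    "model": "qwen2.5:7b",
    "message": {"role": "assistant", "content": "Hi there!"},
    "done": True,
}
OPENAI_STYLE = {
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hi there!"},
            "finish_reason": "stop",
        }
    ],
}

print(extract_reply(NATIVE))
print(extract_reply(OPENAI_STYLE))
```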

Code

Hit every read-only endpoint · bash

# Daemon health
curl -s http://localhost:11434/api/version

# What's installed?
curl -s http://localhost:11434/api/tags | python3 -m json.tool

# What's loaded right now?
curl -s http://localhost:11434/api/ps | python3 -m json.tool

# Model details
curl -s http://localhost:11434/api/show -d '{"model":"qwen2.5:7b"}' | python3 -m json.tool

# One-shot non-streaming chat
curl -s http://localhost:11434/api/chat -d '{
  "model": "qwen2.5:7b",
  "messages": [{"role":"user","content":"Say hi in 5 words."}],
  "stream": false
}' | python3 -m json.tool
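Once the raw JSON looks right, the next step is usually to pull out just the fields you care about. As a small sketch, the helper below lists model names and sizes from an `/api/tags`-style body. It assumes the response has a top-level `models` array whose entries carry `name` and `size` (bytes); the sample data is made up, and `summarize_tags` is a hypothetical name.

```python
import json


def summarize_tags(body):
    """Return (name, size in GB) pairs from an /api/tags-style response."""
    return [
        (model["name"], round(model["size"] / 1e9, 2))
        for model in body.get("models", [])
    ]


# Made-up sample body (assumption: trimmed to the two fields used above).
SAMPLE_TAGS = json.loads(
    '{"models": ['
    '{"name": "qwen2.5:7b", "size": 4683087332},'
    '{"name": "llama3.2:3b", "size": 2019393189}'
    ']}'
)

summary = summarize_tags(SAMPLE_TAGS)
for name, gigabytes in summary:
    print(f"{name}\t{gigabytes} GB")
```

Against a real daemon, you'd feed it the parsed output of the `/api/tags` call above instead of `SAMPLE_TAGS`.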
