NDJSON, not SSE

The single most important difference in streaming

The cloud APIs (OpenAI, Anthropic, Google) stream with Server-Sent Events (SSE): each chunk is data: {json}\n\n. Ollama streams newline-delimited JSON (NDJSON): each chunk is just {json}\n. There is no data: prefix and no double newline.
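To make the two framings concrete, here is a minimal sketch that encodes the same payload both ways. The helper names frame_sse and frame_ndjson are hypothetical, chosen for illustration only; they do not come from any library.

```python
import json

def frame_sse(payload: dict) -> str:
    """Frame one chunk the SSE way: 'data: ' prefix, blank-line terminator."""
    return f"data: {json.dumps(payload)}\n\n"

def frame_ndjson(payload: dict) -> str:
    """Frame one chunk the NDJSON way: bare JSON plus a single newline."""
    return json.dumps(payload) + "\n"

# Hand-written sample chunk, trimmed to the fields that matter here.
chunk = {"message": {"role": "assistant", "content": "The"}, "done": False}
sse_frame = frame_sse(chunk)
ndjson_frame = frame_ndjson(chunk)

print(repr(sse_frame))     # starts with 'data: ' and ends with two newlines
print(repr(ndjson_frame))  # plain JSON followed by one newline
```

The payload is identical in both cases; only the envelope differs, which is why a parser built for one framing cannot read the other.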

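Point an SSE parser at Ollama and it treats the response as one giant blob, waiting for a data: line that never arrives. Point a plain JSON parser at Ollama and it tries to read the entire streaming response as a single document, then breaks when it hits the second {.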
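The JSON-parser failure is easy to reproduce offline. The sketch below uses a hand-written two-line body in place of a real response: parsing the whole body at once fails, while splitting on newlines first succeeds.

```python
import json

# Two NDJSON lines back to back (hand-written sample, not captured output).
body = (
    '{"message":{"content":"The"},"done":false}\n'
    '{"message":{"content":" sky"},"done":false}\n'
)

try:
    json.loads(body)  # treats the whole body as ONE JSON document
    whole_body_error = None
except json.JSONDecodeError as exc:
    whole_body_error = exc  # the parser rejects the second object

# Splitting on newlines first yields one valid document per line.
chunks = [json.loads(line) for line in body.splitlines() if line]
text = "".join(c["message"]["content"] for c in chunks)

print(type(whole_body_error).__name__)
print(repr(text))
```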

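The shape of one NDJSON line

Each line is a complete JSON object that parses on its own. The interesting fields:

message.content: the text delta (one token or a few).
done: boolean. false for an incremental chunk, true for the last one.
Final chunk only: timing and count fields (total_duration, eval_count, eval_duration, and so on) for performance metrics.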
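As a sketch of how these fields might be consumed: accumulate message.content until done is true, then derive throughput from the final chunk. This assumes the duration fields are in nanoseconds, as Ollama's API documentation describes; summarize is a hypothetical helper name, and the sample lines are trimmed versions of the chunks shown in the Code section.

```python
import json

def summarize(lines):
    """Join the text deltas and compute tokens/second from the final chunk."""
    parts, tokens_per_second = [], None
    for line in lines:
        chunk = json.loads(line)
        parts.append(chunk["message"]["content"])
        if chunk["done"]:
            # Assumption: eval_duration is reported in nanoseconds.
            seconds = chunk["eval_duration"] / 1e9
            tokens_per_second = chunk["eval_count"] / seconds
            break
    return "".join(parts), tokens_per_second

sample = [
    '{"message":{"role":"assistant","content":"The"},"done":false}',
    '{"message":{"role":"assistant","content":" sky"},"done":false}',
    '{"message":{"role":"assistant","content":""},"done":true,'
    '"eval_count":42,"eval_duration":900000000}',
]
reply, tps = summarize(sample)
print(reply, round(tps, 1))
```

With the sample values, 42 tokens over 0.9 seconds works out to roughly 46.7 tokens per second.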

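Why Ollama chose NDJSON

NDJSON is simple to produce on the server and simple to parse on the client (split on newlines, JSON-parse each line), and it needs none of SSE's keep-alive comment ritual. The cost is that you have to know it is NDJSON before you reach for an SSE library.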
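The "split on newlines, parse each line" client can be sketched as a small incremental decoder. Network reads may end in the middle of a line, so this version buffers bytes until a newline arrives. iter_ndjson is a hypothetical name, and the byte chunks are simulated rather than read from a socket.

```python
import json
from typing import Iterable, Iterator

def iter_ndjson(byte_chunks: Iterable[bytes]) -> Iterator[dict]:
    """Yield one parsed object per complete line, buffering partial lines."""
    buffer = b""
    for data in byte_chunks:
        buffer += data
        # Everything before the last newline is complete; the rest stays buffered.
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            if line.strip():
                yield json.loads(line)
    if buffer.strip():
        # Parse a final line that arrived without a trailing newline.
        yield json.loads(buffer)

# Simulated network reads that cut lines in half.
reads = [b'{"done":fal', b'se}\n{"done"', b':true}\n']
objects = list(iter_ndjson(reads))
print(objects)
```

Libraries that expose a line iterator, such as the iter_lines() call in the httpx example in the Code section, handle this buffering for you; the sketch only shows how little machinery the format needs.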

Code

Anatomy of a streaming response·typescript

// What arrives on the wire from /api/chat with stream: true
{"model":"qwen2.5:7b","created_at":"2026-05-03T10:30:00Z","message":{"role":"assistant","content":"The"},"done":false}
{"model":"qwen2.5:7b","created_at":"2026-05-03T10:30:00Z","message":{"role":"assistant","content":" sky"},"done":false}
{"model":"qwen2.5:7b","created_at":"2026-05-03T10:30:00Z","message":{"role":"assistant","content":" is"},"done":false}
// ...many more delta chunks...
// Final chunk:
{"model":"qwen2.5:7b","created_at":"2026-05-03T10:30:01Z","message":{"role":"assistant","content":""},"done":true,"total_duration":1234567890,"eval_count":42,"eval_duration":900000000}

Don't reach for an SSE library·python

# Wrong: pointing sseclient at Ollama hangs forever
# import sseclient  # ❌

# Right: read raw lines and JSON-parse each one
import httpx, json

with httpx.stream("POST", "http://localhost:11434/api/chat",
                  json={"model": "qwen2.5:7b",
                        "messages": [{"role": "user", "content": "hi"}],
                        "stream": True},
                  timeout=None) as r:
    for line in r.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk["message"]["content"], end="", flush=True)
        if chunk.get("done"):
            break
