OpenAI compatibility test

OpenAI compatibility is a spectrum

Every local engine claims to be OpenAI-compatible. All of them are mostly compatible. The differences are predictable but not standardized:

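Streaming format. All of them emit SSE, but the exact chunk shape differs.
Tool calling. Native Ollama returns tool arguments as a parsed dict; OpenAI emits them as a JSON string; some engines wrap them differently.
Structured output. Ollama takes a JSON Schema in its format field. OpenAI uses response_format. vLLM uses guided_json. Same idea, different keys.
Token counting. The token usage fields differ slightly, or are missing altogether.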
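The tool-calling difference can be absorbed by normalizing arguments as soon as a response is read. Below is a minimal sketch, assuming a hypothetical helper named normalize_tool_args that handles only the two shapes named above (a parsed dict and a JSON string). Engines that wrap arguments some other way would need their own branch, and treating an empty string as "no arguments" is an assumption, not documented engine behavior.

```python
import json
from typing import Any


def normalize_tool_args(raw: Any) -> dict:
    """Return tool-call arguments as a dict, whichever form the engine used.

    Hypothetical helper: accepts a parsed dict (native Ollama style) or a
    JSON string (OpenAI style). Anything else raises so the mismatch is visible.
    """
    if isinstance(raw, dict):
        return raw
    if isinstance(raw, str):
        # Assumption: an empty string means the tool was called with no arguments.
        if not raw.strip():
            return {}
        parsed = json.loads(raw)
        if not isinstance(parsed, dict):
            raise ValueError(
                f"tool arguments decoded to {type(parsed).__name__}, expected an object"
            )
        return parsed
    raise TypeError(f"unsupported tool argument type: {type(raw).__name__}")


# Made-up tool calls in the two shapes, for illustration only.
openai_style = {"function": {"name": "get_weather", "arguments": '{"city": "Seoul"}'}}
ollama_style = {"function": {"name": "get_weather", "arguments": {"city": "Seoul"}}}

for call in (openai_style, ollama_style):
    print(normalize_tool_args(call["function"]["arguments"]))
```

Both calls print the same dict, so the rest of the app can stay engine-agnostic.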

How do you test compatibility?

Don't trust the label. Run a small test suite that exercises the features you actually use. The list below is the canonical local-AI compatibility test:

1. Non-streaming chat with a simple message.
2. Streaming chat with the same message.
3. Multi-turn conversation (system + user + assistant).
4. Tool definition + multi-turn tool loop.
5. Structured output (JSON Schema).
6. Token usage in the response.

If an engine passes all 6, you can swap engines without changing the app. If it fails test 4 or 5, expect to write a per-engine shim.
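A shim for the structured-output test can be as small as a function that puts the same JSON Schema under the key each engine expects. The sketch below is an illustration under assumptions: with_schema and the engine labels are hypothetical names, the key names are the ones listed earlier, and the exact nesting (especially inside response_format) should be checked against the docs for the engine version you run.

```python
from copy import deepcopy


def with_schema(body: dict, schema: dict, engine: str) -> dict:
    """Return a copy of a chat request body with `schema` attached for `engine`."""
    out = deepcopy(body)
    if engine == "openai":
        # OpenAI style: the schema is nested under response_format.
        out["response_format"] = {
            "type": "json_schema",
            "json_schema": {"name": "result", "schema": schema},
        }
    elif engine == "ollama-native":
        # Native Ollama API: the schema goes directly in the format field.
        out["format"] = schema
    elif engine == "vllm":
        # vLLM: extra guided_json key in the request body.
        out["guided_json"] = schema
    else:
        raise ValueError(f"no structured-output shim for engine {engine!r}")
    return out


# Illustrative request and schema (made up for this example).
base = {"model": "qwen2.5:7b", "messages": [{"role": "user", "content": "Reply in JSON."}]}
schema = {
    "type": "object",
    "properties": {"answer": {"type": "string"}},
    "required": ["answer"],
}

for engine in ("openai", "ollama-native", "vllm"):
    body = with_schema(base, schema, engine)
    print(engine, sorted(set(body) - set(base)))
```

Keeping the shim a pure function means the app builds one request and only the last step differs per engine.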

Code

Compatibility test harness · python

import httpx

def test_compat(base_url: str, model: str, label: str = ""):
    """Run quick checks for tests 1, 2 and 4 from the list above against one engine."""
    print(f"\n=== {label or base_url} ===")
    url = f"{base_url}/chat/completions"
    client_kwargs = {"timeout": 120.0}

    # 1. Non-streaming: a plain request should return HTTP 200.
    r = httpx.post(url, json={
        "model": model,
        "messages": [{"role": "user", "content": "Say 'ok' and nothing else."}],
        "stream": False,
    }, **client_kwargs)
    print(f"1. non-stream: {'PASS' if r.status_code == 200 else 'FAIL'}")

    # 2. Streaming: count SSE data lines, excluding the final [DONE] sentinel.
    #    Uses the same timeout as the other calls so a stalled engine cannot hang the run.
    with httpx.stream("POST", url, json={
        "model": model,
        "messages": [{"role": "user", "content": "Count: 1 2 3"}],
        "stream": True,
    }, **client_kwargs) as r2:
        got_chunks = sum(
            1 for line in r2.iter_lines()
            if line.startswith("data:") and line[5:].strip() != "[DONE]"
        )
    print(f"2. stream:     {'PASS' if got_chunks > 1 else 'FAIL'} ({got_chunks} chunks)")

    # 4. Tool call (first step only, not the full multi-turn loop).
    tools = [{"type": "function", "function": {
        "name": "say_ok", "description": "Reply with 'ok'.",
        "parameters": {"type": "object", "properties": {}, "required": []},
    }}]
    r3 = httpx.post(url, json={
        "model": model,
        "messages": [{"role": "user", "content": "Use the say_ok tool."}],
        "tools": tools, "stream": False,
    }, **client_kwargs)
    # An error response may have no "choices" (or an empty list), so read it defensively.
    choices = r3.json().get("choices") if r3.status_code == 200 else None
    message = (choices or [{}])[0].get("message") or {}
    has_tools = bool(message.get("tool_calls"))
    print(f"4. tool_call:  {'PASS' if has_tools else 'FAIL'}")
    # Tests 3, 5 and 6 are not covered by this harness.

# Run it against several engines.
test_compat("http://localhost:11434/v1", "qwen2.5:7b",   label="Ollama")
test_compat("http://localhost:8080/v1",  "qwen-direct",  label="llama-server")
# test_compat("http://localhost:8000/v1", "Qwen/Qwen2.5-7B-Instruct", label="vLLM")
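The harness only looks at status codes and line counts, so an engine can pass while returning chunks or usage fields your app cannot read. The sketch below adds two payload-level checks that could back the streaming and token-usage tests. The helper names (collect_stream_text, has_usage) are hypothetical, and the chunk shape assumed here (choices[0].delta.content, terminated by data: [DONE]) is the OpenAI-style one; an engine that deviates from it is exactly what this should catch.

```python
import json


def collect_stream_text(lines) -> str:
    """Join the delta.content pieces from OpenAI-style SSE lines."""
    parts = []
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices") or []:
            delta = choice.get("delta") or {}
            parts.append(delta.get("content") or "")
    return "".join(parts)


def has_usage(response: dict) -> bool:
    """True if the response reports integer prompt and completion token counts."""
    usage = response.get("usage") or {}
    return all(
        isinstance(usage.get(key), int)
        for key in ("prompt_tokens", "completion_tokens")
    )


# Made-up sample data in the assumed shape.
sample_lines = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "1 "}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "2 3"}}]}',
    "data: [DONE]",
]
print(collect_stream_text(sample_lines))  # prints "1 2 3"
print(has_usage({"usage": {"prompt_tokens": 12, "completion_tokens": 3}}))  # prints True
print(has_usage({"choices": []}))  # prints False
```

In the harness, collect_stream_text could take r2.iter_lines() directly, and has_usage could take r.json() from the non-streaming call.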
