OpenAI-Compatible Endpoints

~22 min · inference, openai-compat

Why OpenAI compatibility matters

Most clients in the wild (LangChain, LlamaIndex, your own custom shim, the OpenAI SDK itself) speak the OpenAI wire format. HF's Inference Providers expose an OpenAI-compatible endpoint at https://router.huggingface.co/v1/... Point the OpenAI SDK at that URL with an HF token, and code that thinks it is calling OpenAI can reach any model served by an HF-routed provider.

What's the same

The chat completions, embeddings, and (some) image endpoints. The JSON shape of messages, tools, response_format, and streaming chunks. Authentication works the same way too: Authorization: Bearer ${HF_TOKEN} goes where OPENAI_API_KEY would.
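
To make the wire format concrete, here is a minimal sketch that builds (but does not send) the kind of HTTP request the SDK produces. It assumes the chat route is the base URL plus /chat/completions, which is the path the OpenAI SDK appends. The helper name build_chat_request and the token hf_xxx are placeholders for illustration, not part of any library.

```python
import json
import urllib.request

# Assumption: base URL from this lesson plus the SDK's standard chat path.
ROUTER_CHAT_URL = "https://router.huggingface.co/v1/chat/completions"


def build_chat_request(token, model, messages, max_tokens=60):
    """Build (but do not send) an OpenAI-style chat completions request."""
    body = {"model": model, "messages": messages, "max_tokens": max_tokens}
    return urllib.request.Request(
        ROUTER_CHAT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # The HF token sits exactly where an OpenAI key would.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    token="hf_xxx",  # placeholder, not a real token
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Hi"}],
)
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
```

Passing req to urllib.request.urlopen would actually send it. The sketch stops short of that so it runs without a token or network access.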

What's different

The model id is an HF repo id, not an OpenAI model name. Some response fields (provider-specific metadata) carry over and some do not. Rate limits are HF's, not OpenAI's.
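
Because some response fields may be missing depending on the provider, it can help to read optional fields defensively. The sketch below is one way to do that. The helper summarize_response is hypothetical, and the fake object only mimics the attribute layout of a chat completion so the example runs offline.

```python
from types import SimpleNamespace


def summarize_response(resp):
    """Pull out text and token usage, tolerating fields a provider omits."""
    choice = resp.choices[0]
    usage = getattr(resp, "usage", None)  # may be absent or None
    return {
        "model": getattr(resp, "model", None),
        "text": choice.message.content or "",
        "finish_reason": getattr(choice, "finish_reason", None),
        "total_tokens": getattr(usage, "total_tokens", None),
    }


# Stand-in shaped like a chat completion that carries no usage block.
fake = SimpleNamespace(
    model="meta-llama/Llama-3.1-8B-Instruct",
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(content="Hello"),
            finish_reason="stop",
        )
    ],
)
print(summarize_response(fake))
```

The same function accepts the resp object returned by client.chat.completions.create, since it only relies on attribute access.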

Code

Calling HF with the OpenAI SDK·python

from openai import OpenAI
import os

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # HF repo id
    messages=[{"role": "user", "content": "Hi from the OpenAI SDK pointed at HF."}],
    max_tokens=60,
)
print(resp.choices[0].message.content)

Streaming through the OpenAI SDK·python

from openai import OpenAI
import os

client = OpenAI(base_url="https://router.huggingface.co/v1", api_key=os.environ["HF_TOKEN"])

stream = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "List three open-source LLMs."}],
    max_tokens=200,
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)

Exercise

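Take existing code that calls OpenAI (or write some from scratch). Switch it to call the HF router with a Llama-3 instruct model. Look at the diff: how many lines changed? (Two: base_url and model, provided the API key is already read from an environment variable you can repoint at your HF token.) Run both versions with the same prompt and compare the outputs.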
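
One way to keep the switch down to a single setting is to put the per-backend differences in a small table. The sketch below is illustrative only: BACKENDS, resolve_backend, and the OpenAI model name gpt-4o-mini are placeholders chosen for this example, and leaving base_url unset for OpenAI relies on the SDK falling back to its default.

```python
import os

# Hypothetical per-backend settings; only base_url, key source, and model differ.
BACKENDS = {
    "openai": {
        "base_url": None,  # let the SDK use its default
        "key_env": "OPENAI_API_KEY",
        "model": "gpt-4o-mini",  # placeholder OpenAI model name
    },
    "hf": {
        "base_url": "https://router.huggingface.co/v1",
        "key_env": "HF_TOKEN",
        "model": "meta-llama/Llama-3.1-8B-Instruct",  # HF repo id
    },
}


def resolve_backend(name, env=os.environ):
    """Return (client keyword arguments, model id) for the named backend."""
    cfg = BACKENDS[name]
    client_kwargs = {"api_key": env[cfg["key_env"]]}
    if cfg["base_url"] is not None:
        client_kwargs["base_url"] = cfg["base_url"]
    return client_kwargs, cfg["model"]


# A fake environment keeps the example runnable without real credentials.
client_kwargs, model = resolve_backend("hf", env={"HF_TOKEN": "hf_xxx"})
print(client_kwargs["base_url"], model)
```

From there, OpenAI(**client_kwargs) followed by client.chat.completions.create(model=model, ...) would run the same prompt against either backend, which makes the side-by-side comparison straightforward.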
