Provider Routing and the Pricing Surface

~24 min · inference, providers


The Hub is the router

HF's Inference Providers is a routing layer: you pick a model, the platform shows which providers serve it, and an InferenceClient(provider=...) request goes through the HF edge to that provider. You pay HF, and HF pays the provider. Auth is your HF token.

Why go through HF instead of going direct

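One auth layer: a single token instead of five per-provider keys.
One billing surface: a single invoice instead of five.
Provider abstraction: if Provider A goes down, you change one string.
Per-provider free tier: for some accounts, HF passes through free credits on a per-provider basis.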

When to go direct

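When you need a provider feature that HF does not surface (custom JSON mode, adapter routing, region pinning), you sometimes drop down to the provider's native SDK. InferenceClient's escape hatch is its OpenAI-compatible mode, covered in the next lesson.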

Code

Discover which providers serve a model·python

from huggingface_hub import HfApi

api = HfApi()
info = api.model_info("meta-llama/Llama-3.1-8B-Instruct")

# Inference info shows up in .inference when present (its exact content varies by huggingface_hub version)
print("inference field:", getattr(info, "inference", None))

# More reliable: check the provider tags (assumes the Hub exposes them for this model)
providers = [t for t in (info.tags or []) if t.startswith("inference-providers:")]
print(providers)

Switch providers by changing one string·python

from huggingface_hub import InferenceClient

def ask(provider):
    """Send a one-message chat request through the given provider and return the reply text."""
    client = InferenceClient(model="meta-llama/Llama-3.1-8B-Instruct", provider=provider)
    return client.chat_completion(
        messages=[{"role": "user", "content": "ping"}],
        max_tokens=10,
    ).choices[0].message.content

for p in ["hf-inference", "together", "fireworks-ai"]:
    try:
        print(p, "->", ask(p))
    except Exception as e:
        print(p, "->", type(e).__name__)
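
The "change one string" point can be taken one step further: try providers in order and return the first answer that comes back. This is a minimal sketch, not part of huggingface_hub. ask_with_fallback and fake_ask are hypothetical names, and fake_ask stands in for the network-backed ask(provider) so the block runs offline.

```python
from typing import Callable, Iterable, Optional, Tuple


def ask_with_fallback(
    ask: Callable[[str], str],
    providers: Iterable[str],
) -> Tuple[str, str]:
    """Try each provider in order; return (provider, answer) for the first success."""
    last_error: Optional[Exception] = None
    for provider in providers:
        try:
            return provider, ask(provider)
        except Exception as exc:  # broad on purpose: any failure moves on to the next provider
            last_error = exc
    raise RuntimeError("all providers failed") from last_error


# Hypothetical stand-in for the real ask(): pretends the first provider is down.
def fake_ask(provider: str) -> str:
    if provider == "hf-inference":
        raise ConnectionError("simulated outage")
    return f"pong from {provider}"


print(ask_with_fallback(fake_ask, ["hf-inference", "together", "fireworks-ai"]))
# -> ('together', 'pong from together')
```

Passing the real ask function in place of fake_ask gives the same behavior against live providers. In production you would likely narrow the except clause to the errors you actually expect.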

Exercise

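Pick a popular model. Identify three providers that serve it. Run the same prompt 5 times against each provider. Track: average latency, p95 latency, and output differences (response shape, content). Estimate $/1k-token from each provider's published pricing.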
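
One possible starting point for the measurement part, as a sketch only. All function names here are hypothetical helpers, the sample latencies and the price are made-up numbers, and the cost helper assumes the provider publishes its price per 1M tokens. p95 uses the nearest-rank definition.

```python
import time
from typing import Callable, Dict, List


def time_calls(call: Callable[[], object], runs: int = 5) -> List[float]:
    """Return the wall-clock latency in seconds of each of `runs` calls."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        latencies.append(time.perf_counter() - start)
    return latencies


def p95(samples: List[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceiling of 0.95 * n
    return ordered[rank - 1]


def summarize(samples: List[float]) -> Dict[str, float]:
    """Average and p95 latency for one provider."""
    return {"avg": sum(samples) / len(samples), "p95": p95(samples)}


def cost_per_1k_tokens(price_per_million_usd: float) -> float:
    """Convert a published $/1M-token price into $/1k tokens."""
    return price_per_million_usd / 1000


# Made-up latencies (seconds) for five runs against one provider.
samples = [0.8, 1.0, 0.9, 1.1, 2.2]
print(summarize(samples))
print(cost_per_1k_tokens(0.20))  # hypothetical price of $0.20 per 1M tokens
```

To measure a real provider, pass something like `lambda: ask("together")` to time_calls. With only 5 runs the nearest-rank p95 is simply the slowest run, so increase the run count if you want the tail number to mean something.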
