Search, filtering, and naming conventions

The filter axes are the mental model

The filter set the /models browse page exposes maps one-to-one onto the API's filters: task (pipeline_tag), library (transformers, diffusers, sentence-transformers, mlx, peft, ...), language (ISO 639-1 code), license, provider (which inference provider serves the model), size, and format. Sort options: trending, downloads, likes, recently updated, recently created. Treat these as the Hub's real taxonomy rather than homepage chrome, and you won't burn weeks on “I can't find a model that does X”.
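To make the UI-to-API mapping concrete, here is a minimal sketch that turns browse-page selections into keyword arguments for `HfApi.list_models`. Assumptions: the sort values `trending_score`, `last_modified` and `created_at` are the snake_case names recent huggingface_hub releases accept; license has no dedicated kwarg, so it is passed through `filter` as a `license:<id>` tag; size, provider and format are left out. The helper `ui_to_list_models_kwargs` is hypothetical.

```python
from typing import Optional

# Browse-page sort label -> `sort` value for HfApi.list_models.
# Assumption: snake_case names as accepted by recent huggingface_hub releases.
SORT_KEYS = {
    "trending": "trending_score",
    "downloads": "downloads",
    "likes": "likes",
    "recently updated": "last_modified",
    "recently created": "created_at",
}


def ui_to_list_models_kwargs(
    task: Optional[str] = None,
    library: Optional[str] = None,
    language: Optional[str] = None,
    license_id: Optional[str] = None,
    sort: str = "trending",
    limit: int = 20,
) -> dict:
    """Hypothetical helper: map browse-page filter choices to list_models kwargs."""
    kwargs: dict = {"sort": SORT_KEYS[sort], "limit": limit}
    if task:
        kwargs["task"] = task          # pipeline_tag, e.g. "text-generation"
    if library:
        kwargs["library"] = library    # e.g. "transformers"
    if language:
        kwargs["language"] = language  # ISO 639-1 code, e.g. "ko"
    if license_id:
        # Assumed: no license kwarg, so pass the license as a Hub tag.
        kwargs["filter"] = f"license:{license_id}"
    return kwargs


if __name__ == "__main__":
    print(ui_to_list_models_kwargs(task="text-generation", language="ko",
                                   library="transformers", license_id="apache-2.0",
                                   sort="downloads", limit=10))
```

With huggingface_hub installed you would unpack the result straight into the call, e.g. `HfApi().list_models(**ui_to_list_models_kwargs(task="text-generation"))`.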

Naming is a contract, not a vibe

HF model IDs follow the {org}/{base}-{size}-{variant} convention. A few patterns worth memorizing:

meta-llama/Llama-3.1-8B: the base model, not instruction-tuned.
meta-llama/Llama-3.1-8B-Instruct: the instruction-tuned variant. For chat, use this unless you explicitly need the raw base model.
TheBloke/Llama-2-7B-Chat-GGUF: a GGUF quantization for llama.cpp / Ollama.
{org}/{model}-AWQ / -GPTQ / -bnb-4bit: quantized variants for inference servers (bnb = bitsandbytes).
{org}/{model}-Matryoshka / -Distill: specialized derivatives (embedding truncation, distillation).

Names are informational, not enforced. The base_model and base_model_relation fields in the YAML front matter (next lesson) are the machine-readable version of the same lineage.
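Since names aren't enforced, reading an ID can only ever be a heuristic. The sketch below assumes the {org}/{base}-{size}-{variant} shape: it splits an ID into org and name, pulls out a size token such as 8B or 1.5B, and flags the variant suffixes from the patterns above. The helper `parse_model_id`, the regex, and the suffix list are our own illustrative assumptions, not anything the Hub defines.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Suffix tokens taken from the patterns above (illustrative, not exhaustive).
VARIANT_SUFFIXES = ["instruct", "chat", "gguf", "awq", "gptq", "bnb-4bit", "matryoshka", "distill"]

# A size token like "8B", "1.5B" or "350M", delimited by '-' or the ends of the name.
SIZE_RE = re.compile(r"(?:^|-)(\d+(?:\.\d+)?[BM])(?=-|$)", re.IGNORECASE)


@dataclass
class ParsedId:
    org: str
    name: str
    size: Optional[str]
    variants: List[str] = field(default_factory=list)


def parse_model_id(model_id: str) -> ParsedId:
    """Heuristically split '{org}/{base}-{size}-{variant}'; names are not enforced."""
    org, _, name = model_id.partition("/")
    match = SIZE_RE.search(name)
    size = match.group(1).upper() if match else None
    lowered = name.lower()
    # A suffix counts only as a whole '-'-delimited token, e.g. "-chat-" or "-gguf" at the end.
    variants = [
        s for s in VARIANT_SUFFIXES
        if re.search(rf"(?:^|-){re.escape(s)}(?=-|$)", lowered)
    ]
    return ParsedId(org=org, name=name, size=size, variants=variants)


if __name__ == "__main__":
    for mid in ["meta-llama/Llama-3.1-8B",
                "meta-llama/Llama-3.1-8B-Instruct",
                "TheBloke/Llama-2-7B-Chat-GGUF"]:
        print(parse_model_id(mid))
```

For `TheBloke/Llama-2-7B-Chat-GGUF` this yields size `7B` and variants `['chat', 'gguf']`; an ID that breaks the convention just comes back with `size=None` or no variants, which is why the YAML lineage fields are the thing to trust.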

Code

Filtering on multiple axes · python

from huggingface_hub import HfApi

api = HfApi()

# Korean text-generation models, Apache-2.0 only, loadable with Transformers
results = api.list_models(
    task="text-generation",
    language="ko",
    library="transformers",
    filter="license:apache-2.0",  # no dedicated license kwarg; license is a Hub tag
    sort="downloads",
    limit=10,
)
for m in results:
    print(f"{m.id:<55} dl={m.downloads or 0:>8}  {','.join((m.tags or [])[:4])}")
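If more than one license is acceptable (say apache-2.0 or mit), one option is to query without a license filter and keep matches client-side, since, to our understanding, every tag passed via `filter` has to match. A minimal sketch over the `tags` field that `list_models` results carry; it assumes the Hub's `license:<id>` tag format, and `ModelStub` is a hypothetical stand-in for `ModelInfo` so the snippet runs offline.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class ModelStub:
    """Hypothetical stand-in for huggingface_hub's ModelInfo (only id and tags)."""
    id: str
    tags: List[str] = field(default_factory=list)


def licenses_of(tags: Iterable[str]) -> Set[str]:
    # Assumed tag format: "license:apache-2.0" -> "apache-2.0"
    return {t.split(":", 1)[1] for t in tags if t.startswith("license:")}


def keep_licenses(models: Iterable, allowed: Set[str]) -> list:
    """Keep models whose license tags intersect `allowed` (an OR over licenses)."""
    return [m for m in models if licenses_of(m.tags or []) & allowed]


if __name__ == "__main__":
    sample = [
        ModelStub("org-a/model-1", ["text-generation", "license:apache-2.0"]),
        ModelStub("org-b/model-2", ["text-generation", "license:mit"]),
        ModelStub("org-c/model-3", ["text-generation", "license:other"]),
        ModelStub("org-d/model-4", []),
    ]
    for m in keep_licenses(sample, {"apache-2.0", "mit"}):
        print(m.id)
```

Against live results the same function should work on the real objects, e.g. `keep_licenses(api.list_models(task="text-generation", library="transformers", limit=200), {"apache-2.0", "mit"})`; raise `limit` since filtering happens after the fetch.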

Substring search on repo IDs · python

from huggingface_hub import HfApi
api = HfApi()

# `search` matches a substring of the repo ID (org/name), not model card / README text
results = api.list_models(
    search="korean",
    sort="downloads",
    limit=5,
)
for m in results:
    print(m.id)
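Because `search` in `list_models` only looks at repo IDs, one way to filter on card content is to pull the card text for a shortlist and match keywords locally. In practice the text could come from huggingface_hub's `ModelCard.load(repo_id).text`; here the cards are an in-memory dict so the sketch runs offline. The helper `cards_matching` is hypothetical, and it is a plain case-insensitive all-terms substring match, not the Hub's own search.

```python
from typing import Dict, Iterable, List


def cards_matching(cards: Dict[str, str], terms: Iterable[str]) -> List[str]:
    """Return repo IDs whose card text contains every term (case-insensitive)."""
    needles = [t.lower() for t in terms]
    return [
        repo_id for repo_id, text in cards.items()
        if all(n in text.lower() for n in needles)
    ]


if __name__ == "__main__":
    # Hypothetical card bodies. With network access, something like:
    #   from huggingface_hub import ModelCard
    #   cards = {rid: ModelCard.load(rid).text for rid in shortlist}
    cards = {
        "org-a/model-1": "Instruction-tuned on a Korean chat dataset.",
        "org-b/model-2": "English base model, no instruction tuning.",
        "org-c/model-3": "Korean base model for continued pretraining.",
    }
    print(cards_matching(cards, ["korean", "instruction"]))
```

Loading cards costs one request per repo, so narrow the candidate list with the tag filters first and only then read card bodies.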

Exercise

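Open huggingface.co/models. Apply filters until 20 or fewer candidates are left: task = text-generation, language = en (or ko), library = transformers, license = apache-2.0 or mit, parameters between 1B and 8B. Sort by trending. From the results, pick three you would seriously consider for an on-prem project, and jot down how their cards differ (cards are covered in the next lesson).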