Cost-Quality Tradeoff Analysis

Mapping the Pareto frontier

Every LLM system sits somewhere on a cost-quality tradeoff. Evaluation tells you exactly where on the curve you are, and where you could go.

The Pareto-frontier exercise

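Pick a representative eval set.
Run the same eval across multiple models, prompts, and configurations, capturing both quality (eval pass rate) and cost (USD per 1k calls).
Plot quality against cost. Each (model, config) pair is one point.
The "Pareto frontier" is the set of configurations whose quality cannot be improved without increasing cost. Points off the frontier are dominated: some other configuration is at least as good on both axes and strictly better on at least one.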

What the chart usually shows

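For most product tasks the curve is steep at the bottom (cheap models barely work, and a modest cost increase buys a large quality jump) and flat at the top (frontier models can cost around 10x more for a marginal quality gain). The right point on the curve is the cheapest configuration that still clears the product's quality threshold.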
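Choosing that point can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the function name `cheapest_meeting_threshold`, the config names, and every number below are hypothetical and stand in for your own measured grid.

```python
from typing import Optional


def cheapest_meeting_threshold(configs: list[dict], min_quality: float) -> Optional[dict]:
    """Return the lowest-cost config whose quality meets the threshold, or None."""
    eligible = [c for c in configs if c["quality"] >= min_quality]
    if not eligible:
        return None  # nothing clears the bar; revisit the threshold or the grid
    return min(eligible, key=lambda c: c["cost_per_1k"])


# Hypothetical measurements (USD per 1k calls, eval pass rate), for illustration only.
configs = [
    {"name": "small-model", "cost_per_1k": 0.20, "quality": 0.62},
    {"name": "mid-model", "cost_per_1k": 3.00, "quality": 0.88},
    {"name": "large-model", "cost_per_1k": 15.00, "quality": 0.91},
]

choice = cheapest_meeting_threshold(configs, min_quality=0.85)
print(choice["name"] if choice else "no config meets the threshold")
```

With these made-up numbers the large model adds three points of pass rate at five times the cost, so the mid-priced config is selected once the threshold is 0.85.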

Principle: always run the cost-quality grid. Dominated configurations (those that another configuration matches or beats on both axes) usually only become visible after plotting.
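Dominance can also be checked directly from the grid results without a plot. The sketch below assumes each configuration has been reduced to a (cost, quality) pair; the `Point` type, the helper names, and the sample values are hypothetical.

```python
from typing import NamedTuple


class Point(NamedTuple):
    name: str
    cost: float     # lower is better
    quality: float  # higher is better


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    no_worse = a.cost <= b.cost and a.quality >= b.quality
    strictly_better = a.cost < b.cost or a.quality > b.quality
    return no_worse and strictly_better


def pareto_frontier(points: list[Point]) -> list[Point]:
    """Keep every point that no other point dominates, sorted by cost."""
    kept = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(kept, key=lambda p: p.cost)


# Hypothetical measurements, for illustration only.
points = [
    Point("config-a", cost=0.2, quality=0.60),
    Point("config-b", cost=1.0, quality=0.85),
    Point("config-c", cost=2.0, quality=0.80),  # costs more than config-b, scores lower
    Point("config-d", cost=5.0, quality=0.90),
]
frontier_names = [p.name for p in pareto_frontier(points)]
print(frontier_names)
```

The pairwise check is quadratic in the number of configurations, which is fine for the handful of points a typical grid produces.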

Model routing for advanced systems

Don't pick one model for everything. Route easy queries to a cheap model and hard queries to an expensive one. A simple router (rules on query length or topic, a regex, or a small classifier) combined with a quality classifier can cut inference cost by roughly 3-10x while largely preserving quality. RouteLLM and Martian are examples of existing routing tools.
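How much routing saves depends on the traffic mix. As a rough sketch, the blended cost is the share-weighted average of the per-tier costs. The per-1k costs below reuse the figures from this lesson's model table; the traffic shares and the function name `blended_cost_per_1k` are assumptions made up for illustration.

```python
def blended_cost_per_1k(traffic_share: dict[str, float], cost_per_1k: dict[str, float]) -> float:
    """Share-weighted average cost per 1k calls across routing tiers."""
    total = sum(traffic_share.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"traffic shares must sum to 1, got {total}")
    return sum(share * cost_per_1k[tier] for tier, share in traffic_share.items())


# Hypothetical traffic split; measure your own before trusting any savings figure.
traffic_share = {"easy": 0.7, "medium": 0.2, "hard": 0.1}
# USD per 1k calls for the cheap, mid, and expensive tiers.
cost_per_1k = {"easy": 0.25, "medium": 3.00, "hard": 15.00}

blended = blended_cost_per_1k(traffic_share, cost_per_1k)
savings = cost_per_1k["hard"] / blended  # relative to sending everything to the top tier
print(f"blended: ${blended:.3f} per 1k calls, {savings:.1f}x cheaper than always-expensive")
```

The saving is driven almost entirely by how much traffic the router can safely keep off the expensive tier, so the same router gives very different results on different workloads.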

Code

Pareto-frontier grid run·python

# cost_per_1k is USD per 1,000 calls; quality is filled in by the eval run below.
MODELS = [
    {"name": "haiku-4.5",  "cost_per_1k": 0.25,  "quality": None},
    {"name": "gpt-5-mini", "cost_per_1k": 0.15,  "quality": None},
    {"name": "sonnet-4.6", "cost_per_1k": 3.00,  "quality": None},
    {"name": "opus-4.7",   "cost_per_1k": 15.00, "quality": None},
    {"name": "gpt-5",      "cost_per_1k": 5.00,  "quality": None},
]

# run_eval_suite and DATASET are assumed to come from your eval harness;
# run_eval_suite should return a pass rate in [0, 1].
for m in MODELS:
    m["quality"] = run_eval_suite(model=m["name"], dataset=DATASET)

# Walk the models from cheapest to most expensive (at equal cost, best quality
# first). A model joins the frontier only if it beats every cheaper model on
# quality; otherwise a cheaper-or-equal model dominates it and it is dropped.
frontier = []
best_q = -1.0
for m in sorted(MODELS, key=lambda x: (x["cost_per_1k"], -x["quality"])):
    if m["quality"] > best_q:
        frontier.append(m)
        best_q = m["quality"]
print("Pareto frontier:", frontier)

Simple router: cheap model for easy queries, expensive for hard·python

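HARD_KEYWORDS = ("explain", "compare", "analyze", "why")

MODEL_BY_DIFFICULTY = {
    "easy":   "haiku-4.5",
    "medium": "sonnet-4.6",
    "hard":   "opus-4.7",
}

def difficulty_classifier(query: str) -> str:
    """Crude heuristic: reasoning keywords mean hard, short queries mean easy."""
    text = query.lower()
    # Check keywords first so a short "explain X" is not misrouted as easy.
    if any(k in text for k in HARD_KEYWORDS):
        return "hard"
    if len(query) < 30:
        return "easy"
    return "medium"

def route(query: str) -> str:
    """Return the model name for a query based on its estimated difficulty."""
    return MODEL_BY_DIFFICULTY[difficulty_classifier(query)]

# Validate: run the eval with the routed setup. Quality should approximate
# always-using-opus, while cost approximates always-using-haiku for the
# easy share of traffic.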
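One way to run that validation is to score the routed setup and the always-expensive baseline on the same eval set and compare pass rate and cost side by side. The sketch below is self-contained and uses toy stand-ins: `evaluate_routed`, `toy_route`, `toy_passes`, the tier names, and the queries are all hypothetical, and in practice `passes_fn` would call your real eval grader.

```python
from typing import Callable


def evaluate_routed(
    queries: list[str],
    route_fn: Callable[[str], str],
    passes_fn: Callable[[str, str], bool],
    cost_per_1k: dict[str, float],
) -> tuple[float, float]:
    """Return (pass rate, USD per 1k calls) for a routing policy on an eval set."""
    if not queries:
        raise ValueError("eval set is empty")
    passed = 0
    total_cost = 0.0
    for q in queries:
        model = route_fn(q)
        passed += bool(passes_fn(model, q))
        total_cost += cost_per_1k[model] / 1000  # cost of this single call
    n = len(queries)
    return passed / n, total_cost / n * 1000


# Toy stand-ins, for illustration only.
def toy_route(query: str) -> str:
    return "cheap" if len(query) < 30 else "expensive"


def toy_passes(model: str, query: str) -> bool:
    # Pretend the cheap model only handles short queries correctly.
    return model == "expensive" or len(query) < 30


cost_per_1k = {"cheap": 0.25, "expensive": 15.00}
queries = [
    "reset my password",
    "what are your hours?",
    "compare these two contracts and explain the liability differences",
    "analyze why churn rose last quarter given the attached cohort data",
]

routed_quality, routed_cost = evaluate_routed(queries, toy_route, toy_passes, cost_per_1k)
base_quality, base_cost = evaluate_routed(queries, lambda q: "expensive", toy_passes, cost_per_1k)
print(f"routed:   quality={routed_quality:.2f} cost=${routed_cost:.3f}/1k")
print(f"baseline: quality={base_quality:.2f} cost=${base_cost:.3f}/1k")
```

If the routed pass rate drops noticeably below the baseline, the router is sending hard queries to the cheap tier and its difficulty rules need tightening before the cost saving is worth taking.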
