Threshold Tuning and Business Cost

~30 min · threshold, cost-matrix

0.5 is rarely the right threshold

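The default of 0.5 is a convention, not a recommendation. The right threshold depends on the cost of a false positive, the cost of a false negative, and the team's operating capacity. Move the threshold to match the business, and defend the choice with numbers, not vibes.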
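As a rough illustration of why 0.5 is only a convention: if the scores are well-calibrated probabilities and correct decisions cost nothing, flagging a case is cheaper in expectation whenever p * cost_fn > (1 - p) * cost_fp, which rearranges to p > cost_fp / (cost_fp + cost_fn). The sketch below rests on those assumptions; the function name and the costs (5 per false positive, 50 per false negative) are illustrative only.

```python
def calibrated_cost_threshold(cost_fp: float, cost_fn: float) -> float:
    """Break-even probability above which flagging is cheaper than not flagging.

    Assumes calibrated probabilities and zero cost for correct decisions.
    """
    if cost_fp < 0 or cost_fn < 0 or cost_fp + cost_fn == 0:
        raise ValueError("costs must be non-negative and not both zero")
    return cost_fp / (cost_fp + cost_fn)


# Illustrative costs: a missed positive is 10x as expensive as a false alarm.
t_star = calibrated_cost_threshold(cost_fp=5, cost_fn=50)
print(f"break-even threshold: {t_star:.3f}")
```

With equal costs the break-even point is exactly 0.5, which is the only case where the default is justified on cost grounds. Real models are often not perfectly calibrated, so an empirical sweep over validation data remains the safer check.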

Three honest threshold strategies

Cost-minimizing: the threshold with the minimum expected cost.
Recall at precision floor: the highest-recall threshold that keeps precision ≥ P (often used when team capacity is fixed).
Top-K capacity: score everything and take the top K; downstream acts on those K (under the hood, this is a thresholded ranking).
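The Top-K strategy can be sketched as follows. This is a minimal example, not a prescribed implementation: the function name, the sample scores, and the capacity of 3 cases are all hypothetical. It also returns the score of the K-th case, which is the threshold the ranking implies for that batch.

```python
import numpy as np


def top_k_operating_point(scores: np.ndarray, k: int) -> tuple[np.ndarray, float]:
    """Return indices of the k highest-scoring cases and the implied threshold."""
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and len(scores)")
    order = np.argsort(-scores, kind="stable")  # highest score first
    top_idx = order[:k]
    implied_threshold = float(scores[top_idx[-1]])  # score of the k-th case
    return top_idx, implied_threshold


# Hypothetical batch of scores; the team can review 3 cases.
scores = np.array([0.12, 0.91, 0.40, 0.77, 0.05, 0.63])
top_idx, implied_t = top_k_operating_point(scores, k=3)
print(top_idx, implied_t)
```

Because the implied threshold moves with each batch, it is worth tracking over time: a steadily falling value suggests the team is spending its capacity on weaker and weaker cases.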

Threshold logging and monitoring

After shipping a threshold, log it alongside the model version. Drift can change the right threshold even when the model has not changed. Monitor the realized operating point against the intended one every week.
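One way to make this concrete is sketched below. It assumes the intended operating point is expressed as a target precision and that labeled outcomes for the week are available, which in practice may arrive with a delay. The field names, the version string "clf-v3", the 0.05 tolerance, and the sample data are all hypothetical.

```python
import json

import numpy as np


def operating_point_record(model_version, threshold, target_precision):
    """Build the record to log when a threshold ships (field names are illustrative)."""
    return {
        "model_version": model_version,
        "threshold": threshold,
        "target_precision": target_precision,
    }


def weekly_check(record, probs, labels, tolerance=0.05):
    """Compare realized precision at the logged threshold with the intended one."""
    preds = probs >= record["threshold"]
    flagged = int(preds.sum())
    # Precision is undefined when nothing was flagged.
    realized = float((labels[preds] == 1).mean()) if flagged else float("nan")
    drifted = flagged == 0 or realized < record["target_precision"] - tolerance
    return {"flagged": flagged, "realized_precision": realized, "drifted": drifted}


record = operating_point_record("clf-v3", 0.35, 0.70)
print(json.dumps(record))

# Hypothetical labeled outcomes for one week.
week_probs = np.array([0.9, 0.8, 0.6, 0.4, 0.2, 0.1])
week_labels = np.array([1, 0, 0, 1, 0, 0])
report = weekly_check(record, week_probs, week_labels)
print(report)
```

A `drifted` flag is a prompt to re-run threshold selection on recent data, not necessarily to retrain the model.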

Code

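Threshold that minimizes expected cost · python

import numpy as np

# Assumes y_val (0/1 labels) and probs (predicted positive-class probabilities)
# are NumPy arrays of equal length from a held-out validation set.
thresholds = np.linspace(0.01, 0.99, 99)
cost_fp, cost_fn = 5, 50  # cost per false positive / false negative, in business units
best_t, best_cost = 0.5, float("inf")
for t in thresholds:
    preds = (probs >= t).astype(int)
    fp = ((preds == 1) & (y_val == 0)).sum()
    fn = ((preds == 0) & (y_val == 1)).sum()
    cost = fp * cost_fp + fn * cost_fn  # total cost on the validation set
    if cost < best_cost:
        best_cost, best_t = cost, t
print(f"chosen threshold {best_t:.2f}  total validation cost {best_cost}")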

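Recall at a fixed precision · python

import numpy as np
from sklearn.metrics import precision_recall_curve

# Assumes y_val (0/1 labels) and probs (predicted probabilities) from a validation set.
precision, recall, thresholds = precision_recall_curve(y_val, probs)
# precision and recall have one more entry than thresholds; drop the last to align them.
ok = precision[:-1] >= 0.70
if ok.any():
    idx = np.argmax(recall[:-1][ok])  # highest recall among thresholds meeting the floor
    chosen = thresholds[ok][idx]
    print(f"threshold for precision≥0.70: {chosen:.3f} → recall {recall[:-1][ok][idx]:.3f}")
else:
    print("no threshold reaches precision 0.70 on this validation set")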

Exercise

Write down a cost matrix for a binary classifier in concrete units. Compute the cost-minimizing threshold and the recall-at-precision-0.7 threshold. Choose one as the operating point and document why.
