Error Analysis

~28 min · errors, diagnostics

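Errors are the cheapest data

Error analysis is the discipline of looking, in detail and with your own eyes, at what the model gets wrong. The highest-leverage activity in classical ML is to sample 50 errors after every training run and tag each one with a failure mode. Most teams skip it, and the teams that don't skip it ship reliably.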

A reusable triage tag set

Bad label: the ground truth itself is wrong.
Hard ambiguous: even a senior reviewer could argue it either way.
Missing feature: a feature the model needs is not in the dataset.
Distribution shift: the example does not look like the training data.
Genuine model error: the model simply gets a normal example wrong.

Follow-through

Each tag points to a different fix. Bad labels go back to labeling. A missing feature becomes a work ticket. Distribution shift triggers a coverage check. Only "genuine model error" justifies more model tuning. Tagging errors prevents months of fruitless hyperparameter search.
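The tag-to-fix mapping above can be kept next to the review sheet, so that every tagged error turns into a concrete next step. The following is a minimal sketch, not a prescribed implementation. The names `NEXT_STEP` and `route_errors` are hypothetical, the tag spellings are assumed to match the ones used in the exercise, and the text names no fix for "hard ambiguous", so that entry is a placeholder.

```python
from collections import Counter

# Next step per triage tag, following the text. Tag spellings are an assumption.
NEXT_STEP = {
    "bad-label": "send back to labeling",
    "ambiguous": "no fix named in the text; team decides",  # placeholder
    "missing-feature": "open a work ticket",
    "distribution-shift": "run a coverage check",
    "genuine-model-error": "consider more model tuning",
}


def route_errors(tags):
    """Return (tag, count, share, next_step) rows, most frequent tag first."""
    tags = list(tags)
    # Fail loudly on typos or blank cells instead of silently miscounting.
    unknown = sorted({str(t) for t in tags if t not in NEXT_STEP})
    if unknown:
        raise ValueError(f"unrecognized tags: {unknown}")
    counts = Counter(tags)
    return [
        (tag, n, n / len(tags), NEXT_STEP[tag])
        for tag, n in counts.most_common()
    ]
```

One way to use it is to pass in the tag column of the reviewed sheet, for example `route_errors(reviewed["tag"])` where `reviewed` is the CSV loaded with pandas after reviewers have added a `tag` column (the column name is an assumption).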

Code

Stratified sampling by error type · python

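import pandas as pd

# Assumes probs (predicted positive-class probabilities), X_val (DataFrame)
# and y_val (Series) come from the validation step of the training run.
preds = (probs >= 0.5).astype(int)
errors = X_val.assign(y_true=y_val.values, y_pred=preds, prob=probs)
errors = errors[errors["y_true"] != errors["y_pred"]]

# Cap each sample at the size of its own subset, not of all errors;
# otherwise sample() raises when a subset has fewer than 25 rows.
fp_all = errors[errors["y_pred"] == 1]  # false positives
fn_all = errors[errors["y_pred"] == 0]  # false negatives
fp = fp_all.sample(min(25, len(fp_all)), random_state=7)
fn = fn_all.sample(min(25, len(fn_all)), random_state=7)

# Shuffle so reviewers do not see all false positives first.
review = pd.concat([fp, fn]).sample(frac=1, random_state=7)
review.to_csv("errors_to_review.csv", index=False)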

Subgroup metrics to surface systemic errors · python

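import numpy as np
from sklearn.metrics import classification_report

# Assumes df_val holds the validation rows in the same order as y_val and preds.
y_true = np.asarray(y_val)
for plan in df_val["plan_tier"].unique():
    # Positional boolean mask, so a mismatched pandas index cannot misalign rows.
    mask = (df_val["plan_tier"] == plan).to_numpy()
    print(f"--- {plan} (n={mask.sum()}) ---")
    # zero_division=0 avoids warnings when a small subgroup lacks a class.
    print(classification_report(y_true[mask], preds[mask], digits=3, zero_division=0))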

Exercise

Sample 50 errors from the validation set. Tag each one as bad-label / ambiguous / missing-feature / distribution-shift / genuine-model-error. Report the tag distribution. Decide which fix has the largest expected lift.
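For the last step, one rough way to compare fixes is to multiply each tag's share of the sampled errors by a guess at how many of those errors the fix would actually remove. The sketch below assumes that definition of expected lift, which is not stated in the lesson. The function name `rank_fixes`, the tag counts, and the fix rates are all made up for illustration and should be replaced with your own tally and estimates.

```python
def rank_fixes(tag_counts, fix_rate):
    """Rank tags by the share of sampled errors their fix is expected to remove.

    tag_counts: dict of tag -> number of sampled errors with that tag
    fix_rate:   dict of tag -> guessed fraction of those errors the fix resolves
    """
    n_sampled = sum(tag_counts.values())
    if n_sampled == 0:
        return []
    ranked = [
        (tag, (n / n_sampled) * fix_rate.get(tag, 0.0))
        for tag, n in tag_counts.items()
    ]
    # Largest expected share of errors removed comes first.
    return sorted(ranked, key=lambda row: row[1], reverse=True)


# Made-up tally of 50 tagged errors and made-up fix rates, for illustration only.
example_counts = {
    "bad-label": 18,
    "ambiguous": 7,
    "missing-feature": 10,
    "distribution-shift": 5,
    "genuine-model-error": 10,
}
example_fix_rate = {
    "bad-label": 0.9,
    "ambiguous": 0.0,
    "missing-feature": 0.5,
    "distribution-shift": 0.6,
    "genuine-model-error": 0.2,
}
ranking = rank_fixes(example_counts, example_fix_rate)
```

The fix rates are the weak point of this estimate. Treat the ranking as a prompt for discussion rather than a measurement, and weigh it against the cost of each fix.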
