C.W.K.
Stream
Lesson 06 of 08 · published

Metrics and Evaluation

~12 min · metrics, accuracy, precision, recall, f1


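Loss is the gradient signal; the metric is the report card

You train on the loss and evaluate on a task-specific metric. They are not the same number, and the difference matters: the loss has to be differentiable so it can drive gradient updates, while the metric measures what you actually care about for the task.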
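To make the distinction concrete, here is a minimal sketch with two hand-made sets of logits (hypothetical values, not from any trained model). Both pick the correct class for every sample, so accuracy is identical, yet cross-entropy differs a lot because it also rewards confidence.

```python
import torch
import torch.nn.functional as F

targets = torch.tensor([0, 1, 1, 0])

# Two hypothetical sets of logits that select the same class for every sample.
confident = torch.tensor([[4.0, 0.0], [0.0, 4.0], [0.0, 4.0], [4.0, 0.0]])
hesitant = torch.tensor([[0.2, 0.0], [0.0, 0.2], [0.0, 0.2], [0.2, 0.0]])

def accuracy(logits, targets):
    # Fraction of samples whose argmax matches the target.
    return (logits.argmax(-1) == targets).float().mean().item()

loss_confident = F.cross_entropy(confident, targets).item()
loss_hesitant = F.cross_entropy(hesitant, targets).item()
acc_confident = accuracy(confident, targets)
acc_hesitant = accuracy(hesitant, targets)

print(f"confident: loss={loss_confident:.3f} acc={acc_confident:.2f}")
print(f"hesitant:  loss={loss_hesitant:.3f} acc={acc_hesitant:.2f}")
```

The loss can keep improving while the metric stays flat (and the reverse can happen too), which is why both are worth logging.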

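Classification metrics

  • Accuracy: the fraction of predictions that are correct. Easy to interpret, but close to useless under class imbalance, because always predicting the majority class already scores high.
  • Precision / Recall / F1: per-class metrics that answer two questions accuracy lumps together. Precision is the share of predicted positives that are actually positive; recall is the share of actual positives that were caught. F1 is their harmonic mean.
  • Top-K accuracy: the fraction of samples whose true class is among the top K predictions. Standard on ImageNet (top-1, top-5).
  • ROC AUC: the area under the receiver operating characteristic curve. Defined for binary classifiers; threshold-independent.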
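Two of the points in this list are easy to check by hand. The sketch below uses made-up data: a degenerate classifier on a 95/5 imbalanced binary problem, and a tiny logit matrix for top-K. The helper name `topk_accuracy` is an illustrative choice, not a library function.

```python
import torch

# Hypothetical imbalanced binary problem: 95 negatives, 5 positives.
targets = torch.cat([torch.zeros(95), torch.ones(5)]).long()
# A degenerate "classifier" that always predicts the negative class.
preds = torch.zeros(100, dtype=torch.long)

acc = (preds == targets).float().mean().item()
tp = ((preds == 1) & (targets == 1)).sum().item()
fp = ((preds == 1) & (targets != 1)).sum().item()
fn = ((preds != 1) & (targets == 1)).sum().item()
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
print(f"accuracy={acc:.2f} precision={precision:.2f} recall={recall:.2f}")

def topk_accuracy(logits, targets, k):
    # Indices of the k largest logits per row, shape (N, k).
    topk = logits.topk(k, dim=-1).indices
    # A sample counts as a hit if its target appears anywhere in its top k.
    hits = (topk == targets.unsqueeze(-1)).any(dim=-1)
    return hits.float().mean().item()

logits = torch.tensor([
    [0.1, 0.5, 0.3, 0.1],   # target 2: missed at top-1, found at top-2
    [0.7, 0.1, 0.1, 0.1],   # target 0: found at top-1
    [0.2, 0.3, 0.4, 0.1],   # target 3: missed at both
])
labels = torch.tensor([2, 0, 3])
top1 = topk_accuracy(logits, labels, 1)
top2 = topk_accuracy(logits, labels, 2)
print(f"top-1={top1:.3f} top-2={top2:.3f}")
```

The always-negative classifier reaches high accuracy while catching none of the positives. That is the failure that precision and recall expose.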

Regression metrics

  • MAE / RMSE: mean absolute error and root mean squared error. Both are expressed in the units of the target.
  • R²: the fraction of variance explained.
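As a minimal sketch, these regression metrics can be computed directly from tensors. The function name `regression_metrics` and the sample values are illustrative assumptions; R² here is the usual coefficient of determination, 1 - SS_res / SS_tot.

```python
import torch

def regression_metrics(preds, targets):
    err = preds - targets
    mae = err.abs().mean().item()              # mean absolute error
    rmse = err.pow(2).mean().sqrt().item()     # root mean squared error
    ss_res = err.pow(2).sum()                  # residual sum of squares
    ss_tot = (targets - targets.mean()).pow(2).sum()  # total sum of squares
    r2 = (1 - ss_res / ss_tot).item()          # fraction of variance explained
    return {"mae": mae, "rmse": rmse, "r2": r2}

# Hypothetical targets and predictions.
targets = torch.tensor([3.0, 5.0, 7.0, 9.0])
preds = torch.tensor([2.0, 5.0, 8.0, 9.0])
metrics = regression_metrics(preds, targets)
print(metrics)
```

RMSE penalizes large errors more heavily than MAE, so a gap between the two hints at a few large misses. Note that R² is undefined when all targets are equal (SS_tot is zero); this sketch does not guard against that case.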

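Compute metrics on validation data only, in eval mode, under inference_mode

In training mode, Dropout randomly zeroes activations and BatchNorm normalizes with batch statistics while updating its running statistics. Evaluating without model.eval() therefore distorts the metric and pollutes the model's state. Gradient tracking would also build an autograd graph that evaluation never uses.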
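The effect of both switches can be observed on a single layer. The following is a small sketch using a standalone `nn.BatchNorm1d` and a random batch (both chosen purely for illustration): it checks whether a forward pass changes the running mean in each mode, and whether a result computed under `torch.inference_mode()` tracks gradients.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4)
batch = torch.randn(8, 4) + 3.0   # batch mean far from the initial running mean of 0

# Training mode: a forward pass updates the running statistics.
bn.train()
before = bn.running_mean.clone()
bn(batch)
changed_in_train = not torch.equal(before, bn.running_mean)

# Eval mode: the running statistics are used but left untouched.
bn.eval()
before = bn.running_mean.clone()
bn(batch)
changed_in_eval = not torch.equal(before, bn.running_mean)

# inference_mode: results carry no autograd graph.
w = torch.ones(3, requires_grad=True)
with torch.inference_mode():
    y = (w * 2).sum()
tracks_grad = y.requires_grad

print(changed_in_train, changed_in_eval, tracks_grad)
```

Forgetting `model.eval()` during validation would therefore let validation batches leak into the BatchNorm statistics used later.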

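torchmetrics: when you'd rather not write it yourself

The torchmetrics package (installed separately) provides tested implementations of all the common metrics, including streaming variants that accumulate updates across batches. Rely on it for anything beyond accuracy.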
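To see why accumulating across batches matters, here is a hand-rolled sketch of the update/compute pattern. `RunningAccuracy` is a hypothetical class written for this illustration, not part of torchmetrics. It is compared with naively averaging per-batch accuracies when the batches have different sizes.

```python
import torch
import torch.nn.functional as F

class RunningAccuracy:
    """Hypothetical streaming accuracy: accumulate counts, divide once at the end."""
    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, logits, targets):
        self.correct += (logits.argmax(-1) == targets).sum().item()
        self.total += targets.numel()

    def compute(self):
        return self.correct / self.total if self.total else 0.0

# Batch 1: four samples, all predicted correctly (logits are one-hot targets).
targets1 = torch.tensor([0, 1, 2, 1])
logits1 = F.one_hot(targets1, num_classes=3).float()
# Batch 2: a single leftover sample, predicted wrongly (argmax is 0, target is 2).
targets2 = torch.tensor([2])
logits2 = torch.tensor([[1.0, 0.0, 0.0]])

metric = RunningAccuracy()
batch_accs = []
for logits, targets in [(logits1, targets1), (logits2, targets2)]:
    metric.update(logits, targets)
    batch_accs.append((logits.argmax(-1) == targets).float().mean().item())

streaming = metric.compute()                 # 4 correct out of 5 samples
naive = sum(batch_accs) / len(batch_accs)    # mean of per-batch accuracies
print(f"streaming={streaming:.2f} naive={naive:.2f}")
```

The naive average gives the one-sample final batch the same weight as the full batch. For metrics such as F1, which are not simple averages, per-batch averaging is wrong even when the batch sizes are equal, so accumulating the underlying counts is the safer pattern.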

Code

Accuracy and per-class precision/recall, by hand · python
import torch

def accuracy(logits, targets):
    return (logits.argmax(-1) == targets).float().mean().item()

def precision_recall_f1(logits, targets, num_classes):
    preds = logits.argmax(-1)
    out = {}
    for c in range(num_classes):
        tp = ((preds == c) & (targets == c)).sum().item()
        fp = ((preds == c) & (targets != c)).sum().item()
        fn = ((preds != c) & (targets == c)).sum().item()
        prec = tp / (tp + fp) if (tp + fp) else 0.0
        rec  = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        out[c] = {'precision': prec, 'recall': rec, 'f1': f1}
    return out

logits = torch.randn(64, 5)
targets = torch.randint(0, 5, (64,))
print(f"acc: {accuracy(logits, targets):.3f}")
print(precision_recall_f1(logits, targets, 5))
A complete evaluation function · python
import torch
import torch.nn as nn

def evaluate(model, val_loader, criterion, device):
    model.eval()
    total_loss = 0.0
    total_correct = 0
    total_samples = 0

    with torch.inference_mode():
        for x, y in val_loader:
            x, y = x.to(device), y.to(device)
            out = model(x)
            loss = criterion(out, y)

            total_loss += loss.item() * x.size(0)
            total_correct += (out.argmax(-1) == y).sum().item()
            total_samples += x.size(0)

    return total_loss / total_samples, total_correct / total_samples

# Usage (assumes model, val_loader, criterion, and device are already defined)
val_loss, val_acc = evaluate(model, val_loader, criterion, device)
print(f"val_loss={val_loss:.4f}  val_acc={val_acc:.4f}")
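torchmetrics, when you want tested implementations · python
# pip install torchmetrics
import torch
from torchmetrics.classification import (
    MulticlassAccuracy, MulticlassF1Score, MulticlassConfusionMatrix
)

n_classes = 10
# average='micro' gives plain accuracy (correct / total). In recent torchmetrics
# releases the default for MulticlassAccuracy is 'macro' (mean of per-class recall).
acc = MulticlassAccuracy(num_classes=n_classes, top_k=1, average='micro')
acc5 = MulticlassAccuracy(num_classes=n_classes, top_k=5, average='micro')
f1 = MulticlassF1Score(num_classes=n_classes, average='macro')
cm = MulticlassConfusionMatrix(num_classes=n_classes)

# Accumulate across batches (assumes model and val_loader are already defined
# and that data, model, and metric objects are on the same device)
model.eval()
with torch.inference_mode():
    for x, y in val_loader:
        logits = model(x)
        acc.update(logits, y)
        acc5.update(logits, y)
        f1.update(logits, y)
        cm.update(logits, y)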

print(f"top-1: {acc.compute():.4f}")
print(f"top-5: {acc5.compute():.4f}")
print(f"macro-F1: {f1.compute():.4f}")
print("Confusion matrix:")
print(cm.compute())

Exercise

Add a confusion-matrix printout to the evaluation function in the second code block, using torchmetrics.classification.MulticlassConfusionMatrix. Train a 5-class model and inspect the matrix: a diagonal-heavy matrix means good predictions, and off-diagonal hot spots show which pairs of classes are commonly confused.
