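Metrics and Evaluation

Loss is the gradient signal; the metric is the report card

You train on the loss. You evaluate on a task-specific metric. They are not the same number, and the difference matters: the loss has to be differentiable so it can drive gradient descent, while a metric such as accuracy is not differentiable but measures what you actually care about.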
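To make the gap between the two numbers concrete, here is a minimal sketch with two hypothetical sets of logits (`logits_a`, `logits_b` are made up for illustration). Both give identical predictions and therefore identical accuracy, yet their cross-entropy losses differ a lot, because the loss also rewards confidence.

```python
import torch
import torch.nn.functional as F

targets = torch.tensor([0, 1, 2, 1])

# Hypothetical model A: right on every sample, but only by a small margin.
logits_a = torch.tensor([
    [0.6, 0.5, 0.5],
    [0.5, 0.6, 0.5],
    [0.5, 0.5, 0.6],
    [0.5, 0.6, 0.5],
])

# Hypothetical model B: same ranking of classes, much larger margins.
# Scaling by a positive constant never changes the argmax.
logits_b = logits_a * 20.0

def accuracy(logits, targets):
    return (logits.argmax(-1) == targets).float().mean().item()

loss_a = F.cross_entropy(logits_a, targets).item()
loss_b = F.cross_entropy(logits_b, targets).item()
acc_a = accuracy(logits_a, targets)
acc_b = accuracy(logits_b, targets)

print(f"A: loss={loss_a:.4f}  acc={acc_a:.2f}")
print(f"B: loss={loss_b:.4f}  acc={acc_b:.2f}")
```

The accuracy is the same for both, while the loss of B is far lower than that of A. The reverse also happens in practice: the loss can keep improving while the metric stays flat, which is why both are worth tracking.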

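Classification metrics

Accuracy: the fraction of predictions that are correct. Easy to interpret, but misleading under class imbalance: with a 95/5 class split, always predicting the majority class already scores 95%.
Precision / Recall / F1: per-class metrics that break performance down into "of everything predicted positive, how much is actually positive" (precision) and "of all actual positives, how much was caught" (recall). F1 is their harmonic mean.
Top-K accuracy: the fraction of samples whose true class is among the top K predictions. Standard on ImageNet (top-1, top-5).
ROC AUC: the area under the receiver operating characteristic curve. For binary classifiers; threshold-independent.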
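Top-K accuracy and ROC AUC from the list above can be sketched by hand in a few lines. The helper names `topk_accuracy` and `binary_roc_auc` are illustrative, not library functions, and the AUC version uses the pairwise-ranking interpretation (the probability that a random positive outscores a random negative, ties counted as half), which needs memory proportional to positives times negatives and so only suits small sets.

```python
import torch

def topk_accuracy(logits, targets, k):
    """Fraction of samples whose true class is among the k highest-scoring classes."""
    topk = logits.topk(k, dim=-1).indices               # (N, k) class indices
    hit = (topk == targets.unsqueeze(-1)).any(dim=-1)   # (N,) True where the target is in the top k
    return hit.float().mean().item()

def binary_roc_auc(scores, labels):
    """ROC AUC via pairwise comparison of positive and negative scores."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos.unsqueeze(1) - neg.unsqueeze(0)          # (P, N) all positive-vs-negative pairs
    wins = (diff > 0).float().sum() + 0.5 * (diff == 0).float().sum()
    return (wins / (pos.numel() * neg.numel())).item()

logits = torch.tensor([
    [2.0, 1.0, 0.1, 0.0],   # true class 0: hit at top-1
    [0.2, 3.0, 2.5, 0.1],   # true class 2: hit only at top-2
    [1.0, 0.5, 0.2, 0.1],   # true class 3: miss even at top-2
])
targets = torch.tensor([0, 2, 3])
top1 = topk_accuracy(logits, targets, k=1)
top2 = topk_accuracy(logits, targets, k=2)

scores = torch.tensor([0.9, 0.8, 0.4, 0.3, 0.2])   # predicted probability of the positive class
labels = torch.tensor([1, 0, 1, 0, 0])
auc = binary_roc_auc(scores, labels)

print(f"top-1={top1:.3f}  top-2={top2:.3f}  auc={auc:.3f}")
```

Note that the AUC is computed from the raw scores without ever choosing a decision threshold, which is what "threshold-independent" means here.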

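Regression metrics

MAE / RMSE: mean absolute error and root mean squared error. Both are in the units of the target; RMSE weights large errors more heavily.
R²: the fraction of variance explained. 1 is a perfect fit, 0 is no better than always predicting the mean of the targets.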
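The three regression metrics are short enough to write out directly. The sketch below uses a hypothetical helper `regression_metrics` and a toy example with one large error, to show how that error affects RMSE more than MAE and how R² can even go negative when a model does worse than predicting the mean.

```python
import torch

def regression_metrics(preds, targets):
    """MAE, RMSE, and R² for 1-D tensors of predictions and targets.

    R² is undefined when all targets are identical (zero total variance);
    this sketch does not guard against that case.
    """
    err = preds - targets
    mae = err.abs().mean()
    rmse = err.pow(2).mean().sqrt()
    ss_res = err.pow(2).sum()                           # residual sum of squares
    ss_tot = (targets - targets.mean()).pow(2).sum()    # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {'mae': mae.item(), 'rmse': rmse.item(), 'r2': r2.item()}

targets = torch.tensor([1.0, 2.0, 3.0, 4.0])
preds = torch.tensor([1.0, 2.0, 3.0, 8.0])   # three exact predictions, one off by 4

m = regression_metrics(preds, targets)
# Baseline: always predict the mean of the targets.
baseline = regression_metrics(targets.mean().expand_as(targets), targets)

print(m)
print(baseline)
```

Here the model and the mean baseline have the same MAE, but the single large error doubles the model's RMSE and pushes its R² below zero, while the baseline sits at R² = 0.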

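Compute metrics on validation data only, in eval mode, under inference_mode

You don't want training-mode behavior to leak into metric computation: in train mode, Dropout randomly zeroes activations and BatchNorm normalizes with batch statistics while updating its running statistics. You also don't want to build an autograd graph you will never use.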
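A small sketch of what these two switches actually change, using a made-up toy model with BatchNorm and Dropout (the layer sizes are arbitrary). It checks that a train-mode forward pass moves the BatchNorm running statistics, while an eval-mode pass under `torch.inference_mode()` leaves them alone, is deterministic, and produces outputs that do not track gradients.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy model, for illustration only.
model = nn.Sequential(
    nn.Linear(8, 8),
    nn.BatchNorm1d(8),
    nn.Dropout(p=0.5),
    nn.Linear(8, 3),
)
x = torch.randn(16, 8)
bn = model[1]

# Train mode: a plain forward pass already updates BatchNorm running stats.
before = bn.running_mean.clone()
model.train()
_ = model(x)
bn_changed_in_train = not torch.equal(before, bn.running_mean)

# Eval mode + inference_mode: stats frozen, Dropout off, no autograd graph.
model.eval()
frozen = bn.running_mean.clone()
with torch.inference_mode():
    out1 = model(x)
    out2 = model(x)
bn_changed_in_eval = not torch.equal(frozen, bn.running_mean)
deterministic = torch.equal(out1, out2)
tracks_grad = out1.requires_grad

model.train()  # switch back before resuming training

print(bn_changed_in_train, bn_changed_in_eval, deterministic, tracks_grad)
```

`model.eval()` and `torch.inference_mode()` are independent: the first changes layer behavior, the second disables gradient tracking. Evaluation wants both.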

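torchmetrics: when you don't want to write it yourself

The torchmetrics package (installed separately) provides tested implementations of the common metrics, including streaming variants that accumulate updates across batches. Rely on it for anything beyond accuracy.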

Code

Accuracy and per-class precision/recall, by hand·python

import torch

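def accuracy(logits, targets):
    """Fraction of samples whose highest-scoring class equals the target."""
    return (logits.argmax(-1) == targets).float().mean().item()

def precision_recall_f1(logits, targets, num_classes):
    """Per-class precision, recall, and F1, treating each class as 'positive' in turn."""
    preds = logits.argmax(-1)
    out = {}
    for c in range(num_classes):
        tp = ((preds == c) & (targets == c)).sum().item()  # predicted c, actually c
        fp = ((preds == c) & (targets != c)).sum().item()  # predicted c, actually something else
        fn = ((preds != c) & (targets == c)).sum().item()  # actually c, predicted something else
        # Fall back to 0.0 when a denominator is zero (class never predicted or never present).
        prec = tp / (tp + fp) if (tp + fp) else 0.0
        rec  = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        out[c] = {'precision': prec, 'recall': rec, 'f1': f1}
    return out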

logits = torch.randn(64, 5)
targets = torch.randint(0, 5, (64,))
print(f"acc: {accuracy(logits, targets):.3f}")
print(precision_recall_f1(logits, targets, 5))

A complete evaluation function·python

import torch
import torch.nn as nn

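def evaluate(model, val_loader, criterion, device):
    """Return (mean loss, accuracy) over val_loader.

    Leaves the model in eval mode; call model.train() before resuming training.
    """
    model.eval()
    total_loss = 0.0
    total_correct = 0
    total_samples = 0

    with torch.inference_mode():
        for x, y in val_loader:
            x, y = x.to(device), y.to(device)
            out = model(x)
            loss = criterion(out, y)

            # loss is the batch mean (assuming reduction='mean'), so weight it
            # by batch size to get a correct average when the last batch is smaller.
            total_loss += loss.item() * x.size(0)
            total_correct += (out.argmax(-1) == y).sum().item()
            total_samples += x.size(0)

    return total_loss / total_samples, total_correct / total_samples

# Usage (assumes model, val_loader, criterion, and device are already defined)
val_loss, val_acc = evaluate(model, val_loader, criterion, device)
print(f"val_loss={val_loss:.4f}  val_acc={val_acc:.4f}")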

torchmetrics, when you want tested implementations·python

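# pip install torchmetrics
import torch
from torchmetrics.classification import (
    MulticlassAccuracy, MulticlassF1Score, MulticlassConfusionMatrix
)

# Assumes model, val_loader, and device are already defined.
n_classes = 10
# average='micro' gives plain accuracy; MulticlassAccuracy defaults to 'macro'
# (the mean of per-class accuracies), which differs under class imbalance.
acc = MulticlassAccuracy(num_classes=n_classes, top_k=1, average='micro').to(device)
acc5 = MulticlassAccuracy(num_classes=n_classes, top_k=5, average='micro').to(device)
f1 = MulticlassF1Score(num_classes=n_classes, average='macro').to(device)
cm = MulticlassConfusionMatrix(num_classes=n_classes).to(device)

# Accumulate across batches, in eval mode and without building an autograd graph
model.eval()
with torch.inference_mode():
    for x, y in val_loader:
        x, y = x.to(device), y.to(device)
        logits = model(x)
        acc.update(logits, y)
        acc5.update(logits, y)
        f1.update(logits, y)
        cm.update(logits, y)

print(f"top-1: {acc.compute():.4f}")
print(f"top-5: {acc5.compute():.4f}")
print(f"macro-F1: {f1.compute():.4f}")
print("Confusion matrix:")
print(cm.compute())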
