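Loss Functions and What They Expect

Pick the loss that fits the task, and read its input contract.

Most "the model isn't learning" stories trace back to a mismatch between the loss function and the model output. Once the mismatch is found, the fix is usually a one-line change.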

Classification

nn.CrossEntropyLoss: multi-class. Input: raw logits of shape (N, C); target: int64 class indices of shape (N,). It combines log_softmax and NLL internally for numerical stability, so do not apply softmax beforehand.
nn.BCEWithLogitsLoss: binary or multi-label. Input: raw logits; target: floats in {0, 1} with the same shape as the input. It combines sigmoid and BCE for stability, so do not apply sigmoid beforehand.
nn.NLLLoss: multi-class, but expects log-probabilities instead of logits. Use it only when log_softmax has already been applied.
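The "sigmoid + BCE" contract can be checked numerically. The sketch below is illustrative only (the tensor names and the fixed seed are assumptions, not part of the original). It compares the fused loss on raw logits with the two-step version, and shows what happens when sigmoid is applied before a loss that already applies it.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # assumption: fixed seed so the run is repeatable

logits = torch.randn(8, 1)                      # raw model outputs, no sigmoid
targets = torch.randint(0, 2, (8, 1)).float()   # float targets in {0, 1}

# Fused version: takes raw logits directly
fused = nn.BCEWithLogitsLoss()(logits, targets)

# Two-step version: sigmoid first, then plain BCE on probabilities
two_step = nn.BCELoss()(torch.sigmoid(logits), targets)

# Mistake: sigmoid applied before the fused loss, so sigmoid runs twice
double_sigmoid = nn.BCEWithLogitsLoss()(torch.sigmoid(logits), targets)

print(f"fused:          {fused.item():.6f}")
print(f"two-step:       {two_step.item():.6f}")
print(f"double sigmoid: {double_sigmoid.item():.6f}")
print("fused matches two-step:", torch.allclose(fused, two_step, atol=1e-5))
```

For moderate logits the first two values agree. The third does not, and no error is raised, which is why this kind of mismatch tends to show up only as poor training.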

Regression

nn.MSELoss: mean squared error. The regression default. Squaring the error penalizes outliers heavily.
nn.L1Loss: mean absolute error. More robust to outliers.
nn.SmoothL1Loss / nn.HuberLoss: L2 near zero, L1 in the tails. The "best of both" for noisy regression. With beta = delta = 1 the two are identical.
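A small worked example makes the outlier behavior visible. This is a minimal sketch on hand-picked residuals (the values 0.1, 0.2, and 10.0 are assumptions chosen for illustration): two small errors and one outlier, scored by each loss.

```python
import torch
import torch.nn as nn

# Three residuals: two small, one outlier (illustrative values)
pred = torch.tensor([0.1, 0.2, 10.0])
target = torch.zeros(3)

mse = nn.MSELoss()(pred, target)                  # (0.01 + 0.04 + 100) / 3
l1 = nn.L1Loss()(pred, target)                    # (0.1 + 0.2 + 10) / 3
smooth = nn.SmoothL1Loss(beta=1.0)(pred, target)  # (0.005 + 0.02 + 9.5) / 3
huber = nn.HuberLoss(delta=1.0)(pred, target)     # same formula when delta == beta == 1

print(f"MSE:      {mse.item():.4f}")     # about 33.35, dominated by the outlier
print(f"L1:       {l1.item():.4f}")      # about 3.43
print(f"SmoothL1: {smooth.item():.4f}")  # about 3.18
print(f"Huber:    {huber.item():.4f}")   # about 3.18
```

The single outlier accounts for almost all of the MSE value, while L1 and SmoothL1 grow only linearly with it.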

Less common but useful

nn.KLDivLoss: KL divergence between two distributions. Used in knowledge distillation. Input: log-probabilities; target: probabilities by default.
nn.CosineEmbeddingLoss: similarity-based learning (face verification, embedding similarity).
nn.TripletMarginLoss: metric learning on anchor/positive/negative triplets.
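These three losses have less obvious input contracts, so a call-shape sketch may help. Everything below is illustrative: the tensor names, batch size, embedding size, and random data are assumptions, and the "student" and "teacher" logits are stand-ins for real model outputs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # assumption: fixed seed so the run is repeatable

# KLDivLoss: input is LOG-probabilities, target is probabilities (default log_target=False)
student_logits = torch.randn(4, 5)   # hypothetical student outputs
teacher_logits = torch.randn(4, 5)   # hypothetical teacher outputs
kl_loss = nn.KLDivLoss(reduction="batchmean")(
    F.log_softmax(student_logits, dim=1),
    F.softmax(teacher_logits, dim=1),
)

# CosineEmbeddingLoss: two (N, D) batches plus a target of +1 (similar) or -1 (dissimilar)
a = torch.randn(4, 8)
b = torch.randn(4, 8)
pair_labels = torch.tensor([1.0, 1.0, -1.0, -1.0])
cos_loss = nn.CosineEmbeddingLoss()(a, b, pair_labels)

# TripletMarginLoss: three (N, D) batches and no label tensor
anchor = torch.randn(4, 8)
positive = anchor + 0.1 * torch.randn(4, 8)   # close to the anchor
negative = torch.randn(4, 8)                  # unrelated embedding
triplet_loss = nn.TripletMarginLoss(margin=1.0)(anchor, positive, negative)

print(f"KL:      {kl_loss.item():.4f}")
print(f"cosine:  {cos_loss.item():.4f}")
print(f"triplet: {triplet_loss.item():.4f}")
```

Passing probabilities instead of log-probabilities as the KLDivLoss input is the same class of mistake as applying softmax before CrossEntropyLoss: it runs without error and returns a wrong value.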

Code

Multi-class classification — CrossEntropyLoss·python

import torch
import torch.nn as nn

# Model outputs raw logits, NOT softmax probabilities
model = nn.Linear(10, 5)               # 5 classes
criterion = nn.CrossEntropyLoss()

x = torch.randn(16, 10)
targets = torch.randint(0, 5, (16,))   # int64 class indices

logits = model(x)
loss = criterion(logits, targets)
print(loss.item())                      # scalar

Binary / multi-label — BCEWithLogitsLoss·python

import torch
import torch.nn as nn

# Binary classifier — single logit per sample
model_bin = nn.Linear(10, 1)
criterion = nn.BCEWithLogitsLoss()

x = torch.randn(16, 10)
y_bin = torch.randint(0, 2, (16, 1)).float()   # MUST be float, not int
loss = criterion(model_bin(x), y_bin)

# Multi-label — N binary outputs per sample
model_ml = nn.Linear(10, 5)             # 5 independent binary labels
y_ml = torch.randint(0, 2, (16, 5)).float()
loss_ml = criterion(model_ml(x), y_ml)
print(loss_ml.item())

Regression: choose by outlier behavior·python

import torch
import torch.nn as nn

x = torch.randn(16, 10)
y = torch.randn(16, 2)
model = nn.Linear(10, 2)

mse = nn.MSELoss()                      # default; squared, sensitive to outliers
l1  = nn.L1Loss()                       # robust to outliers; constant-magnitude gradient, kink at zero
huber = nn.SmoothL1Loss(beta=1.0)       # L2 near zero, L1 in tails

for name, fn in [('MSE', mse), ('L1', l1), ('Huber', huber)]:
    print(f"{name}: {fn(model(x), y).item():.4f}")

Class weights (weight=): handling imbalanced classes·python

import torch
import torch.nn as nn

# Suppose class 0 is 9x more common than class 1
# Heavier weight on the rare class so the loss cares more about it
weights = torch.tensor([1.0, 9.0])      # one weight per class
criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(16, 2)
targets = torch.cat([torch.zeros(14, dtype=torch.long), torch.ones(2, dtype=torch.long)])
loss = criterion(logits, targets)
print(loss.item())
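It helps to know what the weighted loss actually computes. With the default reduction="mean", the weighted sum is divided by the total weight of the targets in the batch, not by the batch size. The sketch below reproduces the built-in value by hand; the names per_sample, w, and manual are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # assumption: fixed seed so the run is repeatable

weights = torch.tensor([1.0, 9.0])
logits = torch.randn(16, 2)
targets = torch.cat([torch.zeros(14, dtype=torch.long), torch.ones(2, dtype=torch.long)])

builtin = nn.CrossEntropyLoss(weight=weights)(logits, targets)

# Manual version: per-sample loss, scaled by the weight of each sample's true class
per_sample = F.cross_entropy(logits, targets, reduction="none")  # shape (16,)
w = weights[targets]                                             # shape (16,)
manual = (w * per_sample).sum() / w.sum()   # divide by total weight, not by 16

print(f"built-in: {builtin.item():.6f}")
print(f"manual:   {manual.item():.6f}")
```

One consequence is that the weighted loss is not directly comparable to the unweighted one, so a change in the logged loss after adding weights does not by itself say anything about model quality.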

Exercise

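Build a classifier with output shape (B, 5). Apply both of the following to the same input and target: log_softmax followed by NLLLoss, and raw logits with CrossEntropyLoss. Verify that both give the same loss value (they are equivalent). Then verify that softmax followed by CrossEntropyLoss produces a different, wrong value because softmax is applied twice. The wrong value can be larger or smaller than the true loss, since the second softmax flattens the distribution.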
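One possible way to run the check is sketched below. It uses log_softmax before NLLLoss, since NLLLoss expects log-probabilities. The batch size, input width, seed, and variable names are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # assumption: fixed seed so the run is repeatable

B, C = 16, 5
model = nn.Linear(10, C)             # classifier with output shape (B, 5)
x = torch.randn(B, 10)
targets = torch.randint(0, C, (B,))  # int64 class indices
logits = model(x)

# Correct: raw logits into CrossEntropyLoss
ce = nn.CrossEntropyLoss()(logits, targets)

# Correct and equivalent: log_softmax, then NLLLoss
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)

# Wrong: softmax first, then CrossEntropyLoss applies log_softmax again
double = nn.CrossEntropyLoss()(F.softmax(logits, dim=1), targets)

print(f"CrossEntropyLoss(logits):       {ce.item():.6f}")
print(f"NLLLoss(log_softmax(logits)):   {nll.item():.6f}")
print(f"CrossEntropyLoss(softmax(...)): {double.item():.6f}")
print("first two match:", torch.allclose(ce, nll, atol=1e-6))
```

Because the inputs to the second softmax all lie in [0, 1], the double-softmax loss with 5 classes stays between roughly 0.9 and 1.9 however right or wrong the model is, so it barely responds to training.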