Loss Function

Matching the loss to the task

The loss is the contract between the data and the optimizer. Pick the wrong one and the model learns the wrong thing, even when accuracy looks fine on the surface.

Multi-class classification: nn.CrossEntropyLoss. Takes raw logits and expects integer class labels.
Multi-label classification: nn.BCEWithLogitsLoss. One sigmoid per label, one BCE term per label.
Binary classification: nn.BCEWithLogitsLoss on a single output.
Regression (squared error): nn.MSELoss. Sensitive to outliers.
Regression (robust): nn.SmoothL1Loss or nn.HuberLoss.
Ranking / similarity: triplet loss, contrastive loss, InfoNCE.
Sequence generation: token-level cross-entropy, optionally with label smoothing.
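The multi-label row is the one people most often get wrong on shapes, so here is a minimal sketch of it. The batch size (8) and label count (5) are made up for illustration. It checks that nn.BCEWithLogitsLoss matches an explicit per-label sigmoid followed by per-label BCE.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical batch: 8 examples, 5 independent labels each.
logits_ml = torch.randn(8, 5)                     # raw scores, no sigmoid applied
targets_ml = torch.randint(0, 2, (8, 5)).float()  # multi-hot targets, float dtype

# Fused version: takes logits directly.
loss_ml = nn.BCEWithLogitsLoss()(logits_ml, targets_ml)

# The same quantity written out: sigmoid per label, BCE per label, then the mean.
probs = torch.sigmoid(logits_ml)
manual = F.binary_cross_entropy(probs, targets_ml)

print(loss_ml.item(), manual.item())
```

The two numbers should agree up to floating-point error. The fused version is the one to use in training because it is more numerically stable for large logits.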

Tip: If accuracy on the validation split looks fine but the model behaves strangely on edge cases, suspect the loss before you suspect the model. The loss is what training optimizes; accuracy is only one report on the result.

Cross-entropy is the workhorse

For a single example with logits z and true class c, cross-entropy is -log(softmax(z)[c]). It pushes the logit of the correct class up and the logits of the wrong classes down, and its gradient with respect to z is beautifully clean: softmax(z) - one_hot(c).
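Both claims above can be checked with autograd. This is a small sketch: the class count (10) and the true class index (3) are arbitrary choices for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

z = torch.randn(10, requires_grad=True)  # logits for one example, 10 classes
c = 3                                    # true class index (arbitrary)

# Cross-entropy by the formula: -log(softmax(z)[c])
manual_loss = -torch.log(torch.softmax(z, dim=0)[c])

# The built-in expects a batch dimension, so add one.
builtin_loss = F.cross_entropy(z.unsqueeze(0), torch.tensor([c]))

# Gradient from autograd vs. the closed form softmax(z) - one_hot(c)
builtin_loss.backward()
one_hot_c = F.one_hot(torch.tensor(c), num_classes=10).float()
closed_form = torch.softmax(z.detach(), dim=0) - one_hot_c

print(manual_loss.item(), builtin_loss.item())
print(torch.allclose(z.grad, closed_form, atol=1e-6))
```

The gradient entries sum to zero: the true class gets a negative entry (its logit is pushed up under gradient descent) and every other class gets a positive one.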

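With balanced classes, use plain cross-entropy. With imbalanced classes, pass a per-class weight= or use focal loss, which down-weights easy examples. With a very large vocabulary (LLM scale), when the full softmax is too expensive, sample a subset of negatives instead of computing it over every class.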
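torch.nn has no built-in multi-class focal loss, so the following is a hand-written sketch rather than a library API. The function name focal_loss and the default gamma=2.0 are assumptions made for illustration. The idea is to scale each example's cross-entropy by (1 - p_t)^gamma, where p_t is the predicted probability of the true class.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """Multi-class focal loss sketch: mean of (1 - p_t)^gamma * CE."""
    ce = F.cross_entropy(logits, targets, reduction="none")  # per-example -log(p_t)
    p_t = torch.exp(-ce)                                     # probability of the true class
    return ((1.0 - p_t) ** gamma * ce).mean()

torch.manual_seed(0)
logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))

plain = F.cross_entropy(logits, labels)
focal = focal_loss(logits, labels)
focal_gamma0 = focal_loss(logits, labels, gamma=0.0)  # modulating factor becomes 1

print(plain.item(), focal.item(), focal_gamma0.item())
```

With gamma=0 the sketch reduces to plain cross-entropy. With gamma > 0, examples the model already gets right (p_t close to 1) contribute almost nothing.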

Rule: read the docstring of every loss you use. The 'reduction' argument (mean/sum/none) is behind half of the silent metric bugs in junior code.
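A quick way to see what 'reduction' does is to compute all three variants on the same batch. In this sketch the tensors are random and the class weights are made up. The last check shows the detail that usually causes the bug: with weight= set, "mean" divides by the sum of the target weights, not by the batch size.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))

per_example = nn.CrossEntropyLoss(reduction="none")(logits, labels)  # shape (8,)
mean_loss = nn.CrossEntropyLoss(reduction="mean")(logits, labels)    # scalar, the default
sum_loss = nn.CrossEntropyLoss(reduction="sum")(logits, labels)      # scalar, scales with batch size

print(per_example.shape)
print(torch.allclose(per_example.mean(), mean_loss))
print(torch.allclose(per_example.sum(), sum_loss))

# With class weights, "none" returns already-weighted per-example losses,
# and "mean" normalizes by the sum of the weights of the targets.
weights = torch.tensor([2.0] * 5 + [1.0] * 5)
weighted_none = nn.CrossEntropyLoss(weight=weights, reduction="none")(logits, labels)
weighted_mean = nn.CrossEntropyLoss(weight=weights, reduction="mean")(logits, labels)
expected = weighted_none.sum() / weights[labels].sum()
print(torch.allclose(expected, weighted_mean))
```

If you log a loss computed with "sum" next to one computed with "mean", the curves differ by a factor of the batch size, which is easy to misread as a training problem.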

Class weights, label smoothing, and friends

With imbalanced data, pass weight=class_weights. With an overconfident model, soften the targets with label_smoothing=0.1. For a multi-label task, per-label BCE is the right shape. These knobs sit on top of the basic loss: they mostly change calibration and rarely change which loss you pick.
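To make "softer targets" concrete, this sketch rebuilds label smoothing by hand and compares it to the built-in argument. It assumes no class weights are in play. The names eps and soft_targets are illustrative only.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_classes, eps = 10, 0.1
logits = torch.randn(8, num_classes)
labels = torch.randint(0, num_classes, (8,))

# Built-in label smoothing.
builtin = F.cross_entropy(logits, labels, label_smoothing=eps)

# Softened targets: (1 - eps) on the true class plus eps spread uniformly over all classes.
soft_targets = (1.0 - eps) * F.one_hot(labels, num_classes).float() + eps / num_classes

# Cross-entropy against a full target distribution instead of a single index.
manual = -(soft_targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

print(builtin.item(), manual.item())
```

Each row of soft_targets still sums to 1, so the loss is still a cross-entropy; the model is just no longer rewarded for driving the true-class probability all the way to 1.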

Code

Loss functions for the four common shapes·python

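import torch
import torch.nn as nn

# 1. Multi-class: raw logits of shape (batch, classes), integer labels of shape (batch,)
logits_mc = torch.randn(8, 10)
labels_mc = torch.randint(0, 10, (8,))
print(nn.CrossEntropyLoss()(logits_mc, labels_mc).item())

# 2. Multi-class with per-class weights and label smoothing
weights = torch.tensor([2.0] * 5 + [1.0] * 5)  # one weight per class
loss_mc_balanced = nn.CrossEntropyLoss(weight=weights, label_smoothing=0.1)
print(loss_mc_balanced(logits_mc, labels_mc).item())

# 3. Binary: one logit per example, float targets with the same shape as the logits
logits_bc = torch.randn(8, 1)
labels_bc = torch.randint(0, 2, (8, 1)).float()
print(nn.BCEWithLogitsLoss()(logits_bc, labels_bc).item())

# 4. Regression: squared error, then the robust alternative
preds = torch.randn(8, 1)
target = torch.randn(8, 1)
print(nn.MSELoss()(preds, target).item())
print(nn.SmoothL1Loss()(preds, target).item())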
