Evaluation Mode

Train mode and eval mode compute different functions

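Some PyTorch layers, most importantly Dropout and BatchNorm, behave differently depending on whether the model is in training or evaluation mode. (LayerNorm, with or without affine=True, behaves identically in both.) Dropout zeroes random units during training and is the identity in eval. BatchNorm normalizes with the current batch's statistics during training (updating its running estimates as it goes) and with the stored running statistics in eval. Forget to switch, and your metrics become silently noisy or wrong.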
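The snippet below is a small sketch that makes the mode-dependent behavior visible on toy inputs; the tensor shapes and seed are arbitrary choices for illustration. It shows Dropout zeroing and rescaling in train mode but passing inputs through in eval mode, BatchNorm switching from batch statistics to running statistics, and LayerNorm producing the same output in both modes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

# Dropout: random zeroing (survivors scaled by 1 / (1 - p)) in train, identity in eval
x = torch.ones(4, 8)
drop = nn.Dropout(p=0.5)
drop.train()
y_train = drop(x)  # entries are 0.0 or 2.0
drop.eval()
y_eval = drop(x)   # returns x unchanged

# BatchNorm: batch statistics in train, running statistics in eval
bn = nn.BatchNorm1d(8)
batch = torch.randn(16, 8) * 3 + 5
bn.train()
out_train = bn(batch)  # normalized per column with this batch's mean/var; running stats updated
bn.eval()
out_eval = bn(batch)   # normalized with bn.running_mean / bn.running_var instead

# LayerNorm: per-sample statistics, so train and eval give the same result
ln = nn.LayerNorm(8)
ln.train()
ln_train = ln(batch)
ln.eval()
ln_eval = ln(batch)

print(set(y_train.unique().tolist()))    # a subset of {0.0, 2.0}
print(torch.equal(y_eval, x))            # True
print(torch.allclose(out_train, out_eval))  # False
print(torch.equal(ln_train, ln_eval))    # True
```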

Switch with model.train() and model.eval(). Both are recursive: calling them on the top-level module switches every submodule.
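A quick way to check the recursion is to inspect the `training` flag on every module after switching. The nested model below is a hypothetical toy; `model.modules()` yields the top-level module itself plus all descendants. (`model.eval()` is equivalent to `model.train(False)`.)

```python
import torch.nn as nn

# Hypothetical toy model with a nested Sequential to show the switch reaches every level
model = nn.Sequential(
    nn.Linear(4, 8),
    nn.Sequential(nn.Dropout(0.1), nn.BatchNorm1d(8)),
    nn.Linear(8, 2),
)

model.eval()
eval_flags = [m.training for m in model.modules()]   # includes the top-level module
model.train()
train_flags = [m.training for m in model.modules()]

print(eval_flags)   # [False, False, False, False, False, False]
print(train_flags)  # [True, True, True, True, True, True]
```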

Tip: build the habit. Every training loop starts with model.train(), and every validation loop starts with model.eval(), even if you "know" you set it correctly five lines earlier.
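Here is a minimal sketch of that habit, assuming a plain classification setup. The helper names `train_one_epoch` and `validate`, the toy data, and the hyperparameters are all hypothetical; the point is only that each loop sets its own mode at the top instead of trusting whatever state the model was left in.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train_one_epoch(model, loader, optimizer, loss_fn):
    model.train()  # always set at the top of the training loop
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()

@torch.inference_mode()
def validate(model, loader, loss_fn):
    model.eval()  # always set at the top of the validation loop
    total, n = torch.zeros(()), 0
    for xb, yb in loader:
        total += loss_fn(model(xb), yb) * yb.size(0)  # mean loss -> per-batch sum
        n += yb.size(0)
    return (total / n).item()  # one conversion at the end

# Toy usage (hypothetical data): 2-class problem with a Dropout layer in the model
torch.manual_seed(0)
X = torch.randn(64, 4)
y = (X[:, 0] > 0).long()
loader = DataLoader(TensorDataset(X, y), batch_size=16)
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Dropout(0.2), nn.Linear(16, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(3):
    train_one_epoch(model, loader, opt, loss_fn)
    val_loss = validate(model, loader, loss_fn)
    print(epoch, val_loss)
```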

Pair eval mode with no_grad/inference_mode

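Evaluation doesn't need gradients, and computing them wastes memory and time. Note that model.eval() does not turn off autograd; it only changes how layers behave. Wrap evaluation in torch.inference_mode() (newer, faster) or torch.no_grad() (older, equivalent for most use cases).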
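The sketch below uses a throwaway `nn.Linear` to show that eval mode alone still builds an autograd graph, while `no_grad` and `inference_mode` do not. One caveat worth knowing: tensors created under `inference_mode` cannot later be used in operations that autograd records, so prefer `no_grad` if those outputs must feed into a graph afterwards.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # throwaway model for illustration
x = torch.randn(3, 4)

model.eval()
out_eval_only = model(x)        # eval() changes layer behavior, NOT autograd

with torch.no_grad():
    out_no_grad = model(x)      # no graph recorded

with torch.inference_mode():
    out_inference = model(x)    # no graph, and autograd bookkeeping is skipped

print(out_eval_only.requires_grad)   # True: a graph was still built
print(out_no_grad.requires_grad)     # False
print(out_inference.requires_grad)   # False
print(out_inference.is_inference())  # True
```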

Common eval bugs

Computing metrics on the GPU but calling .item() on every batch: each .item() copies a value to the CPU and forces a CUDA sync. Aggregate on the GPU and sync once at the end of the epoch.
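A minimal sketch of the aggregate-then-sync pattern, assuming batches of `(logits, targets)` already on the device; the helper name `count_correct` and the tiny hand-made batches are hypothetical. It falls back to CPU when no GPU is present, where the pattern is harmless.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

def count_correct(batches, device):
    # Running count stays a tensor on the device: no per-batch .item(), no per-batch sync
    correct = torch.zeros((), dtype=torch.long, device=device)
    total = 0  # batch sizes are Python ints already, so this needs no sync
    for logits, targets in batches:
        correct += (logits.argmax(dim=-1) == targets).sum()
        total += targets.size(0)
    return correct.item() / total  # single sync, once per epoch

# Hand-made example: predictions are [0, 1] and [1, 0]; 3 of 4 match the targets
batches = [
    (torch.tensor([[2.0, 0.0], [0.0, 1.0]], device=device), torch.tensor([0, 0], device=device)),
    (torch.tensor([[0.0, 3.0], [1.0, 0.0]], device=device), torch.tensor([1, 0], device=device)),
]
acc = count_correct(batches, device)
print(acc)  # 0.75
```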

Mismatched preprocessing: training applies augmentation (random crop, flip, color jitter), while evaluation should apply only deterministic preprocessing (resize, center crop, normalize). Mixing the two up is one of the most common silent bugs in vision training.

Principle: eval is not training. Same data layer, different mode, different preprocessing, no gradient graph. Burn this distinction in early; it saves countless "why is my val loss higher than my train loss?" debugging sessions.

Code

A clean evaluation function (Python)

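import torch

@torch.inference_mode()  # no autograd graph during evaluation
def evaluate(model, loader, device):
    model.eval()  # Dropout -> identity, BatchNorm -> running statistics
    loss_fn = torch.nn.CrossEntropyLoss(reduction="sum")
    # Accumulate on the device; a per-batch .item() would force a CUDA sync every batch.
    correct = torch.zeros((), dtype=torch.long, device=device)
    total_loss = torch.zeros((), device=device)
    total = 0  # batch sizes are Python ints, so counting them needs no sync
    for xb, yb in loader:
        xb = xb.to(device, non_blocking=True)
        yb = yb.to(device, non_blocking=True)
        logits = model(xb)
        total_loss += loss_fn(logits, yb)  # summed loss, divided by total below
        correct += (logits.argmax(dim=-1) == yb).sum()
        total += yb.size(0)
    # Single sync at the end: (accuracy, mean per-sample loss)
    return correct.item() / total, total_loss.item() / total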
