Transfer Learning for Vision

Why transfer learning is the default

A pretrained backbone has already learned "how images work" from millions of examples. Your downstream task, say a new 7-class classification problem, has around 5,000 labeled examples. Training a CNN from scratch on 5,000 images will overfit catastrophically. Transfer learning reuses the backbone's understanding and lets you train only a small head for your task.

Three flavors: frozen backbone (train only the head; cheapest and often surprisingly competitive), full fine-tune (train everything; typically the best accuracy, at more compute), and partial unfreeze (train the head plus the last few layers; the middle ground).

Tip: Default to a frozen backbone for your first run. It is fast, gives you a baseline, and may be enough. If accuracy falls short, unfreeze the last block; if it still falls short, do a full fine-tune with a tiny learning rate (1e-4 or 1e-5).

The learning-rate trick for fine-tuning

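For full fine-tuning, use a much smaller learning rate than you would when training from scratch (1e-4 or 1e-5 instead of 1e-3). Optionally, give the new head a higher learning rate than the pretrained backbone: the head starts from random weights and needs larger updates, while the backbone is already good and large updates would damage it.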

Match the data preprocessing

Match the preprocessing the backbone was trained with. ImageNet-pretrained models expect a specific normalization (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) and a specific input size (usually 224×224). torchvision exposes the correct transform via weights.transforms(); use that rather than reinventing it.

Principle: Transfer learning is the highest-leverage "modeling" decision you can make in vision. A frozen ResNet50 with a linear head usually beats anything trained from scratch on 10K images.

Code

Frozen-backbone transfer learning·python

import torch
import torch.nn as nn
import torchvision.models as tvm
from torchvision.models import ResNet50_Weights

weights = ResNet50_Weights.IMAGENET1K_V2
backbone = tvm.resnet50(weights=weights)

# Freeze all backbone parameters
for p in backbone.parameters():
    p.requires_grad = False

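# Replace the head with a task-specific layer (the only trainable part).
# A freshly created nn.Linear has requires_grad=True by default.
n_classes = 7  # e.g. the 7-class task from the text; set to your own class count
backbone.fc = nn.Linear(backbone.fc.in_features, n_classes)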

# Optimizer over only the trainable params
opt = torch.optim.AdamW([p for p in backbone.parameters() if p.requires_grad], lr=1e-3)

# Use the backbone's preprocessing
preprocess = weights.transforms()

Full fine-tune with discriminative learning rates·python

# Continues from the frozen-backbone snippet (reuses `backbone`):
# unfreeze everything, but train the head faster than the backbone
for p in backbone.parameters():
    p.requires_grad = True

backbone_params = [p for n, p in backbone.named_parameters() if not n.startswith("fc.")]
head_params     = [p for n, p in backbone.named_parameters() if n.startswith("fc.")]

opt = torch.optim.AdamW([
    {"params": backbone_params, "lr": 1e-5},
    {"params": head_params,     "lr": 1e-3},
])
