Image Augmentation — Random Crop, Flip, Color, MixUp, CutMix

Augmentation is regularization disguised as data

The more (slightly perturbed) views of each training image the model sees, the harder it is to memorize them and the better it generalizes. Augmentation is arguably the single most cost-effective regularizer in computer vision.

Standard image augmentations

RandomResizedCrop: crops a region of random scale and aspect ratio, then resizes it to a fixed output size. The standard ImageNet-style crop.
RandomHorizontalFlip(p=0.5): applicable to almost all natural images.
RandomVerticalFlip: only when flipping up/down does not change the meaning (medical, satellite, abstract patterns).
ColorJitter: varies brightness, contrast, saturation, and hue.
RandomRotation, RandomAffine: small rotations and translations.
RandomErasing: fills a random rectangle with zeros (closely related to Cutout). Surprisingly effective.
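
To make the "random scale and aspect ratio" part of RandomResizedCrop concrete, here is a minimal from-scratch sketch in plain PyTorch. The function name `random_resized_crop` and the whole-image fallback are illustrative assumptions, not torchvision's implementation; the default `scale` and `ratio` ranges mirror the commonly used ImageNet settings.

```python
import math
import torch
import torch.nn.functional as F


def random_resized_crop(img, size=224, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3)):
    """Crop a box of random area and aspect ratio from a (C, H, W) float image, then resize."""
    _, H, W = img.shape
    for _ in range(10):                                    # retry until the box fits
        area = H * W * torch.empty(1).uniform_(*scale).item()
        log_r = torch.empty(1).uniform_(math.log(ratio[0]), math.log(ratio[1])).item()
        r = math.exp(log_r)                                # aspect ratio w / h
        w, h = round(math.sqrt(area * r)), round(math.sqrt(area / r))
        if 0 < w <= W and 0 < h <= H:
            top = torch.randint(0, H - h + 1, (1,)).item()
            left = torch.randint(0, W - w + 1, (1,)).item()
            break
    else:                                                  # simplified fallback: whole image
        top, left, h, w = 0, 0, H, W
    crop = img[:, top:top + h, left:left + w]
    # Resize the crop to the fixed output size expected by the model.
    out = F.interpolate(crop[None], size=(size, size), mode="bilinear", align_corners=False)
    return out[0]


img = torch.rand(3, 300, 400)
out = random_resized_crop(img)
print(out.shape)
```

The aspect ratio is sampled uniformly in log space so that a ratio of 3/4 is as likely as its inverse 4/3.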

Batch-level augmentation: MixUp and CutMix

Take two images and blend them, and blend their labels too. The model is forced to predict the soft label that corresponds to the mix. Both are standard in modern image-classification training and appear in many ImageNet training recipes published since 2020.

MixUp: pixel-wise linear interpolation between two images, with the mixing weight λ drawn from Beta(α, α).
CutMix: cut a rectangle from image B and paste it onto image A; the labels are weighted by the pasted area.
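
The two definitions above can be sketched by hand in a few lines of PyTorch. This is an illustrative sketch, not the torchvision implementation: the helper names `mixup_batch` and `cutmix_batch` are hypothetical, one λ is drawn per batch, and partners are chosen with a random permutation. Library implementations may differ in such details.

```python
import torch
import torch.nn.functional as F


def mixup_batch(x, y, num_classes, alpha=0.2):
    """Blend each image with a shuffled partner; blend the one-hot labels identically."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()  # mixing weight in [0, 1]
    perm = torch.randperm(x.size(0))                  # partner index for each sample
    y1h = F.one_hot(y, num_classes).float()
    x_mix = lam * x + (1.0 - lam) * x[perm]           # pixel-wise interpolation
    y_mix = lam * y1h + (1.0 - lam) * y1h[perm]       # same weights on the labels
    return x_mix, y_mix


def cutmix_batch(x, y, num_classes, alpha=1.0):
    """Paste a random rectangle from a shuffled partner; weight labels by area."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    H, W = x.shape[-2:]
    # Box covering roughly (1 - lam) of the image, centred at a random point.
    cut_h, cut_w = int(H * (1.0 - lam) ** 0.5), int(W * (1.0 - lam) ** 0.5)
    cy, cx = torch.randint(H, (1,)).item(), torch.randint(W, (1,)).item()
    top, bottom = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, H)
    left, right = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, W)
    x_mix = x.clone()
    x_mix[:, :, top:bottom, left:right] = x[perm][:, :, top:bottom, left:right]
    # Recompute lam from the clipped box so labels match the pixels actually pasted.
    lam = 1.0 - (bottom - top) * (right - left) / (H * W)
    y1h = F.one_hot(y, num_classes).float()
    return x_mix, lam * y1h + (1.0 - lam) * y1h[perm]


x = torch.rand(8, 3, 32, 32)
y = torch.randint(0, 10, (8,))
xm, ym = mixup_batch(x, y, num_classes=10)
xc, yc = cutmix_batch(x, y, num_classes=10)
print(ym.sum(dim=1), yc.sum(dim=1))   # each soft label still sums to 1
```

With a small α such as 0.2, Beta(α, α) concentrates near 0 and 1, so most MixUp samples stay close to one of the two originals; α = 1.0 makes λ uniform on [0, 1].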

The cardinal rule

The train transform applies augmentation; the val/test transform does only a deterministic resize, center crop, and normalize. If you augment validation data, you are measuring against a moving target. Keep two separate Compose pipelines.
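
A common way to break this rule is to split one dataset with `random_split` and leave both halves sharing a single transform. The sketch below shows one possible fix: a small wrapper that attaches a split-specific transform. `TransformedSubset` is a hypothetical helper, and the two transforms are trivial stand-ins (random noise vs. identity) so the example runs without image files.

```python
import torch
from torch.utils.data import Dataset, TensorDataset, random_split


class TransformedSubset(Dataset):
    """Hypothetical helper: apply a split-specific transform on top of a Subset."""

    def __init__(self, subset, transform):
        self.subset = subset
        self.transform = transform

    def __len__(self):
        return len(self.subset)

    def __getitem__(self, i):
        x, y = self.subset[i]
        return self.transform(x), y


def train_tf(x):
    # Stand-in for a random augmentation pipeline.
    return x + 0.1 * torch.randn_like(x)


def val_tf(x):
    # Stand-in for a deterministic pipeline.
    return x


base = TensorDataset(torch.rand(100, 3, 8, 8), torch.randint(0, 10, (100,)))
train_part, val_part = random_split(base, [80, 20])

train_ds = TransformedSubset(train_part, train_tf)
val_ds = TransformedSubset(val_part, val_tf)

# The validation view of a sample never changes; the training view does.
print(torch.equal(val_ds[0][0], val_ds[0][0]))       # True
print(torch.equal(train_ds[0][0], train_ds[0][0]))   # False
```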

Code

Train vs val transforms: never mix them·python

import torch
import torchvision.transforms.v2 as T

# TRAINING: includes augmentation
train_tf = T.Compose([
    T.ToImage(),
    T.ToDtype(torch.float32, scale=True),
    T.RandomResizedCrop(224, scale=(0.8, 1.0)),
    T.RandomHorizontalFlip(p=0.5),
    T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.05),
    T.RandomRotation(15),
    T.RandomErasing(p=0.1),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# VALIDATION: deterministic only — resize, center-crop, normalize
val_tf = T.Compose([
    T.ToImage(),
    T.ToDtype(torch.float32, scale=True),
    T.Resize(256),
    T.CenterCrop(224),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

MixUp and CutMix, applied per batch·python

import torch
import torch.nn as nn
from torchvision.transforms.v2 import MixUp, CutMix, RandomChoice

mixup = MixUp(alpha=0.2, num_classes=10)
cutmix = CutMix(alpha=1.0, num_classes=10)

# Randomly pick one each batch
batch_aug = RandomChoice([mixup, cutmix])

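# Loss: CrossEntropyLoss accepts class-probability (soft) targets
criterion = nn.CrossEntropyLoss()

# `model`, `optimizer`, and `train_loader` are assumed to be defined elsewhere;
# the loader must yield float image batches with integer class labels.
model.train()
for x, y in train_loader:
    x, y = batch_aug(x, y)               # y is now soft: shape (batch, num_classes)
    optimizer.zero_grad()
    out = model(x)
    loss = criterion(out, y)
    loss.backward()
    optimizer.step()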
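
Why plain cross-entropy works with mixed labels: with a soft target of the form λ·onehot(a) + (1 − λ)·onehot(b), the loss is exactly the λ-weighted sum of the two hard-label losses. The short check below illustrates this with random logits; the value `lam = 0.7` and the tensor shapes are arbitrary choices for the demonstration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)              # stand-in for model outputs
y_a = torch.randint(0, 10, (4,))         # labels of the original images
y_b = torch.randint(0, 10, (4,))         # labels of the mixed-in partners
lam = 0.7                                # arbitrary mixing weight for the demo

# Soft targets, in the form a MixUp/CutMix step produces.
soft = lam * F.one_hot(y_a, 10).float() + (1 - lam) * F.one_hot(y_b, 10).float()

loss_soft = F.cross_entropy(logits, soft)
loss_split = lam * F.cross_entropy(logits, y_a) + (1 - lam) * F.cross_entropy(logits, y_b)
print(torch.allclose(loss_soft, loss_split))   # True
```

Passing probability targets to cross-entropy requires a reasonably recent PyTorch (1.10 or later).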

Inspecting a transformed image·python

import torch
import torchvision.transforms.v2 as T

# Build a synthetic uint8 image (randint's upper bound is exclusive, so use 256)
img = torch.randint(0, 256, (3, 224, 224), dtype=torch.uint8)

aug = T.Compose([
    T.ToImage(),
    T.ToDtype(torch.float32, scale=True),
    T.RandomResizedCrop(224, scale=(0.5, 1.0)),
    T.RandomHorizontalFlip(p=0.5),
])

# Apply twice — different random outcomes
out1 = aug(img)
out2 = aug(img)
print(torch.equal(out1, out2))   # False — random transforms differ each call
