C.W.K.
Stream
Lesson 02 of 07 · published

Image Augmentation — Random Crop, Flip, Color, MixUp, CutMix

~14 min · augmentation, mixup, cutmix, regularization

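Augmentation is regularization disguised as data

The more (slightly perturbed) views of each training image the model sees, the harder it is to memorize the training set and the better it generalizes. Augmentation is arguably the most cost-effective single regularizer in computer vision.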

Standard image augmentations

  • RandomResizedCrop: crops a region of random scale and aspect ratio, then resizes it to a fixed output size. The standard ImageNet-style crop.
  • RandomHorizontalFlip(p=0.5): safe for nearly all natural images.
  • RandomVerticalFlip: only when flipping up/down does not change the meaning (medical, satellite, abstract patterns).
  • ColorJitter: varies brightness, contrast, saturation, and hue.
  • RandomRotation, RandomAffine: small rotations and translations.
  • RandomErasing: fills a random rectangle with zeros (closely related to Cutout). Surprisingly effective.

Batch-level augmentation: MixUp and CutMix

Take two images and blend them, and blend their labels too. The model is forced to predict the soft label that matches the mix. Both are a standard part of modern image-classification training (most ImageNet recipes from 2020 onward include them).

  • MixUp: pixel-wise linear interpolation between two images, with the mixing weight λ drawn from a Beta(α, α) distribution.
  • CutMix: cuts a rectangle from image B and pastes it onto image A; the labels are weighted by the pasted area.
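The two bullets above can be sketched by hand in plain PyTorch. This is a minimal illustration rather than the torchvision implementation: the function names `mixup_batch` and `cutmix_batch` are hypothetical, a single λ is drawn per batch, and each image is paired with a randomly shuffled partner from the same batch.

```python
import torch
import torch.nn.functional as F


def mixup_batch(x, y, num_classes, alpha=0.2):
    """Blend each image with a shuffled partner and blend the one-hot labels the same way."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()  # lambda ~ Beta(alpha, alpha)
    perm = torch.randperm(x.size(0))                              # partner index for each image
    y_onehot = F.one_hot(y, num_classes).float()
    x_mixed = lam * x + (1.0 - lam) * x[perm]
    y_mixed = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mixed, y_mixed


def cutmix_batch(x, y, num_classes, alpha=1.0):
    """Paste a rectangle from a shuffled partner; weight the labels by the pasted area."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    _, _, h, w = x.shape
    # Box covering roughly (1 - lam) of the image, centered at a random point
    cut_h = int(h * (1.0 - lam) ** 0.5)
    cut_w = int(w * (1.0 - lam) ** 0.5)
    cy = torch.randint(0, h, (1,)).item()
    cx = torch.randint(0, w, (1,)).item()
    top, bottom = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    left, right = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    x_mixed = x.clone()
    x_mixed[:, :, top:bottom, left:right] = x[perm][:, :, top:bottom, left:right]
    # Clipping at the border can shrink the box, so recompute the weight from the real area
    lam_adj = 1.0 - ((bottom - top) * (right - left)) / (h * w)
    y_onehot = F.one_hot(y, num_classes).float()
    y_mixed = lam_adj * y_onehot + (1.0 - lam_adj) * y_onehot[perm]
    return x_mixed, y_mixed


# Usage on a stand-in batch: 8 fake RGB images, 10 classes
x = torch.rand(8, 3, 32, 32)
y = torch.randint(0, 10, (8,))
x_mu, y_mu = mixup_batch(x, y, num_classes=10)
x_cm, y_cm = cutmix_batch(x, y, num_classes=10)
print(y_mu.sum(dim=1))   # each soft label still sums to 1
```

In both cases the label weights mirror what happened to the pixels, which is why each soft label remains a valid probability distribution.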

Cardinal rule

The train transform applies augmentation; the val/test transform does only a deterministic resize + normalize. Augmenting validation data means measuring against a moving target. Keep two separate Compose pipelines.
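One place this rule tends to break is when train and val are split from a single dataset object that carries one shared transform. A possible way around it, sketched below, is to split the untransformed data first and attach a transform to each split afterwards. `TransformedSubset`, `toy_train_tf`, and `toy_val_tf` are hypothetical names used only for illustration, and the toy transforms stand in for real Compose pipelines so the sketch runs with PyTorch alone.

```python
import torch
from torch.utils.data import Dataset, TensorDataset, random_split


class TransformedSubset(Dataset):
    """Hypothetical wrapper: applies its own transform on top of an untransformed split."""

    def __init__(self, subset, transform):
        self.subset = subset
        self.transform = transform

    def __len__(self):
        return len(self.subset)

    def __getitem__(self, idx):
        x, y = self.subset[idx]
        return self.transform(x), y


def toy_train_tf(img):
    # Stand-in for a random train pipeline: horizontal flip with probability 0.5
    return img.flip(-1) if torch.rand(1).item() < 0.5 else img


def toy_val_tf(img):
    # Stand-in for a deterministic val pipeline
    return img


# Stand-in data: 100 fake RGB images with integer labels, no transform attached
base = TensorDataset(torch.rand(100, 3, 32, 32), torch.randint(0, 10, (100,)))
train_split, val_split = random_split(
    base, [80, 20], generator=torch.Generator().manual_seed(0)
)

train_ds = TransformedSubset(train_split, toy_train_tf)   # augmented
val_ds = TransformedSubset(val_split, toy_val_tf)         # deterministic

# The val split returns the same tensor every time it is indexed
print(torch.equal(val_ds[0][0], val_ds[0][0]))
```

Because each split owns its transform, changing the training augmentation cannot leak into the validation numbers.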

Code

Train vs. val transforms: never mix them·python
import torch
import torchvision.transforms.v2 as T

# TRAINING: includes augmentation
train_tf = T.Compose([
    T.ToImage(),
    T.ToDtype(torch.float32, scale=True),
    T.RandomResizedCrop(224, scale=(0.8, 1.0)),
    T.RandomHorizontalFlip(p=0.5),
    T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.05),
    T.RandomRotation(15),
    T.RandomErasing(p=0.1),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# VALIDATION: deterministic only — resize, center-crop, normalize
val_tf = T.Compose([
    T.ToImage(),
    T.ToDtype(torch.float32, scale=True),
    T.Resize(256),
    T.CenterCrop(224),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
MixUp and CutMix: applied per batch·python
import torch
import torch.nn as nn
from torchvision.transforms.v2 import MixUp, CutMix, RandomChoice

mixup = MixUp(alpha=0.2, num_classes=10)
cutmix = CutMix(alpha=1.0, num_classes=10)

# Randomly pick one each batch
batch_aug = RandomChoice([mixup, cutmix])

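# Loss: CrossEntropyLoss accepts soft (class-probability) targets in PyTorch >= 1.10
criterion = nn.CrossEntropyLoss()

# Assumes model, optimizer, and train_loader are defined elsewhere;
# train_loader yields float image batches (N, C, H, W) and integer labels (N,)
for x, y in train_loader:
    x, y = batch_aug(x, y)               # y is now soft: shape (N, num_classes)
    optimizer.zero_grad()
    out = model(x)
    loss = criterion(out, y)
    loss.backward()
    optimizer.step()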
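To see why cross-entropy with soft targets is the right loss here, the following small check (with made-up logits and labels, and a fixed λ of 0.7 chosen only for illustration) compares it against the λ-weighted sum of two ordinary hard-label losses. The two should agree up to floating-point error. It assumes PyTorch 1.10 or newer, where `nn.CrossEntropyLoss` accepts class-probability targets.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
num_classes, lam = 10, 0.7                       # lam is an arbitrary illustrative mixing weight
logits = torch.randn(4, num_classes)             # stand-in model outputs
y_a = torch.randint(0, num_classes, (4,))        # labels of the first images
y_b = torch.randint(0, num_classes, (4,))        # labels of their mixing partners

criterion = nn.CrossEntropyLoss()

# Soft target: lam of the mass on y_a, the rest on y_b
soft = lam * F.one_hot(y_a, num_classes).float() + (1 - lam) * F.one_hot(y_b, num_classes).float()
loss_soft = criterion(logits, soft)

# Equivalent form: blend two ordinary hard-label losses
loss_blend = lam * criterion(logits, y_a) + (1 - lam) * criterion(logits, y_b)

print(torch.allclose(loss_soft, loss_blend, atol=1e-6))
```

The equality holds because cross-entropy is linear in the target distribution, so blending the labels and blending the losses are the same operation.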
Inspecting a transformed image·python
import torch
import torchvision.transforms.v2 as T

# Build a synthetic uint8 image (randint's upper bound is exclusive, so use 256)
img = torch.randint(0, 256, (3, 224, 224), dtype=torch.uint8)

aug = T.Compose([
    T.ToImage(),
    T.ToDtype(torch.float32, scale=True),
    T.RandomResizedCrop(224, scale=(0.5, 1.0)),
    T.RandomHorizontalFlip(p=0.5),
])

# Apply twice: each call draws new random parameters
out1 = aug(img)
out2 = aug(img)
print(torch.equal(out1, out2))   # almost certainly False: crop and flip are re-sampled on every call

Exercise

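Build train and val transform pipelines for CIFAR-10. Apply each one 5 times to the same image: the train pipeline should produce 5 different outputs and the val pipeline 5 identical ones. Save the results as a 2x5 grid (matplotlib). Seeing the difference is what makes the rule stick.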
