Lesson 01 of 07 · published

torchvision Datasets and the v2 Transform API

~12 min · torchvision, v2, transforms

Built-in datasets and the modern transform pipeline

torchvision ships ready-to-use datasets (MNIST, CIFAR-10/100, ImageNet, COCO) and a transform pipeline. The current API is torchvision.transforms.v2, which replaces the older torchvision.transforms as the recommended path. Use v2 for new code.

Why v2 matters

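  • Native support for tv_tensors (Image, BoundingBoxes, Mask, Video), which matters for detection and segmentation.
  • A transform is applied correctly to multiple inputs at once: rotate the image AND its bounding boxes AND its mask in a single call.
  • The most common ops are considerably faster thanks to rewritten implementations.
  • A clean separation between PIL-image transforms and tensor transforms.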

The standard preprocessing chain

You will see this exact recipe constantly for ImageNet-pretrained models:

  1. ToImage(): wraps the input as a tv_tensors.Image (the v2-native type).
  2. ToDtype(torch.float32, scale=True): converts to float32 in [0, 1].
  3. Resize(256): resizes the shorter side to 256, preserving the aspect ratio.
  4. CenterCrop(224): takes the central 224x224 crop.
  5. Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]): ImageNet normalization. Memorize these numbers.

Code

Built-in dataset: CIFAR10 example·python
import torch
from torchvision import datasets
import torchvision.transforms.v2 as T

transform = T.Compose([
    T.ToImage(),
    T.ToDtype(torch.float32, scale=True),
    T.Resize(32),  # no-op for CIFAR10 (already 32x32); kept as a placeholder for other sizes
    T.Normalize(mean=[0.4914, 0.4822, 0.4465],
                std=[0.2470, 0.2435, 0.2616]),  # CIFAR10-specific
])

train_ds = datasets.CIFAR10('./data', train=True, download=True, transform=transform)
test_ds  = datasets.CIFAR10('./data', train=False, download=True, transform=transform)
print(len(train_ds), len(test_ds))   # 50000 10000
print(train_ds[0][0].shape, train_ds[0][1])  # torch.Size([3, 32, 32]) 6
Standard ImageNet preprocessing·python
import torch
import torchvision.transforms.v2 as T

# A common chain for ImageNet classifiers; individual weights may use other resize/crop sizes
preprocess = T.Compose([
    T.ToImage(),
    T.ToDtype(torch.float32, scale=True),
    T.Resize(256),
    T.CenterCrop(224),
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
])

# The recommended way is to ask the weights for their own transforms:
from torchvision.models import resnet50, ResNet50_Weights
weights = ResNet50_Weights.IMAGENET1K_V2
preprocess = weights.transforms()
# weights.transforms() returns the inference preprocessing these weights expect,
# which can differ from the chain above (these weights resize to 232, not 256)
The killer feature of v2: multi-input transforms·python
import torch
import torchvision.transforms.v2 as T
from torchvision import tv_tensors

# A scene with image + bounding boxes + segmentation mask
# (randint's upper bound is exclusive, so 256 covers the full uint8 range)
img = tv_tensors.Image(torch.randint(0, 256, (3, 224, 224), dtype=torch.uint8))
boxes = tv_tensors.BoundingBoxes(
    [[10, 20, 100, 150]], format='XYXY', canvas_size=(224, 224)
)
mask = tv_tensors.Mask(torch.zeros(224, 224, dtype=torch.uint8))

transform = T.Compose([
    T.RandomHorizontalFlip(p=1.0),
    T.RandomRotation(15),
])

# Apply to all three at once — boxes and mask transform consistently with image
img_t, boxes_t, mask_t = transform(img, boxes, mask)
print(img_t.shape, boxes_t, mask_t.shape)
# The old transforms API had no built-in way to do this; it took hand-written
# functional code that shared the random parameters across inputs.

Exercise

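Load CIFAR-10 with a v2 transform pipeline. Print the dtype, shape, mean, and std of one transformed image. Then load it without Normalize and check the value range, which should be [0, 1]. Add Normalize back and verify that the mean shifts toward 0 and the std toward 1. Measure this across many images: the statistics are dataset-wide, so a single image will only be approximately there.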
