torchvision ships ready-to-use datasets (MNIST, CIFAR-10/100, ImageNet, COCO) and transform pipelines. The current API is torchvision.transforms.v2, which replaces the older torchvision.transforms as the recommended path. Use v2 for new code.
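Why v2 matters
Native support for tv_tensors (Image, BoundingBoxes, Mask, Video), which matters for detection and segmentation.
Transforms apply correctly to multiple inputs at the same time: rotate an image AND its bounding boxes AND its mask in one call.
The most common ops are much faster thanks to rewritten implementations.
Cleaner handling of PIL images versus tensors: ToImage() converts the container type, and ToDtype() handles dtype and scaling as a separate step.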
The standard preprocessing chain
For ImageNet-pretrained models, you will see this exact recipe constantly:
ToImage(): wraps the input as a tv_tensors.Image (the v2-native type).
ToDtype(torch.float32, scale=True): converts to float32 in [0, 1].
Resize(256): resizes the shorter side to 256.
CenterCrop(224): takes the central 224x224 crop.
Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]): ImageNet normalization. Memorize these numbers.
import torch
import torchvision.transforms.v2 as T

# The classic ImageNet chain; many (not all) torchvision pretrained weights use it
preprocess = T.Compose([
    T.ToImage(),                              # wrap as tv_tensors.Image
    T.ToDtype(torch.float32, scale=True),     # uint8 [0, 255] -> float32 [0, 1]
    T.Resize(256),                            # shorter side -> 256
    T.CenterCrop(224),                        # central 224x224 crop
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),   # per-channel (x - mean) / std
])

# The recommended way is to ASK the weights for their transforms:
from torchvision.models import resnet50, ResNet50_Weights
weights = ResNet50_Weights.IMAGENET1K_V2
preprocess = weights.transforms()
# weights.transforms() returns the inference preprocessing that matches these weights,
# which can differ from the chain above (for example, a different resize size)
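To make the steps of the chain concrete, here is a minimal pure-Python sketch of the arithmetic behind Resize(256), CenterCrop(224), and Normalize for one image size and one pixel. The helper names (resized_size, center_crop_box, normalize_pixel) are hypothetical and are not torchvision functions, and the rounding may differ slightly from what torchvision does internally.

```python
# Hand-rolled arithmetic only; these helpers are illustrative, not torchvision APIs.
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]


def resized_size(height, width, short_side=256):
    """Scale so the shorter side equals short_side, keeping the aspect ratio."""
    if height <= width:
        return short_side, int(short_side * width / height)
    return int(short_side * height / width), short_side


def center_crop_box(height, width, crop=224):
    """Return (top, left, bottom, right) of a centered crop x crop window."""
    top = (height - crop) // 2
    left = (width - crop) // 2
    return top, left, top + crop, left + crop


def normalize_pixel(rgb_uint8):
    """Scale a uint8 RGB pixel to [0, 1], then apply (x - mean) / std per channel."""
    scaled = [v / 255 for v in rgb_uint8]
    return [(x - m) / s for x, m, s in zip(scaled, IMAGENET_MEAN, IMAGENET_STD)]


h, w = resized_size(375, 500)             # a photo 500 wide and 375 tall
box = center_crop_box(h, w)               # crop window inside the resized image
pixel = normalize_pixel([124, 116, 104])  # a color close to the ImageNet mean
print((h, w), box, [round(v, 3) for v in pixel])
```

A pixel close to the ImageNet mean color lands near zero in every channel after normalization, which is the point of subtracting the mean and dividing by the standard deviation.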
The killer feature of v2: multi-input transforms
import torch
import torchvision.transforms.v2 as T
from torchvision import tv_tensors

# A scene with image + bounding boxes + segmentation mask
# (randint's upper bound is exclusive, so 256 covers the full uint8 range)
img = tv_tensors.Image(torch.randint(0, 256, (3, 224, 224), dtype=torch.uint8))
boxes = tv_tensors.BoundingBoxes(
    [[10, 20, 100, 150]], format='XYXY', canvas_size=(224, 224)
)
mask = tv_tensors.Mask(torch.zeros(224, 224, dtype=torch.uint8))

transform = T.Compose([
    T.RandomHorizontalFlip(p=1.0),
    T.RandomRotation(15),
])

# Apply to all three at once; boxes and mask are transformed consistently with the image
img_t, boxes_t, mask_t = transform(img, boxes, mask)
print(img_t.shape, boxes_t, mask_t.shape)
# The old transforms API only handled images, so keeping boxes and masks
# in sync meant writing the geometry by hand.
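To see what "consistently" means for the box in this example, the following sketch works out the horizontal flip by hand for an XYXY box. The function hflip_xyxy is a hypothetical helper written for illustration, not part of torchvision; it only shows the coordinate arithmetic a flip implies.

```python
def hflip_xyxy(box, canvas_width):
    """Mirror an XYXY box around the vertical center line of the canvas."""
    x1, y1, x2, y2 = box
    # The old right edge becomes the new left edge, and vice versa.
    return [canvas_width - x2, y1, canvas_width - x1, y2]


flipped = hflip_xyxy([10, 20, 100, 150], canvas_width=224)
print(flipped)  # [124, 20, 214, 150]
```

The y coordinates are untouched and the box keeps its width of 90 pixels. Flipping twice returns the original box.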
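Load CIFAR-10 with a v2 transform pipeline. Print the dtype, shape, mean, and std of one transformed image. Then load it without Normalize and check the value range, which should be [0, 1]. Finally, add Normalize and verify that the mean shifts toward 0 and the std toward 1. Expect only approximate values for a single image; the shift is exact only across the dataset whose statistics you normalize with.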
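Before running the exercise on real data, it may help to see the expected behavior on synthetic numbers. The sketch below uses only the standard library and a made-up single-channel "dataset" (an assumption standing in for CIFAR-10), computes its mean and std, and checks what normalization does to the whole set versus one image.

```python
import random
import statistics

random.seed(0)
# Synthetic stand-in for one channel: 200 "images" of 64 pixels each, values in [0, 1].
images = [[random.random() ** 2 for _ in range(64)] for _ in range(200)]

# Statistics over every pixel in the dataset, as Normalize would be configured with.
all_pixels = [p for img in images for p in img]
mean = statistics.fmean(all_pixels)
std = statistics.pstdev(all_pixels)

# Apply (x - mean) / std to every pixel.
normalized = [[(p - mean) / std for p in img] for img in images]
flat = [p for img in normalized for p in img]

dataset_mean = statistics.fmean(flat)
dataset_std = statistics.pstdev(flat)
single_mean = statistics.fmean(normalized[0])

print(f"dataset: mean={dataset_mean:.4f}, std={dataset_std:.4f}")
print(f"image 0: mean={single_mean:.4f}")
```

Across the full set the mean is 0 and the std is 1 up to floating-point error, while a single image generally sits somewhat off those values. The same pattern should appear with CIFAR-10, and normalizing with ImageNet statistics instead of CIFAR-10's own will leave a larger offset.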