C.W.K.
Stream
Lesson 02 of 06 · published

Vision Model — ResNet, EfficientNet, ConvNeXt, ViT

~12 min · resnet, efficientnet, convnext, vit

The pretrained vision model menu

torchvision ships pretrained weights for dozens of architectures. The four families you will reach for most often:

  • ResNet (resnet18/34/50/101/152): the workhorse. Reliable, well understood, and fast on every backend. The default starting point for a new task.
  • EfficientNet (efficientnet_b0..b7, _v2_*): a better accuracy-per-parameter trade-off. Use it when model size matters (mobile deployment, large-scale inference).
  • ConvNeXt (convnext_tiny/small/base/large): a modern ConvNet that narrowed the gap with ViT. Surprisingly strong, and faster than ViT at the same accuracy.
  • Vision Transformer (vit_b_16/32, vit_l_16, vit_h_14): the Transformer applied to images. State of the art on many tasks, but more data-hungry than CNNs and slightly slower per parameter.

Adapting: head replacement is architecture-specific

Each architecture exposes its classifier head differently:

  • ResNet: model.fc = nn.Linear(model.fc.in_features, num_classes)
  • EfficientNet: model.classifier[1] = nn.Linear(model.classifier[1].in_features, num_classes)
  • ConvNeXt: model.classifier[2] = nn.Linear(model.classifier[2].in_features, num_classes)
  • ViT: model.heads.head = nn.Linear(model.heads.head.in_features, num_classes)

The pattern: call print(model) first, find the final Linear layer, and replace it. The convention is consistent enough that you can write a small helper function once and reuse it.
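
One way to automate the "find the final Linear and replace it" pattern is to walk the module tree instead of hard-coding attribute names. This is a sketch under one stated assumption: the classifier head is the last `nn.Linear` registered in the model. That holds for the four families above, but you should confirm it with print(model) for anything else. `replace_last_linear` is a hypothetical helper name, and the toy network stands in for a real backbone so the example runs without downloading weights.

```python
import torch
import torch.nn as nn


def replace_last_linear(model: nn.Module, num_classes: int) -> str:
    """Swap the last-registered nn.Linear submodule for a fresh head.

    Returns the dotted name of the layer that was replaced.
    Assumption: the last nn.Linear in registration order is the classifier.
    """
    last_name = None
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            last_name = name
    if not last_name:
        raise ValueError("no replaceable nn.Linear submodule found")

    # Split "classifier.2" into parent "classifier" and child "2".
    parent_name, _, child_name = last_name.rpartition(".")
    parent = model.get_submodule(parent_name)  # "" returns the model itself
    old = getattr(parent, child_name)
    setattr(parent, child_name, nn.Linear(old.in_features, num_classes))
    return last_name


# Toy stand-in for a backbone with a nested head, similar in layout to
# EfficientNet's classifier (Dropout followed by Linear).
toy = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Sequential(nn.Dropout(0.2), nn.Linear(8, 1000)),
)

replaced = replace_last_linear(toy, num_classes=5)
with torch.no_grad():
    out = toy.eval()(torch.randn(2, 3, 32, 32))
print(replaced, tuple(out.shape))
```

The explicit per-architecture version is easier to audit. The generic version saves typing, but it silently picks the wrong layer if a model registers another Linear after its head.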

Code

Loading the four families·python
from torchvision import models
from torchvision.models import (
    ResNet50_Weights, EfficientNet_B0_Weights,
    ConvNeXt_Base_Weights, ViT_B_16_Weights,
)

resnet = models.resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)
effnet = models.efficientnet_b0(weights=EfficientNet_B0_Weights.IMAGENET1K_V1)
convnext = models.convnext_base(weights=ConvNeXt_Base_Weights.IMAGENET1K_V1)
vit = models.vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)

for m in [resnet, effnet, convnext, vit]:
    n = sum(p.numel() for p in m.parameters())
    print(f"{type(m).__name__:14s}  {n/1e6:6.1f}M params")
Architecture-aware head replacement·python
import torch.nn as nn
from torchvision import models
from torchvision.models import (
    ResNet50_Weights, EfficientNet_B0_Weights,
    ConvNeXt_Base_Weights, ViT_B_16_Weights,
)

def make_classifier(arch, num_classes):
    """Load a pretrained backbone and swap its final Linear for a num_classes head."""
    if arch == 'resnet50':
        m = models.resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)
        m.fc = nn.Linear(m.fc.in_features, num_classes)
    elif arch == 'efficientnet_b0':
        m = models.efficientnet_b0(weights=EfficientNet_B0_Weights.IMAGENET1K_V1)
        m.classifier[1] = nn.Linear(m.classifier[1].in_features, num_classes)
    elif arch == 'convnext_base':
        m = models.convnext_base(weights=ConvNeXt_Base_Weights.IMAGENET1K_V1)
        m.classifier[2] = nn.Linear(m.classifier[2].in_features, num_classes)
    elif arch == 'vit_b_16':
        m = models.vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)
        m.heads.head = nn.Linear(m.heads.head.in_features, num_classes)
    else:
        # Fail loudly instead of hitting an unbound `m` at the return below.
        raise ValueError(f"unsupported arch: {arch!r}")
    return m

model = make_classifier('convnext_base', num_classes=5)
timm: when you need an architecture torchvision doesn't have·python
# pip install timm
import timm

# timm has 1000+ pretrained vision models — ConvNeXt v2, MaxViT, EVA, BEiT, etc.
model = timm.create_model('convnext_base.fb_in22k_ft_in1k', pretrained=True, num_classes=5)

# timm also gives you the matching transforms
data_cfg = timm.data.resolve_data_config({}, model=model)
transform = timm.data.create_transform(**data_cfg)

# For research, timm is often more up-to-date than torchvision
# For production stability, torchvision is the safer pick.

Exercise

Use the make_classifier function to instantiate all four architectures with num_classes=5. Time a single forward pass on a (8, 3, 224, 224) batch on the fastest device you have. Note the per-architecture latency; it is useful for deployment decisions.
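
A possible starting point for the timing part is sketched below. It makes a few choices that are assumptions rather than requirements of the exercise: it prefers CUDA, then MPS, then CPU; it runs warm-up passes first; and it averages over several repeats instead of timing exactly one pass. `pick_device`, `sync`, and `time_forward` are hypothetical helper names. The tiny network is a stand-in so the sketch runs quickly; swap in the models returned by make_classifier.

```python
import time

import torch
import torch.nn as nn


def pick_device():
    """Prefer CUDA, then Apple MPS, then CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


def sync(device):
    """Wait for queued GPU work so wall-clock timings are meaningful."""
    if device.type == "cuda":
        torch.cuda.synchronize()
    elif device.type == "mps" and hasattr(torch, "mps"):
        torch.mps.synchronize()


@torch.no_grad()
def time_forward(model, batch_shape=(8, 3, 224, 224), warmup=2, repeats=5):
    """Return the mean forward-pass latency in milliseconds."""
    device = pick_device()
    model = model.to(device).eval()
    x = torch.randn(*batch_shape, device=device)
    for _ in range(warmup):  # warm-up absorbs one-time setup costs
        model(x)
    sync(device)
    start = time.perf_counter()
    for _ in range(repeats):
        model(x)
    sync(device)
    return (time.perf_counter() - start) / repeats * 1000.0


# Stand-in model so the sketch runs without downloading any weights.
tiny = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, stride=2),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 5),
)
latency_ms = time_forward(tiny)
print(f"tiny stand-in: {latency_ms:.2f} ms per batch on {pick_device()}")
```

Without the synchronize calls, GPU timings tend to measure only how long it takes to queue the work, not to finish it.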
