Convolution Layers: Conv2d and Friends

Conv layers in PyTorch's NCHW universe

PyTorch lays out image data as (N, C, H, W): batch, channel, height, width. (The other major convention is NHWC, the default in TensorFlow; be ready to permute when you cross frameworks.)
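As a minimal sketch of the layout conversion mentioned above, assume a batch arrives in NHWC order (for example from a TensorFlow pipeline). The tensor names and sizes here are illustrative, not part of the original lesson.

```python
import torch

# Hypothetical batch in NHWC layout: 8 images, 32x32 pixels, 3 channels
x_nhwc = torch.randn(8, 32, 32, 3)

# NHWC -> NCHW: move the channel axis from the last to the second position
x_nchw = x_nhwc.permute(0, 3, 1, 2)
print(x_nchw.shape)                   # torch.Size([8, 3, 32, 32])

# permute only rearranges strides; call .contiguous() if a later op
# needs the data physically laid out in NCHW order
x_nchw = x_nchw.contiguous()

# NCHW -> NHWC: the inverse permutation recovers the original tensor
x_back = x_nchw.permute(0, 2, 3, 1)
print(torch.equal(x_back, x_nhwc))    # True
```

The permutation indices are easy to get backwards, so a round-trip check like the last line is a cheap safeguard.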

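nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0) learns out_channels filters, each of shape (in_channels, kernel_size, kernel_size). Each filter slides over the input and produces one feature map, so the layer outputs out_channels feature maps. The standard formula for the output spatial size (with the default dilation of 1) is out = floor((in + 2*padding - kernel) / stride) + 1.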
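The output-size formula above can be checked numerically. The sketch below uses a hypothetical helper, conv_out_size, that is not a PyTorch function; it assumes dilation of 1 and uses integer division to round down when the stride does not divide evenly.

```python
import torch
import torch.nn as nn

def conv_out_size(size, kernel, stride=1, padding=0):
    """Output length along one spatial dim (hypothetical helper, dilation=1)."""
    return (size + 2 * padding - kernel) // stride + 1

# Compare the formula with what nn.Conv2d actually produces on a 32x32 input
x = torch.randn(1, 3, 32, 32)
for kernel, stride, padding in [(3, 1, 1), (3, 2, 1), (5, 1, 0), (7, 2, 3)]:
    conv = nn.Conv2d(3, 8, kernel, stride=stride, padding=padding)
    expected = conv_out_size(32, kernel, stride, padding)
    actual = conv(x).shape[-1]
    print(f"k={kernel} s={stride} p={padding}: formula={expected}, conv={actual}")
    assert expected == actual
```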

Variants you will actually use

Conv2d: 2D convolution. The default for images.
Conv1d: 1D convolution. Useful for sequences (audio waveforms, character-level text, time series).
ConvTranspose2d: the so-called 'deconvolution', used for learned upsampling. Appears in the U-Net decoder and the DCGAN generator.
Depthwise separable conv: built by combining Conv2d(groups=in_channels) with a 1x1 Conv2d. This is the trick that makes MobileNet and EfficientNet efficient.
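The Code section below covers Conv2d and the depthwise separable block, so here is a short shape check for the other two variants. The input sizes (16000-sample mono clips, 8x8 feature maps) are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

# Conv1d expects (N, C, L): batch, channels, sequence length
seq = torch.randn(4, 1, 16000)            # e.g. 4 mono clips of 16000 samples
conv1d = nn.Conv1d(in_channels=1, out_channels=8, kernel_size=9, padding=4)
print(conv1d(seq).shape)                  # torch.Size([4, 8, 16000])

# ConvTranspose2d with kernel_size=2, stride=2 doubles height and width
feat = torch.randn(4, 64, 8, 8)
up = nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2)
print(up(feat).shape)                     # torch.Size([4, 32, 16, 16])
```

A kernel of 2 with stride 2 is one common upsampling configuration in decoders; other kernel and stride combinations give different output sizes.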

padding shorthand

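PyTorch 1.9+ accepts padding='same' as a string, which preserves the spatial dims when stride=1 (strided convolutions do not support it). It is the convenient version of the padding arithmetic everyone used to write by hand. Use it unless you have a specific reason to shrink the spatial dims.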
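A small sketch of the shorthand, assuming a PyTorch version that supports string padding. It compares padding='same' with the hand-computed kernel_size // 2 for an odd kernel and shows that a strided layer is rejected at construction time.

```python
import torch
import torch.nn as nn

x = torch.randn(2, 3, 32, 32)

# padding='same' chooses the padding so output H and W match the input
for k in (1, 3, 5, 7):
    conv = nn.Conv2d(3, 8, kernel_size=k, padding='same')
    assert conv(x).shape[-2:] == x.shape[-2:]

# Hand-written equivalent for an odd kernel: pad by kernel_size // 2
manual = nn.Conv2d(3, 8, kernel_size=5, padding=5 // 2)
print(manual(x).shape)                    # torch.Size([2, 8, 32, 32])

# Strided convolutions reject padding='same'
try:
    nn.Conv2d(3, 8, kernel_size=3, stride=2, padding='same')
except ValueError as err:
    print("not supported:", err)
```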

Code

Conv2d basics·python

import torch
import torch.nn as nn

# 3 RGB channels in, 16 feature maps out, 3x3 kernel
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3,
                 stride=1, padding=1)

x = torch.randn(8, 3, 32, 32)   # batch=8, RGB, 32x32 image
y = conv(x)
print(y.shape)                   # torch.Size([8, 16, 32, 32]) — same spatial
print(conv.weight.shape)         # torch.Size([16, 3, 3, 3])  — out, in, kH, kW
print(conv.bias.shape)           # torch.Size([16])

# stride=2 halves spatial dims
conv_down = nn.Conv2d(3, 16, 3, stride=2, padding=1)
print(conv_down(x).shape)        # torch.Size([8, 16, 16, 16])

Simple CNN: the canonical pattern·python

import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2),                 # 32x32 → 16x16

            nn.Conv2d(32, 64, 3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2),                 # 16x16 → 8x8

            nn.Conv2d(64, 128, 3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),         # global avg pool → 1x1
        )
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = x.flatten(1)                     # (B, 128, 1, 1) → (B, 128)
        return self.classifier(x)

model = SimpleCNN()
print(model(torch.randn(4, 3, 32, 32)).shape)   # torch.Size([4, 10])

Depthwise separable conv — MobileNet trick·python

import torch.nn as nn

class DepthwiseSeparable(nn.Module):
    """Replace a regular conv with depthwise + pointwise — far fewer params."""
    def __init__(self, in_ch, out_ch, kernel=3):
        super().__init__()
        # Depthwise: each input channel gets its own kernel
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel,
                                    padding=kernel // 2, groups=in_ch)
        # Pointwise: 1x1 conv to mix channels
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# Compare param counts
regular = nn.Conv2d(64, 128, 3, padding=1)
sep = DepthwiseSeparable(64, 128, 3)

regular_params = sum(p.numel() for p in regular.parameters())
sep_params = sum(p.numel() for p in sep.parameters())
print(f"Regular: {regular_params:,}")    # 73,856
print(f"Separable: {sep_params:,}")       # 8,960, about 8x fewer
