Convolution: Shared Filter

What convolution actually does

A 2-D convolution slides a small filter (e.g. a 3×3 grid of weights) over the input image and, at each position, computes a weighted sum of the local pixel patch across all input channels. The same filter is reused at every position. This parameter sharing is what makes CNNs efficient. A single 3×3 conv with 32 input channels and 64 output channels has 3*3*32*64 + 64 = 18496 parameters, regardless of image size.
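To check the 18496 figure, you can count the layer's parameters directly. This is a minimal sketch assuming PyTorch's default `nn.Conv2d` (bias enabled). It also runs the same layer on two input sizes to show that the parameter count does not depend on the image size.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3)  # bias=True by default

print(conv.weight.shape)  # torch.Size([64, 32, 3, 3]): one 3x3x32 filter per output channel
print(conv.bias.shape)    # torch.Size([64]): one bias per output channel
n_params = sum(p.numel() for p in conv.parameters())
print(n_params)           # 18496 = 3*3*32*64 + 64

# The same 18496 weights process any spatial size (no padding here, so H and W shrink by 2)
with torch.no_grad():
    for hw in (32, 128):
        print(hw, tuple(conv(torch.randn(1, 32, hw, hw)).shape))
```

The weight tensor layout `[out_channels, in_channels, k, k]` makes the count easy to read off: multiply the four dimensions and add one bias per output channel.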

Stack many filters per layer (the channel dimension) and many layers (depth), with non-linearities in between. Early layers typically learn edge detectors and color blobs, middle layers learn textures and shape parts, and deep layers learn object-shaped features.
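One mechanical reason deeper layers can respond to larger structures is that each stacked conv widens the receptive field, the region of input pixels that a single output value depends on. The sketch below computes it two ways: with the standard recurrence (receptive field grows by (k - 1) times the accumulated stride of earlier layers), and empirically by backpropagating from one output pixel and measuring which input rows get a nonzero gradient. The layer configurations, the helper names `receptive_field` / `empirical_rf`, and the all-ones-weight trick are illustrative choices, not part of the lesson.

```python
import torch
import torch.nn as nn

def receptive_field(layers):
    """layers: list of (kernel_size, stride), first layer first (dilation=1 assumed)."""
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent outputs
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf

def empirical_rf(layers, size=64):
    # All-ones weights, zero bias and a positive input keep every ReLU active,
    # so a nonzero gradient marks exactly the input pixels the output depends on.
    mods = []
    for k, s in layers:
        conv = nn.Conv2d(1, 1, k, stride=s, padding=k // 2)
        nn.init.ones_(conv.weight)
        nn.init.zeros_(conv.bias)
        mods += [conv, nn.ReLU()]
    net = nn.Sequential(*mods)

    x = torch.ones(1, 1, size, size, requires_grad=True)
    y = net(x)
    c = y.shape[-1] // 2          # pick an output pixel away from the borders
    y[0, 0, c, c].backward()
    rows = (x.grad[0, 0].abs().sum(dim=1) > 0).nonzero()
    return int(rows.max() - rows.min() + 1)

configs = {
    "three 3x3, stride 1": [(3, 1)] * 3,
    "3x3 s2, then 3x3 s1": [(3, 2), (3, 1)],
    "7x7 s2, then 3x3 s1": [(7, 2), (3, 1)],
}
for name, layers in configs.items():
    print(name, receptive_field(layers), empirical_rf(layers))
```

Both methods should agree (7, 7 and 11 pixels for the three configurations). Note how an early stride-2 layer doubles the growth contributed by every layer after it.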

Tip: read every nn.Conv2d as "apply N learned k×k filters across the spatial dimensions with stride s and padding p, producing an output of shape [B, N, H_out, W_out]". The exact output size is H_out = floor((H + 2p - k)/s) + 1 (and likewise for W_out), assuming dilation=1.
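The formula is easy to turn into a helper and check against PyTorch itself. In this sketch, `conv_out_size` is a hypothetical helper name, and the (k, s, p) combinations and the odd input size 57 are arbitrary choices meant to exercise the floor.

```python
import torch
import torch.nn as nn

def conv_out_size(h, k, s=1, p=0):
    """floor((H + 2p - k) / s) + 1, assuming dilation=1."""
    return (h + 2 * p - k) // s + 1

x = torch.randn(2, 3, 57, 57)  # odd spatial size on purpose
for k, s, p in [(3, 1, 1), (3, 2, 1), (7, 2, 3), (1, 1, 0), (5, 3, 0)]:
    conv = nn.Conv2d(3, 8, kernel_size=k, stride=s, padding=p)
    with torch.no_grad():
        h_out = conv(x).shape[-1]
    assert h_out == conv_out_size(57, k, s, p), (k, s, p)
    print(f"k={k} s={s} p={p}: H_out={h_out}")
```

The stride-2 cases show why the floor matters: (57 + 2 - 3) / 2 = 28 exactly, but (57 - 5) / 3 = 17.33 rounds down before the + 1.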

Key hyperparameters

kernel_size: usually 3, sometimes 1 (channel mixing), sometimes 7 (early layers).
stride: 1 (keeps the spatial size) or 2 (downsamples by 2).
padding: usually "same" padding of k//2 (with stride=1 and odd k, the output has the same spatial size as the input).
in_channels / out_channels: the network's width at this layer.
groups: 1 (standard) or in_channels (depthwise, used in efficient mobile architectures).
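The effect of each hyperparameter above can be seen on a single tensor. This sketch assumes an arbitrary 64-channel, 56×56 feature map; `n_params` is a hypothetical helper. It checks that k//2 padding preserves the size for odd k, that stride 2 halves it, and how much groups=in_channels shrinks a 3×3 layer.

```python
import torch
import torch.nn as nn

def n_params(m):
    return sum(p.numel() for p in m.parameters())

x = torch.randn(1, 64, 56, 56)

with torch.no_grad():
    # 'same' padding: k // 2 keeps H, W unchanged when stride=1 and k is odd
    for k in (1, 3, 7):
        conv = nn.Conv2d(64, 64, kernel_size=k, padding=k // 2)
        print(k, tuple(conv(x).shape))   # (1, 64, 56, 56) for every k

    # stride=2 downsamples: floor((56 + 2 - 3) / 2) + 1 = 28
    down = nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1)
    print(tuple(down(x).shape))          # (1, 128, 28, 28)

# groups: standard vs depthwise 3x3 conv on 64 channels
standard = nn.Conv2d(64, 64, 3, padding=1, groups=1)
depthwise = nn.Conv2d(64, 64, 3, padding=1, groups=64)
print(n_params(standard))   # 64*64*9 + 64 = 36928
print(n_params(depthwise))  # 64*1*9 + 64 = 640 (each filter sees only its own channel)
```

With groups=in_channels each output channel is computed from a single input channel, so the weight shape becomes [64, 1, 3, 3] and channel mixing has to come from a separate 1×1 conv.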

Why convolution generalizes

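Two reasons. Translation equivariance: if the input shifts, the output shifts the same way (exactly so for stride 1, away from the padded borders), so the model does not need to relearn each feature at every position. Parameter sharing: each filter is reused across the whole image, so there are far fewer parameters, and each one is trained on far more data than a weight in a fully-connected layer, because it receives a gradient from every spatial position.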
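Equivariance can be checked numerically: convolving a shifted input should give the same result as shifting the convolved output. This sketch uses circular padding (`padding_mode="circular"`) together with circular shifts (`torch.roll`) so that the identity holds exactly up to float rounding. With the default zero padding it holds only away from the borders, and with stride 2 a shift by an odd number of pixels no longer maps to a clean shift of the output. The shift amounts and tensor sizes are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Circular padding makes a stride-1 conv exactly equivariant to circular shifts
conv = nn.Conv2d(3, 8, kernel_size=3, padding=1, padding_mode="circular")

x = torch.randn(1, 3, 32, 32)

def shift(t):
    # shift 5 pixels down and 3 pixels left, wrapping around the edges
    return torch.roll(t, shifts=(5, -3), dims=(2, 3))

with torch.no_grad():
    a = conv(shift(x))   # shift the input, then convolve
    b = shift(conv(x))   # convolve, then shift the output

print(torch.allclose(a, b, atol=1e-5))  # True
```

Because the same filter is applied at every position, the layer has no way to treat one location differently from another, which is exactly the property this check exercises.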

Principle: convolution encodes locality and translation as built-in assumptions. When those assumptions fit the data (images, audio spectrograms), a CNN is dramatically more parameter-efficient than a fully-connected model. When they don't (tabular or set-valued data), use a different architecture.
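To put a number on "dramatically", here is a sketch comparing a 3×3 conv with a hypothetical fully-connected layer that maps the same flattened input to the same output shape. The sizes (3×32×32 in, 16×32×32 out) are arbitrary illustration choices, and the Linear layer is counted by formula rather than instantiated, since it would hold about 50 million weights.

```python
import torch.nn as nn

C_in, C_out, H, W, k = 3, 16, 32, 32, 3

conv = nn.Conv2d(C_in, C_out, kernel_size=k, padding=k // 2)
conv_params = sum(p.numel() for p in conv.parameters())   # 3*16*3*3 + 16

# A Linear layer producing the same [C_out, H, W] output from the flattened
# [C_in, H, W] input: every output unit connects to every input pixel.
fc_in, fc_out = C_in * H * W, C_out * H * W
fc_params = fc_in * fc_out + fc_out

print(conv_params)               # 448
print(fc_params)                 # 50348032
print(fc_params // conv_params)  # 112384
```

The gap grows with image size: the conv count stays at 448 for any H and W, while the fully-connected count scales with (H·W)².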

Code

Hand-trace a Conv2d output shape·python

import torch
import torch.nn as nn

x = torch.randn(8, 3, 224, 224)              # [B, C, H, W]

conv = nn.Conv2d(in_channels=3, out_channels=64,
                 kernel_size=3, stride=2, padding=1)
y = conv(x)
print(y.shape)  # torch.Size([8, 64, 112, 112])
# H_out = floor((224 + 2*1 - 3) / 2) + 1 = 112

# Channel-mix 1x1 conv (no spatial change, just learns linear combos of channels)
conv1x1 = nn.Conv2d(64, 32, kernel_size=1)
print(conv1x1(y).shape)  # torch.Size([8, 32, 112, 112])

Depthwise separable conv (mobile efficiency trick)·python

import torch.nn as nn

class DepthwiseSeparable(nn.Module):
    """3x3 depthwise conv (one filter per input channel), then 1x1 pointwise conv (mixes channels)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.dw = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch, bias=False)
        self.pw = nn.Conv2d(in_ch, out_ch, 1, bias=False)

    def forward(self, x):
        return self.pw(self.dw(x))

# Far fewer params than a full Conv2d(in_ch, out_ch, 3), with the same 3x3 receptive field
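The "far fewer params" claim can be made concrete. For k×k filters the parameter ratio is (k²·C_in + C_in·C_out) / (k²·C_in·C_out) = 1/C_out + 1/k², so with k=3 and 64 output channels the separable version needs roughly 1/7.9 of the weights. This sketch rebuilds the same two layers as `DepthwiseSeparable` in an `nn.Sequential` so it runs on its own; the channel counts 32 and 64 are arbitrary, and biases are disabled on both sides for a like-for-like count.

```python
import torch
import torch.nn as nn

def n_params(m):
    return sum(p.numel() for p in m.parameters())

in_ch, out_ch = 32, 64

# Same structure as DepthwiseSeparable: depthwise 3x3, then pointwise 1x1
dw_sep = nn.Sequential(
    nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch, bias=False),  # 32*1*3*3 = 288
    nn.Conv2d(in_ch, out_ch, 1, bias=False),                          # 64*32*1*1 = 2048
)
full = nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False)             # 64*32*3*3 = 18432

x = torch.randn(1, in_ch, 28, 28)
with torch.no_grad():
    assert dw_sep(x).shape == full(x).shape == (1, out_ch, 28, 28)

print(n_params(dw_sep), n_params(full))  # 2336 18432
```

Both produce the same output shape; the saving comes from splitting "look at a 3×3 neighborhood" and "mix channels" into two cheaper steps.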
