Multilayer Perceptron

The simplest neural network worth memorizing

An MLP is just linear → non-linearity → linear → non-linearity → linear, with the final linear layer producing the logits. Every architecture in this quest is a specialization or refinement of this template: a CNN replaces the linear layers with convolutions suited to images, a transformer adds attention, and a recurrent net reuses the same MLP at every timestep.

For tabular data and many simple regression or classification problems, a 2-3 layer MLP with 64–256 hidden units and a sensible activation (ReLU or GELU) is a reasonable starting point. It rarely beats a well-tuned gradient-boosted model, but it is the right shape for learning the rest of the field.

Shapes through an MLP

Track the shape at every layer. Input x: [B, in_dim], hidden h₁: [B, hidden_dim], hidden h₂: [B, hidden_dim], output y: [B, out_dim], where B is the batch size. In PyTorch's convention, the weight matrix connecting two layers has shape [out, in] and the bias has shape [out].
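The [out, in] convention is easy to check directly. The sketch below builds a single nn.Linear and compares its output with the manual computation x @ W.T + b. The sizes (B=64, in_dim=20, hidden_dim=128) are illustrative assumptions, not requirements.

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions chosen for this sketch).
B, in_dim, hidden_dim = 64, 20, 128

layer = nn.Linear(in_dim, hidden_dim)
print(layer.weight.shape)  # torch.Size([128, 20]) -> [out, in]
print(layer.bias.shape)    # torch.Size([128])     -> [out]

x = torch.randn(B, in_dim)  # [B, in_dim]
h = layer(x)                # [B, hidden_dim]
print(h.shape)              # torch.Size([64, 128])

# Because the weight is stored as [out, in], the layer computes x @ W.T + b.
h_manual = x @ layer.weight.T + layer.bias
same = torch.allclose(h, h_manual, atol=1e-5)
print(same)
```

The transpose in x @ W.T is the practical consequence of the storage convention: the batch dimension stays first, and the feature dimension changes from in_dim to hidden_dim.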

Tip: If your network is really just 'one big layer' somewhere, you might as well not have a network. The whole point of multiple layers is the intermediate hidden representations that downstream layers can combine.
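One concrete way to end up with 'one big layer' by accident is to stack linear layers with no non-linearity between them. The sketch below, under assumed sizes (20 → 128 → 3), checks that two stacked linear layers give the same output as a single merged linear layer, and that inserting a ReLU breaks that equivalence. The variable names are hypothetical and exist only for this illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes (assumptions): 20 inputs, 128 hidden units, 3 outputs.
# float64 keeps the numerical comparison below tight.
lin1 = nn.Linear(20, 128).double()
lin2 = nn.Linear(128, 3).double()
x = torch.randn(64, 20, dtype=torch.float64)

with torch.no_grad():
    # Two linear layers with nothing in between...
    stacked = lin2(lin1(x))

    # ...equal a single linear layer with merged weight and bias.
    W = lin2.weight @ lin1.weight            # [3, 128] @ [128, 20] -> [3, 20]
    b = lin2.weight @ lin1.bias + lin2.bias  # [3]
    one_big_layer = x @ W.T + b

    # With a ReLU in between, the merged layer no longer reproduces the output.
    with_relu = lin2(torch.relu(lin1(x)))

collapses = torch.allclose(stacked, one_big_layer)
relu_collapses = torch.allclose(with_relu, one_big_layer)
print(collapses, relu_collapses)
```

The non-linearity is what lets the hidden layer carry a representation that a single matrix multiply could not.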

What changes when you add depth and width

Width (more units per layer) gives you more parallel features at a single level of abstraction. Depth (more layers) gives you more compositional structure across levels of abstraction. Real networks need both. For most tabular problems, 2-3 hidden layers of moderate width (128–512) are enough; for images and text, depth and specialized architectures dominate.
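Width and depth also differ in what they cost. As a rough illustration, the helper below (mlp_param_count, a hypothetical name introduced here) counts weights and biases for a fully connected MLP and compares a baseline against a wider and a deeper variant. The dimensions are assumptions chosen for the example, and parameter count is only a proxy for cost, not a measure of what the network can represent.

```python
def mlp_param_count(in_dim, hidden_dims, out_dim):
    """Count weights and biases of a fully connected MLP."""
    dims = [in_dim, *hidden_dims, out_dim]
    # Each linear layer has fan_in * fan_out weights plus fan_out biases.
    return sum(fan_in * fan_out + fan_out
               for fan_in, fan_out in zip(dims[:-1], dims[1:]))

base = mlp_param_count(20, [128, 128], 3)              # 2 hidden layers of 128
wider = mlp_param_count(20, [256, 256], 3)             # double the width
deeper = mlp_param_count(20, [128, 128, 128, 128], 3)  # double the depth

print("base:  ", base)    # base:   19587
print("wider: ", wider)   # wider:  71939
print("deeper:", deeper)  # deeper: 52611
```

Doubling the width roughly quadruples each hidden-to-hidden weight matrix, while each extra hidden layer of the same width adds a fixed number of parameters (16,512 here).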

Principle: The hidden representation is the whole point of an MLP. The output layer is a thin reader that sits on top of it.

Code

An MLP with explicit shape comments (Python)

import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),     # [B, in_dim] -> [B, hidden_dim]
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), # [B, hidden_dim] -> [B, hidden_dim]
            nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),    # [B, hidden_dim] -> [B, out_dim]
        )

    def forward(self, x):
        return self.net(x)  # raw logits; no softmax is applied here

model = MLP(in_dim=20, hidden_dim=128, out_dim=3)
x = torch.randn(64, 20)  # batch of 64 examples with 20 features each
logits = model(x)
print(logits.shape)  # torch.Size([64, 3])

# Count trainable parameters (weights and biases of all three linear layers).
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print("params:", n_params)  # params: 19587
