nn.Module: the base class everything inherits from

Two methods, unlimited power

Every neural network component in PyTorch, from a single linear layer to a 70B-parameter transformer, is an nn.Module. Subclassing it is the most important Python pattern you will use in this framework, so commit the contract to memory:

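Override __init__. Always call super().__init__() first. Create child modules and parameters by assigning them to self (for example, self.layer1 = nn.Linear(10, 20)). PyTorch hooks __setattr__ to register them.
Override forward(self, x, ...). This defines the computation. Do not call it directly: call the module instance like a function, model(x), not model.forward(x). Calling the instance goes through __call__, which runs any registered forward pre-hooks and forward hooks around forward and sets up module-level backward hooks. Calling forward directly skips all of that.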
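To make the registration step concrete, here is a minimal sketch of what assignment to self does and what happens when super().__init__() is missing. The class names WithSuper and WithoutSuper are hypothetical, chosen only for this illustration.

```python
import torch.nn as nn

class WithSuper(nn.Module):
    def __init__(self):
        super().__init__()                # sets up the internal registries first
        self.layer1 = nn.Linear(10, 20)   # __setattr__ records this as a child module

class WithoutSuper(nn.Module):
    def __init__(self):
        # super().__init__() is missing, so there is nowhere to register the child
        self.layer1 = nn.Linear(10, 20)

# The assigned attribute shows up as a registered child
child_names = [name for name, _ in WithSuper().named_children()]
print(child_names)

# Assigning a module before Module.__init__() has run raises AttributeError
error = None
try:
    WithoutSuper()
except AttributeError as exc:
    error = exc
print(type(error).__name__, error)
```

The failure is loud here, which is the friendly case. The quieter failures come from containers and raw tensors that PyTorch cannot see at all.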

What you get for free

model.parameters(): an iterator over every registered parameter tensor, however deeply nested.
model.named_parameters(): the same, with dotted-path names attached.
model.to(device): moves every parameter and buffer to the device.
model.train() / model.eval(): flips the behavior of the children that care about the mode (Dropout, BatchNorm).
model.state_dict() / load_state_dict(): a serialization-ready dict of all parameters and persistent buffers.
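The difference between parameters and buffers in the list above can be sketched with a small round trip. The Scaler class and its running_scale buffer are hypothetical names used only for this example. register_buffer is the standard way to attach non-learnable state to a module.

```python
import torch
import torch.nn as nn

class Scaler(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(3, 2)
        # A buffer is saved and moved with the module but is not a learnable parameter
        self.register_buffer("running_scale", torch.ones(2))

    def forward(self, x):
        return self.fc(x) * self.running_scale

src = Scaler()
src.running_scale.mul_(3.0)               # change the buffer so the copy is observable

# Buffers appear in state_dict() but not in named_parameters()
state_keys = sorted(src.state_dict().keys())
param_names = [name for name, _ in src.named_parameters()]
print(state_keys)    # ['fc.bias', 'fc.weight', 'running_scale']
print(param_names)   # ['fc.weight', 'fc.bias']

# Round trip into a fresh instance: parameters and the buffer both carry over
dst = Scaler()
dst.load_state_dict(src.state_dict())
weights_match = torch.equal(dst.fc.weight, src.fc.weight)
buffer_match = torch.equal(dst.running_scale, src.running_scale)
print(weights_match, buffer_match)        # True True
```

In practice the dict would usually go through torch.save and torch.load. The in-memory copy here keeps the sketch free of file paths.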

All of this depends on using the nn.Module machinery: assign to self, use nn.ModuleList instead of a plain Python list for collections, and wrap raw learnable tensors in nn.Parameter. Skip any of these and the automation silently stops working.
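The silent failure is easiest to see by counting parameters. The sketch below compares two hypothetical classes, PlainContainers and RegisteredContainers, which differ only in how they hold their layers and their learnable scale tensor.

```python
import torch
import torch.nn as nn

class PlainContainers(nn.Module):
    def __init__(self):
        super().__init__()
        # A plain list and a raw tensor: PyTorch does not register either one
        self.layers = [nn.Linear(4, 4) for _ in range(3)]
        self.scale = torch.ones(4, requires_grad=True)

class RegisteredContainers(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleList and nn.Parameter are both registered on assignment
        self.layers = nn.ModuleList([nn.Linear(4, 4) for _ in range(3)])
        self.scale = nn.Parameter(torch.ones(4))

def count_params(module):
    return sum(p.numel() for p in module.parameters())

plain_count = count_params(PlainContainers())
registered_count = count_params(RegisteredContainers())
print(plain_count)        # 0
print(registered_count)   # 64 = 3 * (4*4 + 4) + 4
```

No error is raised for the first class. An optimizer built from its parameters() would have nothing to update, and to(device) and state_dict() would skip the layers entirely.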

Code

Minimal nn.Module: the canonical subclass·python

import torch
import torch.nn as nn

class TinyMLP(nn.Module):
    def __init__(self, in_dim=784, hidden=128, out_dim=10):
        super().__init__()                    # ALWAYS first
        self.fc1 = nn.Linear(in_dim, hidden)
        self.act = nn.ReLU()
        self.fc2 = nn.Linear(hidden, out_dim)

    def forward(self, x):
        x = self.fc1(x)
        x = self.act(x)
        x = self.fc2(x)
        return x

model = TinyMLP()
print(model)
# TinyMLP(
#   (fc1): Linear(in_features=784, out_features=128, bias=True)
#   (act): ReLU()
#   (fc2): Linear(in_features=128, out_features=10, bias=True)
# )

Free utilities: parameters, device, mode·python

import torch
import torch.nn as nn

class TinyMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 4)

    def forward(self, x):
        return self.fc(x)

model = TinyMLP()

# Parameter iteration
for name, p in model.named_parameters():
    print(name, p.shape)
# fc.weight torch.Size([4, 10])
# fc.bias torch.Size([4])

# Total parameter count
total = sum(p.numel() for p in model.parameters())
print(f"Params: {total:,}")  # Params: 44

# Move once, everything follows
model = model.to('cpu')        # or 'cuda' / 'mps'
print(next(model.parameters()).device)

# Mode switching
model.train()                  # default mode
model.eval()                   # affects Dropout / BatchNorm
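What the mode switch actually does can be sketched with a Dropout layer, since the TinyMLP above has no mode-dependent children. The net below is a hypothetical two-layer nn.Sequential used only for illustration.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.5))
x = torch.ones(4, 8)

# eval() sets training=False on the module and every child
net.eval()
flags_eval = [m.training for m in net.modules()]
with torch.no_grad():
    # Dropout is the identity in eval mode, so repeated calls agree
    same_in_eval = torch.equal(net(x), net(x))

# train() flips the flag back on the whole tree
net.train()
flags_train = [m.training for m in net.modules()]

print(flags_eval)     # [False, False, False]
print(same_in_eval)   # True
print(flags_train)    # [True, True, True]
```

In train mode the two calls would generally differ, because Dropout samples a new mask on every forward pass.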

Why model(x), never model.forward(x)·python

import torch
import torch.nn as nn

class HookedLinear(nn.Linear):
    pass

m = HookedLinear(4, 2)

# Register a hook that prints the OUTPUT shape after each forward
def hook(module, inputs, output):
    print(f"Hook fired: out shape = {output.shape}")

m.register_forward_hook(hook)

x = torch.randn(3, 4)

m(x)               # Hook fired: out shape = torch.Size([3, 2])
# m.forward(x)     # No hook fires: this bypasses __call__.

# This is why model(x) is the convention. Bypassing __call__ skips:
#   - registered forward pre-hooks and forward hooks
#   - backward hooks registered on the module
#   - any tooling that relies on those hooks to observe or patch the module
