In-place Operations and the Trailing Underscore

Convention: trailing underscore = mutate

PyTorch ops with a trailing underscore mutate their own tensor in place: add_, mul_, zero_, fill_, uniform_, normal_, clamp_. The versions without the underscore return a new tensor and leave the input unchanged.
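One quick way to see the convention is to compare object identity and storage addresses. This is a minimal sketch. It relies on `Tensor.data_ptr()`, which returns the address of the tensor's first element, so an unchanged pointer means no new buffer was allocated.

```python
import torch

t = torch.tensor([1.0, 2.0, 3.0])
ptr_before = t.data_ptr()

# Out-of-place: returns a new tensor with its own storage; t is untouched.
u = t.mul(2)
print(u is t, u.data_ptr() == ptr_before)  # False False
print(t)                                   # tensor([1., 2., 3.])

# In-place: returns the very same Python object, backed by the same storage.
v = t.mul_(2)
print(v is t, t.data_ptr() == ptr_before)  # True True
print(t)                                   # tensor([2., 4., 6.])
```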

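Why care? Two reasons:

Memory. In-place ops avoid allocating a new tensor. For a 1B-parameter model this matters: one full-size float32 temporary is about 4 GB.
Autograd safety. In-place ops can corrupt the autograd graph. PyTorch usually detects this through a per-tensor version counter and raises a clear error, but know the rule anyway: don't mutate a tensor that backward needs.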
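Here is a number for the memory point. An out-of-place op allocates a result the same size as its input, which is `numel() * element_size()` bytes. The sketch below computes that for a small tensor. The 1B-parameter figure is pure arithmetic for an assumed float32 model, and no such tensor is created. `result_bytes` is a hypothetical helper name.

```python
import torch

def result_bytes(t: torch.Tensor) -> int:
    # Bytes occupied by a result tensor with the same shape and dtype as t.
    return t.numel() * t.element_size()

w = torch.zeros(1024, 1024)   # float32 by default: 4 bytes per element
print(result_bytes(w))        # 4194304 (4 MiB per out-of-place copy)

# Pure arithmetic for an assumed 1B-parameter model stored in float32.
n_params = 1_000_000_000
bytes_fp32 = n_params * 4
print(bytes_fp32 / 2**30)     # about 3.73 GiB per full-size temporary
```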

Where in-place ops are routine

optimizer.zero_grad() calls p.grad.zero_() on every parameter's gradient. With set_to_none=True it sets p.grad = None instead; that is the default in modern PyTorch (2.0 and later) and slightly faster.
Custom weight initialization usually means .uniform_() / .normal_() inside a torch.no_grad() block.
Updating a teacher / momentum network with an exponential moving average (EMA) is textbook in-place territory.
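The zero_grad and EMA patterns can be sketched together. This is a minimal illustration, not any library's EMA implementation. `student`, `teacher` and `decay = 0.99` are hypothetical names and values. Treating `set_to_none=True` as the default assumes PyTorch 2.0 or later.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
student = nn.Linear(4, 2)            # hypothetical model being trained
teacher = copy.deepcopy(student)     # hypothetical EMA copy, never trained directly
for p in teacher.parameters():
    p.requires_grad_(False)

opt = torch.optim.SGD(student.parameters(), lr=0.1)

# One training step on the student.
loss = student(torch.randn(8, 4)).pow(2).mean()
loss.backward()
opt.step()

# zero_grad: either zero the existing grad tensors in place ...
opt.zero_grad(set_to_none=False)
print(student.weight.grad)           # all zeros
# ... or drop them entirely (the default since PyTorch 2.0).
opt.zero_grad()
print(student.weight.grad)           # None

# EMA update, in place: teacher = decay * teacher + (1 - decay) * student
decay = 0.99                         # assumed value for illustration
before = teacher.weight.clone()      # snapshot, kept only to check the update
with torch.no_grad():
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(decay).add_(s_p, alpha=1 - decay)
```

The `mul_` / `add_` chain updates each teacher parameter without allocating a new tensor per step. The `no_grad` block keeps the update out of the autograd graph.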

Outside those patterns, lean on the non-mutating versions. In ordinary model code, the memory saved by extra in-place ops is rarely worth the autograd risk.
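In-place mutation can also appear without an underscore, through augmented assignment. For tensors, `out += 1` calls `__iadd__` and mutates `out` in place, while `out = out + 1` allocates a new tensor. The sketch below uses `torch.sigmoid`, whose backward reuses its own saved output. `run` is a hypothetical helper.

```python
import torch

def run(in_place: bool) -> str:
    x = torch.tensor([0.0, 1.0, 2.0], requires_grad=True)
    out = torch.sigmoid(x)   # sigmoid's backward saves this output
    if in_place:
        out += 1             # in-place (__iadd__): overwrites the saved output
    else:
        out = out + 1        # out-of-place: the saved output stays intact
    try:
        out.sum().backward()
        return "ok"
    except RuntimeError:
        return "error"

print(run(in_place=False))   # ok
print(run(in_place=True))    # error
```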

Code

Out-of-place vs in-place·python

import torch

t = torch.tensor([1.0, 2.0, 3.0])

# Out-of-place: returns new tensor, original untouched
t2 = t.add(5)
print(t)   # tensor([1., 2., 3.])
print(t2)  # tensor([6., 7., 8.])

# In-place: mutates t, returns t for chaining
t.add_(5)
print(t)   # tensor([6., 7., 8.])

# Chaining
t.mul_(2).clamp_(0, 100)
print(t)   # tensor([12., 14., 16.])

In-place inside no_grad: weight init·python

import torch
import torch.nn as nn

linear = nn.Linear(10, 4)

# Custom Xavier (Glorot) uniform init. Must run inside no_grad: in-place ops
# on a leaf tensor that requires grad (like a parameter) raise an error otherwise.
with torch.no_grad():
    bound = (6.0 / (linear.in_features + linear.out_features)) ** 0.5
    linear.weight.uniform_(-bound, bound)
    linear.bias.zero_()

print(linear.weight.std())  # expected about bound / sqrt(3) = sqrt(2 / (10 + 4)), i.e. roughly 0.38
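For comparison, `torch.nn.init` ships the same initializers as underscore-suffixed functions. This sketch uses `nn.init.xavier_uniform_` and `nn.init.zeros_`. In current PyTorch these wrap their fills in `torch.no_grad()` internally, so no explicit block is needed. For a weight of shape (4, 10), `xavier_uniform_` uses fan_in = 10 and fan_out = 4, which gives the same bound as the hand-written version.

```python
import torch
import torch.nn as nn

linear = nn.Linear(10, 4)

# nn.init functions follow the trailing-underscore convention and handle
# no_grad themselves, so they can be called directly on parameters.
nn.init.xavier_uniform_(linear.weight)  # U(-bound, bound), bound = sqrt(6 / (fan_in + fan_out))
nn.init.zeros_(linear.bias)

bound = (6.0 / (10 + 4)) ** 0.5
# Small tolerance for float32 rounding at the interval edge.
print(bool((linear.weight.abs() <= bound + 1e-6).all()))  # True
print(linear.bias)                                       # a Parameter of zeros
```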

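Why autograd dislikes in-place: a small demo·python

import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x ** 2          # pow's backward needs x's values: dy/dx = 2x
y.sum().backward()  # works
print(x.grad)       # tensor([2., 4., 6.])

# Now mutate a tensor that backward needs, AFTER the forward but BEFORE backward: invalid.
# (x is a leaf that requires grad, so x.add_() would already fail on the spot;
# a tracked non-leaf h shows the failure at backward time instead.)
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
h = x.clone()       # non-leaf tensor tracked by autograd
y = h ** 2          # pow saves h for its backward
h.add_(100)         # corrupts the value autograd saved (bumps h's version counter)
try:
    y.sum().backward()
except RuntimeError as e:
    print(f"{type(e).__name__}: {e}")
# RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: ...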
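A related failure is mutating a leaf tensor directly, which is exactly what weight initialization does to parameters. In this minimal sketch, `w` stands in for a parameter. Outside `no_grad` the in-place op is rejected before it runs. Inside `no_grad` it is allowed, and `w` stays a leaf that still requires grad.

```python
import torch

w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)  # a leaf, like a parameter

# Outside no_grad: rejected immediately, and w is left unchanged.
raised = False
try:
    w.add_(100)
except RuntimeError as e:
    raised = True
    print("RuntimeError:", e)  # mentions "a leaf Variable that requires grad"

# Inside no_grad: allowed, and not recorded by autograd.
with torch.no_grad():
    w.add_(100)
print(w)          # tensor([101., 102., 103.], requires_grad=True)
print(w.is_leaf)  # True
```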
