What Autograd actually does

Reverse-mode automatic differentiation, inside eager Python

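Autograd is PyTorch's automatic differentiation engine. It does one thing: once you compute a scalar (a loss, say), it walks backward through the ops that produced that scalar and accumulates the gradient of the scalar into every leaf tensor that asked for gradient tracking.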
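"Accumulates" is meant literally. A minimal sketch with made-up tensor names (`w`, `c`) shows that a second backward call adds to `.grad` rather than overwriting it, and that a leaf which never asked for tracking gets nothing:

```python
import torch

# A leaf tensor that asks for gradient tracking
w = torch.tensor(2.0, requires_grad=True)
# A leaf tensor that does not; autograd ignores it
c = torch.tensor(5.0)

loss = w * c               # d(loss)/dw = c = 5
loss.backward()
first = w.grad.item()      # 5.0

# A second forward/backward adds to w.grad instead of overwriting it
loss = w * c
loss.backward()
second = w.grad.item()     # 5 + 5 = 10.0

print(first, second, c.grad)   # 5.0 10.0 None

# Reset before the next step (optimizers do this through zero_grad())
w.grad = None
```

This is why a training loop clears gradients once per step.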

The 'reverse-mode' part matters: forward-mode AD computes derivatives going forward (good when inputs are few and outputs are many); reverse-mode goes backward (good when inputs are many and there is a single output, which is exactly the loss-vs-parameter shape of deep learning). With 100M parameters and a scalar loss, reverse-mode is the only practical option.
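A rough sketch of the pass count, using a toy function `f` (an assumption, not from the lesson). One backward pass returns all n partial derivatives, while a directional-derivative approach needs one pass per input. `torch.autograd.functional.jvp` is used here only to get a Jacobian-vector product; internally it may not be true forward-mode, so treat this as an illustration of the counting argument, not a benchmark.

```python
import torch
from torch.autograd.functional import jvp

def f(x):
    # Many inputs, one scalar output
    return (x ** 2).sum()

n = 5
x = torch.arange(1.0, n + 1)            # [1, 2, 3, 4, 5]

# Reverse-mode: one backward pass yields all n partial derivatives
x_rev = x.clone().requires_grad_(True)
f(x_rev).backward()
grad_reverse = x_rev.grad               # 2 * x

# Forward-mode style: one directional derivative per pass, so n passes
grad_forward = torch.zeros(n)
for i in range(n):
    direction = torch.zeros(n)
    direction[i] = 1.0                  # i-th basis vector
    _, d = jvp(f, (x,), (direction,))
    grad_forward[i] = d

print(torch.allclose(grad_forward, grad_reverse))   # True
```

With n = 100M the loop above would need 100M passes, while the reverse-mode version still needs one.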

'Dynamic graph' and what it buys you

PyTorch builds the graph on the fly, on every forward pass. That means:

You can use Python control flow (if/else, while, for) inside forward(). The graph reflects the path this input actually took.
You can debug with print, breakpoint, or pdb. The forward pass is just Python.
The graph is thrown away after backward (unless you ask to retain it). Memory stays bounded.
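The last point is easy to see directly. A small sketch (variable names are illustrative): a second backward through the same graph fails by default, and works once you ask for `retain_graph=True`.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)

# Default: the graph's saved tensors are freed by the first backward
y = x * x
y.backward()
try:
    y.backward()
    second_call_failed = False
except RuntimeError:
    second_call_failed = True
print(second_call_failed)      # True

# Asking to retain the graph allows a second backward pass
x.grad = None
y = x * x
y.backward(retain_graph=True)
y.backward()
print(x.grad.item())           # gradients accumulate: 6 + 6 = 12.0
```

Retaining the graph keeps the saved intermediates alive, so it trades memory for the ability to backpropagate again.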

The cost: a dynamic graph is slightly slower to set up than a static one, because there's no chance for ahead-of-time optimization. torch.compile() (a later track) gives you static-graph speedups without giving up the dynamic feel, but autograd itself stays dynamic.

Code

Minimal autograd loop·python

import torch

# A tracked scalar
x = torch.tensor(3.0, requires_grad=True)

# Forward — y depends on x
y = x ** 2 + 2 * x + 1   # y = 16 at x=3

# Backward — compute dy/dx
y.backward()

# Result lands on x.grad
# dy/dx = 2x + 2 = 2*3 + 2 = 8
print(x.grad)            # tensor(8.)

Dynamic graph: Python control flow inside forward·python

import torch

def f(x, branch):
    if branch == "polynomial":
        return x ** 3 - x
    else:
        return torch.sin(x) * x

# Same code path adapts based on Python data
x = torch.tensor(2.0, requires_grad=True)
y = f(x, "polynomial")
y.backward()
print(x.grad)            # 3*x^2 - 1 = 11

x.grad = None
y = f(x, "trig")
y.backward()
# d/dx (sin(x)*x) = cos(x)*x + sin(x)  ≈ -0.832 + 0.909 = 0.077
print(x.grad)

Many params, one scalar: the deep learning shape·python

import torch

# A toy 'model': linear + bias
W = torch.randn(3, 2, requires_grad=True)
b = torch.randn(2, requires_grad=True)
x = torch.randn(4, 3)             # batch=4, features=3
y_true = torch.randn(4, 2)

# Forward
y_pred = x @ W + b                # (4, 2)
loss = ((y_pred - y_true) ** 2).mean()

# One backward call → gradients on every leaf
loss.backward()
print(W.grad.shape)    # torch.Size([3, 2])
print(b.grad.shape)    # torch.Size([2])
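
To check that those gradients are what the math says, here is a sketch that recomputes them by hand for the same toy model. The seed and the names `d_pred`, `W_manual`, `b_manual` are assumptions added for illustration. For a mean squared error over N elements, d(loss)/d(y_pred) = 2 (y_pred - y_true) / N, and the chain rule through `x @ W + b` gives the rest.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for repeatability
W = torch.randn(3, 2, requires_grad=True)
b = torch.randn(2, requires_grad=True)
x = torch.randn(4, 3)
y_true = torch.randn(4, 2)

y_pred = x @ W + b
loss = ((y_pred - y_true) ** 2).mean()
loss.backward()

# Hand-derived gradients of the mean squared error
with torch.no_grad():
    # d(loss)/d(y_pred): mean() divides by the element count (4 * 2 = 8)
    d_pred = 2 * (y_pred - y_true) / y_pred.numel()
    W_manual = x.T @ d_pred          # (3, 4) @ (4, 2) -> (3, 2)
    b_manual = d_pred.sum(dim=0)     # bias is broadcast over the batch

print(torch.allclose(W.grad, W_manual, atol=1e-6))   # True
print(torch.allclose(b.grad, b_manual, atol=1e-6))   # True
```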
