Math: Element-wise, Matmul, Broadcasting

Three categories of math plus reductions, each with one quirk

Element-wise

All the standard operators (+ - * / **) and most torch.foo functions (exp, log, sqrt, abs, clamp) work element-wise. The asterisk in a * b is element-wise multiplication, not matrix multiplication; this is the most common point of confusion when porting from textbook math.
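As a small illustration of "element-wise" for the named functions, the sketch below applies a few of them to one tensor. The tensor values are made up for the example; the point is only that every result has the same shape as the input and each output element depends on exactly one input element.

```python
import torch

x = torch.tensor([[1.0, 4.0], [9.0, 16.0]])

# Each call is computed independently per element and keeps x's shape.
roots = torch.sqrt(x)                          # [[1., 2.], [3., 4.]]
squared = x ** 2                               # [[1., 16.], [81., 256.]]
clamped = torch.clamp(x, min=2.0, max=10.0)    # [[2., 4.], [9., 10.]]

print(roots.shape == x.shape)   # True
```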

Matrix multiplication

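Matrix multiplication is @ (PEP 465, added in Python 3.5), or equivalently torch.matmul. Both handle 2D-by-2D inputs as well as batched tensors with 3 or more dims; in the batched case the last two dims are multiplied as matrices and the leading dims are broadcast. torch.bmm is the strict batched version: it accepts only 3D inputs with the same batch size and does no broadcasting, which is useful when you want to enforce a shape contract.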
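To make the difference between the broadcasting and strict variants concrete, here is a minimal sketch. The shapes and the 2D matrix w are assumptions chosen for illustration (w stands in for a hypothetical projection weight).

```python
import torch

x = torch.randn(2, 8, 64, 32)   # (batch, heads, seq, head_dim)
w = torch.randn(32, 16)         # plain 2D matrix (hypothetical projection)

# @ / torch.matmul broadcasts w across the leading (2, 8) dims
out = x @ w
print(out.shape)                # torch.Size([2, 8, 64, 16])

# torch.bmm refuses anything that is not 3D-by-3D
try:
    torch.bmm(x, w)
    bmm_rejected = False
except RuntimeError:
    bmm_rejected = True
print(bmm_rejected)             # True
```

If a silent broadcast would hide a shape bug in your code, the error from torch.bmm is the behavior you want.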

Reductions

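Reductions (sum, mean, max, argmax, std) collapse one or more dims. The dim argument decides which: x.sum() reduces over everything, while x.sum(dim=0) sums down the rows, collapsing dim 0 and leaving one value per column. keepdim defaults to False, so the reduced dim disappears. Pass keepdim=True to keep it as a size-1 dim so the result stays compatible with downstream broadcasting.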
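The keepdim default is easiest to appreciate by watching it fail. The following sketch, with made-up values, normalizes each row by its sum once with keepdim=True and once without.

```python
import torch

t = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

flat = t.sum(dim=1)                 # shape (2,)   reduced dim is gone
kept = t.sum(dim=1, keepdim=True)   # shape (2, 1) reduced dim kept as size 1

# (2, 3) / (2, 1) broadcasts: each row is divided by its own sum
normalized = t / kept
print(normalized.sum(dim=1))        # each row sums to 1, up to float rounding

# (2, 3) / (2,) does not: right-aligned, 3 and 2 are incompatible
try:
    t / flat
    failed = False
except RuntimeError:
    failed = True
print(failed)                       # True
```

Note that the error only appears because the tensor is not square. With a (3, 3) input, t / flat would run without error and divide each column, not each row, by the row sums.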

Broadcasting

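Broadcasting virtually expands size-1 dims so that tensors of different shapes can be combined. The rules, applied from right to left:

Each pair of dims must be equal, or one of them must be 1.
A missing leading dim is treated as 1.

Same as NumPy. When in doubt, write the two shapes one above the other, right-aligned, and check them pair by pair starting from the right.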
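The two rules are short enough to write out by hand. The helper below, broadcast_shape, is a hypothetical name introduced for this sketch and is not part of PyTorch; it is compared against torch.broadcast_shapes, which performs the same check on shapes without allocating any data.

```python
from itertools import zip_longest

import torch


def broadcast_shape(a, b):
    """Hypothetical helper: apply the two broadcasting rules to shapes a and b."""
    result = []
    # Walk both shapes right to left; a missing leading dim counts as 1
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and x != 1 and y != 1:
            raise ValueError(f"dims {x} and {y} are incompatible")
        result.append(y if x == 1 else x)
    return tuple(reversed(result))


print(broadcast_shape((3, 4), (4,)))      # (3, 4)
print(broadcast_shape((3, 1), (1, 4)))    # (3, 4)

# Agrees with PyTorch's own shape-only check
print(tuple(torch.broadcast_shapes((2, 8, 64, 1), (64,))))   # (2, 8, 64, 64)
print(broadcast_shape((2, 8, 64, 1), (64,)))                 # (2, 8, 64, 64)

# Incompatible: right-aligned, 3 vs 4 differ and neither is 1
try:
    broadcast_shape((4, 3), (4,))
    compatible = True
except ValueError:
    compatible = False
print(compatible)                         # False
```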

Code

Element-wise vs matrix multiply·python

import torch

a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
b = torch.tensor([[5.0, 6.0], [7.0, 8.0]])

# ELEMENT-WISE — what the asterisk does
a * b
# tensor([[ 5., 12.],
#         [21., 32.]])

# MATRIX MULTIPLY — what @ does
a @ b
# tensor([[19., 22.],
#         [43., 50.]])

# Both also exist as named functions
torch.mul(a, b)        # element-wise
torch.matmul(a, b)     # matrix multiply

Batched matmul (attention pattern)·python

import torch

# Q, K, V in attention: (batch, heads, seq, head_dim)
Q = torch.randn(2, 8, 64, 32)
K = torch.randn(2, 8, 64, 32)

# Attention scores: (batch, heads, seq, seq)
# K.transpose(-2, -1) → (2, 8, 32, 64)
scores = (Q @ K.transpose(-2, -1)) / (32 ** 0.5)
print(scores.shape)   # torch.Size([2, 8, 64, 64])

# torch.bmm is the strictly-3D version (no broadcasting on the leading dim)
A = torch.randn(8, 3, 4)
B = torch.randn(8, 4, 5)
torch.bmm(A, B).shape  # torch.Size([8, 3, 5])

Reductions and keepdim·python

import torch

t = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

t.sum()             # tensor(21.) — over everything
t.sum(dim=0)        # tensor([5., 7., 9.]) — collapse rows → shape (3,)
t.sum(dim=1)        # tensor([6., 15.])     — collapse cols → shape (2,)

# keepdim=True preserves the dim, which keeps broadcasting valid downstream
mean_per_row = t.mean(dim=1, keepdim=True)   # shape (2, 1)
centered = t - mean_per_row                  # broadcasts cleanly
print(centered)

Broadcasting in practice·python

import torch

t = torch.zeros(3, 4)
row = torch.tensor([1, 2, 3, 4])      # (4,)        broadcasts down rows
col = torch.tensor([[10], [20], [30]])  # (3, 1)    broadcasts across cols

(t + row).shape   # torch.Size([3, 4])
(t + col).shape   # torch.Size([3, 4])
(t + row + col).shape  # torch.Size([3, 4])

# Right-aligned shape check:
#   t    (3, 4)
#   row     (4,)   ← last dim 4 matches 4; missing leading dim treated as 1
#   col  (3, 1)    ← last dim 1 stretches to 4; first dim 3 matches 3
