Matrix Multiplication and Broadcasting

The two operations behind every Dense layer

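Matrix multiplication is the core operation in a neural network. Mathematically, a Dense layer is matmul + bias add + activation. tf.matmul(a, b) requires a.shape[-1] == b.shape[-2]: the inner dimensions are the ones that get contracted. Any leading dimensions are batch dimensions and have to match (or broadcast).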
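To make the "matmul + bias add + activation" claim concrete, here is a minimal sketch of a Dense forward pass written out by hand. The sizes (batch 4, 3 inputs, 2 units), the variable names, the constant weights, and the choice of ReLU are all assumptions picked for illustration, not taken from any particular model.

```python
import tensorflow as tf

# Hypothetical sizes for illustration: batch=4, in_features=3, units=2
x = tf.ones([4, 3])             # (batch, in_features)
w = tf.fill([3, 2], 0.5)        # (in_features, units), constant weights for a predictable result
b = tf.constant([1.0, -2.0])    # (units,)

z = tf.matmul(x, w)             # contracts the inner dim of size 3: (4, 3) @ (3, 2) -> (4, 2)
z = z + b                       # bias (2,) is broadcast across the batch dimension
y = tf.nn.relu(z)               # elementwise activation, shape unchanged

print(y.shape)                  # (4, 2)
print(y[0].numpy())             # [2.5 0. ]
```

Every row is the same here because the input is all ones: each unit sums to 1.5, the bias shifts it to 2.5 and -0.5, and ReLU clips the negative value to 0.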

Python's @ operator is shorthand for tf.matmul. Use whichever reads better.

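Broadcasting implicitly expands size-1 dimensions so that tensors of different shapes can be combined in one operation. The rules: right-align the shapes (missing leading dimensions count as size 1); a size-1 dimension can be expanded to match the other side; if two dimensions differ and neither is 1, it's an error. Adding an (n,) bias to a (batch, n) activation works because the bias is treated as (1, n) and broadcast across the batch.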
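The rules above are short enough to write down as plain Python. This is only a sketch of the shape logic: `broadcast_shape` is a hypothetical helper invented for this example, not a TensorFlow function, and it works on tuples of ints rather than tensors.

```python
from itertools import zip_longest

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: return the broadcast result shape or raise ValueError."""
    result = []
    # Right-align by walking both shapes from the last dim; missing leading dims count as 1.
    for da, db in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if da == db or db == 1:
            result.append(da)      # equal sizes, or b's dim is expanded to match a
        elif da == 1:
            result.append(db)      # a's dim is expanded to match b
        else:
            raise ValueError(f"incompatible dims {da} and {db}")
    return tuple(reversed(result))

print(broadcast_shape((32, 10), (10,)))        # (32, 10)
print(broadcast_shape((4, 1, 3), (1, 5, 3)))   # (4, 5, 3)
```

A pair like (2, 3) and (2,) raises, because right-alignment compares 3 against 2 and neither is 1.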

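Broadcasting is silent. TF won't warn you when you broadcast by accident. If you meant a row vector (1, n) but actually have a column vector (n, 1), you get a different result with no error whenever the shapes happen to be compatible; for example, (n,) combined with (n, 1) gives (n, n). Always print the output shape after an operation on tensors of different sizes.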
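A small sketch of how this bites in practice. The scenario (predictions of shape (3,) against targets that accidentally have shape (3, 1)) and the variable names are assumptions made up for illustration; the point is that nothing raises and the "loss" is quietly wrong.

```python
import tensorflow as tf

pred = tf.constant([1.0, 2.0, 3.0])             # (3,)
target = tf.constant([[1.0], [2.0], [3.0]])     # (3, 1): a column vector by accident

diff = pred - target                            # no error: (3,) and (3, 1) broadcast
print(diff.shape)                               # (3, 3), not (3,)
print(float(tf.reduce_mean(tf.square(diff))))   # nonzero, although pred matches target exactly

# One possible fix: drop the stray axis so both sides are (3,)
fixed = pred - tf.squeeze(target, axis=-1)
print(fixed.shape)                              # (3,)
print(float(tf.reduce_mean(tf.square(fixed))))  # 0.0
```

Printing `diff.shape` is what exposes the problem; the values alone look like ordinary numbers.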

Code

Matrix multiplication patterns·python

import tensorflow as tf

# Basic: (m, k) @ (k, n) → (m, n)
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])     # (2, 2)
b = tf.constant([[5.0], [6.0]])                # (2, 1)
result = tf.matmul(a, b)                       # (2, 1)
result2 = a @ b                                # same — Python @ shorthand

# Batched: leading dims are batch
batch_a = tf.ones([32, 64, 128])
batch_b = tf.ones([32, 128, 10])
out = tf.matmul(batch_a, batch_b)              # (32, 64, 10)

# transpose_b avoids creating an intermediate transpose tensor
a = tf.ones([2, 3])
b = tf.ones([2, 3])
result = tf.matmul(a, b, transpose_b=True)     # (2, 3) @ (3, 2) → (2, 2)
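The batched example above uses matching batch dimensions on both sides. As a hedged sketch of the "or broadcast" case, a single rank-2 weight matrix can be shared across a whole batch; the sizes and names below are assumptions for illustration.

```python
import tensorflow as tf

batch_x = tf.ones([32, 64, 128])     # (batch, rows, k)
shared_w = tf.ones([128, 10])        # (k, n), no batch dimension

# The rank-2 operand is broadcast across the batch dim of size 32
out = tf.matmul(batch_x, shared_w)
print(out.shape)                     # (32, 64, 10)
```

The inner dimensions (128 here) still have to match exactly; only the leading batch dimensions are broadcast.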

Broadcasting examples·python

import tensorflow as tf

x = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])  # (2, 3)

# Scalar broadcast
print(x + 10)         # [[11,12,13],[14,15,16]]

# Row vector (1, 3) broadcast to (2, 3)
row_bias = tf.constant([[10.0, 20.0, 30.0]])
print(x + row_bias)   # [[11,22,33],[14,25,36]]

# Column vector (2, 1) broadcast to (2, 3)
col_bias = tf.constant([[100.0], [200.0]])
print(x + col_bias)   # [[101,102,103],[204,205,206]]

# Higher-rank broadcasting
a = tf.ones([4, 1, 3])
b = tf.ones([1, 5, 3])
print((a + b).shape)  # (4, 5, 3)
