Perceptron Basics

The 1958 model that started it all

Frank Rosenblatt's perceptron (1958) is a single-output binary classifier: y = sign(w · x + b). Rosenblatt proved a beautiful convergence theorem: if the data is linearly separable, his learning rule finds a separating hyperplane in a finite number of steps. The press called it the seed of a self-aware machine, which aged about as well as you'd expect.

Minsky and Papert's Perceptrons (1969) pointed out that a single perceptron can't learn XOR, a problem that needs a non-linear decision boundary. The book is often blamed for the AI winter that followed, but the real reason was that nobody knew how to train multilayer networks effectively. Backprop solved that almost 20 years later.

Why it's still taught

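The perceptron is the simplest example of a 'linear model with a step on the output'. Modern classifiers built on sigmoid or softmax outputs and trained with cross-entropy are its smoother, differentiable cousins. Once you understand the perceptron, the rest of the math feels inevitable rather than mysterious.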
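
To make the "smoother cousin" idea concrete, here's a minimal sketch that puts a hard step and a sigmoid on top of the same linear score. The weights `w`, `b` and the sample points are made up for illustration, not taken from any trained model, and the helper names are hypothetical.

```python
import numpy as np

def step_predict(X, w, b):
    # Perceptron-style output: hard threshold on the linear score.
    return np.where(X @ w + b > 0, 1, -1)

def sigmoid_predict_proba(X, w, b):
    # Logistic-style output: same linear score, squashed into (0, 1).
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

# Hypothetical weights and inputs, chosen only for illustration.
w = np.array([1.0, -2.0])
b = 0.5
X = np.array([[3.0, 1.0], [0.0, 1.0], [-1.0, -1.0], [2.0, 2.0]])

hard = step_predict(X, w, b)
soft = sigmoid_predict_proba(X, w, b)

# Thresholding the sigmoid at 0.5 recovers the step decision.
soft_as_label = np.where(soft > 0.5, 1, -1)

print("hard labels:      ", hard)
print("sigmoid outputs:  ", np.round(soft, 3))
print("decisions agree:  ", bool((soft_as_label == hard).all()))
```

Both heads draw the same decision boundary; the difference is that the sigmoid changes smoothly with the score, so it has a useful gradient everywhere, while the step is flat almost everywhere.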

Tip: The perceptron's failure on XOR is the canonical motivation for hidden layers and non-linear activations. You should be able to draw it on a whiteboard at any time; it's one of the highest-leverage examples in the whole field.
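
Here's a rough sketch of that whiteboard argument in code. The first half runs the plain perceptron rule on the XOR truth table and records how many mistakes each epoch makes; the second half is a two-layer network with step activations whose weights are hand-picked (not learned) so the hidden units compute OR and AND. The function names are just illustrative.

```python
import numpy as np

# XOR truth table with labels in {-1, +1}.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])

def count_mistakes_per_epoch(X, y, lr=1.0, epochs=100):
    # Plain perceptron rule; returns the number of updates made in each epoch.
    w = np.zeros(X.shape[1])
    b = 0.0
    history = []
    for _ in range(epochs):
        wrong = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:
                w += lr * yi * xi
                b += lr * yi
                wrong += 1
        history.append(wrong)
    return history

history = count_mistakes_per_epoch(X, y)
# No single line separates XOR, so no epoch can finish without a mistake.
print("fewest mistakes in any epoch:", min(history))

def step(z):
    return (z > 0).astype(float)

def two_layer_xor(X):
    # Hand-picked weights: hidden units compute OR and AND,
    # and the output fires for "OR and not AND", which is XOR.
    h_or = step(X @ np.array([1.0, 1.0]) - 0.5)
    h_and = step(X @ np.array([1.0, 1.0]) - 1.5)
    out = step(h_or - h_and - 0.5)
    return np.where(out > 0, 1, -1)

print("two-layer predictions:", two_layer_xor(X))
```

One hidden layer with a non-linearity is enough to represent XOR. Finding such weights automatically is the part that had to wait for backprop.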

The shape of the perceptron rule

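For each misclassified example, nudge the weight vector toward the input if it should have been positive, or away from it if it should have been negative: w := w + η y x (and, for the bias, b := b + η y). It's gradient descent in disguise: a stochastic gradient step on the hinge-shaped loss max(0, -y(w · x + b)), with the step function applied on top to produce the prediction.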
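
The "gradient descent in disguise" claim can be checked numerically. The sketch below assumes the zero-margin hinge-shaped loss max(0, -y(w · x + b)), estimates its gradient with finite differences on one misclassified example, and compares a gradient step with the perceptron update. The specific numbers and helper names are made up for illustration.

```python
import numpy as np

def perceptron_loss(w, b, x, y):
    # Hinge-shaped loss with zero margin: positive only for misclassified examples.
    return max(0.0, -y * (w @ x + b))

def numerical_grad_w(w, b, x, y, eps=1e-6):
    # Central finite differences, one coordinate at a time.
    grad = np.zeros_like(w)
    for i in range(len(w)):
        d = np.zeros_like(w)
        d[i] = eps
        grad[i] = (perceptron_loss(w + d, b, x, y)
                   - perceptron_loss(w - d, b, x, y)) / (2 * eps)
    return grad

# Hypothetical values: a +1 example that the current weights misclassify.
w = np.array([-1.0, 0.5])
b = 0.0
x = np.array([2.0, 1.0])
y = 1
eta = 1.0

grad = numerical_grad_w(w, b, x, y)
sgd_step = w - eta * grad      # one gradient descent step on the loss
rule_step = w + eta * y * x    # the perceptron rule

print("numerical gradient:", np.round(grad, 4))
print("same update:", np.allclose(sgd_step, rule_step, atol=1e-4))
```

For a correctly classified example the loss is flat at zero, so the gradient vanishes and nothing moves, which matches the rule only firing on mistakes.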

Code

Perceptron from scratch·python

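import numpy as np

def perceptron_train(X, y, lr=1.0, epochs=20):
    """Train a perceptron; labels in y must be +1 or -1."""
    w = np.zeros(X.shape[1])  # one weight per feature
    b = 0.0
    for _ in range(epochs):
        wrong = 0
        for xi, yi in zip(X, y):
            # Misclassified (or exactly on the boundary) when y * score <= 0.
            if yi * (w @ xi + b) <= 0:
                w += lr * yi * xi  # move the boundary toward classifying xi correctly
                b += lr * yi
                wrong += 1
        if wrong == 0:
            break  # a full pass with no mistakes: converged
    return w, b

# Toy linearly separable data: positives in the upper right, negatives in the lower left.
X = np.array([[2, 3], [1, 1], [4, 5], [-1, -2], [-2, -3], [-3, -1]])
y = np.array([+1, +1, +1, -1, -1, -1])
w, b = perceptron_train(X, y)
print("learned w, b:", w, b)
print("predictions:", np.sign(X @ w + b))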
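
The convergence theorem from the first section can be sanity-checked on the same toy data. The classical mistake bound (usually attributed to Novikoff) says that, starting from zero weights, the number of updates is at most (R/γ)², where R is the largest input norm and γ is the margin of some unit-norm separator. In this sketch the bias is folded in as a constant feature, and the separator `u` is hand-picked, so treat it as an illustration under those assumptions rather than a tight analysis.

```python
import numpy as np

X = np.array([[2, 3], [1, 1], [4, 5], [-1, -2], [-2, -3], [-3, -1]], dtype=float)
y = np.array([1, 1, 1, -1, -1, -1])

# Fold the bias into the weights by appending a constant 1 feature.
Xa = np.hstack([X, np.ones((len(X), 1))])

def count_updates(Xa, y, max_epochs=1000):
    # Perceptron rule with learning rate 1; returns final weights and total updates.
    w = np.zeros(Xa.shape[1])
    updates = 0
    for _ in range(max_epochs):
        wrong = 0
        for xi, yi in zip(Xa, y):
            if yi * (w @ xi) <= 0:
                w += yi * xi
                wrong += 1
        updates += wrong
        if wrong == 0:
            break
    return w, updates

w, updates = count_updates(Xa, y)

# Hand-picked unit-norm separator (an assumption; any separating direction gives a valid bound).
u = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)
gamma = np.min(y * (Xa @ u))              # margin of u on the data
R = np.max(np.linalg.norm(Xa, axis=1))    # largest augmented input norm
bound = (R / gamma) ** 2

print("updates made:", updates)
print("mistake bound (R/gamma)^2:", bound)
```

A better-chosen `u` with a larger margin would tighten the bound; the point is only that the update count stays under it.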
