Overfitting: When a Model Memorizes Instead of Learning

The trap that catches beginners

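You train a model. Training loss drops toward 0. You feel proud. Then you evaluate on test data, and performance is terrible. You overfit: the model memorized the training set instead of learning the underlying pattern.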

The signature: low training loss, high test loss. The gap between the two is overfitting.

Why it happens

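A model with too many parameters can memorize the training data exactly: every quirk, every noisy sample. It fails to generalize because it never needed to. The only signal during training is "make the training loss small", and a big enough model can do that by brute force.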
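
To make "brute-force memorization" concrete, here is a minimal sketch. It assumes a 1-nearest-neighbor lookup as a stand-in for a model with enough capacity to memorize (that choice, the synthetic data, and the helper name `predict_1nn` are illustrative assumptions, not part of the lesson). The labels are coin flips, so there is no pattern to learn, yet training accuracy is still perfect.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random features with coin-flip labels: there is no real pattern to learn.
X_train = rng.normal(size=(200, 5))
y_train = rng.integers(0, 2, size=200)
X_test = rng.normal(size=(1000, 5))
y_test = rng.integers(0, 2, size=1000)

def predict_1nn(X_ref, y_ref, X_query):
    """Return the label of the closest reference point for each query point."""
    # Pairwise squared Euclidean distances, shape (n_query, n_ref).
    d2 = ((X_query[:, None, :] - X_ref[None, :, :]) ** 2).sum(axis=2)
    return y_ref[d2.argmin(axis=1)]

# Each training point is its own nearest neighbor, so the "model" recalls its label.
train_acc = np.mean(predict_1nn(X_train, y_train, X_train) == y_train)
test_acc = np.mean(predict_1nn(X_train, y_train, X_test) == y_test)
print(f"train accuracy = {train_acc:.2f}, test accuracy = {test_acc:.2f}")
```

Training accuracy is perfect because the lookup table stores every sample. Test accuracy should hover around chance, since nothing generalizable was there to learn.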

The antidotes

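More data: the most direct fix, when you can get it. Memorizing 10 million examples is much harder than memorizing 100.
Regularization: penalize complexity. L1/L2 weight penalties, dropout (randomly switching off neurons during training), and early stopping (halting training once validation loss stops improving).
Train/validation/test split: hold out data the model never sees during training. Track validation loss to catch overfitting while training is still in progress.
Data augmentation: perturb training samples slightly (rotate images, paraphrase text) to increase the effective dataset size.
Smaller model: fewer parameters means less capacity to memorize. Sometimes this is the best fix.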
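
As one illustration of the regularization item, the sketch below applies an L2 weight penalty (ridge regression) to a degree-9 polynomial fit. The data, the penalty strength `1e-3`, and the helper names `features` and `fit_ridge` are assumptions chosen for the example. For brevity the intercept is penalized along with the other weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# 10 noisy training points from a sine wave, plus a dense noise-free test grid.
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, size=x.size)
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test)

def features(x, degree=9):
    """Polynomial features [1, x, x^2, ..., x^degree]."""
    return np.vander(x, degree + 1, increasing=True)

def fit_ridge(X, y, lam):
    """Minimize ||X w - y||^2 + lam * ||w||^2 via an augmented least-squares problem."""
    n_features = X.shape[1]
    X_aug = np.vstack([X, np.sqrt(lam) * np.eye(n_features)])
    y_aug = np.concatenate([y, np.zeros(n_features)])
    w, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    return w

X, X_test = features(x), features(x_test)
results = {}
for lam in [0.0, 1e-3]:  # 0.0 means no penalty
    w = fit_ridge(X, y, lam)
    train_mse = np.mean((X @ w - y) ** 2)
    test_mse = np.mean((X_test @ w - y_test) ** 2)
    results[lam] = (train_mse, test_mse, np.linalg.norm(w))
    print(f"lambda = {lam:g}: train MSE = {train_mse:.4f}, "
          f"test MSE = {test_mse:.4f}, ||w|| = {np.linalg.norm(w):.1f}")
```

The penalty trades a slightly worse training fit for much smaller weights. On data like this, that usually gives a smoother curve and a lower test MSE, though the best `lam` depends on the data and would normally be picked on a validation set.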

Training accuracy is a vanity metric. Validation accuracy is the truth. If the two are far apart, the model is memorizing, not learning. Always hold out data the model never sees.
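
A hold-out split might look like the sketch below. The synthetic sine data, the 25% validation fraction, and the candidate degrees are assumptions made for illustration. The point is only that models are fit on the training portion and compared on the held-out portion.

```python
import numpy as np

rng = np.random.default_rng(0)

# 40 noisy samples from a sine wave (synthetic data, chosen for illustration).
x = rng.uniform(0, 1, size=40)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, size=x.size)

# Shuffle once, then hold out 25% of the points as a validation set.
idx = rng.permutation(x.size)
n_val = x.size // 4
val_idx, train_idx = idx[:n_val], idx[n_val:]
x_train, y_train = x[train_idx], y[train_idx]
x_val, y_val = x[val_idx], y[val_idx]

def mse(coeffs, x, y):
    """Mean squared error of a polynomial (np.polyfit coefficients) on (x, y)."""
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

scores = {}
for degree in [1, 3, 5, 9]:
    coeffs = np.polyfit(x_train, y_train, degree)  # fit on training data only
    scores[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_val, y_val))
    print(f"degree {degree}: train MSE = {scores[degree][0]:.4f}, "
          f"val MSE = {scores[degree][1]:.4f}")

# Pick the model by validation error, never by training error.
best_degree = min(scores, key=lambda d: scores[d][1])
print("degree with lowest validation MSE:", best_degree)
```

Training MSE can only go down as the degree grows, so it cannot tell you when to stop. Validation MSE can.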

Code

Polynomial overfitting demo · python

import numpy as np

# Fit polynomials of increasing degree and watch overfitting appear.
np.random.seed(0)
x = np.linspace(0, 1, 10)                   # 10 training inputs
y = np.sin(2 * np.pi * x) + np.random.normal(0, 0.1, 10)  # noisy targets

x_test = np.linspace(0, 1, 100)             # dense test grid
y_test = np.sin(2 * np.pi * x_test)         # ground truth (no noise)

for degree in [1, 3, 9]:
    coeffs = np.polyfit(x, y, degree)       # least-squares fit on training data
    train_pred = np.polyval(coeffs, x)
    test_pred  = np.polyval(coeffs, x_test)
    train_mse = np.mean((y - train_pred) ** 2)
    test_mse  = np.mean((y_test - test_pred) ** 2)
    print(f"degree {degree}: train MSE = {train_mse:.4f}, test MSE = {test_mse:.4f}")
# Degree 9 passes almost exactly through the 10 training points (train MSE near 0),
# yet its test MSE stays well above that: classic overfitting.
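
The "more data" antidote can be checked with the same setup. The sketch below keeps the degree-9 model fixed and changes only the number of training points. The sample sizes (10 and 200) and the helper `noisy_sine` are assumptions for illustration, and it uses `np.random.default_rng` rather than the global seed used in the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_sine(n):
    """n evenly spaced points from sin(2*pi*x) with Gaussian noise (std 0.1)."""
    x = np.linspace(0, 1, n)
    return x, np.sin(2 * np.pi * x) + rng.normal(0, 0.1, size=n)

x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test)  # noise-free ground truth

test_mse = {}
for n in [10, 200]:
    x, y = noisy_sine(n)
    coeffs = np.polyfit(x, y, 9)  # same degree-9 model both times
    test_mse[n] = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"n = {n:3d}: test MSE = {test_mse[n]:.4f}")
```

With 10 points the polynomial has enough freedom to chase every noisy sample. With 200 points the noise averages out, so the same model should land much closer to the true curve.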

Exercise

Generate 20 points following y = sin(2πx) plus a little noise. Fit polynomials of degree 1, 3, 9, and 15. Plot each fit on a fine grid and compute its test MSE. At high degree the test MSE should climb sharply: overfitting you can see.

Hint

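A higher degree means more flexibility: the training fit keeps improving, but past some point generalization gets worse. The "right" degree depends on the noise level and the amount of data.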
