Optimizer — Adam, SGD, AdamW, schedule

~13 min · optimizer, adam, sgd, schedule

How weights actually move

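The optimizer takes the gradients and decides how to update the weights with them. Both the choice of optimizer and its hyperparameters have a large effect on training speed and on final model quality.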
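As a minimal sketch of what "turning gradients into updates" means, here are plain-Python versions of the SGD-with-momentum and Adam update rules applied to a single weight on a toy loss f(w) = (w - 3)², which is minimized at w = 3. The toy loss, the step counts, and Adam's larger learning rate of 0.1 are assumptions chosen so the example converges quickly. Keras's real optimizers operate on tensors and differ in details such as exactly where epsilon is added.

```python
import math

def grad(w):
    # Gradient of the toy loss f(w) = (w - 3)^2, minimized at w = 3
    return 2.0 * (w - 3.0)

def sgd_momentum(w, steps, lr=0.01, momentum=0.9):
    velocity = 0.0
    for _ in range(steps):
        # Velocity is a decaying sum of past negative gradients
        velocity = momentum * velocity - lr * grad(w)
        w = w + velocity
    return w

def adam(w, steps, lr=0.1, beta_1=0.9, beta_2=0.999, eps=1e-7):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta_1 * m + (1 - beta_1) * g        # first moment: EMA of g
        v = beta_2 * v + (1 - beta_2) * g * g    # second moment: EMA of g^2
        m_hat = m / (1 - beta_1 ** t)            # bias correction for zero init
        v_hat = v / (1 - beta_2 ** t)
        # Step size is normalized by the running gradient magnitude
        w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

w_sgd = sgd_momentum(0.0, steps=500)
w_adam = adam(0.0, steps=1000)
print(w_sgd, w_adam)  # both end up close to 3.0
```

Momentum lets SGD build up speed along a consistent gradient direction, while Adam rescales each step by a running estimate of the gradient's size, which is one reason its default learning rate works across many problems.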

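Optimizer	Good fit	Notes
SGD + momentum	Large-scale vision; when you can afford to tune the LR	Often reaches the best final accuracy when tuned well
Adam	General default, NLP, early experiments	Robust, converges fast; peak accuracy is often slightly below well-tuned SGD
AdamW	Transformers, large models	Decoupled weight decay, giving cleaner regularization
RMSprop	RNNs, non-stationary objectives	The usual default before Adam appeared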
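The practical difference between Adam and AdamW is where weight decay enters. A minimal sketch, assuming a fresh optimizer (first step, m = v = 0) and a zero loss gradient so that only weight decay acts: with classic L2 regularization, the term wd * w is added to the gradient and then normalized by Adam, whereas AdamW subtracts lr * wd * w from the weight directly. The helper names are hypothetical, and this follows the decoupled formulation in general rather than Keras's internal code.

```python
import math

def adam_first_direction(g, beta_1=0.9, beta_2=0.999, eps=1e-7):
    # Adam's normalized step direction on step t = 1, starting from m = v = 0
    m = (1 - beta_1) * g
    v = (1 - beta_2) * g * g
    m_hat = m / (1 - beta_1)   # bias correction at t = 1
    v_hat = v / (1 - beta_2)
    return m_hat / (math.sqrt(v_hat) + eps)

def adam_l2_first_step(w, g, lr, wd):
    # Adam + L2: decay is folded into the gradient, then normalized by Adam
    return w - lr * adam_first_direction(g + wd * w)

def adamw_first_step(w, g, lr, wd):
    # AdamW: decay shrinks the weight directly, outside Adam's normalization
    return w - lr * wd * w - lr * adam_first_direction(g)

w, g, lr = 1.0, 0.0, 1e-3   # zero loss gradient: only weight decay acts
for wd in (1e-4, 1e-1):
    shrink_l2 = w - adam_l2_first_step(w, g, lr, wd)
    shrink_w = w - adamw_first_step(w, g, lr, wd)
    print(f"wd={wd:g}  Adam+L2 shrink={shrink_l2:.2e}  AdamW shrink={shrink_w:.2e}")
```

With Adam + L2, the weight shrinks by roughly lr (about 1e-3) whether wd is 1e-4 or 0.1, because Adam's normalization cancels the decay's magnitude. With AdamW, the shrinkage is lr * wd * w and scales with wd as intended.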

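A learning rate schedule almost always helps. The classic pattern is a warmup over the first few epochs (linear from 0 up to base_lr), followed by cosine decay toward 0. In Keras 3, CosineDecay has this built in through its optional warmup_steps argument (paired with warmup_target).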
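A minimal plain-Python sketch of that warmup-then-cosine pattern, written to mirror the formula Keras 3's CosineDecay uses when warmup is enabled, as we understand it: a linear ramp over warmup_steps, then a cosine curve over decay_steps that ends at alpha times the peak. The function name and the step counts are hypothetical.

```python
import math

def warmup_cosine_lr(step, base_lr, warmup_steps, decay_steps, alpha=0.0, start_lr=0.0):
    """Linear warmup from start_lr to base_lr, then cosine decay to alpha * base_lr."""
    if step < warmup_steps:
        # Linear ramp: start_lr at step 0, reaching base_lr at step == warmup_steps
        return start_lr + (base_lr - start_lr) * step / warmup_steps
    # Steps spent in the decay phase, held at decay_steps once decay has finished
    t = min(step - warmup_steps, decay_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * t / decay_steps))
    return base_lr * ((1.0 - alpha) * cosine + alpha)

# Hypothetical run: 500 warmup steps, then 9_500 decay steps, peak LR 1e-3
for step in (0, 250, 500, 5_250, 10_000):
    print(step, warmup_cosine_lr(step, base_lr=1e-3, warmup_steps=500, decay_steps=9_500))
```

In Keras 3, the corresponding schedule would be `keras.optimizers.schedules.CosineDecay(initial_learning_rate=0.0, decay_steps=9_500, warmup_target=1e-3, warmup_steps=500)`. Setting initial_learning_rate to 0.0 is what makes the warmup start from 0.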

Code

Four optimizers + clipping·python

from tensorflow import keras

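# SGD with Nesterov momentum: a velocity term smooths the update direction
sgd = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9, nesterov=True)
# Adam: per-parameter step sizes from running estimates of gradient moments
adam = keras.optimizers.Adam(learning_rate=1e-3, beta_1=0.9, beta_2=0.999)
# AdamW: Adam plus decoupled weight decay
adamw = keras.optimizers.AdamW(learning_rate=1e-3, weight_decay=0.004)
# RMSprop: divides by a running RMS of recent gradients (rho is its decay factor)
rmsprop = keras.optimizers.RMSprop(learning_rate=1e-3, rho=0.9)

# Gradient clipping (clipnorm / clipvalue are accepted by every Keras optimizer)
# clipnorm: rescale each variable's gradient so its L2 norm is at most 1.0
adam_clip_norm = keras.optimizers.Adam(learning_rate=1e-3, clipnorm=1.0)
# clipvalue: clamp every gradient element to [-0.5, 0.5]
adam_clip_val = keras.optimizers.Adam(learning_rate=1e-3, clipvalue=0.5)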
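Both clipping options act on gradients before the update is applied. Here is a minimal plain-Python sketch of the two operations on a single gradient vector. The function names are hypothetical. Per the Keras docs, clipnorm applies this norm test to each variable's gradient separately, and the separate global_clipnorm argument applies it to all gradients jointly.

```python
import math

def clip_by_norm(grad, max_norm):
    # Rescale the whole vector if its L2 norm exceeds max_norm (direction preserved)
    norm = math.sqrt(sum(x * x for x in grad))
    if norm <= max_norm:
        return list(grad)
    return [x * max_norm / norm for x in grad]

def clip_by_value(grad, limit):
    # Clamp every element independently to [-limit, limit] (direction can change)
    return [max(-limit, min(limit, x)) for x in grad]

g = [3.0, 4.0]                 # L2 norm is exactly 5.0
print(clip_by_norm(g, 1.0))    # same 3:4 direction, norm shrunk to 1.0
print(clip_by_value(g, 0.5))   # [0.5, 0.5]
```

Clipping by norm keeps the gradient's direction and only shrinks its length. Clipping by value can change the direction: [3.0, 4.0] becomes [0.5, 0.5].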

Cosine decay schedule·python

from tensorflow import keras

# Smooth cosine-shaped decrease from initial_learning_rate to alpha * initial_learning_rate over decay_steps
cosine = keras.optimizers.schedules.CosineDecay(
    initial_learning_rate=1e-3,
    decay_steps=10_000,
    alpha=0.0,
)

# Use schedule with any optimizer
optimizer = keras.optimizers.Adam(learning_rate=cosine)

# Exponential decay: lr = 1e-2 * 0.96 ** (step // 1000), floored because staircase=True
exp = keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-2,
    decay_steps=1000,
    decay_rate=0.96,
    staircase=True,
)
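ExponentialDecay computes initial_learning_rate * decay_rate ** (step / decay_steps), with the exponent floored when staircase=True. A plain-Python sketch of that formula (the function name is hypothetical) makes the staircase behavior concrete:

```python
def exponential_decay_lr(step, initial_lr, decay_steps, decay_rate, staircase=False):
    # Exponent is the number of decay periods elapsed (fractional by default)
    p = step / decay_steps
    if staircase:
        # Floor the exponent so the LR only changes once every decay_steps steps
        p = step // decay_steps
    return initial_lr * decay_rate ** p

# Same settings as the ExponentialDecay example above
for step in (0, 999, 1000, 2500):
    print(step, exponential_decay_lr(step, 1e-2, 1000, 0.96, staircase=True))
```

Steps 0 and 999 both get 1e-2. Step 1000 drops to 1e-2 * 0.96, and step 2500 to 1e-2 * 0.96 ** 2. Without staircase, the rate would instead shrink a little on every step.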
