Quiz · 5 questions
🎯 Attention Mechanism
Q, K, V, and the engineering that scales them
Quiz
01 What does the √d_k scaling factor in attention prevent?
Hint
What happens to softmax(x) when the magnitude of x is much larger than 1?
02 What is the KV-cache used for?
Hint
What can you cache once it's been computed?
03 How many KV heads does Llama 3.3 70B have?
Hint
It's the standard GQA group size used by most modern open-weight models.
04 What is the key innovation of Flash Attention?
Hint
Same math, different memory strategy.
05 Why is causal masking needed for decoder-only training?
Hint
What's the difference between training in parallel and generating sequentially?