Quiz · 5 questions

🎯 Attention Mechanism

Q, K, V, and the engineering that scales them

01 What does the √d_k scaling factor in attention prevent?

Hint

What happens to softmax(x) when the magnitude of x is much larger than 1?
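One way to explore this hint is to run softmax on the same scores at two different magnitudes and compare the results. The snippet below is a minimal sketch using only the standard library; `softmax`, `dot_product_scores`, and the choice of `d_k = 512` with unit-variance random vectors are illustrative assumptions, not part of the quiz.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_scores(d_k, n_keys=4, seed=0):
    """Dot products between one random query and n_keys random keys.

    Components are drawn from a standard normal (an assumption made
    here purely for illustration).
    """
    rng = random.Random(seed)
    q = [rng.gauss(0, 1) for _ in range(d_k)]
    keys = [[rng.gauss(0, 1) for _ in range(d_k)] for _ in range(n_keys)]
    return [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]


d_k = 512
raw = dot_product_scores(d_k)                # unscaled scores
scaled = [s / math.sqrt(d_k) for s in raw]   # scores divided by sqrt(d_k)

# Compare how peaked each distribution is.
print("largest weight, unscaled:", max(softmax(raw)))
print("largest weight, scaled:  ", max(softmax(scaled)))
```

Try changing `d_k` and watch how the largest weight moves in each case.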

02 What is the KV-cache used for?

Hint

What can you cache once it's been computed?

03 How many KV heads does Llama 3.3 70B have?

Hint

It's the standard GQA group size used by most modern open-weight models.

04 What is the key innovation of Flash Attention?

Hint

Same math, different memory strategy.

05 Why is causal masking needed for decoder-only training?

Hint

What's the difference between training in parallel and generating sequentially?
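For reference while thinking about this hint, here is a small sketch of what a causal mask does to attention weights. It is an illustration under stated assumptions, not a framework implementation: `causal_mask` and `masked_softmax_row` are hypothetical helper names, and the all-zero score matrix is a toy input.

```python
import math


def causal_mask(n):
    """n x n boolean mask: position i may attend to position j only if j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def masked_softmax_row(scores, allowed):
    """Softmax over one row of scores, with disallowed positions set to -inf."""
    masked = [s if ok else -math.inf for s, ok in zip(scores, allowed)]
    m = max(masked)  # finite, because the diagonal is always allowed
    exps = [math.exp(s - m) for s in masked]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


n = 4
scores = [[0.0] * n for _ in range(n)]  # toy scores: every pair equally similar
weights = [
    masked_softmax_row(row, allowed)
    for row, allowed in zip(scores, causal_mask(n))
]

for row in weights:
    print(row)
```

All four rows are computed from the same score matrix in one pass. Compare which positions each row can see with what a model has available when it generates token by token.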
