Quiz · 4 questions

🔥 Training and Generation

Loss, schedules, decoding, and alignment


01 What pretraining objective do GPT, Llama, and Mistral use?

Hint

It's the same thing the model does at inference — predict the next token.
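To make the hint concrete, here is a minimal sketch of how a token sequence could be turned into next-token training examples, and how the loss over them is averaged. The helper names, the token ids, and the probabilities are all made up for illustration; real training code works on batched tensors rather than Python lists.

```python
import math

def next_token_pairs(token_ids):
    """Return (context, target) pairs: each prefix is paired with the token that follows it."""
    return [(token_ids[:i], token_ids[i]) for i in range(1, len(token_ids))]

def average_next_token_loss(target_probs):
    """Mean negative log-likelihood of the probabilities assigned to the true next tokens."""
    return -sum(math.log(p) for p in target_probs) / len(target_probs)

tokens = [11, 42, 7, 99]  # hypothetical token ids
pairs = next_token_pairs(tokens)

# Made-up probabilities a model might assign to each true next token.
loss = average_next_token_loss([0.5, 0.25, 0.125])
print(pairs)
print(loss)
```

Each position contributes one prediction, so a sequence of n tokens yields n - 1 training targets in this sketch.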

02 What does DPO remove compared with PPO/RLHF?

Hint

DPO's simplicity comes from skipping a major component of RLHF.

03 What does temperature=0 mean in decoding?

Hint

Think about what 1/T does as T approaches zero.
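The effect of 1/T can be explored numerically. The sketch below assumes the common formulation in which logits are divided by T before the softmax; the function name and the logit values are illustrative. Because dividing by zero is undefined, it only approaches the limit with small positive T.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax over logits / temperature, computed in a numerically stable way."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; T=0 is only reachable as a limit")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (2.0, 1.0, 0.5, 0.1, 0.01):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 4) for p in probs])
```

Watching how the probability of the highest-scoring token changes as T shrinks is a useful way to reason about the question.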

04 What is the modern default floating-point format for training large LLMs?

Hint

It has the range of FP32 but half the bits, and doesn't need GradScaler.
