Quiz · 4 questions

🔤 Tokenization

How text becomes integers

01 Which subword tokenization algorithm does GPT-4 use?

Hint

It's the same algorithm as GPT-2 and GPT-3, just with a larger vocabulary.

02 Why can a model get the count wrong when asked how many letters are in 'strawberry'?

Hint

What does the model literally see when you type 'strawberry'?

03 Roughly how large is the vocabulary that Gemma 3 uses?

Hint

It's the largest vocabulary among major open-weight models — designed for many languages.

04 Why do output tokens usually cost 4-10x more than input tokens?

Hint

Think about prefill vs decode — one is a big matmul, the other is many small ones.
