Context window math: tokens, order, and what's left

The unit is tokens, not characters

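Every model bills, enforces limits, and runs inference in tokens. A token is roughly ¾ of an English word, but that ratio shifts by language (Korean is dense, code is dense in a different way) and by tokenizer. Anthropic, OpenAI, and Google each use a different tokenizer, so the same English paragraph comes out to a different token count on each of them. Assume the counts are the same and your context window math will be off by 20–30%.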
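Here is a minimal sketch of budgeting for that drift. It assumes you measured a prompt with one tokenizer and plan to send it to a model that uses another. The 30% margin is an assumption taken from the top of the 20–30% range above; it is not a measured constant. `with_drift` and `fits` are hypothetical helper names.

```python
def with_drift(measured_tokens: int, drift_pct: int = 30) -> int:
    """Worst-case token count after switching to a different tokenizer.

    drift_pct is an assumed safety margin (here the top of the 20-30% range).
    Integer ceiling division avoids float rounding surprises.
    """
    return -(-measured_tokens * (100 + drift_pct) // 100)


def fits(measured_tokens: int, window: int, drift_pct: int = 30) -> bool:
    """True if the prompt still fits the window in the worst case."""
    return with_drift(measured_tokens, drift_pct) <= window


# Hypothetical numbers: 160k tokens measured with tokenizer A, 200k window on model B.
print(with_drift(160_000))     # 208000
print(fits(160_000, 200_000))  # False: the drift pushes it over
print(fits(150_000, 200_000))  # True
```

Rounding up is deliberate. When the question is "will it fit?", overestimating is the safe error.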

Three numbers to know about your model

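Total context window: input + output combined. Claude Opus 4.7 has a 1M-token mode; OpenAI GPT-5.5 and Gemini 2.5 Pro range from 200k to 2M depending on tier.
Output cap: the practical maximum for a single response, usually far smaller than the context window.
Effective attention budget: past a certain length, attention degrades. A model may technically be able to read 1M tokens, yet it does not attend to the middle as well as it does to the start and end. The long-context limits from Lesson 7 are the real concern here.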
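The three numbers combine into a single answer: how many tokens are left for documents. This is a minimal sketch under stated assumptions. The limits below are illustrative, not any real model's published figures. `effective_budget` stands for a length you have measured yourself to still work well. `ModelLimits` and `room_for_documents` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelLimits:
    context_window: int    # total tokens, input + output share this
    max_output: int        # output cap for a single response
    effective_budget: int  # assumed: length where attention still holds up in your tests


def room_for_documents(limits: ModelLimits, system_tokens: int,
                       question_tokens: int, reserved_output: int) -> int:
    """Tokens left for document content after the fixed parts of the prompt."""
    if reserved_output > limits.max_output:
        raise ValueError("reserved_output exceeds the model's output cap")
    # Hard ceiling: whatever you reserve for output comes out of the same window.
    hard = limits.context_window - reserved_output
    # Soft ceiling: stay inside the length where attention is still reliable.
    usable = min(hard, limits.effective_budget)
    return max(0, usable - system_tokens - question_tokens)


# Illustrative numbers only.
limits = ModelLimits(context_window=200_000, max_output=32_000, effective_budget=120_000)
print(room_for_documents(limits, system_tokens=3_000, question_tokens=500,
                         reserved_output=8_000))  # 116500
```

In this example the attention budget, not the window, is the binding limit. That is usually the more realistic case for long inputs.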

Order is part of the prompt

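Models give disproportionate attention to the start and end of the context. This is the lost-in-the-middle effect. Put instructions at the top and the most important evidence at the top of the document block. For very long contexts, add a short instruction reminder at the bottom.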

Code

Counting prompt tokens·python

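from pathlib import Path

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
user_text = "..."  # placeholder: the user message you plan to send

# Count input tokens with the model's own tokenizer before sending the request
resp = client.messages.count_tokens(
    model="claude-opus-4-7",
    system=Path("prompts/agent.md").read_text(encoding="utf-8"),
    messages=[{"role": "user", "content": user_text}],
)
print(resp.input_tokens, "input tokens")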

Instruction sandwich for long contexts·markdown

# Top of context — full instruction
[5-clause contract]

# Documents
[hundreds of thousands of tokens]

# Bottom of context — short reminder
Return JSON matching the schema above. Cite each claim with a doc id.
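The sandwich layout can also be assembled in code. This version puts the most important evidence first in the document block, following the ordering rule in "Order is part of the prompt". It is a sketch built on several assumptions. The relevance scores come from whatever ranking you already have. The `<doc id="...">` wrapper is just one possible delimiter convention. `Doc` and `build_sandwich` are hypothetical names.

```python
from typing import NamedTuple


class Doc(NamedTuple):
    doc_id: str
    text: str
    relevance: float  # assumed: higher means more important to the task


def build_sandwich(instructions: str, docs: list[Doc], reminder: str) -> str:
    """Full instructions on top, documents ranked by relevance, short reminder last."""
    # Most relevant evidence first, so it sits near the top of the document block.
    ranked = sorted(docs, key=lambda d: d.relevance, reverse=True)
    # The <doc id="..."> wrapper is an assumed convention that makes citations easy.
    doc_block = "\n\n".join(f'<doc id="{d.doc_id}">\n{d.text}\n</doc>' for d in ranked)
    return "\n\n".join([
        "# Instructions", instructions,
        "# Documents", doc_block,
        "# Reminder", reminder,
    ])


prompt = build_sandwich(
    instructions="Answer using only the documents below. Cite each claim with a doc id.",
    docs=[Doc("d2", "Minor background.", 0.2), Doc("d1", "Key evidence.", 0.9)],
    reminder="Return JSON matching the schema above. Cite each claim with a doc id.",
)
print(prompt)
```

Keep the reminder short. It only needs to re-anchor the output format and citation rule; restating the full contract at the bottom spends tokens on something the model already saw at the top.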
