Reasoning budget vs output budget

~12 min · reasoning, budget, max-tokens

Two budgets, two failure modes

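Reasoning models bill thinking tokens on top of the visible output tokens. The thinking budget is invisible to the user but very visible on the invoice. Conversely, if the output max_tokens is too small, the model gets cut off mid-answer. Both budgets need to be set deliberately.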
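To make the invoice side concrete, here is a minimal arithmetic sketch. It assumes the provider charges thinking tokens at the output-token rate; the function names and the price used in the usage note are hypothetical, not taken from any price list.

```python
def billed_output_tokens(thinking_tokens: int, visible_tokens: int) -> int:
    """Tokens charged as output: hidden thinking plus the visible answer."""
    return thinking_tokens + visible_tokens


def output_cost_usd(thinking_tokens: int, visible_tokens: int,
                    usd_per_million_output: float) -> float:
    """Output-side cost of one request.

    usd_per_million_output is a placeholder; look up the real rate for your model.
    """
    total = billed_output_tokens(thinking_tokens, visible_tokens)
    return total * usd_per_million_output / 1_000_000
```

For a request that thinks for 8,000 tokens and answers in 300, `billed_output_tokens(8_000, 300)` is 8,300, so the user sees 300 tokens while the bill covers roughly 28 times that.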

Setting the thinking budget

Trivial classification: off or 'low'. A few hundred tokens at most.
Multi-step planning: 4k–10k tokens. Give it room to explore branches.
Hard math, deep code analysis: 20k+. Test whether a larger budget actually helps; diminishing returns set in.
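The tiers above could be encoded as a small lookup so the budget is chosen per task instead of hard-coded per call. This is a sketch only: the tier names and `thinking_param` are hypothetical, and the exact numbers are starting points picked from the ranges above.

```python
from typing import Optional

# Hypothetical tier names; token counts are picked from the ranges in this lesson.
THINKING_BUDGETS = {
    "trivial": None,     # thinking off
    "planning": 8_000,   # inside the 4k-10k range
    "hard": 20_000,      # starting point for 20k+; verify that more actually helps
}


def thinking_param(task_kind: str) -> Optional[dict]:
    """Return a `thinking` request parameter, or None to leave thinking off."""
    budget = THINKING_BUDGETS[task_kind]  # KeyError on unknown tiers is deliberate
    if budget is None:
        return None
    return {"type": "enabled", "budget_tokens": budget}
```

Failing loudly on an unknown tier is a design choice: a silent default tends to hide either overspending or under-thinking.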

Setting the output budget

Match max_tokens to the expected output shape, with some margin. A JSON object with 5 fields → 512 max_tokens is enough. Long-form essay → 2k–4k. Truncated output is a recurring incident pattern; set max_tokens explicitly and add an alert for when a response hits the cap.
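A cap alert can be as small as a check on the response's stop reason. The sketch below assumes the Anthropic Messages API convention, where a truncated response reports `stop_reason == "max_tokens"` (other providers use different values, for example `finish_reason == "length"`). The logger name and `check_response` are hypothetical.

```python
import logging

logger = logging.getLogger("budget_alerts")  # hypothetical logger name


def check_response(stop_reason: str, max_tokens: int, request_id: str) -> bool:
    """Log a warning and return True when a response was cut off at the cap."""
    if stop_reason == "max_tokens":
        logger.warning(
            "response %s hit max_tokens=%d; output is likely truncated",
            request_id, max_tokens,
        )
        return True
    return False
```

In practice this would be called with `response.stop_reason` after each request, and the warning routed to whatever alerting is already in place.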

Code

Call with both budgets · python

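import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

client.messages.create(
    model="claude-opus-4-7",
    thinking={"type": "enabled", "budget_tokens": 8_000},
    # budget_tokens counts toward max_tokens in the Messages API, so
    # max_tokens must exceed it: 8,000 thinking + 1,024 for the answer.
    max_tokens=8_000 + 1_024,
    messages=[...],
)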

Exercise

Audit both budgets on your production prompts. On a sample of requests, cut the thinking budget in half and measure quality. Add an alert for when output hits max_tokens.
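One possible way to structure the halving experiment is sketched below. It assumes you already have a per-request quality score from your own eval; both helper names are hypothetical. The 1,024 floor reflects the minimum thinking budget in the Anthropic API and is an assumption to adjust for other providers.

```python
from statistics import mean
from typing import Sequence


def halved(budget_tokens: int, floor: int = 1_024) -> int:
    """Half the thinking budget, but never below the assumed provider minimum."""
    return max(budget_tokens // 2, floor)


def quality_drop(baseline_scores: Sequence[float],
                 halved_scores: Sequence[float]) -> float:
    """Mean quality lost by halving; a value near zero means the budget was oversized."""
    return mean(baseline_scores) - mean(halved_scores)
```

Run the same request sample at the original and halved budgets, score both sets with the same metric, and keep the smaller budget if the drop is within your tolerance.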
