Attention Sinks and the Sticky Start

The start is strangely powerful

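The attention sink paper (Xiao et al. 2023, Efficient Streaming Language Models with Attention Sinks) showed that the first few tokens of a sequence can receive disproportionate attention even when they carry no semantic relevance. Softmax attention forces the weights to sum to 1, so when no token is especially relevant, the model still has to put that mass somewhere. It dumps it onto the first few tokens, which become information-free anchors. The initial tokens get this role because, under causal masking, they are visible to every later position.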
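To see why softmax needs somewhere to put its mass, here is a toy calculation with made-up logits for one query over six keys. It assumes a single head and hand-picked scores. In a trained model the sink's high score is learned, not set by hand, so treat this as an illustration of the mechanism rather than a model of the paper's results.

softmax mass demo·python

```python
import math


def softmax(scores):
    """Numerically stable softmax: every weight is positive and they sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for one query over 6 keys. No content key is relevant,
# so every score sits near 0.
no_sink = [0.1, -0.2, 0.0, 0.1, -0.1, 0.0]
# Same content keys, but key 0 (the first token) gets a high score: a
# hand-set stand-in for a learned sink that scores high for almost any query.
with_sink = [4.0, -0.2, 0.0, 0.1, -0.1, 0.0]

for name, scores in (("no sink", no_sink), ("with sink", with_sink)):
    weights = softmax(scores)
    print(name, [round(w, 2) for w in weights], "sum =", round(sum(weights), 6))
```

Without a sink, the mass spreads out roughly evenly, and no weight goes above 0.19. With a sink, key 0 soaks up about 0.92 of the mass, and the content keys keep the same ratios to each other. In both cases the total is 1. Softmax has no way to say "attend to nothing."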

Why it matters in practice

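The *position* of the system prompt matters as much as its content. The first tokens carry outsized weight, so write them deliberately. A clean system prompt and stable project rules benefit from sitting in the prefix. Put a noisy timestamp, a random banner, or a stale rule at the top, and that becomes sticky too: the model keeps half-attending to the wrong thing.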

Protecting the prefix

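Keep durable rules and stable structure in the prefix, and push volatile per-turn noise toward the end. This helps both attention behavior and prompt caching, and the two effects reinforce each other. Prompt caches typically reuse work only for an identical leading prefix, so a single changed token near the top invalidates the cache for everything after it.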
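The caching half of that argument can be sketched in a few lines. This is a simplification under stated assumptions. It compares characters, while real prompt caches work on tokens and add provider-specific rules such as minimum lengths, block granularity, and explicit cache breakpoints. `build_prompt` and `shared_prefix_len` are hypothetical helpers, and the request ids are made up.

prefix cache sketch·python

```python
def build_prompt(stable_parts, volatile_parts, volatile_first=False):
    """Join prompt sections with newlines; stable sections go first by default."""
    if volatile_first:
        parts = volatile_parts + stable_parts
    else:
        parts = stable_parts + volatile_parts
    return "\n".join(parts)


def shared_prefix_len(a, b):
    """Count leading characters two prompts share: a rough stand-in for what
    an exact-prefix prompt cache could reuse between consecutive requests."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Stable rules taken from the GOOD example; request ids are hypothetical.
stable = [
    "You are an AI assistant for the cwkPippa project.",
    "Always use the existing dev server. Do not run git push.",
]
turn1 = ["Request id: 7e3a"]
turn2 = ["Request id: 91bf"]

for volatile_first in (True, False):
    p1 = build_prompt(stable, turn1, volatile_first)
    p2 = build_prompt(stable, turn2, volatile_first)
    label = "volatile first" if volatile_first else "stable first"
    print(f"{label}: {shared_prefix_len(p1, p2)} of {len(p1)} chars shared")
```

With the volatile line first, two consecutive requests share only the 12-character `Request id: ` label. With the stable rules first, they share the whole rule block plus that label.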

The start of the context is expensive real estate. Don't rent it out to noise.

Code

prefix hygiene·yaml

prefix:
  keep:
    - durable instructions
    - tool schemas
    - stable examples
  avoid:
    - timestamps
    - request ids
    - stale TODOs
    - random server banners
    - rotating welcome messages
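The avoid list above can be turned into a crude pre-flight check. This is only a sketch. The regex patterns, the label names, and the 800-character window are hypothetical choices, not a standard tool. The window assumes roughly 4 characters per English token, as a stand-in for "the first 200 tokens."

prefix lint sketch·python

```python
import re

# Hypothetical detectors for the "avoid" list above. The patterns are
# illustrative assumptions, not an exhaustive or standard rule set.
AVOID_PATTERNS = {
    "timestamp": re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}"),
    "request id": re.compile(r"\brequest\s+[0-9a-f]{4,}", re.IGNORECASE),
    "TODO": re.compile(r"\bTODO\b"),
    "server banner": re.compile(r"\bprocessed by node-\d+", re.IGNORECASE),
    "welcome message": re.compile(r"\bwelcome back\b", re.IGNORECASE),
}


def lint_prefix(prompt, window_chars=800):
    """Return (label, matched text) pairs for noise found in the first
    window_chars characters. 800 chars is an assumed proxy for ~200 tokens."""
    head = prompt[:window_chars]
    findings = []
    for label, pattern in AVOID_PATTERNS.items():
        for match in pattern.finditer(head):
            findings.append((label, match.group(0)))
    return findings


bad = ("Request 7e3a-2026-05-03T19:56:04Z processed by node-12.\n"
       "Welcome back, user! Session started 4321ms ago.")
good = ("You are an AI assistant for the cwkPippa project.\n"
        "Always use the existing dev server. Do not run git push.")

for label, text in lint_prefix(bad):
    print(f"{label}: {text!r}")
print("good prefix findings:", lint_prefix(good))
```

Regexes will miss plenty, such as the "Session started 4321ms ago" line. A check like this is a cheap guardrail, not a substitute for reading the top of the prompt.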

Bad vs good first 200 tokens·markdown

BAD (cache-busting + sink-poisoning):
  Request 7e3a-2026-05-03T19:56:04Z processed by node-12.
  Welcome back, user! Session started 4321ms ago.
  Today is Saturday. Random tip: be careful.
  ...rules below...

GOOD (stable, sink-friendly):
  You are an AI assistant for the cwkPippa project.
  Always use the existing dev server. Do not run git push.
  Tool schemas:
  ...stable rules continue...
