Persona drift: catching it between turns

~14 min · conversation, persona, drift


The eroding persona

Around turn 20, your warm, terse support agent has turned into a chatty, hedging, over-apologizing assistant. The model is not broken. It is mirroring the user's long messages, hedging because it sees hedging in its own context, and slowly losing the system prompt's tone clause under the weight of a 50-turn history.

Three drift mechanisms

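Mirroring: the model adopts the user's tone, length, and vocabulary.
Self-imitation: once the model hedges, that hedging sits in the context, so it hedges more.
System-prompt dilution: the system prompt is 200 tokens; by turn 50 the history is 20,000 tokens, so the system prompt's relative weight in the context drops.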
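
To make the dilution numbers concrete, here is a minimal sketch that computes what fraction of the context the system prompt occupies as the history grows. The function name `system_prompt_share` is hypothetical, and token share is only a rough proxy: it does not measure how the model actually weighs the system prompt.

```python
def system_prompt_share(system_tokens: int, history_tokens: int) -> float:
    """Fraction of the context occupied by the system prompt (a rough proxy only)."""
    total = system_tokens + history_tokens
    if total == 0:
        return 0.0
    return system_tokens / total


# Numbers from the text: a 200-token system prompt, history growing to 20,000 tokens.
SYSTEM_TOKENS = 200
for history_tokens in (0, 2_000, 20_000):
    share = system_prompt_share(SYSTEM_TOKENS, history_tokens)
    print(f"history={history_tokens:>6} tokens -> system share = {share:.1%}")
```

With the lesson's numbers, the system prompt goes from all of the context at turn 0 to about 1% of it at 20,000 history tokens.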

Counter-drift tactics

Re-anchor periodically (re-inject a one-line tone reminder every N turns).
Use compaction to remove drifty assistant turns from the history.
Add an explicit anti-drift sentence ("Do not adopt the user's tone").
Run a tone classifier on assistant output and alert on drift.
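
The last tactic, and a crude version of the compaction one, can be sketched with a keyword heuristic standing in for a real tone classifier. Everything here is an assumption for illustration: the phrase lists, the thresholds, and the names `tone_report`, `is_drifting`, and `drop_drifting_turns` are hypothetical, and a production system would more likely use a trained classifier or a model-graded check.

```python
import re

# Hypothetical phrase lists; replace with the phrases that matter for your persona.
HEDGES = ("maybe", "perhaps", "might", "possibly", "i think", "it seems")
APOLOGIES = ("sorry", "apologize", "apologies")


def count_phrases(text, phrases):
    """Count whole-word, case-insensitive occurrences of each phrase."""
    lowered = text.lower()
    return sum(
        len(re.findall(r"\b" + re.escape(phrase) + r"\b", lowered))
        for phrase in phrases
    )


def tone_report(text):
    """Return simple tone signals: length, hedging rate, apology count."""
    words = len(text.split())
    return {
        "words": words,
        "hedges_per_100_words": 100 * count_phrases(text, HEDGES) / max(words, 1),
        "apologies": count_phrases(text, APOLOGIES),
    }


def is_drifting(text, max_words=60, max_hedge_rate=2.0, max_apologies=0):
    """Flag output that is too long, too hedged, or apologetic (assumed thresholds)."""
    report = tone_report(text)
    return (
        report["words"] > max_words
        or report["hedges_per_100_words"] > max_hedge_rate
        or report["apologies"] > max_apologies
    )


def drop_drifting_turns(messages):
    """Return a copy of the history without assistant turns flagged as drifting."""
    return [
        m for m in messages
        if not (m["role"] == "assistant" and is_drifting(m["content"]))
    ]


on_persona = "Restart the router, then run the speed test."
drifted = "I'm so sorry about that! Maybe it might possibly be the router, I think."
history = [
    {"role": "user", "content": "My wifi is slow."},
    {"role": "assistant", "content": on_persona},
    {"role": "user", "content": "Still slow, what now?"},
    {"role": "assistant", "content": drifted},
]
cleaned = drop_drifting_turns(history)
```

Dropping a turn outright can leave two user messages in a row, and some chat APIs expect alternating roles, so replacing a flagged turn with a terse summary may be the safer form of compaction.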

Code

Periodic re-anchor · python

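REANCHOR_EVERY = 10  # turns between reminders; tune to your measured drift rate

# turn_count and messages come from your own conversation loop.
# Skip turn 0: the system prompt is still fresh there.
if turn_count > 0 and turn_count % REANCHOR_EVERY == 0:
    messages.append({
        "role": "user",
        "content": "[system reminder] Stay terse. Open with the next step. Do not apologize."
    })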


Exercise

Capture 100 long conversations from your system. Score the assistant output for tone drift (terseness, hedging frequency). Plot drift against turn count. If the drift is real, ship a re-anchor at its half-life.
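
One way to turn the plot into a re-anchor interval is to fit a decay curve and read off its half-life. The sketch below assumes you have converted drift into a positive "adherence" score per turn (higher means closer to the persona) and that adherence decays roughly exponentially. Both are assumptions to check against your own plot. The function name `fit_half_life` is hypothetical, and the data is synthetic.

```python
import math


def fit_half_life(turns, adherence):
    """Fit adherence ~ A * exp(-k * turn) by least squares on log(adherence).

    Returns the half-life in turns, or None if adherence is not decaying.
    """
    if len(turns) != len(adherence) or len(turns) < 2:
        raise ValueError("need at least two (turn, adherence) points of equal length")
    if any(a <= 0 for a in adherence):
        raise ValueError("adherence scores must be positive to take the log")
    logs = [math.log(a) for a in adherence]
    n = len(turns)
    mean_t = sum(turns) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in turns)
    if sxx == 0:
        raise ValueError("turn values must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(turns, logs))
    slope = sxy / sxx  # estimate of -k
    if slope >= 0:
        return None  # flat or improving adherence: no half-life to report
    return math.log(2) / -slope


# Synthetic data for illustration only: adherence that halves every 20 turns.
turns = list(range(0, 50, 5))
adherence = [0.5 ** (t / 20) for t in turns]
half_life = fit_half_life(turns, adherence)
print(f"estimated half-life: {half_life:.1f} turns")
```

If the fit returns `None`, or the log-scale plot is clearly not a straight line, the exponential assumption does not hold and a fixed interval chosen by eye from the plot is the more honest choice.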
