Tradeoffs and the Future: v8 and RWKV-X

Recall: same wall, different shape

RWKV-7 holds up well on passkey-retrieval tests out to roughly 28K tokens, then degrades beyond that. The shape of the failure differs from Mamba's (RWKV's exponential decay produces a different forgetting pattern than Mamba's selective compression), but the category of failure is the same: a finite state cannot store an arbitrary amount of information.
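To make the forgetting pattern concrete, here is a toy sketch of exponential decay in a recurrent state. It is not RWKV-7's actual update rule (the real state is matrix-valued and its decay is data-dependent); it assumes a single scalar state with a fixed, made-up decay of 0.9995, purely to show how quickly an old token's contribution shrinks with distance.

```python
import math
from typing import Iterable


def run_recurrence(inputs: Iterable[float], decay: float) -> float:
    """Toy scalar recurrence s <- decay * s + x (an illustration, not RWKV-7)."""
    state = 0.0
    for x in inputs:
        state = decay * state + x
    return state


def surviving_weight(decay: float, distance: int) -> float:
    """Fraction of a token's contribution left after `distance` further steps."""
    return decay ** distance


def half_life(decay: float) -> float:
    """Number of steps after which a contribution has shrunk to one half."""
    return math.log(0.5) / math.log(decay)


if __name__ == "__main__":
    decay = 0.9995  # hypothetical value, chosen only for illustration
    print(f"half-life: {half_life(decay):.0f} steps")
    for distance in (1_000, 10_000, 28_000):
        print(distance, surviving_weight(decay, distance))
```

In this toy, whatever was written early is still in the state in principle, but its weight falls geometrically, so newer inputs eventually drown it out. A fixed-size state has to make some trade of this kind.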

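This is the recurring theme of this quest. Every architecture that tries to replace attention runs into the same wall at some context length, just at a different length and with a different failure shape. RWKV's wall is at ~28K. Mamba's is in a similar range. Hyena has one too, with a different shape. The Transformer has no wall, but it has a cost.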

Ecosystem reality check

RWKV's ecosystem is impressively broad for an alternative architecture, but it is far smaller than the Transformer's. Fine-tuning recipes, instruct-tuned variants, evaluation harnesses, RLHF infrastructure: all of it is thinner. Staying competitive on benchmarks still takes more training tokens to compensate for the architectural constraint, so a small fine-tuning run hits a harder ceiling than an equivalent Transformer fine-tune.

The upside: a smaller ecosystem means more room to contribute. If you are a researcher or hobbyist looking to leave a mark, RWKV's tooling has more low-hanging fruit than the 2026 Transformer ecosystem.

What's coming: v8 Heron and RWKV-X

RWKV-8 "Heron" introduces two new ideas: DeepEmbed (an edge-MoE-style sparse mechanism with no VRAM overhead, allowing larger effective model capacity at fixed inference memory) and ROSA (a neurosymbolic suffix automaton for structured knowledge, which brings explicit symbolic state into a recurrent architecture).

RWKV-X (April 2025) is the family's hybrid answer to the recall limit: RWKV layers plus sparse attention layers, mirroring the SSM-attention hybrid pattern of Jamba and Nemotron-H. The convergence on hybrids is happening across families, not only within Mamba.

Exercise

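Run a passkey-retrieval test on RWKV-7 yourself: place a known string at various depths in a synthetic context (10K, 28K, and 64K tokens of filler) and ask the model to repeat it back. Plot recall accuracy against depth. You should see a clear cliff somewhere in the 25–32K token range. That cliff is real, and it is the most important practical limit to internalize about RWKV.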
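One way to set up the exercise is sketched below, under several assumptions. The filler text, the prompt wording, and the names `build_prompt`, `run_grid`, and `make_windowed_stub` are all hypothetical. Context length is measured in whitespace-separated words as a rough proxy for tokens; a real run should count tokens with the model's own tokenizer. The `generate` argument is any function that maps a prompt string to the model's text output, so you would wrap whichever RWKV-7 inference setup you use. The stub here only pretends to be a model with a hard recency window so that the harness can be checked end to end.

```python
import random
import re
from typing import Callable, Dict, Sequence, Tuple

FILLER = "The grass is green. The sky is blue. The sun is yellow. "


def build_prompt(passkey: str, n_filler_words: int, depth: float) -> str:
    """Bury the passkey at relative depth (0.0 = start, 1.0 = end) of the filler."""
    repeats = n_filler_words // len(FILLER.split()) + 1
    words = (FILLER * repeats).split()[:n_filler_words]
    needle = f"The pass key is {passkey}. Remember it."
    cut = int(len(words) * depth)
    body = " ".join(words[:cut] + [needle] + words[cut:])
    return body + "\nWhat is the pass key? The pass key is"


def run_grid(
    generate: Callable[[str], str],
    lengths: Sequence[int],
    depths: Sequence[float],
    trials: int = 5,
    seed: int = 0,
) -> Dict[Tuple[int, float], float]:
    """Return recall accuracy for every (context length, depth) pair."""
    rng = random.Random(seed)
    results: Dict[Tuple[int, float], float] = {}
    for n in lengths:
        for d in depths:
            hits = 0
            for _ in range(trials):
                passkey = str(rng.randint(10000, 99999))
                # Count a hit if the passkey appears anywhere in the output.
                hits += passkey in generate(build_prompt(passkey, n, d))
            results[(n, d)] = hits / trials
    return results


def make_windowed_stub(window_words: int) -> Callable[[str], str]:
    """Stand-in 'model' that only sees the last `window_words` words.

    Hypothetical placeholder for testing the harness; replace it with a
    wrapper around a real RWKV-7 generation call.
    """
    def generate(prompt: str) -> str:
        visible = " ".join(prompt.split()[-window_words:])
        match = re.search(r"pass key is (\d+)", visible)
        return match.group(1) if match else "unknown"
    return generate


if __name__ == "__main__":
    stub = make_windowed_stub(window_words=1000)
    grid = run_grid(stub, lengths=[500, 4000], depths=[0.0, 0.5, 1.0], trials=2)
    for (n, d), acc in sorted(grid.items()):
        print(f"length={n:>5} depth={d:.1f} accuracy={acc:.2f}")
```

The dictionary returned by `run_grid` maps directly onto the depth-versus-accuracy plot: one curve per context length, depth on the x-axis. A real model will not fail as abruptly as the stub, so more trials per cell and a finer depth grid help separate the cliff from noise.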
