Architecture Decision Table

The Table to Pin on Your Wall

Below is a practical comparison table for 2026 production decisions. Print it, pin it next to your screen, and consult it before making any architecture commitment.

Architecture	Core idea	Main advantage	Main weakness	Best use case	Status (2026)
Transformer	Full pairwise attention	Best recall, most mature ecosystem	O(n²) cost, growing KV cache	General NLP under 32K tokens, frontier reasoning	Use Now
Efficient Transformer (FA3 + GQA + sliding window)	Standard attention with engineering wins	Pragmatic, fully ecosystem-compatible	Still fundamentally quadratic above the window	Up to 64K tokens where recall matters	Use Now
Mamba / SSM	Selective state-space compression	Linear time, O(1) inference memory	Weak recall, narrow LR window	Throughput- or memory-bound long context	Watch Closely
RWKV	RNN-Transformer dual form	Constant cost per token, proven on 1.5B devices	Smaller ecosystem, recall limit around 28K tokens	On-device, streaming, edge	Watch Closely
RetNet	Exponential decay retention	Three computation modes	Data-independent decay limits expressiveness	Research foundation, conceptual influence	Research
Hyena	Implicit long convolution	100× faster than attention at 64K+	Weak recall, niche language quality	Genomics, byte-level, very long sequences	Watch (domain-specific)
Hybrid SSM-Attention	Mostly SSM layers plus sparse attention layers	Best quality–efficiency balance	Architectural complexity when designing from scratch	Long-context production (32K–1M)	Use Now
Linear Attention (Kimi Linear, MHLA)	Smarter linear attention with structured state	Drop-in on top of Transformers, large speedup	Newer, less battle-tested	Long-context summarization, streaming	Watch Closely
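
For readers who would rather consult the table from a notebook than from paper, it can be encoded as data. The following is a minimal sketch, assuming Python 3.9 or later; the `Arch` record and the `by_status` helper are hypothetical names introduced here for illustration, and only the architecture, best-use-case, and status columns are copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arch:
    name: str
    best_for: str
    status: str  # value of the "Status (2026)" column


# Rows copied from the table; the remaining columns are omitted for brevity.
TABLE = [
    Arch("Transformer", "General NLP under 32K tokens, frontier reasoning", "Use Now"),
    Arch("Efficient Transformer", "Up to 64K tokens where recall matters", "Use Now"),
    Arch("Mamba / SSM", "Throughput- or memory-bound long context", "Watch Closely"),
    Arch("RWKV", "On-device, streaming, edge", "Watch Closely"),
    Arch("RetNet", "Research foundation, conceptual influence", "Research"),
    Arch("Hyena", "Genomics, byte-level, very long sequences", "Watch (domain-specific)"),
    Arch("Hybrid SSM-Attention", "Long-context production (32K-1M)", "Use Now"),
    Arch("Linear Attention", "Long-context summarization, streaming", "Watch Closely"),
]


def by_status(status: str) -> list[str]:
    """Return the names of architectures whose status matches exactly."""
    return [a.name for a in TABLE if a.status == status]


# List the architectures the table marks as production-ready.
print(by_status("Use Now"))
```

Keeping the status as a plain string means the labels match the table verbatim; a stricter version might use an enum, at the cost of one more thing to keep in sync.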

How to Read This Table

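Use Now means production-ready, with mature tooling, and defensible to your team and your CFO. Watch Closely means real production deployments exist, the architecture is still solidifying, and you should be ready to evaluate it in 2026 if your workload pushes you there. Research means conceptually important and worth understanding, but with no production case for choosing it over competing alternatives in 2026.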

Exercise

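Print this table (or recreate it in your engineering notebook). For your team's three most important production workloads, write a one-line answer next to the table for each: which architecture lane does this workload live in? The next time someone proposes an architecture change without justification, refer back to those answers. The table is the starting point for that conversation.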
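
If the exercise lives in a notebook rather than on paper, the one-line answers can be kept as a small record. The following is a rough sketch only: the workload names and reasons are placeholders rather than recommendations, and `record_lane` is a hypothetical helper that merely checks each answer against the architecture names in the table.

```python
# Lane names taken from the Architecture column of the table.
LANES = {
    "Transformer",
    "Efficient Transformer",
    "Mamba / SSM",
    "RWKV",
    "RetNet",
    "Hyena",
    "Hybrid SSM-Attention",
    "Linear Attention",
}


def record_lane(
    notes: dict[str, tuple[str, str]], workload: str, lane: str, reason: str
) -> None:
    """Store a one-line answer for a workload, rejecting lanes not in the table."""
    if lane not in LANES:
        raise ValueError(f"unknown architecture lane: {lane!r}")
    notes[workload] = (lane, reason)


notes: dict[str, tuple[str, str]] = {}

# Placeholder workloads and reasons; replace with your team's top three.
record_lane(notes, "workload-a", "Transformer", "short prompts, recall-critical")
record_lane(notes, "workload-b", "Hybrid SSM-Attention", "long context in production")
record_lane(notes, "workload-c", "RWKV", "on-device streaming")

for workload, (lane, reason) in notes.items():
    print(f"{workload}: {lane} ({reason})")
```

Rejecting unknown lane names keeps the record tied to the table, so a proposal for an architecture outside it has to be argued explicitly rather than slipped in.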
