How architecture choices ripple into product design

A model choice is also a UX choice

You don't pick a backbone in a vacuum. The paradigm you choose determines which UX patterns are possible, which are wasteful, and which are required.

Dense: predictable UI

Predictable per-token latency keeps the streaming UI simple. You can show a typewriter effect with confidence.
Cost is roughly linear in user activity, so billing tiers are easy to size.
Capacity planning is multiplication. No magic.
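
To make the "multiplication" point concrete, here is a minimal sketch of dense capacity planning. The function name, its parameters, and every number in the example are hypothetical placeholders for illustration, not measurements from any real deployment.

```python
import math


def dense_gpus_needed(active_users: int,
                      requests_per_user_per_hour: float,
                      tokens_per_request: int,
                      tokens_per_second_per_gpu: float) -> int:
    """Estimate GPU count for a dense model by straight multiplication.

    Assumes per-GPU throughput is roughly constant, which is the
    simplification the dense case is supposed to allow.
    """
    # Total generated tokens per second across all users.
    tokens_per_second = (active_users
                         * requests_per_user_per_hour
                         * tokens_per_request) / 3600.0
    # Round up: a fraction of a GPU still means one more GPU.
    return math.ceil(tokens_per_second / tokens_per_second_per_gpu)


# Hypothetical load: 10,000 users, 6 requests per hour, 500 tokens each,
# on GPUs assumed to sustain 2,500 tokens per second.
gpus = dense_gpus_needed(10_000, 6, 500, 2_500.0)
print(gpus)  # 4
```

Doubling the user count roughly doubles the answer, which is what makes billing tiers easy to size in the dense case.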

MoE: a different infrastructure story

Lower compute per token means you serve more tokens per dollar at scale.
But total memory cost dominates. You need GPUs in proportion to total parameters, not active parameters.
Throughput is sensitive to batch size, and uneven expert load causes tail-latency surprises. Production teams have to account for this in capacity planning.
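
The gap between total and active parameters can be sketched with simple arithmetic. The model size (400B total, 40B active), the 2-byte weight precision, and the 80 GB GPU below are all assumed for illustration. The estimate covers weights only and ignores KV cache, activations, and runtime overhead, so treat it as a lower bound.

```python
import math


def min_gpus_for_weights(params_billions: float,
                         gpu_memory_gb: float,
                         bytes_per_param: float = 2.0) -> int:
    """Lower bound on GPUs needed just to hold the weights.

    1e9 params * bytes_per_param / 1e9 bytes per GB simplifies to
    params_billions * bytes_per_param.
    """
    weight_gb = params_billions * bytes_per_param
    return math.ceil(weight_gb / gpu_memory_gb)


# Hypothetical MoE: 400B total parameters, 40B active per token.
total_params_b = 400.0
active_params_b = 40.0

# Every expert must be resident in memory, so size by TOTAL parameters.
gpus_by_total = min_gpus_for_weights(total_params_b, gpu_memory_gb=80.0)
# Sizing by active parameters is the tempting but wrong estimate.
gpus_by_active = min_gpus_for_weights(active_params_b, gpu_memory_gb=80.0)

print(gpus_by_total, gpus_by_active)  # 10 1
```

Compute per token tracks the active count, while the hardware footprint tracks the total count, which is why the two numbers diverge.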

Reasoning: the UX changes

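Variable response time calls for a streaming UI with a progress indicator (for example, "Thinking..." shown for some period) rather than a typewriter effect.
Higher cost per request calls for usage caps, tiered pricing, and a thinking-budget control in user-facing settings.
A visible-CoT design needs a UI surface for the thinking text: collapsible, summarized, and styled differently from the answer.
The latency budget has to include thinking time explicitly. A "30-second answer" may be 25 seconds of thinking plus 5 seconds of output.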
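
The latency split above can be written down as a small budget object. This is a sketch under stated assumptions: the class name, the 40 tokens-per-second decode speed, the token counts, and the 2-second threshold for switching UI modes are all hypothetical values chosen to reproduce the 25 + 5 second example.

```python
from dataclasses import dataclass


@dataclass
class LatencyBudget:
    """Splits response time into thinking and visible output."""
    thinking_tokens: int
    output_tokens: int
    tokens_per_second: float  # assumed constant decode speed

    @property
    def thinking_s(self) -> float:
        return self.thinking_tokens / self.tokens_per_second

    @property
    def output_s(self) -> float:
        return self.output_tokens / self.tokens_per_second

    @property
    def total_s(self) -> float:
        return self.thinking_s + self.output_s

    def ui_mode(self, max_silent_s: float = 2.0) -> str:
        """Pick a UI pattern based on how long the user waits before text."""
        if self.thinking_s > max_silent_s:
            return "progress_indicator"
        return "typewriter"


# Hypothetical request: 1,000 thinking tokens and 200 answer tokens
# at an assumed 40 tokens per second.
budget = LatencyBudget(thinking_tokens=1000, output_tokens=200,
                       tokens_per_second=40.0)
print(budget.thinking_s, budget.output_s, budget.total_s)  # 25.0 5.0 30.0
print(budget.ui_mode())  # progress_indicator
```

Keeping thinking time as its own field makes it visible in planning: the user stares at an indicator for most of the total, and only the last part streams as text.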

The product-architecture loop

Most well-designed AI products commit to one paradigm and design the UX around it. Cursor's autocomplete is dense and fast; anything else would feel broken. ChatGPT's "thinking..." indicator is a UX accommodation for reasoning latency. Claude.ai's collapsible thinking block is a deliberate visibility design. Each is a downstream consequence of an architectural choice.

If you're building

Pick the paradigm first, then design the UX. Designing the UX first and then retrofitting a reasoning model onto a low-latency surface is how you end up with a chatbot that feels broken whenever thinking is on.

Exercise

Pick three AI-assisted products. From the UX alone, identify each one's paradigm (dense standard / MoE standard / reasoning). Write down the UX accommodations each product makes: progress indicators, thinking visibility, retry buttons, partial output. UX design is downstream of the architecture choice; once you see it, you can't unsee it.