Single-model dependence is fragile
Model providers have outages. Rate limits get hit. Capacity tiers fluctuate. A serious system has a fallback chain so that users keep getting answers even when the primary path is unavailable.
Chain shape
- Primary: best quality, your default.
- Secondary: a different provider with similar capability. Different infrastructure means a different outage profile.
- Tertiary: a cheaper or local model. Lower quality, but available. The last resort among models.
- Cached / static fallback: for some prompts, a cached "sorry, try again" with a useful pointer can be better than a blocked request.
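One way to keep the chain shape explicit is to declare it as data, so the ordering and the "different provider" rule can be checked. This is a minimal sketch: `Tier`, `CHAIN`, `shared_providers`, and the provider labels `provider-a`, `provider-b`, and `local` are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Tier:
    name: str                # label used in logs and metrics
    provider: Optional[str]  # None for the static fallback (no model call)
    role: str                # why this tier exists


# Hypothetical chain; provider labels are placeholders, not real services.
CHAIN: List[Tier] = [
    Tier("primary", "provider-a", "best quality, default"),
    Tier("secondary", "provider-b", "similar capability, separate infrastructure"),
    Tier("tertiary", "local", "cheaper or local model, lower quality"),
    Tier("static", None, "cached apology with a useful pointer"),
]


def shared_providers(chain: List[Tier]) -> List[Tuple[str, str]]:
    """Return pairs of tiers that use the same provider.

    Tiers on the same provider share an outage profile, which defeats
    the purpose of falling back from one to the other.
    """
    first_seen = {}
    clashes = []
    for tier in chain:
        if tier.provider is None:
            continue  # static fallback has no provider to clash with
        if tier.provider in first_seen:
            clashes.append((first_seen[tier.provider], tier.name))
        else:
            first_seen[tier.provider] = tier.name
    return clashes
```

Running `shared_providers(CHAIN)` at startup or in a config test is a cheap way to catch a chain whose secondary quietly points at the same provider as the primary.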
Wiring
- Detect failure: 5xx responses, timeouts, content-policy refusals on benign requests, and outputs that fail validation and fail again on retry.
- On failure, step down the chain; log every step.
- Set a hop budget: at most 3 hops; after that, fail cleanly with a structured error to the user.
- Track quality per step: if the tertiary regularly serves traffic, include it in your evals.
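The wiring above can be sketched as a small runner. This is an illustration under stated assumptions, not a definitive implementation: `StepFailed`, `ChainExhausted`, `run_chain`, and `served_by` are hypothetical names, each step is assumed to be a callable that raises `StepFailed` (or `TimeoutError`) for any of the failure signals listed, and a "hop" is counted here as one attempted step.

```python
import logging
from collections import Counter
from typing import Callable, List, Tuple

logger = logging.getLogger("fallback")


class StepFailed(Exception):
    """Raised by a step on any failure signal: 5xx, spurious refusal,
    or a validator failure that survived a retry (assumed convention)."""


class ChainExhausted(Exception):
    """Structured error surfaced to the user once the hop budget is spent."""

    def __init__(self, attempts: List[Tuple[str, str]]):
        super().__init__("all fallback steps failed")
        self.attempts = attempts  # (step name, failure reason) per hop


# Per-step serve counts; feed these into evals so a tier that
# regularly serves traffic gets measured.
served_by: Counter = Counter()

Step = Tuple[str, Callable[[str], str]]


def run_chain(prompt: str, steps: List[Step], max_hops: int = 3) -> Tuple[str, str]:
    """Try steps in order, at most max_hops of them.

    Returns (step name, answer) from the first step that succeeds.
    Raises ChainExhausted if every attempted step fails.
    """
    attempts: List[Tuple[str, str]] = []
    for name, call in steps[:max_hops]:
        try:
            answer = call(prompt)
        except (StepFailed, TimeoutError) as exc:
            # Log every failed step, then move down the chain.
            logger.warning("step=%s failed: %s", name, exc)
            attempts.append((name, str(exc)))
            continue
        logger.info("step=%s served", name)
        served_by[name] += 1
        return name, answer
    raise ChainExhausted(attempts)
```

Returning the step name alongside the answer lets the caller tag each response with the tier that produced it, which is what makes per-step quality tracking possible downstream.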