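You are rate limited; so are your users
Provider rate limits sit at three layers: requests per minute (RPM), tokens per minute (TPM), and concurrency. Your application should respect them, recover gracefully when it hits one, and propagate backpressure to users so requests do not pile up.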
Tactics
- Smoothing: put a token bucket or leaky bucket on egress so bursts do not trigger 429s.
- Retry with jitter: exponential backoff with randomization and a cap on attempts.
- Per-user limits: protect against runaway clients; surface the limit in the UI.
- Backpressure: when downstream is hot, return 429 to the caller; do not queue silently.
- Tier-aware routing: send heavy users to a higher-tier API key and light users to a shared pool.
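
A minimal sketch of the first two tactics, assuming a single-process client. The names `TokenBucket`, `backoff_delay`, `call_with_retry`, and `RateLimited` are hypothetical and not part of any provider SDK; the defaults (0.5 s base, 30 s cap, 5 attempts) are illustrative only. The jitter here is the "full jitter" variant, one of several reasonable choices.

```python
import random
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the bucket can be tested without sleeping
        self.tokens = float(capacity)
        self.updated = clock()

    def try_acquire(self, cost=1.0):
        """Take `cost` tokens if available; return False instead of blocking."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Refill for the elapsed time, never exceeding the burst capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


def backoff_delay(attempt, base=0.5, cap=30.0):
    """Full-jitter delay: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


class RateLimited(Exception):
    """Hypothetical error standing in for a provider 429."""


def call_with_retry(fn, max_attempts=5, sleep=time.sleep):
    """Call fn(); on RateLimited, sleep with jitter and retry up to the cap."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # attempt cap reached: surface the error to the caller
            sleep(backoff_delay(attempt))
```

When `try_acquire` returns False, the caller can reject the request with a 429 rather than queue it, which is the backpressure tactic from the list above. For TPM limits, `cost` can be set to an estimated token count instead of 1.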
What to expose to users
- A useful 429 ("slow down, try again in 30 seconds").
- A quota dashboard for paying customers.
- UI affordances when a rate limit is near (disabled button, banner).
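
One way the user-facing pieces might look, as a framework-agnostic sketch. `rate_limit_response` and `usage_state` are hypothetical helpers, the JSON field names are assumptions, and the 80% warning threshold is an arbitrary illustrative default. `Retry-After` is the standard HTTP header for telling a client how long to wait.

```python
import json
import math


def rate_limit_response(retry_after_s):
    """Build (status, headers, body) for a 429 that says when to retry."""
    seconds = max(1, math.ceil(retry_after_s))
    unit = "second" if seconds == 1 else "seconds"
    body = {
        "error": "rate_limited",  # assumed field names, not a standard schema
        "message": f"Slow down, try again in {seconds} {unit}.",
        "retry_after_seconds": seconds,
    }
    headers = {
        "Retry-After": str(seconds),
        "Content-Type": "application/json",
    }
    return 429, headers, json.dumps(body)


def usage_state(used, limit, warn_ratio=0.8):
    """Map quota usage to a UI state: 'ok', 'warn' (banner), or 'blocked' (disable button)."""
    if used >= limit:
        return "blocked"
    if used >= warn_ratio * limit:
        return "warn"
    return "ok"
```

The same `used` and `limit` values that drive `usage_state` could feed the quota dashboard, so the banner and the dashboard never disagree.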