모든 LLM endpoint가 필요한 3개 dashboard
- Health — error rate, refusal rate, p50/p95 latency, 토큰 사용, call당 cost. Alerted on.
- Quality — eval score, judge-flagged output, complaint rate, regeneration rate. 주간 review.
- Routing — 어느 prompt version / model / tool path가 어느 traffic serve했는지. Incident debugging surface.
Watch할 specific metric
- Output validation pass rate (schema match).
- Resolved request당 토큰 cost (call당 X).
- Tool-call retry rate.
- Reasoning-token consumption distribution.
- Prompt version별 p95 latency.
Alert vs dashboard
지금 사람 필요한 거 (cost spike, error rate, validation failure rate)에 alert. 다 dashboard. Emergency에 사람 깨우고, metric에 X.