C.W.K.
Stream
Lesson 04 of 10 · published

Telemetry: what to measure in production

~12 min · production, telemetry


Three dashboards every LLM endpoint needs

  • Health: error rate, refusal rate, p50/p95 latency, token usage, cost per call. This is the dashboard you alert on.
  • Quality: eval scores, judge-flagged outputs, complaint rate, regeneration rate. Reviewed weekly.
  • Routing: which prompt version, model, and tool path served which traffic. This is the surface you debug incidents from.
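As a rough illustration of the Health numbers, the sketch below computes error rate, refusal rate, p50/p95 latency, and cost per call from a batch of per-request records. The `RequestRecord` type, its field names, and the outcome labels are assumptions made for this example, not part of any particular telemetry library.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class RequestRecord:
    # Hypothetical per-request record; field names are illustrative.
    prompt_version: str
    model: str
    outcome: str       # assumed labels: "ok", "error", "refusal"
    latency_ms: float
    cost_usd: float


def health_summary(records):
    """Summarize one time window of records (needs at least two records)."""
    n = len(records)
    latencies = [r.latency_ms for r in records]
    # quantiles(n=100) returns 99 cut points: index 49 is p50, index 94 is p95.
    cuts = quantiles(latencies, n=100, method="inclusive")
    return {
        "error_rate": sum(r.outcome == "error" for r in records) / n,
        "refusal_rate": sum(r.outcome == "refusal" for r in records) / n,
        "p50_ms": cuts[49],
        "p95_ms": cuts[94],
        "cost_per_call_usd": sum(r.cost_usd for r in records) / n,
    }
```

The Routing fields (`prompt_version`, `model`) sit on the same record, so the same summary can be computed per version by grouping the records before calling `health_summary`.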

Specific metrics to watch

  • Output validation pass rate (schema match).
  • Token cost per resolved request (not per call).
  • Tool-call retry rate.
  • Distribution of reasoning-token consumption.
  • p95 latency by prompt version.
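Two of these metrics are easy to get subtly wrong, so here is a minimal sketch of one way to compute them. It assumes each call is logged as a dict with hypothetical `request_id`, `cost_usd`, and `resolved` keys, and that "schema match" means the output parses as a JSON object containing a hypothetical set of required keys. A real schema check would likely be stricter.

```python
import json

REQUIRED_KEYS = {"answer", "confidence"}  # hypothetical output schema


def passes_validation(raw_output):
    """True if the output is a JSON object that contains every required key."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and REQUIRED_KEYS.issubset(parsed)


def validation_pass_rate(raw_outputs):
    """Fraction of outputs that pass validation, or None for an empty batch."""
    if not raw_outputs:
        return None
    return sum(passes_validation(o) for o in raw_outputs) / len(raw_outputs)


def cost_per_resolved_request(calls):
    """Total spend across ALL calls divided by the number of resolved requests.

    Retries, regenerations, and abandoned requests still cost money, so they
    stay in the numerator even though they add nothing to the denominator.
    """
    total_cost = sum(c["cost_usd"] for c in calls)
    resolved_ids = {c["request_id"] for c in calls if c["resolved"]}
    if not resolved_ids:
        return None
    return total_cost / len(resolved_ids)
```

For example, if six calls at 0.01 each resolve only two requests, cost per call stays at 0.01 while cost per resolved request is about 0.03. That gap is what the per-call number hides.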

Alert vs dashboard

Alert on what needs a human right now: cost spikes, error rate, validation failure rate. Everything else goes on a dashboard. Wake people up for emergencies, not for metrics.
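One way to keep that split honest is to make the alert list explicit and short, so anything not on it is dashboard-only by default. The sketch below assumes window statistics arrive as a plain dict. The metric names and threshold values are placeholders chosen for illustration, not recommendations.

```python
# Hypothetical paging thresholds; tune these against your own traffic.
ALERT_THRESHOLDS = {
    "error_rate": 0.05,
    "validation_failure_rate": 0.10,
    "cost_per_hour_usd": 50.0,
}


def breached_alerts(window_stats):
    """Return the names of alert metrics that exceed their threshold.

    Metrics without an entry in ALERT_THRESHOLDS are never returned,
    however bad they look; they belong on a dashboard.
    """
    return sorted(
        name
        for name, limit in ALERT_THRESHOLDS.items()
        if window_stats.get(name, 0.0) > limit
    )


window = {
    "error_rate": 0.08,
    "validation_failure_rate": 0.02,
    "cost_per_hour_usd": 12.0,
    "p95_ms": 4000.0,  # dashboard-only: no threshold, so it never pages
}
pages = breached_alerts(window)
```

Here only `error_rate` is over its limit, so it is the only name returned. The slow `p95_ms` shows up on the Health dashboard but does not wake anyone.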

Code

Per-request metric record·python
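# `metrics` is assumed to be a StatsD-style client that accepts "key:value" tags.
# Latency and request count, tagged so they can be split by prompt version.
metrics.timing("llm.latency_ms", elapsed_ms, tags=[f"prompt:{ver}", f"model:{model}"])
metrics.incr("llm.requests", tags=[f"prompt:{ver}", f"outcome:{outcome}"])
# Token usage per request; every token metric carries the prompt tag so versions can be compared.
metrics.gauge("llm.input_tokens", usage.input_tokens, tags=[f"prompt:{ver}"])
metrics.gauge("llm.output_tokens", usage.output_tokens, tags=[f"prompt:{ver}"])
metrics.gauge("llm.thinking_tokens", usage.thinking_tokens or 0, tags=[f"prompt:{ver}"])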
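The per-request calls above leave `elapsed_ms`, `outcome`, and `usage` undefined. The sketch below shows one way they might be produced around a model call. `RecordingMetrics` is a hypothetical in-memory stand-in for whatever metrics client you use, and `call_llm` is assumed to return an object with a `usage` attribute; neither reflects a specific library.

```python
import time
from types import SimpleNamespace


class RecordingMetrics:
    """Hypothetical in-memory client exposing the three methods used above."""

    def __init__(self):
        self.calls = []

    def timing(self, name, value, tags=None):
        self.calls.append(("timing", name, value, list(tags or [])))

    def incr(self, name, tags=None):
        self.calls.append(("incr", name, 1, list(tags or [])))

    def gauge(self, name, value, tags=None):
        self.calls.append(("gauge", name, value, list(tags or [])))


def call_with_metrics(metrics, call_llm, ver, model):
    """Run call_llm() and emit metrics whether it succeeds or raises."""
    start = time.perf_counter()
    outcome = "error"  # overwritten only if the call returns normally
    response = None
    try:
        response = call_llm()
        outcome = "ok"
        return response
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        prompt_tag = f"prompt:{ver}"
        metrics.timing("llm.latency_ms", elapsed_ms, tags=[prompt_tag, f"model:{model}"])
        metrics.incr("llm.requests", tags=[prompt_tag, f"outcome:{outcome}"])
        if response is not None:
            usage = response.usage
            metrics.gauge("llm.input_tokens", usage.input_tokens, tags=[prompt_tag])
            metrics.gauge("llm.output_tokens", usage.output_tokens, tags=[prompt_tag])
            # Some responses may carry no thinking-token count; record 0 then.
            thinking = getattr(usage, "thinking_tokens", None) or 0
            metrics.gauge("llm.thinking_tokens", thinking, tags=[prompt_tag])


def fake_llm_call():
    # Stand-in for a real model call, used only to exercise the wrapper.
    usage = SimpleNamespace(input_tokens=10, output_tokens=5, thinking_tokens=None)
    return SimpleNamespace(usage=usage)


demo_metrics = RecordingMetrics()
call_with_metrics(demo_metrics, fake_llm_call, ver="v3", model="model-a")
```

Emitting from `finally` means failed calls still produce latency and an `outcome:error` count, which is what the error-rate alert depends on.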

Exercise

Add prompt-version-tagged metrics to one endpoint. Build a dashboard that shows pass rate, latency, and cost per version. When you deploy v_new, verify that the dashboard picks it up.
