Observability and Tracing

LLM systems fail differently from traditional APIs: observability must cover not only latency and errors but also token cost and output quality.

What to Measure

  • Request volume, error rates, and tail latency.
  • Token usage and cost per request, per feature, and per user cohort.
  • Tool call success rates and tool latency distribution.
  • Cache hit rates (if you cache prompts/responses).
  • Quality signals (human feedback, automated checks, eval scores).
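The per-request signals above can be captured with a thin wrapper around each LLM call. This is a minimal sketch with an in-memory store and made-up per-token prices; the function name, cohort tags, and price constants are illustrative assumptions, and a real deployment would emit to Prometheus, StatsD, or a vendor SDK instead.

```python
from collections import defaultdict

# Hypothetical in-memory metrics store; a real system would export
# these to Prometheus, StatsD, or an observability vendor.
metrics = defaultdict(list)

# Illustrative prices in USD per 1K tokens -- NOT real vendor pricing.
PRICE_PER_1K_INPUT = 0.0005
PRICE_PER_1K_OUTPUT = 0.0015

def record_llm_call(feature, user_cohort, input_tokens, output_tokens,
                    latency_s, error=False):
    """Record one request's latency, error flag, and token-based cost."""
    cost = (input_tokens / 1000) * PRICE_PER_1K_INPUT \
         + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT
    metrics["latency_s"].append(latency_s)
    metrics["errors"].append(1 if error else 0)
    # Tag cost by feature and cohort so spend can be broken down later.
    metrics[f"cost:feature:{feature}"].append(cost)
    metrics[f"cost:cohort:{user_cohort}"].append(cost)
    return cost

cost = record_llm_call("summarize", "free_tier",
                       input_tokens=800, output_tokens=200, latency_s=1.2)
print(round(cost, 6))  # 0.0007
```

Keying cost by both feature and cohort is what makes the "cost per feature and per user cohort" breakdown above possible without re-parsing raw logs.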

Trade-offs

  • High-cardinality logging (per-user, per-prompt) is useful, but is expensive to store and can leak sensitive prompt data.
  • Sampling reduces cost, but can miss rare failure modes.

Coming Next

  • A minimal “production dashboard” template for LLM features.
  • Tracing strategies for tool calls and RAG retrieval steps.