"OpenTelemetry for GenAI: Tracing Token Costs, Tool Calls, and RAG Latency"
Modern GenAI systems fail in expensive, opaque, and highly distributed ways: token spend drifts upward, tool chains hide the true source of latency, and retrieval pipelines quietly erode output quality. This book is written for experienced engineers, platform teams, observability specialists, and AI infrastructure architects who need production-grade visibility into these systems. It shows how to apply OpenTelemetry rigorously to model calls, agent workflows, and retrieval pipelines without falling into ad hoc tracing or vendor-specific telemetry models.
Across the book, readers learn how to model GenAI traces with evolving semantic conventions, instrument inference and streaming responses, attribute token usage to business context, and separate model latency from tool and dependency latency. The book also covers tracing embeddings and RAG stages, designing collector-centric telemetry pipelines, enforcing privacy controls for prompt and response capture, and correlating traces, metrics, and logs for operational triage. The result is a practical framework for understanding cost, performance, reliability, and schema stability in real GenAI platforms.
The treatment is advanced, implementation-aware, and deliberately architecture-first. Readers should already be comfortable with distributed systems, observability basics, and production software delivery. Rather than rehearse fundamentals, the book concentrates on durable telemetry design, operational trade-offs, and migration-safe patterns that remain usef