Distributed Tracing

Track requests across microservices to debug latency and failures


Implement distributed tracing with Jaeger and Tempo to track requests across microservices and identify performance bottlenecks. Use when debugging microservices, analyzing request flows, or implementing observability for distributed systems.

Tags: distributed-tracing, microservices, observability, jaeger, tempo, opentelemetry, debugging, performance

See It In Action

Example interaction

User Prompt

I have a user API that's responding slowly. Help me set up distributed tracing to find the bottleneck across my authentication service, user service, and database layers.


Agent Response

A complete Jaeger setup with instrumented services, showing exact timing for each component in your request flow.

Quick Start (3 Steps)

Get up and running in minutes

1. Install

   claude-code skill install distributed-tracing

2. Config

3. First Trigger

   @distributed-tracing help

Commands

| Command | Description | Required Args |
| --- | --- | --- |
| `@distributed-tracing debug-service-latency` | Identify which service in your request chain is causing slow response times | None |
| `@distributed-tracing analyze-service-dependencies` | Understand how your microservices communicate and depend on each other | None |
| `@distributed-tracing implement-production-tracing` | Set up comprehensive tracing infrastructure for a production environment | None |

Typical Use Cases

Debug Service Latency

Identify which service in your request chain is causing slow response times

Analyze Service Dependencies

Understand how your microservices communicate and depend on each other

Implement Production Tracing

Set up comprehensive tracing infrastructure for a production environment

Overview

Distributed Tracing

Implement distributed tracing with Jaeger and Tempo for request flow visibility across microservices.

Purpose

Track requests across distributed systems to understand latency, dependencies, and failure points.

When to Use

  • Debug latency issues
  • Understand service dependencies
  • Identify bottlenecks
  • Trace error propagation
  • Analyze request paths

Distributed Tracing Concepts

Trace Structure

Trace (Request ID: abc123)
  ↓
Span (frontend) [100ms]
  ↓
Span (api-gateway) [80ms]
  ├→ Span (auth-service) [10ms]
  └→ Span (user-service) [60ms]
      └→ Span (database) [40ms]

Key Components

  • Trace - End-to-end request journey
  • Span - Single operation within a trace
  • Context - Metadata propagated between services
  • Tags - Key-value pairs for filtering
  • Logs - Timestamped events within a span
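The relationship between these components can be sketched as a toy model (plain Python, not the OpenTelemetry API) that reproduces the trace tree above and computes each span's self time, i.e. its duration minus the time spent in child spans:

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    name: str
    duration_ms: int                       # total wall-clock time of the operation
    tags: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def self_time_ms(self) -> int:
        # Time spent in this span itself, excluding child spans.
        return self.duration_ms - sum(c.duration_ms for c in self.children)

# The trace tree from the diagram above
root = Span("frontend", 100, children=[
    Span("api-gateway", 80, children=[
        Span("auth-service", 10),
        Span("user-service", 60, children=[
            Span("database", 40, tags={"db.system": "postgresql"}),
        ]),
    ]),
])

def walk(span, depth=0):
    print(f"{'  ' * depth}{span.name}: {span.duration_ms}ms total, "
          f"{span.self_time_ms()}ms self")
    for child in span.children:
        walk(child, depth + 1)

walk(root)
```

Self time is what flame-graph views highlight: here user-service spends 40 of its 60 ms waiting on the database, so the database span is the first place to look.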

Jaeger Setup

Kubernetes Deployment

# Deploy Jaeger Operator
kubectl create namespace observability
kubectl create -f https://github.com/jaegertracing/jaeger-operator/releases/download/v1.51.0/jaeger-operator.yaml -n observability

# Deploy Jaeger instance
kubectl apply -f - <<EOF
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger
  namespace: observability
spec:
  strategy: production
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: http://elasticsearch:9200
  ingress:
    enabled: true
EOF

Docker Compose

version: "3.8"
services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "5775:5775/udp"
      - "6831:6831/udp"
      - "6832:6832/udp"
      - "5778:5778"
      - "16686:16686" # UI
      - "14268:14268" # Collector
      - "14250:14250" # gRPC
      - "9411:9411" # Zipkin
    environment:
      - COLLECTOR_ZIPKIN_HOST_PORT=:9411

Reference: See references/jaeger-setup.md

Application Instrumentation

Python (Flask)

from opentelemetry import trace
from opentelemetry.exporter.jaeger.thrift import JaegerExporter
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from flask import Flask

# Initialize tracer
resource = Resource(attributes={SERVICE_NAME: "my-service"})
provider = TracerProvider(resource=resource)
processor = BatchSpanProcessor(JaegerExporter(
    agent_host_name="jaeger",
    agent_port=6831,
))
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)

# Instrument Flask
app = Flask(__name__)
FlaskInstrumentor().instrument_app(app)

@app.route('/api/users')
def get_users():
    tracer = trace.get_tracer(__name__)

    with tracer.start_as_current_span("get_users") as span:
        span.set_attribute("user.count", 100)
        # Business logic
        users = fetch_users_from_db()
        return {"users": users}

def fetch_users_from_db():
    tracer = trace.get_tracer(__name__)

    with tracer.start_as_current_span("database_query") as span:
        span.set_attribute("db.system", "postgresql")
        span.set_attribute("db.statement", "SELECT * FROM users")
        # Database query
        return query_database()

Node.js (Express)

const { trace } = require("@opentelemetry/api");
const { Resource } = require("@opentelemetry/resources");
const { NodeTracerProvider } = require("@opentelemetry/sdk-trace-node");
const { JaegerExporter } = require("@opentelemetry/exporter-jaeger");
const { BatchSpanProcessor } = require("@opentelemetry/sdk-trace-base");
const { registerInstrumentations } = require("@opentelemetry/instrumentation");
const { HttpInstrumentation } = require("@opentelemetry/instrumentation-http");
const {
  ExpressInstrumentation,
} = require("@opentelemetry/instrumentation-express");

// Initialize tracer
const provider = new NodeTracerProvider({
  resource: new Resource({ "service.name": "my-service" }),
});

const exporter = new JaegerExporter({
  endpoint: "http://jaeger:14268/api/traces",
});

provider.addSpanProcessor(new BatchSpanProcessor(exporter));
provider.register();

// Instrument libraries
registerInstrumentations({
  instrumentations: [new HttpInstrumentation(), new ExpressInstrumentation()],
});

const express = require("express");
const app = express();

app.get("/api/users", async (req, res) => {
  const tracer = trace.getTracer("my-service");
  const span = tracer.startSpan("get_users");

  try {
    const users = await fetchUsers();
    span.setAttributes({ "user.count": users.length });
    res.json({ users });
  } finally {
    span.end();
  }
});

Go

package main

import (
    "context"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/exporters/jaeger"
    "go.opentelemetry.io/otel/sdk/resource"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
    semconv "go.opentelemetry.io/otel/semconv/v1.4.0"
)

func initTracer() (*sdktrace.TracerProvider, error) {
    exporter, err := jaeger.New(jaeger.WithCollectorEndpoint(
        jaeger.WithEndpoint("http://jaeger:14268/api/traces"),
    ))
    if err != nil {
        return nil, err
    }

    tp := sdktrace.NewTracerProvider(
        sdktrace.WithBatcher(exporter),
        sdktrace.WithResource(resource.NewWithAttributes(
            semconv.SchemaURL,
            semconv.ServiceNameKey.String("my-service"),
        )),
    )

    otel.SetTracerProvider(tp)
    return tp, nil
}

// User and fetchUsersFromDB are defined elsewhere in the application.
func getUsers(ctx context.Context) ([]User, error) {
    tracer := otel.Tracer("my-service")
    ctx, span := tracer.Start(ctx, "get_users")
    defer span.End()

    span.SetAttributes(attribute.String("user.filter", "active"))

    users, err := fetchUsersFromDB(ctx)
    if err != nil {
        span.RecordError(err)
        return nil, err
    }

    span.SetAttributes(attribute.Int("user.count", len(users)))
    return users, nil
}

Reference: See references/instrumentation.md

Context Propagation

HTTP Headers

traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01
tracestate: congo=t61rcWkgMzE
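The `traceparent` header follows the W3C Trace Context format: version, 128-bit trace ID, 64-bit parent span ID, and trace flags, hyphen-separated. A minimal stdlib parser (illustrative only; in practice the OpenTelemetry propagators do this for you) shows what each field carries:

```python
def parse_traceparent(header: str) -> dict:
    """Split a W3C traceparent header into its four fields."""
    version, trace_id, parent_id, flags = header.split("-")
    return {
        "version": version,            # "00" is the only version defined so far
        "trace_id": trace_id,          # 32 hex chars = 128 bits, shared by all spans
        "parent_span_id": parent_id,   # 16 hex chars = 64 bits, the caller's span
        "sampled": int(flags, 16) & 0x01 == 1,  # bit 0 of trace-flags
    }

ctx = parse_traceparent("00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01")
print(ctx)
```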

Propagation in HTTP Requests

Python

import requests
from opentelemetry.propagate import inject

headers = {}
inject(headers)  # Injects trace context into the headers dict

response = requests.get('http://downstream-service/api', headers=headers)

Node.js

const { propagation, context } = require("@opentelemetry/api");
const axios = require("axios");

const headers = {};
propagation.inject(context.active(), headers);

axios.get("http://downstream-service/api", { headers });

Tempo Setup (Grafana)

Kubernetes Deployment

apiVersion: v1
kind: ConfigMap
metadata:
  name: tempo-config
data:
  tempo.yaml: |
    server:
      http_listen_port: 3200

    distributor:
      receivers:
        jaeger:
          protocols:
            thrift_http:
            grpc:
        otlp:
          protocols:
            http:
            grpc:

    storage:
      trace:
        backend: s3
        s3:
          bucket: tempo-traces
          endpoint: s3.amazonaws.com

    querier:
      frontend_worker:
        frontend_address: tempo-query-frontend:9095
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tempo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tempo
  template:
    metadata:
      labels:
        app: tempo
    spec:
      containers:
        - name: tempo
          image: grafana/tempo:latest
          args:
            - -config.file=/etc/tempo/tempo.yaml
          volumeMounts:
            - name: config
              mountPath: /etc/tempo
      volumes:
        - name: config
          configMap:
            name: tempo-config

Reference: See assets/jaeger-config.yaml.template

Sampling Strategies

Probabilistic Sampling

# Sample 1% of traces
sampler:
  type: probabilistic
  param: 0.01

Rate Limiting Sampling

# Sample max 100 traces per second
sampler:
  type: ratelimiting
  param: 100
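Rate-limiting sampling as configured above can be pictured as a per-second budget: decisions succeed while credits remain for the current one-second window. A stdlib-only sketch of the idea (the actual Jaeger implementation uses a finer-grained token bucket):

```python
import time

class RateLimitingSampler:
    """Admit at most max_per_second sampling decisions per second (simplified)."""

    def __init__(self, max_per_second: int, clock=time.monotonic):
        self.max_per_second = max_per_second
        self.clock = clock
        self.window_start = self.clock()
        self.used = 0

    def should_sample(self) -> bool:
        now = self.clock()
        if now - self.window_start >= 1.0:   # roll over to a new 1-second window
            self.window_start = now
            self.used = 0
        if self.used < self.max_per_second:
            self.used += 1
            return True
        return False

# With a fake clock: 2 traces/second admitted, the third dropped
t = [0.0]
sampler = RateLimitingSampler(2, clock=lambda: t[0])
print(sampler.should_sample(), sampler.should_sample(), sampler.should_sample())
# True True False
t[0] = 1.5   # next second: budget resets
print(sampler.should_sample())
# True
```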

Parent-Based Ratio Sampling

from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

# Respect the parent's sampling decision; sample 1% of root traces,
# deterministically based on trace ID
sampler = ParentBased(root=TraceIdRatioBased(0.01))
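"Deterministic" here means the decision is a pure function of the trace ID, so every service sampling the same trace at the same ratio reaches the same verdict. The idea can be sketched in plain Python (the SDK's exact bit arithmetic differs, but the principle is the same):

```python
def ratio_should_sample(trace_id_hex: str, ratio: float) -> bool:
    # Map the top 64 bits of the trace ID onto [0, 1) and compare to the ratio.
    # Same trace ID + same ratio => same decision on every service.
    bound = int(trace_id_hex[:16], 16) / 2**64
    return bound < ratio

tid = "0af7651916cd43dd8448eb211c80319c"
print(ratio_should_sample(tid, 1.0))   # every trace passes at ratio 1.0
print(ratio_should_sample(tid, 0.0))   # no trace passes at ratio 0.0
```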

Trace Analysis

Finding Slow Requests

Jaeger Query:

service=my-service
duration > 1s

Finding Errors

Jaeger Query:

service=my-service
error=true
tags.http.status_code >= 500
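These UI searches can also be run against Jaeger's HTTP query API, served by the UI component on port 16686. The API is internal and not formally versioned, so treat the parameter names below as assumptions checked against current Jaeger builds. Building the request URL with the stdlib:

```python
import json
from urllib.parse import urlencode

def jaeger_search_url(base, service, min_duration=None, tags=None, limit=20):
    """Build a Jaeger /api/traces search URL for slow or failing requests."""
    params = {"service": service, "limit": limit}
    if min_duration:
        params["minDuration"] = min_duration      # e.g. "1s"
    if tags:
        params["tags"] = json.dumps(tags)         # tag filters as a JSON object
    return f"{base}/api/traces?{urlencode(params)}"

url = jaeger_search_url("http://jaeger:16686", "my-service",
                        min_duration="1s", tags={"error": "true"})
print(url)
```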

Service Dependency Graph

Jaeger automatically generates service dependency graphs showing:

  • Service relationships
  • Request rates
  • Error rates
  • Average latencies

Best Practices

  1. Sample appropriately (1-10% in production)
  2. Add meaningful tags (user_id, request_id)
  3. Propagate context across all service boundaries
  4. Log exceptions in spans
  5. Use consistent naming for operations
  6. Monitor tracing overhead (<1% CPU impact)
  7. Set up alerts for trace errors
  8. Implement distributed context (baggage)
  9. Use span events for important milestones
  10. Document instrumentation standards
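Practice 8, distributed context, travels in the W3C `baggage` header as comma-separated key=value pairs with percent-encoded values. A stdlib sketch of encoding and decoding it (the OpenTelemetry propagators handle this automatically):

```python
from urllib.parse import quote, unquote

def encode_baggage(items: dict) -> str:
    """Serialize key/value pairs into a W3C baggage header value."""
    return ",".join(f"{k}={quote(str(v))}" for k, v in items.items())

def decode_baggage(header: str) -> dict:
    """Parse a baggage header back into a dict."""
    pairs = (entry.split("=", 1) for entry in header.split(","))
    return {k.strip(): unquote(v) for k, v in pairs}

header = encode_baggage({"user.id": "alice", "tenant": "acme corp"})
print(header)                  # user.id=alice,tenant=acme%20corp
print(decode_baggage(header))
```

Keep baggage small: unlike span tags, it is copied onto every downstream request in the trace.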

Integration with Logging

Correlated Logs

import logging
from opentelemetry import trace

logger = logging.getLogger(__name__)

def process_request():
    span = trace.get_current_span()
    trace_id = span.get_span_context().trace_id

    logger.info(
        "Processing request",
        extra={"trace_id": format(trace_id, '032x')}
    )
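To attach the trace ID to every log line without touching each call site, a `logging.Filter` can inject it automatically. A stdlib-only sketch, using a `contextvars` variable as a stand-in for the active span context (in a real app you would read `trace.get_current_span()` as above):

```python
import contextvars
import io
import logging

# Stand-in for the OpenTelemetry span context
current_trace_id = contextvars.ContextVar("current_trace_id", default="0" * 32)

class TraceIdFilter(logging.Filter):
    def filter(self, record):
        record.trace_id = current_trace_id.get()  # attach to every record
        return True

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(trace_id)s %(message)s"))
handler.addFilter(TraceIdFilter())

logger = logging.getLogger("traced")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

current_trace_id.set("0af7651916cd43dd8448eb211c80319c")
logger.info("Processing request")
print(stream.getvalue().strip())
# 0af7651916cd43dd8448eb211c80319c Processing request
```

With the trace ID on every record, log aggregators can join logs to the matching Jaeger trace with a single field lookup.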

Troubleshooting

No traces appearing:

  • Check collector endpoint
  • Verify network connectivity
  • Check sampling configuration
  • Review application logs

High latency overhead:

  • Reduce sampling rate
  • Use batch span processor
  • Check exporter configuration

Reference Files

  • references/jaeger-setup.md - Jaeger installation
  • references/instrumentation.md - Instrumentation patterns
  • assets/jaeger-config.yaml.template - Jaeger configuration
  • prometheus-configuration - For metrics
  • grafana-dashboards - For visualization
  • slo-implementation - For latency SLOs


Environment Matrix

Dependencies

Kubernetes 1.20+ (for production deployment)
Docker 20.10+ and Docker Compose
Elasticsearch (for Jaeger storage)

Framework Support

  • OpenTelemetry ✓ (recommended)
  • Flask/Django (Python) ✓
  • Express.js (Node.js) ✓
  • Go standard library ✓
  • Spring Boot (Java) ✓

Context Window

Token Usage ~3K-8K tokens depending on deployment complexity

Information

Author
wshobson
Updated
2026-01-30
Category
debugging