Distributed Tracing

Track requests across microservices to debug latency and failures


Implement distributed tracing with Jaeger and Tempo to track requests across microservices and identify performance bottlenecks. Use when debugging microservices, analyzing request flows, or implementing observability for distributed systems.

Tags: distributed-tracing, microservices, observability, jaeger, tempo, opentelemetry, debugging, performance

See It In Action

Example interaction

User Prompt

I have a user API that's responding slowly. Help me set up distributed tracing to find the bottleneck across my authentication service, user service, and database layers.


Agent Response

A complete Jaeger setup with instrumented services, showing exact timing for each component in your request flow.

Quick Start (3 Steps)

Get up and running in minutes

1. Install

   claude-code skill install distributed-tracing

2. Config

3. First Trigger

   @distributed-tracing help

Commands

| Command | Description | Required Args |
| --- | --- | --- |
| `@distributed-tracing debug-service-latency` | Identify which service in your request chain is causing slow response times | None |
| `@distributed-tracing analyze-service-dependencies` | Understand how your microservices communicate and depend on each other | None |
| `@distributed-tracing implement-production-tracing` | Set up comprehensive tracing infrastructure for a production environment | None |

Typical Use Cases

Debug Service Latency

Identify which service in your request chain is causing slow response times

Analyze Service Dependencies

Understand how your microservices communicate and depend on each other

Implement Production Tracing

Set up comprehensive tracing infrastructure for a production environment

Overview

Distributed Tracing

Implement distributed tracing with Jaeger and Tempo for request flow visibility across microservices.

Purpose

Track requests across distributed systems to understand latency, dependencies, and failure points.

When to Use

  • Debug latency issues
  • Understand service dependencies
  • Identify bottlenecks
  • Trace error propagation
  • Analyze request paths

Distributed Tracing Concepts

Trace Structure

Trace (Request ID: abc123)
  ↓
Span (frontend) [100ms]
  ↓
Span (api-gateway) [80ms]
  ├→ Span (auth-service) [10ms]
  └→ Span (user-service) [60ms]
      └→ Span (database) [40ms]

Key Components

  • Trace - End-to-end request journey
  • Span - Single operation within a trace
  • Context - Metadata propagated between services
  • Tags - Key-value pairs for filtering
  • Logs - Timestamped events within a span
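The relationship between these components can be sketched as a toy model (plain Python, not the OpenTelemetry API) that reproduces the trace tree above and computes each span's self time, i.e. its duration minus the time spent in child spans:

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    name: str
    duration_ms: int                       # total wall-clock time of the operation
    tags: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def self_time_ms(self) -> int:
        # Time spent in this span itself, excluding child spans.
        return self.duration_ms - sum(c.duration_ms for c in self.children)

# The trace tree from the diagram above
root = Span("frontend", 100, children=[
    Span("api-gateway", 80, children=[
        Span("auth-service", 10),
        Span("user-service", 60, children=[
            Span("database", 40, tags={"db.system": "postgresql"}),
        ]),
    ]),
])

def walk(span, depth=0):
    print(f"{'  ' * depth}{span.name}: {span.duration_ms}ms total, "
          f"{span.self_time_ms()}ms self")
    for child in span.children:
        walk(child, depth + 1)

walk(root)
```

Self time is what flame-graph views highlight: here user-service spends 40 of its 60 ms waiting on the database, so the database span is the first place to look.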

Jaeger Setup

Kubernetes Deployment

# Deploy Jaeger Operator
kubectl create namespace observability
kubectl create -f https://github.com/jaegertracing/jaeger-operator/releases/download/v1.51.0/jaeger-operator.yaml -n observability

# Deploy Jaeger instance
kubectl apply -f - <<EOF
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger
  namespace: observability
spec:
  strategy: production
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: http://elasticsearch:9200
  ingress:
    enabled: true
EOF

Docker Compose

version: "3.8"
services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "5775:5775/udp"
      - "6831:6831/udp"
      - "6832:6832/udp"
      - "5778:5778"
      - "16686:16686" # UI
      - "14268:14268" # Collector
      - "14250:14250" # gRPC
      - "9411:9411" # Zipkin
    environment:
      - COLLECTOR_ZIPKIN_HOST_PORT=:9411

Reference: See references/jaeger-setup.md

Application Instrumentation

Python (Flask)

from opentelemetry import trace
from opentelemetry.exporter.jaeger.thrift import JaegerExporter
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from flask import Flask

# Initialize tracer
resource = Resource(attributes={SERVICE_NAME: "my-service"})
provider = TracerProvider(resource=resource)
processor = BatchSpanProcessor(JaegerExporter(
    agent_host_name="jaeger",
    agent_port=6831,
))
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)

# Instrument Flask
app = Flask(__name__)
FlaskInstrumentor().instrument_app(app)

@app.route('/api/users')
def get_users():
    tracer = trace.get_tracer(__name__)

    with tracer.start_as_current_span("get_users") as span:
        span.set_attribute("user.count", 100)
        # Business logic
        users = fetch_users_from_db()
        return {"users": users}

def fetch_users_from_db():
    tracer = trace.get_tracer(__name__)

    with tracer.start_as_current_span("database_query") as span:
        span.set_attribute("db.system", "postgresql")
        span.set_attribute("db.statement", "SELECT * FROM users")
        # Database query
        return query_database()

Node.js (Express)

const { trace } = require("@opentelemetry/api");
const { Resource } = require("@opentelemetry/resources");
const { NodeTracerProvider } = require("@opentelemetry/sdk-trace-node");
const { JaegerExporter } = require("@opentelemetry/exporter-jaeger");
const { BatchSpanProcessor } = require("@opentelemetry/sdk-trace-base");
const { registerInstrumentations } = require("@opentelemetry/instrumentation");
const { HttpInstrumentation } = require("@opentelemetry/instrumentation-http");
const {
  ExpressInstrumentation,
} = require("@opentelemetry/instrumentation-express");

// Initialize tracer
const provider = new NodeTracerProvider({
  resource: new Resource({ "service.name": "my-service" }),
});

const exporter = new JaegerExporter({
  endpoint: "http://jaeger:14268/api/traces",
});

provider.addSpanProcessor(new BatchSpanProcessor(exporter));
provider.register();

// Instrument libraries
registerInstrumentations({
  instrumentations: [new HttpInstrumentation(), new ExpressInstrumentation()],
});

const express = require("express");
const app = express();

app.get("/api/users", async (req, res) => {
  const tracer = trace.getTracer("my-service");
  const span = tracer.startSpan("get_users");

  try {
    const users = await fetchUsers();
    span.setAttributes({ "user.count": users.length });
    res.json({ users });
  } finally {
    span.end();
  }
});

Go

package main

import (
    "context"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/exporters/jaeger"
    "go.opentelemetry.io/otel/sdk/resource"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
    semconv "go.opentelemetry.io/otel/semconv/v1.4.0"
)

func initTracer() (*sdktrace.TracerProvider, error) {
    exporter, err := jaeger.New(jaeger.WithCollectorEndpoint(
        jaeger.WithEndpoint("http://jaeger:14268/api/traces"),
    ))
    if err != nil {
        return nil, err
    }

    tp := sdktrace.NewTracerProvider(
        sdktrace.WithBatcher(exporter),
        sdktrace.WithResource(resource.NewWithAttributes(
            semconv.SchemaURL,
            semconv.ServiceNameKey.String("my-service"),
        )),
    )

    otel.SetTracerProvider(tp)
    return tp, nil
}

// User and fetchUsersFromDB are defined elsewhere in the application.
func getUsers(ctx context.Context) ([]User, error) {
    tracer := otel.Tracer("my-service")
    ctx, span := tracer.Start(ctx, "get_users")
    defer span.End()

    span.SetAttributes(attribute.String("user.filter", "active"))

    users, err := fetchUsersFromDB(ctx)
    if err != nil {
        span.RecordError(err)
        return nil, err
    }

    span.SetAttributes(attribute.Int("user.count", len(users)))
    return users, nil
}

Reference: See references/instrumentation.md

Context Propagation

HTTP Headers

traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01
tracestate: congo=t61rcWkgMzE
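The `traceparent` header follows the W3C Trace Context format: version, 128-bit trace ID, 64-bit parent span ID, and trace flags, hyphen-separated. A minimal stdlib parser (illustrative only; in practice the OpenTelemetry propagators do this for you) shows what each field carries:

```python
def parse_traceparent(header: str) -> dict:
    """Split a W3C traceparent header into its four fields."""
    version, trace_id, parent_id, flags = header.split("-")
    return {
        "version": version,            # "00" is the only version defined so far
        "trace_id": trace_id,          # 32 hex chars = 128 bits, shared by all spans
        "parent_span_id": parent_id,   # 16 hex chars = 64 bits, the caller's span
        "sampled": int(flags, 16) & 0x01 == 1,  # bit 0 of trace-flags
    }

ctx = parse_traceparent("00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01")
print(ctx)
```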

Propagation in HTTP Requests

Python

import requests
from opentelemetry.propagate import inject

headers = {}
inject(headers)  # Injects trace context into the headers dict

response = requests.get('http://downstream-service/api', headers=headers)

Node.js

const { propagation, context } = require("@opentelemetry/api");
const axios = require("axios");

const headers = {};
propagation.inject(context.active(), headers);

axios.get("http://downstream-service/api", { headers });

Tempo Setup (Grafana)

Kubernetes Deployment

apiVersion: v1
kind: ConfigMap
metadata:
  name: tempo-config
data:
  tempo.yaml: |
    server:
      http_listen_port: 3200

    distributor:
      receivers:
        jaeger:
          protocols:
            thrift_http:
            grpc:
        otlp:
          protocols:
            http:
            grpc:

    storage:
      trace:
        backend: s3
        s3:
          bucket: tempo-traces
          endpoint: s3.amazonaws.com

    querier:
      frontend_worker:
        frontend_address: tempo-query-frontend:9095
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tempo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tempo
  template:
    metadata:
      labels:
        app: tempo
    spec:
      containers:
        - name: tempo
          image: grafana/tempo:latest
          args:
            - -config.file=/etc/tempo/tempo.yaml
          volumeMounts:
            - name: config
              mountPath: /etc/tempo
      volumes:
        - name: config
          configMap:
            name: tempo-config

Reference: See assets/jaeger-config.yaml.template

Sampling Strategies

Probabilistic Sampling

# Sample 1% of traces
sampler:
  type: probabilistic
  param: 0.01

Rate Limiting Sampling

# Sample max 100 traces per second
sampler:
  type: ratelimiting
  param: 100
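Rate-limiting sampling as configured above can be pictured as a per-second budget: decisions succeed while credits remain for the current one-second window. A stdlib-only sketch of the idea (the actual Jaeger implementation uses a finer-grained token bucket):

```python
import time

class RateLimitingSampler:
    """Admit at most max_per_second sampling decisions per second (simplified)."""

    def __init__(self, max_per_second: int, clock=time.monotonic):
        self.max_per_second = max_per_second
        self.clock = clock
        self.window_start = self.clock()
        self.used = 0

    def should_sample(self) -> bool:
        now = self.clock()
        if now - self.window_start >= 1.0:   # roll over to a new 1-second window
            self.window_start = now
            self.used = 0
        if self.used < self.max_per_second:
            self.used += 1
            return True
        return False

# With a fake clock: 2 traces/second admitted, the third dropped
t = [0.0]
sampler = RateLimitingSampler(2, clock=lambda: t[0])
print(sampler.should_sample(), sampler.should_sample(), sampler.should_sample())
# True True False
t[0] = 1.5   # next second: budget resets
print(sampler.should_sample())
# True
```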

Parent-Based Ratio Sampling

from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

# Respect the parent's sampling decision; sample 1% of root traces,
# deterministically based on trace ID
sampler = ParentBased(root=TraceIdRatioBased(0.01))
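"Deterministic" here means the decision is a pure function of the trace ID, so every service sampling the same trace at the same ratio reaches the same verdict. The idea can be sketched in plain Python (the SDK's exact bit arithmetic differs, but the principle is the same):

```python
def ratio_should_sample(trace_id_hex: str, ratio: float) -> bool:
    # Map the top 64 bits of the trace ID onto [0, 1) and compare to the ratio.
    # Same trace ID + same ratio => same decision on every service.
    bound = int(trace_id_hex[:16], 16) / 2**64
    return bound < ratio

tid = "0af7651916cd43dd8448eb211c80319c"
print(ratio_should_sample(tid, 1.0))   # every trace passes at ratio 1.0
print(ratio_should_sample(tid, 0.0))   # no trace passes at ratio 0.0
```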

Trace Analysis

Finding Slow Requests

Jaeger Query:

service=my-service
duration > 1s

Finding Errors

Jaeger Query:

service=my-service
error=true
tags.http.status_code >= 500
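These UI searches can also be run against Jaeger's HTTP query API, served by the UI component on port 16686. The API is internal and not formally versioned, so treat the parameter names below as assumptions checked against current Jaeger builds. Building the request URL with the stdlib:

```python
import json
from urllib.parse import urlencode

def jaeger_search_url(base, service, min_duration=None, tags=None, limit=20):
    """Build a Jaeger /api/traces search URL for slow or failing requests."""
    params = {"service": service, "limit": limit}
    if min_duration:
        params["minDuration"] = min_duration      # e.g. "1s"
    if tags:
        params["tags"] = json.dumps(tags)         # tag filters as a JSON object
    return f"{base}/api/traces?{urlencode(params)}"

url = jaeger_search_url("http://jaeger:16686", "my-service",
                        min_duration="1s", tags={"error": "true"})
print(url)
```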

Service Dependency Graph

Jaeger automatically generates service dependency graphs showing:

  • Service relationships
  • Request rates
  • Error rates
  • Average latencies

Best Practices

  1. Sample appropriately (1-10% in production)
  2. Add meaningful tags (user_id, request_id)
  3. Propagate context across all service boundaries
  4. Log exceptions in spans
  5. Use consistent naming for operations
  6. Monitor tracing overhead (<1% CPU impact)
  7. Set up alerts for trace errors
  8. Implement distributed context (baggage)
  9. Use span events for important milestones
  10. Document instrumentation standards
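Practice 8, distributed context, travels in the W3C `baggage` header as comma-separated key=value pairs with percent-encoded values. A stdlib sketch of encoding and decoding it (the OpenTelemetry propagators handle this automatically):

```python
from urllib.parse import quote, unquote

def encode_baggage(items: dict) -> str:
    """Serialize key/value pairs into a W3C baggage header value."""
    return ",".join(f"{k}={quote(str(v))}" for k, v in items.items())

def decode_baggage(header: str) -> dict:
    """Parse a baggage header back into a dict."""
    pairs = (entry.split("=", 1) for entry in header.split(","))
    return {k.strip(): unquote(v) for k, v in pairs}

header = encode_baggage({"user.id": "alice", "tenant": "acme corp"})
print(header)                  # user.id=alice,tenant=acme%20corp
print(decode_baggage(header))
```

Keep baggage small: unlike span tags, it is copied onto every downstream request in the trace.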

Integration with Logging

Correlated Logs

import logging
from opentelemetry import trace

logger = logging.getLogger(__name__)

def process_request():
    span = trace.get_current_span()
    trace_id = span.get_span_context().trace_id

    logger.info(
        "Processing request",
        extra={"trace_id": format(trace_id, '032x')}
    )
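To attach the trace ID to every log line without touching each call site, a `logging.Filter` can inject it automatically. A stdlib-only sketch, using a `contextvars` variable as a stand-in for the active span context (in a real app you would read `trace.get_current_span()` as above):

```python
import contextvars
import io
import logging

# Stand-in for the OpenTelemetry span context
current_trace_id = contextvars.ContextVar("current_trace_id", default="0" * 32)

class TraceIdFilter(logging.Filter):
    def filter(self, record):
        record.trace_id = current_trace_id.get()  # attach to every record
        return True

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(trace_id)s %(message)s"))
handler.addFilter(TraceIdFilter())

logger = logging.getLogger("traced")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

current_trace_id.set("0af7651916cd43dd8448eb211c80319c")
logger.info("Processing request")
print(stream.getvalue().strip())
# 0af7651916cd43dd8448eb211c80319c Processing request
```

With the trace ID on every record, log aggregators can join logs to the matching Jaeger trace with a single field lookup.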

Troubleshooting

No traces appearing:

  • Check collector endpoint
  • Verify network connectivity
  • Check sampling configuration
  • Review application logs

High latency overhead:

  • Reduce sampling rate
  • Use batch span processor
  • Check exporter configuration

Reference Files

  • references/jaeger-setup.md - Jaeger installation
  • references/instrumentation.md - Instrumentation patterns
  • assets/jaeger-config.yaml.template - Jaeger configuration
  • prometheus-configuration - For metrics
  • grafana-dashboards - For visualization
  • slo-implementation - For latency SLOs


Environment Matrix

Dependencies

Kubernetes 1.20+ (for production deployment)
Docker 20.10+ and Docker Compose
Elasticsearch (for Jaeger storage)

Framework Support

  • OpenTelemetry ✓ (recommended)
  • Flask/Django (Python) ✓
  • Express.js (Node.js) ✓
  • Go standard library ✓
  • Spring Boot (Java) ✓

Context Window

Token Usage ~3K-8K tokens depending on deployment complexity

Information

Author
wshobson
Updated
2026-01-30
Category
debugging