Grafana Dashboards

Build production-ready Grafana dashboards for comprehensive observability

25450 Stars

Create and manage production Grafana dashboards for real-time visualization of system and application metrics. Use when building monitoring dashboards, visualizing metrics, or creating operational observability interfaces.

Tags: grafana, monitoring, dashboards, observability, metrics, prometheus, visualization, infrastructure

Example Interaction

User Prompt

Help me create a Grafana dashboard for monitoring our API service with request rate, error percentage, and P95 latency panels. Include alerting for error rates above 5%.

Agent Response

Complete dashboard JSON with request rate graphs, error rate monitoring with alerts, and latency percentile visualization.
Quick Start (3 Steps)

Get up and running in minutes

1. Install

claude-code skill install grafana-dashboards

2. Config

3. First Trigger

@grafana-dashboards help

Commands

Command | Description | Required Args
@grafana-dashboards api-service-monitoring | Create a comprehensive dashboard to monitor API service health using the RED method (Rate, Errors, Duration) | None
@grafana-dashboards infrastructure-overview | Build a high-level infrastructure dashboard showing cluster health and resource utilization | None
@grafana-dashboards database-performance-dashboard | Design a database monitoring dashboard with key performance indicators and connection metrics | None

Typical Use Cases

API Service Monitoring

Create a comprehensive dashboard to monitor API service health using RED method (Rate, Errors, Duration)

Infrastructure Overview

Build a high-level infrastructure dashboard showing cluster health and resource utilization

Database Performance Dashboard

Design a database monitoring dashboard with key performance indicators and connection metrics

Overview

Grafana Dashboards

Create and manage production-ready Grafana dashboards for comprehensive system observability.

Purpose

Design effective Grafana dashboards for monitoring applications, infrastructure, and business metrics.

When to Use

  • Visualize Prometheus metrics
  • Create custom dashboards
  • Implement SLO dashboards
  • Monitor infrastructure
  • Track business KPIs

Dashboard Design Principles

1. Hierarchy of Information

┌─────────────────────────────────────┐
│  Critical Metrics (Big Numbers)     │
├─────────────────────────────────────┤
│  Key Trends (Time Series)           │
├─────────────────────────────────────┤
│  Detailed Metrics (Tables/Heatmaps) │
└─────────────────────────────────────┘

2. RED Method (Services)

  • Rate - Requests per second
  • Errors - Failed requests, per second or as a percentage of all requests
  • Duration - Latency (response time) of requests
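As a rough sketch, the three RED signals can be written as PromQL strings built around one service selector. The metric names match the examples used elsewhere in this document. The `red_queries` helper, its `window` parameter, and the presence of a `service` label on your metrics are assumptions made for illustration.

```python
def red_queries(service: str, window: str = "5m") -> dict[str, str]:
    """Return PromQL expressions for Rate, Errors and Duration of one service."""
    sel = f'service="{service}"'  # assumes metrics carry a `service` label
    return {
        # Rate: requests per second
        "rate": f"sum(rate(http_requests_total{{{sel}}}[{window}]))",
        # Errors: share of 5xx responses, as a percentage
        "errors": (
            f'sum(rate(http_requests_total{{{sel},status=~"5.."}}[{window}]))'
            f" / sum(rate(http_requests_total{{{sel}}}[{window}])) * 100"
        ),
        # Duration: 95th percentile latency from histogram buckets
        "duration": (
            "histogram_quantile(0.95, "
            f"sum(rate(http_request_duration_seconds_bucket{{{sel}}}[{window}])) by (le))"
        ),
    }


for name, expr in red_queries("checkout").items():
    print(f"{name}: {expr}")
```

Each returned string can be pasted into a panel target's "expr" field.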

3. USE Method (Resources)

  • Utilization - Percentage of time the resource is busy
  • Saturation - Work that is queued or waiting (queue length, wait time)
  • Errors - Count of error events

Dashboard Structure

API Monitoring Dashboard

{
  "dashboard": {
    "title": "API Monitoring",
    "tags": ["api", "production"],
    "timezone": "browser",
    "refresh": "30s",
    "panels": [
      {
        "title": "Request Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "sum(rate(http_requests_total[5m])) by (service)",
            "legendFormat": "{{service}}"
          }
        ],
        "gridPos": { "x": 0, "y": 0, "w": 12, "h": 8 }
      },
      {
        "title": "Error Rate %",
        "type": "graph",
        "targets": [
          {
            "expr": "(sum(rate(http_requests_total{status=~\"5..\"}[5m])) / sum(rate(http_requests_total[5m]))) * 100",
            "legendFormat": "Error Rate"
          }
        ],
        "alert": {
          "conditions": [
            {
              "evaluator": { "params": [5], "type": "gt" },
              "operator": { "type": "and" },
              "query": { "params": ["A", "5m", "now"] },
              "reducer": { "type": "avg" },
              "type": "query"
            }
          ]
        },
        "gridPos": { "x": 12, "y": 0, "w": 12, "h": 8 }
      },
      {
        "title": "P95 Latency",
        "type": "graph",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service))",
            "legendFormat": "{{service}}"
          }
        ],
        "gridPos": { "x": 0, "y": 8, "w": 24, "h": 8 }
      }
    ]
  }
}

Reference: See assets/api-dashboard.json
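The "gridPos" values above place panels on Grafana's 24-column grid: two half-width panels on the first row and one full-width panel below. A small check like the following can catch layout mistakes before a dashboard is imported. It is a sketch only; `check_grid` and `overlaps` are hypothetical helper names, not part of any Grafana tooling.

```python
from itertools import combinations

GRID_COLUMNS = 24  # Grafana's dashboard grid is 24 columns wide


def overlaps(a: dict, b: dict) -> bool:
    """True when two gridPos rectangles share at least one grid cell."""
    return (
        a["x"] < b["x"] + b["w"] and b["x"] < a["x"] + a["w"]
        and a["y"] < b["y"] + b["h"] and b["y"] < a["y"] + a["h"]
    )


def check_grid(panels: list[dict]) -> list[str]:
    """Report panels that run past the grid edge or overlap each other."""
    problems = []
    for p in panels:
        g = p["gridPos"]
        if g["x"] + g["w"] > GRID_COLUMNS:
            problems.append(f'{p["title"]}: extends past column {GRID_COLUMNS}')
    for p, q in combinations(panels, 2):
        if overlaps(p["gridPos"], q["gridPos"]):
            problems.append(f'{p["title"]} overlaps {q["title"]}')
    return problems


# The three panels from the API Monitoring dashboard above.
panels = [
    {"title": "Request Rate", "gridPos": {"x": 0, "y": 0, "w": 12, "h": 8}},
    {"title": "Error Rate %", "gridPos": {"x": 12, "y": 0, "w": 12, "h": 8}},
    {"title": "P95 Latency", "gridPos": {"x": 0, "y": 8, "w": 24, "h": 8}},
]
print(check_grid(panels))  # prints []
```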

Panel Types

1. Stat Panel (Single Value)

{
  "type": "stat",
  "title": "Total Requests",
  "targets": [
    {
      "expr": "sum(http_requests_total)"
    }
  ],
  "options": {
    "reduceOptions": {
      "values": false,
      "calcs": ["lastNotNull"]
    },
    "orientation": "auto",
    "textMode": "auto",
    "colorMode": "value"
  },
  "fieldConfig": {
    "defaults": {
      "thresholds": {
        "mode": "absolute",
        "steps": [
          { "value": 0, "color": "green" },
          { "value": 80, "color": "yellow" },
          { "value": 90, "color": "red" }
        ]
      }
    }
  }
}
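To make the "steps" list concrete: in absolute mode, a value takes the color of the highest step whose "value" it has reached. The sketch below mimics that lookup for the three steps shown above. `threshold_color` is a hypothetical helper, and treating a missing step value as the base step is an assumption about how exported dashboards encode the lowest threshold.

```python
def threshold_color(value: float, steps: list[dict]) -> str:
    """Pick the color of the highest step whose bound the value has reached."""

    def bound(step: dict) -> float:
        # Assumption: a step without a value acts as the base (lowest) step.
        v = step.get("value")
        return float("-inf") if v is None else v

    ordered = sorted(steps, key=bound)
    color = ordered[0]["color"]  # values below every bound fall back to the base color
    for step in ordered:
        if value >= bound(step):
            color = step["color"]
    return color


steps = [
    {"value": 0, "color": "green"},
    {"value": 80, "color": "yellow"},
    {"value": 90, "color": "red"},
]
for reading in (50, 85, 95):
    print(reading, threshold_color(reading, steps))
```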

2. Time Series Graph

{
  "type": "graph",
  "title": "CPU Usage",
  "targets": [
    {
      "expr": "100 - (avg by (instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)"
    }
  ],
  "yaxes": [
    { "format": "percent", "max": 100, "min": 0 },
    { "format": "short" }
  ]
}

3. Table Panel

{
  "type": "table",
  "title": "Service Status",
  "targets": [
    {
      "expr": "up",
      "format": "table",
      "instant": true
    }
  ],
  "transformations": [
    {
      "id": "organize",
      "options": {
        "excludeByName": { "Time": true },
        "indexByName": {},
        "renameByName": {
          "instance": "Instance",
          "job": "Service",
          "Value": "Status"
        }
      }
    }
  ]
}
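The effect of the "organize" transformation on one result row can be pictured as follows. This is only an illustration of the hide-then-rename behaviour configured above, not Grafana's implementation. The `organize` function and the sample row are made up for the example.

```python
def organize(row: dict, exclude_by_name: dict, rename_by_name: dict) -> dict:
    """Drop excluded fields, then rename the remaining ones."""
    return {
        rename_by_name.get(name, name): value
        for name, value in row.items()
        if not exclude_by_name.get(name, False)
    }


# Hypothetical row as returned by the instant `up` query in table format.
row = {"Time": 1700000000000, "instance": "10.0.0.5:9100", "job": "node", "Value": 1}
print(
    organize(
        row,
        exclude_by_name={"Time": True},
        rename_by_name={"instance": "Instance", "job": "Service", "Value": "Status"},
    )
)
```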

4. Heatmap

{
  "type": "heatmap",
  "title": "Latency Heatmap",
  "targets": [
    {
      "expr": "sum(rate(http_request_duration_seconds_bucket[5m])) by (le)",
      "format": "heatmap"
    }
  ],
  "dataFormat": "tsbuckets",
  "yAxis": {
    "format": "s"
  }
}
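The `le` buckets queried here are cumulative, and they are the same series that `histogram_quantile` in the P95 Latency panel interpolates over. The sketch below is a simplified model of that interpolation for classic histograms, assuming non-negative bucket bounds and at least one finite bucket. The function and the sample counts are illustrative, not Prometheus source code.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate a quantile from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound and end with math.inf, mirroring
    Prometheus `le` buckets. Interpolation inside a bucket is linear.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # the first bucket is assumed to start at 0
    for bound, count in buckets:
        if count >= rank and count > prev_count:
            if math.isinf(bound):
                # Quantile lands in the +Inf bucket: report the highest finite bound.
                return buckets[-2][0]
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Hypothetical cumulative counts for le="0.1", "0.5", "1", "+Inf".
sample = [(0.1, 50.0), (0.5, 90.0), (1.0, 98.0), (math.inf, 100.0)]
print(histogram_quantile(0.95, sample))
```

Because the estimate is interpolated within a bucket, its accuracy depends on how finely the bucket bounds cover the latency range of interest.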

Variables

Query Variables

{
  "templating": {
    "list": [
      {
        "name": "namespace",
        "type": "query",
        "datasource": "Prometheus",
        "query": "label_values(kube_pod_info, namespace)",
        "refresh": 1,
        "multi": false
      },
      {
        "name": "service",
        "type": "query",
        "datasource": "Prometheus",
        "query": "label_values(kube_service_info{namespace=\"$namespace\"}, service)",
        "refresh": 1,
        "multi": true
      }
    ]
  }
}

Use Variables in Queries

sum(rate(http_requests_total{namespace="$namespace", service=~"$service"}[5m]))
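The `=~` matcher on `service` matters because a multi-value variable is expanded into a regular-expression alternation, while a single-value variable is substituted as plain text. The following sketch approximates that substitution for the query above. `interpolate` is a hypothetical helper, and Grafana's own escaping rules may differ in detail from Python's `re.escape`.

```python
import re


def interpolate(query: str, variables: dict[str, list[str]]) -> str:
    """Replace $name with one value, or (a|b|c) for a multi-value selection."""

    def expand(match: re.Match) -> str:
        values = [re.escape(v) for v in variables[match.group(1)]]
        return values[0] if len(values) == 1 else "(" + "|".join(values) + ")"

    return re.sub(r"\$(\w+)", expand, query)


query = 'sum(rate(http_requests_total{namespace="$namespace", service=~"$service"}[5m]))'
print(interpolate(query, {"namespace": ["prod"], "service": ["api", "web"]}))
```

With `=` instead of `=~`, the expanded `(api|web)` would be compared literally and match nothing.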

Alerts in Dashboards

{
  "alert": {
    "name": "High Error Rate",
    "conditions": [
      {
        "evaluator": {
          "params": [5],
          "type": "gt"
        },
        "operator": { "type": "and" },
        "query": {
          "params": ["A", "5m", "now"]
        },
        "reducer": { "type": "avg" },
        "type": "query"
      }
    ],
    "executionErrorState": "alerting",
    "for": "5m",
    "frequency": "1m",
    "message": "Error rate is above 5%",
    "noDataState": "no_data",
    "notifications": [{ "uid": "slack-channel" }]
  }
}
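Read together, the fields above say: every minute ("frequency"), average query A over the last five minutes ("reducer" and "query"), compare the result with 5 ("evaluator"), and notify only once the condition has held for five minutes ("for"). The sketch below is a simplified model of that logic, not Grafana's evaluator, and both function names are hypothetical. Note also that this panel-embedded "alert" block is the legacy dashboard alerting format, so whether it applies depends on the Grafana version in use.

```python
from statistics import mean


def condition_fires(samples: list[float], threshold: float = 5.0) -> bool:
    """Apply an `avg` reducer, then a `gt` evaluator, to one query window."""
    if not samples:
        return False  # an empty window is governed by noDataState instead
    return mean(samples) > threshold


def alert_state(results: list[tuple[int, bool]], for_seconds: int = 300) -> str:
    """Derive the state from (unix_time, fired) evaluations, oldest first."""
    pending_since = None
    for ts, fired in results:
        if not fired:
            pending_since = None  # any healthy evaluation resets the timer
        elif pending_since is None:
            pending_since = ts
    if pending_since is None:
        return "ok"
    held_for = results[-1][0] - pending_since
    return "alerting" if held_for >= for_seconds else "pending"


# Six one-minute evaluations with error-rate samples (in percent) above 5.
history = [(t, condition_fires([6.0, 7.5, 6.5])) for t in range(0, 360, 60)]
print(alert_state(history))
```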

Dashboard Provisioning

dashboards.yml:

apiVersion: 1

providers:
  - name: "default"
    orgId: 1
    folder: "General"
    type: file
    disableDeletion: false
    updateIntervalSeconds: 10
    allowUiUpdates: true
    options:
      path: /etc/grafana/dashboards
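One detail worth checking when feeding this provider: files under the provisioning path should contain the bare dashboard model, whereas the earlier API Monitoring example is wrapped in a top-level "dashboard" key (the shape used for HTTP API payloads). A minimal sketch for writing files in the expected form might look like this. `write_for_provisioning` is a hypothetical helper, and the directory is passed in rather than assumed.

```python
import json
from pathlib import Path


def write_for_provisioning(payload: dict, directory: Path, filename: str) -> Path:
    """Write a dashboard model as a JSON file a file provider can pick up."""
    # Unwrap {"dashboard": {...}} if present; otherwise use the payload as-is.
    model = payload.get("dashboard", payload)
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / filename
    target.write_text(json.dumps(model, indent=2) + "\n", encoding="utf-8")
    return target
```

For example, `write_for_provisioning(payload, Path("/etc/grafana/dashboards"), "api-monitoring.json")` would target the path configured above, given write access to it.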

Common Dashboard Patterns

Infrastructure Dashboard

Key Panels:

  • CPU utilization per node
  • Memory usage per node
  • Disk I/O
  • Network traffic
  • Pod count by namespace
  • Node status

Reference: See assets/infrastructure-dashboard.json

Database Dashboard

Key Panels:

  • Queries per second
  • Connection pool usage
  • Query latency (P50, P95, P99)
  • Active connections
  • Database size
  • Replication lag
  • Slow queries

Reference: See assets/database-dashboard.json

Application Dashboard

Key Panels:

  • Request rate
  • Error rate
  • Response time (percentiles)
  • Active users/sessions
  • Cache hit rate
  • Queue length

Best Practices

  1. Start with templates (Grafana community dashboards)
  2. Use consistent naming for panels and variables
  3. Group related metrics in rows
  4. Set appropriate time ranges (default: Last 6 hours)
  5. Use variables for flexibility
  6. Add panel descriptions for context
  7. Configure units correctly
  8. Set meaningful thresholds for colors
  9. Use consistent colors across dashboards
  10. Test with different time ranges
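Some of these practices can be checked mechanically. As an example, the sketch below flags panels that lack a description (practice 6) or a unit (practice 7). It assumes panels store their unit under "fieldConfig.defaults.unit", which does not hold for legacy graph panels that use "yaxes". `lint_panels` is a hypothetical name.

```python
def lint_panels(dashboard: dict) -> list[str]:
    """Flag panels missing a description or a configured unit."""
    findings = []
    for panel in dashboard.get("panels", []):
        title = panel.get("title", "<untitled>")
        if not panel.get("description"):
            findings.append(f"{title}: no description")
        unit = panel.get("fieldConfig", {}).get("defaults", {}).get("unit")
        if not unit:
            findings.append(f"{title}: no unit set")
    return findings


example = {
    "panels": [
        {
            "title": "P95 Latency",
            "description": "95th percentile request latency per service",
            "fieldConfig": {"defaults": {"unit": "s"}},
        },
        {"title": "Request Rate"},
    ]
}
print(lint_panels(example))
```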

Dashboard as Code

Terraform Provisioning

resource "grafana_dashboard" "api_monitoring" {
  config_json = file("${path.module}/dashboards/api-monitoring.json")
  folder      = grafana_folder.monitoring.id
}

resource "grafana_folder" "monitoring" {
  title = "Production Monitoring"
}

Ansible Provisioning

- name: Deploy Grafana dashboards
  copy:
    src: "{{ item }}"
    dest: /etc/grafana/dashboards/
  with_fileglob:
    - "dashboards/*.json"
  notify: restart grafana
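Whichever tool ships the files, it can help to validate them first, for instance in CI. The sketch below checks that every JSON file in a directory parses and carries a title. `validate_dashboard_files` is a hypothetical helper, and the `dashboards/` layout is assumed from the examples above.

```python
import json
from pathlib import Path


def validate_dashboard_files(directory: Path) -> list[str]:
    """Return one message per file that is not valid JSON or lacks a title."""
    errors = []
    for path in sorted(directory.glob("*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            errors.append(f"{path.name}: invalid JSON ({exc.msg})")
            continue
        # Accept both the bare model and the {"dashboard": {...}} wrapper.
        model = data.get("dashboard", data) if isinstance(data, dict) else {}
        if not isinstance(model, dict) or not model.get("title"):
            errors.append(f"{path.name}: missing title")
    return errors
```

A pipeline step could call `validate_dashboard_files(Path("dashboards"))` and fail the build when the returned list is not empty.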

Reference Files

  • assets/api-dashboard.json - API monitoring dashboard
  • assets/infrastructure-dashboard.json - Infrastructure dashboard
  • assets/database-dashboard.json - Database monitoring dashboard
  • references/dashboard-design.md - Dashboard design guide

Related Skills

  • prometheus-configuration - For metric collection
  • slo-implementation - For SLO dashboards

Environment Matrix

Dependencies

Grafana 8.0+
Prometheus (for metrics)
Valid data source connection

Framework Support

  • Prometheus ✓ (recommended)
  • InfluxDB ✓
  • Elasticsearch ✓
  • MySQL ✓
  • PostgreSQL ✓

Context Window

Token usage: ~3K-8K tokens for dashboard configurations and documentation

Information

Author: wshobson
Updated: 2026-01-30
Category: productivity-tools