On Call Handoff Patterns
Master seamless on-call transitions with expert handoff patterns
Master on-call shift handoffs with context transfer, escalation procedures, and documentation. Use when transitioning on-call responsibilities, documenting shift summaries, or improving on-call processes.
Example

**User prompt:**

> Generate a shift handoff document for the platform team. We have one ongoing API timeout investigation and a major release scheduled for tomorrow.

**Agent response:** A complete handoff document covering active incidents, ongoing investigations, recent changes, known issues, and upcoming events.
Quick Start (3 Steps)

1. **Install**

   ```bash
   claude-code skill install on-call-handoff-patterns
   ```

2. **Config**

3. **First Trigger**

   ```
   @on-call-handoff-patterns help
   ```

Commands
| Command | Description | Required Args |
|---|---|---|
| @on-call-handoff-patterns standard-shift-handoff | Create comprehensive handoff documentation for routine shift transitions | None |
| @on-call-handoff-patterns mid-incident-handoff | Transfer incident ownership during active emergencies | None |
| @on-call-handoff-patterns quick-async-handoff | Document essential information for time-sensitive handoffs | None |
Typical Use Cases
- **Standard Shift Handoff**: Create comprehensive handoff documentation for routine shift transitions
- **Mid-Incident Handoff**: Transfer incident ownership during active emergencies
- **Quick Async Handoff**: Document essential information for time-sensitive handoffs
Overview
On-Call Handoff Patterns
Effective patterns for on-call shift transitions, ensuring continuity, context transfer, and reliable incident response across shifts.
When to Use This Skill
- Transitioning on-call responsibilities
- Writing shift handoff summaries
- Documenting ongoing investigations
- Establishing on-call rotation procedures
- Improving handoff quality
- Onboarding new on-call engineers
Core Concepts
1. Handoff Components
| Component | Purpose |
|---|---|
| Active Incidents | What’s currently broken |
| Ongoing Investigations | Issues being debugged |
| Recent Changes | Deployments, configs |
| Known Issues | Workarounds in place |
| Upcoming Events | Maintenance, releases |
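Before a handoff goes out, the five components above can be checked mechanically. A minimal bash sketch; the `check_handoff` helper and the exact section titles are assumptions for illustration, not standard tooling:

```bash
# check_handoff: hypothetical helper that warns when a handoff document
# is missing one of the five components from the table above.
check_handoff() {
  local file="$1" s
  local sections=("Active Incidents" "Ongoing Investigations" \
                  "Recent Changes" "Known Issues" "Upcoming Events")
  for s in "${sections[@]}"; do
    # grep -q: exit status alone tells us whether the section is present
    grep -q "$s" "$file" || echo "MISSING: $s"
  done
}
```

Running it against a complete handoff prints nothing; each absent section produces one `MISSING:` line.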
2. Handoff Timing
Recommended: 30 min overlap between shifts.

```
Outgoing:
├── 15 min: Write handoff document
└── 15 min: Sync call with incoming

Incoming:
├── 15 min: Review handoff document
├── 15 min: Sync call with outgoing
└── 5 min: Verify alerting setup
```
Templates
### Template 1: Shift Handoff Document

````markdown
# On-Call Handoff: Platform Team

**Outgoing**: @alice (2024-01-15 to 2024-01-22)
**Incoming**: @bob (2024-01-22 to 2024-01-29)
**Handoff Time**: 2024-01-22 09:00 UTC

---

## 🔴 Active Incidents

### None currently active

No active incidents at handoff time.

---

## 🟡 Ongoing Investigations

### 1. Intermittent API Timeouts (ENG-1234)

**Status**: Investigating
**Started**: 2024-01-20
**Impact**: ~0.1% of requests timing out

**Context**:

- Timeouts correlate with database backup window (02:00-03:00 UTC)
- Suspect backup process causing lock contention
- Added extra logging in PR #567 (deployed 01/21)

**Next Steps**:

- [ ] Review new logs after tonight's backup
- [ ] Consider moving backup window if confirmed

**Resources**:

- Dashboard: [API Latency](https://grafana/d/api-latency)
- Thread: #platform-eng (01/20, 14:32)

---

### 2. Memory Growth in Auth Service (ENG-1235)

**Status**: Monitoring
**Started**: 2024-01-18
**Impact**: None yet (proactive)

**Context**:

- Memory usage growing ~5% per day
- No memory leak found in profiling
- Suspect connection pool not releasing properly

**Next Steps**:

- [ ] Review heap dump from 01/21
- [ ] Consider restart if usage > 80%

**Resources**:

- Dashboard: [Auth Service Memory](https://grafana/d/auth-memory)
- Analysis doc: [Memory Investigation](https://docs/eng-1235)

---

## 🟢 Resolved This Shift

### Payment Service Outage (2024-01-19)

- **Duration**: 23 minutes
- **Root Cause**: Database connection exhaustion
- **Resolution**: Rolled back v2.3.4, increased pool size
- **Postmortem**: [POSTMORTEM-89](https://docs/postmortem-89)
- **Follow-up tickets**: ENG-1230, ENG-1231

---

## 📋 Recent Changes

### Deployments

| Service      | Version | Time        | Notes                      |
| ------------ | ------- | ----------- | -------------------------- |
| api-gateway  | v3.2.1  | 01/21 14:00 | Bug fix for header parsing |
| user-service | v2.8.0  | 01/20 10:00 | New profile features       |
| auth-service | v4.1.2  | 01/19 16:00 | Security patch             |

### Configuration Changes

- 01/21: Increased API rate limit from 1000 to 1500 RPS
- 01/20: Updated database connection pool max from 50 to 75

### Infrastructure

- 01/20: Added 2 nodes to Kubernetes cluster
- 01/19: Upgraded Redis from 6.2 to 7.0

---

## ⚠️ Known Issues & Workarounds

### 1. Slow Dashboard Loading

**Issue**: Grafana dashboards slow on Monday mornings
**Workaround**: Wait 5 min after 08:00 UTC for cache warm-up
**Ticket**: OPS-456 (P3)

### 2. Flaky Integration Test

**Issue**: `test_payment_flow` fails intermittently in CI
**Workaround**: Re-run failed job (usually passes on retry)
**Ticket**: ENG-1200 (P2)

---

## 📅 Upcoming Events

| Date        | Event                | Impact              | Contact       |
| ----------- | -------------------- | ------------------- | ------------- |
| 01/23 02:00 | Database maintenance | 5 min read-only     | @dba-team     |
| 01/24 14:00 | Major release v5.0   | Monitor closely     | @release-team |
| 01/25       | Marketing campaign   | 2x traffic expected | @platform     |

---

## 📞 Escalation Reminders

| Issue Type      | First Escalation     | Second Escalation |
| --------------- | -------------------- | ----------------- |
| Payment issues  | @payments-oncall     | @payments-manager |
| Auth issues     | @auth-oncall         | @security-team    |
| Database issues | @dba-team            | @infra-manager    |
| Unknown/severe  | @engineering-manager | @vp-engineering   |

---

## 🔧 Quick Reference

### Common Commands

```bash
# Check service health
kubectl get pods -A | grep -v Running

# Recent deployments
kubectl get events --sort-by='.lastTimestamp' | tail -20

# Database connections
psql -c "SELECT count(*) FROM pg_stat_activity;"

# Clear cache (emergency only)
redis-cli FLUSHDB
```
````
Important Links

Handoff Checklist

**Outgoing Engineer**

- [ ] Document active incidents
- [ ] Document ongoing investigations
- [ ] List recent changes
- [ ] Note known issues
- [ ] Add upcoming events
- [ ] Sync with incoming engineer

**Incoming Engineer**

- [ ] Read this document
- [ ] Join sync call
- [ ] Verify PagerDuty is routing to you
- [ ] Verify Slack notifications working
- [ ] Check VPN/access working
- [ ] Review critical dashboards
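To cut down on copy-paste errors, the header of Template 1 can be generated rather than hand-edited. A minimal sketch; the `new_handoff` helper name and argument order are assumptions for illustration:

```bash
# new_handoff: hypothetical scaffold generator for the template above.
# Arguments: team name, outgoing handle, incoming handle.
new_handoff() {
  local team="$1" outgoing="$2" incoming="$3"
  cat <<EOF
# On-Call Handoff: $team

**Outgoing**: @$outgoing
**Incoming**: @$incoming
**Handoff Time**: $(date -u '+%Y-%m-%d %H:%M UTC')

## 🔴 Active Incidents

## 🟡 Ongoing Investigations

## 📋 Recent Changes

## ⚠️ Known Issues & Workarounds

## 📅 Upcoming Events
EOF
}
```

Usage: `new_handoff "Platform Team" alice bob > handoff.md`, then fill in each section by hand.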
### Template 2: Quick Handoff (Async)

```markdown
# Quick Handoff: @alice → @bob

## TL;DR

- No active incidents
- 1 investigation ongoing (API timeouts, see ENG-1234)
- Major release tomorrow (01/24) - be ready for issues

## Watch List

1. API latency around 02:00-03:00 UTC (backup window)
2. Auth service memory (restart if > 80%)

## Recent

- Deployed api-gateway v3.2.1 yesterday (stable)
- Increased rate limits to 1500 RPS

## Coming Up

- 01/23 02:00 - DB maintenance (5 min read-only)
- 01/24 14:00 - v5.0 release

## Questions?

I'll be available on Slack until 17:00 today.
```
### Template 3: Incident Handoff (Mid-Incident)

```markdown
# INCIDENT HANDOFF: Payment Service Degradation

**Incident Start**: 2024-01-22 08:15 UTC
**Current Status**: Mitigating
**Severity**: SEV2

---

## Current State

- Error rate: 15% (down from 40%)
- Mitigation in progress: scaling up pods
- ETA to resolution: ~30 min

## What We Know

1. Root cause: Memory pressure on payment-service pods
2. Triggered by: Unusual traffic spike (3x normal)
3. Contributing: Inefficient query in checkout flow

## What We've Done

- Scaled payment-service from 5 → 15 pods
- Enabled rate limiting on checkout endpoint
- Disabled non-critical features

## What Needs to Happen

1. Monitor error rate - should reach <1% in ~15 min
2. If not improving, escalate to @payments-manager
3. Once stable, begin root cause investigation

## Key People

- Incident Commander: @alice (handing off)
- Comms Lead: @charlie
- Technical Lead: @bob (incoming)

## Communication

- Status page: Updated at 08:45
- Customer support: Notified
- Exec team: Aware

## Resources

- Incident channel: #inc-20240122-payment
- Dashboard: [Payment Service](https://grafana/d/payments)
- Runbook: [Payment Degradation](https://wiki/runbooks/payments)

---

**Incoming on-call (@bob) - Please confirm you have:**

- [ ] Joined #inc-20240122-payment
- [ ] Access to dashboards
- [ ] Understand current state
- [ ] Know escalation path
```
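The "What Needs to Happen" section in the mid-incident template encodes a simple rule: if the error rate is still above target once the ETA passes, escalate. That rule can be sketched as a bash helper; the function name and the 1% default target are assumptions for illustration:

```bash
# should_escalate: hypothetical decision helper. Prints "escalate" when
# the current error rate (in percent) is still above the target rate,
# otherwise "keep-monitoring". awk handles the floating-point comparison.
should_escalate() {
  local rate="$1" target="${2:-1}"
  awk -v r="$rate" -v t="$target" 'BEGIN {
    if (r > t) print "escalate"; else print "keep-monitoring"
  }'
}
```

A watch loop would call this every few minutes with the latest rate from the dashboard, rather than relying on someone remembering the threshold under pressure.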
Handoff Sync Meeting
Agenda (15 minutes)
```markdown
## Handoff Sync: @alice → @bob

1. **Active Issues** (5 min)
   - Walk through any ongoing incidents
   - Discuss investigation status
   - Transfer context and theories

2. **Recent Changes** (3 min)
   - Deployments to watch
   - Config changes
   - Known regressions

3. **Upcoming Events** (3 min)
   - Maintenance windows
   - Expected traffic changes
   - Releases planned

4. **Questions** (4 min)
   - Clarify anything unclear
   - Confirm access and alerting
   - Exchange contact info
```
On-Call Best Practices
Before Your Shift
```markdown
## Pre-Shift Checklist

### Access Verification

- [ ] VPN working
- [ ] kubectl access to all clusters
- [ ] Database read access
- [ ] Log aggregator access (Splunk/Datadog)
- [ ] PagerDuty app installed and logged in

### Alerting Setup

- [ ] PagerDuty schedule shows you as primary
- [ ] Phone notifications enabled
- [ ] Slack notifications for incident channels
- [ ] Test alert received and acknowledged

### Knowledge Refresh

- [ ] Review recent incidents (past 2 weeks)
- [ ] Check service changelog
- [ ] Skim critical runbooks
- [ ] Know escalation contacts

### Environment Ready

- [ ] Laptop charged and accessible
- [ ] Phone charged
- [ ] Quiet space available for calls
- [ ] Secondary contact identified (if traveling)
```
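Part of the access verification can be automated. A minimal sketch that only checks which incident-response CLI tools are on the PATH; the tool list is an example, and this does not verify VPN connectivity or actual cluster credentials:

```bash
# preflight: hypothetical pre-shift check that reports whether the
# command-line tools used during incidents are installed locally.
preflight() {
  local cmd
  for cmd in kubectl psql redis-cli; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "OK: $cmd"
    else
      echo "MISSING: $cmd"
    fi
  done
}
```

A fuller check would also exercise credentials, for example `kubectl auth can-i get pods` against each cluster.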
During Your Shift
```markdown
## Daily On-Call Routine

### Morning (start of day)

- [ ] Check overnight alerts
- [ ] Review dashboards for anomalies
- [ ] Check for any P0/P1 tickets created
- [ ] Skim incident channels for context

### Throughout Day

- [ ] Respond to alerts within SLA
- [ ] Document investigation progress
- [ ] Update team on significant issues
- [ ] Triage incoming pages

### End of Day

- [ ] Hand off any active issues
- [ ] Update investigation docs
- [ ] Note anything for next shift
```
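The morning sweep of cluster state can be scripted so overnight problems stand out immediately. A sketch, assuming a Kubernetes environment like the one in the templates above; the function name is hypothetical:

```bash
# morning_check: hypothetical start-of-day sweep. Prints any pod that is
# not in Running or Completed state, as namespace/name plus its status.
# Columns from `kubectl get pods -A --no-headers` are:
#   NAMESPACE NAME READY STATUS RESTARTS AGE  (STATUS is field 4)
morning_check() {
  kubectl get pods -A --no-headers 2>/dev/null \
    | awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2 " " $4 }'
}
```

An empty result means no obviously unhealthy pods; anything printed deserves a look before the day's alerts start arriving.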
After Your Shift
```markdown
## Post-Shift Checklist

- [ ] Complete handoff document
- [ ] Sync with incoming on-call
- [ ] Verify PagerDuty routing changed
- [ ] Close/update investigation tickets
- [ ] File postmortems for any incidents
- [ ] Take time off if shift was stressful
```
Escalation Guidelines
When to Escalate
```markdown
## Escalation Triggers

### Immediate Escalation

- SEV1 incident declared
- Data breach suspected
- Unable to diagnose within 30 min
- Customer or legal escalation received

### Consider Escalation

- Issue spans multiple teams
- Requires expertise you don't have
- Business impact exceeds threshold
- You're uncertain about next steps

### How to Escalate

1. Page the appropriate escalation path
2. Provide brief context in Slack
3. Stay engaged until escalation acknowledges
4. Hand off cleanly, don't just disappear
```
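Step 2 of "How to Escalate" (provide brief context) is easier under pressure with a fixed format. A sketch of a message formatter; the function name and fields are assumptions, and the resulting text would be posted through whatever notification path your team uses:

```bash
# escalation_note: hypothetical formatter for the brief context message
# sent alongside a page. Arguments: severity, one-line summary, target.
escalation_note() {
  local severity="$1" summary="$2" target="$3"
  printf 'Escalating [%s] %s -- paging %s; I will stay engaged until acknowledged.\n' \
    "$severity" "$summary" "$target"
}
```

Keeping the "I will stay engaged" phrase in the template reinforces steps 3 and 4: the escalating engineer does not disappear once the page is sent.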
Best Practices
Do’s

- **Document everything**: Future you will thank you
- **Escalate early**: Better safe than sorry
- **Take breaks**: Alert fatigue is real
- **Keep handoffs synchronous**: Async loses context
- **Test your setup**: Before incidents, not during

Don’ts

- **Don’t skip handoffs**: Context loss causes incidents
- **Don’t be a hero**: Escalate when needed
- **Don’t ignore alerts**: Even if they seem minor
- **Don’t work sick**: Swap shifts instead
- **Don’t disappear**: Stay reachable during shift
Information

- **Author**: wshobson
- **Updated**: 2026-01-30
- **Category**: productivity-tools
Related Skills
- **Incident Runbook Templates**: Create structured incident response runbooks with step-by-step procedures, escalation paths, and …