I am annoyingly chipper in the morning. Mornings are great! A fresh new day, hot coffee, and limitless possibility. That said, I’m picky about being woken up. I only wanna wake up at, you know, morning time. Relatedly, I think on-call is really interesting: exciting new problems, never-before-seen errors, and wild new customer behavior. The part about waking me up in the middle of the night is less so. Life is full of dichotomies!
On-call is vital to taking care of our complex systems and customers. It’s where all the weird things that need a human to sort them out end up. It spans business hours and off-hours, solo work and teamwork, automation and manual effort. On-call is an amazing source of resilience, and it deserves the best tools. That’s why we’re making Hotpot.
I recently had the opportunity to speak about on-call at DevOps Days Rockies and was excited to discuss ways to rethink it. I wanted to talk about how tech has given up on on-call innovation despite the abundance of value to be found in other fields and industries. You can watch the talk here.
DevOps Days Rockies 2024 Talk
The talk discusses implementing an organizational on-call program to improve resilience. You can watch it, or you can just use Hotpot: we’ve built all this and more into Hotpot’s scheduler and handoff features. You get a world-class on-call program for your entire organization!
Citations
- Erik Hollnagel’s Resilience Assessment Grid describes the four attributes required for resilient performance.
- Brendan Gregg’s USE method provides a way to analyze the performance of any system.
- Chad Todd’s Handover Communications in Software Operations looked qualitatively at handoffs and engineer confidence.
- And let’s not forget How Complex Systems Fail as a general guide to why all this is needed!