Cats on a Keyboard

Writings

Incident Report: Cascading failure on KeyDB server

An issue with KeyDB logging caused our infrastructure to briefly crash. All services are now working.

Incident Report: Full Networking failure on April 9, 2023

At around 2 AM ICT, we detected unusual CPU utilization on one of our cluster nodes. We presume it was caused by a CRI bug involving containerd and runc, which is an upstream issue. It throttled all other services on the node and caused them to fail. We decided to do a

Full Sail: Migrating our Infrastructure to Kubernetes

A story of migration, corner-cutting, and elaborate hacks in an attempt to achieve high availability and observability