As always, this kind of content is interesting to read. Short summary: slow autoscaling of an AWS-managed network service led to interesting cascading failures.
« a Syniverse server failed on February 14, 2019, causing messages that were in the queue to go undelivered. For some reason, the server was reactivated 9 months later, causing those 168,149 months-old messages to be sent »
There are still massive outages caused by things like unmonitored network devices and unconfigured features that are enabled by default, leading to interesting behaviors.