Search: [outage] - Kdecherf's links

As always, this kind of content is interesting to read. Short summary: slow autoscaling of an AWS managed network service led to interesting cascading failures

Wed 03 Feb 2021

outage post-mortem

Countless emails wrongly blocked as spam after Cisco's SpamCop failed to renew domain name at the weekend www.theregister.com/2021/02/01/in_brief_security/

It seems that failing to renew business-critical domains is still a thing

Mon 01 Feb 2021

outage

Not just Microsoft: Auth turns out to be a point of failure for Google's cloud too www.theregister.com/2020/12/15/auth_failure_google_and_microsoft/

Wed 16 Dec 2020

@TheRegister

cloud outage

Oh, no one knows what goes on behind locked doors... so don't leave your UPS in there www.theregister.com/2020/12/11/on_call/

😂

Fri 11 Dec 2020

@TheRegister

outage

It's always DNS, especially when a sysadmin makes a hash of their semicolons www.theregister.com/2020/11/23/who_me/

« This was all well and good until the default two weeks time-to-live expired »
😅

Mon 23 Nov 2020

@TheRegister

dns outage

The day I took down the data centre- I mean, the day I saved the day. Right, boss? www.theregister.com/2020/11/09/who_me/

« Be careful what you kick off before lunch if you want a mealtime free of phone calls »

😅

Mon 09 Nov 2020

@TheRegister

ops outage

A Terrible, Horrible, No-Good, Very Bad Day at Slack slack.engineering/a-terrible-horrible-no-good-very-bad-day-at-slack-dfe05b485f82

"Hello? Murphy's here"

Sun 05 Jul 2020 *

@lauralifts

ops outage

Google reveals the wheels almost literally fell off one of its cloudy server racks www.theregister.co.uk/2020/03/16/google_cloud_server_rack_castors/

Well, that's a funny issue 😅

Mon 16 Mar 2020

@TheRegister

datacenter google outage sre

diziet | Let's Encrypt certificate revocation diziet.dreamwidth.org/5368.html

Tue 03 Mar 2020

@jbfavre

letsencrypt outage security tls x509

How a GCP Persistent Disk Incident Snowballed into a 23-Hour Outage -- and Taught Us Some Important Lessons grafana.com/blog/2020/01/23/how-a-gcp-persistent-disk-incident-snowballed-into-a-23-hour-outage-and-taught-us-some-important-lessons/

Tue 28 Jan 2020

@grafana

feedback operations outage

Why 168,149 Valentine’s day text messages arrived in November arstechnica.com/information-technology/2019/11/why-168149-valentines-day-text-messages-arrived-in-november/

« a Syniverse server failed on February 14, 2019, causing messages that were in the queue to go undelivered. For some reason, the server was reactivated 9 months later, causing those 168,149 months-old messages to be sent »

🤷

Fri 15 Nov 2019

fail ops outage

The July Galileo Outage: What happened and why berthub.eu/articles/posts/galileo-accident/

Tue 12 Nov 2019

galileo gnss outage

GPS, Galileo & More: How do they work & what happened during the big outage? • ds9a.nl articles ds9a.nl/articles/posts/gps-gnss-how-do-they-work/

A really cool read

Tue 12 Nov 2019

@PowerDNS_Bert

galileo gnss outage

How malformed packets caused CenturyLink’s 37-hour, nationwide outage arstechnica.com/information-technology/2019/08/centurylinks-37-hour-outage-blocked-911-service-for-17-million-people/

Massive outages are still caused by things like unmonitored network devices and unconfigured features enabled by default, which lead to interesting behaviors

Tue 20 Aug 2019

outage

We had issues with Monzo on 29th July. Here's what happened, and what we did to fix it. monzo.com/blog/2019/09/08/why-monzo-wasnt-working-on-july-29th/

Cool level of transparency

Fri 09 Aug 2019

@simonvc

outage