Lux - Down
Incident Report for Darren Nathanael's Infra
Postmortem

The techs left the Junos install media (a USB key) in the unit, so when the router rebooted, it booted into the installer and sat there waiting for input. I can't believe it. Once they pulled the key and rebooted again, it went back into the OS. I literally told them to throw that USB key out 4 years ago. It has Junos 18 on it; absolute clownery.

Posted Oct 08, 2024 - 22:30 UTC

Resolved
We've found the core issue; everything is alive and happy now.
Posted Oct 08, 2024 - 22:21 UTC
Identified
We're live! Some of the router's own anti-DDoS protection was acting up and causing it to drop ARP packets, hence the CPU spike.
Posted Oct 08, 2024 - 19:53 UTC
Update
LUX MX core router CPU usage before the traffic drop: https://lore.dpaste.org/g/xAmUhO.png
Posted Oct 08, 2024 - 15:00 UTC
Update
As we are nearing the 1-hour mark, here's a quick breakdown (local times are GMT-7):

- At around 6:20 this morning (13:20 UTC), traffic to/from the router stopped routing.

- There was a significant increase in router CPU usage leading up to this incident.

- We sent the router for a reboot, but it has not yet come back online.

- The router may still be in the midst of rebooting (stuck kernel thread, etc.).

- LuxConnect was contacted, but no immediate assistance was available (they're like "Dave's not here, man"). Here's hoping a tech comes back ASAP.

- Reached out to http://root.lu for support; awaiting a response.

- Requested the data center to pull the power in an effort to resolve the situation.
Posted Oct 08, 2024 - 14:27 UTC
Update
We’ve sent off an email to LuxConnect, since the router reboot obviously didn’t go happily.
Posted Oct 08, 2024 - 13:56 UTC
Update
The current issue appears to be related to the network, but we are still investigating to pinpoint the exact cause. We are trying to determine if the problem is limited to the router level or if it extends further upstream in the network stack.
Posted Oct 08, 2024 - 13:42 UTC
Update
We're waiting on the Juniper MX204 core to reboot; an MX takes exactly 5 minutes to reboot.
Posted Oct 08, 2024 - 13:33 UTC
Update
We're rebooting the core router.
Posted Oct 08, 2024 - 13:30 UTC
Investigating
We are currently investigating this issue.
Posted Oct 08, 2024 - 13:27 UTC
This incident affected: Public Infrastructure (cPanel Enterprise Shared Hosting - Lux, DA Enterprise Shared Hosting - Lux) and Core Infrastructure (Billing Panel).