Network issue

Incident Report for The Linux Foundation

Postmortem

Vexxhost's public cloud experienced internal network issues that affected storage and internal networking for all systems. An internal spine switch was failing intermittently, passing some packets and dropping others, which left the internal network in a failed state. Vexxhost replaced each part of the network spine in turn until the problem switch was identified and the network was restored. LF then had to power on or reboot some compute instances to bring them back online.

Posted Nov 16, 2018 - 21:41 UTC

Resolved

This incident has been resolved.

Posted Nov 16, 2018 - 19:55 UTC

Update

We are continuing to monitor for any further issues.

Posted Nov 16, 2018 - 18:32 UTC

Monitoring

A fix has been implemented and we are monitoring the results.

Posted Nov 16, 2018 - 17:24 UTC

Update

Critical internal network infrastructure has failed and is being replaced now.

Posted Nov 16, 2018 - 17:08 UTC

Update

Our upstream provider reported that they are having issues with their local router and are trying to reload it. The ETA is under 30 minutes if things go smoothly, but it might extend to several hours.

Posted Nov 16, 2018 - 15:30 UTC

Identified

An issue with an upstream network provider is impacting our CI services.

Posted Nov 16, 2018 - 13:34 UTC