Start: 2017-10-06 09:00
What System/Service is affected: Nebula
What is happening: Gluster and network issues

1) Gluster sync issues continue from yesterday's Nebula incident.
2) At approximately 16:10, a Nebula networking issue (unrelated to the Gluster issues) occurred, causing host network drops within the Nebula infrastructure. This internal networking incident caused additional Gluster and iSCSI issues.

What will be affected: Many instances are broken because the Nebula network issues disrupted iSCSI. Instances that were already broken because of Gluster remain broken.

Update 2017-10-09 15:30 - One of the storage servers in Nebula will be rebooted October 9, 2017 at 22:00 CDT to resolve iSCSI connections to it. Approximately 100 instances will be shut down before the storage server is rebooted and restarted afterwards.

Actions: Actively working to resolve iSCSI and Gluster issues.

Start: 2017-10-10 16:30
What System/Service is affected: Campus Cluster
What is happening: Resource manager crash
What will be affected: Job submission and job scheduling.
Actions: Attempting to restart the resource manager did not go well. Killed the scheduler to reduce complexity. Opened a case with Adaptive.

Include the keyword "issue" in updates above to trigger actions.
