One block of Nebula GlusterFS storage is currently misbehaving, and all instances connected to that block are paused.

 

We currently have an outage of a single block of GlusterFS that is causing load on 2 other nodes. This is causing about 66 instances to be paused at this time.

Update as of 2017-02-22 4:01 PM:
Emergency Nebula Outage - 16:15-18:15 CST Today

Currently, we have an outage of a single block of GlusterFS that is causing problems with 1/5 of Nebula’s GlusterFS nodes. As a result, about 66 instances are paused at this time.

 In order to fix the issue, Nebula is going to have a COMPLETE emergency outage starting at 16:00 CST today to take the GlusterFS volume offline for a hopefully quick repair. During this outage…
- ALL instances will be paused
- ALL access to Horizon and the API will be disabled
- We do not expect any data loss for instances

We do not know how long the GlusterFS healing process will take but are hopeful it may be on the order of 30 minutes. If the healing process takes longer than 90 minutes, we will then consider alternate methods of recovery and try to get Nebula back to a partially available mode ASAP.
 

For instances using Cinder volumes:
One of our three Cinder servers will likely need to be rebooted. iSCSI connections from instances using this server may need to be remapped manually after the outage, so those instances will take longer to bring back online. This will affect approximately 1/3 of instances that use Cinder volumes.

A follow-up email will be sent around 17:45 CST today with an updated status of the outage.

We apologize for the short notice.

Questions can be routed to help+nebula@ncsa.illinois.edu or to the Nebula chat room at https://chat.ncsa.illinois.edu/channel/nebula.
