Watch this page in the wiki to subscribe to automatic updates to this status page.

Current Status

One block of Nebula Gluster storage is currently malfunctioning, and all instances connected to that block are paused.

 

We currently have an outage of a single block of GlusterFS that is causing load on 2 other nodes. This is causing about 66 instances to be paused at this time.

Update as of 2017-02-22 4:01 PM:
Emergency Nebula Outage - 16:15-18:15 CST Today

Currently, we have an outage of a single block of GlusterFS that is causing problems with 1/5 of Nebula’s GlusterFS nodes. As a result we have about 66 instances paused at this time.

To fix the issue, Nebula will have a COMPLETE emergency outage starting at 16:00 CST today so that the GlusterFS volume can be taken offline for what we hope will be a quick repair. During this outage:
- ALL instances will be paused
- ALL access to Horizon and the API will be disabled
- We do not expect any data loss for instances

We do not know how long the GlusterFS healing process will take but are hopeful it may be on the order of 30 minutes. If the healing process takes longer than 90 minutes, we will then consider alternate methods of recovery and try to get Nebula back to a partially available mode ASAP.
 

For instances using Cinder volumes
One of our three Cinder servers will need to be rebooted. iSCSI connections from instances using this server may need to be remapped manually after the outage, so those instances will take longer to bring back online. This will affect approximately 1/3 of instances that use Cinder volumes.

A follow-up email will be sent around 17:45 CST today with an updated status of the outage.

We apologize for the short notice.

Questions can be routed to nebula@ncsa.illinois.edu or to the Nebula chat room at https://chat.ncsa.illinois.edu/channel/nebula. 

Include the keyword "issue" in updates above to trigger actions.

Report a problem

Upcoming Scheduled Maintenance

Start | End | What is happening? | What will be affected?
    

 

Previous Outages

Start | End | What happened? | What was affected? | Outcome
2017-02-11 1900 | 2017-02-11 2359 | NPCF Power Hit | BW Lustre was down, xdp heat issues. | RTS 2017-02-11 2359
2017-02-15 0800 | 2017-02-15 1800 | ICC Scheduled PM | Batch jobs and login nodes access |