Watch this page in the wiki to subscribe to automatic updates to this status page. |
One set of Nebula Gluster storage is currently malfunctioning; all instances connected to that block are having issues. | |
---|---|
The issue with GlusterFS from earlier today has recurred. We currently have an outage of a single GlusterFS set that is causing about 66 instances to be paused at this time. |
Include the keyword "issue" in updates above to trigger actions.
Start | End | What is happening? | What will be affected? |
---|---|---|---|
Start | End | What happened? | What was affected? | Outcome |
---|---|---|---|---|
2017-03-08 19:41 | 2017-03-08 22:41 | XDP powered off that served four cabinets (c16-10, c17-10, c18-10, c19-10). | Scheduler paused; four racks power cycled. Moab required a restart; too many down nodes and iterations were stuck. | Scheduler paused three hours |
2017-03-03 1700 | 2017-03-03 2200 | BW HPSS emergency outage to clean up DB2 database | ncsa#nearline; stores failing with cache-full errors | Resolved cache-full errors |
2017-02-28 1200 | 2017-02-28 1250 | ICC Resource Manager down | Users could not submit or start new jobs | Removed corrupted job file |
2017-02-22 1615 | 2017-02-22 1815 | Nebula Gluster issues | All Nebula instances paused while Gluster was repaired | Nebula is available. |
2017-02-15 0800 | 2017-02-15 1800 | ICC scheduled PM | Batch jobs and login node access | |
2017-02-11 1900 | 2017-02-11 2359 | NPCF power hit | BW Lustre was down; XDP heat issues. | RTS 2017-02-11 2359 |