Watch this page in the wiki to subscribe to automatic updates to this status page.
Current Status
All Systems running | |
---|---|
|
Include the keyword "issue" in updates above to trigger actions.
Report a problem
Upcoming Scheduled Maintenance
Start | End | What is happening? | What will be affected? |
---|---|---|---|
2017-03-10 13:00 | unknown | ICCP - We lost 10K controllers due to some type of power disturbance at ACB. | ICCP - Lost all filesystems; it is a cluster-wide outage. |
Previous Outages
Start | End | What happened? | What was affected? | Outcome |
---|---|---|---|---|
2017-03-09 09:00 | 2017-03-09 15:00 | ROGER planned PM | Batch, Hadoop, data transfer services & Ambari | System out for 6 hours; DT services out until 00:00 |
2017-03-08 19:41 | 2017-03-08 22:41 | The XDP serving four cabinets (c16-10, c17-10, c18-10, c19-10) powered off. | Scheduler paused; four racks power cycled. Moab required a restart because too many nodes were down and iterations were stuck. | Scheduler paused for three hours |
2017-03-03 17:00 | 2017-03-03 22:00 | BW HPSS emergency outage to clean up DB2 database | ncsa#nearline; stores were failing with cache full | Resolved cache full errors |
2017-02-28 12:00 | 2017-02-28 12:50 | ICC Resource Manager down | Users couldn't submit or start new jobs | Removed corrupted job file |
2017-02-22 16:15 | 2017-02-22 18:15 | Nebula Gluster issues | All Nebula instances paused while Gluster was repaired | Nebula is available. |
2017-02-11 19:00 | 2017-02-11 23:59 | NPCF power hit | BW Lustre was down; XDP heat issues. | RTS 2017-02-11 23:59 |
2017-02-15 08:00 | 2017-02-15 18:00 | ICC scheduled PM | Batch jobs and login node access | |