Watch this page in the wiki to subscribe to automatic updates to this status page.

Current Status

  • All Systems running

 

 

Include the keyword "issue" in updates above to trigger actions.

Report a problem

Upcoming Scheduled Maintenance

| Start | End | What is happening? | What will be affected? |
| 2017-03-10 13:00 | unknown | We lost 10K controllers due to some type of power disturbance. | All filesystems are lost; this is a cluster-wide outage. |

 

Previous Outages

| Start | End | What happened? | What was affected? | Outcome |
| 2017-03-08 19:41 | 2017-03-08 22:41 | The XDP serving four cabinets (c16-10, c17-10, c18-10, c19-10) powered off. Scheduler paused and the four racks power cycled. Moab required a restart: too many nodes were down and iterations were stuck. | Scheduler paused | three hours |
| 2017-03-03 17:00 | 2017-03-03 22:00 | BW HPSS emergency outage to clean up db2 database | ncsa#nearline; stores are failing with cache full | Resolved cache full errors |
| 2017-02-28 12:00 | 2017-02-28 12:50 | ICC Resource Manager down | Users can't submit or start new jobs | Removed corrupted job file |
| 2017-02-22 16:15 | 2017-02-22 18:15 | Nebula Gluster issues | All Nebula instances paused while Gluster was repaired | Nebula is available. |
| 2017-02-11 19:00 | 2017-02-11 23:59 | NPCF power hit | BW Lustre was down; XDP heat issues. | RTS 2017-02-11 23:59 |
| 2017-02-15 08:00 | 2017-02-15 18:00 | ICC Scheduled PM | Batch jobs and login node access | |