Watch this page in the wiki to subscribe to automatic updates to this status page.

Current Status

All Systems running

  

 

Include the keyword "issue" in updates above to trigger actions.

Report a problem

Upcoming Scheduled Maintenance

Start | End | What is happening? | What will be affected?
    

 

Previous Outages

Start | End | What happened? | What was affected? | Outcome
2017-03-10 13:00 | 2017-03-10 18:00 | ICCP - We lost 10K controllers due to some type of power disturbance at ACB. | ICCP - Lost all filesystems; it was a cluster-wide outage. | Recovered missing LUNs and rebooted the cluster. Cluster was back in service at 18:00.
2017-03-09 0900 | 2017-03-09 1500 | ROGER planned PM | Batch, Hadoop, data transfer services & Ambari | System out for 6 hrs; DT services out until 0000.
2017-03-08 19:41 | 2017-03-08 22:41 | The XDP that served four cabinets (c16-10, c17-10, c18-10, c19-10) powered off. | Scheduler paused, four racks power cycled. Moab required a restart because too many nodes were down and iterations were stuck. | Scheduler paused for three hours.
2017-03-03 1700 | 2017-03-03 2200 | BW HPSS emergency outage to clean up the db2 database | ncsa#nearline; stores were failing with cache full | Resolved cache full errors.
2017-02-28 1200 | 2017-02-28 1250 | ICC Resource Manager down | Users couldn't submit or start new jobs | Removed corrupted job file.
2017-02-22 1615 | 2017-02-22 1815 | Nebula Gluster issues | All Nebula instances paused while Gluster was repaired | Nebula is available.
2017-02-11 1900 | 2017-02-11 2359 | NPCF power hit | BW Lustre was down, XDP heat issues. | RTS 2017-02-11 2359
2017-02-15 0800 | 2017-02-15 1800 | ICC scheduled PM | Batch jobs and login node access |