
Watch this page in the wiki to subscribe to automatic updates to this status page.

Current Status

  • One set of Nebula Gluster storage is currently misbehaving; all instances connected to that block are having issues.

 

The GlusterFS issue from earlier today has recurred. A single set of GlusterFS is currently down, causing about 66 instances to be paused at this time.

Include the keyword "issue" in updates above to trigger actions.
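The keyword trigger above can be illustrated with a minimal sketch. This is an assumption about how such a trigger might work, not the wiki's actual automation; the function name `triggers_action` is hypothetical.

```python
def triggers_action(update_text: str, keyword: str = "issue") -> bool:
    """Return True if a status update contains the trigger keyword.

    Hypothetical sketch: the real trigger is wiki-side automation;
    this only illustrates a case-insensitive keyword match.
    """
    return keyword.lower() in update_text.lower()


if __name__ == "__main__":
    print(triggers_action("The issue with GlusterFS has recurred"))  # True
    print(triggers_action("All systems nominal"))                    # False
```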

Report a problem

Upcoming Scheduled Maintenance

Start | End | What is happening? | What will be affected?


Previous Outages

Start | End | What happened? | What was affected? | Outcome
2017-03-08 19:41 | 2017-03-08 22:41 | XDP that served four cabinets (c16-10, c17-10, c18-10, c19-10) powered off | Scheduler paused; four racks power cycled. Moab required a restart: too many down nodes, and iterations were stuck. | Scheduler paused three hours
2017-03-03 1700 | 2017-03-03 2200 | BW HPSS emergency outage to clean up DB2 database | ncsa#nearline; stores were failing with cache full | Resolved cache-full errors
2017-02-28 1200 | 2017-02-28 1250 | ICC Resource Manager down | Users couldn't submit or start new jobs | Removed corrupted job file
2017-02-22 1615 | 2017-02-22 1815 | Nebula Gluster issues | All Nebula instances paused while Gluster was repaired | Nebula is available
2017-02-11 1900 | 2017-02-11 2359 | NPCF power hit | BW Lustre was down; XDP heat issues | RTS 2017-02-11 2359
2017-02-15 0800 | 2017-02-15 1800 | ICC scheduled PM | Batch jobs and login node access |