Note

Watch this page in the wiki to subscribe to automatic updates to this status page.

Current Status

Active
Issue

Blue Waters Emergency Maintenance HPSS ncsa#Nearline, Mar 3 5PM - Mar 4 1AM.
A software patch is being applied to fix delete requests that cause "cache full" error messages during store requests.


One set of Nebula Gluster storage is currently misbehaving; all instances connected to that block are having issues.

 

The GlusterFS issue from earlier today has recurred. A single set of GlusterFS is currently out of service, causing about 66 instances to be paused at this time.

...

Start | End | What happened? | What was affected? | Outcome
2017-03-03 1700 | 2017-03-03 2200 | BW HPSS emergency outage to apply patch; clean up db2 database | ncsa#nearline, stores are failing with cache full | Resolved cache full errors
2017-02-28 1200 | 2017-02-28 1250 | ICC Resource Manager down | Users can't submit new jobs or start new jobs | Removed corrupted job file
2017-02-22 1615 | 2017-02-22 1815 | Nebula Gluster issues | All Nebula instances paused while Gluster was repaired | Nebula is available.
2017-02-11 1900 | 2017-02-11 2359 | NPCF power hit | BW Lustre was down, xdp heat issues. | RTS 2017-02-11 2359
2017-02-15 0800 | 2017-02-15 1800 | ICC Scheduled PM | Batch jobs and login nodes access |