Comment: added entry for ROGER's PM

...

| Start | End | What happened? | What was affected? | Outcome |
|---|---|---|---|---|
| 2017-03-09 0900 | 2017-03-09 1500 | ROGER planned PM | Batch, Hadoop, data transfer services & Ambari | System out for 6 hrs; DT services out until 0000 |
| 2017-03-08 19:41 | 2017-03-08 22:41 | XDP that served four cabinets (c16-10, c17-10, c18-10, c19-10) powered off. | Scheduler paused, four racks power cycled. Moab required a restart: too many nodes were down and iterations were stuck. | Scheduler paused three hours |
| 2017-03-03 1700 | 2017-03-03 2200 | BW HPSS emergency outage to clean up DB2 database | ncsa#nearline; stores were failing with cache full | Resolved cache full errors |
| 2017-02-28 1200 | 2017-02-28 1250 | ICC Resource Manager down | Users can't submit new jobs or start new jobs | Removed corrupted job file |
| 2017-02-22 1615 | 2017-02-22 1815 | Nebula Gluster issues | All Nebula instances paused while Gluster was repaired | Nebula is available. |
| 2017-02-11 1900 | 2017-02-11 2359 | NPCF power hit | BW Lustre was down, XDP heat issues. | RTS 2017-02-11 2359 |
| 2017-02-15 0800 | 2017-02-15 1800 | ICC scheduled PM | Batch jobs and login node access | |