...

Start | End | What System/Service is affected | What is happening? | What will be affected? | Actions


2018-06-12 | | Nebula | A storage node crashed, possibly from the thunderstorms. | Instances may be slow while the filesystem heals. | Once the storage node is back online, the filesystem will heal itself.
2018-06-19 | 2018-06-19 17:00 | Nebula | Nebula is undergoing a complete reboot. Last week's storms damaged more than just the one node initially thought to be affected. | Nebula will be unavailable until 5pm. | Shutting down/rebooting all portions of the Nebula cluster
2018-05-03 14:30 | | iForge gpu queue | Both nodes in the general 'gpu' queue are offline due to issues with the GPUs | iForge 'gpu' queue cannot be used | Driver updates, ticket with vendor

...

nebula@ncsa.illinois.edu
Start | End | What System/Service is affected | What is happening? | What will be affected? | Contact Person
2018-06-19 10:00 | 2018-06-19 14:00 | All Nebula functions | The entire system needs to be rebooted. Last week's storms damaged more than just the one node initially thought to be affected. | All Nebula services |
2018-06-15 13:30 | 2018-06-15 15:30 | Blue Waters Nearline | Replacement of a tape robot transporter | This work is not expected to impact operations. The library system will continue to operate with a single transporter, but mount times may be somewhat longer until the second unit is returned to service. | hpssadmin@ncsa.illinois.edu
2018-06-07 06:30 | 2018-06-07 14:00 | Blue Waters | The boot node crashed, requiring the system to be rebooted. The file system and ESLogins remain up. | All running jobs were lost, and no new jobs were started until the system was returned to service. Torque was updated to version 6.1.2. | bw-admin@ncsa.illinois.edu
2018-06-19 08:00 | 2018-06-19 12:00 | LSST L1 Test Stand

Scheduled Maintenance:

  • BIOS firmware updates
  • Puppet and firewall changes (including support of SAL unicast/multicast traffic)
  • OS package updates (staying with CentOS 7.4)

Level One Test Stand, including:

  • lsst-daq
  • lsst-l1-*
lsst-sysadm@ncsa.illinois.edu
2018-06-21 08:00 | 2018-06-21 10:00 | LSST

Monthly maintenance (May):

  • pfSense firewall update
  • OS package updates/reboots for CentOS 6.9 servers (lsst-web, lsst-xfer, lsst-nagios)
  • Slurm update (lsst-dev01, lsst-verify-worker*)
  • iDRAC configuration updates on lsst-dev01 and ESXi hosts

CentOS 6.9 servers:

  • lsst-web
  • lsst-xfer
  • lsst-nagios

Slurm/verification cluster

No other impact is expected, but unforeseen issues could cause connectivity problems for other hosts or downtime for lsst-dev01 or hosted VMs.

lsst-sysadm@ncsa.illinois.edu

...