...
Start | What System/Service is affected | What is happening? | What will be affected? | Actions |
---|---|---|---|---|
2017-10-21 17:15 | LSST | Two public/protected network switches are down in racks N76 and O76 at NPCF | All verify-worker[25-48] and qserv-db[11-20] nodes cannot reach DNS, LDAP, etc., so they largely cannot communicate with other nodes, e.g., no communication between affected verify-worker nodes and the Slurm scheduler on lsst-dev01 (i.e., the Slurm cluster), and no communication between affected qserv-db nodes and the rest of the qserv nodes or the sui nodes. | In progress. Working to get qserv-db[11-20] connected to other nearby switches as a workaround; a replacement switch is already on order. UPDATE 2017-10-23 13:28: Borrowing two switches from L1 to put in place of the failed switches at NPCF. This will require all qserv, sui, and verify-worker nodes to go offline for a period of time while the switches are swapped out. No ETA at this time. |
Include the keyword "issue" in updates above to trigger actions.
...