Start | End | What System/Service was affected? | What happened? | What was affected? | Contact Person | Status
2020-01-28 17:00 | 2020-01-28 19:00 | DHCP Upgrade | OS updates | The DHCP server will be rebooted for all office and wireless networks. Connected clients will not be affected. Any new IP requests during the reboot will be delayed. This should not impact most users. | help+neteng@ncsa.illinois.edu | Completed
2020-01-28 17:00 | 2020-01-28 19:00 | Exit-East Router | OS update | Most traffic will be sent via our second router. Some specific projects may be affected. Neteng will talk to those projects directly. | help+neteng@ncsa.illinois.edu | Completed
2020-01-27 11:54 | 2020-01-28 09:36 | oa4mp.ncsa.illinois.edu | An automated CA certificate update caused authentication failures. | NCSA RSA authentication to Globus was unavailable. | help+idp@ncsa.illinois.edu | Temporary workaround in place; proper fix scheduled for 2020-01-29 14:00 (note: oa4mp.ncsa.illinois.edu is scheduled for retirement on 2020-04-01)
2020-01-21 08:15 | 2020-01-21 08:25 | ldap2 | ldap2 was returning LDAP queries inconsistently, so the service was restarted. | Login to certain services was unusually slow for some users, with Jira the most affected. | help+its@ncsa.illinois.edu | ldap2 queries are working as expected after the restart.
2020-01-16 17:30 | 2020-01-16 17:48 | Condo NFS service | NFS exports were failing path resolution | NFS file system client mounts | Chad Kerner | Servers rebooted, mounts restored
2020-01-15 08:00 | 2020-01-16 01:55 | ICCP | Quarterly Maintenance: Golub IB Core switch FW update; Golub 10G Core switch FW update; GPFS 5.0.4.1 update; moved Golub Rack8 to accommodate expansion | Total outage including export nodes (access to HTC will still be available) | iccp-admins@campuscluster.illinois.edu | Complete
2020-01-15 07:00 | 2020-01-15 12:00 | LSST NCSA Test Stand | Hardware repair in NCSA Test Stand | 21 servers in the NCSA Test Stand had their drive backplanes replaced by the vendor. | lsst-admin@ncsa.illinois.edu | Complete

2020-01-06 10:00 | 2020-01-08 14:30 | Code42 Crashplan Endpoints | The Code42 Crashplan servers started pushing out Code42 Crashplan client updates | All users of CrashPlan will have their clients upgraded. | help+its@ncsa.illinois.edu | Complete

2020-01-03 10:20 | 2020-01-03 11:20 | Code42 Crashplan was upgraded | Software updates to the CrashPlan Auth and Storage servers were applied | Backups were queued while the services restarted. | help+its@ncsa.illinois.edu | Complete
2020-01-02 11:30 | 2020-01-02 17:39 | NCSA ITS vSphere vCenter | vCenter was upgraded to the latest patch level. Due to some bugs, it took longer to apply than expected. | The VMware administrative interface was unavailable during the update. | help+its@ncsa.illinois.edu | Complete

2019-12-18 08:00 | 2019-12-18 10:00 | Facility infrastructure Electrical Transformer TX-5C | Replace defective temperature controller | "No Outage" Production projects on feeder | CMO Rantissi | Complete


2019-12-17 06:00 | 2019-12-17 10:25 | JIRA | JIRA Upgrade from 7.6 to 8.5 | All JIRA users | help+its@ncsa.illinois.edu | Complete

2019-12-12 08:00 | 2019-12-12 14:00 | LSST | Monthly Maintenance: OS updates and reboots; GPFS filesystem restructure | ALL LSST systems will be updated, including: lsst-dev01, lsst-xfer, etc.; Slurm verification cluster; PDAC/Kubernetes/LSP clusters; tus-ats01; L1 test stand | lsst-admin@ncsa.illinois.edu | Complete


2019-12-12 10:02 | 2019-12-12 10:08 | internal.ncsa.illinois.edu | System memory was exhausted and the OOM killer started killing HTTPS connections. | Savanna tools were unavailable | help+its@ncsa.illinois.edu | Memory resources for the server were doubled and service was brought back online.
2019-12-10 13:45 | 2019-12-10 16:55 | Internet2 Connectivity | Internet2 Engineers isolated the issue to a malformed route update coming from an external peer to one of its nodes in Ashburn, VA. As this update was propagated throughout the Internet2 Network, it triggered a bug on the Internet2 routers and caused all internal BGP sessions of each router to rapidly flap, thus causing instability across the footprint. Engineers mitigated the issue by placing a filter on the specific peer to reject the malformed packet. The Major Incident has been resolved at this point. | Many different external resources, data transfers, sessions, etc. to various destinations. | help+neteng@ncsa.illinois.edu | Connectivity has stabilized. Please report any issues should they arise.

2019-12-02 | 2019-12-02 afternoon | Wireless network | Tech Services reports that they are having authentication issues affecting Wi-Fi and VPN. Engineers are working on the problem. Tech Services Issue Description. | NCSAnet and IllinoisNet wireless are non-functional at the moment. The NCSA wired network remains available. IllinoisNet_guest is also functional. | help+neteng@ncsa.illinois.edu | Troubleshooting in progress
2019-11-14 18:00 | 2019-11-14 19:00 | Exit-West Router | Software Upgrades | This should not impact users. All traffic will re-route via the other router. | help+neteng@ncsa.illinois.edu | Complete

2019-11-14 5:00 AM | 2019-11-14 3:30 PM | Nearline Endpoint | Issue with one storage library | Some Globus transfers were stalled for the period of the outage | bw+storage@ncsa.illinois.edu | Complete

Nov 7 10:00 | Nov 7 14:00 | ICCP. All login nodes will be down. | Reroute some IB cables between Core switches and compute nodes. Changing topology on Subnet Manager. | Scheduler will be paused. No user access to login nodes. All running jobs will be killed. | help@campuscluster.illinois.edu | Complete

2019-11-05 07:00 | 2019-11-05 16:53 | iForge | Quarterly Maintenance | All systems will be unavailable during the maintenance | iforge-admin@ncsa.illinois.edu | Complete

2019-10-01 | 2019-11-01 | NCSA Windows Domain Controllers | ITS migrated all Windows systems to the Campus Domain. The existing NCSA Windows Domain has been decommissioned and shut down. | NCSA Windows Systems | help+its@ncsa.illinois.edu | Complete

2019-10-23 8 a.m. | 2019-10-23 12:00 p.m. | Core-West | Code upgrades will be performed on Core-West network switch. | This should not be user impacting. All traffic will flow through the redundant Core. | neteng+help@ncsa.illinois.edu | Complete

2019-10-22 06:12 | 2019-10-22 07:18 | Jira and Wiki | During reboots for system patches, the wiki and Jira got stuck in a state that was not providing data to users. | Only web access to these tools was impacted. | help+its@ncsa.illinois.edu | Complete

2019-10-16 08:00 | 2019-10-16 20:30 | ICC system wide | Quarterly maintenance | All services on ICC | help@campuscluster.illinois.edu | Complete

2019-10-16 8 a.m. | 2019-10-16 12:00 p.m. | Core-East | Code upgrades will be performed on Core-East network switch. | This should not be user impacting. All traffic will flow through the redundant Core. | neteng+help@ncsa.illinois.edu | Complete

2019-10-15 11:45 AM | 2019-10-15 11:56 AM | npcf-exit-east | BGP peering flapped over I2 AL2S circuit | Traffic got re-routed but some WAN services were impacted as reported by users. | help+neteng@ncsa.illinois.edu | Complete

2019-10-10 07:00 | 2019-10-10 07:30 | mysql.ncsa.illinois.edu | Some table repairs broke replication; this maintenance will update the replicas with newer databases so the service will work as expected again. | Wiki, JIRA, and some web sites will stop working. Email forwarding to user accounts at NCSA will be delayed during the outage. | lindsey@ncsa.illinois.edu | Complete

2019-10-01 | 2019-10-03 | NCSA-Print & Building Printers | Some printers are having issues connecting to the NCSA Print Server. After updating drivers on the print server, public printers are working as expected. | Printing | help+its@ncsa.illinois.edu | Complete

2019-10-03 6AM | 2019-10-03 7:45AM | Jira and Wiki | During reboots for system patches, the wiki and Jira got stuck in a state that was not providing data to users. | Only web access to these tools was impacted. | help+its@ncsa.illinois.edu | Complete

2019-10-01 7AM | 2019-10-01 8:30PM | Blue Waters | NGA workload scheduled testing | Scheduler testing for NGA workload | David King | Complete

2019-10-01 10AM | 2019-10-01 12:04PM | Blue Waters | EPO 4 racks lost xdp (cooling). CRAY warm swapped racks back into system successfully. | Scheduler, some computes missing and Gemini was rerouted | | Complete

2019-10-01 07:00 | 2019-10-01 07:30 | mysql.ncsa.illinois.edu | MySQL servers needed to be synchronized to convert the server in NPCF back to a replicated host. | Wiki, JIRA, and some web sites stopped working. Email forwarding to user accounts at NCSA was delayed during the outage. | lindsey@ncsa.illinois.edu | Complete

