Previous Outages or Maintenance

2022-03-17 0900
2022-04-12 1030
Jira
LDAP auths have been sporadically failing. This service is being monitored to determine a root cause.
Jira logins break
help+service@ncsa.illinois.edu

RESOLVED

2022-04-12 0900
2022-04-12 0930
vsphere.ncsa.illinois.edu
vCenter security updates are being installed.
VM management interface will be unavailable for 15 mins.
help@ncsa.illinois.edu

COMPLETE

2022-04-07 1900
2022-04-07 1950
NCSA VPN
Software Upgrades / SSL Certificate
The appliances hosting the NCSA VPN were patched and received an updated SSL certificate. Users will experience a brief disconnect as load is failed over between the appliances.
neteng@ncsa.illinois.edu

RESOLVED

2022-04-06 2200
2022-04-07 0000
Some office ports on the second floor.
One of the switches on the second floor is experiencing a software problem and is currently down. Code updates are being applied.
One of the six switches on the second floor is down. Users who are connected to this port might not receive link.
help+neteng@ncsa.illinois.edu

RESOLVED

2022-04-06 15302022-04-07 0630All systems which mount/utilize TaigaA bug involving the multirail functionality caused constant reboots with one of the metadata servers. This resulted in cluster de-stabilization and loss of function.All lustre/NFS mountpoints to Taiga, Globus to Taiga.help@ncsa.illinois.edu

RESOLVED

2022-04-04 0930
2022-04-04 1000
NCSA LDAP
Instantiation of Delta resource OU branch in the NCSA LDAP database with replication testing.
No impact to properly configured systems or searches is expected.
help@ncsa.illinois.edu

COMPLETE

2022-04-01 0600
2022-04-01 0700
NCSA GitLab
GitLab was updated to latest version
All GitLab services were unavailable for a few minutes.
help+service@ncsa.illinois.edu

COMPLETE

2022-03-23 1000
2022-03-23 1600
Email Lists
Email lists (lists.ncsa.illinois.edu) are not functioning

Ability to send to email lists.

Note: Bounced emails will need to be resent.

help+service@ncsa.illinois.edu

COMPLETE

2022-03-22
0730hrs
2022-03-22
0915hrs
ldap - NCSA primary server
OS updates and replication changes
NCSA LDAP primary server will be unavailable; replicas should remain accessible
Timothy Bouvet

COMPLETE

2022-03-21 0800
2022-03-21 0830
cilogon.org
Migrate CILogon Services to AWS
cilogon.org, demo.cilogon.org, crl.cilogon.org
help@cilogon.org

COMPLETE

2022-03-19 0100
2022-03-19 1500
Campus Cluster
Cooling units at ACB stopped functioning, and datacenter temperatures rose high enough that machines powered off. By the time ICI was informed, cooling had resumed at ACB. ICI then restored service.
All of Campus Cluster
help@campuscluster.illinois.edu

RESOLVED

2022-03-17 1100
2022-03-17 1123
ASD and ACHE vsphere clusters and ldap1 and ldap2
Certs on ldap1 and ldap2 were updated
Logins to ASD and ACHE vsphere were down for 23 minutes.
help@ncsa.illinois.edu

COMPLETE

2022-03-17
09:08

2022-03-17
10:01
Jira
Logins are slow or unsuccessful
Jira login

RESOLVED

2022-03-16 1700
2022-03-16 1800
DNS1
Hardware replacement on DNS1 server.
DNS lookups will be unavailable on the primary DNS server while the hardware is being swapped. DNS2 will remain up.
help+neteng@ncsa.illinois.edu

COMPLETE

2022-03-14 1800
2022-03-15 23:45
NCSA File & Print Servers
Scheduled Windows Server Maintenance
File & Print Shares were unavailable during maintenance. Users were not able to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable.
help+service@ncsa.illinois.edu

COMPLETE

2022-03-10 0700hrs
2022-03-10 1500hrs
Distribution panel DP-5C-1020. Power feed C to the north east corner power panels
De-energizing electrical distribution panel DP-5C-1020 to tie in power cables to Holl-I system

Known resources impacted:

Granite: already planned to be offline for maintenance

iForge: cluster offline for the duration

Radiant: cluster online, without power redundancy

help@ncsa.illinois.edu

COMPLETE


2022-03-09 07002022-03-09 0810linux.ncsa.illinois.edu
(aka public-linux)
Upgrade server to RHEL 8 and add NCSA Duo 2FA authentication
Server was unavailable during maintenance.
help+service@ncsa.illinois.edu

COMPLETE

2022-03-02
930
2022-03-07
1715
ICC

Emergency PM

We are seeing some network issues on the cluster. To resolve them, we need to upgrade code on our InfiniBand infrastructure.


UPDATE: We are currently experiencing unforeseen technical issues with the cluster. We are investigating and expect resolution and restoration of all Campus Cluster services by March 3rd 12PM

UPDATE2: We are still experiencing issues where the compute clients will not properly mount storage. We are engaged with vendor support and continue to work on the situation. Thank you for your patience. We have moved expected return to service to March 4th, 12PM

UPDATE3: Campus cluster is experiencing SLURM job failures in certain pods(sections) of the cluster. Investigations continue and there is a partial return to service with login nodes, storage, and data transfer services still operational. New full return of service date: Monday, March 7th, 12PM.

ICCP filesystem will be offline. Most projects will be impacted. Special arrangements have been made with some to be able to operate to some degree during the outage.
help@campuscluster.illinois.edu

COMPLETE

2022-03-02 1237

2022-03-02 1715

iforge (iforge.ncsa.illinois.edu)
GPFS issue with interruption of filesystem leading to scheduler pause
1 running job was aborted, and new jobs were paused during the interruption
help@ncsa.illinois.edu

COMPLETE

2022-03-02
0600
2022-03-02
0630
Jira

Adding RAM to improve performance

Jira will be unavailable during maintenance

COMPLETE

2022-03-01
1800
2022-03-01
1810
ldap2 server clients of
NCSA LDAP

on-line maintenance

Restarted rsyslog and LDAP after relocating /var/log. Clients should have redundant servers configured.
Timothy Bouvet

COMPLETE

2022-02-28
1800
2022-02-28
1830
ldap1 server clients of
NCSA LDAP

on-line maintenance

Had to restart rsyslog and LDAP after relocating /var/log

Slow response from ldap1, but clients should have redundant servers configured
Timothy Bouvet

COMPLETE

2022-02-28
0900
2022-02-28
1030
CMDB
V1.7.20220228 Release
CMDB database will be unavailable. ITSM's openDCIM will be down for a short period (~5 minutes) while the data is reloaded.

kimber7@illinois.edu

COMPLETE

2022-02-26 07302022-02-26 0750NCSA GitLabGitLab was updated to latest versionAll GitLab services were unavailablehelp+service@ncsa.illinois.edu

COMPLETE

2022-02-25 10:00
2022-02-25 13:00
Taiga - CenterWide FS
Full file system outage
All clients mounting Taiga

COMPLETE

2022-02-09 1400

2022-02-25 1030

Jira, Internal/Savannah, LDAP, POP, Hosted web servers, virtual classroom, vcenter

The NCSA VMWare cluster is experiencing storage performance issues.

-- Update: Adjustments have been made to storage used by the LDAP servers and other non-essential VM instances have been disabled. Testing is indicating that response times have improved and services are working normally again.

We are monitoring services. Please report any issues to help@ncsa.illinois.edu
Timothy Bouvet

RESOLVED FOR NOW

2022-02-24 10002022-02-24 1115

cerberus2.ncsa.illinois.edu, tg-kdc1.security.ncsa.illinois.edu, bwbh2.ncsa.illinois.edu

One of the IRST ESXi machines shut down unexpectedly.
The listed hosts are currently unavailable

COMPLETE

2022-02-23 1700
2022-02-23 1900
DNS2
DNS2 hardware will be replaced.
There will be a brief outage of DNS2 while IPs are migrated to the new server.
help+neteng@ncsa.illinois.edu

COMPLETE

2022-02-22 0825
2022-02-22 1324
Slack

Info from Slack (https://status.slack.com/)

We've resolved the issue, and all impacted customers should now be able to access Slack. You may need to reload Slack (Cmd/Ctrl + Shift + R) to see the fix on your end. If that doesn't work, try clearing cache (Help > Troubleshooting > Clear Cache and Restart from the app menu). Thanks for bearing with us and we apologize for the disruption to your work day!

Feb 22, 1:24 PM CST

We're seeing signs of improvement. Please try reloading Slack, and if not a cache reset. We’re still monitoring the situation. We’ll confirm once this issue is fully resolved.

Feb 22, 11:07 AM CST

Slack is not loading for some users. We are continuing to investigate the cause and will provide more information as soon as it's available.

Feb 22, 9:23 AM CST

We're still working towards a full resolution. We'll be back with another update soon. Thank you for your patience.

Feb 22, 8:44 AM CST

We’re investigating the issue where Slack is not loading for some users. We’re looking into the cause and will provide more information as soon as it's available.

Feb 22, 8:25 AM CST

Various issues accessing and using Slack
help@ncsa.illinois.edu

COMPLETE

2022-02-18 12:10PM

2022-02-18
2PM


Jira

Reboot to add ram/swap

This is to improve stability


Jira tickets unavailable
Timothy Bouvet

COMPLETE

2022-02-10 1030
2022-02-18 3:55pm
Ngale filesystem

The Lustre filesystem is not loading correctly. The support team has been contacted.

Still in progress. MDT0001 is partially recovered. Vendor is attempting to fully restore.

Near completion: Working with vendor on additional configuration changes. Hope to complete final validation and return to service by close of business 2022-02-18.

/ngale filesystem is not accessible.
Peter Hartman

COMPLETE

2022-02-14

1PM

2022-02-14

4:15PM

All NCSA LDAP servers
Expanding schema and restarting servers
Systems will reconnect to LDAP server after restart

COMPLETE

2022-02-09

1000

2022-02-09

1200

Facility UPS
UPS DC voltage calibration
UPS will be taken to maintenance bypass and all connected systems will be fed from unprotected power source (no power interruption).
rantissi@illinois.edu

COMPLETE

2022-02-09 0900
2022-02-09 0940
Line card failure in Core-East
Line card failure in Core-East, which is resulting in connectivity issues for some infrastructure in NCSA 3003.
DNS2 and LSST systems in 3003 were down until the uplinks could be migrated to a new port on Cores
help+neteng@ncsa.illinois.edu

COMPLETE

2022-02-01
8AM
2022-02-01
4PM
Jira/ldap-auth1
login issues
Jira Access

2022-02-09 0534
2022-02-09 0811

LDAP (and dependent services, incl. Jira)

vSphere/ICI VMware

Authorization timeouts/failures in dependent services.

ICI staff are investigating.

LDAP (and dependent services, incl. Jira)

vSphere/ICI VMware

The cause of the most severe issues was power fluctuations around 0555, but certain LDAP servers were degraded slightly earlier.


COMPLETE

2022-02-09 06002022-02-09 0645NCSA MySQL

MySQL database servers need to be synchronized to bring replicated database servers online.

NOTE: The MySQL database is back up, but users may experience issues due to an LDAP issue.

Wiki, JIRA, Savannah/Internal, Identity, and some web sites will stop working. More details are linked here.

help+service@ncsa.illinois.edu

COMPLETE

2022-02-08
7AM

2022-02-08

3:15PM

iforge / vforge / license servers
Regular Maintenance
iforge, vforge, license servers

COMPLETE

2022-02-08 10002022-02-08 1245CMDBV1.6.20220207 ReleaseCMDB database will be unavailable. ITSM's openDCIM will not be impacted.kimber7@illinois.edu

COMPLETE

2022-02-04 06002022-02-04 0640NCSA GitLabGitLab was updated to latest versionAll GitLab services were unavailablehelp+service@ncsa.illinois.edu

COMPLETE

2022-02-01 08002022-02-01 0900cilogon.orgUpdate to OA4MP v5.2.4Improvements in the back-end servicehelp@cilogon.org

COMPLETE

2022-01-252022-01-25Facility UPSReplace UPS batteriesAll systems with facility UPS feedrantissi@illinois.edu

COMPLETE

2022-01-24 1800
2022-01-24 20:00
NCSA File & Print Servers
Scheduled Windows Server Maintenance
File & Print Shares will be unavailable during maintenance. Users will not be able to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing will be unavailable.
help+service@ncsa.illinois.edu

COMPLETE

2022-01-24  04002022-01-24 0630Failed line card on neo-hpc-1 switch

Line card failure is affecting devices that are plugged into Neo-hpc-1 aggregation switch.  We've migrated links off the failed card, to other ports on the same switch.

No services are currently impacted.

help+neteng@ncsa.illinois.edu

IN PROGRESS

2022-01-19 08002022-01-19 2000ICCICC Quarterly MaintenanceAll ICC services

help@campuscluster.illinois.edu

COMPLETE

2022-01-18 08002022-01-18 0830cilogon.orgUpgrade MyProxy CA servers to CentOS 7Upgrade back-end MyProxy CA VMs from CentOS 6 to CentOS 7. No downtime is expected.help@cilogon.org

COMPLETE

2022-01-14 0600
2022-01-14 1715
Business IT database had bad data.
A database that NCSA mirrors from campus changed without notice, breaking our MIS system. Business IT isolated the issue and corrected the data.
Multiple complex systems have been affected by this data corruption issue.
help+service@ncsa.illinois.edu

RESOLVED

2022-01-14 08002022-01-14 1720NCSAnet wirelessNCSAnet Wireless was unavailable due to bad data in ldapUsers couldn't connect to the NCSAnet wireless networkhelp+neteng@ncsa.illinois.edu

RESOLVED

2022-01-05 11002022-01-05 1145CMDBVersion V1.5.20211223 releaseCMDB database will be unavailable for a few moments; openDCIM will be unavailable for a few moments.kimber7@illinois.edu

COMPLETE

2021-12-20 1830
2021-12-20 2030
Jira
Version Upgrade to address security issue
Jira will be unavailable
help+service@ncsa.illinois.edu

COMPLETE

2021-12-17 13002021-12-17 1340CMDBVersion V1.4.20211217 releaseCMDB database will be unavailable for a few moments; openDCIM will not  be affected.

kimber7@illinois.edu

COMPLETE

2021-12-17 0600
2021-12-17 0622
NCSA GitLab
The server was updated with some new Puppet configurations.
GitLab services were unavailable for a few minutes as the SSL certificate for the service was updated.
help+service@ncsa.illinois.edu

COMPLETE

2021-12-16 14002021-12-16 1430HTTP web proxy: httpproxy.ncsa.illinois.eduNCSA's general purpose HTTP web proxy server was rebuilt.HTTP web proxying through httpproxy was unavailable.help+service@ncsa.illinois.edu

COMPLETE

2021-12-10 07002021-12-10 1345iForgeInfiniBand switch maintenanceAll systems unavailableiforge-admin@lists.ncsa.illinois.edu

COMPLETE

2021-12-10 09002021-12-10 1000Bastion Hosts (Production group B)Patching out of cycleBastion Hosts (Production group B) were individually unavailable during reboothelp+security@ncsa.illinois.edu

COMPLETE

2021-12-09 09002021-12-09 0931Bastion Hosts (Production group A)Patching out of cycleBastion Hosts (Production group A) were individually unavailable during reboot

COMPLETE

2021-12-09 08002021-12-09 0900All IDDS servicesIDDS Postgres and Ruby on Rails upgradesAll IDDS servicestolbert@illinois.edu

COMPLETE

2021-12-09 06002021-12-09 0613NCSA GitLabGitLab was updated to latest versionAll GitLab services were unavailable for about 5 minuteshelp+service@ncsa.illinois.edu

COMPLETE

2021-12-07
1400
2021-12-07
1443
LSST

Kubernetes on NTS is not working properly after updates

Kubernetes on NCSA Test Standlsst-admin@ncsa.illinois.edu

RESOLVED

2021-12-07
0800
2021-12-07
1400
LSST

LSST Quarterly Maintenance

All LSST services hosted at NCSAlsst-admin@ncsa.illinois.edu

COMPLETE

2021-12-07

0930

2021-12-07

1030

ACHE Firewalls
Software maintenance
Firewalls will be upgraded using failover procedures - no traffic impact expected
James Eyrich - eyrich on slack

COMPLETE


2021-11-30 0900

2021-11-30

1100

TechServices connectivity at NPCF (wireless, facilities, IRIS, Prox scanners).
Tech Services will be replacing three network devices at NPCF, which will impact a variety of services there. Along with sporadic wireless outages, some facilities networks (such as IRIS and card readers) will be offline while equipment is replaced. The main router replacement should only take 5 mins or so. The wireless switches will take 15-20 mins each.
help+neteng@ncsa.illinois.edu

COMPLETE

2021-11-30 18002021-12-01 00:15NCSA File & Print ServersScheduled Windows Server MaintenanceFile & Print Shares were unavailable during maintenance.  Users were not able to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable.help+service@ncsa.illinois.edu

COMPLETE

2021-11-19 12:52
2021-11-19 13:22
lsst-esx08
Server crashed

The following VMs rebooted:

ldap-lsst-ncsa3
lsst-condordev-cm01
lsst-condordev-sub01
lsst-git
lsst-influxdb-0
lsst-kubh02
lsst-kubh05
lsst-kubh08
lsst-login03
lsst-logintest01
lsst-ora-dbm01
lsst-pup-npcf
lsst-ss-cfg02
lsst-telegraf-0

lsst-admin@ncsa.illinois.edu

RECOVERED

2021-11-18 1400
2021-11-18 1750
ICI Metrics & Alerts
Migration to RHEL 8, ASD Puppet control, & CILogon authentication
The viewing of ICI dashboards and the firing of ICI alerts were unavailable during this migration
malone12@illinois.edu, bglick@illinois.edu

COMPLETE

2021-11-11 0925
2021-11-11 0940
NCSA website
Communications launched the newly redesigned NCSA site.
During launch, you may experience some downtime while NCSA’s technical team re-points the URL to the new site.
communications@lists.ncsa.illinois.edu

COMPLETE

2021-11-09 07002021-11-09 1545iForgeQuarterly MaintenanceAll systems unavailableiforge-admin@lists.ncsa.illinois.edu

COMPLETE

2021-11-03

0000

2021-11-04
Netdot SSL Certificate
The SSL certificate for Netdot expired and network engineering replaced it with a new one.
SSL certificate expired. Service remained available throughout the period
help+neteng@ncsa.illinois.edu
COMPLETE


2021-11-03

1100

2021-11-03

1400

ESnet 100G link migration.
ESnet engineers will be migrating NCSA's 100G link to the new ESnet6 infrastructure.
The link will be down during the migration. Traffic will fall back to alternative paths.
help+neteng@ncsa.illinois.edu

COMPLETE

2021-11-03

1100

2021-11-03

1120

NCSA GitLab
GitLab was updated to latest version.
All GitLab services were unavailable
help+service@ncsa.illinois.edu

COMPLETE

2021-11-03 10002021-11-03 1020Core Router Linecard ReplacementNeteng replaced a linecard in one of the core routersAll connections to this linecard are redundant and no outage has been reported.neteng@ncsa.illinois.edu

COMPLETE

2021-11-02 15:20
2021-11-02 16:37
Production version of DCIM for CMDB (https://ncsa-cmdb.ncsa.illinois.edu)
Invalid certificate issue (Fixed)
The production version of CMDB will be unavailable until new certificate is received and applied. 

In the interim, the test server (https://ncsa-cmdb-test.ncsa.illinois.edu) has been made available for use, with all current data.
Kimber Blum (kimber7@illinois.edu)

COMPLETE

2021-11-02 08002021-11-02 0900cilogon.orgUpdate to OA4MP v5.2.3Address several small issues in the back-end servicehelp@cilogon.org

COMPLETE

0600

0710

Jira
Jira Upgrade
Jira
help+service@illinois.edu

COMPLETE

2021-10-25 18002021-10-26 0018NCSA File & Print ServersScheduled Windows Server MaintenanceFile & Print Shares will be unavailable during maintenance.  Users will not be able to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing will be unavailable.help+service@ncsa.illinois.edu

COMPLETE

2021-10-20 08002021-10-20 1800ICCP

ICCP Quarterly Maintenance

  • VLAN Change for IPMI network
  • OS update
ICCP Cluster nodes only
help@campuscluster.illinois.edu

COMPLETE

2021-10-20 07002021-10-20 0715IDDSIDDS maintenance (puppet changes)All IDDS servicesidds-admin@ncsa.illinois.edu

COMPLETE

2021-10-15 12302021-10-15 0713NCSA GitLabServer ran out of disk spaceAll GitLab services were unavailablehelp+service@ncsa.illinois.edu

RESOLVED

2021-10-11 08002021-10-11 1900Nightingale, ACHEPlanned maintenance on the Nightingale cluster and the ache-dist switchThere was an outage for the following services during the maintenance:
  • ALL Nightingale hosts/services
  • ALL firewalled traffic in/out of ACHE, which includes admin access & monitoring in/out of ALL of ACHE (this portion was complete by 1140)
    • network access to ALL of the ache-esxi-hosted VMs, including ache- and ngale-bastion hosts
    • ACHE FW IPMI interfaces
help+service@ncsa.illinois.edu

COMPLETE

2021-10-04 1000
2021-10-04 1005
www.ncsa.illinois.edu per-user web directories
Per-user web directories on the main NCSA website are being redirected to a new website dedicated to per-user web directories.
URLs like www.ncsa.illinois.edu/People/* are redirected to their new home at https://users.ncsa.illinois.edu/*.
help+service@ncsa.illinois.edu

COMPLETE

2021-09-30
0800
2021-09-30
1200
LSST

LSST Quarterly Maintenance

  • OS updates
  • K8S updates
All LSST services hosted at NCSA
lsst-admin@ncsa.illinois.edu

COMPLETE

2021-09-29 08002021-09-29 0900cilogon.orgUpdate to OA4MP v5.2.2Update Java database libraries, and address several small issueshelp@cilogon.org

COMPLETE

2021-09-29 08002021-09-29 0813CMDB / openDCIMInstalling/upgrading to CMDB release Sep2021The openDCIM front end of CMDB will be down for 15-30 minutes

COMPLETE

2021-09-28 0700
2021-09-28 1554
NPCF work on facility power
De-energizing power to transformer TX-4C-1020, pulling and terminating busduct cabling from transformer to room 2020.
One third of Sonexion racks will lose source 1 power (Feed C) and will continue to operate on source 2, degrading reliability by losing power redundancy.

COMPLETE

2021-09-28 07002021-09-28 0900Blue WatersA rack of scratch lost power during the power outage.Scratch was partially unavailable due to TOR power resiliency issue.

COMPLETE

2021-09-28 08002021-09-28 0900idp.ncsa.illinois.eduAssert eduPersonAssurance Cappuccino profile for NCSA StaffNCSA Staff logging in with the NCSA Identity Provider will be able to get Silver CA certificates from cilogon.orghelp+idp@ncsa.illinois.edu

COMPLETE

2021-09-21-14:502021-09-21-15:02vcenter appliance controlling ASD vspherevcenter appliance was upgradedvsphere.ncsa.illinois.edu was off-line for 12 minutes.help+service@ncsa.illinois.edu

COMPLETE

2021-09-21 0700
2021-09-20 1115
Blue Waters
Power work caused non-redundant switches and misconfigured servers to shut off
Blue Waters Compute, Login and Scheduler
bw-admin@ncsa.illinois.edu

COMPLETE

2021-09-20 1800

2021-09-20 2130

NCSA File & Print Servers
Scheduled Windows Server Maintenance
File & Print Shares were unavailable during maintenance. Users were not able to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable.
help+service@ncsa.illinois.edu

COMPLETE

2021-09-14 00002021-09-14 0600Internet2 WAN circuitInternet2 will be migrating our WAN circuit to new hardware. Traffic over that path will reroute while the change happens.  We anticipate the migration to take less than 30 mins.help+neteng@ncsa.illinois.edu

COMPLETE

 0600

 0900

Wiki
Upgrade to next version
Wiki will be unavailable

help+service@ncsa.illinois.edu

COMPLETE

2021-09-09 06002021-09-09 0700NCSA VPNSoftware UpgradesThe appliances hosting the NCSA VPN will be patched. Users will experience a brief disconnect as load is failed over between the appliances.help+neteng@ncsa.illinois.edu

COMPLETE

2021-09-08 13002021-09-08 1400Group prod_b Bastion hostsOut of cycle patchingBastion hosts in group prod_b will be patched and rebooted. (see MOTD for group assignment)help+security@ncsa.illinois.edu

COMPLETE

2021-09-08 09002021-09-08 1000Group prod_a Bastion hostsOut of cycle patchingBastion hosts in group prod_a will be patched and rebooted. (see MOTD for group assignment)help+security@ncsa.illinois.edu

COMPLETE

2021-09-02 9:30 AM2021-09-02 1PMPDU in rack AA81We are replacing a PDU in NPCF rack AA81All systems in the rack have redundant power connections.  No service outages are expected from this workhelp+service@ncsa.illinois.edu

COMPLETE

2021-09-01 07002021-09-01 0800cilogon.orgUpdate to OA4MP v5.2.1Device Authorization Grant Flow transactions will be stored in database rather than in memoryhelp@cilogon.org

COMPLETE

 1200

 1205

Wiki
Security patch is being applied
Wiki will be down
help+service@ncsa.illinois.edu

COMPLETE

2021-08-25 9:00am
2021-08-25 6:45pm
Blue Waters
System reboot due to blade fallout coinciding with HSN reroute and SMW not recovering.
All jobs interrupted
jenos@illinois.edu

COMPLETE

2021-08-19 05382021-08-19 0700IRST systems hosted on IRST Node 2Storage controller failure, all VMs taken offlinesome prod_b systems, and non-redundant services.eyrich@illinois.edu

RESOLVED

2021-08-19 5:342021-08-19 6:20cilogon.orgStorage controller failure in IRST VM farmcilogon.org was unreachable until we initiated fail-over to our backup servers at NICS.help@cilogon.org

COMPLETE

2021-08-18 11362021-08-18 1156NCSA WikiTest instance caused interference.NCSA Wikihelp+service@ncsa.illinois.edu

COMPLETE

2021-08-17 05002021-08-17 0700NCSA/NPCF Wide Area NetworkBetween 5:00AM and 7:00 AM CDT on 08/17/2021, Campus ICCN Engineers will be upgrading firmware on the ICCN router 710rtr at the Starlight facility in Chicago.Our peerings with MREN and OmniPoP will go down. All traffic destined for those peerings will reroute via other peerings, so no production impact is expected.help+neteng@ncsa.illinois.edu

COMPLETE

2021-08-16 1800
2021-08-17 0000
NCSA File & Print Servers
Scheduled Windows Server Maintenance
File & Print Shares will be unavailable during maintenance. Users will not be able to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing will be unavailable.
help+service@ncsa.illinois.edu

COMPLETE

2021-08-12 9:542021-08-12 1012JiraAttempted snapshot of Jira in vSphere was too intensive for the systemJirahelp+service@illinois.edu

COMPLETE

2021-08-10
2000
2021-08-11
0000
Radiant API and Web access

Radiant cluster name change.
During this time access to the API endpoints and the Horizon web dashboard will be intermittently unavailable. Instances will continue to run and be available over the network with no interruptions.

radiant-admin@ncsa.illinois.edu

COMPLETE

2021-08-10 07:002021-08-10 17:10iForgeQuarterly MaintenanceAll systems unavailableiforge-admin@lists.ncsa.illinois.edu

COMPLETE

2021-08-09 1421
2021-08-09 1440
NCSA Wiki
DB configuration conflict between Wiki and Wiki-Test
NCSA Wiki was inaccessible
help+service@ncsa.illinois.edu

COMPLETE

2021-08-05 10002021-08-05 1030NPCF Core Router - Linecard RebootA problem was identified on one of the line cards in our core router requiring a reboot of the linecard. The linecard was successfully rebooted and we will continue monitoring the hardware for further issues.All connections to this linecard are redundant and there was no impact to users.neteng@ncsa.illinois.edu

COMPLETE

2021-08-05
0800
2021-08-05
1000
LSST

LSST Emergency OS Patching

LSST services hosted at NCSA except:

  • NTS will remain up (has already been patched)
lsst-admin@ncsa.illinois.edu

COMPLETE

2021-08-04
0800
2021-08-04
1700
Radiant API and Web access

Installation of new Radiant cluster

Cluster name changes are starting at 1100; This will make the horizon dashboard unreachable.
During this time access to the API endpoints and the Horizon web dashboard will be intermittently unavailable.  Instances will continue to run and be available over the network with no interruptions.

radiant-admin@ncsa.illinois.edu

COMPLETED

2021-08-04 07002021-08-04 0800cilogon.orgUpdate to OA4MP v5.2.0Added support for Device Authorization Grant Flow (RFC 8628)help@cilogon.org

COMPLETED

2021-08-03
0800
2021-08-03
1700
Radiant API and Web access

Installation of new Radiant cluster


During this time access to the API endpoints and the Horizon web dashboard will be intermittently unavailable.  Instances will continue to run and be available over the network with no interruptions.

radiant-admin@ncsa.illinois.edu

COMPLETED

2021-08-03 9:00 am2021-08-03 11:30 amRadiant ClusterA change was made to the firewall that unintentionally restricted access for instances and other internal cluster communication.Access to instances and workloadradiant-admin@ncsa.illinois.edu

RESOLVED

2021-07-31 0600
2021-07-31 0630
CILogon hosted services
Infrastructure maintenance
During this time each service hosted by CILogon, including COmanage Registry, LDAP, Grouper, SAML proxy, and MDQ, will become unavailable for a short time. Each individual service outage will last less than 5 minutes. Services that will not be impacted include:
  • OIDC clients that do not query LDAP for resolving attributes
  • X.509 certificate issuance and certificate revocation lists
  • LIGO and GW-Astronomy services
help@cilogon.org

COMPLETE

2021-07-29 13002021-07-29 1400IRST-run bastion hosts (pool B)Security patchingHosts managed by IRST will be patched and rebooted. Only hosts in pool B will be patched at this timehelp+security@ncsa.illinois.edu

COMPLETE

2021-07-29 09002021-07-29 1000IRST-run bastion hosts (pool A)Security patchingHosts managed by IRST will be patched and rebooted. Only hosts in pool A will be patched at this timehelp+security@ncsa.illinois.edu

COMPLETE

2021-07-28 10002021-07-28 1050LSSTOS Updates on only NCSA Test Stand (NTS)Only the LSST NCSA Test Stand (NTS) services hosted at NCSAlsst-admin@ncsa.illinois.edu

COMPLETE

2021-07-27 06002021-07-27 0900JiraUpgradeJira will be unavailable

help+service@ncsa.illinois.edu

COMPLETE

2021-07-26 18002021-07-27 0000NCSA File & Print ServersScheduled Windows Server MaintenanceFile & Print Shares were unavailable during maintenance.  Users were not able to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable.help+service@ncsa.illinois.edu 

COMPLETE

2021-07-21
0800
2021-07-21
2900
ICCP

ICCP Quarterly Maintenance

  • TBD
All ICCP services

help@campuscluster.illinois.edu


COMPLETE

2021-07-21 15:24
2021-07-21 21:50
ASD vSphere cluster in 3003
One of the 4 hypervisors in the cluster panicked. Unscheduled preventative maintenance is being performed on it and the other 3 nodes in the cluster.
After the initial outage at 15:24, there should be no additional outages.
help+service@ncsa.illinois.edu

COMPLETE

2021-07-13 07002021-07-13 0800cilogon.orgUpdate to OA4MP v5.1.4.The OAuth2/OIDC backend of the CILogon Service will be updated to OA4MP v5.1.4.help@cilogon.org

COMPLETE

2021-07-08 0800
2021-07-08 1000
OpenAFS
The remaining OpenAFS database servers were upgraded.
No service impacts were seen
help+service@ncsa.illinois.edu

COMPLETE

2021-07-07 06002021-07-07 0800CILogon AWS Hosted ServicesUpgrading AWS RDS Aurora MySQL v5.6 to v5.7COmanage Registry and Grouper services hosted by CILogon will be unavailablehelp@cilogon.org

COMPLETE

2021-07-01

2140

2021-07-01

1430

Horizon dashboard access was down for the entire period. Cluster networking was down from 1200 to 1430.Investigations into Horizon dashboard accessibility issues resulted in the application of an incorrect default network gateway for the cluster around noon. This was corrected and networking functionality restored around 1400. Instances began recovering soon thereafter.Radiant admins believe running instances have recovered on their own, but we advise everyone to check their systems and report any issues they see to the help desk.
help@ncsa.illinois.edu

RESOLVED

2021-07-01

0247

2021-07-01

1300

Various systems in NPCF, ACB, NCSA

There was a power event in the Champaign-Urbana area at around 2:47AM today. Details about the cause are currently unknown.  This event caused disruptions to systems at the NCSA building, NPCF and ACB. Known issues have generally been resolved but there may be unidentified issues lingering. If you encounter any problems, please notify NCSA help desk staff (help@ncsa.illinois.edu).

Multiple systems/services were impacted. All have been recovered and return to normal operations is complete.NCSA help desk

RESOLVED


2021-06-29 22:00

2021-06-29 23:59

NCSA 4th Floor Office networkRebooting one or more of the office switches on the NCSA Building 4th floor to resolve a phone issue.Office port connectivity will be intermittent during the maintenance window.

Matt Kollross

help+neteng@ncsa.illinois.edu

RESOLVED

2021-06-24
0800
2021-06-24
1345
LSST
  • Updates are being applied on Prod/Stable k8s, rebuild of some ingress nodes
Prod/Stable K8Slsst-admin@ncsa.illinois.edu

RESOLVED

2021-06-24
0800
2021-06-24
1200
LSST

LSST Quarterly Maintenance

  • OS updates on all servers

All LSST services hosted at NCSA

EXCEPT Prod/Stable K8S

lsst-admin@ncsa.illinois.edu

COMPLETE

2021-06-22 0000

2021-06-22 0400

Internet2 WAN linkInternet2 will be migrating NCSA's physical port to their new next generation infrastructure.During the maintenance, our I2 connection will be down. Traffic will reroute to other connections. Some point-to-point connections may be unavailable for a period of time. The maintenance window is not expected to take all 4 hours.

Matt Kollross

help+neteng@ncsa.illinois.edu

COMPLETE

2021-06-21 18002021-06-22 0000NCSA File & Print ServersScheduled Windows Server MaintenanceFile & Print Shares were unavailable during maintenance.  Users were not able to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable.help+service@ncsa.illinois.edu 

COMPLETE

2021-06-17 07002021-06-17 0820OpenAFSThe OpenAFS database server kaskaskia was upgraded.No service outages were observed or reported.help+service@ncsa.illinois.edu

COMPLETE

2021-06-12 22002021-06-15 1500LSST FirewallThe NPCF secondary firewall was offline due to a hard drive failure.No impact occurred to production services as the primary firewall stayed online.

RESOLVED

2021-06-14 17002021-06-15 0958NCSA GitLabAn attempt to fix an authentication bug for a particular user accidentally broke all authentication through the web interface.Authentication through the web interface did not work.help+service@ncsa.illinois.edu

RESOLVED

2021-06-112021-06-11 0905NCSA JiraJira email problemJira is not accepting issues via email; you can still create issues directly via the Jira GUI.

RESOLVED

2021-06-10 07002021-06-10 0800cilogon.orgUpdate to OA4MP v5.1.3.The OAuth2/OIDC backend of the CILogon Service will be updated to OA4MP v5.1.3.help@cilogon.org

COMPLETE

 1000

 1030

Jira.ncsa.illinois.eduConfiguration change to address a vulnerabilityThere should not be any service interruption, but as with all things, it is possiblehelp+service@ncsa.illinois.edu

RESOLVED

2021-06-022021-06-02NetdotNetdot web access now requires 2FA via SSL VPN, or Cerberus proxy. Security requested that Netdot require 2FA, in order to access the web interface.  To accommodate that request, the Netdot firewall has limited web access to the VPN subnet or via proxy from the Cerberus jump hosts. 

Matt Kollross

help+neteng@ncsa.illinois.edu

RESOLVED

2021-05-252021-05-26vcenters for ache and ASDEmergency security updates were applied.The administrative interface was offline for about 20 minutes while the updates were installed.help+service@ncsa.illinois.edu

RESOLVED

2021-05-26

1000

2021-05-26

1030

VoIP phones at NPCFMigrating the VoIP networks to a campus IP to enable future migrations by tech services.After the networks are migrated, a reboot of all phones at the NPCF building will be performed.

Matt Kollross

neteng+help@ncsa.illinois.edu

RESOLVED

2021-05-21

1800

2021-05-21

1900

VoIP phones at the NCSA buildingMigrating the VoIP networks to a campus IP to enable future migrations by tech services.After the networks are migrated, a reboot of all phones at the NCSA building will be performed.

Matt Kollross

neteng+help@ncsa.illinois.edu

RESOLVED

2021-05-20 05:402021-05-20 08:45LSST

ESXi host outage causing degradation of select services.


Degradation of select services:

  • data backbone gateway (lsst-dbb-gw01 down)
  • HTCondor (Central Manager nodes down for Prod & DAC)
  • login (lsst-login01 is down)

Also loss of redundancy for some underlying services, including auth/access & k8s head nodes.

lsst-admin@ncsa.illinois.edu

RESOLVED


2021-05-15
0600
2021-05-15
0800
CILogon hosted services including COmanage Registry, LDAP, SAML proxy, SAML AA, MDQMaintenanceAll CILogon hosted services were temporarily unavailable.help@cilogon.org

COMPLETE

2021-05-12 07:00

2021-05-12 08:00

internal.ncsa.illinois.edu

NCSA Internal Web Server Upgrade
(aka Savannah or MIS Tools)
Updates were made that affected the availability of the NCSA internal website and Savannah system. The system was unavailable during this time.

help+service@ncsa.illinois.edu

COMPLETE

2021-05-11

07:00

2021-05-11

19:00

iForgeQuarterly MaintenanceAll systems unavailable

iforge-admin@lists.ncsa.illinois.edu

COMPLETE

2021-05-06 09002021-05-06 0945WAN Link MigrationNCSA Neteng migrated the WAN link to Internet 2 to new hardware.

Traffic was automatically re-routed to redundant paths during the link outage. Any connections relying on layer-2 connections over AL2S saw a brief blip as the connection was cut over. Affected parties were contacted in advance.

help+neteng@ncsa.illinois.edu

COMPLETE

2021-05-03
0600
2021-05-03
0630
CILogon Multi-tenant COmanage RegistryUpgrade to version 3.3.2The service at https://registry.cilogon.org  was unavailablehelp@cilogon.org

COMPLETE

2021-04-29 16002021-04-29 1700
  • HTCondor Prod
  • HTCondor DAC
Add new nodes into Condor service pools
  • HTCondor Prod
  • HTCondor DAC
lsst-admin@ncsa.illinois.edu

COMPLETE

2021-04-21 08:002021-04-21 20:00ICCPICCP Quarterly MaintenanceThe scheduler will be down. All compute nodes will be converted to RHEL 7.9 with Red Hat IB.

iccp-admins@campuscluster.illinois.edu

COMPLETE

2021-04-15 16002021-04-15 1700NCSA OpensourceUpgrade of OS on all machines related to opensourcejira, wiki, git etc hosted at https://opensource.ncsa.illinois.edu/kooper@illinois.edu

COMPLETE

2021-04-15

12:25

2021-04-15

14:45

ICI vmware

Several hosts on the vmware service were experiencing timeouts

  • bluewaters
  • bluewaters-test
  • internal
  • its-nagios
  • ldap1
  • vcenter
no or intermittent connectivity to these hostshelp+service@ncsa.illinois.edu

RESOLVED

Root cause is still being investigated.

2021-04-15
0900
2021-04-15
0942
CMDBApplying new certificates and restarting servicesCMDB, including web interface, will be down briefly during the update.ncsagroup+org_itsm@ncsa.illinois.edu

RESOLVED

2021-04-15 09002021-04-15 0920WAN Link MigrationNCSA Neteng migrated the WAN link to ESnet to new hardware.Traffic was automatically re-routed to redundant paths during the link outage.help+neteng@ncsa.illinois.edu

RESOLVED

2021-04-14 15:002021-04-14 15:00git.ncsa.illinois.eduUsers can no longer access repositories from git clients over HTTPS using their NCSA password.

NCSA passwords cannot be used to access repositories with Git clients. Instead, use SSH keys over SSH or personal access tokens over HTTPS.

We thought this went into effect during git changes on Nov 2, 2020 but discovered it was still working until we made changes to GitLab to fully remove LDAP functionality.

help+service@ncsa.illinois.edu

COMPLETE

2021-04-13 14152021-04-13 1845git.ncsa.illinois.eduThe GitLab website at git.ncsa.illinois.edu was having issues with authentication. The LDAP server that it uses was timing out.
  • Login to the Git web interface was timing out.
  • Access from git clients continued to work during the outage.
help+service@ncsa.illinois.edu

RESOLVED

2021-04-13 0800

2021-04-13 0830

cilogon.orgUpdate to OA4MP v5.1.1.The OAuth2/OIDC backend of the CILogon Service will be updated to OA4MP v5.1.1.help@cilogon.org

COMPLETE

2021-04-12 18002021-04-12 2245File & Print ServersMonthly Windows File & Print Server MaintenanceWindows File Shares such as HR, Business Office, Home, etc. and printing in the NCSA & NPCF buildings were unavailable.help+service@ncsa.illinois.edu 

COMPLETE

2021-04-10
0600
2021-04-10
0800
CILogon hosted COmanage, Grouper, SATOSA, LDAPOn Saturday, April 10, the CILogon team will perform maintenance on the infrastructure used for hosted services.As part of the maintenance all COmanage Registry, LDAP, Grouper, SAML proxy, SAML attribute authority, and MDQ services hosted by CILogon may experience brief outages. We do not expect that any specific service outage will last for more than a minute.help@cilogon.org

COMPLETE

2021-04-08 09002021-04-08 1045WAN Link MigrationNCSA Neteng migrated the WAN link to ICCN Node-1 to new hardware.Traffic was automatically re-routed to redundant paths during the link outage. Issues were noticed by users during the outage and are currently being investigated in cooperation with our upstream provider.help+neteng@ncsa.illinois.edu

COMPLETE

2021-04-08 07302021-04-08 0734NCSA WikiNCSA's Wiki service was restartedNCSA's Wiki service was restarted to apply a new SSL certificate and renewed Confluence license. The wiki was not available for 4 minutes while it reloaded.help+service@ncsa.illinois.edu 

COMPLETE

2021-04-07 1610

2021-04-07 1733Internal Savannah/MIS websiteThe Savannah/MIS website would not load due to a corrupted MySQL database table referenced across all of the Savannah tools.Internal/Savannahhelp+service@ncsa.illinois.edu

RESOLVED

1st report 7:30am Monday8:19am MondayNCSA LDAP2ldap2 is not responsive to authentication requestsNCSA Jira, and any systems using LDAP2 as their only source.help+service@ncsa.illinois.edu

RESOLVED

2021-03-30

0800

2021-03-30

0845

DNS1A software issue was causing BIND to fail. DNS was not able to resolve during this period. DNS2 remained operational. neteng+help@ncsa.illinois.edu

RESOLVED

2021-03-23

2000

2021-03-23

2025

NCSA VPNThe standby VPN hardware was replaced and transitioned into the current VPN cluster. Failover went as expected and firmware was upgraded on the primary after load was shifted to the new standby VPN.Failover between the appliances occurred without issue and there was no impact to users.neteng@ncsa.illinois.edu

RESOLVED

2021-03-18 12301255JiraSome functionality will be limited due to user limit being reachedJirahelp+service@ncsa.illinois.edu

RESOLVED

~16:4017:58AnyConnect VPN Service

An SSL problem on the VPN service has disconnected all users. Network engineering is looking into the issue.


Because of a hardware failure and the VPN not failing over properly to the standby, users were unable to connect to the VPN. This was due to an issue with syncing certificates.

During the outage, expect that you won't be able to connect/maintain a connection to the VPNhelp+neteng@ncsa.illinois.edu

RESOLVED

2021-03-16 09502021-03-16 1000CMDBWill be applying updates per security vettingCMDB, including web interface, will be down briefly during the update.ncsagroup+org_itsm@ncsa.illinois.edu

RESOLVED

2021-03-11
0900

2021-03-11
0930

WAN Link MigrationNCSA Neteng migrated the link to ICCN to new hardware.Traffic was automatically re-routed to redundant paths during the link outage.help+neteng@ncsa.illinois.edu

RESOLVED

2021-03-04
0900

2021-03-04
0905

WAN Link MigrationNCSA Neteng migrated the 100G link to MREN to new hardware.Traffic was automatically re-routed to redundant paths during the link outage.help+neteng@ncsa.illinois.edu

RESOLVED

2021-03-01 22:112021-03-01 22:47NCSA vSphereAbout 40 VMs lost connection to their NFS storage.Several VM-based services were timing out during the issue, including: vSphere management, a Kerberos replica, an LDAP replica, httpproxy, license servers, NCSA fileserver, Identity message queuing, monitoring. That triggered some of those VMs to switch to read-only disks, which required them to be rebooted later.service@ncsa.illinois.edu

RESOLVED

...