status.ncsa.illinois.edu
Watch this page in the wiki to subscribe to automatic updates to this status page.
Please do not refer to any NCSA Industry Partners on this page. Please use the iforge nomenclature for all of the *forge infrastructure.
To see older events, see Archive of NCSA Status Home
Report a problem
Current Status
Start | End | What System/Service is affected | What is happening? | What will be affected? | Contact Person | Status |
---|---|---|---|---|---|---|
2021-07-26 1800 | 2021-07-22 0700 | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares were unavailable during maintenance. Users were not able to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable. | help+service@ncsa.illinois.edu | IN PROGRESS |
Upcoming Scheduled Maintenance
Listed below in chronological order.
Start | End | What System/Service is affected | What is happening? | What will be affected? | Contact Person | Status |
---|---|---|---|---|---|---|
2021-07-21 0800 | 2021-07-21 2900 | ICCP | ICCP Quarterly Maintenance | All ICCP services | help@campuscluster.illinois.edu | SCHEDULED |
2021-07-27 0600 | 2021-07-27 0900 | Jira | Upgrade | Jira will be unavailable | | SCHEDULED |
2021-07-31 0600 | 2021-07-31 0630 | CILogon hosted services | Infrastructure maintenance | During this time each service hosted by CILogon, including COmanage Registry, LDAP, Grouper, SAML proxy, and MDQ, will become unavailable for a short time. Each individual service outage will last less than 5 minutes. Services that will not be impacted include: OIDC clients that do not query LDAP for resolving attributes; X.509 certificate issuance and certificate revocation lists; and LIGO and GW-Astronomy services. | help@cilogon.org | SCHEDULED |
2021-09-30 0800 | 2021-09-30 1200 | LSST | LSST Quarterly Maintenance | All LSST services hosted at NCSA | lsst-admin@ncsa.illinois.edu | SCHEDULED |
2021-10-20 0800 | 2021-10-20 2900 | ICCP | ICCP Quarterly Maintenance | ICCP Cluster nodes only | help@campuscluster.illinois.edu | SCHEDULED |
2021-12-09 0800 | 2021-12-09 1200 | LSST | LSST Quarterly Maintenance | All LSST services hosted at NCSA | lsst-admin@ncsa.illinois.edu | SCHEDULED |
Previous Outages or Maintenance
Start | End | What System/Service was affected? | What happened? | What was affected? | Contact Person | Status |
---|---|---|---|---|---|---|
2021-07-21 15:24 | 2021-07-21 21:50 | ASD vSphere cluster in 3003 | One of the 4 hypervisors in the cluster panicked. Unscheduled preventative maintenance was performed on it and the other 3 nodes in the cluster. | After the initial outage at 15:24, there should be no additional outages. | help+service@ncsa.illinois.edu | COMPLETE |
2021-07-13 0700 | 2021-07-13 0800 | cilogon.org | Update to OA4MP v5.1.4. | The OAuth2/OIDC backend of the CILogon Service will be updated to OA4MP v5.1.4. | help@cilogon.org | COMPLETE |
2021-07-08 0800 | 2021-07-08 1000 | OpenAFS | The remaining OpenAFS database servers were upgraded. | No service impacts were seen. | help+service@ncsa.illinois.edu | COMPLETE |
2021-07-07 0600 | 2021-07-07 0800 | CILogon AWS Hosted Services | Upgrading AWS RDS Aurora MySQL v5.6 to v5.7 | COmanage Registry and Grouper services hosted by CILogon will be unavailable | help@cilogon.org | COMPLETE |
2021-07-01 2140 | 2021-07-01 1430 | Horizon dashboard access was down for the entire period. Cluster networking was down from 1200 to 1430. | Investigations into Horizon dashboard accessibility issues resulted in the application of an incorrect default network gateway for the cluster around noon. This was corrected and networking functionality restored around 1400. Instances began recovering soon thereafter. | Radiant admins believe running instances have recovered on their own, but we advise everyone to check their systems and report any issues they see to the help desk. | help@ncsa.illinois.edu | RESOLVED |
2021-07-01 0247 | 2021-07-01 1300 | Various systems in NPCF, ACB, NCSA | There was a power event in the Champaign-Urbana area at around 2:47AM today. Details about the cause are currently unknown. This event caused disruptions to systems at the NCSA building, NPCF and ACB. Known issues have generally been resolved but there may be unidentified issues lingering. If you encounter any problems, please notify NCSA help desk staff (help@ncsa.illinois.edu). | Multiple systems/services were impacted. All have been recovered and return to normal operations is complete. | NCSA help desk | RESOLVED |
2021-07-01 02:58 CDT | 2021-07-01 06:00 CDT | ACHE and NGALE bastion hosts | Loss of power. | All ache-* services, ngale bastion hosts | help@ncsa.illinois.edu | RESOLVED |
2021-06-29 22:00 | 2021-06-29 23:59 | NCSA 4th Floor Office network | Rebooting one or more of the office switches on the NCSA Building 4th floor to resolve a phone issue. | Office port connectivity will be intermittent during the maintenance window. | Matt Kollross help+neteng@ncsa.illinois.edu | RESOLVED |
2021-06-24 0800 | 2021-06-24 1345 | LSST | | Prod/Stable K8S | lsst-admin@ncsa.illinois.edu | RESOLVED |
2021-06-24 0800 | 2021-06-24 1200 | LSST | LSST Quarterly Maintenance | All LSST services hosted at NCSA EXCEPT Prod/Stable K8S | lsst-admin@ncsa.illinois.edu | COMPLETE |
2021-06-22 0000 | 2021-06-22 0400 | Internet2 WAN link | Internet2 will be migrating NCSA's physical port to their new next generation infrastructure. | During the maintenance, our I2 connection will be down. Traffic will reroute to other connections. Some point-to-point connections may be unavailable for a period of time. The maintenance window is not expected to take all 4 hours. | Matt Kollross help+neteng@ncsa.illinois.edu | COMPLETE |
2021-06-21 1800 | 2021-06-22 0000 | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares were unavailable during maintenance. Users were not able to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable. | help+service@ncsa.illinois.edu | COMPLETE |
2021-06-17 0700 | 2021-06-17 0820 | OpenAFS | The OpenAFS database server kaskaskia was upgraded. | No service outages were observed or reported. | help+service@ncsa.illinois.edu | COMPLETE |
2021-06-12 2200 | 2021-06-15 1500 | LSST Firewall | The NPCF secondary firewall was offline due to a hard drive failure. | No impact occurred to production services as the primary firewall stayed online. | | RESOLVED |
2021-06-14 1700 | 2021-06-15 0958 | NCSA GitLab | An attempt to fix an authentication bug for a particular user accidentally broke all authentication through the web interface. | Authentication through the web interface did not work. | help+service@ncsa.illinois.edu | RESOLVED |
2021-06-11 | 2021-06-11 0905 | NCSA Jira | Jira email problem | Jira is not accepting issues via email; you can still create issues directly via the Jira GUI. | | RESOLVED |
2021-06-10 0700 | 2021-06-10 0800 | cilogon.org | Update to OA4MP v5.1.3. | The OAuth2/OIDC backend of the CILogon Service will be updated to OA4MP v5.1.3. | help@cilogon.org | COMPLETE |
1000 | 1030 | Jira.ncsa.illinois.edu | Configuration change to address a vulnerability | There should not be any service interruption, but as with all things, it is possible | help+service@ncsa.illinois.edu | RESOLVED |
2021-06-02 | 2021-06-02 | Netdot | Netdot web access now requires 2FA via SSL VPN, or Cerberus proxy. | Security requested that Netdot require 2FA, in order to access the web interface. To accommodate that request, the Netdot firewall has limited web access to the VPN subnet or via proxy from the Cerberus jump hosts. | Matt Kollross | RESOLVED |
2021-05-25 | 2021-05-26 | vCenters for ache and ASD | Emergency security updates were applied. | The administrative interface was offline for about 20 minutes while the updates were installed. | help+service@ncsa.illinois.edu | RESOLVED |
2021-05-26 1000 | 2021-05-26 1030 | VoIP phones at NPCF | Migrating the VoIP networks to a campus IP to enable future migrations by tech services. | After the networks are migrated, a reboot of all phones at the NPCF building will be performed. | Matt Kollross | RESOLVED |
2021-05-21 1800 | 2021-05-21 1900 | VoIP phones at the NCSA building | Migrating the VoIP networks to a campus IP to enable future migrations by tech services. | After the networks are migrated, a reboot of all phones at the NCSA building will be performed. | Matt Kollross | RESOLVED |
2021-05-20 05:40 | 2021-05-20 08:45 | LSST | ESXi host outage causing degradation of select services. | Degradation of select services; also loss of redundancy for some underlying services, including auth/access & k8s head nodes. | lsst-admin@ncsa.illinois.edu | RESOLVED |
2021-05-15 0600 | 2021-05-15 0800 | CILogon hosted services including COmanage Registry, LDAP, SAML proxy, SAML AA, MDQ | Maintenance | All CILogon hosted services were temporarily unavailable. | help@cilogon.org | COMPLETE |
2021-05-12 07:00 | 2021-05-12 08:00 | NCSA Internal Web Server Upgrade (aka Savannah or MIS Tools) | Updates were made that affected the availability of the NCSA internal website and Savannah system. The system was unavailable during this time. | | | COMPLETE |
2021-05-11 07:00 | 2021-05-11 19:00 | iForge | Quarterly Maintenance | All systems unavailable | | COMPLETE |
2021-05-06 0900 | 2021-05-06 0945 | WAN Link Migration | NCSA Neteng migrated the WAN link to Internet2 to new hardware. | Traffic was automatically re-routed to redundant paths during the link outage. Any connections relying on layer-2 connections over AL2S saw a brief blip as the connection was cut over. Affected parties were contacted in advance. | help+neteng@ncsa.illinois.edu | COMPLETE |
2021-05-03 0600 | 2021-05-03 0630 | CILogon Multi-tenant COmanage Registry | Upgrade to version 3.3.2 | The service at https://registry.cilogon.org was unavailable | help@cilogon.org | COMPLETE |
2021-04-29 1600 | 2021-04-29 1700 | | Add new nodes into Condor service pools | | lsst-admin@ncsa.illinois.edu | COMPLETE |
2021-04-21 08:00 | 2021-04-21 20:00 | ICCP | ICCP Quarterly Maintenance | The scheduler will be down. All compute nodes will be converted to RHEL 7.9 with Red Hat IB. | | COMPLETE |
2021-04-15 1600 | 2021-04-15 1700 | NCSA Opensource | Upgrade of OS on all machines related to opensource | Jira, wiki, Git, etc. hosted at https://opensource.ncsa.illinois.edu/ | kooper@illinois.edu | COMPLETE |
2021-04-15 12:25 | 2021-04-15 14:45 | ICI VMware | Several hosts on the VMware service were experiencing timeouts. | No or intermittent connectivity to these hosts. | help+service@ncsa.illinois.edu | RESOLVED. Root cause is still being investigated. |
2021-04-15 0900 | 2021-04-15 0942 | CMDB | Applying new certificates and restarting services | CMDB, including web interface, will be down briefly during the update. | ncsagroup+org_itsm@ncsa.illinois.edu | RESOLVED |
2021-04-15 0900 | 2021-04-15 0920 | WAN Link Migration | NCSA Neteng migrated the WAN link to ESnet to new hardware. | Traffic was automatically re-routed to redundant paths during the link outage. | help+neteng@ncsa.illinois.edu | RESOLVED |
2021-04-14 15:00 | 2021-04-14 15:00 | git.ncsa.illinois.edu | Users can no longer access repositories from Git clients over HTTPS using their NCSA password. | NCSA passwords cannot be used to access repositories with Git clients. Instead, use SSH keys over SSH or personal access tokens over HTTPS. We thought this went into effect during the Git changes on Nov 2, 2020, but discovered it was still working until we made changes to GitLab to fully remove LDAP functionality. | help+service@ncsa.illinois.edu | COMPLETE |
2021-04-13 1415 | 2021-04-13 1845 | git.ncsa.illinois.edu | The GitLab website at git.ncsa.illinois.edu was having issues with authentication. The LDAP server that it uses was timing out. | | help+service@ncsa.illinois.edu | RESOLVED |
2021-04-13 0800 | 2021-04-13 0830 | cilogon.org | Update to OA4MP v5.1.1. | The OAuth2/OIDC backend of the CILogon Service will be updated to OA4MP v5.1.1. | help@cilogon.org | COMPLETE |
2021-04-12 1800 | 2021-04-12 2245 | File & Print Servers | Monthly Windows File & Print Server Maintenance | Windows File Shares such as HR, Business Office, Home, etc. and printing in the NCSA & NPCF buildings were unavailable. | help+service@ncsa.illinois.edu | COMPLETE |
2021-04-10 0600 | 2021-04-10 0800 | CILogon hosted COmanage, Grouper, SATOSA, LDAP | On Saturday, April 10, the CILogon team will perform maintenance on the infrastructure used for hosted services. | As part of the maintenance all COmanage Registry, LDAP, Grouper, SAML proxy, SAML attribute authority, and MDQ services hosted by CILogon may experience brief outages. We do not expect that any specific service outage will last for more than a minute. | help@cilogon.org | COMPLETE |
2021-04-08 0900 | 2021-04-08 1045 | WAN Link Migration | NCSA Neteng migrated the WAN link to ICCN Node-1 to new hardware. | Traffic was automatically re-routed to redundant paths during the link outage. Issues were noticed by users during the outage and are currently being investigated in cooperation with our upstream provider. | help+neteng@ncsa.illinois.edu | COMPLETE |
2021-04-08 0730 | 2021-04-08 0734 | NCSA Wiki | NCSA's Wiki service was restarted | NCSA's Wiki service was restarted to apply a new SSL certificate and renewed Confluence license. The wiki was not available for 4 minutes while it reloaded. | help+service@ncsa.illinois.edu | COMPLETE |
2021-04-07 1610 | 2021-04-07 1733 | Internal Savannah/MIS website | The Savannah/MIS website would not load due to a corrupted MySQL database table referenced across all of the Savannah tools. | Internal/Savannah | help+service@ncsa.illinois.edu | RESOLVED |
1st report 7:30am Monday | 8:19am Monday | NCSA LDAP2 | ldap2 was not responsive to authentication requests | NCSA Jira and any systems using LDAP2 as their only source. | help+service@ncsa.illinois.edu | RESOLVED |
2021-03-30 0800 | 2021-03-30 0845 | DNS1 | A software issue was causing BIND to fail. | DNS1 was not able to resolve queries during this period. DNS2 remained operational. | neteng+help@ncsa.illinois.edu | RESOLVED |
2021-03-23 2000 | 2021-03-23 2025 | NCSA VPN | The standby VPN hardware was replaced and transitioned into the current VPN cluster. Failover went as expected and firmware was upgraded on the primary after load was shifted to the new standby VPN. | Failover between the appliances occurred without issue and there was no impact to users. | neteng@ncsa.illinois.edu | RESOLVED |
2021-03-18 1230 | 1255 | Jira | Some functionality will be limited due to user limit being reached | Jira | help+service@ncsa.illinois.edu | RESOLVED |
~16:40 | 17:58 | AnyConnect VPN Service | An issue with SSL on the VPN service disconnected all users. Network engineering is looking into the issue. Due to a hardware failure and the VPN not failing over properly to the standby, users were unable to connect to the VPN. This was due to an issue with syncing certificates. | During the outage, expect that you won't be able to connect to or maintain a connection to the VPN. | help+neteng@ncsa.illinois.edu | RESOLVED |
2021-03-16 0950 | 2021-03-16 1000 | CMDB | Will be applying updates per security vetting | CMDB, including web interface, will be down briefly during the update. | ncsagroup+org_itsm@ncsa.illinois.edu | RESOLVED |
2021-03-11 | 2021-03-11 | WAN Link Migration | NCSA Neteng migrated the link to ICCN to new hardware. | Traffic was automatically re-routed to redundant paths during the link outage. | help+neteng@ncsa.illinois.edu | RESOLVED |
2021-03-04 | 2021-03-04 | WAN Link Migration | NCSA Neteng migrated the 100G link to MREN to new hardware. | Traffic was automatically re-routed to redundant paths during the link outage. | help+neteng@ncsa.illinois.edu | RESOLVED |
2021-03-01 22:11 | 2021-03-01 22:47 | NCSA vSphere | About 40 VMs lost connection to their NFS storage. | Several VM-based services were timing out during the issue, including: vSphere management, a Kerberos replica, an LDAP replica, httpproxy, license servers, NCSA fileserver, Identity message queuing, and monitoring. That triggered some of those VMs to switch to read-only disks, which required them to be rebooted later. | service@ncsa.illinois.edu | RESOLVED |