...
Listed below in chronological order.
|
| | Subset of Taiga native (Lustre) clients | IB Link on tgio11 began failing RDMA traffic causing some I/O interrupt issues on clients leveraging 3 LNET routers. | Access to the file system via these LNET routers is periodically timing out; suspect is bad IB cable. Confirming with vendor. | set@ncsa.illinois.edu |
Start | End | What System/Service was affected? | What happened? | What was affected? | Contact Person | Status |
---|
2021-09-28 0700 | 2021-09-28 0900 | Blue Waters | A rack of scratch lost power during the power outage. | Scratch was partially unavailable due to TOR power resiliency issue. | David King | COMPLETE |
2021-09-28 0800 | 2021-09-28 0900 | |
| | VMware migrations | VMware hosts are migrating to a new license model | All VM guest machines and all services should remain operational and accessible. No downtimes are expected. | help@ncsa |
idp.ncsa | Assert eduPersonAssurance Cappuccino profile for NCSA Staff | NCSA Staff logging in with the NCSA Identity Provider will be able to get Silver CA certificates from cilogon.org |
| |
| | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares will be unavailable during maintenance. Users will be unable to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing will be unavailable. | help@ncsa |
help+idp@ncsa | COMPLETE |
2021-09-21-14:50 | 2021-09-21-15:02 | vcenter appliance controlling ASD vsphere | vcenter appliance was upgraded | vsphere.ncsa.illinois.edu was off-line for 12 minutes. |
| | IRST services, including systems run on IRST VMWare clusters | moving to upgraded switches/routers
| Systems run by IRST, and any systems on the IRST-run VMWare cluster. Outage is expected to last < 5min. | help+security@ncsa |
help+service@ncsa | COMPLETE |
20210921 0700 | 20210920 1115 | Blue Waters | Power Work caused non redundant switches and misconfigured servers to shutoff | Blue Waters Compute, Login and Scheduler | SSLVPN | SSLVPN will start using CILogon for authentication and DUO integration. | Four new profiles have been created (duplicating the existing four) but with the name "cilogon" in the name. These new profiles will use the new authentication method. After a few weeks of testing, if no issues are found, we will remove the old profiles on March 20. | help+neteng@ncsa |
bw-admin@ncsa | COMPLETE |
2021-09-20 1800 | 2021-09-20 2130 | NCSA |
| | NCSA GitLab, NCSA Windows File & Print Servers
|
Scheduled Windows Server Maintenance | File & Print Shares were unavailable during maintenance. Users were not able to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable. | help+service@ncsa | Web Redirect Server
| VMs migrating to a new cluster | Affected services will be unavailable for a few minutes. | help@ncsa.illinois.edu | |
COMPLETE | 2021-09-14 0000 | 2021-09-14 0600 | Internet2 WAN circuit | Internet2 will be migrating our WAN circuit to new hardware. | Traffic over that path will reroute while the change happens. We anticipate the migration to take less than 30 mins |
| | NPCF Wifi | Tech Services will be replacing the AP at NPCF. | Tech Services will be replacing the Access Points at NPCF. No user impact is expected. | help+neteng@ncsa.illinois.edu | |
scheduled | | | Wiki | Upgrade to next version | Wiki will be unavailable | |
| | Jira, Wiki, internal.ncsa.illinois.edu, identity.ncsa |
help+service@ncsa |
VMs will migrate to a new cluster. | Services will be unavailable for a few minutes (<5 mins) while the VM is shutdown and moved. | help@ncsa.illinois.edu | |
COMPLETE | 2021-09-09 0600 | 2021-09-09 0700 | NCSA VPN | Software Upgrades | The appliances hosting the NCSA VPN will be patched. Users will experience a brief disconnect as load is failed over between the appliances |
| | NCSA Wifi | Tech Services will be replacing the AP at the NCSA building. | Tech Services will be replacing the Access Points at the NCSA building. No user impact is expected. | help+neteng@ncsa.illinois.edu | |
COMPLETE |
20210908 1300 | 20210908 1400 | Group prod_b Bastion hosts | Out of cycle patching | Bastion hosts in group prod_b will be patched and rebooted. (see MOTD for group assignment) | | vForge / license servers | Quarterly Planned Maintenance | all vForge nodes and services (incl. related license servers/services) will be unavailable | help@ncsa |
help+security@ncsa | COMPLETE |
2021-09-08 0900 | 2021-09-08 1000 | Group prod_a Bastion hosts | Out of cycle patching | Bastion hosts in group prod_a will be patched and rebooted. (see MOTD for group assignment) | |
03/14/2024 0800 | 03/15/2024 2125
- Extended outage due to a problematic upgrade solution.
- Vendor engineers involved
| Taiga/Granite | Semi-Annual Maintenance | All Taiga & Granite Storage Services | set@ncsa |
help+security@ncsa | COMPLETE |
2021-09-02 9:30 AM | 2021-09-02 1PM | PDU in rack AA81 | We are replacing a PDU in NPCF rack AA81 | All systems in the rack have redundant power connections. No service outages are expected from this work | |
| | vsphere.ncsa.illinois.edu console | Upgrade | The vsphere.ncsa.illinois.edu web console. VMs should not be affected | help@ncsa |
help+service@ncsa | COMPLETE |
20210901 0700 | 20210901 0800 | cilogon.org | Update to OA4MP v5.2.1 | Device Authorization Grant Flow transactions will be stored in database rather than in memory | help@cilogon.org |
08 | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares will be unavailable during maintenance. Users will be unable to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing will be unavailable. | help@ncsa.illinois.edu | |
COMPLETE | | | Wiki | Security patch is being applied | Wiki will be down | |
2024-02-22 0640 | 2024-02-22 0648 | NCSA GitLab | GitLab being updated to latest version | All GitLab services will be unavailable for a few minutes. | help@ncsa |
help+service@ncsa | scheduled |
2021-08-25 9:00am | 2021-08-25 6:45pm | Blue Waters | System reboot due to blade fallout coinciding with HSN reroute and SMW not recovering. | All jobs interrupted | |
| | ACHE vSphere is being upgraded | ACHE vSphere is being upgraded | ACHE vSphere will not be accessible | help@ncsa.illinois |
jenos@illinois | Complete |
2021-08-19 0538 | 2021-08-19 0700 | IRST systems hosted on IRST Node 2 | Storage controller failure, all VMs taken offline | some prod_b systems, and non-redundant services. | |
| | sslvpn.ncsa.illinois.edu | ssl cert is refreshed | Users may need to manually reconnect if the system drops their session | neteng@ncsa.illinois |
eyrich@illinois | RESOLVED |
2021-08-19 5:34 | 2021-08-19 6:20 | cilogon.org | Storage controller failure in IRST VM farm | cilogon.org was unreachable until we initiated fail-over to our backup servers at NICS. | help@cilogon.org |
02/08/2024 1030 | 02/08/2024 1330 | NCSA Backbone Network Battery Backup | NPCF Network DC Battery Maintenance | Network Engineering is taking the battery back-up servicing NPCF networking equipment offline for periodic maintenance. This will be non-service impacting, as all core networking equipment still has two independent power feeds. | neteng@ncsa |
COMPLETE |
2021-08-18 1136 | 2021-08-18 1156 | NCSA Wiki | Test instance caused interference. | NCSA Wiki | help+service@ncsa | COMPLETE |
2021-08-17 0500 | 2021-08-17 0700 | NCSA/NPCF Wide Area Network | Between 5:00AM and 7:00 AM CDT on 08/17/2021, Campus ICCN Engineers will be upgrading firmware on the ICCN router 710rtr at the Starlight facility in Chicago. | Our peerings with MREN and OmniPoP will go down. All traffic destined for those peerings will reroute via other peerings, so no production impact is expected. |
02-05 | UIUC Network | Complete / partial network outage | While NCSA network is up and not impacted, much of the UIUC network is currently offline. This could be affecting a broad range of services such as wireless, facility networks, campus websites, etc. No current ETA, as engineers are still troubleshooting the problem. | neteng@ncsa |
help+neteng@ncsa | COMPLETE |
2021-08-16 1800 | 2021-08-17 0000 | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares will be unavailable during maintenance. Users will not be able to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing will be unavailable. | help+service@ncsa.illinois.edu |
| | vCenter Server Appliance | Critical patches are being applied | The vcenter.internal.ncsa.edu site will not be accessible. Operating VMs should not be affected | help@ncsa.illinois.edu | |
COMPLETE | 2021-08-12 9:54 | 2021-08-12 1012 | Jira | Attempted snapshot of Jira in vSphere was too intensive for the system | Jira | |
| | HOLL-I | HOLL-I will enter a shuttered/standby mode | All HOLL-I servers and services will no longer be available after standby mode is activated. | help@ncsa.illinois |
help+service@illinois | COMPLETE |
2021-08-10 2000 | 2021-08-11 0000 | Radiant API and Web access | Radiant cluster name change. | During this time access to the API endpoints and the Horizon web dashboard will be intermittently unavailable. Instances will continue to run and be available over the network with no interruptions. | | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares will be unavailable during maintenance. Users will be unable to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing will be unavailable. | help@ncsa |
radiant-admin@ncsa | COMPLETE |
2021-08-10 07:00 | 2021-08-10 17:10 | iForge | Quarterly Maintenance | All systems unavailable | |
| | LastPass | yearly audit performed. Users disabled or deleted per policy | Accounts that had been disabled for over a year were deleted. Accounts that were unused for a year were disabled | help+security@ncsa |
iforge-admin@lists.ncsa | complete |
20210809 1421 | 20210809 1440 | NCSA Wiki | DB conflict configuration with Wiki & Wiki-Test | NCSA Wiki was inaccessible | help+service@ncsa | complete |
2021-08-05 1000 | 2021-08-05 1030 | NPCF Core Router - Linecard Reboot | A problem was identified on one of the line cards in our core router requiring a reboot of the linecard. The linecard was successfully rebooted and we will continue monitoring the hardware for further issues. | All connections to this linecard are redundant and there was no impact to users. | neteng@ncsa.illinois.edu | |
| | Wiki service upgrade | Upgrade version to address recently announced security vulnerabilities. | Wiki will be down during upgrade and testing. | help@ncsa.illinois.edu | |
complete | 2021-08-05 0800 | 2021-08-05 1000 | LSST | LSST Emergency OS Patching | LSST services hosted at NCSA except: - NTS will remain up (has already been patched)
| |
| | Jira service upgrade | Upgrade version to address recently announced security vulnerabilities. | Jira will be down during upgrade and testing. | help@ncsa |
lsst-admin@ncsa | complete |
2021-08-04 0800 | 2021-08-04 1700 | Radiant API and Web access | Installation of new Radiant cluster | Cluster name changes are starting at 1100; This will make the horizon dashboard unreachable.
17 0500 | 2024-01-17 0700 | Wireless connectivity on 2nd, 3rd and 4th floors. | Tech Services will be replacing some network components in switches that provide connectivity for wireless. | Each floor (wireless) will lose connectivity for a few mins while the cards are replaced. | neteng+help@ncsa |
During this time access to the API endpoints and the Horizon web dashboard will be intermittently unavailable. Instances will continue to run and be available over the network with no interruptions. | radiant-admin@ncsa | COMPLETED |
2021-08-04 0700 | 2021-08-04 0800 | cilogon.org | Update to OA4MP v5.2.0 | Added support for Device Authorization Grant Flow (RFC 8628) |
01-16 23:30 | Facility UPS | Second attempt, Preventive Maintenance_Replace UPS capacitors | All systems which are connected to UPS power. During the PM the systems will not lose power but will be unprotected. | rantissi@illinois.edu |
help@cilogon.org | COMPLETED |
2021-08-03 0800 | 2021-08-03 1700 | Radiant API and Web access | Installation of new Radiant cluster | During this time access to the API endpoints and the Horizon web dashboard will be intermittently unavailable. Instances will continue to run and be available over the network with no interruptions.
16 2200 | 2024-01-17 0300 | Water leak in Node 1 on campus. | Node 1 (located on campus) has a water leak that may require full power down to address. This will take out several devices that provide connectivity to NCSA WAN. | No power outage was needed to repair the leak | neteng@ncsa |
radiant-admin@ncsa.illinois.edu | COMPLETED |
2021-08-03 9:00 am
01-10 0700 | 2024-01-10 12:09 | Nightingale | Quarterly Planned Maintenance | All Nightingale servers and services were unavailable (other than the ngale-bastion* nodes) | help@ncsa.illinois.edu | COMPLETED |
2024-01-09 2100 | 2024-01-10 0400 | Wifi | Performing a Code upgrade that will affect the Wi-Fi Environment. The majority of the system will be online and functional while individual Access Points will be upgraded. This upgrade is expected to gracefully migrate clients to adjacent access points to minimize any interruption. | <---- This will more than likely also impact NCSAnet. | neteng+help@ncsa.illinois.edu | COMPLETED |
| | vforge | Radiant upgrade | Entire cluster is shut down | jlong@ncsa.illinois.edu | Completed |
| | Radiant | The Radiant cluster was upgraded from OpenStack Wallaby to Yoga. | The web dashboard and API endpoints were unavailable; networking for instances may have been intermittent. | help@ncsa.illinois.edu | Completed |
2024-01-05 0400 | 2024-01-19 1200 | Wifi | Upgrading the code used for the Authentication on the Wi-Fi system and VPN. There will be an interruption to the IllinoisNet_Guest device registration and the IllinoisNet_Guest self-registration portal; both are expected to be back online before regular business hours. Regular authentication and traffic flow for the Wi-Fi and VPN is not expected to be interrupted. | <---- This will more than likely also impact NCSAnet. | neteng+help@ncsa.illinois.edu | COMPLETED |
2023-12-19 0400 | 2023-12-19 0800 | Wifi | Upgrading the core campus Wi-Fi hardware. There will be an interruption to Campus Wi-Fi (including IllinoisNet, IllinoisNet_Guest, and eduroam), IllinoisNet_Guest device registration, and the IllinoisNet_Guest self-registration portal. | <---- This will more than likely also impact NCSAnet. | neteng+help@ncsa.illinois.edu | Completed |
| | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares will be unavailable during maintenance. Users will be unable to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing will be unavailable. | help@ncsa.illinois.edu | Completed |
| | wiki.ncsa.illinois.edu and jira.ncsa.illinois.edu | Atlassian has notified us of several critical security vulnerabilities in Confluence and Jira software. A mitigation has been applied to the Jira server and the Confluence server (wiki.ncsa.illinois.edu) will be patched. | There will be a brief outage to patch the Confluence server at 1600. The patching is expected to take 15-20 minutes but the entire hour is reserved as a precaution. | help@ncsa.illinois.edu | Completed |
2023-12-06 0900 | 2023-12-06 1700 | Facility UPS | Preventive Maintenance _ Replace UPS capacitors. | All systems which are connected to UPS power. During the PM the systems will not lose power but will be unprotected. | MO Rantissi | UPS maintenance was halted due to damaged parts. Putting the UPS back together and rescheduling for a later date. The UPS is back online. |
| | IDDS database | Planned maintenance: postgresql upgrade | NCSA identity, group management, campus cluster user management page, TEM shift report tool, and naps | help+idds@ncsa.illinois.edu | Completed |
| 2023-11-09 1445 (vForge) 2023-11-09 1630 (license servers) | vForge / license servers | Quarterly Planned Maintenance | all vForge nodes and services (incl. related license servers/services) will be unavailable | help@ncsa.illinois.edu | Completed |
| | vForge / license servers | Quarterly Planned Maintenance | all vForge nodes and services (incl. related license servers/services) will be unavailable | help@ncsa.illinois.edu | Completed |
| | Radiant | Rebuilding rabbitmq service | Dashboard and API services were read-only during this time. | help@ncsa.illinois.edu | COMPLETED |
| | hub.ncsa.illinois.edu | private docker registry is down due to volumes in radiant in detaching state | hub.ncsa.illinois.edu is not reachable, and images stored are unreachable. Services that have their images local should continue to run; services that want to push/pull images will get a 500 error. | Rob Kooper | COMPLETED |
| | Confluence/Wiki | Upgrade the system | Confluence | help@ncsa.illinois.edu | COMPLETED |
2023-10-31 09:30 | 2023-10-31 10:30 | NCSA OpenSource | upgrade Atlassian products | opensource confluence/jira/bamboo/bitbucket | | COMPLETED |
| | HAL | Full System PM | All HAL services | | POSTPONED |
| | SSLVPN | New auth method was added to a new login profile, ncsa-vpn-saml-tunnelall. | There is now a test profile in place that isn't open to everyone. Please continue to use the profiles you were using before. If you notice an issue, please report it. Our testing indicated logins were working as intended. | help+neteng@ncsa.illinois.edu | |
| | Radiant | OpenStack software update | The Radiant team will be conducting an OpenStack software update, from Victoria to Wallaby. This is a software stability update and does not include significant features or changes in functionality. The update will be done online and is not expected to impact running instances or system access. | help@ncsa.illinois.edu | |
| | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares will be unavailable during maintenance. Users will be unable to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing will be unavailable. | help@ncsa.illinois.edu | |
| | Taiga | Online, Rolling patch of Taiga servers | Taiga File System | set@ncsa.illinois.edu | |
| | HOLL-I | CS-2 Appliance Mode upgrade | All HOLL-I servers and services will be limited to internal testers | help@ncsa.illinois.edu | Completed |
| | HOLL-I | CS-2 Appliance Mode hardware installation | All HOLL-I servers and services will be unavailable | help@ncsa.illinois.edu | |
| | Confluence | Config will be applied to increase the period users can be logged in before logged out | Confluence will be down | help@ncsa.illinois.edu | |
| | sslvpn | testing new auth method | no user impact was observed | help+neteng@ncsa.illinois.edu | |
2023-10-10 0800 | 2023-10-10 1000 | cilogon.org | Moving to new compute infrastructure | cilogon.org, demo.cilogon.org, crl.cilogon.org | help@cilogon.org | |
2023-10-09 1215 | 2023-10-09 1532 | Taiga | Appliance has unmounted all of its OSTs. | Ability to do I/O to Taiga | set@ncsa.illinois.edu | |
| | Confluence | Confluence is being upgraded | Confluence will not be available for use | help@ncsa.illinois.edu | |
2023-10-04 0600 | 2023-10-04 1927 | Delta | Filesystem and OS patching | All Delta resources will be unavailable during the maintenance period including: + Delta login nodes - unavailable + Delta compute nodes - unavailable Delta services + Open OnDemand - unavailable + Delta Globus Online endpoint - unavailable | help@ncsa.illinois.edu | |
2023-10-04 0950 | 2023-10-04 1000 | Opensource Confluence | Patching Confluence | opensource confluence will be down | | |
2023-09-27 1700 | 2023-09-28 0700 | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares were unavailable during maintenance. Users were unable to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable. | help@ncsa.illinois.edu | |
2023-09-26 1400 | 2023-09-26 1500 | Wireless NCSA building | Campus wireless outage. | NCSAnet and IllinoisNet users are experiencing connectivity issues. Tech Services is aware of the problem. | help+neteng@ncsa.illinois.edu | |
2023-09-28 1433 | 2023-09-28 1459 | cilogon.org | service outage due to AWS database issue | logins to cilogon.org were failing | help+cilogon@ncsa.illinois.edu | |
2023-09-26 7:30AM | 2023-09-26 8:00AM | Ldap Primary Server | Maintenance | Ldap updates will be disabled during maintenance | Timothy Bouvet | |
09/22/2023 8:00am | 9/22/2023 1:30pm | Wireless access | NCSANet is not authenticating users and denying connections. | Anyone attempting to connect to the wireless NCSANet ID. | neteng@ncsa.illinois.edu | |
2023-08-29 | 2023-09-21 - 1300 | opensource bitbucket | Bitbucket is not compatible with the deployed version of git, see https://jira.atlassian.com/browse/BSERV-14390 | opensource.ncsa.illinois.edu/bitbucket | | |
2023-Sep-19 - 0745 | 2023-Sep-19 - 0750 | LastPass | Rekey the LastPass/Duo Integration | LastPass users that utilize duo may not be able to authenticate until completed | James Eyrich | |
2023-Sep-18 - 1511 | 2023-Sep-18 - 1749 | Taiga | Outage due to failed MDS failover. | Taiga access was unavailable. | set@ncsa.illinois.edu | |
2023-09-14 0700 | 2023-09-14 2015 | vForge / license servers | Quarterly Planned Maintenance | all vForge nodes and services (and related license servers/services) will be unavailable | help@ncsa.illinois.edu | |
2023-09-14-0800 | 2023-09-14-2000 | Taiga & Granite Services | Semi-Annual Planned Maintenance | All Taiga and Granite services will be offline | set@ncsa.illinois.edu | |
2023-09-08 06:58 | 2023-09-08 09:50 | disruption to NPCF-DES-CORE, NPCF-CWMGMT-FW1 & 2, MForge VPN | NPCF-CORE-EAST has a DEAD linecard. Relocating affected links to other linecards with open ports while we work with vendor support for a replacement. | redundancy has been lost, access and activity remain normal. | | |
2023-09-06 09:48 | 2023-9-6 10:35 | NCSA Center Wide Management Network | the firewall protecting this network is showing offline | Centerwide management networks in NCSA building | (John) Walker | |
2023-9-6 09:50 | 2023-9-6 10:35 | The main switch in NCSA 3003 | In debugging a link problem between NCSA and NPCF the wrong fiber was inadvertently pulled | Networking in and out of 3003 was down for 35 mins | neteng@ncsa.illinois.edu | |
2023-08-26 23:06 | 2023-08-27 01:36 | CILogon | CILogon database replication error | CILogon OAuth/OIDC services unavailable | help+cilogon@ncsa.illinois.edu | |
2023-08-24 13:02 | 2023-08-25 14:15 | Taiga | Multiple SAS cable backend failure causing OSTs to go into write protect and unmount | Access to certain OSTs in Taiga | set@ncsa.illinois.edu | |
2023-08-16 1700 | 2023-08-17 0000 | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares were unavailable during maintenance. Users were unable to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable. | help@ncsa.illinois.edu | |
2023-08-16 0700 | 2023-08-16 1003 | Nightingale | Quarterly Planned Maintenance | All Nightingale servers and services were unavailable (other than the ngale-bastion* nodes) | help@ncsa.illinois.edu | |
2023-08-15 0400 | 2023-08-15 0500 | VMWare Gateway | VMWare is updating the Gateway OS | No expected effects | help@ncsa.illinois.edu | |
2023-08-14 0935 | 2023-08-14 2000 | NCSA VPN | Duo implemented new ssl checks that we were not passing | Users couldn't authenticate with DUO to establish new connections to the VPN. Existing VPN sessions remain connected. | Matthew Elliott | |
2023-07-27 1700 | 2023-07-27 1715 | HOLL-I | Live kernel patching | kernel was updated in response to recent security issue. | help@ncsa.illinois.edu | |
2023-07-25 0900 | 2023-07-25 1630 | Radiant | Changes to the OpenStack network configuration and network service node (increasing MTU on customer networks and adding a new dedicated network server) | These changes will impact project/instance networks and cause them to be unreachable for an extended period of time. Expect network timeouts and failure of NFS file system access. Systems may be unreachable for several hours - up to the entire planned time - but we are making every effort to minimize the downtime. | James Glasgow via help@ncsa.illinois.edu | |
| | ICCP | ICCP Quarterly Maintenance | All ICCP services | help@campuscluster.illinois.edu | |
2023-07-19 0800 | 2023-07-19 0900 | u1carne router | scheduled maintenance | mForge, Magnus, and Access will have a brief outage as the routers reboot. | Michael Douglas via neteng@ncsa.illinois.edu | |
2023-07-12 1700 | 2023-07-12 2130 | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares were unavailable during maintenance. Users were unable to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable. | help@ncsa.illinois.edu | |
2023-07-07 1000 | 2023-07-07 1100 | NCSA Kerberos | Deleting out principals that were disabled on 2023-06-07 | Kerberos authentication should already be disabled for the planned hosts, so there should be zero notable effect. | help@ncsa.illinois.edu | |
| | HOLL-I | Transitioning CS-2 Execution Mode from Weight Streaming to Pipelined | Holl-I CS-2 | help@ncsa.illinois.edu | |
2023-06-30 0600 | 2023-06-30 0605 | NCSA GitLab | GitLab was updated to latest version | All GitLab services were unavailable for a few minutes. | help@ncsa.illinois.edu | |
2023-06-29 1849 | 2023-06-29 2000 | Delta | More power fluctuations due to the severe weather have caused multiple system failures in all NCSA buildings. NCSA staff are working to restore all services to full functionality. | Delta Login, Open OnDemand and Scheduling. | help@ncsa.illinois.edu | |
2023-06-29 1316 | 2023-06-29 1530 | Most NCSA computer systems | Power fluctuations due to the severe weather have caused multiple system failures in all NCSA buildings. NCSA staff are working to restore all services to full functionality. | Virtually all systems have been impacted to some extent. Most NCSA compute resources have returned to service. | help@ncsa.illinois.edu | |
2023-06-21 1800 | 2023-06-21 1900 | DNS1 / DNS2 | BIND security patches | Due to a security issue with BIND, neteng will be rebooting both DNS servers (staggered) starting tonight at 1800. | neteng@ncsa.illinois.edu | |
2023-06-15 0600 | 2023-06-15 0605 | NCSA GitLab | GitLab updated to use new backup method | All GitLab services were unavailable for a few minutes. | help@ncsa.illinois.edu | |
2023-06-14 1700 | 2023-06-14 2200 | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares were unavailable during maintenance. Users were unable to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable. | help@ncsa.illinois.edu | |
2023-06-08 1100 | 2023-06-08 1205 | ICI Metrics | Major Upgrade to Grafana 9.5.x and Unified Alerting | Access to https://metrics.ncsa.illinois.edu and all alerting was paused | malone12@illinois.edu | |
2023-06-07 1300 | 2023-06-07 1500 | NCSA Kerberos | Disabling Kerberos Host Principals not in DNS | Kerberos Authentication for hosts may stop working. Please create a ticket if you think your host principal may have been disabled erroneously. | help@ncsa.illinois.edu | |
2023-05-23 1240 | 2023-05-23 1410 | Taiga | Failover events on tgio02 | I/O to and from Taiga for all services intermittently during this period | set@ncsa.illinois.edu | |
2023-05-23 0800 | 2023-05-23 1400 | Granite Tape Archive | Unplanned Library Maintenance due to component failure | Retrieval of data; | bdickin2@illinois.edu | |
2023-05-18 0800 | 2023-05-18 1400 | Granite Tape Archive | Library Preventative Maintenance | Retrieval of data; | bdickin2@illinois.edu | |
2023-05-17 0700 | 2023-05-17 2125 | Nightingale | Quarterly Planned Maintenance | All Nightingale servers and services were unavailable (other than the ngale-bastion* nodes) | help@ncsa.illinois.edu | |
2023-05-17 0530 | 2023-05-17 0600 | Wireless and VoIP (NCSA Building) | Router Upgrades | Wireless, VoIP and anything directly connected to the campus switches will be down, while they upgrade firmware on the router. | help+neteng@ncsa.illinois.edu | |
2023-05-16 0700 | 2023-05-16 1100 | HOLL-I | Quarterly Planned Maintenance | all HOLL-I nodes and services were unavailable | help@ncsa.illinois.edu | |
2023-05-15 0900 | 2023-05-15 1200 | HOLL-I | CS-2 CDU maintenance | the HOLL-I CS-2 was unavailable and there was a reservation in Slurm | help@ncsa.illinois.edu | |
2023-05-12 0600 | 2023-05-12 0615 | NCSA GitLab | GitLab was updated to latest version | All GitLab services were unavailable for a few minutes. | help@ncsa.illinois.edu | |
2023-05-11 0700 | 2023-05-11 1900 | vForge / license servers | Quarterly Planned Maintenance | all vForge nodes and services (and related license servers/services) will be unavailable | help@ncsa.illinois.edu | |
2023-May-03 0800 | 2023-April-25 0900 | ACHE FW Cluster Upgrade - Secondary | Upgrading ACHE Firewall member B | No outage expected | eyrich@illinois.edu | |
| | vSphere and hosts on it. | VMWare Licensing issues. Was forced to migrate to new vSphere. | LDAP, Wordpress Sites, various | help@ncsa.illinois.edu | |
2023-05-01 1530 | 2023-05-01 1605 | ICI VMware | Apply updates to address software issue. | | aloftus@ncsa.illinois.edu | |
2023-April-26 0800 | 2023-April-25 0900 | ACHE FW Cluster Upgrade - primary | Upgrading ACHE Firewall member A | No outage expected | eyrich@illinois.edu | |
2023-April-25 0800 | 2023-April-25 0900 | NPCF CWFM Cluster Upgrade secondary | Upgrading NPCF CW Firewall member B | No outage expected | eyrich@illinois.edu | |
| | NCSA VPN | The certificate on the NCSA VPN was replaced. | Users will be disconnected from the VPN and have to manually reconnect. | neteng@ncsa.illinois.edu | |
2023-April-20 0800 | 2023-April-20 0900 | NPCF CWFM Cluster Upgrade primary | Upgrading NPCF CW Firewall member A | No outage expected | eyrich@illinois.edu | |
2023-04-19 1800 | 2023-04-19 2300 | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares were unavailable during maintenance. Users were unable to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable. | help@ncsa.illinois.edu | |
| | Delta | HSN and OS are being updated. | The entire system will be offline. | kingda@illinois.edu | |
04/17/23 0900 | 04/17/23 1400 | Granite Tape Archive | Upgrades to FS | Ingest or retrieval of data; | bdickin2@illinois.edu | |
2023-03-27 | 2023-04-16 | NCSA OpenSource BitBucket | incompatibility with git, only versions that can be installed are 2.25 or 2.40, and Bitbucket requires version 2.31 - 2.39; https://opensource.ncsa.illinois.edu/bitbucket is down until new version of BitBucket | | Rob Kooper | COMPLETED |
2023-04-03 | 2023-04-04 | NCSAnet, IllinoisNet, EDUroam | Tech Services is deploying a new certificate for all wireless networks. | Check #announce on NCSA Slack for more information, including links to download software that will update your wireless profiles. | help+neteng@ncsa.illinois.edu | |
2023-03-29 12:00 CDT | 2023-03-29 12:30 CDT | Primary Kerberos server | Configuration changes to match secondary KDCs | Password changes may have been delayed by ten minutes | Christopher Lindsey | |
2023-03-23 0843 | 2023-03-23 1030 | DHCP serving NCSAnet wireless and NCSA office wired wall jacks | The main NCSA DHCP server stopped answering queries and was restarted | If you didn't already have a DHCP lease your system would have been unable to connect to NCSAnet or register on an office wired wall jack. | neteng@ncsa.illinois.edu | |
2023-03-15 1800 | 2023-03-16 2300 | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares were unavailable during maintenance. Users were unable to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable. | help@ncsa.illinois.edu | |
2023-03-14 1100 | 2023-03-14 1150 | Authentication to vsphere.ncsa.illinois.edu and ache-vcenter will fail | Replacing SSL certs on Ldap1/2 | Ldap will be restarted on Ldap1/2 | tbouvet@illinois.edu | |
2023-03-09 0700 | 2023-03-09 17:20 | vForge / license servers | Quarterly Planned Maintenance | all nodes and services will be unavailable | help@ncsa.illinois.edu | |
| 03/09/2023 1713 | NCSA Taiga & Granite | Taiga Service Node Updates & Granite Upgrade | Taiga Public LNET router was upgraded and a second one added; access via public LNET was down from 0800 to 1100. Globus and NFS services were patched in a rolling/online fashion. Granite experienced a short full downtime as we upgraded its software. | set@ncsa.illinois.edu | |
03/07/2023 8:30am | 03/07/2023 10:15am | Delta HSN | The HSN was dropping nodes and not allowing nodes to reconnect | High Speed Connectivity | help@ncsa.illinois.edu | |
2023-03-01: 1100 | 2023-03-01: 1115 | Radiant OpenStack Services | Changes to the OpenStack controller node to address networking performance issues | All OpenStack services were restarted to effect system configuration changes. The work was completed successfully and all services are available again | help@ncsa.illinois.edu | |
| | NCSA email | A mail loop caused routing and processing problems. | Mail routing and delivery was blocked. | help@ncsa.illinois.edu | |
| | HOLL-I | Quarterly Planned Maintenance | all nodes and services will be unavailable | help@ncsa.illinois.edu | |
2023-02-16 ~14:15 | 2023-02-16 ~14:25 | cerberus4 | mis-configuration caused roughly 50% of connections to be dropped | 50% of connections in and out dropped | help+security@ncsa | |
2023-02-10 0910 | 2023-02-10 0915 | users.ncsa.illinois.edu web site | restarting the system | no web pages from users.ncsa.illinois.edu will be available | help@ncsa.illinois.edu | |
02/08/2023 1800 | 02/09/2023 0000 | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares were unavailable during maintenance. Users were unable to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable. | help@ncsa.illinois.edu | |
| | Jira | Jira will be restarted to fix stuck notification emails. | Jira will be unavailable during this time. | Andrew Loftus Also posted to #announce (Slack) | |
| | ICCP head node login and golub compute resources | Lost network connectivity for golub infrastructure | ICCP head node logins (ie cc-login.campuscluster.illinois.edu) and golub compute resources | help@campuscluster.illinois.edu | |
| | Jira | Jira offline for service restart to fix stuck emails. | Jira will be unavailable during this time. | help@ncsa.illinois.edu | Completed |
01/25/2023 0800 | 01/25/2023 0830 | NCSA LDAP | rolling LDAP restarts of redundant servers to deploy new schema file | Minimal impact for service restarts | | Completed |
2023-01-19 1310 | 2023-01-19 1330 | Jira | Jira offline for reboot to fix Boards. | Jira will be unavailable during this time. | help@ncsa.illinois.edu | Complete |
| | ICCP | ICCP Quarterly Maintenance | All ICCP services | help@campuscluster.illinois.edu | |
2023-01-13 1200 | 2023-01-13 1230 | Jira | Jira offline for dashboard fixes. | Jira will be unavailable during this time. | help@ncsa.illinois.edu | completed |
2023-01-12 0800 | 2023-01-13 1230 | Jira | Minor issues noticed in Jira likely caused by the upgrade yesterday evening. | Gadgets and dashboards are having issues. | | resolved |
2023-01-11 0700 | 2023-01-12 1200 | Nightingale | Quarterly Planned Maintenance | All Nightingale servers and services will be unavailable (other than the ngale-bastion* nodes). Maintenance has been extended until noon Thu, Jan 12 due to complications with a firmware update on the Lustre storage appliance. | help@ncsa.illinois.edu | COMPLETED |
2023-01-12 0700 | 2023-01-12 0715 | NCSA VPN | Router Migration | The NCSA VPN was migrated to a different upstream router. Users were briefly disconnected. | help+neteng@ncsa.illinois.edu | COMPLETED |
| | NCSA GitLab | GitLab upgrade | All GitLab services were unavailable for a few minutes while it upgraded to the latest version. | help@ncsa.illinois.edu | COMPLETED |
06:40 1/9/2023 | 2023-01-11 2100 | vSphere in 3003 | One of the storage appliances serving vsphere.ncsa.uiuc.edu started having access issues. This has caused issues with 19 VMs. | crashplan has returned to service | help@ncsa.illinois.edu | COMPLETED |
2023-01-11 1730 | 2023-01-11 1915 | Jira | Jira software upgrade | Jira will be unavailable while software upgrades are applied. | help@ncsa.illinois.edu | COMPLETED |
1/9/2023 6:40am | 1/11/2023 various | vSphere in 3003 | One of the storage appliances serving vsphere.ncsa.uiuc.edu had access issues. Data was moved to different storage for affected VMs. | digitalag.ncsa.illinois.edu, gecat, reu.ncsa.illinois.edu, ACIpartnership.org, astro, edream, caiiwp, brainstormhpcd.org, internal-dev, cmdb-dev-kimber7, reu-international.ncsa.illinois.edu, avl-test, mharp - ergo, infews-er.net, ncsa30, bluewaters - 2018-03-05. | help@ncsa.illinois.edu | COMPLETED |
12/23/2022 6:30pm | 12/27/2022 1:30pm | Taiga | Single OST is failing to re-mount following failover | File system is unavailable | set@ncsa.illinois.edu | COMPLETED |
| | Wireless at NCSA building. | Router Upgrade | Tech Services will be upgrading their NCSA building router, which will affect wireless at the NCSA building. Downtime is estimated at 15 mins. | help+neteng@ncsa.illinois.edu | COMPLETED |
| | Radiant | System maintenance | OpenStack: - "Minor system configuration changes will be made to increase system logging and optimize memory usage/allocation across nodes. No noticeable impact to end users is expected."
Networking:
- Swap fiber links to correct issue with security taps: In order to minimize user impact, we will swap one link at a time. User should see no impact however there is a slight possibility of a temporary network outage potentially lasting a few minutes however we currently do not anticipate this happening.
- Update Ethernet switch firmware: Switch reboots will be done in a rolling fashion and so are not expected to be disruptive to ongoing operations (due to switch/path redundancy).
| help@ncsa |
2021-08-03 11:30 am | Radiant Cluster | A change was made to the firewall that unintentionally restricted access for instances and other internal cluster communication. | Access to instances and workload | radiant-admin@ncsa | resolved | 2021-07-31 0600 | 2021-07-31 0630 | CILogon hosted services | Infrastructure maintenance | During this time each service hosted by CILogon including COmanage Registry, LDAP, Grouper, SAML proxy, and MDQ will become unavailable for a short time. Each individual service outage will last less than 5 minutes. Services that will not be impacted include: * OIDC clients that do not query LDAP for resolving attributes * X.509 certificate issuance and certificate revocation lists * LIGO and GW-Astronomy services | |
15 Dec 2022 0900 | 15 Dec 2022 0935 | NCSA Kerberos | NCSA's Read-Write KDC is being upgraded | Password changes and new accounts are being queued for completion after the upgrade. | help@ncsa.illinois.edu |
help@cilogon.org | COMPLETE | 2021-07-29 1300 | 2021-07-29 1400 | IRST-run bastion hosts (pool B) | Security patching | Hosts managed by IRST will be patched and rebooted. Only hosts in pool B will be patched at this time | |
| | NCSA GitLab | GitLab was upgraded to latest version | All GitLab services were unavailable for a few minutes. | help@ncsa |
help+security@ncsa | COMPLETE | 20210729 0900 | 20210729 1000 | IRST-run bastion hosts (pool A) | Security patching | Hosts managed by IRST will be patched and rebooted. Only hosts in pool A will be patched at this time | 2022 0700 | NCSA VPN | Software Upgrades | The appliances hosting the NCSA VPN were patched. Users experienced a brief disconnect as load was failed over between the appliances. The anyconnect client was upgraded at this time | neteng@ncsa |
help+security@ncsa | COMPLETE | 2021-07-28 1000 | 2021-07-28 1050 | LSST | OS Updates on only NCSA Test Stand (NTS) | Only the LSST NCSA Test Stand (NTS) services hosted at NCSA | |
| | NCSA identity password resets | The password reset process is not completing. | Users' password resets were queued and then applied when the issue was fixed. Users who tried to change their password should find their password is now set to the password of their last attempt. | help@ncsa |
lsst-admin@ncsa | COMPLETE | 2021-07-27 0600 | 2021-07-27 0900 | Jira | Upgrade | Jira will be unavailable | |
| | capnjack (license server) | Changes to IPTABLES | Unknown servers. Licenses affected are IDL, PGI, Intel, MATLAB, Abaqus, Sention LM, Luda, Ansys, CDL, Adaptive, Converge, CFD, RLM Type, rr_ld | meberger@illinois.edu re: SVCPLAN-1465 |
help+service@ncsa.illinois.edu | COMPLETE | 20210726 1800 | 20210727 0000 | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares were unavailable during maintenance. Users were not able to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable. | help+service@ncsa.illinois.edu | complete | 20210721 0800 | 2021-07-21 2900 | ICCP | ICCP Quarterly Maintenance | All ICCP services | 15 0700 | 2022-11-15 1700 | HOLL-I | Quarterly Planned Maintenance | all nodes and services will be unavailable | help@ncsa |
help@campuscluster | complete | 202107-21 15:24 | 2021-07-21 21:50 | ASD Vsphere cluster in 3003 | One of the 4 hypervisors in the cluster panicked. Unscheduled preventative maintenance is being performed on it and the other 3 nodes in the cluster. | After the initial outage at 15:24, there should be no additional outages. | 11-10 0700 | 2022-11-10 1200 | vForge / license servers | Quarterly Planned Maintenance | all nodes and services will be unavailable | help@ncsa |
help+service@ncsa | complete | 202107-13 0700 | 2021-07-13 0800 | cilogon.org | Update to OA4MP v5.1.4. | The OAuth2/OIDC backend of the CILogon Service will be updated to OA4MP v5.1.4. | 11-10 11:00 | 2022-11-10 11:50 | ASD Vsphere, specifically vm's using the tintri storage appliance. | Network connections were upgraded to 25G speed. | There was no disruption of service with this work. | help@ncsa.illinois.edu |
help@cilogon.org | complete | 2021-08 0800 | 2121-08 1000 | OpenAFS | The remaining OpenAFS database servers were upgraded. | No service impacts were seen | help+service@ncsa.illinois.edu | complete | 20210707 0600 | 202107-07 0800 | CILogon AWS Hosted Services | Upgrading AWS RDS Aurora MySQL v5.6 to v5.7 | COmanage Registry and Grouper services hosted by CILogon will be unavailable | 11-04 1930 | SET Taiga | SET caused a failover of tgio02 and then failed back. This fixed the mounting issue. | Clients with taiga currently mounted may experience slow or stopped IO during the failover. Failover completed properly and solved the mounting issue. | set@ncsa.illinois.edu |
help@cilogon.org | complete | 2021-07-01 2140 | 2021-07-01 1430 | Horizon dashboard access was down for the entire period. Cluster networking was down from 1200 to 1430. | Investigations into Horizon dashboard accessibility issues resulted in the application of an incorrect default network gateway for the cluster around noon. This was corrected and networking functionality restored around 1400. Instances began recovering soon thereafter. | Radiant admins believe running instances have recovered on their own but we advise everyone to check their systems and report any issues they see to the help desk |
2022-11-03 1132 | 2022-11-04 1930 | Delta | Taiga filesystem (/taiga/ and /projects/) problem on dt-login01 and dt-login02 | The issue is limited to dt-login01 and dt-login02. Commands attempting to access /taiga/ or /projects/ on these nodes will hang. Users are advised to use dt-login03 or the login.delta.ncsa.illinois.edu "round robin" address UPDATE: dt-login01 and dt-login02 are fully functional again and back in the login.delta.ncsa.illinois.edu DNS "round robin". | help@ncsa.illinois.edu | |
resolved | 20210701 0247
2021-07-01 1300 | Various systems in NPCF, ACB, NCSA | There was a power event in the Champaign-Urbana area at around 2:47AM today. Details about the cause are currently unknown. This event caused disruptions to systems at the NCSA building, NPCF and ACB. Known issues have generally been resolved but there may be unidentified issues lingering. If you encounter any problems, please notify NCSA help desk staff (help@ncsa.illinois.edu). | Multiple systems/services were impacted. All have been recovered and return to normal operations is complete. | 03 0048 | 2022-11-03 0106 | SET Taiga | tgio02 and tgio04 failed over | OSTs on the two nodes were inaccessible until the reboots were complete. This is a known issue with a vendor patch in progress. | set@ncsa.illinois.edu |
NCSA help desk | resolved | 202107-01 02:58 CDT | 2021-07-01 06:00 CDT | ACHE and NGALE bastion hosts | Loss of power. | All ache-* services, ngale bastion hosts | resolved | Matt Kollross
2021-06-29 22:00 | 2021-06-29 23:59 | NCSA 4th Floor Office network | Rebooting one or more of the office switches on the NCSA Building 4th floor to resolve a phone issue. | Office port connectivity will be intermittent during the maintenance window. | 11-02 1700 | 2022-11-02 2000 | DNS Services | Patching for out of cycle security updates. | DNS1 and DNS2 will be patched and rebooted (staggered) to apply needed updates. |
help+neteng@ncsa.illinois.edu | |
resolved | 20210624 0800 | 2021-06-24 1345 | LSST | - Updates are being applied on Prod/Stable k8s, rebuild of some ingress nodes
| Prod/Stable K8S | 01 1800 | 2022-11-02 0000 | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares were unavailable during maintenance. Users were unable to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable. | help@ncsa |
lsst-admin@ncsa | resolved |
2021-06-24 0800 | 2021-06-24 1200 | LSST | LSST Quarterly Maintenance - OS updates on all servers
| All LSST services hosted at NCSA EXCEPT Prod/Stable K8S | lsst-admin@ncsa | complete | Matt Kollross
20210622 0000 | 2021-06-22 0400 | Internet2 WAN link | Internet2 will be migrating NCSA's physical port to their new next generation infrastructure. | During the maintenance, our I2 connection will be down. Traffic will reroute to other connections. Some point to point connections may be unavailable for a period of time. The maintenance window is not expected to take all 4 hours. | 10-25 0900 | NCSA building 1st Floor Wifi / Security Cameras | Tech Services is replacing a networking switch on the 1st floor of the NCSA building that powers the Access Points on the first floor. | This should be a short down time, but the access points will reboot while we migrate cables to the new switch. |
help+neteng@ncsa.illinois.edu | complete |
2021-06-21 1800 | 2021-06-22 0000 | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares were unavailable during maintenance. Users were not able to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable. | help+service@ncsa.illinois.edu | complete |
2021-06-17-0700 | 2021-06-17-0820 | OpenAFS | The OpenAFS database server kaskaskia was upgraded | No service outages were observed or reported. | help+service@ncsa.illinois.edu | complete | 202106-12 2200 | 2021-06-15 1500 | LSST Firewall | The NPCF secondary firewall was offline due to a hard drive failure. | No impact occurred to production services as the primary firewall stayed online. | | 10-18 15:00 | 2022-10-18 15:30 | Radiant instance creation/management | system setting changes | No noticeable impact | pl@illinois.edu | |
RESOLVED | 202106-14 1700 | 2021-06-15 0958 | NCSA GitLab | Attempt to fix an authentication bug for a particular user accidentally broke all authentication through the web interface. | Authentication through the web interface did not work. | 10-18 12:00 | 2022-10-18 12:05 | identity, email to NCSA addresses | system updates | 1 minute window to cause email delays and identity frontend unavailable | cpl@illinois |
help+service@ncsa.illinois | RESOLVED | 20210611 | 20210611 0905 | Jira | Jira email problem | Jira is not accepting issues via email; you can still create issues directly via the Jira GUI | Cameron Pitcel | office firewall upgrade | Upgrading code on the office firewall. | Office networks will be offline during this upgrade. | help+neteng@ncsa.illinois.edu | |
RESOLVED | 20210610 0700 | 20210610 0800 | cilogon.org | Update to OA4MP v5.1.3. | The OAuth2/OIDC backend of the CILogon Service will be updated to OA4MP v5.1.3. | 13 1800 | SSLVPN Maintenance | The second member of the HA pair will be put back into service. | The second member was added with no outage. | help+neteng@ncsa.illinois.edu |
help@cilogon.org | | Jira.ncsa.illinois.edu | Configuration change to address a vulnerability | There should not be any service interruption, but as with all things, it is possible | 2022-10-12 11:00 | 2022-10-12 12:00 | ASD Vsphere, specifically vm's using the tintri storage appliance. | Network connections on the tintri storage box were switched to new hardware but their speed was unchanged. Additional work will need to be scheduled to complete the speed increase. | This had no service impact. | help@ncsa |
help+service@ncsa | Resolved | 20210602 | 20210602 | Netdot | Netdot web access now requires 2FA via SSL VPN, or Cerberus proxy. | Security requested that Netdot require 2FA, in order to access the web interface. To accommodate that request, the Netdot firewall has limited web access to the VPN subnet or via proxy from the Cerberus jump hosts. | NCSA VPN | The NCSA VPN had a member of the HA pair fail and licensing didn't fail over. | Users were unable to connect to the VPN until the licensing issue was resolved. |
Matt Kollross | RESOLVED | 20210525 | 2021-05-26 | vcenters for ache and ASD | emergency security updates were applied. | the administrative interface was off-line for about 20 minutes as the updates were installed. | help+service@ncsa | RESOLVED | 20210526 1000
2021-05-26 1030 | VoIP phones at NPCF | Migrating the VoIP networks to a campus IP to enable future migrations by tech services. | After the networks are migrated, a reboot of all phones at the NPCF building will be performed. | 27 1100 | 2022-09-28 1700 | odd numbered bastion hosts (cerberus1, cerberus3, ache-bastion-1, ngale-bastion-1, etc.) | puppet code refactoring for SSH configs | More changes were pushed out around 5p on 2022-09-28 and we believe the SSHD config issues are resolved. You can use the even numbered (cerberus2, cerberus4) bastions as a work-around if any issues persist. | help+security@ncsa |
Matt Kollross
neteng+help@ncsa | RESOLVED | 20210521 1800
2021-05-21 1900 | VoIP phones at the NCSA building | Migrating the VoIP networks to a campus IP to enable future migrations by tech services. | After the networks are migrated, a reboot of all phones at the NCSA building will be performed. | 28 0930 | 2022-09-28 1050 | Jira outgoing email | outgoing email degraded | Jira failed to send some/most outgoing email during this time frame. |
RESOLVED | 202105-20 05:40 | 2021-05-20 08:45 | LSST | ESXi host outage causing degradation of select services.
09-24 1445 | 2022-09-25 1045 | Granite | Building power outage caused Disk Storage Unit to power cycle | Any user operations on the cluster were interrupted and unavailable until resolution. | bdickin2@illinois.edu | |
2022-09-21 0800 | 2022-09-21 0930 | HOLL-I | Change CS-2 execution mode to Pipelined | Execution mode of the CS-2 was changed from Weight Streaming to Pipelined. | help@ncsa |
Degradation of select services: - data backbone gateway (lsst-dbb-gw01 down)
- HTCondor (Central Manager nodes down for Prod & DAC)
- login (lsst-login01 is down)
Also loss of redundancy for some underlying services, including auth/access & k8s head nodes. | lsst-admin@ncsa | RESOLVED | 20210515 0600 | 2021-05-15 0800 | CILogon hosted services including COmanage Registry, LDAP, SAML proxy, SAML AA, MDQ | Maintenance | All CILogon hosted services were temporarily unavailable. | help@cilogon.org | 202105-12 07:00 | 2021-05-12 08:00 | internal.ncsa.illinois.edu | NCSA Internal Web Server Upgrade (aka Savannah or MIS Tools) | Updates were made that will affect the availability of the NCSA internal website and Savannah system. The system was unavailable during this time. | 09-09 0943 | 2022-09-09 1457 | Jira | outgoing email degraded | Jira failed to send some/most outgoing email during this time frame. | help@ncsa |
help+service@ncsa | 20210511 07:00
2021-05-11 19:00 | iForge | Quarterly Maintenance | All systems unavailable | 08 0700 | 2022-09-08 1010: license servers 2022-09-09 0230: vForge | vForge / license servers | Quarterly Planned Maintenance | all nodes and services will be unavailable | help@ncsa. |
iforge-admin@lists.ncsa. | 2021-05-06 0900 | 2021-05-06 0945 | WAN Link Migration | NCSA Neteng migrated the WAN link to Internet 2 to new hardware. | Traffic was automatically re-routed to redundant paths during the link outage. Any connections relying on layer-2 connections over AL2S saw a brief blip as the connection was cut over. Affected parties were contacted in advance. | | | ASD VM services net | Routing in the switch stacks is being switched from NCSA 3003 to NPCF | All systems on the 141.142.192.x network will be unreachable for up to 5 minutes. | help@ncsa |
help+neteng@ncsa | 20210503 0600 | 2021-05-03 0630 | CILogon Multi-tenant COmanage Registry | Upgrade to version 3.3.2 | The service at https://registry.cilogon.org was unavailable | 31 1800 | 2022-09-01 0700 | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares were unavailable during maintenance. Users were unable to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable. | help@ncsa.illinois.edu |
help@cilogon.org | 2021-04-29 1600 | 2021-04-29 1700 | - HTCondor Prod
- HTcondor DAC
| Add new nodes into Condor service pools | - HTCondor Prod
- HTcondor DAC
| | | Jira | Jira service will be restarted | Jira will not be available | help@ncsa |
lsst-admin@ncsa | 20210421 08:00 | 2021-04-21 20:00 | ICCP | ICCP Quarterly Maintenance | The scheduler will be down. All compute nodes will be converted to rhel7.9 with RedHat IB. | iccp-admins@campuscluster.illinois.edu | |
2021-04-15 1600 | 2021-04-15 1700 | NCSA Opensource | Upgrade of OS on all machines related to opensource | jira, wiki, git etc hosted at https://opensource.ncsa.illinois.edu/ | kooper@illinois.edu | |
2021-04-15 12:25 | 2021-04-15 14:45 | ICI vmware | Several hosts on the vmware service were experiencing timeouts - bluewaters
- bluewaters-test
- internal
- its-nagios
- ldap1
- vcenter
| no or intermittent connectivity to these hosts | help+service@ncsa.illinois.edu | RESOLVED Root cause is still being investigated. |
2021-04-15 0900 | 2021-04-15 0942 | CMDB | Applying new certificates and restarting services | CMDB, including web interface, will be down briefly during the update. | ncsagroup+org_itsm@ncsa.illinois.edu | |
2021-04-15 0900 | 2021-04-15 0920 | WAN Link Migration | NCSA Neteng migrated the WAN link to ESnet to new hardware. | Traffic was automatically re-routed to redundant paths during the link outage. | help+neteng@ncsa.illinois.edu | |
2021-04-14 15:00 | 2021-04-14 15:00 | git.ncsa.illinois.edu | Users can no longer access repositories from git clients over HTTPS using their NCSA password. | NCSA passwords can not access repositories with Git clients. Instead use ssh keys over SSH or personal access tokens over HTTPS. We thought this went into effect during git changes on Nov 2, 2020 but discovered it was still working until we made changes to GitLab to fully remove LDAP functionality. | help+service@ncsa.illinois.edu | |
2021-04-13 1415 | 2021-04-13 1845 | git.ncsa.illinois.edu | The GitLab website at git.ncsa.illinois.edu was having issues with authentication. The LDAP server that it uses was timing out. | - Login to the Git web interface was timing out.
- Access from git clients continued to work during the outage.
| help+service@ncsa.illinois.edu | |
2021-04-13 0800 | 2021-04-13 0830 | cilogon.org | Update to OA4MP v5.1.1. | The OAuth2/OIDC backend of the CILogon Service will be updated to OA4MP v5.1.1. | help@cilogon.org | |
2021-04-12 1800 | 2021-04-12 2245 | File & Print Servers | Monthly Windows File & Print Server Maintenance | Windows File Shares such as HR, Business Office, Home, etc. and printing in the NCSA & NPCF buildings were unavailable. | help+service@ncsa.illinois.edu | |
2021-04-10 0600 | 2021-04-10 0800 | CILogon hosted COmanage, Grouper, SATOSA, LDAP | On Saturday, April 10, the CILogon team will perform maintenance on the infrastructure used for hosted services. | As part of the maintenance all COmanage Registry, LDAP, Grouper, SAML proxy, SAML attribute authority, and MDQ services hosted by CILogon may experience brief outages. We do not expect that any specific service outage will last for more than a minute. | help@cilogon.org | |
2021-04-08 0900 | 2021-04-08 1045 | WAN Link Migration | NCSA Neteng migrated the WAN link to ICCN Node-1 to new hardware. | Traffic was automatically re-routed to redundant paths during the link outage. Issues were noticed by users during the outage and are currently being investigated in cooperation with our upstream provider. | help+neteng@ncsa.illinois.edu | |
2021-04-08 0730 | 2021-04-08 0734 | NCSA Wiki | NCSA's Wiki service was restarted | NCSA's Wiki service was restarted to apply a new SSL certificate and renewed Confluence license. The wiki was not available for 4 minutes while it reloaded. | help+service@ncsa.illinois.edu | |
| 2021-04-07 1733 | Internal Savannah/MIS website | The Savannah/MIS website would not load due to a corrupted MySQL database table referenced across all of the Savannah tools. | Internal/Savannah | help+service@ncsa.illinois.edu | |
1st report 7:30am Monday | 8:19am Monday | NCSA LDAP2 | ldap2 is not responsive to authentication requests | NCSA Jira, any systems using LDAP2 as its only source. | help+service@ncsa.illinois.edu | |
2021-03-30 0800 | 2021-03-30 0845 | DNS1 | A software issue was causing BIND to fail. | DNS was not able to resolve during the period of time. DNS2 remained operational. | neteng+help@ncsa.illinois.edu | |
2021-03-23 2000 | 2021-03-23 2025 | NCSA VPN | The standby VPN hardware was replaced and transitioned into the current VPN cluster. Failover went as expected and firmware was upgraded on the primary after load was shifted to the new standby VPN. | Failover between the appliances occurred without issue and there was no impact to users. | neteng@ncsa.illinois.edu | |
2021-03-18 1230 | 1255 | Jira | Some functionality will be limited due to user limit being reached | Jira | help+service@ncsa.illinois.edu | |
~16:40 | 17:58 | AnyConnect VPN Service | An issue with SSL on the VPN service has caused an issue that has disconnected all users. Network engineering is looking into the issue.
Due to a hardware failure and the VPN not failing over properly to the standby users were unable to connect to the VPN. This was due to an issue with syncing certificates.
During the outage, expect that you won't be able to connect/maintain a connection to the VPN | help+neteng@ncsa.illinois.edu | |
2021-03-16 0950 | 2021-03-16 1000 | CMDB | Will be applying updates per security vetting | CMDB, including web interface, will be down briefly during the update. | ncsagroup+org_itsm@ncsa.illinois.edu | |
2021-03-11 0900 | 2021-03-11 0930 | WAN Link Migration | NCSA Neteng migrated the link to ICCN to new hardware. | Traffic was automatically re-routed to redundant paths during the link outage. | help+neteng@ncsa.illinois.edu | |
2021-03-04 0900 | 2021-03-04 0905 | WAN Link Migration | NCSA Neteng migrated the 100G link to MREN to new hardware. | Traffic was automatically re-routed to redundant paths during the link outage. | help+neteng@ncsa.illinois.edu | |
2021-03-01 22:11 | 2021-03-01 22:47 | NCSA vSphere | About 40 VMs lost connection to their NFS storage. | Several VM-based services were timing out during the issue, including: vSphere management, a kerberos replica, a ldap replica, httpproxy, license servers, NCSA fileserver, Identity message queuing, monitoring. That triggered some of those VMs to switch to use read-only disk, needing to be rebooted later. | service@ncsa.illinois.edu | |
...