...
Previous Outages or Maintenance
2022-03-17 0900 | 2022-04-12 1030 | jira | ldap auths have been sporadically failing. This service is being monitored to determine a root cause. | Jira logins break | help+service@ncsa.illinois.edu | RESOLVED |
2022-04-12 0900 | 2022-04-12 0930 | vsphere.ncsa.illinois.edu | vcenter security updates are being installed | vm management interface will be unavailable for 15 mins. | help@ncsa.illinois.edu | COMPLETE |
2022-04-07 1900 | 2022-04-07 1950 | NCSA VPN | Software Upgrades / SSL Certificate | The appliances hosting the NCSA VPN were patched and received an updated SSL certificate. Users will experience a brief disconnect as load is failed over between the appliances. | neteng@ncsa.illinois.edu | RESOLVED |
2022-04-06 2200 | 2022-04-07 0000 | Some office ports on the second floor. | One of the switches on the second floor is experiencing a software problem and is currently down. Code updates are being applied. | One of the six switches on the second floor is down. Users who are connected to this switch might not receive link. | help+neteng@ncsa.illinois.edu | RESOLVED |
2022-04-06 1530 | 2022-04-07 0630 | All systems which mount/utilize Taiga | A bug involving the multirail functionality caused constant reboots with one of the metadata servers. This resulted in cluster de-stabilization and loss of function. | All lustre/NFS mountpoints to Taiga, Globus to Taiga. | help@ncsa.illinois.edu | RESOLVED |
2022-04-04 0930 | 2022-04-04 1000 | NCSA LDAP | Instantiation of Delta resource OU branch in the NCSA LDAP database with replication testing. | No impact to properly configured systems or searches is expected. | help@ncsa.illinois.edu | COMPLETE |
2022-04-01 0600 | 2022-04-01 0700 | NCSA GitLab | GitLab was updated to latest version | All GitLab services were unavailable for a few minutes. | help+service@ncsa.illinois.edu | COMPLETE |
2022-03-23 1000 | 2022-03-23 1600 | Email Lists | Email lists (lists.ncsa.illinois.edu) are not functioning | Ability to send to email lists. Note: Bounced emails will need to be resent. | help+service@ncsa.illinois.edu | COMPLETE |
2022-03-22 0730hrs | 2022-03-22 0915hrs | ldap - NCSA primary server | OS updates and replication changes | NCSA LDAP primary server will be unavailable, replicas should remain accessible | Timothy Bouvet | COMPLETE |
2022-03-21 0800 | 2022-03-21 0830 | cilogon.org | Migrate CILogon Services to AWS | cilogon.org, demo.cilogon.org, crl.cilogon.org | help@cilogon.org | COMPLETE |
2022-03-19 0100 | 2022-03-19 1500 | Campus Cluster | Cooling units at ACB stopped functioning, and temperatures in the datacenter rose high enough that machines powered off. By the time ICI was informed, cooling had resumed at ACB. ICI then restored service. | All of Campus Cluster | help@campuscluster.illinois.edu | RESOLVED |
2022-03-17 1100 | 2022-03-17 1123 | ASD and ACHE vsphere clusters and ldap1 and ldap2 | certs on ldap1 and ldap2 were updated | logins to ASD and ACHE vsphere were down for 23 minutes. | help@ncsa.illinois.edu | COMPLETE |
2022-03-17 | 2022-03-17 10:01 | Jira | Logins are slow or unsuccessful | Jira login | RESOLVED | |
2022-03-16 1700 | 2022-03-16 1800 | DNS1 | Hardware replacement on DNS1 server. | DNS lookups will be down on the primary DNS server while the hardware is being swapped. DNS2 will remain up. | help+neteng@ncsa.illinois.edu | COMPLETE |
2022-03-14 1800 | 2022-03-15 23:45 | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares were unavailable during maintenance. Users were not able to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable. | help+service@ncsa.illinois.edu | COMPLETE |
2022-03-10 0700hrs | 2022-03-10 1500hrs | Distribution panel DP-5C-1020. Power feed C to the north east corner power panels | De-energizing electrical distribution panel DP-5C-1020 to tie in power cables to Holl-I system | Known resources impacted: Granite (already planned to be offline for maintenance); iForge (cluster offline for the duration); Radiant (cluster online, without power redundancy) | help@ncsa.illinois.edu | COMPLETE |
2022-03-09 0700 | 2022-03-09 0810 | linux.ncsa.illinois.edu (aka public-linux) | Upgrade server to RHEL 8 and add NCSA Duo 2FA authentication | Server was unavailable during maintenance. | help+service@ncsa.illinois.edu | COMPLETE |
2022-03-02 0930 | 2022-03-07 1715 | ICC | Emergency PM UPDATE: We are currently experiencing unforeseen technical issues with the cluster. We are investigating and expect resolution and restoration of all Campus Cluster services by March 3rd 12PM | ICCP filesystem will be offline. Most projects will be impacted. Special arrangements have been made with some to be able to operate to some degree during the outage. | help@campuscluster.illinois.edu | COMPLETE |
2022-03-02 1237 | 2022-03-02 1715 | iforge (iforge.ncsa.illinois.edu) | GPFS issue with interruption of filesystem leading to scheduler pause | 1 running job was aborted, and any new jobs paused during the interruption | help@ncsa.illinois.edu | COMPLETE |
2022-03-02 0600 | 2022-03-02 0630 | Jira | Adding RAM | Jira will be unavailable during maintenance | COMPLETE | |
2022-03-01 1800 | 2022-03-01 1810 | ldap2 server clients of NCSA LDAP | on-line maintenance | restart rsyslog and LDAP after relocating /var/log; clients should have redundant servers configured | Timothy Bouvet | COMPLETE |
2022-02-28 1800 | 2022-02-28 1830 | ldap1 server clients of NCSA LDAP | on-line maintenance. Had to restart rsyslog and LDAP after relocating /var/log | slow response from ldap1 but clients should have redundant servers configured | Timothy Bouvet | COMPLETE |
2022-02-28 0900 | 2022-02-28 1030 | CMDB | V1.7.20220228 Release | CMDB database will be unavailable. ITSM's openDCIM will be down for a short period (~ 5 minutes) while the data is reloaded. | COMPLETE | |
2022-02-26 0730 | 2022-02-26 0750 | NCSA GitLab | GitLab was updated to latest version | All GitLab services were unavailable | help+service@ncsa.illinois.edu | COMPLETE |
2022-02-25-10:00 | 2022-02-25-13:00 | Taiga - CenterWide FS | Full file system outage | All clients mounting Taiga | COMPLETE | |
2022-02-09 1400 | 2022-02-25 1030 | Jira, Internal/Savannah, LDAP, POP, Hosted web servers, virtual classroom, vcenter | The NCSA VMWare cluster is experiencing storage performance issues. -- Update: Adjustments have been made to storage used by the LDAP servers and other non-essential VM instances have been disabled. Testing is indicating that response times have improved and services are working normally again. | We are monitoring services. Please report any issues to help@ncsa.illinois.edu | Timothy Bouvet | RESOLVED FOR NOW |
2022-02-24 1000 | 2022-02-24 1115 | cerberus2.ncsa.illinois.edu, tg-kdc1.security.ncsa.illinois.edu, bwbh2.ncsa.illinois.edu | One of the IRST ESXi machines unexpectedly shut down. | The listed hosts are currently unavailable | COMPLETE | |
2022-02-23 1700 | 2022-02-23 1900 | DNS2 | DNS2 hardware will be replaced. | There will be a brief outage of DNS2 while IPs are migrated to the new server. | help+neteng@ncsa.illinois.edu | COMPLETE |
2022-02-22: 0825 | 2022-02-22: 1324 | Slack | Info from Slack (https://status.slack.com/) We've resolved the issue, and all impacted customers should now be able to access Slack. You may need to reload Slack (Cmd/Ctrl + Shift + R) to see the fix on your end. If that doesn't work, try clearing cache (Help > Troubleshooting > Clear Cache and Restart from the app menu). Thanks for bearing with us and we apologize for the disruption to your work day! Feb 22, 1:24 PM CST We're seeing signs of improvement. Please try reloading Slack, and if not a cache reset. We’re still monitoring the situation. We’ll confirm once this issue is fully resolved. Feb 22, 11:07 AM CST Slack is not loading for some users. We are continuing to investigate the cause and will provide more information as soon as it's available. Feb 22, 9:23 AM CST We're still working towards a full resolution. We'll be back with another update soon. Thank you for your patience. Feb 22, 8:44 AM CST We’re investigating the issue where Slack is not loading for some users. We’re looking into the cause and will provide more information as soon as it's available. Feb 22, 8:25 AM CST | Various issues accessing and using Slack | help@ncsa.illinois.edu | COMPLETE |
2022-02-18 12:10PM | 2022-02-18 | Jira | Reboot to add RAM/swap. This is to improve stability | Jira tickets unavailable | Timothy Bouvet | COMPLETE |
2022-02-10 1030 | 2022-02-18 3:55pm | Ngale filesystem | The Lustre filesystem is not loading correctly. The support team has been contacted. Near completion: Working with vendor on additional configuration changes. Hope to complete final validation and return to service by close of business 2022-02-18. | /ngale filesystem is not accessible. | Peter Hartman | COMPLETE |
2022-02-14 1PM | 2022-02-14 4:15PM | All NCSA LDAP servers | Expanding schema and restarting servers | systems will reconnect to LDAP server after restart | COMPLETE | |
2022-02-09 1000 | 2022-02-09 1200 | Facility UPS | UPS DC voltage calibration | UPS will be taken to maintenance bypass and all connected systems will be fed from unprotected power source (no power interruption). | rantissi@illinois.edu | COMPLETE |
2022-02-09 0900 | 2022-02-09 0940 | Line card failure in Core-East | Line card failure in Core-east, which is resulting in connectivity issues for some infrastructure in NCSA 3003. | DNS2, and LSST systems in 3003 were down until the uplinks could be migrated to a new port on Cores | help+neteng@ncsa.illinois.edu | COMPLETE |
2022-02-01 8AM | 2022-02-01 4PM | Jira/ldap-auth1 | login issues | Jira Access | ||
2022-02-09 0534 | 2022-02-09 0811 | LDAP (and dependent services, incl. Jira) vSphere/ICI VMware | Authorization timeouts/failures in dependent services. ICI staff are investigating. | LDAP (and dependent services, incl. Jira), vSphere/ICI VMware. The cause of the most severe issues was power fluctuations around 0555, but certain LDAP servers showed degradation slightly earlier. | COMPLETE | |
2022-02-09 0600 | 2022-02-09 0645 | NCSA MySQL | MySQL database servers need to be synchronized to bring replicated database servers online. NOTE: The MySQL database is back up, but users may experience issues due to an LDAP issue. | Wiki, JIRA, Savannah/Internal, Identity, and some web sites will stop working. More details are linked here. | help+service@ncsa.illinois.edu | COMPLETE |
2022-02-08 7AM | 2022-02-08 3:15PM | iforge / vforge / license servers | Regular Maintenance | iforge, vforge, license servers | COMPLETE | |
2022-02-08 1000 | 2022-02-08 1245 | CMDB | V1.6.20220207 Release | CMDB database will be unavailable. ITSM's openDCIM will not be impacted. | kimber7@illinois.edu | COMPLETE |
2022-02-04 0600 | 2022-02-04 0640 | NCSA GitLab | GitLab was updated to latest version | All GitLab services were unavailable | help+service@ncsa.illinois.edu | COMPLETE |
2022-02-01 0800 | 2022-02-01 0900 | cilogon.org | Update to OA4MP v5.2.4 | Improvements in the back-end service | help@cilogon.org | COMPLETE |
2022-01-25 | 2022-01-25 | Facility UPS | Replace UPS batteries | All systems with facility UPS feed | rantissi@illinois.edu | COMPLETE |
2022-01-24 1800 | 2022-01-24 20:00 | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares will be unavailable during maintenance. Users will not be able to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing will be unavailable. | help+service@ncsa.illinois.edu | COMPLETE |
2022-01-24 0400 | 2022-01-24 0630 | Failed line card on neo-hpc-1 switch | Line card failure is affecting devices that are plugged into Neo-hpc-1 aggregation switch. We've migrated links off the failed card, to other ports on the same switch. | No services are currently impacted. | help+neteng@ncsa.illinois.edu | IN PROGRESS |
2022-01-19 0800 | 2022-01-19 2000 | ICC | ICC Quarterly Maintenance | All ICC services | COMPLETE | |
2022-01-18 0800 | 2022-01-18 0830 | cilogon.org | Upgrade MyProxy CA servers to CentOS 7 | Upgrade back-end MyProxy CA VMs from CentOS 6 to CentOS 7. No downtime is expected. | help@cilogon.org | COMPLETE |
2022-01-14 0600 | 2022-01-14 1715 | Business IT database had bad data. | A database that NCSA mirrors from campus changed without notice breaking our MIS system. Business IT isolated the issue and corrected the data. | Multiple complex systems have been affected by this data corruption issue. | help+service@ncsa.illinois.edu | RESOLVED |
2022-01-14 0800 | 2022-01-14 1720 | NCSAnet wireless | NCSAnet Wireless was unavailable due to bad data in ldap | Users couldn't connect to the NCSAnet wireless network | help+neteng@ncsa.illinois.edu | RESOLVED |
2022-01-05 1100 | 2022-01-05 1145 | CMDB | Version V1.5.20211223 release | CMDB database will be unavailable for a few moments; openDCIM will be unavailable for a few moments. | kimber7@illinois.edu | COMPLETE |
2021-12-20 1830 | 2021-12-20 2030 | Jira | Version Upgrade to address security issue | Jira will be unavailable | help+service@ncsa.illinois.edu | COMPLETE |
2021-12-17 1300 | 2021-12-17 1340 | CMDB | Version V1.4.20211217 release | CMDB database will be unavailable for a few moments; openDCIM will not be affected. | COMPLETE | |
2021-12-17 0600 | 2021-12-17 0622 | NCSA GitLab | The server was updated with some new Puppet configurations. | GitLab services were unavailable for a few minutes as the SSL certificate for the service was updated. | help+service@ncsa.illinois.edu | COMPLETE |
2021-12-16 1400 | 2021-12-16 1430 | HTTP web proxy: httpproxy.ncsa.illinois.edu | NCSA's general purpose HTTP web proxy server was rebuilt. | HTTP web proxying through httpproxy was unavailable. | help+service@ncsa.illinois.edu | COMPLETE |
2021-12-10 0700 | 2021-12-10 1345 | iForge | InfiniBand switch maintenance | All systems unavailable | iforge-admin@lists.ncsa.illinois.edu | COMPLETE |
2021-12-10 0900 | 2021-12-10 1000 | Bastion Hosts (Production group B) | Patching out of cycle | Bastion Hosts (Production group B) were individually unavailable during reboot | help+security@ncsa.illinois.edu | COMPLETE |
2021-12-09 0900 | 2021-12-09 0931 | Bastion Hosts (Production group A) | Patching out of cycle | Bastion Hosts (Production group A) were individually unavailable during reboot | COMPLETE | |
2021-12-09 0800 | 2021-12-09 0900 | All IDDS services | IDDS Postgres and Ruby on Rails upgrades | All IDDS services | tolbert@illinois.edu | COMPLETE |
2021-12-09 0600 | 2021-12-09 0613 | NCSA GitLab | GitLab was updated to latest version | All GitLab services were unavailable for about 5 minutes | help+service@ncsa.illinois.edu | COMPLETE |
2021-12-07 1400 | 2021-12-07 1443 | LSST | Kubernetes on NTS is not working properly after updates | Kubernetes on NCSA Test Stand | lsst-admin@ncsa.illinois.edu | RESOLVED |
2021-12-07 0800 | 2021-12-07 1400 | LSST | LSST Quarterly Maintenance | All LSST services hosted at NCSA | lsst-admin@ncsa.illinois.edu | COMPLETE |
2021-12-07 0930 | 2021-12-07 1030 | ACHE Firewalls | software maintenance | Firewalls will be upgraded using fail over procedures - no traffic impact expected | James Eyrich - eyrich on slack | COMPLETE |
2021-11-30 0900 | 2021-11-30 1100 | TechServices connectivity at NPCF (wireless, facilities, IRIS, Prox scanners). | Tech Services will be replacing several network devices at NPCF that will impact a variety of services at NPCF. | Tech Services will be replacing 3 devices at NPCF. Along with sporadic wireless outages, some facilities networks (such as IRIS and card readers) will be offline while some equipment is replaced. The main router replacement should only take 5 mins or so. The wireless switches will take 15-20 mins each. | help+neteng@ncsa.illinois.edu |
2021-11-30 1800 | 2021-12-01 00:15 | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares were unavailable during maintenance. Users were not able to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable. | help+service@ncsa.illinois.edu |
2021-11-19 12:52 | 2021-11-19 13:22 | lsst-esx08 | server crashed | The following VMs rebooted: ldap-lsst-ncsa3 | lsst-admin@ncsa.illinois.edu |
2021-11-18 1400 | 2021-11-18 1750 | ICI Metrics & Alerts | Migration to RHEL 8, ASD Puppet control, & CILogon authentication | The viewing of ICI dashboards and the firing of ICI alerts were unavailable during this migration | malone12@illinois.edu, bglick@illinois.edu |
2021-11-11 0925 | 2021-11-11 0940 | NCSA website | Communications launched the newly redesigned NCSA site. | During launch, you may experience some downtime while NCSA’s technical team re-points the URL to the new site. | communications@lists.ncsa.illinois.edu |
2021-11-09 0700 | 2021-11-09 1545 | iForge | Quarterly Maintenance | All systems unavailable | iforge-admin@lists.ncsa.illinois.edu |
2021-11-03 0000 | 2021-11-04 | Netdot SSL Certificate | The SSL certificate for Netdot expired and network engineering replaced it with a new one. | SSL certificate expired. Service remained available throughout the period | help+neteng@ncsa.illinois.edu |
2021-11-03 1100 | 2021-11-03 1400 | ESnet 100G link migration. | ESnet engineers will be migrating NCSA's 100G link to the new ESnet6 infrastructure. | The link will be down during the migration. Traffic will fall back to alternative paths. | help+neteng@ncsa.illinois.edu |
2021-11-03 1100 | 2021-11-03 1120 | NCSA GitLab | GitLab was updated to latest version. | All GitLab services were unavailable | help+service@ncsa.illinois.edu |
2021-11-03 1000 | 2021-11-03 1020 | Core Router Linecard Replacement | Neteng replaced a linecard in one of the core routers | All connections to this linecard are redundant and no outage has been reported. | neteng@ncsa.illinois.edu |
2021-11-02 15:20 | 2021-11-02 16:37 | Production version of DCIM for CMDB (https://ncsa-cmdb.ncsa.illinois.edu) | Invalid certificate issue | (Fixed) The production version of CMDB will be unavailable until new certificate is received and applied. In the interim, the test server (https://ncsa-cmdb-test.ncsa.illinois.edu) has been made available for use, with all current data. | Kimber Blum (kimber7@illinois.edu) |
2021-11-02 0800 | 2021-11-02 0900 | cilogon.org | Update to OA4MP v5.2.3 | Address several small issues in the back-end service | help@cilogon.org |
0600 | 0710 | Jira | Jira Upgrade | Jira | help+service@illinois.edu |
2021-10-25 1800 | 2021-10-26 0018 | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares will be unavailable during maintenance. Users will not be able to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing will be unavailable. | help+service@ncsa.illinois.edu |
2021-10-20 0800 | 2021-10-20 1800 | ICCP | ICCP Quarterly Maintenance | ICCP Cluster nodes only | help@campuscluster.illinois.edu |
2021-10-20 0700 | 2021-10-20 0715 | IDDS | IDDS maintenance (puppet changes) | All IDDS services | idds-admin@ncsa.illinois.edu |
2021-10-15 1230 | 2021-10-15 0713 | NCSA GitLab | Server ran out of disk space | All GitLab services were unavailable | help+service@ncsa.illinois.edu |
2021-10-11 0800 | 2021-10-11 1900 | Nightingale, ACHE | Planned maintenance on the Nightingale cluster and the ache-dist switch | There was an outage for the following services during the maintenance: | help+service@ncsa.illinois.edu |
2021-10-04 1000 | 2021-10-04 1005 | www.ncsa.illinois.edu per-user web directories | Per-user web directories on the main NCSA website are being redirected to a new website dedicated to per-user web directories. | URLs like www.ncsa.illinois.edu/People/* are redirected to their new home at https://users.ncsa.illinois.edu/*. | help+service@ncsa.illinois.edu |
2021-09-30 0800 | 2021-09-30 1200 | LSST | LSST Quarterly Maintenance | All LSST services hosted at NCSA | lsst-admin@ncsa.illinois.edu |
2021-09-29 0800 | 2021-09-29 0900 | cilogon.org | Update to OA4MP v5.2.2 | Update Java database libraries, and address several small issues | help@cilogon.org |
2021-09-29 0800 | 2021-09-29 0813 | CMDB / openDCIM | Installing/upgrading to CMDB release Sep2021 | The openDCIM front end of CMDB will be down for 15-30 minutes |
2021-09-28 0700 | 2021-09-28 1554 | NPCF work on facility power | De-energizing power to transformer TX-4C-1020, pulling and terminating busduct cabling from transformer to room 2020. | One third of Sonexion racks will lose source 1 power (Feed C) and will continue to operate on source 2, degrading reliability by losing power redundancy. |
2021-09-28 0700 | 2021-09-28 0900 | Blue Waters | A rack of scratch lost power during the power outage. | Scratch was partially unavailable due to TOR power resiliency issue. |
2021-09-28 0800 | 2021-09-28 0900 | idp.ncsa.illinois.edu | Assert eduPersonAssurance Cappuccino profile for NCSA Staff | NCSA Staff logging in with the NCSA Identity Provider will be able to get Silver CA certificates from cilogon.org | help+idp@ncsa.illinois.edu |
2021-09-21-14:50 | 2021-09-21-15:02 | vcenter appliance controlling ASD vsphere | vcenter appliance was upgraded | vsphere.ncsa.illinois.edu was off-line for 12 minutes. | help+service@ncsa.illinois.edu |
2021-09-21 0700 | 2021-09-20 1115 | Blue Waters | Power work caused non-redundant switches and misconfigured servers to shut off | Blue Waters Compute, Login and Scheduler | bw-admin@ncsa.illinois.edu |
2021-09-20 1800 | 2021-09-20 2130 | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares were unavailable during maintenance. Users were not able to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable. | help+service@ncsa.illinois.edu |
2021-09-14 0000 | 2021-09-14 0600 | Internet2 WAN circuit | Internet2 will be migrating our WAN circuit to new hardware. | Traffic over that path will reroute while the change happens. We anticipate the migration to take less than 30 mins. | help+neteng@ncsa.illinois.edu |
0600 | 0900 | Wiki | Upgrade to next version | Wiki will be unavailable |
2021-09-09 0600 | 2021-09-09 0700 | NCSA VPN | Software Upgrades | The appliances hosting the NCSA VPN will be patched. Users will experience a brief disconnect as load is failed over between the appliances. | help+neteng@ncsa.illinois.edu |
2021-09-08 1300 | 2021-09-08 1400 | Group prod_b Bastion hosts | Out of cycle patching | Bastion hosts in group prod_b will be patched and rebooted. (see MOTD for group assignment) | help+security@ncsa.illinois.edu |
2021-09-08 0900 | 2021-09-08 1000 | Group prod_a Bastion hosts | Out of cycle patching | Bastion hosts in group prod_a will be patched and rebooted. (see MOTD for group assignment) | help+security@ncsa.illinois.edu |
2021-09-02 9:30 AM | 2021-09-02 1PM | PDU in rack AA81 | We are replacing a PDU in NPCF rack AA81 | All systems in the rack have redundant power connections. No service outages are expected from this work | help+service@ncsa.illinois.edu |
2021-09-01 0700 | 2021-09-01 0800 | cilogon.org | Update to OA4MP v5.2.1 | Device Authorization Grant Flow transactions will be stored in database rather than in memory | help@cilogon.org |
1200 | 1205 | Wiki | Security patch is being applied | Wiki will be down | help+service@ncsa.illinois.edu |
2021-08-25 9:00am | 2021-08-25 6:45pm | Blue Waters | System reboot due to blade fallout coinciding with HSN reroute and SMW not recovering. | All jobs interrupted | jenos@illinois.edu |
2021-08-19 0538 | 2021-08-19 0700 | IRST systems hosted on IRST Node 2 | Storage controller failure, all VMs taken offline | some prod_b systems, and non-redundant services. | eyrich@illinois.edu |
2021-08-19 5:34 | 2021-08-19 6:20 | cilogon.org | Storage controller failure in IRST VM farm | cilogon.org was unreachable until we initiated fail-over to our backup servers at NICS. | help@cilogon.org |
2021-08-18 1136 | 2021-08-18 1156 | NCSA Wiki | Test instance caused interference. | NCSA Wiki | help+service@ncsa.illinois.edu |
2021-08-17 0500 | 2021-08-17 0700 | NCSA/NPCF Wide Area Network | Between 5:00AM and 7:00 AM CDT on 08/17/2021, Campus ICCN Engineers will be upgrading firmware on the ICCN router 710rtr at the Starlight facility in Chicago. | Our peerings with MREN and OmniPoP will go down. All traffic destined for those peerings will reroute via other peerings, so no production impact is expected. | help+neteng@ncsa.illinois.edu |
2021-08-16 1800 | 2021-08-17 0000 | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares will be unavailable during maintenance. Users will not be able to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing will be unavailable. | help+service@ncsa.illinois.edu |
2021-08-12 9:54 | 2021-08-12 1012 | Jira | Attempted snapshot of Jira in vSphere was too intensive for the system | Jira | help+service@illinois.edu |
2021-08-10 2000 | 2021-08-11 0000 | Radiant API and Web access | Radiant cluster name change. | During this time access to the API endpoints and the Horizon web dashboard will be intermittently unavailable. Instances will continue to run and be available over the network with no interruptions. | radiant-admin@ncsa.illinois.edu |
2021-08-10 07:00 | 2021-08-10 17:10 | iForge | Quarterly Maintenance | All systems unavailable | iforge-admin@lists.ncsa.illinois.edu |
2021-08-09 1421 | 2021-08-09 1440 | NCSA Wiki | DB configuration conflict between Wiki & Wiki-Test | NCSA Wiki was inaccessible | help+service@ncsa.illinois.edu |
2021-08-05 1000 | 2021-08-05 1030 | NPCF Core Router - Linecard Reboot | A problem was identified on one of the line cards in our core router requiring a reboot of the linecard. The linecard was successfully rebooted and we will continue monitoring the hardware for further issues. | All connections to this linecard are redundant and there was no impact to users. | neteng@ncsa.illinois.edu |
2021-08-05 0800 | 2021-08-05 1000 | LSST | LSST Emergency OS Patching | LSST services hosted at NCSA except: | lsst-admin@ncsa.illinois.edu |
2021-08-04 0800 | 2021-08-04 1700 | Radiant API and Web access | Installation of new Radiant cluster. Cluster name changes are starting at 1100; this will make the Horizon dashboard unreachable. | During this time access to the API endpoints and the Horizon web dashboard will be intermittently unavailable. Instances will continue to run and be available over the network with no interruptions. | radiant-admin@ncsa.illinois.edu |
2021-08-04 0700 | 2021-08-04 0800 | cilogon.org | Update to OA4MP v5.2.0 | Added support for Device Authorization Grant Flow (RFC 8628) | help@cilogon.org |
2021-08-03 0800 | 2021-08-03 1700 | Radiant API and Web access | Installation of new Radiant cluster | During this time access to the API endpoints and the Horizon web dashboard will be intermittently unavailable. Instances will continue to run and be available over the network with no interruptions. | radiant-admin@ncsa.illinois.edu |
2021-08-03 9:00 am | 2021-08-03 11:30 am | Radiant Cluster | A change was made to the firewall that unintentionally restricted access for instances and other internal cluster communication. | Access to instances and workload | radiant-admin@ncsa.illinois.edu |
2021-07-31 0600 | 2021-07-31 0630 | CILogon hosted services | Infrastructure maintenance | During this time each service hosted by CILogon including COmanage Registry, LDAP, Grouper, SAML proxy, and MDQ will become unavailable for a short time. Each individual service outage will last less than 5 minutes. Services that will not be impacted include: OIDC clients that do not query LDAP for resolving attributes; X.509 certificate issuance and certificate revocation lists; LIGO and GW-Astronomy services. | help@cilogon.org |
2021-07-29 1300 | 2021-07-29 1400 | IRST-run bastion hosts (pool B) | Security patching | Hosts managed by IRST will be patched and rebooted. Only hosts in pool B will be patched at this time | help+security@ncsa.illinois.edu |
2021-07-29 0900 | 2021-07-29 1000 | IRST-run bastion hosts (pool A) | Security patching | Hosts managed by IRST will be patched and rebooted. Only hosts in pool A will be patched at this time | help+security@ncsa.illinois.edu |
2021-07-28 1000 | 2021-07-28 1050 | LSST | OS Updates on only NCSA Test Stand (NTS) | Only the LSST NCSA Test Stand (NTS) services hosted at NCSA | lsst-admin@ncsa.illinois.edu |
2021-07-27 0600 | 2021-07-27 0900 | Jira | Upgrade | Jira will be unavailable |
2021-07-26 1800 | 2021-07-27 0000 | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares were unavailable during maintenance. Users were not able to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable. | help+service@ncsa.illinois.edu |
2021-07-21 0800 | 2021-07-21 2900 | ICCP | ICCP Quarterly Maintenance | All ICCP services | help@campuscluster.illinois.edu |
2021-07-21 15:24 | 2021-07-21 21:50 | ASD vSphere cluster in 3003 | One of the 4 hypervisors in the cluster panicked. Unscheduled preventative maintenance is being performed on it and the other 3 nodes in the cluster. | After the initial outage at 15:24, there should be no additional outages. | help+service@ncsa.illinois.edu |
2021-07-13 0700 | 2021-07-13 0800 | cilogon.org | Update to OA4MP v5.1.4. | The OAuth2/OIDC backend of the CILogon Service will be updated to OA4MP v5.1.4. | help@cilogon.org |
2021-07-08 0800 | 2021-07-08 1000 | OpenAFS | The remaining OpenAFS database servers were upgraded. | No service impacts were seen | help+service@ncsa.illinois.edu |
2021-07-07 0600 | 2021-07-07 0800 | CILogon AWS Hosted Services | Upgrading AWS RDS Aurora MySQL v5.6 to v5.7 | COmanage Registry and Grouper services hosted by CILogon will be unavailable | help@cilogon.org |
2021-07-01 2140 | 2021-07-01 1430 | Horizon dashboard access was down for the entire period. Cluster networking was down from 1200 to 1430. | Investigations into Horizon dashboard accessibility issues resulted in the application of an incorrect default network gateway for the cluster around noon. This was corrected and networking functionality restored around 1400. Instances began recovering soon thereafter. | Radiant admins believe running instances have recovered on their own but we advise everyone to check their systems and report any issues they see to the help desk. | help@ncsa.illinois.edu |
2021-07-01 0247 | 2021-07-01 1300 | Various systems in NPCF, ACB, NCSA | There was a power event in the Champaign-Urbana area at around 2:47AM today. Details about the cause are currently unknown. This event caused disruptions to systems at the NCSA building, NPCF and ACB. Known issues have generally been resolved but there may be unidentified issues lingering. If you encounter any problems, please notify NCSA help desk staff (help@ncsa.illinois.edu). | Multiple systems/services were impacted. All have been recovered and return to normal operations is complete. | NCSA help desk |
2021-06-29 22:00 | 2021-06-29 23:59 | NCSA 4th Floor Office network | Rebooting one or more of the office switches on the NCSA Building 4th floor to resolve a phone issue. | Office port connectivity will be intermittent during the maintenance window. | Matt Kollross | RESOLVED |
2021-06-24 0800 | 2021-06-24 1345 | LSST | | Prod/Stable K8S | lsst-admin@ncsa.illinois.edu | RESOLVED |
2021-06-24 0800 | 2021-06-24 1200 | LSST | LSST Quarterly Maintenance | All LSST services hosted at NCSA EXCEPT Prod/Stable K8S | lsst-admin@ncsa.illinois.edu | COMPLETE |
2021-06-22 0000 | 2021-06-22 0400 | Internet2 WAN link | Internet2 will be migrating NCSA's physical port to their new next-generation infrastructure. | During the maintenance, our I2 connection will be down. Traffic will reroute to other connections. Some point-to-point connections may be unavailable for a period of time. The maintenance is not expected to take the full 4-hour window. | Matt Kollross | COMPLETE |
2021-06-21 1800 | 2021-06-22 0000 | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares were unavailable during maintenance. Users were not able to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable. | help+service@ncsa.illinois.edu | COMPLETE |
2021-06-17 0700 | 2021-06-17 0820 | OpenAFS | The OpenAFS database server kaskaskia was upgraded. | No service outages were observed or reported. | help+service@ncsa.illinois.edu | COMPLETE |
2021-06-12 2200 | 2021-06-15 1500 | LSST Firewall | The NPCF secondary firewall was offline due to a hard drive failure. | No impact occurred to production services as the primary firewall stayed online. | RESOLVED | |
2021-06-14 1700 | 2021-06-15 0958 | NCSA GitLab | An attempt to fix an authentication bug for a particular user accidentally broke all authentication through the web interface. | Authentication through the web interface did not work. | help+service@ncsa.illinois.edu | RESOLVED |
2021-06-11 | 2021-06-11 0905 | NCSA Jira | Jira email problem | Jira is not accepting issues via email; you can still create issues directly via the Jira GUI. | RESOLVED | |
2021-06-10 0700 | 2021-06-10 0800 | cilogon.org | Update to OA4MP v5.1.3. | The OAuth2/OIDC backend of the CILogon Service will be updated to OA4MP v5.1.3. | help@cilogon.org | COMPLETE |
1000 | 1030 | Jira.ncsa.illinois.edu | Configuration change to address a vulnerability | There should not be any service interruption, but as with all things, it is possible | help+service@ncsa.illinois.edu | RESOLVED |
2021-06-02 | 2021-06-02 | Netdot | Netdot web access now requires 2FA via SSL VPN, or Cerberus proxy. | Security requested that Netdot require 2FA, in order to access the web interface. To accommodate that request, the Netdot firewall has limited web access to the VPN subnet or via proxy from the Cerberus jump hosts. | Matt Kollross | RESOLVED |
2021-05-25 | 2021-05-26 | vcenters for ache and ASD | emergency security updates were applied. | the administrative interface was off-line for about 20 minutes as the updates were installed. | help+service@ncsa.illinois.edu | RESOLVED |
2021-05-26 1000 | 2021-05-26 1030 | VoIP phones at NPCF | Migrating the VoIP networks to a campus IP to enable future migrations by tech services. | After the networks are migrated, a reboot of all phones at the NPCF building will be performed. | Matt Kollross | RESOLVED |
2021-05-21 1800 | 2021-05-21 1900 | VoIP phones at the NCSA building | Migrating the VoIP networks to a campus IP to enable future migrations by tech services. | After the networks are migrated, a reboot of all phones at the NCSA building will be performed. | Matt Kollross | RESOLVED |
2021-05-20 05:40 | 2021-05-20 08:45 | LSST | ESXi host outage causing degradation of select services. | Degradation of select services. Also loss of redundancy for some underlying services, including auth/access & k8s head nodes. | lsst-admin@ncsa.illinois.edu | RESOLVED |
2021-05-15 0600 | 2021-05-15 0800 | CILogon hosted services including COmanage Registry, LDAP, SAML proxy, SAML AA, MDQ | Maintenance | All CILogon hosted services were temporarily unavailable. | help@cilogon.org | COMPLETE |
2021-05-12 07:00 | 2021-05-12 08:00 | NCSA Internal Web Server Upgrade (aka Savannah or MIS Tools) | Updates were made that affected the availability of the NCSA internal website and Savannah system. The system was unavailable during this time. | COMPLETE | ||
2021-05-11 07:00 | 2021-05-11 19:00 | iForge | Quarterly Maintenance | All systems unavailable | COMPLETE | |
2021-05-06 0900 | 2021-05-06 0945 | WAN Link Migration | NCSA Neteng migrated the WAN link to Internet2 to new hardware. | Traffic was automatically re-routed to redundant paths during the link outage. Any connections relying on layer-2 connections over AL2S saw a brief blip as the connection was cut over. Affected parties were contacted in advance. | help+neteng@ncsa.illinois.edu | COMPLETE |
2021-05-03 0600 | 2021-05-03 0630 | CILogon Multi-tenant COmanage Registry | Upgrade to version 3.3.2 | The service at https://registry.cilogon.org was unavailable | help@cilogon.org | COMPLETE |
2021-04-29 1600 | 2021-04-29 1700 | | Add new nodes into Condor service pools | | lsst-admin@ncsa.illinois.edu | COMPLETE |
2021-04-21 08:00 | 2021-04-21 20:00 | ICCP | ICCP Quarterly Maintenance | The scheduler will be down. All compute nodes will be converted to rhel7.9 with RedHat IB. | COMPLETE | |
2021-04-15 1600 | 2021-04-15 1700 | NCSA Opensource | Upgrade of OS on all machines related to opensource | jira, wiki, git etc hosted at https://opensource.ncsa.illinois.edu/ | kooper@illinois.edu | COMPLETE |
2021-04-15 12:25 | 2021-04-15 14:45 | ICI vmware | Several hosts on the vmware service were experiencing timeouts | no or intermittent connectivity to these hosts | help+service@ncsa.illinois.edu | RESOLVED Root cause is still being investigated. |
2021-04-15 0900 | 2021-04-15 0942 | CMDB | Applying new certificates and restarting services | CMDB, including web interface, will be down briefly during the update. | ncsagroup+org_itsm@ncsa.illinois.edu | RESOLVED |
2021-04-15 0900 | 2021-04-15 0920 | WAN Link Migration | NCSA Neteng migrated the WAN link to ESnet to new hardware. | Traffic was automatically re-routed to redundant paths during the link outage. | help+neteng@ncsa.illinois.edu | RESOLVED |
2021-04-14 15:00 | 2021-04-14 15:00 | git.ncsa.illinois.edu | Users can no longer access repositories from git clients over HTTPS using their NCSA password. | NCSA passwords can not access repositories with Git clients. Instead use ssh keys over SSH or personal access tokens over HTTPS. We thought this went into effect during git changes on Nov 2, 2020 but discovered it was still working until we made changes to GitLab to fully remove LDAP functionality. | help+service@ncsa.illinois.edu | COMPLETE |
2021-04-13 1415 | 2021-04-13 1845 | git.ncsa.illinois.edu | The GitLab website at git.ncsa.illinois.edu was having issues with authentication. The LDAP server that it uses was timing out. | | help+service@ncsa.illinois.edu | RESOLVED |
2021-04-13 0800 | 2021-04-13 0830 | cilogon.org | Update to OA4MP v5.1.1. | The OAuth2/OIDC backend of the CILogon Service will be updated to OA4MP v5.1.1. | help@cilogon.org | COMPLETE |
2021-04-12 1800 | 2021-04-12 2245 | File & Print Servers | Monthly Windows File & Print Server Maintenance | Windows File Shares such as HR, Business Office, Home, etc. and printing in the NCSA & NPCF buildings were unavailable. | help+service@ncsa.illinois.edu | COMPLETE |
2021-04-10 0600 | 2021-04-10 0800 | CILogon hosted COmanage, Grouper, SATOSA, LDAP | On Saturday, April 10, the CILogon team will perform maintenance on the infrastructure used for hosted services. | As part of the maintenance all COmanage Registry, LDAP, Grouper, SAML proxy, SAML attribute authority, and MDQ services hosted by CILogon may experience brief outages. We do not expect that any specific service outage will last for more than a minute. | help@cilogon.org | COMPLETE |
2021-04-08 0900 | 2021-04-08 1045 | WAN Link Migration | NCSA Neteng migrated the WAN link to ICCN Node-1 to new hardware. | Traffic was automatically re-routed to redundant paths during the link outage. Issues were noticed by users during the outage and are currently being investigated in cooperation with our upstream provider. | help+neteng@ncsa.illinois.edu | COMPLETE |
2021-04-08 0730 | 2021-04-08 0734 | NCSA Wiki | NCSA's Wiki service was restarted | NCSA's Wiki service was restarted to apply a new SSL certificate and renewed Confluence license. The wiki was not available for 4 minutes while it reloaded. | help+service@ncsa.illinois.edu | COMPLETE |
2021-04-07 1610 | 2021-04-07 1733 | Internal Savannah/MIS website | The Savannah/MIS website would not load due to a corrupted MySQL database table referenced across all of the Savannah tools. | Internal/Savannah | help+service@ncsa.illinois.edu | RESOLVED |
1st report 7:30am Monday | 8:19am Monday | NCSA LDAP2 | ldap2 is not responsive to authentication requests | NCSA Jira, any systems using LDAP2 as their only source. | help+service@ncsa.illinois.edu | RESOLVED |
2021-03-30 0800 | 2021-03-30 0845 | DNS1 | A software issue was causing BIND to fail. | DNS was not able to resolve during the period of time. DNS2 remained operational. | neteng+help@ncsa.illinois.edu | RESOLVED |
2021-03-23 2000 | 2021-03-23 2025 | NCSA VPN | The standby VPN hardware was replaced and transitioned into the current VPN cluster. Failover went as expected and firmware was upgraded on the primary after load was shifted to the new standby VPN. | Failover between the appliances occurred without issue and there was no impact to users. | neteng@ncsa.illinois.edu | RESOLVED |
2021-03-18 1230 | 1255 | Jira | Some functionality will be limited due to the user limit being reached | Jira | help+service@ncsa.illinois.edu | RESOLVED |
~16:40 | 17:58 | AnyConnect VPN Service | An issue with SSL on the VPN service disconnected all users. Network engineering is looking into the issue. Due to a hardware failure and the VPN not failing over properly to the standby, users were unable to connect to the VPN. This was due to an issue with syncing certificates. | During the outage, expect that you won't be able to connect to or maintain a connection to the VPN. | help+neteng@ncsa.illinois.edu | RESOLVED |
2021-03-16 0950 | 2021-03-16 1000 | CMDB | Will be applying updates per security vetting | CMDB, including web interface, will be down briefly during the update. | ncsagroup+org_itsm@ncsa.illinois.edu | RESOLVED |
2021-03-11 | 2021-03-11 | WAN Link Migration | NCSA Neteng migrated the link to ICCN to new hardware. | Traffic was automatically re-routed to redundant paths during the link outage. | help+neteng@ncsa.illinois.edu | RESOLVED |
2021-03-04 | 2021-03-04 | WAN Link Migration | NCSA Neteng migrated the 100G link to MREN to new hardware. | Traffic was automatically re-routed to redundant paths during the link outage. | help+neteng@ncsa.illinois.edu | RESOLVED |
2021-03-01 22:11 | 2021-03-01 22:47 | NCSA vSphere | About 40 VMs lost connection to their NFS storage. | Several VM-based services were timing out during the issue, including: vSphere management, a Kerberos replica, an LDAP replica, httpproxy, license servers, NCSA fileserver, Identity message queuing, monitoring. The timeouts caused some of those VMs to switch to read-only disks, so they needed to be rebooted later. | service@ncsa.illinois.edu | RESOLVED |
...