...
Start | End | What System/Service is affected | What is happening? | What will be affected? | Contact Person | Status |
---|---|---|---|---|---|---|
2022-03-17 0908 | | Jira | Logins are slow or unsuccessful. | | | Investigating |
| | | ICC | I/O hanging: GPFS servers could not communicate with compute nodes, and some compute nodes were expelled. Nodes are now returning to service. The filesystem and scheduler survived without significant interruption, except on a limited number of nodes. | Some Slurm compute nodes | help@campuscluster.illinois.edu | |
...
Start | End | What System/Service was affected? | What happened? | What was affected? | Contact Person | Status |
---|---|---|---|---|---|---|
2022-03-17 | 2022-03-17 1001 | Jira | Logins were slow or unsuccessful. | Jira login |
2022-03-16 1700 | 2022-03-16 1800 | DNS1 | Hardware replacement on the DNS1 server. | DNS lookups were served by DNS2, which remained up while the DNS1 hardware was swapped. | help+neteng@ncsa.illinois.edu |
2022-03-14 1800 | 2022-03-15 2345 | NCSA File & Print Servers | Scheduled Windows Server maintenance | File & Print Shares were unavailable during maintenance. Users were unable to access shares on Fileserver (e.g. home, busnoff, hr), and printing was unavailable. | help+service@ncsa.illinois.edu |
2022-03-10 0700 | 2022-03-10 1500 | Distribution panel DP-5C-1020 (power feed C to the northeast-corner power panels) | De-energized electrical distribution panel DP-5C-1020 to tie in power cables for the Holl-I system. | Known resources impacted: Granite (already planned to be offline for maintenance); iForge (cluster offline for the duration); Radiant (cluster online, without power redundancy). | help@ncsa.illinois.edu |
2022-03-09 0700 | 2022-03-09 0810 | linux.ncsa.illinois.edu (aka public-linux) | Upgraded the server to RHEL 8 and added NCSA Duo two-factor authentication. | The server was unavailable during maintenance. | help+service@ncsa.illinois.edu |
2022-03-02 0930 | 2022-03-07 1715 | ICC | Emergency PM. UPDATE: We are experiencing unforeseen technical issues with the cluster. We are investigating and expect to resolve them and restore all Campus Cluster services by March 3, 12 PM. | The ICCP filesystem was offline. Most projects were impacted. Special arrangements were made with some projects so they could operate to some degree during the outage. | help@campuscluster.illinois.edu |
2022-03-02 1237 | 2022-03-02 1715 | iforge (iforge.ncsa.illinois.edu) | A GPFS issue interrupted the filesystem, causing the scheduler to pause. | One running job was aborted, and new jobs were paused during the interruption. | help@ncsa.illinois.edu |
2022-03-02 0600 | 2022-03-02 0630 | Jira | Added RAM | Jira was unavailable during maintenance. |
2022-03-01 1800 | 2022-03-01 1810 | ldap2 server; clients of NCSA LDAP | Online maintenance: restarted rsyslog and LDAP after relocating /var/log. | Clients should have redundant servers configured. | Timothy Bouvet |
2022-02-28 1800 | 2022-02-28 1830 | ldap1 server; clients of NCSA LDAP | Online maintenance: had to restart rsyslog and LDAP after relocating /var/log. | Slow response from ldap1, but clients should have redundant servers configured. | Timothy Bouvet |
2022-02-28 0900 | 2022-02-28 1030 | CMDB | V1.7.20220228 release | The CMDB database was unavailable. ITSM's openDCIM was down briefly (~5 minutes) while the data was reloaded. |
2022-02-26 0730 | 2022-02-26 0750 | NCSA GitLab | GitLab was updated to the latest version. | All GitLab services were unavailable. | help+service@ncsa.illinois.edu |
2022-02-25 1000 | 2022-02-25 1300 | Taiga (Center-Wide FS) | Full filesystem outage | All clients mounting Taiga |
2022-02-09 1400 | 2022-02-25 1030 | Jira, Internal/Savannah, LDAP, POP, hosted web servers, virtual classroom, vCenter | The NCSA VMware cluster experienced storage performance issues. Update: Adjustments have been made to the storage used by the LDAP servers, and other non-essential VM instances have been disabled. Testing indicates that response times have improved and services are working normally again. | We are monitoring services. Please report any issues to help@ncsa.illinois.edu. | Timothy Bouvet |
2022-02-24 1000 | 2022-02-24 1115 | cerberus2.ncsa.illinois.edu, tg-kdc1.security.ncsa.illinois.edu, bwbh2.ncsa.illinois.edu | One of the IRST ESXi machines unexpectedly shut down. | The listed hosts were unavailable. |
2022-02-23 1700 | 2022-02-23 1900 | DNS2 | DNS2 hardware was replaced. | DNS2 had a brief outage while IPs were migrated to the new server. | help+neteng@ncsa.illinois.edu |
2022-02-22 0825 | 2022-02-22 1324 | Slack | Info from Slack (https://status.slack.com/): We've resolved the issue, and all impacted customers should now be able to access Slack. You may need to reload Slack (Cmd/Ctrl + Shift + R) to see the fix on your end. If that doesn't work, try clearing cache (Help > Troubleshooting > Clear Cache and Restart from the app menu). Thanks for bearing with us and we apologize for the disruption to your work day! (Feb 22, 1:24 PM CST) We're seeing signs of improvement. Please try reloading Slack, and if not a cache reset. We’re still monitoring the situation. We’ll confirm once this issue is fully resolved. (Feb 22, 11:07 AM CST) Slack is not loading for some users. We are continuing to investigate the cause and will provide more information as soon as it's available. (Feb 22, 9:23 AM CST) We're still working towards a full resolution. We'll be back with another update soon. Thank you for your patience. (Feb 22, 8:44 AM CST) We’re investigating the issue where Slack is not loading for some users. We’re looking into the cause and will provide more information as soon as it's available. (Feb 22, 8:25 AM CST) | Various issues accessing and using Slack | help@ncsa.illinois.edu |
2022-02-18 1210 | 2022-02-18 | Jira | Rebooted to add RAM/swap to improve stability. | Jira tickets were unavailable. | Timothy Bouvet |
2022-02-10 1030 | 2022-02-18 1555 | Ngale filesystem | The Lustre filesystem was not loading correctly, and the support team was contacted. Update (near completion): working with the vendor on additional configuration changes; final validation and return to service were expected by close of business 2022-02-18. | The /ngale filesystem was not accessible. | Peter Hartman |
2022-02-14 1300 | 2022-02-14 1615 | All NCSA LDAP servers | Expanded the schema and restarted the servers. | Systems reconnected to the LDAP servers after the restart. |
2022-02-09 1000 | 2022-02-09 1200 | Facility UPS | UPS DC voltage calibration | The UPS was placed in maintenance bypass, and all connected systems were fed from an unprotected power source (no power interruption). | rantissi@illinois.edu |
2022-02-09 0900 | 2022-02-09 0940 | Line card failure in Core-East | A line card failure in Core-East resulted in connectivity issues for some infrastructure in NCSA 3003. | DNS2 and LSST systems in 3003 were down until the uplinks could be migrated to a new port on Cores. | help+neteng@ncsa.illinois.edu |
2022-02-01 0800 | 2022-02-01 1600 | Jira/ldap-auth1 | Login issues | Jira access |
2022-02-09 0534 | 2022-02-09 0811 | LDAP (and dependent services, incl. Jira); vSphere/ICI VMware | Authorization timeouts/failures in dependent services. ICI staff investigated. The most severe issues were caused by power fluctuations around 0555, but certain LDAP servers showed degradation slightly earlier. | LDAP (and dependent services, incl. Jira); vSphere/ICI VMware |
2022-02-09 0600 | 2022-02-09 0645 | NCSA MySQL | MySQL database servers needed to be synchronized to bring the replicated database servers online. NOTE: The MySQL database is back up, but users may experience problems caused by an LDAP issue. | Wiki, Jira, Savannah/Internal, Identity, and some websites stopped working. | help+service@ncsa.illinois.edu |
2022-02-08 0700 | 2022-02-08 1515 | iforge / vforge / license servers | Regular maintenance | iforge, vforge, license servers |
2022-02-08 1000 | 2022-02-08 1245 | CMDB | V1.6.20220207 release | The CMDB database was unavailable. ITSM's openDCIM was not impacted. | kimber7@illinois.edu |
2022-02-04 0600 | 2022-02-04 0640 | NCSA GitLab | GitLab was updated to the latest version. | All GitLab services were unavailable. | help+service@ncsa.illinois.edu |
2022-02-01 0800 | 2022-02-01 0900 | cilogon.org | Updated to OA4MP v5.2.4 | Improvements to the back-end service | help@cilogon.org |
2022-01-25 | 2022-01-25 | Facility UPS | Replaced UPS batteries | All systems with a facility UPS feed | rantissi@illinois.edu |
2022-01-24 1800 | 2022-01-24 2000 | NCSA File & Print Servers | Scheduled Windows Server maintenance | File & Print Shares were unavailable during maintenance. Users were unable to access shares on Fileserver (e.g. home, busnoff, hr), and printing was unavailable. | help+service@ncsa.illinois.edu |
2022-01-24 0400 | 2022-01-24 0630 | Failed line card on neo-hpc-1 switch | A line card failure affected devices plugged into the neo-hpc-1 aggregation switch. Links were migrated off the failed card to other ports on the same switch. | No services are currently impacted. | help+neteng@ncsa.illinois.edu |
2022-01-19 0800 | 2022-01-19 2000 | ICC | ICC Quarterly Maintenance | All ICC services |
2022-01-18 0800 | 2022-01-18 0830 | cilogon.org | Upgraded MyProxy CA servers to CentOS 7 | Back-end MyProxy CA VMs were upgraded from CentOS 6 to CentOS 7. No downtime was expected. | help@cilogon.org |
2022-01-14 0600 | 2022-01-14 1715 | Business IT database (bad data) | A database that NCSA mirrors from campus changed without notice, breaking our MIS system. Business IT isolated the issue and corrected the data. | Multiple complex systems were affected by this data corruption. | help+service@ncsa.illinois.edu |
2022-01-14 0800 | 2022-01-14 1720 | NCSAnet wireless | NCSAnet wireless was unavailable due to bad data in LDAP. | Users couldn't connect to the NCSAnet wireless network. | help+neteng@ncsa.illinois.edu |
2022-01-05 1100 | 2022-01-05 1145 | CMDB | Version V1.5.20211223 release | The CMDB database and openDCIM were each unavailable for a few moments. | kimber7@illinois.edu |
2021-12-20 1830 | 2021-12-20 2030 | Jira | Version upgrade to address a security issue | Jira was unavailable. | help+service@ncsa.illinois.edu |
2021-12-17 1300 | 2021-12-17 1340 | CMDB | Version V1.4.20211217 release | The CMDB database was unavailable for a few moments; openDCIM was not affected. |
2021-12-17 0600 | 2021-12-17 0622 | NCSA GitLab | The server was updated with some new Puppet configurations. | GitLab services were unavailable for a few minutes while the SSL certificate for the service was updated. | help+service@ncsa.illinois.edu |
2021-12-16 1400 | 2021-12-16 1430 | HTTP web proxy: httpproxy.ncsa.illinois.edu | NCSA's general purpose HTTP web proxy server was rebuilt. | HTTP web proxying through httpproxy was unavailable. | help+service@ncsa.illinois.edu |
2021-12-10 0700 | 2021-12-10 1345 | iForge | InfiniBand switch maintenance | All systems unavailable | iforge-admin@lists.ncsa.illinois.edu |
2021-12-10 0900 | 2021-12-10 1000 | Bastion Hosts (Production group B) | Patching out of cycle | Bastion Hosts (Production group B) were individually unavailable during reboot | help+security@ncsa.illinois.edu |
2021-12-09 0900 | 2021-12-09 0931 | Bastion Hosts (Production group A) | Patching out of cycle | Bastion Hosts (Production group A) were individually unavailable during reboot |
2021-12-09 0800 | 2021-12-09 0900 | All IDDS services | IDDS Postgres and Ruby on Rails upgrades | All IDDS services | tolbert@illinois.edu |
2021-12-09 0600 | 2021-12-09 0613 | NCSA GitLab | GitLab was updated to the latest version. | All GitLab services were unavailable for about 5 minutes. | help+service@ncsa.illinois.edu |
2021-12-07 1400 | 2021-12-07 1443 | LSST | Kubernetes on NTS was not working properly after updates. | Kubernetes on NCSA Test Stand | lsst-admin@ncsa.illinois.edu |
2021-12-07 0800 | 2021-12-07 1400 | LSST | LSST Quarterly Maintenance | All LSST services hosted at NCSA | lsst-admin@ncsa.illinois.edu |
2021-12-07 0930 | 2021-12-07 1030 | ACHE Firewalls | Software maintenance | Firewalls were upgraded using failover procedures; no traffic impact was expected. | James Eyrich (eyrich on Slack) |
...