
Start | End | What System/Service was affected? | What happened? | What was affected? | Contact Person | Status
2023-03-23 0843 | 2023-03-23 1030 | DHCP serving NCSAnet wireless and NCSA office wired wall jacks | The main NCSA DHCP server stopped answering queries and was restarted. | If you didn't already have a DHCP lease, your system would have been unable to connect to NCSAnet or register on an office wired wall jack. | neteng@ncsa.illinois.edu

Status
colourGreen
titleComplete

2023-03-15 1800 | 2023-03-16 2300 | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares were unavailable during maintenance. Users were unable to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable. | help@ncsa.illinois.edu

Status
colourGreen
titleComplete

2023-03-14 1100 | 2023-03-14 1150 | Authentication to vsphere.ncsa.illinois.edu and ache-vcenter will fail | Replacing SSL certs on Ldap1/2 | Ldap will be restarted on Ldap1/2 | tbouvet@illinois.edu

Status
colourGreen
titleComplete

2023-03-09 0700

2023-03-09 17:20

vForge / license servers | Quarterly Planned Maintenance

all nodes and services will be unavailable

help@ncsa.illinois.edu

Status
colourGreen
titleComplete


03/09/2023 0800

03/09/2023 1713 | NCSA Taiga & Granite | Taiga Service Node Updates & Granite Upgrade | Taiga Public LNET router was upgraded and a second one added; access via public LNET was down from 0800 to 1100. Globus and NFS services were patched in a rolling/online fashion.

Granite experienced a short full downtime as we upgraded its software.  
set@ncsa.illinois.edu 

Status
colourGreen
titleComplete

03/07/2023 8:30am | 03/07/2023 10:15am | Delta HSN | The HSN was dropping nodes and not allowing nodes to reconnect | High Speed Connectivity | help@ncsa.illinois.edu
Status
colourGreen
titleComplete
2023-03-01: 1100

2023-03-01: 1115

Radiant OpenStack Services | Changes to the OpenStack controller node to address networking performance issues

All OpenStack services were restarted to apply system configuration changes. The work was completed successfully and all services are available again.

help@ncsa.illinois.edu

Status
colourGreen
titleComplete

2023-02-25

2023-02-27 0930

NCSA email | A mail loop caused routing and processing problems. | Mail routing and delivery was blocked. | help@ncsa.illinois.edu

Status
colourGreen
titlecomplete

2023-02-21 0700

2023-02-21 1640

HOLL-I | Quarterly Planned Maintenance | all nodes and services will be unavailable | help@ncsa.illinois.edu

Status
colourGreen
titlecomplete

2023-02-16 ~14:15 | 2023-02-16 ~14:25 | cerberus4 | misconfiguration caused roughly 50% of connections to be dropped | 50% of connections in and out dropped | help+security@ncsa

Status
colourGreen
titlecomplete

2023-02-10 0910 | 2023-02-10 0915 | users.ncsa.illinois.edu web site | restarting the system | no web pages from users.ncsa.illinois.edu will be available | help@ncsa.illinois.edu

Status
colourGreen
titlecomplete

02/08/2023 1800 | 02/09/2023 0000 | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares were unavailable during maintenance. Users were unable to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable. | help@ncsa.illinois.edu

Status
colourGreen
titlecomplete

1215

1230

Jira | Jira will be restarted to fix stuck notification emails. | Jira will be unavailable during this time.

Andrew Loftus 

Also posted to #announce (Slack)

Status
colourGreen
titlecomplete

 11:18

15:02

ICCP head node login and golub compute resources | Lost network connectivity for golub infrastructure | ICCP head node logins (i.e., cc-login.campuscluster.illinois.edu) and golub compute resources

help@campuscluster.illinois.edu

Status
colourGreen
titleresolved

1200

1300

Jira | Jira offline for service restart to fix stuck emails. | Jira will be unavailable during this time. | help@ncsa.illinois.edu

Status
colourGreen
titleCompleted

01/25/2023 0800

01/25/2023 0830

NCSA LDAP | rolling LDAP restarts of redundant servers to deploy new schema file

Minimal impact for service restarts

Status
colourGreen
titleCompleted

2023-01-19 1310 | 2023-01-19 1330 | Jira | Jira offline for reboot to fix Boards. | Jira will be unavailable during this time. | help@ncsa.illinois.edu

Status
subtletrue
colourGreen
titleComplete

  0800

1700

ICCP | ICCP Quarterly Maintenance | All ICCP services

help@campuscluster.illinois.edu

Status
colourGreen
titleresolved

2023-01-13 1200 | 2023-01-13 1230 | Jira | Jira offline for dashboard fixes. | Jira will be unavailable during this time. | help@ncsa.illinois.edu

Status
subtletrue
colourGreen
titlecompleted

2023-01-12 0800 | 2023-01-13 1230 | Jira | Minor issues noticed in Jira, likely caused by the upgrade yesterday evening. Gadgets and dashboards are having issues.

Status
subtletrue
colourGreen
titleresolved

2023-01-11 0700 | 2023-01-12 1200 | Nightingale | Quarterly Planned Maintenance

All Nightingale servers and services will be unavailable (other than the ngale-bastion* nodes)

Maintenance has been extended until noon Thu, Jan 12 due to complications with a firmware update on the Lustre storage appliance.

help@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETED

2023-01-12 0700 | 2023-01-12 0715 | NCSA VPN Router Migration | The NCSA VPN was migrated to a different upstream router. | Users were briefly disconnected. | help+neteng@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETED

  0600

  0615

NCSA GitLab | GitLab upgrade | All GitLab services were unavailable for a few minutes while it upgraded to the latest version.

help@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETED

06:40 1/9/2023 | 2023-01-11 2100 | vSphere in 3003 | One of the storage appliances serving vsphere.ncsa.uiuc.edu started having access issues. This has caused issues with 19 VMs.

crashplan has returned to service

help@ncsa.illinois.edu
Status
colourGreen
titleCOMPLETED


2023-01-11 1730 | 2023-01-11 1915 | Jira | Jira software upgrade | Jira will be unavailable while software upgrades are applied. | help@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETED

1/9/2023 6:40am | 1/11/2023 various | vSphere in 3003 | One of the storage appliances serving vsphere.ncsa.uiuc.edu had access issues. Data was moved to different storage for affected VMs. | digitalag.ncsa.illinois.edu, gecat, reu.ncsa.illinois.edu, ACIpartnership.org, astro, edream, caiiwp, brainstormhpcd.org, internal-dev, cmdb-dev-kimber7, reu-international.ncsa.illinois.edu, avl-test, mharp - ergo, infews-er.net, ncsa30, bluewaters - 2018-03-05. | help@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETED

12/23/2022 6:30pm | 12/27/2022 1:30pm | Taiga | Single OST is failing to re-mount following failover | File system is unavailable | set@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETED

 0530

 0600

Wireless at NCSA building. | Router Upgrade | Tech Services will be upgrading their NCSA building router, which will affect wireless at the NCSA building. Downtime is estimated at 15 mins. | help+neteng@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETED

0800

1200

Radiant | System maintenance

OpenStack:

  • "Minor system configuration changes will be made to increase system logging and optimize memory usage/allocation across nodes. No noticeable impact to end users is expected."

Networking:
  • Swap fiber links to correct issue with security taps: To minimize user impact, we will swap one link at a time. Users should see no impact. There is a slight possibility of a temporary network outage lasting a few minutes, but we do not currently anticipate this.
  • Update Ethernet switch firmware: Switch reboots will be done in a rolling fashion and so are not expected to be disruptive to ongoing operations (due to switch/path redundancy).
help@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETED

15 Dec 2022 0900 | 15 Dec 2022 0935 | NCSA Kerberos | NCSA's Read-Write KDC is being upgraded | Password changes and new accounts are being queued for completion after the upgrade. | help@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETED

  0600

  0615

NCSA GitLab | GitLab was upgraded to latest version | All GitLab services were unavailable for a few minutes.

help@ncsa.illinois.edu

Status
colourGreen
titlecompleted

12-01-2022 0600 | 12-01-2022 0700 | NCSA VPN | Software Upgrades | The appliances hosting the NCSA VPN were patched. Users experienced a brief disconnect as load was failed over between the appliances. The AnyConnect client was upgraded at this time. | neteng@ncsa.illinois.edu

Status
colourGreen
titleResolved

1130

1400

NCSA identity password resets | The password reset process is not completing. | Users' password resets were queued and then applied when the issue was fixed. Users who tried to change their password should find their password is now set to the password of their last attempt. | help@ncsa.illinois.edu

Status
colourGreen
titleresolved

 

 

capnjack (license server) | Changes to IPTABLES | Unknown servers. Licenses affected are IDL, PGI, Intel, MATLAB, Abaqus, Sention LM, Luda, Ansys, CDL, Adaptive, Converge, CFD, RLM Type, rr_ld

meberger@illinois.edu

re: SVCPLAN-1465

Status
colourGreen
titleCompleted

2022-11-16 1042 | 2022-11-16 1351 | CILogon | Docker Swarm failure | CILogon services were unavailable. See: https://cilogon.statuspage.io/incidents/2blf564965s0 | help@cilogon.org

Status
colourGreen
titleResolved

2022-11-15 0700 | 2022-11-15 1700 | HOLL-I | Quarterly Planned Maintenance | all nodes and services will be unavailable | help@ncsa.illinois.edu

Status
colourGreen
titlecompleted

2022-11-10 0700

2022-11-10 1200

vForge / license servers | Quarterly Planned Maintenance

all nodes and services will be unavailable

help@ncsa.illinois.edu

Status
colourGreen
titleCompleted

2022-11-10 11:00 | 2022-11-10 11:50 | ASD Vsphere, specifically VMs using the tintri storage appliance. | Network connections were upgraded to 25G speed. | There was no disruption of service with this work. | help@ncsa.illinois.edu

Status
colourGreen
titlecompleted

2022-11-07 0900 | 2022-11-07 0958 | set-analytics.ncsa.illinois.edu | Physical Machine Move from 3003 to NPCF | The SET managed Grafana/InfluxDB instance will be unavailable | set@ncsa.illinois.edu

Status
colourGreen
titlecompleted

2022-11-04 1900 | 2022-11-04 1930 | SET Taiga | SET caused a failover of tgio02 and then failed back. This fixed the mounting issue. | Clients with taiga currently mounted may experience slow or stopped IO during the failover. Failover completed properly and solved the mounting issue. | set@ncsa.illinois.edu

Status
colourGreen
titlecompleted

2022-11-03 1132 | 2022-11-04 1930 | Delta

Taiga filesystem (/taiga/ and /projects/) problem on dt-login01 and dt-login02

The issue is limited to dt-login01 and dt-login02. Commands attempting to access /taiga/ or /projects/ on these nodes will hang.

Users are advised to use dt-login03 or the login.delta.ncsa.illinois.edu "round robin" address

UPDATE: dt-login01 and dt-login02 are fully functional again and back in the login.delta.ncsa.illinois.edu DNS "round robin".

help@ncsa.illinois.edu

Status
colourGreen
titlecompleted

2022-11-03 0048 | 2022-11-03 0106 | SET Taiga | tgio02 and tgio04 failed over | OSTs on the two nodes were inaccessible until the reboots were complete. This is a known issue with a vendor patch in progress. | set@ncsa.illinois.edu
Status
colourGreen
titlecompleted


2022-11-03 0600 | 2022-11-03 0615 | NCSA GitLab | GitLab was upgraded to latest version | All GitLab services were unavailable for a few minutes.

help@ncsa.illinois.edu

Status
colourGreen
titlecompleted

2022-11-02 1700 | 2022-11-02 2000 | DNS Services | Patching for out of cycle security updates. | DNS1 and DNS2 will be patched and rebooted (staggered) to apply needed updates. | help+neteng@ncsa.illinois.edu

Status
colourGreen
titlecompleted

2022-11-01 1800 | 2022-11-02 0000 | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares were unavailable during maintenance. Users were unable to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable. | help@ncsa.illinois.edu

Status
colourGreen
titlecompleted

0800

0830

idp.ncsa.illinois.edu | Enable Duo Universal Prompt | NCSA Identity Provider will now use Duo Universal Prompt | help+idp@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETED

2022-10-25 0800 | 2022-10-25 0900 | NCSA building 1st Floor Wifi / Security Cameras | Tech Services is replacing a networking switch on the 1st floor of the NCSA building that powers the Access Points on the first floor. | This should be a short downtime, but the access points will reboot while we migrate cables to the new switch. | help+neteng@ncsa.illinois.edu

Status
colourGreen
titlecompleted

2022-10-19 07:00 | 2022-10-20 07:15 | Some SSH Bastion Hosts | Out-of-Cycle reboot needed after failed patching.
Will reboot tomorrow at 07:00am
bwbh1.ncsa.illinois.edu
bwbh3.ncsa.illinois.edu
cerberus1.ncsa.illinois.edu
cerberus3.ncsa.illinois.edu
ache-bastion-1.ncsa.illinois.edu
ngale-bastion-1.ncsa.illinois.edu
help+security@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETED

2022-10-18 15:00 | 2022-10-18 15:30 | Radiant instance creation/management | system setting changes | No noticeable impact | pl@illinois.edu

Status
colourGreen
titlecomplete

2022-10-18 12:00 | 2022-10-18 12:05 | identity, email to NCSA addresses | system updates | 1 minute window to cause email delays and identity frontend unavailable | cpl@illinois.edu

Status
colourGreen
titlecomplete

2022-10-14 2200 | 2022-10-15 | NCSA office firewall upgrade | Upgrading code on the office firewall. | Office networks will be offline during this upgrade. | help+neteng@ncsa.illinois.edu

Status
colourGreen
titlecomplete

2022-10-13 1700 | 2022-10-13 1800 | SSLVPN Maintenance | The second member of the HA pair will be put back into service. | The second member was added with no outage. | help+neteng@ncsa.illinois.edu

Status
colourGreen
titlecomplete

2022-10-12 11:00 | 2022-10-12 12:00 | ASD Vsphere, specifically VMs using the tintri storage appliance. | Network connections on the tintri storage box were switched to new hardware but their speed was unchanged. Additional work will need to be scheduled to complete the speed increase. | This had no service impact. | help@ncsa.illinois.edu

Status
colourYellow
titleINCOMPLETE

2022-10-10 0000 | 2022-10-10 1040 | NCSA VPN | The NCSA VPN had a member of the HA pair fail and licensing didn't fail over. | Users were unable to connect to the VPN until the licensing issue was resolved. | help+neteng@ncsa.illinois.edu

Status
colourGreen
titlecomplete

2022-10-03 0800 | 2022-10-03 0845 | HOLL-I | install security updates and reboot
help@ncsa.illinois.edu

Status
colourGreen
titlecomplete

2022-09-30 0600 | 2022-09-30 0615 | NCSA GitLab | GitLab was upgraded to latest version | All GitLab services were unavailable for a few minutes.

help@ncsa.illinois.edu

Status
colourGreen
titlecomplete

2022-09-27 1100 | 2022-09-28 1700 | odd numbered bastion hosts (cerberus1, cerberus3, ache-bastion-1, ngale-bastion-1, etc.) | puppet code refactoring for SSH configs

More changes were pushed out around 5pm on 2022-09-28 and we believe the SSHD config issues are resolved. You can use the even numbered (cerberus2, cerberus4) bastions as a work-around if any issues persist.

help+security@ncsa.illinois.edu
Status
colourGreen
titleResolved


2022-09-28 0930 | 2022-09-28 1050 | Jira outgoing email | outgoing email degraded | Jira failed to send some/most outgoing email during this time frame. | help@ncsa.illinois.edu

Status
colourGreen
titleresolved

2022-09-24 1445 | 2022-09-25 1045 | Granite | Building power outage caused Disk Storage Unit to power cycle | Any user operations on the cluster were interrupted and unavailable until resolution. | bdickin2@illinois.edu

Status
colourGreen
titlecomplete

2022-09-21 0800 | 2022-09-21 0930 | HOLL-I | Change CS-2 execution mode to Pipelined | Execution mode of the CS-2 was changed from Weight Streaming to Pipelined. | help@ncsa.illinois.edu

Status
colourGreen
titlecomplete

2022-09-08 0800 | 2022-09-10 1100 | Granite | Granite Bi-annual Maintenance (now back in service) | Any ingest or retrieval to/from the Archive | bdickin2@illinois.edu  slack-id: briandi
set@ncsa.illinois.edu 

Status
colourGreen
titlecomplete

2022-09-09 0943 | 2022-09-09 1457 | Jira | outgoing email degraded | Jira failed to send some/most outgoing email during this time frame. | help@ncsa.illinois.edu

Status
colourGreen
titlecomplete

2022-09-08
0700

2022-09-08 1010: license servers

2022-09-09
0230: vForge

vForge / license servers | Quarterly Planned Maintenance

all nodes and services will be unavailable

help@ncsa.illinois.edu

Status
colourGreen
titlecomplete

 

0500

0600

ASD VM services net | Routing in the switch stacks is being switched from NCSA 3003 to NPCF | All systems on the 141.142.192.x network will be unreachable for up to 5 minutes. | help@ncsa.illinois.edu

Status
colourGreen
titlecomplete

2022-08-31 1800 | 2022-09-01 0700 | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares were unavailable during maintenance. Users were unable to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable. | help@ncsa.illinois.edu

Status
colourGreen
titlecomplete

1730

1830

Jira | Jira service will be restarted | Jira will not be available | help@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

08-24-22 1830 | 08-26-22 0800 | Granite Tape Archive | FS crash and lockup | A few files that were transferred into the archive shortly before the crash needed to be re-transferred.

bdickin2@illinois.edu  slack-id: briandi
set@ncsa.illinois.edu 


Status
colourGreen
titleCOMPLETE

2022-08-17
1200
n/a | All LSST hosts at NCSA | Servers will be shut off and retired. | All LSST servers and services at NCSA. | lsst-admin@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

2022-08-17 0700 | 2022-08-17 1320 | Nightingale | Quarterly Planned Maintenance | All Nightingale servers and services will be unavailable (other than the ngale-bastion* nodes) | help@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

2022-08-16 0700

2022-08-17 1305

HOLL-I | Quarterly Planned Maintenance

All HOLL-I servers and services will be unavailable

2022-08-16 1505 - HOLL-I cluster returned to service, but CS-2 remains offline for further work; CS-2 expected to return to service by 2022-08-17 1000

2022-08-17 1305 - HOLL-I CS-2 is returned to service

help@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

2022-08-09 2000 | 2022-08-09 2300 | Office Networks on 2nd Floor | Code updates on office network switches. | Office ports will be offline as switches reboot. | help+neteng@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

2022-08-10 2000 | 2022-08-10 2300 | Office Networks on 3rd Floor | Code updates on office network switches. | Office ports will be offline as switches reboot. | help+neteng@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

2022-08-11 2000 | 2022-08-11 2300 | Office Networks on 4th Floor | Code updates on office network switches. | Office ports will be offline as switches reboot. | help+neteng@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

2022-08-03
0900
2022-08-03
1000
NPCF Center-wide management firewalls | Secondary firewall will be upgraded | No impact to services is anticipated. Traffic will flow normally through the primary firewall as the secondary is upgraded. | help+security@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

2022-07-27
0940
2022-07-28
15:36
ACHE, Nightingale | Several accounts have had their Covered Entity status revoked | Affected users/accounts will not be able to access resources that require Covered Entity enrollment

help+hippa@ncsa.illinois.edu

Status
colourGreen
titleresolved

2022-07-27
0900
2022-07-27
1000
NPCF Center-wide management firewalls | Primary firewall will be upgraded | No impact to services is anticipated. Traffic will be failed over to the secondary firewall, the primary will be updated, and then traffic will be moved back to the primary. | help+security@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

0900

0915

Jira | Additional LDAP group will be added for exclusion to sync with LDAP users. | In theory, nothing. | help@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

0800

  2000

ICC | ICC Quarterly Maintenance | All ICC services

help@campuscluster.illinois.edu

Status
colourGreen
titleCOMPLETE

2022-07-19 0700 | 2022-07-19 0900 | Radiant | Victoria Update | Minimally disruptive, brief interruptions to OpenStack services, such as the Horizon dashboard | radiant-admin@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

2022-07-14
2345
2022-07-14
2359
Wiki | The service will be restarted in order to increase the login timeout. | Wiki will be unavailable for about 5 mins while it restarts.

Status
colourGreen
titlecomplete

2022-07-08
1700
2022-07-11
0800
LSST hosts in NCSA 3003 | Due to a full building power outage at NCSA on Sunday, 10 July, some LSST servers will be unavailable over the weekend. Servers will be shut down at COB on Friday and returned to service on Monday morning. | lsst-dbb-fts1
lsst-dbb-rucio
lsst-demo
lsst-dm-monitor
lsst-int-monitor
lsst-mon-dev
lsst-pup
lsst-test5
lsst-xfer
l1-cl-arctl
l1-cl-fault
l1-cl-header
nts-ccamfwdr1
nts-acamfwdr2
nts-acamfwdr1
lsst-admin@ncsa.illinois.edu

Status
colourGreen
titlecomplete

2022-07-11 08:30 | 2022-07-11 9:30 | All ITSM (CMDB) VMs | All ITSM VMs are currently down. Ticket has been created to get them brought back up. | Production CMDB service (openDCIM) is not available | kimber7@illinois.edu

Status
colourGreen
titleresolved

2022-07-10 0700

2022-07-10 1430

NCSA building power | Building power feed work for multiple campus Buildings | AVL, LSST, ISL and Software standard services were down from Friday afternoon until Monday morning. | Daniel Lapine

Status
colourGreen
titleCOMPLETE

2022-07-8

1600

2022-07-11

0900

cerberus2 and cerberus4 | Campus is doing work on a common feed that affects multiple buildings, including the NCSA Building. Work is scheduled from 0700-1700, but may finish early | VM hosts running these 2 bastions will be down for the weekend due to the scheduled power work at NCSA | help+security@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

2022-07-06 1730 | 2022-07-06 2030 | Wiki (wiki.ncsa.illinois.edu) | Confluence and MySQL upgrades | wiki will be down during the upgrade

Status
colourGreen
titleCOMPLETE

2022-07-05 1800 | 2022-07-05 2130 | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares were unavailable during maintenance. Users were unable to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable. | help@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

2022-07-05 1800 | N/A | iForge | end of service | iForge was removed from service. Operations have moved to the new vForge virtual cluster. | help+industry@ncsa.illinois.edu

Status
colourGreen
titleRESOLVED
