
status.ncsa.illinois.edu


Watch this page in the wiki to subscribe to automatic updates to this status page.

Please do not refer to any NCSA Industry Partners on this page. Please use the iforge nomenclature for all of the *forge infrastructure.

To see older events, see the Archive of NCSA Status Home.

Report a problem 

Current Status  

Start | End | What System/Service is affected | What is happening? | What will be affected? | Contact Person | Status

2021-07-01

2140


Radiant OpenStack Cluster | A networking issue that started yesterday evening broke access to the Horizon dashboard. Running instances were not affected until a second event at around 1200 today caused widespread loss of network connectivity. Admins are investigating and attempting to repair the cluster in place; however, all running instances are currently down.
NCSA Help Desk (help@ncsa.illinois.edu)

SYSTEM DOWN


Upcoming Scheduled Maintenance

Listed below in chronological order.

Start | End | What System/Service is affected | What is happening? | What will be affected? | Contact Person | Status
2021-07-07 0600 | 2021-07-07 0800 | CILogon AWS Hosted Services | Upgrading AWS RDS Aurora MySQL from v5.6 to v5.7 | COmanage Registry and Grouper services hosted by CILogon will be unavailable | help@cilogon.org

SCHEDULED

2021-07-08 0800 | 2021-07-08 1000 | OpenAFS | The remaining OpenAFS database servers will be upgraded. | No service impact is expected during this maintenance. | help+service@ncsa.illinois.edu

SCHEDULED

2021-07-21
0800
2021-07-21
2900
ICCP

ICCP Quarterly Maintenance

  • TBD
All ICCP services

help@campuscluster.illinois.edu


SCHEDULED

2021-09-30
0800
2021-09-30
1200
LSST

LSST Quarterly Maintenance

  • TBD
All LSST services hosted at NCSA | lsst-admin@ncsa.illinois.edu

SCHEDULED

2021-10-20
0800
2021-10-20
2900
ICCP

ICCP Quarterly Maintenance

  • TBD
ICCP cluster nodes only | help@campuscluster.illinois.edu

SCHEDULED

2021-12-09
0800
2021-12-09
1200
LSST

LSST Quarterly Maintenance

  • TBD
All LSST services hosted at NCSA | lsst-admin@ncsa.illinois.edu

SCHEDULED


Previous Outages or Maintenance

Start | End | What System/Service was affected? | What happened? | What was affected? | Contact Person | Status

2021-07-01

0247

2021-07-01

1300

Various systems in NPCF, ACB, NCSA

There was a power event in the Champaign-Urbana area at around 2:47 AM today. Details about the cause are currently unknown. The event disrupted systems in the NCSA building, NPCF, and ACB. Known issues have generally been resolved, but unidentified issues may linger. If you encounter any problems, please notify NCSA help desk staff (help@ncsa.illinois.edu).

Multiple systems and services were impacted. All have been recovered, and normal operations have resumed. | NCSA help desk

RESOLVED

2021-07-01 02:58 CDT | 2021-07-01 06:00 CDT | ACHE and NGALE bastion hosts | Loss of power. | All ache-* services and NGALE bastion hosts | help@ncsa.illinois.edu

RESOLVED

2021-06-29 22:00

2021-06-29 23:59

NCSA 4th Floor Office network | Rebooting one or more of the office switches on the NCSA Building 4th floor to resolve a phone issue. | Office port connectivity will be intermittent during the maintenance window.

Matt Kollross

help+neteng@ncsa.illinois.edu

RESOLVED

2021-06-24
0800
2021-06-24
1345
LSST
  • Applying updates to Prod/Stable K8s and rebuilding some ingress nodes
Prod/Stable K8s | lsst-admin@ncsa.illinois.edu

RESOLVED

2021-06-24
0800
2021-06-24
1200
LSST

LSST Quarterly Maintenance

  • OS updates on all servers

All LSST services hosted at NCSA

EXCEPT Prod/Stable K8S

lsst-admin@ncsa.illinois.edu

COMPLETE

2021-06-22 0000

2021-06-22 0400

Internet2 WAN link | Internet2 will be migrating NCSA's physical port to its new next-generation infrastructure. | During the maintenance, our I2 connection will be down and traffic will reroute to other connections. Some point-to-point connections may be unavailable for a period of time. The maintenance is not expected to take the full 4-hour window.

Matt Kollross

help+neteng@ncsa.illinois.edu

COMPLETE

2021-06-21 1800 | 2021-06-22 0000 | NCSA File & Print Servers | Scheduled Windows Server maintenance | File & Print shares were unavailable during maintenance. Users could not access shares on Fileserver (e.g., home, busnoff, hr), and printing was unavailable. | help+service@ncsa.illinois.edu

COMPLETE

2021-06-17 0700 | 2021-06-17 0820 | OpenAFS | The OpenAFS database server kaskaskia was upgraded. | No service outages were observed or reported. | help+service@ncsa.illinois.edu

COMPLETE

2021-06-12 2200 | 2021-06-15 1500 | LSST Firewall | The NPCF secondary firewall was offline due to a hard drive failure. | Production services were not affected because the primary firewall stayed online.

RESOLVED

2021-06-14 1700 | 2021-06-15 0958 | NCSA GitLab | An attempt to fix an authentication bug for a particular user accidentally broke all authentication through the web interface. | Authentication through the web interface did not work. | help+service@ncsa.illinois.edu

RESOLVED

2021-06-11 | 2021-06-11 0905 | NCSA Jira | Jira email problem | Jira was not accepting issues via email; issues could still be created directly in the Jira web interface.

RESOLVED

2021-06-10 0700 | 2021-06-10 0800 | cilogon.org | Update to OA4MP v5.1.3. | The OAuth2/OIDC backend of the CILogon Service will be updated to OA4MP v5.1.3. | help@cilogon.org

COMPLETE

1000

1030

jira.ncsa.illinois.edu | Configuration change to address a vulnerability | No service interruption is expected, although one is possible. | help+service@ncsa.illinois.edu

RESOLVED

2021-06-02 | 2021-06-02 | Netdot | Netdot web access now requires 2FA via the SSL VPN or the Cerberus proxy. | Security requested that Netdot require 2FA for access to the web interface. To accommodate that request, the Netdot firewall now limits web access to the VPN subnet or to connections proxied through the Cerberus jump hosts.

Matt Kollross

help+neteng@ncsa.illinois.edu

RESOLVED

2021-05-25 | 2021-05-26 | vCenters for ACHE and ASD | Emergency security updates were applied. | The administrative interface was offline for about 20 minutes while the updates were installed. | help+service@ncsa.illinois.edu

RESOLVED

2021-05-26

1000

2021-05-26

1030

VoIP phones at NPCF | Migrating the VoIP networks to a campus IP to enable future migrations by Technology Services. | After the networks are migrated, all phones in the NPCF building will be rebooted.

Matt Kollross

neteng+help@ncsa.illinois.edu

RESOLVED

2021-05-21

1800

2021-05-21

1900

VoIP phones at the NCSA building | Migrating the VoIP networks to a campus IP to enable future migrations by Technology Services. | After the networks are migrated, all phones in the NCSA building will be rebooted.

Matt Kollross

neteng+help@ncsa.illinois.edu

RESOLVED

2021-05-20 05:40 | 2021-05-20 08:45 | LSST

ESXi host outage causing degradation of select services.


Degradation of select services:

  • data backbone gateway (lsst-dbb-gw01 down)
  • HTCondor (Central Manager nodes down for Prod & DAC)
  • login (lsst-login01 is down)

There was also a loss of redundancy for some underlying services, including auth/access and K8s head nodes.

lsst-admin@ncsa.illinois.edu

RESOLVED


2021-05-15
0600
2021-05-15
0800
CILogon hosted services including COmanage Registry, LDAP, SAML proxy, SAML AA, MDQ | Maintenance | All CILogon hosted services were temporarily unavailable. | help@cilogon.org

COMPLETE

2021-05-12 07:00

2021-05-12 08:00

internal.ncsa.illinois.edu

NCSA Internal Web Server Upgrade (aka Savannah or MIS Tools)
Updates were made that affected the availability of the NCSA internal website and the Savannah system. The system was unavailable during this time.

help+service@ncsa.illinois.edu

COMPLETE

2021-05-11

07:00

2021-05-11

19:00

iForge | Quarterly Maintenance | All systems unavailable

iforge-admin@lists.ncsa.illinois.edu

COMPLETE

2021-05-06 0900 | 2021-05-06 0945 | WAN Link Migration | NCSA Neteng migrated the WAN link to Internet2 to new hardware.

Traffic was automatically rerouted to redundant paths during the link outage. Any connections relying on layer-2 connections over AL2S saw a brief blip as the connection was cut over. Affected parties were contacted in advance.

help+neteng@ncsa.illinois.edu

COMPLETE

2021-05-03
0600
2021-05-03
0630
CILogon Multi-tenant COmanage Registry | Upgrade to version 3.3.2 | The service at https://registry.cilogon.org was unavailable | help@cilogon.org

COMPLETE

2021-04-29 1600 | 2021-04-29 1700
  • HTCondor Prod
  • HTCondor DAC
Added new nodes to the HTCondor service pools
  • HTCondor Prod
  • HTCondor DAC
lsst-admin@ncsa.illinois.edu

COMPLETE

2021-04-21 08:00 | 2021-04-21 20:00 | ICCP | ICCP Quarterly Maintenance | The scheduler was down. All compute nodes were converted to RHEL 7.9 with Red Hat IB.

iccp-admins@campuscluster.illinois.edu

COMPLETE

2021-04-15 1600 | 2021-04-15 1700 | NCSA Opensource | OS upgrade on all machines related to Opensource | Jira, wiki, Git, etc. hosted at https://opensource.ncsa.illinois.edu/ | kooper@illinois.edu

COMPLETE

2021-04-15

12:25

2021-04-15

14:45

ICI VMware

Several hosts on the VMware service were experiencing timeouts:

  • bluewaters
  • bluewaters-test
  • internal
  • its-nagios
  • ldap1
  • vcenter
No or intermittent connectivity to these hosts | help+service@ncsa.illinois.edu

RESOLVED

The root cause is still being investigated.

2021-04-15
0900
2021-04-15
0942
CMDB | Applying new certificates and restarting services | CMDB, including the web interface, will be down briefly during the update. | ncsagroup+org_itsm@ncsa.illinois.edu

RESOLVED

2021-04-15 0900 | 2021-04-15 0920 | WAN Link Migration | NCSA Neteng migrated the WAN link to ESnet to new hardware. | Traffic was automatically rerouted to redundant paths during the link outage. | help+neteng@ncsa.illinois.edu

RESOLVED

2021-04-14 15:00 | 2021-04-14 15:00 | git.ncsa.illinois.edu | Users can no longer access repositories from Git clients over HTTPS using their NCSA password.

NCSA passwords can no longer be used to access repositories with Git clients. Instead, use SSH keys over SSH or personal access tokens over HTTPS.

We believed this took effect with the Git changes on Nov 2, 2020, but discovered that password access kept working until we changed GitLab to fully remove LDAP functionality.

help+service@ncsa.illinois.edu

COMPLETE

2021-04-13 1415 | 2021-04-13 1845 | git.ncsa.illinois.edu | The GitLab website at git.ncsa.illinois.edu was having authentication issues because the LDAP server it uses was timing out.
  • Login to the Git web interface was timing out.
  • Access from git clients continued to work during the outage.
help+service@ncsa.illinois.edu

RESOLVED

2021-04-13 0800

2021-04-13 0830

cilogon.org | Update to OA4MP v5.1.1. | The OAuth2/OIDC backend of the CILogon Service will be updated to OA4MP v5.1.1. | help@cilogon.org

COMPLETE

2021-04-12 1800 | 2021-04-12 2245 | File & Print Servers | Monthly Windows File & Print Server Maintenance | Windows file shares such as HR, Business Office, Home, etc., and printing in the NCSA & NPCF buildings were unavailable. | help+service@ncsa.illinois.edu

COMPLETE

2021-04-10
0600
2021-04-10
0800
CILogon hosted COmanage, Grouper, SATOSA, LDAP | On Saturday, April 10, the CILogon team will perform maintenance on the infrastructure used for hosted services. | As part of the maintenance, all COmanage Registry, LDAP, Grouper, SAML proxy, SAML attribute authority, and MDQ services hosted by CILogon may experience brief outages. We do not expect any specific service outage to last more than a minute. | help@cilogon.org

COMPLETE

2021-04-08 0900 | 2021-04-08 1045 | WAN Link Migration | NCSA Neteng migrated the WAN link to ICCN Node-1 to new hardware. | Traffic was automatically rerouted to redundant paths during the link outage. Users noticed issues during the outage, which are currently being investigated in cooperation with our upstream provider. | help+neteng@ncsa.illinois.edu

COMPLETE

2021-04-08 0730 | 2021-04-08 0734 | NCSA Wiki | NCSA's Wiki service was restarted | NCSA's Wiki service was restarted to apply a new SSL certificate and a renewed Confluence license. The wiki was unavailable for 4 minutes while it reloaded. | help+service@ncsa.illinois.edu

COMPLETE

2021-04-07 1610

2021-04-07 1733 | Internal Savannah/MIS website | The Savannah/MIS website would not load due to a corrupted MySQL database table referenced across all of the Savannah tools. | Internal/Savannah | help+service@ncsa.illinois.edu

RESOLVED

1st report 7:30am Monday | 8:19am Monday | NCSA LDAP2 | ldap2 was not responding to authentication requests | NCSA Jira and any systems using LDAP2 as their only source | help+service@ncsa.illinois.edu

RESOLVED

2021-03-30

0800

2021-03-30

0845

DNS1 | A software issue caused BIND to fail, so DNS1 could not resolve queries during this period. DNS2 remained operational. | neteng+help@ncsa.illinois.edu

RESOLVED

2021-03-23

2000

2021-03-23

2025

NCSA VPN | The standby VPN hardware was replaced and added to the current VPN cluster. Failover went as expected, and firmware was upgraded on the primary after load was shifted to the new standby VPN. | Failover between the appliances occurred without issue, and there was no impact to users. | neteng@ncsa.illinois.edu

RESOLVED

2021-03-18 1230 | 1255 | Jira | Some functionality was limited because the user limit was reached | Jira | help+service@ncsa.illinois.edu

RESOLVED

~16:40 | 17:58 | AnyConnect VPN Service

An SSL issue on the VPN service disconnected all users. Network engineering is looking into the issue.


Because of a hardware failure, and because the VPN did not fail over properly to the standby, users were unable to connect to the VPN. The failed failover was caused by an issue with syncing certificates.

During the outage, users could not connect to, or maintain a connection to, the VPN. | help+neteng@ncsa.illinois.edu

RESOLVED

2021-03-16 0950 | 2021-03-16 1000 | CMDB | Applying updates per security vetting | CMDB, including the web interface, will be down briefly during the update. | ncsagroup+org_itsm@ncsa.illinois.edu

RESOLVED

2021-03-11
0900

2021-03-11
0930

WAN Link Migration | NCSA Neteng migrated the link to ICCN to new hardware. | Traffic was automatically rerouted to redundant paths during the link outage. | help+neteng@ncsa.illinois.edu

RESOLVED

2021-03-04
0900

2021-03-04
0905

WAN Link Migration | NCSA Neteng migrated the 100G link to MREN to new hardware. | Traffic was automatically rerouted to redundant paths during the link outage. | help+neteng@ncsa.illinois.edu

RESOLVED

2021-03-01 22:11 | 2021-03-01 22:47 | NCSA vSphere | About 40 VMs lost connection to their NFS storage. | Several VM-based services timed out during the issue, including vSphere management, a Kerberos replica, an LDAP replica, httpproxy, license servers, the NCSA fileserver, identity message queuing, and monitoring. Some of those VMs switched their disks to read-only mode and had to be rebooted later. | service@ncsa.illinois.edu

RESOLVED
