status.ncsa.illinois.edu


Watch this page in the wiki to subscribe to automatic updates to this status page.

Please do not refer to any NCSA Industry Partners on this page. Please use the iforge nomenclature for all of the *forge infrastructure.

To see older events, see Archive of NCSA Status Home

Report a problem 

Current Status  

Start | End | What System/Service is affected | What is happening? | What will be affected? | Contact Person | Status
2021-11-02 15:20 | | Production version of DCIM for CMDB (https://ncsa-cmdb.ncsa.illinois.edu) | Invalid certificate issue | The production version of CMDB will be unavailable until a new certificate is received and applied. In the interim, the test server (https://ncsa-cmdb-test.ncsa.illinois.edu) has been made available for use, with all current data. | Kimber Blum (kimber7@illinois.edu) | IN PROGRESS


Upcoming Scheduled Maintenance

Listed below in chronological order.

Start | End | What System/Service is affected | What is happening? | What will be affected? | Contact Person | Status
2021-11-03 1000 | 2021-11-03 1100 | Core Router Linecard Replacement | Neteng will be replacing a linecard in one of the core routers | All connections to this linecard are redundant and no outage is expected. | neteng@ncsa.illinois.edu | SCHEDULED

2021-11-03 1100 | 2021-11-03 1400 | ESnet 100G link migration | ESnet engineers will be migrating NCSA's 100G link to the new ESnet6 infrastructure. | The link will be down during the migration. Traffic will fall back to alternative paths. | help+neteng@ncsa.illinois.edu | SCHEDULED

2021-12-09 0800 | 2021-12-09 1200 | LSST | LSST Quarterly Maintenance: TBD | All LSST services hosted at NCSA | lsst-admin@ncsa.illinois.edu | SCHEDULED


Previous Outages or Maintenance

Start | End | What System/Service was affected? | What happened? | What was affected? | Contact Person | Status
2021-11-02 0800 | 2021-11-02 0900 | cilogon.org | Update to OA4MP v5.2.3 | Address several small issues in the back-end service | help@cilogon.org | COMPLETE

0600 | 0710 | Jira | Jira Upgrade | Jira | help+service@illinois.edu | IN PROGRESS

2021-10-25 1800 | 2021-10-26 0018 | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares will be unavailable during maintenance. Users will not be able to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing will be unavailable. | help+service@ncsa.illinois.edu | COMPLETE

2021-10-20 0800 | 2021-10-20 1800 | ICCP | ICCP Quarterly Maintenance: VLAN Change for IPMI network; OS update | ICCP Cluster nodes only | help@campuscluster.illinois.edu | COMPLETE

2021-10-20 0700 | 2021-10-20 0715 | IDDS | IDDS maintenance (puppet changes) | All IDDS services | idds-admin@ncsa.illinois.edu | COMPLETE

2021-10-15 1230 | 2021-10-15 0713 | NCSA GitLab | Server ran out of disk space | All GitLab services were unavailable | help+service@ncsa.illinois.edu | RESOLVED

2021-10-11 0800 | 2021-10-11 1900 | Nightingale, ACHE | Planned maintenance on the Nightingale cluster and the ache-dist switch | There was an outage for the following services during the maintenance:
  • ALL Nightingale hosts/services
  • ALL firewalled traffic in/out of ACHE, which includes admin access & monitoring in/out of ALL of ACHE (this portion was complete by 1140)
    • network access to ALL of the ache-esxi-hosted VMs, including ache- and ngale-bastion hosts
    • ACHE FW IPMI interfaces
| help+service@ncsa.illinois.edu | COMPLETE

2021-10-04 1000 | 2021-10-04 1005 | www.ncsa.illinois.edu per-user web directories | Per-user web directories on the main NCSA website are being redirected to a new website dedicated to per-user web directories. | URLs like www.ncsa.illinois.edu/People/* are redirected to their new home at https://users.ncsa.illinois.edu/*. | help+service@ncsa.illinois.edu | COMPLETE

2021-09-30 0800 | 2021-09-30 1200 | LSST | LSST Quarterly Maintenance: OS updates; K8S updates | All LSST services hosted at NCSA | lsst-admin@ncsa.illinois.edu | COMPLETE

2021-09-29 0800 | 2021-09-29 0900 | cilogon.org | Update to OA4MP v5.2.2 | Update Java database libraries, and address several small issues | help@cilogon.org | COMPLETE

2021-09-29 0800 | 2021-09-29 0813 | CMDB / openDCIM | Installing/upgrading to CMDB release Sep2021 | The openDCIM front end of CMDB will be down for 15-30 minutes | | COMPLETE

2021-09-28 0700 | 2021-09-28 1554 | NPCF work on facility power | Deenergizing power to transformer TX-4C-1020, pulling and terminating busduct cabling from transformer to room 2020. | One third of Sonexion racks will lose source 1 power (Feed C) and will continue to operate on source 2, degrading reliability by losing power redundancy. | | COMPLETE

2021-09-28 0700 | 2021-09-28 0900 | Blue Waters | A rack of scratch lost power during the power outage. | Scratch was partially unavailable due to a TOR power resiliency issue. | | COMPLETE

2021-09-28 0800 | 2021-09-28 0900 | idp.ncsa.illinois.edu | Assert eduPersonAssurance Cappuccino profile for NCSA Staff | NCSA Staff logging in with the NCSA Identity Provider will be able to get Silver CA certificates from cilogon.org | help+idp@ncsa.illinois.edu | COMPLETE

2021-09-21 14:50 | 2021-09-21 15:02 | vCenter appliance controlling ASD vSphere | vCenter appliance was upgraded | vsphere.ncsa.illinois.edu was off-line for 12 minutes. | help+service@ncsa.illinois.edu | COMPLETE

2021-09-21 0700 | 2021-09-20 1115 | Blue Waters | Power work caused non-redundant switches and misconfigured servers to shut off | Blue Waters Compute, Login and Scheduler | bw-admin@ncsa.illinois.edu | COMPLETE

2021-09-20 1800 | 2021-09-20 2130 | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares were unavailable during maintenance. Users were not able to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable. | help+service@ncsa.illinois.edu | COMPLETE

2021-09-14 0000 | 2021-09-14 0600 | Internet2 WAN circuit | Internet2 will be migrating our WAN circuit to new hardware. | Traffic over that path will reroute while the change happens. We anticipate the migration to take less than 30 mins. | help+neteng@ncsa.illinois.edu | SCHEDULED

0600 | 0900 | Wiki | Upgrade to next version | Wiki will be unavailable | help+service@ncsa.illinois.edu | COMPLETE

2021-09-09 0600 | 2021-09-09 0700 | NCSA VPN | Software Upgrades | The appliances hosting the NCSA VPN will be patched. Users will experience a brief disconnect as load is failed over between the appliances. | help+neteng@ncsa.illinois.edu | COMPLETE

2021-09-08 1300 | 2021-09-08 1400 | Group prod_b Bastion hosts | Out of cycle patching | Bastion hosts in group prod_b will be patched and rebooted. (see MOTD for group assignment) | help+security@ncsa.illinois.edu | COMPLETE

2021-09-08 0900 | 2021-09-08 1000 | Group prod_a Bastion hosts | Out of cycle patching | Bastion hosts in group prod_a will be patched and rebooted. (see MOTD for group assignment) | help+security@ncsa.illinois.edu | COMPLETE

2021-09-02 9:30 AM | 2021-09-02 1 PM | PDU in rack AA81 | We are replacing a PDU in NPCF rack AA81 | All systems in the rack have redundant power connections. No service outages are expected from this work. | help+service@ncsa.illinois.edu | COMPLETE

2021-09-01 0700 | 2021-09-01 0800 | cilogon.org | Update to OA4MP v5.2.1 | Device Authorization Grant Flow transactions will be stored in database rather than in memory | help@cilogon.org | COMPLETE

1200 | 1205 | Wiki | Security patch is being applied | Wiki will be down | help+service@ncsa.illinois.edu | SCHEDULED

2021-08-25 9:00 am | 2021-08-25 6:45 pm | Blue Waters | System reboot due to blade fallout coinciding with HSN reroute and SMW not recovering. | All jobs interrupted | jenos@illinois.edu | COMPLETE

2021-08-19 0538 | 2021-08-19 0700 | IRST systems hosted on IRST Node 2 | Storage controller failure, all VMs taken offline | Some prod_b systems and non-redundant services. | eyrich@illinois.edu | RESOLVED

2021-08-19 5:34 | 2021-08-19 6:20 | cilogon.org | Storage controller failure in IRST VM farm | cilogon.org was unreachable until we initiated fail-over to our backup servers at NICS. | help@cilogon.org | COMPLETE

2021-08-18 1136 | 2021-08-18 1156 | NCSA Wiki | Test instance caused interference. | NCSA Wiki | help+service@ncsa.illinois.edu | COMPLETE

2021-08-17 0500 | 2021-08-17 0700 | NCSA/NPCF Wide Area Network | Between 5:00 AM and 7:00 AM CDT on 08/17/2021, Campus ICCN Engineers will be upgrading firmware on the ICCN router 710rtr at the Starlight facility in Chicago. | Our peerings with MREN and OmniPoP will go down. All traffic destined for those peerings will reroute via other peerings, so no production impact is expected. | help+neteng@ncsa.illinois.edu | COMPLETE

2021-08-16 1800 | 2021-08-17 0000 | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares will be unavailable during maintenance. Users will not be able to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing will be unavailable. | help+service@ncsa.illinois.edu | COMPLETE

2021-08-12 9:54 | 2021-08-12 1012 | Jira | Attempted snapshot of Jira in vSphere was too intensive for the system | Jira | help+service@illinois.edu | COMPLETE

2021-08-10 2000 | 2021-08-11 0000 | Radiant API and Web access | Radiant cluster name change. | During this time access to the API endpoints and the Horizon web dashboard will be intermittently unavailable. Instances will continue to run and be available over the network with no interruptions. | radiant-admin@ncsa.illinois.edu | COMPLETE

2021-08-10 07:00 | 2021-08-10 17:10 | iForge | Quarterly Maintenance | All systems unavailable | iforge-admin@lists.ncsa.illinois.edu | COMPLETE

2021-08-09 1421 | 2021-08-09 1440 | NCSA Wiki | DB configuration conflict between Wiki & Wiki-Test | NCSA Wiki was inaccessible | help+service@ncsa.illinois.edu | COMPLETE

2021-08-05 1000 | 2021-08-05 1030 | NPCF Core Router - Linecard Reboot | A problem was identified on one of the line cards in our core router requiring a reboot of the linecard. The linecard was successfully rebooted and we will continue monitoring the hardware for further issues. | All connections to this linecard are redundant and there was no impact to users. | neteng@ncsa.illinois.edu | COMPLETE

2021-08-05 0800 | 2021-08-05 1000 | LSST | LSST Emergency OS Patching | LSST services hosted at NCSA except: NTS will remain up (has already been patched) | lsst-admin@ncsa.illinois.edu | COMPLETE

2021-08-04 0800 | 2021-08-04 1700 | Radiant API and Web access | Installation of new Radiant cluster | Cluster name changes are starting at 1100; this will make the Horizon dashboard unreachable. During this time access to the API endpoints and the Horizon web dashboard will be intermittently unavailable. Instances will continue to run and be available over the network with no interruptions. | radiant-admin@ncsa.illinois.edu | COMPLETED

2021-08-04 0700 | 2021-08-04 0800 | cilogon.org | Update to OA4MP v5.2.0 | Added support for Device Authorization Grant Flow (RFC 8628) | help@cilogon.org | COMPLETED

2021-08-03 0800 | 2021-08-03 1700 | Radiant API and Web access | Installation of new Radiant cluster | During this time access to the API endpoints and the Horizon web dashboard will be intermittently unavailable. Instances will continue to run and be available over the network with no interruptions. | radiant-admin@ncsa.illinois.edu | COMPLETED

2021-08-03 9:00 am | 2021-08-03 11:30 am | Radiant Cluster | A change was made to the firewall that unintentionally restricted access for instances and other internal cluster communication. | Access to instances and workload | radiant-admin@ncsa.illinois.edu | RESOLVED

2021-07-31 0600 | 2021-07-31 0630 | CILogon hosted services | Infrastructure maintenance | During this time each service hosted by CILogon including COmanage Registry, LDAP, Grouper, SAML proxy, and MDQ will become unavailable for a short time. Each individual service outage will last less than 5 minutes. Services that will not be impacted include: * OIDC clients that do not query LDAP for resolving attributes * X.509 certificate issuance and certificate revocation lists * LIGO and GW-Astronomy services | help@cilogon.org | COMPLETE

2021-07-29 1300 | 2021-07-29 1400 | IRST-run bastion hosts (pool B) | Security patching | Hosts managed by IRST will be patched and rebooted. Only hosts in pool B will be patched at this time. | help+security@ncsa.illinois.edu | COMPLETE

2021-07-29 0900 | 2021-07-29 1000 | IRST-run bastion hosts (pool A) | Security patching | Hosts managed by IRST will be patched and rebooted. Only hosts in pool A will be patched at this time. | help+security@ncsa.illinois.edu | COMPLETE

2021-07-28 1000 | 2021-07-28 1050 | LSST | OS Updates on only NCSA Test Stand (NTS) | Only the LSST NCSA Test Stand (NTS) services hosted at NCSA | lsst-admin@ncsa.illinois.edu | COMPLETE

2021-07-27 0600 | 2021-07-27 0900 | Jira | Upgrade | Jira will be unavailable | help+service@ncsa.illinois.edu | COMPLETE

2021-07-26 1800 | 2021-07-27 0000 | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares were unavailable during maintenance. Users were not able to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable. | help+service@ncsa.illinois.edu | COMPLETE

2021-07-21 0800 | 2021-07-21 2900 | ICCP | ICCP Quarterly Maintenance: TBD | All ICCP services | help@campuscluster.illinois.edu | COMPLETE

2021-07-21 15:24 | 2021-07-21 21:50 | ASD vSphere cluster in 3003 | One of the 4 hypervisors in the cluster panicked. Unscheduled preventative maintenance is being performed on it and the other 3 nodes in the cluster. | After the initial outage at 15:24, there should be no additional outages. | help+service@ncsa.illinois.edu | COMPLETE

2021-07-13 0700 | 2021-07-13 0800 | cilogon.org | Update to OA4MP v5.1.4. | The OAuth2/OIDC backend of the CILogon Service will be updated to OA4MP v5.1.4. | help@cilogon.org | COMPLETE

2021-07-08 0800 | 2021-07-08 1000 | OpenAFS | The remaining OpenAFS database servers were upgraded. | No service impacts were seen | help+service@ncsa.illinois.edu | COMPLETE

2021-07-07 0600 | 2021-07-07 0800 | CILogon AWS Hosted Services | Upgrading AWS RDS Aurora MySQL v5.6 to v5.7 | COmanage Registry and Grouper services hosted by CILogon will be unavailable | help@cilogon.org | COMPLETE

2021-07-01 2140 | 2021-07-01 1430 | Horizon dashboard access was down for the entire period. Cluster networking was down from 1200 to 1430. | Investigations into Horizon dashboard accessibility issues resulted in the application of an incorrect default network gateway for the cluster around noon. This was corrected and networking functionality restored around 1400. Instances began recovering soon thereafter. | Radiant admins believe running instances have recovered on their own but we advise everyone to check their systems and report any issues they see to the help desk. | help@ncsa.illinois.edu | RESOLVED

2021-07-01 0247 | 2021-07-01 1300 | Various systems in NPCF, ACB, NCSA | There was a power event in the Champaign-Urbana area at around 2:47 AM today. Details about the cause are currently unknown. This event caused disruptions to systems at the NCSA building, NPCF and ACB. Known issues have generally been resolved but there may be unidentified issues lingering. If you encounter any problems, please notify NCSA help desk staff (help@ncsa.illinois.edu). | Multiple systems/services were impacted. All have been recovered and return to normal operations is complete. | NCSA help desk | RESOLVED
