
status.ncsa.illinois.edu


Watch this page in the wiki to subscribe to automatic updates to this status page.

Please do not refer to any NCSA Industry Partners on this page. Please use the iforge nomenclature for all of the *forge infrastructure.

To see older events, see Archive of NCSA Status Home

Report a problem 

Current Status  


Start | End | What System/Service is affected? | What is happening? | What will be affected? | Contact Person | Status
2020-01-28 17:00 | 2020-01-28 19:00 | Exit-East Router | OS update | Most traffic will be sent via our second router. Some specific projects may be affected. Neteng will talk to those projects directly. | help+neteng@ncsa.illinois.edu |
2020-01-28 17:00 | 2020-01-28 19:00 | DHCP Upgrade | OS updates | The DHCP server will be rebooted for all office and wireless networks. Clients that are already connected will not be affected. Any new IP requests during the reboot will be delayed. Most users should see no impact. | help+neteng@ncsa.illinois.edu |








Upcoming Scheduled Maintenance

Start | End | What System/Service is affected? | What is happening? | What will be affected? | Contact Person | Status
2020-01-30 8:00 | 2020-01-30 9:00 | LSST Firewalls | Firewall upgrade | No impact is expected. Traffic to/from 141.142.181.0/24 and 141.142.182.128/26 will be failed over from the primary firewall to the secondary firewall while the primary is upgraded, then failed back. Traffic between these subnets and the LSST storage network does not traverse the firewall. | help+security@ncsa.illinois.edu |
2020-01-30 13:00 | 2020-01-30 13:05 | Systems using acctd | IDDS will install triggers on the production database to support the new project-data message. | There are no changes that need to be made to current acctd implementations. The only impact acctd users may notice is the presence of project-data messages in acctd logs. | help+idds@ncsa.illinois.edu |



2020-02-03 10:00 | 2020-02-03 10:05 | Systems connecting to idds-prod | ITS will be updating firewall settings for idds-prod. | No impact is expected, but users should contact help+idds if issues occur. | help+idds@ncsa.illinois.edu |



2020-02-04 10:00 | 2020-02-04 12:00 | CILogon upgrade | CILogon Service web front-end Bootstrap upgrade (http://bit.ly/36BvG57) | No downtime is expected. | help@cilogon.org |
2020-02-27 08:00 | 2020-02-27 12:00 | LSST | Monthly Maintenance: OS updates and reboots; other updates as needed | ALL LSST systems will be updated, including: TBD | lsst-admin@ncsa.illinois.edu | TBD



Previous Outages or Maintenance

Start | End | What System/Service was affected? | What happened? | What was affected? | Contact Person | Status
2020-01-27 11:54 | 2020-01-28 09:36 | oa4mp.ncsa.illinois.edu | An automated CA certificate update caused authentication failures | NCSA RSA authentication to Globus was unavailable | help+idp@ncsa.illinois.edu | Temporary work-around in place; proper fix scheduled for 2020-01-29 14:00 (note: oa4mp.ncsa.illinois.edu is scheduled for retirement on 2020-04-01)
2020-01-21 08:15 | 2020-01-21 08:25 | ldap2 | ldap2 was returning LDAP queries inconsistently, so the service was restarted. | Login to certain services was unusually slow for some users, with Jira the most affected. | help+its@ncsa.illinois.edu | ldap2 queries are working as expected after the restart.
2020-01-16 17:30 | 2020-01-16 17:48 | Condo NFS service | NFS exports were failing path resolution | NFS file system client mounts | Chad Kerner | Servers rebooted, mounts restored
2020-01-15 08:00 | 2020-01-16 01:55 | ICCP | Quarterly Maintenance: Golub IB Core switch FW update; Golub 10G Core switch FW update; GPFS 5.0.4.1 update; moved Golub Rack8 to accommodate expansion | Total outage including export nodes (access to HTC will still be available) | iccp-admins@campuscluster.illinois.edu | Complete
2020-01-15 07:00 | 2020-01-15 12:00 | LSST NCSA Test Stand | Hardware repair in NCSA Test Stand | 21 servers in the NCSA Test Stand had their drive backplanes replaced by the vendor. | lsst-admin@ncsa.illinois.edu | COMPLETE

2020-01-06 10:00 | 2020-01-08 14:30 | Code42 Crashplan Endpoints | The Code42 Crashplan servers started pushing out Code42 Crashplan client updates | All users of CrashPlan will have their clients upgraded. | help+its@ncsa.illinois.edu | Complete

2020-01-03 10:20 | 2020-01-03 11:20 | Code42 Crashplan was upgraded | Software updates to the CrashPlan Auth and Storage servers were applied | Backups were queued while the services restarted. | help+its@ncsa.illinois.edu | Complete
2020-01-02 11:30 | 2020-01-02 17:39 | NCSA ITS vSphere vCenter | vCenter was upgraded to the latest patch level. Due to some bugs, it took longer to apply than expected. | The VMware administrative interface was unavailable during the update. | help+its@ncsa.illinois.edu | Complete

2019-12-18

08:00

2019-12-18

10:00

Facility infrastructure  Electrical Transformer

TX-5C

Replace defective temperature controller "No Outage"  Production projects on feeder CMO Rantissi

Complete


2019-12-17 06:00 | 2019-12-17 10:25 | JIRA | JIRA Upgrade from 7.6 to 8.5 | All JIRA users | help+its@ncsa.illinois.edu | COMPLETE

2019-12-12 08:00 | 2019-12-12 14:00 | LSST | Monthly Maintenance: OS updates and reboots; GPFS filesystem restructure | ALL LSST systems will be updated, including: lsst-dev01, lsst-xfer, etc.; Slurm verification cluster; PDAC/Kubernetes/LSP clusters; tus-ats01; L1 test stand | lsst-admin@ncsa.illinois.edu | COMPLETE


2019-12-12 10:02 | 2019-12-12 10:08 | internal.ncsa.illinois.edu | System memory was exhausted and the OOM killer started killing https connections. | Savanna tools were unavailable | help+its@ncsa.illinois.edu | Memory resources for the server were doubled and service was brought back online.
2019-12-10 13:45 | 2019-12-10 16:55 | Internet2 Connectivity | Internet2 Engineers isolated the issue to a malformed route update coming from an external peer to one of its nodes in Ashburn, VA. As this update was propagated throughout the Internet2 Network, it triggered a bug on the Internet2 routers and caused all internal BGP sessions of each router to rapidly flap, thus causing instability across the footprint. Engineers mitigated the issue by placing a filter on the specific peer to reject the malformed packet. The Major Incident has been resolved at this point. | Many different external resources, data transfers, sessions, etc. to various destinations. | help+neteng@ncsa.illinois.edu | Connectivity has stabilized. Please report any issues should they arise.

2019-12-02 | 2019-12-02 afternoon | Wireless network | Tech Services reports they are having authentication issues affecting Wifi and VPN. Engineers are working on the problem. Tech Services Issue Description. | NCSAnet and IllinoisNet wireless are non-functional at the moment. NCSA wired network remains available. IllinoisNet_guest is also functional. | help+neteng@ncsa.illinois.edu | Troubleshooting in progress
2019-11-14 18:00 | 2019-11-14 19:00 | Exit-West Router | Software Upgrades | This should not be user impactful. All traffic will re-route via the other router. | help+neteng@ncsa.illinois.edu | COMPLETE

2019-11-14 5:00 AM | 2019-11-14 3:30 PM | Nearline Endpoint | Issue with one storage library | Some Globus transfers were stalled for the period of the outage | bw+storage@ncsa.illinois.edu | COMPLETE

Nov 7 10:00 | Nov 7 14:00 | ICCP. All login nodes will be down. | Reroute some IB cables between Core switches and compute nodes. Changing topology on Subnet Manager. | Scheduler will be paused. No user access to login nodes. All running jobs will be killed. | help@campuscluster.illinois.edu | COMPLETE

2019-11-05 07:00 | 2019-11-05 16:53 | iForge | Quarterly Maintenance | All systems will be unavailable during the maintenance | iforge-admin@ncsa.illinois.edu | COMPLETE

2019-10-1 | 2019-11-1 | NCSA Windows Domain Controllers | ITS migrated all Windows systems to the Campus Domain. The existing NCSA Windows Domain has been decommissioned and shut down. | NCSA Windows Systems | help+its@ncsa.illinois.edu | COMPLETE

2019-10-23 8 a.m. | 2019-10-23 12:00 p.m. | Core-West | Code upgrades will be performed on Core-West network switch. | This should not be user impacting. All traffic will flow through the redundant Core. | neteng+help@ncsa.illinois.edu | COMPLETE

2019-10-22 06:12 | 2019-10-22 07:18 | Jira and Wiki | During reboots for system patches the wiki and Jira got stuck in a state that was not providing data to the users. | Only web access to these tools was impacted. | help+its@ncsa.illinois.edu | COMPLETE

2019-10-16 08:00 | 2019-10-16 20:30 | ICC system wide | Quarterly maintenance | All services on ICC | help@campuscluster.illinois.edu | COMPLETE

2019-10-16 8 a.m. | 2019-10-16 12:00 p.m. | Core-East | Code upgrades will be performed on Core-East network switch. | This should not be user impacting. All traffic will flow through the redundant Core. | neteng+help@ncsa.illinois.edu | COMPLETE

2019-10-15 11:45 AM | 2019-10-15 11:56 AM | npcf-exit-east | BGP peering flapped over I2 AL2S circuit | Traffic got re-routed but some WAN services were impacted as reported by users. | help+neteng@ncsa.illinois.edu | COMPLETE

2019-10-10 07:00 | 2019-10-10 07:30 | mysql.ncsa.illinois.edu | Some table repairs broke replication; this maintenance will update the replicas with newer databases so the service will work as expected again. | Wiki, JIRA, and some web sites will stop working. Email forwarding to user accounts at NCSA will be delayed during the outage. | lindsey@ncsa.illinois.edu | COMPLETE

2019-10-01 | 2019-10-03 | NCSA-Print & Building Printers | Some printers are having issues connecting to the NCSA Print Server. After updating drivers on the print server, public printers are working as expected. | Printing | help+its@ncsa.illinois.edu | COMPLETE

2019-10-03 6AM | 2019-10-03 7:45AM | Jira and Wiki | During reboots for system patches the wiki and Jira got stuck in a state that was not providing data to the users. | Only web access to these tools was impacted. | help+its@ncsa.illinois.edu | COMPLETE

2019-10-01 7AM | 2019-10-01 8:30PM | Blue Waters | NGA workload scheduled testing | Scheduler testing for NGA workload | David King | COMPLETE

2019-10-01 10AM | 2019-10-01 12:04PM | Blue Waters | EPO 4 racks lost xdp (cooling). CRAY warm swapped racks back into system successfully. | scheduler, some computes missing and Gemini was rerouted | | COMPLETE

2019-10-01 07:00 | 2019-10-01 07:30 | mysql.ncsa.illinois.edu | MySQL servers needed to be synchronized to convert the server in NPCF back to a replicated host. | Wiki, JIRA, and some web sites stopped working. Email forwarding to user accounts at NCSA was delayed during the outage. | lindsey@ncsa.illinois.edu | COMPLETE


