status.ncsa.illinois.edu

 


Note

Watch this page in the wiki to subscribe to automatic updates to this status page.

Current Status

Please do not refer to any NCSA Industry Partners on this page. Please use the iforge nomenclature for all of the *forge infrastructure.

To see older events, see Archive of NCSA Status Home

...

Report a problem

...

 


...

Start | End | What System/Service is affected | What is happening? | What will be affected?
2017-08-08 7:00 | 2017-08-08 17:00 | iforge/cforge/cfdforge/aforge | Quarterly maintenance | New 6.9 OS images

...

Contact Person | Status

Upcoming Scheduled Maintenance

Listed below in chronological order.

Start | End | What System/Service was affected? | What happened? | What was affected? | Outcome
2017-07-28 17:00 | 2017-07-31 evening | Locking out the old DES operational database at 5pm CST Friday | Update - All of the production data has been migrated except for the largest object table. That is loading now, then the user space will be loaded. Should all hopefully be done by this evening. Migration of operational database to new hardware happening during the weekend. | DES old operational database | Migration done successfully. Some other maintenance tasks that will give DES additional disk space were done too, along with some performance improvements.
2017-07-27 11:00 | 2017-07-28 15:00 | netact.ncsa.illinois.edu | The netact.ncsa.illinois.edu network activation server VM needed to be restored from backup | Network Activation service | The service has been fully restored
2017-07-25 02:36 | 2017-07-25 18:00 | Campus Cluster / Scheduler down | Blip on mgmt1 causing GPFS drop and scheduler to crash | Scheduler offline | Still taking a long time for the Scheduler to initialize, but jobs can start and run as usual. Opened case with Adaptive.
2017-07-20 09:00 | 2017-07-20 17:00 | ROGER Ambari and OpenStack | Updates to openstack control node and the Ambari cluster | Ambari nodes (cg-hm08 - cg-hm18), OpenStack instances and servers | Openstack was back in service on time. Ambari had issues mounting HDFS and was held out of service. HDFS was remounted on 25 July
2017-07-20 06:00 | 2017-07-20 10:00 | All NCSA hosted LSST resources | Monthly OS patches (addressing issues including CESA-2017:1615 and CESA-2017:1680). Roll-out updated puppet modules. Batch nodes updated firmware. | All nodes in NCSA 3003 and NPCF (batch nodes) will reboot. | Overall success. Exceptions: verify-worker31 failed a firmware update and is out of commission (LSST-914) and there are connectivity issues for some VMs used by the NCSA DM team (IHS-365). adm01, backup01, and test[09-10] will be patched in the near future.
2017-07-19 08:00 | 2017-07-19 14:44 | Campus Cluster | July Maintenance (applied security patch) | Cluster wide, except mwt2 nodes | Applied new kernel, glibc, bind patches and newest NVIDIA driver.
2017-06-29 1800 | 2017-06-30 0000 | Blue Waters | Emergency maintenance to apply security patch addressing Stack Guard security vulnerability. | Compute, Login, Scheduler are offline. | Kernel and glibc library patched on all affected systems.
2017-06-22 0800 | 2017-06-22 1200 | All NCSA hosted LSST resources | CRITICAL kernel and package updates to address Stack Guard Page security vulnerability. | Systems will be patched and rebooted. | Outage was extended to last past 1000 until 1200. Systems were successfully patched as planned except for qserv-db12 and qserv-db27, which will not boot. We will follow up on those with a ticket.
2017-06-22 0800 | 2017-06-22 0930 | LSST cluster nodes (verify-worker*, qserv*, sui*, bastion01, test*, backup01) | Deploy Unbound (local caching DNS resolver) | DNS resolving may have a short (~30 mins) delay. | Successfully deployed and all tests (including reverse DNS and intra-cluster SSH) pass.
2017-06-20 0930 | 2017-06-20 1100 | Blue Waters | XDP shutting down causing EPO on cabinet c1-7 and c2-7. | Scheduler was paused to isolate the failing components, then resumed. | Warmswap of failing components, and returned them to service.

2017-06-20 0900 | 2017-06-20 1000 | NCSA Open Source | Security upgrade needed for Bamboo, will also update the following components: Bamboo, JIRA, Confluence, BitBucket, FishEye | Most of the subcomponents of NCSA opensource will be down for a short time when the software is updated. | Upgraded Bamboo, JIRA, Confluence, BitBucket, FishEye to latest versions

2017-06-16 0900 | 2017-06-16 1100 | ROGER Openstack nfs backend failed and was restarted | The primary CES server for the openstack backend failed and tried to fail over to the secondary server, which also failed. SET was notified and they had the CES nfs service back up by 1100 | The ROGER openstack dashboard went down and needed a restart. Several VMs experienced "virtual drive errors" and will need to be restarted | SET is still investigating the cause of the GPFS CES service failover. CyberGIS is working with their users to get the affected VMs restarted
2017-06-15 0800 | 2017-06-15 0930 | LSST cluster nodes (verify-worker*, qserv*, sui*, bastion01, test*, backup01) | Deploy unbound | DNS resolving may have a short (~30 mins) delay. | Updates deployed successfully via new puppet module. All tests passed. EDIT 2017-06-15 1500 - Reverse DNS not working, which broke ssh to qserv* nodes. Disabled unbound.

6/14/2017 8:00 a.m. | 6/14/2017 10:00 p.m. | Network Core Switch | Network Engineering will be replacing a line card in one of our Core switches due to a hardware issue. | All services should remain active. Any affected switch will have a second redundant link to the other core to pass traffic. | Line card was successfully replaced.
2017-06-08 12:00 | 2017-06-11 22:20 | Campus Cluster (scheduler paused) | Disk Enclosure 3 failure on DDN 10K. | Lost redundancy, which forced us to drain the cluster. | Repair/replacement for controller can be time consuming so we took action to rebalance data out of failed enclosure. Scheduler was resumed as of 22:00.

2017-06-07 12:07 | 2017-06-07 12:42 | NCSA LDAP | The NCSA LDAP service crashed | NCSA LDAP service was unavailable | LDAP software and OS were updated and server rebooted. LDAP is working normally.
2017-05-31 20:06 | 2017-05-31 20:36 | NCSA LDAP | The NCSA LDAP service was timing out | NCSA LDAP service was unavailable | The root cause of LDAP timeouts is still being investigated.
2017-05-22 | 2017-05-26 | Campus Cluster VMs | Network issue on ESXI (hypervisor) boxes after maintenance | Could no longer log in to start VMs. License Server, nagios, all MWT2 VMs were down | The issue was fixed on 5/24. Restored license and Nagios service on 5/24. Moved MWT2 VMs to Campus Farm. All VMs returned to service as of noon 5/26.

5/12/2017 | 5/18/2017 | Condo/NFS partitions only | The NFS partition for the condo became extremely unstable after a replication (normal daily maintenance) was completed. Many iterations with FSCK and IBM on the phone got it resolved, and then 1.5 days restoring files that had been put in Lost and found. | UofI library was switched to the READONLY version on the ADS during this time | The root cause is still being investigated.
2017-05-23 14:05 | 2017-05-23 14:13 | NCSA LDAP | The NCSA LDAP service was timing out | NCSA LDAP service was unavailable | The issue is still being investigated, but the service seems to have been steadily available since the incident.
2017-05-22 15:41 | 2017-05-22 15:51 | idp.ncsa.illinois.edu, oa4mp.ncsa.illinois.edu | Apache Tomcat out of memory | InCommon/SAML IdP and OIDC authentication services were unavailable. | Service restored by failing over to secondary server while memory is being increased on primary server.
05/20/2017 21:09 | 05/20/2017 23:37 | DES nodes on Campus Cluster | Could not communicate outside the switch | All nodes connected to switch in POD22 Rack2 @ACB | Upgrading the code on the switch resolved the issue.
05/20/2017 05:00 | 05/20/2017 21:09 | Campus Cluster and Active Data Storage (ADS) | Total power outage at ACB | All systems currently residing at ACB | Power was restored around 13:00hrs. We rotated ADS rack to align with Campus Cluster Storage Rack. Changed a couple of VLAN IDs to reflect campus for future merger. ESXI boxes are down due to a configuration error after reboot. No major issue from output of FSCK from scratch02.

05/17/2017 02:00 | 05/17/2017 10:45 | Internet2 WAN connectivity | Intermittent WAN connectivity. The outage was a result of Tech Services' DWDM system, which provides us with our physical optical path up to Chicago via the ICCN. Specifically, the Adva card that our 100G wave is on was seeing strange errors, which was causing input framing errors for traffic coming in on this interface. | General WAN connectivity to XSEDE sites, certain commodity routes, and other I2 AL2S connections. | The Adva card was rebooted and we stopped seeing the input framing errors. Tech Services is working with Adva to find the root cause of the issues on the card.
5/11/2017 | 5/12/2017 | ESnet 100G connection | NCSA and ESnet will be moving their 100G connection to a different location in Chicago. | We have several diverse high speed paths to ESnet and DOE; traffic will be redirected to a secondary path. |
2017-05-11 06:45 | 2017-05-11 07:33 | NCSA Jabber upgrade | Upgraded Openfire XMPP jabber software | NCSA Jabber was unavailable during the upgrade. | Jabber was upgraded to the latest version of Openfire

2017-05-09 07:00 | 2017-05-09 18:15 | iForge, cForge, GPFS, License Servers | iForge/cForge Planned Maintenance | iForge/cForge systems, including the ability to submit/run jobs. | PM was completed early at 1815
2017-05-06 22:00 | 2017-05-06 23:00 | NCSA Open Source | Upgrades of Atlassian software | NCSA Open Source BitBucket | BitBucket is upgraded.
2017-05-06 09:00 | 2017-05-06 10:00 | NCSA Open Source | Upgrade of Atlassian Software | Most services hosted at NCSA Open Source were down for 5 minutes during rolling upgrades. | The following services were upgraded: HipChat, Bamboo, JIRA, Confluence, FishEye and CROWD.
2017-05-05 17:43 | 2017-05-05 20:02 | ITS vSphere | A VM node panicked | Several VMs died when the node panicked and were restarted on other VM nodes. This included LDAP, JIRA, Help/RT, SMTP, Identity, and others. | All affected VMs were restarted on other VM nodes. Most restarted automatically.
2017-04-27 18:10 | 2017-04-27 18:55 | Campus Cluster | Another GPFS interruption | Both Resource Manager and Scheduler went down along with a handful of compute nodes. | Restarted the RM and Scheduler and rebooted all down nodes.
2017-04-27 13:11 | 2017-04-27 14:20 | Nebula | glusterfs crashed due to this bug, so no instances could access their filesystems | All instances running on Nebula | Needed to reboot the node that systems were mounting from, but took the opportunity to upgrade all gluster clients on other systems while waiting for a reboot. Version 3.10.1 fixes the bug. All instances with errors in their logs were restarted.
2017-04-27 11:20 | 2017-04-27 12:45 | Campus Cluster | GPFS interruption | Both Resource Manager and Scheduler went down. | Torque serverdb file was corrupted. Restored the file from this morning's snapshot and modified the data to match the current state.
2017-04-26 12:00 | 2017-04-26 18:30 | Condo | A bug in the delete of a disk partition from GPFS (a problem within GPFS) | DES, Condo partitions, and UofI Library. | Partitions had been up for 274 days, with many changes. The delete partition bug caused us to stop ALL operations on the condo and repair each disk through GPFS. Must have quarterly maintenance. Just too complicated to go a year without resetting things.
2017-04-19 16:54 | 2017-04-20 08:45 | gpfs01, iforge, cforge | Filled-up metadata disks on I/O servers caused failures on gpfs01. | iforge and cforge clusters, including all currently running jobs. | Scheduling on iForge and cForge was paused for the duration of the incident. Running jobs were killed. 13% metadata space was freed. Clusters were rebooted and scheduling resumed.

2017-04-19 08:00 | 2017-04-19 13:00 | Campus Cluster | Merging xpacc data and /usr/local back to data01 (April PM) | Resource manager and Scheduler were unavailable during the maintenance. | Once again, /usr/local, /projects/xpacc and /home/<xpacc users> are mounting from data01. No more split cluster.
2017-04-04 (1330) | 2017-04-04 (1600) | Networking | Some fiber cuts caused a routing loop inside the network of one of the campus ISPs. | Certain traffic that traversed this ISP would never reach its final destination. Some DNS lookups would have also failed. | Campus was able to route around the problem, and the ISP also corrected their internal problem. The cut fiber was restored last night.
2017-03-28 (0000) | 2017-03-29 (1600) | LSST | NPCF Chilled Water Outage | LSST - Slurm cluster nodes will be offline during the outage. All other LSST systems are expected to remain operational. | No issues. Slurm nodes restarted.
2017-03-28 (0000) | 2017-03-29 (0230) | Blue Waters | NPCF Chilled Water Outage | Full system shutdown on Blue Waters (except Sonexion which is needed for fsck) | FSCK done on all lustre file systems, XDP piping works done (no leakage found), Software updates (PE, darshan) completed.
2017-03-25 10:15PM | 2017-03-26 00:08AM | Blue Waters | BW scratch MDT failover, df hangs | BW scratch MDT failover, load on mds was 500+ delayed failover. Post FO had some issues that delayed RTS. | scheduler was paused
2017-03-25 4pm | 2017-03-25 8pm | Blue Waters | BW login node ps hang | rebooted h1-h3, lost bw/h2ologin DNS record, had neteng recreate the record. Had to rotate login in and out of round-robins until all rebooted. User email sent (2). | Login nodes rebooted. DNS round-robin changes
2017-03-23 (1000) | 2017-03-23 (1500) | Nebula | NCSA Nebula Outage | Nebula will take an outage to balance and build a more stable setup for the file system. This will require a pause of all instances, and Horizon being unavailable. | File system online and stable. At this time all blocks were balanced and healed.
2017-03-16 (0630) | 2017-03-16 (1130) | LSST | LSST monthly maintenance | GPFS filesystems will go offline for entire duration of outages. Some systems may be rebooted, especially those that mount one or more of the GPFS filesystems. |
2017-03-15 15:11 | 2017-03-15 16:01 | Blue Waters | Failure on cabinet c9-7, affecting HSN. | Filesystem hung for several minutes. | Scheduler was paused for 50 minutes. Warmswap cabinet c9-7. Nodes on c9-7 are reserved for further diagnosis.
2017-03-15 09:00 | 2017-03-15 12:47 | Campus Cluster | UPS work at ACB. Reshuffling electrical drops on 10k controllers, storage IB switches and some servers. | Scheduler will be paused for regular jobs. MWT2 and DES will continue to run on their nodes. | UPS work at ACB - incomplete (required additional parts). Redistributing power work done. Scheduler was paused for 3hrs 50 mins.
2017-03-10 13:00 | 2017-03-10 18:00 | Campus Cluster | ICCP - We lost 10K controllers due to some type of power disturbance at ACB. | ICCP - Lost all filesystems; it's a cluster-wide outage. | Recovered missing LUNs and rebooted the cluster. Cluster was back in service at 18:00.
2017-03-09 0900 | 2017-03-09 1500 | Roger | ROGER planned PM | batch, hadoop, data transfer services & Ambari | system out for 6hrs, DT services out until 0000
2017-03-08 19:41 | 2017-03-08 22:41 | Blue Waters | XDP powered off that served the four cabinets (c16-10, c17-10, c18-10, c19-10). | scheduler paused, four racks power cycled. moab required a restart, too many down nodes and iterations were stuck. | Scheduler paused three hours
2017-03-03 1700 | 2017-03-03 2200 | Blue Waters | BW hpss emergency outage to clean up db2 database | ncsa#nearline, stores are failing with cache full | Resolved cache full errors
2017-02-28 1200 | 2017-02-28 1250 | Campus Cluster | ICC Resource Manager down | User can't submit new jobs or start new jobs | Remove corrupted job file
2017-02-22 1615 | 2017-02-22 1815 | Nebula | Nebula Gluster Issues | All Nebula instances paused while gluster repaired | Nebula is available.
2017-02-11 1900 | 2017-02-11 2359 | NPCF | NPCF Power Hit | BW Lustre was down, xdp heat issues. | RTS 2017-02-11 2359
2017-02-15 0800 | 2017-02-15 1800 | Campus Cluster | ICC Scheduled PM | Batch jobs and login nodes access |
Start | End | What System/Service is affected | What is happening? | What will be affected? | Contact Person | Status
2024-05-15 0700 | 2024-05-15 19:00 | Nightingale | Quarterly Planned Maintenance | All Nightingale servers and services were unavailable (other than the ngale-bastion* nodes) | help@ncsa.illinois.edu | SCHEDULED


Previous Outages or Maintenance

Start | End | What System/Service was affected? | What happened? | What was affected? | Contact Person | Status

10:30 | 11:00 | wiki.ncsa.illinois.edu | wiki.ncsa.illinois.edu has slow load time or completely times out (ticketed as SVC-24573) | Viewing/Editing Confluence (Wiki) pages. | help@ncsa.illinois.edu | resolved

2024-04-25 17:00 | 2024-04-25 17:45 | sslvpn.ncsa.illinois.edu | An update for an active 0day attack was installed | VPN connections dropped for users and they had to reconnect. | help+neteng@ncsa.illinois.edu | complete

 
1700 | 1700 | NCSA Wiki | Server, DB, and Application upgrades | wiki.ncsa.illinois.edu will be unavailable during this time | help@ncsa.illinois.edu | complete

2024-04-19 08:00 | 2024-04-19 09:15 | DHCP leases for NCSANet and NCSA wired network | DHCP server stopped renewing IP address leases after a reboot for patching | DHCP lease renewals ceased, causing loss of connectivity for systems that tried to renew during the window. Already established connections were unaffected. | help+neteng@ncsa.illinois.edu | COMPLETE

2024-04-17 13:15 | 2024-04-17 13:16 | ache-repo.ncsa.illinois.edu | A filesystem is being fsck'ed. | The data filesystem will be offline so packages will be unavailable. | help+security@ncsa.illinois.edu | COMPLETE

2024-04-16 08:45 | 2024-04-16 09:15 | Taiga | Rolling Failovers on Taiga's MDT0 appliance | Access to FS was interrupted | set@ncsa.illinois.edu | complete

2024-04-04 12:35 | 2024-04-04 16:35 | Subset of Taiga native (Lustre) clients | IB Link on tgio11 began failing RDMA traffic causing some I/O interrupt issues on clients leveraging 3 LNET routers. | Access to the file system via these LNET routers is periodically timing out; suspect is bad IB cable. Confirming with vendor. | set@ncsa.illinois.edu | Mitigated

 
0700 | 0730 | VMware migrations | VMware hosts are migrating to a new license model | All VM guest machines and all services should remain operational and accessible. No downtimes are expected. | help@ncsa.illinois.edu | complete

2024-04-03 1800 | 2024-04-04 0700 | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares will be unavailable during maintenance. Users will be unable to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing will be unavailable. | help@ncsa.illinois.edu | complete

2024-04-02 11:00 | 2024-04-02 11:30 | IRST services, including systems run on IRST VMWare clusters | Moving to upgraded switches/routers | Systems run by IRST, and any systems on the IRST-run VMWare cluster. Outage is expected to last < 5min. | help+security@ncsa.illinois.edu | complete



2024-02-28 | 2024-04-01 | SSLVPN | SSLVPN will start using CILogon for authentication and DUO integration. | Four new profiles have been created (duplicating the existing four) but with "cilogon" in the name. These new profiles will use the new authentication method. After a few weeks of testing, if no issues are found, we will remove the old profiles on March 20. | help+neteng@ncsa.illinois.edu | complete

 
2330 | 2345 | NCSA GitLab, NCSA Windows File & Print Servers, Web Redirect Server | VMs migrating to a new cluster | Affected services will be unavailable for a few minutes. | help@ncsa.illinois.edu | complete

 

 

NPCF Wifi | Tech Services will be replacing the AP at NPCF. | Tech Services will be replacing the Access Points at NPCF. No user impact is expected. | help+neteng@ncsa.illinois.edu | complete


2100 hrs | 2200 hrs | Jira, Wiki, internal.ncsa.illinois.edu, identity.ncsa.illinois.edu | VMs will migrate to a new cluster. | Services will be unavailable for a few minutes (<5 mins) while the VM is shut down and moved. | help@ncsa.illinois.edu | complete

 

 

NCSA Wifi | Tech Services will be replacing the AP at the NCSA building. | Tech Services will be replacing the Access Points at the NCSA building. No user impact is expected. | help+neteng@ncsa.illinois.edu | complete

2024-03-14 0700 | 2024-03-15 2140 | vForge / license servers | Quarterly Planned Maintenance | All vForge nodes and services (incl. related license servers/services) will be unavailable | help@ncsa.illinois.edu | complete


03/14/2024 0800 | 03/15/2024 2125 (extended outage due to a problematic upgrade solution; vendor engineers involved) | Taiga/Granite | Semi-Annual Maintenance | All Taiga & Granite Storage Services | set@ncsa.illinois.edu | COMPLETED

1200 | 1500 | vsphere.ncsa.illinois.edu console | Upgrade | The vsphere.ncsa.illinois.edu web console. VMs should not be affected | help@ncsa.illinois.edu | completed

2024-03-07 | 2024-03-08 | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares will be unavailable during maintenance. Users will be unable to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing will be unavailable. | help@ncsa.illinois.edu | completed

2024-02-22 0640 | 2024-02-22 0648 | NCSA GitLab | GitLab being updated to latest version | All GitLab services will be unavailable for a few minutes. | help@ncsa.illinois.edu | completed

1500 | 1600 | ACHE vSphere | ACHE vSphere is being upgraded | ACHE vSphere will not be accessible | help@ncsa.illinois.edu | completed

1600 | 1630 | sslvpn.ncsa.illinois.edu | SSL cert is refreshed | Users may need to manually reconnect if the system drops their session | neteng@ncsa.illinois.edu | completed

02/08/2024 1030 | 02/08/2024 1330 | NCSA Backbone Network Battery Backup | NPCF Network DC Battery Maintenance | Network Engineering is taking the battery back-up servicing NPCF networking equipment offline for periodic maintenance. This will be non-service impacting, as all core networking equipment still has two independent power feeds. | neteng@ncsa.illinois.edu | completed

2024-01-29 1600 | 2024-02-05 | UIUC Network | Complete / partial network outage | While NCSA network is up and not impacted, much of the UIUC network is currently offline. This could be affecting a broad range of services such as wireless, facility networks, campus websites, etc. No current ETA, as engineers are still troubleshooting the problem. | neteng@ncsa.illinois.edu | completed

1730 | 1830 | vCenter Server Appliance | Critical patches are being applied | The vcenter.internal.ncsa.edu site will not be accessible. Operating VMs should not be affected | help@ncsa.illinois.edu | completed

0800 | 1400 | HOLL-I | HOLL-I will enter a shuttered/standby mode | All HOLL-I servers and services will no longer be available after standby mode is activated. | help@ncsa.illinois.edu | completed

2024-01-24 1800 | 2024-01-25 0700 | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares will be unavailable during maintenance. Users will be unable to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing will be unavailable. | help@ncsa.illinois.edu | completed


0900 | 0915 | LastPass | Yearly audit performed. Users disabled or deleted per policy | Accounts that had been disabled for over a year were deleted. Accounts that were unused for a year were disabled | help+security@ncsa.illinois.edu | completed


2024-01-22 0900 | 2024-01-22 1000 | idp.ncsa.illinois.edu | Configuration update for sslvpn | Configuration for the NCSA Identity Provider will be updated with login support for sslvpn.ncsa.illinois.edu. | help+idp@ncsa.illinois.edu | completed

1700 | 1745 | Wiki service upgrade | Upgrade version to address recently announced security vulnerabilities. | Wiki will be down during upgrade and testing. | help@ncsa.illinois.edu | completed

2200 | 2330 | Jira service upgrade | Upgrade version to address recently announced security vulnerabilities. | Jira will be down during upgrade and testing. | help@ncsa.illinois.edu | completed

2024-01-17 0500 | 2024-01-17 0700 | Wireless connectivity on 2nd, 3rd and 4th floors. | Tech Services will be replacing some network components in switches that provide connectivity for wireless. | Each floor (wireless) will lose connectivity for a few mins while the cards are replaced. | neteng+help@ncsa.illinois.edu | completed

2024-01-16 0900 | 2024-01-16 23:30 | Facility UPS | Second attempt, Preventive Maintenance - Replace UPS capacitors | All systems which are connected to UPS power. During the PM the systems will not lose power but will be unprotected. | rantissi@illinois.edu | COMPLETED

2024-01-16 2200 | 2024-01-17 0300 | Water leak in Node 1 on campus. | Node1 (located on campus) has a water leak that may require full power down to address. This will take out several devices that provide connectivity to NCSA WAN. | No power outage was needed to repair the leak | neteng@ncsa.illinois.edu | COMPLETED

2024-01-10 0700 | 2024-01-10 12:09 | Nightingale | Quarterly Planned Maintenance | All Nightingale servers and services were unavailable (other than the ngale-bastion* nodes) | help@ncsa.illinois.edu | COMPLETED

2024-01-09 2100 | 2024-01-10 0400 | Wifi | Performing a Code upgrade that will affect the Wi-Fi Environment. | The majority of the system will be online and functional while individual Access Points will be upgraded. This upgrade is expected to gracefully migrate clients to adjacent access points to minimize any interruption. This will more than likely also impact NCSAnet. | neteng+help@ncsa.illinois.edu | COMPLETED

08 Jan 2024 0700 | 08 Jan 2024 1600 | vforge | Radiant upgrade | Entire cluster is shut down | jlong@ncsa.illinois.edu | Completed


0700 | 1430 | Radiant | The Radiant cluster was upgraded from OpenStack Wallaby to Yoga. | The web dashboard and API endpoints were unavailable; networking for instances may have been intermittent. | help@ncsa.illinois.edu | Completed

2024-01-05 0400 | 2024-01-19 1200 | Wifi | Upgrading the code used for the Authentication on the Wi-Fi system and VPN. | There will be an interruption to the IllinoisNet_Guest device registration and the IllinoisNet_Guest self-registration portal; both are expected to be back online before regular business hours. Regular authentication and traffic flow for the Wi-Fi and VPN is not expected to be interrupted. This will more than likely also impact NCSAnet. | neteng+help@ncsa.illinois.edu | COMPLETED

2023-12-19 0400 | 2023-12-19 0800 | Wifi | Upgrading the core campus Wi-Fi hardware. | There will be an interruption to Campus Wi-Fi (including IllinoisNet, IllinoisNet_Guest, and eduroam), IllinoisNet_Guest device registration, and the IllinoisNet_Guest self-registration portal. This will more than likely also impact NCSAnet. | neteng+help@ncsa.illinois.edu | Completed

2023-12-13 1800 | 2023-12-14 0700 | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares will be unavailable during maintenance. Users will be unable to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing will be unavailable. | help@ncsa.illinois.edu | Completed


1600 | 1630 | wiki.ncsa.illinois.edu and jira.ncsa.illinois.edu | Atlassian has notified us of several critical security vulnerabilities in Confluence and Jira software. A mitigation has been applied to the Jira server and the Confluence server (wiki.ncsa.illinois.edu) will be patched. | There will be a brief outage to patch the Confluence server at 1600. The patching is expected to take 15-20 minutes but the entire hour is reserved as a precaution. | help@ncsa.illinois.edu | Completed

2023-12-06 0900 | 2023-12-06 1700 | Facility UPS | Preventive Maintenance - Replace UPS capacitors. | All systems which are connected to UPS power. During the PM the systems will not lose power but will be unprotected. | MO Rantissi | INCOMPLETE. UPS maintenance was halted due to damaged parts. Putting the UPS back together and rescheduling for a later date. The UPS is back online.

2023-11-10 0800 | 2023-11-10 0900 | IDDS database | Planned maintenance: postgresql upgrade | NCSA identity, group management, campus cluster user management page, TEM shift report tool, and naps | help+idds@ncsa.illinois.edu | Completed

2023-11-09 0700 | 2023-11-09 1445 (vForge), 2023-11-09 1630 (license servers) | vForge / license servers | Quarterly Planned Maintenance | All vForge nodes and services (incl. related license servers/services) will be unavailable | help@ncsa.illinois.edu | Completed

2023-11-09 0700 | 2023-11-09 14:30 | vForge / license servers | Quarterly Planned Maintenance | All vForge nodes and services (incl. related license servers/services) will be unavailable | help@ncsa.illinois.edu | Completed

2023-11-08 1200 | 2023-11-08 1230 | Radiant | Rebuilding rabbitmq service | Dashboard and API services were read-only during this time. | help@ncsa.illinois.edu | COMPLETED

2023-11-03 | 2023-11-07 | hub.ncsa.illinois.edu | Private docker registry is down due to volumes in radiant in detaching state | hub.ncsa.illinois.edu is not reachable, and images stored are unreachable. Services that have their images local should continue to run; services that want to push/pull images will get a 500 error. | Rob Kooper | COMPLETED

1700 | 1830 | Confluence/Wiki | Upgrade the system | Confluence | help@ncsa.illinois.edu | COMPLETED

2023-10-31 09:30 | 2023-10-31 10:30 | NCSA OpenSource | Upgrade Atlassian products | opensource confluence/jira/bamboo/bitbucket | COMPLETED

 

 

HAL | Full System PM | All HAL services | POSTPONED


1330 | 14:30 | SSLVPN | New auth method was added to a new login profile, ncsa-vpn-saml-tunnelall. | There is now a test profile in place that isn't open to everyone. Please continue to use the profiles you were using before. If you notice an issue please report it. Our testing indicated logins were working as intended. | help+neteng@ncsa.illinois.edu | COMPLETE

0900 | 1700 | Radiant | OpenStack software update | The Radiant team will be conducting an OpenStack software update, from Victoria to Wallaby. This is a software stability update and does not include significant features or changes in functionality. The update will be done online and is not expected to impact running instances or system access. | help@ncsa.illinois.edu | COMPLETE

1700 | 0700 | NCSA File & Print Servers | Scheduled Windows Server Maintenance | File & Print Shares will be unavailable during maintenance. Users will be unable to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing will be unavailable. | help@ncsa.illinois.edu | complete

0930 | 1100 | Taiga | Online, Rolling patch of Taiga servers | Taiga File System | set@ncsa.illinois.edu | COMPLETE

0800 | 0800 | HOLL-I | CS-2 Appliance Mode upgrade | All HOLL-I servers and services will be limited to internal testers | help@ncsa.illinois.edu | Completed

0800 | 0800 | HOLL-I | CS-2 Appliance Mode hardware installation | All HOLL-I servers and services will be unavailable | help@ncsa.illinois.edu | COMPLETE

1800 | 1815 | Confluence | Config will be applied to increase the period users can stay logged in before being logged out | Confluence will be down | help@ncsa.illinois.edu | COMPLETE

1700 | 1800 | sslvpn | Testing new auth method | No user impact was observed | help+neteng@ncsa.illinois.edu | COMPLETE

2023-10-10 0800 | 2023-10-10 1000 | cilogon.org | Moving to new compute infrastructure | cilogon.org, demo.cilogon.org, crl.cilogon.org | help@cilogon.org | COMPLETE

2023-10-09 12152023-10-09 1532TaigaAppliance has unmounted all of its OSTs. Ability to do I/O to Taigaset@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

0600

0700

ConfluenceConfluence is being upgradedConfluence will not be available for use

help@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

2023-10-04 06002023-10-04 1927DeltaFilesystem and OS patchingAll Delta resources will be unavailable during the maintenance period
including:
+ Delta login nodes - unavailable
+ Delta compute nodes - unavailable
Delta services
+ Open OnDemand - unavailable
+ Delta Globus Online endpoint - unavailable
help@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

2023-10-04 09502023-10-04 1000Opensource ConfluencePatching Confluenceopensource confluence will be down

Status
colourGreen
titleCOMPLETE

2023-09-27 17002023-09-28 0700NCSA File & Print ServersScheduled Windows Server MaintenanceFile & Print Shares were unavailable during maintenance.  Users were unable to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable.help@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

2023-09-26 14002023-09-26 1500Wireless NCSA buildingCampus wireless outage.NCSAnet and IllinoisNet users are experiencing connectivity issues.   Tech Services is aware of the problem. help+neteng@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

2023-09-28 14332023-09-28 1459cilogon.orgservice outage due to AWS database issuelogins to cilogon.org were failinghelp+cilogon@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

2023-09-26 7:30AM2023-09-26 8:00AMLdap Primary  ServerMaintenanceLdap updates will be disabled during maintenanceTimothy Bouvet 

Status
colourGreen
titleCOMPLETE

09/22/2023 8:00am9/22/2023 1:30pmWireless accessNCSANet is not authenticating users and denying connections.Anyone attempting to connect to the wireless NCSANet ID.neteng@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

2023-08-292023-09-21 - 1300opensource bitbucketBitbucket is not compatible with the deployed version of git, see https://jira.atlassian.com/browse/BSERV-14390opensource.ncsa.illinois.edu/bitbucket

Status
colourGreen
titleCOMPLETE

2023-Sep-19 - 07452023-Sep-19 - 0750LastPassRekey the LastPass/Duo IntegrationLastPass users that utilize duo may not be able to authenticate until completedJames Eyrich

Status
colourGreen
titleCOMPLETE

2023-Sep-18 - 15112023-Sep-18 - 1749TaigaOutage due to failed MDS failover.Taiga access was unavailable.set@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

2023-09-14 07002023-09-14 2015vForge / license serversQuarterly Planned Maintenanceall vForge nodes and services (and related license servers/services) will be unavailablehelp@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

2023-09-14-08002023-09-14-2000Taiga & Granite ServicesSemi-Annual Planned MaintenanceAll Taiga and Granite services will be offline 

set@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

2023-09-08  06:582023-09-08 09:50disruption to NPCF-DES-CORE, NPCF-CWMGMT-FW1 & 2, MForge VPN

NPCF-CORE-EAST has a DEAD linecard.

Relocating affected links to other linecards with open ports while we work with vendor support for a replacement.

redundancy has been lost, access and activity remain normal.

Status
colourGreen
titleCOMPLETE

2023-09-06 09:482023-9-6 10:35NCSA Center Wide Management Networkthe firewall protecting this network is showing offlineCenterwide management networks in NCSA building(John) Walker 

Status
colourGreen
titleCOMPLETE


2023-9-6 09:502023-9-6 10:35The main switch in NCSA 3003In debugging a link problem between NCSA and NPCF the wrong fiber was inadvertently pulledNetworking in and out of 3003 was down for 35 minsneteng@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

2023-08-26 23:062023-08-27 01:36CILogonCILogon database replication errorCILogon OAuth/OIDC services unavailablehelp+cilogon@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

2023-08-24 13:022023-08-25 14:15TaigaMultiple SAS cable backend failure causing OSTs to go into write protect and unmountAccess to certain OSTs in Taiga

set@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

2023-08-16 17002023-08-17 0000NCSA File & Print ServersScheduled Windows Server MaintenanceFile & Print Shares were unavailable during maintenance.  Users were unable to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable.help@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

2023-08-16 07002023-08-16 1003NightingaleQuarterly Planned MaintenanceAll Nightingale servers and services were unavailable (other than the ngale-bastion* nodes)help@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

2023-08-15 04002023-08-15 0500VMWare GatewayVMWare is updating the Gateway OSNo expected effects

help@ncsa.illinois.edu

Status
colourGreen
titleComplete

2023-08-14 09352023-08-14 2000NCSA VPNDuo implemented new ssl checks that we were not passingUsers couldn't authenticate with DUO to establish new connections to the VPN. Existing VPN sessions remain connected.Matthew Elliott 

Status
colourGreen
titleComplete

2023-07-27 17002023-07-27 1715HOLL-ILive kernel patchingkernel was updated in response to recent security issue.help@ncsa.illinois.edu

Status
colourGreen
titleComplete

2023-07-25 09002023-07-25 1630RadiantChanges to the OpenStack network configuration and network service node (increasing MTU on customer networks and adding a new dedicated network server)These changes will impact project/instance networks and cause them to be unreachable for an extended period of time. Expect network timeouts and failure of NFS file system access. Systems may be unreachable for several hours - up to the entire planned time - but we are making every effort to minimize the downtime.

James Glasgow via help@ncsa.illinois.edu

Status
colourGreen
titleComplete

2023-07-19 0800

2023-07-19 2000

ICCPICCP Quarterly MaintenanceAll ICCP services

help@campuscluster.illinois.edu

Status
colourGreen
titleComplete

2023-07-19 08002023-07-19 0900u1carne routerscheduled maintenancemForge, Magnus, and Access will have a brief outage as the routers reboot.

Michael Douglas via neteng@ncsa.illinois.edu

Status
colourGreen
titleComplete

2023-07-12 17002023-07-12 2130NCSA File & Print ServersScheduled Windows Server MaintenanceFile & Print Shares were unavailable during maintenance.  Users were unable to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable.help@ncsa.illinois.edu

Status
colourGreen
titleComplete

2023-07-07 10002023-07-07 1100NCSA KerberosDeleting out principals that were disabled on 2023-06-07Kerberos authentication should already be disabled for the planned hosts, so there should be zero notable effect. help@ncsa.illinois.edu

Status
colourGreen
titleComplete

1345

1450

HOLL-ITransitioning CS-2 Execution Mode from Weight Streaming to PipelinedHoll-I CS-2help@ncsa.illinois.edu

Status
colourGreen
titleComplete

2023-06-30 06002023-06-30 0605NCSA GitLabGitLab was updated to latest versionAll GitLab services were unavailable for a few minutes.help@ncsa.illinois.edu

Status
colourGreen
titleComplete

2023-06-29 18492023-06-29 2000DeltaMore power fluctuations due to the severe weather have caused system failures in all NCSA buildings. NCSA staff are working to restore all services to full functionality.Delta Login, Open OnDemand and Scheduling.help@ncsa.illinois.edu

Status
colourGreen
titleRESOLVED

2023-06-29 13162023-06-29 1530Most NCSA computer systemsPower fluctuations due to the severe weather have caused multiple system failures in all NCSA buildings. NCSA staff are working to restore all services to full functionality.

Virtually all systems have been impacted to some extent.

Most NCSA compute resources have returned to service.

help@ncsa.illinois.edu

Status
colourGreen
titleRESOLVED

2023-06-21 18002023-06-21 1900DNS1 / DNS2 BIND security patchesDue to a security issue with BIND, neteng will be rebooting both DNS servers (staggered) starting tonight at 1800.  neteng@ncsa.illinois.edu

Status
colourGreen
titleComplete

2023-06-15 06002023-06-15 0605NCSA GitLabGitLab updated to use new backup methodAll GitLab services were unavailable for a few minutes.help@ncsa.illinois.edu

Status
colourGreen
titleComplete

2023-06-14 17002023-06-14 2200NCSA File & Print ServersScheduled Windows Server MaintenanceFile & Print Shares were unavailable during maintenance.  Users were unable to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable.help@ncsa.illinois.edu

Status
colourGreen
titleComplete

2023-06-08 11002023-06-08 1205ICI MetricsMajor Upgrade to Grafana 9.5.x and Unified AlertingAccess to https://metrics.ncsa.illinois.edu and all alerting was pausedmalone12@illinois.edu

Status
colourGreen
titleComplete

2023-06-07 13002023-06-07 1500NCSA KerberosDisabling Kerberos Host Principals not in DNSKerberos Authentication for hosts may stop working. Please create a ticket if you think your host principal may have been disabled erroneously. help@ncsa.illinois.edu

Status
colourGreen
titleComplete

2023-05-23 12402023-05-23 1410TaigaFailover events on tgio02I/O to and from Taiga for all services intermittently during this periodset@ncsa.illinois.edu

Status
colourGreen
titleRESOLVED

2023-05-23 08002023-05-23 1400Granite Tape ArchiveUnplanned Library Maintenance due to component failureRetrieval of data;bdickin2@illinois.edu

Status
colourGreen
titleRESOLVED

2023-05-18 08002023-05-18 1400Granite Tape ArchiveLibrary Preventative MaintenanceRetrieval of data;bdickin2@illinois.edu

Status
colourGreen
titleCOMPLETE

2023-05-17 07002023-05-17 2125NightingaleQuarterly Planned Maintenance

All Nightingale servers and services were unavailable (other than the ngale-bastion* nodes)

help@ncsa.illinois.edu

Status
colourGreen
titleComplete

2023-05-17 05302023-05-17 0600Wireless and VoIP (NCSA Building)Router UpgradesWireless, VoIP and anything directly connected to the campus switches will be down, while they upgrade firmware on the router.help+neteng@ncsa.illinois.edu

Status
colourBlue
titleSCHEDULED

2023-05-16 07002023-05-16 1100HOLL-IQuarterly Planned Maintenanceall HOLL-I nodes and services were unavailablehelp@ncsa.illinois.edu

Status
colourGreen
titleComplete

2023-05-15 09002023-05-15 1200HOLL-ICS-2 CDU maintenancethe HOLL-I CS-2 was unavailable and there was a reservation in Slurmhelp@ncsa.illinois.edu

Status
colourGreen
titleComplete

2023-05-12 06002023-05-12 0615NCSA GitLabGitLab was updated to latest versionAll GitLab services were unavailable for a few minutes.help@ncsa.illinois.edu

Status
colourGreen
titleComplete

2023-05-11 0700

2023-05-11 1900

vForge / license serversQuarterly Planned Maintenance

all vForge nodes and services (and related license servers/services) will be unavailable

help@ncsa.illinois.edu

Status
colourGreen
titleComplete

2023-May-03 08002023-April-25 0900ACHE FW Cluster Upgrade - SecondaryUpgrading ACHE Firewall member BNo outage expectedeyrich@illinois.edu

Status
colourGreen
titleCOMPLETE

13:00

16:30

vSphere and hosts on it.VMWare Licensing issues. Was forced to migrate to new vSphere.LDAP, Wordpress Sites, varioushelp@ncsa.illinois.edu

Status
colourGreen
titlecomplete

2023-05-01
1530
2023-05-01
1605
ICI VMwareApply updates to address software issue.


Expand
titleVMs that could experience a short outage...

campuscluster
congo-vm.ncsa.uiuc.edu
internal-test
ldap-mg1.ncsa.illinois.edu
ldap-radiant1
ldap1.delta
ldap2
ldap2.ngale.ncsa.illinois.edu
ldap3.ncsa.illinois.edu
manage - uillinois.edu AD
metrics01
metrics02
midwestbigdatahub.org
rad-adm01
studentcluster.ncsa.illinois.edu
svna-build
tintri-global-center
asd-backup01.internal.ncsa.edu
asd-log.internal.ncsa.edu
asd-pup01.internal.ncsa.edu
hli-pup01.internal.ncsa.edu
jlongtest
mlong-agent1
oncall-test


aloftus@ncsa.illinois.edu

Status
colourGreen
titlecomplete

2023-April-26 08002023-April-25 0900ACHE FW Cluster Upgrade - primaryUpgrading ACHE Firewall member ANo outage expectedeyrich@illinois.edu

Status
colourGreen
titleCOMPLETE

2023-April-25 08002023-April-25 0900NPCF CWFM Cluster Upgrade secondaryUpgrading NPCF CW Firewall member BNo outage expectedeyrich@illinois.edu

Status
colourGreen
titleCOMPLETE

 

1700

 

1715

NCSA VPNThe certificate on the NCSA VPN was replaced.Users will be disconnected from the VPN and have to manually reconnect.neteng@ncsa.illinois.edu 

Status
colourGreen
titleCOMPLETE








2023-April-20 08002023-April-20 0900NPCF CWFM Cluster Upgrade primaryUpgrading NPCF CW Firewall member ANo outage expectedeyrich@illinois.edu
Status
colourGreen
titleComplete


2023-04-19 18002023-04-19 2300NCSA File & Print ServersScheduled Windows Server MaintenanceFile & Print Shares were unavailable during maintenance.  Users were unable to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable.help@ncsa.illinois.edu

Status
colourGreen
titleComplete

  0800

  2000

DeltaHSN and OS are being updated.The entire system will be offline.kingda@illinois.edu

Status
colourGreen
titleComplete

04/17/23 090004/17/23 1400Granite Tape ArchiveUpgrades to FSIngest or retrieval of data;bdickin2@illinois.edu

Status
colourGreen
titleComplete

2023-03-27


2023-04-16NCSA OpenSource BitBucketincompatibility with git, only versions that can be installed are 2.25 or 2.40, and Bitbucket requires version 2.31 - 2.39
https://opensource.ncsa.illinois.edu/bitbucket is down until new version of BitBucket

Rob Kooper 

Status
colourGreen
titleCOMPLETED

2023-04-032023-04-04NCSAnet, IllinoisNet, EDUroamTech Services is deploying a new certificate for all wireless networks.Check #announce on NCSA Slack for more information, including links to download software that will update your wireless profiles. help+neteng@ncsa.illinois.edu

Status
colourGreen
titleComplete

2023-03-29 12:00 CDT2023-03-29 12:30 CDTPrimary Kerberos serverConfiguration changes to match secondary KDCsPassword changes may have been delayed by ten minutesChristopher Lindsey 

Status
colourGreen
titleComplete

2023-03-23 08432023-03-23 1030DHCP serving NCSAnet wireless and NCSA office wired wall jacksThe main NCSA DHCP server stopped answering queries and was restartedIf you didn't already have a DHCP lease your system would have been unable to connect to NCSAnet or register on an office wired wall jack.neteng@ncsa.illinois.edu

Status
colourGreen
titleComplete

2023-03-15 18002023-03-16 2300NCSA File & Print ServersScheduled Windows Server MaintenanceFile & Print Shares were unavailable during maintenance.  Users were unable to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable.help@ncsa.illinois.edu

Status
colourGreen
titleComplete

2023-03-14 11002023-03-14 1150Authentication to vsphere.ncsa.illinois.edu and ache-vcenter will failReplacing SSL certs on Ldap1/2Ldap will be restarted on Ldap1/2tbouvet@illinois.edu

Status
colourGreen
titleComplete

2023-03-09 0700

2023-03-09 17:20

vForge / license serversQuarterly Planned Maintenance

all nodes and services will be unavailable

help@ncsa.illinois.edu

Status
colourGreen
titleComplete


03/09/2023 0800

03/09/2023 1713NCSA Taiga & GraniteTaiga Service Node Updates & Granite UpgradeTaiga Public LNET router was upgraded and a second one added; access via public LNET was down from 0800 to 1100.  Globus and NFS services were patched in a rolling/online fashion.  

Granite experienced a short full downtime as we upgraded its software.  
set@ncsa.illinois.edu 

Status
colourGreen
titleComplete

03/07/2023 8:30am03/07/2023 10:15amDelta HSNThe HSN was dropping nodes and not allowing nodes to reconnectHigh Speed Connectivityhelp@ncsa.illinois.edu
Status
colourGreen
titleComplete
2023-03-01: 1100

2023-03-01: 1115

Radiant OpenStack ServicesChanges to the OpenStack controller node to address networking performance issues

All OpenStack services were restarted to effect system configuration changes. The work was completed successfully and all services are available again

help@ncsa.illinois.edu

Status
colourGreen
titleComplete

2023-02-25

2023-02-27 0930

NCSA email A mail loop caused routing and processing problems.Mail routing and delivery were blocked.help@ncsa.illinois.edu

Status
colourGreen
titlecomplete

2023-02-21 0700

2023-02-21 1640

HOLL-IQuarterly Planned Maintenanceall nodes and services will be unavailablehelp@ncsa.illinois.edu

Status
colourGreen
titlecomplete

2023-02-16 ~14:152023-02-16 ~14:25cerberus4mis-configuration caused roughly 50% of connections to be dropped50% of connections in and out droppedhelp+security@ncsa

Status
colourGreen
titlecomplete

2023-02-10 09102023-02-10 0915users.ncsa.illinois.edu web siterestarting the systemno web pages from users.ncsa.illinois.edu will be available help@ncsa.illinois.edu

Status
colourGreen
titlecomplete

02/08/2023 180002/09/2023 0000NCSA File & Print ServersScheduled Windows Server MaintenanceFile & Print Shares were unavailable during maintenance.  Users were unable to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable.help@ncsa.illinois.edu

Status
colourGreen
titlecomplete

1215

1230

JiraJira will be restarted to fix stuck notification emails.Jira will be unavailable during this time.

Andrew Loftus 

Also posted to #announce (Slack)

Status
colourGreen
titlecomplete

 11:18

15:02

ICCP head node login and golub compute resourcesLost network connectivity for golub infrastructureICCP head node logins (ie cc-login.campuscluster.illinois.edu) and golub compute resources

help@campuscluster.illinois.edu

Status
colourGreen
titleresolved

1200

1300

JiraJira offline for service restart to fix stuck emails.Jira will be unavailable during this time.help@ncsa.illinois.edu

Status
colourGreen
titleCompleted

01/25/2023 0800

01/25/2023 0830

NCSA LDAProlling LDAP restarts of redundant servers to deploy new schema file

Minimal impact for service restarts

Status
colourGreen
titleCompleted

2023-01-19 13102023-01-19 1330JiraJira offline for reboot to fix Boards.Jira will be unavailable during this time.help@ncsa.illinois.edu

Status
subtletrue
colourGreen
titleComplete

  0800

1700

ICCPICCP Quarterly MaintenanceAll ICCP services

help@campuscluster.illinois.edu

Status
colourGreen
titleresolved

2023-01-13 12002023-01-13 1230JiraJira offline for dashboard fixes.Jira will be unavailable during this time.help@ncsa.illinois.edu

Status
subtletrue
colourGreen
titlecompleted

2023-01-12 08002023-01-13 1230JiraMinor issues noticed in Jira likely caused by the upgrade yesterday evening. Gadgets and dashboards are having issues.

Status
subtletrue
colourGreen
titleresolved

2023-01-11 07002023-01-12 1200NightingaleQuarterly Planned Maintenance

All Nightingale servers and services will be unavailable (other than the ngale-bastion* nodes)

Maintenance has been extended until noon Thu, Jan 12 due to complications with firmware update on the Lustre storage appliance.

help@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETED

2023-01-12 07002023-01-12 0715NCSA VPN Router MigrationThe NCSA VPN was migrated to a different upstream router. Users were briefly disconnected. help+neteng@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETED

  0600

  0615

NCSA GitLabGitLab upgradeAll GitLab services were unavailable for a few minutes while it upgraded to the latest version.

help@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETED

06:40 1/9/20232023-01-11 2100vSphere in 3003One of the storage appliances serving vsphere.ncsa.uiuc.edu started having access issues. This has caused issues with 19 VMs.

crashplan has returned to service

help@ncsa.illinois.edu
Status
colourGreen
titleCOMPLETED


2023-01-11 17302023-01-11 1915Jira Jira software upgradeJira will be unavailable while software upgrades are applied.help@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETED

1/9/2023 6:40am1/11/2023 variousvSphere in 3003One of the storage appliances serving vsphere.ncsa.uiuc.edu had access issues.  Data was moved to different storage for affected VMs.digitalag.ncsa.illinois.edu, gecat, reu.ncsa.illinois.edu, ACIpartnership.org, astro, edream, caiiwp, brainstormhpcd.org, internal-dev, cmdb-dev-kimber7, reu-international.ncsa.illinois.edu, avl-test, mharp - ergo, infews-er.net, ncsa30, bluewaters - 2018-03-05.help@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETED

12/23/2022 6:30pm12/27/2022 1:30pmTaigaSingle OST is failing to re-mount following failoverFile system is unavailableset@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETED

 0530

 0600

Wireless at NCSA building.Router UpgradeTech Services will be upgrading their NCSA building router, which will affect wireless at the NCSA building.  Downtime is estimated at 15 mins. help+neteng@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETED

0800

1200

RadiantSystem maintenance

OpenStack:

  • "Minor system configuration changes will be made to increase system logging and optimize memory usage/allocation across nodes. No noticeable impact to end users is expected."

Networking:
  • Swap fiber links to correct issue with security taps: In order to minimize user impact, we will swap one link at a time. Users should see no impact. There is a slight possibility of a temporary network outage lasting a few minutes, but we do not currently anticipate this happening.
  • Update Ethernet switch firmware: Switch reboots will be done in a rolling fashion and so are not expected to be disruptive to ongoing operations (due to switch/path redundancy).
help@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETED

15 Dec 2022 090015 Dec 2022 0935NCSA KerberosNCSA's Read-Write KDC is being upgradedPassword changes and new accounts are being queued for completion after the upgrade.help@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETED

  0600

  0615

NCSA GitLabGitLab was upgraded to latest versionAll GitLab services were unavailable for a few minutes.

help@ncsa.illinois.edu

Status
colourGreen
titlecompleted

12-01-2022 060012-01-2022 0700NCSA VPNSoftware UpgradesThe appliances hosting the NCSA VPN were patched. Users experienced a brief disconnect as load was failed over between the appliances. The anyconnect client was upgraded at this time.neteng@ncsa.illinois.edu

Status
colourGreen
titleResolved

1130

1400

NCSA identity password resetsThe password reset process is not completing.Users' password resets were queued and then applied when the issue was fixed.  Users who tried to change their password should find their password is now set to the password of their last attempt.help@ncsa.illinois.edu

Status
colourGreen
titleresolved

 

 

capnjack (license server)Changes to IPTABLESUnknown servers. Licenses affected are IDL, PGI, Intel, MATLAB, Abaqus, Sention LM, Luda, Ansys, CDL, Adaptive, Converge, CFD, RLM Type, rr_ld

meberger@illinois.edu

re: SVCPLAN-1465

Status
colourGreen
titleCompleted

2022-11-16 10422022-11-16 1351CILogonDocker Swarm failureCILogon services were unavailable. See: https://cilogon.statuspage.io/incidents/2blf564965s0help@cilogon.org

Status
colourGreen
titleResolved

2022-11-15 07002022-11-15 1700HOLL-IQuarterly Planned Maintenanceall nodes and services will be unavailablehelp@ncsa.illinois.edu

Status
colourGreen
titlecompleted

2022-11-10 0700

2022-11-10 1200

vForge / license serversQuarterly Planned Maintenance

all nodes and services will be unavailable

help@ncsa.illinois.edu

Status
colourGreen
titleCompleted

2022-11-10 11:002022-11-10 11:50ASD Vsphere, specifically vm's using the tintri storage appliance.Network connections were upgraded to 25G speed.There was no disruption of service with this work.help@ncsa.illinois.edu

Status
colourGreen
titlecompleted

2022-11-07 09002022-11-07 0958set-analytics.ncsa.illinois.eduPhysical Machine Move from 3003 to NPCFThe SET managed Grafana/InfluxDB instance will be unavailableset@ncsa.illinois.edu

Status
colourGreen
titlecompleted

2022-11-04 19002022-11-04 1930SET TaigaSET  caused a failover of tgio02 and then failed back.  This fixed the mounting issue.Clients with taiga currently mounted may experience slow or stopped IO during the failover.  Failover completed properly and solved the mounting issue.set@ncsa.illinois.edu

Status
colourGreen
titlecompleted

2022-11-03 11322022-11-04 1930Delta

Taiga filesystem (/taiga/ and /projects/) problem on dt-login01 and dt-login02

The issue is limited to dt-login01 and dt-login02. Commands attempting to access /taiga/ or /projects/ on these nodes will hang.

Users are advised to use dt-login03 or the login.delta.ncsa.illinois.edu "round robin" address

UPDATE: dt-login01 and dt-login02 are fully functional again and back in the login.delta.ncsa.illinois.edu DNS "round robin".

help@ncsa.illinois.edu

Status
colourGreen
titlecompleted

2022-11-03 00482022-11-03 0106SET Taigatgio02 and tgio04 failed overOSTs on the two nodes were inaccessible until the reboots were complete. This is a known issue with a vendor patch in progress. set@ncsa.illinois.edu
Status
colourGreen
titlecompleted


2022-11-03 06002022-11-03 0615NCSA GitLabGitLab was upgraded to latest versionAll GitLab services were unavailable for a few minutes.

help@ncsa.illinois.edu

Status
colourGreen
titlecompleted

2022-11-02 17002022-11-02 2000DNS ServicesPatching for out of cycle security updates.DNS1 and DNS2 will be patched and rebooted (staggered) to apply needed updates.help+neteng@ncsa.illinois.edu

Status
colourGreen
titlecompleted

2022-11-01 18002022-11-02 0000NCSA File & Print ServersScheduled Windows Server MaintenanceFile & Print Shares were unavailable during maintenance.  Users were unable to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable.help@ncsa.illinois.edu

Status
colourGreen
titlecompleted

0800

0830

idp.ncsa.illinois.eduEnable Duo Universal PromptNCSA Identity Provider will now use Duo Universal Prompthelp+idp@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETED

2022-10-25 08002022-10-25 0900NCSA building 1st Floor Wifi / Security CamerasTech Services is replacing a networking switch on the 1st floor of the NCSA building that powers the Access Points on the first floor.This should be a short down time, but the access points will reboot while we migrate cables to the new switch.  help+neteng@ncsa.illinois.edu

Status
colourGreen
titlecompleted

2022-10-19 07:002022-10-20 07:15Some SSH Bastion HostsOut-of-Cycle reboot needed after failed patching.
Will reboot tomorrow at 07:00am
bwbh1.ncsa.illinois.edu
bwbh3.ncsa.illinois.edu
cerberus1.ncsa.illinois.edu
cerberus3.ncsa.illinois.edu
ache-bastion-1.ncsa.illinois.edu
ngale-bastion-1.ncsa.illinois.edu
help+security@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETED

2022-10-18 15:002022-10-18 15:30Radiant instance creation/managementsystem setting changesNo noticeable impactpl@illinois.edu

Status
colourGreen
titlecomplete

2022-10-18 12:002022-10-18 12:05identity, email to NCSA addressessystem updates 1 minute window to cause email delays and identity frontend unavailablecpl@illinois.edu

Status
colourGreen
titlecomplete

2022-10-14 22002022-10-15NCSA office firewall upgradeUpgrading code on the office firewall.Office networks will be offline during this upgrade.help+neteng@ncsa.illinois.edu

Status
colourGreen
titlecomplete

2022-10-13 17002022-10-13 1800SSLVPN MaintenanceThe second member of the HA pair will be put back into service.The second member was added with no outage.help+neteng@ncsa.illinois.edu

Status
colourGreen
titlecomplete

2022-10-12 11:002022-10-12 12:00ASD Vsphere, specifically vm's using the tintri storage appliance.Network connections on the tintri storage box were switched to new hardware but their speed was unchanged. Additional work will need to be scheduled to complete the speed increase.This had no service impact.help@ncsa.illinois.edu

Status
colourYellow
titleINCOMPLETE

2022-10-10 00002022-10-10 1040NCSA VPNThe NCSA VPN had a member of the HA pair fail and licensing didn't fail over. Users were unable to connect to the VPN until the licensing issue was resolved.help+neteng@ncsa.illinois.edu

Status
colourGreen
titlecomplete

2022-10-03 08002022-10-03 0845HOLL-Iinstall security updates and reboot
help@ncsa.illinois.edu

Status
colourGreen
titlecomplete

2022-09-30 06002022-09-30 0615NCSA GitLabGitLab was upgraded to latest versionAll GitLab services were unavailable for a few minutes.

help@ncsa.illinois.edu

Status
colourGreen
titlecomplete

2022-09-27 11002022-09-28 1700odd numbered bastion hosts (cerberus1, cerberus3, ache-bastion-1, ngale-bastion-1, etc.)puppet code refactoring for SSH configs

More changes were pushed out around 5p on 2022-09-28 and we believe the SSHD config issues are resolved.  You can use the even numbered (cerberus2, cerberus4) bastions as a work-around if any issues persist.

help+security@ncsa.illinois.edu
Status
colourGreen
titleResolved


2022-09-28 09302022-09-28 1050Jira outgoing emailoutgoing email degradedJira failed to send some/most outgoing email during this time frame.help@ncsa.illinois.edu

Status
colourGreen
titleresolved

2022-09-24 14452022-09-25 1045GraniteBuilding power outage caused Disk Storage Unit to power cycleAny user operations on the cluster were interrupted and unavailable until resolution.bdickin2@illinois.edu

Status
colourGreen
titlecomplete

2022-09-21 08002022-09-21 0930HOLL-IChange CS-2 execution mode to PipelinedExecution mode of the CS-2 was changed from Weight Streaming to Pipelined.help@ncsa.illinois.edu

Status
colourGreen
titlecomplete

2022-09-08 08002022-09-10 1100GraniteGranite Bi-annual Maintenance (now back in service)Any ingest or retrieval to/from the Archivebdickin2@illinois.edu  slack-id: briandi
set@ncsa.illinois.edu 

Status
colourGreen
titlecomplete

2022-09-09 09432022-09-09 1457Jiraoutgoing email degradedJira failed to send some/most outgoing email during this time frame.help@ncsa.illinois.edu

Status
colourGreen
titlecomplete

2022-09-08
0700

2022-09-08 1010: license servers

2022-09-09
0230: vForge

vForge / license serversQuarterly Planned Maintenance

all nodes and services will be unavailable

help@ncsa.illinois.edu

Status
colourGreen
titlecomplete

 

0500

0600

ASD VM services netRouting in the switch stacks is being switched from NCSA 3003 to NPCFAll systems on the 141.142.192.x network will be unreachable for up to 5 minutes.help@ncsa.illinois.edu

Status
colourGreen
titlecomplete

2022-08-31 18002022-09-01 0700NCSA File & Print ServersScheduled Windows Server MaintenanceFile & Print Shares were unavailable during maintenance.  Users were unable to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable.help@ncsa.illinois.edu

Status
colourGreen
titlecomplete

1730

1830

JiraJira service will be restartedJira will not be availablehelp@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

08-24-22 183008-26-22 0800Granite Tape ArchiveFS crash and lockupA few files that were transferred into the archive shortly before the crash needed to be re-transferred.

bdickin2@illinois.edu  slack-id: briandi
set@ncsa.illinois.edu 


Status
colourGreen
titleCOMPLETE

2022-08-17
1200
n/aAll LSST hosts at NCSAServers will be shutoff and retired.All LSST servers and services at NCSA.lsst-admin@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

2022-08-17 07002022-08-17 1320NightingaleQuarterly Planned MaintenanceAll Nightingale servers and services will be unavailable (other than the ngale-bastion* nodes)help@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

2022-08-16 0700

2022-08-17 1305

HOLL-IQuarterly Planned Maintenance

All HOLL-I servers and services will be unavailable

2022-08-16 1505 - HOLL-I cluster returned to service, but CS-2 remains offline for further work; CS-2 expected return to service by 2022-08-17 1000

2022-08-17 1305 - HOLL-I CS-2 is returned to service

help@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

2022-08-09 20002022-08-09 2300Office Networks on 2nd FloorCode updates on office network switches.Office ports will be offline as switches reboot. help+neteng@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

2022-08-10 20002022-08-10 2300Office Networks on 3rd FloorCode updates on office network switches.Office ports will be offline as switches reboot. help+neteng@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

2022-08-11 20002022-08-11 2300Office Networks on 4th FloorCode updates on office network switches.Office ports will be offline as switches reboot. help+neteng@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

2022-08-03
0900
2022-08-03
1000
NPCF Center-wide management firewallsSecondary firewall will be upgradedNo impact to services is anticipated.  Traffic will flow normally through the primary firewall as the secondary is upgraded.help+security@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

2022-07-27
0940
2022-07-28
1536
ACHE, Nightingale Several accounts have had their Covered Entity status revokedAffected users/accounts will not be able to access resources that require Covered Entity enrollment 

help+hippa@ncsa.illinois.edu

Status
colourGreen
titleresolved

2022-07-27
0900
2022-07-27
1000
NPCF Center-wide management firewallsPrimary firewall will be upgradedNo impact to services is anticipated.  Traffic will be failed over to the secondary firewall, the primary will be updated, and then traffic will be moved back to the primary.help+security@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

0900

0915

JiraAdditional LDAP group will be added for exclusion to sync with LDAP users.In theory, nothing.help@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

0800

  2000

ICCICC Quarterly MaintenanceAll ICC services

help@campuscluster.illinois.edu

Status
colourGreen
titleCOMPLETE

2022-07-19 07002022-07-19 0900RadiantVictoria UpdateMinimally disruptive, brief interruptions to OpenStack services, such as the Horizon dashboardradiant-admin@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

2022-07-14
2345
2022-07-14
2359
WikiThe service will be restarted in order to increase the login timeout.Wiki will be unavailable for about 5 mins while it restarts.

Status
colourGreen
titlecomplete

2022-07-08
1700
2022-07-11
0800
LSST hosts in NCSA 3003Due to a full building power outage at NCSA on Sunday, 10 July, some LSST servers will be unavailable over the weekend. Servers will be shut down at COB on Friday and returned to service on Monday morning.lsst-dbb-fts1
lsst-dbb-rucio
lsst-demo
lsst-dm-monitor
lsst-int-monitor
lsst-mon-dev
lsst-pup
lsst-test5
lsst-xfer
l1-cl-arctl
l1-cl-fault
l1-cl-header
nts-ccamfwdr1
nts-acamfwdr2
nts-acamfwdr1
lsst-admin@ncsa.illinois.edu

Status
colourGreen
titlecomplete

2022-07-11 08:302022-07-11 9:30All ITSM (CMDB) VMsAll ITSM VMs are currently down. Ticket has been created to get them brought back up.Production CMDB service (openDCIM) is not availablekimber7@illinois.edu

Status
colourGreen
titleresolved

2022-07-10 0700

2022-07-10 1430

NCSA building powerBuilding power feed work for multiple campus buildingsAVL, LSST, ISL and Software standard services were down from Friday afternoon until Monday morning.Daniel Lapine 

Status
colourGreen
titleCOMPLETE

2022-07-08

1600

2022-07-11

0900

cerberus2 and cerberus4Campus is doing work on a common feed that affects multiple buildings, including the NCSA Building. Work is scheduled from 0700-1700, but may finish earlyVM hosts running these 2 bastions will be down for the weekend due to the scheduled power work at NCSAhelp+security@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

2022-07-06 17302022-07-06 2030Wiki (wiki.ncsa.illinois.edu)Confluence and MySQL upgradeswiki will be down during the upgrade

Status
colourGreen
titleCOMPLETE

2022-07-05 18002022-07-05 2130NCSA File & Print ServersScheduled Windows Server MaintenanceFile & Print Shares were unavailable during maintenance.  Users were unable to access shares on Fileserver (e.g. home, busnoff, hr, etc.), and printing was unavailable.help@ncsa.illinois.edu

Status
colourGreen
titleCOMPLETE

2022-07-05 1800N/AiForgeend of serviceiForge was removed from service. Operations have moved to the new vForge virtual cluster.help+industry@ncsa.illinois.edu

Status
colourGreen
titleRESOLVED

Legend:

Status
colourRed
titleIN PROGRESS

Status
colourGreen
titleComplete

Status
colourGreen
titleresolved

Status
colourBlue
titlescheduled

Status
colourYellow
titleMonitoring