Open discussions on specific topics chosen by the Software Working Group from the list of SWG Topics For Discussion.

Tuesday, January 25, 2022 - Project Preservation topics (backup strategies, data publication at end of project, life beyond funding, portable code), moderated by Max Burnette

Slides:  RoundTable_Preservation1.25.22.pptx


Recording: https://uofi.box.com/s/c8qxw1psulvz6hheechlgdfmdvb5a9ph

Attendees:

Maxwell Burnette 

Luigi Marini 

Sara Lambert 

Camille Goudeseune 

Kenton McHenry 

Vismayak Mohanarajan 

Mikolaj Kowalik 

Timothy Andrew Manning 

Minu Mathew 

Jeremy Sykes 

Galen Arnold 

Nathan Tolbert 

James Phillips 

Rob Kooper 

Xiaoxia Liao 

Elizabeth Yanello 

Jeff Terstriep 

Kathryn Naum 

Michal Ondrejcek 

Christopher Navarro 

Yong Wook Kim 




Discussion:

See Slides for Presentation - What to do with the infrastructure of projects that have finished.

Keep lines of communication open. It could lead to future funding

Have an annual meeting with former project leads/collaborators

Small projects do not need much space; large projects require large storage (1.2 Petabytes!). We don't want to lose this data. For this amount of data, we use tape storage.

VMs do not maintain themselves; there is an inherent cost in maintaining VMs (software licenses, machine usage, Kubernetes). This requires people time.

Custom software requires a lot of maintenance. Rob Kooper suggests that as you near the end of a project (two months out or more), you begin planning how the project will be maintained.

Storage also costs money.

Reach out to collaborators whose data we are maintaining, and ask them about future funding on various projects.

It's important that personnel who leave pass their information forward. It makes finding documentation less frustrating.

IIB is looking at externally facing sites for keeping data up and running for years/decades.

We need to develop a "paper trail" for the chain of documentation. We do brain dumps periodically, but there needs to be more formal tracking of data, especially for projects that are closed.

Many of our projects are research-based; we need to know where their data are stored, and we need a catalog for preservation.

Some projects were not moved from Nebula to Radiant and now that data is gone.

Can the data be used in future projects? Is the data still relevant?

At the beginning of projects, we need to talk with the collaborators about maintaining their data past the end of the project.

Software Directorate has a small amount of storage available in Radiant.

Galen notes that he needs space to cover the recent loss of > 22,000 computed nodes.

We can't pre-pay for storage; it would get charged to an account that no longer exists because the project is closed. Where do we pay for this storage? We need an account to sustain this data into the future.

Should there be an "end of life" note at the bottom of a project's website?

The Illinois Data Bank is available for University storage.

Does NSF need to provide funding for post project data?

IPFS is something that everyone should access

We need to come up with best practices

Perhaps costs can be shared for storage by the external collaborator who actually will need to access the data.

Rob Kooper notes that MOUs are in place for some projects that share storage costs; an annual meeting is part of each MOU.

There comes a point where the cost/benefit balance no longer favors us; we need to tell the PI about this up front.

Kenton will talk with the cabinet about funding strategies for projects.

This is a very fluid topic that affects all researchers, developers and collaborators. 

Recommend that the zeroth step is to search the NCSA wikis and related documentation platforms for existing efforts to formalize continuity best practices. This has undoubtedly been a concern in the past as well, and so I would expect there may be something to build upon or resurrect.







Links Shared During the Talk:

Depending on the nature of the website, S3 has a clever option: https://docs.aws.amazon.com/AmazonS3/latest/userguide/WebsiteHosting.html

Going back to Mike's link. GitHub also provides the ability to host static websites. https://pages.github.com/

IPFS is something that everyone should access https://ipfs.io/

https://databank.illinois.edu/

This page might be a good place to link docs about project sustainability and end-of-life recommendations/policies: Project Development








If you are interested in contributing to a Round Table, please see these links:

Round Table Discussions

SWG Topics For Discussion



