Open discussions on specific topics chosen by the Software Working Group from the list of SWG Topics For Discussion.

Tuesday, June 14, 2022 Data Science Community Group moderated by John MacMullen

Slides:





Recording:


Attendees:

Luigi Marini 

William MacMullen 

Ana Lucic 

Jong Lee 

Kastan Day 

Chen Wang 

Jessica Saw 

Galen Arnold 

Christopher Stephens 

Sara Lambert 

Christopher Pond 

Dipannita Dey 

Yong Wook Kim 

Maxwell Burnette 

Minu Mathew 

Rebecca Eveland 



Discussion:

See the recording and slides for in-depth content.

Many members of the Software Directorate work with data scientists, so this topic is very relevant to all of us at NCSA. John MacMullen (MBDH) thanked everyone for joining.

NCSA Data Science Community Building was a project started with the help of Bill Gropp, Colleen Bushell and John MacMullen. Shannon Bradley is the PM for this project.

How does data science affect NCSA and the community in general?

Data Science is whatever you think it is, and many more things.  What does data science mean to you?

Many of the projects in SWG deal with data science, and we help to create generic forms, but getting projects to collaborate center-wide would help not only NCSA but also all of the projects that we support.

Having toolkits for data management and data science would be beneficial to all.

Create definitions for data science, data engineering, data management. This is complicated.

Workforce development - data carpentry, scientific software development, pipelines, computing

The software engineers need to learn the languages and tools that the researchers are familiar with, such as R, Python, Jupyter Notebooks, etc.

Discussion of correlating datasets and data management.

One challenge is running computations across datasets when the underlying data cannot be accessed directly, for example data protected under HIPAA. For sensitive data of this kind, there are secure ways to run a query against a dataset without actually seeing the data, and NCSA would benefit from this capability. It would help us with industry partners and health providers as well as campus partners.
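To illustrate the idea of querying sensitive data without seeing it, here is a minimal sketch in Python. It is not something presented at the round table and it is not a HIPAA-compliant design: the class name `AggregateOnlyStore`, the field names, and the `MIN_GROUP_SIZE` threshold are all hypothetical. It shows only one simple pattern, where callers receive group-level aggregates and small groups are suppressed.

```python
from statistics import mean

# Hypothetical disclosure threshold: groups smaller than this are not reported.
MIN_GROUP_SIZE = 5


class AggregateOnlyStore:
    """Holds sensitive rows privately and answers only aggregate queries."""

    def __init__(self, rows):
        # Rows are kept on a private attribute; no method returns them.
        self._rows = list(rows)

    def mean_by(self, group_field, value_field):
        """Return the mean of value_field per group, omitting small groups."""
        groups = {}
        for row in self._rows:
            groups.setdefault(row[group_field], []).append(row[value_field])
        return {
            key: round(mean(values), 2)
            for key, values in groups.items()
            if len(values) >= MIN_GROUP_SIZE
        }


# Made-up example records: clinic A has five rows, clinic B only two.
records = [{"clinic": "A", "age": age} for age in (34, 41, 29, 50, 46)]
records += [{"clinic": "B", "age": age} for age in (61, 58)]

store = AggregateOnlyStore(records)
result = store.mean_by("clinic", "age")
print(result)  # clinic B is suppressed because it has fewer than 5 rows
```

A real service would also need authentication, auditing, and stronger protections against repeated or overlapping queries; the sketch only shows the shape of the interface.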

Goal: Present NCSA as the leader on campus as a data service provider

Goal: develop an NCSA community for sharing

Goal: after getting the NCSA community together, add the campus community and external collaborators.

Next Steps: What would getting the NCSA community together to share projects, data, etc. look like? Will there be a standard metadata format? That would definitely help in sharing data.
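As a starting point for the metadata question, a shared record could be as small as the sketch below. This is an assumption for illustration only: no standard was agreed at the round table, and the `DatasetMetadata` class, its fields, and the `REQUIRED_FIELDS` list are hypothetical. The `maintainer` field is included because the question of who maintains a dataset came up in the discussion.

```python
import json
from dataclasses import asdict, dataclass, field

# Hypothetical set of fields every shared dataset would have to fill in.
REQUIRED_FIELDS = ("title", "maintainer", "description", "license")


@dataclass
class DatasetMetadata:
    """A minimal, illustrative metadata record for a shared dataset."""

    title: str
    maintainer: str
    description: str
    license: str
    keywords: list = field(default_factory=list)

    def missing_fields(self):
        """Return the names of required fields that are empty or blank."""
        return [name for name in REQUIRED_FIELDS if not getattr(self, name).strip()]


# Made-up example record with the maintainer left blank.
record = DatasetMetadata(
    title="Example sensor readings",
    maintainer="",
    description="Hourly readings from a hypothetical sensor network.",
    license="CC-BY-4.0",
    keywords=["sensors", "example"],
)

missing = record.missing_fields()
serialized = json.dumps(asdict(record), indent=2)
print(missing)
print(serialized)
```

Serializing to JSON keeps the record easy to store next to the data in GitHub or to load in a Jupyter Notebook.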

Kastan notes, regarding transforming our internal data products to be more compatible: https://www.getdbt.com/
Ana notes it would be helpful to have a group for internal data sharing

Best Practices Group for Data Science and Data Management.

SD is developing a Best Practices Handbook

There is an NCSA Slack channel: #data-science-coordination

Sharing through GitHub, Jupyter Notebooks and Slack, rather than wiki

Who is responsible for maintaining datasets?

Accessing data can be tricky!

As part of MBDH, there are carpentry workshops, and MBDH is a member of the carpentry community. They also run train-the-trainer workshops. NCSA could develop skills at these workshops. Would Software be interested in attending these?

https://jupyterbook.org: We also like it because it is based on Sphinx (something we are familiar with) and it defaults to Markdown (something easily portable).







Links mentioned in this Round Table:




If you are interested in contributing to a Round Table, please see these links:

Round Table Discussions

SWG Topics For Discussion



