This conference report describes the experience of a first-time PEARC attendee. The meeting ran from Monday, July 24, to Thursday, July 27, 2023. The complete program is available at https://pearc.acm.org/pearc23/schedule/.

What were the great talks you attended and why?

Below are my notes on all the presentations I attended. I photographed the slides for most of them, so let me know if you would like details on any presentation.

Mon, July 24 - Tutorials


Synthetic Data Generation for Training Object Detection Models

NVIDIA hosted a hands-on tutorial on using Omniverse to generate synthetic data from original image and 3D files (PSD, PNG, etc.) and on using that synthetic data with a pre-trained machine-learning model.

The presenter showcased some Omniverse projects:

  • NASA project: one cluster for simulation and one cluster for visualization, achieving real-time visualization of compute-intensive simulations.
  • Real-time visualization of cancer cells.
  • Visualization of climate change over time (a Lockheed Martin project).
  • Surgical training and robotics testing in real-time simulation.
  • When many parties (designers, game developers, etc.) collaborate on a scene, Omniverse connects their third-party tools and builds a more collaborative pipeline.
  • Digital Twin: virtual replicas of real-world objects.
  • Audio2Face: generating facial animation for a character (the demo used an animal) from human speech audio.

The core of the tutorial was generating synthetic data from a series of 3D scene files. The generated data keeps the same attributes as the original 3D objects but changes their values, such as lighting and position.
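The tutorial did this with Omniverse's own tools. As a rough, plain-Python illustration of the underlying idea (commonly called domain randomization), the sketch below keeps a fixed set of object attributes and draws new values for each variant. It is not the Omniverse API: the `SceneParams` class, its fields, and the value ranges are all hypothetical.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SceneParams:
    """Hypothetical attributes of one 3D object in a scene."""
    light_intensity: float  # arbitrary units
    position: tuple         # (x, y, z) in metres
    rotation_deg: float     # rotation about the vertical axis


def randomize(base: SceneParams, rng: random.Random) -> SceneParams:
    """Keep the same attributes as `base` but draw new values for each one."""
    x, y, z = base.position
    return replace(
        base,
        # Scale lighting by 50%-150% of the original (hypothetical range).
        light_intensity=base.light_intensity * rng.uniform(0.5, 1.5),
        # Jitter x and y by up to 1 m; keep z so the object stays on its plane.
        position=(x + rng.uniform(-1.0, 1.0), y + rng.uniform(-1.0, 1.0), z),
        rotation_deg=rng.uniform(0.0, 360.0),
    )


def generate_variants(base: SceneParams, n: int, seed: int = 0) -> list:
    """Produce n randomized copies of `base` from a seeded generator."""
    rng = random.Random(seed)  # fixed seed so the dataset can be regenerated
    return [randomize(base, rng) for _ in range(n)]


if __name__ == "__main__":
    base = SceneParams(light_intensity=1000.0, position=(0.0, 0.0, 0.0), rotation_deg=0.0)
    for variant in generate_variants(base, 3):
        print(variant)
```

Fixing the seed means the same synthetic dataset can be regenerated later; in a real pipeline each parameter set would be handed to a renderer to produce a labeled image.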

Unlocking the Potential of HPC in the Cloud with Open-Source Tools

This tutorial covered running HPC workloads on Google Cloud.

Google Cloud's HPC infrastructure has three parts, each with several components:
1. Compute
2. Storage
3. Networking

Compute 

We got an introduction to Google's VM families; here are a few:

1) N2D VM Family
- for general purpose

2) T2D VM Family
- for scale-out workloads
- Hyperdisk Throughput: next-generation block storage that optimizes disk performance and cost
3) C2 VMs/C2D VMs
- High Performance Compute VMs

4) A2 VMs
- Latest with GPU support

5) Confidential VM
- Works like a regular VM
- Just a button click or one line of code makes a VM confidential
- All memory pages are encrypted and decrypted only inside the CPU, so even data center staff cannot access the data

6) Bulk API
- For running HPC at scale
- Regional deployment

7) Spot VMs
- 60% - 90% cheaper than regular VMs
- Good for genomics, physics, math, financial services, and Monte Carlo simulations
- The price changes at most once every 30 days
- Google can reclaim the resources at any time
- The Broad Institute (genomics) is using Spot VMs
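Because Spot VMs can be reclaimed, jobs such as Monte Carlo simulations are usually written to checkpoint their progress so a replacement VM can resume instead of starting over. The sketch below shows that pattern on a toy Monte Carlo estimate of π. It is an illustration under stated assumptions, not a Google Cloud feature: the checkpoint file name is hypothetical (in practice it would live on storage that outlives the VM), and the code does not detect preemption itself; it simply resumes from the last saved batch when rerun.

```python
import json
import os
import random

# Hypothetical path; in practice, put it on storage that survives the VM.
CHECKPOINT = "mc_checkpoint.json"


def load_state(path):
    """Resume from the last checkpoint, or start fresh if none exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"batches_done": 0, "hits": 0, "samples": 0}


def save_state(path, state):
    """Write to a temp file, then rename, so a preemption mid-write
    cannot leave a corrupted checkpoint behind."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def estimate_pi(total_batches=100, batch_size=10_000, path=CHECKPOINT):
    """Monte Carlo estimate of pi that checkpoints after every batch."""
    state = load_state(path)
    while state["batches_done"] < total_batches:
        # Seed each batch by its index so a resumed run draws the same numbers.
        rng = random.Random(state["batches_done"])
        hits = sum(
            1 for _ in range(batch_size)
            if rng.random() ** 2 + rng.random() ** 2 <= 1.0
        )
        state["hits"] += hits
        state["samples"] += batch_size
        state["batches_done"] += 1
        save_state(path, state)
    return 4.0 * state["hits"] / state["samples"]


if __name__ == "__main__":
    print(estimate_pi())
```

Seeding each batch by its index makes a resumed run produce exactly the same answer as an uninterrupted one, which also helps reproducibility. The write-then-rename step keeps a half-written checkpoint from being picked up after a preemption.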

Storage
- Cloud Storage: object storage accessible through a JSON API and an S3-compatible API
- Storage Transfer Service: transfers data among different storage solutions

Network
- Placement policies: spread placement (for high availability) vs. compact placement (preferred for HPC because of its low latency)

HPC Toolkit

  • Google recommends the Cloud HPC Toolkit for an easy, pre-configured experience
  • Batch: a fully managed job scheduling service. Quick and easy, and good for running batch-processing jobs
  • CentOS 7 is used as the VM image
  • To learn more about the HPC Toolkit, Google offers a full-day workshop that walks through a blueprint for running HPC on Google Cloud, then defining a proof-of-concept (POC) architecture, then implementing the POC
  • Ansible for configuration
  • A blueprint file (in YAML) controls the workflow. It is smart: for example, the network module creates both the network and its firewall rules, and the file-system modules create the folders and mount them
  • GKE can also be used, and the Toolkit makes setting up GKE easy

Best Practice
1. Use the HPC Toolkit to enforce best practices
2. Autoscale resources
3. Monitor the budget
4. Use Intel MPI if you do not need an open-source MPI


Tues, July 25


Plenary I: Welcome & Awards


- Chair's welcome notes: PEARC23 finally came to Portland after a three-year delay. The convention center is beautiful. He thanked the exhibitors. Cambridge Computing covered the beverages.
- Alan Chalker gave the steering committee report. John and Leslie served on the 2022-2023 committee. The committee created a Conference Planning Guide, worked hard on DEIA efforts, and created a Strategic Plan.
- How to engage with PEARC: fill out evaluations, volunteer for the committee, submit technical content, spread the word
- Ken, Jeff, and Kristin announced the awards:

  • Best full paper - Workforce: Professionalization for Research Computing and Data: An expanded agenda
  • Best short paper - Workforce: Cyberinfrastructure deployments on public research clouds enable accessible environmental data science education
  • Best full paper - App&Software: Airavata Metascheduler: A Reliable, Fault Tolerant, and Resource-Aware Job Scheduling Service
  • Best full student paper - App&Software: Efficient Parallelization of Dynamic Programming for Large Applications
  • Best short paper - App&Software: Scalable and Reproducible Virtual Screening through an API-Integrated Workflow
  • Best short student paper: A Further Study of Linux Kernel Hugepages on A64FX with FLASH, an Astrophysical Simulation Code
  • Best full paper - Systems: Active RDM with the Django Globus Portal Framework
  • Best short & student paper - Systems: Insights from the HAPP Framework: Using an AI-Driven Approach for Efficient Resource Allocation in HPC Scientific Workflows



Plenary Speaker I: D.K. Panda: Creating Intelligent Cyberinfrastructure for Democratizing AI: Overview of the Activities at the NSF-AI Institute ICICLE


- ICICLE Team Introduction (https://icicle.osu.edu/)
- Computing phases: we are in phase 3, HPC + AI (2010 onward)
- Computing Continuum: SC + Big Data + AI (Clouds + Edge + IoT)
- Challenge example 1: Agriculture. We need to democratize digital agriculture capabilities (recommendations, privacy & ethical considerations, etc.), e.g., AI-driven disease detection (Artificial Intelligence in Agriculture)
- Challenge example 2: Animal ecology. Understanding animal behavior (AI-Driven Animal Ecology)
- Challenge example 3: Smart food distribution
- A typical flow: data from sensors -> models on computers -> move models & data to the cloud -> data/models on HECs
- There is a gap between AI and its accessibility to users, and solutions are not widely shared between use cases
- A broader challenge: plug-and-play solutions for different stakeholders
- The presenter played a video showing that it is hard to create standard solutions for different stakeholders. ICICLE is building infrastructure and a framework for plug-and-play solutions, such as a ChatGPT-like interface for asking questions
- Vision (see photo): intelligent CI that learns from the system. Example: a digital agriculture project in the US and India
- CI for AI (see photo), AI4CI, privacy and data integrity, and visual analytics to explain AI, with some demo videos


Decision Pathways Perspectives: Changing the frame on place-based model integration and intelligent decision support services

- A different way of doing research (see photo): a transdisciplinary approach
- Presented by TACC's decision support office, describing their approach to designing and implementing applied decision support systems
- Users own the solutions, while TACC only provides services
- Dimensions of complex models (see photo)
- Computing centers normally use a horizontal/vertical approach => TACC uses a decision pathway approach





Enabling Research through Federated Access of Compute Resources for Sensitive Data (BoF)

This BoF focused on computing with sensitive data.

  • University of Virginia: ACCORD project, uses COmanage
  • A secure computing platform (UC Santa Cruz has federated access; PSU also has it)
  • Only de-identified HIPAA data is allowed (concern about storing HIPAA data from other institutions)
  • Challenge: the sensitivity level of data is defined by the VA office
  • Considering: virtual office hours (for user consulting)
  • How secure is the software we embed into our systems? Approaches shared: 1. an external risk assessment team; 2. UChicago controls inbound and outbound traffic (allowlisting); 3. packages (e.g., Python) are kept only temporarily, with an agreement that no data comes in during installation; 4. in the UC system, licensed software goes through a vendor assessment during purchasing; 5. a software-monitoring team
  • It is very common to end up running software that is not up to date with security patches
  • VA: frequently update software (shuffling binaries so attackers do not have time to react) -> ACCORD solution
  • Will VA's approach (changing signatures frequently) affect reproducibility? The solution is to plan for this from the beginning.
  • How do we ensure that changing versions does not change the results? Tolerate this risk and trust the rewriter.
  • Federated access means two or more institutions sharing resources. Sometimes the other side is simply given a university account.
  • The UC system uses PIP for federated access
  • UVA's initiative: Research Computing Infrastructure
  • UVA proposal: create a reference book of online resources, ideally as a community effort
  • Currently in the design phase (development in 2024, deployment finished by 2026)
  • Experiences with secure and federated systems: UofT (only certain groups can access specific nodes); a VM serves as the front end of a given cluster; the institution that owns the data sets the rules (encrypted or non-encrypted)
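Allowlisting outbound traffic (the UChicago practice mentioned above) is normally enforced in a firewall or proxy, not in application code. Purely to illustrate the decision logic, here is a minimal sketch; the allowed hosts are hypothetical examples of what a secure enclave might open during a temporary package-installation window.

```python
from urllib.parse import urlsplit

# Hypothetical policy: hosts opened only while packages are being installed.
ALLOWED_HOSTS = {"pypi.org", "files.pythonhosted.org"}


def is_allowed(url: str, allowed=ALLOWED_HOSTS) -> bool:
    """Return True only for HTTPS URLs whose host exactly matches the allowlist."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit lowercases the hostname
    return parts.scheme == "https" and host in allowed


if __name__ == "__main__":
    for url in ["https://pypi.org/simple/numpy/", "http://pypi.org/", "https://example.com/data"]:
        print(url, is_allowed(url))
```

Matching hosts exactly, rather than by suffix or substring, avoids accidentally allowing lookalike domains such as `pypi.org.example.net`.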



Cybershuttle: An end-to-end Cyberinfrastructure Continuum to accelerate Discovery in Science and Engineering

- Goal: solve the problem of fragmented research workflows
- Create a science gateway 'framework'
- Data can come from many sources; the challenge is building a system that ingests datasets and identifies suitable computing resources
- Seamless connection between local resources and the Cybershuttle server
- Cybershuttle architecture (see photo)
- User deploys to agent





Augmenting the User Experience in Open OnDemand


- This is a follow-up to a presentation at the last Supercomputing (SC) conference
- Demo of features developed by Harvard, including a UI redesign
- Widget design


ACCESS: Advancing Innovation

Introduction/Update of ACCESS Program:

  • 49 allocation sites
  • NSF funded 5 different core services


Scholarly Data Share 2.0: Granular Access to Research Data


- Created by IU because existing data services did not fit everyone's needs
- Focuses on how to share data
- This used to be done through SDA. Unlike SDA, it is not an archive, and more I/O is expected
- First version: ISDP (Indiana Spatial Data Portal, https://gis.iu.edu/)
- Data is organized into resource labs (collections), each containing different datasets
- Uses Omeka (which provides the main search functions) as the front end for metadata import; it talks to an OIDC client
- Each dataset carries its own metadata, which is made searchable



Wed, July 26


Plenary II: Fighting Fires Using Data and Computing. Presenter: Ilkay Altintas

- This presentation was about WIFIRE Commons, created by UCSD (https://wifire.ucsd.edu/)
- Models use the same resources, but different cases require different models. For example, real-time analysis needs a simplified model because of computation speed, but such a model cannot cover all cases
- A next-generation, more generalized fire prediction model needs collaborative thinking. Solving wildfire problems requires both standard, collaborative data infrastructure and rapid individual solutions
- Data modeling: high-throughput fire behavior modeling (Cloud + HPC)
- For prescribed-fire visualization, 3D is preferred over 2D. The 3D outputs are generated from real-time data
- Composable systems: Expanse, Nautilus (used for federated learning), Sage





Plenary II: Accessible, Inclusive and Sustainable Cyberinfrastructure Ecosystem Development through NSF Support (Panel)


- NSF update on cyberinfrastructure (a new director joined three months ago)
- Introduction of NSF resources:

  1. NAIRR: National AI Initiative, Strategic Plan, Research Resource, Institutions
  2. ACSS: submissions, deadlines, etc
  3. MRI: small scale
  4. ACCESS (year 2): CI coordination services
  5. CC*: very detailed proposal required. Campus-wide
  6. CICI: open-science. Applied side of cybersecurity research
  7. IRNC: international peer network (currently doesn't accept new proposals)
  8. CyberTraining: learning & workforce
  9. SCIPE: research & develop the CI system
  10. CSSI: Science-Driven, innovative for the community
  11. Public Access Plan 2.0: immediate access to peer-reviewed publications and datasets (2025)

- Q&A:
Q: What if someone in the community feels they are not qualified to be a PI and therefore does not submit proposals? A: NSF does not limit who can submit a proposal based on person or background; such limitations are normally set at the institution level.


A Further Study of Linux Kernel Hugepages on A64FX with FLASH, an Astrophysical Simulation Code

- Fortran program with 2D visualization
- Profiling
- TLB misses (see photo): the TLB caches a subset of the page table, so translations that hit in the TLB avoid a page-table walk
- Linux allocates memory in pages tracked by the page table
- Improve compiler
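Why hugepages reduce TLB misses comes down to simple arithmetic: TLB reach (the memory covered without a miss) is the number of TLB entries times the page size. The sketch below works through that calculation. The 1,024-entry TLB and 512 MiB working set are hypothetical round numbers, not A64FX or FLASH figures, and the page sizes are only illustrative.

```python
KIB = 1024
MIB = 1024 * KIB


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of memory addressable without a TLB miss: entries x page size."""
    return entries * page_size


def pages_needed(working_set: int, page_size: int) -> int:
    """Number of pages (and thus translations) needed to cover a working set."""
    return -(-working_set // page_size)  # ceiling division


if __name__ == "__main__":
    entries = 1024            # hypothetical TLB size, not the A64FX figure
    working_set = 512 * MIB   # hypothetical simulation working set
    for name, size in [("4 KiB", 4 * KIB), ("64 KiB", 64 * KIB), ("2 MiB", 2 * MIB)]:
        reach_mib = tlb_reach(entries, size) / MIB
        print(f"{name:>6} pages: reach {reach_mib:g} MiB, "
              f"{pages_needed(working_set, size)} pages for the working set")
```

With these assumed numbers, 4 KiB pages give a reach of only 4 MiB (131,072 pages for the working set), while 2 MiB pages give 2,048 MiB (256 pages), so the whole working set fits within TLB reach.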

Immersive OSPRay: Enabling VR Experiences with OSPRay

- Gesture-based XR application
- OSPRay, an open-source ray tracing engine created by Intel
- Unity is used to build the 3D features
- Containerized app
- Demo: the model shown was created by data scientists
- Uses Kinect for gesture detection



Advanced Application user interfaces for time-dependent Recursive indeXing (tRecX) Code: from Design to Production Deployment

- Deploy UI through Science Gateways
- AMOS is mostly used in Quantum Physics. Lots of applications are deployed in the gateways.
- Use AMOSGateway.org for deployment: users can use the web interface from this service
- Use Apache Airavata (can support multiple gateways)
- AMOSGateway has both HPC deployment and many integrations
- tRecX software: C++ code with MPI-based parallelization, but with a legacy UI
- Design process: the interface was designed for delivery through a science gateway, with students in India designing it and creating mockups
- Uses the Django framework with Vue.js as the front end
- The Django portal server fetches output files from Airavata and performs immediate analysis and plotting
- Python is used for dynamic plotting
- A tutorial feature lets users learn the tool and then modify the tutorial for their own research
- The design took three months



Reproducibility of Computational Research (BoF)


- Discussion notes: https://docs.google.com/document/d/1NyKKmL1AXalnZ7IJiRhkBSFARW09628sAMhKmlG6yGc/edit
- The challenging goal is to make research results reproducible
- When reusing research results, the key question is whether you have enough information to reproduce them
- An initiative aims to minimize the effort of assembling an artifact to be shared
- Challenges raised by MN: user engagement is needed, and the value of the data must be determined
- Reproducibility is interdisciplinary
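To make "enough information to reproduce the results" concrete, one low-effort step is to save a small manifest next to each result recording the software environment and a checksum of every input. The sketch below is a generic illustration, not a tool from the BoF; the function names and the package list are hypothetical choices.

```python
import hashlib
import importlib.metadata
import json
import platform
import sys


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large inputs do not have to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(input_files, packages):
    """Record the software environment and exact inputs behind a result."""
    versions = {}
    for name in packages:
        try:
            versions[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            versions[name] = None  # record that the package was absent
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
        "inputs": {p: sha256_of(p) for p in input_files},
    }


if __name__ == "__main__":
    # Hypothetical example: hash this script and look up one package version.
    manifest = build_manifest(input_files=[__file__], packages=["numpy"])
    print(json.dumps(manifest, indent=2))
```

Comparing two manifests quickly shows whether a rerun used different inputs or package versions.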

Volunteering at NCSA Booth

  1. I learned that demo apps can help market NCSA to researchers.
  2. Some visitors asked about Delta resource allocations and NCSA history when stopping by.
  3. A scholar from Yale was interested in creating an app similar to FlightPath.
  4. A Ph.D. student from Texas A&M was interested in a visualization similar to Policy Design Lab for national housing data.
  5. A child tried FlightPath and was really interested.
  6. I discussed with others the need for gamification in software development. Since users usually don't read instructions, the FlightPath app should build its instructions into the search process.

Thurs, July 27


BoF on Open Cloud Infrastructure for advanced research computing workloads


- The BoF used Etherpad for notes
- SIG Slack (https://join.slack.com/t/os-scientific-sig/shared_invite/zt-1zg1sok2d-77MwVGgeiFxtpWfVC5ozXA)
- Presenters from different companies using OpenStack
- IU is using OpenStack
- NeSI's OpenStack project: sensitive data and Māori data. A hosting platform for bring-your-own (BYO) kit
- Kubernetes + CloudFoundry + Terraform
- Use case: Aotearoa Genomic Data Repository. E-research platform: AgResearch
- Audience use cases: 1) Anderson Cancer Center mostly uses the cloud for storage, with S3-compatible storage on Swift; 2) a question for researchers: if your funding runs out, you need some way to persist your data (for example, by transferring it to local storage).
- Biology researchers use JetStream
- Discussion of the differences between Tapis and Airavata


Plenary III: Tribal Colleges and Community-Driven Computing. Presenter: Al Kuslikis

A presentation about the American Indian Higher Education Consortium (AIHEC), encouraging the HPC community to join in supporting American Indian students.

What are some technologies that you saw that could be interesting now and in the future?

I noticed that:

1) AI/ML is widely adopted across projects. Many HPC topics concern how to better support running modern ML workloads.

2) AR/VR technology draws attention:
Indiana University showed a live demo of AR/VR glasses presenting virtual collaboration spaces, people, and architecture models (https://gis.iu.edu/).

I brought the team's iPad to demo FlightPath while volunteering at the exhibit. Visitors were interested and gave good feedback: improve the in-app instructions, explore broader applications of the technology (for example, eye tracking for MRI patients), and keep people from getting dizzy while using the app.

3) Common libraries/tech/infrastructure that many organizations are using:

  1. Tapis (from TACC https://tapis-project.org/)
  2. Slurm Cluster/OpenStack Cloud
  3. different Science Gateways
  4. OOKAMI
  5. Apache Airavata

4) Some unique techs/libraries that projects are using and we can try in the future:

  1. OSPRay from Intel for ray tracing: https://github.com/ospray/ospray
  2. Indiana Spatial Data Portal (https://gis.iu.edu/) from IU
  3. Nvidia's Omniverse
  4. SDSC's WIFIRE system (deep learning-based smoke detection): Expanse, Nautilus (used for federated learning), Sage

How can we participate next year, what projects/fields?

1) Speaking from VA's perspective:

  1. Demo of AR/VR projects
  2. Projects that use the power of HPC, such as visualizations demonstrating deep learning models

2) Questions at my exhibit session included the ACCESS allocation process for Delta resources, current projects hosted on Delta, and Delta's specializations. So it would be helpful to host a session on how researchers can apply for Delta resources and what preparation they need, and to demo some projects hosted on Delta.

3) PEARC has some side programs, such as student programs and early-career programs. Students, interns, and new employees who are interested in HPC can apply to these programs.

What were some of the non-technical aspects of attending the conference (hotel, venue, etc.)?

Hotel, Venue and Networking Session of PEARC

The conference was hosted at the Portland Convention Center. It is clean and has lots of space. From the airport you can take Uber/Lyft (~30 mins) or the MAX Red Line light rail. Unfortunately, the MAX Red Line was under renovation in mid-July 2023, so some parts of the route were connected by shuttle.

Getting through the security checkpoint at Portland International Airport took around 30 minutes to an hour. Plan your time accordingly.

My hotel (Courtyard Downtown/Convention Center) is very close to the Portland Convention Center (a 5-minute walk). It is clean and well managed. Nearby there are a Safeway, a Dollar Tree, and many restaurants (McDonald's, Denny's, Wendy's, etc.) open early and late.

The networking session is a good place to get some drinks, connect with other HPC professionals, and enjoy some games.

Portland Traveling

This is my first time in Portland, and I found the city very fun! Please note that downtown could be a bit dangerous to walk alone after 7 pm (in summer).

PEARC gave attendees FREE transit passes for getting around Portland. The pass can be used on all public transportation.

Recommended destinations that I tried this time:

  1. Washington Park (https://explorewashingtonpark.org/). It is a large area of multiple parks connected by 12 miles of trails. You can take the MAX Red/Blue Line directly from the convention center to the park. The park provides free shuttles to take visitors to their destinations. All parks are free (including the International Rose Test Garden) except the Oregon Zoo and the Japanese Garden.
  2. Pioneer Square shopping. The Pioneer Square area is between Washington Park and the convention center. It has many luxury stores, and shopping is tax-free (Oregon has no sales tax). Just be careful if you come after dark since it is in the downtown area.
  3. Tilikum Crossing is a great bridge for walking across the river and viewing Portland's other bridges. It can be reached by many buses and MAX lines.
  4. Powell's City of Books is the largest independent new and used bookstore in the world. It has all kinds of books and is very fun to explore.

