The NCSA Genomics Group hosts research into the use of high performance computing (HPC) for primary genomics analyses, such as alignment, variant calling, genome assembly, and RNASeq. By its nature, this research is highly collaborative. Every member of our team is affiliated with multiple departments or campus initiatives. The student participants in this group serve as a bond between the campus faculty who use computational genomics analyses in their research and the NCSA experts in HPC, storage, networking, databases, etc. Together we enable the use of advanced computing infrastructure in computational genomics. Explore this page to find out who is involved, how we are connected, and what projects are currently ongoing.
Staff members from the software directorate and other groups within research consulting are frequently collaborators on our projects.
NCSA Press:
Crossing over, branching out: Meet the NCSA Genomics team
Collaborative efforts produce clinical workflows for fast, translational genetic analysis
Active Projects
Project name | Project description | Genomics staff | Collaborators |
---|---|---|---|
Biomedical Pipeline Development | Multiple projects involving workflow development and feature addition, including determining whether workflows can be deployed to the cloud. | Joshua Allen, Mohith Manjunath, Weihao Ge, Raghid Alhazmy | The Mayo Clinic |
CROPPS | GPU acceleration and refactoring of code for gRNA design for CRISPR assays. | David Bianchi | Cornell Computer Science Crop Science |
NEAT | Software development of a sequence simulator with mutational models. | Joshua Allen, Raghid Alhazmy | The Mayo Clinic, Ontario Cancer Research Center, The Broad Institute, University of Wyoming |
Farm to Food Bank Mobile Application Development | Development of a mobile app for farmers to sell off-spec and extra produce to food banks. | Christina Fliege | |
MINERVA | Developing an interface for combined genomic and diagnostic analysis for improved prediction, research and clinical interpretation of genomic variation. | David Bianchi, Raghid Alhazmy | The Mayo Clinic |
Investigator Portal | Developing a research compute portal for clinical diagnostics and genomics, where investigators can filter and query results against bioinformatic catalogs and applications, and generate reusable datasets that can be viewed, analyzed and managed. | David Bianchi, Misael Lazaro | The Mayo Clinic |
GWAS Study of Dairy Cattle | Analyze genotype data from over 11 cow farms to find the cows’ susceptibility to diseases and the underlying genomic variants. | Joshua Allen, Weihao Ge | Prof. Sandra Rodriguez Zas |
Metabolomics data analysis on microbiome | Understand gut microbiome products for personalized medicine. | Weihao Ge, Misael Lazaro, David Bianchi | Prof. Issac Cann |
Genomic selection for maize and sorghum | Evaluate how much genomic variants contribute to traits between maize and sorghum. | Weihao Ge | Prof. Alex Lipka |
Christina Elizabeth Fliege
Technical Program Manager
Senior Research Scientist, National Center for Supercomputing Applications
NCSA Genomics, September 2017. Credit: Steve Deunsing
Research Assistant Professor, Institute of Genomic Biology
217-300-0568
NCSA Genomics 'Best Original Undergraduate Research' Award
NCSA Genomics - Blue Waters tour, 17 June 2019
Staff
Research Interests | ||
Joshua Allen Senior Research Programmer BA Mathematics and English (2001) MA English (2005) MS Bioinformatics (2019) | Sequence Simulation and Advanced Workflow Development. Modeling next-generation sequencing data, applying machine learning techniques to enhance models, writing production-ready code. | |
David Bianchi Research Scientist Ph.D. Physical Chemistry (2022) | Metabolic Engineering, Synthetic Biology, Gene Regulation, Genomics Analysis, Digital Agriculture, Spatial/Multi Omics, Personalized Medicine, Research Software Engineering, High-Performance Computing, GPU Computing | |
Mohith Manjunath Research Programmer Ph.D. Aerospace Engineering (2014) | Quantum Computing in Biology and Chemistry | |
Weihao Ge B.S. Physics (2008) M.S. Physics (2011) Ph.D. Biophysics (2018) | Machine Learning in Epidemiology and Genomics. Biostatistics and Informatics. | |
Raghid Alhazmy Research Programmer B.S. Biology (2021) M.S. Bioinformatics (2023) | Genomic data analysis, Machine Learning, and Computational Biology. | |
Joao Paulo Gomes Viana | ||
Misael Lazaro Academic Researcher B.S. Biochemistry (2019) M.S. Biochemistry (2023) | Genomics Data Analysis, Drug Development and Design, and Computational Biology | |
Graduate Students | ||
Dhruvesh Shah School of Information | ||
Yash Wasnik Information | Racial Health Disparities: Yazhuo is involved in the Racial Health Disparities project and conducts research using machine learning and data science skills. Her work is to perform statistical analysis and write code to build a pipeline on health datasets in collaboration with team members. | |
Undergraduate Students | ||
Alumni
Katherine Kendig Associate Project Manager B.A. Anthropology (2012) M.F.A. Creative Writing (2017) | Project Management: Katherine is a project manager with the NCSA Industry Program, working primarily with biomedical partners. She benchmarked the Sentieon variant calling software for the Mayo Grand Challenge: https://www.biorxiv.org/content/10.1101/396325v1 She has also contributed to NCSA’s Public Affairs team, writing articles about NCSA and XSEDE research: After the storm; Bringing supercomputing to psychology; DISSCO Tech; ECSS: Profiles in Consulting; NCSA Genomics; History was here | |
Ramshankar Venkatakrishnan Research Programmer B.S. Electronics & Communications (2012) M.S. Electrical & Computer Engineering (2015) | Phillips 66 and Hardware support: Ram is developing code for the Phillips 66 project with the Data Analytics team. The code uses machine learning to determine the best price at which to sell their petroleum products. The model considers a vast array of parameters to make the decision. Ram is working with the Innovative Systems Laboratory (ISL) at NCSA to create a roofline model for a U250 Xilinx card, using a convolution code to plot the model. Ram also provides software and installation support for the HPC clusters at NCSA for a variety of clients. | |
Dan Lanier, Research Programmer B.S. Applied Mathematics (2008) | NCSA Industry: Dan supports biomedical partners in the NCSA Industry program. Dan provides a complementary mix of expertise in HPC and mathematical data analysis to enable pharmaceutical, agricultural and medical companies to utilize the high performance computing resources at NCSA. | |
Matthew Kendzior Research Programmer BS Crop Sciences (2016) MS Bioinformatics (2019) | Mayo Grand Challenge: Mr. K is working as a researcher in the Mayo Grand Challenge, which aims to drastically reduce the time needed to detect genomic variants and to extract more information from whole genome sequencing data. |
Genomic variant calling by assembly: Mr. K is focusing on a method to detect genomic variants by assembly. He is employing the software Cortex-var, which constructs de-novo genome assembly on multiple [...] to detect where they [...] for detecting novel variants, such as polyploid plants and [...] |
Junyu and Mr. K work as an interdisciplinary team. Junyu provides the expertise in
math and computer science to automate the Cortex-var workflow and interpret the algorithm.
Mr. K is also working with Tiffany on the genomic analysis of HLHS for the Mayo Grand Challenge. Poster: Variant Calling by Assembly Poster: Reference-guided variant calling for non-repetitive sequences in Glycine Max | ||
Brian Bliss, Research Programmer | Data compression: Brian will be working on data compression for the Mayo Grand Challenge project. | |
Sushma Yellapragada Bachelor of Technology: Computer Science Engineering, Northcap University (2019) M.S. Computer Science, UIUC (2022) | NEAT: Sushma is currently working on the NEAT project, contributing code and testing. |
Angelo Santos | ||
Yazhuo Zhang MS in Information Management | Racial Health Disparities: Yazhuo is involved in the Racial Health Disparities project and conducts research using machine learning and data science skills. Her work is to perform statistical analysis and write code to build a pipeline on health datasets in collaboration with team members. | |
Sijia Huo B.S. Mathematics & Computer Science (2018) second major in Statistics third major in Economics | Parallelization of R: Sijia is working with NCSA Faculty Fellow Dr. Zeynep Madak-Erdogan to introduce parallel R code into her research. Dr. Madak-Erdogan is exploring racial disparities in breast cancer occurrence through the lens of diet and nutrition. |
Ryan Chui B.S. Biochemistry (2016) M.S. Bioinformatics (2017) Graduate Fellow in the College of ACES | NCSA Industry: Ryan performed software installation, benchmarking, and development for a variety of industry partners. To investigate how the training time for deep neural networks (DNNs) can be affected, Ryan worked with TensorFlow, Google’s deep learning library, to perform multi-label classification on a data set. He built an autoencoder, an unsupervised deep neural network, to extract salient features from the On GitHub: EpiQuant: Hadoop, C, TensorFlow - epistasis software prototypes; MLCC - multi-label cancer classification; q2b - binary representation of nucleotides; ptgz - parallel tar gzip; Usage Analyzer - log analyzer for HPC schedulers |
Jennie Zermeno B.S. Integrative Biology (2017) | Benchmarking performance and accuracy of genomic variant calling software: Jennie collaborated to document our efforts in benchmarking variant calling on HPC systems. We have run variant calling experiments on 500 genomes in parallel, on Blue Waters, to identify performance bottlenecks when using the GATK best practices workflow. Jennie is documenting this work in a publication. We have also tested a number of alternative software packages, such as Isaac, Genalice, and Sentieon, as well as Dragen, a hardware solution. Tiffany is documenting the pros and cons of each of these excellent approaches in a separate manuscript. Jennie also participated in the debugging of the H3ABioNet GATK Germline Workflow. Bioinformatics in the Cloud: Jennie is investigating the issues of portability, reproducibility and scaling of bioinformatics workflows in cloud infrastructure by instantiating containerized versions of workflows. Students Capitalize on Computational Genomics Research Using AWS | ||
Angela Chen M.S. Statistics (2017) Department of Statistics, UIUC CompGen fellow advised by Dr. Alexander Lipka | Accurate and scalable GWAS algorithms: Angela and Khory collaborated to improve the scalability and parallelization of the statistical software TASSEL5, widely used for conducting genome wide association studies (GWAS) in plants. Angela wrote a manuscript to demonstrate that her new stepwise epistatic model selection procedure has greater statistical power compared to other methods. However, the Java-based TASSEL5 cannot be easily parallelized across multiple nodes in a computational cluster to run on data sets that tend to be very large, such as the Alzheimer's SNP panel. Khory provided the expertise in computer science to convert this Java code into C++ and parallelize it in an HPC environment. |
Khory Wagner advised by Dr. Volodymyr Kindratenko
NCSA Industry
Nainika Roy B.S. Molecular and Cellular Biology (2017) minor in Informatics and Chemistry SPIN fellow | Data formats and data structures in computational genomics | |
Junyu Li B.S. Molecular and Cellular Biology (2017) minor in Computer Science SPIN fellow | Genomic variant calling by assembly: Junyu worked with Mr. K in an interdisciplinary team, providing the expertise in math and computer science to automate the Cortex-var workflow and interpret the algorithm. Poster: Reference-guided variant calling for novel non-repetitive sequences in Glycine max | |
Noah Flynn B.S. Bioengineering, Mathematics (2017) minor in computer science SPIN fellow | Evolution of molecular networks and persistence of organisms | |
Jacob Heldenbrand Research Programmer B.S. Biochemistry (2014) M.S. Bioinformatics (2016) | NCSA Industry: Jacob supports biomedical partners in the NCSA Industry program. Jacob provides a complementary mix of expertise in HPC and bioinformatics data analysis to enable pharmaceutical, agricultural and medical companies to utilize the high performance computing resources at NCSA. Jacob and Azza Ahmed (Ph.D. candidate, University of Khartoum) are exploring and evaluating the | |
Matthew Weber B.S. Molecular and Cellular Biology |
Ryan Chui
B.S. Biochemistry(2016) M.S. Bioinformatics ( |
2018) |
Department of Crop Sciences, UIUC CompGen fellow advised by Dr. Matthew Hudson | Mutation profiles of cancer: Mr. Weber is developing machine learning methods to effectively stratify cancers based on the statistical properties of mutations found in afflicted individuals. Cancer stratification is predictive of disease outcomes, drug response and drug metabolism. Effective computational approaches based on the total data acquired to date can make this process cheaper in the clinic. Matt collaborates with the Ontario Institute for Cancer Research to make sure his models are realistic. Paper: Simulating Next-Generation Sequencing Datasets from Empirical Mutation and Sequencing Models Poster: Statistical models to capture mutational properties for NextGen Sequencing Data |
Aishwarya Raj B.S. Biochemistry (2019) minor in Bioinformatics | Evolution of molecular networks and persistence of organisms: Construct and compare gene, metabolic and signaling networks from organisms across the tree of life. The goal of the project is to provide support for the general framework of persistence strategies, which postulates that persistence is achieved by biological systems via a tradeoff of traits that serve either economy, flexibility, or robustness. In this project we want to determine and quantify the molecular mechanisms that underlie these persistence strategies. Will analysis of the biomolecular networks allow us to differentiate between organisms of differing economy, flexibility, and robustness, and subsequently classify unknown, newly discovered, or modified organisms within such predefined Poster: Persistence Strategies in Biomolecular Network Architecture NCUR Slides: Architecture and Dynamics of Biomolecular Networks Facilitate Evolution of Persistence Strategies in Living Organisms | |
Cynthia Liu B.S. Bioengineering (2019) minor in Computer Science | Workflow management comparisons: Cynthia worked to learn the Nextflow system for workflow management and to compare and contrast Poster: Comparative Analysis of Genomic Sequencing Workflow Management Systems | |
Ellen Nie B.S. Biochemistry (2018), minor in Computer Science, Illinois Informatics Institute fellow | Big data network transfers for genomics: Ellen is benchmarking the network transfers of genomic data across multiple sites. She wants to understand the limitations of the modern network backbone for big data genomics, and to facilitate correct configuration of the endpoints to resolve those limitations. Ellen is looking at the sites of our collaborators in Toronto, South Africa, Sudan, and the UK. | |
Brian Rao B.S. Integrative Biology (2018) Minor in Informatics | Brian wrote and tested the variant calling workflow code for the Mayo Grand Challenge. He focused on the accuracy and performance considerations of tumor variant detection in clinical settings. | |
Angelynn Huang | Angelynn contributed to benchmarking the performance and accuracy of Minimap2 (Li, 2018) - Minimap2 maps the sequencing reads against the reference genome for the species. Poster : Minimap2_BWA MEM Spotlight: http://www.ncsa.illinois.edu/news/story/ncsa_student_spotlight_angelynn_huang_and_sophia_torrellas | |
Sparsh Agarwal B.Tech + M.Tech in Biochemical Engineering and Biotechnology (2018) MS in Bioinformatics (2020) | Mayo Grand Challenge Project: He is working on the Mayo Grand Challenge project, which aims to detect the genomic variants in humans responsible for HLHS by using the Cortex-var software as the de novo assembler and variant caller. | |
Prakruthi Burra B.E. Computer Science (2018) M.S. Biological Sciences (2018) | Human Heredity & Health in Africa: Prakruthi contributes to UIUC's work with the H3Africa Consortium. She is involved with projects on graph representations of genome assemblies and machine learning techniques applied to biological problems. Workflow management for variant calling: Prakruthi is also implementing a variant calling workflow in Nextflow, an increasingly popular workflow manager. Prior to her workflow development work, she was briefly involved in testing the workflow developed for the Mayo Grand Challenge. | |
Dave Istanto B.S. Crop Sciences (2018) | Nextflow Cortex_Var Structural Variant Calling Workflow: Dave is responsible for developing a user-friendly and cluster-portable version of the cortex_var workflow to detect large structural variants in given genomes using the Nextflow workflow management language. Soybean Haplotype and Structural Variant Profiling and Analysis: Dave is responsible for profiling variants in 481 soybean lines, which will later be correlated with certain visible characteristics. | |
Shubham Rawlani Bachelors in Electronics and Communication Engineering Masters in Information Management | Space Search Reduction and EpiQuant: Shubham is involved in the data analysis, where he writes code for data wrangling, extraction and cleaning to ease the evaluation of statistical algorithms in the analysis of GWAS data for genomic variant epistasis. | |
Priya Balgi Bachelors in Information Technology Engineering Masters in Information Management | Project Management: Priya is responsible for assisting in the execution of project management tasks. Additionally, she performs genomics workflow testing using bash scripting in an HPC environment and is developing a website using GitHub Pages/Jekyll for the creation and auto-maintenance of project documentation. She also led a student group of 8 representing NCSA Industry research during the Engineering Open House, where the Genomics group won the Second Best Original Undergraduate Research Award, and will also represent NCSA Industry research at the BioIT World Conference. Poster: NCSA Industry Research | |
Mingyu Yang B.E. Network Engineering | Mayo Grand Challenge Project: Mingyu is working to optimize and test the performance of GABAC, which is a gene compression application. | |
Dipro Ray B.S. Computer Science (2020) Minor in Mathematics | Resolving Racial Disparities by Applying Statistics on Complex, Multidimensional Datasets: Dipro is working on turning a proof-of-concept prototype of a statistical pipeline for analyzing health data into a well-structured open source package that is portable, containerized and deployable through the cloud (such as AWS), making such critical software available to researchers and collaborators with only a few commands. In pursuit of this goal, Dipro also works on refining the statistical pipeline in a modular manner, chalking out key design decisions for its implementation, and improving the package's computational efficiency (by making use of the host computer's architecture and resources). | |
Tajesvi Bhat B.S. Computer Science (2020) | Deployment of Variant Calling Workflows on Cloud Platform: Tajesvi is working on this project, which aims to deploy variant calling workflows implemented using systems such as WDL and Nextflow on AWS and other cloud services. | |
Tiffany Li B.S. Integrative Biology (2018) minor in Computer Science | Benchmarking performance and accuracy of genomic variant calling software: Tiffany collaborates to document our efforts in benchmarking variant calling on HPC systems. We have run variant calling experiments on 500 genomes in parallel, on Blue Waters, to identify performance bottlenecks when using the GATK best practices workflow. We have also tested a number of alternative software packages, such as Isaac, Genalice, and Sentieon, as well as Dragen, a hardware solution. Tiffany is documenting the pros and cons of each of these excellent approaches in a separate manuscript. Validation and benchmarking on ParFu, a parallel file packaging utility: Tiffany is also involved in testing and benchmarking of ParFu, an MPI tool for creating or extracting directory tree archives written by Dr. Craig Steffen, who works in the Blue Waters team. |
Other Collaborations
Dr. Matthew Hudson Bioinformatics Crop Science | HPCBio, Carver Biotechnology Center |
| Dan Wickland Ph.D. Informatics (2019) |
Dr. Daniel Katz Computer Science |
SPIN fellow
NCSA Scientific Software and Applications |
Portable variant calling workflow in Swift |
H3Africa Consortium | ||
Azza Ahmed Computer Science advised by Dr. Faisal Fadlelmola | ||
Dr. Zeynep Madak-Erdogan Food Science & Human Nutrition | Madak-Erdogan Lab: Systems Biology of Estrogen Signaling | |
Brandi Smith Ph.D. Food Science |
Bioinformatics
and Human Nutrition (2021) | H3Africa Consortium
| |
Morgan Taschuk Bioinformatics | OICR | |
Paul Hatton HPC / Visualisation | University of Birmingham |
Nahil Sobh Machine Learning, AI | UIUC Beckman Institute | |
Umberto Ravaioli Cyberinfrastructure, ECE | UIUC ECE, Beckman Institute | |
Lynn Hassan Jones Radiology | UIUC |