The NCSA Genomics Group is a host for research into the use of high performance computing (HPC) for primary genomics analyses, such as alignment, variant calling, genome assembly, and RNA-Seq. By its nature, this research is highly collaborative. Every member of our team is affiliated with multiple departments or campus initiatives. The student participants in this group serve as a bond between the campus faculty using computational genomics analyses in their research, and the NCSA experts in HPC, storage, networking, databases, etc. Together we enable the use of advanced computing infrastructure in computational genomics. Explore this page to find out who is involved, how we are connected, and what projects are currently ongoing.


Staff members from the software directorate and other groups within research consulting are frequently collaborators on our projects.

NCSA Press:

Crossing over, branching out: Meet the NCSA Genomics team

Engineering Open House Award

Collaborative efforts produce clinical workflows for fast, translational genetic analysis



Active Projects

Project name


Project description

Genomics staff 

Collaborators

Biomedical Pipeline Development

Multiple projects involving workflow development and feature addition, including determining whether workflows can be deployed to the cloud.

Joshua Allen

Mohith Manjunath

Weihao Ge

Raghid Alhazmy

The Mayo Clinic


CROPPS

GPU acceleration and refactoring of code for gRNA design for CRISPR assays.

David Bianchi

Cornell Computer Science

Crop Science

NEAT

Software development of a sequence simulator with mutational models.

Josh Allen

Raghid Alhazmy

The Mayo Clinic

Ontario Cancer Research Center

The Broad Institute

University of Wyoming


Farm to Food Bank Mobile Application Development

Development of a mobile app for farmers to sell off-spec and extra produce to food banks.

Christina Fliege

Google


MINERVA

Developing an interface for combined genomic and diagnostic analysis for improved prediction, research, and clinical interpretation of genomic variation.

David Bianchi

Raghid Alhazmy


The Mayo Clinic
Investigator Portal

Developing a research compute portal for clinical diagnostics and genomics, where investigators can filter and query results against bioinformatic catalogs and applications, and generate reusable datasets that can be viewed, analyzed, and managed.

David Bianchi

Misael Lazaro

The Mayo Clinic
GWAS Study of Dairy Cattle

Analyze genotype data from more than 11 cow farms to find the cows' susceptibility to diseases and the underlying genomic variants.

Joshua Allen

Weihao Ge

Prof. Sandra Rodriguez Zas
Metabolomics data analysis on microbiome

Understand gut microbiome products for personalized medicine.

Weihao Ge

Misael Lazaro

David Bianchi

Prof. Isaac Cann
Genomic selection for maize and sorghum

Evaluate how much genomic variants will contribute to traits between maize and sorghum.

Weihao Ge

Prof. Alex Lipka




Christina Elizabeth Fliege

Technical Program Manager

Liudmila Sergeevna Mainzer

Senior Research Scientist, National Center for Supercomputing Applications


NCSA Genomics, September 2017. Credit: Steve Deunsing


Research Assistant Professor, Institute of Genomic Biology

217-300-0568

 

NCSA Genomics 'Best Original Undergraduate Research' Award


NCSA Genomics - Blue Waters tour, 17 June 2019

Staff

Research Interests

Joshua Allen

Senior Research Programmer

BA Mathematics and English (2001)

MA English (2005)

MS Bioinformatics (2019)

Sequence Simulation and Advanced Workflow Development. Modeling next-generation sequencing data, applying machine learning techniques to enhance models, writing production-ready code.

David Bianchi

Research Scientist

Ph.D. Physical Chemistry (2022)

Metabolic Engineering, Synthetic Biology, Gene Regulation, Genomics Analysis, Digital Agriculture, Spatial/Multi Omics, Personalized Medicine, Research Software Engineering, High-Performance Computing, GPU Computing

Mohith Manjunath

Research Programmer

Ph.D. Aerospace Engineering (2014)

Quantum Computing in Biology and Chemistry


Weihao Ge

B.S. Physics (2008)

M.S. Physics (2011)

Ph.D. Biophysics (2018)

Machine Learning in Epidemiology and Genomics. Biostatistics and Informatics.

Raghid Alhazmy

Research Programmer

B.S. Biology (2021)

M.S. Bioinformatics (2023)

Genomic data analysis, Machine Learning, and Computational Biology.

Joao Paulo Gomes Viana



Misael Lazaro

Academic Researcher

B.S. Biochemistry (2019)

M.S. Biochemistry (2023)

Genomics Data Analysis, Drug Development and Design, and Computational Biology

Graduate Students


Dhruvesh Shah

School of Information



Yash Wasnik

Information

Undergraduate Students




Alumni

Katherine Kendig

Associate Project Manager

B.A. Anthropology (2012)

M.F.A. Creative Writing (2017)

Project Management

Katherine is a project manager with the NCSA Industry Program, working primarily with biomedical partners.

She benchmarked the Sentieon variant calling software for the Mayo Grand Challenge: https://www.biorxiv.org/content/10.1101/396325v1

She has also contributed to NCSA’s Public Affairs team, writing articles about NCSA and XSEDE research:

After the storm; Bringing supercomputing to psychology; DISSCO Tech; ECSS: Profiles in Consulting; NCSA Genomics; History was here

Ramshankar Venkatakrishnan

Research Programmer

B.S. Electronics & Communications (2012)

M.S. Electrical & Computer Engineering (2015)

Phillips 66 and Hardware support

Ram is developing code for the Phillips 66 project with the Data Analytics team. The code uses machine learning to determine the best price at which to sell their petroleum products. The model considers a vast array of parameters to make the decision.

Ram is working with the Innovative Systems Laboratory (ISL) at NCSA to create a roofline model for a U250 Xilinx card, using a convolution code to plot the model.

Ram also provides software and installation support for the HPC clusters at NCSA for a variety of clients.

Dan Lanier, Research Programmer

B.S. Applied Mathematics (2008)

NCSA Industry

Dan supports biomedical partners in the NCSA Industry program.

Dan provides a complementary mix of expertise in HPC and mathematical data analysis to enable pharmaceutical, agricultural and medical companies to utilize the high performance computing resources at NCSA.



Matthew Kendzior

Research Programmer

BS Crop Sciences (2016)

MS Bioinformatics (2019)

Mayo Grand Challenge

Mr. K is working as a researcher in the Mayo Grand Challenge, which aims to drastically shorten the time needed to detect genomic variants, and to extract more information from whole genome sequencing data.

Genomic variant calling by assembly

Mr. K is focusing on a method to detect genomic variants by assembly. He is employing the software Cortex-var, which constructs de novo genome assemblies on multiple sequencing samples and then compares the resultant de Bruijn graphs to detect where they diverge, indicating a potential variant. This could be a good method for detecting novel variants, especially repeats and complex rearrangements in complex genomes, such as polyploid plants and cancer. Mr. K is using his strong background in genomics to interpret, clean up, and validate the output.

Mr. K is also working with Tiffany on the genomic analysis of HLHS for the Mayo Grand Challenge.

Poster: Variant Calling by Assembly

Poster: Reference-guided variant calling for non-repetitive sequences in Glycine max

Brian Bliss, Research Programmer

Data compression

Brian will be working on data compression for the Mayo Grand Challenge project.






Sushma Yellapragada

Bachelor of Technology: Computer Science Engineering, Northcap University (2019)

M.S. Computer Science, UIUC (2022)

NEAT

Sushma is currently working on the NEAT project, contributing code and testing.


Angelo Santos

Yazhuo Zhang

MS in Information Management

Racial Health Disparities

Yazhuo is involved in the Racial Health Disparities project, applying machine learning and data science skills. Her work is to perform statistical analysis and write code to build a pipeline on health datasets in collaboration with team members.


Sijia Huo

B.S. Mathematics & Computer Science (2018)

second major in Statistics

third major in Economics

Parallelization of R

Sijia is working with NCSA Faculty Fellow Dr. Zeynep Madak-Erdogan to introduce parallel R code into her research.

Dr. Madak-Erdogan is exploring racial disparities in breast cancer occurrence through the lens of diet and nutrition.


Ryan Chui

B.S. Biochemistry (2016)

M.S. Bioinformatics (2017)

Department of Computer Science, UIUC

NCSA Industry

Ryan performed software installation, benchmarking, and development for a variety of industry partners. To investigate how the training time for deep neural networks (DNNs) can be affected, Ryan worked with TensorFlow, Google's deep learning library, to perform multi-label classification on a data set.

He built an autoencoder (an unsupervised deep neural network) to extract salient features from the data.

On Github:

EpiQuant: Hadoop, C, Tensorflow - epistasis software prototypes

MLCC - multi-label cancer classification

q2b - binary representation of nucleotides

ptgz - parallel tar gzip

Usage Analyzer - log analyzer for HPC schedulers

advised by Dr. Matthew Hudson 

Jennie Zermeno

B.S. Integrative Biology (2017)

Benchmarking performance and accuracy of genomic variant calling software

Jennie collaborated to document our efforts in benchmarking variant calling on HPC systems. We have run variant calling experiments on 500 genomes in parallel on Blue Waters to identify performance bottlenecks when using the GATK best practices workflow. Jennie is documenting this work in a publication.

Jennie also participated in the debugging of the H3ABioNet GATK Germline Workflow.

Bioinformatics in the Cloud

Jennie is investigating the issues of portability, reproducibility and scaling of bioinformatics workflows in cloud infrastructure by instantiating containerized versions of workflows.

Students Capitalize on Computational Genomics Research Using AWS

Angela Chen

M.S. Statistics (2017)

Department of Statistics, UIUC

CompGen fellow

advised by Dr. Alexander Lipka

Accurate and scalable GWAS algorithms

Angela and Khory collaborated to improve the scalability and parallelization of the statistical software TASSEL5, widely used for conducting genome-wide association studies (GWAS) in plants. Angela wrote a manuscript to demonstrate that her new stepwise epistatic model selection procedure has greater statistical power compared to other methods. However, the Java-based TASSEL5 cannot be easily parallelized across multiple nodes in a computational cluster to run on modern, relevant datasets, which tend to be very large, such as the Alzheimer's SNP panel. Khory provided the expertise in computer science to convert this Java code into C++ and parallelize it in an HPC environment.

 


Khory Wagner

advised by Dr. Volodymyr Kindratenko

Nainika Roy

B.S. Molecular and Cellular Biology (2017)

minor in Informatics and Chemistry

SPIN fellow

Data formats and data structures in computational genomics

Junyu Li

B.S. Molecular and Cellular Biology (2017)

minor in Computer Science

SPIN fellow

Genomic variant calling by assembly

Junyu worked with Mr. K in an interdisciplinary team, providing the expertise in math and computer science to automate the Cortex-var workflow and interpret the algorithm.

Poster: Reference-guided variant calling for novel non-repetitive sequences in Glycine max

Noah Flynn

B.S. Bioengineering, Mathematics (2017)

minor in computer science

SPIN fellow

 

Evolution of molecular networks and persistence of organisms

Jacob Heldenbrand

Research Programmer

B.S. Biochemistry (2014)

M.S. Bioinformatics (2016)


NCSA Industry

Jacob supports biomedical partners in the NCSA Industry program.

Jacob provides a complementary mix of expertise in HPC and bioinformatics data analysis to enable pharmaceutical, agricultural and medical companies to utilize the high performance computing resources at NCSA.

Jacob and Azza Ahmed (Ph.D. candidate, University of Khartoum) are exploring and evaluating the use of Swift T for variant calling.

Github: Swift T Variant Calling

Guide: Downloading large datasets with SRA Toolkit

Matthew Weber

B.S. Molecular and Cellular Biology (2016)

M.S. Bioinformatics (2018)

Department of Crop Sciences, UIUC

CompGen fellow

advised by Dr. Matthew Hudson

Mutation profiles of cancer

Mr. Weber is developing machine learning methods to effectively stratify cancers based on the statistical properties of mutations found in afflicted individuals. Cancer stratification is predictive of disease outcomes, drug response and drug metabolism. Effective computational approaches based on total data acquired to-date can make this process cheaper in the clinic. Matt collaborates with the Ontario Institute for Cancer Research to make sure his models are realistic.

Paper: Simulating Next-Generation Sequencing Datasets from Empirical Mutation and Sequencing Models

Poster: Statistical models to capture mutational properties for NextGen Sequencing Data

Aishwarya Raj

B.S. Biochemistry (2019)

minor in Bioinformatics

Illinois Informatics Institute fellow

Evolution of molecular networks and persistence of organisms

Construct and compare gene, metabolic and signaling networks from organisms across the tree of life.

The goal of the project is to provide support for the general framework of persistence strategies.

It postulates that persistence is achieved by biological systems via a tradeoff of traits that serve either economy, flexibility, or robustness. In this project we want to determine and quantify the molecular mechanisms that underlie these persistence strategies.

Will analysis of the biomolecular networks allow us to differentiate between organisms of differing economy, flexibility, and robustness, and subsequently classify unknown, newly discovered, or modified organisms within such predefined classes?

Poster: Persistence Strategies in Biomolecular Network Architecture

NCUR Slides: Architecture and Dynamics of Biomolecular Networks Facilitate Evolution of Persistence Strategies in Living Organisms

Cynthia Liu

B.S. Bioengineering (2019)

minor in Computer Science

Workflow management comparisons

Cynthia worked to learn the Nextflow system for workflow management and to compare and contrast three competing workflow management options for bioinformatics, in association with the work Ram is performing for the Mayo Grand Challenge.

Poster: Comparative Analysis of Genomic Sequencing Workflow Management Systems

Ellen Nie

B.S. Biochemistry (2018)

minor in Computer Science

Big data network transfers for genomics

Ellen is benchmarking the network transfers of genomic data across multiple sites. She wants to understand the limitations of the modern network backbone for big data genomics, and to facilitate correct configuration of the endpoints to resolve those limitations. Ellen is looking at the sites of our collaborators in Toronto, South Africa, Sudan, and the UK.


Brian Rao

B.S. Integrative Biology (2018)

Minor in Informatics

Brian wrote and tested the variant calling workflow code for the Mayo Grand Challenge.  He focused on the accuracy and performance considerations of tumor variant detection in clinical settings.

Angelynn Huang


Angelynn contributed to benchmarking the performance and accuracy of Minimap2 (Li, 2018), a program used for analyzing sequencing read data in genomics.

Minimap2 maps the sequencing reads against the reference genome for the species. Currently, BWA MEM (Li, 2013) is the most widely used tool for this purpose, with Novoalign (Hercus and Albertyn, 2012) coming as a close second. However, recent research (Li, 2018) suggests that Minimap2 is equally accurate yet also faster than BWA MEM. Are these claims true? Can we validate them independently using our own measurements? Sophia and Angelynn ran tests in AWS to answer these questions.


Poster: Minimap2_BWA MEM


Spotlight: http://www.ncsa.illinois.edu/news/story/ncsa_student_spotlight_angelynn_huang_and_sophia_torrellas

Sparsh Agarwal

B.Tech + M.Tech in Biochemical Engineering and Biotechnology (2018)

MS in Bioinformatics (2020)

Mayo Grand Challenge Project

Sparsh is working on the Mayo Grand Challenge project, which aims to detect the genomic variants in humans responsible for HLHS by using the Cortex-var software as the de novo assembler and variant caller.

Prakruthi Burra

B. E. Computer Science (2018)

M.S. Biological Sciences (2018)

Human Heredity & Health in Africa

Prakruthi contributes to UIUC's work with the H3Africa Consortium. She is involved with projects on graph representations of genome assemblies and machine learning techniques applied to biological problems. 

Workflow management for variant calling

Prakruthi is also implementing a variant calling workflow in Nextflow, an increasingly popular workflow manager. Prior to her workflow development work, she was briefly involved in testing the workflow developed for the Mayo Grand Challenge. 


Dave Istanto

B.S. Crop Sciences (2018)

Nextflow Cortex_Var Structural Variant Calling Workflow

Dave is responsible for developing a user-friendly and cluster-portable version of the cortex_var workflow, written in the Nextflow workflow management language, to detect large structural variants in given genomes.

Soybean Haplotype and Structural Variant Profiling and Analysis

Dave is responsible for profiling variants in 481 soybean lines; the variants will later be correlated with certain visible characteristics.

Shubham Rawlani

Bachelors in Electronics and Communication Engineering

Masters in Information Management

Space Search Reduction and EpiQuant

Shubham is involved in the data analysis part of the project, where he writes code for data wrangling, extraction, and cleaning to ease the evaluation of statistical algorithms in the analysis of GWAS data for genomic variant epistasis.

Shubham is also involved in benchmarking the EpiQuant project and will collaborate to improve its scalability by testing on different datasets and node counts.

Priya Balgi

Bachelors in Information Technology Engineering

Masters in Information Management

Project Management

Priya is responsible for assisting in the execution of project management tasks. Additionally, she performs genomics workflow testing using bash scripting in an HPC environment and is developing a website using GitHub Pages/Jekyll for the creation and automatic maintenance of project documentation.

She also led a group of 8 students representing NCSA Industry research during the Engineering Open House, where the Genomics group won the Second Best Original Undergraduate Research Award, and will also represent NCSA Industry research at the BioIT World Conference.

Poster: NCSA Industry Research

Mingyu Yang


B.E. Network Engineering


M.S. Electrical and Computer Engineering


Mayo Grand Challenge Project

Mingyu is working to optimize and test the performance of GABAC, which is a gene compression application.

Dipro Ray

B.S. Computer Science (2020)

Minor in Mathematics

Resolving Racial Disparities by Applying Statistics on Complex, Multidimensional Datasets

Dipro is working on turning a proof-of-concept prototype of a statistical pipeline for analyzing health data into a well-structured open source package that is portable, containerized, and deployable through the cloud (such as AWS), making such critical software available to researchers and collaborators with only a few commands.

In pursuit of this goal, Dipro also works on refining the statistical pipeline in a modular manner, chalking out key design decisions for its implementation, and improving the package's computational efficiency (by making use of the host computer's architecture and resources).

Tajesvi Bhat

B.S. Computer Science (2020)
Minor in Bioengineering

Deployment of Variant Calling Workflows on Cloud Platform

Tajesvi is working on this project, which aims to deploy variant calling workflows implemented using systems such as WDL and Nextflow in AWS and other cloud services.

Tiffany Li

B.S. Integrative Biology (2018)

minor in Computer Science

Benchmarking performance and accuracy of genomic variant calling software

Tiffany collaborates to document our efforts in benchmarking variant calling on HPC systems. We have run variant calling experiments on 500 genomes in parallel, on Blue Waters, to identify performance bottlenecks when using the GATK best practices workflow.

We have also tested a number of alternative software packages, such as Isaac, Genalice, and Sentieon, as well as Dragen, a hardware solution. Tiffany is documenting the pros and cons of each of these excellent approaches in a separate manuscript.

Validation and benchmarking on ParFu - a parallel file packaging utility

Tiffany is also involved in testing and benchmarking of ParFu, an MPI tool for creating or extracting directory tree archives written by Dr. Craig Steffen, who works in the Blue Waters team.

Github: Parfu Archive Tool


Other Collaborations

Dr. Matthew Hudson

Bioinformatics

Crop Science

HPCBio, Carver Biotechnology Center

Elliott Rodriguez

http://hpcbio.illinois.edu/

Dan Wickland

Ph.D. Informatics (2019)

Dr. Daniel Katz

Computer Science

SPIN fellow

NCSA Scientific Software and Applications

+ University of Khartoum 

Portable variant calling workflow in Swift

Github: Swift Variant Calling


Azza Ahmed

Computer Science

University of Khartoum

advised by Dr. Faisal Fadlelmola

Dr. Zeynep Madak-Erdogan

Food Science & Human Nutrition

Madak-Erdogan Lab

Systems Biology of Estrogen Signaling


Brandi Smith

Ph.D. Food Science and Human Nutrition (2021)

H3Africa Consortium

  • bioinformatics workflows in the cloud
  • custom genotyping chip for African populations
  • H3Africa bioinformatics node accreditation

Morgan Taschuk

Bioinformatics

OICR

  • production infrastructure for primary genomics analyses
  • reproducibility of research in cancer genomics


Paul Hatton

HPC / Visualisation



University of Birmingham

 

 

 

 

Nahil Sobh

Machine Learning, AI

UIUC Beckman Institute

Curriculum Vitae

Umberto Ravaioli

Cyberinfrastructure, ECE

UIUC ECE, Beckman Institute

Biosketch


Lynn Hassan Jones

Radiology

UIUC

Resume