
This group is a host for research into the use of high performance computing (HPC) for primary genomics analyses, such as alignment, variant calling, genome assembly, and RNASeq. By its nature, this research is highly collaborative. Every member of our team is affiliated with multiple departments or campus initiatives. The student participants in this group serve as a bond between the campus faculty using computational genomics analyses in their research, and the NCSA experts in HPC, storage, networking, databases, etc. Together we enable the use of advanced computing infrastructure in computational genomics. Explore this page to find out who is involved, how we are connected, and what projects are currently ongoing.

NCSA Press: Crossing over, branching out: Meet the NCSA Genomics team

Liudmila Sergeevna Mainzer

Technical Program Manager, National Center for Supercomputing Applications

Research Assistant Professor, Institute of Genomic Biology



NCSA Genomics, September 2017. Credit: Steve Deunsing
Not pictured: Matt Weber, Ram Venkatakrishnan

Current People and Projects

Jacob Heldenbrand, Research Programmer

B.S. Biochemistry (2014)

M.S. Bioinformatics (2016)


NCSA Industry

Jacob supports biomedical partners in the NCSA Industry program.

Jacob provides a complementary mix of expertise in HPC and bioinformatics data analysis to enable
pharmaceutical, agricultural and medical companies to utilize the high performance computing resources at NCSA.

Jacob and Azza Ahmed (Ph.D. candidate, University of Khartoum) are exploring and evaluating the
use of Swift/T for variant calling.

Github: Swift T Variant Calling

Guide: Downloading large datasets with SRA Toolkit

Ramshankar Venkatakrishnan, Research Programmer

B.S. Electronics & Communications (2012)

M.S. Electrical & Computer Engineering (2015)

Mayo Grand Challenge: evaluating and streamlining genomics workflows

Ramshankar is working on computational improvements for the Mayo Grand Challenge, a genomics research
project in partnership with the Mayo Clinic. Ram is rewriting Mayo's variant calling pipeline using the Cromwell/WDL workflow management system.

Ram will also contribute his hardware expertise to the project, evaluating system architecture options to complement the team’s
software and coding improvements.

Github: MayomicsVC Pipeline

Katherine Kendig, Associate Project Manager

B.A. Anthropology (2012)

M.F.A. Creative Writing (2017)

Project Management

Katherine assists the team with research coordination, manuscript editing and preparation, and documentation.

She benchmarked the Sentieon variant calling software for the Mayo Grand Challenge.

She also contributes to NCSA’s Public Affairs team, writing articles about NCSA and XSEDE research:

After the storm; Bringing supercomputing to psychology; DISSCO Tech; ECSS: Profiles in Consulting; NCSA Genomics; History was here

Brian Bliss, Research Programmer

Data compression

Brian will be working on data compression for the Mayo Grand Challenge project.

Graduate Students

Weihao Ge

B.S. Physics (2008)

M.S. Physics (2011)

Ph.D. Biophysics (2018)

advised by Dr. Eric Jakobsson

Search Space Reduction

Weihao is evaluating statistical methods for search space reduction when analyzing GWAS data for epistasis
(interactions between genomic variants) associated with disease, so that the analysis yields faster, more meaningful results.
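To illustrate why search space reduction matters, here is a minimal sketch of the arithmetic, assuming only pairwise interactions are tested. The variant counts are hypothetical and are not taken from the project; the filtering step that would produce the reduced set is not shown.

```python
from math import comb


def pairwise_tests(n_variants: int) -> int:
    """Number of distinct variant pairs, i.e. n choose 2."""
    return comb(n_variants, 2)


def reduction_factor(n_all: int, n_kept: int) -> float:
    """How many times fewer pairwise tests remain after filtering."""
    return pairwise_tests(n_all) / pairwise_tests(n_kept)


# Hypothetical panel sizes, chosen only for illustration.
n_all = 1_000_000   # variants before any filtering
n_kept = 10_000     # variants kept by some (unspecified) filter

print(f"pairs before filtering: {pairwise_tests(n_all):,}")
print(f"pairs after filtering:  {pairwise_tests(n_kept):,}")
print(f"reduction factor:       {reduction_factor(n_all, n_kept):,.0f}x")
```

Because the number of pairs grows quadratically, keeping 1% of the variants in this example cuts the number of pairwise tests by roughly four orders of magnitude.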

Her work is part of the CCBGM project "Scaling the Computation of Epistatic Interactions in GWAS Data."

Matthew Weber

B.S. Molecular and Cellular Biology (2016)

M.S. Bioinformatics (2018)

Department of Crop Sciences, UIUC

CompGen fellow

advised by Dr. Matthew Hudson

Mutation profiles of cancer

Mr. Weber is developing machine learning methods to effectively stratify cancers based on the
statistical properties of mutations found in afflicted individuals. Cancer stratification is predictive of
disease outcomes, drug response and drug metabolism. Effective computational approaches based
on all data acquired to date can make this process cheaper in the clinic. Matt collaborates with the
Ontario Institute for Cancer Research to make sure his models are realistic.

Paper: Simulating Next-Generation Sequencing Datasets from Empirical Mutation and Sequencing Models

Poster: Statistical models to capture mutational properties for NextGen Sequencing Data

Matthew Kendzior

B.S. Crop Science (2016)

Plant Biotechnology, Molecular Biology

M.S. Bioinformatics (2018)

Department of Crop Sciences, UIUC

Graduate Fellow in the College of ACES

advised by Dr. Matthew Hudson 

Genomic variant calling by assembly

Mr. K is focusing on a method to detect genomic variants by assembly.

He is employing the software Cortex-var, which constructs de novo genome assemblies of multiple
sequenced samples and then compares the resulting de Bruijn graphs to detect where they
diverge, indicating a potential variant. This could be a good method for detecting novel variants,
especially repeats and complex rearrangements in complex genomes, such as those of polyploid plants and
cancers. Mr. K is using his strong background in genomics to interpret, clean up and validate the output.
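The idea of comparing de Bruijn graphs to find divergence points can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Cortex-var's actual algorithm or data structures: the function names, the reads, and the choice of k = 4 are all hypothetical, and real data would also need error handling, reverse complements, and coverage thresholds.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: each (k-1)-mer maps to the set of
    (k-1)-mers that follow it in some k-mer of the reads."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def divergence_points(graph_a, graph_b):
    """Return nodes present in both graphs whose outgoing edges differ,
    mapped to the (sorted) successors seen in each sample."""
    shared = graph_a.keys() & graph_b.keys()
    return {
        node: (sorted(graph_a[node]), sorted(graph_b[node]))
        for node in sorted(shared)
        if graph_a[node] != graph_b[node]
    }


# Two hypothetical samples that differ by a single base (T vs. A).
sample_a = ["ACGTTGCA"]
sample_b = ["ACGATGCA"]

graph_a = build_de_bruijn(sample_a, k=4)
graph_b = build_de_bruijn(sample_b, k=4)

# The paths split after the shared node "ACG" and rejoin at "TGC".
print(divergence_points(graph_a, graph_b))
```

In this toy case the two graphs share the node ACG but leave it along different edges, which marks the start of a "bubble" where the samples disagree.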

Mr. K is also working with Tiffany on the genomic analysis of HLHS for the Mayo Grand Challenge.

Poster: Variant Calling by Assembly

Poster: Reference-guided variant calling for non-repetitive sequences in Glycine max

Prakruthi Burra

B.E. Computer Science (2018)

M.S. Biological Sciences (2018)

Workflow management for variant calling

Prakruthi is implementing a variant calling workflow in Nextflow (a workflow manager).

She is also in charge of testing the workflow developed for the Mayo Grand Challenge before delivery.

Human Heredity & Health in Africa

Prakruthi will be contributing to UIUC's work with the H3Africa Consortium.

Dave Istanto

B.S. Crop Sciences (2018)

Workflow management for structural variant calling

Dave is creating a Nextflow workflow for structural variant calling using Cortex-var.

Undergraduate Students

Aishwarya Raj

B.S. Biochemistry (2019)

minor in Bioinformatics

Illinois Informatics Institute fellow

Evolution of molecular networks and persistence of organisms

Aishwarya constructs and compares gene, metabolic and signaling networks from organisms across the tree of life.

The goal of the project is to provide support for the general framework of persistence strategies.

It postulates that persistence is achieved by biological systems via a tradeoff of traits that serve either
economy, flexibility, or robustness. In this project we want to determine and quantify the molecular
mechanisms that underlie these persistence strategies. Will analysis of the biomolecular networks
allow us to differentiate between organisms of differing economy, flexibility, and robustness, and
subsequently classify unknown, newly discovered, or modified organisms within such predefined

Poster: Persistence Strategies in Biomolecular Network Architecture

NCUR Slides: Architecture and Dynamics of Biomolecular Networks Facilitate Evolution of Persistence Strategies in Living Organisms

Cynthia Liu

B.S. Bioengineering (2019)

minor in Computer Science

Workflow management comparisons

Cynthia worked to learn the Nextflow system for workflow management and to compare and contrast three competing workflow management options
for bioinformatics in association with the work Ram is performing for the Mayo Grand Challenge.


Poster: Comparative Analysis of Genomic Sequencing Workflow Management Systems

Former Group Members

Ellen Nie

B.S. Computer Science (2018)

Big data network transfers for genomics

Ellen is benchmarking the network transfers of genomic data across multiple sites.
She wants to understand the limitations of modern network backbone for big data genomics,
and to facilitate correct configuration of the endpoints to resolve those limitations.
Ellen is looking at the sites of our collaborators in Toronto, South Africa, Sudan, and the UK.

Poster: Benchmarking and Optimization of Long Distance Big Data Transfers

Validation of Sentieon - the fast alternative to GATK

Ellen is also collaborating with OICR to validate the speed and accuracy of the new software
package for genomic variant calling, called Sentieon DNASeq.

Convert Java-based GWAS code for Spark

In a project described below (Accurate and scalable GWAS algorithms) we are improving performance of
a stepwise epistatic model selection for Genome-Wide Association Studies. The method itself works well,
but the current Java implementation is far too slow for modern data sizes.

We would like to deploy this Java code on Spark, to see if the necessary performance gains could be obtained.

A successful student applicant will use the Java Spark API to adapt the current code for a Spark platform that
is being deployed at NCSA ISL2.0. This code will be validated for correctness in collaboration with a student
statistician from the lab of Dr. Lipka, who developed this statistical method.

Poster: Scaling the Computation of Epistatic Interactions in GWAS Data

Tiffany Li

B.S. Integrative Biology (2018)

minor in Computer Science

Benchmarking performance and accuracy of genomic variant calling software

Tiffany collaborates to document our efforts in benchmarking variant calling on HPC
systems. We have run variant calling experiments on 500 genomes in parallel, on Blue Waters,
to identify performance bottlenecks when using the GATK best practices workflow.

We have also tested a number of alternative software packages, such as Isaac, Genalice, and Sentieon,
as well as Dragen - a hardware solution. Tiffany is documenting the pros and cons of each of these
excellent approaches in a separate manuscript.

Validation and benchmarking of ParFu - a parallel file packaging utility

Tiffany is also involved in testing and benchmarking ParFu, an MPI tool for creating and extracting
directory tree archives. ParFu was written by Dr. Craig Steffen, who works on the Blue Waters team.

Github: ParFu Archive Tool


Sijia Huo

B.S. Mathematics & Computer Science (2018)

second major in Statistics

third major in Economics

Parallelization of R

Sijia is working with NCSA Faculty Fellow Dr. Zeynep Madak-Erdogan to introduce parallel R code into her research.

Dr. Madak-Erdogan is exploring racial disparities in breast cancer occurrence through the lens of diet and nutrition.

Ryan Chui

B.S. Biochemistry (2016)

M.S. Bioinformatics (2017)

Department of Computer Science, UIUC

NCSA Industry

Ryan performed software installation, benchmarking, and development for a variety of industry partners.

To investigate how the training time for deep neural networks (DNNs) can be affected, Ryan worked
with TensorFlow, Google’s deep learning library, to perform multi-label classification on a data set.
He built an autoencoder, an unsupervised deep neural network, to extract salient features from the

On Github:

EpiQuant: Hadoop, C, TensorFlow - epistasis software prototypes

MLCC - multi-label cancer classification

q2b - binary representation of nucleotides

ptgz - parallel tar gzip

Usage Analyzer - log analyzer for HPC schedulers

Jennie Zermeno

B.S. Integrative Biology (2017)

Benchmarking performance and accuracy of genomic variant calling software

Jennie collaborated to document our efforts in benchmarking variant calling on HPC
systems. Jennie also participated in the debugging of the H3ABioNet GATK Germline Workflow.

Bioinformatics in the Cloud

Jennie is investigating the issues of portability, reproducibility and scaling of bioinformatics workflows
in cloud infrastructure by instantiating containerized versions of workflows.

Students Capitalize on Computational Genomics Research Using AWS

Angela Chen

M.S. Statistics (2017)

Department of Statistics, UIUC

CompGen fellow

advised by Dr. Alexander Lipka

Accurate and scalable GWAS algorithms

Angela and Khory collaborated to improve the scalability and parallelization of the statistical software
TASSEL5, widely used for conducting genome wide association studies (GWAS) in plants.

Angela wrote a manuscript to demonstrate that her new stepwise epistatic model selection
procedure has greater statistical power compared to other methods. However, the Java-based
TASSEL5 cannot be easily parallelized across multiple nodes in a computational cluster, to run on
modern, relevant datasets, which tend to be very large, such as the Alzheimer's SNP panel.

Khory provided the expertise in computer science to convert this Java code into C++ and parallelize
it in an HPC environment.


Khory Wagner

advised by Dr. Volodymyr Kindratenko


Nainika Roy

B.S. Molecular and Cellular Biology (2017)

minor in Informatics and Chemistry

SPIN fellow

Data formats and data structures in computational genomics


Junyu Li

B.S. Molecular and Cellular Biology (2017)

minor in Computer Science

SPIN fellow

Genomic variant calling by assembly

Junyu worked with Mr. K in an interdisciplinary team, providing the expertise in math and computer
science to automate the Cortex-var workflow and interpret the algorithm.

Poster: Reference-guided variant calling for novel non-repetitive sequences in Glycine max

Noah Flynn

B.S. Bioengineering, Mathematics (2017)

minor in Computer Science

SPIN fellow


Evolution of molecular networks and persistence of organisms


Other Collaborations


Dr. Matthew Hudson


Crop Sciences

HPCBio, Carver Biotechnology Center


Dan Wickland

Ph.D. Informatics (2019)


Dr. Daniel Katz

Computer Science

NCSA Scientific Software and Applications

Portable variant calling workflow in Swift

Github: Swift Variant Calling



Azza Ahmed

Computer Science

University of Khartoum

advised by Dr. Faisal Fadlelmola

Dr. Zeynep Madak-Erdogan

Food Science & Human Nutrition

Madak-Erdogan Lab

Systems Biology of Estrogen Signaling


Brandi Smith

Ph.D. Food Science and Human Nutrition (2021)


H3Africa Consortium

  • bioinformatics workflows in the cloud
  • custom genotyping chip for African populations
  • H3Africa bioinformatics node accreditation

Morgan Taschuk



  • production infrastructure for primary genomics analyses
  • reproducibility of research in cancer genomics



Paul Hatton

HPC / Visualisation




University of Birmingham









