
This group hosts research into the use of high performance computing (HPC) for primary genomics analyses, such as alignment, variant calling, genome assembly, and RNA-Seq. By its nature, this research is highly collaborative. Every member of our team is affiliated with multiple departments or campus initiatives. The student participants in this group serve as a link between the campus faculty who use computational genomics analyses in their research and the NCSA experts in HPC, storage, networking, databases, etc. Together we enable the use of advanced computing infrastructure in computational genomics. Explore this page to find out who is involved, how we are connected, and which projects are currently ongoing.

Liudmila Sergeevna Mainzer

Senior Research Scientist, National Center for Supercomputing Applications

Research Assistant Professor, Institute of Genomic Biology

217-300-0568

 

Open Projects looking for students

YOUR FACE HERE

  • Required skills:
    • OOP
    • Java
    • MapReduce/Hadoop/Spark
    • matrix manipulation
    • linear regression/linear solvers
    • basics of genomics
  • Desired skills:
    • statistics
    • bioinformatics

Convert Java-based GWAS code for Spark

In a project described below (Accurate and scalable GWAS algorithms) we are improving the performance of
a stepwise epistatic model selection procedure for Genome-Wide Association Studies. The method itself works well,
but the current Java implementation is far too slow for modern data sizes.

We would like to deploy this Java code on Spark, to see whether the necessary performance gains can be obtained.

A successful student applicant will use the Spark Java API to adapt the current code for a Spark platform that
is being deployed at NCSA ISL2.0. The adapted code will be validated for correctness in collaboration with a student
statistician from the lab of Dr. Lipka, who developed this statistical method.

Current Projects

Matthew Weber

B.S. Molecular and Cellular Biology (2016)

M.S. Bioinformatics (2018)

Department of Crop Sciences, UIUC

CompGen fellow

advised by Dr. Matthew Hudson

Mutation profiles of cancer

Mr. Weber is developing machine learning methods to effectively stratify cancers
based on the statistical properties of mutations found in afflicted individuals.
Cancer stratification is predictive of disease outcomes, drug response and drug metabolism.
Effective computational approaches based on all data acquired to date can make this process cheaper in the clinic.
Matt collaborates with the Ontario Institute for Cancer Research to make sure his models are realistic.

http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0167047
https://f1000research.com/posters/5-1076


Junyu Li

B.S. Molecular and Cellular Biology (2017)

minor in Computer Science

SPIN fellow

 

Genomic variant calling by assembly

Junyu and Mr. K are focusing on a method to detect genomic variants by assembly.
They are employing the software Cortex-var, which constructs de novo genome assemblies
from multiple sequencing samples, and then compares the resultant de Bruijn graphs
to detect where they diverge, indicating a potential variant. This could be a good method
for detecting novel variants, especially repeats and complex rearrangements in complex genomes,
such as those of polyploid plants and cancers.
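
The idea of comparing de Bruijn graphs can be illustrated with a toy sketch. This is not Cortex-var's implementation or API; the function names, the reads, and the choice of k=4 are all hypothetical and chosen only to show how a single-base difference between two samples appears as a point where their graphs diverge.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Collect de Bruijn edges: each k-mer links its (k-1)-prefix to its (k-1)-suffix."""
    edges = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges.add((kmer[:-1], kmer[1:]))
    return edges


def successors(edges):
    """Map each node to the set of nodes it points to."""
    out = defaultdict(set)
    for src, dst in edges:
        out[src].add(dst)
    return out


def branch_points(edges_a, edges_b):
    """Nodes present in both graphs whose outgoing edges differ between them."""
    out_a, out_b = successors(edges_a), successors(edges_b)
    shared = set(out_a) & set(out_b)
    return sorted(node for node in shared if out_a[node] != out_b[node])


# Hypothetical toy reads: sample_b carries a single-base substitution (A -> G).
sample_a = ["GATTACAGG"]
sample_b = ["GATTGCAGG"]
graph_a = de_bruijn_edges(sample_a, k=4)
graph_b = de_bruijn_edges(sample_b, k=4)
print(branch_points(graph_a, graph_b))
```

Real data adds sequencing errors, repeats, and coverage thresholds, none of which this sketch attempts to handle.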

Junyu and Mr. K work as an interdisciplinary team. Junyu provides the expertise in
math and computer science to automate the Cortex-var workflow and interpret the algorithm.
Mr. K is using his strong background in genomics to interpret, clean up and validate the output.

Matthew Kendzior

B.S. Crop Science (2016)

Plant Biotechnology, Molecular Biology

M.S. Bioinformatics (2018)

Department of Crop Sciences, UIUC

Graduate Fellow in the College of ACES

advised by Dr. Matthew Hudson 

Jennie Zermeno

B.S. Integrative Biology (2017)

Benchmarking performance and accuracy of genomic variant calling software

Jennie and Tiffany collaborate to document our efforts in benchmarking variant calling on HPC systems.

We have run variant calling experiments on 500 genomes in parallel, on Blue Waters,
to identify performance bottlenecks when using the GATK best practices workflow.

Jennie is documenting this work in a publication.

 

We have also tested several alternative software packages, such as Isaac, Genalice, and Sentieon,
as well as Dragen, a hardware solution.

Tiffany is documenting the pros and cons of each of these excellent approaches in a separate manuscript.

Validation and benchmarking of ParFu - a parallel file packaging utility

Tiffany is also involved in testing and benchmarking ParFu,
an MPI tool for creating and extracting directory tree archives. It was written by Dr. Craig Steffen,
who works on the Blue Waters team.

https://github.com/ncsa/parfu_archive_tool

Tiffany Li

B.S. Integrative Biology (2018)

minor in Computer Science

Angela Chen

M.S. Statistics (2017)

Department of Statistics, UIUC

CompGen fellow

advised by Dr. Alexander Lipka

Accurate and scalable GWAS algorithms

Angela and Khory are collaborating to improve the scalability and parallelization
of the statistical software TASSEL5, which is widely used for conducting genome-wide association studies (GWAS) in plants.

Angela is writing a manuscript to demonstrate that her new stepwise epistatic model selection procedure
has greater statistical power than other methods. However, the Java-based TASSEL5 cannot be
easily parallelized across multiple nodes in a computational cluster, which it would need in order to run on modern, relevant datasets.
These tend to be very large; the Alzheimer's SNP panel is one example.

Khory is providing the expertise in computer science to convert this Java code
into C++ and parallelize it in an HPC environment.
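
To give a sense of why GWAS computations lend themselves to parallelization, here is a minimal sketch of a single-marker association scan. It is deliberately much simpler than the stepwise epistatic model selection procedure described above and is not taken from TASSEL5; the marker names, genotype codes, and phenotype values are hypothetical. Each marker is fitted independently, so the loop over markers could in principle be distributed across cores or nodes.

```python
def slope_and_r2(genotypes, phenotypes):
    """Fit phenotype ~ genotype by least squares; return (slope, R^2)."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_p = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((p - mean_p) ** 2 for p in phenotypes)
    sxy = sum((g - mean_g) * (p - mean_p) for g, p in zip(genotypes, phenotypes))
    if sxx == 0 or syy == 0:
        # A monomorphic marker or a constant phenotype carries no signal.
        return 0.0, 0.0
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


def scan_markers(markers, phenotypes):
    """Test every marker independently; each fit needs only its own genotype column."""
    return {name: slope_and_r2(g, phenotypes) for name, g in markers.items()}


def best_marker(results):
    """Name of the marker explaining the most phenotypic variance."""
    return max(results, key=lambda name: results[name][1])


# Hypothetical data: genotypes coded as 0/1/2 copies of the minor allele.
phenotypes = [1.0, 2.0, 3.0, 2.0, 1.0, 3.0]
markers = {
    "snp1": [0, 1, 2, 1, 0, 2],
    "snp2": [0, 0, 1, 1, 2, 2],
    "snp3": [1, 1, 1, 1, 1, 1],
}
results = scan_markers(markers, phenotypes)
print(best_marker(results), results)
```

A stepwise procedure repeats a scan of this kind many times while growing the model, which is part of why the per-scan cost matters on large marker panels.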

 

Khory Wagner

advised by Dr. Volodymyr Kindratenko


Jacob Heldenbrand

B.S. Biochemistry (2014)

M.S. Bioinformatics (2016)

Bioinformatics specialist and

research programmer at NCSA

NCSA Industry

Jacob and Ryan collaborate to support the biomedical partners in the NCSA Industry program.

They provide a complementary mix of expertise in computing (Ryan) and bioinformatics data analysis (Jacob)
to enable pharmaceutical, agricultural and medical companies to use the high performance computing
resources at NCSA.

https://github.com/jacobrh91/Swift-T-Variant-Calling

 

Ryan Chui

B.S. Biochemistry (2016)

M.S. Bioinformatics (2017)

Department of Computer Science, UIUC

Noah Flynn

B.S. Bioengineering, Mathematics (2017)

minor in Computer Science

SPIN fellow

Evolution of molecular networks and persistence of organisms

Construct and compare gene, metabolic and signaling networks from organisms across the tree of life.

The goal of the project is to provide support for the general framework of persistence strategies.

This framework postulates that persistence is achieved by biological systems via a tradeoff of traits that serve either
economy, flexibility, or robustness. In this project we want to determine and quantify the molecular
mechanisms that underlie these persistence strategies. Will analysis of the biomolecular networks
allow us to differentiate between organisms of differing economy, flexibility, and robustness, and
subsequently classify unknown, newly discovered, or modified organisms within such predefined
classes?

Aishwarya Raj

B.S. Biochemistry (2019)

minor in Bioinformatics

Illinois Informatics Institute fellow

Ellen Nie

B.S. Computer Science (2018)

Big data network transfers for genomics

Ellen is benchmarking the network transfers of genomic data across multiple sites.
She wants to understand the limitations of modern network backbones for big data genomics,
and to facilitate correct configuration of the endpoints to resolve those limitations.
Ellen is looking at the sites of our collaborators in Toronto, South Africa, Sudan, and the UK.

Validation of Sentieon - the fast alternative to GATK

Ellen is also collaborating with OICR to validate the speed and accuracy of the new software
package for genomic variant calling, called Sentieon DNASeq.


Nainika Roy

B.S. Molecular and Cellular Biology (2017)

minor in Informatics and Chemistry

SPIN fellow

Data formats and data structures in computational genomics

Other Collaborations


Dr. Matthew Hudson

Bioinformatics

Crop Science

HPCBio, Carver Biotechnology Center

http://hpcbio.illinois.edu/


Dr. Daniel Katz

Computer Science

NCSA Scientific Software and Applications

Portable variant calling workflow in Swift

https://github.com/edrodri2/Swift-Variant-Calling


Azza Ahmed

Computer Science

University of Khartoum

advised by Dr. Faisal Fadlelmola


H3Africa Consortium

  • bioinformatics workflows in the cloud
  • custom genotyping chip for African populations
  • H3Africa bioinformatics node accreditation

Morgan Taschuk

Bioinformatics

OICR

  • production infrastructure for primary genomics analyses
  • reproducibility of research in cancer genomics

 


Paul Hatton

University of Birmingham