Abstract: As biological sequencing data grows in scale, so too do the computational demands of moving from data to results. This challenges biological cyberinfrastructure for everyone: small labs that are generating large data sets, sequencing centers, and computing centers that work with researchers. Compounding the scaling problem, the needs of data-intensive science are a poor match for standard high-performance computing approaches, which have traditionally focused on compute throughput. I'll discuss some of our research approaches, which focus on cloud computing and algorithmic scaling rather than on scaling compute infrastructure, and talk about the future of data-driven discovery as I see it. No compute centers were harmed in my research.

Bio: My pre-faculty research was in developmental molecular biology, gene networks, and systems biology, but since then I've focused on how to make (biological) sense of large amounts of sequencing data, typically from non-model systems (so largely transcriptomes and environmental metagenomes). I'm particularly interested in questions of scaling and data integration, and in how to use data to move more quickly to biologically relevant hypotheses. I have an abiding fascination with marine evo-devo and soil/sediment metagenomics. I've just moved into the School of Veterinary Medicine at UC Davis and would also be very interested in discussing the challenges of moving into a more applied/clinical setting.

On the computational side, I've invested heavily in streaming and probabilistic data structures as applied to short-read analysis, and we are starting to work on read-to-graph sequence alignment and genome reference graphs.

I also teach quite a few workshops on next-gen sequence analysis and work on improving computational training.

Finally, I'm pro-open science, open source, and open access, and I use social media a lot. Sometimes people are interested in talking about that.


 
