Genome assembly framework on massively parallel, distributed memory supercomputers

Friedrich Menhorn; Matthias Reumann

doi:10.1515/bmt-2013-4309

Biomedizinische Technik

Paper

01 Aug 2013

Abstract

Genome assembly is the process of assembling a long deoxyribonucleic acid (DNA) sequence from next-generation sequencing (NGS) data. Computational resources can become a bottleneck for large-scale routine use. We propose a genome assembly framework for massively parallel, distributed-memory supercomputers. Our framework builds on the simple idea of distributing the reads equally across the processors. Each processor holds the whole reference genome, aligns its share of the short reads independently, and sends the reads back to the root processor together with their corresponding positions, so that the whole genome can be aligned. We ran our alignment framework on up to 8,196 processors of the IBM Blue Gene/Q "Avoca" at the Victorian Life Sciences Computation Initiative. The results show that more than 6 million reads totaling over 324 million nucleotides can be assembled in under 20 minutes, a task that previously required hours. Thus, our framework allows fast assembly of NGS data.
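The distribution scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the processors are simulated by a plain loop rather than real message passing, exact substring matching stands in for an actual short-read aligner, and the names `partition_reads`, `align_chunk`, and `assemble` are hypothetical.

```python
from typing import List, Tuple


def partition_reads(reads: List[str], num_procs: int) -> List[List[str]]:
    """Split the reads into num_procs contiguous chunks of nearly equal size."""
    base, extra = divmod(len(reads), num_procs)
    chunks = []
    start = 0
    for rank in range(num_procs):
        # The first `extra` ranks take one additional read each.
        size = base + (1 if rank < extra else 0)
        chunks.append(reads[start:start + size])
        start += size
    return chunks


def align_chunk(reference: str, chunk: List[str]) -> List[Tuple[str, int]]:
    """Work done by one simulated processor, which holds the whole reference.

    Assumption: exact matching via str.find is a placeholder for a real
    aligner; a position of -1 means the read was not found.
    """
    return [(read, reference.find(read)) for read in chunk]


def assemble(reference: str, reads: List[str], num_procs: int) -> List[Tuple[str, int]]:
    """Distribute the reads, align each chunk independently, gather at the root."""
    chunks = partition_reads(reads, num_procs)
    # Each iteration stands in for one processor working on its own chunk.
    per_rank = [align_chunk(reference, chunk) for chunk in chunks]
    # "Root" step: collect all (read, position) pairs and order them by position.
    gathered = [hit for hits in per_rank for hit in hits if hit[1] >= 0]
    return sorted(gathered, key=lambda hit: hit[1])


# Toy data, made up for illustration only.
reference = "ACGTTGCAAGGCTTACGATC"
reads = ["GGCTT", "ACGTT", "CGATC", "TGCAA", "TTACG"]
placed = assemble(reference, reads, num_procs=3)
for read, position in placed:
    print(position, read)
```

Because every simulated processor holds the full reference, the chunks need no communication until the final gather, which is what makes the scheme simple to scale.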
