Big Data: Cloud computing in genomics applications
Abstract
Healthcare applications typically require both big data management and intensive computation. This is especially true of recently developed next-generation sequencing technology, which has increased interest in processing huge amounts of information in a timely fashion. In this paper, we test whether such healthcare applications scale well on commercial big data platforms that implement the MapReduce framework. From genome analysis, we selected short-read sequence alignment and assembly workloads, and chose Bowtie, BLAST, and Contrail-bio, which are publicly available applications designed to run on the Hadoop MapReduce framework. To speed up processing, we compressed the intermediate data using various compression schemes and compared those schemes. The test results are promising and indicate that a wide range of genomic analysis workflows can be optimized on MapReduce frameworks with high computational efficiency and scalability.
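As a rough illustration of what comparing compression schemes on intermediate data can look like, the sketch below compresses a synthetic block of short reads with three codecs from the Python standard library and records the size ratio and compression time for each. The codecs (zlib, bz2, lzma), the randomly generated reads, and the function names `make_reads` and `compare_codecs` are assumptions made for this example; they are not the schemes, data, or tooling used in this paper.

```python
import bz2
import lzma
import random
import time
import zlib


def make_reads(n_reads=2000, read_len=50, seed=0):
    """Build a synthetic block of short reads (hypothetical stand-in data)."""
    rng = random.Random(seed)
    reads = [
        "".join(rng.choice("ACGT") for _ in range(read_len))
        for _ in range(n_reads)
    ]
    return "\n".join(reads).encode("ascii")


def compare_codecs(data):
    """Compress `data` with each codec; return size ratio and elapsed time."""
    codecs = {
        "zlib": (zlib.compress, zlib.decompress),
        "bz2": (bz2.compress, bz2.decompress),
        "lzma": (lzma.compress, lzma.decompress),
    }
    results = {}
    for name, (compress, decompress) in codecs.items():
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        # Round-trip check: the codec must be lossless on this data.
        assert decompress(packed) == data
        results[name] = {
            "ratio": len(packed) / len(data),  # smaller is better
            "seconds": elapsed,
        }
    return results


if __name__ == "__main__":
    for name, stats in compare_codecs(make_reads()).items():
        print(f"{name}: ratio={stats['ratio']:.3f}, time={stats['seconds']:.4f}s")
```

In a MapReduce setting, the trade-off this measures is between the CPU time spent compressing and the reduction in data that has to be written and shuffled between the map and reduce phases; which codec comes out ahead depends on the workload and cluster.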