Publication
Big Data 2015
Conference paper
Big Data: Cloud computing in genomics applications
Abstract
Healthcare applications typically require big data management as well as intensive computation. This is especially true of recently developed next-generation sequencing technology, which has increased interest in processing huge amounts of information in a timely fashion. In this paper, we test whether healthcare applications can scale well on commercial big data platforms that implement the MapReduce framework. From the genome analysis workloads, we selected short-read sequence alignment and assembly, and chose Bowtie, BLAST and Contrail-bio, which are publicly available applications designed to run on the Hadoop MapReduce framework. To speed up processing, we compressed the intermediate data using various compression schemes and compared those schemes. The test results are very promising and indicate that a wide range of genomic analysis workflows can be optimized on MapReduce frameworks with great computational efficiency and scalability.
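As a rough illustration of what comparing compression schemes on intermediate data can look like, the following minimal sketch measures compression ratio and compression time for three codecs from the Python standard library on synthetic short reads. It is not the paper's setup: the abstract does not name the schemes that were tested, the codecs here (zlib, bz2, lzma) are stand-ins, the helper names `make_reads` and `compare_codecs` are hypothetical, and the random reads are illustrative data rather than real sequencing output.

```python
import bz2
import lzma
import random
import time
import zlib


def make_reads(n_reads=2000, read_len=50, seed=0):
    """Generate synthetic short reads as newline-separated bytes (illustrative only)."""
    rng = random.Random(seed)
    reads = (
        "".join(rng.choice("ACGT") for _ in range(read_len))
        for _ in range(n_reads)
    )
    return "\n".join(reads).encode("ascii")


def compare_codecs(data):
    """Return {codec name: (compression ratio, compression seconds)}."""
    # Stand-in codecs from the standard library; assumed, not taken from the paper.
    codecs = {
        "zlib": (zlib.compress, zlib.decompress),
        "bz2": (bz2.compress, bz2.decompress),
        "lzma": (lzma.compress, lzma.decompress),
    }
    results = {}
    for name, (compress, decompress) in codecs.items():
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        # Round-trip check: each codec must reproduce the input exactly.
        assert decompress(packed) == data
        results[name] = (len(data) / len(packed), elapsed)
    return results


if __name__ == "__main__":
    data = make_reads()
    for name, (ratio, seconds) in compare_codecs(data).items():
        print(f"{name}: ratio {ratio:.2f}, {seconds * 1000:.1f} ms")
```

The trade-off such a comparison exposes is the usual one for intermediate data: a codec with a higher ratio reduces the bytes written and shuffled between map and reduce stages, but may cost more CPU time per record.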