HMSC: A Hybrid Metagenomic Sequence Classification Algorithm

Subrata Saha; Zigeng Wang; Sanguthevar Rajasekaran

doi:10.1145/3388440.3412468

BCB 2020

Conference paper

21 Sep 2020

Abstract

The widespread availability of next-generation sequencing (NGS) technologies has prompted a recent surge of interest in the microbiome. As a consequence, metagenomics is a fast-growing field in bioinformatics and computational biology. An important problem in analyzing metagenomic sequencing data is to identify the microbes present in a sample and estimate their relative abundances. Genome databases such as RefSeq and GenBank provide a growing resource for characterizing metagenomic datasets. However, both the size of these databases and the high degree of sequence homology that can exist between related genomes make accurate analysis of metagenomic reads computationally challenging. In this article, we propose a highly efficient algorithm, dubbed "Hybrid Metagenomic Sequence Classifier" (HMSC), to accurately detect microbes and their relative abundances in a metagenomic sample. Its algorithmic approach is fundamentally different from that of other state-of-the-art algorithms in this domain: HMSC judiciously exploits both alignment-free and alignment-based approaches to characterize metagenomic sequencing data accurately. Rigorous experimental evaluations on both real and synthetic datasets show that HMSC is an effective, scalable, and efficient algorithm compared with other state-of-the-art methods in terms of accuracy, memory, and runtime.