Host Trait Prediction of Metagenomic Data for Topology-Based Visualization
Abstract
Microbiome and metagenomic research continues to grow, as do the size and complexity of the data collected. The microbiome is also understood to have a complex relationship with the environment or host it inhabits, as in gastrointestinal disease. The goal of this study is to accurately predict a host trait from metagenomic data alone by training a statistical model on available metagenome sequencing data. We compare a traditional Support Vector Regression approach with a new non-parametric method developed here, called PKEM, which combines dimensionality reduction with Kernel Density Estimation. The results are visualized using methods from Topological Data Analysis. Such representations help reveal how the data are organized and can lead to new insights. We apply this visualization-of-prediction technique to cat, dog, and human microbiomes obtained from fecal samples. For the cat and dog data the host trait is irritable bowel syndrome, while for the human data it is Kwashiorkor, a form of severe malnutrition.