ISMB 2022

RubricOE: What Machine Learning Can Say about Alzheimer’s Disease


Alzheimer’s disease (AD) is notable for substantial heritability that is not accounted for by single nucleotide polymorphism (SNP) alleles. A small handful of SNPs strongly linked with the APOE E4 allele show very strong odds ratios for AD. Among AD patients, early-onset (EOAD) and late-onset (LOAD) cases differ in their effects on memory and language versus motor and atypical AD symptoms, and EOAD additionally shows familial heritability. Genetic association studies generally do not account for this heritability. This study asks what machine learning (ML) approaches can reveal about pathogenic processes through the alleles they identify. We have constructed an ML pipeline, which we call RubricOE, built on linear-kernel support vector machines (SVMs). RubricOE ranks features by heritability, estimated as the variance predicted by linear ridge regression, and uses multiple layers of cross-validation to identify “stable sets”: the most strongly predictive features that remain consistent across all training/test and validation splits. We evaluate these features with logistic regression that characterizes expected sampling-induced variability, in order to relate the cross-validation stable sets to genome-wide association study (GWAS) confidence levels and to identify novel ML-identified features that GWAS-like methods miss.
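The stable-set idea can be illustrated with a minimal NumPy sketch. This is not the RubricOE implementation, and it rests on several assumptions. It reads "heritability estimated by variance predicted by linear ridge regression" as the variance of each SNP's contribution to a closed-form ridge prediction, β_j² · Var(x_j). It uses a single k-fold layer rather than RubricOE's multiple cross-validation layers. It also omits the linear SVM classifier and the logistic-regression evaluation. The function names `ridge_variance_scores` and `stable_set`, the `top_k` cutoff, and the synthetic 0/1/2 genotype data are all hypothetical.

```python
import numpy as np


def ridge_variance_scores(X, y, alpha=1.0):
    """Score each feature by the variance of its ridge-predicted contribution.

    ASSUMPTION (illustrative only): ridge-based heritability is read here as
    beta_j**2 * Var(x_j), where beta are closed-form ridge coefficients.
    """
    Xc = X - X.mean(axis=0)  # center genotype columns
    yc = y - y.mean()        # center phenotype (0/1 labels -> linear probability model)
    p = X.shape[1]
    # Closed-form ridge solution: (Xc^T Xc + alpha * I)^-1 Xc^T yc
    beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(p), Xc.T @ yc)
    # Variance of each feature's additive contribution to the prediction
    return (beta ** 2) * Xc.var(axis=0)


def stable_set(X, y, k_folds=5, top_k=10, alpha=1.0, seed=0):
    """Hypothetical helper: features ranked in the top_k on EVERY training split."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k_folds)
    stable = None
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False  # hold out this fold
        scores = ridge_variance_scores(X[train], y[train], alpha)
        top = set(np.argsort(scores)[::-1][:top_k].tolist())
        # Keep only features that stay in the top_k across all splits so far
        stable = top if stable is None else stable & top
    return sorted(stable)


# Synthetic example (hypothetical data): 400 samples x 50 SNPs coded 0/1/2,
# with case status driven by SNPs 3 and 7.
rng = np.random.default_rng(42)
X = rng.integers(0, 3, size=(400, 50)).astype(float)
logit = 1.5 * (X[:, 3] - 1) + 1.5 * (X[:, 7] - 1)
y = (rng.random(400) < 1 / (1 + np.exp(-logit))).astype(float)
print(stable_set(X, y, k_folds=5, top_k=5))
```

Intersecting the per-fold top-ranked sets, rather than averaging scores across folds, keeps only features that no single training split can dislodge. This mirrors the "consistent across all splits" criterion in the abstract. A fuller sketch would wrap this loop in an outer validation layer and fit a linear SVM on the surviving features.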