Nature Machine Intelligence

Unsupervised ensemble-based phenotyping enhances discoverability of genes related to left-ventricular morphology

Download paper


Recent genome-wide association studies have successfully identified associations between genetic variants and simple cardiac morphological parameters derived from cardiac magnetic resonance images. However, the emergence of large databases, including genetic data linked to cardiac magnetic resonance facilitates the investigation of more nuanced patterns of cardiac shape variability than those studied so far. Here we propose a framework for gene discovery coined unsupervised phenotype ensembles. The unsupervised phenotype ensemble builds a redundant yet highly expressive representation by pooling a set of phenotypes learnt in an unsupervised manner, using deep learning models trained with different hyperparameters. These phenotypes are then analysed via genome-wide association studies, retaining only highly confident and stable associations across the ensemble. We applied our approach to the UK Biobank database to extract geometric features of the left ventricle from image-derived three-dimensional meshes. We demonstrate that our approach greatly improves the discoverability of genes that influence left ventricle shape, identifying 49 loci with study-wide significance and 25 with suggestive significance. We argue that our approach would enable more extensive discovery of gene associations with image-derived phenotypes for other organs or image modalities.