Jihun Yun, Aurelie Lozano, et al.
NeurIPS 2021
Computational cell type annotation is an essential task for efficiently understanding single-cell sequencing data. Advances in machine learning have made accurate cell type annotation possible by learning cell-type classification models. Transcriptomic foundation models, which are trained on large public sequencing data with a variant of BERT, have been developed to learn gene expression representations. Such models achieve high annotation performance because the learned representation retains essential information from large genetic repositories. Subsequent re-training then allows a relatively small dataset with cell labels to be used to predict cell types effectively in unannotated datasets. This study provides a cell annotation pipeline for Inflammatory Bowel Disease (IBD) using a Biomedical Foundation Model (BMFM) based on scBERT (a variant of BERT). A model is first pre-trained on large-scale transcriptomic data and then re-trained with a single IBD dataset to predict its cell types. The re-trained model can subsequently be used to annotate new IBD transcriptomic datasets. Using this pipeline, we examined how well cell types are annotated. Starting from a model pre-trained on Panglao DB (1 million cells in various conditions), we re-trained it on the cell types of one IBD dataset, SCP259 (360K cells), and used the re-trained model to annotate another IBD dataset, SCP1884 (700K cells), which gives a cell-type mapping from SCP259 to SCP1884. Examination of the mapping found high concordance: 69% (47 of 68 SCP1884 cell types) were appropriately predicted as the corresponding cell type across three lineages (epithelial, immune, and stromal). Our results indicate that the BMFM-based annotation approach helps characterize the wide variation among IBD cells.
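The cell-type mapping and concordance check described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the majority-vote rule, the function names `majority_mapping` and `count_concordant`, the hand-curated `acceptable` table, and all labels in the toy data are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict


def majority_mapping(target_labels, predicted_labels):
    """Map each target-dataset cell type to its most frequently predicted label.

    Assumption: one original label and one predicted label per cell,
    given as two parallel sequences.
    """
    if len(target_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")

    # Count, for every original cell type, how often each label was predicted.
    counts = defaultdict(Counter)
    for original, predicted in zip(target_labels, predicted_labels):
        counts[original][predicted] += 1

    mapping = {}
    for original, counter in counts.items():
        label, n = counter.most_common(1)[0]
        # Store the winning label and the fraction of cells that received it.
        mapping[original] = (label, n / sum(counter.values()))
    return mapping


def count_concordant(mapping, acceptable):
    """Return (number of concordant types, total number of types).

    Assumption: a type is concordant when its majority-predicted label is in
    a hand-curated set of acceptable labels for that type.
    """
    hits = sum(
        1
        for original, (label, _) in mapping.items()
        if label in acceptable.get(original, set())
    )
    return hits, len(mapping)


# Toy data with hypothetical labels (not taken from SCP259 or SCP1884).
target = ["T cell A"] * 3 + ["Fibro X"] * 2 + ["Entero 1"] * 3
predicted = [
    "T cells", "T cells", "B cells",
    "Fibroblasts", "Fibroblasts",
    "Enterocytes", "Enterocytes", "T cells",
]
acceptable = {
    "T cell A": {"T cells"},
    "Fibro X": {"Fibroblasts"},
    "Entero 1": {"Goblet cells"},  # deliberately mismatched for the example
}

mapping = majority_mapping(target, predicted)
concordant = count_concordant(mapping, acceptable)
```

In this sketch `concordant` holds a pair of counts, analogous in form to the reported 47 of 68 cell types, although the criterion used in the study may differ from the majority-vote rule assumed here.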
Uri Kartoun, Kingsley Njoku, et al.
AMIA ... Annual Symposium proceedings. AMIA Symposium
Ge Gao, Xi Yang, et al.
AAAI 2024
Imran Nasim, Michael E. Henderson
Mathematics