Publication
ISMB 2022
Poster
Discovering Proteins: Function to Name
Abstract
Currently, approximately half of all microbial proteins are tagged as putative or hypothetical and lack functional annotation. This reduces our understanding of biological function at the genome level and limits the classification of microorganisms, especially pathogens. Here, we develop an approach to functionally annotate hypothetical proteins, drawing on over 50 million named proteins and 27K functional codes (InterProScan domain codes). We train three separate models with Kraken to perform functional annotation at the domain, family, and superfamily levels. Furthermore, we construct a functional space to visualize these proteins and biologically validate the results, while also enabling the discovery of potentially new proteins and their functions. Most interestingly, this high-dimensional functional space will facilitate the shift from genotype to phenotype for named proteins. Leveraging this space, we identify function-based clusters; if new clusters form as the annotation of hypothetical proteins improves, they may reveal evolutionary paths shared with known proteins. Our work uses data from our Functional Genomics Platform, which holds over 300K prokaryotic genomes, 75 million gene sequences, 55 million protein sequences, and over 260 million functional domains.
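To make the idea of function-based clusters concrete, here is a minimal sketch in Python. It is an illustration only and not the method used in the poster: it assumes each protein can be described by the set of functional domain codes assigned to it, and it groups proteins by Jaccard similarity with single-linkage clustering. The protein names, the codes (`D001` and so on), the `jaccard` and `cluster` helpers, and the 0.5 threshold are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: protein name -> set of functional domain codes.
# The names and codes are placeholders, not real annotations.
annotations = {
    "named_protein_A": {"D001", "D002"},
    "named_protein_B": {"D001", "D002", "D003"},
    "named_protein_C": {"D010"},
    "hypothetical_1": {"D002", "D003"},
    "hypothetical_2": {"D010", "D011"},
}


def jaccard(a, b):
    """Jaccard similarity between two sets of domain codes."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(annotations, threshold=0.5):
    """Single-linkage clustering: link proteins with similarity >= threshold."""
    parent = {name: name for name in annotations}

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for p, q in combinations(annotations, 2):
        if jaccard(annotations[p], annotations[q]) >= threshold:
            parent[find(p)] = find(q)

    groups = {}
    for name in annotations:
        groups.setdefault(find(name), set()).add(name)
    # Sort clusters by their member names for a stable order.
    return sorted(groups.values(), key=lambda g: sorted(g))


clusters = cluster(annotations)
for members in clusters:
    print(sorted(members))
```

In this toy data, each hypothetical protein lands in a cluster with named proteins that share its domain codes, which is the kind of grouping that could suggest a candidate function. A real functional space over millions of proteins would need a scalable representation and clustering method rather than all-pairs comparison.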