Sequence-based protein domain boundary prediction using BP neural network with various property profiles

Lei Ye; Ting Liu; Zhaohui Wu; Ruhong Zhou

doi:10.1002/prot.21745

Proteins: Structure, Function and Genetics

11 Oct 2007

Abstract

Given the rapid growth in the number of sequences without known structures, it is becoming increasingly important not only to define protein structural domains accurately but also to predict domain boundaries from the amino-acid sequence alone. In this article, we present a Back-Propagation (BP) neural network method that uses 9 different sequence profiles, based on chemical, physical, and statistical properties, to predict the domain boundary of two-domain proteins from one-dimensional sequences. We achieved an accuracy of 69% with 10-fold cross-validation on a dataset of 238 nonredundant two-domain proteins that we built from a common set of both the SCOP and CATH classifications. The method was also applied to a larger third-party dataset of 522 proteins, where it achieved an accuracy of 62%. Our prediction results on both datasets are significantly better than those from some other methods, such as DomCut and DGS, on the same datasets, and are comparable to those from the PPRODO method, upon which the larger dataset was based. Our cross-validation results are also noticeably better than previous ones from other BP neural network methods, probably because we used more property descriptors with significantly more training nodes in our neural network. The integration with the PPRODO method also indicates that the information obtained from our current approach is complementary to that available through multiple sequence alignments. Moreover, the relative importance of each property profile has been analyzed in detail. © 2007 Wiley-Liss, Inc.
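
To make the idea of a sequence property profile concrete, the following is a minimal sketch of how one per-residue profile might be computed and turned into fixed-length input vectors for a neural network. It is not the authors' implementation. The abstract does not list the 9 profiles, so the Kyte-Doolittle hydropathy scale is used here purely as an illustrative stand-in, and the function names `property_profile` and `window_features`, the smoothing window, and the context width are all assumptions.

```python
import numpy as np

# Kyte-Doolittle hydropathy scale. Illustrative choice only: the abstract
# does not say which 9 property profiles the method uses.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def property_profile(sequence, scale, window=5):
    """Return a smoothed per-residue profile (hypothetical helper).

    Each residue is mapped to its scale value, then averaged over a
    centered window. `window` must be odd; sequence ends are padded by
    repeating the edge value so the output has one value per residue.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    values = np.array([scale[aa] for aa in sequence], dtype=float)
    half = window // 2
    padded = np.pad(values, half, mode="edge")
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")


def window_features(profiles, half_width=3):
    """Build one feature vector per residue (hypothetical helper).

    `profiles` has shape (n_profiles, length), or (length,) for a single
    profile. The vector for residue i concatenates every profile over
    positions i - half_width .. i + half_width, giving an array of shape
    (length, n_profiles * (2 * half_width + 1)).
    """
    profiles = np.atleast_2d(np.asarray(profiles, dtype=float))
    padded = np.pad(profiles, ((0, 0), (half_width, half_width)), mode="edge")
    span = 2 * half_width + 1
    length = profiles.shape[1]
    return np.stack([padded[:, i:i + span].ravel() for i in range(length)])


profile = property_profile("ACDEFGHIKL", KYTE_DOOLITTLE, window=5)
features = window_features(profile, half_width=3)
```

Under these assumptions, each row of `features` could serve as the input for one residue, labeled by whether that residue lies near a domain boundary. Stacking several profiles as rows before calling `window_features` would extend the sketch to multiple properties.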