Building Dictionaries Of 1D and 3D Motifs By Mining The Unaligned 1D Sequences Of 17 Archaeal and Bacterial Genomes

Isidore Rigoutsos; Yuan Gao; Aris Floratos; Laxmi Parida

ISMB 1999

Conference paper

06 Aug 1999

Abstract

We have used the TEIRESIAS algorithm to carry out unsupervised pattern discovery in a database containing the unaligned ORFs from the 17 publicly available complete archaeal and bacterial genomes, and to build a 1D dictionary of motifs. These motifs, which we refer to as seqlets, cover 97.88% of this genomic input at the level of amino acid positions. Each seqlet in this 1D dictionary was located among the sequences in Release 38.0 of the Protein Data Bank, and the structural fragments corresponding to each seqlet's instances were identified and aligned in three dimensions. Seqlets whose alignments yielded RMSD errors below a pre-selected threshold of 2.5 Angstroms were entered in a 3D dictionary of structurally conserved seqlets. These two dictionaries can be thought of as cross-indices that facilitate tasks such as automated functional annotation of genomic sequences, local homology identification, local structure characterization, and comparative genomics.
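To make the notion of coverage "at the level of amino acid positions" concrete, the following is a minimal sketch, not the authors' implementation. It assumes seqlets can be written as regular expressions in which `.` stands for a don't-care position; the function name `seqlet_coverage` and the toy sequences are hypothetical, and TEIRESIAS itself (the discovery step) is not reproduced here.

```python
import re


def seqlet_coverage(sequences, seqlets):
    """Return the fraction of residue positions that fall inside at least
    one instance of at least one seqlet (hypothetical helper)."""
    # Wrapping each pattern in a lookahead lets overlapping instances of
    # the same seqlet all be reported by finditer.
    patterns = [re.compile(f"(?=({s}))") for s in seqlets]
    covered_total = 0
    length_total = 0
    for seq in sequences:
        covered = [False] * len(seq)
        for pat in patterns:
            for m in pat.finditer(seq):
                # Group 1 holds the actual instance matched by the seqlet.
                for i in range(m.start(1), m.end(1)):
                    covered[i] = True
        covered_total += sum(covered)
        length_total += len(seq)
    return covered_total / length_total if length_total else 0.0


if __name__ == "__main__":
    # Toy input: two short made-up sequences and one made-up seqlet.
    toy_sequences = ["MKVLAAG", "GGKVLA"]
    toy_seqlets = ["KV.A"]
    print(f"{seqlet_coverage(toy_sequences, toy_seqlets):.4f}")
```

A position is counted once no matter how many seqlet instances span it, which is one plausible reading of position-level coverage; the paper's exact accounting may differ.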
