Publication
Journal of Computational Biology
Paper

Gene proximity analysis across whole genomes via PQ trees

View publication

Abstract

Permutations on strings representing gene clusters on genomes have been studied earlier by Uno and Yagiura (2000), Heber and Stoye (2001), Bergeron et al. (2002), Eres et al. (2003), and Schmidt and Stoye (2004) and the idea of a maximal permutation pattern was introduced by Eres et al. (2003). In this paper, we present a new tool for representation and detection of gene clusters in multiple genomes, using PQ trees (Booth and Leuker, 1976): this describes the inner structure and the relations between clusters succinctly, aids in filtering meaningful from apparently meaningless clusters, and also gives a natural and meaningful way of visualizing complex clusters. We identify a minimal consensus PQ tree and prove that it is equivalent to a maximal π pattern (Eres et al., 2003) and each subgraph of the PQ tree corresponds to a nonmaximal permutation pattern. We present a general scheme to handle multiplicity in permutations and also give a linear time algorithm to construct the minimal consensus PQ tree. Further, we demonstrate the results on whole genome datasets. In our analysis of the whole genomes of human and rat, we found about 1.5 million common gene clusters but only about 500 minimal consensus PQ trees, with E. Coli K-12 and B. Subtilis genomes, we found only about 450 minimal consensus PQ trees out of about 15,000 gene clusters, and when comparing eight different Chloroplast genomes, we found only 77 minimal consensus PQ trees out of about 6,700 gene clusters. Further, we show specific instances of functionally related genes in two of the cases. © Mary Ann Liebert, Inc.

Date

Publication

Journal of Computational Biology

Authors

Topics

Share