International Journal of Computational Biology and Drug Design

Breaking the computational barrier: A divide-conquer and aggregate based approach for Alu insertion site characterisation



Insertion site characterisation of Alu elements is an important problem in primate-specific bioinformatics research. This challenging problem has three key characteristics:
• The data do not come as pre-defined feature vectors from which a predictive model can be constructed.
• Without any prior knowledge, can we discover the general patterns that may exist and draw biological insights from them?
• How can we obtain compact yet discriminative patterns given a search space of 4^200?
This paper provides an integrated algorithmic framework for fulfilling the above mining tasks. Compared with the benchmark biological study, our results provide a more refined analysis of the patterns involved in Alu insertion. In particular, we acquire a 200 nt predictive profile around the primary insertion site which not only contains the widely accepted consensus, but also suggests a longer pattern, (T)7AA[G|A]AATAA. This pattern provides more insight into the favourable sequence variations allowed for preferred binding and cleavage by the L1 ORF2 endonuclease. The proposed method is general enough to be applied to other sequence detection problems, such as microRNA target prediction. © 2009 Inderscience Enterprises Ltd.
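As an illustration only, the reported pattern can be scanned for with a regular expression. The sketch below is not the paper's mining framework; it assumes that "(T)7" denotes a run of seven T's and that "[G|A]" denotes either G or A at that position. The names `ALU_SITE_PATTERN`, `find_candidate_sites`, and `SEARCH_SPACE`, as well as the example sequence, are hypothetical.

```python
import re

# Assumed reading of the pattern (T)7AA[G|A]AATAA:
# seven T's, then AA, then G or A, then AATAA.
# The lookahead wrapper lets overlapping matches be reported too.
ALU_SITE_PATTERN = re.compile(r"(?=(T{7}AA[GA]AATAA))")


def find_candidate_sites(sequence):
    """Return 0-based start offsets of every match of the assumed pattern."""
    return [m.start() for m in ALU_SITE_PATTERN.finditer(sequence.upper())]


# Size of the search space quoted in the abstract:
# 4 possible nucleotides at each of 200 positions.
SEARCH_SPACE = 4 ** 200

# Made-up sequence containing both variants of the pattern (G and A).
example = "GGC" + "TTTTTTTAAGAATAA" + "CCG" + "TTTTTTTAAAAATAA"
print(find_candidate_sites(example))
```

A plain regular-expression scan only locates exact occurrences of the final pattern. Discovering such a pattern in the first place, across a space of 4^200 candidate sequences, is the harder task that the paper's framework addresses.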