Publication
BCB 2016
Conference paper
Scalable algorithms at genomic resolution to fit LD distributions
Abstract
Reconstructing a population that matches a given LD (linkage disequilibrium) distribution is not straightforward, and the problem is compounded if the population must also match a MAF (minor allele frequency) distribution. Here we address the task of co-fitting multiple distributions at genomic resolution. The solution is based on incrementally scaling a fast, i.e., linear-time, non-generative algorithm (SimBA). Non-generative means that the algorithm does not generate the population through evolution simulation. Instead, it directly builds the genomes in terms of polymorphic alleles that mimic the structure of the desired population. We present an incremental framework for scaling up the algorithm that remains both accurate and efficient. We demonstrate the efficacy of the algorithm on a variety of data sets, covering both human and plant data. Such simulation of populations that match summary distributions plays a critical role in in-silico hypothesis testing and optimization. For instance, in-silico breeding optimization in plants can model years or decades of experimentation and predict breeding outcomes in days, if not hours or minutes.
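To make the two summary statistics named in the abstract concrete, the following minimal sketch computes a per-locus MAF (the frequency of the less common allele) and a pairwise LD value for a small set of haplotypes. It is not SimBA and is not taken from the paper: the function names, the 0/1 haplotype encoding, and the choice of the squared correlation r^2 as the LD measure are all assumptions made for illustration.

```python
# Illustrative sketch only: names, encoding, and the r^2 LD measure are
# assumptions, not the paper's method.
# Input: a list of haplotypes, each a list of 0/1 alleles (one entry per locus).

def minor_allele_frequencies(haplotypes):
    """Return the minor allele frequency at each locus."""
    n = len(haplotypes)
    n_loci = len(haplotypes[0])
    mafs = []
    for j in range(n_loci):
        freq = sum(row[j] for row in haplotypes) / n  # frequency of allele 1
        mafs.append(min(freq, 1.0 - freq))            # keep the rarer allele
    return mafs


def ld_r2(haplotypes, i, j):
    """Return the squared correlation r^2 between loci i and j."""
    n = len(haplotypes)
    p_a = sum(row[i] for row in haplotypes) / n
    p_b = sum(row[j] for row in haplotypes) / n
    p_ab = sum(row[i] * row[j] for row in haplotypes) / n
    d = p_ab - p_a * p_b                               # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a locus is monomorphic; r^2 is undefined, 0 is a convention chosen here
    return d * d / denom


# Four toy haplotypes over three loci (made-up data).
toy = [
    [0, 0, 1],
    [0, 0, 1],
    [1, 1, 0],
    [1, 1, 1],
]
mafs = minor_allele_frequencies(toy)
r2_linked = ld_r2(toy, 0, 1)   # loci 0 and 1 always carry the same allele
r2_partial = ld_r2(toy, 0, 2)  # loci 0 and 2 are only partly associated
```

A simulated population would be judged by how closely the distribution of such MAF values over all loci, and of such LD values over locus pairs, matches the target distributions.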