Publication
IEEE/ACM TCBB
Paper

Linear time algorithms to construct populations fitting multiple constraint distributions at genomic scales

View publication

Abstract

Computer simulations can be used to study population genetic methods, models, and parameters, as well as to predict potential outcomes. For example, in plant populations, predicting the outcome of breeding operations can be studied using simulations. In-silico construction of populations with pre-specified characteristics is an important task in breeding optimization and other population genetic studies. We present two linear time Simulation using Best-fit Algorithms SimBA for two classes of problems where each co-fits two distributions: SimBA-LD fits linkage disequilibrium and minimum allele frequency distributions, while SimBA-hap fits founder-haplotype and polyploid allele dosage distributions. An incremental gap-filling version of previously introduced SimBA-LD is here demonstrated to accurately fit the target distributions, allowing efficient large scale simulations. SimBA-hap accuracy and efficiency is demonstrated by simulating tetraploid populations with varying numbers of founder haplotypes, we evaluate both a linear time greedy algoritm and an optimal solution based on mixed-integer programming. SimBA is available on http://researcher.watson.ibm.com/project/5669.

Date

01 Jul 2019

Publication

IEEE/ACM TCBB

Authors

Share