Clinical Genomic Analysis Workshop
March 2, 2010
Organized by IBM Research - Haifa
Abstracts
Postgenomic Era and P4 Medicine: Integrative Systems Biology Approaches
Pierre Baldi, Director, Institute for Genomics and Bioinformatics, University of California, Irvine
We will first provide a brief historical overview of genomic and P4 (Personalized, Predictive, Preventive, Participatory) medicine and some of its current computational challenges and opportunities. We will then present recent results from our group in two areas: (1) charting regulatory networks and analyzing GWAS (genome-wide association studies) and (2) drug discovery. We will show how Bayesian methods can be used to better assess evolutionary conservation and to build the most accurate genome-wide maps of regulatory elements. These maps can in turn be used to study gene regulatory circuits and, for instance, to map SNPs from GWAS to regulatory elements and derive regulatory hypotheses. For the drug discovery problem, we will demonstrate how chemoinformatics and other computational methods can be developed and applied to identify useful drug leads against tuberculosis. Finally, if time permits, we will also address issues of genome storage and privacy.
Mapping by Admixture Aberration Analysis Using Pooled DNA
Sivan Bercovici, Technion
Admixture mapping is a gene-mapping approach for identifying genomic regions that harbor disease-susceptibility genes in recently admixed populations, such as African Americans. We present a novel admixture-mapping method, admixture aberration analysis (AAA), that uses a DNA pool of affected admixed individuals. We demonstrate through simulations that AAA is a powerful and economical mapping method under a range of scenarios modeling complex human diseases such as hypertension and end-stage kidney disease. The method has a low false-positive rate and is robust to deviations from model assumptions. Simulation results indicate that the method can reduce genotyping by over 96%. Finally, we apply AAA to real admixture-mapping data from African Americans, replicating a known risk locus.
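The abstract does not spell out the AAA statistic. To illustrate only the underlying idea, the hypothetical sketch below makes three assumptions. First, a pool's allele frequency at each marker is modeled as a mixture of two ancestral-population frequencies, θ·p_A + (1 − θ)·p_B. Second, the local ancestry fraction θ in each genomic window is estimated by least squares. Third, windows whose θ departs from the average over all windows are flagged with a plain z-score. The function names, the window format and the z-score are all illustrative choices, not the published method.

```python
from statistics import mean, pstdev


def local_ancestry(pool_freqs, freqs_a, freqs_b):
    """Least-squares estimate of the population-A ancestry fraction theta in one window.

    Hypothetical model (not the published AAA statistic): at each marker,
    pooled frequency = theta * p_A + (1 - theta) * p_B.
    """
    num = 0.0
    den = 0.0
    for f, pa, pb in zip(pool_freqs, freqs_a, freqs_b):
        d = pa - pb  # frequency difference: how ancestry-informative the marker is
        num += d * (f - pb)
        den += d * d
    if den == 0:
        raise ValueError("no ancestry-informative markers in this window")
    return min(1.0, max(0.0, num / den))  # clip to a valid proportion


def aberration_scores(windows):
    """z-score of each window's theta relative to the mean over all windows."""
    thetas = [local_ancestry(*w) for w in windows]
    mu, sd = mean(thetas), pstdev(thetas)
    return [(t - mu) / sd if sd > 0 else 0.0 for t in thetas]
```

Each window is a tuple `(pool_freqs, freqs_a, freqs_b)`. In this sketch, a large positive score marks a window where the pool of affected individuals carries excess population-A ancestry compared with the rest of the genome.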
Enhancing Genetic Association Studies by Reordering Relevant SNPs
Hani Neuvirth-Telem, IBM
Hundreds of recent genetic association studies provide an opportunity to derive general principles about the association potential of SNPs and to use them to improve new GWAS. Several features of SNPs have been identified as significantly overrepresented among trait-associated SNPs (TASs), based on published data for 19 diseases (including type-1 and type-2 diabetes and multiple sclerosis). A regression model based on these features and the published data was constructed. For each SNP in a given SNP panel, the model provides the probability that the SNP is relevant to the trait under study, before the study's SNP data are examined. These probabilities are used as weights that raise or lower the p-value threshold applied to each SNP under standard association methods. For 18 of the 19 diseases, the computed weights promoted SNPs judged in previous studies to be associated with the disease. For the remaining disease, these SNPs were left practically unchanged. Repeating the analysis of a real type-2 diabetes study increased the number of relevant SNPs found at every sample size examined.
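The abstract does not give the exact mapping from probabilities to thresholds. One standard way to let prior information shift per-SNP p-value thresholds is a weighted Bonferroni rule, used here as an assumed stand-in. Each SNP i gets a weight w_i, the weights average 1, and SNP i is declared significant when p_i ≤ α·w_i/m. Rescaling the model's probabilities so they average 1 is a hypothetical choice, and it keeps the family-wise error rate at α. Both function names are illustrative.

```python
def weights_from_priors(priors):
    """Rescale prior relevance probabilities so the weights average 1.

    Hypothetical mapping: because the weights sum to m, the thresholds
    alpha * w_i / m sum to alpha, so the family-wise error rate stays at alpha.
    """
    total = sum(priors)
    m = len(priors)
    return [m * p / total for p in priors]


def weighted_bonferroni(pvalues, weights, alpha=0.05):
    """Indices of SNPs whose p-value passes its own threshold alpha * w / m."""
    m = len(pvalues)
    return [i for i, (p, w) in enumerate(zip(pvalues, weights)) if p <= alpha * w / m]
```

With all weights equal to 1, this reduces to the ordinary Bonferroni threshold α/m. A SNP with a high prior probability can then pass at a p-value that would otherwise fail, and a low-prior SNP needs stronger evidence.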
Exploiting Population Diversity in the Analysis of Human Genetic Variation
Eran Halperin, Tel Aviv University
Population substructure is a known obstacle in the analysis of genetic variation. In genome-wide association studies in particular, population diversity may lead to false discoveries of association and must be accounted for in the analysis. In some cases, however, population diversity can be used to improve the analysis of genetic variation. This applies both to disease association and to the estimation of population-genetics parameters, such as selection forces, and the detection of recombination hotspots. In this talk, I will discuss the benefits of using a heterogeneous set of populations in the analysis of genetic data, mainly in the context of disease association.
From Ivory Tower to Hospital - Bringing Clinical Genomic Research to the Medical Practice
A panel discussion with Jacques Beckmann, Mordechai Muszkat and Fabio Macciardi
Jacques Beckmann
In October 2002, Jacques S. Beckmann was appointed Professor of Human Genetics and Director of the Department of Medical Genetics at the Faculty of Biology and Medicine of the University of Lausanne. He was also appointed head of the Medical Genetics Service of the Centre Hospitalier Universitaire Vaudois (CHUV). Previously, he held a chair as Full Professor in the Department of Molecular Genetics at the Weizmann Institute of Science in Rehovot, Israel. Initially trained in molecular genetics, he later moved to genetics. In the 1980s, Prof. Beckmann and Prof. M. Soller of the Hebrew University pioneered the use of marker-assisted genetic improvement in plants and animals, focusing on quantitative trait loci (QTLs). In 1990, his interest shifted to human genetics and he moved to Paris. There he held successive senior research positions at the CEPH, Généthon (Evry), and finally the Centre National de Génotypage (CNG, Evry), where he was Deputy Director. During those years, he collaborated with Prof. D. Cohen, J. Weissenbach, M. Lathrop, J. Dausset, and others. He contributed significantly to the construction of genetic, physical, and gene maps of the human genome and to the positional cloning of a number of disease loci, many of them involved in muscular dystrophy. Prof. Beckmann has published more than 290 peer-reviewed scientific articles in molecular genetics, genetics, and genomics and has an ISI H-index of 65. He has served on the editorial boards of a number of scientific journals and has been a board member of the ESHG, ENMC, Italian Telethon, and HGVS committees. His recent research interests also include genomic disorders, pharmacogenetics, and the genetic basis of complex traits.
Mordechai Muszkat
Dr. Mordechai Muszkat is a physician in the Division of Clinical Pharmacology, Department of Medicine, Hadassah Medical Center, and a senior lecturer at the Hadassah Hebrew University School of Medicine. Dr. Muszkat trained in internal medicine and clinical pharmacology at the Hadassah University Hospital. He then carried out pharmacogenomic research in the Division of Clinical Pharmacology at Vanderbilt University's School of Medicine with Professors Wood and Stein. Dr. Muszkat combines clinical work in the Internal Medicine Department with pharmacogenetic/genomic research. He has published more than 40 research articles and is a reviewer for leading clinical pharmacology journals.
His areas of pharmacogenetic/genomic research interest include the genetic variability in alpha2 and beta1 adrenergic receptors (AR) and the genetic determinants of warfarin metabolism and anticoagulant effect. Dr. Muszkat's work on genetic variation in alpha2 and beta1 ARs is focused on their in vivo functional effects on hemodynamic responses to selective adrenergic agonists. These translational studies act as a bridge between in vitro, cell-based data and clinical outcomes. Dr. Muszkat's work on genetic determinants of warfarin response is aimed at the translation of pharmacogenomic data to clinical practice. In a pioneering prospective clinical study (with Prof. Caraco) that compared a genotype-based warfarin dosing algorithm to a standard, clinically validated protocol, the genotype-based dosing algorithm resulted in superior efficacy and safety of warfarin treatment. Currently, prospective clinical studies that examine the clinical efficacy of genetically based algorithms that combine CYP2C9 and VKORC1 haplotypes, as well as other genetic variation, are underway.
What Can Be Learned from a Large Clinical Cohort
Sven Bergmann, University of Lausanne
The Cohorte Lausannoise (CoLaus) is a random population sample of more than 6,000 individuals. They were genotyped using Affymetrix 500k SNP arrays, and a large number of clinically relevant parameters have been measured for them. Comparing the countries of origin of these individuals with the projection of their genotypic profiles onto the principal components of the entire genotypic dataset revealed an astonishingly close correspondence between genetic and geographic distances. Indeed, a geographical map of Europe arises naturally as an efficient two-dimensional summary of genetic variation in Europeans. Whole-genome association studies for height, body mass index, serum lipid and calcium concentrations, blood pressure, and other clinical phenotypes were run using classical scans that test one SNP at a time. They identified many loci with highly significant associations, which are promising candidates for unraveling mechanisms of action. Yet, as in many other studies, together these variants explain only a small fraction of the phenotypic variance. This indicates that we still lack a comprehensive picture of the following: (a) What are the causal variants? (b) What effects are attributable to rare variants and/or copy number variations? (c) What fraction of the variance can be explained by SNP-SNP or SNP-environment interactions? (d) What are the intrinsic limitations of currently used algorithms in dealing with very large sets of genotypic and phenotypic data that are partially incomplete or noisy? I will outline our research addressing these challenges.
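The principal-component projection described above can be sketched with the Python standard library alone. This is a minimal illustration under several assumptions. The genotype matrix is complete, with one row per individual and one column per SNP, coded 0/1/2. Power iteration with deflation stands in for the SVD routines that genome-scale analyses rely on. This is not the CoLaus pipeline, and the function names are hypothetical.

```python
import random


def center_columns(genotypes):
    """Subtract each SNP's mean genotype (0/1/2 coding) across individuals."""
    n, p = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in genotypes]


def top_eigenvectors(matrix, k, iterations=500, seed=0):
    """Power iteration with deflation on a symmetric matrix; returns (eigenvalue, vector) pairs."""
    rng = random.Random(seed)
    size = len(matrix)
    a = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        v = [rng.random() - 0.5 for _ in range(size)]
        for _ in range(iterations):
            w = [sum(a[i][j] * v[j] for j in range(size)) for i in range(size)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(size)) for i in range(size))
        pairs.append((lam, v))
        # Deflate: remove the component just found so the next pass finds the next one.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(size)] for i in range(size)]
    return pairs


def pca_scores(genotypes, k=2):
    """Project each individual onto the top-k principal components of the genotype matrix."""
    x = center_columns(genotypes)
    p = len(x[0])
    cov = [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]
    loadings = [v for _, v in top_eigenvectors(cov, k)]
    return [[sum(row[j] * v[j] for j in range(p)) for v in loadings] for row in x]
```

Plotting the two coordinates returned for each individual gives the kind of genetic map the abstract compares with country of origin. The same projections are also commonly included as covariates to correct association tests for population structure. The signs of the components are arbitrary.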
Prediction using GWA Data: Utilising Pathway Analysis and Shrinkage Regression - Case-study on 3 Inflammatory Diseases
Clive Hoggart, Imperial College
Although genome-wide association studies have identified many SNPs associated with common diseases, only a small proportion of the predicted genetic contribution has so far been elucidated. We demonstrate the utility of a dual approach to extract additional information from genome-wide data. We first apply a pathway-based approach, which utilises a novel test for association between disease and pathways known to be involved in immune response. The most informative SNPs in the selected pathways are then identified using shrinkage regression implemented in the HyperLasso software. The method is applied to three autoimmune diseases genotyped by the WTCCC: Crohn's disease, rheumatoid arthritis and type 1 diabetes. The results are validated in two independent datasets.
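The abstract names HyperLasso but not its penalty. To illustrate what shrinkage regression does, the sketch below substitutes the ordinary lasso (an L1 penalty) on a quantitative trait, fitted by cyclic coordinate descent. This is a swapped-in technique, not HyperLasso's model. It assumes centred genotype columns and a centred phenotype, so no intercept is fitted. The function names are hypothetical.

```python
def soft_threshold(value, penalty):
    """Shrink value towards zero by penalty, returning exactly 0 inside [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(x, y, lam, sweeps=200):
    """Minimise (1/2n)*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.

    Assumes the columns of x and the vector y are already centred (no intercept).
    """
    n, p = len(x), len(x[0])
    beta = [0.0] * p
    residual = list(y)
    col_sq = [sum(row[j] ** 2 for row in x) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Correlation of SNP j with the residual after adding its own contribution back.
            rho = sum(x[i][j] * (residual[i] + x[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta:
                for i in range(n):
                    residual[i] -= x[i][j] * delta
                beta[j] = new
    return beta
```

Increasing `lam` drives more coefficients exactly to zero. This is what lets shrinkage methods select a small set of informative SNPs out of many candidates.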
The HIV Cohort Data Study: Selection and Assessment of SNP-drug Interactions Affecting Lipid Responses of HIV Patients to Antiretroviral Treatment
Diana Marek, University of Lausanne and Swiss Institute of Bioinformatics
Pharmacogenetics is the study of how genetic variation affects individual responses to drug therapy. It has long been known that genetic variants influence the risk of developing certain diseases or determine certain traits. Genetic research continues to generate massive amounts of data. Combined with phenotypic traits, these data allow the detection of genetic differences (SNPs) that explain phenotypic variation within a population (e.g., genome-wide association studies). Together, however, these variants explain only a small fraction of the phenotypic variance, leaving room for other, missing factors. One of them, SNP–environment interactions and their impact, remains difficult to assess in large-scale data analysis. At a different level, a challenging goal of pharmacogenetics is to understand why some HIV patients treated with antiretroviral therapy develop adverse side effects, such as dyslipidemia. In this work, we tried to explain differential drug responses using genetic variants, drugs, and SNP-drug interactions. We analyzed a dataset from the Swiss HIV cohort containing 12,170 lipid-level measurements from 752 patients who underwent different treatments (up to 5 of 16 drugs). For these patients we also had genotypes for 58 SNPs in genes involved in lipid transport and metabolism (22 SNPs associated with triglyceride (TG) variation, 20 with HDL, and 14 with non-HDL). Using a linear regression model on the TG response, we tested each single SNP and each single drug (adjusted for the major confounding factors) and kept only those significant at the 5% level. 10 of the 22 SNPs and 11 of the 16 drugs were selected. We then extended our model to also include SNP-drug interactions.
To avoid overfitting, we used two methods to select only a fraction of all 110 possible interactions. (1) A naïve approach, in which each SNP-drug interaction term was evaluated (on top of the SNP and drug effects) and included in the full model if significant at the 5% level. (2) A stepwise procedure, starting with the SNP and drug model and adding one SNP-drug interaction term at a time based on the minimization of the F-statistic. The final model contained 21 linear terms and 23 interactions. To assess its performance, we used R² and ROC analysis (random vs. stratified partitioning of the measurements) for predicting elevated lipid levels. We compared these measures with those of control models that included the same number of interactions, picked at random. R² and AUC were compared both in sample (fitting) and out of sample (k-fold cross-validation). We find that adding the relevant SNP-drug interactions significantly improves the fit to the data compared with models with no interactions or with randomly selected interactions. With random k-fold cross-validation, the increase in predictive power of the final model is about 15% R², whereas it reaches 10% under stratified cross-validation. Under either cross-validation approach, a relative increase of 3% R² is observed between the selected model and the control models. This confirms the mild but significant effect of the selected SNP-drug interactions in explaining the variation observed in triglycerides. These results show that a fraction of the SNPs and drugs explain a non-negligible part of the triglyceride fluctuations observed in the HIV-treated population. More interestingly, some of their interactions also seem to play a role. That role is quantified here as relatively small, most likely because of missing factors and the size and potential noise of the dataset.
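The abstract compares linear models with and without SNP-drug interaction terms, using in-sample and cross-validated R². The minimal sketch below does this for a single SNP and a single drug. Its choices are hypothetical: additive 0/1/2 SNP coding, a 0/1 drug indicator, no confounder adjustment, and deterministic folds (observation i goes to fold i % k). It does not reproduce the authors' covariates, selection procedures, or random and stratified partitioning schemes. The function names are illustrative, and least squares is solved through the normal equations for brevity.

```python
def solve(a, b):
    """Solve the square linear system a . x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def design(snp, drug, interaction=True):
    """Intercept, SNP dosage, drug exposure and, optionally, their product."""
    row = [1.0, snp, drug]
    if interaction:
        row.append(snp * drug)
    return row


def cross_validated_r2(snps, drugs, y, k=5, interaction=True):
    """Out-of-sample R^2: each fold is predicted by a model fitted on the other folds."""
    rows = [design(s, d, interaction) for s, d in zip(snps, drugs)]
    predictions = [0.0] * len(y)
    for fold in range(k):
        train = [i for i in range(len(y)) if i % k != fold]
        test = [i for i in range(len(y)) if i % k == fold]
        beta = fit_ols([rows[i] for i in train], [y[i] for i in train])
        for i in test:
            predictions[i] = sum(b * v for b, v in zip(beta, rows[i]))
    y_mean = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, predictions))
    ss_tot = sum((t - y_mean) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot
```

Running `cross_validated_r2` with `interaction=True` and then with `interaction=False` on the same data mirrors, in miniature, the comparison between models with and without interaction terms.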
Genomewide Association Study for Schizophrenia in an Arab Israeli Family Sample
Ana Alkelai, Biological Psychiatry Laboratory, Dept. of Psychiatry, Hadassah - Hebrew University Medical Center
Previously, we used whole-genome linkage analysis to identify schizophrenia susceptibility loci in an Arab Israeli family sample (Lerer et al., 2003). We reported significant evidence for a susceptibility locus at chromosome 6q23 and suggestive evidence for loci at chromosomes 10q23-26, 2q36.1-37.3, and 7p21.1-22.3. To identify susceptibility genes within the 6q23 region, extensive fine-mapping was performed (Levi et al., 2005). The most significant genetic association with schizophrenia was found within a 500 kb genomic region harboring the AHI1 and C6orf217 genes (Amann-Zalcenstein et al., 2006). Currently, genetic research on schizophrenia focuses mainly on genome-wide association studies (GWAS), which are anticipated to identify susceptibility genes with an effect size in the range expected for schizophrenia. Expanding our project beyond the 6q23 region, we performed a GWAS of the expanded Arab Israeli sample using the Illumina HumanCNV370-Duo BeadChip. In general, this GWAS in the TKT sample replicates and strengthens the results of our previous studies. The best GWAS SNPs (p-value<0.0001) include SNPs in AHI1/C6orf217 (6q linkage region) and in the 10q, 2q and 7p linkage regions. We found genome-wide significant association (q-value<0.05) for 17 SNPs. Five of these SNPs are in introns of a brain-expressed potential schizophrenia candidate gene (2q36.1-q37.3 linkage area; p=2.24×10⁻¹², p=1.09×10⁻⁸, p=1.39×10⁻⁶, p=1.64×10⁻⁶ and p=2.214×10⁻⁶). All of these SNPs are predicted to lie in intronic enhancers. One of the markers significantly associated with schizophrenia lies in the intergenic region between two additional candidate genes (2q36.1-q37.3 linkage area; p=7.4×10⁻¹⁰). Another lies in an intron of one of these genes (p=4.679×10⁻⁷; predicted intronic enhancer).
Additional significant SNPs are located:
- in the intergenic region between the CITED and NMBR genes (6q22.33-q24.1 linkage area; p=7.9×10⁻⁹);
- in the promoter region of the SLC29A4 gene (7p22.3-p21.1 linkage area; p=2.378×10⁻⁶);
- downstream of the NXPH1 gene (7p22.3-p21.1 linkage area; p=1.475×10⁻⁷);
- downstream of the EPHA4 gene (2q36.1-q37.3 linkage area; p=1.179×10⁻⁶);
- in an intron of AHI1, a gene previously associated with schizophrenia in our sample (6q22.33-q24.1 linkage area; p=1.838×10⁻⁶).

Supported in part by the Israel Science Foundation.
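The abstract reports genome-wide significance as q-value < 0.05 without naming the procedure. A common choice, assumed here, is Benjamini–Hochberg adjusted p-values, which coincide with Storey's q-values when the proportion of true null hypotheses (π0) is set to 1. The sketch below computes them. The function name is illustrative.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (equal to Storey q-values with pi0 = 1)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each adjusted value is the minimum
    # of p * m / rank over all ranks at or above it (this enforces monotonicity).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q
```

Under this assumed criterion, SNPs whose adjusted value falls below 0.05 would be the ones reported as genome-wide significant.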