Clinical Genomic Analysis Workshop 2011
June 2, 2011
Organized by IBM Haifa Research Lab
Abstracts
Estimating Heritability Using Random Effects Models
David Golan, Tel Aviv University
Random effects models have recently been introduced as an
approach for analyzing genome-wide association studies
(GWAS) that allows estimation of the overall heritability of
traits without explicitly identifying the genetic loci
responsible. Using this approach, Yang et al. (2010) have
demonstrated that the heritability of height is much higher
than the ~10% associated with identified genetic factors.
However, Yang et al. relied on a heuristic for performing
estimation in this model. We adopt the model framework of
Yang et al. (2010) and develop a method for maximum
likelihood (ML) estimation in this framework. Our method is
based on MCEM (Wei et al., 1990), an
expectation-maximization algorithm wherein a Markov chain
Monte Carlo approach is used in the E-step. We demonstrate
that this method leads to more stable and accurate
heritability estimation compared to the approach of Yang et
al. (2010), and it also allows us to find ML estimates of
the proportion of markers that are causal, indicating whether
the heritability stems from a small number of powerful
genetic factors or a large number of less powerful ones.
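To make the quantities in this abstract concrete, here is a minimal simulation sketch of a random effects model of the kind described, in which heritability is h² = σ_g² / (σ_g² + σ_e²). It is not the MCEM estimator. Assumptions (not from the abstract): independent markers, genotypes drawn under Hardy–Weinberg equilibrium and standardized, phenotypic variance fixed at 1 (so σ_g² = h²), and a fixed number of causal markers whose effects are drawn from N(0, h²/n_causal). The function name and parameters are hypothetical.

```python
import random
import statistics


def simulate_phenotypes(n_individuals, n_markers, h2, causal_fraction, seed=0):
    """Hypothetical sketch: y_i = sum_j z_ij * u_j + e_i with standardized genotypes z_ij.

    Phenotypic variance is fixed at 1, so sigma_g^2 = h2 and sigma_e^2 = 1 - h2.
    Only a fraction of the markers is causal (non-zero effect); the rest are omitted
    because they do not affect the phenotype.
    """
    rng = random.Random(seed)
    n_causal = max(1, round(causal_fraction * n_markers))
    # Minor-allele frequencies of the causal markers.
    freqs = [rng.uniform(0.05, 0.5) for _ in range(n_causal)]
    # Random effects u_j ~ N(0, h2 / n_causal): expected total genetic variance is h2.
    effects = [rng.gauss(0.0, (h2 / n_causal) ** 0.5) for _ in range(n_causal)]
    genetic, phenotypes = [], []
    for _ in range(n_individuals):
        g = 0.0
        for f, u in zip(freqs, effects):
            # Genotype = number of minor alleles (0, 1 or 2) under Hardy-Weinberg equilibrium.
            x = (rng.random() < f) + (rng.random() < f)
            # Standardize to mean 0 and variance 1 before applying the effect.
            g += u * (x - 2 * f) / (2 * f * (1 - f)) ** 0.5
        genetic.append(g)
        phenotypes.append(g + rng.gauss(0.0, (1 - h2) ** 0.5))
    return genetic, phenotypes


genetic, y = simulate_phenotypes(1000, 300, h2=0.5, causal_fraction=1.0)
# Realized share of phenotypic variance due to genetics; roughly 0.5, varying with the seed.
print(statistics.variance(genetic) / statistics.variance(y))
```

Lowering `causal_fraction` spreads the same h² over fewer markers with larger effects, which is the "few powerful versus many weak factors" distinction the abstract says the ML method can estimate.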
Linkage Analysis in the Presence of Germline Mosaicism
Dan Geiger, Technion – Israel Institute of Technology
Genetic linkage analysis is a widely used statistical method
for genetic mapping. This method is successful in mapping
genes involved in simple Mendelian diseases, but is less
powerful in mapping genes that do not follow simple
Mendelian inheritance. Germline mosaicism is a genetic
condition in which only some of an individual's germ cells
carry a mutation. We extend the statistical model used for genetic
linkage analysis in order to incorporate germline mosaicism.
We develop a likelihood ratio test for detecting whether a
genetic trait has been introduced into a pedigree by
germline mosaicism. We analyze the statistical properties of
this test and demonstrate its effectiveness via computer
simulations. We further use this test to provide solid
statistical evidence that the MDN syndrome studied by
Genzer-Nir et al. originated from germline mosaicism. This
work was done jointly by Omer Weissbrod and the speaker.
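The abstract does not give the pedigree model, so the following is only a toy illustration of the likelihood ratio idea, not the authors' test. Assume (hypothetically) that a parent transmits a variant to each of n children independently. Under the null, the parent is a heterozygous carrier and transmits with probability 1/2; under the alternative, the parent is a germline mosaic, so the transmission probability p is at most 1/2. The statistic is 2[ℓ(p̂) − ℓ(1/2)] with p̂ = min(k/n, 1/2), and because the null value sits on the boundary of the parameter space its asymptotic null distribution is a 50:50 mixture of χ²₀ and χ²₁. The function name is hypothetical.

```python
import math


def mosaicism_lrt(n_children, n_carriers):
    """Toy one-sided LRT of H0: p = 1/2 (Mendelian carrier) vs H1: 0 <= p <= 1/2 (mosaic parent).

    Returns (statistic, asymptotic p-value).
    """
    n, k = n_children, n_carriers
    # Maximum likelihood estimate restricted to the alternative's range [0, 1/2].
    p_hat = min(k / n, 0.5)

    def loglik(p):
        # Binomial log-likelihood up to the constant log C(n, k); 0 * log(0) is taken as 0.
        ll = 0.0
        if k > 0:
            ll += k * math.log(p)
        if n - k > 0:
            ll += (n - k) * math.log(1 - p)
        return ll

    stat = 2 * (loglik(p_hat) - loglik(0.5))
    # Boundary null: 0.5 * chi2_0 + 0.5 * chi2_1, with P(chi2_1 > t) = erfc(sqrt(t / 2)).
    p_value = 0.5 * math.erfc(math.sqrt(stat / 2)) if stat > 0 else 1.0
    return stat, p_value


# Hypothetical family: 2 of 10 children carry the variant.
print(mosaicism_lrt(10, 2))  # statistic ≈ 3.85, p ≈ 0.025
```

With only a handful of children the χ² approximation is crude; an exact binomial tail would be the safer choice for a single small family.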
Generalized Alpha Investing: Definitions, Optimality Results, and Applications to Public Bioinformatics Databases
Ehud Aharoni, IBM Research – Haifa
The increasing prevalence and utility of large, public
databases in the field of bioinformatics necessitates the
development of appropriate methods for controlling false
discovery. Motivated by this problem, we discuss the generic
problem of testing a possibly infinite stream of null
hypotheses. In this context, Foster and Stine (2007)
proposed a false discovery measure they called mFDR, and an
approach for controlling it named alpha investing. We
generalize alpha investing and use our generalization to
derive optimal allocation rules for the case of simple
hypotheses. We demonstrate empirically that this approach is
more powerful than alpha investing while controlling mFDR.
We then present the concept of quality preserving databases
(QPD), originally introduced in Aharoni et al. (2010), which
formalizes efficient public database management to
simultaneously save costs and control false discovery. We
show how one variant of generalized alpha investing can be
used to control mFDR in a QPD, leading to a significant
reduction in costs compared to the naïve approaches for
controlling the family-wise error rate implemented in
Aharoni et al. (2010).
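For readers unfamiliar with alpha investing, a minimal sketch of the basic (not generalized) update rule of Foster and Stine follows. Each test j bids a level α_j no larger than W/(1+W), where W is the current alpha-wealth; a rejection earns a payout, and a non-rejection costs α_j/(1 − α_j), which by construction never exceeds W. The spending rule (bid a fixed fraction of the maximum allowed level) and the default wealth and payout values are illustrative assumptions, not the optimal allocation rules derived in the talk.

```python
def alpha_investing(p_values, initial_wealth=0.05, payout=0.05, spend_fraction=0.5):
    """Sequentially test a stream of p-values with the basic alpha-investing rule.

    spend_fraction is a hypothetical spending policy. Returns the rejection
    decisions and the final alpha-wealth.
    """
    wealth = initial_wealth
    decisions = []
    for p in p_values:
        # Bid a level no larger than W / (1 + W), so the cost of a failed
        # test, alpha_j / (1 - alpha_j), is at most the current wealth W.
        alpha_j = spend_fraction * wealth / (1 + wealth)
        if p <= alpha_j:
            decisions.append(True)
            wealth += payout  # a discovery earns the payout
        else:
            decisions.append(False)
            wealth -= alpha_j / (1 - alpha_j)  # a non-discovery costs alpha_j / (1 - alpha_j)
    return decisions, wealth


decisions, wealth = alpha_investing([0.001, 0.5, 0.01])
print(decisions)  # [True, False, True]
```

Because wealth can only grow through discoveries, a stream that starts with strong signals can afford more lenient tests later, which is what makes the scheme suitable for an open-ended stream of hypotheses.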
Enrichment Statistics for Ranked Lists and Applications in Genomics
Zohar Yakhini, Agilent Laboratories and the Technion
I will describe a statistical approach to assessing the
significance of a high density of 1s at either end of a
binary vector. This method is used to analyze the enrichment
of elements at the top of ranked lists. The distribution of
the resulting statistic can be fully characterized through
a simple dynamic programming procedure.
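The abstract does not name the statistic; the sketch below assumes it is the minimum-hypergeometric (mHG) score. For each prefix of the ranked binary vector, the score is the hypergeometric tail probability of seeing at least as many 1s as observed, and mHG is the minimum over prefixes. The exact p-value is computed by dynamic programming over a grid of (prefix length n, number of 1s b): count the orderings whose path never reaches a score as small as the observed one. Exact fractions avoid floating-point ties. This covers enrichment at the top of the list; the other end can be tested by reversing the vector. Function names are hypothetical, and the direct recomputation of tails makes it suitable only for small lists.

```python
from fractions import Fraction
from math import comb


def hgt_tail(b, N, B, n):
    """Exact P(X >= b) for X ~ Hypergeometric: n draws from N items of which B are 1s."""
    hits = sum(comb(B, k) * comb(N - B, n - k) for k in range(b, min(n, B) + 1))
    return Fraction(hits, comb(N, n))


def mhg_score(labels):
    """Minimum, over all prefixes of the ranked list, of the tail probability of the prefix's 1-count."""
    N, B = len(labels), sum(labels)
    best, ones = Fraction(1), 0
    for n, x in enumerate(labels, start=1):
        ones += x
        best = min(best, hgt_tail(ones, N, B, n))
    return best


def mhg_pvalue(labels):
    """Exact p-value of the mHG score over all C(N, B) equally likely orderings."""
    N, B = len(labels), sum(labels)
    threshold = mhg_score(labels)
    # paths[b]: number of length-n prefixes with b ones whose score stayed above the threshold.
    paths = [1] + [0] * B
    for n in range(1, N + 1):
        new = [0] * (B + 1)
        for b in range(min(n, B) + 1):
            # Reach (n, b) by appending a 0 to (n-1, b) or a 1 to (n-1, b-1).
            count = paths[b] + (paths[b - 1] if b > 0 else 0)
            if count and hgt_tail(b, N, B, n) > threshold:
                new[b] = count
        paths = new
    # Orderings that ever hit a score <= threshold are at least as extreme as observed.
    return 1 - Fraction(paths[B], comb(N, B))


print(mhg_pvalue([1, 1, 0, 0]))  # 1/6
```

In the example, only the observed ordering (both 1s at the top) reaches a score of 1/6 among the six possible orderings, hence the p-value of 1/6.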
Useful applications include motif finding, the
identification of sequence elements related to DNA
methylation, enrichment analysis of GO-derived gene sets (through the
web-based application GOrilla), the joint analysis of miRNA
and mRNA profiling data and the study of interactions
between miRNA and RBPs (RNA binding proteins). I will
discuss examples with emphasis on the biological results.
For example, we used the ranked lists approach to perform
miRNA and mRNA joint analysis in a study of a cohort of 100
breast cancer samples and discovered several novel
relationships, including a direct association between miR-29
and the extracellular matrix density of the tumors.
Computational Analysis of Gene Regulation, Disease Classification, and Protein Networks
Ron Shamir, Tel Aviv University
Understanding complex disease is one of today's grand
challenges. In spite of the rapid advance of biotechnology,
disease understanding is still very limited, and new
computational tools for analyzing disease-related data are
sorely needed. In this talk I will describe some of the tools
that we are developing for these challenges. I will describe
methods for utilizing expression profiles of sick and
healthy individuals to identify pathways dysregulated in the
disease, methods for the integrated analysis of microRNA
expression and protein interactions in stem cells, and
methods for regulatory motif discovery.
Analysis of Complex Population Structure with Applications
Eran Halperin, Tel Aviv University
It is becoming increasingly evident that the analysis of
genotype data from populations of complex structure, such as
recently admixed populations, provides important insights
into human population demographic history and disease
genetics. Such analyses have been used to find novel genomic
regions associated with disease and to understand both
recombination rate variation and recent selection events.
In this talk, I
will provide an overview of the methods we developed for the
analysis of such populations, and I will illustrate how
these methods provide opportunities to identify regions
under selection, to reconstruct recombination maps, and to
reconstruct the haplotypes of extinct populations.
Uncovering the Human Cell Lineage Tree: The Next Grand Scientific Challenge
Ehud Shapiro, Weizmann Institute of Science
The cell lineage tree of a person captures the history of
the person's cells since conception. In computer science
terms it is a rooted, labeled binary tree, where the root
represents the primary fertilized egg, leaves represent
extant cells, internal nodes represent past cell divisions,
and vertex labels record cell types. It has approximately
100 trillion leaves and 100 trillion internal nodes
(≈100,000 times larger than the human genome), yet it
remains unknown.
We should strive to know it, as many central questions in
biology and medicine are actually specific questions about
the human cell lineage tree, in health and disease: Which
cancer cells initiate relapse after chemotherapy? Which
cancer cells can metastasize? Do insulin-producing beta
cells renew in healthy adults? Do eggs renew in adult
females? Which cells renew in the healthy and in the unhealthy
adult brain? Knowing the human cell lineage tree would answer all
these questions and more.
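The tree described in the first paragraph maps directly onto a simple data structure. In the minimal sketch below, each division turns a vertex into an internal node with exactly two daughters, so a tree with n leaves (extant cells) always has n − 1 internal nodes (past divisions); this is why the leaf and division counts quoted above are essentially equal. The class, method, and cell-type names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class LineageNode:
    """One vertex of a cell lineage tree; the label records the cell type."""
    cell_type: str
    children: List["LineageNode"] = field(default_factory=list)

    def divide(self, type_a: str, type_b: str) -> List["LineageNode"]:
        """Record a cell division: this vertex becomes internal with two daughter cells."""
        self.children = [LineageNode(type_a), LineageNode(type_b)]
        return self.children


def count_vertices(root: LineageNode) -> Tuple[int, int]:
    """Return (leaves, internal nodes), i.e. (extant cells, past divisions)."""
    leaves = internal = 0
    stack = [root]
    while stack:
        node = stack.pop()
        if node.children:
            internal += 1
            stack.extend(node.children)
        else:
            leaves += 1
    return leaves, internal


# Hypothetical example: the zygote divides, then one daughter divides again.
zygote = LineageNode("zygote")
left, right = zygote.divide("A", "B")
left.divide("A1", "A2")
print(count_vertices(zygote))  # (3, 2)
```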
Fortunately, our cell lineage tree is implicitly encoded in
our cells' genomes via mutations that accumulate when body
cells divide. Theoretically, it could be reconstructed with
high precision by sequencing every cell in our body, at a
prohibitive cost. In practice, analyzing only highly mutable
fragments of the genome is sufficient for cell lineage
reconstruction. Our lab has developed a proof-of-concept
method and system for cell lineage analysis from somatic
mutations. The talk will describe the system, the results
obtained with it so far, and future plans for this project.
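The abstract does not specify the reconstruction algorithm. Purely as a toy illustration of how accumulated somatic mutations can encode a lineage, the sketch below applies UPGMA (average-linkage clustering), a standard distance-based tree-building method, to hypothetical per-cell mutation sets, using the number of mutations not shared by two cells as their distance. It is not the lab's method; for instance, it ignores mutations that arise independently in two lineages.

```python
from itertools import combinations


def upgma(profiles):
    """Toy UPGMA: cluster cells by average symmetric-difference distance of mutation sets.

    profiles maps a cell name to the set of somatic mutations observed in it.
    Returns the inferred rooted binary tree as nested tuples of cell names.
    """
    def dist(a, b):
        # Mutations present in exactly one of the two cells.
        return len(profiles[a] ^ profiles[b])

    def avg_dist(c1, c2):
        # Average-linkage distance between two clusters of cells.
        return sum(dist(a, b) for a in c1[1] for b in c2[1]) / (len(c1[1]) * len(c2[1]))

    # Each cluster is (subtree, list of member cell names); start from single cells.
    clusters = [(name, [name]) for name in sorted(profiles)]
    while len(clusters) > 1:
        # Merge the closest pair; ties go to the first pair in index order.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: avg_dist(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


# Hypothetical mutation profiles for four sampled cells.
cells = {
    "A": {"m1", "m2"},
    "B": {"m1", "m2", "m3"},
    "C": {"m4"},
    "D": {"m4", "m5"},
}
print(upgma(cells))  # (('A', 'B'), ('C', 'D'))
```

Cells that share more mutations are grouped first, so the nested tuples mirror the order of the divisions that separated them, under the simplifying assumption that mutations accumulate roughly evenly along each lineage.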