Machine Learning Seminar 2014
Sunday, November 23rd, 2014
IBM Research - Haifa, Israel
Program
09:15-10:00 | Registration
10:00-10:15 | Opening Remarks
10:15-10:45 | Multi-Modal Models for In-Depth Computational Semantics
Abstract: Vector space models (VSMs), particularly those based on neural embeddings, have led to significant progress in the modeling of natural language semantics over the past few years. In the standard setup of this problem, vector representations of words, phrases and even full sentences are automatically acquired from large textual corpora (e.g. Wikipedia) and are used for predicting human-based conceptual similarity. Recently, it has been established that combining information from multiple modalities, especially textual and visual, can enhance VSM quality, supporting the intuition that conceptual meaning is inherently linked to multiple sensory modalities and is not only linguistic in nature.
In this talk I will describe two inquiries into this standard setup. I will first address the question of utilizing visual information for the modeling of abstract concept meaning. Not surprisingly, the positive impact of such information on the expressive power of VSMs has only been demonstrated for concrete concepts (e.g. penguin, table), while no impact was demonstrated on the vast majority of linguistic concepts (both verbal and nominal) that are known to be abstract (e.g. love, war). I will present models that can integrate information from multiple modalities for improved abstract concept modeling, and analyze their expressive power and limitations.

I will then present an analysis of the leading data sets for the conceptual similarity task and demonstrate that the human ratings they contain strongly correlate with conceptual association (e.g. Freud and psychology) rather than similarity (e.g. car and train). To compensate for this bias, I will describe SimLex-999, a new gold standard in which word pairs are judged for similarity rather than association. An experimental study demonstrates that existing VSMs substantially differ in their ability to model these two semantic qualities, although their objectives were not designed to prefer either of them. I will conclude with a list of open questions which will demonstrate that, despite the substantial progress VSMs have brought to computational semantics, we are still far from capturing the richness of human language meaning.
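The evaluation setup this abstract refers to, scoring word-vector cosine similarities against human ratings via rank correlation, can be sketched as follows. This is a minimal illustration only; the embeddings and pair scores below are invented for the example.

```python
# Minimal sketch of the standard VSM similarity evaluation: cosine similarity
# between word vectors is compared to human ratings (SimLex-999-style pairs)
# using Spearman rank correlation. Vectors and pairs here are toy placeholders.
import numpy as np
from scipy.stats import spearmanr

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def evaluate_similarity(vectors, rated_pairs):
    """vectors: dict word -> np.ndarray; rated_pairs: list of (w1, w2, human_score)."""
    model_scores, human_scores = [], []
    for w1, w2, score in rated_pairs:
        if w1 in vectors and w2 in vectors:        # skip out-of-vocabulary pairs
            model_scores.append(cosine(vectors[w1], vectors[w2]))
            human_scores.append(score)
    rho, _ = spearmanr(model_scores, human_scores)
    return rho

# Toy usage with random vectors; a real evaluation would load trained embeddings.
rng = np.random.default_rng(0)
vecs = {w: rng.normal(size=50) for w in ["car", "train", "freud", "psychology"]}
pairs = [("car", "train", 8.1), ("freud", "psychology", 2.5)]
print(evaluate_similarity(vecs, pairs))
```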
10:45-11:45 | Keynote: Scaling and Generalizing Variational Inference
Abstract: Latent variable models have become
a key tool for the modern statistician, letting us
express complex assumptions about the hidden
structures that underlie our data. Latent variable
models have been successfully applied in numerous
fields including natural language processing,
computer vision, population genetics, and many
others.
The central computational problem in latent variable modeling is posterior inference, the problem of approximating the conditional distribution of the latent variables given the observations. Inference is essential to both exploratory and predictive tasks. Modern inference algorithms have revolutionized Bayesian statistics, revealing its potential as a usable and general-purpose language for data analysis. Bayesian statistics, however, has not yet reached this potential. First, statisticians and scientists regularly encounter massive data sets, but existing algorithms do not scale well. Second, most approximate inference algorithms are not generic; each must be adapted to the specific model at hand. This requires significant model-specific analysis, which precludes us from easily exploring a variety of models.

In this talk I will discuss our recent research on addressing these two limitations. First I will describe stochastic variational inference, an approximate inference algorithm for handling massive data sets. Stochastic inference is easily applied to a large class of Bayesian models, including topic models, time-series models, factor models, and Bayesian nonparametric models. Then I will discuss black box variational inference, a generic algorithm for approximating the posterior. We can use black box inference on many models with little model-specific derivation. Together, these algorithms make Bayesian statistics a flexible and practical tool for modern data analysis.

This is joint work based on these two papers:
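To make the black box idea concrete, here is a minimal sketch of the score-function (REINFORCE-style) gradient estimator that underlies black box variational inference, applied to a toy model with a Gaussian variational family. The model, step sizes, and the simple mean baseline are assumptions chosen for illustration; this is not the speaker's implementation.

```python
# Minimal sketch of black box variational inference with the score-function
# gradient estimator: the ELBO gradient is approximated from Monte Carlo samples
# of q, using only log-density evaluations of the model (no model-specific math).
# Toy model: unknown mean z with a wide Gaussian prior and unit-variance likelihood.
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(loc=3.0, scale=1.0, size=100)      # observations from N(3, 1)

def log_joint(z):
    log_prior = -0.5 * (z / 10.0) ** 2               # z ~ N(0, 10^2)
    log_lik = -0.5 * np.sum((data - z) ** 2)         # x_i | z ~ N(z, 1)
    return log_prior + log_lik

mu, log_sigma = 0.0, 0.0                             # parameters of q(z) = N(mu, sigma^2)
lr, n_samples = 1e-4, 32
for step in range(2000):
    sigma = np.exp(log_sigma)
    z = rng.normal(mu, sigma, size=n_samples)        # samples from q
    log_q = -0.5 * ((z - mu) / sigma) ** 2 - log_sigma - 0.5 * np.log(2 * np.pi)
    score_mu = (z - mu) / sigma ** 2                 # d log q / d mu
    score_logsigma = ((z - mu) / sigma) ** 2 - 1.0   # d log q / d log sigma
    weights = np.array([log_joint(zi) for zi in z]) - log_q
    weights -= weights.mean()                        # simple baseline to reduce variance
    mu += lr * np.mean(score_mu * weights)
    log_sigma += lr * np.mean(score_logsigma * weights)

print(mu, np.exp(log_sigma))   # should move toward the true posterior (mean ~3, std ~0.1)
```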
11:45-12:00 | Break
12:00-12:30 | Online Principal Component Analysis
Abstract: We consider the online version of the well known Principal Component Analysis (PCA) problem. In standard PCA, the input to the problem is a set of d-dimensional vectors x_1, ..., x_n and a target dimension k < d; the output is a set of k-dimensional vectors y_1, ..., y_n that best capture the top singular directions of the original vectors. In the online setting, the vectors x_t are presented to the algorithm one by one, and for every presented x_t the algorithm must output a vector y_t before receiving x_{t+1}.
We present the first approximation algorithms for this setting of online PCA. Our algorithm produces vectors of dimension k * poly(1/ε) whose quality admits an additive ε-approximation to the optimal offline solution that is allowed to use k dimensions.
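The online protocol itself (committing to y_t before x_{t+1} arrives) can be illustrated with a simple streaming subspace heuristic such as an Oja-style update. The sketch below only demonstrates the setting and is not the approximation algorithm from the talk; the learning rate and data are arbitrary choices.

```python
# Illustration of the online PCA setting: each x_t must be mapped to a
# low-dimensional y_t before x_{t+1} arrives. The subspace update below is
# Oja-style stochastic gradient ascent, used here only to make the protocol
# concrete; it is not the approximation algorithm described in the talk.
import numpy as np

def online_pca_stream(stream, d, k, lr=0.01, seed=0):
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.normal(size=(d, k)))   # random orthonormal d x k basis
    outputs = []
    for x in stream:
        y = U.T @ x                                # commit to y_t now (online constraint)
        outputs.append(y)
        U += lr * np.outer(x, y)                   # Oja-style update from the new sample
        U, _ = np.linalg.qr(U)                     # re-orthonormalize
    return np.array(outputs), U

# Toy usage: data concentrated near a 2-dimensional subspace of R^20.
rng = np.random.default_rng(1)
basis = rng.normal(size=(20, 2))
xs = [basis @ rng.normal(size=2) + 0.1 * rng.normal(size=20) for _ in range(500)]
ys, U = online_pca_stream(xs, d=20, k=2)
print(ys.shape, U.shape)
```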
12:30-13:00 | Probabilistic Graphical Models of Dyslexia
Abstract: Reading is a complex cognitive process, and errors in it may assume diverse forms. To capture the complex structure of reading errors, a novel way of analyzing reading errors made by dyslexic people is proposed, based on probabilistic graphical models. The talk focuses on three questions: (a) which graphical model best captures the hidden structure of reading errors; (b) whether a graphical model can diagnose dyslexia close to how experts do; and (c) how statistical models can support arguments in the debate about the definition and heterogeneity of dyslexia. I will show that a Naive Bayes model best agrees with labels given by clinicians and can therefore be used for automation of the diagnosis process. An LDA-based model best captures patterns of reading errors and could therefore contribute to the understanding of dyslexia and to the diagnostic procedure. Finally, results on individuals' data clearly support a model assuming multiple dyslexia subtypes.
This is joint work with Yair Lakretz, Gal Chechik and Naama Fridman.
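As a purely hypothetical illustration of the Naive Bayes setup mentioned above, one could represent each reader by counts of reading-error types and fit the model against clinician labels. The error categories, counts and labels below are invented for the example and do not come from the study.

```python
# Hypothetical illustration of a Naive Bayes classifier over per-reader counts
# of reading-error types, trained against clinician labels. All data invented.
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Columns: invented error-type counts (e.g. letter migrations, substitutions,
# omissions, vowel errors); one row per reader.
X = np.array([
    [12,  3, 1, 2],
    [10,  4, 0, 3],
    [ 1,  9, 6, 1],
    [ 0, 11, 7, 2],
    [ 2,  2, 1, 1],
])
y = ["dyslexia_type_A", "dyslexia_type_A", "dyslexia_type_B", "dyslexia_type_B", "control"]

model = MultinomialNB()
model.fit(X, y)
print(model.predict([[11, 2, 0, 2]]))   # expected to resemble the type_A profile
```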
13:00-13:30 | Single Sensory and Multisensory Information Processing for Internet of Things
Abstract: There will be over 50 billion connected "things" in the year 2020. Intel's Cloud Internet of Things Analytics Platform is designed to greatly reduce the complexities of ingesting and processing the massive amounts of data generated in IoT scenarios. Its vision includes collecting data from numerous devices and sensors and storing it in a cloud. In this talk we will describe innovative algorithms for single sensory and multisensory information processing, including sensor type determination and prototyping, followed by information-theoretic multisensory change detection and One-Class-SVM-based anomaly detection.
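A minimal sketch of the One-Class SVM anomaly-detection step mentioned in the abstract, applied to simulated sensor readings. The features, parameters and data are assumptions for illustration and do not reflect the actual platform pipeline.

```python
# Sketch of One-Class SVM anomaly detection on simulated sensor readings:
# the model is fit on normal operating data only and flags deviations.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# e.g. (temperature, humidity) readings during normal operation
normal_readings = rng.normal(loc=[20.0, 50.0], scale=[0.5, 2.0], size=(500, 2))
detector = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(normal_readings)

new_readings = np.array([
    [20.2, 51.0],   # typical
    [35.0, 10.0],   # clearly anomalous
])
print(detector.predict(new_readings))   # +1 = inlier, -1 = anomaly
```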
13:30-14:45 | Lunch Break
14:45-15:15 | SystemML: A Declarative Machine Learning System
15:15-15:45 | Inference by Randomly Perturbing Max-Solvers
Abstract: Modern inference problems can increasingly be understood in terms of discrete structures, such as arrangements of objects in computer vision, parses in natural language processing, or molecular structures in computational biology. In a fully probabilistic treatment, all possible alternative assignments are considered; thus, sampling from traditional structured probabilistic models may be computationally expensive for many machine learning applications. These computational difficulties are circumvented with a variety of optimization techniques that provide max-solvers to predict the most likely structure.
In this talk I will present a new approach that relaxes the exponential complexity of probabilistic reasoning in structured models while relying on efficient predictions under random perturbations. This approach leads to a new inference framework based on probability models that measure the stability of the prediction to random changes of the structures' scores.
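The simplest instance of predicting under random perturbations is the Gumbel-max trick over a small discrete set of candidate structures, sketched below; structured versions apply the same idea with a combinatorial max-solver and carefully designed perturbations. The scores here are arbitrary toy values.

```python
# Simplest instance of "inference by randomly perturbing max-solvers": the
# Gumbel-max trick. Adding i.i.d. Gumbel noise to the scores and taking the
# argmax yields exact samples from the Gibbs distribution over this small set.
import numpy as np

rng = np.random.default_rng(0)
scores = np.array([2.0, 1.0, 0.0, -1.0])           # theta(x) for four candidate structures

def perturb_and_max(scores, n_samples):
    gumbel = rng.gumbel(size=(n_samples, scores.size))
    return np.argmax(scores + gumbel, axis=1)       # one max-solver call per sample

samples = perturb_and_max(scores, 100_000)
empirical = np.bincount(samples, minlength=scores.size) / samples.size
gibbs = np.exp(scores) / np.exp(scores).sum()
print(np.round(empirical, 3), np.round(gibbs, 3))  # the two should closely agree
```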
15:45-16:15 | Fully Unsupervised Ranking and Ensemble Learning, or How to Make Good Decisions When You Know Nothing
Abstract: In various decision making
problems, one is given the advice or predictions of
several experts of unknown reliability, over
multiple questions or queries. This scenario
is different from the standard supervised setting
where classifier accuracy can be assessed using
available labeled training or validation data, and
raises several questions: Given only the predictions
of several classifiers of unknown accuracies, over a
large set of unlabeled test data, is it possible
to a) reliably rank them, and b) construct a
meta-classifier more accurate than any individual
classifier in the ensemble?
In this talk we'll show that under standard independence assumptions on classifier errors, this high dimensional data hides a simple low dimensional structure. We then present a spectral approach to address the above questions, and derive a new unsupervised spectral meta-learner (SML). We illustrate the competitive advantage of our approach on both simulated and real data, showing its robustness even in practical cases where some of the model assumptions are not precisely satisfied. Joint work with Fabio Parisi, Francesco Strino and Yuval Kluger (Yale) and with Ariel Jaffe (WIS).
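A simplified sketch of the spectral idea: under independent errors, the covariance matrix of the classifiers' ±1 predictions is approximately rank one off the diagonal, and its leading eigenvector can be used to weight a meta-vote. This is an illustrative simplification, not the exact estimator from the papers; the simulated accuracies and data are invented.

```python
# Simplified sketch of the spectral meta-learner (SML) idea: the leading
# eigenvector of the prediction covariance matrix is used as vote weights.
import numpy as np

def spectral_meta_learner(predictions):
    """predictions: (m classifiers) x (n instances) matrix with entries in {-1, +1}."""
    cov = np.cov(predictions)
    eigvals, eigvecs = np.linalg.eigh(cov)
    v = eigvecs[:, -1]                     # leading eigenvector
    if v.sum() < 0:                        # fix the sign: assume most classifiers beat chance
        v = -v
    return np.sign(v @ predictions), v     # weighted-vote meta-prediction, estimated weights

# Toy usage: 7 classifiers of varying accuracy on 2000 unlabeled instances.
rng = np.random.default_rng(0)
labels = rng.choice([-1, 1], size=2000)
accuracies = [0.80, 0.75, 0.70, 0.65, 0.60, 0.60, 0.55]
preds = np.array([np.where(rng.random(labels.size) < acc, labels, -labels) for acc in accuracies])
meta, weights = spectral_meta_learner(preds)
print((meta == labels).mean(), np.round(weights, 2))   # meta-accuracy typically beats the best single classifier
```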
16:15-16:30 | Closing Remarks
16:30-17:30