Machine Learning Seminar 2015
Monday, November 9th, 2015, IBM Research - Haifa, Israel
Program
09:15-10:00 | Registration
10:00-10:15 | Opening Remarks
10:15-10:45 | Constrain, Train, Validate and Explain: A Classifier for Mission-Critical Applications
Abstract: Classifiers used in
mission-critical applications, where
misclassification errors incur high costs, should be
robust to training-set artifacts, such as
insufficient or misrepresentative coverage and
severe forms of bias. As such, they are required to
support intensive designer-control, and a range of
validation procedures that must go beyond
cross-validation. For such applications, we advocate
the use of a family of classifiers that employ a
factored model of the posterior class probabilities.
These classifiers are simple, interpretable, allow
their designers to enforce a variety of
domain-specific constraints, and can tolerate
missing data both during training and at prediction
time. Such classifiers are also capable of
explaining their decisions in terms of the basic
measured quantities. A classifier of this family is
used in several projects, one of which is described
in this talk.
10:45-11:45 | Keynote: Deep Networks: a Theory?
Abstract: IS-theory starts from the
hypothesis that invariant representations of images
are the main computational goal of the ventral
stream in visual cortex. Invariant representations
can be proved to lead to lower sample complexity in
image recognition. We propose a biologically
plausible simple-complex cells module (HW module)
for computing components of an invariant signature.
We use it in a hierarchical architecture that adds
selectivity to invariance through efficient
approximation of multidimensional functions. The
architecture uses an extension of additive splines
that we call hierarchical additive splines. We show
that today's Deep Convolutional Networks can be
characterized in terms of this theoretical framework.
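As a toy illustration of the simple-complex (HW) module idea, assuming the invariance group is 1-D cyclic shift (the theory covers richer transformation groups), "simple cells" take dot products with all transformed versions of a template and a "complex cell" pools over them:

```python
import numpy as np

# Toy HW module for a 1-D cyclic-shift group (an assumption for this
# sketch). Simple cells compute dot products of the input with all shifted
# versions of a stored template; the complex cell pools over those
# responses, which makes the output invariant to shifts of the input.
def hw_module(x, template, pool=np.max):
    n = len(x)
    responses = [np.dot(x, np.roll(template, s)) for s in range(n)]  # simple cells
    return pool(responses)                                           # complex cell

def signature(x, templates):
    # One shift-invariant component per template.
    return np.array([hw_module(x, t) for t in templates])
```

Shifting the input permutes the set of simple-cell responses, so the pooled value, and hence the signature, is unchanged; selectivity comes from using many templates.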
11:45-12:00 | Break
12:00-12:30 | Active Learning for Regression
Abstract: Active learning is the field of
machine learning which studies learning when
examples are abundant, but labels are expensive. For
instance, this occurs when examples are documents or
photos that are freely available on the web, but
identifying their content reliably requires human
labor. Active learning algorithms interactively
select which labels to collect, taking into account
the usefulness of the unknown answer.
I will present a new active learning algorithm for parametric linear regression with random design. This algorithm has finite-sample convergence guarantees for general distributions in the misspecified model. This is the first active learner for this setting that can provably improve over passive learning. Following the stratification technique advocated in Monte-Carlo function integration, this active learner approaches the optimal risk using piecewise constant approximations. Based on joint work with Remi Munos, INRIA Lille.
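The talk's algorithm is not specified in detail here; the following is a hedged sketch of the general stratification idea from Monte-Carlo integration applied to regression. The stratum construction, pilot sampling, and variance-proportional label allocation below are simplifications of my own, not the algorithm from the talk:

```python
import numpy as np

# Sketch of stratified active label collection for regression: split the
# input range into strata, spend a small pilot budget uniformly, then
# allocate the remaining label budget to strata in proportion to the
# estimated label variability (a Neyman-style allocation), and predict
# with a piecewise-constant model over the strata.
def stratified_active_regression(pool_x, query_label, n_strata=4, pilot=3,
                                 budget=40, seed=0):
    rng = np.random.default_rng(seed)
    edges = np.quantile(pool_x, np.linspace(0, 1, n_strata + 1))
    strata = [pool_x[(pool_x >= lo) & (pool_x <= hi)]
              for lo, hi in zip(edges[:-1], edges[1:])]
    # Pilot phase: a few labels per stratum to estimate variability.
    labels = [list(query_label(rng.choice(s, size=pilot))) for s in strata]
    stds = np.array([np.std(ys) + 1e-9 for ys in labels])
    # Spend the rest of the budget where labels vary the most.
    alloc = np.floor((budget - pilot * n_strata) * stds / stds.sum()).astype(int)
    for s, ys, k in zip(strata, labels, alloc):
        ys.extend(query_label(rng.choice(s, size=k)))
    means = [np.mean(ys) for ys in labels]

    def predict(x):
        i = np.clip(np.searchsorted(edges, x) - 1, 0, n_strata - 1)
        return np.array(means)[i]
    return predict
```

The piecewise-constant predictor mirrors the approximation mentioned in the abstract; the actual algorithm carries finite-sample guarantees that this sketch does not attempt to reproduce.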
12:30-13:00 | Singular Values and Eigenvalues in Data Analysis
Abstract: Spectral algorithms have played a
central part in data analysis across all branches of
science since at least the 1930s. Singular values
and eigenvalues of data matrices appear under
numerous names: factors, principal components,
canonical correlations, etc. Recent theoretical
advances allow a systematic study of spectral
algorithms, and sometimes lead to optimal estimation
algorithms. I'll show how such optimal algorithms
are derived in two problems: matrix denoising and
covariance estimation.
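As a minimal sketch of a spectral denoising algorithm, consider hard thresholding of singular values with a user-chosen cutoff; choosing that cutoff optimally is precisely the kind of question the theory mentioned above addresses, and this sketch makes no claim about the optimal choice:

```python
import numpy as np

# Minimal spectral matrix denoising: take the SVD of the noisy matrix and
# keep only singular values above a threshold, discarding the
# noise-dominated directions. The threshold here is a user-supplied
# assumption, not an optimal rule.
def denoise(Y, threshold):
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_shrunk = np.where(s > threshold, s, 0.0)   # hard thresholding
    return (U * s_shrunk) @ Vt
```

When the signal is low-rank and the noise spreads its energy over all directions, the few large singular values carry the signal and the thresholded reconstruction is closer to the truth than the raw observation.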
13:00-13:30 | A Tight Convex Upper Bound on the Likelihood of a Finite Mixture
Abstract: The likelihood function of a finite
mixture model is a non-convex function with multiple
local maxima and commonly used iterative algorithms
such as EM will converge to different solutions
depending on initial conditions. In this work we
ask: is it possible to find the global maximum of
the likelihood?
Since the likelihood of a finite mixture model can grow unboundedly by centering a Gaussian on a single datapoint and shrinking the covariance, we constrain the problem by assuming that the parameters of the individual models are members of a large discrete set (e.g. estimating a mixture of two Gaussians where the means and variances of both Gaussians are members of a set of a million possible means and variances).

For this setting we show that a simple upper bound on the likelihood can be computed using convex optimization, and we analyze conditions under which the bound is guaranteed to be tight. This bound can then be used to assess the quality of solutions found by EM (where the final result is projected on the discrete set) or any other mixture estimation algorithm. We also present a convex estimation algorithm that works directly on the discrete set.

Taken together, for any dataset our method allows us to find a finite mixture model together with a dataset-specific bound on how far the likelihood of this mixture is from the global optimum of the likelihood. Joint work with Yair Weiss.
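The convex bound itself is not reproduced here, but one ingredient of the discrete-set setting is easy to illustrate: once the component parameters are fixed to a candidate set, the log-likelihood is concave in the mixture weights alone, so optimizing the weights has no spurious local maxima. A minimal sketch with 1-D Gaussians and fixed-point (EM-style) weight updates:

```python
import numpy as np

# Discrete-parameter setting: fix a grid of candidate Gaussian components
# (means/variances from a discrete set). For fixed components the
# log-likelihood is concave in the mixture weights, so the weight-only
# optimization below cannot get trapped in a bad local maximum.
def gauss(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def fit_weights(x, mus, variances, iters=200):
    F = np.stack([gauss(x, m, v) for m, v in zip(mus, variances)])  # (K, n)
    w = np.full(len(mus), 1.0 / len(mus))
    for _ in range(iters):
        r = w[:, None] * F
        r /= r.sum(axis=0, keepdims=True)   # responsibilities
        w = r.mean(axis=1)                  # weight update
    return w
```

When the data-generating components are among the candidates, the fitted weights concentrate on them; the talk's contribution is a certified bound on the gap to the global optimum, which this sketch does not provide.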
13:30-14:30 | Lunch
14:30-15:00 | Context Sensitive Lexical Similarity via Joint-context and Embedding models
Abstract: Identifying similarities between
word meanings is a fundamental task in natural
language processing, and has proven useful for many
applications. A prominent unsupervised learning
approach for this task is Distributional Similarity
– two words would be regarded similar if they tend
to appear in similar lexical contexts. Recently,
this approach gained tremendous attention thanks to
novel word embedding methods, and their efficient
implementations, which represent target and context
words as continuous vectors.
In this talk we present two recent advancements in modeling distributional word similarity. First, we show how joint contexts can be represented effectively via Substitute Vectors, based on language models, yielding a more informative context representation than typical bag-of-words models. Second, we show how similarity can be measured in a context-sensitive manner, allowing us to predict different similarities for a target word depending on the particular context in which it appears. Our empirical results show that context-sensitive similarity is best modeled using substitute vectors, but can also be approached by a simple computation over word embedding vectors. Joint work with Oren Melamud, Jacob Goldberger, Omer Levy, Idan Szpektor and Deniz Yuret.
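As a hedged sketch of a "simple computation over word embedding vectors" (not the substitute-vector model from the talk), one can score a candidate substitute by combining its similarity to the target word with its fit to the context. The toy embeddings below are invented purely for illustration:

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def context_score(target, candidate, context, emb):
    # Average the context word vectors into a single context vector.
    ctx = np.mean([emb[c] for c in context], axis=0)
    # A candidate is a good in-context substitute for the target if it is
    # both similar to the target and plausible in the context.
    return cosine(emb[candidate], emb[target]) * max(cosine(emb[candidate], ctx), 0.0)

# Invented 2-D toy embeddings: one axis for "water" senses, one for "finance".
emb = {
    "bank":  np.array([1.0, 1.0]),
    "shore": np.array([1.0, 0.0]),
    "fund":  np.array([0.0, 1.0]),
    "river": np.array([2.0, 0.0]),
    "money": np.array([0.0, 2.0]),
}
```

With these vectors, "bank" scores closer to "shore" in a river context and closer to "fund" in a money context, which is the context-sensitive behavior the abstract describes.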
15:00-15:30 | Robust Inference and Local Algorithms
Abstract: Robust inference is an extension of
probabilistic inference, where some of the
observations may be adversarially corrupted. We
limit the adversarial corruption to a finite set of
modification rules. We model robust inference as a
zero-sum game between an adversary, who selects a
modification rule, and a predictor, who wants to
accurately predict the state of nature.
There are two variants of the model: one where the adversary must pick the modification rule in advance, and one where the adversary can select the modification rule after observing the realized uncorrupted input. For both settings we derive efficient near-optimal policies that run in polynomial time. Our efficient algorithms are based on methodologies for developing local computation algorithms. Based on joint works with Uriel Feige, Aviad Rubinstein, Robert Schapire, Moshe Tennenholtz, and Shai Vardi.
15:30-15:45 | Best Student Paper Award
15:45-16:15 | Machine Learning Building Blocks
Abstract: Big Data analytics is attracting
more interest than ever before. This, in turn,
creates a flood of innovative ideas, problems
and tasks to handle. It also poses challenges for
technologists who strive to keep pace with the
explosion of new algorithmic and modelling toolsets
while providing relevant and competitive solutions.

The goal of this work is to help close this gap. To this end, we will introduce the concept of Machine Learning Building Blocks: a finite set of elements that can be mapped to hardware and software primitives and patterns. We will provide some intuition for the definition of the basic building blocks, and specific examples of the mapping to commonly used algorithms and modeling techniques, data characteristics, and usage scenarios. Next, we will present the design of a machine learning benchmark suite that provides comprehensive coverage of selected building blocks. The construction is based on a selection of representative algorithms, real and synthesized data sets, and activation parameters. We will conclude with a few examples that demonstrate the utility of this approach for performance analysis.
16:15-16:30 | Closing Remarks
16:30-18:00 | Poster Session