Machine Learning Seminar 2011
June 12, 2011
Organized by IBM Haifa Research Lab
Abstracts
Know What You Don't Know: Uncertain Models Learn with Partial Feedback
Koby Crammer, Technion – Israel Institute of Technology
The talk will focus on a class of algorithms that maintain
uncertainty information about classifier parameters.
Informally, the uncertainty is modeled via a covariance
matrix over the weights of a linear hypothesis. Learning in
this framework modifies the parameters by estimating the
weights and reducing the uncertainty. I will describe a few
algorithms for learning models in this framework, with both
full and partial feedback, and show a mistake-bound
analysis. The usefulness of maintaining confidence will be
demonstrated on multi-class decision problems and domain
adaptation.
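As a concrete illustration, the sketch below shows one online update
in this spirit, modeled on Crammer et al.'s AROW algorithm: a mean
weight vector and a covariance matrix play the roles described above.
The regularization constant r and the identity initialization are
illustrative choices, not necessarily the talk's exact algorithm.

    import numpy as np

    def arow_style_update(mu, sigma, x, y, r=1.0):
        # mu: mean weight vector; sigma: covariance over the weights;
        # (x, y): example with y in {-1, +1}; r: regularization
        # constant (an illustrative choice).
        margin = y * mu.dot(x)            # signed margin under the mean
        confidence = x.dot(sigma).dot(x)  # variance of the margin
        if margin < 1.0:                  # update only on margin violations
            beta = 1.0 / (confidence + r)
            alpha = (1.0 - margin) * beta
            sigma_x = sigma.dot(x)
            mu = mu + alpha * y * sigma_x  # estimate the weights
            sigma = sigma - beta * np.outer(sigma_x, sigma_x)  # reduce uncertainty
        return mu, sigma

    # toy usage on a d-dimensional stream
    d = 5
    mu, sigma = np.zeros(d), np.eye(d)
    rng = np.random.default_rng(0)
    for _ in range(100):
        x = rng.normal(size=d)
        y = 1 if x[0] > 0 else -1
        mu, sigma = arow_style_update(mu, sigma, x, y)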
PARIS: Principal Atom Recognition In Sets
Michal Aharon, HP Labs Israel
We introduce a new method for discovering latent topics in
sets of objects, such as documents. Our method, which we
call PARIS (for Principal Atoms Recognition In Sets), aims
to detect principal sets of elements, representing latent
topics in the data, that tend to appear frequently together.
These latent topics, which we refer to as 'atoms', are used
as the basis for clustering, classification, collaborative
filtering, and more. We develop a target function that
balances compression against representation error, and an
algorithm that minimizes this function. Optimizing the
target function automatically discovers both the number of
atoms, representing the dimensionality of the data, and the
atoms themselves, all in a single iterative procedure. We
demonstrate PARIS's ability to discover latent topics, even
when they are arranged hierarchically, on synthetic data,
documents, and movie-ranking data, showing improved
performance compared to existing algorithms, such as LDA,
on text analysis and collaborative filtering tasks.
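The abstract does not spell out the objective, but a hypothetical
MDL-flavored cost of the following form illustrates the
compression-versus-error trade-off it describes; the weight lam and
the symmetric-difference error term are assumptions for illustration.

    def paris_like_cost(sets, atoms, assignments, lam=1.0):
        # sets: list of frozensets of elements (e.g., words per document).
        # atoms: list of frozensets, the candidate latent topics.
        # assignments: for each set, indices of the atoms representing it.
        # lam: trade-off between model size (compression) and error.
        model_size = sum(len(a) for a in atoms)  # cost of describing atoms
        error = 0
        for s, idxs in zip(sets, assignments):
            covered = set().union(*(atoms[i] for i in idxs))
            error += len(s ^ covered)  # symmetric difference: misses + extras
        return error + lam * model_size

Minimizing such a cost jointly over the atoms and their number in one
iterative procedure matches the abstract's claim that the
dimensionality of the data is discovered automatically.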
Learning visual similarity using classifiers
Lior Wolf, The Blavatnik School of Computer Science,
Tel-Aviv University
One-Shot-Similarity (OSS) is a framework for
classifier-based similarity functions. It is based on the
use of background samples and was shown to excel in tasks
ranging from face recognition to document analysis.
In this talk we will present the framework as well as the
following results: (1) when using a version of LDA as the
underlying classifier, the OSS score is a Conditionally
Positive Definite kernel and may be used within kernel
methods (e.g., SVM); (2) OSS can be computed efficiently;
and (3) a metric-learning technique geared toward improved
OSS performance.
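As a rough sketch, the symmetric OSS score with an LDA-style
underlying classifier can be computed in closed form from the
background set's statistics: train on one sample (positive) against
the background (negative), score the other sample, and average the
two directions. The pseudo-inverse below is an implementation choice
for numerical stability, not part of the definition.

    import numpy as np

    def one_shot_similarity(x1, x2, background):
        # background: (n, d) array of background samples; x1, x2: (d,).
        mu = background.mean(axis=0)
        s_inv = np.linalg.pinv(np.cov(background, rowvar=False))

        def half_score(a, b):
            # train on {a} (positive) vs. the background (negative), score b
            w = s_inv.dot(a - mu)      # LDA direction
            midpoint = (a + mu) / 2.0  # unbiased threshold between classes
            return w.dot(b - midpoint)

        return 0.5 * (half_score(x1, x2) + half_score(x2, x1))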
Multi-class Norm-based AdaBoost-like Algorithm
Dan Gutfreund, IBM Haifa Research Lab
We propose a new approach to generalizing the AdaBoost
algorithm of Freund and Schapire to the multi-class setting.
The basic idea is to map labels and confidence-based
classifiers into a normed vector space, and to measure
performance by distances in this space. The result is a
family of algorithms that can handle the standard
multi-class classification setting, as well as the setting
where there is structure on the label space in the form of
distances between the labels. In the talk I will present
both theoretical analysis and experimental results.
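To illustrate the label-embedding idea (this is a sketch, not the
talk's specific algorithm), one can map each of K labels to a code
vector, let the boosted ensemble output a vector in the same space,
and predict the label whose code is nearest; structure on the label
space is expressed by choosing codes whose pairwise distances reflect
the given label distances.

    import numpy as np

    def predict_label(ensemble_output, label_codes):
        # label_codes: (K, d) array, one code vector per label.
        # Returns the index of the label whose code is nearest.
        dists = np.linalg.norm(label_codes - ensemble_output, axis=1)
        return int(np.argmin(dists))

    # One-hot codes treat all label confusions as equally costly; codes
    # placed to mirror given label distances encode the label structure.
    one_hot_codes = np.eye(4)
    print(predict_label(np.array([0.1, 0.7, 0.2, 0.0]), one_hot_codes))  # -> 1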
Online Learning in the Manifold of Low-Rank Matrices
Gal Chechik, Bar-Ilan University
Models parametrized as matrices are common in multi-task
learning and metric learning, as in the case of learning a
Mahalanobis distance matrix. But when the dimensionality of
the data is large, learning such models becomes infeasible
due to memory constraints. A natural approach to learn
memory-restricted matrix models is to limit their ranks,
since an n by m matrix of rank k can be stored using (m+n)k
parameters. Unfortunately, the set of low-rank matrices is
non-convex, and approaches like iterative projected gradient
are very costly in run time.
Here we use the fact that the set of low-rank matrices can
be viewed as a Riemannian manifold embedded in a Euclidean
space, and describe a procedure to learn such models in
O((m+n)k) time, linear in the number of parameters. Our
approach, Loreta, is based on *retractions*: operations that
map a point onto the low-rank manifold as accurately as
projections. We show how to compute a particular retraction
in linear time when the gradients are of rank 1. We find
that Loreta learns similarity of text documents through a
ranking loss much more accurately than full-rank models with
the same memory footprint. Loreta also outperformed other
methods using the same memory when tested on a large-scale
image annotation task.
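Loreta's actual retraction cannot be reconstructed from this abstract
alone, so the sketch below only illustrates the storage and per-step
cost argument: it keeps W = A B^T in factored form and applies a plain
factored-gradient step (a simpler stand-in for the retraction) when
the gradient of the loss with respect to W is rank 1, g = u v^T.

    import numpy as np

    def factored_rank1_step(A, B, u, v, lr=0.1):
        # One stochastic step on W = A @ B.T with rank-1 loss gradient
        # np.outer(u, v). A: (n, k), B: (m, k), u: (n,), v: (m,).
        # Cost and storage are O((n + m) k); a stand-in for Loreta's
        # retraction, not the algorithm itself.
        Bv = B.T.dot(v)                    # (k,) -- O(mk) work
        Au = A.T.dot(u)                    # (k,) -- O(nk) work
        A_new = A - lr * np.outer(u, Bv)   # dLoss/dA = (u v^T) B
        B_new = B - lr * np.outer(v, Au)   # dLoss/dB = (u v^T)^T A
        return A_new, B_new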
Suggesting Friends Using the Implicit Social Graph
Sigalit Bar, Google
Users of online communication tools implicitly cluster
contacts, by virtue of their interactions with them, forming
implicit groups. In this talk we describe the implicit
social graph formed by these interactions, which is distinct
from explicit social graphs in which users explicitly add
other individuals as their "friends".
We introduce an interaction-based metric for estimating a
user's affinity to his contacts and groups. We then describe
a novel friend suggestion algorithm that uses a user's
implicit social graph to generate a friend group, given a
small seed set of contacts which the user has already
labeled as friends.
We show experimental results that demonstrate the importance
of both implicit group relationships and interaction-based
affinity ranking in suggesting friends. Finally, we discuss
two applications of the Friend Suggest algorithm that have
been released as Gmail Labs features.
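As a sketch of what an interaction-based affinity might look like,
the function below scores an implicit group by time-decayed counts of
sent and received messages; the half-life and the heavier weight on
outgoing mail are illustrative assumptions, not the metric from the
talk.

    def group_affinity(out_times, in_times, now, half_life_days=30.0,
                       out_weight=1.0, in_weight=0.7):
        # out_times / in_times: timestamps (in days) of mails the user
        # sent to / received from the group's members.
        decay = lambda t: 0.5 ** ((now - t) / half_life_days)
        return (out_weight * sum(decay(t) for t in out_times)
                + in_weight * sum(decay(t) for t in in_times))

A friend-suggestion step can then score each candidate contact by the
affinities of the implicit groups it shares with the seed contacts,
and suggest the top-scoring candidates.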
What Else Can We Do with More Data?
Shai Shalev-Shwartz, The Hebrew University of Jerusalem
In some applications, data is plentiful. By now, we have a
rather clear understanding of how more data can be used to
improve the accuracy of learning algorithms. In the talk
I'll show how more data can be beneficial for other
purposes. In particular, I'll present algorithms that can
leverage large data sets to reduce the required training
and prediction runtime, as well as algorithms that can use
the multitude of examples to compensate for the lack of full
information on each individual example.
Automatically Tagging Email by Leveraging Other Users'
Folders
Yehuda Koren, Yahoo!
Most email applications devote a significant part of their
real estate to organization mechanisms such as folders. Yet,
we verified on the Yahoo! Mail service that 70% of email
users have never defined a single folder. This implies that
one of the best-known email features is underexploited.
We propose here to revive the feature by providing a method
for generating a lighter form of folders, or tags,
benefiting even the most passive users. The method
automatically associates, whenever possible, an appropriate
semantic tag with a given email. This gives rise to an
alternate mechanism for organizing and searching email.
We advocate a novel modeling approach that exploits the
overall population of users, thereby learning from the
wisdom of crowds how to categorize messages. Given our
massive user base, it is enough to learn from the minority
of users who label certain messages in order to label that
kind of message for the general population. We design a
novel cascade classification approach, which copes with the
severe scalability and accuracy constraints we are facing.
Significant efficiency gains are achieved by working within
a low dimensional latent space, and by using a novel
hierarchical classifier. Precision level is controlled by
separating the task into a two-phase classification process.
We performed an extensive empirical study covering three
different time periods, over 100 million messages, and
thousands of candidate tags per message. The results are
encouraging and compare favorably with alternative
approaches. Our method successfully tags 72% of incoming
email traffic. Performance-wise, the computational overhead,
even under large traffic surges, is sufficiently low for our
approach to be applicable in production on any large Web
mail service.
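A minimal sketch of a two-phase cascade in a low-dimensional latent
space: a learned projection maps a message into the latent space,
phase one cheaply shortlists candidate tags, and phase two accepts
only tags whose scores clear a precision-controlling threshold. The
projection, the centroid scoring, and the threshold are illustrative
assumptions, not the paper's classifier.

    import numpy as np

    def tag_message(msg_vec, proj, tag_centroids, tag_names,
                    shortlist=10, threshold=0.8):
        # msg_vec: bag-of-words vector of the message; proj: (d_latent,
        # d_vocab) learned projection (assumed given); tag_centroids:
        # (T, d_latent) unit-norm latent vector per candidate tag.
        z = proj.dot(msg_vec)
        z /= np.linalg.norm(z) + 1e-12            # cosine scoring below
        sims = tag_centroids.dot(z)
        top = np.argsort(sims)[::-1][:shortlist]  # phase 1: cheap shortlist
        return [tag_names[i] for i in top         # phase 2: precision gate
                if sims[i] >= threshold]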