**Machine Learning Seminar 2008May 25, 2008Organized by IBM Haifa Research
Lab**

**Abstracts**

**An Overview of Agnostic Learning**

Adam Kalai, Georgia Institute of Technology

The agnostic model of learning (Kearns, Schapire, and
Sellie, 1992) is a binary classification model that places
little or no requirements on the data. The main assumption
is that future examples will be drawn from the same
distribution as training examples, while accuracy is
compared to the most accurate classifier in some class.
Hence, such algorithms can tolerate arbitrary or even
adversarial noise. Furthermore, as in Valiant's PAC model,
an agnostic algorithm should be computationally efficient.
Designing computationally efficient agnostic learning
algorithms has proven to be an interesting challenge. I will
give an overview of older and more recent work on agnostic
learning, focusing on provably efficient algorithms (instead
of impossibility results). The analysis of agnostic learning
algorithms sheds light on several key Machine Learning
notions, such as Fourier learning, SVMs, active learning,
decision tree learning, and boosting.

**Learning for Debugging Large-Scale Web Services**

Ira Cohen, HP Labs

Large scale IT systems, and web services in particular, are
difficult to debug and manage. Commercial tools monitor
these services, but provide little in terms of analysis and
root cause information when problems occur. In recent years
there has been an effort to leverage and develop machine
learning methods for transforming these monitors to
actionable information and improve the manageability of
systems. The talk will describe our work in using machine
learning for web services performance debugging. In
particular, it will focus on the use of clustering with
pair-wise constraints, work that is motivated by the fact
that obtaining reliable labels for different types of
performance problems has proven to be very difficult. I'll
describe our probabilistic clustering method which allows
soft pair-wise constraints, represented by an intuitive
measure of confidence in the constraint, and compare it to
existing methods in terms of robustness to misspecified
constraints.

**Bi-Level Path Following for Cross Validated Solution of
Kernel Quantile Regression**

Saharon Rosset, Tel-Aviv University

In this talk I will introduce three important concepts, then
bring them together to discuss some of my latest research.
The three concepts are: 1. Estimation of conditional
distributions P(Y|X) through quantile regression. This is an
approach that allows maximal flexibility in understanding
and modeling the dependence of a response Y on features X,
as each quantile (=percentile) of P(Y|X) can be modeled
separately. I will introduce kernel quantile regression
(i.e., quantile regression in a reproducing kernel Hilbert
space) as the main modeling tool of interest.2. The
statistical role of regularization and the importance of
model selection. Model selection can be driven by
theoretical considerations (like SRM) or by empirical
performance (via cross validation), which is our focus.3.
Efficient generation of ''regularization paths'' to
regularized problems. The idea here is to use geometrical
considerations and homotopy approaches to follow solution
paths of regularized problems for a small computational
cost. Such algorithms have been developed for support vector
machines, lasso, and other important modeling tools.
Bringing these three elements together, I will present an
algorithm to solve the quantile regression problem for all
quantiles simultaneously, and obtain the optimal cross
validated solution for every quantile, via a path-following
approach.

**Revealing Structure in Unseeable Data**

Noam Slonim, IBM Haifa Research Lab

Given the pairwise affinity relations associated with a set
of data items, clustering algorithms aim to reveal the
underlying data structure by automatically partitioning the
data into a small number of homogeneous clusters. An
inherent difficulty in this task is the fact that the input
size is quadratic in the number of data points.
Specifically, if the number of data points is large or if
computing each affinity relation is demanding, existing
algorithms become infeasible. In this talk I will describe
novel rigorous clustering strategies to handle such
scenarios, and will present results for various domains of
great practical importance. In particular, I will argue that
in many real world problems the underlying data structure
can be revealed by observing only small fractions of the
entire data.

**Keynote: Learning in an Integrated Intelligent System:
Examples from the CALO System**

Thomas G. Dietterich, Director of Intelligent Systems
Research, Oregon State University

CALO (Cognitive Assistant that Learns and Organizes) is a
large 5-year effort to build an integrated intelligent
assistant for the desktop knowledge worker. A central focus
of CALO was on integrating modern machine learning methods
into an AI system. This talk will describe two themes
emerging from this research: (a) an architecture for
integrating multiple learning components to achieve
end-to-end learning and (b) methods for learning procedures
from user demonstrations by combining linguistic and data
flow knowledge. The talk will include videos and
demonstrations of some CALO components.

**Solving Biological Problems with Probabilistic
Graphical Models**

Eran Segal, Weizmann Institute of Science

Genomic datasets, spanning many organisms and data types,
are rapidly being produced, creating new opportunities for
understanding the molecular mechanisms underlying human
disease, and for studying complex biological processes on a
global scale. Transforming these immense amounts of data
into biological information is a challenging task. In this
talk, I will describe a statistical modeling language that
we developed, based on Bayesian networks, for representing
heterogeneous biological entities and modeling the mechanism
by which they interact. I will present the statistical
learning approaches that we employ to automatically learn
the details of these models (structure and parameters), and
outline the computational and mathematical challenges that
lie ahead.

**Machine Learning Techniques inside the Medical
Information Hub**

Andre Elisseeff, IBM Zurich Research Lab

The complexity emerging from modern information systems and
global standards is overwhelming medical practitioners and
patients. Text based user interactions on unstructured data
and complex standards take too much time and are not
intuitive. In this talk, we will present a tool aiming at
changing the interaction with medical records from a text
based to a graphical based paradigm. It enriches the user
experience by mapping all medical records to a browseable
virtual 2D/3D representation of a human body. It is based on
a large ontology (SNOMED with about half a million concepts)
for the mapping and relies on machine learning for semantic
search, data entry and user assistance as well as 3D
navigation. This talk will focus on the application of
machine learning techniques in the development of such a
tool.

**Projection Methods: Algorithmic Structure and
Acceleration Techniques**

Yair Censor, University of Haifa

Recognizing that the Perceptron algorithm is a projection
method inspires our interest in the class of projection
methods. Are there other methods in that class which can be
interpreted as, or modified into, machine learning
algorithms? What else can methods from this class "do" for
us? How do projection methods work? What are the algorithmic
structures available in the class of projection methods?
what can be said about convergence properties and/or
experimental initial behavior patterns? When, why and how
are projection methods preferable over other methods? How
can they be accelerated?

**Learning to See in Plato's Cave and Distributed
Teachers**

Ran Gilad-Bachrach, Intel Research Israel Lab

In the allegory of the cave, the great Greek philosopher
Plato describes a prisoner who is captured in a cave and can
not see reality but only its reflections as they appear as
shadows on the walls of the cave. The prisoner learns to
distinguish between the different objects based on the
shadows they cast. However, once the prisoner is released
and steps outside of the cave, will he be able to identify
objects in the reality? In this talk, we discuss the results
of an experiment we ran on learning object class recognition
tasks using images rendered by 3D image processing software.
These tools make it possible to generate multiple views of a
single object from different view points, under different
illuminations and more. Therefore, it is easy to generate
large and very informative training-sets using these tools.
However, in this setting, our learner is trapped in the cave
of virtual-reality images and the question remains how well
the learner generalize when escaping the cave and faced with
real world images.

Another method we present for augmenting available data for
machine vision algorithm is the use of distributed teachers.
Distributed teachers are plentiful but lack the inner
consistency one would expect from labels. We present
algorithms for unifying the labels of such labelers to
generate consistent labels.

This is joint work by Aharon Bar-Hillel, Chen Goldberg, Eyal
Krupka, Liat Ein-Dor and Yossi Ittach.