Eitan Farchi, Verification & Quality Technologies, IBM Research - Haifa


These pages cover the content of the 2017-2018 ML study group, which focuses on aspects of ML and statistics that are relevant to, or may help in, achieving industry-ready ML-based systems. The content can also serve as a general resource for studying ML with an emphasis on concepts, mathematics, and statistics. The topics covered are detailed on the left sidebar. The Home page also includes a crash directory on ML for newcomers, as well as a general Statement of Interest regarding the construction of AI systems and abstraction. On the right you will find a summary of a motivating lecture given to high school students.

Traditional software is built using information hiding and decomposition. The system is divided into sub-systems, which are then broken down into components, and so on until the lowest object in the hierarchy is defined. Components that are implemented using machine learning and depend on data introduce a host of new challenges to system readability. These challenges include achieving a good match between business objectives, machine learning optimization objectives, and the relevant data; keeping the quality of the solution stable as the data changes; making artistic choices of machine learning parameters; and reasoning about components that are correct only in probability.

The second part of the presentation below gives more details on potential pitfalls in the development of ML systems:

ML crash directory

Are you familiar with regression? One way to view ML is as regression on steroids... which means a harder optimization problem (one that does not have a closed-form analytic solution and/or is not convex) with many parameters.

Let's consider supervised learning first. You are given n labeled data points,
(x1, y1), ..., (xn, yn). Your objective is to find a function f with f(x) = y that best predicts y on a new batch of x. When y is continuous, the problem is called regression; when y is discrete, it is called classification.

    There are two things to notice right away:
  1. To solve this, an optimization problem is defined; for example, minimization of the squared error in our original regression problem.
  2. Trying to explain the given data completely, sometimes called interpolation, is actually a pitfall; you may capture random trends, and your predictive power may suffer. This is called overfitting.
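The two points above can be illustrated with a minimal sketch. It assumes NumPy is available; the data (a noisy line y = 2x + 1), the noise level, and the polynomial degrees are hypothetical choices made only for illustration. Both models minimize the squared error on the training points, but the degree-9 polynomial has enough freedom to pass through every training point and therefore also fits the noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a noisy line y = 2x + 1 (an assumption for this sketch).
x_train = np.linspace(0.0, 1.0, 10)
y_train = 2.0 * x_train + 1.0 + rng.normal(scale=0.2, size=x_train.size)
x_test = np.linspace(0.05, 0.95, 50)
y_test = 2.0 * x_test + 1.0 + rng.normal(scale=0.2, size=x_test.size)


def mse(coeffs, x, y):
    """Mean squared error of the polynomial with coefficients `coeffs`."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


# Both fits minimize the squared error on the training points.
simple = np.polyfit(x_train, y_train, deg=1)  # a straight line
exact = np.polyfit(x_train, y_train, deg=9)   # can pass through all 10 points

train_simple = mse(simple, x_train, y_train)
train_exact = mse(exact, x_train, y_train)
test_simple = mse(simple, x_test, y_test)
test_exact = mse(exact, x_test, y_test)

print(f"degree 1: train={train_simple:.4f} test={test_simple:.4f}")
print(f"degree 9: train={train_exact:.4f} test={test_exact:.4f}")
```

The degree-9 fit always has the lower training error. On fresh points from the same line it will typically do worse than the straight line, which is the overfitting pitfall described above.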

The basic intuition underlying many approaches to the classification problem is that if we knew p(x, y), then given a new x we would calculate p(x, y) for each y and choose the y with the greatest probability. The difficulty is that p(x, y) is not easy to estimate.

A simplifying independence assumption leads to the naive Bayes approach, which is covered intuitively in the first part of Ariel Kleiner's crash course on ML.
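As a rough sketch of this idea, the code below estimates p(x, y) as p(y) times the product of the per-feature probabilities p(x_i | y), which is the independence assumption behind naive Bayes, and then picks the y with the largest estimate. The toy data set, the labels "spam" and "ham", the function names, and the use of add-one (Laplace) smoothing are all assumptions made for illustration; they are not taken from the crash course mentioned above.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: x is a tuple of binary features, y is a label.
data = [
    ((1, 1), "spam"),
    ((1, 0), "spam"),
    ((1, 1), "spam"),
    ((0, 0), "ham"),
    ((0, 1), "ham"),
    ((0, 0), "ham"),
]


def fit(data):
    """Count labels and, per (label, feature index), the feature values seen."""
    label_counts = Counter(y for _, y in data)
    feature_counts = defaultdict(Counter)
    for x, y in data:
        for i, value in enumerate(x):
            feature_counts[(y, i)][value] += 1
    return label_counts, feature_counts


def joint(x, y, label_counts, feature_counts):
    """Estimate p(x, y) = p(y) * prod_i p(x_i | y) under the independence assumption."""
    p = label_counts[y] / sum(label_counts.values())
    for i, value in enumerate(x):
        # Add-one smoothing over the two possible feature values (0 and 1).
        p *= (feature_counts[(y, i)][value] + 1) / (label_counts[y] + 2)
    return p


def predict(x, label_counts, feature_counts):
    """Choose the label y that maximizes the estimated p(x, y)."""
    return max(label_counts, key=lambda y: joint(x, y, label_counts, feature_counts))


label_counts, feature_counts = fit(data)
print(predict((1, 0), label_counts, feature_counts))
print(predict((0, 1), label_counts, feature_counts))
```

With independence, only one small table per feature and label has to be estimated, instead of one probability for every combination of feature values.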

Yet another approach is to define an optimization that attempts to maximize performance on the training data while keeping f(x) simple. This is done in a variety of ways.
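One common instance of this trade-off, shown here only as an example, is ridge regression: the squared error on the training data is minimized together with a penalty on the size of the weights. The sketch below assumes NumPy; the data, the penalty weight `lam`, and the function name `ridge_fit` are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Minimize ||X w - y||^2 + lam * ||w||^2 using the closed-form solution."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


rng = np.random.default_rng(0)

# Hypothetical data: only the first of five features actually matters.
X = rng.normal(size=(20, 5))
y = X @ np.array([1.0, 0.0, 0.0, 0.0, 0.0]) + rng.normal(scale=0.1, size=20)

w_plain = ridge_fit(X, y, lam=0.0)   # ordinary least squares
w_ridge = ridge_fit(X, y, lam=10.0)  # penalized, hence "simpler" weights

print("norm without penalty:", np.linalg.norm(w_plain))
print("norm with penalty:   ", np.linalg.norm(w_ridge))
```

The penalized solution gives up some accuracy on the training data in exchange for smaller weights; how large `lam` should be is itself a choice that has to be validated on held-out data.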

For a deep dive into ML concepts, see reference 3 below; in addition, work through a simple ML tutorial in Python or R to master the subject.


  1. An introduction for programmers on why ML is useful to master. Note that this introduction ignores the challenges of applying ML where it excels and of dealing with drift.
  2. A nice overview that starts with classification. The only thing to be careful of is the claim that neural networks are not statistical models. A neural network's performance should be estimated using the same standard statistical tools, e.g., cross-validation.
  3. An intuitive deep dive into the concepts of machine learning by Hal Daumé III.
  4. A collection of different interesting books about ML.

Statement of Interest

AI is fundamentally concerned with the creation of higher, more abstract representations of the world from simpler representations, automatically by a machine. Ideally, such representations are required to be associated with statistical guarantees of their correctness.

Previous attempts to this end identified homomorphisms of algebraic structures as a fundamental tool for abstraction. Early AI attempts applied them to solve simple board games by abstracting the board states. In addition, more recent advances in image processing suggest that symmetries in groups are a good way to capture abstraction by ignoring unimportant changes to the image. More concretely, we say that s is a symmetry of f if f(s(x)) = f(x) for every x. These two notions together suggest focusing on groups augmented with a probability measure to study the question of automatic abstraction.

We thus focus next on representation, symmetry, and homomorphism in groups.
This is a nice introduction to the concept of group representations with examples:

For any set X, the set of all 1-1 onto functions f: X -> X, together with the composition operation, forms a group. As mentioned above, a symmetry of f is a function s: X -> X such that f(s(x)) = f(x) for every x.
The first half of this lecture by Alex Flournoy (up to ~32) motivates symmetries over transformations f: X -> X and introduces some relevant language such as continuous, discrete, infinite, compact, local, and global symmetries. See the associated lecture notes here.
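The group of bijections and the definition of a symmetry can be checked by brute force on a small finite set. In the sketch below, the set X = {0, 1, 2} and the function f are hypothetical examples chosen for illustration; each bijection is stored as a tuple p, where p[i] is the image of i.

```python
from itertools import permutations

X = (0, 1, 2)

# All 1-1 onto functions X -> X, each stored as a tuple of images.
bijections = set(permutations(X))
identity = tuple(X)


def compose(p, q):
    """Return the map 'p after q', i.e. x -> p[q[x]]."""
    return tuple(p[q[x]] for x in X)


def inverse(p):
    """Return the inverse bijection of p."""
    inv = [0] * len(p)
    for x, image in enumerate(p):
        inv[image] = x
    return tuple(inv)


# Group properties: closure under composition and existence of inverses.
closed = all(compose(p, q) in bijections for p in bijections for q in bijections)
has_inverses = all(compose(p, inverse(p)) == identity for p in bijections)


def f(x):
    """Hypothetical function on X: it does not distinguish 0 from 1."""
    return "a" if x in (0, 1) else "b"


def is_symmetry(s, f):
    """s is a symmetry of f if f(s(x)) == f(x) for every x in X."""
    return all(f(s[x]) == f(x) for x in X)


symmetries = sorted(s for s in bijections if is_symmetry(s, f))
print(closed, has_inverses, symmetries)
```

For this f, the symmetries are the identity and the map that swaps 0 and 1. These two maps are themselves closed under composition, so they form a subgroup of the six bijections.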

    Some highlights from the work on symmetry learning by Pedro Domingos et al.:
  1. Symmetries are changes in the data, obtained by group operations, that you want the classifier to be invariant under; for example, the rotation of a chair.
  2. Symmetries may reduce the number of features; thus we can learn with less data and still achieve a good ratio between the number of features and the size of the training set.
  3. Symmetries may reduce the search space.
  4. The approach does not depend on the ML method being used.
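A small sketch can make the invariance and search-space points above concrete. It is not the method of Domingos et al.; it only assumes a hypothetical setting of 2x2 binary images and the group of rotations by multiples of 90 degrees. Each image is mapped to a canonical representative of its orbit, so any classifier applied to the canonical form is rotation invariant, whatever ML method produced it.

```python
from itertools import product


def rotate(img):
    """Rotate a 2x2 image ((a, b), (c, d)) by 90 degrees clockwise."""
    (a, b), (c, d) = img
    return ((c, a), (d, b))


def orbit(img):
    """All four rotations of img (with repeats if img is symmetric)."""
    out = [img]
    for _ in range(3):
        out.append(rotate(out[-1]))
    return out


def canonical(img):
    """Pick one fixed representative per orbit: the smallest rotation."""
    return min(orbit(img))


# All 2x2 binary images, and the distinct canonical forms among them.
images = [((a, b), (c, d)) for a, b, c, d in product((0, 1), repeat=4)]
canonical_forms = {canonical(img) for img in images}

print(len(images), "raw inputs,", len(canonical_forms), "after using the symmetry")
```

A learner that sees only canonical forms has to cover 6 distinct inputs instead of 16, and its predictions cannot change when the input is rotated.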
    Some related papers:
  1. An algebraic abstraction approach to reinforcement learning
  2. An approximate homomorphism approach
  3. Symmetry-based semantic meaning, which uses the concept of an orbit in a group to represent a set of paraphrases that implicitly defines the semantics of a sentence
  4. Work on deep symmetry networks

Motivating high-school kids

The following set of resources is aimed at motivating high school kids. Only one link is in Hebrew.

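Machine learning is a scientific field that is changing our lives. In the future, cars will drive without a driver, and decisions will be made by a machine or with its help. In this course we will learn the basic concepts of machine learning and how to apply them using the Python programming language.
Internet access is required to participate in the course.
Participation is simple. You need to register by sending a registration request to
You will receive a notification when a new lesson is ready, and you will be able to ask questions on the blog.
The first lesson can be found here -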

Self-driving car concept. No driver.

Prof. Shai Shalev-Shwartz demo on the road to Jerusalem

Balancing balls demo

3D simulation of a neural network

Better intersection handling

Dementia detection: an example of a binary classifier

How a simple neural network recognizes digits, with a lot of animation

Learning to walk