These pages cover the content of the 2017-2018 ML study group, which attempted to focus on aspects of ML and statistics that are relevant, or may be helpful, in building industry-ready ML-based systems. This content can also serve as a general resource for the study of ML with a focus on concepts, mathematics, and statistics. The topics covered are detailed on the left sidebar. The Home page also includes a crash directory on ML for newcomers, as well as a general Statement of Interest regarding the construction of AI systems and abstraction. On the right you will find a summary of a motivating lecture given to high school students.
Traditional software is built using information hiding and a decomposition process. The system is divided into sub-systems, which are then broken down into components, and so on until the lowest object in the hierarchy is defined. The introduction of components that are implemented using machine learning and depend on data introduces a host of new challenges to system reliability. These challenges include matching business objectives, machine-learning optimization objectives, and the relevant data; the stability of solution quality under changes in the data; somewhat artistic choices of machine-learning parameters; and the probabilistic correctness of components.
The second part of the presentation below gives more details on potential pitfalls in the development of ML systems:
ML concepts and pitfalls
ML crash directory
Are you familiar with regression? One way to view ML is as regression on steroids... which means a harder optimization problem (one that has no closed-form analytic solution and/or is not convex) with many parameters.
Let's consider supervised learning first. You are given n labeled data points,
(x1, y1), ..., (xn, yn). Your objective is to find a function f(x) = y that best predicts y on a new batch of x's. When y is continuous, the problem is called regression; when y is discrete, it is known as classification.
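This setup can be sketched in a few lines of numpy. The data below is a hypothetical toy example (a noisy linear trend); the point is only to show "find f that best predicts y on new x's" as an explicit optimization, here least squares.

```python
import numpy as np

# Hypothetical toy data: n labeled points (x_i, y_i) following a noisy linear trend.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 3.0 * x + 1.0 + rng.normal(0, 0.5, size=50)

# Regression: choose f(x) = a*x + b that minimizes squared error on the training set.
A = np.column_stack([x, np.ones_like(x)])
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)

# Use the fitted f to predict y on a new batch of x's.
x_new = np.array([2.0, 5.0])
y_pred = a * x_new + b
```

For classification the recipe is the same, except f outputs a discrete label and the loss measures misclassification rather than squared error.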
There are two things to notice right away:
- To solve this, an optimization problem is defined; for example, minimization of squared error in our original regression problem.
- Trying to explain the given data completely (interpolating it) is actually a pitfall; you may capture random trends, and your predictive power may suffer. This is called overfitting.
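The overfitting pitfall above can be demonstrated with a small numpy sketch. The data and polynomial degrees are illustrative: a high-degree polynomial nearly interpolates the noisy training points, driving training error close to zero, while a lower-degree fit leaves residual training error but typically tracks the underlying curve better on new points.

```python
import numpy as np

# Toy data: a sine curve plus noise (illustrative, not from any real dataset).
rng = np.random.default_rng(1)
x_train = np.linspace(-1, 1, 15)
y_train = np.sin(np.pi * x_train) + rng.normal(0, 0.2, 15)
x_test = np.linspace(-1, 1, 100)
y_test = np.sin(np.pi * x_test)  # the noiseless truth, for comparison

def fit_predict(deg):
    """Fit a degree-`deg` polynomial and predict on train and test points."""
    coeffs = np.polyfit(x_train, y_train, deg)
    return np.polyval(coeffs, x_train), np.polyval(coeffs, x_test)

# Degree 12 with 15 points nearly interpolates the training data (chasing noise)...
yhat_tr_hi, yhat_te_hi = fit_predict(12)
# ...while degree 3 is constrained to stay simple.
yhat_tr_lo, yhat_te_lo = fit_predict(3)

def mse(a, b):
    return float(np.mean((a - b) ** 2))
```

Comparing `mse(yhat_tr_hi, y_train)` against `mse(yhat_tr_lo, y_train)` shows the flexible model "explaining" the data better, even though it has partly memorized noise.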
The basic intuition underlying many approaches to the classification problem is this: had we known p(x, y), then given a new x we would calculate p(x, y) for each y and choose the y with the greatest probability. The difficulty is that p(x, y) is not easy to estimate.
A simplifying independence assumption leads to the naive Bayes approach, which is intuitively covered in the first part of Ariel Kleiner's crash course on ML.
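The independence assumption can be made concrete with a minimal sketch. Assuming the features are conditionally independent given the class, p(x, y) factors as p(y) times the product of per-feature likelihoods; below is a toy Gaussian naive Bayes on synthetic two-class data (all numbers are illustrative).

```python
import numpy as np

# Toy data: two well-separated Gaussian classes in 2D (illustrative).
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),   # class 0 around (0, 0)
               rng.normal(3.0, 1.0, size=(100, 2))])  # class 1 around (3, 3)
y = np.array([0] * 100 + [1] * 100)

def fit(X, y):
    """Estimate per-class feature means, variances, and priors."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (Xc.mean(axis=0), Xc.var(axis=0) + 1e-9, len(Xc) / len(X))
    return params

def predict(params, x):
    """Pick the class with the largest log p(y) + sum_j log p(x_j | y)."""
    def log_post(c):
        mu, var, prior = params[c]
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
        return np.log(prior) + log_lik
    return max(params, key=log_post)

model = fit(X, y)
```

The "naive" part is exactly the product over per-feature likelihoods in `log_post`: it sidesteps estimating the full joint p(x, y) at the cost of ignoring feature correlations.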
Yet another approach is to define an optimization that attempts to maximize performance on the training data while keeping f(x) simple. This is done in a variety of ways.
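One common instance of "fit the training data while keeping f(x) simple" is ridge regression, which adds a penalty on the size of the weights to the squared-error objective. The sketch below uses the closed-form solution; the data and the penalty values are illustrative, and in practice the penalty strength would be tuned, e.g. by cross-validation.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize ||X w - y||^2 + lam * ||w||^2 via its closed-form solution."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Toy data with a known sparse weight vector (illustrative).
rng = np.random.default_rng(3)
X = rng.normal(size=(40, 5))
w_true = np.array([1.0, -2.0, 0.0, 0.0, 0.5])
y = X @ w_true + rng.normal(0, 0.1, 40)

w_small = ridge_fit(X, y, lam=0.1)    # mild penalty: close to plain least squares
w_large = ridge_fit(X, y, lam=100.0)  # heavy penalty: weights shrink toward 0
```

Larger `lam` trades training accuracy for a simpler (smaller-norm) f, which is exactly the fit-versus-simplicity tension described above.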
For a deeper dive into ML concepts, see reference three below; in addition, work through a simple ML tutorial in Python or R to master the subject.
- An introduction for programmers on why ML is useful to master. Note that this introduction ignores the challenges of applying ML where it excels and of dealing with drift.
- A nice overview that starts with classification. The only claim to be careful of is that neural networks are not statistical models; estimating a neural network's performance should be done using the same standard statistical tools, e.g., cross-validation.
- An intuitive deep dive into the concepts of machine learning by Hal Daumé III.
- A collection of different interesting books about ML.
Statement of Interest
AI is fundamentally concerned with the creation of higher, more abstract representations of the world from simpler representations, automatically by a machine. Ideally, such representations should come with statistical guarantees of their correctness.
Previous attempts to this end identified homomorphism of algebraic structures as a fundamental tool for abstraction. Early AI work applied it to solve simple board games by abstracting the board states. In addition, more recent advances in image processing suggest that group symmetries are a good way to capture abstraction by ignoring unimportant changes to the image. More concretely, we say that s is a symmetry of f if f(s(x)) = f(x). These two notions together suggest focusing on groups augmented with a probability measure to study the question of automatic abstraction.
We thus focus next on representation, symmetry, and homomorphism in groups.
This is a nice introduction to the concept of group representations with examples:
For any set X, the set of all one-to-one, onto functions f: X -> X forms a group under composition. As mentioned above, a symmetry of f is an s: X -> X such that f(s(x)) = f(x).
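The definition f(s(x)) = f(x) can be checked directly in code. In this toy sketch (names and functions are illustrative), f sums the coordinates of a tuple and s permutes them; since addition is order-independent, every such permutation is a symmetry of f.

```python
# f: X -> R sums the coordinates; its value ignores coordinate order.
def f(x):
    return sum(x)

# s: X -> X is a bijection on tuples of a fixed length (here, reversal,
# which is one particular permutation of the coordinates).
def s(x):
    return tuple(reversed(x))

x = (1, 2, 3)
# s is a symmetry of f: applying s before f does not change the output.
assert f(s(x)) == f(x)
```

By contrast, a function like `lambda x: x[0]` would not have reversal as a symmetry, since it depends on which coordinate comes first.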
The first half of this lecture by Alex Flournoy (up to roughly minute 32) motivates symmetries of transformations f: X -> X and introduces some relevant language, such as continuous, discrete, infinite, compact, local, and global symmetries. See the associated lecture notes here.
Some highlights from the work on symmetry learning by Pedro Domingos et al:
- Symmetries are changes in the data, obtained by group operations (such as rotating a chair), under which you want the classifier to be invariant.
- Symmetries may reduce the number of features; thus we can learn from less data and still maintain a healthy ratio between the number of features and the size of the training set.
- Symmetry may reduce a search space.
- The approach is not dependent on the ML method being used.
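The first two points above can be illustrated with a toy rotation symmetry (the setup and thresholds here are illustrative, not from the cited work). If the label depends only on the distance from the origin, then classifying on the single feature r = ||x|| instead of the pair (x1, x2) makes the classifier invariant under the whole rotation group and halves the feature count.

```python
import numpy as np

def rotate(x, theta):
    """Apply a 2D rotation (a group operation) to the point x."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([c * x[0] - s * x[1], s * x[0] + c * x[1]])

def classify(x, radius=1.0):
    """Classify on the rotation-invariant feature r = ||x|| only."""
    return int(np.linalg.norm(x) > radius)  # 1 = "outside", 0 = "inside"

x = np.array([2.0, 0.0])
# The prediction is unchanged under every rotation of the input:
preds = {classify(rotate(x, t)) for t in np.linspace(0, 2 * np.pi, 8)}
```

Note that nothing here depends on the classifier being a threshold rule; any method trained on the invariant feature r inherits the same rotation invariance, which matches the last point above.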
Study group meeting slides
YouTube playlist of the ML study group meetings - Hebrew and English
Some related papers:
- An algebraic abstraction approach to reinforcement learning
- An approximate homomorphism approach
- Symmetry-based semantic meaning: uses the concept of an orbit of a group to represent a set of paraphrases that implicitly defines the semantics of a sentence
- Work on deep symmetry networks